As AI systems like large language models (LLMs) grow in size and complexity, researchers are uncovering intriguing fundamental limitations. 

Recent studies from Google and the National University of Singapore have uncovered the mechanics behind AI “hallucinations” – where models generate convincing but fabricated information – and the buildup of “technical debt,” which could create messy, unreliable systems over time.

Beyond the technical challenges, aligning AI’s capabilities and incentives with human values remains an open question.

As companies like OpenAI push towards artificial general intelligence (AGI), securing the path ahead means acknowledging the limits of current systems.

However, carefully acknowledging risks is antithetical to Silicon Valley’s motto to “move fast and break things,” which characterizes AI R&D as it did tech innovations before it.

Study 1: AI models are accruing ‘technical debt’

Machine learning is often touted as endlessly scalable, with systems offering a modular, integrated framework for development.

However, in the background, developers may be accruing a high level of ‘technical debt’ they’ll need to resolve down the road.

In a Google research paper, “Machine Learning: The High-Interest Credit Card of Technical Debt,” researchers discuss the concept of technical debt in the context of ML systems.

Kaggle CEO and long-time Google researcher D. Sculley and colleagues argue that while ML offers powerful tools for rapidly constructing complex systems, these “quick wins” are often misleading.

The simplicity and speed of deploying ML models can mask the long-term burdens they impose on system maintainability and evolution.

As the authors describe, this hidden debt arises from several ML-specific risk factors that developers should avoid or refactor.

Here are the key insights:

  • ML systems, by their nature, introduce a level of complexity beyond coding alone. This can lead to what the authors call “boundary erosion,” where the clear lines between different system components become blurred because of the interdependencies created by ML models. This makes it difficult to isolate and implement improvements without affecting other parts of the system.
  • The paper also highlights the issue of “entanglement,” where changes to any part of an ML system, such as input features or model parameters, can have unpredictable effects on the rest of the system. Altering one small parameter might trigger a cascade of effects that impacts an entire model’s function and integrity.
  • Another issue is the creation of “hidden feedback loops,” where ML models influence their own training data in unexpected ways. This can lead to systems that evolve in unintended directions, compounding the difficulty of managing and understanding the system’s behavior (see the sketch after this list).
  • The authors also address “data dependencies,” such as input signals that change over time, which are particularly problematic because they’re harder to detect.
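
To make the “hidden feedback loops” point concrete, here is a minimal Python sketch – our own illustration, not code from the Google paper – of a recommender that retrains only on interactions with the items it chose to show, so its earlier predictions quietly shape its future training data:

```python
import random

# Illustrative sketch of a hidden feedback loop (hypothetical, not from the paper):
# the model is retrained only on feedback for items it chose to surface, so its
# own past predictions determine what it learns from next.

catalog = list(range(100))                               # item IDs
scores = {item: random.random() for item in catalog}     # the "model": one score per item

def recommend(k=10):
    # Surface the k items the current model scores highest.
    return sorted(catalog, key=lambda i: scores[i], reverse=True)[:k]

def collect_feedback(shown):
    # Users can only click what they were shown; unseen items produce no signal.
    return [(item, random.random() < 0.3) for item in shown]

def retrain(feedback, lr=0.1):
    # Nudge each shown item's score toward its observed click outcome.
    for item, clicked in feedback:
        scores[item] += lr * ((1.0 if clicked else 0.0) - scores[item])

for _ in range(50):
    retrain(collect_feedback(recommend()))

# Items that were never recommended keep stale scores forever, while the shown
# set drifts in directions driven by the model's own earlier choices.
print(recommend())
```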

Why technical debt matters

Technical debt touches on the long-term health and efficiency of ML systems.

When developers rush to get ML systems up and running, they may ignore the messy intricacies of data handling or the pitfalls of ‘gluing’ together different components.

This might work in the short term but can result in a tangled mess that’s hard to dissect, update, or even understand later.

GenAI is an avalanche of technical debt* waiting to happen

Just this week
ChatGPT went “berserk” with almost no real explanation
Sora can’t consistently infer how many legs a cat has
👉Gemini’s diversity intervention went utterly off the rails.… pic.twitter.com/qzrVlpX9yz

— Gary Marcus @ AAAI 2024 (@GaryMarcus) February 24, 2024

For example, using ML models as-is from a library seems efficient until you’re stuck with a “glue code” nightmare, where most of the system is just duct tape holding together bits and pieces that weren’t meant to fit together.

Or consider “pipeline jungles,” described in a previous paper by D. Sculley and colleagues, where data preparation becomes a labyrinth of intertwined processes, so making a change feels like defusing a bomb.
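
As a loose illustration – hypothetical code, not from Sculley and colleagues’ papers – a “pipeline jungle” tends to look like the first function below, where side inputs, quick fixes, and magic constants pile up in place, compared with a version that isolates each transformation:

```python
import pandas as pd

# Hypothetical example of a "pipeline jungle": every fix is bolted on in place,
# so no step can be changed without tracing everything else that touches the data.
def jungle_features(raw_path):
    df = pd.read_csv(raw_path)
    df["age"] = df["age"].fillna(df["age"].mean())    # quick fix added for model v1
    extra = pd.read_csv("legacy_dump.csv")            # undocumented side input
    df = df.merge(extra, on="user_id", how="left")    # silently duplicates rows on bad keys
    df["score"] = df["score"] * 1.07                  # magic constant nobody remembers
    return df

# A more maintainable shape: each transformation is a small, testable step,
# so changes stay local instead of rippling through the whole pipeline.
def clean_age(df):
    return df.assign(age=df["age"].fillna(df["age"].median()))

def build_features(raw_path, steps=(clean_age,)):
    df = pd.read_csv(raw_path)
    for step in steps:
        df = step(df)
    return df
```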

The implications of technical debt

For starters, the more tangled a system becomes, the harder it is to improve or maintain. This not only stifles innovation but can also lead to more sinister issues.

For instance, if an ML system starts making decisions based on outdated or biased data because it’s too cumbersome to update, it can reinforce or amplify societal biases.

Moreover, in critical applications like healthcare or autonomous vehicles, such technical debt could have dire consequences, not only in terms of time and money but in human well-being.

As the study describes, “Not all debt is necessarily bad, but technical debt does tend to compound. Deferring the work to pay it off results in increasing costs, system brittleness, and reduced rates of innovation.”

It’s also a reminder for businesses and consumers to demand transparency and accountability in the AI technologies they adopt.

After all, the goal is to harness the power of AI to make life better, not to get bogged down in an endless cycle of technical debt repayment.

Study 2: You can’t separate hallucinations from LLMs

In a distinct but related study from the National University of Singapore, researchers Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli investigated the inherent limitations of LLMs.

“Hallucination is Inevitable: An Innate Limitation of Large Language Models” explores the nature of AI hallucinations, which describe instances when AI systems generate plausible but inaccurate or entirely fabricated information.

The hallucination phenomenon poses a serious technical challenge, as it highlights a fundamental gap between an AI model’s output and what is considered the “ground truth” – an ideal model that always produces correct and logical information.

Understanding how and why generative AI hallucinates is paramount as the technology integrates into critical sectors such as policing and justice, healthcare, and law.

What if one could *prove* that hallucinations are inevitable within LLMs?

Would that change
• How you view LLMs?
• How much investment you’d make in them?
• How much you’d prioritize research in alternatives?

New paper makes the case: https://t.co/r0eP3mFxQg
h/t… pic.twitter.com/Id2kdaCSGk

— Gary Marcus @ AAAI 2024 (@GaryMarcus) February 25, 2024

Theoretical foundations of hallucinations

The study begins by laying out a theoretical framework for understanding hallucinations in LLMs.

The researchers created a theoretical model known as the “formal world.” This simplified, controlled environment enabled them to examine the conditions under which AI models fail to align with the ground truth.

They then tested two major families of LLMs:

  1. Llama 2: Specifically, the 70-billion-parameter version (llama2-70b-chat-hf) available on HuggingFace was used. This model represents one of the newer entries in the large language model arena, designed for a wide range of text generation and comprehension tasks.
  2. Generative Pretrained Transformers (GPT): The study included tests on GPT-3.5, specifically the 175-billion-parameter gpt-3.5-turbo-16k model, and GPT-4 (gpt-4-0613), for which the precise number of parameters remains undisclosed.

LLMs were asked to list strings of a given length using a specified alphabet, a seemingly easy computational task.

More specifically, the models were tasked with generating all possible strings of lengths varying from 1 to 7, using alphabets of two characters (e.g., {a, b}) and three characters (e.g., {a, b, c}).

The outputs were evaluated based on whether they contained all and only the strings of the specified length from the given alphabet.
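
For reference, the ground truth for this task is easy to enumerate exhaustively; the minimal Python sketch below – our own illustration, not the paper’s evaluation code – builds the expected set and checks a model’s output against it:

```python
from itertools import product

def expected_strings(alphabet, length):
    """All strings of exactly `length` built from `alphabet`."""
    return {"".join(chars) for chars in product(alphabet, repeat=length)}

def is_complete_and_exact(output_lines, alphabet, length):
    """True only if the output contains all and only the required strings."""
    return set(output_lines) == expected_strings(alphabet, length)

print(len(expected_strings("abc", 7)))                           # 2187 strings (3**7)
print(is_complete_and_exact(["aa", "ab", "ba", "bb"], "ab", 2))  # True
```

The combinatorics also show why the task escalates quickly: the number of required strings grows as alphabet_size ** length, so an alphabet of {a, b, c} at length 7 already demands 2,187 distinct strings.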

Findings

The results showed a clear limitation in the models’ ability to complete the task correctly as the complexity increased (i.e., as the string length or the alphabet size grew). Specifically:

  • The models performed adequately for shorter strings and smaller alphabets but faltered as the task’s complexity increased.
  • Notably, even the advanced GPT-4 model, the most sophisticated LLM available at the time, couldn’t successfully list all strings beyond certain lengths.

This shows that hallucinations aren’t a simple glitch that can be patched or corrected – they’re a fundamental aspect of how these models understand and replicate human language.

As the study describes, “LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs.”

The implications for high-stakes applications are vast. In sectors like healthcare, finance, or law, where the accuracy of information can have serious consequences, relying on an LLM without a fail-safe to filter out these hallucinations could lead to serious errors.

This study caught the attention of AI expert Dr. Gary Marcus and eminent cognitive psychologist Dr. Steven Pinker.

Hallucination is inevitable with Large Language Models due to their design: no representation of facts or things, just statistical intercorrelations. New proof of “an innate limitation” of LLMs. https://t.co/Hl1kqxJGXt

— Steven Pinker (@sapinker) February 25, 2024

Deeper issues are at play

The accumulation of technical debt and the inevitability of hallucinations in LLMs are symptomatic of a deeper issue: the current paradigm of AI development may be inherently ill-suited to producing systems that are both highly capable and reliably aligned with human values and factual truth.

In sensitive fields, having an AI system that’s right most of the time may not be enough. Technical debt and hallucinations both threaten model integrity over time.

Fixing this isn’t only a technical challenge but a multidisciplinary one, requiring input from AI ethics, policy, and domain-specific expertise to navigate safely.

Right now, this is seemingly at odds with the principles of an industry living up to the motto to “move fast and break things.”

Let’s hope humans aren’t the ‘things.’



This article was originally published at dailyai.com