Are bad incentives to blame for AI hallucinations?
TechCrunch — A new study from OpenAI argues that the incentives built into AI model evaluation may be fueling models' tendency to hallucinate.
**OpenAI's latest research paper tackles the persistent problem of AI hallucinations.** Despite advances in newer models such as GPT-5, chatbots like ChatGPT still generate plausible but false statements. The paper argues that current evaluation methods may inadvertently encourage models to guess rather than express uncertainty, rewarding confident errors and leading to avoidable inaccuracies. The researchers illustrate the problem with simple factual questions that a chatbot answers confidently and incorrectly. This raises concerns about the reliability of AI systems and underscores the need for evaluation methods that address hallucinations directly.
Understanding AI Hallucinations
Hallucinations in AI refer to instances where models generate statements that sound credible but are factually incorrect. OpenAI defines these as ‘plausible but false statements.’ The paper emphasizes that while improvements have been made, hallucinations remain a fundamental challenge for all large language models. For instance, when asked for the title of a Ph.D. dissertation, a chatbot confidently gave three different answers, none of them correct. This happens because the models are trained on vast datasets, learning patterns in language without any grounding in factual correctness. As a result, they can confidently present incorrect information and mislead users who rely on their outputs.
The Role of Pretraining
The researchers suggest that the pretraining process of language models contributes to hallucinations. During pretraining, models learn to predict the next word based on patterns in their training data, with no labels marking statements as true or false. This approach produces fluent language, but it breaks down for low-frequency facts such as specific dates, names, or titles, which cannot be inferred from linguistic patterns alone. The paper argues that while spelling and grammar improve with scale, arbitrary factual details do not, because the models are never explicitly trained to distinguish correct from incorrect information. The result is a tendency to generate plausible-sounding but inaccurate responses whenever the underlying fact is rare or unpredictable.
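To make that training signal concrete, here is a minimal, self-contained sketch (not OpenAI's actual training code) of a toy next-word predictor built purely from co-occurrence counts; the corpus strings and function names are invented for illustration. It demonstrates the point above: nothing in the objective labels a continuation as true or false.

```python
# A minimal sketch, not OpenAI's training code: a toy next-word predictor
# built purely from co-occurrence counts. The corpus strings below are
# invented. Nothing in this objective says whether a statement is true --
# the only signal is "which word tended to follow which".
from collections import Counter, defaultdict

corpus = [
    "the dissertation was titled algebraic methods",
    "the dissertation was titled statistical learning",
    "the dissertation was titled neural networks",
]

# Count next-word frequencies: the only training signal is co-occurrence.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation -- fluent, but truth-blind."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# The model continues "titled" with whichever word it happened to see,
# regardless of which title (if any) is factually correct.
print(predict_next("titled"))
```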
Evaluating AI Models: The Wrong Incentives
The crux of the issue lies in how AI models are evaluated. Most current evaluations grade on accuracy alone, which can incentivize models to guess rather than admit uncertainty. The researchers compare this to multiple-choice tests where a random guess might earn points, while leaving a question blank guarantees a zero. Under that scoring, models are rewarded for lucky guesses and never penalized for confident errors. By prioritizing accuracy over calibrated uncertainty, the evaluation framework pushes models to produce answers even when they lack sufficient information, perpetuating the cycle of hallucinations. This misalignment between how models are scored and how we want them to behave points to the need for a more nuanced approach to assessing AI performance.
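The multiple-choice analogy can be made concrete with a little arithmetic. The sketch below assumes a hypothetical four-option question graded on accuracy alone (the numbers are illustrative, not taken from the paper's benchmarks) and shows why blind guessing beats abstaining in expectation.

```python
# Illustrative arithmetic for an assumed 4-option multiple-choice question
# (not taken from the paper's benchmarks): under accuracy-only grading,
# blind guessing beats abstaining in expectation.
def expected_score(p_correct: float, reward: float = 1.0,
                   penalty: float = 0.0, abstain: bool = False) -> float:
    """Expected points per question under a simple grading scheme."""
    if abstain:
        return 0.0
    return p_correct * reward - (1.0 - p_correct) * penalty

p_guess = 1 / 4  # chance of guessing right among four options

print(expected_score(p_guess))                # 0.25 -> guessing "wins"
print(expected_score(p_guess, abstain=True))  # 0.00 -> honest abstention loses
```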
Proposed Solutions for Improvement
To combat hallucinations, OpenAI suggests revising evaluation methods so that confident errors are penalized more heavily than expressions of uncertainty. The researchers propose a scoring scheme, similar to some standardized tests, that deducts points for wrong answers and gives partial credit for leaving a question blank or admitting uncertainty. The goal is to discourage blind guessing and to make it worthwhile for models to say when they are unsure. The researchers argue that such changes would train models to provide more accurate and reliable answers, improving user trust and supporting more responsible deployment of AI in real applications.
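As a rough illustration of the kind of rubric described above, the following sketch penalizes wrong answers and gives partial credit for an explicit "I don't know"; the specific point values are assumptions for the example, not figures from the paper.

```python
# A hedged sketch of the kind of rubric described above: wrong answers lose
# points and "I don't know" earns partial credit. The point values (-1.0,
# +0.3) are assumptions for illustration, not figures from the paper.
def grade(answer: str, correct: str) -> float:
    if answer == "I don't know":
        return 0.3  # partial credit for admitting uncertainty
    return 1.0 if answer == correct else -1.0  # confident errors cost points

# Blind guessing on a 4-option question now expects
# 0.25 * 1.0 + 0.75 * (-1.0) = -0.5, which is worse than abstaining.
for answer in ["B", "I don't know"]:
    print(answer, grade(answer, correct="C"))
```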
The Future of AI Accuracy
The researchers stress that merely adding a few uncertainty-aware tests is insufficient. The entire evaluation framework needs an overhaul to ensure that models are not rewarded for lucky guesses. If the primary scoring systems continue to favor accuracy without considering uncertainty, models will perpetually learn to guess, maintaining the cycle of hallucinations. A comprehensive reevaluation of how AI performance is measured is crucial for fostering advancements in accuracy. By creating a system that values uncertainty and penalizes incorrect confident assertions, the future of AI can be directed towards more reliable and trustworthy outputs, ultimately benefiting users and developers alike.
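One consequence of such a rubric, derived here from the assumed point values rather than quoted from the paper, is a natural confidence threshold: answering only pays off in expectation when the model is sufficiently sure, which is exactly the behavior accuracy-only scoring fails to encourage.

```python
# Derived from the assumed point values above (+1 correct, -penalty wrong,
# 0 for abstaining), not quoted from the paper: answering beats abstaining
# in expectation only when confidence p satisfies
#   p * 1 - (1 - p) * penalty > 0,  i.e.  p > penalty / (1 + penalty).
def answer_threshold(penalty: float) -> float:
    return penalty / (1.0 + penalty)

for penalty in (0.0, 1.0, 3.0):
    print(f"penalty={penalty}: answer only if confidence > "
          f"{answer_threshold(penalty):.2f}")
# penalty=0.0 (accuracy-only scoring) -> threshold 0.00: always guess.
# penalty=3.0 -> threshold 0.75: abstain unless reasonably sure.
```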
Why it matters
- AI hallucinations can lead to misinformation and erode trust in AI systems.
- Improving evaluation methods could enhance the reliability of language models.
- Addressing these issues is crucial as AI becomes more integrated into daily life.
Context
As AI technology continues to evolve, understanding and mitigating hallucinations is essential for building trustworthy systems that users can rely on.