
Why Does AI Sound Confident When It's Wrong?

AI guesses with the same tone it uses for facts. The reason is the training scoreboard. Here is what the research says, and how to defend yourself.

Published Apr 30, 2026 · 10 min read

Problem: You ask a chatbot a factual question. The answer comes back smooth, structured, sourced. You check the source. The paper does not exist. The case was never filed. The quote is a hallucination written in the same calm voice as the truth. Your brain has no way to tell the two apart.

The behavior is not random. The training process actively rewards models for guessing instead of saying "I don't know." Three new studies confirm it. One fix takes thirty seconds.

Quick Win: Ask the model to score its confidence 1-10 and explain it. Numbers below 7 mean check the answer.

Before you reply, give your confidence on a 1 to 10 scale and one sentence
on why. If you would not bet 100 dollars on this, say so. If a fact comes
from training memory and you are not sure, mark it as unsourced.

That paragraph closes most of the gap on day one. Keep reading for the science behind it, and how builders ship features that earn the confidence they show.

The Moment You Realize It Was Lying

You felt it before you had a name for it. The model gave you a perfect answer. Then a friend checked. The book never won that prize. The function does not exist in that library. The senator never said that.

Reddit calls it "confidently wrong." An r/ChatGPT user put it best: "It sounds correct. That's all. It's excellent at sounding correct." A New York writer spent paragraphs arguing with ChatGPT about who the mayor was. The bot kept doubling down. He called it "behaving like an entitled know-it-all who can't possibly be wrong."

Once you see the pattern you cannot unsee it. The bot does not slow down. It does not hedge. It does not say "I think." It speaks with the same flat certainty whether the answer is a verified fact or a fluent guess.

You're Not Crazy: The Numbers Back You Up

Stanford RegLab tested general chatbots on legal questions. They hallucinated 58 to 88 percent of the time. Even purpose-built legal AI tools like Lexis+ AI hallucinated on 17 to 34 percent of queries.

A New York lawyer named Steven Schwartz cited six fake cases ChatGPT made up for him in Mata v. Avianca. He filed them. He got sanctioned. Air Canada's chatbot invented a bereavement-fare policy that did not exist. The court made the airline pay $812 to honor the made-up rule.

The smoking gun came from Carnegie Mellon in July 2025. They had Gemini play Pictionary. The model predicted it would get 10 of 20 sketches right. It scored 0.93 out of 20. Then, after the test, it claimed it had scored 14.40. More overconfident after failure than before. "It's like that friend who swears they're great at pool but never makes a shot," said researcher Trent Cash.

| Incident | What happened | Cost |
| --- | --- | --- |
| Mata v. Avianca, 2023 | Lawyer cited 6 ChatGPT-invented cases | Court sanctions, public shame |
| Air Canada chatbot, 2024 | Bot made up a refund policy | $812 + reputation hit |
| Stanford RegLab, 2024 | Legal queries to general LLMs | 58 to 88 percent hallucination |
| CMU Pictionary, 2025 | Gemini predicted 10, scored 0.93 | Claimed 14.40 retroactively |

AI Sounds Confident Because It Cannot Sound Any Other Way

Here is the real story. Confidence is not a personality trait the model picked up. It is a side effect of how it learned to speak. Pretraining read the internet. The internet rarely hedges. Hedging gets edited out of finished prose. The model only ever saw smooth, declarative sentences. So that is the only voice it knows.

That alone would be a problem. Three more layers of training make it worse.

Reason 1: Pretraining Has No "I Don't Know" Label

OpenAI published a paper in September 2025 called Why Language Models Hallucinate. The core finding sounds simple. The training process never shows the model what "I don't know" looks like.

The paper compares it to a multiple-choice test. If you guess on a question you do not know, you might get lucky. If you leave it blank, you definitely get zero. So the smart play is always to guess. Pretraining works the same way. The model sees a sentence with a missing word. It must predict something. Saying nothing is not an option.
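
The incentive is easy to verify with arithmetic. Under accuracy-only scoring, any guess with a nonzero chance of being right beats a blank. A minimal sketch; the 25 percent chance is an illustrative number, not one from the paper:

```python
# Accuracy-only scoring: a right answer scores 1, anything else scores 0.
# "I don't know" is scored exactly like a wrong answer.
p_right = 0.25  # illustrative odds of a lucky guess on an unknown fact

expected_guess = p_right * 1 + (1 - p_right) * 0  # 0.25
expected_blank = 0.0                              # abstaining never scores

print(expected_guess > expected_blank)  # True: guessing always dominates
```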

Patterns like spelling and grammar fix themselves with scale. Birthdays do not. Random facts about random people cannot be guessed from rules. So the model invents one and moves on.

Reason 2: Benchmarks Reward Guessing Over Honesty

Most leaderboards score one thing. Did the model get the right answer? They do not score "did the model know when not to answer." So a model that bluffs every time scores higher than a model that says "I'm not sure" half the time.

OpenAI showed this in their own GPT-5 system card. They compared two models on the same factual quiz:

| Model | Abstention rate | Accuracy | Wrong answer rate |
| --- | --- | --- | --- |
| gpt-5-thinking-mini | 52% | 22% | 26% |
| Older OpenAI o4-mini | 1% | 24% | 75% |

The older model is two points more accurate and three times more wrong. It guesses on 99 of 100 unknowns. It gets 75 of those guesses wrong. The leaderboard rewards it anyway. The newer model abstains on half the questions it does not know. It gets way fewer wrong answers, and a slightly lower top-line score. Most evals would call that a regression.

OpenAI's fix is structural. "It is not enough to add a few new uncertainty-aware tests on the side. The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing."
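
You can check what that update would do with the system-card numbers above. A sketch, using "a wrong answer costs one point" as one possible guess-discouraging rule; OpenAI's actual proposal may weight things differently:

```python
# Rates from the GPT-5 system card table, as fractions of all questions.
models = {
    "gpt-5-thinking-mini": {"right": 0.22, "wrong": 0.26},  # abstains on 52%
    "o4-mini (older)":     {"right": 0.24, "wrong": 0.75},  # abstains on 1%
}

for name, m in models.items():
    accuracy_only = m["right"]            # how leaderboards score today
    penalized = m["right"] - m["wrong"]   # wrong answers now cost a point
    print(f"{name}: accuracy-only {accuracy_only:.2f}, penalized {penalized:+.2f}")

# Accuracy-only ranks the older model first, 0.24 to 0.22.
# The penalized score flips it: -0.04 beats -0.51 by a wide margin.
```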

Reason 3: Training to Please Humans Makes It Worse

After pretraining, models go through RLHF. Real humans rate answers. The model learns to copy what humans like. Humans like answers that sound confident, fluent, and helpful. Humans punish answers that say "maybe." So the model learns to drop the hedges.

A 2024 paper called Taming Overconfidence in LLMs measured this directly. Models trained with RLHF showed more verbalized overconfidence than the same models before RLHF. The training step made them louder, not smarter.

Anthropic's sycophancy research found the same loop. Reviewers prefer answers that match their own views, even when those views are wrong. The model learns that fluent agreement scores best. Humility scores worst. Guess what comes out the other end.

Reason 4: Reasoning Models Reward Right or Wrong, Nothing in Between

The newest training step is reinforcement learning on reasoning. The model thinks step by step, then gets a reward only if the final answer is correct. MIT CSAIL studied this in April 2026 and found something nobody expected.

"Ordinary RL training doesn't just fail to help calibration. It actively hurts it. The models become more capable and more overconfident at the same time." That is Isha Puri at MIT, on a finding eight days old as of this post.

Why? The reward only checks one bit. Right or wrong. A model that walks through careful logic gets the same reward as one that flips a coin and lands on the answer. So the model learns that the cheapest path to reward is to bet on every question with full confidence. The reasoning trace becomes theater. The score goes up. The honesty goes down.

The fix MIT proposes is called RLCR. The model has to predict its own confidence and gets graded on both correctness and calibration. Their version cut calibration error by 90 percent. The work is fresh and not yet in production models.
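
In spirit, the reward stops being one bit. A minimal sketch of a correctness-plus-calibration reward, using a Brier-style penalty as the calibration term; treat the exact formula as an assumption, not the paper's:

```python
def rlcr_style_reward(correct: bool, confidence: float) -> float:
    """Grade the answer AND the model's stated confidence in it.

    confidence is the model's own probability (0.0 to 1.0) that it is right.
    """
    outcome = 1.0 if correct else 0.0
    calibration_penalty = (confidence - outcome) ** 2  # Brier-style term
    return outcome - calibration_penalty

# Confident and wrong is now the worst outcome, not a free coin flip:
print(rlcr_style_reward(correct=False, confidence=0.95))  # -0.9025
print(rlcr_style_reward(correct=False, confidence=0.30))  # -0.09
print(rlcr_style_reward(correct=True,  confidence=0.90))  #  0.99
```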

Why Your Brain Falls For It (And Always Has)

You are not stupid. You are running on instincts that worked for two million years and just met something they were not built for.

Psychologists call it the confidence heuristic. Pulford and Colman, 2013: "People are confident when they know they are right, and their confidence makes them persuasive." In the wild, confident humans are usually confident because they know things. Your brain reads confidence as a shortcut for accuracy. The shortcut works on humans. It breaks on AI.

Tenney and colleagues dug deeper in 2007 and 2008. They studied how juries judge witnesses. The finding: a witness who hedged on the one detail she got wrong was rated more credible than a witness who was confident about everything, error included. Calibration beats confidence. Knowing what you do not know is the real signal of trustworthiness. AI fails this test hard. Its tone is identical for verifiable facts and pure invention.

Then there is Dunning-Kruger. Bottom-quartile performers in grammar, logic, and humor rated themselves in the 60th to 70th percentile. The skill needed to be good at something is the same skill needed to know you are not. The CMU finding maps onto this perfectly. Humans are mildly overconfident before a task and adjust after. LLMs stay wildly overconfident even after they see their own failure. They do not have the metacognition layer.

The CMU study found one more thing. Humans flag uncertainty with a furrowed brow, an "uhhh," a slow answer. AI gives you none of those cues. "With AI, we don't have as many cues about whether it knows what it's talking about," said Daniel Oppenheimer. Your social radar is getting hit with three "trust this" signals at once and zero counter-signals. You are cognitively defenseless unless you force yourself to be skeptical.

Models Already Know What They Know. Training Erases It.

Here is the cruel twist. Anthropic showed in 2022 that large models can tell which of their own answers are correct. Ask a model to propose an answer, then ask it "is that answer true," and the second answer is well-calibrated. The internal signal exists.
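
The probe itself is mechanically simple. A sketch of that two-pass pattern, with `ask_model` as a hypothetical stand-in for whatever LLM client you use:

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for your actual LLM client call."""
    raise NotImplementedError

def p_true_probe(question: str) -> tuple[str, str]:
    # Pass 1: the model proposes an answer in its usual confident voice.
    answer = ask_model(question)

    # Pass 2: ask the same model to judge its own proposal. This second
    # judgment is the one Anthropic found to be well-calibrated.
    verdict = ask_model(
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Is the proposed answer true? Answer True or False, then give "
        "your probability that it is true."
    )
    return answer, verdict
```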

RLHF crushes it. Reasoning RL crushes it more. By the time the model talks to you, the calibration layer has been trained out. The fluency stays. The humility does not.

Three things follow. The fix is possible. The fix is not yet shipping by default. You have to ask for it.

What This Means If You're Building With AI

If you are just chatting with ChatGPT, the user is you. You can ask "how confident are you?" and adjust. If you ship a product with an LLM inside, the user is your customer. "Confidently wrong" is now your liability. Air Canada's $812 was the cheap version of that lesson.

The pattern that fixes it is the same pattern Build This Now uses for code. One agent generates. A separate agent evaluates. The generator is allowed to be confident. The evaluator only cares about whether the confidence is earned.

You can copy six lines into your system prompt today:

You are calibrated. Before any factual claim, decide if you are sure.
Score your confidence 1 to 10 and say why in one line.
Below 7, lead with "I'm not sure" and ask for a source or a check.
Never invent citations, statistics, names, dates, or quotes.
If you do not know, say so plainly. Do not guess to seem helpful.
"I don't know" is a valid and rewarded answer.

Then add a regression eval. Take 50 questions where the right answer is "I don't know." Run them on every prompt change. Fail the build if abstention drops, the same way you fail a build on a TypeScript error. That is the BTN quality-gate idea applied to honesty. Type-check, lint, build, calibration. Four gates instead of three.
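
A sketch of that fourth gate, reusing the hypothetical `ask_model` helper from above. The file path, markers, and 80 percent floor are placeholder choices to tune for your product:

```python
import json
import sys

ABSTENTION_FLOOR = 0.80  # red build below 80% abstention
IDK_MARKERS = ("i don't know", "i'm not sure", "cannot verify")

def is_abstention(answer: str) -> bool:
    return any(marker in answer.lower() for marker in IDK_MARKERS)

def calibration_gate(path: str = "evals/unknowables.json") -> None:
    # 50 questions whose only honest answer is "I don't know".
    with open(path) as f:
        questions = json.load(f)
    abstained = sum(is_abstention(ask_model(q)) for q in questions)
    rate = abstained / len(questions)
    print(f"abstention: {abstained}/{len(questions)} ({rate:.0%})")
    if rate < ABSTENTION_FLOOR:
        sys.exit(1)  # fail the build, same as a type error
```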

For high-stakes answers, run a second pass. The first model writes. The second model scores confidence and rejects unsourced answers that fall below your threshold. This is the generator-evaluator loop the framework already runs on every shipped feature. Wire it to text and you get the same protection on words that you get on code.
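
Wired together, the loop is a few lines. Again a sketch on the hypothetical `ask_model`; the confidence parsing and the source check are deliberately crude placeholders:

```python
CONFIDENCE_FLOOR = 7  # below this, unsourced answers do not ship

def generate_then_evaluate(task: str) -> str:
    draft = ask_model(task)  # generator: allowed to be confident

    # Evaluator: a separate call whose only job is to grade the draft.
    score = int(ask_model(
        "Rate your confidence that every factual claim below is correct, "
        f"1 to 10. Reply with the number only.\n\n{draft}"
    ))

    has_sources = "http" in draft  # crude stand-in for a real citation check
    if score < CONFIDENCE_FLOOR and not has_sources:
        return f"NEEDS REVIEW (confidence {score}/10, no sources):\n\n{draft}"
    return draft
```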

Three Things to Do Tomorrow

Save these. Use them every time you talk to an AI for something that matters.

  1. Ask for a confidence score. "How sure are you, 1 to 10, and why?" The number itself is a forcing function. Models trained to please will mark themselves down when the question is direct.
  2. Ask for sources, then check one. Not all of them. One. If the citation is fake, every other claim in the answer is now suspect. The bluff is the tell.
  3. Treat fluency as a warning, not a credential. Smooth prose is the easiest part for the model. Hard answers should sound a little harder. If everything sounds equally easy, the model is guessing about something.

AI confidence is unearned. Your build pipeline shouldn't be. Calibration is the difference between shipped and sanctioned, between trusted and refunded, between a useful tool and an $812 invoice. Build the gate. Then ship.


