Why Does ChatGPT Make Stuff Up?
Lawyers got fined. Newspapers printed reading lists of books that do not exist. Here is why every chatbot invents sources, what your brain misses, and what builders do about it.
Problem: You ask a chatbot for a source. It gives you a paper title, an author, a journal, a year. The citation looks perfect. The paper does not exist. You ask again, push back, ask if it is sure. It apologizes, then gives you a second fake one. Your gut says the model is lying. It is not. It cannot.
The mechanism is older than ChatGPT. The fix is not "trust the AI more." The fix is knowing what the AI actually does when you ask it a question, and what your own brain does when you read the answer.
Quick Win: When the answer matters, paste this after your question:
> List your sources. For each source, give a URL I can open. If you are not sure a source exists, say so before listing it.

That one paragraph cuts most casual fabrications. Keep reading for what is actually happening, why your eyes trust it anyway, and what builders bake in so users never see a confident lie.
The Lawyer Who Filed Six Fake Cases
In May 2023, a New York attorney named Steven Schwartz filed a brief in Mata v. Avianca. ChatGPT had given him six supporting cases. Real-sounding names. Real-sounding citations. The cases did not exist. The judge fined Schwartz, his colleague, and their firm a joint $5,000. The transcript is brutal. Schwartz told the court he had never used ChatGPT for legal research before and "was unaware of the possibility that its content could be false."
That was the starting gun. By April 2026, more than 600 U.S. court filings have been flagged for AI-fabricated citations. Utah lawyer Richard Bednar got sanctioned for citing Royer v. Nelson, a case that exists only because ChatGPT wrote it. Australia, the UK, France, the same playbook. Every month a new headline. Always the same beat. Lawyer trusted output. Output looked perfect. Output was made up.
You Have Already Done This Too
The lawyers were the loud ones. The pattern is everywhere.
In May 2025, the Chicago Sun-Times printed an AI summer reading list. Ten of fifteen books were fake. Real authors, invented titles. The federal MAHA report on children's health, also May 2025, cited at least seven studies that do not exist (NOTUS audited the bibliography). Librarians at the Library of Virginia now estimate fifteen percent of emailed reference questions are AI-generated, often pointing at sources that were never written. The International Committee of the Red Cross had to add a notice to its archive: when a reference cannot be found, it may not be lost. It may be a hallucination.
If you have ever pasted a chatbot answer into a doc and shipped it, you are on the same curve. You just got lucky.
What ChatGPT Actually Is
A large language model is a next-word predictor. Given the text so far, it outputs a probability distribution over the next token, samples one, sticks it on the end, and repeats. That is the whole algorithm.
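In code, the whole loop fits in a dozen lines. This is a minimal sketch, not any real API: `model` here is a placeholder for the network, taking the tokens so far and returning a probability for every candidate next token.

```python
import random

def generate(model, prompt_tokens, max_new_tokens=100):
    """The entire algorithm: predict, sample, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Placeholder interface: model(tokens) returns a dict of
        # {candidate_token: probability}, conditioned on the text so far.
        probs = model(tokens)
        next_token = random.choices(
            list(probs.keys()), weights=list(probs.values())
        )[0]
        tokens.append(next_token)  # commit the sample and move on
    return tokens
```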
There is no fact lookup. No internal database. No "is this true" check. When you ask "Who wrote The Cellar at the End of the Lane?" the model is not searching a library. It is asking itself a different question: given everything I read during training, what word most plausibly comes next here? If the book was in training data, the right author falls out. If it was not, the model still has to produce something. So it produces the most plausible-sounding name. Often a real-sounding novelist. Sometimes a real novelist who never wrote that book.
Karpathy put it cleanly on X: the algorithm is fixed at next-token prediction. The meaning of those tokens shifts per domain. The procedure does not change.
Fluent and True Are Not the Same
Two systems run when the model writes. One is fluency: does this read like good English? The other is accuracy: is the claim correct? Billions of dollars of training compute go into the first. The second is a side effect.
Accuracy only emerges when the truthful answer is also the most-frequent training pattern. Common facts (the capital of France, the boiling point of water) get memorized often enough that fluency and accuracy point at the same word. For obscure facts (a specific case citation, a specific person's birthday), the most plausible continuation and the correct continuation drift apart. Fluency wins. The model commits.
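A toy example makes the drift visible. These distributions are invented for illustration, not measured from any model:

```python
# Invented toy distributions, for illustration only.

# "The capital of France is ..."
common_fact = {"Paris": 0.98, "Lyon": 0.01, "Marseille": 0.01}
# Truth and the highest-probability token are the same word.

# "The author of <obscure 1970s novel> is ..."
obscure_fact = {
    "Margaret": 0.24,            # plausible-sounding novelist names
    "John": 0.22,
    "David": 0.21,
    "the_actual_author": 0.02,   # barely present in training data
}
# Sampling here reads fluently every time and is correct almost never.
```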
A Hacker News commenter put the consequence flatly: everything an LLM outputs is a hallucination. Some of those hallucinations happen to be true.
The "I Don't Know" Problem
OpenAI shipped a paper in September 2025 called "Why Language Models Hallucinate." The headline finding is not about the model. It is about how the model gets graded.
Standard evaluations score answers as right or wrong. Saying "I don't know" gets zero. A guess has positive expected value, even when the model is unsure. So during fine-tuning and RLHF, the model learns the right policy for a multiple-choice exam: always answer something. Hedging guarantees zero points. Guessing has a chance.
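The incentive is plain arithmetic. A sketch of the two grading schemes, with an illustrative penalty value:

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Expected points for guessing: 1 for right, -penalty for wrong.
    Abstaining ("I don't know") always scores exactly 0."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Binary grading, the standard eval: guessing beats abstaining
# even at 10% confidence.
expected_score(0.10)                      # 0.10 > 0.0
# Penalize wrong answers and the policy flips toward honesty:
expected_score(0.10, wrong_penalty=1.0)   # -0.80 < 0.0
```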
OpenAI's own SimpleQA numbers say it out loud:
| Model | Error rate | Abstention rate |
|---|---|---|
| GPT-5-thinking-mini | 26% | 52% |
| OpenAI o4-mini (older) | 75% | 1% |
Read the tradeoff. The older model almost never says "I don't know," so it guesses at everything and is wrong three times out of four. The newer one abstains half the time and cuts the error rate to roughly a third of that. That tradeoff is not a fluke. It is the lever.
Anthropic Looked Inside Claude's Brain
In March 2025, Anthropic published "On the Biology of a Large Language Model." Their interpretability team opened Claude up and traced the circuits behind a hallucination. The finding is the most useful mental model in this whole post.
Refusal is the default. A circuit that is on by default makes Claude say "I don't have enough information for that." A second circuit, a "known entity" feature, fires when the model recognizes something. When it does, it suppresses the default refusal, and the model commits to producing an answer.
Hallucinations happen when the "known entity" circuit fires by mistake. The model sees a name it half-recognizes (a plausible book title, a real-sounding case caption, a person it has read about in another context), the recognition signal trips, the refusal circuit gets shut off, and the model is now committed. Anthropic's words: "Once the model has decided that it needs to answer the question, it proceeds to confabulate: to generate a plausible (but unfortunately untrue) response."
The AI is not lying. Its "I should answer this" reflex misfired. From there, fluency takes the wheel.
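A crude way to hold the mechanism in your head. This is the logic of the finding, not Anthropic's actual circuitry; every name below is an illustrative placeholder:

```python
def most_plausible_continuation(name):
    # Placeholder: stands in for the generation loop from earlier.
    return f"'{name}' was written by <whatever sounds right>."

def respond(name, recognition_score, threshold=0.5):
    """Toy model of the two circuits Anthropic traced."""
    # Refusal is the default, on unless something turns it off.
    if recognition_score < threshold:
        return "I don't have enough information for that."
    # The "known entity" signal fired and suppressed the refusal.
    # The model is now committed to producing an answer.
    return most_plausible_continuation(name)

# A hallucination is a false positive on recognition_score:
respond("Royer v. Nelson", recognition_score=0.7)  # misfire, confident confabulation
```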
Why Your Brain Falls For It
The model is half the problem. Your reading brain is the other half.
Reber and Schwarz ran a clean experiment in 1999. They printed statements at different contrast levels. High-contrast, easy-to-read sentences were judged true significantly more often than low-contrast ones. Same content. Different visual fluency. The result: any variable that makes text easier to process raises its perceived truthfulness.
ChatGPT outputs are perceptually maximal. Clean Markdown. Tight grammar. Confident voice. Perfect formatting. Your System 1 (the fast, automatic part Kahneman wrote about in Thinking, Fast and Slow) reads "easy" as "true" before System 2 has time to fact-check. You did not consent to that step. It runs on its own.
This is the cognitive ease trap. The most polished prose ever written meets the part of you that mistakes polish for accuracy. The model wins that contest most of the time.
The Illusion You Understand AI
Rozenblit and Keil, 2002. Cognitive Science. Yale undergrads were asked to rate how well they understood everyday objects (toilets, zippers, sewing machines). Then they were asked to explain how each one worked, step by step. Then they re-rated. After explaining, their self-rated understanding crashed. Knowing what something does is not the same as knowing how it works. People over-rate their explanatory knowledge. The bias has a name: the illusion of explanatory depth.
Try this on yourself. You know what ChatGPT does. Now explain "token" out loud. Explain "training." Explain why a model's vocabulary is fixed but its outputs feel infinite. The gap between what you can describe and what you actually grasp is exactly the gap a confident answer slips through. Audit only catches what you understand. Most users cannot audit a citation they were not equipped to question.
What Builders Actually Do About It
If you ship an AI feature, a 91% honest model is still wrong 9% of the time. At a million queries a week, that is a lot of fabricated sources reaching paying users. The mitigation stack is well known. Most articles skip it. Here is the short version.
Ground the model in your own data. Retrieval augmented generation (RAG) pulls real records from your database before the model writes anything. The Lewis 2020 paper is the canonical reference. Stanford RegLab measured top legal RAG tools at 17 to 34 percent hallucination rates, so RAG is not magic. It is a floor, not a ceiling.
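The shape of the pattern, with `store` and `llm` as placeholders for your own retrieval index and model call:

```python
def answer_with_rag(question, store, llm, k=5):
    """Pull real records first; let the model write only from them."""
    docs = store.search(question, top_k=k)  # your data, not the model's memory
    if not docs:
        return "No supporting records found."  # refuse instead of improvising
    context = "\n\n".join(doc.text for doc in docs)
    return llm(
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```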
Make uncertainty visible. Ask the model for citations with URLs. Refuse to render an answer if a citation field is empty. Show the user the source. If the source does not load, flag the answer as unverified.
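One way to wire that in, assuming the model returns structured output with a `citations` list. The field names are illustrative:

```python
import requests  # pip install requests

def verify_citations(answer: dict) -> str:
    """Mark an answer unverified unless every citation has a URL that loads."""
    citations = answer.get("citations", [])
    if not citations:
        return "unverified"  # no sources at all: do not render as fact
    for cite in citations:
        url = cite.get("url")
        if not url:
            return "unverified"  # empty citation field
        try:
            resp = requests.head(url, timeout=5, allow_redirects=True)
            if resp.status_code >= 400:
                return "unverified"  # source does not load: flag it
        except requests.RequestException:
            return "unverified"
    return "verified"
```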
Train the refusal back in. Add this block to your system prompt:
> If you are not sure a fact is correct, say "I do not know" before answering.
> Cite sources only when you can give a URL the user can open.
> Never invent a citation. If a source might be wrong, ask the user to verify.
> You can refuse to answer when evidence is thin.

Test adversarially. Ask the model trick questions about entities that do not exist. Ask for sources you know are fake. Score abstention rate, not just accuracy. MASK and Petri 2.0 are open evals you can wire into CI today.
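A toy harness for that metric. The trap questions reuse fabrications from earlier in this post, and the naive string check stands in for a proper grader model:

```python
TRAP_QUESTIONS = [
    # Entities that do not exist; the only honest answer is abstention.
    "Summarize the holding in Royer v. Nelson.",
    "Who wrote 'The Cellar at the End of the Lane'?",
]

ABSTAIN_MARKERS = ("i do not know", "i don't know", "cannot verify", "not sure")

def abstention_rate(llm) -> float:
    """Fraction of trap questions where the model declines to invent."""
    abstained = sum(
        any(marker in llm(q).lower() for marker in ABSTAIN_MARKERS)
        for q in TRAP_QUESTIONS
    )
    return abstained / len(TRAP_QUESTIONS)

# Wire into CI: fail the build if this drops after a prompt change.
```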
Run a generator and an evaluator. One model writes. A separate model, with a different prompt and different temperature, scores the output for groundedness, citation validity, and abstention. Reject and regenerate when the score is low. This is the same generator-evaluator pattern that catches code regressions.
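The skeleton of that loop, with both models behind placeholder callables and an illustrative score contract:

```python
def answer_with_review(question, generator, evaluator,
                       threshold=0.8, max_retries=3):
    """One model writes; a second, independently prompted model grades."""
    for _ in range(max_retries):
        draft = generator(question)
        # Evaluator returns 0.0-1.0 across groundedness, citation
        # validity, and willingness to abstain (placeholder contract).
        if evaluator(question, draft) >= threshold:
            return draft
    return "I could not produce a verifiable answer."  # fail honest, not fluent
```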
Frontier Models Today, Ranked by Honesty
The honesty gap between models is real and widening. Numbers from public evals as of late April 2026:
| Model | MASK honesty score | Notes |
|---|---|---|
| Mythos Preview (Anthropic) | 95.4% | Research access only. Pushes back on false premises 80% of the time. |
| Claude Opus 4.7 | 91.7% | Public model. Ships with refusal behavior trained back in. |
| Claude Sonnet 4.6 | 89.1% | Cheaper, slightly looser. |
| GPT-5.5 | Pending public score | OpenAI ships abstention as a configurable knob. |
| Gemini 3.1 Pro | Pending public score | Strong on grounded retrieval, weaker on abstention. |
| Grok 4.20 | Pending public score | Lowest abstention rate of major frontier models. |
Pick the model that matches your error budget. A coaching app and an internal data tool have different tolerances. The number that matters is not "smartest." It is "willing to say I do not know."
The Bottom Line
AI does not lie. Lying requires knowing the truth. The model is guessing every word, and most of the time the guess is right because the truth is also the most common pattern. When it is not, the model commits anyway. Scott Alexander reframed it in March 2026: shameless guesses, not hallucinations.
Humans confabulate too. Eyewitness memory, split-brain experiments, "I am pretty sure I read it somewhere." The model learned this from us. The fix is the same on both sides. Reward "I do not know." Audit fluent answers. Show the receipt.
How Build This Now Builds This In
Build This Now is an AI-powered SaaS build system. Eighteen specialist agents. Fifty-five skills. A five-step pipeline from idea to live product. The Tester agent runs adversarial checks. The Database Architect grounds features in real schema. Quality gates (type-check, lint, build) fire on every feature. One agent generates. A separate agent evaluates. The pattern that catches confident bugs is the same pattern that catches confident lies.
If you are wiring an AI feature into a product, the architecture matters more than the model. Ground the output. Make uncertainty visible. Score abstention. Run the eval on every prompt change. Most of the work is done. We just plug it in for you.
ChatGPT does not know it is wrong. Your brain does not know to ask. A real product knows both, and answers anyway.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.