Build This Now

Why Does AI Panic When You Correct It?

ChatGPT and Claude apologize, then repeat the same mistake. The cause is context contamination plus cascade failure. Here is the fix that actually works.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Apr 30, 2026 · 10 min read

Problem: You point out a bug. The model says "you're absolutely right." It tries again. Same bug, dressed differently. You correct it harder. Now it apologizes twice and the answer is worse than the first one. The chat is gaslighting you.

It is not. The model is reading its own mistakes and treating them as ground truth.

Quick Win: When the AI gets it wrong twice, start a fresh chat. Repeat only the parts that matter, leave the bad attempt out.

That single rule fixes most correction loops. The rest of this post explains why the loop happens, what the research calls it, and how to wire your prompts so the loop never starts.

The "You're Absolutely Right" Moment

You felt it before you knew it had a name. You ask for code. Something is off. You say so. The reply opens with "You're absolutely right" and changes one line, leaving the real bug. You push back again. New apology. New version. Same bug.

GitHub issue #3382 on anthropic/claude-code collected 870-plus thumbs-up reactions and 180-plus comments on exactly this. One commenter wrote: "I'm always absolutely right. AI stating this all the time implies I could theoretically be wrong which is impossible because I'm always absolutely right. Please make it stop." Another opened the bug, replied to themselves saying it was a feature, then got "You're absolutely right! My apologies." in return. There is a website tracking it: absolutelyright.lol.

The meme is a symptom. The mechanism underneath is what hurts you when the stakes are real.

What You're Actually Watching

The pattern has six steps. Once you've seen it, you see it everywhere:

  1. You ask a question.
  2. The model gives a wrong answer.
  3. You say it is wrong.
  4. The model apologizes and tries again.
  5. The new answer inherits the old framing.
  6. Repeat. Each round, the answer drifts further from the thing you asked for.

A user on r/claude posted the punchline most people eventually find: "if I just start a new chat with fresh context, with the same in-progress files, it would chill and behave."

The escape hatch is not better wording. It is an empty context.

Stop. You Are Making It Worse.

Counterintuitive but true. Every correction you type adds the wrong answer to the context the model is staring at when it generates the next answer. The error becomes part of the question.

A Microsoft and Salesforce paper put numbers on it. When prompts are sharded across multi-turn corrections instead of delivered in one shot, model accuracy drops by about 39 percent on average. Their summary: "When LLMs take a wrong turn in a conversation, they get lost and do not recover."

Hitting "regenerate" inside the same chat does not erase the error. It generates a new answer conditioned on the same poisoned history.
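The difference is easy to see in code. A minimal sketch of what the model conditions on in each case, assuming a generic chat-message list rather than any particular vendor's SDK:

```python
# Illustrative only: the message shapes are assumptions, not a real API.

def regenerate_context(history: list[dict]) -> list[dict]:
    """'Regenerate' resends the whole history, wrong answer included."""
    return list(history)

def fresh_context(task: str) -> list[dict]:
    """A fresh chat conditions on the task alone."""
    return [{"role": "user", "content": task}]

history = [
    {"role": "user", "content": "Write a date parser."},
    {"role": "assistant", "content": "def parse(d): ...  # off-by-one bug"},
    {"role": "user", "content": "Wrong, the month is off by one."},
]

# The poisoned turn survives regenerate but never enters the fresh chat.
assert any("bug" in m["content"] for m in regenerate_context(history))
assert not any("bug" in m["content"] for m in fresh_context("Write a date parser."))
```

The point of the sketch: "regenerate" changes which answer you get, not what the model is looking at while it writes it.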

What Is Actually Happening Inside the Model

A chatbot writes one token at a time. Each new token is conditioned on every token that came before, including the model's own earlier output.

Sebastian Raschka, PhD, summed up the loop on his FAQ:

"LLMs sometimes repeat themselves because text generation is a local next-token process. Once the model emits a pattern, that pattern becomes part of the context for the next step, which can make the same continuation even more likely."

So when answer A1 is wrong, A1 sits in the chat. The model sees it. The model writes A2, which is shaped by A1. The wrongness is now part of the prompt, not just the past.

This is autoregressive feedback. It is not a bug. It is how the model works.
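A toy decoder makes the feedback visible. This is nothing like how a real transformer samples; it is just the smallest loop with the same property, namely that each emission feeds the next step:

```python
from collections import Counter

def next_token(context: list[str]) -> str:
    """Toy greedy decoder: the most frequent token in context wins."""
    return Counter(context).most_common(1)[0][0]

context = ["fix", "the", "bug", "bug"]   # the model already emitted "bug" twice
for _ in range(3):
    context.append(next_token(context))  # each emission reinforces itself

print(context[-3:])  # ['bug', 'bug', 'bug'] -- the loop locks onto its own output
```

Once "bug" leads the count, every step makes its lead bigger. Swap token frequencies for next-token probabilities and you have the shape of the correction loop.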

The Technical Name: Context Contamination

Drew Breunig published a now-canonical taxonomy of how long contexts fail. There are five common failure modes, and most user-visible "AI panic" is one of them:

| Failure mode | What goes wrong | What it feels like |
| --- | --- | --- |
| Context poisoning | A hallucinated fact gets into the context and gets cited later | The model insists on something that was never true |
| Context distraction | Context grows so long the model overweights it and forgets training | The chat gets dumber the longer it runs |
| Context confusion | Irrelevant content in the context bleeds into the answer | Off-topic details show up where they should not |
| Context clash | Two parts of the context disagree | The model picks one and ignores the other |
| Cascade failure | A wrong answer in turn N becomes input for turn N+1 | Apologizing, repeating, getting worse |

The DeepMind Gemini 2.5 technical report coined "context poisoning" while watching an agent play Pokémon: "many parts of the context (goals, summary) are 'poisoned' with misinformation about the game state, which can often take a very long time to undo. As a result, the model can become fixated on achieving impossible or irrelevant goals."

Fixated on impossible goals. That is the technical phrase for "gaslighting me."

Why Pushing Harder Makes It Worse

Models attend more to the start and end of their context than the middle. The 2023 paper "Lost in the Middle" showed this empirically across GPT-4, Claude, and others.

Your latest correction sits at the recent end. So does the wrong answer right above it. So does the previous wrong answer right above that. The model is staring at a stack of failures every time it generates the next reply.

Chroma's Context Rot study tested 18 models, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. All of them degraded as context grew, even on simple tasks. A Databricks study put numbers on Llama 3.1 405B: accuracy starts falling around 32k tokens, far short of advertised million-token windows.

Bigger windows do not save you. They give the cascade more room to grow.

The Human Parallel: Anchoring and Perseveration

Here is the part nobody talks about. The thing you are watching the AI do is the same thing humans do when they get cognitively stuck.

In 1974, Tversky and Kahneman published "Judgment under Uncertainty: Heuristics and Biases." They asked people what percent of African countries were in the UN, but first spun a wheel showing a random number. The wheel was meaningless. The number still moved every answer. That is anchoring bias.

A November 2025 paper, "Behavioral and Attributional Evidence of Anchoring Bias in LLMs," used Shapley-value attribution to prove anchors literally shift the internal log-probability distribution of LLM outputs across GPT-2, GPT-Neo, Falcon, Gemma, Phi, and Llama. Anchoring is not a metaphor for what models do. It is the same bias, measured the same way.

There is also the clinical pattern called perseveration: continuing the same wrong response after the rule changes. The Wisconsin Card Sorting Test diagnoses it. When you correct a chatbot, you are switching the rule. The model, like a perseverating subject, keeps producing the old strategy because the recent context still contains it.

What looks like AI panicking is the AI being too human.

The "You're Absolutely Right" Cherry On Top

Modern chatbots are also trained on human feedback. People click thumbs-up on replies that feel good. Apologies feel good. Agreement feels good. Train a model on enough of those clicks and you get a reflex.

GitHub issue #3382 caught the worst case. A user asked Claude whether to remove a code path. The user said "yes please." Claude responded "You're absolutely right!" It agreed with a request that contained no factual claim at all.

The apology is not an admission. It is the same probability distribution that produced the wrong answer, wearing different clothes.

Multi-Turn Corrections vs Fresh Chat

The strongest signal in the research is the gap between fixing in place and starting over. From the Microsoft and Salesforce paper, plus user reports in the same window:

| Approach | What happens to accuracy |
| --- | --- |
| One-shot prompt with full context | Baseline. Best-case performance for the task. |
| Multi-turn corrections in the same chat | About 39 percent lower on average across tested models |
| Fresh chat with the corrected framing up front | Returns close to the one-shot baseline |
| Fresh chat with no mention of the prior failure | Cleanest result of all |

Note the last row. If you start a new chat and tell the model "previously you said X which was wrong, now do Y," you just put X back in the context. You poisoned the new well with the old water.

The fix is to pretend the old chat never happened.
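You can wire that rule into a prompt builder. A sketch, where the task and shape strings are hypothetical examples, not anything from the issue threads above:

```python
def fresh_prompt(task: str, correct_shape: str) -> str:
    """Front-load what you want; never restate what the model got wrong."""
    return f"{task}\nThe correct shape is: {correct_shape}"

prompt = fresh_prompt(
    "Fix the TypeScript error in src/api/handler.ts.",   # hypothetical file
    "the handler must return Promise<Response>",
)
assert "previous" not in prompt.lower()  # the old well stays unpoisoned
```

The function takes no argument for the failed attempt on purpose. If the builder cannot accept the old answer, you cannot leak it.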

What Actually Works

You have three controls. Use them in order.

Two-correction limit. If the model fails a task twice in the same chat, do not try a third correction. The third try is statistically the worst one. Run /clear in Claude Code, open a new ChatGPT thread, or start a new Gemini conversation.

Front-load the right framing. Open the new chat with the answer you want, not the answer you got. State the task and the correct shape, like "fixing a TypeScript error in this file, the correct shape is X." Do not say "previously the model said Y."

Quarantine your tasks. Drew Breunig's "How to Fix Your Context" lists five patterns: quarantine, pruning, summarization, offloading, and tool loadout. The unifying idea is that one chat per task beats one chat for everything. Anthropic's own engineering team reported a 90.2 percent gain on internal evaluation when their multi-agent research system used isolated subagent contexts instead of one giant context window.

The takeaway scales. Less context, applied with intent, beats more context applied by reflex.
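The two-correction limit can be scripted around any chat call. A sketch, where `call_model` and `passes_check` are placeholders you supply, not a real SDK:

```python
def run_with_reset(call_model, task: str, passes_check):
    """Correct in-chat at most twice, then restart on a clean context."""
    history = [{"role": "user", "content": task}]
    for _ in range(2):                                   # two-correction limit
        answer = call_model(history)
        if passes_check(answer):
            return answer
        history += [{"role": "assistant", "content": answer},
                    {"role": "user", "content": "That is wrong. Try again."}]
    # Two failures: the third in-chat try is statistically the worst one.
    # Clear instead, keeping only the task.
    return call_model([{"role": "user", "content": task}])

# Stand-in model that only succeeds on its third call.
calls = {"n": 0}
def flaky_model(history):
    calls["n"] += 1
    return "good" if calls["n"] == 3 else "bad"

assert run_with_reset(flaky_model, "Parse the date.", lambda a: a == "good") == "good"
assert calls["n"] == 3  # the third call ran on a fresh, single-message context
```

The stand-in model is contrived, but the control flow is the rule from this section: two corrections inside the chat, then a reset that never mentions them.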

How Build This Now Solves This At The Architecture Level

Build This Now is an AI-powered SaaS build system that runs on Claude Code. The framework solves the cascade by design, not by discipline.

Eighteen specialist agents. Each agent gets its own context window, its own system prompt, its own tools. The Database Architect never sees the Designer's failed first attempt. The Tester never reads the Backend Developer's hallucinated API. When a task needs correcting, an orchestrator routes the correction to a fresh agent with a clean brief, not back into the contaminated chat.

Quality gates run between handoffs. Type-check, lint, and build each act as a fresh evaluator on the output, with no memory of how it was produced. The cascade pattern that ruins single-chat sessions has nowhere to start.

The contrast is direct. One giant chat collapses because it has nowhere to go but back into its own mistakes. A team of small agents with clean contexts and gates between them does not have that exit.
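The quarantine idea reduces to a few lines. This is an assumption about the shape of the pattern, not Build This Now's actual implementation; the `agents` and `gate` stand-ins are hypothetical:

```python
def orchestrate(brief: str, agents, gate):
    """Route one task through specialists, each on its own clean context."""
    for agent in agents:
        output = agent([brief])      # fresh context: no sibling's failed attempt
        if gate(output):             # the gate judges the artifact, not the chat
            return output
    raise RuntimeError("all agents failed the gate")

# Stand-in specialists: the first produces a type error, the second passes.
result = orchestrate(
    "Implement the /users endpoint.",
    [lambda ctx: "TS2322: type error", lambda ctx: "clean build"],
    gate=lambda out: "error" not in out,
)
assert result == "clean build"  # the retry went to a fresh agent, not back in
```

Notice what the loop never does: it never feeds the first agent's failed output into the second agent's context. That absence is the whole design.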

When the AI panics, you give it a clean room. Two corrections, then /clear. One task, one context. One agent, one job.

The fix is not louder prompts. It is fewer of them, in fresher rooms, with sharper gates.

Continue in Core

  • 1M Context Window in Claude Code
    Anthropic flipped the 1M token context window on for Opus 4.6 and Sonnet 4.6 in Claude Code. No beta header, no surcharge, flat pricing, and fewer compactions.
  • AGENTS.md vs CLAUDE.md Explained
    Two context files, one codebase. How AGENTS.md and CLAUDE.md differ, what each one does, and how to use both without duplicating anything.
  • Auto Dream
    Claude Code cleans up its own project notes between sessions. Stale entries get pruned, contradictions get resolved, topic files get reshuffled. Run /memory.
  • Auto Memory in Claude Code
    Auto memory lets Claude Code keep running project notes. Where the files sit, what gets written, how /memory toggles it, and when to pick it over CLAUDE.md.
  • Auto-Planning Strategies
    Auto Plan Mode uses --append-system-prompt to force Claude Code into a plan-first loop. File operations pause for approval before anything gets touched.
  • Autonomous Claude Code
    A unified stack for agents that ship features overnight. Threads give you the structure, Ralph loops give you the autonomy, verification keeps it honest.

More from Handbook

  • Agent Fundamentals
    Five ways to build specialist agents in Claude Code: Task sub-agents, .claude/agents YAML, custom slash commands, CLAUDE.md personas, and perspective prompts.
  • Agent Harness Engineering
    The harness is every layer around your AI agent except the model itself. Learn the five control levers, the constraint paradox, and why harness design determines agent performance more than the model does.
  • Agent Patterns
    Orchestrator, fan-out, validation chain, specialist routing, progressive refinement, and watchdog. Six orchestration shapes to wire Claude Code sub-agents with.
  • Agent Teams Best Practices
    Battle-tested patterns for Claude Code Agent Teams. Context-rich spawn prompts, right-sized tasks, file ownership, delegate mode, and v2.1.33-v2.1.45 fixes.


