Claude Code Context Buffer

You hit 167K tokens. Claude compacts. Context slips away. Every. Single. Time.

Here's the annoying part: a slice of your context window stays off-limits, reserved by Claude Code itself. That slice used to be 45,000 tokens (22.5% of 200K). In early 2026, the buffer dropped to roughly 33,000 tokens (16.5%), freeing about 12K more tokens for actual work.

What It Is	Current (2026)	Previous	Can You Change It?
Compaction buffer	~33K tokens (16.5%)	~45K tokens (22.5%)	No - hardcoded
Compaction trigger	~83.5% usage	~77-78% usage	Yes - `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` (1-100)
Usable context	~167K tokens	~155K tokens	Yes - use `sonnet[1m]` for 1M token window
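The figures in the table follow from simple subtraction against a 200K window. A minimal sketch of that arithmetic, using the approximate token counts quoted here (the function name `bufferMath` is illustrative, not part of Claude Code):

```javascript
// Sketch: derive usable context and trigger percentage from a reserved buffer.
// Token counts are the approximate figures quoted in this article.
function bufferMath(windowTokens, bufferTokens) {
  const usableTokens = windowTokens - bufferTokens;
  return {
    usableTokens,
    bufferPct: (bufferTokens * 100) / windowTokens,
    triggerPct: (usableTokens * 100) / windowTokens,
  };
}

const current = bufferMath(200000, 33000);  // 167000 usable, 16.5% buffer, 83.5% trigger
const previous = bufferMath(200000, 45000); // 155000 usable, 22.5% buffer, 77.5% trigger
const gainedTokens = current.usableTokens - previous.usableTokens; // 12000

console.log(current, previous, gainedTokens);
```

The 77.5% result for the old buffer lines up with the "~77-78%" trigger in the Previous column.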

The official Claude Code changelog never announced this shift. The nearest clue is v2.1.21: "Fixed auto-compact triggering too early on models with large output token limits", which probably retuned the buffer calculation. Online posts and docs still throw around the 45K number, but /context now reports 33K on current versions.

The buffer is there for real reasons. Knowing exactly how it works separates people fighting the system from people working with it.

Read this page when the buffer itself is the question: when compaction triggers, why the reserved space exists, and what environment variables actually change. If you want the broader workflow impact of the larger window, read 1M Context Window in Claude Code. If you want practical rules for when to stay in a session versus resetting it, read Context Management.

How Auto-Compaction Actually Works

Claude Code monitors context usage continuously. At roughly 83.5% of the total window (up from ~77-78% before), auto-compaction kicks in.

Here's the sequence:

Claude summarizes your conversation history
Older messages get replaced with a condensed summary
You lose granular details from early in the session
The session continues with reduced context

On a 200K window, compaction lands somewhere near 167K tokens of real use. That 33K buffer isn't sitting idle. Claude spends it on the summarization itself.

The /context Command

Run /context to see exactly where your tokens are going:

claude-opus-4-5-20251101 · 76k/200k tokens (38%)

System prompt: 2.7k tokens (1.3%)
System tools: 16.8k tokens (8.4%)
Custom agents: 1.3k tokens (0.7%)
Memory files: 7.4k tokens (3.7%)
Skills: 1.0k tokens (0.5%)
Messages: 9.6k tokens (4.8%)
Free space: 118k (58.9%)
Autocompact buffer: 33.0k tokens (16.5%)

The Messages row is your conversation history. Watch it climb. When free space hits zero and only the autocompact buffer is left, compaction fires.
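To turn the Free space figure into something actionable, you can estimate how many more turns fit before compaction. This is a rough sketch: it assumes the Free space row already excludes the autocompact buffer (as the separate rows in the sample output suggest), and `avgTokensPerTurn` is a hypothetical number you would measure from your own sessions.

```javascript
// Sketch: estimate how many more turns fit before auto-compaction.
// freeTokens is the "Free space" figure from /context.
// avgTokensPerTurn is a hypothetical average, not something Claude Code reports.
function turnsUntilCompaction(freeTokens, avgTokensPerTurn) {
  if (!(avgTokensPerTurn > 0)) {
    throw new RangeError("avgTokensPerTurn must be positive");
  }
  return Math.max(0, Math.floor(freeTokens / avgTokensPerTurn));
}

// 118k free (from the sample output) at an assumed 4k tokens per turn.
const estimate = turnsUntilCompaction(118000, 4000);
console.log(`Roughly ${estimate} turns before compaction`);
```

Tool-heavy turns can consume far more than a plain exchange, so treat the result as an order-of-magnitude guide.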

Why the Buffer Exists

Three jobs fall on that ~33K:

Working space for compaction. The summarization process itself needs tokens to operate
Completion buffer. Allows current tasks to finish before compaction triggers
Response generation space. Claude needs working memory to reason and construct responses

The buffer is baked into Claude Code's architecture. Requests to make it configurable have been closed as duplicates. GitHub Issue #15435 asked for this. The answer was no.

The Output Tokens Misconception

A lot of developers think CLAUDE_CODE_MAX_OUTPUT_TOKENS governs the compaction buffer.

It doesn't.

Variable	What It Controls	Default
`CLAUDE_CODE_MAX_OUTPUT_TOKENS`	Max tokens per API response	32K
(none - hardcoded)	Compaction buffer reservation	~33K

Two different mechanisms, zero overlap:

Output tokens. Caps how long a single Claude response can run
Compaction buffer. Reserved context space that triggers auto-compaction

Set CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000 and you'll shrink Claude's max response length. Context before compaction won't budge. The 33K buffer is fixed.

# This limits response length, NOT context buffer
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000

Reasons to lower output tokens:

Faster responses (less to generate)
Lower costs per response
Force conciseness

Usable context before compaction? Still ~167K.

One caveat worth flagging: although CLAUDE_CODE_MAX_OUTPUT_TOKENS leaves the compaction buffer alone, pushing it very high can cut your effective context window. Output tokens are carved from the same context pool, so a bigger output reservation takes room away from history and system context. The 32K default balances reasonably well for most workflows.
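The caveat can be pictured as one pool shared between output and everything else. The sketch below models the reservation as plain subtraction, which is an assumption made for illustration and not a documented formula; `roomForInput` is a hypothetical helper.

```javascript
// Sketch: output tokens and history share one context window.
// Modeling the split as simple subtraction is an assumption.
function roomForInput(windowTokens, maxOutputTokens) {
  return Math.max(0, windowTokens - maxOutputTokens);
}

const withDefault = roomForInput(200000, 32000);  // 168000
const withLargeCap = roomForInput(200000, 64000); // 136000

// Doubling the output cap removes another 32000 tokens of room in this model.
console.log(withDefault - withLargeCap);
```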

The Real-World Impact

Picture a typical heavy session:

Phase	Context Used	What Happens
Start	20K	System prompt, CLAUDE.md, skills load
Mid-session	80K	Deep in implementation, full context
Pre-compact	167K	Auto-compact triggers
Post-compact	~60K	Summarized history, details lost

With a 33K buffer, compaction hits at 167K. That's your working ceiling, 12K higher than the old 155K one.

Where does information go? Into the summary. Exact variable names, precise error messages, subtle choices from early in the session all get squeezed into a recap that catches the gist and loses the detail.

What You Can Actually Control

1. Override the Compaction Trigger Percentage

One environment variable actually shifts when auto-compaction fires: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE.

# Trigger compaction at 90% instead of the default ~83.5%
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=90
 
# Trigger earlier at 70% for more aggressive compaction
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70

Values from 1 to 100 are accepted. The number directly sets the percentage at which auto-compaction triggers. A higher setting gives more usable context ahead of compaction and leaves less room for the summary job. A lower setting fires compaction earlier, keeps more working space in reserve, and gives you less room before the first compaction.

Closest thing to a configurable buffer you can get. It doesn't change buffer size. It moves the moment compaction fires relative to the full window.
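To see what a given override means in tokens, here is a small sketch. It assumes the percentage applies to the full window, as described above; `triggerPoint` and `headroomTokens` are illustrative names.

```javascript
// Sketch: translate a CLAUDE_AUTOCOMPACT_PCT_OVERRIDE value into token counts.
// Assumes the percentage is measured against the full context window.
function triggerPoint(windowTokens, overridePct) {
  if (!Number.isFinite(overridePct) || overridePct < 1 || overridePct > 100) {
    throw new RangeError("override must be between 1 and 100");
  }
  const triggerTokens = Math.floor((windowTokens * overridePct) / 100);
  // headroomTokens is what remains above the trigger for the summary job.
  return { triggerTokens, headroomTokens: windowTokens - triggerTokens };
}

console.log(triggerPoint(200000, 90)); // { triggerTokens: 180000, headroomTokens: 20000 }
console.log(triggerPoint(200000, 70)); // { triggerTokens: 140000, headroomTokens: 60000 }
```

At 90% on a 200K window, only 20K remains above the trigger, noticeably less than the default ~33K.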

2. Use Extended Context Models

Rather than wrestling the 200K cap, reach for the 1M token context window.

As of March 2026, the 1M context window is generally available for Opus 4.6 and Sonnet 4.6, with no pricing premium. A 900K-token request costs the same per token as a 9K one. See our 1M context window guide for the full breakdown of what changed and what it means for your workflow.

/model sonnet[1m]

At 1M tokens, the compaction threshold moves way out. Even after a proportional buffer, room before compaction fires grows substantially. See the model selection guide for the full model alias reference.

3. Disable Auto-Compaction (Risky)

// ~/.claude/settings.json
{
  "autoCompact": false
}

Warning. GitHub Issue #18264 reports the setting may get ignored in some cases. Even when it holds, you risk slamming into hard context limits and crashing sessions.

Only flip this if you're ready to:

Monitor context manually with /context
Run /compact before hitting 100%
Accept occasional session crashes

4. Manual Compaction at Strategic Points

Turn off auto-compact, then compact on your schedule:

/compact   # Compact when you decide
/clear     # Full reset when starting new major task

Good moments to compact on purpose:

After completing a major feature
Before starting a new component
When debugging context feels stale

Upside: you pick what gets summarized and when, which keeps the fine-grained details around active work.

5. Work Within the 167K Limit

Accept that heavy sessions are going to compact. Set up for it:

Keep CLAUDE.md and skills lean
Use session files to persist state
Break complex tasks into multiple sessions

6. Proactive Backup Strategy

The most effective move: back things up before compaction arrives.

An idea catching on in the Claude Code community: proactive clearing at 50% plus structured recovery beats lossy auto-compaction.

Auto-compaction condenses your conversation and drops the fine detail. The alternative:

Continuously record your session to a structured backup
Clear context manually at a threshold (like 50%)
Reload from structured backup instead of lossy summary

Context fidelity goes up. The backup holds exact details that summarization throws away.
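What a "structured backup" looks like is up to you. A minimal sketch, assuming a plain object with a few lists; every field name and the sample values below are hypothetical, made up only to show the shape:

```javascript
// Sketch: a structured session backup. All field names are hypothetical;
// pick whatever shape fits your own workflow.
function createBackup({ task, decisions = [], files = [], errors = [] }) {
  return {
    savedAt: new Date().toISOString(),
    task,
    decisions: [...decisions],
    files: [...files],
    errors: [...errors],
  };
}

// Render the backup as text to load into a fresh session after clearing.
function renderRestorePrompt(backup) {
  const section = (title, items) =>
    items.length ? `${title}:\n${items.map((i) => `- ${i}`).join("\n")}` : "";
  return [
    `Resuming task: ${backup.task}`,
    section("Decisions", backup.decisions),
    section("Files", backup.files),
    section("Exact errors", backup.errors),
  ]
    .filter(Boolean)
    .join("\n\n");
}

// Made-up sample data for illustration.
const backup = createBackup({
  task: "Fix token refresh race",
  decisions: ["Use a mutex around refresh"],
  files: ["src/auth/refresh.js"],
  errors: ["TypeError: Cannot read properties of undefined (reading 'exp')"],
});
console.log(renderRestorePrompt(backup));
```

The point of the structure is that error strings and file paths survive verbatim, which is exactly what a summary tends to paraphrase away.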

How To Choose Your Operating Mode

Most people do not need one magic setting. They need the right mode for the kind of work they are doing.

Mode	Best For	What You Do
Default auto-compact	Everyday coding, short to medium sessions	Leave the defaults alone and watch `/context` occasionally
Manual compacting	Multi-phase work where you know what must survive	Compact on purpose before phase changes
Long-context model + defaults	Large-repo work and long traces	Use the 1M window to reduce forced compactions
Threshold backup workflow	Work where losing detail is expensive	Snapshot early, clear early, reload exact state instead of relying on summaries

The right question is not "how do I beat the buffer?" It is "how expensive is a lossy compact for this session?"

Three Worked Examples

Example 1: Regular feature build

You are building one feature, touching a handful of files, and the session is unlikely to cross 100K tokens.

Best move:

use defaults
check /context once or twice
do not over-engineer it

The buffer is not your problem here.

Example 2: Debugging session with lots of dead ends

You spend 40 minutes inspecting logs, traces, and failed hypotheses. You already know the next phase will be implementation.

Best move:

do not wait for autocompact
run /compact focus on the confirmed root cause, affected files, and fix plan
continue into implementation

This keeps the useful diagnosis while dropping the junk.

Example 3: High-detail content or audit session

Suppose you are auditing a security flow or writing a source-heavy article. Exact quotes, file paths, and observed behavior matter.

Best move:

keep a structured backup or notes file
clear earlier than usual
restore from the exact backup instead of trusting a compacted summary

This is where the backup workflow wins. The problem is not raw capacity. It is information fidelity.

When To Change `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`

This variable is useful, but only in specific cases.

Raise it when:

the current task is coherent and you want more room before compaction
you are actively watching usage
the cost of a too-early compact is higher than the risk of waiting

Lower it when:

the session is messy and you want summaries sooner
you are okay with more frequent compaction
you want Claude to keep the working set tighter

Leave it alone when:

you are not monitoring context actively
you do not have a clear reason to change the trigger
the real problem is task switching, not compaction

Changing the trigger is not a substitute for good session boundaries. It only shifts when the same underlying mechanism fires.

StatusLine: The Only Live Monitor

StatusLine is the one mechanism that gets real-time context metrics. Other hooks don't receive token counts.

// .claude/settings.json
{
  "statusLine": {
    "type": "command",
    "command": "node .claude/hooks/context-monitor.mjs"
  }
}

The statusline receives JSON with context_window.remaining_percentage. That's live data, ready to act on.

Critical calculation. The remaining_percentage field still includes the 16.5% autocompact buffer. To get the actual "free until autocompact" figure, subtract it:

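// Autocompact buffer as a share of the full window (33K of 200K).
const AUTOCOMPACT_BUFFER_PCT = 16.5;
// remaining_percentage is read from context_window in the statusline JSON.
const freeUntilCompact = Math.max(
  0,
  remaining_percentage - AUTOCOMPACT_BUFFER_PCT,
);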

25% remaining really means 8.5% left before compaction.
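Putting the calculation into a statusline script might look like the sketch below. Only `context_window.remaining_percentage` comes from the payload described above; the function name `formatStatus` and the output wording are illustrative.

```javascript
// Sketch: turn the statusline JSON payload into a one-line status string.
const AUTOCOMPACT_BUFFER_PCT = 16.5;

function formatStatus(jsonText) {
  const data = JSON.parse(jsonText);
  const remaining = data?.context_window?.remaining_percentage;
  // Fall back gracefully if the field is missing or not a number.
  if (typeof remaining !== "number" || !Number.isFinite(remaining)) {
    return "ctx: n/a";
  }
  const free = Math.max(0, remaining - AUTOCOMPACT_BUFFER_PCT);
  return `ctx: ${free.toFixed(1)}% until autocompact`;
}

console.log(formatStatus('{"context_window":{"remaining_percentage":25}}'));
// prints "ctx: 8.5% until autocompact"
```

In a real script you would read the JSON from stdin, pass it to `formatStatus`, and print the result.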

Why Hooks Can't Inject /clear

A technical wall a lot of people run into: hooks cannot inject slash commands.

The reasonable guess is that a hook could spot high context usage and inject /clear. It can't, for three reasons:

UserPromptSubmit has no updatedPrompt field. It can add context or block, but never replace
Slash commands skip hook evaluation entirely
No hook fires "instead of" user input

Real ways to clear and recover programmatically:

Claude Agent SDK. Send /clear via the SDK
Headless CLI wrapper. Pipe commands to headless Claude Code
Manual workflow. Hook warns you, you run /clear, SessionStart restores

What Happens at 100% Context

Push context all the way to the edge and here's what follows:

Best case. Claude's response gets truncated
Worse case. API returns an error, turn fails
Worst case. Session becomes unresponsive

The 33K buffer is there to keep these from happening. It's protection, not waste.

Key Takeaways

The buffer just dropped from 45K to 33K. Undocumented change, about 12K more usable tokens
Compaction now triggers at ~83.5% usage. That puts usable context around ~167K (up from ~155K)
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE shifts the trigger. Values 1 through 100 set when compaction fires
sonnet[1m] offers 1M token context. A real alternative to wrestling 200K limits
Output tokens and compaction buffer are separate. Don't mix them up
autoCompact: false may work. It also has reported bugs
StatusLine is the only live context monitor. Other hooks don't see token counts
Hooks cannot inject /clear. Go through the SDK, a wrapper, or a manual workflow
Proactive clearing plus structured recovery beats lossy auto-compaction

The buffer is there for good reason. Instead of fighting it, work alongside it: keep state in session files, run threshold-based backups ahead of compaction, and think about proactive clearing for heavy sessions.

The Solution: Threshold-Based Backups

The buffer is fixed. How you deal with approaching it is not.

Check out our threshold-based backup system for a proactive setup that watches context through StatusLine and creates backups at 30%, 15%, and 5% remaining, before compaction wipes your session history.
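The core of a threshold trigger is detecting when a reading crosses a boundary, so each backup fires once and not on every refresh. A minimal sketch, assuming you keep the previous reading around; `crossedThresholds` is a hypothetical helper, not part of the linked system:

```javascript
// Sketch: report which backup thresholds were crossed between two readings
// of remaining context (in percent). Defaults mirror the 30/15/5 levels above.
function crossedThresholds(prevRemaining, currRemaining, thresholds = [30, 15, 5]) {
  return thresholds.filter((t) => prevRemaining > t && currRemaining <= t);
}

console.log(crossedThresholds(34, 28)); // [ 30 ]
console.log(crossedThresholds(28, 27)); // []
console.log(crossedThresholds(18, 4));  // [ 15, 5 ]
```

Comparing against the previous reading means a session sitting at 28% will not write a new backup on every statusline update.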

Context Recovery Hook - Threshold-based backup system
Context Engineering Guide - Strategic context usage
Memory Optimization - Reduce static context overhead
Claude Code Hooks Guide - All 12 hook types explained
