Build This Now
speedy_devvkoen_salo

Claude Code Context Buffer

Claude Code's autocompact buffer dropped from 45K to 33K tokens in early 2026. Why it reserves space, when compaction fires, and the env var to tune it.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Jan 24, 2026

You hit 167K tokens. Claude compacts. Context slips away. Every. Single. Time.

Here's the annoying part: a slice of your context window stays off-limits, reserved by Claude Code itself. That slice used to be 45,000 tokens (22.5% of 200K). In early 2026, the buffer dropped to roughly 33,000 tokens (16.5%), freeing about 12K more tokens for actual work.

| What It Is | Current (2026) | Previous | Can You Change It? |
|---|---|---|---|
| Compaction buffer | ~33K tokens (16.5%) | ~45K tokens (22.5%) | No, hardcoded |
| Compaction trigger | ~83.5% usage | ~77-78% usage | Yes, via CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (1-100) |
| Usable context | ~167K tokens | ~155K tokens | Yes, use sonnet[1m] for a 1M token window |

No announcement covered this shift in the official Claude Code changelog. The nearest clue is v2.1.21: "Fixed auto-compact triggering too early on models with large output token limits", which probably retuned the buffer calculation. Online posts and docs still throw around the 45K number, but /context now reports 33K on current versions.

The buffer is there for real reasons. Knowing exactly how it works separates people fighting the system from people working with it.

Read this page when the buffer itself is the question: when compaction triggers, why the reserved space exists, and what environment variables actually change. If you want the broader workflow impact of the larger window, read 1M Context Window in Claude Code. If you want practical rules for when to stay in a session versus resetting it, read Context Management.

How Auto-Compaction Actually Works

Claude Code watches context usage continuously. At roughly 83.5% of the total window (up from ~77-78% before), auto-compaction kicks in.

Here's the sequence:

  1. Claude summarizes your conversation history
  2. Older messages get replaced with a condensed summary
  3. You lose granular details from early in the session
  4. The session continues with reduced context

On a 200K window, compaction lands somewhere near 167K tokens of real use. That 33K buffer isn't sitting idle. Claude spends it on the summarization itself.
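To make that arithmetic concrete, here is a minimal sketch that derives the trigger point from the window size and the reserved buffer. It assumes, as the numbers above imply, that compaction fires once usage reaches the window minus the buffer; `compactionTrigger` is a hypothetical helper, not part of Claude Code.

```javascript
// Hypothetical helper: where auto-compaction fires, assuming the trigger is
// simply "total window minus the reserved autocompact buffer".
function compactionTrigger(windowTokens, bufferTokens) {
  if (bufferTokens < 0 || bufferTokens >= windowTokens) {
    throw new RangeError("buffer must be between 0 and the window size");
  }
  const triggerTokens = windowTokens - bufferTokens;
  return {
    triggerTokens,
    // Percentage of the window in use when compaction fires
    triggerPct: (triggerTokens * 100) / windowTokens,
    // Share of the window held back as the buffer
    bufferPct: (bufferTokens * 100) / windowTokens,
  };
}

const current = compactionTrigger(200_000, 33_000);
const previous = compactionTrigger(200_000, 45_000);
console.log(current.triggerTokens, current.triggerPct); // 167000 83.5
console.log(previous.triggerTokens, previous.triggerPct); // 155000 77.5
console.log(current.triggerTokens - previous.triggerTokens); // 12000
```

Multiplying by 100 before dividing keeps these particular percentages exact in floating point.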

The /context Command

Run /context to see exactly where your tokens are going:

claude-opus-4-5-20251101 · 76k/200k tokens (38%)

System prompt: 2.7k tokens (1.3%)
System tools: 16.8k tokens (8.4%)
Custom agents: 1.3k tokens (0.7%)
Memory files: 7.4k tokens (3.7%)
Skills: 1.0k tokens (0.5%)
Messages: 9.6k tokens (4.8%)
Free space: 118k (58.9%)
Autocompact buffer: 33.0k tokens (16.5%)

The Messages row is your conversation history. Watch it climb. When Free space reaches zero, meaning usage has run into the reserved buffer, compaction fires.
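If you paste /context output into a script for logging or a quick check, a small parser can pull out the rows. This is a sketch under the assumption that rows keep the `Label: 9.6k tokens (4.8%)` shape shown above; that layout is not a documented interface and may change between versions, and `parseContextReport` is a hypothetical name.

```javascript
// Hypothetical parser for pasted /context output. Assumes each row looks like
// "Label: 9.6k tokens (4.8%)" or "Free space: 118k (58.9%)".
function parseContextReport(text) {
  const rows = {};
  for (const line of text.split("\n")) {
    const match = line.match(/^([A-Za-z ]+):\s*([\d.]+)k\b/);
    if (match) {
      // Convert "9.6k" to 9600; rounding absorbs floating-point noise
      rows[match[1].trim()] = Math.round(parseFloat(match[2]) * 1000);
    }
  }
  return rows;
}

const sample = [
  "Messages: 9.6k tokens (4.8%)",
  "Free space: 118k (58.9%)",
  "Autocompact buffer: 33.0k tokens (16.5%)",
].join("\n");

const rows = parseContextReport(sample);
console.log(rows["Free space"]); // 118000
```

The header line (model name and total) has no `Label:` prefix, so the regex skips it.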

Why the Buffer Exists

Three jobs fall on that ~33K:

  1. Working space for compaction. The summarization process itself needs tokens to operate
  2. Completion buffer. Allows current tasks to finish before compaction triggers
  3. Response generation space. Claude needs working memory to reason and construct responses

The buffer is baked into Claude Code's architecture. Requests to make it configurable have been closed as duplicates. GitHub Issue #15435 asked for this. The answer was no.

The Output Tokens Misconception

A lot of developers think CLAUDE_CODE_MAX_OUTPUT_TOKENS governs the compaction buffer.

It doesn't.

| Variable | What It Controls | Default |
|---|---|---|
| CLAUDE_CODE_MAX_OUTPUT_TOKENS | Max tokens per API response | 32K |
| (none, hardcoded) | Compaction buffer reservation | ~33K |

Two different mechanisms, zero overlap:

  • Output tokens. Caps how long a single Claude response can run
  • Compaction buffer. Reserved context space that triggers auto-compaction

Set CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000 and you'll shrink Claude's max response length. Context before compaction won't budge. The 33K buffer is fixed.

# This limits response length, NOT context buffer
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000

Reasons to lower output tokens:

  • Faster responses (less to generate)
  • Lower costs per response
  • Force conciseness

Usable context before compaction? Still ~167K.

One caveat worth flagging: although CLAUDE_CODE_MAX_OUTPUT_TOKENS leaves the compaction buffer alone, pushing it very high can shrink your effective context window. Output tokens are carved from the same context pool, so a bigger output reservation takes room away from history and system context. The 32K default strikes a reasonable balance for most workflows.

The Real-World Impact

Picture a typical heavy session:

| Phase | Context Used | What Happens |
|---|---|---|
| Start | 20K | System prompt, CLAUDE.md, skills load |
| Mid-session | 80K | Deep in implementation, full context |
| Pre-compact | 167K | Auto-compact triggers |
| Post-compact | ~60K | Summarized history, details lost |

With a 33K buffer, compaction hits at 167K. That's your working ceiling, 12K higher than the old 155K one.

Where does information go? Into the summary. Exact variable names, precise error messages, subtle choices from early in the session all get squeezed into a recap that catches the gist and loses the detail.

What You Can Actually Control

1. Override the Compaction Trigger Percentage

One environment variable actually shifts when auto-compaction fires: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE.

# Trigger compaction at 90% instead of the default ~83.5%
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=90
 
# Trigger earlier at 70% for more aggressive compaction
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70

Values from 1 to 100 are accepted. The number directly sets the percentage at which auto-compaction triggers. A higher setting gives more usable context ahead of compaction and leaves less room for the summary job. A lower setting fires compaction earlier, keeps more working space, and gives you less room before the first compaction.

Closest thing to a configurable buffer you can get. It doesn't change buffer size. It moves the moment compaction fires relative to the full window.
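Because the override is a plain percentage of the full window, the token position of the trigger is easy to estimate. A minimal sketch, assuming the trigger lands at window * pct / 100 and rejecting values outside 1-100 (how Claude Code itself reacts to invalid input is not covered here); `overrideTrigger` is a hypothetical name.

```javascript
// Hypothetical helper: token count at which compaction would fire for a given
// CLAUDE_AUTOCOMPACT_PCT_OVERRIDE value (accepted range 1-100).
function overrideTrigger(windowTokens, pctOverride) {
  const pct = Number(pctOverride);
  if (!Number.isFinite(pct) || pct < 1 || pct > 100) {
    throw new RangeError(`override must be from 1 to 100, got ${pctOverride}`);
  }
  return Math.round((windowTokens * pct) / 100);
}

// Read the same environment variable, falling back to the ~83.5% default
// trigger (about 167K on a 200K window) when it is unset.
const raw = process.env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE;
const trigger = raw ? overrideTrigger(200_000, raw) : 167_000;
console.log(`compaction expected near ${trigger} tokens`);

console.log(overrideTrigger(200_000, 90)); // 180000
console.log(overrideTrigger(200_000, 70)); // 140000
```

The same percentage scales with the window: on a 1M window, 90 would put the trigger near 900K tokens under this model.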

2. Use Extended Context Models

Rather than wrestling the 200K cap, reach for the 1M token context window.

As of March 2026, the 1M context window is generally available for Opus 4.6 and Sonnet 4.6, with no pricing premium. A 900K-token request costs the same per token as a 9K one. See our 1M context window guide for the full breakdown of what changed and what it means for your workflow.

/model sonnet[1m]

At 1M tokens, the compaction threshold moves way out. Even after a proportional buffer, room before compaction fires grows substantially. See the model selection guide for the full model alias reference.

3. Disable Auto-Compaction (Risky)

// ~/.claude/settings.json
{
  "autoCompact": false
}

Warning. GitHub Issue #18264 reports the setting may get ignored in some cases. Even when it holds, you risk slamming into hard context limits and crashing sessions.

Only flip this if you're ready to:

  • Monitor context manually with /context
  • Run /compact before hitting 100%
  • Accept occasional session crashes

4. Manual Compaction at Strategic Points

Turn off auto-compact, then compact on your schedule:

/compact   # Compact when you decide
/clear     # Full reset when starting new major task

Good moments to compact on purpose:

  • After completing a major feature
  • Before starting a new component
  • When debugging context feels stale

Upside: you pick what gets summarized and when, which keeps the fine-grained details around active work.

5. Work Within the 167K Limit

Accept that heavy sessions are going to compact. Set up for it:

  • Keep CLAUDE.md and skills lean
  • Use session files to persist state
  • Break complex tasks into multiple sessions

6. Proactive Backup Strategy

The most effective move: back things up before compaction arrives.

An idea catching on in the Claude Code community: proactive clearing at 50% plus structured recovery beats lossy auto-compaction.

Auto-compaction condenses your conversation and drops the fine detail. The alternative looks like this:

  1. Continuously record your session to a structured backup
  2. Clear context manually at a threshold (like 50%)
  3. Reload from structured backup instead of lossy summary

Context fidelity goes up. The backup holds exact details that summarization throws away.
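What "structured" means is up to you. One possible shape, sketched below, renders a snapshot as Markdown that you could save to a session file and reload after /clear. The field names (`task`, `files`, `decisions`, `nextSteps`) and the sample values are hypothetical; this is not a Claude Code format.

```javascript
// Hypothetical snapshot renderer: turns session state into Markdown that can be
// written to a session file and read back after /clear.
function renderSnapshot({ task, files = [], decisions = [], nextSteps = [] }, takenAt = new Date()) {
  // Render one "## Title" section with a bullet per item, or "(none)"
  const section = (title, items) =>
    [`## ${title}`, ...(items.length ? items.map((item) => `- ${item}`) : ["- (none)"])].join("\n");

  return [
    `# Session snapshot (${takenAt.toISOString()})`,
    "",
    `Task: ${task}`,
    "",
    section("Files touched", files),
    "",
    section("Decisions", decisions),
    "",
    section("Next steps", nextSteps),
    "",
  ].join("\n");
}

const snapshot = renderSnapshot(
  {
    task: "Fix token refresh race",
    files: ["src/auth/refresh.ts"],
    decisions: ["Serialize refresh calls behind a single promise"],
    nextSteps: ["Add a regression test"],
  },
  new Date("2026-01-24T12:00:00Z"),
);
console.log(snapshot);
```

Persisting it is one call to Node's `fs.writeFileSync(path, snapshot)`; the point is that exact file paths and decisions survive verbatim instead of being paraphrased by a summary.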

How To Choose Your Operating Mode

Most people do not need one magic setting. They need the right mode for the kind of work they are doing.

| Mode | Best For | What You Do |
|---|---|---|
| Default auto-compact | Everyday coding, short to medium sessions | Leave the defaults alone and watch /context occasionally |
| Manual compacting | Multi-phase work where you know what must survive | Compact on purpose before phase changes |
| Long-context model + defaults | Large-repo work and long traces | Use the 1M window to reduce forced compactions |
| Threshold backup workflow | Work where losing detail is expensive | Snapshot early, clear early, reload exact state instead of relying on summaries |

The right question is not "how do I beat the buffer?" It is "how expensive is a lossy compact for this session?"

Three Worked Examples

Example 1: Regular feature build

You are building one feature, touching a handful of files, and the session is unlikely to cross 100K tokens.

Best move:

  • use defaults
  • check /context once or twice
  • do not over-engineer it

The buffer is not your problem here.

Example 2: Debugging session with lots of dead ends

You spend 40 minutes inspecting logs, traces, and failed hypotheses. You already know the next phase will be implementation.

Best move:

  • do not wait for autocompact
  • run /compact focus on the confirmed root cause, affected files, and fix plan
  • continue into implementation

This keeps the useful diagnosis while dropping the junk.

Example 3: High-detail content or audit session

Suppose you are auditing a security flow or writing a source-heavy article. Exact quotes, file paths, and observed behavior matter.

Best move:

  • keep a structured backup or notes file
  • clear earlier than usual
  • restore from the exact backup instead of trusting a compacted summary

This is where the backup workflow wins. The problem is not raw capacity. It is information fidelity.

When To Change CLAUDE_AUTOCOMPACT_PCT_OVERRIDE

This variable is useful, but only in specific cases.

Raise it when:

  • the current task is coherent and you want more room before compaction
  • you are actively watching usage
  • the cost of a too-early compact is higher than the risk of waiting

Lower it when:

  • the session is messy and you want summaries sooner
  • you are okay with more frequent compaction
  • you want Claude to keep the working set tighter

Leave it alone when:

  • you are not monitoring context actively
  • you do not have a clear reason to change the trigger
  • the real problem is task switching, not compaction

Changing the trigger is not a substitute for good session boundaries. It only shifts when the same underlying mechanism fires.

StatusLine: The Only Live Monitor

StatusLine is the one mechanism that gets real-time context metrics. Other hooks don't receive token counts.

// .claude/settings.json
{
  "statusLine": {
    "type": "command",
    "command": "node .claude/hooks/context-monitor.mjs"
  }
}

The statusline command receives JSON on stdin, including context_window.remaining_percentage. That's live data, ready to act on.

Critical calculation. The remaining_percentage field includes the 16.5% autocompact buffer, so subtract it to get the actual "free until autocompact" figure:

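const AUTOCOMPACT_BUFFER_PCT = 16.5;
// remaining_percentage: context_window.remaining_percentage from the statusline JSON
const freeUntilCompact = (remaining_percentage) =>
  Math.max(0, remaining_percentage - AUTOCOMPACT_BUFFER_PCT);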

25% remaining really means 8.5% left before compaction.
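Putting that together, a statusline script could look roughly like the sketch below. It assumes the stdin JSON carries `context_window.remaining_percentage` as described above and treats the 16.5% buffer as fixed; the warning tiers and output wording are arbitrary illustration choices, and `renderStatus` is a hypothetical name.

```javascript
// Sketch of the core of a hypothetical .claude/hooks/context-monitor.mjs.
// The statusline command gets a JSON payload on stdin; what it prints is
// what shows up in the status line.
const AUTOCOMPACT_BUFFER_PCT = 16.5; // assumed fixed buffer share

function renderStatus(payload) {
  const remaining = payload?.context_window?.remaining_percentage;
  if (typeof remaining !== "number") return "ctx: n/a";

  // remaining_percentage includes the buffer, so subtract it
  const free = Math.max(0, remaining - AUTOCOMPACT_BUFFER_PCT);
  const tier = free <= 5 ? "COMPACT SOON" : free <= 15 ? "low" : "ok";
  return `ctx: ${free.toFixed(1)}% until autocompact (${tier})`;
}

async function main() {
  // Collect the whole stdin payload before parsing it
  let raw = "";
  for await (const chunk of process.stdin) raw += chunk;
  console.log(renderStatus(raw.trim() ? JSON.parse(raw) : {}));
}

// In the real script, finish with: main();
```

For a payload reporting 25% remaining, this prints `ctx: 8.5% until autocompact (low)`, matching the calculation above.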

Why Hooks Can't Inject /clear

A technical wall a lot of people run into: hooks cannot inject slash commands.

A reasonable-sounding plan: a hook spots high context usage and injects /clear. It can't, because:

  • UserPromptSubmit has no updatedPrompt field. It can add context or block, but never replace
  • Slash commands skip hook evaluation entirely
  • No hook fires "instead of" user input

Real ways to clear and recover programmatically:

  1. Claude Agent SDK. Send /clear via the SDK
  2. Headless CLI wrapper. Pipe commands to headless Claude Code
  3. Manual workflow. Hook warns you, you run /clear, SessionStart restores
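For the manual route, the restore half can be a SessionStart hook. Based on the Claude Code hooks reference, SessionStart input includes a `source` field (values such as `startup`, `resume`, `clear`, `compact`) and plain stdout from a SessionStart hook is added to Claude's context; verify both against your version. The sketch is CommonJS (for example a `.cjs` file), and the backup path and function names are hypothetical.

```javascript
// Sketch of a hypothetical SessionStart hook that re-injects a saved snapshot
// after /clear. Its stdout is intended to land in Claude's context.
const fs = require("fs");

const BACKUP_PATH = ".claude/session-backup.md"; // hypothetical location

function restoreText(hookInput, readBackup) {
  // Only restore after a manual clear; fresh startups begin clean
  if (hookInput?.source !== "clear") return "";
  const backup = readBackup();
  return backup ? `Restored session snapshot:\n\n${backup}` : "";
}

function readBackupFile() {
  try {
    return fs.readFileSync(BACKUP_PATH, "utf8");
  } catch {
    return ""; // no snapshot saved yet
  }
}

async function main() {
  let raw = "";
  for await (const chunk of process.stdin) raw += chunk;
  const text = restoreText(raw.trim() ? JSON.parse(raw) : {}, readBackupFile);
  if (text) process.stdout.write(text);
}

// In the real hook script, finish with: main();
```

Register it under SessionStart in settings.json (the hooks reference lists a `clear` matcher for that event) so the warning-then-/clear loop ends with the exact snapshot back in context.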

What Happens at 100% Context

Push context all the way to the edge and here's what follows:

  1. Best case. Claude's response gets truncated
  2. Worse case. API returns an error, turn fails
  3. Worst case. Session becomes unresponsive

The 33K buffer is there to keep these from happening. It's protection, not waste.

Key Takeaways

  1. The buffer just dropped from 45K to 33K. Undocumented change, about 12K more usable tokens
  2. Compaction now triggers at ~83.5% usage. That puts usable context around ~167K (up from ~155K)
  3. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE shifts the trigger. Values 1 through 100 set when compaction fires
  4. sonnet[1m] offers 1M token context. A real alternative to wrestling 200K limits
  5. Output tokens and compaction buffer are separate. Don't mix them up
  6. autoCompact: false may work. It also has reported bugs
  7. StatusLine is the only live context monitor. Other hooks don't see token counts
  8. Hooks cannot inject /clear. Go through the SDK, a wrapper, or a manual workflow
  9. Proactive clearing plus structured recovery beats lossy auto-compaction

The buffer is there for good reason. Instead of fighting it, work alongside it: keep state in session files, run threshold-based backups ahead of compaction, and think about proactive clearing for heavy sessions.

The Solution: Threshold-Based Backups

The buffer is fixed. How you deal with approaching it is not.

Check out our threshold-based backup system for a proactive setup that watches context through StatusLine and creates backups at 30%, 15%, and 5% remaining, before compaction wipes your session history.
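The core of any threshold scheme is making each backup fire once as the remaining percentage falls, rather than on every statusline refresh. A minimal sketch of that bookkeeping using the 30/15/5 thresholds above; the function name and the reset rule are assumptions about how such a system might work, not a description of the linked implementation.

```javascript
// Hypothetical helper: returns thresholds newly crossed as remaining % drops.
// `fired` records thresholds already handled so each backup happens once.
const BACKUP_THRESHOLDS = [30, 15, 5];

function newlyCrossed(remainingPct, fired) {
  // After /clear or compaction, remaining jumps back up: start a fresh cycle
  if (remainingPct > Math.max(...BACKUP_THRESHOLDS)) fired.clear();
  const crossed = BACKUP_THRESHOLDS.filter(
    (t) => remainingPct <= t && !fired.has(t),
  );
  for (const t of crossed) fired.add(t);
  return crossed;
}

const fired = new Set();
console.log(newlyCrossed(42, fired)); // []
console.log(newlyCrossed(28, fired)); // [ 30 ]
console.log(newlyCrossed(27, fired)); // []
console.log(newlyCrossed(4, fired)); // [ 15, 5 ]
```

A jump straight past several thresholds returns all of them at once, so the caller can take a single backup instead of three in a row.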

Related Resources

  • Context Recovery Hook - Threshold-based backup system
  • Context Engineering Guide - Strategic context usage
  • Memory Optimization - Reduce static context overhead
  • Claude Code Hooks Guide - All 12 hook types explained
