
Claude Code Context Buffer

The autocompact buffer shrank from 45K to 33K tokens in early 2026. Here's what changed and how to work around it.

You hit 167K tokens. Claude compacts. Context slips away. Every. Single. Time.

Here's the annoying part: a slice of your context window stays off-limits, reserved by Claude Code itself. That slice used to be 45,000 tokens (22.5% of 200K). Early 2026, the buffer dropped to roughly 33,000 tokens (16.5%), freeing about 12K more tokens for actual work.

| What It Is | Current (2026) | Previous | Can You Change It? |
|---|---|---|---|
| Compaction buffer | ~33K tokens (16.5%) | ~45K tokens (22.5%) | No - hardcoded |
| Compaction trigger | ~83.5% usage | ~77-78% usage | Yes - CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (1-100) |
| Usable context | ~167K tokens | ~155K tokens | Yes - use sonnet[1m] for 1M token window |

No announcement covered this shift in the official Claude Code changelog. The nearest clue is v2.1.21: "Fixed auto-compact triggering too early on models with large output token limits", which probably retuned the buffer calculation. Online posts and docs still throw around the 45K number, but /context now reports 33K on current versions.

The buffer is there for real reasons. Knowing exactly how it works separates people fighting the system from people working with it.

How Auto-Compaction Actually Works

Claude Code watches context usage continuously. When usage hits roughly 83.5% of the total window (up from ~77-78% before), auto-compaction kicks in.

Here's the sequence:

  1. Claude summarizes your conversation history
  2. Older messages get replaced with a condensed summary
  3. You lose granular details from early in the session
  4. The session continues with reduced context

On a 200K window, compaction lands somewhere near 167K tokens of real use. That 33K buffer isn't sitting idle. Claude spends it on the summarization itself.
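The arithmetic is easy to sanity-check. A quick sketch using the figures above (the token counts come from this article, not from any API):

```javascript
// Illustrative math only -- constants are the article's figures,
// not values reported by Claude Code itself.
const CONTEXT_WINDOW = 200_000; // total window for a 200K model
const BUFFER_PCT = 16.5;        // current autocompact buffer share

const bufferTokens = Math.round(CONTEXT_WINDOW * (BUFFER_PCT / 100));
const usableBeforeCompact = CONTEXT_WINDOW - bufferTokens;

console.log(`buffer: ${bufferTokens}, usable: ${usableBeforeCompact}`);
```

Swap in the old 22.5% figure and you get the previous 45K/155K split.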

The /context Command

Run /context to see exactly where your tokens are going:

claude-opus-4-5-20251101 · 76k/200k tokens (38%)

System prompt: 2.7k tokens (1.3%)
System tools: 16.8k tokens (8.4%)
Custom agents: 1.3k tokens (0.7%)
Memory files: 7.4k tokens (3.7%)
Skills: 1.0k tokens (0.5%)
Messages: 9.6k tokens (4.8%)
Free space: 118k (58.9%)
Autocompact buffer: 33.0k tokens (16.5%)

The Messages row is your conversation history. Watch it climb. When free space hits zero (buffer included), compaction fires.

Why the Buffer Exists

Three jobs fall on that ~33K:

  1. Working space for compaction. The summarization process itself needs tokens to operate
  2. Completion buffer. Allows current tasks to finish before compaction triggers
  3. Response generation space. Claude needs working memory to reason and construct responses

The buffer is baked into Claude Code's architecture. Requests to make it configurable have been closed as duplicates. GitHub Issue #15435 asked for this. The answer was no.

The Output Tokens Misconception

A lot of developers think CLAUDE_CODE_MAX_OUTPUT_TOKENS governs the compaction buffer.

It doesn't.

| Variable | What It Controls | Default |
|---|---|---|
| CLAUDE_CODE_MAX_OUTPUT_TOKENS | Max tokens per API response | 32K |
| (none - hardcoded) | Compaction buffer reservation | ~33K |

Two different mechanisms, zero overlap:

  • Output tokens. Caps how long a single Claude response can run
  • Compaction buffer. Reserved context space that triggers auto-compaction

Set CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000 and you'll shrink Claude's max response length. Context before compaction won't budge. The 33K buffer is fixed.

# This limits response length, NOT context buffer
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=16000

Reasons to lower output tokens:

  • Faster responses (less to generate)
  • Lower costs per response
  • Force conciseness

Usable context before compaction? Still ~167K.

One caveat worth flagging: although CLAUDE_CODE_MAX_OUTPUT_TOKENS leaves the compaction buffer alone, pushing it very high can cut your effective context window. Output tokens are carved from the same context pool, so a bigger output reservation takes room away from history and system context. The 32K default strikes a reasonable balance for most workflows.
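A rough sketch of that trade-off, using the figures above (the simple subtraction is an illustration of the carve-out, not Claude Code's actual accounting):

```javascript
// Illustration of the output-token carve-out described above.
// The subtraction model is an assumption, not Claude Code's internals.
const CONTEXT_WINDOW = 200_000;
const AUTOCOMPACT_BUFFER = 33_000;

function historySpace(maxOutputTokens) {
  // Room left for conversation history once the output reservation
  // and the compaction buffer are both set aside.
  return CONTEXT_WINDOW - AUTOCOMPACT_BUFFER - maxOutputTokens;
}

console.log(historySpace(32_000)); // the 32K default
console.log(historySpace(64_000)); // doubling the cap shrinks history room
```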

The Real-World Impact

Picture a typical heavy session:

| Phase | Context Used | What Happens |
|---|---|---|
| Start | 20K | System prompt, CLAUDE.md, skills load |
| Mid-session | 80K | Deep in implementation, full context |
| Pre-compact | 167K | Auto-compact triggers |
| Post-compact | ~60K | Summarized history, details lost |

With a 33K buffer, compaction hits at 167K. That's your working ceiling, 12K higher than the old 155K one.

Where does information go? Into the summary. Exact variable names, precise error messages, subtle choices from early in the session all get squeezed into a recap that catches the gist and loses the detail.

What You Can Actually Control

1. Override the Compaction Trigger Percentage

One environment variable actually shifts when auto-compaction fires: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE.

# Trigger compaction at 90% instead of the default ~83.5%
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=90
 
# Trigger earlier at 70% for more aggressive compaction
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70

Values from 1 to 100 are accepted; the number sets the usage percentage at which auto-compaction triggers. A higher setting gives you more usable context before compaction fires, at the cost of headroom for the summarization pass. A lower setting compacts earlier: more headroom for the summary, less context before the first hit.

Closest thing to a configurable buffer you can get. It doesn't change buffer size. It moves the moment compaction fires relative to the full window.
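Assuming the override percentage applies to the full window, as described above, the settings map to token thresholds like this:

```javascript
// Maps an override percentage to a trigger threshold in tokens.
// Assumption: the percentage applies to the full 200K window.
const CONTEXT_WINDOW = 200_000;

const triggerTokens = (pct) => Math.round(CONTEXT_WINDOW * (pct / 100));

console.log(triggerTokens(83.5)); // default: compaction near 167K
console.log(triggerTokens(90));   // later trigger: 180K before compaction
console.log(triggerTokens(70));   // earlier trigger: 140K before compaction
```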

2. Use Extended Context Models

Rather than wrestling the 200K cap, reach for the 1M token context window.

As of March 2026, the 1M context window is generally available for Opus 4.6 and Sonnet 4.6, with no pricing premium. A 900K-token request costs the same per token as a 9K one. See our 1M context window guide for the full breakdown of what changed and what it means for your workflow.

/model sonnet[1m]

At 1M tokens, the compaction threshold moves way out. Even after a proportional buffer, room before compaction fires grows substantially. See the model selection guide for the full model alias reference.

3. Disable Auto-Compaction (Risky)

// ~/.claude/settings.json
{
  "autoCompact": false
}

Warning. GitHub Issue #18264 reports the setting may get ignored in some cases. Even when it holds, you risk slamming into hard context limits and crashing sessions.

Only flip this if you're ready to:

  • Monitor context manually with /context
  • Run /compact before hitting 100%
  • Accept occasional session crashes

4. Manual Compaction at Strategic Points

Turn off auto-compact, then compact on your schedule:

/compact   # Compact when you decide
/clear     # Full reset when starting new major task

Good moments to compact on purpose:

  • After completing a major feature
  • Before starting a new component
  • When debugging context feels stale

Upside: you pick what gets summarized and when, which keeps the fine-grained details around active work.

5. Work Within the 167K Limit

Accept that heavy sessions are going to compact. Set up for it:

  • Keep CLAUDE.md and skills lean
  • Use session files to persist state
  • Break complex tasks into multiple sessions

6. Proactive Backup Strategy

The most effective move: back things up before compaction arrives.

An idea catching on in the Claude Code community: proactive clearing at 50% plus structured recovery beats lossy auto-compaction.

Auto-compaction condenses your conversation and drops the fine detail. But do this instead:

  1. Continuously record your session to a structured backup
  2. Clear context manually at a threshold (like 50%)
  3. Reload from structured backup instead of lossy summary

Context fidelity goes up. The backup holds exact details that summarization throws away.

StatusLine: The Only Live Monitor

StatusLine is the one mechanism that gets real-time context metrics. Other hooks don't receive token counts.

// .claude/settings.json
{
  "statusLine": {
    "type": "command",
    "command": "node .claude/hooks/context-monitor.mjs"
  }
}

The statusline receives JSON with context_window.remaining_percentage. That's live data, ready to act on.

Critical calculation. The remaining_percentage field already counts the 16.5% autocompact buffer. For actual "free until autocompact":

const AUTOCOMPACT_BUFFER_PCT = 16.5;
const freeUntilCompact = Math.max(
  0,
  remaining_percentage - AUTOCOMPACT_BUFFER_PCT,
);

25% remaining really means 8.5% left before compaction.
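Putting the correction together, a minimal statusline script body might look like this. Only context_window.remaining_percentage is taken from the docs above; the warning threshold and output format are arbitrary choices:

```javascript
// Sketch of a statusline command body. The 10% warning threshold
// and the output format are illustrative choices, not Claude Code defaults.
const AUTOCOMPACT_BUFFER_PCT = 16.5;

function renderStatus(payload) {
  const remaining = payload.context_window.remaining_percentage;
  // Subtract the buffer to get the real runway before compaction fires.
  const freeUntilCompact = Math.max(0, remaining - AUTOCOMPACT_BUFFER_PCT);
  const warn = freeUntilCompact < 10 ? " (compact soon)" : "";
  return `ctx: ${freeUntilCompact.toFixed(1)}% until compact${warn}`;
}

// A real hook would read the payload from stdin, e.g.:
//   let raw = ""; process.stdin.on("data", (c) => (raw += c));
//   process.stdin.on("end", () => console.log(renderStatus(JSON.parse(raw))));
console.log(renderStatus({ context_window: { remaining_percentage: 25 } }));
```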

Why Hooks Can't Inject /clear

A technical wall a lot of people run into: hooks cannot inject slash commands.

The obvious idea: a hook spots high context usage and injects /clear on your behalf. It can't:

  • UserPromptSubmit has no updatedPrompt field. It can add context or block, but never replace
  • Slash commands skip hook evaluation entirely
  • No hook fires "instead of" user input

Real ways to clear and recover programmatically:

  1. Claude Agent SDK. Send /clear via the SDK
  2. Headless CLI wrapper. Pipe commands to headless Claude Code
  3. Manual workflow. Hook warns you, you run /clear, SessionStart restores

What Happens at 100% Context

Push context all the way to the edge and here's what follows:

  1. Best case. Claude's response gets truncated
  2. Worse case. API returns an error, turn fails
  3. Worst case. Session becomes unresponsive

The 33K buffer is there to keep these from happening. It's protection, not waste.

Key Takeaways

  1. The buffer just dropped from 45K to 33K. Undocumented change, about 12K more usable tokens
  2. Compaction now triggers at ~83.5% usage. That puts usable context around ~167K (up from ~155K)
  3. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE shifts the trigger. Values 1 through 100 set when compaction fires
  4. sonnet[1m] offers 1M token context. A real alternative to wrestling 200K limits
  5. Output tokens and compaction buffer are separate. Don't mix them up
  6. autoCompact: false may work. It also has reported bugs
  7. StatusLine is the only live context monitor. Other hooks don't see token counts
  8. Hooks cannot inject /clear. Go through the SDK, a wrapper, or a manual workflow
  9. Proactive clearing plus structured recovery beats lossy auto-compaction

The buffer is there for good reason. Instead of fighting it, work alongside it: keep state in session files, run threshold-based backups ahead of compaction, and think about proactive clearing for heavy sessions.

The Solution: Threshold-Based Backups

The buffer is fixed. How you deal with approaching it is not.

Check out our threshold-based backup system for a proactive setup that watches context through StatusLine and creates backups at 30%, 15%, and 5% remaining, before compaction wipes your session history.

Related Resources

  • Context Recovery Hook - Threshold-based backup system
  • Context Engineering Guide - Strategic context usage
  • Memory Optimization - Reduce static context overhead
  • Claude Code Hooks Guide - All 12 hook types explained


