1M Context Window in Claude Code
Anthropic shipped the 1M token window on Opus 4.6 and Sonnet 4.6. Flat pricing, no beta header, fewer compactions.
Context limits have been the running annoyance in Claude Code since launch. That pain just shrank. Anthropic flipped the 1M token window on for Opus 4.6 and Sonnet 4.6, with no beta flag, no surcharge, and no waitlist. Max, Team, and Enterprise plans already have it turned on.
Think of this less as a version bump and more as 5x the working memory your agent carries. That memory holds your codebase, your tool call history, and the chain of reasoning across long runs. Pricing stays flat too. A 900K-token request costs the same per token as a 9K one.
200K vs 1M At A Glance
| Metric | Before (200K) | After (1M) |
|---|---|---|
| Usable tokens | ~167K | ~830K |
| Compaction frequency | Every 20-30 min on complex tasks | ~15% fewer events overall |
| Files loadable | Small project | Entire monorepo |
| Media items per request | 100 | 600 |
| Long-context pricing | Premium ($10/$37.50 for Opus) | Same rate as short requests |
| Beta header required | Yes (over 200K) | No |
What Actually Changed At GA
The big window had been in beta for months. GA is about dropping the friction that made beta feel second-class.
Flat pricing across the whole window. Long context no longer carries a premium. Opus 4.6 is $5/$25 per million tokens (input/output). Sonnet 4.6 is $3/$15. Your 10K request and your 950K request bill at the same per-token rate.
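The flat-rate claim is easy to sanity-check with arithmetic. A minimal sketch, using the per-million-token rates quoted above; the token counts are illustrative:

```python
# Per-million-token rates quoted in the article (input, output), in USD.
RATES = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD at flat per-token rates -- no long-context premium."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A 10K-token request and a 950K-token request bill at the same per-token rate:
small = request_cost("opus-4.6", 10_000, 1_000)   # $0.05 input + $0.025 output = $0.075
large = request_cost("opus-4.6", 950_000, 1_000)  # $4.75 input + $0.025 output = $4.775
```

The large request costs more in absolute terms, but only because it uses more tokens; the per-token rate never changes with request size.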
Full rate limits everywhere. Longer requests used to get throttled harder during beta. That cap is gone. A 1M-token call pulls the same throughput as a short one.
600 media items in one request. Images and PDF pages used to cap at 100. The new ceiling is 6x higher at 600. For design system work, doc review, or contract stacks, this is a real lift.
No header toggle. Requests above 200K used to need an anthropic-beta header. Any existing headers just get ignored now. The API handles it.
Live on multi-cloud. You get the 1M window on Claude Platform, Microsoft Azure Foundry, and Google Cloud Vertex AI.
Why Claude Code Feels Different Now
API users get a pricing and convenience win here. Claude Code users get something structural.
Compaction Fires Less Often
Anyone who has pushed Claude Code on real work knows the compaction tax. You load files, chain tool calls, build up reasoning, and then auto-compaction fires. Claude squeezes the conversation to free space. Nuance gets lost. Edge cases disappear. Multi-step tasks drop the thread halfway through.
Jon Bell, Anthropic's CPO, put a figure on it: compaction events dropped 15% since the big window shipped. Not a lab benchmark. This is measured on real Claude Code traffic. Agents keep their context and push through hours of work without forgetting what they loaded at the start.
Curious about the mechanics of when compaction fires? See the context buffer management guide. The short story: Claude Code holds back a buffer around 33K tokens, then compacts when usage hits roughly 83.5%. A 1M ceiling means you have about 5x the room before you hit that line.
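The headroom math above can be sketched directly. The ~33K buffer and ~83.5% threshold are the approximate figures quoted from the guide, not exact internals:

```python
# Approximate figure from the context buffer management guide:
# Claude Code auto-compacts when usage crosses roughly 83.5% of the window.
COMPACTION_THRESHOLD = 0.835

def usable_tokens(window: int, threshold: float = COMPACTION_THRESHOLD) -> int:
    """Tokens you can spend before auto-compaction fires."""
    return round(window * threshold)

old = usable_tokens(200_000)    # ~167K, matching the table above
new = usable_tokens(1_000_000)  # ~835K
print(f"headroom gain: {new / old:.1f}x")  # about 5x the working room
```

Those two outputs are where the table's "usable tokens" row comes from: the window grows 5x, and since the threshold is a fixed percentage, the pre-compaction headroom grows 5x with it.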
Whole Codebases In One Shot
At 200K, you had roughly 167K tokens of working space once the buffer was reserved. Fine for a small repo. Painful on anything larger, because you were constantly picking files.
Bump that to 1M and your usable headroom is ~830K. That is thousands of source files. A whole monorepo. Full docs next to the code they describe. Claude can hold the API layer and the frontend that calls it, the migration and the schema it changes, the test file and the code under test. All at once. You stop hand-picking which files to load.
Agent Traces That Actually Finish
This is the payoff for agent teams and complex orchestration runs. Every tool call, every reasoning step, every file read piles into context. At 200K, a multi-agent session on real work chewed through the budget in 20 to 30 minutes.
Anton Biryukov, a software engineer at Ramp, described the old pattern: "Claude Code can burn 100K+ tokens searching Datadog, Braintrust, databases, and source code. Then compaction kicks in." At 1M, he searches, searches again, collects edge cases, and ships fixes. All inside one session. Nothing gets dropped on the way.
Can The Model Really Use 1M Tokens?
A huge context is worthless if the model cannot actually recall and reason over what lives inside it. Anthropic ran two benchmarks built to test exactly that at the 1M mark.
Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens. MRCR (Multi-Round Coreference Resolution) checks whether a model can track entities and the links between them across a huge prompt. Nearly 80% accuracy over a million tokens means the model is not just storing the words. It still knows how distant pieces connect.
Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens. This test measures how well the model walks graph structures planted deep inside long inputs. Can it trace chains of references across hundreds of thousands of tokens? Both scores are listed as the top marks for frontier models at those context lengths.
In practice, this means Claude can still locate the helper function you defined 500K tokens ago and see how it hooks into the component you are editing right now.
How To Use It In Your Workflow
Change What You Do
Stop hand-managing file inclusion. Every @file call used to be a tradeoff at 200K. At 1M, just load what you need and move on. Pull in the test file with the implementation. Pull in the types with the component. Give Claude the whole picture.
Run sessions longer. The habit of restarting every 30 minutes came from survival, not preference. With 5x the ceiling, a session can run for hours on hard tasks. Restart when you genuinely switch focus, not because you are nervous about the buffer. For rules on when to compact and when to keep going, see the context management guide.
Lean into multi-step agents. The real payoff is not the quick edit. It is the kind of work where Claude has to research, plan, implement, and check across lots of files. That chain used to snap when compaction fired mid-task. It now fits in one window without drama.
Rethink your context engineering playbook. Your loading and preservation strategies still count. They just have more oxygen. The fundamentals from our context management guide still hold. The pressure shifts from "stay alive under 200K" to "use 1M well."
What Does Not Change
Context hygiene still matters. A 1M ceiling is not a cue to pile everything in and hope Claude sorts it out. Irrelevant files burn tokens and thin out the signal Claude uses to focus.
CLAUDE.md, skills-first loading, and clean session management are still best practice. They just get more breathing room. If you already follow the usage optimization patterns, the big window pays you back even more.
Who Gets The 1M Window
On Claude Code, Max, Team, and Enterprise plans get the 1M window automatically with Opus 4.6. Nothing to toggle. The extra usage allocation that long-context requests used to need is gone.
API users get it at standard per-token rates. Opus 4.6 at $5/$25 per million tokens. Sonnet 4.6 at $3/$15. No premium tier for long context.
The 200K window is still around as the default for standard API requests and lower-tier plans. The 1M option is tied specifically to Opus 4.6 and Sonnet 4.6.
What This Signals
Anthropic is not just making context windows larger. They are stripping out the tradeoffs that made large windows annoying to use. Flat pricing means you do not budget long requests differently. Full rate limits mean you do not lose throughput. Killing the beta header means existing code just runs.
The direction is obvious. Context management is shifting from a user job to an infrastructure job. Models keep getting better at using long context. Pricing keeps the door open. The tooling sorts itself out.
For Claude Code users, the takeaway is simple. Your agents think longer and remember more. Build your workflows on that, and the tasks that used to demand careful session management and hand-picked context start to just work. End to end. In one window.
Related Resources
- Context Buffer Management -- How auto-compaction works and the 33K token buffer
- Context Engineering -- The six pillars framework for loading context strategically
- Context Management -- Strategies for keeping critical context intact across sessions
- Model Selection Guide -- Choosing between Opus 4.6 and Sonnet 4.6 for different tasks
Stop configuring. Start building.