1M Context Window in Claude Code
Anthropic shipped the 1M token window on Opus 4.6 and Sonnet 4.6. Flat pricing, no beta header, fewer compactions.
Context limits have been the running annoyance in Claude Code since launch. That pain just shrank. Anthropic flipped the 1M token window on for Opus 4.6 and Sonnet 4.6, with no beta flag, no surcharge, and no waitlist. Max, Team, and Enterprise plans already have it turned on.
Think of this less as a version bump and more as 5x the working memory your agent carries. That memory holds your codebase, your tool call history, and the chain of reasoning across long runs. Pricing stays flat too. A 900K-token request costs the same per token as a 9K one.
200K vs 1M At A Glance
| Metric | Before (200K) | After (1M) |
|---|---|---|
| Usable tokens | ~167K | ~830K |
| Compaction frequency | Every 20-30 min on complex tasks | ~15% fewer events overall |
| Files loadable | Small project | Entire monorepo |
| Media items per request | 100 | 600 |
| Long-context pricing | Premium ($10/$37.50 for Opus) | Same rate as short requests |
| Beta header required | Yes (over 200K) | No |
What Actually Changed At GA
The big window had been in beta for months. GA is about dropping the friction that made beta feel second-class.
Flat pricing across the whole window. Long context no longer carries a premium. Opus 4.6 is $5/$25 per million tokens (input/output). Sonnet 4.6 is $3/$15. Your 10K request and your 950K request bill at the same per-token rate.
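The flat-rate claim is easy to sanity-check with arithmetic. A minimal sketch, using the per-million-token rates quoted above; the token counts are illustrative:

```python
# Per-million-token rates quoted in the article (input, output), in USD.
RATES = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD at flat per-token rates -- no long-context premium."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A 10K-token request and a 950K-token request bill at the same per-token rate:
small = request_cost("opus-4.6", 10_000, 1_000)   # $0.05 input + $0.025 output = $0.075
large = request_cost("opus-4.6", 950_000, 1_000)  # $4.75 input + $0.025 output = $4.775
```

The large request costs more in absolute terms, but only because it uses more tokens; the per-token rate never changes with request size.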
Full rate limits everywhere. Longer requests used to get throttled harder during beta. That cap is gone. A 1M-token call pulls the same throughput as a short one.
600 media items in one request. Images and PDF pages used to cap at 100. The new ceiling is 6x higher at 600. For design system work, doc review, or contract stacks, this is a real lift.
No header toggle. Requests above 200K used to need an anthropic-beta header. Any existing headers just get ignored now. The API handles it.
Live on multi-cloud. You get the 1M window on Claude Platform, Microsoft Azure Foundry, and Google Cloud Vertex AI.
Why Claude Code Feels Different Now
API users get a pricing and convenience win here. Claude Code users get something structural.
Compaction Fires Less Often
Anyone who has pushed Claude Code on real work knows the compaction tax. You load files, chain tool calls, build up reasoning, and then auto-compaction fires. Claude squeezes the conversation to free space. Nuance gets lost. Edge cases disappear. Multi-step tasks drop the thread halfway through.
Jon Bell, Anthropic's CPO, put a figure on it: compaction events dropped 15% since the big window shipped. Not a lab benchmark. This is measured on real Claude Code traffic. Agents keep their context and push through hours of work without forgetting what they loaded at the start.
Curious about the mechanics of when compaction fires? See the context buffer management guide. The short story: Claude Code holds back a buffer around 33K tokens, then compacts when usage hits roughly 83.5%. A 1M ceiling means you have about 5x the room before you hit that line.
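The headroom math above can be sketched directly. The ~33K buffer and ~83.5% threshold are the approximate figures quoted from the guide, not exact internals:

```python
# Approximate figure from the context buffer management guide:
# Claude Code auto-compacts when usage crosses roughly 83.5% of the window.
COMPACTION_THRESHOLD = 0.835

def usable_tokens(window: int, threshold: float = COMPACTION_THRESHOLD) -> int:
    """Tokens you can spend before auto-compaction fires."""
    return round(window * threshold)

old = usable_tokens(200_000)    # ~167K, matching the table above
new = usable_tokens(1_000_000)  # ~835K
print(f"headroom gain: {new / old:.1f}x")  # about 5x the working room
```

Those two outputs are where the table's "usable tokens" row comes from: the window grows 5x, and since the threshold is a fixed percentage, the pre-compaction headroom grows 5x with it.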
Whole Codebases In One Shot
At 200K, you had roughly 167K tokens of working space once the buffer was reserved. Fine for a small repo. Painful on anything larger, because you were constantly picking files.
Bump that to 1M and your usable headroom is ~830K. That is thousands of source files. A whole monorepo. Full docs next to the code they describe. Claude can hold the API layer and the frontend that calls it, the migration and the schema it changes, the test file and the code under test. All at once. You stop hand-picking which files to load.
Agent Traces That Actually Finish
This is the payoff for agent teams and complex orchestration runs. Every tool call, every reasoning step, every file read piles into context. At 200K, a multi-agent session on real work chewed through the budget in 20 to 30 minutes.
Anton Biryukov, a software engineer at Ramp, described the old pattern: "Claude Code can burn 100K+ tokens searching Datadog, Braintrust, databases, and source code. Then compaction kicks in." At 1M, he searches, searches again, collects edge cases, and ships fixes. All inside one session. Nothing gets dropped on the way.
Can The Model Really Use 1M Tokens?
A huge context is worthless if the model cannot actually recall and reason over what lives inside it. Anthropic ran two benchmarks built to test exactly that at the 1M mark.
Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens. MRCR (Multi-Round Coreference Resolution) checks whether a model can track entities and the links between them across a huge prompt. Nearly 80% accuracy over a million tokens means the model is not just storing the words. It still knows how distant pieces connect.
Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens. This test measures how well the model walks graph structures planted deep inside long inputs. Can it trace chains of references across hundreds of thousands of tokens? Both scores are listed as the top marks for frontier models at those context lengths.
In practice, this means Claude can still locate the helper function you defined 500K tokens ago and see how it hooks into the component you are editing right now.
How To Use It In Your Workflow
Change What You Do
Stop hand-managing file inclusion. Every @file call used to be a tradeoff at 200K. At 1M, just load what you need and move on. Pull in the test file with the implementation. Pull in the types with the component. Give Claude the whole picture.
Run sessions longer. The habit of restarting every 30 minutes came from survival, not preference. With 5x the ceiling, a session can run for hours on hard tasks. Restart when you genuinely switch focus, not because you are nervous about the buffer. For rules on when to compact and when to keep going, see the context management guide.
Lean into multi-step agents. The real payoff is not the quick edit. It is the kind of work where Claude has to research, plan, implement, and check across lots of files. That chain used to snap when compaction fired mid-task. It now fits in one window without drama.
Rethink your context engineering playbook. Your loading and preservation strategies still count. They just have more oxygen. The fundamentals from our context management guide still hold. The pressure shifts from "stay alive under 200K" to "use 1M well."
What Does Not Change
Context hygiene still matters. A 1M ceiling is not a cue to pile everything in and hope Claude sorts it out. Irrelevant files burn tokens and thin out the signal Claude uses to focus.
CLAUDE.md, skills-first loading, and clean session management are still best practice. They just get more breathing room. If you already follow the usage optimization patterns, the big window pays you back even more.
Who Gets The 1M Window
On Claude Code, Max, Team, and Enterprise plans get the 1M window automatically with Opus 4.6. Nothing to toggle. The extra usage allocation that long-context requests used to need is gone.
API users get it at standard per-token rates. Opus 4.6 at $5/$25 per million tokens. Sonnet 4.6 at $3/$15. No premium tier for long context.
The 200K window is still around as the default for standard API requests and lower-tier plans. The 1M option is tied specifically to Opus 4.6 and Sonnet 4.6.
What This Signals
Anthropic is not just making context windows larger. They are stripping out the tradeoffs that made large windows annoying to use. Flat pricing means you do not budget long requests differently. Full rate limits mean you do not lose throughput. Killing the beta header means existing code just runs.
The direction is obvious. Context management is shifting from a user job to an infrastructure job. Models keep getting better at using long context. Pricing keeps the door open. The tooling sorts itself out.
For Claude Code users, the takeaway is simple. Your agents think longer and remember more. Build your workflows on that, and the tasks that used to demand careful session management and hand-picked context start to just work. End to end. In one window.
Related Resources
- Context Buffer Management -- How auto-compaction works and the 33K token buffer
- Context Engineering -- The six pillars framework for loading context strategically
- Context Management -- Strategies for keeping critical context intact across sessions
- Model Selection Guide -- Choosing between Opus 4.6 and Sonnet 4.6 for different tasks
Stop configuring. Start building.