Claude Opus 4.6
Anthropic's upgraded Opus flagship ships with 1M context GA, 128K output, and the same $5/$25 pricing.
Opus 4.6 is the upgraded flagship from Anthropic. Planning takes more thought. Agent runs stay coherent for longer. Large codebases feel less hostile. And Claude catches its own bugs before you do. It is also the first Opus-class release to ship with a generally available 1M token window, and response output now reaches 128K tokens.
Coding is the headline change. Pricing stays flat at $5/$25 per million tokens, and scores on the hardest public evals moved up across the board. The raw numbers sit in the benchmark section below.
Key Specs
| Spec | Details |
|---|---|
| API ID | claude-opus-4-6 |
| Release Date | February 5, 2026 |
| Context Window | 1M tokens (GA as of March 2026) |
| Max Output | 128,000 tokens |
| Pricing | $5 input / $25 output per 1M tokens |
| Status | Active, current recommended Opus |
What Changed: The Coding Improvements
Anthropic dogfoods Claude. Every Anthropic engineer lives in Claude Code day to day, and nothing ships until it survives the internal use case first. The 4.6 gains are concrete and practical.
Planning is more careful. Before committing to an approach, the model sits with the problem longer, loops back through its own reasoning, notices logic errors sooner, and lands a stronger first draft on hard tasks.
Agent runs stay coherent. Older models drifted after a while. Here, focus holds across long sessions. A workflow that fires off tool call after tool call, dozens deep, reaches the finish line more often now.
Large codebases feel less hostile. Navigating big projects, reading them, and changing them all improved. Claude keeps a clearer picture of layout and house conventions across a long session.
Review and debugging hit harder. Catching its own mistakes is noticeably better, and reviews read more thoroughly. Tracing a bug through a chain of dependencies now needs far less hand-holding from you.
Easy work moves fast. The deeper reasoning gets saved for the hard steps, and Opus 4.6 no longer dwells on the obvious ones. Catch it overthinking something simple? Drop the default from high to medium with /effort.
Benchmark Results
New records landed in several categories.
| Benchmark | Score | Notable Comparison |
|---|---|---|
| Terminal-Bench 2.0 | 65.4% | GPT-5.2: 64.7% |
| GDPval-AA Elo | 1,606 | 144 Elo above GPT-5.2, 190 above Opus 4.5 |
| Humanity's Last Exam | Leading | Highest among all frontier models |
| BrowseComp | Leading | Best at finding hard-to-locate information online |
| OSWorld | 72.7% | State-of-the-art for computer use |
| MRCR v2 (8-needle) | 78.3% | Highest among frontier models at 1M context |
Inside Claude Code, the benchmark to watch is Terminal-Bench 2.0. It scores real terminal work across coding, sysadmin tasks, and file handling. Top slot here means Opus 4.6 is the strongest pick for what a developer actually does at the command line all day.
GDPval-AA sits on the opposite end of the eval spectrum. It measures knowledge work that drives real economic value, in finance, legal, and the rest of the white-collar stack. The lead over the next-best industry model is wide.
The MRCR v2 number matters for a different reason. "Context rot" is the usual complaint, where answers degrade as the chat stretches on. That drift shrinks here. Across very long windows, Opus 4.6 keeps its grip on small details and pulls up buried facts the prior version missed. The 78.3% score is a real change in how much of the window Claude can put to work.
Humanity's Last Exam tests broad multidisciplinary reasoning, and no frontier model tops Opus 4.6 on it. BrowseComp scores how well the model digs up information that is genuinely hard to find online. OSWorld grades real desktop computer use. The new release takes the crown on all three.
1M Token Context Window and 128K Output
As of March 2026, the full 1M window is generally available, and token pricing is flat end to end. The per-token rate on a 900K-token call matches the rate on a 9K one. No beta header is needed. Any legacy beta headers get silently dropped.
Media limits grew 6x at the GA launch. The per-request ceiling is now 600 images or PDF pages, versus 100 before. Rate limits stay at their full values no matter how long the context gets.
Output also jumped. The ceiling moved from 16K tokens to 128K, which lets Claude finish larger-output jobs in a single call. Whole modules or long analyses can now come back in one response instead of being chopped across many.
Inside Claude Code, the full 1M window ships on by default for Max, Team, and Enterprise plans. Anthropic reports a 15% drop in compaction events, so long conversations now survive end to end without lossy summarization kicking in. Whatever context-management workflow you already use still works. You just bump into the ceiling less often.
Safety Profile
Smarter does not mean less safe. Anthropic runs an automated behavioral audit, and Opus 4.6 scored low on the behaviors that matter: deception, sycophancy, reinforcing user delusions, and going along with misuse. Its alignment sits level with Opus 4.5, the previous record holder for best-aligned frontier release.
Legitimate prompts also get through more often: Opus 4.6 posts the lowest over-refusal rate of any recent Claude release.
The cybersecurity number is the headline. On one internal run, the model surfaced 500+ previously unknown high-severity zero-day flaws sitting in open-source libraries. Anthropic is pushing this harder, aiming the model at OSS projects to hunt down and patch the flaws buried inside. Security teams can drop Opus 4.6 into code review as a first-pass vulnerability scanner.
New API and Product Features
The model upgrade landed alongside several new features.
Adaptive thinking. Extended thinking used to be a binary toggle. Claude now picks its own moments to think harder. With effort set to high, the default, extended thinking kicks in wherever it helps. Four tiers are available to developers: low, medium, high, and max.
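As a toy illustration of how a client might use the tiers, routine asks could be routed to a cheaper tier before a request ever goes out. Only the four tier names come from this release; the `pick_effort` helper and its keyword heuristic below are invented for illustration, not part of any documented API.

```python
# Hypothetical sketch: routing a task to an effort tier client-side.
# Tier names are from the release notes; the heuristic is made up.

EFFORT_TIERS = ("low", "medium", "high", "max")

def pick_effort(task: str, default: str = "high") -> str:
    """Crude heuristic: drop short, routine asks to 'medium'."""
    routine = ("rename", "format", "typo", "lint")
    if len(task) < 80 and any(k in task.lower() for k in routine):
        return "medium"
    return default

# pick_effort("fix the typo in README") -> "medium"
# pick_effort("design a migration plan for the billing service") -> "high"
```

In practice the model makes this call itself; the sketch just shows what a manual override policy could look like.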
Context compaction (beta). When a long chat drifts toward the context ceiling, Claude now summarizes and compacts it on its own. Long-running tasks keep going instead of running out of room.
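The compaction idea can be sketched in miniature. Everything below is a hypothetical approximation: the 4-characters-per-token estimate and the placeholder string stand in for the model's actual summarization, which happens server-side.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str], ceiling: int, keep_recent: int = 4) -> list[str]:
    """When the transcript nears the ceiling, fold older turns into a
    one-line summary placeholder and keep recent turns verbatim."""
    total = sum(estimate_tokens(t) for t in history)
    if total <= ceiling or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(older)} earlier turns]"
    return [summary] + recent
```

The point of the real feature is the same shape: the conversation shrinks in place, so the task keeps running instead of hitting the wall.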
Agent teams (Claude Code research preview). Multiple Claude instances can now run in parallel as one coordinated team. Read-heavy jobs that fan out into independent pieces, like codebase reviews, are the sweet spot. Everything else lives in the agent teams guide.
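The fan-out pattern the preview targets looks roughly like this, with a stub standing in for each agent. `review_file` and the thread pool are illustrative only, not the actual agent-teams API; the real feature coordinates Claude instances, not local threads.

```python
from concurrent.futures import ThreadPoolExecutor

def review_file(path: str) -> str:
    # Stub for one agent's independent, read-only task.
    return f"reviewed {path}"

def fan_out(paths: list[str], workers: int = 4) -> list[str]:
    # Independent pieces run in parallel; results come back in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(review_file, paths))
```

The reason read-heavy jobs are the sweet spot is visible in the sketch: each worker needs no coordination with the others, so parallelism is nearly free.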
Claude in PowerPoint (research preview). Layouts, fonts, and slide masters all get parsed by Claude so the output stays on brand, whether it is filling a template or spinning a deck up from scratch. Available on the Max, Team, and Enterprise plans.
Pricing
No price bump. The 1M window ships with unified pricing across the whole context length. The old 200K+ premium tier has been retired.
| Tier | Cost |
|---|---|
| All contexts | $5 input / $25 output per 1M tokens |
| Pro plan | $20/month |
| Max plan | $100/month |
Been on Opus 4.5 with your spend dialed in? The jump to 4.6 is free upside at the old price. And if your long-context calls were paying the premium-tier rate, your bill just dropped.
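The flat-rate claim is easy to verify with arithmetic. A minimal cost helper, using only the rates from the table above:

```python
PRICE_IN = 5.00 / 1_000_000    # $ per input token
PRICE_OUT = 25.00 / 1_000_000  # $ per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    # Unified pricing: the same per-token rate at any context length.
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# A 900K-token prompt costs exactly 100x a 9K one: $4.50 vs $0.045.
```

A maxed-out call, 900K in and 128K out, works out to $4.50 + $3.20 = $7.70, with no surcharge for crossing the old 200K line.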
How to Use Opus 4.6 in Claude Code
One command changes the default model:

```shell
claude config set model claude-opus-4-6
```

For a single session, override it without touching the default:

```shell
claude --model claude-opus-4-6
```
The model ships everywhere: claude.ai, the Messages API, AWS Bedrock, and Google Vertex AI. On the API, the identifier to use is claude-opus-4-6.
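For API users, a minimal request body looks like the sketch below, assuming the standard Messages API shape (model, max_tokens, messages). Only the model id and the 128K output ceiling come from this release; the helper itself is illustrative and does not send anything.

```python
def build_request(prompt: str, max_tokens: int = 128_000) -> dict:
    """Construct a Messages API request body (no network call here).
    max_tokens can now go up to 128,000, the new output ceiling."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Pass the resulting dict to whichever client you already use; the only 4.6-specific parts are the model id and the higher output ceiling.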
Opus 4.6 vs Opus 4.5: What Changed
| Feature | Opus 4.5 | Opus 4.6 |
|---|---|---|
| Context window | 200K (standard), 1M (API beta) | 1M (GA, unified pricing) |
| Max output tokens | 16,384 | 128,000 |
| Terminal-Bench 2.0 | Not tested on v2.0 | 65.4% (highest) |
| GDPval-AA Elo | 1,416 | 1,606 (+190 points) |
| MRCR v2 | Not tested | 78.3% |
| Over-refusals | Low | Lowest of any recent model |
| Adaptive thinking | Not available | Built in |
| Context compaction | Auto at 95% | Configurable threshold (beta) |
| Standard pricing | $5/$25 per 1M | $5/$25 per 1M (unchanged) |
Coding quality and longer agent runs are the headline gains. Everything 4.5 already did well carries over too: multi-agent delegation, token efficiency, the effort parameter. Day to day, the practical wins in Claude Code are the bigger output ceiling and adaptive thinking.
Model selection is simple. Reach for Opus 4.6 when reasoning depth is what the job needs. Sonnet is the right call on smaller tasks that want speed over depth. Pricing is now at parity, so there's no reason left on the bill to stick with the older flagship.
Stop configuring. Start building.