Build This Now
Build This Now
Claude Code ModelsDeepSeek V4: Pricing, Context, and MigrationClaude Code Quality Regression: What Actually HappenedClaude Opus 4.7 vs GPT-5.5Claude Opus 4.7 vs Other AI ModelsClaude Mythos: The Model That Thinks in LoopsClaude Opus 4.5 in Claude CodeClaude Opus 4.7Claude Opus 4.7 vs 4.6Claude Opus 4.7 Use CasesClaude Opus 4.6Claude Sonnet 4.6Claude Opus 4.5Claude Sonnet 4.5Claude Haiku 4.5Claude Opus 4.1Claude 4Claude 3.7 SonnetClaude 3.5 Sonnet v2 and Claude 3.5 HaikuClaude 3.5 SonnetClaude 3Every Claude Model
speedy_devvkoen_salo
Blog/Model Picker/Claude Opus 4.6

Claude Opus 4.6

Claude Opus 4.6 ships February 2026 with 1M context GA, 128K max output, and the same $5/$25 pricing. Sharper planning, longer agent runs, big-codebase gains.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Mar 29, 2026Model Picker hub

Opus 4.6 is the upgraded flagship from Anthropic. Planning takes more thought. Agent runs stay coherent for longer. Large codebases feel less hostile. And Claude catches its own bugs before you do. It is also the first Opus-class release to ship with a generally available 1M token window, and response output now reaches 128K tokens.

Coding is the headline change, and the price tag stays flat at $5/$25 per million tokens while scores on the hardest public evals moved up across the board. The raw numbers sit in the benchmark section below.

Key Specs

SpecDetails
API IDclaude-opus-4-6
Release DateFebruary 5, 2026
Context Window1M tokens (GA as of March 2026)
Max Output128,000 tokens
Pricing$5 input / $25 output per 1M tokens
StatusActive, current recommended Opus

What Changed: The Coding Improvements

Anthropic dogfoods Claude. Every Anthropic engineer lives in Claude Code day to day, and nothing ships until it survives the internal use case first. The 4.6 gains are concrete and practical.

Planning is more careful. Before committing to an approach, the model sits with the problem longer, loops back through its own reasoning, notices logic errors sooner, and lands a stronger first draft on hard tasks.

Agent runs stay coherent. Older models drifted after a while. Here, focus holds across long sessions. A workflow that fires off tool call after tool call, dozens deep, reaches the finish line more often now.

Large codebases feel less hostile. Navigating big projects, reading them, and changing them all improved. Claude keeps a clearer picture of layout and house conventions across a long session.

Review and debugging hit harder. Catching its own mistakes is noticeably better, and reviews read more thoroughly. Tracing a bug through a chain of dependencies now needs far less hand-holding from you.

Easy work moves fast. The deeper reasoning gets saved for the hard steps, and Opus 4.6 no longer dwells on the obvious ones. Catch it overthinking something simple? Drop the default from high to medium with /effort.

Benchmark Results

New records landed in several categories.

BenchmarkScoreNotable Comparison
Terminal-Bench 2.065.4%GPT-5.2: 64.7%
GDPval-AA Elo1,606144 Elo above GPT-5.2, 190 above Opus 4.5
Humanity's Last ExamLeadingHighest among all frontier models
BrowseCompLeadingBest at finding hard-to-locate information online
OSWorld72.7%State-of-the-art for computer use
MRCR v2 (8-needle)78.3%Highest among frontier models at 1M context

Inside Claude Code, the benchmark to watch is Terminal-Bench 2.0. It scores real terminal work across coding, sysadmin tasks, and file handling. Top slot here means Opus 4.6 is the strongest pick for what a developer actually does at the command line all day.

GDPval-AA sits on the opposite end of the eval spectrum. It measures knowledge work that drives real economic value, in finance, legal, and the rest of the white-collar stack. The lead over the next-best industry model is wide.

The MRCR v2 number matters for a different reason. "Context rot" is the usual complaint, where answers degrade as the chat stretches on. That drift shrinks here. Across very long windows, Opus 4.6 keeps its grip on small details and pulls up buried facts the prior version missed. The 78.3% score is a real change in how much of the window Claude can put to work.

Humanity's Last Exam tests broad multidisciplinary reasoning, and no frontier model tops Opus 4.6 on it. BrowseComp scores how well the model digs up information that is genuinely hard to find online. OSWorld grades real desktop computer use. The new release takes the crown on all three.

1M Token Context Window and 128K Output

As of March 2026, the full 1M window is generally available, and token pricing is flat end to end. The per-token rate on a 900K-token call matches the rate on a 9K one. No beta header is needed. Any legacy beta headers get silently dropped.

Media limits grew 6x at the GA launch. The per-request ceiling is now 600 images or PDF pages, versus 100 before. Rate limits stay at their full values no matter how long the context gets.

Output also jumped. The ceiling moved from 16K tokens to 128K, which lets Claude finish larger-output jobs in a single call. Whole modules or long analyses can now come back in one response instead of being chopped across many.

Inside Claude Code, the full 1M window ships on by default for Max, Team, and Enterprise plans. Anthropic reports a 15% drop in compaction events, so long conversations now survive end to end without lossy summarization kicking in. Whatever context-management workflow you already use still works. You just bump into the ceiling less often.

Safety Profile

Smarter does not mean less safe. Anthropic runs an automated behavioral audit, and Opus 4.6 scored low on the behaviors that matter: deception, sycophancy, reinforcing user delusions, and going along with misuse. Its alignment sits level with Opus 4.5, the previous record holder for best-aligned frontier release.

Legitimate prompts also come through more often. Opus 4.6 posts the lowest rate of over-refusals in any recent Claude release. Real requests get blocked less.

The cybersecurity number is the headline. On one internal run, the model surfaced 500+ previously unknown high-severity zero-day flaws sitting in open-source libraries. Anthropic is pushing this harder, aiming the model at OSS projects to hunt down and patch the flaws buried inside. Security teams can drop Opus 4.6 into code review as a first-pass vulnerability scanner.

New API and Product Features

The model upgrade landed alongside several new features.

Adaptive thinking. Extended thinking used to be a binary toggle. Claude now picks its own moments to think harder. With effort set to high (the default), extended thinking kicks in wherever it helps. Four tiers are available to developers: low, medium, high (default), and max.

Context compaction (beta). When a long chat drifts toward the context ceiling, Claude now summarizes and compacts it on its own. Long-running tasks keep going instead of running out of room.

Agent teams (Claude Code research preview). Multiple Claude instances can now run in parallel as one coordinated team. Read-heavy jobs that fan out into independent pieces, like codebase reviews, are the sweet spot. Everything else lives in the agent teams guide.

Claude in PowerPoint (research preview). Layouts, fonts, and slide masters all get parsed by Claude so the output stays on brand, whether it is filling a template or spinning a deck up from scratch. Available on the Max, Team, and Enterprise plans.

Pricing

No price bump. The 1M window ships with unified pricing across the whole context length. The old 200K+ premium tier has been retired.

TierCost
All contexts$5 input / $25 output per 1M tokens
Pro plan$20/month
Max plan$100/month

Been on Opus 4.5 with your spend dialed in? The jump to 4.6 is free upside at the old price. And if long-context calls were paying the premium tier, the bill just dropped.

How to Use Opus 4.6 in Claude Code

One command changes the default model:

claude config set model claude-opus-4-6

For a single session, override it without touching the default:

claude --model claude-opus-4-6

The model ships everywhere: claude.ai, the Messages API, AWS Bedrock, and Google Vertex AI. On the API, the identifier to use is claude-opus-4-6.

Opus 4.6 vs Opus 4.5: What Changed

FeatureOpus 4.5Opus 4.6
Context window200K (standard), 1M (API beta)1M (GA, unified pricing)
Max output tokens16,384128,000
Terminal-Bench 2.0Not tested on v2.065.4% (highest)
GDPval-AA Elo1,4161,606 (+190 points)
MRCR v2Not tested78.3%
Over-refusalsLowLowest of any recent model
Adaptive thinkingNot availableBuilt in
Context compactionAuto at 95%Configurable threshold (beta)
Standard pricing$5/$25 per 1M$5/$25 per 1M (unchanged)

Coding quality and longer agent runs are the headline gains. Everything 4.5 already did well carries over too: multi-agent delegation, token efficiency, the effort parameter. Day to day, the practical wins in Claude Code are the bigger output ceiling and adaptive thinking.

Model selection is simple. Reach for Opus 4.6 when reasoning depth is what the job needs. Sonnet is the right call on smaller tasks that want speed over depth. Pricing is now at parity, so there's no reason left on the bill to stick with the older flagship.

More in Model Picker

  • Claude Mythos: The Model That Thinks in Loops
    Claude Mythos is suspected to use recurrent-depth architecture: one shared layer looped N times, with ACT halting so hard questions get more passes and easy ones stop early.
  • Claude Opus 4.7 vs Other AI Models
    Claude Opus 4.7, GPT-5.4, Kimi K2.6, Gemini 3.1 Pro, DeepSeek V3.2: benchmarks, context windows, agent reliability, and cost, so you reach for the right one.
  • DeepSeek V4: Pricing, Context, and Migration
    DeepSeek V4 ships two models: V4-Flash at $0.28/M output and V4-Pro at $3.48/M. Both carry a genuine 1M context window and drop into any Anthropic-compatible SDK with one line changed.
  • Every Claude Model
    Every Claude model on one page: Claude 3, 3.5, 3.7, 4, Opus 4.1 to 4.6, Sonnet 4.5 and 4.6, Haiku 4.5. Specs, pricing, benchmarks, and when to use each.
  • Claude 3.5 Sonnet v2 and Claude 3.5 Haiku
    Claude 3.5 Sonnet v2 and 3.5 Haiku launched October 2024 with Computer Use beta, cursor control, upgraded coding and tool use, and cheaper Haiku at $0.80/$4.
  • Claude 3.5 Sonnet
    Claude 3.5 Sonnet launched June 2024 at $3/$15, beating Claude 3 Opus on MMLU, GPQA, HumanEval at a fifth of the cost. Specs, benchmarks, and code gains.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

On this page

Key Specs
What Changed: The Coding Improvements
Benchmark Results
1M Token Context Window and 128K Output
Safety Profile
New API and Product Features
Pricing
How to Use Opus 4.6 in Claude Code
Opus 4.6 vs Opus 4.5: What Changed

Stop configuring. Start building.

SaaS builder templates with AI orchestration.