Claude Opus 4.8 vs Sonnet 4.6: Which to Use for Coding

Q: Should I use Opus 4.8 or Sonnet 4.6 for coding?

Use Sonnet 4.6 as your default. It costs $3 input / $15 output per million tokens and was preferred over the previous Opus flagship on most coding sessions in Anthropic's testing. Switch to Opus 4.8 ($5/$25) for long autonomous runs where the model works for hours without you, because its stronger calibration means it flags its own uncertain or buggy output instead of presenting it confidently. Day-to-day coding: Sonnet 4.6. Long unattended agentic work: Opus 4.8.

Q: Is Opus 4.8 better than Sonnet 4.6 at coding?

On raw agentic benchmarks, yes: Opus 4.8 scores 88.6% on SWE-Bench Verified and leads SWE-Bench Pro at 69.2%. But Sonnet 4.6 is good enough that developers preferred it over the prior Opus flagship on 59% of sessions, at 40% lower cost. Opus 4.8 is better; Sonnet 4.6 is better value for most work. The gap matters most on long, autonomous tasks.

Q: How much cheaper is Sonnet 4.6 than Opus 4.8?

Sonnet 4.6 is $3 input / $15 output per million tokens. Opus 4.8 is $5 input / $25 output. That makes Sonnet 40% cheaper per token on both input and output, so the dollar savings grow in step with every token a long session burns. If you run on a Claude Code subscription rather than the API, both models draw from the same plan, so the model you pick mostly affects how fast you hit your limit.

Use Sonnet 4.6 as your default coding model and switch to Opus 4.8 for long autonomous runs. Sonnet 4.6 costs 40% less ($3/$15 versus $5/$25 per million tokens) and was preferred over the previous Opus flagship on most coding sessions. Opus 4.8 wins when a task runs for hours unattended, because its stronger calibration means it tells you when its own output is shaky.

That one rule covers most cases. The detail below tells you when to break it.

The two models at a glance

	Sonnet 4.6	Opus 4.8
Role	Balanced default	Long-horizon flagship
Price (per 1M tokens)	$3 in / $15 out	$5 in / $25 out
Context window	1M (GA)	1M
Max output	16,384 tokens	128,000 tokens
SWE-Bench Verified	strong mid-tier	88.6%
SWE-Bench Pro	solid	69.2% (leads the field)
Headline strength	Best value, reads code well	Calibration and honesty on long runs

Both carry a 1M-token context, so context size is not what separates them: either can hold a large codebase in view at once. The difference is reasoning depth, output ceiling, and how much you can trust a long unattended run.

Why Sonnet 4.6 is the default

Sonnet 4.6 is the model that started beating last generation's flagship. In Anthropic's internal Claude Code testing, developers preferred it over Sonnet 4.5 about 70% of the time, and over Opus 4.5 (the prior frontier model) on 59% of coding sessions. A mid-tier model outscoring an Opus model on developer preference, at $3/$15, is why it is the sensible default.

It also got better at the thing that makes AI edits annoying. Sonnet 4.6 reads the surrounding code before it changes anything, picks up house conventions, folds shared logic into one place instead of duplicating it, and backs off the over-eager refactors older models loved. For everyday feature work, that behavior matters more than a few benchmark points. See the full Sonnet 4.6 breakdown.

Why Opus 4.8 wins the long runs

Opus 4.8's headline is not raw coding skill, though it leads SWE-Bench Pro at 69.2% and scores 88.6% on SWE-Bench Verified. The real upgrade is calibration: it is far less likely to let its own bugs pass unflagged. When you hand a model hours of autonomous work, there is no human watching each step to catch a confident mistake, so the model's honesty about its own output becomes the load-bearing feature.

That is why Opus 4.8 is the pick for long agentic sessions and for Dynamic Workflows, where one model plans a job, spins up many parallel subagents, and verifies their output before reporting back. It also has a 128,000-token output ceiling versus Sonnet's 16,384, which matters when a single step needs to produce a lot of code at once. The full Opus 4.8 breakdown goes deeper.

When to pick which

Your task	Pick
Everyday feature work, edits, bug fixes	Sonnet 4.6
Tight budget or token-metered API use	Sonnet 4.6
A long autonomous session running for hours	Opus 4.8
Multi-agent or Dynamic Workflows runs	Opus 4.8
One step that must output a lot of code at once	Opus 4.8
You want the cheapest model that still wins most sessions	Sonnet 4.6

A practical workflow is to run Sonnet 4.6 by default and reach for Opus 4.8 when a task is large, unattended, or high-stakes enough that you will not be reading every line. For the broader lineup including Fable 5 and Haiku, see model selection and the best AI coding model in 2026. If your jobs run for many hours, also weigh Fable 5 vs Opus 4.8.
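The table and the workflow above can be sketched as a small routing function. This is a minimal illustration, not an official selection rule: the `Task` fields, the `pick_model` helper, the model labels, and the one-hour cutoff for a "long" run are all assumptions made for the example. Only the output ceilings come from the comparison table in this article.

```python
from dataclasses import dataclass

# Output ceilings as listed in this article's comparison table.
# The keys are plain labels for this sketch, not real API model IDs.
MAX_OUTPUT_TOKENS = {"sonnet-4.6": 16_384, "opus-4.8": 128_000}


@dataclass
class Task:
    """Hypothetical description of a coding job (fields are assumptions)."""
    unattended_hours: float = 0.0    # how long the model runs without review
    multi_agent: bool = False        # multi-agent or Dynamic Workflows run
    expected_output_tokens: int = 0  # largest single-step output you expect
    high_stakes: bool = False        # you will not be reading every line


def pick_model(task: Task, long_run_hours: float = 1.0) -> str:
    """Default to Sonnet 4.6; escalate to Opus 4.8 per the table above.

    long_run_hours is an assumed threshold; tune it to your own workflow.
    """
    if task.multi_agent or task.high_stakes:
        return "opus-4.8"
    if task.unattended_hours >= long_run_hours:
        return "opus-4.8"
    if task.expected_output_tokens > MAX_OUTPUT_TOKENS["sonnet-4.6"]:
        return "opus-4.8"
    return "sonnet-4.6"


print(pick_model(Task()))                               # everyday edit
print(pick_model(Task(unattended_hours=3)))             # long autonomous run
print(pick_model(Task(expected_output_tokens=50_000)))  # one very large output
```

Sonnet 4.6 is the fall-through case on purpose, so that a task has to earn the more expensive model rather than the other way around.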

A note on cost if you use a subscription

The $3/$15 versus $5/$25 gap matters most on the API, where you pay per token. If you run Claude Code on a Pro or Max subscription, both models draw from the same plan, so picking Opus 4.8 mostly means you hit your usage limit faster, not that you pay more per task. Either way, default to Sonnet 4.6 and spend Opus 4.8 where its calibration earns its keep. For the plan math, see Claude Code pricing.
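For API users, the per-token gap is easy to work out for a given session. The sketch below uses the list prices quoted in this article. The session size (2M input tokens, 200k output tokens), the `session_cost` helper, and the model labels are hypothetical, and the calculation ignores any caching or batch discounts.

```python
# Per-million-token API prices quoted in this article (USD).
# The keys are plain labels for this sketch, not real API model IDs.
PRICES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one session at the prices above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical session: 2M input tokens, 200k output tokens.
sonnet = session_cost("sonnet-4.6", 2_000_000, 200_000)
opus = session_cost("opus-4.8", 2_000_000, 200_000)
savings = 1 - sonnet / opus

print(f"Sonnet 4.6: ${sonnet:.2f}")   # Sonnet 4.6: $9.00
print(f"Opus 4.8:   ${opus:.2f}")     # Opus 4.8:   $15.00
print(f"Savings:    {savings:.0%}")   # Savings:    40%
```

Because both the input and output prices are 40% lower, the percentage saved is the same for any mix of input and output tokens. Only the dollar amount changes with session size.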

FAQ

Should I use Opus 4.8 or Sonnet 4.6 for coding? Default to Sonnet 4.6 at $3/$15; it was preferred over the prior Opus flagship on most coding sessions. Switch to Opus 4.8 ($5/$25) for long autonomous runs, where its stronger calibration flags its own shaky output instead of presenting it confidently.

Is Opus 4.8 better than Sonnet 4.6 at coding? On benchmarks, yes (88.6% SWE-Bench Verified, 69.2% SWE-Bench Pro). But Sonnet 4.6 is good enough that developers preferred it over the previous Opus flagship on 59% of sessions at 40% lower cost. Opus 4.8 is better; Sonnet 4.6 is better value for most work.

How much cheaper is Sonnet 4.6 than Opus 4.8? Sonnet 4.6 is $3/$15 per million tokens versus Opus 4.8's $5/$25, which is 40% cheaper per token, so the dollar savings grow with every token on long, token-heavy sessions. On a subscription, both draw from the same plan.

Which model does Claude Code use by default? You choose. Many builders set Sonnet 4.6 as the working default and switch to Opus 4.8 for long autonomous or multi-agent runs. Both are available on Claude Code plans.