Build This Now
Build This Now
Modèles Claude CodeOpus 4.8 CheatsheetDeepSeek V4: Pricing, Context, and MigrationRégression de qualité de Claude Code : ce qui s'est vraiment passéClaude Opus 4.7 vs GPT-5.5Claude Opus 4.7 face aux autres modèles IAClaude Mythos : le modèle qui pense en bouclesClaude Opus 4.5 dans Claude CodeClaude Opus 4.7Claude Opus 4.7 vs 4.6À quoi sert vraiment Claude Opus 4.7Claude Opus 4.6Claude Sonnet 4.6Claude Opus 4.5Claude Sonnet 4.5Claude Haiku 4.5Claude Opus 4.1Claude 4Claude 3.7 SonnetClaude 3.5 Sonnet v2 et Claude 3.5 HaikuClaude 3.5 SonnetClaude 3Tous les modèles Claude
speedy_devvkoen_salo
Blog/Model Picker/Opus 4.8 Cheatsheet

Opus 4.8 Cheatsheet

Claude Opus 4.8, Anthropic's May 2026 flagship: the headline change is honesty. 4x less likely to let its own bugs pass unflagged, runs longer solo, Dynamic Workflows, same $5/$25 pricing.

Arrêtez de configurer. Commencez à construire.

Templates SaaS avec orchestration IA.

Published May 28, 202614 min readModel Picker hub

Claude Opus 4.8 is a modest upgrade with one change that is not modest at all. The price is the same. The 1M context window is the same. The 128K output ceiling is the same. The benchmark gains over Opus 4.7 are real but incremental. The thing that actually moved is harder to put on a chart: the model is far more honest about its own work.

Anthropic's own framing calls this out as one of the most prominent improvements in the release. Opus 4.8 is about 4x less likely than Opus 4.7 to leave a flaw in code it wrote without flagging it. It is more willing to say it is unsure. It is less willing to present thin evidence as progress.

That sounds soft until you have lived through the opposite. The most expensive failure mode in agentic coding is not a syntax error. It is a model that is confidently wrong, keeps going, and reports success on a task it quietly broke three steps ago. Opus 4.8 is built to stop doing that.

Quick Verdict

Use Opus 4.8 when the cost of a confident mistake is high:

  • long-running agent runs you cannot babysit
  • migrations and refactors across many files
  • code review where a silently-introduced bug is expensive
  • work where "I am not sure" is more valuable than a plausible guess
  • Dynamic Workflows that fan out to many parallel subagents

Stay on Sonnet for fast daily edits where speed and cost beat maximum reasoning depth. Reach for Opus 4.8 the moment you are handing off work and walking away.

Key Specs

SpecDetails
API IDclaude-opus-4-8
Release dateMay 28, 2026
Context window1M tokens (200K on Microsoft Foundry)
Max output128,000 tokens
Pricing$5 input / $25 output per 1M tokens
Fast mode$10 / $50, 2.5x speed, 3x cheaper than prior fast modes
Thinking modeAdaptive thinking only
Effort levelslow, medium, high, xhigh, max
Default efforthigh (down from 4.7's xhigh in Claude Code)
StatusCurrent Opus flagship

The Headline Change: Honesty

Most model releases lead with a benchmark. This one leads with a behavior.

Opus 4.7 was strong, but it had a specific weakness that anyone running long agent sessions felt. It would sometimes confidently present work as progress despite thin evidence. It would invent a plausible fallback when the real data was missing. It would let a flaw it introduced slide by without mentioning it. None of those are dramatic in isolation. Across a two-hour autonomous run, they compound into work you cannot trust without re-checking every step yourself.

Opus 4.8 attacks that directly. The number Anthropic publishes is the cleanest summary: roughly 4x fewer cases where the model leaves a flaw in its own code unflagged, compared to 4.7. It is more likely to surface uncertainty about its own progress, and less likely to claim it finished something it only half-finished.

Here is what that looks like in a real Claude Code session. You ask it to change a type that is used in forty places. The old behavior was to make the change, watch the build pass, and report done, even if three call sites now rely on a runtime assumption that no longer holds. The new behavior is to pause and say it is not certain about a specific call site, check it, and tell you what it found before moving on. A Spotify staff engineer who tested the model described exactly this: in Claude Code it "asks the right questions, catches its own mistakes, and pushes back when a plan isn't sound" before making big changes.

That last part matters as much as the bug-catching. Pushing back on a bad plan is a form of honesty too. A model that quietly executes a flawed instruction is not being helpful. It is deferring a problem to your future self.

There is an alignment story underneath this. Anthropic reports that Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy and acting in the user's actual interest, and that misaligned behavior such as deception is substantially lower than 4.7, closer to Claude Mythos Preview, their best-aligned model. The practical translation is simple: the model is more likely to tell you the truth about what it did, including the parts that did not work.

Why Honesty Is the Right Headline for 2026

It is tempting to read "more honest" as a nice-to-have. It is actually the most important kind of progress for where agentic coding is going, and the rest of this release explains why.

The frontier is no longer bottlenecked on whether a model can write the code. Opus 4.7 could already write most of it. The bottleneck is whether you can trust the output of a long autonomous run without reading every line. The more autonomy you hand a model, the more its calibration matters, because there is no human watching each step to catch the confident mistake.

Opus 4.8 ships alongside Dynamic Workflows, where one model plans a job, spins up hundreds of parallel subagents in a single session, and verifies its own outputs before reporting back. Think about what that requires. When one coordinator is directing hundreds of agents faster than any human can follow, self-verification is the only thing standing between you and a pile of plausible-looking, subtly-wrong work. Honesty is not a personality upgrade here. It is the load-bearing feature that makes autonomy usable.

So the order of this release is deliberate. Honesty first, then the autonomy that depends on it.

Dynamic Workflows: One Brief, a Whole Swarm

Dynamic Workflows is the headline feature in Claude Code, currently a research preview on Enterprise, Team, and Max plans. The shape is: you give one brief, the model plans the work, it launches hundreds of parallel subagents in one session, it verifies their outputs, and it reports back.

Two things make it different from the subagent fan-out you may already know. First, it is adaptive, not a fixed plan. The agents re-prioritize based on what they find mid-run, so a discovery in one branch can reshape the others. Second, with Opus 4.8 the subagents run even longer before they need a human, which is only safe because of the honesty gains above.

The intended use case is codebase-scale work. A migration across hundreds of thousands of lines, run from kickoff to merge, with your existing test suite as the bar for what counts as done. That last detail is the whole design. The swarm is not trusted because it sounds confident. It is trusted because it has to make a real test suite pass, and because the coordinator flags what it could not verify instead of papering over it.

A concrete version of the workflow: point it at a deprecated internal API used across a large monorepo. It plans the migration, fans out subagents to handle clusters of call sites in parallel, runs the tests, and comes back with a merged result plus an honest list of the cases it was unsure about. The cases it flags are the ones you review. Everything that passed the suite and the model's own verification, you can trust. That division of labor only works if the flagging is reliable, which loops back to the honesty story.

Opus 4.8 vs Opus 4.7

The story is not "much smarter." It is "more trustworthy on long, unsupervised work, plus a steady benchmark bump."

AreaOpus 4.7Opus 4.8
Self-honesty on own codebaselineabout 4x fewer unflagged flaws
SWE-Bench Pro (agentic coding)64.3%69.2%
OSWorld-Verified (computer use)82.8%83.4%
GDPval-AA (knowledge work, Elo)17531890
SWE-Bench Verifiedlower88.6% (2nd, behind Mythos Preview)
Default Claude Code effortxhighhigh
Long-running autonomystrongruns independently longer
Tool callingoccasionally over-verbosefewer steps, fewer skipped calls
Cross-session memorylimitedlearns across sessions via the memory tool

The behavioral row at the top is the one to read first. The benchmark rows are the supporting evidence, not the point.

Runs Longer, Uses Tools Cleaner

Anthropic positions Opus 4.8 as building on 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer. In practice that means you can hand off a feature, a migration, or a bug sweep and let it run further before it needs you.

Tool use also got tighter. On CursorBench the tool calling is more efficient, meaning fewer steps for the same result. The Devin team reported that 4.8 fixes the comment-verbosity and tool-calling issues they saw in 4.7. It is also less likely to skip a tool call the task actually required, which was a real 4.7 complaint. Long-context handling improved too: fewer compactions during long sessions, and better recovery when a compaction does happen.

These are not flashy. They are exactly the traits that decide whether a four-hour agent run finishes clean or stalls in a loop.

Benchmarks That Matter

The full benchmark wall is useful for launch day, but only some numbers map to user value.

BenchmarkOpus 4.8Why it matters
SWE-Bench Pro69.2%Closest to real-world agentic coding, and 4.8 leads the field
SWE-Bench Verified88.6%Second only to Anthropic's unreleased Mythos Preview (93.9%)
OSWorld-Verified83.4%Strongest computer-use score in the comparison set
GDPval-AA (Elo)1890Leads on knowledge work, well ahead of 4.7 and GPT-5.5
Humanity's Last Exam49.8% / 57.9%No tools / with tools, ahead of the three rivals
Online-Mind2Web84%Strongest browser-agent score testers measured
Databricks Genie61% cheaperReasons over PDFs and diagrams at 61% lower token cost than 4.7

Read the spread this way. Opus 4.8 leads agentic coding, computer use, knowledge work, and reasoning. GPT-5.5 still edges raw terminal coding on Terminal-Bench 2.1 (78.2% to 4.8's 74.6%). The only model that clearly beats Opus 4.8 across the board is Anthropic's own unreleased Mythos. For a point release at the same price, that is a strong position.

If you are choosing a model for Claude Code, SWE-Bench Pro and the honesty improvement are the two signals worth weighting most. The first says it writes better code. The second says it tells you when it did not.

The Effort Dial Changed

This is the one migration detail that will surprise people. The default effort in Claude Code dropped from xhigh (the 4.7 default) to high.

That sounds like a downgrade. It is not. On coding work, Opus 4.8 at high spends a similar number of tokens as Opus 4.7 spent at its default, with better performance. You are getting more out of each token, so the sensible default moved down a notch. Anthropic also reports that 4.8 exceeds prior Opus models at every effort level, so even low and medium are stronger than they were.

The levels, low to high cost, are low, medium, high (default), xhigh (shown as "extra" in claude.ai), and max. The guidance is straightforward:

  • keep high for most serious coding sessions, since it is now the calibrated default
  • step up to xhigh for genuinely difficult tasks and long-running async workflows
  • reserve max for the hardest problems and ceiling testing
  • drop to medium or low only when you are cost-constrained across many parallel sessions

There is also a new effort control UI sitting next to the model selector in both claude.ai and Cowork, available on all plans, and Claude Code rate limits were raised to accommodate higher effort. The takeaway: stop treating the default as always correct, and start tuning effort to the task in front of you.

Memory Across Sessions

Opus 4.8 can use memory to learn across sessions and carry long-running work forward with minimal oversight. The mechanism is the memory tool, which lets the model create, read, update, and delete files that persist between conversations. Knowledge builds over time without bloating the context window, because it pairs with context editing and compaction: important tool results get saved to memory before they are cleared, then recalled later when they are relevant.

There is one practical rule that is easy to miss. Pair memory with max effort, or do not bother. At max effort the model writes notes that are actually worth keeping, even on tasks that would not otherwise need max. At lower effort the notes tend to be thin, and you get the storage overhead without the payoff. If cross-session memory is the reason you are reaching for 4.8, turn the effort up.

When to Reach for Opus 4.8

The pattern across every strong use case is the same: high autonomy plus high cost of a quiet mistake.

Long unsupervised agent runs. If you are kicking off work and walking away, the honesty gains are the whole point. You are no longer there to catch the confident wrong turn, so you need a model that catches its own.

Large migrations and refactors. This is the Dynamic Workflows sweet spot. Hand it a codebase-scale change, let the swarm run against your test suite, and review the cases it flags as uncertain.

Code review and debugging on incomplete evidence. Opus 4.8 is better at admitting the data is missing instead of inventing fallback logic, which is exactly what you want when the bug is subtle and the context is partial.

High-stakes knowledge work. The GDPval-AA jump and the Databricks Genie cost drop point to document and diagram reasoning that is both stronger and cheaper than 4.7. Legal, finance, and operations work where a fabricated detail is a real liability benefits from the calibration most.

Stay on Sonnet for fast daily edits, cheap bulk automation, and quick Q&A where speed wins and the cost of a small mistake is low.

Pricing and Cost

Opus 4.8 kept the standard pricing.

TierCost
Input$5 per 1M tokens
Output$25 per 1M tokens
Fast mode input$10 per 1M tokens
Fast mode output$50 per 1M tokens

Fast mode is the interesting line. It runs at 2.5x the speed and costs 3x less than previous fast modes, which makes "fast" a genuinely usable option now rather than a premium novelty.

Your real bill is shaped less by the headline rate and more by effort. Because high is now the calibrated default and the model exceeds prior Opus at every level, many workloads will cost less for the same quality than they did on 4.7 at xhigh. A few API-side changes also help cost: the prompt-cache minimum dropped to 1,024 tokens, and the Messages API now accepts mid-conversation system messages, so you can update instructions or token budgets mid-task without breaking the prompt cache. As always, measure on your actual workloads rather than assuming.

Should You Upgrade to Claude Opus 4.8?

Yes, if your pain points are:

  • agents that report success on work they quietly broke
  • long autonomous runs you cannot fully supervise
  • confident-but-wrong reasoning on partial context
  • large migrations and multi-file refactors
  • cross-session work where the model should remember what it learned

Maybe not as your daily default if your workload is mostly small edits, cheap bulk automation, or quick Q&A. Sonnet still wins on speed and cost there.

For most serious Claude Code users, the strategy is the familiar one with a sharper edge: keep Sonnet for fast everyday work, and make Opus 4.8 the flagship for anything you are going to trust without reading every line. That trust is exactly what this release was built to earn.

Frequently Asked Questions

What is the biggest change in Claude Opus 4.8?

Honesty about its own work. Opus 4.8 is roughly 4x less likely than Opus 4.7 to leave a flaw in its own code unflagged, more likely to admit uncertainty, and more willing to push back on a bad plan. Anthropic calls it one of the most prominent improvements in the release.

Is Claude Opus 4.8 worth it over Opus 4.7?

For long-running, unsupervised, or high-stakes work, yes. The benchmark gains are incremental (SWE-Bench Pro 64.3% to 69.2%), but the reliability gain on autonomous runs is large. If you mostly do small supervised edits, the upgrade is less urgent.

Why did the default effort drop to high?

Opus 4.8 at high spends about the same tokens as Opus 4.7 spent at its default, with better performance. You get more per token, so the calibrated default moved down from xhigh to high. Step up to xhigh for hard or long async tasks, and max for the hardest work.

What are Dynamic Workflows?

A Claude Code research preview, on Enterprise, Team, and Max plans, where the model plans a job, spins up hundreds of parallel subagents in one session, verifies their outputs, and reports back. It is adaptive rather than fixed, and it is built for codebase-scale migrations with your test suite as the bar.

How much does Claude Opus 4.8 cost?

Standard pricing is unchanged at $5 input and $25 output per 1M tokens. Fast mode is $10 / $50, runs 2.5x faster, and costs 3x less than previous fast modes. The model ID is claude-opus-4-8.

Does Claude Opus 4.8 remember across sessions?

Yes, through the memory tool, which persists files between conversations and pairs with context editing and compaction so knowledge builds without bloating context. For useful notes, pair memory with max effort.

Sources

  • Introducing Claude Opus 4.8
  • What's new in Claude Opus 4.8 (API docs)
  • Effort (API docs)
  • Memory tool (API docs)
  • Claude's new model is more 'honest' when it messes up (The Verge)
  • Anthropic launches Opus 4.8, with honesty as its killer feature (ZDNET)
  • Anthropic releases Opus 4.8 with new 'dynamic workflow' tool (TechCrunch)

Related Pages

  • Claude Opus 4.7
  • Claude Opus 4.7 use cases
  • Claude Code Models
  • Claude Mythos and OpenMythos
  • Claude Code Pricing and Token Usage

More in Model Picker

  • Claude Mythos : le modèle qui pense en boucles
    Claude Mythos utiliserait une architecture recurrent-depth : une seule couche partagée bouclée N fois, avec halting ACT pour que les questions difficiles reçoivent plus de passes et les faciles s'arrêtent tôt.
  • Claude Opus 4.7 face aux autres modèles IA
    Claude Opus 4.7, GPT-5.4, Kimi K2.6, Gemini 3.1 Pro, DeepSeek V3.2 : benchmarks, fenêtres de contexte, fiabilité agent et coût, pour choisir le bon modèle au bon moment.
  • DeepSeek V4: Pricing, Context, and Migration
    DeepSeek V4 ships two models: V4-Flash at $0.28/M output and V4-Pro at $3.48/M. Both carry a genuine 1M context window and drop into any Anthropic-compatible SDK with one line changed.
  • Tous les modèles Claude
    Tous les modèles Claude sur une seule page : Claude 3, 3.5, 3.7, 4, Opus 4.1 à 4.6, Sonnet 4.5 et 4.6, Haiku 4.5. Specs, tarifs, benchmarks, et quand utiliser chacun.
  • Claude 3.5 Sonnet v2 et Claude 3.5 Haiku
    Claude 3.5 Sonnet v2 et 3.5 Haiku ont été lancés en octobre 2024 avec Computer Use en bêta, contrôle du curseur, codage et utilisation d'outils améliorés, et Haiku moins cher à $0.80/$4.
  • Claude 3.5 Sonnet
    Claude 3.5 Sonnet lancé en juin 2024 à $3/$15, surpassant Claude 3 Opus sur MMLU, GPQA, HumanEval au cinquième du coût. Specs, benchmarks et gains en codage.

Arrêtez de configurer. Commencez à construire.

Templates SaaS avec orchestration IA.

Modèles Claude Code

Choisir le bon modèle Claude Code : Sonnet, Opus, Haiku, sonnet[1m], ou opusplan. Changer de modèle selon la tâche réduit les coûts de 60 à 80% sans sacrifier la qualité.

DeepSeek V4: Pricing, Context, and Migration

DeepSeek V4 ships two models: V4-Flash at $0.28/M output and V4-Pro at $3.48/M. Both carry a genuine 1M context window and drop into any Anthropic-compatible SDK with one line changed.

On this page

Quick Verdict
Key Specs
The Headline Change: Honesty
Why Honesty Is the Right Headline for 2026
Dynamic Workflows: One Brief, a Whole Swarm
Opus 4.8 vs Opus 4.7
Runs Longer, Uses Tools Cleaner
Benchmarks That Matter
The Effort Dial Changed
Memory Across Sessions
When to Reach for Opus 4.8
Pricing and Cost
Should You Upgrade to Claude Opus 4.8?
Frequently Asked Questions
What is the biggest change in Claude Opus 4.8?
Is Claude Opus 4.8 worth it over Opus 4.7?
Why did the default effort drop to high?
What are Dynamic Workflows?
How much does Claude Opus 4.8 cost?
Does Claude Opus 4.8 remember across sessions?
Sources
Related Pages

Arrêtez de configurer. Commencez à construire.

Templates SaaS avec orchestration IA.