Does the June 15, 2026 Claude Code billing split change what Opus 4.8 costs?

Yes. From June 15, 2026, programmatic Claude Code usage (the Agent SDK, claude -p, GitHub Actions, third-party apps) draws from a separate Agent SDK credit pool billed at full API rates, with no rollover and no automatic fallback to your interactive subscription. Because Opus 4.8 is the priciest model at $5 input / $25 output per 1M tokens, long autonomous runs and Dynamic Workflows are where this bites hardest. Tune the effort dial and measure token spend on your actual workloads.

What is the model ID for Claude Opus 4.8?

The API ID is claude-opus-4-8. Pricing is $5 input and $25 output per 1M tokens, with a fast mode at $10 / $50 that runs 2.5x faster.

Opus 4.8 Cheatsheet

Claude Opus 4.8 is a modest upgrade with one change that is not modest at all. The price is the same. The 1M context window is the same. The 128K output ceiling is the same. The benchmark gains over Opus 4.7 are real but incremental. The thing that actually moved is harder to put on a chart: the model is far more honest about its own work.

Anthropic's own framing calls this out as one of the most prominent improvements in the release. Opus 4.8 is about 4x less likely than Opus 4.7 to leave a flaw in code it wrote without flagging it. It is more willing to say it is unsure. It is less willing to present thin evidence as progress.

That sounds soft until you have lived through the opposite. The most expensive failure mode in agentic coding is not a syntax error. It is a model that is confidently wrong, keeps going, and reports success on a task it quietly broke three steps ago. Opus 4.8 is built to stop doing that.

Quick Verdict

Use Opus 4.8 when the cost of a confident mistake is high:

long-running agent runs you cannot babysit
migrations and refactors across many files
code review where a silently-introduced bug is expensive
work where "I am not sure" is more valuable than a plausible guess
Dynamic Workflows that fan out to many parallel subagents

Stay on Sonnet for fast daily edits where speed and cost beat maximum reasoning depth. Reach for Opus 4.8 the moment you are handing off work and walking away.

If your work runs for hours rather than minutes, Anthropic's newer Claude Fable 5 (June 2026) sits one tier above Opus 4.8 for exactly that kind of long-running job. It costs more ($10 / $50 per 1M tokens), so the Fable 5 vs Opus 4.8 comparison is worth a read before you switch.

Key Specs

Spec	Details
API ID	`claude-opus-4-8`
Release date	May 28, 2026
Context window	1M tokens (200K on Microsoft Foundry)
Max output	128,000 tokens
Pricing	$5 input / $25 output per 1M tokens
Fast mode	$10 / $50, 2.5x speed, 3x cheaper than prior fast modes
Thinking mode	Adaptive thinking only
Effort levels	`low`, `medium`, `high`, `xhigh`, `max`
Default effort	`high` (down from 4.7's `xhigh` in Claude Code)
Status	Current Opus flagship

The Headline Change: Honesty

Most model releases lead with a benchmark. This one leads with a behavior.

Opus 4.7 was strong, but it had a specific weakness that anyone running long agent sessions felt. It would sometimes confidently present work as progress despite thin evidence. It would invent a plausible fallback when the real data was missing. It would let a flaw it introduced slide by without mentioning it. None of those are dramatic in isolation. Across a two-hour autonomous run, they compound into work you cannot trust without re-checking every step yourself.

Opus 4.8 attacks that directly. The number Anthropic publishes is the cleanest summary: roughly 4x fewer cases where the model leaves a flaw in its own code unflagged, compared to 4.7. It is more likely to surface uncertainty about its own progress, and less likely to claim it finished something it only half-finished.

Here is what that looks like in a real Claude Code session. You ask it to change a type that is used in forty places. The old behavior was to make the change, watch the build pass, and report done, even if three call sites now rely on a runtime assumption that no longer holds. The new behavior is to pause and say it is not certain about a specific call site, check it, and tell you what it found before moving on. A Spotify staff engineer who tested the model described exactly this: in Claude Code it "asks the right questions, catches its own mistakes, and pushes back when a plan isn't sound" before making big changes.

That last part matters as much as the bug-catching. Pushing back on a bad plan is a form of honesty too. A model that quietly executes a flawed instruction is not being helpful. It is deferring a problem to your future self.

There is an alignment story underneath this. Anthropic reports that Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy and acting in the user's actual interest, and that misaligned behavior such as deception is substantially lower than 4.7, closer to Claude Mythos Preview, their best-aligned model. The practical translation is simple: the model is more likely to tell you the truth about what it did, including the parts that did not work.

Why Honesty Is the Right Headline for 2026

It is tempting to read "more honest" as a nice-to-have. It is actually the most important kind of progress for where agentic coding is going, and the rest of this release explains why.

The frontier is no longer bottlenecked on whether a model can write the code. Opus 4.7 could already write most of it. The bottleneck is whether you can trust the output of a long autonomous run without reading every line. The more autonomy you hand a model, the more its calibration matters, because there is no human watching each step to catch the confident mistake.

Opus 4.8 ships alongside Dynamic Workflows, where one model plans a job, spins up hundreds of parallel subagents in a single session, and verifies its own outputs before reporting back. Think about what that requires. When one coordinator is directing hundreds of agents faster than any human can follow, self-verification is the only thing standing between you and a pile of plausible-looking, subtly-wrong work. Honesty is not a personality upgrade here. It is the load-bearing feature that makes autonomy usable.

So the order of this release is deliberate. Honesty first, then the autonomy that depends on it.

Dynamic Workflows: One Brief, a Whole Swarm

Dynamic Workflows is the headline feature in Claude Code, currently a research preview on Enterprise, Team, and Max plans. The shape is: you give one brief, the model plans the work, it launches hundreds of parallel subagents in one session, it verifies their outputs, and it reports back.

Two things make it different from the subagent fan-out you may already know. First, it is adaptive, not a fixed plan. The agents re-prioritize based on what they find mid-run, so a discovery in one branch can reshape the others. Second, with Opus 4.8 the subagents run even longer before they need a human, which is only safe because of the honesty gains above.

The intended use case is codebase-scale work. A migration across hundreds of thousands of lines, run from kickoff to merge, with your existing test suite as the bar for what counts as done. That last detail is the whole design. The swarm is not trusted because it sounds confident. It is trusted because it has to make a real test suite pass, and because the coordinator flags what it could not verify instead of papering over it.

A concrete version of the workflow: point it at a deprecated internal API used across a large monorepo. It plans the migration, fans out subagents to handle clusters of call sites in parallel, runs the tests, and comes back with a merged result plus an honest list of the cases it was unsure about. The cases it flags are the ones you review. Everything that passed the suite and the model's own verification, you can trust. That division of labor only works if the flagging is reliable, which loops back to the honesty story.

Opus 4.8 vs Opus 4.7

The story is not "much smarter." It is "more trustworthy on long, unsupervised work, plus a steady benchmark bump."

Area	Opus 4.7	Opus 4.8
Self-honesty on own code	baseline	about 4x fewer unflagged flaws
SWE-Bench Pro (agentic coding)	64.3%	69.2%
OSWorld-Verified (computer use)	82.8%	83.4%
GDPval-AA (knowledge work, Elo)	1753	1890
SWE-Bench Verified	lower	88.6% (2nd, behind Mythos Preview)
Default Claude Code effort	`xhigh`	`high`
Long-running autonomy	strong	runs independently longer
Tool calling	occasionally over-verbose	fewer steps, fewer skipped calls
Cross-session memory	limited	learns across sessions via the memory tool

The behavioral row at the top is the one to read first. The benchmark rows are the supporting evidence, not the point.

Runs Longer, Uses Tools Cleaner

Anthropic positions Opus 4.8 as building on 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer. In practice that means you can hand off a feature, a migration, or a bug sweep and let it run further before it needs you.

Tool use also got tighter. On CursorBench the tool calling is more efficient, meaning fewer steps for the same result. The Devin team reported that 4.8 fixes the comment-verbosity and tool-calling issues they saw in 4.7. It is also less likely to skip a tool call the task actually required, which was a real 4.7 complaint. Long-context handling improved too: fewer compactions during long sessions, and better recovery when a compaction does happen.

These are not flashy. They are exactly the traits that decide whether a four-hour agent run finishes clean or stalls in a loop.

Benchmarks That Matter

The full benchmark wall is useful for launch day, but only some numbers map to user value.

Benchmark	Opus 4.8	Why it matters
SWE-Bench Pro	69.2%	Closest to real-world agentic coding, and 4.8 leads the field
SWE-Bench Verified	88.6%	Second only to Anthropic's unreleased Mythos Preview (93.9%)
OSWorld-Verified	83.4%	Strongest computer-use score in the comparison set
GDPval-AA (Elo)	1890	Leads on knowledge work, well ahead of 4.7 and GPT-5.5
Humanity's Last Exam	49.8% / 57.9%	No tools / with tools, ahead of the three rivals
Online-Mind2Web	84%	Strongest browser-agent score testers measured
Databricks Genie	61% cheaper	Reasons over PDFs and diagrams at 61% lower token cost than 4.7

Read the spread this way. Opus 4.8 leads agentic coding, computer use, knowledge work, and reasoning. GPT-5.5 still edges raw terminal coding on Terminal-Bench 2.1 (78.2% to 4.8's 74.6%). The only model that clearly beats Opus 4.8 across the board is Anthropic's own unreleased Mythos. For a point release at the same price, that is a strong position.

If you are choosing a model for Claude Code, SWE-Bench Pro and the honesty improvement are the two signals worth weighting most. The first says it writes better code. The second says it tells you when it did not.

The Effort Dial Changed

This is the one migration detail that will surprise people. The default effort in Claude Code dropped from xhigh (the 4.7 default) to high.

That sounds like a downgrade. It is not. On coding work, Opus 4.8 at high spends a similar number of tokens as Opus 4.7 spent at its default, with better performance. You are getting more out of each token, so the sensible default moved down a notch. Anthropic also reports that 4.8 exceeds prior Opus models at every effort level, so even low and medium are stronger than they were.

The levels, low to high cost, are low, medium, high (default), xhigh (shown as "extra" in claude.ai), and max. The guidance is straightforward:

keep high for most serious coding sessions, since it is now the calibrated default
step up to xhigh for genuinely difficult tasks and long-running async workflows
reserve max for the hardest problems and ceiling testing
drop to medium or low only when you are cost-constrained across many parallel sessions

There is also a new effort control UI sitting next to the model selector in both claude.ai and Cowork, available on all plans, and Claude Code rate limits were raised to accommodate higher effort. The takeaway: stop treating the default as always correct, and start tuning effort to the task in front of you.

Memory Across Sessions

Opus 4.8 can use memory to learn across sessions and carry long-running work forward with minimal oversight. The mechanism is the memory tool, which lets the model create, read, update, and delete files that persist between conversations. Knowledge builds over time without bloating the context window, because it pairs with context editing and compaction: important tool results get saved to memory before they are cleared, then recalled later when they are relevant.

There is one practical rule that is easy to miss. Pair memory with max effort, or do not bother. At max effort the model writes notes that are actually worth keeping, even on tasks that would not otherwise need max. At lower effort the notes tend to be thin, and you get the storage overhead without the payoff. If cross-session memory is the reason you are reaching for 4.8, turn the effort up.

When to Reach for Opus 4.8

The pattern across every strong use case is the same: high autonomy plus high cost of a quiet mistake.

Long unsupervised agent runs. If you are kicking off work and walking away, the honesty gains are the whole point. You are no longer there to catch the confident wrong turn, so you need a model that catches its own.

Large migrations and refactors. This is the Dynamic Workflows sweet spot. Hand it a codebase-scale change, let the swarm run against your test suite, and review the cases it flags as uncertain.

Code review and debugging on incomplete evidence. Opus 4.8 is better at admitting the data is missing instead of inventing fallback logic, which is exactly what you want when the bug is subtle and the context is partial.

High-stakes knowledge work. The GDPval-AA jump and the Databricks Genie cost drop point to document and diagram reasoning that is both stronger and cheaper than 4.7. Legal, finance, and operations work where a fabricated detail is a real liability benefits from the calibration most.

Stay on Sonnet for fast daily edits, cheap bulk automation, and quick Q&A where speed wins and the cost of a small mistake is low.

Pricing and Cost

Opus 4.8 kept the standard pricing.

Tier	Cost
Input	$5 per 1M tokens
Output	$25 per 1M tokens
Fast mode input	$10 per 1M tokens
Fast mode output	$50 per 1M tokens

Fast mode is the interesting line. It runs at 2.5x the speed and costs 3x less than previous fast modes, which makes "fast" a genuinely usable option now rather than a premium novelty.

Your real bill is shaped less by the headline rate and more by effort. Because high is now the calibrated default and the model exceeds prior Opus at every level, many workloads will cost less for the same quality than they did on 4.7 at xhigh. A few API-side changes also help cost: the prompt-cache minimum dropped to 1,024 tokens, and the Messages API now accepts mid-conversation system messages, so you can update instructions or token budgets mid-task without breaking the prompt cache. As always, measure on your actual workloads rather than assuming.

One thing changes the math as of June 15, 2026: the Claude Code billing split. Programmatic usage (the Agent SDK, claude -p, GitHub Actions, third-party apps) moves to a separate credit pool billed at full API rates, with no rollover and no automatic fallback to your interactive subscription. Since Opus 4.8 is the most expensive model in the lineup, the long autonomous runs and Dynamic Workflows this post recommends are exactly the workloads that pool will meter. Read the split before you point a swarm at a large repo.

Should You Upgrade to Claude Opus 4.8?

Yes, if your pain points are:

agents that report success on work they quietly broke
long autonomous runs you cannot fully supervise
confident-but-wrong reasoning on partial context
large migrations and multi-file refactors
cross-session work where the model should remember what it learned

Maybe not as your daily default if your workload is mostly small edits, cheap bulk automation, or quick Q&A. Sonnet still wins on speed and cost there.

For most serious Claude Code users, the strategy is the familiar one with a sharper edge: keep Sonnet for fast everyday work, and make Opus 4.8 the flagship for anything you are going to trust without reading every line. That trust is exactly what this release was built to earn.

Frequently Asked Questions

What is the biggest change in Claude Opus 4.8?

Honesty about its own work. Opus 4.8 is roughly 4x less likely than Opus 4.7 to leave a flaw in its own code unflagged, more likely to admit uncertainty, and more willing to push back on a bad plan. Anthropic calls it one of the most prominent improvements in the release.

Is Claude Opus 4.8 worth it over Opus 4.7?

For long-running, unsupervised, or high-stakes work, yes. The benchmark gains are incremental (SWE-Bench Pro 64.3% to 69.2%), but the reliability gain on autonomous runs is large. If you mostly do small supervised edits, the upgrade is less urgent.

Why did the default effort drop to high?

Opus 4.8 at high spends about the same tokens as Opus 4.7 spent at its default, with better performance. You get more per token, so the calibrated default moved down from xhigh to high. Step up to xhigh for hard or long async tasks, and max for the hardest work.

What are Dynamic Workflows?

A Claude Code research preview, on Enterprise, Team, and Max plans, where the model plans a job, spins up hundreds of parallel subagents in one session, verifies their outputs, and reports back. It is adaptive rather than fixed, and it is built for codebase-scale migrations with your test suite as the bar.