Claude Fable 5 vs Opus 4.8

Claude Fable 5 beats Opus 4.8 on nearly every benchmark Anthropic published, and it costs exactly twice as much ($10/$50 per million tokens versus $5/$25). The right question is not "which model is better" (Fable 5 is) but "when does paying 2x per token return more than 2x the value?"

That makes this a spend-per-task decision, not a price-per-token decision. Fable 5 earns its premium on long, complex, or failure-prone work where it finishes in fewer turns, on the first try, with no human rescue. On routine, well-scoped, high-volume work, Opus 4.8 at half the price is still the rational default.

Fable 5 is the first publicly available Mythos-class model, a tier that now sits above the Opus class. Anthropic's own framing is unusually direct: its capabilities "exceed those of any model we've ever made generally available," and "the longer and more complex the task, the larger Fable 5's lead over our other models." That last line is the whole decision in one sentence.

Quick Verdict

Reach for Fable 5 when the task is hard enough that capability compounds:

large codebase migrations and multi-repo refactors
long-running autonomous agent runs you kick off and walk away from
complex financial, analytical, or scientific research
vision-heavy work (screenshot-to-code, extracting numbers from dense figures)
near-1M-token analysis where missing one detail is expensive

Stay on Opus 4.8 when the work is routine, high-volume, latency-sensitive, or bound by zero data retention. Opus 4.8 is still a strong frontier model, ahead of GPT-5.5 on hard agentic coding. It did not get worse the day Fable 5 shipped.

Key Specs

Spec	Claude Fable 5	Claude Opus 4.8
API ID	`claude-fable-5`	`claude-opus-4-8`
Model class	Mythos-class (tier above Opus)	Opus-class flagship
Release date	June 9, 2026	May 28, 2026
Context window	1M tokens	1M tokens
Max output	128K tokens	128K tokens
Input price	$10 / 1M tokens	$5 / 1M tokens
Output price	$50 / 1M tokens	$25 / 1M tokens
Thinking	Adaptive thinking only	Adaptive thinking only
Effort levels	low, medium, high (default), xhigh	low, medium, high, xhigh, max
Data retention	30-day mandatory (covered model)	Zero data retention available
Safeguard fallback	cyber / bio-chem / distillation route to Opus 4.8	none

The rows that drive the decision are price (exactly 2x on both input and output) and class (a real tier jump, not an increment). Everything below explains how to read the gap between them.

The Benchmark Gap Is Real, and It Grows With Task Length

Most point releases show a few points of movement. This is not that. Fable 5's lead over Opus 4.8 is largest exactly where the work is hardest.

Benchmark	Fable 5	Opus 4.8	Delta
SWE-Bench Pro (agentic coding)	80.3%	69.2%	+11.1 pts
FrontierCode Diamond (Cognition)	29.3%	13.4%	+15.9 pts (2.2x)
SWE-Bench Verified	95.0%	88.6%	+6.4 pts
Terminal-Bench 2.1	88.0%	82.7%	+5.3 pts
GDPval-AA (knowledge-work Elo)	1932	1890	+42 Elo
GDP.pdf (vision, no tools)	29.8%	22.5%	+7.3 pts

Read this spread carefully, because not every row means the same thing.

SWE-Bench Pro is the one to weight most. It is the hard, end-to-end variant where a coding agent has to resolve real GitHub issues, and 80.3% versus 69.2% is the cleanest signal that Fable 5 lands hard work more often. For context, against Gemini 3.1 Pro's 54.2%, Opus 4.8 already led by 15.0 points; Fable 5 adds another 11.1 on top, for a 26.1-point lead.

SWE-Bench Verified at 95.0% looks dramatic but means less. Frontier models are near the ceiling on Verified, so the harder Pro number carries the real information.

FrontierCode Diamond is the quiet standout. It measures whether code is maintainable and production-grade, not just whether tests pass, and Fable 5 more than doubles Opus 4.8. Critically, Anthropic reports Fable 5 leads frontier models on FrontierCode even at medium effort. You do not have to pay for maximum effort to beat Opus, which matters for the cost math below.

One caveat worth saying out loud. Anthropic ran most of these evaluations, and several early-customer numbers are testimonials rather than audited results. At least one open-source researcher publicly questioned whether the pre-launch numbers were chosen to flatter. Treat the benchmarks as directional and validate on your own tasks before you commit traffic.

ROI Per Task, Not Per Token

Here is the argument that decides everything. The sticker says 2x. Your bill is not the sticker.

Anthropic's head of product management for research, Dianne Penn, put it plainly to CNBC: pricing is "very top of mind" for customers, but they are not just chasing lower costs. They want higher accuracy and higher benefit per dollar, and early Fable 5 customers "noted an improvement in spend per task." Her summary: "You just get a higher ROI by having more intelligent models."

Three things move spend-per-task in Fable 5's favor:

Fewer turns. A spreadsheet-automation customer found Fable 5 beats Opus 4.8 at every effort level and finishes runs 25 to 30% faster with fewer turns. Fewer turns means fewer tool calls and less repeated exploration, which is fewer billed tokens per completed job.

Fewer tokens for the same result. A frontier physics lab reported Fable 5 was the strongest model it tested "while using a third of the reasoning tokens," reaching in 36 hours nearly where GPT-5.5 landed after four days. Do the arithmetic: if that one-third ratio holds against Opus 4.8, one-third the tokens at twice the per-token price is two-thirds the effective cost. On that class of task, Fable 5 is cheaper despite the 2x rate card.

No human rescue. A failed Opus run that needs a developer to step in costs far more than its token bill. Base44 described apps that "took a hundred prompts a year ago" now getting one-shotted. Rakuten was blunter: "the extra thinking pays for itself."
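The token-efficiency point reduces to a product of two ratios, which also makes the break-even easy to state. A minimal sketch, assuming the token ratio is measured against the model you would otherwise run (the testimonial does not name its baseline); the function names are illustrative, not from any SDK:

```python
def effective_cost_ratio(token_ratio: float, price_ratio: float = 2.0) -> float:
    """Relative per-task cost of the pricier model.

    token_ratio: tokens the pricier model uses divided by tokens the cheaper
                 model uses for the same completed task.
    price_ratio: pricier per-token rate divided by the cheaper rate
                 (2.0 for the rate cards quoted in this article).
    """
    return token_ratio * price_ratio


def break_even_token_ratio(price_ratio: float = 2.0) -> float:
    """Token ratio at which both models cost the same per task."""
    return 1.0 / price_ratio


# One-third the tokens at twice the price.
print(round(effective_cost_ratio(1 / 3), 3))  # 0.667
print(break_even_token_ratio())               # 0.5
```

Anything below a 0.5 token ratio means the 2x model is the cheaper one for that task. Anything above it means you are paying a premium and should be getting quality or reliability in return.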

The clearest single example is Stripe. On a 50-million-line Ruby codebase, Fable 5 ran a codebase-wide migration in one day that was estimated at over two months of a team's manual work. At $10/$50, the token bill for that day is a rounding error against two months of engineer salaries. That is what "ROI per task, not per token" looks like at the extreme.

What a Task Actually Costs on Each Model

Take a representative agentic call: 100K tokens of context in, 20K tokens out.

On Opus 4.8:

input:  100,000 tokens × $5/1M  = $0.50
output:  20,000 tokens × $25/1M = $0.50
total                           = $1.00

On Fable 5, same token usage:

input:  100,000 tokens × $10/1M = $1.00
output:  20,000 tokens × $50/1M = $1.00
total                           = $2.00

Exactly 2x, confirming the sticker, but only if Fable 5 burns the same tokens. Now apply the efficiency evidence.
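If you want to rerun that arithmetic with your own token counts, here is a small sketch. The prices are the rate cards quoted in this article; the `PRICES` table and `call_cost` helper are illustrative names, not part of any SDK:

```python
# Rate cards quoted in this article, in USD per million tokens.
PRICES = {
    "claude-fable-5": {"input": 10.0, "output": 50.0},
    "claude-opus-4-8": {"input": 5.0, "output": 25.0},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the token cost in USD for one call at the listed rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


for model in PRICES:
    print(model, f"${call_cost(model, 100_000, 20_000):.2f}")
# claude-fable-5 $2.00
# claude-opus-4-8 $1.00
```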

Suppose the task is genuinely hard. Opus 4.8 completes it on the first try only half the time, so it needs two attempts on average; Fable 5 lands it first try. Cost per attempt stays $1.00 on Opus and $2.00 on Fable.

Opus 4.8: 2 attempts × $1.00 = $2.00 in tokens, plus a human review of the failed run
Fable 5:  1 attempt  × $2.00 = $2.00 in tokens, no rescue

Same token bill, but the Opus path also spent a developer's afternoon. That is the spend-per-task inversion Penn described, and it is why the per-token sticker is the wrong number to optimize.
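The same comparison generalizes to any success rate. The sketch below assumes attempts are independent and retried until one succeeds, so the expected number of attempts is 1 divided by the success rate. The `rescue_cost` parameter and the $50 figure are placeholders for whatever a failed run costs you in human time, not numbers from any source:

```python
def expected_task_cost(cost_per_attempt: float, success_rate: float,
                       rescue_cost: float = 0.0) -> float:
    """Expected spend to get one completed task.

    Assumes independent attempts retried until success: expected attempts
    are 1 / success_rate, and expected failures are one fewer than that.
    rescue_cost is a hypothetical human cost charged per failed attempt.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1.0 / success_rate
    expected_failures = expected_attempts - 1.0
    return expected_attempts * cost_per_attempt + expected_failures * rescue_cost


opus = expected_task_cost(1.00, success_rate=0.5)
fable = expected_task_cost(2.00, success_rate=1.0)
print(opus, fable)  # 2.0 2.0

# With a placeholder $50 of developer time per failed run, the tie breaks.
print(expected_task_cost(1.00, success_rate=0.5, rescue_cost=50.0))  # 52.0
```

The token bills tie at a 50% success rate. Once a failed run carries any human cost at all, the cheaper model stops being the cheaper path.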

The flip side is just as real. On routine, high-volume output that Opus already handles well, the 2x premium is pure overhead. At enterprise scale, billing analysts have modeled it: 5 billion output tokens a year runs about $125,000 on Opus 4.8 versus $250,000 on Fable 5. For classification, summarization, and structured extraction, that delta is not a rounding error. It is the whole budget conversation.

When Opus 4.8 Is Still the Right Call

Fable 5 winning the benchmarks does not make Opus 4.8 the wrong default. Stay on Opus when any of these hold:

The work is routine and high-volume. Per-token economics dominate, and 2x compounds fast across millions of calls.

Latency or cost per request is the priority. Opus is cheaper and does not run the long, deliberate turns Fable 5 takes at higher effort.

You need zero data retention. Opus 4.8 supports ZDR. Fable 5 is a covered model with mandatory 30-day retention, required to run its safety classifiers. The data is not used for training, but it is retained, and for some enterprises that is a hard procurement gate regardless of the benchmarks.

Your work sits near cyber, bio, or chem boundaries. Fable 5 routes flagged queries in those domains to Opus 4.8 anyway. You would pay the Fable premium right up until the fallback fires, then get an Opus answer. On that traffic, just use Opus.

And remember the swap is not a drop-in. Fable 5 keeps thinking always on (you tune depth with effort, you cannot disable it), returns refusals as a successful HTTP 200 with a refusal stop reason your code must check, and runs longer turns that can break client timeouts. Plan the migration; do not just change the model string.
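The refusal behavior is the easiest of those to get wrong, because a status-code check alone will pass it through as a success. A minimal sketch of the check, working on an already-parsed response body. The field name `stop_reason` and the value `"refusal"` are assumptions about the response shape; confirm both against the API reference before relying on them:

```python
from typing import Any


def classify_response(status_code: int, body: dict[str, Any]) -> str:
    """Classify a response as 'ok', 'refused', or 'error'.

    Assumes the JSON body carries a `stop_reason` field whose value is
    "refusal" when the model declines (both names are assumptions).
    """
    if status_code != 200:
        return "error"
    if body.get("stop_reason") == "refusal":
        # HTTP success, but no usable answer: do not treat this as a result.
        return "refused"
    return "ok"


print(classify_response(200, {"stop_reason": "end_turn"}))  # ok
print(classify_response(200, {"stop_reason": "refusal"}))   # refused
print(classify_response(500, {}))                           # error
```

Longer turns need a separate fix: raise the request timeout in whatever HTTP client or SDK you use, since a default tuned for short Opus calls may cut off a long Fable 5 turn.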

The Fallback Relationship

There is one detail with no equivalent in any Opus release. Fable 5 ships with classifiers that watch for cybersecurity, biology and chemistry, and model-distillation requests. When one trips, your query is answered by Opus 4.8 instead, and you are told it happened.

Anthropic says this fires in fewer than 5% of sessions, and that more than 95% of sessions run entirely on Fable 5. Put another way, up to one in twenty sessions may not be running on the model you picked. On the topics that trip it, the deployable Fable 5 effectively performs like Opus 4.8, because that is literally what answers.

The cost upside: those rerouted responses bill at Opus rates, not Fable rates. So bio, chem, or security-adjacent workloads that trip the classifier get a quiet discount. The downside is unpredictability, which is its own reason to keep that traffic on Opus by choice rather than by accident.
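To see how small that discount is across a whole workload, blend the two rates by the rerouted share. This sketch assumes the rerouted share of sessions maps directly onto the share of billed cost, which the published figure does not establish. It uses 5% as the ceiling, and `blended_cost` is an illustrative name:

```python
def blended_cost(fable_cost: float, opus_cost: float,
                 reroute_fraction: float) -> float:
    """Average cost per call when a fraction of calls bill at Opus rates."""
    if not 0.0 <= reroute_fraction <= 1.0:
        raise ValueError("reroute_fraction must be in [0, 1]")
    return (1.0 - reroute_fraction) * fable_cost + reroute_fraction * opus_cost


# The 100K-in/20K-out call from earlier: $2.00 on Fable 5, $1.00 on Opus 4.8.
print(round(blended_cost(2.00, 1.00, 0.05), 2))  # 1.95
```

At the 5% ceiling the average bill drops by about 2.5%. The discount only becomes meaningful for workloads that trip the classifier far more often than average, and those are the ones better moved to Opus deliberately.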

How to Choose

The decision collapses to a routing rule. Default to the cheapest model that reliably clears your quality bar, and promote a task to Fable 5 only when Opus 4.8 demonstrably fails, loses the plan mid-task, or burns more total tokens through retries.

Scenario	Pick	Why
Large codebase migration or multi-repo refactor	Fable 5	Largest measured gap; Stripe's two-months-to-one-day
Long-running autonomous agent runs	Fable 5	Fewer turns, plan retention, memory compounds
Complex financial or analytical research	Fable 5	First model to break 90% on Hex's analytics benchmark
Vision-heavy extraction or screenshot-to-code	Fable 5	New state of the art on vision
1M-token analysis where a missed detail is expensive	Fable 5	Context plus reasoning gains
Routine code edits, helpers, Q&A	Opus 4.8 or Sonnet 4.6	Fable is overkill at 2x
Budget-capped, high-volume pipelines	Opus 4.8	Per-token economics dominate
ZDR-mandated data	Opus 4.8	Fable requires 30-day retention
Cyber, bio, or chem adjacent work	Opus 4.8	Fable routes those to Opus anyway
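The table can be sketched as a small routing function. This is one possible encoding, not a prescribed one. The `Task` fields are hypothetical flags you would set from your own requirements and evals, and the model strings are the API IDs from the spec table:

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"
OPUS = "claude-opus-4-8"


@dataclass
class Task:
    needs_zdr: bool = False               # zero data retention required
    near_safeguard_domain: bool = False   # cyber, bio, or chem adjacent
    opus_clears_quality_bar: bool = True  # measured on your own evals


def pick_model(task: Task) -> str:
    """Route to the cheaper model unless it demonstrably falls short."""
    # Hard constraints first: these rule Fable 5 out regardless of quality.
    if task.needs_zdr or task.near_safeguard_domain:
        return OPUS
    # Promote only when Opus has been shown to fail on this kind of work.
    return OPUS if task.opus_clears_quality_bar else FABLE


print(pick_model(Task()))                               # claude-opus-4-8
print(pick_model(Task(opus_clears_quality_bar=False)))  # claude-fable-5
print(pick_model(Task(needs_zdr=True, opus_clears_quality_bar=False)))  # claude-opus-4-8
```

Putting the constraints ahead of the quality check is deliberate: a ZDR mandate or a safeguard-adjacent domain decides the model before benchmarks enter the picture.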

If you run an agent fleet, you do not pick once. Put planners and the hardest builders on Fable 5, keep evaluators, linters, doc writers, and routine testers on Opus 4.8, and let each role buy exactly the intelligence it needs. The model choice lives next to the agent, not at the project root.
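In practice that can be as plain as a per-role lookup kept beside the agent definitions. A sketch under the assumption that your fleet names its roles roughly like this; the role keys and the `model_for_role` helper are made up for illustration:

```python
# Hypothetical role names; the model IDs are the ones from the spec table.
ROLE_MODELS = {
    "planner": "claude-fable-5",
    "hard_builder": "claude-fable-5",
    "evaluator": "claude-opus-4-8",
    "linter": "claude-opus-4-8",
    "doc_writer": "claude-opus-4-8",
    "routine_tester": "claude-opus-4-8",
}


def model_for_role(role: str, default: str = "claude-opus-4-8") -> str:
    """Look up a role's model, falling back to the cheaper default."""
    return ROLE_MODELS.get(role, default)


print(model_for_role("planner"))       # claude-fable-5
print(model_for_role("linter"))        # claude-opus-4-8
print(model_for_role("unknown_role"))  # claude-opus-4-8
```

Defaulting unknown roles to the cheaper model keeps a new agent from silently running at the 2x rate until someone decides it needs to.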

The Verdict

Fable 5 is a genuine tier jump, not a point release, and it is priced like one. The benchmark gap is real and it widens as tasks get longer and harder. The premium is exactly 2x on paper, but your real number depends on whether Fable's fewer turns, fewer tokens, and higher first-try success rate buy back more than the doubled rate.

For the hard, long-horizon tail of your work, they usually do. For everything routine, Opus 4.8 at half the price still wins. Route accordingly, and let the task decide the model.

Frequently Asked Questions

Is Claude Fable 5 worth it over Opus 4.8?

For long, complex, or failure-prone tasks, yes. Fable 5 leads Opus 4.8 on nearly every published benchmark (80.3% vs 69.2% on SWE-Bench Pro), and its fewer turns and higher first-try success rate can make spend-per-task lower despite the 2x sticker. For routine, high-volume work, Opus 4.8 at half the price is the better choice.

How much more does Claude Fable 5 cost than Opus 4.8?

Exactly double on every line of the rate card: $10 vs $5 per million input tokens and $50 vs $25 per million output tokens. A 100K-in/20K-out task costs $2.00 on Fable 5 versus $1.00 on Opus 4.8 at identical token usage. Token efficiency can narrow or even invert that gap on hard tasks.

Should I choose Claude Fable 5 or Opus 4.8 for coding?

For large migrations, multi-repo refactors, and long autonomous runs, choose Fable 5, where the SWE-Bench Pro lead and plan retention compound. For routine edits, helpers, and high-volume calls, choose Opus 4.8 or Sonnet 4.6. Many teams route both: planners and hard builders on Fable, everything else on Opus.

Why did my Claude Fable 5 request get answered by Opus 4.8?

Fable 5's safeguards route flagged cybersecurity, biology, chemistry, and distillation requests to Opus 4.8 and notify you. Anthropic says this happens in under 5% of sessions. Those responses are billed at Opus rates, not Fable rates.

Does Claude Fable 5 support zero data retention?

No. Fable 5 is a covered model with a mandatory 30-day retention requirement, needed to run its safety classifiers. Retained data is not used for training, but it is retained. Opus 4.8 still supports zero data retention, which can be the deciding factor for regulated workloads.

Is the benchmark gap reliable?

Treat it as directional. Anthropic ran most of the evaluations and several early-customer figures are testimonials rather than audited results, and at least one researcher questioned the pre-launch numbers. The SWE-Bench Pro methodology is public and has been applied across models, which makes 80.3% vs 69.2% the most trustworthy single comparison. Validate on your own tasks before committing traffic.