Build This Now
speedy_devv, koen_salo

Claude Fable 5 vs Opus 4.8

Claude Fable 5 wins nearly every published benchmark over Opus 4.8 and costs exactly twice as much. It pays for itself when a task is long, complex, or failure-prone enough that 2x the token price buys more than 2x the value.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Jun 10, 2026 | 12 min read | Model Picker hub

Claude Fable 5 beats Opus 4.8 on nearly every benchmark Anthropic published, and it costs exactly twice as much ($10/$50 per million tokens versus $5/$25). The right question is not "which model is better" (Fable 5 is); it is "when does paying 2x per token return more than 2x the value?"

That makes this a spend-per-task decision, not a price-per-token decision. Fable 5 earns its premium on long, complex, or failure-prone work where it finishes in fewer turns, on the first try, with no human rescue. On routine, well-scoped, high-volume work, Opus 4.8 at half the price is still the rational default.

Fable 5 is the first publicly available Mythos-class model, a tier that now sits above the Opus class. Anthropic's own framing is unusually direct: its capabilities "exceed those of any model we've ever made generally available," and "the longer and more complex the task, the larger Fable 5's lead over our other models." That last line is the whole decision in one sentence.

Quick Verdict

Reach for Fable 5 when the task is hard enough that capability compounds:

  • large codebase migrations and multi-repo refactors
  • long-running autonomous agent runs you kick off and walk away from
  • complex financial, analytical, or scientific research
  • vision-heavy work (screenshot-to-code, extracting numbers from dense figures)
  • near-1M-token analysis where missing one detail is expensive

Stay on Opus 4.8 when the work is routine, high-volume, latency-sensitive, or bound by zero data retention. Opus 4.8 is still a strong frontier model, ahead of GPT-5.5 on hard agentic coding. It did not get worse the day Fable 5 shipped.

Key Specs

| Spec | Claude Fable 5 | Claude Opus 4.8 |
| --- | --- | --- |
| API ID | claude-fable-5 | claude-opus-4-8 |
| Model class | Mythos-class (tier above Opus) | Opus-class flagship |
| Release date | June 9, 2026 | May 28, 2026 |
| Context window | 1M tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Input price | $10 / 1M tokens | $5 / 1M tokens |
| Output price | $50 / 1M tokens | $25 / 1M tokens |
| Thinking | Adaptive thinking only | Adaptive thinking only |
| Effort levels | low, medium, high (default), xhigh | low, medium, high, xhigh, max |
| Data retention | 30-day mandatory (covered model) | Zero data retention available |
| Safeguard fallback | cyber / bio-chem / distillation route to Opus 4.8 | none |

The two rows that drive the decision are price (exactly 2x) and class (a real tier jump, not an increment). Everything below explains how to read the gap between them.
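One smaller row matters during migration: going by the table, Opus 4.8 lists a `max` effort level that Fable 5 does not. A minimal sketch of a guard for that, assuming effort is carried around as a plain string in your own config. The `clamp_effort` helper and the choice to step down to the next supported level are illustrative assumptions, not documented API behavior.

```python
# Effort levels ordered weakest to strongest, as listed in the Key Specs table.
ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Per-model support, copied from the table (model IDs are the table's API IDs).
SUPPORTED_EFFORT = {
    "claude-fable-5": ("low", "medium", "high", "xhigh"),
    "claude-opus-4-8": ("low", "medium", "high", "xhigh", "max"),
}


def clamp_effort(model: str, effort: str) -> str:
    """Return the requested effort, or the nearest lower level the model lists.

    Hypothetical helper: stepping down instead of failing is a design choice
    made for this sketch.
    """
    if effort not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    supported = SUPPORTED_EFFORT[model]
    # Walk down from the requested level until the model supports one.
    for level in reversed(ALL_LEVELS[: ALL_LEVELS.index(effort) + 1]):
        if level in supported:
            return level
    raise ValueError(f"{model} supports no level at or below {effort!r}")
```

A config that used `max` on Opus 4.8 would come back as `xhigh` for Fable 5 under this rule; raising an error instead is just as defensible if silent changes worry you.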

The Benchmark Gap Is Real, and It Grows With Task Length

Most point releases show a few points of movement. This is not that. Fable 5's lead over Opus 4.8 is largest exactly where the work is hardest.

| Benchmark | Fable 5 | Opus 4.8 | Delta |
| --- | --- | --- | --- |
| SWE-Bench Pro (agentic coding) | 80.3% | 69.2% | +11.1 pts |
| FrontierCode Diamond (Cognition) | 29.3% | 13.4% | +15.9 pts (2.2x) |
| SWE-Bench Verified | 95.0% | 88.6% | +6.4 pts |
| Terminal-Bench 2.1 | 88.0% | 82.7% | +5.3 pts |
| GDPval-AA (knowledge-work Elo) | 1932 | 1890 | +42 Elo |
| GDP.pdf (vision, no tools) | 29.8% | 22.5% | +7.3 pts |

Read this spread carefully, because not every row means the same thing.

SWE-Bench Pro is the one to weight most. It is the hard, end-to-end variant where a coding agent has to resolve real GitHub issues, and 80.3% versus 69.2% is the cleanest signal that Fable 5 lands hard work more often. For context, Gemini 3.1 Pro scores 54.2% on the same benchmark, so Opus 4.8 leads it by 15.0 points and Fable 5 leads it by 26.1.

SWE-Bench Verified at 95.0% looks dramatic but means less. Frontier models are near the ceiling on Verified, so the harder Pro number carries the real information.

FrontierCode Diamond is the quiet standout. It measures whether code is maintainable and production-grade, not just whether tests pass, and Fable 5 more than doubles Opus 4.8. Critically, Anthropic reports Fable 5 leads frontier models on FrontierCode even at medium effort. You do not have to pay for maximum effort to beat Opus, which matters for the cost math below.

One caveat worth saying out loud. Anthropic ran most of these evaluations, and several early-customer numbers are testimonials rather than audited results. At least one open-source researcher publicly questioned whether the pre-launch numbers were chosen to flatter. Treat the benchmarks as directional and validate on your own tasks before you commit traffic.

ROI Per Task, Not Per Token

Here is the argument that decides everything. The sticker says 2x. Your bill is not the sticker.

Anthropic's head of product management for research, Dianne Penn, put it plainly to CNBC: pricing is "very top of mind" for customers, but they are not just chasing lower costs. They want higher accuracy and higher benefit per dollar, and early Fable 5 customers "noted an improvement in spend per task." Her summary: "You just get a higher ROI by having more intelligent models."

Three things move spend-per-task in Fable 5's favor:

Fewer turns. A spreadsheet-automation customer found Fable 5 beats Opus 4.8 at every effort level and finishes runs 25 to 30% faster with fewer turns. Fewer turns means fewer tool calls and less repeated exploration, which is fewer billed tokens per completed job.

Fewer tokens for the same result. A frontier physics lab reported Fable 5 was the strongest model it tested "while using a third of the reasoning tokens," reaching in 36 hours nearly where GPT-5.5 landed after four days. Do the arithmetic, assuming a similar ratio holds against Opus 4.8: one-third the tokens at twice the per-token price is two-thirds the effective cost. On that class of task, Fable 5 is cheaper despite the 2x rate card.

No human rescue. A failed Opus run that needs a developer to step in costs far more than its token bill. Base44 described apps that "took a hundred prompts a year ago" now getting one-shotted. Rakuten was blunter: "the extra thinking pays for itself."
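The token-efficiency arithmetic above generalizes into a simple break-even rule. A rough sketch, assuming input and output tokens shrink by the same ratio and that the ratio is measured against Opus 4.8 (the lab quote does not say what it was measured against). The function names are made up for illustration.

```python
def relative_cost(token_ratio: float, price_ratio: float = 2.0) -> float:
    """Fable 5 spend as a fraction of Opus 4.8 spend on the same task.

    token_ratio: Fable 5 tokens divided by Opus 4.8 tokens (assumed uniform
                 across input and output, which is a simplification).
    price_ratio: Fable 5 price divided by Opus 4.8 price (2.0 on the rate card).
    """
    return token_ratio * price_ratio


def break_even_token_ratio(price_ratio: float = 2.0) -> float:
    """Token ratio below which Fable 5 is the cheaper model per task."""
    return 1.0 / price_ratio


# One-third the tokens at twice the price: about two-thirds the cost.
print(round(relative_cost(1 / 3), 3))
# Fable 5 has to use fewer than half the tokens to win on tokens alone.
print(break_even_token_ratio())
```

At a 2x rate card the break-even sits at half the tokens. Anything above that, and Fable 5 has to earn its keep through fewer retries or less human time instead.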

The clearest single example is Stripe. On a 50-million-line Ruby codebase, Fable 5 ran a codebase-wide migration in one day that was estimated at over two months of a team's manual work. At $10/$50, the token bill for that day is a rounding error against two months of engineer salaries. That is what "ROI per task, not per token" looks like at the extreme.

What a Task Actually Costs on Each Model

Take a representative agentic call: 100K tokens of context in, 20K tokens out.

On Opus 4.8:

input:  100,000 tokens × $5/1M  = $0.50
output:  20,000 tokens × $25/1M = $0.50
total                           = $1.00

On Fable 5, same token usage:

input:  100,000 tokens × $10/1M = $1.00
output:  20,000 tokens × $50/1M = $1.00
total                           = $2.00
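The two calculations above can be reproduced for any token mix. A minimal sketch using the rate-card prices from the Key Specs table; the `call_cost` helper is hypothetical and ignores anything not on that table, such as caching or batch discounts.

```python
# USD per 1M tokens as (input, output), from the Key Specs table.
PRICES = {
    "claude-opus-4-8": (5, 25),
    "claude-fable-5": (10, 50),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rate-card cost in USD of one call with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


opus = call_cost("claude-opus-4-8", 100_000, 20_000)
fable = call_cost("claude-fable-5", 100_000, 20_000)
print(opus, fable, fable / opus)  # 1.0 2.0 2.0
```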

Exactly 2x, confirming the sticker, but only if Fable 5 burns the same tokens. Now apply the efficiency evidence.

Suppose the task is genuinely hard. Opus 4.8 completes it on the first try only half the time; Fable 5 lands it first try. Cost per attempt stays $1.00 on Opus and $2.00 on Fable.

Opus 4.8: 2 attempts × $1.00 = $2.00 in tokens, plus a human review of the failed run
Fable 5:  1 attempt  × $2.00 = $2.00 in tokens, no rescue

Same token bill, but the Opus path also spent a developer's afternoon. That is the spend-per-task inversion Penn described, and it is why the per-token sticker is the wrong number to optimize.
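To play with that inversion on your own numbers, here is a small sketch of expected spend per completed task. It assumes every attempt succeeds independently with the same probability, so the expected number of attempts is 1 divided by that probability, and it charges a human rescue cost for each failed attempt. Both the model and the $200 rescue figure are illustrative assumptions, not figures from Anthropic or from the customers quoted above.

```python
def expected_task_cost(
    cost_per_attempt: float,
    success_rate: float,
    rescue_cost: float = 0.0,
) -> float:
    """Expected USD per completed task under independent retries.

    success_rate: probability that a single attempt succeeds (0 < p <= 1).
    rescue_cost:  hypothetical human cost charged per failed attempt.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / success_rate
    expected_failures = expected_attempts - 1
    return expected_attempts * cost_per_attempt + expected_failures * rescue_cost


# Tokens only: both paths come out the same, as in the example above.
print(expected_task_cost(1.00, 0.5))  # Opus 4.8 -> 2.0
print(expected_task_cost(2.00, 1.0))  # Fable 5  -> 2.0

# Add an assumed $200 of developer time per failed run.
print(expected_task_cost(1.00, 0.5, rescue_cost=200.0))  # -> 202.0
```

The token columns tie; the rescue term is what separates the two paths, which is why the first-try success rate on your own tasks is the number worth measuring.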

The flip side is just as real. On routine, high-volume output that Opus already handles well, the 2x premium is pure overhead. At enterprise scale, billing analysts have modeled it: 5 billion output tokens a year runs about $125,000 on Opus 4.8 versus $250,000 on Fable 5. For classification, summarization, and structured extraction, that delta is not a rounding error. It is the whole budget conversation.

When Opus 4.8 Is Still the Right Call

Fable 5 winning the benchmarks does not make Opus 4.8 the wrong default. Stay on Opus when any of these hold:

The work is routine and high-volume. Per-token economics dominate, and 2x compounds fast across millions of calls.

Latency or cost per request is the priority. Opus is cheaper and does not run the long, deliberate turns Fable 5 takes at higher effort.

You need zero data retention. Opus 4.8 supports ZDR. Fable 5 is a covered model with mandatory 30-day retention, required to run its safety classifiers. The data is not used for training, but it is retained, and for some enterprises that is a hard procurement gate regardless of the benchmarks.

Your work sits near cyber, bio, or chem boundaries. Fable 5 routes flagged queries in those domains to Opus 4.8 anyway. You would pay the Fable premium right up until the fallback fires, then get an Opus answer. On that traffic, just use Opus.

And remember the swap is not a drop-in. Fable 5 keeps thinking always on (you tune depth with effort, you cannot disable it), returns refusals as a successful HTTP 200 with a refusal stop reason your code must check, and runs longer turns that can break client timeouts. Plan the migration; do not just change the model string.
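The refusal point is the easiest one to miss, because nothing throws. A hedged sketch of the check, written against an already-parsed JSON response body so it runs without network access. The field names `stop_reason` and `content` and the value `"refusal"` follow the description above and the general shape of a Messages API response; treat them as assumptions and confirm them against the API docs before relying on this. `ModelRefusal` and `extract_text` are hypothetical names.

```python
class ModelRefusal(Exception):
    """Raised when a successful (HTTP 200) response carries a refusal."""


def extract_text(response: dict) -> str:
    """Return the text of a parsed response body, or raise on a refusal.

    Assumes the body has a `stop_reason` string and a `content` list of
    blocks shaped like {"type": "text", "text": "..."}.
    """
    if response.get("stop_reason") == "refusal":
        # A 200 status alone does not mean the model answered.
        raise ModelRefusal("model declined to answer; do not treat as output")
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

Code that only checks the HTTP status would pass a refusal downstream as if it were a result. The longer turns mentioned above also call for a more generous client timeout, which this sketch leaves out.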

The Fallback Relationship

There is one detail with no equivalent in any Opus release. Fable 5 ships with classifiers that watch for cybersecurity, biology and chemistry, and model-distillation requests. When one trips, your query is answered by Opus 4.8 instead, and you are told it happened.

Anthropic says this fires in fewer than 5% of sessions, and that more than 95% of sessions run entirely on Fable 5. Put another way, up to one in twenty sessions may not be running entirely on the model you picked. On the topics that trip it, the deployable Fable 5 effectively performs like Opus 4.8, because that is literally what answers.

The cost upside: those rerouted responses bill at Opus rates, not Fable rates. So bio, chem, or security-adjacent workloads that trip the classifier get a quiet discount. The downside is unpredictability, which is its own reason to keep that traffic on Opus by choice rather than by accident.
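For budgeting, the fallback can be folded into a blended per-session estimate. A back-of-the-envelope sketch that treats a rerouted session as billed entirely at the Opus-rate cost, which is a simplification: per the section above, you pay Fable rates until the fallback fires. The 5% share is Anthropic's stated upper bound, and `blended_session_cost` is a hypothetical helper.

```python
def blended_session_cost(
    fable_cost: float,
    opus_cost: float,
    fallback_share: float,
) -> float:
    """Average USD per session when a share of sessions is rerouted.

    fable_cost:     cost of the session at Fable 5 rates
    opus_cost:      cost of the same session at Opus 4.8 rates
    fallback_share: fraction of sessions answered by Opus 4.8 (0 to 1)
    """
    if not 0 <= fallback_share <= 1:
        raise ValueError("fallback_share must be between 0 and 1")
    return (1 - fallback_share) * fable_cost + fallback_share * opus_cost


# A $2.00 Fable session, $1.00 on Opus, with 5% of sessions rerouted:
print(round(blended_session_cost(2.00, 1.00, 0.05), 2))  # 1.95
```

At the stated ceiling the discount is small. The bigger effect is on workloads where the share is much higher than average, which is the traffic the article suggests sending to Opus 4.8 on purpose.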

How to Choose

The decision collapses to a routing rule. Default to the cheapest model that reliably clears your quality bar, and promote a task to Fable 5 only when Opus 4.8 demonstrably fails, loses the plan mid-task, or burns more total tokens through retries.

| Scenario | Pick | Why |
| --- | --- | --- |
| Large codebase migration or multi-repo refactor | Fable 5 | Largest measured gap; Stripe's two-months-to-one-day |
| Long-running autonomous agent runs | Fable 5 | Fewer turns, plan retention, memory compounds |
| Complex financial or analytical research | Fable 5 | First model to break 90% on Hex's analytics benchmark |
| Vision-heavy extraction or screenshot-to-code | Fable 5 | New state of the art on vision |
| 1M-token analysis where a missed detail is expensive | Fable 5 | Context plus reasoning gains |
| Routine code edits, helpers, Q&A | Opus 4.8 or Sonnet 4.6 | Fable is overkill at 2x |
| Budget-capped, high-volume pipelines | Opus 4.8 | Per-token economics dominate |
| ZDR-mandated data | Opus 4.8 | Fable requires 30-day retention |
| Cyber, bio, or chem adjacent work | Opus 4.8 | Fable routes those to Opus anyway |

If you run an agent fleet, you do not pick once. Put planners and the hardest builders on Fable 5, keep evaluators, linters, doc writers, and routine testers on Opus 4.8, and let each role buy exactly the intelligence it needs. The model choice lives next to the agent, not at the project root.
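The routing rule and the per-role split can be written down in a few lines. This is one possible encoding of the table above, not a reference implementation: the `Task` fields, the order of the checks, and the role names are all assumptions made for illustration. Only the two model IDs come from the Key Specs table.

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"
OPUS = "claude-opus-4-8"


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; every field is an illustrative flag."""
    long_horizon: bool = False        # migration, multi-repo refactor, long agent run
    needs_zdr: bool = False           # zero data retention is mandated
    safeguard_adjacent: bool = False  # near cyber, bio, or chem boundaries
    high_volume: bool = False         # budget-capped pipeline traffic
    opus_failed_before: bool = False  # Opus 4.8 demonstrably failed or needed retries


def pick_model(task: Task) -> str:
    """Default to Opus 4.8; promote to Fable 5 only when the task calls for it."""
    # Hard gates first: these rule Fable 5 out regardless of capability.
    if task.needs_zdr or task.safeguard_adjacent:
        return OPUS
    # High-volume work stays on Opus unless it has demonstrably failed there.
    if task.high_volume and not task.opus_failed_before:
        return OPUS
    if task.long_horizon or task.opus_failed_before:
        return FABLE
    return OPUS


# Per-role assignment for an agent fleet, following the split described above.
ROLE_MODEL = {
    "planner": FABLE,
    "hard_builder": FABLE,
    "evaluator": OPUS,
    "linter": OPUS,
    "doc_writer": OPUS,
    "routine_tester": OPUS,
}
```

Putting the hard gates first means a ZDR or safeguard constraint can never be overridden by a capability argument, which matches how those rows read in the table.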

The Verdict

Fable 5 is a genuine tier jump, not a point release, and it is priced like one. The benchmark gap is real and it widens as tasks get longer and harder. The premium is exactly 2x on paper, but your real number depends on whether Fable's fewer turns, fewer tokens, and higher first-try success rate buy back more than the doubled rate.

For the hard, long-horizon tail of your work, they usually do. For everything routine, Opus 4.8 at half the price still wins. Route accordingly, and let the task decide the model.

Frequently Asked Questions

Is Claude Fable 5 worth it over Opus 4.8?

For long, complex, or failure-prone tasks, yes. Fable 5 leads Opus 4.8 on nearly every published benchmark (80.3% vs 69.2% on SWE-Bench Pro), and its fewer turns and higher first-try success rate can make spend-per-task lower despite the 2x sticker. For routine, high-volume work, Opus 4.8 at half the price is the better choice.

How much more does Claude Fable 5 cost than Opus 4.8?

Exactly double on every line of the rate card: $10 vs $5 per million input tokens and $50 vs $25 per million output tokens. A 100K-in/20K-out task costs $2.00 on Fable 5 versus $1.00 on Opus 4.8 at identical token usage. Token efficiency can narrow or even invert that gap on hard tasks.

Should I choose Claude Fable 5 or Opus 4.8 for coding?

For large migrations, multi-repo refactors, and long autonomous runs, choose Fable 5, where the SWE-Bench Pro lead and plan retention compound. For routine edits, helpers, and high-volume calls, choose Opus 4.8 or Sonnet 4.6. Many teams route both: planners and hard builders on Fable, everything else on Opus.

Why did my Claude Fable 5 request get answered by Opus 4.8?

Fable 5's safeguards route flagged cybersecurity, biology, chemistry, and distillation requests to Opus 4.8 and notify you. Anthropic says this happens in under 5% of sessions. Those responses are billed at Opus rates, not Fable rates.

Does Claude Fable 5 support zero data retention?

No. Fable 5 is a covered model with a mandatory 30-day retention requirement, needed to run its safety classifiers. Retained data is not used for training, but it is retained. Opus 4.8 still supports zero data retention, which can be the deciding factor for regulated workloads.

Is the benchmark gap reliable?

Treat it as directional. Anthropic ran most of the evaluations and several early-customer figures are testimonials rather than audited results, and at least one researcher questioned the pre-launch numbers. The SWE-Bench Pro methodology is public and has been applied across models, which makes 80.3% vs 69.2% the most trustworthy single comparison. Validate on your own tasks before committing traffic.

Sources

  • Claude Fable 5 and Claude Mythos 5
  • Anthropic's Claude Fable 5 is a version of Mythos the public can access today (TechCrunch)
  • Anthropic releases Mythos-like AI model to the public (CNBC)
  • Claude Fable 5 on AWS (AWS News Blog)
  • Claude Fable 5 and Mythos 5 benchmarks explained (Vellum)
  • Claude Fable 5 vs Opus 4.8: Benchmarks, Pricing & When to Use Each (TrueFoundry)
  • Prompting Claude Fable 5 (API docs)
