Claude Fable 5 Use Cases
What people actually did with Claude Fable 5 in early access: a Stripe migration in a day, Hex breaking 90% on analytics, web apps rebuilt from screenshots, and a coding agent that ships a week of work in an afternoon. Real implementations with names and numbers.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
In its first days of early access, Claude Fable 5 ran a codebase-wide migration across Stripe's 50-million-line Ruby codebase in a single day, work a whole team would have spent over two months doing by hand. It also rebuilt a web app's source code from screenshots alone, broke 90% on Hex's analytics benchmark, and shipped a week's worth of library features for an independent developer in one afternoon.
This is not a feature list. It is a proof list. Below is what real teams and one very public independent tester actually did with claude-fable-5 in its first 48 hours, with the names and the numbers attached.
A note on sourcing before you read. Most of these accounts come from early-access customers Anthropic quoted in its launch announcement, so they are first-party and vendor-curated. We flag which is which. The strongest independent signal comes from developer Simon Willison, who had no early access and ran his own tests on launch day.
The Proof List at a Glance
| Company / test | Use case | Result |
|---|---|---|
| Stripe | Codebase-wide migration, 50M-line Ruby codebase | 1 day vs over 2 months for a whole team |
| Cognition (Devin) | FrontierCode coding eval | Highest of any frontier model, even at medium effort |
| Cursor | Long-horizon coding (CursorBench) | State of the art; unlocked previously out-of-reach problems |
| GitHub | Complex long-horizon coding | Autonomy and reliability beyond previous benchmarks |
| Base44 | One-shotting full apps | Apps that took 100 prompts a year ago now one-shot |
| Genspark | UI design and game coding | Beat every other model tested |
| Hebbia | Finance Benchmark (senior reasoning) | Highest score of any model |
| IMC | Trading-analysis evals | Aced them nearly across the board |
| Hex | Core analytics benchmark | First model to break 90%, a 10-point jump over Opus |
| Physics lab | Frontier physics research | One third of the reasoning tokens; 36 hours got near GPT-5.5's four days |
| Legal team | Contract redlines (blind review) | Matched or beat their current model every time |
| Spreadsheet suite | Everyday spreadsheet tasks | Beats Opus 4.8 at every effort, 25-30% faster |
| Rakuten | Highly autonomous operations | Validates its own work; "the extra thinking pays for itself" |
| Anthropic (vision) | Rebuild web app from screenshots | Reconstructed source from screenshots alone |
| Simon Willison | MicroPython to full CPython in WASM | Working installable wheel in a day |
Coding, Migrations, and Long-Horizon Engineering
This is the category where Fable 5's lead is widest, and Anthropic is explicit about why: the longer and more complex the task, the bigger Fable's advantage over its other models.
The flagship example is Stripe. According to Anthropic's announcement, Stripe reported that Fable 5 "compressed months of engineering into days." In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand. That is the kind of work that normally gets scoped into quarters, not afternoons.
The agent and editor companies tell a consistent story. Cursor reported that Fable 5 is "the state of the art model on CursorBench" and that "it's opened up a class of long-horizon problems that were out of reach for earlier models." Cognition, the team behind Devin, said it is the highest-scoring model on their FrontierBench coding eval, that it "excels at long-horizon reasoning and generalizes to unfamiliar tools out of the box," and that it scores highest among frontier models even at medium effort. GitHub said that in early testing it took on complex, long-horizon coding tasks "with a level of autonomy and reliability that exceeded previous benchmarks."
For builders without a large legacy codebase, the vibe-coding numbers matter more. Base44 reported that "apps that took a hundred prompts a year ago, it now one-shots," and told TechCrunch that Fable is better at one-shotting full apps with excellent tool-calling. Genspark told TechCrunch that Fable beat every other model in its evaluations and was significantly better at UI design and game coding.
The one fully independent account comes from Simon Willison, who had no early access. In about five and a half hours on launch day, he used Fable inside Claude Code to add a human-in-the-loop pause-and-approve feature to his Datasette Agent project. When he told it that changes to his underlying LLM library were also in scope, it implemented four upstream features to support the work cleanly, then shipped them as a release. His verdict: "I spent several hours on it today, but it feels like several days' worth of work," and he praised the quality of the API design, tests, code, and documentation.
What this means for you: the unlock is not "writes code faster," it is "stays coherent across a job too big to babysit." If you have a migration, a refactor, or a feature that spans many files and would normally eat a sprint, this is the model you point at it. For small daily edits, Sonnet is still the cheaper, faster call.
Knowledge Work: Finance, Analytics, and Research
Fable 5 is not just a coding model. Some of the sharpest early results came from analysts.
Hex, the analytics platform, said Fable 5 was "the first to break 90% on our core analytics benchmark of complex, long-running analytical tasks," a 10-point jump over Opus, adding that "on the hardest questions, it shows strong judgment and attention to nuance." TechCrunch independently re-reported that result, which makes it one of the better-corroborated claims in the launch.
In finance, Hebbia reported that Fable 5 has the highest score of any model on its Finance Benchmark for senior-level reasoning, with substantial gains in document-based reasoning and chart and table interpretation. The trading firm IMC said Fable "aced their trading-analysis evaluations nearly across the board," including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.
The research results are the most striking. A physics research lab told Anthropic that Fable 5 is "the strongest model we've tested on frontier physics research while using a third of the reasoning tokens," and that "in 36 hours it got nearly to where GPT-5.5 landed after four days." Less compute, less time, comparable destination.
Even the unglamorous spreadsheet work improved. One customer reported that Fable beats Opus 4.8 on their everyday spreadsheet suite at every effort level, finishing runs 25 to 30% faster with fewer turns.
What this means for you: if your work is reading dense source material and getting the details right, finance memos, analytics pipelines, research synthesis, the gains here are about judgment under ambiguity, not raw speed. The token-efficiency angle is real too. Faster runs at lower effort levels can offset the higher per-token price.
Vision: Screenshots In, Code Out
Anthropic calls Fable 5 the new state of the art for vision tasks, and the examples are concrete rather than abstract.
The headline one for builders: Fable 5 can rebuild a web app's source code from screenshots alone. It can also extract precise numbers from detailed scientific figures, the kind of chart-reading that usually requires a human to transcribe.
The clearest demonstration of how far the vision gains go is a game. Earlier Claude models struggled to play Pokemon FireRed even when given harnesses full of helper tools, maps, and game-state information. Fable 5 beat the game using a minimal, vision-only harness, working from nothing but raw screenshots. The model is doing the navigation and planning itself, off the pixels, instead of leaning on scaffolding someone built for it.
What this means for you: screenshot-to-code and figure-extraction are now reliable enough to put in a workflow. If you have design mocks, dashboard captures, or scientific PDFs, you can hand them over directly instead of transcribing first. Less scaffolding required is the practical theme: the model meets messy real interfaces with fewer custom tools.
Long-Running Agents, Memory, and Self-Validation
The trait that makes all of the above usable is what happens when no human is watching.
Rakuten put it plainly in a statement reported by TechCrunch: "At the highest effort, Claude Fable 5 reflects on and validates its own work. For us, that's what makes highly autonomous operations possible. The extra thinking pays for itself." That self-check is the difference between an agent you can leave running and one you have to re-verify line by line.
Memory compounds the effect. In Anthropic's own test, the model played the deck-building game Slay the Spire with access to persistent file-based memory. That memory improved Fable's performance three times more than it improved Opus 4.8's, and Fable reached the game's final act three times more often. The model is not just remembering, it is improving its own play from its own notes across a long run.
On the agent-orchestration side, Anthropic's documentation says Fable 5 is significantly more dependable at dispatching and sustaining parallel subagents and at managing communication with long-running ones. One early customer reported that it "delivers more capable engineering in fewer turns" while handling the complex multi-agent Claude Code workflows their employees run daily.
What this means for you: this is the model for work you kick off and walk away from. If you run agents overnight, fan out subagents across a large job, or build autonomous pipelines, the self-validation is the load-bearing feature. It is also why people are reaching for it on jobs Opus 4.8 could not finish unsupervised.
Science, via the Same Underlying Model
The most dramatic results came from Mythos 5, which is the same underlying model as Fable 5 with the safety classifiers lifted. Worth reading with one caveat: public Fable 5 falls back to Opus 4.8 on most biology and chemistry queries, so you cannot necessarily reproduce these on the public model. They show what the model class is capable of, not what an open API call will do today.
With that flagged, the numbers are notable. Anthropic's internal protein-design experts reported accelerating parts of the drug-design process by around ten times. Running with protein-design and bioinformatics tools but no human assistance, the model matched or beat skilled human operators, choosing binding sites, selecting and running tools, and recovering from its own failures. Nine of the 14 protein targets in the study yielded strong drug-design candidates.
In molecular biology, Anthropic's scientists preferred the model's hypotheses about 80% of the time over Opus-class models in blinded comparisons, and one hypothesis, a novel mechanism for an E. coli protein, was independently corroborated by another lab working on the same problem. In genomics, the model ran over a week of largely autonomous work, assembled single-cell data across 138 animal species, and trained a custom model that outperformed a recent Science-published model while being 100 times smaller.
What this means for you: unless you are in a trusted-access research program, treat these as a ceiling demo rather than a daily capability. The signal for builders is the shape of it: a model that can run for a week, recover from its own dead ends, and produce a result worth publishing is the same engine doing your migrations.
The Catch: Cost, Guardrails, and a Closing Window
Fable 5 is the most capable model Anthropic has released to the public, and the trade-offs are honest ones.
It is expensive. Pricing is $10 per million input tokens and $50 per million output tokens, double Opus 4.8 and the same as the much-pricier Mythos Preview was at half its old rate. Simon Willison burned $110 of tokens in a single day of testing. The model is also slow, the flip side of feeling, in his words, "something of a beast." The token-efficiency gains some customers reported can soften the bill, but you should measure on your own workloads before committing.
There are guardrails. When Fable's classifiers detect a query about cybersecurity, biology and chemistry, or model distillation, the response is handled by Opus 4.8 instead and you are told. Anthropic's early data shows this happens in fewer than 5% of sessions, so for the vast majority of work you get Fable's full capability. But the fallbacks are tuned conservatively and will occasionally catch harmless requests.
There is also a clock. From launch through June 22, 2026, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. On June 23 it leaves those plans and requires usage credits, with Anthropic aiming to restore it to standard subscriptions once capacity allows. If you want to test it on your own work without a separate bill, that window is the time.
Frequently Asked Questions
What has Claude Fable 5 actually been used for?
Real early-access work, mostly large coding jobs and analysis. Stripe ran a codebase-wide migration in a 50-million-line Ruby codebase in one day. Hex broke 90% on its analytics benchmark. Hebbia and IMC topped their finance and trading evals. Anthropic also showed it rebuilding a web app's source from screenshots and playing Pokemon FireRed from raw pixels. Most accounts come from Anthropic's launch announcement, so they are first-party.
Is Claude Fable 5 good at coding?
The early evidence says yes, especially for big, long-running jobs. Cursor called it state of the art on CursorBench, Cognition ranked it highest on their FrontierBench coding eval, and GitHub reported autonomy and reliability beyond previous benchmarks. Independent tester Simon Willison shipped a week's worth of library features in an afternoon with it. For small daily edits, a cheaper model like Sonnet is usually the better call.
How much does Claude Fable 5 cost?
It is $10 per million input tokens and $50 per million output tokens, double the price of Opus 4.8. The model ID is claude-fable-5. It is included free on Pro, Max, Team, and seat-based Enterprise plans through June 22, 2026, after which it requires usage credits until capacity allows a return to standard plans.
Why does Claude Fable 5 sometimes answer like a different model?
Fable 5 ships with safety classifiers. When a query touches cybersecurity, biology and chemistry, or attempts to distill the model, the response is handled by Opus 4.8 instead and you are notified. Anthropic says this fallback triggers in fewer than 5% of sessions, so most work runs on Fable 5 at full capability.
Can Claude Fable 5 do the science demos Anthropic showed?
Not directly on the public model in most cases. The protein design, genomics, and molecular biology results were produced by Mythos 5, the same underlying model with safeguards lifted, available only through trusted-access programs. Public Fable 5 falls back to Opus 4.8 on most biology and chemistry queries. Treat those results as a ceiling for the model class, not a daily public capability.
Is Claude Fable 5 worth it over Opus 4.8?
For long-horizon, autonomous, or high-stakes work, the early reports point to a clear step up. Customers consistently described it solving problems that were out of reach for earlier models, and it beats Opus 4.8 on benchmarks like the spreadsheet suite at every effort level. The trade-offs are real: double the price and slower runs. For routine work, Opus 4.8 or Sonnet remains the more economical choice.
Sources
- Claude Fable 5 and Claude Mythos 5 (Anthropic)
- Anthropic's Claude Fable 5 is a version of Mythos the public can access today (TechCrunch)
- Anthropic releases Fable 5, the first public Mythos-class model (NBC News)
- Anthropic is releasing a public version of its Mythos AI model as Claude Fable 5 (Quartz)
- Initial impressions of Claude Fable 5 (Simon Willison)
Related Pages
Stop configuring. Start building.
SaaS builder templates with AI orchestration.