Claude Fable 5 API Guide

To call Claude Fable 5, use the model ID claude-fable-5 and turn thinking on with thinking: {"type": "adaptive"}. That is the whole happy path. The traps are everything around it: sampling parameters that used to work now return a 400, and one parameter that is fine on Opus 4.8 will break your request on Fable 5.

Fable 5 launched June 9, 2026 as Anthropic's most capable model, a tier above Opus. It has a 1M context window and a 128K output ceiling, and it costs $10 per million input tokens and $50 per million output tokens. That is double Opus 4.8's rate, so this is a model you point at hard problems, not routine work.

This is the practical guide: how to make the call, which parameters return a 400 and why, how to tune effort and budgets, how caching changes, and the one Bedrock step that will block you if you skip it.

The Minimal Working Call

Start here. This is the smallest call that works, in Python with the official Anthropic SDK. The client reads your ANTHROPIC_API_KEY from the environment. The call itself picks the model, turns on adaptive thinking, and sets effort to xhigh because that is the right setting for coding and agentic work.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Explain what a row-level security policy does."}],
)

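# With adaptive thinking on, the first block may be a thinking block, so print only the text blocks.
print("".join(block.text for block in message.content if block.type == "text"))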

The model ID is the exact string claude-fable-5. There is no date suffix. If you append one, you get a 404.

If you do not want thinking at all, you remove the thinking line. You do not set it to disabled. That distinction is the single biggest gotcha on this model, and it gets its own section below.

The 400 Traps

Fable 5 dropped a set of parameters that earlier models accepted. Sending any of them returns an HTTP 400 (invalid_request_error). Here is the full matrix of what breaks and how to fix it.

Parameter you send	What happens	Fix
`thinking: {"type": "adaptive"}`	Works. This is the on-mode.	Keep it.
`thinking: {"type": "enabled", "budget_tokens": N}`	400	Use `{"type": "adaptive"}`.
`thinking: {"type": "disabled"}`	400 (new on Fable 5)	Omit the `thinking` param entirely.
(no `thinking` field)	Runs without thinking.	This is how you turn thinking off.
`temperature`	400	Remove it. Steer with the prompt instead.
`top_p`	400	Remove it.
`top_k`	400	Remove it.
Assistant-turn prefill	400	Use structured outputs (`output_config.format`).
`output_format` (top-level)	Deprecated API-wide	Use `output_config: {"format": {...}}`.

A prefill means your messages array ends with a role: "assistant" turn that you started writing for the model to continue. That pattern returns a 400 on Fable 5. If you used it to force JSON, switch to structured outputs through output_config.format.
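You can catch most of these before the request leaves your process. The sketch below is an illustration only: find_request_traps is a hypothetical helper, not part of the Anthropic SDK, and it encodes nothing beyond the rows of the table above, applied to the keyword arguments you would pass to messages.create.

```python
from typing import Any

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def find_request_traps(params: dict[str, Any]) -> list[str]:
    """Return one fix hint per Fable 5 trap found in the request kwargs."""
    problems: list[str] = []

    thinking = params.get("thinking")
    if isinstance(thinking, dict):
        kind = thinking.get("type")
        if kind == "disabled":
            problems.append("thinking: omit the field instead of setting it to disabled")
        elif kind == "enabled" or "budget_tokens" in thinking:
            problems.append('thinking: use {"type": "adaptive"} instead of budget_tokens')

    for name in SAMPLING_PARAMS:
        if name in params:
            problems.append(f"{name}: remove it")

    # A trailing assistant turn is a prefill.
    messages = params.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        problems.append("prefill: use output_config.format instead")

    # Deprecated rather than rejected, but worth flagging in the same pass.
    if "output_format" in params:
        problems.append("output_format: move it under output_config.format")

    return problems
```

Run it in a unit test or just before sending, and fail loudly when the list is not empty.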

Why Thinking Disabled Returns a 400

This is the trap that breaks "drop-in" upgrades, so it is worth being precise.

On Opus 4.8 and Opus 4.7, two things both run a request without thinking: omitting the thinking field, or setting thinking: {"type": "disabled"}. They are interchangeable.

On Fable 5 they are not. An explicit thinking: {"type": "disabled"} returns a 400. Only omitting the field runs thinking-free. So code that worked yesterday on Opus and explicitly set disabled will start erroring the moment you swap the model ID.

This snippet shows the broken version and the working version side by side. The first one 400s on Fable 5. The second one runs without thinking, which is what the first one was trying to do.

# WRONG on Fable 5 - returns a 400
message = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    thinking={"type": "disabled"},
    messages=[{"role": "user", "content": "Classify this ticket as bug or feature."}],
)

# RIGHT - omit the thinking param to run without thinking
message = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Classify this ticket as bug or feature."}],
)

The fix is a deletion, not a value change. Remove the line. Do not try to find a different value for type.

One more behavior change worth knowing: with thinking off, Fable 5 sometimes writes longer reasoning into the visible answer. If you need short, fast replies, the cleanest fix is to leave adaptive thinking on, which keeps the reasoning out of the final text. If you must run thinking-free, add a system instruction telling the model to reply with only its final answer.

Effort Tuning

Effort controls how hard the model thinks and acts. You set it inside output_config, not at the top level. The default is high.

This call sets effort explicitly. Lower values mean fewer tool calls, less preamble, and terser output. Higher values mean deeper reasoning and more thoroughness, at higher token cost.

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # low | medium | high | xhigh | max
    messages=[{"role": "user", "content": "Refactor this module for testability."}],
)

Here is how to pick a level.

Level	Use it for
`low`	Subagents, simple scoped tasks, latency-sensitive work, cheap bulk runs.
`medium`	Cost-sensitive work that can trade some intelligence for fewer tokens.
`high`	The default. The recommended minimum for anything intelligence-sensitive.
`xhigh`	Coding and agentic work. The best setting for most of it, and the Claude Code default.
`max`	The hardest problems and ceiling testing, where correctness beats cost.

Effort matters more on Fable 5 than on any earlier Opus model. If you are migrating, re-tune it per route instead of assuming the old setting still fits. For long-running agentic tasks, give the full task spec up front in one turn and run at high or xhigh. At xhigh or max, give max_tokens real headroom (start at 64K) so the model has room to think and act across tool calls, and stream the response, because a non-streaming request of that size can hit an HTTP timeout.
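One way to re-tune per route is to keep the effort level in a single table next to your routing code. This is a sketch under assumptions: the route names and the request_settings helper are hypothetical, the levels follow the table above, and the 16K value for lower effort levels is borrowed from the minimal call, not a documented recommendation.

```python
# Hypothetical route names mapped to effort levels from the table above.
EFFORT_BY_ROUTE = {
    "ticket_triage": "low",       # simple, scoped, latency-sensitive
    "bulk_summaries": "medium",   # cost-sensitive
    "code_review": "xhigh",       # coding and agentic work
    "incident_analysis": "max",   # hardest problems
}
DEFAULT_EFFORT = "high"  # the default level, per this guide


def request_settings(route: str) -> dict:
    """Build the effort-related kwargs for one route."""
    effort = EFFORT_BY_ROUTE.get(route, DEFAULT_EFFORT)
    # xhigh and max get 64K of headroom; requests this large should be streamed.
    max_tokens = 64000 if effort in ("xhigh", "max") else 16000
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
    }
```

Spread the result into the call, for example client.messages.create(model="claude-fable-5", messages=..., **request_settings("code_review")), so that changing one route's effort is a one-line edit.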

Task Budgets

A task budget tells the model how many tokens it has for a whole agentic loop, counting thinking, tool calls, and final output. The model sees a running countdown and paces itself, wrapping up gracefully as the budget runs down.

This is different from max_tokens. max_tokens is a hard ceiling the model cannot see. A task budget is a suggestion the model is aware of and plans around. Use the budget to make the model self-moderate, and keep max_tokens as the enforced cap.

Task budgets are in beta, so you pass a beta header and call through the beta namespace. The minimum budget is 20,000 tokens.

# Stream the call: max_tokens is above the ~16K point where non-streaming requests risk a timeout.
with client.beta.messages.stream(
    betas=["task-budgets-2026-03-13"],
    model="claude-fable-5",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[{"role": "user", "content": "Migrate the auth module off the deprecated API."}],
) as stream:
    message = stream.get_final_message()

Set a generous budget for open-ended agentic runs and a tighter one for latency-sensitive tasks. If the budget is too small for the work, the model finishes less thoroughly and tells you the budget was the constraint.
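Because the minimum is easy to forget, it can help to build the budget in one place. A minimal sketch, assuming the 20,000-token floor and the output_config shape shown above; task_budget_config is a hypothetical helper, not an SDK function.

```python
MIN_TASK_BUDGET = 20_000  # minimum budget stated in this guide


def task_budget_config(total_tokens: int, effort: str = "high") -> dict:
    """Return an output_config dict with a task budget, rejecting budgets below the minimum."""
    if total_tokens < MIN_TASK_BUDGET:
        raise ValueError(
            f"task budget {total_tokens} is below the {MIN_TASK_BUDGET}-token minimum"
        )
    return {
        "effort": effort,
        "task_budget": {"type": "tokens", "total": total_tokens},
    }
```

Pass the result as output_config=task_budget_config(128000) and keep max_tokens as the enforced cap.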

Caching Notes

Prompt caching works the same way it does on every Claude model: it is a prefix match, so any byte change anywhere in the cached prefix invalidates everything after it. Keep stable content first and put volatile content (timestamps, request IDs, the per-request question) after your last breakpoint.

One number is different on Fable 5: the minimum cacheable prefix is 2048 tokens, down from 4096 on Opus 4.8. So a 3K-token prompt that silently failed to cache on Opus 4.8 will cache on Fable 5. Cache reads cost about 0.1x the base input rate, and writes cost about 1.25x with the default five-minute window.

This call caches a large, stable system prompt. The cache_control marker goes on the system block. The per-request question in the user turn stays uncached, which is what you want.

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    system=[
        {
            "type": "text",
            "text": LARGE_STABLE_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key risks in section 4."}],
)
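The cache multipliers quoted earlier in this section turn into a quick estimate of what a shared prefix costs. This is rough arithmetic, not a billing formula: it assumes the $10 input rate from this guide, that every request lands inside one cache window, and that every request after the first is a full cache hit. prefix_cost is a hypothetical helper.

```python
INPUT_RATE = 10.00        # USD per million input tokens (Fable 5, from this guide)
CACHE_WRITE_MULT = 1.25   # approximate, default five-minute window
CACHE_READ_MULT = 0.10    # approximate


def prefix_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """USD spent on the shared prefix across `requests` calls."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_call = prefix_tokens * INPUT_RATE / 1_000_000
    if not cached:
        return per_call * requests
    write = per_call * CACHE_WRITE_MULT                  # first call writes the cache
    reads = per_call * CACHE_READ_MULT * (requests - 1)  # later calls read it
    return write + reads
```

For a 50,000-token prefix sent 100 times, the uncached total is about $50.00 and the cached total is about $5.58 under these assumptions.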

To confirm a cache hit, read message.usage.cache_read_input_tokens. If it stays zero across repeated requests with the same prefix, something in the prefix is changing every call. The usual culprits are a timestamp in the system prompt, JSON serialized without sorted keys, or a tool set that varies per request.
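The serialization culprit is the easiest one to rule out. Here is a minimal sketch, assuming you embed reference data as JSON in the system prompt; stable_json, build_system_text, and cache_hit are hypothetical helpers, and only cache_read_input_tokens comes from the response itself.

```python
import json


def stable_json(value) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def build_system_text(instructions: str, reference_data: dict) -> str:
    """Build the cached prefix. Keep timestamps and request IDs out of it."""
    return instructions + "\n\n" + stable_json(reference_data)


def cache_hit(usage) -> bool:
    """True when the response usage reports tokens read from the cache."""
    return (getattr(usage, "cache_read_input_tokens", 0) or 0) > 0
```

Two dicts with the same contents but different key order now produce the same prefix, and cache_hit(message.usage) gives you a boolean to log per request.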

Streaming for Large Outputs

Fable 5 can produce up to 128K output tokens, but the SDK will hit an HTTP timeout on a non-streaming request that asks for a large output. The rule of thumb: stream anything above roughly 16K max_tokens.

This streams the response and collects the final message at the end, so you get both live tokens and a complete result object.

with client.messages.stream(
    model="claude-fable-5",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Write the full migration plan."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

One streaming note specific to Fable 5: thinking blocks stream, but their text is empty by default. If you display reasoning to users, that default looks like a long silent pause before the answer starts. To restore visible progress, ask for summarized thinking with thinking={"type": "adaptive", "display": "summarized"}.
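If you would rather not decide per call site, the 16K rule of thumb can live in one wrapper. This is a sketch, not SDK behavior: send is a hypothetical helper, the threshold is this guide's rough figure, and the only SDK calls it relies on are the messages.create, messages.stream, and get_final_message calls already used above.

```python
STREAM_THRESHOLD = 16_000  # rule of thumb from this guide, not an SDK constant


def send(client, **kwargs):
    """Stream large requests and return the final message either way."""
    if kwargs.get("max_tokens", 0) > STREAM_THRESHOLD:
        # Large output: stream to avoid the HTTP timeout, then collect the result.
        with client.messages.stream(**kwargs) as stream:
            return stream.get_final_message()
    return client.messages.create(**kwargs)
```

Callers get the same message object in both cases. The trade-off is that this wrapper hides live tokens, so use the streaming call directly wherever you display output as it arrives.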

Fable 5 on Amazon Bedrock

Bedrock has one step that will block every request until you do it: you must opt into data sharing before you can invoke Fable 5. There is no console UI for this at launch, so it is easy to miss and the error it produces is not obvious.

You opt in through the Data Retention API by setting the mode to provider_data_share. This curl call does it for the bedrock-mantle engine, which is the one you use with the Anthropic Messages API. Run it once per account before your first model call.

curl -X PUT https://bedrock-mantle.us-east-1.api.aws/v1/data_retention \
  -H "x-api-key: <your-bedrock-api-key>" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "provider_data_share" }'

If you call the Converse or Invoke API on the bedrock-runtime engine instead, the endpoint and auth differ. This is the equivalent opt-in for that engine.

curl -X PUT https://bedrock.us-east-1.amazonaws.com/data-retention \
  -H "Authorization: Bearer <your_bearer_token>" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "provider_data_share" }'

Opting in means Anthropic retains your inputs and outputs for 30 days. The data is not used for training, human access to it is logged, and it is deleted after 30 days. It is a requirement for all Mythos-class traffic, and once you opt in your data leaves AWS's security boundary. Decide if that is acceptable for your workload before you flip it.

Once you are opted in, point the Anthropic SDK at the Bedrock endpoint. The model ID on the Messages API is anthropic.claude-fable-5, with the anthropic. prefix. This Python call runs against bedrock-mantle.

import anthropic

client = anthropic.Anthropic(
    base_url="https://bedrock-mantle.us-east-1.api.aws/anthropic",
    api_key="<your-bedrock-api-key>",
)

message = client.messages.create(
    model="anthropic.claude-fable-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Design a multi-region architecture for 100k requests per second."}],
)

print(message.content[0].text)

If you prefer the Converse API through boto3, the model ID changes to global.anthropic.claude-fable-5 and you call bedrock-runtime. This is the same request in that shape.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="global.anthropic.claude-fable-5",
    messages=[{"role": "user", "content": [{"text": "Design a multi-region architecture for 100k RPS."}]}],
    inferenceConfig={"maxTokens": 4096},
)

print(response["output"]["message"]["content"][0]["text"])

At launch, Bedrock serves Fable 5 in US East (N. Virginia) and Europe (Stockholm). Access expands gradually across AWS accounts; if yours is not enabled yet, contact AWS Support to move faster.

A Billing Surprise to Plan For

Fable 5 ships with conservative safeguards. Queries about cyber, bio, chemistry, and health topics fall back to a response from Opus 4.8 instead, and the user is told. This fires in under 5% of sessions on average.

The part that affects your code is the billing. Fallback responses are charged at Opus rates, not Fable rates. On Bedrock, if a request is blocked mid-conversation, the initial tokens bill at Fable rates and the rest at Opus rates. So some responses will be cheaper than the Fable rate implies and will carry a note about being answered by Opus 4.8. Build your cost model to allow for it rather than treating every Fable call as a flat Fable charge.
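A blended estimate is enough for most dashboards. The sketch below uses the per-million-token rates quoted in this guide and makes two simplifying assumptions: the fallback share is applied per call (the 5% figure above is per session), and a fallback call bills entirely at Opus rates, which ignores the Bedrock split for requests blocked mid-conversation. The function names are hypothetical.

```python
FABLE = {"input": 10.00, "output": 50.00}  # USD per million tokens, from this guide
OPUS = {"input": 5.00, "output": 25.00}    # Opus 4.8 rates, from this guide


def call_cost(rates: dict, input_tokens: int, output_tokens: int) -> float:
    """USD for one call at the given rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def expected_cost(input_tokens: int, output_tokens: int, fallback_share: float) -> float:
    """Average USD per call if `fallback_share` of calls are answered by Opus 4.8."""
    fable = call_cost(FABLE, input_tokens, output_tokens)
    opus = call_cost(OPUS, input_tokens, output_tokens)
    return (1 - fallback_share) * fable + fallback_share * opus
```

For a call with 20,000 input tokens and 4,000 output tokens, the flat Fable charge is about $0.40, and a 5% fallback share brings the average to about $0.39.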

Migrating From Opus 4.8 or 4.7

If your existing Opus code is already clean, moving to Fable 5 is a model-ID swap and nothing else. "Clean" means no sampling parameters, no budget_tokens, no assistant-turn prefill, and no explicit thinking: {"type": "disabled"}.

Here is the checklist.

Swap the model ID to claude-fable-5.
Delete any thinking: {"type": "disabled"}. It 400s on Fable 5. Omit the field instead.
Remove temperature, top_p, and top_k. They already 400 on Opus 4.7 and 4.8, and they 400 here too.
Replace budget_tokens with thinking: {"type": "adaptive"}.
Replace any last-assistant-turn prefill with output_config: {"format": {...}}.
Re-tune effort per route. It carries more weight on Fable 5.
Set thinking.display to "summarized" if you surface reasoning to users.
Re-baseline your max_tokens and cost dashboards, since Fable runs at double Opus pricing.
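The mechanical items on this checklist can be applied in code. Treat this as a sketch: migrate_to_fable5 is a hypothetical helper that rewrites a copy of your request kwargs. It handles only the model ID, thinking, and sampling steps, and refuses a prefill because that replacement needs a schema you have to write by hand. Effort, display, and dashboards still need a human.

```python
import copy


def migrate_to_fable5(params: dict) -> dict:
    """Apply the mechanical checklist steps to a copy of Opus-era request kwargs."""
    out = copy.deepcopy(params)
    out["model"] = "claude-fable-5"

    thinking = out.get("thinking")
    if isinstance(thinking, dict):
        if thinking.get("type") == "disabled":
            del out["thinking"]                     # omit the field, do not disable
        elif thinking.get("type") == "enabled" or "budget_tokens" in thinking:
            out["thinking"] = {"type": "adaptive"}  # replaces budget_tokens

    for name in ("temperature", "top_p", "top_k"):
        out.pop(name, None)

    messages = out.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        raise ValueError("trailing assistant prefill: replace it with output_config.format")

    return out
```

The original dict is left untouched, so you can log a before-and-after diff while you roll the change out route by route.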

If you are coming from Opus 4.6 or older, apply the full adaptive-thinking migration first, then the Fable 5 deltas above.

The one genuinely new breaking change versus Opus 4.8 and 4.7 is the explicit thinking-disabled 400. Everything else on this list already applied to recent Opus models. That is why people call Fable 5 "mostly drop-in." The word doing the work is "mostly."

Frequently Asked Questions

What is the Claude Fable 5 model ID?

The first-party model ID is the exact string claude-fable-5, with no date suffix. On Amazon Bedrock it carries an anthropic. prefix: anthropic.claude-fable-5 on the Messages API (bedrock-mantle), and global.anthropic.claude-fable-5 on the Converse or Invoke API (bedrock-runtime).

Why does thinking disabled return a 400 on Fable 5?

On Fable 5, an explicit thinking: {"type": "disabled"} is not accepted and returns a 400. This is unique to Fable 5; Opus 4.8 and 4.7 accept it. To run a request without thinking, omit the thinking parameter entirely instead of setting it to disabled.

How do I use Fable 5 on Bedrock?

First opt into data sharing by setting the Data Retention API mode to provider_data_share. There is no console UI for this at launch, so you do it with a curl call to the data-retention endpoint. After that, point the Anthropic SDK at the Bedrock base URL and use the model ID anthropic.claude-fable-5, or use boto3's Converse API with global.anthropic.claude-fable-5.

How much does Claude Fable 5 cost?

$10 per million input tokens and $50 per million output tokens. That is double Opus 4.8's $5 and $25. Responses that hit the safeguards and fall back to Opus 4.8 are billed at Opus rates instead, so some calls cost less than the headline Fable rate.

How do I turn thinking off on Fable 5 without an error?

Remove the thinking field from your request. A request with no thinking field runs without thinking. Do not set thinking: {"type": "disabled"}, which returns a 400. If you find the model writing long reasoning into the visible answer with thinking off, either leave adaptive thinking on or add a system instruction to reply with only the final answer.

Does prompt caching change on Fable 5?

The mechanics are the same, but the minimum cacheable prefix is lower: 2048 tokens on Fable 5 versus 4096 on Opus 4.8. So a mid-sized prompt that would not cache on Opus 4.8 will cache on Fable 5. Verify hits with usage.cache_read_input_tokens.