Build This Now
speedy_devv · koen_salo

Claude Code Fast Mode

Fast mode routes your Opus 4.6 requests down a priority serving path, so responses come back about 2.5x quicker at a higher per-token rate.

Problem: Interactive work on Opus 4.6 drags. Each round trip feels slow enough to break your rhythm. Dropping the effort level cuts latency at the cost of quality, and you want the full depth back at a quicker cadence.

Quick Win: Inside your Claude Code session, run /fast and hit Tab. A lightning bolt pops up next to the prompt. Same model under the hood, replies 2.5x faster.

What Fast Mode Actually Is

Here's the key thing to understand: fast mode does not change which model you're running. Opus 4.6 stays Opus 4.6. Same weights. Same capabilities. Same ceiling on quality. There's no quiet swap to Haiku dressed up in an Opus wrapper. What you're getting is Opus 4.6 on an infrastructure priority lane.

What actually shifts is the serving configuration. Your API calls take a faster route, and replies land about 2.5x sooner than on standard Opus 4.6. The price-per-token goes up in exchange for the priority. The model stays just as smart. No shortcut reasoning, no compressed output, nothing skipped to hit the lower latency. Same answer. Faster arrival.

That's why the distinction matters. Before fast mode, the only way to speed Claude Code up was to dial down the effort level, which really does shrink how much thinking goes into a reply. Less reasoning time works on simple tasks and breaks on anything complex. Fast mode skips that bargain. Speed at the same quality.

Fast mode launched as a research preview, so expect Anthropic to refine pricing, rate limits, and the overall experience over time.

How to Enable Claude Code Fast Mode

One command flips it on:

/fast

Press Tab to confirm. That's it. A lightning bolt icon appears next to your prompt, confirming fast mode is active. Run /fast again to disable it.

You can also set it permanently in your user settings file:

{
  "fastMode": true
}
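If you prefer to script the change rather than edit the file by hand, the documented "fastMode" key can be merged into your settings programmatically. A minimal sketch, assuming the user settings file lives at ~/.claude/settings.json (verify the path for your install; the helper name is ours):

```python
import json
from pathlib import Path

def enable_fast_mode(settings_path: Path) -> dict:
    """Merge {"fastMode": true} into an existing settings file,
    preserving any other keys already present."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    settings["fastMode"] = True  # the key shown in the snippet above
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings

# Demo against a temp path so no real config is touched:
import tempfile
tmp = Path(tempfile.mkdtemp()) / "settings.json"
print(enable_fast_mode(tmp))  # {'fastMode': True}
```

Merging instead of overwriting matters: a blind write would clobber any other settings you've accumulated in that file.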

A few behaviors to know:

  • Persists across sessions. Fast mode stays on from one session to the next until you turn it off yourself.
  • Auto-switches to Opus 4.6. On Haiku or Sonnet when you flip fast mode on? You get bumped up to Opus 4.6 automatically.
  • Disabling keeps your model. Toggle fast mode off with /fast and your model stays put on Opus 4.6. To move to something else, run /model.

Pricing Breakdown

Rates depend on your context window size. Full pricing table:

Mode                                       Input (per MTok)   Output (per MTok)
Fast mode Opus 4.6 (under 200K context)    $30                $150
Fast mode Opus 4.6 (over 200K context)     $60                $225

A few cost details to keep in mind:

  • Compatible with 1M extended context. The full 1M context window is supported, though rates step up above 200K tokens.
  • Billed to extra usage only. Fast mode tokens stay outside your plan's included usage. Every single one lands on the extra usage line.
  • Switching mid-conversation is expensive. Turn fast mode on partway through a session and you re-price your whole existing context at the uncached input rate. To keep costs low, flip it on before your first message.
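The mid-conversation penalty is easy to quantify from the table above. A back-of-envelope sketch using the under-200K rates (token counts are illustrative inputs, not measurements):

```python
# Rates from the pricing table above (under-200K context tier).
FAST_INPUT_PER_MTOK = 30.0
FAST_OUTPUT_PER_MTOK = 150.0

def fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a fast-mode exchange under 200K context."""
    return (input_tokens / 1e6) * FAST_INPUT_PER_MTOK \
         + (output_tokens / 1e6) * FAST_OUTPUT_PER_MTOK

# Flipping fast mode on with 50K tokens already in context
# re-prices that context at the uncached input rate:
penalty = (50_000 / 1e6) * FAST_INPUT_PER_MTOK
print(f"${penalty:.2f}")  # $1.50 one-time re-pricing hit
```

A $1.50 hit per toggle sounds small until you toggle several times a day across large contexts, which is why the advice is to enable it before the first message.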

When to Use Fast Mode

Interactive work is where the extra speed really lands, because latency is the thing slowing you down:

  • Rapid iteration cycles. Edit, run, ask Claude to tweak, repeat. Across dozens of these micro-interactions per session, 2.5x compounds: a debugging session with 25 round trips wraps up roughly 15-20 minutes earlier on fast mode than on standard Opus 4.6.
  • Live debugging. Chasing a bug through stack traces and log output, the wait for each reply is what breaks your concentration. Quicker answers hold flow state. You stay locked onto the problem instead of losing the thread while the cursor blinks.
  • Time-sensitive deadlines. 2 AM hotfixes. Demos an hour away. Times like those, paying more per token is obviously worth it. The cost premium vanishes once you measure it against the cost of shipping late.
  • Interactive pair programming. Any moment where latency beats cost, because you're thinking alongside Claude in real time and every pause cracks the rhythm. Design discussions where ideas fly back and forth are the clearest example.
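The compounding claim above is simple arithmetic. A sketch with an assumed 60-second standard response time (illustrative, not a measured latency):

```python
SPEEDUP = 2.5  # the approximate speedup stated in this article

def minutes_saved(round_trips: int, standard_latency_s: float) -> float:
    """Total wait time recovered over a session of round trips."""
    fast_latency_s = standard_latency_s / SPEEDUP
    return round_trips * (standard_latency_s - fast_latency_s) / 60

# 25 round trips at an assumed 60s per standard response:
print(minutes_saved(25, 60))  # 15.0 minutes back
```

That lines up with the 15-20 minute figure quoted for a 25-round-trip debugging session.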

When to Skip Fast Mode

Standard mode wins any time latency isn't the bottleneck. Here's the quick test: are you sitting there watching the cursor? If not, regular rates give you the same output for less.

  • Long autonomous tasks. Kick off a big refactor, walk away, come back. Response time is invisible to you, so save the money.
  • Batch processing or CI/CD pipelines. Automated flows see no benefit from lower latency. Standard rates are the right fit.
  • Cost-sensitive workloads. Burning through tokens on a large codebase analysis, you get identical output at the regular rate. Quality doesn't change, so the premium buys nothing.
  • Tasks where you step away. Tell Claude to restructure a module, grab coffee, and the speed difference is literally unnoticed. Fast mode only pays off while you're watching the cursor.

Fast Mode vs. Effort Level

Two settings, two different mechanisms for making things faster:

Setting               What It Does
Fast mode             Same model quality, lower latency, higher cost per token
Lower effort level    Less thinking time, faster responses, potentially lower quality on complex tasks

Both can run together. Fast mode plus a lower effort level stacks the infrastructure speed with the lower thinking overhead in one go. That's a solid fit for quick jobs: formatting, simple refactors, boilerplate, anywhere deep analysis isn't earning its keep. On complex architectural calls or tricky debugging, leave the effort level high and let fast mode handle the latency. Two independent dials. Each task gets tuned on its own merits instead of stuck in one compromise setting.

Rate Limits and Fallback Behavior

Fast mode comes with its own rate limit, independent of the standard Opus 4.6 pool: enabling it leaves your regular limit untouched, and vice versa. When you hit the ceiling, Claude Code rolls through it cleanly:

  1. Auto-fallback to standard Opus 4.6. The session rolls on without a break. You lose priority speed for a while, that's it. No error, no interrupted workflow.
  2. Lightning bolt turns gray. The indicator flips to cooldown mode, so a glance at the prompt tells you where you stand.
  3. Standard speed and pricing apply. Billing drops to normal Opus 4.6 rates during the fallback window. After a heavy fast-mode burst, that's actually a cost break.
  4. Auto re-enables when ready. Cooldown ends, fast mode turns itself back on. Nothing for you to do.

Don't feel like waiting out the cooldown? Run /fast to turn fast mode off manually and keep going at standard rates until you decide to switch back.

Requirements and Availability

Coverage isn't universal. Here's what you need:

  • Anthropic direct only. Access runs through the Anthropic Console API and Claude subscription plans. Amazon Bedrock, Google Vertex AI, and Microsoft Azure Foundry don't carry it.
  • Extra usage must be enabled. Because every fast mode token bills to extra usage, your account has to have extra usage turned on.
  • Teams and Enterprise restrictions. Org admins have to flip fast mode on inside Console settings or Claude AI admin settings before anyone on the team can use it. Without that, /fast returns: "Fast mode has been disabled by your organization."

Fast Mode with Agent Teams

On agent teams, fast mode sticks to the lead session. Teammate agents stay on their own speed settings, which gives you fine-grained control over where the cost goes in a multi-agent workflow.

A workable team setup looks like this. The lead agent runs on fast mode for quick coordination and decision calls. Teammates stay on standard speed so the bill stays reasonable. Short interactive exchanges sit with the lead, things like reviewing teammate output, making routing choices, and answering your direct questions. Teammates pick up the long autonomous jobs, like writing tests, refactoring modules, or generating documentation, where a 2.5x speedup wouldn't really change what you feel.

Cost Optimization Tips

A few practical moves to keep fast mode from running up the bill:

  • Enable at session start. Flipping fast mode on mid-conversation re-prices your whole context at the uncached input rate. With 50K tokens already in play, that's a real hit. Turn it on with the first message and the penalty never fires.
  • Pair with effort levels. Fast mode plus a lower effort level gives simple tasks the most speed. Bump effort back up when the work gets complex.
  • Toggle based on work type. Run fast mode during hands-on coding, then flip it off before kicking off anything autonomous. A few seconds of toggling a day saves meaningful token spend.
  • Monitor through Console. Track usage and billing inside the Anthropic Console dashboard so you can see how fast mode shifts your spend. A week or two of data shows you where to settle between speed and cost.
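One way to ground the toggle decision is a break-even check: is the time fast mode buys back worth more than the token premium? A sketch where both inputs are assumptions you'd pull from your own Console data:

```python
def premium_worth_it(extra_cost_usd: float,
                     minutes_recovered: float,
                     hourly_rate_usd: float) -> bool:
    """True when the value of time saved exceeds the fast-mode premium."""
    return minutes_recovered / 60 * hourly_rate_usd > extra_cost_usd

# Assumed numbers: $4 extra in fast-mode tokens for a session that
# saves ~15 minutes, valued at $100/hour:
print(premium_worth_it(4.0, 15, 100))  # True: ~$25 of time for $4
```

For interactive work the inequality usually holds easily; for unattended work the minutes recovered drop to zero and the check fails, which is the whole "when to skip" argument in one line.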

Putting It Together

Fast mode fills exactly one gap: Opus 4.6 quality without the Opus 4.6 wait. What it costs you is dollars, not intelligence. For developers who spend hours per day in interactive Claude Code sessions, the premium pays for itself through sustained focus and tighter iteration loops.

Think of it as three independent dials on Claude Code performance. One dial is infrastructure speed, which is what fast mode controls. Another is thinking depth, which effort levels control. The third is base capability tier, which model selection sets. The dials work on their own and combine any way you want.

For your highest-value interactive work, everything goes up: fast mode on, high effort, Opus 4.6. Quick formatting or boilerplate is a good match for fast mode with low effort. Background work nobody watches gets standard speed and saves money at zero noticeable cost. Match the setting to the task in front of you. One fixed configuration doesn't cut it.

More in this guide

  • Agent Fundamentals
    Five ways to build specialized agents in Claude Code, from sub-agents to .claude/agents/ definitions to perspective prompts.
  • Agent Patterns
    Orchestrator, fan-out, validation chain, specialist routing, progressive refinement, and watchdog. Six ways to wire sub-agents in Claude Code.
  • Agent Teams Best Practices
    Battle-tested patterns for Claude Code agent teams. Troubleshooting, limitations, plan mode quirks, and fixes shipped from v2.1.33 through v2.1.45.
  • Agent Teams Controls
    Stop your agent team lead from grabbing implementation work. Configure delegate mode, plan approval, hooks, and CLAUDE.md for teams.
  • Agent Teams Prompt Templates
    Ten tested Agent Teams prompts for Claude Code. Code review, debugging, feature builds, architecture calls, and campaign research. Paste and go.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Get Build This Now


