Speed Optimization
Model selection, context size, and prompt specificity are the three levers that decide how fast Claude Code answers.
Problem: Every request sits and spins. Replies crawl back when you need them in seconds.
Quick Win: Drop to a lighter model. Type /model haiku mid-session and the next reply lands sooner. Haiku handles syntax questions, quick explanations, and small code generations.
The Speed Multiplier Approach
Most of the waiting isn't the model being slow. It's the session set up wrong. Three knobs you already control decide how fast a reply lands: which model you picked, how big the context is, and how sharp the prompt reads.
Quick replies keep you in flow. Slow ones push you out.
Response Time Killers
Bloated Context: Every turn piles more tokens onto what Claude has to process. Long sessions collect history, and each reply gets heavier.
Model Mismatch: Sonnet for a one-liner is a truck run to the corner store. Match the engine to the job.
Vague Prompts: "Help me with this code" forces a guess. Spell out what you want and the answer comes back faster.
The 3-Round Speed Optimization Process
Round 1: Model Selection Strategy
Match your model to the task complexity using slash commands mid-session:
/model haiku # Quick questions, syntax help, simple edits
/model sonnet # Complex refactoring, architecture decisions

Flip to Haiku for the quick stuff. Flip back when reasoning matters. No restart.
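If you already know a task's weight before launching, the same split can live at the command line. A minimal sketch, assuming your `claude` CLI accepts a `--model` flag with short model aliases; the `ccm` name and the quick/deep labels are invented for illustration:

```shell
# Hypothetical wrapper: launch Claude Code on a model matched to the task.
# Assumes `claude --model <alias>` works on your install; adjust aliases as needed.
ccm() {
  case "$1" in
    quick) claude --model haiku ;;   # syntax help, small edits
    deep)  claude --model sonnet ;;  # refactoring, architecture
    *)     echo "usage: ccm quick|deep" >&2; return 1 ;;
  esac
}
```

Then `ccm quick` opens a Haiku session and `ccm deep` a Sonnet one; the in-session /model command still works for switching after launch.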
Round 2: Context Management
Keep your context lean for faster responses:
/compact # Compress conversation history when it grows large
/clear # Start fresh when switching to unrelated tasks

Reach for /compact the moment replies start dragging. It folds the history into a summary while keeping what matters, so the token load per turn drops.
Round 3: Write Specific Prompts
The fastest optimization needs no commands at all. Just a sharper prompt:
Slow: "Fix this function"
Fast: "Add null check for the user parameter in handleSubmit"
Slow: "Help me with the database"
Fast: "Write a Prisma query to fetch users with their posts, ordered by createdAt desc"
Precise prompts kill the "wait, what do you mean?" round. That alone often halves the whole exchange.
Advanced Speed Techniques
Parallel Sessions: Run two terminals when the tasks don't touch. Frontend in one. Backend in the other.
Batch Related Tasks: One prompt can carry three jobs at once:
"In the UserProfile component:
1. Add loading state
2. Handle the error case
3. Add the avatar upload button"

CLAUDE.md Patterns: Put recurring project conventions into CLAUDE.md. Claude loads the file on its own, so you stop re-explaining the same rules.
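A starter CLAUDE.md might capture conventions like these. The rules below are illustrative, not from this guide; swap in your project's own:

```markdown
# Project conventions
- TypeScript, strict mode; avoid `any`
- Database access goes through Prisma; no raw SQL
- Components live in src/components/, one component per file
- Run `npm test` before declaring a change done
```

Once the file sits at the repo root, every session starts with those rules already loaded.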
Shell Aliases: Create shortcuts for common workflows:
alias cc="claude"
alias cch="claude --model haiku"

The Cost-Speed Balance
Tuning for speed drops the bill at the same time. Faster replies usually mean:
- Fewer tokens billed thanks to focused context
- Smaller model spend when Haiku does the easy work
- More shipped per dollar
- Less context piling up between turns
Lock these habits in early. The gap over a slower setup keeps widening.
When Speed Matters Most
Tight Feedback Loops: Debugging runs on latency. Every second shaved off stacks up once you're locked into the problem.
Exploration Phase: Trying out different angles? Cheap replies make you bolder. Five ideas instead of two.
Code Reviews: Reviewing a diff or asking for an explanation works best when the exchange keeps pace with your reading.
Common Speed Mistakes
Never Compacting: Letting context grow until replies crawl. Run /compact before you hit the wall.
Sonnet for Everything: Running the heavier model on jobs Haiku finishes just as well.
Serial Thinking: Waiting on one answer before starting the next task, when a second session could run alongside.
Vague Requests: Leaving Claude to guess the brief instead of stating it cleanly up front.
Success Verification
Tuning is landing when:
- Easy questions come back almost instantly on Haiku
- /compact fires before bloat slows you down
- Parallel sessions run when the work doesn't overlap
- Your coding rhythm stays unbroken
Next Actions
- Get fluent at switching /model haiku and /model sonnet
- Work through context management end to end
- Build out your CLAUDE.md
- Pick up parallel workflow patterns
- Study the wider efficiency playbook
Stop configuring. Start building.
Deep Thinking Techniques
Thinking trigger phrases like `think harder` and `ultrathink` push Claude Code into extended reasoning without switching models.
Claude Code Fast Mode
Fast mode routes your Opus 4.6 requests down a priority serving path, so responses come back about 2.5x quicker at a higher per-token rate.