Speed Optimization

Problem: Every request sits and spins. Replies crawl back when you need them in seconds.

Quick Win: Drop to a lighter model. Type /model haiku mid-session and the next reply lands sooner. Haiku handles syntax questions, quick explanations, and small code generations.

The Speed Multiplier Approach

Most of the waiting isn't the model being slow. It's the session set up wrong. Three knobs you already control decide how fast a reply lands: which model you picked, how big the context is, and how sharp the prompt reads.

Quick replies keep you in flow. Slow ones push you out.

Response Time Killers

Bloated Context: Every turn piles more tokens onto what Claude has to process. Long sessions collect history, and each reply gets heavier.

Model Mismatch: Sonnet for a one-liner is a truck run to the corner store. Match the engine to the job.

Vague Prompts: "Help me with this code" forces a guess. Spell out what you want and the answer comes back faster.

The 3-Round Speed Optimization Process

Round 1: Model Selection Strategy

Match your model to the task complexity using slash commands mid-session:

/model haiku    # Quick questions, syntax help, simple edits
/model sonnet   # Complex refactoring, architecture decisions

Flip to Haiku for the quick stuff. Flip back when reasoning matters. No restart.

Round 2: Context Management

Keep your context lean for faster responses:

/compact        # Compress conversation history when it grows large
/clear          # Start fresh when switching to unrelated tasks

Reach for /compact the moment replies start dragging. It folds the history into a summary while keeping what matters, so the token load per turn drops.

Round 3: Write Specific Prompts

The fastest optimization needs no commands at all. Just a sharper prompt:

Slow: "Fix this function" Fast: "Add null check for the user parameter in handleSubmit"

Slow: "Help me with the database" Fast: "Write a Prisma query to fetch users with their posts, ordered by createdAt desc"

Precise prompts kill the "wait, what do you mean?" round. That alone often halves the whole exchange.

Advanced Speed Techniques

Parallel Sessions: Run two terminals when the tasks don't touch. Frontend in one. Backend in the other.

Batch Related Tasks: One prompt can carry three jobs at once:

"In the UserProfile component:
1. Add loading state
2. Handle the error case
3. Add the avatar upload button"

CLAUDE.md Patterns: Put recurring project conventions into CLAUDE.md. Claude loads the file on its own, so you stop re-explaining the same rules.

Shell Aliases: Create shortcuts for common workflows:

alias cc="claude"
alias cch="claude --model haiku"

The Cost-Speed Balance

Tuning for speed drops the bill at the same time. Faster replies usually mean:

Fewer tokens billed thanks to focused context
Smaller model spend when Haiku does the easy work
More shipped per dollar
Context stops piling up between turns

Lock these habits in early. The gap over a slower setup keeps widening.

When Speed Matters Most

Tight Feedback Loops: Debugging runs on latency. Every second shaved off stacks up once you're locked into the problem.

Exploration Phase: Trying on different angles? Cheap replies make you bolder. Five ideas instead of two.

Code Reviews: Reviewing a diff or asking for an explanation works best when the exchange keeps pace with your reading.

Common Speed Mistakes

Never Compacting: Letting context grow until replies crawl. Run /compact before you hit the wall.

Sonnet for Everything: Running the heavier model on jobs Haiku finishes just as well.

Serial Thinking: Waiting on one answer before starting the next task, when a second session could run alongside.

Vague Requests: Leaving Claude to guess the brief instead of stating it cleanly up front.

Success Verification

Tuning is landing when:

Easy questions come back almost instantly on Haiku
/compact fires before bloat slows you down
Parallel sessions run when the work doesn't overlap
Your coding rhythm stays unbroken

Next Actions

Get fluent at switching /model haiku and /model sonnet
Work through context management end to end
Build out your CLAUDE.md
Pick up parallel workflow patterns
Study the wider efficiency playbook

Speed Optimization

On this page