Build This Now
speedy_devvkoen_salo

Speed Optimization

Model selection, context size, and prompt specificity are the three levers that decide how fast Claude Code replies. This page covers /model haiku, /compact, and /clear.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Feb 3, 2026

Problem: Every request sits and spins. Replies crawl back when you need them in seconds.

Quick Win: Drop to a lighter model. Type /model haiku mid-session and the next reply lands sooner. Haiku handles syntax questions, quick explanations, and small code generations.

The Speed Multiplier Approach

Most of the waiting isn't the model being slow. It's the session set up wrong. Three knobs you already control decide how fast a reply lands: which model you picked, how big the context is, and how sharp the prompt reads.

Quick replies keep you in flow. Slow ones push you out.

Response Time Killers

Bloated Context: Every turn piles more tokens onto what Claude has to process. Long sessions collect history, and each reply gets heavier.

Model Mismatch: Sonnet for a one-liner is a truck run to the corner store. Match the engine to the job.

Vague Prompts: "Help me with this code" forces a guess. Spell out what you want and the answer comes back faster.

The 3-Round Speed Optimization Process

Round 1: Model Selection Strategy

Match your model to the task complexity using slash commands mid-session:

/model haiku    # Quick questions, syntax help, simple edits
/model sonnet   # Complex refactoring, architecture decisions

Flip to Haiku for the quick stuff. Flip back when reasoning matters. No restart.
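
One way to make the habit concrete is a tiny helper that suggests a model from a rough task description. This is only a sketch: suggest_model and its keyword list are hypothetical, not part of Claude Code, and the keywords are assumptions you would tune to your own work.

```shell
# Hypothetical helper: suggest a model alias from a rough task description.
# The keyword list is an assumption for illustration, not a Claude Code feature.
suggest_model() {
  local task
  # Lowercase the description so matching is case-insensitive.
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *refactor*|*architecture*|*design*|*migrate*)
      echo "sonnet" ;;   # reasoning-heavy work
    *)
      echo "haiku" ;;    # quick questions, syntax help, simple edits
  esac
}

suggest_model "Refactor the auth module"      # prints: sonnet
suggest_model "What does this regex match?"   # prints: haiku
```

The printed name is what you would pass to /model inside the session.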

Round 2: Context Management

Keep your context lean for faster responses:

/compact        # Compress conversation history when it grows large
/clear          # Start fresh when switching to unrelated tasks

Reach for /compact the moment replies start dragging. It folds the history into a summary while keeping what matters, so the token load per turn drops.
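
To see why trimming history pays off, here is a back-of-the-envelope sketch. It assumes every turn adds the same number of tokens and that each turn re-processes the full history so far. Both are simplifications, and the 1000-token figure is made up. The example models /clear rather than /compact, because the size of a /compact summary is not something this sketch can know.

```shell
# Rough model: each turn adds per_turn tokens and re-processes all history.
# Usage: total_processed TURNS TOKENS_PER_TURN
total_processed() {
  local turns=$1 per_turn=$2 total=0 context=0 i=1
  while [ "$i" -le "$turns" ]; do
    context=$((context + per_turn))   # history grows by one turn
    total=$((total + context))        # this turn processes the whole history
    i=$((i + 1))
  done
  echo "$total"
}

# One uninterrupted 20-turn session.
total_processed 20 1000                      # prints 210000
# The same 20 turns with a /clear after turn 10: two fresh 10-turn stretches.
echo $(( $(total_processed 10 1000) * 2 ))   # prints 110000
```

Under these assumptions, resetting halfway cuts the tokens processed almost in half, because the cost grows with the square of the session length.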

Round 3: Write Specific Prompts

The fastest optimization needs no commands at all. Just a sharper prompt:

Slow: "Fix this function"
Fast: "Add null check for the user parameter in handleSubmit"

Slow: "Help me with the database"
Fast: "Write a Prisma query to fetch users with their posts, ordered by createdAt desc"

Precise prompts remove the "wait, what do you mean?" round trip. Skipping that one clarifying turn can noticeably shorten the whole exchange.

Advanced Speed Techniques

Parallel Sessions: Run two terminals when the tasks don't touch. Frontend in one. Backend in the other.

Batch Related Tasks: One prompt can carry three jobs at once:

"In the UserProfile component:
1. Add loading state
2. Handle the error case
3. Add the avatar upload button"
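
If you batch often, the numbered prompt above can be assembled by a small function. This is a minimal sketch: batch_prompt is a hypothetical helper name, and it only prints text for you to paste into a session.

```shell
# Hypothetical helper: turn a scope plus a list of tasks into one numbered prompt.
# Usage: batch_prompt SCOPE TASK [TASK...]
batch_prompt() {
  local scope=$1 n=1 task
  shift
  printf 'In %s:\n' "$scope"
  for task in "$@"; do
    printf '%d. %s\n' "$n" "$task"
    n=$((n + 1))
  done
}

batch_prompt "the UserProfile component" \
  "Add loading state" \
  "Handle the error case" \
  "Add the avatar upload button"
```

Keeping the scope on the first line tells Claude where every task applies, so no item needs to repeat it.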

CLAUDE.md Patterns: Put recurring project conventions into CLAUDE.md. Claude loads the file on its own, so you stop re-explaining the same rules.
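
As a starting point, the sketch below writes a small CLAUDE.md. Every convention in it is a placeholder made up for illustration, and write_claude_md is a hypothetical helper name. Swap in the rules you actually find yourself repeating.

```shell
# Hypothetical starter: the conventions below are placeholders, not rules
# from any real project. Replace them with your own.
write_claude_md() {
  local target=${1:-CLAUDE.md}
  # Never clobber an existing file.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
# Project conventions

- Language: TypeScript, strict mode
- Tests: colocated as *.test.ts
- Database access goes through Prisma, never raw SQL
- Keep components under 200 lines
EOF
}

# Example: write_claude_md ./CLAUDE.md
```

A short file is the point here: whatever sits in CLAUDE.md is loaded into context, so it counts toward the same token load you are trying to keep lean.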

Shell Aliases: Create shortcuts for common workflows:

alias cc="claude"
alias cch="claude --model haiku"

The Cost-Speed Balance

Tuning for speed drops the bill at the same time. Faster replies usually mean:

  • Fewer tokens billed thanks to focused context
  • Smaller model spend when Haiku does the easy work
  • More shipped per dollar
  • Less context piling up between turns

Lock these habits in early. The gap over a slower setup keeps widening.

When Speed Matters Most

Tight Feedback Loops: Debugging runs on latency. Every second shaved off stacks up once you're locked into the problem.

Exploration Phase: Trying on different angles? Cheap replies make you bolder. Five ideas instead of two.

Code Reviews: Reviewing a diff or asking for an explanation works best when the exchange keeps pace with your reading.

Common Speed Mistakes

Never Compacting: Letting context grow until replies crawl. Run /compact before you hit the wall.

Sonnet for Everything: Running the heavier model on jobs Haiku finishes just as well.

Serial Thinking: Waiting on one answer before starting the next task, when a second session could run alongside.

Vague Requests: Leaving Claude to guess the brief instead of stating it cleanly up front.

Success Verification

The tuning is working when:

  • Easy questions come back almost instantly on Haiku
  • You run /compact before bloat slows you down
  • Parallel sessions run when the work doesn't overlap
  • Your coding rhythm stays unbroken

Next Actions

  1. Get fluent at switching between /model haiku and /model sonnet
  2. Work through context management end to end
  3. Build out your CLAUDE.md
  4. Pick up parallel workflow patterns
  5. Study the wider efficiency playbook

Continue in Performance

  • Deep Thinking Techniques
    Thinking trigger phrases like think harder, ultrathink, and think step by step push Claude Code into extended reasoning and more test-time compute, same model.
  • Efficiency Patterns
    Permutation frameworks turn 8 to 12 manual builds into a CLAUDE.md template Claude Code uses to generate variations 11, 12, and 13 on demand. Captured once.
  • Claude Code Fast Mode
    Fast mode routes your Opus 4.6 requests down a priority serving path in Claude Code. Same weights, same ceiling, replies 2.5x quicker at a higher token rate.


