Claude Opus 4.5 in Claude Code
Opus 4.5 shipped with $5/$25 pricing and 76% fewer output tokens than Sonnet 4.5. Set it as your default in two commands.
Your Claude Code bill is mostly output tokens. Opus 4.5 attacks it from both sides: the $5/$25 pricing is a 67% cut from Opus 4.1, and the model emits far fewer output tokens, while writing cleaner code. Here is how to turn it on and what changes once you do.
Quick Win: set Opus 4.5 as the default model and open a session:

```shell
claude config set model claude-opus-4-5-20251101
claude
```

You are now running the most token-efficient coding model available.
Token Efficiency
This is not marketing copy. GitHub reports Opus 4.5 "surpasses internal coding benchmarks while cutting token usage in half." Replit says it "beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems."
Here is what that looks like day to day:
| Metric | Improvement |
|---|---|
| Output tokens vs Sonnet 4.5 | 76% reduction |
| Tool calls per task | 50% fewer |
| Long-running tasks | Up to 65% reduction |
| With Tool Search enabled | 85% reduction |
Fewer tokens means faster answers, lower cost, and more room before you hit the context limit.
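Back of the envelope, the 76% reduction compounds directly into dollars. A minimal sketch using the output prices from Anthropic's published pricing ($25/MTok for Opus 4.5, $15/MTok for Sonnet 4.5); the 1M-token baseline is a made-up workload for illustration:

```typescript
// Output-token prices per million tokens (Anthropic published pricing).
const SONNET_OUTPUT_PER_MTOK = 15;
const OPUS_OUTPUT_PER_MTOK = 25;

function outputCost(tokens: number, pricePerMTok: number): number {
  return (tokens * pricePerMTok) / 1_000_000;
}

// Hypothetical workload: Sonnet 4.5 emits 1,000,000 output tokens.
// A 76% reduction means Opus 4.5 emits ~240,000 for the same work.
const sonnetTokens = 1_000_000;
const opusTokens = 240_000;

console.log(outputCost(sonnetTokens, SONNET_OUTPUT_PER_MTOK)); // 15 → Sonnet's output bill
console.log(outputCost(opusTokens, OPUS_OUTPUT_PER_MTOK));     // 6  → Opus 4.5's output bill
```

Even at a higher per-token price, the token reduction leaves the Opus bill well under half of Sonnet's for this workload.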
Built for Sub-Agent Delegation
Opus 4.5 writes better prompts for sub-agents than any other Claude model. Anthropic trained it for delegation on purpose.
This pays off when you run parallel agents for testing, code generation, or task distribution. The lead agent hands work out more cleanly:
```shell
# Example: Running parallel browser tests
claude "Run 4 parallel test agents against staging -
test login flow, checkout, search, and user settings"
```

The model handles the coordination. Each sub-agent gets clear, specific instructions. Results come back to you without the chaos of earlier models.
The Effort Parameter
New API control for trading speed against thoroughness. Set it per call without switching models:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 8192,
  thinking: {
    type: "enabled",
    budget_tokens: 10000, // Low: 1024, Medium: 5000, High: 10000+
  },
  messages: [{ role: "user", content: prompt }],
});
```

Low effort for quick questions. High effort for big refactors. You decide the thinking budget per call.
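If you call the API from several places, a thin wrapper keeps the tiers consistent. `thinkingConfig` is a hypothetical helper, not part of the SDK; the budget values mirror the Low/Medium/High comment in the snippet above:

```typescript
// Hypothetical helper: map a coarse effort level to a thinking config.
type Effort = "low" | "medium" | "high";

const BUDGETS: Record<Effort, number> = {
  low: 1024,    // quick questions
  medium: 5000, // everyday edits
  high: 10000,  // big refactors
};

function thinkingConfig(effort: Effort) {
  return { type: "enabled" as const, budget_tokens: BUDGETS[effort] };
}

console.log(thinkingConfig("high")); // { type: "enabled", budget_tokens: 10000 }
```

Pass the result straight to `messages.create` as the `thinking` field, and the effort decision lives in one place.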
Auto-Compaction for Long Sessions
Hit 95% of your 200K context window? Claude compacts earlier messages automatically while keeping your full chat history. Alex Albert calls it "effectively infinite context."
Manual control is there when you want it:

```
/compact
```
Best practice: compact at logical milestones rather than waiting for the automatic trigger. You keep more detail in the parts that matter.
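`/compact` also accepts freeform instructions, so you can steer what the summary preserves. A sketch of compacting at a milestone (the instruction text is illustrative):

```
/compact Keep the failing test names and the plan for the auth refactor
```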
When Things Go Wrong
Error: "model not found". Update your Claude Code install:

```shell
npm update -g @anthropic-ai/claude-code
```
Error: "rate limit exceeded". Opus 4.5 has separate limits from Sonnet. Check your plan tier or add a short delay between requests.
Error: "context too long". Run `/compact` by hand or split the task into smaller chunks. See memory optimization for deeper patterns.
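For the rate-limit case, "a short delay" usually means exponential backoff rather than a fixed sleep. A minimal sketch; `fn` stands in for whatever API or CLI call is being throttled:

```typescript
// Retry a throttled async call with exponential backoff: 1s, 2s, 4s, ...
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries, surface the error
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

In production you would also check that the error is actually a 429 before retrying, so genuine failures fail fast.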
What This Means for Your Workflow
Opus 4.5 is not just a version bump. It is a different way to work:
- Delegate more. Hand off complex coordination you would not trust to earlier models.
- Run longer sessions. Token efficiency means more work before compaction kicks in.
- Pay less. A 67% cost drop at the same or better quality.
The model scores 80.9% on SWE-bench Verified (a new high) and leads across 7 of 8 programming languages. Your code works the first try, not the fifth.
Related Pages
- Model selection for when to use Opus versus Sonnet
- Sub-agent design patterns for getting the most out of delegation
- Efficiency patterns for production workflows
Update: Claude Opus 4.6 is now available with 1M token context and native agent teams. See the complete model timeline for all Claude models.
Stop configuring. Start building.