Claude Opus 4.5 in Claude Code
Opus 4.5 shipped with $5/$25 pricing and 76% fewer output tokens than Sonnet 4.5. Set it as your default in two commands.
Your Claude Code bill is mostly output tokens. Opus 4.5 attacks it from both sides: the $5/$25 pricing is a 67% cut from Opus 4.1, and the model emits far fewer output tokens, while writing cleaner code. Here is how to turn it on and what changes once you do.
Quick Win: set Opus 4.5 as the default model and open a session:

```shell
claude config set model claude-opus-4-5-20251101
claude
```

You are now running the most token-efficient coding model available.
Token Efficiency
This is not marketing copy. GitHub reports Opus 4.5 "surpasses internal coding benchmarks while cutting token usage in half." Replit says it "beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems."
Here is what that looks like day to day:
| Metric | Improvement |
|---|---|
| Output tokens vs Sonnet 4.5 | 76% reduction |
| Tool calls per task | 50% fewer |
| Long-running tasks | Up to 65% reduction |
| With Tool Search enabled | 85% reduction |
Fewer tokens means faster answers, lower cost, and more room before you hit the context limit.
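Back of the envelope, the 76% reduction compounds directly into dollars. A minimal sketch using the output prices from Anthropic's published pricing ($25/MTok for Opus 4.5, $15/MTok for Sonnet 4.5); the 1M-token baseline is a made-up workload for illustration:

```typescript
// Output-token prices per million tokens (Anthropic published pricing).
const SONNET_OUTPUT_PER_MTOK = 15;
const OPUS_OUTPUT_PER_MTOK = 25;

function outputCost(tokens: number, pricePerMTok: number): number {
  return (tokens * pricePerMTok) / 1_000_000;
}

// Hypothetical workload: Sonnet 4.5 emits 1,000,000 output tokens.
// A 76% reduction means Opus 4.5 emits ~240,000 for the same work.
const sonnetTokens = 1_000_000;
const opusTokens = 240_000;

console.log(outputCost(sonnetTokens, SONNET_OUTPUT_PER_MTOK)); // 15 → Sonnet's output bill
console.log(outputCost(opusTokens, OPUS_OUTPUT_PER_MTOK));     // 6  → Opus 4.5's output bill
```

Even at a higher per-token price, the token reduction leaves the Opus bill well under half of Sonnet's for this workload.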
Built for Sub-Agent Delegation
Opus 4.5 writes better prompts for sub-agents than any other Claude model. Anthropic trained it for delegation on purpose.
This pays off when you run parallel agents for testing, code generation, or task distribution. The lead agent hands work out more cleanly:
```shell
# Example: Running parallel browser tests
claude "Run 4 parallel test agents against staging -
test login flow, checkout, search, and user settings"
```

The model handles the coordination. Each sub-agent gets clear, specific instructions. Results come back to you without the chaos of earlier models.
The Effort Parameter
New API control for trading speed against thoroughness. Set it per call without switching models:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 8192,
  thinking: {
    type: "enabled",
    budget_tokens: 10000, // Low: 1024, Medium: 5000, High: 10000+
  },
  messages: [{ role: "user", content: prompt }],
});
```

Low effort for quick questions. High effort for big refactors. You decide the thinking budget per call.
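If you call the API from several places, a thin wrapper keeps the tiers consistent. `thinkingConfig` is a hypothetical helper, not part of the SDK; the budget values mirror the Low/Medium/High comment in the snippet above:

```typescript
// Hypothetical helper: map a coarse effort level to a thinking config.
type Effort = "low" | "medium" | "high";

const BUDGETS: Record<Effort, number> = {
  low: 1024,    // quick questions
  medium: 5000, // everyday edits
  high: 10000,  // big refactors
};

function thinkingConfig(effort: Effort) {
  return { type: "enabled" as const, budget_tokens: BUDGETS[effort] };
}

console.log(thinkingConfig("high")); // { type: "enabled", budget_tokens: 10000 }
```

Pass the result straight to `messages.create` as the `thinking` field, and the effort decision lives in one place.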
Auto-Compaction for Long Sessions
Hit 95% of your 200K context window? Claude compacts earlier messages automatically while keeping your full chat history. Alex Albert calls it "effectively infinite context."
Manual control is there when you want it:

```
/compact
```
Best practice: compact at logical milestones rather than waiting for the automatic trigger. You keep more detail in the parts that matter.
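`/compact` also accepts freeform instructions, so you can steer what the summary preserves. A sketch of compacting at a milestone (the instruction text is illustrative):

```
/compact Keep the failing test names and the plan for the auth refactor
```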
When Things Go Wrong
Error: "model not found". Update your Claude Code install:

```shell
npm update -g @anthropic-ai/claude-code
```
Error: "rate limit exceeded". Opus 4.5 has separate limits from Sonnet. Check your plan tier or add a short delay between requests.
Error: "context too long". Run `/compact` by hand or split the task into smaller chunks. See memory optimization for deeper patterns.
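For the rate-limit case, "a short delay" usually means exponential backoff rather than a fixed sleep. A minimal sketch; `fn` stands in for whatever API or CLI call is being throttled:

```typescript
// Retry a throttled async call with exponential backoff: 1s, 2s, 4s, ...
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries, surface the error
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

In production you would also check that the error is actually a 429 before retrying, so genuine failures fail fast.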
What This Means for Your Workflow
Opus 4.5 is not just a version bump. It is a different way to work:
- Delegate more. Hand off complex coordination you would not trust to earlier models.
- Run longer sessions. Token efficiency means more work before compaction kicks in.
- Pay less. A 67% cost drop at the same or better quality.
The model scores 80.9% on SWE-bench Verified (a new high) and leads across 7 of 8 programming languages. Your code works the first try, not the fifth.
Related Pages
- Model selection for when to use Opus versus Sonnet
- Sub-agent design patterns for getting the most out of delegation
- Efficiency patterns for production workflows
Update: Claude Opus 4.6 is now available with 1M token context and native agent teams. See the complete model timeline for all Claude models.
Stop configuring. Start building.