Claude Opus 4.7
Claude Opus 4.7 is Anthropic's April 16, 2026 flagship for Claude Code: stronger on hard coding, cyber-adjacent workflows, document reasoning, and long-running agentic tasks at the same $5/$25 pricing as Opus 4.6.
Claude Opus 4.7 is the first Claude release in a while that feels bigger than a point upgrade. The price is unchanged. The 1M context window is unchanged. The 128K output ceiling is unchanged. What changed is the part people actually feel inside Claude Code: the model is better at hard, ambiguous, long-running work that used to need constant supervision.
That shows up in three places.
- It catches more of its own mistakes before acting.
- It stays coherent for longer inside multi-step agent loops.
- It lands better across domains that are not just "write code": cyber workflows, code review, dense screenshots, enterprise docs, contracts, diagrams, and other ambiguous source-heavy work.
If you already live in Claude Code, the short answer is simple: Opus 4.7 is the new default upgrade for high-stakes engineering sessions. If you want the workflow advice, read the dedicated Opus 4.7 best practices guide. If you want concrete examples by domain, read the companion Opus 4.7 use cases page.
Quick Verdict
Use Opus 4.7 when the work is expensive to get wrong:
- complex refactors across many files
- debugging with incomplete or conflicting evidence
- code review where subtle bugs matter
- cyber-defense, vulnerability research, or security auditing
- document-heavy work in legal, finance, and operations
- multimodal tasks with dense screenshots, diagrams, or UI mocks
Stay on Sonnet for smaller daily edits where speed and cost matter more than maximum reasoning depth.
Key Specs
| Spec | Details |
|---|---|
| API ID | claude-opus-4-7 |
| Release date | April 16, 2026 |
| Context window | 1M tokens |
| Max output | 128,000 tokens |
| Pricing | $5 input / $25 output per 1M tokens |
| Thinking mode | Adaptive thinking |
| Effort levels | low, medium, high, xhigh, max |
| Claude Code default effort | xhigh |
| Knowledge cutoff | January 2026 |
| Status | Current Opus flagship |
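Putting the spec table together, a request to this model might be shaped like the sketch below. The model ID and 128K output ceiling come from the table above; the effort field mirrors the article's effort levels and is an assumption about the request shape, not confirmed SDK usage.

```python
# Sketch of a request payload for Claude Opus 4.7, based on the spec
# table above. The "effort" field is a hypothetical knob taken from
# this article's effort-level list, not a confirmed API parameter.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 128_000,  # stated output ceiling
    "effort": "xhigh",      # Claude Code's stated default effort
    "messages": [
        {
            "role": "user",
            "content": "Review src/auth for subtle session-handling bugs.",
        }
    ],
}

# Keep requested output within the documented 128K ceiling.
assert payload["max_tokens"] <= 128_000
print(payload["model"])
```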
Claude Opus 4.7 vs Opus 4.6
The basic story is not "a bit smarter." It is "more reliable on the hard slice of work."
| Area | Opus 4.6 | Opus 4.7 |
|---|---|---|
| CursorBench | 58% | 70% |
| Rakuten-SWE-Bench | Baseline | 3x more production tasks resolved |
| XBOW visual-acuity | 54.5% | 98.5% |
| OfficeQA Pro | Baseline | 21% fewer errors |
| BigLaw Bench | Lower | 90.9% at high effort |
| Notion Agent tool errors | Baseline | about one third as many |
| Resolution support | 1568px / 1.15MP | 2576px / 3.75MP |
| Default Claude Code effort | high | xhigh |
| Thinking control | adaptive, older migration path | adaptive only, fixed-budget thinking removed |
| Tool use style | more tool-happy | more selective, more reasoning-first |
| Subagent behavior | delegates more freely | delegates more selectively |
The important part is behavioral, not just numerical. Anthropic and launch partners describe Opus 4.7 as more literal about instructions, more willing to verify assumptions, better at pushing through long tasks, and less likely to loop or fail silently halfway through.
What Actually Improved in Practice
1. Self-verification shows up more often
One of the clearest launch signals is that Opus 4.7 does more checking before it commits to an answer or a code change.
Anthropic's launch page includes Vercel describing a new behavior: the model does proofs on systems code before starting work. Hex says it is better at admitting when data is missing instead of inventing plausible fallback logic. That matters because a lot of real engineering pain is not syntax failure. It is confident-but-wrong reasoning on incomplete context.
Inside Claude Code, this tends to look like:
- reading one more file before editing
- checking a call site before changing a type
- confirming an assumption about state shape or schema
- pausing to validate a concurrency or migration path
That extra step is often the difference between a clean first pass and a 40-minute loop.
2. Long-running agentic work derails less
Devin reported that Opus 4.7 works coherently for hours and pushes through difficult tasks instead of giving up early. Notion reported a 14% gain on complex multi-step workflows with roughly one third of the tool errors of Opus 4.6. Genspark called out loop resistance, consistency, and graceful recovery as the three production traits that matter most.
That makes Opus 4.7 a better fit for:
- longer refactors
- async coding agents
- CI and automation workflows
- service-wide review passes
- investigations where the model has to read, compare, and revise repeatedly
3. Hard coding moved, not just easy coding
CursorBench climbing from 58% to 70% matters because it is closer to the vague, messy, real prompts developers actually hand to coding agents. Rakuten's 3x improvement on production SWE tasks matters because it suggests the gain is not limited to toy examples or benchmark-friendly problems.
CodeRabbit reported over 10% better recall on review workloads while keeping precision stable. Warp and Qodo both called out harder bug classes that 4.7 now catches or resolves. Factory reported a 10-15% lift in task success for Droids with fewer tool errors and more reliable follow-through.
The pattern is consistent: Opus 4.7 is not just "more eloquent." It clears a harder class of engineering work.
4. Dense vision inputs are finally first-class
The resolution jump is one of the most underrated changes in the release. Moving from 1568px / 1.15MP to 2576px / 3.75MP is not cosmetic. It changes what you can trust the model to read without cropping.
That especially helps when the input is:
- a packed dashboard screenshot
- a terminal capture with small text
- a technical diagram
- a design mockup with dense labels
- a scanned contract table or document excerpt
- a chemistry or life-sciences figure
XBOW's visual-acuity jump from 54.5% to 98.5% is the sharpest proof that the added pixels translate into real utility.
5. It is stronger outside pure coding
Anthropic's release positioned Opus 4.7 as stronger on coding, enterprise workflows, and long-running agentic tasks. The partner examples back that up:
- Cybersecurity: XBOW says their biggest visual pain point in autonomous pentesting effectively disappeared.
- Legal: Harvey reports 90.9% on BigLaw Bench at high effort, with better reasoning on ambiguous edits and review tables.
- Docs and enterprise reasoning: Databricks reported 21% fewer errors on OfficeQA Pro.
- Finance and research: Applied AI testers highlighted stronger disclosure discipline and better long-context performance.
- Life sciences: Solve Intelligence called out gains on chemical structures and technical diagrams.
- Design and UI: Lovable said the design taste is strong enough that the model makes choices they would actually ship.
That makes Opus 4.7 a broader "high-stakes knowledge work" model, not just a coding model.
Benchmark Results That Matter
The full benchmark wall is useful for launch day, but only some numbers map cleanly to user value.
| Benchmark | Why it matters |
|---|---|
| CursorBench: 70% | Closer to real coding-agent prompts than narrow coding evals |
| Rakuten-SWE-Bench: 3x more resolved | Signals movement on production engineering tasks, not just toy repos |
| XBOW visual-acuity: 98.5% | Proves dense image understanding is materially better |
| BigLaw Bench: 90.9% | Strong signal for contract and legal-review use cases |
| OfficeQA Pro: 21% fewer errors | Useful proxy for enterprise docs and document reasoning |
| Notion Agent: +14%, fewer tool errors | Good indicator for multi-step agent reliability |
| CodeRabbit: recall +10% | Strong signal for review and bug-finding workflows |
If you are choosing a model for Claude Code, CursorBench, Rakuten, Notion, CodeRabbit, and XBOW are the most actionable signals in this release.
Where Opus 4.7 Lands Hardest
Claude Code engineering sessions
This is the obvious one. Opus 4.7 is better when the task is vague, multi-file, or expensive to redo. API migrations, cross-cutting refactors, concurrency bugs, architecture reviews, and codebase-wide cleanups all benefit from the model being more literal, more patient, and more verification-heavy.
Security and cyber-defense workflows
Opus 4.7 matters in security because coding capability and cyber capability are now tightly linked. Project Glasswing, announced on April 7, 2026, is about Mythos Preview, not Opus 4.7. But Anthropic explicitly references Glasswing in the April 16, 2026 Opus 4.7 launch to explain why new cyber safeguards matter here: Opus 4.7 is the first public model where they are testing some of those safeguards in the real world.
That gives you two conclusions:
- the model is strong enough to be useful for serious defensive security work
- the model is strong enough that Anthropic is actively constraining risky misuse
If you do legitimate vulnerability research, penetration testing, or red-teaming, Anthropic points professionals toward the Cyber Verification Program.
Legal, finance, and enterprise operations
Opus 4.7 is a strong fit when the job is to compare, verify, summarize, and avoid hallucinating the missing pieces. Contracts, audit trails, review tables, financial memos, policy docs, and internal operating documents all benefit from the model's stronger calibration and document reasoning.
Multimodal product, design, and R&D work
Better screenshot reading and diagram handling make it more useful for design critique, product QA, life sciences workflows, patents, and technical documentation. If the source material used to require manual zooming or cropping, Opus 4.7 is much more usable.
For more concrete domain examples and prompt ideas, see Claude Opus 4.7 use cases.
Cyber, Risk, and Safety: Why This Release Is Different
Anthropic's launch messaging around Opus 4.7 is unusual because it does not just celebrate capability. It places the release inside a live cyber-risk story.
Anthropic says Opus 4.7 is less capable than Mythos Preview, but still strong enough that they experimented during training with differentially reducing cyber capabilities relative to Mythos. They also shipped automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity use.
That matters for anyone writing about the model because it changes the angle:
- Opus 4.7 is not just a faster copilot.
- It sits in the category where cyber benefit and cyber risk now move together.
- Defensive workflows are a legitimate strength area.
- Unsafe or disallowed offensive workflows are an explicit deployment concern.
In practical terms, that means you should position Opus 4.7 as strong for:
- secure code review
- defensive audit passes
- threat modeling
- vulnerability triage
- pentest support inside approved programs
- security documentation and remediation planning
Not as a generic "do anything cyber" engine.
Vision: The 3x Resolution Upgrade
Opus 4.7 is the first Claude release where the image pipeline deserves its own buying decision.
The new resolution ceiling means:
- less cropping before sending screenshots
- better reliability on small text and dense UIs
- stronger interpretation of technical diagrams
- cleaner mapping from returned coordinates to real pixels
The trade-off is token cost. Anthropic notes that a full-resolution image can consume roughly 4,784 tokens instead of the roughly 1,600-token range people were used to. For image-heavy workflows, downsampling is now part of cost control.
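As a rough planning aid, the sketch below estimates image token cost with the (width x height) / 750 approximation Anthropic has published for earlier Claude models. Opus 4.7's exact accounting may differ (the article's 4,784-token figure implies a slightly cheaper rate), so treat these numbers as order-of-magnitude only.

```python
# Rough image-token estimator. The (width * height) / 750 rule is the
# approximation Anthropic documented for earlier Claude models; the
# exact Opus 4.7 accounting may differ. Planning aid only.
def estimate_image_tokens(width: int, height: int) -> int:
    return (width * height) // 750

full_res = estimate_image_tokens(2576, 1456)     # near the new 3.75MP ceiling
downsampled = estimate_image_tokens(1288, 728)   # halved in each dimension
print(full_res, downsampled)
```

Halving each dimension cuts the estimated cost by roughly 4x, which is why downsampling is the obvious lever for image-heavy workflows that do not need small-text fidelity.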
Best Practices for Opus 4.7 in Claude Code
Anthropic's own guidance for Opus 4.7 inside Claude Code is more behavioral than technical. The theme is: delegate better, batch context earlier, and reduce unnecessary back-and-forth.
The high-signal habits are:
- put the real task in the first turn: intent, constraints, file paths, acceptance criteria
- reduce user turns where possible, because interactive back-and-forth adds reasoning overhead
- keep xhigh as the default for serious coding work
- drop to high when you need to control spend across many parallel sessions
- reserve max for very hard work and eval-style ceiling testing
- tell the model explicitly when to use tools and when to fan out to subagents
- use auto mode when the task is well-scoped and you trust the overall direction
- start a fresh session when the task changes, instead of dragging stale context forward
The full workflow version of that is in Claude Opus 4.7 best practices.
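The "put the real task in the first turn" habit can be sketched as a one-shot brief. Claude Code's print mode (claude -p) and --model flag are real CLI features; the model value here is this article's stated API ID, and the brief itself is an illustrative example.

```shell
# A one-shot task brief: intent, constraints, and acceptance criteria
# all land in the first turn so the model can plan without back-and-forth.
TASK='Migrate src/db from callbacks to async/await.
Constraints: no public API changes; keep existing error codes.
Done when: npm test passes with no remaining .then( chains.'

# Dry run: review the brief before spending tokens.
echo "$TASK"
# A real run might look like (model ID per this article, unverified):
# claude -p "$TASK" --model claude-opus-4-7
```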
Migration Notes from Opus 4.6
If you are moving API workloads from 4.6 to 4.7, do not just swap the model name and ship.
Adaptive thinking replaces fixed-budget thinking
The older thinking: { type: "enabled", budget_tokens: N } flow is gone for Opus 4.7. Use adaptive thinking and effort levels instead.
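A minimal before/after sketch of the parameter change, assuming the effort-based control this article describes (the effort field is an assumption, not confirmed API shape):

```python
# Before (Opus 4.6): fixed-budget extended thinking.
old_params = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 32_000},
}

# After (Opus 4.7): adaptive thinking only. Depth is steered through
# effort levels; the "effort" field follows this article's description
# and is an assumption about the final request shape.
new_params = {
    "model": "claude-opus-4-7",
    "effort": "high",
}

assert "thinking" not in new_params  # fixed budgets are gone
```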
Non-default sampling parameters are gone
If your code still sets temperature, top_p, or top_k away from default values, Opus 4.7 returns a 400. Remove those knobs and shape behavior through prompting and effort.
Thinking display changed
Thinking blocks are empty by default unless you explicitly opt in to summarized display. If your UI depended on visible thinking text, you need to update it.
The tokenizer changed
Anthropic says the same input can map to roughly 1.0x to 1.35x the prior token count depending on content. Re-baseline cost and token estimates before assuming old budgets still apply.
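A conservative re-baseline is simple arithmetic on the stated 1.0x to 1.35x range. In practice you would measure real payloads (for example with the Messages token-counting endpoint) rather than assume the worst case; this sketch just bounds old budgets.

```python
# Scale an Opus 4.6 token budget for the Opus 4.7 tokenizer using the
# worst case of the 1.0x-1.35x range this article cites.
def rebaseline_budget(old_budget_tokens: int, worst_case_ratio: float = 1.35) -> int:
    """Return a budget sized for the new tokenizer's worst-case inflation."""
    return int(old_budget_tokens * worst_case_ratio)

print(rebaseline_budget(100_000))  # 135000
```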
High-resolution images cost more
If you were previously sending screenshots casually, 4.7 makes image quality much better and image token cost materially higher. Treat downsampling as a conscious lever.
Task budgets are worth testing
Anthropic introduced task budgets as a public beta so models can self-pace across a full agentic run. If you run longer loops, test them now rather than waiting until a runaway session bites you.
Pricing and Cost
Opus 4.7 kept the same headline pricing as Opus 4.6:
| Tier | Cost |
|---|---|
| Input | $5 per 1M tokens |
| Output | $25 per 1M tokens |
That does not mean cost is identical in practice.
Your real bill is shaped by:
- the new tokenizer
- higher reasoning spend at higher effort levels
- more expensive full-resolution images
- whether you run interactive multi-turn sessions or one-shot delegated tasks
The optimistic reading comes from launch partners like Hex and Replit: better quality at lower effort can offset a chunk of the raw token increase. The correct move is not to assume. Measure on real workloads.
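For back-of-envelope planning at the list prices above (real bills will also reflect the tokenizer, effort spend, and image costs discussed earlier):

```python
# Session cost at the listed $5 input / $25 output per 1M tokens.
def session_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1_000_000 * 5 + output_tokens / 1_000_000 * 25

# e.g. a long agentic session: 400K tokens in, 60K tokens out
print(round(session_cost_usd(400_000, 60_000), 2))  # 3.5
```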
Should You Upgrade to Claude Opus 4.7?
Yes, if your pain points are:
- agents that stop halfway through
- models that sound plausible but guess too much
- hard code review and debugging work
- dense visual or document inputs
- multi-step workflows with tools
Maybe not immediately, or not as your default, if your workload is mostly:
- small edit cycles
- cheap bulk automation
- low-risk content generation
- quick Q&A where Sonnet already lands
For most serious Claude Code users, the right strategy is simple: keep Sonnet as the fast everyday option, and use Opus 4.7 as the flagship for intelligence-sensitive work.
Frequently Asked Questions
Is Claude Opus 4.7 worth it over Opus 4.6?
For hard engineering, review, document-heavy, and long-running agentic work, yes. The most important gains are not the raw benchmark numbers. They are the better calibration, stronger self-verification, lower tool-error rate, and better behavior on ambiguous tasks.
What is the best Claude Code effort setting for Opus 4.7?
xhigh is the default in Claude Code and the right starting point for most serious coding sessions. Use high when you need better cost control across many sessions. Use max deliberately for the hardest work, not as a blanket default.
Is Claude Opus 4.7 better for cybersecurity?
It is better for legitimate defensive security workflows, code review, vulnerability triage, and cyber-adjacent analysis. Anthropic also shipped explicit cyber safeguards with the model, which is part of why the release matters.
Does Opus 4.7 cost more than Opus 4.6?
List price is unchanged, but practical cost can rise because of the new tokenizer, higher reasoning spend at higher effort, and more expensive image inputs. Measure against your actual workloads.
When should I still use Sonnet instead of Opus 4.7?
Use Sonnet for fast daily coding, smaller edits, cheaper bulk work, and sessions where speed matters more than frontier-level reasoning.
Sources
- Introducing Claude Opus 4.7
- Best practices for using Claude Opus 4.7 with Claude Code
- Using Claude Code: session management and 1M context
- Project Glasswing
- Claude Code best practices docs
Related Pages
Claude Opus 4.5 in Claude Code
Opus 4.5 shipped with $5/$25 pricing and 76% fewer output tokens than Sonnet 4.5. Set it as your default in two commands.
Claude Opus 4.7 Use Cases
Real Claude Opus 4.7 workflows across coding, security, legal, finance, document reasoning, multimodal review, and long-running Claude Code agents.