Claude Opus 4.7
Claude Opus 4.7 is Anthropic's April 16, 2026 flagship for Claude Code: stronger on hard coding, cyber-adjacent workflows, document reasoning, and long-running agentic tasks at the same $5/$25 pricing as Opus 4.6.
Claude Opus 4.7 is the first Claude release in a while that feels bigger than a point upgrade. The price is unchanged. The 1M context window is unchanged. The 128K output ceiling is unchanged. What changed is the part people actually feel inside Claude Code: the model is better at hard, ambiguous, long-running work that used to need constant supervision.
That shows up in three places.
- It catches more of its own mistakes before acting.
- It stays coherent for longer inside multi-step agent loops.
- It lands better across domains that are not just "write code": cyber workflows, code review, dense screenshots, enterprise docs, contracts, diagrams, and other ambiguous source-heavy work.
If you already live in Claude Code, the short answer is simple: Opus 4.7 is the new default upgrade for high-stakes engineering sessions. If you want the workflow advice, read the dedicated Opus 4.7 best practices guide. If you want concrete examples by domain, read the companion Opus 4.7 use cases page.
Quick Verdict
Use Opus 4.7 when the work is expensive to get wrong:
- complex refactors across many files
- debugging with incomplete or conflicting evidence
- code review where subtle bugs matter
- cyber-defense, vulnerability research, or security auditing
- document-heavy work in legal, finance, and operations
- multimodal tasks with dense screenshots, diagrams, or UI mocks
Stay on Sonnet for smaller daily edits where speed and cost matter more than maximum reasoning depth.
Key Specs
| Spec | Details |
|---|---|
| API ID | claude-opus-4-7 |
| Release date | April 16, 2026 |
| Context window | 1M tokens |
| Max output | 128,000 tokens |
| Pricing | $5 input / $25 output per 1M tokens |
| Thinking mode | Adaptive thinking |
| Effort levels | low, medium, high, xhigh, max |
| Claude Code default effort | xhigh |
| Knowledge cutoff | January 2026 |
| Status | Current Opus flagship |
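Putting the spec table together, a request to this model might be shaped like the sketch below. The model ID and 128K output ceiling come from the table above; the effort field mirrors the article's effort levels and is an assumption about the request shape, not confirmed SDK usage.

```python
# Sketch of a request payload for Claude Opus 4.7, based on the spec
# table above. The "effort" field is a hypothetical knob taken from
# this article's effort-level list, not a confirmed API parameter.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 128_000,  # stated output ceiling
    "effort": "xhigh",      # Claude Code's stated default effort
    "messages": [
        {
            "role": "user",
            "content": "Review src/auth for subtle session-handling bugs.",
        }
    ],
}

# Keep requested output within the documented 128K ceiling.
assert payload["max_tokens"] <= 128_000
print(payload["model"])
```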
Claude Opus 4.7 vs Opus 4.6
The basic story is not "a bit smarter." It is "more reliable on the hard slice of work."
| Area | Opus 4.6 | Opus 4.7 |
|---|---|---|
| CursorBench | 58% | 70% |
| Rakuten-SWE-Bench | Baseline | 3x more production tasks resolved |
| XBOW visual-acuity | 54.5% | 98.5% |
| OfficeQA Pro | Baseline | 21% fewer errors |
| BigLaw Bench | Lower | 90.9% at high effort |
| Notion Agent tool errors | Baseline | about one third as many |
| Resolution support | 1568px / 1.15MP | 2576px / 3.75MP |
| Default Claude Code effort | high | xhigh |
| Thinking control | adaptive, older migration path | adaptive only, fixed-budget thinking removed |
| Tool use style | more tool-happy | more selective, more reasoning-first |
| Subagent behavior | delegates more freely | delegates more selectively |
The important part is behavioral, not just numerical. Anthropic and launch partners describe Opus 4.7 as more literal about instructions, more willing to verify assumptions, better at pushing through long tasks, and less likely to loop or fail silently halfway through.
What Actually Improved in Practice
1. Self-verification shows up more often
One of the clearest launch signals is that Opus 4.7 does more checking before it commits to an answer or a code change.
Anthropic's launch page includes Vercel describing a new behavior: the model does proofs on systems code before starting work. Hex says it is better at admitting when data is missing instead of inventing plausible fallback logic. That matters because a lot of real engineering pain is not syntax failure. It is confident-but-wrong reasoning on incomplete context.
Inside Claude Code, this tends to look like:
- reading one more file before editing
- checking a call site before changing a type
- confirming an assumption about state shape or schema
- pausing to validate a concurrency or migration path
That extra step is often the difference between a clean first pass and a 40-minute loop.
2. Long-running agentic work derails less
Devin reported that Opus 4.7 works coherently for hours and pushes through difficult tasks instead of giving up early. Notion reported a 14% gain on complex multi-step workflows with roughly one third of the tool errors of Opus 4.6. Genspark called out loop resistance, consistency, and graceful recovery as the three production traits that matter most.
That makes Opus 4.7 a better fit for:
- longer refactors
- async coding agents
- CI and automation workflows
- service-wide review passes
- investigations where the model has to read, compare, and revise repeatedly
3. Hard coding moved, not just easy coding
CursorBench climbing from 58% to 70% matters because it is closer to the vague, messy, real prompts developers actually hand to coding agents. Rakuten's 3x improvement on production SWE tasks matters because it suggests the gain is not limited to toy examples or benchmark-friendly problems.
CodeRabbit reported over 10% better recall on review workloads while keeping precision stable. Warp and Qodo both called out harder bug classes that 4.7 now catches or resolves. Factory reported a 10-15% lift in task success for Droids with fewer tool errors and more reliable follow-through.
The pattern is consistent: Opus 4.7 is not just "more eloquent." It clears a harder class of engineering work.
4. Dense vision inputs are finally first-class
The resolution jump is one of the most underrated changes in the release. Moving from 1568px / 1.15MP to 2576px / 3.75MP is not cosmetic. It changes what you can trust the model to read without cropping.
That especially helps when the input is:
- a packed dashboard screenshot
- a terminal capture with small text
- a technical diagram
- a design mockup with dense labels
- a scanned contract table or document excerpt
- a chemistry or life-sciences figure
XBOW's visual-acuity jump from 54.5% to 98.5% is the sharpest proof that the added pixels translate into real utility.
5. It is stronger outside pure coding
Anthropic's release positioned Opus 4.7 as stronger on coding, enterprise workflows, and long-running agentic tasks. The partner examples back that up:
- Cybersecurity: XBOW says their biggest visual pain point in autonomous pentesting effectively disappeared.
- Legal: Harvey reports 90.9% on BigLaw Bench at high effort, with better reasoning on ambiguous edits and review tables.
- Docs and enterprise reasoning: Databricks reported 21% fewer errors on OfficeQA Pro.
- Finance and research: Applied AI testers highlighted stronger disclosure discipline and better long-context performance.
- Life sciences: Solve Intelligence called out gains on chemical structures and technical diagrams.
- Design and UI: Lovable said the design taste is strong enough that the model makes choices they would actually ship.
That makes Opus 4.7 a broader "high-stakes knowledge work" model, not just a coding model.
Benchmark Results That Matter
The full benchmark wall is useful for launch day, but only some numbers map cleanly to user value.
| Benchmark | Why it matters |
|---|---|
| CursorBench: 70% | Closer to real coding-agent prompts than narrow coding evals |
| Rakuten-SWE-Bench: 3x more resolved | Signals movement on production engineering tasks, not just toy repos |
| XBOW visual-acuity: 98.5% | Proves dense image understanding is materially better |
| BigLaw Bench: 90.9% | Strong signal for contract and legal-review use cases |
| OfficeQA Pro: 21% fewer errors | Useful proxy for enterprise docs and document reasoning |
| Notion Agent: +14%, fewer tool errors | Good indicator for multi-step agent reliability |
| CodeRabbit: recall +10% | Strong signal for review and bug-finding workflows |
If you are choosing a model for Claude Code, CursorBench, Rakuten, Notion, CodeRabbit, and XBOW are the most actionable signals in this release.
Where Opus 4.7 Lands Hardest
Claude Code engineering sessions
This is the obvious one. Opus 4.7 is better when the task is vague, multi-file, or expensive to redo. API migrations, cross-cutting refactors, concurrency bugs, architecture reviews, and codebase-wide cleanups all benefit from the model being more literal, more patient, and more verification-heavy.
Security and cyber-defense workflows
Opus 4.7 matters in security because coding capability and cyber capability are now tightly linked. Project Glasswing, announced on April 7, 2026, is about Mythos Preview, not Opus 4.7. But Anthropic explicitly references Glasswing in the April 16, 2026 Opus 4.7 launch to explain why new cyber safeguards matter here: Opus 4.7 is the first public model where they are testing some of those safeguards in the real world.
That gives you two conclusions:
- the model is strong enough to be useful for serious defensive security work
- the model is strong enough that Anthropic is actively constraining risky misuse
If you do legitimate vulnerability research, penetration testing, or red-teaming, Anthropic points professionals toward the Cyber Verification Program.
Legal, finance, and enterprise operations
Opus 4.7 is a strong fit when the job is to compare, verify, summarize, and avoid hallucinating the missing pieces. Contracts, audit trails, review tables, financial memos, policy docs, and internal operating documents all benefit from the model's stronger calibration and document reasoning.
Multimodal product, design, and R&D work
Better screenshot reading and diagram handling make it more useful for design critique, product QA, life sciences workflows, patents, and technical documentation. If the source material used to require manual zooming or cropping, Opus 4.7 is much more usable.
For more concrete domain examples and prompt ideas, see Claude Opus 4.7 use cases.
Cyber, Risk, and Safety: Why This Release Is Different
Anthropic's launch messaging around Opus 4.7 is unusual because it does not just celebrate capability. It places the release inside a live cyber-risk story.
Anthropic says Opus 4.7 is less capable than Mythos Preview, but still strong enough that they experimented during training with differentially reducing cyber capabilities relative to Mythos. They also shipped automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity use.
That matters for anyone writing about the model because it changes the angle:
- Opus 4.7 is not just a faster copilot.
- It sits in the category where cyber benefit and cyber risk now move together.
- Defensive workflows are a legitimate strength area.
- Unsafe or disallowed offensive workflows are an explicit deployment concern.
In practical terms, that means you should position Opus 4.7 as strong for:
- secure code review
- defensive audit passes
- threat modeling
- vulnerability triage
- pentest support inside approved programs
- security documentation and remediation planning
Not as a generic "do anything cyber" engine.
Vision: The 3x Resolution Upgrade
Opus 4.7 is the first Claude release where the image pipeline deserves its own buying decision.
The new resolution ceiling means:
- less cropping before sending screenshots
- better reliability on small text and dense UIs
- stronger interpretation of technical diagrams
- cleaner mapping from returned coordinates to real pixels
The trade-off is token cost. Anthropic notes that a full-resolution image can consume roughly 4,784 tokens instead of the roughly 1,600-token range people were used to. For image-heavy workflows, downsampling is now part of cost control.
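As a rough planning aid, the sketch below estimates image token cost with the (width x height) / 750 approximation Anthropic has published for earlier Claude models. Opus 4.7's exact accounting may differ (the article's 4,784-token figure implies a slightly cheaper rate), so treat these numbers as order-of-magnitude only.

```python
# Rough image-token estimator. The (width * height) / 750 rule is the
# approximation Anthropic documented for earlier Claude models; the
# exact Opus 4.7 accounting may differ. Planning aid only.
def estimate_image_tokens(width: int, height: int) -> int:
    return (width * height) // 750

full_res = estimate_image_tokens(2576, 1456)     # near the new 3.75MP ceiling
downsampled = estimate_image_tokens(1288, 728)   # halved in each dimension
print(full_res, downsampled)
```

Halving each dimension cuts the estimated cost by roughly 4x, which is why downsampling is the obvious lever for image-heavy workflows that do not need small-text fidelity.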
Best Practices for Opus 4.7 in Claude Code
Anthropic's own guidance for Opus 4.7 inside Claude Code is more behavioral than technical. The theme is: delegate better, batch context earlier, and reduce unnecessary back-and-forth.
The high-signal habits are:
- put the real task in the first turn: intent, constraints, file paths, acceptance criteria
- reduce user turns where possible, because interactive back-and-forth adds reasoning overhead
- keep xhigh as the default for serious coding work
- drop to high when you need to control spend across many parallel sessions
- reserve max for very hard work and eval-style ceiling testing
- tell the model explicitly when to use tools and when to fan out to subagents
- use auto mode when the task is well-scoped and you trust the overall direction
- start a fresh session when the task changes, instead of dragging stale context forward
The full workflow version of that is in Claude Opus 4.7 best practices.
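The "put the real task in the first turn" habit can be sketched as a one-shot brief. Claude Code's print mode (claude -p) and --model flag are real CLI features; the model value here is this article's stated API ID, and the brief itself is an illustrative example.

```shell
# A one-shot task brief: intent, constraints, and acceptance criteria
# all land in the first turn so the model can plan without back-and-forth.
TASK='Migrate src/db from callbacks to async/await.
Constraints: no public API changes; keep existing error codes.
Done when: npm test passes with no remaining .then( chains.'

# Dry run: review the brief before spending tokens.
echo "$TASK"
# A real run might look like (model ID per this article, unverified):
# claude -p "$TASK" --model claude-opus-4-7
```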
Migration Notes from Opus 4.6
If you are moving API workloads from 4.6 to 4.7, do not just swap the model name and ship.
Adaptive thinking replaces fixed-budget thinking
The older thinking: { type: "enabled", budget_tokens: N } flow is gone for Opus 4.7. Use adaptive thinking and effort levels instead.
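A minimal before/after sketch of the parameter change, assuming the effort-based control this article describes (the effort field is an assumption, not confirmed API shape):

```python
# Before (Opus 4.6): fixed-budget extended thinking.
old_params = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 32_000},
}

# After (Opus 4.7): adaptive thinking only. Depth is steered through
# effort levels; the "effort" field follows this article's description
# and is an assumption about the final request shape.
new_params = {
    "model": "claude-opus-4-7",
    "effort": "high",
}

assert "thinking" not in new_params  # fixed budgets are gone
```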
Non-default sampling parameters are gone
If your code still sets temperature, top_p, or top_k away from default values, Opus 4.7 returns a 400. Remove those knobs and shape behavior through prompting and effort.
Thinking display changed
Thinking blocks are empty by default unless you explicitly opt in to summarized display. If your UI depended on visible thinking text, you need to update it.
The tokenizer changed
Anthropic says the same input can map to roughly 1.0x to 1.35x the prior token count depending on content. Re-baseline cost and token estimates before assuming old budgets still apply.
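A conservative re-baseline is simple arithmetic on the stated 1.0x to 1.35x range. In practice you would measure real payloads (for example with the Messages token-counting endpoint) rather than assume the worst case; this sketch just bounds old budgets.

```python
# Scale an Opus 4.6 token budget for the Opus 4.7 tokenizer using the
# worst case of the 1.0x-1.35x range this article cites.
def rebaseline_budget(old_budget_tokens: int, worst_case_ratio: float = 1.35) -> int:
    """Return a budget sized for the new tokenizer's worst-case inflation."""
    return int(old_budget_tokens * worst_case_ratio)

print(rebaseline_budget(100_000))  # 135000
```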
High-resolution images cost more
If you were previously sending screenshots casually, 4.7 makes image quality much better and image token cost materially higher. Treat downsampling as a conscious lever.
Task budgets are worth testing
Anthropic introduced task budgets as a public beta so models can self-pace across a full agentic run. If you run longer loops, test them now rather than waiting until a runaway session bites you.
Pricing and Cost
Opus 4.7 kept the same headline pricing as Opus 4.6:
| Tier | Cost |
|---|---|
| Input | $5 per 1M tokens |
| Output | $25 per 1M tokens |
That does not mean cost is identical in practice.
Your real bill is shaped by:
- the new tokenizer
- higher reasoning spend at higher effort levels
- more expensive full-resolution images
- whether you run interactive multi-turn sessions or one-shot delegated tasks
The optimistic reading comes from launch partners like Hex and Replit: better quality at lower effort can offset a chunk of the raw token increase. The correct move is not to assume. Measure on real workloads.
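For back-of-envelope planning at the list prices above (real bills will also reflect the tokenizer, effort spend, and image costs discussed earlier):

```python
# Session cost at the listed $5 input / $25 output per 1M tokens.
def session_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1_000_000 * 5 + output_tokens / 1_000_000 * 25

# e.g. a long agentic session: 400K tokens in, 60K tokens out
print(round(session_cost_usd(400_000, 60_000), 2))  # 3.5
```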
Should You Upgrade to Claude Opus 4.7?
Yes, if your pain points are:
- agents that stop halfway through
- models that sound plausible but guess too much
- hard code review and debugging work
- dense visual or document inputs
- multi-step workflows with tools
Maybe not immediately, or not as your default, if your workload is mostly:
- small edit cycles
- cheap bulk automation
- low-risk content generation
- quick Q&A where Sonnet already lands
For most serious Claude Code users, the right strategy is simple: keep Sonnet as the fast everyday option, and use Opus 4.7 as the flagship for intelligence-sensitive work.
Frequently Asked Questions
Is Claude Opus 4.7 worth it over Opus 4.6?
For hard engineering, review, document-heavy, and long-running agentic work, yes. The most important gains are not the raw benchmark numbers. They are the better calibration, stronger self-verification, lower tool-error rate, and better behavior on ambiguous tasks.
What is the best Claude Code effort setting for Opus 4.7?
xhigh is the default in Claude Code and the right starting point for most serious coding sessions. Use high when you need better cost control across many sessions. Use max deliberately for the hardest work, not as a blanket default.
Is Claude Opus 4.7 better for cybersecurity?
It is better for legitimate defensive security workflows, code review, vulnerability triage, and cyber-adjacent analysis. Anthropic also shipped explicit cyber safeguards with the model, which is part of why the release matters.
Does Opus 4.7 cost more than Opus 4.6?
List price is unchanged, but practical cost can rise because of the new tokenizer, higher reasoning spend at higher effort, and more expensive image inputs. Measure against your actual workloads.
When should I still use Sonnet instead of Opus 4.7?
Use Sonnet for fast daily coding, smaller edits, cheaper bulk work, and sessions where speed matters more than frontier-level reasoning.
Sources
- Introducing Claude Opus 4.7
- Best practices for using Claude Opus 4.7 with Claude Code
- Using Claude Code: session management and 1M context
- Project Glasswing
- Claude Code best practices docs
Related Pages
Claude Opus 4.5 in Claude Code
Opus 4.5 shipped with $5/$25 pricing and 76% fewer output tokens than Sonnet 4.5. Set it as your default in two commands.
Claude Opus 4.7 Use Cases
Real Claude Opus 4.7 workflows across coding, security, legal, finance, document reasoning, multimodal review, and long-running Claude Code agents.