Claude Opus 4.7 vs 4.6
Anthropic kept the $5/$25 rate card on Opus 4.7, but a new tokenizer counts the same English prompt at up to 1.46x the tokens of 4.6, raising real bills 12 to 27 percent after caching.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
The Opus 4.7 sticker price reads identical to Opus 4.6. Five dollars per million input. Twenty-five per million output. What changed sits one layer below the price page: the tokenizer counts the same English text differently, so the same job bills more on 4.7.
Simon Willison ran one system prompt through Anthropic's own count_tokens API on both models. Opus 4.6 returned 5,039 tokens. Opus 4.7 returned 7,335. That is a 1.46x ratio on plain English text, and it sits 11 points above Anthropic's stated 1.0 to 1.35x upper bound (per Simon Willison's token counter writeup).
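Willison's two counts make the gap easy to check. A quick sketch, using only the figures quoted above:

```python
# Willison's published count_tokens results and Anthropic's stated
# worst-case ratio, worked through. All three constants come from
# the article; nothing here is measured fresh.
OPUS_4_6_TOKENS = 5_039    # count_tokens result on Opus 4.6
OPUS_4_7_TOKENS = 7_335    # same system prompt on Opus 4.7
STATED_UPPER_BOUND = 1.35  # Anthropic's published upper bound

ratio = OPUS_4_7_TOKENS / OPUS_4_6_TOKENS
print(f"measured: {ratio:.2f}x, "
      f"excess over stated bound: {(ratio - STATED_UPPER_BOUND) * 100:.0f} points")
# → measured: 1.46x, excess over stated bound: 11 points
```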
OpenRouter then ran the same comparison across one million real production requests. Their finding: 12 to 27 percent more billed tokens for typical 2K to 25K prompts after prompt caching, and 32 to 45 percent more without caching (per OpenRouter's tokenizer analysis).
One day before this post went live, Claude Code v2.1.142 flipped the fast-mode default to 4.7. So your next session starts paying the new rate by default unless you override it.
The Side-by-Side
Pull up the two model spec sheets next to each other and the rate card looks frozen. The work below the price line is where 4.7 pays for itself, and where it costs you more:
| Dimension | Opus 4.6 | Opus 4.7 | Source |
|---|---|---|---|
| API model ID | claude-opus-4-6 | claude-opus-4-7 | Anthropic docs |
| Input price (per 1M) | $5 | $5 | Anthropic pricing |
| Output price (per 1M) | $25 | $25 | Anthropic pricing |
| Context window | 1M tokens | 1M tokens | Anthropic docs |
| Max output | 128K tokens | 128K tokens | Anthropic docs |
| Tokenizer (text) | 1.0x baseline | 1.0 to 1.46x | Willison, OpenRouter |
| Effective bill change (2K to 25K, cached) | baseline | +21 to +27 percent | OpenRouter |
| Effective bill change (under 2K) | baseline | -1.6 percent | OpenRouter |
| SWE-bench Verified | 80.8% | 87.6% (+6.8) | Vellum |
| SWE-bench Pro | 53.4% | 64.3% (+10.9) | Vellum |
| Terminal-Bench 2.0 | 65.4% | 69.4% (+4.0) | Vellum |
| GPQA Diamond | 91.3% | 94.2% (+2.9) | Vellum |
| OSWorld-Verified | 72.7% | 78.0% (+5.3) | Vellum |
| CharXiv (no tools) | 69.1% | 82.1% (+13.0) | Vellum |
| BrowseComp (agentic search) | 83.7% | 79.3% (-4.4) | Vellum |
| Output speed | similar | 68.2 tokens/sec | Artificial Analysis |
| TTFT (standard) | similar | 16.26 sec | Artificial Analysis |
| Fast mode | 2.5x speed at 6x price | 2.5x speed at 6x price | Anthropic announcement |
| Vision max edge | smaller | 2,576 px (~3.75 MP) | Anthropic announcement |
| Effort levels | high, max | adds xhigh | Anthropic announcement |
| Claude Code fast default (2026-05-14) | n/a | 4.7 | Claude Code CHANGELOG |
Three rows do the heavy lifting here. SWE-bench Pro climbed nearly eleven points, which is the largest jump on the table for hard agent coding work. CharXiv (no tools) went up thirteen points, so chart and figure parsing got a real upgrade. BrowseComp regressed 4.4 points, so anything that looks like agentic web research got a little worse, not better.
What a Real Call Costs Now
Take a representative Claude Code build call. The agent reads ~25K tokens of context (your repo plus a spec) and writes ~4K tokens of code:
On Opus 4.6:

```
input:  25,000 tokens × $5/1M  = $0.125
output:  4,000 tokens × $25/1M = $0.100
total = $0.225
```

On Opus 4.7, the same English source gets recounted by the new tokenizer. OpenRouter measured native inflation at 1.34x for 25K to 50K prompts, and 13 to 30 percent longer completions at long context. Apply both:

```
input:  25,000 × 1.34 = 33,500 tokens × $5/1M  = $0.1675
output:  4,000 × 1.13 =  4,520 tokens × $25/1M = $0.1130
total = $0.2805
```

A 24.7 percent effective increase for the exact same job. No rate-card change. No menu warning. The tokenizer ate your margin.
Run that math on a working day. A solo founder pushing 200 agent calls daily moves from $45 to $56. Same code shipped. Same prompts. About $330 a month in extra spend you did not opt into.
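The per-call and per-day arithmetic above can be sketched as one small cost model. The 1.34x input and 1.13x output multipliers are OpenRouter's measured figures for this prompt-size band; the function name and shape are illustrative, not any official API:

```python
# Effective cost of one agent call at the $5/$25 rate card, with
# optional tokenizer/completion multipliers for Opus 4.7.
INPUT_RATE = 5 / 1_000_000    # $ per input token (same on both models)
OUTPUT_RATE = 25 / 1_000_000  # $ per output token (same on both models)

def call_cost(input_tokens: int, output_tokens: int,
              in_mult: float = 1.0, out_mult: float = 1.0) -> float:
    """Dollar cost of one call after applying billed-token multipliers."""
    return (input_tokens * in_mult * INPUT_RATE
            + output_tokens * out_mult * OUTPUT_RATE)

cost_46 = call_cost(25_000, 4_000)                                # $0.2250
cost_47 = call_cost(25_000, 4_000, in_mult=1.34, out_mult=1.13)   # $0.2805

print(f"per call: ${cost_46:.4f} -> ${cost_47:.4f} "
      f"({cost_47 / cost_46 - 1:+.1%})")
print(f"200 calls/day: ${200 * cost_46:.2f} -> ${200 * cost_47:.2f}")
# → per call: $0.2250 -> $0.2805 (+24.7%)
# → 200 calls/day: $45.00 -> $56.10
```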
Two cases where the math flips. For long-context calls at 128K and up, prompt caching absorbs about 93 percent of the inflation, so the effective gap shrinks to roughly 15 percent. For short calls under 2K tokens, completions get 62 percent shorter on 4.7, so the bill is actually 1.6 percent cheaper. Mid-range prompts are where the cost lands hardest.
The Default Flip Nobody Warned You About
Claude Code v2.1.142 shipped on 2026-05-14, one day before this post. The changelog entry buries the change two bullets deep: fast mode now runs on Opus 4.7 by default. Every /fast you typed yesterday on 4.6 fires on 4.7 today.
That is a quiet rate change, not a rate-card one. Six times the per-token price of standard mode times the new tokenizer ratio. Fast mode on long English prompts now bills closer to seven and a half times the cost of standard 4.6, depending on prompt size.
You can opt back. Anthropic shipped a new env var alongside the default flip:

```shell
# Pin Claude Code fast mode to Opus 4.6 (not the new 4.7 default).
# Active as of Claude Code v2.1.142 (2026-05-14).
export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

# Verify:
claude --version  # expect 2.1.142 or later
claude /fast      # confirm fast mode is on
# Fast mode now runs on Opus 4.6 regardless of any other env var.
```

A few things to know about the flag. It takes precedence over CLAUDE_CODE_ENABLE_OPUS_4_7_FAST_MODE if both are set. Drop it in ~/.zshrc for global stickiness, or your project .env for per-repo stickiness. And confirm it still ships in your version with a quick search:

```shell
claude --help | grep OPUS_4_6
```

Anthropic typically keeps these flags around for six to twelve months after a default flip.
When 4.7 Actually Pays for Itself
The benchmark gains are real and they cluster in specific places. SWE-bench Pro at +10.9 points means hard, multi-file agent coding work lands more often on the first try. Terminal-Bench 2.0 at +4.0 helps long agent runs that touch many shell commands. CharXiv at +13.0 means chart and figure understanding got a real lift, useful for vision pipelines that read research PDFs or dashboards.
Stay on 4.7 when the call profile matches at least one of these:
- Hard SWE work where SWE-bench Pro improvement (+10.9) compounds across a long task.
- High-resolution images, charts, or PDFs where CharXiv (+13.0) shows up in your output.
- Prompts at 128K and up where caching captures ~93 percent of the tokenizer inflation.
- Plan-then-execute agents where one stronger plan saves five weaker build cycles.
When to Pin Back to 4.6
Three workloads regress or break even on 4.7 once you account for cost. Override fast mode back to 4.6 when:
- Prompts sit between 2K and 25K and you do not use prompt caching. This is the worst tokenizer cost zone.
- The job involves agentic web research. BrowseComp dropped from 83.7 to 79.3 percent, a 4.4 point regression on 4.7.
- You run a Claude Max plan and your session caps now burn down 1.3 to 3 times faster than they did last month.
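The two checklists reduce to a small decision rule. Thresholds and the BrowseComp regression come straight from the article; the function name, signature, and model ID strings are illustrative:

```python
# Decision-rule sketch: which Opus model should a given call pin to?
def pick_fast_mode_model(prompt_tokens: int,
                         uses_prompt_caching: bool,
                         agentic_web_research: bool) -> str:
    """Return the model to pin, per the article's stay/pin-back criteria."""
    if agentic_web_research:
        return "claude-opus-4-6"  # BrowseComp regressed 4.4 points on 4.7
    if 2_000 <= prompt_tokens <= 25_000 and not uses_prompt_caching:
        return "claude-opus-4-6"  # worst tokenizer cost zone, uncached
    return "claude-opus-4-7"      # caching or short/long tail: keep the lift

print(pick_fast_mode_model(10_000, False, False))   # claude-opus-4-6
print(pick_fast_mode_model(130_000, True, False))   # claude-opus-4-7
```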
Evaluator and judgment work does not need the SWE-bench Pro lift. Linters and doc writers do not gain from CharXiv. Those agents still produce identical output on 4.6 at a real discount.
Per-Agent Model Selection
Most teams pick one model per project. The cost shift on 4.7 makes per-agent selection the better default. Different roles in an agent fleet get different value from the same intelligence step.
A planner reads a spec, decides the file layout, picks the build order. SWE-bench Pro +10.9 lands directly here. Keep planners on 4.7.
A backend agent or a designer agent writes code. The same SWE-bench Pro lift applies. Keep them on 4.7.
An evaluator agent reads code and finds gaps. Judgment does not need the new ceiling. Pin to 4.6 with the override flag and pocket the difference.
A linter, a formatter, a doc writer. Mechanical work. Pin to 4.6.
A test agent runs the test plan and reports back. Mechanical too. Pin to 4.6.
Five agents pinned to 4.6, three on 4.7, on a normal feature build. The build still gets the planner and builder lifts where they matter, and the cheap loops stay cheap.
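One way to encode that split is a per-agent model map the orchestrator consults before each call. The agent names and the map itself are illustrative, not any framework's actual config format:

```python
# Per-agent model selection for the eight-agent feature build above.
AGENT_MODELS = {
    "planner":    "claude-opus-4-7",  # SWE-bench Pro +10.9 lands here
    "backend":    "claude-opus-4-7",  # same lift for code-writing agents
    "designer":   "claude-opus-4-7",
    "evaluator":  "claude-opus-4-6",  # judgment work: pin and save
    "linter":     "claude-opus-4-6",  # mechanical work
    "formatter":  "claude-opus-4-6",
    "doc_writer": "claude-opus-4-6",
    "tester":     "claude-opus-4-6",
}

pinned = [a for a, m in AGENT_MODELS.items() if m == "claude-opus-4-6"]
print(f"{len(pinned)} agents pinned to 4.6, "
      f"{len(AGENT_MODELS) - len(pinned)} on 4.7")
# → 5 agents pinned to 4.6, 3 on 4.7
```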
Build This Now ships exactly this layout out of the box. The framework wires 18 specialist agents (32 with on-demand) into a .claude/ directory, each with a defined role and a configured model. Override the planner and builders to 4.7. Pin the evaluators, linters, doc writers, and testers to 4.6. The model selection lives next to the agent definition, not the project root. One framework. Many cost profiles.
The Verdict
Same rate card, different bill. Opus 4.7 prints better numbers on the benchmarks that matter for hard agent coding, and it costs more for the same English prompt almost everywhere except the very short tail. Pin 4.6 for evaluators, linters, and doc writers. Keep 4.7 for planners and SWE-heavy builders. Set the override flag once, forget about it, and let each agent role buy what it actually needs.