speedy_devv · koen_salo

Claude Opus 4.7 vs 4.6

Anthropic kept the $5/$25 rate card on Opus 4.7, but a new tokenizer counts the same English prompt at up to 1.46x the billed tokens, raising real bills 12 to 27 percent even after caching.


Published May 15, 2026 · 9 min read · Model Picker hub

The Opus 4.7 sticker price reads identical to Opus 4.6. Five dollars per million input. Twenty-five per million output. What changed sits one layer below the price page: the tokenizer counts the same English text differently, so the same job bills more on 4.7.

Simon Willison ran one system prompt through Anthropic's own count_tokens API on both models. Opus 4.6 returned 5,039 tokens. Opus 4.7 returned 7,335. That is a 1.46x ratio on plain English text, and it sits 11 points above Anthropic's stated 1.0 to 1.35x upper bound (per Simon Willison's token counter writeup).
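
You can reproduce that measurement against your own prompts. Here is a minimal sketch using the Anthropic Python SDK's count_tokens endpoint; the model IDs are the API IDs from the table below, and system_prompt.txt is a placeholder for whatever prompt you actually ship:

# Count billed input tokens for the same prompt on both models.
# Requires: pip install anthropic, ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

# Placeholder: point this at your real system prompt.
system_prompt = open("system_prompt.txt").read()

def count(model: str) -> int:
    result = client.messages.count_tokens(
        model=model,
        system=system_prompt,
        messages=[{"role": "user", "content": "ping"}],
    )
    return result.input_tokens

old = count("claude-opus-4-6")
new = count("claude-opus-4-7")
print(f"4.6: {old:,}  4.7: {new:,}  ratio: {new / old:.2f}x")

If the ratio on your own prompts lands near Willison's 1.46x rather than inside Anthropic's stated band, the cost math below gets worse, not better.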

OpenRouter then ran the same comparison across one million real production requests. Their finding: 12 to 27 percent more billed tokens for typical 2K to 25K prompts after prompt caching, and 32 to 45 percent more without caching (per OpenRouter's tokenizer analysis).

One day before this post went live, Claude Code v2.1.142 flipped the fast-mode default to 4.7. So your next session starts paying the new rate by default unless you override it.

The Side-by-Side

Pull up the two model spec sheets next to each other and the rate card looks frozen. The work below the price line is where 4.7 paid for itself, and where it costs you more:

| Dimension | Opus 4.6 | Opus 4.7 | Source |
| --- | --- | --- | --- |
| API model ID | claude-opus-4-6 | claude-opus-4-7 | Anthropic docs |
| Input price (per 1M) | $5 | $5 | Anthropic pricing |
| Output price (per 1M) | $25 | $25 | Anthropic pricing |
| Context window | 1M tokens | 1M tokens | Anthropic docs |
| Max output | 128K tokens | 128K tokens | Anthropic docs |
| Tokenizer (text) | 1.0x baseline | 1.0 to 1.46x | Willison, OpenRouter |
| Effective bill change (2K to 25K, cached) | baseline | +21 to +27 percent | OpenRouter |
| Effective bill change (under 2K) | baseline | -1.6 percent | OpenRouter |
| SWE-bench Verified | 80.8% | 87.6% (+6.8) | Vellum |
| SWE-bench Pro | 53.4% | 64.3% (+10.9) | Vellum |
| Terminal-Bench 2.0 | 65.4% | 69.4% (+4.0) | Vellum |
| GPQA Diamond | 91.3% | 94.2% (+2.9) | Vellum |
| OSWorld-Verified | 72.7% | 78.0% (+5.3) | Vellum |
| CharXiv (no tools) | 69.1% | 82.1% (+13.0) | Vellum |
| BrowseComp (agentic search) | 83.7% | 79.3% (-4.4) | Vellum |
| Output speed | similar | 68.2 tokens/sec | Artificial Analysis |
| TTFT (standard) | similar | 16.26 sec | Artificial Analysis |
| Fast mode | 2.5x speed at 6x price | 2.5x speed at 6x price | Anthropic announcement |
| Vision max edge | smaller | 2,576 px (~3.75 MP) | Anthropic announcement |
| Effort levels | high, max | adds xhigh | Anthropic announcement |
| Claude Code fast default (2026-05-14) | n/a | 4.7 | Claude Code CHANGELOG |

Three rows do the heavy lifting here. SWE-bench Pro climbed nearly eleven points, which is the largest jump on the table for hard agent coding work. CharXiv (no tools) went up thirteen points, so chart and figure parsing got a real upgrade. BrowseComp regressed 4.4 points, so anything that looks like agentic web research got a little worse, not better.

What a Real Call Costs Now

Take a representative Claude Code build call. The agent reads ~25K tokens of context (your repo plus a spec) and writes ~4K tokens of code:

On Opus 4.6:

input:  25,000 tokens × $5/1M  = $0.125
output:  4,000 tokens × $25/1M = $0.100
total                          = $0.225

On Opus 4.7, the same English source gets recounted by the new tokenizer. OpenRouter measured native inflation at 1.34x for 25K to 50K prompts, and 13 to 30 percent longer completions at long context. Apply both:

input:  25,000 × 1.34 = 33,500 tokens × $5/1M  = $0.1675
output:  4,000 × 1.13 =  4,520 tokens × $25/1M = $0.1130
total                                          = $0.2805

A 24.7 percent effective increase for the exact same job. No rate-card change. No menu warning. The tokenizer ate your margin.

Run that math on a working day. A solo founder pushing 200 agent calls daily moves from $45 to $56. Same code shipped. Same prompts. About $330 a month in extra spend you did not opt into.

Two cases where the math flips. For long-context calls at 128K and up, prompt caching absorbs about 93 percent of the inflation, so the effective gap shrinks to roughly 15 percent. For short calls under 2K tokens, completions get 62 percent shorter on 4.7, so the bill is actually 1.6 percent cheaper. Mid-range prompts are where the cost lands hardest.
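
To run this math against your own traffic mix, a rough estimator helps. This sketch hard-codes the OpenRouter-measured factors quoted above (1.34x input inflation for 25K to 50K prompts, ~1.13x longer completions); treat the output as a ballpark under those assumptions, not a quote:

# Ballpark one call's bill on 4.6 vs 4.7. Rates are the shared
# $5/$25 rate card; inflation factors are OpenRouter's measurements.
INPUT_RATE = 5 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25 / 1_000_000  # dollars per output token

def call_cost(prompt_toks: int, completion_toks: int,
              input_x: float = 1.0, output_x: float = 1.0) -> float:
    return (prompt_toks * input_x * INPUT_RATE
            + completion_toks * output_x * OUTPUT_RATE)

on_46 = call_cost(25_000, 4_000)                               # $0.2250
on_47 = call_cost(25_000, 4_000, input_x=1.34, output_x=1.13)  # $0.2805
print(f"4.6: ${on_46:.4f}  4.7: ${on_47:.4f}  "
      f"change: {on_47 / on_46 - 1:+.1%}")                     # +24.7%

Swap in your own prompt and completion sizes, and adjust the factors toward 1.0 for cached or very short calls.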

The Default Flip Nobody Warned You About

Claude Code v2.1.142 shipped on 2026-05-14, one day before this post. The changelog entry buries the change two bullets deep: fast mode now runs on Opus 4.7 by default. Every /fast you typed yesterday on 4.6 fires on 4.7 today.

That is a quiet rate change, not a rate-card one. Fast mode bills six times the per-token price of standard mode, and the new tokenizer ratio stacks on top: at around 1.25x inflation on long English prompts, 6 × 1.25 lands fast mode near seven and a half times the cost of standard 4.6, depending on prompt size.

You can opt back. Anthropic shipped a new env var alongside the default flip:

# Pin Claude Code fast mode to Opus 4.6 (not the new 4.7 default).
# Active as of Claude Code v2.1.142 (2026-05-14).
export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

# Verify:
claude --version   # expect 2.1.142 or later
claude /fast       # confirm fast mode is on
# Fast mode now runs on Opus 4.6 regardless of any other env var.

A few things to know about the flag. It takes precedence over CLAUDE_CODE_ENABLE_OPUS_4_7_FAST_MODE if both are set. Drop it in ~/.zshrc for global stickiness, or your project .env for per-repo stickiness. And confirm it still ships in your version with a quick search:

claude --help | grep OPUS_4_6

Anthropic typically keeps these flags around for six to twelve months after a default flip.

When 4.7 Actually Pays for Itself

The benchmark gains are real and they cluster in specific places. SWE-bench Pro at +10.9 points means hard, multi-file agent coding work lands more often on the first try. Terminal-Bench 2.0 at +4.0 helps long agent runs that touch many shell commands. CharXiv at +13.0 means chart and figure understanding got a real lift, useful for vision pipelines that read research PDFs or dashboards.

Stay on 4.7 when the call profile matches at least one of these:

  • Hard SWE work where SWE-bench Pro improvement (+10.9) compounds across a long task.
  • High-resolution images, charts, or PDFs where CharXiv (+13.0) shows up in your output.
  • Prompts at 128K and up where caching captures ~93 percent of the tokenizer inflation.
  • Plan-then-execute agents where one stronger plan saves five weaker build cycles.

When to Pin Back to 4.6

Three workloads regress or break even on 4.7 once you account for cost. Override fast mode back to 4.6 when:

  • Prompts sit between 2K and 25K and you do not use prompt caching. This is the worst tokenizer cost zone.
  • The job involves agentic web research. BrowseComp dropped from 83.7 to 79.3 percent, a 4.4 point regression on 4.7.
  • You run a Claude Max plan and your session caps now burn down 1.3 to 3 times faster than they did last month.

Evaluator and judgment work does not need the SWE-bench Pro lift. Linters and doc writers do not gain from CharXiv. Those agents still produce identical output on 4.6 at a real discount.
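
Those two lists reduce to a routing rule. Here is a sketch of one way to encode it; the task categories and thresholds are this post's heuristics, not anything Anthropic ships:

# Rule-of-thumb model router encoding the two lists above.
def pick_model(prompt_tokens: int, task: str, cached: bool) -> str:
    if task == "web_research":
        return "claude-opus-4-6"  # BrowseComp regressed 4.4 points on 4.7
    if task in ("hard_swe", "vision_charts"):
        return "claude-opus-4-7"  # SWE-bench Pro +10.9, CharXiv +13.0
    if prompt_tokens >= 128_000 and cached:
        return "claude-opus-4-7"  # caching absorbs ~93% of the inflation
    if 2_000 <= prompt_tokens <= 25_000 and not cached:
        return "claude-opus-4-6"  # the worst tokenizer cost zone
    return "claude-opus-4-6"      # when in doubt, take the cheaper count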

Per-Agent Model Selection

Most teams pick one model per project. The cost shift on 4.7 makes per-agent selection the better default. Different roles in an agent fleet get different value from the same intelligence step.

A planner reads a spec, decides the file layout, picks the build order. SWE-bench Pro +10.9 lands directly here. Keep planners on 4.7.

A backend agent or a designer agent writes code. The same SWE-bench Pro lift applies. Keep them on 4.7.

An evaluator agent reads code and finds gaps. Judgment does not need the new ceiling. Pin to 4.6 with the override flag and pocket the difference.

A linter, a formatter, a doc writer. Mechanical work. Pin to 4.6.

A test agent runs the test plan and reports back. Mechanical too. Pin to 4.6.

Five agents pinned to 4.6, three on 4.7, on a normal feature build. The build still gets the planner and builder lifts where they matter, and the cheap loops stay cheap.
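
In code, that split is nothing more than a role-to-model map the agent runner consults before each call. The role names here are illustrative; the model IDs are the API IDs from the spec table:

# Role-to-model map for the fleet described above: three roles on
# 4.7 where the benchmark lifts land, five pinned to 4.6.
AGENT_MODELS = {
    "planner":    "claude-opus-4-7",  # SWE-bench Pro +10.9 pays here
    "backend":    "claude-opus-4-7",
    "designer":   "claude-opus-4-7",
    "evaluator":  "claude-opus-4-6",  # judgment work, cheaper tokens
    "linter":     "claude-opus-4-6",
    "formatter":  "claude-opus-4-6",
    "doc_writer": "claude-opus-4-6",
    "tester":     "claude-opus-4-6",
}

def model_for(role: str) -> str:
    # Unknown roles default to the cheaper token count.
    return AGENT_MODELS.get(role, "claude-opus-4-6")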

Build This Now ships exactly this layout out of the box. The framework wires 18 specialist agents (32 with on-demand) into a .claude/ directory, each with a defined role and a configured model. Override the planner and builders to 4.7. Pin the evaluators, linters, doc writers, and testers to 4.6. The model selection lives next to the agent definition, not the project root. One framework. Many cost profiles.

The Verdict

Same rate card, different bill. Opus 4.7 prints better numbers on the benchmarks that matter for hard agent coding, and it costs more for the same English prompt almost everywhere except the very short tail. Pin 4.6 for evaluators, linters, and doc writers. Keep 4.7 for planners and SWE-heavy builders. Set the override flag once, forget about it, and let each agent role buy what it actually needs.

More in Model Picker

  • Claude Mythos: The Model That Thinks in Loops
    Claude Mythos is believed to use a recurrent-depth architecture: one shared layer loops N times, with ACT halting giving hard questions more passes and letting easy ones stop early.
  • Claude Opus 4.7 vs Other AI Models
    Compares Claude Opus 4.7, GPT-5.4, Kimi K2.6, Gemini 3.1 Pro, and DeepSeek V3.2 on benchmarks, context windows, agent reliability, and cost, with guidance on the right pick for each use case.
  • DeepSeek V4: Pricing, Context, and Migration
    DeepSeek V4 ships two models: V4-Flash at $0.28/M output and V4-Pro at $3.48/M. Both carry a genuine 1M context window and drop into any Anthropic-compatible SDK with one line changed.
  • All Claude Models at a Glance
    Every Claude model on one page: Claude 3, 3.5, 3.7, 4, Opus 4.1 through 4.6, Sonnet 4.5 and 4.6, Haiku 4.5. Specs, pricing, benchmarks, and guidance on when to use which.
  • Claude 3.5 Sonnet v2 and Claude 3.5 Haiku
    Claude 3.5 Sonnet v2 and 3.5 Haiku shipped in October 2024: Computer Use beta, cursor control, stronger coding and tool use, with Haiku at $0.80/$4.
  • Claude 3.5 Sonnet
    Claude 3.5 Sonnet launched in June 2024 at $3/$15, beating Claude 3 Opus on MMLU, GPQA, and HumanEval at a fifth of the cost. Specs, benchmarks, and coding upgrades.



