Claude 3.5 Sonnet launched June 2024 at $3/$15, beating Claude 3 Opus on MMLU, GPQA, HumanEval at a fifth of the cost. Specs, benchmarks, and code gains.
Claude 3.5 Sonnet is the release where developers stopped defaulting to the biggest model. Anthropic shipped it on June 20, 2024, and from day one the pricing math flipped. A $3-input mid-tier model was scoring at or above the $15-input flagship on most public evals. Bigger stopped meaning better.
| Spec | Details |
|---|---|
| API ID | claude-3-5-sonnet-20240620 |
| Context window | 200K tokens |
| Input pricing | $3 / 1M tokens |
| Output pricing | $15 / 1M tokens |
| Release date | June 20, 2024 |
| Max output tokens | 8,192 |
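The identifiers in the table map directly onto an API request. As a minimal sketch (the prompt text is illustrative), this is the request body you would hand to the official `anthropic` Python SDK via `client.messages.create(**request)`:

```python
# Minimal Messages API request body for this release.
# The prompt content is illustrative; model ID and max_tokens
# come from the spec table above.
request = {
    "model": "claude-3-5-sonnet-20240620",  # API ID for this release
    "max_tokens": 8192,                     # this release's output ceiling
    "messages": [
        {"role": "user", "content": "Summarize this changelog in three bullets."}
    ],
}
```

With an `ANTHROPIC_API_KEY` set, `anthropic.Anthropic().messages.create(**request)` sends it; the 200K-token context window bounds the input side, `max_tokens` the output side.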
Top-tier smarts at a mid-tier price. Graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), code generation (HumanEval). Across all three, this release matched Claude 3 Opus or scored above it. And it did so at $3 input and $15 output per million tokens, next to the $15/$75 the flagship was charging. No model had paired those two things before.
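The pricing gap is easy to quantify. A quick sketch, using the launch prices and an illustrative workload of 10M input and 2M output tokens:

```python
# Per-million-token launch prices (USD).
SONNET = {"input": 3.00, "output": 15.00}   # Claude 3.5 Sonnet
OPUS = {"input": 15.00, "output": 75.00}    # Claude 3 Opus

def cost(prices, input_tokens, output_tokens):
    """Dollar cost of a workload given per-1M-token prices."""
    return (input_tokens / 1_000_000) * prices["input"] \
         + (output_tokens / 1_000_000) * prices["output"]

# Illustrative workload: 10M input tokens, 2M output tokens.
sonnet_bill = cost(SONNET, 10_000_000, 2_000_000)  # $60.00
opus_bill = cost(OPUS, 10_000_000, 2_000_000)      # $300.00
print(opus_bill / sonnet_bill)  # 5.0 -- Opus costs 5x more at the same mix
```

Because both input and output prices drop by the same factor, the 5x ratio holds for any input/output mix.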
Coding strength. This is the version that turned Claude into a real tool for writing code. HumanEval climbed from 55% on Claude 3 Opus to 64% here. Many developers still on GPT-4 for code switched over during this window; the combination of reasoning and output quality made this the go-to choice for software engineering work.
Speed. This release ran at roughly twice the speed of Claude 3 Opus. In an interactive coding session, a chat UI, or anywhere latency is the bottleneck, you felt it on the first request.
The "Sonnet is enough" moment. For a long time the rule of thumb was simple: the harder the problem, the bigger the model. This release snapped that reflex. Teams paying Opus prices realized Sonnet got them to the same answer, or a better one, for one-fifth the bill. From that point on, picking a model started with Sonnet, not at the top of the menu.
| Benchmark | Claude 3 Opus | Claude 3.5 Sonnet |
|---|---|---|
| MMLU | 86.8% | 88.7% |
| GPQA | 50.4% | 59.4% |
| HumanEval | 55% | 64% |
| GSM8K | 95.0% | 96.4% |
Every row favors the cheaper model. The table wrote its own argument.
At the 3.5 Sonnet launch, Anthropic also named a Claude 3.5 Opus on the roadmap. It never arrived. Claude 4 absorbed the whole 3.5 family before a bigger 3.5 felt necessary. With Sonnet putting up the numbers it did, the market never pushed hard for a top-tier 3.5.
| Model | Status |
|---|---|
| Claude 3.5 Sonnet (v1) | Superseded by v2 (October 2024) |
A v2 took over in October 2024. It carried fresh gains on top of v1 and introduced Computer Use, a first for any frontier model.