Claude 3.7 Sonnet shipped February 2025 with hybrid reasoning and extended thinking. 64K output, thinking-budget control, SWE-bench coding gains at $3/$15.
Claude 3.7 Sonnet was the model that taught Claude to think before it spoke. Released February 24, 2025, it brought hybrid reasoning: a mode where Claude could work through a problem internally, step by step, and then deliver a more accurate answer. This was the last Claude 3.x model, and it laid the groundwork for everything that came in Claude 4.
| Spec | Details |
|---|---|
| API ID | claude-3-7-sonnet-20250219 |
| Context window | 200K tokens |
| Input pricing | $3 / 1M tokens |
| Output pricing | $15 / 1M tokens |
| Thinking token pricing | Included in output pricing |
| Max output tokens | 64,000 (with extended thinking) |
| Release date | February 24, 2025 |
Extended thinking was the defining feature. When you turned it on, Claude ran an internal reasoning loop before writing a single output token. The model spent a thinking budget to work the problem, then delivered the answer. For math proofs, multi-step code logic, scientific work, and planning tasks, this produced markedly better results.
API users got fine-grained control of the budget. Set it low for quick questions. Set it high for hard problems. Thinking tokens counted toward output pricing, but on the hard tasks the quality jump was worth the cost.
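A minimal sketch of that budget control, assuming the `thinking` request parameter shape from Anthropic's extended-thinking documentation. The difficulty tiers and budget values here are illustrative assumptions, not official guidance:

```python
def thinking_params(difficulty: str) -> dict:
    """Map a rough task difficulty to extended-thinking request kwargs.

    The tier names and budget numbers are illustrative; pick budgets
    that fit your own latency and cost tolerances.
    """
    budgets = {"easy": 0, "medium": 4_000, "hard": 32_000}
    budget = budgets[difficulty]
    if budget == 0:
        # No thinking block at all: a plain, fast-mode request.
        return {"max_tokens": 1_024}
    # max_tokens must exceed the thinking budget, because thinking
    # tokens count against the output limit.
    return {
        "max_tokens": budget + 8_192,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }

# Usage with the anthropic SDK (requires ANTHROPIC_API_KEY):
# import anthropic
# client = anthropic.Anthropic()
# message = client.messages.create(
#     model="claude-3-7-sonnet-20250219",
#     messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
#     **thinking_params("hard"),
# )
```

Dialing the budget up buys accuracy on hard problems; dialing it down keeps quick questions cheap and fast.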
One conversation, two modes. Quick answers for simple questions. Slow, step-by-step reasoning for the hard ones. You did not have to pick between a "thinking model" and a "fast model". The same model handled both and switched based on the task.
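In practice, "switching based on the task" can be as simple as toggling the thinking block on the same model. The keyword heuristic below is a toy assumption for illustration; the request shape follows Anthropic's Messages API:

```python
# Words that hint a prompt needs slow, step-by-step reasoning (toy heuristic).
HARD_HINTS = ("prove", "refactor", "debug", "derive", "plan")

def build_request(prompt: str) -> dict:
    """Route one prompt to fast mode or extended thinking on the same model."""
    hard = any(word in prompt.lower() for word in HARD_HINTS)
    req = {
        "model": "claude-3-7-sonnet-20250219",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16_000 if hard else 1_024,
    }
    if hard:
        # Same model, same conversation: only the thinking block changes.
        req["thinking"] = {"type": "enabled", "budget_tokens": 8_000}
    return req
```

A quick question gets a small, fast request; a proof or multi-file refactor gets a thinking budget, with no model swap in between.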
Claude 3.7 Sonnet set new highs on SWE-bench Verified, the benchmark that tests real GitHub issues (not synthetic problems). It could read a bug report, work through the codebase, find the root cause, and ship a working fix more reliably than any Claude before it.
Building on the 3.5 Sonnet v2 gains, 3.7 Sonnet got better at following long, multi-constraint instructions. Images, charts, and mixed media inputs all came back with higher accuracy too.
The pattern was simple: the biggest wins landed on math, science, and multi-file code changes. Tasks that used to need several back-and-forth corrections often came back right on the first try.
For a deeper look at getting the most out of it, see the deep thinking techniques guide.
Same $3/$15 per million tokens. Same 200K context window. But meaningfully better reasoning, coding, and instruction-following. The max output token limit jumped to 64,000 when extended thinking was on (up from 8,192), which made it practical for generating long code, docs, or analysis in a single response.
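A back-of-the-envelope cost check using the rates above. Recall that thinking tokens are billed at the output rate:

```python
INPUT_RATE = 3.00 / 1_000_000    # USD per input token ($3 / 1M)
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token ($15 / 1M), thinking included

def request_cost(input_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> float:
    """Dollar cost of one request; thinking tokens are priced as output."""
    return input_tokens * INPUT_RATE + (output_tokens + thinking_tokens) * OUTPUT_RATE

# A maxed-out extended-thinking response, 10K tokens in and the full
# 64K out (thinking plus answer): $0.03 input + $0.96 output = $0.99.
```

So even a fully maxed response stayed under a dollar, which is why paying for thinking tokens on hard tasks was usually an easy trade.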
The most important difference was qualitative: Claude 3.7 Sonnet made fewer reasoning mistakes on hard tasks. Extended thinking gave it a way to "show its work" internally, catching errors before they reached the output.
| Model | Status |
|---|---|
| Claude 3.7 Sonnet | Superseded by Claude 4 generation |
Claude 3.7 Sonnet was the bridge between 3.x and 4.x. The hybrid reasoning and extended thinking ideas it pioneered became standard in Claude 4 and every model that followed.