Claude Code Costs After June 15: What Actually Changed
The June 15, 2026 Claude Code billing split was paused before it shipped. Here is what really changed, what it costs now, and the cost-cutting levers that are still live.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
The Claude Code billing split planned for June 15, 2026 was paused before it shipped. Anthropic told customers "Nothing changes for now," and programmatic usage — the Claude Agent SDK, claude -p, GitHub Actions, and third-party apps — still draws from your normal subscription limits, not a separate per-user credit pool. The only thing that actually landed on June 15 was the retirement of two old model IDs (claude-opus-4-20250514 and claude-sonnet-4-20250514), which now return errors. So if you use Claude Code on a Pro or Max plan, your cost did not change. The real cost levers — model choice, prompt caching, /compact, and batching — are unchanged and fully in effect.
This post covers what changed and what didn't, the two-pools model people expected, a Max-plan-vs-API break-even table, and a numbered checklist for cutting token costs.
Table of Contents
- What Actually Changed on June 15
- The Two Pools: Subscription vs Agent SDK Credit
- What Claude Code Costs Right Now
- Max Plan vs API: The Break-Even
- The Cost-Cutting Checklist
- The One Thing You Must Fix: Retired Models
- Frequently Asked Questions
- Wrapping Up
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
What Actually Changed on June 15
Three changes were bundled in the conversation around June 15, 2026. Only two landed, and one of those happened eight months earlier. Separating them is the whole point of this post.
| Change | Status as of June 16, 2026 | Source |
|---|---|---|
| Billing split — programmatic usage moves to a separate per-user monthly credit | PAUSED. "Nothing changes for now." Agent SDK, claude -p, GitHub Actions, and third-party apps still draw from subscription limits. | Anthropic Help Center |
claude-opus-4-20250514 retired | DONE. Retirement date June 15, 2026 (deprecated April 14). Calls now error. Replacement: claude-opus-4-8. | model deprecations |
claude-sonnet-4-20250514 retired | DONE. Retirement date June 15, 2026. Calls now error. Replacement: claude-sonnet-4-6. | model deprecations |
SDK package rename (claude-code-sdk → claude-agent-sdk) | ALREADY DONE — September 29, 2025, not June 15. Unrelated to the billing date. | Agent SDK migration guide |
The billing split was the change everyone braced for. Anthropic told the press by email that it was "working to better align the plan with actual usage patterns" and that, for now, "Nothing changes." The pause arrived right before the implementation date, against the backdrop of a price war with OpenAI.
So the bundling that made June 15 feel like a hard deadline was misleading. The model retirements are real and require action. The SDK rename predates this entirely. The billing split is not in effect. For the full mechanics of what the split would have done — the migration steps, the opt-in flow — see the canonical billing-change post, which we have annotated with the pause.
The Two Pools: Subscription vs Agent SDK Credit
Here is the model that was proposed, and what is actually true today. Read this table as "what people thought June 15 would bring" versus reality.
| Interactive subscription (live now) | Agent SDK credit pool (PROPOSED — paused) | |
|---|---|---|
| What counts | Terminal and IDE sessions, plus — for now — Agent SDK, claude -p, GitHub Actions, third-party apps | Was going to be: Agent SDK, claude -p, GitHub Actions, third-party apps only |
| How billed | Flat subscription ($20 Pro / $100 Max 5x / $200 Max 20x per month), usage capped by plan limits | Was going to be: per-user monthly credit, drawn down by API-equivalent token spend |
| Pooling / sharing | N/A — your plan, your limits | Was going to be: per-user, no pooling, no sharing across seats |
| Rollover | N/A | Was going to be: none — use it or lose it each month |
| Opt-in | N/A | Was going to be: one-time opt-in |
| In effect today? | Yes | No |
The proposed per-user credit amounts were documented: $20 for Pro, $100 for Max 5x, $200 for Max 20x, plus per-seat figures for Team and Enterprise. None of these are charges you pay today. Treat them only as "what may return," because Anthropic said the split is paused "for now," not cancelled.
The practical takeaway: if you run an overnight Agent SDK job, a claude -p script in CI, or a GitHub Action that calls Claude, all of that still counts against your existing subscription limits exactly as it did before June 15. Nothing got carved out into a separate bucket.
What Claude Code Costs Right Now
Two ways to pay for Claude Code, unchanged by June 15:
1. Subscription (terminal and IDE). A flat monthly fee. Pro is $20/month, Max is $100 or $200/month. Usage is bounded by plan limits, not metered per token. This is the cheapest path for steady interactive work because you are not billed by the token at all.
2. API (pay-as-you-go). Per-token pricing on the model you call. The current rates, verified against the pricing page:
| Model | Input ($/MTok) | Output ($/MTok) | Context |
|---|---|---|---|
| Claude Opus 4.8 | $5 | $25 | 1M |
| Claude Sonnet 4.6 | $3 | $15 | 1M |
| Claude Haiku 4.5 | $1 | $5 | 200K |
Both Opus 4.8 and Sonnet 4.6 serve a 1M-token context window at standard pricing with no long-context premium. Opus 4.8 caps output at 128K tokens; Sonnet 4.6 at 64K. For how the million-token window behaves in real sessions, see Claude Code's 1M context in practice.
On top of the base rates, two discounts move the needle hard, both from the pricing page:
- Prompt caching. Cache reads cost 0.1x the base input price. A 5-minute cache write costs 1.25x; a 1-hour cache write costs 2.0x. Break-even is one read for the 5-minute cache, two reads for the 1-hour cache.
- Batch API. A flat 50% off both input and output tokens for non-urgent work.
Max Plan vs API: The Break-Even
When is a flat Max subscription cheaper than paying per token on the API? It comes down to how much output you generate per month. The math below uses the verified rates above and assumes a typical agentic mix that is output-heavy (output tokens dominate cost on Claude models because output is 5x the input rate).
| Monthly usage profile | API cost estimate (Sonnet 4.6) | API cost estimate (Opus 4.8) | Cheaper option |
|---|---|---|---|
| Light: ~3M output tokens | ~$45 + input | ~$75 + input | API on Sonnet; Max 5x on Opus |
| Moderate: ~7M output tokens | ~$105 + input | ~$175 + input | Max 5x ($100) on both |
| Heavy: ~15M output tokens | ~$225 + input | ~$375 + input | Max 20x ($200) wins decisively |
| Bursty / unattended agents | Variable, hard to cap | Variable, hard to cap | API + Batch (50% off) for non-urgent runs |
How to read this:
- If your output is under ~6-7M tokens/month on Sonnet, the API can come out cheaper than even Max 5x — you only pay for what you use. The crossover point on Opus is lower because Opus output is $25/MTok.
- If you run Claude Code interactively most of the day, a Max subscription almost always wins. You hit a flat ceiling instead of an open meter, and heavy interactive use blows past the API break-even fast.
- If you run unattended agents on a schedule, the API plus the Batch API's 50% discount is usually the right tool for the non-urgent portion, because you can bound spend per run instead of consuming subscription limits unpredictably.
The numbers are estimates — your input-to-output ratio, caching hit rate, and model mix all shift the line. The point is the shape: flat subscription wins for steady interactive work; metered API wins for light or burst-controllable work. For a deeper treatment of the trade-off, see Claude Code Max plan vs API.
The Cost-Cutting Checklist
These levers are all live today. Work through them in order — the early ones have the biggest payoff for the least effort.
-
Pick the cheapest model that does the job. Sonnet 4.6 is $3/$15 versus Opus 4.8 at $5/$25 — 40% cheaper on input, 40% cheaper on output. Reserve Opus for the hard reasoning steps and let Sonnet handle scanning, extraction, and routine edits. In Claude Code, run cheaper models for fan-out work and keep Opus for synthesis.
-
Turn on prompt caching and keep your prefix stable. Cache reads are 0.1x base input. The catch is the prefix-match rule: any byte change anywhere in the cached prefix invalidates everything after it. Freeze your system prompt — no
Date.now(), no per-request IDs, no unsorted JSON early in the prompt. Put volatile content last. -
Avoid the 5-minute cache cliff. The default cache TTL expires after 5 minutes of inactivity. Any call that fires more than 300 seconds after the previous one misses the cache entirely and pays a full re-write. For bursty traffic with gaps, use the 1-hour TTL (
ttl: '1h') at 2.0x the write cost — it pays off after two reads and keeps long, gappy workflows cheap. -
Run
/compactand keep context lean. Long sessions accumulate stale tool output and completed reasoning. Compaction summarizes earlier context so you stop re-sending tokens you no longer need. A lean context window is a cheaper context window on every subsequent turn. -
Batch non-urgent work. The Batch API is 50% off both input and output. Anything that does not need a real-time answer — overnight test generation, a documentation sweep, a backlog of classification — belongs in a batch.
-
Set adaptive thinking and tune effort, not a token budget. On Opus 4.8 and Sonnet 4.6, use adaptive thinking plus
output_config.effort(lowthroughmax). Lower effort means fewer, more-consolidated tool calls and less preamble. Note thatbudget_tokens,temperature,top_p, andtop_kare removed on Opus 4.8 and return a 400 — do not reach for them. -
Instrument your token spend now. Even though the credit split is paused, it may return. Track
cache_read_input_tokens,cache_creation_input_tokens, and output tokens per session so you already know your real usage profile if the per-user pool ever lands. This is also how you verify caching is working — ifcache_read_input_tokensis zero across repeated requests, a silent invalidator is in your prefix.
For more on running Claude Code without a live operator — where unattended cost control matters most — see Claude Code headless mode. For a deeper, single-topic treatment of token reduction, see how to cut Claude Code token costs.
The One Thing You Must Fix: Retired Models
The only June 15 change that requires action is the model retirement. claude-opus-4-20250514 and claude-sonnet-4-20250514 were both deprecated April 14, 2026 and retired June 15. Requests to either now fail on Anthropic-operated platforms (Claude API, Claude Platform on AWS, Microsoft Foundry). Partner platforms — Amazon Bedrock and Google Vertex AI — set their own schedules, so the dated Sonnet 4 ID may still resolve there for a while.
The drop-in replacements, per Anthropic's own deprecation page, are the bare aliases:
- model: "claude-opus-4-20250514"
+ model: "claude-opus-4-8"
- model: "claude-sonnet-4-20250514"
+ model: "claude-sonnet-4-6"Use the alias claude-sonnet-4-6, not a dated variant. Two things to know when you swap:
- Adaptive thinking only.
thinking: {type: "enabled", budget_tokens: N}returns a 400 on the new models. Usethinking: {type: "adaptive"}and control depth withoutput_config.effortinstead. - No sampling parameters.
temperature,top_p, andtop_kare removed on Opus 4.8 and return a 400. Steer with prompting.
If you were also relying on the inherited Claude Code system prompt in the Python Agent SDK, you now set it explicitly via systemPrompt={'type':'preset','preset':'claude_code'} in ClaudeAgentOptions — a consequence of the September 2025 SDK rename, not the June billing date.
Frequently Asked Questions
Did Claude Code's billing change on June 15, 2026?
No. The planned split that would have moved Agent SDK, claude -p, GitHub Actions, and third-party apps to a separate monthly credit pool was paused just before June 15. Anthropic's Help Center now states those surfaces still draw from your normal subscription limits, and the company told press "Nothing changes for now."
So what actually changed on June 15, 2026?
Two old model IDs retired: claude-opus-4-20250514 and claude-sonnet-4-20250514. API calls to either now return errors. Replace them with claude-opus-4-8 and claude-sonnet-4-6. That is the only thing developers had to act on.
Do I still need to rename claude-code-sdk to claude-agent-sdk?
If you have not already, yes — but that rename happened back on September 29, 2025 with Agent SDK v0.1.0, not on June 15, 2026. The package is claude-agent-sdk (Python) and @anthropic-ai/claude-agent-sdk (TypeScript), and ClaudeCodeOptions became ClaudeAgentOptions. It is unrelated to the paused billing change.
What does Claude Code actually cost now?
If you use the terminal or IDE on a Pro ($20/month) or Max ($100 or $200/month) subscription, the same flat subscription as before — the credit pool never went live. If you call the API directly, you pay per token: Sonnet 4.6 at $3/$15 per million in/out, Opus 4.8 at $5/$25.
How do I cut Claude Code and Agent SDK token costs?
Pick the cheapest model that does the job (Sonnet 4.6 is 40% cheaper than Opus 4.8 on both input and output), turn on prompt caching (cache reads are 0.1x base input), run /compact to keep context lean, batch non-urgent work for 50% off, and avoid the 5-minute cache TTL cliff by using the 1-hour TTL for bursty traffic.
Will the Claude Code billing split come back?
Possibly. Anthropic only said it is paused "for now" while it works to better align the plan with actual usage patterns, against the backdrop of an OpenAI price war. The proposed credit amounts ($20 Pro / $100 Max 5x / $200 Max 20x, per-user, no rollover) are documented, so it is worth instrumenting your token spend now in case the split returns.
Wrapping Up
The headline that mattered for June 15 was the one that didn't happen. The billing split is paused, programmatic usage still runs on your subscription limits, and the SDK rename was old news from September. The only thing on your to-do list is swapping the two retired model IDs to claude-opus-4-8 and claude-sonnet-4-6, and remembering that the new models drop budget_tokens and sampling parameters.
Everything that actually controls your bill — model choice, caching, compaction, batching — is unchanged and live. Tune those, instrument your token spend, and you are covered whether or not the credit pool ever ships.
Posted by @speedy_devv
Stop configuring. Start building.
SaaS builder templates with AI orchestration.