Cut Claude Code Token Costs
Five open-source tools that knock 40% to 95% off your Claude Code spend, with install commands, percentage sources, and the order to stack them in.
Problem: Your Claude Code bill went up twice in 30 days. On June 15, 2026, Anthropic moves Agent SDK, claude -p, and Claude Code GitHub Actions onto a separate metered credit pool that does not roll over. Once the pool drains, you pay full API rates (The New Stack). At the same time, the new Opus 4.7 tokenizer reports about 1.46x more text tokens than 4.6 at the same per-token price, which Simon Willison flagged as "actually a pretty big price bump" (simonwillison.net).
Quick Win: Five GitHub repos fight back. Install one tonight, install all five over the week, and pair them with cc-ledger so you can see the line move.
Every percentage in this post is vendor-stated. Real savings shift with codebase size, MCP server count, and how often your sessions repeat work.
Why your bill is climbing in May 2026
Three things are stacking on top of each other.
First, the June 15 split. Programmatic Claude Code usage gets its own dedicated budget instead of sharing the chat pool. The Pro plan ships $20 of Agent SDK credit, Max 5x ships $100, Max 20x ships $200. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The Register).
Second, the Opus 4.7 tokenizer. Same dollars per token, more tokens per request. Willison's testing measured 1.46x for plain text and up to 3.01x for images. A 15MB PDF only inflated 1.08x, so the impact varies with content type (simonwillison.net).
Third, Fast Mode now defaults to Opus 4.7 in recent Claude Code releases. Faster, smarter, and quietly more expensive per request than the 4.6 baseline you had a month ago.
The fix is not "use Sonnet for everything." The fix is fewer tokens hitting the wire on every call you do make.
The five tools, ranked by max stated savings
Verify each one in your own workflow. The percentages below come straight from each repo's README or the third-party listing noted.
- lean-ctx: 60% to 95% reduction across reads, up to 99% on cached reads (README).
- airis-mcp-gateway: up to 97% context token reduction. The 97% figure comes from the VoltAgent listing, not the repo itself. The repo's own README says only "Token Efficiency: Measurable reduction in initial context overhead" with no number (README).
- agentmemory: 92% fewer tokens than pasting full context across sessions. The badge sits at the top of the README (README).
- 9router: 20% to 40% per request via RTK token compression on tool output. Worked example in the README: 47K tokens shrunk to 28K (README).
- cc-ledger: 0% direct savings. This is the meter. You need it on before you can prove the other four did anything (README).
The comparison table
| Tool | Max savings claim | Claim source | How it works | Install | Best for |
|---|---|---|---|---|---|
| lean-ctx | 60% to 95% (99% cached) | Vendor README | Rust binary acts as shell hook plus MCP server, compresses file reads and shell output before they reach the model | curl -fsSL https://leanctx.com/install.sh \| sh | Read-heavy and grep-heavy work on the same files |
| airis-mcp-gateway | up to 97% | Third-party listing (VoltAgent) | Docker MCP multiplexer aggregates many MCP servers behind one SSE endpoint with on-demand lifecycle | curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh \| bash | Setups with five or more MCP servers wired up |
| agentmemory | 92% vs full re-paste | Vendor README | Persistent memory MCP captures what the agent does, replays prior context into new sessions | npx @agentmemory/agentmemory | Repeated work on the same codebase across days |
| 9router | 20% to 40% per request | Vendor README (worked example) | Multi-provider router with RTK compression on tool_result content, plus routing to 40+ providers including free tiers | npm install -g 9router | Mixing cheaper providers with output compression |
| cc-ledger | 0% direct | N/A (observability) | Hooks capture every turn's input, output, cache_read, cache_write into a local SQLite ledger | curl -fsSL https://ccledger.dev/install \| bash | Anyone running the four above |
lean-ctx: compress the inputs
lean-ctx is a single Rust binary that sits between Claude Code and your filesystem. It hooks every file read, every grep, every shell command. Output gets compressed before it reaches the model.
The headline claim is 60% to 95% reduction, with up to 99% on cached reads (README). The 99% figure assumes cache hits. Cold runs see less.
Three commands stand it up:
```bash
curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup
lean-ctx init --agent claude-code
```

Best fit: workflows where you re-read the same files often. Worst fit: thin sessions with mostly short prompts. The compression overhead pays off only when there is something to compress.
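Before trusting it with a long session, check what the init step actually wired into Claude Code. A minimal sketch, assuming lean-ctx registers standard Claude Code hooks in the user-level settings file; the path and the version flag are assumptions, so adjust to what the repo documents:

```bash
# Inspect the hooks lean-ctx registered. The settings path is an assumption;
# the tool may write a project-level .claude/settings.json instead.
jq '.hooks' ~/.claude/settings.json

# Confirm the binary is on PATH before starting a session.
# (--version is a common CLI convention, not confirmed from the repo.)
lean-ctx --version
```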
airis-mcp-gateway: compress the tool listings
If your Claude Code talks to Sentry, GitHub, Linear, Postgres, and a couple of other MCP servers, the system prompt pays a tax for every tool listing on every turn. airis-mcp-gateway aggregates many MCP servers behind a single SSE endpoint with intelligent routing and on-demand lifecycle management (README).
The 97% figure that travels with this repo comes from the VoltAgent awesome-claude-code-subagents listing. The repo itself is more conservative. Read both before you quote a number to your team.
Production install:
```bash
curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh | bash
```

Dev install with Docker:

```bash
docker compose up -d
```

Skip this one if your .mcp.json lists fewer than five servers. The savings come from collapsing tool listings, and a slim setup has nothing to collapse.
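If you do run it, the payoff is that .mcp.json collapses to a single entry. A hedged sketch of the shape, using Claude Code's standard SSE server config; the port and path are placeholders, so copy the real endpoint from the gateway's startup output:

```bash
# Replace N separate server entries with one gateway endpoint.
# Port 8080 and the /sse path are placeholders, not values from the repo.
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "airis-gateway": {
      "type": "sse",
      "url": "http://localhost:8080/sse"
    }
  }
}
EOF
```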
9router: compress the outputs and arbitrage providers
9router is a multi-provider router with two tricks. The first is RTK Token Saver, which auto-compresses tool_result content like git diff, grep, find, ls, tree, and log output. The README quotes 20% to 40% per request and shows a worked example: 47K tokens without RTK, 28K tokens with it, a 40% cut on that one call (README).
The second trick is provider routing. 9router fronts 40+ providers including Kiro AI (Claude 4.5), OpenCode Free, Vertex AI's $300 credit pool, GLM at $0.6/1M, MiniMax at $0.2/1M, and Kimi at $9/month (README).
Install:
```bash
npm install -g 9router
9router
```

Caveat worth saying out loud: routing to non-Anthropic providers changes the trust profile, the latency, the model quality, and the data handling. Read the provider's terms before you push real client code through it.
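Routers like this usually sit in front of Claude Code as a local endpoint, pointed at via environment variables. A sketch under assumptions: ANTHROPIC_BASE_URL is a real Claude Code variable, but the port is a placeholder and 9router's own docs should confirm the exact wiring:

```bash
# Point Claude Code at the local router instead of api.anthropic.com.
# The port is a placeholder; take the real one from 9router's startup output.
export ANTHROPIC_BASE_URL="http://localhost:20128"
claude
```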
agentmemory: stop paying for re-explaining your project
Every new Claude Code session starts fresh. You re-paste the stack notes, the rules, the patterns. agentmemory kills that cost. It captures what the agent does via hooks, compresses it into searchable observations, and injects relevant prior context into future sessions.
The README claims 92% fewer tokens against the worst-case "paste full context every session" baseline. Worked comparison in the repo: about 170K tokens per year (around $10) with agentmemory versus 19.5M+ tokens pasting full context manually (README).
Install:
```bash
npx @agentmemory/agentmemory
```

If you already use Claude Code's /resume and a tight CLAUDE.md, the marginal savings are smaller than the badge implies. Still useful. Just not 92% useful.
Works with Claude Code, Cursor, Gemini CLI, Codex CLI, OpenCode, plus Claude Desktop, Windsurf, Roo Code, Cline, Goose, and Aider (README).
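For Claude Code specifically, registering it through the built-in MCP command keeps the config inspectable. A minimal sketch; the package name comes from the README, while the server name and scope are my choices, not the repo's:

```bash
# Register agentmemory as a stdio MCP server for the current project.
claude mcp add agentmemory -- npx @agentmemory/agentmemory

# Verify it is listed and connecting.
claude mcp list
```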
cc-ledger: see what you spent
You cannot manage what you cannot see. cc-ledger captures every Claude Code edit, prompt, and per-turn token cost via Claude Code hooks. It writes to ~/.cc-ledger/ledger.db and tracks five token classes per turn: input, output, cache_read, cache_write_5m, and cache_write_1h (README).
Those classes match Anthropic's own pricing model. Cache reads bill at 0.10x input rates. 5-minute cache writes bill at 1.25x. 1-hour cache writes bill at 2x (Claude Code docs). Without a ledger, prompt caching feels invisible. With one, you see the line move.
Install:
```bash
curl -fsSL https://ccledger.dev/install | bash
```

cc-ledger also computes "shadow billing," which estimates what your subscription usage would have cost on the API. The repo sat at six GitHub stars on May 15, 2026. Early-stage. Use it for visibility, not as a billing system of record.
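Because the ledger is plain SQLite, one query gives you the daily baseline the recipe below asks for. A sketch with assumed table and column names derived from the five token classes above; run .schema first and substitute the real ones:

```bash
# Daily token totals per class. Table and column names are assumptions;
# check the actual schema with: sqlite3 ~/.cc-ledger/ledger.db '.schema'
sqlite3 ~/.cc-ledger/ledger.db "
  SELECT date(ts)                             AS day,
         SUM(input)                           AS input_tok,
         SUM(output)                          AS output_tok,
         SUM(cache_read)                      AS cache_read_tok,
         SUM(cache_write_5m + cache_write_1h) AS cache_write_tok
  FROM turns
  GROUP BY day
  ORDER BY day DESC
  LIMIT 7;"
```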
The do-this-in-order recipe
Stack the savings in this order. Each step compounds on the last, and the ledger lets you see what each step actually saved.
- Install cc-ledger first. You need a baseline. Run curl -fsSL https://ccledger.dev/install | bash, then work normally for one day. Note the daily spend.
- Install agentmemory. This kills the cost of re-explaining your project on every new session. Run npx @agentmemory/agentmemory and connect it to Claude Code.
- Install lean-ctx. This compresses every file read and shell command before it hits the model. Run the three setup commands listed above.
- Add airis-mcp-gateway only if you have five or more MCP servers configured. Otherwise skip it.
- Add 9router only if you are willing to route to non-Anthropic providers. Highest impact for Pro users on tight budgets, also the most disruptive change to your workflow.
- Re-check cc-ledger after one week. Compare against your baseline.
One install at a time, with the ledger between each step. That's how you tell which tool moved the line.
What Anthropic itself recommends (free)
Before any third-party tool, the free moves are in Anthropic's own cost guide:
- Run /clear between unrelated tasks so context does not balloon.
- Run /compact with custom instructions to keep only what the next step needs.
- Set MAX_THINKING_TOKENS=8000 so extended thinking has a ceiling (one-liner after this list).
- Prefer CLI tools over MCP servers when the CLI is installed already.
- Move CLAUDE.md detail into skills, so the bytes load only when needed.
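The thinking cap is the only item above that is an environment variable rather than a slash command. Setting it per shell keeps it easy to reverse:

```bash
# Cap extended thinking for this shell session only. 8000 is the ceiling
# from Anthropic's cost guide; unset the variable to restore the default.
export MAX_THINKING_TOKENS=8000
claude
```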
Anthropic also publishes a "$13 per developer per active day" enterprise benchmark, and notes that agent teams use about 7x more tokens than standard sessions (Claude Code docs). Worth knowing before you launch a fleet of subagents on a tight budget.
Risks and caveats
A few honest gotchas, in order of how often they bite people.
Every percentage above is vendor-stated. The 97% airis figure comes from a third-party listing, not the repo itself. The 99% lean-ctx figure assumes cache hits. The 92% agentmemory figure compares against the worst-case baseline. The 40% 9router figure is one worked example, not a benchmark.
9router routes traffic to non-Anthropic providers. That changes trust, latency, quality, and data handling. Read the provider's terms before sending real client code.
cc-ledger is early-stage. Use it for visibility, not as a billing system of record.
The June 15 split has not even landed yet, and Anthropic has changed billing twice in two months. Check the Anthropic pricing page before making subscription decisions based on this post.
Stacking five tools adds operational surface area. Bash install, Docker, npm global, npx daemon, hook scripts. Install one at a time and re-measure with cc-ledger between each.
None of these tools is endorsed by Anthropic. They are community projects. The MIT and Apache 2.0 licenses cover code use. They do not cover support or breakage.
Per-feature cost, not per-prompt cost
The five tools above attack the cost per request. Build This Now attacks the cost per feature.
A planning team (backend, frontend, UX) plans in parallel against a fixed contract, which kills the contract-mismatch re-prompts that burn tokens in raw Claude Code. A GAN-style adversarial loop catches issues in two rounds max instead of ten chat turns of "no, fix this too." Quality gates ship features only after they pass npx tsc --noEmit, npx eslint ., and npm run build, so you stop paying for fixes the model created. Mulch self-learning means the system stops re-discovering the same patterns across sessions.
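You can approximate the gate half of this without any paid tooling. A minimal sketch; the three checks come straight from the paragraph above, the wrapper script is illustrative:

```bash
#!/usr/bin/env bash
# Minimal "done means green" gate: stop at the first failing check.
set -euo pipefail
npx tsc --noEmit
npx eslint .
npm run build
echo "all gates passed"
```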
You still pay for Claude Pro ($20/mo). That requirement is separate. CodeKit is $79 one-time. Speedy Swarm Desktop is $197 one-time. No subscriptions on top.
FAQ
Does prompt caching actually reduce Claude Code costs? Yes. Cached reads bill at 0.10x input rates, 5-minute cache writes at 1.25x, 1-hour cache writes at 2x (Claude Code docs).
What changes for Claude Code on June 15, 2026? Agent SDK, claude -p, and Claude Code GitHub Actions move to a separate Agent SDK credit pool: $20 on Pro, $100 on Max 5x, $200 on Max 20x. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The New Stack).
Is Opus 4.7 more expensive than Opus 4.6? Same per-token price. The new tokenizer reports about 1.46x more text tokens, so effective text cost runs roughly 46% higher per request. Image content can hit 3x (simonwillison.net).
Can I run Claude Code on free providers? 9router routes to free tiers like Kiro AI (Claude 4.5), OpenCode Free, and Vertex AI's $300 credit pool (README).
The bill went up twice in 30 days. The fix is not one tool. The fix is a stack with a meter on top. Install the ledger first, then layer the rest.