Build This Now

Cut Claude Code Token Costs

Five open-source tools that knock 40% to 95% off your Claude Code spend, with install commands, percentage sources, and the order to stack them in.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published May 15, 2026 · 11 min read

Problem: Your Claude Code bill went up twice in 30 days. On June 15, 2026, Anthropic moves Agent SDK, claude -p, and Claude Code GitHub Actions onto a separate metered credit pool that does not roll over. Once the pool drains, you pay full API rates (The New Stack). At the same time, the new Opus 4.7 tokenizer reports about 1.46x more text tokens than 4.6 at the same per-token price, which Simon Willison flagged as "actually a pretty big price bump" (simonwillison.net).

Quick Win: Five GitHub repos fight back. Install one tonight, install all five over the week, and pair them with cc-ledger so you can see the line move.

Every percentage in this post is vendor-stated. Real savings shift with codebase size, MCP server count, and how often your sessions repeat work.

Why your bill is climbing in May 2026

Three things are stacking on top of each other.

First, the June 15 split. Programmatic Claude Code usage gets its own dedicated budget instead of sharing the chat pool. The Pro plan ships $20 of Agent SDK credit, Max 5x ships $100, Max 20x ships $200. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The Register).

Second, the Opus 4.7 tokenizer. Same dollars per token, more tokens per request. Willison's testing measured 1.46x for plain text and up to 3.01x for images. A 15MB PDF only inflated 1.08x, so the impact varies with content type (simonwillison.net).
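To see how those multipliers land on a real request, here is a back-of-envelope calculation at a fixed per-token price. The request mix is hypothetical, not one of Willison's measurements:

```python
# Effective cost change at a fixed per-token price, using the
# multipliers quoted above. The request mix below is hypothetical.
text_mult, image_mult = 1.46, 3.01

# Text-only request: cost scales directly with the token multiplier.
print(f"{text_mult - 1:.0%}")  # 46%

# Mixed request, mostly text with some image content.
text_tokens, image_tokens = 10_000, 2_000
old = text_tokens + image_tokens
new = text_tokens * text_mult + image_tokens * image_mult
print(f"{new / old - 1:.0%}")  # 72%
```

The takeaway: the "same price" claim is true per token, but the per-request bill depends heavily on how image-heavy your context is.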

Third, Fast Mode now defaults to Opus 4.7 in recent Claude Code releases. Faster, smarter, and quietly more expensive per request than the 4.6 baseline you had a month ago.

The fix is not "use Sonnet for everything." The fix is fewer tokens hitting the wire on every call you do make.

The five tools, ranked by max stated savings

Verify each one in your own workflow. The percentages below come straight from each repo's README or the third-party listing noted.

  1. lean-ctx: 60% to 95% reduction across reads, up to 99% on cached reads (README).
  2. airis-mcp-gateway: up to 97% context token reduction. The 97% figure comes from the VoltAgent listing, not the repo itself. The repo's own README says only "Token Efficiency: Measurable reduction in initial context overhead" with no number (README).
  3. agentmemory: 92% fewer tokens than pasting full context across sessions. The badge sits at the top of the README (README).
  4. 9router: 20% to 40% per request via RTK token compression on tool output. Worked example in the README: 47K tokens shrunk to 28K (README).
  5. cc-ledger: 0% direct savings. This is the meter. You need it on before you can prove the other four did anything (README).

The comparison table

| Tool | Max savings claim | Claim source | How it works | Install | Best for |
|---|---|---|---|---|---|
| lean-ctx | 60% to 95% (99% cached) | Vendor README | Rust binary acts as shell hook plus MCP server, compresses file reads and shell output before they reach the model | `curl -fsSL https://leanctx.com/install.sh \| sh` | Read-heavy and grep-heavy work on the same files |
| airis-mcp-gateway | up to 97% | Third-party listing (VoltAgent) | Docker MCP multiplexer aggregates many MCP servers behind one SSE endpoint with on-demand lifecycle | `curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh \| bash` | Setups with five or more MCP servers wired up |
| agentmemory | 92% vs full re-paste | Vendor README | Persistent memory MCP captures what the agent does, replays prior context into new sessions | `npx @agentmemory/agentmemory` | Repeated work on the same codebase across days |
| 9router | 20% to 40% per request | Vendor README (worked example) | Multi-provider router with RTK compression on tool_result content, plus routing to 40+ providers including free tiers | `npm install -g 9router` | Mixing cheaper providers with output compression |
| cc-ledger | 0% direct | N/A (observability) | Hooks capture every turn's input, output, cache_read, cache_write into a local SQLite ledger | `curl -fsSL https://ccledger.dev/install \| bash` | Anyone running the four above |

lean-ctx: compress the inputs

lean-ctx is a single Rust binary that sits between Claude Code and your filesystem. It hooks every file read, every grep, every shell command. Output gets compressed before it reaches the model.

The headline claim is 60% to 95% reduction, with up to 99% on cached reads (README). The 99% figure assumes cache hits. Cold runs see less.

Three commands stand it up:

curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup
lean-ctx init --agent claude-code

Best fit: workflows where you re-read the same files often. Worst fit: thin sessions with mostly short prompts. The compression overhead pays off only when there is something to compress.

airis-mcp-gateway: compress the tool listings

If your Claude Code talks to Sentry, GitHub, Linear, Postgres, and a couple of other MCP servers, the system prompt pays a tax for every tool listing on every turn. airis-mcp-gateway aggregates many MCP servers behind a single SSE endpoint with intelligent routing and on-demand lifecycle management (README).

The 97% figure that travels with this repo comes from the VoltAgent awesome-claude-code-subagents listing. The repo itself is more conservative. Read both before you quote a number to your team.

Production install:

curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh | bash

Dev install with Docker:

docker compose up -d

Skip this one if your .mcp.json lists fewer than five servers. The savings come from collapsing tool listings, and a slim setup has nothing to collapse.
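To get a feel for the tax the gateway is attacking, here is a rough model of per-turn tool-listing overhead. Every figure below is illustrative, not measured:

```python
# Every connected MCP server's tool schemas ride along in the prompt.
# All numbers below are hypothetical, for scale only.
servers = 6
tools_per_server = 8
tokens_per_tool_schema = 150   # name + description + JSON schema
turns_per_session = 40

per_turn_tax = servers * tools_per_server * tokens_per_tool_schema
print(per_turn_tax)                      # 7200 tokens per turn
print(per_turn_tax * turns_per_session)  # 288000 tokens per session
```

Prompt caching softens this in practice, but the model shows why a two-server setup has little to gain and a ten-server setup has a lot.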

9router: compress the outputs and arbitrage providers

9router is a multi-provider router with two tricks. The first is RTK Token Saver, which auto-compresses tool_result content like git diff, grep, find, ls, tree, and log output. The README quotes 20% to 40% per request and shows a worked example: 47K tokens without RTK, 28K tokens with it, a 40% cut on that one call (README).
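The README's worked example checks out arithmetically:

```python
# Sanity-check the README's worked example: 47K tokens -> 28K tokens.
before, after = 47_000, 28_000
savings = (before - after) / before
print(f"{savings:.0%}")  # 40%
```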

The second trick is provider routing. 9router fronts 40+ providers including Kiro AI (Claude 4.5), OpenCode Free, Vertex AI's $300 credit pool, GLM at $0.6/1M, MiniMax at $0.2/1M, and Kimi at $9/month (README).

Install:

npm install -g 9router
9router

Caveat worth saying out loud. Routing to non-Anthropic providers changes the trust profile, the latency, the model quality, and the data handling. Read the provider's terms before you push real client code through it.

agentmemory: stop paying for re-explaining your project

Every new Claude Code session starts fresh. You re-paste the stack notes, the rules, the patterns. agentmemory kills that cost. It captures what the agent does via hooks, compresses into searchable observations, and injects relevant prior context into future sessions.

The README claims 92% fewer tokens against the worst-case "paste full context every session" baseline. Worked comparison in the repo: about 170K tokens per year (around $10) with agentmemory versus 19.5M+ tokens pasting full context manually (README).

Install:

npx @agentmemory/agentmemory

If you already use Claude Code's /resume and a tight CLAUDE.md, the marginal savings are smaller than the badge implies. Still useful. Just not 92% useful.

Works with Claude Code, Cursor, Gemini CLI, Codex CLI, OpenCode, plus Claude Desktop, Windsurf, Roo Code, Cline, Goose, and Aider (README).

cc-ledger: see what you spent

You cannot manage what you cannot see. cc-ledger captures every Claude Code edit, prompt, and per-turn token cost via Claude Code hooks. It writes to ~/.cc-ledger/ledger.db and tracks five token classes per turn: input, output, cache_read, cache_write_5m, and cache_write_1h (README).

Those classes match Anthropic's own pricing model. Cache reads bill at 0.10x input rates. 5-minute cache writes bill at 1.25x. 1-hour cache writes bill at 2x (Claude Code docs). Without a ledger, prompt caching feels invisible. With one, you see the line move.

Install:

curl -fsSL https://ccledger.dev/install | bash

cc-ledger also computes "shadow billing", which estimates what your subscription usage would have cost at API rates. The repo sat at six GitHub stars on May 15, 2026. Early-stage. Use it for visibility, not as a billing system of record.
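Once the ledger is populated, per-day totals are a SQL query away. The schema below is a guess modeled on the README's five token classes, built in-memory so it runs anywhere; check the real schema in ~/.cc-ledger/ledger.db before querying it:

```python
# Sketch of querying per-turn token classes from a cc-ledger-style
# SQLite ledger. The table and column names are hypothetical, modeled
# on the five token classes the README describes.
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for ~/.cc-ledger/ledger.db
con.execute("""CREATE TABLE turns (
    ts TEXT, input INT, output INT,
    cache_read INT, cache_write_5m INT, cache_write_1h INT)""")
con.executemany(
    "INSERT INTO turns VALUES (?, ?, ?, ?, ?, ?)",
    [("2026-05-15T09:00", 1200, 300, 45000, 2000, 0),
     ("2026-05-15T09:05", 800, 500, 46000, 0, 0)],
)
totals = con.execute(
    "SELECT SUM(input), SUM(cache_read) FROM turns"
).fetchone()
print(totals)  # (2000, 91000)
```

High cache_read totals relative to input are the signature of prompt caching doing its job.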

The do-this-in-order recipe

Stack the savings in this order. Each step compounds on the last, and the ledger lets you see what each step actually saved.

  1. Install cc-ledger first. You need a baseline. Run curl -fsSL https://ccledger.dev/install | bash, then work normally for one day. Note the daily spend.
  2. Install agentmemory. This kills the cost of re-explaining your project on every new session. Run npx @agentmemory/agentmemory and connect it to Claude Code.
  3. Install lean-ctx. This compresses every file read and shell command before it hits the model. Run the three setup commands listed above.
  4. Add airis-mcp-gateway only if you have five or more MCP servers configured. Otherwise skip it.
  5. Add 9router only if you are willing to route to non-Anthropic providers. Highest impact for Pro users on tight budgets, also the most disruptive change to your workflow.
  6. Re-check cc-ledger after one week. Compare against your baseline.

One install at a time, with the ledger between each step. That's how you tell which tool moved the line.

What Anthropic itself recommends (free)

Before any third-party tool, the free moves are in Anthropic's own cost guide:

  • Run /clear between unrelated tasks so context does not balloon.
  • Run /compact with custom instructions to keep only what the next step needs.
  • Set MAX_THINKING_TOKENS=8000 so extended thinking has a ceiling.
  • Prefer CLI tools over MCP servers when the CLI is installed already.
  • Move CLAUDE.md detail into skills, so the bytes load only when needed.

Anthropic also publishes a "$13 per developer per active day" enterprise benchmark, and notes that agent teams use about 7x more tokens than standard sessions (Claude Code docs). Worth knowing before you launch a fleet of subagents on a tight budget.
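Multiplying the two published figures shows why that warning matters. This is straight arithmetic on the stated benchmark, not a measured cost:

```python
# Anthropic's enterprise benchmark times the agent-team multiplier.
per_dev_per_day = 13.00    # dollars, Anthropic's stated benchmark
agent_team_multiplier = 7  # agent teams use ~7x more tokens
print(per_dev_per_day * agent_team_multiplier)  # 91.0
```

Roughly $91 per developer per active day if every session were an agent team: the kind of number worth knowing before you spawn subagents by default.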

Risks and caveats

A few honest gotchas, in order of how often they bite people.

Every percentage above is vendor-stated. The 97% airis figure comes from a third-party listing, not the repo itself. The 99% lean-ctx figure assumes cache hits. The 92% agentmemory figure compares against the worst-case baseline. The 40% 9router figure is one worked example, not a benchmark.

9router routes traffic to non-Anthropic providers. That changes trust, latency, quality, and data handling. Read the provider's terms before sending real client code.

cc-ledger is early-stage. Use it for visibility, not as a billing system of record.

The June 15 split is recent. Anthropic has changed billing twice in two months. Check the Anthropic pricing page before making subscription decisions based on this post.

Stacking five tools adds operational surface area. Bash install, Docker, npm global, npx daemon, hook scripts. Install one at a time and re-measure with cc-ledger between each.

None of these tools is endorsed by Anthropic. They are community projects. The MIT and Apache 2.0 licenses cover code use. They do not cover support or breakage.

Per-feature cost, not per-prompt cost

The five tools above attack the cost per request. Build This Now attacks the cost per feature.

A planning team (backend, frontend, UX) plans in parallel against a fixed contract, which kills the contract-mismatch re-prompts that burn tokens in raw Claude Code. A GAN adversarial loop catches issues in two rounds max instead of ten chat turns of "no, fix this too." Quality gates ship features that pass npx tsc --noEmit, npx eslint ., and npm run build before being marked done, so you stop paying for fixes the model created. Mulch self-learning means the system stops re-discovering the same patterns across sessions.

You still pay for Claude Pro ($20/mo). That requirement is separate. CodeKit is $79 one-time. Speedy Swarm Desktop is $197 one-time. No subscriptions on top.

FAQ

Does prompt caching actually reduce Claude Code costs? Yes. Cached reads bill at 0.10x input rates, 5-minute cache writes at 1.25x, 1-hour cache writes at 2x (Claude Code docs).

What changes for Claude Code on June 15, 2026? Agent SDK, claude -p, and Claude Code GitHub Actions move to a separate Agent SDK credit pool: $20 on Pro, $100 on Max 5x, $200 on Max 20x. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The New Stack).

Is Opus 4.7 more expensive than Opus 4.6? Same per-token price. The new tokenizer reports about 1.46x more text tokens, so effective cost runs roughly 46% higher per request for text-heavy work. Image content can hit 3x (simonwillison.net).

Can I run Claude Code on free providers? 9router routes to free tiers like Kiro AI (Claude 4.5), OpenCode Free, and Vertex AI's $300 credit pool (README).

The bill went up twice in 30 days. The fix is not one tool. The fix is a stack with a meter on top. Install the ledger first, then layer the rest.
