Build This Now

Cut Claude Code Token Costs

Five open-source tools that knock 40% to 95% off your Claude Code spend, with install commands, percentage sources, and the order to stack them in.

Stop configuring. Start building.

SaaS templates with AI orchestration.

Published May 15, 2026 · 11 min read

Problem: Your Claude Code bill went up twice in 30 days. On June 15, 2026, Anthropic moves Agent SDK, claude -p, and Claude Code GitHub Actions onto a separate metered credit pool that does not roll over. Once the pool drains, you pay full API rates (The New Stack). At the same time, the new Opus 4.7 tokenizer reports about 1.46x more text tokens than 4.6 at the same per-token price, which Simon Willison flagged as "actually a pretty big price bump" (simonwillison.net).

Quick Win: Five GitHub repos fight back. Install one tonight, install all five over the week, and pair them with cc-ledger so you can see the line move.

Every percentage in this post is vendor-stated. Real savings shift with codebase size, MCP server count, and how often your sessions repeat work.

Why your bill is climbing in May 2026

Three things are stacking on top of each other.

First, the June 15 split. Programmatic Claude Code usage gets its own dedicated budget instead of sharing the chat pool. The Pro plan ships $20 of Agent SDK credit, Max 5x ships $100, Max 20x ships $200. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The Register).

Second, the Opus 4.7 tokenizer. Same dollars per token, more tokens per request. Willison's testing measured 1.46x for plain text and up to 3.01x for images. A 15MB PDF inflated only 1.08x, so the impact varies with content type (simonwillison.net).
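The arithmetic is easy to sketch. A back-of-envelope in shell, using the inflation factors quoted above; the 100K starting token count is made up for illustration:

```shell
# Token inflation at a fixed per-token price. Factors from Willison's testing:
# 1.46x plain text, 3.01x images, 1.08x for a 15MB PDF.
old_tokens=100000
for factor in 1.46 3.01 1.08; do
  awk -v t="$old_tokens" -v f="$factor" \
    'BEGIN { printf "factor %.2f: %.0f -> %.0f tokens (+%.0f%% effective cost)\n", f, t, t*f, (f-1)*100 }'
done
```

Same rate card, 46% more billable text tokens per request.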

Third, Fast Mode now defaults to Opus 4.7 in recent Claude Code releases. Faster, smarter, and quietly more expensive per request than the 4.6 baseline you had a month ago.

The fix is not "use Sonnet for everything." The fix is fewer tokens hitting the wire on every call you do make.

The five tools, ranked by max stated savings

Verify each one in your own workflow. The percentages below come straight from each repo's README or the third-party listing noted.

  1. lean-ctx: 60% to 95% reduction across reads, up to 99% on cached reads (README).
  2. airis-mcp-gateway: up to 97% context token reduction. The 97% figure comes from the VoltAgent listing, not the repo itself. The repo's own README says only "Token Efficiency: Measurable reduction in initial context overhead" with no number (README).
  3. agentmemory: 92% fewer tokens than pasting full context across sessions. The badge sits at the top of the README (README).
  4. 9router: 20% to 40% per request via RTK token compression on tool output. Worked example in the README: 47K tokens shrunk to 28K (README).
  5. cc-ledger: 0% direct savings. This is the meter. You need it on before you can prove the other four did anything (README).

The comparison table

| Tool | Max savings claim | Claim source | How it works | Install | Best for |
| --- | --- | --- | --- | --- | --- |
| lean-ctx | 60% to 95% (99% cached) | Vendor README | Rust binary acts as shell hook plus MCP server, compresses file reads and shell output before they reach the model | `curl -fsSL https://leanctx.com/install.sh \| sh` | Read-heavy and grep-heavy work on the same files |
| airis-mcp-gateway | Up to 97% | Third-party listing (VoltAgent) | Docker MCP multiplexer aggregates many MCP servers behind one SSE endpoint with on-demand lifecycle | `curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh \| bash` | Setups with five or more MCP servers wired up |
| agentmemory | 92% vs full re-paste | Vendor README | Persistent memory MCP captures what the agent does, replays prior context into new sessions | `npx @agentmemory/agentmemory` | Repeated work on the same codebase across days |
| 9router | 20% to 40% per request | Vendor README (worked example) | Multi-provider router with RTK compression on tool_result content, plus routing to 40+ providers including free tiers | `npm install -g 9router` | Mixing cheaper providers with output compression |
| cc-ledger | 0% direct | N/A (observability) | Hooks capture every turn's input, output, cache_read, cache_write into a local SQLite ledger | `curl -fsSL https://ccledger.dev/install \| bash` | Anyone running the four above |

lean-ctx: compress the inputs

lean-ctx is a single Rust binary that sits between Claude Code and your filesystem. It hooks every file read, every grep, every shell command. Output gets compressed before it reaches the model.

The headline claim is 60% to 95% reduction, with up to 99% on cached reads (README). The 99% figure assumes cache hits. Cold runs see less.

Three commands stand it up:

curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup
lean-ctx init --agent claude-code

Best fit: workflows where you re-read the same files often. Worst fit: thin sessions with mostly short prompts. The compression overhead pays off only when there is something to compress.

airis-mcp-gateway: compress the tool listings

If your Claude Code talks to Sentry, GitHub, Linear, Postgres, and a couple of other MCP servers, the system prompt pays a tax for every tool listing on every turn. airis-mcp-gateway aggregates many MCP servers behind a single SSE endpoint with intelligent routing and on-demand lifecycle management (README).

The 97% figure that travels with this repo comes from the VoltAgent awesome-claude-code-subagents listing. The repo itself is more conservative. Read both before you quote a number to your team.

Production install:

curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh | bash

Dev install with Docker:

docker compose up -d

Skip this one if your .mcp.json lists fewer than five servers. The savings come from collapsing tool listings, and a slim setup has nothing to collapse.
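For context, the consolidation would look roughly like this in .mcp.json: one SSE entry to the gateway in place of several per-server entries. The server name, port, and path below are assumptions for illustration, not values from the repo's docs:

```json
{
  "mcpServers": {
    "airis-gateway": {
      "type": "sse",
      "url": "http://localhost:8080/sse"
    }
  }
}
```

Every tool the gateway fronts is discovered through that single endpoint, so the per-server listing tax disappears from the system prompt.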

9router: compress the outputs and arbitrage providers

9router is a multi-provider router with two tricks. The first is RTK Token Saver, which auto-compresses tool_result content like git diff, grep, find, ls, tree, and log output. The README quotes 20% to 40% per request and shows a worked example: 47K tokens without RTK, 28K tokens with it, a 40% cut on that one call (README).

The second trick is provider routing. 9router fronts 40+ providers including Kiro AI (Claude 4.5), OpenCode Free, Vertex AI's $300 credit pool, GLM at $0.6/1M, MiniMax at $0.2/1M, and Kimi at $9/month (README).

Install:

npm install -g 9router
9router

Caveat worth saying out loud. Routing to non-Anthropic providers changes the trust profile, the latency, the model quality, and the data handling. Read the provider's terms before you push real client code through it.

agentmemory: stop paying for re-explaining your project

Every new Claude Code session starts fresh. You re-paste the stack notes, the rules, the patterns. agentmemory kills that cost. It captures what the agent does via hooks, compresses it into searchable observations, and injects relevant prior context into future sessions.

The README claims 92% fewer tokens against the worst-case "paste full context every session" baseline. Worked comparison in the repo: about 170K tokens per year (around $10) with agentmemory versus 19.5M+ tokens pasting full context manually (README).

Install:

npx @agentmemory/agentmemory

If you already use Claude Code's /resume and a tight CLAUDE.md, the marginal savings are smaller than the badge implies. Still useful. Just not 92% useful.

Works with Claude Code, Cursor, Gemini CLI, Codex CLI, OpenCode, plus Claude Desktop, Windsurf, Roo Code, Cline, Goose, and Aider (README).

cc-ledger: see what you spent

You cannot manage what you cannot see. cc-ledger captures every Claude Code edit, prompt, and per-turn token cost via Claude Code hooks. It writes to ~/.cc-ledger/ledger.db and tracks five token classes per turn: input, output, cache_read, cache_write_5m, and cache_write_1h (README).

Those classes match Anthropic's own pricing model. Cache reads bill at 0.10x input rates. 5-minute cache writes bill at 1.25x. 1-hour cache writes bill at 2x (Claude Code docs). Without a ledger, prompt caching feels invisible. With one, you see the line move.
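As a sketch of how those multipliers combine, here is a hypothetical per-turn cost over the five token classes. The $15/$75 per-million input/output rates and the token counts are illustrative assumptions, not cc-ledger output:

```shell
# Hypothetical per-turn cost from cc-ledger's five token classes.
# Multipliers per the Claude Code docs cited above; rates are assumed.
awk 'BEGIN {
  in_rate = 15 / 1e6; out_rate = 75 / 1e6        # assumed $/token
  input = 2000; output = 1500                    # sample turn
  cache_read = 80000; cache_write_5m = 10000; cache_write_1h = 0
  cost  = input * in_rate + output * out_rate
  cost += cache_read * in_rate * 0.10            # cache reads: 0.10x input
  cost += cache_write_5m * in_rate * 1.25        # 5-minute writes: 1.25x
  cost += cache_write_1h * in_rate * 2           # 1-hour writes: 2x
  printf "turn cost: $%.4f\n", cost
}'
```

Note how the 80K cached tokens cost less than the 1.5K output tokens. That asymmetry is why a ledger matters: caching wins are invisible without one.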

Install:

curl -fsSL https://ccledger.dev/install | bash

cc-ledger also computes "shadow billing", which estimates what your subscription usage would have cost on the API. The repo sat at six GitHub stars as of May 15, 2026. Early-stage. Use it for visibility, not as a billing system of record.

The do-this-in-order recipe

Stack the savings in this order. Each step compounds on the last, and the ledger lets you see what each step actually saved.

  1. Install cc-ledger first. You need a baseline. Run curl -fsSL https://ccledger.dev/install | bash, then work normally for one day. Note the daily spend.
  2. Install agentmemory. This kills the cost of re-explaining your project on every new session. Run npx @agentmemory/agentmemory and connect it to Claude Code.
  3. Install lean-ctx. This compresses every file read and shell command before it hits the model. Run the three setup commands listed above.
  4. Add airis-mcp-gateway only if you have five or more MCP servers configured. Otherwise skip it.
  5. Add 9router only if you are willing to route to non-Anthropic providers. Highest impact for Pro users on tight budgets, also the most disruptive change to your workflow.
  6. Re-check cc-ledger after one week. Compare against your baseline.

One install at a time, with the ledger between each step. That's how you tell which tool moved the line.
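One more reason to measure between steps: stacked cuts multiply rather than add. A quick sketch with made-up per-tool figures (not the vendor claims):

```shell
# Stacked savings multiply: remaining spend = (1 - a) * (1 - b) * (1 - c).
# Illustrative per-tool cuts of 40%, 30%, 20% -- not measurements.
awk 'BEGIN {
  remaining = (1 - 0.40) * (1 - 0.30) * (1 - 0.20)
  printf "combined cut: %.0f%%\n", (1 - remaining) * 100
}'
```

Three tools at 40/30/20 yield a 66% cut, not 90%. Each later tool only compresses what the earlier ones left behind.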

What Anthropic itself recommends (free)

Before any third-party tool, the free moves are in Anthropic's own cost guide:

  • Run /clear between unrelated tasks so context does not balloon.
  • Run /compact with custom instructions to keep only what the next step needs.
  • Set MAX_THINKING_TOKENS=8000 so extended thinking has a ceiling.
  • Prefer CLI tools over MCP servers when the CLI is installed already.
  • Move CLAUDE.md detail into skills, so the bytes load only when needed.
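The thinking cap is the quickest of these to apply. A minimal sketch, assuming the environment-variable mechanism named in the cost guide:

```shell
# Cap extended thinking at the value Anthropic's cost guide suggests.
# Add to your shell profile so every Claude Code session inherits it.
export MAX_THINKING_TOKENS=8000
echo "MAX_THINKING_TOKENS=$MAX_THINKING_TOKENS"
```

The same variable can live in the "env" block of your Claude Code settings file if you prefer per-project scoping.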

Anthropic also publishes a "$13 per developer per active day" enterprise benchmark, and notes that agent teams use about 7x more tokens than standard sessions (Claude Code docs). Worth knowing before you launch a fleet of subagents on a tight budget.

Risks and caveats

A few honest gotchas, in order of how often they bite people.

Every percentage above is vendor-stated. The 97% airis figure comes from a third-party listing, not the repo itself. The 99% lean-ctx figure assumes cache hits. The 92% agentmemory figure compares against the worst-case baseline. The 40% 9router figure is one worked example, not a benchmark.

9router routes traffic to non-Anthropic providers. That changes trust, latency, quality, and data handling. Read the provider's terms before sending real client code.

cc-ledger is early-stage. Use it for visibility, not as a billing system of record.

The June 15 split is recent. Anthropic has changed billing twice in two months. Check the Anthropic pricing page before making subscription decisions based on this post.

Stacking five tools adds operational surface area. Bash install, Docker, npm global, npx daemon, hook scripts. Install one at a time and re-measure with cc-ledger between each.

None of these tools is endorsed by Anthropic. They are community projects. The MIT and Apache 2.0 licenses cover code use. They do not cover support or breakage.

Per-feature cost, not per-prompt cost

The five tools above attack the cost per request. Build This Now attacks the cost per feature.

A planning team (backend, frontend, UX) plans in parallel against a fixed contract, which kills the contract-mismatch re-prompts that burn tokens in raw Claude Code. A GAN adversarial loop catches issues in two rounds max instead of ten chat turns of "no, fix this too." Quality gates ship features that pass npx tsc --noEmit, npx eslint ., and npm run build before being marked done, so you stop paying for fixes the model created. Mulch self-learning means the system stops re-discovering the same patterns across sessions.

You still pay for Claude Pro ($20/mo). That requirement is separate. CodeKit is $79 one-time. Speedy Swarm Desktop is $197 one-time. No subscriptions on top.

FAQ

Does prompt caching actually reduce Claude Code costs? Yes. Cached reads bill at 0.10x input rates, 5-minute cache writes at 1.25x, 1-hour cache writes at 2x (Claude Code docs).

What changes for Claude Code on June 15, 2026? Agent SDK, claude -p, and Claude Code GitHub Actions move to a separate Agent SDK credit pool: $20 on Pro, $100 on Max 5x, $200 on Max 20x. None of it rolls over. Interactive Claude Code in your terminal is unaffected (The New Stack).

Is Opus 4.7 more expensive than Opus 4.6? Same per-token price. The new tokenizer reports about 1.46x more text tokens, so effective cost runs roughly 46% higher on text-heavy requests. Image content can hit 3x (simonwillison.net).

Can I run Claude Code on free providers? 9router routes to free tiers like Kiro AI (Claude 4.5), OpenCode Free, and Vertex AI's $300 credit pool (README).

The bill went up twice in 30 days. The fix is not one tool. The fix is a stack with a meter on top. Install the ledger first, then layer the rest.

Continue in Core

  • 1M Context Window in Claude Code
    Anthropic enabled the 1M-token context window for Opus 4.6 and Sonnet 4.6 in Claude Code. No beta header, no surcharge, flat pricing, and fewer compactions.
  • AGENTS.md vs CLAUDE.md Explained
    Two context files, one codebase. How AGENTS.md and CLAUDE.md differ, what each does, and how to use both without duplicating anything.
  • Auto Dream
    Claude Code organizes its own project notes between sessions. Stale entries are removed, contradictions are resolved, topic files are reorganized. Run /memory.
  • Automatic Memory in Claude Code
    Automatic memory lets Claude Code keep running project notes. Where the files live, what gets written, how /memory changes them, and when to choose CLAUDE.md instead.
  • Auto-Plan Strategies
    Auto Plan Mode uses --append-system-prompt to force Claude Code into a plan-first loop. File operations pause for approval before anything is touched.
  • Autonomous Claude Code
    A unified stack for agents that ship features overnight. Threads give you structure, Ralph loops give you autonomy, verification keeps everything honest.

More from Handbook

  • Agent Fundamentals
    Five ways to create specialized agents in Claude Code: Task subagents, .claude/agents YAML, custom slash commands, CLAUDE.md personas, and perspective prompts.
  • Harness Engineering for Agents
    The harness is every layer around your AI agent except the model itself. Learn the five control points, the constraint paradox, and why harness design determines agent performance more than the model does.
  • Agent Patterns
    Orchestrator, fan-out, validation chain, specialized routing, progressive refinement, and watchdog. Six orchestration shapes for wiring up subagents in Claude Code.
  • Agent Team Best Practices
    Production-tested patterns for Claude Code Agent Teams. Context-rich spawn prompts, well-sized tasks, file ownership, delegate mode, and fixes from versions v2.1.33-v2.1.45.


