Build This Now
Build This Now
Real BuildsState of Claude Code 2026: What 2,500 Public Repos RevealBuilding Isn't the Bottleneck AnymoreDistribution Is the New MoatWhy QA Is the Real Bottleneck in AI DevelopmentFirst Principles in the Age of 24-Hour MVPsThe Autonomy Curve: How Much Freedom Can You Give an AI Agent?Idea to SaaSGAN LoopSelf-Evolving HooksTrace to SkillDistribution AgentsAI Security AgentsAutonomous AI SwarmAI Email SequencesAI Cleans ItselfAgent Swarm OrchestrationBuild a Full App with Claude Code: Real ExamplesClaude Code for Non-Developers: Real ExamplesClaude Code for Freelancers: Ship 3x FasterA Security Update from Build This NowThe AI Agent That Deleted a Production Database in 9 SecondsHow to Build Your Own Claude Code Harness (or Buy One)Run Claude Code on a Cheaper Model: DeepSeek and GLM Cost ArbitrageIs Claude Code Just a Thin Wrapper? Inside the Harness DebateHow Much Does It Really Cost to Build a SaaS with Claude Code?How to Cut Your Claude Code Token Bill in HalfDo I Still Need a Boilerplate If I Use Claude Code?Harness vs Boilerplate vs Framework: The Build-System Stack ExplainedHow Long Does Idea to Production Actually Take with Claude Code?Is Vibe Coding Safe? What the Lovable and Moltbook Breaches TeachOwn Your Vercel Analytics: I Built a Drain-to-Postgres PipelineSpec-Driven Development Explained: Why Pros Stopped Vibe CodingState of Vibe-Coded SaaS Security (2026 Data)From Vibe Coding to Production: The Checklist That Stops Data LeaksVibe Coding vs Vibe Engineering vs Agentic Engineering: The 2026 GlossaryWhat Is an Agent Harness? Why the Harness, Not the Model, Is the 2026 Moat
speedy_devvkoen_salo
Blog/Real Builds/Run Claude Code on a Cheaper Model: DeepSeek and GLM Cost Arbitrage

Run Claude Code on a Cheaper Model: DeepSeek and GLM Cost Arbitrage

Point Claude Code at DeepSeek or GLM to cut your bill 7 to 17x. Setup, what breaks, and the July 2026 model-name change explained.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Jun 22, 20267 min readReal Builds hub

You can run Claude Code on a cheaper model by setting four environment variables that redirect it to DeepSeek V4's Anthropic-compatible endpoint, cutting your API bill 7 to 17x with near-identical coding quality. The catch: image analysis, MCP tool forwarding, task budgets, and the /ultrareview command silently stop working, and any setup using the old DeepSeek model names breaks for good on July 24, 2026. GLM-5.1 from ZhipuAI is a strong second option at about 8x cheaper output, and almost no guide mentions it.


Stop configuring. Start building.

SaaS builder templates with AI orchestration.


The short version

Claude Code talks to a model over Anthropic's API format. DeepSeek and GLM both publish endpoints that speak the same format. So you can swap the brain without changing the tool. You keep the Claude Code interface (the same agentic loop, the same file edits) but the requests go to a model that charges a fraction of the price.

Why this matters to you: if you code with Claude Code every day and your bill is creeping up, a swap to DeepSeek V4-Flash for routine work can take a heavy day from tens of dollars down to single digits. The trade is not raw quality. On standard coding benchmarks the gap is tiny. The trade is a handful of Claude-specific features that quietly go dark.

The four-variable setup

You do not need a proxy, a LiteLLM wrapper, or any local server. DeepSeek serves a native Anthropic Messages API endpoint, so Claude Code connects directly. Set these four environment variables:

  1. ANTHROPIC_BASE_URL = https://api.deepseek.com/anthropic
  2. ANTHROPIC_AUTH_TOKEN = your DeepSeek API key
  3. ANTHROPIC_MODEL = deepseek-v4-pro (your main model)
  4. ANTHROPIC_SMALL_FAST_MODEL = deepseek-v4-flash (the cheap model Claude Code uses for small background tasks)

Restart Claude Code and it now routes to DeepSeek. That is the whole setup.

For GLM, the same four variables apply. Point ANTHROPIC_BASE_URL at ZhipuAI's Anthropic-compatible endpoint and set the model names to the GLM-5.1 identifiers from your provider dashboard.

If you want a cleaner way to manage these per-project, you can keep them in a project file alongside your other Claude Code config like your CLAUDE.md instructions, so each repo can route to a different model.

The July 24, 2026 deadline (read this before you copy any old tutorial)

This is the trap that will bite people. The old DeepSeek model aliases deepseek-chat and deepseek-reasoner are retired on July 24, 2026 at 15:59 UTC. The correct identifiers are now deepseek-v4-pro and deepseek-v4-flash.

Any tutorial written before April 2026 uses the old names. After the deadline, a config with the old names does not throw a clear error. It can fail mid-session, leaving you confused about why your agent stopped responding. If you set this up, use the new names from day one.

Is the quality actually there?

Yes, on standard coding tasks. The benchmark numbers are close enough that you would not feel a difference on most work:

  • DeepSeek V4-Pro scores 80.6% on SWE-bench Verified (a test of real GitHub bug fixes). Claude Opus 4.6 scores 80.8%. That is a rounding error.
  • On Terminal-Bench 2.0 (live terminal tasks), DeepSeek V4-Pro actually beats Claude: 67.9% versus 65.4%.
  • GLM-5.1 reportedly hits 94.6% parity on Z.ai's own internal Claude Code coding eval.

So the cost saving is not you accepting worse code. On the common stuff (fixing bugs, writing functions, running commands) the cheaper models keep up.

The comparison table

ModelInput ($/M tokens)Output ($/M tokens)SWE-bench VerifiedNative Anthropic endpointKnown Claude Code breakagesBest for
DeepSeek V4-Flash~$0.14lown/a (small model)YesSame as V4-ProCheap small/fast background tasks
DeepSeek V4-Pro~$0.435 to $1.74 (reported)comparable to GLM80.6%YesImages, MCP, task budgets, /ultrareviewHeavy agentic coding on a budget
GLM-5.1 (ZhipuAI)~$1.00~$3.20n/a (94.6% parity on Z.ai eval)YesSame Anthropic-feature gapsBalanced cost and quality
Claude Opus 4.6 (baseline)$5.00higher80.8%Yes (it is Anthropic)NoneFull feature set, images, MCP

Prices vary by source and tier, so treat the dollar figures as reported ranges, not fixed quotes. DeepSeek V4-Flash is the cheapest overall. GLM-5.1 sits in the middle.

What silently breaks

This is the honest part. When you point Claude Code at a non-Anthropic endpoint, four features stop working and none of them throws a loud error. They just produce nothing:

  • Image analysis. Paste a screenshot and the model cannot read it.
  • MCP tool forwarding. Connections to MCP servers (the plugins that give Claude Code extra tools) stop being passed through.
  • Task budgets. The spend-limit controls do not apply.
  • /ultrareview. The deep review command does not run.

Two more confirmed issues:

  • Streaming tool calls break on NVIDIA NIM. The agent emits a tool call and a stop signal but never records the tool result, so the loop halts. This is a confirmed open bug with no published workaround.
  • An auth bug in Claude Code versions 2.1.128 through 2.1.131 triggers rejection on DeepSeek's endpoint because of a metadata.user_id serialization issue. Use a version outside that range.

If your workflow leans on images, MCP servers, or those review commands, the swap will quietly cost you those. For plain code editing and terminal work, you lose nothing.

The cost math

One documented case: a day with 412 tool calls of heavy agentic work cost under $7 through DeepSeek. For comparison, Claude Max 5x is $100 per month, which breaks even at $3.33 per day of equivalent API usage. If your daily usage clears that line, the cheaper model is the obvious move.

A practical split many builders use: route everyday coding to DeepSeek V4-Flash or V4-Pro, and keep a Claude subscription on standby for the work that needs images, MCP, or the review commands. You get the cheap bill for 90% of the work and full features when you actually need them.

Where Build This Now fits

If you want a setup that already ships production apps (not snippets) on top of Claude Code, the Build This Now Code Kit is a $29 one-time build system: a production SaaS skeleton with auth, Stripe payments, and PostgreSQL with row-level security on every table, plus the agents, skills, and hooks wired in. It runs on Claude Code, so the model-routing trick above applies the same way. Route the cheap model for bulk work, keep Claude for the feature-heavy steps.

FAQ

How do I use DeepSeek with Claude Code?

Set ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic, ANTHROPIC_AUTH_TOKEN to your DeepSeek key, ANTHROPIC_MODEL=deepseek-v4-pro, and ANTHROPIC_SMALL_FAST_MODEL=deepseek-v4-flash. No proxy needed. Restart Claude Code and it routes to DeepSeek.

Does DeepSeek work as well as Claude for coding?

On SWE-bench Verified, DeepSeek V4-Pro (80.6%) and Claude Opus 4.6 (80.8%) are effectively tied. DeepSeek V4-Pro actually outperforms Claude on Terminal-Bench 2.0 live terminal tasks (67.9% versus 65.4%). For standard coding, quality is not the trade-off.

What breaks when you use Claude Code with DeepSeek?

Image analysis, MCP tool forwarding, task budgets, and /ultrareview stop working silently. Streaming tool calls also break on NVIDIA NIM with no workaround. Claude Code versions 2.1.128 through 2.1.131 trigger auth errors due to a metadata serialization bug, so avoid those versions.

Is GLM cheaper than DeepSeek for Claude Code?

GLM-5.1 input pricing ($1.00/M) sits between DeepSeek V4-Flash ($0.14/M) and V4-Pro (reported $0.435 to $1.74/M). GLM-5.1 output ($3.20/M) is comparable to V4-Pro. DeepSeek V4-Flash is the cheapest option overall.

Will old DeepSeek tutorials still work?

No. The model names deepseek-chat and deepseek-reasoner retire on July 24, 2026 at 15:59 UTC. Use deepseek-v4-pro and deepseek-v4-flash. Old configs can fail mid-session after that date without a clear error.

More in Real Builds

  • AI Cleans Itself
    Three overnight Claude Code workflows that clean AI's own mess: slop-cleaner removes dead code, /heal repairs broken branches, /drift catches pattern drift.
  • Agent Swarm Orchestration
    Four infrastructure layers that stop agent swarms from double-claiming tasks, drifting on field names, and collapsing under merge chaos.
  • GAN Loop
    One agent generates, one tears it apart, they loop until the score stops improving. GAN Loop implementation with agent definitions and rubric templates.
  • The Autonomy Curve: How Much Freedom Can You Give an AI Agent?
    How much autonomy you can give an AI agent is decided by one thing: how long a model holds a task without drifting. A good harness plus a reliable model is what unlocks real agent work.
  • The AI Agent That Deleted a Production Database in 9 Seconds
    An AI deleted PocketOS's production database and all backups in 9 seconds. Here is why it happened and the guardrails that prevent it.
  • AI Email Sequences
    One Claude Code command builds 17 lifecycle emails across 6 sequences, wires Inngest behavioral triggers, and ships a branching email funnel ready to deploy.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

On this page

The short version
The four-variable setup
The July 24, 2026 deadline (read this before you copy any old tutorial)
Is the quality actually there?
The comparison table
What silently breaks
The cost math
Where Build This Now fits
FAQ
How do I use DeepSeek with Claude Code?
Does DeepSeek work as well as Claude for coding?
What breaks when you use Claude Code with DeepSeek?
Is GLM cheaper than DeepSeek for Claude Code?
Will old DeepSeek tutorials still work?

Stop configuring. Start building.

SaaS builder templates with AI orchestration.