Build This Now
speedy_devvkoen_salo

Test-Driven Development with Claude Code

Make Claude write failing tests from your spec, then implement until green without cheating. How to wire testing into the agent loop so quality is enforced, not hoped for.

Published Jun 21, 2026 · 9 min read

The highest-leverage way to get reliable output from Claude Code is to make it write the test before it writes the code. A failing test is a spec the agent cannot fake. Code can look correct and be wrong; a test that goes from red to green is an objective definition of done that the agent can check itself, on every iteration, without you in the loop.

This guide shows the exact loop: have Claude write failing tests from your spec, confirm they fail, implement until green, and refuse to let it edit the tests to cheat. Then it shows how to enforce that loop with hooks and a test-runner subagent so the gate runs whether you remember it or not.

Why TDD fits agentic coding

The core problem with AI-generated code is that "plausible" and "correct" look identical until something breaks. Claude is very good at producing code that reads like it works. Without a check, you are the check, and you are slower and less consistent than a test suite.

A test changes what the agent is optimizing for. Instead of "produce code that looks like a solution," the target becomes "produce code that makes this specific assertion pass." That is a closed loop the agent can run on its own. It writes, it runs the suite, it sees a failure, it fixes, it runs again. You only step in when the loop converges.

This is why TDD beats prompting harder. A longer prompt still leaves the definition of done fuzzy. A test makes it concrete. The agent stops when the suite is green, not when it feels finished.

There is a catch. Claude writes implementation first by default, then writes tests that pass against that implementation. That is the inverse of TDD and it proves nothing, because the test is shaped to the code rather than to the requirement. You have to flip the order on purpose.

The core loop

Four steps, in order, every time.

  1. Write a failing test from the spec. Describe the behavior you want and tell Claude to write a test for it with no implementation yet. The test encodes the requirement, not the code.
  2. Confirm it fails for the right reason. Run the test. It should fail because the behavior does not exist, not because of a typo or a bad import. A test that passes before any implementation exists proves nothing about the behavior.
  3. Implement until green. Now let Claude write the minimum code to pass the test. It can run the suite as many times as it needs and self-correct.
  4. Refuse to edit the test to cheat. The test is frozen once you approve it. If the suite is red, the fix goes in the implementation, never in the assertions. Changing the spec to match broken code is the one move that breaks the whole approach.

The discipline lives in steps 2 and 4. Confirming the red state stops the agent from quietly writing a test that already passes. Freezing the test stops it from "passing" by deleting the part that was failing.
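Step 4 can be made mechanical rather than a matter of trust. The sketch below is one hypothetical way to do it: record a fingerprint of the test file when you approve it and compare it later. The names fingerprint, freeze, and isUnchanged are invented for illustration, and the hash is a simple non-cryptographic one, so treat it as a tripwire for unexpected edits, not a security control.

```typescript
// djb2-style (xor variant) fingerprint over UTF-16 code units.
// Enough to notice that a file changed; not collision-resistant.
function fingerprint(text: string): string {
  let hash = 5381
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0
  }
  return hash.toString(16)
}

// Hypothetical record of an approved test file.
interface FrozenTest {
  path: string
  approvedFingerprint: string
}

// Call this once, at the moment you approve the test.
function freeze(path: string, contents: string): FrozenTest {
  return { path, approvedFingerprint: fingerprint(contents) }
}

// Call this before accepting a green run: true means the test is untouched.
function isUnchanged(frozen: FrozenTest, currentContents: string): boolean {
  return frozen.approvedFingerprint === fingerprint(currentContents)
}
```

If the test file is already committed or staged, git gives you the same signal without extra code: git diff --exit-code on the test path exits non-zero when the file differs from the approved version.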

A concrete example

Say you need a pure function that splits a payment into equal cents across N recipients and handles the remainder so the totals always reconcile. This is exactly the kind of logic that looks trivial and hides off-by-one bugs in the rounding.

Start by giving Claude the behavior, not the implementation, and forcing the test-first order.

Write a Vitest test file for a function `splitAmount(totalCents, recipients)`
that splits an integer number of cents into `recipients` equal parts.
Rules: every part is a whole number of cents, the parts sum exactly to
totalCents, and any remainder is distributed one cent at a time to the
earliest recipients. Throw if recipients < 1.

Write the FAILING test first. Do NOT write the implementation yet.
Run it and show me that it fails.

What you want back is a test that pins down the tricky cases (exact division, a remainder, a single recipient, the error path) and nothing else. Here is the kind of file that spec produces.

// src/split-amount.test.ts
import { describe, it, expect } from 'vitest'
import { splitAmount } from './split-amount'

describe('splitAmount', () => {
  it('splits evenly when it divides exactly', () => {
    expect(splitAmount(900, 3)).toEqual([300, 300, 300])
  })

  it('gives the remainder to the earliest recipients', () => {
    expect(splitAmount(1000, 3)).toEqual([334, 333, 333])
  })

  it('always sums back to the total', () => {
    const parts = splitAmount(10001, 7)
    expect(parts.reduce((a, b) => a + b, 0)).toBe(10001)
  })

  it('handles a single recipient', () => {
    expect(splitAmount(500, 1)).toEqual([500])
  })

  it('throws when there are no recipients', () => {
    expect(() => splitAmount(500, 0)).toThrow()
  })
})

Run it and it fails, because splitAmount does not exist yet. That failure is the signal you want before any implementation gets written. Confirm it, then tell Claude to make it pass with the minimum code and no changes to the test.

// src/split-amount.ts
export function splitAmount(totalCents: number, recipients: number): number[] {
  if (recipients < 1) {
    throw new Error('recipients must be at least 1')
  }
  const base = Math.floor(totalCents / recipients)
  const remainder = totalCents - base * recipients
  return Array.from({ length: recipients }, (_, i) =>
    i < remainder ? base + 1 : base,
  )
}

The suite goes green. The "sums back to the total" assertion is the one that matters here, because it catches the rounding bug class directly. If Claude had written the implementation first, that assertion might never have existed, and the bug would ship.
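The same invariant can be pushed beyond a single example with a brute-force sweep. This is a sketch, not part of the suite above: findViolation is a hypothetical helper, and the implementation is repeated only so the block runs on its own. It checks three things for every small input: the number of parts, the exact sum, and that earlier recipients never get less than later ones by more than one cent.

```typescript
// Same implementation as above, repeated so the sketch is self-contained.
function splitAmount(totalCents: number, recipients: number): number[] {
  if (recipients < 1) {
    throw new Error('recipients must be at least 1')
  }
  const base = Math.floor(totalCents / recipients)
  const remainder = totalCents - base * recipients
  return Array.from({ length: recipients }, (_, i) =>
    i < remainder ? base + 1 : base,
  )
}

// Hypothetical helper: describes the first violated invariant, or returns null.
function findViolation(maxTotal: number, maxRecipients: number): string | null {
  for (let total = 0; total <= maxTotal; total++) {
    for (let n = 1; n <= maxRecipients; n++) {
      const parts = splitAmount(total, n)
      if (parts.length !== n) {
        return `expected ${n} parts, got ${parts.length} (total=${total})`
      }
      const sum = parts.reduce((a, b) => a + b, 0)
      if (sum !== total) {
        return `parts sum to ${sum}, expected ${total} (n=${n})`
      }
      for (let i = 1; i < parts.length; i++) {
        // Parts must be non-increasing and within one cent of the first part.
        if (parts[i] > parts[i - 1] || parts[0] - parts[i] > 1) {
          return `uneven split ${JSON.stringify(parts)} (total=${total}, n=${n})`
        }
      }
    }
  }
  return null
}
```

A call such as findViolation(200, 12) covers every total from 0 to 200 cents across 1 to 12 recipients. Wrapped in a single test that expects null, it guards the whole rounding bug class rather than one hand-picked case.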

Enforce it with hooks and subagents

Asking for TDD works until you forget to ask. Hooks make the gate automatic. Only 13% of public Claude Code repositories use hooks, the mechanism that can enforce a test gate without you remembering, per our analysis of 2,500 repos. It is the most underused lever in the whole tool.

Two events do the job. A PostToolUse hook matched to Edit and Write runs the suite right after Claude touches a file, giving fast feedback. A Stop hook runs when the turn ends and can block: exit code 2 prevents Claude from stopping and pushes it to keep working. The Stop hook is the real gate, because it can refuse to let the agent finish on a red suite.

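{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test --silent 1>&2 || exit 2" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "npm test --silent 1>&2 || exit 2" }
        ]
      }
    ]
  }
}

Put that in .claude/settings.json. The trailing 1>&2 || exit 2 in each command matters: a typical test runner exits with code 1 on failure, which Claude Code treats as a non-blocking error, so the command converts any failure into exit code 2 and sends the output to stderr, where it is fed back to Claude. The PostToolUse run surfaces failures the moment a file changes; the Stop run is the one that holds the line, because a failing suite there blocks the turn from ending. For more on the event model and exit codes, see the hooks guide.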
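The exit-code rule for the Stop gate can be written down as a small decision function, which is useful if you move the hook into a script. This is a sketch under assumptions: stopHookExitCode and giveUpWhenReentered are hypothetical names, and the stop_hook_active field is assumed to be present in the JSON the Stop hook receives on stdin, set to true when Claude is already continuing because of an earlier Stop hook. Check the hooks reference for your version before relying on it.

```typescript
// Reduced shape of the Stop hook input; only the field used here.
// Assumption: the field name matches the hooks reference for your version.
interface StopHookInput {
  stop_hook_active?: boolean
}

const ALLOW_STOP = 0        // suite is green, the turn may end
const NON_BLOCKING_ERROR = 1 // turn ends, but the failure is shown to you
const BLOCK_STOP = 2        // turn may not end; stderr goes back to Claude

// Hypothetical policy: block on red, optionally stop looping after one retry.
function stopHookExitCode(
  testExitCode: number,
  input: StopHookInput,
  giveUpWhenReentered: boolean,
): number {
  if (testExitCode === 0) {
    return ALLOW_STOP
  }
  if (giveUpWhenReentered && input.stop_hook_active === true) {
    // Already forced one continuation: surface the failure instead of looping.
    return NON_BLOCKING_ERROR
  }
  return BLOCK_STOP
}
```

The giveUpWhenReentered switch is a design choice, not a requirement. Leaving it off gives the strictest gate; turning it on trades strictness for protection against a loop when the suite cannot be fixed within the turn.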

For heavier suites, push the running into a dedicated test-runner subagent so the output and the noise stay out of your main context. The subagent runs the suite, reports pass or fail and the failing cases, and your main session decides what to do. See how to create your first Claude Code subagent for the setup, or the best subagents in 2026 for ready-made ones.
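What the subagent hands back matters more than how it runs the suite. The sketch below shows the kind of compact report that keeps the main context clean: one verdict line, then only the failing and skipped cases. The TestResult shape and the summarize function are hypothetical; you would map your runner's real report format into this shape.

```typescript
// Hypothetical, runner-agnostic result shape.
interface TestResult {
  name: string
  status: 'passed' | 'failed' | 'skipped'
  message?: string
}

// Builds a short report: verdict and counts first, then failures, then skips.
function summarize(results: TestResult[], maxMessageLength = 120): string {
  const failed = results.filter(r => r.status === 'failed')
  const skipped = results.filter(r => r.status === 'skipped')
  const passedCount = results.length - failed.length - skipped.length
  const verdict = failed.length === 0 ? 'PASS' : 'FAIL'

  const lines = [
    `${verdict}: ${passedCount} passed, ${failed.length} failed, ${skipped.length} skipped`,
  ]
  for (const r of failed) {
    // Keep only the first line of the message, truncated, to avoid stack-trace noise.
    const firstLine = (r.message || '').split('\n')[0].slice(0, maxMessageLength)
    lines.push(`- ${r.name}: ${firstLine}`)
  }
  for (const r of skipped) {
    lines.push(`- SKIPPED ${r.name}`)
  }
  return lines.join('\n')
}
```

Listing skipped cases explicitly is deliberate: a skip is the cheapest way to turn a red suite green, so it should never be folded silently into the pass count.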

Pitfalls

The dangerous failure mode is reward hacking: the agent makes the suite green by weakening the test instead of fixing the code. It deletes a failing assertion, adds .skip to a case, loosens an exact match to a toBeDefined, or comments the hard part out. The suite passes and the bug ships.

Three guards stop it. Tell Claude in plain words that the test file is frozen once you approve it and that fixes go in the implementation only. Review the diff, because a deleted assertion or a .skip is obvious the moment you look. And run the full suite on a Stop hook and read its summary, so a skipped test shows up as a skipped case and a lower pass count rather than hiding behind a green checkmark.
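The diff review can be partly automated. The following is a rough sketch, not a complete detector: findWeakening is a hypothetical function that scans a unified diff of a test file for the weakening moves listed above. The patterns are heuristics chosen for illustration, so expect false positives (a legitimately rewritten assertion) and misses (a subtly loosened expected value).

```typescript
interface Finding {
  line: string
  reason: string
}

// Scans a unified diff of a test file for common test-weakening edits.
function findWeakening(unifiedDiff: string): Finding[] {
  const findings: Finding[] = []
  for (const raw of unifiedDiff.split('\n')) {
    // Skip the "--- a/file" and "+++ b/file" headers.
    if (/^(\+\+\+|---) /.test(raw)) continue

    const marker = raw.charAt(0)
    const body = raw.slice(1)

    if (marker === '+') {
      if (/\b(it|test|describe)\.(skip|todo|only)\b/.test(body)) {
        findings.push({ line: body.trim(), reason: 'added .skip, .todo, or .only' })
      }
      if (/^\s*\/\/.*\bexpect\(/.test(body)) {
        findings.push({ line: body.trim(), reason: 'assertion commented out' })
      }
      if (/\.toBeDefined\(\)|\.toBeTruthy\(\)/.test(body)) {
        findings.push({ line: body.trim(), reason: 'loose matcher added' })
      }
    } else if (marker === '-') {
      if (/\bexpect\(/.test(body)) {
        findings.push({ line: body.trim(), reason: 'assertion removed or rewritten' })
      }
    }
  }
  return findings
}
```

Any finding is a prompt to look at the diff yourself, not a verdict. An empty result does not prove the test is intact either, which is why the plain-words instruction and the Stop hook stay in place alongside it.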

The other trap is confirming red too lazily. A test can fail for a boring reason (a bad import, a typo in the function name) and you read it as "the behavior is missing." Glance at the failure message. It should fail because the thing under test does not work yet, not because the test file is broken.
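One way to make the red state unambiguous is to start from a stub, so the import resolves and the tests fail inside the function rather than at module load. This is an optional variation on the loop, not something the steps above require; the stub below reuses the splitAmount signature from the example and is meant to be replaced in step 3.

```typescript
// src/split-amount.ts (temporary stub, replaced by the real implementation)
export function splitAmount(totalCents: number, recipients: number): number[] {
  // Fails loudly and identifiably, so a red run clearly means "not built yet".
  throw new Error(
    `splitAmount not implemented (called with ${totalCents}, ${recipients})`,
  )
}
```

Running the example suite against this stub is also a useful check on the tests themselves. The four value-based cases fail with the "not implemented" message, but the "throws when there are no recipients" case passes, because a bare toThrow() accepts any error. That is a hint to pin the expected error message before you approve and freeze the test.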

What you get

You get output you can trust without reading every line. The test is the contract, the suite is the proof, and the agent does the iterating against both. Bugs that would have surfaced in production surface as a red test in the loop instead.

You also get a feedback loop that compounds. Every test you keep is a regression guard for every future change, including changes Claude makes months from now. The same hook that gates today's feature gates tomorrow's refactor. Pair this with a code review pass and the broader agentic workflow habits, and quality stops being something you hope for and becomes something the system enforces.

If you would rather not wire all of this from scratch, the Build This Now Code Kit is a Claude Code harness for Next.js and Supabase that ships adversarial evaluators and quality gates so the test-and-verify loop runs for you, alongside planning agents, a build pipeline, and auth, payments, and database already wired. It is $29 one-time, no subscription.

Frequently Asked Questions

Why does TDD work so well with Claude Code?

A failing test is an executable spec the agent cannot fake. With normal prompting, Claude writes code that looks right and you find out it is wrong later. With TDD, you write the test first so "done" has an objective definition: the suite goes green. The agent gets a tight feedback loop it can run on its own, so it self-corrects until the behavior actually matches the requirement instead of stopping at plausible.

How do I stop Claude from writing the implementation before the test?

Be explicit. Claude defaults to implementation-first, so tell it: "Write a failing test for this behavior. Do not write any implementation yet. Run the test and show me it fails." Confirm the red state yourself before letting it implement. If you skip that confirmation, the test can quietly end up shaped to fit whatever implementation Claude writes next, which defeats the point.

How do I keep the agent from weakening a test just to pass it?

This is reward hacking and it is the main failure mode. Three guards: tell Claude explicitly that the test file is frozen once you approve it, review the diff so a deleted assertion is visible, and run a hook on Stop that runs the full suite so a skipped or commented-out test cannot hide. If a test needs to change, you change it deliberately, not the agent mid-task.

Which hook should run my tests, PostToolUse or Stop?

Use both for different jobs. A PostToolUse hook matching Edit and Write gives fast feedback right after a file changes, but it cannot block. A Stop hook runs when the turn ends and can block with exit code 2, which pushes Claude to keep working until the suite is green. The Stop hook is the real gate because it can refuse to let the agent finish on a red suite.

Do I still need to write tests myself if Claude writes them?

You write the spec and review the tests; Claude writes the test code. The leverage is not in typing the assertions, it is in deciding what "correct" means and confirming the test actually checks it. Read every generated test once. A test you never looked at is not a safety net, it is a second thing that can be wrong.
