The Ralph Wiggum Technique
Stop hooks, completion promises, and verification-first workflows that let Claude Code ship features while you sleep.
Update (Jan 2025): Anthropic shipped native task management with dependencies, blockers, and multi-session coordination through CLAUDE_CODE_TASK_LIST_ID. Many Ralph workarounds are built into the product now. The core principles below still hold. The new system just handles the plumbing natively.
Hand an agent a task list. It grabs one, writes the code, runs the tests, commits. Then it grabs the next. And the next. The whole thing runs while you're asleep.
That's Ralph Wiggum. No relation to the Simpsons kid. It's the autonomous coding loop that's quietly reshaping how engineers ship software.
What Makes Ralph Different
Most people drive Claude Code like a chat app. Prompt. Wait. Read. Prompt again. Fine for quick jobs. For shipping actual features though, you become the slow part.
Ralph flips that around. Instead of steering every turn, you build a loop that keeps Claude going until the work is done. The trick sits inside Claude Code's stop hooks: they fire the moment the agent tries to wrap up, which means you can catch that attempt and shove the agent back to work.
Here's the core pattern:
- Claude works on a task
- Claude tries to stop (outputs completion)
- A stop hook intercepts and checks: is the work actually done?
- If not, feed the prompt back and continue
- If yes, let it complete
Step 4 is everything. Your agent doesn't quit the first time it thinks it's finished. It quits once the work has been verified.
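The loop above can be sketched as plain control flow. A minimal Python sketch, where `run_agent` and `verify` are stand-ins for real Claude Code invocations (real Ralph setups drive this through stop hooks rather than an outer loop, but the logic is the same):

```python
def ralph_loop(run_agent, verify, max_iterations=25):
    """Drive an agent until verification passes or we hit the cap.

    run_agent: callable that performs one work attempt (a stand-in here).
    verify: callable returning True only when the work is actually done.
    """
    for i in range(1, max_iterations + 1):
        run_agent()                      # one work attempt
        if verify():                     # step 3: is the work actually done?
            return f"done after {i} iteration(s)"
        # step 4: not done -- feed the prompt back and continue
    return "hit max_iterations without verified completion"
```

Note the asymmetry: the agent never decides it is finished; the verifier does.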
The Completion Promise
Ralph leans on a "completion promise": a specific word or phrase that means actually done. When Claude is convinced the task is wrapped, it emits that promise (usually just the word "complete").
// In your Ralph loop configuration
completion_promise: "complete"
max_iterations: 25

Every time Claude tries to stop, the hook scans for that promise. Missing? Loop keeps going. Present? Loop ends. Premature exits get blocked, and real exits get through cleanly.
Critical rule: No promise, no stop. That forces the agent to keep going until it genuinely thinks the work is done.
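The promise check itself is a few lines. A Python sketch of what a stop hook might run against the agent's final message (the whole-word matching is an assumption, but a bare substring check is too loose, since "incomplete" contains "complete"):

```python
import re

COMPLETION_PROMISE = "complete"

def should_allow_stop(last_message: str, promise: str = COMPLETION_PROMISE) -> bool:
    """No promise, no stop: allow the agent to finish only if its final
    message contains the completion promise as a whole word."""
    return re.search(rf"\b{re.escape(promise)}\b", last_message.lower()) is not None
```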
Verification: The Non-Negotiable Core
Boris Cherny, who created Claude Code, has one rule he refuses to break: always give Claude a way to verify its work.
That rule is why Ralph works at all. Skip verification and you end up with a loop that either runs forever or stops far too soon. Add it and the loop actually knows when it's finished.
Three verification approaches pair well with Ralph:
1. Test-Driven Verification
Write the tests first. Claude runs them, watches them fail, writes code, runs them again. The loop keeps looping until everything is green.
Workflow:
1. Run all tests in /tests/feature-x/
2. If tests fail, implement code to make them pass
3. Run tests again
4. Repeat until all tests pass
5. Output "complete" only when test suite is green

This is the most reliable path. Tests don't lie. Pass or fail. Nothing fuzzy.
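A minimal verification gate, sketched in Python. The pytest invocation and `tests/feature-x` path are assumptions; substitute your project's own suite (e.g. `npm test`):

```python
import subprocess
import sys

def tests_green(cmd=(sys.executable, "-m", "pytest", "tests/feature-x", "-q")) -> bool:
    """Run the verification command; True only on a green run.

    The default command is a placeholder -- swap in whatever your
    project uses to decide pass or fail.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return result.returncode == 0  # pytest exits 0 only when every test passes
```

The exit code is the whole contract: the loop's "done" signal reduces to a single boolean, which is exactly what makes this approach so hard to game.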
2. Background Agent Verification
Kick off a second agent whose only job is to check the main agent's work. Boris uses this for long runs:
After completing work, use a background agent to:
1. Review all changed files
2. Run the full test suite
3. Check for regressions
4. Report any issues found

You get an independent check. If the background agent spots problems, the main loop goes right back to work.
3. Stop Hook Validation
The stop hook itself can run validation. Check a progress file, run the linter, verify the build. Validation fails? Block the stop and send the agent back in.
// Stop hook pseudocode
if (agent_trying_to_stop) {
validation_result = run_tests();
if (validation_result.failed) {
return { decision: "block", reason: "Tests failing, continue work" };
}
return { decision: "allow" };
}

The Two-Phase Workflow
First mistake most people make: they plan and implement in the same context window.
Split them apart.
Phase 1: Planning Session
- Generate specifications through conversation
- Review and edit by hand
- Create an implementation plan with explicit file references
- Keep the spec as a "pin" that prevents invention
Phase 2: Implementation Session
- Fresh context (clear the previous conversation)
- Feed only the plan document
- Run the Ralph loop
- Let the agent iterate until complete
Why the split? Because context window degradation is real. After enough back-and-forth, Claude starts leaning on stale messages from earlier. A clean start with just the plan means the focus stays tight.
Your plan becomes the anchor. Every iteration of the loop looks back at it. That's what keeps the agent from drifting off into something you didn't ask for.
Practical Implementation: The PRD Approach
Ryan Carson's version looks like this:
- Start with a PRD (Product Requirements Document)
- What are we building?
- What's in scope?
- What's explicitly out of scope?
- Convert to user stories with acceptance criteria
- Each story is a small, testable unit
- Acceptance criteria define "done"
- Structure for agent consumption
- JSON or markdown format
- Clear checkboxes for progress tracking
- Links to relevant code locations
- Run the loop
- Agent picks the next uncompleted story
- Implements it
- Runs verification (tests)
- Marks it complete
- Moves to the next
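A story file structured for agent consumption might look like this (a hypothetical shape, not a prescribed schema; field names are illustrative):

```json
{
  "stories": [
    {
      "id": "US-1",
      "title": "User can reset their password",
      "acceptance_criteria": [
        "Reset email is sent after form submission",
        "Reset token expires after one hour"
      ],
      "files": ["src/auth/reset.ts"],
      "done": false
    }
  ]
}
```

The `done` flag doubles as the progress tracker: the loop picks the first story where it's false.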
Here's the payoff: you just walk away. Wake up to finished features, green tests, and commits already in the log.
UI Verification: The Hidden Trap
A gotcha that bites everyone sooner or later: the tests go green, but the UI looks wrong.
Here's why. Ralph can happily confirm the code runs while staying totally blind to visual bugs. The component renders, the tests pass, and the button is still off-screen or the text is cut in half.
Fix it with a screenshot-based verification protocol.
After implementing UI changes:
1. Take screenshots of affected components
2. Rename each with "verified_" prefix after review
3. Do NOT output completion promise yet
4. Let the next iteration confirm all files are verified
5. Only then output "complete"

That forces at least two loop passes for any UI change. Pass one implements and captures screenshots. Pass two confirms every screenshot got reviewed. The visual check can't be skipped.
The key insight: Instruct Claude that renaming the screenshots does NOT earn the completion promise. The next iteration is what signals done. That blocks the premature exits.
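The protocol's gate can be sketched as a simple check the stop hook runs before honoring the promise. The `verified_` prefix follows the protocol above; the `.png` glob and screenshot directory are assumptions:

```python
from pathlib import Path

def all_screenshots_verified(shot_dir: str) -> bool:
    """True only when every screenshot in shot_dir carries the
    'verified_' prefix -- i.e. pass two of the protocol can finish."""
    shots = list(Path(shot_dir).glob("*.png"))
    # No screenshots at all means the UI pass never ran: block the stop.
    if not shots:
        return False
    return all(p.name.startswith("verified_") for p in shots)
```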
Economics: Why This Changes Everything
A coding agent running nonstop on Sonnet costs roughly $10.42 USD per hour, measured over a 24-hour run (about $250 a day).
Less than minimum wage in most places. And you're paying for a machine that can:
- Clear backlogs overnight
- Run multiple features in parallel
- Never get tired or distracted
- Scale with more compute
So the bottleneck shifts. It stops being "how much am I willing to spend?" and becomes "how much reliable work can I define?"
Teams running reliable loops will pull way ahead of teams that aren't. The gap is already widening.
Common Failures and Fixes
Loop Never Ends
Cause: Impossible task or missing completion criteria.
Fix: Set a max iteration count (e.g., 25). Add explicit completion criteria to your prompt.
Loop Ends Too Early
Cause: Claude outputs the promise before work is done.
Fix: Strengthen your verification. Add tests. Use the screenshot protocol for UI. Make "done" objectively measurable.
Quality Degrades Over Iterations
Cause: Context window filling with failed attempts.
Fix: Implement checkpoint state. Mark completed work in an external file. Let the loop resume cleanly if context fills.
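One way to sketch that checkpoint state in Python. The `ralph_state.json` filename is hypothetical; any external file outside the context window works:

```python
import json
from pathlib import Path

STATE = Path("ralph_state.json")  # hypothetical checkpoint file

def mark_done(task_id: str) -> None:
    """Record a completed task outside the context window."""
    done = set(json.loads(STATE.read_text())) if STATE.exists() else set()
    done.add(task_id)
    STATE.write_text(json.dumps(sorted(done)))

def remaining(tasks: list[str]) -> list[str]:
    """On resume, skip anything already checkpointed."""
    done = set(json.loads(STATE.read_text())) if STATE.exists() else set()
    return [t for t in tasks if t not in done]
```

Because the file survives a context reset, a fresh session can pick up exactly where the last one stalled.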
Agent Invents Features
Cause: Spec is vague or missing.
Fix: Your spec is the "pin" that prevents invention. Make it specific. Include explicit references to existing code. Tell Claude what NOT to do.
Setting Up Your First Ralph Loop
Keep it small on the first run. Pick a feature you know well, with tests that already exist.
1. Install the Ralph plugin (or implement the stop hook pattern yourself)
2. Create your prompt file:
Study the implementation plan in /docs/plan.md
Pick the single most important incomplete task
Implement it following existing patterns
Run tests with: npm test
On pass: mark task complete in plan.md, commit changes
On fail: fix the issue and run tests again
Output "complete" only when all tasks are done and tests pass
3. Set constraints:
- Max iterations: 25
- Completion promise: "complete"
- Quality gates: tests must pass, linting must pass
4. Watch the first run. Don't walk away yet. Cancel if behavior looks wrong. Adjust your prompt. Re-run.
5. Gradually increase autonomy as trust builds.
The Ralph Philosophy
Ralph is not about cutting humans out of coding. It's about cutting humans out of the tedious loop between attempts.
The design is still yours. The specs are still yours. You define what "done" means. You review the final result.
What Ralph takes is the 2 AM debugging slog. The endless test-fix-test grind. The switching between features. That's the stuff it handles.
Boris keeps coming back to the same line: verification drives everything. Give Claude a way to check its own work, and it'll run reliably for hours. Take that away and you're gambling.
Start with verification. Wrap your loops around it. The autonomous coding future isn't smarter prompts. It's better feedback systems.
Next Steps
- Try native task management for built-in persistence and multi-session coordination
- Learn about hooks to implement custom stop behaviors
- Explore async workflows for running multiple loops
- Read about thread-based engineering for scaling your autonomous workflows
- Check feedback loops for verification patterns
People who get good at Ralph aren't just Claude Code users. They're building systems that ship code while they sleep.
Stop configuring. Start building.