By speedy_devv and koen_salo

Autonomous Claude Code

A unified stack for agents that ship features overnight. Threads give you the structure, Ralph loops give you the autonomy, verification keeps it honest.

Two ideas are rewiring how engineers drive AI agents: Ralph Wiggum loops and thread-based engineering.

Ralph is the how-to for keeping an agent running by itself. Threads are the how-to for scaling and measuring that autonomy. Stitched together, they're a working system for building software without a human in the seat.

This post is the stitching.

The Unified Model

Here is how the pieces slot in:

Thread-based engineering supplies the skeleton. Your mental model is threads: base, parallel, chained, fusion, big, and long-duration. Every thread type has its own job.

Ralph loops drive the L-threads. The stop hook pattern, the completion promise, and verification-first development turn long autonomous runs into something you can actually trust.

Verification is what keeps the whole thing standing. Without it, threads bail too early and loops grind forever.

Thread Types × Verification → Reliable Autonomous Work
     ↓
Ralph Loops = Implementation of L-Threads
     ↓
Result: Features shipping while you sleep

The Verification Stack

Boris Cherny's rule is one sentence: always give Claude a way to verify its work.

It shows up at every layer of the unified model:

| Thread Type | Verification Method |
| --- | --- |
| Base | Manual review |
| P-Threads | Parallel reviews, consensus |
| C-Threads | Phase-by-phase validation |
| F-Threads | Compare multiple outputs |
| B-Threads | Sub-agent verification |
| L-Threads | Automated tests + stop hooks |

One thing matters more than the others. The longer a thread runs, the more its verification has to run itself. Nobody reviews a 26-hour L-thread by hand. The system has to check itself.

Building the Complete Stack

Here is a working setup that glues every concept together:

Layer 1: Specification (The Pin)

Every autonomous run starts with a spec. That spec is your pin. It stops the agent from inventing the problem.

## Feature: User Dashboard
 
### Scope
 
- Display user metrics
- Show recent activity
- Add export functionality
 
### Out of Scope
 
- Real-time updates (Phase 2)
- Mobile responsiveness (Phase 2)
 
### Acceptance Criteria
 
- [ ] Metrics load in under 2 seconds
- [ ] Activity shows last 30 days
- [ ] Export generates valid CSV

Point at existing code where you can. Say what the agent shouldn't do. Pin down what "done" looks like in plain terms.

Layer 2: Test-Driven Verification

Write the tests first. Those tests are the verification layer that makes L-threads dependable.

// For each acceptance criterion, create a test
tests/
  dashboard/
    metrics.test.ts      # Verifies metrics load time
    activity.test.ts     # Verifies activity display
    export.test.ts       # Verifies CSV generation

Running agents execute tests over and over. A loop cannot close until the tests are green. Nothing fuzzy. No early exits.

Layer 3: The Stop Hook

Set your stop hook to enforce verification:

// stop-hook.js
module.exports = async function (context) {
  // Run test suite
  const testResult = await runTests();
 
  if (testResult.failed > 0) {
    return {
      decision: "block",
      reason: `${testResult.failed} tests failing. Continue work.`,
    };
  }
 
  // Check for the completion promise. Note: a bare substring match also
  // matches words like "incomplete"; an exact sentinel phrase is safer.
  if (!context.output.includes("complete")) {
    return {
      decision: "block",
      reason: "Completion promise not found. Verify all work is done.",
    };
  }
 
  return { decision: "allow" };
};

The stop hook is the bouncer. It ignores what Claude thinks. It only cares whether the tests pass.

Layer 4: Thread Selection

Now pick the thread type that fits the work:

Small feature, one file: Base thread. Prompt, agent works, review.

Five independent features: P-threads. Spin up five terminals, assign one feature each.

Database migration with three phases: C-thread. Verify after each phase before continuing.

Critical architecture decision: F-thread. Get three agents' opinions, compare results.

Overnight feature build: L-thread with Ralph loop. Set it running before bed.

Multi-file refactor with sub-tasks: B-thread. Orchestrator spawns workers for each file.

Layer 5: Checkpoint State

Keep state outside the agent. This is especially important for L-threads:

## Progress: User Dashboard
 
### Completed
 
- [x] Set up test infrastructure
- [x] Implement metrics API endpoint
- [x] Create metrics display component
 
### In Progress
 
- [ ] Implement activity feed
 
### Remaining
 
- [ ] Add export functionality
- [ ] Performance optimization

The agent rewrites this file while it works. If the context window fills up and the agent restarts, it reads the progress file and picks up where it stopped.
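One way to put that progress file to work, sketched in plain Node (the helper name and the inline sample are illustrative, not part of any real API):

```javascript
// Count unchecked markdown boxes: work the agent still owes before the
// completion promise is allowed.
function unfinishedItems(progressMarkdown) {
  return progressMarkdown
    .split("\n")
    .filter((line) => line.trim().startsWith("- [ ]"));
}

const progress = [
  "- [x] Set up test infrastructure",
  "- [ ] Implement activity feed",
  "- [ ] Add export functionality",
].join("\n");

console.log(unfinishedItems(progress).length); // 2 items still open
```

A stop hook can run this against the progress file and block while the count is nonzero, which gives restarts the same exit condition as the original run.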

UI Verification: The Missing Piece

Tests can pass while the screen is broken.

Any thread that touches UI needs screenshot-based verification on top of the test suite:

Workflow extension for UI work:

1. Complete implementation
2. Take screenshots of affected components
3. Review each screenshot for visual issues
4. Rename verified screenshots with "verified_" prefix
5. Do NOT output completion promise yet
6. Run one more loop to confirm all screenshots verified
7. Only then output "complete"

That is what forces visual review. Claude has no way to skip the screenshot check and still call the work done.

Scaling with Loom

The next rung up is Loom-style orchestration.

Loom is an environment built for agents rather than humans. It wires Ralph loops together into reactive systems.

Level 1: Single Ralph loop (L-thread)
Level 2: Multiple parallel Ralph loops (P-threads of L-threads)
Level 3: Orchestrated chains of loops (B-threads containing L-threads)
Level 4: Autonomous product systems (agents that ship, observe, and iterate)

At Level 4, agents:

  • Ship behind feature flags
  • Deploy without code review
  • Observe analytics
  • Decide if changes worked
  • Iterate automatically

This is the Z-thread endpoint. Zero human input. Full autonomy.

Economics of Autonomous Loops

Keeping an agent running costs about $10.42 per hour on Sonnet.

That resets the math.

| Approach | Cost | Output |
| --- | --- | --- |
| Human developer | ~$100/hour | 8 hours/day |
| Single agent | ~$10/hour | 24 hours/day |
| 5 parallel agents | ~$50/hour | 120 agent-hours/day |

Cost isn't the cap. The cap is how much reliable work you can define.

Teams that get verification-first loops right will ship at a different rate than teams that don't. Not a little different. A different rate entirely.

Common Integration Patterns

Pattern 1: Planning + L-Thread

  1. C-thread for planning (you verify the plan)
  2. Fresh context
  3. L-thread for implementation (Ralph loop)
  4. Final review

Why it works: Planning and implementation live in separate contexts. You spend your attention on the plan. The build runs on its own.

Pattern 2: P-Thread Feature Sprint

  1. Write specs for multiple features
  2. Spin up P-threads (one per feature)
  3. Each P-thread runs as an L-thread internally
  4. Review completed features as they finish

Why it works: Parallelism lives at the feature level. Autonomy lives at the implementation level.

Pattern 3: F-Thread Architecture

  1. Define architectural question
  2. Spin up F-thread (3-4 agents)
  3. Each agent proposes a solution
  4. Compare results, pick the best
  5. Implement chosen solution with L-thread

Why it works: You get several perspectives on the decisions that hurt to get wrong. Once picked, the build runs autonomously.

Pattern 4: B-Thread Orchestration

  1. Main agent receives large task
  2. Decomposes into sub-tasks
  3. Spawns worker agents (each runs mini L-thread)
  4. Aggregates results
  5. Main agent verifies and commits

Why it works: Labor divides cleanly. Each worker keeps a narrow focus. The main agent holds the thread together.
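The steps above can be sketched as a single orchestrator function. `runWorker` stands in for however you spawn a sub-agent (for instance Claude Code's Task tool); it is an assumption, not a real API:

```javascript
// B-thread shape: decompose, fan out to workers, aggregate for final verify.
async function orchestrate(task, decompose, runWorker) {
  const subTasks = decompose(task);                           // step 2
  const results = await Promise.all(subTasks.map((t) => runWorker(t))); // step 3
  return results;                                             // steps 4-5: aggregate, then verify
}

// Toy usage: split a task into parts and "work" each one in parallel.
orchestrate("a,b,c", (t) => t.split(","), async (t) => t.toUpperCase())
  .then((r) => console.log(r)); // [ 'A', 'B', 'C' ]
```

`Promise.all` is also where the failure mode lives: one rejected worker rejects the whole batch, which is exactly when the orchestrator should re-spec and respawn rather than commit partial work.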

Failure Modes and Fixes

Threads End Too Early

Cause: Weak verification.
Fix: Add more tests. Make completion criteria objective. Use screenshot verification for UI.

L-Threads Spin Forever

Cause: Impossible task or missing completion promise.
Fix: Set max iterations. Add explicit completion criteria. Make sure the agent knows when to output the promise.
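A max-iteration guard can live in the stop hook's state. A sketch, assuming the hook persists a counter between invocations itself, e.g. in a JSON file (Claude Code does not supply this; the state shape is an assumption):

```javascript
const MAX_ITERATIONS = 50; // tune per task size

// Decide whether to keep looping. `state` holds a counter the hook
// persists between invocations; `testsPassing` comes from the test run.
function checkIterations(state, testsPassing) {
  const iterations = (state.iterations ?? 0) + 1;
  if (testsPassing) {
    return { decision: "allow", iterations };
  }
  if (iterations >= MAX_ITERATIONS) {
    // Stop for human review instead of grinding forever.
    return { decision: "allow", iterations, reason: "Iteration cap reached." };
  }
  return { decision: "block", iterations, reason: "Tests failing. Continue work." };
}
```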

P-Threads Create Conflicts

Cause: Agents editing the same files.
Fix: Isolate by feature or file. Use git worktrees. Draw hard lines between parallel work.

B-Threads Lose Coherence

Cause: Sub-agents drift from main goal.
Fix: Better specs. More checkpoints. Orchestrator reviews sub-agent output.

Verification Passes But Work Is Wrong

Cause: Tests don't cover the real requirement.
Fix: Tighter acceptance criteria. Screenshot verification on UI. Review the first few runs by hand.

The Implementation Path

Start where you stand. Build toward full autonomy one week at a time.

Week 1: Run reliable base threads. Verify every result manually.

Week 2: Add P-threads. Run two agents in parallel. Handle the context switching.

Week 3: Implement test-driven verification. Write tests before implementation.

Week 4: Try your first L-thread. Use the stop hook. Set a max iteration count. Watch it run.

Week 5: Scale L-threads. Run them overnight. Trust the verification.

Week 6: Add B-threads. Let your agent spawn sub-agents. Orchestrate multi-file changes.

Week 7: Try F-threads. Get multiple opinions on architecture decisions.

Week 8: Combine patterns. P-threads of L-threads. B-threads containing F-threads.

Measure every week. How many threads. How long they ran. How many checkpoints the work needed.

The Destination

Software engineering is heading toward autonomous loops scaled through thread-based thinking.

  • More threads: Parallelism at every level
  • Longer threads: Hours and days, not minutes
  • Thicker threads: Agents spawning agents spawning agents
  • Fewer checkpoints: Verification replaces review

Developers who work this way aren't just "using AI." They're running autonomous software factories.

Ralph gives you the loop. Threads give you the shape. Verification gives you the trust.

Put all three together and you get systems that ship while you sleep.

Next Steps

  • Start with Ralph Wiggum loops for L-thread foundations
  • Learn thread-based engineering for the mental model
  • Study feedback loops for verification patterns
  • Explore async workflows for P-thread management
  • Build custom agents for B-thread orchestration

The loop is simple. Verification is what matters. Threads are the multiplier.

Now start building.
