Autonomous Claude Code
A unified stack for agents that ship features overnight. Threads give you the structure, Ralph loops give you the autonomy, verification keeps it honest.
Two ideas are rewiring how engineers drive AI agents: Ralph Wiggum loops and thread-based engineering.
Ralph is the how-to for keeping an agent running by itself. Threads are the how-to for scaling and measuring that autonomy. Stitched together, they're a working system for building software without a human in the seat.
This post is the stitching.
The Unified Model
Here is how the pieces slot in:
Thread-based engineering supplies the skeleton. Your mental model is threads: base, parallel, chained, fusion, big, and long-duration. Every thread type has its own job.
Ralph loops drive the L-threads. The stop hook pattern, the completion promise, and verification-first development turn long autonomous runs into something you can actually trust.
Verification is what keeps the whole thing standing. Without it, threads bail too early and loops grind forever.
```
Thread Types × Verification → Reliable Autonomous Work
        ↓
Ralph Loops = Implementation of L-Threads
        ↓
Result: Features shipping while you sleep
```
The Verification Stack
Boris Cherny's rule is one sentence: always give Claude a way to verify its work.
It shows up at every layer of the unified model:
| Thread Type | Verification Method |
|---|---|
| Base | Manual review |
| P-Threads | Parallel reviews, consensus |
| C-Threads | Phase-by-phase validation |
| F-Threads | Compare multiple outputs |
| B-Threads | Sub-agent verification |
| L-Threads | Automated tests + stop hooks |
One thing matters more than the others. The longer a thread runs, the more its verification has to run itself. Nobody reviews a 26-hour L-thread by hand. The system has to check itself.
Building the Complete Stack
Here is a working setup that glues every concept together:
Layer 1: Specification (The Pin)
Every autonomous run starts with a spec. That spec is your pin. It stops the agent from inventing the problem.
```markdown
## Feature: User Dashboard

### Scope
- Display user metrics
- Show recent activity
- Add export functionality

### Out of Scope
- Real-time updates (Phase 2)
- Mobile responsiveness (Phase 2)

### Acceptance Criteria
- [ ] Metrics load in under 2 seconds
- [ ] Activity shows last 30 days
- [ ] Export generates valid CSV
```
Point at existing code where you can. Say what the agent shouldn't do. Pin down what "done" looks like in plain terms.
Layer 2: Test-Driven Verification
Write the tests first. Those tests are the verification layer that makes L-threads dependable.
```
// For each acceptance criterion, create a test
tests/
  dashboard/
    metrics.test.ts     # Verifies metrics load time
    activity.test.ts    # Verifies activity display
    export.test.ts      # Verifies CSV generation
```
Running agents execute tests over and over. A loop cannot close until the tests are green. Nothing fuzzy. No early exits.
Layer 3: The Stop Hook
Set your stop hook to enforce verification:
```javascript
// stop-hook.js
module.exports = async function (context) {
  // Run test suite
  const testResult = await runTests();
  if (testResult.failed > 0) {
    return {
      decision: "block",
      reason: `${testResult.failed} tests failing. Continue work.`,
    };
  }

  // Check for completion promise
  if (!context.output.includes("complete")) {
    return {
      decision: "block",
      reason: "Completion promise not found. Verify all work is done.",
    };
  }

  return { decision: "allow" };
};
```
The stop hook is the bouncer. It ignores what Claude thinks. It only cares whether the tests pass.
Layer 4: Thread Selection
Now pick the thread type that fits the work:
Small feature, one file: Base thread. Prompt, agent works, review.
Five independent features: P-threads. Spin up five terminals, assign one feature each.
Database migration with three phases: C-thread. Verify after each phase before continuing.
Critical architecture decision: F-thread. Get three agents' opinions, compare results.
Overnight feature build: L-thread with Ralph loop. Set it running before bed.
Multi-file refactor with sub-tasks: B-thread. Orchestrator spawns workers for each file.
Layer 5: Checkpoint State
Keep state outside the agent. This is especially important for L-threads:
```markdown
## Progress: User Dashboard

### Completed
- [x] Set up test infrastructure
- [x] Implement metrics API endpoint
- [x] Create metrics display component

### In Progress
- [ ] Implement activity feed

### Remaining
- [ ] Add export functionality
- [ ] Performance optimization
```
The agent rewrites this file while it works. If the context window fills up and the agent restarts, it reads the progress file and picks up where it stopped.
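Restarts stay cheap because resume state is just checkbox counting. A tiny sketch of parsing that progress file (the format is assumed to match the example above):

```javascript
// progress.js -- sketch: derive resume state from the checkpoint file.
function parseProgress(markdown) {
  const done = (markdown.match(/- \[x\]/g) || []).length;
  const open = (markdown.match(/- \[ \]/g) || []).length;
  return { done, open, total: done + open };
}

const sample = [
  "## Progress: User Dashboard",
  "- [x] Set up test infrastructure",
  "- [x] Implement metrics API endpoint",
  "- [ ] Implement activity feed",
].join("\n");

console.log(parseProgress(sample)); // { done: 2, open: 1, total: 3 }
```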
UI Verification: The Missing Piece
Tests can pass while the screen is broken.
Any thread that touches UI needs screenshot-based verification on top of the test suite:
Workflow extension for UI work:
1. Complete implementation
2. Take screenshots of affected components
3. Review each screenshot for visual issues
4. Rename verified screenshots with "verified_" prefix
5. Do NOT output completion promise yet
6. Run one more loop to confirm all screenshots verified
7. Only then output "complete"

That is what forces visual review. Claude has no way to skip the screenshot check and still call the work done.
Scaling with Loom
The next rung up is Loom-style orchestration.
Loom is an environment built for agents rather than humans. It wires Ralph loops together into reactive systems.
- Level 1: Single Ralph loop (L-thread)
- Level 2: Multiple parallel Ralph loops (P-threads of L-threads)
- Level 3: Orchestrated chains of loops (B-threads containing L-threads)
- Level 4: Autonomous product systems (agents that ship, observe, and iterate)
At Level 4, agents:
- Ship behind feature flags
- Deploy without code review
- Observe analytics
- Decide if changes worked
- Iterate automatically
This is the Z-thread endpoint. Zero human input. Full autonomy.
Economics of Autonomous Loops
Keeping an agent running costs about $10.42 USD per hour on Sonnet.
That resets the math.
| Approach | Cost | Output |
|---|---|---|
| Human developer | ~$100/hour | 8 hours/day |
| Single agent | ~$10/hour | 24 hours/day |
| 5 parallel agents | ~$50/hour | 120 agent-hours/day |
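The last row of the table is just multiplication. The back-of-envelope math, using the ~$10/agent-hour figure above:

```javascript
// Back-of-envelope: what five always-on agents cost and produce per day.
const agents = 5;
const costPerAgentHour = 10; // USD, the Sonnet estimate above
const hoursPerDay = 24;

const agentHoursPerDay = agents * hoursPerDay; // 120
const dailyCost = agents * costPerAgentHour * hoursPerDay; // 1200

console.log(`${agentHoursPerDay} agent-hours/day for $${dailyCost}/day`);
// → "120 agent-hours/day for $1200/day"
```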
Cost isn't the cap. The cap is how much reliable work you can define.
Teams that get verification-first loops right will ship at a different rate than teams that don't. Not a little different. A different rate entirely.
Common Integration Patterns
Pattern 1: Planning + L-Thread
- C-thread for planning (you verify the plan)
- Fresh context
- L-thread for implementation (Ralph loop)
- Final review
Why it works: Planning and implementation live in separate contexts. You spend your attention on the plan. The build runs on its own.
Pattern 2: P-Thread Feature Sprint
- Write specs for multiple features
- Spin up P-threads (one per feature)
- Each P-thread runs as an L-thread internally
- Review completed features as they finish
Why it works: Parallelism lives at the feature level. Autonomy lives at the implementation level.
Pattern 3: F-Thread Architecture
- Define architectural question
- Spin up F-thread (3-4 agents)
- Each agent proposes a solution
- Compare results, pick the best
- Implement chosen solution with L-thread
Why it works: You get several perspectives on the decisions that hurt to get wrong. Once picked, the build runs autonomously.
Pattern 4: B-Thread Orchestration
- Main agent receives large task
- Decomposes into sub-tasks
- Spawns worker agents (each runs mini L-thread)
- Aggregates results
- Main agent verifies and commits
Why it works: Labor divides cleanly. Each worker keeps a narrow focus. The main agent holds the thread together.
Failure Modes and Fixes
Threads End Too Early
Cause: Weak verification.
Fix: Add more tests. Make completion criteria objective. Use screenshot verification for UI.
L-Threads Spin Forever
Cause: Impossible task or missing completion promise.
Fix: Set max iterations. Add explicit completion criteria. Make sure the agent knows when to output the promise.
P-Threads Create Conflicts
Cause: Agents editing the same files.
Fix: Isolate by feature or file. Use git worktrees. Draw hard lines between parallel work.
B-Threads Lose Coherence
Cause: Sub-agents drift from the main goal.
Fix: Better specs. More checkpoints. Orchestrator reviews sub-agent output.
Verification Passes But Work Is Wrong
Cause: Tests don't cover the real requirement.
Fix: Tighter acceptance criteria. Screenshot verification on UI. Review the first few runs by hand.
The Implementation Path
Start where you stand. Build toward full autonomy one week at a time.
Week 1: Run reliable base threads. Verify every result manually.
Week 2: Add P-threads. Run two agents in parallel. Handle the context switching.
Week 3: Implement test-driven verification. Write tests before implementation.
Week 4: Try your first L-thread. Use the stop hook. Set a max iteration count. Watch it run.
Week 5: Scale L-threads. Run them overnight. Trust the verification.
Week 6: Add B-threads. Let your agent spawn sub-agents. Orchestrate multi-file changes.
Week 7: Try F-threads. Get multiple opinions on architecture decisions.
Week 8: Combine patterns. P-threads of L-threads. B-threads containing F-threads.
Measure every week. How many threads. How long they ran. How many checkpoints the work needed.
The Destination
Software engineering is heading toward autonomous loops scaled through thread-based thinking.
- More threads: Parallelism at every level
- Longer threads: Hours and days, not minutes
- Thicker threads: Agents spawning agents spawning agents
- Fewer checkpoints: Verification replaces review
Developers who work this way aren't just "using AI." They're running autonomous software factories.
Ralph gives you the loop. Threads give you the shape. Verification gives you the trust.
Put all three together and you get systems that ship while you sleep.
Next Steps
- Start with Ralph Wiggum loops for L-thread foundations
- Learn thread-based engineering for the mental model
- Study feedback loops for verification patterns
- Explore async workflows for P-thread management
- Build custom agents for B-thread orchestration
The loop is simple. Verification is what matters. Threads are the multiplier.
Now start building.