Autonomous Claude Code
A unified stack for agents that ship features overnight. Threads give you the structure, Ralph loops give you the autonomy, verification keeps it honest.
Two ideas are rewiring how engineers drive AI agents: Ralph Wiggum loops and thread-based engineering.
Ralph is the how-to for keeping an agent running by itself. Threads are the how-to for scaling and measuring that autonomy. Stitched together, they're a working system for building software without a human in the seat.
This post is the stitching.
The Unified Model
Here is how the pieces slot in:
Thread-based engineering supplies the skeleton. Your mental model is threads: base, parallel, chained, fusion, big, and long-duration. Every thread type has its own job.
Ralph loops drive the L-threads. The stop hook pattern, the completion promise, and verification-first development turn long autonomous runs into something you can actually trust.
Verification is what keeps the whole thing standing. Without it, threads bail too early and loops grind forever.
```
Thread Types × Verification → Reliable Autonomous Work
        ↓
Ralph Loops = Implementation of L-Threads
        ↓
Result: Features shipping while you sleep
```
The Verification Stack
Boris Cherny's rule is one sentence: always give Claude a way to verify its work.
It shows up at every layer of the unified model:
| Thread Type | Verification Method |
|---|---|
| Base | Manual review |
| P-Threads | Parallel reviews, consensus |
| C-Threads | Phase-by-phase validation |
| F-Threads | Compare multiple outputs |
| B-Threads | Sub-agent verification |
| L-Threads | Automated tests + stop hooks |
One thing matters more than the others. The longer a thread runs, the more its verification has to run itself. Nobody reviews a 26-hour L-thread by hand. The system has to check itself.
Building the Complete Stack
Here is a working setup that glues every concept together:
Layer 1: Specification (The Pin)
Every autonomous run starts with a spec. That spec is your pin. It stops the agent from inventing the problem.
```markdown
## Feature: User Dashboard

### Scope
- Display user metrics
- Show recent activity
- Add export functionality

### Out of Scope
- Real-time updates (Phase 2)
- Mobile responsiveness (Phase 2)

### Acceptance Criteria
- [ ] Metrics load in under 2 seconds
- [ ] Activity shows last 30 days
- [ ] Export generates valid CSV
```
Point at existing code where you can. Say what the agent shouldn't do. Pin down what "done" looks like in plain terms.
Layer 2: Test-Driven Verification
Write the tests first. Those tests are the verification layer that makes L-threads dependable.
```
// For each acceptance criterion, create a test
tests/
  dashboard/
    metrics.test.ts     # Verifies metrics load time
    activity.test.ts    # Verifies activity display
    export.test.ts      # Verifies CSV generation
```
Running agents execute tests over and over. A loop cannot close until the tests are green. Nothing fuzzy. No early exits.
Layer 3: The Stop Hook
Set your stop hook to enforce verification:
```javascript
// stop-hook.js
module.exports = async function (context) {
  // Run test suite
  const testResult = await runTests();
  if (testResult.failed > 0) {
    return {
      decision: "block",
      reason: `${testResult.failed} tests failing. Continue work.`,
    };
  }

  // Check for completion promise
  if (!context.output.includes("complete")) {
    return {
      decision: "block",
      reason: "Completion promise not found. Verify all work is done.",
    };
  }

  return { decision: "allow" };
};
```
The stop hook is the bouncer. It ignores what Claude thinks. It only cares whether the tests pass.
Layer 4: Thread Selection
Now pick the thread type that fits the work:
Small feature, one file: Base thread. Prompt, agent works, review.
Five independent features: P-threads. Spin up five terminals, assign one feature each.
Database migration with three phases: C-thread. Verify after each phase before continuing.
Critical architecture decision: F-thread. Get three agents' opinions, compare results.
Overnight feature build: L-thread with Ralph loop. Set it running before bed.
Multi-file refactor with sub-tasks: B-thread. Orchestrator spawns workers for each file.
Layer 5: Checkpoint State
Keep state outside the agent. This is especially important for L-threads:
```markdown
## Progress: User Dashboard

### Completed
- [x] Set up test infrastructure
- [x] Implement metrics API endpoint
- [x] Create metrics display component

### In Progress
- [ ] Implement activity feed

### Remaining
- [ ] Add export functionality
- [ ] Performance optimization
```
The agent rewrites this file while it works. If the context window fills up and the agent restarts, it reads the progress file and picks up where it stopped.
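Restarts stay cheap because resume state is just checkbox counting. A tiny sketch of parsing that progress file (the format is assumed to match the example above):

```javascript
// progress.js -- sketch: derive resume state from the checkpoint file.
function parseProgress(markdown) {
  const done = (markdown.match(/- \[x\]/g) || []).length;
  const open = (markdown.match(/- \[ \]/g) || []).length;
  return { done, open, total: done + open };
}

const sample = [
  "## Progress: User Dashboard",
  "- [x] Set up test infrastructure",
  "- [x] Implement metrics API endpoint",
  "- [ ] Implement activity feed",
].join("\n");

console.log(parseProgress(sample)); // { done: 2, open: 1, total: 3 }
```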
UI Verification: The Missing Piece
Tests can pass while the screen is broken.
Any thread that touches UI needs screenshot-based verification on top of the test suite:
Workflow extension for UI work:
1. Complete implementation
2. Take screenshots of affected components
3. Review each screenshot for visual issues
4. Rename verified screenshots with "verified_" prefix
5. Do NOT output completion promise yet
6. Run one more loop to confirm all screenshots verified
7. Only then output "complete"

That is what forces visual review. Claude has no way to skip the screenshot check and still call the work done.
Scaling with Loom
The next rung up is Loom-style orchestration.
Loom is an environment built for agents rather than humans. It wires Ralph loops together into reactive systems.
- Level 1: Single Ralph loop (L-thread)
- Level 2: Multiple parallel Ralph loops (P-threads of L-threads)
- Level 3: Orchestrated chains of loops (B-threads containing L-threads)
- Level 4: Autonomous product systems (agents that ship, observe, and iterate)
At Level 4, agents:
- Ship behind feature flags
- Deploy without code review
- Observe analytics
- Decide if changes worked
- Iterate automatically
This is the Z-thread endpoint. Zero human input. Full autonomy.
Economics of Autonomous Loops
Keeping an agent running costs about $10.42 USD per hour on Sonnet.
That resets the math.
| Approach | Cost | Output |
|---|---|---|
| Human developer | ~$100/hour | 8 hours/day |
| Single agent | ~$10/hour | 24 hours/day |
| 5 parallel agents | ~$50/hour | 120 agent-hours/day |
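The last row of the table is just multiplication. The back-of-envelope math, using the ~$10/agent-hour figure above:

```javascript
// Back-of-envelope: what five always-on agents cost and produce per day.
const agents = 5;
const costPerAgentHour = 10; // USD, the Sonnet estimate above
const hoursPerDay = 24;

const agentHoursPerDay = agents * hoursPerDay; // 120
const dailyCost = agents * costPerAgentHour * hoursPerDay; // 1200

console.log(`${agentHoursPerDay} agent-hours/day for $${dailyCost}/day`);
// → "120 agent-hours/day for $1200/day"
```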
Cost isn't the cap. The cap is how much reliable work you can define.
Teams that get verification-first loops right will ship at a different rate than teams that don't. Not a little different. A different rate entirely.
Common Integration Patterns
Pattern 1: Planning + L-Thread
- C-thread for planning (you verify the plan)
- Fresh context
- L-thread for implementation (Ralph loop)
- Final review
Why it works: Planning and implementation live in separate contexts. You spend your attention on the plan. The build runs on its own.
Pattern 2: P-Thread Feature Sprint
- Write specs for multiple features
- Spin up P-threads (one per feature)
- Each P-thread runs as an L-thread internally
- Review completed features as they finish
Why it works: Parallelism lives at the feature level. Autonomy lives at the implementation level.
Pattern 3: F-Thread Architecture
- Define architectural question
- Spin up F-thread (3-4 agents)
- Each agent proposes a solution
- Compare results, pick the best
- Implement chosen solution with L-thread
Why it works: You get several perspectives on the decisions that hurt to get wrong. Once picked, the build runs autonomously.
Pattern 4: B-Thread Orchestration
- Main agent receives large task
- Decomposes into sub-tasks
- Spawns worker agents (each runs mini L-thread)
- Aggregates results
- Main agent verifies and commits
Why it works: Labor divides cleanly. Each worker keeps a narrow focus. The main agent holds the thread together.
Failure Modes and Fixes
Threads End Too Early
Cause: Weak verification.
Fix: Add more tests. Make completion criteria objective. Use screenshot verification for UI.
L-Threads Spin Forever
Cause: Impossible task or missing completion promise.
Fix: Set max iterations. Add explicit completion criteria. Make sure the agent knows when to output the promise.
P-Threads Create Conflicts
Cause: Agents editing the same files.
Fix: Isolate by feature or file. Use git worktrees. Draw hard lines between parallel work.
B-Threads Lose Coherence
Cause: Sub-agents drift from the main goal.
Fix: Better specs. More checkpoints. Orchestrator reviews sub-agent output.
Verification Passes But Work Is Wrong
Cause: Tests don't cover the real requirement.
Fix: Tighter acceptance criteria. Screenshot verification on UI. Review the first few runs by hand.
The Implementation Path
Start where you stand. Build toward full autonomy one week at a time.
Week 1: Run reliable base threads. Verify every result manually.
Week 2: Add P-threads. Run two agents in parallel. Handle the context switching.
Week 3: Implement test-driven verification. Write tests before implementation.
Week 4: Try your first L-thread. Use the stop hook. Set a max iteration count. Watch it run.
Week 5: Scale L-threads. Run them overnight. Trust the verification.
Week 6: Add B-threads. Let your agent spawn sub-agents. Orchestrate multi-file changes.
Week 7: Try F-threads. Get multiple opinions on architecture decisions.
Week 8: Combine patterns. P-threads of L-threads. B-threads containing F-threads.
Measure every week. How many threads. How long they ran. How many checkpoints the work needed.
The Destination
Software engineering is heading toward autonomous loops scaled through thread-based thinking.
- More threads: Parallelism at every level
- Longer threads: Hours and days, not minutes
- Thicker threads: Agents spawning agents spawning agents
- Fewer checkpoints: Verification replaces review
Developers who work this way aren't just "using AI." They're running autonomous software factories.
Ralph gives you the loop. Threads give you the shape. Verification gives you the trust.
Put all three together and you get systems that ship while you sleep.
Next Steps
- Start with Ralph Wiggum loops for L-thread foundations
- Learn thread-based engineering for the mental model
- Study feedback loops for verification patterns
- Explore async workflows for P-thread management
- Build custom agents for B-thread orchestration
The loop is simple. Verification is what matters. Threads are the multiplier.
Now start building.