
Thread-Based Engineering

A framework for measuring AI-assisted engineering. Six thread patterns cover nearly every workflow: base, P, C, F, B, and L.

How do you actually know you're getting better at AI-assisted engineering?

Not "feeling productive." Counting it. Measuring it. Showing, with numbers, that this week beats last week.

Thread-based engineering is the frame that makes this possible. Every piece of AI-assisted work becomes a discrete unit called a thread. Once work shows up as threads, you can tune them.

A thread is one unit of engineering work stretched over time, driven by you plus an agent.

Two nodes in every thread need a human:

  1. The beginning. You prompt or plan.
  2. The end. You review or validate.

What about the middle? The agent handles it through tool calls.

That's the base thread. Any time you fire up Claude Code and run a prompt, you've started a thread. The agent executes tool calls (reads files, writes code, runs commands), and when it stops, you check the result.
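In code, a base thread is just a record with a human node at each end and tool calls in between. A minimal sketch (the names are illustrative, not part of any Claude Code API):

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """One unit of agent work: human prompt -> tool calls -> human review."""
    prompt: str                                      # human node 1: you start it
    tool_calls: list = field(default_factory=list)   # agent node: the middle
    reviewed: bool = False                           # human node 2: you close it

    def record(self, call: str) -> None:
        self.tool_calls.append(call)

    @property
    def done(self) -> bool:
        # A thread only counts as complete once a human has reviewed it.
        return self.reviewed and bool(self.tool_calls)

t = Thread(prompt="Fix the login redirect bug")
t.record("read src/auth.py")
t.record("edit src/auth.py")
t.record("run pytest tests/test_auth.py")
t.reviewed = True
print(len(t.tool_calls), t.done)  # 3 True
```

Everything below is a variation on this shape: more of them, longer middles, threads nested inside threads.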

Simple idea. Big consequences.

The core insight: tool calls roughly equal impact (assuming the prompt was worth running).

Before 2023, the tool calls were you. You edited code. You opened files. You ran commands. The whole chain was manual.

Today you show up at the start (prompt) and the end (review). The middle runs itself.

Whoever runs more useful tool calls wins against whoever runs fewer. That's the new scoreboard.

Once the base thread makes sense, scaling it follows. Six patterns cover almost every AI-assisted workflow.

1. Base Thread

The foundation. One prompt, agent work, one review.

Every pattern below builds on this one. A shaky base thread means nothing higher up works either.

Use for. Simple tasks, quick fixes, single-file changes.

2. P-Threads (Parallel Execution)

Several threads running at the same time.

Boris Cherny, who created Claude Code, keeps five Claude Code instances open in his terminal, numbered 1 through 5. On top of that, he runs 5 to 10 more in the Claude Code web interface.

Call it 10 to 15 parallel threads. One agent ships auth, another works API endpoints, another writes tests. You prompt, tab over, prompt, tab over, prompt. Then you circle back to review.

Use for. Independent tasks, code reviews, feature branches, research.

How to improve. Open more terminal windows. Push background agents into the Claude Code web interface. Fork terminals with custom tooling.
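You can also script the fan-out from one terminal. A sketch of P-threads using a worker pool, where `run_agent` is a placeholder standing in for a real headless run (e.g. Claude Code's `claude -p` print mode, shown only as a comment):

```python
from concurrent.futures import ThreadPoolExecutor

TASKS = [
    "Ship the auth middleware",
    "Add the /users API endpoint",
    "Write tests for the billing module",
]

def run_agent(task: str) -> str:
    # Placeholder for a real headless invocation, e.g. (hypothetical):
    #   subprocess.run(["claude", "-p", task], capture_output=True, text=True)
    return f"done: {task}"

# Prompt, tab over, prompt -- here the pool does the tabbing for you.
with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    results = list(pool.map(run_agent, TASKS))

# Circle back: the review node for each finished thread.
for result in results:
    print(result)
```

The review loop at the end is the point: parallelism buys you nothing if finished threads pile up unreviewed.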

3. C-Threads (Chained Workloads)

Multi-phase work with human checkpoints between phases.

Sometimes the work won't fit in one context window. Or the stakes are high enough that you want eyes on every step before the next one starts.

C-threads split work into phases:

  • Phase 1: Database migration
  • Phase 2: API updates
  • Phase 3: Frontend changes

You review between phases. Anything broken gets caught early, before it snowballs into a huge rollback.

Use for. Production deploys, large refactors, sensitive migrations, multi-step workflows.

Trade-off. Your attention. C-threads cost more human time. Run them when the risk earns it.

The `AskUserQuestion` tool in Claude Code supports C-threads out of the box. An agent can pause mid-workflow and ask you something before moving to the next phase.
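Structurally, a C-thread is a loop with a human gate between phases. A sketch with placeholder phase bodies; `approve` stands in for your review and is auto-approved here so the snippet runs unattended:

```python
PHASES = ["Database migration", "API updates", "Frontend changes"]

def run_phase(name: str) -> bool:
    # Placeholder for the agent's work in this phase.
    print(f"agent running: {name}")
    return True

def approve(name: str) -> bool:
    # The human checkpoint. In practice you review the diff here;
    # hardcoded True only so the sketch is self-contained.
    return True

completed = []
for phase in PHASES:
    if not run_phase(phase):
        print(f"{phase} failed; stopping before it snowballs")
        break
    if not approve(phase):
        print(f"{phase} rejected in review; fix before the next phase")
        break
    completed.append(phase)
```

The `break` on a failed phase is the early catch: nothing downstream runs on top of a broken migration.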

4. F-Threads (Fusion)

One prompt, many agents, then you merge the best results.

Think "best of N" across whole workflows. Fire the same prompt at four agents. Look at all four outputs. Pick the winner. Or stitch the strongest pieces from several outputs into something better than any single one.

Why it works. More attempts raise the odds of success. One agent might flounder while another nails it. Four angles beat one.

Use for. Rapid prototyping, research questions, architecture calls, code reviews where confidence matters.

The future of prototyping. F-threads will own rapid prototyping. Launch several agents, hand them the same problem, fuse their outputs. More compute buys more confidence.
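Best-of-N reduces to a few lines once you have a scoring function. A sketch where both `run_agent` and `score` are placeholders (in practice: real agent runs, and a verifier such as test pass rate or a review rubric):

```python
PROMPT = "Prototype a rate limiter for the public API"

def run_agent(prompt: str, seed: int) -> str:
    # Placeholder: each call stands in for an independent agent run.
    return f"candidate {seed}: {prompt.lower()}"

def score(output: str) -> int:
    # Placeholder verifier: in practice, tests passed, benchmark numbers,
    # or a reviewer rubric.
    return len(output)

candidates = [run_agent(PROMPT, seed) for seed in range(4)]  # same prompt, four agents
best = max(candidates, key=score)  # fusion step: keep the winner
```

The fusion step can be smarter than `max`: stitching the strongest pieces from several candidates is the same pattern with a merge function instead of a pick.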

5. B-Threads (Big/Meta)

One thread that holds other threads inside it.

This is where things turn meta. Prompts fire other prompts. Sub-agents spin up more sub-agents. An orchestrator agent runs a planner, then a builder, then a reviewer.

From the engineer seat, you still only prompt at the start and review at the end. Underneath, multiple threads run themselves.

The clearest example. Sub-agents. Tell Claude Code to "use sub-agents to handle these three tasks" and it spawns three threads inside itself. One prompt from you, three threads running.

Use for. Complex multi-file changes, team-of-agents workflows, orchestrated builds.

The pattern. Agents write prompts for you. The orchestrator writes prompts for worker agents. Output goes up 10x without effort going up 10x.
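A toy version of that pattern: an orchestrator function writes the worker prompts, and each worker call stands in for an inner thread. All names here are illustrative:

```python
GOAL = "Add CSV export to the reports page"  # the one prompt you write

def plan(goal: str) -> list:
    # Orchestrator node: writes the prompts for the worker agents.
    return [f"Plan the approach for: {goal}",
            f"Implement: {goal}",
            f"Review the implementation of: {goal}"]

def worker(prompt: str) -> str:
    # Placeholder sub-agent: each call is an inner thread.
    return f"completed: {prompt}"

inner = [worker(p) for p in plan(GOAL)]
# One prompt from you; three threads ran underneath.
```

Swap `plan` for an agent and `worker` for sub-agents and you have the B-thread: prompts firing prompts.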

6. L-Threads (Long Duration)

Extended autonomy with no human in the loop.

The base thread, stretched to its edge. Not 10 tool calls. Try 100. Not 5 minutes. Try 5 hours. Boris has run threads past 26 hours.

L-threads need:

  • Strong prompts (great planning equals great prompting)
  • Solid verification (so the agent can tell when it's finished)
  • Checkpoint state (so work survives context limits)
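The checkpoint-state requirement can be as simple as a progress file. A sketch, assuming a hypothetical `thread_state.json` (a real L-thread would keep it in the repo so fresh context windows can find it):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical checkpoint file, kept in tmp so the sketch is self-contained.
STATE = Path(tempfile.gettempdir()) / "thread_state.json"
STATE.unlink(missing_ok=True)  # start the demo from a clean slate

def save_checkpoint(done: list) -> None:
    STATE.write_text(json.dumps({"done": done}))

def load_checkpoint() -> list:
    return json.loads(STATE.read_text())["done"] if STATE.exists() else []

steps = ["migrate schema", "backfill data", "swap reads over"]
done = load_checkpoint()
for step in steps:
    if step in done:
        continue  # a fresh context window resumes here, skipping finished work
    # ... agent does the step ...
    done.append(step)
    save_checkpoint(done)
```

When the context window fills and a new session starts, `load_checkpoint` is what turns "start over" into "pick up where we left off."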

The link to Ralph. The Ralph Wiggum technique is built around L-threads. A stop hook keeps the agent looping until the work is actually done. No early exits. No hand-holding.

Use for. Overnight feature builds, big codebases, backlog burndown.

One more thread type points at where engineering is heading.

Z-threads. Zero-touch threads. Full trust in your agents. No review node at all.

This isn't vibe coding. This is agentic engineering with so much verification and so many guardrails that reviewing the output is genuinely optional.

The agent ships to production. Watches analytics. Decides whether the change landed. Iterates.

Most engineers aren't there yet. But everything is pointed in that direction. The goal: systems reliable enough that review stops being required.

Every thread pattern comes back to four fundamentals:

  1. Context. What the agent knows
  2. Model. Which model is running
  3. Prompt. What you're asking for
  4. Tools. What the agent can touch

Get those four right and you get capable agents. Any thread optimization lands on one of them.

  • Better prompts mean longer threads
  • Better context means more accurate work
  • Better tools mean more capabilities
  • Better models mean higher reliability

For L-threads especially, the stop hook carries the weight.

When your agent tries to stop, the stop hook intercepts:

  1. Agent tries to complete
  2. Stop hook runs validation code
  3. Decision: Is the task actually complete?
  4. If no: Block the stop, continue iterating
  5. If yes: Allow completion

That's the technical core of Ralph loops. The stop hook won't let the agent quit when it thinks it's done. It lets the agent quit when the work is verified.
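In script form, the decision looks something like this. The sketch assumes Claude Code's hook convention of emitting a JSON `{"decision": "block", ...}` response to keep the agent going; `verify()` is a placeholder for your real check (test suite, lint, a done-marker):

```python
import json

def verify() -> bool:
    # Placeholder verification: run the test suite, query CI, check a
    # done-marker. Hardcoded False so the sketch shows the blocking path.
    return False

def decide(complete: bool):
    # None -> allow completion; a "block" decision -> force another iteration.
    if complete:
        return None
    return {"decision": "block",
            "reason": "Verification failed; keep iterating until it passes."}

decision = decide(verify())
if decision is not None:
    print(json.dumps(decision))  # Claude Code reads this and blocks the stop
```

The `reason` string matters: it goes back to the agent, so it should say what is still failing, not just "try again."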

Thread-based engineering gives you something you can actually measure.

1. Run More Threads (P-Threads)

Can you add more parallel agents? Boris is at 10 to 15. Can you reach 3? Can you reach 5?

Measure. Concurrent threads running.

2. Run Longer Threads (L-Threads)

Can threads go further in tool calls before you have to step in?

Measure. Average tool calls per thread before intervention.

3. Run Thicker Threads (B-Threads)

Can threads sit inside threads? Can one prompt fan out into five sub-agents?

Measure. Work per prompt you write.

4. Run Fewer Checkpoints

Can you cut the number of human reviews? Does your verification earn enough trust to skip them?

Measure. Phases that run before a manual check.

Improvement on any of those four dimensions is real improvement as an agentic engineer. That's the metric. That's how you tell.
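Tracking those four numbers needs nothing more than a log you actually keep. A sketch with made-up illustrative rows, one per finished thread:

```python
from statistics import mean

# One row per finished thread (numbers are illustrative, not real data).
log = [
    {"concurrent": 2, "tool_calls": 12, "sub_threads": 1, "checkpoints": 3},
    {"concurrent": 3, "tool_calls": 48, "sub_threads": 3, "checkpoints": 2},
    {"concurrent": 5, "tool_calls": 95, "sub_threads": 5, "checkpoints": 1},
]

report = {
    "max concurrent threads": max(t["concurrent"] for t in log),    # more threads
    "avg tool calls/thread": mean(t["tool_calls"] for t in log),    # longer threads
    "avg sub-threads/prompt": mean(t["sub_threads"] for t in log),  # thicker threads
    "avg manual checkpoints": mean(t["checkpoints"] for t in log),  # fewer checkpoints
}
for metric, value in report.items():
    print(f"{metric}: {value:.1f}")
```

If the first two averages climb week over week while the last one falls, you are improving, by count rather than by feel.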

Here's what this looks like on a real day:

Monday morning. Five features to ship.

Old way. Feature 1. Done. Feature 2. Done. Repeat. Five sequential sessions.

Thread-based way:

  1. Write specs for all five features (planning phase)
  2. Launch five parallel Claude Code instances (P-threads)
  3. Hand each instance a feature
  4. Review the first finished results while the rest run
  5. Some features need chunked phases (C-threads)
  6. The trickiest one spawns sub-agents (B-thread)
  7. The overnight task runs as an L-thread with Ralph loop

Same five features. Except now you're running more threads, thicker threads, and longer threads.

Thread-based engineering and Ralph loops slot together.

Ralph answers the question: how do I keep an agent running reliably until it's genuinely done?

Thread-based engineering answers: how do I scale agent usage and measure that I'm getting better?

Ralph delivers L-threads. Thread-based engineering tells you when L-threads beat P-threads and when B-threads are the right call.

The stop hook behind Ralph is the same stop hook that keeps L-threads alive. Verification-first development is what makes either pattern work.

The engineers pulling ahead aren't only "using AI." They're thinking in threads.

Every task starts with one question: what kind of thread is this? Parallelize it? Chain it? Nest sub-threads inside?

The bottleneck moves. It used to be "how fast can I code?" Now it's "how many useful threads can I run?"

Scale your compute. Scale your impact.

Start small:

  1. Audit your work. How many threads do you run now? (For most engineers: 1.)

  2. Add one P-thread. Open a second terminal. Run a parallel task while the first agent is still busy.

  3. Time your threads. Count tool calls before you step in. Track that number.

  4. Try a C-thread. Break a big task into explicit phases. Review between phases.

  5. Work toward L-threads. Stand up verification. Let an agent run 30 minutes unattended.

The point isn't hitting 15 parallel Z-threads tomorrow. The point is steady, measurable gain. More threads. Longer threads. Thicker threads. Fewer checkpoints.

That's the only real way to know you're improving. Not feeling. Counting.

Where to go next:

  • Add Ralph Wiggum loops for L-thread autonomy
  • Pick up async workflows for running parallel agents
  • Study sub-agent design for B-thread architectures
  • Build feedback loops for verification patterns

Thread-based engineering turns AI coding from craft into measurable practice. You can track it. You can improve it. You can scale it.

Start counting your threads.
