Agent Swarm Orchestration
Four infrastructure layers that stop agent swarms from double-claiming tasks, drifting on field names, and collapsing under merge chaos.
Running multiple AI agents in parallel without falling apart is harder than it looks.
Most swarms fail the same way. Agents double-claim the same task. They invent different field names for the same data. Merges turn into chaos. One agent loops forever and nobody notices. The failures are consistent because the causes are consistent: missing infrastructure.
Four layers fix all of that. This post walks through each one.
Why most swarms break
The single-agent model is easy to reason about. One agent reads a task, builds it, and either finishes or gets stuck. You see what it did. You fix what it missed.
Add a second agent and the problems multiply. Both agents can see the same task queue. Both can grab the same task at the same time. One of them does work the other already started. You now have two half-finished versions of the same feature and no way to know which one to keep.
Add a third agent and field names start diverging. Agent A calls it userId. Agent B calls it user_id. Agent C writes uid. Three agents, three conventions, three branches that won't merge cleanly.
This is not a model quality problem. Claude Code agents are good at writing code. The problem is coordination infrastructure. Without it, even well-prompted agents produce broken swarms.
The four failure modes appear in order:
| Failure | What happens | When it appears |
|---|---|---|
| Double-claiming | Two agents grab the same task | As soon as 2+ agents run |
| Field name drift | Agents invent different names for shared data | First cross-agent feature |
| Merge chaos | Branches conflict because agents wrote to the same files | At merge time |
| Silent looping | One agent repeats the same failed step indefinitely | Long runs |
Fix these four and a swarm becomes reliable. Skip any one and it breaks.
The four layers
Every working swarm shares the same architecture. The names vary. The shape does not.
| Layer | Name | Job |
|---|---|---|
| 01 | Task Graph | Atomic task claims via database |
| 02 | Process Shell | Each agent in its own worktree |
| 03 | Contracts First | Shared interfaces injected before coding starts |
| 04 | Merge Queue | Serialized merges with tiered conflict resolution |
These are not optional. Remove any one and a specific failure mode returns. The task graph stops double-claiming. The process shell stops file conflicts. Contracts stop field drift. The merge queue stops bad code landing on main.
Layer 1: the task graph
The most common fix people try is a markdown plan file. Agents read it, pick a task, update it. In practice this breaks immediately. Two agents read the file at the same time. Both see the same unclaimed task. Both write status: in-progress in parallel. The file has a race condition baked in.
The fix is a database with atomic transactions.
A task graph is a table with one row per task. Each row has a status column: pending, claimed, done, or failed. Agents claim tasks with a SQL transaction that checks and updates in one atomic step:
```sql
UPDATE tasks
SET status = 'claimed', agent_id = $1, claimed_at = NOW()
WHERE id = $2 AND status = 'pending'
RETURNING id;
```

If two agents run this query at the same time with the same task ID, the database serializes them. One gets the row back. One gets nothing. The agent that gets nothing moves on to the next unclaimed task. No race condition. No duplicate work.
The task graph also tracks dependencies. Task B can only be claimed after Task A reaches done. This keeps agents from trying to build a payment form before the payment table exists.
Three columns do most of the work:
| Column | Type | Purpose |
|---|---|---|
| status | enum | pending / claimed / done / failed |
| agent_id | text | Which agent holds the claim |
| depends_on | int[] | Task IDs that must complete first |
A SQLite file on disk is enough for single-machine swarms. Supabase or Postgres works for anything distributed. The database is not the complex part. The transaction pattern is.
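The transaction pattern is small enough to sketch. Here is a minimal Python version of the claim step against SQLite, assuming the tasks schema described later in this post (the function name and connection handling are illustrative, not from any specific library):

```python
import sqlite3

def claim_task(conn: sqlite3.Connection, task_id: int, agent_id: str) -> bool:
    """Try to claim one task; returns True only if this agent won the claim.

    The UPDATE checks status='pending' and flips it to 'claimed' in a single
    statement, so the database serializes competing claimers for us.
    """
    cur = conn.execute(
        "UPDATE tasks SET status = 'claimed', agent_id = ?, "
        "claimed_at = datetime('now') "
        "WHERE id = ? AND status = 'pending'",
        (agent_id, task_id),
    )
    conn.commit()
    # rowcount is 1 for the winner and 0 for everyone else.
    return cur.rowcount == 1
```

An agent that gets False back simply asks for the next pending task instead of retrying the same one.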
Layer 2: process isolation
Agents sharing a working directory fight over the same files. Two agents editing the same file at the same time produce conflicts at best and corrupted output at worst. Git's index can only track one active operation at a time. When two agents both run git add in the same repo simultaneously, one of them fails with index.lock.
Git worktrees solve this completely.
A worktree is a separate checkout of the same repository at a different path on disk. Each checkout has its own working directory, its own index, and its own HEAD. The agents share the underlying object store but nothing else.
You create one worktree per agent at the start of a swarm run:
```shell
git worktree add ../agent-a-worktree feature/auth
git worktree add ../agent-b-worktree feature/payments
git worktree add ../agent-c-worktree feature/email
```

Agent A works in agent-a-worktree/. Agent B works in agent-b-worktree/. They never touch each other's directories. No index locks. No file conflicts during the build phase.
The worktree for each agent is pointed at its own branch. When Agent A is done with feature/auth, that branch merges back through the merge queue (Layer 4). The worktree is then cleaned up or reused for the next task.
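The full lifecycle is only a handful of Git commands. A self-contained sketch (the throwaway repo, agent directories, and branch names are invented for the demo; in a real swarm run you already have a repository):

```shell
# Demo repo so the snippet runs standalone.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git -c user.email=swarm@local -c user.name=swarm commit -q --allow-empty -m "init"

# Setup: one worktree per agent, each on its own new branch.
git worktree add -b feature/auth ../agent-a-worktree
git worktree add -b feature/payments ../agent-b-worktree

# ... agents build in their own directories ...

# Teardown after each branch lands through the merge queue.
git worktree remove ../agent-a-worktree
git worktree remove ../agent-b-worktree
git worktree prune
```

Removing a worktree keeps its branch; only the checkout directory goes away, so the merge queue can still land the branch later.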
What each agent gets:
| Resource | Shared | Per-agent |
|---|---|---|
| Git object store | Yes | No |
| Working directory | No | Yes |
| Index | No | Yes |
| HEAD pointer | No | Yes |
| Branch | No | Yes |
This is the layer that makes true parallelism possible. Agents cannot accidentally overwrite each other's work because they are never writing to the same location.
Layer 3: contracts first
Field name drift is invisible until merge time. Agent A builds an API that returns { userId: "abc" }. Agent B builds a frontend that reads data.user_id. Both work in isolation. At merge time, the frontend reads undefined and the team spends two hours tracing why.
The fix is shared type contracts injected into every agent prompt before coding starts.
A contract is a TypeScript interface (or JSON schema, or plain type definition) that all agents agree to use. You write the contracts before any agent starts:
```typescript
// contracts/user.ts
export interface User {
  userId: string;
  email: string;
  createdAt: string;
}

export interface ApiResponse<T> {
  data: T;
  error: string | null;
}
```

Every agent gets these contracts in its system prompt. The orchestrator injects the full contracts file at the top of each agent's context. Agents are instructed to use the defined types and not invent new field names.
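The injection itself is mechanical: read the contracts file and prepend it to each agent's prompt. A minimal Python sketch (the function name and prompt wording are illustrative; your orchestrator's API will differ):

```python
from pathlib import Path

def build_system_prompt(task_description: str, contracts_path: str) -> str:
    """Prepend the shared type contracts to an agent's system prompt."""
    contracts = Path(contracts_path).read_text()
    return (
        "You MUST use exactly the types and field names defined below. "
        "Do not invent alternative names (no user_id, no uid).\n\n"
        "```typescript\n" + contracts + "\n```\n\n"
        "Task: " + task_description + "\n"
    )
```

Because every agent sees the same contracts text at the top of its context, the field names it emits converge on the contract's spelling.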
The result is measurable. Without contracts, a six-agent swarm building a SaaS backend produced three variants of the user ID field across six branches. Three of the six branches failed to merge cleanly. The integration quality score (measured by counting type errors across the merged codebase) was 28.
With contracts injected at the start, a four-agent swarm building the same feature used userId everywhere. Zero branches failed at merge. Quality score reached 68.
What changes with contracts:
| Check | Without | With contracts |
|---|---|---|
| User field name | userId / user_id / uid | userId everywhere |
| Branch merge failures | 3 of 6 fail | 0 of 4 fail |
| Quality score | 28 | 68 |
| Merge time | Unpredictable | FIFO, tiered |
The contracts file does not have to be large. Five to ten type definitions covering the shared data models are enough for most features. Add to it as the codebase grows.
Layer 4: the merge queue
Parallel branches are useful until they need to land. Without a queue, the team hits git merge on two branches at the same time, gets conflicts on both, and loses track of which resolution to keep.
A FIFO merge queue serializes landings and handles conflicts in tiers.
Agents push their completed branches to the queue. The queue processes one branch at a time, oldest first. For each branch, it tries four resolution steps:
```text
Tier 1: git merge --no-ff (clean merge, no conflicts)
   ↓ fails
Tier 2: deterministic auto-resolve (whitespace, import order, lock files)
   ↓ fails
Tier 3: LLM resolver per conflicted file (Claude reads both versions, picks one)
   ↓ fails
Tier 4: human review (branch parked, notification sent)
```

Most merges land at Tier 1 or Tier 2. Tier 3 handles the cases where two agents both modified the same function with different changes. Tier 4 is rare and reserved for conflicts where neither automatic approach is safe.
The key constraint: the LLM resolver in Tier 3 is bounded. It resolves one file at a time. It must produce valid code or it rejects the merge entirely. Prose output is not accepted. A merge that cannot be resolved automatically reaches Tier 4 and parks there until a human reviews it.
This design keeps the queue predictable. Branches land in order. Every landing is logged with the tier it required. Over time, a pattern of Tier 3 conflicts in the same files tells you where the contracts are incomplete.
The cost math
Swarms cost more than sequential runs. That is true and worth knowing before you run one.
A single Claude Code agent completing a task uses a baseline token count. Add a second agent and you roughly double the tokens (two agents, two context windows). Add parallel coordination overhead and the multiplier rises further.
For complex multi-module tasks:
| Scenario | Tokens | Cost | Quality gain |
|---|---|---|---|
| Sequential (1 agent) | 1x | $9 | Baseline |
| Swarm (20 agents, 6 hrs) | 6.7x | $60 | +28% quality |

The swarm costs $51 more and finishes roughly 2 hours sooner.
The 3.4x token multiplier for complex tasks produces 28% better output quality, measured by type error count and test pass rate on the merged codebase. For simple tasks the multiplier is higher (3.9x), but so is the quality gain (+32%).
The rule is straightforward:
Use a swarm when you have three or more independent modules that can be built in parallel. Auth, payments, and email are a good example. They share types but do not share implementation files. Three agents building in parallel with proper contracts and worktrees finish faster and produce cleaner code than one agent doing all three sequentially.
Do not use a swarm when the work fits in one context window. A single agent with full context is cheaper, simpler to debug, and produces equivalent output for tasks that are inherently sequential.
How to build your own version
You do not need a complex stack to run this. The minimum shape works on one machine.
What you need:
- A SQLite file (or Postgres if you want multiple machines)
- The git worktree command (built into Git, no install needed)
- A contracts file with your shared types
- A merge script that implements the four tiers
Start with the task graph. Create a SQLite table with the columns above. Write a small script that lets agents claim tasks atomically. Test it with two agents racing to claim the same task. Only one should succeed.
```sql
CREATE TABLE tasks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending','claimed','done','failed')),
  agent_id TEXT,
  claimed_at DATETIME,
  depends_on TEXT -- JSON array of task IDs
);
```

Add worktrees next. Write a setup script that creates one worktree per agent before the swarm starts. The script should also clean up worktrees after agents finish. Stale worktrees accumulate fast in long swarm runs.
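The two-agents-racing test is worth automating. A sketch using two threads with separate connections to the same SQLite file (the harness and agent names are invented; the UPDATE is the same check-and-claim statement from Layer 1):

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical harness: two "agents" race to claim task 1.
db = os.path.join(tempfile.mkdtemp(), "tasks.db")
conn = sqlite3.connect(db)
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, "
    "status TEXT DEFAULT 'pending', agent_id TEXT, claimed_at TEXT)"
)
conn.execute("INSERT INTO tasks (id, title) VALUES (1, 'build auth')")
conn.commit()
conn.close()

winners = []

def race(agent_id: str) -> None:
    # timeout=5 makes the loser wait for the lock instead of erroring out.
    c = sqlite3.connect(db, timeout=5)
    cur = c.execute(
        "UPDATE tasks SET status='claimed', agent_id=? "
        "WHERE id=1 AND status='pending'",
        (agent_id,),
    )
    c.commit()
    if cur.rowcount == 1:
        winners.append(agent_id)
    c.close()

threads = [threading.Thread(target=race, args=(a,)) for a in ("agent-a", "agent-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))  # 1 — exactly one claim succeeds
```

Whichever thread commits first flips the status; the other thread's UPDATE then matches zero rows. If both claims ever succeed, the claim path is not atomic and the graph is broken.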
Write your contracts file before creating any agent prompt. Put it in a shared location that every agent can access. Make it a non-negotiable part of the agent's system prompt.
Build the merge queue last. Start with Tier 1 and Tier 4 only. A clean merge lands immediately. A conflict parks for human review. Add Tier 2 and Tier 3 once you have a sense of what kinds of conflicts come up most in your codebase.
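A Tier 1 + Tier 4 queue fits in a few lines. A Python sketch shelling out to Git (function names are invented; a real queue would add logging and notifications):

```python
import subprocess

def land(branch: str) -> str:
    """Tier 1: attempt a clean --no-ff merge. Anything else parks (Tier 4)."""
    result = subprocess.run(
        ["git", "merge", "--no-ff", "--no-edit", branch],
        capture_output=True, text=True,
    )
    if result.returncode == 0:
        return "tier1"
    # Abort so the tree is clean and the next branch in the queue can still land.
    subprocess.run(["git", "merge", "--abort"], capture_output=True)
    return "tier4"  # parked for human review

def drain(branches: list[str]) -> dict[str, str]:
    """FIFO: oldest branch first, strictly one merge at a time."""
    return {branch: land(branch) for branch in branches}
```

The important property is that a failed merge leaves the repository clean: the queue keeps draining and only the conflicted branch waits for a human.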
One rule per layer:
- Task graph: always use transactions. Never a file.
- Process shell: one worktree per agent. Never a shared working directory.
- Contracts: inject at the top of every agent prompt. Non-negotiable.
- Merge queue: never merge two branches simultaneously. Always serialize.
Where else this pattern applies
The four-layer architecture is not specific to feature builds.
Security audits benefit from the same shape. Multiple agents scan different parts of the codebase in parallel, each in its own worktree, each writing findings to a shared task graph. The merge queue combines their reports without duplication.
Content pipelines use it too. Multiple agents draft different sections of a document in parallel. Contracts define the shared outline structure. The merge queue combines sections in the right order.
Performance profiling runs several agents in parallel across different subsystems. Contracts define the shared benchmark format so all reports are comparable. The queue serializes which recommendations land.
The specific tools change. SQLite becomes Postgres. Worktrees become Docker containers. TypeScript contracts become JSON schemas. The four layers stay the same.
Task graph stops double-claiming. Process shell stops file conflicts. Contracts stop drift. Merge queue stops bad code reaching main. That is the whole model.