Agent Teams Workflow
The full Claude Code agent teams workflow. Structured planning, contract chains, and wave execution that ships production code from parallel agents.
Problem: Agent Teams is turned on. You've run your first team. Spawning works. But the output is a pile of half-integrated pieces you end up stitching together yourself. The gap between "agent teams run" and "agent teams ship production code" is a process, not a feature flag.
Quick Win: A workflow that makes agent teams reliable runs in two phases. Planning strips out assumptions and locks down the contracts between domains. Execution then spawns agents in waves and hands each wave the contracts it needs so parallel work glues together cleanly. The full workflow is below.
This is the companion guide to the Agent Teams overview. If you haven't flipped the feature on or run a first team, start there. For controls and config, see Advanced Controls. For copy-paste templates, see Use Cases.
Being able to spawn parallel Claude sessions that talk is a capability. Agentic workflows need more than a capability. Raw spawning without a process is like handing five contractors the keys to a site with no blueprints: everyone builds, nothing fits.
Two failure modes show up whenever there's no workflow:
- Assumption drift. Each agent picks its own data shapes, naming, error formats, and edge cases. The backend hands back `{ notif_type: "comment" }` and the frontend reads `{ type: "COMMENT" }`. Unit tests pass on both sides. Integration explodes.
- Missing validation. Every agent reports success, so the lead marks the tasks done. Nobody tested the full flow. The app breaks the first time you click through it, and buried errors surface.
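As a sketch of the fix for assumption drift: a shared type that both sides import turns the mismatch into a compile-time error instead of an integration failure. The names below are hypothetical, not from any particular project:

```typescript
// Hypothetical shared contract both agents import instead of guessing.
export type NotificationType = "comment" | "mention";

export interface Notification {
  type: NotificationType; // one canonical field name, one casing
}

// The backend constructs the contract shape; `notif_type` would not compile.
const payload: Notification = { type: "comment" };

// The frontend reads the same shape; matching on "COMMENT" would not compile either.
function label(n: Notification): string {
  return n.type === "comment" ? "New comment" : "New mention";
}

console.log(label(payload)); // "New comment"
```

The point isn't this specific type; it's that the shape lives in exactly one place that every agent references.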
One pipeline fixes both:
- Brain dump your requirements (unstructured, messy, that's fine)
- Research and Q&A where Claude investigates your codebase and asks you clarifying questions
- Structured plan with team members, dependency chains, and acceptance criteria
- Fresh context where you start a new session with just the plan
- Contract chain analysis where the lead derives interfaces from the plan's dependency graph
- Wave execution where agents build in parallel against injected contracts
- Validation loop where the lead runs end-to-end checks against acceptance criteria
Each step exists because skipping it produces a specific failure. The rest of this post walks them one by one.
Step 1: Brain Dump
Don't polish the input. Write what you want in plain language. Include anything on your mind, even when it's messy, half-formed, or contradicts itself. Goal: drain your head onto the page.
This isn't a spec. It's raw material. The whole point of the brain dump is catching intent and context that dies the moment you try to force a formal spec out of your head.
A brain dump doubles as a scope check. If you can't sketch what you want in a few paragraphs, the feature is too big for a single agent team session. Chop it up first.
Step 2: Research and Q&A
This is the step most people skip, and it's the one that matters most. The goal: reduce assumptions before any code gets written.
Most agent team failures don't come from bad code. They come from misalignment. The agent ships something that isn't what you wanted because it guessed wrong. Fix: have Claude read your codebase and ask you questions before it plans anything. Structured team planning replaces the ad-hoc "drop into plan mode" habit.
Not 2 or 3 questions. At least 10. Every extra question Claude asks is one fewer assumption baked into the plan.
Claude will come back with questions like:
- Should the billing page live under the dashboard layout or be standalone?
- What token package tiers and prices do you want?
- Should we use ChargeB's hosted checkout or embedded in-app?
- What should happen in the chat UI when a user has zero tokens?
- Do you already have a ChargeB account and webhook endpoint configured?
Every question is a potential fork in the implementation. An agent that guesses "embedded checkout" and starts building will trash everything once you tell it you wanted hosted. Ten minutes of Q&A kills hours of rework.
The AskUserQuestion tool in Claude Code makes this fast. Most answers are multiple choice. Click through where Claude's pick is right. Type a custom answer where you need precision.
For more structured takes on the planning phase, see auto-planning strategies.
Step 3: Turn It Into a Structured Plan
With Q&A closed out, bake everything into a structured plan. That plan becomes the single artifact driving the whole team run. It has to carry:
- Task description and objective so the team knows what success looks like
- Relevant files so agents know what exists and what to create
- Team members with named roles, agent types, and single responsibilities
- Step-by-step tasks with dependency chains (`Depends On` fields) and file ownership boundaries
- Acceptance criteria that are specific and measurable
- Validation commands that can be run to verify the work
Here's the structure that makes team orchestration work:
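A plan following that structure might look like this sketch. The feature, roles, and tasks are illustrative, not a required schema:

```markdown
# Plan: Token Billing Feature

## Objective
Users can buy token packages and see their balance in the chat UI.

## Relevant Files
- src/db/schema.ts (extend)
- src/app/api/billing/ (create)
- src/app/billing/ (create)

## Team
- db-engineer (backend agent): schema and shared types only
- api-engineer (backend agent): billing endpoints and webhook only
- ui-engineer (frontend agent): billing page and balance widget only

## Tasks
1. Define token purchase schema + shared types — db-engineer — Depends On: none
2. Build purchase + webhook endpoints — api-engineer — Depends On: 1
3. Build billing page + balance widget — ui-engineer — Depends On: 1

## Acceptance Criteria
- A completed purchase updates the user's balance
- Zero-token users see the buy prompt in chat

## Validation Commands
- npm run typecheck
- npm test
```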
The plan does three jobs at once: it draws file ownership lines (no conflicts), it encodes a dependency chain (so execution knows what to build first), and it writes down acceptance criteria (so validation has something concrete to aim at).
Look at the Depends On fields. They're not docs. They form the contract chain the execution phase uses to pick which agents to spawn first and which interfaces to pull between waves.
Step 4: Start Fresh With the Plan
This one feels wrong and still matters. Open a new Claude Code session with only the plan loaded. Do not keep going in the same context window where you did the brain dump and Q&A.
Why? Planning conversations burn context. They're stuffed with exploratory questions, dead ideas, and back-and-forth that's now irrelevant. The plan is the distilled version. It already contains what the team needs. The planning chatter is dead weight crowding out the build.
This also means plans are reusable. A session that fails partway through restarts from the same plan file. No planning work gets redone.
Step 5: Contract Chain Analysis
Before any agent spawns, the lead walks the plan's dependency graph and pulls out the contract chain. This is the step that makes agent teams trustworthy for production builds.
A contract chain groups tasks into waves by their dependencies, then names the outputs each completing wave produces that the next wave needs:
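The grouping itself is mechanical: repeatedly collect every task whose dependencies have all completed. A sketch of that wave derivation, with illustrative task names:

```typescript
interface Task {
  id: string;
  dependsOn: string[]; // the plan's "Depends On" fields
}

// Group tasks into waves: each wave contains every task whose
// dependencies all completed in earlier waves.
function buildWaves(tasks: Task[]): string[][] {
  const done = new Set<string>();
  let remaining = [...tasks];
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in the plan");
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

const plan: Task[] = [
  { id: "database", dependsOn: [] },
  { id: "api", dependsOn: ["database"] },
  { id: "frontend", dependsOn: ["database"] },
  { id: "validation", dependsOn: ["api", "frontend"] },
];

console.log(buildWaves(plan));
// [ [ "database" ], [ "api", "frontend" ], [ "validation" ] ]
```

The lead does this reasoning over the plan rather than running code, but the logic is the same: wave N+1 is whatever becomes unblocked when wave N's contracts land.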
Here's the key insight: no agent starts work until its contracts exist. The database agent runs first. When it finishes, it messages the lead with the real schema definitions, table types, and relationships. Those outputs ARE the contract. The lead then pastes those concrete schemas straight into the spawn prompts for the API and frontend agents, so both build against real interfaces instead of inventing their own.
That's why parallel agents end up with code that actually integrates: they aren't guessing data shapes independently. They're all writing against the same contract, and that contract came from work that already shipped.
What Gets Injected as a Contract
Contracts aren't abstract specs. They're the actual outputs of upstream work:
- Database completes -> schema contract: exact table definitions, column types, foreign keys, TypeScript types
- API completes -> API contract: endpoint routes, request/response shapes, status codes, auth requirements
- Shared types created -> type contract: interfaces, enums, constants that multiple agents reference
Every downstream agent gets the matching contracts pasted into its spawn prompt. Not "go read what the database agent did." The raw content. That kills the failure mode where one agent reads stale files or misreads another agent's output. Subagent patterns leave each agent guessing in its own corner. A proper agentic workflow wires every agent to a verified interface.
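Concretely, an injected schema contract might be a verbatim snippet like the one below. The tables, fields, and helper are hypothetical, invented for illustration:

```typescript
// Hypothetical schema contract handed back by the database agent.
export type PurchaseStatus = "pending" | "completed" | "failed";

export interface TokenPurchase {
  id: string;
  userId: string; // FK -> users.id
  tokens: number;
  status: PurchaseStatus;
}

// Downstream agents build against these exact keys...
function toResponse(p: TokenPurchase) {
  return { id: p.id, tokens: p.tokens, status: p.status };
}

// ...so the API and frontend agree without ever talking to each other.
const sample: TokenPurchase = { id: "p1", userId: "u1", tokens: 500, status: "completed" };
console.log(toResponse(sample)); // { id: "p1", tokens: 500, status: "completed" }
```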
Step 6: Wave Execution
With the contract chain laid out, the lead spawns agents in waves.
Wave 1 (Foundation): The database agent goes first. It handles foundational work: schemas, shared types, config. The lead waits for completion and takes delivery of the contract.
Wave 2+ (Parallel): Once the schema contract lands, the API and frontend agents spawn together. Each one gets the schema contract dropped into its prompt, alongside its task assignment and file ownership lines.
Teammate spawn prompts follow this structure:
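A spawn prompt with those pieces might look like this sketch (role, paths, and task numbers are illustrative):

```markdown
You are api-engineer on the token billing team.

## Your Task
Build the purchase and webhook endpoints (plan task 2).

## File Ownership
You own src/app/api/billing/ and nothing else.
Do not edit src/db/ or src/app/billing/.

## Upstream Contracts (build against these, do not re-derive them)
<paste the schema contract from the database agent here, verbatim>

## Downstream Obligations
When done, message the lead with your endpoint routes and
request/response shapes. The frontend agent builds against them.
```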
The load-bearing pieces: file ownership boundaries (no two agents touch the same files), upstream contracts (actual content, not references), and downstream contract obligations (what this agent has to produce for others).
Turn delegate mode on so the lead coordinates instead of writing code itself. During execution the lead's job is watching the task list, sorting out contract mismatches, and nudging agents that start to drift.
Brownfield Codebases
Most real work happens inside an existing codebase. Brownfield agent teams need convention consistency. Three agents editing the same project at once still need to follow the patterns that are already there.
Fix: document your project conventions in CLAUDE.md (naming, error handling, file layout, testing approach). Agent teams read CLAUDE.md as shared runtime context, so every teammate starts aligned from line one. Skip this and one agent ships camelCase API responses while another ships snake_case because they each picked a convention on the fly.
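A minimal conventions section might look like this; the specific entries are examples to adapt, not recommendations:

```markdown
## Conventions
- API responses: camelCase keys; errors as { error: { code, message } }
- Naming: kebab-case files, PascalCase React components
- Layout: route handlers in src/app/api/, shared types in src/types/
- Testing: Vitest, one *.test.ts next to each module
```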
Step 7: Post-Build Validation
Teammates finishing their tasks doesn't mean the feature is done. Parallel builds produce components that pass on their own but crack open at the seams. Validation catches those failures.
The plan's acceptance criteria and validation commands drive this step. The lead (or a dedicated quality-engineer agent) grinds through each criterion one by one:
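That pass might look like this checklist (criteria and commands are illustrative, drawn from the plan's validation section):

```markdown
- [ ] `npm run typecheck` exits 0
- [ ] `npm test` passes, including the new billing tests
- [ ] POST to the webhook endpoint with a sample payload returns 200
- [ ] Purchase flow clicked through end to end: buy -> webhook -> balance updates
- [ ] Zero-token state in chat shows the buy prompt
```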
The False Positive Problem
The scariest post-build scenario: every agent marks its tasks complete and reports success, but buried errors are sitting there. It happens because agents are incentivized to close tasks and sometimes rubber-stamp their own work.
Counter it by asking for evidence, not confirmation. Never ask "did everything work?" Ask for specific outputs:
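Evidence requests might look like these (wording is illustrative):

```markdown
- "Paste the full output of the test run, including the summary line."
- "Show the exact request you sent to the webhook and its response body."
- "Paste the rendered balance after a test purchase."
- "State which acceptance criterion each output satisfies."
```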
Demanding evidence forces the lead to actually verify instead of pattern-matching on a vibe. Same idea as code review: you read the diff, you don't just ask the author if it works.
The Fix-Retest Cycle
When validation turns up an issue, the loop is simple:
- Lead identifies the mismatch (e.g., webhook handler returns wrong status code)
- Lead messages the responsible teammate or spawns a targeted fix agent
- Fix is applied
- Lead re-runs the affected validation checks
Most runs converge in 1 or 2 iterations if contracts were set up right. Without contracts, the fix-retest loop spirals because every fix unearths another assumption mismatch. That spiral is exactly why the contract chain exists.
For structured fix loops built on dependency chains, the builder-validator pattern formalizes this as task dependencies where validators run automatically after builders complete.
| Step | Phase | Action | Why It Matters |
|---|---|---|---|
| 1. Brain dump | Planning | Write requirements in plain language | Captures intent without premature structure |
| 2. Research & Q&A | Planning | Claude investigates codebase, asks 10+ questions | Eliminates assumptions before planning |
| 3. Structured plan | Planning | Define team, tasks, dependencies, acceptance criteria | Gives each agent a clear, non-overlapping scope |
| 4. Fresh context | Execution | Start new session with just the plan | Maximizes context, discards planning noise |
| 5. Contract chain | Execution | Derive wave order and interfaces from dependency graph | Prevents integration failures in parallel builds |
| 6. Wave execution | Execution | Spawn agents in waves with contracts injected | Fast parallel build with guaranteed compatibility |
| 7. Validation | Execution | End-to-end testing against acceptance criteria | Catches seam failures that individual tests miss |
Steps 1-3 (planning) take 15 to 30 minutes of interactive work. Steps 4-7 (execution) run mostly unattended once the first wave goes out.
The ratio is the point: 30% of your time on planning and contracts cancels out the 70% of rework you'd otherwise eat cleaning up integration failures from a poorly planned parallel build.
All seven steps can be done by hand with raw prompts. But the repetitive bits (plan formatting, contract chain derivation, wave execution, validation sequencing) are mechanical enough to wrap into reusable commands.
The concepts here (assumption reduction through Q&A, contract chains between waves, evidence-based validation) apply no matter whether you run them with commands, raw prompts, or a custom orchestration layer.
Pick a feature that crosses at least two layers (frontend + backend, or API + database):
- Brain dump what you want
- Have Claude ask you 10 clarifying questions
- Build a structured plan with team members and dependency chains
- Start fresh and let the lead derive the contract chain
- Watch agents build in waves against shared contracts
- Validate the integration points with evidence, not confirmation
The first run takes longer because you're building the muscle memory. By the third feature, the workflow becomes second nature and the compounding kicks in: plans turn into reusable templates, contracts turn into standardized interfaces, and validation criteria stack into a project-wide quality baseline.
Stop configuring. Start building.