Fork Subagents in Claude Code

Problem: Running parallel subagents in Claude Code used to be expensive. Each child agent rebuilt the system prompt, the full tool array, and the entire conversation history from scratch. Five agents meant five full input token bills. No sharing. Before v2.1.117, the fork path existed in the codebase but was dead-code-eliminated in public releases. Only internal Anthropic builds could use it.

Quick Win: One env var changes the math. Set CLAUDE_CODE_FORK_SUBAGENT=1 and children 2-N pull their shared prefix straight from the prompt cache. Roughly a 10x cost reduction per additional child:

export CLAUDE_CODE_FORK_SUBAGENT=1

What Fork Subagents Actually Are

A fork child does not get a fresh session. It gets the exact bytes of its parent's already-rendered system prompt, via override.systemPrompt pulled from toolUseContext.renderedSystemPrompt. Same bytes, not a new render. This matters because feature flag state (like GrowthBook) stays consistent between parent and child without any extra coordination.

The tool array follows the same rule. It passes through with useExactTools: true, which bypasses resolveAgentTools() entirely. Serialization stays byte-identical, which is what makes the prompt cache work. There is one wrinkle: the Agent tool stays in the child's tool pool even though the child is forbidden from calling it. Removing it would break the cache. The restriction is enforced a different way (more on that below).

Every parent tool-use block in the child's message history gets a FORK_PLACEHOLDER_RESULT constant ('Fork started -- processing in background') as its result. Every child receives identical placeholder bytes. The cache boundary falls right before the per-child directive, so children diverge only at the instruction specific to their task.

Before v2.1.117, this entire path was removed from public builds at compile time. External users could not trigger it at all.

The Cost Math

At 48,500 tokens of shared prefix, the numbers are concrete:

Scenario	Per-child token cost
Without fork (each child rebuilds)	~48,700 tokens
With fork (children 2-N hit cache)	~5,050 tokens

Child one pays full price. It is the first request, so there is nothing in the cache yet. Children 2-N hit the shared prefix and pay for only the delta.

Five agents working in parallel: without fork you spend roughly 243,500 tokens on shared context. With fork you spend ~48,700 for child one plus ~20,200 for children two through five. The cache does the rest.

How to Enable It

Add the env var to your shell profile:

export CLAUDE_CODE_FORK_SUBAGENT=1

For CI/CD pipelines, set it in your environment variables at the pipeline level. Every agent run in that pipeline will pick it up.

One behavior to know: fork only fires when subagent_type is omitted from the Agent tool call. If the model specifies an explicit type (for example "Explore" or "Plan"), the fork path does not trigger. The named-type paths have their own logic.

The Recursive Fork Guard

This is the detail that does not appear anywhere else.

Each fork child gets a boilerplate XML tag injected into its session. The tag says, roughly: that instruction is for the parent. You ARE the fork. Do NOT spawn sub-agents.

Without this, children would try to fork their own children, and those would fork further, and the whole thing would recurse until the context window collapsed. The XML tag breaks the chain.

There are two guards, not one. The first is a querySource === 'agent:builtin:fork' fast-path check at the top of the agent options handler. The second is a fallback message-history scan that looks for the boilerplate XML tag itself. The fallback exists because querySource can be rewritten by autocompact during long sessions. Both checks need to agree the session is a fork child before the guard fires.

Fork is also incompatible with coordinator mode. A forked coordinator would inherit the parent's "you are the coordinator, delegate work" system prompt and start orchestrating instead of executing. The two modes cannot share a session.

What It Unlocks

Parallel agent dispatches are the main use case. Five agents across five modules run simultaneously, and children 2-5 share the parent's cached prefix. The total cost for the shared context drops to roughly one additional child's worth of tokens.

CI/CD pipelines get this automatically once CLAUDE_CODE_FORK_SUBAGENT=1 is in the environment. Multi-agent workflows that were previously cost-prohibitive at scale become practical.

The "policy islands" pattern also becomes usable. Commands with context: fork and agent: <name> in their frontmatter spawn isolated subagents with pre-declared allowed_tools. Those subagents run without mid-workflow approval prompts and return only a summary to the parent. The parent never sees the intermediate tool calls.

Limits to Know

Incompatible with --print mode. Fork uses permissionMode: 'bubble', which surfaces prompts to a parent terminal. In non-interactive SDK/headless runs there is no terminal.
Incompatible with coordinator mode. A forked coordinator inherits orchestration instructions and tries to delegate instead of execute.
Context window scales with session length. Fork children carry the full parent conversation history. Long sessions with many children cost more in total, even with caching. Hot long sessions are where this bites hardest.
Opt-in, not default. The env var must be set explicitly. Nothing changes without it.

What It Unlocks at Scale

With CLAUDE_CODE_FORK_SUBAGENT=1 set and Pro/Max users now at high effort by default (also as of v2.1.117), the combination is meaningful. An orchestrator running at high effort dispatches parallel fork children. Each child inherits the full parent context at cache-discounted rates. Each child runs at high effort. Before this release, Pro/Max users had neither.

Set the env var, run five agents in parallel, and four of them cost roughly one-tenth of what they did before. That is the whole thing.