CLAUDE_CODE_FORK_SUBAGENT=1 lets parallel child agents share their parent's prompt cache prefix, cutting input token costs by up to 90% for children 2-N.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
Problem: Running parallel subagents in Claude Code used to be expensive. Each child agent rebuilt the system prompt, the full tool array, and the entire conversation history from scratch. Five agents meant five full input token bills. No sharing. Before v2.1.117, the fork path existed in the codebase but was dead-code-eliminated in public releases. Only internal Anthropic builds could use it.
Quick Win: One env var changes the math. Set CLAUDE_CODE_FORK_SUBAGENT=1 and children 2-N pull their shared prefix straight from the prompt cache. Roughly a 10x cost reduction per additional child:
export CLAUDE_CODE_FORK_SUBAGENT=1A fork child does not get a fresh session. It gets the exact bytes of its parent's already-rendered system prompt, via override.systemPrompt pulled from toolUseContext.renderedSystemPrompt. Same bytes, not a new render. This matters because feature flag state (like GrowthBook) stays consistent between parent and child without any extra coordination.
The tool array follows the same rule. It passes through with useExactTools: true, which bypasses resolveAgentTools() entirely. Serialization stays byte-identical, which is what makes the prompt cache work. There is one wrinkle: the Agent tool stays in the child's tool pool even though the child is forbidden from calling it. Removing it would break the cache. The restriction is enforced a different way (more on that below).
Every parent tool-use block in the child's message history gets a constant () as its result. Every child receives identical placeholder bytes. The cache boundary falls right before the per-child directive, so children diverge only at the instruction specific to their task.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
FORK_PLACEHOLDER_RESULT'Fork started -- processing in background'Before v2.1.117, this entire path was removed from public builds at compile time. External users could not trigger it at all.
At 48,500 tokens of shared prefix, the numbers are concrete:
| Scenario | Per-child token cost |
|---|---|
| Without fork (each child rebuilds) | ~48,700 tokens |
| With fork (children 2-N hit cache) | ~5,050 tokens |
Child one pays full price. It is the first request, so there is nothing in the cache yet. Children 2-N hit the shared prefix and pay for only the delta.
Five agents working in parallel: without fork you spend roughly 243,500 tokens on shared context. With fork you spend ~48,700 for child one plus ~20,200 for children two through five. The cache does the rest.
Add the env var to your shell profile:
export CLAUDE_CODE_FORK_SUBAGENT=1For CI/CD pipelines, set it in your environment variables at the pipeline level. Every agent run in that pipeline will pick it up.
One behavior to know: fork only fires when subagent_type is omitted from the Agent tool call. If the model specifies an explicit type (for example "Explore" or "Plan"), the fork path does not trigger. The named-type paths have their own logic.
This is the detail that does not appear anywhere else.
Each fork child gets a boilerplate XML tag injected into its session. The tag says, roughly: that instruction is for the parent. You ARE the fork. Do NOT spawn sub-agents.
Without this, children would try to fork their own children, and those would fork further, and the whole thing would recurse until the context window collapsed. The XML tag breaks the chain.
There are two guards, not one. The first is a querySource === 'agent:builtin:fork' fast-path check at the top of the agent options handler. The second is a fallback message-history scan that looks for the boilerplate XML tag itself. The fallback exists because querySource can be rewritten by autocompact during long sessions. Both checks need to agree the session is a fork child before the guard fires.
Fork is also incompatible with coordinator mode. A forked coordinator would inherit the parent's "you are the coordinator, delegate work" system prompt and start orchestrating instead of executing. The two modes cannot share a session.
Parallel agent dispatches are the main use case. Five agents across five modules run simultaneously, and children 2-5 share the parent's cached prefix. The total cost for the shared context drops to roughly one additional child's worth of tokens.
CI/CD pipelines get this automatically once CLAUDE_CODE_FORK_SUBAGENT=1 is in the environment. Multi-agent workflows that were previously cost-prohibitive at scale become practical.
The "policy islands" pattern also becomes usable. Commands with context: fork and agent: <name> in their frontmatter spawn isolated subagents with pre-declared allowed_tools. Those subagents run without mid-workflow approval prompts and return only a summary to the parent. The parent never sees the intermediate tool calls.
--print mode. Fork uses permissionMode: 'bubble', which surfaces prompts to a parent terminal. In non-interactive SDK/headless runs there is no terminal.With CLAUDE_CODE_FORK_SUBAGENT=1 set and Pro/Max users now at high effort by default (also as of v2.1.117), the combination is meaningful. An orchestrator running at high effort dispatches parallel fork children. Each child inherits the full parent context at cache-discounted rates. Each child runs at high effort. Before this release, Pro/Max users had neither.
Set the env var, run five agents in parallel, and four of them cost roughly one-tenth of what they did before. That is the whole thing.