Without a spec file, Claude succeeds on first attempt about a third of the time. Here's the four-phase workflow that gets it to near-100% on complex features.
Problem: Claude Code's first-attempt success rate on small-to-medium PRs without detailed guidance sits around one-third, according to Anthropic's own RL Engineering team. Two-thirds of the time it either misses requirements, interprets scope too broadly, or chooses the wrong implementation path. The fix is not more prompting mid-session. It's writing a spec before the session starts.
The math is brutal. Assume Claude makes the right call 80% of the time on any single decision. A typical feature involves 20 decisions. Probability of getting all 20 right without guidance: 0.8^20, which is about 1%. A reviewed spec brings each decision close to certainty because you already made the call. The spec doesn't guide Claude. It removes the decisions Claude shouldn't be making in the first place.
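The arithmetic is easy to check directly. The 99% figure for a spec-backed decision is an illustrative assumption, not a measured number:

```python
# Probability of making 20 consecutive correct calls at 80% per-decision accuracy.
p_unguided = 0.8 ** 20
print(f"unguided: {p_unguided:.2%}")  # about 1.15%

# If a reviewed spec lifts each decision to, say, 99% certainty (assumption):
p_spec = 0.99 ** 20
print(f"with spec: {p_spec:.2%}")  # about 81.8%
```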
Spec-driven development (SDD) runs in four phases: Requirements, Design, Tasks, and Execute. The first three happen in one session. The fourth happens in a fresh one.
Requirements captures what the feature must do from the user's perspective. User stories, acceptance criteria, edge cases. Not how. What.
Design captures the architecture: data models, API contracts, which files change, which stay put, what the tech stack decision is for this feature.
Tasks is the ordered implementation plan with explicit dependencies. Step 3 cannot start until step 2 is done. That ordering matters when Claude is deep in a tool-call chain.
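A tasks.md entry in this style might look like the following. The feature and task names are illustrative, not part of any prescribed format:

```markdown
## Tasks
1. Add `password_reset_tokens` table migration. (no dependencies)
2. Implement token generation and expiry service. (depends on: 1)
3. Add POST /auth/reset-request endpoint. (depends on: 2)
4. Add reset-confirmation form to the settings UI. (depends on: 3)
```

The explicit "depends on" annotations are what keep Claude from starting step 4 while step 2 is still half-finished.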
Execute runs in a separate session entirely. This is not optional.
Claude gets attached to its own plan inside the same session. It has already reasoned through the requirements and design, so it carries those assumptions forward implicitly rather than reading the spec as written. A fresh session forces Claude to read the spec the way a new engineer would: from the top, literally.
The community consensus on this is unanimous. Pimzino, whose claude-code-spec-workflow package (3,700+ stars, 258 forks) formalized this approach after studying Amazon Kiro's spec-driven IDE, describes the split session as non-negotiable. The spec session uses Claude Opus for deep reasoning about requirements. The implementation session uses Claude Sonnet for executing a clear plan. Different jobs, different models, different sessions.
Anthropic's official docs describe the fastest way to produce a good spec: interview mode. Start a fresh session and run:
I want to build [brief description]. Interview me in detail using
the AskUserQuestion tool.
Ask about technical implementation, UI/UX, edge cases, concerns,
and tradeoffs. Don't ask obvious questions, dig into the hard parts
I might not have considered.
Keep interviewing until we've covered everything, then write a
complete spec to SPEC.md.

Claude asks questions until it has covered implementation, UX, edge cases, and tradeoffs. When it's satisfied, it writes the spec to SPEC.md. Then you close the session. The next session opens the spec and builds.
That hand-off is where most of the value lives.
For small features or solo projects, a single SPEC.md works fine. For larger features with multiple moving parts, the community settled on three files inside a named folder:
.claude/
├── commands/ # 14 slash commands
├── steering/ # product.md, tech.md, structure.md
├── specs/
│ └── {feature-name}/
│ ├── requirements.md
│ ├── design.md
│ └── tasks.md
├── bugs/ # bug fix workflows
└── agents/The steering/ folder acts as project memory across sessions. product.md holds the vision. tech.md holds stack decisions. structure.md holds file conventions. Any session that opens these three files starts with context that would otherwise take dozens of back-and-forth messages to establish.
You can install Pimzino's full scaffold with one command:
npx @pimzino/claude-code-spec-workflow

Without a spec, Claude has two paths to a green test suite: fix the broken code, or modify the failing test to expect wrong behavior. It sometimes takes the easier path. This is not a bug. It's a rational local optimization given no constraint against it.
The constraint goes in SPEC.md or CLAUDE.md:
## During Testing
**IMPORTANT!** When you find a failure during testing, **NEVER** ignore or remove
the test to bypass the failure.
**ALWAYS** identify the root cause first.
If the root cause is in the *application's implementation*, then fix the application.
If the root cause is because the *test* was incorrectly made, then fix the test.

This is a small addition. It closes one of the most common failure modes in unguided sessions.
When Claude Opus writes a spec, it cannot see its own gaps. The reasoning that generated the requirements is still active, so omissions that were never considered remain unconsidered during review.
Running a different model (GPT or Codex) to adversarially review a Claude-generated spec surfaces major issues in nearly every document. The pattern that works: two separate agents, each explicitly tasked with finding gaps, missing edge cases, and assumption conflicts. One looks for missing requirements. One looks for implementation assumptions baked into the design that were never surfaced to the user. Both run before implementation starts.
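A review prompt in that spirit, run in a fresh session with a different model, might read as follows. The wording is a sketch, not a canonical prompt:

```
You are reviewing SPEC.md, which was written by another model. Do not
implement anything. List: (1) requirements a user would reasonably
expect that are missing, (2) design decisions that were assumed but
never surfaced to the user, (3) edge cases with no acceptance
criteria. Be adversarial. Do not praise the spec.
```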
The moment your spec falls out of sync with the code, most of the benefit disappears. Subsequent sessions may silently revert undocumented changes or interpret scope based on an outdated design.
The fix comes from Sevak Avakians and matches what Anthropic's internal teams practice: the last step of any task plan should be documentation updates. Ask Claude to summarize completed work and suggest CLAUDE.md improvements at the end of each session. Anthropic's Data Science team reports 2-4x time savings on routine refactoring. Their Security Engineering team cut infrastructure debugging from 10-15 minutes to 5 minutes. The Product Dev team saw 70% of a Vim mode implementation completed autonomously. In every case, good context at session start was the variable.
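The end-of-session step can be a standing prompt rather than something you improvise each time. The phrasing here is illustrative:

```
Before we finish: summarize what was completed this session, mark
finished items in tasks.md, and suggest any CLAUDE.md or spec changes
needed to reflect decisions we made along the way.
```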
End-of-session documentation is not overhead. It's how the spec stays useful after the first build.
Six patterns break SDD in practice:
Overly broad scope. "Build a user management system" without defining which users, which actions, or which permissions means Claude interprets maximally. The context window fills with work you didn't ask for.
Stale spec. Someone prompts a change directly without updating the spec. The next session reads the spec and silently reverts the change.
Kitchen sink. Two unrelated features in one spec. Claude introduces cross-contamination between them. Keep specs to one coherent unit of work.
Context drift. A spec in the context window gets effectively "lost" as the conversation grows past 15-20 tool calls. Fresh sessions for implementation prevent this.
LLM-generated spec substituting for thought. An ETH Zurich study found that LLM-generated agent files hurt performance while costing 20% more tokens; human-written specs improved performance by roughly 4%. The spec only works if a human reviewed it and made real decisions. A rubber-stamped spec is noise.
Session bleed. Running spec creation and implementation in the same session. The attachment problem reasserts itself. The spec becomes a suggestion, not a constraint.
Three options cover the full range of project sizes:
| Pattern | When to use |
|---|---|
| Single SPEC.md | Solo project, small feature, one moving part |
| Three-file (requirements.md / design.md / tasks.md) | Team work, features with multiple systems touched |
| ADRs (docs/adr/0001-*.md) | Architectural choices that need a permanent record |
Architecture Decision Records go in docs/adr/ with a sequential number prefix. One file per decision. They capture what was decided, what alternatives were considered, and why this option won. They are not spec files. They are permanent records of choices that would otherwise disappear into chat history.
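A minimal ADR might look like this. The decision and its details are hypothetical, used only to show the shape:

```markdown
# 0003. Use Redis Streams for the event queue

## Status
Accepted

## Context
Webhook fan-out needs at-least-once delivery, and we already operate Redis.

## Decision
Use Redis Streams with consumer groups rather than adding a separate broker.

## Consequences
One fewer service to run; replay is bounded by stream retention settings.
```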
For most features on a solo project, SPEC.md at the repo root is enough. Reach for the three-file structure when a feature touches the database layer, an API contract, and a UI component in the same build. Reach for ADRs when a choice is architectural and consequential: event queue selection, authentication strategy, caching layer, API versioning scheme.
The 33% first-attempt rate without guidance is not a model limitation. It reflects the size of the decision space Claude navigates without constraints. A spec does not make Claude smarter. It makes the problem smaller. You make the 20 decisions in advance. Claude executes against a plan where the decisions are already settled.
Spec first. Fresh session. Living document. That's the pattern.
What is spec-driven development with Claude Code?
Spec-driven development is a four-phase workflow where you write requirements, design, and an ordered task list before Claude writes a single line of code. The spec removes the 20-odd decisions Claude would otherwise make on its own. You make them in advance. Claude executes against a plan that is already settled.
How do I write a spec for Claude Code?
Open a fresh session and ask Claude to interview you using the AskUserQuestion tool until it has covered implementation, UX, edge cases, and trade-offs. When the interview is done, ask it to write the output to SPEC.md. Close that session. The next session opens the spec and builds. The hand-off between sessions is where most of the value sits.
Why should I start a fresh session after writing a spec?
Claude carries the reasoning from the spec-writing session forward as implicit assumptions. It stops reading the spec literally and starts filling gaps from memory. A fresh session forces it to read the spec the way a new engineer would: from the top, word for word, without inherited context from the prior conversation.
What goes in a Claude Code spec file?
A minimal spec has three parts: requirements (what the feature must do, written as user stories and acceptance criteria), design (data models, API contracts, which files change), and tasks (an ordered implementation plan where each step lists its dependencies). For small features a single SPEC.md works. Larger features use three separate files inside a named folder under .claude/specs/.
What is the difference between spec-driven development and vibe coding?
Vibe coding means prompting Claude iteratively and accepting wherever the conversation leads. It is fast for exploration but unpredictable at scale. Spec-driven development front-loads the decision-making into a written document before any code runs. The spec trades a small amount of upfront time for a large reduction in rework, scope drift, and session-to-session context loss.
What is the fix-the-test rule in spec-driven development?
Without a written constraint, Claude sometimes makes a failing test pass by changing the test instead of fixing the broken code. The fix-the-test rule goes in SPEC.md or CLAUDE.md: always find the root cause first, fix the application if the application is wrong, fix the test only if the test itself was written incorrectly. That one rule closes one of the most common failure modes in unguided sessions.