speedy_devv, koen_salo

Experimentation Mindset

Five Claude Code experiments to replace guessing with data. Test prompt styles, file context, planning mode, CLAUDE.md settings, and context pressure.

Problem: You reach for the same prompting patterns every session. You have no idea if they're the best ones, or if they're quietly costing you hours.

Quick Win: Run this one experiment right now and watch prompt style change Claude's output:

Prompt A: "Fix the bug in auth.ts"
Prompt B: "Let's work through auth.ts together. What might be wrong?"
Prompt C: "Review auth.ts like a senior engineer. List every issue you find."

Compare the three results. The collaborative version usually produces deeper analysis and catches edge cases the direct version walks past. That's the experiment loop in action.

Experiment 1: Prompt Style Impact

The same task gives different outputs based on how you frame it. Try these three styles on any debugging task:

Style 1 (Command): "Fix the bug in login.ts"
Style 2 (Teaching): "Walk me through login.ts and help me find what's wrong"
Style 3 (Review): "Act as a code reviewer. Analyze login.ts for bugs, security issues, and edge cases."

What you'll find: Style 3 (review mode) tends to surface more issues because it frames the work as analysis rather than quick-fix execution.

Here's the pattern in the real world. Run all three styles against a login handler with a clear null-check bug plus a subtle timing flaw in the rate limiter. Style 1 (command) patches the null check and stops. Style 2 (teaching) patches the null check and explains why it matters, but still misses the timing issue. Style 3 (review) catches the null check, the timing vulnerability, and flags error messages leaking internal state to the client.

The pattern holds across task types, but the size of the gap varies. For security-sensitive code, review mode catches noticeably more. For simple UI bugs, the gap shrinks and command style does fine. For refactors, teaching mode often produces the cleanest results because it forces Claude to explain every change as it makes it.

Log which style wins for each task type in your projects. After a handful of sessions, you'll have your own lookup table: security work gets review mode, simple fixes get command mode, refactors get teaching mode.
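To make the fixture concrete, here's a hypothetical sketch of what the review-style output looked like in a run like the one above. Every name, limit, and detail is illustrative, but it shows all three fixes: the null check added, the rate-limit window actually expiring, and the error messages scrubbed of internal state.

```typescript
// Hypothetical review-mode output for the login handler described above.
// Names and limits are illustrative; password hashing is elided for brevity.

type User = { passwordHash: string };

const attempts = new Map<string, { count: number; windowStart: number }>();
const WINDOW_MS = 60_000;
const MAX_ATTEMPTS = 5;

// The subtle flaw review mode caught: the original refreshed windowStart
// on every request, so the window never expired and counts grew forever.
function rateLimitOk(ip: string, now: number): boolean {
  let entry = attempts.get(ip);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    entry = { count: 0, windowStart: now };
  }
  entry.count += 1;
  attempts.set(ip, entry);
  return entry.count <= MAX_ATTEMPTS;
}

function login(user: User | undefined, password: string, ip: string): string {
  if (!rateLimitOk(ip, Date.now())) throw new Error("Too many attempts");
  // The obvious flaw all three styles caught: no null check.
  if (!user) throw new Error("Invalid credentials");
  // Review mode also flagged the original message here for serializing
  // internal state into the error sent back to the client.
  if (user.passwordHash !== password) throw new Error("Invalid credentials");
  return "session-token";
}
```

Command mode would ship only the `if (!user)` line. The other two fixes are exactly the kind of thing that shows up when the prompt frames the work as review rather than quick-fix execution.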

Experiment 2: File Context Impact

Claude's code quality shifts based on what files you feed it as context. Test it yourself:

Experiment A: "Refactor errors.ts to use a cleaner pattern"
Experiment B: "Refactor errors.ts. Reference utils/errors.ts and AuthController.ts for the existing pattern."

The second version produces refactors that fit your existing patterns. Without context, Claude invents patterns that may fight with your codebase.

Try this on a real project with a custom error hierarchy. In experiment A, Claude writes a generic try/catch with new Error() calls. Clean code, but it ignores the AppError, ValidationError, and NotFoundError classes already in the repo. The refactored file won't even compile cleanly alongside the rest of the code; the type mismatches show up immediately.

In experiment B, Claude finds the error classes in utils/errors.ts, reads how AuthController.ts already uses them, and produces a refactor that matches the existing pattern. Same task, same model, same session. The only difference was three extra file references.

The real lesson runs deeper than "hand Claude more files." There's a sweet spot. Including 2-4 tightly related files improves output quality consistently. Past 6-7 files, you hit diminishing returns because Claude spreads attention across too many patterns. Start with the type definition file and one file that already implements the pattern you want.
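Here's a hypothetical shape for the utils/errors.ts hierarchy in that example, plus the kind of pattern-matching output experiment B produced. All names are illustrative; the point is that the refactor throws the repo's own error classes instead of bare new Error() calls.

```typescript
// Hypothetical utils/errors.ts hierarchy (illustrative names):
export class AppError extends Error {
  constructor(message: string, public readonly status: number = 500) {
    super(message);
    this.name = new.target.name;
  }
}

export class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

export class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404);
  }
}

// Experiment A's output invented its own pattern: throw new Error("...").
// Experiment B's output, with the two reference files in context, matched
// the hierarchy above:
export function findUser(id: string, users: Map<string, unknown>): unknown {
  if (!id) throw new ValidationError("id is required");
  const user = users.get(id);
  if (!user) throw new NotFoundError(`user ${id}`);
  return user;
}
```

Notice that findUser carries HTTP status semantics for free because the error classes encode them. That's the payoff of referencing the type definition file plus one consumer: the output plugs into existing error-handling middleware without translation.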

Experiment 3: Planning Mode vs Direct Execution

Planning mode (Shift+Tab twice) changes how Claude approaches problems. Run this comparison:

Direct: "Add user permissions to the admin dashboard"
Planning: [Shift+Tab twice] "Add user permissions to the admin dashboard"

In planning mode, Claude reads your codebase and presents options before touching a single file. You review the approach, catch issues early, and pick the path forward.

The gap shows up fast on multi-file features. Run this experiment on a permissions system spanning the auth middleware, dashboard routes, and frontend components. Direct execution dives straight into the middleware and starts coding. It makes assumptions about the role structure that fight with the existing user model, and half the changes have to be reverted.

Planning mode, by contrast, surfaces three architecture options before writing any code. One option uses the existing role enum. Another suggests a permissions table for finer granularity. The third proposes a hybrid. Pick option one, and the implementation lines up with the codebase from the first file touched.

Where the threshold sits: Complex features touching 3+ files almost always benefit from planning mode. Simple fixes (single file, obvious change) don't. The break-even sits around two files: there, planning adds about 30 seconds of overhead but saves the backtracking. Below that, it's ceremony for ceremony's sake.
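As a sketch of what "option one" looked like in that run (all names hypothetical): reuse the existing role enum and map each permission to the minimum role allowed to exercise it, instead of introducing a new permissions table.

```typescript
// Hypothetical sketch of option one: permissions keyed to the existing
// Role enum, ordered from least to most privileged.

enum Role {
  Viewer = 0,
  Editor = 1,
  Admin = 2,
}

// Each permission maps to the minimum role that may exercise it.
const REQUIRED_ROLE: Record<string, Role> = {
  "dashboard.view": Role.Viewer,
  "users.edit": Role.Editor,
  "users.delete": Role.Admin,
};

function can(role: Role, permission: string): boolean {
  const required = REQUIRED_ROLE[permission];
  if (required === undefined) return false; // unknown permission: deny by default
  return role >= required;
}
```

The hybrid option from the planning run would swap the in-memory Record for a permissions table; the `can` check itself stays the same, which is why reviewing the options before any code exists is cheap insurance.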

Experiment 4: CLAUDE.md Configurations

Your CLAUDE.md file shapes Claude's behavior directly. Test different configurations:

Configuration A: (empty CLAUDE.md)
Configuration B:
  - Always ask before making changes outside the current feature directory
  - Use conventional commits
  - Prefer small, focused changes over large refactors

Run the same task with each one and compare:

  • How many clarifying questions does Claude ask?
  • Does it follow your project patterns?
  • Are the commits scoped properly?

With Configuration A, Claude treats every directory as open ground. Ask it to add a dashboard feature and it modifies the database schema, updates the API routes, and changes the deployment config in one shot. No questions asked. One giant commit.

With Configuration B, the same request triggers a clarifying question: "The dashboard feature needs a new API endpoint. Should I create it in src/api/ or modify the existing routes.ts?" The resulting work lands as three focused commits, each touching only the files for that layer.

Your CLAUDE.md isn't documentation. It's an operating system. The instructions you put there shape every decision Claude makes. Run different instructions against the same task and watch the output shift. Small changes in CLAUDE.md produce surprisingly large changes in behavior. Check the rules directory guide for more granular control over Claude's behavior per file type.

Experiment 5: Context Window Pressure

This one is quietly important. Claude's output quality drops under context pressure in ways that don't announce themselves. Run this test:

Start a fresh session and ask Claude to build a moderately complex feature (something touching 3-4 files). Note the quality. Then, in an existing session that's been running for a while with accumulated context, ask for the exact same feature.

The fresh session usually produces cleaner code with better error handling. The loaded session cuts corners: shorter variable names, fewer edge case checks, sometimes skipping tests. It's not that Claude forgets how to write good code. It's that the model spreads attention across all the accumulated context, leaving less room for the current task.

The practical takeaway: use /compact aggressively before complex tasks. Run /compact before any feature touching more than two files, and the quality difference is easy to see. Some developers prefer /clear for a totally fresh start, but you lose conversation history. The /compact command splits the difference by summarizing previous context without throwing it out.

Build Your Own Experiment Log

Create a simple tracking file to capture what you learn:

# Claude Code Experiments

## Prompt Style
- Review mode: best for security and auth code
- Command mode: best for quick UI fixes
- Teaching mode: best for refactors

## Planning Mode
- Use for 3+ files
- Skip for single-file changes

This log becomes your personal optimization guide. After a month, you'll have a dataset of what actually works in your codebase with your patterns. That's worth more than any generic advice, because the right approach shifts by project, by language, and by the kind of work you're doing.

Being specific is the point. "Review mode is better" isn't useful. "Review mode catches 2-3x more security issues in auth code" is something you act on every session.

Next Experiments to Try

Ready to test more? Try these:

  1. Context management: What happens at 60% vs 90% context usage?
  2. Auto-planning: Does automated planning beat manual mode switching?
  3. Model selection: How do different models handle the same complex task?
  4. Deep thinking: Does extended thinking produce better architecture calls?

Stop guessing. Run the experiment. Log the result. Iterate.

Every session is a chance to find something that makes you faster. Developers who experiment systematically outrun the ones who stick with their first instinct. Every workaround is a pattern waiting to be named. Every named pattern is a skill Claude can reuse.

This is what the "system evolution mindset" means in practice. It's one of the 5 Claude Code best practices that separate top developers from everyone else.

More in this guide

  • Agent Fundamentals
    Five ways to build specialized agents in Claude Code, from sub-agents to .claude/agents/ definitions to perspective prompts.
  • Agent Patterns
    Orchestrator, fan-out, validation chain, specialist routing, progressive refinement, and watchdog. Six ways to wire sub-agents in Claude Code.
  • Agent Teams Best Practices
    Battle-tested patterns for Claude Code agent teams. Troubleshooting, limitations, plan mode quirks, and fixes shipped from v2.1.33 through v2.1.45.
  • Agent Teams Controls
    Stop your agent team lead from grabbing implementation work. Configure delegate mode, plan approval, hooks, and CLAUDE.md for teams.
  • Agent Teams Prompt Templates
    Ten tested Agent Teams prompts for Claude Code. Code review, debugging, feature builds, architecture calls, and campaign research. Paste and go.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Get Build This Now
