A second Sonnet model reviews every Claude Code tool call before it fires. What auto mode blocks, what it allows, and the allow rules it drops in your settings.
Problem: Every Claude Code user burns out on permission prompts. You're three files into a refactor, Claude needs npm test, and a modal drops in front of your work. Approve. File read. Approve. Migration write. Approve. After thirty prompts you're not reading them anymore. You're just clicking.
The other option was --dangerously-skip-permissions. That flag pulls every safety rail out. Fine inside a container. On your laptop, with SSH keys and .env files and git credentials sitting right there? Not an option any adult should pick.
Auto mode is the middle path. It shipped March 24, 2026, and it works by running a second AI in the background. Every tool call Claude wants to make gets inspected first. Risky calls get blocked and Claude gets told why. Safe ones run with no prompt at all. The reviewer sits between Claude and your filesystem, and it makes the call you would have made, faster than you could click.
Auto mode is a new permission mode. It slots between default (you review everything) and bypassPermissions (nothing is reviewed). Turn it on and Claude stops showing prompts. Before each tool call actually runs, a separate classifier model looks at the conversation so far and the pending action, then decides pass or block.
Three risk categories drive the decision:
- Scope escalation: is the action beyond what you actually asked for?
- Untrusted infrastructure: is the target something the classifier has no reason to trust?
- Prompt injection: does the action look like it came from hostile content Claude read in a file or webpage?
Pass and the action fires. Block and Claude gets the reason back so it can try a different approach. Your hands stay on the keyboard. The reviewer stays on watch.
Every classifier call runs on Claude Sonnet 4.6, no matter which model your session uses. Input is your user messages plus the pending tool calls. Claude's own prose and prior tool results are stripped out on purpose. Because tool output never lands in the classifier's context, nothing malicious inside a file or page can reach in and flip the decision.
Your CLAUDE.md does get passed through. Project rules feed into what the classifier accepts and refuses. Static allow and deny lists match tool names and arguments like a grep pass. The classifier reads prose and reasons about intent, so it handles cases pattern matching cannot.
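As a mental model, the context filtering can be sketched as a transcript filter. This is an illustrative Python sketch, not Anthropic's implementation; the message shape and field names are assumptions:

```python
def classifier_context(transcript, pending_calls, claude_md):
    # Keep only the user's own messages. Claude's prose and all tool
    # results are deliberately dropped, so hostile content inside a file
    # or webpage can never reach the classifier and flip its decision.
    user_messages = [m for m in transcript if m["role"] == "user"]
    return {
        "project_rules": claude_md,   # CLAUDE.md is passed through
        "messages": user_messages,
        "pending": pending_calls,     # the tool calls awaiting a verdict
    }
```

Note that even a tool result screaming "IGNORE ALL PREVIOUS RULES" never appears in the returned context.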
Each tool call walks a fixed ladder. First match wins:
| Step | Check | Result |
|---|---|---|
| 1 | Matches your allow or deny rules | Resolves immediately |
| 2 | Read-only action or file edit in working directory | Auto-approved |
| 3 | Everything else | Goes to classifier |
| 4 | Classifier blocks | Claude retries with alternative approach |
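The ladder can be sketched in Python. This is a toy model under assumed shapes: `ToolCall`, `matches`, and the glob matching are illustrative stand-ins, not Claude Code's real matcher:

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class ToolCall:
    rule: str               # e.g. "Bash(npm test)"
    read_only: bool = False
    edits_cwd: bool = False

def matches(call, rules):
    # settings.json rules are glob-like patterns such as "Bash(git *)"
    return any(fnmatch(call.rule, r) for r in rules)

def resolve(call, allow, deny, classify):
    # Step 1: static allow/deny rules resolve immediately.
    if matches(call, deny):
        return "denied"
    if matches(call, allow):
        return "allowed"
    # Step 2: read-only actions and working-directory edits auto-approve.
    if call.read_only or call.edits_cwd:
        return "allowed"
    # Steps 3-4: everything else goes to the classifier; a block is
    # handed back to Claude so it can retry with a different approach.
    return "allowed" if classify(call) == "pass" else "blocked"
```

First match wins, so a denied call never pays for a classifier round trip.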
Your settings.json rules still run first. Bash(npm test) in the allow list executes without the classifier ever waking up. Bash(rm -rf *) in the deny list gets killed before the classifier sees it.
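For reference, those static rules live under the `permissions` key in settings.json. A minimal sketch; the rule strings are examples, not a recommended policy:

```json
{
  "permissions": {
    "allow": ["Bash(git status)", "Bash(npm test)"],
    "deny": ["Bash(rm -rf *)"]
  }
}
```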
Here is the gotcha most people miss: the moment you flip into auto mode, Claude Code removes your broad allow rules that hand out arbitrary execution. Anything like Bash(*), Bash(python*), Bash(node*), and every Agent allow rule gets pulled for the duration.
The reason is blunt: if Bash(*) stayed live, the exact commands most likely to hurt you would auto-approve before the classifier ever got a look, and the whole feature would be defeated.
Tight rules stay put. Bash(git status) and Bash(npm test) carry over fine. Dropped rules come back when you leave auto mode.
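A rough model of which allow rules survive the switch. The wildcard heuristic below is an assumption for illustration; the real criterion is internal to Claude Code:

```python
def suspended_in_auto(rule):
    # Illustrative heuristic only: broad rules that grant arbitrary
    # execution, and every Agent rule, are pulled while auto mode is on.
    if rule.startswith("Agent"):
        return True
    return rule.startswith("Bash(") and "*" in rule

allow = ["Bash(*)", "Bash(python*)", "Bash(git status)", "Bash(npm test)", "Agent(*)"]
surviving = [r for r in allow if not suspended_in_auto(r)]
# tight rules like Bash(git status) and Bash(npm test) carry over
```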
A trust boundary runs through the classifier's view of your system. Your local working directory is trusted. If you're inside a git repo, the configured remotes for that repo are trusted. Anything outside that perimeter counts as external until an admin tells it otherwise.
Inside that perimeter, the default ruleset trusts operations like these:
- Reading, writing, and editing files in your working directory
- Declared dependencies: installing packages already in your lock files or manifests
- Credential usage: reading .env and sending credentials to their matching API
- Read-only network: HTTP GET requests, fetching documentation
- Branch operations: pushing to your current branch or one Claude created
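The perimeter check can be modeled as two tests: is the target a configured remote, or a path under the working directory? An illustrative sketch, not the classifier's actual logic:

```python
from pathlib import Path

def inside_trust_boundary(target, working_dir, git_remotes):
    # Configured remotes of the current repo are trusted.
    if target in git_remotes:
        return True
    # So is anything under the local working directory.
    try:
        Path(target).resolve().relative_to(Path(working_dir).resolve())
        return True
    except ValueError:
        # Everything else is external until an admin says otherwise.
        return False
```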
Pull the full default ruleset the way the classifier reads it:

```
claude auto-mode defaults
```
Routine team work sometimes trips the classifier. Pushing to your org's repo, writing to a company bucket. The classifier has no idea those are yours. Admins fix this by configuring trusted infrastructure under the autoMode.environment setting.
Teams running agent loops and multi-agent setups need to know how this flows through to subagents. Three checkpoints:
- At spawn: before the subagent starts, the classifier reads its delegated task description. "Delete all remote branches matching this pattern" gets refused at the door. The subagent never runs.
- During execution: inside the child session, auto mode applies the same allow and block rules as the parent. Any permissionMode the subagent set in its own frontmatter is ignored. Every tool call goes through the classifier on its own.
- On return: once the subagent finishes, the classifier reads its full action log. A child that looked fine at spawn might have been hijacked mid-run by something it read. If the return pass flags a problem, a security warning gets stuck on the front of the subagent's output so the parent can react.
Three checkpoints means subagents do not need your blind trust. The classifier treats them the same way it treats the main session.
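The three checkpoints can be sketched as a wrapper around the subagent's run. Function names and return shapes are hypothetical; the sketch only shows where each review sits:

```python
def run_subagent(task, spawn_check, execute_call, return_check, plan):
    # Checkpoint 1 (at spawn): the delegated task description is
    # reviewed before the subagent ever starts.
    if spawn_check(task) == "block":
        return {"status": "refused", "log": []}
    # Checkpoint 2 (during execution): every tool call in the child
    # session goes through the classifier on its own.
    log = [execute_call(call) for call in plan]
    result = {"status": "finished", "log": log}
    # Checkpoint 3 (on return): the full action log gets a second read;
    # a flag prepends a security warning so the parent can react.
    if return_check(log) == "flag":
        result["warning"] = "SECURITY WARNING: review this subagent's output"
    return result
```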
Here is the full spread across every permission mode Claude Code ships:
| | default | acceptEdits | plan | auto | dontAsk | bypassPermissions |
|---|---|---|---|---|---|---|
| Permission prompts | File edits and commands | Commands only | Same as default | None (unless fallback) | None (blocked unless pre-allowed) | None |
| Safety checks | You review each action | You review commands | You review commands | Classifier reviews commands | Your pre-approved rules only | None |
| Token usage | Standard | Standard | Standard | Higher (classifier calls) | Standard | Standard |
| Best for | Sensitive work | Code iteration | Codebase exploration | Long-running tasks | CI/CD pipelines | Isolated containers only |
| Risk level | Lowest | Low | Low | Medium | Depends on rules | Highest |
The trade is simple: you pay more tokens and a bit more latency per checked action, and in return you lose the stream of prompts that turns any long session into a clicking exercise.
False positives should not sink your session, so the fallback catches them. If the classifier blocks 3 in a row or 20 total inside one session, auto mode pauses and Claude Code goes back to asking for approval by hand.
Neither threshold can be tuned.
When it fires:
- CLI: a note appears in the status area. Approve the next manual prompt and the block counters reset, so you can stay in auto mode after.
- Non-interactive mode (-p flag): the session exits. No one is there to answer.
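The threshold logic is simple enough to sketch directly. This is a model of the behavior described above, not Claude Code's internals:

```python
class FallbackTracker:
    # Fixed thresholds: 3 consecutive blocks or 20 total in one session
    # pause auto mode. Neither number is tunable.
    def __init__(self):
        self.consecutive = 0
        self.total = 0

    def record_block(self):
        self.consecutive += 1
        self.total += 1
        return self.consecutive >= 3 or self.total >= 20  # True = pause auto mode

    def record_pass(self):
        self.consecutive = 0   # a passed action breaks the streak, total persists

    def manual_approve(self):
        # In the CLI, approving the next manual prompt resets the
        # counters so the session can stay in auto mode.
        self.consecutive = 0
        self.total = 0
```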
Repeat blocks come from one of two places. The task genuinely wants something the classifier is built to stop, or the classifier is missing context about infrastructure you actually own. Use /feedback when it feels like a false positive. If it keeps missing that your repos and services are trusted, get an admin to configure trusted infrastructure in managed settings.
One layer is never the whole story. Auto mode gives you more protection than bypassPermissions and less than reviewing every call by hand. The strongest setup stacks:
Layer 1: Permission rules. Allow and deny lists in settings.json resolve before the classifier runs. Use them for hard, deterministic control.
Layer 2: Auto mode classifier. Catches everything the rules do not. Reasons about context, not just text patterns.
Layer 3: Hooks. PreToolUse hooks run custom logic ahead of the permission system. The Permission Hook ships an LLM-powered auto-approver with a three-tier flow (fast approve, fast deny, LLM analysis). Hooks and auto mode coexist: hooks run first and can approve, deny, or escalate before the classifier sees the call.
Layer 4: Sandboxing. OS-level sandboxing walls off filesystem and network access at the kernel. Even when the classifier misses, the sandbox keeps shell commands inside the box you drew. This matters because the classifier reads intent while the sandbox enforces hard walls.
Layer 5: Self-validating agents and stop hooks. These keep agents on task and inside scope, adding another verification pass on top of the permission story.
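The three-tier flow in Layer 3 can be sketched as a dispatcher. The helper names (`fast_allow`, `fast_deny`, `llm_review`) are hypothetical, not the hook's real API:

```python
def permission_hook(call, fast_allow, fast_deny, llm_review):
    # Tier 1: a cheap pattern check approves the obviously safe.
    if fast_allow(call):
        return "approve"
    # Tier 2: a cheap pattern check denies the obviously dangerous.
    if fast_deny(call):
        return "deny"
    # Tier 3: only ambiguous calls pay for LLM analysis; whatever the
    # hook does not settle falls through to the permission system,
    # where the auto mode classifier can still review it.
    return llm_review(call)
```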
Every layer fills the gap the others leave. That is defense in depth.
This shipped as a research preview. Be honest about what that word means:
- No safety guarantee. Ambiguous user intent or missing environment context can cause the classifier to miss a risky action. The reverse happens too (false positives on benign ones).
- It costs more. Classifier calls count against your token usage. Each checked action sends a slice of the transcript plus the pending call. Most of the extra cost comes from shell commands and network operations, because read-only actions and local file edits skip the classifier entirely.
- Latency is real. Every check adds a round trip before the action runs. Sequences of fast shell commands feel slower.
- Narrow availability. Team plan only right now (research preview). Enterprise and API support is rolling out shortly. Sonnet 4.6 or Opus 4.6 required. No Haiku, no claude-3, no third-party providers.
- Not a substitute for review on sensitive ops. Trust it with work where the direction is solid. For anything touching production, credentials, or shared infrastructure, human review is still the right call.
- Calibration improves with data. /feedback is how false positives and missed blocks get reported. Every one of those reports tunes the system.