Penetration testing

Not a scan. An attack.

Security scanners read your code and flag potential issues. That's useful, but it's also theoretical. A scanner says "this input might be vulnerable to injection." A pentest agent actually attempts the injection and shows you what happens.

Inside Claude Code, type /pentest. It spawns a team of agents that attack your running application. They read your latest security report first (if you've run /security), then validate each finding by attempting the actual exploit. After that, they go looking for vulnerabilities the static scan missed.

How it works

Three agents work different attack surfaces simultaneously.

One targets the API layer. It sends malformed inputs, tests parameter manipulation, tries to access data it shouldn't have permission to see, and verifies that your rate limiting actually blocks abuse.
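To make the rate-limiting check concrete, here is a minimal sketch of the idea, not the agent's actual implementation. It assumes the API signals throttling with HTTP 429, and `first_throttled_request` and `make_fake_endpoint` are hypothetical names used only for illustration. The fake endpoint stands in for a real request so the example runs without a network.

```python
from typing import Callable, Optional


def first_throttled_request(send: Callable[[], int], burst: int = 50) -> Optional[int]:
    """Send up to `burst` requests and return the 1-based index of the
    first one answered with HTTP 429, or None if none was throttled."""
    for attempt in range(1, burst + 1):
        if send() == 429:
            return attempt
    return None


def make_fake_endpoint(limit: int) -> Callable[[], int]:
    """Hypothetical stand-in for a real endpoint: allows `limit` requests,
    then answers 429 to every request after that."""
    count = 0

    def send() -> int:
        nonlocal count
        count += 1
        return 200 if count <= limit else 429

    return send


# An endpoint that throttles after 10 requests is caught on request 11.
throttled_at = first_throttled_request(make_fake_endpoint(limit=10))

# An endpoint whose limit exceeds the burst size is never throttled,
# which a pentest would report as rate limiting that does not block abuse.
never_throttled = first_throttled_request(make_fake_endpoint(limit=100))
```

A result of `None` is the interesting case: the burst went through unchallenged.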

Another opens a real browser and tests user-facing attack vectors: cross-site scripting through form inputs, clickjacking on sensitive pages, session fixation attempts, and authentication flow manipulation.
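One piece of the clickjacking test can be reasoned about from response headers alone. The sketch below is an illustration under simplifying assumptions, not the agent's logic: it treats a page as protected if it sends `X-Frame-Options: DENY` or `SAMEORIGIN`, or a Content-Security-Policy `frame-ancestors` directive without a wildcard. The function name `framable_by_any_origin` is hypothetical, and a real check would still confirm the result by loading the page in a frame.

```python
from typing import Mapping


def framable_by_any_origin(headers: Mapping[str, str]) -> bool:
    """Return True if nothing in the response headers stops an arbitrary
    site from embedding the page in a frame (simplified check)."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block cross-origin framing.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return False

    # CSP frame-ancestors restricts who may frame the page; this sketch
    # only treats a bare wildcard source as leaving the page exposed.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            return "*" in tokens[1:]

    # Neither header present: the page can be framed by anyone.
    return True


unprotected = framable_by_any_origin({"Content-Type": "text/html"})
protected = framable_by_any_origin(
    {"Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'"}
)
```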

A third focuses specifically on auth and permissions: token forgery, session hijacking, privilege escalation between user roles, what happens when tokens expire mid-action, and whether deleted users can still access resources.
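The permission checks boil down to a matrix: every role tries every path, and anything that succeeds outside the intended policy is a finding. Here is a minimal sketch of that idea. The role names, paths, and the `find_escalations` and `fake_fetch` helpers are all hypothetical, and the fake application contains a deliberate bug so the example has something to catch.

```python
from typing import Callable, Dict, List, Set, Tuple


def find_escalations(
    fetch: Callable[[str, str], int],
    allowed: Dict[str, Set[str]],
    paths: List[str],
) -> List[Tuple[str, str, int]]:
    """Request every path as every role and return (role, path, status)
    for each request that succeeded although the policy forbids it."""
    violations = []
    for role, permitted in allowed.items():
        for path in paths:
            status = fetch(role, path)
            if path not in permitted and 200 <= status < 300:
                violations.append((role, path, status))
    return violations


def fake_fetch(role: str, path: str) -> int:
    """Hypothetical application under test, returning an HTTP status."""
    if path == "/admin/users":
        return 200 if role == "admin" else 403
    if path == "/profile":
        # Deliberate bug: never checks that the account still exists.
        return 200
    return 404


# Intended policy: which paths each role should be able to reach.
policy = {
    "admin": {"/admin/users", "/profile"},
    "member": {"/profile"},
    "deleted_user": set(),
}

violations = find_escalations(fake_fetch, policy, ["/admin/users", "/profile"])
```

Here `violations` contains a single entry: the deleted user can still read `/profile`.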

What you get

A report with a working proof of concept for every finding. Not "this might be vulnerable." Instead: "we sent this request and got back data belonging to a different user." Or: "we injected this script through the feedback form and it executed." Each finding includes the exact request, the response, and a severity rating.
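If you want to process findings in your own tooling, a record along these lines captures the three things each finding carries: the request, the response, and a severity rating. This is a sketch, not the report's actual schema. The field names, the four-level severity scale, and the example requests are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity scale, lowest to highest; the real report may differ.
SEVERITIES = ("low", "medium", "high", "critical")


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    request: str   # the exact request that was sent
    response: str  # what the application sent back

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed scale early.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def most_severe_first(findings: List[Finding]) -> List[Finding]:
    """Return the findings ordered from highest to lowest severity."""
    return sorted(findings, key=lambda f: SEVERITIES.index(f.severity), reverse=True)


# Hypothetical findings, modeled on the examples in the text above.
findings = [
    Finding(
        title="Script injected through the feedback form executed",
        severity="medium",
        request="POST /feedback",
        response="200 OK",
    ),
    Finding(
        title="Request returned data belonging to a different user",
        severity="critical",
        request="GET /api/orders/42",
        response="200 OK",
    ),
]

ordered_titles = [f.title for f in most_severe_first(findings)]
```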

Findings that the agents can safely patch get fixed automatically. Findings that need your judgment are flagged with an explanation and a recommendation.

When to run it

After /security gives you the static analysis. Before you launch publicly. After you add any feature that touches authentication, payments, or personal data. This is the command that tells you whether your security policies actually work when someone tries to break them.
