AI Security Agents
How to build a two-phase security pipeline with Claude Code sub-agents that understands your business logic and kills false positives.
Your app has security holes. Every app does. The question is whether you find them before your users do.
Here is what a security hole looks like in practice. Someone signs up for your app. They poke around. They change a number in the URL bar. Suddenly they're looking at another user's private data. Billing info, saved documents, personal messages. Not because they're a hacker. Because nothing in your app stopped them.
That's the kind of bug most SaaS apps ship with. Not some movie-style hack. Just a missing rule that says "you can only see your own stuff."
This post walks through a system that catches these bugs automatically. Two commands. Eight AI agents. Phase 1 finds everything that looks wrong. Phase 2 tries to actually break in and proves which problems are real. Only confirmed bugs make the final report.
The five holes most SaaS apps ship with
These are the most common security problems in apps built by solo founders, indie hackers, and vibe coders. None of them require hacking skills to exploit. A curious user with browser developer tools can find most of them.
1. Users can see each other's data
You create a database table to store user data. Profiles, documents, settings, whatever. By default, most databases don't care who is asking for the data. If someone requests it, the database hands it over.
The fix is a rule that says "only return rows that belong to the person asking." In database terms, this is called row-level security. Think of it like a filter that automatically adds "WHERE user_id = the person logged in" to every query. Without it, any logged-in user can request anyone else's data.
This is the most common security hole in SaaS apps. A user changes an ID in the URL or opens their browser's developer tools, and they can see things they shouldn't.
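The missing rule fits in a few lines. Here is a minimal sketch using an in-memory SQLite table as a stand-in for a production database (table and function names are hypothetical):

```python
import sqlite3

# In-memory stand-in for a real "documents" table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id INTEGER, owner_id INTEGER, body TEXT)")
db.executemany("INSERT INTO documents VALUES (?, ?, ?)",
               [(1, 100, "alice's notes"), (2, 200, "bob's notes")])

def fetch_document_unsafe(doc_id):
    # No ownership check: any logged-in user can read any row.
    return db.execute("SELECT body FROM documents WHERE id = ?",
                      (doc_id,)).fetchone()

def fetch_document_safe(doc_id, current_user_id):
    # The filter row-level security applies automatically: only your own rows.
    return db.execute("SELECT body FROM documents WHERE id = ? AND owner_id = ?",
                      (doc_id, current_user_id)).fetchone()
```

With the unsafe version, user 200 can fetch document 1 and read Alice's data. With the ownership check, the same request returns nothing. Row-level security makes the database enforce the second behavior on every query, so you can't forget the filter in one endpoint.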
2. Someone skips the login and hits your backend directly
Your app has a login page. Behind the login, there are API endpoints that fetch and modify data. The frontend always sends the login token with every request. So you assume every request has a valid token.
But someone can call your API endpoints directly, without using your frontend at all. They can use tools like curl or Postman. If your backend doesn't check for a valid login token on every single request, they're in. No login needed.
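The fix is a check that runs on every request, before any handler code, regardless of where the request came from. A framework-agnostic sketch (function names are hypothetical; `verify_token` stands in for whatever your auth provider gives you):

```python
def handle_request(headers, handler, verify_token):
    # Reject any request without a valid token, whether it came from
    # your frontend, curl, Postman, or anywhere else.
    auth = headers.get("Authorization", "")
    token = auth.removeprefix("Bearer ").strip()
    user_id = verify_token(token) if token else None  # None = missing/invalid/expired
    if user_id is None:
        return (401, "unauthorized")
    return (200, handler(user_id))
```

The point is where the check lives: in middleware that wraps every endpoint, not in the frontend, and not copied by hand into some handlers but not others.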
3. Secret keys sitting in the browser
Your app talks to external services: payment processors, email providers, AI APIs. Each service gives you a secret key. Some of those keys are safe to expose in the browser (like your Stripe publishable key). Others are not (like your Stripe secret key, your email API key, or your database admin key).
If a secret key ends up in your frontend code, anyone can open browser developer tools, find it, and use it. They can send emails from your account, charge credit cards, or access your database with full admin permissions.
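A crude version of the check an auditor agent runs is a pattern scan over your built frontend code. The patterns below are illustrative, not exhaustive; a real scanner would use each provider's documented key prefixes (`sk_live_` is Stripe's secret-key prefix, while `pk_live_` marks the publishable key that is safe to expose):

```python
import re

# Hypothetical example patterns for secret-looking strings in a JS bundle.
SECRET_PATTERNS = {
    "stripe secret key": r"sk_live_[0-9a-zA-Z]{10,}",
    "generic api key": r"api[_-]?key\s*[:=]\s*['\"][0-9a-zA-Z]{20,}['\"]",
}

def scan_bundle(js_source):
    """Return the names of secret-looking patterns found in frontend code."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if re.search(pattern, js_source, re.IGNORECASE)]
```

Anything this flags in code shipped to the browser should move to a server-side environment variable.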
4. Your API tells people more than it should
Your user profile endpoint returns the user's data so the frontend can display it. But the backend returns everything in the database row, not just what the frontend needs. Email, full name, internal IDs, subscription status, maybe even password hashes.
A user calls your API and gets back their own data. Fine. But the response includes 15 extra fields the frontend never uses. Now they know your internal data structure. Worse, if hole #1 is also present, they can pull this detailed information for every user in your system.
Error messages are part of this too. When something goes wrong, your app might return the raw database error: "column 'stripe_customer_id' does not exist in table 'users'." That tells an attacker exactly how your database is structured.
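The fix for both problems is the same shape: an explicit allowlist at the boundary where data leaves your backend. A sketch with hypothetical field names:

```python
# Only the fields the frontend actually renders. Everything else in the
# database row (email, password hash, internal IDs) never leaves the server.
PUBLIC_FIELDS = {"id", "name", "avatar_url"}

def serialize_user(row):
    """Return an API-safe view of a user row, not the whole row."""
    return {key: value for key, value in row.items() if key in PUBLIC_FIELDS}

def safe_error(internal_error):
    # Log the raw error server-side; return a generic message to the client.
    return {"error": "something went wrong", "code": 500}
```

An allowlist fails safe: a new column added to the table stays private until someone deliberately exposes it, which is the opposite of returning the raw row.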
5. Missing browser security rules
Browsers have built-in security features, but they only work if your app turns them on. These are HTTP headers your server sends with every response. They tell the browser things like:
- "Don't let other websites embed my app in a frame" (prevents clickjacking)
- "Only run scripts that I explicitly approved" (prevents code injection)
- "Only accept requests from my own domain" (prevents cross-site attacks)
Without these headers, your app is exposed to attacks that browsers were designed to block. Most frameworks don't set them by default.
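As a sketch, the three rules above map to headers like these. The exact Content-Security-Policy value depends entirely on your app (treat `default-src 'self'` as a starting point, not a complete policy):

```python
# Minimal security headers, merged into every response.
SECURITY_HEADERS = {
    "X-Frame-Options": "DENY",                        # no embedding in frames (clickjacking)
    "Content-Security-Policy": "default-src 'self'",  # only run approved scripts
    "X-Content-Type-Options": "nosniff",              # no MIME-type guessing
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
}

def apply_security_headers(response_headers):
    """Return the response headers with the security set merged in."""
    merged = dict(response_headers)
    merged.update(SECURITY_HEADERS)
    return merged
```

Most frameworks let you set these once in middleware or server config rather than per response.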
Why security scanners don't help
There are tools that scan your code for security issues. The problem: they flag everything that looks wrong, even when it's fine.
Here's an example. Your app has a background job that processes payments. Background jobs don't have a logged-in user, so they need admin-level database access to work. A scanner sees "admin database access" and flags it as a critical vulnerability. But it's correct. The background job needs that access. It's not a bug, it's how the feature works.
This is called a false positive. The scanner flagged something that looks dangerous but is actually fine.
In a real scan, this happens constantly. The scanner reports 87 problems. You read through all of them. 82 of them are things that are working as designed. The 5 real bugs are buried in a pile of false alarms, and you can't tell which is which without deep knowledge of your own codebase.
The core problem: security tools don't understand your business logic. They don't know that your background job needs admin access. They don't know that your "ideas" table is intentionally public. They don't know that your onboarding flow uses a specific auth pattern on purpose. They just see patterns that look dangerous and flag them.
The two-phase solution
The pipeline is two Claude Code slash commands. Each one spawns a team of AI agents that work at the same time.
```
.claude/
  commands/
    security.md          # Phase 1: 5 agents scan your code
    pentest.md           # Phase 2: 3 agents try to break in
  agents/
    security-auditor.md  # Rules all Phase 1 agents follow
dev/
  reports/
    security/            # Phase 1 reports
    pentest/             # Phase 2 reports
```

Phase 1 (Reporters): Five agents read your code and check for the five holes above. Each agent focuses on one area. They also have access to your live database, so they can check what's actually deployed, not just what the code says. Output: a report listing everything that looks wrong.
Phase 2 (Exploiters): Three agents try to actually break into your running app. They read the Phase 1 report and attempt to exploit each finding. They send real requests, try real attacks, and record what happens. If they can't actually break in, the finding gets marked as a false positive and removed. Output: a validated report where every remaining finding has proof attached.
The filter between the two phases is what makes this work. Phase 1 catches everything suspicious. Phase 2 proves what's real. The false alarms die in Phase 2 instead of wasting your time.
Phase 1: five reporters
Each reporter is an AI agent that focuses on one type of security problem. They all run at the same time.
Database Access Auditor
Checks whether users can see each other's data. Connects to your live database and looks at the actual access rules, not just the code. Finds tables where user data is stored but no "only return your own rows" rule exists. Also checks for database functions that have more permissions than they should.
Input Validation Auditor
Checks every place your app accepts user input. Can someone type code into a form field and have it execute? Can someone send a carefully crafted string that tricks your database into running commands? Can someone upload a file with a name like ../../etc/passwd and read files they shouldn't? This agent tests all of those.
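The file-name check, for example, fits in a few lines. A sketch assuming uploads are stored flat, with no user-controlled subdirectories:

```python
from pathlib import PurePosixPath

def is_safe_upload_name(name):
    """Reject path traversal like '../../etc/passwd' and absolute paths."""
    path = PurePosixPath(name)
    return (not path.is_absolute()      # no /etc/passwd
            and ".." not in path.parts  # no climbing out of the upload dir
            and len(path.parts) == 1)   # a bare filename, nothing else
```

The auditor agent looks for the places where checks like this are missing; the Phase 2 exploiter then tries the actual `../../` payload against them.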
Login and Session Auditor
Checks whether your login system is solid. Are there endpoints that should require a login but don't? Can someone call your API with a fake or expired login token and still get in? Can a regular user access admin-only features by tweaking their token? Is there anything stopping someone from trying thousands of passwords?
Data Leakage Auditor
Checks what your app reveals. Are API responses returning extra fields the frontend doesn't use? Are error messages showing internal database details? Are there secret keys in your frontend JavaScript that shouldn't be there? Is sensitive data showing up in URLs where browser history or server logs can capture it?
Config Auditor
Checks your app's security settings. Are the browser security headers turned on? Is your app telling browsers to accept requests from any website (it shouldn't)? Are your login cookies configured correctly? Are there known vulnerabilities in the packages your app depends on?
All five run at the same time. Each one sends back its findings as text. None of them change your code. One orchestrator reads everything and combines it into a single report.
The key to fewer false alarms: business logic awareness
This is what separates these agents from a regular security scanner.
Every codebase has things that look wrong but are correct. The agents need to know about these upfront. Otherwise they'll flag them every time, just like any other scanner.
The agent definition includes a section called "Documented Exceptions." It's a list of patterns the agents should recognize and skip. Things like:
- Background jobs that need admin database access (there's no logged-in user, so admin access is the only way)
- Tables that store public data and are intentionally readable by everyone
- Auth patterns that fetch extra user info during signup (needed for things like getting a user's name from Google)
- Public-facing API keys that are meant to be in the browser (like your site key for bot protection)
- Tables managed by third-party services (your payment provider syncs data into its own tables)
Each exception is specific: when the pattern appears, why it's correct. The agent checks its exceptions list before generating a finding. If a pattern matches, it skips it. No false alarm.
This list is the single most effective thing you can write. Start with 5 to 10 entries. After each audit, add any false positives that come through. By the third run, the agents catch 90%+ of noise before it reaches you.
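Mechanically, the exceptions list can be as simple as a list of pattern-plus-location rules the agent consults before emitting a finding. The patterns, paths, and reasons below are hypothetical examples of what a real list might contain:

```python
# Documented exceptions: things that look wrong but are correct.
EXCEPTIONS = [
    {"pattern": "service_role", "path_prefix": "jobs/",
     "reason": "background jobs have no logged-in user, so admin access is required"},
    {"pattern": "ideas", "path_prefix": "migrations/",
     "reason": "the ideas table is intentionally readable by everyone"},
]

def is_documented_exception(finding):
    """Check a candidate finding against the exceptions list before reporting it."""
    for exc in EXCEPTIONS:
        if (exc["pattern"] in finding["snippet"].lower()
                and finding["path"].startswith(exc["path_prefix"])):
            return True
    return False
```

Note that each rule is scoped to a location: admin access is an exception in `jobs/`, but the same pattern in an API route handler still gets flagged.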
The agents can also connect to your live database and check what's actually deployed. Code says what should be true. A live database query shows what is true. If a migration script was supposed to add access controls to a table but failed silently, the live query catches it. This removes an entire category of false alarms: things where the code looks fine but the deployment doesn't match.
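The code-versus-deployment check reduces to diffing two lists: the tables your migrations claim to protect, and the tables the live database reports as protected (in Postgres, for instance, an agent can read the `rowsecurity` column of `pg_tables`). A sketch with hypothetical table names:

```python
def rls_drift(protected_in_code, protected_live):
    """Tables the migrations claim to protect but the live database does not."""
    return sorted(set(protected_in_code) - set(protected_live))
```

A non-empty result means a migration failed silently or was never applied, which is exactly the category of bug the code alone can't reveal.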
Scoping: only check what changed
Running five agents across your entire codebase every time is expensive. Most of the time, you only need to check what changed since the last report.
The command handles this automatically. It looks at the date of the last security report, finds all files that changed since then, and only sends those files to the agents. If nothing security-relevant changed, it exits early.
Full scans are for the first audit or after a big refactor. Everything else is scoped to recent changes.
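The scoping logic is a two-step filter: keep files changed since the last report, then keep only the security-relevant ones. A sketch (the extension list is illustrative; in practice the changed files come from something like `git log --since`):

```python
# File kinds worth re-auditing; extend for your stack.
SECURITY_RELEVANT = (".ts", ".tsx", ".sql", ".py")

def files_to_scan(changed_files, last_report_time):
    """changed_files: (path, modified_time) pairs.
    Returns security-relevant files touched since the last report."""
    recent = [path for path, mtime in changed_files if mtime > last_report_time]
    return [path for path in recent if path.endswith(SECURITY_RELEVANT)]
```

If the returned list is empty, the command exits early instead of spawning five agents for nothing.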
Phase 2: three exploiters
Phase 2 reads the Phase 1 report and tries to actually break into your running app. The dev server has to be running. These agents make real requests and try real attacks.
API Exploiter
Calls your backend endpoints directly. Tries the attacks from hole #1 (changing IDs to access other users' data) and hole #2 (calling endpoints without a login token), plus injection attacks (sending special characters that might trick the database into running commands). Records every request and response as evidence.
Browser Exploiter
Opens your app in a browser and tries attacks from holes #4 and #5. Types code into form fields to see if it executes. Checks if your app can be embedded in another website's page (which could trick users into clicking things). Copies a login token, logs out, and tries using the old token to see if it still works.
Login Exploiter
Focuses entirely on your authentication. Logs in as a regular user and tries to access admin features. Tries to modify the login token to change the user ID or permission level. Sends 50 rapid login attempts to see if there's any rate limiting. Tests the password reset flow for ways to reuse tokens.
Every finding needs proof. The exact request that was sent and the exact response that came back. If an agent can't produce proof that the attack worked, the finding gets killed. This is why Phase 2 eliminates false alarms: the agent has to actually exploit the bug, not just say it might exist.
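The proof requirement can be enforced structurally: a finding without request/response evidence attached never reaches the final report. A sketch of that filter (field names are hypothetical):

```python
def validate_findings(findings):
    """Split Phase 1 findings into confirmed bugs and killed false positives.
    A finding survives only if it carries request/response evidence."""
    confirmed, killed = [], []
    for finding in findings:
        proof = finding.get("proof") or {}
        if proof.get("request") and proof.get("response"):
            confirmed.append(finding)
        else:
            killed.append(finding)
    return confirmed, killed
```

Making evidence a required field means an agent can't hand-wave a finding through; it either exploited the bug or it didn't.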
The filter in action
In a real audit of a production SaaS app, the numbers looked like this:
| Stage | Count |
|---|---|
| Phase 1 reported issues | 87 |
| Phase 2 false alarms killed | 82 |
| Confirmed real bugs | 5 |
| Noise eliminated | 94% |
The 82 killed findings were things like: admin database access in a background job (correct), public tables without per-user access rules (intentionally public), a specific auth pattern used during signup (needed for that feature), verbose error messages that only appear in development mode (not in production).
The 5 confirmed bugs were real. One let any user give themselves admin access by updating their profile. Another let anyone add unlimited credits to their own account. A third was an open redirect in the payment flow that could send users to a fake site after checkout. Each one came with the specific code change to fix it.
Without Phase 2, you get 87 items and no idea which ones matter. With Phase 2, you get 5 items that are proven real, with the attack evidence attached.
Running it
Two commands:
/security runs Phase 1. Five agents scan at the same time. Defaults to changes since the last report. Report saves to dev/reports/security/. If it finds serious issues, it tells you to run Phase 2.
/pentest runs Phase 2. Reads the Phase 1 report. Starts your dev server if it's not already running. Three agents try to break in at the same time. Validated report saves to dev/reports/pentest/.
| Flag | Command | What it does |
|---|---|---|
--full | /security | Scan everything, not just recent changes |
--days N | /security | Check changes from the last N days |
--skip-security | /pentest | Use the latest Phase 1 report instead of re-running it |
--api-only | /pentest | Only test the backend |
--browser-only | /pentest | Only test the frontend |
--auth-only | /pentest | Only test the login system |
Rules for building this yourself
The pattern works with any database, any framework, any auth provider. Here's what makes it work.
Write your exceptions list before the first scan. Every app has things that look wrong but are fine. The agents need that list upfront. Start with the patterns you know are correct. Add false positives from each run. The list stabilizes after 2 or 3 audits.
Give agents access to the real database, not just the code. Code says what should be true. The live database shows what is true. If those don't match, you have a problem the code can't tell you about.
Split finding from proving. Phase 1 agents report anything suspicious. Phase 2 agents try to exploit it. Putting both jobs in one agent creates a conflict. It either reports too much (noisy) or too little (misses things). Two phases with different jobs produce better results.
Require proof, not opinions. Never ask an agent "is this secure?" Ask it to show the request and response that proves the attack worked. Proof forces real verification. Opinions invite shortcuts.
Every finding needs a fix. Not "consider improving your security." The specific line to change and what to change it to. A finding without a fix is a finding that sits in a folder forever.
Include what worked. The Phase 2 report should list attacks that failed. Injection blocked. Script execution blocked. Cross-site requests blocked. Knowing what your app defends against is as valuable as knowing what it doesn't.
Beyond security
The two-phase pattern (find everything, then prove what's real) works for more than security.
Code review. Phase 1 agents flag potential problems. Phase 2 agents write failing tests to prove the problems are real. False positives killed the same way.
Performance. Phase 1 agents identify slow database queries and large file bundles. Phase 2 agents run actual benchmarks. A "slow query" that takes 2 milliseconds on real data isn't a real problem.
Compliance. Phase 1 agents flag data handling patterns. Phase 2 agents trace where data actually flows to verify the flags matter. A function that processes anonymous analytics data doesn't need privacy consent handling, even if the pattern looks similar to one that does.
Same core idea every time. Give agents enough context to understand why code exists. Split finding from proving. Filter the noise before it reaches you.
Stop configuring. Start building.