AI Cleans Itself
Three overnight Claude Code workflows that clean AI's own mess: slop-cleaner removes dead code, /heal repairs broken branches, /drift catches pattern drift.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
Your repo gets messier every time an AI agent ships a feature. Dead code piles up. Branches break and sit. Naming patterns drift. None of it is dramatic. It just compounds, feature after feature, until the codebase is slower to change and scarier to touch.
This post walks through a janitor crew that cleans it up for you, on a schedule, while you sleep. Three workflows. One scheduler. Each one takes a different kind of mess and handles it quietly. You wake up, read one email, and know what got swept, what got fixed, and what still needs a human look.
The three small messes that become big messes
Most codebase problems do not start big. They start small and stay small for a while. Then a few features later, the small version has copies in three other places, and the cleanup is a week instead of an hour.
These are the three that show up the most in AI-heavy repos.
1. Dead code that nobody deletes
The agent ships a feature. A week later you ship a different version of that feature. The old code still sits there. Nobody deletes it because nobody is sure if something still uses it. Three more features later, your repo has three abandoned helpers, two unused routes, and a full page that no link points to anymore.
In a typical project, dead code can grow to around 18% of the repo within a few months. That is code being type-checked, linted, bundled, and loaded every build. It slows down cold builds, confuses new contributors, and makes refactors scarier than they need to be.
The problem is not finding dead code. A linter can do that. The problem is deleting it safely. You need to know that removing the file actually does not break anything. Most of the time, nobody has the patience to verify that line by line, so the dead code stays.
2. Broken branches that block shipping
An agent run finishes. It says the feature is done. You pull the branch in the morning and the build is red. Type error, failing test, missing import. Small thing. You fix it in ten minutes.
Except you do that several mornings a week. Ten minutes here. Twenty minutes there. Two full days a month gone to fixing a broken handoff before you can even start the real work. And if the break lands on main, every other feature waits on it.
The fix is simple in theory. The agent should fix its own break. In practice, that means running tests, reading the error, editing the right file, and not making things worse by guessing. That is a loop most agents cannot close without extra structure.
3. Patterns that quietly drift
You have four features that all do roughly the same thing: auth, billing, notify, search. They all follow one pattern. Same folder shape. Same filenames. Same naming for the key functions.
Then one feature drifts. The agent names something slightly differently. Or splits a file that the others keep together. Or skips a pattern that the others all share. Alone, that is fine. Three features later, a new agent sees the drifted one and copies that shape instead of the original. Now you have a fork in your own codebase that nobody agreed to.
Drift is the most expensive mess of the three. A dead file costs you one delete. A broken branch costs you a morning. A drift that propagates into three more features costs you a real refactor across the whole folder.
The janitor crew
The system is three Claude Code workflows that run on one overnight schedule. You can think of them as one crew of cleaners, each with a narrow job.
| Janitor | Job | Output |
|---|---|---|
| Slop-cleaner | Deletes dead code after tests pass | Removed files, tests still green |
| /heal | Fixes broken branches on a side branch | Confidence score, email verdict |
| /drift | Flags pattern divergence across recent features | Pattern map, drift warnings |
One scheduler wakes the crew at night. It fires each workflow in order. Each one writes a short note to a shared log. The last step assembles the log into an email and sends it to you.
Nothing ships to main on its own. Slop-cleaner edits happen behind tests. /heal works on a side branch, not on main. /drift only flags, it does not rewrite. You stay the person who accepts or rejects the morning report.
Janitor one: slop-cleaner
Slop-cleaner's job is to delete dead code without breaking anything.
The trick is the order. Most dead-code tools scan, list, and then let a human delete. That is slow because nobody wants to be the person who breaks prod by removing the wrong file. The workflow flips the order: write the safety net first, then delete, then re-run the safety net.
Step one, write regression tests. The agent looks at the files that are about to be touched and writes a small test for each one that still has a caller. The goal is not full coverage. The goal is a green signal that says "the parts that matter still work." These tests are committed on a side branch so a human can read them.
Step two, delete the dead code. The agent uses a static check to list candidates: files with zero inbound imports, functions never called, routes with no links, types never referenced. It removes them in small batches.
Step three, re-run the tests. If every test is still green, the deletes stand. If any test goes red, the specific delete that caused it is reversed. The agent does not try to "fix" the test. It assumes the delete was wrong and rolls it back. Nothing subtle.
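The three steps can be sketched as one loop. This is a minimal illustration, not the actual workflow: `sweep_dead_code`, the move-aside rollback trick, and the `tests_pass` callback are all hypothetical names standing in for the agent's real tooling.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable


def sweep_dead_code(
    candidates: Iterable[Path],
    tests_pass: Callable[[], bool],
    backup_dir: Path,
) -> tuple[list[Path], list[Path]]:
    """Delete candidate files one at a time, reversing any delete
    that turns the test suite red. Returns (removed, kept)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    removed, kept = [], []
    for path in candidates:
        backup = backup_dir / path.name
        shutil.move(path, backup)      # "delete" = move aside, so rollback is cheap
        if tests_pass():
            removed.append(path)       # tests still green: the delete stands
        else:
            shutil.move(backup, path)  # tests went red: reverse this one delete
            kept.append(path)
    return removed, kept
```

The key design choice is that a red test reverses the delete instead of triggering a "fix": the agent assumes the delete was wrong, exactly as the workflow demands.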
The result is a repo that gets lighter without the usual fear. In one overnight run, the crew removed 412 lines of dead code across a midsize SaaS. Zero tests broke. Zero pages stopped rendering. The next morning the reviewer skimmed the diff, clicked approve, and moved on.
Slop-cleaner only works if the test safety net is honest. If the agent writes fake tests that always pass, the whole pattern falls apart. The workflow enforces one rule: every test must call a real function and assert a real output. Tests that never import the file under test are rejected.
Janitor two: /heal
/heal's job is to fix broken branches without putting broken code on main.
When /heal wakes up, it reads the list of recent feature branches. For each one, it runs the build, the type check, and the test suite. If a branch is red, /heal starts working on it.
The important detail: /heal works on a side branch, not on the original. It copies the broken branch into something like heal/feature-name, makes its edits there, and leaves the original alone. Main stays clean the entire time. The original author still owns the fix decision.
The fix loop is tight. /heal reads the first failing check and asks one question: what change makes this specific check pass? It edits the smallest possible surface, re-runs the check, and moves to the next failure. It does not refactor. It does not improve. It only closes the red signals.
After the loop, /heal runs everything again. If all checks pass, it scores its own confidence on a zero to one hundred scale. The score is based on simple signals: did the fix touch a test file or a source file, how many lines changed, were the errors clear and narrow or vague and wide, did the same test fail twice in a row before passing.
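Those signals combine naturally into a simple subtractive score. The weights below are illustrative assumptions, not the workflow's actual numbers:

```python
def heal_confidence(
    lines_changed: int,
    touched_test_files: bool,
    errors_were_narrow: bool,
    flaky_before_pass: bool,
) -> int:
    """Score a /heal fix on a 0-100 scale from simple, checkable
    signals. Start at full confidence and subtract per red flag."""
    score = 100
    if lines_changed > 20:
        score -= 30      # wide diffs deserve closer human review
    elif lines_changed > 5:
        score -= 10
    if touched_test_files:
        score -= 25      # editing tests to make tests pass is suspect
    if not errors_were_narrow:
        score -= 20      # vague errors mean the fix may be a guess
    if flaky_before_pass:
        score -= 15      # a test that failed twice then passed is shaky
    return max(score, 0)
```

A two-line source-file fix for a clear error scores high; a wide diff that touched tests scores low and earns the closer look.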
Then /heal writes a short verdict and emails it. The verdict includes the branch name, the confidence score, the exact files changed, and a one-line reason. Ninety-two percent confidence, two files touched, missing import in the handler. That kind of note.
A human still decides whether to merge. High confidence with a tight diff usually gets a yes. Low confidence or a wide diff gets a closer look. The point is not to remove human review. The point is to make sure the human is reviewing a clean, narrow fix instead of a raw red branch.
Janitor three: /drift
/drift's job is to catch pattern divergence across recent features before it spreads.
When /drift wakes up, it builds a pattern map of your codebase. It picks a small set of features (usually the last five to ten) and records the shape of each one: folder layout, filenames, function names, import style, test location. This is a simple snapshot, not a deep analysis.
Then it compares. If four features have the same shape and one feature is slightly different, that one is an outlier. /drift does not rewrite the outlier. It flags it.
A flag looks like this: "notify uses sendNotification while auth, billing, and search use sendEmail. Naming drift. Costs 4x to fix after three more features copy it." That is specific enough to act on and vague enough to leave the final call to a human.
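The compare step reduces to majority voting over shape records. A minimal sketch, assuming each feature's shape has already been snapshotted into a flat dict (the field names here are made up):

```python
from collections import Counter


def find_drift(shapes: dict[str, dict[str, str]]) -> list[str]:
    """Compare per-feature shape records and flag outliers. A field
    drifts when exactly one feature disagrees with a majority shared
    by all the others."""
    features = list(shapes)
    fields = {f for shape in shapes.values() for f in shape}
    flags = []
    for field in fields:
        values = Counter(shapes[name].get(field) for name in features)
        common, count = values.most_common(1)[0]
        if count == len(features) - 1:  # all but one feature agree
            outlier = next(n for n in features if shapes[n].get(field) != common)
            flags.append(
                f"{outlier} uses {shapes[outlier].get(field)!r} for {field} "
                f"while the other {count} features use {common!r}"
            )
    return flags
```

Note that the function only describes the divergence; deciding which side of the fork is correct stays with the human, as the workflow requires.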
Why flag instead of fix? Because drift is not always wrong. Sometimes the outlier is the right call and the other four are the legacy. A human can tell which is which. An agent cannot, because the rule changes based on intent. So /drift does the work an agent is good at (spotting the pattern) and leaves the judgment to you.
The critical window is early. If /drift catches the drift in notify before three more features copy the pattern, the fix is one file. If it catches it after, the fix is four files plus the coordination across teams. That is where the "costs 4x to fix" number comes from.
One night, one schedule
The scheduler is simple. One trigger, one cron-style entry, fires at a set time every night.
When it fires, a fresh Claude Code session starts. The orchestrator reads the current state of the repo and runs the three workflows in order: slop-cleaner, then /heal, then /drift. Each one writes a short note to a shared log at ./overnight.log.
The order matters. Slop-cleaner runs first because it shrinks the code under review. /heal runs second because it fixes what is broken right now. /drift runs last because it reads the cleanest version of the repo, after the other two have done their work.
Each janitor has a time budget. If it runs long, the workflow ends that step and moves to the next one. The point is a predictable morning email, not a perfect overnight. Unfinished work gets flagged and picked up the next night.
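The ordered run with per-step budgets can be sketched as a tiny orchestrator. This is an assumption about the shape, not the real runner: here the budget is only checked after each janitor returns, whereas a production scheduler would interrupt a long-running step.

```python
import time
from typing import Callable


def run_crew(janitors: list[tuple[str, Callable[[], str], float]]) -> list[str]:
    """Run each janitor in order, noting any that blew its time
    budget (seconds). Overruns are flagged for the next night,
    never silently dropped."""
    log = []
    for name, job, budget in janitors:
        start = time.monotonic()
        note = job()
        elapsed = time.monotonic() - start
        status = "done" if elapsed <= budget else "over budget, resumes tomorrow"
        log.append(f"{name}: {note} ({status})")
    return log
```

The fixed order (slop-cleaner, then /heal, then /drift) lives in the list the caller passes, keeping the orchestrator itself dumb and predictable.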
Nothing in this setup requires a cloud platform. A real crontab entry on your machine works. A Claude Code Desktop scheduled task works. The choice is operational, not architectural. What matters is that the trigger fires, a fresh session wakes, and the orchestrator knows where to start.
The morning report
The last step of the overnight run assembles the log into an email and sends it. The format is short, scannable, and ordered from "needs nothing" to "needs review."
A real morning log looks like this:

```text
overnight.log - janitor crew
01:00 Slop-cleaner swept the repo
      412 dead lines removed. Tests green.
02:30 /heal rescued a broken branch
      Fixed on side branch. 92% confidence.
04:00 /drift caught a naming gap
      1 file out of pattern. Flagged.
07:00 Verdict emailed to you
      3 cleanups ready for review.
```

You read it over coffee. Four minutes, tops. Each line has a time, a short outcome, and one number that tells you how confident the janitor was. The cleanups with high confidence usually get approved on the spot. The flagged drift gets a quick look and either a fix or a "leave it, that one is intentional."
The shape of the email is the point. Short. Concrete. Ordered. No prose summaries. No "the agent worked hard tonight" filler. You want to know what changed, how sure the janitor is, and what you need to do about it. Nothing else.
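Assembling that shape from structured log entries is a few lines. A sketch under assumed field names (`time`, `janitor`, `outcome`, `metric`, `needs_review` are illustrative, not the workflow's real schema):

```python
def render_report(entries: list[dict]) -> str:
    """Assemble overnight log entries into the short, scannable email
    body: one line per entry with a time, an outcome, and one number,
    ordered from "needs nothing" to "needs review"."""
    ordered = sorted(entries, key=lambda e: e["needs_review"])  # calm items first
    return "\n".join(
        f'{e["time"]} {e["janitor"]}: {e["outcome"]} ({e["metric"]})'
        for e in ordered
    )
```

No prose summary is generated anywhere, which is the point: the renderer cannot pad the email even if it wanted to.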
How to build your own version
You do not need the whole stack to get the benefit. The pattern is simple. Start with this shape and expand from there.
Write three small workflows. Keep them narrow. Slop-cleaner deletes dead code only if a test passes. /heal fixes broken branches only on a side branch. /drift flags pattern divergence only on recent features. Each one has a single job and a single output format.
Put them behind one scheduler. Cron is fine. A scheduled Claude Code session is fine. You do not need a job queue or a message bus. You need one ping that wakes the crew at a set time.
Write the regression tests before the delete, every time. This is the rule that makes slop-cleaner safe. If the agent cannot write an honest test for the file it wants to delete, the delete does not happen. That single rule is worth the whole workflow.
Keep /heal on a side branch. Main must stay clean. If /heal's fix is wrong, the side branch is thrown away and the original broken branch is still waiting for a human. No surprises on main, ever.
Make /drift flag, not fix. This is the hardest rule to hold. Agents want to "improve" code when they see it. For drift, that is the wrong move. A pattern that looks wrong might be intentional. The agent's job is to surface it and stop. The human's job is to decide what the pattern should be going forward.
End every run with a short, structured email. Time, outcome, number, one-line reason. Nothing else. If you cannot scan the whole email in under five minutes, the janitor is doing too much work in one night. Split it.
Where else this pattern applies
Once you have a janitor crew running on code, the same shape works for other kinds of slow-burning mess.
Content janitor. A workflow that sweeps old blog posts, flags stale sections, and writes a note to the author. It does not rewrite. It flags.
Analytics janitor. A workflow that checks event naming across your product for drift, flags outliers, and proposes a canonical list. Same pattern map as /drift, different inputs.
Dependency janitor. A workflow that scans unused packages, writes a test that imports each active one, and removes the rest if tests still pass. Same safety net as slop-cleaner, applied to package.json.
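The candidate-listing half of the dependency janitor is straightforward to sketch. This is a rough heuristic, not a real module resolver: the regex only catches plain `import ... from "pkg"` and `require("pkg")` forms.

```python
import json
import re


def unused_packages(package_json: str, source_files: list[str]) -> set[str]:
    """List dependencies from package.json that no source file
    imports. Relative paths (./foo) are excluded by the regex;
    scoped packages (@scope/pkg) keep their scope."""
    deps = set(json.loads(package_json).get("dependencies", {}))
    pattern = re.compile(r'(?:from|require\()\s*["\']([^"\'./][^"\']*)["\']')
    imported = set()
    for src in source_files:
        for match in pattern.finditer(src):
            name = match.group(1)
            parts = name.split("/")
            # "@scope/pkg/deep" -> "@scope/pkg"; "pkg/deep" -> "pkg"
            root = "/".join(parts[:2]) if name.startswith("@") else parts[0]
            imported.add(root)
    return deps - imported
```

As with slop-cleaner, this list is only the candidates; the safety net (a test that imports each surviving package) still gates the actual removal.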
The core idea is not "AI cleans code." It is "small messes get handled on a schedule, behind a safety net, in a format you can read in five minutes." That works for code, content, analytics, and anything else that compounds quietly between the big ships.
One crew. One schedule. Three narrow jobs. A cleaner repo in the morning and an email that respects your time. That is the whole system.