What 17 Retrospectives Taught Me About Coding With AI

A note is advice. Advice gets ignored. A pre-commit hook that exits 1 does not.

I ran /retro at the end of every build for three months. The format was fixed: two pain points, one win, no platitudes. Name the file, the error, the wasted hours. I expected a journal. What I got was a pattern: the same three failures, wearing different disguises, across eight unrelated projects.

That recurrence is what actually changed how I work with AI. Here is the loop, the three failures, and what I did about them.

The loop

Run a retro at the end of every session. Force specifics: the exact bug, the wrong assumption, the hours it cost.

Read back across projects. Watch for the same lesson showing up in a new costume.

When a lesson recurs, stop trusting yourself to remember it. A recurring mistake is a process gap, not a knowledge gap.

Encode it as enforcement: a skill, a CLAUDE.md rule, or a hook that fails the commit.

The next retro is about something new. The old pain is dead.

The point of a retro is not reflection. It is to find the mistake that keeps happening, then make it impossible.
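The "read back across projects" step can be made mechanical. Here is a minimal sketch, assuming each pain point gets a short tag when you write the retro. The tag names and the (project, tag) format are my own illustration, not what /retro actually produces.

```python
from collections import defaultdict

def recurring_lessons(retros, min_projects=2):
    """Return lessons that appear in at least `min_projects` distinct projects.

    `retros` is an iterable of (project, lesson_tag) pairs. The tagging
    convention is hypothetical: one short label per pain point.
    """
    projects_by_lesson = defaultdict(set)
    for project, lesson in retros:
        projects_by_lesson[lesson].add(project)
    return {
        lesson: sorted(projects)
        for lesson, projects in projects_by_lesson.items()
        if len(projects) >= min_projects
    }

# Illustrative entries only; the tags are invented for this example.
retros = [
    ("SerpDelta", "doc-disagrees-with-runtime"),
    ("BestThriends", "doc-disagrees-with-runtime"),
    ("itbroke", "guard-missing-in-other-consumers"),
]
print(recurring_lessons(retros))
# {'doc-disagrees-with-runtime': ['BestThriends', 'SerpDelta']}
```

Anything this returns is a candidate for enforcement rather than another note.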

Failure 1: docs and specs lie

On SerpDelta I burned hours chasing a bounce-rate “bug.” The spec said the threshold was 30 seconds; the running code used 10. The system was correct the whole time. The document was the stale artifact.

Then BestThriends did it again from the other direction. Its schema.sql declared flag IN ('good','bad','style','template'), but the live code only ever wrote 'win' and 'fail'. Two columns the app queried on every request were not in the schema file at all. A migration had run months earlier and nobody updated the canonical schema.

The lesson stopped being “keep docs updated” (impossible in a fast-moving codebase) and became: when the doc and the runtime disagree, the runtime wins. Grep the runtime value before you believe anything the doc says. That became my Ground Truth Rule.
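As a rough illustration of the rule, the sketch below compares the values a schema file declares for a column against the values the running code actually writes. The function names are hypothetical, and the one-line schema string is an assumption built around the flag constraint quoted above. In practice the runtime values would come from grepping the code or querying the live table.

```python
import re

def declared_values(schema_sql, column):
    """Extract the string literals from a `column IN ('a', 'b', ...)` constraint."""
    pattern = rf"\b{re.escape(column)}\s+IN\s*\(([^)]*)\)"
    match = re.search(pattern, schema_sql, re.IGNORECASE)
    if match is None:
        return set()
    return set(re.findall(r"'([^']*)'", match.group(1)))

def schema_drift(schema_sql, column, runtime_values):
    """Report where the schema file and the runtime disagree for one column."""
    declared = declared_values(schema_sql, column)
    runtime = set(runtime_values)
    return {
        "written_but_not_declared": sorted(runtime - declared),
        "declared_but_never_written": sorted(declared - runtime),
    }

# Assumed shape of the stale schema line; only the IN (...) list is from the post.
schema_sql = "flag TEXT CHECK (flag IN ('good','bad','style','template'))"
print(schema_drift(schema_sql, "flag", ["win", "fail"]))
```

If either list is non-empty, the document is the thing to distrust, not the running system.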

Failure 2: a check that tests a proxy is not verification

This site shipped broken structured data for days. Blade, the templating engine, treats @ as the start of a directive, so it saw the @context in my JSON-LD and tried to run it as one. The page looked fine, so it passed. Separately, my deploy script had a sitemap “check” that counted URLs and reported “47 URLs,” but never confirmed the new pages were actually in it.

Both checks ran. Neither verified the thing that had changed.

Proxy check: “the sitemap has 47 URLs” / “the page renders”

Real check: “this specific new page is in the sitemap” / “the JSON-LD parses”
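The two real checks might look something like this. It is a sketch under assumptions: the function names and the example.com URLs are placeholders, and it presumes a standard sitemaps.org XML sitemap and a JSON-LD object with a top-level @context.

```python
import json
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def missing_from_sitemap(sitemap_xml, expected_urls):
    """Return the expected URLs that are NOT listed in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }
    return [url for url in expected_urls if url not in listed]

def jsonld_parses(raw):
    """True only if the rendered JSON-LD is valid JSON and still has @context."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "@context" in data

# Placeholder sitemap: two old pages, and the new page is absent.
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-post</loc></url>
</urlset>"""

print(missing_from_sitemap(sitemap_xml, ["https://example.com/new-post"]))
print(jsonld_parses('{"@context": "https://schema.org", "@type": "Article"}'))
```

A URL count would have passed on this sitemap. The membership check fails it, which is the point.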

The corollary kept recurring too: when one missing guard causes a bug, the same gap is almost always copy-pasted elsewhere. On itbroke, a noise filter ran on one display list while every number users actually read (totals, severity, alert counts, email triggers) queried raw data across three separate files. Fixing one would have left the other two lying. Those two lessons became /audit and /fix: test the path that changed, then grep every consumer for the same shape of mistake.
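"Grep every consumer" can start as something this crude: list every file that touches the raw data but never mentions the guard. The file names, `raw_events`, and `apply_noise_filter` below are hypothetical stand-ins, not identifiers from itbroke, and a plain substring match will produce false positives that still need a human read.

```python
def unguarded_consumers(sources, data_name, guard_name):
    """Return paths that mention `data_name` but never mention `guard_name`.

    `sources` maps file path -> file contents. This is a substring match,
    so treat the result as a list of files to inspect, not a verdict.
    """
    return sorted(
        path
        for path, text in sources.items()
        if data_name in text and guard_name not in text
    )

# Hypothetical files: one display list is filtered, two other consumers are not.
sources = {
    "display_list.py": "rows = apply_noise_filter(query('raw_events'))",
    "totals.py": "total = len(query('raw_events'))",
    "alerts.py": "if count(query('raw_events')) > limit: send_email()",
    "settings.py": "DEBUG = False",
}
print(unguarded_consumers(sources, "raw_events", "apply_noise_filter"))
# ['alerts.py', 'totals.py']
```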

Failure 3: instructions get ignored; enforcement does not

This was the expensive one. My trading bot’s executor shipped with four runtime bugs and crashed every ten minutes for two days. The health check reported “0 errors” the entire time, because it watched for exceptions it could see, not for whether the job produced any output. Syntax checks passed. The CLAUDE.md rule that said “prove the real path runs” was sitting right there in the file. It changed nothing.
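A health check that would have caught this asks a different question: not "did I see an exception?" but "when did the job last produce output?" A minimal sketch, assuming you can get a timestamp for the job's last output; the 15-minute threshold and the function name are illustrative choices, not the bot's real settings.

```python
from datetime import datetime, timedelta, timezone

def job_is_healthy(last_output_at, now, max_silence=timedelta(minutes=15)):
    """Healthy means the job produced output recently.

    A job that crashes before logging anything yields "0 errors" forever,
    so silence is treated as failure. `last_output_at` is None if the job
    has never produced output.
    """
    if last_output_at is None:
        return False
    return now - last_output_at <= max_silence

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(job_is_healthy(now - timedelta(minutes=5), now))   # True
print(job_is_healthy(now - timedelta(hours=2), now))     # False
print(job_is_healthy(None, now))                         # False
```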

CLAUDE.md rules, memories, and polite instructions get ignored under pressure. exit 1 does not.

That is the lesson three months of retros hammered flat. An AI assistant, and a tired human, will skip a guideline when they are moving fast. Neither can skip a pre-commit hook that blocks the commit. So the rules that actually mattered stopped being prose and became a gate: AI Change Control, a hook that scans staged files and fails the commit on the exact patterns my retros kept flagging.

What earned a hook

Verification must test the thing that changed, not a proxy for it
Grep every consumer when you fix a missing guard
Never trust a doc’s claim about its own schema; check the runtime
No debug leftovers or banned patterns in committed code
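For the last item, a hook of this kind can be quite small. What follows is not AI Change Control itself, just a sketch of the general shape, assuming a Python script installed as the pre-commit hook. The banned patterns are examples I picked, and the git calls are the standard `git diff --cached` and `git show :path` commands.

```python
import re
import subprocess

# Example patterns only; a real list would come from your own retros.
BANNED = {
    "debug leftover": re.compile(r"\bbreakpoint\(\)|\bconsole\.log\("),
    "temporary marker": re.compile(r"TODO: remove"),
}

def find_violations(files):
    """Scan {path: text} and return (path, line_number, label) for each hit."""
    hits = []
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for label, pattern in BANNED.items():
                if pattern.search(line):
                    hits.append((path, line_no, label))
    return hits

def staged_files():
    """Read the staged version of every added, copied, or modified file."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout.splitlines()
    files = {}
    for name in names:
        # `git show :path` prints the staged content, not the working copy.
        shown = subprocess.run(
            ["git", "show", f":{name}"],
            capture_output=True, encoding="utf-8", errors="replace",
        )
        if shown.returncode == 0:
            files[name] = shown.stdout
    return files

def main():
    hits = find_violations(staged_files())
    for path, line_no, label in hits:
        print(f"{path}:{line_no}: {label}")
    return 1 if hits else 0  # non-zero exit blocks the commit

# As a hook, the script would end with: import sys; sys.exit(main())
```

Keeping `find_violations` separate from the git plumbing means the rule itself can be tested without a repository.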

What actually changed

The retros did not just produce lessons. They produced a ranking of which lessons were costing me the most, and that ranking is what I encoded. The skills I lean on now (/audit, /fix, the Ground Truth Rule, AI Change Control) are not best practices I read in someone’s thread. They are the three mistakes I got tired of repeating, converted into things that cannot be skipped.

The habit is cheap: two pain points and one win, written down before the context evaporates. The payoff compounds. My tenth project did not repeat the first project’s bugs, because the first project’s bugs had become a hook. The honest version of “we should be more careful” is a script that returns a non-zero exit code.

Stop writing your hard-won lessons down as advice. Write them down as enforcement.

Related:

AI Trusts Your Docs. That’s the Problem.: the Ground Truth Rule in full
AI Change Control: the hook that enforces what instructions can’t
From 733 False Positives to 9 Safe Deletes: an audit loop built from the same habit
