AI is dangerously good at validating bad assumptions. When you hand it bad ground truth, it validates those too.
Tell any AI to audit a feature and it’ll read the planning doc, check that the code matches the spec, and report back that everything looks good. It’s thorough, it’s fast, and it’s wrong. The spec was wrong. This isn’t a Claude problem or a GPT problem. It’s what happens when you point a powerful pattern-matcher at documentation that was never verified against the actual system.
I found this out the hard way. Twice in the same week.
The “display-only” claim
I had a planning doc for a new feature: preliminary ranking data. The doc said clearly: “preliminary data is display-only. It does not affect calculations, alerts, or exports.”
The AI audited the plan. Clean bill of health. The spec says display-only, the code puts data in the rankings table, the chart reads from the rankings table. Looks correct.
Except it wasn’t.
Docs said: “display-only — does not affect any calculations, alerts, or exports”
Reality: 4 consumers silently ingesting preliminary data — summaries, alert scoring, CTR anomaly detection, and cannibalization analysis
I added five rules to my CLAUDE.md, a “Ground Truth Rule” that forces the AI to grep for actual runtime values instead of trusting what docs claim. Then I ran the same audit again.
This time the AI didn’t read the spec and nod. It grepped every consumer of that rankings table. It found four that would have silently ingested preliminary data. None of them filter by date range. All of them would have produced wrong results with no visible error.
What the audit saw: rankings table, chart component, spec saying “display-only”
What the audit missed: 4 downstream consumers querying the same table without date filters
Why it missed it: trusted the doc’s claim instead of grepping every consumer of the table
The fix wasn’t adding date filters to every consumer, which would be a behavioral guard requiring every future query to remember the rule.
Fragile fix: add WHERE date <= confirmedDate to every consumer (must find all 7+, every future query must remember)
Structural fix: separate preliminary_rankings table — existing consumers can’t see data that isn’t in their table
The “display-only” promise became structurally true, not aspirationally true.
When the docs don’t know they’re wrong
Same week, different codebase. Alert system audit.
This time the docs weren’t just stale — they were actively misleading. They referenced a constant that didn’t exist (the real one had a different name), a class that was never built, and volume thresholds that were off by 5-20x. Seven claims, all wrong. Some had been wrong for two days. In a fast-moving codebase, docs rot in hours.
But the real finding wasn’t stale docs. It was architectural drift that no single file reveals: the kind of cross-file structural problem that code review never catches.
Two separate scheduled jobs were creating alerts for the same domains on independent schedules with different dedup keys. Neither path checked the other’s duplicates. This wasn’t a recent regression. It was slow accretion where a second system was added alongside the first without retiring it. That kind of dead code accumulates silently until something forces you to look. The bug only surfaced by tracing the scheduled jobs through to their creation endpoints.
Same audit also found: health status was computed before silenced alerts were filtered (users saw “critical” with no visible problems), and a traffic detection formula that always computed 0% for increases because of a denominator choice.
Every one of these came from grepping runtime values and tracing actual execution paths, not reading docs.
The fix: five lines in CLAUDE.md
The guardrail isn’t “be more careful.” It’s mechanical. Five rules that change the AI’s default from “trust then verify” to “verify then trust”:
- Never trust specs, docs, or constant names as ground truth — grep for actual runtime values
- When code and docs disagree, check git blame on both
- Trace what producers actually output, not what constants are named
- Verify conversion layer outputs contain what consumers need
- Verify state against production/runtime, not local snapshots
Add these under a “Ground Truth Rule” heading in your CLAUDE.md. They apply to audits, planning doc reviews, and any task where AI is evaluating existing code against a written claim. CLAUDE.md is also the right place to stop Claude from gating your progress mid-session, same principle, different failure mode.
The rule
The spec is a hypothesis. The codebase is the evidence. Force your AI to treat them that way.
An audit that reads the spec and confirms the code matches is a confirmation exercise, not an audit. The same pattern that makes AI fast at validating your architecture makes it fast at validating your mistakes, unless you force it to check.
Related:
- Audit Before You Spread the Mess: When to audit, and what to look for
- Don’t Assume AI Fixes Things Properly: The verification step most people skip
- Every Child of a Product Needs Its Own Moat: The same principle applied to content — clean on paper doesn’t mean it deserves to rank