Skills improve odds. They do not prevent violations.
You wrote a clean CLAUDE.md. Architecture rules, naming conventions, the canonical service, the non-negotiables. And Claude Code still hardcoded a value you told it to pull from an enum, in the same session you reminded it not to.
It’s not your prompt. It’s not the model being dumb. The reason Claude Code ignores your CLAUDE.md is structural, and once you see it you stop trying to fix it with better wording. A rule that lives in a prompt is a suggestion. The AI weighs it against the task you actually asked for, and the task usually wins.
The problem, honestly
I’m not a coder. I’m an AI orchestrator. I plan, I architect, I have AI write the actual code. And I’d built what should be a tight system: clean codebase, frameworks, custom skills, rules, shell scripts, preflight checks, deploy pipelines.
And AI still shipped bugs.
Not because the model was dumb. Not because the prompts were weak. Not because the docs were missing. It shipped bugs because it was asked to do one thing, and it did that one thing, even when doing it meant:
- using a different function than the canonical one
- hardcoding a value that should come from an enum
- inventing a near-duplicate helper
- bypassing a service layer “just this once”
- adding inline logic in a controller because it was faster
The task got done. The big picture got broken. Every one of those moves violated a rule I had written down, in a file the AI had read, in the same project.
What I already had (and why it wasn’t enough)
Before writing any enforcement, my setup already included:
- A tight
CLAUDE.md: architectural rules, structure, conventions, non-negotiables. Load-bearing, focused, not bloated. - Custom skills for Claude Code:
/think,/audit,/fix,/todo,/remediation,/content-write,/content-audit. Skills for pausing, zooming out, auditing before remediating, verifying after changes. - Shell scripts and preflight checks: deploy pipeline, content preflight, slug manifest, redirect generation. All the operational plumbing.
That’s two of the three layers a real system needs. I had what matters (CLAUDE.md) and how to work (skills). What I was missing was what is allowed: the machine-enforced guardrails. The layer that doesn’t ask AI to behave. It makes misbehavior impossible to ship.
This is also a case study in why you can’t trust your own docs. I had literally written the rules into CLAUDE.md, and my own code still violated them. Docs describe intent. Enforcement produces outcomes. They are not the same thing.
That pullquote at the top took me longer to accept than it should have. I kept trying to solve an enforcement problem with better prompts and tighter skills. You can’t. The job descriptions are different. A skill nudges the odds in your favor. It never closes the door.
The wrong goal: “AI needs to understand the whole codebase”
For a while I thought the goal was “AI needs to understand the entire codebase.” More context. More memory. More planning. Bigger windows. If only it could hold the whole system in its head, it would stop breaking things.
That’s the wrong abstraction. Even humans don’t hold an entire system in working memory. A senior engineer doesn’t keep all 400 files loaded when they touch one function. They rely on the compiler, the test suite, the linter, the code review to catch what they forgot. The fix isn’t more memory. It’s stronger constraints at the points where bad decisions can enter.
The other wrong framing: “no bugs, no refactors.” That’s fantasy. There will always be bugs. There will always be refactors.
The real goal is sharper than that:
Make local changes obey global rules by default.
That’s it. That’s the job of an AI orchestration system. Not perfect code. Not omniscient context. Just this: when the AI makes a small change over here, it can’t quietly break the rule you set over there.
What soft enforcement looks like (and why CLAUDE.md isn’t enough)
Your system is soft if the AI can still:
- hardcode a value where canon exists
- invent a near-duplicate helper
- bypass the canonical service
- add one-off logic in a controller
- mutate a pattern because “it worked”
- create a new abstraction instead of extending an existing one
If any of these are possible, drift is guaranteed. Not likely. Guaranteed. The only question is how fast.
CLAUDE.md is soft enforcement by definition. It’s text the model reads, weighs, and sometimes sets aside when the immediate task pulls harder. You can write the rule in bold. You can write it in caps. You can write “this is a critical rule, not a suggestion” right next to it. I did. The AI still bypassed the service layer when bypassing it was the shortest path to the thing I’d asked for.
That’s not a failure of obedience. It’s the nature of a suggestion competing with an instruction.
The shift: machine enforcement over prompt enforcement
What finally worked was changing what kind of thing the rule was. Not a sentence in a file. A wall in the pipeline.
High velocity inside hard guardrails:
- AI should be free inside a lane
- it should hit a wall when leaving the lane
- the wall should be machine enforced, not prompt enforced
Prompts shape behavior. Enforcement controls outcomes. These are not the same thing, and I’d been treating them as if they were.
Soft system: rules live in CLAUDE.md, skills, and prompts. AI is “encouraged” to follow canon. It complies most of the time, which means it drifts some of the time, which compounds.
Hard system: rules live in CI, static analysis, and pre-commit hooks. AI literally cannot commit code that violates canon. Compliance isn’t requested. It’s the only path the code can take to land.
The difference isn’t strictness. It’s where the rule lives. A rule in a prompt is something the AI can reason its way around. A rule in a pre-commit hook is something that physically stops the commit. One is advice. The other is a gate.
The turn: a grep script caught 104 violations
Here’s the part that made it real, and made me stop trusting my own setup.
I wrote a dead-simple checker. Not PHPStan. Not Larastan. Not an AST parser. A handful of grep patterns reading a policy.json file, wired into a pre-commit hook. The kind of thing you’d dismiss as too crude to matter. It runs in under a second and has zero dependencies.
I pointed it at my own codebase. The one with the tight CLAUDE.md, the custom skills, the preflight checks. The one I’d have told you was clean.
It found 104 violations. Every single one broke a rule that was already written, in plain English, in my CLAUDE.md. Hardcoded values where an enum existed. Near-duplicate helpers. Logic in controllers that belonged in a service. None of it was exotic. All of it was the AI doing exactly what I’d told it not to do, in a file it had read.
That number is the whole argument. 104 rules I’d written, ignored, sitting in committed code, invisible until a grep script looked. No amount of better prompting would have caught them. A one-second script did, on the first run.
That checker became the first wall of a framework I named AI Change Control. It’s the structural fix for everything above: the cheap, grep-based gate that turns your CLAUDE.md rules from suggestions the AI weighs into walls it can’t commit past. The full breakdown of those 104 violations, and what each one taught me, is the receipts case study. The tool itself is open source on GitHub.
If you came here because Claude Code keeps ignoring your CLAUDE.md, that’s the answer. Stop rewriting the file. The file was never the problem. The file was always a suggestion. Move the rules that matter into a place the AI can’t argue with, and the drift stops, because the path that produced it is gone.
Related:
- AI Change Control: The Pre-Commit Hook Framework: the fix in full, the framework that turns CLAUDE.md rules into walls.
- The Receipts: 104 Violations Caught on a Real Laravel Codebase: every violation the grep script found on this exact site.
- AI Trusts Your Docs. That’s the Problem.: docs describe intent, enforcement produces outcomes, and why that gap is where the bugs live.