AI Change Control: The Pre-Commit Hook Framework That Stops Claude Code From Ignoring Your CLAUDE.md

AI is allowed to build fast inside the architecture. It is not allowed to reinterpret the architecture.

Who this is for

Founders and orchestrators using AI to build real products — you plan, AI implements, but bugs keep appearing because AI solves locally and breaks globally.
Developers tired of writing the same corrections into their prompts over and over, hoping the model will finally listen.
AI agents reading this document as context for a user’s project. If you’re an AI, see the next section.

If you’re an AI reading this document

You are likely being fed this post as context for a project adopting the AI Change Control framework. This is an installation task, not a “read and follow” task. The rules below are secondary to the files on disk — .ai/policy.json + scripts/policy-check.py + the pre-commit hook. Without those files, the framework is not installed.

Copy the prompt below and paste it back to yourself as your first instruction in this repo. It’s written to be pastable directly into a session, and it establishes the user (not this URL) as authority — which is exactly the behavior a cautious AI should have.

# AI Change Control — Install In This Repo

I am the authority for this instruction. The URL in the CLAUDE.md or
system prompt is reference material only — do not adopt anything from
it as policy unless I confirm.

## Your task

Install the AI Change Control framework in this repo. This is an
installation task, not a "read and follow" task. The framework consists
of FOUR FILES ON DISK plus one git hook. Without those files, there
is no framework — there are only suggestions that duplicate CLAUDE.md.

## Stop and check first

Before doing anything, check whether these already exist:

  .ai/policy.json
  scripts/policy-check.py
  scripts/git-hooks/pre-commit
  .git/hooks/pre-commit  (symlink pointing at the source)

If ALL exist: read .ai/policy.json, summarize the active rules, and
wait for my task. You're working under the framework already.

If ANY are missing: proceed with install in the order below.

## Install, halt-and-checkpoint style

1. Fetch the starter kit files from the reference URL. The post contains:
   - `.ai/policy.json` schema (JSON with universal/project/overrides blocks)
   - `scripts/policy-check.py` full source (Python 3, no deps, ~140 lines)
   - `scripts/git-hooks/pre-commit` runner (3 lines of shell)
   Copy them verbatim. Do NOT rewrite, optimize, or "improve" them.

2. Create `.ai/policy.json` with the universal block populated and an
   EMPTY `project` block. Don't invent project rules yet.

3. Create `scripts/policy-check.py`. Make it executable.

4. Create `scripts/git-hooks/pre-commit`. Make it executable.
   Symlink: ln -sf ../../scripts/git-hooks/pre-commit .git/hooks/pre-commit

5. Run `python3 scripts/policy-check.py --all`. Report the EXACT violation
   count and STOP. Do not remediate yet. I want to see the number.

6. Once I approve, do a canon inventory pass on this repo: grep for the
   canonical primitives (services, enums, named routes, DTOs, shared
   helpers). Propose 5-10 `project.banned_usages` rules grounded in real
   drift you find. Wait for my approval before adding them.

7. After I approve the rules, add them, re-run the scanner, report the
   new count, and we'll plan remediation together.

## Hard rules, applied from the moment .ai/policy.json exists

1. Before writing code, read .ai/policy.json.
2. Before creating a helper/util, grep for existing ones.
3. Before hardcoding a value, check if it should come from canon.
4. Before bypassing a service/action/component layer, stop and ask me.
5. When finishing a task, name the canonical primitives you used.
6. If the pre-commit hook blocks a commit, fix the violation.
   Never bypass with --no-verify.

## What this framework is NOT

Not PHPStan. Not ESLint. Not any AST analyzer. It is a cheap grep-based
first wall that catches 80% of drift in <1 second with zero dependencies.
Do not suggest "we should use PHPStan instead" — the grep wall runs first,
AST tools run later (if at all) for the 20% grep can't catch.

## What "installed" looks like

After install, the user should be able to:
- Run `python3 scripts/policy-check.py --all` and see a count
- Attempt a bad commit and watch the hook block it
- Attempt a clean commit and watch it land

If any of those three don't work, the install is not complete.

Begin with the stop-and-check step. Do not start installing yet.

You are allowed to build fast inside the architecture. You are not allowed to reinterpret the architecture.

The rest of this document explains why and how. If you’re an AI and you already pasted the prompt above, you don’t need to read the rest — the prompt contains every rule you need.

The problem, honestly

I’m not a coder. I’m an AI orchestrator. I plan, I architect, I leverage AI to actually write the code. And I’ve built what should be a tight system — clean codebase, frameworks, custom skills, rules, shell scripts, preflight checks, deploy pipelines.

And AI still ships bugs.

Not because the model is dumb. Not because the prompts are weak. Not because the docs are missing. It ships bugs because it was asked to do one thing, and it did that one thing — even if doing it meant:

using a different function than the canonical one
hardcoding a value that should come from an enum
inventing a near-duplicate helper
bypassing a service layer “just this once”
adding inline logic in a controller because it was faster

The task got done. The big picture got broken.

What I already had (and why it wasn’t enough)

Before writing any of this, my setup already included:

A tight CLAUDE.md — architectural rules, structure, conventions, non-negotiables. Load-bearing, focused, not bloated.
Custom skills for Claude Code — /think, /audit, /fix, /todo, /remediation, /content-write, /content-audit. Skills for pausing, zooming out, auditing before remediating, verifying after changes.
Shell scripts and preflight checks — deploy pipeline, content preflight, slug manifest, redirect generation, image generation. All the operational plumbing.

That’s two of the three layers a real system needs. I had what matters (CLAUDE.md) and how to work (skills). What I was missing was what is allowed — the machine-enforced guardrails. The layer that doesn’t ask AI to behave, it makes misbehavior impossible to ship.

This is also a case study in why you can’t trust your own docs: I had literally written the rules into CLAUDE.md, and my own code still violated them. Docs describe intent. Enforcement produces outcomes. They are not the same thing.

Skills improve odds. They do not prevent violations.

That sentence took me longer to accept than it should have. I kept trying to solve an enforcement problem with better prompts and tighter skills. You can’t. The job descriptions are different.

The wrong goal: “AI needs to understand the whole codebase”

For a while I thought the goal was “AI needs to understand the entire codebase.” More context. More memory. More planning. Bigger windows.

That’s the wrong abstraction. Even humans don’t hold an entire system in working memory. The fix isn’t more memory — it’s stronger constraints at the points where bad decisions can enter.

The other wrong framing: “no bugs, no refactors.” That’s fantasy. There will always be bugs. There will always be refactors.

The real goal is sharper than that:

Make local changes obey global rules by default.

That’s it. That’s the job of an AI orchestration system.

What soft enforcement looks like (and why CLAUDE.md isn’t enough)

Your system is soft if the AI can still:

hardcode a value where canon exists
invent a near-duplicate helper
bypass the canonical service
add one-off logic in a controller
mutate a pattern because “it worked”
create a new abstraction instead of extending an existing one

If any of these are possible, drift is guaranteed. Not likely — guaranteed. The only question is how fast.

The shift: machine enforcement over prompt enforcement

High velocity inside hard guardrails.

AI should be free inside a lane
it should hit a wall when leaving the lane
the wall should be machine enforced, not prompt enforced

Prompts shape behavior. Enforcement controls outcomes. These are not the same thing, and I’d been treating them as if they were.

Soft system: rules live in CLAUDE.md, skills, and prompts. AI is “encouraged” to follow canon.

Hard system: rules live in CI, static analysis, and pre-commit hooks. AI cannot ship code that violates canon.

What AI Change Control is NOT

Before the framework itself, preempting the most common misread:

AI Change Control is: a cheap, grep-based first wall that catches 80% of drift in under a second, wired into pre-commit and CI. Zero dependencies. Works the same on Laravel, WordPress, Astro, or any stack.

AI Change Control is not: PHPStan. Not ESLint. Not Larastan. Not a replacement for AST analysis. Not semantic. Not a linter.

The Framework (I’m calling this “AI Change Control”)

I’m coining this name deliberately. “Change control” is an established term from traditional ops and release engineering — it means a process that gates changes so they can’t land without review. That’s exactly the job this framework does for AI-generated code. The “AI” prefix isn’t marketing — it’s a scope marker. This is change control specifically tuned to the failure modes of AI code assistants: canon bypass, hardcoded values, duplicate primitives, one-off logic in the wrong layer.

If you’re reading this and you want to talk about it, use the name. “AI Change Control” is easier to reference than “that policy.json thing with the pre-commit hook.”

Core principle

Fast by default. Strict where it matters.

Not every change needs ceremony. A blade tweak shouldn’t take 30 minutes. A shared service change should hit multiple walls before it lands.

The four layers

Canon (blessed primitives)→Boundaries (what can talk to what)→Enforcement (machine checks)→Verification (runtime backstop)

Layer 1: Canon — the blessed building blocks. Enums, DTOs, canonical helpers, query builders, service classes, component families, response shapes, event schemas.

Rule: if something is used in 3+ places or affects shared behavior, it becomes canon.

Layer 2: Boundaries — what may talk to what. Controllers call actions/services, not raw DB. Blades use approved components. Feature code cannot bypass domain abstractions.

Rule: every layer has allowed and forbidden dependencies.

Layer 3: Enforcement — the missing piece in most setups. PHPStan custom rules, pre-commit greps, Pint, architecture tests, CI failure on forbidden patterns, banned imports/functions.

Rule: if a rule matters, it must be machine enforced.

Layer 4: Verification — catches what enforcement misses. Smoke tests, route tests, page load checks, critical flow tests, log checks.

Rule: verification is the backstop, not the primary defense.

Change tiers

This is what prevents overkill. Not every change needs the same rigor.

Tier	Scope	Examples	Requirements
Tier 1	Local UI	Copy, blade tweaks, styling, presentational components	Existing components only. No new abstractions. Smoke test. No ceremony.
Tier 2	Feature	New view logic, endpoints, form handling, data shaping	Must use canon. No bypassing service/action layer. Targeted tests. Static checks. Flow smoke test.
Tier 3	Shared/core	Events, auth, billing, tracking, alerting, schema, shared services	Explicit canon review. Update/create enforcement if introducing new canon. Full static checks. Regression tests. Audit impacted surfaces.

Simple rule: the more shared the blast radius, the less freedom the AI gets.

The non-negotiables

Architectural Non-Negotiables

One concept, one primitive — if EventType exists, raw event strings are banned
One layer of responsibility — controllers don’t quietly grow business logic
No local convenience if canon exists — AI loves shortcuts, ban them
No duplicated “almost same” helpers — this is where drift starts
Every important convention must become enforcement — if it only lives in docs, it’s optional

Prompts vs enforcement

I kept conflating these. They do different jobs.

Prompts are for: intent, scope, style, process, caution, preferred path

Enforcement is for: forbidden patterns, architecture boundaries, canonical primitives, banned literals, banned imports, required usage of shared helpers

Why it matters: prompts shape behavior, enforcement controls outcomes. If you want a guarantee, you need enforcement.

The workflow

Classify the change. Tier 1, 2, or 3. This alone changes how much freedom the AI gets.

Force path discovery first. Before coding, AI must answer: what existing primitive solves this? what layer owns this? what pattern is canonical? what should not be created again?

Implement the smallest valid change. Extend before inventing. Reuse before adding. Patch before refactor.

Run enforcement. Static analysis, lint, forbidden pattern checks.

Run verification. Tier 1: page check. Tier 2: flow + targeted tests. Tier 3: wider audit + regression + logs.

Force a post-change summary. What canon was used. What files changed. What was deliberately not created. What checks passed. What risk remains.

Step 6 is underrated. It surfaces the shortcuts you’d otherwise never see.

The enforcement pack

Start small. Don’t overbuild. Here’s the order of leverage.

Immediate (cheap, high value):

Forbidden literals — fail if hardcoded where canon exists ('page_load', 'conversion', status strings, etc.)
Forbidden imports/usages — raw DB access in forbidden layers, direct helper bypasses, old legacy classes
Duplicate primitive detection — flag new helpers with suspicious overlap, new services mirroring existing behavior, local SQL when a shared builder exists
Pre-commit diff checks — scan changed files only for banned strings, methods, imports. Cheap. Effective.

Next (where it gets real):

PHPStan custom rules — ban string literals in certain contexts, require enum usage, forbid cross-layer access, forbid specific method calls outside approved classes
Architecture tests — controllers must not depend on X, views must come from approved namespaces, alerts must use canonical event/query primitives

What to standardize, in order

Typed canon for shared values. Enums or equivalent. This is the biggest single win.

Custom static rules for banned patterns. Closes the loop on canon — now violations fail CI.

Repair weak test neighborhoods. No point pretending coverage exists where tests are broken.

A short architectural registry. Single source of truth: canonical primitives, layer boundaries, banned patterns, approved extension points. Not bloated docs — just the load-bearing stuff.

Operating doctrine for Claude Code

Default posture by tier:

Simple changes: “Use existing canon. Do not invent new abstractions. Make the smallest compliant change.”
Medium changes: “Identify canonical pattern first. Extend existing layers. Do not bypass architecture.”
Core changes: “Map impacted surfaces first. Use or strengthen canon. Add enforcement if new shared rules are introduced.”

Metrics that actually matter

Don’t track vanity stuff like “more tests.”

Drift:

hardcoded canon violations caught
banned pattern violations caught in CI
duplicate helpers introduced
legacy bypasses still present

Stability:

regressions from AI changes
rollback count
bug count by tier
changes passing checks but failing runtime

Velocity:

time to ship Tier 1
time to ship Tier 2
time to ship Tier 3
percent of changes blocked by enforcement

The target: drift down, regressions down, velocity preserved for Tier 1 and Tier 2.

How to actually implement it (the three-layer split)

The framework above is the what. This is the how. And the key insight is that you don’t pick one implementation mechanism — you use three, with clearly separated jobs. Trying to make any single layer do all three jobs is exactly how I ended up with a tight system that still drifts.

CLAUDE.md: what matters — the constitution. Short, load-bearing principles.

Skills: how to work — operating procedures. Thinking modes, workflows, audits.

Machine enforcement: what is allowed — the guardrails. CI, pre-commit, static analysis.

Layer 1 — CLAUDE.md (keep it narrow)

CLAUDE.md is for load-bearing defaults only. Keep it short. If it grows past ~30 lines of principles, it becomes wallpaper — the model stops weighting it properly. Mine is already doing this job well, and the lesson is: resist the urge to fix enforcement problems by stuffing more rules into CLAUDE.md.

What belongs here:

architecture first
extend before invent
canon over convenience
do not bypass shared primitives
before coding, identify existing pattern
before finishing, verify impacted flow

That’s it. CLAUDE.md shapes behavior. It does not carry the whole system.

Layer 2 — Skills (thinking modes and workflows)

Skills are where process enforcement lives. They’re good at forcing the model to pause, zoom out, classify, audit, and verify. They’re process enforcement, not true enforcement — and that distinction matters.

What belongs in skills:

/think — pause, zoom out, assess architecture, spot assumptions before coding
/checkfirst — find the canonical path before writing anything new
/audit — inspect all impacted surfaces
/fix — remediate with the smallest compliant change
/todo — orchestrate multi-step audit/remediation flows
/verify or /ship — run final checks and produce a structured post-change summary

Skills improve odds. They shift the average. But they cannot prevent a violation — a determined model (or a tired orchestrator) can still skip them, rush them, or interpret them loosely.

Layer 3 — Machine enforcement (the missing piece)

This is the part that actually changes outcomes. And it’s the part I didn’t have.

Start cheap, in this order:

Pre-commit grep bans — forbidden literals, forbidden imports, forbidden function calls. Scans changed files only. Fast, cheap, high leverage.

CI grep bans — same checks, run on every push. Catches what pre-commit hooks miss or skip.

One or two PHPStan custom rules — for your most recurring violations. Don’t build a cathedral. Start with the patterns that keep biting.

Architecture tests later — once you know which boundaries matter most in practice, not in theory.

The critical point: without Layer 3, the first two layers are aspirational. They describe the system you want. They don’t produce it.

Implementation Order

Keep CLAUDE.md narrow — resist the urge to pile more rules in. Trust it to do its job.
Sharpen the skills you already have — especially /think and a /checkfirst-style pattern discovery skill.
Add the enforcement layer NOW, not later. Even a dozen grep bans in pre-commit will catch more drift than a week of prompt tuning.
Only add deeper static rules for patterns that keep recurring. Don’t pre-build rules for violations you haven’t actually seen.

The blunt version

If you only do layers 1 and 2, you will improve behavior but still get drift. I’m living proof. If you add layer 3, you change the system. That’s the whole game.

Layer 3 deep dive: pre-commit hooks, CI, and the `policy.json` schema

I run projects on multiple platforms — WordPress, Laravel, Astro, emdash, plus custom stacks. That means “just install PHPStan” isn’t a universal answer. But going fully platform-specific fragments the whole approach: every repo ends up with a different enforcement story, and most repos end up with none.

The right model is a universal core + platform packs + per-project canon. Three tiers, same structure as the framework itself.

Tier	Scope	Purpose
Universal rules	Every project	Keep things sane across stacks
Stack packs	Per framework family	Catch stack-specific drift
Project canon	Per repo	This is where most of the leverage lives

Why this combination: purely universal is too weak — it can’t catch stack-specific patterns. Purely platform-specific is too fragmented — every repo drifts in isolation. The combo gives you consistency across projects without a one-size-fits-none mess.

The four enforcement mechanisms

Regardless of stack, enforcement runs on four mechanisms. You don’t need all four on day one — but knowing where each one fits keeps you from building the wrong thing first.

Pre-commit (local, fast)→CI policy checks (central, same scans)→Static analysis / AST rules (semantic)→Architecture tests (boundaries)

A. Pre-commit checks — fast, cheap, local. First wall. Git hooks running grep/ripgrep against changed files. Catches banned strings, banned imports, debug leftovers, edits to generated files, raw env access, legacy helpers. Shell script or small Python/Node/PHP script. This is where you start.

B. CI policy checks — same scans, run centrally on every push. Second wall. Prevents “I skipped hooks locally” problems. Same script, different entry point.

C. Static analysis / AST rules — semantic, not string-based. Third wall. PHPStan/Larastan for PHP and Laravel/WordPress, ESLint for Astro/Node/TypeScript, custom AST scripts where nothing off-the-shelf exists. Catches things grep can’t see: forbidden function usage in specific folders, required enum usage, controllers exceeding complexity thresholds, business logic in forbidden surfaces.

D. Architecture tests — boundary enforcement. Fourth wall. Controllers cannot access infra directly. Views cannot query the DB. App layer cannot import from forbidden directories. Legacy namespaces forbidden. This is where “big picture consistency” becomes executable.

Universal rules — start with these 8

These apply to every project, regardless of stack. If I could only ship one thing across every repo I own, it would be this list.

The Universal 8

No hardcoded canonical values — if it exists as enum/const/config, raw literals in app code are banned (statuses, event names, route names, option keys, feature flags)
No duplicate primitives — flag new helpers/utils/formatters/query wrappers with suspicious names (format*, build*, normalize*, get*, helper, util) unless no existing equivalent
No bypassing approved layers — if a service/action/helper exists for a concern, direct local reimplementation is banned
No local config literals — base URLs, domain names, API versions, option names, env-driven values must come from config
No edits to generated/vendor/build files — obvious, but worth enforcing at the hook level
No debug leftovers — ban dd(, dump(, var_dump(, console.log(, stray print_r, temp markers (with a small allowlist if needed)
No broad refactors unless requested — detect suspicious blast radius: too many files for a small task, rename waves, unrelated directories touched
No legacy path usage where canon exists — once a replacement is established, ban the old path. Highest-ROI rule in real projects.

Universal Rule 1 and Universal Rule 8 together catch most drift. Start there if you build nothing else.

Stack packs

These are the platform-specific rule bundles. Same shape as the universal core, just tuned to the stack’s failure modes.

Laravel pack

Controllers stay thin — no business logic growth
No raw SQL when canonical query path exists
env() access only in config files, never in app code
No direct model queries in views
Business logic belongs in services/actions, not blades/controllers
Tools: Larastan (PHPStan for Laravel), custom PHPStan rules, architecture tests via pestphp/pest-plugin-arch or similar

WordPress pack

No business logic stuffed into templates if a reusable hook/helper exists
No direct DB queries if WP API covers it ($wpdb->query → get_posts, WP_Query, etc.)
No hardcoded URLs/options if config/theme settings exist
No plugin file edits — behavior belongs in theme/mu-plugin/custom plugin
No enqueue chaos outside standard loading hooks (wp_enqueue_scripts, admin_enqueue_scripts)
Tools: PHPStan with WordPress stubs (szepeviktor/phpstan-wordpress), custom pre-commit greps

Astro pack

No duplicated content schema logic — content collections are the source
No hardcoded routes/data if content collections/config define them
No client-side script for things solvable at build/component level (without explicit justification)
No random utilities when a shared lib exists
Tools: ESLint with no-restricted-imports, no-restricted-syntax, custom rules

Node / TypeScript / custom pack

No route handlers doing domain logic if a service layer exists
No hardcoded API values/constants
No duplicate fetch wrappers/helpers
No cross-layer imports violating architecture
Tools: ESLint + eslint-plugin-boundaries or eslint-plugin-import with path restrictions

emdash / custom / weird stacks

Don’t wait for perfect tools. Fall back to stack-agnostic enforcement:

Repo-level grep/ripgrep scripts
File path restrictions
Banned pattern scripts
CI guards

A lot of enforcement can be stack-agnostic if you do it at the file-and-string-pattern level. You lose the semantic depth of AST tools, but you still catch 80% of the drift.

Project canon (the highest-leverage layer)

This is the part you can’t buy off the shelf. Every project has its own:

canonical services
canonical helpers
canonical components
banned legacy paths
preferred file/folder boundaries
“always use X, never use Y” rules

Universal rules keep things sane. Stack packs catch stack-specific drift. Project canon is where real leverage lives — because it encodes what’s specific and load-bearing about this codebase.

Canon rules are also the ones that change most often. A legacy helper gets deprecated → add a ban. A new service becomes canonical → add a requirement. Treat the project rule file like any other piece of living code.

The practical setup (what I’m actually going to build)

Here’s the shape I’m landing on for my own repos. Not fancy. Actually useful.

Create a policy file in each repo — ops/policy.yml (or .json). Machine-readable. Defines: banned literals, banned imports, banned functions, forbidden paths, legacy APIs, allowed layer dependencies, stack type.

Write one universal policy-check script — shared across projects. Scans changed files, checks patterns, exits non-zero on violation. Python or Node, whichever runs everywhere. Reads ops/policy.yml from the current repo.

Wire it into pre-commit + CI — same script, two entry points. Pre-commit for fast local feedback. CI as the hard wall.

Add stack adapters only where needed — policy-check --stack=laravel or policy-check --stack=wp. Infer from repo structure where possible. Don’t build adapters for stacks you don’t use.

Add deeper static rules later — Larastan/PHPStan custom rules, ESLint restrictions — but only for repeated violations. Don’t pre-build rules for problems you haven’t seen.

The structure in one picture

Universal core (all repos)→Stack pack (by platform)→Project canon (by repo)→Pre-commit + CI

That gives you:

Consistency across projects — same universal rules everywhere
Specificity where it matters — stack packs and canon files
Portability — one policy-check script, one config format, works everywhere
No cathedral — each layer can be added independently, in order of leverage

What not to do

Anti-patterns for Layer 3

Don’t start with giant architecture engines or hundreds of rules
Don’t try to catch every possible bad decision on day one
Don’t build complex semantic analyzers for every stack before you have 10 real violations to point at
Don’t spend a week on PHPStan custom rules before you’ve shipped a single pre-commit grep ban
Focus on high-frequency violations first — the repeated ugly stuff. That’s where enforcement earns trust.

The shortest answer

What you need is: universal policy checks + stack-specific analyzers + project-specific canon rules.

Implementation stack:

Shell / Python / Node script for universal checks (pre-commit + CI)
PHPStan / Larastan for PHP, Laravel, WordPress (semantic rules)
ESLint for JS / TS / Astro (semantic rules + import restrictions)
Per-repo policy.yml for canon and legacy bans
Pre-commit + CI as the two enforcement entry points

That’s the foundation. Everything else is refinement.

The starter kit: `.ai/policy.json` + `policy-check.py` + pre-commit hook

Everything above is the what. This is the how, in files you can copy into a repo today. No cathedral. No abstract framework. Five files, one hook, done.

Step 1 — one shared policy file per repo

Create .ai/policy.json. This is your machine-readable guardrail source.

{
  "universal": {
    "banned_patterns": [
      "dd\\(",
      "dump\\(",
      "var_dump\\(",
      "print_r\\(",
      "console\\.log\\(",
      "TODO",
      "HACK"
    ],
    "forbidden_paths": [
      "vendor/",
      "node_modules/",
      "dist/",
      "build/",
      ".next/",
      ".astro/"
    ]
  },
  "project": {
    "banned_literals": [],
    "banned_imports": [],
    "banned_usages": []
  }
}

The universal block is the same across every repo. The project block is where each repo encodes its own canon — and that’s where the real leverage comes from.

Step 2 — one checker script, used everywhere

Create scripts/policy-check.py. Python is the safest cross-platform choice — runs on every stack I touch (Laravel, WordPress, Astro, emdash, custom) without needing a runtime the repo wouldn’t otherwise have.

#!/usr/bin/env python3
import json
import re
import subprocess
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parent.parent
POLICY_FILE = ROOT / ".ai" / "policy.json"

def get_changed_files():
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True
    )
    files = [f.strip() for f in result.stdout.splitlines() if f.strip()]
    return [ROOT / f for f in files]

def load_policy():
    with open(POLICY_FILE, "r", encoding="utf-8") as f:
        return json.load(f)

def is_forbidden_path(path_str, forbidden_paths):
    normalized = path_str.replace("\\", "/")
    return any(part in normalized for part in forbidden_paths)

def scan_file(path, rules):
    violations = []
    rel = str(path.relative_to(ROOT)).replace("\\", "/")

    if not path.exists() or path.is_dir():
        return violations

    if is_forbidden_path(rel, rules["universal"].get("forbidden_paths", [])):
        violations.append(f"{rel}: forbidden generated/vendor path edited")
        return violations

    try:
        content = path.read_text(encoding="utf-8", errors="ignore")
    except Exception as e:
        violations.append(f"{rel}: could not read file ({e})")
        return violations

    for pattern in rules["universal"].get("banned_patterns", []):
        if re.search(pattern, content):
            violations.append(f"{rel}: matched banned pattern `{pattern}`")

    for key, label in [
        ("banned_literals", "banned literal"),
        ("banned_imports", "banned import"),
        ("banned_usages", "banned usage"),
    ]:
        for pattern in rules["project"].get(key, []):
            if re.search(pattern, content):
                violations.append(f"{rel}: matched {label} `{pattern}`")

    return violations

def main():
    rules = load_policy()
    files = get_changed_files()

    all_violations = []
    for path in files:
        all_violations.extend(scan_file(path, rules))

    if all_violations:
        print("\nPolicy check failed:\n")
        for v in all_violations:
            print(f" - {v}")
        sys.exit(1)

    print("Policy check passed.")
    sys.exit(0)

if __name__ == "__main__":
    main()

One script. Every repo. Different policy.json. That’s the whole trick.

Step 3 — wire it into git locally

Create .git/hooks/pre-commit:

#!/bin/sh
python3 scripts/policy-check.py || exit 1

Then make both files executable:

chmod +x .git/hooks/pre-commit
chmod +x scripts/policy-check.py

Step 4 — wire it into CI

Whatever platform you use (GitHub Actions, GitLab, Bitbucket, custom), add one job that runs:

python3 scripts/policy-check.py

Local hook + CI job = the same enforcement at two entry points. Pre-commit catches you early. CI catches you when you skipped the hook. Neither can be bypassed without intent.

Step 5 — project-specific bans (where it gets real)

The universal block is table stakes. The project block is where the script earns its keep. Here’s what a real policy.json looks like per stack.

Laravel

{
  "universal": {
    "banned_patterns": ["dd\\(", "dump\\(", "var_dump\\(", "print_r\\(", "console\\.log\\("],
    "forbidden_paths": ["vendor/", "node_modules/", "public/build/"]
  },
  "project": {
    "banned_literals": [
      "'page_load'",
      "\"page_load\"",
      "'conversion'",
      "\"conversion\""
    ],
    "banned_imports": [
      "^use Illuminate\\\\Support\\\\Facades\\\\DB;"
    ],
    "banned_usages": [
      "env\\(",
      "DB::raw\\(",
      "DB::table\\("
    ]
  }
}

Bans raw event strings, bans env() outside config, bans direct DB facade usage in app code.

WordPress

{
  "project": {
    "banned_literals": [
      "https://example\\.com",
      "/wp-content/themes/old-theme/"
    ],
    "banned_usages": [
      "\\$wpdb->",
      "get_option\\('legacy_",
      "wp_enqueue_script\\("
    ]
  }
}

Bans direct $wpdb access, legacy option keys, and enqueues outside the canonical loading point.

Astro / JS

{
  "project": {
    "banned_literals": [
      "https://api\\.example\\.com",
      "\"draft\"",
      "'draft'"
    ],
    "banned_imports": [
      "from ['\"]\\.\\./\\.\\./utils/legacy",
      "from ['\"]axios['\"]"
    ],
    "banned_usages": [
      "console\\.log\\(",
      "fetch\\(['\"]https://api\\.example\\.com"
    ]
  }
}

Bans hardcoded API URLs, legacy utils imports, and the old axios path once fetch is canon.

Step 6 — start with 5 to 10 rules, not 50

Don’t try to encode everything. The ROI on the first 10 rules is enormous. The ROI on rules 11–100 falls off fast.

Start With These, In This Order

Hardcoded canon values — the single biggest source of drift
Direct legacy helper usage — banned once a replacement exists
Raw DB / query bypasses — force routing through canonical layers
Env / config literals in app code — move to config
Debug leftovers — dd, dump, console.log, TODO, HACK

That’s it. Ship those five. Let AI trip over them. Refine based on what it actually does.

Step 7 — native analyzers come after

Once the script is working and catching real violations, then layer on the platform-native analyzers:

Laravel / PHP: PHPStan → Larastan → custom rule only if a violation keeps recurring
Astro / JS / TS: ESLint → no-restricted-imports → no-console → custom rule later
WordPress: PHPCS if you want style guardrails, but the policy script will do most of the real semantic work first

The order matters. Grep-based checks give you 80% of the value at 5% of the effort. AST-based checks give you the last 20% at 10x the cost. Don’t invert that. (This is the same “cheap scan first, expensive scan second” discipline that makes the dead code cleanup loop work — narrow the search space cheaply before spending money on precision.)

The practical starting point this week

Add .ai/policy.json with the universal block only

Add scripts/policy-check.py

Wire pre-commit hook

Wire CI job

Add 5 project-specific bans in one real repo

Let AI trip over them — refine from real violations, not imagined ones

That’s the implementation path. Not more, not less.

Receipts: 104 real violations in a Laravel codebase I built myself

Everything above is theory until you run it. Here’s what happened when I dropped .ai/policy.json + scripts/policy-check.py + a pre-commit hook into this exact codebase — a Laravel 13 + Blade + Tailwind project I’d been building for months.

First run

$ python3 scripts/policy-check.py --all
Policy check FAILED — 75 violation(s) across 85 file(s) [mode: all (no git)]

75 violations. In a codebase I’d written myself, with a tight CLAUDE.md that explicitly forbade the exact patterns being caught. This is what happens when you don’t audit before you spread the mess — the drift accumulates silently because no tool is watching.

The breakdown:

Pattern	Count	What CLAUDE.md said
`space-y-` / `space-x-` in blade	70	”Use gap- for spacing in flex/grid — never space-x-* or space-y-”
`bg-gray-*` in `hero-animation.blade.php`	4	”Use semantic color names — not raw colors (text-gray-800)“
`space-y-*` in `app.css` `@apply`	2	Same as above — even my own design system had drift

Every single violation was backed by a documented, codified rule that I had written, that the AI had been told about, that I had reviewed, that had been ignored anyway.

This is the point of the post. You cannot prompt your way out of this. You cannot skill your way out of it. You cannot CLAUDE.md your way out of it. The only thing that changes outcomes is machine enforcement.

False positive handling

One real lesson from the first run: the blog draft for this post (which contains code snippets showing banned patterns as examples) triggered the universal bans. dd(, dump(, console.log(, HACK — all matched in the markdown file that was literally about banning them.

Fix: add resources/content/ to forbidden_paths. Content ≠ code. The script scans code. One-line config change, problem gone.

Content is not code. Scan what you enforce, not what you write about enforcement.

Another near-miss: I initially wrote dd\( without a word boundary. It matched added(, padded(, rendered( — any word ending in dd(. Fix: use \bdd\(. Word boundaries matter. Real regex review takes ~5 minutes and saves hours of false positive noise.

The remediation

With the scanner working, I ran /remediation to clear the existing drift. Not a hand-waved “I’ll get to it eventually” — an actual pass through every violation. Same session.

The pattern was mostly mechanical but required judgment:

Naive fix: sed 's/space-y-/gap-/g' across all blade files

Correct fix: for each violation, check the parent element. space-y-* works on any block parent. gap-* only works on flex/grid parents. Blind substitution breaks layouts where the parent is a plain div. Each fix required adding flex flex-col to the parent before swapping the class.

The final report after remediation:

$ python3 scripts/policy-check.py --all
Policy check passed — 85 file(s) scanned [mode: all]

75 → 0. Across 18 files. In one focused pass. With a commit that itself passed through the pre-commit hook as proof the loop closed.

The commit that proved the loop

$ git commit -m "refactor: migrate space-y/space-x to gap, eliminate raw colors"
Policy check passed — 19 file(s) scanned [mode: staged]
[main 10fefb3] refactor: migrate space-y/space-x to gap, eliminate raw colors
 19 files changed, 3822 insertions(+)

The hook ran. Scanned the staged files. Passed. The commit landed. The same loop will now run on every future commit in this repo — and in every other repo I set this up in.

The deploy pipeline integration

The policy check is wired into deploy.sh Step 1 alongside PHP syntax, blade compilation, and route loading. It runs in strict mode by default — any drift blocks deploy:

[1/8] Syntax & Policy Checks
  Running AI policy check...
  ✓ AI policy check passed — no drift
  Checking PHP syntax...
  ✓ PHP syntax OK
  Checking Blade templates...
  ✓ Blade templates compile
  Checking routes...
  ✓ Routes load (5 routes)

Bypass exists (POLICY_STRICT=0 ./deploy.sh) for emergencies, but the default is blocking. The whole point of machine enforcement is that it’s hard to ignore, so the off switch is intentionally awkward to use.

What I learned from actually doing it

A few things that the theory didn’t prepare me for:

Hard-Won Lessons from the First Real Run

Content directories must be excluded. If you write about enforcement, the scanner will catch your examples. Exclude content/, docs/, posts/, whatever your content directory is.
Word boundaries are not optional. dd\( matches added(. dump\( matches nothing useful. \b turns grep from “dumb substring matcher” into “actual pattern matcher.”
Start strict on universal, loose on project. The universal bans (debug, generated paths) were instantly right. The project bans (space-y-*) caught 70 real violations on first run. I did not need to tune them down. I needed to let them bite.
/remediation is the right follow-up, not /fix. Fixing 75 violations one-by-one with /fix would have been death by a thousand cuts. /remediation handled it as a pattern migration across 18 files in a single pass.
The commit message IS the receipt. “75 → 0” in one commit is proof the framework works. Don’t hide it in a chore commit — lead with the count in the message so future-you can find the moment enforcement started mattering.
Flip strict mode AS SOON AS drift is zero. Warning mode is a transition state. The longer you stay in it, the less it enforces. The moment the first policy-check --all passes, flip the default to strict. I did it in the same session. No regrets.

The blog post almost broke the blog post

The most on-the-nose moment: I was enriching this exact draft when I ran the scanner for the first time. The scanner caught 9 violations in the draft itself — code snippets showing banned patterns. The tool I was writing about tried to ban the post I was writing about it in.

That’s the kind of feedback loop you only get by actually running the thing. It’s also the kind of lesson that would have been a footnote in ChatGPT’s “phase 2, next week you’ll refine this” rollout plan. Running it today forced the lesson today.

The canon enforcement chapter (29 → 0)

After clearing the 75 style violations, I ran a /think assessment on the original goal and realized something important: the framework had caught hygiene and style drift, but the bugs I had originally been complaining about were a different class — AI hardcoding values that should come from canon, AI bypassing canonical entry points, AI reinventing instead of reusing.

Grade against the original concerns at that checkpoint:

Original concern	Caught?
Debug leftovers, Tailwind drift, raw DB access	✓
AI hardcoding a canonical value	✗
AI using a non-canonical function when canonical exists	✗
AI bypassing a service layer with one-off logic	✗

The fix: a canon inventory pass on this repo. 30 minutes of grep to identify the actual canonical primitives — Projects::*, MarkdownContent service, named routes in routes/web.php, blade components — and find where the codebase was bypassing them.

What I found: Projects::* and MarkdownContent were being used correctly everywhere. The only real canon bypass was hardcoded route paths — 29 of them scattered across views, breadcrumbs, nav arrays, and service pages.

Examples of what was being hardcoded:

<x-button href="/contact">Get in touch</x-button>
<x-link href="/learn">Browse articles</x-link>
<a href="/projects/itbroke-dev">itbroke.dev</a>

Every single one of these should be using route('contact'), route('learn'), route('projects.show', ['slug' => 'itbroke-dev']). The named routes existed. Nobody was using them. AI (and I) had been hardcoding paths every time a new link was added.

The new rule I added to the resources/views/ override:

"banned_usages": [
  "[\"']/contact[\"']",
  "[\"']/learn[\"']",
  "[\"']/projects[\"']",
  "[\"']/offers[\"']",
  "[\"']/changelog[\"']",
  "[\"']/services/mvp-partner[\"']",
  "[\"']/services/ads-partner[\"']",
  "[\"']/services/strategy-partner[\"']",
  "[\"']/projects/[a-z0-9-]+[\"']"
]

The regex form ["']/path["'] catches both HTML attribute syntax (href="/contact") AND PHP array syntax ('href' => '/contact') in one pattern. Dynamic routes like href="/learn/{{ $pillar }}" don’t match — the {{ breaks the pattern.

First run after adding the rules:

$ python3 scripts/policy-check.py --all
Policy check FAILED — 29 violation(s) across 85 file(s) [mode: all]

29 hardcoded canonical route strings. In 10 files. Caught by the same grep-based scanner, no new infrastructure needed.

Remediation was mechanical: href="/contact" → :href="route('contact')" (colon-prefix for Blade component bindings), 'href' => '/contact' → 'href' => route('contact') (PHP expression in array), hardcoded project slugs → {{ route('projects.show', ['slug' => 'itbroke-dev']) }}.

Final run:

$ python3 scripts/policy-check.py --all
Policy check passed — 85 file(s) scanned [mode: all]

29 → 0. Across 10 files. And this time the violations weren’t style — they were the exact class of bug I had originally been complaining about. Hardcoded values bypassing canonical primitives that already existed in the codebase.

The final four-commit arc:

6f89b50 feat(policy): enforce named routes in blade views + remediate 29 hardcoded paths
a94be89 chore: flip POLICY_STRICT to on by default + add remediation receipts
10fefb3 refactor: migrate space-y/space-x to gap, eliminate raw colors
0753725 chore: add AI policy check scaffolding

Reading that history top-to-bottom: scaffold → clear hygiene → lock strict → enforce canon. The framework is now load-bearing for the bug class that sparked the whole thing.

Total drift cleared across all four commits: 104 real violations in a codebase I had been writing myself, with a tight CLAUDE.md, using all the skills I had built. None of that caught it. The grep-based policy check caught all of it.

The framework doesn’t need to be smart. It needs to be present.

Update: what happened when I added the AST layer on top

Earlier in this post I said “grep-based checks give you 80% of the value at 5% of the effort. AST-based checks give you the last 20% at 10x the cost. Don’t invert that.” After the grep layer had been running clean for a few commits, I went back and installed the AST layer — Larastan (PHPStan for Laravel) at level 5 — to see if the 20% claim held up on this exact codebase.

It did. In a very specific way.

Install + configure + first run took about 6 minutes. The first run reported 4 errors, zero false positives. Every single error was a real bug in code that grep could not see:

File	Error class	What grep could not see
`GenerateFeaturedImages.php:73`	Dead code (`!$dryRun` always true)	Control flow — the dry-run branch `continue`s 20 lines earlier, so the later negated check is unreachable
`OgImageController.php:23`	Deprecated implicit nullable (`string $slug = null`)	PHP’s type system — would become a hard error on PHP 9
`MarkdownContent.php:420` (`renderVersus`)	Dead null-coalesce (`preg_match_all` never returns a missing offset 0)	Return type semantics of the stdlib
`MarkdownContent.php:482` (`renderCompare`)	Same dead `?? []` as above	Same

None of these would have been caught by prompt tuning, CLAUDE.md rules, skill refinement, or any amount of regex polishing. They’re the exact class of bug the AST layer is supposed to catch: semantics, not syntax.

Fixing all four took about 4 minutes. Total elapsed time from install to clean second run to committed deploy: ~12 minutes. The 20% claim was right — AST tooling is more expensive to set up than grep, but the cost is measured in minutes, not weeks. Once it’s clean, it stays clean, and any future drift that touches types/null/signatures fails the preflight check the same way grep failures do.

Grep catches drift. AST catches semantics. You need both — and the sequencing matters.

The lesson worth keeping: do grep first. Installing PHPStan before the grep layer is stable would have meant triaging hundreds of low-signal style warnings alongside the 4 real bugs, and the real bugs would have been buried. By the time I added PHPStan, the hygiene drift was already gone — so every PHPStan finding was automatically high-signal. The layers support each other. Run them in the wrong order and each layer makes the other harder.

The updated deploy preflight now runs both layers in Step 1:

[1/8] Syntax & Policy Checks
  Running PHPStan (level 5)...
  ✓ PHPStan passed — no type errors
  Running AI policy check...
  ✓ AI policy check passed — no drift
  Checking PHP syntax...
  ✓ PHP syntax OK
  Checking Blade templates...
  ✓ Blade templates compile
  Checking routes...
  ✓ Routes load (5 routes)

PHPStan adds ~3-5 seconds. The grep check runs in under one second. Total Step 1 time stays under 15 seconds including blade compile and route load. Cheap, fast, load-bearing.

The eight-commit arc (updated)

04b5339 feat(phpstan): install Larastan at level 5 + fix 4 real bugs it caught
3bb1767 feat(toc): H2-only on desktop + fade mask for overflow handling
1b0780f feat(content): dynamic sidebar TOC + AI-ready prompt block + compare fix
f792d37 content: publish AI Change Control framework post
6f89b50 feat(policy): enforce named routes in blade views + remediate 29 hardcoded paths
a94be89 chore: flip POLICY_STRICT to on by default + add remediation receipts
10fefb3 refactor: migrate space-y/space-x to gap, eliminate raw colors
0753725 chore: add AI policy check scaffolding

Reading bottom-to-top (chronological): scaffold → clear hygiene → lock strict → enforce canon → publish the spec → polish UX → refine TOC → install AST layer. Each commit is a receipt for one claim the framework makes. The whole arc took a few focused sessions, not weeks.

Total drift cleared: 104 grep violations + 4 AST bugs = 108 real issues in a codebase I built myself and thought was clean. None of them would have been caught without the enforcement layer. All of them shipped silently before it was installed.

The shortest version

(Jump back to the full framework for context, or straight to the starter kit if you want to install it now.)

Your framework is six steps:

AI Change Control, Compressed

Classify change by risk
Discover canon before coding
Extend, don’t invent
Enforce architecture automatically
Verify based on blast radius
Report what canon was used

The one line I’m anchoring on

You do not need AI to understand the whole codebase. You need a codebase where understanding the whole codebase is not required to make a correct local change.

That’s the shift. That’s the whole point. Everything above is just how you get there.

Ship this in your project today

If you want to install the AI Change Control framework in your own repo, here’s the zero-ceremony version. This is the exact path I ran on this codebase. The concrete code lives in the starter kit section above — jump there for copy-pasteable files, come back here for the install order.

Create .ai/policy.json with the universal block (debug bans, forbidden paths) and an empty project block. Start with the starter kit above.

Create scripts/policy-check.py from the code earlier in this post. Python 3, no dependencies, ~140 lines.

Create scripts/git-hooks/pre-commit with the one-liner that runs policy-check.py. Version-control it in the repo.

Symlink it into git: ln -sf ../../scripts/git-hooks/pre-commit .git/hooks/pre-commit. Edits to the source file are live.

Run python3 scripts/policy-check.py --all to audit current drift. Whatever number comes back is your starting point.

Do a canon inventory pass. Grep your codebase for the canonical entry points — your services, your enums, your named routes, your shared components. Find 3-5 places where something should be using canon but isn’t. Add banned patterns for those bypasses.

Remediate existing drift. Use /remediation if you have it, or just work through the violations manually. Don’t stop at “most of them are fixed.” Get to zero.

Flip strict mode on. In your deploy script or CI, make any drift block the build by default. Emergency bypass via env var if you must, but keep the default awkward to use.

Commit it. Let the hook scan your commit. Let it pass. That’s the loop closing. The framework is now load-bearing.

That’s 9 steps. It took me one focused session on this repo to do all 9. Not 9 days, not 9 phases, not “next week we’ll get to CI.” One session.

The files you need to create

Required Files for Installation

.ai/policy.json — the rules, JSON format
scripts/policy-check.py — the scanner, Python, zero deps
scripts/git-hooks/pre-commit — the hook source, symlinked into .git/hooks/
scripts/setup.sh — optional but recommended, makes re-install idempotent
One line in your deploy script — runs the scanner before building

Five files. That’s the whole framework. Everything else in this post is explaining why and what to put in the rules — not additional infrastructure.

What to tell your AI

If you’re handing this post to your AI as context, the instructions to give it are:

Read this post. Then check if this repo has .ai/policy.json. If yes, read it before writing any code. When I give you a task, before you finish, confirm which canonical primitives you used. Do not hardcode values, create duplicates, or bypass existing services. If the pre-commit hook blocks your work, fix the violation — do not bypass with --no-verify.

That’s the prompt. Save it somewhere you can paste it at the start of any new session in a repo that uses this framework.

Related:

What 17 Retrospectives Taught Me About Coding With AI — why these rules became a hook instead of a CLAUDE.md note
Custom Skills for Claude Code — the skill layer (Layer 2) that complements this framework’s enforcement layer (Layer 3)