AI code reviews grounded in twelve classic engineering books.
Consistent. Traceable. Actionable.
"The bearing of a child takes nine months, no matter how many women are assigned." — Frederick Brooks, The Mythical Man-Month (1975)
Fifty years later, Brooks is still right, and so are McConnell, Fowler, Martin, Hunt & Thomas, Evans, Ousterhout, Winters, Meszaros, Osherove, Feathers, and the Google Testing team.
Most code quality tools count lines and cyclomatic complexity. brooks-lint goes deeper — it diagnoses your code against six decay risk dimensions synthesized from twelve classic engineering books, producing structured findings with book citations, severity labels, and concrete remedies every time.
For the full source-to-skill mapping, including exceptions and false-positive guards, see skills/_shared/source-coverage.md.
| Book | Author | Contributes to |
|---|---|---|
| The Mythical Man-Month | Frederick Brooks | R2, R4, R5 |
| Code Complete | Steve McConnell | R1, R4 |
| Refactoring | Martin Fowler | R1, R2, R3, R4, R6 |
| Clean Architecture | Robert C. Martin | R2, R5 |
| The Pragmatic Programmer | Hunt & Thomas | R2, R3, R4, R5, T2, T3 |
| Domain-Driven Design | Eric Evans | R1, R3, R6 |
| A Philosophy of Software Design | John Ousterhout | R1, R4 |
| Software Engineering at Google | Winters, Manshreck & Wright | R2, R5 |
| The Art of Unit Testing | Roy Osherove | T1, T2, T4, T5 |
| How Google Tests Software | James A. Whittaker, Jason Arbon & Jeff Carollo | T5, T6 |
| Working Effectively with Legacy Code | Michael Feathers | T4, T5, T6 |
| xUnit Test Patterns | Gerard Meszaros | T1, T2, T3, T4 |
brooks-lint evaluates your code across six production-code decay risks and six test-suite decay risks synthesized from twelve classic engineering books:
| Decay Risk | Diagnostic Question | Sources |
|---|---|---|
| 🧠 Cognitive Overload | How much mental effort to understand this? | Code Complete, Refactoring, DDD, Philosophy of SD |
| 🔗 Change Propagation | How many unrelated things break on one change? | Refactoring, Clean Architecture, Pragmatic, SE@Google |
| 📋 Knowledge Duplication | Is the same decision expressed in multiple places? | Pragmatic, Refactoring, DDD |
| 🌀 Accidental Complexity | Is the code more complex than the problem? | Refactoring, Code Complete, Brooks, Philosophy of SD |
| 🏗️ Dependency Disorder | Do dependencies flow in a consistent direction? | Clean Architecture, Brooks, Pragmatic, SE@Google |
| 🗺️ Domain Model Distortion | Does the code faithfully represent the domain? | DDD, Refactoring |
Philosophy of SD = A Philosophy of Software Design (Ousterhout) · SE@Google = Software Engineering at Google (Winters et al.)
Given this code:
class UserService:
    def update_profile(self, user_id, name, email, avatar_url):
        user = self.db.query(f"SELECT * FROM users WHERE id = {user_id}")
        user['email'] = email
        ...
        if user['email'] != email:  # always False: silent bug
            self.smtp.send(...)
        points = user['login_count'] * 10 + 500
        self.db.execute(f"UPDATE loyalty SET points={points} WHERE user_id={user_id}")

brooks-lint produces:
Health Score: 28/100
This method concentrates four unrelated business responsibilities into a single function, contains a logic bug that silently suppresses email change notifications, and is wide open to SQL injection.
Symptom: update_profile performs profile field updates, email change notifications, loyalty points recalculation, and cache invalidation all in one method body.
Source: Fowler — Refactoring — Divergent Change; Hunt & Thomas — The Pragmatic Programmer — Orthogonality
Consequence: Any change to the loyalty formula risks breaking email notifications and vice versa. Every edit carries regression risk across four unrelated domains simultaneously.
Remedy: Extract NotificationService, LoyaltyService, and UserCacheInvalidator. UserService.update_profile should orchestrate by calling each — it should hold no implementation logic itself.
Symptom: user['email'] = email overwrites the old value before if user['email'] != email — the condition is always False. The notification is dead code.
Source: McConnell — Code Complete — Ch. 17: Unusual Control Structures
Consequence: Users are never notified when their email address changes. Silent data integrity failure — the system appears functional while violating a business rule.
Remedy: Capture old_email = user['email'] before any mutation. Compare against old_email, not user['email'].
(+ 6 more findings including SQL injection, dependency disorder, magic numbers)
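The second remedy (capture the old value before mutating) can be sketched as below. This is a minimal illustration, not brooks-lint output: `InMemoryUsers` and `FakeNotifier` are hypothetical stand-ins for the database and the SMTP-backed notification service, and the loyalty and cache steps are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNotifier:
    """Hypothetical stand-in for an SMTP-backed NotificationService."""
    sent: list = field(default_factory=list)

    def email_changed(self, user_id, old_email, new_email):
        self.sent.append((user_id, old_email, new_email))


@dataclass
class InMemoryUsers:
    """Hypothetical stand-in for the users table."""
    rows: dict = field(default_factory=dict)

    def get(self, user_id):
        return self.rows[user_id]

    def save(self, user_id, user):
        self.rows[user_id] = user


class UserService:
    """Orchestrates collaborators; holds no notification logic itself."""

    def __init__(self, users, notifier):
        self.users = users
        self.notifier = notifier

    def update_profile(self, user_id, name, email):
        user = dict(self.users.get(user_id))
        old_email = user["email"]      # capture BEFORE any mutation
        user["name"] = name
        user["email"] = email
        self.users.save(user_id, user)
        if old_email != email:         # compare against the captured value
            self.notifier.email_changed(user_id, old_email, email)
        return user


users = InMemoryUsers({1: {"name": "Ada", "email": "ada@old.example"}})
notifier = FakeNotifier()
service = UserService(users, notifier)
service.update_profile(1, "Ada", "ada@new.example")
print(notifier.sent)
```

Because the comparison uses `old_email`, the notification branch is reachable again, and a second call with an unchanged address sends nothing.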
In Mode 2 (Architecture Audit), brooks-lint generates a Mermaid dependency graph at the top of the report. Modules are color-coded by severity: red = Critical findings, yellow = Warning, green = clean.
graph TD
subgraph src/api
AuthController
UserController
end
subgraph src/domain
UserService
OrderService
end
subgraph src/infra
Database
EmailClient
end
AuthController --> UserService
UserController --> UserService
UserController --> OrderService
OrderService --> UserService
OrderService --> EmailClient
UserService --> Database
EmailClient -.->|circular| OrderService
classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff
classDef warning fill:#ffd43b,stroke:#e67700
classDef clean fill:#51cf66,stroke:#2b8a3e,color:#fff
class OrderService,EmailClient critical
class AuthController warning
class UserService,UserController,Database clean
The graph renders natively in GitHub, Notion, and other Markdown environments — no extra tools needed.
The Full Gallery has real brooks-lint output across Python, TypeScript, Go, and Java — including PR reviews, architecture audits with Mermaid dependency graphs, tech debt assessments, and test quality reviews.
New to the decay risks? The Decay Risk Field Guide explains all six — diagnostic question, signature symptoms, source books, and remedy for each.
Tested across 3 real-world scenarios (PR review, architecture audit, tech debt assessment):
| Criterion | brooks-lint | Claude alone |
|---|---|---|
| Structured findings (Symptom → Source → Consequence → Remedy) | ✅ 100% | ❌ 0% |
| Book citations per finding | ✅ 100% | ❌ 0% |
| Severity labels (🔴/🟡/🟢) | ✅ 100% | ❌ 0% |
| Health Score (0–100) | ✅ 100% | ❌ 0% |
| Detects Change Propagation | ✅ 100% | ✅ 100% |
| Overall pass rate | 94% | 16% |
The gap isn't what Claude can find — it's what it consistently finds, with traceable evidence and actionable remedies every time.
The table above is illustrative. The figures below are deterministic, and you can reproduce them locally:
Parser fidelity. SARIF export and the CI gates depend on parsing the model's Markdown report correctly. The shipped parser is scored against a frozen corpus of 30 real, model-generated reports spanning all six modes (evals/benchmark-corpus.json), each paired with an independently graded finding inventory (a separate model pass, spot-checked by hand). Run npm run benchmark to reproduce:
| Metric (n = 30, frozen corpus) | Result |
|---|---|
| Exact severity-count match (parser vs. graded truth) | 30 / 30 |
| Risk-code precision / recall | 100% / 100% (56 finding-level codes, 0 FP / 0 FN) |
| Valid SARIF 2.1.0 emitted | 30 / 30 |
Because the parser is deterministic and the corpus is frozen, npm run benchmark gives everyone the same result, and npm test guards it as a regression. The corpus deliberately includes 9 false-positive / tradeoff reports (e.g. a ports-and-adapters design that looks like a dependency cycle) that must stay clean.
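For readers unfamiliar with SARIF, here is a rough sketch of what a minimal SARIF 2.1.0 log built from parsed findings might look like. It is not the shipped exporter: the finding fields (`risk`, `severity`, `symptom`, `path`, `line`) and the severity-to-level mapping are assumptions made for illustration.

```python
import json

# Assumed mapping from report severities to SARIF result levels.
LEVELS = {"critical": "error", "warning": "warning", "suggestion": "note"}


def to_sarif(findings):
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    rule_ids = sorted({f["risk"] for f in findings})
    results = []
    for f in findings:
        results.append({
            "ruleId": f["risk"],
            "level": LEVELS[f["severity"]],
            "message": {"text": f["symptom"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "brooks-lint",
                                "rules": [{"id": r} for r in rule_ids]}},
            "results": results,
        }],
    }


# Hypothetical finding, shaped only for this example.
example = [{
    "risk": "R2",
    "severity": "critical",
    "symptom": "update_profile mixes four unrelated responsibilities",
    "path": "src/domain/user_service.py",
    "line": 12,
}]
print(json.dumps(to_sarif(example), indent=2))
```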
Scoring determinism — for a fixed finding set (2 Critical / 3 Warning / 1 Suggestion), the strictness presets produce exactly the scores their common.md table predicts: strict 34, balanced 54, legacy-friendly 74 — and only legacy-friendly leads with the top-three fixes.
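The arithmetic behind such a check can be sketched as a simple deduction model. Treat this as an assumption-heavy illustration: the real deduction table lives in common.md and is not reproduced here, so the weights below are placeholders chosen only so that the example lands on the three scores quoted above.

```python
# PLACEHOLDER weights, NOT the real table from common.md.
DEDUCTIONS = {
    "strict": {"critical": 20, "warning": 8, "suggestion": 2},
    "balanced": {"critical": 15, "warning": 5, "suggestion": 1},
    "legacy-friendly": {"critical": 10, "warning": 2, "suggestion": 0},
}


def health_score(counts, preset="balanced"):
    """Start at 100 and subtract a per-finding deduction, floored at 0."""
    weights = DEDUCTIONS[preset]
    deducted = sum(weights[severity] * n for severity, n in counts.items())
    return max(0, 100 - deducted)


# The fixed finding set from the text: 2 Critical / 3 Warning / 1 Suggestion.
counts = {"critical": 2, "warning": 3, "suggestion": 1}
for preset in DEDUCTIONS:
    print(preset, health_score(counts, preset))
```

The point is only that a fixed finding set plus a fixed table gives a fixed score; the same counts always map to the same number for a given preset.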
Model quality — whether the model finds the right risks on real code is measured by the 57-scenario eval suite (evals/evals.json): npm run evals (structural) and npm run evals:live (live, needs ANTHROPIC_API_KEY).
Scope & honesty: the parser numbers are deterministic and exactly reproducible. The strictness and eval-suite figures are single-run live measurements against the model and vary slightly run to run. The parser benchmark measures report-parsing fidelity (does the tooling read every finding the report states?), not whether a given finding is "correct." The severity-count match is the fully independent signal; risk-code agreement also reflects the shared canonical name→code legend.
| | brooks-lint | ESLint / Pylint | GitHub Copilot Review | Plain Claude |
|---|---|---|---|---|
| Detects syntax & style issues | — | ✅ | ✅ | ~ |
| Structured diagnosis chain | ✅ | ❌ | ❌ | ❌ |
| Traces findings to classic books | ✅ | ❌ | ❌ | ❌ |
| Consistent severity labels | ✅ | ✅ | ~ | ❌ |
| Architecture-level insights | ✅ | ❌ | ~ | ~ |
| Domain model analysis | ✅ | ❌ | ❌ | ~ |
| Zero config, no plugins to install | ✅ | ❌ | ✅ | ✅ |
| Works with any language | ✅ | ❌ | ✅ | ✅ |
~ = occasionally / inconsistently
brooks-lint doesn't replace your linter. It catches what linters can't: architectural drift, knowledge silos, and domain model distortion — the problems that slow teams down for months before anyone notices.
Claude Code

/plugin marketplace add hyhmrright/brooks-lint
/plugin install brooks-lint@brooks-lint-marketplace

Short-form commands (/brooks-review) are auto-installed on first session start. To install manually:

cp commands/*.md ~/.claude/commands/

mkdir -p ~/.claude/skills/brooks-lint
cp -r skills/* ~/.claude/skills/brooks-lint/

Gemini CLI

/extensions install https://github.com/hyhmrright/brooks-lint

mkdir -p ~/.gemini/skills
cp -r skills/* ~/.gemini/skills/  # flat: Gemini discovers skills only one level deep

Or simply:

./scripts/install.sh gemini

Codex CLI

Install the brooks-lint skill from hyhmrright/brooks-lint

python3 ~/.codex/skills/.system/skill-installer/scripts/install-skill-from-github.py \
  --repo hyhmrright/brooks-lint --path skills --name brooks-lint

git clone https://github.com/hyhmrright/brooks-lint.git /tmp/brooks-lint
mkdir -p ~/.codex/skills
cp -r /tmp/brooks-lint/skills/* ~/.codex/skills/  # flat: matches the skill-installer layout

Or simply:

./scripts/install.sh codex
brooks-lint ships as standard Agent Skills. Any agent that loads Agent Skills runs all six modes with no conversion — one command installs them:
# pick your platform; --project installs into the current repo instead of your global config
curl -fsSL https://raw.githubusercontent.com/hyhmrright/brooks-lint/main/scripts/install.sh | bash -s -- <platform>
# <platform> = opencode · cursor · windsurf · antigravity · pi · kiro · copilot · droid · gemini · codex · agents

The installer copies the skills flat into the right folder for your platform, so the shared framework (../_shared/) always resolves and you can't get the layout wrong. Then just ask ("review this PR", "audit the architecture") and the matching skill auto-triggers from its description. New to skills, or using another agent? See docs/getting-started.md.
OpenCode
./scripts/install.sh opencode → ~/.config/opencode/skills (also reads ~/.claude/skills and
AGENTS.md). Full guide: docs/opencode-setup.md.
Cursor (2.4+)
./scripts/install.sh cursor → ~/.cursor/skills (also .agents/skills; reads AGENTS.md).
Full guide: docs/cursor-setup.md.
Windsurf (Cascade)
./scripts/install.sh windsurf → ~/.codeium/windsurf/skills (reads AGENTS.md).
Full guide: docs/windsurf-setup.md.
Antigravity (Google)
./scripts/install.sh antigravity --project → .agent/skills (reads AGENTS.md / GEMINI.md).
Full guide: docs/antigravity-setup.md.
pi (earendil-works)
./scripts/install.sh pi → ~/.pi/agent/skills, or point pi's skills setting at a clone.
Full guide: docs/pi-setup.md.
GitHub Copilot
./scripts/install.sh copilot --project → .github/skills (also auto-detects .claude/skills; reads
AGENTS.md). Full guide: docs/copilot-setup.md.
Kiro (AWS)
./scripts/install.sh kiro → ~/.kiro/skills (auto-registers /brooks-review; reads AGENTS.md).
Full guide: docs/kiro-setup.md.
Factory Droid
./scripts/install.sh droid → ~/.factory/skills (registers /brooks-review; reads AGENTS.md).
Full guide: docs/factory-droid-setup.md.
🧪 Verification status. Claude Code, Gemini CLI, and Codex CLI are maintainer-verified. The eight platforms above are documented from each tool's official skill spec and verified at the file-layout level (the installer is tested), but not yet end-to-end run by the maintainer on every platform. Tried one — working or broken? Open an issue with the platform, version, and what you saw. Another Agent-Skills agent? It almost certainly works the same way — tell us and we'll add it.
Claude Code

| Command | Short Form | Action |
|---|---|---|
| /brooks-lint:brooks-review | /brooks-review | PR-level code review |
| /brooks-lint:brooks-audit | /brooks-audit | Full architecture audit |
| /brooks-lint:brooks-debt | /brooks-debt | Tech debt assessment |
| /brooks-lint:brooks-test | /brooks-test | Test suite health review |
| /brooks-lint:brooks-health | /brooks-health | Health dashboard: all four dimensions |
| /brooks-lint:brooks-sweep | /brooks-sweep | Full sweep: analyse all dimensions and auto-fix findings |
Short-form commands are auto-installed on first session start by the session-start hook.
Gemini CLI

| Command | Action |
|---|---|
| /brooks-review | PR-level code review |
| /brooks-audit | Full architecture audit |
| /brooks-debt | Tech debt assessment |
| /brooks-test | Test suite health review |
| /brooks-health | Health dashboard: all four dimensions |
| /brooks-sweep | Full sweep: analyse all dimensions and auto-fix findings |
Codex CLI

| Command | Action |
|---|---|
| $brooks-review | PR-level code review |
| $brooks-audit | Full architecture audit |
| $brooks-debt | Tech debt assessment |
| $brooks-test | Test suite health review |
| $brooks-health | Health dashboard: all four dimensions |
| $brooks-sweep | Full sweep: analyse all dimensions and auto-fix findings |
The skills also trigger automatically when you discuss code quality, architecture, maintainability, or test health.
These platforms invoke Agent Skills automatically from each skill's description — just ask
("review this PR", "audit the architecture", "where's our worst tech debt?") and the matching mode
runs. For explicit invocation, use the platform's skill-command syntax (e.g. pi registers each skill
as /skill:brooks-review; Cursor and OpenCode expose /brooks-review once the skill is discovered).
/brooks-review # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-review # Claude Code (full form)
$brooks-review # Codex CLI
Paste a diff or point the AI at changed files. It diagnoses each of the six decay risks and reports specific findings in Symptom → Source → Consequence → Remedy format.
/brooks-audit # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-audit # Claude Code (full form)
$brooks-audit # Codex CLI
Describe your project structure or share key files. It maps module dependencies, identifies circular dependencies, and checks Conway's Law alignment.
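To make "identifies circular dependencies" concrete, here is a small sketch of cycle detection over a module graph using depth-first search. It illustrates the general technique only; brooks-lint itself is a set of model-driven skills, and the `find_cycle` helper and the dict-of-lists graph shape are assumptions for this example. The edges mirror the sample Mermaid graph shown earlier.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack = []                     # nodes on the current DFS path

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:      # back edge: dep is already on the path
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


deps = {
    "AuthController": ["UserService"],
    "UserController": ["UserService", "OrderService"],
    "OrderService": ["UserService", "EmailClient"],
    "UserService": ["Database"],
    "EmailClient": ["OrderService"],
    "Database": [],
}
print(find_cycle(deps))  # ['OrderService', 'EmailClient', 'OrderService']
```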
/brooks-debt # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-debt # Claude Code (full form)
$brooks-debt # Codex CLI
Classifies your debt across the six decay risks, scores each finding by Pain × Spread priority, and produces a prioritized repayment roadmap with Critical / Scheduled / Monitored classification.
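One way to picture Pain × Spread prioritisation is the sketch below. The 1 to 3 scales and the cut-offs for Critical / Scheduled / Monitored are assumptions invented for this example; the actual rubric is defined by the skill's debt guide and may differ.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    title: str
    pain: int    # assumed scale: 1 (mild) to 3 (severe)
    spread: int  # assumed scale: 1 (local) to 3 (system-wide)

    @property
    def priority(self):
        return self.pain * self.spread


def classify(item):
    """Map a priority score to a bucket (thresholds are hypothetical)."""
    if item.priority >= 6:
        return "Critical"
    if item.priority >= 3:
        return "Scheduled"
    return "Monitored"


def roadmap(items):
    """Order debt items from highest to lowest priority."""
    return sorted(items, key=lambda item: item.priority, reverse=True)


items = [
    DebtItem("Magic numbers in loyalty formula", pain=1, spread=2),
    DebtItem("Circular OrderService/EmailClient dependency", pain=3, spread=3),
    DebtItem("Duplicated validation rules", pain=2, spread=2),
]
for item in roadmap(items):
    print(item.priority, classify(item), item.title)
```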
/brooks-test # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-test # Claude Code (full form)
$brooks-test # Codex CLI
Audits your test suite against six test-space decay risks — Test Obscurity, Test Brittleness, Test Duplication, Mock Abuse, Coverage Illusion, and Architecture Mismatch — sourced from xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and Working Effectively with Legacy Code. PR reviews also include a lightweight Step 7 Quick Test Check automatically (skipped for docs-only or non-production diffs).
/brooks-health # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-health # Claude Code (full form)
$brooks-health # Codex CLI
Runs abbreviated scans across all four quality dimensions and produces a weighted composite Health Score (0–100). Use it before a release, when onboarding a new team, or whenever you want a big-picture "how are we doing?" report. For deeper diagnosis on any dimension, use the focused skill instead.
/brooks-sweep # Claude Code (short form) / Gemini CLI
/brooks-lint:brooks-sweep # Claude Code (full form)
$brooks-sweep # Codex CLI
Runs a unified scan across all production (R1–R6) and test (T1–T6) decay risks plus architecture in a single pass, then applies fixes: safe changes are auto-applied immediately, multi-file or interface-touching changes require confirmation, and complex architectural decisions are flagged as manual items. Outputs a Fix Log, Health Score delta, and a residual item list.
Place a .brooks-lint.yaml in your project root to customize review behavior:
version: 1
strictness: balanced  # strict | balanced (default) | legacy-friendly; softer scoring for legacy code
disable:
  - T5  # skip coverage metrics check: we don't enforce coverage
severity:
  R1: suggestion  # downgrade Cognitive Overload findings for this domain
ignore:
  - "**/*.generated.*"
  - "**/vendor/**"
# custom_risks:  # define project-specific Cx codes; see skills/_shared/custom-risks-guide.md
# suppress:      # downgrade specific findings by risk + path (e.g. accepted legacy debt)

Copy .brooks-lint.example.yaml as a starting point.
All settings are optional — omit the file entirely for default behavior.
| Setting | Description |
|---|---|
| strictness | Scoring preset: strict, balanced (default), or legacy-friendly (lighter deductions, leads with top fixes) |
| disable | Risk codes to skip (R1–R6, T1–T6) |
| severity | Override severity tier (critical / warning / suggestion) |
| ignore | Glob patterns for files to exclude |
| focus | Evaluate only these risk codes (cannot combine with disable) |
| custom_risks | Define project-specific risk codes (C1, C2, …); see custom-risks-guide.md |
| suppress | Downgrade specific findings by risk + path (optional expires: date) |
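The rules in this table can be expressed as a few checks. The sketch below is a hypothetical `validate_config` helper, not part of brooks-lint; it assumes the YAML has already been loaded into a plain dict and only enforces what the table states (allowed presets and tiers, built-in code names, and the focus/disable exclusion).

```python
VALID_STRICTNESS = {"strict", "balanced", "legacy-friendly"}
VALID_SEVERITY = {"critical", "warning", "suggestion"}
BUILTIN_CODES = {f"{prefix}{n}" for prefix in "RT" for n in range(1, 7)}  # R1-R6, T1-T6


def validate_config(config):
    """Return a list of problems; an empty list means the config looks fine."""
    problems = []

    strictness = config.get("strictness", "balanced")
    if strictness not in VALID_STRICTNESS:
        problems.append(f"unknown strictness: {strictness!r}")

    if "focus" in config and "disable" in config:
        problems.append("focus cannot be combined with disable")

    for key in ("disable", "focus"):
        for code in config.get(key, []):
            if code not in BUILTIN_CODES:
                problems.append(f"{key}: unknown risk code {code!r}")

    for code, tier in config.get("severity", {}).items():
        if tier not in VALID_SEVERITY:
            problems.append(f"severity.{code}: unknown tier {tier!r}")

    return problems


example = {
    "version": 1,
    "strictness": "balanced",
    "disable": ["T5"],
    "severity": {"R1": "suggestion"},
    "ignore": ["**/*.generated.*", "**/vendor/**"],
}
print(validate_config(example))
```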
In the age of AI-assisted coding, we're writing more code faster than ever. But the insights from six decades of software engineering haven't changed:
"The complexity of software is an essential property, not an accidental one." — Frederick Brooks
AI can help you write code faster, but it can't tell you whether you're building a cathedral or a tar pit. brooks-lint bridges that gap — it brings the hard-won wisdom of twelve classic engineering books into your modern development workflow.
The decay risks these authors identified are more relevant than ever:
- Adding AI assistants doesn't fix cognitive overload or domain model distortion
- Generating more code increases change propagation and knowledge duplication
- Moving faster makes accidental complexity and dependency disorder even more dangerous
brooks-lint/
├── .claude-plugin/                 # Claude Code plugin metadata
├── .codex-plugin/                  # Codex CLI plugin metadata
├── skills/
│   ├── _shared/                    # Shared framework files
│   │   ├── common.md               # Iron Law, Project Config, Report Template, Health Score
│   │   ├── source-coverage.md      # 12-book coverage matrix, tradeoffs, false-positive guards
│   │   ├── decay-risks.md          # Six decay risks with symptoms and book citations
│   │   ├── test-decay-risks.md     # Six test-space decay risks with book citations
│   │   ├── remedy-guide.md         # --fix mode: actionable Remedy enhancement rules
│   │   └── custom-risks-guide.md   # Template for project-specific risk codes
│   ├── brooks-review/              # Mode 1: PR Review
│   │   ├── SKILL.md
│   │   └── pr-review-guide.md
│   ├── brooks-audit/               # Mode 2: Architecture Audit
│   │   ├── SKILL.md
│   │   └── architecture-guide.md
│   ├── brooks-debt/                # Mode 3: Tech Debt Assessment
│   │   ├── SKILL.md
│   │   └── debt-guide.md
│   ├── brooks-test/                # Mode 4: Test Quality Review
│   │   ├── SKILL.md
│   │   └── test-guide.md
│   ├── brooks-health/              # Mode 5: Health Dashboard
│   │   ├── SKILL.md
│   │   └── health-guide.md
│   └── brooks-sweep/               # Mode 6: Full Sweep & Auto-Fix
│       ├── SKILL.md
│       └── sweep-guide.md
├── hooks/                          # SessionStart hook
├── commands/                       # Short-form command wrappers (auto-installed by hook)
├── evals/                          # Benchmark test cases
│   └── evals.json
└── assets/
    └── logo.svg
Automate brooks-lint on every PR using the GitHub Action:
# .github/workflows/brooks-lint.yml
name: Brooks-Lint PR Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  brooks-lint:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: hyhmrright/brooks-lint/.github/actions/brooks-lint@main
        with:
          mode: review
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          fail-below: 70

See docs/github-action-example.yml for the full template.
The action posts the review as a PR comment and optionally fails the check if the Health Score drops below a threshold. If .brooks-lint-history.json is committed to your repo, the comment also includes a trend delta (e.g., "85 → 82 (−3) over last 3 runs").
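The trend delta is simple arithmetic over recent scores. A rough sketch follows, assuming the history has already been reduced to a list of Health Scores ordered oldest to newest; the real layout of .brooks-lint-history.json is not described here, and `trend_delta` is a hypothetical helper.

```python
def trend_delta(scores, window=3):
    """Format the change across the last `window` scores, or None if too few runs."""
    recent = scores[-window:]
    if len(recent) < 2:
        return None
    first, last = recent[0], recent[-1]
    diff = last - first
    if diff > 0:
        sign = "+"
    elif diff < 0:
        sign = "\u2212"  # typographic minus, as in the example comment text
    else:
        sign = "±"
    return f"{first} → {last} ({sign}{abs(diff)}) over last {len(recent)} runs"


print(trend_delta([90, 85, 84, 82]))
```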
Quality gates and Code Scanning. Beyond fail-below, the action exposes:
with:
  mode: review
  anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
  fail-on: critical              # fail on any Critical finding (none | warning | critical)
  fail-on-regression: true       # fail if the Health Score dropped vs the last run
  sarif-file: brooks-lint.sarif  # also upload findings to GitHub Code Scanning

fail-on-regression reads .brooks-lint-history.json, so commit that file to enforce "no new regressions". Setting sarif-file makes findings appear inline on the PR's Files changed tab and requires security-events: write permission on the job.
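How the three gates combine can be sketched as below. This is an illustration of the described behaviour, not the action's code: `gate_failures` is a hypothetical helper, and it assumes that fail-on: warning also trips on Critical findings.

```python
SEVERITY_RANK = {"suggestion": 0, "warning": 1, "critical": 2}


def gate_failures(score, severities, fail_below=None, fail_on="none",
                  previous_score=None, fail_on_regression=False):
    """Return the reasons a run should fail; an empty list means the check passes."""
    failures = []

    if fail_below is not None and score < fail_below:
        failures.append(f"Health Score {score} is below {fail_below}")

    if fail_on != "none":
        threshold = SEVERITY_RANK[fail_on]
        # Assumption: a gate set to "warning" also fails on "critical".
        if any(SEVERITY_RANK[s] >= threshold for s in severities):
            failures.append(f"found a finding at or above {fail_on}")

    if fail_on_regression and previous_score is not None and score < previous_score:
        failures.append(f"Health Score regressed from {previous_score} to {score}")

    return failures


print(gate_failures(82, ["warning"], fail_below=70, fail_on="critical"))
print(gate_failures(65, ["critical"], fail_below=70, fail_on="critical",
                    previous_score=80, fail_on_regression=True))
```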
Cost: ~$0.05–0.15 per PR run depending on diff size and model. Recommend running on pull_request events only.
Current state (v1.4): 12-book foundation, 6 production decay risks (R1–R6) + 6 test decay risks (T1–T6), 6 skills — PR Review, Architecture Audit, Tech Debt, Test Quality, Health Dashboard, Full Sweep — plus CI quality gates, SARIF output for GitHub Code Scanning, strictness presets, and a reproducible parser-fidelity benchmark. Earlier entries below describe historical milestones, not the current feature set.
- v0.2: Plugin infrastructure (.claude-plugin/, hooks, slash commands)
- v0.3: Eight Brooks dimensions, documentation completeness scoring
- v0.4: Six-book framework, decay risk dimensions, diagnosis chain, benchmark suite
- v0.5: Test Quality Review (Mode 4): four testing books, six test decay risks
- v0.6: Mermaid dependency graph in Architecture Audit
- v0.7: .brooks-lint.yaml project config, Mode 2 proactive context, 10-book expansion
- v0.8: Independent skill architecture with namespaced commands
- v0.9: Step validation, auto-diff scope, /brooks-health dashboard, trend tracking, triage mode, --fix remedies, onboarding report, GitHub Action
- v1.0: Eval automation (run-evals-live.mjs), custom risk extension (Cx codes)
- v1.1: Full Sweep skill (brooks-sweep): unified multi-dimension auto-fix
- v1.2: Autonomous sweep pipeline, npm run bump version propagation
- v1.3: Codex marketplace metadata, one-command installer for multiple agent platforms, bilingual README + landing site
- v1.4: SARIF output for GitHub Code Scanning, CI severity + regression gates, strictness presets (strict/balanced/legacy-friendly), 57-scenario eval suite, reproducible parser-fidelity benchmark (npm run benchmark)
Want to help? The best contributions right now are new eval test cases and improved decay risk symptom patterns. See CONTRIBUTING.md for how to add findings, improve guides, or expand the benchmark suite.
Run /brooks-review on your own PR — we review contributions with the tool we're building.
MIT License — see LICENSE for details.
This project stands on the shoulders of twelve giants:
Production Code Framework
- Frederick P. Brooks Jr. — The Mythical Man-Month (1975, Anniversary Edition 1995)
- Steve McConnell — Code Complete (1993, 2nd ed. 2004)
- Martin Fowler — Refactoring (1999, 2nd ed. 2018)
- Robert C. Martin — Clean Architecture (2017)
- Andrew Hunt & David Thomas — The Pragmatic Programmer (1999, 20th Anniversary Ed. 2019)
- Eric Evans — Domain-Driven Design (2003)
- John Ousterhout — A Philosophy of Software Design (2018)
- Titus Winters, Tom Manshreck, and Hyrum Wright — Software Engineering at Google (2020)
Test Quality Framework
- Gerard Meszaros — xUnit Test Patterns (2007)
- Roy Osherove — The Art of Unit Testing (2009, 3rd ed. 2023)
- Google Engineering — How Google Tests Software (2012)
- Michael Feathers — Working Effectively with Legacy Code (2004)
The decay risks encoded in this tool are our synthesis of their ideas, applied to modern code quality assessment.
⭐ If this tool helped you see your codebase differently, give it a star!
