Add refactor-scout agent — read-only technical-debt surveyor for source code

The janitor's source-code sibling: surveys existing code (not a diff) for smells, dead code, duplication, and over-complexity, prioritizes by churn × complexity, and recommends a disposition (refactor/delete/defer/accept) per finding designed to feed /triage. Test-net status and risk-to-change are first-class so a refactor is only recommended when behavior preservation can be proven. Read-only; the risky auto-apply half is deliberately deferred and gated. ROADMAP item 11.
2026-06-19 22:57:32 -05:00
parent 4533cc1d85
commit f258f9fd3c
4 changed files with 245 additions and 9 deletions
@@ -245,3 +245,49 @@ owner ratifies instead of researching.
 reversible + test-covered" class, once the owner has watched it make calls and trusts the verdicts
 (deliberately deferred — recommend-only first to build trust); (b) a thin `/triage`-then-`/adjudicate`
 combo if the two-command chaining friction proves real (YAGNI for now).
+
+## 11. `refactor-scout` — read-only technical-debt surveyor for source code ✅ BUILT (2026-06-19)
+
+Built and live: `guides/refactor-scout.md` + `adapters/claude/agents/refactor-scout.md` (the
+`refactor-scout` subagent). The **janitor's source-code sibling**: where the janitor spring-cleans
+docs/artifacts, this surveys *code* debt. It answers "where is the code getting bulky and tangled,
+and what's actually worth touching?" — the spring-cleaning the owner (a non-coder) wanted, scoped so
+the value is captured with none of the risk.
+
+**The insight that scoped it:** separate *seeing* debt (read-only, zero risk — almost all the value)
+from *changing* it (where 100% of the risk lives). This agent builds only the seeing half. It surveys
+*existing* code (not a diff — that's `reviewer`; not docs — that's `janitor`; not a whole-repo grade —
+that's `evaluator`), and is deliberately **opinionated and tiered** (≤~12 highest value-to-risk
+findings, never a 200-item dump that paralyzes).
+
+- **Targeted, not uniform.** Prioritizes by **churn × complexity** (the proven hotspot heuristic) so
+  attention lands where debt actually hurts, then inspects those hotspots for smells (DUPLICATION,
+  LONG, COMPLEX, LARGE, DEAD, COUPLING, INCONSISTENT, MAGIC).
+- **Tool-backed where possible.** Runs the repo's *own* analyzers read-only (knip/ts-prune, vulture,
+  deadcode, clippy, jscpd…) for high-confidence dead-code/duplication signals, kept separate from
+  reasoned judgment; **never installs anything** — a missing analyzer becomes a recommendation.
+- **Behavior preservation is sacred + test-net is first-class.** Every finding carries risk-to-change
+  (blast-radius, unclear ⇒ HIGH) and test-net status. A REFACTOR is only ever recommended when a green
+  test can prove behavior is unchanged before/after; no coverage ⇒ "write a characterization test
+  first," never "just refactor."
+- **The disposition loop (the deliverable's whole point).** Each finding is bucketed into
+  **REFACTOR** (worth it + safe + covered, annotated with the specific refactoring) · **DELETE** (dead,
+  tool-confirmed) · **DEFER** (worth it, gated on a test net → ROADMAP) · **ACCEPT** (real debt not
+  worth the risk — a legitimate choice). Buckets are designed to feed straight into `/triage`. The
+  non-coder approves on the *contract* ("tests green before/after, behavior unchanged, small diff,
+  reviewer signed off"), not by reading the diff.
+
+**Deliberately NOT built — the auto-apply half (deferred + heavily gated).** Actually editing working
+code is where the risk lives; for now the owner acts on findings manually, one at a time, via a normal
+gated agent session (the apply path already exists — hand a finding to `reviewer`-backed work under the
+test gate). A dedicated `/refactor-apply` command is a *future* item, justified only after the survey
+proves its worth, and only behind: existing-or-generated characterization tests, LOW risk-to-change,
+small diff, and human approval.
+
+**Remaining options:** (a) fold a `refactor-scout` pass into `/full-eval` for code repos; (b) the gated
+auto-apply command above, once the survey has earned trust; (c) once item 1's `/harden` exists, the
+churn×complexity hotspot list pairs naturally with the per-stack linter baseline.
+
+**Untested on a real repo** — the first run (recap or keysat) should confirm it stays tiered and honest
+about test coverage rather than over-recommending refactors. Tune the guide's tiering cap / risk rules
+if it over-produces.