Add refactor-scout agent — read-only technical-debt surveyor for source code
The janitor's source-code sibling: surveys existing code (not a diff) for smells, dead code, duplication, and over-complexity, prioritizes by churn × complexity, and recommends a disposition (refactor/delete/defer/accept) per finding, designed to feed /triage. Test-net status and risk-to-change are first-class, so a refactor is only recommended when behavior preservation can be proven. Read-only; the risky auto-apply half is deliberately deferred and gated. ROADMAP item 11.
reversible + test-covered" class, once the owner has watched it make calls and trusts the verdicts
(deliberately deferred — recommend-only first to build trust); (b) a thin `/triage`-then-`/adjudicate`
combo if the two-command chaining friction proves real (YAGNI for now).

## 11. `refactor-scout` — read-only technical-debt surveyor for source code ✅ BUILT (2026-06-19)

Built and live: `guides/refactor-scout.md` + `adapters/claude/agents/refactor-scout.md` (the
`refactor-scout` subagent). The **janitor's source-code sibling**: where the janitor spring-cleans
docs/artifacts, this surveys *code* debt. It answers "where is the code getting bulky and tangled,
and what's actually worth touching?" — the spring-cleaning the owner (a non-coder) wanted, scoped so
the value is captured with none of the risk.

**The insight that scoped it:** separate *seeing* debt (read-only, zero risk — almost all the value)
from *changing* it (where 100% of the risk lives). This agent builds only the seeing half. It surveys
*existing* code (not a diff — that's `reviewer`; not docs — that's `janitor`; not a whole-repo grade —
that's `evaluator`), and is deliberately **opinionated and tiered** (≤~12 highest value-to-risk
findings, never a 200-item dump that paralyzes).
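A minimal sketch of the tiering cap, assuming each finding already carries a numeric value estimate and a risk-to-change score. The `Finding` fields, the `value / risk` ranking key, and `TIER_CAP = 12` are illustrative assumptions, not taken from the guide.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical finding shape: the field names are illustrative only.
@dataclass
class Finding:
    path: str
    smell: str    # e.g. "DUPLICATION", "DEAD", "COMPLEX"
    value: float  # estimated payoff of addressing the finding
    risk: float   # estimated risk-to-change; must be > 0


TIER_CAP = 12  # assumed reading of the "≤~12 findings" cap


def tier(findings: List[Finding], cap: int = TIER_CAP) -> List[Finding]:
    """Return at most `cap` findings, highest value-to-risk ratio first."""
    ranked = sorted(findings, key=lambda f: f.value / f.risk, reverse=True)
    return ranked[:cap]
```

In this sketch everything below the cap is left out of the report entirely, which is what keeps the output a short ranked list rather than an exhaustive dump.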

- **Targeted, not uniform.** Prioritizes by **churn × complexity** (the proven hotspot heuristic) so
attention lands where debt actually hurts, then inspects those hotspots for smells (DUPLICATION,
LONG, COMPLEX, LARGE, DEAD, COUPLING, INCONSISTENT, MAGIC).
- **Tool-backed where possible.** Runs the repo's *own* analyzers read-only (knip/ts-prune, vulture,
deadcode, clippy, jscpd…) for high-confidence dead-code/duplication signals, kept separate from
reasoned judgment; **never installs anything** — a missing analyzer becomes a recommendation.
- **Behavior preservation is sacred + test-net is first-class.** Every finding carries risk-to-change
(blast-radius, unclear ⇒ HIGH) and test-net status. A REFACTOR is only ever recommended when a green
test can prove behavior is unchanged before/after; no coverage ⇒ "write a characterization test
first," never "just refactor."
- **The disposition loop (the deliverable's whole point).** Each finding is bucketed into
**REFACTOR** (worth it + safe + covered, annotated with the specific refactoring) · **DELETE** (dead,
tool-confirmed) · **DEFER** (worth it, gated on a test net → ROADMAP) · **ACCEPT** (real debt not
worth the risk — a legitimate choice). Buckets are designed to feed straight into `/triage`. The
non-coder approves on the *contract* ("tests green before/after, behavior unchanged, small diff,
reviewer signed off"), not by reading the diff.
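The prioritization and disposition rules above can be sketched as follows, assuming churn is counted as the number of commits touching each file (parsed from `git log --format= --numstat` output) and that a per-file complexity score comes from whatever analyzer the repo already has. The helper names, the `Finding` fields, and the order in which the rules are checked are assumptions for illustration, not the guide's actual logic.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


def churn_from_numstat(numstat: str) -> Counter:
    """Count commits per file from `git log --format= --numstat` output.

    Each numstat line is "added<TAB>deleted<TAB>path", and a file appears once
    per commit that touched it, so counting lines counts commits.
    """
    churn: Counter = Counter()
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:  # skip the blank separator lines between commits
            churn[parts[2]] += 1
    return churn


def hotspots(churn: Counter, complexity: Dict[str, int], top: int = 10) -> List[str]:
    """Rank files by churn x complexity; files never touched in the log are skipped."""
    scores = {path: churn[path] * c for path, c in complexity.items() if churn[path]}
    return sorted(scores, key=lambda p: scores[p], reverse=True)[:top]


class Disposition(Enum):
    REFACTOR = "refactor"
    DELETE = "delete"
    DEFER = "defer"
    ACCEPT = "accept"


@dataclass
class Finding:  # hypothetical shape of one finding
    path: str
    smell: str                    # e.g. "DEAD", "DUPLICATION", "LONG"
    worth_it: bool                # worth fixing on its merits
    risk: Optional[str]           # "LOW" / "MEDIUM" / "HIGH"; None = unclear
    tests_green: bool             # a green test pins current behavior
    tool_confirmed: bool = False  # an analyzer (vulture, knip, ...) agrees


def dispose(f: Finding) -> Disposition:
    risk = f.risk or "HIGH"  # unclear blast radius counts as HIGH
    if f.smell == "DEAD" and f.tool_confirmed:
        return Disposition.DELETE
    if not f.worth_it:
        return Disposition.ACCEPT
    if not f.tests_green:
        return Disposition.DEFER  # write a characterization test first
    if risk == "HIGH":
        return Disposition.ACCEPT  # real debt, not worth the risk
    return Disposition.REFACTOR
```

The order of the checks is a design choice: here a missing test net always yields DEFER before risk is considered, so nothing uncovered can ever reach REFACTOR.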

**Deliberately NOT built — the auto-apply half (deferred + heavily gated).** Actually editing working
code is where the risk lives; for now the owner acts on findings manually, one at a time, via a normal
gated agent session (the apply path already exists — hand a finding to `reviewer`-backed work under the
test gate). A dedicated `/refactor-apply` command is a *future* item, justified only after the survey
proves its worth, and only behind: existing-or-generated characterization tests, LOW risk-to-change,
small diff, and human approval.
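The four gates a future `/refactor-apply` would sit behind can be expressed as a single check. This is a sketch only: `ApplyRequest`, its fields, and the `MAX_DIFF_LINES = 200` threshold are hypothetical, since the roadmap says only "small diff" and defines no command interface.

```python
from dataclasses import dataclass
from typing import List

MAX_DIFF_LINES = 200  # hypothetical cutoff for "small diff"


@dataclass
class ApplyRequest:  # hypothetical shape of one proposed refactor
    has_characterization_tests: bool  # existing or generated, green before the change
    risk: str                         # "LOW" / "MEDIUM" / "HIGH"
    diff_lines: int                   # lines added + deleted
    human_approved: bool


def unmet_gates(req: ApplyRequest) -> List[str]:
    """Return every gate the request fails; an empty list means it may proceed."""
    unmet = []
    if not req.has_characterization_tests:
        unmet.append("characterization tests")
    if req.risk != "LOW":
        unmet.append("LOW risk-to-change")
    if req.diff_lines > MAX_DIFF_LINES:
        unmet.append("small diff")
    if not req.human_approved:
        unmet.append("human approval")
    return unmet
```

Returning every unmet gate, rather than stopping at the first, lets the owner see the whole contract at once.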

**Remaining options:** (a) fold a `refactor-scout` pass into `/full-eval` for code repos; (b) the gated
auto-apply command above, once the survey has earned trust; (c) once item 1's `/harden` exists, the
churn×complexity hotspot list pairs naturally with the per-stack linter baseline.

**Untested on a real repo** — the first run (recap or keysat) should confirm it stays tiered and honest
about test coverage rather than over-recommending refactors. Tune the guide's tiering cap / risk rules
if it over-produces.