Files
standards/ROADMAP.md
T
Keysat f258f9fd3c Add refactor-scout agent — read-only technical-debt surveyor for source code
The janitor's source-code sibling: surveys existing code (not a diff) for smells,
dead code, duplication, and over-complexity, prioritizes by churn × complexity, and
recommends a disposition (refactor/delete/defer/accept) per finding designed to feed
/triage. Test-net status and risk-to-change are first-class so a refactor is only
recommended when behavior preservation can be proven. Read-only; the risky auto-apply
half is deliberately deferred and gated. ROADMAP item 11.
2026-06-19 22:57:32 -05:00

294 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ROADMAP — Standards
Longer-term backlog for the standards repo: future agents, commands, and cross-repo
standards to hash out and build later. Near-term status lives in `AGENTS.md`
`## Current state`. Items here are parked, not committed — we iterate on them when we pick
one up. Newly captured cross-repo ideas land in `INBOX.md` first and graduate here on
triage.
---
## 1. Cross-repo quality-gate standard (linters / pre-commit hooks / CI)
**Why:** with agents writing the code, these stop being developer conveniences and become
the falsifiable rails that let an agent check its own work — write, get told exactly what's
wrong, iterate, verify. The *standard* is authored here; *application* is per-repo (in each
repo's AGENTS.md), because what's best-in-class differs by language/stack.
**The principle to encode:** every code repo should give its agent a fast, deterministic,
agent-runnable feedback loop — the subset of checks that run without a human and can't be
skipped. Tier it:
- **Linter/formatter** — per-stack (e.g. ruff/black, eslint/prettier, gofmt). Fast, runs on
every change; the agent fixes before moving on.
- **Pre-commit hook** — the unskippable gate: runs the linter + quick tests and blocks the
commit if they fail. This is the highest-ROI piece and the first to add.
- **CI on push** — the heavier rebuild + full test suite. Lower priority for solo repos on
Gitea (Gitea Actions exists); add when a repo has real collaborators or releases.
**This repo's own first instance:** it's Markdown + symlinks, so its quality gate isn't a
code linter — it's a pre-commit hook that runs the **structural checks** this repo already
has an agent for: relative-symlink integrity (`AGENTS.md ← CLAUDE.md`,
`docs/guides/* ← .claude/rules/*`, the `adapters/` directory symlinks) and internal-link
validity. The `portability-checker` agent encodes the invariants; the hook makes the
deterministic subset unskippable. Build this as the worked example of the standard. Concrete
checks to start with: (a) the `type` enum is identical across `guides/capture.md`,
`INBOX.md`, and `AGENTS.md`; (b) `CLAUDE.md` is a relative symlink resolving to `AGENTS.md`;
(c) every `adapters/claude/{commands,agents}/*.md` wrapper has a matching `guides/<name>.md`
substance file (no wrapper-without-guide drift).
**Open questions:** one shared hook framework (pre-commit.com) vs. hand-rolled per repo;
how the standard gets *adopted* into a repo (a `/harden` command that installs the right
linter+hook for the detected stack?); whether to define a minimal "agentic-ops baseline"
checklist doc alongside the other four standards docs.
## 2. `roundup` — cross-project status command ✅ BUILT
Built and live: `guides/roundup.md` + `adapters/claude/commands/roundup.md`. Fans out a
read-only reader per repo over AGENTS.md/ROADMAP.md, folds in the inbox, and synthesizes one
priority-grouped to-do list across all projects (prioritizing stays with the user). Every run
**writes the report to a tracked `~/Projects/standards/STATUS.md` snapshot** (overwritten each
run, then committed + pushed) so the portfolio state is diffable over time — and shows the
same report inline. That snapshot file is the *only* thing roundup writes; all project repos
stay read-only.
## 3. Deterministic inbox surfacing — SessionStart hook (optional upgrade over the portable line)
**Why:** the portable mechanism (the inbox-check line in every repo's AGENTS.md) is
model-interpreted and therefore skippable. A Claude `SessionStart` hook that greps
`INBOX.md` for the current repo's tag and prints matching items is deterministic and
unskippable — the same quality-gate logic as item 1, applied to capture.
**Tradeoff:** hooks are Claude-specific and per-repo, so they don't travel to other vendors.
Decision already made: keep the AGENTS.md line as the **belt-and-suspenders portable
default**, and offer the hook as an opt-in upgrade for repos where you want the guarantee.
Possible form: a snippet the quality-gate `/harden` flow (item 1) installs alongside the
linter hook.
## 4. Thread the inbox-check line into bootstrapping
**Why:** right now adding the portable inbox-check line to a repo is manual. It should be
automatic so every repo inherits it.
- Add the line to the AGENTS.md template in `retrofit-playbook.md` (Step 1, prompt A) and
to the `/retrofit` guide's Phase 4.
- Thread the canonical `.gitignore` block (now in `portability.md` → "What git tracks")
into `retrofit-playbook.md` Step 0 and the new-repo bootstrap, so every repo's committed
`.gitignore` carries it rather than relying on a global excludesfile.
- Consider a one-time sweep command that adds it to every existing repo's AGENTS.md.
- Decide whether the canonical wording lives in `how-i-work.md` (so it's truly universal)
or stays a per-repo line.
## 5. `new-project` — idea → scoped → scaffolded → Gitea repo ✅ BUILT
Built and live: `guides/new-project.md` + `adapters/claude/commands/new-project.md`. The
inverse of `/retrofit` — main-thread and collaborative, it turns a captured `(new)` inbox
idea into a repo that's standards-compliant from line one. Phases: locate the inbox seed →
workshop the scope (the high-value, interactive step) → brief + scaffolding-plan sign-off →
scaffold (`AGENTS.md` + `CLAUDE.md` symlink, `ROADMAP.md`, `README.md`, canonical
`.gitignore`, `.claude/`, minimal stack skeleton; inbox-check line tagged `(<name>)`) →
publish (git init + commit, Gitea manual-create gate, remote + push) → close the loop
(remove the `(new)` item from `INBOX.md`) → portability-checker verify.
**Open questions, resolved:** (a) Gitea repo creation is a **manual web-UI gate** (no API
token — matches retrofit-playbook Part 4); (b) the standards layer is always scaffolded, the
stack skeleton stays minimal, and the **linter/pre-commit quality gate is deferred to the
future `/harden`** (item 1) rather than hand-rolled; (c) the workshop **does** seed the first
`## Current state` with the first milestone.
**Remaining option:** once `/harden` (item 1) exists, call it as the scaffold's last step so a
new repo gets its stack's quality gate installed automatically.
## 6. Cross-repo git-hygiene audit + remediation ✅ DONE (2026-06-14)
Fanned out one read-only `portability-checker` per git repo under `~/Projects`. **No safety
issues anywhere:** zero tracked `.env` / `.DS_Store` / `*.local.json`, and every in-repo
symlink is relative. The gaps were consistency: the inbox-check line was missing in all 7
non-standards repos, and only `standards` had a complete canonical `.gitignore`.
**Fixed — 6 repos, one commit each, pushed** (`CRM`, `premier-gunner`, `recap`,
`spark-control`, `proof-of-work`; `recap-relay` committed locally — see residuals): added the
repo-tagged inbox-check line and normalized `.gitignore`.
**Standard improved by the audit:** the documented canonical `.claude/` block was
allow-by-default and would have *un-ignored* `premier-gunner`'s password-bearing
`.claude/launch.json`. Switched `portability.md` (and the two retrofit summaries) to a
**deny-by-default `.claude/*` + allow-list** of the shared wiring.
**Residual follow-ups:**
- **`ten31-transcripts` (MAJOR) — needs its own mini-retrofit.** Despite the name it's an
active Xcode/Swift app with no `.claude/` at all. Scaffold `.claude/settings.json`; decide
whether to reorganize its flat `docs/NN_*.md` into `docs/guides/` + `.claude/rules/` symlinks.
Too big for the mechanical pass — **captured to `INBOX.md`** (tagged `(ten31-transcripts)`)
with step-by-step instructions, for pickup on the next `/triage` in that repo.
- **`recap-relay` remote** ✅ Gitea `origin` added + pushed in a later session.
- **`premier-gunner/s9pk/.gitignore`** lacks the secrets/Claude lines (low priority; the root
`.gitignore` covers `.env` tree-wide already).
- **Many non-git folders under `~/Projects` are unprotected work** (discount-watcher,
expense-organizer, giga, heart-rate, licensing, one-river, satoshi-sleep, START9 PACKAGING,
ten31-agents/-command-center/-signal-engine, timestamp-converter, timestamp-newspaper,
website-landing, Grand-Cayman-paddleboard). Each needs `git init` + retrofit, or an explicit
"scratch, don't track" decision.
- **`start-os`** is an external upstream (Start9Labs/start-os) — out of scope, no action.
## 8. Design system — `/design` round-trip + `design-checker` agent ✅ BUILT (2026-06-16)
Built and live: `guides/design.md` + `adapters/claude/commands/design.md` (the `/design`
command) and `guides/design-checker.md` + `adapters/claude/agents/design-checker.md` (the
`design-checker` subagent). The principle: branding/design for any user-facing repo is a
**vendor-neutral on-disk contract**`design/DESIGN.md` (nine-section brand brief) +
`design/tokens.tokens.json` (W3C DTCG tokens) — that every agent reads before building UI.
The cloud tool (Anthropic's **Claude Design**, cloud-only/experimental) is an interchangeable
front-end, never a dependency.
- **`/design`** (main-thread, interactive) runs the round-trip: inspiration-first scoping →
`design/BRIEF.md` (the packet the user hand-carries to Claude Design) → user drives the
cloud step → distill the export back into the `DESIGN.md` + tokens contract. Distillation is
agent-mediated because Claude Design exports inline-hardcoded HTML/CSS, not DTCG tokens
(verified by research, 2026-06-16) — worth a human glance the first few runs.
- **`design-checker`** (read-only subagent) audits a repo's user-facing code against its *own*
committed contract — off-palette colors, wrong type scale, magic-number spacing, matched
"don'ts" — citing code `file:line` vs the contract rule. Reports "run `/design` first" if no
contract exists; never imposes its own taste.
- **`/new-project`** now scaffolds `design/` + the AGENTS.md Design line for user-facing
projects (Phase 4).
**Remaining options:** (a) `/retrofit` should backfill `design/` into existing user-facing
repos (keysat, recap, recaps.cc, premier-gunner, ten31-database, ten31-transcripts) — run
`/design` then `design-checker` per repo. **Done (recap, 2026-06-17):** recap was the first
**Case B (Extract)** backfill — document-as-is. It validated the extract→reconcile path
(previously untested — keysat was Case A/Import); the contract was distilled and the
conformance cleanup shipped in two phases (recap app 0.2.161). Extract-phase Phase-D learnings
landed in `guides/design.md` (commit `9031281`); cleanup-execution learnings are now in
`guides/design-checker.md`. **Remaining backfill repos:** recaps.cc, premier-gunner,
ten31-database, ten31-transcripts. (b) fold a `design-checker` pass into `/full-eval`
for repos that have a contract; (c) confirm against a real Claude Design run what the export
bundle actually contains and tune the Phase C distillation (the export internals are only
medium-confidence from research) — kept **decoupled** from the recap run so one untested thing
moves at a time.
## 9. `onboarding-tester` — docs-only fresh-adopter agent ✅ BUILT (agent); harness pending (staged Path 1 → Path 2)
Built and live: `guides/onboarding-tester.md` + `adapters/claude/agents/onboarding-tester.md`. A
global adopter agent: given a **goal**, a **docs-corpus allowlist**, a **sandbox**, and
**credentials/endpoint**, it walks a product's published docs as a literal newcomer — *never
reading the product's source* (needing to is itself a finding) — and reports every doc gap
(Blocker/Stumble/Nit + the one-line fix). On a **fully clean run** (zero blockers, zero stumbles)
it emits a second, publishable "all it took was X, Y, Z" walkthrough for marketing. Generic by
design; keysat is the first instantiation. The dual output (friction report always; marketing
walkthrough only on a clean run) makes it both a docs-QA tool and a source of agent-readiness copy.
**First instantiation — keysat SDK integration (harness lives in keysat, not here):** goal = gate
`proof-of-work` behind a keysat license end-to-end under the least-privilege `merchant-onboard`
scoped key (keysat shipped that role for this, commit `d5885d1`; covers catalog setup + manual
license issuance). Harness = two disposable containers (`keysat-server` fixture booted fresh +
`developer-sandbox` holding `proof-of-work`), master-key mints the scoped key, docs corpus =
keysat.xyz / docs.keysat.xyz / the four SDK package pages.
**Staged build:**
- **Stage 1 — Path 1, no payments (do first, unblocked now):** catalog → tiers → manual license
issuance → SDK integration → offline verify, all under `merchant-onboard`. Cheapest honest
walkthrough; validates the agent + core integration docs before adding payment infra. **Pending:**
build the harness + first live run in a keysat session.
- **Stage 2 — Path 2, full buyer-pays on regtest (gated):** depends on keysat shipping
`payment_providers:write` (opt-in scope, never bundled into `merchant-onboard`) + a network gate
(scoped connect = regtest/testnet/signet only; mainnet master-only; fail-closed) + a daemon-level
sandbox-mode flag — **greenlit with the keysat dev 2026-06-16** (`plans/agent-payment-connect-scope.md`).
Then the harness adds a BTCPay regtest stack and the agent connects BTCPay (regtest) **and** drives
a test buyer payment end-to-end, zero master-key steps. Walkthrough labeled regtest; production
mainnet-connect stays the operator's one reserved step **by design** (a key that can repoint
settlement is a fund-redirection key — surface the boundary as a security feature).
**Other follow-up (captured to `INBOX.md`):** a separate operator-onboarding agent for the StartOS
install journey (stand up + run keysat from docs alone), vs. the developer SDK-integration journey
this agent covers.
## 7. Verify & correct the placement guide ✅ DONE (2026-06-15)
Walked the "Infrastructure facts" section with the owner and corrected it against the real
setup (cross-checked against the project repos + a Spark-control topology dump). Fixes: **x86**
StartOS **0.4.0** box + full service inventory (Gitea, Nextcloud, Home Assistant, CLN + RTL,
Open WebUI, Vaultwarden, Synapse, and every Claude-built app); the two-Spark **role split**
(LLM Spark = vLLM; audio/speech Spark = Parakeet / Kokoro / Sortformer + TitaNet + bge-m3 +
Qdrant, and it hosts matrix-bridge); **don't hardcode a model — query the Spark Control
gateway** for the live one (daily driver Qwen3.6, hot-swappable); networking reduced to LAN /
WireGuard / StartTunnel (Proton VPN + Tor were legacy, dropped). UNVERIFIED banner replaced
with a "verified 2026-06-15" note; decision steps 4 and 6 aligned. Commit `ee5c8bb`.
## 10. `adjudicate` — debate low-priority backlog items to a verdict ✅ BUILT (2026-06-17)
Built and live: `guides/adjudicate.md` + `adapters/claude/commands/adjudicate.md` (the
`/adjudicate` command). Solves backlog clutter the owner can't easily judge: low-priority
(P2/P3) technical/backend items that may be necessary or may be bells-and-whistles, and that
he shouldn't spend expertise on *because* they're low priority. Run inside a repo, it
adjudicates that repo's ROADMAP items via a grounded debate and routes each to a verdict the
owner ratifies instead of researching.
- **Pipeline (per item):** investigator (read-only — does the problem exist? already handled?
what would it touch? + blast-radius classification) → build-advocate ∥ drop-advocate (argue
from the investigator's findings, not speculation) → judge (rubric = `how-i-work.md` + repo
`AGENTS.md`; **biased to DROP on ties / low confidence**, since these are already low-priority).
- **Three verdicts:** **DROP** (the only autonomously-applied call — ratified in one batch, owner
needn't understand the tech), **DO** (worth it + LOW blast radius → annotated with a ready plan,
recommend-only, not executed), **ESCALATE** (worth it but HIGH blast radius / low confidence /
an epic → balanced brief for the owner's call).
- **Autonomy is gated by blast radius, not priority** — HIGH = touches data/auth/money/external
surface or changes observable behavior (unclear ⇒ HIGH). It may auto-recommend *dropping* a HIGH
item but never *doing* one.
- **ROADMAP-only input.** Nudges the owner to `/triage` first if untriaged inbox items exist for
the repo, but never reads raw inbox items into the debate (that's `/triage`'s routing job —
duplicating it invites drift). Two gates: confirm the item set before fan-out (cost control),
then approve the batch of ROADMAP edits. The ROADMAP diff + commit message is the audit trail
(no separate report file).
**Remaining options:** (a) **v2 — narrow auto-execution** of the safe "DO + LOW blast radius +
reversible + test-covered" class, once the owner has watched it make calls and trusts the verdicts
(deliberately deferred — recommend-only first to build trust); (b) a thin `/triage`-then-`/adjudicate`
combo if the two-command chaining friction proves real (YAGNI for now).
## 11. `refactor-scout` — read-only technical-debt surveyor for source code ✅ BUILT (2026-06-19)
Built and live: `guides/refactor-scout.md` + `adapters/claude/agents/refactor-scout.md` (the
`refactor-scout` subagent). The **janitor's source-code sibling**: where the janitor spring-cleans
docs/artifacts, this surveys *code* debt. It answers "where is the code getting bulky and tangled,
and what's actually worth touching?" — the spring-cleaning the owner (a non-coder) wanted, scoped so
the value is captured with none of the risk.
**The insight that scoped it:** separate *seeing* debt (read-only, zero risk — almost all the value)
from *changing* it (where 100% of the risk lives). This agent builds only the seeing half. It surveys
*existing* code (not a diff — that's `reviewer`; not docs — that's `janitor`; not a whole-repo grade —
that's `evaluator`), and is deliberately **opinionated and tiered** (≤~12 highest value-to-risk
findings, never a 200-item dump that paralyzes).
- **Targeted, not uniform.** Prioritizes by **churn × complexity** (the proven hotspot heuristic) so
attention lands where debt actually hurts, then inspects those hotspots for smells (DUPLICATION,
LONG, COMPLEX, LARGE, DEAD, COUPLING, INCONSISTENT, MAGIC).
- **Tool-backed where possible.** Runs the repo's *own* analyzers read-only (knip/ts-prune, vulture,
deadcode, clippy, jscpd…) for high-confidence dead-code/duplication signals, kept separate from
reasoned judgment; **never installs anything** — a missing analyzer becomes a recommendation.
- **Behavior preservation is sacred + test-net is first-class.** Every finding carries risk-to-change
(blast-radius, unclear ⇒ HIGH) and test-net status. A REFACTOR is only ever recommended when a green
test can prove behavior is unchanged before/after; no coverage ⇒ "write a characterization test
first," never "just refactor."
- **The disposition loop (the deliverable's whole point).** Each finding is bucketed into
**REFACTOR** (worth it + safe + covered, annotated with the specific refactoring) · **DELETE** (dead,
tool-confirmed) · **DEFER** (worth it, gated on a test net → ROADMAP) · **ACCEPT** (real debt not
worth the risk — a legitimate choice). Buckets are designed to feed straight into `/triage`. The
non-coder approves on the *contract* ("tests green before/after, behavior unchanged, small diff,
reviewer signed off"), not by reading the diff.
**Deliberately NOT built — the auto-apply half (deferred + heavily gated).** Actually editing working
code is where the risk lives; for now the owner acts on findings manually, one at a time, via a normal
gated agent session (the apply path already exists — hand a finding to `reviewer`-backed work under the
test gate). A dedicated `/refactor-apply` command is a *future* item, justified only after the survey
proves its worth, and only behind: existing-or-generated characterization tests, LOW risk-to-change,
small diff, and human approval.
**Remaining options:** (a) fold a `refactor-scout` pass into `/full-eval` for code repos; (b) the gated
auto-apply command above, once the survey has earned trust; (c) once item 1's `/harden` exists, the
churn×complexity hotspot list pairs naturally with the per-stack linter baseline.
**Untested on a real repo** — the first run (recap or keysat) should confirm it stays tiered and honest
about test coverage rather than over-recommending refactors. Tune the guide's tiering cap / risk rules
if it over-produces.