f258f9fd3c
The janitor's source-code sibling: surveys existing code (not a diff) for smells, dead code, duplication, and over-complexity, prioritizes by churn × complexity, and recommends a disposition (refactor/delete/defer/accept) per finding designed to feed /triage. Test-net status and risk-to-change are first-class so a refactor is only recommended when behavior preservation can be proven. Read-only; the risky auto-apply half is deliberately deferred and gated. ROADMAP item 11.
294 lines
20 KiB
Markdown
294 lines
20 KiB
Markdown
# ROADMAP — Standards
|
||
|
||
Longer-term backlog for the standards repo: future agents, commands, and cross-repo
|
||
standards to hash out and build later. Near-term status lives in `AGENTS.md` →
|
||
`## Current state`. Items here are parked, not committed — we iterate on them when we pick
|
||
one up. Newly captured cross-repo ideas land in `INBOX.md` first and graduate here on
|
||
triage.
|
||
|
||
---
|
||
|
||
## 1. Cross-repo quality-gate standard (linters / pre-commit hooks / CI)
|
||
|
||
**Why:** with agents writing the code, these stop being developer conveniences and become
|
||
the falsifiable rails that let an agent check its own work — write, get told exactly what's
|
||
wrong, iterate, verify. The *standard* is authored here; *application* is per-repo (in each
|
||
repo's AGENTS.md), because what's best-in-class differs by language/stack.
|
||
|
||
**The principle to encode:** every code repo should give its agent a fast, deterministic,
|
||
agent-runnable feedback loop — the subset of checks that run without a human and can't be
|
||
skipped. Tier it:
|
||
|
||
- **Linter/formatter** — per-stack (e.g. ruff/black, eslint/prettier, gofmt). Fast, runs on
|
||
every change; the agent fixes before moving on.
|
||
- **Pre-commit hook** — the unskippable gate: runs the linter + quick tests and blocks the
|
||
commit if they fail. This is the highest-ROI piece and the first to add.
|
||
- **CI on push** — the heavier rebuild + full test suite. Lower priority for solo repos on
|
||
Gitea (Gitea Actions exists); add when a repo has real collaborators or releases.
|
||
|
||
**This repo's own first instance:** it's Markdown + symlinks, so its quality gate isn't a
|
||
code linter — it's a pre-commit hook that runs the **structural checks** this repo already
|
||
has an agent for: relative-symlink integrity (`AGENTS.md ← CLAUDE.md`,
|
||
`docs/guides/* ← .claude/rules/*`, the `adapters/` directory symlinks) and internal-link
|
||
validity. The `portability-checker` agent encodes the invariants; the hook makes the
|
||
deterministic subset unskippable. Build this as the worked example of the standard. Concrete
|
||
checks to start with: (a) the `type` enum is identical across `guides/capture.md`,
|
||
`INBOX.md`, and `AGENTS.md`; (b) `CLAUDE.md` is a relative symlink resolving to `AGENTS.md`;
|
||
(c) every `adapters/claude/{commands,agents}/*.md` wrapper has a matching `guides/<name>.md`
|
||
substance file (no wrapper-without-guide drift).
|
||
|
||
**Open questions:** one shared hook framework (pre-commit.com) vs. hand-rolled per repo;
|
||
how the standard gets *adopted* into a repo (a `/harden` command that installs the right
|
||
linter+hook for the detected stack?); whether to define a minimal "agentic-ops baseline"
|
||
checklist doc alongside the other four standards docs.
|
||
|
||
## 2. `roundup` — cross-project status command ✅ BUILT
|
||
|
||
Built and live: `guides/roundup.md` + `adapters/claude/commands/roundup.md`. Fans out a
|
||
read-only reader per repo over AGENTS.md/ROADMAP.md, folds in the inbox, and synthesizes one
|
||
priority-grouped to-do list across all projects (prioritizing stays with the user). Every run
|
||
**writes the report to a tracked `~/Projects/standards/STATUS.md` snapshot** (overwritten each
|
||
run, then committed + pushed) so the portfolio state is diffable over time — and shows the
|
||
same report inline. That snapshot file is the *only* thing roundup writes; all project repos
|
||
stay read-only.
|
||
|
||
## 3. Deterministic inbox surfacing — SessionStart hook (optional upgrade over the portable line)
|
||
|
||
**Why:** the portable mechanism (the inbox-check line in every repo's AGENTS.md) is
|
||
model-interpreted and therefore skippable. A Claude `SessionStart` hook that greps
|
||
`INBOX.md` for the current repo's tag and prints matching items is deterministic and
|
||
unskippable — the same quality-gate logic as item 1, applied to capture.
|
||
|
||
**Tradeoff:** hooks are Claude-specific and per-repo, so they don't travel to other vendors.
|
||
Decision already made: keep the AGENTS.md line as the **belt-and-suspenders portable
|
||
default**, and offer the hook as an opt-in upgrade for repos where you want the guarantee.
|
||
Possible form: a snippet the quality-gate `/harden` flow (item 1) installs alongside the
|
||
linter hook.
|
||
|
||
## 4. Thread the inbox-check line into bootstrapping
|
||
|
||
**Why:** right now adding the portable inbox-check line to a repo is manual. It should be
|
||
automatic so every repo inherits it.
|
||
|
||
- Add the line to the AGENTS.md template in `retrofit-playbook.md` (Step 1, prompt A) and
|
||
to the `/retrofit` guide's Phase 4.
|
||
- Thread the canonical `.gitignore` block (now in `portability.md` → "What git tracks")
|
||
into `retrofit-playbook.md` Step 0 and the new-repo bootstrap, so every repo's committed
|
||
`.gitignore` carries it rather than relying on a global excludesfile.
|
||
- Consider a one-time sweep command that adds it to every existing repo's AGENTS.md.
|
||
- Decide whether the canonical wording lives in `how-i-work.md` (so it's truly universal)
|
||
or stays a per-repo line.
|
||
|
||
## 5. `new-project` — idea → scoped → scaffolded → Gitea repo ✅ BUILT
|
||
|
||
Built and live: `guides/new-project.md` + `adapters/claude/commands/new-project.md`. The
|
||
inverse of `/retrofit` — main-thread and collaborative, it turns a captured `(new)` inbox
|
||
idea into a repo that's standards-compliant from line one. Phases: locate the inbox seed →
|
||
workshop the scope (the high-value, interactive step) → brief + scaffolding-plan sign-off →
|
||
scaffold (`AGENTS.md` + `CLAUDE.md` symlink, `ROADMAP.md`, `README.md`, canonical
|
||
`.gitignore`, `.claude/`, minimal stack skeleton; inbox-check line tagged `(<name>)`) →
|
||
publish (git init + commit, Gitea manual-create gate, remote + push) → close the loop
|
||
(remove the `(new)` item from `INBOX.md`) → portability-checker verify.
|
||
|
||
**Open questions, resolved:** (a) Gitea repo creation is a **manual web-UI gate** (no API
|
||
token — matches retrofit-playbook Part 4); (b) the standards layer is always scaffolded, the
|
||
stack skeleton stays minimal, and the **linter/pre-commit quality gate is deferred to the
|
||
future `/harden`** (item 1) rather than hand-rolled; (c) the workshop **does** seed the first
|
||
`## Current state` with the first milestone.
|
||
|
||
**Remaining option:** once `/harden` (item 1) exists, call it as the scaffold's last step so a
|
||
new repo gets its stack's quality gate installed automatically.
|
||
|
||
## 6. Cross-repo git-hygiene audit + remediation ✅ DONE (2026-06-14)
|
||
|
||
Fanned out one read-only `portability-checker` per git repo under `~/Projects`. **No safety
|
||
issues anywhere:** zero tracked `.env` / `.DS_Store` / `*.local.json`, and every in-repo
|
||
symlink is relative. The gaps were consistency: the inbox-check line was missing in all 7
|
||
non-standards repos, and only `standards` had a complete canonical `.gitignore`.
|
||
|
||
**Fixed — 6 repos, one commit each, pushed** (`CRM`, `premier-gunner`, `recap`,
|
||
`spark-control`, `proof-of-work`; `recap-relay` committed locally — see residuals): added the
|
||
repo-tagged inbox-check line and normalized `.gitignore`.
|
||
|
||
**Standard improved by the audit:** the documented canonical `.claude/` block was
|
||
allow-by-default and would have *un-ignored* `premier-gunner`'s password-bearing
|
||
`.claude/launch.json`. Switched `portability.md` (and the two retrofit summaries) to a
|
||
**deny-by-default `.claude/*` + allow-list** of the shared wiring.
|
||
|
||
**Residual follow-ups:**
|
||
- **`ten31-transcripts` (MAJOR) — needs its own mini-retrofit.** Despite the name it's an
|
||
active Xcode/Swift app with no `.claude/` at all. Scaffold `.claude/settings.json`; decide
|
||
whether to reorganize its flat `docs/NN_*.md` into `docs/guides/` + `.claude/rules/` symlinks.
|
||
Too big for the mechanical pass — **captured to `INBOX.md`** (tagged `(ten31-transcripts)`)
|
||
with step-by-step instructions, for pickup on the next `/triage` in that repo.
|
||
- **`recap-relay` remote** ✅ Gitea `origin` added + pushed in a later session.
|
||
- **`premier-gunner/s9pk/.gitignore`** lacks the secrets/Claude lines (low priority; the root
|
||
`.gitignore` covers `.env` tree-wide already).
|
||
- **Many non-git folders under `~/Projects` are unprotected work** (discount-watcher,
|
||
expense-organizer, giga, heart-rate, licensing, one-river, satoshi-sleep, START9 PACKAGING,
|
||
ten31-agents/-command-center/-signal-engine, timestamp-converter, timestamp-newspaper,
|
||
website-landing, Grand-Cayman-paddleboard). Each needs `git init` + retrofit, or an explicit
|
||
"scratch, don't track" decision.
|
||
- **`start-os`** is an external upstream (Start9Labs/start-os) — out of scope, no action.
|
||
|
||
## 8. Design system — `/design` round-trip + `design-checker` agent ✅ BUILT (2026-06-16)
|
||
|
||
Built and live: `guides/design.md` + `adapters/claude/commands/design.md` (the `/design`
|
||
command) and `guides/design-checker.md` + `adapters/claude/agents/design-checker.md` (the
|
||
`design-checker` subagent). The principle: branding/design for any user-facing repo is a
|
||
**vendor-neutral on-disk contract** — `design/DESIGN.md` (nine-section brand brief) +
|
||
`design/tokens.tokens.json` (W3C DTCG tokens) — that every agent reads before building UI.
|
||
The cloud tool (Anthropic's **Claude Design**, cloud-only/experimental) is an interchangeable
|
||
front-end, never a dependency.
|
||
|
||
- **`/design`** (main-thread, interactive) runs the round-trip: inspiration-first scoping →
|
||
`design/BRIEF.md` (the packet the user hand-carries to Claude Design) → user drives the
|
||
cloud step → distill the export back into the `DESIGN.md` + tokens contract. Distillation is
|
||
agent-mediated because Claude Design exports inline-hardcoded HTML/CSS, not DTCG tokens
|
||
(verified by research, 2026-06-16) — worth a human glance the first few runs.
|
||
- **`design-checker`** (read-only subagent) audits a repo's user-facing code against its *own*
|
||
committed contract — off-palette colors, wrong type scale, magic-number spacing, matched
|
||
"don'ts" — citing code `file:line` vs the contract rule. Reports "run `/design` first" if no
|
||
contract exists; never imposes its own taste.
|
||
- **`/new-project`** now scaffolds `design/` + the AGENTS.md Design line for user-facing
|
||
projects (Phase 4).
|
||
|
||
**Remaining options:** (a) `/retrofit` should backfill `design/` into existing user-facing
|
||
repos (keysat, recap, recaps.cc, premier-gunner, ten31-database, ten31-transcripts) — run
|
||
`/design` then `design-checker` per repo. **Done (recap, 2026-06-17):** recap was the first
|
||
**Case B (Extract)** backfill — document-as-is. It validated the extract→reconcile path
|
||
(previously untested — keysat was Case A/Import); the contract was distilled and the
|
||
conformance cleanup shipped in two phases (recap app 0.2.161). Extract-phase Phase-D learnings
|
||
landed in `guides/design.md` (commit `9031281`); cleanup-execution learnings are now in
|
||
`guides/design-checker.md`. **Remaining backfill repos:** recaps.cc, premier-gunner,
|
||
ten31-database, ten31-transcripts. (b) fold a `design-checker` pass into `/full-eval`
|
||
for repos that have a contract; (c) confirm against a real Claude Design run what the export
|
||
bundle actually contains and tune the Phase C distillation (the export internals are only
|
||
medium-confidence from research) — kept **decoupled** from the recap run so one untested thing
|
||
moves at a time.
|
||
|
||
## 9. `onboarding-tester` — docs-only fresh-adopter agent ✅ BUILT (agent); harness pending (staged Path 1 → Path 2)
|
||
|
||
Built and live: `guides/onboarding-tester.md` + `adapters/claude/agents/onboarding-tester.md`. A
|
||
global adopter agent: given a **goal**, a **docs-corpus allowlist**, a **sandbox**, and
|
||
**credentials/endpoint**, it walks a product's published docs as a literal newcomer — *never
|
||
reading the product's source* (needing to is itself a finding) — and reports every doc gap
|
||
(Blocker/Stumble/Nit + the one-line fix). On a **fully clean run** (zero blockers, zero stumbles)
|
||
it emits a second, publishable "all it took was X, Y, Z" walkthrough for marketing. Generic by
|
||
design; keysat is the first instantiation. The dual output (friction report always; marketing
|
||
walkthrough only on a clean run) makes it both a docs-QA tool and a source of agent-readiness copy.
|
||
|
||
**First instantiation — keysat SDK integration (harness lives in keysat, not here):** goal = gate
|
||
`proof-of-work` behind a keysat license end-to-end under the least-privilege `merchant-onboard`
|
||
scoped key (keysat shipped that role for this, commit `d5885d1`; covers catalog setup + manual
|
||
license issuance). Harness = two disposable containers (`keysat-server` fixture booted fresh +
|
||
`developer-sandbox` holding `proof-of-work`), master-key mints the scoped key, docs corpus =
|
||
keysat.xyz / docs.keysat.xyz / the four SDK package pages.
|
||
|
||
**Staged build:**
|
||
- **Stage 1 — Path 1, no payments (do first, unblocked now):** catalog → tiers → manual license
|
||
issuance → SDK integration → offline verify, all under `merchant-onboard`. Cheapest honest
|
||
walkthrough; validates the agent + core integration docs before adding payment infra. **Pending:**
|
||
build the harness + first live run in a keysat session.
|
||
- **Stage 2 — Path 2, full buyer-pays on regtest (gated):** depends on keysat shipping
|
||
`payment_providers:write` (opt-in scope, never bundled into `merchant-onboard`) + a network gate
|
||
(scoped connect = regtest/testnet/signet only; mainnet master-only; fail-closed) + a daemon-level
|
||
sandbox-mode flag — **greenlit with the keysat dev 2026-06-16** (`plans/agent-payment-connect-scope.md`).
|
||
Then the harness adds a BTCPay regtest stack and the agent connects BTCPay (regtest) **and** drives
|
||
a test buyer payment end-to-end, zero master-key steps. Walkthrough labeled regtest; production
|
||
mainnet-connect stays the operator's one reserved step **by design** (a key that can repoint
|
||
settlement is a fund-redirection key — surface the boundary as a security feature).
|
||
|
||
**Other follow-up (captured to `INBOX.md`):** a separate operator-onboarding agent for the StartOS
|
||
install journey (stand up + run keysat from docs alone), vs. the developer SDK-integration journey
|
||
this agent covers.
|
||
|
||
## 7. Verify & correct the placement guide ✅ DONE (2026-06-15)
|
||
|
||
Walked the "Infrastructure facts" section with the owner and corrected it against the real
|
||
setup (cross-checked against the project repos + a Spark-control topology dump). Fixes: **x86**
|
||
StartOS **0.4.0** box + full service inventory (Gitea, Nextcloud, Home Assistant, CLN + RTL,
|
||
Open WebUI, Vaultwarden, Synapse, and every Claude-built app); the two-Spark **role split**
|
||
(LLM Spark = vLLM; audio/speech Spark = Parakeet / Kokoro / Sortformer + TitaNet + bge-m3 +
|
||
Qdrant, and it hosts matrix-bridge); **don't hardcode a model — query the Spark Control
|
||
gateway** for the live one (daily driver Qwen3.6, hot-swappable); networking reduced to LAN /
|
||
WireGuard / StartTunnel (Proton VPN + Tor were legacy, dropped). UNVERIFIED banner replaced
|
||
with a "verified 2026-06-15" note; decision steps 4 and 6 aligned. Commit `ee5c8bb`.
|
||
|
||
## 10. `adjudicate` — debate low-priority backlog items to a verdict ✅ BUILT (2026-06-17)
|
||
|
||
Built and live: `guides/adjudicate.md` + `adapters/claude/commands/adjudicate.md` (the
|
||
`/adjudicate` command). Solves backlog clutter the owner can't easily judge: low-priority
|
||
(P2/P3) technical/backend items that may be necessary or may be bells-and-whistles, and that
|
||
he shouldn't spend expertise on *because* they're low priority. Run inside a repo, it
|
||
adjudicates that repo's ROADMAP items via a grounded debate and routes each to a verdict the
|
||
owner ratifies instead of researching.
|
||
|
||
- **Pipeline (per item):** investigator (read-only — does the problem exist? already handled?
|
||
what would it touch? + blast-radius classification) → build-advocate ∥ drop-advocate (argue
|
||
from the investigator's findings, not speculation) → judge (rubric = `how-i-work.md` + repo
|
||
`AGENTS.md`; **biased to DROP on ties / low confidence**, since these are already low-priority).
|
||
- **Three verdicts:** **DROP** (the only autonomously-applied call — ratified in one batch, owner
|
||
needn't understand the tech), **DO** (worth it + LOW blast radius → annotated with a ready plan,
|
||
recommend-only, not executed), **ESCALATE** (worth it but HIGH blast radius / low confidence /
|
||
an epic → balanced brief for the owner's call).
|
||
- **Autonomy is gated by blast radius, not priority** — HIGH = touches data/auth/money/external
|
||
surface or changes observable behavior (unclear ⇒ HIGH). It may auto-recommend *dropping* a HIGH
|
||
item but never *doing* one.
|
||
- **ROADMAP-only input.** Nudges the owner to `/triage` first if untriaged inbox items exist for
|
||
the repo, but never reads raw inbox items into the debate (that's `/triage`'s routing job —
|
||
duplicating it invites drift). Two gates: confirm the item set before fan-out (cost control),
|
||
then approve the batch of ROADMAP edits. The ROADMAP diff + commit message is the audit trail
|
||
(no separate report file).
|
||
|
||
**Remaining options:** (a) **v2 — narrow auto-execution** of the safe "DO + LOW blast radius +
|
||
reversible + test-covered" class, once the owner has watched it make calls and trusts the verdicts
|
||
(deliberately deferred — recommend-only first to build trust); (b) a thin `/triage`-then-`/adjudicate`
|
||
combo if the two-command chaining friction proves real (YAGNI for now).
|
||
|
||
## 11. `refactor-scout` — read-only technical-debt surveyor for source code ✅ BUILT (2026-06-19)
|
||
|
||
Built and live: `guides/refactor-scout.md` + `adapters/claude/agents/refactor-scout.md` (the
|
||
`refactor-scout` subagent). The **janitor's source-code sibling**: where the janitor spring-cleans
|
||
docs/artifacts, this surveys *code* debt. It answers "where is the code getting bulky and tangled,
|
||
and what's actually worth touching?" — the spring-cleaning the owner (a non-coder) wanted, scoped so
|
||
the value is captured with none of the risk.
|
||
|
||
**The insight that scoped it:** separate *seeing* debt (read-only, zero risk — almost all the value)
|
||
from *changing* it (where 100% of the risk lives). This agent builds only the seeing half. It surveys
|
||
*existing* code (not a diff — that's `reviewer`; not docs — that's `janitor`; not a whole-repo grade —
|
||
that's `evaluator`), and is deliberately **opinionated and tiered** (≤~12 highest value-to-risk
|
||
findings, never a 200-item dump that paralyzes).
|
||
|
||
- **Targeted, not uniform.** Prioritizes by **churn × complexity** (the proven hotspot heuristic) so
|
||
attention lands where debt actually hurts, then inspects those hotspots for smells (DUPLICATION,
|
||
LONG, COMPLEX, LARGE, DEAD, COUPLING, INCONSISTENT, MAGIC).
|
||
- **Tool-backed where possible.** Runs the repo's *own* analyzers read-only (knip/ts-prune, vulture,
|
||
deadcode, clippy, jscpd…) for high-confidence dead-code/duplication signals, kept separate from
|
||
reasoned judgment; **never installs anything** — a missing analyzer becomes a recommendation.
|
||
- **Behavior preservation is sacred + test-net is first-class.** Every finding carries risk-to-change
|
||
(blast-radius, unclear ⇒ HIGH) and test-net status. A REFACTOR is only ever recommended when a green
|
||
test can prove behavior is unchanged before/after; no coverage ⇒ "write a characterization test
|
||
first," never "just refactor."
|
||
- **The disposition loop (the deliverable's whole point).** Each finding is bucketed into
|
||
**REFACTOR** (worth it + safe + covered, annotated with the specific refactoring) · **DELETE** (dead,
|
||
tool-confirmed) · **DEFER** (worth it, gated on a test net → ROADMAP) · **ACCEPT** (real debt not
|
||
worth the risk — a legitimate choice). Buckets are designed to feed straight into `/triage`. The
|
||
non-coder approves on the *contract* ("tests green before/after, behavior unchanged, small diff,
|
||
reviewer signed off"), not by reading the diff.
|
||
|
||
**Deliberately NOT built — the auto-apply half (deferred + heavily gated).** Actually editing working
|
||
code is where the risk lives; for now the owner acts on findings manually, one at a time, via a normal
|
||
gated agent session (the apply path already exists — hand a finding to `reviewer`-backed work under the
|
||
test gate). A dedicated `/refactor-apply` command is a *future* item, justified only after the survey
|
||
proves its worth, and only behind: existing-or-generated characterization tests, LOW risk-to-change,
|
||
small diff, and human approval.
|
||
|
||
**Remaining options:** (a) fold a `refactor-scout` pass into `/full-eval` for code repos; (b) the gated
|
||
auto-apply command above, once the survey has earned trust; (c) once item 1's `/harden` exists, the
|
||
churn×complexity hotspot list pairs naturally with the per-stack linter baseline.
|
||
|
||
**Untested on a real repo** — the first run (recap or keysat) should confirm it stays tiered and honest
|
||
about test coverage rather than over-recommending refactors. Tune the guide's tiering cap / risk rules
|
||
if it over-produces.
|