From 46298e047f488e6a9d657803847c7ae78b3cf257 Mon Sep 17 00:00:00 2001 From: Keysat Date: Wed, 17 Jun 2026 22:42:32 -0500 Subject: [PATCH] Add /adjudicate command: debate low-priority backlog to a verdict MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Parked P2/P3 technical items accumulate faster than I can judge their necessity. /adjudicate runs a grounded per-item debate (investigator → build/drop advocates → judge) over a repo's ROADMAP and routes each to DROP / DO / ESCALATE, so I ratify decisions instead of researching them. Recommend-only in v1; verdict autonomy is gated by blast radius, not priority. ROADMAP-only input — nudges /triage rather than reading the raw inbox. --- AGENTS.md | 18 +++- ROADMAP.md | 31 +++++++ adapters/claude/commands/adjudicate.md | 20 +++++ guides/adjudicate.md | 119 +++++++++++++++++++++++++ 4 files changed, 184 insertions(+), 4 deletions(-) create mode 100644 adapters/claude/commands/adjudicate.md create mode 100644 guides/adjudicate.md diff --git a/AGENTS.md b/AGENTS.md index cf1ac03..097f7f0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -17,7 +17,8 @@ The global layer lives here and is wired into `~/.claude` by **directory symlink file added under `adapters/` is live immediately — no per-file linking: - `~/.claude/commands` → `adapters/claude/commands/` — global slash commands (`/retrofit`, - `/handoff`, `/full-eval`, `/capture`, `/triage`, `/roundup`, `/new-project`, `/design`). + `/handoff`, `/full-eval`, `/capture`, `/triage`, `/roundup`, `/new-project`, `/design`, + `/adjudicate`). - `~/.claude/agents` → `adapters/claude/agents/` — global subagents (reviewer, evaluator, security-auditor, doc-auditor, exerciser, researcher, janitor, portability-checker, start9-spec-checker, design-checker, onboarding-tester). @@ -88,9 +89,18 @@ should carry this so any vendor's agent surfaces pending items at session start: ## Current state - **Fleet built and live** — commands `/capture /triage /roundup /new-project /handoff /retrofit - /full-eval /design`; subagents incl. `design-checker` + `onboarding-tester` (substance in - `guides/`, thin wrappers in `adapters/claude/`, symlinked into `~/.claude`). Dogfoods its own - standard. Latest `/roundup`: `STATUS.md` 2026-06-16. + /full-eval /design /adjudicate`; subagents incl. `design-checker` + `onboarding-tester` + (substance in `guides/`, thin wrappers in `adapters/claude/`, symlinked into `~/.claude`). + Dogfoods its own standard. Latest `/roundup`: `STATUS.md` 2026-06-16. +- **`/adjudicate` built this session (ROADMAP item 10), live.** Debates each parked P2/P3 backlog + item on a repo's ROADMAP to a verdict so the owner ratifies instead of researching: per item, + investigator (grounds it in the code + classifies blast radius) → build- ∥ drop-advocate → judge + (rubric = `how-i-work.md` + repo `AGENTS.md`, biased to DROP on ties). Verdicts: **DROP** (auto, + ratified in one batch), **DO** (low blast radius → annotated plan, recommend-only), **ESCALATE** + (HIGH blast radius / low confidence → balanced brief for the owner). Autonomy gated by blast + radius, not priority; ROADMAP-only (nudges `/triage` first, never reads raw inbox). **v1 is + recommend-only** — never executes; v2 (narrow auto-execution of the safe DO class) deferred until + trust is built. - **`onboarding-tester` built this session (ROADMAP item 9), live.** Docs-only adopter agent: walks a product's published docs as a literal newcomer (never reading source), reports doc gaps, and on a fully clean run emits a publishable "all it took was X, Y, Z" walkthrough. First target: keysat diff --git a/ROADMAP.md b/ROADMAP.md index fbe949f..19c63ba 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -214,3 +214,34 @@ Qdrant, and it hosts matrix-bridge); **don't hardcode a model — query the Spar gateway** for the live one (daily driver Qwen3.6, hot-swappable); networking reduced to LAN / WireGuard / StartTunnel (Proton VPN + Tor were legacy, dropped). UNVERIFIED banner replaced with a "verified 2026-06-15" note; decision steps 4 and 6 aligned. Commit `ee5c8bb`. + +## 10. `adjudicate` — debate low-priority backlog items to a verdict ✅ BUILT (2026-06-17) + +Built and live: `guides/adjudicate.md` + `adapters/claude/commands/adjudicate.md` (the +`/adjudicate` command). Solves backlog clutter the owner can't easily judge: low-priority +(P2/P3) technical/backend items that may be necessary or may be bells-and-whistles, and that +he shouldn't spend expertise on *because* they're low priority. Run inside a repo, it +adjudicates that repo's ROADMAP items via a grounded debate and routes each to a verdict the +owner ratifies instead of researching. + +- **Pipeline (per item):** investigator (read-only — does the problem exist? already handled? + what would it touch? + blast-radius classification) → build-advocate ∥ drop-advocate (argue + from the investigator's findings, not speculation) → judge (rubric = `how-i-work.md` + repo + `AGENTS.md`; **biased to DROP on ties / low confidence**, since these are already low-priority). +- **Three verdicts:** **DROP** (the only autonomously-applied call — ratified in one batch, owner + needn't understand the tech), **DO** (worth it + LOW blast radius → annotated with a ready plan, + recommend-only, not executed), **ESCALATE** (worth it but HIGH blast radius / low confidence / + an epic → balanced brief for the owner's call). +- **Autonomy is gated by blast radius, not priority** — HIGH = touches data/auth/money/external + surface or changes observable behavior (unclear ⇒ HIGH). It may auto-recommend *dropping* a HIGH + item but never *doing* one. +- **ROADMAP-only input.** Nudges the owner to `/triage` first if untriaged inbox items exist for + the repo, but never reads raw inbox items into the debate (that's `/triage`'s routing job — + duplicating it invites drift). Two gates: confirm the item set before fan-out (cost control), + then approve the batch of ROADMAP edits. The ROADMAP diff + commit message is the audit trail + (no separate report file). + +**Remaining options:** (a) **v2 — narrow auto-execution** of the safe "DO + LOW blast radius + +reversible + test-covered" class, once the owner has watched it make calls and trusts the verdicts +(deliberately deferred — recommend-only first to build trust); (b) a thin `/triage`-then-`/adjudicate` +combo if the two-command chaining friction proves real (YAGNI for now). diff --git a/adapters/claude/commands/adjudicate.md b/adapters/claude/commands/adjudicate.md new file mode 100644 index 0000000..89d44c8 --- /dev/null +++ b/adapters/claude/commands/adjudicate.md @@ -0,0 +1,20 @@ +--- +description: Debate each low-priority (P2/P3) backlog item on this repo's ROADMAP to a DROP/DO/ESCALATE verdict — recommend-only, applied on your approval +argument-hint: [optional scope, e.g. a ROADMAP item number or "P3"] +--- + +Adjudicate the low-priority technical backlog of the repository in the current working +directory. Scope, if any: $ARGUMENTS + +Your complete orchestration guide — phases, the per-item investigate→debate→judge pipeline, +the three verdicts (DROP / DO / ESCALATE), and the report + approval flow — is at: + + ~/Projects/standards/guides/adjudicate.md + +Read it in full first, then follow it exactly. If you cannot read that file, stop and report +precisely that — do not improvise the adjudication. + +Claude Code specifics for Phase 2: per item, launch the investigator first, then the build- and +drop-advocates as a single parallel batch, then the judge; run items concurrently in batches to +keep the fan-out manageable. These are read-only role agents — the only write is the ROADMAP +edit in Phase 4, after the owner approves. diff --git a/guides/adjudicate.md b/guides/adjudicate.md new file mode 100644 index 0000000..506109d --- /dev/null +++ b/guides/adjudicate.md @@ -0,0 +1,119 @@ +# Adjudicate — debate low-priority backlog items to a verdict + +*Substance file per the portability protocol. Vendor wrappers (e.g. +`adapters/claude/commands/adjudicate.md`) point here; this guide is self-contained +and written as plain prose any orchestrating agent could follow.* + +You are running inside one project repo. Low-priority technical/backend items pile up on its +`ROADMAP.md` that the owner can't easily judge the necessity of — and shouldn't have to spend +expertise on, precisely *because* they're low priority. Your job is to run a grounded debate +over each eligible item and reach a verdict, so the owner ratifies decisions instead of +researching them. + +**Recommend-only.** You never execute, build, or ship anything here. Your output is verdicts +and a single batch of ROADMAP edits the owner approves. The most you change is the backlog +itself. + +**Autonomy is gated by blast radius, not priority.** A low-priority item can still be +dangerous (it touches data, auth, money, an external surface, or changes observable app +behavior). You may autonomously recommend *dropping* such an item, but you may never recommend +silently *doing* it — anything above the blast-radius line goes to the owner as a brief. + +## Phase 1 — Orient & select (no fan-out yet) + +1. Read this repo's `ROADMAP.md` and `AGENTS.md` (especially `## Current state`) for context. +2. **Inbox nudge (don't triage).** Do the session-start inbox-check: if + `~/Projects/standards/INBOX.md` has unchecked items tagged for this repo, tell the owner + *"N untriaged inbox items for this repo — run `/triage` first to land them on the ROADMAP, + or proceed with just what's there."* You operate on ROADMAP only; never read raw inbox items + into the debate — that is `/triage`'s routing job, and duplicating its rules invites drift. +3. **Select candidates.** Eligible = parked, low-priority backlog items: P2/P3 where items + carry an explicit priority; otherwise items that read as nice-to-have / deferred. + **Exclude:** P0/P1 or clearly-active items, anything already marked done/built, and + `(new:…)`-style new-repo seeds. If `$ARGUMENTS` names specific items (e.g. a ROADMAP number + or `P3`), scope to those. +4. **Confirm the set before spending agents.** Show the owner the list you intend to adjudicate + (one line each) and let them trim or confirm. A full run is ~4 subagents per item — this gate + controls cost and catches any item that's more important than its placement suggests. + +## Phase 2 — Per item: investigate → debate → judge + +For each confirmed item, run this pipeline (items may run in parallel where your tooling +allows; within an item the stages are sequential): + +1. **Investigator** (read-only). Grounds the debate in reality so it isn't two models + speculating. Reads the actual code and reports: does the problem this item describes actually + exist, or is it already handled? What would the change touch (files, surfaces)? **Classify + blast radius:** LOW (reversible, internal, test-covered, no observable behavior change) or + HIGH (touches data/auth/money/an external surface, or changes observable app behavior). When + unsure, classify HIGH. +2. **Build-advocate** and **Drop-advocate** (in parallel). Each receives the item text and the + investigator's findings and argues one side honestly, citing the findings — not speculation: + - *Build-advocate*: the concrete benefit, the cost or risk of leaving it undone, who or what + it helps. + - *Drop-advocate*: YAGNI, added complexity and maintenance, opportunity cost, whether it's + bells-and-whistles for its own sake. +3. **Judge.** Receives the item, the investigator's findings (incl. blast radius), and both + briefs. Decides against the rubric = `how-i-work.md` + this repo's `AGENTS.md`. **Bias to + DROP on a tie or low confidence** — these items are already low-priority, so death is the + default unless a clear case is made. Emits a structured verdict (next section). + +## The three verdicts + +- **DROP** — not worth doing. The only autonomously-applied call. (Still ratified in one batch + by the owner per Phase 4 — "autonomous" means the owner needn't understand the tech, not that + files change unseen.) +- **DO** — worth doing **and** blast radius LOW. Annotate the ROADMAP item with the decision and + a short ready-to-act plan; surface it for the owner's go-ahead to schedule. You do **not** + execute it (recommend-only). +- **ESCALATE** — worth doing **but** blast radius HIGH, **or** the judge's confidence is low, + **or** the item is really an epic that should be split first. Produce a balanced brief: the + build case, the drop case, the judge's lean, and why it's above the line. This is the owner's + real judgment call — made cheap because they're ratifying reasoning, not generating it. + +## Phase 3 — Report (inline, no file written) + +Show the owner one report. No new tracked artifact — the ROADMAP diff and the commit message +are the durable record (same convention as `/triage`). + +``` +# Adjudication — +Adjudicated N of M eligible items. + +## DROP (ratify to remove) +- — one-line why-not + judge confidence + +## DO (low blast radius — your go-ahead to schedule) +- — one-line why + the ready plan + +## ESCALATE (your call — balanced brief) +- — build case / drop case / judge's lean / why it crosses the line +``` + +## Phase 4 — Approve, apply, commit + +1. **One approval gate.** Wait for the owner to confirm the batch. Never edit `ROADMAP.md` + before they approve — it's a durable file (same rule as `/triage`). +2. **Apply** the approved changes to `ROADMAP.md`: delete DROP items outright (git history is + the record — don't leave tombstones); annotate DO items with the decision + plan; annotate + ESCALATE items with the judge's lean so the brief isn't lost. +3. **Commit.** Present the proposed message and wait for confirmation (one approval covers + commit + push, per `how-i-work.md`). The message records the verdicts and the why for each + drop — that *is* the audit trail. No AI-attribution trailer. +4. **Report** what was dropped, what's queued as DO, and what's waiting on the owner as + ESCALATE. + +## Rules + +- Recommend-only. Never execute, build, or ship — your single write is the ROADMAP edit, after + approval. +- Never auto-recommend *doing* a HIGH-blast-radius item; route it to ESCALATE. When blast radius + is unclear, treat it as HIGH. +- Ground every argument in the investigator's findings. If the investigator can't read the code + or the item is too vague to investigate, say so and ESCALATE it rather than debating in a + vacuum. +- Don't read raw inbox items into the debate — nudge the owner to `/triage` first. ROADMAP is + the only input. +- Preserve the owner's judgment as the gate: propose verdicts, apply only on approval, and + surface anything consequential rather than deciding it. +- If blocked at any point, report exactly what blocked you — never fabricate a verdict.