Add portability protocol and subagents handbook

Document the portability protocol (vendor-neutral knowledge, vendor-named symlinks) and a Claude subagents reference handbook, and link the portability protocol from the retrofit playbook.
2026-06-12 13:05:19 -05:00
parent 786633253f
commit 98755f8507
3 changed files with 415 additions and 1 deletions
@@ -0,0 +1,329 @@
+# Subagents Handbook
+
+**Purpose:** reference for designing, using, and creating Claude Code subagents.
+**Lives at:** `~/Projects/standards/subagents-handbook.md`, sibling of `portability.md`
+and `retrofit-playbook.md`. The working parts follow the portability protocol: neutral
+substance in `guides/<agent>.md` (one operating guide per agent, plus
+`guides/full-eval.md` for the orchestrator); thin Claude wrappers in
+`adapters/claude/agents/` and `adapters/claude/commands/`, reached via the standard
+directory symlinks from `~/.claude/agents` and `~/.claude/commands`.
+
+Verified against the official docs (June 2026): https://code.claude.com/docs/en/sub-agents
+
+---
+
+## 1. What a subagent is
+
+A subagent is a helper Claude that runs in its **own context window**, does a chunk of
+work, and hands back **only its final report**. Your file hierarchy controls what enters
+context at session start; subagents control what stays out of context while work happens.
+
+The trigger is always the same shape: *the task involves lots of reading or noisy output,
+but I only need the conclusion.*
+
+### Mechanics (the facts that matter)
+
+- **Format.** A Markdown file with YAML frontmatter. Frontmatter = config; body = the
+  subagent's entire system prompt. Only `name` and `description` are required. Useful
+  optional fields: `tools` (allowlist), `disallowedTools`, `model` (`haiku`/`sonnet`/
+  `opus`/`inherit`), `effort` (`low`/`medium`/`high`/`xhigh`/`max` — reasoning depth,
+  default high), `maxTurns`, `memory`, `background`.
+- **Wrapper/guide split (this setup).** Per the portability protocol, each definition
+  here is a *thin wrapper*: vendor syntax confined to the frontmatter, and a body that
+  does exactly three things — direct the agent to read its operating guide at
+  `~/Projects/standards/guides/<name>.md` before acting, give a fail-safe ("can't read
+  the guide → stop and say so, don't improvise"), and state a one-line safety floor
+  (read-only, etc.) that holds even guideless. The guide carries all substance —
+  mission, procedure, hard rules, report contract — as plain prose any agent could
+  follow; adopting another provider is a header swap.
+- **Locations & precedence.** `.claude/agents/` (project) beats `~/.claude/agents/`
+  (user, all projects). Both are scanned recursively, so symlinked subfolders work.
+- **Loading.** Subagent files (the wrappers) are read **at session start**. Add or edit
+  a wrapper on disk → restart the session. (Agents made via `/agents` load immediately.)
+  Guides are different: the agent reads its guide **at run time**, so guide edits take
+  effect on the very next run, no restart.
+- **What the subagent sees.** Its own body, the delegation prompt the main agent writes,
+  your full CLAUDE.md/AGENTS.md hierarchy, and a git-status snapshot. It does **not** see
+  your conversation. The delegation prompt is the *only* channel in — paths, symptoms,
+  and scope must be stated there.
+- **What comes back.** The subagent's final message, verbatim. Everything else it read
+  or did dies with its context window.
+- **Built-ins.** `Explore` (Haiku, read-only — fast codebase search), `Plan`, and
+  `general-purpose` already exist. Don't rebuild them.
+- **No nesting.** Subagents cannot spawn subagents. Fan-out happens from the main
+  thread (see §6, the orchestrator).
+- **Foreground vs background.** Foreground blocks and passes permission prompts to you.
+  Background runs concurrently but **auto-denies** anything that would prompt, so agents
+  needing Bash approval (exerciser, security-auditor) should run foreground the first time.
+- **Invocation.** Auto-delegation via the `description` field; by name in prose
+  ("use the reviewer subagent"); or guaranteed via `@agent-<name>`.
+- **Cost.** Every subagent is a fresh model run reading files from scratch. Heavy
+  fan-outs can burn several times the tokens of a single thread. Spend it where the
+  context savings or fresh eyes are worth it.
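+
+As a hedged sketch of the wrapper/guide split described above, a thin wrapper for the
+reviewer might look like the following. The field names come from the Format bullet;
+the description wording, the tool list, and the body text are illustrative assumptions,
+not the kit's actual file.
+
+```markdown
+---
+name: reviewer
+description: Fresh-eyes review of a diff. Use proactively before any commit that changes behavior.
+tools: Read, Grep, Glob, Bash
+model: sonnet
+effort: medium
+---
+You are a read-only code reviewer.
+
+Before acting, read your operating guide at ~/Projects/standards/guides/reviewer.md
+and follow it exactly.
+
+If you cannot read the guide, stop and say so. Do not improvise.
+
+Safety floor: never edit, create, or delete files.
+```
+
+The frontmatter between the two `---` lines is the only vendor syntax. The body holds
+just the guide pointer, the fail-safe, and the safety floor, so the guide can change
+without touching this file.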
+
+---
+
+## 2. The catalog
+
+Four properties a separate context window gives you. Every useful agent exploits at
+least one.
+
+| Property | What it buys you | Agents |
+|---|---|---|
+| **Compression** | Huge input → small conclusion | explorer, test-runner, researcher, docs-reader |
+| **Independence** | Fresh eyes, unanchored by your session | reviewer, fresh-debugger, spec-checker, security-auditor |
+| **Containment** | Messy trial-and-error stays out of your session | exerciser, reproducer, migrator |
+| **Parallelism** | N investigations at once | any of the above, fanned out |
+
+### Roster
+
+| Agent | Property | One-liner | Status |
+|---|---|---|---|
+| explorer | compression | Map a codebase/subsystem, report how it works | **built-in** (`Explore`) |
+| evaluator | independence | Whole-repo assessment: 6 lenses + intent match + alternatives | ✅ in kit |
+| reviewer | independence | Fresh-eyes review of a diff before commit | ✅ in kit |
+| security-auditor | independence | Hostile persona: vulns, secrets, dependency CVEs | ✅ in kit |
+| spec-checker | independence | Compliance vs Start9 community registry requirements | ✅ in kit (`start9-spec-checker`) |
+| exerciser | containment | Black-box QA: run it, feed it normal + hostile inputs | ✅ in kit |
+| researcher | compression | Multi-source web research → cited brief | ✅ in kit |
+| test-runner | compression | Run the suite, return only failures + causes | build when a repo has a real suite |
+| docs-reader | compression | Read a library's docs, return just the calls you need | build on 3rd manual-trawling task |
+| fresh-debugger | independence | Gets symptoms only (never your theories), hunts the bug | build next time you're stuck >1hr |
+| reproducer | containment | Bug report in → minimal repro steps out | build when triaging user reports |
+| migrator | containment | Mechanical change across many files → "done, N anomalies" | build at first big rename/upgrade |
+
+*CVE, since it'll keep coming up:* **Common Vulnerabilities and Exposures** — the public
+catalog of known security flaws, each with an ID like `CVE-2026-12345`. "Dependency
+CVEs" = known holes in the third-party libraries your software pulls in. Scanners
+(`npm audit`, `cargo audit`, `pip-audit`, `osv-scanner`) check your lockfiles against
+the catalog. The security-auditor runs them.
+
+---
+
+## 3. Rules of use
+
+1. **Delegate reading; keep understanding.** The cardinal rule. A subagent returns
+   conclusions — its *reasoning* dies with its context. Never delegate work you'll
+   iterate on (implementation, design) because you'd be editing half-blind. Test: *if
+   the next step needs the understanding, do it in the main window.*
+2. **Demand verifiable outputs.** Trustworthy reports cite things you can spot-check
+   without re-reading the inputs: `file:line`, repro commands, failing test names, URLs.
+   If an agent's output can't be made verifiable, it's a bad subagent task.
+3. **The description field is the API.** Auto-delegation reads only `description`. Write
+   it as trigger conditions for the delegating model, not marketing for you. "Use
+   proactively when…" and "MUST BE USED after…" genuinely change behavior.
+4. **Least-privilege tools.** Evaluative agents get read-only (`Read, Grep, Glob`,
+   maybe `Bash` for git/scanners). Only agents whose *job* is writing get `Write`/`Edit`.
+   This keeps reports honest (an agent that can "just fix it" stops reporting) and makes
+   background runs safe.
+5. **One job per agent.** When a definition grows "and also…", split it. A muddled
+   mission produces a muddled report.
+6. **Cap the report.** Every definition states a line budget. The main context is the
+   scarce resource the whole system exists to protect; an agent that returns 500 lines
+   defeats itself.
+7. **Mandate a Surprises section.** Cheap insurance against tunnel vision — the best
+   findings are often outside the asked question. "None" is an acceptable answer;
+   omitting the section is not.
+8. **Two dials per agent: model × effort.** `model` is who you hire — haiku for
+   mechanical work, sonnet default, opus where judgment is the product. `effort` is
+   how long they're allowed to think before answering — it scales reasoning depth, not
+   capability ceiling. Both live in the wrapper frontmatter; frontmatter effort
+   overrides the session level while that agent runs. Match the dial to where the cost
+   actually lives: evaluator and security-auditor are **opus/high** because the report
+   *is* the reasoning; reviewer, researcher, exerciser, and spec-checker are
+   **sonnet/medium** because their cost is dominated by frequency, tool-call turns, or
+   procedure-following — not thinking depth. The deep insight: a well-proceduralized
+   guide is *pre-computed reasoning*, so every hour spent tightening a guide lowers
+   the model+effort that task needs forever. `low` is for truly mechanical agents
+   (migrator, test-runner) once their guides are trusted; `xhigh` is the upgrade if an
+   opus agent's reports feel shallow on hard targets. Global overrides, strongest
+   first: `CLAUDE_CODE_EFFORT_LEVEL` and `CLAUDE_CODE_SUBAGENT_MODEL` env vars →
+   wrapper frontmatter → session setting — handy for "run everything cheap today."
+9. **Fresh eyes need fresh inputs.** When delegating to reviewer/debugger types, give
+   symptoms and scope — *never your theories*. Anchoring them throws away the one thing
+   the separate context bought you.
+10. **Parallelize the independent, serialize the dependent.** "Audit security, exercise
+    the app, and check spec compliance *in parallel*" is one sentence to the main agent.
+    But chained work (find issues → fix them) must flow back through the main thread.
+11. **Iterate the guide, not the chat.** A disappointing report means the guide is
+    wrong. Fix `guides/<name>.md` (it's version-controlled) and rerun — guides load at
+    run time, so no restart. Touch the wrapper only for tools/model/description
+    changes, and restart the session when you do. Correcting the agent
+    conversationally fixes nothing for next time.
+12. **Know when *not* to delegate.** Quick targeted changes; tasks needing your taste
+    mid-stream; reads under ~5 files (overhead exceeds benefit); anything where latency
+    matters more than context hygiene.
+
+---
+
+## 4. Shaping the report
+
+The summary is the interface — "report back" gets you mush. Every *guide* in this kit
+embeds a variant of the universal contract below; the agent reads its guide at run time,
+so the contract reaches it without bloating the wrapper. The universal form here is
+your template for designing new guides.
+
+### Universal report contract
+
+```
+## Verdict        — the answer, 1–3 sentences, no throat-clearing
+## Findings       — each: [severity] one-line title → evidence → so-what
+## Coverage       — what was examined and what wasn't (makes silence meaningful)
+## Surprises      — anything unexpected, on-mission or not ("none" allowed)
+## Next actions   — ranked, concrete, imperative mood
+## Confidence     — high/medium/low + the one thing that would raise it
+```
+
+### Severity scale (shared by all evaluative agents)
+
+- **P0** exploitable, data loss, or crashes in normal use — fix before anything else
+- **P1** wrong behavior or weak security on realistic paths — fix before release
+- **P2** works, but fragile / hard to maintain / poor practice
+- **P3** cosmetic, style, minor docs
+- **P4** informational / matter of taste
+
+### Evidence by agent type
+
+Per-agent evidence requirements: code findings cite `file:line`; bugs include exact
+repro commands plus expected-vs-actual; research claims carry a URL and are labeled
+**Fact** (sourced) or **Inference** (the agent's own conclusion); compliance findings
+cite the requirement's doc URL. A finding without its evidence type gets deleted, not
+softened.
+
+### Length budgets
+
+Reviewer ≤ 70 lines · exerciser & spec-checker ≤ 80 · researcher & security-auditor
+≤ 100 · evaluator ≤ 120. Tighten freely; loosen only with cause.
+
+---
+
+## 5. Creating a new agent
+
+Three routes, in order of preference:
+
+**Route 1 — copy a neighbor.** The fastest path to a *good* agent is copying the
+closest existing pair: its guide in `guides/` and its wrapper in
+`adapters/claude/agents/`. The contract and structure carry over; only the mission
+changes. Scope rule applies: project-specific agents put the guide in
+`<repo>/docs/guides/` and the wrapper in `<repo>/.claude/agents/` instead.
+
+**Route 2 — `/agents` → "Generate with Claude".** Interactive, guided, loads without a
+session restart. Good for quick experiments; once it earns permanence, split it —
+substance to a guide, vendor header to a thin wrapper — at the right scope.
+
+**Route 3 — the meta-prompt.** Paste this into any Claude session, filled in:
+
+```
+Create a Claude Code subagent definition file (YAML frontmatter + system-prompt body).
+
+Role:        [one sentence — what it does]
+Trigger:     [when the main agent should delegate; exact phrases that should fire it]
+Inputs:      [what the delegation prompt will contain — paths, scope, symptoms]
+Tools:       [least-privilege list; justify any Write/Edit/Bash]
+Must never:  [hard limits — e.g. never edits source, never returns raw logs]
+Report:      [sections, severity scale, evidence type, line cap]
+Model:       [haiku|sonnet|opus|inherit — and why]
+
+Produce TWO files per my portability protocol:
+1. The GUIDE — vendor-neutral, second person, self-contained: mission, procedure,
+   hard rules, then the report template verbatim. End with: "If blocked, report
+   exactly what blocked you — never guess or fabricate findings."
+   → guides/<name>.md (standards repo if global; <repo>/docs/guides/ if project).
+2. The WRAPPER — vendor syntax confined to the header: YAML frontmatter with name,
+   a description written FOR THE DELEGATING MODEL (explicit trigger conditions,
+   "use proactively when…" phrasing where wanted), least-privilege tools, model.
+   Body only: role one-liner, directive to read the guide first, stop-and-say-so
+   fail-safe if the guide is unreadable, one-line safety floor.
+   → adapters/claude/agents/<name>.md (global) or <repo>/.claude/agents/ (project).
+Then show one example invocation and the first 10 lines of an ideal report.
+```
+
+**Checklist** (every guide+wrapper pair, before it ships):
+Guide — self-contained · embedded report template with line cap · Surprises section ·
+blocked-means-say-so clause. Wrapper — trigger-rich description · least-privilege
+`tools` · explicit `model` · guide pointer with fail-safe · safety floor. Then: save
+both at the right scope, restart the session (new wrapper), run once on a real task,
+tighten the guide where it disappointed.
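+
+A guide that passes the checklist could be laid out roughly as below. This is a
+skeleton under assumptions: the procedure steps and rule wording are hypothetical, and
+only the section list, the severity scale, and the 70-line reviewer budget come from
+§4.
+
+```markdown
+# Reviewer guide
+
+## Mission
+You review one diff with fresh eyes and report what would hurt if it shipped.
+
+## Procedure
+1. Read the delegation prompt for scope and paths.
+2. Read the diff, then the surrounding code it touches.
+3. Record each finding with its `file:line` evidence.
+
+## Hard rules
+- Never edit files.
+- A finding without `file:line` evidence is deleted, not softened.
+
+## Report (70 lines maximum)
+Use these sections, in this order: Verdict, Findings, Coverage, Surprises,
+Next actions, Confidence. Severity is P0 to P4. "None" is allowed under Surprises.
+
+If blocked, report exactly what blocked you. Never guess or fabricate findings.
+```
+
+Nothing in the skeleton names a vendor, which is what lets the same guide sit behind a
+different provider's wrapper.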
+
+---
+
+## 6. The orchestrator question
+
+You cannot build an orchestrator *as a subagent* — subagents can't spawn subagents.
+Three real options:
+
+1. **Slash command (this kit's choice): `/full-eval`.** A command file is just a stored
+   prompt for the *main thread* — and the main thread *can* fan out. `/full-eval` makes
+   it launch evaluator + security-auditor + exerciser (+ start9-spec-checker when
+   relevant) in parallel, then synthesize one deduplicated, prioritized report. Zero new
+   machinery, works in any session.
+2. **`claude --agent <name>` (advanced).** A session launched this way takes the agent's
+   system prompt as its main thread — and a main-thread agent *can* spawn subagents,
+   even restricted to a roster via `tools: Agent(evaluator, security-auditor, ...)`.
+   Build this only if `/full-eval` starts feeling too manual.
+3. **Agent teams (experimental).** Multiple communicating sessions. Ignore until a task
+   genuinely exceeds one session's context even with subagents.
+
+Synthesis is the orchestrator's real job, and it stays in the main thread on purpose:
+cross-referencing ("the crash the exerciser found is the injection the auditor
+flagged") is exactly the understanding you don't delegate.
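+
+For option 1, a command file in `adapters/claude/commands/full-eval.md` might be as
+short as the sketch below. It assumes command files accept a `description` frontmatter
+key and that the substance lives in `guides/full-eval.md`; the body wording is
+illustrative, not the kit's actual prompt.
+
+```markdown
+---
+description: Run the full evaluation roster in parallel and synthesize one report
+---
+Read ~/Projects/standards/guides/full-eval.md and follow it.
+
+If you cannot read the guide, stop and say so. Do not improvise.
+
+In short: launch the evaluator, security-auditor, and exerciser subagents in parallel
+(add start9-spec-checker when the repo targets the Start9 registry), then synthesize
+their reports yourself into one deduplicated, prioritized report.
+```
+
+Because the prompt runs in the main thread, the synthesis step stays there too, in
+line with the reasoning above.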
+
+---
+
+## 7. Workflows
+
+### A. The retrospective (the dozen existing repos)
+
+Per repo, in order:
+
+1. **Retrofit first** (per `retrofit-playbook.md`): AGENTS.md with Current state, the
+   CLAUDE.md symlink. Subagents load this file — the evaluator literally reads it to
+   judge the build against your stated intent. Eval before retrofit = grading against
+   a blank rubric.
+2. **`/full-eval`** in a fresh session. Walk away; read one synthesized report.
+3. **Triage** into the repo's AGENTS.md Current state: P0/P1 become the work queue,
+   P2 becomes a "known debt" list, P3 and P4 get deleted or done in bulk.
+4. **Fix in the main session** (understanding work — yours), running **reviewer** on
+   each meaningful diff before commit.
+5. **Close out** per the ritual; move to the next repo. Don't batch — one repo's
+   full cycle beats six repos' half-cycles.
+
+### B. Steady state (everything built from now on)
+
+- **While building:** built-in Explore for "how does X work" questions; researcher
+  before adopting any dependency; fresh-debugger discipline when stuck >1hr (symptoms
+  only, no theories).
+- **Every meaningful diff:** reviewer, before commit. This is the habit that compounds.
+- **Before any release:** exerciser + security-auditor in parallel.
+- **Before registry submission:** start9-spec-checker.
+- **Quarterly-ish:** `/full-eval` on actively maintained repos; drift accumulates
+  silently.
+
+---
+
+## 8. Install & maintenance
+
+One-time install — the portability protocol's standard symlinks (if `~/.claude/agents`
+or `~/.claude/commands` already exist as real directories, move their contents into the
+adapters folders first):
+
+```bash
+ln -s ~/Projects/standards/adapters/claude/agents   ~/.claude/agents
+ln -s ~/Projects/standards/adapters/claude/commands ~/.claude/commands
+```
+
+Then restart any open Claude Code session (wrappers load at session start). First run:
+`cd` into a repo, `claude`, then `/full-eval` — or invoke singles anytime
+(`@agent-reviewer look at my uncommitted changes`).
+
+- **Two-file edit model.** Substance → `guides/<name>.md`, live on the agent's next
+  run, no restart. Machinery (tools, model, description) → the wrapper, restart
+  required. When in doubt, you're editing the guide.
+- Edits land via the normal close-out ritual → commit → push → backed up on the Start9
+  with the rest of the standards repo. The directory symlinks mean `git pull` updates
+  every machine-wide agent; there is no reinstall step.
+- Sessions that can't see `~/Projects` (cloud, containers) simply have no global
+  agents; a wrapper that loads without its guide stops and says so rather than
+  improvising — that's the fail-safe working as designed.
+- Prune ruthlessly. An agent unused for months is context-rot in the `/agents` list;
+  delete the pair — git remembers.