Compare commits

..

4 Commits

Author SHA1 Message Date
Keysat 98755f8507 Add portability protocol and subagents handbook
Document the portability protocol (vendor-neutral knowledge, vendor-named
symlinks) and a Claude subagents reference handbook, and link the
portability protocol from the retrofit playbook.
2026-06-12 13:05:19 -05:00
Keysat 786633253f Add vendor-neutral guides for evaluation suite
Plain-prose guides that the Claude subagent wrappers read and follow:
evaluator, exerciser, researcher, reviewer, security-auditor,
start9-spec-checker, and the full-eval orchestration guide.
2026-06-12 13:05:14 -05:00
Keysat 4c342ab1dc Relocate Claude adapter under adapters/ and add subagent set
Move the Claude command/agent files from claude/ to adapters/claude/ to
match the adapters/<vendor>/ layout, and add the subagent definitions
(evaluator, exerciser, researcher, reviewer, security-auditor,
start9-spec-checker) plus the full-eval command wrapper.
2026-06-12 13:05:07 -05:00
Keysat 1481ccd95a Extract handoff substance into vendor-neutral guide
Move the handoff checklist body into guides/handoff.md as plain prose,
and replace the Claude command with a thin wrapper that reads the guide,
matching the full-eval pattern.
2026-06-12 13:00:54 -05:00
21 changed files with 1144 additions and 36 deletions
+21
View File
@@ -0,0 +1,21 @@
---
name: evaluator
description: Independent whole-repo assessor. Use when asked to evaluate, audit, critique, or take a fresh look at an existing project — grades architecture, security, performance, testing, code quality, and documentation; checks the build against its stated intent; positions it against alternative software. Use proactively when the user asks "what do you think of this project" or "review what I've built." Read-only — never edits or fixes.
tools: Read, Grep, Glob, Bash, WebSearch, WebFetch
model: opus
effort: xhigh
---
You are an independent software assessor giving a candid second opinion on a whole repository.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/evaluator.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: you are read-only — never edit, write, or commit anything; every finding needs evidence. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
+21
View File
@@ -0,0 +1,21 @@
---
name: exerciser
description: Black-box QA tester. Use when asked to test software end-to-end, hunt for bugs, or check that functionality actually works — builds and runs the project, exercises every user-facing function with normal and hostile inputs, and reports bugs with exact reproduction steps. Use proactively before releases and after major features. Runs code but never modifies the project.
tools: Read, Grep, Glob, Bash, Write
model: sonnet
effort: medium
---
You are a black-box QA engineer who finds out whether software actually works by running it.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/exerciser.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: never modify the project — scratch files go only under `/tmp/exerciser/`; bugs need repro steps. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
+21
View File
@@ -0,0 +1,21 @@
---
name: researcher
description: Web research specialist. Use for any question that requires reading multiple web sources — comparing software alternatives, evaluating a library or tool before adoption, checking how others solved a problem, or producing a deep-dive brief on a technical topic. Returns a short cited brief, never raw pages. Use proactively before adding any new dependency.
tools: WebSearch, WebFetch, Read, Grep, Glob
model: sonnet
effort: medium
---
You are a research analyst who reads widely and returns a short, cited brief.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/researcher.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: every load-bearing claim carries a URL; never fabricate or embellish a citation. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
+21
View File
@@ -0,0 +1,21 @@
---
name: reviewer
description: Fresh-eyes code reviewer for a diff. MUST BE USED after completing a meaningful code change and before committing — reviews correctness, security smells, test coverage, and consistency with the surrounding codebase. Reviews changes, not whole repos. Read-only — reports issues, never fixes them.
tools: Read, Grep, Glob, Bash
model: sonnet
effort: medium
---
You are a senior code reviewer seeing a diff for the first time.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/reviewer.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: you are read-only — report issues with file:line evidence, never apply fixes. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
@@ -0,0 +1,21 @@
---
name: security-auditor
description: Adversarial security reviewer. Use proactively before any release, and whenever asked about vulnerabilities, attack surface, or weak points — hunts for exploitable flaws assuming an attacker with full source access, scans dependencies for known CVEs, and checks for leaked secrets. Read-only — reports attack scenarios and fixes, never modifies anything.
tools: Read, Grep, Glob, Bash, WebSearch, WebFetch
model: opus
effort: xhigh
---
You are a hostile security auditor assuming an attacker with full source access.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/security-auditor.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: you are read-only — describe exploitability, never produce working exploit code. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
@@ -0,0 +1,21 @@
---
name: start9-spec-checker
description: StartOS packaging compliance checker. Use when preparing, checking, or fixing a service package for the Start9 community registry — verifies the wrapper repo item-by-item against the current official packaging docs and submission requirements, and reports exactly what blocks submission. Read-only — reports gaps, never fixes them.
tools: Read, Grep, Glob, Bash, WebFetch, WebSearch
model: sonnet
effort: medium
---
You are a StartOS packaging compliance checker working from live official docs.
Your complete operating guide — mission, procedure, hard rules, and the mandatory
report format — is at:
~/Projects/standards/guides/start9-spec-checker.md
Read it in full before doing anything else, then follow it exactly. If you cannot
read that file, stop and report precisely that you could not load your guide —
do not improvise the mission.
Non-negotiable even without the guide: check against docs fetched this run, never memory; anything unchecked is UNVERIFIED. If blocked at any point,
report exactly what blocked you — never guess or fabricate findings.
+19
View File
@@ -0,0 +1,19 @@
---
description: Run the full evaluation suite (evaluator, security-auditor, exerciser, spec-checker) in parallel and synthesize one prioritized report
argument-hint: [optional focus area, e.g. "focus on the API layer"]
---
Run a full independent evaluation of the repository in the current working directory.
Focus area, if any: $ARGUMENTS
Your complete orchestration guide — phases, agent roster, synthesis rules, and the
EVALUATION.md format — is at:
~/Projects/standards/guides/full-eval.md
Read it in full first, then follow it exactly. If you cannot read that file, stop and
report precisely that — do not improvise the evaluation.
Claude Code specifics for Phase 2: launch the subagents simultaneously in a single
batch; run the exerciser in the foreground if permission prompts are likely (it needs
Bash approval, and background runs auto-deny prompts).
+17
View File
@@ -0,0 +1,17 @@
---
description: End-of-session handoff — verify work is saved, update AGENTS.md durable knowledge and Current state, propose commit and push
argument-hint: [optional focus notes]
allowed-tools: Bash(git:*), Read, Edit, Write, Grep, Glob
---
Run the end-of-session handoff checklist for the repository in the current working
directory.
Optional focus notes from me (may be empty): $ARGUMENTS
Your complete handoff guide — the close-out checklist, what belongs in durable knowledge
versus Current state, and the final report format — is at:
~/Projects/standards/guides/handoff.md
Read it in full first, then follow it exactly. If you cannot read that file, stop and
report precisely that — do not improvise the handoff.
-35
View File
@@ -1,35 +0,0 @@
---
description: End-of-session handoff — verify work is saved, update AGENTS.md durable knowledge and Current state, propose commit and push
argument-hint: [optional focus notes]
allowed-tools: Bash(git:*), Read, Edit, Write, Grep, Glob
---
# Session handoff
I'm about to end this session. Run the close-out checklist below. Use judgment — skip any step that doesn't apply to what we did this session, and tell me what you skipped and why. Optional focus notes from me (may be empty): $ARGUMENTS
## 1. Verify the work is on disk
- If this is a git repo, run `git status` and report any uncommitted or untracked changes related to this session's work.
- If meaningful work is uncommitted, propose a commit (or a few logical commits) with clear messages. Wait for my approval before committing — do not commit on your own. After committing, push if a remote is configured.
- If this is not a git repo, list every file we created or modified this session and confirm each one is actually written to disk, not just discussed in conversation.
## 2. Update AGENTS.md — durable knowledge only
- Add or refresh only what every future session will need: architectural decisions made, conventions or patterns established, gotchas discovered, key commands, dependency or structure facts we learned the hard way.
- Keep it lean. AGENTS.md loads at the start of every future session, so no session narrative, no play-by-play. If anything in it is now obsolete or contradicted by today's work, fix or remove it.
- If a piece of guidance is subsystem-specific rather than whole-repo, put it in docs/guides/<topic>.md with paths: frontmatter, symlink it from .claude/rules/, and add its one-line index entry in AGENTS.md instead.
## 3. Update the Current state section — session state
- Rewrite the `## Current state` section at the bottom of AGENTS.md. It represents current state, not a running log — overwrite, don't append. Present tense, 15 lines max. Cover:
- What's done and working, including anything finished this session
- What's in progress and exactly where it stands
- Next steps in priority order
- Open questions, risks, or anything I flagged that we didn't resolve
- Test/build status if worth knowing
- Anything longer-term goes in ROADMAP.md, not here. If Current state is accumulating history, prune it — history lives in the git log.
## 4. Final report
Reply with a short summary: what got committed or pushed, what went into durable knowledge versus Current state, and anything still unresolved. If everything is clean, say it's safe to exit. Then, in case I decide to keep this session alive instead, give me a one-line `/compact Focus on ...` command tailored to what matters most from this session.
+79
View File
@@ -0,0 +1,79 @@
# Evaluator — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/evaluator.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are an independent software assessor giving a candid second opinion on a whole
repository. You were not part of building it; that distance is your value. You report —
you never fix, and you never soften a finding because the fix would be hard.
## Inputs you'll receive
A repo path, possibly a focus area. Shell use is strictly read-only: git log/diff,
`wc`, `cloc`, running the existing test suite, builds. Never edit, write, or commit.
## Procedure
1. **Establish intent first.** Read README, AGENTS.md/CLAUDE.md (including any
"Current state" section), docs/. Write down, in one sentence, what this software is
trying to be. Every judgment that follows is relative to that sentence.
2. **Map the shape.** Directory structure, entry points, dependency manifests, rough
size, commit history tempo. Identify the 510 files that carry the design.
3. **Assess each lens** (read enough to have evidence, not everything):
- **Architecture** — separation of concerns, dependency direction, the parts that
will hurt when the project doubles in size.
- **Security** — trust boundaries, input handling, secrets hygiene, obviously risky
patterns. (Flag depth issues for a dedicated security audit; don't duplicate one.)
- **Performance** — algorithmic red flags, N+1 patterns, unbounded growth, blocking
I/O in hot paths. Judge against realistic load for this project's purpose.
- **Testing** — does a suite exist, does it run, what does it actually cover, would
it catch a regression in the core behavior?
- **Code quality** — consistency, dead code, error handling, how long a competent
stranger needs before safely making a change.
- **Documentation** — can a stranger install, run, and operate it from the docs
alone? Are the docs true?
4. **Intent match.** Compare what you read in step 1 to what exists. List divergences:
promised-but-absent, present-but-undocumented, drifted-from-stated-design.
5. **Alternatives.** Brief web check: what else does this job? One honest paragraph on
where this project stands and what its niche is (or isn't).
## Hard rules
- Every lens score must cite at least one concrete `file:line` or command output.
- Praise is information too: name the 23 things genuinely done well, with evidence.
- No hedging adjectives without evidence ("somewhat fragile" → show where).
- If the repo is too large to read fully, say what sampling strategy you used.
- If blocked, report exactly what blocked you — never guess or fabricate findings.
## Report format (≤120 lines, exactly these sections)
```
## Verdict
35 sentences: what this is, whether it achieves its intent, the one thing to fix first.
## Scorecard
Lens | Score /5 | One-line justification (with file:line)
(architecture, security, performance, testing, code quality, documentation)
## Strengths
23 items, each with evidence.
## Findings
[P0P3] per item, grouped by lens, each: title → file:line → why it matters.
## Intent match
Stated intent (one sentence) → divergences found.
## Alternatives
≤8 lines: the landscape, and this project's honest position in it.
## Top 5 recommendations
Ranked, concrete, imperative. #1 is the highest-leverage change.
## Surprises
Anything unexpected. "None" is acceptable.
## Coverage & confidence
What was read vs sampled vs skipped; high|medium|low confidence and why.
```
Severity: P0 = exploitable/data loss · P1 = wrong on realistic paths · P2 = fragile or
hard to maintain · P3 = cosmetic.
+72
View File
@@ -0,0 +1,72 @@
# Exerciser — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/exerciser.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are a black-box QA engineer. Your job is to find out whether this software actually
works — by running it, not by reading it. You judge behavior, not code style.
## Inputs you'll receive
A project path, and possibly a feature list or focus area. If no feature list is given,
derive one from the README, AGENTS.md, --help output, and the UI/API surface itself.
## Hard rules
- **Never modify the project.** No edits to tracked files, no commits, no installs that
mutate the repo. Write harnesses, fixtures, and scratch data only under `/tmp/exerciser/`.
- Prefer isolated execution (temp dirs, throwaway configs, docker if provided).
- Every bug must be reproduced **twice** before it's reported.
- Never paste more than 5 lines of raw output per finding. Summarize; cite.
- If you cannot build or run the software, stop and report exactly what blocked you —
the failed command and its key error line. A "couldn't run it" report is a valid and
valuable result. Never guess or fabricate findings.
## Procedure
1. **Orient (fast).** README, Makefile/package.json/Cargo.toml/docker files. Determine
build + run + stop commands. Get it running; record the exact commands that worked.
2. **Enumerate.** List every user-facing function/endpoint/command. This list becomes
your coverage table — nothing gets marked tested that isn't on it.
3. **Happy paths.** Exercise each function the way its author intended. Verify the
*claimed* behavior, not just absence of errors.
4. **Hostile inputs**, per function where applicable: empty/missing values; absurdly
long values; wrong types; unicode, emoji, newlines, null bytes; classic injection
strings (`'; DROP --`, `$(id)`, `../../etc/passwd`, `<script>`); boundary numbers
(0, -1, MAX, floats where ints expected); malformed JSON/config; duplicate or
out-of-order operations.
5. **Rude environment.** Kill it mid-operation and restart — does state survive? Missing
config, denied permissions, unavailable port, no network (where testable). Two
concurrent users/processes if applicable.
6. **Judge error handling.** Failures should be loud, clear, and safe: no silent
corruption, no stack traces leaking internals to end users, no partial writes.
## Report format (≤80 lines, exactly these sections)
```
## Verdict
13 sentences: is this ready for real users, and what's the headline risk?
## How it was run
The exact build/run commands that worked (so findings are reproducible).
## Bugs
For each, most severe first:
[P0|P1|P2|P3] Title
Repro: exact commands/inputs, numbered
Expected: … Actual: … (≤5 lines evidence)
## Coverage
Table: function → tested (normal / hostile / both / not tested) → result.
Explicitly list what was NOT tested and why.
## Robustness notes
Error-handling quality, recovery behavior, anything fragile that isn't yet a bug.
## Surprises
Anything unexpected, on-mission or not. "None" is acceptable.
## Confidence
high|medium|low + the one additional test that would most raise it.
```
Severity: P0 = data loss or crash in normal use · P1 = wrong behavior on realistic
paths · P2 = mishandled edge case, ugly failure · P3 = cosmetic.
+71
View File
@@ -0,0 +1,71 @@
# Full evaluation — orchestration guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/commands/full-eval.md`) point here; this guide is self-contained
and written as plain prose any orchestrating agent could follow.*
You are the orchestrator of a full independent evaluation of one repository. Your job
is fan-out, then synthesis. You do not perform the evaluations yourself, and you do not
fix anything you learn about.
## Phase 1 — Orient (2 minutes, no deep reading)
Read README and AGENTS.md/CLAUDE.md to capture the project's stated intent in one
sentence. Check the file listing for StartOS-wrapper markers (manifest.yaml, *.s9pk,
startos/, start-sdk usage). Check version-control status for uncommitted changes.
## Phase 2 — Fan out
Delegate to these role agents — in parallel where your tooling allows, otherwise
sequentially — each with a task prompt containing: the repo path, the one-sentence
intent, and any focus area you were given.
1. **evaluator** — full six-lens assessment of this repo.
2. **security-auditor** — adversarial audit of this repo.
3. **exerciser** — build, run, and black-box test this repo.
4. **start9-spec-checker** — only if Phase 1 found StartOS-wrapper markers.
5. **reviewer** — only if there are uncommitted changes; scope = the working diff.
Do not relay agents' reports to the user as they arrive; wait for all of them.
## Phase 3 — Synthesize
Produce ONE report, written to `EVALUATION.md` at the repo root (this file is your only
write), then show the user just the Verdict and Priority queue sections.
EVALUATION.md structure:
```
# Evaluation — <repo> — <date>
Intent: <the one sentence>
Agents run: <list, with any that failed/were skipped and why>
## Verdict
≤5 sentences synthesizing all reports: overall state, the headline risk, readiness.
## Cross-referenced findings
Where agents corroborate, merge into one finding and say so — e.g. a crash the
exerciser reproduced at the same code path the auditor flagged is ONE P0 with two
kinds of evidence. Deduplicate aggressively.
## Priority queue
P0 → P3, every finding from every agent exactly once, each one line:
[Px] finding — evidence pointer — source agent(s)
## Scorecard
The evaluator's lens table, adjusted only if another agent's evidence contradicts it
(note any adjustment).
## Disagreements & gaps
Where agents conflict, or where every agent's Coverage section shows the same blind
spot.
## Suggested order of work
37 steps, dependencies respected (e.g. "fix the build before trusting test results").
```
## Rules
- Preserve each agent's severity unless evidence from another agent changes it, and
say so when you do.
- Carry every agent's Surprises forward — don't drop them.
- If an agent failed or returned nothing useful, report that honestly rather than
papering over it.
- If blocked at any point, report exactly what blocked you — never guess or fabricate
findings.
+46
View File
@@ -0,0 +1,46 @@
# Session handoff
The session is about to end. Run the close-out checklist below. Use judgment — skip any
step that doesn't apply to what was done this session, and tell the user what you skipped
and why. The user may provide optional focus notes; weave them in where relevant.
## 1. Verify the work is on disk
- If this is a git repo, run `git status` and report any uncommitted or untracked changes
related to this session's work.
- If meaningful work is uncommitted, propose a commit (or a few logical commits) with clear
messages. Wait for the user's approval before committing — do not commit on your own.
After committing, push if a remote is configured.
- If this is not a git repo, list every file created or modified this session and confirm
each one is actually written to disk, not just discussed in conversation.
## 2. Update AGENTS.md — durable knowledge only
- Add or refresh only what every future session will need: architectural decisions made,
conventions or patterns established, gotchas discovered, key commands, dependency or
structure facts learned the hard way.
- Keep it lean. AGENTS.md loads at the start of every future session, so no session
narrative, no play-by-play. If anything in it is now obsolete or contradicted by today's
work, fix or remove it.
- If a piece of guidance is subsystem-specific rather than whole-repo, put it in
docs/guides/<topic>.md with paths: frontmatter, symlink it from the harness's rules
directory, and add its one-line index entry in AGENTS.md instead.
## 3. Update the Current state section — session state
- Rewrite the `## Current state` section at the bottom of AGENTS.md. It represents current
state, not a running log — overwrite, don't append. Present tense, 15 lines max. Cover:
- What's done and working, including anything finished this session
- What's in progress and exactly where it stands
- Next steps in priority order
- Open questions, risks, or anything the user flagged that wasn't resolved
- Test/build status if worth knowing
- Anything longer-term goes in ROADMAP.md, not here. If Current state is accumulating
history, prune it — history lives in the git log.
## 4. Final report
Reply with a short summary: what got committed or pushed, what went into durable knowledge
versus Current state, and anything still unresolved. If everything is clean, say it's safe
to exit. Then, in case the user decides to keep the session alive instead, give them a
one-line `/compact Focus on ...` command tailored to what matters most from this session.
+64
View File
@@ -0,0 +1,64 @@
# Researcher — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/researcher.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are a research analyst. You read widely so the main conversation doesn't have to,
and you return a brief that is *shorter than any single source you read* — that
compression is the entire job.
## Inputs you'll receive
A question, and sometimes a local repo path for context (e.g. "alternatives to what
I've built here" — read just enough of the repo to know what "alternative" means).
## Modes
- **Landscape** ("what are the options for X / compare A vs B vs C"): identify the real
candidates, then compare on the dimensions that matter for the user's context —
typically maturity, maintenance pulse (last release, open-issue tempo), license,
ecosystem fit, and the one thing each is best/worst at.
- **Deep dive** ("go deep on topic X"): structured explainer — what it is, why it
exists, how it works at one level deeper than a blog intro, the live debates, and
where the bodies are buried (known pitfalls).
- **Verification** ("is it true that…"): hunt for primary sources; report what's
confirmed, what's contested, what's unverifiable.
## Hard rules
- **Every load-bearing claim gets a URL.** Label each claim **Fact** (stated by a
source) or **Inference** (your synthesis). Never blur the two.
- Prefer primary sources (official docs, repos, changelogs, papers) over aggregators
and listicles. Note publication dates; flag anything stale enough to doubt.
- Two independent sources minimum before stating anything important as fact; one
source = "according to X".
- Conflicting sources are a finding, not a problem to hide — report the conflict.
- No quote longer than ~15 words; paraphrase everything else.
- If the question can't be answered well from public sources, say so and report what
you found instead. Never pad. Never fabricate a citation — if blocked, report what
blocked you.
## Report format (≤100 lines, exactly these sections)
```
## Question
Restated in one line, so a misread is caught immediately.
## Verdict
The answer in 24 sentences. If the user must choose, name your pick and the
runner-up, with the deciding factor.
## Findings
Numbered. Each: claim → [Fact|Inference] → source URL (+ date where it matters).
For landscape mode: a compact comparison table, then findings.
## Conflicts & uncertainty
Where sources disagree or evidence is thin.
## Surprises
Anything unexpected found along the way. "None" is acceptable.
## Open questions
What would need hands-on testing or paid/private sources to resolve.
## Confidence
high|medium|low + the single source or test that would most raise it.
```
+72
View File
@@ -0,0 +1,72 @@
# Reviewer — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/reviewer.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are a senior code reviewer seeing this change for the first time. You were not part
of the conversation that produced it — do not assume the approach is right just because
it exists. Your scope is the *change*, judged in the context of the code around it.
## Inputs you'll receive
A repo path and a scope. If scope is unspecified, review uncommitted work plus the last
commit: `git status`, `git diff HEAD`, `git log -1 -p`. Shell use is strictly read-only:
git inspection, linters, running the existing test suite. Never edit, write, or commit.
## Procedure
1. **Read the diff cold.** Before anything else, write one sentence: what does this
change *do*? If you can't, that's finding #1.
2. **Read the surroundings.** Open each touched file in full, plus its callers. A diff
that's locally clean can still break a contract.
3. **Check, in order of importance:**
- **Correctness** — logic errors, broken edge cases (empty, zero, huge, concurrent),
error paths that swallow or mis-handle failures.
- **Security smells** — new input handling without validation, string-built
commands/queries/paths, secrets or credentials entering the diff, loosened checks.
- **Contract breaks** — changed signatures/formats/behavior that callers, docs, or
stored data depend on.
- **Tests** — does the change carry tests? Would the existing suite catch a
regression here? Run the suite if it's cheap.
- **Consistency** — does it follow the patterns of the surrounding code, or invent
a second way to do an existing thing?
- **Simplicity** — flag overengineering as readily as underengineering.
4. **Acknowledge what's good** — one or two lines; it calibrates trust in the criticism.
## Hard rules
- Every finding cites `file:line` from the diff or its blast radius.
- Separate blocking findings from nits — never let style noise bury a logic bug.
- Each finding includes a suggested fix in one or two lines (described, not applied).
- Don't relitigate pre-existing code unless the change makes it worse or newly
dangerous; whole-repo concerns go in one line under Surprises.
- If the diff is empty or the scope is ambiguous, say exactly that and stop. If
blocked at any point, report exactly what blocked you — never guess or fabricate
findings.
## Report format (≤70 lines, exactly these sections)
```
## Verdict
APPROVE | APPROVE WITH NITS | REQUEST CHANGES — plus one sentence on what the
change does and one on the main risk.
## Blocking findings
[P0|P1] title → file:line → why → suggested fix. (Empty section = say "None.")
## Non-blocking findings
[P2] same shape.
## Nits
[P3] one line each, max 5. Skip the rest.
## Tests
What's covered, what's missing; the 13 test cases most worth adding.
## What's good
12 lines.
## Surprises
Anything outside the diff worth knowing. "None" is acceptable.
```
Severity: P0 = will corrupt data/break prod or is exploitable · P1 = wrong on realistic
paths · P2 = fragile/inconsistent/maintainability debt · P3 = style.
+85
View File
@@ -0,0 +1,85 @@
# Security auditor — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/security-auditor.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are a hostile security auditor. Assume an attacker who has read this source code,
controls all external input, and is patient. Your job is to find what they would find
first. (A CVE — Common Vulnerabilities and Exposures — is a publicly cataloged known
flaw with an ID like CVE-2026-12345; dependency scanners match the project's lockfiles
against that catalog.)
## Inputs you'll receive
A repo path, possibly a focus. Shell use is strictly read-only: scanners, `git log`, greps.
Never edit, write, commit, or exfiltrate anything you find.
## Procedure
1. **Map the attack surface.** Every place external data enters: network listeners, API
endpoints, CLI args, file parsers, env/config, webhooks, IPC. Every place trust is
decided: auth checks, permission gates, signature verification.
2. **Secrets sweep.** Grep working tree AND history (`git log -p` targeted) for keys,
tokens, passwords, seeds, .env files that escaped gitignore. A secret in history is
a finding even if since deleted.
3. **Input handling at each surface point:** string-built SQL/shell/paths (injection,
traversal), missing validation/length limits, unsafe deserialization, SSRF in
URL-fetching code, XSS in anything that renders user data.
4. **Auth & authz:** can any privileged action be reached without the check? Predictable
tokens/IDs, missing rate limits on auth, session fixation, default credentials.
5. **Crypto misuse:** home-rolled crypto, weak hashes for passwords, hardcoded IVs/keys,
`random` where `crypto`-grade randomness is required. (Bitcoin-adjacent code: key
generation, seed handling, and signing paths get double scrutiny.)
6. **Dependency CVEs.** Detect ecosystems, then run what applies: `npm audit`,
`cargo audit`, `pip-audit`, or `osv-scanner` if available. If a scanner isn't
installed and can't be safely installed, fall back to checking lockfile versions of
the riskiest dependencies against advisories via web search. Also flag abandoned
(years-stale) dependencies.
7. **Deployment posture** where present: Dockerfiles running as root, exposed ports,
debug modes on, overly verbose errors leaking internals, world-readable data dirs.
## Hard rules
- Every finding needs an **attack scenario**: who does what, and what they gain, in
23 lines. No scenario = downgrade to hardening note.
- Describe exploitability; never produce working exploit code or weaponized payloads.
- Severity reflects realistic exploitability for *this* deployment context, not
theoretical worst case. No CVSS theater.
- Distinguish "vulnerable" from "I couldn't verify" — both are reportable, labeled.
- If blocked (can't run scanners, repo too large), report exactly what blocked you and
what that leaves unexamined. Never guess or fabricate findings.
## Report format (≤100 lines, exactly these sections)
```
## Verdict
24 sentences: overall posture, the single most dangerous finding, release-blocking or not.
## Vulnerabilities
Most severe first. Each:
[P0|P1|P2] Title
Where: file:line
Attack: 23 line scenario
Fix: 12 lines
## Dependency audit
Tool(s) run → table: package | version | CVE/advisory | severity | fixed-in.
"Clean per <tool>" if nothing found; "not run because X" if blocked.
## Secrets
Found (redact the value, cite location incl. history) or "none found in tree or
scanned history".
## Hardening notes
[P3] non-exploitable improvements, one line each.
## Coverage
Surfaces examined vs not; scanners run vs unavailable.
## Surprises
Anything unexpected. "None" is acceptable.
## Confidence
high|medium|low + the one check that would most raise it.
```
Severity: P0 = remotely exploitable / funds or data at risk · P1 = exploitable with
realistic preconditions · P2 = defense-in-depth gap · P3 = hardening.
+78
View File
@@ -0,0 +1,78 @@
# Start9 spec checker — agent operating guide
*Substance file per the portability protocol. Vendor wrappers (e.g.
`adapters/claude/agents/start9-spec-checker.md`) point here; this guide is self-contained
and written as plain prose any delegated agent could follow.*
You are a StartOS packaging compliance checker. Your output is a submission-readiness
verdict backed by the *current official requirements* — which you fetch fresh every run,
because they drift and because StartOS has migrated SDKs across versions.
## Inputs you'll receive
A wrapper-repo path, possibly a target StartOS version.
## Procedure
1. **Fetch the live requirements first.** Start at https://docs.start9.com and locate,
for the relevant StartOS version: the service packaging guide, the packaging
specification/checklist, and the **Community Submission Process** page. Note the doc
version you used (docs are versioned, e.g. 0.3.5.x) — every requirement you check
must trace to a URL you actually read this run, not to memory.
2. **Detect the SDK era.** Older wrappers: manifest.yaml + scripting API; newer SDK:
TypeScript project via start-sdk. Identify which this repo is, and whether that
matches the target StartOS version's expectations. A wrong-era package is itself a
blocking finding.
3. **Verify the build.** If `start-sdk` is available, run the repo's documented build
(`make` / `start-sdk pack`) and then `start-sdk verify s9pk <id>.s9pk`; its output is
authoritative for manifest/file problems. If the SDK isn't available here, mark
build checks UNVERIFIED — never simulate them.
4. **Check the repo against the fetched checklist**, typically including (confirm each
against the live docs before asserting it):
- package id: unique, lowercase, hyphenated; sane version string per spec
- manifest completeness: title, description fields, valid links, license declared
- required assets: icon (per spec'd format/size), instructions, LICENSE
- Dockerfile/images: build recipe present; architectures the docs require
- volumes/data persistence declared; interfaces/ports mapped
- health checks; **backup/restore** configured (submission-tested)
- config: present and valid, or validly empty per spec
- dependencies declared through the StartOS dependency system
- build reproducibility: docs'd build steps + any required environment-prep script,
such that a clean Debian box produces the s9pk first try (Start9 will not debug it)
- README in the wrapper repo detailed enough to accompany a submission
5. **Note runtime-only criteria** you cannot verify statically — install/uninstall on a
real StartOS box, Properties/Config rendering, behavior on low-resource hardware
(Raspberry Pi 8GB class), backup/restore in practice — and list them as the user's
manual pre-submission checklist.
## Hard rules
- Every PASS/FAIL cites the requirement's doc URL; anything not actually checked is
UNVERIFIED, never assumed.
- Quote the docs sparingly (<15 words); paraphrase requirements in your own words.
- Distinguish **blockers** (submission will bounce) from **warnings** (likely friction).
- If the docs site is unreachable or ambiguous for this version, say exactly that and
stop rather than checking against memory. Never guess or fabricate findings.
## Report format (≤80 lines, exactly these sections)
```
## Verdict
READY TO SUBMIT | BLOCKED (n blockers) | UNVERIFIABLE — one sentence why.
Docs version + key URLs used this run.
## SDK era
What this repo is, what the target expects, match or mismatch.
## Compliance table
Requirement | PASS/FAIL/UNVERIFIED | Evidence (file or command output) | Doc URL
## Blockers
Each: what's wrong → exact remediation step.
## Warnings
Same shape, non-blocking.
## Manual pre-submission checklist
Runtime criteria to verify on a real StartOS box before emailing the submission.
## Surprises
Anything unexpected. "None" is acceptable.
```
+85
View File
@@ -0,0 +1,85 @@
# Portability Protocol
**Purpose:** keep every layer of my agent setup hot-swappable across model providers. Any compliant tool dropped into one of my repos should be productive immediately; adopting a new tool should cost minutes; returning to a previous tool should cost nothing.
**Lives at:** `~/Projects/standards/portability.md`. The standards repo is laid out as:
```
~/Projects/standards/
how-i-work.md portability.md retrofit-playbook.md subagents-handbook.md
guides/ ← neutral substance (vendor-agnostic): checklists, role knowledge
adapters/
claude/
commands/ ← ~/.claude/commands symlinks here (global commands, e.g. /handoff)
agents/ ← ~/.claude/agents symlinks here (global subagents)
```
Companions: `how-i-work.md` (the always-loaded user layer, under 50 lines), `retrofit-playbook.md` (the one-time conversion manual), and `subagents-handbook.md` (designing and running delegated agents). This document is reference material — never symlink it into anything always-loaded.
## The principle
**Knowledge lives in vendor-neutral files; vendor-named paths are thin adapters — symlinks or pointers — into them.**
Corollaries:
1. **Repos are self-contained.** Everything required to work on a project correctly lives in the repo. Global and user-level files add preferences, never requirements.
2. **Swap at session boundaries.** The close-out ritual (update Current state → commit → push) is the handoff protocol between providers, not just between sessions.
3. **Convert machinery, never knowledge.** At adoption time, an agent regenerates wrappers (subagents, commands, rule loaders). The markdown they point at never moves. In every wrapper, vendor syntax stays confined to the header; the body is plain prose any agent could follow — so even when substance lives inline (a self-contained command like /handoff), conversion is a one-prompt header swap.
4. **Session state is disposable by design.** Conversations and per-tool caches (e.g. Claude's auto memory) are conveniences. Anything important gets promoted into the files below.
5. **No secrets in any of these files.** Secrets live in gitignored .env files; documents reference env-var names only.
## The layers
| Layer | Neutral source of truth | Claude adapter | Portability status |
|---|---|---|---|
| Project knowledge | `<repo>/AGENTS.md` | symlink `CLAUDE.md``AGENTS.md` | Open standard — Cursor, Copilot, Codex, Gemini CLI, opencode read it |
| Project status | `## Current state` section in AGENTS.md | (same symlink) | Fully portable; maintained by the close-out ritual |
| Scoped guides | `<repo>/docs/guides/<topic>.md` with `paths:` frontmatter, plus one index line each in AGENTS.md | symlinks from `.claude/rules/` (auto lazy-load) | Content fully portable; lazy-loading is per-tool. Other tools find guides via the index lines |
| User preferences | `~/Projects/standards/how-i-work.md` | symlink `~/.claude/CLAUDE.md` → it | No cross-tool standard above repo level; each tool's global file symlinks here at adoption |
| Skills (procedures) | folders of `SKILL.md` + plain bash/python scripts | `.claude/skills/` | Format is open markdown; a cross-tool home (`.agents/skills/`) is emerging. Worst case: a pointer line in AGENTS.md |
| Subagents (delegation) | guides folder at matching scope — `<repo>/docs/guides/` for project agents, `standards/guides/` for global — or inline as plain prose if self-contained | thin wrapper in `.claude/agents/` (project) or `standards/adapters/claude/agents/` (global, symlinked as `~/.claude/agents`) | No standard exists. Wrappers regenerated per tool at adoption |
| Commands (saved prompt shortcuts) | same scope rule as subagents; self-contained procedures (like /handoff) may keep substance inline as plain prose | `.claude/commands/` (project) or `standards/adapters/claude/commands/` (global, symlinked as `~/.claude/commands`) | No standard. Same treatment as subagents |
**Scope rule:** every artifact lives at the scope it describes — project-specific in that repo; universal in the standards repo. There, neutral substance goes in `guides/` and vendor machinery is quarantined under `adapters/<tool>/` (so a second provider's wrappers never intermix with Claude's or with the shared guides).
## Swap protocol (between already-adopted tools)
1. Finish the task and run the close-out ritual; exit the session.
2. Open the other tool in the same repo.
3. It reads AGENTS.md (plus its own global file) and Current state, and continues.
Nothing else. If step 3 stumbles, the missing fact belongs in AGENTS.md — add it and commit.
## Adoption checklist (first time using a new provider)
Run by the new agent itself: *"Read ~/Projects/standards/portability.md and onboard yourself as a new provider."*
1. Install and authenticate the tool — the only human-heavy step.
2. Symlink (or copy) its global instruction file to `standards/how-i-work.md`.
3. Confirm it reads `<repo>/AGENTS.md`. If it expects a different filename, symlink that name to AGENTS.md.
4. If it supports path-scoped or lazy loading, wire its mechanism to `docs/guides/`; otherwise the AGENTS.md index lines suffice.
5. If it supports skills or procedures, point it at the skills folders; otherwise add a pointer line in its global file.
6. Regenerate the thin wrappers it supports (subagents, commands) from the guides they reference, placing global ones in `adapters/<tool>/` and symlinking the tool's config home to them.
7. **Record what was done in a new provider section at the bottom of this document.**
## Provider adapters
### Claude Code (current)
- `CLAUDE.md` → symlink → `AGENTS.md` at the root of every repo
- `.claude/rules/<topic>.md` → symlinks → `docs/guides/<topic>.md` (auto lazy-load via `paths:` frontmatter)
- `~/.claude/CLAUDE.md` → symlink → `standards/how-i-work.md`
- `.claude/agents/` — thin subagent wrappers pointing at docs/guides (project-specific agents only)
- `.claude/commands/` — thin command templates, regenerable (project-specific commands only)
- Global commands and agents: `~/.claude/commands` → symlink → `standards/adapters/claude/commands/`; `~/.claude/agents` → symlink → `standards/adapters/claude/agents/` (versioned and backed up with the standards repo)
- `.claude/skills/` — skills home for now; migrate to `.agents/skills/` if that convention solidifies
- Auto memory: local cache only (`~/.claude/projects/<project>/memory/`), outside git, not portable. Promote keepers into AGENTS.md or guides. Audit with `/memory`.
### Next provider — template
- Global file location: ___ → symlink → `standards/how-i-work.md`
- Reads AGENTS.md natively? ___ (if not: symlink its expected filename → AGENTS.md)
- Scoped/lazy loading mechanism: ___ (wired to docs/guides, or relying on index lines)
- Skills support: ___
- Delegation/subagent equivalent: ___ (wrappers regenerated on: ___)
- Notes and quirks: ___
+1 -1
View File
@@ -164,7 +164,7 @@ continues the most recent conversation in that folder. Nothing lost.
- `/memory` shows every CLAUDE.md and rules file loaded right now, and lets you browse what auto memory has saved. Auto memory lives in `~/.claude`, outside the repo — Gitea never backs it up — so promote anything important into CLAUDE.md. - `/memory` shows every CLAUDE.md and rules file loaded right now, and lets you browse what auto memory has saved. Auto memory lives in `~/.claude`, outside the repo — Gitea never backs it up — so promote anything important into CLAUDE.md.
- When **Current state** stops fitting in ~15 lines, it's accumulating history. Prune it; history lives in the git log. - When **Current state** stops fitting in ~15 lines, it's accumulating history. Prune it; history lives in the git log.
- **AGENTS.md is the real file; CLAUDE.md is a symlink to it.** AGENTS.md is the cross-vendor standard (Cursor, Copilot, Codex, Gemini CLI, and the open-source harnesses read it); Claude Code reads only CLAUDE.md, so the symlink serves it the same file under its preferred name. Either name edits the same file — every "update CLAUDE.md" in this runbook lands in AGENTS.md. If you ever switch providers or run self-hosted agents, the repo is already in their native format. - **AGENTS.md is the real file; CLAUDE.md is a symlink to it.** AGENTS.md is the cross-vendor standard (Cursor, Copilot, Codex, Gemini CLI, and the open-source harnesses read it); Claude Code reads only CLAUDE.md, so the symlink serves it the same file under its preferred name. Either name edits the same file — every "update CLAUDE.md" in this runbook lands in AGENTS.md. If you ever switch providers or run self-hosted agents, the repo is already in their native format.
- **The portability principle, generalized:** knowledge lives in vendor-neutral files; vendor-named paths are symlinks into it. Root: AGENTS.md ← CLAUDE.md. Scoped guidance: docs/guides/*.md ← .claude/rules/*.md, indexed by one line each in AGENTS.md. To try a different provider, swap at a session boundary: run the close-out ritual, open the other tool, and it reads AGENTS.md + Current state and continues. The repo is brain-agnostic; sessions are visits. - **The portability principle, generalized:** knowledge lives in vendor-neutral files; vendor-named paths are symlinks into it. Root: AGENTS.md ← CLAUDE.md. Scoped guidance: docs/guides/*.md ← .claude/rules/*.md, indexed by one line each in AGENTS.md. To try a different provider, swap at a session boundary: run the close-out ritual, open the other tool, and it reads AGENTS.md + Current state and continues. The repo is brain-agnostic; sessions are visits. Full protocol: portability.md in the standards repo.
--- ---
+329
View File
@@ -0,0 +1,329 @@
# Subagents Handbook
**Purpose:** reference for designing, using, and creating Claude Code subagents.
**Lives at:** `~/Projects/standards/subagents-handbook.md`, sibling of `portability.md`
and `retrofit-playbook.md`. The working parts follow the portability protocol: neutral
substance in `guides/<agent>.md` (one operating guide per agent, plus
`guides/full-eval.md` for the orchestrator); thin Claude wrappers in
`adapters/claude/agents/` and `adapters/claude/commands/`, reached via the standard
directory symlinks from `~/.claude/agents` and `~/.claude/commands`.
Verified against the official docs (June 2026): https://code.claude.com/docs/en/sub-agents
---
## 1. What a subagent is
A subagent is a helper Claude that runs in its **own context window**, does a chunk of
work, and hands back **only its final report**. Your file hierarchy controls what enters
context at session start; subagents control what stays out of context while work happens.
The trigger is always the same shape: *the task involves lots of reading or noisy output,
but I only need the conclusion.*
### Mechanics (the facts that matter)
- **Format.** A Markdown file with YAML frontmatter. Frontmatter = config; body = the
subagent's entire system prompt. Only `name` and `description` are required. Useful
optional fields: `tools` (allowlist), `disallowedTools`, `model` (`haiku`/`sonnet`/
`opus`/`inherit`), `effort` (`low`/`medium`/`high`/`xhigh`/`max` — reasoning depth,
default high), `maxTurns`, `memory`, `background`.
- **Wrapper/guide split (this setup).** Per the portability protocol, each definition
here is a *thin wrapper*: vendor syntax confined to the frontmatter, and a body that
does exactly three things — direct the agent to read its operating guide at
`~/Projects/standards/guides/<name>.md` before acting, give a fail-safe ("can't read
the guide → stop and say so, don't improvise"), and state a one-line safety floor
(read-only, etc.) that holds even guideless. The guide carries all substance —
mission, procedure, hard rules, report contract — as plain prose any agent could
follow; adopting another provider is a header swap.
- **Locations & precedence.** `.claude/agents/` (project) beats `~/.claude/agents/`
(user, all projects). Both are scanned recursively, so symlinked subfolders work.
- **Loading.** Subagent files (the wrappers) are read **at session start**. Add or edit
a wrapper on disk → restart the session. (Agents made via `/agents` load immediately.)
Guides are different: the agent reads its guide **at run time**, so guide edits take
effect on the very next run, no restart.
- **What the subagent sees.** Its own body, the delegation prompt the main agent writes,
your full CLAUDE.md/AGENTS.md hierarchy, and a git-status snapshot. It does **not** see
your conversation. The delegation prompt is the *only* channel in — paths, symptoms,
and scope must be stated there.
- **What comes back.** The subagent's final message, verbatim. Everything else it read
or did dies with its context window.
- **Built-ins.** `Explore` (Haiku, read-only — fast codebase search), `Plan`, and
`general-purpose` already exist. Don't rebuild them.
- **No nesting.** Subagents cannot spawn subagents. Fan-out happens from the main
thread (see §6, the orchestrator).
- **Foreground vs background.** Foreground blocks and passes permission prompts to you.
Background runs concurrently but **auto-denies** anything that would prompt — so
agents needing Bash approval (exerciser, auditor) should run foreground the first time.
- **Invocation.** Auto-delegation via the `description` field; by name in prose
("use the reviewer subagent"); or guaranteed via `@agent-<name>`.
- **Cost.** Every subagent is a fresh model run reading files from scratch. Heavy
fan-outs can burn several times the tokens of a single thread. Spend it where the
context savings or fresh eyes are worth it.
---
## 2. The catalog
Four properties a separate context window gives you. Every useful agent exploits at
least one.
| Property | What it buys you | Agents |
|---|---|---|
| **Compression** | Huge input → small conclusion | explorer, test-runner, researcher, docs-reader |
| **Independence** | Fresh eyes, unanchored by your session | reviewer, fresh-debugger, spec-checker, security-auditor |
| **Containment** | Messy trial-and-error stays out of your session | exerciser, reproducer, migrator |
| **Parallelism** | N investigations at once | any of the above, fanned out |
### Roster
| Agent | Property | One-liner | Status |
|---|---|---|---|
| explorer | compression | Map a codebase/subsystem, report how it works | **built-in** (`Explore`) |
| evaluator | independence | Whole-repo assessment: 6 lenses + intent match + alternatives | ✅ in kit |
| reviewer | independence | Fresh-eyes review of a diff before commit | ✅ in kit |
| security-auditor | independence | Hostile persona: vulns, secrets, dependency CVEs | ✅ in kit |
| spec-checker | independence | Compliance vs Start9 community registry requirements | ✅ in kit (`start9-spec-checker`) |
| exerciser | containment | Black-box QA: run it, feed it normal + hostile inputs | ✅ in kit |
| researcher | compression | Multi-source web research → cited brief | ✅ in kit |
| test-runner | compression | Run the suite, return only failures + causes | build when a repo has a real suite |
| docs-reader | compression | Read a library's docs, return just the calls you need | build on 3rd manual-trawling task |
| fresh-debugger | independence | Gets symptoms only (never your theories), hunts the bug | build next time you're stuck >1hr |
| reproducer | containment | Bug report in → minimal repro steps out | build when triaging user reports |
| migrator | containment | Mechanical change across many files → "done, N anomalies" | build at first big rename/upgrade |
*CVE, since it'll keep coming up:* **Common Vulnerabilities and Exposures** — the public
catalog of known security flaws, each with an ID like `CVE-2026-12345`. "Dependency
CVEs" = known holes in the third-party libraries your software pulls in. Scanners
(`npm audit`, `cargo audit`, `pip-audit`, `osv-scanner`) check your lockfiles against
the catalog. The security-auditor runs them.
---
## 3. Rules of use
1. **Delegate reading; keep understanding.** The cardinal rule. A subagent returns
conclusions — its *reasoning* dies with its context. Never delegate work you'll
iterate on (implementation, design) because you'd be editing half-blind. Test: *if
the next step needs the understanding, do it in the main window.*
2. **Demand verifiable outputs.** Trustworthy reports cite things you can spot-check
without re-reading the inputs: `file:line`, repro commands, failing test names, URLs.
If an agent's output can't be made verifiable, it's a bad subagent task.
3. **The description field is the API.** Auto-delegation reads only `description`. Write
it as trigger conditions for the delegating model, not marketing for you. "Use
proactively when…" and "MUST BE USED after…" genuinely change behavior.
4. **Least-privilege tools.** Evaluative agents get read-only (`Read, Grep, Glob`,
maybe `Bash` for git/scanners). Only agents whose *job* is writing get `Write`/`Edit`.
This keeps reports honest (an agent that can "just fix it" stops reporting) and makes
background runs safe.
5. **One job per agent.** When a definition grows "and also…", split it. A muddled
mission produces a muddled report.
6. **Cap the report.** Every definition states a line budget. The main context is the
scarce resource the whole system exists to protect; an agent that returns 500 lines
defeats itself.
7. **Mandate a Surprises section.** Cheap insurance against tunnel vision — the best
findings are often outside the asked question. "None" is an acceptable answer;
omitting the section is not.
8. **Two dials per agent: model × effort.** `model` is who you hire — haiku for
mechanical work, sonnet default, opus where judgment is the product. `effort` is
how long they're allowed to think before answering — it scales reasoning depth, not
capability ceiling. Both live in the wrapper frontmatter; frontmatter effort
overrides the session level while that agent runs. Match the dial to where the cost
actually lives: evaluator and security-auditor are **opus/high** because the report
*is* the reasoning; reviewer, researcher, exerciser, and spec-checker are
**sonnet/medium** because their cost is dominated by frequency, tool-call turns, or
procedure-following — not thinking depth. The deep insight: a well-proceduralized
guide is *pre-computed reasoning*, so every hour spent tightening a guide lowers
the model+effort that task needs forever. `low` is for truly mechanical agents
(migrator, test-runner) once their guides are trusted; `xhigh` is the upgrade if an
opus agent's reports feel shallow on hard targets. Global overrides, strongest
first: `CLAUDE_CODE_EFFORT_LEVEL` and `CLAUDE_CODE_SUBAGENT_MODEL` env vars →
wrapper frontmatter → session setting — handy for "run everything cheap today."
9. **Fresh eyes need fresh inputs.** When delegating to reviewer/debugger types, give
symptoms and scope — *never your theories*. Anchoring them throws away the one thing
the separate context bought you.
10. **Parallelize the independent, serialize the dependent.** "Audit security, exercise
the app, and check spec compliance *in parallel*" is one sentence to the main agent.
But chained work (find issues → fix them) must flow back through the main thread.
11. **Iterate the guide, not the chat.** A disappointing report means the guide is
wrong. Fix `guides/<name>.md` (it's version-controlled) and rerun — guides load at
run time, so no restart. Touch the wrapper only for tools/model/description
changes, and restart the session when you do. Correcting the agent
conversationally fixes nothing for next time.
12. **Know when *not* to delegate.** Quick targeted changes; tasks needing your taste
mid-stream; reads under ~5 files (overhead exceeds benefit); anything where latency
matters more than context hygiene.
---
## 4. Shaping the report
The summary is the interface — "report back" gets you mush. Every *guide* in this kit
embeds a variant of the universal contract below; the agent reads its guide at startup,
so the contract reaches it without bloating the wrapper. The universal form here is
your template for designing new guides.
### Universal report contract
```
## Verdict — the answer, 13 sentences, no throat-clearing
## Findings — each: [severity] one-line title → evidence → so-what
## Coverage — what was examined and what wasn't (makes silence meaningful)
## Surprises — anything unexpected, on-mission or not ("none" allowed)
## Next actions — ranked, concrete, imperative mood
## Confidence — high/medium/low + the one thing that would raise it
```
### Severity scale (shared by all evaluative agents)
- **P0** exploitable, data loss, or crashes in normal use — fix before anything else
- **P1** wrong behavior or weak security on realistic paths — fix before release
- **P2** works, but fragile / hard to maintain / poor practice
- **P3** cosmetic, style, minor docs
- **P4** informational / matter of taste
### Evidence by agent type
Per-agent evidence requirements: code findings cite `file:line`; bugs include exact
repro commands plus expected-vs-actual; research claims carry a URL and are labeled
**Fact** (sourced) or **Inference** (the agent's own conclusion); compliance findings
cite the requirement's doc URL. A finding without its evidence type gets deleted, not
softened.
### Length budgets
Reviewer ≤ 70 lines · exerciser & spec-checker ≤ 80 · researcher & security-auditor
≤ 100 · evaluator ≤ 120. Tighten freely; loosen only with cause.
---
## 5. Creating a new agent
Three routes, in order of preference:
**Route 1 — copy a neighbor.** The fastest path to a *good* agent is copying the
closest existing pair: its guide in `guides/` and its wrapper in
`adapters/claude/agents/`. The contract and structure carry over; only the mission
changes. Scope rule applies: project-specific agents put the guide in
`<repo>/docs/guides/` and the wrapper in `<repo>/.claude/agents/` instead.
**Route 2 — `/agents` → "Generate with Claude".** Interactive, guided, loads without a
session restart. Good for quick experiments; once it earns permanence, split it —
substance to a guide, vendor header to a thin wrapper — at the right scope.
**Route 3 — the meta-prompt.** Paste this into any Claude session, filled in:
```
Create a Claude Code subagent definition file (YAML frontmatter + system-prompt body).
Role: [one sentence — what it does]
Trigger: [when the main agent should delegate; exact phrases that should fire it]
Inputs: [what the delegation prompt will contain — paths, scope, symptoms]
Tools: [least-privilege list; justify any Write/Edit/Bash]
Must never: [hard limits — e.g. never edits source, never returns raw logs]
Report: [sections, severity scale, evidence type, line cap]
Model: [haiku|sonnet|opus|inherit — and why]
Produce TWO files per my portability protocol:
1. The GUIDE — vendor-neutral, second person, self-contained: mission, procedure,
hard rules, then the report template verbatim. End with: "If blocked, report
exactly what blocked you — never guess or fabricate findings."
→ guides/<name>.md (standards repo if global; <repo>/docs/guides/ if project).
2. The WRAPPER — vendor syntax confined to the header: YAML frontmatter with name,
a description written FOR THE DELEGATING MODEL (explicit trigger conditions,
"use proactively when…" phrasing where wanted), least-privilege tools, model.
Body only: role one-liner, directive to read the guide first, stop-and-say-so
fail-safe if the guide is unreadable, one-line safety floor.
→ adapters/claude/agents/<name>.md (global) or <repo>/.claude/agents/ (project).
Then show one example invocation and the first 10 lines of an ideal report.
```
**Checklist** (every guide+wrapper pair, before it ships):
Guide — self-contained · embedded report template with line cap · Surprises section ·
blocked-means-say-so clause. Wrapper — trigger-rich description · least-privilege
`tools` · explicit `model` · guide pointer with fail-safe · safety floor. Then: save
both at the right scope, restart the session (new wrapper), run once on a real task,
tighten the guide where it disappointed.
---
## 6. The orchestrator question
You cannot build an orchestrator *as a subagent* — subagents can't spawn subagents.
Three real options:
1. **Slash command (this kit's choice): `/full-eval`.** A command file is just a stored
prompt for the *main thread* — and the main thread *can* fan out. `/full-eval` makes
it launch evaluator + security-auditor + exerciser (+ start9-spec-checker when
relevant) in parallel, then synthesize one deduplicated, prioritized report. Zero new
machinery, works in any session.
2. **`claude --agent <name>` (advanced).** A session launched this way takes the agent's
system prompt as its main thread — and a main-thread agent *can* spawn subagents,
even restricted to a roster via `tools: Agent(evaluator, security-auditor, ...)`.
Build this only if `/full-eval` starts feeling too manual.
3. **Agent teams (experimental).** Multiple communicating sessions. Ignore until a task
genuinely exceeds one session's context even with subagents.
Synthesis is the orchestrator's real job, and it stays in the main thread on purpose:
cross-referencing ("the crash the exerciser found is the injection the auditor
flagged") is exactly the understanding you don't delegate.
---
## 7. Workflows
### A. The retrospective (the dozen existing repos)
Per repo, in order:
1. **Retrofit first** (per `retrofit-playbook.md`): AGENTS.md with Current state, the
CLAUDE.md symlink. Subagents load this file — the evaluator literally reads it to
judge the build against your stated intent. Eval before retrofit = grading against
a blank rubric.
2. **`/full-eval`** in a fresh session. Walk away; read one synthesized report.
3. **Triage** into the repo's AGENTS.md Current state: P0/P1 become the work queue,
P2 becomes a "known debt" list, P3+ gets deleted or done in bulk.
4. **Fix in the main session** (understanding work — yours), running **reviewer** on
each meaningful diff before commit.
5. **Close out** per the ritual; move to the next repo. Don't batch — one repo's
full cycle beats six repos' half-cycles.
### B. Steady state (everything built from now on)
- **While building:** built-in Explore for "how does X work" questions; researcher
before adopting any dependency; fresh-debugger discipline when stuck >1hr (symptoms
only, no theories).
- **Every meaningful diff:** reviewer, before commit. This is the habit that compounds.
- **Before any release:** exerciser + security-auditor in parallel.
- **Before registry submission:** start9-spec-checker.
- **Quarterly-ish:** `/full-eval` on actively maintained repos; drift accumulates
silently.
---
## 8. Install & maintenance
One-time install — the portability protocol's standard symlinks (if `~/.claude/agents`
or `~/.claude/commands` already exist as real directories, move their contents into the
adapters folders first):
```bash
ln -s ~/Projects/standards/adapters/claude/agents ~/.claude/agents
ln -s ~/Projects/standards/adapters/claude/commands ~/.claude/commands
```
Then restart any open Claude Code session (wrappers load at session start). First run:
`cd` into a repo, `claude`, then `/full-eval` — or invoke singles anytime
(`@agent-reviewer look at my uncommitted changes`).
- **Two-file edit model.** Substance → `guides/<name>.md`, live on the agent's next
run, no restart. Machinery (tools, model, description) → the wrapper, restart
required. When in doubt, you're editing the guide.
- Edits land via the normal close-out ritual → commit → push → backed up on the Start9
with the rest of the standards repo. The directory symlinks mean `git pull` updates
every machine-wide agent; there is no reinstall step.
- Sessions that can't see `~/Projects` (cloud, containers) simply have no global
agents; a wrapper that loads without its guide stops and says so rather than
improvising — that's the fail-safe working as designed.
- Prune ruthlessly. An agent unused for months is context-rot in the `/agents` list;
delete the pair — git remembers.