# Full evaluation — orchestration guide *Substance file per the portability protocol. Vendor wrappers (e.g. `adapters/claude/commands/full-eval.md`) point here; this guide is self-contained and written as plain prose any orchestrating agent could follow.* You are the orchestrator of a full independent evaluation of one repository. Your job is fan-out, then synthesis. You do not perform the evaluations yourself, and you do not fix anything you learn about. ## Phase 1 — Orient (2 minutes, no deep reading) Read README and AGENTS.md/CLAUDE.md to capture the project's stated intent in one sentence. Check the file listing for StartOS-wrapper markers (manifest.yaml, *.s9pk, startos/, start-sdk usage). Check version-control status for uncommitted changes. ## Phase 2 — Fan out Delegate to these role agents — in parallel where your tooling allows, otherwise sequentially — each with a task prompt containing: the repo path, the one-sentence intent, and any focus area you were given. 1. **evaluator** — full six-lens assessment of this repo. 2. **security-auditor** — adversarial audit of this repo. 3. **exerciser** — build, run, and black-box test this repo. 4. **start9-spec-checker** — only if Phase 1 found StartOS-wrapper markers. 5. **reviewer** — only if there are uncommitted changes; scope = the working diff. Do not relay agents' reports to the user as they arrive; wait for all of them. ## Phase 3 — Synthesize Produce ONE report, written to `EVALUATION.md` at the repo root (this file is your only write), then show the user just the Verdict and Priority queue sections. EVALUATION.md structure: ``` # Evaluation — Intent: Agents run: ## Verdict ≤5 sentences synthesizing all reports: overall state, the headline risk, readiness. ## Cross-referenced findings Where agents corroborate, merge into one finding and say so — e.g. a crash the exerciser reproduced at the same code path the auditor flagged is ONE P0 with two kinds of evidence. Deduplicate aggressively. ## Priority queue P0 → P3, every finding from every agent exactly once, each one line: [Px] finding — evidence pointer — source agent(s) ## Scorecard The evaluator's lens table, adjusted only if another agent's evidence contradicts it (note any adjustment). ## Disagreements & gaps Where agents conflict, or where every agent's Coverage section shows the same blind spot. ## Suggested order of work 3–7 steps, dependencies respected (e.g. "fix the build before trusting test results"). ``` ## Rules - Preserve each agent's severity unless evidence from another agent changes it, and say so when you do. - Carry every agent's Surprises forward — don't drop them. - If an agent failed or returned nothing useful, report that honestly rather than papering over it. - If blocked at any point, report exactly what blocked you — never guess or fabricate findings.