Keysat ee5c8bb3e2 Verify and correct placement guide infra facts with owner

Replace the one-shot/UNVERIFIED infra section with owner-confirmed facts:
x86 StartOS 0.4.0 box + full service inventory; the two-Spark role split
(LLM vs audio/speech, Qdrant on the audio Spark, matrix-bridge hosted there);
route via the Spark Control gateway and query the active model rather than
hardcoding one; networking reduced to LAN/WireGuard/StartTunnel (Proton/Tor
were legacy). Align decision steps 4 and 6.

2026-06-15 17:16:34 -05:00

Placement guide — where should a new project live?

Reference doc for the "where does this run, which model, what data layer?" question. It encodes two things: a stable decision sequence (rarely changes) and a set of infrastructure facts (go stale — keep them current). /new-project walks this against every new idea (guides/new-project.md, Phase 2); how-i-work.md points here so any session placing a project consults it rather than guessing.

✅ Verified with the owner 2026-06-15 (and cross-checked against the project repos). Keep this section current as the infra changes — see Maintenance. The decision sequence and the substance rule are stable regardless.

Infrastructure facts (verified 2026-06-15)

Start9 server — one box, StartOS 0.4.0, x86_64 (0.4.0 doesn't run on Raspberry Pi / ARM, so x86 is the only option; build s9pks for x86_64). It hosts long-running services as s9pk packages. Running on it: Gitea (the default repo home for every project), Nextcloud (file backup), Home Assistant, Core Lightning + Ride the Lightning (RTL), Open WebUI (the sovereign chat layer), Vaultwarden, and Synapse (the Matrix homeserver, matrix.gilliam.ai). Every Claude-built app also lives here: recap (public at recaps.cc), keysat, premier-gunner, proof-of-work, recap-relay, ten31-database, spark-control.

Inference — two NVIDIA DGX Sparks (ARM64), fronted by the Spark Control gateway on the LAN. Spark Control is the single HTTP endpoint every app calls; the two Sparks split by role:

LLM Spark — vLLM, OpenAI-compatible. Serves whichever general model is currently activated (daily driver right now: Qwen3.6; Gemma and others are downloaded and hot-swappable from the Spark Control dashboard).
Audio / speech Spark — Parakeet (STT), Kokoro (TTS), Sortformer + TitaNet (diarization), bge-m3 embeddings + Qdrant, and the rerank model. It also hosts the matrix-bridge container (on the WireGuard subnet).

Treat this as real production capacity: recap / recap-relay (transcription + analysis), ten31-database (CRM pipeline), ten31-signal-engine, and ten31-transcripts already depend on it.

Don't hardcode a model name. Route to the Spark Control gateway and ask its API which model is live — that single-endpoint indirection is the point; the active model changes when the owner swaps it from the dashboard.
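
A minimal sketch of that lookup, assuming the gateway passes through the OpenAI-compatible `GET /v1/models` listing that vLLM serves. The base URL below is a hypothetical placeholder, and treating the first listed model as the active one is an assumption about Spark Control, not a documented behavior.

```python
import json
import urllib.request

# Hypothetical address; substitute the real Spark Control gateway URL.
GATEWAY_URL = "http://spark-control.lan:8080"


def pick_active_model(models_payload: dict) -> str:
    """Return the id of the first model in an OpenAI-style /v1/models payload.

    Assumption: the gateway lists the currently active model first.
    """
    models = models_payload.get("data", [])
    if not models:
        raise RuntimeError("gateway reported no active model")
    return models[0]["id"]


def fetch_active_model(base_url: str = GATEWAY_URL, timeout: float = 5.0) -> str:
    """Ask the gateway which model is live instead of hardcoding a name."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return pick_active_model(json.load(resp))
```

An app would call `fetch_active_model()` at startup (or per request) and pass the returned id as the `model` field of its chat or completion call, so a dashboard swap needs no code change.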

Data layer defaults — SQLite for structured data; Qdrant + bge-m3 (both on the audio/speech Spark) when semantic retrieval is needed, with per-project collections; flat files when that's the honest answer.

Sovereignty boundary (standing rule) — anything touching sensitive investor, LP, or portfolio data uses local models only, via the Spark Control gateway, behind a redaction boundary wherever free text could carry names. Frontier APIs (Anthropic etc.) are fine for everything else. Non-negotiable per project; the only question is which side of the line the project's data sits on — and AGENTS.md must state it so a session never wires a frontier call to payload data.

Access / networking — three mechanisms, no others (Proton VPN and Tor were legacy and are not in use):

LAN — the default; apps, Sparks, and the box share it.
WireGuard — how the owner's own devices reach LAN-only services when off-LAN.
StartTunnel — Start9's ClearNet feature; publicly exposes selected services (recap at recaps.cc, Synapse/Matrix, and the ten31-database CRM — the CRM is ClearNet-exposed with app-level user auth so only the team reaches it).

Dev machine — macOS with Claude Code; also the s9pk / macOS-app build host. One-off and personal CLI tools live here happily.

Decision sequence (stable)

Walk these in order; each answer narrows the next.

1. Sensitivity. Does the project ingest, store, or send investor/LP/portfolio data to a model? If yes: local inference mandatory, hosting on the home subnet strongly preferred, and AGENTS.md must state the constraint explicitly so a coding session never "helpfully" wires in a frontier API call with payload data.

2. Runtime shape. One-shot CLI / scheduled job / long-running service / interactive UI?

One-shot or personal CLI → Mac. Don't deploy what doesn't need deploying.
Scheduled job → Mac launchd if it only matters while the laptop lives; Start9 if it must run unattended 24/7.
Long-running service, or anything other devices/family/agents need to reach → Start9.
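
The step 2 rules can be sketched as a small lookup. The names `Shape` and `place` are hypothetical, and sending an interactive UI that nobody else reaches to the Mac is an assumption; the guide only says where it goes once other devices need it.

```python
from enum import Enum


class Shape(Enum):
    ONE_SHOT_CLI = "one-shot CLI"
    SCHEDULED_JOB = "scheduled job"
    LONG_RUNNING_SERVICE = "long-running service"
    INTERACTIVE_UI = "interactive UI"


def place(shape: Shape, must_run_unattended: bool = False,
          reached_by_others: bool = False) -> str:
    """Return "Mac" or "Start9" following the step 2 rules."""
    # Anything other devices, family, or agents need to reach goes to Start9.
    if reached_by_others or shape is Shape.LONG_RUNNING_SERVICE:
        return "Start9"
    if shape is Shape.SCHEDULED_JOB:
        # launchd on the Mac is enough unless the job must run 24/7.
        return "Start9" if must_run_unattended else "Mac"
    # One-shot and personal CLI tools stay on the Mac.
    # Assumption: a UI only the owner uses locally also stays on the Mac.
    return "Mac"
```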

3. If Start9: s9pk or plain container? s9pk earns its packaging cost when the service wants the StartOS lifecycle — backups, health checks, dependency management, clean updates — or could plausibly be published for others. Plain container (or script) wins for experiments, single-user glue, and anything still changing shape weekly. Default for prototypes: container now, promote to s9pk if it survives and stabilizes. Packaging for 0.4.x is nontrivial; don't pay it on spec.

4. Model routing. Default to the local model via the Spark Control gateway when the sovereignty boundary applies, when latency/cost favor local, or when the task is well within the local model's capability. Don't hardcode a model name — call the gateway and ask which model is active. Route to frontier (Claude API) for hard reasoning on non-sensitive data. Record the chosen endpoint (gateway vs frontier) in AGENTS.md so sessions don't guess.
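
As a rough sketch, the routing rule reduces to two questions. The function name and the two boolean inputs are hypothetical, and the latency/cost judgment is folded into the local default rather than modeled.

```python
def choose_endpoint(sensitive_data: bool, hard_reasoning: bool) -> str:
    """Return "gateway" (local model via Spark Control) or "frontier"."""
    # Sovereignty boundary: sensitive data never leaves local inference.
    if sensitive_data:
        return "gateway"
    # Hard reasoning on non-sensitive data may go to a frontier API.
    if hard_reasoning:
        return "frontier"
    # Default: the local model is preferred when it is capable enough.
    return "gateway"
```

The returned string is the value to record in AGENTS.md. Note that `sensitive_data=True` wins even when the task is hard: the sensitivity check runs first by design.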

5. Data layer. SQLite unless there's a reason; Qdrant + bge-m3 when retrieval quality is the product; flat files for logs and artifacts. Name Qdrant collections per-project to avoid the shared-collection mess.
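
One way to keep collections per-project is to derive the name from the project slug. The `project__purpose` scheme and the helper below are hypothetical illustrations, not an existing convention in these repos.

```python
import re


def collection_name(project: str, purpose: str = "docs") -> str:
    """Build a per-project Qdrant collection name, e.g. "recap__docs"."""

    def slug(text: str) -> str:
        # Lowercase, then collapse every non-alphanumeric run into one hyphen.
        cleaned = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot derive a name from {text!r}")
        return cleaned

    return f"{slug(project)}__{slug(purpose)}"
```

Routing every collection create or query through a helper like this makes it hard for two projects to land in the same collection by accident.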

6. Interface. CLI first unless the UI is the product. If it must be reachable from the phone or by the team off-LAN, decide up front how: expose it over ClearNet via StartTunnel with app-level auth (how the CRM and recaps.cc are reached), or keep it LAN-only and reach it over WireGuard from your own devices.

7. Repo home. Gitea on Start9. Always — even for parked-then-revived ideas, so history accumulates in one place.

Phase-exit criteria — the substance rule

Phase exits are falsifiable substance: numbers and demonstrable behavior. "46/46 tests pass," "recap generated from a real 40-minute call in under 2 minutes," "correct doc in top-3 for 9/10 canned queries." If the criterion can't fail, it isn't a criterion.
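
A criterion like "correct doc in top-3 for 9/10 canned queries" can be written as a check that is able to fail. This is a sketch under assumptions: the function names are hypothetical, `results` maps each query to the ranked doc ids the retriever returned, and `expected` maps each query to the one correct doc id.

```python
def top_k_hits(results: dict[str, list[str]], expected: dict[str, str],
               k: int = 3) -> int:
    """Count queries whose expected doc id appears in the first k results."""
    return sum(
        1 for query, doc_id in expected.items()
        if doc_id in results.get(query, [])[:k]
    )


def criterion_met(results: dict[str, list[str]], expected: dict[str, str],
                  k: int = 3, required: int = 9) -> bool:
    """True when at least `required` canned queries hit within the top k."""
    return top_k_hits(results, expected, k) >= required
```

A query missing from `results` counts as a miss, so an incomplete run cannot pass by omission.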

Maintenance

The infrastructure facts section is the part that goes stale. When the infra changes — new hardware, StartOS version, model lineup, network setup, a service added or retired — update that section here rather than working around it in conversation. The decision sequence and the substance rule rarely change.
