# Placement guide — where should a new project live?

Reference doc for the "where does this run, which model, what data layer?" question. It
encodes two things: a stable **decision sequence** (rarely changes) and a set of
**infrastructure facts** (go stale — keep them current). `/new-project` walks this against
every new idea (`guides/new-project.md`, Phase 2); `how-i-work.md` points here so any
session placing a project consults it rather than guessing.

> ✅ **Verified with the owner 2026-06-15** (and cross-checked against the project repos).
> Keep the infrastructure facts below current as the infra changes (see Maintenance). The
> *decision sequence* and the *substance rule* are stable regardless.

## Infrastructure facts (verified 2026-06-15)

**Start9 server** — one box, **StartOS 0.4.0**, **x86_64** (0.4.0 doesn't run on Raspberry
Pi / ARM, so x86 is the only option; build s9pks for `x86_64`). It hosts long-running services
as s9pk packages. Running on it: Gitea (the default repo home for every project), Nextcloud
(file backup), Home Assistant, Core Lightning + Ride the Lightning (RTL), Open WebUI (the
sovereign chat layer), Vaultwarden, and Synapse (the Matrix homeserver, `matrix.gilliam.ai`).
Every Claude-built app also lives here: recap (public at `recaps.cc`), keysat, premier-gunner,
proof-of-work, recap-relay, ten31-database, spark-control.

**Inference — two NVIDIA DGX Sparks (ARM64), fronted by the Spark Control gateway on the
LAN.** Spark Control is the single HTTP endpoint every app calls; the two Sparks split by role:
- **LLM Spark** — vLLM, OpenAI-compatible. Serves whichever general model is currently
  activated (daily driver right now: **Qwen3.6**; Gemma and others are downloaded and
  hot-swappable from the Spark Control dashboard).
- **Audio / speech Spark** — Parakeet (STT), Kokoro (TTS), Sortformer + TitaNet (diarization),
  **bge-m3 embeddings + Qdrant**, and the rerank model. It also hosts the **matrix-bridge**
  container (on the WireGuard subnet).

The Sparks are treated as real production capacity: recap / recap-relay (transcription +
analysis), ten31-database (CRM pipeline), ten31-signal-engine, and ten31-transcripts already
depend on them.

**Don't hardcode a model name.** Route to the Spark Control gateway and ask its API which
model is live — that single-endpoint indirection is the point; the active model changes when
the owner swaps it from the dashboard.

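A minimal sketch of that lookup, assuming the gateway exposes (or proxies) the
OpenAI-compatible `GET /v1/models` route that vLLM serves. The base URL and the rule that
the first listed model is the active one are assumptions, not documented Spark Control
behavior; substitute the gateway's real API.

```python
import json
import urllib.request

# Hypothetical gateway address; replace with the real Spark Control URL.
GATEWAY_URL = "http://spark-control.lan:8080"


def active_model_from_payload(payload: dict) -> str:
    """Pick the live model id out of an OpenAI-style /v1/models response.

    Assumes the shape {"data": [{"id": "..."}, ...]} and that the first
    entry is the currently activated model.
    """
    models = payload.get("data") or []
    if not models:
        raise RuntimeError("gateway reported no active model")
    return models[0]["id"]


def fetch_active_model(base_url: str = GATEWAY_URL, timeout: float = 5.0) -> str:
    """Ask the gateway which model is live instead of hardcoding a name."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return active_model_from_payload(json.load(resp))
```

An app would call `fetch_active_model()` at startup (or per request) and pass the returned
id as the `model` field of its chat or completion calls, so a dashboard swap needs no code
change.
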
**Data layer defaults** — SQLite for structured data; **Qdrant + bge-m3** (both on the
audio/speech Spark) when semantic retrieval is needed, with per-project collections; flat
files when that's the honest answer.

**Sovereignty boundary (standing rule)** — anything touching sensitive investor, LP, or
portfolio data uses local models only, via the Spark Control gateway, behind a redaction
boundary wherever free text could carry names. Frontier APIs (Anthropic etc.) are fine for
everything else. Non-negotiable per project; the only question is which side of the line the
project's data sits on — and AGENTS.md must state it so a session never wires a frontier call
to payload data.

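One way to picture the redaction boundary is a step that swaps known names for stable
placeholders before free text reaches a model. The `redact` helper and the idea of a
caller-supplied name list are hypothetical, for illustration only; a real boundary would
draw names from the project's own data and likely needs more than exact matching.

```python
import re


def redact(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known name with a stable placeholder.

    Returns the redacted text plus a placeholder-to-name map so the caller
    can restore names in the model's output. Exact, case-insensitive
    whole-word matching only; this is not a complete PII scrubber.
    """
    mapping: dict[str, str] = {}
    # Longest names first so a longer name wins over a shorter one it contains.
    for i, name in enumerate(sorted(names, key=len, reverse=True), start=1):
        placeholder = f"[NAME_{i}]"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        text, count = pattern.subn(placeholder, text)
        if count:
            mapping[placeholder] = name
    return text, mapping
```

Only the redacted text crosses the boundary; the mapping stays in the calling app.
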
**Access / networking** — three mechanisms, no others (Proton VPN and Tor were legacy and are
not in use):
- **LAN** — the default; apps, Sparks, and the box share it.
- **WireGuard** — how the owner's own devices reach LAN-only services when off-LAN.
- **StartTunnel** — Start9's ClearNet feature; publicly exposes selected services (recap at
  `recaps.cc`, Synapse/Matrix, and the ten31-database CRM — the CRM is ClearNet-exposed with
  app-level user auth so only the team reaches it).

**Dev machine** — macOS with Claude Code; also the s9pk / macOS-app build host. One-off and
personal CLI tools live here happily.

## Decision sequence (stable)

Walk these in order; each answer narrows the next.

**1. Sensitivity.** Does the project ingest, store, or send investor/LP/portfolio data to a
model? If yes: local inference mandatory, hosting on the home subnet strongly preferred, and
AGENTS.md must state the constraint explicitly so a coding session never "helpfully" wires in
a frontier API call with payload data.

**2. Runtime shape.** One-shot CLI / scheduled job / long-running service / interactive UI?
- One-shot or personal CLI → Mac. Don't deploy what doesn't need deploying.
- Scheduled job → Mac launchd if it only matters while the laptop lives; Start9 if it must
  run unattended 24/7.
- Long-running service, or anything other devices/family/agents need to reach → Start9.

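The step 2 rules can be sketched as a small function. The names (`Shape`, `place`) are
hypothetical, and the handling of an interactive UI is an assumption: the bullets above do
not place it directly, so the sketch sends it to Start9 only when others need to reach it.

```python
from enum import Enum


class Shape(Enum):
    ONE_SHOT = "one-shot or personal CLI"
    SCHEDULED = "scheduled job"
    SERVICE = "long-running service"
    INTERACTIVE_UI = "interactive UI"


def place(shape: Shape, must_run_unattended: bool = False,
          reached_by_others: bool = False) -> str:
    """Return "mac" or "start9" following the step 2 bullets."""
    # Anything other devices, family, or agents must reach goes to Start9.
    if reached_by_others or shape is Shape.SERVICE:
        return "start9"
    # Scheduled jobs stay on the Mac unless they must run 24/7.
    if shape is Shape.SCHEDULED:
        return "start9" if must_run_unattended else "mac"
    # One-shot tools (and, by assumption, a purely personal UI) stay local.
    return "mac"
```

For example, `place(Shape.SCHEDULED, must_run_unattended=True)` returns `"start9"`, while
`place(Shape.ONE_SHOT)` returns `"mac"`.
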
**3. If Start9: s9pk or plain container?** s9pk earns its packaging cost when the service
wants the StartOS lifecycle — backups, health checks, dependency management, clean updates —
or could plausibly be published for others. Plain container (or script) wins for experiments,
single-user glue, and anything still changing shape weekly. Default for prototypes: container
now, promote to s9pk if it survives and stabilizes. Packaging for 0.4.x is nontrivial; don't
pay it on spec.

**4. Model routing.** Default to the local model via the Spark Control gateway when the
sovereignty boundary applies, when latency/cost favor local, or when the task is well within
the local model's capability. Don't hardcode a model name — call the gateway and ask which
model is active. Route to frontier (Claude API) for hard reasoning on non-sensitive data.
Record the chosen endpoint (gateway vs frontier) in AGENTS.md so sessions don't guess.

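As a rough sketch, the routing rule and its hard constraint might look like this. The
function names and the `"gateway"` / `"frontier"` labels are illustrative; latency and cost
trade-offs are left to the caller and are not modeled here.

```python
def choose_endpoint(sensitive: bool, hard_reasoning: bool) -> str:
    """Return "gateway" (local, via Spark Control) or "frontier"."""
    if sensitive:
        # Sovereignty boundary: local models only, regardless of difficulty.
        return "gateway"
    # Frontier only for hard reasoning on non-sensitive data; local otherwise.
    return "frontier" if hard_reasoning else "gateway"


def check_call(endpoint: str, sensitive: bool) -> None:
    """Guard to run before any model call; raises on a boundary violation."""
    if sensitive and endpoint != "gateway":
        raise PermissionError("sensitive data may only go to the local gateway")
```

The guard is the piece worth keeping even when routing is decided by hand: it turns the
constraint written in AGENTS.md into something that fails loudly at runtime.
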
**5. Data layer.** SQLite unless there's a reason; Qdrant + bge-m3 when retrieval quality is
the product; flat files for logs and artifacts. Name Qdrant collections per-project to avoid
the shared-collection mess.

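A per-project naming convention could be enforced with a helper like the one below. The
`project__purpose` scheme and the `collection_name` function are assumptions for
illustration, not an existing convention in these repos.

```python
import re


def _slug(value: str) -> str:
    """Lowercase and collapse anything outside a-z / 0-9 into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot build a collection name from {value!r}")
    return slug


def collection_name(project: str, purpose: str = "docs") -> str:
    """Build a collection name that always carries its project prefix."""
    return f"{_slug(project)}__{_slug(purpose)}"
```

With this, `collection_name("ten31-database", "notes")` gives `"ten31-database__notes"`, so
two projects can never write into the same collection by accident.
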
**6. Interface.** CLI first unless the UI *is* the product. If it must be reachable from the
phone or by the team off-LAN, decide up front how: expose it over ClearNet via StartTunnel
with app-level auth (how the CRM and `recaps.cc` are reached), or keep it LAN-only and reach
it over WireGuard from your own devices.

**7. Repo home.** Gitea on Start9. Always — even for parked-then-revived ideas, so history
accumulates in one place.

## Phase-exit criteria — the substance rule

Phase exits are falsifiable substance: numbers and demonstrable behavior. "46/46 tests
pass," "recap generated from a real 40-minute call in under 2 minutes," "correct doc in
top-3 for 9/10 canned queries." If the criterion can't fail, it isn't a criterion.

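A criterion like the top-3 example can be written as a check that is able to fail. This is
a sketch with hypothetical names; `results` stands for whatever ranked document ids the
project's retrieval returns per canned query.

```python
def top_k_hits(results: dict[str, list[str]], expected: dict[str, str],
               k: int = 3) -> int:
    """Count canned queries whose expected doc id is in the first k results."""
    return sum(
        1 for query, doc_id in expected.items()
        if doc_id in results.get(query, [])[:k]
    )


def phase_exit_met(results: dict[str, list[str]], expected: dict[str, str],
                   k: int = 3, required: int = 9) -> bool:
    """True only if at least `required` queries hit within the top k."""
    return top_k_hits(results, expected, k) >= required
```

Run against ten canned queries with `required=9`, this reproduces the "9/10" criterion
above and returns `False` the moment retrieval quality regresses.
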
## Maintenance

The **infrastructure facts** section is the part that goes stale. When the infra changes —
new hardware, StartOS version, model lineup, network setup, a service added or retired —
update that section here rather than working around it in conversation. The decision sequence
and the substance rule rarely change.