Verify and correct placement guide infra facts with owner
Replace the one-shot/UNVERIFIED infra section with owner-confirmed facts: x86 StartOS 0.4.0 box + full service inventory; the two-Spark role split (LLM vs audio/speech, Qdrant on the audio Spark, matrix-bridge hosted there); route via the Spark Control gateway and query the active model rather than hardcoding one; networking reduced to LAN/WireGuard/StartTunnel (Proton/Tor were legacy). Align decision steps 4 and 6.
+52
-32
@@ -6,39 +6,57 @@ encodes two things: a stable **decision sequence** (rarely changes) and a set of
 every new idea (`guides/new-project.md`, Phase 2); `how-i-work.md` points here so any
 session placing a project consults it rather than guessing.
 
-> ⚠️ **The infrastructure facts below are UNVERIFIED.** They were generated one-shot from
-> chat history and have **not** been confirmed against the actual setup. Treat every fact in
-> the next section as provisional until reviewed and corrected with the user — see the
-> standards `ROADMAP.md` item "Verify & correct the placement guide." The *decision sequence*
-> and the *substance rule* are sound regardless; it's the specific service/model/network
-> facts that need a pass.
+> ✅ **Verified with the owner 2026-06-15** (and cross-checked against the project repos).
+> Keep this section current as the infra changes — see Maintenance. The *decision sequence*
+> and the *substance rule* are stable regardless.
 
-## Infrastructure facts (PROVISIONAL — last generated June 2026, not yet verified)
+## Infrastructure facts (verified 2026-06-15)
 
-**Start9 server** — StartOS 0.4.x. Hosts long-running services as s9pk packages or plain
-containers. Believed running: Gitea (version control for LLM-assisted projects — the default
-repo home), Nextcloud (general file backup), Home Assistant (Container install), Electrs,
-Core Lightning + RTL, Open WebUI as the sovereign chat/session layer.
+**Start9 server** — one box, **StartOS 0.4.0**, **x86_64** (0.4.0 doesn't run on Raspberry
+Pi / ARM, so x86 is the only option — build s9pks `x86_64`). It hosts long-running services
+as s9pk packages. Running on it: Gitea (the default repo home for every project), Nextcloud
+(file backup), Home Assistant, Core Lightning + Ride the Lightning (RTL), Open WebUI (the
+sovereign chat layer), Vaultwarden, and Synapse (the Matrix homeserver, `matrix.gilliam.ai`).
+Every Claude-built app also lives here: recap (public at `recaps.cc`), keysat, premier-gunner,
+proof-of-work, recap-relay, ten31-database, spark-control.
 
-**Inference** — Two NVIDIA DGX Sparks behind the Spark Control HTTP gateway on the LAN,
-serving Qwen3 (vLLM, OpenAI-compatible endpoints) as the primary production backend. Kokoro
-for TTS. bge-m3 for embeddings. Treated as real production capacity — existing apps (call
-transcription/recap, CRM pipeline, email-summary agent) already depend on it.
+**Inference — two NVIDIA DGX Sparks (ARM64), fronted by the Spark Control gateway on the
+LAN.** Spark Control is the single HTTP endpoint every app calls; the two Sparks split by role:
+- **LLM Spark** — vLLM, OpenAI-compatible. Serves whichever general model is currently
+activated (daily driver right now: **Qwen3.6**; Gemma and others are downloaded and
+hot-swappable from the Spark Control dashboard).
+- **Audio / speech Spark** — Parakeet (STT), Kokoro (TTS), Sortformer + TitaNet (diarization),
+**bge-m3 embeddings + Qdrant**, and the rerank model. It also hosts the **matrix-bridge**
+container (on the WireGuard subnet).
 
-**Data layer defaults** — SQLite for structured data; Qdrant + bge-m3 when semantic
-retrieval is needed; flat files when that's the honest answer.
+Treated as real production capacity — recap / recap-relay (transcription + analysis),
+ten31-database (CRM pipeline), ten31-signal-engine, and ten31-transcripts already depend on it.
 
-**Sovereignty boundary (standing rule)** — Anything touching sensitive investor, LP, or
-portfolio data uses local models only, via the Spark gateway. Frontier APIs (Anthropic etc.)
-are fine for everything else. Non-negotiable per project; the only question is which side of
-the line the project's data sits on.
+**Don't hardcode a model name.** Route to the Spark Control gateway and ask its API which
+model is live — that single-endpoint indirection is the point; the active model changes when
+the owner swaps it from the dashboard.
 
-**Access** — WireGuard split-tunnel from macOS to the home subnet (runs alongside Proton
-VPN). iOS is constrained to a single VPN tunnel; workarounds are Tor onion addresses or a
-merged WireGuard config. So "reachable from phone" is a real design constraint, not a
-footnote.
+**Data layer defaults** — SQLite for structured data; **Qdrant + bge-m3** (both on the
+audio/speech Spark) when semantic retrieval is needed, with per-project collections; flat
+files when that's the honest answer.
 
-**Dev machine** — macOS with Claude Code. One-off and personal CLI tools live here happily.
+**Sovereignty boundary (standing rule)** — anything touching sensitive investor, LP, or
+portfolio data uses local models only, via the Spark Control gateway, behind a redaction
+boundary wherever free text could carry names. Frontier APIs (Anthropic etc.) are fine for
+everything else. Non-negotiable per project; the only question is which side of the line the
+project's data sits on — and AGENTS.md must state it so a session never wires a frontier call
+to payload data.
+
+**Access / networking** — three mechanisms, no others (Proton VPN and Tor were legacy and are
+not in use):
+- **LAN** — the default; apps, Sparks, and the box share it.
+- **WireGuard** — how the owner's own devices reach LAN-only services when off-LAN.
+- **StartTunnel** — Start9's ClearNet feature; publicly exposes selected services (recap at
+`recaps.cc`, Synapse/Matrix, and the ten31-database CRM — the CRM is ClearNet-exposed with
+app-level user auth so only the team reaches it).
+
+**Dev machine** — macOS with Claude Code; also the s9pk / macOS-app build host. One-off and
+personal CLI tools live here happily.
 
 ## Decision sequence (stable)
 
@@ -62,18 +80,20 @@ single-user glue, and anything still changing shape weekly. Default for prototyp
 now, promote to s9pk if it survives and stabilizes. Packaging for 0.4.x is nontrivial; don't
 pay it on spec.
 
-**4. Model routing.** Default to local Qwen3 via the Spark gateway when the sovereignty
-boundary applies, when latency/cost favor local, or when the task is well within Qwen3's
-capability. Route to frontier (Claude API) for hard reasoning on non-sensitive data. Record
-the chosen endpoint in AGENTS.md so sessions don't guess.
+**4. Model routing.** Default to the local model via the Spark Control gateway when the
+sovereignty boundary applies, when latency/cost favor local, or when the task is well within
+the local model's capability. Don't hardcode a model name — call the gateway and ask which
+model is active. Route to frontier (Claude API) for hard reasoning on non-sensitive data.
+Record the chosen endpoint (gateway vs frontier) in AGENTS.md so sessions don't guess.
 
 **5. Data layer.** SQLite unless there's a reason; Qdrant + bge-m3 when retrieval quality is
 the product; flat files for logs and artifacts. Name Qdrant collections per-project to avoid
 the shared-collection mess.
 
 **6. Interface.** CLI first unless the UI *is* the product. If it must be reachable from the
-phone, remember the iOS single-tunnel constraint — decide up front whether that means onion
-address, merged WireGuard config, or "Mac-only is fine."
+phone or by the team off-LAN, decide up front how: expose it over ClearNet via StartTunnel
+with app-level auth (how the CRM and `recaps.cc` are reached), or keep it LAN-only and reach
+it over WireGuard from your own devices.
 
 **7. Repo home.** Gitea on Start9. Always — even for parked-then-revived ideas, so history
 accumulates in one place.
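The "Don't hardcode a model name" rule added in this commit could look roughly like the sketch below. It assumes the Spark Control gateway passes through the OpenAI-compatible `GET /v1/models` listing that vLLM serves; the guide does not document the gateway's API, so that path, the `GATEWAY_URL` value, and both function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical address: the real Spark Control URL is not given in the guide.
GATEWAY_URL = "http://spark-control.lan:8000"


def pick_active_model(models_payload: dict) -> str:
    """Return the id of the first model in an OpenAI-style /v1/models response."""
    data = models_payload.get("data") or []
    if not data:
        raise RuntimeError("gateway reported no active model")
    return data[0]["id"]


def fetch_active_model(base_url: str = GATEWAY_URL, timeout: float = 5.0) -> str:
    """Ask the gateway which model is live instead of hardcoding a name.

    Assumes the gateway exposes the OpenAI-compatible /v1/models route.
    """
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return pick_active_model(json.load(resp))
```

An app would call `fetch_active_model()` at startup (or per request) and pass the returned id as the `model` field of its completion calls, so a dashboard swap needs no code change.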
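Decision step 4 and the sovereignty boundary together amount to a small routing policy. The sketch below is one possible reading of it, not the project's code: the `Task` fields and the `choose_endpoint` name are illustrative, and "hard reasoning" is reduced to a single boolean that a real project would have to judge per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    touches_sensitive_data: bool  # investor, LP, or portfolio data
    needs_hard_reasoning: bool    # beyond what the local model handles well


def choose_endpoint(task: Task) -> str:
    """Return "gateway" (local, via Spark Control) or "frontier" (Claude API)."""
    if task.touches_sensitive_data:
        # Sovereignty boundary: local models only, regardless of difficulty.
        return "gateway"
    if task.needs_hard_reasoning:
        # Frontier is allowed only for non-sensitive data.
        return "frontier"
    # Default: local is fine for everything within its capability.
    return "gateway"
```

The sensitive-data check comes first on purpose: a hard task on sensitive data still stays local. Whatever the function returns is what step 4 asks to be recorded in AGENTS.md.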