v0.5.0 - Wake-on-LAN + connectivity history
wol.py:
- build_magic_packet(): standard layout, 6 x 0xFF followed by 16 x MAC
- send_local_broadcast(): sends directly from the container, to UDP ports 9 and 7 to cover both common WoL ports
- send_via_peer(): preferred path; SSHes to the OTHER Spark and runs a Python one-liner there so the packet originates on the target's LAN segment (most reliable)
- MAC validation + normalization
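
A minimal sketch of the packet layout and the direct-broadcast path described above. The function bodies are illustrative rather than the contents of wol.py; `normalize_mac`, the default broadcast address, and the parameter names are assumptions.

```python
import re
import socket

# Assumption: MACs are normalized to lowercase, colon-separated hex.
_MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def normalize_mac(mac: str) -> str:
    """Return the MAC as lowercase colon-separated hex, or raise ValueError."""
    cleaned = mac.strip().lower().replace("-", ":")
    if not _MAC_RE.fullmatch(cleaned):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return cleaned


def build_magic_packet(mac: str) -> bytes:
    """Six 0xFF bytes followed by the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(normalize_mac(mac).replace(":", ""))
    return b"\xff" * 6 + mac_bytes * 16


def send_local_broadcast(mac: str, ports=(9, 7), address="255.255.255.255") -> None:
    """Broadcast the magic packet over UDP to each port in turn."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcast sends are rejected unless SO_BROADCAST is enabled.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for port in ports:
            sock.sendto(packet, (address, port))
```

The packet is always 102 bytes (6 + 16 x 6). A broadcast sent from inside a container may not reach the physical LAN, depending on the container's network mode, which is consistent with the via-peer path being preferred.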
connectivity.py:
- /data/connectivity.json persistence (thread-safe, atomic rename)
- Stores per-Spark current state + last_change timestamp + rolling 200-event log
- Records up/down transitions; computes down_seconds / up_seconds durations
- MAC cache populated lazily during hardware probes
hardware.py:
- Probe now reads MAC via /sys/class/net/<default-route-iface>/address
- After each probe, record_state() emits a transition event if state changed
- record_mac() caches the address so WoL works when the Spark next goes down
Endpoints:
- GET /api/connectivity: macs, current state, last_change, events[]
- POST /api/spark/{name}/wake: tries via-peer first, falls back to direct broadcast
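
The try-then-fall-back behaviour of the wake endpoint could look roughly like this. It is a sketch only: `wake_with_fallback`, the `Sender` type, and the response shape are hypothetical, and the two senders stand in for `send_via_peer()` and `send_local_broadcast()`.

```python
from typing import Callable

# Assumption: a sender takes a MAC and raises on failure.
Sender = Callable[[str], None]


def wake_with_fallback(mac: str, via_peer: Sender, direct: Sender) -> dict:
    """Try the via-peer path first, then the direct broadcast."""
    errors = {}
    for method, sender in (("via_peer", via_peer), ("direct_broadcast", direct)):
        try:
            sender(mac)
        except Exception as exc:  # any failure moves on to the next path
            errors[method] = str(exc)
        else:
            return {"ok": True, "method": method, "errors": errors}
    return {"ok": False, "method": None, "errors": errors}
```

Returning the per-path errors alongside the method that succeeded lets a caller show why the preferred path was skipped.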
UI:
- Unreachable hardware card shows the cached MAC + 'Wake (WoL)' button (only if MAC known)
- New 'Connectivity log' button opens a modal with per-Spark transition history (last 25 each), including duration of each prior up/down period
- pollHardware also pulls /api/connectivity so WoL buttons appear without an extra fetch
Package: bump to 0.5.0:0; main.ts sets CONNECTIVITY_LOG=/data/connectivity.json
+12
-4
@@ -10,6 +10,7 @@ import time
 from typing import Any

 from .config import Settings
+from .connectivity import record_mac, record_state
 from .ssh import ssh_run

@@ -23,6 +24,8 @@ echo MEMORY=$(free -b 2>/dev/null | awk '/^Mem:/ {print $2, $3}')
 echo DISK=$(df -B1 / 2>/dev/null | awk 'NR==2 {print $2, $3}')
 echo GPU=$(nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.draw,memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
 echo GPU_MEM_USED_MIB=$(nvidia-smi --query-compute-apps=used_gpu_memory --format=csv,noheader,nounits 2>/dev/null | awk '{s+=$1} END {print s+0}')
+DEFIF=$(ip route show default 2>/dev/null | awk '{print $5; exit}')
+echo MAC=$(cat /sys/class/net/$DEFIF/address 2>/dev/null)
 """.strip()

@@ -78,6 +81,9 @@ def _parse(out: str) -> dict:
     # Sum per-process compute memory (works even on unified-memory systems)
     if info.get("gpu_mem_used_mib"):
         parsed["gpu_mem_used_mib"] = _parse_int(info["gpu_mem_used_mib"])
+    # MAC address on the default-route interface (for Wake-on-LAN)
+    if info.get("mac"):
+        parsed["mac"] = info["mac"].lower()
     return parsed

@@ -118,12 +124,14 @@ class HardwareProbe:
         # marked this host unreachable, return the cached failure immediately.
         rc, out, err = await ssh_run(host, user, _PROBE, self.settings, timeout=6)
         if rc != 0:
-            # Cache failures for a slightly longer TTL so the dashboard isn't
-            # blocked behind 6 s of SSH timeout on every poll.
             result = {"reachable": False, "configured": True, "host": host, "error": err.strip() or out.strip() or f"rc={rc}"}
             self._cache[key] = (now, result)
-            # Override the TTL effectively by inserting a sentinel into the cache age
+            record_state(key, False)
             return result
-        result = {"reachable": True, "configured": True, "host": host, **_parse(out)}
+        parsed = _parse(out)
+        result = {"reachable": True, "configured": True, "host": host, **parsed}
         self._cache[key] = (now, result)
+        record_state(key, True)
+        if parsed.get("mac"):
+            record_mac(key, parsed["mac"])
         return result
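
For illustration, the MAC handling added to `_parse()` can be exercised in isolation. The sketch below assumes the probe output is split into lowercase KEY=VALUE pairs before the `info.get("mac")` check; `parse_probe_output` is a hypothetical stand-in, and the real `_parse()` handles more fields than shown.

```python
def parse_probe_output(out: str) -> dict:
    """Extract the MAC from KEY=VALUE probe output (other fields omitted)."""
    info = {}
    for line in out.splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            info[key.strip().lower()] = value.strip()

    parsed = {}
    # An empty MAC= line (no default route, or unreadable sysfs file) is skipped.
    if info.get("mac"):
        parsed["mac"] = info["mac"].lower()
    return parsed
```

If `ip route show default` prints nothing, `DEFIF` is empty, the `cat` fails silently, and the probe emits a bare `MAC=`. In that case no `mac` key is produced and `record_mac()` is not called.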