test(soak): add genie-ai-runtime 24h soak harness (#113) #130
Pacing driver (soak_driver.py), criteria analyzer (analyze_soak.py) and a systemd-aware orchestrator (genie-soak.sh) under tests/soak/, plus a synthetic example fixture for an offline analyzer self-check and a make soak-selfcheck target. The 24h run itself is a maintainer hardware step; this is the harness that drives and scores it.
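As a rough illustration of the scoring rule described in the PR summary (missing inputs report N/A and are excluded from the gate, never counted as a silent pass), here is a minimal sketch. It is not the code in analyze_soak.py: the `gate` function and the criterion names are hypothetical, and treating a run with nothing scored as a failure is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Verdict per criterion: True = PASS, False = FAIL, None = N/A (input missing).
Verdicts = Dict[str, Optional[bool]]


def gate(verdicts: Verdicts) -> Tuple[str, int, List[str]]:
    """Return (overall label, process exit code, criteria reported as N/A)."""
    not_scored = sorted(name for name, v in verdicts.items() if v is None)
    scored = [v for v in verdicts.values() if v is not None]
    if not scored:
        # Assumption of this sketch: a run with nothing scored must not look like a pass.
        return "N/A", 1, not_scored
    if all(scored):
        return "PASS", 0, not_scored
    return "FAIL", 1, not_scored


# Hypothetical criterion names, for illustration only.
example = {"latency": True, "oom_scan": None, "restarts": True}
overall, exit_code, not_scored = gate(example)
print(overall, exit_code, not_scored)
```

Returning the N/A names alongside the verdict keeps the excluded criteria visible in the report instead of letting them vanish from the result.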
GeniePod#113) PEP 604 unions (float | None, int | None) in function signatures are evaluated at definition time and crash the make soak-selfcheck path on Python 3.9. Add from __future__ import annotations so annotations stay lazy strings; document Python 3.9+ support for the hardware-free self-check.
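The effect of that fix can be reproduced in isolation. The sketch below compiles the same hypothetical signature twice, once with the future import prepended and once without; the `score` function is invented for illustration and is not taken from the harness. Source strings are used so the snippet runs unchanged on Python 3.9 and newer.

```python
import sys

# Hypothetical signature in the style the commit describes; not copied from the harness.
SIGNATURE = (
    "def score(p50: float | None, restarts: int | None = None) -> bool | None:\n"
    "    return None\n"
)


def define(source: str) -> dict:
    """Compile and run `source` in a fresh namespace and return that namespace."""
    namespace: dict = {}
    exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace


# With the future import, annotations are stored as strings and are not
# evaluated while the function is being defined.
lazy = define("from __future__ import annotations\n" + SIGNATURE)
print(lazy["score"].__annotations__)

# Without it, Python 3.9 evaluates `float | None` at definition time and
# raises TypeError; Python 3.10 and later accept the same source.
try:
    define(SIGNATURE)
    eager_ok = True
except TypeError:
    eager_ok = False
print(sys.version_info[:2], "eager definition ok:", eager_ok)
```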
Refresh the soak-harness branch onto current GeniePod:main (7f80ca6) so the PR runs full CI against the latest base. No conflicts: the harness only adds new files under tests/soak/ plus a Makefile edit that upstream did not touch.
Review note: not merging this yet. The soak harness self-check currently crashes before it runs on the local That comes from evaluated annotations such as
ai-hpc
left a comment
Re-reviewed after the Python 3.9 annotation fix and latest-main refresh. Local python3 tests/soak/analyze_soak.py --self-check, make soak-selfcheck, bash -n, and Python compile checks passed here; GitHub Scripts, CI, and aarch64 cross-compile are green. This lands the harness only and does not close #113; the real 24h Jetson artifact still needs to follow.
@ai-hpc, all checks have passed. 👍
Thank you.
Merged at 561e01b. Thanks, @web-dev0521.
Summary
Adds the soak-test harness for #113 (M1 exit: genie-ai-runtime v1 backend stable for 24h) under `tests/soak/`. This delivers the "soak script" half of the issue's last acceptance item. The 24h run itself, along with the committed telemetry artifact it produces, is a maintainer hardware step on a Jetson Orin Nano Super 8 GB; this PR is the tooling that drives and scores that run. Part of #113 (not closing it).
Changes
- `tests/soak/soak_driver.py`: stdlib pacing driver. Every N seconds it POSTs `/api/chat/stream` (timing first-token latency from the NDJSON stream) and polls `/api/health` for `genie-ai-runtime` reachability + memory; it also samples runtime RSS and writes `telemetry.jsonl`. It sends a fresh `conversation_id` per tick by default so accumulating history can't inflate first-token latency at hours 12/24 and produce a false regression on the very criterion being scored (`--conversation shared` opts into the growing-context profile).
- `tests/soak/analyze_soak.py`: scores telemetry + tegrastats + journal against all six acceptance criteria and writes `report.md` + `summary.json`; exits non-zero on any failure. Missing inputs report N/A and are excluded from the gate (never a silent pass). `--self-check` scores a committed fixture offline.
- `tests/soak/genie-soak.sh`: systemd-aware orchestrator. Snapshots unit state + `NRestarts`, runs `tegrastats`, drives the workload, captures dmesg/journal for the OOM scan, then scores the run.
- `tests/soak/example/`: a synthetic fixture (not a real run) + its generator, so CI/reviewers can exercise the analyzer without hardware.
- `tests/soak/README.md`, `.gitignore` (ignores `runs/`); `make soak-selfcheck` target.
Two honest limitations, called out in the README:
- `--budget-p50-ms`/`--budget-p99-ms` default to 1500/4000 ms; the repo pins no real alpha voice-latency number. A latency PASS is not authoritative until that figure is set.
Real Behavior Proof
What I ran
No Jetson available — verified the harness on an x86_64 dev host instead:
What I observed
- `ruff check` → All checks passed.
- `shellcheck --severity=warning` → clean (rc=0).
- `analyze_soak.py --self-check` → 6/6 verdicts as expected, overall PASS.
- `/api/health` reachability recorded, telemetry flushed per tick. Fresh mode produced 3 distinct `conversation_id`s (…-1/-2/-3); shared mode produced exactly 1, confirming the latency-comparability fix on the wire.
Test plan
On a Jetson with `genie-core` + `genie-ai-runtime` up under systemd:
- `tests/soak/genie-soak.sh --duration-h 0.05 --interval 5`.
- `tests/soak/genie-soak.sh --duration-h 24 --interval 30 --budget-p50-ms <ALPHA_P50> --budget-p99-ms <ALPHA_P99>`.
- `runs/<ts>/report.md`; commit the run dir as the issue's telemetry artifact.
Notes for reviewers
`genie-core`'s `/api/chat/stream` (current; exercises the real path through the runtime) or `genie-ai-runtime`'s `:8080` directly? Open to either; the former matches how `/api/health` reports reachability.
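To make that choice concrete, here is a minimal sketch of the per-tick measurement described under Changes: timing the first token in an NDJSON stream, and issuing a fresh `conversation_id` per tick versus a shared one. It is not the code in `soak_driver.py`. The function names, the `{"token": ...}` record shape, and the `soak` prefix are assumptions made for illustration, and the HTTP call is replaced by an in-memory list of lines so the logic can be checked offline.

```python
import itertools
import json
import time
from typing import Callable, Iterable, Iterator, Optional


def first_token_latency_ms(
    lines: Iterable[str],
    started: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Milliseconds from `started` to the first NDJSON record carrying a token."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON record
        # Assumption: token records look like {"token": "..."}; the real schema may differ.
        if isinstance(record, dict) and record.get("token"):
            return (clock() - started) * 1000.0
    return None  # the stream ended without producing a token


def conversation_ids(prefix: str, mode: str = "fresh") -> Iterator[str]:
    """Yield one id per tick: a new id each time, or the same id in shared mode."""
    if mode == "shared":
        return itertools.repeat(prefix)
    return (f"{prefix}-{n}" for n in itertools.count(1))


# Offline demonstration with a fake clock and a canned stream.
fake_times = iter([0.0, 0.25])


def fake_clock() -> float:
    return next(fake_times)


stream = ['{"status": "started"}', "", '{"token": "Hi"}', '{"token": " there"}']
latency = first_token_latency_ms(stream, fake_clock(), fake_clock)
fresh = list(itertools.islice(conversation_ids("soak"), 3))
shared = list(itertools.islice(conversation_ids("soak", "shared"), 3))
print(latency, fresh, shared)
```

Because the timing function only consumes an iterable of lines, the same logic would apply whichever endpoint the stream comes from; only the request that produces the lines would change.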