test(soak): add genie-ai-runtime 24h soak harness (#113) #130

Merged: ai-hpc merged 4 commits into GeniePod:main from web-dev0521:feat/issue-113-runtime-soak-test on May 22, 2026


@web-dev0521
Contributor

Summary

Adds the soak-test harness for #113 (M1 exit: genie-ai-runtime v1 backend stable for 24h) under tests/soak/. This delivers the "soak script" half of the issue's last acceptance item. The 24h run itself — and the committed telemetry artifact it produces — is a maintainer hardware step on a Jetson Orin Nano Super 8 GB; this PR is the tooling that drives and scores that run. Part of #113 (not closing it).

Changes

  • tests/soak/soak_driver.py — stdlib pacing driver: every N seconds POSTs /api/chat/stream (timing first-token latency from the NDJSON stream) and polls /api/health for genie-ai-runtime reachability + memory; samples runtime RSS; writes telemetry.jsonl. Sends a fresh conversation_id per tick by default so accumulating history can't inflate first-token latency at hours 12/24 and produce a false regression on the very criterion being scored (--conversation shared opts into the growing-context profile).
  • tests/soak/analyze_soak.py — scores telemetry + tegrastats + journal against all six acceptance criteria → report.md + summary.json; exits non-zero on any failure. Missing inputs report N/A and are excluded from the gate (never a silent pass). --self-check scores a committed fixture offline.
  • tests/soak/genie-soak.sh — systemd-aware orchestrator: snapshots unit state + NRestarts, runs tegrastats, drives the workload, captures dmesg/journal for the OOM scan, then scores the run.
  • tests/soak/example/ — a synthetic fixture (not a real run) + its generator, so CI/reviewers can exercise the analyzer without hardware.
  • tests/soak/README.md, .gitignore (ignores runs/); make soak-selfcheck target.
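As an illustration of the driver's two key ideas (timing the first token off an NDJSON stream, and using a fresh conversation_id per tick), here is a minimal sketch. It is not the code in soak_driver.py: the `token` field name, the function names, and the `<run>-<tick>` id format are assumptions made for the example.

```python
import itertools
import json
import time
from typing import Callable, Iterable, Iterator, Optional


def first_token_latency_ms(
    lines: Iterable[bytes],
    started_at: float,
    clock: Callable[[], float] = time.monotonic,
    token_key: str = "token",  # assumed field name; the real stream schema may differ
) -> Optional[float]:
    """Return milliseconds from request start to the first event carrying text."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank keep-alive lines
        try:
            event = json.loads(raw)
        except ValueError:
            continue  # skip malformed lines instead of aborting the tick
        if isinstance(event, dict) and event.get(token_key):
            return (clock() - started_at) * 1000.0
    return None  # stream ended without producing a token


def conversation_ids(mode: str, run_id: str) -> Iterator[str]:
    """Yield one id per tick: unique in 'fresh' mode, constant in 'shared' mode."""
    for tick in itertools.count(1):
        yield f"{run_id}-{tick}" if mode == "fresh" else run_id
```

Passing the clock in as a parameter keeps the timing logic testable without a live server; in fresh mode each tick starts from an empty history, so hour-24 latency is comparable with hour-1 latency.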

Two honest limitations, called out in the README:

  • Latency budget is a placeholder. --budget-p50-ms / --budget-p99-ms default to 1500/4000 ms; the repo pins no real alpha voice-latency number. A latency PASS is not authoritative until that figure is set.
  • This harness does not, by itself, satisfy #113 (M1 exit: genie-ai-runtime client path stable for 24h continuous run): five of the six criteria are verdicts that only a real 24h run can produce.
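To make the latency criterion and the N/A rule concrete, the sketch below scores first-token samples against the placeholder budgets and folds per-criterion verdicts into an overall gate. It is a hedged illustration, not analyze_soak.py itself: the nearest-rank percentile method, the function names, and the choice to report N/A when nothing could be scored are assumptions.

```python
import math
from typing import Dict, Sequence

PASS, FAIL, NA = "PASS", "FAIL", "N/A"


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method; the analyzer may use another)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def latency_verdict(
    samples: Sequence[float],
    budget_p50_ms: float = 1500.0,  # placeholder default quoted in the PR
    budget_p99_ms: float = 4000.0,  # placeholder default quoted in the PR
) -> str:
    """Score first-token latency; no samples means N/A, never PASS."""
    if not samples:
        return NA
    within = (
        percentile(samples, 50) <= budget_p50_ms
        and percentile(samples, 99) <= budget_p99_ms
    )
    return PASS if within else FAIL


def overall(verdicts: Dict[str, str]) -> str:
    """Gate on scored criteria only; N/A entries are excluded from the gate."""
    scored = [v for v in verdicts.values() if v != NA]
    if not scored:
        return NA  # assumption: nothing scored is reported as N/A, not PASS
    return FAIL if FAIL in scored else PASS
```

A caller would map `overall(...)` to the process exit status, for example exiting non-zero whenever the result is FAIL.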

Real Behavior Proof

  • I have built and run the affected code locally (or noted why I could not).
  • I have verified the change end-to-end on Jetson hardware OR explained the equivalent verification path I used.

What I ran

No Jetson available — verified the harness on an x86_64 dev host instead:

# CI lint gates (the Scripts workflow)
ruff check tests/soak/soak_driver.py tests/soak/analyze_soak.py tests/soak/example/make_fixture.py
shellcheck --severity=warning genie-soak.sh   # run on an LF copy

# analyzer scored against the committed example fixture
python tests/soak/analyze_soak.py --self-check

# driver end-to-end against a local mock genie-core implementing
# NDJSON /api/chat/stream + /api/health, fresh and shared modes
python tests/soak/soak_driver.py --core-url http://127.0.0.1:PORT --interval 1 --duration 3 --out t.jsonl

What I observed

  • ruff check: All checks passed.
  • shellcheck --severity=warning: clean (rc=0).
  • analyze_soak.py --self-check: 6/6 verdicts as expected, overall PASS.
  • Driver E2E: NDJSON parsed correctly, first-token latency measured per tick, /api/health reachability recorded, telemetry flushed per tick. Fresh mode produced 3 distinct conversation_ids (…-1/-2/-3); shared mode produced exactly 1 — confirming the latency-comparability fix on the wire.
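For readers who want to reproduce this kind of hardware-free check, a mock of the two endpoints can be built from the standard library alone. The sketch below is an assumption-laden stand-in, not the mock used for this PR: the response bodies, the `message` request field, and the helper names are hypothetical; only the two paths and the conversation_id field come from the description above.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MockCore(BaseHTTPRequestHandler):
    seen_conversation_ids = []  # class-level record of ids received, for assertions

    def log_message(self, *args):
        pass  # keep test output quiet

    def do_GET(self):
        if self.path != "/api/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode()  # assumed health payload
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path != "/api/chat/stream":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        self.seen_conversation_ids.append(request.get("conversation_id"))
        self.send_response(200)
        self.send_header("Content-Type", "application/x-ndjson")
        self.end_headers()
        # One JSON object per line; HTTP/1.0 closes the socket when we return.
        for event in ({"token": "hello"}, {"done": True}):
            self.wfile.write(json.dumps(event).encode() + b"\n")
            self.wfile.flush()


def start_mock() -> ThreadingHTTPServer:
    """Serve on an ephemeral localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), MockCore)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_chat(port: int, conversation_id: str) -> list:
    """POST one chat request and return the decoded NDJSON events."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    body = json.dumps({"conversation_id": conversation_id, "message": "ping"})
    conn.request("POST", "/api/chat/stream", body=body,
                 headers={"Content-Type": "application/json"})
    raw = conn.getresponse().read()
    conn.close()
    return [json.loads(line) for line in raw.splitlines() if line]
```

Pointing the driver's --core-url at `http://127.0.0.1:<port>` of such a server and then inspecting `MockCore.seen_conversation_ids` is one way to confirm fresh versus shared behaviour on the wire.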

Test plan

On a Jetson with genie-core + genie-ai-runtime up under systemd:

  1. Smoke test: tests/soak/genie-soak.sh --duration-h 0.05 --interval 5.
  2. Full run under tmux/nohup: tests/soak/genie-soak.sh --duration-h 24 --interval 30 --budget-p50-ms <ALPHA_P50> --budget-p99-ms <ALPHA_P99>.
  3. Review runs/<ts>/report.md; commit the run dir as the issue's telemetry artifact.
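As a quick sanity check on the durations in the plan, the expected number of driver ticks can be estimated from the two flags. This is a rough sketch with a hypothetical helper name; whether the real driver also fires a tick at t=0 or at the very end is not stated here, so treat the result as approximate.

```python
def expected_ticks(duration_h: float, interval_s: int) -> int:
    """Approximate number of workload ticks in a run of the given length."""
    duration_s = round(duration_h * 3600.0)  # round away float noise (e.g. 0.05 h)
    return int(duration_s // interval_s)


smoke = expected_ticks(0.05, 5)   # 3-minute smoke test
full = expected_ticks(24, 30)     # 24-hour soak
print(f"smoke: {smoke} ticks, full run: {full} ticks")
```

The smoke test therefore yields only a few dozen latency samples, which is enough to validate the pipeline but too few for a meaningful p99; the full run yields thousands.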

Notes for reviewers

  • Please confirm the alpha first-token latency budget so the placeholder defaults can be replaced.
  • Should the driver hit genie-core's /api/chat/stream (current — exercises the real path through the runtime) or genie-ai-runtime's :8080 directly? Open to either; the former matches how /api/health reports reachability.

Pacing driver (soak_driver.py), criteria analyzer (analyze_soak.py) and a
systemd-aware orchestrator (genie-soak.sh) under tests/soak/, plus a synthetic
example fixture for an offline analyzer self-check and a make soak-selfcheck
target. The 24h run itself is a maintainer hardware step; this is the harness
that drives and scores it.

PEP 604 unions (float | None, int | None) in function signatures are evaluated
at definition time and crash the make soak-selfcheck path on Python 3.9. Add
from __future__ import annotations so annotations stay lazy strings; document
Python 3.9+ support for the hardware-free self-check.

Refresh the soak-harness branch onto current GeniePod:main (7f80ca6) so the PR
runs full CI against the latest base. No conflicts: the harness adds only new
files under tests/soak/ and a Makefile edit upstream did not touch.
@ai-hpc
Member

ai-hpc commented May 22, 2026

Review note: not merging this yet. The soak harness self-check currently crashes before it runs on the local python3 here:

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

That comes from evaluated annotations such as float | None in tests/soak/analyze_soak.py. Either require/document Python >= 3.10 explicitly in the harness/Makefile, or add from __future__ import annotations so the no-hardware make soak-selfcheck path works on Python 3.9 too. After that, please refresh against current main so full CI runs.
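The second option can be illustrated with a small standalone module. The function below is hypothetical and not taken from analyze_soak.py; it only shows that, with the future import in place, a PEP 604 union in a signature is stored as a string instead of being evaluated when the function is defined.

```python
from __future__ import annotations  # must be the first statement in the module


def clamp_latency(ms: float | None, ceiling_ms: int | None = None) -> float | None:
    """Return ms capped at ceiling_ms; None means no sample was recorded."""
    if ms is None:
        return None
    if ceiling_ms is None:
        return ms
    return min(ms, float(ceiling_ms))


# The annotations are kept as plain strings, so Python 3.9 never evaluates
# `float | None` at definition time.
print(clamp_latency.__annotations__["ms"])
```

Without the import, Python 3.9 raises the TypeError quoted above as soon as the `def` statement runs. The import only defers evaluation: code that resolves the annotations at runtime, for example through `typing.get_type_hints`, would still fail on 3.9.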

Member

@ai-hpc left a comment


Re-reviewed after the Python 3.9 annotation fix and latest-main refresh. Local python3 tests/soak/analyze_soak.py --self-check, make soak-selfcheck, bash -n, and Python compile checks passed here; GitHub Scripts, CI, and aarch64 cross-compile are green. This lands the harness only and does not close #113; the real 24h Jetson artifact still needs to follow.

@ai-hpc merged commit 561e01b into GeniePod:main on May 22, 2026 (8 checks passed)
@web-dev0521
Contributor Author

@ai-hpc, all checks have passed 👍

@web-dev0521
Contributor Author

Thank you.

@ai-hpc
Member

ai-hpc commented May 22, 2026

merged at 561e01b

Thanks @web-dev0521
