test(soak): add genie-ai-runtime 24h soak harness (#113) by web-dev0521 · Pull Request #130 · GeniePod/genie-claw

web-dev0521 · 2026-05-20T20:24:17Z

Summary

Adds the soak-test harness for #113 (M1 exit: genie-ai-runtime v1 backend stable for 24h) under tests/soak/. This delivers the "soak script" half of the issue's last acceptance item. The 24h run itself — and the committed telemetry artifact it produces — is a maintainer hardware step on a Jetson Orin Nano Super 8 GB; this PR is the tooling that drives and scores that run. Part of #113 (not closing it).

Changes

tests/soak/soak_driver.py — stdlib pacing driver: every N seconds POSTs /api/chat/stream (timing first-token latency from the NDJSON stream) and polls /api/health for genie-ai-runtime reachability + memory; samples runtime RSS; writes telemetry.jsonl. Sends a fresh conversation_id per tick by default so accumulating history can't inflate first-token latency at hours 12/24 and produce a false regression on the very criterion being scored (--conversation shared opts into the growing-context profile).
tests/soak/analyze_soak.py — scores telemetry + tegrastats + journal against all six acceptance criteria → report.md + summary.json; exits non-zero on any failure. Missing inputs report N/A and are excluded from the gate (never a silent pass). --self-check scores a committed fixture offline.
tests/soak/genie-soak.sh — systemd-aware orchestrator: snapshots unit state + NRestarts, runs tegrastats, drives the workload, captures dmesg/journal for the OOM scan, then scores the run.
tests/soak/example/ — a synthetic fixture (not a real run) + its generator, so CI/reviewers can exercise the analyzer without hardware.
tests/soak/README.md, .gitignore (ignores runs/); make soak-selfcheck target.
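
For readers unfamiliar with timing a streamed response, the first-token measurement described for soak_driver.py can be sketched roughly as follows. This is an illustration, not the code in this PR: the function name, the injectable clock, and the rule that any parseable JSON line counts as the first token are all assumptions, since the stream schema is not shown here.

```python
import json
import time
from typing import Callable, Iterable, Optional


def first_token_latency_ms(
    lines: Iterable[bytes],
    started_at: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Return milliseconds from started_at to the first parseable NDJSON line.

    Assumption: the first line that decodes as JSON marks the first token.
    Returns None if the stream ends without one.
    """
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank keep-alive lines
        try:
            json.loads(text)
        except json.JSONDecodeError:
            continue  # ignore a malformed line instead of aborting the tick
        return (clock() - started_at) * 1000.0
    return None
```

With the standard library, the response object returned by urllib.request.urlopen can be iterated line by line, so a driver could record time.monotonic() just before sending the POST and pass the response straight in as lines. A monotonic clock is used so that wall-clock adjustments during a 24h run cannot distort the measurement.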

Two honest limitations, called out in the README:

Latency budget is a placeholder. --budget-p50-ms / --budget-p99-ms default to 1500/4000 ms; the repo pins no real alpha voice-latency number. A latency PASS is not authoritative until that figure is set.
This harness does not, by itself, satisfy #113 (M1 exit: genie-ai-runtime client path stable for 24h continuous run): five of the six criteria are verdicts that only a real 24h run produces.
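
To make the budget and N/A semantics concrete, here is a minimal sketch of how a latency criterion could be scored. Only the 1500/4000 ms defaults and the "N/A is excluded from the gate" rule come from this PR; the function names and the nearest-rank percentile method are assumptions, and analyze_soak.py may compute percentiles differently.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method); samples must be non-empty."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def latency_verdict(
    samples: List[float],
    budget_p50_ms: float = 1500.0,  # placeholder default from this PR
    budget_p99_ms: float = 4000.0,  # placeholder default from this PR
) -> str:
    """Score first-token latencies; no samples means N/A, never PASS."""
    if not samples:
        return "N/A"
    within_p50 = percentile(samples, 50) <= budget_p50_ms
    within_p99 = percentile(samples, 99) <= budget_p99_ms
    return "PASS" if within_p50 and within_p99 else "FAIL"


def gate_exit_code(verdicts: Dict[str, str]) -> int:
    """Non-zero on any FAIL; N/A verdicts are reported but do not gate."""
    return 1 if "FAIL" in verdicts.values() else 0
```

Returning a distinct N/A string instead of defaulting to PASS keeps a missing input visible in the report while leaving it out of the exit code.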

Real Behavior Proof

I have built and run the affected code locally (or noted why I could not).
I have verified the change end-to-end on Jetson hardware OR explained the equivalent verification path I used.

What I ran

No Jetson available — verified the harness on an x86_64 dev host instead:

# CI lint gates (the Scripts workflow)
ruff check tests/soak/soak_driver.py tests/soak/analyze_soak.py tests/soak/example/make_fixture.py
shellcheck --severity=warning genie-soak.sh   # run on an LF copy

# analyzer scored against the committed example fixture
python tests/soak/analyze_soak.py --self-check

# driver end-to-end against a local mock genie-core implementing
# NDJSON /api/chat/stream + /api/health, fresh and shared modes
python tests/soak/soak_driver.py --core-url http://127.0.0.1:PORT --interval 1 --duration 3 --out t.jsonl

What I observed

ruff check → All checks passed.
shellcheck --severity=warning → clean (rc=0).
analyze_soak.py --self-check → 6/6 verdicts as expected, overall PASS.
Driver E2E: NDJSON parsed correctly, first-token latency measured per tick, /api/health reachability recorded, telemetry flushed per tick. Fresh mode produced 3 distinct conversation_ids (…-1/-2/-3); shared mode produced exactly 1 — confirming the latency-comparability fix on the wire.
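
The fresh versus shared behaviour observed above can be illustrated with a small sketch. The helper name and the "soak-demo" prefix are hypothetical; the real ID prefix is elided in the observation above and is not reproduced here.

```python
def conversation_id(run_prefix: str, tick: int, mode: str = "fresh") -> str:
    """Pick the conversation_id to send on a given tick.

    fresh:  a new ID per tick, so history never accumulates between requests.
    shared: one ID for the whole run, so context grows over time.
    """
    if mode == "fresh":
        return f"{run_prefix}-{tick}"
    if mode == "shared":
        return run_prefix
    raise ValueError(f"unknown conversation mode: {mode!r}")


# Three ticks in each mode, using a hypothetical prefix.
fresh_ids = {conversation_id("soak-demo", t, "fresh") for t in (1, 2, 3)}
shared_ids = {conversation_id("soak-demo", t, "shared") for t in (1, 2, 3)}
```

In this sketch fresh_ids holds three distinct values and shared_ids holds one, which matches the counts reported for the mock run.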

Test plan

On a Jetson with genie-core + genie-ai-runtime up under systemd:

Smoke test: tests/soak/genie-soak.sh --duration-h 0.05 --interval 5.
Full run under tmux/nohup: tests/soak/genie-soak.sh --duration-h 24 --interval 30 --budget-p50-ms <ALPHA_P50> --budget-p99-ms <ALPHA_P99>.
Review runs/<ts>/report.md; commit the run dir as the issue's telemetry artifact.

Notes for reviewers

Please confirm the alpha first-token latency budget so the placeholder defaults can be replaced.
Should the driver hit genie-core's /api/chat/stream (current — exercises the real path through the runtime) or genie-ai-runtime's :8080 directly? Open to either; the former matches how /api/health reports reachability.

Pacing driver (soak_driver.py), criteria analyzer (analyze_soak.py) and a systemd-aware orchestrator (genie-soak.sh) under tests/soak/, plus a synthetic example fixture for an offline analyzer self-check and a make soak-selfcheck target. The 24h run itself is a maintainer hardware step; this is the harness that drives and scores it.

GeniePod#113) PEP 604 unions (float | None, int | None) in function signatures are evaluated at definition time and crash the make soak-selfcheck path on Python 3.9. Add from __future__ import annotations so annotations stay lazy strings; document Python 3.9+ support for the hardware-free self-check.
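
A small demonstration of why the future import helps, under stated assumptions: mean_or_none is a hypothetical function, not one from analyze_soak.py, and the module source is compiled from a string only so the snippet can be pasted anywhere. In a real module the future import has to be the first statement in the file.

```python
# Source of a tiny module that uses PEP 604 unions in its signature.
MODULE_SOURCE = '''
from __future__ import annotations


def mean_or_none(values: list[float]) -> float | None:
    """Average of values, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None
'''

namespace = {}
exec(compile(MODULE_SOURCE, "<lazy_annotations_demo>", "exec"), namespace)
mean_or_none = namespace["mean_or_none"]

# With the future import, annotations are stored as plain strings and are
# never evaluated at definition time, so `float | None` cannot raise here.
return_annotation = mean_or_none.__annotations__["return"]
```

Without the future import, Python 3.9 evaluates float | None while defining the function and raises the TypeError quoted in the review; with it, return_annotation is simply the string "float | None".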

Refresh the soak-harness branch onto current GeniePod:main (7f80ca6) so the PR runs full CI against the latest base. No conflicts: the harness adds only new files under tests/soak/ and a Makefile edit upstream did not touch.

ai-hpc · 2026-05-22T15:39:29Z

Review note: not merging this yet. The soak harness self-check currently crashes before it runs on the local python3 here:

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

That comes from evaluated annotations such as float | None in tests/soak/analyze_soak.py. Either require/document Python >= 3.10 explicitly in the harness/Makefile, or add from __future__ import annotations so the no-hardware make soak-selfcheck path works on Python 3.9 too. After that, please refresh against current main so full CI runs.

ai-hpc

Re-reviewed after the Python 3.9 annotation fix and latest-main refresh. Local python3 tests/soak/analyze_soak.py --self-check, make soak-selfcheck, bash -n, and Python compile checks passed here; GitHub Scripts, CI, and aarch64 cross-compile are green. This lands the harness only and does not close #113; the real 24h Jetson artifact still needs to follow.

web-dev0521 · 2026-05-22T15:56:51Z

@ai-hpc, all checks have passed. 👍

web-dev0521 · 2026-05-22T15:57:00Z

Thank you.

ai-hpc · 2026-05-22T15:59:44Z

merged at 561e01b

Thanks @web-dev0521

web-dev0521 added 3 commits May 20, 2026 16:21

Merge upstream/main into feat/issue-113-runtime-soak-test

5a9a896


ai-hpc added this to the M1 — Agent Harness Stabilization milestone May 22, 2026

Merge branch 'main' into feat/issue-113-runtime-soak-test

b906637

ai-hpc approved these changes May 22, 2026


ai-hpc merged commit 561e01b into GeniePod:main May 22, 2026
8 checks passed

ai-hpc merged 4 commits into GeniePod:main from web-dev0521:feat/issue-113-runtime-soak-test