fix: log SQLite health_log insert and prune failure#160

Merged
ai-hpc merged 2 commits into GeniePod:main from andriypolanski:fix/genie-health-log-insert-and-prune-failure
May 25, 2026
Conversation

@andriypolanski
Contributor

@andriypolanski andriypolanski commented May 23, 2026

Summary

genie-health discarded SQLite errors when inserting health check rows and when pruning old health_log entries
via let _ = self.db.execute(...). A full disk, permission error, or corrupted DB produced no log line, so the dashboard could show
stale “last known good” service history while writes silently failed.

This PR logs insert and prune failures with tracing::error! (service name or cutoff timestamp + rusqlite::Error). Polling continues unchanged — no fatal exit.
Closes #159

Changes

  • Extract insert_health_log and prune_health_log helpers in checker.rs.
  • Replace silent let _ = self.db.execute(...) with if let Err(e) = ... { tracing::error!(...) }.
  • Add unit tests: writable DB insert/prune happy path; read-only DB write failure does not panic.
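The change from a discarded result to a logged one can be sketched as follows. This is a minimal illustration, not the code in checker.rs: the `HealthDb` trait, the two test doubles, the SQL column names other than `ts_ms`, and the `bool` return values are all assumptions made so the sketch compiles with the standard library alone. In the PR the handle is a rusqlite connection and the message goes through `tracing::error!` rather than `eprintln!`.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for the database handle (a rusqlite connection in the PR).
pub trait HealthDb {
    type Error: Display;
    /// Runs one write statement and returns the number of rows affected.
    fn execute(&mut self, sql: &str, params: &[&str]) -> Result<usize, Self::Error>;
}

/// Inserts one health check row. A failure is reported and swallowed so the
/// polling loop keeps running. The bool return is an addition for testability.
pub fn insert_health_log<D: HealthDb>(db: &mut D, service: &str, status: &str) -> bool {
    // Assumed schema: only `ts_ms` is confirmed by the PR text.
    let sql = "INSERT INTO health_log (service, status) VALUES (?1, ?2)";
    match db.execute(sql, &[service, status]) {
        Ok(_) => true,
        Err(e) => {
            // Placeholder for tracing::error!.
            eprintln!("ERROR failed to insert health_log row service={} error={}", service, e);
            false
        }
    }
}

/// Deletes rows older than `cutoff_ts_ms`, logging instead of discarding errors.
pub fn prune_health_log<D: HealthDb>(db: &mut D, cutoff_ts_ms: i64) -> bool {
    let cutoff = cutoff_ts_ms.to_string();
    match db.execute("DELETE FROM health_log WHERE ts_ms < ?1", &[cutoff.as_str()]) {
        Ok(_) => true,
        Err(e) => {
            eprintln!("ERROR failed to prune health_log rows cutoff_ts_ms={} error={}", cutoff_ts_ms, e);
            false
        }
    }
}

/// Test double that rejects every write, as a read-only database would.
pub struct ReadOnlyDb;

impl HealthDb for ReadOnlyDb {
    type Error = String;
    fn execute(&mut self, _sql: &str, _params: &[&str]) -> Result<usize, String> {
        Err("attempt to write a readonly database".to_string())
    }
}

/// Test double that accepts every write and counts the calls.
pub struct CountingDb {
    pub writes: usize,
}

impl HealthDb for CountingDb {
    type Error = String;
    fn execute(&mut self, _sql: &str, _params: &[&str]) -> Result<usize, String> {
        self.writes += 1;
        Ok(1)
    }
}
```

Keeping the error handling inside the helpers means the caller cannot accidentally drop the result again, and a failed write never propagates far enough to stop polling.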

Real Behavior Proof

  • I have built and run the affected code locally (or noted why I could not).
  • I have verified the change end-to-end on Jetson hardware OR explained the equivalent verification path I used.

What I ran

cargo test -p genie-health
cargo build -p genie-health --release

What I observed

  • cargo test -p genie-health: all tests passed, including health_log_insert_and_prune_on_writable_db and health_log_write_errors_do_not_panic_on_readonly_db.
  • Read-only repro (issue steps): test opens health.db read-only (chmod 444), calls insert_health_log / prune_health_log — no panic; at runtime these paths emit tracing::error! (visible via journalctl -u genie-health when the service DB is unwritable).
  • Happy path unchanged: writable DB insert + prune counts verified in unit test.

Test plan

  • cargo test -p genie-health
  • cargo build -p genie-health --release
  • Optional: point genie-health at an unwritable DB path and confirm that "failed to insert health_log row" appears in journalctl -u genie-health

@kiannidev
Contributor

Hey @andriypolanski
This issue is assigned to @kiannidev
Please close this PR.

@andriypolanski andriypolanski changed the title from "fix(health): log SQLite health_log insert and prune failure" to "fix: log SQLite health_log insert and prune failure" May 23, 2026
@andriypolanski
Contributor Author

> Hey @andriypolanski
> This issue is assigned to @kiannidev
> Please close this PR.

I opened mine because you had filed the issue without a PR, and none followed within a few minutes.
Sorry about that. I'd appreciate it if you would consider mine.

@ai-hpc
Member

ai-hpc commented May 23, 2026

Reviewed: this targets valid #159 and keeps the runtime behavior in the right scope. I’ll merge the PR that provides the stronger real-behavior proof for the SQLite insert/prune failure path; please add concrete failure-path evidence if you want this one selected.

@andriypolanski
Contributor Author

andriypolanski commented May 23, 2026

I just updated, please review again. Thanks

Real Behavior Proof — failure path (#159)

Verified on x86_64 Ubuntu 24.04 (not Jetson). This comment supplements the PR with concrete failure-path evidence for the SQLite insert/prune logging fix.

Repro (matches issue #159)

  1. Start genie-health with a writable health.db and [health] interval_secs = 1.
  2. chmod 444 the existing DB file (issue step: make health DB unwritable).
  3. Restart genie-health (required on Linux — an already-open SQLite FD may still write after chmod on the inode).
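The reason step 3 is needed can be shown at the file level. The sketch below is Unix-only and uses plain files rather than SQLite, so it only demonstrates the underlying OS behavior: permission bits are checked when a file is opened, so a handle opened before `chmod 444` can keep writing. The function name and the demo contents are hypothetical.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Opens `path` for writing, flips the file to mode 0444, then writes again
/// through the handle that was opened earlier. Returns the final file length.
pub fn write_through_open_handle_after_chmod(path: &Path) -> io::Result<u64> {
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(path)?;
    file.write_all(b"before chmod\n")?;

    // Equivalent of `chmod 444`. Permissions are checked at open time, so
    // this does not revoke the handle opened above.
    fs::set_permissions(path, Permissions::from_mode(0o444))?;

    // Still succeeds: the write goes through the existing descriptor.
    file.write_all(b"after chmod\n")?;
    file.flush()?;

    Ok(fs::metadata(path)?.len())
}
```

A fresh open for writing after the `chmod` would normally be refused for a non-root user, which is what restarting genie-health forces.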

Evidence: DB stopped updating + errors are visible

| Metric | Writable phase | Read-only restart (3 ticks) |
| --- | --- | --- |
| health_log rows | 6 | 6 (frozen) |
| MAX(ts_ms) | 1779544248144 | unchanged |
| Insert error log lines | 0 | 6 |
| Prune error log lines | 0 | 3 |

Row count and max timestamp prove the dashboard/history would go stale; the new log lines make that failure observable in journalctl / stderr.

Sample ERROR lines (actual output)

2026-05-23T13:50:49.153101Z ERROR failed to insert health_log row service=core error=attempt to write a readonly database
2026-05-23T13:50:49.153151Z ERROR failed to insert health_log row service=llm error=attempt to write a readonly database
2026-05-23T13:50:49.153165Z ERROR failed to prune health_log rows cutoff_ts_ms=1779457849152 error=attempt to write a readonly database
2026-05-23T13:50:50.153396Z ERROR failed to insert health_log row service=core error=attempt to write a readonly database
2026-05-23T13:50:50.153455Z ERROR failed to insert health_log row service=llm error=attempt to write a readonly database
2026-05-23T13:50:50.153482Z ERROR failed to prune health_log rows cutoff_ts_ms=1779457850153 error=attempt to write a readonly database
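For anyone who wants to check such output programmatically instead of with grep, a small sketch follows. It assumes the message text and `key=value` layout shown in the sample lines above. The names `FailureCounts`, `count_failures`, and `field` are hypothetical and not part of the PR.

```rust
/// Number of insert and prune failure lines found in a log.
#[derive(Debug, Default, PartialEq)]
pub struct FailureCounts {
    pub insert: usize,
    pub prune: usize,
}

/// Counts failure lines by matching the two message strings from the sample output.
pub fn count_failures(log: &str) -> FailureCounts {
    let mut counts = FailureCounts::default();
    for line in log.lines() {
        if line.contains("failed to insert health_log row") {
            counts.insert += 1;
        } else if line.contains("failed to prune health_log rows") {
            counts.prune += 1;
        }
    }
    counts
}

/// Returns the value of the first `key=value` token on the line, if any.
/// Only suitable for values without spaces (e.g. `service`, `cutoff_ts_ms`);
/// the `error` value contains spaces and would be cut at the first word.
pub fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split_whitespace()
        .find_map(|tok| tok.strip_prefix(key)?.strip_prefix('='))
}
```

Applied to the six sample lines above, which cover two of the three ticks, this counts four insert failures and two prune failures. Each `cutoff_ts_ms` value also sits about 86,400,000 ms before its line's timestamp, which appears consistent with a 24-hour retention window, although the PR does not state the retention period.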

Before vs after this PR

| | Pre-fix (#159) | This PR |
| --- | --- | --- |
| Insert/prune SQLite failure | Silent (let _ = …) | tracing::error! with service + rusqlite message |
| HTTP probes | Still run | Still run (unchanged) |
| Process exit on DB error | No | No (unchanged) |

Commands run

cargo build -p genie-health --release
cargo test -p genie-health   # 5/5 pass

# chmod 444 + restart repro (abbreviated)
GENIEPOD_CONFIG=…/geniepod.toml timeout 3 ./target/release/genie-health   # seed rows
chmod 444 …/data/health.db
GENIEPOD_CONFIG=…/geniepod.toml RUST_LOG=info timeout 3 ./target/release/genie-health 2>&1 | tee readonly.log
sqlite3 …/data/health.db "SELECT COUNT(*), MAX(ts_ms) FROM health_log;"
grep -c 'failed to insert health_log row' readonly.log   # → 6
grep -c 'failed to prune health_log rows' readonly.log  # → 3

Unit tests

  • health_log_insert_and_prune_on_writable_db — happy path insert + prune
  • health_log_write_errors_do_not_panic_on_readonly_db — read-only connection, no panic

Checklist (for CI contribution template):

  • Built and ran affected code locally
  • Equivalent verification path documented (x86 live repro + unit tests; Jetson not available here)

Happy to re-run on Jetson with journalctl -u genie-health if a maintainer wants hardware confirmation.

@ai-hpc ai-hpc merged commit 59a89cc into GeniePod:main May 25, 2026
7 checks passed
@ai-hpc
Member

ai-hpc commented May 25, 2026

Reviewed and merged: this was selected because it fixes valid #159 with tests and concrete SQLite insert/prune failure-path evidence. Merged at 59a89cc; thanks @andriypolanski.

Development

Successfully merging this pull request may close these issues.

[bug] genie-health: SQLite health_log insert/prune failures are silent

3 participants