feat(observability): SQLite write-lock wait/hold instrumentation per component — make starvation detectable #1340

@Kpa-clawbot

Summary

The #1339 root cause (neighbor-builder write-lock starvation) was invisible to operators for ~3 days. We need first-class instrumentation for SQLite write-lock acquisition latency and hold duration, so that the next regression of this class is detectable from operator metrics.

Why this matters now

What to add

Wrap every Exec/tx.Begin on the writer connection in instrumentation:

  1. wait_ms histogram: time spent waiting for the write lock before the query actually starts
  2. hold_ms histogram: time the writer holds the lock during the query
  3. contention_total counter: count of writes that had to wait > 100ms
  4. slow_writer_log: log writer name + duration whenever hold_ms > 500ms (configurable)
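To make the four items above concrete, here is a minimal sketch of how wait and hold could be timed separately and attributed to a component. It is not the project's implementation: `WriterStats`, `Timed`, and the threshold constant names are hypothetical, and a `sync.Mutex` stands in for the SQLite write lock so the example runs without a driver. On a real writer connection the "acquire" step would presumably be whatever call blocks until the write lock is granted (for example a `BEGIN IMMEDIATE`), and raw sample slices would likely be replaced by bucketed histograms.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Thresholds come from the issue text; the constant names are hypothetical.
const (
	contentionThreshold = 100 * time.Millisecond
	slowHoldThreshold   = 500 * time.Millisecond
)

// WriterStats keeps raw per-component samples in milliseconds.
type WriterStats struct {
	mu         sync.Mutex
	WaitMs     map[string][]float64
	HoldMs     map[string][]float64
	Contention map[string]int
}

func NewWriterStats() *WriterStats {
	return &WriterStats{
		WaitMs:     map[string][]float64{},
		HoldMs:     map[string][]float64{},
		Contention: map[string]int{},
	}
}

// Timed records how long the caller waited for lock (wait) and how long it
// kept the lock while running work (hold), tagged with the component name.
func (s *WriterStats) Timed(component string, lock sync.Locker, work func() error) error {
	start := time.Now()
	lock.Lock() // stand-in for acquiring the SQLite write lock
	acquired := time.Now()
	err := work()
	lock.Unlock()
	wait, hold := acquired.Sub(start), time.Since(acquired)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.WaitMs[component] = append(s.WaitMs[component], float64(wait.Microseconds())/1000)
	s.HoldMs[component] = append(s.HoldMs[component], float64(hold.Microseconds())/1000)
	if wait > contentionThreshold {
		s.Contention[component]++
	}
	if hold > slowHoldThreshold {
		fmt.Printf("[db-slow-writer] component=%s duration=%s\n", component, hold)
	}
	return err
}

// simulateContention lets one writer hold the lock for 200ms while a second
// writer queues behind it.
func simulateContention() *WriterStats {
	stats := NewWriterStats()
	var writeLock sync.Mutex
	holding := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		stats.Timed("neighbor_builder", &writeLock, func() error {
			close(holding) // signal that the lock is now held
			time.Sleep(200 * time.Millisecond)
			return nil
		})
	}()
	<-holding
	stats.Timed("mqtt_handler", &writeLock, func() error { return nil })
	wg.Wait()
	return stats
}

func main() {
	stats := simulateContention()
	fmt.Println("contended mqtt_handler writes:", stats.Contention["mqtt_handler"])
}
```

Keeping wait and hold as separate measurements is what makes starvation attributable: the starved writer shows a high wait, while the writer causing it shows a high hold.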

Expose via the existing /api/perf endpoint structure (already has per-component metrics from #1123).
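As an illustration of what the exposed numbers could look like, the sketch below reduces raw samples to p50/p95/p99 per component and prints them as JSON. The actual `/api/perf` response shape from #1123 is not shown in this issue, so the nesting used here (component tag, then metric key) is an assumption, as are the `percentile` and `writerSummary` helpers. The percentile uses the simple nearest-rank method; a bucketed histogram would give approximate values instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile (0 < p <= 100) of samples.
// It returns 0 for an empty slice and does not modify its input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// writerSummary builds the wait_ms_pNN / hold_ms_pNN keys for one component.
func writerSummary(waitMs, holdMs []float64) map[string]float64 {
	out := map[string]float64{}
	for _, p := range []int{50, 95, 99} {
		out[fmt.Sprintf("wait_ms_p%d", p)] = percentile(waitMs, float64(p))
		out[fmt.Sprintf("hold_ms_p%d", p)] = percentile(holdMs, float64(p))
	}
	return out
}

func main() {
	// Hypothetical samples for a single component tag.
	perf := map[string]map[string]float64{
		"neighbor_builder": writerSummary(
			[]float64{1, 2, 3, 4},
			[]float64{10, 20, 900, 40},
		),
	}
	b, err := json.MarshalIndent(perf, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```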

Per-writer attribution

The infra to distinguish writers exists in #1123's per-component IO metrics. Use the same component tags:

  • neighbor_builder
  • mqtt_handler (transmission/observation inserts)
  • prune_packets, prune_observers, prune_metrics
  • mbcap_persist
  • vacuum

Acceptance

  • /api/perf includes db.writer.wait_ms_p50/p95/p99 and db.writer.hold_ms_p50/p95/p99 per component tag
  • Log on hold_ms > 500ms: [db-slow-writer] component=X duration=Y query=<truncated>
  • Test: synthetic 60s-blocking query → wait_ms > 50000 for subsequent writes; log line emitted
  • Mutation: revert the instrumentation → metrics gone, test fails
  • CI gate: add a smoke test that asserts neighbor-builder hold_ms < 1000ms on a fixture with 100k observations

Out of scope

  • Replacing SQLite with a multi-writer DB
  • Per-statement timing (too granular)
  • Cross-process tracing

References
