
fix: resolve #932 #935 #936 #938 — security, reliability, and observability #1041

Open
rejoicetukura-blip wants to merge 1 commit into
solutions-plug:mainfrom
rejoicetukura-blip:fix/issues-932-935-936-938

@rejoicetukura-blip rejoicetukura-blip commented Jun 29, 2026

Contributor

Summary

Resolves four issues across security, blockchain reliability, and observability.


closes #935 — RPC retry backoff lacks jitter (thundering herd on recovery)

Problem: Exponential backoff was deterministic — all API instances retried simultaneously after an RPC outage.

Fix:

  • Replaced deterministic backoff with full-jitter: delay = random(0, min(cap, base * 2^attempt))
  • Added RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter, 0.0 = no jitter)
  • Added rpc_backoff_jitter_factor to Config and BlockchainClient
  • Unit tests: 100-simulation run verifies delays are unique; zero-jitter produces deterministic values

Files: src/blockchain.rs, src/config.rs, Cargo.toml (adds rand = "0.8")
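
A minimal sketch of the jittered delay described above, not the code from this PR. Assumptions: the function and type names are hypothetical, the jitter factor is taken to interpolate linearly between the deterministic delay (0.0) and full jitter (1.0), and a small inline xorshift generator stands in for the `rand` crate so the snippet compiles without dependencies.

```rust
use std::time::Duration;

/// Tiny xorshift64* generator (stand-in for `rand`, illustration only).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a uniform value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D);
        (bits >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// ceiling = min(cap, base * 2^attempt)
/// delay   = ceiling * (1 - jitter) + random(0, ceiling * jitter)
/// With jitter = 1.0 this is full jitter: random(0, ceiling).
pub fn backoff_delay(
    base: Duration,
    cap: Duration,
    attempt: u32,
    jitter_factor: f64,
    rng: &mut XorShift64,
) -> Duration {
    // Assumption: out-of-range factors are clamped, non-finite ones mean full jitter.
    let jitter = if jitter_factor.is_finite() {
        jitter_factor.clamp(0.0, 1.0)
    } else {
        1.0
    };
    // Saturate instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let ceiling = base.checked_mul(factor).unwrap_or(cap).min(cap);
    let fixed = ceiling.mul_f64(1.0 - jitter);
    let random = ceiling.mul_f64(jitter * rng.next_f64());
    fixed + random
}
```

Because each instance draws its own random value, retries after an outage spread across the interval instead of landing at the same instant.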


closes #938 — Ledger divergence not handled in blockchain sync worker

Problem: Sync worker had no checkpoint persistence or gap detection — restarts and missed events silently corrupted market state.

Fix:

  • Persist last processed ledger as a checkpoint key in Redis (7-day TTL) separate from the in-memory cursor
  • On restart, compare stored checkpoint against current ledger; detect and log any gap with size
  • During normal sync, detect sequence skips and emit log::warn with gap_size
  • log::error alert fires when gap exceeds 10 ledgers (market state may be inconsistent)
  • New Prometheus counter blockchain_ledger_gaps_total{gap_type="sync"}

Files: src/blockchain.rs, src/metrics.rs
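
The gap handling above could be sketched as a pure classification step. This is an illustration under assumptions: `classify_gap`, `LedgerGap`, and the definition of gap size (ledgers strictly between the last processed one and the next one seen) are hypothetical; only the threshold of 10 comes from the description. The Redis checkpoint and the Prometheus counter are left out.

```rust
/// Gaps larger than this are escalated to an error-level alert.
pub const GAP_ALERT_THRESHOLD: u64 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum LedgerGap {
    /// The next ledger directly follows the last processed one (or is a replay).
    Contiguous,
    /// Ledgers were skipped, within the alert threshold: log a warning.
    Warn { gap_size: u64 },
    /// More ledgers were skipped than the threshold allows: log an error.
    Alert { gap_size: u64 },
}

/// Compares the last processed ledger (for example the stored checkpoint)
/// with the next ledger sequence number the worker is about to handle.
pub fn classify_gap(last_processed: u64, next: u64) -> LedgerGap {
    // Number of ledgers strictly between the two sequence numbers.
    let gap_size = next.saturating_sub(last_processed).saturating_sub(1);
    match gap_size {
        0 => LedgerGap::Contiguous,
        n if n > GAP_ALERT_THRESHOLD => LedgerGap::Alert { gap_size: n },
        n => LedgerGap::Warn { gap_size: n },
    }
}
```

The same function can serve both the restart check and the in-loop check, since both compare a known sequence number against the next one observed.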


closes #936 — Blockchain sync worker crash not tracked or alerted

Problem: Sync worker was fire-and-forget — panics silently stopped market data updates with no metric, alert, or restart.

Fix:

  • Extracted inner run_sync_loop (no coordinator coupling) from run_sync_worker
  • start_background_tasks wraps the loop in a supervised restart: catches panics via JoinHandle, increments counter, restarts after 1 s back-off
  • New Prometheus counter blockchain_sync_worker_restarts_total
  • New Prometheus gauge blockchain_sync_worker_last_heartbeat_ts updated on every successful poll cycle
  • New GET /health/ready endpoint: returns 200 when heartbeat is ≤ 60 s old, 503 SERVICE_UNAVAILABLE otherwise — suitable as a Kubernetes readiness probe

Files: src/blockchain.rs, src/metrics.rs, src/handlers.rs, src/main.rs
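
A rough sketch of the supervised restart pattern, assuming plain `std::thread` rather than whatever async runtime the service uses. The `supervise` function, the atomic counter standing in for `blockchain_sync_worker_restarts_total`, and the `max_restarts` bound (added so the example terminates) are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs `worker` on its own thread and restarts it whenever it panics.
/// Returns the restart count once the worker exits normally or
/// `max_restarts` is reached.
pub fn supervise<F>(
    worker: F,
    restarts: Arc<AtomicU64>,
    backoff: Duration,
    max_restarts: u64,
) -> u64
where
    F: Fn() + Send + Sync + 'static,
{
    let worker = Arc::new(worker);
    loop {
        let w = Arc::clone(&worker);
        let handle = thread::spawn(move || (*w)());
        // `join` returns Err if the thread panicked, which is how the
        // supervisor notices a crash without crashing itself.
        match handle.join() {
            Ok(()) => break,
            Err(_) => {
                let n = restarts.fetch_add(1, Ordering::Relaxed) + 1;
                if n >= max_restarts {
                    break;
                }
                thread::sleep(backoff);
            }
        }
    }
    restarts.load(Ordering::Relaxed)
}
```

In the PR the back-off is 1 s and the loop presumably runs for the lifetime of the process; the bound here exists only to keep the sketch testable.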


closes #932 — Email idempotency key allows pre-computation

Problem: SHA-256(recipient || template) is fully deterministic from public inputs — attackers can pre-compute keys and poison the idempotency cache.

Fix:

  • Replaced with HMAC-SHA256(secret, recipient || "|" || template || "|" || hour_bucket)
  • EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY) is required for production; prevents external pre-computation
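  • Hour-boundary timestamp bucket limits each key's validity to ~1 hour: a key pre-computed or captured for one bucket no longer matches once the hour rolls over (a leaked secret would still need rotating, since it lets an attacker derive keys for any bucket)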
  • Updated idempotency_key signature; secret threaded through EmailService (idempotency_secret field) and EmailQueue
  • Unit test: different_secret_produces_different_key directly asserts the acceptance criterion

Files: src/email/service.rs, src/email/queue.rs, src/config.rs
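
To make the key layout concrete, here is a sketch of how the HMAC input could be assembled. The function names and the hex encoding of the result are assumptions, and the MAC itself is passed in as a closure: in real code it would be HMAC-SHA256 from a vetted crypto crate, which is left out so the snippet has no dependencies.

```rust
/// One bucket per hour, per the description above.
const BUCKET_SECS: u64 = 3600;

/// Maps a Unix timestamp (seconds) to its hour bucket.
pub fn hour_bucket(unix_secs: u64) -> u64 {
    unix_secs / BUCKET_SECS
}

/// Builds the MAC input: recipient || "|" || template || "|" || hour_bucket.
pub fn idempotency_message(recipient: &str, template: &str, unix_secs: u64) -> String {
    format!("{}|{}|{}", recipient, template, hour_bucket(unix_secs))
}

/// Computes the idempotency key as lowercase hex of mac(secret, message).
/// `mac` is a placeholder for HMAC-SHA256; this sketch does not implement it.
pub fn idempotency_key<M>(
    secret: &[u8],
    recipient: &str,
    template: &str,
    unix_secs: u64,
    mac: M,
) -> String
where
    M: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    let msg = idempotency_message(recipient, template, unix_secs);
    mac(secret, msg.as_bytes())
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}
```

Two sends of the same template to the same recipient within one hour bucket map to the same key, while a different secret or a later bucket yields a different one.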


New environment variables

| Variable | Default | Purpose |
| --- | --- | --- |
| RPC_BACKOFF_JITTER_FACTOR | 1.0 | Jitter fraction for RPC retry backoff |
| EMAIL_IDEMPOTENCY_SECRET | falls back to HMAC_KEY | HMAC secret for email idempotency keys |
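
As an illustration of the defaults listed above, the two settings might be resolved along these lines. Both helper names are hypothetical, and clamping the factor to [0, 1], falling back to 1.0 on unparsable input, and treating an empty secret as unset are assumptions not stated in the PR.

```rust
/// Parses RPC_BACKOFF_JITTER_FACTOR; missing or invalid values give 1.0.
pub fn parse_jitter_factor(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite())
        .map(|v| v.clamp(0.0, 1.0))
        .unwrap_or(1.0)
}

/// Picks EMAIL_IDEMPOTENCY_SECRET if set and non-empty, else HMAC_KEY.
/// Returns None when neither is usable, so the caller can refuse to start
/// in production.
pub fn resolve_idempotency_secret(
    email_secret: Option<String>,
    hmac_key: Option<String>,
) -> Option<String> {
    email_secret
        .filter(|s| !s.is_empty())
        .or_else(|| hmac_key.filter(|s| !s.is_empty()))
}
```

A caller would feed these from the environment, for example `parse_jitter_factor(std::env::var("RPC_BACKOFF_JITTER_FACTOR").ok().as_deref())`.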

New metrics

| Metric | Type | Description |
| --- | --- | --- |
| blockchain_sync_worker_restarts_total | Counter | Worker restart count after panics |
| blockchain_sync_worker_last_heartbeat_ts | Gauge | Unix timestamp of last sync heartbeat |
| blockchain_ledger_gaps_total | Counter | Ledger sequence gaps detected |

New endpoints

| Endpoint | Description |
| --- | --- |
| GET /health/ready | Readiness probe; returns 503 if sync worker heartbeat is > 60 s old |
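
The readiness decision reduces to a timestamp comparison, sketched here without any web framework. `readiness_status` is a hypothetical name, and treating a heartbeat slightly in the future (clock skew) as fresh is an assumption.

```rust
/// Maximum heartbeat age, in seconds, for the service to count as ready.
pub const MAX_HEARTBEAT_AGE_SECS: u64 = 60;

/// Returns the HTTP status the readiness probe would report:
/// 200 when the last heartbeat is at most 60 s old, 503 otherwise.
pub fn readiness_status(now_unix: u64, last_heartbeat_unix: u64) -> u16 {
    // saturating_sub treats a heartbeat ahead of `now` as age zero.
    let age = now_unix.saturating_sub(last_heartbeat_unix);
    if age <= MAX_HEARTBEAT_AGE_SECS {
        200
    } else {
        503
    }
}
```

A handler would read the gauge value and the current Unix time, then map the result to its framework's response type.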

Testing

…-plug#936, solutions-plug#938

solutions-plug#935 — Add full jitter to RPC retry backoff
- Introduce RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter)
- Replace deterministic exponential backoff with random(0, min(cap, base*2^n))
- Add rpc_backoff_jitter_factor field to Config and BlockchainClient
- Add unit tests verifying unique delays and zero-jitter determinism

solutions-plug#938 — Handle ledger gaps/forks in blockchain sync worker
- Persist last processed ledger as a checkpoint key in Redis (7d TTL)
- On restart, detect and log gaps between checkpoint and current ledger
- During normal sync, detect sequence skips and emit log::warn with gap size
- Emit log::error alert when gap exceeds 10 ledgers
- Add observe_ledger_gap metric (blockchain_ledger_gaps_total)

solutions-plug#936 — Supervised sync worker with Prometheus metrics and health endpoint
- Extract inner run_sync_loop (no coordinator) from run_sync_worker
- Wrap in supervised restart loop in start_background_tasks: catches panics,
  increments blockchain_sync_worker_restarts_total, restarts after 1s
- Add blockchain_sync_worker_last_heartbeat_ts gauge updated each poll cycle
- Add GET /health/ready endpoint: returns 503 if heartbeat older than 60s

solutions-plug#932 — HMAC-SHA256 email idempotency key
- Replace SHA-256(recipient||template||data) with HMAC-SHA256(secret, recipient||template||hour_bucket)
- Add EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY)
- Add hour-boundary timestamp bucket to bound pre-computation window
- Update idempotency_key signature; propagate secret through EmailService and queue
- Add test verifying key changes when secret changes

drips-wave Bot commented Jun 29, 2026


@rejoicetukura-blip Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits
