fix: resolve #932 #935 #936 #938 — security, reliability, and observability #1041
Open
rejoicetukura-blip wants to merge 1 commit into
…-plug#936, solutions-plug#938

solutions-plug#935 — Add full jitter to RPC retry backoff
- Introduce RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter)
- Replace deterministic exponential backoff with random(0, min(cap, base*2^n))
- Add rpc_backoff_jitter_factor field to Config and BlockchainClient
- Add unit tests verifying unique delays and zero-jitter determinism

solutions-plug#938 — Handle ledger gaps/forks in blockchain sync worker
- Persist last processed ledger as a checkpoint key in Redis (7d TTL)
- On restart, detect and log gaps between checkpoint and current ledger
- During normal sync, detect sequence skips and emit log.warn with gap size
- Emit log.error alert when gap exceeds 10 ledgers
- Add observe_ledger_gap metric (blockchain_ledger_gaps_total)

solutions-plug#936 — Supervised sync worker with Prometheus metrics and health endpoint
- Extract inner run_sync_loop (no coordinator) from run_sync_worker
- Wrap in supervised restart loop in start_background_tasks: catches panics, increments blockchain_sync_worker_restarts_total, restarts after 1s
- Add blockchain_sync_worker_last_heartbeat_ts gauge updated each poll cycle
- Add GET /health/ready endpoint: returns 503 if heartbeat older than 60s

solutions-plug#932 — HMAC-SHA256 email idempotency key
- Replace SHA-256(recipient||template||data) with HMAC-SHA256(secret, recipient||template||hour_bucket)
- Add EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY)
- Add hour-boundary timestamp bucket to bound pre-computation window
- Update idempotency_key signature; propagate secret through EmailService and queue
- Add test verifying key changes when secret changes
@rejoicetukura-blip Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀
Summary
Resolves four issues across security, blockchain reliability, and observability.
closes #935 — RPC retry backoff lacks jitter (thundering herd on recovery)
Problem: Exponential backoff was deterministic — all API instances retried simultaneously after an RPC outage.
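To illustrate how jitter breaks up the synchronized retries described in this problem statement, here is a minimal sketch of a jittered delay calculation. It is not the code from this PR: the function names are hypothetical, the random sample is passed in as a parameter so the example needs no external crate, and the linear blend between "no jitter" and "full jitter" is an assumption, since the description only defines the two endpoints of `RPC_BACKOFF_JITTER_FACTOR`.

```rust
/// Upper bound for attempt `n`: min(cap, base * 2^n), saturating on overflow.
fn backoff_ceiling_ms(base_ms: u64, cap_ms: u64, attempt: u32) -> u64 {
    let multiplier = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ms.saturating_mul(multiplier).min(cap_ms)
}

/// Delay for one retry. `unit_sample` is a uniform random value in [0, 1),
/// supplied by the caller so this function stays deterministic.
/// Assumption: `jitter_factor` blends linearly between no jitter (0.0)
/// and full jitter (1.0).
fn jittered_delay_ms(
    base_ms: u64,
    cap_ms: u64,
    attempt: u32,
    jitter_factor: f64,
    unit_sample: f64,
) -> u64 {
    let ceiling = backoff_ceiling_ms(base_ms, cap_ms, attempt) as f64;
    let factor = jitter_factor.clamp(0.0, 1.0);
    let fixed_part = ceiling * (1.0 - factor);
    let random_part = ceiling * factor * unit_sample;
    (fixed_part + random_part) as u64
}

fn main() {
    // Two instances on the same attempt draw different samples and
    // therefore retry at different times.
    let a = jittered_delay_ms(100, 30_000, 3, 1.0, 0.25);
    let b = jittered_delay_ms(100, 30_000, 3, 1.0, 0.75);
    println!("instance A: {} ms, instance B: {} ms", a, b);
}
```

With a factor of 0.0 the sample is ignored and every instance gets the same ceiling, which is the deterministic behaviour the issue describes.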
Fix:
- delay = random(0, min(cap, base * 2^attempt))
- RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter, 0.0 = no jitter)
- Added rpc_backoff_jitter_factor to Config and BlockchainClient
Files:
src/blockchain.rs, src/config.rs, Cargo.toml (adds rand = "0.8")
closes #938 — Ledger divergence not handled in blockchain sync worker
Problem: Sync worker had no checkpoint persistence or gap detection — restarts and missed events silently corrupted market state.
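The kind of gap detection this issue calls for can be sketched as a pure function that compares the checkpointed ledger with the one just observed. This is an illustration only: `classify_gap` and `GapSeverity` are hypothetical names, and counting the gap as the number of missing ledgers between the two sequence numbers is an assumption. Only the threshold of 10 ledgers comes from the PR description.

```rust
/// Gaps larger than this many ledgers are escalated from a warning to an alert.
const GAP_ALERT_THRESHOLD: u64 = 10;

#[derive(Debug, PartialEq)]
enum GapSeverity {
    /// Sequence is contiguous, or there is no checkpoint yet.
    None,
    /// Some ledgers were skipped; carries the gap size.
    Warn(u64),
    /// More than the threshold was skipped; market state may be inconsistent.
    Alert(u64),
}

/// Compares the last processed ledger with the ledger just observed.
/// Assumption: a ledger at or below the checkpoint counts as "no gap" here;
/// real fork handling would need more than this.
fn classify_gap(last_processed: Option<u64>, current: u64) -> GapSeverity {
    let last = match last_processed {
        Some(seq) => seq,
        None => return GapSeverity::None,
    };
    // Number of ledgers missing strictly between `last` and `current`.
    let missing = current.saturating_sub(last).saturating_sub(1);
    match missing {
        0 => GapSeverity::None,
        n if n > GAP_ALERT_THRESHOLD => GapSeverity::Alert(n),
        n => GapSeverity::Warn(n),
    }
}

fn main() {
    println!("{:?}", classify_gap(Some(100), 101));
    println!("{:?}", classify_gap(Some(100), 104));
    println!("{:?}", classify_gap(Some(100), 150));
}
```

Keeping the classification free of I/O means the same function can serve both the restart check against the stored checkpoint and the per-poll check during normal sync.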
Fix:
- Sequence skips during sync emit log::warn with gap_size
- log::error alert fires when gap exceeds 10 ledgers (market state may be inconsistent)
- New metric blockchain_ledger_gaps_total{gap_type="sync"}
Files:
src/blockchain.rs, src/metrics.rs
closes #936 — Blockchain sync worker crash not tracked or alerted
Problem: Sync worker was fire-and-forget — panics silently stopped market data updates with no metric, alert, or restart.
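A supervised worker, as opposed to the fire-and-forget one described here, might look roughly like the sketch below. It uses `std::thread` so that it runs without dependencies, which is an assumption: the project's actual runtime and its `start_background_tasks` implementation are not shown in this PR page. `supervise` and `readiness_status` are hypothetical names, and the atomic counter stands in for a metric such as blockchain_sync_worker_restarts_total.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs `worker` on its own thread and restarts it whenever it panics.
/// Returns once the worker exits normally (a real sync loop never would).
fn supervise<F>(worker: F, restarts: Arc<AtomicU64>, backoff: Duration)
where
    F: Fn() + Send + Sync + 'static,
{
    let worker = Arc::new(worker);
    loop {
        let task = Arc::clone(&worker);
        // join() returns Err if the spawned thread panicked.
        let outcome = thread::spawn(move || (*task)()).join();
        if outcome.is_ok() {
            return;
        }
        restarts.fetch_add(1, Ordering::SeqCst);
        thread::sleep(backoff);
    }
}

/// Readiness rule from the PR description: 200 while the last heartbeat is
/// at most 60 s old, 503 otherwise. Timestamps are Unix seconds.
fn readiness_status(now_ts: u64, last_heartbeat_ts: u64) -> u16 {
    if now_ts.saturating_sub(last_heartbeat_ts) <= 60 {
        200
    } else {
        503
    }
}

fn main() {
    let restarts = Arc::new(AtomicU64::new(0));
    let runs = Arc::new(AtomicU64::new(0));
    let seen = Arc::clone(&runs);
    // Toy worker: panics on its first two runs, then exits cleanly.
    let worker = move || {
        if seen.fetch_add(1, Ordering::SeqCst) < 2 {
            panic!("simulated sync failure");
        }
    };
    supervise(worker, Arc::clone(&restarts), Duration::from_millis(10));
    println!("restarts: {}", restarts.load(Ordering::SeqCst));
    println!("readiness: {}", readiness_status(1_000, 950));
}
```

The restart counter and the heartbeat check cover different failures: the counter records crashes, while the staleness check also catches a worker that is alive but stuck.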
Fix:
- Extracted run_sync_loop (no coordinator coupling) from run_sync_worker
- start_background_tasks wraps the loop in a supervised restart: catches panics via JoinHandle, increments counter, restarts after 1 s back-off
- New counter blockchain_sync_worker_restarts_total
- New gauge blockchain_sync_worker_last_heartbeat_ts updated on every successful poll cycle
- GET /health/ready endpoint: returns 200 when heartbeat is ≤ 60 s old, 503 SERVICE_UNAVAILABLE otherwise — suitable as a Kubernetes readiness probe
Files:
src/blockchain.rs, src/metrics.rs, src/handlers.rs, src/main.rs
closes #932 — Email idempotency key allows pre-computation
Problem: SHA-256(recipient || template) is fully deterministic from public inputs — attackers can pre-compute keys and poison the idempotency cache.
Fix:
- HMAC-SHA256(secret, recipient || "|" || template || "|" || hour_bucket)
- EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY) is required for production; prevents external pre-computation
- Updated idempotency_key signature; secret threaded through EmailService (idempotency_secret field) and EmailQueue
- Test different_secret_produces_different_key directly asserts the acceptance criterion
Files:
src/email/service.rs, src/email/queue.rs, src/config.rs
New environment variables
- RPC_BACKOFF_JITTER_FACTOR (default: 1.0)
- EMAIL_IDEMPOTENCY_SECRET (falls back to HMAC_KEY)
New metrics
- blockchain_sync_worker_restarts_total
- blockchain_sync_worker_last_heartbeat_ts
- blockchain_ledger_gaps_total
New endpoints
- GET /health/ready
Testing
- idempotency_key signature
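For the idempotency key in #932, the hour bucket and the message layout can be sketched without any dependencies. The keyed hashing step itself is left out because it needs an external HMAC crate. The function names are hypothetical, and rendering the bucket as the decimal Unix timestamp of the start of the hour is an assumption; the PR only states that an hour-boundary bucket is appended after the recipient and template.

```rust
/// Rounds a Unix timestamp (seconds) down to the start of its hour.
fn hour_bucket(unix_secs: u64) -> u64 {
    unix_secs - unix_secs % 3600
}

/// Builds the message that would be passed, together with the secret, to
/// HMAC-SHA256: recipient || "|" || template || "|" || hour_bucket.
fn idempotency_message(recipient: &str, template: &str, unix_secs: u64) -> String {
    format!("{}|{}|{}", recipient, template, hour_bucket(unix_secs))
}

fn main() {
    // Two sends within the same hour produce the same message...
    println!("{}", idempotency_message("a@example.com", "welcome", 7_250));
    println!("{}", idempotency_message("a@example.com", "welcome", 10_799));
    // ...while a send in the next hour produces a new one.
    println!("{}", idempotency_message("a@example.com", "welcome", 10_800));
}
```

Because the bucket changes every hour and the secret is unknown to outsiders, a key computed in advance for one hour is useless in the next.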