feat(health): dependency latency budget healthcheck for RPC and Horizon (#848) #964
Merged
Cedarich merged 6 commits on Jun 29, 2026
…RPC (Pulsefy#848)

Add dependency latency budget signals that classify testnet RPC and Horizon response times into ok / degraded / hard_down states.

Changes
-------
* latency-budget.config.ts
  - Reads per-dependency thresholds from the HEALTH_HORIZON_LATENCY_DEGRADED_MS, HEALTH_HORIZON_LATENCY_HARD_DOWN_MS, HEALTH_SOROBAN_RPC_LATENCY_DEGRADED_MS and HEALTH_SOROBAN_RPC_LATENCY_HARD_DOWN_MS env vars.
  - Sensible defaults: Horizon degraded=1000ms / hard-down=4000ms; RPC degraded=1500ms / hard-down=5000ms.
* latency-budget.health.service.ts
  - Probes the Horizon root via HTTP GET and Soroban RPC via JSON-RPC getHealth concurrently.
  - Measures round-trip latency and classifies each result as ok, degraded or hard_down using the configured thresholds.
  - Connection failures are always hard_down; latency >= the hard-down threshold is also hard_down; latency between the two thresholds is degraded.
  - The overall state is the worst state across all dependencies.
* health.service.ts
  - Injects LatencyBudgetHealthService and runs the latency probe in parallel with the existing checks.
  - hard_down latency → overall status=error (HTTP 503, summary=down).
  - degraded latency → overall status=ok (HTTP 200, summary=degraded).
  - A latencyBudget object is included in every health report response.
* health.controller.ts
  - New GET /health/latency endpoint that returns only the latency budget report.
  - Returns HTTP 503 on hard_down and HTTP 200 otherwise, which suits Vercel/preview smoke checks.
* health.module.ts
  - Registers LatencyBudgetHealthService.
* .env.example
  - Documents all four HEALTH_* threshold env vars with their defaults.

Tests
-----
* latency-budget.health.service.spec.ts: full unit suite covering the ok path, connection failures, error message capture and response shape.
* health.service.spec.ts: extended with a LatencyBudgetHealthService mock; new cases for hard_down and degraded latency affecting the overall status.

Closes Pulsefy#848
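The threshold handling in latency-budget.config.ts can be sketched as below. The four env var names and the default values are taken from the commit message; the function name `loadLatencyBudgetConfig`, the `Env` type, and the rule that blank, non-numeric or non-positive values fall back to the default are assumptions for illustration. The PR describes the file as exporting a typed `LatencyBudgetConfig` object, so the real code may build that object once at module load rather than through a function.

```typescript
// Hypothetical sketch of latency-budget.config.ts. Env var names and defaults
// come from the PR; function/type names and fallback rules are assumptions.

export interface DependencyThresholds {
  degradedMs: number; // latency at or above this counts as "degraded"
  hardDownMs: number; // latency at or above this counts as "hard_down"
}

export interface LatencyBudgetConfig {
  horizon: DependencyThresholds;
  sorobanRpc: DependencyThresholds;
}

// Minimal env shape; Node's process.env is assignable to this type.
export type Env = Record<string, string | undefined>;

// Read a millisecond threshold, falling back to the default when the
// variable is unset, blank, non-numeric, or not a positive number.
function readMs(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? Math.floor(value) : fallback;
}

export function loadLatencyBudgetConfig(env: Env): LatencyBudgetConfig {
  return {
    horizon: {
      degradedMs: readMs(env, "HEALTH_HORIZON_LATENCY_DEGRADED_MS", 1000),
      hardDownMs: readMs(env, "HEALTH_HORIZON_LATENCY_HARD_DOWN_MS", 4000),
    },
    sorobanRpc: {
      degradedMs: readMs(env, "HEALTH_SOROBAN_RPC_LATENCY_DEGRADED_MS", 1500),
      hardDownMs: readMs(env, "HEALTH_SOROBAN_RPC_LATENCY_HARD_DOWN_MS", 5000),
    },
  };
}
```

In the service this would be called as `loadLatencyBudgetConfig(process.env)`. Taking the env as a parameter instead of reading `process.env` directly keeps the loader trivially testable with a plain object.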
@amankoli09 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀
@amankoli09 please fix the workflow.
@Cedarich Please approve the workflow.
@amankoli09 please fix the workflow.
@Cedarich I have fixed it; please check.
Cedarich approved these changes on Jun 29, 2026
Summary
Closes #848
Adds health signals that fail when testnet dependency latency exceeds acceptable thresholds, satisfying all four acceptance criteria:
- `GET /health` now includes a `latencyBudget` field; a new `GET /health/latency` endpoint exposes just the latency report.
- States are `ok` / `degraded` / `hard_down`; only `hard_down` triggers HTTP 503.
- Thresholds are configurable via env vars (`HEALTH_HORIZON_LATENCY_DEGRADED_MS`, `HEALTH_HORIZON_LATENCY_HARD_DOWN_MS`, `HEALTH_SOROBAN_RPC_LATENCY_DEGRADED_MS`, `HEALTH_SOROBAN_RPC_LATENCY_HARD_DOWN_MS`) with safe defaults.
- `GET /health/latency` returns 200 on ok/degraded and 503 on hard_down, which makes it a good fit for simple smoke check scripts.

What changed
New files
- `latency-budget.config.ts`: reads the threshold env vars and exports a typed `LatencyBudgetConfig` object.
- `latency-budget.health.service.ts`: probes Horizon (HTTP GET) and Soroban RPC (`getHealth` JSON-RPC) concurrently, measures round-trip latency, and classifies each as `ok` / `degraded` / `hard_down`.
- `latency-budget.health.service.spec.ts`: unit tests covering the ok path, connection failures, error capture, response shape and state rollup.
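The probe, classification and rollup in latency-budget.health.service.ts can be sketched as follows. The three states, "connection failure is always hard_down", "latency >= hard-down threshold is hard_down", "worst state wins" and the concurrent probing all come from the PR; the function names, the result shape, and treating a latency exactly equal to the degraded threshold as `degraded` are assumptions. The actual HTTP GET to Horizon and the JSON-RPC `getHealth` call are abstracted behind a hypothetical `request` callback that should reject on failure.

```typescript
// Hypothetical sketch of the core logic; names and result shape are assumed.
export type LatencyState = "ok" | "degraded" | "hard_down";

export interface Thresholds {
  degradedMs: number;
  hardDownMs: number;
}

export interface DependencyResult {
  name: string;
  state: LatencyState;
  latencyMs: number | null; // null when the request never completed
  error?: string;
}

// Classify a completed round trip. Boundary at degradedMs is an assumption.
export function classifyLatency(latencyMs: number, t: Thresholds): LatencyState {
  if (latencyMs >= t.hardDownMs) return "hard_down";
  if (latencyMs >= t.degradedMs) return "degraded";
  return "ok";
}

const SEVERITY: Record<LatencyState, number> = { ok: 0, degraded: 1, hard_down: 2 };

// The overall state is the worst state across all dependencies.
export function worstState(states: LatencyState[]): LatencyState {
  return states.reduce<LatencyState>(
    (worst, s) => (SEVERITY[s] > SEVERITY[worst] ? s : worst),
    "ok",
  );
}

// Time one probe; any rejection (connection failure, non-2xx, etc.) is hard_down.
export async function probeDependency(
  name: string,
  request: () => Promise<unknown>,
  t: Thresholds,
  now: () => number = () => Date.now(),
): Promise<DependencyResult> {
  const start = now();
  try {
    await request();
    const latencyMs = now() - start;
    return { name, state: classifyLatency(latencyMs, t), latencyMs };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { name, state: "hard_down", latencyMs: null, error };
  }
}

// Probe every dependency concurrently. probeDependency never rejects,
// so Promise.all always resolves with one result per dependency.
export async function probeAll(
  probes: Array<{ name: string; request: () => Promise<unknown>; thresholds: Thresholds }>,
): Promise<{ overallState: LatencyState; dependencies: DependencyResult[] }> {
  const dependencies = await Promise.all(
    probes.map((p) => probeDependency(p.name, p.request, p.thresholds)),
  );
  return { overallState: worstState(dependencies.map((d) => d.state)), dependencies };
}
```

Catching inside `probeDependency` rather than using `Promise.allSettled` keeps the error message next to the dependency it belongs to, which lines up with the "error message capture" case in the unit tests.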
Modified files
- `health.service.ts`: injects `LatencyBudgetHealthService` and runs the probe in parallel with the existing checks; `hard_down` elevates `status` to `error` (HTTP 503), while `degraded` keeps HTTP 200 but sets `summary` to `degraded`. The `latencyBudget` object is included in every health report.
- `health.controller.ts`: new `GET /health/latency` endpoint.
- `health.module.ts`: registers `LatencyBudgetHealthService`.
- `.env.example`: documents all four threshold env vars with their defaults.
- `health.service.spec.ts`: adds a `LatencyBudgetHealthService` mock and two new integration test cases (hard_down → 503, degraded → 200).
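The way `health.service.ts` folds the latency state into the overall report, and how `GET /health/latency` picks its status code, can be sketched as pure functions. The `status`, `summary` and `latencyBudget` fields and the hard_down/degraded mappings come from the PR; the `HealthReport` shape, the `httpStatus` field, the `existingChecksOk` flag, the `"up"` summary for the all-clear case, and the rule that a failing existing check also yields `down` are all assumptions.

```typescript
// Hypothetical sketch; only the status/summary mappings are from the PR.
type LatencyState = "ok" | "degraded" | "hard_down";

interface LatencyBudgetReport {
  overallState: LatencyState;
}

interface HealthReport {
  status: "ok" | "error";
  summary: "up" | "degraded" | "down"; // "up" is an assumed label
  httpStatus: 200 | 503;
  latencyBudget: LatencyBudgetReport; // included in every report
}

export function buildHealthReport(
  existingChecksOk: boolean, // assumed result of the pre-existing checks
  latencyBudget: LatencyBudgetReport,
): HealthReport {
  // hard_down latency (or an assumed failing existing check) → error / 503.
  if (!existingChecksOk || latencyBudget.overallState === "hard_down") {
    return { status: "error", summary: "down", httpStatus: 503, latencyBudget };
  }
  // degraded latency keeps status ok and HTTP 200 but flags the summary.
  if (latencyBudget.overallState === "degraded") {
    return { status: "ok", summary: "degraded", httpStatus: 200, latencyBudget };
  }
  return { status: "ok", summary: "up", httpStatus: 200, latencyBudget };
}

// GET /health/latency: 503 only on hard_down, 200 for ok and degraded.
export function latencyEndpointStatus(report: LatencyBudgetReport): 200 | 503 {
  return report.overallState === "hard_down" ? 503 : 200;
}
```

In a NestJS controller, one common way to emit the 503 is to throw `ServiceUnavailableException` from `@nestjs/common` with the report as the response body; the PR does not show which approach `health.controller.ts` actually uses.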
HTTP status semantics
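| overallState | GET /health status | GET /health/latency status |
| --- | --- | --- |
| ok | 200 | 200 |
| degraded | 200 (summary=degraded) | 200 |
| hard_down | 503 | 503 |

Default thresholds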