Summary
The current /health endpoint returns a static 200 OK without verifying that external dependencies are actually reachable. Kubernetes liveness/readiness probes pointing at this endpoint will not detect a broken database connection or a Stellar RPC outage.
Proposed Solution
Extend src/routes/health.ts with a GET /health/deep endpoint that returns a per-dependency report:
{
  "status": "healthy" | "degraded" | "unhealthy",
  "version": "1.2.3",
  "uptime": 3600,
  "checks": {
    "database": { "status": "healthy", "latencyMs": 4 },
    "stellarRpc": { "status": "healthy", "latencyMs": 120, "ledger": 54321 },
    "twilio": { "status": "healthy", "latencyMs": 89 },
    "agentLoop": { "status": "healthy", "lastTickAt": "2026-06-27T10:00:00Z" }
  }
}
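A minimal TypeScript sketch of types matching the body above. The names `CheckStatus`, `DeepHealthChecks`, `DeepHealthResponse` and the `buildDeepHealthResponse` helper are hypothetical, the optional `error` field is an assumed addition for reporting failed checks, and `uptime` is assumed to be in seconds (consistent with the `3600` example). The overall `status` is passed in by the caller here rather than derived.

```typescript
// Hypothetical status union shared by the top-level response and every check.
type CheckStatus = "healthy" | "degraded" | "unhealthy";

interface BaseCheck {
  status: CheckStatus;
  latencyMs?: number;
  error?: string; // assumed field: populated when a check fails or times out
}

// One entry per dependency, with the check-specific extras from the example body.
interface DeepHealthChecks {
  database: BaseCheck;
  stellarRpc: BaseCheck & { ledger?: number };
  twilio: BaseCheck;
  agentLoop: BaseCheck & { lastTickAt?: string }; // ISO-8601 timestamp
}

interface DeepHealthResponse {
  status: CheckStatus;
  version: string;
  uptime: number; // assumed unit: whole seconds since process start
  checks: DeepHealthChecks;
}

// Assemble the response body; uptime is computed from the process start time.
function buildDeepHealthResponse(
  status: CheckStatus,
  version: string,
  startedAtMs: number,
  checks: DeepHealthChecks,
  nowMs: number = Date.now(),
): DeepHealthResponse {
  return {
    status,
    version,
    uptime: Math.floor((nowMs - startedAtMs) / 1000),
    checks,
  };
}
```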
Rules:
- Any unhealthy check → HTTP 503 (K8s marks the pod NotReady)
- Any degraded check but no unhealthy ones → HTTP 200 with status: degraded
- The deep check is protected by an internal token so infrastructure details are not exposed publicly
- Each dependency check times out independently after 3 s
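The status and timeout rules above could be implemented roughly as follows. This is a sketch under assumptions: `aggregate`, `runCheck`, `runAllChecks` and `CHECK_TIMEOUT_MS` are hypothetical names, each dependency check is modeled as an async function, and a timeout or thrown error is reported as `unhealthy` (the issue does not say whether a timeout should count as `degraded` instead).

```typescript
type CheckStatus = "healthy" | "degraded" | "unhealthy";

interface CheckResult {
  status: CheckStatus;
  latencyMs?: number;
  error?: string;
}

type CheckFn = () => Promise<Omit<CheckResult, "latencyMs">>;

const CHECK_TIMEOUT_MS = 3_000;

// Overall status and HTTP code: any unhealthy -> 503; otherwise any degraded -> 200 "degraded";
// otherwise 200 "healthy".
function aggregate(checks: Record<string, CheckResult>): { status: CheckStatus; httpStatus: number } {
  const statuses = Object.values(checks).map((c) => c.status);
  if (statuses.includes("unhealthy")) return { status: "unhealthy", httpStatus: 503 };
  if (statuses.includes("degraded")) return { status: "degraded", httpStatus: 200 };
  return { status: "healthy", httpStatus: 200 };
}

// Run one check under its own timer. Never rejects: a timeout or error becomes "unhealthy".
async function runCheck(check: CheckFn, timeoutMs: number = CHECK_TIMEOUT_MS): Promise<CheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const result = await Promise.race([check(), timeout]);
    return { ...result, latencyMs: Date.now() - started };
  } catch (err) {
    return {
      status: "unhealthy",
      latencyMs: Date.now() - started,
      error: err instanceof Error ? err.message : String(err),
    };
  } finally {
    clearTimeout(timer); // stop the pending timer so it cannot keep the process alive
  }
}

// Start every check concurrently; each one is bounded by its own timeout.
async function runAllChecks(checks: Record<string, CheckFn>): Promise<Record<string, CheckResult>> {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, fn]) => [name, await runCheck(fn)] as const),
  );
  return Object.fromEntries(entries);
}
```

Because each `runCheck` owns its own timer and never rejects, `Promise.all` waits for every check, and the endpoint's worst-case latency stays near 3 s overall rather than 3 s per dependency.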
Acceptance Criteria
- GET /health/deep requires the internal auth token
- Database check runs SELECT 1 via Prisma
- HTTP 503 when any dependency is unhealthy
- deployment.yaml readiness probe updated to use /health/deep
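For the auth-token and Prisma criteria, a sketch of the `SELECT 1` probe and the token comparison. `RawQueryClient`, `checkDatabase` and `isAuthorized` are hypothetical names; `RawQueryClient` is a structural stand-in so the example runs without a database, and a real `PrismaClient` should satisfy it because `$queryRaw` is a tagged-template method. How the token reaches the server (header name, environment variable) is left open, since the issue does not specify it.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Structural stand-in for the one Prisma Client method this check needs.
interface RawQueryClient {
  $queryRaw(query: TemplateStringsArray, ...values: unknown[]): Promise<unknown>;
}

type DbCheck = { status: "healthy" | "unhealthy"; error?: string };

// Database check: a trivial round trip through Prisma. Any thrown error marks it unhealthy.
async function checkDatabase(db: RawQueryClient): Promise<DbCheck> {
  try {
    await db.$queryRaw`SELECT 1`;
    return { status: "healthy" };
  } catch (err) {
    return { status: "unhealthy", error: err instanceof Error ? err.message : String(err) };
  }
}

// Constant-time comparison of the presented token against the configured one.
function isAuthorized(presented: string | undefined, expected: string | undefined): boolean {
  // Fail closed when the caller sent no token or none is configured.
  if (!presented || !expected) return false;
  // Hash both sides so timingSafeEqual always receives equal-length buffers.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

Hashing both values first keeps the buffers the same length (`timingSafeEqual` throws otherwise) and avoids leaking the expected token's length through an early length check.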