Body:
Summary
Open-Audit already exposes Prometheus metrics at /metrics and has a circuit breaker and rate limiter in lib/resilience/. However, there is no human-readable status page that shows the health of the system at a glance — not just for operators, but for any contributor or user who wants to understand whether the system is working correctly. This issue adds a public /status page and the backend API that powers it.
Required work
New API route: app/api/status/route.ts
Return a structured JSON health report:
{
  "status": "healthy" | "degraded" | "down",
  "timestamp": "ISO8601",
  "components": {
    "stellarRpc": {
      "status": "healthy" | "degraded" | "down",
      "latencyMs": 142,
      "lastChecked": "ISO8601",
      "circuitBreakerState": "closed" | "open" | "half-open"
    },
    "database": {
      "status": "healthy" | "down",
      "latencyMs": 8,
      "lastChecked": "ISO8601"
    },
    "redis": {
      "status": "healthy" | "down" | "not-configured",
      "latencyMs": 2,
      "lastChecked": "ISO8601"
    },
    "worker": {
      "status": "healthy" | "down" | "not-configured",
      "lastHeartbeat": "ISO8601"
    }
  },
  "metrics": {
    "eventsIndexedLast1h": 1452,
    "eventsIndexedLast24h": 18934,
    "translationSuccessRate1h": 0.94,
    "translationSuccessRate24h": 0.97,
    "averageTranslationLatencyMs": 12,
    "activeWebSocketConnections": 7
  }
}
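The report shape above can also be written as TypeScript types, which makes the allowed values for each status field explicit. This is only a sketch: the type names (`ComponentStatus`, `ProbedComponent`, `StatusReport`) are illustrative and are not taken from the existing codebase, and the timestamps in the example value are placeholders.

```typescript
// Illustrative types for the /api/status response. All names are assumptions.
type ComponentStatus = "healthy" | "degraded" | "down" | "not-configured";
type OverallStatus = "healthy" | "degraded" | "down";
type CircuitBreakerState = "closed" | "open" | "half-open";

// A component that is actively probed and therefore has a latency reading.
interface ProbedComponent<S extends ComponentStatus> {
  status: S;
  latencyMs: number;
  lastChecked: string; // ISO 8601 timestamp
}

interface StatusReport {
  status: OverallStatus;
  timestamp: string; // ISO 8601 timestamp
  components: {
    stellarRpc: ProbedComponent<"healthy" | "degraded" | "down"> & {
      circuitBreakerState: CircuitBreakerState;
    };
    database: ProbedComponent<"healthy" | "down">;
    redis: ProbedComponent<"healthy" | "down" | "not-configured">;
    worker: {
      status: "healthy" | "down" | "not-configured";
      lastHeartbeat: string; // ISO 8601 timestamp
    };
  };
  metrics: {
    eventsIndexedLast1h: number;
    eventsIndexedLast24h: number;
    translationSuccessRate1h: number; // fraction between 0 and 1
    translationSuccessRate24h: number; // fraction between 0 and 1
    averageTranslationLatencyMs: number;
    activeWebSocketConnections: number;
  };
}

// Example value using the sample figures from the report above.
const now = "2024-01-01T00:00:00.000Z"; // placeholder timestamp
const exampleReport: StatusReport = {
  status: "healthy",
  timestamp: now,
  components: {
    stellarRpc: { status: "healthy", latencyMs: 142, lastChecked: now, circuitBreakerState: "closed" },
    database: { status: "healthy", latencyMs: 8, lastChecked: now },
    redis: { status: "healthy", latencyMs: 2, lastChecked: now },
    worker: { status: "healthy", lastHeartbeat: now },
  },
  metrics: {
    eventsIndexedLast1h: 1452,
    eventsIndexedLast24h: 18934,
    translationSuccessRate1h: 0.94,
    translationSuccessRate24h: 0.97,
    averageTranslationLatencyMs: 12,
    activeWebSocketConnections: 7,
  },
};
```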
Implementation:
Stellar RPC health: ping getLatestLedger and record latency; read circuit breaker state from lib/resilience/circuit-breaker.ts
Database health: run SELECT 1 via Prisma and record latency
Redis health: ping Redis if REDIS_URL is set; return "not-configured" otherwise
Worker health: the indexer worker writes a heartbeat timestamp to Redis every 30 seconds; the status API reads it and marks the worker "down" if the heartbeat is missing or older than 90 seconds
Metrics: query the database for event counts and translation outcomes in the last 1h and 24h windows
Overall status is evaluated in this order: "down" if Stellar RPC or the database is unreachable; otherwise "degraded" if any configured component is degraded or down while the system is still partially functional; otherwise "healthy". Components reporting "not-configured" are ignored
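The checks and the overall-status rule above could be sketched roughly as follows. The helper names (`probe`, `overallStatus`) and the 400 ms timeout are assumptions chosen for illustration, not existing code; the aggregation treats a failed non-critical component (Redis or the worker) as "degraded", which is one reading of the rule.

```typescript
type ComponentStatus = "healthy" | "degraded" | "down" | "not-configured";
type OverallStatus = "healthy" | "degraded" | "down";

interface ProbeResult {
  status: "healthy" | "down";
  latencyMs: number;
  lastChecked: string; // ISO 8601 timestamp
}

// Hypothetical helper: runs one health check, records how long it took, and
// treats a thrown error, a rejection, or a timeout as "down".
async function probe(
  check: () => Promise<unknown>,
  timeoutMs = 400, // assumed budget that keeps the route under 500 ms
): Promise<ProbeResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("health check timed out")), timeoutMs);
  });
  let status: "healthy" | "down" = "healthy";
  try {
    await Promise.race([check(), timeout]);
  } catch {
    status = "down";
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
  return { status, latencyMs: Date.now() - started, lastChecked: new Date().toISOString() };
}

// Hypothetical aggregation of per-component statuses into the overall status.
function overallStatus(components: {
  stellarRpc: ComponentStatus;
  database: ComponentStatus;
  redis: ComponentStatus;
  worker: ComponentStatus;
}): OverallStatus {
  // Stellar RPC and the database are critical: either one being down is an outage.
  if (components.stellarRpc === "down" || components.database === "down") {
    return "down";
  }
  // Ignore components that are not configured; any other non-healthy one degrades.
  const configured = [
    components.stellarRpc,
    components.database,
    components.redis,
    components.worker,
  ].filter((s) => s !== "not-configured");
  return configured.every((s) => s === "healthy") ? "healthy" : "degraded";
}
```

In the route handler, the individual checks would typically be started together (for example with `Promise.all`) so the slowest check, rather than the sum of all checks, determines the response time.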
New page: app/status/page.tsx
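Server-rendered page that fetches /api/status on load and refreshes every 30 seconds. Because setInterval and router.refresh() only run in the browser, put the timer in a small client component (marked "use client") that the server-rendered page includes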
Display each component as a status row with a green/amber/red indicator dot, component name, latency, and last-checked time
Display the metrics section as a simple stats grid (events per hour, translation success rate, active connections)
Show a banner at the top: "All systems operational" (green), "Partial outage" (amber), or "Major outage" (red)
The page must render meaningfully even when the API is partially down — show what is available and grey out what is not
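The banner and indicator rules above reduce to two small pure functions that the page can call while rendering. This is a sketch under assumptions: the names `bannerFor` and `indicatorColor` are hypothetical, and mapping a missing or "not-configured" component to grey is one way to satisfy the "grey out what is not available" requirement.

```typescript
type OverallStatus = "healthy" | "degraded" | "down";
type ComponentStatus = "healthy" | "degraded" | "down" | "not-configured";
type Color = "green" | "amber" | "red" | "grey";

interface Banner {
  text: string;
  color: Color;
}

// Maps the overall status to the banner text and colour listed above.
function bannerFor(status: OverallStatus): Banner {
  switch (status) {
    case "healthy":
      return { text: "All systems operational", color: "green" };
    case "degraded":
      return { text: "Partial outage", color: "amber" };
    case "down":
      return { text: "Major outage", color: "red" };
  }
}

// Maps one component's status to its indicator dot colour.
// `undefined` stands for data that the API could not provide.
function indicatorColor(status: ComponentStatus | undefined): Color {
  switch (status) {
    case "healthy":
      return "green";
    case "degraded":
      return "amber";
    case "down":
      return "red";
    default:
      return "grey"; // unavailable or not configured (assumed rendering)
  }
}
```

Keeping these mappings free of React makes them easy to unit test separately from the page component.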
Worker heartbeat:
Add a setInterval in src/worker/indexer.ts that runs HSET open-audit:worker:heartbeat lastSeen <current ISO 8601 timestamp> against Redis every 30 seconds
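A minimal sketch of both sides of the heartbeat, the worker writing it and the status API judging its age, might look like this. The `HashStore` interface is a hypothetical stand-in for whichever Redis client the project uses (it only assumes an `hset(key, field, value)` method that returns a promise), and `startHeartbeat` and `workerStatus` are illustrative names.

```typescript
const HEARTBEAT_KEY = "open-audit:worker:heartbeat";
const HEARTBEAT_FIELD = "lastSeen";

// Hypothetical minimal slice of a Redis client; adapt to the real client in use.
interface HashStore {
  hset(key: string, field: string, value: string): Promise<unknown>;
}

// Worker side: writes the current time now and then every `intervalMs`.
// Returns a function that stops the timer, e.g. on worker shutdown.
function startHeartbeat(store: HashStore, intervalMs = 30 * 1000): () => void {
  const beat = (): void => {
    store
      .hset(HEARTBEAT_KEY, HEARTBEAT_FIELD, new Date().toISOString())
      .catch((err: unknown) => console.error("heartbeat write failed", err));
  };
  beat(); // write immediately so the worker shows as alive right after start
  const timer = setInterval(beat, intervalMs);
  return () => clearInterval(timer);
}

// API side: a missing, unparseable, or stale timestamp counts as "down".
function workerStatus(
  lastSeen: string | null,
  nowMs: number = Date.now(),
  staleAfterMs = 90 * 1000,
): "healthy" | "down" {
  if (lastSeen === null) return "down";
  const seenMs = Date.parse(lastSeen);
  if (Number.isNaN(seenMs)) return "down";
  return nowMs - seenMs > staleAfterMs ? "down" : "healthy";
}
```

The 90-second threshold is three times the write interval, so up to two consecutive missed writes are tolerated before the worker is reported as down.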
Acceptance criteria
GET /api/status returns a valid JSON response in under 500ms under normal conditions
Each component's health reflects its actual state: simulate a Stellar RPC timeout and confirm stellarRpc.status becomes "down" and, because Stellar RPC is a critical component under the overall-status rule, the overall status also becomes "down"
The /status page renders correctly in a browser with all component rows and metric stats visible
The page auto-refreshes every 30 seconds without a full page reload
Worker heartbeat is written to Redis correctly and the status API correctly detects a stale heartbeat as "down"
Unit tests cover: all-healthy response shape, degraded when one component fails, down when database is unreachable, stale worker heartbeat detection
npm run lint and npm test pass with no regressions