Description
What problem are you facing?
The circuit breaker implementation in #6777 prevents XR reconciliation thrashing by opening when watch events arrive too frequently. When the circuit opens, users see a Responsive condition on their XR indicating the problem.
Operators need to monitor circuit breaker behavior at the cluster level to detect and alert on thrashing patterns. The status condition provides per-XR visibility, but there's no way to:
- Alert when circuits are opening frequently across the cluster
- Track how many circuits are currently open
- Measure the rate of dropped reconciliation events
- Identify which XRD types are experiencing thrashing
Without metrics, operators can't proactively detect thrashing patterns or set up alerts before users notice degraded responsiveness.
How could Crossplane help solve your problem?
Add Prometheus metrics for circuit breaker state transitions and event handling. The metrics should use the controller label (format: composite/<plural>.<group>) to provide per-XRD visibility without per-XR cardinality explosion.
Proposed metrics:
- crossplane_circuit_breaker_opens_total: Counter incremented when circuit transitions from closed to open. Tracks thrashing frequency per XRD type.
- crossplane_circuit_breaker_closes_total: Counter incremented when circuit transitions from open to closed. Combined with opens, allows deriving current open circuit count.
- crossplane_circuit_breaker_events_total{result="allowed|dropped|halfopen_allowed"}: Counter tracking all events by outcome. The result label distinguishes normal events, dropped events during circuit open, and probe events during half-open state.
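To make the shape of the three proposed counters concrete, here is a minimal sketch in Go using only the standard library. It is not the Crossplane implementation: the `Metrics` type, its methods, and the example controller name are hypothetical, and the maps stand in for what would presumably be Prometheus counter vectors keyed by the controller label.

```go
package main

import (
	"fmt"
	"sync"
)

// Values of the proposed "result" label.
const (
	ResultAllowed         = "allowed"
	ResultDropped         = "dropped"
	ResultHalfOpenAllowed = "halfopen_allowed"
)

// eventKey identifies one series of the events counter.
type eventKey struct {
	controller string
	result     string
}

// Metrics is a hypothetical in-memory stand-in for the three proposed
// counters. Every series is keyed by the controller label only, never by
// individual XR, so cardinality grows with the number of XRDs.
type Metrics struct {
	mu     sync.Mutex
	opens  map[string]uint64
	closes map[string]uint64
	events map[eventKey]uint64
}

// NewMetrics returns an empty set of counters.
func NewMetrics() *Metrics {
	return &Metrics{
		opens:  make(map[string]uint64),
		closes: make(map[string]uint64),
		events: make(map[eventKey]uint64),
	}
}

// IncOpen records a closed-to-open transition.
func (m *Metrics) IncOpen(controller string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.opens[controller]++
}

// IncClose records an open-to-closed transition.
func (m *Metrics) IncClose(controller string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.closes[controller]++
}

// IncEvent records one watch event with its outcome.
func (m *Metrics) IncEvent(controller, result string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.events[eventKey{controller, result}]++
}

// Opens returns the opens counter for a controller.
func (m *Metrics) Opens(controller string) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.opens[controller]
}

// Closes returns the closes counter for a controller.
func (m *Metrics) Closes(controller string) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.closes[controller]
}

// Events returns the events counter for a controller and result.
func (m *Metrics) Events(controller, result string) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.events[eventKey{controller, result}]
}

func main() {
	m := NewMetrics()
	ctrl := "composite/xexamples.example.org" // hypothetical controller label

	m.IncEvent(ctrl, ResultAllowed)         // normal event, circuit closed
	m.IncOpen(ctrl)                         // circuit opens
	m.IncEvent(ctrl, ResultDropped)         // event dropped while open
	m.IncEvent(ctrl, ResultHalfOpenAllowed) // probe event in half-open state
	m.IncClose(ctrl)                        // circuit closes again

	fmt.Println(m.Opens(ctrl), m.Closes(ctrl), m.Events(ctrl, ResultDropped))
}
```

A real implementation would more likely register counter vectors with the Prometheus Go client than maintain maps by hand. The sketch only illustrates which transitions and outcomes increment which series.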
These counters enable operators to:
- Alert when rate(opens_total[5m]) > threshold indicates frequent thrashing
- Track currently open circuits via opens_total - closes_total
- Monitor drop rates via rate(events_total{result="dropped"}[5m])
- Identify problematic XRD types for investigation