Commit a1ce94c

rakdutta authored and crivetimihai committed
doc
Signed-off-by: rakdutta <[email protected]>
1 parent 7aa2f4d commit a1ce94c

3 files changed: +120 -18 lines changed

docs/docs/manage/observability.md

Lines changed: 113 additions & 2 deletions
@@ -1,6 +1,6 @@
-# Observability
+## Observability
 
-MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing, enabling you to monitor performance, debug issues, and understand request flows.
+MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing and Prometheus-compatible metrics exposure.
 
 ## Documentation
 
@@ -23,3 +23,114 @@ mcpgateway
 ```
 
 View traces at http://localhost:6006
+
+## Prometheus metrics (important)
+
+Note: the metrics exposure is wired from `mcpgateway/main.py` but the HTTP
+handler itself is registered by the metrics module. The main application
+imports and calls `setup_metrics(app)` from `mcpgateway.services.metrics`. The
+`setup_metrics` function instruments the FastAPI app and registers the
+Prometheus scrape endpoint using the Prometheus instrumentator; the endpoint
+available to Prometheus scrapers is:
+
+- GET /metrics/prometheus
+
+The route is created by `Instrumentator.expose` inside
+`mcpgateway/services/metrics.py` (not by manually adding a GET handler in
+`main.py`). The endpoint is registered with `include_in_schema=True` (so it
+appears in OpenAPI / Swagger) and gzip compression is enabled by default
+(`should_gzip=True`) for the exposition handler.
+
+### Env vars / settings that control metrics
+
+- `ENABLE_METRICS` (env) — set to `true` (default) to enable instrumentation; set `false` to disable.
+- `METRICS_EXCLUDED_HANDLERS` (env / settings) — comma-separated regexes for endpoints to exclude from instrumentation (useful for SSE/WS or per-request high-cardinality paths). The implementation reads `settings.METRICS_EXCLUDED_HANDLERS` and compiles the patterns.
+- `METRICS_CUSTOM_LABELS` (env / settings) — comma-separated `key=value` pairs used as static labels on the `app_info` gauge (low-cardinality values only). When present, a Prometheus `app_info` gauge is created and set to 1 with those labels.
+- Additional settings in `mcpgateway/config.py`: `METRICS_NAMESPACE`, `METRICS_SUBSYSTEM`. Note: these config fields exist, but the current `metrics` module does not wire them into the instrumentator by default (they're available for future use/consumption by custom collectors).
+
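Note on the two list-valued settings above: the following is a minimal Python sketch of how a comma-separated regex list and a list of `key=value` pairs could be parsed. The helper names (`parse_excluded_handlers`, `parse_custom_labels`, `is_excluded`) are hypothetical and are not taken from `mcpgateway/services/metrics.py`; matching with `re.search` is likewise an assumption about how exclusion is decided.

```python
import re
from typing import Dict, List, Pattern


def parse_excluded_handlers(raw: str) -> List[Pattern[str]]:
    """Split a comma-separated string of regexes and compile each non-empty part."""
    return [re.compile(part.strip()) for part in raw.split(",") if part.strip()]


def parse_custom_labels(raw: str) -> Dict[str, str]:
    """Turn 'k1=v1,k2=v2' into a dict, skipping entries without '=' or a key."""
    labels: Dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        if key.strip():
            labels[key.strip()] = value.strip()
    return labels


def is_excluded(path: str, patterns: List[Pattern[str]]) -> bool:
    """Assumption: a path is excluded when any compiled pattern is found in it."""
    return any(pattern.search(path) for pattern in patterns)


# Values copied from the export lines in the documentation added by this commit.
patterns = parse_excluded_handlers("/servers/.*/sse,/static/.*")
labels = parse_custom_labels("env=local,team=dev")
print(is_excluded("/servers/abc/sse", patterns), labels)
```

One consequence of a plain comma split is that regexes containing commas (for example a `{1,3}` quantifier) would be cut in two, so patterns for this setting are probably best kept comma-free.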
+### Enable / verify locally
+
+1. Ensure `ENABLE_METRICS=true` in your shell or `.env`.
+
+```bash
+export ENABLE_METRICS=true
+export METRICS_CUSTOM_LABELS="env=local,team=dev"
+export METRICS_EXCLUDED_HANDLERS="/servers/.*/sse,/static/.*"
+```
+
+2. Start the gateway (development). By default the app listens on port 4444. The Prometheus endpoint will be:
+
+http://localhost:4444/metrics/prometheus
+
+3. Quick check (get the first lines of exposition text):
+
+```bash
+curl -sS http://localhost:4444/metrics/prometheus | head -n 20
+```
+
+4. If metrics are disabled, the endpoint returns a small JSON 503 response.
+
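Note on the verification steps above: the same check can be scripted. The sketch below is one possible approach using only the Python standard library; `classify` and `fetch` are hypothetical helper names, and the "enabled" heuristic (looking for `# HELP` or `# TYPE` lines) is an assumption based on the Prometheus text exposition format rather than on anything specific to the gateway.

```python
import json
from typing import Tuple
from urllib import error, request


def classify(status: int, content_type: str, body: str) -> str:
    """Return 'enabled', 'disabled', or 'unknown' for a /metrics/prometheus response."""
    if status == 503 and "application/json" in content_type:
        try:
            payload = json.loads(body)
        except ValueError:
            return "unknown"
        # The disabled handler in this commit returns a JSON object with an "error" key.
        if isinstance(payload, dict) and "error" in payload:
            return "disabled"
        return "unknown"
    if status == 200 and ("# HELP" in body or "# TYPE" in body):
        return "enabled"
    return "unknown"


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, str, str]:
    """Fetch a URL and return (status, content type, body), including for HTTP errors."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except error.HTTPError as exc:
        body = exc.read().decode("utf-8", "replace")
        return exc.code, exc.headers.get("Content-Type", ""), body


# Usage against a locally running gateway (not executed here):
# print(classify(*fetch("http://localhost:4444/metrics/prometheus")))
```

`urllib` does not request gzip by default, so the body arrives uncompressed and can be inspected directly.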
+### Prometheus scrape job example
+
+Add the job below to your `prometheus.yml` for local testing:
+
+```yaml
+scrape_configs:
+  - job_name: 'mcp-gateway'
+    metrics_path: /metrics/prometheus
+    static_configs:
+      - targets: ['localhost:4444']
+```
+
+If Prometheus runs in Docker, adjust the target host accordingly (host networking
+or container host IP). See the repo `docs/manage/scale.md` for examples of
+deploying Prometheus in Kubernetes.
+
+### Grafana and dashboards
+
+- Use Grafana to import dashboards for Kubernetes, PostgreSQL and Redis (IDs
+  suggested elsewhere in the repo). For MCP Gateway app metrics, create panels
+  for:
+  - Request rate: `rate(http_requests_total[1m])`
+  - Error rate: `rate(http_requests_total{status=~"5.."}[5m])`
+  - P99 latency: `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
+
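Note on the P99 panel above: `histogram_quantile` estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below reimplements that idea in plain Python for illustration only. It assumes positive bucket bounds (as with latencies), treats the first bucket's lower bound as 0, and uses made-up bucket values; it is not Prometheus's own code and ignores edge cases that the real function handles.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from (upper bound `le`, cumulative count) pairs.

    The bucket list must include the +Inf bucket, as Prometheus histograms do.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break

    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan

    lower, prev = buckets[i - 1] if i > 0 else (0.0, 0.0)
    if count == prev:
        return upper
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


# Hypothetical cumulative counts per `le` bound, in seconds.
sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
print(histogram_quantile(0.75, sample))
```

Because the result is interpolated, its accuracy depends on how finely the buckets are spaced around the quantile of interest.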
+### Common pitfalls — short guidance
+
+- High-cardinality labels
+  - Never add per-request identifiers (user IDs, full URIs, request IDs) as
+    Prometheus labels. They explode the number of time series and can crash
+    Prometheus memory.
+  - Use `METRICS_CUSTOM_LABELS` only for low-cardinality labels (env, region).
+
+- Compression (gzip) vs CPU
+  - The metrics exposer in `mcpgateway.services.metrics` enables gzip by
+    default for the `/metrics/prometheus` endpoint. Compressing the payload
+    reduces network usage but increases CPU on scrape time. On CPU-constrained
+    nodes consider increasing scrape interval (e.g. 15s→30s) or disabling gzip
+    at the instrumentor layer.
+
+- Duplicate collectors during reloads/tests
+  - Instrumentation registers collectors on the global Prometheus registry.
+    When reloading the app in the same process (tests, interactive sessions)
+    you may see "collector already registered"; restart the process or clear
+    the registry in test fixtures.
+
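Note on the gzip-versus-CPU pitfall above: the trade-off can be estimated offline before changing any settings. The sketch below builds a synthetic exposition payload and measures compressed size and compression time with the standard library. The metric lines, label values, and series count are invented for illustration, so real payloads from the gateway will differ.

```python
import gzip
import time


def build_sample_exposition(n_series: int) -> bytes:
    """Build a synthetic Prometheus text payload with n_series sample lines."""
    lines = [
        "# HELP http_requests_total Total number of requests.",
        "# TYPE http_requests_total counter",
    ]
    for i in range(n_series):
        lines.append(
            f'http_requests_total{{handler="/route{i}",method="GET",status="2xx"}} {i * 7}'
        )
    return ("\n".join(lines) + "\n").encode("utf-8")


raw = build_sample_exposition(2000)

start = time.perf_counter()
compressed = gzip.compress(raw)
elapsed_ms = (time.perf_counter() - start) * 1000

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} B, gzip={len(compressed)} B, ratio={ratio:.2f}, time={elapsed_ms:.2f} ms")
```

Multiplying the measured time by the number of scrapes per minute gives a rough idea of the CPU cost that a longer scrape interval would save.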
+### Quick checklist
+
+- [ ] `ENABLE_METRICS=true`
+- [ ] `/metrics/prometheus` reachable
+- [ ] Add scrape job to Prometheus
+- [ ] Exclude high-cardinality paths with `METRICS_EXCLUDED_HANDLERS`
+- [ ] Use tracing (OTel) for high-cardinality debugging information
+
+## Where to look in the code
+
+- `mcpgateway/main.py` — wiring: imports and calls `setup_metrics(app)` from
+  `mcpgateway.services.metrics`. The function call instruments the app at
+  startup; the actual HTTP handler for `/metrics/prometheus` is registered by
+  the `Instrumentator` inside `mcpgateway/services/metrics.py`.
+- `mcpgateway/services/metrics.py` — instrumentation implementation and env-vars.
+- `mcpgateway/config.py` — settings defaults and names used by the app.
+
+---

mcpgateway/main.py

Lines changed: 2 additions & 8 deletions
@@ -482,6 +482,8 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
 
 # Setup metrics instrumentation
 setup_metrics(app)
+
+
 async def validate_security_configuration():
     """Validate security configuration on startup."""
     logger.info("🔒 Validating security configuration...")
@@ -1031,8 +1033,6 @@ async def _call_streamable_http(self, scope, receive, send):
 tag_router = APIRouter(prefix="/tags", tags=["Tags"])
 export_import_router = APIRouter(tags=["Export/Import"])
 a2a_router = APIRouter(prefix="/a2a", tags=["A2A Agents"])
-# Create a metrics router for Prometheus metrics
-metrics_router = APIRouter(prefix="/metrics", tags=["Metrics"])
 
 # Basic Auth setup
 
@@ -3952,12 +3952,6 @@ async def reset_metrics(entity: Optional[str] = None, entity_id: Optional[int] =
     return {"status": "success", "message": f"Metrics reset for {entity if entity else 'all entities'}"}
 
 
-# Define the /prometheus endpoint
-@metrics_router.get("/prometheus", summary="Prometheus Metrics", description="Expose Prometheus metrics for monitoring.")
-def prometheus_metrics():
-    """Endpoint to expose Prometheus metrics."""
-    return setup_metrics(app)
-
 ####################
 # Healthcheck #
 ####################

mcpgateway/services/metrics.py

Lines changed: 5 additions & 8 deletions
@@ -43,9 +43,9 @@
 import re
 
 # Third-Party
+from fastapi import Response, status
 from prometheus_client import Gauge, REGISTRY
 from prometheus_fastapi_instrumentator import Instrumentator
-from fastapi import Response, status
 
 # First-Party
 from mcpgateway.config import settings
@@ -108,17 +108,14 @@ def setup_metrics(app):
         # Instrument FastAPI app
         instrumentator.instrument(app)
 
-        # Expose Prometheus metrics at /metrics/prometheus
-        instrumentator.expose(app, endpoint="/metrics/prometheus", include_in_schema=False, should_gzip=True)
+        # Expose Prometheus metrics at /metrics/prometheus and include
+        # the endpoint in the OpenAPI schema so it appears in Swagger UI.
+        instrumentator.expose(app, endpoint="/metrics/prometheus", include_in_schema=True, should_gzip=True)
 
         print("✅ Metrics instrumentation enabled")
     else:
         print("⚠️ Metrics instrumentation disabled")
 
         @app.get("/metrics/prometheus")
         async def metrics_disabled():
-            return Response(
-                content='{"error": "Metrics collection is disabled"}',
-                media_type="application/json",
-                status_code=status.HTTP_503_SERVICE_UNAVAILABLE
-            )
+            return Response(content='{"error": "Metrics collection is disabled"}', media_type="application/json", status_code=status.HTTP_503_SERVICE_UNAVAILABLE)
