@shoummu1 (Collaborator) commented Nov 14, 2025

📋 Summary

This PR delivers a comprehensive structured JSON logging pipeline that captures correlation IDs end-to-end (ingress middleware → services → persistence) while maintaining backward compatibility with legacy console/file logs. It introduces:

  • Correlation ID tracking: Extract, preserve, and generate unique request identifiers across the entire request lifecycle
  • Structured logging: Persist enriched logs to database with user context, performance metrics, and security indicators
  • Security & audit trails: Specialized loggers for authentication events, suspicious activity, and CRUD operations
  • Performance aggregation: Automatic rollup of logs into time-windowed metrics with percentiles
  • Admin UI enhancement: Rebuilt System Logs tab with search, correlation tracing, security events, and performance analytics

🔗 Related Issues

#300


🔧 Changes Made

Core Implementation

Correlation ID Infrastructure

  • New utility module (mcpgateway/utils/correlation_id.py): ContextVar-based correlation ID storage for async-safe request tracking across the entire request lifecycle
  • New middleware (mcpgateway/middleware/correlation_id.py): HTTP middleware for X-Correlation-ID header extraction, validation, generation, and injection into responses
  • Enhanced logging (mcpgateway/services/logging_service.py): CorrelationIdJsonFormatter for automatic correlation ID injection into JSON logs with OpenTelemetry trace context
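As a rough illustration of the ContextVar-based storage described above, the sketch below shows why a `ContextVar` is async-safe: each asyncio task gets its own copy of the context, so concurrent requests do not see each other's IDs. The function names and the UUID4-hex format are assumptions chosen for illustration (the format merely matches the shape of the ID shown in the example output of this PR); the actual module may differ.

```python
import asyncio
import uuid
from contextvars import ContextVar
from typing import Dict, Optional

# Hypothetical module-level storage; names are illustrative only and are not
# taken from mcpgateway/utils/correlation_id.py.
_correlation_id: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)


def generate_correlation_id() -> str:
    """Return a new 32-character hex identifier (a UUID4 without dashes)."""
    return uuid.uuid4().hex


def set_correlation_id(value: str) -> None:
    """Bind a correlation ID to the current execution context."""
    _correlation_id.set(value)


def get_correlation_id() -> Optional[str]:
    """Return the ID bound to the current context, or None if unset."""
    return _correlation_id.get()


async def handle_request(name: str, results: Dict[str, Optional[str]]) -> None:
    # Each asyncio task runs in its own copy of the context, so the value
    # set here is invisible to sibling tasks.
    set_correlation_id(f"req-{name}")
    await asyncio.sleep(0)  # yield so the other task gets a chance to run
    results[name] = get_correlation_id()


async def main() -> Dict[str, Optional[str]]:
    results: Dict[str, Optional[str]] = {}
    await asyncio.gather(handle_request("a", results), handle_request("b", results))
    return results
```

Running `asyncio.run(main())` returns one ID per simulated request, and the value set inside a task does not leak into the caller's context.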

Structured Logging & Observability

  • New structured logger (mcpgateway/services/structured_logger.py): Central logging facade that persists to database (StructuredLogEntry) with enriched metadata (user, component, operation type, duration)
  • New log aggregator (mcpgateway/services/log_aggregator.py): Aggregates structured logs into PerformanceMetric windows with percentiles (p50/p95/p99) and error rates
  • New security logger (mcpgateway/services/security_logger.py): Specialized logger for authentication attempts, suspicious activity, and threat scoring
  • New audit trail service (mcpgateway/services/audit_trail_service.py): CRUD operation tracking with change sets, data classification, and review flags
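To make the aggregation step more concrete, here is a minimal sketch of how log durations could be rolled up into a window with p50/p95/p99 and an error rate. It uses the nearest-rank percentile method and assumes a window size that divides an hour evenly; both are assumptions, and `LogSample`, `window_start`, and `summarize_window` are hypothetical names rather than the API of `log_aggregator.py`.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence


@dataclass
class LogSample:
    """Hypothetical stand-in for the fields an aggregator would read."""
    duration_ms: float
    is_error: bool


def window_start(ts: datetime, window_minutes: int = 5) -> datetime:
    """Floor a timestamp to the start of its window (window must divide 60)."""
    minute = ts.minute - ts.minute % window_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile over an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_window(samples: List[LogSample]) -> dict:
    """Summarize one window of samples into percentiles and an error rate."""
    if not samples:
        return {"count": 0}
    durations = sorted(s.duration_ms for s in samples)
    errors = sum(1 for s in samples if s.is_error)
    return {
        "count": len(samples),
        "p50": percentile(durations, 50),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
        "error_rate": errors / len(samples),
    }
```

Nearest-rank is only one of several percentile definitions; an implementation backed by the database might interpolate instead.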

API & Admin UI

  • New log search router (mcpgateway/routers/log_search.py): RESTful endpoints for log search, correlation tracing, security events, audit trails, and performance metrics
  • Enhanced Admin UI (mcpgateway/static/admin.js, mcpgateway/templates/admin.html): System Logs tab rebuilt with quick actions, correlation trace modal, unified timeline view, and dynamic filters

Database Schema

  • New Alembic migration (mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py): Creates 4 new tables:
    • structured_log_entries: Comprehensive log storage with correlation IDs, user context, performance data, security indicators
    • performance_metrics: Time-windowed aggregations with percentile calculations
    • security_events: Threat analysis, failed attempt tracking, alert management
    • audit_trails: CRUD tracking with change detection and compliance metadata

⚙️ Configuration

New Settings in config.py:

  1. Correlation ID Settings (4 new fields):

    • correlation_id_enabled: Enable/disable correlation ID tracking (default: True)
    • correlation_id_header: Configurable header name (default: X-Correlation-ID)
    • correlation_id_preserve: Preserve client-provided IDs (default: True)
    • correlation_id_response_header: Echo correlation ID in responses (default: True)
  2. Structured Logging Settings (3 new fields):

    • structured_logging_enabled: Enable JSON logging with DB persistence (default: True)
    • structured_logging_database_enabled: Persist logs to database (default: True)
    • structured_logging_external_enabled: Send to external systems (default: False)
  3. Performance Tracking Settings (6 new fields):

    • performance_tracking_enabled: Enable performance metrics (default: True)
    • performance_threshold_*_ms: Alert thresholds for database queries, tool invocations, resource reads, HTTP requests
    • performance_degradation_multiplier: Alert threshold vs baseline (default: 1.5)
  4. Security Logging Settings (4 new fields):

    • security_logging_enabled: Enable security event logging (default: True)
    • security_failed_auth_threshold: Failed attempts before high severity (default: 5)
    • security_threat_score_alert: Threat score alert threshold (default: 0.7)
    • security_rate_limit_window_minutes: Rate limit check window (default: 5)
  5. Metrics Aggregation Settings (4 new fields):

    • metrics_aggregation_enabled: Enable automatic log aggregation (default: True)
    • metrics_aggregation_backfill_hours: Historical data to backfill on startup (default: 6)
    • metrics_aggregation_window_minutes: Aggregation window size (default: 5)
    • metrics_aggregation_auto_start: Auto-run aggregation loop (default: False)
  6. Log Search Settings (2 new fields):

    • log_search_max_results: Maximum results per query (default: 1000)
    • log_retention_days: Days to retain logs in database (default: 30)
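The two kinds of performance alert settings listed above (absolute `performance_threshold_*_ms` values and the relative `performance_degradation_multiplier`) could be evaluated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and whether the real comparison is strict (`>`) or inclusive (`>=`) is not stated in this PR.

```python
def exceeds_threshold(duration_ms: float, threshold_ms: float) -> bool:
    """Absolute check: the operation took longer than its configured limit."""
    return duration_ms > threshold_ms


def is_degraded(duration_ms: float, baseline_ms: float, multiplier: float = 1.5) -> bool:
    """Relative check: the operation is slower than baseline * multiplier.

    A non-positive baseline means there is no history yet, so nothing is flagged.
    """
    return baseline_ms > 0 and duration_ms > baseline_ms * multiplier
```

With the default multiplier of 1.5, a call that takes 160 ms against a 100 ms baseline would be flagged, while one that takes exactly 150 ms would not under the strict comparison assumed here.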

Updated .env.example:

  • Added 4 new active Correlation ID settings (CORRELATION_ID_ENABLED, CORRELATION_ID_HEADER, CORRELATION_ID_PRESERVE, CORRELATION_ID_RESPONSE_HEADER)
  • Added 17 new commented examples for Structured Logging, Performance Tracking, Security Logging, Metrics Aggregation, and Log Search settings
  • All 21 settings are fully documented in config.py with Pydantic Field definitions and defaults

🔌 Integration Points

Middleware Stack (main.py):

  1. Registered CorrelationIDMiddleware after RequestLoggingMiddleware (execution order: RequestLogging → CorrelationID → Auth → Observability)
  2. Added background tasks for metrics aggregation backfill + continuous loop when metrics_aggregation_auto_start=True
  3. Included log_search router when structured_logging_enabled=True
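For readers unfamiliar with the extract, validate, generate, and inject flow that a correlation ID middleware performs, the following is a minimal pure-ASGI sketch that depends only on the standard library. It is not the `CorrelationIDMiddleware` from this PR: the class name, the validation pattern, and storing the ID under `scope["state"]` are all assumptions made for illustration.

```python
import re
import uuid

HEADER = b"x-correlation-id"  # ASGI header names are lowercase bytes
# Hypothetical validation rule: 1 to 64 URL-safe characters.
_VALID_ID = re.compile(r"[A-Za-z0-9_\-]{1,64}")


class CorrelationIdSketch:
    """Illustrative ASGI middleware: preserve a valid client ID or generate one."""

    def __init__(self, app, preserve: bool = True):
        self.app = app
        self.preserve = preserve

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Extract the client-provided ID, if any.
        incoming = dict(scope.get("headers", [])).get(HEADER, b"").decode("latin-1")
        if self.preserve and _VALID_ID.fullmatch(incoming):
            correlation_id = incoming
        else:
            correlation_id = uuid.uuid4().hex
        scope.setdefault("state", {})["correlation_id"] = correlation_id

        async def send_with_header(message):
            # Inject the ID into the response headers before they are sent.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((HEADER, correlation_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_header)
```

Rejecting malformed client IDs and generating a fresh one, as assumed here, keeps untrusted header content out of logs; the real middleware may apply different validation rules.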

Authentication & Security:

  1. auth.py: Enhanced JWT validation with correlation ID context
  2. middleware/auth_middleware.py: AuthContextMiddleware now logs successful/failed authentication attempts via SecurityLogger
  3. middleware/http_auth_middleware.py: Unified correlation ID usage across plugin auth hooks
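One way failed authentication attempts might be escalated, using the `security_failed_auth_threshold` and `security_rate_limit_window_minutes` defaults from the configuration section, is a sliding-window counter per client. The sketch below is an assumption about the approach, not the `SecurityLogger` implementation; the class name, the per-IP keying, and the `"low"` severity label are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict


class FailedAuthTracker:
    """Illustrative sliding-window counter of failed logins per client."""

    def __init__(self, threshold: int = 5, window_minutes: int = 5):
        self.threshold = threshold
        self.window = timedelta(minutes=window_minutes)
        self._attempts: Dict[str, Deque[datetime]] = defaultdict(deque)

    def record_failure(self, client_ip: str, now: datetime) -> str:
        """Record one failed attempt and return the resulting severity."""
        attempts = self._attempts[client_ip]
        attempts.append(now)
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] > self.window:
            attempts.popleft()
        return "high" if len(attempts) >= self.threshold else "low"
```

With the defaults, the fifth failure from the same client within five minutes would be reported as high severity, and the count resets naturally once older attempts age out of the window.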

Service Layer:

  1. services/tool_service.py: Integrated correlation ID fallback chain and structured logging for tool invocations
  2. services/resource_service.py: Added user context and audit logging for resource operations
  3. services/prompt_service.py: Enhanced with structured logging and performance tracking
  4. services/server_service.py: Integrated audit trails for server lifecycle events
  5. services/gateway_service.py: Added correlation ID propagation for federated requests
  6. services/a2a_service.py: Added correlation ID and user context to agent invocations
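The audit trails integrated into these services are described in this PR as recording change sets for CRUD operations. A change set can be sketched as a field-by-field diff between the state before and after an update. The helper below is hypothetical and only illustrates the idea; the stored format in `audit_trail_service.py` may differ.

```python
from typing import Any, Dict


def compute_change_set(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return the fields whose values differ between two snapshots.

    Note: a missing key and an explicit None are treated as the same value.
    """
    changes: Dict[str, Dict[str, Any]] = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = {"old": before, "new": after}
    return changes
```

For example, updating a record from `{"name": "a", "url": "x"}` to `{"name": "b", "url": "x", "enabled": True}` would yield entries for `name` and `enabled` only, leaving the unchanged `url` out of the audit record.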

Observability:

  1. observability.py: Auto-inject correlation_id into OpenTelemetry spans as request.id attribute
  2. middleware/request_logging_middleware.py: Gateway boundary logging (request_started/completed) with correlation IDs, user resolution, and duration tracking
  3. admin.py: Plugin marketplace endpoints emit structured logs + audit trails for compliance

📁 New Files

  • mcpgateway/middleware/correlation_id.py – FastAPI middleware that extracts/preserves correlation IDs and injects them into responses
  • mcpgateway/utils/correlation_id.py – ContextVar utilities for generating, validating, and retrieving correlation IDs across async scopes
  • mcpgateway/services/structured_logger.py – Central structured logging facade that writes to JSON, DB, and optional external sinks
  • mcpgateway/services/log_aggregator.py – Aggregates StructuredLogEntry rows into PerformanceMetric windows and exposes helper APIs
  • mcpgateway/services/security_logger.py – Specialized logger for auth/suspicious events, computing threat scores and security audit entries
  • mcpgateway/services/audit_trail_service.py – Shared audit trail writer that records CRUD/data-access operations with change tracking
  • mcpgateway/routers/log_search.py – FastAPI router exposing /api/logs/search, /trace, /security-events, /audit-trails, /performance-metrics endpoints
  • mcpgateway/alembic/versions/k5e6f7g8h9i0_add_structured_logging_tables.py – Migration that creates structured_log_entries, performance_metrics, security_events, and audit_trails tables plus supporting indexes

Example Usage

curl -v http://localhost:4444/health

Full Response:

*   Trying 127.0.0.1:4444...
* Connected to localhost (127.0.0.1) port 4444 (#0)
> GET /health HTTP/1.1
> Host: localhost:4444
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 27 Nov 2025 15:00:29 GMT
< server: uvicorn
< content-length: 20
< content-type: application/json
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-xss-protection: 0
< x-download-options: noopen
< referrer-policy: strict-origin-when-cross-origin
< content-security-policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdnjs.cloudflare.com https://cdn.tailwindcss.com https://cdn.jsdelivr.net https://unpkg.com; style-src 'self' 'unsafe-inline' https://cdnjs.cloudflare.com https://cdn.jsdelivr.net; img-src 'self' data: https:; font-src 'self' data: https://cdnjs.cloudflare.com; connect-src 'self' ws: wss: https:; frame-ancestors 'none';
< x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
<
* Connection #0 to host localhost left intact
{"status":"healthy"}
  • Response header: x-correlation-id: 6930e1f1a8b84beb904e18594bbf15dd
  • Server logs: {"request_id": "6930e1f1a8b84beb904e18594bbf15dd", ...}

Correlation trace in Admin UI:

  1. Navigate to Admin UI → System Logs tab
  2. Click a correlation ID to trace it
  3. Alternatively, enter the correlation ID or paste it from the search box
  4. View unified timeline with all logs, security events, audit trails, and performance metrics for that request

@shoummu1 marked this pull request as ready for review November 14, 2025 13:46
@shoummu1 force-pushed the feat/correlation-id-logging branch from d0712be to a17e408 November 20, 2025 07:48
@shoummu1 marked this pull request as draft November 20, 2025 11:18
@shoummu1 force-pushed the feat/correlation-id-logging branch 2 times, most recently from dd94c98 to cb9e60a November 26, 2025 11:42
@shoummu1 marked this pull request as ready for review November 27, 2025 15:25
Signed-off-by: Shoumi <[email protected]>
@shoummu1 force-pushed the feat/correlation-id-logging branch from 7650197 to f5d3932 November 28, 2025 05:52
Signed-off-by: Shoumi <[email protected]>