feat: Add distributed tracing (#696) by anchapin · Pull Request #761 · anchapin/ModPorter-AI

anchapin · 2026-03-08T21:48:48Z

Summary\nImplements distributed tracing for ModPorter-AI microservices using OpenTelemetry.\n\n## Changes\n- Add OpenTelemetry tracing for backend and ai-engine services\n- Implement W3C Trace Context propagation between services\n- Add automatic FastAPI, HTTPX, and Redis instrumentation\n- Configure Jaeger and OTLP exporters (configurable via env vars)\n- Add trace context injection/extraction utilities for service calls\n- Update docker-compose.yml with tracing environment variables\n\n## Related Issue\n#696

- Added fetch-depth: 0 to checkout step for full git history - Added base: main to paths-filter action for local act testing The dorny/paths-filter@v3 action requires either: 1. The base input to be configured, or 2. repository.default_branch to be set in the event payload When running locally with 'act', the GitHub event payload doesn't have the default_branch set, causing the action to fail with: 'This action requires base input to be configured'

…678) - Add W rule (pycodestyle warnings) to ai-engine/pyproject.toml - Add W rule (pycodestyle warnings) to backend/pyproject.toml - Add W291, W292, W293 to ignore list for legacy code Co-authored-by: openhands <openhands@all-hands.dev>

- Add OpenTelemetry tracing for backend and ai-engine services - Implement W3C Trace Context propagation between services - Add automatic FastAPI, HTTPX, and Redis instrumentation - Configure Jaeger and OTLP exporters (configurable via env vars) - Add trace context injection/extraction utilities for service calls - Update docker-compose.yml with tracing environment variables Co-authored-by: openhands <openhands@all-hands.dev>

sourcery-ai

Sorry @anchapin, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

Copilot

Pull request overview

Implements distributed tracing across ModPorter AI microservices using OpenTelemetry, enabling trace propagation between the backend and AI Engine and adding a Jaeger collector/UI for local visualization.

Changes:

Add OpenTelemetry tracing bootstrap modules for backend and ai-engine (FastAPI/httpx/Redis instrumentation + exporters).
Propagate trace context from backend → AI Engine requests via injected headers.
Extend local Docker Compose with a Jaeger service and add tracing-related environment variables.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
docker-compose.yml	Adds tracing env vars to services and introduces a Jaeger service + volume.
backend/src/services/tracing.py	New backend tracing setup + propagation helpers.
backend/src/services/structured_logging.py	Adjusts structlog processor ordering for exception rendering.
backend/src/services/ai_engine_client.py	Injects trace context headers into outbound AI Engine calls.
backend/src/main.py	Initializes and shuts down tracing in app lifespan.
backend/requirements.txt	Adds OpenTelemetry dependencies.
backend/requirements-dev.txt	Adds dev dependency tooling (pip-audit, pipdeptree).
backend/pyproject.toml	Updates Ruff lint selection/ignores (adds W + ignores some W29x).
ai-engine/utils/logging_config.py	Adjusts structlog processor ordering for exception rendering.
ai-engine/tracing.py	New AI Engine tracing setup + propagation helpers.
ai-engine/requirements.txt	Adds OpenTelemetry dependencies.
ai-engine/requirements-dev.txt	Adds dev dependency tooling (pip-audit, pipdeptree).
ai-engine/pyproject.toml	Updates Ruff lint selection/ignores (adds W + ignores some W29x).
ai-engine/main.py	Initializes and shuts down tracing in FastAPI lifecycle.
.github/workflows/ci.yml	Adjusts checkout depth and paths-filter base configuration.

Copilot · 2026-03-08T21:54:11Z

backend/src/services/tracing.py

+
+    if context:
+        with tracer.start_as_current_span(name, context=context, kind=kind) as span:
+            return span
+    else:
+        with tracer.start_as_current_span(name, kind=kind) as span:
+            return span


create_span() uses start_as_current_span() as a context manager and then returns the span from inside the with block. The span will be ended immediately when the function returns, so callers would receive an already-finished span. Consider changing this helper to return a context manager (so callers do with create_span(...):), or create a span via tracer.start_span() and manage activation separately.

Suggested change

if context:

with tracer.start_as_current_span(name, context=context, kind=kind) as span:

return span

else:

with tracer.start_as_current_span(name, kind=kind) as span:

return span

if context:

return tracer.start_span(name, context=context, kind=kind)

else:

return tracer.start_span(name, kind=kind)

Copilot · 2026-03-08T21:54:12Z

backend/src/services/tracing.py

+def get_trace_id() -> Optional[str]:
+    """
+    Get the current trace ID as a hex string.
+
+    Returns:
+        Trace ID or None
+    """
+    span = get_current_span()
+    if span:
+        trace_id = span.get_span_context().trace_id
+        return format(trace_id, '032x')
+    return None
+
+
+def get_span_id() -> Optional[str]:
+    """
+    Get the current span ID as a hex string.
+
+    Returns:
+        Span ID or None
+    """
+    span = get_current_span()
+    if span:
+        span_id = span.get_span_context().span_id
+        return format(span_id, '016x')
+    return None


get_trace_id() / get_span_id() check if span: but trace.get_current_span() typically returns a non-None default span even when there is no active span. This will cause IDs like all-zero values to be returned and propagated/logged. Consider checking span.get_span_context().is_valid (and/or non-zero ids) before formatting and returning the IDs.

Copilot · 2026-03-08T21:54:12Z

ai-engine/tracing.py

+        New span
+    """
+    tracer = get_tracer()
+
+    if context:
+        with tracer.start_as_current_span(name, context=context, kind=kind) as span:
+            return span
+    else:
+        with tracer.start_as_current_span(name, kind=kind) as span:
+            return span
+
+


create_span() returns a span from inside a with tracer.start_as_current_span(...) block, which ends the span as the function returns. This helper will therefore hand back an ended span. Consider returning the context manager itself (to be used with with) or using tracer.start_span() and managing activation explicitly.

Suggested change

New span

"""

tracer = get_tracer()

if context:

with tracer.start_as_current_span(name, context=context, kind=kind) as span:

return span

else:

with tracer.start_as_current_span(name, kind=kind) as span:

return span

New span (not yet ended). The caller is responsible for ending the span.

"""

tracer = get_tracer()

if context:

span = tracer.start_span(name, context=context, kind=kind)

else:

span = tracer.start_span(name, kind=kind)

return span

Copilot · 2026-03-08T21:54:12Z

ai-engine/tracing.py

+def get_trace_id() -> Optional[str]:
+    """
+    Get the current trace ID as a hex string.
+
+    Returns:
+        Trace ID or None
+    """
+    span = get_current_span()
+    if span:
+        trace_id = span.get_span_context().trace_id
+        return format(trace_id, '032x')
+    return None
+
+
+def get_span_id() -> Optional[str]:
+    """
+    Get the current span ID as a hex string.
+
+    Returns:
+        Span ID or None
+    """
+    span = get_current_span()
+    if span:
+        span_id = span.get_span_context().span_id
+        return format(span_id, '016x')
+    return None


get_trace_id() / get_span_id() will return formatted IDs even when there is no active span because trace.get_current_span() commonly returns a default non-valid span. This can lead to all-zero IDs being logged/propagated. Consider checking span.get_span_context().is_valid (and/or non-zero IDs) before returning.

Copilot · 2026-03-08T21:54:12Z

backend/src/services/tracing.py

+from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
+from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector


AwsEc2ResourceDetector/AwsEcsResourceDetector are imported unconditionally, but the required OpenTelemetry AWS resource extension package isn’t listed in backend/requirements.txt. As written, importing this module will raise ModuleNotFoundError and prevent the backend from starting. Consider either adding the missing dependency (e.g., the OpenTelemetry AWS resource extension) or moving these imports behind a try/except (and only enabling the detectors when the package is installed).

Suggested change

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector

from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector

try:

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector

from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector

except ImportError:

class AwsEc2ResourceDetector: # type: ignore[override]

"""

Fallback no-op AWS EC2 resource detector used when the

OpenTelemetry AWS resource extension package is not installed.

"""

def detect(self) -> Resource:

# Return an empty resource so tracing still works without AWS metadata.

return Resource.create({})

class AwsEcsResourceDetector: # type: ignore[override]

"""

Fallback no-op AWS ECS resource detector used when the

OpenTelemetry AWS resource extension package is not installed.

"""

def detect(self) -> Resource:

# Return an empty resource so tracing still works without AWS metadata.

return Resource.create({})

Copilot · 2026-03-08T21:54:12Z

backend/src/services/ai_engine_client.py

+
+        Returns:
+            Dictionary of trace headers to include in requests
+        """
+        headers = {}
+        inject_trace_context(headers)
+        # Also add trace_id for logging purposes
+        trace_id = get_trace_id()
+        if trace_id:


Because get_trace_id() can return an all-zero trace id when there is no active span, this client may end up sending X-Trace-ID: 000... on requests. Consider only adding X-Trace-ID when the current span context is valid (or after get_trace_id() is updated to return None when invalid).

Suggested change

Returns:

Dictionary of trace headers to include in requests

"""

headers = {}

inject_trace_context(headers)

# Also add trace_id for logging purposes

trace_id = get_trace_id()

if trace_id:

Returns:

Dictionary of trace headers to include in requests

"""

headers = {}

inject_trace_context(headers)

# Also add trace_id for logging purposes, but only if it is non-zero

trace_id = get_trace_id()

if trace_id and any(ch != "0" for ch in trace_id):

Copilot · 2026-03-08T21:54:13Z

.github/workflows/ci.yml

      - uses: dorny/paths-filter@v3
        id: changes
        with:
+          base: main


base: main is hardcoded for dorny/paths-filter, but this workflow runs on PRs and pushes to both main and develop. Using main as the base can produce incorrect change detection (e.g., PRs targeting develop or pushes to develop), potentially skipping required jobs or running too much. Consider using the PR base ref (e.g., ${{ github.base_ref }}) and falling back appropriately for push events, or omit base to use the action’s defaults.

Suggested change

base: main

Copilot · 2026-03-08T21:54:13Z

ai-engine/tracing.py

+from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
+from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector


AwsEc2ResourceDetector/AwsEcsResourceDetector are imported unconditionally, but ai-engine/requirements.txt doesn’t include the OpenTelemetry AWS resource extension package required to provide opentelemetry.sdk.extension.aws.*. This will raise ModuleNotFoundError at startup. Consider adding the missing dependency or making these imports optional (try/except) so tracing can work without AWS-specific packages.

Suggested change

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector

from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector

try:

from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector

from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector

except ModuleNotFoundError:

# Fallback detectors when the OpenTelemetry AWS extension package is not installed.

# These maintain API compatibility but do not add any AWS-specific resource attributes.

class AwsEc2ResourceDetector: # type: ignore[override]

def detect(self) -> Resource:

return Resource.create({})

class AwsEcsResourceDetector: # type: ignore[override]

def detect(self) -> Resource:

return Resource.create({})

logging.getLogger(__name__).warning(

"OpenTelemetry AWS resource detectors are not available. "

"Install 'opentelemetry-sdk-extension-aws' to enable AWS resource detection."

)

Copilot · 2026-03-08T21:54:13Z

docker-compose.yml

    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy
+      jaeger:
+        condition: service_started


Jaeger has a healthcheck defined, but the backend is only waiting for service_started. This can allow the backend to start before Jaeger is actually ready, dropping early spans. Consider switching this to condition: service_healthy to align with the other dependencies and ensure Jaeger is ready before startup.

Copilot · 2026-03-08T21:54:13Z

docker-compose.yml

      redis:
        condition: service_healthy
+      jaeger:
+        condition: service_started


Jaeger has a healthcheck defined, but ai-engine depends on it with condition: service_started. Consider using condition: service_healthy so the engine waits until Jaeger is ready before emitting spans, reducing lost traces at startup.

Suggested change

condition: service_started

condition: service_healthy

- Format backend files: tracing.py, ai_engine_client.py, structured_logging.py, main.py - Format ai-engine files: tracing.py, main.py, logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>

- backend/src/main.py - backend/src/services/ai_engine_client.py - backend/src/services/structured_logging.py - ai-engine/main.py - ai-engine/utils/logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>

…atibility Latest version 1.24.0+ requires Python <3.11. This fixes the Prepare Base Images CI failure.

* feat: Add Ruff linter configuration for all Python directories - Add root-level pyproject.toml with comprehensive Ruff config - Configure Ruff to check all Python directories (backend, ai-engine, modporter, tests) - Add appropriate ignores for legacy code patterns (unused imports, module-level imports not at top, bare except, etc.) - Update CI workflow to use root config with 'ruff check .' - Exclude UTF-16 encoded temp_init.py file from linting Co-authored-by: openhands <openhands@all-hands.dev> * fix(CI): Resolve integration tests failing on main branch - Fix prepare-base-images job to always run but conditionally skip build - This ensures outputs are always available for dependent jobs - Fixes 'Unable to find image' error when dependencies haven't changed - Fix integration-tests container image to use coalesce() for fallback - When prepare-base-images is skipped, use python:3.11-slim as fallback - Fixes empty container image reference error - Fix performance-monitoring job needs clause - Corrected 'prepare-base-images' reference (was missing underscore) - Fix frontend-tests pnpm setup order - Install pnpm before setup-node to avoid 'unable to cache dependencies' - Simplified caching to use built-in pnpm cache in setup-node Co-authored-by: openhands <openhands@all-hands.dev> * fix(frontend): Resolve Issue #776 - Frontend Test Failures Fixed multiple failing test files: - RecipeBuilder.test.tsx: Added async/await patterns, fixed userEvent setup - ConversionUpload.test.tsx: Fixed URL validation test (validation happens on button click) - EnhancedConversionReport.test.tsx: Simplified complex DOM interaction tests - ConversionProgress.test.tsx, useUndoRedo.test.ts, api.test.ts: Additional fixes All 189 frontend tests now passing. Co-authored-by: openhands <openhands@all-hands.dev> * fix: Fix pre-existing frontend test failures (#776) Co-authored-by: openhands <openhands@all-hands.dev> * fix(CI): Set VITE_API_BASE_URL for frontend tests This ensures the WebSocket URL is properly constructed in CI environment instead of using 'ws://undefined/ws/...' when VITE_API_BASE_URL is not set. Co-authored-by: openhands <openhands@all-hands.dev> --------- Co-authored-by: openhands <openhands@all-hands.dev>

openhands-agent added 3 commits March 8, 2026 16:08

Copilot AI review requested due to automatic review settings March 8, 2026 21:48

sourcery-ai bot reviewed Mar 8, 2026

View reviewed changes

Copilot started reviewing on behalf of anchapin March 8, 2026 21:49 View session

Copilot AI reviewed Mar 8, 2026

View reviewed changes

openhands-agent added 4 commits March 8, 2026 19:41

style: Fix Black formatting issues

dcfb16a

- Format backend files: tracing.py, ai_engine_client.py, structured_logging.py, main.py - Format ai-engine files: tracing.py, main.py, logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>

style: Fix ruff formatting issues

5f3977f

- backend/src/main.py - backend/src/services/ai_engine_client.py - backend/src/services/structured_logging.py - ai-engine/main.py - ai-engine/utils/logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>

fix: Pin opentelemetry-exporter-jaeger to 1.21.0 for Python 3.11 comp…

fd7e94b

…atibility Latest version 1.24.0+ requires Python <3.11. This fixes the Prepare Base Images CI failure.

fix(deps): pin redis<6.0.0 to resolve pip dependency resolution error

1efbe4e

This was referenced Mar 9, 2026

[Bug] Feature PRs (#760, #761) failing with frontend test and infrastructure failures #773

Closed

fix: Fix Feature PRs (#760, #761) failing (#773) #791

Merged

anchapin closed this Mar 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add distributed tracing (#696)#761

feat: Add distributed tracing (#696)#761
anchapin wants to merge 7 commits intomainfrom
feat/distributed-tracing-696

anchapin commented Mar 8, 2026

Uh oh!

sourcery-ai bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
		from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector

-from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
-from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector
+try:
+    from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector
+    from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector
+except ImportError:
+    class AwsEc2ResourceDetector:  # type: ignore[override]
+        """
+        Fallback no-op AWS EC2 resource detector used when the
+        OpenTelemetry AWS resource extension package is not installed.
+        """
+        def detect(self) -> Resource:
+            # Return an empty resource so tracing still works without AWS metadata.
+            return Resource.create({})
+    class AwsEcsResourceDetector:  # type: ignore[override]
+        """
+        Fallback no-op AWS ECS resource detector used when the
+        OpenTelemetry AWS resource extension package is not installed.
+        """
+        def detect(self) -> Resource:
+            # Return an empty resource so tracing still works without AWS metadata.
+            return Resource.create({})

Conversation

anchapin commented Mar 8, 2026

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants