Conversation
- Added fetch-depth: 0 to checkout step for full git history - Added base: main to paths-filter action for local act testing The dorny/paths-filter@v3 action requires either: 1. The base input to be configured, or 2. repository.default_branch to be set in the event payload When running locally with 'act', the GitHub event payload doesn't have the default_branch set, causing the action to fail with: 'This action requires base input to be configured'
…678) - Add W rule (pycodestyle warnings) to ai-engine/pyproject.toml - Add W rule (pycodestyle warnings) to backend/pyproject.toml - Add W291, W292, W293 to ignore list for legacy code Co-authored-by: openhands <openhands@all-hands.dev>
- Add OpenTelemetry tracing for backend and ai-engine services - Implement W3C Trace Context propagation between services - Add automatic FastAPI, HTTPX, and Redis instrumentation - Configure Jaeger and OTLP exporters (configurable via env vars) - Add trace context injection/extraction utilities for service calls - Update docker-compose.yml with tracing environment variables Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
Pull request overview
Implements distributed tracing across ModPorter AI microservices using OpenTelemetry, enabling trace propagation between the backend and AI Engine and adding a Jaeger collector/UI for local visualization.
Changes:
- Add OpenTelemetry tracing bootstrap modules for
backendandai-engine(FastAPI/httpx/Redis instrumentation + exporters). - Propagate trace context from backend → AI Engine requests via injected headers.
- Extend local Docker Compose with a Jaeger service and add tracing-related environment variables.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| docker-compose.yml | Adds tracing env vars to services and introduces a Jaeger service + volume. |
| backend/src/services/tracing.py | New backend tracing setup + propagation helpers. |
| backend/src/services/structured_logging.py | Adjusts structlog processor ordering for exception rendering. |
| backend/src/services/ai_engine_client.py | Injects trace context headers into outbound AI Engine calls. |
| backend/src/main.py | Initializes and shuts down tracing in app lifespan. |
| backend/requirements.txt | Adds OpenTelemetry dependencies. |
| backend/requirements-dev.txt | Adds dev dependency tooling (pip-audit, pipdeptree). |
| backend/pyproject.toml | Updates Ruff lint selection/ignores (adds W + ignores some W29x). |
| ai-engine/utils/logging_config.py | Adjusts structlog processor ordering for exception rendering. |
| ai-engine/tracing.py | New AI Engine tracing setup + propagation helpers. |
| ai-engine/requirements.txt | Adds OpenTelemetry dependencies. |
| ai-engine/requirements-dev.txt | Adds dev dependency tooling (pip-audit, pipdeptree). |
| ai-engine/pyproject.toml | Updates Ruff lint selection/ignores (adds W + ignores some W29x). |
| ai-engine/main.py | Initializes and shuts down tracing in FastAPI lifecycle. |
| .github/workflows/ci.yml | Adjusts checkout depth and paths-filter base configuration. |
backend/src/services/tracing.py
Outdated
|
|
||
| if context: | ||
| with tracer.start_as_current_span(name, context=context, kind=kind) as span: | ||
| return span | ||
| else: | ||
| with tracer.start_as_current_span(name, kind=kind) as span: | ||
| return span |
There was a problem hiding this comment.
create_span() uses start_as_current_span() as a context manager and then returns the span from inside the with block. The span will be ended immediately when the function returns, so callers would receive an already-finished span. Consider changing this helper to return a context manager (so callers do with create_span(...):), or create a span via tracer.start_span() and manage activation separately.
| if context: | |
| with tracer.start_as_current_span(name, context=context, kind=kind) as span: | |
| return span | |
| else: | |
| with tracer.start_as_current_span(name, kind=kind) as span: | |
| return span | |
| if context: | |
| return tracer.start_span(name, context=context, kind=kind) | |
| else: | |
| return tracer.start_span(name, kind=kind) |
| def get_trace_id() -> Optional[str]: | ||
| """ | ||
| Get the current trace ID as a hex string. | ||
|
|
||
| Returns: | ||
| Trace ID or None | ||
| """ | ||
| span = get_current_span() | ||
| if span: | ||
| trace_id = span.get_span_context().trace_id | ||
| return format(trace_id, '032x') | ||
| return None | ||
|
|
||
|
|
||
| def get_span_id() -> Optional[str]: | ||
| """ | ||
| Get the current span ID as a hex string. | ||
|
|
||
| Returns: | ||
| Span ID or None | ||
| """ | ||
| span = get_current_span() | ||
| if span: | ||
| span_id = span.get_span_context().span_id | ||
| return format(span_id, '016x') | ||
| return None |
There was a problem hiding this comment.
get_trace_id() / get_span_id() check if span: but trace.get_current_span() typically returns a non-None default span even when there is no active span. This will cause IDs like all-zero values to be returned and propagated/logged. Consider checking span.get_span_context().is_valid (and/or non-zero ids) before formatting and returning the IDs.
| New span | ||
| """ | ||
| tracer = get_tracer() | ||
|
|
||
| if context: | ||
| with tracer.start_as_current_span(name, context=context, kind=kind) as span: | ||
| return span | ||
| else: | ||
| with tracer.start_as_current_span(name, kind=kind) as span: | ||
| return span | ||
|
|
||
|
|
There was a problem hiding this comment.
create_span() returns a span from inside a with tracer.start_as_current_span(...) block, which ends the span as the function returns. This helper will therefore hand back an ended span. Consider returning the context manager itself (to be used with with) or using tracer.start_span() and managing activation explicitly.
| New span | |
| """ | |
| tracer = get_tracer() | |
| if context: | |
| with tracer.start_as_current_span(name, context=context, kind=kind) as span: | |
| return span | |
| else: | |
| with tracer.start_as_current_span(name, kind=kind) as span: | |
| return span | |
| New span (not yet ended). The caller is responsible for ending the span. | |
| """ | |
| tracer = get_tracer() | |
| if context: | |
| span = tracer.start_span(name, context=context, kind=kind) | |
| else: | |
| span = tracer.start_span(name, kind=kind) | |
| return span |
| def get_trace_id() -> Optional[str]: | ||
| """ | ||
| Get the current trace ID as a hex string. | ||
|
|
||
| Returns: | ||
| Trace ID or None | ||
| """ | ||
| span = get_current_span() | ||
| if span: | ||
| trace_id = span.get_span_context().trace_id | ||
| return format(trace_id, '032x') | ||
| return None | ||
|
|
||
|
|
||
| def get_span_id() -> Optional[str]: | ||
| """ | ||
| Get the current span ID as a hex string. | ||
|
|
||
| Returns: | ||
| Span ID or None | ||
| """ | ||
| span = get_current_span() | ||
| if span: | ||
| span_id = span.get_span_context().span_id | ||
| return format(span_id, '016x') | ||
| return None |
There was a problem hiding this comment.
get_trace_id() / get_span_id() will return formatted IDs even when there is no active span because trace.get_current_span() commonly returns a default non-valid span. This can lead to all-zero IDs being logged/propagated. Consider checking span.get_span_context().is_valid (and/or non-zero IDs) before returning.
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | ||
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector |
There was a problem hiding this comment.
AwsEc2ResourceDetector/AwsEcsResourceDetector are imported unconditionally, but the required OpenTelemetry AWS resource extension package isn’t listed in backend/requirements.txt. As written, importing this module will raise ModuleNotFoundError and prevent the backend from starting. Consider either adding the missing dependency (e.g., the OpenTelemetry AWS resource extension) or moving these imports behind a try/except (and only enabling the detectors when the package is installed).
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | |
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector | |
| try: | |
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | |
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector | |
| except ImportError: | |
| class AwsEc2ResourceDetector: # type: ignore[override] | |
| """ | |
| Fallback no-op AWS EC2 resource detector used when the | |
| OpenTelemetry AWS resource extension package is not installed. | |
| """ | |
| def detect(self) -> Resource: | |
| # Return an empty resource so tracing still works without AWS metadata. | |
| return Resource.create({}) | |
| class AwsEcsResourceDetector: # type: ignore[override] | |
| """ | |
| Fallback no-op AWS ECS resource detector used when the | |
| OpenTelemetry AWS resource extension package is not installed. | |
| """ | |
| def detect(self) -> Resource: | |
| # Return an empty resource so tracing still works without AWS metadata. | |
| return Resource.create({}) |
|
|
||
| Returns: | ||
| Dictionary of trace headers to include in requests | ||
| """ | ||
| headers = {} | ||
| inject_trace_context(headers) | ||
| # Also add trace_id for logging purposes | ||
| trace_id = get_trace_id() | ||
| if trace_id: |
There was a problem hiding this comment.
Because get_trace_id() can return an all-zero trace id when there is no active span, this client may end up sending X-Trace-ID: 000... on requests. Consider only adding X-Trace-ID when the current span context is valid (or after get_trace_id() is updated to return None when invalid).
| Returns: | |
| Dictionary of trace headers to include in requests | |
| """ | |
| headers = {} | |
| inject_trace_context(headers) | |
| # Also add trace_id for logging purposes | |
| trace_id = get_trace_id() | |
| if trace_id: | |
| Returns: | |
| Dictionary of trace headers to include in requests | |
| """ | |
| headers = {} | |
| inject_trace_context(headers) | |
| # Also add trace_id for logging purposes, but only if it is non-zero | |
| trace_id = get_trace_id() | |
| if trace_id and any(ch != "0" for ch in trace_id): |
| - uses: dorny/paths-filter@v3 | ||
| id: changes | ||
| with: | ||
| base: main |
There was a problem hiding this comment.
base: main is hardcoded for dorny/paths-filter, but this workflow runs on PRs and pushes to both main and develop. Using main as the base can produce incorrect change detection (e.g., PRs targeting develop or pushes to develop), potentially skipping required jobs or running too much. Consider using the PR base ref (e.g., ${{ github.base_ref }}) and falling back appropriately for push events, or omit base to use the action’s defaults.
| base: main |
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | ||
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector |
There was a problem hiding this comment.
AwsEc2ResourceDetector/AwsEcsResourceDetector are imported unconditionally, but ai-engine/requirements.txt doesn’t include the OpenTelemetry AWS resource extension package required to provide opentelemetry.sdk.extension.aws.*. This will raise ModuleNotFoundError at startup. Consider adding the missing dependency or making these imports optional (try/except) so tracing can work without AWS-specific packages.
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | |
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector | |
| try: | |
| from opentelemetry.sdk.extension.aws.resource.ec2 import AwsEc2ResourceDetector | |
| from opentelemetry.sdk.extension.aws.resource.ecs import AwsEcsResourceDetector | |
| except ModuleNotFoundError: | |
| # Fallback detectors when the OpenTelemetry AWS extension package is not installed. | |
| # These maintain API compatibility but do not add any AWS-specific resource attributes. | |
| class AwsEc2ResourceDetector: # type: ignore[override] | |
| def detect(self) -> Resource: | |
| return Resource.create({}) | |
| class AwsEcsResourceDetector: # type: ignore[override] | |
| def detect(self) -> Resource: | |
| return Resource.create({}) | |
| logging.getLogger(__name__).warning( | |
| "OpenTelemetry AWS resource detectors are not available. " | |
| "Install 'opentelemetry-sdk-extension-aws' to enable AWS resource detection." | |
| ) |
| depends_on: | ||
| redis: | ||
| condition: service_healthy | ||
| postgres: | ||
| condition: service_healthy | ||
| jaeger: | ||
| condition: service_started |
There was a problem hiding this comment.
Jaeger has a healthcheck defined, but the backend is only waiting for service_started. This can allow the backend to start before Jaeger is actually ready, dropping early spans. Consider switching this to condition: service_healthy to align with the other dependencies and ensure Jaeger is ready before startup.
| redis: | ||
| condition: service_healthy | ||
| jaeger: | ||
| condition: service_started |
There was a problem hiding this comment.
Jaeger has a healthcheck defined, but ai-engine depends on it with condition: service_started. Consider using condition: service_healthy so the engine waits until Jaeger is ready before emitting spans, reducing lost traces at startup.
| condition: service_started | |
| condition: service_healthy |
- Format backend files: tracing.py, ai_engine_client.py, structured_logging.py, main.py - Format ai-engine files: tracing.py, main.py, logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>
- backend/src/main.py - backend/src/services/ai_engine_client.py - backend/src/services/structured_logging.py - ai-engine/main.py - ai-engine/utils/logging_config.py Co-authored-by: openhands <openhands@all-hands.dev>
…atibility Latest version 1.24.0+ requires Python <3.11. This fixes the Prepare Base Images CI failure.
* feat: Add Ruff linter configuration for all Python directories - Add root-level pyproject.toml with comprehensive Ruff config - Configure Ruff to check all Python directories (backend, ai-engine, modporter, tests) - Add appropriate ignores for legacy code patterns (unused imports, module-level imports not at top, bare except, etc.) - Update CI workflow to use root config with 'ruff check .' - Exclude UTF-16 encoded temp_init.py file from linting Co-authored-by: openhands <openhands@all-hands.dev> * fix(CI): Resolve integration tests failing on main branch - Fix prepare-base-images job to always run but conditionally skip build - This ensures outputs are always available for dependent jobs - Fixes 'Unable to find image' error when dependencies haven't changed - Fix integration-tests container image to use coalesce() for fallback - When prepare-base-images is skipped, use python:3.11-slim as fallback - Fixes empty container image reference error - Fix performance-monitoring job needs clause - Corrected 'prepare-base-images' reference (was missing underscore) - Fix frontend-tests pnpm setup order - Install pnpm before setup-node to avoid 'unable to cache dependencies' - Simplified caching to use built-in pnpm cache in setup-node Co-authored-by: openhands <openhands@all-hands.dev> * fix(frontend): Resolve Issue #776 - Frontend Test Failures Fixed multiple failing test files: - RecipeBuilder.test.tsx: Added async/await patterns, fixed userEvent setup - ConversionUpload.test.tsx: Fixed URL validation test (validation happens on button click) - EnhancedConversionReport.test.tsx: Simplified complex DOM interaction tests - ConversionProgress.test.tsx, useUndoRedo.test.ts, api.test.ts: Additional fixes All 189 frontend tests now passing. Co-authored-by: openhands <openhands@all-hands.dev> * fix: Fix pre-existing frontend test failures (#776) Co-authored-by: openhands <openhands@all-hands.dev> * fix(CI): Set VITE_API_BASE_URL for frontend tests This ensures the WebSocket URL is properly constructed in CI environment instead of using 'ws://undefined/ws/...' when VITE_API_BASE_URL is not set. Co-authored-by: openhands <openhands@all-hands.dev> --------- Co-authored-by: openhands <openhands@all-hands.dev>
Summary\nImplements distributed tracing for ModPorter-AI microservices using OpenTelemetry.\n\n## Changes\n- Add OpenTelemetry tracing for backend and ai-engine services\n- Implement W3C Trace Context propagation between services\n- Add automatic FastAPI, HTTPX, and Redis instrumentation\n- Configure Jaeger and OTLP exporters (configurable via env vars)\n- Add trace context injection/extraction utilities for service calls\n- Update docker-compose.yml with tracing environment variables\n\n## Related Issue\n#696