Commit 81c601f

etirelli, Ambient Code Bot, and claude authored
feat(runner): optional MLflow tracing parallel to Langfuse (#1263)
<!-- acp:session_id=session-f6f5c9bf-a278-4c57-a92b-f648abd59de1 source=#1263 last_action=2026-04-10T14:29:24Z retry_count=3 -->

## Summary

Adds **optional MLflow GenAI tracing** alongside the existing Langfuse path. Langfuse remains the **default** when `OBSERVABILITY_BACKENDS` is unset, so current reports, evals, and feedback (`ambient:langfuse_trace` / `traceId`) behave as before unless operators opt in.

**Backward compatibility:** `LANGFUSE_MASK_MESSAGES` / shared masking uses the **same rules as before this PR** (e.g. redact long strings, allow-listed keys including `metadata` unchanged). Session `spec.environmentVariables` **cannot** override operator-injected env vars that come from **SecretKeyRef** (Langfuse + MLflow), so platform observability config is not bypassed.

## What changed

### Runner (`ambient-runner`)

- **`OBSERVABILITY_BACKENDS`**: comma list `langfuse`, `mlflow` (default: `langfuse` only).
- **`MLflowSessionTracer`** (`mlflow_observability.py`): mirrors turn / tool boundaries with `mlflow.start_span`, usage attributes, session summary span, error cleanup, and flush.
- **`observability_config.py`** / **`observability_privacy.py`**: backend selection and shared message masking (`LANGFUSE_MASK_MESSAGES` still applies to both paths).
- **`ObservabilityManager`**: orchestrates Langfuse + MLflow; **`mlflow_tracing_active`** property; per-backend try/except so one backend failing does not disable the other.
- **`pyproject.toml`**: optional extra **`mlflow-observability`** (`mlflow[kubernetes]>=3.11`, `opentelemetry-exporter-otlp`); included in **`all`** (Docker `[all]` install).
- **README**: MLflow env vars and OTLP (`OTEL_*`, `MLFLOW_TRACE_ENABLE_OTLP_DUAL_EXPORT`).
- **Tests**: `test_observability_config.py`, `test_observability_mlflow_integration.py`; **`_extract_assistant_text`** tolerates missing `claude_agent_sdk` so pytest passes without the `claude` extra.

### Operator & manifests

- Copies **`ambient-admin-mlflow-observability-secret`** into session namespaces when **`MLFLOW_TRACING_ENABLED`** is set on the operator (same pattern as Langfuse).
- Injects **`MLFLOW_TRACING_ENABLED`**, **`MLFLOW_TRACKING_URI`**, **`MLFLOW_EXPERIMENT_NAME`**, optional **`MLFLOW_TRACKING_AUTH`** / **`MLFLOW_WORKSPACE`**, **`OBSERVABILITY_BACKENDS`** via `secretKeyRef`.
- **`replaceOrAppendEnvVars`**: entries already using **`ValueFrom`** (secret-backed) are not replaced by `spec.environmentVariables`, so user env cannot override Langfuse or MLflow keys wired from secrets.
- When the MLflow observability secret is in use, runner pods use **`ServiceAccountName: ambient-session-<session>`** with **token automount** so `MLFLOW_TRACKING_AUTH=kubernetes-namespaced` can supply JWT + workspace headers (per MLflow 3.11). Session **`Role`** grants **`experiments`** (`mlflow.kubeflow.org`) for Kubeflow-style workspace auth (tune for your CRD group).
- Deletes copied secret on session cleanup (mirrors Langfuse rules).
- **`operator-deployment.yaml`**: optional env from the MLflow observability secret.
- **Example secret**: `components/manifests/base/ambient-admin-mlflow-observability-secret.yaml.example`.

### Capabilities API

- **`FrameworkCapabilities.tracing`**: e.g. `langfuse`, `mlflow`, or `langfuse,mlflow` via **`tracing_capability_label`**.

## How to enable (high level)

1. Create **`ambient-admin-mlflow-observability-secret`** in the operator namespace (see example YAML).
2. Set operator env **`MLFLOW_TRACING_ENABLED=true`** (and ensure the secret exists) so the operator copies it and wires runner env.
3. Runner image must include MLflow deps (already true for **`[all]`** builds).

## Testing

- `cd components/runners/ambient-runner && uv run pytest tests/` (full suite; includes observability + privacy tests).
- Operator: run **`go test ./...`** with a Go toolchain matching **`go.mod`** (e.g. 1.24+), including **`TestReplaceOrAppendEnvVarsPreservesValueFrom`**.

## Commits (squash on merge)

Branch includes the feature commit plus follow-ups, e.g. MLflow k8s auth / SA automount / experiment RBAC, `ValueFrom` env precedence, assistant-text test fix, and masking left aligned with historical Langfuse behaviour; see **`git log`** on the PR branch for the exact list.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Optional MLflow observability: per-session MLflow tracing, selectable backends (langfuse, mlflow), backend selection and privacy utilities, and conditional runner service-account automount when namespaced auth is used.
* **Bug Fixes**
  * Prevents user env vars from overriding operator-managed observability vars; preserves platform-managed settings and protects injected names.
* **Documentation**
  * Expanded docs for multi-backend observability, MLflow setup, OTLP export, and masking options.
* **Tests**
  * Added tests for backend selection, MLflow integration, and privacy masking.
* **Chores**
  * Example MLflow secret manifest, RBAC for per-session MLflow access, and packaging extra for MLflow.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Edson Tirelli <etirelli@redhat.com>
Signed-off-by: Ambient Code Bot <bot@ambient-code.local>
Co-authored-by: Ambient Code Bot <bot@ambient-code.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
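The backend selection described under "Runner" can be sketched as follows. This is a minimal illustration, not the contents of `observability_config.py`: the names `parse_backends` and `capability_label` are hypothetical, and the handling of blank values, letter case, unknown names, and duplicates is assumed rather than taken from the PR. Only the comma-list format, the two backend names, and the Langfuse default when the variable is unset come from the description.

```python
from typing import List, Optional

# Backend names taken from the commit description.
KNOWN_BACKENDS = ("langfuse", "mlflow")
DEFAULT_BACKENDS = ["langfuse"]


def parse_backends(raw: Optional[str]) -> List[str]:
    """Turn an OBSERVABILITY_BACKENDS value into an ordered backend list."""
    # Unset keeps the Langfuse default; treating a blank value the same way
    # is an assumption of this sketch.
    if raw is None or not raw.strip():
        return list(DEFAULT_BACKENDS)
    selected: List[str] = []
    for item in raw.split(","):
        name = item.strip().lower()  # assumption: case- and whitespace-insensitive
        # Assumption: unknown names are skipped and duplicates collapsed.
        if name in KNOWN_BACKENDS and name not in selected:
            selected.append(name)
    return selected


def capability_label(backends: List[str]) -> str:
    """Join the active backends into a label such as 'langfuse,mlflow'."""
    return ",".join(backends)
```

In a runner this might be called as `parse_backends(os.environ.get("OBSERVABILITY_BACKENDS"))`, with the resulting list feeding both tracer setup and the capabilities label.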
1 parent 519f9e4 commit 81c601f

17 files changed, +2702 -502 lines changed
components/manifests/base/ambient-admin-mlflow-observability-secret.yaml.example

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Example: platform-wide MLflow tracing credentials for the operator to copy into session namespaces.
+# Apply to the operator namespace (same pattern as ambient-admin-langfuse-secret).
+#
+# kubectl create secret generic ambient-admin-mlflow-observability-secret -n <operator-ns> \
+#   --from-literal=MLFLOW_TRACING_ENABLED=true \
+#   --from-literal=MLFLOW_TRACKING_URI=https://mlflow.example.com \
+#   --from-literal=MLFLOW_TRACKING_AUTH=kubernetes-namespaced \
+#   --from-literal=MLFLOW_EXPERIMENT_NAME=ambient-code-sessions \
+#   --from-literal=MLFLOW_WORKSPACE=my-workspace \
+#   --from-literal=OBSERVABILITY_BACKENDS=langfuse,mlflow
+#
+# Keys:
+#   MLFLOW_TRACING_ENABLED - "true" to enable secret copy + runner env injection
+#   MLFLOW_TRACKING_URI - required on the runner for MLflow tracing
+#   MLFLOW_TRACKING_AUTH - auth method; use "kubernetes-namespaced" for MLflow 3.11+ on K8s
+#     so MLflow sends Authorization (service account JWT) and X-MLFLOW-WORKSPACE
+#   MLFLOW_EXPERIMENT_NAME - optional (runner default: ambient-code-sessions)
+#   MLFLOW_WORKSPACE - optional; fixed workspace id for X-MLFLOW-WORKSPACE instead of pod namespace
+#   OBSERVABILITY_BACKENDS - optional; comma list: langfuse, mlflow (runner default: langfuse only if unset)
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ambient-admin-mlflow-observability-secret
+  namespace: CHANGE_ME_OPERATOR_NAMESPACE
+type: Opaque
+stringData:
+  MLFLOW_TRACING_ENABLED: "true"
+  MLFLOW_TRACKING_URI: "https://mlflow.example.com"
+  MLFLOW_TRACKING_AUTH: "kubernetes-namespaced"
+  MLFLOW_EXPERIMENT_NAME: "ambient-code-sessions"
+  OBSERVABILITY_BACKENDS: "langfuse,mlflow"
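To show how the keys in this example secret could be consumed once they reach a runner pod as environment variables, here is a hedged sketch. `MlflowSettings` and `load_mlflow_settings` are hypothetical names, not the runner's API. That the runner itself checks `MLFLOW_TRACING_ENABLED`, and that the check is case-insensitive, are assumptions. The required tracking URI and the `ambient-code-sessions` default come from the comments in the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Runner default named in the example secret's comments.
DEFAULT_EXPERIMENT = "ambient-code-sessions"


@dataclass(frozen=True)
class MlflowSettings:
    tracking_uri: str
    experiment_name: str
    tracking_auth: Optional[str] = None
    workspace: Optional[str] = None


def load_mlflow_settings(env: Mapping[str, str]) -> Optional[MlflowSettings]:
    """Return settings when MLflow tracing is usable, or None to skip it."""
    # Assumption: the runner also honours MLFLOW_TRACING_ENABLED.
    if env.get("MLFLOW_TRACING_ENABLED", "").strip().lower() != "true":
        return None
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        # The tracking URI is documented as required on the runner.
        return None
    return MlflowSettings(
        tracking_uri=uri,
        experiment_name=env.get("MLFLOW_EXPERIMENT_NAME", "").strip() or DEFAULT_EXPERIMENT,
        tracking_auth=env.get("MLFLOW_TRACKING_AUTH") or None,
        workspace=env.get("MLFLOW_WORKSPACE") or None,
    )
```

Passing a mapping rather than reading `os.environ` directly keeps the function easy to test; a caller would use `load_mlflow_settings(os.environ)`.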

components/manifests/base/core/operator-deployment.yaml

Lines changed: 13 additions & 0 deletions
@@ -99,6 +99,19 @@ spec:
               name: ambient-admin-langfuse-secret
               key: LANGFUSE_SECRET_KEY
               optional: true # Optional: only needed if Langfuse enabled
+        # MLflow tracing (runner); optional secret managed by platform admin
+        - name: MLFLOW_TRACING_ENABLED
+          valueFrom:
+            secretKeyRef:
+              name: ambient-admin-mlflow-observability-secret
+              key: MLFLOW_TRACING_ENABLED
+              optional: true
+        - name: OBSERVABILITY_BACKENDS
+          valueFrom:
+            secretKeyRef:
+              name: ambient-admin-mlflow-observability-secret
+              key: OBSERVABILITY_BACKENDS
+              optional: true
         # Google OAuth client credentials for workspace-mcp
         - name: GOOGLE_OAUTH_CLIENT_ID
           valueFrom:
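The entries above are secret-backed (`valueFrom.secretKeyRef`), which is the shape the commit description says `replaceOrAppendEnvVars` protects from `spec.environmentVariables`. The real function is Go code in the operator and is not shown on this page. The sketch below only models the stated precedence rule in Python, with env vars as plain dicts; the name `replace_or_append_env_vars` and the dict layout are illustrative assumptions.

```python
from typing import Any, Dict, List

EnvVar = Dict[str, Any]  # e.g. {"name": ..., "value": ...} or {"name": ..., "valueFrom": {...}}


def replace_or_append_env_vars(base: List[EnvVar], overrides: List[EnvVar]) -> List[EnvVar]:
    """Merge user overrides into base without touching secret-backed entries."""
    merged = [dict(item) for item in base]  # copy so the inputs stay unmodified
    index = {item["name"]: pos for pos, item in enumerate(merged)}
    for override in overrides:
        name = override["name"]
        if name not in index:
            # New name: append it.
            index[name] = len(merged)
            merged.append(dict(override))
        elif merged[index[name]].get("valueFrom") is None:
            # Plain-value entry: the user override wins.
            merged[index[name]] = dict(override)
        # Otherwise the existing entry is secret-backed and is kept as-is.
    return merged
```

The effect is that a session can still add or change ordinary variables, while names wired from the Langfuse or MLflow secrets keep their operator-provided source.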

components/operator/internal/handlers/reconciler.go

Lines changed: 3 additions & 2 deletions
@@ -115,8 +115,9 @@ func TransitionToStopped(ctx context.Context, session *unstructured.Unstructured
 	// Cleanup secrets
 	deleteCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
 	defer cancel()
-	_ = deleteAmbientVertexSecret(deleteCtx, namespace)
-	_ = deleteAmbientLangfuseSecret(deleteCtx, namespace)
+	_ = deleteAmbientVertexSecret(deleteCtx, namespace, name)
+	_ = deleteAmbientLangfuseSecret(deleteCtx, namespace, name)
+	_ = deleteAmbientMlflowObservabilitySecret(deleteCtx, namespace, name)
 
 	return nil
 }
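The cleanup calls above discard their return values, so one failed deletion does not stop the next. The commit description claims the same kind of isolation on the runner side, where `ObservabilityManager` wraps each backend in try/except. The sketch below illustrates that pattern in Python; `emit_to_backends` and the callable-per-backend shape are hypothetical and are not the manager's real interface.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("observability_sketch")


def emit_to_backends(backends: Dict[str, Callable[[dict], None]], event: dict) -> List[str]:
    """Send event to every backend and return the names that accepted it."""
    delivered: List[str] = []
    for name, send in backends.items():
        try:
            send(event)
        except Exception:
            # Log and move on so a failing backend cannot disable the others.
            logger.warning("tracing backend %s failed", name, exc_info=True)
            continue
        delivered.append(name)
    return delivered
```

Returning the list of backends that succeeded lets a caller report partial delivery without raising, which matches the stated goal that one backend failing does not disable the other.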
