The Ambient Code Platform uses Langfuse for LLM-specific observability and cost tracking. Langfuse provides detailed insights into Claude interactions, token usage, and tool executions across all agentic sessions.
Key capabilities:
- Turn-level generations with accurate token and cost tracking
- Tool execution visibility without cost inflation
- Session grouping and multi-user cost allocation
- Real-time trace streaming with async flush
OpenTelemetry Compatibility: Langfuse v3 is built on OpenTelemetry standards. While we use the native Langfuse SDK for simplicity, the platform can integrate with any OTEL-compatible observability backend if desired.
Our instrumentation creates a flat trace hierarchy where each Claude turn is a top-level trace:
```
claude_turn_1 (trace)
├─ input: user prompt
├─ output: assistant response
├─ usage: tokens + cost
└─ tool_Read, tool_Write (child spans for visibility)

claude_turn_2 (trace)
├─ input: follow-up prompt
├─ output: assistant response
├─ usage: tokens + cost
└─ tool_Bash (child span)

Grouped by: session_id via propagate_attributes
```
Design rationale:
- Turns as traces: Each `claude_turn_X` is a top-level trace (not nested under a session trace)
- Session grouping: `propagate_attributes()` groups related traces by `session_id`, `user_id`, and tags
- Tool visibility: Tool spans provide execution details without duplicate token counting
- Real-time streaming: Explicit flush after each turn for immediate UI visibility
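The flat-hierarchy grouping above can be illustrated with plain dictionaries (the trace shapes here are simplified stand-ins, not the Langfuse data model): traces stay top-level, and the UI correlates them by their shared `session_id`.

```python
from collections import defaultdict

def group_by_session(traces):
    """Correlate top-level traces by shared session_id, as the Langfuse UI does."""
    sessions = defaultdict(list)
    for trace in traces:
        sessions[trace["session_id"]].append(trace["name"])
    return dict(sessions)

# Hypothetical flat traces, each carrying the propagated attributes
traces = [
    {"name": "claude_turn_1", "session_id": "agentic-session-abc", "user_id": "alice"},
    {"name": "claude_turn_2", "session_id": "agentic-session-abc", "user_id": "alice"},
    {"name": "claude_turn_1", "session_id": "agentic-session-xyz", "user_id": "bob"},
]

print(group_by_session(traces))
# → {'agentic-session-abc': ['claude_turn_1', 'claude_turn_2'],
#    'agentic-session-xyz': ['claude_turn_1']}
```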
The Claude SDK processes responses as an async generator that yields messages across multiple loop iterations:
```python
async for message in client.receive_response():
    # Messages arrive sequentially:
    # 1. AssistantMessage → start turn
    # 2. ToolUseBlock(s) → track tool spans
    # 3. ToolResultBlock(s) → update tool results
    # 4. ResultMessage → close turn with usage data
```

Manual context management required: Python's `with` statement cannot maintain state across async iterations, so we manually call `__enter__()` and `__exit__()` on Langfuse contexts.
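Why manual `__enter__()`/`__exit__()` is needed can be demonstrated with a plain `contextlib` context manager, independent of Langfuse; the `observation` helper and message strings below are illustrative stand-ins, not the real SDK types.

```python
import asyncio
from contextlib import contextmanager

events = []

@contextmanager
def observation(name):  # stand-in for a Langfuse observation context
    events.append(f"enter:{name}")
    yield name
    events.append(f"exit:{name}")

class Tracker:
    """Hold a context open across async iterations by driving the
    context-manager protocol manually instead of using `with`."""
    def __init__(self):
        self._ctx = None

    def start_turn(self, name):
        self._ctx = observation(name)
        self._ctx.__enter__()  # manually enter; stays open between messages

    def end_turn(self):
        self._ctx.__exit__(None, None, None)  # manually exit when the turn ends
        self._ctx = None

async def stream():
    for msg in ["assistant", "tool_use", "result"]:
        yield msg

async def main():
    tracker = Tracker()
    async for msg in stream():
        if msg == "assistant":
            tracker.start_turn("claude_turn_1")  # opened on the first message
        elif msg == "result":
            tracker.end_turn()                   # closed several iterations later

asyncio.run(main())
print(events)  # → ['enter:claude_turn_1', 'exit:claude_turn_1']
```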
All traces inherit consistent metadata via propagate_attributes():
- `user_id`: For cost allocation and multi-user tracking
- `session_id`: Kubernetes AgenticSession name for grouping
- `tags`: `["claude-code", "namespace:X", "model:Y"]`
- `metadata`:
  - `namespace`: Project namespace
  - `user_name`: User display name
  - `model`: Specific model used (e.g., `claude-sonnet-4-5@20250929`)
  - `initial_prompt`: First 200 chars of session prompt
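As a sketch, the propagated attribute set might be assembled like this; the helper name and signature are hypothetical, but the fields match the list above.

```python
def session_attributes(namespace, user_name, user_id, model, session_name, prompt):
    """Build the attribute set propagated to every trace in a session
    (illustrative helper, not the platform's actual code)."""
    return {
        "user_id": user_id,
        "session_id": session_name,  # Kubernetes AgenticSession name
        "tags": ["claude-code", f"namespace:{namespace}", f"model:{model}"],
        "metadata": {
            "namespace": namespace,
            "user_name": user_name,
            "model": model,
            "initial_prompt": prompt[:200],  # first 200 chars of the session prompt
        },
    }

attrs = session_attributes("team-a", "Alice", "alice", "claude-sonnet-4-5@20250929",
                           "agentic-session-abc", "Refactor the billing module")
print(attrs["tags"])
# → ['claude-code', 'namespace:team-a', 'model:claude-sonnet-4-5@20250929']
```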
- Name: `claude_turn_X` (sequential: 1, 2, 3, ...)
- Type: Generation with usage tracking
- Input: Turn prompt (user message or continuation)
- Output: Complete Claude response
- Model: Propagated from session context
- Usage: Canonical format for accurate cost calculation
  - `input`: Regular input tokens
  - `output`: Output tokens
  - `cache_read_input_tokens`: Cache hits (90% discount)
  - `cache_creation_input_tokens`: Cache writes (25% premium)
- See Model Pricing for complete pricing details
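A back-of-the-envelope cost formula using the canonical usage fields; the per-million-token rates here are assumed for illustration only (see Model Pricing for real numbers).

```python
INPUT_RATE = 3.00    # $ per 1M input tokens (assumed for illustration)
OUTPUT_RATE = 15.00  # $ per 1M output tokens (assumed for illustration)

def turn_cost(usage):
    """Apply the canonical usage fields: cache reads at a 90% discount,
    cache writes at a 25% premium over the base input rate."""
    per_tok = 1 / 1_000_000
    return (
        usage["input"] * INPUT_RATE
        + usage["output"] * OUTPUT_RATE
        + usage["cache_read_input_tokens"] * INPUT_RATE * 0.10      # 90% discount
        + usage["cache_creation_input_tokens"] * INPUT_RATE * 1.25  # 25% premium
    ) * per_tok

usage = {"input": 1_000, "output": 500,
         "cache_read_input_tokens": 10_000, "cache_creation_input_tokens": 2_000}
print(round(turn_cost(usage), 6))  # → 0.021
```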
Turn counting: Uses the SDK's authoritative `num_turns` field from `ResultMessage` to ensure accuracy.

Deferred creation: Turn 1 is not created until the first `AssistantMessage` arrives, ensuring traces have real user input (not synthetic prompts).
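A minimal sketch of that lifecycle, using stand-in message classes rather than the real SDK types:

```python
class AssistantMessage:  # illustrative stand-in for the SDK message type
    pass

class ResultMessage:     # illustrative stand-in carrying the authoritative count
    def __init__(self, num_turns):
        self.num_turns = num_turns

class TurnState:
    """Deferred creation plus authoritative turn counting (sketch only)."""
    def __init__(self, initial_prompt):
        self.pending_prompt = initial_prompt  # stored, but no trace created yet
        self.turn_open = False
        self.num_turns = 0

    def handle(self, message):
        if isinstance(message, AssistantMessage) and not self.turn_open:
            self.turn_open = True  # create the turn only when interaction begins
        elif isinstance(message, ResultMessage):
            self.num_turns = message.num_turns  # trust the SDK, not a manual counter
            self.turn_open = False

state = TurnState("Refactor the billing module")
assert not state.turn_open          # nothing created from the prompt alone
state.handle(AssistantMessage())    # first real message → turn exists
state.handle(ResultMessage(num_turns=1))
print(state.num_turns)  # → 1
```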
- Name: `tool_{ToolName}` (e.g., `tool_Read`, `tool_Write`, `tool_Bash`)
- Type: Span (no usage tracking)
- Purpose: Execution visibility only
- Input: Full tool parameters
- Output: Tool results (truncated to 500 chars for large outputs)
- No tokens: Local operations already counted in parent turn
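The 500-character truncation can be sketched as follows; the truncation marker appended here is an assumption, not the platform's exact format.

```python
MAX_TOOL_OUTPUT = 500  # characters kept on a tool span's output

def truncate_tool_output(result: str) -> str:
    """Keep tool spans lightweight: large results are cut at 500 chars.
    The '... [truncated]' suffix is illustrative only."""
    if len(result) <= MAX_TOOL_OUTPUT:
        return result
    return result[:MAX_TOOL_OUTPUT] + "... [truncated]"

print(truncate_tool_output("short output"))  # → short output
```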
- Flat trace hierarchy: Turns as top-level traces (not observations under session trace)
- Manual context management: Required for async streaming architecture
- Real-time flush: Explicit `langfuse_client.flush()` after each turn completes
- Deferred turn creation: Store the initial prompt, create the turn when interaction actually begins
- Authoritative turn counting: Use the SDK's `num_turns` field (not manual increment)
- Clean error handling: Trust the Langfuse SDK for incomplete traces (no synthetic error messages)
Standard `with` statements cannot maintain state across async loop iterations:
```python
# ❌ This doesn't work - the context closes at the end of each iteration
async for message in stream:
    with langfuse.start_as_current_observation() as turn:
        process(message)
    # Context already closed, but we need it for the next message!
```

```python
# ✅ Manual context management - stays open across iterations
def start_turn(self):
    self._ctx = langfuse.start_as_current_observation(...)
    self._generation = self._ctx.__enter__()  # Manually enter

def end_turn(self):
    self._ctx.__exit__(None, None, None)  # Manually exit when ready
```

Deploy Langfuse to your cluster using the provided script:
```shell
# Auto-detect platform (OpenShift or Kubernetes)
./e2e/scripts/deploy-langfuse.sh

# Or specify explicitly
./e2e/scripts/deploy-langfuse.sh --openshift
./e2e/scripts/deploy-langfuse.sh --kubernetes
```

The script handles:
- Platform detection (OpenShift vs Kubernetes)
- Helm chart installation
- PostgreSQL database setup
- Ingress/Route configuration
- Namespace creation
Langfuse is configured platform-wide via the `ambient-admin-langfuse-secret` secret in the operator namespace. See the deployment documentation for setup details.
Why Langfuse for LLM observability:
- LLM-optimized: Purpose-built for prompt/response tracking and cost analysis
- Simple setup: Only API keys required, no additional infrastructure
- Cost tracking: Automatic token and cost calculation for Claude API usage
- Rich insights: Full tool I/O, generation content, and performance metrics
- Multi-user support: Track usage by user_id for cost allocation
- OTEL compatible: Can migrate to any OTEL backend if requirements change
- Implementation: `components/runners/claude-code-runner/observability.py`
- Integration: `components/runners/claude-code-runner/wrapper.py`
- Langfuse Docs: https://langfuse.com/docs
- Python SDK v3: https://langfuse.com/docs/sdk/python