
feat: add OpenTelemetry tracing for agent execution #34

@pavelanni


Summary

Add OpenTelemetry (OTel) instrumentation to DocsClaw so that agent execution — LLM calls, tool invocations, skill loading, and A2A delegation — produces distributed traces that can be viewed in any OTel-compatible backend (Jaeger, Tempo, Datadog, etc.).

Motivation

Observability is a gap in DocsClaw compared with frameworks such as Google ADK, which provide built-in tracing. Rather than adopting a framework-specific tracing solution, DocsClaw can use OTel, a universal standard that works with any backend and any agent framework in a multi-agent system.

This matters for:

  • Debugging: understanding why an agent took 30 seconds (was it the LLM call? a tool timeout? skill loading?)
  • Multi-agent visibility: tracing A2A delegation chains across agents
  • Production readiness: operators need to understand agent behavior without reading logs

Proposed instrumentation points

| Component | Span name | Key attributes |
|---|---|---|
| pkg/tools/loop.go | agent.loop.iteration | iteration number, tool calls made |
| internal/anthropic/ | llm.complete | model, token count, stop reason |
| pkg/tools/ | tool.execute.<name> | tool name, duration, success/error |
| pkg/skills/ | skill.load | skill name, source (OCI/ConfigMap) |
| internal/bridge/ | a2a.delegate | target agent, message ID |

Implementation notes

  • Use the OpenTelemetry Go SDK (go.opentelemetry.io/otel), which is lightweight and has no heavy dependencies
  • Export via OTLP (gRPC or HTTP) configured through standard OTEL_EXPORTER_* env vars
  • Traces should propagate through A2A calls via W3C Trace Context headers
  • Keep it optional — if no exporter is configured, tracing is a no-op

Related
