Summary
Add OpenTelemetry (OTel) instrumentation to DocsClaw so that agent execution — LLM calls, tool invocations, skill loading, and A2A delegation — produces distributed traces that can be viewed in any OTel-compatible backend (Jaeger, Tempo, Datadog, etc.).
Motivation
DocsClaw has no built-in tracing, an observability gap compared to frameworks like Google ADK that provide it out of the box. Rather than adopting a framework-specific tracing solution, OTel provides a universal standard that works with any backend and any agent framework in a multi-agent system.
This matters for:
- Debugging: understanding why an agent took 30 seconds (was it the LLM call? a tool timeout? skill loading?)
- Multi-agent visibility: tracing A2A delegation chains across agents
- Production readiness: operators need to understand agent behavior without reading logs
Proposed instrumentation points
| Component | Span name | Key attributes |
| --- | --- | --- |
| pkg/tools/loop.go | agent.loop.iteration | iteration number, tool calls made |
| internal/anthropic/ | llm.complete | model, token count, stop reason |
| pkg/tools/ | tool.execute.<name> | tool name, duration, success/error |
| pkg/skills/ | skill.load | skill name, source (OCI/ConfigMap) |
| internal/bridge/ | a2a.delegate | target agent, message ID |
Implementation notes
- Use the go.opentelemetry.io/otel SDK: lightweight, with no heavy dependencies
- Export via OTLP (gRPC or HTTP), configured through the standard OTEL_EXPORTER_* env vars
- Traces should propagate through A2A calls via W3C Trace Context headers
- Keep it optional — if no exporter is configured, tracing is a no-op
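One way to implement the "optional" rule, sketched with only the standard library. tracingEnabled is a hypothetical helper that treats tracing as configured when OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set and OTEL_TRACES_EXPORTER is not none. Requiring an explicit endpoint is an assumption: under the OTel specification's defaults, an unset endpoint falls back to localhost rather than disabling export. When the helper returns true, DocsClaw would build an OTLP exporter (for example with otlptracegrpc.New, which reads the same OTEL_EXPORTER_OTLP_* variables) and register an SDK TracerProvider via otel.SetTracerProvider; otherwise it leaves the global no-op provider in place.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// tracingEnabled (hypothetical) reports whether an OTLP trace exporter has
// been configured via the standard OTel environment variables. getenv is
// injected so the decision can be tested without touching the process env.
func tracingEnabled(getenv func(string) string) bool {
	// OTEL_TRACES_EXPORTER=none explicitly disables trace export.
	if strings.EqualFold(strings.TrimSpace(getenv("OTEL_TRACES_EXPORTER")), "none") {
		return false
	}
	// Assumption: an endpoint must be set explicitly; otherwise tracing
	// stays a no-op instead of falling back to the spec's localhost default.
	return getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT") != "" ||
		getenv("OTEL_EXPORTER_OTLP_ENDPOINT") != ""
}

func main() {
	if tracingEnabled(os.Getenv) {
		fmt.Println("tracing: OTLP exporter configured")
	} else {
		fmt.Println("tracing: disabled (no-op)")
	}
}
```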
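To make the propagation requirement concrete, here is a standard-library sketch of the W3C traceparent header (version 00 only) that would travel on A2A requests. In practice the OTel Go SDK handles this with propagation.TraceContext{} (for example injecting into propagation.HeaderCarrier(req.Header)), so this hand-written parser is illustrative only; traceParent, parseTraceParent, and the substituted span ID are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// traceParent holds the fields of a version-00 traceparent header:
// 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>.
type traceParent struct {
	TraceID  string
	ParentID string
	Sampled  bool
}

func (tp traceParent) String() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", tp.TraceID, tp.ParentID, flags)
}

// validID reports whether s is exactly n lowercase hex digits and not all zeros.
func validID(s string, n int) bool {
	if len(s) != n || strings.ToLower(s) != s || strings.Trim(s, "0") == "" {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// parseTraceParent accepts only version 00 headers.
func parseTraceParent(h string) (traceParent, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceParent{}, fmt.Errorf("unsupported traceparent %q", h)
	}
	if !validID(parts[1], 32) || !validID(parts[2], 16) {
		return traceParent{}, fmt.Errorf("invalid ids in traceparent %q", h)
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil || len(flags) != 1 {
		return traceParent{}, fmt.Errorf("invalid flags in traceparent %q", h)
	}
	return traceParent{TraceID: parts[1], ParentID: parts[2], Sampled: flags[0]&0x01 == 1}, nil
}

func main() {
	// Example header from the W3C Trace Context specification.
	in := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tp, err := parseTraceParent(in)
	if err != nil {
		panic(err)
	}
	// Hypothetical: before delegating over A2A, the caller keeps the trace ID
	// and substitutes the span ID of its own a2a.delegate span as parent ID.
	out := traceParent{TraceID: tp.TraceID, ParentID: "b7ad6b7169203331", Sampled: tp.Sampled}
	fmt.Println(out) // 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
}
```

The trace ID is preserved across the hop while the parent ID changes, which is what lets a backend stitch the delegating and delegated agents into a single trace.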
Related
- internal/metrics/: traces complement the existing metrics rather than replace them