feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229
feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229rangemer333-cell wants to merge 3 commits into
Conversation
…te.md Implements the hybrid "SpanProcessor + optional ReAct step patch" plan documented in llm-dev/microsoft-agent-framework/investigate/execute.md. - MicrosoftAgentFrameworkInstrumentor: enables MAF native OTel layers (force=True) and registers MAFSemanticProcessor. - MAFSemanticProcessor (span_processor.py): injects gen_ai.span.kind, gen_ai.operation.name, renames MAF private-prefix attributes to gen_ai.*, normalizes provider.name (azure_openai -> openai), backfills TTFT from streaming events, sets StatusCode.OK on success, aggregates the 6 ARMS gauges. Reuses opentelemetry.util.genai.utils.gen_ai_json_dumps (aligned with openai-agents-v2/span_processor.py:27) to coerce dict/list attribute values into JSON strings. - react_step_patch.py (opt-in via ARMS_MAF_REACT_STEP_ENABLED): emits one react step span per LLM round-trip inside FunctionInvocationLayer via ExtendedTelemetryHandler.react_step() from opentelemetry-util-genai. - config.py: env switches (master, sensitive data, react step, slow threshold). - tests: 23 passing unit tests covering span classification, metric aggregation, provider normalization, TTFT backfill, dict coercion, and react_step handler behavior.
[M1] MCP span classification: detect mcp.method.name attribute (or SpanKind.CLIENT + mcp.* fallback) in _classify_span and return (CLIENT, MCP). MAF_SPAN_NAME_PREFIXES documents that MCP is detected via attribute rather than prefix (method names are unbounded). [M2] revert_react_step_patch: capture originals BEFORE wrapping (via __wrapped__ unwrap chain), and switch from broken decorator form to wrap_function_wrapper(class, name, wrapper) + @wrapt.decorator. revert now restores the original; apply->revert->apply does not stack wrappers. [L1] _safe_dumps: cap output at 4096 chars (execute.md single-field cap) since gen_ai_json_dumps only serializes. Docstring + module docstring updated to match actual behavior. Tests: +6 (test_mcp_span_classified_as_client, mcp client-kind fallback, non-mcp client negative, _safe_dumps truncation, apply->revert->apply round-trip, _unwrap_to_function). 29 passed.
- P0: react_step_patch wrappers now return coroutines from sync wrappers so MAF's await layer.get_response no longer raises TypeError. ContextVar tokens are set inside the coroutine body so set/reset share the same asyncio task context. - P1: extend op_name override condition to include CLIENT span kind so MCP tools/call inner spans get gen_ai.operation.name=mcp even when MAF pre-wrote execute_tool. - P3: provider.name normalization now handles sequence values (MAF emits list-wrapped values on AGENT spans) and falls back to case-insensitive matching so microsoft.agent_framework -> openai on AGENT spans. - P5: instrument() prepends MAFSemanticProcessor to the SDK processor tuple so on_end enrichments run before exporter processors registered earlier in bootstrap. Adds 7 unit tests; pytest tests/ -> 36 passed.
There was a problem hiding this comment.
Pull request overview
This PR introduces a new GenAI instrumentation plugin, opentelemetry-instrumentation-microsoft-agent-framework, to automatically enrich Microsoft Agent Framework (MAF) spans into ARMS GenAI semantic conventions via a custom SpanProcessor, plus an optional ReAct-step monkey patch.
Changes:
- Added
MAFSemanticProcessorto classify/enrich MAF spans (kind/op/framework/rename/provider normalization/TTFT) and aggregate ARMS gauges. - Added optional
react_step_patchto emit STEP spans around ReAct loop LLM calls viaExtendedTelemetryHandler.react_step(). - Added unit tests for config parsing, processor enrichment/metrics behavior, and patch idempotency/coroutine behavior without requiring MAF installed.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/init.py | Instrumentor: enables MAF native telemetry, registers span processor, optionally applies ReAct patch |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/span_processor.py | Core enrichment + metrics aggregation logic |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/react_step_patch.py | Optional ReAct loop patch emitting STEP spans |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/semantic_conventions.py | Semconv constants + MAF→gen_ai rename map + provider normalization map |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/config.py | Env var parsing for enabling instrumentation/metrics/react-step/sensitive data/slow threshold |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/package.py | Declares instrumented dependency + metrics support |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/version.py | Package version |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/pyproject.toml | Packaging metadata and entry-point registration |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/README.rst | Usage + configuration docs |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_processor.py | Processor enrichment/metrics tests (incl. MCP + regressions) |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_react_step.py | ReAct patch tests and direct handler STEP span shape test |
| instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_config.py | Config env var parsing tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import logging | ||
| from collections import defaultdict | ||
| from typing import Any, Dict, Optional, Tuple | ||
|
|
||
| from opentelemetry.context import Context | ||
| from opentelemetry.metrics import ObservableGauge, get_meter | ||
| from opentelemetry.sdk.trace import SpanProcessor | ||
| from opentelemetry.sdk.trace import TracerProvider # noqa: F401 (typing hint) | ||
| from opentelemetry.trace import Span as OtelSpan, Status, StatusCode | ||
| from opentelemetry.trace.span import TraceState # noqa: F401 | ||
| from opentelemetry.util.genai.utils import gen_ai_json_dumps | ||
| from opentelemetry.util.types import AttributeValue |
| self._live_spans: Dict[str, OtelSpan] = {} | ||
| self._span_parents: Dict[str, Optional[str]] = {} | ||
| self._slow_threshold_ns = int(slow_threshold_ms) * 1_000_000 | ||
| self._capture_sensitive = capture_sensitive_data | ||
| self._counters = _Counters() | ||
| self._meter = None | ||
| self._gauges: list[ObservableGauge] = [] | ||
| self._metrics_enabled = metrics_enabled | ||
| if metrics_enabled: |
| c = self._counters | ||
|
|
||
| def _calls_cb(options): | ||
| for (model, kind), count in c.calls_count.items(): | ||
| yield _obs( | ||
| count, | ||
| {"modelName": model or "unknown", "spanKind": kind}, | ||
| ) | ||
|
|
||
| def _duration_cb(options): | ||
| for (model, kind), total in c.calls_duration_ns_sum.items(): | ||
| count = max(c.calls_count.get((model, kind), 0), 1) | ||
| yield _obs( | ||
| total / count / 1e9, | ||
| {"modelName": model or "unknown", "spanKind": kind}, | ||
| ) | ||
|
|
| def _error_cb(options): | ||
| for (model, kind), count in c.calls_error_count.items(): | ||
| yield _obs( | ||
| count, | ||
| {"modelName": model or "unknown", "spanKind": kind}, | ||
| ) | ||
|
|
||
| def _slow_cb(options): | ||
| for (model, kind), count in c.calls_slow_count.items(): | ||
| yield _obs( | ||
| count, | ||
| {"modelName": model or "unknown", "spanKind": kind}, | ||
| ) | ||
|
|
| def _ttft_cb(options): | ||
| for (model, kind), total in c.llm_first_token_ns_sum.items(): | ||
| count = max(c.llm_first_token_count.get((model, kind), 0), 1) | ||
| yield _obs( | ||
| total / count / 1e9, | ||
| {"modelName": model or "unknown", "spanKind": kind}, | ||
| ) | ||
|
|
||
| def _tokens_input_cb(options): | ||
| for (model, kind), total in c.llm_usage_input_tokens.items(): | ||
| yield _obs( | ||
| total, | ||
| { | ||
| "modelName": model or "unknown", | ||
| "spanKind": kind, | ||
| "usageType": "input", | ||
| }, | ||
| ) | ||
|
|
||
| def _tokens_output_cb(options): | ||
| for (model, kind), total in c.llm_usage_output_tokens.items(): | ||
| yield _obs( | ||
| total, | ||
| { | ||
| "modelName": model or "unknown", | ||
| "spanKind": kind, | ||
| "usageType": "output", | ||
| }, | ||
| ) |
| def _uninstrument(self, **kwargs: Any) -> None: | ||
| if self._react_applied: | ||
| revert_react_step_patch() | ||
| self._react_applied = False | ||
| if self._processor is not None: | ||
| try: | ||
| self._processor.shutdown() | ||
| except Exception as exc: # pragma: no cover - defensive | ||
| logger.debug("processor shutdown error: %s", exc) | ||
| self._processor = None |
| assert asyncio.iscoroutine(coro), ( | ||
| "FIL wrapper must return a coroutine so MAF can `await` it" | ||
| ) | ||
| result = asyncio.get_event_loop().run_until_complete(coro) |
| assert asyncio.iscoroutine(coro), ( | ||
| "Chat wrapper must pass through the wrapped coroutine unchanged" | ||
| ) | ||
| result = asyncio.get_event_loop().run_until_complete(coro) |
| ``ARMS_MAF_INSTRUMENTATION_ENABLED`` ``true`` Master switch; ``false`` disables instrumentation. | ||
| ``ARMS_MAF_SENSITIVE_DATA_ENABLED`` ``false`` Capture inputs/outputs (linked to MAF's ``ENABLE_SENSITIVE_DATA``). | ||
| ``ARMS_MAF_REACT_STEP_ENABLED`` ``false`` Emit ``react step`` spans (opt-in). | ||
| ``ARMS_MAF_SLOW_THRESHOLD_MS`` ``1000`` Slow-call threshold in ms. |
| GEN_AI_FRAMEWORK, | ||
| GEN_AI_OPERATION_NAME, | ||
| GEN_AI_PROVIDER_NAME, | ||
| GEN_AI_REACT_ROUND, | ||
| GEN_AI_REQUEST_MODEL, | ||
| GEN_AI_RESPONSE_MODEL, | ||
| GEN_AI_RESPONSE_TTFT, |
ralf0131
left a comment
There was a problem hiding this comment.
Summary
New opentelemetry-instrumentation-microsoft-agent-framework plugin (+2233 lines, 14 files, 36 tests). Well-structured implementation following the established openai-agents-v2 pattern: MAFSemanticProcessor enriches native MAF OTel spans with ARMS GenAI semantic conventions (span kind, operation name, attribute renaming, provider normalization, TTFT backfill, status, 6 metrics gauges). ReAct step patch is opt-in and cleanly revertible. Test coverage is thorough — classification, MCP detection, provider normalization, TTFT, metrics, and patch apply/revert are all covered.
Overall verdict: solid work for a first-time contribution. No blocking issues found. A few suggestions below for production hardening.
Findings
- [Warning] span_processor.py:684 — Thread-safety of metrics counters (
+=on defaultdict) - [Info] span_processor.py:553 — Potential
_live_spansmemory growth on un-ended spans - [Info] semantic_conventions.py:106 — Provider normalization conflates framework with provider
- [Info] init.py:119 — Private SDK internals for processor ordering
Suggestions
- For the metrics counters, consider a
threading.Lockaround_aggregate_metrics— the overhead is negligible sinceon_endis not a hot path per-span, and it eliminates a class of subtle race conditions in multi-threaded apps. - For
_live_spans, a simple sweep inforce_flushthat removes entries whose span start time is older than e.g. 60s would prevent unbounded growth from orphaned spans. - Consider adding an integration test that verifies the processor works end-to-end with a real (or mock) MAF tracer provider, not just unit tests with mock spans.
First-Time Contributor Note
Welcome and thank you for this high-quality contribution! The code is clean, well-documented, and follows the repository's established patterns. The E2E validation (9/9 examples) is impressive. Please don't let the suggestions above discourage you — they are minor hardening ideas, not blockers.
Automated review by github-manager-bot
| model = model if isinstance(model, str) else "unknown" | ||
| key = (model, span_kind) | ||
| self._counters.calls_count[key] += 1 | ||
|
|
There was a problem hiding this comment.
[Warning] Thread-safety of metrics counters: The defaultdict increments here (self._counters.calls_count[key] += 1) are read-modify-write operations that are not atomic under concurrent access. While CPython's GIL makes individual dict lookups/stores atomic, the += sequence can lose increments if two threads end spans simultaneously with the same (model, span_kind) key. For high-throughput multi-threaded apps, consider wrapping counter updates in a threading.Lock.
| except Exception: | ||
| return | ||
| self._live_spans[key] = span | ||
| parent = getattr(span, "_parent", None) |
There was a problem hiding this comment.
[Info] Potential _live_spans memory growth: If a span is started via on_start but never ended (e.g., user code crashes mid-span), its entry in _live_spans will never be cleaned up. For long-running processes, consider adding a periodic sweep in force_flush to remove entries older than a threshold.
| "azure_ai_openai": "openai", | ||
| "azure.openai": "openai", | ||
| "microsoft.agent_framework": "openai", | ||
| } |
There was a problem hiding this comment.
[Info] Provider normalization conflates framework with provider: Mapping microsoft.agent_framework to openai works when MAF always uses OpenAI-compatible backends, but if a user configures MAF with Anthropic or Google as the underlying provider, this normalization would incorrectly report openai. Consider only normalizing azure_openai to openai and leaving microsoft.agent_framework as-is, or add a code comment explaining why MAF is always OpenAI-compatible in practice.
| # to a list-style attribute on alternative provider layouts). | ||
| try: | ||
| asp = getattr(tracer_provider, "_active_span_processor", None) | ||
| span_processors = ( |
There was a problem hiding this comment.
[Info] Private SDK internals for processor ordering: Accessing _active_span_processor and _span_processors relies on private OTel SDK attributes that could change across SDK versions. The defensive try/except prevents crashes, but if the SDK layout changes, enrichment would silently become a no-op. Consider adding a debug log when the prepend fails.
Closes #52
Summary
新增
opentelemetry-instrumentation-microsoft-agent-framework插件,为 Microsoft Agent Framework 框架提供自动插桩能力,按execute.md实现方案落地,遵循/home/admin/semantic-conventions/arms_docs/trace/gen-ai.md语义规范。rangemer333-cell:feat/microsoft-agent-framework(HEAD9caf6641)alibaba:main${WORKSPACE_ROOT}/llm-dev/microsoft-agent-framework/investigate/execute.md(WORKSPACE_ROOT=/apsara/loongsuite-plugin-microsoft-agent-framework.nPqhpw)${WORKSPACE_ROOT}/validation/microsoft_agent_framework_verification_report.mdvalidation/spans_v5.json(49 spans)validation/logs_v5/exNN.log改动范围
新增插件目录
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/,共 14 文件 / +2233 行:__init__.py— instrument 入口、processor 前置注册span_processor.py—MAFSemanticProcessor,复用opentelemetry.util.genai.utils的截断/PII/gen_ai_json_dumps,对齐openai-agents-v2/span_processor.pyreact_step_patch.py— ReAct step span 走util/opentelemetry-util-genai的ExtendedTelemetryHandler.react_step()semantic_conventions.py/config.py/package.py/version.py/README.rst/pyproject.tomltests/— 36 单测单测
pytest tests/→ 36 passed(原 29 + 本轮新增 7 覆盖 P0/P1/P3 修复路径)。E2E 观测结论(9/9 example 符合)
namespace
loongsuite-maf-e2e,9 个 Deploymentmaf-example-01..09,镜像acos-demo-registry.cn-hangzhou.cr.aliyuncs.com/private-mesh/maf-e2e:plugin-9caf6641。tools/callinner CLIENT spanop=mcp+mcp.method.name=tools/call+gen_ai.tool.name;AGENTprovider=openai归一Review 修复摘要(P0/P1/P3/P5)
react_step_patch.py:91-167:_fil_wrapper/_chat_wrapper改为同步函数返回_scoped()/_step_scoped()coroutine,修复 MAF_agents.py:964await layer.get_response(...)TypeError;ContextVar token 的 set/reset 移入 coroutine body,修复跨 taskValueError。span_processor.py:589-599:gen_ai.operation.name覆盖条件由{TASK, AGENT}扩为{TASK, AGENT, CLIENT},覆盖 MCPtools/callinner span。span_processor.py:218-241:_normalize_provider三步兜底(list/tuple 取首元素 → 精确匹配PROVIDER_NAME_NORMALIZE→.lower()),MAF 写入的microsoft.agent_framework归一为openai。__init__.py:_instrument:add_span_processor之后将MAFSemanticProcessor前置到_active_span_processor._span_processors首位,异常静默降级。Known Limitations
gen_ai.react.round恒为1(ex07 ReAct step span)gen_ai.react.round均为1,未随轮次递增(预期 1/2/3)。execute.md走ExtendedTelemetryHandler.react_step()正确产出 STEP span,仅 round 字段无法填充真实值。validation/spans_v5.jsonex07 三个 STEP span;console 日志validation/logs_v5/ex07.log。semantic-conventions规范 owner 给出 fallback 方案后再补齐。Test Plan
pytest tests/→ 36 passed🤖 Generated with Claude Code