Skip to content

feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229

Open
rangemer333-cell wants to merge 3 commits into
alibaba:mainfrom
rangemer333-cell:feat/microsoft-agent-framework
Open

feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229
rangemer333-cell wants to merge 3 commits into
alibaba:mainfrom
rangemer333-cell:feat/microsoft-agent-framework

Conversation

@rangemer333-cell

@rangemer333-cell rangemer333-cell commented Jun 24, 2026

Copy link
Copy Markdown

Closes #52

Summary

新增 opentelemetry-instrumentation-microsoft-agent-framework 插件,为 Microsoft Agent Framework 框架提供自动插桩能力,按 execute.md 实现方案落地,遵循 /home/admin/semantic-conventions/arms_docs/trace/gen-ai.md 语义规范。

  • Fork branch: rangemer333-cell:feat/microsoft-agent-framework (HEAD 9caf6641)
  • Target: alibaba:main
  • 实现方案: ${WORKSPACE_ROOT}/llm-dev/microsoft-agent-framework/investigate/execute.mdWORKSPACE_ROOT=/apsara/loongsuite-plugin-microsoft-agent-framework.nPqhpw
  • 验证报告: ${WORKSPACE_ROOT}/validation/microsoft_agent_framework_verification_report.md
  • 原始 span: validation/spans_v5.json(49 spans)
  • 逐 example 日志: validation/logs_v5/exNN.log

改动范围

新增插件目录 instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/,共 14 文件 / +2233 行:

  • __init__.py — instrument 入口、processor 前置注册
  • span_processor.pyMAFSemanticProcessor,复用 opentelemetry.util.genai.utils 的截断/PII/gen_ai_json_dumps,对齐 openai-agents-v2/span_processor.py
  • react_step_patch.py — ReAct step span 走 util/opentelemetry-util-genaiExtendedTelemetryHandler.react_step()
  • semantic_conventions.py / config.py / package.py / version.py / README.rst / pyproject.toml
  • tests/ — 36 单测

单测

pytest tests/36 passed(原 29 + 本轮新增 7 覆盖 P0/P1/P3 修复路径)。

E2E 观测结论(9/9 example 符合)

namespace loongsuite-maf-e2e,9 个 Deployment maf-example-01..09,镜像 acos-demo-registry.cn-hangzhou.cr.aliyuncs.com/private-mesh/maf-e2e:plugin-9caf6641

Example Span 结论 关键证据
ex01 basic-llm-chat 2 符合 AGENT>HelloAgent > LLM chat qwen3.6-plus, provider=openai
ex02 agent-tool 4 符合 AGENT>LLM/TOOL/LLM,provider 归一
ex03 workflow-function-executor 6 符合 workflow.build + workflow.run > CHAIN + TASK
ex04 workflow-agent-executor 10 符合 嵌套 AGENT 链路完整
ex05 embedding 1 符合 EMBEDDING op=embeddings, prov=openai
ex06 mcp 7 符合 tools/call inner CLIENT span op=mcp+mcp.method.name=tools/call+gen_ai.tool.name;AGENT provider=openai 归一
ex07 react-step 9 符合 原 TypeError 消除;3× STEP react step + 3× LLM chat + 2× TOOL + AGENT ReactAgent
ex08 failure-path 6 符合 TOOL/LLM/AGENT ERROR span 闭环 + exception event
ex09 entry 2 符合 AGENT EntryAgent > LLM chat

Review 修复摘要(P0/P1/P3/P5)

  • P0 react_step_patch.py:91-167: _fil_wrapper / _chat_wrapper 改为同步函数返回 _scoped() / _step_scoped() coroutine,修复 MAF _agents.py:964 await layer.get_response(...) TypeError;ContextVar token 的 set/reset 移入 coroutine body,修复跨 task ValueError
  • P1 span_processor.py:589-599: gen_ai.operation.name 覆盖条件由 {TASK, AGENT} 扩为 {TASK, AGENT, CLIENT},覆盖 MCP tools/call inner span。
  • P3 span_processor.py:218-241: _normalize_provider 三步兜底(list/tuple 取首元素 → 精确匹配 PROVIDER_NAME_NORMALIZE.lower()),MAF 写入的 microsoft.agent_framework 归一为 openai
  • P5 __init__.py:_instrument: add_span_processor 之后将 MAFSemanticProcessor 前置到 _active_span_processor._span_processors 首位,异常静默降级。

Known Limitations

  • P2 — gen_ai.react.round 恒为 1(ex07 ReAct step span)
    • 现象: ex07 3 个 ReAct step span 的 gen_ai.react.round 均为 1,未随轮次递增(预期 1/2/3)。
    • 根因: Microsoft Agent Framework 内部未对外暴露 round counter,插件层面无法拿到稳定的轮次号。非插件代码缺陷——插件已按 execute.mdExtendedTelemetryHandler.react_step() 正确产出 STEP span,仅 round 字段无法填充真实值。
    • 证据: validation/spans_v5.json ex07 三个 STEP span;console 日志 validation/logs_v5/ex07.log
    • Follow-up: 待 MAF 后续版本暴露 round counter,或 semantic-conventions 规范 owner 给出 fallback 方案后再补齐。

Test Plan

  • pytest tests/ → 36 passed
  • 9/9 example K8s 端到端观测符合
  • 主干 Reviewer 代码 Review

🤖 Generated with Claude Code

…te.md

Implements the hybrid "SpanProcessor + optional ReAct step patch" plan
documented in llm-dev/microsoft-agent-framework/investigate/execute.md.

- MicrosoftAgentFrameworkInstrumentor: enables MAF native OTel layers
  (force=True) and registers MAFSemanticProcessor.
- MAFSemanticProcessor (span_processor.py): injects gen_ai.span.kind,
  gen_ai.operation.name, renames MAF private-prefix attributes to gen_ai.*,
  normalizes provider.name (azure_openai -> openai), backfills TTFT from
  streaming events, sets StatusCode.OK on success, aggregates the 6 ARMS
  gauges. Reuses opentelemetry.util.genai.utils.gen_ai_json_dumps (aligned
  with openai-agents-v2/span_processor.py:27) to coerce dict/list attribute
  values into JSON strings.
- react_step_patch.py (opt-in via ARMS_MAF_REACT_STEP_ENABLED): emits one
  react step span per LLM round-trip inside FunctionInvocationLayer via
  ExtendedTelemetryHandler.react_step() from opentelemetry-util-genai.
- config.py: env switches (master, sensitive data, react step, slow threshold).
- tests: 23 passing unit tests covering span classification, metric
  aggregation, provider normalization, TTFT backfill, dict coercion, and
  react_step handler behavior.
[M1] MCP span classification: detect mcp.method.name attribute (or
SpanKind.CLIENT + mcp.* fallback) in _classify_span and return
(CLIENT, MCP). MAF_SPAN_NAME_PREFIXES documents that MCP is detected via
attribute rather than prefix (method names are unbounded).

[M2] revert_react_step_patch: capture originals BEFORE wrapping (via
__wrapped__ unwrap chain), and switch from broken decorator form to
wrap_function_wrapper(class, name, wrapper) + @wrapt.decorator. revert
now restores the original; apply->revert->apply does not stack wrappers.

[L1] _safe_dumps: cap output at 4096 chars (execute.md single-field cap)
since gen_ai_json_dumps only serializes. Docstring + module docstring
updated to match actual behavior.

Tests: +6 (test_mcp_span_classified_as_client, mcp client-kind fallback,
non-mcp client negative, _safe_dumps truncation, apply->revert->apply
round-trip, _unwrap_to_function). 29 passed.
- P0: react_step_patch wrappers now return coroutines from sync wrappers
  so MAF's await layer.get_response no longer raises TypeError.
  ContextVar tokens are set inside the coroutine body so set/reset share
  the same asyncio task context.
- P1: extend op_name override condition to include CLIENT span kind so MCP
  tools/call inner spans get gen_ai.operation.name=mcp even when MAF
  pre-wrote execute_tool.
- P3: provider.name normalization now handles sequence values (MAF emits
  list-wrapped values on AGENT spans) and falls back to case-insensitive
  matching so microsoft.agent_framework -> openai on AGENT spans.
- P5: instrument() prepends MAFSemanticProcessor to the SDK processor
  tuple so on_end enrichments run before exporter processors registered
  earlier in bootstrap.

Adds 7 unit tests; pytest tests/ -> 36 passed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new GenAI instrumentation plugin, opentelemetry-instrumentation-microsoft-agent-framework, to automatically enrich Microsoft Agent Framework (MAF) spans into ARMS GenAI semantic conventions via a custom SpanProcessor, plus an optional ReAct-step monkey patch.

Changes:

  • Added MAFSemanticProcessor to classify/enrich MAF spans (kind/op/framework/rename/provider normalization/TTFT) and aggregate ARMS gauges.
  • Added optional react_step_patch to emit STEP spans around ReAct loop LLM calls via ExtendedTelemetryHandler.react_step().
  • Added unit tests for config parsing, processor enrichment/metrics behavior, and patch idempotency/coroutine behavior without requiring MAF installed.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/init.py Instrumentor: enables MAF native telemetry, registers span processor, optionally applies ReAct patch
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/span_processor.py Core enrichment + metrics aggregation logic
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/react_step_patch.py Optional ReAct loop patch emitting STEP spans
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/semantic_conventions.py Semconv constants + MAF→gen_ai rename map + provider normalization map
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/config.py Env var parsing for enabling instrumentation/metrics/react-step/sensitive data/slow threshold
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/package.py Declares instrumented dependency + metrics support
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/version.py Package version
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/pyproject.toml Packaging metadata and entry-point registration
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/README.rst Usage + configuration docs
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_processor.py Processor enrichment/metrics tests (incl. MCP + regressions)
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_react_step.py ReAct patch tests and direct handler STEP span shape test
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_config.py Config env var parsing tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +30 to +41
import logging
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple

from opentelemetry.context import Context
from opentelemetry.metrics import ObservableGauge, get_meter
from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.sdk.trace import TracerProvider # noqa: F401 (typing hint)
from opentelemetry.trace import Span as OtelSpan, Status, StatusCode
from opentelemetry.trace.span import TraceState # noqa: F401
from opentelemetry.util.genai.utils import gen_ai_json_dumps
from opentelemetry.util.types import AttributeValue
Comment on lines +413 to +421
self._live_spans: Dict[str, OtelSpan] = {}
self._span_parents: Dict[str, Optional[str]] = {}
self._slow_threshold_ns = int(slow_threshold_ms) * 1_000_000
self._capture_sensitive = capture_sensitive_data
self._counters = _Counters()
self._meter = None
self._gauges: list[ObservableGauge] = []
self._metrics_enabled = metrics_enabled
if metrics_enabled:
Comment on lines +434 to +450
c = self._counters

def _calls_cb(options):
for (model, kind), count in c.calls_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

def _duration_cb(options):
for (model, kind), total in c.calls_duration_ns_sum.items():
count = max(c.calls_count.get((model, kind), 0), 1)
yield _obs(
total / count / 1e9,
{"modelName": model or "unknown", "spanKind": kind},
)

Comment on lines +451 to +464
def _error_cb(options):
for (model, kind), count in c.calls_error_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

def _slow_cb(options):
for (model, kind), count in c.calls_slow_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

Comment on lines +465 to +493
def _ttft_cb(options):
for (model, kind), total in c.llm_first_token_ns_sum.items():
count = max(c.llm_first_token_count.get((model, kind), 0), 1)
yield _obs(
total / count / 1e9,
{"modelName": model or "unknown", "spanKind": kind},
)

def _tokens_input_cb(options):
for (model, kind), total in c.llm_usage_input_tokens.items():
yield _obs(
total,
{
"modelName": model or "unknown",
"spanKind": kind,
"usageType": "input",
},
)

def _tokens_output_cb(options):
for (model, kind), total in c.llm_usage_output_tokens.items():
yield _obs(
total,
{
"modelName": model or "unknown",
"spanKind": kind,
"usageType": "output",
},
)
Comment on lines +156 to +165
def _uninstrument(self, **kwargs: Any) -> None:
if self._react_applied:
revert_react_step_patch()
self._react_applied = False
if self._processor is not None:
try:
self._processor.shutdown()
except Exception as exc: # pragma: no cover - defensive
logger.debug("processor shutdown error: %s", exc)
self._processor = None
assert asyncio.iscoroutine(coro), (
"FIL wrapper must return a coroutine so MAF can `await` it"
)
result = asyncio.get_event_loop().run_until_complete(coro)
assert asyncio.iscoroutine(coro), (
"Chat wrapper must pass through the wrapped coroutine unchanged"
)
result = asyncio.get_event_loop().run_until_complete(coro)
Comment on lines +33 to +36
``ARMS_MAF_INSTRUMENTATION_ENABLED`` ``true`` Master switch; ``false`` disables instrumentation.
``ARMS_MAF_SENSITIVE_DATA_ENABLED`` ``false`` Capture inputs/outputs (linked to MAF's ``ENABLE_SENSITIVE_DATA``).
``ARMS_MAF_REACT_STEP_ENABLED`` ``false`` Emit ``react step`` spans (opt-in).
``ARMS_MAF_SLOW_THRESHOLD_MS`` ``1000`` Slow-call threshold in ms.
Comment on lines +46 to +52
GEN_AI_FRAMEWORK,
GEN_AI_OPERATION_NAME,
GEN_AI_PROVIDER_NAME,
GEN_AI_REACT_ROUND,
GEN_AI_REQUEST_MODEL,
GEN_AI_RESPONSE_MODEL,
GEN_AI_RESPONSE_TTFT,

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

New opentelemetry-instrumentation-microsoft-agent-framework plugin (+2233 lines, 14 files, 36 tests). Well-structured implementation following the established openai-agents-v2 pattern: MAFSemanticProcessor enriches native MAF OTel spans with ARMS GenAI semantic conventions (span kind, operation name, attribute renaming, provider normalization, TTFT backfill, status, 6 metrics gauges). ReAct step patch is opt-in and cleanly revertible. Test coverage is thorough — classification, MCP detection, provider normalization, TTFT, metrics, and patch apply/revert are all covered.

Overall verdict: solid work for a first-time contribution. No blocking issues found. A few suggestions below for production hardening.

Findings

  • [Warning] span_processor.py:684 — Thread-safety of metrics counters (+= on defaultdict)
  • [Info] span_processor.py:553 — Potential _live_spans memory growth on un-ended spans
  • [Info] semantic_conventions.py:106 — Provider normalization conflates framework with provider
  • [Info] init.py:119 — Private SDK internals for processor ordering

Suggestions

  1. For the metrics counters, consider a threading.Lock around _aggregate_metrics — the overhead is negligible since on_end is not a hot path per-span, and it eliminates a class of subtle race conditions in multi-threaded apps.
  2. For _live_spans, a simple sweep in force_flush that removes entries whose span start time is older than e.g. 60s would prevent unbounded growth from orphaned spans.
  3. Consider adding an integration test that verifies the processor works end-to-end with a real (or mock) MAF tracer provider, not just unit tests with mock spans.

First-Time Contributor Note

Welcome and thank you for this high-quality contribution! The code is clean, well-documented, and follows the repository's established patterns. The E2E validation (9/9 examples) is impressive. Please don't let the suggestions above discourage you — they are minor hardening ideas, not blockers.


Automated review by github-manager-bot

model = model if isinstance(model, str) else "unknown"
key = (model, span_kind)
self._counters.calls_count[key] += 1

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning] Thread-safety of metrics counters: The defaultdict increments here (self._counters.calls_count[key] += 1) are read-modify-write operations that are not atomic under concurrent access. While CPython's GIL makes individual dict lookups/stores atomic, the += sequence can lose increments if two threads end spans simultaneously with the same (model, span_kind) key. For high-throughput multi-threaded apps, consider wrapping counter updates in a threading.Lock.

except Exception:
return
self._live_spans[key] = span
parent = getattr(span, "_parent", None)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Potential _live_spans memory growth: If a span is started via on_start but never ended (e.g., user code crashes mid-span), its entry in _live_spans will never be cleaned up. For long-running processes, consider adding a periodic sweep in force_flush to remove entries older than a threshold.

"azure_ai_openai": "openai",
"azure.openai": "openai",
"microsoft.agent_framework": "openai",
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Provider normalization conflates framework with provider: Mapping microsoft.agent_framework to openai works when MAF always uses OpenAI-compatible backends, but if a user configures MAF with Anthropic or Google as the underlying provider, this normalization would incorrectly report openai. Consider only normalizing azure_openai to openai and leaving microsoft.agent_framework as-is, or add a code comment explaining why MAF is always OpenAI-compatible in practice.

# to a list-style attribute on alternative provider layouts).
try:
asp = getattr(tracer_provider, "_active_span_processor", None)
span_processors = (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Private SDK internals for processor ordering: Accessing _active_span_processor and _span_processors relies on private OTel SDK attributes that could change across SDK versions. The defensive try/except prevents crashes, but if the SDK layout changes, enrichment would silently become a no-op. Consider adding a debug log when the prepend fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add instrumentation for Microsoft agent framework

4 participants