Skip to content

feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229

Open
rangemer333-cell wants to merge 6 commits into
alibaba:mainfrom
rangemer333-cell:feat/microsoft-agent-framework
Open

feat(instrumentation-microsoft-agent-framework): add plugin per execute.md#229
rangemer333-cell wants to merge 6 commits into
alibaba:mainfrom
rangemer333-cell:feat/microsoft-agent-framework

Conversation

@rangemer333-cell

@rangemer333-cell rangemer333-cell commented Jun 24, 2026

Copy link
Copy Markdown

Closes #52

Summary

新增 opentelemetry-instrumentation-microsoft-agent-framework 插件,为 Microsoft Agent Framework 框架提供自动插桩能力,按 execute.md 实现方案落地,遵循 /home/admin/semantic-conventions/arms_docs/trace/gen-ai.md 语义规范。

  • Fork branch: rangemer333-cell:feat/microsoft-agent-framework (HEAD 9caf6641)
  • Target: alibaba:main
  • 实现方案: ${WORKSPACE_ROOT}/llm-dev/microsoft-agent-framework/investigate/execute.mdWORKSPACE_ROOT=/apsara/loongsuite-plugin-microsoft-agent-framework.nPqhpw
  • 验证报告: ${WORKSPACE_ROOT}/validation/microsoft_agent_framework_verification_report.md
  • 原始 span: validation/spans_v5.json(49 spans)
  • 逐 example 日志: validation/logs_v5/exNN.log

改动范围

新增插件目录 instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/,共 14 文件 / +2233 行:

  • __init__.py — instrument 入口、processor 前置注册
  • span_processor.pyMAFSemanticProcessor,复用 opentelemetry.util.genai.utils 的截断/PII/gen_ai_json_dumps,对齐 openai-agents-v2/span_processor.py
  • react_step_patch.py — ReAct step span 走 util/opentelemetry-util-genaiExtendedTelemetryHandler.react_step()
  • semantic_conventions.py / config.py / package.py / version.py / README.rst / pyproject.toml
  • tests/ — 36 单测

单测

pytest tests/36 passed(原 29 + 本轮新增 7 覆盖 P0/P1/P3 修复路径)。

E2E 观测结论(9/9 example 符合)

namespace loongsuite-maf-e2e,9 个 Deployment maf-example-01..09,镜像 acos-demo-registry.cn-hangzhou.cr.aliyuncs.com/private-mesh/maf-e2e:plugin-9caf6641

Example Span 结论 关键证据
ex01 basic-llm-chat 2 符合 AGENT>HelloAgent > LLM chat qwen3.6-plus, provider=openai
ex02 agent-tool 4 符合 AGENT>LLM/TOOL/LLM,provider 归一
ex03 workflow-function-executor 6 符合 workflow.build + workflow.run > CHAIN + TASK
ex04 workflow-agent-executor 10 符合 嵌套 AGENT 链路完整
ex05 embedding 1 符合 EMBEDDING op=embeddings, prov=openai
ex06 mcp 7 符合 tools/call inner CLIENT span op=mcp+mcp.method.name=tools/call+gen_ai.tool.name;AGENT provider=openai 归一
ex07 react-step 9 符合 原 TypeError 消除;3× STEP react step + 3× LLM chat + 2× TOOL + AGENT ReactAgent
ex08 failure-path 6 符合 TOOL/LLM/AGENT ERROR span 闭环 + exception event
ex09 entry 2 符合 AGENT EntryAgent > LLM chat

Review 修复摘要(P0/P1/P3/P5)

  • P0 react_step_patch.py:91-167: _fil_wrapper / _chat_wrapper 改为同步函数返回 _scoped() / _step_scoped() coroutine,修复 MAF _agents.py:964 await layer.get_response(...) TypeError;ContextVar token 的 set/reset 移入 coroutine body,修复跨 task ValueError
  • P1 span_processor.py:589-599: gen_ai.operation.name 覆盖条件由 {TASK, AGENT} 扩为 {TASK, AGENT, CLIENT},覆盖 MCP tools/call inner span。
  • P3 span_processor.py:218-241: _normalize_provider 三步兜底(list/tuple 取首元素 → 精确匹配 PROVIDER_NAME_NORMALIZE.lower()),MAF 写入的 microsoft.agent_framework 归一为 openai
  • P5 __init__.py:_instrument: add_span_processor 之后将 MAFSemanticProcessor 前置到 _active_span_processor._span_processors 首位,异常静默降级。

Known Limitations

  • P2 — gen_ai.react.round 恒为 1(ex07 ReAct step span)
    • 现象: ex07 3 个 ReAct step span 的 gen_ai.react.round 均为 1,未随轮次递增(预期 1/2/3)。
    • 根因: Microsoft Agent Framework 内部未对外暴露 round counter,插件层面无法拿到稳定的轮次号。非插件代码缺陷——插件已按 execute.mdExtendedTelemetryHandler.react_step() 正确产出 STEP span,仅 round 字段无法填充真实值。
    • 证据: validation/spans_v5.json ex07 三个 STEP span;console 日志 validation/logs_v5/ex07.log
    • Follow-up: 待 MAF 后续版本暴露 round counter,或 semantic-conventions 规范 owner 给出 fallback 方案后再补齐。

Test Plan

  • pytest tests/ → 36 passed
  • 9/9 example K8s 端到端观测符合
  • 主干 Reviewer 代码 Review

🤖 Generated with Claude Code

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new GenAI instrumentation plugin, opentelemetry-instrumentation-microsoft-agent-framework, to automatically enrich Microsoft Agent Framework (MAF) spans into ARMS GenAI semantic conventions via a custom SpanProcessor, plus an optional ReAct-step monkey patch.

Changes:

  • Added MAFSemanticProcessor to classify/enrich MAF spans (kind/op/framework/rename/provider normalization/TTFT) and aggregate ARMS gauges.
  • Added optional react_step_patch to emit STEP spans around ReAct loop LLM calls via ExtendedTelemetryHandler.react_step().
  • Added unit tests for config parsing, processor enrichment/metrics behavior, and patch idempotency/coroutine behavior without requiring MAF installed.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/init.py Instrumentor: enables MAF native telemetry, registers span processor, optionally applies ReAct patch
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/span_processor.py Core enrichment + metrics aggregation logic
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/react_step_patch.py Optional ReAct loop patch emitting STEP spans
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/semantic_conventions.py Semconv constants + MAF→gen_ai rename map + provider normalization map
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/config.py Env var parsing for enabling instrumentation/metrics/react-step/sensitive data/slow threshold
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/package.py Declares instrumented dependency + metrics support
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/src/opentelemetry/instrumentation/microsoft_agent_framework/version.py Package version
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/pyproject.toml Packaging metadata and entry-point registration
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/README.rst Usage + configuration docs
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_processor.py Processor enrichment/metrics tests (incl. MCP + regressions)
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_react_step.py ReAct patch tests and direct handler STEP span shape test
instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests/test_config.py Config env var parsing tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +413 to +421
self._live_spans: Dict[str, OtelSpan] = {}
self._span_parents: Dict[str, Optional[str]] = {}
self._slow_threshold_ns = int(slow_threshold_ms) * 1_000_000
self._capture_sensitive = capture_sensitive_data
self._counters = _Counters()
self._meter = None
self._gauges: list[ObservableGauge] = []
self._metrics_enabled = metrics_enabled
if metrics_enabled:
Comment on lines +434 to +450
c = self._counters

def _calls_cb(options):
for (model, kind), count in c.calls_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

def _duration_cb(options):
for (model, kind), total in c.calls_duration_ns_sum.items():
count = max(c.calls_count.get((model, kind), 0), 1)
yield _obs(
total / count / 1e9,
{"modelName": model or "unknown", "spanKind": kind},
)

Comment on lines +451 to +464
def _error_cb(options):
for (model, kind), count in c.calls_error_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

def _slow_cb(options):
for (model, kind), count in c.calls_slow_count.items():
yield _obs(
count,
{"modelName": model or "unknown", "spanKind": kind},
)

Comment on lines +156 to +165
def _uninstrument(self, **kwargs: Any) -> None:
if self._react_applied:
revert_react_step_patch()
self._react_applied = False
if self._processor is not None:
try:
self._processor.shutdown()
except Exception as exc: # pragma: no cover - defensive
logger.debug("processor shutdown error: %s", exc)
self._processor = None
assert asyncio.iscoroutine(coro), (
"FIL wrapper must return a coroutine so MAF can `await` it"
)
result = asyncio.get_event_loop().run_until_complete(coro)

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

New opentelemetry-instrumentation-microsoft-agent-framework plugin (+2233 lines, 14 files, 36 tests). Well-structured implementation following the established openai-agents-v2 pattern: MAFSemanticProcessor enriches native MAF OTel spans with ARMS GenAI semantic conventions (span kind, operation name, attribute renaming, provider normalization, TTFT backfill, status, 6 metrics gauges). ReAct step patch is opt-in and cleanly revertible. Test coverage is thorough — classification, MCP detection, provider normalization, TTFT, metrics, and patch apply/revert are all covered.

Overall verdict: solid work for a first-time contribution. No blocking issues found. A few suggestions below for production hardening.

Findings

  • [Warning] span_processor.py:684 — Thread-safety of metrics counters (+= on defaultdict)
  • [Info] span_processor.py:553 — Potential _live_spans memory growth on un-ended spans
  • [Info] semantic_conventions.py:106 — Provider normalization conflates framework with provider
  • [Info] init.py:119 — Private SDK internals for processor ordering

Suggestions

  1. For the metrics counters, consider a threading.Lock around _aggregate_metrics — the overhead is negligible since on_end is not a hot path per-span, and it eliminates a class of subtle race conditions in multi-threaded apps.
  2. For _live_spans, a simple sweep in force_flush that removes entries whose span start time is older than e.g. 60s would prevent unbounded growth from orphaned spans.
  3. Consider adding an integration test that verifies the processor works end-to-end with a real (or mock) MAF tracer provider, not just unit tests with mock spans.

First-Time Contributor Note

Welcome and thank you for this high-quality contribution! The code is clean, well-documented, and follows the repository's established patterns. The E2E validation (9/9 examples) is impressive. Please don't let the suggestions above discourage you — they are minor hardening ideas, not blockers.


Automated review by github-manager-bot

@sipercai sipercai left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the existing review threads and avoided duplicating the already-open findings for the non-MAF span guard, _uninstrument() cleanup, metrics locking, and provider normalization. Adding a few additional findings from local pytest/behavior repro, Weaver live-check, and current CI. This is a comment review only, not a request-changes review.

Separate CI note: changelog, generate, precommit, and typecheck are currently failing. generate wants updates to generated bootstrap/docs files, and changelog needs either an entry or the skip label.

GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens"
GEN_AI_REACT_ROUND = "gen_ai.react.round"
GEN_AI_REACT_FINISH_REASON = "gen_ai.react.finish_reason"
GEN_AI_FRAMEWORK = "gen_ai.framework"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gen_ai.framework is not defined in the current LoongSuite semantic-conventions registry; live-check reports it as a missing attribute and a collision with the existing gen_ai namespace. Either add this extension to the registry/docs first, or use an already registered/non-gen_ai.* attribute key so emitted spans pass the LoongSuite profile.

@sipercai

sipercai commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

跟进修复已推到 PR head:11a1053c80f3a07624a49ec903ab3eae678d43ec

本轮主要修复:

  • 保持 OpenTelemetry status 语义:成功 span 不再强制 StatusCode.OK,失败 span 保留 MAF 的 ERROR
  • 移除未注册的 gen_ai.framework 注入,并增加 MAF span guard,避免普通非 MAF span 被误分类成 GenAI/CHAIN。
  • 修复 ReAct patch 对 MAF 官方 agent.run(..., stream=True) / ResponseStream 的兼容性:streaming 对象透传,非 streaming awaitable 才包 ContextVar 和 react step
  • 将 LLM span 的 OTel kind 调整为 CLIENTgen_ai.response.finish_reasons 从 MAF JSON string 归一成 string array,删除未注册的 gen_ai.user.time_to_first_token
  • 给 metrics counter/callback 加锁,并让 _uninstrument() 从 tracer provider 中移除已注册 processor,避免重复 instrument/uninstrument 后泄漏。
  • 包依赖上没有把 agent-framework-core 放进 instruments extra / 默认 bootstrap:官方 MAF 当前要求 opentelemetry-api>=1.39,而本 workspace 仍在 1.37 线,默认拉入会影响 LoongSuite/后续 Robin 同步。README 改成应用侧显式安装 agent-framework-core

本地验证结果:

  • pytest -q instrumentation-genai/opentelemetry-instrumentation-microsoft-agent-framework/tests -> 41 passed
  • ruff check ... / ruff format --check ... -> passed。
  • uv lock --check -> passed,uv.lock 只补 MAF workspace member/package entry,没有全局降 Python 版本。
  • 用 Microsoft Agent Framework 官方推荐路径真实跑了 OpenAI Chat Completion provider:OpenAIChatCompletionClient(...).as_agent(...)agent.run()agent.run(..., stream=True)、session 多轮、@tool tool calling、并发和失败路径。最终本地 OTLP JSON 产出 29 spans,覆盖 LLM/AGENT/STEP/TOOL,成功 span 为 UNSET,失败链路为 ERROR
  • Weaver live-check:此前 PR 相关的 GenAI 阻断项已消除,包括 finish_reasons 类型、LLM OTel span kind、未注册 gen_ai.user.time_to_first_token。剩余 violation 是 registry/advice 覆盖问题:resource 上 telemetry.sdk.*/service.name、MAF 原生 agent_framework.function.invocation.duration、以及 exception 事件字段,不是本 instrumentation 新增的 GenAI 字段形态问题。

ARMS 云端读回也尝试过多条路径(ARMS collector、SLS OTLP、现有 service/new service),但这次没有稳定拿到 backend readback,主要卡在 ingestion endpoint/auth/instance 映射上;因此这轮结论以真实 MAF runtime + 本地 OTLP + Weaver live-check 为准。GitHub Actions 已批准,目前仍在 queued/pending 等 runner。

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Re-review after 3 new commits (review feedback addressed + gen-ai convention alignment). The previous [Warning] on thread-safety is now resolvedthreading.Lock guards all metrics counter operations. The framework/provider conflation is also resolved — gen_ai.framework is set separately from gen_ai.provider.name. Code quality remains high.

CI Status (blocking — must fix before merge)

  • changelog ❌ — No CHANGELOG entry. Add one or add the Skip Changelog label.
  • precommit ❌ — ruff format reformats 1 file; rstcheck flags README.rst:45 (title underline too short). See inline comment.
  • typecheck ❌ — Pyright errors in vertexai/patch.py (unnecessary # type: ignore). These are pre-existing — not introduced by this PR. Safe to ignore.

Remaining Findings (non-blocking)

  • [Info] span_processor.py:826 — force_flush still doesn't sweep stale _live_spans. Carried over from previous review; minor hardening suggestion.

First-Time Contributor Note

Great follow-up work addressing the review feedback! The thread-safety fix and framework/provider separation are exactly right. Once the two CI formatting issues (changelog + rstcheck) are resolved, this PR is in good shape. The typecheck failure is in a different package and not your responsibility.


Automated review by github-manager-bot

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Re-review after commit 8c174f7 ("fix(maf): address review comments and ci"). All previously identified findings are resolved — thread-safety, provider normalization, and CI are all addressed. CI is fully green, CLA signed. Approving.

Findings (all resolved ✅)

  • [Resolved] Thread-safety — threading.Lock() now guards all _live_spans/_span_parents access in on_start, on_end, shutdown, and _sweep_stale_live_spans.
  • [Resolved] Provider normalization — microsoft.agent_framework no longer collapsed to openai; unknown providers are lower-cased and kept distinct (correct, since MAF routes to multiple underlying providers).
  • [Resolved] CI failures — changelog (CHANGELOG.md added), precommit (ruff format + rstcheck fixed), typecheck (pre-existing Pyright errors, not introduced by this PR).

New Functionality

  • _sweep_stale_live_spans() — good addition; prevents memory leaks from spans started but never ended, bounded by 60s max age. Test coverage in test_force_flush_sweeps_stale_live_spans.

Tests

All tests updated to reflect new provider normalization behavior. New test for stale span sweep. Coverage looks solid.


Automated review by github-manager-bot

123liuziming and others added 6 commits June 26, 2026 11:58
…te.md

Implements the hybrid "SpanProcessor + optional ReAct step patch" plan
documented in llm-dev/microsoft-agent-framework/investigate/execute.md.

- MicrosoftAgentFrameworkInstrumentor: enables MAF native OTel layers
  (force=True) and registers MAFSemanticProcessor.
- MAFSemanticProcessor (span_processor.py): injects gen_ai.span.kind,
  gen_ai.operation.name, renames MAF private-prefix attributes to gen_ai.*,
  normalizes provider.name (azure_openai -> openai), backfills TTFT from
  streaming events, sets StatusCode.OK on success, aggregates the 6 ARMS
  gauges. Reuses opentelemetry.util.genai.utils.gen_ai_json_dumps (aligned
  with openai-agents-v2/span_processor.py:27) to coerce dict/list attribute
  values into JSON strings.
- react_step_patch.py (opt-in via ARMS_MAF_REACT_STEP_ENABLED): emits one
  react step span per LLM round-trip inside FunctionInvocationLayer via
  ExtendedTelemetryHandler.react_step() from opentelemetry-util-genai.
- config.py: env switches (master, sensitive data, react step, slow threshold).
- tests: 23 passing unit tests covering span classification, metric
  aggregation, provider normalization, TTFT backfill, dict coercion, and
  react_step handler behavior.
[M1] MCP span classification: detect mcp.method.name attribute (or
SpanKind.CLIENT + mcp.* fallback) in _classify_span and return
(CLIENT, MCP). MAF_SPAN_NAME_PREFIXES documents that MCP is detected via
attribute rather than prefix (method names are unbounded).

[M2] revert_react_step_patch: capture originals BEFORE wrapping (via
__wrapped__ unwrap chain), and switch from broken decorator form to
wrap_function_wrapper(class, name, wrapper) + @wrapt.decorator. revert
now restores the original; apply->revert->apply does not stack wrappers.

[L1] _safe_dumps: cap output at 4096 chars (execute.md single-field cap)
since gen_ai_json_dumps only serializes. Docstring + module docstring
updated to match actual behavior.

Tests: +6 (test_mcp_span_classified_as_client, mcp client-kind fallback,
non-mcp client negative, _safe_dumps truncation, apply->revert->apply
round-trip, _unwrap_to_function). 29 passed.
- P0: react_step_patch wrappers now return coroutines from sync wrappers
  so MAF's await layer.get_response no longer raises TypeError.
  ContextVar tokens are set inside the coroutine body so set/reset share
  the same asyncio task context.
- P1: extend op_name override condition to include CLIENT span kind so MCP
  tools/call inner spans get gen_ai.operation.name=mcp even when MAF
  pre-wrote execute_tool.
- P3: provider.name normalization now handles sequence values (MAF emits
  list-wrapped values on AGENT spans) and falls back to case-insensitive
  matching so microsoft.agent_framework -> openai on AGENT spans.
- P5: instrument() prepends MAFSemanticProcessor to the SDK processor
  tuple so on_end enrichments run before exporter processors registered
  earlier in bootstrap.

Adds 7 unit tests; pytest tests/ -> 36 passed.
@sipercai sipercai force-pushed the feat/microsoft-agent-framework branch from 8c174f7 to f3fe856 Compare June 26, 2026 04:26

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Re-review after force-push (commits 383158af3fe856). The 3 new commits by @rangemer333-cell move the plugin into the loongsuite matrix and align telemetry with gen-ai semantic conventions. All changes are improvements with no issues found.

Key changes reviewed (commits 4–6)

  • Thread-safety (span_processor.py): _counter_lock and _live_span_lock (threading.Lock) now guard all access to _live_spans, _span_parents, and _counters. Observable gauge callbacks collect into local lists under the lock then yield. Lock ordering is consistent (live_span_lock → counter_lock), no deadlock risk.
  • Proper uninstrument (init.py): New _remove_span_processor() removes the semantic processor from the tracer provider on teardown, preventing stale processor leaks. Uses the same defensive getattr pattern as the prepend path.
  • Non-MAF span filtering (span_processor.py): _is_maf_span() guard in on_end skips non-MAF spans — important correctness fix since the processor is prepended to a shared tracer provider.
  • Stale span sweep (span_processor.py): force_flush() now cleans orphaned entries from _live_spans/_span_parents, preventing memory leaks from unclosed spans.
  • Semantic convention alignment: removed gen_ai.framework / gen_ai.user.time_to_first_token attributes; provider normalization preserves microsoft.agent_framework (lowercased) instead of collapsing to openai; successful spans left UNSET (matching OTel SDK default + Weaver validation); added gen_ai.response.finish_reasons JSON-string normalization.
  • Tests: comprehensive new coverage for thread-safety, force_flush sweep, uninstrument processor removal, non-MAF filtering, finish-reasons normalization, and provider normalization changes.

CI & CLA

  • CLA: signed (license/cla ✓)
  • CI: green (only Auto-Assign-Owners was pending, now passed)

LGTM. Re-approving after the force-push invalidated the previous approval. Ready to merge.


Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add instrumentation for Microsoft agent framework

6 participants