feat(deepagents): support DeepAgents instrumentation #228
ralf0131
left a comment
Summary
Adds DeepAgents instrumentation: a new loongsuite-instrumentation-deepagents package that patches create_deep_agent to inject ReAct metadata, plus LangChain tracer enhancements for DeepAgents model-node STEP spans and read_file skill loading. Overall code quality is high — follows existing LangGraph patterns, comprehensive test coverage (470-line span tests + instrumentor lifecycle tests), CLA signed.
Findings
- [Warning] pyproject.toml:26: `requires-python = ">=3.10"` should be `>=3.11`; deepagents 0.6.x needs Python 3.11+, causing CI failure on the 3.10 test job. Remove the 3.10 classifier too.
- [Info] patch.py:42: Duplicated metadata key constants vs `langchain/internal/_utils.py`; risks a silent break if one side changes.
- [Info] _tracer.py:621: `_extract_tool_call_arguments` behavioral change: structured dict inputs are now preserved as-is instead of JSON-serialized. Improvement, but it changes the span data shape for existing tool calls.
Suggestions
- Fix `requires-python` to `>=3.11` and drop the 3.10 classifier to resolve the CI failure.
- Consider importing metadata key constants from `_utils.py` (or a shared module) instead of duplicating them, to prevent silent drift.
- The `_extract_tool_call_arguments` refactor is well-motivated and tested; no action needed, just ensure downstream span consumers handle dict-typed `tool_call_arguments`.
Automated review by github-manager-bot
ralf0131
left a comment
Summary
Re-review following commit 0ae1399a ("fix(deepagents): align python version matrix"). The previous [Warning] about requires-python >= 3.10 is fully resolved — updated to >=3.11 in pyproject.toml, the 3.10 classifier removed, tox-loongsuite.ini envlist updated to py3{11,12,13}, and README compatibility updated to "Python 3.11+". No source code changes in this commit — only build/CI config alignment.
The two [Info] findings from the previous review remain non-blocking:
- Metadata key constant duplication in `patch.py` vs `_utils.py`: DRY suggestion, no functional impact.
- `_extract_tool_call_arguments` behavioral change (structured dict preserved as-is): improvement, well-tested, just a downstream-consumer awareness note.
CLA signed. LGTM — ready to merge.
Automated review by github-manager-bot
Pull request overview
Adds first-class DeepAgents instrumentation and extends the LangChain tracer to recognize DeepAgents ReAct runs, STEP spans, and skill-loading tool calls (annotating read_file tool spans with gen_ai.skill.* attributes). This fits into the existing instrumentation-loongsuite suite by enabling consistent AGENT/STEP/TOOL/LLM telemetry for DeepAgents workflows, while keeping LangGraph and DeepAgents ReAct handling distinct.
Changes:
- Introduces a new `loongsuite-instrumentation-deepagents` package (patching `create_deep_agent`, instrumentor lifecycle, docs, and tests).
- Extends the `loongsuite-instrumentation-langchain` tracer to: (1) preserve structured tool call arguments and (2) detect DeepAgents skills metadata and apply skill attributes to `read_file` tool spans.
- Wires DeepAgents into tox + GitHub Actions job matrices and adds LangChain changelog coverage.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tox-loongsuite.ini | Adds DeepAgents tox test + lint environments and dependency wiring. |
| loongsuite-distro/src/loongsuite/distro/bootstrap_gen.py | Adds DeepAgents to distro bootstrap instrumentation mapping. |
| instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests/test_data_extraction.py | Adds unit tests for new extraction helpers. |
| instrumentation-loongsuite/loongsuite-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/internal/_utils.py | Adds DeepAgents metadata key + STEP node constant + detection helper. |
| instrumentation-loongsuite/loongsuite-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/internal/_tracer.py | Implements DeepAgents ReAct routing + skill metadata caching + tool span skill attributes + structured tool args. |
| instrumentation-loongsuite/loongsuite-instrumentation-langchain/CHANGELOG.md | Documents DeepAgents skill-load detection + structured tool args preservation. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_instrumentor.py | Validates DeepAgents instrumentor dependency (un)instrument behavior. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_deepagents_spans.py | Integration tests for AGENT/STEP spans and skill-load tool attribution. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/requirements.latest.txt | Adds DeepAgents test dependencies. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/conftest.py | Test fixtures for in-memory exporters + DeepAgents instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/version.py | Introduces DeepAgents package version module. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/package.py | Declares instrumented library constraints. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/internal/patch.py | Patches create_deep_agent and injects metadata into run configs. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/internal/__init__.py | Adds internal package marker. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/__init__.py | Implements DeepAgentsInstrumentor and dependency instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/README.md | Documents installation/usage and telemetry behavior. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/pyproject.toml | Adds packaging metadata, deps, and OTEL entrypoint. |
| instrumentation-loongsuite/loongsuite-instrumentation-deepagents/CHANGELOG.md | Adds initial DeepAgents instrumentation changelog. |
| instrumentation-genai/opentelemetry-instrumentation-vertexai/src/opentelemetry/instrumentation/vertexai/patch.py | Removes a couple of Pyright type: ignore suppressions. |
| .github/workflows/loongsuite_test_0.yml | Adds DeepAgents test jobs to generated workflow matrix. |
| .github/workflows/loongsuite_lint_0.yml | Adds DeepAgents lint job to generated workflow matrix. |
ralf0131
left a comment
Summary
Re-review after new commits. Adds a new loongsuite-instrumentation-deepagents package and integrates DeepAgents skill-loading and ReAct-step tracing into the LangChain tracer. The graph-marker/with_config re-marking approach mirrors LangGraph's metadata handoff cleanly, the ReAct STEP rules are correctly separated per framework, and test coverage is thorough. Findings below are all minor hardening/maintainability notes; nothing blocking. CI (license/cla) is passing; some claw-eval jobs were still pending at review time.
Suggestions
- Cap or document the `_strong_wrapped_graphs` fallback list to avoid unbounded growth in long-running processes.
- Verify downstream OTLP serialization handles the now-structured `tool_call_arguments` (dict/list) without double-encoding.
- Extract the `SkillsMiddleware.before_agent` node name to a constant so a rename is easy to track.
Cross-repo Note
The vertexai changes here are pure type: ignore cleanup — no behavioral change. No shared protocol surface with loongsuite-pilot.
Automated review by github-manager-bot
    try:
        _wrapped_graphs.add(graph)
    except TypeError:
        _strong_wrapped_graphs.append(graph)
[Info] _strong_wrapped_graphs is a module-level list that holds strong references to graphs that cannot be weak-referenced. In long-running processes that create many such graphs this grows without bound. Most LangChain graphs are weak-referenceable so this is rarely hit, but consider capping the list size or documenting the expectation. Non-blocking.
        return False


    def _extract_tool_call_arguments(inputs: Any) -> Any:
[Info] _extract_tool_call_arguments now returns the raw dict/list value instead of JSON-serializing to a string (the intentional improvement noted in the diff). Worth confirming that downstream consumers — OTLP attribute serialization and any gen_ai.tool.call.arguments redaction — consistently handle non-string values so structured inputs are not double-encoded or dropped.
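The double-encoding concern can be illustrated with a small normalization step at the point where a value becomes a span attribute. This is a sketch only: `to_attribute_value` is a hypothetical helper, not part of the tracer, and it assumes the consumer wants exactly one JSON encoding for structured values while strings pass through untouched.

```python
import json
from typing import Any

# Scalar types that can be used as attribute values directly.
_PRIMITIVES = (str, bool, int, float)


def to_attribute_value(value: Any) -> Any:
    """Serialize structured values once; leave primitives as they are."""
    if isinstance(value, _PRIMITIVES):
        # An already-serialized JSON string is returned unchanged,
        # which is what prevents double-encoding.
        return value
    # dict/list (and anything else) is encoded exactly once.
    # default=str keeps non-JSON-native objects from raising.
    return json.dumps(value, ensure_ascii=False, default=str)
```

With this shape, a dict returned by `_extract_tool_call_arguments` and a pre-serialized string both end up as the same single-encoded JSON text.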
        """Cache SkillsMiddleware metadata on the parent DeepAgents run."""
        if not _has_deepagents_metadata(run):
            return
        if getattr(run, "name", "") != "SkillsMiddleware.before_agent":
[Info] The hardcoded node name "SkillsMiddleware.before_agent" couples this instrumentation to a specific DeepAgents internal name. If DeepAgents renames the node, skill-metadata capture degrades silently. Consider extracting it to a module-level constant for easier maintenance and discoverability.
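The suggested extraction could look roughly like this. The constant name `SKILLS_BEFORE_AGENT_NODE` and the helper `is_skills_middleware_run` are hypothetical; only the string value and the `getattr(run, "name", "")` lookup come from the diff.

```python
# Single place to update if DeepAgents renames the node (hypothetical name).
SKILLS_BEFORE_AGENT_NODE = "SkillsMiddleware.before_agent"


def is_skills_middleware_run(run) -> bool:
    """Return True when the run is the SkillsMiddleware before_agent node."""
    # getattr with a default tolerates run objects that have no name.
    return getattr(run, "name", "") == SKILLS_BEFORE_AGENT_NODE
```

A test that imports the same constant would then fail loudly in one place after a rename, rather than skill metadata quietly going missing.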
Thanks for the re-review. I agree these are useful hardening notes, and I’m keeping them non-blocking for this PR to avoid expanding the scope after CI is green.
No additional code changes planned from these non-blocking comments.
Description
Adds DeepAgents instrumentation coverage and LangChain tracer support for DeepAgents skill loading. DeepAgents exposes skill loading as a builtin filesystem tool call, `read_file(file_path="/skills/<skill>/SKILL.md")`, so this change annotates matching tool spans with skill metadata captured from `SkillsMiddleware`.

This also maps DeepAgents model nodes to ReAct STEP spans, keeps LangGraph/DeepAgents ReAct step handling separated when both frameworks are installed, and captures OpenAI-style DeepAgents/LangGraph root input messages on the root `AGENT` span.

Fixes # (none)
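To make the skill-loading match concrete, here is a minimal sketch of recognizing a `/skills/<skill>/SKILL.md` path and pulling out the skill name. `skill_name_from_path` is a hypothetical helper written for illustration; the tracer's actual matching logic is not shown in this conversation and may differ.

```python
from pathlib import PurePosixPath
from typing import Optional


def skill_name_from_path(file_path: str) -> Optional[str]:
    """Return the skill name for a .../skills/<skill>/SKILL.md path, else None."""
    parts = PurePosixPath(file_path).parts
    # Expect the last three components to be: "skills", <skill>, "SKILL.md".
    if len(parts) >= 3 and parts[-1] == "SKILL.md" and parts[-3] == "skills":
        return parts[-2]
    return None
```

For example, `skill_name_from_path("/skills/web-research/SKILL.md")` yields `"web-research"`, while an ordinary `read_file` call on an unrelated path yields `None` and would be left unannotated.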
Type of change
How Has This Been Tested?
- `/tmp/loongsuite-arms-visual-verify/20260624161504/venv312/bin/python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests -q` (10 passed)
- `/tmp/loongsuite-arms-visual-verify/20260624161504/venv-langchain-stable312/bin/python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests -q` (158 passed)
- `/tmp/loongsuite-arms-visual-verify/20260624161504/venv-langchain-stable312/bin/python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-langgraph/tests -q` (16 passed)
- `tox -e precommit`
- `tox -e generate-workflows`
- `python /Users/sipercai/project/ai_loop/team-skills/loongsuite-github-pipeline/scripts/check_loongsuite_pr_readiness.py --repo .`
- `python3 -m py_compile instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/internal/patch.py instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_instrumentor.py instrumentation-loongsuite/loongsuite-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/internal/_tracer.py instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests/test_agent_spans.py instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_deepagents_spans.py`
- `uvx ruff check instrumentation-loongsuite/loongsuite-instrumentation-deepagents/src/opentelemetry/instrumentation/deepagents/internal/patch.py instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_instrumentor.py instrumentation-loongsuite/loongsuite-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/internal/_tracer.py instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests/test_agent_spans.py instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_deepagents_spans.py`
- `uvx --with tox-uv tox -c tox-loongsuite.ini -e py311-test-loongsuite-instrumentation-deepagents-latest -- instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_instrumentor.py::test_uninstrument_restores_wrapped_graph_methods instrumentation-loongsuite/loongsuite-instrumentation-deepagents/tests/test_deepagents_spans.py::test_deepagents_root_span_is_agent_and_single_step -q` (11 passed)
- `uvx --with tox-uv tox -c tox-loongsuite.ini -e py311-test-loongsuite-instrumentation-langchain-latest -- instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests/test_agent_spans.py::test_extract_langgraph_input_message_from_openai_style_dict instrumentation-loongsuite/loongsuite-instrumentation-langchain/tests/test_agent_spans.py::test_extract_langgraph_input_message_keeps_empty_content_tool_call -q` (160 passed)
- `git diff --check`
- `gen_ai.skill.*` attributes on the `read_file` tool span.

Does This PR Require a Core Repo Change?
Checklist: