feat: add snapshot to multi agent orchestrators #668

Open
JackYPCOnline wants to merge 4 commits into strands-agents:main from JackYPCOnline:multiagent_snapshot
JackYPCOnline (Contributor) commented Mar 16, 2026

Description

Adds snapshot (state capture/restore) support for multi-agent orchestrators (Graph and Swarm), along with serialization (toJSON/fromJSON) for all multi-agent state classes.

State serialization (state.ts)

Adds toJSON(): JSONValue and static fromJSON(data: JSONValue) methods to all four state classes:

  • NodeResult — serializes nodeId, status, duration, content blocks, optional error (as message string via normalizeError), and optional structuredOutput. Omits absent optional fields from JSON.
  • NodeState — serializes status, terminus, startTime, and results array.
  • MultiAgentResult — serializes status, results, content, duration, and optional error.
  • MultiAgentState — serializes startTime, steps, results, app state, and per-node states. Intentionally excludes structuredOutputSchema (Zod schema is config, not state — must be re-provided by the caller).
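
The toJSON/fromJSON pattern described above can be sketched as follows. This is a minimal illustration, not the code in state.ts: NodeResultSketch is a hypothetical, simplified stand-in for NodeResult (content blocks are left out), the status values are assumed from the test description, and the error is rebuilt as a plain Error from its message string.

```typescript
type JSONValue = string | number | boolean | null | JSONValue[] | { [key: string]: JSONValue };

// Assumed status values; the SDK's ResultStatus may differ.
type ResultStatus = "completed" | "failed" | "cancelled";

// Hypothetical, simplified stand-in for NodeResult.
class NodeResultSketch {
  nodeId: string;
  status: ResultStatus;
  duration: number;
  error?: Error;
  structuredOutput?: JSONValue;

  constructor(
    nodeId: string,
    status: ResultStatus,
    duration: number,
    error?: Error,
    structuredOutput?: JSONValue,
  ) {
    this.nodeId = nodeId;
    this.status = status;
    this.duration = duration;
    this.error = error;
    this.structuredOutput = structuredOutput;
  }

  toJSON(): JSONValue {
    const json: { [key: string]: JSONValue } = {
      nodeId: this.nodeId,
      status: this.status,
      duration: this.duration,
    };
    // Absent optional fields are omitted rather than written as null.
    if (this.error !== undefined) json.error = this.error.message;
    if (this.structuredOutput !== undefined) json.structuredOutput = this.structuredOutput;
    return json;
  }

  static fromJSON(data: JSONValue): NodeResultSketch {
    if (data === null || typeof data !== "object" || Array.isArray(data)) {
      throw new Error("NodeResultSketch.fromJSON expects a JSON object");
    }
    return new NodeResultSketch(
      data.nodeId as string,
      data.status as ResultStatus,
      data.duration as number,
      // The error travels as its message string and is rebuilt as a plain Error.
      typeof data.error === "string" ? new Error(data.error) : undefined,
      // A null structuredOutput is a present value, so test for the key itself.
      "structuredOutput" in data ? data.structuredOutput : undefined,
    );
  }
}
```

Because JSON.stringify calls toJSON automatically, a round trip in this sketch is `NodeResultSketch.fromJSON(JSON.parse(JSON.stringify(result)))`.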

Multi-agent snapshot (snapshot.ts)

New module implementing takeSnapshot and loadSnapshot for Graph/Swarm orchestrators:

  • Two presets:

    • session: lightweight preset, intended to be sufficient for resume because agent nodes are isolated per execution. Currently a placeholder pending session manager integration.
    • full (default): additionally captures per-node agent snapshots. Nested MultiAgentNodes are snapshotted recursively with proper Snapshot envelopes (orchestratorId + nodes, with no state, since nested execution state is ephemeral).
  • takeSnapshot(orchestrator, state?, options?) — state parameter is optional (undefined for nested orchestrators). Nested recursive calls explicitly pass appData: {} to prevent parent appData leaking into nested snapshots.

  • loadSnapshot(orchestrator, snapshot) — validates scope (multiAgent), schemaVersion, and orchestratorId. Restores per-node agent snapshots when present. Nested orchestratorId validation is lenient (warn + skip, not throw) since stale nested snapshots shouldn't fail the entire load. Returns MultiAgentState | undefined.

  • instanceof Agent guard — since AgentNode.agent returns AgentBase (not Agent), snapshot operations are guarded with node.agent instanceof Agent to only snapshot concrete Agent instances that have messages/state properties.
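
The loadSnapshot validation rules above (strict at the top level, lenient for a nested orchestratorId) might look roughly like the sketch below. All names here (SnapshotSketch, OrchestratorSketch, loadSnapshotSketch, the warnings array) are hypothetical and are not the API in snapshot.ts; the supported version "1.0" is assumed from the example's schemaVersion, and restoring agent-scoped node snapshots is left out.

```typescript
// Hypothetical envelope shape, modelled on the fields named in the description.
interface SnapshotSketch {
  scope: "agent" | "multiAgent";
  schemaVersion: string;
  data: {
    orchestratorId?: string;
    state?: unknown;
    nodes?: { [nodeId: string]: SnapshotSketch };
  };
}

// Hypothetical orchestrator: an id plus nested orchestrators keyed by node id.
interface OrchestratorSketch {
  id: string;
  nested: { [nodeId: string]: OrchestratorSketch };
}

const SUPPORTED_SCHEMA_VERSION = "1.0"; // assumption

function loadSnapshotSketch(
  orchestrator: OrchestratorSketch,
  snapshot: SnapshotSketch,
  warnings: string[] = [],
  isNested = false,
): unknown {
  if (snapshot.scope !== "multiAgent") {
    throw new Error(`Expected scope "multiAgent", got "${snapshot.scope}"`);
  }
  if (snapshot.schemaVersion !== SUPPORTED_SCHEMA_VERSION) {
    throw new Error(`Unsupported schemaVersion "${snapshot.schemaVersion}"`);
  }
  if (snapshot.data.orchestratorId !== orchestrator.id) {
    const message =
      `orchestratorId mismatch: expected "${orchestrator.id}", ` +
      `got "${snapshot.data.orchestratorId}"`;
    // Top level: fail the load. Nested: warn and skip the stale snapshot.
    if (!isNested) throw new Error(message);
    warnings.push(message);
    return undefined;
  }
  const nodes = snapshot.data.nodes ?? {};
  for (const nodeId of Object.keys(nodes)) {
    const nodeSnapshot = nodes[nodeId];
    const child = orchestrator.nested[nodeId];
    if (child !== undefined && nodeSnapshot.scope === "multiAgent") {
      loadSnapshotSketch(child, nodeSnapshot, warnings, true);
    }
    // A real implementation would restore agent-scoped node snapshots here.
  }
  // Nested snapshots carry no state, so this is undefined for them.
  return snapshot.data.state;
}
```

Collecting warnings in an array is only a convenience for this sketch; the description says the real module warns and skips, without specifying the mechanism.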

Example

{
  "scope": "multiAgent",
  "schemaVersion": "1.0",
  "createdAt": "2026-01-15T12:00:00.000Z",
  "appData": { "userId": "u-123" },
  "data": {
    "orchestratorId": "outer-graph",
    "state": { "startTime": 1736942400000, "steps": 2, "results": [...], "app": {}, "nodes": {...} },
    "nodes": {
      "a": {
        "scope": "agent",
        "schemaVersion": "1.0",
        "createdAt": "2026-01-15T12:00:00.000Z",
        "appData": {},
        "data": {
          "messages": [...],
          "state": { "agentKey": "agentVal" },
          "systemPrompt": [{ "text": "You are agent A." }]
        }
      },
      "inner-graph": {
        "scope": "multiAgent",
        "schemaVersion": "1.0",
        "createdAt": "2026-01-15T12:00:00.000Z",
        "appData": {},
        "data": {
          "orchestratorId": "inner-graph",
          "nodes": {
            "x": {
              "scope": "agent",
              "schemaVersion": "1.0",
              "createdAt": "2026-01-15T12:00:00.000Z",
              "appData": {},
              "data": {
                "messages": [...],
                "state": {},
                "systemPrompt": [{ "text": "You are agent X." }]
              }
            },
            "y": {
              "scope": "agent",
              "schemaVersion": "1.0",
              "createdAt": "2026-01-15T12:00:00.000Z",
              "appData": {},
              "data": {
                "messages": [...],
                "state": {},
                "systemPrompt": null
              }
            }
          }
        }
      }
    }
  }
}
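
A nested structure like the one above could be produced by a recursive walk along these lines. This is a sketch under stated assumptions, not the takeSnapshot implementation: AgentBaseSketch, AgentSketch, OrchestratorSketch, and takeSnapshotSketch are hypothetical stand-ins, the real signature is takeSnapshot(orchestrator, state?, options?), and systemPrompt is omitted for brevity.

```typescript
type JSONValue = string | number | boolean | null | JSONValue[] | { [key: string]: JSONValue };

interface EnvelopeSketch {
  scope: "agent" | "multiAgent";
  schemaVersion: string;
  createdAt: string;
  appData: { [key: string]: JSONValue };
  data: { [key: string]: unknown };
}

// Hypothetical stand-ins for AgentBase, Agent, and a Graph/Swarm orchestrator.
class AgentBaseSketch {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
}

class AgentSketch extends AgentBaseSketch {
  messages: JSONValue[] = [];
  state: { [key: string]: JSONValue } = {};
}

class OrchestratorSketch {
  id: string;
  nodes: { [nodeId: string]: AgentBaseSketch | OrchestratorSketch };
  constructor(id: string, nodes: { [nodeId: string]: AgentBaseSketch | OrchestratorSketch }) {
    this.id = id;
    this.nodes = nodes;
  }
}

function envelope(
  scope: "agent" | "multiAgent",
  appData: { [key: string]: JSONValue },
  data: { [key: string]: unknown },
): EnvelopeSketch {
  return { scope, schemaVersion: "1.0", createdAt: new Date().toISOString(), appData, data };
}

function takeSnapshotSketch(
  orchestrator: OrchestratorSketch,
  state?: JSONValue,
  appData: { [key: string]: JSONValue } = {},
): EnvelopeSketch {
  const nodes: { [nodeId: string]: EnvelopeSketch } = {};
  for (const nodeId of Object.keys(orchestrator.nodes)) {
    const node = orchestrator.nodes[nodeId];
    if (node instanceof OrchestratorSketch) {
      // Nested orchestrator: no state, and an explicit empty appData so the
      // parent's appData does not leak into the nested envelope.
      nodes[nodeId] = takeSnapshotSketch(node, undefined, {});
    } else if (node instanceof AgentSketch) {
      // Only concrete agents expose messages/state to capture.
      nodes[nodeId] = envelope("agent", {}, { messages: node.messages, state: node.state });
    }
    // Any other AgentBase implementation is skipped.
  }
  const data: { [key: string]: unknown } = { orchestratorId: orchestrator.id, nodes };
  if (state !== undefined) data.state = state;
  return envelope("multiAgent", appData, data);
}
```

In this sketch the top-level envelope carries both state and appData, while nested envelopes carry only orchestratorId and nodes, matching the example.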

Exports (index.ts)

Exports takeSnapshot, loadSnapshot, MultiAgentSnapshotPreset, and TakeMultiAgentSnapshotOptions from the multiagent barrel.

Test coverage

  • state.test.ts (24 tests) — toJSON/fromJSON round-trips for NodeResult (9), NodeState (3), MultiAgentResult (4), MultiAgentState (8). Covers completed/failed/cancelled results, structuredOutput with nested objects/null/primitives, multiple content block types, error serialization, schema exclusion, edge cases.

  • snapshot.test.ts (30 tests) — takeSnapshot (14): session/full presets, appData, state omission, nested MultiAgentNode recursion, nested appData isolation, agentSnapshotOptions forwarding, Swarm support. loadSnapshot (11): state restoration, validation errors (scope/version/orchestratorId), unknown node warning, nested orchestratorId mismatch warning, null state handling. Round-trip (5): state fidelity, JSON.stringify/parse survival, full preset agent state preservation, nested graph through JSON serialization.

Related Issues

Documentation PR

Type of Change

New feature

Testing

How have you tested the change?

  • All 54 new unit tests pass (npx vitest run --project unit-node)

  • 0 type errors from new/modified files (npx tsc --project src/tsconfig.json --noEmit)

  • Pre-existing type errors unchanged (telemetry/config.ts)

  • I ran npm run check

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions github-actions bot added the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
nodeId: d.nodeId as string,
status: d.status as ResultStatus,
duration: d.duration as number,
content: (d.content as JSONValue[]).map((c) => contentBlockFromData(c as never)),

Issue: The as never cast suppresses type checking entirely.

Suggestion: contentBlockFromData accepts ContentBlockData type. Consider using that type explicitly:

content: (d.content as ContentBlockData[]).map(contentBlockFromData),

This provides better type safety and documents the expected shape.
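
To illustrate the difference between the two casts, here is a small self-contained sketch. ContentBlockData is reduced to a hypothetical two-variant union and describeBlock is a made-up stand-in for contentBlockFromData; neither matches the SDK's real definitions.

```typescript
// Hypothetical two-variant stand-in for the SDK's ContentBlockData union.
type ContentBlockData = { text: string } | { json: unknown };

// Made-up stand-in for contentBlockFromData: returns a short label.
function describeBlock(data: ContentBlockData): string {
  return "text" in data ? `text:${data.text}` : "json";
}

const raw: unknown[] = [{ text: "hello" }, { json: { ok: true } }];

// `never` is assignable to every parameter type, so this line would keep
// compiling even if describeBlock were changed to expect something unrelated.
const viaNever = raw.map((c) => describeBlock(c as never));

// Naming the element type keeps the call checked against describeBlock's
// parameter type and documents the shape the caller expects.
const viaTyped = (raw as ContentBlockData[]).map(describeBlock);
```

Both forms are still unchecked at runtime; the typed cast only restores compile-time checking of the call.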

Thanks for the update! Using ContentBlockData[] explicitly is much better for type safety. ✅

@github-actions github-actions bot removed the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
@github-actions github-actions bot left a comment

Assessment: Request Changes

This PR introduces snapshot support for multi-agent orchestrators with well-designed state serialization and comprehensive test coverage. However, there are a few items that need attention before merging.

Required Actions
  • API Review Label: This PR introduces new public APIs (takeSnapshot, loadSnapshot, MultiAgentSnapshotPreset, TakeMultiAgentSnapshotOptions) exported from the multiagent module. Per the API Bar Raising guidelines, please add the needs-api-review label and include in the PR description:

    • Expected use cases for the snapshot feature
    • Example code snippets demonstrating usage
    • Complete API signatures with default parameter values
  • Documentation PR: This is a new feature that adds public API surface. Per the documentation review guidelines, a Documentation PR is required for new features with public APIs. Please add a "Documentation PR" section with a link to https://github.com/strands-agents/docs/pull/... or provide justification if documentation is not needed.

Code Quality Suggestions
  • See inline comments regarding type assertions in state.ts (lines 81 and 261)

The implementation follows established patterns from the existing agent snapshot module and includes thorough test coverage with 54 tests.

@github-actions github-actions bot added the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
@github-actions
Thanks for the update! I noticed the new commit that changes the default preset to full and marks session as a placeholder for future session manager integration.

Remaining items from previous review:

The blocking issues from my previous review still need to be addressed:

  1. needs-api-review label - This PR introduces new public APIs and requires the label per API Bar Raising guidelines

  2. Documentation PR - A link to the documentation PR in https://github.com/strands-agents/docs/pull/... is required for new features with public APIs

  3. Inline comments - The two suggestions about type assertions (lines 81 and 261 in state.ts) are still open

Once these are addressed, I'm happy to re-review!

@github-actions github-actions bot removed the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
@github-actions github-actions bot added the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
@github-actions github-actions bot left a comment

Assessment: Request Changes

Thanks for addressing the feedback! The code quality suggestions have been resolved and the needs-api-review label has been added.

Remaining Item
  • Documentation PR: A link to the documentation PR in https://github.com/strands-agents/docs/pull/... is still required for new features with public APIs. Please update the PR description with a "Documentation PR" section, or provide justification if documentation is not needed.

Once the documentation PR is linked, this is ready for approval from a code quality perspective.

@github-actions github-actions bot removed the strands-running <strands-managed> Whether or not an agent is currently running label Mar 16, 2026
@JackYPCOnline JackYPCOnline marked this pull request as ready for review March 16, 2026 22:01
@github-actions github-actions bot added strands-running <strands-managed> Whether or not an agent is currently running and removed strands-running <strands-managed> Whether or not an agent is currently running labels Mar 16, 2026
@github-actions github-actions bot added the strands-running <strands-managed> Whether or not an agent is currently running label Mar 20, 2026
@github-actions
Thanks for adding the example JSON structure to the PR description — this is very helpful for understanding the snapshot format and will be valuable for documentation!

Current Status:

  • ✅ needs-api-review label added
  • ✅ Code quality issues addressed
  • ✅ API documentation improved with example JSON
  • ❌ Documentation PR link still missing

Remaining Item:
A link to the documentation PR in https://github.com/strands-agents/docs/pull/... is still required for new features with public APIs. The excellent example in the PR description would translate well into user-facing documentation.

From a code quality perspective, this PR is ready to go. Once the documentation PR is linked (or justification provided), I can approve.

@github-actions github-actions bot removed the strands-running <strands-managed> Whether or not an agent is currently running label Mar 20, 2026