Persist agent and workflow run state for resume and historical views #3377

@senamakel

Summary

Persist agent/workflow run state, telemetry, child lineage, and resumable checkpoints so OpenHuman can reliably render and resume parallel work after navigation, app restart, or interrupted runs.

This is the cross-cutting storage foundation for the other #3370 child issues.

Parent: #3370

Problem

OpenHuman currently has process-local orchestration state and live-only subagent transcript deltas. Some worker transcripts are persisted as conversation threads, and some tool timeline entries are persisted, but there is no single durable run index for background agents, teams, workflows, and child lineage.

Claude Code's workflow/agent surfaces work because every run/session has inspectable state: status, task/phase, child agent results, prompt/script, logs, and management commands. OpenHuman needs the same kind of durable run ledger, adapted to our existing memory/thread/task-board stores.

Implementation plan

  1. Define a durable run ledger.

    • AgentRun: id, kind (subagent, worker_thread, background_agent, team_member, workflow_child), parent run/thread, agent id, status, prompt ref, worker thread id, task board/card refs, started/updated/completed timestamps.
    • WorkflowRun: id, definition id, parent thread, input, phase states, child run ids, status, summary.
    • RunEvent: run id, sequence, event type, payload, timestamp.
    • RunTelemetry: token counts, cost estimate, elapsed ms, tool counts, model/provider, error.
  2. Store the ledger under workspace state.

    • Use existing workspace storage patterns instead of inventing a separate DB unless current thread storage is insufficient.
    • Keep large transcript text in conversation/thread storage; run ledger should reference it.
  3. Rehydrate app state.

    • Add API endpoints for list/get runs and recent run events.
    • chatRuntimeSlice should rehydrate historical subagent/tool rows from persisted metadata, accepting that live streamed prose is not replayed unless it exists in a worker thread.
  4. Add resume semantics.

    • Persist checkpoints for awaiting-user and paused workers, extending the existing continue_subagent checkpoint path.
    • Workflows resume by reusing completed child results and launching missing/failed phases according to policy.
  5. Add tests.

    • Rust tests for append/list/get ordering, schema compatibility, and restart-style rehydrate.
    • Vitest for historical run rendering.
    • E2E for starting a run, navigating away, and reopening it.
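
The ledger shapes from step 1 and the append/list ordering that step 5 wants tested can be sketched in TypeScript as follows. This is an in-memory illustration only, not the proposed Rust implementation: the field names follow the plan, but the `RunStatus` values and the `InMemoryRunLedger` class with its methods are hypothetical.

```typescript
// Hypothetical status set; the issue does not enumerate the real values.
type RunStatus = "running" | "awaiting_user" | "paused" | "completed" | "failed";

// Kinds are taken from the AgentRun bullet in the plan.
type AgentRunKind =
  | "subagent"
  | "worker_thread"
  | "background_agent"
  | "team_member"
  | "workflow_child";

interface AgentRun {
  id: string;
  kind: AgentRunKind;
  parentRunId?: string;
  agentId: string;
  status: RunStatus;
  workerThreadId?: string; // large transcript text stays in thread storage
  startedAt: number;
  updatedAt: number;
  completedAt?: number;
}

interface RunEvent {
  runId: string;
  sequence: number; // per-run, strictly increasing
  eventType: string;
  payload: unknown;
  timestamp: number;
}

// In-memory stand-in for the durable store (assumption, for illustration only).
class InMemoryRunLedger {
  private runs = new Map<string, AgentRun>();
  private events = new Map<string, RunEvent[]>();

  upsertRun(run: AgentRun): void {
    this.runs.set(run.id, { ...run });
  }

  getRun(id: string): AgentRun | undefined {
    return this.runs.get(id);
  }

  // Assigns the next per-run sequence number so readers can rely on ordering.
  appendEvent(runId: string, eventType: string, payload: unknown, timestamp: number): RunEvent {
    const list = this.events.get(runId) || [];
    const event: RunEvent = { runId, sequence: list.length + 1, eventType, payload, timestamp };
    list.push(event);
    this.events.set(runId, list);
    return event;
  }

  // Returns events with sequence > afterSequence, in append order.
  listEvents(runId: string, afterSequence = 0): RunEvent[] {
    return (this.events.get(runId) || []).filter((e) => e.sequence > afterSequence);
  }

  // Most recently updated runs first.
  listRuns(): AgentRun[] {
    return Array.from(this.runs.values()).sort((a, b) => b.updatedAt - a.updatedAt);
  }
}
```

Passing the last seen sequence to `listEvents` gives a client a simple cursor for fetching only the events it missed while it was detached.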

Reference code

Current orchestration state is explicitly process-local:

// src/openhuman/agent_orchestration/README.md
// The first implementation is process-local. The state shape is serializable so a
// later PR can persist orchestration sessions across app restart, cron resumes, and
// thread continuation without changing callers.

Current live/persisted split in frontend runtime:

// app/src/store/chatRuntimeSlice.ts
export interface SubagentActivity {
  taskId: string;
  agentId: string;
  workerThreadId?: string;
  status: ToolTimelineEntryStatus;
  toolCalls: SubagentToolCallEntry[];
  transcript?: SubagentTranscriptItem[];
}
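
A historical entry of this shape could be rebuilt from ledger data roughly as follows. The sketch assumes a hypothetical persisted summary (`PersistedRun`) and uses simplified stand-ins for `ToolTimelineEntryStatus`, `SubagentToolCallEntry`, and `SubagentTranscriptItem`, whose real definitions are not shown in this issue. `transcript` is left undefined when no worker-thread transcript was persisted, so the UI can render a placeholder instead of replaying streamed prose.

```typescript
// Simplified stand-ins; the real types live in the app and may differ.
type ToolTimelineEntryStatus = "running" | "success" | "error";

interface SubagentToolCallEntry {
  callId: string;
  toolName: string;
  status: ToolTimelineEntryStatus;
}

interface SubagentTranscriptItem {
  role: string;
  text: string;
}

interface SubagentActivity {
  taskId: string;
  agentId: string;
  workerThreadId?: string;
  status: ToolTimelineEntryStatus;
  toolCalls: SubagentToolCallEntry[];
  transcript?: SubagentTranscriptItem[];
}

// Hypothetical shape of what the run ledger returns for one run.
interface PersistedRun {
  id: string;
  agentId: string;
  workerThreadId?: string;
  status: "running" | "awaiting_user" | "paused" | "completed" | "failed";
  toolCalls: SubagentToolCallEntry[];
}

function toTimelineStatus(status: PersistedRun["status"]): ToolTimelineEntryStatus {
  if (status === "completed") return "success";
  if (status === "failed") return "error";
  // Assumption: paused and awaiting-user runs render as still in progress.
  return "running";
}

function rehydrateSubagentActivity(
  run: PersistedRun,
  threadTranscript?: SubagentTranscriptItem[],
): SubagentActivity {
  const hasTranscript = threadTranscript !== undefined && threadTranscript.length > 0;
  return {
    taskId: run.id, // assumption: the ledger run id doubles as the task id
    agentId: run.agentId,
    workerThreadId: run.workerThreadId,
    status: toTimelineStatus(run.status),
    toolCalls: run.toolCalls.map((call) => ({ ...call })),
    // Live-only streamed prose is not recoverable; leave the field unset.
    transcript: hasTranscript ? threadTranscript : undefined,
  };
}
```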

Files to build from:

  • src/openhuman/agent_orchestration/README.md
  • src/openhuman/agent_orchestration/ops.rs
  • src/openhuman/agent_orchestration/types.rs
  • src/openhuman/agent_orchestration/tools/continue_subagent.rs
  • src/openhuman/agent_orchestration/tools/worker_thread.rs
  • src/openhuman/memory_conversations/
  • app/src/store/chatRuntimeSlice.ts
  • app/src/types/turnState.ts

Relevant Claude ideas to adapt:

  • Background work remains visible after detaching from the interactive session.
  • Workflow runtime tracks child results separately from the main conversation context.
  • Progress views need phase/agent state, elapsed time, token/cost totals, and stop/resume controls.
  • Resume should avoid rerunning completed child work where possible.
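
The last point, together with step 4 of the implementation plan, amounts to a planning pass over persisted phase states before anything is relaunched. A minimal sketch, assuming a hypothetical `PhaseState` record and a `retryFailed` policy flag (neither is defined in this issue):

```typescript
type PhaseStatus = "pending" | "running" | "completed" | "failed";

// Hypothetical persisted per-phase record.
interface PhaseState {
  phaseId: string;
  status: PhaseStatus;
  childRunId?: string;
  result?: string; // stored child result, if the phase finished
}

// Hypothetical policy knob: whether failed phases are retried on resume.
interface ResumePolicy {
  retryFailed: boolean;
}

interface ResumePlan {
  reuse: PhaseState[]; // completed phases whose results are carried over
  launch: string[]; // phase ids to (re)start
  skipped: string[]; // failed phases left alone by policy
}

function planWorkflowResume(phases: PhaseState[], policy: ResumePolicy): ResumePlan {
  const plan: ResumePlan = { reuse: [], launch: [], skipped: [] };
  for (const phase of phases) {
    if (phase.status === "completed" && phase.result !== undefined) {
      plan.reuse.push(phase);
    } else if (phase.status === "failed" && !policy.retryFailed) {
      plan.skipped.push(phase.phaseId);
    } else {
      // Pending phases, phases interrupted while "running", retryable
      // failures, and completed phases with no stored result.
      plan.launch.push(phase.phaseId);
    }
  }
  return plan;
}
```

Treating a phase that was still `running` at shutdown as relaunchable is a design choice made here for simplicity; a real implementation might first check whether the child run is still alive.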

Acceptance criteria

  • Durable run, workflow, event, and telemetry schemas are implemented with a migration and backward-compatibility strategy.
  • Existing worker-thread and subagent progress paths write into the run ledger.
  • The app can list recent, running, and historical runs after navigation or app restart.
  • Awaiting-user and paused runs can be resumed through persisted checkpoint metadata.
  • Historical rendering handles missing live-only streamed prose gracefully.
  • Rust and frontend tests cover persistence, rehydrate, and resume behavior.
  • Diff coverage >= 80% for changed lines.
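
One common way to meet the backward-compatibility criterion is to stamp each persisted record with a schema version and upgrade older records on read. The sketch below only illustrates that idea: the `schemaVersion` field, the v1/v2 record shapes, and `decodeRunRecord` are hypothetical, not the project's actual format.

```typescript
// Hypothetical older shape, shown for reference: no parent lineage.
interface RunRecordV1 {
  schemaVersion: 1;
  id: string;
  status: string;
}

// Hypothetical current shape: adds parent lineage.
interface RunRecordV2 {
  schemaVersion: 2;
  id: string;
  status: string;
  parentRunId: string | null;
}

// Parses a stored record and upgrades it to the current shape.
function decodeRunRecord(json: string): RunRecordV2 {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("malformed run record");
  }
  const rec = raw as Record<string, unknown>;
  const id = rec["id"];
  const status = rec["status"];
  if (typeof id !== "string" || typeof status !== "string") {
    throw new Error("malformed run record");
  }
  switch (rec["schemaVersion"]) {
    case 1:
      // v1 records predate lineage tracking; treat them as root runs.
      return { schemaVersion: 2, id, status, parentRunId: null };
    case 2: {
      const parent = rec["parentRunId"];
      return { schemaVersion: 2, id, status, parentRunId: typeof parent === "string" ? parent : null };
    }
    default:
      throw new Error("unsupported schemaVersion: " + String(rec["schemaVersion"]));
  }
}
```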

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • feature: Net-new user-facing capability or product behavior.
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Status
Done
