# SSE Resumption Between `execd` and SDKs

## Overview

This document describes the **capabilities** and **core requirements** for **resumable** Server-Sent Events (SSE) streams between the `execd` daemon and language SDKs. The goal is to support **disconnect/reconnect** scenarios without losing or reordering execution output when the server can still replay recent events.

---

## Capabilities to Deliver

### 1. Resumable execution streams

After a transient failure (network blip, client sleep/wake, load balancer rotation, SDK process restart), the SDK can establish a **new SSE connection** and **continue the same logical execution stream**, provided the run is still active and replay is available server-side. A new run is started only when resumption is not possible.
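A minimal sketch of the reconnect loop this capability implies, under stated assumptions: `connect(cursor)` is a hypothetical transport hook (not an existing SDK function) that opens a stream and yields `(seq, payload)` pairs for events after `cursor`, raising `ConnectionError` on a drop. Integer sequence numbers starting at 1 are also an assumption.

```python
from typing import Callable, Iterator, Tuple

Event = Tuple[int, str]  # (sequence number, payload); shape is an assumption


def stream_with_resume(
    connect: Callable[[int], Iterator[Event]],  # hypothetical transport hook
    max_reconnects: int = 3,
) -> Iterator[Event]:
    """Yield events from one logical stream, reconnecting from the last cursor."""
    cursor = 0        # 0 means "nothing seen yet" in this sketch
    reconnects = 0
    while True:
        try:
            for seq, payload in connect(cursor):
                cursor = seq          # advance only after receiving the event
                yield seq, payload
            return                    # the stream ended normally
        except ConnectionError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise                 # give up; the caller decides what to do next
```

The caller sees one continuous iterator; the cursor is the only state carried across connections, and no new run is started.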

### 2. Deterministic replay from a client cursor

On reconnect, the client sends a **monotonic cursor** (e.g., last seen sequence number, byte offset, or opaque server-issued token). The server **replays missed events** from that point, or returns a **clear, machine-readable** outcome if:

- the run has already **completed**,
- the cursor is **unknown or invalid**, or
- the cursor is **outside the replay window**.
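The decision described above can be sketched as a pure function. This is an illustration, not the `execd` implementation: the outcome names, the `RunBuffer` shape, and the integer cursor are all assumptions. For completed runs it arbitrarily uses a "tail replay, then completion" policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ResumeOutcome(Enum):  # hypothetical outcome names
    REPLAY = "replay"
    RUN_COMPLETED = "run_completed"
    CURSOR_INVALID = "cursor_invalid"
    CURSOR_EXPIRED = "cursor_expired"


@dataclass
class RunBuffer:
    events: List[Tuple[int, str]]  # retained (seq, payload) pairs, ascending seq
    last_seq: int                  # highest sequence number ever emitted
    completed: bool = False


def resolve_resume(buf: RunBuffer, cursor: int):
    """Return (outcome, events_to_replay) for a client that last saw `cursor`."""
    if cursor < 0 or cursor > buf.last_seq:
        return ResumeOutcome.CURSOR_INVALID, []
    oldest_retained = buf.events[0][0] if buf.events else buf.last_seq + 1
    # The client needs cursor + 1 onward; if that event was evicted, fail loudly.
    if cursor + 1 < oldest_retained:
        return ResumeOutcome.CURSOR_EXPIRED, []
    missed = [event for event in buf.events if event[0] > cursor]
    if buf.completed and not missed:
        return ResumeOutcome.RUN_COMPLETED, []
    return ResumeOutcome.REPLAY, missed
```

Each non-replay outcome would map to one documented status and error payload, so the SDK never has to infer loss from silence.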

### 3. Clean consumer semantics in the SDK

The SDK presents application code with a **single ordered** sequence of execution events. The implementation must **avoid duplicate delivery** at the public API surface (via deduplication, idempotent handling, or strict server replay guarantees).
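One way to enforce this at the SDK boundary is a small guard in front of the public callback. The sketch below assumes consecutive integer sequence numbers, which the requirements do not mandate; `OrderedDelivery` is an illustrative name.

```python
class OrderedDelivery:
    """Drop duplicates and reject gaps, assuming consecutive integer sequence numbers."""

    def __init__(self, last_delivered: int = 0):
        self.last_delivered = last_delivered

    def accept(self, seq: int) -> bool:
        """Return True if the event should be delivered to application code."""
        if seq <= self.last_delivered:
            return False  # already delivered before the reconnect; drop it
        if seq != self.last_delivered + 1:
            # Delivering past a gap would be silent loss, so surface it instead.
            raise ValueError(f"gap: expected {self.last_delivered + 1}, got {seq}")
        self.last_delivered = seq
        return True
```

If the server guarantees that replay starts strictly after the cursor, the duplicate branch never fires, but keeping it makes the SDK tolerant of overlapping replay.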

### 4. Explicit terminal semantics

Terminal conditions—**success**, **failure**, **cancellation**, and **not resumable**—must be distinguishable through **documented SSE event types** and/or **HTTP status and error payloads**, so the SDK can stop retrying and release resources.
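As an illustration of how an SDK might act on such event types, the sketch below maps hypothetical event names to the four terminal reasons. The names in `TERMINAL_EVENTS` are placeholders; the real ones belong in the documented contract.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical event type names, used only for illustration.
TERMINAL_EVENTS = {
    "execution_complete": "success",
    "execution_error": "failure",
    "execution_cancelled": "cancellation",
    "not_resumable": "not_resumable",
}


def terminal_reason(event_type: str) -> Optional[str]:
    """Return the terminal reason for an event type, or None if the stream continues."""
    return TERMINAL_EVENTS.get(event_type)


def consume(event_types: Iterable[str]) -> Tuple[int, Optional[str]]:
    """Read event types until a terminal one appears; return (events_read, reason)."""
    count = 0
    for event_type in event_types:
        count += 1
        reason = terminal_reason(event_type)
        if reason is not None:
            return count, reason  # stop retrying and release resources here
    # The connection ended without a terminal event: a candidate for resumption.
    return count, None
```

A `None` reason is the only case in which the SDK should consider reconnecting; every other result ends the stream.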

### 5. Operability

Reconnect behavior should be **testable** and **observable**: cursor advancement, replay hits/misses, retries, and fallbacks should be visible to operators, and failures should map to **stable error categories** suitable for SDK backoff policies.

---

## Core Requirements

### Cursor and ordering

- Every application-relevant SSE event must be addressable by a **stable cursor** scoped to a **single logical stream** (e.g., one command/code execution).
- Cursors must define a **total order** for that stream (strictly increasing sequence or equivalent).
- Replay must preserve **the same delivery order** as the original stream.
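The SSE wire format already has an `id:` field, and browser `EventSource` clients send the last received value back in a `Last-Event-ID` request header on reconnect. Whether `execd` carries its cursor this way is an assumption; the sketch below only shows how a per-stream, strictly increasing sequence could be attached to each frame. `StreamSequencer` is a hypothetical name.

```python
import itertools


def format_sse_event(seq: int, event_type: str, data: str) -> str:
    """Serialize one event as an SSE frame whose id is the stream cursor."""
    lines = [f"id: {seq}", f"event: {event_type}"]
    # Multi-line payloads become one `data:` line per line of text.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the frame


class StreamSequencer:
    """Hands out strictly increasing sequence numbers for one logical stream."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_frame(self, event_type: str, data: str) -> str:
        return format_sse_event(next(self._counter), event_type, data)
```

Scoping one sequencer to one execution keeps the total order local to that stream, so cursors from different runs are never compared.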

### Wire contract and versioning

- Resumption inputs (cursor, stream/run identifier, optional flags) must be specified in **OpenAPI** (or an adjacent protocol note for non-generated SSE paths) and **versioned** with the API.
- **Field names** should align with existing models and generated clients where possible; handwritten SSE transport must stay **contract-compatible**.

### Replay window and retention

- `execd` must define **bounded** replay: limits by **time**, **event count**, and/or **buffered payload size**.
- If replay is impossible, the response must **not** imply success with silent loss; use **documented HTTP status** and **structured errors** (e.g., expired buffer, unknown run).
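A bounded buffer along these lines might look like the following sketch. It enforces event-count and payload-size limits only; a time limit would need timestamps and is omitted. The class and method names are illustrative, not part of `execd`.

```python
from collections import deque
from typing import List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events within an event-count limit and a byte limit."""

    def __init__(self, max_events: int, max_bytes: int) -> None:
        self.max_events = max_events
        self.max_bytes = max_bytes
        self._events = deque()  # (seq, payload) pairs, oldest first
        self._bytes = 0

    def append(self, seq: int, payload: bytes) -> None:
        self._events.append((seq, payload))
        self._bytes += len(payload)
        # Evict the oldest events until both limits hold again.
        while len(self._events) > self.max_events or self._bytes > self.max_bytes:
            _, dropped = self._events.popleft()
            self._bytes -= len(dropped)

    def oldest_seq(self) -> Optional[int]:
        """Lowest retained sequence number, or None if nothing is retained."""
        return self._events[0][0] if self._events else None

    def since(self, cursor: int) -> List[Tuple[int, bytes]]:
        """Retained events with a sequence number greater than `cursor`."""
        return [event for event in self._events if event[0] > cursor]
```

Comparing the client cursor against `oldest_seq()` is what lets the server return an explicit "expired buffer" error rather than a partial replay.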

### Idempotency and side effects

- Reconnect with the **same cursor** is **read-only** with respect to execution side effects: it must not start a new run or mutate execution state unless that is an explicitly separate API.
- Starting a **new** run remains a distinct operation from **resuming** an existing stream.

### Terminal runs

- For **completed** runs, reconnect behavior must be **deterministic** and documented. Pick one consistent model per endpoint family: **immediate completion metadata**, **tail replay then completion**, or **non-resumable** with a clear reason.

### Concurrency

- Either **only one active consumer** per logical stream is supported, or **multi-consumer** semantics are explicitly defined (e.g., shared cursor, fan-out rules). Concurrent attach with undefined semantics is not allowed.
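For the single-consumer option, a sketch of an attach registry follows. It rejects a second consumer outright; a "newest connection takes over" policy is an equally valid choice and may suit reconnects better when the server has not yet noticed the old connection drop. `ConsumerRegistry` and its methods are hypothetical.

```python
import threading
from typing import Dict


class ConsumerRegistry:
    """Tracks at most one active consumer per logical stream."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, str] = {}  # stream_id -> consumer_id

    def attach(self, stream_id: str, consumer_id: str) -> bool:
        """Return True if this consumer now owns the stream, False if it is taken."""
        with self._lock:
            owner = self._active.get(stream_id)
            if owner is not None and owner != consumer_id:
                return False
            self._active[stream_id] = consumer_id
            return True

    def detach(self, stream_id: str, consumer_id: str) -> None:
        """Release the stream, but only if this consumer is the current owner."""
        with self._lock:
            if self._active.get(stream_id) == consumer_id:
                del self._active[stream_id]
```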

### Security and tenancy

- Stream identifiers and cursors must be **cryptographically or logically bound** to the authenticated principal and sandbox/session context so clients cannot resume another tenant’s stream by identifier guessing.
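One possible form of cryptographic binding is an HMAC over the principal, run identifier, and sequence number. The token layout (`<seq>.<hex mac>`), the function names, and the key handling below are assumptions for illustration only.

```python
import hashlib
import hmac
import json
from typing import Optional


def issue_cursor(secret: bytes, principal: str, run_id: str, seq: int) -> str:
    """Return a cursor token of the form '<seq>.<hex mac>'."""
    # JSON encoding keeps the field boundaries unambiguous.
    message = json.dumps([principal, run_id, seq]).encode()
    mac = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{seq}.{mac}"


def verify_cursor(secret: bytes, principal: str, run_id: str, token: str) -> Optional[int]:
    """Return the sequence number if the token is valid for this principal and run."""
    seq_text, _, _ = token.partition(".")
    if not (seq_text.isascii() and seq_text.isdigit()):
        return None
    expected = issue_cursor(secret, principal, run_id, int(seq_text))
    # Constant-time comparison avoids leaking how much of the MAC matched.
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return None
    return int(seq_text)
```

A token issued to one principal fails verification for any other principal or run, so guessing a stream identifier is not enough to resume it.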

### Performance and safety

- Replay must be **chunked or limited** per response to protect memory and CPU on `execd` and on clients.
- Heartbeats, comments, and keep-alive frames must **not** break cursor or ordering semantics.
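To illustrate the heartbeat rule, here is a minimal SSE line parser in which comment lines (those starting with `:`) never produce an event and therefore never move the cursor. It handles only the `id`, `event`, and `data` fields and assumes the input has already been split into lines.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str, str]]:
    """Yield (last_event_id, event_type, data) for each dispatched event."""
    last_id: Optional[str] = None
    event_type: str = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends a frame; frames without data are not dispatched.
            if data:
                yield last_id, event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive: ignored, cursor untouched
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "id":
            last_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
```

The SDK would persist the `last_event_id` of each yielded event as its cursor; a keep-alive between two events leaves that value unchanged.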

### SDK responsibilities

- Implement **automatic reconnect with exponential backoff** and **jitter** where appropriate.
- Persist and advance the **cursor** for the lifetime of the run.
- Map server errors to **retryable vs terminal** outcomes.
- Preserve **backwards compatibility**: clients that do not send resumption parameters continue to work unchanged.
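The backoff and error-mapping responsibilities above can be sketched as follows. The "full jitter" schedule is one common choice rather than a requirement of this document, and the retryable status set is a hypothetical default that the documented error categories would replace.

```python
import random
from typing import Callable, Iterator

# Hypothetical default; the real mapping comes from the documented error categories.
RETRYABLE_STATUSES = frozenset({408, 429, 502, 503, 504})


def is_retryable(status: int) -> bool:
    """Classify an HTTP status as retryable (True) or terminal (False)."""
    return status in RETRYABLE_STATUSES


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 6,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay per attempt: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

Passing `rng` explicitly keeps the schedule deterministic in tests, which supports the requirement that reconnect behavior be testable.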

---

## Non-Goals

- **Unbounded history**: resumption is not a full audit log; retention is finite by design.
- **General stream editing**: resumption is for **replay of server-emitted events**, not arbitrary mutation of in-flight execution unless specified elsewhere.

---

## Success Criteria (summary)

- Reconnect after a drop yields **no silent loss** within the documented replay window.  
- Event order at the SDK boundary is **stable** and **duplicate-free** from the app’s perspective.  
- Terminal and non-resumable cases are **explicit** and **test-covered**.  
- Legacy clients remain **compatible** without resumption parameters.
