@ditadi ditadi commented Jan 27, 2026

PR Description

Summary

Introduces @databricks/taskflow, a production-grade durable task execution system for Node.js applications. This package provides reliable task processing with write-ahead logging, real-time event streaming, rate limiting, and automatic recovery.

Key Features

  • Durable Execution - Write-ahead log (WAL) ensures tasks survive process crashes
  • Event Streaming - Real-time SSE with automatic reconnection via sequence numbers
  • Rate Limiting - Sliding window backpressure with global and per-user quotas
  • Retry & Recovery - Exponential backoff, dead letter queue, stale task detection
  • Type Safety - Branded types (TaskId, UserId, etc.) prevent ID mix-ups at compile time
  • Observability - OpenTelemetry-compatible hooks for traces, metrics, and logs
  • Pluggable Storage - SQLite (default) and Lakebase backends, or bring your own
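
As a rough illustration of the Retry & Recovery item above, the sketch below computes capped exponential backoff delays and hands a task to the dead letter queue once its attempts are exhausted. The names `BackoffOptions`, `backoffDelayMs`, and `decideRetry`, and all numeric values, are assumptions for illustration, not the package's actual API or defaults. Jitter is omitted for brevity.

```typescript
// Hypothetical sketch: names and defaults are illustrative, not taken from @databricks/taskflow.
interface BackoffOptions {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
  maxAttempts: number; // total attempts allowed before dead-lettering
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// failedAttempts is 1-based: the number of attempts that have failed so far.
function backoffDelayMs(failedAttempts: number, options: BackoffOptions): number {
  const exponential = options.baseDelayMs * 2 ** (failedAttempts - 1);
  return Math.min(exponential, options.maxDelayMs);
}

// Decide whether to schedule another attempt or give up and dead-letter the task.
function decideRetry(failedAttempts: number, options: BackoffOptions): RetryDecision {
  if (failedAttempts >= options.maxAttempts) {
    return { action: "dead-letter" };
  }
  return { action: "retry", delayMs: backoffDelayMs(failedAttempts, options) };
}

const retryOptions: BackoffOptions = { baseDelayMs: 100, maxDelayMs: 1000, maxAttempts: 5 };
const retryDelays = [1, 2, 3, 4].map((n) => backoffDelayMs(n, retryOptions));
// retryDelays is [100, 200, 400, 800]; the fifth failure is dead-lettered.
const finalDecision = decideRetry(5, retryOptions);
```

A production version would usually add random jitter to each delay so that many failing tasks do not retry in lockstep.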

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      TaskSystem                             │
│            submit() · subscribe() · getStatus()             │
└─────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│     Guard      │  │    Delivery    │  │  Persistence   │
│ Backpressure   │  │ StreamManager  │  │   EventLog     │
│ SlotManager    │  │ RingBuffer     │  │   Repository   │
│ DLQ            │  │                │  │                │
└────────────────┘  └────────────────┘  └────────────────┘
                              │
                    ┌─────────┴─────────┐
                    ▼                   ▼
          ┌────────────────┐  ┌────────────────┐
          │     Flush      │  │   Execution    │
          │ FlushWorker    │  │ Executor       │
          │ CircuitBreaker │  │ Recovery       │
          └────────────────┘  └────────────────┘
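
The Delivery box in the diagram pairs a RingBuffer with the StreamManager, and the feature list ties reconnection to sequence numbers. A minimal sketch of how those two ideas can fit together follows. The class name `EventRingBuffer`, the `replayAfter` method, and the `gap` flag are assumptions for illustration and may not match the package's internals.

```typescript
// Hypothetical sketch of a fixed-capacity buffer that replays events by sequence number.
interface SequencedEvent<T> {
  seq: number;
  payload: T;
}

class EventRingBuffer<T> {
  private readonly slots: Array<SequencedEvent<T> | undefined>;
  private readonly capacity: number;
  private nextSeq = 1;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<SequencedEvent<T> | undefined>(capacity).fill(undefined);
  }

  // Store an event, overwriting the oldest slot once the buffer is full.
  push(payload: T): number {
    const seq = this.nextSeq++;
    this.slots[seq % this.capacity] = { seq, payload };
    return seq;
  }

  // Return buffered events newer than lastSeenSeq, oldest first.
  // gap is true when some missed events were already overwritten.
  replayAfter(lastSeenSeq: number): { events: SequencedEvent<T>[]; gap: boolean } {
    const newest = this.nextSeq - 1;
    const oldest = Math.max(1, this.nextSeq - this.capacity);
    const start = Math.max(lastSeenSeq + 1, oldest);
    const events: SequencedEvent<T>[] = [];
    for (let seq = start; seq <= newest; seq++) {
      const slot = this.slots[seq % this.capacity];
      if (slot !== undefined && slot.seq === seq) events.push(slot);
    }
    return { events, gap: lastSeenSeq + 1 < oldest };
  }
}

const buffer = new EventRingBuffer<string>(3);
for (const name of ["a", "b", "c", "d", "e"]) buffer.push(name);
const caughtUp = buffer.replayAfter(3);  // events d and e, no gap
const tooFarBack = buffer.replayAfter(1); // events c, d, e, with a gap (b was overwritten)
```

When `gap` is true, a reconnecting client cannot be served from memory alone, which is presumably where a persisted event log would take over.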

Implementation Phases

Phase  Layer              Components
1      Core Foundation    types, errors, branded types
2      Domain Model       events, task, handler
3      Guard Layer        backpressure, slot-manager, dlq
4      Delivery Layer     ring-buffer, stream-manager
5      Persistence Layer  event-log, sqlite, lakebase
6      Flush Layer        flush-worker, flush-manager
7      Execution Layer    executor, recovery, system
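
To make the Guard Layer's backpressure component more concrete, here is a minimal sliding-window limiter that enforces a global quota and a per-user quota over the same window. The class name `SlidingWindowLimiter`, its constructor parameters, and the limits used are assumptions for illustration, not the package's actual interface.

```typescript
// Hypothetical sketch: a timestamp-log sliding window with global and per-user quotas.
class SlidingWindowLimiter {
  private readonly windowMs: number;
  private readonly globalLimit: number;
  private readonly perUserLimit: number;
  private globalHits: number[] = [];
  private readonly userHits = new Map<string, number[]>();

  constructor(windowMs: number, globalLimit: number, perUserLimit: number) {
    this.windowMs = windowMs;
    this.globalLimit = globalLimit;
    this.perUserLimit = perUserLimit;
  }

  // Returns true and records the request if both quotas allow it, false otherwise.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    this.globalHits = this.globalHits.filter((t) => t > cutoff);
    const mine = (this.userHits.get(userId) ?? []).filter((t) => t > cutoff);

    const allowed =
      this.globalHits.length < this.globalLimit && mine.length < this.perUserLimit;
    if (allowed) {
      this.globalHits.push(now);
      mine.push(now);
    }
    this.userHits.set(userId, mine);
    return allowed;
  }
}

// Window of 1000 ms, at most 3 requests overall and 2 per user.
const limiter = new SlidingWindowLimiter(1000, 3, 2);
const results = [
  limiter.tryAcquire("a", 0),    // true
  limiter.tryAcquire("a", 100),  // true
  limiter.tryAcquire("a", 200),  // false: per-user quota reached
  limiter.tryAcquire("b", 300),  // true
  limiter.tryAcquire("c", 400),  // false: global quota reached
  limiter.tryAcquire("a", 1001), // true: the request at t=0 has left the window
];
```

Keeping a timestamp per request is simple but costs memory proportional to the quota; a counter-based approximation is a common alternative when limits are large.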

Usage Example

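import { TaskSystem } from '@databricks/taskflow';

// Persist task state in a local SQLite database (the default backend).
const taskSystem = new TaskSystem({
  repository: { type: 'sqlite', database: './.taskflow/tasks.db' }
});

// Register a handler for the 'send-email' task.
// sendEmail is an application-defined function, not part of taskflow.
taskSystem.defineTask('send-email', {
  handler: async (input, ctx) => {
    ctx.progress({ status: 'sending' }); // report progress while the task runs
    await sendEmail(input.to, input.subject, input.body);
    return { sent: true };
  }
});

// Initialize the system before submitting any work.
await taskSystem.initialize();

// Submit a task on behalf of a user.
const task = await taskSystem.submit('send-email', {
  input: { to: '[email protected]', subject: 'Hello', body: 'World' },
  userId: 'user-123'
});

// Stream the task's events as they arrive.
for await (const event of taskSystem.subscribe(task.idempotencyKey)) {
  console.log(event.type, event.payload);
}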

Test Coverage

  • Core: Error hierarchy, type guards, branded types
  • Guard: Rate limiting, slot management, DLQ
  • Delivery: Ring buffer, stream reconnection
  • Persistence: Event log rotation, SQLite CRUD
  • Flush: Circuit breaker, checkpoint management
  • Execution: Task lifecycle, recovery, system orchestration
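
For readers unfamiliar with the circuit breaker covered by the Flush tests, the sketch below shows the usual closed / open / half-open state machine. The class shape, method names, and thresholds are assumptions for illustration; the package's CircuitBreaker may be structured differently.

```typescript
// Hypothetical sketch of a circuit breaker state machine with an injectable clock value.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // True if a call may proceed. An open breaker becomes half-open after the timeout.
  canAttempt(now: number = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial call, or too many consecutive failures, opens the breaker.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}

// Open after 2 consecutive failures, allow a trial call after 1000 ms.
const breaker = new CircuitBreaker(2, 1000);
breaker.recordFailure(0);
breaker.recordFailure(10);                      // threshold reached, breaker opens
const blocked = breaker.canAttempt(500);        // false: still open
const trialAllowed = breaker.canAttempt(1010);  // true: half-open trial
breaker.recordSuccess();                        // trial succeeded, breaker closes
```

A caller would check `canAttempt()` before each flush and report the outcome with `recordSuccess()` or `recordFailure()`.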

Files Changed

packages/taskflow/
├── src/
│   ├── core/           # types, errors, branded
│   ├── observability/  # hooks, metrics, spans
│   ├── domain/         # task, events, handler
│   ├── guard/          # backpressure, slots, dlq
│   ├── delivery/       # ring-buffer, stream
│   ├── persistence/    # event-log, repository
│   ├── flush/          # flush-worker, manager
│   ├── execution/      # executor, recovery, system
│   └── tests/          # comprehensive test suites
├── README.md
└── package.json

Checklist

  • Core types and error hierarchy
  • Branded types for type safety
  • Observability hooks interface
  • Task domain model with state machine
  • Guard layer (backpressure, slots, DLQ)
  • Delivery layer (ring buffer, SSE streams)
  • Persistence layer (WAL, SQLite, Lakebase)
  • Flush layer (worker process, circuit breaker)
  • Execution layer (executor, recovery, orchestrator)
  • Comprehensive test coverage
  • README documentation
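
As a reference for the branded-types item in the checklist, this is one common way to brand string IDs in TypeScript so that a TaskId and a UserId cannot be swapped by accident. The `Brand` helper and the `toTaskId` / `toUserId` constructors are assumptions for illustration; only the type names TaskId and UserId come from this PR.

```typescript
// Hypothetical sketch: the package's real branded-type definitions may differ.
declare const brand: unique symbol;
type Brand<T, B extends string> = T & { readonly [brand]: B };

type TaskId = Brand<string, "TaskId">;
type UserId = Brand<string, "UserId">;

// Constructors are the only place a raw string is cast into a branded type.
function toTaskId(raw: string): TaskId {
  if (raw.length === 0) throw new Error("TaskId must be non-empty");
  return raw as TaskId;
}

function toUserId(raw: string): UserId {
  if (raw.length === 0) throw new Error("UserId must be non-empty");
  return raw as UserId;
}

// Parameter order is enforced by the type checker, not by convention.
function describeTask(taskId: TaskId, userId: UserId): string {
  return `task ${taskId} owned by ${userId}`;
}

const taskId = toTaskId("task-1");
const userId = toUserId("user-123");
const description = describeTask(taskId, userId);
// describeTask(userId, taskId) would be rejected at compile time.
```

The brand exists only in the type system, so the values remain plain strings at runtime and add no overhead.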

Breaking Changes

None - this is a new package.

Related Issues

N/A - Initial implementation
