Commit c6a599f
feat: add map step type in sql (#208)
# Map Step Type Implementation (SQL Core)
## Overview
This PR implements **map step type** functionality in pgflow's SQL Core, enabling parallel processing of array data through fanout/map operations. This foundational feature allows workflows to spawn multiple tasks from a single array input, process them in parallel, and aggregate results back into ordered arrays.
## What We're Building
### Core Concept: Map Steps
Map steps transform single array inputs into parallel task execution:
```typescript
// Input array
const items = ['item1', 'item2', 'item3'];
// Map step spawns 3 parallel tasks
// Each task processes one array element
// Results aggregated back: ["result1", "result2", "result3"]
```
### Two Types of Map Steps
**1. Root Maps (0 dependencies)**
- Input: `runs.input` must be a JSONB array
- Behavior: Spawns N tasks where N = array length
- Example: `start_flow('my_flow', ["a", "b", "c"])` → 3 parallel tasks
**2. Dependent Maps (1 dependency)**
- Input: Output from a completed dependency step (must be array)
- Behavior: Spawns N tasks based on dependency's output array length
- Example: `single_step → ["x", "y"]` → `map_step` spawns 2 tasks
### Map→Map Chaining
Maps can depend on other maps:
- Parent map completes all tasks first
- Child map receives parent's task count as its planned count
- Child tasks read from aggregated parent array using simple approach
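The chaining described above can be sketched in memory (hypothetical helper names, not pgflow's actual API): the parent map's aggregated output becomes the child map's input array, so the child's planned task count equals the parent's task count.

```typescript
// Hypothetical sketch of map→map chaining.
type Task<T> = { taskIndex: number; input: T };

function planMapTasks<T>(inputArray: T[]): Task<T>[] {
  // One task per array element, indexed 0..N-1 like JSONB array indices
  return inputArray.map((input, taskIndex) => ({ taskIndex, input }));
}

const parentInput = ["a", "b", "c"];
const parentTasks = planMapTasks(parentInput);                 // 3 parallel tasks
const parentOutput = parentTasks.map((t) => t.input.toUpperCase());

// Child map receives the parent's aggregated array; its planned
// count equals the parent's task count.
const childTasks = planMapTasks(parentOutput);
console.log(childTasks.length);  // 3
console.log(parentOutput);       // ["A", "B", "C"]
```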
## Technical Implementation
### Hybrid Architecture: "Fresh Data" vs "Dumb Spawning"
**Key Innovation**: Separate array parsing from task spawning for optimal performance.
**Fresh Data Functions** (parse arrays, set counts):
- `start_flow()`: Validates `runs.input` arrays for root maps, sets `initial_tasks`
- `complete_task()`: Validates dependency outputs for dependent maps, sets `initial_tasks`
**Dumb Functions** (use pre-computed counts):
- `start_ready_steps()`: Copies `initial_tasks → remaining_tasks` and spawns N tasks (no JSON parsing)
- `start_tasks()`: Extracts array elements using `task_index`
- `maybe_complete_run()`: Aggregates results into ordered arrays
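A minimal sketch of the assumed countdown semantics behind these "dumb" functions: the start copies the pre-computed count without parsing any JSON, each completion decrements it, and when it hits zero the results are aggregated in `task_index` order regardless of completion order. Names here are illustrative, not pgflow's SQL.

```typescript
// Illustrative model of initial_tasks → remaining_tasks countdown.
type StepState = {
  initialTasks: number;
  remainingTasks: number | null; // NULL until the step is started
  results: { taskIndex: number; output: string }[];
};

function startStep(state: StepState): void {
  // "Dumb" start: copy the pre-computed count, no JSON parsing
  state.remainingTasks = state.initialTasks;
}

function completeTask(state: StepState, taskIndex: number, output: string): string[] | null {
  state.results.push({ taskIndex, output });
  state.remainingTasks = (state.remainingTasks ?? 0) - 1;
  if (state.remainingTasks === 0) {
    // Aggregate in task_index order, not completion order
    return state.results
      .sort((a, b) => a.taskIndex - b.taskIndex)
      .map((r) => r.output);
  }
  return null; // step not finished yet
}

const state: StepState = { initialTasks: 3, remainingTasks: null, results: [] };
startStep(state);
completeTask(state, 2, "result3");   // tasks may finish out of order
completeTask(state, 0, "result1");
const final = completeTask(state, 1, "result2");
console.log(final);                  // ["result1", "result2", "result3"]
```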
### Schema Changes
1. **Enable Map Step Type**
```sql
-- steps table constraint updated
check (step_type in ('single', 'map'))
```
2. **Remove Single Task Limit**
```sql
-- step_tasks table constraint removed
-- constraint only_single_task_per_step check (task_index = 0)
```
3. **Add Task Planning Columns**
```sql
-- step_states table updates
remaining_tasks int NULL, -- NULL = not started, >0 = active countdown
initial_tasks int DEFAULT 1 CHECK (initial_tasks >= 0), -- Planned count
CONSTRAINT remaining_tasks_state_consistency CHECK (
remaining_tasks IS NULL OR status != 'created'
)
```
4. **Enhanced Functions**
- `add_step()`: Accepts `step_type` parameter, validates map constraints
- `start_flow()`: Root map validation and count setting
- `complete_task()`: Dependent map count setting
- `start_ready_steps()`: Efficient bulk task spawning
- `start_tasks()`: Array element extraction
- `maybe_complete_run()`: Result aggregation
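The `remaining_tasks_state_consistency` constraint above can be read as: a step still in `created` status must not yet have a countdown value. A small sketch of the same predicate:

```typescript
// Mirrors: CHECK (remaining_tasks IS NULL OR status != 'created')
function checkConsistency(remainingTasks: number | null, status: string): boolean {
  return remainingTasks === null || status !== "created";
}

console.log(checkConsistency(null, "created")); // true  — not started, no count yet
console.log(checkConsistency(3, "started"));    // true  — active countdown
console.log(checkConsistency(3, "created"));    // false — a created step can't have a count
```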
### Performance Optimizations
- **Minimal JSON Parsing**: Arrays are parsed only in the "fresh data" functions (`start_flow()` for root maps, `complete_task()` for dependent maps); the spawning functions never re-parse them
- **Batch Operations**: Efficient task creation using `generate_series(0, N-1)`
- **Atomic Updates**: Count setting under existing locks
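For illustration, SQL's `generate_series(0, N-1)` yields one row per task index; the TypeScript equivalent of that bulk index generation is:

```typescript
// One task index per array element, 0-based like generate_series(0, N-1)
const n = 4;
const taskIndexes = Array.from({ length: n }, (_, i) => i);
console.log(taskIndexes); // [0, 1, 2, 3]
```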
### Edge Cases Handled
- **Empty Arrays**: Auto-complete with `[]` output (not failures)
- **Array Validation**: Fail fast with clear errors for non-array inputs
- **Mixed Workflows**: Single + map steps in same flow
- **Task Indexing**: 0-based indexing matching JSONB array indices
- **Ordered Results**: Results aggregated in `task_index` order
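The empty-array and validation cases above can be sketched as follows (assumed behavior, hypothetical function name): an empty array spawns zero tasks and auto-completes with `[]`, while a non-array input fails fast.

```typescript
// Sketch of map-step input planning for the edge cases listed above.
function planMap(input: unknown): { taskCount: number; output?: unknown[] } {
  if (!Array.isArray(input)) {
    // Fail fast with a clear error for non-array inputs
    throw new Error("map step input must be a JSONB array");
  }
  if (input.length === 0) {
    // Auto-complete with [] output — not a failure
    return { taskCount: 0, output: [] };
  }
  return { taskCount: input.length };
}

console.log(planMap([]));         // { taskCount: 0, output: [] }
console.log(planMap(["x", "y"])); // { taskCount: 2 }

let failed = false;
try { planMap({ not: "an array" }); } catch { failed = true; }
console.log(failed);              // true
```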
## Database Flow Examples
### Root Map Flow
```sql
-- 1. Create flow with root map step
SELECT pgflow.create_flow('example_flow');
SELECT pgflow.add_step('example_flow', 'process_items', 'map');
-- 2. Start with array input
SELECT pgflow.start_flow('example_flow', '["item1", "item2", "item3"]');
-- → Validates input is array
-- → Sets initial_tasks = 3 for 'process_items' step
-- 3. start_ready_steps() spawns 3 tasks
-- → Creates step_tasks with task_index 0, 1, 2
-- → Sends 3 messages to pgmq queue
-- 4. Workers process tasks in parallel
-- → Each gets individual array element: "item1", "item2", "item3"
-- → Returns processed results
-- 5. Results aggregated in run output
-- → Final output: {"process_items": ["result1", "result2", "result3"]}
```
### Dependent Map Flow
```sql
-- 1. Single step produces array output
SELECT pgflow.complete_task(run_id, 'prep_step', 0, '["a", "b"]');
-- → complete_task() detects map dependent
-- → Sets initial_tasks = 2 for dependent map step
-- 2. Map step becomes ready and spawns 2 tasks
-- → Each task gets individual element: "a", "b"
-- → Processes in parallel
-- 3. Results aggregated
-- → Map step output: ["processed_a", "processed_b"]
```
## Handler Interface
Map task handlers receive individual array elements (TypeScript `Array.map()` style):
```typescript
// Single step handler
const singleHandler = (input: {run: Json, dep1?: Json}) => { ... }
// Map step handler
const mapHandler = (item: Json) => {
// Receives just the array element, not wrapped object
return processItem(item);
}
```
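Conceptually, applying a map handler across N parallel tasks and aggregating in `task_index` order is equivalent to `Array.prototype.map` over the input array (illustrative sketch, handler name is hypothetical):

```typescript
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Map handler receives just the array element, not a wrapped object
const mapHandler = (item: Json): Json => `processed_${item}`;

// What the SQL core does conceptually across N parallel tasks:
const input: Json[] = ["a", "b"];
const aggregated = input.map(mapHandler);
console.log(aggregated); // ["processed_a", "processed_b"]
```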
## Comprehensive Testing
**33 test files** covering:
- Schema validation and constraints
- Function-by-function unit tests
- Integration workflows (root maps, dependent maps, mixed)
- Edge cases (empty arrays, validation failures)
- Error handling and failure propagation
- Idempotency and double-execution safety
## Benefits
- **Parallel Processing**: Automatic fanout for array data
- **Type Safety**: Strong validation ensures arrays where expected
- **Performance**: Minimal JSON parsing, efficient batch operations
- **Simplicity**: Familiar map semantics, clear error messages
- **Scalability**: Handles large arrays efficiently
- **Composability**: Map steps integrate seamlessly with existing single steps
## Breaking Changes
None. This is a pure addition:
- Existing single steps unchanged
- New `step_type` parameter defaults to `'single'`
- Backward compatibility maintained
- Database migrations are additive only
## Implementation Status
- ✅ **Architecture Designed**: Hybrid approach validated
- ✅ **Plan Documented**: Complete implementation roadmap
- 🔄 **In Progress**: Schema changes and function updates
- ⏳ **Next**: TDD implementation starting with `add_step()` function

1 parent 85676cf · commit c6a599f
## File tree

17 files changed: +803 −435 lines

- .changeset
- pkgs/core
  - schemas
  - src
  - supabase
    - migrations
  - tests
    - add_step
    - fail_task
    - timestamps