chore: update PLAN.md with current feature and task statuses

jumski · jumski · commit 1cc39efff55c · 2025-09-17T09:08:12.000+02:00
Reflects ongoing work on map step output aggregation, related migration, testing,
and documentation efforts, along with current implementation status and pending
tasks for map step support in the project.
diff --git a/PERFORMANCE.md b/PERFORMANCE.md
@@ -0,0 +1,139 @@
+# Performance Measurements - Output Aggregation
+
+## Stage 1: Baseline (Before Output Aggregation)
+Date: 2025-09-17
+Branch: 09-17-add-map-step-output-aggregation (before changes)
+
+### Test Setup
+- Database: PostgreSQL (local Supabase)
+- Test data: TBD
+- Hardware: Local development machine
+
+### Core Function Performance
+
+#### `start_tasks`
+- **Description**: Starts tasks and constructs inputs from dependencies
+- **Test scenario**: TBD
+- **Average execution time**: TBD ms
+- **Min/Max**: TBD ms / TBD ms
+- **Notes**: Hot path - called for every ready step
+
+#### `complete_task`
+- **Description**: Completes task and updates dependencies
+- **Test scenario**: TBD
+- **Average execution time**: TBD ms
+- **Min/Max**: TBD ms / TBD ms
+- **Notes**: Called for every task completion
+
+#### `maybe_complete_run`
+- **Description**: Completes run and aggregates leaf outputs
+- **Test scenario**: TBD
+- **Average execution time**: TBD ms
+- **Min/Max**: TBD ms / TBD ms
+- **Notes**: Called once per run completion
+
+### Map-Specific Performance
+
+#### Map Task Spawning
+- **Test**: Creating N tasks for map step
+- **Array sizes tested**: [10, 100, 1000]
+- **Results**:
+  - 10 items: TBD ms
+  - 100 items: TBD ms
+  - 1000 items: TBD ms
+
+#### Map Task Completion
+- **Test**: Completing all tasks in a map step
+- **Array sizes tested**: [10, 100, 1000]
+- **Results**:
+  - 10 items: TBD ms
+  - 100 items: TBD ms
+  - 1000 items: TBD ms
+
+---
+
+## Stage 2: Naive Implementation (Inline Aggregation)
+Date: TBD
+Branch: 09-17-add-map-step-output-aggregation (with inline aggregation)
+
+### Changes Made
+- Inline aggregation in `start_tasks` deps CTE
+- Inline aggregation in `maybe_complete_run`
+- Inline aggregation in `complete_task` for broadcasts
+
+### Core Function Performance
+
+#### `start_tasks`
+- **Average execution time**: TBD ms
+- **Performance impact**: TBD% (compared to baseline)
+- **Notes**: TBD
+
+#### `complete_task`
+- **Average execution time**: TBD ms
+- **Performance impact**: TBD% (compared to baseline)
+- **Notes**: TBD
+
+#### `maybe_complete_run`
+- **Average execution time**: TBD ms
+- **Performance impact**: TBD% (compared to baseline)
+- **Notes**: TBD
+
+---
+
+## Stage 3: Optimized Map-to-Map Transfer
+Date: TBD
+Branch: 09-17-add-map-step-output-aggregation (with optimization)
+
+### Changes Made
+- Direct task-to-task output transfer for map->map dependencies
+- Avoids aggregation and decomposition overhead
+
+### Map-to-Map Performance
+
+#### Direct Transfer vs Aggregation
+- **Test**: Map(10) -> Map(10) dependency chain
+- **Naive approach**: TBD ms
+- **Optimized approach**: TBD ms
+- **Improvement**: TBD%
+
+#### Scaling Test
+- **Test**: Map(N) -> Map(N) with varying sizes
+- **Results**:
+  - 10 items: TBD ms (improvement: TBD%)
+  - 100 items: TBD ms (improvement: TBD%)
+  - 1000 items: TBD ms (improvement: TBD%)
+
+---
+
+## Stage 4: Function Extraction (Optional)
+Date: TBD
+Branch: TBD
+
+### Changes Made
+- Extracted aggregation to `pgflow.get_step_output()` helper function
+
+### Function Call Overhead
+
+#### `get_step_output` Performance
+- **Average execution time**: TBD ms
+- **Overhead compared to inline**: TBD ms (TBD%)
+
+### Overall Impact
+- **Total performance impact**: TBD%
+- **Recommendation**: TBD (extract/keep inline)
+
+---
+
+## Summary and Recommendations
+
+### Key Findings
+- TBD
+
+### Performance Bottlenecks
+- TBD
+
+### Recommended Approach
+- TBD
+
+### Future Optimizations
+- TBD
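
The template in `PERFORMANCE.md` asks for an average and min/max per function, taken over several runs. A minimal sketch of how those figures could be derived from raw timings is shown below. `summarize_timings` is a hypothetical helper, not part of pgflow or its scripts, and the sample values are made up for illustration rather than measured. It reports the population standard deviation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pgflow): summarize timing samples given in ms.
# Prints average, min, max and population standard deviation.
summarize_timings() {
  # Refuse to summarize an empty sample set
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | LC_ALL=C awk '
    NR == 1 { min = $1; max = $1 }
    {
      sum += $1
      sumsq += $1 * $1
      if ($1 < min) min = $1
      if ($1 > max) max = $1
    }
    END {
      avg = sum / NR
      var = sumsq / NR - avg * avg
      if (var < 0) var = 0   # guard against floating-point rounding
      printf "avg=%.2f min=%.2f max=%.2f stddev=%.2f\n", avg, min, max, sqrt(var)
    }'
}

# Example with made-up samples (not real measurements)
summarize_timings 12.0 14.0 16.0
# prints: avg=14.00 min=12.00 max=16.00 stddev=1.63
```

Each timing from a repeated test run would be passed as one argument, and the printed line can be copied into the matching "Average execution time" and "Min/Max" entries.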
diff --git a/PLAN.md b/PLAN.md
@@ -2,13 +2,21 @@
 
 **NOTE: This PLAN.md file should be removed in the final PR once all map infrastructure is complete.**
 
-### Current State
+### Features
 
 - ✅ **WORKING**: Empty array maps (taskless) cascade and complete correctly
 - ✅ **WORKING**: Task spawning creates N tasks with correct indices
 - ✅ **WORKING**: Dependency count propagation for map steps
 - ✅ **WORKING**: Array element extraction - tasks get full array instead of individual items
-- ❌ **MISSING**: Output aggregation - no way to combine map task outputs for dependents
+- 🛠️ **CURRENT**: Output aggregation - no way to combine map task outputs for dependents
+- ⏳ **WAITING**: DSL support for `.map()` for defining map steps
+
+### Chores
+
+- ⏳ **WAITING**: Integration tests for map steps
+- ⏳ **WAITING**: Consolidated migration for map steps
+- ⏳ **WAITING**: Documentation for map steps
+- ⏳ **WAITING**: Graphite stack merge for map steps
 
 ## Implementation Status
 
diff --git a/PLAN_output_aggregation.md b/PLAN_output_aggregation.md
@@ -0,0 +1,183 @@
+# Output Aggregation Implementation Plan
+
+## Overview
+Implement output aggregation for map steps with a performance-focused, test-first approach.
+
+## Stage 1: Baseline Performance Measurement
+
+### Tasks
+- Run existing performance tests multiple times (3-5 runs)
+- Calculate average values for each metric
+- Document results in `PERFORMANCE.md`
+
+### Commands
+```bash
+# Run performance tests (repeat 3-5 times)
+pnpm nx test:pgtap core -- pkgs/core/tests/performance/*.sql
+
+# Document results in PERFORMANCE.md with format:
+# - Test name
+# - Average execution time
+# - Min/Max values
+# - Standard deviation if significant
+```
+
+## Stage 2: Test-First Development (Naive Implementation)
+
+### Approach
+Write failing tests one at a time, then implement an inline solution to make each one pass.
+
+### Test Scenarios (in order of complexity)
+1. **Basic map output aggregation**
+   - Single map step with 3 tasks
+   - Verify outputs aggregated in task_index order
+
+2. **Empty map output**
+   - Map step with 0 tasks
+   - Should return `[]` as output
+
+3. **Map feeding into single step**
+   - Map step output aggregated as array
+   - Single step receives full array as dependency input
+
+4. **Map feeding into another map**
+   - First map outputs array
+   - Second map processes each element
+
+5. **Edge case: NULL outputs**
+   - Some tasks return NULL
+   - Aggregation should include NULLs in array
+
+6. **Run completion with map leaf step**
+   - Map step as leaf (no dependents)
+   - Run output should contain aggregated array
+
+### Development Workflow
+```bash
+# 1. Write test
+vim pkgs/core/tests/map_output_aggregation_test.sql
+
+# 2. Run test (should fail)
+pkgs/core/scripts/run-test-with-colors pkgs/core/tests/map_output_aggregation_test.sql
+
+# 3. Update functions in database
+psql "$DATABASE_URL" -f updated_function.sql
+
+# 4. Re-run test (iterate until passing)
+pkgs/core/scripts/run-test-with-colors pkgs/core/tests/map_output_aggregation_test.sql
+
+# 5. Repeat for next test scenario
+```
+
+### Implementation Notes
+**Naive approach**: Inline aggregation directly in the affected functions
+- **`start_tasks`**: Aggregate map outputs inline in deps CTE
+- **`maybe_complete_run`**: Aggregate map outputs for leaf steps
+- **`complete_task`**: Aggregate for broadcast events
+
+## Stage 3: Performance Measurement (Naive)
+
+### Tasks
+- Run performance tests with naive implementation
+- Compare with baseline
+- Document in `PERFORMANCE.md`
+
+### Expected Impact
+- `start_tasks`: Moderate overhead (aggregation per dependency)
+- `maybe_complete_run`: Minimal (only at run completion)
+- `complete_task`: Minimal (only for broadcasts)
+
+## Stage 4: Map-to-Map Optimization
+
+### Concept
+Optimize the map->map case where we aggregate outputs only to immediately decompose them:
+- Map A task[i] → output[i]
+- Currently: Aggregate to array → decompose in Map B
+- Optimized: Map A task[i] → Map B task[i] directly
+
+### Implementation Strategy
+```sql
+-- In start_tasks deps CTE, add special case:
+CASE
+  WHEN step.step_type = 'map' AND dep_step.step_type = 'map' THEN
+    -- Direct task-to-task transfer
+    (SELECT output FROM pgflow.step_tasks
+     WHERE run_id = st.run_id
+       AND step_slug = dep.dep_slug
+       AND task_index = st.task_index
+       AND status = 'completed')
+  ELSE
+    -- Standard aggregation for non-map dependents
+    ...
+END
+```
+
+### Tests
+1. **Map-to-map direct transfer**
+   - Verify task[i] gets output[i] without aggregation
+
+2. **Map-to-map with matching and mismatched sizes**
+   - Source map: 5 tasks
+   - Target map: 5 tasks (should work)
+   - Error handling if sizes mismatch
+
+## Stage 5: Final Performance Measurement
+
+### Tasks
+- Run all performance tests
+- Compare baseline vs naive vs optimized
+- Document final results and recommendations
+
+### Metrics to Track
+- Execution time per function
+- Memory usage (if measurable)
+- Query complexity (EXPLAIN ANALYZE)
+
+## Stage 6: Function Extraction Decision
+
+### Evaluation Criteria
+After measuring the performance of the inline implementation:
+1. **Performance overhead**: Is the function call cost acceptable?
+2. **Code duplication**: How much repetition exists?
+3. **Maintainability**: Would a helper function improve code clarity?
+
+### If extracting to function:
+```sql
+-- Create pgflow.get_step_output() helper
+-- Update all three locations to use helper
+-- Re-run performance tests
+-- Document final decision and rationale
+```
+
+## Notes for Implementation
+
+### Key Files to Modify
+1. `pkgs/core/schemas/0120_function_start_tasks.sql` (lines 46-53)
+2. `pkgs/core/schemas/0100_function_maybe_complete_run.sql` (lines 16-27)
+3. `pkgs/core/schemas/0100_function_complete_task.sql` (line 156)
+
+### Testing Database Access
+```bash
+# Get database URL
+source .env.local
+echo "$DATABASE_URL"
+
+# Direct psql access for function updates
+psql "$DATABASE_URL"
+
+# View current function
+\sf pgflow.start_tasks
+```
+
+### Performance Testing Tips
+- Run tests when system is idle
+- Use consistent hardware/environment
+- Warm up database before measurements
+- Consider connection pooling effects
+
+## Success Criteria
+- [ ] All map output aggregation tests passing
+- [ ] Performance impact < 10% for typical workflows
+- [ ] Map-to-map optimization shows measurable improvement
+- [ ] Documentation complete with performance analysis
+- [ ] Decision made on function extraction based on data
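
The "Performance impact" entries and the "< 10%" success criterion both come down to a percent change against the baseline. One possible way to compute and check it is sketched below. `percent_impact` and `within_budget` are hypothetical helper names, not existing pgflow scripts, and the 10% default is taken from the success criteria above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pgflow) for comparing a measurement to its baseline.

# percent_impact BASELINE_MS CURRENT_MS
# Prints the signed percent change relative to the baseline.
percent_impact() {
  LC_ALL=C awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base <= 0) exit 1   # a non-positive baseline makes the ratio meaningless
    printf "%.1f\n", (cur - base) / base * 100
  }'
}

# within_budget BASELINE_MS CURRENT_MS [BUDGET_PERCENT]
# Exits 0 when the impact is strictly below the budget (default 10), 1 otherwise.
within_budget() {
  LC_ALL=C awk -v base="$1" -v cur="$2" -v budget="${3:-10}" 'BEGIN {
    if (base <= 0) exit 2
    exit !((cur - base) / base * 100 < budget)
  }'
}

# Example with made-up numbers: a 20 ms baseline and a 21 ms naive implementation
percent_impact 20 21
# prints: 5.0
```

A negative result means the measured stage is faster than the baseline, which is the outcome hoped for in the map-to-map optimization stage.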