feat: add map step type in sql #208
base: feature-map-and-array
Conversation
🦋 Changeset detected. Latest commit: 68d09d6. The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages.
Force-pushed e53492c to a3b7cb0
Force-pushed a3b7cb0 to e5f7c0d
PLAN.md (outdated)
      jsonb_build_object('run', r.input) || COALESCE(dep_out.deps_output, '{}'::jsonb)
    WHEN s.step_type = 'map' AND s.deps_count = 0 THEN
      -- Root map: extract item from run input array
      (SELECT input -> st.task_index FROM pgflow.runs WHERE run_id = st.run_id)
Potential array bounds issue: the direct array indexing with `task_index` lacks bounds checking, which could cause runtime errors if the index exceeds the array length. Consider adding validation:
CASE
WHEN st.task_index < jsonb_array_length(input)
THEN input -> st.task_index
ELSE (SELECT ERROR('Task index out of bounds'))
END
This would provide a clear error message rather than a potentially confusing SQL error when an index is out of range.
Suggested change:
- (SELECT input -> st.task_index FROM pgflow.runs WHERE run_id = st.run_id)
+ CASE
+   WHEN st.task_index < jsonb_array_length((SELECT input FROM pgflow.runs WHERE run_id = st.run_id))
+   THEN (SELECT input -> st.task_index FROM pgflow.runs WHERE run_id = st.run_id)
+   ELSE (SELECT ERROR('Task index out of bounds'))
+ END
Spotted by Diamond
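A side note for anyone adapting the suggestion above: plain PostgreSQL has no built-in `ERROR()` function, so raising a clear message from within a SQL expression needs a small helper. The sketch below is illustrative only; the helper name and the sample data are made up, not part of pgflow.

```sql
-- Hypothetical helper (not part of pgflow): raise a descriptive error from a SQL expression.
create function raise_task_index_error(idx int, len int)
returns jsonb
language plpgsql
as $$
begin
  raise exception 'Task index % out of bounds (array length %)', idx, len;
end;
$$;

-- Self-contained demo of the bounds-checked extraction pattern from the suggestion:
with run as (select '["a", "b", "c"]'::jsonb as input)
select case
         when task_index < jsonb_array_length(run.input) then run.input -> task_index
         else raise_task_index_error(task_index, jsonb_array_length(run.input))
       end as item
from run, generate_series(0, 2) as task_index;  -- raise the upper bound to 3 to see the error
```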
PLAN.md (outdated)
-- Set remaining_tasks = NULL for all 'created' steps (not started yet)
UPDATE pgflow.step_states
SET remaining_tasks = NULL
WHERE status = 'created';
The data migration statement could be optimized to avoid unnecessary updates. Consider adding a condition to only update rows that need changing:
UPDATE pgflow.step_states
SET remaining_tasks = NULL
WHERE status = 'created' AND remaining_tasks IS NOT NULL;
This prevents updating rows where `remaining_tasks` is already NULL, reducing lock contention and improving migration performance, especially in production environments with existing data.
Suggested change:
  -- Set remaining_tasks = NULL for all 'created' steps (not started yet)
  UPDATE pgflow.step_states
  SET remaining_tasks = NULL
- WHERE status = 'created';
+ WHERE status = 'created' AND remaining_tasks IS NOT NULL;
Spotted by Diamond
Force-pushed e5f7c0d to a3a6369
pkgs/core/supabase/tests/add_step/step_type_validation.test.sql: two resolved review comments (outdated)
Force-pushed a3a6369 to 6d7f91f
pkgs/core/supabase/migrations/20250911062632_pgflow_add_map_step_type.sql: resolved review comment
Force-pushed 6d7f91f to 2036d92
  SET status = 'started',
-     started_at = now()
+     started_at = now(),
+     remaining_tasks = ready_steps.initial_tasks -- Copy initial_tasks to remaining_tasks when starting
Consider adding a null check when copying `initial_tasks` to `remaining_tasks`. While the schema defines `initial_tasks` with `DEFAULT 1`, existing data might contain null values. Using `COALESCE` would provide a safety net:

remaining_tasks = COALESCE(ready_steps.initial_tasks, 1) -- Safely copy with fallback

This ensures the task counting logic remains robust even with unexpected data states.
Suggested change:
- remaining_tasks = ready_steps.initial_tasks -- Copy initial_tasks to remaining_tasks when starting
+ remaining_tasks = COALESCE(ready_steps.initial_tasks, 1) -- Safely copy with fallback
Spotted by Diamond
Force-pushed a330d14 to 91ff1a6
Force-pushed 08a197e to a0af828
Force-pushed 91ff1a6 to 8c1901d
Force-pushed a0af828 to c66f5b6
Force-pushed 8c1901d to 2496a29
Force-pushed 2496a29 to 68d09d6
Force-pushed c66f5b6 to 436b03d
🔍 Preview Deployment: Website. ✅ Deployment successful! 🔗 Preview URL: https://pr-208.pgflow.pages.dev
🔍 Preview Deployment: Playground. ✅ Deployment successful! 🔗 Preview URL: https://pr-208--pgflow-demo.netlify.app
Map Step Type Implementation (SQL Core)
Overview
This PR implements map step type functionality in pgflow's SQL Core, enabling parallel processing of array data through fanout/map operations. This foundational feature allows workflows to spawn multiple tasks from a single array input, process them in parallel, and aggregate results back into ordered arrays.
What We're Building
Core Concept: Map Steps
Map steps transform single array inputs into parallel task execution:
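A standalone plain-SQL illustration of that idea (this is not pgflow internals): one array fans out into one indexed element per task, each element is processed independently, and the outputs fold back into an ordered array.

```sql
-- Fan out: one row per array element, keeping a 0-based task_index.
with input as (
  select '["a", "b", "c"]'::jsonb as arr
),
tasks as (
  select elem.value, elem.ord - 1 as task_index
  from input,
       jsonb_array_elements(input.arr) with ordinality as elem(value, ord)
),
-- "Handler": process each element independently (here, just uppercase it).
results as (
  select task_index, to_jsonb(upper(value #>> '{}')) as output
  from tasks
)
-- Fold back: aggregate the outputs in task_index order.
select jsonb_agg(output order by task_index) as aggregated   -- => ["A", "B", "C"]
from results;
```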
Two Types of Map Steps
1. Root Maps (0 dependencies): `runs.input` must be a JSONB array; `start_flow('my_flow', ["a", "b", "c"])` → 3 parallel tasks
2. Dependent Maps (1 dependency): `single_step → ["x", "y"]` → `map_step` spawns 2 tasks

Map→Map Chaining
Maps can depend on other maps: the first map's per-task outputs are aggregated in `task_index` order into a single array, which then becomes the input array of the next map.
Technical Implementation
Hybrid Architecture: "Fresh Data" vs "Dumb Spawning"
Key Innovation: Separate array parsing from task spawning for optimal performance.
Fresh Data Functions (parse arrays, set counts):
- `start_flow()`: Validates `runs.input` arrays for root maps, sets `initial_tasks`
- `complete_task()`: Validates dependency outputs for dependent maps, sets `initial_tasks`

Dumb Functions (use pre-computed counts):
- `start_ready_steps()`: Copies `initial_tasks → remaining_tasks` and spawns N tasks with no JSON parsing (sketched below)
- `start_tasks()`: Extracts array elements using `task_index`
- `maybe_complete_run()`: Aggregates results into ordered arrays
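A standalone sketch of the "dumb spawning" half of that split. The `ready_steps` shape below is assumed for illustration (in pgflow the rows would feed an insert into the task table), but the point holds: spawning only needs the pre-computed `initial_tasks` count and `generate_series`, never the JSON itself.

```sql
-- Given pre-computed initial_tasks counts, produce one task row per index
-- using generate_series alone; no JSON is parsed at spawn time.
with ready_steps(run_id, step_slug, initial_tasks) as (
  values (1, 'map_step', 3),
         (1, 'empty_map', 0)   -- zero-length array: no tasks are spawned
)
select rs.run_id,
       rs.step_slug,
       gs.task_index
from ready_steps rs
cross join lateral generate_series(0, rs.initial_tasks - 1) as gs(task_index)
order by rs.step_slug, gs.task_index;
```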
Schema Changes
Enable Map Step Type
Remove Single Task Limit
Add Task Planning Columns
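A rough sketch of what these three schema changes could look like. The table names follow the ones that appear elsewhere in this PR, but the constraint names, column placement, and exact definitions are assumptions, not the actual migration.

```sql
-- Enable map step type: allow 'map' alongside 'single' (constraint name assumed).
alter table pgflow.steps
  drop constraint if exists steps_step_type_check,
  add constraint steps_step_type_check
    check (step_type in ('single', 'map'));

-- Remove the single-task limit so one step may own many tasks (constraint name assumed).
alter table pgflow.step_tasks
  drop constraint if exists only_one_task_per_step;

-- Task planning columns: a planned count plus a nullable live counter (placement assumed).
alter table pgflow.step_states
  add column if not exists initial_tasks int not null default 1,
  alter column remaining_tasks drop not null;
```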
Enhanced Functions
- `add_step()`: Accepts `step_type` parameter, validates map constraints (usage sketched below)
- `start_flow()`: Root map validation and count setting
- `complete_task()`: Dependent map count setting
- `start_ready_steps()`: Efficient bulk task spawning
- `start_tasks()`: Array element extraction
- `maybe_complete_run()`: Result aggregation
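A usage sketch for the enhanced `add_step()`. Only the `step_type` parameter, its `'single'` default, and the single-dependency constraint for maps come from this PR; the parameter names and call shape below are assumptions.

```sql
-- Existing behaviour is unchanged: without step_type, a regular single step is created.
select pgflow.add_step(
  flow_slug => 'my_flow',
  step_slug => 'single_step'
);

-- A dependent map step: at most one dependency, explicitly typed as 'map'.
select pgflow.add_step(
  flow_slug => 'my_flow',
  step_slug => 'map_step',
  deps      => array['single_step'],
  step_type => 'map'
);
```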
Performance Optimizations
- Tasks are spawned in bulk with `generate_series(0, N-1)`
Edge Cases Handled
- Empty arrays complete with `[]` output (not failures); see the demo below
- Results are aggregated in `task_index` order
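Both points can be illustrated with standalone SQL (the table shape is assumed): ordered aggregation comes from `jsonb_agg(... order by task_index)`, and aggregating zero task outputs falls back to `[]` instead of failing.

```sql
-- Outputs can arrive in any order but fold back by task_index;
-- with zero rows (empty input array) the COALESCE fallback yields '[]'.
with task_outputs(task_index, output) as (
  values (2, to_jsonb('c'::text)),
         (0, to_jsonb('a'::text)),
         (1, to_jsonb('b'::text))
)
select coalesce(jsonb_agg(output order by task_index), '[]'::jsonb) as aggregated
from task_outputs;
-- => ["a", "b", "c"]
```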
Database Flow Examples
Root Map Flow
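A hedged walk-through of the root map case, using the call from the example above; the cast and exact signature are assumptions.

```sql
-- Root map: runs.input is itself the array to fan out over.
select pgflow.start_flow('my_flow', '["a", "b", "c"]'::jsonb);
-- start_flow() validates that the input is a JSONB array and sets initial_tasks = 3
-- on the root map step; start_ready_steps() copies that into remaining_tasks and
-- spawns tasks with task_index 0, 1, 2; start_tasks() then hands element 0 -> "a",
-- 1 -> "b", 2 -> "c" to the handlers.
```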
Dependent Map Flow
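A hedged walk-through of the dependent map case; the `complete_task()` parameter names and the placeholder run id are assumptions.

```sql
-- Dependent map: the dependency's output array determines the task count.
select pgflow.complete_task(
  run_id     => '00000000-0000-0000-0000-000000000000'::uuid,  -- placeholder run id
  step_slug  => 'single_step',
  task_index => 0,
  output     => '["x", "y"]'::jsonb
);
-- complete_task() sees that map_step depends on single_step, validates that the
-- output is a JSONB array, and sets map_step.initial_tasks = 2; the usual spawning
-- then creates tasks with task_index 0 -> "x" and 1 -> "y".
```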
Handler Interface
Map task handlers receive individual array elements (TypeScript `Array.map()` style).

Comprehensive Testing
33 test files cover the new functionality.
Benefits
Breaking Changes
None. This is a pure addition:
- `step_type` parameter defaults to `'single'`
Implementation Status
- `add_step()` function