-
Notifications
You must be signed in to change notification settings - Fork 12
feat: enhance start_tasks with conditional input logic and add array handling migration #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 09-16-make-initial-tasks-nullable
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -56,7 +56,8 @@ as $$ | |||||||||||||||
select | ||||||||||||||||
d.run_id, | ||||||||||||||||
d.step_slug, | ||||||||||||||||
jsonb_object_agg(d.dep_slug, d.dep_output) as deps_output | ||||||||||||||||
jsonb_object_agg(d.dep_slug, d.dep_output) as deps_output, | ||||||||||||||||
count(*) as dep_count | ||||||||||||||||
from deps d | ||||||||||||||||
group by d.run_id, d.step_slug | ||||||||||||||||
), | ||||||||||||||||
|
@@ -82,11 +83,82 @@ as $$ | |||||||||||||||
st.flow_slug, | ||||||||||||||||
st.run_id, | ||||||||||||||||
st.step_slug, | ||||||||||||||||
jsonb_build_object('run', r.input) || | ||||||||||||||||
coalesce(dep_out.deps_output, '{}'::jsonb) as input, | ||||||||||||||||
-- ========================================== | ||||||||||||||||
-- INPUT CONSTRUCTION LOGIC | ||||||||||||||||
-- ========================================== | ||||||||||||||||
-- This nested CASE statement determines how to construct the input | ||||||||||||||||
-- for each task based on the step type (map vs non-map). | ||||||||||||||||
-- | ||||||||||||||||
-- The fundamental difference: | ||||||||||||||||
-- - Map steps: Receive RAW array elements (e.g., just 42 or "hello") | ||||||||||||||||
-- - Non-map steps: Receive structured objects with named keys | ||||||||||||||||
-- (e.g., {"run": {...}, "dependency1": {...}}) | ||||||||||||||||
-- ========================================== | ||||||||||||||||
CASE | ||||||||||||||||
-- -------------------- MAP STEPS -------------------- | ||||||||||||||||
-- Map steps process arrays element-by-element. | ||||||||||||||||
-- Each task receives ONE element from the array at its task_index position. | ||||||||||||||||
WHEN step.step_type = 'map' THEN | ||||||||||||||||
-- Map steps get raw array elements without any wrapper object | ||||||||||||||||
CASE | ||||||||||||||||
-- ROOT MAP: Gets array from run input | ||||||||||||||||
-- Example: run input = [1, 2, 3] | ||||||||||||||||
-- task 0 gets: 1 | ||||||||||||||||
-- task 1 gets: 2 | ||||||||||||||||
-- task 2 gets: 3 | ||||||||||||||||
WHEN step.deps_count = 0 THEN | ||||||||||||||||
-- Root map (deps_count = 0): no dependencies, reads from run input. | ||||||||||||||||
-- Extract the element at task_index from the run's input array. | ||||||||||||||||
-- Note: If run input is not an array, this will return NULL | ||||||||||||||||
-- and the flow will fail (validated in start_flow). | ||||||||||||||||
jsonb_array_element(r.input, st.task_index) | ||||||||||||||||
|
||||||||||||||||
-- DEPENDENT MAP: Gets array from its single dependency | ||||||||||||||||
-- Example: dependency output = ["a", "b", "c"] | ||||||||||||||||
-- task 0 gets: "a" | ||||||||||||||||
-- task 1 gets: "b" | ||||||||||||||||
-- task 2 gets: "c" | ||||||||||||||||
ELSE | ||||||||||||||||
-- Has dependencies (should be exactly 1 for map steps). | ||||||||||||||||
-- Extract the element at task_index from the dependency's output array. | ||||||||||||||||
-- | ||||||||||||||||
-- Why the subquery with jsonb_each? | ||||||||||||||||
-- - The dependency outputs a raw array: [1, 2, 3] | ||||||||||||||||
-- - deps_outputs aggregates it into: {"dep_name": [1, 2, 3]} | ||||||||||||||||
-- - We need to unwrap and get just the array value | ||||||||||||||||
-- - Map steps have exactly 1 dependency (enforced by add_step) | ||||||||||||||||
-- - So jsonb_each will return exactly 1 row | ||||||||||||||||
-- - We extract the 'value' which is the raw array [1, 2, 3] | ||||||||||||||||
-- - Then get the element at task_index from that array | ||||||||||||||||
(SELECT jsonb_array_element(value, st.task_index) | ||||||||||||||||
FROM jsonb_each(dep_out.deps_output) | ||||||||||||||||
LIMIT 1) | ||||||||||||||||
Comment on lines
+133
to
+135
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Critical bug: The subquery uses LIMIT 1 without ORDER BY, making the result non-deterministic when a map step has multiple dependencies. While the comment states 'Map steps have exactly 1 dependency (enforced by add_step)', this creates a race condition if that constraint is ever violated or if there are concurrent modifications. The query should either add ORDER BY for deterministic results or add a runtime check to ensure exactly one dependency exists. This could cause map tasks to receive elements from random dependencies in multi-dependency scenarios.
Suggested change
Spotted by Diamond |
||||||||||||||||
END | ||||||||||||||||
|
||||||||||||||||
-- -------------------- NON-MAP STEPS -------------------- | ||||||||||||||||
-- Regular (non-map) steps receive ALL inputs as a structured object. | ||||||||||||||||
-- This includes the original run input plus all dependency outputs. | ||||||||||||||||
ELSE | ||||||||||||||||
-- Non-map steps get structured input with named keys | ||||||||||||||||
-- Example output: { | ||||||||||||||||
-- "run": {"original": "input"}, | ||||||||||||||||
-- "step1": {"output": "from_step1"}, | ||||||||||||||||
-- "step2": {"output": "from_step2"} | ||||||||||||||||
-- } | ||||||||||||||||
-- | ||||||||||||||||
-- Build object with 'run' key containing original input | ||||||||||||||||
jsonb_build_object('run', r.input) || | ||||||||||||||||
-- Merge with deps_output which already has dependency outputs | ||||||||||||||||
-- deps_output format: {"dep1": output1, "dep2": output2, ...} | ||||||||||||||||
-- If no dependencies, defaults to empty object | ||||||||||||||||
coalesce(dep_out.deps_output, '{}'::jsonb) | ||||||||||||||||
END as input, | ||||||||||||||||
st.message_id as msg_id | ||||||||||||||||
from tasks st | ||||||||||||||||
join runs r on st.run_id = r.run_id | ||||||||||||||||
join pgflow.steps step on | ||||||||||||||||
step.flow_slug = st.flow_slug and | ||||||||||||||||
step.step_slug = st.step_slug | ||||||||||||||||
left join deps_outputs dep_out on | ||||||||||||||||
dep_out.run_id = st.run_id and | ||||||||||||||||
dep_out.step_slug = st.step_slug | ||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
-- Modify "start_tasks" function | ||
CREATE OR REPLACE FUNCTION "pgflow"."start_tasks" ("flow_slug" text, "msg_ids" bigint[], "worker_id" uuid) RETURNS SETOF "pgflow"."step_task_record" LANGUAGE sql SET "search_path" = '' AS $$ | ||
with tasks as ( | ||
select | ||
task.flow_slug, | ||
task.run_id, | ||
task.step_slug, | ||
task.task_index, | ||
task.message_id | ||
from pgflow.step_tasks as task | ||
where task.flow_slug = start_tasks.flow_slug | ||
and task.message_id = any(msg_ids) | ||
and task.status = 'queued' | ||
), | ||
start_tasks_update as ( | ||
update pgflow.step_tasks | ||
set | ||
attempts_count = attempts_count + 1, | ||
status = 'started', | ||
started_at = now(), | ||
last_worker_id = worker_id | ||
from tasks | ||
where step_tasks.message_id = tasks.message_id | ||
and step_tasks.flow_slug = tasks.flow_slug | ||
and step_tasks.status = 'queued' | ||
), | ||
runs as ( | ||
select | ||
r.run_id, | ||
r.input | ||
from pgflow.runs r | ||
where r.run_id in (select run_id from tasks) | ||
), | ||
deps as ( | ||
select | ||
st.run_id, | ||
st.step_slug, | ||
dep.dep_slug, | ||
dep_task.output as dep_output | ||
from tasks st | ||
join pgflow.deps dep on dep.flow_slug = st.flow_slug and dep.step_slug = st.step_slug | ||
join pgflow.step_tasks dep_task on | ||
dep_task.run_id = st.run_id and | ||
dep_task.step_slug = dep.dep_slug and | ||
dep_task.status = 'completed' | ||
), | ||
deps_outputs as ( | ||
select | ||
d.run_id, | ||
d.step_slug, | ||
jsonb_object_agg(d.dep_slug, d.dep_output) as deps_output, | ||
count(*) as dep_count | ||
from deps d | ||
group by d.run_id, d.step_slug | ||
), | ||
timeouts as ( | ||
select | ||
task.message_id, | ||
task.flow_slug, | ||
coalesce(step.opt_timeout, flow.opt_timeout) + 2 as vt_delay | ||
from tasks task | ||
join pgflow.flows flow on flow.flow_slug = task.flow_slug | ||
join pgflow.steps step on step.flow_slug = task.flow_slug and step.step_slug = task.step_slug | ||
), | ||
-- Batch update visibility timeouts for all messages | ||
set_vt_batch as ( | ||
select pgflow.set_vt_batch( | ||
start_tasks.flow_slug, | ||
array_agg(t.message_id order by t.message_id), | ||
array_agg(t.vt_delay order by t.message_id) | ||
) | ||
from timeouts t | ||
) | ||
select | ||
st.flow_slug, | ||
st.run_id, | ||
st.step_slug, | ||
-- ========================================== | ||
-- INPUT CONSTRUCTION LOGIC | ||
-- ========================================== | ||
-- This nested CASE statement determines how to construct the input | ||
-- for each task based on the step type (map vs non-map). | ||
-- | ||
-- The fundamental difference: | ||
-- - Map steps: Receive RAW array elements (e.g., just 42 or "hello") | ||
-- - Non-map steps: Receive structured objects with named keys | ||
-- (e.g., {"run": {...}, "dependency1": {...}}) | ||
-- ========================================== | ||
CASE | ||
-- -------------------- MAP STEPS -------------------- | ||
-- Map steps process arrays element-by-element. | ||
-- Each task receives ONE element from the array at its task_index position. | ||
WHEN step.step_type = 'map' THEN | ||
-- Map steps get raw array elements without any wrapper object | ||
CASE | ||
-- ROOT MAP: Gets array from run input | ||
-- Example: run input = [1, 2, 3] | ||
-- task 0 gets: 1 | ||
-- task 1 gets: 2 | ||
-- task 2 gets: 3 | ||
WHEN step.deps_count = 0 THEN | ||
-- Root map (deps_count = 0): no dependencies, reads from run input. | ||
-- Extract the element at task_index from the run's input array. | ||
-- Note: If run input is not an array, this will return NULL | ||
-- and the flow will fail (validated in start_flow). | ||
jsonb_array_element(r.input, st.task_index) | ||
|
||
-- DEPENDENT MAP: Gets array from its single dependency | ||
-- Example: dependency output = ["a", "b", "c"] | ||
-- task 0 gets: "a" | ||
-- task 1 gets: "b" | ||
-- task 2 gets: "c" | ||
ELSE | ||
-- Has dependencies (should be exactly 1 for map steps). | ||
-- Extract the element at task_index from the dependency's output array. | ||
-- | ||
-- Why the subquery with jsonb_each? | ||
-- - The dependency outputs a raw array: [1, 2, 3] | ||
-- - deps_outputs aggregates it into: {"dep_name": [1, 2, 3]} | ||
-- - We need to unwrap and get just the array value | ||
-- - Map steps have exactly 1 dependency (enforced by add_step) | ||
-- - So jsonb_each will return exactly 1 row | ||
-- - We extract the 'value' which is the raw array [1, 2, 3] | ||
-- - Then get the element at task_index from that array | ||
(SELECT jsonb_array_element(value, st.task_index) | ||
FROM jsonb_each(dep_out.deps_output) | ||
LIMIT 1) | ||
END | ||
|
||
-- -------------------- NON-MAP STEPS -------------------- | ||
-- Regular (non-map) steps receive ALL inputs as a structured object. | ||
-- This includes the original run input plus all dependency outputs. | ||
ELSE | ||
-- Non-map steps get structured input with named keys | ||
-- Example output: { | ||
-- "run": {"original": "input"}, | ||
-- "step1": {"output": "from_step1"}, | ||
-- "step2": {"output": "from_step2"} | ||
-- } | ||
-- | ||
-- Build object with 'run' key containing original input | ||
jsonb_build_object('run', r.input) || | ||
-- Merge with deps_output which already has dependency outputs | ||
-- deps_output format: {"dep1": output1, "dep2": output2, ...} | ||
-- If no dependencies, defaults to empty object | ||
coalesce(dep_out.deps_output, '{}'::jsonb) | ||
END as input, | ||
st.message_id as msg_id | ||
from tasks st | ||
join runs r on st.run_id = r.run_id | ||
join pgflow.steps step on | ||
step.flow_slug = st.flow_slug and | ||
step.step_slug = st.step_slug | ||
left join deps_outputs dep_out on | ||
dep_out.run_id = st.run_id and | ||
dep_out.step_slug = st.step_slug | ||
$$; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array bounds vulnerability -
jsonb_array_element(r.input, st.task_index)
will return NULL if task_index is out of bounds, but this NULL is passed directly as task input without validation. If there's a race condition or bug in task creation that results in task_index >= array_length, tasks will receive NULL input and likely fail at runtime. Add bounds checking or handle NULL case explicitly.Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.