-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 0-Task Plans in Single-Shard Router When Updating a Local Table with Reference Table in Subquery #7897
base: release-13.0
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## release-13.0 #7897 +/- ##
================================================
- Coverage 89.48% 89.42% -0.06%
================================================
Files 276 276
Lines 60063 59895 -168
Branches 7524 7510 -14
================================================
- Hits 53747 53561 -186
- Misses 4166 4188 +22
+ Partials 2150 2146 -4 |
@@ -1583,7 +1583,7 @@ IsLocalTableModification(Oid targetRelationId, Query *query, uint64 shardId, | |||
return true; | |||
} | |||
|
|||
if (shardId == INVALID_SHARD_ID && ContainsOnlyLocalTables(rteProperties)) | |||
if (shardId == INVALID_SHARD_ID && ContainsOnlyLocalOrReferenceTables(rteProperties)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I update the function name and description based on the adjustments here?
WHERE EXISTS ( | ||
SELECT (SELECT c15 FROM t2_ref) | ||
FROM t4_pg | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the test also show the query plan ? EXPLAIN (VERBOSE, COSTS OFF)
output for this query, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding it, but it behaves differently across different PostgreSQL versions. In this case, I don't think it's worth maintaining separate test outputs. Do you have any suggestions on how to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to handle this is to use a helper function explain_with_pg17_initplan_format()
(see the test columnar_chunk_filtering.sql
for an example of its usage), but even with that I think this difference in EXPLAIN format requires more than one test output, and I agree its not worth maintaining. Query correctness is the main concern here. So let's drop this suggestion if it means 2 output files.
|
||
-- Show final data | ||
SELECT 't6_pg after' AS label, * FROM t6_pg; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include a test that ensures updates on a reference table are handled correctly ? That is, all copies of the reference table are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
{ | ||
return !rteProperties->hasDistributedTable && !rteProperties->hasReferenceTable; | ||
/* If hasDistributedTable is false, then all tables are either local or reference. */ | ||
return !rteProperties->hasDistributedTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this enable a reference table to be updated ? e.g. update ref_table where <clause using local and ref tables>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATE t2_ref
SET c15 = '2099-01-01 00:00:00'::timestamp
WHERE EXISTS (
SELECT 1
FROM t6_pg
);
ERROR: relation t6_pg is not distributed
I executed the query above and encountered an error. Should we keep it? May they be duplicated or redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get the same error with main, so it is expected. I'm in favor of keeping, even if the error condition is tested elsewhere, as it serves to document the current behavior. Can we add a test query where t6_pg is replaced by a reference table ? Something like:
create table t3_ref(pkey int, c15 timestamp);
select create_reference_table('t3_ref');
insert into t3_ref values (99, now());
UPDATE t2_ref
SET c15 = '2088-08-08 00:00:00'::timestamp
WHERE EXISTS (
SELECT 1
FROM t3_ref
);
and verify that t2_ref is updated on all nodes ? I think it should be, because the modify planner will not be looking at an invalid shard-id, but just to cover all bases... Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m3hm3t there was a typo in my previous comment - it should have read: verify that t2_ref is updated on all nodes.
I checked this case using your branch and it is okay; the reference table is updated correctly - the check I used is below, if you want to include it in the regress test.
This was the one remaining case I wanted to check, the PR now looks good to me overall.
create table t3_ref(pkey int, c15 text);
select create_reference_table('t3_ref');
insert into t3_ref values (99, 'Initial Data');
UPDATE t2_ref SET c15 = '2088-08-08 00:00:00'::timestamp WHERE EXISTS ( SELECT 1 FROM t3_ref);
SELECT 't2_ref after UPDATE' AS label, * FROM t2_ref;
SELECT citus_remove_node('localhost', :worker_2_port);
SELECT 't2_ref after UPDATE - without worker 2' AS label, * FROM t2_ref;
SELECT 1 FROM citus_add_node('localhost', :worker_2_port);
SELECT citus_remove_node('localhost', :worker_1_port);
SELECT 't2_ref after UPDATE - without worker 1' AS label, * FROM t2_ref;
SELECT 1 FROM citus_add_node('localhost', :worker_1_port);
-- - The outer subquery iterates over every row in t4_pg. | ||
-- - The inner subquery selects c15 from t2_ref. | ||
-- 5. The update should occur if the nested subquery returns any row, effectively updating t6_pg's vkey to 43. | ||
-- 6. The final state of t6_pg is displayed to confirm that the update was applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue present for DELETE and MERGE ? Can we include tests for those ? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the restrictions in the MERGE section, do you think keeping these tests might make them duplicated?
MERGE INTO t6_pg AS tgt
USING t4_pg AS src
ON (tgt.pkey = 14000) -- trivial condition to "match" row pkey=14000
WHEN MATCHED THEN
UPDATE SET c26 = 'merged_' || (SELECT pkey FROM t2_ref WHERE pkey=24000 LIMIT 1)
WHEN NOT MATCHED THEN
INSERT (vkey, pkey, c26)
VALUES (99, src.pkey, 'inserted_via_merge');
ERROR: non-IMMUTABLE functions are not yet supported in MERGE sql with distributed tables
MERGE INTO t6_pg AS tgt
USING t4_pg AS src
ON (tgt.pkey = src.pkey)
WHEN MATCHED THEN
UPDATE SET c26 = 'merged_value'
WHEN NOT MATCHED THEN
INSERT (vkey, pkey, c26)
VALUES (src.vkey, src.pkey, 'inserted_via_merge');
SELECT 't6_pg after MERGE' AS label, * FROM t6_pg;
label | vkey | pkey | c26
---------------------------------------------------------------------
t6_pg after MERGE | 43 | 12000 | initial
t6_pg after MERGE | 43 | 14000 | to_merge
t6_pg after MERGE | 5 | 15000 | inserted_via_merge
(3 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per previous comment, I'm in favor of keeping because it documents the current behavior. But I'll leave it to your call. I've a couple of test suggestions though, can we add:
MERGE into a reference table using a postgres table as source
MERGE into a reference table using another reference table as source
…for clarity and update related documentation
…f local and reference tables
…dling of local and reference tables
…nting DELETE and MERGE operations to validate reference table handling
a843139
to
ce9915d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, thanks for accommodating the test requests.
This PR fixes an issue #7891 in the Citus planner where an
UPDATE
on a local table with a subquery referencing a reference table could produce a 0-task plan. Historically, the planner sometimes failed to detect that both the target and referenced tables were effectively “local,” assigningINVALID_SHARD_ID
and yielding a no-op plan.Root Cause
PlanRouterQuery
), we relied onshardId
to determine whether a query should be routed to a single shard.shardId == INVALID_SHARD_ID
, but we also had not marked the query as a “local table modification,” the code path would produce zero tasks.Changes
Enhanced Local Table Detection
IsLocalTableModification
and related checks to consider both local and reference tables as “local” for planning, preventing the 0-task scenario.ContainsOnlyLocalOrReferenceTables
to return true if there are no fully distributed tables in the query.Added Regress Test
issue_7891.sql
) which reproduces the scenario.