Commit fd02150

pavle-martinovic_data authored and haoyangeng-db committed
[SPARK-52549][SQL] Disable Recursive CTE self-references from window functions and inside sorts
### What changes were proposed in this pull request?
Throw an error when a UnionLoopRef is found inside a window function or a sort.

### Why are the changes needed?
The context of a Recursive CTE is only the result of the previous iteration, so cumulative operations of any kind should not be allowed. This is also the reason why self references inside aggregates aren't allowed.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New golden file test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#51178 from Pajaraja/pavle-martinovic_data/noWindowFunctionsrCTE.

Authored-by: pavle-martinovic_data <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent e027918 commit fd02150
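
The rationale in the commit message can be illustrated with a toy model, written here in plain Scala rather than against Spark. It assumes, as the message states, that the recursive step receives only the rows of the previous iteration. `RecursiveCteSketch`, `evaluate`, and `windowedStep` are hypothetical names, and the loop is a simplification, not Spark's execution strategy.

```scala
// Toy model of recursive CTE evaluation: each step sees only the rows
// produced by the previous iteration, never the accumulated result.
object RecursiveCteSketch {
  type Row = (Int, Long) // (id, val)

  // Hypothetical driver: `anchor` seeds the loop, `step` maps the previous
  // iteration's rows to the next iteration's rows.
  def evaluate(anchor: Seq[Row], maxIterations: Int)(step: Seq[Row] => Seq[Row]): Seq[Row] = {
    var previous = anchor
    var all = anchor
    var i = 0
    while (previous.nonEmpty && i < maxIterations) {
      previous = step(previous)
      all = all ++ previous
      i += 1
    }
    all
  }

  // What SUM(val) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
  // would compute if it were allowed: the frame only ever covers `previous`.
  def windowedStep(previous: Seq[Row]): Seq[Row] = {
    val sorted = previous.filter(_._1 < 3).sortBy(_._1) // WHERE id < 3, ORDER BY id
    sorted.zipWithIndex.map { case ((id, v), idx) =>
      val preceding = if (idx > 0) sorted(idx - 1)._2 else 0L
      (id + 1, v + preceding)
    }
  }

  // Anchor row mirrors `SELECT 1, CAST(10 AS BIGINT)` from the new test.
  val result: Seq[Row] = evaluate(Seq((1, 10L)), maxIterations = 10)(windowedStep)
}
```

Under this model the window never contains more than the single row of the previous iteration, so `val` stays at 10 on every row instead of accumulating. A reader expecting a running total across iterations would be surprised, which is consistent with the decision to reject such queries at analysis time.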

5 files changed, +88 -2 lines changed


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 1 addition & 1 deletion
@@ -3429,7 +3429,7 @@
     "subClass" : {
       "PLACE" : {
         "message" : [
-          "Recursive references cannot be used on the right side of left outer/semi/anti joins, on the left side of right outer joins, in full outer joins and in aggregates"
+          "Recursive references cannot be used on the right side of left outer/semi/anti joins, on the left side of right outer joins, in full outer joins, in aggregates, window functions or sorts"
         ]
       }
     },

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveWithCTE.scala

Lines changed: 4 additions & 0 deletions
@@ -347,6 +347,10 @@ object ResolveWithCTE extends Rule[LogicalPlan] {
         checkIfSelfReferenceIsPlacedCorrectly(right, cteId, allowRecursiveRef = false)
       case Aggregate(_, _, child, _) =>
         checkIfSelfReferenceIsPlacedCorrectly(child, cteId, allowRecursiveRef = false)
+      case Window(_, _, _, child, _) =>
+        checkIfSelfReferenceIsPlacedCorrectly(child, cteId, allowRecursiveRef = false)
+      case Sort(_, _, child, _) =>
+        checkIfSelfReferenceIsPlacedCorrectly(child, cteId, allowRecursiveRef = false)
       case r: UnionLoopRef if !allowRecursiveRef && r.loopId == cteId =>
         throw new AnalysisException(
           errorClass = "INVALID_RECURSIVE_REFERENCE.PLACE",
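
The pattern in the hunk above (descend into the child with `allowRecursiveRef = false` once a disallowed operator is crossed, then reject any matching self-reference) can be sketched on a toy plan tree. This is a minimal sketch under stated assumptions: the case classes below are simplified stand-ins, not Catalyst's `Window`, `Sort`, or `UnionLoopRef`, and the hypothetical `isPlacedCorrectly` returns a Boolean where the real rule throws an `AnalysisException`.

```scala
// Toy logical-plan ADT; simplified stand-ins, not Spark's Catalyst classes.
sealed trait Plan
case class SelfRef(loopId: Long) extends Plan // stands in for UnionLoopRef
case class Scan(table: String) extends Plan
case class Project(child: Plan) extends Plan
case class Aggregate(child: Plan) extends Plan
case class Window(child: Plan) extends Plan
case class Sort(child: Plan) extends Plan

object PlacementCheckSketch {
  // Returns true when every self-reference to `cteId` sits in an allowed position.
  def isPlacedCorrectly(plan: Plan, cteId: Long, allowRecursiveRef: Boolean = true): Boolean =
    plan match {
      // Crossing any of these operators forbids self-references below them.
      case Aggregate(child) => isPlacedCorrectly(child, cteId, allowRecursiveRef = false)
      case Window(child)    => isPlacedCorrectly(child, cteId, allowRecursiveRef = false)
      case Sort(child)      => isPlacedCorrectly(child, cteId, allowRecursiveRef = false)
      // A reference to this CTE is valid only if nothing above it forbade it.
      case SelfRef(id) if id == cteId => allowRecursiveRef
      // References to other loops are not this check's concern.
      case SelfRef(_) => true
      // Neutral operators pass the current flag through unchanged.
      case Project(child) => isPlacedCorrectly(child, cteId, allowRecursiveRef)
      case Scan(_) => true
    }
}
```

For example, `Sort(Project(SelfRef(1L)))` is rejected for `cteId = 1L` because the flag is cleared at `Sort` and carried through `Project`, while `Project(SelfRef(1L))` is accepted.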

sql/core/src/test/resources/sql-tests/analyzer-results/cte-recursion.sql.out

Lines changed: 31 additions & 0 deletions
@@ -2011,3 +2011,34 @@ WithCTE
 +- Project [n#x]
 +- SubqueryAlias t1
 +- CTERelationRef xxxx, true, [n#x], false, false
+
+
+-- !query
+WITH RECURSIVE win(id, val) AS (
+SELECT 1, CAST(10 AS BIGINT)
+UNION ALL
+SELECT id + 1, SUM(val) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
+FROM win WHERE id < 3
+)
+SELECT * FROM win
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+"errorClass" : "INVALID_RECURSIVE_REFERENCE.PLACE",
+"sqlState" : "42836"
+}
+
+
+-- !query
+WITH RECURSIVE t1(n) AS (
+SELECT 1
+UNION ALL
+(SELECT n + 1 FROM t1 WHERE n < 5 ORDER BY n)
+)
+SELECT * FROM t1
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+"errorClass" : "INVALID_RECURSIVE_REFERENCE.PLACE",
+"sqlState" : "42836"
+}

sql/core/src/test/resources/sql-tests/inputs/cte-recursion.sql

Lines changed: 17 additions & 1 deletion
@@ -725,4 +725,20 @@ WITH RECURSIVE t1(n) AS (
 UNION ALL
 SELECT n + 1 FROM t1
 )
-((SELECT n FROM t1) UNION ALL (SELECT n FROM t1)) LIMIT 20
+((SELECT n FROM t1) UNION ALL (SELECT n FROM t1)) LIMIT 20;
+
+-- Recursive CTE with self reference inside Window function
+WITH RECURSIVE win(id, val) AS (
+SELECT 1, CAST(10 AS BIGINT)
+UNION ALL
+SELECT id + 1, SUM(val) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
+FROM win WHERE id < 3
+)
+SELECT * FROM win;
+
+WITH RECURSIVE t1(n) AS (
+SELECT 1
+UNION ALL
+(SELECT n + 1 FROM t1 WHERE n < 5 ORDER BY n)
+)
+SELECT * FROM t1;

sql/core/src/test/resources/sql-tests/results/cte-recursion.sql.out

Lines changed: 35 additions & 0 deletions
@@ -1804,3 +1804,38 @@ struct<n:int>
 7
 8
 9
+
+
+-- !query
+WITH RECURSIVE win(id, val) AS (
+SELECT 1, CAST(10 AS BIGINT)
+UNION ALL
+SELECT id + 1, SUM(val) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
+FROM win WHERE id < 3
+)
+SELECT * FROM win
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+"errorClass" : "INVALID_RECURSIVE_REFERENCE.PLACE",
+"sqlState" : "42836"
+}
+
+
+-- !query
+WITH RECURSIVE t1(n) AS (
+SELECT 1
+UNION ALL
+(SELECT n + 1 FROM t1 WHERE n < 5 ORDER BY n)
+)
+SELECT * FROM t1
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+"errorClass" : "INVALID_RECURSIVE_REFERENCE.PLACE",
+"sqlState" : "42836"
+}
