Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for SQL APPLY operation #357

Open
ashvina opened this issue Oct 12, 2022 · 1 comment
Open

Add support for SQL APPLY operation #357

ashvina opened this issue Oct 12, 2022 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@ashvina
Copy link

ashvina commented Oct 12, 2022

Related Substrait issues #119 #206

APPLY [1], introduced in SQL Server in 2005, is frequently used in practice. In Calcite the operation is LATERAL [2]. This operator looks similar to a JOIN, i.e. it executes an expression on the two input relations. However it is semantically different. In a JOIN, both sets to be joined must be self-sufficient. Otherwise it may result in errors like "multi-part identifier could not be bound". If dependencies between sets exists, a user can use correlated-subqueries. However, nested subqueries result in recursive evaluation strategies which incur O(n2) runtime [3].

Example [4]:
Intent: For each row from table1 select first rowcount rows from table2, ordered by table2.id

Query using APPLY
SELECT * FROM table1, CROSS APPLY ( SELECT TOP (table1.rowcount) * FROM table2 ORDER BY id ) t2

Queries without using APPLY

SELECT  * FROM 
table1 t1 JOIN ( SELECT  t2o.*, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM    table2 t2o ) t2 
ON t2.rn <= t1.rowcount

As benchmarked in [4], execution of query with APPLY in SQL Server is 2 orders of magnitude faster than the query without apply. I.e. APPLY yields efficiency and is not replaceable with traditional JOINs.

Currently APPLY operation cannot be represented in Substrait. Which means, optimizations in various engines cannot be leveraged if APPLY like operations are deconstructed and mapped into existing operations supported by Substrait. To efficiently support full SQL, operation for APPLY is needed in Substrait. De-correlation should not be enforced.

CC: @jcamachor

[1] https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/
[2] CALCITE-1472
[3] https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf
[4] https://explainextended.com/2009/07/16/inner-join-vs-cross-apply/

@ashvina
Copy link
Author

ashvina commented Oct 12, 2022

The difference between the join and APPLY operator becomes evident there is a table-valued expression on the right side and the table-valued expression is to be evaluated for each row from the left relation.

ashvina added a commit to ashvina/substrait-java that referenced this issue Nov 24, 2022
This change adds support for parsing of SQL queries with APPLY (join with
correlated subquery), and to build OuterReferences map of correlated variables
present in the query's join predicates. The OuterRefs will be used while
constructing Substrait plans to bind correlated variables. The change also
adds few example queries which depend on APPLY / LATERAL operators.

This change still does not map calcite-correlated-join to Substrait, as the
spec for APPLY is still not approved. As such, while the parsing of calcite
query plans will succeed after this change, the unit tests and run time
conversion will continue to fail in the final step of building the
Substrait plan. Additional changes are needed to support APPLY.

Refs #substrait-io/substrait/issues/357
ashvina added a commit to ashvina/substrait-java that referenced this issue Nov 27, 2022
This change adds support for parsing of SQL queries with APPLY (join with
correlated subquery), and to build OuterReferences map of correlated variables
present in the query's join predicates. The OuterRefs will be used while
constructing Substrait plans to bind correlated variables. The change also
adds few example queries which depend on APPLY / LATERAL operators.

This change still does not map calcite-correlated-join to Substrait, as the
spec for APPLY is still not approved. As such, while the parsing of calcite
query plans will succeed after this change, the unit tests and run time
conversion will continue to fail in the final step of building the
Substrait plan. Additional changes are needed to support APPLY.

Refs #substrait-io/substrait/issues/357
jinfengni pushed a commit to substrait-io/substrait-java that referenced this issue Dec 2, 2022
* feat: support parsing of SQL queries with APPLY

This change adds support for parsing of SQL queries with APPLY (join with
correlated subquery), and to build OuterReferences map of correlated variables
present in the query's join predicates. The OuterRefs will be used while
constructing Substrait plans to bind correlated variables. The change also
adds few example queries which depend on APPLY / LATERAL operators.

This change still does not map calcite-correlated-join to Substrait, as the
spec for APPLY is still not approved. As such, while the parsing of calcite
query plans will succeed after this change, the unit tests and run time
conversion will continue to fail in the final step of building the
Substrait plan. Additional changes are needed to support APPLY.

Refs #substrait-io/substrait/issues/357

* fix: unit test cases to validate correlated vars

This change addresses review comments, the unit tests validate the outer
reference map built from calcite plans of APPLY queries.

* fix: add test for nested APPLY

This change addresses review comments. A new test case to validate
nested APPLY join parsing is added. Also added validation of depth
information in existing tests.
@westonpace westonpace added the enhancement New feature or request label Mar 8, 2023
ajegou pushed a commit to ajegou/substrait-java that referenced this issue Mar 29, 2024
* feat: support parsing of SQL queries with APPLY

This change adds support for parsing of SQL queries with APPLY (join with
correlated subquery), and to build OuterReferences map of correlated variables
present in the query's join predicates. The OuterRefs will be used while
constructing Substrait plans to bind correlated variables. The change also
adds few example queries which depend on APPLY / LATERAL operators.

This change still does not map calcite-correlated-join to Substrait, as the
spec for APPLY is still not approved. As such, while the parsing of calcite
query plans will succeed after this change, the unit tests and run time
conversion will continue to fail in the final step of building the
Substrait plan. Additional changes are needed to support APPLY.

Refs #substrait-io/substrait/issues/357

* fix: unit test cases to validate correlated vars

This change addresses review comments, the unit tests validate the outer
reference map built from calcite plans of APPLY queries.

* fix: add test for nested APPLY

This change addresses review comments. A new test case to validate
nested APPLY join parsing is added. Also added validation of depth
information in existing tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants