Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed by knassre-bodo · Pull Request #295 · bodo-ai/PyDough

knassre-bodo · 2025-03-12T17:46:00Z

Resolves #270. See the issue for more details on the relevant pattern. Certain tests were changed so they use the aggregation+semi pattern that this PR optimizes. Also added an ANYTHING function (the equivalent of ANY_VALUE in SQL) for any columns that become pass-through aggregations (SQLGlot converts this to MIN if the dialect does not have ANY_VALUE). Several other optimizations were also added:

- Removing redundant left joins during column pruning when only the LHS is used (the structure of PyDough guarantees the cardinality is 1:1; otherwise, the RHS must already have been aggregated)
- Removing any dead child nodes during hybrid de-correlation if they are no longer used
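The dead-child cleanup amounts to dropping unreferenced children and renumbering the survivors so that index-based references stay valid. The sketch below assumes a heavily simplified, hypothetical `HybridTreeSketch` in which each term points at a child by integer index; PyDough's real hybrid classes look different, and `remove_dead_children` is an illustrative name, not the PR's function.

```python
from dataclasses import dataclass, field


@dataclass
class HybridTreeSketch:
    """Hypothetical, heavily simplified stand-in for a hybrid tree.

    `children` holds the child subtrees (just names here), and `child_refs`
    maps each term name to the index of the child that term reads from.
    """

    children: list[str]
    child_refs: dict[str, int] = field(default_factory=dict)


def remove_dead_children(tree: HybridTreeSketch) -> dict[int, int]:
    """Drop children that no term references and renumber the survivors.

    Returns the old-index -> new-index mapping so that any other
    references to child indices can be rewritten consistently.
    """
    # Indices still referenced by at least one term, in their original order.
    live = sorted(set(tree.child_refs.values()))
    remap = {old: new for new, old in enumerate(live)}
    # Keep only live children, preserving their relative order.
    tree.children = [tree.children[old] for old in live]
    # Point every term at its child's new index.
    tree.child_refs = {name: remap[idx] for name, idx in tree.child_refs.items()}
    return remap


# Child 1 ("supplier") is no longer referenced after the rewrite.
tree = HybridTreeSketch(["nation", "supplier", "lines"], {"n_name": 0, "revenue": 2})
remap = remove_dead_children(tree)
print(remap)  # {0: 0, 2: 1}
print(tree.children)  # ['nation', 'lines']
```

Returning the remapping, rather than only mutating the tree, is a design choice for the sketch: anything else that stores child indices can be patched with the same dictionary.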

knassre-bodo · 2025-03-13T17:26:36Z

-                    if rhs_name not in child_node.terms:
-                        breakpoint()


This was debugging code from an earlier PR, placed in a case where an exception occurs, that never got deleted.

review-notebook-app · 2025-03-17T17:13:55Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

knassre-bodo · 2025-03-17T18:14:40Z

-     JOIN(conditions=[t0.key == t1.supplier_key], types=['left'], columns={'account_balance': t0.account_balance, 'key': t0.key})
-      FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key})
-       SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey})
-      AGGREGATE(keys={'supplier_key': supplier_key}, aggregations={})
-       JOIN(conditions=[t0.part_key == t1.key], types=['inner'], columns={'supplier_key': t0.supplier_key})
-        SCAN(table=tpch.PARTSUPP, columns={'part_key': ps_partkey, 'supplier_key': ps_suppkey})
-        SCAN(table=tpch.PART, columns={'key': p_partkey})
+     FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key})
+      SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey})


This is a good example of deleting useless joins (based on the change in the column pruner). We don't need lines 6 and 9-12: the left join on line 6 produces exactly one row per row of lines 7-8 (lines 9-12 are aggregated on `supplier_key`, so each supplier matches at most once), and we aren't using any columns from lines 9-12, so we can just use lines 7-8.
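The pruning rule in this example can be stated as three checks: the join is a LEFT join, the RHS is unique on the join key, and no RHS column is used. The sketch below is a toy model with hypothetical `Scan`/`Aggregate`/`Join` classes (it omits the FILTER and the inner join with PART from the plan above) and is not PyDough's actual column-pruner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    columns: frozenset[str]


@dataclass(frozen=True)
class Aggregate:
    source: object
    keys: frozenset[str]  # output is unique on these keys (no aggregations here)


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    left_key: str
    right_key: str
    join_type: str  # "left" or "inner"


def columns_of(node) -> frozenset[str]:
    """Output columns of a node in this toy model (no name collisions assumed)."""
    if isinstance(node, Scan):
        return node.columns
    if isinstance(node, Aggregate):
        return node.keys
    if isinstance(node, Join):
        return columns_of(node.left) | columns_of(node.right)
    raise TypeError(f"unsupported node: {node!r}")


def prune_redundant_left_join(node, used: frozenset[str]):
    """Return the LHS of a LEFT join when dropping the join cannot change the result.

    Safe when (1) it is a LEFT join, so every LHS row survives; (2) the RHS is
    unique on the join key, so no LHS row is duplicated; and (3) no column
    from the RHS is used downstream.
    """
    if (
        isinstance(node, Join)
        and node.join_type == "left"
        and isinstance(node.right, Aggregate)
        and node.right.keys == frozenset({node.right_key})
        and used <= columns_of(node.left)
    ):
        return node.left
    return node


supplier = Scan("tpch.SUPPLIER", frozenset({"account_balance", "key", "nation_key"}))
partsupp = Scan("tpch.PARTSUPP", frozenset({"part_key", "supplier_key"}))
suppliers_with_parts = Aggregate(partsupp, frozenset({"supplier_key"}))
join = Join(supplier, suppliers_with_parts, "key", "supplier_key", "left")

# Only LHS columns are used, so the join collapses to the supplier scan.
pruned = prune_redundant_left_join(join, frozenset({"account_balance", "key"}))
print(pruned is supplier)  # True
```

An INNER join is deliberately left alone: it can drop LHS rows with no match, so it is not equivalent to its LHS even when the RHS is unique.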

vineetg3 · 2025-03-18T15:28:32Z

 ### MIN

-The `MIN` function returns the smallest value from the set of numerical values it is called on.
+The `MIN` function returns the smallest value from the set of values it is called on.


What are the expanded types of values considered?

In theory, MIN / MAX work on any type.
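Python's built-in `sqlite3` module gives a quick check that MIN is not limited to numbers; the table and rows below are invented for illustration. The second query also shows why MIN works as a fallback for a pass-through aggregation such as ANYTHING: when every row in a group has the same value, MIN returns exactly that value.

```python
import sqlite3

# Made-up rows for illustration; not taken from the PR's tests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (name TEXT, ship_date TEXT, nation TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?, ?)",
    [
        ("Supplier#B", "1995-03-01", "FRANCE", "EUROPE"),
        ("Supplier#A", "1994-07-15", "FRANCE", "EUROPE"),
        ("Supplier#C", "1996-01-20", "JAPAN", "ASIA"),
    ],
)

# MIN on TEXT columns: the smallest string under SQLite's default BINARY
# collation. ISO-8601 date strings sort chronologically, so MIN(ship_date)
# is the earliest date.
min_name, min_date = conn.execute(
    "SELECT MIN(name), MIN(ship_date) FROM supplier"
).fetchone()
print(min_name, min_date)  # Supplier#A 1994-07-15

# region has a single value within each nation group, so MIN(region) returns
# that value, just as ANY_VALUE would.
per_nation = conn.execute(
    "SELECT nation, MIN(region) FROM supplier GROUP BY nation ORDER BY nation"
).fetchall()
print(per_nation)  # [('FRANCE', 'EUROPE'), ('JAPAN', 'ASIA')]
conn.close()
```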

vineetg3

LGTM! The module-level READMEs could be updated to reflect the new logic, though.

knassre-bodo added 4 commits March 12, 2025 13:42

Rewriting tests to further encourage the desired behavior

9961d98

WIP

a042050

Achieved decorrelation optimization for singular correl queries 9/17

a4bec0e

Adding support for the aggregation cases (correl 6/18/19/20, some of the prev/next tests)

531c2b0

knassre-bodo changed the title ~~Rewriting tests to further encourage the desired behavior~~ Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed Mar 13, 2025

knassre-bodo added 2 commits March 13, 2025 12:52

[RUN CI]

9598df6

Added ANYTHING function to avoid calling MIN when possible

eac96ad

knassre-bodo commented Mar 13, 2025

knassre-bodo added 2 commits March 13, 2025 13:32

Revisions, cleanup, and comments/docstrings

8bec6a8

Added two additional correlation stress-tests for the new behavior [RUN CI]

f24ea90

knassre-bodo marked this pull request as ready for review March 13, 2025 18:36

knassre-bodo requested a review from vineetg3 March 13, 2025 18:36

knassre-bodo added 4 commits March 13, 2025 15:04

Fixing correl_24 refsol [RUN CI]

0a48157

Adjusted q5, added more correl queries similar to q5, fixed behavior after pullup transformation to remove now-defunct children and renumber the remainder [RUN CI]

153447c

Adding TOC entry for ANYTHING function

b97f8bc

Added more extreme edge case handling with PullUp operator, child removal, and name collisions, including multiple pullups happening together [RUN CI]

46f8bfb

knassre-bodo added 4 commits March 17, 2025 13:14

Removing dead code [RUN CI]

664dd45

Removing dead code [RUN CI]

6a26a80

Resolving conflicts with singular PR

de17970

Add redundant left-join pruning [RUN CI]

329551c

knassre-bodo commented Mar 17, 2025

knassre-bodo added 2 commits March 17, 2025 14:35

Refactoring correl test 29 [RUN CI]

db2c508

Adjusting correl_29 again to avoid sqlite parser stack overflow [RUN CI]

e83082c

vineetg3 reviewed Mar 18, 2025

vineetg3 approved these changes Mar 18, 2025

Final revisions [RUN CI]

b50baf2

knassre-bodo merged commit 5ff58a1 into main Mar 19, 2025
5 checks passed

knassre-bodo deleted the kian/decorell_opt branch March 19, 2025 15:28

knassre-bodo merged 19 commits into main from kian/decorell_opt
