Skip to content

Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed#295

Merged
knassre-bodo merged 19 commits intomainfrom
kian/decorell_opt
Mar 19, 2025
Merged

Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed#295
knassre-bodo merged 19 commits intomainfrom
kian/decorell_opt

Conversation

@knassre-bodo
Copy link
Copy Markdown
Contributor

@knassre-bodo knassre-bodo commented Mar 12, 2025

Resolves #270. See issue for more details on the relevant pattern. Certain tests were changed so they would use the aggregation+semi pattern that gets optimized by this PR. Also added an ANYTHING function (equivalent of ANY_VALUE in SQL) for any columns that become pass-through aggregations (SQLGLot converts this to MIN if the dialect does not have ANY_VALUE). Several other optimizations were also added:

  • Removing redundant left-joins when only the LHS is used during column pruning (since the structure of PyDough guarantees the cardinality is 1:1 and the RHS must have already been aggregated otherwise)
  • Removing any dead children nodes during hybrid de-correlation if the children are no longer used

@knassre-bodo knassre-bodo changed the title Rewriting tests to further encourage the desired behavior Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed Mar 13, 2025
Comment thread pydough/conversion/hybrid_tree.py Outdated
Comment on lines -1167 to -1168
if rhs_name not in child_node.terms:
breakpoint()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was debugging code from an earlier PR that was placed in a case where an exception occurs but didn't get deleted earlier.

@knassre-bodo knassre-bodo marked this pull request as ready for review March 13, 2025 18:36
@knassre-bodo knassre-bodo requested a review from vineetg3 March 13, 2025 18:36
…after pullup trnasformation to remove now-defunct children and renumber the remainder [RUN CI]
…oval, and name collisions, including multiple pullups happening together [RUN CI]
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Comment on lines -6 to +7
JOIN(conditions=[t0.key == t1.supplier_key], types=['left'], columns={'account_balance': t0.account_balance, 'key': t0.key})
FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key})
SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey})
AGGREGATE(keys={'supplier_key': supplier_key}, aggregations={})
JOIN(conditions=[t0.part_key == t1.key], types=['inner'], columns={'supplier_key': t0.supplier_key})
SCAN(table=tpch.PARTSUPP, columns={'part_key': ps_partkey, 'supplier_key': ps_suppkey})
SCAN(table=tpch.PART, columns={'key': p_partkey})
FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key})
SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey})
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good example of deleting useless joins (based on the change in the column pruner). We don't need lines 6 and 9-12 since lines 7-8 have the same cardinality as 9-12, and we aren't using any columns from 9-12, so we can just use 7-8.

### MIN

The `MIN` function returns the smallest value from the set of numerical values it is called on.
The `MIN` function returns the smallest value from the set of values it is called on.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the expanded types of values considered?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, MIN / MAX work on any type.

Copy link
Copy Markdown
Contributor

@vineetg3 vineetg3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Although the module level readmes could be updated to factor in the new logic.

@knassre-bodo knassre-bodo merged commit 5ff58a1 into main Mar 19, 2025
5 checks passed
@knassre-bodo knassre-bodo deleted the kian/decorell_opt branch March 19, 2025 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve de-correlation to remove duplicate subtrees where possible

2 participants