Optimize de-correlation to eliminate redundant parent when keeping rows without matches is not needed #295
| if rhs_name not in child_node.terms: | ||
| breakpoint() |
This was debugging code from an earlier PR. It was placed in a branch that is only reached when an exception occurs, and it never got deleted.
…after pullup transformation to remove now-defunct children and renumber the remainder [RUN CI]
…oval, and name collisions, including multiple pullups happening together [RUN CI]
| JOIN(conditions=[t0.key == t1.supplier_key], types=['left'], columns={'account_balance': t0.account_balance, 'key': t0.key}) | ||
| FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key}) | ||
| SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey}) | ||
| AGGREGATE(keys={'supplier_key': supplier_key}, aggregations={}) | ||
| JOIN(conditions=[t0.part_key == t1.key], types=['inner'], columns={'supplier_key': t0.supplier_key}) | ||
| SCAN(table=tpch.PARTSUPP, columns={'part_key': ps_partkey, 'supplier_key': ps_suppkey}) | ||
| SCAN(table=tpch.PART, columns={'key': p_partkey}) | ||
| FILTER(condition=nation_key <= 3:int64, columns={'account_balance': account_balance, 'key': key}) | ||
| SCAN(table=tpch.SUPPLIER, columns={'account_balance': s_acctbal, 'key': s_suppkey, 'nation_key': s_nationkey}) |
This is a good example of deleting useless joins (based on the change in the column pruner). We don't need lines 6 and 9-12 since lines 7-8 have the same cardinality as 9-12, and we aren't using any columns from 9-12, so we can just use 7-8.
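As a rough illustration of why a join like this can be dropped, here is a minimal sketch in plain Python. It is not the planner's actual code: `left_join_keep_left_columns` and the sample rows are hypothetical, and it assumes the right side is unique on the join key (as an `AGGREGATE` with only keys would be) and that no right-side columns are projected.

```python
def left_join_keep_left_columns(left_rows, right_keys, key):
    """Hypothetical left join that only projects left-side columns."""
    out = []
    for row in left_rows:
        matches = [k for k in right_keys if k == row[key]]
        # A left join emits one row per match, or one padded row if there is none.
        for _ in range(max(len(matches), 1)):
            out.append(dict(row))
    return out


suppliers = [
    {"key": 1, "account_balance": 100.0},
    {"key": 2, "account_balance": 250.5},
    {"key": 3, "account_balance": -10.0},
]

# Supplier keys as they might appear in a part-supplier table (with repeats).
partsupp_supplier_keys = [1, 1, 3, 3, 3]

# Deduplicated keys, standing in for AGGREGATE(keys={'supplier_key': ...}).
distinct_supplier_keys = sorted(set(partsupp_supplier_keys))

# Joining against unique keys leaves the left side unchanged, so the join is a no-op.
joined_unique = left_join_keep_left_columns(suppliers, distinct_supplier_keys, "key")

# Joining against non-unique keys changes the row count, so that join is NOT removable.
joined_duplicated = left_join_keep_left_columns(suppliers, partsupp_supplier_keys, "key")
```

With unique right-side keys, `joined_unique` equals `suppliers` row for row, which is the condition under which the join can be replaced by its left input. With repeated keys the row count grows, so the same shortcut would be incorrect.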
| ### MIN | ||
| The `MIN` function returns the smallest value from the set of numerical values it is called on. | ||
| The `MIN` function returns the smallest value from the set of values it is called on. |
What are the expanded types of values considered?
In theory, MIN / MAX work on any type.
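To make "any type" concrete, here is a small sketch using Python's built-in `min` and `max` as a stand-in for the library's `MIN` / `MAX` (an analogy only, not the library's API). The sample values are made up; the point is that any type with a total ordering, such as strings or dates, has a well-defined smallest and largest value.

```python
from datetime import date

# Hypothetical sample values of three different orderable types.
names = ["GERMANY", "ALGERIA", "PERU"]
ship_dates = [date(1995, 3, 15), date(1992, 1, 2), date(1998, 12, 1)]
balances = [100.0, -10.0, 250.5]

# Strings compare lexicographically, dates chronologically, floats numerically.
smallest = {
    "name": min(names),
    "ship_date": min(ship_dates),
    "balance": min(balances),
}
largest = {
    "name": max(names),
    "ship_date": max(ship_dates),
    "balance": max(balances),
}
```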
vineetg3 left a comment
LGTM! Although the module-level READMEs could be updated to factor in the new logic.
Resolves #270. See issue for more details on the relevant pattern. Certain tests were changed so they would use the aggregation+semi pattern that gets optimized by this PR. Also added an `ANYTHING` function (equivalent of `ANY_VALUE` in SQL) for any columns that become pass-through aggregations (SQLGlot converts this to `MIN` if the dialect does not have `ANY_VALUE`). Several other optimizations were also added: