Fixing child reference bug by removing caching during qualification #307

Merged
knassre-bodo merged 2 commits into main from kian/fix_qualbug on March 21, 2025

knassre-bodo
Contributor

The bug in question arose in PyDough queries like this:

lines_1995 = lines.WHERE(YEAR(order.order_date) == 1995)
lines_1996 = lines.WHERE(YEAR(order.order_date) == 1996)
parts_per_year = parts.CALCULATE(
    sum1995=SUM(lines_1995.quantity),
    sum1996=SUM(lines_1996.quantity),
)

The problem arises during qualification: order.order_date is derived twice, both times within the context of TPCH.parts.lines, so the second derivation reuses the cached result of the first. However, qualifying that term also has a side effect: it adds a child collection for order to the list of children of the collection being qualified. Because of the cache hit, the second lines collection (the one that becomes lines_1996) never gets that child added. The bug then surfaces much later in the pipeline, when lines_1996 tries to fetch its order date through a reference to a child node that does not exist.

The fix is to remove the caching from the qualification step of the pipeline. This was confirmed to have essentially no effect on the overall performance of the test suite.
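To make the failure mode concrete, here is a minimal, self-contained Python model of it. None of it is PyDough's actual code: `Collection`, `qualify_term`, `qualify_both`, and the `(context, term)` cache key are hypothetical names chosen only to mirror the description of the bug. Qualifying a term registers a child collection as a side effect, and a cache hit returns the earlier result without repeating that side effect, leaving the second collection with an expression that points at a child it does not have.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a collection being qualified (e.g. lines_1995)."""

    context: str  # ancestry it is qualified in, e.g. "TPCH.parts.lines"
    children: list[str] = field(default_factory=list)  # registered child collections


def qualify_term(coll: Collection, term: str, cache: dict | None) -> str:
    """Qualify a dotted term such as "order.order_date" against `coll`.

    Side effect: the sub-collection ("order") is appended to coll.children.
    The returned string stands in for the qualified expression, which refers
    to that child by its index.
    """
    # Hypothetical cache keyed on (context, term): lines_1995 and lines_1996
    # share the context "TPCH.parts.lines", so their keys collide.
    key = (coll.context, term)
    if cache is not None and key in cache:
        return cache[key]  # cache hit: the child is never added to coll
    child_name, attr = term.split(".", 1)
    coll.children.append(child_name)
    result = f"child[{len(coll.children) - 1}].{attr}"
    if cache is not None:
        cache[key] = result
    return result


def qualify_both(use_cache: bool) -> tuple[Collection, Collection, str]:
    """Qualify order.order_date for lines_1995 and then for lines_1996."""
    cache: dict | None = {} if use_cache else None
    lines_1995 = Collection("TPCH.parts.lines")
    lines_1996 = Collection("TPCH.parts.lines")
    qualify_term(lines_1995, "order.order_date", cache)
    expr_1996 = qualify_term(lines_1996, "order.order_date", cache)
    return lines_1995, lines_1996, expr_1996


# With caching, lines_1996 refers to child[0] but has no children (dangling).
_, buggy_1996, buggy_expr = qualify_both(use_cache=True)
print(buggy_expr, buggy_1996.children)  # child[0].order_date []

# Without caching, each collection registers its own "order" child.
_, fixed_1996, fixed_expr = qualify_both(use_cache=False)
print(fixed_expr, fixed_1996.children)  # child[0].order_date ['order']
```

The general point the sketch illustrates is that memoization is only safe for functions without side effects; once qualification mutates the collection it is qualifying, every call has to run in full, which is what removing the cache achieves.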

knassre-bodo merged commit 4b1492a into main on March 21, 2025
5 checks passed
knassre-bodo deleted the kian/fix_qualbug branch on March 21, 2025 at 19:14