-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Prevent exponential planning time for Window functions #17563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent exponential planning time for Window functions #17563
Conversation
It was added as limited to avoid long benchmark time. However, criterion just runs fewer iterations in such case. Larger benchmark range better shows the problem, while still being real-life scenario.
Before the change, the planning time was exponential with respect to number of columns used in window partitioning clause. This is a stop-gap solution to avoid exponential planning time.
1353a6f
to
92e9cb2
Compare
Will post benchmark results soon |
Before
After
A |
I will review this today |
🤖 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @findepi - while this change may fix the exponential planning time issue, it seems like it will introduce query runtime regressions (due to a new SortExec). Is there any other way to fix the problem?
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
I am pretty torn on this PR It clearly solves the planning time problem: (186x speedup!)
But it also causes a potential regression by resorting data during query time Woudl it be possible to do some sort of half-way solution, like maybe allow up to 4 window functions, and above that turn off the optimization? |
We can do that but it would feel like a lipstick. I really hope #17624 is addressed. @berkaysynnada knows how to fix this properly without half-means like cutoff. Let's not invest time in a solution that's going to be superseded soonish. |
I agree having a cutoff is a (very) non ideal solution and I also hope we can fix #17624 asap. The reason I don't like the idea of just turning off the optimization for everyone, is if I imagine this change from a user perspective:
I would very much feel like this is a pretty major regression for me The reason I proposed the cutoff is to reduce the number of users who are affected.
I understand that whatever value of cutoff we pick may still result in some people hitting a regression, but I think by picking a reasonable cutoff we'll avoid most problems |
planning time gets considerable around 8 columns, IIRC. |
Any exponential cost in the planner (or anywhere else) should be considered an absolute no-go and removed promptly. I can take a look whether adding a cutoff is easy. I am worried, however, that the added complexity can mask bugs. With cutoff of |
83bee08
to
39a3647
Compare
Added cutoff at the cost of test coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, not loving this workaround. If it can wait till this weekend, I can implement the actual fix I talked about earlier |
I think we can wait a few more days. Thank you @berkaysynnada 🙏 |
Before the change, the planning time was exponential with respect to number of columns used in window partitioning clause.
This is a stop-gap solution to avoid exponential planning time.
Which issue does this PR close?
Rationale for this change
Exponential planning time is not acceptable
What changes are included in this PR?
Reduce optimization eagerness to avoid exponential planning time
Are these changes tested?
benchmarks added in
Are there any user-facing changes?
i don't think so