findepi
Member

@findepi findepi commented Sep 15, 2025

Before the change, planning time was exponential with respect to the number of columns used in the window partitioning clause.

This is a stop-gap solution to avoid exponential planning time.
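
For readers who want to picture the kind of query involved, here is a minimal sketch that builds a SQL string with a window function partitioned by `n` columns over an inline `VALUES` relation. The function name `window_partition_query`, the column names `c0..`, and the exact query shape are assumptions made for illustration; the benchmark queries in the PR itself are not shown in this thread and may differ.

```rust
/// Builds a SQL query whose window function is partitioned by `n` columns
/// of an inline VALUES relation.
///
/// Hypothetical shape for illustration only: the column names (`c0`, `c1`, ...)
/// and the use of `max` are assumptions, not taken from the PR.
fn window_partition_query(n: usize) -> String {
    assert!(n > 0, "need at least one partition column");
    // Column names c0, c1, ..., c{n-1}.
    let columns: Vec<String> = (0..n).map(|i| format!("c{i}")).collect();
    // One row of literal values, one value per column.
    let values: Vec<String> = (0..n).map(|i| i.to_string()).collect();
    format!(
        "SELECT max(c0) OVER (PARTITION BY {cols}) FROM (VALUES ({vals})) AS t({cols})",
        cols = columns.join(", "),
        vals = values.join(", "),
    )
}
```

Calling `window_partition_query(12)` would produce a query with twelve partition columns, which is the size at which the benchmark numbers later in this thread become dramatic.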

Which issue does this PR close?

Rationale for this change

Exponential planning time is not acceptable

What changes are included in this PR?

Reduce optimization eagerness to avoid exponential planning time

Are these changes tested?

benchmarks added in

Are there any user-facing changes?

I don't think so.

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Sep 15, 2025
It was added as limited to avoid long benchmark time. However, criterion
just runs fewer iterations in such case. Larger benchmark range better
shows the problem, while still being real-life scenario.
Before the change, the planning time was exponential with respect to
number of columns used in window partitioning clause.

This is a stop-gap solution to avoid exponential planning time.
@findepi findepi force-pushed the findepi/prevent-exponential-planning-time-for-window-functions-a2de75 branch from 1353a6f to 92e9cb2 Compare September 15, 2025 08:31
@findepi
Member Author

findepi commented Sep 15, 2025

Will post benchmark results soon

@findepi
Member Author

findepi commented Sep 15, 2025

cargo bench --bench sql_planner -- physical_window_function_partition

Before

Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

physical_window_function_partition_by_7_on_values
                        time:   [22.985 ms 23.007 ms 23.033 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_window_function_partition_by_8_on_values: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.4s, or reduce sample count to 50.
physical_window_function_partition_by_8_on_values
                        time:   [94.241 ms 94.382 ms 94.541 ms]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking physical_window_function_partition_by_12_on_values: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 2839.3s, or reduce sample count to 10.
physical_window_function_partition_by_12_on_values
                        time:   [28.261 s 28.270 s 28.279 s]
Found 1 outliers among 100 measurements (1.00%)

After

physical_window_function_partition_by_4_on_values
                        time:   [173.42 µs 173.49 µs 173.58 µs]
                        change: [-64.728% -64.636% -64.541%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

physical_window_function_partition_by_7_on_values
                        time:   [224.85 µs 225.07 µs 225.32 µs]
                        change: [-99.020% -99.019% -99.016%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

physical_window_function_partition_by_8_on_values
                        time:   [245.12 µs 245.79 µs 246.64 µs]
                        change: [-99.741% -99.740% -99.739%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

physical_window_function_partition_by_12_on_values
                        time:   [313.47 µs 315.07 µs 317.13 µs]
                        change: [-99.999% -99.999% -99.999%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 34 outliers among 100 measurements (34.00%)
  6 (6.00%) low severe
  17 (17.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

A -99.999% improvement on a query involving just 12 columns. Not bad.
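
The arithmetic behind these numbers can be checked with a small sketch. The helper names below are hypothetical, and the inputs are the median times copied from the "Before" and "After" output above. The roughly 4x growth per added column is only what these three data points suggest, not a statement about the planner's actual complexity.

```rust
/// Ratio between two planning times (same unit for both arguments).
fn growth_factor(slower: f64, faster: f64) -> f64 {
    slower / faster
}

/// Average growth per added partition column between two measurements
/// that are `column_gap` columns apart (geometric mean of the ratio).
fn per_column_growth(slower: f64, faster: f64, column_gap: u32) -> f64 {
    (slower / faster).powf(1.0 / f64::from(column_gap))
}

/// Relative change in percent, computed as (after - before) / before.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Median "Before" planning times from the thread, in milliseconds.
const BEFORE_7_COLUMNS_MS: f64 = 23.007;
const BEFORE_8_COLUMNS_MS: f64 = 94.382;
const BEFORE_12_COLUMNS_MS: f64 = 28_270.0;
/// Median "After" planning time for 12 columns, in milliseconds (315.07 µs).
const AFTER_12_COLUMNS_MS: f64 = 0.31507;
```

With these inputs, going from 7 to 8 columns multiplies planning time by about 4.1, the average step from 8 to 12 columns is about 4.16 per column, and the 12-column case changes by about -99.999%, matching the figure criterion reported.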

@findepi findepi requested a review from adriangb September 15, 2025 12:14
@findepi findepi added the performance Make DataFusion faster label Sep 16, 2025
@alamb alamb mentioned this pull request Sep 16, 2025
18 tasks
@alamb
Contributor

alamb commented Sep 16, 2025

I will review this today

@alamb
Contributor

alamb commented Sep 16, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing findepi/prevent-exponential-planning-time-for-window-functions-a2de75 (92e9cb2) to e2c2d38 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=physical_window_function_partition
BENCH_BRANCH_NAME=findepi_prevent-exponential-planning-time-for-window-functions-a2de75
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thanks @findepi - while this change may fix the exponential planning time issue, it seems like it will introduce query runtime regressions (due to a new SortExec). Is there any other way to fix the problem?

@alamb
Contributor

alamb commented Sep 16, 2025

🤖: Benchmark completed

Details

group                                                 findepi_prevent-exponential-planning-time-for-window-functions-a2de75    main
-----                                                 ---------------------------------------------------------------------    ----
physical_window_function_partition_by_12_on_values    1.00   898.8±10.82µs        ? ?/sec                                    
physical_window_function_partition_by_30_on_values    1.00   1634.9±8.15µs        ? ?/sec                                    
physical_window_function_partition_by_4_on_values     1.00    563.3±5.87µs        ? ?/sec                                      2.39  1345.8±26.76µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    683.9±3.80µs        ? ?/sec                                      51.43    35.2±0.17ms        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    733.7±2.76µs        ? ?/sec                                      186.86   137.1±0.89ms        ? ?/sec

@alamb
Contributor

alamb commented Sep 16, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing findepi/prevent-exponential-planning-time-for-window-functions-a2de75 (92e9cb2) to e2c2d38 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=physical_window_function_partition
BENCH_BRANCH_NAME=findepi_prevent-exponential-planning-time-for-window-functions-a2de75
Results will be posted here when complete

@alamb
Contributor

alamb commented Sep 16, 2025

🤖: Benchmark completed

Details

group                                                 findepi_prevent-exponential-planning-time-for-window-functions-a2de75    main
-----                                                 ---------------------------------------------------------------------    ----
physical_window_function_partition_by_12_on_values    1.00    904.6±7.53µs        ? ?/sec                                    
physical_window_function_partition_by_30_on_values    1.00   1651.7±9.87µs        ? ?/sec                                    
physical_window_function_partition_by_4_on_values     1.00    564.8±9.21µs        ? ?/sec                                      2.34   1321.4±8.29µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    691.8±6.27µs        ? ?/sec                                      50.51    34.9±0.14ms        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    738.6±4.80µs        ? ?/sec                                      184.64   136.4±0.76ms        ? ?/sec

@findepi findepi requested a review from alamb September 17, 2025 14:16
@alamb
Contributor

alamb commented Sep 18, 2025

I am pretty torn on this PR

It clearly solves the planning time problem: (186x speedup!)

physical_window_function_partition_by_4_on_values     1.00    563.3±5.87µs        ? ?/sec                                      2.39  1345.8±26.76µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    683.9±3.80µs        ? ?/sec                                      51.43    35.2±0.17ms        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    733.7±2.76µs        ? ?/sec                                      186.86   137.1±0.89ms        ? ?/sec

But it also causes a potential regression by re-sorting data at query time.

Would it be possible to do some sort of half-way solution, like maybe allowing up to 4 window functions and turning off the optimization above that?

@findepi
Member Author

findepi commented Sep 18, 2025

We can do that, but it would feel like lipstick. I really hope #17624 is addressed. @berkaysynnada knows how to fix this properly without half-measures like a cutoff. Let's not invest time in a solution that's going to be superseded soonish.

@alamb
Contributor

alamb commented Sep 18, 2025

We can do that, but it would feel like lipstick. I really hope #17624 is addressed. @berkaysynnada knows how to fix this properly without half-measures like a cutoff. Let's not invest time in a solution that's going to be superseded soonish.

I agree that having a cutoff is a (very) non-ideal solution, and I also hope we can fix #17624 ASAP.

The reason I don't like the idea of just turning off the optimization for everyone is how this change looks from a user's perspective:

  1. I am currently running my queries that have 3 window functions using DataFusion 49.0.0 / DataFusion 50.0.0, and everything is great!
  2. When I upgrade to DataFusion 50.0.1 my queries get much slower (b/c now there is a bunch more sorting happening)
  3. When I ask why my queries got slower, I get told "so people who have 30 window functions don't have problems"

I would very much feel like this is a pretty major regression for me

The reason I proposed the cutoff is to reduce the number of users who are affected.

  1. Users who are running today under the cutoff don't experience a regression and still get the same performance.
  2. There probably aren't many people using 20 window functions given they would hit exponential planning time anyways

I understand that whatever value of cutoff we pick may still result in some people hitting a regression, but I think by picking a reasonable cutoff we'll avoid most problems
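
A minimal sketch of what such a cutoff could look like follows. Everything here is an assumption made for illustration: the constant name, the value 4 (which the thread only floats as an example), the enum, and the choice to count partition columns rather than window functions. It is not the code that was pushed to this PR.

```rust
/// Hypothetical threshold. The thread only mentions "up to 4" as an example.
const MAX_PARTITION_COLUMNS_FOR_EAGER_OPTIMIZATION: usize = 4;

/// What the planner would do for a given window expression (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum PlanningChoice {
    /// Keep the existing optimization, whose planning cost grows
    /// exponentially with the number of partition columns.
    EagerOptimization,
    /// Skip the optimization. Per the review comments, this may add
    /// sorting at query time.
    SkipOptimization,
}

/// Picks the planning behavior from the number of partition columns.
fn choose_planning(partition_columns: usize) -> PlanningChoice {
    if partition_columns <= MAX_PARTITION_COLUMNS_FOR_EAGER_OPTIMIZATION {
        PlanningChoice::EagerOptimization
    } else {
        PlanningChoice::SkipOptimization
    }
}
```

Under this sketch, a query with 3 partition columns keeps today's behavior, while one with 30 skips the expensive path and accepts a possible sort at run time.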

@findepi
Member Author

findepi commented Sep 18, 2025

3. ... I get told "so people who have 30 window functions don't have problems"

planning time gets considerable around 8 columns, IIRC.
It's really not a theoretical problem. 30-ish columns was what I saw in user queries (plural)

@findepi
Member Author

findepi commented Sep 18, 2025

Any exponential cost in the planner (or anywhere else) should be considered an absolute no-go and removed promptly. I can take a look at whether adding a cutoff is easy. I am worried, however, that the added complexity can mask bugs. With a cutoff of N, we should have test coverage for various window functions with fewer than N and more than N columns, which is something we are unlikely to remember.
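
The coverage concern can be made concrete with a small sketch: for a cutoff of `n`, exercise the column counts just below, at, and just above it, and check that the behavior flips exactly there. Both helper names are hypothetical, and treating the cutoff as inclusive (`cols <= n` keeps the optimization) is an assumption, since the thread does not say which side of N the boundary falls on.

```rust
/// Column counts worth testing for a cutoff of `n`:
/// just below (when that is still at least one column), at, and just above.
fn boundary_column_counts(n: usize) -> Vec<usize> {
    let mut counts = Vec::new();
    if n > 1 {
        counts.push(n - 1);
    }
    counts.push(n);
    counts.push(n + 1);
    counts
}

/// Returns true when `uses_optimization` is enabled for every boundary
/// column count up to and including `n`, and disabled above it.
fn flips_exactly_at_cutoff(n: usize, uses_optimization: impl Fn(usize) -> bool) -> bool {
    boundary_column_counts(n)
        .into_iter()
        .all(|cols| uses_optimization(cols) == (cols <= n))
}
```

A check like `flips_exactly_at_cutoff(4, |cols| cols <= 4)` passes, while an off-by-one predicate such as `|cols| cols < 4` fails, which is the kind of bug the cutoff could otherwise hide.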

@findepi findepi force-pushed the findepi/prevent-exponential-planning-time-for-window-functions-a2de75 branch from 83bee08 to 39a3647 Compare September 18, 2025 19:14
@findepi
Member Author

findepi commented Sep 18, 2025

Added cutoff at the cost of test coverage.

Contributor

@alamb alamb left a comment

Thank you. I agree it will be nice to get the real issue #17624 fixed asap

I added it to the list of items we should fix for

@berkaysynnada
Contributor

tbh, not loving this workaround. If it can wait till this weekend, I can implement the actual fix I talked about earlier

@alamb
Contributor

alamb commented Sep 19, 2025

tbh, not loving this workaround. If it can wait till this weekend, I can implement the actual fix I talked about earlier

I think we can wait a few more days. Thank you @berkaysynnada 🙏

@findepi findepi closed this Sep 22, 2025
@findepi findepi deleted the findepi/prevent-exponential-planning-time-for-window-functions-a2de75 branch September 22, 2025 15:31
Labels
core Core DataFusion crate datasource Changes to the datasource crate performance Make DataFusion faster physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Exponential planning time when window function is partitioned by multiple columns
3 participants