Add short circuit evaluation for AND and OR #15462

Open
wants to merge 13 commits into
base: main
from

Conversation

acking-you
Contributor

@acking-you acking-you commented Mar 27, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  1. Add an extended SQL statement targeting this optimization.
  2. Provide a description of this extended SQL.
  3. Using this optimization, the extended SQL achieves a 2X performance improvement. (If the strings in the filter are longer, the effect can increase linearly, such as the 500X reported in "BinaryExpr evaluate lacks optimization for Or and And scenarios" #11212 (comment).)

Below is the performance comparison of running the extended SQL locally. It seems there is also some improvement in Q4 (maybe noise).

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  964.48ms │         1006.79ms │     no change │
│ QQuery 1     │  409.89ms │          419.52ms │     no change │
│ QQuery 2     │  838.25ms │          868.74ms │     no change │
│ QQuery 3     │  408.15ms │          396.88ms │     no change │
│ QQuery 4     │ 1029.80ms │          783.40ms │ +1.31x faster │
│ QQuery 5     │ 9429.76ms │         8835.06ms │ +1.07x faster │
│ QQuery 6     │ 4096.47ms │         1382.42ms │ +2.96x faster │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                │ 17176.80ms │
│ Total Time (add_short_circuit)   │ 13692.80ms │
│ Average Time (main)              │  2453.83ms │
│ Average Time (add_short_circuit) │  1956.11ms │
│ Queries Faster                   │          3 │
│ Queries Slower                   │          0 │
│ Queries with No Change           │          4 │
└──────────────────────────────────┴────────────┘

At the same time, while creating this SQL, I also discovered a bug: one of the filters caused a panic (#15461).

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 27, 2025
@acking-you
Contributor Author

Some tests failed, so let me take a look at what exactly is going on.

@ctsk
Contributor

ctsk commented Mar 28, 2025

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

@acking-you
Contributor Author

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

Thank you very much for your hint, it will be very helpful for me to fix these tests!

@acking-you
Contributor Author

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

After a closer look, the situation you mentioned does not actually reach the short-circuit optimization logic.
The error I'm currently seeing comes from using true_count() == 0 to decide that every value is false, when in reality some values could also be null.
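
To make the null pitfall concrete, here is a minimal std-only sketch (not the PR's code). It models a boolean column as a slice of `Option<bool>`, with `None` standing in for NULL, and assumes that `true_count` and `false_count` only count non-null values. The helper names are hypothetical stand-ins for illustration.

```rust
/// SQL three-valued AND, where `None` models NULL.
fn kleene_and(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        // A definite false on either side decides the result.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // Every remaining combination involves a NULL that matters.
        _ => None,
    }
}

/// SQL three-valued OR, where `None` models NULL.
fn kleene_or(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        // A definite true on either side decides the result.
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Number of non-null `true` values in the column.
fn true_count(col: &[Option<bool>]) -> usize {
    col.iter().filter(|v| **v == Some(true)).count()
}

/// Number of non-null `false` values in the column.
fn false_count(col: &[Option<bool>]) -> usize {
    col.iter().filter(|v| **v == Some(false)).count()
}
```

For a left-hand column such as `[Some(false), None]`, `true_count` is 0, yet `false_count` is smaller than the length, so the column is not "all false": the NULL row still needs the right-hand side, because `kleene_and(None, Some(true))` is NULL rather than false.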

@ctsk
Contributor

ctsk commented Mar 28, 2025

You're absolutely right, I got my logic wrong there. Embarrassing!

@acking-you
Contributor Author

You're absolutely right, I got my logic wrong there. Embarrassing!

It's okay. You've also taught me a lot. When I first started writing this, I really didn't consider the case of null

@acking-you
Contributor Author

Hello @alamb, the optimization SQL and documentation related to this PR have been completed, and all tests have passed. We may need to formally verify the performance, but I'm not quite sure how to do that (I can only run it locally).

Contributor

@alamb alamb left a comment

Thanks @acking-you -- this looks really nice.

I think this needs some tests but otherwise it is looking quite nice.

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?


### Q6: How many social shares meet complex multi-stage filtering criteria?
**Question**: What is the count of sharing actions from iPhone mobile users on specific social networks, within common timezones, participating in seasonal campaigns, with high screen resolutions and closely matched UTM parameters?
**Important Query Properties**: Simple filter with high-selectivity, Costly string matching, A large number of filters with high overhead are positioned relatively later in the process
Contributor

👍

@@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr {
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
use arrow::compute::kernels::numeric::*;

fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool {
Contributor

Thanks @acking-you -- this looks great

Is there any reason to have this function defined in the evaluate method? I think we could just make it a normal function and reduce the nesting level

Contributor

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

Contributor Author

@acking-you acking-you Mar 29, 2025

Is there any reason to have this function defined in the evaluate method?

There was no particular reason. Maybe I couldn't find a suitable place to write it at the time, haha. Where do you think this function should be placed?

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

I also agree that

Contributor Author

I've moved the function outside and added some comments.
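
As a rough illustration of what a standalone helper could look like, here is a std-only sketch. It is not the PR's implementation: the real function takes a `&ColumnarValue` and an `&Operator`, whereas this version uses a hypothetical two-variant `Operator` enum and a slice of `Option<bool>` (with `None` modelling NULL) so that it compiles without any crates.

```rust
/// Hypothetical stand-in for the boolean operators of interest.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    And,
    Or,
}

/// Returns true when the left-hand side alone decides the result,
/// so the right-hand side does not need to be evaluated.
///
/// - AND: every value is a non-null `false` (false AND x is always false).
/// - OR:  every value is a non-null `true`  (true OR x is always true).
///
/// Any NULL on the left disables the shortcut, because NULL AND x and
/// NULL OR x both depend on x.
fn check_short_circuit(lhs: &[Option<bool>], op: Operator) -> bool {
    match op {
        Operator::And => lhs.iter().all(|v| *v == Some(false)),
        Operator::Or => lhs.iter().all(|v| *v == Some(true)),
    }
}
```

When the helper returns true, the caller can return the left-hand side unchanged as the result of the whole expression. Note that this sketch also returns true for an empty input, which is harmless because the result is then empty as well.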

@alamb alamb changed the title from "Add short circuit" to "Add short circuit evaluation for AND and OR" Mar 28, 2025
@acking-you
Contributor Author

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?

Okay, I got it. Do you mean that Q6 and its related description in the current branch should go into a separate PR? But then it seems you would still need to cherry-pick Q6 onto the corresponding branch when testing.

acking-you and others added 12 commits March 31, 2025 10:51
…ultiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option
* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>
@acking-you
Contributor Author

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?

I have successfully split Q6 into: #15500
Thank you very much for your CR. @alamb

match arg {
ColumnarValue::Array(array) => {
if let Ok(array) = as_boolean_array(&array) {
return array.false_count() == array.len();
Contributor

I remember this had some overhead (for calculating the counts) from a previous try.
I wonder if it helps to short optimize this expression (e.g. match until we get a chunk of the bitmap != 0)

Contributor Author

@acking-you acking-you Mar 31, 2025

I wonder if it helps to short optimize this expression (e.g. match until we get a chunk of the bitmap != 0)

I think the overhead added here should be very small (the compiler optimization should work well), and the test results we discussed before were sometimes fast and sometimes slow (maybe noise).

Your suggestion of making an early judgment and returning false seems like a good idea, but I'm not sure if it will be effective.
The concern I have with this approach is that it requires adding an if condition inside the for loop, which will most likely disable the compiler's SIMD instruction optimization (I've encountered a similar situation before, and I had to manually unroll the SIMD...).
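
The two strategies being compared can be sketched as follows. This is a simplified std-only illustration, not Arrow's implementation: it assumes the boolean values are packed into whole `u64` words with no nulls and no partially used trailing word, and both function names are hypothetical.

```rust
/// Count-based check: visits every word and sums the set bits.
/// The loop body has no branch, so it is a good candidate for
/// auto-vectorization, but it always reads the whole bitmap.
fn all_false_by_count(words: &[u64]) -> bool {
    let set_bits: u64 = words.iter().map(|w| u64::from(w.count_ones())).sum();
    set_bits == 0
}

/// Early-exit check: stops at the first word with any bit set.
/// It can finish after one word when a `true` appears early, at the
/// cost of a data-dependent branch on every iteration.
fn all_false_early_exit(words: &[u64]) -> bool {
    words.iter().all(|&w| w == 0)
}
```

Both functions return the same answer; which one is faster depends on the data and on what the compiler does with each loop, so it would have to be measured rather than assumed.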

Contributor

@Dandandan Dandandan Mar 31, 2025

Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or which operates on u64 values to test performance changes.

Contributor Author

Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or which operates on u64 values to test performance changes.

Thank you for your suggestion. I will try it later.

Contributor

Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Almost seems like something the compiler could automagically do...
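
A minimal sketch of that sampling idea, under the same simplifying assumptions as a plain bitmap of `u64` words with no nulls. The constant `PROBE_WORDS` and the function name are hypothetical, and the value 4 is arbitrary rather than tuned.

```rust
/// Hypothetical number of leading words to probe with an early exit.
const PROBE_WORDS: usize = 4;

/// Returns true when no bit is set in `words`.
fn all_false_hybrid(words: &[u64]) -> bool {
    let split = PROBE_WORDS.min(words.len());
    let (head, tail) = words.split_at(split);

    // Phase 1: branchy scan of the first few words, which exits
    // immediately if a set bit shows up early.
    if head.iter().any(|&w| w != 0) {
        return false;
    }

    // Phase 2: unconditional OR-reduction over the rest. There is no
    // branch inside the loop, so the compiler is free to vectorize it.
    tail.iter().fold(0u64, |acc, &w| acc | w) == 0
}
```

The first phase only pays off when a set bit tends to appear near the start of the bitmap; otherwise the function behaves like the unconditional loop plus a small fixed cost.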

Contributor Author

Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Thank you for your suggestion, but if we're only applying conditional checks to the first few blocks, then I feel this optimization might not be meaningful. If nearly all blocks can be filtered out by the preceding filter, the optimization will no longer be effective.

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

I tend to agree with @alamb's point that if the overhead of verification is somewhat unacceptable, adopting some heuristic approaches would be better.

Contributor

I looked more carefully at bool_or and I do think it would be faster than this implementation on the case where there are some true values (as it stops as soon as it finds a single non zero): https://docs.rs/arrow/latest/arrow/compute/fn.bool_or.html

@alamb alamb added the performance Make DataFusion faster label Mar 31, 2025
Contributor

@alamb alamb left a comment

Thank you @acking-you -- I am running the benchmarks now, and assuming this shows no performance regressions I think we should merge it in

We can perhaps try @Dandandan 's suggestion for bool_and etc as a follow on PR / issue (I can file a follow on issue).

I looked for a binary expression and/or micro benchmark and could not find one: https://github.com/apache/datafusion/tree/c929a1cd133590e4944bc2c7900611220450335a/datafusion/physical-expr/benches

@acking-you
Contributor Author

I looked for a binary expression and/or micro benchmark and could not find one: https://github.com/apache/datafusion/tree/c929a1cd133590e4944bc2c7900611220450335a/datafusion/physical-expr/benches

Thank you for your suggestions and help. I also noticed this part of the code yesterday. I've had a cold and fever in the past few days, so I haven't tried the method mentioned by @Dandandan for benchmarking yet. I should be able to work on it today.

@alamb
Contributor

alamb commented Apr 3, 2025

I looked for a binary expression and/or micro benchmark and could not find one: https://github.com/apache/datafusion/tree/c929a1cd133590e4944bc2c7900611220450335a/datafusion/physical-expr/benches

Thank you for your suggestions and help. I also noticed this part of the code yesterday. I've had a cold and fever in the past few days, so I haven't tried the method mentioned by @Dandandan for benchmarking yet. I should be able to work on it today.

I hope you feel better soon

I have been running some benchmark numbers (below) and I have gotten some strange results. I definitely see this PR improves performance on the newly added clickbench extended benchmark (almost 3x faster!)

However clickbench_1 seems to show a slowdown compared to clickbench_partitioned which I don't understand. I will try and reproduce this locally

Details

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ add_short_circuit ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.58ms │            0.59ms │    no change │
│ QQuery 1     │    70.93ms │           72.44ms │    no change │
│ QQuery 2     │   118.33ms │          120.96ms │    no change │
│ QQuery 3     │   124.52ms │          130.17ms │    no change │
│ QQuery 4     │   788.26ms │          835.71ms │ 1.06x slower │
│ QQuery 5     │   891.18ms │         1033.71ms │ 1.16x slower │
│ QQuery 6     │    68.02ms │           66.47ms │    no change │
│ QQuery 7     │    79.51ms │           80.38ms │    no change │
│ QQuery 8     │   951.66ms │          962.46ms │    no change │
│ QQuery 9     │  1242.83ms │         1277.46ms │    no change │
│ QQuery 10    │   305.01ms │          312.25ms │    no change │
│ QQuery 11    │   334.19ms │          341.58ms │    no change │
│ QQuery 12    │   941.89ms │         1079.60ms │ 1.15x slower │
│ QQuery 13    │  1341.24ms │         1528.55ms │ 1.14x slower │
│ QQuery 14    │   874.85ms │         1057.59ms │ 1.21x slower │
│ QQuery 15    │  1118.77ms │         1134.54ms │    no change │
│ QQuery 16    │  1803.30ms │         1920.39ms │ 1.06x slower │
│ QQuery 17    │  1669.97ms │         1780.42ms │ 1.07x slower │
│ QQuery 18    │  3269.95ms │         3297.16ms │    no change │
│ QQuery 19    │   120.62ms │          116.02ms │    no change │
│ QQuery 20    │  1222.20ms │         1302.52ms │ 1.07x slower │
│ QQuery 21    │  1471.42ms │         1638.85ms │ 1.11x slower │
│ QQuery 22    │  2740.23ms │         4508.20ms │ 1.65x slower │
│ QQuery 23    │  8733.62ms │        10619.68ms │ 1.22x slower │
│ QQuery 24    │   515.74ms │          695.11ms │ 1.35x slower │
│ QQuery 25    │   431.26ms │          608.92ms │ 1.41x slower │
│ QQuery 26    │   568.41ms │          743.79ms │ 1.31x slower │
│ QQuery 27    │  1865.82ms │         1966.63ms │ 1.05x slower │
│ QQuery 28    │ 13400.82ms │        13320.07ms │    no change │
│ QQuery 29    │   574.24ms │          590.87ms │    no change │
│ QQuery 30    │   863.90ms │         1059.33ms │ 1.23x slower │
│ QQuery 31    │   940.63ms │         1115.46ms │ 1.19x slower │
│ QQuery 32    │  2790.62ms │         2933.31ms │ 1.05x slower │
│ QQuery 33    │  3534.40ms │         3583.06ms │    no change │
│ QQuery 34    │  3515.54ms │         3592.40ms │    no change │
│ QQuery 35    │  1327.76ms │         1384.98ms │    no change │
│ QQuery 36    │   257.33ms │          269.35ms │    no change │
│ QQuery 37    │   124.48ms │          187.43ms │ 1.51x slower │
│ QQuery 38    │   167.20ms │          182.73ms │ 1.09x slower │
│ QQuery 39    │   454.63ms │          474.86ms │    no change │
│ QQuery 40    │    82.41ms │           83.28ms │    no change │
│ QQuery 41    │    82.20ms │           80.21ms │    no change │
│ QQuery 42    │    78.13ms │           87.76ms │ 1.12x slower │
└──────────────┴────────────┴───────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 61858.61ms │
│ Total Time (add_short_circuit)   │ 68177.23ms │
│ Average Time (main_base)         │  1438.57ms │
│ Average Time (add_short_circuit) │  1585.52ms │
│ Queries Faster                   │          0 │
│ Queries Slower                   │         21 │
│ Queries with No Change           │         22 │
└──────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  1914.62ms │         1954.11ms │     no change │
│ QQuery 1     │   757.11ms │          760.32ms │     no change │
│ QQuery 2     │  1478.24ms │         1495.54ms │     no change │
│ QQuery 3     │   700.15ms │          705.43ms │     no change │
│ QQuery 4     │  1489.20ms │         1457.77ms │     no change │
│ QQuery 5     │ 17258.06ms │        17048.63ms │     no change │
│ QQuery 6     │  6872.16ms │         2318.32ms │ +2.96x faster │
└──────────────┴────────────┴───────────────────┴───────────────┘

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.45ms │            2.71ms │  1.11x slower │
│ QQuery 1     │    36.96ms │           38.85ms │  1.05x slower │
│ QQuery 2     │    93.10ms │           98.35ms │  1.06x slower │
│ QQuery 3     │    99.41ms │          102.08ms │     no change │
│ QQuery 4     │   771.09ms │          762.77ms │     no change │
│ QQuery 5     │   880.06ms │          878.60ms │     no change │
│ QQuery 6     │    37.74ms │           34.11ms │ +1.11x faster │
│ QQuery 7     │    42.34ms │           43.34ms │     no change │
│ QQuery 8     │   961.07ms │          940.11ms │     no change │
│ QQuery 9     │  1212.05ms │         1215.74ms │     no change │
│ QQuery 10    │   272.28ms │          280.09ms │     no change │
│ QQuery 11    │   314.24ms │          321.60ms │     no change │
│ QQuery 12    │   947.28ms │          945.83ms │     no change │
│ QQuery 13    │  1412.66ms │         1420.04ms │     no change │
│ QQuery 14    │   883.62ms │          889.96ms │     no change │
│ QQuery 15    │  1066.12ms │         1084.71ms │     no change │
│ QQuery 16    │  1826.85ms │         1763.33ms │     no change │
│ QQuery 17    │  1654.90ms │         1636.73ms │     no change │
│ QQuery 18    │  3133.68ms │         3105.68ms │     no change │
│ QQuery 19    │    88.53ms │           91.74ms │     no change │
│ QQuery 20    │  1154.68ms │         1136.04ms │     no change │
│ QQuery 21    │  1387.95ms │         1325.31ms │     no change │
│ QQuery 22    │  2526.34ms │         2333.56ms │ +1.08x faster │
│ QQuery 23    │  8563.31ms │         8668.93ms │     no change │
│ QQuery 24    │   486.09ms │          481.52ms │     no change │
│ QQuery 25    │   410.40ms │          407.97ms │     no change │
│ QQuery 26    │   543.51ms │          554.22ms │     no change │
│ QQuery 27    │  1721.79ms │         1685.69ms │     no change │
│ QQuery 28    │ 12803.31ms │        12658.72ms │     no change │
│ QQuery 29    │   526.02ms │          537.58ms │     no change │
│ QQuery 30    │   863.88ms │          860.27ms │     no change │
│ QQuery 31    │   907.18ms │          909.85ms │     no change │
│ QQuery 32    │  2749.24ms │         2724.90ms │     no change │
│ QQuery 33    │  3434.16ms │         3391.96ms │     no change │
│ QQuery 34    │  3406.33ms │         3378.15ms │     no change │
│ QQuery 35    │  1306.15ms │         1301.79ms │     no change │
│ QQuery 36    │   230.41ms │          223.25ms │     no change │
│ QQuery 37    │    89.36ms │           88.77ms │     no change │
│ QQuery 38    │   129.02ms │          127.27ms │     no change │
│ QQuery 39    │   418.02ms │          408.13ms │     no change │
│ QQuery 40    │    52.52ms │           48.55ms │ +1.08x faster │
│ QQuery 41    │    45.36ms │           46.73ms │     no change │
│ QQuery 42    │    55.92ms │           55.00ms │     no change │
└──────────────┴────────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 59547.38ms │
│ Total Time (add_short_circuit)   │ 59010.53ms │
│ Average Time (main_base)         │  1384.82ms │
│ Average Time (add_short_circuit) │  1372.34ms │
│ Queries Faster                   │          3 │
│ Queries Slower                   │          3 │
│ Queries with No Change           │         37 │
└──────────────────────────────────┴────────────┘

Contributor

@alamb alamb left a comment

I think this PR is good to go - thank you for the nice improvement @acking-you

There is a very nice 3x improvement in the extended clickbench benchmarks

--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  1914.62ms │         1954.11ms │     no change │
│ QQuery 1     │   757.11ms │          760.32ms │     no change │
│ QQuery 2     │  1478.24ms │         1495.54ms │     no change │
│ QQuery 3     │   700.15ms │          705.43ms │     no change │
│ QQuery 4     │  1489.20ms │         1457.77ms │     no change │
│ QQuery 5     │ 17258.06ms │        17048.63ms │     no change │
│ QQuery 6     │  6872.16ms │         2318.32ms │ +2.96x faster │
└──────────────┴────────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)           │ 30469.53ms │
│ Total Time (add_short_circuit)   │ 25740.13ms │
│ Average Time (main_base)         │  4352.79ms │
│ Average Time (add_short_circuit) │  3677.16ms │
│ Queries Faster                   │          1 │
│ Queries Slower                   │          0 │
│ Queries with No Change           │          6 │
└──────────────────────────────────┴────────────┘

And I didn't find any reproducible slowdown for other queries.

I do think it is worth switching this to use bool_or / bool_and to avoid having to count bits (only detect if there is one) but we could do it as a follow on PR as well

Labels
performance Make DataFusion faster physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BinaryExpr evaluate lacks optimization for Or and And scenarios
6 participants