GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project #45593
ci/scripts/cpp_build.sh
IIRC, the hash join benchmark has never been run in our CI, and this is probably why.
cpp/src/arrow/acero/CMakeLists.txt
I feel comfortable removing one dependency.
@ursabot please benchmark
Benchmark runs are scheduled for commit c27cf8acfe7d13123a9185ad04a72870ab355731. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
[&](std::function<Status(size_t)> task) -> Status {
  return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
},
thread_pool_->GetCapacity(), settings.num_threads == 1));
Why do we need to change this from 2 * settings.num_threads to thread_pool_->GetCapacity() (i.e. settings.num_threads)? Was the original 2 * settings.num_threads wrong?
This argument controls the max number of concurrent tasks in the scheduler, so any value >= settings.num_threads is fine. (The original doubling isn't wrong though.)
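To illustrate the point about the concurrency cap, here is a minimal standard-library sketch. It is a toy model, not Arrow's scheduler: `RunWithCap`, `RunStats`, and the sleep that stands in for work are all hypothetical. It assumes a fixed set of worker threads that only start a task when fewer than `max_concurrent` tasks are in flight, and records the peak number in flight.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct RunStats {
  int completed;       // tasks that finished
  int peak_in_flight;  // highest number of tasks running at the same time
};

// Toy model (assumption, not Arrow code): num_threads workers claim tasks,
// but a task may only start while fewer than max_concurrent are in flight.
RunStats RunWithCap(int num_threads, int max_concurrent, int num_tasks) {
  assert(num_threads > 0 && max_concurrent > 0);
  std::mutex mutex;
  std::condition_variable slot_free;
  int next_task = 0, in_flight = 0, peak = 0, completed = 0;

  auto worker = [&] {
    for (;;) {
      {
        std::unique_lock<std::mutex> lock(mutex);
        // Wait until there is a free slot or no work is left to claim.
        slot_free.wait(lock, [&] {
          return next_task >= num_tasks || in_flight < max_concurrent;
        });
        if (next_task >= num_tasks) return;
        ++next_task;
        ++in_flight;
        peak = std::max(peak, in_flight);
      }
      // Simulated work, done outside the lock.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      {
        std::lock_guard<std::mutex> lock(mutex);
        --in_flight;
        ++completed;
      }
      slot_free.notify_all();
    }
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
  return RunStats{completed, peak};
}
```

In this model, `RunWithCap(4, 4, 32)` and `RunWithCap(4, 8, 32)` both complete all 32 tasks, and neither can exceed 4 tasks in flight, because the number of worker threads is the tighter bound once the cap is at least the thread count.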
settings.num_threads == 1));
/*thread_id=*/0,
[&](std::function<Status(size_t)> task) -> Status {
  return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
Why did you choose Spawn() + a condition variable rather than Submit() + Future::status() (or Spawn() + ThreadPool::WaitForIdle())? Is it easier to write/maintain?
Oh right.
Spawn() + WaitForIdle() is easier and suffices. (I wrongly suspected that WaitForIdle() wasn't working, so I switched to a condition variable. It later turned out the problem was something else, and I forgot to change it back.)
I'm updating it. Thank you for pointing this out.
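For readers following the thread, the Spawn() + WaitForIdle() pattern can be sketched with the standard library alone. `TinyPool` and `RunTasksAndWait` below are hypothetical stand-ins written for this illustration. They are not Arrow's ThreadPool, and the sketch only assumes that Spawn() is fire-and-forget and that WaitForIdle() blocks until every spawned task has finished.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool for illustration; NOT Arrow's ThreadPool.
class TinyPool {
 public:
  explicit TinyPool(std::size_t num_threads) {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~TinyPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_workers_.notify_all();
    for (auto& w : workers_) w.join();
  }

  // Fire-and-forget: enqueue the task and return immediately.
  void Spawn(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(task));
      ++pending_;  // counts queued plus running tasks
    }
    wake_workers_.notify_one();
  }

  // Block until every spawned task has finished.
  void WaitForIdle() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_workers_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // run outside the lock
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--pending_ == 0) idle_.notify_all();
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_workers_;
  std::condition_variable idle_;
  std::queue<std::function<void()>> queue_;
  std::size_t pending_ = 0;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// The caller needs no condition variable of its own: spawn, then wait.
int RunTasksAndWait(std::size_t num_threads, int num_tasks) {
  TinyPool pool(num_threads);
  std::atomic<int> done{0};
  for (int i = 0; i < num_tasks; ++i) {
    pool.Spawn([&done] { ++done; });
  }
  pool.WaitForIdle();
  return done.load();
}
```

The design point is that the completion bookkeeping lives inside the pool, so benchmark code calling something like `RunTasksAndWait(4, 100)` carries no mutex or condition variable of its own.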
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit c27cf8acfe7d13123a9185ad04a72870ab355731. None of the specified runs were found on the Conbench server. The full Conbench report has more details.
@ursabot please benchmark
Benchmark runs are scheduled for commit 4cba288. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
#include "arrow/testing/random.h"
#include "arrow/util/thread_pool.h"

#include <condition_variable>
Oops, forgot this.
Removing.
#include <memory>

#include <omp.h>
#include <mutex>
Done. Thanks for reminding!
Though the benchmark results are not ready yet, I've manually checked some of the finished runs and saw that the hash join benchmarks are executed as expected, without a baseline, which is also expected (they have never run before). So everything seems good now. I'm merging. Thanks @kou for reviewing.
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 4cba288. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit b36659a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 21 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
See #45591.
What changes are included in this PR?
Are these changes tested?
Manually tested.
Are there any user-facing changes?
Removed a public CMake option, but I think it shouldn't affect users.