Skip to content

GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project#45593

Merged
zanmato1984 merged 6 commits intoapache:mainfrom
zanmato1984:refine-hash-join-benchmark
Feb 21, 2025
Merged

GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project#45593
zanmato1984 merged 6 commits intoapache:mainfrom
zanmato1984:refine-hash-join-benchmark

Conversation

@zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Feb 20, 2025

Rationale for this change

See #45591 .

What changes are included in this PR?

  1. Replace the usage of openmp with arrow-native multi-threading primitives;
  2. Remove all the occurrences of openmp from the project;
  3. Support stats for build side rows in hash join benchmark, and update certain benchmark.

Are these changes tested?

Manually tested.

Are there any user-facing changes?

Removed a public CMake option but I think it shouldn't affect the user.

@github-actions
Copy link

⚠️ GitHub issue #45591 has been automatically assigned in GitHub to PR creator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the hash join benchmark is never ran in our CI. And this is probably why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feeling comfortable of removing one dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 20, 2025
@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from 97b4b22 to c27cf8a Compare February 20, 2025 15:27
@zanmato1984
Copy link
Contributor Author

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Feb 20, 2025

Benchmark runs are scheduled for commit c27cf8acfe7d13123a9185ad04a72870ab355731. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from c27cf8a to 492b429 Compare February 20, 2025 15:59
@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from 06da9aa to c6460ac Compare February 20, 2025 16:55
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

[&](std::function<Status(size_t)> task) -> Status {
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
},
thread_pool_->GetCapacity(), settings.num_threads == 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change to thread_pool_->GetCapacity() (settings.num_threads) from 2 * settings.num_threads? Is the original 2 * settings.num_threads wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument controls the max number of concurrent tasks in the scheduler, so any value >= settings.num_threads is fine. (The original doubling isn't wrong though.)

settings.num_threads == 1));
/*thread_id=*/0,
[&](std::function<Status(size_t)> task) -> Status {
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose Spawn() + conditional variable not Submit() + Future::status() (or Spawn() + ThreadPool::WaitForIdle())? Is it easy to write/maintain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right.

Spawn() + WaitForIdle() is easier and suffice (I was wrongly suspecting WaitForIdle() not working so changed to using cond var. But later it turned out the problem was something else and forgot to change it back.)

I'm updating it. Thank you for pointing this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Feb 20, 2025
@conbench-apache-arrow
Copy link

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit c27cf8acfe7d13123a9185ad04a72870ab355731.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@zanmato1984
Copy link
Contributor Author

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Feb 21, 2025

Benchmark runs are scheduled for commit 4cba288. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

#include "arrow/testing/random.h"
#include "arrow/util/thread_pool.h"

#include <condition_variable>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot this.

Removing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#include <memory>

#include <omp.h>
#include <mutex>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for reminding!

@github-actions github-actions bot removed the awaiting merge Awaiting merge label Feb 21, 2025
@github-actions github-actions bot added the awaiting changes Awaiting changes label Feb 21, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 21, 2025
@zanmato1984
Copy link
Contributor Author

Though the benchmark result is not ready yet, I've manually checked some of the finished ones and saw hash join benchmarks are executed as expected, w/o baseline as expected as well (they never run before).

So everything seems good now. I'm merging.

Thanks @kou for reviewing.

@zanmato1984 zanmato1984 merged commit b36659a into apache:main Feb 21, 2025
39 checks passed
@zanmato1984 zanmato1984 removed the awaiting change review Awaiting change review label Feb 21, 2025
@zanmato1984 zanmato1984 deleted the refine-hash-join-benchmark branch February 21, 2025 06:10
@conbench-apache-arrow
Copy link

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 4cba288.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit b36659a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 21 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants