Skip to content

[C++][Acero] Refine hash join benchmark #45591

@zanmato1984

Description

@zanmato1984

Describe the enhancement requested

The current hash join benchmark can be refined in the following aspects:

  1. It is using openmp for multi-threading, and this is the solely place that depends on openmp library. Based on my experience of recent benchmarking, openmp can cause tricky inaccurate benchmarking numbers (some obvious improvement shows worse number in one or two out of tens of cases, I still don't know why), and make diagnosis hard (I was using flame graph and openmp makes it hard to interpret, see the following two pictures).
    Flame graph with openmp:
    Image
    Flame graph without openmp:
    Image
    I'm not entirely sure why openmp was introduced at the first place (maybe @save-buffer can elaborate a bit?), but I think at this point we can trivially replace it with arrow-native multi-threading primitives. (Besides a bunch of simplification of cmake could be done).

  2. In certain benchmarks, we are more interested in the number of rows from the build side. Currently all benchmarks are based on the number of rows from the probe side.

Component(s)

C++

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions