feat[gpu]: export binary as Arrow Binary #8320

Open: 0ax1 wants to merge 3 commits into develop from ad/gpu-arrow-binary-export

@0ax1 0ax1 commented Jun 9, 2026

Contributor

No description provided.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 0ax1 requested a review from a team June 9, 2026 15:59
@0ax1 0ax1 added the changelog/feature A new feature label Jun 9, 2026
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 0ax1 requested review from onursatici and robert3005 June 9, 2026 16:01

codspeed-hq Bot commented Jun 9, 2026

Merging this PR will improve performance by 29.37%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
✅ 1520 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 46.7 µs | 31.8 µs | +46.95% |
| Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 197.9 µs | 161.8 µs | +22.34% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 213.6 µs | 177.3 µs | +20.42% |
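
As a rough cross-check of the table, the sketch below recomputes each Efficiency value as BASE / HEAD - 1 and the headline figure as the geometric mean of the three speedups. Both formulas are assumptions inferred from the numbers, not something the report states. The names `BenchRow`, `efficiency`, `geomean_speedup`, and `table_rows` are hypothetical. Because the inputs are the rounded times shown above, the results only approximate the reported percentages.

```cpp
#include <cmath>
#include <vector>

// Hypothetical row type holding the rounded times from the table above.
struct BenchRow {
    const char *name;
    double base_us;  // BASE column, in microseconds
    double head_us;  // HEAD column, in microseconds
};

// Assumed definition: relative speedup of HEAD over BASE.
double efficiency(const BenchRow &row) {
    return row.base_us / row.head_us - 1.0;
}

// Assumed aggregation: geometric mean of the per-benchmark BASE / HEAD ratios.
double geomean_speedup(const std::vector<BenchRow> &rows) {
    double log_sum = 0.0;
    for (const BenchRow &row : rows) {
        log_sum += std::log(row.base_us / row.head_us);
    }
    return std::exp(log_sum / static_cast<double>(rows.size()));
}

// The three improved benchmarks, copied from the table.
std::vector<BenchRow> table_rows() {
    return {
        {"chunked_bool_canonical_into[(1000, 10)]", 46.7, 31.8},
        {"chunked_varbinview_canonical_into[(1000, 10)]", 197.9, 161.8},
        {"chunked_varbinview_into_canonical[(1000, 10)]", 213.6, 177.3},
    };
}
```

With these inputs, `geomean_speedup(table_rows()) - 1.0` lands close to the reported 29.37%, which fits the geometric-mean assumption but does not confirm it.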

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/gpu-arrow-binary-export (e599e32) with develop (f2148d4)

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Comment on lines +36 to +55
__device__ void repack_validity_device(const uint8_t *const input,
                                       uint8_t *const output,
                                       uint64_t len,
                                       uint64_t input_offset,
                                       uint64_t output_bytes) {
    const uint64_t worker = blockIdx.x * blockDim.x + threadIdx.x;
    const uint64_t start = start_elem(worker, output_bytes);
    const uint64_t stop = stop_elem(worker, output_bytes);

    for (uint64_t byte_idx = start; byte_idx < stop; byte_idx++) {
        uint8_t byte = 0;
        const uint64_t first_bit = byte_idx * 8;
        for (uint64_t bit_idx = 0; bit_idx < 8 && first_bit + bit_idx < len; bit_idx++) {
            if (get_bit(input, input_offset + first_bit + bit_idx)) {
                byte |= static_cast<uint8_t>(1u << bit_idx);
            }
        }
        output[byte_idx] = byte;
    }
}

Contributor

🤖 ⏰
Use a word-level left shift here instead of assembling each output byte one bit at a time.
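
One way to read this suggestion is sketched below as plain host-side C++ rather than a device kernel. It assumes the bitmap is LSB-first (bit `i` lives in byte `i / 8` at position `i % 8`), which matches the `1u << bit_idx` in the snippet above; `get_bit` is not shown in the diff, so this is a guess about its behavior. The names `load_word_le`, `repack_validity_words`, and `repack_validity_bitwise` are hypothetical and not taken from the PR. With LSB-first ordering, the low word is shifted right and the following word is shifted left to fill the vacated high bits.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: read up to 8 bytes starting at byte_pos as a
// little-endian 64-bit word, treating bytes past the end as zero.
static uint64_t load_word_le(const std::vector<uint8_t> &data, uint64_t byte_pos) {
    uint64_t word = 0;
    for (uint64_t i = 0; i < 8 && byte_pos + i < data.size(); ++i) {
        word |= static_cast<uint64_t>(data[byte_pos + i]) << (8 * i);
    }
    return word;
}

// Word-level repack: copy `len` bits starting at bit `input_offset` of
// `input` into a fresh bitmap that starts at bit 0 (LSB-first assumed).
std::vector<uint8_t> repack_validity_words(const std::vector<uint8_t> &input,
                                           uint64_t len,
                                           uint64_t input_offset) {
    const uint64_t output_bytes = (len + 7) / 8;
    std::vector<uint8_t> output(output_bytes, 0);
    const uint64_t base = input_offset / 8;   // first input byte touched
    const uint64_t shift = input_offset % 8;  // bit offset inside that byte

    for (uint64_t out_byte = 0; out_byte < output_bytes; out_byte += 8) {
        // Low part: drop the `shift` leading bits of the current word.
        uint64_t word = load_word_le(input, base + out_byte) >> shift;
        if (shift != 0) {
            // High part: pull the missing bits from the next word.
            word |= load_word_le(input, base + out_byte + 8) << (64 - shift);
        }
        // Clear bits at or beyond `len` in the final word.
        const uint64_t remaining = len - out_byte * 8;
        if (remaining < 64) {
            word &= (uint64_t{1} << remaining) - 1;
        }
        for (uint64_t i = 0; i < 8 && out_byte + i < output_bytes; ++i) {
            output[out_byte + i] = static_cast<uint8_t>(word >> (8 * i));
        }
    }
    return output;
}

// Bit-by-bit reference mirroring the loop in the reviewed snippet.
std::vector<uint8_t> repack_validity_bitwise(const std::vector<uint8_t> &input,
                                             uint64_t len,
                                             uint64_t input_offset) {
    std::vector<uint8_t> output((len + 7) / 8, 0);
    for (uint64_t bit = 0; bit < len; ++bit) {
        const uint64_t src = input_offset + bit;
        if ((input[src / 8] >> (src % 8)) & 1u) {
            output[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
        }
    }
    return output;
}
```

A device version would additionally have to split words across threads and take care with aligned loads. The byte-wise `load_word_le` here avoids alignment and out-of-bounds concerns at the cost of speed, so it illustrates the shift logic only.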

Labels

changelog/feature A new feature

2 participants