GH-49438: [C++][Gandiva] Optimize LPAD/RPAD functions#49439
Conversation
|
|
Local Benchmark ResultsPlatform: Apple M3, macOS The original RPAD was pathologically slow compared to LPAD due to different algorithms. For 65K padding: LPAD took ~29ms while RPAD took ~992ms (34x slower for identical operation). The optimization applies the same efficient algorithm to both functions. LPAD (mean time in μs)
RPAD (mean time in μs)
|
|
|
1 similar comment
|
|
|
@lriggs @akravchukdremio @xxlaykxx You may want to review this. |
kou
left a comment
There was a problem hiding this comment.
It seems that lpad_utf8_int32_utf8() and rpad_utf8_int32_utf8() have many duplicated code. Can we factor out it as a helper function?
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 5707713. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
…9439) ### Rationale for this change The `lpad_utf8_int32_utf8` and `rpad_utf8_int32_utf8` functions have performance inefficiency and a potential memory safety issue: 1. **Performance**: Single-byte fills iterate character-by-character when `memset` would suffice. Multi-byte fills use O(n) iterations instead of O(log n) with a doubling strategy. 2. **Memory safety**: When the fill string is longer than the padding space needed, the code could write more bytes than allocated. Fixed preventatively. ### What changes are included in this PR? 1. **Memory safety fix**: Use `std::min(fill_text_len, total_fill_bytes)` for the initial copy to prevent overflow 2. **Fast path**: Add single-byte fill optimization using `memset` 3. **General path**: Replace character-by-character loop with doubling strategy for multi-byte fills 4. **Tests**: Add comprehensive tests for the new code paths ### Are these changes tested? Yes. Added tests covering: - Large UTF-8 fill characters (4-byte emoji, 3-byte Chinese) - Single-byte fill boundaries (1 char and 65536 char padding) - Content verification for fill patterns - Doubling strategy boundaries - Partial fill scenarios (fill text longer than padding needed) ### Are there any user-facing changes? No. * GitHub Issue: apache#49438 Authored-by: Dmitry Chirkov <dmitry.chirkov@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
The
lpad_utf8_int32_utf8andrpad_utf8_int32_utf8functions have performance inefficiency and a potential memory safety issue:memsetwould suffice. Multi-byte fills use O(n) iterations instead of O(log n) with a doubling strategy.What changes are included in this PR?
std::min(fill_text_len, total_fill_bytes)for the initial copy to prevent overflowmemsetAre these changes tested?
Yes. Added tests covering:
Are there any user-facing changes?
No.