Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Dec 31, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

For non-ASCII strings, the original implementation used string.find() to get the byte index, then counted characters up to that byte index. This required two passes through the string.

This optimization uses char_indices() to find the substring while simultaneously tracking character positions, completing the search in a single pass.

Benchmark results (UTF-8 strings):

  • str_len_8: 188.98 µs → 140.54 µs (25.4% faster)
  • str_len_32: 615.69 µs → 294.15 µs (52.2% faster)
  • str_len_128: 2.2707 ms → 1.2462 ms (45.1% faster)
  • str_len_4096: 74.328 ms → 36.538 ms (50.9% faster)

ASCII performance unchanged (already optimized with fast path).

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Dec 31, 2025
For non-ASCII strings, the original implementation used string.find()
to get the byte index, then counted characters up to that byte index.
This required two passes through the string.

This optimization uses char_indices() to find the substring while
simultaneously tracking character positions, completing the search
in a single pass.

Benchmark results (UTF-8 strings):
- str_len_8:    188.98 µs → 140.54 µs  (25.4% faster)
- str_len_32:   615.69 µs → 294.15 µs  (52.2% faster)
- str_len_128:  2.2707 ms → 1.2462 ms  (45.1% faster)
- str_len_4096: 74.328 ms → 36.538 ms  (50.9% faster)

ASCII performance unchanged (already optimized with fast path).
@adriangb
Copy link
Contributor

run benchmark character_length

@alamb-ghbot

This comment was marked as outdated.

@alamb
Copy link
Contributor

alamb commented Dec 31, 2025

run benchmark character_length

(I just added this to the whitelist and will run now)

@alamb
Copy link
Contributor

alamb commented Dec 31, 2025

run benchmark character_length

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing strpos_optimize (5ed05f7) to 7c50448 diff
BENCH_NAME=character_length
BENCH_COMMAND=cargo bench --features=parquet --bench character_length
BENCH_FILTER=
BENCH_BRANCH_NAME=strpos_optimize
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Dec 31, 2025

run benchmark strpos

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                                  main                                   strpos_optimize
-----                                                  ----                                   ---------------
character_length_StringArray_ascii_str_len_128         1.00     27.4±0.84µs        ? ?/sec    1.05     28.8±1.72µs        ? ?/sec
character_length_StringArray_ascii_str_len_32          1.00      9.7±0.07µs        ? ?/sec    1.02      9.9±0.16µs        ? ?/sec
character_length_StringArray_ascii_str_len_4096        1.00      2.2±0.05ms        ? ?/sec    1.00      2.2±0.06ms        ? ?/sec
character_length_StringArray_ascii_str_len_8           1.00      6.0±0.13µs        ? ?/sec    1.00      6.0±0.07µs        ? ?/sec
character_length_StringArray_utf8_str_len_128          1.00    246.6±2.47µs        ? ?/sec    1.00    247.3±3.28µs        ? ?/sec
character_length_StringArray_utf8_str_len_32           1.00    182.2±2.39µs        ? ?/sec    1.00    181.4±1.28µs        ? ?/sec
character_length_StringArray_utf8_str_len_4096         1.00      5.3±0.07ms        ? ?/sec    1.00      5.3±0.09ms        ? ?/sec
character_length_StringArray_utf8_str_len_8            1.00    152.3±0.78µs        ? ?/sec    1.00    152.3±0.50µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_128     1.00     63.7±1.60µs        ? ?/sec    1.00     63.6±1.28µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_32      1.00     44.0±0.24µs        ? ?/sec    1.00     44.1±0.84µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_4096    1.02      2.1±0.05ms        ? ?/sec    1.00      2.1±0.06ms        ? ?/sec
character_length_StringViewArray_ascii_str_len_8       1.00     56.4±0.20µs        ? ?/sec    1.00     56.6±0.97µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_128      1.00    259.0±7.54µs        ? ?/sec    1.00    257.8±1.03µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_32       1.00    189.5±0.76µs        ? ?/sec    1.01    191.7±4.86µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_4096     1.01      5.4±0.08ms        ? ?/sec    1.00      5.4±0.10ms        ? ?/sec
character_length_StringViewArray_utf8_str_len_8        1.00    160.6±1.91µs        ? ?/sec    1.00    160.3±1.35µs        ? ?/sec

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing strpos_optimize (5ed05f7) to 7c50448 diff
BENCH_NAME=strpos
BENCH_COMMAND=cargo bench --features=parquet --bench strpos
BENCH_FILTER=
BENCH_BRANCH_NAME=strpos_optimize
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                        main                                   strpos_optimize
-----                                        ----                                   ---------------
strpos_StringArray_ascii_str_len_128         1.05   1137.5±6.77µs        ? ?/sec    1.00  1087.5±20.26µs        ? ?/sec
strpos_StringArray_ascii_str_len_32          1.00   327.9±11.29µs        ? ?/sec    1.02    333.7±6.68µs        ? ?/sec
strpos_StringArray_ascii_str_len_4096        1.00     32.4±0.95ms        ? ?/sec    1.22     39.4±0.34ms        ? ?/sec
strpos_StringArray_ascii_str_len_8           1.00    149.4±1.76µs        ? ?/sec    1.08    162.1±3.34µs        ? ?/sec
strpos_StringArray_utf8_str_len_128          1.07      3.3±0.01ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
strpos_StringArray_utf8_str_len_32           1.59  1371.2±11.86µs        ? ?/sec    1.00    863.3±7.31µs        ? ?/sec
strpos_StringArray_utf8_str_len_4096         1.09    102.0±1.46ms        ? ?/sec    1.00     93.6±0.54ms        ? ?/sec
strpos_StringArray_utf8_str_len_8            1.67   497.8±10.91µs        ? ?/sec    1.00    298.2±3.10µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_128     1.00  1259.8±11.10µs        ? ?/sec    1.01  1274.8±17.36µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_32      1.00    388.4±5.48µs        ? ?/sec    1.03    398.7±5.21µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_4096    1.00     38.8±0.46ms        ? ?/sec    1.02     39.5±0.93ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_8       1.00    221.0±1.26µs        ? ?/sec    1.02    224.7±6.92µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_128      1.06      3.3±0.06ms        ? ?/sec    1.00      3.1±0.04ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_32       1.58   1398.0±4.49µs        ? ?/sec    1.00    886.5±8.76µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_4096     1.07    102.0±1.22ms        ? ?/sec    1.00     95.2±0.53ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_8        1.62    513.8±8.19µs        ? ?/sec    1.00    318.1±1.53µs        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Jan 1, 2026

run benchmark strpos

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing strpos_optimize (5ed05f7) to 7c50448 diff
BENCH_NAME=strpos
BENCH_COMMAND=cargo bench --features=parquet --bench strpos
BENCH_FILTER=
BENCH_BRANCH_NAME=strpos_optimize
Results will be posted here when complete

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya -- this makes sense to me

for (byte_idx, _) in string.char_indices() {
char_pos += 1;
if byte_idx + substring_bytes.len() <= string_bytes.len()
&& &string_bytes[byte_idx..byte_idx + substring_bytes.len()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could potentially use string_bytes.get_unchecked here if it makes any difference as you validate the bounds check immediately above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use get_unchecked to avoid redundant bounds checking now when comparing substring slices.

@alamb alamb added the performance Make DataFusion faster label Jan 1, 2026
@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                        main                                   strpos_optimize
-----                                        ----                                   ---------------
strpos_StringArray_ascii_str_len_128         1.06   1134.8±7.74µs        ? ?/sec    1.00   1070.6±7.08µs        ? ?/sec
strpos_StringArray_ascii_str_len_32          1.00    325.3±2.17µs        ? ?/sec    1.01    329.3±1.09µs        ? ?/sec
strpos_StringArray_ascii_str_len_4096        1.00     32.2±0.64ms        ? ?/sec    1.20     38.7±0.47ms        ? ?/sec
strpos_StringArray_ascii_str_len_8           1.00    149.6±1.52µs        ? ?/sec    1.07    160.5±4.75µs        ? ?/sec
strpos_StringArray_utf8_str_len_128          1.08      3.3±0.07ms        ? ?/sec    1.00      3.1±0.06ms        ? ?/sec
strpos_StringArray_utf8_str_len_32           1.60   1369.1±9.15µs        ? ?/sec    1.00    857.2±3.27µs        ? ?/sec
strpos_StringArray_utf8_str_len_4096         1.09    101.3±0.83ms        ? ?/sec    1.00     92.8±1.46ms        ? ?/sec
strpos_StringArray_utf8_str_len_8            1.66    498.3±5.96µs        ? ?/sec    1.00    300.3±4.81µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_128     1.00  1257.0±18.33µs        ? ?/sec    1.00  1262.2±10.75µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_32      1.15    452.0±2.38µs        ? ?/sec    1.00    393.7±4.30µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_4096    1.00     38.2±0.27ms        ? ?/sec    1.00     38.3±0.27ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_8       1.00    220.8±0.97µs        ? ?/sec    1.01    223.9±5.55µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_128      1.06      3.4±0.02ms        ? ?/sec    1.00      3.1±0.07ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_32       1.59  1404.6±28.19µs        ? ?/sec    1.00    884.2±6.02µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_4096     1.07    101.2±1.35ms        ? ?/sec    1.00     94.2±0.37ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_8        1.61    509.3±2.29µs        ? ?/sec    1.00    317.3±4.89µs        ? ?/sec

Use `get_unchecked` to avoid redundant bounds checking when comparing
substring slices, as we already validate the bounds in the if condition.

Benchmark results show modest improvements, especially for shorter strings:
- strpos_StringArray_ascii_str_len_8: 3.7% faster
- strpos_StringViewArray_ascii_str_len_8: 2.0% faster
- strpos_StringViewArray_utf8_str_len_8: 1.9% faster

Other benchmarks show no significant change (within noise threshold).
@viirya viirya added this pull request to the merge queue Jan 1, 2026
Merged via the queue into apache:main with commit 195d3d6 Jan 1, 2026
28 checks passed
@viirya viirya deleted the strpos_optimize branch January 1, 2026 21:24
@viirya
Copy link
Member Author

viirya commented Jan 2, 2026

Thanks @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation performance Make DataFusion faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants