@alexandre-daubois alexandre-daubois commented Jul 28, 2025

I would like to propose this optimization for str_pad(). Here is the benchmark code:

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('hello', 2000, ' ', STR_PAD_RIGHT);
}

And the results with right padding:

Benchmark 1: ./sapi/cli/php.branch validation_benchmark.php
  Time (mean ± σ):      44.6 ms ±   0.8 ms    [User: 41.3 ms, System: 2.3 ms]
  Range (min … max):    43.2 ms …  47.4 ms    64 runs
 
Benchmark 2: ./sapi/cli/php.master validation_benchmark.php
  Time (mean ± σ):      2.977 s ±  0.076 s    [User: 2.962 s, System: 0.009 s]
  Range (min … max):    2.863 s …  3.061 s    10 runs
 
Summary
  ./sapi/cli/php.branch validation_benchmark.php ran
   66.75 ± 2.06 times faster than ./sapi/cli/php.master validation_benchmark.php

Left padding results:

alex@alex-macos php-src % hyperfine './sapi/cli/php.branch validation_benchmark.php' './sapi/cli/php.master validation_benchmark.php' --warmup 10
Benchmark 1: ./sapi/cli/php.branch validation_benchmark.php
  Time (mean ± σ):      43.8 ms ±   1.5 ms    [User: 40.5 ms, System: 2.3 ms]
  Range (min … max):    42.2 ms …  49.8 ms    58 runs
 
Benchmark 2: ./sapi/cli/php.master validation_benchmark.php
  Time (mean ± σ):      1.150 s ±  0.020 s    [User: 1.130 s, System: 0.009 s]
  Range (min … max):    1.125 s …  1.174 s    10 runs
 
Summary
  ./sapi/cli/php.branch validation_benchmark.php ran
   26.23 ± 1.01 times faster than ./sapi/cli/php.master validation_benchmark.php

The idea is to avoid the per-iteration modulo operation and the char-by-char copy in the padding loops. Instead, this PR copies the pad string in bulk.
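To illustrate the difference, here is a minimal sketch in C. It is not the code from this PR: the names `pad_fill_bytewise` and `pad_fill_bulk` are hypothetical, and both functions assume `pad_str_len > 0` and a destination buffer with room for `pad_chars` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Byte-wise version: one store and one modulo per output byte. */
static void pad_fill_bytewise(char *dest, size_t pad_chars,
                              const char *pad_str, size_t pad_str_len) {
    for (size_t i = 0; i < pad_chars; i++) {
        dest[i] = pad_str[i % pad_str_len];
    }
}

/* Bulk version (illustrative): the division and modulo run once,
 * and the copying is delegated to memset/memcpy. */
static void pad_fill_bulk(char *dest, size_t pad_chars,
                          const char *pad_str, size_t pad_str_len) {
    if (pad_str_len == 1) {
        /* Single-character pad: a plain memset is enough. */
        memset(dest, pad_str[0], pad_chars);
        return;
    }

    size_t full_copies = pad_chars / pad_str_len;
    size_t remainder = pad_chars % pad_str_len;

    /* Copy the whole pad string as many times as it fits. */
    for (size_t i = 0; i < full_copies; i++) {
        memcpy(dest, pad_str, pad_str_len);
        dest += pad_str_len;
    }
    /* Copy the leading part of the pad string into the leftover space. */
    if (remainder > 0) {
        memcpy(dest, pad_str, remainder);
    }
}
```

Both functions write the same bytes; the bulk version simply does so in fewer, larger steps, which is where a gain for long paddings would be expected to come from.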

}
/* }}} */

static inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) {
Member

Suggested change
static inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) {
static zend_always_inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) {

Using zend_always_inline increases the likelihood that the function will actually be inlined.

Also, instead of passing char *dest and size_t pad_chars as separate arguments, how about accepting a zend_string and extracting the values within this inline function?

Member Author

Good idea! PR updated with both suggestions

Member

@TimWolla TimWolla left a comment

I feel that this makes the API of php_str_pad_fill() a little safer to use, because it makes fewer assumptions about the target pointer.

With regard to the inlining: I generally trust the compiler to make better decisions than I can myself. I would make the function just static void without any inlining hints and let the compiler decide.

for (i = 0; i < left_pad; i++)
ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len];
if (left_pad > 0) {
php_str_pad_fill(result, left_pad, pad_str, pad_str_len);
Member

Suggested change
php_str_pad_fill(result, left_pad, pad_str, pad_str_len);
php_str_pad_fill(ZSTR_VAL(result) + ZSTR_LEN(result), left_pad, pad_str, pad_str_len);

Member Author

@SakiTakamachi advised otherwise if I get it right?

Member

It appears so, but the suggestion doesn't really make sense, since pad_chars cannot be piggy-backed on the zend_string. Perhaps Saki meant to pass pad_str and pad_str_len as a zend_string* (which would make sense to me).
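One way to picture the two call shapes discussed in this thread is the sketch below. `demo_buf` is a hypothetical stand-in for a length-tracked string and is not `zend_string`; `fill_ptr` and `fill_buf` are illustrative names, not functions from php-src.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracked buffer, used only for this illustration. */
typedef struct {
    size_t len;
    char val[64];
} demo_buf;

/* Shape A: the helper receives a raw destination pointer.
 * The caller computes the write position and updates the length. */
static void fill_ptr(char *dest, size_t pad_chars,
                     const char *pad_str, size_t pad_str_len) {
    for (size_t i = 0; i < pad_chars; i++) {
        dest[i] = pad_str[i % pad_str_len];
    }
}

/* Shape B: the helper receives the buffer itself, appends at its
 * current end and updates the length. The caller must still guarantee
 * that the buffer has room for pad_chars more bytes. */
static void fill_buf(demo_buf *result, size_t pad_chars,
                     const char *pad_str, size_t pad_str_len) {
    fill_ptr(result->val + result->len, pad_chars, pad_str, pad_str_len);
    result->len += pad_chars;
}
```

In both shapes `pad_chars` remains a separate argument, since the number of bytes to write is not something the target buffer can carry.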

Member Author

It makes the code a bit slower, going from ~48ms per run to ~58ms. Do we still want this?

Member

I think it’s fine to go with the faster one! :)

@TimWolla
Member

Please also see #19220 (comment) and #19220 (comment) for benchmarking advice.

@alexandre-daubois
Member Author

Thanks for the link, I'm having a look at hyperfine for this PR and #19276

@alexandre-daubois
Member Author

PR description updated to use hyperfine

@TimWolla
Member

> PR description updated to use hyperfine

The php binary appears to be the system binary; it is not unlikely that it was compiled with different flags or a different compiler (version). You should make sure to test two binaries compiled with identical flags and compilers.

@alexandre-daubois
Member Author

I think that my system-wide install right now is PHP 8.5 nightly. Just to be sure, I reran against master and the branch:

Benchmark 1: ./sapi/cli/php.branch validation_benchmark.php
  Time (mean ± σ):      44.6 ms ±   0.8 ms    [User: 41.3 ms, System: 2.3 ms]
  Range (min … max):    43.2 ms …  47.4 ms    64 runs
 
Benchmark 2: ./sapi/cli/php.master validation_benchmark.php
  Time (mean ± σ):      2.977 s ±  0.076 s    [User: 2.962 s, System: 0.009 s]
  Range (min … max):    2.863 s …  3.061 s    10 runs
 
Summary
  ./sapi/cli/php.branch validation_benchmark.php ran
   66.75 ± 2.06 times faster than ./sapi/cli/php.master validation_benchmark.php

github-actions bot commented Aug 1, 2025

AWS x86_64 (c7i.24xl)

| Attribute | Value |
|---|---|
| Environment | aws |
| Runner | host |
| Instance type | c7i.metal-24xl (dedicated) |
| Architecture | x86_64 |
| CPU | 48 cores |
| CPU settings | disabled deeper C-states, disabled turbo boost, disabled hyper-threading |
| RAM | 188 GB |
| Kernel | 6.1.144-170.251.amzn2023.x86_64 |
| OS | Amazon Linux 2023.8.20250721 |
| GCC | 11.5.0 |
| Time | 2025-08-01 08:20:50 UTC |

Laravel 11.1.2 demo app - 30 consecutive runs, 100 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Memory |
|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.46107 | 0.46896 | 0.00123 | 0.46674 | 0.00% | 0.46677 | 0.00% | 43.54 MB |
| PHP - str-pad-opt | 0.46154 | 0.46938 | 0.00127 | 0.46765 | 0.20% | 0.46767 | 0.19% | 43.46 MB |

Symfony 2.6.0 demo app - 30 consecutive runs, 100 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Memory |
|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.72773 | 0.73390 | 0.00138 | 0.72954 | 0.00% | 0.72921 | 0.00% | 39.69 MB |
| PHP - str-pad-opt | 0.73090 | 0.73512 | 0.00087 | 0.73262 | 0.42% | 0.73241 | 0.44% | 39.77 MB |

Wordpress 6.2 main page - 30 consecutive runs, 20 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Memory |
|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.57803 | 0.58039 | 0.00059 | 0.57886 | 0.00% | 0.57869 | 0.00% | 43.54 MB |
| PHP - str-pad-opt | 0.58101 | 0.65067 | 0.02497 | 0.63512 | 9.72% | 0.64595 | 11.62% | 43.54 MB |

bench.php - 25 consecutive runs (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Memory |
|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.21721 | 0.22099 | 0.00093 | 0.21825 | 0.00% | 0.21808 | 0.00% | 26.53 MB |
| PHP - str-pad-opt | 0.21425 | 0.21712 | 0.00077 | 0.21580 | -1.12% | 0.21585 | -1.02% | 26.53 MB |

micro_bench.php - 25 consecutive runs (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Memory |
|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 1.26166 | 1.28426 | 0.00498 | 1.26842 | 0.00% | 1.26635 | 0.00% | 20.82 MB |
| PHP - str-pad-opt | 1.31617 | 1.33954 | 0.00573 | 1.32695 | 4.61% | 1.32688 | 4.78% | 20.82 MB |

@kocsismate
Member

The above comment was posted to test the real-time benchmark :) Unfortunately, most results should show roughly zero difference, which is currently not the case :( The WordPress results are especially weird. I'll have to dig into it...

cc. @iluuu1994 @arnaud-lb

@alexandre-daubois
Member Author

If I can be of any help, don't hesitate to tell me! It is indeed surprising

github-actions bot commented Aug 1, 2025

AWS x86_64 (c7i.24xl)

| Attribute | Value |
|---|---|
| Environment | aws |
| Runner | host |
| Instance type | c7i.metal-24xl (dedicated) |
| Architecture | x86_64 |
| CPU | 48 cores |
| CPU settings | disabled deeper C-states, disabled turbo boost, disabled hyper-threading |
| RAM | 188 GB |
| Kernel | 6.1.144-170.251.amzn2023.x86_64 |
| OS | Amazon Linux 2023.8.20250721 |
| GCC | 11.5.0 |
| Time | 2025-08-01 09:41:35 UTC |

Laravel 11.1.2 demo app - 30 consecutive runs, 100 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.45780 | 0.46629 | 0.00139 | 0.46472 | 0.00% | 0.46488 | 0.00% | 177247194 | 43.48 MB |
| PHP - str-pad-opt | 0.46514 | 0.46736 | 0.00050 | 0.46594 | 0.26% | 0.46581 | 0.20% | 177251261 | 43.48 MB |

Symfony 2.6.0 demo app - 30 consecutive runs, 100 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.72645 | 0.72932 | 0.00071 | 0.72785 | 0.00% | 0.72779 | 0.00% | 288207116 | 39.78 MB |
| PHP - str-pad-opt | 0.72983 | 0.73390 | 0.00103 | 0.73178 | 0.54% | 0.73152 | 0.51% | 288197741 | 39.78 MB |

Wordpress 6.2 main page - 30 consecutive runs, 20 requests (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.57872 | 0.58180 | 0.00064 | 0.57969 | 0.00% | 0.57960 | 0.00% | 1129301005 | 43.56 MB |
| PHP - str-pad-opt | 0.57772 | 0.58038 | 0.00056 | 0.57900 | -0.12% | 0.57888 | -0.12% | 1129302558 | 43.56 MB |

bench.php - 25 consecutive runs (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 0.21719 | 0.22107 | 0.00105 | 0.21863 | 0.00% | 0.21866 | 0.00% | 1791734911 | 26.55 MB |
| PHP - str-pad-opt | 0.21282 | 0.21641 | 0.00090 | 0.21439 | -1.94% | 0.21441 | -1.94% | 1794186569 | 26.55 MB |

micro_bench.php - 25 consecutive runs (sec)

| PHP | Min | Max | Std dev | Average | Average diff % | Median | Median diff % | Instr count | Memory |
|---|---|---|---|---|---|---|---|---|---|
| PHP - baseline@eaf2 | 1.25910 | 1.27670 | 0.00515 | 1.26772 | 0.00% | 1.26679 | 0.00% | 1269754366 | 20.84 MB |
| PHP - str-pad-opt | 1.31593 | 1.33956 | 0.00516 | 1.32448 | 4.48% | 1.32418 | 4.53% | 1271643119 | 20.83 MB |

@iluuu1994
Member

@kocsismate FYI, I've written a small benchmarking helper for myself.

https://github.com/iluuu1994/php-benchmark-diff

It differs slightly from things like hyperfine in a few ways:

  • It performs alternating tests to minimize noise from throttling. In other words, rather than running A 50 times and B 50 times, it runs A then B 50 times.
  • It allows using perf events rather than just wall time. It also has a "cgi" mode (should be renamed) that can measure sections of an application by using hrtime() and then printing the time of the section to stderr.
  • It looks at the fastest 25% of runs, i.e. filtering out the slow statistical outliers. I'm not sure yet if this is effective. This window is configurable.
  • It calculates the statistical p-value, i.e. how likely it is for the result to be random.
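The "fastest 25% of runs" filter from the list above could look roughly like the following sketch. This is an assumption about the general technique, not code taken from php-benchmark-diff; the function name `fastest_fraction_mean` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparison for qsort over doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and returns the mean of the fastest
 * `fraction` of them (for example 0.25 for the fastest quarter).
 * At least one sample is always kept; returns 0.0 for empty input. */
static double fastest_fraction_mean(double *samples, size_t n, double fraction) {
    if (n == 0) {
        return 0.0;
    }
    qsort(samples, n, sizeof(double), cmp_double);

    size_t keep = (size_t)((double)n * fraction);
    if (keep == 0) {
        keep = 1;
    }
    if (keep > n) {
        keep = n;
    }

    double sum = 0.0;
    for (size_t i = 0; i < keep; i++) {
        sum += samples[i];
    }
    return sum / (double)keep;
}
```

With alternating runs, the samples for A and B would be collected in an A, B, A, B order and then passed through such a filter separately before comparing them.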

Not sure if this is useful to you, but wanted to mention it in case it is. It doesn't support JSON output yet.

@kocsismate
Member

@iluuu1994 Nice, thank you very much for the input! I've seen Tim's comment somewhere about hyperfine (and read a blog post about it by Volker), but didn't know exactly how it differs from what I do. Based on your description, it looks very promising! Specifically, I'm very interested in the p-value. I think my benchmark also discards some of the results, but I should look into how exactly I implemented it. Alternating tests is also a very clever idea.

I'll try to implement it at some point, because I'm still pretty much disappointed by my results :( Most tests used to be stable a few months ago, but they stopped being reliable a while ago. And curiously, microbenchmarks perform the worst.

@iluuu1994
Member

iluuu1994 commented Aug 5, 2025

I see I kept the branch with the JSON output too. iluuu1994/php-benchmark-diff@master...json I planned to replace the Valgrind benchmark, but unfortunately collecting CPU cycles with perf doesn't work on GH runners.

@alexandre-daubois
Member Author

Friendly ping, are you fine with merging this? It's safer now that 8.5 has its branch

@Girgias Girgias requested a review from iluuu1994 October 2, 2025 09:54
Member

@iluuu1994 iluuu1994 left a comment

Looks correct to me. Thanks!

@iluuu1994
Member

One small thing that might make sense before merging is benchmarking small paddings/strings. I reckon these are pretty common, and might slightly regress with this implementation. Regardless, I don't object.

@alexandre-daubois
Member Author

alexandre-daubois commented Oct 3, 2025

Here are the results with tiny padding/strings:

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('hello', 10, ' ', STR_PAD_RIGHT);
}

/*
 Benchmark 1: ./sapi/cli/php.branch bench_small_right.php                      
       Time (mean ± σ):      1.360 s ±  0.015 s    [User: 1.314 s, System: 0.037 s]
       Range (min … max):    1.330 s …  1.378 s    10 runs

     Benchmark 2: ./sapi/cli/php.master bench_small_right.php
       Time (mean ± σ):      1.383 s ±  0.008 s    [User: 1.340 s, System: 0.034 s]
       Range (min … max):    1.367 s …  1.399 s    10 runs

     Summary
       ./sapi/cli/php.branch bench_small_right.php ran
         1.02 ± 0.01 times faster than ./sapi/cli/php.master bench_small_right.php
*/

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('hello', 10, ' ', STR_PAD_LEFT);
}

/*
Benchmark 1: ./sapi/cli/php.branch bench_small_left.php                       
       Time (mean ± σ):      1.404 s ±  0.068 s    [User: 1.316 s, System: 0.039 s]
       Range (min … max):    1.347 s …  1.549 s    10 runs

     Benchmark 2: ./sapi/cli/php.master bench_small_left.php
       Time (mean ± σ):      1.422 s ±  0.068 s    [User: 1.296 s, System: 0.068 s]
       Range (min … max):    1.347 s …  1.586 s    10 runs

     Summary
       ./sapi/cli/php.branch bench_small_left.php ran
         1.01 ± 0.07 times faster than ./sapi/cli/php.master bench_small_left.php
*/

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('hello', 10, ' ', STR_PAD_BOTH);
}

/*
 Benchmark 1: ./sapi/cli/php.branch bench_small_both.php                       
       Time (mean ± σ):      1.376 s ±  0.012 s    [User: 1.326 s, System: 0.038 s]
       Range (min … max):    1.355 s …  1.392 s    10 runs

     Benchmark 2: ./sapi/cli/php.master bench_small_both.php
       Time (mean ± σ):      1.361 s ±  0.030 s    [User: 1.265 s, System: 0.064 s]
       Range (min … max):    1.318 s …  1.413 s    10 runs

     Summary
       ./sapi/cli/php.master bench_small_both.php ran
         1.01 ± 0.02 times faster than ./sapi/cli/php.branch bench_small_both.php
*/

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('x', 5, ' ', STR_PAD_RIGHT);
}

/*
 Benchmark 1: ./sapi/cli/php.branch bench_tiny_1char.php                       
       Time (mean ± σ):      1.378 s ±  0.027 s    [User: 1.242 s, System: 0.079 s]
       Range (min … max):    1.330 s …  1.420 s    10 runs

     Benchmark 2: ./sapi/cli/php.master bench_tiny_1char.php
       Time (mean ± σ):      1.333 s ±  0.009 s    [User: 1.264 s, System: 0.056 s]
       Range (min … max):    1.321 s …  1.350 s    10 runs

     Summary
       ./sapi/cli/php.master bench_tiny_1char.php ran
         1.03 ± 0.02 times faster than ./sapi/cli/php.branch bench_tiny_1char.php
*/

<?php

$iterations = 1000000;

for ($i = 0; $i < $iterations; $i++) {
    str_pad('foo', 15, '.-', STR_PAD_RIGHT);
}

/*
Benchmark 1: ./sapi/cli/php.branch bench_multichar_pad.php                     
       Time (mean ± σ):      1.361 s ±  0.010 s    [User: 1.292 s, System: 0.059 s]
       Range (min … max):    1.340 s …  1.376 s    10 runs

     Benchmark 2: ./sapi/cli/php.master bench_multichar_pad.php
       Time (mean ± σ):      1.402 s ±  0.012 s    [User: 1.330 s, System: 0.060 s]
       Range (min … max):    1.382 s …  1.422 s    10 runs

     Summary
       ./sapi/cli/php.branch bench_multichar_pad.php ran
         1.03 ± 0.01 times faster than ./sapi/cli/php.master bench_multichar_pad.php
*/

Are you fine with these results?

@iluuu1994
Member

Looks good, thanks!

@alexandre-daubois alexandre-daubois merged commit 80e4278 into php:master Oct 3, 2025
9 checks passed
@alexandre-daubois alexandre-daubois deleted the str-pad-opt branch October 3, 2025 13:19