Bound better the loop size for the estimator to increase the shift#191

Open: ameligrana wants to merge 3 commits into main from ameligrana-patch-1

@ameligrana ameligrana commented Apr 18, 2026

Collaborator

This should be okay since significand_sum_hi < group_length <= m[4]
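
Spelled out as a chain of inequalities, this is a restatement of the comment above and is not taken from the diff. It assumes all three quantities are non-negative integers:

```latex
\text{significand\_sum\_hi} < \text{group\_length} \le m[4]
\quad\Longrightarrow\quad
\text{significand\_sum\_hi} \le m[4] - 1
```

Under that assumption, any loop bound that is non-decreasing in its argument and valid for `m[4] - 1` is also valid for `significand_sum_hi`.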

github-actions Bot commented Apr 18, 2026

Benchmark Results

f537014 69f527f f537014 / 69f527f
TTFX excluding time to load 0.0384 ± 0 s 0.0388 ± 0 s 0.961 ± 0,1.01 ± 0,0.99 ± 0
code size in bytes 1.55e+04 ± 0 h 1.55e+04 ± 0 h 1 ± 0
code size in lines 573 ± 0 h 571 ± 0 h 1 ± 0
code size in syntax nodes 4.15e+03 ± 0 h 4.15e+03 ± 0 h 1 ± 0
constructor empty 1.53 ± 0.14 μs 1.65 ± 0.39 μs 0.853 ± 0.29,0.995 ± 0.2,0.93 ± 0.24
constructor n=100 σ=0.1 6.02 ± 0.18 μs 6.04 ± 0.21 μs 0.97 ± 0.086,1.74,0.997 ± 0.046
constructor n=100 σ=1.0 6.57 ± 0.17 μs 6.71 ± 1.2 μs 0.979 ± 0.64,1.01 ± 0.023,0.979 ± 0.18
constructor n=100 σ=10.0 8.19 ± 1.5 μs 16.8 μs 1.54,0.997 ± 0.026,0.486
constructor n=100 σ=100.0 8.5 ± 0.25 μs 8.67 ± 43 μs 0.872 ± 2.6,0.876 ± 0.093,0.98 ± 4.8
constructor n=1000 σ=0.1 0.0476 ± 0.0013 ms 0.0477 ± 0.0013 ms 1.02 ± 0.064,0.985 ± 0.069,0.998 ± 0.039
constructor n=1000 σ=1.0 0.0572 ± 0.0026 ms 0.0507 ± 0.0017 ms 1.09 ± 0.23,0.996 ± 0.041,1.13 ± 0.064
constructor n=1000 σ=10.0 0.058 ± 0.001 ms 0.0581 ± 0.0011 ms 0.998 ± 0.054,1 ± 0.023,0.998 ± 0.026
constructor n=1000 σ=100.0 0.0698 ± 0.0034 ms 0.07 ± 0.0036 ms 0.989 ± 0.11,1.01 ± 0.14,0.997 ± 0.071
constructor n=10000 σ=0.1 0.492 ± 0.03 ms 0.485 ± 0.025 ms 1.02 ± 0.11,0.996 ± 0.092,1.02 ± 0.081
constructor n=10000 σ=1.0 0.531 ± 0.015 ms 0.493 ± 0.02 ms 0.998 ± 0.1,1 ± 0.063,1.08 ± 0.055
constructor n=10000 σ=10.0 0.569 ± 0.027 ms 0.576 ± 0.018 ms 1 ± 0.049,0.995 ± 0.13,0.986 ± 0.056
constructor n=10000 σ=100.0 0.662 ± 0.021 ms 0.663 ± 0.015 ms 0.986 ± 0.078,1 ± 0.092,0.998 ± 0.039
constructor zeros 3.65 ± 2.4 μs 3 ± 0.24 μs 0.894 ± 0.24,0.988 ± 0.079,1.21 ± 0.8
delete ∘ rand n=100 σ=0.1 4.69 ± 0.19 μs 4.68 ± 0.24 μs 1.01 ± 0.071,0.998 ± 0.068,1 ± 0.066
delete ∘ rand n=100 σ=1.0 4.96 ± 0.22 μs 4.95 ± 0.22 μs 1.01 ± 0.063,0.993 ± 0.06,1 ± 0.063
delete ∘ rand n=100 σ=10.0 5.56 ± 0.28 μs 5.57 ± 0.29 μs 1.01 ± 0.076,0.984 ± 0.071,0.998 ± 0.072
delete ∘ rand n=100 σ=100.0 8.8 ± 0.47 μs 8.63 ± 0.35 μs 1.01 ± 0.056,0.988 ± 0.055,1.02 ± 0.068
delete ∘ rand n=1000 σ=0.1 0.0468 ± 0.001 ms 0.0468 ± 0.0012 ms 1.01 ± 0.034,1 ± 0.03,1 ± 0.033
delete ∘ rand n=1000 σ=1.0 0.0506 ± 0.0011 ms 0.0504 ± 0.0011 ms 1 ± 0.03,0.994 ± 0.037,1 ± 0.031
delete ∘ rand n=1000 σ=10.0 0.0528 ± 0.0011 ms 0.0524 ± 0.00095 ms 1 ± 0.026,0.995 ± 0.026,1.01 ± 0.029
delete ∘ rand n=1000 σ=100.0 0.061 ± 0.001 ms 0.0608 ± 0.00098 ms 1.01 ± 0.024,0.989 ± 0.022,1 ± 0.023
delete ∘ rand n=10000 σ=0.1 0.519 ± 0.02 ms 0.512 ± 0.014 ms 0.999 ± 0.052,0.989 ± 0.051,1.01 ± 0.048
delete ∘ rand n=10000 σ=1.0 0.553 ± 0.0089 ms 0.556 ± 0.015 ms 0.978 ± 0.048,0.991 ± 0.037,0.994 ± 0.031
delete ∘ rand n=10000 σ=10.0 0.556 ± 0.014 ms 0.552 ± 0.014 ms 1 ± 0.041,0.979 ± 0.034,1.01 ± 0.036
delete ∘ rand n=10000 σ=100.0 0.55 ± 0.017 ms 0.547 ± 0.016 ms 1.01 ± 0.044,0.985 ± 0.042,1.01 ± 0.043
intermixed_h n=100 σ=0.1 12.4 ± 1.4 μs 12.3 ± 1.5 μs 0.987 ± 0.14,0.994 ± 0.21,1.01 ± 0.17
intermixed_h n=100 σ=1.0 17.5 μs 11.5 ± 1.1 μs 0.78 ± 0.11,1.02 ± 0.27,1.53
intermixed_h n=100 σ=10.0 12 ± 1.7 μs 11.9 ± 1.7 μs 1 ± 0.35,0.989 ± 0.32,1 ± 0.21
intermixed_h n=100 σ=100.0 11.7 ± 0.88 μs 12.1 ± 1.5 μs 0.882 ± 0.25,0.896 ± 0.25,0.968 ± 0.14
intermixed_h n=1000 σ=0.1 0.141 ± 0.079 ms 0.156 ± 0.084 ms 0.935 ± 0.43,0.953 ± 0.54,0.907 ± 0.71
intermixed_h n=1000 σ=1.0 0.114 ± 0.0081 ms 0.115 ± 0.0076 ms 1.03 ± 0.11,1.01 ± 0.11,0.991 ± 0.096
intermixed_h n=1000 σ=10.0 0.109 ± 0.011 ms 0.109 ± 0.01 ms 0.981 ± 0.12,0.998 ± 0.14,1 ± 0.13
intermixed_h n=1000 σ=100.0 0.115 ± 0.011 ms 0.116 ± 0.011 ms 0.973,0.985 ± 0.14,0.993 ± 0.13
intermixed_h n=10000 σ=0.1 1.16 ± 0.12 ms 1.19 ± 0.12 ms 0.946 ± 0.19,1.01 ± 0.26,0.971 ± 0.14
intermixed_h n=10000 σ=1.0 1.21 ± 0.18 ms 1.18 ± 0.15 ms 0.952 ± 0.2,0.996 ± 0.2,1.03 ± 0.2
intermixed_h n=10000 σ=10.0 1.13 ± 0.18 ms 1.13 ± 0.17 ms 1.02 ± 0.21,1 ± 0.25,1 ± 0.22
intermixed_h n=10000 σ=100.0 1.14 ± 0.19 ms 1.11 ± 0.16 ms 0.975 ± 0.22,0.974 ± 0.2,1.03 ± 0.23
pathological 1 0.0466 ± 0.00059 μs 0.0465 ± 0.00056 μs 1.01 ± 0.011,0.999 ± 0.01,1 ± 0.018
pathological 1′ 0.121 ± 0.0015 μs 0.133 ± 0.0017 μs 0.83 ± 0.012,0.851 ± 0.015,0.908 ± 0.016
pathological 2 0.0634 ± 0.00057 μs 0.0633 ± 0.0007 μs 1 ± 0.0075,1 ± 0.0065,1 ± 0.014
pathological 2′ 0.135 ± 0.0018 μs 0.147 ± 0.0021 μs 0.908 ± 0.014,0.86 ± 0.017,0.922 ± 0.018
pathological 2′′ 0.18 ± 0.0062 μs 0.187 ± 0.012 μs 0.951 ± 0.05,0.877 ± 0.07,0.965 ± 0.071
pathological 3 18.7 ± 0.3 ns 18.7 ± 0.31 ns 1 ± 0.023,1.01 ± 0.027,1 ± 0.023
pathological 4 0.0629 ± 0.0006 μs 0.0627 ± 0.00073 μs 1 ± 0.0079,0.975 ± 0.0082,1 ± 0.015
pathological 4′ 0.138 ± 0.0019 μs 0.151 ± 0.002 μs 0.937 ± 0.014,0.857 ± 0.019,0.912 ± 0.017
pathological 4′′ 0.185 ± 0.007 μs 0.181 ± 0.0039 μs 0.967 ± 0.081,1.02,1.02 ± 0.044
pathological 5a 0.0462 ± 0.00073 μs 0.0461 ± 0.00059 μs 0.996 ± 0.0093,0.995 ± 0.011,1 ± 0.02
pathological 5b 0.0462 ± 0.00069 μs 0.0461 ± 0.00061 μs 1 ± 0.011,0.988 ± 0.011,1 ± 0.02
pathological 5b′ 0.259 ± 0.003 μs 0.265 ± 0.0029 μs 0.957 ± 0.011,0.895 ± 0.01,0.977 ± 0.015
pathological 5b′′ 0.275 ± 0.0076 μs 0.277 ± 0.013 μs 0.919 ± 0.02,0.944 ± 0.19,0.993 ± 0.055
pathological 6 0.273 ± 0.0031 μs 0.278 ± 0.0028 μs 0.9 ± 0.0098,0.951 ± 0.011,0.983 ± 0.015
pathological large compaction (133380-op) 13.6 ± 0.14 ms 13.7 ± 0.14 ms 0.972 ± 0.038,0.988 ± 0.031,0.996 ± 0.015
pathological medium compaction (1254-op) 0.107 ± 0.0048 ms 0.11 ± 0.019 ms 0.915 ± 0.12,0.987 ± 0.07,0.967 ± 0.18
pathological old compaction (6-op) 0.236 μs 0.235 μs 0.98,1.01,1
pathological small compaction (18-op) 0.652 ± 0.008 μs 0.653 ± 0.0065 μs 0.995 ± 0.01,0.917 ± 0.014,1 ± 0.016
pathological tiny compaction (6-op) 0.298 ± 0.0022 μs 0.299 ± 0.0026 μs 0.987 ± 0.011,0.962 ± 0.013,0.999 ± 0.011
sample (bulk) n=1000 k=10000 σ=1 0.179 ± 0.023 ms 0.179 ± 0.023 ms 1 ± 0.18,0.992 ± 0.18,0.998 ± 0.18
sample (bulk) n=1000 k=10000 σ=100 0.126 ± 0.053 ms 0.127 ± 0.053 ms 0.993 ± 0.56,0.983 ± 0.56,0.988 ± 0.58
sample (bulk) n=1000 k=1000000 σ=1 17.7 ± 2.3 ms 17.6 ± 2.7 ms 0.976 ± 0.14,0.99 ± 0.17,1.01 ± 0.2
sample (bulk) n=1000 k=1000000 σ=100 9.78 ± 4.9 ms 10.5 ± 5 ms 1.08 ± 0.8,0.905 ± 0.65,0.932 ± 0.64
sample (bulk) n=1000000 k=10000 σ=1 0.374 ± 0.044 ms 0.384 ± 0.017 ms 0.984 ± 0.12,1.04 ± 0.21,0.974 ± 0.12
sample (bulk) n=1000000 k=10000 σ=100 0.144 ± 0.031 ms 0.145 ± 0.025 ms 1.14 ± 0.5,0.878 ± 0.35,1 ± 0.28
sample (bulk) n=1000000 k=1000000 σ=1 22.9 ± 0.98 ms 22.5 ± 1.4 ms 0.982 ± 0.034,1.06 ± 0.053,1.01 ± 0.076
sample (bulk) n=1000000 k=1000000 σ=100 10.6 ± 3.8 ms 11.2 ± 4.5 ms 1.09 ± 0.6,1.14 ± 0.65,0.954 ± 0.51
sample n=100 σ=0.1 28.6 ± 5.9 ns 27.8 ± 0.94 ns 1 ± 0.04,1 ± 0.035,1.03 ± 0.21
sample n=100 σ=1.0 0.0325 ± 0.0029 μs 0.0324 ± 0.0028 μs 1 ± 0.12,1.01 ± 0.12,1 ± 0.12
sample n=100 σ=10.0 21.2 ± 5.4 ns 21.4 ± 5.4 ns 1 ± 0.35,0.987 ± 0.35,0.99 ± 0.36
sample n=100 σ=100.0 18.6 ± 4.9 ns 18.5 ± 4.7 ns 1 ± 0.36,0.979 ± 0.35,1.01 ± 0.37
sample n=1000 σ=0.1 25.5 ± 7.7 ns 25.4 ± 5.9 ns 0.986 ± 0.34,1 ± 0.39,1 ± 0.38
sample n=1000 σ=1.0 0.0352 ± 0.0028 μs 0.0357 ± 0.0028 μs 0.989 ± 0.1,0.995 ± 0.11,0.988 ± 0.11
sample n=1000 σ=10.0 22.7 ± 5.9 ns 22.2 ± 6 ns 0.984 ± 0.35,1 ± 0.39,1.02 ± 0.39
sample n=1000 σ=100.0 18.5 ± 4.8 ns 18.7 ± 5 ns 0.996 ± 0.36,0.977 ± 0.35,0.989 ± 0.37
sample n=10000 σ=0.1 0.0352 ± 0.0011 μs 0.0355 ± 0.0013 μs 0.987 ± 0.058,1.01 ± 0.045,0.992 ± 0.047
sample n=10000 σ=1.0 0.0371 ± 0.0011 μs 0.0379 ± 0.0024 μs 0.99 ± 0.054,0.971 ± 0.087,0.979 ± 0.069
sample n=10000 σ=10.0 24.3 ± 6.1 ns 23.3 ± 6.7 ns 0.906 ± 0.37,0.992 ± 0.35,1.04 ± 0.4
sample n=10000 σ=100.0 18.5 ± 4.8 ns 18.8 ± 5.4 ns 1.08 ± 0.38,0.944 ± 0.38,0.982 ± 0.38
summarysize n=100 σ=0.1 1.15e+05 ± 0 h 1.15e+05 ± 0 h 1 ± 0
summarysize n=100 σ=1.0 1.15e+05 ± 0 h 1.15e+05 ± 0 h 1 ± 0
summarysize n=100 σ=10.0 1.15e+05 ± 0 h 1.15e+05 ± 0 h 1 ± 0
summarysize n=100 σ=100.0 1.15e+05 ± 0 h 1.15e+05 ± 0 h 1 ± 0
summarysize n=1000 σ=0.1 1.44e+05 ± 0 h 1.44e+05 ± 0 h 1 ± 0
summarysize n=1000 σ=1.0 1.44e+05 ± 0 h 1.44e+05 ± 0 h 1 ± 0
summarysize n=1000 σ=10.0 1.44e+05 ± 0 h 1.44e+05 ± 0 h 1 ± 0
summarysize n=1000 σ=100.0 1.44e+05 ± 0 h 1.44e+05 ± 0 h 1 ± 0
summarysize n=10000 σ=0.1 1e+06 ± 0 h 1e+06 ± 0 h 1 ± 0
summarysize n=10000 σ=1.0 1e+06 ± 0 h 1e+06 ± 0 h 1 ± 0
summarysize n=10000 σ=10.0 1e+06 ± 0 h 1e+06 ± 0 h 1 ± 0
summarysize n=10000 σ=100.0 1e+06 ± 0 h 1e+06 ± 0 h 1 ± 0
update ∘ rand n=100 σ=0.1 0.0897 ± 0.0026 μs 0.0898 ± 0.0025 μs 1.01 ± 0.038,1.01 ± 0.039,0.999 ± 0.04
update ∘ rand n=100 σ=1.0 0.0961 ± 0.003 μs 0.0973 ± 0.0041 μs 0.999 ± 0.073,1 ± 0.047,0.988 ± 0.051
update ∘ rand n=100 σ=10.0 0.112 ± 0.007 μs 0.113 ± 0.007 μs 1 ± 0.098,0.995 ± 0.091,0.994 ± 0.087
update ∘ rand n=100 σ=100.0 0.169 ± 0.0098 μs 0.17 ± 0.0099 μs 0.98 ± 0.08,0.979 ± 0.083,0.993 ± 0.082
update ∘ rand n=1000 σ=0.1 0.0909 ± 0.0026 μs 0.0908 ± 0.0028 μs 1.01 ± 0.043,1.01 ± 0.04,1 ± 0.042
update ∘ rand n=1000 σ=1.0 0.098 ± 0.0031 μs 0.0973 ± 0.0027 μs 1.01 ± 0.044,1.01 ± 0.044,1.01 ± 0.042
update ∘ rand n=1000 σ=10.0 0.109 ± 0.0026 μs 0.108 ± 0.0027 μs 1.01 ± 0.043,0.997 ± 0.036,1.01 ± 0.035
update ∘ rand n=1000 σ=100.0 0.174 ± 0.0045 μs 0.175 ± 0.0067 μs 0.983 ± 0.035,0.982 ± 0.038,0.996 ± 0.046
update ∘ rand n=10000 σ=0.1 0.0999 ± 0.00097 μs 0.0996 ± 0.00017 μs 1 ± 0.035,0.992 ± 0.025,1 ± 0.0099
update ∘ rand n=10000 σ=1.0 0.103 ± 0.0015 μs 0.103 ± 0.0014 μs 0.977 ± 0.021,0.989 ± 0.019,0.996 ± 0.02
update ∘ rand n=10000 σ=10.0 0.105 ± 0.0013 μs 0.106 ± 0.0016 μs 0.984 ± 0.028,1 ± 0.03,0.99 ± 0.019
update ∘ rand n=10000 σ=100.0 0.148 ± 0.0013 μs 0.148 ± 0.0021 μs 0.993 ± 0.021,0.978 ± 0.017,0.999 ± 0.016
time_to_load 0.0489 ± 7.2e-05 s 0.0492 ± 0.00038 s 0.991 ± 0.017,1 ± 0.012,0.994 ± 0.0078

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact of the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@ameligrana ameligrana marked this pull request as ready for review April 18, 2026 19:47
@ameligrana

Collaborator Author

This seems like an improvement to me.

@LilithHafner LilithHafner left a comment

Owner

Almost all of our benchmarks that hit this line use small weight vectors with low m[4], so they over-represent the improvement from this PR. That said, I doubt it's a meaningful regression on large samplers. So performance-wise, LGTM.

If we're tightening this bound, I would like to see a test that stresses it (e.g. one that fails if the +1 is missing but passes with it). And of course, if you find a case that requires +2 while writing that test, that's an issue.
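
As a rough illustration of the kind of test requested above, here is a minimal Python sketch of a "tightness" check on a toy stand-in bound. It is not the package's estimator: the functions `bound`, `holds`, and `violations` are hypothetical names, and the bounded quantity (the bit width of any `x` with `0 <= x < n`) is only assumed to be analogous to the `significand_sum_hi < group_length <= m[4]` situation.

```python
# Toy stand-in, NOT the package's code: bound the bit width of any x
# satisfying 0 <= x < n by floor(log2(n)) + slack.

def bound(n: int, slack: int) -> int:
    """Candidate upper bound on x.bit_length() for all 0 <= x < n."""
    return (n.bit_length() - 1) + slack  # n.bit_length() - 1 == floor(log2(n)) for n >= 1


def holds(n: int, slack: int) -> bool:
    """Check the bound at the worst case x = n - 1 (bit_length is non-decreasing)."""
    return (n - 1).bit_length() <= bound(n, slack)


def violations(slack: int, limit: int) -> list[int]:
    """All n in [1, limit) for which the bound with the given slack fails."""
    return [n for n in range(1, limit) if not holds(n, slack)]


# The test passes with +1 ...
assert violations(1, 1 << 12) == []
# ... fails without it (so the +1 is really needed) ...
assert violations(0, 1 << 12) != []
# ... and since +1 already suffices everywhere tested, nothing in this range needs +2.
```

The same three-part shape (passes with `+1`, fails with `+0`, never needs `+2`) is what a real regression test for the tightened bound would presumably exercise, with the package's actual quantities in place of the toy ones.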
