Use Bumper.jl to reduce allocations in bulk sampling #161

Open
ameligrana wants to merge 6 commits into main from Tortar-patch-2
@ameligrana

Collaborator

No description provided.

@ameligrana

Collaborator Author

This adds another dependency, but it is lightweight and reasonably stable.

@github-actions

github-actions Bot commented Jun 28, 2025


Benchmark Results

| Benchmark | 3a191dc | eaa8e9d | 3a191dc / eaa8e9d |
|---|---|---|---|
| TTFX excluding time to load | 1.02 ± 0 s | 0.932 ± 0 s | 1.09 ± 0,1.1 ± 0,1.09 ± 0 |
| code size in bytes | 1.51e+04 ± 0 h | 1.51e+04 ± 0 h | 1 ± 0 |
| code size in lines | 566 ± 0 h | 564 ± 0 h | 1 ± 0 |
| code size in syntax nodes | 4.08e+03 ± 0 h | 4.08e+03 ± 0 h | 1 ± 0 |
| constructor n=100 σ=0.1 | 5.88 ± 0.2 μs | 5.87 ± 0.15 μs | 0.984 ± 0.042,0.973 ± 0.1,1 ± 0.044 |
| constructor n=100 σ=1.0 | 6.27 ± 0.16 μs | 6.32 ± 0.26 μs | 0.973 ± 0.054,0.988 ± 0.061,0.992 ± 0.049 |
| constructor n=100 σ=10.0 | 6.84 ± 0.38 μs | 6.7 ± 0.17 μs | 0.999 ± 0.055,0.989 ± 0.049,1.02 ± 0.063 |
| constructor n=100 σ=100.0 | 0.0428 ± 0.0031 ms | 10.5 ± 7.9 μs | 4.18 ± 1.9,0.994 ± 0.89,4.06 ± 3.1 |
| constructor n=1000 σ=0.1 | 0.0517 ± 0.0044 ms | 0.042 ± 0.0013 ms | 0.979 ± 0.043,1.02 ± 0.044,1.23 ± 0.11 |
| constructor n=1000 σ=1.0 | 0.0448 ± 0.001 ms | 0.0441 ± 0.001 ms | 0.983 ± 0.04,1.01 ± 0.059,1.02 ± 0.034 |
| constructor n=1000 σ=10.0 | 0.0517 ± 0.001 ms | 0.056 ± 0.0047 ms | 0.984 ± 0.033,1 ± 0.038,0.922 ± 0.079 |
| constructor n=1000 σ=100.0 | 0.0601 ± 0.0012 ms | 0.0597 ± 0.00082 ms | 0.985 ± 0.03,0.996 ± 0.029,1.01 ± 0.024 |
| constructor n=10000 σ=0.1 | 0.433 ± 0.03 ms | 0.418 ± 0.024 ms | 0.982 ± 0.074,1.02 ± 0.071,1.04 ± 0.093 |
| constructor n=10000 σ=1.0 | 0.433 ± 0.025 ms | 0.431 ± 0.026 ms | 0.97 ± 0.065,1.01 ± 0.08,1 ± 0.083 |
| constructor n=10000 σ=10.0 | 0.458 ± 0.028 ms | 0.521 ± 0.03 ms | 0.952 ± 0.1,1.15 ± 0.78,0.879 ± 0.073 |
| constructor n=10000 σ=100.0 | 0.583 ± 0.026 ms | 0.584 ± 0.034 ms | 0.989 ± 0.075,0.999 ± 0.082,0.998 ± 0.073 |
| delete ∘ rand n=100 σ=0.1 | 4.48 ± 0.19 μs | 4.47 ± 0.2 μs | 1 ± 0.06,1.01 ± 0.061,1 ± 0.062 |
| delete ∘ rand n=100 σ=1.0 | 4.75 ± 0.2 μs | 4.7 ± 0.19 μs | 0.996 ± 0.058,1.01 ± 0.056,1.01 ± 0.059 |
| delete ∘ rand n=100 σ=10.0 | 4.89 ± 0.18 μs | 4.87 ± 0.18 μs | 0.992 ± 0.052,1.01 ± 0.052,1 ± 0.052 |
| delete ∘ rand n=100 σ=100.0 | 7.27 ± 0.47 μs | 7.28 ± 0.45 μs | 0.989 ± 0.088,0.989 ± 0.092,0.999 ± 0.089 |
| delete ∘ rand n=1000 σ=0.1 | 0.0445 ± 0.00075 ms | 0.0443 ± 0.00093 ms | 1 ± 0.027,1.01 ± 0.028,1.01 ± 0.027 |
| delete ∘ rand n=1000 σ=1.0 | 0.0484 ± 0.001 ms | 0.0479 ± 0.00096 ms | 0.996 ± 0.029,1.01 ± 0.03,1.01 ± 0.029 |
| delete ∘ rand n=1000 σ=10.0 | 0.0494 ± 0.00086 ms | 0.0491 ± 0.0011 ms | 0.992 ± 0.025,1 ± 0.027,1 ± 0.028 |
| delete ∘ rand n=1000 σ=100.0 | 0.0541 ± 0.001 ms | 0.0541 ± 0.001 ms | 0.992 ± 0.024,1 ± 0.027,0.999 ± 0.026 |
| delete ∘ rand n=10000 σ=0.1 | 0.481 ± 0.012 ms | 0.482 ± 0.019 ms | 0.971 ± 0.12,1.01 ± 0.037,0.998 ± 0.047 |
| delete ∘ rand n=10000 σ=1.0 | 0.515 ± 0.012 ms | 0.518 ± 0.014 ms | 0.995 ± 0.03,1.01 ± 0.035,0.995 ± 0.036 |
| delete ∘ rand n=10000 σ=10.0 | 0.522 ± 0.016 ms | 0.522 ± 0.014 ms | 0.988 ± 0.043,1.01 ± 0.038,1 ± 0.041 |
| delete ∘ rand n=10000 σ=100.0 | 0.52 ± 0.013 ms | 0.514 ± 0.014 ms | 0.985 ± 0.043,1.01 ± 0.045,1.01 ± 0.037 |
| empty constructor | 1.89 ± 0.16 μs | 2.11 ± 0.32 μs | 0.928 ± 0.2,0.92 ± 0.26,0.899 ± 0.16 |
| intermixed_h n=100 σ=0.1 | 11.4 ± 1.2 μs | 14.8 ± 1.4 μs | 0.978 ± 0.11,0.984 ± 0.12,0.77 ± 0.11 |
| intermixed_h n=100 σ=1.0 | 11.7 ± 1.1 μs | 11.7 ± 1.1 μs | 0.986 ± 0.13,0.99 ± 0.15,1 ± 0.13 |
| intermixed_h n=100 σ=10.0 | 11.5 ± 1.7 μs | 11.5 ± 2 μs | 0.988 ± 0.23,0.994 ± 0.33,0.998 ± 0.23 |
| intermixed_h n=100 σ=100.0 | 12.3 ± 1.2 μs | 12.4 ± 1.4 μs | 0.974 ± 0.15,1 ± 0.15,0.99 ± 0.15 |
| intermixed_h n=1000 σ=0.1 | 0.108 ± 0.0089 ms | 0.115 ± 0.0091 ms | 0.979 ± 0.1,0.937 ± 0.13,0.941 ± 0.11 |
| intermixed_h n=1000 σ=1.0 | 0.111 ± 0.0092 ms | 0.109 ± 0.0085 ms | 0.978 ± 0.11,0.985 ± 0.11,1.02 ± 0.12 |
| intermixed_h n=1000 σ=10.0 | 0.104 ± 0.0096 ms | 0.107 ± 0.01 ms | 0.974 ± 0.11,1.03 ± 0.13,0.973 ± 0.13 |
| intermixed_h n=1000 σ=100.0 | 0.113 ± 0.013 ms | 0.114 ± 0.014 ms | 0.975 ± 0.15,0.989 ± 0.15,0.99 ± 0.17 |
| intermixed_h n=10000 σ=0.1 | 1.2 ± 0.19 ms | 1.13 ± 0.19 ms | 1.02 ± 0.22,1.03 ± 0.22,1.06 ± 0.25 |
| intermixed_h n=10000 σ=1.0 | 1.2 ± 0.2 ms | 1.21 ± 0.21 ms | 0.956 ± 0.2,0.968 ± 0.21,0.988 ± 0.23 |
| intermixed_h n=10000 σ=10.0 | 1.09 ± 0.17 ms | 1.08 ± 0.13 ms | 0.974 ± 0.2,1.03 ± 0.27,1.01 ± 0.2 |
| intermixed_h n=10000 σ=100.0 | 1.14 ± 0.18 ms | 1.11 ± 0.18 ms | 1 ± 0.21,0.978 ± 0.2,1.03 ± 0.23 |
| pathological 1 | 0.0547 ± 0.00032 μs | 0.0468 ± 0.0011 μs | 1.01 ± 0.0094,1.01 ± 0.0085,1.17 ± 0.028 |
| pathological 1′ | 0.162 ± 0.0016 μs | 0.18 ± 0.0017 μs | 1.03 ± 0.031,1.01 ± 0.013,0.902 ± 0.012 |
| pathological 2 | 0.0749 ± 0.00016 μs | 0.0636 ± 0.00031 μs | 1.01 ± 0.0088,1.01 ± 0.0054,1.18 ± 0.0062 |
| pathological 2′ | 0.18 ± 0.0016 μs | 0.171 ± 0.0016 μs | 1.09 ± 0.022,1 ± 0.013,1.05 ± 0.013 |
| pathological 2′′ | 0.21 ± 0.0083 μs | 0.206 ± 0.005 μs | 1.08 ± 0.041,1.04 ± 0.11,1.02 ± 0.047 |
| pathological 3 | 16.9 ± 0.24 ns | 17 ± 0.23 ns | 1 ± 0.019,1 ± 0.022,0.997 ± 0.02 |
| pathological 4 | 0.0743 ± 0.00018 μs | 0.063 ± 0.00029 μs | 1.1 ± 0.0076,1.01 ± 0.0066,1.18 ± 0.0061 |
| pathological 4′ | 0.185 ± 0.0017 μs | 0.185 ± 0.0014 μs | 1.06 ± 0.019,0.986 ± 0.014,0.999 ± 0.012 |
| pathological 4′′ | 0.22 ± 0.0047 μs | 0.217 ± 0.004 μs | 1.09 ± 0.055,1.03 ± 0.051,1.02 ± 0.029 |
| pathological 5a | 0.0542 ± 0.00011 μs | 0.0453 ± 0.00025 μs | 1.01 ± 0.0077,1.01 ± 0.0071,1.2 ± 0.007 |
| pathological 5b | 0.0542 ± 0.00011 μs | 0.0453 ± 0.00026 μs | 1.01 ± 0.0079,1.01 ± 0.0077,1.2 ± 0.0074 |
| pathological 5b′ | 0.325 ± 0.0032 μs | 0.329 ± 0.0033 μs | 1.02 ± 0.015,1.02 ± 0.015,0.989 ± 0.014 |
| pathological 5b′′ | 0.336 ± 0.0084 μs | 0.334 ± 0.0062 μs | 0.991 ± 0.031,1 ± 0.033,1.01 ± 0.031 |
| pathological large compaction (133380-op) | 15.2 ± 0.12 ms | 15.1 ± 0.24 ms | 1.03 ± 0.03,1.01 ± 0.036,1 ± 0.018 |
| pathological medium compaction (1254-op) | 0.141 ± 0.096 ms | 0.102 ± 0.0029 ms | 0.985 ± 0.32,1.06 ± 0.24,1.38 ± 0.94 |
| pathological old compaction (6-op) | 0.211 μs | 0.28 μs | 1.04,1.01,0.755 |
| pathological small compaction (18-op) | 0.922 ± 0.022 μs | 0.919 ± 0.013 μs | 1.01 ± 0.022,1.01 ± 0.019,1 ± 0.028 |
| pathological tiny compaction (6-op) | 0.296 ± 0.0021 μs | 0.285 ± 0.011 μs | 1.05 ± 0.024,1.03 ± 0.014,1.04 ± 0.041 |
| sample (bulk) n=1000 k=10000 σ=1 | 0.166 ± 0.02 ms | 0.167 ± 0.021 ms | 1 ± 0.19,0.995 ± 0.18,0.997 ± 0.17 |
| sample (bulk) n=1000 k=10000 σ=100 | 0.101 ± 0.034 ms | 0.103 ± 0.037 ms | 1 ± 0.51,0.977 ± 0.5,0.986 ± 0.49 |
| sample (bulk) n=1000 k=1000000 σ=1 | 16.5 ± 1.8 ms | 16.3 ± 2.2 ms | 0.987 ± 0.17,1.03 ± 0.19,1.02 ± 0.18 |
| sample (bulk) n=1000 k=1000000 σ=100 | 8.12 ± 3 ms | 8.2 ± 3.4 ms | 1.05 ± 0.66,0.97 ± 0.53,0.991 ± 0.55 |
| sample (bulk) n=1000000 k=10000 σ=1 | 0.337 ± 0.015 ms | 0.368 ± 0.032 ms | 0.987 ± 0.14,0.999 ± 0.14,0.916 ± 0.09 |
| sample (bulk) n=1000000 k=10000 σ=100 | 0.168 ± 0.008 ms | 0.155 ± 0.043 ms | 1.15 ± 0.41,1.09 ± 0.39,1.08 ± 0.3 |
| sample (bulk) n=1000000 k=1000000 σ=1 | 21.8 ± 0.64 ms | 21.8 ± 1.4 ms | 1.06 ± 0.032,1.04 ± 0.056,1 ± 0.069 |
| sample (bulk) n=1000000 k=1000000 σ=100 | 6.86 ± 1.5 ms | 8.67 ± 2.9 ms | 0.818 ± 0.36,1.09 ± 0.65,0.791 ± 0.32 |
| sample n=100 σ=0.1 | 25.4 ± 0.63 ns | 25.5 ± 0.71 ns | 1 ± 0.035,1 ± 0.036,0.997 ± 0.037 |
| sample n=100 σ=1.0 | 29.5 ± 2.2 ns | 29.6 ± 2.4 ns | 0.998 ± 0.11,0.999 ± 0.11,0.999 ± 0.11 |
| sample n=100 σ=10.0 | 19.3 ± 4.5 ns | 19.6 ± 4.6 ns | 1.01 ± 0.35,1.01 ± 0.35,0.986 ± 0.33 |
| sample n=100 σ=100.0 | 16.9 ± 3.9 ns | 17.1 ± 3.9 ns | 1.01 ± 0.32,1.01 ± 0.33,0.989 ± 0.32 |
| sample n=1000 σ=0.1 | 23.5 ± 6.3 ns | 23.7 ± 6.1 ns | 1.01 ± 0.35,1.01 ± 0.29,0.991 ± 0.37 |
| sample n=1000 σ=1.0 | 0.0321 ± 0.0023 μs | 0.0324 ± 0.0024 μs | 1 ± 0.099,1 ± 0.098,0.992 ± 0.1 |
| sample n=1000 σ=10.0 | 20.8 ± 5.6 ns | 20.4 ± 5.3 ns | 0.995 ± 0.37,1 ± 0.38,1.02 ± 0.38 |
| sample n=1000 σ=100.0 | 16.9 ± 4 ns | 17.5 ± 4.1 ns | 1.01 ± 0.33,1.01 ± 0.34,0.968 ± 0.33 |
| sample n=10000 σ=0.1 | 0.0318 ± 0.0011 μs | 0.0318 ± 0.0013 μs | 1 ± 0.083,0.996 ± 0.057,0.999 ± 0.054 |
| sample n=10000 σ=1.0 | 0.0342 ± 0.0015 μs | 0.0344 ± 0.0011 μs | 1.01 ± 0.071,1.01 ± 0.05,0.994 ± 0.055 |
| sample n=10000 σ=10.0 | 22.3 ± 6.6 ns | 22.1 ± 6.2 ns | 1.03 ± 0.42,1 ± 0.4,1.01 ± 0.41 |
| sample n=10000 σ=100.0 | 17.5 ± 4.2 ns | 18 ± 4.1 ns | 1.01 ± 0.34,1.04 ± 0.32,0.973 ± 0.32 |
| summarysize n=100 σ=0.1 | 1.19e+05 ± 0 h | 1.19e+05 ± 0 h | 1 ± 0 |
| summarysize n=100 σ=1.0 | 1.19e+05 ± 0 h | 1.19e+05 ± 0 h | 1 ± 0 |
| summarysize n=100 σ=10.0 | 1.19e+05 ± 0 h | 1.19e+05 ± 0 h | 1 ± 0 |
| summarysize n=100 σ=100.0 | 1.19e+05 ± 0 h | 1.19e+05 ± 0 h | 1 ± 0 |
| summarysize n=1000 σ=0.1 | 1.52e+05 ± 0 h | 1.52e+05 ± 0 h | 1 ± 0 |
| summarysize n=1000 σ=1.0 | 1.52e+05 ± 0 h | 1.52e+05 ± 0 h | 1 ± 0 |
| summarysize n=1000 σ=10.0 | 1.52e+05 ± 0 h | 1.52e+05 ± 0 h | 1 ± 0 |
| summarysize n=1000 σ=100.0 | 1.52e+05 ± 0 h | 1.52e+05 ± 0 h | 1 ± 0 |
| summarysize n=10000 σ=0.1 | 1.13e+06 ± 0 h | 1.13e+06 ± 0 h | 1 ± 0 |
| summarysize n=10000 σ=1.0 | 1.13e+06 ± 0 h | 1.13e+06 ± 0 h | 1 ± 0 |
| summarysize n=10000 σ=10.0 | 1.13e+06 ± 0 h | 1.13e+06 ± 0 h | 1 ± 0 |
| summarysize n=10000 σ=100.0 | 1.13e+06 ± 0 h | 1.13e+06 ± 0 h | 1 ± 0 |
| update ∘ rand n=100 σ=0.1 | 0.0857 ± 0.002 μs | 0.0821 ± 0.0021 μs | 1.03 ± 0.043,1.02 ± 0.036,1.04 ± 0.036 |
| update ∘ rand n=100 σ=1.0 | 0.0918 ± 0.0024 μs | 0.089 ± 0.0028 μs | 1.03 ± 0.046,1.01 ± 0.042,1.03 ± 0.042 |
| update ∘ rand n=100 σ=10.0 | 0.102 ± 0.0032 μs | 0.0989 ± 0.0035 μs | 1.02 ± 0.049,1.01 ± 0.046,1.04 ± 0.049 |
| update ∘ rand n=100 σ=100.0 | 0.175 ± 0.015 μs | 0.175 ± 0.016 μs | 0.986 ± 0.12,0.99 ± 0.12,1 ± 0.12 |
| update ∘ rand n=1000 σ=0.1 | 0.0861 ± 0.0022 μs | 0.0827 ± 0.0022 μs | 1.03 ± 0.04,1.02 ± 0.035,1.04 ± 0.039 |
| update ∘ rand n=1000 σ=1.0 | 0.0917 ± 0.002 μs | 0.0889 ± 0.0023 μs | 1.02 ± 0.03,1.01 ± 0.031,1.03 ± 0.034 |
| update ∘ rand n=1000 σ=10.0 | 0.0998 ± 0.0022 μs | 0.0964 ± 0.0018 μs | 1.02 ± 0.032,1.01 ± 0.031,1.04 ± 0.03 |
| update ∘ rand n=1000 σ=100.0 | 0.172 ± 0.006 μs | 0.171 ± 0.0074 μs | 0.991 ± 0.053,0.994 ± 0.053,1 ± 0.056 |
| update ∘ rand n=10000 σ=0.1 | 0.0928 ± 0.0013 μs | 0.0905 ± 0.0013 μs | 1.01 ± 0.02,1.01 ± 0.025,1.03 ± 0.021 |
| update ∘ rand n=10000 σ=1.0 | 0.096 ± 0.0021 μs | 0.0943 ± 0.0012 μs | 1.01 ± 0.024,1.01 ± 0.014,1.02 ± 0.026 |
| update ∘ rand n=10000 σ=10.0 | 0.0975 ± 0.0014 μs | 0.0951 ± 0.001 μs | 1.01 ± 0.024,0.999 ± 0.016,1.03 ± 0.019 |
| update ∘ rand n=10000 σ=100.0 | 0.166 ± 0.0023 μs | 0.166 ± 0.0024 μs | 0.986 ± 0.023,0.981 ± 0.017,1 ± 0.02 |
| time_to_load | 0.0848 ± 0.0014 s | 0.0834 ± 0.0013 s | 1.02 ± 0.0091,1.03 ± 0.014,1.02 ± 0.023 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@ameligrana

ameligrana commented Jun 28, 2025

Collaborator Author

This seems to have a significant effect on performance, and practically the only allocation left is the one for the sample itself, though Bumper's internals scare me a bit.
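
For readers unfamiliar with the approach, the general pattern can be sketched as below. This is only an illustration that assumes Bumper.jl's documented `@no_escape` and `@alloc` macros; the function `sum_of_squares_bumper` is hypothetical and is not the code in `src/bulk_sampling.jl`.

```julia
using Bumper  # assumption: Bumper.jl is installed in the active environment

# Hypothetical helper, not taken from this PR: the scratch buffer lives on
# Bumper's bump allocator rather than on the garbage-collected heap.
function sum_of_squares_bumper(xs::AbstractVector{Float64})
    @no_escape begin
        # The scratch memory is reclaimed when the @no_escape block exits,
        # so `tmp` must not be returned or stored anywhere.
        tmp = @alloc(Float64, length(xs))
        tmp .= xs .^ 2
        sum(tmp)  # only a plain Float64 leaves the block
    end
end
```

The design constraint is that nothing backed by the scratch memory may escape the block, which is why only the scalar result is returned here.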

@ameligrana

Collaborator Author

@LilithHafner I'm wondering whether we could use your https://github.com/LilithHafner/PtrArrays.jl for the same purpose.

@ameligrana

Collaborator Author

I tried it. It seems slower than Bumper.jl when I wrap the free calls in a try-finally block and faster when I do not, but the README strongly encourages using that block.

@ameligrana

Collaborator Author

Actually, the try-finally block really is needed for safety, but I will open a new PR with PtrArrays to try it anyway.
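
To illustrate why the try-finally block matters for manually managed memory, here is a minimal sketch. It uses Base's `Libc.malloc`, `Libc.free`, and `unsafe_wrap` as a stand-in rather than PtrArrays.jl itself, and `with_scratch` is a hypothetical helper name, so treat it as an illustration of the pattern only.

```julia
# Hypothetical helper: run `f` on a manually allocated scratch buffer and
# free the buffer even if `f` throws. `T` is assumed to be an isbits type.
function with_scratch(f, ::Type{T}, n::Integer) where {T}
    n > 0 || throw(ArgumentError("n must be positive"))
    ptr = convert(Ptr{T}, Libc.malloc(n * sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    try
        # own = false: Julia's GC will not try to free this memory itself.
        buf = unsafe_wrap(Array, ptr, n; own = false)
        return f(buf)
    finally
        # Runs on normal return and on error; without it an exception in
        # `f` would leak the allocation.
        Libc.free(ptr)
    end
end

total = with_scratch(Float64, 4) do buf
    buf .= 1.0:4.0
    sum(buf)
end
```

As with the other manual schemes discussed here, the buffer must not be used after `with_scratch` returns.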

@LilithHafner LilithHafner left a comment

Owner

What benchmarks motivate this?
How does this impact precompile time?

Comment thread src/bulk_sampling.jl Outdated
@ameligrana

Collaborator Author

> What benchmarks motivate this?

All cases where k is sufficiently low. In our benchmarks, these are:

sample (bulk) n=1000 k=10000 σ=1
sample (bulk) n=1000 k=10000 σ=100
sample (bulk) n=1000000 k=10000 σ=1
sample (bulk) n=1000000 k=10000 σ=100

Actually, our benchmarks do not consistently show a significant effect, but that is because the effect shows up more in the average than in the minimum (around 10-15% there).
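
One way to see the difference between the two statistics is to collect per-call timings and compare them directly. The sketch below uses only Base and the Statistics standard library; `allocating_work` and `time_samples` are hypothetical names and the workload is not the bulk sampler from this package. The intuition, stated as an assumption, is that garbage-collection pauses land on only some calls, so they can raise the mean while leaving the fastest call untouched.

```julia
using Statistics  # standard library

# Hypothetical workload: allocates a temporary of length k on every call.
function allocating_work(k)
    tmp = Vector{Float64}(undef, k)
    tmp .= 1.0
    return sum(tmp)
end

# Collect per-call wall-clock times in nanoseconds.
function time_samples(f, k; reps = 1_000)
    f(k)  # warm up so that compilation is not measured
    times = Vector{Float64}(undef, reps)
    for i in 1:reps
        t0 = time_ns()
        f(k)
        times[i] = time_ns() - t0
    end
    return times
end

times = time_samples(allocating_work, 10_000)
println("minimum: ", minimum(times), " ns")
println("mean:    ", mean(times), " ns")
```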

> How does this impact precompile time?

With a fresh environment on Julia 1.11:

Without Bumper:

julia> @time @eval using WeightVectors
Precompiling WeightVectors...
  4 dependencies successfully precompiled in 2 seconds
  2.040833 seconds (58.16 k allocations: 4.299 MiB, 1.00% compilation time)

With Bumper:

julia> @time @eval using WeightVectors
Precompiling WeightVectors...
  6 dependencies successfully precompiled in 2 seconds
  2.444766 seconds (75.47 k allocations: 5.830 MiB, 0.90% compilation time)

2 participants