Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize Array.of and Array.from for cpu pipelining and branch prediction #76

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

evanwashere
Copy link

@evanwashere evanwashere commented Jan 4, 2025

passes 262 webkit tests (Tools/Scripts/test262-runner --release, build first)
todo: check cpu cycles and instructions count (needs bun build for @mitata/counters)

patch: https://gist.github.com/evanwashere/17778413aba8b1eec1adfe5c6ef04a69

before

clk: ~3.33 GHz
cpu: null
runtime: jsc (null)

benchmark                               avg (min  max) p75   p99    (min  top 1%)
------------------------------------------------------- -------------------------------
Array.of(512)                              1.63 µs/iter   1.67 µs   1.68 µs ▃▆▇▁▁▂▁▂▄█▂
Array.of(4096)                            10.47 µs/iter  10.49 µs  10.50 µs ▃▃▁▆▁▆▆▃█▆▆
Array.of(32768)                           88.37 µs/iter  93.33 µs 110.25 µs ▂▅█▃▂▁▁▁▄▁▁
Array.of(65536)                          155.96 µs/iter 157.58 µs 174.00 µs ▂██▇▄▃▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(512)                            1.02 µs/iter   1.02 µs   1.32 µs ▃█▂▁▁▁▁▁▁▁▁
Array.from(4096)                           6.75 µs/iter   6.77 µs   6.81 µs ▂▅█▇▄▇▄▂▅▄▂
Array.from(32768)                         53.83 µs/iter  53.92 µs  54.25 µs ▃▁▃█▆▃▃▁▁▃▃
Array.from(65536)                        110.81 µs/iter 112.46 µs 124.75 µs ▂▆█▇▇▃▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(arr 512)                      200.16 ns/iter 202.21 ns 221.45 ns ▁▄█▆▄▂▂▁▁▁▁
Array.from(arr 4096)                     764.40 ns/iter 763.08 ns 989.76 ns ▂█▃▁▁▁▁▁▁▁▁
Array.from(arr 32768)                      6.12 µs/iter   6.07 µs   6.66 µs █▆▂▁▁▁▁▁▁▁▂
Array.from(arr 65536)                     11.80 µs/iter  11.87 µs  11.87 µs ▂▁▁▁▁▁▁▁▁▄█

------------------------------------------------------- -------------------------------
Array.from(512, () => 1)                   2.56 µs/iter   2.56 µs   2.58 µs ▂▃▄▅▄█▃▃▃▂▂
Array.from(4096, () => 1)                 14.27 µs/iter  14.30 µs  14.34 µs ▃▁▁▃▆▁▁█▆▃▃
Array.from(32768, () => 1)               118.04 µs/iter 119.67 µs 132.92 µs ▁█▅▆▄▂▂▁▁▁▁
Array.from(65536, () => 1)               229.38 µs/iter 231.33 µs 246.58 µs ▁▂▆█▆▄▂▂▁▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512)                   2.21 µs/iter   2.22 µs   2.27 µs ▃▆█▆▂▁▁▁▁▁▁
Array.from(iterator 4096)                 16.04 µs/iter  16.04 µs  16.06 µs █▅▁▅▅▁█▅▁▁▅
Array.from(iterator 32768)               140.09 µs/iter 141.79 µs 161.83 µs ▁▁▂▂█▃▂▁▁▁▁
Array.from(iterator 65536)               262.21 µs/iter 263.79 µs 310.17 µs ▁▁█▇▃▂▁▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512, () => 1)          2.58 µs/iter   2.58 µs   2.81 µs ▂█▂▁▁▁▁▁▁▁▁
Array.from(iterator 4096, () => 1)        18.93 µs/iter  18.90 µs  18.92 µs ▃▃▆▆▁▁▃▁█▁▃
Array.from(iterator 32768, () => 1)      164.04 µs/iter 165.50 µs 178.83 µs ▁▁▁▄█▅▃▂▁▁▁
Array.from(iterator 65536, () => 1)      305.42 µs/iter 307.25 µs 326.67 µs ▁▃██▅▂▂▂▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(512)                      26.16 µs/iter  26.08 µs  26.34 µs ▃▆█▆▃▁▁▁▃▁▃
Array.fromAsync(4096)                    214.39 µs/iter 216.50 µs 231.75 µs ▂▂▆█▂▄▂▁▁▁▁
Array.fromAsync(32768)                     1.64 ms/iter   1.65 ms   1.68 ms ▁▅▇▅▃▆█▅▃▂▁
Array.fromAsync(65536)                     3.26 ms/iter   3.28 ms   3.30 ms ▂▅▆▄▃▂▂▄█▄▂

------------------------------------------------------- -------------------------------
Array.fromAsync(arr 512)                  81.31 µs/iter  82.42 µs 100.88 µs ▁█▅▄▂▂▁▁▁▁▁
Array.fromAsync(arr 4096)                559.67 µs/iter 564.25 µs 591.58 µs ▁▅█▆▅▃▃▂▁▁▁
Array.fromAsync(arr 32768)                 4.20 ms/iter   4.21 ms   4.30 ms ▁▄██▅▂▂▁▁▁▁
Array.fromAsync(arr 65536)                 8.35 ms/iter   8.37 ms   8.49 ms ▄▅██▆▃▁▂▂▁▂

------------------------------------------------------- -------------------------------
Array.fromAsync(512, () => 1)             56.75 µs/iter  57.21 µs  69.75 µs ▂█▂▂▂▁▁▁▁▁▁
Array.fromAsync(4096, () => 1)           417.08 µs/iter 420.08 µs 438.79 µs ▁▃██▆▄▃▂▂▁▁
Array.fromAsync(32768, () => 1)            3.24 ms/iter   3.24 ms   3.32 ms ▃▆▇█▄▂▁▁▁▁▁
Array.fromAsync(65536, () => 1)            6.45 ms/iter   6.47 ms   6.49 ms ▂▄█▄▄▄██▆▄▃

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512)             83.20 µs/iter  84.29 µs  97.50 µs ▁▁█▆▃▂▂▁▁▁▁
Array.fromAsync(iterator 4096)           571.32 µs/iter 575.21 µs 599.50 µs ▁▁▄█▆▄▃▂▂▁▁
Array.fromAsync(iterator 32768)            4.28 ms/iter   4.29 ms   4.34 ms ▂▄▆█▇▇▃▂▂▁▁
Array.fromAsync(iterator 65536)            8.53 ms/iter   8.55 ms   8.75 ms ▃▄█▅▂▂▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512, () => 1)   105.42 µs/iter 106.00 µs 121.00 µs ▁▁█▄▂▂▁▁▁▁▁
Array.fromAsync(iterator 4096, () => 1)  747.20 µs/iter 752.13 µs 778.75 µs ▁▁▁▃█▄▃▂▂▁▁
Array.fromAsync(iterator 32768, () => 1)   5.70 ms/iter   5.72 ms   5.79 ms ▁▃▅▆▇█▄▃▃▂▂
Array.fromAsync(iterator 65536, () => 1)  11.38 ms/iter  11.40 ms  11.65 ms ▃▃█▆▅▂▂▁▂▁▂

after

clk: ~3.41 GHz
cpu: null
runtime: jsc (null)

benchmark                               avg (min  max) p75   p99    (min  top 1%)
------------------------------------------------------- -------------------------------
Array.of(512)                              1.45 µs/iter   1.48 µs   1.51 µs ▂▅█▃▂▁▂▇▇▂▁
Array.of(4096)                             9.15 µs/iter   9.17 µs   9.18 µs ▅▁▃▃▆▃▅▅▅█▃
Array.of(32768)                           71.90 µs/iter  71.94 µs  72.43 µs ▃▆▃▁▆█▁▁▃▁▃
Array.of(65536)                          138.11 µs/iter 139.79 µs 154.50 µs ▁▁▂▇█▄▃▂▁▁▁

------------------------------------------------------- -------------------------------
Array.from(512)                          393.27 ns/iter 395.49 ns 479.08 ns ▃█▃▂▁▁▁▁▁▁▁
Array.from(4096)                           1.54 µs/iter   1.55 µs   1.59 µs ▁▃▆█▇▅▃▁▁▁▁
Array.from(32768)                         12.18 µs/iter  12.20 µs  12.39 µs ▂█▅▂▂▁▁▁▂▁▄
Array.from(65536)                         24.60 µs/iter  24.61 µs  24.66 µs ▃▁▁▃▆▃▁█▆▁▃

------------------------------------------------------- -------------------------------
Array.from(arr 512)                      192.91 ns/iter 194.31 ns 224.31 ns ▁▆█▄▂▁▁▁▁▁▁
Array.from(arr 4096)                     765.31 ns/iter 755.46 ns   1.02 µs ▂█▂▁▁▁▁▁▁▁▁
Array.from(arr 32768)                      5.59 µs/iter   5.82 µs   5.89 µs ▄▁▁▁▁▁▁▁▁█▂
Array.from(arr 65536)                     11.59 µs/iter  11.73 µs  11.74 µs ▂▁▃▁▁▁▁▁▂▄█

------------------------------------------------------- -------------------------------
Array.from(512, () => 1)                 411.13 ns/iter 412.54 ns 461.70 ns ▂█▆▃▁▁▁▁▁▁▁
Array.from(4096, () => 1)                  2.74 µs/iter   2.74 µs   2.85 µs ▄█▆▂▁▁▁▂▁▁▁
Array.from(32768, () => 1)                22.48 µs/iter  22.54 µs  22.64 µs ▃█▃▁▃▁▁▅▁▃▃
Array.from(65536, () => 1)                45.53 µs/iter  45.58 µs  45.78 µs ▅█▅▅█▅█▁▁▁▅

------------------------------------------------------- -------------------------------
Array.from(iterator 512)                   2.13 µs/iter   2.13 µs   2.34 µs ▄█▂▁▁▁▁▁▁▁▁
Array.from(iterator 4096)                 15.42 µs/iter  15.43 µs  15.45 µs ▅▅▅█▁▅▅▅▁▁█
Array.from(iterator 32768)               140.76 µs/iter 142.79 µs 158.58 µs ▁▁▂▁▂█▅▂▂▁▁
Array.from(iterator 65536)               257.83 µs/iter 260.17 µs 279.75 µs ▁▁▁▃█▅▃▂▂▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512, () => 1)          2.06 µs/iter   2.06 µs   2.12 µs ▂▄█▄▃▁▁▁▁▁▁
Array.from(iterator 4096, () => 1)        14.86 µs/iter  14.88 µs  14.89 µs ▆▃▁▃█▁▁▁▁▃▆
Array.from(iterator 32768, () => 1)      133.87 µs/iter 136.42 µs 149.58 µs ▁▁▁▂▃▆█▄▂▁▁
Array.from(iterator 65536, () => 1)      246.52 µs/iter 248.83 µs 266.54 µs ▁▁▁▂▆█▄▂▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(512)                     579.35 ns/iter 581.24 ns 670.80 ns ▃█▄▂▁▁▁▁▁▁▁
Array.fromAsync(4096)                      1.88 µs/iter   1.88 µs   1.94 µs ▁▁▆█▇▃▁▂▁▁▁
Array.fromAsync(32768)                    13.24 µs/iter  13.25 µs  13.36 µs ▅▆▃█▁▃▁▁▁▁▃
Array.fromAsync(65536)                    26.89 µs/iter  26.84 µs  26.93 µs ▃▁▃█▃▃▆▃▁▁▃

------------------------------------------------------- -------------------------------
Array.fromAsync(arr 512)                  77.46 µs/iter  78.71 µs  93.96 µs ▁▁█▄▃▂▁▁▁▁▁
Array.fromAsync(arr 4096)                525.49 µs/iter 530.42 µs 554.63 µs ▁▁▇█▆▄▄▂▂▁▁
Array.fromAsync(arr 32768)                 3.89 ms/iter   3.90 ms   3.99 ms ▂▆█▅▃▂▂▂▂▁▁
Array.fromAsync(arr 65536)                 7.81 ms/iter   7.87 ms   7.98 ms ▃█▄▅▆▆▅▅▄▂▂

------------------------------------------------------- -------------------------------
Array.fromAsync(512, () => 1)             26.76 µs/iter  26.83 µs  27.03 µs ▃▁█▁▃▆▃▆▁▁▃
Array.fromAsync(4096, () => 1)           212.37 µs/iter 215.58 µs 229.96 µs ▁▂█▂▃▂▁▁▁▁▁
Array.fromAsync(32768, () => 1)            1.62 ms/iter   1.63 ms   1.66 ms ▂▃▆██▆▄▃▁▁▁
Array.fromAsync(65536, () => 1)            3.24 ms/iter   3.25 ms   3.41 ms ▃█▅▃▂▂▁▁▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512)             79.03 µs/iter  80.29 µs  92.04 µs ▁▁▆█▄▂▂▂▁▁▁
Array.fromAsync(iterator 4096)           538.00 µs/iter 542.71 µs 571.71 µs ▁▂▄█▇▄▃▂▂▁▁
Array.fromAsync(iterator 32768)            4.01 ms/iter   4.02 ms   4.10 ms ▂▆█▇▄▂▂▂▁▂▁
Array.fromAsync(iterator 65536)            8.05 ms/iter   8.11 ms   8.35 ms ▆█▆▅▇▇▃▃▃▂▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512, () => 1)   106.25 µs/iter 107.38 µs 123.33 µs ▁▁▆█▃▂▂▁▁▁▁
Array.fromAsync(iterator 4096, () => 1)  745.55 µs/iter 751.38 µs 794.75 µs ▁▁▃█▅▄▂▂▁▁▁
Array.fromAsync(iterator 32768, () => 1)   5.71 ms/iter   5.73 ms   5.83 ms ▂▄█▇▆▃▃▂▂▂▂
Array.fromAsync(iterator 65536, () => 1)  11.51 ms/iter  11.57 ms  11.75 ms ▂▅▄▅▄█▆▄▃▁▂

@evanwashere
Copy link
Author

@Constellation what do you think of having this upstreamed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant