MilesCranmer
Member

Related to the CUDA implementation in #65. The difference is that this is a full AbstractNode type that can be used with all existing functions. It is also fully writable, rather than a read-only structure generated only for the CUDA kernel.

I think this would also be useful for rewriting the CUDA implementation to be leaner, since the existing CUDA kernel has a lot of redundancy.
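
To illustrate the general idea of an array-backed, writable tree, here is a minimal self-contained sketch. It is not the code from this PR and does not use the package's API: `FlatTree`, `push_node!`, `evaluate`, and the operator tuples are hypothetical names invented for this example, and the field layout is an assumption.

```julia
# Hypothetical sketch of an array-backed expression tree.
# None of these names come from the PR or from the package.

# Each node is one index into a set of parallel vectors.
struct FlatTree{T}
    degree::Vector{UInt8}    # 0 = leaf, 1 = unary, 2 = binary
    constant::Vector{Bool}   # leaf only: true if the leaf stores a constant
    val::Vector{T}           # constant value (used when constant is true)
    feature::Vector{UInt16}  # feature index (used when constant is false)
    op::Vector{UInt8}        # operator index (used when degree > 0)
    l::Vector{Int32}         # index of left child, 0 if none
    r::Vector{Int32}         # index of right child, 0 if none
end

FlatTree{T}() where {T} =
    FlatTree{T}(UInt8[], Bool[], T[], UInt16[], UInt8[], Int32[], Int32[])

# Append one node to the arrays and return its index.
function push_node!(t::FlatTree{T}; degree=0, constant=false, val=zero(T),
                    feature=0, op=0, l=0, r=0) where {T}
    push!(t.degree, degree)
    push!(t.constant, constant)
    push!(t.val, val)
    push!(t.feature, feature)
    push!(t.op, op)
    push!(t.l, l)
    push!(t.r, r)
    return Int32(length(t.degree))
end

# Assumed operator tables for this example only.
const UNARY_OPS = (cos, sin)
const BINARY_OPS = (+, *)

# Recursively evaluate the node at index `i` for a single input row `x`.
function evaluate(t::FlatTree{T}, i::Integer, x::AbstractVector{T}) where {T}
    d = t.degree[i]
    if d == 0
        return t.constant[i] ? t.val[i] : x[t.feature[i]]
    elseif d == 1
        return UNARY_OPS[t.op[i]](evaluate(t, t.l[i], x))
    else
        return BINARY_OPS[t.op[i]](evaluate(t, t.l[i], x), evaluate(t, t.r[i], x))
    end
end

# Build x1 * 2.0 + cos(x2).
t = FlatTree{Float64}()
a = push_node!(t; feature=1)
b = push_node!(t; constant=true, val=2.0)
m = push_node!(t; degree=2, op=2, l=a, r=b)     # x1 * 2.0
c = push_node!(t; feature=2)
u = push_node!(t; degree=1, op=1, l=c)          # cos(x2)
root = push_node!(t; degree=2, op=1, l=m, r=u)  # sum of the two subtrees

y1 = evaluate(t, root, [3.0, 0.0])  # 3.0 * 2.0 + cos(0.0) == 7.0

# The storage is writable in place: change the constant and re-evaluate.
t.val[b] = 5.0
y2 = evaluate(t, root, [3.0, 0.0])  # 3.0 * 5.0 + cos(0.0) == 16.0
```

Because every field is a plain vector of a bits type, a layout like this could in principle be copied to a GPU array as-is, which is one reason a shared array-backed node might let a CUDA kernel drop its own read-only tree encoding.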

Contributor

github-actions bot commented Aug 30, 2025

Benchmark Results (Julia v1)

Time benchmarks
Benchmark master 76e9a5b... Ratio (master / 76e9a5b...)
eval/ComplexF32/evaluation 7.3 ± 0.58 ms 7.3 ± 0.55 ms 1 ± 0.11
eval/ComplexF64/evaluation 10.9 ± 1.1 ms 11 ± 1.1 ms 0.996 ± 0.14
eval/Float32/derivative 12.1 ± 0.88 ms 13 ± 1.4 ms 0.928 ± 0.12
eval/Float32/derivative_turbo 12.1 ± 1 ms 13.1 ± 1.6 ms 0.925 ± 0.14
eval/Float32/evaluation 2.77 ± 0.26 ms 2.8 ± 0.26 ms 0.988 ± 0.13
eval/Float32/evaluation_bumper 0.586 ± 0.019 ms 0.583 ± 0.017 ms 1 ± 0.043
eval/Float32/evaluation_turbo 0.544 ± 0.03 ms 0.543 ± 0.038 ms 1 ± 0.09
eval/Float32/evaluation_turbo_bumper 0.586 ± 0.018 ms 0.583 ± 0.018 ms 1 ± 0.044
eval/Float64/derivative 15.3 ± 1.3 ms 16.2 ± 0.87 ms 0.945 ± 0.094
eval/Float64/derivative_turbo 15.5 ± 0.95 ms 16.3 ± 0.85 ms 0.953 ± 0.077
eval/Float64/evaluation 3.21 ± 0.32 ms 3.2 ± 0.33 ms 1 ± 0.15
eval/Float64/evaluation_bumper 1.2 ± 0.044 ms 1.2 ± 0.042 ms 0.998 ± 0.05
eval/Float64/evaluation_turbo 1.06 ± 0.062 ms 1.06 ± 0.062 ms 0.996 ± 0.082
eval/Float64/evaluation_turbo_bumper 1.21 ± 0.046 ms 1.2 ± 0.044 ms 1 ± 0.053
utils/combine_operators/break_sharing 0.0433 ± 0.0036 ms 0.0415 ± 0.002 ms 1.04 ± 0.1
utils/convert/break_sharing 31.2 ± 8 μs 0.0328 ± 0.0078 ms 0.952 ± 0.33
utils/convert/preserve_sharing 0.111 ± 0.013 ms 0.103 ± 0.0099 ms 1.07 ± 0.16
utils/copy/break_sharing 31.6 ± 8.7 μs 30.5 ± 6.1 μs 1.04 ± 0.35
utils/copy/preserve_sharing 0.111 ± 0.013 ms 0.107 ± 0.012 ms 1.04 ± 0.17
utils/count_constant_nodes/break_sharing 14.4 ± 3.3 μs 13.7 ± 2.8 μs 1.05 ± 0.33
utils/count_constant_nodes/preserve_sharing 0.093 ± 0.01 ms 0.0928 ± 0.0099 ms 1 ± 0.15
utils/count_depth/break_sharing 16.4 ± 4.1 μs 16.7 ± 3.4 μs 0.979 ± 0.31
utils/count_nodes/break_sharing 13.7 ± 2.9 μs 13.6 ± 3.2 μs 1.01 ± 0.32
utils/count_nodes/preserve_sharing 0.0931 ± 0.01 ms 0.0939 ± 0.011 ms 0.991 ± 0.16
utils/get_set_constants!/break_sharing 0.0359 ± 0.0054 ms 0.0338 ± 0.0028 ms 1.06 ± 0.18
utils/get_set_constants!/preserve_sharing 0.19 ± 0.015 ms 0.181 ± 0.012 ms 1.05 ± 0.11
utils/get_set_constants_parametric 0.0501 ± 0.0059 ms 0.0515 ± 0.01 ms 0.973 ± 0.23
utils/has_constants/break_sharing 8.66 ± 2.5 μs 9.9 ± 3 μs 0.875 ± 0.37
utils/has_operators/break_sharing 2.78 ± 0.49 μs 2.9 ± 0.51 μs 0.959 ± 0.24
utils/hash/break_sharing 25.8 ± 4.8 μs 26.4 ± 5.1 μs 0.975 ± 0.26
utils/hash/preserve_sharing 0.106 ± 0.01 ms 0.0993 ± 0.0083 ms 1.07 ± 0.14
utils/index_constant_nodes/break_sharing 0.0324 ± 0.0069 ms 0.0328 ± 0.0059 ms 0.988 ± 0.28
utils/index_constant_nodes/preserve_sharing 0.113 ± 0.011 ms 0.112 ± 0.014 ms 1.01 ± 0.16
utils/is_constant/break_sharing 10.8 ± 3.3 μs 10.7 ± 3.4 μs 1.01 ± 0.44
utils/simplify_tree/break_sharing 28.3 ± 5.1 μs 27.4 ± 4.6 μs 1.03 ± 0.25
utils/simplify_tree/preserve_sharing 0.112 ± 0.012 ms 0.115 ± 0.012 ms 0.978 ± 0.14
utils/string_tree/break_sharing 0.523 ± 0.04 ms 0.524 ± 0.039 ms 0.997 ± 0.11
utils/string_tree/preserve_sharing 0.619 ± 0.028 ms 0.617 ± 0.024 ms 1 ± 0.06
time_to_load 0.242 ± 0.0077 s 0.292 ± 0.0073 s 0.83 ± 0.034
Memory benchmarks
Benchmark master 76e9a5b... Ratio (master / 76e9a5b...)
eval/ComplexF32/evaluation 0.978 k allocs: 2.5 MB 0.981 k allocs: 2.51 MB 0.997
eval/ComplexF64/evaluation 0.99 k allocs: 5.04 MB 1.01 k allocs: 5.15 MB 0.979
eval/Float32/derivative 4.69 k allocs: 17.7 MB 4.58 k allocs: 17.2 MB 1.02
eval/Float32/derivative_turbo 4.66 k allocs: 17.5 MB 4.68 k allocs: 17.6 MB 0.996
eval/Float32/evaluation 0.972 k allocs: 1.27 MB 0.969 k allocs: 1.26 MB 1
eval/Float32/evaluation_bumper 0.303 k allocs: 0.393 MB 0.303 k allocs: 0.393 MB 1
eval/Float32/evaluation_turbo 0.969 k allocs: 1.26 MB 0.981 k allocs: 1.28 MB 0.988
eval/Float32/evaluation_turbo_bumper 0.303 k allocs: 0.393 MB 0.303 k allocs: 0.393 MB 1
eval/Float64/derivative 4.8 k allocs: 0.0351 GB 4.84 k allocs: 0.0354 GB 0.991
eval/Float64/derivative_turbo 4.78 k allocs: 0.0349 GB 4.77 k allocs: 0.0349 GB 1
eval/Float64/evaluation 0.981 k allocs: 2.51 MB 1.01 k allocs: 2.58 MB 0.973
eval/Float64/evaluation_bumper 0.303 k allocs: 0.771 MB 0.303 k allocs: 0.771 MB 1
eval/Float64/evaluation_turbo 0.987 k allocs: 2.53 MB 0.999 k allocs: 2.56 MB 0.988
eval/Float64/evaluation_turbo_bumper 0.303 k allocs: 0.771 MB 0.303 k allocs: 0.771 MB 1
utils/combine_operators/break_sharing 4 allocs: 0.953 kB 4 allocs: 0.953 kB 1
utils/convert/break_sharing 2 k allocs: 0.123 MB 2 k allocs: 0.123 MB 1
utils/convert/preserve_sharing 2.4 k allocs: 0.192 MB 2.4 k allocs: 0.192 MB 1
utils/copy/break_sharing 2 k allocs: 0.123 MB 2 k allocs: 0.123 MB 1
utils/copy/preserve_sharing 2.4 k allocs: 0.192 MB 2.4 k allocs: 0.192 MB 1
utils/count_constant_nodes/break_sharing 4 allocs: 0.953 kB 4 allocs: 0.953 kB 1
utils/count_constant_nodes/preserve_sharing 0.404 k allocs: 0.0696 MB 0.404 k allocs: 0.0696 MB 1
utils/count_depth/break_sharing 4 allocs: 0.953 kB 4 allocs: 0.953 kB 1
utils/count_nodes/break_sharing 4 allocs: 0.953 kB 4 allocs: 0.953 kB 1
utils/count_nodes/preserve_sharing 0.404 k allocs: 0.0696 MB 0.404 k allocs: 0.0696 MB 1
utils/get_set_constants!/break_sharing 0.898 k allocs: 25.2 kB 0.898 k allocs: 25.2 kB 1
utils/get_set_constants!/preserve_sharing 1.7 k allocs: 0.138 MB 1.7 k allocs: 0.138 MB 1
utils/get_set_constants_parametric 1.42 k allocs: 0.0663 MB 1.42 k allocs: 0.0663 MB 1
utils/has_constants/break_sharing 4 allocs: 0.203 kB 4 allocs: 0.203 kB 1
utils/has_operators/break_sharing 4 allocs: 0.203 kB 4 allocs: 0.203 kB 1
utils/hash/break_sharing 0.104 k allocs: 2.52 kB 0.104 k allocs: 2.52 kB 1
utils/hash/preserve_sharing 0.504 k allocs: 0.0711 MB 0.504 k allocs: 0.0711 MB 1
utils/index_constant_nodes/break_sharing 2.1 k allocs: 0.094 MB 2.1 k allocs: 0.094 MB 1
utils/index_constant_nodes/preserve_sharing 2.5 k allocs: 0.163 MB 2.5 k allocs: 0.163 MB 1
utils/is_constant/break_sharing 4 allocs: 0.203 kB 4 allocs: 0.203 kB 1
utils/simplify_tree/break_sharing 4 allocs: 0.953 kB 4 allocs: 0.953 kB 1
utils/simplify_tree/preserve_sharing 0.404 k allocs: 0.0696 MB 0.404 k allocs: 0.0696 MB 1
utils/string_tree/break_sharing 11.8 k allocs: 1.04 MB 11.8 k allocs: 1.04 MB 1
utils/string_tree/preserve_sharing 12.2 k allocs: 1.11 MB 12.2 k allocs: 1.11 MB 1
time_to_load 0.159 k allocs: 11.2 kB 0.159 k allocs: 11.2 kB 1

codecov bot commented Aug 30, 2025

Codecov Report

❌ Patch coverage is 0% with 245 lines in your changes missing coverage. Please review.
✅ Project coverage is 3.06%. Comparing base (2c8d3ca) to head (76e9a5b).

Files with missing lines | Patch % | Lines
src/ArrayNode.jl | 0.00% | 245 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (2c8d3ca) and HEAD (76e9a5b).

HEAD has 4 uploads less than BASE
Flag | BASE (2c8d3ca) | HEAD (76e9a5b)
(none) | 5 | 1
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #138       +/-   ##
==========================================
- Coverage   96.66%   3.06%   -93.61%     
==========================================
  Files          30      31        +1     
  Lines        2612    2745      +133     
==========================================
- Hits         2525      84     -2441     
- Misses         87    2661     +2574     
