Skip to content

Conversation

@austinderek
Copy link

Implement overflow-aware optimization in ctrBlocks8Asm: make a fast branch
in case when there is no overflow. One branch per 8 blocks is faster than
7 increments in general purpose registers and transfers from them to XMM.

Added AES-192 and AES-256 modes to the AES-CTR benchmark.

Added a correctness test in ctr_test.go for the overflow optimization.

This improves performance, especially in AES-128 mode.

goos: windows
goarch: amd64
pkg: crypto/cipher
cpu: AMD Ryzen 7 5800H with Radeon Graphics
│ B/s │ B/s vs base
AESCTR/128/50-16 1.377Gi ± 0% 1.384Gi ± 0% +0.51% (p=0.028 n=20)
AESCTR/128/1K-16 6.164Gi ± 0% 6.892Gi ± 1% +11.81% (p=0.000 n=20)
AESCTR/128/8K-16 7.372Gi ± 0% 8.768Gi ± 1% +18.95% (p=0.000 n=20)
AESCTR/192/50-16 1.289Gi ± 0% 1.279Gi ± 0% -0.75% (p=0.001 n=20)
AESCTR/192/1K-16 5.734Gi ± 0% 6.011Gi ± 0% +4.83% (p=0.000 n=20)
AESCTR/192/8K-16 6.889Gi ± 1% 7.437Gi ± 0% +7.96% (p=0.000 n=20)
AESCTR/256/50-16 1.170Gi ± 0% 1.163Gi ± 0% -0.54% (p=0.005 n=20)
AESCTR/256/1K-16 5.235Gi ± 0% 5.391Gi ± 0% +2.98% (p=0.000 n=20)
AESCTR/256/8K-16 6.361Gi ± 0% 6.676Gi ± 0% +4.94% (p=0.000 n=20)
geomean 3.681Gi 3.882Gi +5.46%

The slight slowdown on 50-byte workloads is unrelated to this change,
because such workloads never use ctrBlocks8Asm.

Depends-On: I28e4c9ebb5bf72506a524e36a0c81a1b50367a84


🔄 This is a mirror of upstream PR golang#76059

@austinderek austinderek force-pushed the master branch 2 times, most recently from 5bf50a0 to 53ad68d Compare October 26, 2025 18:01
@austinderek austinderek force-pushed the aes-ctr-amd64-overflow-aware-optimization branch from 8568edb to ef888f1 Compare October 26, 2025 18:02
@austinderek austinderek force-pushed the master branch 27 times, most recently from 5dcaf9a to 6cde005 Compare October 27, 2025 07:03
@austinderek austinderek force-pushed the master branch 30 times, most recently from 00ee186 to c37e118 Compare November 3, 2025 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants