Support for Intel SHA extensions #2734
Comments
"The SHA instructions are non-SIMD although they are defined with XMM width operands" I had a look at this some time ago. As far as I understood it, the extensions are fast at producing a single SHA-1 or SHA-256 digest, as opposed to our current code, which produces e.g. eight in parallel. At the time, my conclusion was that we wouldn't gain anything (unless they are several times faster than our current code). I hope someone can prove me wrong!
@solardiz do you have insights?
It's hard to tell without testing on real hardware, including interleaving several instances of SHA using those instructions, and maybe our usual SIMD code at the same time. We might gain something. Without access to hardware yet, we could review the documentation for which uop port(s) those instructions are issued on, i.e. whether or not they fully conflict with SIMD. We also need their latency and throughput numbers, to compare against what we achieve with SIMD. At four (SHA-1) or two (SHA-256) rounds per instruction, and with several instances interleaved (to be friendly to the CPU's pipelining), these might be competitive with AVX2 or AVX-512 despite computing fewer instances. AMD Ryzen hardware is already available, cheaply. So maybe one of us should get a machine like that and try? Then we'll also need to try and tune on Intel, which might or might not require different tuning: the interleaving factor, and whether and how much SIMD to use as well. Per http://instlatx64.atw.hu it looks like on Ryzen the 4 rounds of SHA-1 may only be issued once per 4 cycles, and the 2 rounds of SHA-256 only once per 2 cycles. If so, I wouldn't expect them to be useful for us on those CPUs unless we can efficiently interleave with SIMD.
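To make the Ryzen figures above concrete, here is a rough back-of-the-envelope sketch in C. It counts only the round instructions (80 rounds per SHA-1 block, 64 per SHA-256 block) and ignores message-schedule and load/store work, so it gives a lower bound, not a measurement. The function names are made up for illustration, and any SIMD cycle count passed to `simd_cycles_per_block` is an assumption to be replaced with a real benchmark number.

```c
/* Lower bound on cycles per hashed block when using the SHA round
 * instructions, counting the round instructions only.
 *   rounds                - rounds per block (80 for SHA-1, 64 for SHA-256)
 *   rounds_per_insn       - 4 for SHA-1, 2 for SHA-256
 *   issue_interval_cycles - how often one round instruction can issue
 * Because the issue interval is a throughput limit, interleaving more
 * instances that use the same instructions cannot go below this bound. */
static double sha_ext_cycles_per_block(int rounds, int rounds_per_insn,
                                       double issue_interval_cycles)
{
    return (double)rounds / rounds_per_insn * issue_interval_cycles;
}

/* Cycles per block per candidate for SIMD code that needs
 * cycles_per_vector_block cycles to hash `lanes` blocks in parallel.
 * cycles_per_vector_block must come from a measurement; it is not
 * given anywhere in this thread. */
static double simd_cycles_per_block(double cycles_per_vector_block, int lanes)
{
    return cycles_per_vector_block / lanes;
}
```

With the instlatx64 figures quoted above, `sha_ext_cycles_per_block(80, 4, 4.0)` gives 80 cycles per SHA-1 block and `sha_ext_cycles_per_block(64, 2, 2.0)` gives 64 cycles per SHA-256 block. The SHA instructions would only win on their own if the measured SIMD cost per block, divided by the lane count, came out higher than that.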
This would be a great GSoC project, or the like.
Closing this in favor of #5437.
References:
https://software.intel.com/en-us/articles/intel-sha-extensions-implementations (has sample code)
https://en.wikipedia.org/wiki/Intel_SHA_extensions
It seems that the Intel Software Development Emulator can be used for development and testing.
intel-sha-extensions_1.zip <- code from Intel.
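Any code path using these instructions would presumably need a runtime check before it is selected. A minimal sketch, assuming GCC or Clang (for `<cpuid.h>` and `__get_cpuid_count`): the SHA extensions are reported in CPUID leaf 7, subleaf 0, EBX bit 29. The function name `cpu_has_sha_ext` is hypothetical and not taken from the project.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header; not available with MSVC */
#endif

/* Returns true if CPUID reports the SHA extensions
 * (leaf 7, subleaf 0, EBX bit 29). Always false on non-x86 builds. */
static bool cpu_has_sha_ext(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ebx >> 29) & 1u) != 0;
#else
    return false;
#endif
}
```

A dispatcher could call `cpu_has_sha_ext()` once at startup and fall back to the existing SIMD code when it returns false.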