Support for Intel SHA extensions #2734

Closed
kholia opened this issue Sep 7, 2017 · 5 comments

Comments

@kholia
Member

kholia commented Sep 7, 2017

References:

It seems that the Intel Software Development Emulator (SDE) can be used for development and testing.

intel-sha-extensions_1.zip <- code from Intel.
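Any SHA-extension code path would first need a runtime check that the CPU supports it. The sketch below assumes GCC or Clang on x86 (it uses their `<cpuid.h>`). It reads the SHA feature flag, CPUID leaf 7, subleaf 0, EBX bit 29. `cpu_has_sha_ext` is a hypothetical helper name, not existing project code.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang-specific header */
#endif

/* Hypothetical helper: returns 1 if CPUID reports the SHA extensions,
 * 0 otherwise (including on non-x86 targets). The feature flag is
 * CPUID.(EAX=07H, ECX=0):EBX bit 29. */
int cpu_has_sha_ext(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 must be supported before it can be queried. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx; /* only EBX is needed */
    return (int)((ebx >> 29) & 1u);
#else
    return 0; /* not an x86 CPU, so no Intel SHA extensions */
#endif
}
```

Calling it once at startup and falling back to the existing SIMD code when it returns 0 keeps the SHA-extension path optional.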

@magnumripper
Member

magnumripper commented Sep 7, 2017

"The SHA instructions are non-SIMD although they are defined with XMM width operands"

I had a look at this some time ago. As far as I understood it, the extensions are fast at producing a single SHA-1 or SHA-256 digest, whereas our current code produces, e.g., eight in parallel. At the time, I concluded that we wouldn't gain anything unless they are several times faster than our current code. I hope someone can prove me wrong!
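The break-even condition can be stated as a throughput comparison. A single stream wins only if it finishes one block in fewer cycles than the SIMD code spends per lane. For 8 lanes, it must be more than 8 times faster than one SIMD batch. In the sketch, the cycle counts are placeholders to be measured, not benchmark results. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: digests completed per cycle when `lanes` digests
 * (one 64-byte block each) are produced together in `cycles_per_batch`
 * cycles. A single SHA-extension stream has lanes == 1. */
double digests_per_cycle(double cycles_per_batch, int lanes)
{
    assert(cycles_per_batch > 0.0 && lanes > 0);
    return (double)lanes / cycles_per_batch;
}

/* Hypothetical helper: 1 if a single-stream implementation taking
 * `single_cycles` per block out-produces a SIMD implementation that
 * processes `simd_lanes` blocks in `simd_cycles`; equivalent to
 * single_cycles < simd_cycles / simd_lanes. */
int single_stream_wins(double single_cycles, double simd_cycles, int simd_lanes)
{
    return digests_per_cycle(single_cycles, 1) >
           digests_per_cycle(simd_cycles, simd_lanes);
}
```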

@magnumripper
Member

@solardiz, do you have any insights?

@solardiz
Member

solardiz commented Sep 7, 2017

It's hard to tell without testing on real hardware, including interleaving several instances of SHA using those instructions, and maybe our usual SIMD code at the same time. We might gain something. Without access to hardware yet, we could review the documentation for which uop port(s) those instructions are issued on, i.e., whether they fully conflict with SIMD or not. We also need their latency and throughput numbers, to compare against what we achieve with SIMD.

At four (SHA-1) or two (SHA-256) rounds per instruction, and with several instances interleaved (to be friendly to the CPU's pipelining), these might be competitive with AVX2 or AVX-512 even though they compute fewer instances.
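Within one SHA instance, each round instruction needs the previous one's result. So a single instance can issue a round instruction at most once per instruction latency. A common rule of thumb follows, assuming the round instruction is fully pipelined: about ceil(latency / reciprocal throughput) independent instances are needed to keep it busy. The latency and throughput are inputs here, since the thread has no such numbers yet. `min_interleave` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: minimum number of independent SHA instances to
 * interleave so a fully pipelined round instruction with the given
 * latency and reciprocal throughput (both in cycles) never sits idle.
 * One dependency chain can issue at most once per `latency` cycles. */
int min_interleave(int latency, int recip_tput)
{
    assert(latency > 0 && recip_tput > 0);
    return (latency + recip_tput - 1) / recip_tput; /* ceil(latency / recip_tput) */
}
```

If latency equals the reciprocal throughput, one instance already saturates the unit. Interleaving helps only when latency is longer.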

AMD Ryzen hardware is already available, cheaply. So maybe one of us should get such a machine and try? Then we'll also need to try and tune on Intel, which might or might not require different tuning (interleaving factor, and whether and how much SIMD to use as well).

Per http://instlatx64.atw.hu it looks like on Ryzen the 4 rounds of SHA-1 may only be issued once per 4 cycles, and the 2 rounds of SHA-256 only once per 2 cycles. If so, I wouldn't expect them to be useful for us on those CPUs unless we can efficiently interleave with SIMD.
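Those issue rates give a lower bound per 64-byte block. SHA-1 has 80 rounds, so 20 four-round instructions. SHA-256 has 64 rounds, so 32 two-round instructions. The sketch takes the quoted Ryzen figures as inputs. It counts only the round instructions and ignores message-schedule instructions and other work, so real code can be no faster. `min_cycles_per_block` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: lower bound on cycles per 64-byte block set by the
 * round instruction's issue rate alone. Message-schedule instructions and
 * other work are ignored, so real code can be no faster than this. */
double min_cycles_per_block(int rounds, int rounds_per_insn, double issue_interval)
{
    assert(rounds > 0 && rounds_per_insn > 0 && issue_interval > 0.0);
    int insns = (rounds + rounds_per_insn - 1) / rounds_per_insn;
    return insns * issue_interval;
}
```

With those figures, `min_cycles_per_block(80, 4, 4.0)` gives 80 cycles for SHA-1 and `min_cycles_per_block(64, 2, 2.0)` gives 64 cycles for SHA-256. That is roughly one cycle per round, regardless of how many instances are interleaved.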

@magnumripper
Member

This would be a great GSoC project, or the like.

@solardiz
Member

Closing this in favor of #5437.

@solardiz closed this as not planned on May 25, 2024

3 participants