-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable AVX512VL + AVX512DQ #5694
base: bleeding-jumbo
Are you sure you want to change the base?
Conversation
CI is happy. Nothing bad so far. |
It seems it tests CPU support twice (as seen below): configure: Trying to force avx512bw using default method (--enable-simd=avx512bw).
checking if gcc supports -mavx512bw -mavx512vl -mavx512dq w/ linking... yes
checking for extra ASFLAGS... None needed
checking for X32 ABI... no
checking special compiler flags... Intel x86
configure: Testing tool-chain's CPU support with given options
checking for MMX... yes
checking for SSE2... yes
checking for SSSE3... yes
checking for SSE4.1... yes
checking for SSE4.2... yes
checking for AVX... yes
checking for XOP... no
checking for AVX2... yes
checking for AVX512BW + AVX512VL + AVX512DQ... yes
checking if gcc supports -maes -mpclmul... yes It doesn't hurt. |
doc/NEWS
Outdated
@@ -420,6 +420,7 @@ Major changes from 1.9.0-jumbo-1 (May 2019) in this bleeding-edge version: | |||
- Add Oubliette Password Manager support (two formats and oubliette2john.py). | |||
[DavideDG; 2025] | |||
|
|||
- Turn AVX512 into AVX512BW + AVX512VL + AVX512DQ. [Claudio André; 2025] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's confusing. I suggest:
- Use AVX512VL XOP-like bit rotates for scrypt's Salsa20. [Solar; 2025]
- When we use AVX512BW, also enable usage of AVX512VL and AVX512DQ. [Claudio André; 2025]
src/configure.ac
Outdated
done | ||
else | ||
CPU_BEST_FLAGS_MAIN=-DJOHN_$(echo ${SIMD_NAME} | tr .a-z _A-Z) | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt we need this complication. Can't we just continue with JOHN_AVX512BW
alone, but understand that it implies VL and DQ? I also don't know whether the +=
syntax works with other shells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible if everyone understands that BW implies the rest.
src/configure.ac
Outdated
CPU_NAME="$host_cpu AVX512BW" | ||
else | ||
CPU_NAME="$host_cpu $SIMD_NAME" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like unneeded complication as well.
src/m4/jtr_x86_logic.m4
Outdated
@@ -249,37 +249,38 @@ if test "x$simd" != xno; then | |||
AS_IF([test "x$CPU_NOTFOUND" = x0], | |||
[ | |||
CFLAGS="$CFLAGS_BACKUP -mavx512f -P $EXTRA_AS_FLAGS $CPPFLAGS $CFLAGS_EXTRA $CPUID_ASM" | |||
CFLAGS="$CFLAGS_BACKUP -mavx512bw -mavx512vl -mavx512dq -P $EXTRA_AS_FLAGS $CPPFLAGS $CFLAGS_EXTRA $CPUID_ASM" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not implementing the full reverse order of checks + optimization, then maybe let's not reorder F vs. BW here? If we were checking F first, then continue to check it first. This PR's changes would be smaller then.
src/m4/jtr_x86_logic.m4
Outdated
[CPU_BEST_FLAGS="-mavx512f"] | ||
[SIMD_NAME="AVX512F"] | ||
[CPU_BEST_FLAGS="-mavx512bw -mavx512vl -mavx512dq"] | ||
[SIMD_NAME="AVX512(BW+VL+DQ)"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe continue to say just AVX512BW
here.
src/m4/jtr_x86_logic.m4
Outdated
AC_LINK_IFELSE( | ||
[ | ||
AC_LANG_SOURCE( | ||
[[#include <immintrin.h> | ||
#include <stdio.h> | ||
extern void exit(int); | ||
int main(){__m512i t, t1;*((long long*)&t)=1;t1=t;t=_mm512_mul_epi32(t1,t);if((*(long long*)&t)==88)printf(".");exit(0);}]] | ||
int main(){__m128i t, t1;*((long long*)&t)=1;t1=t;t=_mm_rol_epi32(t1,1);if((*(long long*)&t)==88)printf(".");exit(0);}]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did suggest using the same intrinsic we actually use, but I didn't mean to test it instead of testing any 512-bit BW intrinsic. I think we should either revert this change entirely or test both _mm_rol_epi32
and _mm512_mul_epi32
.
While there are no current nor planned CPUs that have BW without VL nor vice versa, there may be future CPUs supporting AVX10/256 where the 128-bit VL intrinsic would compile and run yet this wouldn't imply support for 512-bit BW. Such future CPUs wouldn't set the CPUID bit corresponding to VL, but here we're not checking CPUID at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't realize you previously got the _mm512_mul_epi32
from the section for F, not for BW. Then revert to what we were checking for BW, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there may be future CPUs supporting AVX10/256 where the 128-bit VL intrinsic would compile and run yet this wouldn't imply support for 512-bit BW. Such future CPUs wouldn't set the CPUID bit corresponding to VL
Upon a second thought, actually maybe they would set that CPUID bit. It's no problem, and no reason to change anything in this PR - I am just correcting what I wrote for the sake of it. We may want to add AVX10/256 support later, with a separate PR, and maybe when such CPUs actually appear and can be tested. As a guess, maybe we'll be checking for VL alone as a separate configure test from BW+VL+DQ, and would need to treat it differently in code (in many ways, including CPUID check and non-usage of 512-bit vectors).
#define CPU_NAME "AVX512BW" | ||
#define CPU_REQ_AVX512VL 1 | ||
#define CPU_REQ_AVX512DQ 1 | ||
#define CPU_NAME "AVX512(BW+VL+DQ)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep all 3 mentioned in CPU_NAME
, for reporting in the "Sorry" line. (No further change is needed here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of CPU_REQ_AVX512VL
(+DQ) is also required. At least desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks almost good enough to merge, with only trivial cleanups maybe left. Thank you, @claudioandre-br!
|
||
AC_MSG_CHECKING([for AVX512BW]) | ||
AC_MSG_CHECKING([for AVX512BW + AVX512VL + AVX512DQ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, the test program we run only checks BW and VL, and then we assume DQ is implied. So we could want to make it just for AVX512BW + AVX512VL
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it gets confusing.
The else
part (runs when --enable-simd=avx512bw
) has a test program that only tests AVX512BW + AVX512VL
.
The if
part (runs when --native-tests=true
) does not use a test program. It uses CPU_detect()
.
- In any case, both use the
-mavx512dq
flag. CPU_detect
without setting a value for CPU_REQ_* seems wrong to me. It should be like this:
I added a #define CPU_REQ_AVX512BW 1
#define CPU_REQ_AVX512BW 1
extern int CPU_detect(void); extern char CPU_req_name[];
unsigned int nt_buffer8x[4], output8x[4];
int main(int argc, char **argv) { return !CPU_detect(); }
Anyway, should I remove DQ string ???? Is a new commit with a fix for cpu_detection required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I think it's OK to leave this as you have it for this PR, no further change needed. Thank you!
@@ -464,16 +464,17 @@ dnl ====================================================================== | |||
|
|||
AS_IF([test "x$CPU_NOTFOUND" = x0], | |||
[ | |||
AC_MSG_CHECKING([for AVX512BW]) | |||
AC_MSG_CHECKING([for AVX512BW + AVX512VL + AVX512DQ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and here.
(I don't get why we have this in two places.)
#define C7_AVX2 $0x00000020 /* AVX2 */ | ||
#define C7_AVX512F $0x00010000 | ||
#define C7_AVX512BW $0x40010000 /* AVX512BW + AVX512F */ | ||
#define C7_AVX512VL $0xC0010000 /* AVX512BW + AVX512VL + AVX512F */ | ||
#define C7_AVX512DQ $0xC0030000 /* AVX512BW + AVX512DQ + AVX512VL + AVX512F */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I didn't review the specific bitmasks against the documentation. I just hope they're correct.)
Binary john needs AVX512VL's XOP-like bit rotates for faster Salsa20 in yescrypt. Without `VL` enabled compilers don't use mnemonics at all. As it stands now, the possible binaries are: - AVX512BW + AVX512VL + AVX512DQ - AVX512F - AVX2 - And so on. There is no AVX512BW only binary. See: openwall#5691. Signed-off-by: Claudio André <[email protected]>
Let's hear bots.
I'll remove
runstatedir
after testing.OR:
Version: 1.9.0-jumbo-1+bleeding-60f3614a06 2025-03-10 07:27:48 -0300 Build: linux-gnu 64-bit x86_64 AVX512(BW+VL+DQ) AC OMP SIMD: AVX512BW, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1 AES hardware acceleration: AES-NI CPU tests: AVX512(BW+VL+DQ) $JOHN is ../run/