-
Notifications
You must be signed in to change notification settings - Fork 589
Use compiler builtins to detect "simple common cases" in pp_add, pp_subtract, and pp_multiply #23503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This breaks Win32, which doesn't enable the I suspect it's due to |
Thank you for your comment. My patch has a fallback code (similar to the code used before this change) for compilers with no overflow-checking builtins, but it had a bug (I was able to reproduce similar symptoms by |
inline.h
Outdated
# ifndef IV_MUL_OVERFLOW_IS_EXPENSIVE | ||
/* Strict overflow check for IV multiplication is generally expensive | ||
* when IV is a multi-word integer. */ | ||
# define IV_MUL_OVERFLOW_IS_EXPENSIVE (IVSIZE > LONGSIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a reasonable test - if we enable the builtins for GCC on Win32 x86-64 this will be false even though it is a 64-bit platform.
Testing against PTRSIZE is probably better since that better matches the platform word size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out. I've pushed a commit to change the test to against PTRSIZE.
inline.h
Outdated
|
||
/* Define IV_*_OVERFLOW_IS_EXPENSIVE below to nonzero value | ||
* if strict overflow checks are too expensive | ||
* (for example, for CPUs that has no hardware overflow detection flag). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RISC V doesn't have a hardware overflow flag (or any classic carry, zero etc flags), so the compiler generates more complex code:
bool my_chk_add(long a, long b, long *result) {
return __builtin_add_overflow(a, b, result);
}
; RISC-V
my_chk_add(long, long, long*):
add a5,a0,a1
slt a0,a5,a0
slti a1,a1,0
sub a0,a1,a0
sd a5,0(a2)
snez a0,a0
ret
; amd64
"my_chk_add(long, long, long*)":
add rdi, rsi
mov QWORD PTR [rdx], rdi
seto al
ret
; arm64
my_chk_add(long, long, long*):
adds x1, x0, x1
str x1, [x2]
cset w0, vs
ret
(no action needed here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as RISC-V ISA is concerned, it seems to require only two (slt
and slti
) additional instructions to detect signed overflow. I think this is still faster than classic overflow estimations previously used in pp_add
and pp_subtract
.
# endif | ||
|
||
# if defined(I_STDCKDINT) && !IV_ADD_SUB_OVERFLOW_IS_EXPENSIVE | ||
/* XXX Preparation for upcoming C23, but I_STDCKDINT is not yet tested */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modern clang has stdckdint.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but it would require patches to Configure
and so on (and I currently have no test environment with modern Clang or GCC 14+), so I intend to make a separate PR to enable it.
It looks reasonable to me. 3 commit messages indicate they need to be squashed |
…vailable This will hopefully make the code faster and smaller, and make more cases to be handled as "simple common cases". Note that this change uses HAS_BUILTIN_{ADD,SUB,MUL}_OVERFLOW macros which have already been defined in config.h but seem not to have been used by existing code. t/op/64bitint.t: Add tests to exercise "simple common cases". Note that these tests should pass even before this change. Thanks to @tonycoz for advices to make this patch work better for LLP64 platforms, especially Win32 x86-64.
b2a723a
to
61ea76f
Compare
Thank you for reviewing. I've squashed (and rebased) the commits. |
t/op/64bitint.t
Outdated
@@ -469,4 +469,30 @@ cmp_ok 0x3ffffffffffffffe % -0xc000000000000000, '==', -0x8000000000000002, 'mo | |||
cmp_ok 0x3fffffffffffffff % -0xc000000000000000, '==', -0x8000000000000001, 'modulo is (IV_MIN-1)'; | |||
cmp_ok 0x4000000000000000 % -0xc000000000000000, '==', -0x8000000000000000, 'modulo is IV_MIN'; | |||
|
|||
# Arithmetic close to IV overflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this comment line and the one on line 482 below, consider adding [GH 23503]
so that in the future we can look up the context of the code change more easily.
…tests. Thanks to @jkeenan for pointing this out.
This will hopefully make the code faster and smaller, and make more cases to be handled as "simple common cases".
Note that this change uses
HAS_BUILTIN_{ADD,SUB,MUL}_OVERFLOW
macros which have already been defined inconfig.h
but seem not to have been used by existing code.The behavior should be the same before and after this change.