Description
Somebody on the internet (https://blog.dend.ro/rust-and-the-case-of-the-redundant-comparison/) complained that something like this:
fn vec_clear(x: &mut i32) {
    if *x != 0 {
        *x = 0;
    }
}
generates a conditional store:
cmpl $0, (%rdi)
je .LBB0_2
movl $0, (%rdi)
.LBB0_2:
retq
on x86_64, instead of just an unconditional store: movl $0, (%rdi); retq.
Taking a look at the optimized LLVM-IR:
define void @vec_clear(i32* noalias nocapture dereferenceable(4) %x) {
start:
%0 = load i32, i32* %x, align 4
%1 = icmp eq i32 %0, 0
br i1 %1, label %bb2, label %bb1
bb1:
store i32 0, i32* %x, align 4
br label %bb2
bb2:
ret void
}
shows the issue.
The LLVM-IR generated by rustc is losing critical information. It marks i32* as noalias, which means that no other pointer in vec_clear's scope will alias it. However, outside vec_clear's scope, other pointers are allowed to alias that memory. That is, if *x is zero, other threads could be concurrently reading the memory, and if LLVM generated an unconditional store here, that would introduce a data race. This optimization is therefore not safe on the LLVM-IR generated by rustc. On the other hand, &mut i32 means that the pointer has unique access to the memory: no other pointer can access the memory behind it as long as that pointer is alive. Therefore, at the Rust level, transforming the code to an unconditional store does not introduce a data race.
So I think that noalias is not enough to perform this optimization, and that we would need something stronger for LLVM to be able to perform it.
This also shows that &mut T is stronger than C's restrict keyword.
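To make the claimed equivalence concrete, here is a minimal sketch in safe Rust. The name vec_clear_unconditional is hypothetical and does not appear in the original report; it stands for the form the optimizer would ideally derive from vec_clear. Since a &mut i32 is the only way to reach the memory while it is live, no other code can observe whether the store happened when *x was already zero, so both functions should be indistinguishable to their callers.

```rust
/// The function from the report: stores only when the value is non-zero.
pub fn vec_clear(x: &mut i32) {
    if *x != 0 {
        *x = 0;
    }
}

/// Hypothetical hand-optimized form (the name is an assumption made for
/// illustration): always stores, skipping the load and the comparison.
pub fn vec_clear_unconditional(x: &mut i32) {
    *x = 0;
}
```

Calling either function on the same starting value leaves the integer at zero. Whether LLVM is allowed to turn the first into the second depends on whether the IR carries the uniqueness guarantee, which is the question this issue raises.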
Activity
RalfJung commented on Aug 6, 2018
Ah, good point. Read-write data races "just" make the read yield undef, but even that would clearly be a misoptimization.
Correct. AFAIK, noalias was never meant to express the full set of properties. It's just the strongest thing LLVM provides.
Oh yes, it is very much stronger in various ways.
varkor commented on Aug 6, 2018
Isn't this the sort of thing the noalias and alias.scopes metadata (#16515) allows one to express?
gnzlbg commented on Aug 6, 2018
@varkor what would be the scopes for the load and stores in the example?
varkor commented on Aug 6, 2018
In this example, as you point out, the aliasing is important with regards to memory accesses outside the function. So if in theory you could mark all the others... I doubt that's sufficient for LLVM though.
leonardo-m commented on Aug 9, 2018
Is it a good idea to write an LLVM enhancement request?
gnzlbg commented on Aug 9, 2018
As @varkor says, we could mark all the others, but we would have to do so for every &mut that the program creates, and even then, alias analysis would not take this into account, because no sane language front-end does this.
Extending LLVM to support this won't be easy either. Currently LLVM hoists memory ops from functions when profitable, but:
so when hoisting the load (or store) from foo to the outer scope, the "invariant" that this is the only pointer to the data doesn't hold any more, because in the outer scope there might be other pointers to the data.
So all the optimizations that currently move memory operations across scopes would need to be updated and be extremely careful with any attribute/metadata that we might want to use.
Maybe a minimal extension to alias analysis that allows us to specify the "opposite" / "negative" aliasing groups would be enough, but one would need to teach many pieces of the pipeline about this for the new information to result in better codegen.
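A hypothetical sketch of the scoping concern described above. This is not the snippet from the comment: the body of foo and the function outer are assumptions made purely for illustration. Inside foo, x is the only pointer to the data. In outer, p and q may both point to the same integer, so a uniqueness fact attached to x must not be applied to accesses that end up outside the region where foo's body is inlined.

```rust
/// Assumed body for `foo`; inside it, `x` is the unique pointer to the data.
pub fn foo(x: &mut i32) {
    if *x != 0 {
        *x = 0;
    }
}

/// Hypothetical outer scope in which two raw pointers may alias.
///
/// # Safety
/// `p` and `q` must be valid for reads (and `p` for writes) for the
/// duration of the call, and must be usable for these accesses together
/// (for example, both derived from the same raw pointer).
pub unsafe fn outer(p: *mut i32, q: *const i32) -> i32 {
    // Read through `q` before the unique reborrow exists.
    let before = unsafe { *q };
    // The `&mut` is unique only for the duration of this call.
    foo(unsafe { &mut *p });
    // After the call, `q` is again free to observe the same memory.
    let after = unsafe { *q };
    before + after
}
```

If foo is inlined into outer and its memory operations are moved past the reads through q, any metadata stating "nothing else points here" would be wrong for the surrounding code, which is why such metadata would need careful handling by every pass that moves memory operations.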
steveklabnik commented on Sep 25, 2020
Triage: no idea what the current status of this is, to be honest. I imagine that this was never suggested upstream.
RalfJung commented on Sep 26, 2020
If noalias really means "does not alias for the duration of this function call", then I think in fact it would be enough. Unfortunately, noalias scoping in LLVM is basically undocumented, and it is also buggy, so one cannot just go from the implementation.
However, a new round of noalias/restrict patches has been landing recently (https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html, https://reviews.llvm.org/D69542#change-veawD9rpruA2), so maybe that new infrastructure is powerful enough to express the desired guarantee here.
FWIW, Stacked Borrows does allow the optimization.