We require that `&UnsafeCell` references be "dereferenceable on function entry" in the sense of pointing to allocated memory. This is believed to allow the introduction of spurious loads, if the compiler can prove that the memory has not been deallocated yet. But that is far from obvious...
Consider:
```rust
use std::cell::UnsafeCell;

fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
    if choice() {
        *y = 0;
    } else {
        let v = unsafe { x.get().read() };
        println!("{}", v);
    }
}

pub fn public(choice: impl FnOnce() -> bool) {
    let x = &UnsafeCell::new(0);
    let y = unsafe { &mut *x.get() };
    internal(x, y, choice);
}
```
Under LLVM `noalias` and under Tree Borrows, `public` is a sound function: if `choice()` is true, we write to `y` and nothing is even strange; if `choice()` is false, then the read from `x` means `y` can no longer be written, but `y` is still valid for reads, so the protector does not kick in. (In LLVM terms, `noalias` is completely fine with arbitrary aliasing as long as all accesses are reads, and if `choice()` is false then there are no writes.) Stacked Borrows (SB) is unhappy with this example, but SB is often too strict.
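For concreteness, here is a runnable variant of the example (adapted by me: it returns the read value instead of printing it, and the cell starts at 1 instead of 0, so the two outcomes are easy to observe). Both branches are defined behavior:

```rust
use std::cell::UnsafeCell;

// Same shape as the example above, adapted to return the value read
// so the two paths can be distinguished by the caller.
fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) -> Option<i32> {
    if choice() {
        *y = 0; // write through `y`; no read through `x` happens on this path
        None
    } else {
        // read through `x`; `y` is never written on this path,
        // so under Tree Borrows the protector never fires
        Some(unsafe { x.get().read() })
    }
}

pub fn public(choice: impl FnOnce() -> bool) -> Option<i32> {
    let x = &UnsafeCell::new(1);
    let y = unsafe { &mut *x.get() };
    internal(x, y, choice)
}

fn main() {
    assert_eq!(public(|| true), None);     // write path: nothing to observe
    assert_eq!(public(|| false), Some(1)); // read path observes the initial value
}
```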
Let's focus on `internal`, and let's imagine we introduce a spurious load from `x` at the beginning of the function. If any kind of spurious load is allowed, this one definitely is. Then we observe that in the `else` case, `x` is being read twice, and let's say we know that the closure will not mutate `x` (it cannot, since `x` is privately allocated in `public` and its provenance is never leaked to outside code) -- let's say the `choice` function is marked "readonly". We arrive at:
```rust
fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
    let v = unsafe { x.get().read() };
    if choice() {
        *y = 0;
    } else {
        println!("{}", v);
    }
}
```
Interpreted as Rust code, this has obvious UB if `choice()` returns true, but okay, maybe our IR has a different semantics. But what could those semantics be?
- If `choice()` is true, the read must somehow be considered "invalid"; maybe we make it return `poison` or so (similar to what would happen according to LLVM semantics if this read introduced a data race). Certainly the read must not observe the actual memory contents, because if it did, it would not be reorderable around the aliasing write.
- If `choice()` is false, the read must not return `poison`, as printing a poison value is UB! `choice()` might well read from stdin and use that information to determine what it should return.
In other words, the semantics of the read must predict the future to be able to decide whether it should return poison and keep the aliasing state as-is, or return the actual data and mark the memory as "must not be mutated through other pointers". Equivalently, the read is making an angelic choice between these two options.
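To make the "predict the future" point concrete, here is a toy model (all names are mine, purely illustrative; this is not a proposed semantics). The result of the spurious read can only be resolved once we know what `choice()` will return, so the model is handed that future outcome as an explicit oracle:

```rust
// What the spurious read may evaluate to in the hypothetical IR semantics.
#[derive(Debug, PartialEq)]
enum ReadResult {
    Poison,     // the read is "invalid"; the aliasing state is left as-is
    Value(i32), // the read observed memory; memory is now "must not be mutated"
}

// The angelic read: its result depends on the *future* behavior of `choice()`.
fn angelic_read(memory: i32, future_choice: bool) -> ReadResult {
    if future_choice {
        // The aliasing write through `y` will happen later, so the read must
        // not have observed memory -- otherwise it could not be reordered
        // around that write.
        ReadResult::Poison
    } else {
        // No write will happen; the read must return the real data, since
        // printing poison would be UB.
        ReadResult::Value(memory)
    }
}

fn main() {
    assert_eq!(angelic_read(42, true), ReadResult::Poison);
    assert_eq!(angelic_read(42, false), ReadResult::Value(42));
}
```

Of course, a real semantics has no such oracle; that is exactly why this has to be an angelic choice, resolved in whichever way makes the execution defined.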
So... any semantics that wants to allow introducing spurious loads for `&UnsafeCell` must at least use angelic choice, and deal with the consequences of that (e.g., angelic choice cannot in general be freely reordered down across demonic choice). We should hold off on having MIR transformations introduce such spurious loads until that is clarified.
The status quo is that we do not emit `dereferenceable` for these references, so LLVM shouldn't be introducing any spurious loads, so we are not at risk of being affected by this. Even if LLVM gains a "dereferenceable on entry" attribute, I think we shouldn't use it on `!Freeze` (and `!Unpin`) types.