On the legality of introducing spurious loads of &UnsafeCell (aka: dereferenceable and noalias don't interact well) #435

Open
@RalfJung

We require that &UnsafeCell references are "dereferenceable on function entry" in the sense of pointing to allocated memory. This is believed to allow the introduction of spurious loads, provided the compiler can prove that the memory has not been deallocated yet. But that is far from obvious...

Consider:

use std::cell::UnsafeCell;

fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
    if choice() {
        *y = 0;
    } else {
        let v = unsafe { x.get().read() };
        println!("{}", v);
    }
}

pub fn public(choice: impl FnOnce() -> bool) {
    let x = &UnsafeCell::new(0);
    let y = unsafe { &mut *x.get() };
    internal(x, y, choice);
}

Under LLVM noalias and under Tree Borrows, public is a sound function: if choice() is true, we write to y and nothing strange happens at all; if choice() is false, then the read from x means y can no longer be written, but y is still valid for reads, so the protector does not kick in. (In LLVM terms, noalias is completely fine with arbitrary aliasing as long as all accesses are reads, and if choice() is false then there are no writes.) Stacked Borrows (SB) is unhappy with this example, but SB is often too strict.
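To make the two paths concrete, here is a minimal sketch that exercises both of them. It is a variant of the code above, not the original: internal and public return Option<i32> instead of printing (an assumption made purely so the result can be checked), and main is a hypothetical driver. Running it under Miri with -Zmiri-tree-borrows should accept both calls per the reasoning above, while Miri's default Stacked Borrows mode is expected to reject it.

```rust
use std::cell::UnsafeCell;

// Same shape as `internal` above, but returning the value instead of
// printing it (a change made only for this sketch).
fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) -> Option<i32> {
    if choice() {
        // Write path: only `y` is accessed.
        *y = 0;
        None
    } else {
        // Read path: only `x` is accessed, and only for reading.
        Some(unsafe { x.get().read() })
    }
}

pub fn public(choice: impl FnOnce() -> bool) -> Option<i32> {
    let x = &UnsafeCell::new(0);
    // `y` aliases the contents of `x`.
    let y = unsafe { &mut *x.get() };
    internal(x, y, choice)
}

fn main() {
    // Each call takes exactly one of the two paths.
    println!("{:?}", public(|| true));
    println!("{:?}", public(|| false));
}
```

Within a single call, either y is written or x is read, never both, which is why neither noalias nor Tree Borrows objects.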

Let's focus on internal, and let's imagine we introduce a spurious load from x at the beginning of the function. If any kind of spurious load is allowed, this one definitely is. Then we observe that in the else case, x is being read twice, and let's say we know that the closure will not mutate x (it cannot, since x is privately allocated in public and its provenance is never leaked to outside code) -- let's say the choice function is marked "readonly". We arrive at:

fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
    let v = unsafe { x.get().read() };
    if choice() {
        *y = 0;
    } else {
        println!("{}", v);
    }
}

Interpreted as Rust code this has obvious UB if choice() returns true, but okay, maybe our IR has a different semantics. But what could those semantics be?

  • If choice() is true, the read must somehow be considered "invalid", maybe we make it return poison or so (similar to what would happen according to LLVM semantics if this read introduced a data race). Certainly the read must not observe the actual memory contents, because if it did it would not be reorderable around the aliasing write.
  • If choice() is false, the read must not return poison, as printing a poison value is UB!
  • choice() might well read from stdin and use that information to determine what it should return, so its result cannot be known at the time of the read.

In other words, the semantics of the read must predict the future to be able to decide whether it should return poison and keep the aliasing state as-is, or return the actual data and mark the memory as "must not be mutated through other pointers". Equivalently, the read is making an angelic choice between these two options.
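The dilemma can be spelled out with a toy model. This is only a sketch under my own simplifying assumptions, not Miri's or LLVM's actual semantics: ReadMode, Value, Outcome and run_transformed are hypothetical names, and the aliasing state is reduced to a single "frozen" flag. The model runs the transformed internal with the hoisted read fixed to one of the two behaviors up front.

```rust
/// The two behaviors the hoisted read could have (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ReadMode {
    /// Return poison and keep the aliasing state as-is.
    PoisonKeepState,
    /// Return the actual data and forbid later writes through other pointers.
    DataAndFreeze,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Data(i32),
    Poison,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Wrote,
    Printed(i32),
    Ub(&'static str),
}

/// Models the transformed `internal`, with the read's behavior chosen in advance.
fn run_transformed(mode: ReadMode, choice: bool) -> Outcome {
    let memory = 0; // contents of the cell, as set up in `public`

    // The hoisted read: `let v = unsafe { x.get().read() };`
    let (v, frozen) = match mode {
        ReadMode::PoisonKeepState => (Value::Poison, false),
        ReadMode::DataAndFreeze => (Value::Data(memory), true),
    };

    if choice {
        // `*y = 0;`
        if frozen {
            Outcome::Ub("write through y after the read froze the memory")
        } else {
            Outcome::Wrote
        }
    } else {
        // `println!("{}", v);`
        match v {
            Value::Data(n) => Outcome::Printed(n),
            Value::Poison => Outcome::Ub("printed a poison value"),
        }
    }
}

fn main() {
    for choice in [true, false] {
        for mode in [ReadMode::PoisonKeepState, ReadMode::DataAndFreeze] {
            println!("choice = {choice}, mode = {mode:?}: {:?}", run_transformed(mode, choice));
        }
    }
}
```

In this model, each fixed mode has some value of choice that leads to UB, while for each value of choice there is a mode that avoids it. A read that must work in both cases therefore has to pick its mode as a function of a result that is only produced later, which is exactly the angelic choice described above.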

So... any semantics that wants to introduce spurious loads for &UnsafeCell must at least use angelic choice, and deal with the consequences of that (e.g. angelic choice cannot in general be freely reordered down across demonic choice). We should hold off on having MIR transformations introduce such spurious loads until that is clarified.

The status quo is that we do not emit dereferenceable for these references, so LLVM shouldn't be introducing any spurious loads, so we are not at risk of being affected by this. Even if LLVM gains a "dereferenceable on entry" attribute, I think we shouldn't use it on !Freeze (and !Unpin) types.

Label: A-dereferenceable (Topic: when exactly does a reference need to point to regular dereferenceable memory?)
