When are things guaranteed to have unique addresses? #206

New issue

Open

When are things guaranteed to have unique addresses?#206

Labels

A-pointer-equalityC-open-questionS-pending-design

comex

References to constants are not guaranteed to have unique addresses:

assert!(&2 as *const i32 != &2 as *const i32); // fails

Since consts are just aliases, the same holds for those:

const A: i32 = 2;
const B: i32 = 2;
assert!(&A as *const i32 != &B as *const i32); // fails

What about statics? static variables with interior mutability (and static mut variables) obviously must have unique addresses, but what about ones without?

static A: i32 = 2;
static B: i32 = 2;
assert!(&A as *const i32 != &B as *const i32); // passes

And local variables? (Assuming that both variables are alive at the point of comparison, since obviously variables that have fallen out of scope can have their addresses reused.)

let a = 2;
let b = 2;
assert!(&a as *const i32 != &b as *const i32); // passes

Currently, rustc seems to produce unique addresses in both cases. But @gnzlbg is under the impression that multiple local variables are not guaranteed to have distinct addresses.

Address uniqueness can be a useful property, e.g. if you want a unique 'sentinel' value to assign to a pointer variable. On the other hand, I'd say Rust usually avoids giving much significance to something being a variable as opposed to an expression.

A related issue is #15, which is about whether the address of something can change over time.

Compared to C and C++

In C, rvalues are not implicitly bound to addresses unless assigned to a variable (or a C99 compound literal). C appears to guarantee that distinct variables have distinct addresses.

In C++, rvalues can be implicitly bound to const references, which gives them an address: this is "temporary materialization" and creates a "temporary object". Like C, the C++ spec guarantees that distinct "objects" "compare unequal", so I think this assertion is guaranteed to pass (not sure though):

#include <assert.h>
void foo(const int &a, const int &b) {
    assert(&a != &b);
}

int main() {
    foo(2, 2);
}

In practice, this means that the compiler always stores a copy of the constant on the stack and takes the address of that, rather than directly referencing a static allocation.

comex

mentioned this

on Sep 20, 2019

Some questions around what is UB and what is defined when using raw pointers #205

gnzlbg

Contributor

But @gnzlbg is under the impression that multiple local variables are not guaranteed to have distinct addresses.

FWIW I'm under that impression because I can't find such a guarantee being written down anywhere (I looked in the book, the reference, and the nomicon). In this case,

let mut a = 2;
let mut b = 2;
assert_ne!(&mut a as *mut _ as usize, &mut b as *mut _ as usize); // PASS

we do guarantee that the assert never fails because we guarantee that two aliasing &mut T cannot be created in safe Rust code. However, in this case:

let a: Freeze = val;
let b: Freeze = val;
assert_ne!(&a as *const _ as usize, &b as *const _ as usize);

I don't think it would be unsound for a and b to have the same observable addresses.

If we allow that, as long as the allocation cannot be modified (e.g. because the allocation is immutable and it does not contain an UnsafeCell), then all allocations with the same value can always be merged, independently of whether the program might try to observe their addresses or not.

I have no clue whether this is worth doing, but in general I find that code that relies on the relative addresses of let bindings on the stack to be brittle anyways.

Address uniqueness can be a useful property, e.g. if you want a unique 'sentinel' value to assign to a pointer variable

Just keep in mind that the address is only unique while the let binding is alive, e.g.,

pub fn foo() -> bool {  baz(bar()) }
pub fn bar() -> *const i32 {
    let x = 42;
    &x as *const _ // this address is only unique until the end of bar!
}
pub fn baz(x: *const i32) -> bool {
    let x = 42;
    assert_ne!(&x as *const _ as usize, x as usize)
    // ^ this can fail since both addresses are not guaranteed to be different
}

References to constants are not guaranteed to have unique addresses:

Note that there are no references to constants: &CONST is just sugar for let x = CONST; &x, and &mut CONST for let mut x = CONST; &mut x, so the behavior that you are observing for those is the same as the corresponding let bindings.

gnzlbg

Contributor

Compared to C and C++

One main difference between C and C++ is that they do not have zero-sized types (although C++ has EBO / [[no_unique_address]]). In Rust, [ZST; N] already creates N objects all having the same address. So (playground):

#[derive(Copy, Clone)] struct ZST;
fn main() { 
    let mut xs = [ZST; 2];
    assert_eq!(
        &mut xs[0] as *mut _ as usize, 
        &mut xs[1] as *mut _ as usize
    );
}

passes. I don't see why this couldn't happen for multiple let bindings in the stack to ZSTs.

RalfJung

Member

The entire purpose of static is for them to be items with a fixed stable address in memory, so I think that's pretty much a guarantee. By this I mean that every static is its own disjoint "allocated object" with the size given by its type. That means that taking the address of the same static twice during execution will give the same address. However, distinct statics can still have the same address if one of them is a ZST.

For let, I would also argue that these refer to distinct stack slots, so here:

let a: Freeze = val;
let b: Freeze = val;
assert_ne!(&a as *const _ as usize, &b as *const _ as usize);

we guarantee that addresses are different if the type has a size of at least 1. I find it hard to imagine a semantics that lets us overlap their storage here.

let-bound variables are also separate "allocated objects", but what makes this tricky is their lifetime (as in the time when they get allocated and freed, which has little to do with lifetimes in Rust's type systems). Separate allocated objects are only guaranteed to have distinct addresses if their size is non-zero and their lifetimes overlap. For let-bound variables, they might be live for shorter than the duration of the function call, and moreover a let-bound variable might switch between live and non-live any number of times within a function call, and it becomes a new allocated object (with a possibly distinct address) each time. This is hard to specify on the surface language (and we might not want to commit to all the details), but we can be fairly precise on the MIR level: StorageLive allocates the object for a local, StorageDead deletes it, and we might get a StorageRefresh as well which semantically is just sugar for StorageDead; StroageLive and thus means the allocated object can move to a new location.

In Rust, [ZST; N] already creates N objects all having the same address.

Operationally really [ZST; N] is just a NOP. So I wouldn't say that any objects are being created here. Rust doesn't really have a notion of "object" other than "allocated object", and let x = [ZST; N] just creates one allocated object of size 0 (no matter than N).

Diggsey

Address uniqueness can be a useful property, e.g. if you want a unique 'sentinel' value to assign to a pointer variable. On the other hand, I'd say Rust usually avoids giving much significance to something being a variable as opposed to an expression.

Are there any other uses of address uniqueness? Maybe instead of coming up with complicated rules for when addresses are unique and then being limited by those rules for backwards compatibility, we could have an attribute or something to opt-in to a variable having a unique address?

RalfJung

Member

I think the rules will only become even more complicated if we try to relax them. Remember, Rust is not specified axiomatically by saying "these properties hold for all program executions"; there is an Abstract Machine with an operational specification -- something you can put into an interpreter -- that explains all Rust behavior. So you'd have to propose some mechanism e.g. in Miri to actually observe overlapping addresses for such variables.

added

Contributor

I was thinking about this the other day, and have some thoughts. For now, I am going to assume two things 1) local variables have stable addresses (not that the alternative might not be just as interesting). 2) A strict provenance model like @Gankra proposes. If we allow some int to pointer casts, some of these options may fall away.

First thing first: I believe there is absolutely no need for distinct allocations to have distinct addresses - we have provenance to disambiguate. As I understand SB, the "provenance" of a pointer is an integer that identifies an item in the borrow stacks, and these integers never repeat. Consequently, there should be no problem saying that "when a pointer is dereferenced, we search all the bytes that have the same address as the pointer, and see if there is an item in any of the borrow stacks that makes the access legal. There can be at most one, since no pointer can have provenance to more than one allocation." This might feel a little surprising, but I don't think it's as bad as it initially sounds; it might even be a way to drive home the "memory is not flat" point. (this also plays nicely with the mental model that Gankra proposed for memory, where it's a two dimensional grid of address x provenance).

Now the probably more difficult question: What do we want to guarantee? I am for now only going to think about stack local variables; there might be interesting (different) arguments for other categories of allocations. I see at least a few possibilities, but am completely undecided myself.

Because I'll be talking about some optimizations, I'll need to differentiate between "live range in the abstract machine" and "live range as reported by compiler analyses." I'll refer to the first as "scope" and the second as "liveness."

We do not guarantee that simultaneously in scope locals have distinct addresses

This has the benefit of enabling optimizations. The stack slot in this code cannot be re-used:

let mut x = 5;
foo(&x);
x = 10;
let y = 5;
foo(&y);

It would be possible and actually fairly easy for a Rust compiler to see that x is dead when y is created; the x = 10; invalidates any other pointers to it. However, foo may compute the addresses of x and y, and so they can't overlap (and the pointers are valid to be dereferenced, so we can't lie to foo either). Interestingly though, this optimization does not require the assignment to x. It would also be legal to use a single stack slot in this case:

let x = 5;
let y = 5;
foo(&x, &y);

That is simultaneously more powerful but also potentially more surprising. I'm not sure how much benefit the above two optimizations give. However, I could see the following optimization being potentially more useful:

let mut x = input();
foo(&x);
x += 10;
let y = x;
bar(&y);

here, the optimization would not be to re-use a stack slot (which has relatively small benefits), but to be able to merge x and y at the MIR level entirely. This avoids a copy and also enables future optimizations in a very significant way.

We do guarantee that simultaneously in scope locals have distinct addresses

The main benefit of this is to disable the potentially surprising optimizations above. I had asked about use cases for such a guarantee on discord (besides not having to go "wtf is the compiler doing"), and something like HashMap<*const i32, T> came up. I guess I could see that being a useful type, but it's fairly limited - keep in mind that the pointers have to be in different allocations for the question in this issue to matter.

This additionally has the downside of making MIR storage markers be statements that have significant semantics. In other words,

StorageLive(_3);
StorageDead(_4);

could not be freely re-ordered. This is related to and discussed in rust-lang/rust#68622 .

Some alternative?

We could try and define the guarantees around here in terms of some analysis or other conditions. I've talked to Ralf enough that my instinctive reaction to that is now also "that's not an operational semantics," but I actually think the need for a real operational semantics might be reduced here - the values of the addresses are implementation defined anyway. @moulins had suggested the following on discord:

For two pointers p and q, if:

p and q have provenance for one byte, and

p and q have a valid tag in their borrow stack
Then, either:

p as usize == q as usize and p and q alias, or

p as usize != q as usize and p and q don't alias

This would maybe allow some of the optimizations, but I have some concerns about this definition; at least as I understand it, there's an implicit requirement here of "two pointers that exist at the same time." But it's not clear to me how we should define this concept without typed memory - pointers only exist as values temporarily, most of the time there are just pointer bytes in memory that do not necessarily correspond to an actual value.

In any case, there might be some idea here that I haven't thought of

scottmcm

Member

The entire purpose of static is for them to be items with a fixed stable address in memory, so I think that's pretty much a guarantee.

Is there anything that would keep us from merging impl Freeze statics? Certainly if they have unsafe cells then we shouldn't merge them, but for something like [u32; 4] it seems like we could reasonably merge things -- even have a different static be in the middle of another, so long as it's the right value.

(But it also seems fine to say "well just use const if you want that".)

Lokathor

Contributor

That does sound fine for any read-only static.

thomcc

Member

It seems better to require const for this. i can think of cases in C++ where a static is used just to generate a value so that the pointer is used as an identity. I think it would be confusing to have this require UnsafeCell even in the case where it's never written.

That said, I don't feel that strongly here... but I suspect in practice this would be pretty low value of an optimization TBH.

comex

Author

The defmt crate currently relies on identical statics not being combined, though its statics are unusual in having both link_section and export_name attributes.

RalfJung

Member

We do guarantee that simultaneously in scope locals have distinct addresses

What I expected cold be phrased as "simultaneously live locals have distinct addresses. That would still allow your first and 3rd optimization, but not the 2nd.

JakobDegen

Contributor

Well, maybe, but that would require a definition of liveness on the AM, which seems non-trivial

21 remaining items

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

A-pointer-equalityC-open-questionS-pending-design

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

When are things guaranteed to have unique addresses? #206

Compared to C and C++

We do not guarantee that simultaneously in scope locals have distinct addresses

We do guarantee that simultaneously in scope locals have distinct addresses

Some alternative?

21 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

When are things guaranteed to have unique addresses? #206

Description

Compared to C and C++

Activity

gnzlbg commented on Sep 21, 2019

gnzlbg commented on Sep 21, 2019

RalfJung commented on Oct 9, 2019

Diggsey commented on Oct 10, 2019

RalfJung commented on Oct 10, 2019

JakobDegen commented on Mar 28, 2022

We do not guarantee that simultaneously in scope locals have distinct addresses

We do guarantee that simultaneously in scope locals have distinct addresses

Some alternative?

scottmcm commented on Mar 29, 2022

Lokathor commented on Mar 29, 2022

thomcc commented on Mar 29, 2022

comex commented on Mar 29, 2022

RalfJung commented on Apr 1, 2022

JakobDegen commented on Apr 1, 2022

21 remaining items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions