Description
References to constants are not guaranteed to have unique addresses:
assert!(&2 as *const i32 != &2 as *const i32); // fails
Since consts are just aliases, the same holds for those:
const A: i32 = 2;
const B: i32 = 2;
assert!(&A as *const i32 != &B as *const i32); // fails
What about statics? static variables with interior mutability (and static mut variables) obviously must have unique addresses, but what about ones without?
static A: i32 = 2;
static B: i32 = 2;
assert!(&A as *const i32 != &B as *const i32); // passes
And local variables? (Assuming that both variables are alive at the point of comparison, since obviously variables that have fallen out of scope can have their addresses reused.)
let a = 2;
let b = 2;
assert!(&a as *const i32 != &b as *const i32); // passes
Currently, rustc seems to produce unique addresses in both cases. But @gnzlbg is under the impression that multiple local variables are not guaranteed to have distinct addresses.
Address uniqueness can be a useful property, e.g. if you want a unique 'sentinel' value to assign to a pointer variable. On the other hand, I'd say Rust usually avoids giving much significance to something being a variable as opposed to an expression.
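As an illustration (not from the issue), a minimal sketch of that sentinel pattern; the names SENTINEL and is_sentinel are made up here:
static SENTINEL: i32 = 0;

fn is_sentinel(p: *const i32) -> bool {
    std::ptr::eq(p, &SENTINEL)
}

fn main() {
    let real = 42;
    assert!(is_sentinel(&SENTINEL));
    assert!(!is_sentinel(&real)); // relies on &real and &SENTINEL having distinct addresses
}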
A related issue is #15, which is about whether the address of something can change over time.
Compared to C and C++
In C, rvalues are not implicitly bound to addresses unless assigned to a variable (or a C99 compound literal). C appears to guarantee that distinct variables have distinct addresses.
In C++, rvalues can be implicitly bound to const references, which gives them an address: this is "temporary materialization" and creates a "temporary object". Like C, the C++ spec guarantees that distinct "objects" "compare unequal", so I think this assertion is guaranteed to pass (not sure though):
#include <assert.h>

void foo(const int &a, const int &b) {
    assert(&a != &b);
}

int main() {
    foo(2, 2);
}
In practice, this means that the compiler always stores a copy of the constant on the stack and takes the address of that, rather than directly referencing a static allocation.
Activity
gnzlbg commented on Sep 21, 2019
FWIW I'm under that impression because I can't find such a guarantee being written down anywhere (I looked in the book, the reference, and the nomicon).
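In a case like this sketch (assumed, not the original snippet):
let mut a = 2;
let mut b = 2;
assert!(&mut a as *mut i32 != &mut b as *mut i32); // never fails, per the reasoning below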
we do guarantee that the assert never fails, because two aliasing &mut T cannot be created in safe Rust code. However, for a case like the one in the issue description (two immutable let bindings a and b), I don't think it would be unsound for a and b to have the same observable addresses.

If we allow that, then as long as an allocation cannot be modified (e.g. because it is immutable and does not contain an UnsafeCell), all allocations with the same value can always be merged, independently of whether the program might try to observe their addresses or not.

I have no clue whether this is worth doing, but in general I find code that relies on the relative addresses of let bindings on the stack to be brittle anyways.
Just keep in mind that the address is only unique while the let binding is alive, e.g.,
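(An assumed sketch, not the original example.)
let p = {
    let a = 2;
    &a as *const i32 as usize
}; // a's storage is gone after this block, so its address may be reused
let b = 2;
let _maybe_same = p == (&b as *const i32 as usize); // no guarantee either way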
Note that there are no references to constants: &CONST is just sugar for let x = CONST; &x, and &mut CONST for let mut x = CONST; &mut x, so the behavior that you are observing for those is the same as for the corresponding let bindings.

gnzlbg commented on Sep 21, 2019
One main difference from C and C++ is that they do not have zero-sized types (although C++ has EBO / [[no_unique_address]]). In Rust, [ZST; N] already creates N objects all having the same address, so an assertion like the one sketched below passes (playground). I don't see why this couldn't happen for multiple let bindings in the stack to ZSTs.
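(An assumed sketch of such a playground example.)
fn main() {
    let a = [(); 10];
    // Every element is zero-sized, so all of them share one address.
    assert_eq!(&a[0] as *const (), &a[9] as *const ());
}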
RalfJung commented on Oct 9, 2019
The entire purpose of static is for them to be items with a fixed stable address in memory, so I think that's pretty much a guarantee. By this I mean that every static is its own disjoint "allocated object" with the size given by its type. That means that taking the address of the same static twice during execution will give the same address. However, distinct statics can still have the same address if one of them is a ZST.

For let, I would also argue that these refer to distinct stack slots, so for an example like the one in the issue description (two live let bindings whose addresses are compared), we guarantee that the addresses are different if the type has a size of at least 1. I find it hard to imagine a semantics that lets us overlap their storage here.

let-bound variables are also separate "allocated objects", but what makes this tricky is their lifetime (as in the time when they get allocated and freed, which has little to do with lifetimes in Rust's type system). Separate allocated objects are only guaranteed to have distinct addresses if their size is non-zero and their lifetimes overlap. let-bound variables might be live for shorter than the duration of the function call, and moreover a let-bound variable might switch between live and non-live any number of times within a function call, becoming a new allocated object (with a possibly distinct address) each time. This is hard to specify in the surface language (and we might not want to commit to all the details), but we can be fairly precise on the MIR level: StorageLive allocates the object for a local, StorageDead deletes it, and we might get a StorageRefresh as well, which semantically is just sugar for StorageDead; StorageLive and thus means the allocated object can move to a new location.
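As an illustration (assumed, not from the comment): a binding inside a loop becomes live and dead once per iteration, so its address need not be stable across iterations.
fn main() {
    let mut prev: Option<usize> = None;
    for i in 0..3 {
        let x = i; // x's storage becomes live here...
        let addr = &x as *const i32 as usize;
        if let Some(p) = prev {
            let _ = addr == p; // may or may not hold; no guarantee across iterations
        }
        prev = Some(addr);
    } // ...and is dead again at the end of each iteration
}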
Operationally, [ZST; N] really is just a NOP. So I wouldn't say that any objects are being created here. Rust doesn't really have a notion of "object" other than "allocated object", and let x = [ZST; N] just creates one allocated object of size 0 (no matter the N).

Diggsey commented on Oct 10, 2019
Are there any other uses of address uniqueness? Maybe instead of coming up with complicated rules for when addresses are unique and then being limited by those rules for backwards compatibility, we could have an attribute or something to opt-in to a variable having a unique address?
RalfJung commented on Oct 10, 2019
I think the rules will only become even more complicated if we try to relax them. Remember, Rust is not specified axiomatically by saying "these properties hold for all program executions"; there is an Abstract Machine with an operational specification -- something you can put into an interpreter -- that explains all Rust behavior. So you'd have to propose some mechanism e.g. in Miri to actually observe overlapping addresses for such variables.
JakobDegen commented on Mar 28, 2022
I was thinking about this the other day, and have some thoughts. For now, I am going to assume two things: 1) local variables have stable addresses (not that the alternative might not be just as interesting), and 2) a strict provenance model like @Gankra proposes. If we allow some int-to-pointer casts, some of these options may fall away.
First things first: I believe there is absolutely no need for distinct allocations to have distinct addresses - we have provenance to disambiguate. As I understand SB, the "provenance" of a pointer is an integer that identifies an item in the borrow stacks, and these integers never repeat. Consequently, there should be no problem saying that "when a pointer is dereferenced, we search all the bytes that have the same address as the pointer, and see if there is an item in any of the borrow stacks that makes the access legal. There can be at most one, since no pointer can have provenance to more than one allocation." This might feel a little surprising, but I don't think it's as bad as it initially sounds; it might even be a way to drive home the "memory is not flat" point. (This also plays nicely with the mental model that Gankra proposed for memory, where it is a two-dimensional grid of address x provenance.)
Now the probably more difficult question: What do we want to guarantee? I am for now only going to think about stack local variables; there might be interesting (different) arguments for other categories of allocations. I see at least a few possibilities, but am completely undecided myself.
Because I'll be talking about some optimizations, I'll need to differentiate between "live range in the abstract machine" and "live range as reported by compiler analyses." I'll refer to the first as "scope" and the second as "liveness."
We do not guarantee that simultaneously in scope locals have distinct addresses
This has the benefit of enabling optimizations. The stack slot in this code cannot be re-used:
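(An assumed sketch of the shape being discussed; foo stands in for an opaque function that may observe the addresses of its arguments.)
fn foo(_p: &i32) {} // stand-in for an opaque function that could observe the address of its argument

fn main() {
    let mut x = 0;
    foo(&x);
    x = 10; // this write invalidates any earlier pointers to x
    let y = 0;
    foo(&y);
}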
It would be possible and actually fairly easy for a Rust compiler to see that x is dead when y is created; the x = 10; invalidates any other pointers to it. However, foo may compute the addresses of x and y, and so they can't overlap (and the pointers are valid to be dereferenced, so we can't lie to foo either). Interestingly though, this optimization does not require the assignment to x.
It would also be legal to use a single stack slot in a case like this:
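(An assumed sketch, not the original snippet; the same shape as above, without the reassignment.)
fn foo(_p: &i32) {} // as in the previous sketch

fn main() {
    let x = 0;
    foo(&x);
    let y = 0;
    foo(&y);
}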
That is simultaneously more powerful but also potentially more surprising. I'm not sure how much benefit the above two optimizations give. However, I could see the following optimization being potentially more useful:
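(An assumed sketch, not the original snippet: y is a copy of x, and merging the two would eliminate the copy.)
fn foo(_p: &[u8; 1024]) {} // stand-in for an opaque function

fn main() {
    let x = [0u8; 1024];
    foo(&x);
    let y = x; // copies all 1024 bytes
    foo(&y);
}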
Here, the optimization would not be to re-use a stack slot (which has relatively small benefits), but to be able to merge x and y at the MIR level entirely. This avoids a copy and also enables future optimizations in a very significant way.

We do guarantee that simultaneously in scope locals have distinct addresses
The main benefit of this is to disable the potentially surprising optimizations above. I had asked about use cases for such a guarantee on discord (besides not having to go "wtf is the compiler doing"), and something like HashMap<*const i32, T> came up. I guess I could see that being a useful type, but it's fairly limited - keep in mind that the pointers have to be in different allocations for the question in this issue to matter.

This additionally has the downside of making MIR storage markers be statements that have significant semantics. In other words, a pair of storage markers such as a StorageDead followed by a StorageLive could not be freely re-ordered. This is related to and discussed in rust-lang/rust#68622.
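For the HashMap<*const i32, T> use case mentioned above, an assumed sketch (not from the discussion) of what such code relies on:
use std::collections::HashMap;

fn main() {
    let x = 1;
    let y = 1;
    let mut names: HashMap<*const i32, &str> = HashMap::new();
    names.insert(&x as *const i32, "x");
    names.insert(&y as *const i32, "y");
    assert_eq!(names.len(), 2); // relies on &x and &y being distinct even though the values are equal
}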
Some alternative?
We could try and define the guarantees here in terms of some analysis or other conditions. I've talked to Ralf enough that my instinctive reaction to that is now also "that's not an operational semantics," but I actually think the need for a real operational semantics might be reduced here - the values of the addresses are implementation defined anyway. @moulins had suggested an alternative definition on discord, roughly along the lines of guaranteeing distinct addresses only for pointers that exist at the same time.

This would maybe allow some of the optimizations, but I have some concerns about this definition; at least as I understand it, there's an implicit requirement here of "two pointers that exist at the same time". But it's not clear to me how we should define this concept without typed memory - pointers only exist as values temporarily; most of the time there are just pointer bytes in memory that do not necessarily correspond to an actual value.

In any case, there might be some idea here that I haven't thought of.
scottmcm commented on Mar 29, 2022
Is there anything that would keep us from merging impl Freeze statics? Certainly if they have unsafe cells then we shouldn't merge them, but for something like [u32; 4] it seems like we could reasonably merge things -- even have a different static be in the middle of another, so long as it's the right value.

(But it also seems fine to say "well just use const if you want that".)
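(An assumed sketch of the situation described; the values and names are illustrative.)
static A: [u32; 4] = [7, 7, 7, 7];
static B: [u32; 2] = [7, 7];

fn main() {
    // Today these are two disjoint allocations; under the idea above, B's
    // storage could in principle live inside A's, since the values match.
    println!("{:p} {:p}", &A, &B);
}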
Lokathor commented on Mar 29, 2022
That does sound fine for any read-only static.
thomcc commented on Mar 29, 2022
It seems better to require const for this. I can think of cases in C++ where a static is used just to generate a value so that the pointer is used as an identity. I think it would be confusing to have this require UnsafeCell even in the case where it's never written.
That said, I don't feel that strongly here... but I suspect in practice this would be a pretty low-value optimization TBH.
comex commented on Mar 29, 2022
The defmt crate currently relies on identical statics not being combined, though its statics are unusual in having both link_section and export_name attributes.

RalfJung commented on Apr 1, 2022
What I expected could be phrased as "simultaneously live locals have distinct addresses". That would still allow your first and third optimizations, but not the second.
JakobDegen commented on Apr 1, 2022
Well, maybe, but that would require a definition of liveness on the AM, which seems non-trivial.