Description
It's common knowledge that a "Rust object" or "Rust allocation" can't have a size which overflows isize. This is relied upon in a lot of APIs such as the raw pointer add method. However, the only place it seems to be documented as a general property is on the reference page for numeric types:
The isize type is a signed integer type with the same number of bits as the platform's pointer type. The theoretical upper bound on object and array size is the maximum isize value. This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.
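As a small illustration of the pointer-difference use case the quoted text describes, here is a minimal sketch that measures a slice's extent in bytes as an isize. The helper name byte_span is an assumption made for this example; it is not taken from the reference or from any library.

```rust
/// Hypothetical helper: distance in bytes from the first element of `slice`
/// to one byte past its last element, expressed as an `isize`.
fn byte_span<T>(slice: &[T]) -> isize {
    let range = slice.as_ptr_range();
    // SAFETY: both pointers are derived from the same slice, so they lie
    // within (or one past the end of) the same allocation, and the distance
    // between two `u8` pointers is trivially a multiple of `size_of::<u8>()`.
    unsafe { range.end.cast::<u8>().offset_from(range.start.cast::<u8>()) }
}
```

The safety argument for offset_from is only available because the size of the referent is assumed to fit in isize, which is the property this issue asks to have documented.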
I see a few issues with this description:
- Being on the page for numeric types makes this text not that discoverable for people looking for general guarantees about types and allocations.
- It uses imprecise language: what are "objects" and "arrays"? Is the latter used in the sense of the type system - a [T; N]?
- Some allocations are neither native Rust objects nor native Rust arrays. E.g., the allocation backing a Vec doesn't have a type (at least not in its API). If it's guaranteed that &T can't refer to an object of more than isize::MAX bytes, then vec.as_slice() can't return a reference which violates this guarantee. However, that doesn't prevent the addresses of vec[0] and vec[N] from being more than isize::MAX bytes apart. Various Vec APIs strongly hint that this is impossible, but none actually guarantee it, and the Vec top-level docs make no guarantee about the interactions between various APIs (such as the vec[0]/vec[N] problem).
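To make the "strongly hint" point concrete, here is a minimal sketch of one such hint: asking a Vec<u8> for more than isize::MAX bytes of capacity is reported as an error rather than attempted. The helper name try_reserve_bytes is an assumption for this example, and the sketch shows observed behavior of one API, not the cross-API guarantee the issue is asking for.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: asks a fresh `Vec<u8>` to reserve `bytes` bytes and
/// returns the resulting capacity, or the error reported by `try_reserve`.
fn try_reserve_bytes(bytes: usize) -> Result<usize, TryReserveError> {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(bytes)?;
    Ok(v.capacity())
}
```

For example, try_reserve_bytes(isize::MAX as usize + 1) is expected to return an error, while a small request such as try_reserve_bytes(16) should succeed.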
Based on my understanding of the current state of the language, here's a stab at a more complete set of guarantees; do these sound reasonable?
- For all T, given t: T, the size of t is guaranteed to fit in an isize
- For all T, given t: &T, the size of t's referent is guaranteed to fit in an isize
- With respect to non-builtin types like Vec, I could see a few approaches:
  - Each such type documents its own guarantees
  - We define a formal notion of an "allocation", and document that t: T and t: &T are instances of allocations. Beyond that, we leave it up to non-builtin types to document their own guarantees by making reference to the docs for "allocation".
  - We define a formal notion of an "allocation", and make it clear in the definition itself that it covers non-builtin things like Vec. That seems iffy; I'm not sure how you'd formally specify the set of objects that are covered by a definition like this (e.g., do we want to make guarantees about the memory backing a HashMap?).
cc @jswrenn
Activity
RalfJung commented on Sep 28, 2023
Don't have time for a full response right now, but note that this is also documented in slice::from_raw_parts, since that is the most obvious place where someone would violate this guarantee.
CAD97 commented on Sep 28, 2023
It may not be formalized yet, but in discussing I've been using the concept of a Rust Allocated Object to refer to the thing which exists in the abstract machine, provides memory addresses which can be pointed to, and defines what pointer offsets are "inbounds." The RAO refers just to the chunk of allocated memory; remember that memory is untyped and types only exist for typed copies between memory (as far as the AM is concerned).
The usual way to create a RAO is via std::alloc, and those APIs communicate the size <= isize::MAX requirement, and (mostly) enforce it via Layout [1]. Memory which is allocated externally to Rust but is still dereferenceable requires the FFI code to logically create a RAO; the implementation/target likely provides some (usually implicit) way to promote a region of read/write system memory to a RAO. The limit is also mentioned in size_of_val_raw (unstable).
What we do need to document though is whether creating a RAO with size > isize::MAX is immediate UB or merely unsound, with UB occurring when doing a layout calculation / field projection of overlarge size. (Allocating such RAO is definitely UB with the std allocation functions, thus only possible via FFI.) For simplicity of the model, I would argue to make it immediate UB, and this might be required for targeting LLVM, which will happily merge two inbounds offsets assuming the combined offset won't overflow isize. Plus, such a large allocation can't even be addressed with 64-bit page tables.
Footnotes
[1] I did this and made some of my own layout polyfilling code unsound in the process.
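As a rough illustration of Layout enforcing the size limit, the sketch below checks which sizes Layout::from_size_align accepts. The helper name layout_accepts is an assumption for this example, and the behavior shown is what recent toolchains document (the size, rounded up to the alignment, must not exceed isize::MAX).

```rust
use std::alloc::Layout;

/// Hypothetical helper: reports whether a byte-aligned layout of `size`
/// bytes can be constructed at all.
fn layout_accepts(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}
```

Under that documented rule, layout_accepts(isize::MAX as usize) holds while layout_accepts(isize::MAX as usize + 1) does not, so an oversized request is rejected before it can reach the allocator.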
joshlf commented on Sep 28, 2023
The reason I mention t: T and t: &T is that in a lot of the unsafe code I write, that's the starting point: I'm trying to do something with pointers that are derived from a T or &T, and I need to be able to say something like "safe Rust can't produce a T or &T larger than isize::MAX, so my code is guaranteed that all offsets.... blah blah blah". I don't actually care about the T itself, just the fact that Rust makes certain guarantees about all values or references-to-values.
For more context, the place this has come up for me recently is in this PR. It adds a Ptr<'a, T> type which is somewhere in between a NonNull<T> and a &'a T or &'a mut T in terms of its invariants. One of its invariants is that the referenced memory region has a size which fits in isize. This is required in places such as this one, where it's a prerequisite for the pointer add method. We need guarantees about T and &T in order to ensure that we're satisfying Ptr's invariants when we construct it here.
CAD97 commented on Sep 28, 2023
It's unquestionably a soundness requirement that values described by a Rust type (including ?Sized ones) have a size <= isize::MAX. Doing size_of_val for an oversized type is documented to be UB.
I don't really know anywhere better to document this soundness requirement other than size_of_val_raw, slice_from_raw_parts, alloc/Layout, and perhaps the various feature(ptr_metadata) APIs. While the precise validity requirement for RAO carefully managed by pointer is technically undecided, the requirement on types/references is reasonably well documented in all the places it could potentially be violated.
If the primary question here is w.r.t. soundness guarantees, it could merit a docs issue, but it's not particularly actionable for UCG let alone T-opsem.
(But it doesn't hold if you don't point to an actual allocation; it's safe to construct a raw pointer to an oversized slice.)
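A minimal sketch of that last point, assuming a reasonably recent toolchain: a raw slice pointer whose length metadata exceeds isize::MAX can be built entirely in safe code, as long as it is never dereferenced. The function name oversized_raw_slice is an assumption for this example.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: builds a raw slice pointer whose length metadata is
/// `usize::MAX`. No allocation backs it, and it must never be dereferenced.
fn oversized_raw_slice() -> NonNull<[u8]> {
    NonNull::slice_from_raw_parts(NonNull::<u8>::dangling(), usize::MAX)
}
```

Reading the length back with oversized_raw_slice().len() only inspects the pointer's metadata, so it stays safe; turning this pointer into a reference would not be.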
joshlf commented on Sep 28, 2023
RalfJung commented on Sep 29, 2023
size_of_val seems like the natural place to put the answer to your first two questions. By saying that function will never return a value exceeding isize::MAX, you should have everything you need -- even if you don't physically call that function, you can now rely on the size of any object you hold where you could safely call that function to not exceed isize::MAX.
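The reasoning above can be sketched as a tiny helper that converts the result of size_of_val into an isize. The name referent_size_as_isize is an assumption for this example; if the proposed documentation were adopted, the None branch would be unreachable for any reference on which size_of_val can safely be called.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: the size of `t`'s referent as an `isize`, or `None`
/// if it would not fit (which the proposed guarantee would rule out).
fn referent_size_as_isize<T: ?Sized>(t: &T) -> Option<isize> {
    isize::try_from(size_of_val(t)).ok()
}
```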
I'm not sure where to put the answer to the third question.
In #464 we are calling it an "allocation".
I think it has to be UB; creating such an allocation (and giving Rust access to it) is a case of mutating the Rust-visible state in a way that is not possible from Rust. The Rust AM has an invariant that all allocations are at most isize::MAX in size; violating such an invariant must be immediate UB.
But of course one could say that when such an allocation is created, really it's just created with size isize::MAX, and the UB occurs on the first access outside that range.
joshlf commented on Oct 9, 2023
I don't think that's sufficient because size_of_val can panic. It's not documented on size_of_val itself, but size_of_val_raw's docs say:
I was originally going to put up a PR to add the following to size_of_val's docs:
However, I realized that this argument is unsound: If size_of_val can panic, then given t: &T, you only know that if size_of_val(t) returns, it will return a value which fits in isize. But you don't know that it will return.
RalfJung commented on Oct 10, 2023
So, we could say
isize.
Until the extern type situation is resolved, this seems the best we can do? Well actually we could do better, we could guarantee that it will panic (probably this has to be a non-unwinding panic) on extern type, and never just return nonsense. IMO that's what we should do, but currently the "panic" part hasn't been implemented yet I think.
joshlf commented on Oct 11, 2023
Unfortunately I don't think that's sufficient because there are cases where you never actually call size_of_val(t) - you just know that you could. If you actually called size_of_val(t), you could at least argue that your code would diverge rather than misbehaving. But in code that doesn't call it, you can't rely on that argument.
E.g., consider this type:
It has this impl:
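The snippets from the original comment are not preserved in this copy of the thread. As a rough stand-in only, a type and impl of the kind being described might look like the sketch below. The type name Ptr, the ptr field, and the use of NonNull::from(t) come from the surrounding discussion; the _lifetime field, the constructor name from_ref, and the referent_size method are assumptions made for this example.

```rust
use std::marker::PhantomData;
use std::mem::size_of_val;
use std::ptr::NonNull;

/// Hypothetical stand-in for the `Ptr` type under discussion.
pub struct Ptr<'a, T: ?Sized> {
    // Field invariant (as described in the thread): `ptr` refers to a memory
    // region whose length in bytes fits in `isize`.
    ptr: NonNull<T>,
    // Assumed field: ties the pointer to the lifetime of the borrow.
    _lifetime: PhantomData<&'a T>,
}

impl<'a, T: ?Sized> Ptr<'a, T> {
    /// Assumed constructor. Upholding the field invariant here relies on
    /// every `&T` having a referent of at most `isize::MAX` bytes, which is
    /// exactly the guarantee this issue wants documented.
    pub fn from_ref(t: &'a T) -> Self {
        Ptr { ptr: NonNull::from(t), _lifetime: PhantomData }
    }

    /// Assumed accessor: size in bytes of the referenced memory region.
    pub fn referent_size(&self) -> usize {
        // SAFETY: `ptr` was derived from a `&'a T`, and `self` cannot
        // outlive `'a`, so the referent is still valid for reads.
        size_of_val(unsafe { self.ptr.as_ref() })
    }
}
```

Note that from_ref never calls size_of_val, which is the situation the comment is concerned with: the invariant has to follow from a property of &T itself rather than from a function call that might panic.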
In order to construct an instance of Ptr which satisfies the field invariant on the ptr field, we need a guarantee that NonNull::from(t) results in a pointer whose referent is a memory region whose length fits in isize. We'd like to say something like "we know t refers to a memory region of no more than isize::MAX bytes because we could call size_of_val(t), which in turn promises to return a size no greater than isize::MAX." However, that argument doesn't work if size_of_val can panic.
Given this limitation, I think we still need a separate location to document the size maximum (unless we can make a stable promise that size_of_val will never panic, in which case this type of reasoning would be sufficient).
RalfJung commented on Oct 11, 2023
joshlf commented on Oct 11, 2023
Yeah so I've been talking this over with @jswrenn, and the conclusion we've come to is that the thing that makes the most sense is to make it a bit validity constraint on &T for all T: ?Sized. Our rationale is that what we're trying to do is the following:
We need to be able to make an argument whose premise is t: &T and whose conclusion is that t refers to no more than isize::MAX bytes. At first we considered a weaker guarantee like "safe Rust code will never produce a &T which refers to more than isize::MAX bytes", but this on its own isn't sufficient - it doesn't guarantee that unsafe code won't synthesize such a reference. We need to also ban unsafe code from doing this, which is basically what it means to have a bit validity constraint.
We're thinking something like:
RalfJung commented on Oct 11, 2023
"referring to more than isize::MAX bytes" is basically an ill-typed statement. At least if you mean "refer to" in the sense of "there is that much dereferenceable memory behind this pointer". This is independent of whether it's a raw pointer or a reference. There just can't be a contiguous memory range larger than
isize::MAX.
A &[u8] with a size of more than isize::MAX is already invalid today because it is dangling, and dangling references are UB. So this doesn't need docs changes.
joshlf commented on Oct 11, 2023
I agree that this is true in practice, but is it guaranteed anywhere? IIUC that's exactly what we're trying to guarantee here.
Oh interesting, I'm not sure where this comes from. How is such a reference dangling?
RalfJung commented on Oct 11, 2023
It's dangling because there can't be a memory range large enough for it to point to that would make it non-dangling. :)
That's what I was asking above -- where should such docs go? This property has nothing to do with references so stating it about references makes no sense. It's a property about what the Rust Abstract Machine considers an "allocated object".
RalfJung commented on Oct 11, 2023
We could add it here maybe? That defines "allocated object". If we say that allocated objects have a maximal size of isize::MAX that should basically cover it?