Where to document allocation size upper bound? #465

Closed
@joshlf

Description

It's common knowledge that a "Rust object" or "Rust allocation" can't have a size which overflows isize. This is relied upon in a lot of APIs such as the raw pointer add method. However, the only place it seems to be documented as a general property is on the reference page for numeric types:

> The isize type is a signed integer type with the same number of bits as the platform's pointer type. The theoretical upper bound on object and array size is the maximum isize value. This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.

I see a few issues with this description:

  • Being on the page for numeric types makes this text not that discoverable for people looking for general guarantees about types and allocations.
  • It uses imprecise language: what are "objects" and "arrays"? Is the latter used in the sense of the type system - a [T; N]?
  • Some allocations are neither native Rust objects nor native Rust arrays. E.g., the allocation backing a Vec doesn't have a type (at least not in its API). If it's guaranteed that &T can't refer to an object of more than isize bytes, then vec.as_slice() can't return a reference which violates this guarantee. However, that doesn't prevent the addresses of vec[0] and vec[N] from being more than isize bytes apart. Various Vec APIs strongly hint that this is impossible, but none actually guarantee it, and the Vec top-level docs make no guarantee about the interactions between various APIs (such as the vec[0]/vec[N] problem).
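To make the Vec concern concrete, here is a minimal sketch of the two behaviours the bullet above alludes to. The helper names `reserve_too_much` and `byte_span` are hypothetical and exist only for illustration. The sketch shows what today's standard library does in practice; it does not show that the behaviour is a documented, general guarantee.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: asks a `Vec<u8>` for more than `isize::MAX` bytes.
/// The standard library reports an error instead of creating the buffer.
fn reserve_too_much() -> Result<(), TryReserveError> {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(isize::MAX as usize + 1)
}

/// Hypothetical helper: byte distance between the first element of a `Vec`
/// and one past its last element, expressed as an `isize`.
fn byte_span<T>(v: &Vec<T>) -> isize {
    let start = v.as_ptr().cast::<u8>();
    // `wrapping_add` is a safe method; the result stays within (or one past)
    // the Vec's buffer because `len <= capacity`.
    let end = v.as_ptr().wrapping_add(v.len()).cast::<u8>();
    // SAFETY: both pointers are derived from the same live buffer, and the
    // distance is measured in bytes. That the distance fits in an `isize` is
    // exactly the property this issue asks to have documented.
    unsafe { end.offset_from(start) }
}
```

For a `Vec<u32>` with ten elements, `byte_span` returns 40, while `reserve_too_much` returns an error rather than a buffer.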

Based on my understanding of the current state of the language, here's a stab at a more complete set of guarantees; do these sound reasonable?

  • For all T, given t: T, the size of t is guaranteed to fit in an isize
  • For all T, given t: &T, the size of t's referent is guaranteed to fit in an isize
  • With respect to non-builtin types like Vec, I could see a few approaches:
    • Each such type documents its own guarantees
    • We define a formal notion of an "allocation", and document that t: T and t: &T are instances of allocations. Beyond that, we leave it up to non-builtin types to document their own guarantees by making reference to the docs for "allocation".
    • We define a formal notion of an "allocation", and make it clear in the definition itself that it covers non-builtin things like Vec. That seems iffy; I'm not sure how you'd formally specify the set of objects that are covered by a definition like this (e.g., do we want to make guarantees about the memory backing a HashMap?).
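The first two proposed guarantees can be phrased as a check on `size_of_val`. The function below is a hypothetical helper, not an existing API, and it only observes the property for the particular values passed to it. It is a sketch of what the guarantee would say, not a proof that it holds for every `T`.

```rust
use core::mem::size_of_val;

/// Hypothetical helper: reports whether the referent of `t` has a size that
/// fits in an `isize`. Under the proposed guarantees this would return `true`
/// for every reference that safe code can produce.
fn size_fits_in_isize<T: ?Sized>(t: &T) -> bool {
    size_of_val(t) <= isize::MAX as usize
}
```

It accepts sized values, slices, `str`, and trait objects alike, since `size_of_val` is defined for all `T: ?Sized`.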

cc @jswrenn

Activity

RalfJung commented on Sep 28, 2023
Member

Don't have time for a full response right now, but note that this is also documented in slice::from_raw_parts, since that is the most obvious place where someone would violate this guarantee.
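As an illustration of the `slice::from_raw_parts` requirement, here is a minimal sketch of a wrapper that checks the total byte size before building the slice. The name `checked_from_raw_parts` is hypothetical; it is not a standard library function, and the caller is still responsible for every other safety condition of `from_raw_parts`.

```rust
use core::mem::size_of;
use core::slice;

/// Hypothetical wrapper around `slice::from_raw_parts` that refuses lengths
/// whose total size in bytes would exceed `isize::MAX`.
///
/// # Safety
///
/// If `Some` is returned, `ptr` must have been valid for reads of `len`
/// elements of `T`, properly aligned, and the memory must not be mutated
/// for the lifetime `'a`.
unsafe fn checked_from_raw_parts<'a, T>(ptr: *const T, len: usize) -> Option<&'a [T]> {
    // Reject lengths whose byte size overflows `usize` or exceeds `isize::MAX`.
    let bytes = len.checked_mul(size_of::<T>())?;
    if bytes > isize::MAX as usize {
        return None;
    }
    // SAFETY: the size bound was checked above; the remaining requirements
    // are delegated to the caller.
    Some(unsafe { slice::from_raw_parts(ptr, len) })
}
```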

CAD97 commented on Sep 28, 2023
Contributor

It may not be formalized yet, but in discussing I've been using the concept of a Rust Allocated Object to refer to the thing which exists in the abstract machine, provides memory addresses which can be pointed to, and defines what pointer offsets are "inbounds." The RAO refers just to the chunk of allocated memory; remember that memory is untyped and types only exist for typed copies between memory (as far as the AM is concerned).

The usual way to create a RAO is via std::alloc, and those APIs communicate the size <= isize::MAX requirement, and (mostly) enforce it via Layout [1]. Memory which is allocated externally to Rust but is still dereferenceable requires the FFI code to logically create a RAO; the implementation/target likely provides some (usually implicit) way to promote a region of read/write system memory to a RAO. The limit is also mentioned in size_of_val_raw (unstable).

What we do need to document though is whether creating a RAO with size > isize::MAX is immediate UB or merely unsound, with UB occurring when doing a layout calculation / field projection of overlarge size. (Allocating such RAO is definitely UB with the std allocation functions, thus only possible via FFI.) For simplicity of the model, I would argue to make it immediate UB, and this might be required for targeting LLVM, which will happily merge two inbounds offsets assuming the combined offset won't overflow isize. Plus, such a large allocation can't even be addressed with 64-bit page tables.

Footnotes

  1. I did this 😄 and made some of my own layout polyfilling code unsound in the process 😄
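The point about Layout enforcing the limit can be observed directly. The sketch below assumes a recent stable toolchain in which `Layout` construction rejects sizes above `isize::MAX`; the helper names `byte_layout` and `array_layout` are hypothetical.

```rust
use std::alloc::{Layout, LayoutError};

/// Hypothetical helper: a layout of `size` bytes with alignment 1.
fn byte_layout(size: usize) -> Result<Layout, LayoutError> {
    Layout::from_size_align(size, 1)
}

/// Hypothetical helper: a layout for `[T; n]`, with `n` chosen at run time.
fn array_layout<T>(n: usize) -> Result<Layout, LayoutError> {
    Layout::array::<T>(n)
}
```

With these helpers, a request for exactly `isize::MAX` bytes yields a layout, while a request for one byte more is rejected before any allocator is involved.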

joshlf commented on Sep 28, 2023
Author

> It may not be formalized yet, but in discussing I've been using the concept of a Rust Allocated Object to refer to the thing which exists in the abstract machine, provides memory addresses which can be pointed to, and defines what pointer offsets are "inbounds." The RAO refers just to the chunk of allocated memory; remember that memory is untyped and types only exist for typed copies between memory (as far as the AM is concerned).

The reason I mention t: T and t: &T is that in a lot of the unsafe code I write, that's the starting point: I'm trying to do something with pointers that are derived from a T or &T, and I need to be able to say something like "safe Rust can't produce a T or &T larger than isize, so my code is guaranteed that all offsets.... blah blah blah". I don't actually care about the T itself, just the fact that Rust makes certain guarantees about all values or references-to-values.

For more context, the place this has come up for me recently is in this PR. It adds a Ptr<'a, T> type which is somewhere in between a NonNull<T> and a &'a T or &'a mut T in terms of its invariants. One of its invariants is that the referenced memory region has a size which fits in isize. This is required in places such as this one, where it's a prerequisite for the pointer add method. We need guarantees about T and &T in order to ensure that we're satisfying Ptr's invariants when we construct it here.

CAD97 commented on Sep 28, 2023
Contributor

It's unquestionably a soundness requirement that values described by a Rust type (including ?Sized ones) have a size <= isize::MAX. Doing size_of_val for an oversized type is documented to be UB.

I don't really know anywhere better to document this soundness requirement other than size_of_val_raw, slice_from_raw_parts, alloc/Layout, and perhaps the various feature(ptr_metadata) APIs. While the precise validity requirement for RAO carefully managed by pointer is technically undecided, the requirement on types/references is reasonably well documented in all the places it could potentially be violated.

If the primary question here is w.r.t. soundness guarantees, it could merit a docs issue, but it's not particularly actionable for UCG let alone T-opsem.

(But it doesn't hold if you don't point to an actual allocation; it's safe to construct a raw pointer to an oversized slice.)
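The parenthetical can be illustrated in safe code. This is a minimal sketch; the function name `oversized_raw_slice` is hypothetical. The pointer it returns is dangling, and the example only inspects its metadata. It never dereferences the pointer or turns it into a reference.

```rust
use core::ptr::NonNull;

/// Hypothetical example: builds a raw slice pointer whose length metadata
/// describes far more than `isize::MAX` bytes. No `unsafe` is needed because
/// nothing is dereferenced and no reference is created.
fn oversized_raw_slice() -> NonNull<[u8]> {
    NonNull::slice_from_raw_parts(NonNull::<u8>::dangling(), usize::MAX)
}
```

Converting such a pointer into a `&[u8]` would be a different matter, since references must point to live memory.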

joshlf commented on Sep 28, 2023
Author

RalfJung commented on Sep 29, 2023
Member

size_of_val seems like the natural place to put the answer to your first two questions. By saying that function will never return a value exceeding isize::MAX, you should have everything you need -- even if you don't physically call that function, you can now rely on the size of any object you hold where you could safely call that function to not exceed isize::MAX.

I'm not sure where to put the answer to the third question.

> It may not be formalized yet, but in discussing I've been using the concept of a Rust Allocated Object to refer to the thing which exists in the abstract machine, provides memory addresses which can be pointed to, and defines what pointer offsets are "inbounds."

In #464 we are calling it an "allocation".

> What we do need to document though is whether creating a RAO with size > isize::MAX is immediate UB or merely unsound, with UB occurring when doing a layout calculation / field projection of overlarge size.

I think it has to be UB; creating such an allocation (and giving Rust access to it) is a case of mutating the Rust-visible state in a way that is not possible from Rust. The Rust AM has an invariant that all allocations are at most isize::MAX in size; violating such an invariant must be immediate UB.

But of course one could say that when such an allocation is created, really it's just created with size isize::MAX, and the UB occurs on the first access outside that range.

joshlf commented on Oct 9, 2023
Author

> size_of_val seems like the natural place to put the answer to your first two questions. By saying that function will never return a value exceeding isize::MAX, you should have everything you need -- even if you don't physically call that function, you can now rely on the size of any object you hold where you could safely call that function to not exceed isize::MAX.

I don't think that's sufficient because size_of_val can panic. It's not documented on size_of_val itself, but size_of_val_raw's docs say:

> an (unstable) extern type, then this function is always safe to call, but may panic or otherwise return the wrong value, as the extern type’s layout is not known. This is the same behavior as size_of_val on a reference to a type with an extern type tail.

I was originally going to put up a PR to add the following to size_of_val's docs:

/// # Safety
///
/// It is guaranteed that `size_of_val` will always return a value which fits in
/// an `isize`. `unsafe` code may rely on this guarantee for its soundness. Note
/// that this amounts to a guarantee that, for all types, `T`, and for all values
/// `t: &T`, `t` references a value whose size can be encoded in an `isize`. This
/// holds because, given a `t: &T`, it is always valid to call `size_of_val(t)`.

However, I realized that this argument is unsound: If size_of_val can panic, then given t: &T, you only know that if size_of_val(t) returns, it will return a value which fits in isize. But you don't know that it will return.

RalfJung commented on Oct 10, 2023
Member

> However, I realized that this argument is unsound: If size_of_val can panic, then given t: &T, you only know that if size_of_val(t) returns, it will return a value which fits in isize. But you don't know that it will return.

So, we could say

  • When there are (unstable) extern types involved, the function may panic or otherwise return the wrong value.
  • If the function doesn't panic (even if extern types lead to a wrong value), we guarantee that the result fits in an isize.

Until the extern type situation is resolved, this seems the best we can do? Well actually we could do better, we could guarantee that it will panic (probably this has to be a non-unwinding panic) on extern type, and never just return nonsense. IMO that's what we should do, but currently the "panic" part hasn't been implemented yet I think.

joshlf commented on Oct 11, 2023
Author

Unfortunately I don't think that's sufficient because there are cases where you never actually call size_of_val(t) - you just know that you could. If you actually called size_of_val(t), you could at least argue that your code would diverge rather than misbehaving. But in code that doesn't call it, you can't rely on that argument.

E.g., consider this type:

use core::{marker::PhantomData, ptr::NonNull};

pub(crate) struct Ptr<'a, T: 'a + ?Sized> {
    // INVARIANTS:
    // - `ptr` addresses a byte range which is not longer than `isize::MAX`
    // (other invariants removed for brevity)
    ptr: NonNull<T>,
    _lifetime: PhantomData<&'a ()>,
}

It has this impl:

impl<'a, T: 'a + ?Sized> From<&'a T> for Ptr<'a, T> {
    fn from(t: &'a T) -> Ptr<'a, T> {
        Ptr { ptr: NonNull::from(t), _lifetime: PhantomData }
    }
}

In order to construct an instance of Ptr which satisfies the field invariant on the ptr field, we need a guarantee that NonNull::from(t) results in a pointer whose referent is a memory region whose length fits in isize. We'd like to say something like "we know t refers to a memory region of no more than isize::MAX bytes because we could call size_of_val(t), which in turn promises to return a size no greater than isize::MAX." However, that argument doesn't work if size_of_val can panic.

Given this limitation, I think we still need a separate location to document the size maximum (unless we can make a stable promise that size_of_val will never panic, in which case this type of reasoning would be sufficient).
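For contrast, here is a minimal sketch of the alternative mentioned at the start of this comment: a constructor that actually calls `size_of_val` and checks the result at run time. The `try_from_ref` and `as_non_null` methods are hypothetical and are not part of the PR. This version pays for a runtime check and, if `size_of_val` panics, diverges rather than building a `Ptr`. It is not a substitute for a documented guarantee.

```rust
use core::marker::PhantomData;
use core::mem::size_of_val;
use core::ptr::NonNull;

pub struct Ptr<'a, T: 'a + ?Sized> {
    // INVARIANT: `ptr` addresses a byte range no longer than `isize::MAX`.
    ptr: NonNull<T>,
    _lifetime: PhantomData<&'a ()>,
}

impl<'a, T: 'a + ?Sized> Ptr<'a, T> {
    /// Hypothetical checked constructor: establishes the size invariant by
    /// measuring the referent instead of relying on a language-level promise.
    pub fn try_from_ref(t: &'a T) -> Option<Ptr<'a, T>> {
        if size_of_val(t) > isize::MAX as usize {
            return None;
        }
        Some(Ptr { ptr: NonNull::from(t), _lifetime: PhantomData })
    }

    /// Hypothetical accessor, included so the sketch can be exercised.
    pub fn as_non_null(&self) -> NonNull<T> {
        self.ptr
    }
}
```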

RalfJung commented on Oct 11, 2023
Member

joshlf commented on Oct 11, 2023
Author

> But where could that be documented? The "alloc" module would make sense but that is really only about heap allocations and we are stating a fact about all allocations...

Yeah so I've been talking this over with @jswrenn, and the conclusion we've come to is that the thing that makes the most sense is to make it a bit validity constraint on &T for all T: ?Sized. Our rationale is that what we're trying to do is the following:

fn foo<T: ?Sized>(t: &T) {
    let ptr = NonNull::from(t);
    // SAFETY: <what do we write here?>
    unsafe { requires_ptr_whose_referent_size_fits_in_isize(ptr) }
}

We need to be able to make an argument whose premise is t: &T and whose conclusion is that t refers to no more than isize::MAX bytes. At first we considered a weaker guarantee like "safe Rust code will never produce a &T which refers to more than isize::MAX bytes", but this on its own isn't sufficient - it doesn't guarantee that unsafe code won't synthesize such a reference. We need to also ban unsafe code from doing this, which is basically what it means to have a bit validity constraint.

We're thinking something like:

> For all T: ?Sized, it is unsound to produce a value, t: &T, whose referent is more than isize::MAX bytes in size. Unsafe code may assume that any such t: &T will refer to no more than isize::MAX bytes.
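To show where such a guarantee would be consumed, here is a minimal sketch of pointer arithmetic whose safety comments lean on it. The function names `one_past_end` and `len_via_pointers` are hypothetical, and the SAFETY comments state the argument one would like to be able to write. Whether that argument is fully backed by documentation is the open question of this issue.

```rust
/// Hypothetical helper: pointer one past the last element of `s`.
fn one_past_end<T>(s: &[T]) -> *const T {
    // SAFETY: `s` is a live reference, so its elements lie within a single
    // allocated object. Assuming that object is at most `isize::MAX` bytes,
    // the offset `s.len() * size_of::<T>()` fits in an `isize`, as `add` requires.
    unsafe { s.as_ptr().add(s.len()) }
}

/// Hypothetical helper: recovers the element count from the two pointers.
/// `T` must not be zero-sized, since `offset_from` panics for such types.
fn len_via_pointers<T>(s: &[T]) -> isize {
    let end = one_past_end(s);
    // SAFETY: both pointers are derived from `s`, are in bounds (or one past
    // the end) of the same allocated object, and their byte distance is a
    // multiple of `size_of::<T>()`.
    unsafe { end.offset_from(s.as_ptr()) }
}
```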

RalfJung commented on Oct 11, 2023
Member

"referring to more than isize::MAX bytes" is basically an ill-typed statement. At least if you mean "refer to" in the sense of "there is that much dereferenceable memory behind this pointer". This is independent of whether it's a raw pointer or a reference. There just can't be a contiguous memory range larger than isize::MAX.

A &[u8] with a size of more than isize::MAX is already invalid today because it is dangling, and dangling references are UB. So this doesn't need docs changes.

joshlf commented on Oct 11, 2023
Author

> There just can't be a contiguous memory range larger than isize::MAX.

I agree that this is true in practice, but is it guaranteed anywhere? IIUC that's exactly what we're trying to guarantee here.

> A &[u8] with a size of more than isize::MAX is already invalid today because it is dangling, and dangling references are UB. So this doesn't need docs changes.

Oh interesting, I'm not sure where this comes from. How is such a reference dangling?

RalfJung commented on Oct 11, 2023
Member

> Oh interesting, I'm not sure where this comes from. How is such a reference dangling?

It's dangling because there can't be a memory range large enough for it to point to that would make it non-dangling. :)

> I agree that this is true in practice, but is it guaranteed anywhere? IIUC that's exactly what we're trying to guarantee here.

That's what I was asking above -- where should such docs go? This property has nothing to do with references so stating it about references makes no sense. It's a property about what the Rust Abstract Machine considers an "allocated object".

RalfJung commented on Oct 11, 2023
Member

We could add it here maybe? That defines "allocated object". If we say that allocated objects have a maximal size of isize::MAX that should basically cover it?

Participants

@RalfJung @joshlf @CAD97 @LegionMammal978 @workingjubilee

Where to document allocation size upper bound? · Issue #465 · rust-lang/unsafe-code-guidelines