(Maybe) Undefined behavior in safe code from `getelementptr inbounds` with offset 0

As far as I can tell, slicing a `Vec<T>` (in safe code) results in undefined behavior when `T` is zero-sized or the `Vec` has zero capacity. I'm probably missing something, but I'm creating this issue in case my investigation is correct.

In particular, these two examples appear to cause undefined behavior because the `.offset()` call made while slicing violates the [first safety constraint](https://doc.rust-lang.org/stable/std/primitive.pointer.html#safety-1) ("Both the starting and resulting pointer must be either in bounds or one byte past the end of *the same* allocated object."):

```rust
// Example 1: zero-sized T
let v = Vec::from(&[(); 5][..]);
let _ = &v[2..3];

// Example 2: zero-capacity Vec
let v = Vec::<i32>::with_capacity(0);
let _ = &v[0..0];
```

## Example 1

In the first example, `v` has the field values

```rust
Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x1 as *const ()),
            ..
        },
        ..
    },
    len: 5,
}
```

(Verify this with `v.as_ptr()` and `v.len()`.) Performing the slice `&v[2..3]` expands to approximately the following:

```rust
let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x1`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [()] as *const ()).offset(2), 3 - 2)
}
```

So, it's calling `ptr.offset(2)` where `ptr` has value `0x1`. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the `.offset()` is undefined behavior. (This pointer was created from casting an integer (the alignment of `()`) to a pointer in `libcore/ptr.rs, Unique::empty`.)

## Example 2

The second example has a similar issue. Here, `v` has the field values

```rust
Vec {
    buf: RawVec {
        ptr: Unique {
            pointer: NonZero(0x4 as *const i32),
            ..
        },
        ..
    },
    len: 0,
}
```

(Verify this with `v.as_ptr()` and `v.len()`.) Performing the slice `&v[0..0]` expands to approximately the following:

```rust
let slice = unsafe {
    let p = v.buf.ptr();
    assume(!p.is_null());
    slice::from_raw_parts(p, v.len)
};
// Note that the pointer of `slice` has value `0x4`.
// (Bounds checks elided here.)
unsafe {
    from_raw_parts((slice as *const [i32] as *const i32).offset(0), 0 - 0)
}
```

So, it's calling `ptr.offset(0)` where `ptr` has value `0x4`. This pointer is not "in bounds or one byte past the end of [an] allocated object", so the `.offset()` is undefined behavior. (This pointer was created from casting an integer (the alignment of `i32`) to a pointer in `libcore/ptr.rs, Unique::empty`.)
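The pointer and length values claimed for both examples can be checked with a small sketch like the one below. The helper names (`addr_and_len`, `is_aligned_for`, `zero_sized_example`, `zero_capacity_example`) are hypothetical and only exist for illustration; the sketch uses nothing beyond `Vec::as_ptr`, `Vec::len`, and `mem::align_of`.

```rust
use std::mem;

/// Hypothetical helper: returns the address held by the Vec's buffer
/// pointer together with the Vec's length.
fn addr_and_len<T>(v: &Vec<T>) -> (usize, usize) {
    (v.as_ptr() as usize, v.len())
}

/// Hypothetical helper: true if `addr` is a multiple of `T`'s alignment.
fn is_aligned_for<T>(addr: usize) -> bool {
    addr % mem::align_of::<T>() == 0
}

/// Example 1: a Vec of zero-sized elements never allocates.
fn zero_sized_example() -> (usize, usize) {
    let v = Vec::from(&[(); 5][..]);
    addr_and_len(&v)
}

/// Example 2: a zero-capacity Vec never allocates either.
fn zero_capacity_example() -> (usize, usize) {
    let v = Vec::<i32>::with_capacity(0);
    addr_and_len(&v)
}
```

Printing the returned addresses with `{:#x}` from `main` should show the small, alignment-derived values described above on the versions discussed in this issue, though I would not assume the exact dangling address is guaranteed in general.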

## Further investigation

There are a few ways that these examples might actually not be undefined behavior:

1. If the documentation is incorrect, and `.offset()` is in fact safe if the offset in bytes is zero (even if the pointer is not part of an allocated object).

2. If LLVM considers `Unique::empty` to be an allocator so that the returned pointer is considered part of an allocated object. I don't see anything to indicate this is the case, though.

3. If, somewhere, the runtime allocates the range of bytes with addresses `0x1..=(max possible alignment)`. This would mean that pointers returned by `Unique::empty` would be within an allocated object. I don't see anything to indicate this is the case, though, and I'm not entirely convinced that casting an integer to a pointer would work in this case anyway (since the pointer would be derived from an integer instead of offsetting a pointer of an existing allocation).

I did some further investigation into possibility 1.

The `.offset()` method is lowered to an LLVM `getelementptr inbounds` instruction. (`src/libcore/ptr.rs` provides the `.offset()` method, which calls `intrinsics::offset`. `src/libcore/intrinsics.rs` declares the `extern "rust-intrinsic"` `offset` but not its implementation. The `codegen_intrinsic_call` function in `src/librustc_codegen_llvm/intrinsic.rs` handles the `"offset"` case by calling `.inbounds_gep()` on the `Builder`. The implementation of `.inbounds_gep()` is provided in `src/librustc_codegen_llvm/builder.rs`, which in turn calls the `extern` function `LLVMBuildInBoundsGEP` imported in `src/librustc_llvm/ffi.rs`. That function is declared in `src/llvm/include/llvm-c/Core.h`.)

[The docs for the LLVM `getelementptr inbounds` instruction](https://llvm.org/docs/LangRef.html#id219) say the following:

> If the `inbounds` keyword is present, the result value of the `getelementptr` is a [poison value](https://www.llvm.org/docs/LangRef.html#poisonvalues) if the base pointer is not an *in bounds* address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an *in bounds* address of that allocated object. The *in bounds* addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only *in bounds* address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the `inbounds` keyword applies to each of the computations element-wise.

The LLVM docs say [this about poison values](https://llvm.org/docs/LangRef.html#poisonvalues):

> Poison values are similar to [undef values][], however they also represent the fact that an instruction or constant expression that cannot evoke side effects has nevertheless detected a condition that results in undefined behavior.
>
> …
>
> Poison values have the same behavior as [undef values][], with the additional effect that any instruction that has a *dependence* on a poison value has undefined behavior.
>
> [undef values]: https://www.llvm.org/docs/LangRef.html#undefvalues

As far as I can tell, the reason why the Rust docs for `.offset()` consider getting a "poison value" to be undefined behavior is that performing any operation with a dependence on the poison value (e.g. printing it with `println!`) is undefined behavior. In particular, it's possible to perform operations with a dependence on a pointer value in safe code, so a pointer must never be a poison value.

Anyway, back to the safety constraints on `.offset()`. The constraints listed in the docs for `getelementptr inbounds` match the constraints listed in the docs for `.offset()` with one exception: "The only *in bounds* address for a null pointer in the default address-space is the null pointer itself." This means that even though a null pointer is not part of an allocation, it's still safe to perform an offset of 0 bytes on it. The docs for `getelementptr inbounds` don't indicate that this is true for non-null pointers, though, which is the case described in this issue (slicing a `Vec` with zero-size elements or zero capacity).
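For contrast, here is a minimal sketch of computing the start of a sub-slice with `wrapping_add`, which the standard library documents as always safe to call because it carries no in-bounds requirement on the pointer arithmetic itself. The helper name `subslice_start_addr` is hypothetical, and this is not a claim about how the slicing code is or should be implemented; it only shows that the same address can be obtained without going through `.offset()`.

```rust
/// Hypothetical helper: address at which the sub-slice beginning at
/// index `start` would begin, computed without `.offset()`.
fn subslice_start_addr<T>(v: &Vec<T>, start: usize) -> usize {
    // Same bounds check that slicing would perform on the start index.
    assert!(start <= v.len());
    // `wrapping_add` counts in units of `T`, so for a zero-sized `T`
    // the address does not change.
    v.as_ptr().wrapping_add(start) as usize
}
```

For the two examples in this issue, the computed address equals `v.as_ptr()` in both cases: the first because the element size is zero, the second because the offset is zero.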

## Meta

This appears to be an issue in both stable (1.29.1) and nightly.
