rustc performs auto-ref when a raw pointer would be enough #73987

Description

@RalfJung (Member)

The following code:

#![feature(slice_ptr_len)]

pub struct Test {
    data: [u8],
}

pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

generates MIR like

        _2 = &((*_1).0: [u8]);
        _0 = const core::slice::<impl [u8]>::len(move _2) -> bb1;

This means that a reference to data gets created, even though a raw pointer would be enough. That is a problem because creating a reference makes aliasing and validity assumptions that could be avoided. It would be better if rustc did not implicitly introduce such assumptions.
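For comparison, here is a minimal sketch of the two ways to obtain the length. It assumes a toolchain where `len` on `*const [T]` is usable without the `slice_ptr_len` feature gate. The `#[repr(transparent)]` attribute and the helper names `len_via_auto_ref`, `len_via_raw` and `demo` are illustrative additions, not part of the original report.

```rust
use std::ptr;

// Assumption for this sketch: `repr(transparent)` so that the pointer cast
// in `demo` below relies on a guaranteed layout.
#[repr(transparent)]
pub struct Test {
    data: [u8],
}

/// The form from the report: `[u8]::len(&self)` is selected through auto-ref,
/// so a `&[u8]` to the field is created.
pub fn len_via_auto_ref(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

/// Explicit raw-pointer form: project to the field without creating a
/// reference, then call the raw slice pointer's `len`.
pub fn len_via_raw(t: *const Test) -> usize {
    let data: *const [u8] = unsafe { ptr::addr_of!((*t).data) };
    data.len()
}

/// Builds a `*const Test` over a local buffer and calls both variants.
pub fn demo() -> (usize, usize) {
    let bytes = [1u8, 2, 3, 4];
    let t = ptr::slice_from_raw_parts(bytes.as_ptr(), bytes.len()) as *const Test;
    (len_via_auto_ref(t), len_via_raw(t))
}
```

Both functions return the same length; the difference is only in whether a reference is created along the way.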

Cc @matthewjasper

Activity

Labels added on Jul 3, 2020:
A-MIR (Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html)
C-bug (Category: This is a bug.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

SimonSapin (Contributor) commented on Jul 3, 2020

This is specifically about field projection of a raw pointer, right?

RalfJung (Member, Author) commented on Jul 3, 2020

I think so, yes. It is key that t starts out as a raw pointer.

SimonSapin (Contributor) commented on Jul 3, 2020

Oh I just realized something, and I think that the issue title and description are misleading. They make it sound like we’re calling <*const [u8]>::len(self), and in the process unnecessarily going through &[u8]. But the second line of MIR shows that the method called is actually <[u8]>::len(&self). On closer look that seems completely expected to me. The expression (*t).data by itself has type [u8], and method resolution ends up finding a result through auto-ref. But there is no equivalent to auto-ref for raw pointers. If we instead try to call a raw pointer method that doesn’t have a slice method of the same name, we get an error:

trait Foo {
    fn bar(self);
}

impl Foo for *const [u8] {
    fn bar(self) {}
}

pub struct Test {
    data: [u8],
}

pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.bar() }
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0599]: no method named `bar` found for slice `[u8]` in the current scope
  --> src/lib.rs:14:24
   |
14 |     unsafe { (*t).data.bar() }
   |                        ^^^ method not found in `[u8]`
   |
   = help: items from traits can only be used if the trait is implemented and in scope
note: `Foo` defines an item `bar`, perhaps you need to implement it
  --> src/lib.rs:1:1
   |
1  | trait Foo {
   | ^^^^^^^^^
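A sketch of how the call resolves once the raw pointer is written out explicitly. This is a variation on the example above, not the commenter's code: `bar` is changed to return a `usize` so the result can be observed, `#[repr(transparent)]` is added for the pointer cast, and `test_bar` and `demo_bar` are hypothetical names.

```rust
use std::ptr;

trait Foo {
    fn bar(self) -> usize;
}

impl Foo for *const [u8] {
    fn bar(self) -> usize {
        // `self` is a raw slice pointer here, so this is the raw pointer `len`.
        self.len()
    }
}

#[repr(transparent)]
pub struct Test {
    data: [u8],
}

pub fn test_bar(t: *const Test) -> usize {
    // The receiver already has type `*const [u8]`, so `Foo::bar` is found
    // without any auto-ref step.
    let data: *const [u8] = unsafe { ptr::addr_of!((*t).data) };
    data.bar()
}

pub fn demo_bar() -> usize {
    let bytes = [0u8; 3];
    let t = ptr::slice_from_raw_parts(bytes.as_ptr(), bytes.len()) as *const Test;
    test_bar(t)
}
```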

Another example without involving a struct field:

fn ptr_after<T>(x: &T) -> *const T {
    unsafe { (x as *const T).offset(1) }  // Ok
}

fn ptr_after2<T>(x: &T) -> *const T {
    x.offset(1)
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0599]: no method named `offset` found for reference `&T` in the current scope
 --> src/lib.rs:6:7
  |
6 |     x.offset(1)
  |       ^^^^^^ method not found in `&T`
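A small sketch of the distinction being drawn: `&T` does coerce to `*const T` at an explicit coercion site such as a typed `let`, but method lookup does not perform that step for the receiver. The function `demo_ptr_after` is a hypothetical helper added for illustration.

```rust
use std::ptr;

pub fn ptr_after<T>(x: &T) -> *const T {
    // A typed `let` is a coercion site, so `&T` becomes `*const T` here.
    let raw: *const T = x;
    // One-past-the-end of a single `T` is allowed to be computed,
    // as long as it is not dereferenced.
    unsafe { raw.offset(1) }
}

pub fn demo_ptr_after() -> bool {
    let values = [10i32, 20];
    // The pointer after `values[0]` has the same address as `values[1]`.
    ptr::eq(ptr_after(&values[0]), &values[1])
}
```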

So I’d be inclined to call this not a bug.

RalfJung (Member, Author) commented on Jul 3, 2020

> Oh I just realized something, and I think that the issue title and description are misleading. They make it sound like we’re calling <*const [u8]>::len(self), and in the process unnecessarily going through &[u8]. But the second line of MIR shows that the method called is actually <[u8]>::len(&self).

Yes, that is the problem. We should be calling the raw ptr method, but instead the compiler chooses to call the other method. That is what this bug is about. I am happy for suggestions for how to word this better. :)

Elsewhere you wrote:

> The example code in #73987 never involved a *const [u8] value at all. I’ve commented some more there.

The example code also doesn't involve a &[u8]. It just involves a [u8]. The issue is that the compiler chooses to introduce an &[u8] instead of introducing *const [u8]. Either choice works syntactically, but one makes way more assumptions, so we should be auto-introducing the one with fewer assumptions.

I am aware that the reason for this is auto-ref, and not having auto-raw-ptr. But that is IMO a big problem as it means it is actually very hard to call raw-self methods on paths -- and it is very easy to accidentally call the reference method instead.

RalfJung (Member, Author) commented on Jul 3, 2020

> If we instead try to call a raw pointer method that doesn’t have a slice method of the same name, we get an error:

Indeed. IMO we should not stabilize any raw ptr method where a reference method with the same name exists, until this bug is fixed. It's just too much of a footgun, with people accidentally calling the reference method instead of the raw ptr method.
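The footgun can be sketched with a hypothetical trait `Which` that is implemented for both `&[u8]` and `*const [u8]` under the same method name. None of these names come from the thread; `#[repr(transparent)]` is added only to justify the pointer cast in `demo_which`.

```rust
use std::ptr;

trait Which {
    fn which(self) -> &'static str;
}

impl Which for &[u8] {
    fn which(self) -> &'static str {
        "reference"
    }
}

impl Which for *const [u8] {
    fn which(self) -> &'static str {
        "raw pointer"
    }
}

#[repr(transparent)]
pub struct Test {
    data: [u8],
}

/// Method call on the path: auto-ref selects the `&[u8]` impl.
pub fn on_path(t: *const Test) -> &'static str {
    unsafe { (*t).data.which() }
}

/// Method call on an explicit raw pointer: the `*const [u8]` impl is selected.
pub fn on_raw(t: *const Test) -> &'static str {
    let data: *const [u8] = unsafe { ptr::addr_of!((*t).data) };
    data.which()
}

pub fn demo_which() -> (&'static str, &'static str) {
    let bytes = [0u8; 2];
    let t = ptr::slice_from_raw_parts(bytes.as_ptr(), bytes.len()) as *const Test;
    (on_path(t), on_raw(t))
}
```

Nothing in `on_path` signals that a reference is involved, which is the concern raised here about same-named methods.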

SimonSapin (Contributor) commented on Jul 3, 2020

I’m not sure I agree that this is a bug in the first place. The language never had coercion from T to *const T in any context.

RalfJung (Member, Author) commented on Jul 3, 2020

Would you agree that it is a footgun, though?

I agree it is behavior as intended. I just don't think the intentions are fit for modern Rust any more -- after all, when this behavior was designed, there were no raw-self methods.

neocturne (Contributor) commented on Jul 3, 2020

In this particular example, the behaviour does not feel like a footgun to me: In my simplified mental model of the language, (*t) is already only valid when the usual aliasing and validity assumptions hold, even if these assumptions only actually need to hold when I do something with the result.

I would go as far as saying that having &raw const (*t).data as the supported way to get from the raw struct pointer to the raw field pointer is quite ugly because the code looks as if t is dereferenced - is there some nicer way to do this? Optimally, the test_len function in the original example shouldn't even need unsafe at all. (But I'm likely missing years of discussions on these topics, given that I've only recently taken an interest in Rust)

RalfJung (Member, Author) commented on Jul 4, 2020

> In my simplified mental model of the language, (*t) is already only valid when the usual aliasing and validity assumptions hold, even if these assumptions only actually need to hold when I do something with the result.

That is not the case though. *t (where t is a raw pointer) just requires t to be aligned and dereferenceable; creating a reference (& or &mut) makes a huge difference on top of that by also making aliasing assumptions.

(What you said is basically right when t is a reference, though.)
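One place where the difference between a raw place projection and a reference shows up in practice is field-by-field initialization. The following sketch uses a hypothetical struct `Pair` and function `init_pair`; it follows the pattern documented for `std::mem::MaybeUninit` and `std::ptr::addr_of_mut!`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

pub struct Pair {
    pub flag: bool,
    pub count: u32,
}

pub fn init_pair() -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p: *mut Pair = slot.as_mut_ptr();
    unsafe {
        // `(*p).flag` is only used as a place here. No `&mut bool` pointing
        // at uninitialized memory is created, so no validity assumption
        // about the old contents is made.
        ptr::addr_of_mut!((*p).flag).write(true);
        ptr::addr_of_mut!((*p).count).write(7);
        // Every field has been written, so the value is fully initialized.
        slot.assume_init()
    }
}
```

Writing `&mut (*p).flag` instead would create a reference to an uninitialized `bool`, which is the kind of extra assumption the raw form avoids.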

> I would go as far as saying that having &raw const (*t).data as the supported way to get from the raw struct pointer to the raw field pointer is quite ugly because the code looks as if t is dereferenced - is there some nicer way to do this?

Well, t does get dereferenced. No memory access happens, but all rules in the language that talk about pointers being dereferenced apply to *t, even when used in the context of &*t.

This is the same in C: &ptr->field is UB if ptr is dangling or unaligned.

ssomers (Contributor) commented on Jul 15, 2020

Wearing an old hat, *t is not just dereferencing (for some definition) to me but how you get from a raw pointer back into the safe world. So I would expect (*t).data.len() to make all the assumptions it does. And to find in a back alley some notation like t + .data or &t->data to do pointer arithmetic, reading in the doc that pointer arithmetic is subject to the same pointer validation as dereferencing.

Wearing a newer hat, since unsafe {&raw const *t} and &raw const (*t).data exist, and don't dereference (as much as *t), it's much less clear to me what (*t).data.len() should do. Isn't quietly doing raw pointer access also a risk, leaving you unprotected by aliasing rules that you thought were being applied?
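For readers wondering what a pure pointer-arithmetic route to a field address could look like, here is a sketch using `std::mem::offset_of!` and `wrapping_byte_add`, assuming a toolchain recent enough to provide both. The struct `Header` and the functions `len_field_addr` and `demo_offset` are hypothetical and not taken from the thread.

```rust
use std::mem::offset_of;
use std::ptr;

#[repr(C)]
pub struct Header {
    pub tag: u32,
    pub len: u16,
}

/// Computes the address of the `len` field with wrapping arithmetic only.
/// `p` is never dereferenced, so this function is safe to call with any pointer.
pub fn len_field_addr(p: *const Header) -> *const u16 {
    p.wrapping_byte_add(offset_of!(Header, len)).cast::<u16>()
}

pub fn demo_offset() -> (bool, u16) {
    let h = Header { tag: 1, len: 9 };
    let q = len_field_addr(&h);
    // `q` has the same address as the field, and reading through it is fine
    // here because `h` is live and `q` was derived from a pointer to it.
    (ptr::eq(q, &h.len), unsafe { *q })
}
```

The trade-off is that the field name appears in `offset_of!` rather than in a place expression, and the result is only safe to read if the original pointer was valid.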
