Labels
A-MIR: Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
C-bug: Category: This is a bug.
F-arbitrary_self_types: `#![feature(arbitrary_self_types)]`
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
Description
The following code:
#![feature(slice_ptr_len)]

pub struct Test {
    data: [u8],
}

pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}
generates MIR like
_2 = &((*_1).0: [u8]);
_0 = const core::slice::<impl [u8]>::len(move _2) -> bb1;
This means that a reference to `data` gets created, even though a raw pointer would be enough. That is a problem because creating a reference makes aliasing and validity assumptions that could be avoided. It would be better if rustc would not implicitly introduce such assumptions.
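As a rough illustration of the difference described here, the sketch below contrasts the auto-ref call with an explicit raw-pointer projection. It assumes a toolchain where `<*const [T]>::len` and `ptr::addr_of!` are stable; the function names `len_via_reference` and `len_via_raw`, the `#[repr(transparent)]` attribute, and the pointer cast in `main` are additions for this example and are not part of the original report.

```rust
use std::ptr;

// `repr(transparent)` is an assumption of this sketch: it makes the cast
// from `*const [u8]` to `*const Test` in `main` layout-compatible.
#[repr(transparent)]
pub struct Test {
    data: [u8],
}

/// Same shape as `test_len` in the report: `(*t).data` has type `[u8]`,
/// so method resolution auto-refs it and calls `<[u8]>::len(&self)`.
/// The caller must pass a pointer that is valid for creating a `&[u8]`.
pub fn len_via_reference(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

/// Projects to the field without creating a reference: `addr_of!` yields
/// a `*const [u8]`, and `len` on the raw pointer only reads its metadata.
/// The caller must still pass an in-bounds pointer for the projection.
pub fn len_via_raw(t: *const Test) -> usize {
    let data: *const [u8] = unsafe { ptr::addr_of!((*t).data) };
    data.len()
}

fn main() {
    let bytes = [1u8, 2, 3, 4];
    // Both pointer types carry a slice length as metadata, so this cast is allowed.
    let t = &bytes[..] as *const [u8] as *const Test;
    assert_eq!(len_via_reference(t), 4);
    assert_eq!(len_via_raw(t), 4);
}
```

Both functions return the same length for a valid pointer; the difference is only in which assumptions the intermediate `&[u8]` introduces.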
Activity
SimonSapin commented on Jul 3, 2020
This is specifically about field projection of a raw pointer, right?
RalfJung commented on Jul 3, 2020
I think so, yes. It is key that `t` starts out as a raw pointer.
SimonSapin commented on Jul 3, 2020
Oh I just realized something, and I think that the issue title and description are misleading. They make it sound like we’re calling `<*const [u8]>::len(self)`, and in the process unnecessarily going through `&[u8]`. But the second line of MIR shows that the method called is actually `<[u8]>::len(&self)`. On closer look that seems completely expected to me. The expression `(*t).data` by itself has type `[u8]`, and method resolution ends up finding a result through auto-ref. But there is no equivalent to auto-ref for raw pointers. If we instead try to call a raw pointer method that doesn’t have a slice method of the same name, we get an error:
(Playground)
Errors:
Another example without involving a struct field:
(Playground)
Errors:
So I’d be inclined to call this not a bug.
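The method-resolution behavior described in this comment can be sketched with two stand-in traits. The traits `SliceMethod` and `RawPtrMethod`, the method name `which`, and the helper functions are hypothetical and exist only for this illustration; they mimic a `&self` method on `[u8]` and a by-value `self` method on `*const [u8]` that share a name.

```rust
// Stand-in for a slice method taking `&self`, like `<[u8]>::len`.
trait SliceMethod {
    fn which(&self) -> &'static str;
}

// Stand-in for a raw pointer method taking `self`, like `<*const [u8]>::len`.
trait RawPtrMethod {
    fn which(self) -> &'static str;
}

impl SliceMethod for [u8] {
    fn which(&self) -> &'static str {
        "slice method via auto-ref"
    }
}

impl RawPtrMethod for *const [u8] {
    fn which(self) -> &'static str {
        "raw pointer method"
    }
}

/// The receiver `(*p)` has type `[u8]`; auto-ref turns it into `&[u8]`,
/// so the slice method is selected. The caller must pass a pointer that
/// is valid for creating a shared reference.
fn call_on_place(p: *const [u8]) -> &'static str {
    unsafe { (*p).which() }
}

/// The receiver `p` has type `*const [u8]`, so the raw pointer method
/// is selected and no reference is created.
fn call_on_pointer(p: *const [u8]) -> &'static str {
    p.which()
}

fn main() {
    let bytes = [0u8; 3];
    let p: *const [u8] = &bytes[..];
    assert_eq!(call_on_place(p), "slice method via auto-ref");
    assert_eq!(call_on_pointer(p), "raw pointer method");
}
```

The receiver expression alone decides which method is picked: once the pointer has been dereferenced into a place, only by-value and auto-ref candidates are considered, and nothing turns the place back into a raw pointer.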
RalfJung commented on Jul 3, 2020
Yes, that is the problem. We should be calling the raw ptr method, but instead the compiler chooses to call the other method. That is what this bug is about. I am happy for suggestions for how to word this better. :)
Elsewhere you wrote:
The example code also doesn't involve a `&[u8]`. It just involves a `[u8]`. The issue is that the compiler chooses to introduce an `&[u8]` instead of introducing `*const [u8]`. Either choice works syntactically, but one makes way more assumptions, so we should be auto-introducing the one with fewer assumptions.
I am aware that the reason for this is auto-ref, and not having auto-raw-ptr. But that is IMO a big problem as it means it is actually very hard to call raw-`self` methods on paths -- and it is very easy to accidentally call the reference method instead.
RalfJung commented on Jul 3, 2020
Indeed. IMO we should not stabilize any raw ptr method where a reference method with the same name exists, until this bug is fixed. It's just too much of a footgun, with people accidentally calling the reference method instead of the raw ptr method.
SimonSapin commented on Jul 3, 2020
I’m not sure I agree that this is a bug in the first place. The language never had coercion from `T` to `*const T` in any context.
RalfJung commented on Jul 3, 2020
Would you agree that it is a footgun, though?
I agree it is behavior as intended. I just don't think the intentions are fit for modern Rust any more -- after all, when this behavior was designed, there were no raw-`self` methods.
neocturne commented on Jul 3, 2020
In this particular example, the behaviour does not feel like a footgun to me: In my simplified mental model of the language, `(*t)` is already only valid when the usual aliasing and validity assumptions hold, even if these assumptions only actually need to hold when I do something with the result.
I would go as far as saying that having `&raw const (*t).data` as the supported way to get from the raw struct pointer to the raw field pointer is quite ugly because the code looks as if `t` is dereferenced - is there some nicer way to do this? Optimally, the `test_len` function in the original example shouldn't even need `unsafe` at all. (But I'm likely missing years of discussions on these topics, given that I've only recently taken an interest in Rust)
RalfJung commented on Jul 4, 2020
That is not the case though. `*t` (where `t` is a raw pointer) just requires `t` to be aligned and dereferenceable; creating a reference (`&` or `&mut`) makes a huge difference on top of that by also making aliasing assumptions.
(What you said is basically right when `t` is a reference, though.)
Well, `t` does get dereferenced. No memory access happens, but all rules in the language that talk about pointers being dereferenced apply to `*t`, even when used in the context of `&*t`.
This is the same in C: `&ptr->field` is UB if `ptr` is dangling or unaligned.
ssomers commented on Jul 15, 2020
Wearing an old hat, `*t` is not just dereferencing (for some definition) to me but how you get from a raw pointer back into the safe world. So I would expect `(*t).data.len()` to make all the assumptions it does. And to find in a back alley some notation like `t + .data` or `&t->data` to do pointer arithmetic, reading in the doc that pointer arithmetic is subject to the same pointer validation as dereferencing.
Wearing a newer hat, since `unsafe {&raw const *t}` and `&raw const (*t).data` exist, and don't dereference (as much as `*t`), it's much less clear to me what `(*t).data.len()` should do. Isn't quietly doing raw pointer access also a risk, leaving you unprotected by aliasing rules that you thought were being applied?
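One place where the gap between "aligned and dereferenceable" and "valid for a reference" shows up in practice is field-by-field initialization. The sketch below follows the pattern documented for `MaybeUninit` in the standard library; the struct `Pair` and the function `init_pair` are hypothetical names chosen for this example, and `ptr::addr_of_mut!` is used here as the macro form of `&raw mut`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct Pair {
    a: u32,
    b: String,
}

fn init_pair() -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    // `p` is aligned and dereferenceable, but the memory behind it is
    // uninitialized, so creating `&mut (*p).b` would not be allowed.
    let p: *mut Pair = slot.as_mut_ptr();
    unsafe {
        // Raw field projections: no reference to uninitialized data is created.
        ptr::addr_of_mut!((*p).a).write(7);
        ptr::addr_of_mut!((*p).b).write(String::from("hi"));
        // Every field has been written, so the value is now fully initialized.
        slot.assume_init()
    }
}

fn main() {
    let pair = init_pair();
    assert_eq!(pair.a, 7);
    assert_eq!(pair.b, "hi");
}
```

Here `(*p).a` still counts as a dereference for the purposes of the pointer rules, which is why the projection sits inside `unsafe`, yet no aliasing or initialization assumptions of a reference are introduced.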