Closed
Description
This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).
Steps:
- Implement the RFC (Implementation of RFC 2151, Raw Identifiers #48942)
- Adjust documentation (see instructions on forge)
- Stabilization PR (see instructions on forge)
- Settle on a final syntax for raw identifiers.
Unresolved questions:
- Do macros need any special care with such identifier tokens? Probably not.
- Should diagnostics use the `r#` syntax when printing identifiers that overlap keywords? Depends on the edition?
- Does rustdoc need to use the `r#` syntax? e.g. to document `pub use old_epoch::*`
Activity
nikomatsakis commented on Feb 28, 2018
@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?
Lymia commented on Mar 6, 2018
I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.
Manishearth commented on Mar 7, 2018
The relevant code is probably this, you'll want to make the `('r', Some('#'), _)` case allow for the third character to be alphabetic or an underscore, and in that case skip the `r` and the `#` before running `ident_continue`.
We could add an `is_raw` boolean to `token::Ident` as well.
You'll also need to feature gate this but we can do that later.
lmk if you have questions
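
For readers following along, the lexing change described above can be sketched roughly as follows. This is a standalone toy, not rustc's lexer: `LexedIdent`, `lex_ident`, `is_ident_start` and `is_ident_continue` are hypothetical names made up for illustration, and only the `('r', Some('#'), _)` shape comes from the comment.

```rust
/// Result of lexing one identifier (hypothetical type, not from rustc).
#[derive(Debug, PartialEq)]
struct LexedIdent<'a> {
    name: &'a str,
    is_raw: bool,
}

fn is_ident_start(c: char) -> bool {
    c.is_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Lexes an identifier at the start of `src` and returns it together with
/// the remaining input, or `None` if `src` does not start with an identifier.
fn lex_ident(src: &str) -> Option<(LexedIdent<'_>, &str)> {
    let mut chars = src.chars();
    let (first, second, third) = (chars.next(), chars.next(), chars.next());
    let (body, is_raw) = match (first, second, third) {
        // `r#` followed by an identifier start: skip the two-byte prefix.
        (Some('r'), Some('#'), Some(c)) if is_ident_start(c) => (&src[2..], true),
        // Ordinary identifier. In this toy, `r#"..."#` ends up here as a
        // plain `r`; a real lexer handles raw strings before this point.
        (Some(c), _, _) if is_ident_start(c) => (src, false),
        _ => return None,
    };
    // Consume characters for as long as they can continue an identifier.
    let end = body
        .find(|c: char| !is_ident_continue(c))
        .unwrap_or(body.len());
    let (name, rest) = body.split_at(end);
    Some((LexedIdent { name, is_raw }, rest))
}
```

Under these assumptions, `lex_ident("r#catch()")` yields the name `catch` with `is_raw` set and leaves `()` as the remaining input, while plain `catch` yields the same name with the flag cleared.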
petrochenkov commented on Mar 7, 2018
This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.
Ideally we should limit the effect of `r#` to lexer, for example by interning `r#keyword` identifiers into separate slots (like gensyms) so `r#keyword` and `keyword` have different `Name` `Symbol`s.
EDIT: The first paragraph is about `ast::Ident`, `token::Ident` may actually be the appropriate place.
petrochenkov commented on Mar 7, 2018
Some clarifications are needed:
- `r#` affects context-dependent identifiers (aka weak keywords) like `default`. Do they lose their special context-dependent properties and turn into "normal identifiers"?
- Does `r#` affect keywords that are "semantically special" and not "syntactically special"? I'm talking about path segment keywords specifically.
For example, `Self` in `Self::A::B` is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named `Self` (or `self`/`super`/etc) in a special way.
Manishearth commented on Mar 7, 2018
oh, I didn't realize we reuse Ident from the lexer.
I think `r#union` is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.
I think it's ok for `r#Self` to work; but don't mind either way
petrochenkov commented on Mar 7, 2018
Also, lifetime identifiers weren't covered by the RFC - `r#'ident` or `'r#ident`.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of `'` and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).
Manishearth commented on Mar 7, 2018
I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.
Raw identifiers are primarily necessary because people may need to call e.g. functions named `catch()` in crates on an older epoch. This problem doesn't occur for lifetimes.
petrochenkov commented on Mar 7, 2018
Yeah, it's mostly a consistency question rather than a practical issue.
Lymia commented on Mar 7, 2018
From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to `token::Ident`, rather than messing with the `Symbol` itself?
I think this would make implementing epoch-specific keywords easier, since there's no question of what `Symbol` should be used when, and in what epoch. (For example, you'd have to make sure the `Symbol` for `catch` being used as an identifier in 2015 epoch code is the same as the `Symbol` for `r#catch` being used in an epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.
My main questions, right now, would be:
- [...] `parser.rs`. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
- `token::Ident` is declared as `Ident(ast::Ident)`. To add an `is_raw` field would mean having a mystery unnamed `bool` field in a tuple struct, or making it use named fields, in which case, matching on `token::Ident`s becomes nastier. One idea that did come to mind is adding a `RawIdent(ast::Ident)` variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?
I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.
Manishearth commented on Mar 7, 2018
Yes, we should not be affecting Symbol.
Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing
I think that's fine. Folks usually do this as `Ident(ast::Ident, /* is_raw */ bool)`
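
To make the `Ident(ast::Ident, /* is_raw */ bool)` suggestion concrete, here is a minimal sketch of how such a token might be matched. It assumes a simplified stand-in `Token` enum with a `String` payload rather than rustc's `ast::Ident`, and `KEYWORDS` and `is_keyword` are hypothetical helpers, not compiler APIs.

```rust
/// Simplified stand-in for the compiler's token type (an assumption for
/// illustration; the real variant carries an `ast::Ident`, not a `String`).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    /// An identifier plus a flag recording whether it was written as `r#name`.
    Ident(String, /* is_raw */ bool),
    Other(char),
}

/// Hypothetical keyword list, for illustration only.
const KEYWORDS: &[&str] = &["catch", "fn", "let", "match"];

/// A token is treated as a keyword only if it was not written in raw form.
fn is_keyword(tok: &Token) -> bool {
    match tok {
        // Every existing `Token::Ident(..)` pattern has to be updated to name
        // or ignore the new field, so the compiler flags each use site.
        Token::Ident(name, is_raw) => !*is_raw && KEYWORDS.contains(&name.as_str()),
        Token::Other(_) => false,
    }
}
```

One appeal of the extra tuple field over a separate `RawIdent` variant is visible here: adding the field breaks every old `Ident(x)` pattern at compile time, which addresses the concern about finding the places where rawness matters.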