
Tracking issue for RFC 2151, Raw Identifiers #48589

Closed
@Centril

Description

@Centril
Contributor

This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).

Steps:

Unresolved questions:

  • Do macros need any special care with such identifier tokens? Probably not.
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords? Depends on the edition?
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*

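For the diagnostics question, one possible approach is a small helper that adds the r# prefix only when a name collides with a keyword of the edition being reported against. This is a minimal sketch under stated assumptions: display_ident and its edition_keywords parameter are hypothetical, not rustc APIs, and the caller supplies the keyword list for the relevant edition.

```rust
/// Hypothetical helper: render an identifier for a diagnostic, adding the
/// `r#` prefix only when the name collides with a keyword in
/// `edition_keywords` (an assumed, caller-supplied list, not a rustc API).
fn display_ident(name: &str, edition_keywords: &[&str]) -> String {
    if edition_keywords.iter().any(|kw| *kw == name) {
        // Printed as written in source on an edition where it is a keyword.
        format!("r#{}", name)
    } else {
        // Not a keyword in this edition: print the bare name.
        name.to_string()
    }
}
```

With a keyword list that contains catch, display_ident("catch", ...) yields r#catch; with a list that lacks it, the same name prints unchanged, which is the edition-dependent behaviour the question hints at.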
Activity

added labels on Feb 27, 2018:
  • B-RFC-approved (Blocker: Approved by a merged RFC but not yet implemented.)
  • T-lang (Relevant to the language team)
  • C-tracking-issue (Category: An issue tracking the progress of sth. like the implementation of an RFC)

nikomatsakis commented on Feb 28, 2018

@nikomatsakis
Contributor

@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?


Lymia commented on Mar 6, 2018

@Lymia
Contributor

I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.


Manishearth commented on Mar 7, 2018

@Manishearth
Member

The relevant code is probably this; you'll want to make the ('r', Some('#'), _) case allow the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.

We could add an is_raw boolean to token::Ident as well.
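A rough sketch of the lexing change described above, assuming a simplified lexer over a &str. LexedIdent, ident_start, ident_continue and lex_ident are hypothetical names; rustc's real lexer and its XID-based character classes differ, and raw string literals such as r#"..." are deliberately not handled here.

```rust
/// Hypothetical token: identifier text plus a flag recording whether it
/// was written with the `r#` prefix. Not rustc's actual type.
#[derive(Debug, PartialEq)]
struct LexedIdent {
    name: String,
    is_raw: bool,
}

// Simplified character classes (assumption: rustc uses XID rules).
fn ident_start(c: char) -> bool {
    c.is_alphabetic() || c == '_'
}

fn ident_continue(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Lex an identifier at the start of `src`, returning the token and the
/// number of bytes consumed, or `None` if `src` does not start one.
fn lex_ident(src: &str) -> Option<(LexedIdent, usize)> {
    let mut chars = src.chars();
    let first = chars.next()?;
    let second = chars.next();
    let third = chars.next();

    // Mirrors the ('r', Some('#'), _) case: only treat `r#` as a raw prefix
    // when the third character can start an identifier, then skip `r#`.
    let (is_raw, start) = match (first, second, third) {
        ('r', Some('#'), Some(c)) if ident_start(c) => (true, 2),
        (c, _, _) if ident_start(c) => (false, 0),
        _ => return None,
    };

    // Scan ident_continue characters after the (possibly skipped) prefix.
    let body = &src[start..];
    let len = body
        .char_indices()
        .find(|&(_, c)| !ident_continue(c))
        .map(|(i, _)| i)
        .unwrap_or(body.len());

    let name = body[..len].to_string();
    Some((LexedIdent { name, is_raw }, start + len))
}
```

Because the prefix is consumed only when the next character can start an identifier, input like r#"s" falls through to the ordinary path in this sketch and lexes as just r.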

You'll also need to feature gate this but we can do that later.

lmk if you have questions


petrochenkov commented on Mar 7, 2018

@petrochenkov
Contributor

> We could add an is_raw boolean to token::Ident as well.

This is possible, but would be unfortunate. Idents are used everywhere and are supposed to be small.

Ideally we should limit the effect of r# to the lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different Names/Symbols.

EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.
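The "separate slots" idea can be illustrated with a toy interner. This is a minimal sketch, not rustc's Symbol interner: Interner and intern are hypothetical, and the keyword list is an assumed input. A raw identifier whose text is a keyword gets its own slot (much like a gensym), while a raw non-keyword shares the ordinary slot, since r#foo and foo mean the same thing.

```rust
use std::collections::HashMap;

/// Hypothetical interner: symbols are keyed by (text, separate_slot), where
/// `separate_slot` is true only for raw identifiers that spell a keyword.
struct Interner {
    keywords: Vec<&'static str>,
    slots: HashMap<(String, bool), u32>,
}

impl Interner {
    fn new(keywords: &[&'static str]) -> Self {
        Interner { keywords: keywords.to_vec(), slots: HashMap::new() }
    }

    /// Return the symbol for `name`, allocating a new one on first sight.
    fn intern(&mut self, name: &str, is_raw: bool) -> u32 {
        // Only raw *keywords* need a distinct slot; `r#foo` is just `foo`.
        let separate = is_raw && self.keywords.iter().any(|k| *k == name);
        let next = self.slots.len() as u32;
        *self.slots.entry((name.to_string(), separate)).or_insert(next)
    }
}
```

With this scheme the raw flag never has to travel past the lexer: downstream code only compares symbols, and r#catch and catch simply compare unequal.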


petrochenkov commented on Mar 7, 2018

@petrochenkov
Contributor

Some clarifications are needed:

  • How does r# affect context-dependent identifiers (aka weak keywords) like default?
    Do they lose their special context-dependent properties and turn into "normal identifiers"?
```rust
// `union` is a normal ident; this is not an error
union U {
    f: u32, // ...
}

// `union` is a raw ident; is this an error?
r#union U {
    f: u32, // ...
}
```
  • How does r# affect keywords that are "semantically special" and not "syntactically special"?
    I'm talking about path segment keywords specifically.
    For example, Self in Self::A::B is already treated as a normal identifier during parsing; it only gains special abilities during name resolution, when we resolve identifiers named Self (or self/super/etc.) in a special way.
```rust
#[derive(Default)]
struct S;

impl S {
    fn f() -> S {
        r#Self::default() // Is this an error?
    }
}
```

Manishearth commented on Mar 7, 2018

@Manishearth
Member

oh, I didn't realize we reuse Ident from the lexer.

I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.
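One way a parser could use that raw-ness flag is sketched below, assuming tokens carry the bool from the lexer. Tok and starts_union_item are hypothetical simplifications; the real parser's check for a union item has more cases. The point is only that a weak keyword is recognized when written plainly, so a raw r#union never starts a union item.

```rust
/// Hypothetical token type for this sketch: identifier text plus the raw
/// flag produced by the lexer, or a single punctuation character.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String, /* is_raw */ bool),
    Punct(char),
}

/// Returns true if `tokens` begin a union item: a *non-raw* `union`
/// followed by another identifier (the union's name, which may be raw).
fn starts_union_item(tokens: &[Tok]) -> bool {
    match tokens {
        [Tok::Ident(kw, false), Tok::Ident(_, _), ..] => kw.as_str() == "union",
        _ => false,
    }
}
```

Under this rule r#union U { ... } is not parsed as a union definition, so it would surface as an ordinary syntax error, which matches the answer above.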

I think it's OK for r#Self to work, but I don't mind either way.


petrochenkov commented on Mar 7, 2018

@petrochenkov
Contributor

Also, lifetime identifiers weren't covered by the RFC: should it be r#'ident or 'r#ident?
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).


Manishearth commented on Mar 7, 2018

@Manishearth
Member

I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, so their identifiers never need to be used by consumers of your crate, and lifetimes clashing with keywords can simply be fixed when a crate moves to a new epoch. Admittedly, writing a lint that makes that automatic may be tricky.

Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.


petrochenkov commented on Mar 7, 2018

@petrochenkov
Contributor

Yeah, it's mostly a consistency question rather than a practical issue.


Lymia commented on Mar 7, 2018

@Lymia
Contributor

From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?

I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in an epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.

My main questions, right now, would be:

  • Does this actually sound like the best approach?
  • Reading the code, it looks like most feature gating is done on the AST after parsing, and not during parsing. Since nothing in the AST would reflect the raw identifiers being there at all with this approach, the feature check would have to be in parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
  • A minor code style point: right now, token::Ident is declared as Ident(ast::Ident). Adding an is_raw field would mean either having a mystery unnamed bool field in a tuple variant, or switching to named fields, in which case matching on token::Idents becomes nastier. One idea that did come to mind is adding a RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?

I'll implement raw lifetimes if it turns out to be easy, I guess. As Manishearth said, it's not something you ever really need to escape.


Manishearth commented on Mar 7, 2018

@Manishearth
Member

Yes, we should not be affecting Symbol.

Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing
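A delayed error could look roughly like the sketch below: the lexer only records where raw identifiers appeared, and the check runs once the crate's feature attributes are known. RawIdentGate and its methods are hypothetical, the feature name raw_identifiers is an assumption for illustration, and spans are simplified to byte-offset pairs.

```rust
/// Hypothetical state shared with the lexer. While lexing we cannot see
/// `#![feature(...)]` yet, so we only remember where raw idents occurred.
#[derive(Default)]
struct RawIdentGate {
    // (start, end) byte offsets of every raw identifier seen while lexing.
    raw_ident_spans: Vec<(usize, usize)>,
}

impl RawIdentGate {
    /// Called by the lexer each time it produces a raw identifier.
    fn note_raw_ident(&mut self, span: (usize, usize)) {
        self.raw_ident_spans.push(span);
    }

    /// Called after crate attributes are parsed. Returns one error message
    /// per recorded raw identifier unless the (assumed) feature is enabled.
    fn check(&self, enabled_features: &[&str]) -> Vec<String> {
        if enabled_features.iter().any(|f| *f == "raw_identifiers") {
            return Vec::new();
        }
        self.raw_ident_spans
            .iter()
            .map(|(lo, hi)| format!("raw identifiers are experimental (bytes {}..{})", lo, hi))
            .collect()
    }
}
```

Keeping the recording and the check separate means the lexer needs no knowledge of features at all; the gate decision is made in one place after parsing.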

> a mystery unnamed bool field in a tuple struct

I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)
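A pared-down illustration of that convention, assuming a toy Token enum where the identifier payload is a String rather than ast::Ident (Token, ident_name and to_source are hypothetical). Matches ignore the flag with _ where raw-ness doesn't matter and bind it where it does.

```rust
/// Hypothetical stand-in for token::Token, using the tuple-with-comment
/// convention for the raw flag.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String, /* is_raw */ bool),
    Semi,
}

/// Raw-ness is irrelevant here, so the flag is ignored with `_`.
fn ident_name(tok: &Token) -> Option<&str> {
    match tok {
        Token::Ident(name, _) => Some(name.as_str()),
        _ => None,
    }
}

/// Raw-ness matters when printing tokens back out as source text.
fn to_source(tok: &Token) -> String {
    match tok {
        Token::Ident(name, true) => format!("r#{}", name),
        Token::Ident(name, false) => name.clone(),
        Token::Semi => ";".to_string(),
    }
}
```

A side benefit of adding a positional field: every existing one-field Token::Ident(x) pattern stops compiling until it is updated, so the compiler does point out each affected site. A separate RawIdent variant would instead be silently absorbed by existing _ => arms.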


Metadata


Labels

  • B-RFC-implemented (Blocker: Approved by a merged RFC and implemented but not stabilized.)
  • C-tracking-issue (Category: An issue tracking the progress of sth. like the implementation of an RFC)
  • E-easy (Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.)
  • E-mentor (Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.)
  • P-high (High priority)
  • T-lang (Relevant to the language team)
  • disposition-merge (This issue / PR is in PFCP or FCP with a disposition to merge it.)
  • finished-final-comment-period (The final comment period is finished for this PR / Issue.)


Participants

@cuviper, @seanmonstar, @alexreg, @eddyb, @Nemo157
