Tracking issue for RFC 2457, "Allow non-ASCII identifiers"

This is a tracking issue for the RFC "Allow non-ASCII identifiers" (rust-lang/rfcs#2457).

**Steps:**

- [x] Implement the RFC (cc @rust-lang/compiler @Manishearth)
   - [x] Normalize identifiers to NFC whilst parsing (#66670, #67702)
   - [x] Ensure that `#![forbid(non_ascii_idents)]` works. (#61883)
   - [x] Lint: `confusable_idents` (#71542, #72770)
   - [x] Lint: ~`less_used_codepoints`~ `uncommon_codepoints` (#67810)
   - [x] Adjustments to "<del>bad style</del>`non_standard_style`" lints. (See #73839)
   - [x] Lint: `mixed_script_confusables` (#72770)
   - [x] Provide reusable crates for above lints and checks on crates.io. ([unicode-security](https://docs.rs/unicode-security))
   - [x] Similarly to out-of-line modules (`mod фоо;`), extern crates and paths whose first segment names a crate should not be able to trigger a filesystem search using non-ASCII identifiers (i.e. no `extern crate ьаг;` or `му_сгате::baz`). (#73305)
   - [x] Disallow using non-ASCII identifiers in extern blocks. (#83936)
- [x] Adjust documentation ([see instructions on forge][doc-guide]) (https://github.com/rust-lang/reference/pull/999)
- [x] Stabilization PR ([see instructions on forge][stabilization-guide]) (#83799)

[stabilization-guide]: https://forge.rust-lang.org/stabilization-guide.html
[doc-guide]: https://forge.rust-lang.org/stabilization-guide.html#updating-documentation
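
As a rough illustration of what the checklist above covers, the following sketch uses non-ASCII identifiers and shows, with plain `char` and `str` comparisons, why the NFC step and the confusable lints exist. The names `面積`, `längd`, `bredd` and the two helper functions are made up for this example; they do not come from the RFC or the compiler's test suite.

```rust
/// Hypothetical example function with a non-ASCII name and parameter.
fn 面積(längd: u32, bredd: u32) -> u32 {
    längd * bredd
}

/// Latin 'a' (U+0061) and Cyrillic 'а' (U+0430) usually render alike but are
/// distinct code points; this is the situation `confusable_idents` targets.
fn looks_alike_but_differs() -> bool {
    'a' != '\u{0430}'
}

/// "é" can be spelled precomposed (U+00E9) or as 'e' plus a combining acute
/// accent (U+0301). The two strings differ byte for byte, which is why
/// identifiers are normalized to NFC before they are compared.
fn nfc_spellings_differ() -> bool {
    let precomposed = "\u{00E9}";
    let decomposed = "e\u{0301}";
    precomposed != decomposed && precomposed.len() == 2 && decomposed.len() == 3
}

fn main() {
    println!("{}", 面積(3, 4));
    println!("{}", looks_alike_but_differs());
    println!("{}", nfc_spellings_differ());
}
```

The standard library has no normalization routine, so the sketch only shows that the two spellings differ as strings; it does not perform NFC itself.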

**Unresolved questions:**

[TR31Layout]: https://www.unicode.org/reports/tr31/#Layout_and_Format_Control_Characters

* [ ] Which context is adequate for confusable detection: file, current scope, crate?
* [ ] Should [ZWNJ and ZWJ be allowed in identifiers][TR31Layout]?
* [x] How are non-ASCII idents best supported in debuggers? 
        Resolved: DWARF and debuggers handle UTF-8 just fine
* [x] Which name mangling scheme is used by the compiler? (Punycode, see [RFC2603](https://github.com/rust-lang/rfcs/pull/2603))
* [x] Is there a better name for the `less_used_codepoints` lint?
       Resolved in favor of `uncommon_codepoints`
* [x] Which lint should the global mixed scripts confusables detection trigger? 
       Resolved in favor of `mixed_script_confusables`
* [ ] How badly do non-ASCII idents exacerbate const pattern confusion
  (rust-lang/rust#7526, rust-lang/rust#49680)?
  Can we improve precision of linting here?
* [ ] In `mixed_script_confusables`, do we actually need to make an exception for `Latin` identifiers?
* [ ] Terminal width is tricky with Unicode. Some characters are wide, some have widths that depend on the installed fonts (e.g. emoji sequences), and modifiers are a thing. The concept of a monospace font doesn't generalize well to other scripts. How does rustfmt deal with this when determining line width?
* [x] Right-to-left scripts can lead to weird rendering in mixed contexts (depending on the software used), especially when mixed with operators. This should not block stabilization, but we feel it is important to call it out explicitly. Future RFCs (preferably put forth by RTL-using communities) may attempt to improve this situation (e.g. by allowing bidi control characters in specific contexts).
* [ ] Tweak `XID_Start` / `XID_Continue`? https://github.com/rust-lang/rust/issues/4928
  > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1518.htm
  >
  > The ISO JTC1/SC22/WG14 (C language) think that possibly UTR#31 didn't quite hit the nail on the head in terms of defining identifier syntax. They have a couple tweaks in mind. Consider following their lead.
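
The terminal-width question above can be made concrete with the standard library alone. The sketch below compares byte length and `char` count for a few strings; neither matches the number of terminal columns, which depends on font and width data that `std` does not provide. The helper name `byte_and_char_len` is made up here, and nothing in it reflects how rustfmt actually measures lines.

```rust
/// Returns (UTF-8 byte length, number of `char`s) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII: bytes, chars and columns all agree.
    println!("{:?}", byte_and_char_len("abc"));
    // Two Han characters: 6 bytes, 2 chars, typically rendered 4 columns wide.
    println!("{:?}", byte_and_char_len("\u{9762}\u{7A4D}"));
    // 'e' plus a combining acute accent: 3 bytes, 2 chars, typically 1 column.
    println!("{:?}", byte_and_char_len("e\u{0301}"));
}
```

The column counts in the comments are what terminals typically show, not something the code computes.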

----
Zulip topic for real-time discussion:
https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/nonascii.20identifiers(rfc.202457)


Tracking issue for RFC 2457, "Allow non-ASCII identifiers" #55467

Manishearth commented on Oct 29, 2018
