Closed
Description
This is a tracking issue for the RFC "Allow non-ASCII identifiers" (rust-lang/rfcs#2457).
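As a rough illustration of what the RFC permits, here is a minimal sketch with Cyrillic identifiers. It is not taken from the RFC text, and the function and parameter names are invented for this example.

```rust
// Hypothetical example: a rectangle-area helper with Cyrillic identifiers.
// `площадь` means "area", `ширина` means "width", `высота` means "height".
fn площадь(ширина: u32, высота: u32) -> u32 {
    ширина * высота
}
```

Calling `площадь(3, 4)` returns `12` on a compiler where the feature is stable (Rust 1.53 or later).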
Steps:
- Implement the RFC (cc @rust-lang/compiler @Manishearth)
- Normalize identifiers to NFC whilst parsing (Normalize ident #66670, Add symbol normalization for proc_macro_server. #67702)
- Ensure that #![forbid(non_ascii_idents)] works. (non_ascii_idents lint (part of RFC 2457) #61883)
- Lint: confusable_idents (Implement confusable_idents lint. #71542, Implement mixed script confusable lint. #72770)
- Adjustments to "bad style" (non_standard_style) lints. (See Split and expand nonstandard-style lints unicode unit test. #73839)
- Lint: mixed_script_confusables (Implement mixed script confusable lint. #72770)
- Provide reusable crates for above lints and checks on crates.io. (unicode-security)
- Similarly to out-of-line modules (mod фоо;), extern crates and paths with a first segment naming a crate should not be able to do filesystem search using those non-ASCII identifiers (i.e. no extern crate ьаг; or му_сгате::baz). (Disallow loading crates with non-ascii identifier name. #73305)
- Disallow using non-ascii identifiers in extern blocks. (Disable using non-ascii identifiers in extern blocks. #83936)
- Adjust documentation (see instructions on forge) (Move non-ascii-idents content from unstable book to reference. reference#999)
- Stabilization PR (see instructions on forge) (Stablize non-ascii-idents #83799)
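The idea behind the confusable_idents lint in the steps above can be sketched with a toy "skeleton" comparison. This is only an illustration under loud assumptions: the mapping table is a tiny hand-picked subset of Cyrillic/Latin lookalikes, and `toy_skeleton` and `toy_confusable` are hypothetical names. They are not the API of rustc or of the unicode-security crate, which would work from the full Unicode confusables data.

```rust
/// Maps a few Cyrillic letters to the Latin letters they resemble.
/// ASSUMPTION: toy table for illustration only, not the real confusables data.
fn toy_skeleton(ident: &str) -> String {
    ident
        .chars()
        .map(|c| match c {
            '\u{0430}' => 'a', // CYRILLIC SMALL LETTER A
            '\u{0435}' => 'e', // CYRILLIC SMALL LETTER IE
            '\u{043E}' => 'o', // CYRILLIC SMALL LETTER O
            '\u{0440}' => 'p', // CYRILLIC SMALL LETTER ER
            '\u{0441}' => 'c', // CYRILLIC SMALL LETTER ES
            '\u{0443}' => 'y', // CYRILLIC SMALL LETTER U
            '\u{0445}' => 'x', // CYRILLIC SMALL LETTER HA
            other => other,
        })
        .collect()
}

/// Two distinct identifiers count as confusable here when their
/// toy skeletons are equal.
fn toy_confusable(a: &str, b: &str) -> bool {
    a != b && toy_skeleton(a) == toy_skeleton(b)
}
```

With this sketch, `toy_confusable("scope", "s\u{0441}ope")` is true because the second identifier differs only by a Cyrillic lookalike, while two identical or visibly different names are not flagged.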
Unresolved questions:
- Which context is adequate for confusable detection: file, current scope, crate?
- How are non-ASCII idents best supported in debuggers?
  Resolved: DWARF and debuggers handle UTF-8 just fine
- Which name mangling scheme is used by the compiler? (Punycode, see RFC2603)
- Is there a better name for the less_used_codepoints lint?
  Resolved in favour of uncommon_codepoints
- Which lint should the global mixed scripts confusables detection trigger?
  Resolved in favor of mixed_script_confusables
- How badly do non-ASCII idents exacerbate const pattern confusion (Statics shadow local variables causing "refutable pattern error", and non-obvious bugs. #7526, We shouldn't even try to resolve irrefutable patterns as constants #49680)?
  Can we improve precision of linting here?
- In mixed_script_confusables, do we actually need to make an exception for Latin identifiers?
- Terminal width is tricky with Unicode. Some characters are wide, some have widths dependent on the fonts installed (e.g. emoji sequences), and modifiers are a thing. The concept of a monospace font doesn't generalize to other scripts as well. How does rustfmt deal with this when determining line width?
- Right-to-left scripts can lead to weird rendering in mixed contexts (depending on the software used), especially when mixed with operators. This is not something that should block stabilization; however, we feel it is important to explicitly call out. Future RFCs (preferably put forth by RTL-using communities) may attempt to improve this situation (e.g. by allowing bidi control characters in specific contexts).
- Tweak XID_Start / XID_Continue? XID_Start / XID_Continue might not be quite right #4928
  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1518.htm
  The ISO JTC1/SC22/WG14 (C language) think that possibly UTR#31 didn't quite hit the nail on the head in terms of defining identifier syntax. They have a couple of tweaks in mind. Consider following their lead.
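To make the line-width question above concrete, here is a small sketch using only the standard library. `naive_lengths` is a hypothetical helper, not something rustfmt is known to use. It shows that the two cheap measures, UTF-8 bytes and Unicode scalar values, already disagree, and neither is a display width.

```rust
/// Returns (UTF-8 byte count, Unicode scalar value count) for a line.
/// ASSUMPTION: illustrative helper only; neither number is a display width.
fn naive_lengths(line: &str) -> (usize, usize) {
    (line.len(), line.chars().count())
}
```

For the line `mod фоо;` this gives 11 bytes but 8 scalar values, and for an `e` followed by U+0301 (combining acute accent) it gives 3 bytes and 2 scalar values, although a terminal typically draws the latter in a single column. A real width calculation would need extra Unicode data that the standard library does not provide.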
Zulip channel topic for real-time discussion:
https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/nonascii.20identifiers(rfc.202457)
Labels
- Blocker: Approved by a merged RFC but not yet implemented.
- Blocker: Approved by a merged RFC and implemented but not stabilized.
- Blocker: Implemented in the nightly compiler and unstable.
- Category: An issue tracking the progress of sth. like the implementation of an RFC
- `#![feature(non_ascii_idents)]`
- Relevant to the language team
- This issue / PR is in PFCP or FCP with a disposition to merge it.
- The final comment period is finished for this PR / Issue.
Activity
Manishearth commented on Oct 29, 2018
The last unresolved question isn't a real unresolved question; it was included in the RFC for completeness but does not block this issue.
Centril commented on Oct 29, 2018
@joshtriplett Please check that the list of checkboxes above is satisfactory. :)
@Manishearth alright; leave a note under it to that effect?
Manishearth commented on Oct 29, 2018
The note saying so is already in the unresolved question.
8573 commented on Oct 29, 2018
Substituting "rare" or "unusual" for "less used" seems to me a simple, if not necessarily final, improvement, replacing the somewhat awkward "less used" with a single, shorter, more usual synonym.
(Edit: I note that I personally oppose allowing non-ASCII identifiers, but I recognize that the Rust Team favors it, and I have no problem bowing to their decision and chipping in my cents to help.)
Manishearth commented on Oct 29, 2018
Serentty commented on Oct 29, 2018
I would prefer “rare” as it sounds more objective to me than “unusual”, and perhaps less judgemental as well.
eaglgenes101 commented on Nov 1, 2018
My first thought was "uncommon", but that's not strong enough of an adjective to get the intended meaning across.
Centril commented on Nov 1, 2018
I'm partial towards "rare" as well; rare_codepoints is pretty short and sweet.