Unicode 15.1 support #253
Similar to #47392, support [Unicode 15.1](https://www.unicode.org/versions/Unicode15.1.0/) by bumping utf8proc to 2.9.0 (JuliaStrings/utf8proc#253). This allows us to use [118 exciting new emoji characters](https://blog.emojipedia.org/whats-new-in-unicode-15-1-and-emoji-15-1/) as identifiers, including "edible mushroom" `"\U1f344\u200d\U1f7eb"` (but still no superscript "q"). Interestingly, they also updated the [Unicode recommendations on programming-language identifiers (UAX#31)](https://www.unicode.org/reports/tr31/tr31-39.html#Mathematical_Compatibility_Notation_Profile) to finally "bless" identifiers beginning with `∂` and `∇` and/or ending with numeric sub/superscripts. They still don't recommend nearly the range of identifiers accepted by Julia, however.
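To make the emoji sequence quoted above concrete, here is a minimal Python sketch (not part of either PR) that splits it into its code points: a base emoji, a zero-width joiner, and a second emoji. It uses only the standard library; `describe` is a hypothetical helper name, and the character names it prints depend on the Unicode data bundled with your Python build.

```python
import unicodedata

# The sequence quoted above: U+1F344, ZERO WIDTH JOINER (U+200D), U+1F7EB.
SEQ = "\U0001f344\u200d\U0001f7eb"

def describe(s):
    """Return (hex code point, name) pairs for each code point in s.

    Hypothetical helper for illustration; names come from Python's
    bundled Unicode database, with a fallback for unknown code points.
    """
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in s]

for cp, name in describe(SEQ):
    print(cp, name)
```

The point is only that the new emoji is a ZWJ sequence of existing code points rather than a single new code point, so whether it renders as one glyph depends on the font and on grapheme-cluster handling.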
Do you think this, i.e. the JuliaLang PR JuliaLang/julia#51799, could be backported? I reviewed the code here, which is rather small (and that PR is trivial), except for the Ruby generator, which I don't think I need to scrutinize. It seems safe (and preferable) to backport, though I did not look at utf8proc_data.c since it's quite large (and generated?). I don't think your change is a breaking change, but I'm not sure.
Because of, at Wikipedia:
Also:
What I find likely breaking regarding GB18030-2022, and thus I think Unicode 15.1 (but not at the level of utf8proc?):
It's not a breaking change, I think (mainly just adding new characters, and tweaking some grapheme-break rules), but it's a new feature and thus probably not eligible for backport.
Support for Unicode 15.1, which means updating the tables but also adding a new rule to the grapheme-break algorithm to account for the new `Indic_Conjunct_Break` property. Fixes #252.

Currently a work-in-progress. To do:
Update: should be ready now.
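
For readers unfamiliar with the new grapheme-break rule mentioned in the description, here is a rough Python sketch of it as I read UAX #29 for Unicode 15.1 (rule GB9c): no break before a consonant that is preceded by a consonant plus a run of Extend/Linker code points containing at least one Linker. This is not utf8proc's implementation. The `INCB` table is a tiny hand-picked subset that I am assuming matches the 15.1 property data, and `gb9c_no_break` is a hypothetical name.

```python
# Hypothetical, hand-picked Indic_Conjunct_Break (InCB) values for a few
# code points. Assumption: these match Unicode 15.1's derived property data;
# a real implementation would use the full generated tables.
INCB = {
    0x0915: "Consonant",  # DEVANAGARI LETTER KA
    0x0937: "Consonant",  # DEVANAGARI LETTER SSA
    0x094D: "Linker",     # DEVANAGARI SIGN VIRAMA
    0x200D: "Extend",     # ZERO WIDTH JOINER
}

def gb9c_no_break(text, i):
    """Return True if rule GB9c forbids a grapheme break before text[i].

    GB9c: Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
    """
    if i <= 0 or i >= len(text):
        return False
    if INCB.get(ord(text[i])) != "Consonant":
        return False
    seen_linker = False
    j = i - 1
    # Walk left over the run of Extend/Linker code points.
    while j >= 0 and INCB.get(ord(text[j])) in ("Extend", "Linker"):
        if INCB[ord(text[j])] == "Linker":
            seen_linker = True
        j -= 1
    # The run must contain a Linker and be preceded by a Consonant.
    return seen_linker and j >= 0 and INCB.get(ord(text[j])) == "Consonant"

conjunct = "\u0915\u094d\u0937"         # KA + VIRAMA + SSA
print(gb9c_no_break(conjunct, 2))       # no break before SSA
print(gb9c_no_break("\u0915\u0937", 1)) # no Linker, so GB9c does not apply
```

This only models GB9c itself; the absence of a break before the virama comes from the existing Extend rule (GB9), which is not shown here.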