Add Intl.Collator implementation using ICU4C #1413
Classes without ICU / Platform API dependency:

* Constants - String constants for the options bag’s names and values, valid option values, and other constants.
* IntlUtils - Conversion between UTF-8 and UTF-16 ASCII strings, conversion of a string to bool (i.e. a case-insensitive match against “true”), and lowercasing of ASCII strings.
* LocaleBCP47Object - Wrapper over a BCP47Parser::ParsedLocaleIdentifier. Provides a factory method to create instances from a BCP47 locale string, methods to get the canonicalized locale string and the locale string without extensions, and implements the CanonicalizeLocaleList() spec operation.
* OptionHelpers - Gets string, bool, and number options from the options bag.

ICU dependent classes:

* Collator - Intl.Collator implementation using the ICU collator API.
* Locale - Conversion between BCP47 and ICU locale strings.
* LocaleResolver - Implements the ResolveLocale() and SupportedLocales() spec operations. Supports both “lookup” and “best fit” locale matching; “best fit” locale matching uses the ICU acceptLanguage API.

Fix memory leaks in the existing DateTimeFormat implementation in PlatformIntlICU.cpp by wrapping created ICU object pointers in unique_ptr.

Migrate the ResolveLocale, SupportedLocalesOf, and GetCanonicalLocales API implementations to use the corresponding newly added supporting classes.

Add test cases to collator.js to verify that invalid locales and options input result in a thrown JS exception. Add test cases to verify resolution of locale extensions and options in a new file, collator-resolved-options.js.

[Testing]
Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu 20.04; all pass except for a few test cases in date-time-format-apple.js, which fail due to locale data differences. The output of the date-time-format-apple.js test is the same as before this change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind on the above test runs; no memory leaks detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
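For readers unfamiliar with “lookup” locale matching, here is a minimal sketch of the fallback step it relies on, modelled on the BestAvailableLocale operation in ECMA-402. The function name, signature, and use of UTF-8 strings are assumptions made for illustration; this is not the LocaleResolver code from this PR. It assumes extensions have already been removed from the requested locale, as the spec does before calling this step.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not the PR's API): returns the longest prefix of
// `locale`, truncated at subtag boundaries, that appears in `available`.
inline std::optional<std::string> bestAvailableLocale(
    const std::vector<std::string> &available,
    std::string_view locale) {
  std::string_view candidate = locale;
  while (true) {
    const bool found = std::any_of(
        available.begin(), available.end(),
        [candidate](const std::string &a) {
          return std::string_view(a) == candidate;
        });
    if (found)
      return std::string(candidate);

    // Drop the last subtag.
    std::size_t pos = candidate.rfind('-');
    if (pos == std::string_view::npos)
      return std::nullopt;
    // If that leaves a dangling single-character subtag (e.g. "-u"),
    // drop it as well.
    if (pos >= 2 && candidate[pos - 2] == '-')
      pos -= 2;
    candidate = candidate.substr(0, pos);
  }
}
```

With an available list of "en", "fr", and "zh-Hant", a request for "zh-Hant-TW" falls back to "zh-Hant", and a request for "de-DE" finds no match.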
Hi @robchu05! Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with […].

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for putting this together! I haven't gone into the details yet, but wanted to first share some high level organisational comments. In particular, I'm wary of cluttering the […]. My preference in this instance would be to flatten more of the internals into […]. Where you feel that doesn't make sense, the extra files should all be in icu_impl, so it is clear that they exist only to support the ICU implementation.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Hi @neildhar, Thanks for the code organization feedback. I agree with the comments on cluttering […]. My intention is to place the files with no ICU dependency in […]. Since we aren't yet consolidating implementation across platforms, the code organization in this revision does look wonky. Rather than keeping this split for now, how about I move all the new implementation files into […]? Regarding flattening more of the internals into […]
@robchu05 Sure, if you feel it makes it easier to keep things split, that's fine too. My comment was primarily about keeping the ICU implementation internals together.
Change the type of string constants in Constants.h from std::u16string to char16_t[] to avoid global constructors and unnecessary dynamic allocations. Also, change structs to namespaces and specify inline constexpr so they are compile-time constants and can be shared across multiple .cpp files.

Move the newly added files for the ICU-based implementation from lib/Platform/Intl into lib/Platform/Intl/impl_icu/.

Review (const reference) std::(u16)string usages in function parameters and change them to std::(u16)string_view where the implementation does not need a std::(u16)string object or a null-terminated C-style string, to avoid constructing unnecessary temporary std::(u16)string objects.

[Testing]
Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu 20.04; all pass except for a few test cases in date-time-format-apple.js, which fail due to locale data differences. The output of the date-time-format-apple.js test is the same as before this change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind on the above test runs; no memory leaks detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
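To illustrate the constants change described in this commit, here is a rough sketch of inline constexpr char16_t[] constants grouped in namespaces and consumed through std::u16string_view. It requires C++17. The namespace names, constant names, and the isValidUsage helper are hypothetical; the actual layout of Constants.h in this PR may differ.

```cpp
#include <string>
#include <string_view>

// Hypothetical layout: namespaces instead of structs, and inline constexpr
// arrays instead of std::u16string globals. No global constructor runs and
// no heap allocation happens, and the header can be included from several
// .cpp files without duplicate-definition errors.
namespace constants {
namespace opt_name {
inline constexpr char16_t usage[] = u"usage";
inline constexpr char16_t sensitivity[] = u"sensitivity";
} // namespace opt_name
namespace opt_value {
inline constexpr char16_t sort[] = u"sort";
inline constexpr char16_t search[] = u"search";
} // namespace opt_value
} // namespace constants

// Taking a view lets callers pass a literal, one of the arrays above, or a
// std::u16string without constructing a temporary std::u16string.
constexpr bool isValidUsage(std::u16string_view value) {
  return value == std::u16string_view(constants::opt_value::sort) ||
      value == std::u16string_view(constants::opt_value::search);
}

// The check can run entirely at compile time.
static_assert(isValidUsage(constants::opt_value::sort));
```

One trade-off of std::u16string_view parameters is that the view is not guaranteed to be null-terminated, which is why the commit keeps std::(u16)string where a C-style string is needed.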
I apologise for the delay in getting to this. I've done a pass over, and defer to you on the intricacies of interfacing with ICU, so I mostly have comments on potential inefficiencies.
Hi @neildhar, Thanks for the review and feedback. I aim to address it and post a new revision by the end of next week.
Change the cache of available locales in LocaleResolver from a static class member to a function-local static. Turn classes with only static methods into namespaced functions. Use arrays for small constant sets and arrays of pairs for small constant maps.

[Testing]
Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu 20.04; all pass except for a few test cases in date-time-format-apple.js, which fail due to locale data differences. The output of the date-time-format-apple.js test is the same as before this change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind on the above test runs; no memory leaks detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
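As a rough illustration of two patterns named in this commit, the sketch below shows a function-local static used as a lazily initialized cache, and a small constant map written as an array of pairs with a linear search. It requires C++17. All names here (loadAvailableLocales, availableLocales, kSensitivityMap, parseSensitivity, loadCount) are hypothetical and are not taken from the PR; the locale list is a made-up stand-in for whatever the underlying library reports.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Counts how often the expensive load runs (for illustration only).
inline int loadCount = 0;

// Hypothetical stand-in for an expensive query, e.g. enumerating the
// locales that the underlying library supports.
inline std::vector<std::string> loadAvailableLocales() {
  ++loadCount;
  return {"en", "en-US", "fr"};
}

// Function-local static: initialized on the first call only, and the
// initialization is thread-safe since C++11. Unlike a static class member,
// nothing is constructed if the function is never called.
inline const std::vector<std::string> &availableLocales() {
  static const std::vector<std::string> cache = loadAvailableLocales();
  return cache;
}

enum class Sensitivity { Base, Accent, Case, Variant };

// A small constant map as an array of pairs: no heap allocation and no
// global constructor. A linear scan is fine for a handful of entries.
inline constexpr std::pair<std::string_view, Sensitivity> kSensitivityMap[] = {
    {"base", Sensitivity::Base},
    {"accent", Sensitivity::Accent},
    {"case", Sensitivity::Case},
    {"variant", Sensitivity::Variant},
};

inline std::optional<Sensitivity> parseSensitivity(std::string_view name) {
  for (const auto &[key, value] : kSensitivityMap) {
    if (key == name)
      return value;
  }
  return std::nullopt;
}
```

Calling availableLocales() repeatedly runs the loader once, and parseSensitivity returns an empty optional for any string outside the four listed values.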
Hi @neildhar, I updated the PR to address your feedback. Please have a look 😀

I would also like to get your input on the tests. The CircleCI failure in the test-macos-test262 job comes from the additional test cases that I added in test/hermes/intl/. I can work on fixing PlatformIntlApple for a few of those test cases. For the other cases, which are not straightforward to fix on Apple and are not critical, we can consider splitting the Hermes intl tests so that one set runs on mac and one set runs on linux.

Another thing to consider is adding a test-linux-test262 job, similar to test-macos-test262, on CircleCI. I think that would involve adding a separate intl skip list for the test262 test suite, because intl coverage on linux with icu4c does not yet match the Apple platform, and some intl test cases in the current skip list (for Apple) pass on linux and so do not need to be skipped there.

Appreciate your thoughts on the above! Thanks!
Hey @robchu05 thanks for updating the PR, I haven't forgotten about this, I've been travelling for the last few weeks and will take a look once I'm back next week.
Thank you for letting me know, @neildhar. Hope you are having pleasant and safe travels.
Summary
Add Intl.Collator implementation using ICU4C for non-Android / non-Apple platforms.
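Classes without ICU / Platform API dependency:

* Constants - String constants for the options bag’s names and values, valid option values, and other constants.
* IntlUtils - Conversion between UTF-8 and UTF-16 ASCII strings, conversion of a string to bool (i.e. a case-insensitive match against “true”), and lowercasing of ASCII strings.
* LocaleBCP47Object - Wrapper over a BCP47Parser::ParsedLocaleIdentifier. Provides a factory method to create instances from a BCP47 locale string, methods to get the canonicalized locale string and the locale string without extensions, and implements the CanonicalizeLocaleList() spec operation.
* OptionHelpers - Gets string, bool, and number options from the options bag.

ICU dependent classes:

* Collator - Intl.Collator implementation using the ICU collator API.
* Locale - Conversion between BCP47 and ICU locale strings.
* LocaleResolver - Implements the ResolveLocale() and SupportedLocales() spec operations. Supports both “lookup” and “best fit” locale matching; “best fit” locale matching uses the ICU acceptLanguage API.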
Fix memory leaks in existing DateTimeFormat implementation in PlatformIntlICU.cpp by wrapping created ICU object pointers in unique_ptr.
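The leak fix follows the usual pattern of giving a C-style handle to a std::unique_ptr with a custom deleter, so that every exit path releases it. Below is a minimal, self-contained sketch of that pattern. FakeHandle, fake_open, and fake_close are hypothetical stand-ins for a C API with an open/close pair (in ICU4C, udat_open/udat_close and ucol_open/ucol_close are such pairs); this is not the code from PlatformIntlICU.cpp.

```cpp
#include <memory>

// Hypothetical stand-in for a C library handle with an open/close pair.
struct FakeHandle {
  int value;
};

// Tracks handles that have been opened but not yet closed.
inline int liveHandles = 0;

inline FakeHandle *fake_open(int value) {
  ++liveHandles;
  return new FakeHandle{value};
}

inline void fake_close(FakeHandle *handle) {
  --liveHandles;
  delete handle;
}

// Stateless deleter: the unique_ptr stays the size of a raw pointer.
struct FakeCloser {
  void operator()(FakeHandle *handle) const {
    fake_close(handle);
  }
};
using FakeHandlePtr = std::unique_ptr<FakeHandle, FakeCloser>;

// The handle is closed on every path out of the function, including the
// early return that would leak with a raw pointer and a manual close call.
inline int useHandle(bool failEarly) {
  FakeHandlePtr handle(fake_open(42));
  if (failEarly)
    return -1;
  return handle->value;
}
```

After either call path of useHandle, liveHandles is back to zero, which is the property valgrind checks for in the real implementation.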
Migrate ResolveLocale, SupportedLocalesOf, GetCanonicalLocales API implementation to use the corresponding newly added supporting classes.
Add test cases to collator.js to verify that invalid locales and options input result in a thrown JS exception. Add test cases to verify resolution of locale extensions and options in a new file, collator-resolved-options.js.
Test Plan
Build succeeds on Ubuntu 20.04. Run the JS tests in ./hermes/test/hermes/intl/ (intl.js, get-canonical-locales.js, collator.js, collator-resolved-options.js, and date-time-format-apple.js) on Ubuntu 20.04; all pass except for a few test cases in date-time-format-apple.js, which fail due to locale data differences. The output of the date-time-format-apple.js test is the same as before this change.
Run valgrind on the above test runs; no memory leaks detected.
Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.