Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Intl.Collator implementation using ICU4C #1413

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

robchu05
Copy link

Summary

Add Intl.Collator implementation using ICU4C for non-Android / non-Apple platforms.

Classes without ICU / Platform API dependency:

  • Constants - String constants for options bag’s names and values, valid option values, other constants.
  • IntlUtils - Conversion between UTF-8 and UTF-16 ASCII strings, convert string to bool, i.e. case- insensitive match against “true”, lowercase ASCII strings.
  • LocaleBCP47Object - Wrapper over a BCP47Parser::ParsedLocaleIdentifier, provides factory method to create instances from a BCP47 locale string, methods to get canonicalized locale string and locale string without extensions, and implements the CanonicalizeLocaleList() spec operation.
  • OptionHelpers - Gets string, bool, and number option from options bag.

ICU dependent classes:

  • Collator - Intl.Collator implementation using ICU collator API.
  • Locale - Conversion between BCP47 and ICU locale strings.
  • LocaleResolver - Implements ResolveLocale() and SupportedLocales() spec operations, supports both “lookup” and “best fit” locale matching. “best fit” locale matching uses ICU acceptLanguage API.

Fix memory leaks in existing DateTimeFormat implementation in PlatformIntlICU.cpp by wrapping created ICU object pointers in unique_ptr.

Migrate ResolveLocale, SupportedLocalesOf, GetCanonicalLocales API implementation to use the corresponding newly added supporting classes.

Add test cases to verify invalid locales and options input would result in throwing JS exception in collation.js. Add tests cases to verify resolution of locale extensions and options in a new file collation-resolved-options.js.

Test Plan

Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu 20.04, and all pass except for a few test cases in date-time-format-apple.js due to locale data differences. The output of running date-time-format-apple.js test is the same as before this change.

$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js

Run valgrind with above test runs and no memory leak detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       61 |
| Pass                 |       45 |
| Fail                 |        0 |
| Skipped              |       16 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       10 |
| Pass                 |        9 |
| Fail                 |        0 |
| Skipped              |        1 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |      176 |
| Pass                 |       50 |
| Fail                 |        0 |
| Skipped              |      125 |
| Permanently Skipped  |        1 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       12 |
| Pass                 |       10 |
| Fail                 |        0 |
| Skipped              |        2 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       66 |
| Pass                 |       20 |
| Fail                 |        0 |
| Skipped              |       19 |
| Permanently Skipped  |       27 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

Classes without ICU / Platform API dependency:
* Constants - String constants for options bag’s names and values,
valid option values, other constants.
* IntlUtils - Conversion between UTF-8 and UTF-16 ASCII strings,
convert string to bool, i.e. case- insensitive match against
“true”, lowercase ASCII strings.
* LocaleBCP47Object - Wrapper over a
BCP47Parser::ParsedLocaleIdentifier, provides factory method to
create instances from a BCP47 locale string, methods to get
canonicalized locale string and locale string without extensions,
and implements the CanonicalizeLocaleList() spec operation.
* OptionHelpers - Gets string, bool, and number option from options
bag.

ICU dependent classes:
* Collator - Intl.Collator implementation using ICU collator API.
* Locale - Conversion between BCP47 and ICU locale strings.
* LocaleResolver - Implements ResolveLocale() and SupportedLocales()
spec operations, supports both “lookup” and “best fit” locale matching.
“best fit” locale matching uses ICU acceptLanguage API.

Fix memory leaks in existing DateTimeFormat implementation in
PlatformIntlICU.cpp by wrapping created ICU object pointers in
unique_ptr.

Migrate ResolveLocale, SupportedLocalesOf, GetCanonicalLocales
API implementation to use the corresponding newly added supporting
classes.

Add test cases to verify invalid locales and options input would result
in throwing JS exception in collation.js. Add tests cases to verify
resolution of locale extensions and options in a new file
collation-resolved-options.js.

[Testing]

Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu
20.04, and all pass except for a few test cases in
date-time-format-apple.js due to locale data differences. The output
of running date-time-format-apple.js test is the same as before this
change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind with above test runs and no memory leak detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat,
Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
@facebook-github-bot
Copy link
Contributor

Hi @robchu05!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

lib/Platform/Intl/Constants.h Outdated Show resolved Hide resolved
@neildhar
Copy link
Contributor

Thank you for putting this together!

I haven't gone into the details yet, but wanted to first share some high level organisational comments. In particular, I'm wary of cluttering the Platform/Intl directory with too many files that are specific to this ICU implementation. It looks like there are also dependencies going both ways between the Platorm/Intl directory and Platform/Intl/icu_impl directory, so it's difficult to reason about how things are organised.

My preference in this instance would be to flatten more of the internals into PlatformIntlICU.cpp (in the same way that PlatformIntlApple.mm is self-contained), which will reduce the need for things like Constants.h. Note that we have many files in Hermes where classes that provide internal functionality live only in the relevant cpp file, so there is no hard rule requiring them to be in separate files.

Where you feel that doesn't make sense, the extra files should all be in icu_impl, so it is clear that they exist only to support the ICU implementation.

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Jun 13, 2024
@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@robchu05
Copy link
Author

robchu05 commented Jul 8, 2024

Hi @neildhar,

Thanks for the code organization feedback. I agree with the comments on cluttering Platform/Intl and dependencies going both ways between Platform/Intl and Platform/Intl/impl_icu at this point in time.

My intention is to place the files with no ICU dependency in Platform/Intl so that they can be used in / shared with the Apple implementation in the future, not just for the ICU implementation. Also, in the future, I would like to refactor much more code blocks that aren't dependent on ICU out from Platform/Intl/impl_icu to Platform/Intl to share with the Apple implementation. This may not be the case if it is decided to change the Apple implementation to route the Intl calls to JSC per #1211.

Since we aren't yet consolidating implementation across platforms, the code organization in this revision does look wonky. Rather than having this code organization split currently, how about I move all the new implementation files into Platform/Intl/impl_icu for the time being, until we are ready to start on the implementation consolidation work?

Regarding flatten more of the internals into PlatformIntlICU.cpp, I am concerned that the file will grow to be tens of thousands of lines as more and eventually complete coverage of Intl APIs are implemented, which will become hard to manage and continue development. Thus, I would prefer organizing into separate files.

@neildhar
Copy link
Contributor

@robchu05 Sure, if you feel it makes it easier to keep things split, that's fine too. My comment was primarily about keeping the ICU implementation internals together.

Change type of string constants in Constants.h to char16_t[] from
std::u16string to avoid global constructors and unnecessary dynamic
allocations. Also, change structs to namespaces and specify inline
constexpr so they are compile-time constants and can be shared
across multiple .cpp files.

Move those newly added files for ICU-based implementation in
lib/Platform/Intl into lib/Platform/Intl/impl_icu/.

Review (const reference) std::(u16)string usages in function parameters
and change them to std::(u16)string_view if implementation does not need
a std::(u16)string object or a null-terminated C-style string, to avoid
unnecessary temporary std::(u16)string object construction.

[Testing]

Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu
20.04, and all pass except for a few test cases in
date-time-format-apple.js due to locale data differences. The output
of running date-time-format-apple.js test is the same as before this
change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind with above test runs and no memory leak detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat,
Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
@robchu05 robchu05 requested review from tmikov and neildhar July 18, 2024 21:37
@robchu05
Copy link
Author

robchu05 commented Aug 1, 2024

@tmikov, @neildhar, friendly reminder to review this PR. Appreciate your feedback and help in guiding me through.

Copy link
Contributor

@neildhar neildhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologise for the delay in getting to this. I've done a pass over, and defer to you on the intricacies of interfacing with ICU, so I mostly have comments on potential inefficiencies.

lib/Platform/Intl/impl_icu/LocaleResolver.cpp Outdated Show resolved Hide resolved
lib/Platform/Intl/impl_icu/LocaleResolver.h Outdated Show resolved Hide resolved
lib/Platform/Intl/impl_icu/Collator.cpp Outdated Show resolved Hide resolved
lib/Platform/Intl/impl_icu/Collator.cpp Outdated Show resolved Hide resolved
lib/Platform/Intl/impl_icu/Collator.cpp Outdated Show resolved Hide resolved
lib/Platform/Intl/PlatformIntlICU.cpp Outdated Show resolved Hide resolved
@robchu05
Copy link
Author

Hi @neildhar,

Thanks for the review and feedback. I target to address them and post a new revision by end of next week.

Change the cache of available locales to a function local static from
a static class member in LocaleResolver.

Turn classes with only static methods to namespaced functions.

Use arrays for small constant sets and arrays of pairs for small
constant maps.

[Testing]

Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu
20.04, and all pass except for a few test cases in
date-time-format-apple.js due to locale data differences. The output
of running date-time-format-apple.js test is the same as before this
change.

```
$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js
```

Run valgrind with above test runs and no memory leak detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat,
Date, and Intl. All non-skipped test cases pass.

```
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
```
@robchu05
Copy link
Author

Hi @neildhar,

I updated the PR to address your feedback. Please have a look 😀

I would also like to get your inputs on the tests.

The CircleCI test failure on test-macos-test262 job is on the additional test cases that I added in test/hermes/intl/. I can work on fixing PlatformIntlApple for a few test cases. For other cases that are not straightforward to fix on Apple and are not critical, we can consider splitting the Hermes intl tests so that one set is run for mac and one set is run for linux.

Another thing to consider is to add a test-linux-test262 job similar to test-macos-test262 on CircleCI. I think that would involve adding a different set of intl test skip list for the test262 test suite because the intl coverage on linux with icu4c is not yet matching Apple platform and some intl test cases in current skip list (for Apple) pass, thus do not need to be skipped, on linux.

Appreciate your thoughts on the above! Thanks!

@robchu05 robchu05 requested a review from neildhar August 31, 2024 00:25
@neildhar
Copy link
Contributor

Hey @robchu05 thanks for updating the PR, I haven't forgotten about this, I've been travelling for the last few weeks and will take a look once I'm back next week.

@robchu05
Copy link
Author

Thank you for letting me know, @neildhar. Hope you are having a pleasant and safe travel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants