CharacterDirectionOfLocale: Returns "ltr" for scripts without any character direction #90

Open
anba opened this issue Aug 5, 2024 · 3 comments

@anba
Contributor

anba commented Aug 5, 2024

ICU4C's uloc_isRightToLeft doesn't seem to provide this information, but at least ICU4X's LocaleDirectionality has three possible return values:

  • Direction::LeftToRight
  • Direction::RightToLeft
  • None for scripts like Zzzz or Zyyy.

See also https://docs.rs/icu/latest/icu/locid_transform/struct.LocaleDirectionality.html#method.get.
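To make the three-valued shape concrete, here is a minimal sketch of a lookup that keeps "unknown" distinct from "ltr". It is not the ICU4X or ECMA-402 API: `Direction`, `SCRIPT_DIRECTION`, and `directionOfScript` are hypothetical names, and the table is a tiny hand-written sample rather than CLDR data.

```typescript
// Hypothetical three-valued model; not the ICU4X or ECMA-402 API.
type Direction = "ltr" | "rtl";

// Illustrative sample keyed by ISO 15924 script code (assumed data, not CLDR).
const SCRIPT_DIRECTION = new Map<string, Direction>([
  ["Latn", "ltr"],
  ["Cyrl", "ltr"],
  ["Arab", "rtl"],
  ["Hebr", "rtl"],
]);

// Returns undefined when no direction is known (e.g. Zzzz or Zyyy),
// instead of silently reporting "ltr".
function directionOfScript(script: string): Direction | undefined {
  return SCRIPT_DIRECTION.get(script);
}
```

With this shape, a caller can tell "known to be LTR" apart from "no data", which a two-valued "ltr"/"rtl" result cannot express.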

I'm not sure if there's an easy way to detect the unknown-direction case in ICU4C, so I don't know whether there's much appetite for supporting it in the spec. But I guess we could at least document this case more clearly by adding an <emu-note> to CharacterDirectionOfLocale stating that "ltr" is returned even when no direction is known.

@sffc
Contributor

sffc commented Feb 7, 2025

@lianghai Do you have thoughts on what the correct behavior should be here?

Does it make sense for all non-RTL scripts to return LTR, either if they don't have data or if they use some other script direction?

@lianghai

lianghai commented Feb 7, 2025

I suppose this is about https://tc39.es/proposal-intl-locale-info/#sec-character-direction-of-locale?

LTR is certainly the architectural default in practically all layout systems, but it’s weird for such a metadata API to use LTR to actually represent its internal “unknown”. ICU4X’s LeftToRight | RightToLeft | None design makes more sense. Such a trivial fallback from “unknown” to LTR (if even needed) should be handled explicitly at the call site instead of being resolved internally, where users lose the “unknown” information.
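A rough sketch of what "fallback at the call site" could look like, assuming a metadata function that preserves "unknown". `knownDirection` and `layoutDirection` are hypothetical names used only for illustration, and the script cases are a hand-picked sample.

```typescript
type Direction = "ltr" | "rtl";

// Hypothetical stand-in for a metadata API that preserves "unknown".
function knownDirection(script: string): Direction | undefined {
  switch (script) {
    case "Latn":
      return "ltr";
    case "Arab":
    case "Hebr":
      return "rtl";
    default:
      return undefined; // no direction known, e.g. Zzzz or Zyyy
  }
}

// The caller picks the fallback explicitly; the metadata layer never does.
function layoutDirection(script: string, fallback: Direction = "ltr"): Direction {
  return knownDirection(script) ?? fallback;
}
```

The default to LTR is then a visible choice made by the layout code, and a caller that cares about the "unknown" case can still call `knownDirection` directly.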

> … or if they use some other script direction?

You mean other directions like top-to-bottom? Those are either analyzed as LTR/RTL, or you need to expand the enum to actually include those cases – they shouldn’t be treated as “unknown” or left to blindly fall back to LTR anyway.

Note that some natively vertical scripts are LTR (e.g., Mongolian) in the compromise horizontal layout, while others become RTL (e.g., Old Uyghur). So it certainly doesn’t make sense to fold those all into LTR. And scripts like CJK have multiple native directions.

Because of the complexity of properly modeling script directions (CSS’s model is a good reference), it’s reasonable to limit the architecture to LTR/RTL/unknown.

@sffc
Contributor

sffc commented Mar 3, 2025
