[JVM] Several opcodes for string (e.g. chars, substr*) work on Java's chars (UTF-16) instead of graphemes #783

Open
usev6 opened this issue Oct 30, 2022 · 0 comments

For the JVM backend, various Unicode-related tests (e.g. in https://github.com/Raku/roast/) fail because some string opcodes operate on Java's chars (UTF-16 code units) rather than on graphemes.

Examples:

$ ./rakudo-m -e 'my Str $u = "\x[0043,0323]"; say "$u -- chars: " ~ $u.chars'
C̣ -- chars: 1
$ ./rakudo-j -e 'my Str $u = "\x[0043,0323]"; say "$u -- chars: " ~ $u.chars'
C̣ -- chars: 2
$ ./rakudo-m -e 'my $str = join "", 0x10426.chr, 0x10427.chr; say $str.chars; say substr($str, 0, 1).uniname; say substr($str, 1, 1).uniname'
2
DESERET CAPITAL LETTER OI
DESERET CAPITAL LETTER EW
$ ./rakudo-j -e 'my $str = join "", 0x10426.chr, 0x10427.chr; say $str.chars; say substr($str, 0, 1).uniname; say substr($str, 1, 1).uniname'
4
<surrogate-D801>
<surrogate-DC26>

The problem is even mentioned in Rakudo's documentation for the chars routine:

Please note that on the JVM, you currently get codepoints instead of graphemes.

I'm not sure whether this can be solved without fully supporting NFG (#241).
At the very least, I want to use this issue as a reference for fudged tests.
