String.StartsWith doesn't work if the string contains "AA" when using Norwegian/Danish cultureinfo in .NET 6 #73999

Closed
runebrg opened this issue Aug 16, 2022 · 6 comments

Comments


runebrg commented Aug 16, 2022

Description

String.StartsWith() sometimes returns the wrong value if the string contains "AA" and the culture is set to Norwegian (nb-NO) or Danish (da-DK).

Even though the double A has a special meaning in Norwegian, I would expect s.StartsWith(s.Substring(0, 2)) to always return true.

Example .net fiddle:
https://dotnetfiddle.net/h4u01x

Reproduction Steps

var s = "BAAC";
var b = s.StartsWith(s.Substring(0, 2), false, CultureInfo.CreateSpecificCulture("nb-NO"));

Expected behavior

b should be true

Actual behavior

b is false

Regression?

Using .NET Framework 4.7.2, it works as expected for Norwegian, returning true. For Danish, however, it still returns false.

Example:
https://dotnetfiddle.net/FCwqGH

Known Workarounds

Specifying InvariantCulture fixes the problem.
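
The workaround can be sketched roughly as follows. This is only an illustration, not code from the linked fiddles: `InvariantPrefixDemo` and `StartsWithInvariant` are hypothetical names, and the nb-NO line assumes Norwegian culture data is available on the machine. No output is asserted for the Norwegian call, since that result is exactly what this issue is about.

```csharp
using System;
using System.Globalization;

public static class InvariantPrefixDemo
{
    // Hypothetical helper: prefix check using the invariant culture's rules.
    public static bool StartsWithInvariant(string value, string prefix) =>
        value.StartsWith(prefix, ignoreCase: false, culture: CultureInfo.InvariantCulture);

    public static void Main()
    {
        var s = "BAAC";
        var prefix = s.Substring(0, 2); // "BA"

        // Linguistic comparison under Norwegian rules; reported as false on .NET 6 in this issue.
        var norwegian = CultureInfo.CreateSpecificCulture("nb-NO");
        Console.WriteLine(s.StartsWith(prefix, false, norwegian));

        // The workaround: same call, but with the invariant culture.
        Console.WriteLine(StartsWithInvariant(s, prefix));
    }
}
```

Note that the invariant culture still performs a linguistic comparison, just without the Norwegian/Danish tailoring, so it is not the same thing as a byte-for-byte check.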

Configuration

No response

Other information

No response

ghost added the untriaged label Aug 16, 2022

ghost commented Aug 16, 2022

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Author: runebrg
Assignees: -
Labels:

area-System.Globalization

Milestone: -

Member

krwq commented Aug 16, 2022

FWIW I don't know the specifics of Danish/Norwegian culture, but in Polish we also have special phonetic letter combinations (e.g. sz) and I'd be surprised if this:

string test = "Pszczoła";
Console.WriteLine(test.StartsWith(test.Substring(0, 2), false, CultureInfo.CreateSpecificCulture("pl-PL"))); // true

ever returned false (this works as I'd expect for Polish) so it makes sense that this works consistently across other cultures as well.

Member

tarekgh commented Aug 16, 2022

@runebrg, this behavior is defined by the Unicode standard: aa is considered equivalent to Å. You may look at the history for more info, and at the similar issue #72770.

If you disagree with this behavior, you may log a ticket with ICU at unicode-org.atlassian.net/jira/software/c/projects/ICU/issues.

tarekgh closed this as completed Aug 16, 2022
ghost removed the untriaged label Aug 16, 2022
Author

runebrg commented Aug 17, 2022

I agree that aa should be considered equivalent to å in many cases in Norwegian (though not always; this is context dependent). But I still think the .NET framework behaves inconsistently here. Both "aa".StartsWith("a") and "aa".StartsWith("å") are false, but "aa".StartsWith('a') and "aa".Contains("a") are true.

Fiddle:
https://dotnetfiddle.net/RcbfSa

Contributor

svick commented Aug 17, 2022

> But I still think the .NET framework behaves inconsistently here.

It does, but this behavior is documented and I believe it can't be changed, because that would break backwards compatibility.

Member

tarekgh commented Aug 17, 2022

The following operations are performed as ordinal operations, not linguistic ones. You can get the same result from StartsWith with a string argument by passing an explicit comparison, for example "aa".StartsWith("a", StringComparison.Ordinal). This should return true.

		Console.WriteLine("aa".Contains("a")); //True
		Console.WriteLine("aa".StartsWith('a')); //True
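
To make the distinction concrete, here is a minimal sketch that contrasts the culture-sensitive string overload with an explicit ordinal comparison. `OrdinalPrefixDemo` and `AllPrefixesMatchOrdinal` are hypothetical names introduced for illustration, and the first call assumes nb-NO culture data is available. The helper checks the property the original report expected, namely that every leading substring of a string is a prefix of it, which holds under ordinal comparison.

```csharp
using System;
using System.Globalization;

public static class OrdinalPrefixDemo
{
    // Hypothetical helper: true when every leading substring of value
    // is an ordinal prefix of value.
    public static bool AllPrefixesMatchOrdinal(string value)
    {
        for (var length = 0; length <= value.Length; length++)
        {
            if (!value.StartsWith(value.Substring(0, length), StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // Assumption: Norwegian culture data is available on this machine.
        CultureInfo.CurrentCulture = CultureInfo.CreateSpecificCulture("nb-NO");

        // String overload without a comparison argument: uses the current culture.
        // Reported as False in this thread.
        Console.WriteLine("aa".StartsWith("a"));

        // Explicit ordinal comparison: compares UTF-16 code units, ignores culture.
        Console.WriteLine("aa".StartsWith("a", StringComparison.Ordinal)); // True

        Console.WriteLine(AllPrefixesMatchOrdinal("BAAC")); // True
    }
}
```

Passing the comparison explicitly also documents intent at the call site, so a reader does not need to remember which overloads default to the current culture.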

Also, consistency with .NET Framework can be achieved if you enable NLS mode. We don't recommend that, though.
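
For reference, NLS mode is typically switched on through a runtime configuration setting. The fragment below is a sketch of a runtimeconfig.json entry, assuming the app runs on Windows (the setting has no effect elsewhere) and that the file layout matches your project; as noted above, this is not the recommended route.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```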

ghost locked as resolved and limited conversation to collaborators Sep 16, 2022
4 participants