YOCKOW
Member

@YOCKOW YOCKOW commented May 9, 2025

Implementation of the proposal: #1243 (SF-0033)

@YOCKOW YOCKOW added the API Change label (Any changes to Foundation's public API surface) May 13, 2025
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch from dd0b16c to 1ae2a4c June 17, 2025 01:25
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch from 1ae2a4c to 4d87ed8 September 21, 2025 02:59
@YOCKOW YOCKOW changed the title from "Implement String.Encoding.ianaName and String.Encoding(ianaName:)." to "SF-0033: Implement String.Encoding.ianaName and String.Encoding(ianaName:)." Sep 21, 2025
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch 4 times, most recently from 217c46d to 9cd5c9a October 12, 2025 08:16
@YOCKOW YOCKOW marked this pull request as ready for review October 14, 2025 05:27
@YOCKOW YOCKOW marked this pull request as draft October 14, 2025 07:51
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch from 9cd5c9a to 3e9aef4 October 14, 2025 08:12
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch from 3e9aef4 to ac42127 October 16, 2025 06:04
@YOCKOW
Member Author

YOCKOW commented Oct 16, 2025

@swift-ci Please test

@YOCKOW YOCKOW marked this pull request as ready for review October 16, 2025 06:55
@YOCKOW
Member Author

YOCKOW commented Oct 16, 2025

@itingliu

I've marked this PR as ready for review.
I'd be grateful if you could take a look. Thank you.

cc: Foundation Workgroup
@adam-fowler @iCharlesHu @Lukasa @designatednerd @jmschonfeld @lorentey @stephentyrone @itingliu @tomerd @parkera

@YOCKOW YOCKOW requested a review from itingliu October 16, 2025 06:55
##===----------------------------------------------------------------------===##

"""
This is a python script that converts an XML file containing the list of IANA
Contributor

I have to ask, since we're a Swift project 😄
Is there a reason why this needs to be a Python script? Can't we write this in Swift?

Member Author

That's only because there are quite a few Python scripts in swiftlang/swift/utils.
Of course, we can rewrite it in Swift; after all, we should love Swift more than Python. 😁

}

/// A type to tokenize string for `String.Encoding` names.
internal protocol StringEncodingNameTokenizer: ~Copyable {
Contributor

Are you planning to introduce other conformers to this protocol? I wonder whether dropping the protocol and using the concrete ASCIICaseInsensitiveTokenizer directly would give us performance gains.

Member Author

I had originally planned to support other conformers, such as one for UTS #22's Charset Alias Matching, but that code has already been removed.
I should have removed (or refactored) the leftover code more aggressively.


// MARK: - Private extensions for parsing encoding names

private extension Unicode.Scalar {
Contributor

This is a general comment for the functions in this file:

Can we store and run the string processing logic on UInt8 instead of Unicode.Scalar (and on UTF8View instead of UnicodeScalarView), since we expect the names to be ASCII?

We do so in quite a few places, such as the parsing utilities in this file. IIRC, it did give us better performance than resorting to UnicodeScalarView.

Member Author

I now agree that UInt8 is more efficient than Unicode.Scalar here.
I'll rework the functions to use UInt8.
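
For illustration, a minimal sketch of what byte-level processing over UTF8View could look like. The helper names (`asciiLowercased`, `isASCIIOnly`, `asciiLowercasedBytes`) are hypothetical and are not taken from this PR:

```swift
// Illustrative sketch only; these names are hypothetical, not the PR's code.
extension UInt8 {
    /// Maps ASCII "A"..."Z" to "a"..."z" and returns any other byte unchanged.
    var asciiLowercased: UInt8 {
        (self >= 0x41 && self <= 0x5A) ? self | 0x20 : self
    }
}

/// Returns true if every UTF-8 code unit of `name` is ASCII (< 0x80).
func isASCIIOnly(_ name: String) -> Bool {
    name.utf8.allSatisfy { $0 < 0x80 }
}

/// Folds `name` to lowercase byte by byte over its UTF8View.
func asciiLowercasedBytes(_ name: String) -> [UInt8] {
    name.utf8.map { $0.asciiLowercased }
}
```

Because encoding names are expected to be ASCII, each UTF-8 code unit corresponds to exactly one character, so no Unicode.Scalar decoding is needed in this sketch.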

_ string: String,
tokenizedBy tokenizer: T.Type
) -> Bool where T: StringEncodingNameTokenizer, T: ~Copyable {
if let preferredMIMEName = self.preferredMIMEName,
Contributor

I'm not sure I understand why we need a tokenizer. This function looks like it is effectively a case-insensitive comparison of two strings. For ASCII strings, that is also effectively the same as calling lowercased() on both and comparing the results. Is my understanding correct? Did you add a tokenizer to avoid the extra allocation from calling lowercased()? Is there anything else that I'm missing?

Member Author

There are (or were) two reasons that came to mind:

  1. To follow the "Once and Only Once" principle. This no longer applies to the current implementation because the code for UTS #22 has been removed.
  2. To avoid the extra allocation from calling lowercased(), as you pointed out. I imagined that user input might be something like "VeryVeryLongLongLongInput...".
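
As a rough sketch of the second point, an ASCII case-insensitive comparison can be written without building lowercased copies of either string. The function name `equalsIgnoringASCIICase` is hypothetical and is not the PR's tokenizer:

```swift
// Illustrative sketch only; the function name is hypothetical, not the PR's code.

/// Compares two strings, ignoring ASCII case, without building lowercased copies.
func equalsIgnoringASCIICase(_ lhs: String, _ rhs: String) -> Bool {
    // Fold only ASCII "A"..."Z"; all other bytes must match exactly.
    func fold(_ byte: UInt8) -> UInt8 {
        (byte >= 0x41 && byte <= 0x5A) ? byte | 0x20 : byte
    }
    // elementsEqual walks both UTF8Views in lockstep and stops at the first mismatch.
    return lhs.utf8.elementsEqual(rhs.utf8) { fold($0) == fold($1) }
}
```

For ASCII input this gives the same answer as `lhs.lowercased() == rhs.lowercased()`. The two differ for non-ASCII input, because `lowercased()` applies Unicode case mapping while this sketch folds only "A"..."Z".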

@YOCKOW
Member Author

YOCKOW commented Oct 19, 2025

@itingliu
Thank you for reviewing. I'm going to revise the implementation to reflect your feedback.

YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Oct 19, 2025
@YOCKOW YOCKOW force-pushed the StringEncodingNames/implementation branch from 0045a48 to e674fa6 October 19, 2025 08:55