SF-0033: Implement String.Encoding.ianaName and String.Encoding(ianaName:) #1286
Conversation
@swift-ci Please test
I've made this PR ready for review. cc: Foundation Workgroup
##===----------------------------------------------------------------------===##

"""
This is a python script that converts an XML file containing the list of IANA
I must ask here since we're a Swift project 😄
Is there a reason why this needs to be a python script? Can't we write this in Swift?
That's just because there are already quite a few Python scripts in swiftlang/swift/utils.
Of course, we can rewrite it in Swift, since we must love Swift more than Python. 😁
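For what it's worth, here is a minimal sketch of what a Swift version of such a generator could look like. It is only an illustration, not the script in this PR; the element names ("record", "name", "alias") are assumptions based on the IANA character-sets registry XML at https://www.iana.org/assignments/character-sets/character-sets.xml, and the file path is hypothetical.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in FoundationXML on non-Darwin platforms
#endif

/// Collects (name, aliases) pairs from the IANA character-sets registry XML.
final class CharsetRecordCollector: NSObject, XMLParserDelegate {
    private(set) var records: [(name: String, aliases: [String])] = []
    private var name = ""
    private var aliases: [String] = []
    private var text = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        text = ""
        if elementName == "record" { name = ""; aliases = [] }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        let value = text.trimmingCharacters(in: .whitespacesAndNewlines)
        switch elementName {
        case "name":   name = value
        case "alias":  aliases.append(value)
        case "record": records.append((name, aliases))
        default:       break
        }
    }
}

// Read the registry file (path is hypothetical) and print one line per record.
// A real generator would emit Swift source for the lookup tables instead.
let data = try Data(contentsOf: URL(fileURLWithPath: "character-sets.xml"))
let parser = XMLParser(data: data)
let collector = CharsetRecordCollector()
parser.delegate = collector
guard parser.parse() else { fatalError("failed to parse registry XML") }

for record in collector.records where !record.name.isEmpty {
    print("\(record.name): \(record.aliases.joined(separator: ", "))")
}
```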
}

/// A type to tokenize string for `String.Encoding` names.
internal protocol StringEncodingNameTokenizer: ~Copyable {
Are you planning to introduce other conformers to this protocol? I wonder whether dropping the protocol and using the concrete ASCIICaseInsensitiveTokenizer directly would give us performance gains.
I was indeed planning to support other conformers, such as one for UTS #22's Charset Alias Matching, which has already been removed. I should've removed (refactored) the code more aggressively.
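To make the trade-off concrete, here is a minimal sketch of the kind of abstraction being discussed: one shared comparison loop parameterized by a matching rule, with a plain ASCII case-insensitive rule and a UTS #22-style rule ("ignore non-alphanumeric characters") as the two conformers. All names below (EncodingNameMatcher, namesMatch, and the two enums) are hypothetical, not the PR's actual types.

```swift
/// Hypothetical simplification of the kind of protocol under discussion:
/// a matching rule decides which bytes count and how to normalize them.
protocol EncodingNameMatcher {
    /// Whether `byte` participates in the comparison at all.
    static func isSignificant(_ byte: UInt8) -> Bool
    /// Normalizes a significant byte (e.g. ASCII-lowercases it).
    static func normalize(_ byte: UInt8) -> UInt8
}

/// Plain ASCII case-insensitive matching: every byte is significant.
enum ASCIICaseInsensitiveMatcher: EncodingNameMatcher {
    static func isSignificant(_ byte: UInt8) -> Bool { true }
    static func normalize(_ byte: UInt8) -> UInt8 {
        (0x41...0x5A).contains(byte) ? byte &+ 0x20 : byte
    }
}

/// UTS #22-style matching: skip non-alphanumeric bytes, compare the rest
/// case-insensitively ("UTF-8" == "utf8" == "u.t.f-8"). UTS #22 also ignores
/// leading zeros in numeric runs; that part is omitted here.
enum UTS22Matcher: EncodingNameMatcher {
    static func isSignificant(_ byte: UInt8) -> Bool {
        (0x30...0x39).contains(byte)    // 0-9
            || (0x41...0x5A).contains(byte)  // A-Z
            || (0x61...0x7A).contains(byte)  // a-z
    }
    static func normalize(_ byte: UInt8) -> UInt8 {
        (0x41...0x5A).contains(byte) ? byte &+ 0x20 : byte
    }
}

/// One comparison loop shared by every matching rule; no intermediate String
/// is allocated because both sides are walked lazily over their UTF-8 bytes.
func namesMatch<M: EncodingNameMatcher>(_ a: String, _ b: String, using _: M.Type) -> Bool {
    var lhs = a.utf8.lazy.filter(M.isSignificant).map(M.normalize).makeIterator()
    var rhs = b.utf8.lazy.filter(M.isSignificant).map(M.normalize).makeIterator()
    while true {
        switch (lhs.next(), rhs.next()) {
        case (nil, nil):                     return true
        case let (x?, y?) where x == y:      continue
        default:                             return false
        }
    }
}

// namesMatch("UTF-8", "utf8", using: ASCIICaseInsensitiveMatcher.self)  // false ("-" is significant)
// namesMatch("UTF-8", "utf8", using: UTS22Matcher.self)                 // true  ("-" is ignored)
```

With only the case-insensitive rule left in the PR, collapsing the protocol into the single concrete type is indeed the simpler design.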
// MARK: - Private extensions for parsing encoding names

private extension Unicode.Scalar {
This is a general comment for the functions in this file: can we store and operate the string processing logic on UInt8 instead of Unicode.Scalar (and UTF8View instead of UnicodeScalarView), since we expect the names to be ASCII? We do so in quite a few places, such as the parsing utilities in this file; IIRC it did give us better performance than resorting to UnicodeScalarView.
I've now come to think that UInt8 is more efficient than Unicode.Scalar. I'll rework the functions to use UInt8.
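A rough sketch of the kind of byte-oriented comparison being suggested (the extension and method name below are hypothetical; this is not the PR's code):

```swift
extension String {
    /// ASCII case-insensitive equality, operating directly on UTF-8 bytes.
    /// For ASCII-only encoding names this avoids Unicode.Scalar decoding and
    /// allocates no intermediate String.
    func equalsIgnoringASCIICase(_ other: String) -> Bool {
        guard utf8.count == other.utf8.count else { return false }
        func lower(_ b: UInt8) -> UInt8 {
            (UInt8(ascii: "A")...UInt8(ascii: "Z")).contains(b) ? b &+ 0x20 : b
        }
        return zip(utf8, other.utf8).allSatisfy { lower($0.0) == lower($0.1) }
    }
}

// "UTF-8".equalsIgnoringASCIICase("utf-8")  // true
// "UTF-8".equalsIgnoringASCIICase("utf8")   // false (strict, byte-for-byte comparison)
```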
    _ string: String,
    tokenizedBy tokenizer: T.Type
) -> Bool where T: StringEncodingNameTokenizer, T: ~Copyable {
    if let preferredMIMEName = self.preferredMIMEName,
I'm not sure I understand why we need a tokenizer. This function looks like it is effectively the same as doing a case-insensitive compare on two strings. In the case of ASCII strings, it is also effectively the same as calling lowercased() and comparing. Is my understanding correct? Did you add a tokenizer to avoid the extra allocation from calling lowercased()? Are there other things that I'm missing?
There are/were two reasons that came to mind:
- To adopt the "Once and Only Once" principle: this no longer applies to the current implementation, because the code for UTS #22 has been removed.
- To avoid the extra allocation from calling lowercased() (as you pointed out): I imagined that user input might be something like "VeryVeryLongLongLongInput...".
@itingliu
Implementation of the proposal: #1243 (SF-0033)
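For reference, illustrative usage of the proposed API. This is a sketch: the failable initializer and optional property shapes follow the SF-0033 proposal, and the concrete return values shown in comments are assumptions.

```swift
import Foundation

// Round-trip between String.Encoding values and IANA charset names.
let utf8Name = String.Encoding.utf8.ianaName        // expected: "UTF-8"
let fromName = String.Encoding(ianaName: "utf-8")   // expected: .utf8 (name matching is case-insensitive)

// Typical use case: honoring the charset parameter of an HTTP Content-Type header.
func decodeBody(_ body: Data, charset: String) -> String? {
    guard let encoding = String.Encoding(ianaName: charset) else {
        return nil  // unknown or unsupported charset name
    }
    return String(data: body, encoding: encoding)
}
```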