SF-0033: Implement String.Encoding.ianaName and String.Encoding(ianaName:) #1286
Conversation
@swift-ci Please test
I've made this PR ready for review. cc: Foundation Workgroup
##===----------------------------------------------------------------------===##

"""
This is a python script that converts an XML file containing the list of IANA
I must ask here since we're a Swift project 😄
Is there a reason why this needs to be a python script? Can't we write this in Swift?
That's just because there are already quite a few Python scripts in swiftlang/swift/utils.
Of course, we can rewrite it in Swift, since we must love Swift more than Python. 😁
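For what it's worth, here is a minimal sketch of what a Swift version of such a generator could look like. It is only an illustration, not the script in this PR; the element names ("record", "name", "alias") are assumptions based on the IANA character-sets registry XML at https://www.iana.org/assignments/character-sets/character-sets.xml, and the file path is hypothetical.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in FoundationXML on non-Darwin platforms
#endif

/// Collects (name, aliases) pairs from the IANA character-sets registry XML.
final class CharsetRecordCollector: NSObject, XMLParserDelegate {
    private(set) var records: [(name: String, aliases: [String])] = []
    private var name = ""
    private var aliases: [String] = []
    private var text = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        text = ""
        if elementName == "record" { name = ""; aliases = [] }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        let value = text.trimmingCharacters(in: .whitespacesAndNewlines)
        switch elementName {
        case "name":   name = value
        case "alias":  aliases.append(value)
        case "record": records.append((name, aliases))
        default:       break
        }
    }
}

// Read the registry file (path is hypothetical) and print one line per record.
// A real generator would emit Swift source for the lookup tables instead.
let data = try Data(contentsOf: URL(fileURLWithPath: "character-sets.xml"))
let parser = XMLParser(data: data)
let collector = CharsetRecordCollector()
parser.delegate = collector
guard parser.parse() else { fatalError("failed to parse registry XML") }

for record in collector.records where !record.name.isEmpty {
    print("\(record.name): \(record.aliases.joined(separator: ", "))")
}
```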
}

/// A type to tokenize string for `String.Encoding` names.
internal protocol StringEncodingNameTokenizer: ~Copyable {
Are you planning to introduce other conformers to this protocol? I wonder whether dropping the protocol and using the concrete ASCIICaseInsensitiveTokenizer directly would give us performance gains.
I was indeed planning to support other conformers, such as one for UTS #22's Charset Alias Matching, which has already been removed. I should've removed (refactored) the code more aggressively.
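To make the trade-off concrete, here is a minimal sketch of the kind of abstraction being discussed: one shared comparison loop parameterized by a matching rule, with a plain ASCII case-insensitive rule and a UTS #22-style rule ("ignore non-alphanumeric characters") as the two conformers. All names below (EncodingNameMatcher, namesMatch, and the two enums) are hypothetical, not the PR's actual types.

```swift
/// Hypothetical simplification of the kind of protocol under discussion:
/// a matching rule decides which bytes count and how to normalize them.
protocol EncodingNameMatcher {
    /// Whether `byte` participates in the comparison at all.
    static func isSignificant(_ byte: UInt8) -> Bool
    /// Normalizes a significant byte (e.g. ASCII-lowercases it).
    static func normalize(_ byte: UInt8) -> UInt8
}

/// Plain ASCII case-insensitive matching: every byte is significant.
enum ASCIICaseInsensitiveMatcher: EncodingNameMatcher {
    static func isSignificant(_ byte: UInt8) -> Bool { true }
    static func normalize(_ byte: UInt8) -> UInt8 {
        (0x41...0x5A).contains(byte) ? byte &+ 0x20 : byte
    }
}

/// UTS #22-style matching: skip non-alphanumeric bytes, compare the rest
/// case-insensitively ("UTF-8" == "utf8" == "u.t.f-8"). UTS #22 also ignores
/// leading zeros in numeric runs; that part is omitted here.
enum UTS22Matcher: EncodingNameMatcher {
    static func isSignificant(_ byte: UInt8) -> Bool {
        (0x30...0x39).contains(byte)    // 0-9
            || (0x41...0x5A).contains(byte)  // A-Z
            || (0x61...0x7A).contains(byte)  // a-z
    }
    static func normalize(_ byte: UInt8) -> UInt8 {
        (0x41...0x5A).contains(byte) ? byte &+ 0x20 : byte
    }
}

/// One comparison loop shared by every matching rule; no intermediate String
/// is allocated because both sides are walked lazily over their UTF-8 bytes.
func namesMatch<M: EncodingNameMatcher>(_ a: String, _ b: String, using _: M.Type) -> Bool {
    var lhs = a.utf8.lazy.filter(M.isSignificant).map(M.normalize).makeIterator()
    var rhs = b.utf8.lazy.filter(M.isSignificant).map(M.normalize).makeIterator()
    while true {
        switch (lhs.next(), rhs.next()) {
        case (nil, nil):                     return true
        case let (x?, y?) where x == y:      continue
        default:                             return false
        }
    }
}

// namesMatch("UTF-8", "utf8", using: ASCIICaseInsensitiveMatcher.self)  // false ("-" is significant)
// namesMatch("UTF-8", "utf8", using: UTS22Matcher.self)                 // true  ("-" is ignored)
```

With only the case-insensitive rule left in the PR, collapsing the protocol into the single concrete type is indeed the simpler design.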
// MARK: - Private extensions for parsing encoding names

private extension Unicode.Scalar {
This is a general comment for the functions in this file: can we store and operate the string processing logic on UInt8 instead of Unicode.Scalar (and UTF8View instead of UnicodeScalarView), since we expect the names to be ASCII? We do so in quite a few places, such as the parsing utilities in this file; IIRC it did give us better performance than resorting to UnicodeScalarView.
I've now come to think that UInt8 is more efficient than Unicode.Scalar. I'll rework the functions to use UInt8.
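A rough sketch of the kind of byte-oriented comparison being suggested (the extension and method name below are hypothetical; this is not the PR's code):

```swift
extension String {
    /// ASCII case-insensitive equality, operating directly on UTF-8 bytes.
    /// For ASCII-only encoding names this avoids Unicode.Scalar decoding and
    /// allocates no intermediate String.
    func equalsIgnoringASCIICase(_ other: String) -> Bool {
        guard utf8.count == other.utf8.count else { return false }
        func lower(_ b: UInt8) -> UInt8 {
            (UInt8(ascii: "A")...UInt8(ascii: "Z")).contains(b) ? b &+ 0x20 : b
        }
        return zip(utf8, other.utf8).allSatisfy { lower($0.0) == lower($0.1) }
    }
}

// "UTF-8".equalsIgnoringASCIICase("utf-8")  // true
// "UTF-8".equalsIgnoringASCIICase("utf8")   // false (strict, byte-for-byte comparison)
```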
    _ string: String,
    tokenizedBy tokenizer: T.Type
) -> Bool where T: StringEncodingNameTokenizer, T: ~Copyable {
    if let preferredMIMEName = self.preferredMIMEName,
I'm not sure I understand why we need a tokenizer. This function looks like it is effectively the same as doing a case-insensitive compare on two strings. In the case of ASCII strings, it is also effectively the same as calling lowercased() and comparing. Is my understanding correct? Did you add a tokenizer to avoid the extra allocation from calling lowercased()? Are there other things that I'm missing?
There are/were two reasons that came to mind:
- To adopt the "Once and Only Once" principle: this no longer applies to the current implementation, because the code for UTS #22 has been removed.
- To avoid the extra allocation from calling lowercased() (as you pointed out): I imagined that user input might be something like "VeryVeryLongLongLongInput...".
@itingliu
Implementation of the proposal: #1243 (SF-0033)
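For reference, illustrative usage of the proposed API. This is a sketch: the failable initializer and optional property shapes follow the SF-0033 proposal, and the concrete return values shown in comments are assumptions.

```swift
import Foundation

// Round-trip between String.Encoding values and IANA charset names.
let utf8Name = String.Encoding.utf8.ianaName        // expected: "UTF-8"
let fromName = String.Encoding(ianaName: "utf-8")   // expected: .utf8 (name matching is case-insensitive)

// Typical use case: honoring the charset parameter of an HTTP Content-Type header.
func decodeBody(_ body: Data, charset: String) -> String? {
    guard let encoding = String.Encoding(ianaName: charset) else {
        return nil  // unknown or unsupported charset name
    }
    return String(data: body, encoding: encoding)
}
```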