fcfb9bf
This looks great.
Will comments consume multi-byte characters correctly as well? For example, what happens if a U+0A0A character (http://www.fileformat.info/info/unicode/char/0A0A/index.htm) occurs within a comment?
Just added the same test to graphql-js: graphql/graphql-js@da5c4b0
The UTF-8 encoding of \u0a0a is the 3-byte sequence e0 a8 8a, so there's no problem there. There is also no problem with 0a ever being embedded within a valid UTF-8 byte sequence because UTF-8 is designed not to overlap with valid ASCII bytes -- UTF-8 encodings for codepoints above 7F all have the high bit set!
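For anyone who wants to check the byte-level claim, here is a minimal Python sketch. It is not the lexer from this PR: `utf8_hex`, `all_high_bit`, and `comment_end` are hypothetical helpers, and `comment_end` only assumes a scanner that walks a UTF-8 byte stream and ends a comment at the first 0a or 0d byte.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def all_high_bit(ch: str) -> bool:
    """True if every byte of ch's UTF-8 encoding has the high bit set."""
    return all(b & 0x80 for b in ch.encode("utf-8"))


def comment_end(source: bytes, start: int) -> int:
    """Hypothetical byte-wise comment scanner (an assumption, not the PR's code).

    Returns the index of the first 0a or 0d byte at or after start,
    or len(source) if there is none.
    """
    i = start
    while i < len(source) and source[i] not in (0x0A, 0x0D):
        i += 1
    return i


# A comment containing U+0A0A, followed by a real newline and more input.
source = "# a \u0a0a b\nfield".encode("utf-8")
end = comment_end(source, 0)
comment_text = source[:end].decode("utf-8")
```

Under these assumptions the scanner runs past the three bytes of U+0A0A and stops only at the real newline, so `comment_text` keeps the character intact.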
Ah right, I forgot this explicitly parses a UTF-8 stream :) Most excellent