Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle HTTP chunking across multi-byte boundaries #23

Merged
merged 3 commits into from
Mar 12, 2024

Conversation

Freaky
Copy link
Contributor

@Freaky Freaky commented Mar 8, 2024

When a response contains multi-byte UTF-8 characters that cross chunk boundaries, appending the two halves to the intermediate string results in two Unicode Replacement Characters being inserted instead of the two halves being joined.

Use StringDecoder to buffer the partial characters across chunks.

Resolves #22

@Freaky Freaky requested a review from zyc9012 March 8, 2024 19:55
Copy link
Collaborator

@zyc9012 zyc9012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Freaky 👍

Freaky and others added 2 commits March 11, 2024 21:27
When a response contains multi-byte UTF-8 characters that cross chunk
boundaries, appending the two halves to the intermediate string results
in two Unicode Replacement Characters being inserted instead of the two
halves being joined.

Set the response encoding to UTF-8 to ensure it buffers partial
characters between chunks.

Resolves #22

Co-authored-by: zyc9012 <[email protected]>
@Freaky Freaky force-pushed the fix-22-multibyte-char-corruption branch from bec58a8 to 7610f9f Compare March 11, 2024 21:30
@Freaky
Copy link
Contributor Author

Freaky commented Mar 11, 2024

CI failures look to be a result of https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/

Instructing it to use OpenSSL's doesn't appear to help, hmm.

@Freaky Freaky force-pushed the fix-22-multibyte-char-corruption branch 3 times, most recently from 34d1d1c to 7d2266f Compare March 11, 2024 23:47
@zyc9012 zyc9012 force-pushed the fix-22-multibyte-char-corruption branch 6 times, most recently from 22ff7db to d0b1858 Compare March 12, 2024 08:54
Node hardcodes its default TLS certificates, meaning older versions do
not support the root certificate currently used by LetsEncrypt.

Use --use-openssl-ca to enable the use of the system's OpenSSL
certificates.

Since node v7 doesn't support NODE_OPTIONS, conditionally modify the
test script with sed.
@zyc9012 zyc9012 force-pushed the fix-22-multibyte-char-corruption branch from d0b1858 to d9385fc Compare March 12, 2024 08:56
@zyc9012
Copy link
Collaborator

zyc9012 commented Mar 12, 2024

Thank you @Freaky. I fixed the CI and squashed my commit with yours.

@zyc9012 zyc9012 merged commit ece9d5e into master Mar 12, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-byte character corruption on chunk boundaries
2 participants