Skip to content

Add an example to show how to handle a non-UTF-8 page #39385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

def00111
Copy link
Contributor

@def00111 def00111 commented May 4, 2025

Description

The example shows two things:

  1. How to get the charset from the page meta tag (using normal utf-8 TextDecoder)
  2. Using an external library because TextEncoder will not work

Motivation

All other examples are for a UTF-8 page.

Additional details

Related issues and pull requests

@def00111 def00111 requested a review from a team as a code owner May 4, 2025 14:11
@def00111 def00111 requested review from dotproto and removed request for a team May 4, 2025 14:11
@github-actions github-actions bot added Content:WebExt WebExtensions docs size/m [PR only] 51-500 LoC changed labels May 4, 2025
Copy link
Contributor

github-actions bot commented May 4, 2025

Preview URLs

(comment last updated: 2025-05-17 08:39:50)

@dotproto
Copy link
Collaborator

This is the first I've heard of fast-sjis-encoder and I'm extremely uncomfortable with including a third party library in example documentation.

You indirectly mentioned that TextEncoder only supports encoding content in UTF-8. Rather than try to encode the modified document back into shift_jis, have you considered updating the charset of the document to utf-8 and using TextEncoder to re-encode the data stream?

Also tagging @rebloor and @pepelsbey in case there are MDN editorial considerations that I'm not aware of here.


const encoder = new TextEncoder();
const start1 = encoder.encode(
'<a href="/pc/" class="p-catList_cell p-catList_cell--pc-">'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mdn-linter] reported by reviewdog 🐶

Suggested change
'<a href="/pc/" class="p-catList_cell p-catList_cell--pc-">'
'<a href="/pc/" class="p-catList_cell p-catList_cell--pc-">',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "," needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally, no, but the default prettier settings require trailing commas: https://prettier.io/docs/options#trailing-commas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


browser.webRequest.onBeforeRequest.addListener(
listener,
{ urls: ["https://kakaku.com/"], types: ["main_frame"] },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use https://example.com/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page has utf-8 encoding.

This example shows, how to handle a non-UTF-8 page:

```js
Object.defineProperty(Array.prototype, 'indexOfMulti', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mdn-linter] reported by reviewdog 🐶

Suggested change
Object.defineProperty(Array.prototype, 'indexOfMulti', {
Object.defineProperty(Array.prototype, "indexOfMulti", {

}

return i === initial + searchElements.length - 1 ? initial : -1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mdn-linter] reported by reviewdog 🐶

Suggested change
}
},

@dotproto
Copy link
Collaborator

I don't think we should merge this PR. UTF-8 encoding has been an unofficial standard for decades and was formally adopted by WHATWG in Nov, 2018 (whatwg/html#4195). Based on data gathered in this comment on the related WHATWG discussion issue, "out of the 431,851 sites analyzed … merely 0.14%" of websites "include at least one charset that is not utf-8." As such, I don't think this use case is prevalent enough to justify the inclusion of an example.

Tagging @rebloor for a second opinion.

@dotproto dotproto requested a review from rebloor May 20, 2025 00:52
@rebloor
Copy link
Contributor

rebloor commented May 20, 2025

@def00111 I would agree with @dotproto, let's close this PR.

@dotproto dotproto closed this May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Content:WebExt WebExtensions docs size/m [PR only] 51-500 LoC changed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants