Normative: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour #780
Why would this restriction only apply to web browsers?
This PR is in response to feedback from the 2021-05-25 TC39 meeting and is meant to address concerns about potential fingerprinting issues that only pertain to browser implementations.
Is there any reason not to apply the same restrictions to all engines tho? The ideal is that everything applies to everyone equally; having something only apply to a subset of impls is a suboptimal outcome.
860f98e
to
3feac13
Compare
updated to apply restriction to all hosts
(please don't land this until 402 editors have reviewed)
spec/overview.html
Outdated
@@ -72,6 +72,9 @@ <h1>Implementation Dependencies</h1>
<em>Subsets of Unicode:</em> Some operations, such as collation, operate on strings that can include characters from the entire Unicode character set. However, both the Unicode Standard and the ECMAScript standard allow implementations to limit their functionality to subsets of the Unicode character set. In addition, locale conventions typically don't specify the desired behaviour for the entire Unicode character set, but only for those characters that are relevant for the locale. While the Unicode Collation Algorithm combines a default collation order for the entire Unicode character set with the ability to tailor for local conventions, subsets and tailorings still result in differences in behaviour.
</li>
</ul>
<emu-note>
The set of locales made available by ECMAScript hosts must not change as the result of user behaviour, and the set of available locales must not produce observable differences between two users using the same version of the same host on the same platform. As a result, ECMAScript hosts must not allow on-demand installation of new locales.
Set of locales, yes, but also the set of all enumerable items including currencies, numbering systems, calendars, etc.
Also, please explain why we're adding this constraint. I imagine that it may be desirable in the future to relax this constraint, and we need to understand why it existed.
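For readers following along, here is a minimal sketch of what "the set of all enumerable items" looks like from a script's point of view. It uses `Intl.supportedValuesOf`, which is available in hosts implementing ECMA-402 2022 or later; the helper name `enumerableItems` is made up for illustration and is not part of this PR.

```javascript
// Keys accepted by Intl.supportedValuesOf in ECMA-402 2022 and later.
const ENUMERABLE_KEYS = [
  "calendar",
  "collation",
  "currency",
  "numberingSystem",
  "timeZone",
  "unit",
];

// Hypothetical helper: gather every enumerable set a script can observe.
function enumerableItems() {
  const items = {};
  for (const key of ENUMERABLE_KEYS) {
    // Older hosts may lack supportedValuesOf; record an empty list there.
    items[key] =
      typeof Intl.supportedValuesOf === "function"
        ? Intl.supportedValuesOf(key)
        : [];
  }
  return items;
}

// Print only the sizes; the actual values depend on the host's data.
for (const [key, values] of Object.entries(enumerableItems())) {
  console.log(`${key}: ${values.length} values`);
}
```

Any of these lists differing between two users of the same host version would be observable in the same way as a differing locale list.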
The W3C I18N Working Group discussed this PR today in our teleconference. I am adding this comment on our behalf. We are concerned that this prohibition will disadvantage smaller language/cultural communities who might rely on installation of support to enable locale-based APIs in the browser or JS host. We feel that precluding the ability to install a locale or the parts of a locale (such as dictionaries for spell check/breaking/etc.) that assist with high-quality presentation on the Web and in JS applications has the potential to negatively impact those communities that cannot depend on support from browser or system vendors. If there is a "fingerprinting" risk associated with such installation, providing a warning to the user might be the most appropriate response. Also, note that we are not currently aware of runtimes that allow the list of locales to be updated (other than by updating the entire underlying ICU build), so this strikes us as preventing a feature from existing that might be useful. Also, we note that CLDR releases include new locales twice each year, so presumably browsers would change their list of available locales as updates propagate.
CC @codehag for feedback on @aphillips' comment above (see issue #588 for a reminder of the problem this PR is trying to solve)
@aphillips doesn't this prohibition only require refreshing the page after installing a new locale?
@ljharb asked:
Not as far as I can tell? It seems to require that any two distinct users on the same version on the same platform (not the "same machine") should always get the same enumerable set of available values. The change in cb6449b extends this to include anything (numbering systems, calendars, etc.). I note that one frequent cause of such patching would be time zone data (which is not listed but is the regular source of runtime patching outside the normal release cycle). If I understand the threat here, it's to prevent a bad actor from installing a locale into a user's browser and then using that locale ID (perhaps using a well-formed locale ID like: There is a similar issue (w3c/css-drafts#4055) related to fingerprinting based on fonts (which have a much higher level of per-installation variability and which, unlike locales, can actually be installed currently 😃). The challenge here is to support under-served communities of users--particularly when the threat is more-or-less theoretical--without exposing large groups of Web users to abusive behavior.
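To make the threat concrete, a rough sketch of how a script could probe for specific locales with the existing `supportedLocalesOf` API. The candidate tags and the `probeLocales` helper are hypothetical and chosen only for illustration; which tags come back as supported depends entirely on the host's data.

```javascript
// Hypothetical probe list; any well-formed locale IDs would do.
const candidateTags = ["en", "cy", "haw", "chr", "gd"];

// Report, for each tag, whether the host claims to have data for it.
function probeLocales(tags) {
  // supportedLocalesOf returns the subset of the requested tags that the
  // host supports; "lookup" keeps the matching rule simple and predictable.
  const supported = new Set(
    Intl.DateTimeFormat.supportedLocalesOf(tags, { localeMatcher: "lookup" })
  );
  return tags.map((tag) => ({ tag, supported: supported.has(tag) }));
}

console.log(probeLocales(candidateTags));
```

If one user's result differed from another's on the same host version and platform, that difference is the kind of distinguishing bit the proposed text is trying to rule out.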
Which hosts and engines currently do runtime patching of time zone data? Are these OS patches or browser patches?
@ben-allen I would like to ensure that this accounts for the possibility of vastly different behavior within the same browser. For example, iPhone has Lockdown Mode (LDM, https://support.apple.com/en-us/HT212650), which explicitly disables some features as an extra defense against targeted attacks. I don't think Intl-related things can be changed based on these modes, but I would like to ensure that this possibility is explicitly allowed by the statement :) Eemeli pointed out in the TG2 meeting that this can be treated as a platform difference, and that sounds fine. So I would like to ensure that the above possibility counts as a platform difference. The other parts look pretty good to me! Thanks for your work.
@Constellation can lockdown mode come into effect without reloading the page? If it forces a reload, then this note wouldn't apply at all, since it wouldn't be observable within the lifetime of a program.
Our concern is that on-demand installation of locale data could provide an easy fingerprinting vector for members of smaller language/cultural communities who may face discrimination or persecution, for example by dominant cultural groups or by the government of the country in which they live. Our position is that it is better to ship data for all locales in a single bundle, which ensures that data for smaller communities is available, without exposing them to a fingerprinting risk. I believe it would be very difficult to create a user warning that would explain the potential risk in a way that would allow a user to make an informed decision about accepting extra locale data. I suspect most people would ignore these warnings. I'd also point out that there's no guarantee that for a small linguistic community the text of the warning would be localized, which would decrease the likelihood of making an informed decision. The key point is that the set of locales should not change as a result of user behaviour. We're not trying to prevent vendors from shipping a new bundle of locale data to users as part of an update, just data for individual locales. In the case of Firefox, CLDR and timezone updates are done as part of our normal release cycle anyway.
This type of feature was recently requested to me, in the context of ICU4C, and explicitly for the purpose of supporting minority, disadvantaged languages. (Which, yes, could potentially be at-risk for fingerprinting of various kinds.)
Agreed. Default breaking for
CLDR releases include certain locales as "basic" and above, but there are other locales not included. ICU4C default build includes certain locales, but not others. Vendors include certain locales, but not others. In short, certain locales are already excluded from web implementations. I'm concerned that requiring that these locales cannot be added on the fly could end up negatively impacting users of already-digitally disadvantaged languages.
On another topic, Node.js has, from the first versions that included Intl by default, had the ability to customize at build and runtime the set of locales available, and also to supplement the locales depending on the startup environment. It's also been requested to have some way to add locales at runtime there as well. This language seems to make Node.js v0.12 onwards potentially noncompliant. I don't see the argument for restriction in this type of environment at all.
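As a rough illustration of the Node.js situation, the sketch below reports which ICU and CLDR versions a process was started with and whether the `NODE_ICU_DATA` environment variable (one of Node's documented ways to supply ICU data at startup) is set. The `describeIcu` helper and the sample tags are made up for this example, and the `process` checks are guards so the snippet does nothing surprising outside Node.

```javascript
// Hypothetical helper: summarize the locale data this process started with.
function describeIcu() {
  const versions =
    typeof process !== "undefined" && process.versions ? process.versions : null;
  const env = typeof process !== "undefined" && process.env ? process.env : null;
  return {
    // Present in Node builds that include Intl support; null elsewhere.
    icu: versions ? (versions.icu ?? null) : null,
    cldr: versions ? (versions.cldr ?? null) : null,
    // Set by the user before startup to point Node at extra ICU data.
    icuDataEnv: env ? (env.NODE_ICU_DATA ?? null) : null,
    // Sample tags only; the result depends on the data loaded at startup.
    sampleLocales: Intl.DateTimeFormat.supportedLocalesOf(["en", "de", "ja"]),
  };
}

console.log(describeIcu());
```

Everything reported here is determined when the process starts, which is relevant to the question of whether startup-time customization counts as a change "as a result of user behaviour".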
@srl295 that doesn't imply to me that it can be changed during the lifetime of a program, only at program start time. My understanding of this requirement is that once a JS program has started, it can't observe further changes to the list of available locales. To that end, anything that requires refreshing or navigating a page, restarting an application, or launching a process in order to observe a different set of locales seems to me to comply with this requirement.
OK. So "user behaviour" is scoped to the JS runtime? That's helpful… I then don't see how fingerprinting is mitigated.
That would be very different. And wouldn't then bring as much concern. Adding locales while running has been discussed as well, but certainly has a lot of other challenges.
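Under the "lifetime of a program" reading discussed above, the property can be sketched as a check a script could run on itself: take a snapshot of the observable sets at startup and confirm that later snapshots match. This is only an illustration of that interpretation, not proposed spec text; `snapshotIntl` and `unchangedSinceStart` are hypothetical names, and `Intl.supportedValuesOf` is assumed to be available (ECMA-402 2022 or later).

```javascript
// Sample locale tags to include in the snapshot; chosen arbitrarily.
const SNAPSHOT_TAGS = ["en", "fr", "cy", "haw"];
const SNAPSHOT_KEYS = ["calendar", "currency", "numberingSystem", "timeZone"];

// Serialize everything observable about the sampled sets into one string.
function snapshotIntl() {
  return JSON.stringify({
    locales: Intl.DateTimeFormat.supportedLocalesOf(SNAPSHOT_TAGS),
    values: SNAPSHOT_KEYS.map((key) => [key, Intl.supportedValuesOf(key)]),
  });
}

// Taken once, when the program starts.
const snapshotAtStart = snapshotIntl();

// Under the program-lifetime reading, this should hold for as long as
// the program keeps running.
function unchangedSinceStart() {
  return snapshotIntl() === snapshotAtStart;
}

console.log(unchangedSinceStart());
```

Note that this check says nothing about two different users seeing the same sets, which is the part of the fingerprinting concern that the program-lifetime reading alone does not address.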
LGTM
It might be worth noting that one potential way to satisfy this constraint is to pretend that all on-the-fly-available locales are already installed.
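A toy model of that idea, purely for illustration: the `ModelHost` class below is hypothetical and does not correspond to any real engine API. It answers support queries from the union of installed and downloadable locales, so two users with different installed sets give identical answers.

```javascript
// Hypothetical model of a host; not an API of any real engine.
class ModelHost {
  constructor(installed, downloadable) {
    this.installed = new Set(installed);
    this.downloadable = new Set(downloadable);
  }

  // What scripts may observe: installed and downloadable locales are
  // reported identically, so the installed set itself never leaks.
  supportedLocalesOf(tags) {
    return tags.filter(
      (tag) => this.installed.has(tag) || this.downloadable.has(tag)
    );
  }
}

// Two users on the same host version with different installed locales.
const userA = new ModelHost(["en", "fr"], ["cy", "haw"]);
const userB = new ModelHost(["en", "fr", "cy"], ["haw"]);

const probe = ["en", "cy", "haw", "zz"];
console.log(userA.supportedLocalesOf(probe)); // [ 'en', 'cy', 'haw' ]
console.log(userB.supportedLocalesOf(probe)); // [ 'en', 'cy', 'haw' ]
```

The model ignores timing: a real host would also need to avoid revealing, through download latency or similar side channels, which locales were actually present.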
@ben-allen to update the spec text to incorporate the remainder of @Manishearth's feedback.
spec/overview.html
Outdated
@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>
Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.
Should this be normative? The line below should be informative, but it sounds like this is expected to be normative?
This should indeed be normative, though it's in that fuzzy space where no extant browser implementation is affected by the change
549567a
to
8813f31
Compare
spec/overview.html
Outdated
@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>
Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.
Changes to the set of locales, currencies, calendars, numbering systems, and other enumerable items made available by ECMAScript hosts must not result in two users using the same version of the same host on the same platform becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.
How about
The initial set of locales, currencies, calendars, numbering systems, and other enumerable items visible to a particular origin must be the same for all users sharing the same user agent string (engine and platform version). Furthermore, dynamic changes to these sets must not result in users becoming distinguishable from each other. This constraint is imposed to reduce the fingerprinting risk inherent in internationalization, and may be relaxed in future revisions.
Much clearer, will use this
The notion of "user agent string" is foreign to ECMAScript and not universal among implementations. @ljharb, please take a look; it seems to go contrary to your comments above.
As something far from matters typically expected to be specified in a programming language standard (or API closely related to it, being an optional extension of a language's standard library) and pertaining to only a subset of implementations, namely Web browsers, shouldn't it live in a spec for those? Like the one defining what a Web browser is and what requirements it has to satisfy as an ECMAScript implementation (if supported) in addition to ECMA-262?
I agree the concept and mention of a user agent doesn’t make any sense in an ecma specification.
In that case I think we need to step back and consider the premise here. The fingerprinting concerns are largely only relevant for web browser environments.
Either it is ECMA402's job to address these fingerprinting concerns and thus it must be allowed to refer to mechanisms available in those contexts, or it is not ECMA402's job, and we don't need to handle this at all. We can't have our fingerprinting cake if we're not planning on eating it.
I suspect the framing here can be refined a bit to be clear that it is talking in a web browser context only. Alternatively, a more general point can be made about "systems where fingerprinting is a concern, like web browsers", and instead of saying UA strings talk about "already distinguishable bits of information (in the case of browsers, this is platform/version/UA string)". Something like that.
IMO it really belongs elsewhere. But if a critical mass insists it's desirable to make the spec longer and draw the attention also therein to this concern present in some implementations, an informative note with sufficiently generic wording could be added.
spec/overview.html
Outdated
@@ -73,6 +73,14 @@ <h1>Implementation Dependencies</h1>
</li>
</ul>

<emu-note>
If this is normative, it maybe shouldn't be a NOTE?
spec/overview.html
Outdated
</emu-note> | ||

<emu-note>
Non-normative: As a result of this constraint, the first time a browser implementation that allows on-demand locale installation receives a request from a particular origin that could require installing a new locale, it must not reveal whether or not that locale is already installed.
Are notes in 402 not all non-normative?
08b9662
to
7edd7fb
Compare
Most recent push is a minimal change based on TC39 feedback — all I did was move the text that was in notes into the main text of the Implementation Dependencies section. This may, though, be insufficient or otherwise wrong. @littledan @bakkot @sffc |
7edd7fb
to
0a5f4b5
Compare
0a5f4b5
to
3ae5d92
Compare
3ae5d92
to
4f5f712
Compare
…ations needing to be fixed from version to version
4f5f712
to
69d17cb
Compare
To follow on from the discussion at TC39 yesterday, the SpiderMonkey team still considers this to be very important. I will ask for review on the current text from our privacy team before the next plenary.
fix #588