Updated privacy section with discussion on fingerprinting and navigational tracking. #102
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -241,6 +241,18 @@ In the interest of user privacy, user agents are encouraged to deploy the follow | |
* The user agent offers users a way to turn Private Click Measurement on and off. | ||
* The user agent doesn't support Private Click Measurement in private/incognito mode. | ||
|
||
## Event-Level Measurement Combined With Pre-Existing Tracking Vectors ## {#pre-existing-tracking-vectors} | ||
|
||
Any form of event-level measurement, i.e. where there's one report per event, can be combined with pre-existing tracking vectors in an attempt to track users cross-site. | ||
|
||
### Combined With Fingerprinting ### {#combined-with-fingerprinting} | ||
|
||
One pre-existing tracking vector is device fingerprinting. The click source and click destination may be able to record fingerprints for all measured events based on e.g. IP address, user agent string, and user-installed fonts. These fingerprints would reduce the number of users who would need a unique ID in event-level attribution reports to be uniquely identified. Combining with fingerprinting is not a problem inherent to event-level measurement such as Private Click Measurement, but rather an existing privacy problem on the web. User agents have to defend against device fingerprinting to prohibit this kind of attack. | |
The net effect of a system like this is to create a new cross-site flow of information. I'd prefer that you own that rather than attempting to claim that it isn't your problem.
I don't agree with the characterization (as in "...not a problem inherent...") that event-level attribution doesn't need to consider interactions with fingerprinting. Whether fingerprinting is very coarse (Safari vs. Firefox vs. Chrome) or precise (more or less what we have today), the feature is being introduced to a running system and it needs to consider the effect it has on that system.
In general, when we give new platform features a pass on fingerprinting, it is because we are able to consider the incremental effect of the change. We recognize that fingerprinting exists and is hard to combat, but we can look at the overall increase to the entropy of a feature. Where the value of any fingerprinting increase seems worthwhile, that's where we might adjudge that the increase in fingerprinting is tolerable. This is further tempered by the fact that many fingerprinting elements are highly correlated (for instance, exposing the existence of a microphone is not usually very much information when a webcam is known to be present as those two things are often available at the same time).
While we don't have a good handle on what the true entropy is, I've heard claims that the total entropy of web APIs is still low enough that individuals are not reliably identifiable. It is under this assumption that we accept a partial bit of extra leakage here or there.
What makes PCM hard to reason about here is that it creates a time-based contribution to the amount of information sites can exchange. Over time, information is constantly released. In theory at least, there is no limit to the information sites can exchange about a user if they are patient.
A better approach here might be to attempt to quantify the release of information somehow. Then, this can be combined with what we know (or might assume) about fingerprinting entropy to produce some sort of understanding of what abuse of the API might look like.
nit: s/e.g./such as/ (and s/and/or/) or s/e.g./e.g.,/.
I don't understand this feedback. At the beginning of the Privacy section we go through exactly how much data is allowed to be linked cross-site as part of PCM. Do you think that information needs to be reiterated down here, that the two pieces of text are too far apart, or that the piece about PCM's own data is unclear?
I've not tried to state that "event-level attribution doesn't need to consider interactions with fingerprinting." On the contrary, this update to the Privacy section tries to be very open about the fact that a) the combination of event-level attribution and fingerprinting is a problem and that b) user agents need to prevent or mitigate fingerprinting to prevent the combination of the two. Safari's implementation uses IP address hiding in PCM's attribution report requests for instance.
Are you saying PCM adds fingerprinting attack surface? The definition of fingerprinting I use is based on the fact that fingerprinting is stateless. Stateless in that use of web platform features doesn't change the fingerprint. By that definition, PCM does not add fingerprinting attack surface since PCM is stateful. If your view is that PCM adds fingerprinting attack surface, I need to understand how. For me, the interesting analysis lies in the fact that stateful cross-site data transfer such as event-level click attribution can be combined with stateless, cross-site, partial re-identification through fingerprinting. It's important to reason about it at the event level and not just for PCM since there is at least one more proposal that's also event-level. The TAG has asked us to coordinate with them so the two features are very similar except in the amount of data on either side of the click.
Getting users to click to navigate cross-site is a limiting factor of course. In fact, that is a key piece of the analysis since otherwise we could support view-through attribution with the same mechanism.
You mean over time, under the assumption of repeated clicks and collusion, I assume.
The way I'm reading the text in this section is that you are saying that fingerprinting and PCM can combine to enable more effective user tracking, but this isn't your problem because fingerprinting is a known issue and browsers should (or will eventually) just block fingerprinting. To be clear, that's a deliberately pejorative reading, but I think that is a possible interpretation of that text. Given that I don't believe fingerprinting will ever go to zero (or that zero fingerprinting is relevant to this analysis), you need more...
I'm saying that PCM is being deployed into a system where fingerprinting is possible and that in order to understand the resulting privacy properties of the system (and PCM) you need to better understand how the two combine to subvert privacy.
Hypothetical time. Let's say that you need ~30 bits of fingerprinting entropy in order to uniquely identify every web user and the current fingerprinting surface is ~15 bits. If PCM leaks one bit per week, then maybe we might decide that it is OK for a site with repeated interactions over the course of 15 weeks to learn about a person's activity on another site. Or not; we might each reach our own conclusions about that, with different assumptions about fingerprinting. The point being that there is some amount of time over which the gap between what fingerprinting enables and what a site needs for unique identification can be closed. Whatever the gap is (it will change over time, it will change for different sites with different visitor composition), PCM can bridge it eventually. Being able to understand the contribution PCM makes toward identification (or tracking) is a critical part of any analysis of the API.
(Separately, I happen to think that the fact that PCM is capable of bridging any gap is a real deal-breaker, but my purpose here is in helping you produce the best possible documentation of the system you are proposing.)
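The arithmetic in the hypothetical above can be sketched as follows. This is only an illustration of that back-of-the-envelope reasoning: the function name `weeks_to_bridge_gap` and its parameters are made up here, and the figures (30 bits, 15 bits, one bit per week) are the hypothetical ones from the comment, not measurements.

```python
import math


def weeks_to_bridge_gap(required_bits: float,
                        fingerprint_bits: float,
                        leak_bits_per_week: float) -> int:
    """Weeks of repeated interaction before leaked bits cover the gap.

    All three inputs are assumptions chosen by whoever runs the analysis.
    """
    if leak_bits_per_week <= 0:
        raise ValueError("leak rate must be positive")
    # Bits still missing after fingerprinting; never negative.
    gap = max(0.0, required_bits - fingerprint_bits)
    return math.ceil(gap / leak_bits_per_week)


# The hypothetical from the comment: 30 bits needed, 15 from fingerprinting.
print(weeks_to_bridge_gap(30, 15, 1))  # 15
```

Changing the assumed fingerprinting surface changes the answer directly: with 20 bits of fingerprinting entropy and the same leak rate, the gap closes in 10 weeks.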
Excellent. That's exactly what I would expect from the TAG, so 👍.
I think I see where you're going here. PCM was conceived with two requirements that are highly relevant for this discussion:
Yeah, some concrete discussion of the highest number of bits allowed in fingerprinting to not get to ≈30 in total would be useful. The timeline bit revives a proposal we discussed earlier, namely that PCM should rate limit measurements per site pair and browser instance. I think that is very doable.
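A rate limit per site pair and browser instance could look roughly like the sketch below. Nothing here comes from the PCM spec or any implementation: the class name `SitePairRateLimiter`, the sliding-window approach, and the one-per-week setting are all assumptions for illustration. One limiter object stands in for one browser instance.

```python
from collections import defaultdict, deque


class SitePairRateLimiter:
    """Hypothetical sliding-window limit on measurements per site pair."""

    def __init__(self, max_measurements: int, window_seconds: float):
        self.max_measurements = max_measurements
        self.window_seconds = window_seconds
        # (source site, destination site) -> timestamps of allowed measurements
        self._events = defaultdict(deque)

    def allow(self, source_site: str, destination_site: str, now: float) -> bool:
        events = self._events[(source_site, destination_site)]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_measurements:
            return False
        events.append(now)
        return True


WEEK = 7 * 24 * 3600.0
limiter = SitePairRateLimiter(max_measurements=1, window_seconds=WEEK)
print(limiter.allow("social.example", "shop.example", now=0.0))     # True
print(limiter.allow("social.example", "shop.example", now=3600.0))  # False
```

Keying on the pair rather than on either site alone means a busy click source is not starved by one destination, at the cost of one counter per pair.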
|
||
|
||
### Combined With Navigational Tracking ### {#combined-with-navigational-tracking} | ||
|
||
Another pre-existing tracking vector is [navigational tracking](https://privacycg.github.io/nav-tracking-mitigations/#terminology), sometimes referred to as link decoration tracking. The click source can include a user ID or click ID in the destination URL and the click destination can read and store that ID for attribution purposes. Under the assumption of navigational tracking, there is no need for a feature like Private Click Measurement but it is of course possible to combine navigational tracking with event-level measurement to try to track individual users cross-site. This problem is not inherent to event-level measurement such as Private Click Measurement, but rather an existing privacy problem on the web. User agents have to defend against navigational tracking to prohibit this kind of attack. | |
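For readers unfamiliar with link decoration, the mechanism described in the paragraph above can be sketched in a few lines. The query parameter name `click_id`, the helper names, and the example URLs are hypothetical; real deployments use many different parameter names and may also use the path or fragment.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def decorate_link(destination_url: str, click_id: str) -> str:
    """Click source side: append a (hypothetical) click_id query parameter."""
    parts = urlsplit(destination_url)
    extra = urlencode({"click_id": click_id})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def read_click_id(landing_url: str):
    """Click destination side: read the ID back out of the landing URL."""
    values = parse_qs(urlsplit(landing_url).query).get("click_id")
    return values[0] if values else None


url = decorate_link("https://shop.example/product?sku=42", "abc123")
print(url)                 # https://shop.example/product?sku=42&click_id=abc123
print(read_click_id(url))  # abc123
```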
I'm not clear on your interpretation of the definition here. Your cited definition is:
I don't consider "link decoration tracking" to be an alternative name for navigation tracking. I consider it a subset: just one technique that can be used. I do agree that if you have reliable navigation tracking that uses just a single navigation (not all techniques do), then you don't need PCM. In that light, I might make this more pointed:
I include cross-site tracking of user activity across sites and not just linking of identity across sites. Is that your take too? It's not clear from the above.
Fair. I'll update that language.
Would the [LINK] here be to the WHATWG site or the navigational tracking work item?
I agree. I only included this because we're talking specifically about navigation tracking and this is the definition you cited; so it seemed useful context.
[LINK] could be whatever you think is good, though the navigation tracking document defines both terms fairly well, so you could use those.
||
|
||
# Performance Considerations # {#performance} | ||
|
||
The user agent may want to limit the amount of stored click attribution data. Limitations can be set per [=click source website=], per [=attribution destination website=], and on the total amount of click attribution data. | ||
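One way a user agent might enforce the three kinds of limits mentioned above is sketched here. This is not part of the proposed spec text: the class name `ClickAttributionStore`, its fields, and the limit values in the usage example are assumptions, and a real user agent would also need an eviction or expiry policy, which is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClickAttributionStore:
    """Hypothetical store with per-source, per-destination, and total caps."""
    max_per_source: int
    max_per_destination: int
    max_total: int
    _entries: list = field(default_factory=list)
    _by_source: Counter = field(default_factory=Counter)
    _by_destination: Counter = field(default_factory=Counter)

    def try_store(self, source_site: str, destination_site: str) -> bool:
        # Reject the click if any of the three limits is already reached.
        if len(self._entries) >= self.max_total:
            return False
        if self._by_source[source_site] >= self.max_per_source:
            return False
        if self._by_destination[destination_site] >= self.max_per_destination:
            return False
        self._entries.append((source_site, destination_site))
        self._by_source[source_site] += 1
        self._by_destination[destination_site] += 1
        return True


store = ClickAttributionStore(max_per_source=2, max_per_destination=2, max_total=3)
print(store.try_store("social.example", "shop.example"))   # True
print(store.try_store("social.example", "books.example"))  # True
print(store.try_store("social.example", "toys.example"))   # False (per-source cap)
```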
|