Add procedures for parsing site names from strings #106

Open
wants to merge 28 commits into base: main

28 commits
2cc5aa8
Add procedures for parsing site names from strings
martinthomson Feb 26, 2025
371a7e3
Some rather obvious fixes
martinthomson Feb 27, 2025
a2e6cef
Expand on the measureConversion example
martinthomson Feb 26, 2025
c0809a2
Replace DOMString with USVString in URL/site-typed fields
apasel422 Feb 25, 2025
1295d11
Fix minor typo
apasel422 Feb 27, 2025
3763325
A better example of saving an impression
martinthomson Feb 26, 2025
c44ba49
downcase
martinthomson Feb 27, 2025
fe72d71
Invoke the algorithms directly
martinthomson Feb 27, 2025
fee794e
Link to enum declarations
apasel422 Mar 3, 2025
e08c6f6
Update the readme with updated status
martinthomson Feb 27, 2025
a8f4ba6
Missing _
martinthomson Feb 27, 2025
25f1772
trivial changes like this can just be committed directly
martinthomson Feb 28, 2025
c7c65b4
Fix erroneous usage of some method in example code
apasel422 Mar 3, 2025
ecdd818
Make aggregationServices a maplike
apasel422 Mar 5, 2025
9ec15b7
Clean up specification links, terminology, and TODOs
apasel422 Mar 11, 2025
e1a7dcf
Link "histogram indexes"
apasel422 Mar 11, 2025
58fa348
TODO
apasel422 Mar 12, 2025
0bd77ad
Explain difference in defaulting of filterData fields
apasel422 Mar 24, 2025
2750ad0
Move definition of attribution logic enum up
martinthomson Mar 27, 2025
687c572
Add validation for value vs maxValue
martinthomson Mar 27, 2025
2a347b0
Avoid useless noop calls
martinthomson Mar 27, 2025
825647e
Add a heading for last-touch and reorder
martinthomson Mar 27, 2025
711f445
Fix bad rebase
martinthomson Mar 31, 2025
fc428c0
Merge branch 'main' into site-names
martinthomson Apr 3, 2025
e289db9
Fix rebase error
martinthomson Apr 3, 2025
e7c6a91
Merge branch 'main' into site-names
martinthomson Apr 24, 2025
38f8fcb
Merge branch 'main' into site-names
martinthomson Apr 24, 2025
7f62ed2
Rework the saveImpression algorithm to include an actual impression
martinthomson Apr 24, 2025
85 changes: 77 additions & 8 deletions api.bs
@@ -881,6 +881,50 @@ excludes expired [=impressions=] from [=attribution=]. However, the
[=user agent=] should not retain expired [=impressions=] indefinitely.


+ ### Site Names ### {#site-name-algorithm}
+
+ The [=impression store=] saves information
+ about three types of [=site=]:
+ the [=impression/impression site=],
+ an optional [=impression/intermediary site=],
+ and a [=set=] of [=impression/conversion sites=].
+
+ These [=sites=] MUST all be in [=scheme-and-host=] form,
+ with a [=scheme=] of "`https`".
Collaborator

I don't think this is a major issue, but do we have any concerns about local development/debugging not being able to use http://localhost as a result of making https inherent? ARA supports that, for example.

Member Author

martinthomson Feb 27, 2025

Personally, I'm 100% OK with that. We can talk about what site identity means for localhost, which is a whole other rat's nest, but the obvious way out there is to use HTTPS and take steps to avoid the certificate warnings.

Collaborator

IMO we should try to use https://w3c.github.io/webappsec-secure-contexts/#potentially-trustworthy-origin as the check before converting the input to a site, which is well-specified and used in many other parts of the platform. That link provides the following justification for local addresses:

> treating such resources as potentially trustworthy is convenient for developers building an application before deploying it to the public.

This should allow localhost for user agents that conform to the name resolution rules in "let localhost be localhost". Regarding site identity, we should just be able to use https://html.spec.whatwg.org/multipage/browsers.html#obtain-a-site as-is, which should support localhost.

Member Author

The problem this is trying to solve is something that "obtain a site" cannot address, because we're looking to turn a string into a site. That means we could define a parse function for a serialized site, which would also mean that we force people to include "https://" in every string they provide. Or, as I'm suggesting, we encourage people to set up HTTPS from the outset (mkcert really is very convenient) and we can thereby avoid the whole localhost mess.

I know that it's not ideal, but local development will require quite a bit more setup than that, because the set of aggregation servers might also need to be overridden (or we might need a developer flag to enable reports that are trivially decoded...).

Collaborator

Yeah, I was thinking we should just force https:// and use the URL parser. That seems totally fine with me, given that it makes it even clearer to callers that the site needs to be secure.
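
For comparison, the alternative floated in this thread (require an explicit `https://` prefix and lean on a URL parser) might look roughly like the Python sketch below. Python's `urllib.parse.urlsplit` is only a stand-in here: it is not the WHATWG URL parser, and `parse_serialized_site` is a hypothetical name. The diff in this PR takes the other route and parses a bare host string.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_serialized_site(value: str) -> Optional[str]:
    """Return the lowercased host of an "https://host" string, or None.

    Assumption: anything beyond scheme and host (port, credentials,
    path, query, fragment) is treated as a failure.
    """
    try:
        parts = urlsplit(value)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if parts.scheme != "https" or not parts.hostname:
        return None
    if port is not None or parts.username is not None or parts.password is not None:
        return None
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        return None
    return parts.hostname
```

Under this approach every caller has to spell out the scheme, which is the cost the Member Author points to above; the benefit is that a non-HTTPS input such as `http://localhost` is rejected explicitly rather than being impossible to express.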

+ This means that a simple string serialization of a [=host=]
+ is sufficient to identify the site.
+ The API is therefore able to use a simple [=string=]
+ to represent [=sites=].
+
+ <p class=note>
+ It is also possible for an implementation to internally represent sites
+ using just the [=host=] part of the tuple.
+ </p>
+
+ To <dfn>parse a site</dfn>,
+ returning either [=site=] or failure,
+ given a [=string=] |input|,
+ run these steps:
+
+ 1. Let |host| be the value returned by invoking [=host parser=],
+ passing |input|.
+
+ 1. If |host| is failure, return failure.
+
+ 1. Let |site| be the value returned by [=registrable domain|obtain a registrable domain=],
+ passing |host|.
+
+ 1. If |site| is null, return failure.
+
+ 1. Return a [=scheme-and-host=] tuple of ("`https`", |site|).
+
+ <p class=note>
+ This algorithm successfully produces a site from strings
+ that contain more [=domain labels=] than the [=registrable domain=].
+ For example, "`extra.example.com`" is parsed as "`example.com`".
+ </p>
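
The "parse a site" steps above can be sketched in Python as follows. This is an illustration under loud assumptions, not the specified behavior: `parse_host` is a much-simplified stand-in for the URL Standard's host parser (ASCII labels only, no IDNA, no IP-address handling), and `PUBLIC_SUFFIXES` is a tiny hypothetical substitute for the Public Suffix List that a real registrable-domain lookup would consult.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for the Public Suffix List (assumption).
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}

# Simplified label syntax: letters, digits, inner hyphens (assumption).
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")

Site = Tuple[str, str]  # (scheme, host), mirroring the scheme-and-host tuple


def parse_host(value: str) -> Optional[str]:
    """Simplified host parser: lowercase and validate each label."""
    host = value.lower()
    if all(_LABEL.fullmatch(label) for label in host.split(".")):
        return host
    return None


def registrable_domain(host: str) -> Optional[str]:
    """Longest matching public suffix plus one more label, or None."""
    labels = host.split(".")
    for i in range(len(labels)):  # i = 0 tries the longest suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


def parse_site(value: str) -> Optional[Site]:
    """Follow the five steps: host, failure check, domain, null check, tuple."""
    host = parse_host(value)
    if host is None:
        return None
    site = registrable_domain(host)
    if site is None:
        return None
    return ("https", site)
```

With these toy inputs, `parse_site("extra.example.com")` yields `("https", "example.com")`, matching the note above, while a string that already carries a scheme, such as `"https://example.com"`, fails at the host-parsing step.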
+
+
## State For Privacy Budget Management ## {#privacy-state}

[=User agents=] maintain three pieces of state
@@ -890,6 +934,7 @@ that are used to manage the expenditure of [=privacy budgets=]:
of the per-[=site=] and per-[=epoch=] [=privacy budgets=].
It is updated by [=deduct privacy budget=].

+
* The [=epoch start store=] records when each [=epoch=] starts
for [=impression sites=].
This store is initialized as a side effect
@@ -1073,21 +1118,37 @@ and given <a dictionary lt=PrivateAttributionImpressionOptions>|options|</a>:
<!-- TODO: Check the {{PermissionPolicy/save-impression}} policy. -->

1. Collect the implicit API inputs from |settings|:
- 1. The timestamp is set to |settings|'s [=environment settings object/current wall time=].
- 1. The [=impression site=] is set to the result of
+ 1. Let |timestamp| be |settings|'s [=environment settings object/current wall time=].
+ 1. The [=impression site=] |site| is set to the result of
[=obtain a site|obtaining a site=] from the [=top-level origin=].
- 1. The [=intermediary site=] is set to
+ 1. The [=intermediary site=] |intermediarySite| is set to
1. a value of `undefined` if the [=origin=] is [=same site=]
with the [=top-level origin=],
1. otherwise, the result of
[=obtain a site|obtaining a site=] from the [=origin=].
1. Validate the page-supplied API inputs:
- 1. If |options|.{{PrivateAttributionImpressionOptions/lifetimeDays}} is 0,
+ 1. If |options|.{{PrivateAttributionImpressionOptions/lifetimeDays}} is 0,
throw a {{RangeError}}.
- 1. Clamp |options|.{{PrivateAttributionImpressionOptions/lifetimeDays}} to
+ 1. Clamp |options|.{{PrivateAttributionImpressionOptions/lifetimeDays}} to
the [=user agent=]'s upper limit.
- 1. If the Private Attribution API is [[#opt-out|enabled]], save the impression
- to the [=impression store=].
+ 1. Let |conversionSite| be the result of invoking [=parse a site=]
+ with |options|.{{PrivateAttributionImpressionOptions/conversionSite}}.
+ 1. If |conversionSite| is failure, return {{SyntaxError}}.
+ 1. If the Private Attribution API is [[#opt-out|disabled]], return.
+ 1. Construct |impression| as a [=impression|saved impression=] comprising:
+ * [=impression/Filter Data=] set to
+ |options|.{{PrivateAttributionImpressionOptions/filterData}}.
+ * [=impression/Impression Site=] set to |site|.
+ * [=impression/Intermediary Site=] set to |intermediarySite|.
+ * [=impression/Conversion Sites=] set to a single element [=set=]
+ containing |conversionSite|.
+ * [=impression/Timestamp=] set to |timestamp|.
+ * [=impression/Lifetime=] set to
+ |options|.{{PrivateAttributionImpressionOptions/lifetimeDays}},
+ multiplied by a [=duration=] of one day.
+ * [=impression/Histogram Index=] set to
+ |options|.{{PrivateAttributionImpressionOptions/histogramIndex}}.
+ 1. Save |impression| to the [=impression store=].
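
To make the ordering of the reworked steps concrete, here is a rough Python sketch of the new flow. Everything in it is illustrative rather than normative: sites are plain strings, the store is a list, `parse_site` is passed in as a hypothetical callable that returns `None` on failure, the 30-day `max_lifetime_days` default is an arbitrary placeholder for the user agent's upper limit, and both error cases are modelled by raising `ValueError` (the diff throws a `RangeError` for the first and refers to `SyntaxError` for the second).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Impression:
    filter_data: int
    impression_site: str
    intermediary_site: Optional[str]
    conversion_sites: FrozenSet[str]
    timestamp: datetime
    lifetime: timedelta
    histogram_index: int


def save_impression(
    store: List[Impression],
    *,
    top_level_site: str,
    caller_site: str,
    filter_data: int,
    histogram_index: int,
    conversion_site: str,
    lifetime_days: int,
    parse_site: Callable[[str], Optional[str]],
    enabled: bool = True,
    max_lifetime_days: int = 30,  # placeholder upper limit (assumption)
    now: Optional[datetime] = None,
) -> None:
    # Implicit inputs: timestamp, impression site, optional intermediary site.
    timestamp = now or datetime.now(timezone.utc)
    intermediary = None if caller_site == top_level_site else caller_site

    # Page-supplied inputs are validated before the opt-out check.
    if lifetime_days == 0:
        raise ValueError("lifetimeDays must not be 0")
    lifetime_days = min(lifetime_days, max_lifetime_days)
    parsed = parse_site(conversion_site)
    if parsed is None:
        raise ValueError("conversionSite is not a valid site")

    # When the API is disabled, return without saving and without a signal.
    if not enabled:
        return

    store.append(
        Impression(
            filter_data=filter_data,
            impression_site=top_level_site,
            intermediary_site=intermediary,
            conversion_sites=frozenset({parsed}),
            timestamp=timestamp,
            lifetime=timedelta(days=lifetime_days),
            histogram_index=histogram_index,
        )
    )
```

Because validation runs before the `enabled` check, a caller sees the same errors whether or not the API is enabled, and a successful call returns nothing in either case.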

<p class=advisement><a method for=PrivateAttribution>saveImpression()</a>
does not return a status indicating whether the impression was recorded.
@@ -2108,18 +2169,26 @@ The privacy architecture is courtesy of the authors of [[PPA-DP]].

<pre class=anchors>
urlPrefix: https://html.spec.whatwg.org/; spec: html; type: dfn
+ text: host; url: #concept-origin-host
text: obtain a site
text: origin; url: #concept-origin
text: relevant settings object
text: same site
+ text: scheme; url: #concept-origin-scheme
+ text: scheme-and-host
text: site
text: top-level origin; url: #concept-environment-top-level-origin
text: iframe; url: #child-navigable
urlPrefix: https://infra.spec.whatwg.org/; spec: infra; type: dfn;
- text: user agent
+ text: set; url: #sets
+ text: string
+ text: user agent
urlPrefix: https://storage.spec.whatwg.org/; spec: storage; type: dfn;
text: storage key
urlPrefix: https://url.spec.whatwg.org/; spec: url; type: dfn;
+ text: domain label
+ text: host parser; url: #concept-host-parser
+ text: registrable domain; url: #host-registrable-domain
urlPrefix: https://w3ctag.github.io/privacy-principles/; type: dfn;
text: cross-site recognition
text: same-site recognition