Generate pages for trackers with insufficient data (instead of 404) #261

Open
philipp-classen opened this issue Jan 7, 2022 · 6 comments

Comments

@philipp-classen
Member

To derive meaningful data, WhoTracks.me only generates pages for the trackers with the most traffic. For the remaining trackers, we should still generate a nicer page than a 404 (Page not found). For instance, a fallback page could explain that the domain is a potential tracker, but that not enough data is available to confirm it.
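
To make the idea concrete, here is a minimal sketch of how the fallback decision could look. It is not taken from the WhoTracks.me code base: the function names `plan_pages` and `render_fallback_page`, the page wording, and the HTML layout are all hypothetical.

```python
from html import escape


def render_fallback_page(domain: str) -> str:
    """Return a minimal HTML page for a tracker without enough data.

    Hypothetical layout; the real site templates will differ.
    """
    safe = escape(domain)  # avoid injecting markup from the domain string
    return (
        "<!doctype html>\n"
        f"<title>{safe} | insufficient data</title>\n"
        f"<h1>{safe}</h1>\n"
        "<p>This domain is a potential tracker, but not enough data "
        "is available to confirm.</p>\n"
    )


def plan_pages(known_trackers, trackers_with_data):
    """Map every known tracker to 'full' or 'fallback' (never a 404)."""
    with_data = set(trackers_with_data)
    return {
        tracker: ("full" if tracker in with_data else "fallback")
        for tracker in sorted(known_trackers)
    }
```

With something like this, every tracker in the input set gets some page, and only the ones with enough traffic get the full statistics.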

@ghostwords

ghostwords commented Jan 7, 2022

Does analytics.tiktok.com fall into this bucket?

@philipp-classen
Member Author

As a first step, I would expect it to generate pages for everything listed in the input file trackerdb.sql. Currently, I don't see references to analytics.tiktok.com there, so it would not be covered. I wonder if analytics.tiktok.com is exclusively used on the tiktok domains (that would explain why it is not automatically picked up, as the system distinguishes between first-party and third-party tracking).

For reference, this is where the file can be found, though its content might change over time:

aws s3 cp --no-sign-request s3://data.whotracks.me/trackerdb.sql .
grep analytics.tiktok.com trackerdb.sql # no results
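
The first-party versus third-party distinction mentioned above can be illustrated with a rough sketch. This is an assumption about the general idea, not the actual WhoTracks.me logic: `site_key` and `is_third_party` are hypothetical helpers, and comparing the last two labels of the hostname is a simplification (a real implementation would consult the Public Suffix List to handle suffixes such as `co.uk`).

```python
def site_key(host: str) -> str:
    """Naive site identifier: the last two labels of the hostname.

    Simplification for illustration only; it is wrong for multi-label
    public suffixes such as 'co.uk'.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, request_host: str) -> bool:
    """A request is third-party if it goes to a different site than the page."""
    return site_key(page_host) != site_key(request_host)


# A request from a TikTok page to analytics.tiktok.com is first-party,
# while the same request from an unrelated site is third-party.
print(is_third_party("www.tiktok.com", "analytics.tiktok.com"))
print(is_third_party("shop.example.com", "analytics.tiktok.com"))
```

If the domain were only ever requested from tiktok.com pages, every observation would fall into the first case and it would never show up as a third-party tracker.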

@ghostwords

> I wonder if analytics.tiktok.com is exclusively used on the tiktok domains

Good question, it's not though. See https://ads.tiktok.com/help/article?aid=9663 and https://publicwww.com/websites/%22analytics.tiktok.com%22/ (for instance).

@ghostwords

You should have started picking it up in the summer of 2020.

@philipp-classen
Member Author

philipp-classen commented Jan 7, 2022

Thank you, I also found it in our raw data, listed among the third-party requests. I will need to investigate why it is not classified as a tracker. I created a separate issue for it: #262

@ghostwords

Excellent, thank you!
