Tech Report: Technologies - major.minor versions granularity #48
@tunetheweb @rviscomi FYI
|
Some test cases for the version pattern:

    SELECT
      version,
      REGEXP_EXTRACT(version, r'\d+(?:\.\d+)?') AS major_minor
    FROM UNNEST(['1.2.3', '01976.2.83', '0003.3.4', '0.0.1', '1.2', 'version 5.1.2', '8']) AS version
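For anyone who wants to check the pattern without running BigQuery, here is a minimal Python sketch of the same test cases. It assumes Python's `re.search` behaves like `REGEXP_EXTRACT` for this particular pattern (first match, whole match returned because there is no capturing group); the helper name `major_minor` is hypothetical and not part of this PR.

```python
import re
from typing import Optional

# Same pattern as in the query above: one run of digits, optionally
# followed by a dot and a second run of digits (major[.minor]).
MAJOR_MINOR = re.compile(r"\d+(?:\.\d+)?")


def major_minor(version: str) -> Optional[str]:
    """Return the first major[.minor] match in the string, or None if there is none."""
    match = MAJOR_MINOR.search(version)
    return match.group(0) if match else None


cases = ['1.2.3', '01976.2.83', '0003.3.4', '0.0.1', '1.2', 'version 5.1.2', '8']
results = {version: major_minor(version) for version in cases}

for version, extracted in results.items():
    print(f"{version!r:>16} -> {extracted!r}")
```

Note that leading zeros are kept as declared (`'0003.3.4'` yields `'0003.3'`), and a major-only string such as `'8'` still matches, which is the relaxed behavior discussed in this thread.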
|
@rviscomi @tunetheweb After expanding the pattern to major + minor, it's now obvious how messy the data is. Examples of the technologies that are omitted in the stricter version: I was thinking of doing something like |
I think relaxed is fine - we're just echoing what the site declared its version to be. When we aggregate technologies by version, the most popular ones will bubble up to the top anyway. |
Then, if there are no more questions, it's ready to be merged. |
Or considering we don't want to look into a long tail, maybe we limit to top 50 versions per technology? |
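To make the "top 50 versions per technology" idea concrete, here is a rough Python sketch of the cut-off. It is only an illustration: the row shape `(technology, version, origins)`, the function name `top_versions`, and the sample counts are all made up for the example, and the real change would be done in the SQL aggregation.

```python
from collections import defaultdict


def top_versions(rows, limit=50):
    """Keep the `limit` most adopted versions for each technology.

    rows: iterable of (technology, version, origins) tuples (hypothetical shape).
    Ties are broken by version string so the output is deterministic.
    """
    by_technology = defaultdict(list)
    for technology, version, origins in rows:
        by_technology[technology].append((version, origins))
    return {
        technology: sorted(versions, key=lambda pair: (-pair[1], pair[0]))[:limit]
        for technology, versions in by_technology.items()
    }


# Made-up sample counts, for illustration only.
sample_rows = [
    ("jQuery", "3.6", 120),
    ("jQuery", "1.12", 300),
    ("jQuery", "2.2", 40),
    ("React", "18.2", 90),
]
trimmed = top_versions(sample_rows, limit=2)
print(trimmed)
```

With a limit like this, the long tail of odd version strings drops out while the popular versions stay, which matches the point above that they bubble up to the top anyway.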
Related to HTTPArchive/httparchive.org#984
As the aggregation changes, we have new schemas and new tables for the tech report.
I placed them in the `reports` dataset:
- `tech_crux` (successor of `core_web_vitals.technologies`)
- `tech_report_adoption`
- `tech_report_categories`
- `tech_report_core_web_vitals`
- `tech_report_lighthouse`
- `tech_report_page_weight`
- `tech_report_technologies`
- `tech_report_versions`
Notes:
- removed a few columns from `tech_crux` (as compared to `core_web_vitals.technologies`):
  - `category`
  - `origins_with_good_cwv_2023` and `origins_with_good_cwv_2024` - deduplicated in `origins_with_good_cwv`
- removed the empty `similar_technologies` column from `technologies`
- all the metrics have an 'ALL' version that aggregates at the technology level and is expected to match the current values:
![Screenshot 2025-01-26 at 23 55 37](https://private-user-images.githubusercontent.com/1611259/406769126-de2533ae-704c-4754-a4de-0f50edf98eb8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5Mzg0NzgsIm5iZiI6MTczODkzODE3OCwicGF0aCI6Ii8xNjExMjU5LzQwNjc2OTEyNi1kZTI1MzNhZS03MDRjLTQ3NTQtYTRkZS0wZjUwZWRmOThlYjgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDdUMTQyMjU4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTY0OTU2Y2ZjZmY1ZWY3YTkwOGNhZjAwNDVjYTE3YWFkNTZlNDYxNDc2Mzg2NGJlMzkyZjUzZThiZDM1MjAyNiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.qig1lwHV_czd51cxIW84s1tBDlRnV1wCEN1htNBdtrc)
corresponding to the current approach
![Screenshot 2025-01-27 at 00 05 01](https://private-user-images.githubusercontent.com/1611259/406769602-347db9a4-07ff-41a6-9b15-e91f8baa0a59.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5Mzg0NzgsIm5iZiI6MTczODkzODE3OCwicGF0aCI6Ii8xNjExMjU5LzQwNjc2OTYwMi0zNDdkYjlhNC0wN2ZmLTQxYTYtOWIxNS1lOTFmOGJhYTBhNTkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDdUMTQyMjU4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDJiNjlmODk2NTNlMWE3MmFhNTM5YTY4ZmNhMzM2Mzc2YjRmMmJiNDIyODdhZTk0ODgwODI0ZDU3ZWZhYmMzMSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.p1XZ_9Ynqzb5auSrNM_If5hgtVXp0fz93bCONx5UyRY)
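As a rough illustration of why the 'ALL' row is a separate aggregate rather than a sum of the per-version rows, here is a small Python sketch. It assumes adoption is counted as distinct origins, which is an assumption on my side for the example; the function name `adoption_with_all`, the input shape, and the sample data are hypothetical and do not come from the actual queries.

```python
from collections import defaultdict


def adoption_with_all(detections):
    """Count distinct origins per (technology, version), plus an 'ALL' row.

    detections: iterable of (technology, version, origin) tuples (hypothetical shape).
    Assumes adoption means the number of distinct origins.
    """
    origins = defaultdict(set)
    for technology, version, origin in detections:
        origins[(technology, version)].add(origin)
        # The 'ALL' row collects every origin once, whatever version it reports.
        origins[(technology, "ALL")].add(origin)
    return {key: len(value) for key, value in origins.items()}


# Made-up detections: a.com reports two different versions.
sample_detections = [
    ("WordPress", "6.4", "a.com"),
    ("WordPress", "6.5", "b.com"),
    ("WordPress", "6.5", "a.com"),
    ("WordPress", "6.4", "a.com"),
]
counts = adoption_with_all(sample_detections)
print(counts)
```

In this sample the per-version counts add up to 3 while 'ALL' is 2, because one origin reports two versions; this is why the 'ALL' row, not the sum of versions, is the one to compare against the current technology-level values.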
`tech_report_versions` has full adoption data from `crawl.pages`, and `tech_report_adoption` has the smaller absolute values because of the JOIN with CrUX.
Example of the technology versions: