Latest commit

beeae7d · Dec 1, 2024

fastly

Fastly log processing

This flake provides a systemd timer (./cron.sh) that every week:

  • Ingests raw Fastly logs for {cache,channels,tarballs,releases}.nixos.org (which are very big) and aggregates them into a smaller AWS Athena database.

    This is performed by ./ingest-raw-logs.sh.

  • Runs a number of SQL queries against the Athena database and stores the results in S3.

    This is performed by ./run-queries.sh.

AWS Athena database

The Athena database is stored in the NixOS Foundation AWS account. To get the schema, run

# aws athena list-table-metadata --region eu-west-1 --catalog-name AwsDataCatalog --database-name default

It has the following external tables:

  • requests: The raw Fastly logs, stored in s3://fastly-logs-20220622145016462800000001/ as compressed JSON records. Note that this bucket has a lifecycle rule that moves logs to Glacier after a few weeks, and Athena does not process logs that are in Glacier.

  • asn_list: A list of ASNs. This can be updated by running ./update-asn-list.sh.

  • hosting-asns: A list of ASNs belonging to hosting/cloud providers.

  • all_paths: The set of all store paths known in the hydra.nixos.org database. This is used to expand the hash part of .narinfo requests (e.g. 8kbx6s9nn7060zsdms3br0mk7bjrvbij) to store paths (e.g. /nix/store/8kbx6s9nn7060zsdms3br0mk7bjrvbij-coreutils-full-9.0).

    FIXME: describe how to update.

  • release_paths: All the store paths belonging to NixOS evals in hydra.nixos.org, as {project, jobset, eval, release_name, build, output, path} tuples.

    FIXME: describe how to update.
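The hash-to-store-path expansion described for all_paths can be illustrated outside Athena. The following is only a sketch: the function names narinfo_hash and expand_hash are hypothetical, and it assumes a plain-text file with one store path per line rather than the real Athena table.

```shell
# Sketch only: the real lookup happens in Athena against the all_paths table.

# Extract the hash part from a .narinfo request path,
# e.g. "/8kbx6s9nn7060zsdms3br0mk7bjrvbij.narinfo" -> "8kbx6s9nn7060zsdms3br0mk7bjrvbij".
narinfo_hash() {
  local name="${1##*/}"            # drop everything up to the last slash
  printf '%s\n' "${name%.narinfo}" # drop the .narinfo suffix
}

# Print the first store path that starts with "/nix/store/<hash>-".
# $2 is a hypothetical file listing one store path per line.
expand_hash() {
  local hash="$1" paths_file="$2"
  grep -m1 -F -- "/nix/store/${hash}-" "$paths_file"
}

# Example usage (paths.txt is an assumed local export of the path list):
#   expand_hash "$(narinfo_hash /8kbx6s9nn7060zsdms3br0mk7bjrvbij.narinfo)" paths.txt
```

Requests whose hash is not in the list produce no output and a non-zero exit status from grep, which mirrors a failed join in the database.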

The ingestion script populates the following tables stored in s3://nixos-athena/fastly-logs-processed/:

  • urls: For each host/day/url, the total number of requests, bytes and elapsed microseconds. This only includes info about successful (2xx/3xx) requests.

  • clients: For each host/day/ASN/country/region, the total number of requests, bytes and elapsed microseconds.

  • nix_cache_info: For each day/ASN/country/region/user-agent, the number of requests for nix-cache-info.
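The aggregation behind the urls table amounts to a group-by over host/day/url with a status filter. The sketch below illustrates that shape with awk. It assumes a hypothetical tab-separated input (host, day, url, status, bytes, elapsed microseconds); the real ingestion runs in Athena on the JSON records, whose field names are not shown here, and aggregate_urls is an illustrative name.

```shell
# Sketch only: sums requests, bytes and elapsed microseconds per host/day/url,
# keeping only successful (2xx/3xx) responses.
# Assumed input layout: host<TAB>day<TAB>url<TAB>status<TAB>bytes<TAB>elapsed_us
aggregate_urls() {
  awk -F'\t' -v OFS='\t' '
    $4 >= 200 && $4 < 400 {
      key = $1 OFS $2 OFS $3
      requests[key] += 1
      bytes[key]    += $5
      elapsed[key]  += $6
    }
    END {
      for (key in requests)
        print key, requests[key], bytes[key], elapsed[key]
    }
  ' | sort
}

# Example usage (sample.tsv is an assumed local file in the layout above):
#   aggregate_urls < sample.tsv
```

The clients and nix_cache_info tables follow the same pattern with different grouping keys (ASN, country, region, user agent).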

Reports

Currently the following reports are created every week: