A containerized headless-browser audit that visits a list of *.umn.edu sites
and reports cookies that are incorrectly scoped to .umn.edu. The output is a
CSV you can use to prioritize remediation and target communications to site
owners.
Many *.umn.edu sites still set analytics/tracking cookies at the root .umn.edu
domain. That means those cookies are sent to unrelated subdomains, inflating
request headers and triggering HTTP 431 “Request Header Fields Too Large” errors.
UMN guidance is to scope cookies to each subdomain. This tool automates detection
at scale so teams can quickly prioritize and verify fixes.
This project grew out of a U of M Tech People Co-working discussion: despite
several communications, mis-scoped cookies persist and users still hit 431s.
The scanner provides a fast, repeatable way to find problems across many sites
and confirm remediation (e.g., GA/GTM cookie_domain updates).
For each site in `data/sites.txt`, the tool:
- Launches a clean, headless Chromium session inside Docker.
- Loads the site and waits briefly for tags to set cookies.
- Records all cookies and flags those scoped to `umn.edu` or `.umn.edu`, using the same suspect lists as the U of M Library's CookieCutter code.
- Optionally checks the CookieCutter page to capture its verdict.
- Writes results to `data/report.csv`.
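The scope check at the heart of step three can be sketched in Ruby. This is an illustrative helper, not the tool's actual code; the `offending?` name and the sample cookies are hypothetical:

```ruby
# A cookie is "offending" when its Domain attribute is the bare root
# (umn.edu / .umn.edu) rather than the site's own hostname.
ROOT_DOMAINS = ['umn.edu', '.umn.edu'].freeze # suspect list, per CookieCutter

def offending?(cookie_domain)
  ROOT_DOMAINS.include?(cookie_domain.to_s.downcase)
end

# Illustrative cookies as a browser session might record them.
cookies = [
  { name: '_ga',  domain: '.umn.edu' },        # root-scoped: flagged
  { name: 'SESS', domain: 'onestop.umn.edu' }, # correctly scoped
]

offending = cookies.select { |c| offending?(c[:domain]) }
puts offending.map { |c| c[:name] }.join(', ') # => _ga
```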
Each row of the report has the columns `site`, `offending?`, `offending_cookie_count`, `offending_cookie_names`, `offending_cookie_sizes`, `offending_total_size`, `cookiecutter_ok`, `cookiecutter_excerpt`, `remediation`, and `all_cookies_json`.
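Because the report is plain CSV, it is easy to post-process. A minimal Ruby sketch, assuming the `offending?` column holds the string `true`/`false` (the sample rows below are illustrative, with the default newline separator inside quoted cells):

```ruby
require 'csv'

# Illustrative excerpt of a report (only a few columns shown).
report = <<~CSV
  site,offending?,offending_cookie_count,offending_cookie_names
  onestop.umn.edu,true,2,"_ga
  _ga_XXXX"
  asr.umn.edu,false,0,
CSV

rows = CSV.parse(report, headers: true)
offenders = rows.select { |r| r['offending?'] == 'true' }
offenders.each { |r| puts "#{r['site']}: #{r['offending_cookie_count']} cookies" }
# => onestop.umn.edu: 2 cookies
```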
The script prints a single line per site as it finishes, plus start and finish lines:
- ✅ ok — no offending cookies detected
- ❌ offending=N — N offending cookies detected
- 🚫 error — navigation or evaluation error for the site
ANSI colors are enabled by default; use `--no-color` to disable them.
- Docker (recent version)
- macOS (optional): the `open` command is used to display the CSV automatically.
Everything else (Ruby, gems, Chromium) is installed inside the container image.
```
.
├── Dockerfile
├── Gemfile
├── umn_cookie_audit.rb
├── data/
│   └── sites.txt
└── script/
    └── run
```
- Put one site per line.
- URLs may be written with or without `https://`.
- Lines starting with `#` are treated as comments and ignored.
Example:

```
onestop.umn.edu
https://asr.umn.edu
roomsearch.umn.edu
```
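The parsing rules above can be sketched in Ruby (a hypothetical `normalize_sites` helper, not the tool's actual code):

```ruby
# Skip blanks and '#' comments, and normalize bare hosts to https:// URLs.
def normalize_sites(lines)
  lines.map(&:strip)
       .reject { |l| l.empty? || l.start_with?('#') }
       .map { |l| l.start_with?('http') ? l : "https://#{l}" }
end

lines = ['# production sites', 'onestop.umn.edu', 'https://asr.umn.edu', '']
puts normalize_sites(lines)
# => https://onestop.umn.edu
#    https://asr.umn.edu
```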
Use the convenience script:
```
script/run
```

This will:

- Move to the project root.
- Build the Docker image: `docker build -t umn-cookie-audit .`
- Run the container, mounting `./data`: `docker run --rm -v "$PWD/data:/data" umn-cookie-audit`
- On macOS, open `data/report.csv` automatically.
If you prefer the raw Docker commands:
```
docker build -t umn-cookie-audit .
docker run --rm -v "$PWD/data:/data" umn-cookie-audit
```

You can pass flags to the Ruby tool by appending them after the image name:

```
docker run --rm -v "$PWD/data:/data" umn-cookie-audit \
  ruby umn_cookie_audit.rb --sites /data/sites.txt --output /data/report.csv \
  --delay 6 --timeout 30 --no-verify-cookiecutter --separator newline --pool 4
```

- `--sites`: Path to the input list (default `/data/sites.txt`).
- `--output`: Path to the output CSV (default `/data/report.csv`).
- `--delay`: Seconds to wait after navigation to allow tags to set cookies (default `4`).
- `--timeout`: Navigation timeout in seconds (default `25`).
- `--no-verify-cookiecutter`: Skip the post-visit CookieCutter page check.
- `--separator NAME`: Joiner for multi-value CSV cells: `newline` (default), `comma`, or `pipe`.
- `--no-color`: Disable ANSI colors in console output.
- `--pool N`: Number of parallel workers (default `4`). To run single-threaded, set `--pool 1`.
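The `--pool N` behavior can be illustrated with a simple thread pool in Ruby. This is a sketch only; the tool's internals may differ, and `each_in_pool` is a hypothetical helper:

```ruby
# Drain a queue of sites with N worker threads; each worker pulls the next
# site until the queue is empty (non-blocking pop raises when exhausted).
def each_in_pool(items, workers: 4)
  queue = Queue.new
  items.each { |i| queue << i }
  threads = Array.new(workers) do
    Thread.new do
      loop do
        site = queue.pop(true) rescue break # ThreadError => queue drained
        yield site
      end
    end
  end
  threads.each(&:join)
end

results = Queue.new
each_in_pool(%w[a.umn.edu b.umn.edu c.umn.edu], workers: 2) { |s| results << s }
puts results.size # => 3
```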
- Publish your GA/GTM/Drupal changes. Changes to `cookie_domain` and related settings do not take effect until published in Google Tag Manager (and in Drupal's admin UI, if applicable). Bust caches (e.g., Varnish) afterward, then re-run the scan.
- Some sites set cookies only after interaction or deeper navigation. Increase `--delay` and test again if CookieCutter reports different results than this scan.
- For very large batches, increase `--pool` to use more CPU, but watch RAM/CPU usage and external rate limits. In practice, a pool of 6-8 often outperforms higher values; your mileage may vary depending on your processor/core count.