Try the production version here: https://vitalglobalhub.com/privacy-scanner (Generates a full legal analysis PDF in ~45 seconds)
Serverless Privacy-as-a-Service (PaaS) engine for automated forensic auditing and legal analysis.
Modern web ecosystems are plagued by "Shadow IT". Marketing teams inject third-party scripts (Facebook Pixel, TikTok Ads, Hotjar) without legal oversight, causing companies to unknowingly violate GDPR (Art. 25) and CCPA regulations. Manual auditing is slow, expensive, and technically shallow.
PrivAI Sentinel is a high-performance, containerized microservice that performs deep forensic scanning of websites. It combines Puppeteer Stealth, specifically tuned to bypass bot detection, with Google Gemini 2.5 Pro to generate lawyer-grade compliance reports in seconds.
It serves as a powerful Lead Magnet, requiring an email exchange to release the high-value PDF report.
- Deep Forensic Crawling: Uses recursive logic to scan up to 12 pages per session, simulating real user behavior (scrolling, clicking) to trigger lazy-loaded trackers.
- Generative Legal Analysis: Integrates Gemini 2.5 Pro to interpret technical findings (cookies/scripts) and write a contextualized legal opinion based on GDPR standards.
- Stealth Mode: Powered by `puppeteer-extra-plugin-stealth` to evade anti-bot systems (Cloudflare/WAF) and detect hidden trackers.
- Automated Lead Capture: Synchronous integration with Mailchimp API to store leads before processing the audit (guaranteed ROI).
- Dynamic PDF Generation: Creates a professional, branded PDF with "Privacy Trust Scores", evidence screenshots, and categorized cookie lists.
- Serverless Optimized: Architecture tuned for Google Cloud Run (stateless, memory-efficient, `--disable-gpu` flags).
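The 12-page crawl budget can be illustrated with a small sketch. This is not the project's crawler: it uses an iterative breadth-first queue rather than recursion, and `crawlSite` and the injected `extractLinks` callback are hypothetical names. In a real scanner, `extractLinks` would drive a browser page and return the links found on it.

```javascript
// Page budget taken from the feature list above.
const MAX_PAGES = 12;

// Hypothetical helper: visit same-origin pages breadth-first until the budget is spent.
async function crawlSite(startUrl, extractLinks, maxPages = MAX_PAGES) {
  const origin = new URL(startUrl).origin;
  const visited = new Set();
  const queue = [new URL(startUrl).href];

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    // extractLinks is injected so the traversal logic stays independent of the browser.
    const links = await extractLinks(url);
    for (const link of links) {
      let next;
      try {
        next = new URL(link, url); // resolve relative hrefs against the current page
      } catch {
        continue; // skip malformed hrefs
      }
      next.hash = ""; // treat #fragments as the same page
      if (next.origin === origin && !visited.has(next.href)) {
        queue.push(next.href);
      }
    }
  }
  return [...visited];
}
```

Off-site links are dropped so the budget is spent only on the audited domain, which is an assumption about the desired behavior rather than something the feature list states.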
The system follows a strict linear pipeline to ensure data integrity and resource optimization:
graph TD
A[Client Request /scan] -->|URL + Email| B(Mailchimp Lead Capture)
B -->|Success| C{Puppeteer Cluster}
C -->|Recursive Crawl| D[Extract Cookies & Scripts]
C -->|Screenshot| E[Visual Evidence]
D -->|JSON Payload| F[Gemini 2.5 Pro AI]
F -->|Legal Opinion| G[PDF Generation Engine]
G -->|Stream Buffer| H[Downloadable Report]
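The linear ordering in the diagram can be sketched as a single async function. This is an illustrative outline under stated assumptions, not the code in `index.js`: `runScan` and the four step functions (`captureLead`, `crawl`, `analyze`, `renderPdf`) are hypothetical names, injected so the ordering can be seen without Puppeteer, Gemini, or Mailchimp.

```javascript
// Hypothetical orchestration of the pipeline: lead capture, crawl, AI analysis, PDF.
async function runScan({ url, email }, steps) {
  // 1. Lead capture runs first; if it throws, nothing else is executed.
  await steps.captureLead(email);

  // 2. Crawl the site and collect findings (cookies, scripts, screenshots).
  const findings = await steps.crawl(url);

  // 3. Send the findings to the AI step as a JSON payload.
  const opinion = await steps.analyze(JSON.stringify(findings));

  // 4. Render the report from the findings and the generated opinion.
  return steps.renderPdf({ url, findings, opinion });
}
```

Because each step awaits the previous one, a Mailchimp failure stops the run before any browser is launched, which matches the "Success" edge in the diagram.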
To run Puppeteer in serverless environments (like Cloud Run or AWS Lambda) with limited RAM:
- Flag Optimization: Uses `--disable-dev-shm-usage` and `--disable-gpu`.
- Resource Cleanup: Explicit stream closing and temporary file deletion (`fs.unlinkSync`) immediately after response delivery.
- Scoped Variables: Database variables are strictly scoped to prevent memory leaks during high concurrency.
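A minimal sketch of the flag list and the cleanup pattern follows. The helper name `withTempReport` and the temp-file naming are assumptions made for illustration; only the two Chrome flags and the use of `fs.unlinkSync` come from the list above.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// The two Chrome flags named above; they would be passed as puppeteer.launch({ args: CHROME_ARGS }).
const CHROME_ARGS = ["--disable-dev-shm-usage", "--disable-gpu"];

// Hypothetical helper: write a report to a temp file, hand it to `deliver`, then always delete it.
async function withTempReport(buffer, deliver) {
  const file = path.join(os.tmpdir(), `report-${process.pid}-${Date.now()}.pdf`);
  fs.writeFileSync(file, buffer);
  try {
    return await deliver(file);
  } finally {
    // Runs on success and on failure, so no file is left behind on the instance.
    fs.unlinkSync(file);
  }
}
```

Putting the deletion in `finally` keeps the container's writable storage from filling up when a delivery fails midway, which matters on instances with limited RAM if that storage is memory-backed.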
- Node.js 18+
- Docker (optional, for containerization)
- Google Cloud Platform Account (for Gemini API)
- Mailchimp Account
git clone https://github.com/DiegoRibeirodeSouza/privacy-scanner-cloud.git
cd privacy-scanner-cloud
npm install

Ensure the open-source cookie database is present in the root directory:

Note: This project relies on `open-cookie-database.json` for heuristic classification.
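As a rough illustration of what heuristic classification against the cookie database could look like, the sketch below matches cookie names against a small inline sample. The data shape (vendor name mapped to entries with `cookie`, `category`, and `wildcardMatch` fields) is an assumption about the JSON file, and `classifyCookie` is a hypothetical helper, not a function from this project.

```javascript
// Inline sample standing in for open-cookie-database.json (assumed shape).
const sampleDb = {
  "Google Analytics": [
    { cookie: "_ga", category: "Analytics", wildcardMatch: "0" },
    { cookie: "_gat_", category: "Analytics", wildcardMatch: "1" },
  ],
  Facebook: [{ cookie: "_fbp", category: "Marketing", wildcardMatch: "0" }],
};

// Hypothetical helper: return the vendor and category for a cookie name.
function classifyCookie(name, db) {
  for (const [vendor, entries] of Object.entries(db)) {
    for (const entry of entries) {
      // Assumed convention: wildcardMatch "1" means the entry is a name prefix.
      const isPrefixRule = entry.wildcardMatch === "1";
      const hit = isPrefixRule ? name.startsWith(entry.cookie) : name === entry.cookie;
      if (hit) return { name, vendor, category: entry.category };
    }
  }
  // Names not in the database are reported rather than silently dropped.
  return { name, vendor: null, category: "Unknown" };
}

console.log(classifyCookie("_fbp", sampleDb));
```

Under these assumptions, adding a tracker to the JSON file is enough to make it appear in the categorized cookie list, with no code change required.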
Create a .env file in the root:
PORT=8080
# AI Configuration
GEMINI_API_KEY=your_google_gemini_key
# Marketing Integration
MAILCHIMP_API_KEY=your_mailchimp_key
MAILCHIMP_LIST_ID=your_list_id
# Mailchimp data center prefix (e.g., us6, us7)
MAILCHIMP_SERVER=us7
# Puppeteer (Optional for local dev)
# PUPPETEER_EXECUTABLE_PATH=/path/to/chrome

npm start
# Server starts at http://localhost:8080

This project includes a production-ready Dockerfile.
# Build the image
docker build -t privai-sentinel .
# Run container
docker run -p 8080:8080 --env-file .env privai-sentinel

Endpoint: GET /scan
| Parameter | Type | Description |
|---|---|---|
| `url` | string | The target website URL (e.g., https://example.com) |
| `email` | string | User email for the report delivery (Lead Capture) |
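When the target URL carries its own query string, encoding both parameters avoids ambiguity in the request. The sketch below builds the request URL with the standard `URL` and `URLSearchParams` classes; `buildScanUrl` is a hypothetical helper name, and whether the server requires encoded values is not stated here.

```javascript
// Hypothetical helper: build a /scan request URL with safely encoded parameters.
function buildScanUrl(baseUrl, targetUrl, email) {
  const endpoint = new URL("/scan", baseUrl);
  // URLSearchParams percent-encodes characters such as "&", "?", and "@".
  endpoint.search = new URLSearchParams({ url: targetUrl, email }).toString();
  return endpoint.href;
}

console.log(buildScanUrl("http://localhost:8080", "https://example.com/?a=1&b=2", "audit@test.com"));
```

Without encoding, the `&b=2` in the target URL would be read as a separate parameter of the `/scan` request instead of part of the site being audited.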
Example Request:
http://localhost:8080/scan?url=https://techcrunch.com&email=audit@test.com

This project is optimized for Google Cloud Run (Serverless). It requires at least 2Gi of RAM to handle Puppeteer and headless Chrome efficiently.
To deploy your own instance using the Google Cloud CLI, run:
gcloud run deploy vgh-scanner \
--source . \
--region us-central1 \
--memory 2Gi \
--cpu 1 \
--timeout 300 \
--set-env-vars GEMINI_API_KEY="YOUR_GEMINI_KEY_HERE" \
--set-env-vars MAILCHIMP_API_KEY="YOUR_MAILCHIMP_KEY_HERE" \
--set-env-vars MAILCHIMP_LIST_ID="YOUR_LIST_ID" \
--set-env-vars MAILCHIMP_SERVER="us7"

Note: Make sure you have the Google Cloud SDK installed and authenticated (`gcloud auth login`).
We welcome contributions from the community! This project aims to democratize privacy auditing, and we need help in several areas.
- Cookie Database Expansion:
  - We use `open-cookie-database.json` to identify trackers.
  - Help needed: Add more known cookies (TikTok, LinkedIn, Taboola) to the JSON file to improve detection accuracy.
- AI Prompt Engineering:
  - Refine the prompt in `index.js` to generate even more precise legal insights for GDPR/LGPD/CCPA.
- Frontend Improvements:
  - The current frontend is a simple HTML/CSS page. We are looking for React/Vue.js developers to build a dashboard with scan history.
- Performance:
  - Optimize Puppeteer memory usage to run faster on serverless environments.
- Fork the repo.
- Create a branch (`git checkout -b feature/amazing-feature`).
- Commit your changes.
- Open a Pull Request.
DISCLAIMER: This software generates a technical analysis based on momentary sampling of a website's public-facing code. The "Legal Analysis" provided by the AI component is for informational purposes only and does not constitute legal advice. The generated PDF report does not guarantee immunity from regulatory fines (GDPR/CCPA/LGPD). Full compliance requires a comprehensive audit by a certified legal professional.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.