Welcome to Big Crawler IPs! This is a serverless function that routinely checks the official crawler IP address lists published by Google and Bing and saves them to a BigQuery table.
Watch the step-by-step deployment guide here.
When deploying Big Crawler IPs, your Cloud Function needs the following environment variables (a loading sketch follows the list):
- `bqProjectId`: Your Google Cloud project ID
- `bqDataset`: The name of your BigQuery dataset
- `bqTable`: The name of your table in BigQuery
- `gServiceAccount`: Your Google Cloud service account
- `kgKey`: A unique identifier used to "authenticate" incoming HTTP requests
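
As a quick illustration, the function might read these settings like so. This is a minimal sketch assuming a Python runtime; the `load_config` helper and its fail-fast behavior are assumptions, not necessarily what this repo does:

```python
import os

# Names match the variables listed above; everything else is illustrative.
REQUIRED_VARS = ["bqProjectId", "bqDataset", "bqTable", "gServiceAccount", "kgKey"]

def load_config() -> dict:
    """Read the deployment settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```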
Once deployed as a Google Cloud Function triggered by a scheduled (cron) HTTP request, the function:
- Checks that your BigQuery table exists and creates it if it doesn't
- Gathers all of the existing IPs from your BigQuery table
- Scrapes the official Googlebot and Bingbot IP address files
- Cross-references the pre-existing IPs in BigQuery against the official scraped IPs
- Saves the IPs that were found on the official lists but not yet in BigQuery (a rough sketch of this flow follows)
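
For orientation, here is a minimal Python sketch of that flow using the `google-cloud-bigquery` and `requests` libraries. The source URLs, the single-column `ip` schema, the `key` query parameter check, and the `sync_crawler_ips` entry point are illustrative assumptions, not this repo's actual implementation:

```python
import os

import requests
from google.cloud import bigquery

# Assumed source URLs; both vendors publish their crawler ranges as JSON.
GOOGLEBOT_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
BINGBOT_URL = "https://www.bing.com/toolbox/bingbot.json"

def fetch_official_ranges() -> set:
    """Collect every IPv4/IPv6 prefix listed in the published crawler files."""
    ranges = set()
    for url in (GOOGLEBOT_URL, BINGBOT_URL):
        payload = requests.get(url, timeout=30).json()
        for prefix in payload.get("prefixes", []):
            cidr = prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix")
            if cidr:
                ranges.add(cidr)
    return ranges

def sync_crawler_ips(request):
    """HTTP entry point (Flask-style request, as in Python Cloud Functions)."""
    # Reject callers that don't present the shared key (kgKey above).
    if request.args.get("key") != os.environ["kgKey"]:
        return ("Forbidden", 403)

    client = bigquery.Client(project=os.environ["bqProjectId"])
    table_id = f'{os.environ["bqProjectId"]}.{os.environ["bqDataset"]}.{os.environ["bqTable"]}'

    # 1. Make sure the table exists; create it if it doesn't.
    schema = [bigquery.SchemaField("ip", "STRING", mode="REQUIRED")]
    client.create_table(bigquery.Table(table_id, schema=schema), exists_ok=True)

    # 2. Gather the IPs already stored in BigQuery.
    existing = {row["ip"] for row in client.query(f"SELECT ip FROM `{table_id}`").result()}

    # 3-4. Scrape the official files and keep only entries not yet stored.
    new_ips = fetch_official_ranges() - existing

    # 5. Save what was on the official lists but missing from BigQuery.
    if new_ips:
        errors = client.insert_rows_json(table_id, [{"ip": ip} for ip in sorted(new_ips)])
        if errors:
            return (f"Insert errors: {errors}", 500)
    return (f"Added {len(new_ips)} new IP ranges", 200)
```

One caveat on this sketch: streaming inserts into a freshly created table can fail for a short window while BigQuery propagates the table's existence, so a production version might retry or use a load job instead.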