Automated agent to scrape book data from BookDP.com.au, enrich it with AI insights, calculate costs, and dispatch it to your favorite productivity tools via Make.com!
This project is a TypeScript-based automation agent designed to:
- Scrape Book Data: Utilizes Puppeteer & Browser Pool to dynamically browse BookDP.com.au based on a user-provided thematic keyword (e.g., "australian history," "sci-fi novels").
- AI Enrichment: Leverages an AI model (e.g., OpenAI's GPT) to add value to the scraped data, such as generating summaries and relevance scores (see the enrichment sketch below).
- Cost Calculation: Processes pricing information and derives value scores, discount amounts, and discount percentages.
- Productivity Integration: Sends the structured, enriched data to a Make.com webhook, enabling seamless integration with tools like Google Sheets, Trello, Slack, Notion, etc.
The core goal is to demonstrate robust web automation, AI integration, efficient data processing, and third-party workflow automation.
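The enrichment step could look roughly like the sketch below, which asks OpenAI for a short summary and a relevance score for one scraped book. This is a minimal sketch rather than the repository's actual code: the model name, the prompt, the 0-100 score scale, and the `EnrichedFields` shape are all assumptions.

```ts
import OpenAI from "openai";

// Hypothetical result shape; the real project may use different field names.
interface EnrichedFields {
  summary: string;
  relevance_score: number; // assumed 0-100, relative to the search theme
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function enrichBook(
  theme: string,
  title: string,
  description: string,
): Promise<EnrichedFields> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumption; any chat-completion model works
    messages: [
      {
        role: "user",
        content:
          `Theme: "${theme}"\nTitle: "${title}"\nDescription: ${description}\n\n` +
          `Reply with JSON only: {"summary": "<2 sentences>", "relevance_score": <0-100>}`,
      },
    ],
  });

  // Parse the model's JSON reply; production code should validate and retry here.
  return JSON.parse(completion.choices[0].message.content ?? "{}") as EnrichedFields;
}
```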
- Keyword-based Scraping: Dynamically fetches books related to any theme.
- Robust Web Automation: Powered by Puppeteer with browser-pool for concurrent, load-balanced browser instance management and scraping.
- Concurrent Processing: Designed to scrape and process multiple book entries concurrently for significant speed improvements.
- AI-Powered Insights: Enriches raw data with intelligent summaries, classifications, or other AI-generated content.
- Flexible Cost Analysis: Basic cost extraction and calculation logic.
- Seamless Make.com Integration: Easily pipe data into your existing workflows.
- Typed & Modular: Built with TypeScript for better maintainability and type safety.
- Language: TypeScript
- Runtime: Node.js
- Web Scraping: Browser Pool & Puppeteer
- AI Integration: OpenAI API
- Workflow Automation: Make.com
- Messaging: Apache Kafka
- Containerization: Docker
- A Make.com account and a configured scenario with a Webhook trigger.
- OpenAI API Key
Clone the Repository:
- `git clone https://github.com/talibackend/anchor-browser-test.git`
- `cd anchor-browser-test`
Set Up Environment Variables:
- `cp .env.example .env`
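The authoritative list of variables is in `.env.example`. As a rough guide only (every name here except `MAKE_WEBHOOK_URL` is an assumption, and the values are placeholders), the file will contain entries along these lines:

```
# Placeholder values; check .env.example for the real variable names
OPENAI_API_KEY=sk-your-key-here
MAKE_WEBHOOK_URL=https://hook.make.com/your-unique-id
DATABASE_URL=postgres://user:pass@localhost:5432/books
KAFKA_BROKERS=localhost:9092
```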
Build & Start the Project with Docker:
- `docker build -t anchor_browser_test:latest .`
- `docker-compose up -d`

✅ The Swagger docs should be available at http://localhost:9898/doc
The agent is designed as a web service with asynchronous background processing.
API Layer (e.g., using Express.js):
- Handles incoming HTTP requests.
- Performs rate limiting on requests.
- Responsible for validating input (e.g., presence of `theme` in the `/scrape` payload).
- Manages API endpoints (see the route sketch after this list):
  - `POST /scrape`: Initiates a new job.
  - `GET /status/:id`: Checks job progress.
  - `GET /results/:id`: Retrieves job output.
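A minimal sketch of this API layer with Express.js follows. The route paths and the port come from this README; the in-memory `Map` stands in for the project's Postgres job table, and rate limiting middleware is omitted for brevity.

```ts
import express from "express";
import { randomUUID } from "crypto";

// In-memory stand-in for the Postgres jobs table used by the real project.
type Job = { theme: string; status: "running" | "done" | "failed"; results: unknown[] };
const jobs = new Map<string, Job>();

const app = express();
app.use(express.json());

// POST /scrape: validate input and register a new job.
app.post("/scrape", (req, res) => {
  const theme = req.body?.theme;
  if (typeof theme !== "string" || theme.trim() === "") {
    return res.status(400).json({ error: "'theme' is required" });
  }
  const id = randomUUID();
  jobs.set(id, { theme, status: "running", results: [] });
  // The real service would now hand the job off to the background worker.
  res.status(202).json({ id });
});

// GET /status/:id: check job progress.
app.get("/status/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  return job ? res.json({ id: req.params.id, status: job.status }) : res.sendStatus(404);
});

// GET /results/:id: retrieve job output.
app.get("/results/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  return job ? res.json(job.results) : res.sendStatus(404);
});

app.listen(9898); // Swagger docs are served on this port in the Docker setup
```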
Job Management System:
- When a `/scrape` request is received (see the job-creation sketch below):
  - A unique `id` is generated.
  - The job (theme, status: 'running') is stored in a Postgres database.
  - A background worker is initialized for the job.
  - A response containing the job `id` is returned.
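A rough sketch of that job-creation flow, assuming the `pg` (node-postgres) client; the `jobs` table name, its columns, and the `DATABASE_URL` variable are assumptions:

```ts
import { Pool } from "pg";
import { randomUUID } from "crypto";

const pool = new Pool({ connectionString: process.env.DATABASE_URL }); // assumed env var

// Persist a new job and return its id so the API can respond immediately.
async function createJob(theme: string): Promise<string> {
  const id = randomUUID();
  await pool.query(
    "INSERT INTO jobs (id, theme, status) VALUES ($1, $2, $3)", // assumed schema
    [id, theme, "running"],
  );
  // The background worker for this job would be kicked off here (fire-and-forget).
  return id;
}
```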
Background Worker Process:
- Scraping (see the URL-collection sketch after this list):
  - Uses browser-pool to efficiently manage multiple Puppeteer browser instances.
  - Navigates straight to the search results page of BookDP.com.au, which skips filling in the search input, i.e.:
    let url = `https://bookdp.com.au/?s=${job.theme.toLowerCase().replaceAll(" ", "+")}&post_type=product`;
  - Gets the URLs of all books on the results page.
  - Publishes each book URL to the `scrape_book_jobs` topic.
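The URL-collection step could look roughly like this sketch, which uses plain Puppeteer and KafkaJS. The real project draws pages from a browser-pool instead of launching a browser directly, and the CSS selector, broker address, and message shape below are assumptions.

```ts
import puppeteer from "puppeteer";
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "book-scraper", brokers: ["localhost:9092"] }); // assumed broker
const producer = kafka.producer();

async function publishBookUrls(jobId: string, theme: string): Promise<void> {
  const url = `https://bookdp.com.au/?s=${theme.toLowerCase().replaceAll(" ", "+")}&post_type=product`;

  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle2" });

  // Collect every product link on the search-results page (selector is a guess).
  const bookUrls = await page.$$eval("a.woocommerce-LoopProduct-link", (links) =>
    links.map((a) => (a as HTMLAnchorElement).href),
  );
  await browser.close();

  // One Kafka message per book, so the consumer can process them concurrently.
  await producer.connect();
  await producer.send({
    topic: "scrape_book_jobs",
    messages: bookUrls.map((bookUrl) => ({ value: JSON.stringify({ jobId, bookUrl }) })),
  });
  await producer.disconnect();
}
```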
Kafka Worker:
- Listens to the Kafka topic `scrape_book_jobs`.
- For each message (see the consumer sketch after this list):
  - Scrapes relevant book data using Puppeteer from the browser-pool.
  - Enriches the data with AI (summary, classification, etc.).
  - Calculates derived metrics such as discount and value scores.
  - Sends the final structured result to a Make.com webhook.
  - Stores and updates relevant data using a transaction.
- The Google Sheet is https://docs.google.com/spreadsheets/d/1RMVdqPP_iDW1bjb0KGLRRkWQKZiQU7TkZ56i9ueLkWU/edit?usp=sharing
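Putting the consumer side together, here is a minimal sketch using KafkaJS. The step order and the payload field names follow this README; the consumer group id, broker address, stubbed scrape result, and discount formulas are assumptions, and database persistence is elided.

```ts
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "book-worker", brokers: ["localhost:9092"] }); // assumed broker
const consumer = kafka.consumer({ groupId: "scrape-book-jobs" });                  // assumed group id

// Derived pricing metrics; field names follow the webhook payload shown later.
function discountMetrics(currentPrice: number, originalPrice: number) {
  const discount_amount = +(originalPrice - currentPrice).toFixed(2);
  const discount_percent =
    originalPrice > 0 ? +((discount_amount / originalPrice) * 100).toFixed(2) : 0;
  return { discount_amount, discount_percent };
}

async function run(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topic: "scrape_book_jobs", fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const { jobId, bookUrl } = JSON.parse(message.value?.toString() ?? "{}");

      // 1. Scrape the book page (Puppeteer via browser-pool in the real project); stubbed here.
      const book = { title: "stub title", author: "stub author", current_price: 19.99, original_price: 29.99 };

      // 2. AI enrichment (summary, relevance score); see the enrichment sketch earlier.
      // 3. Derived cost metrics.
      const payload = {
        ...book,
        url: bookUrl,
        job_id: jobId,
        ...discountMetrics(book.current_price, book.original_price),
      };

      // 4. Dispatch to Make.com; a DB transaction would also record the result here.
      await fetch(process.env.MAKE_WEBHOOK_URL!, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
    },
  });
}

run().catch(console.error);
```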
This simplified, single-worker design ensures all processing happens asynchronously and in one place, making deployment and debugging easier while still enabling concurrency and resilience via Kafka.
Log in to your Make.com account.
Create a New Scenario.
Add a Webhook Trigger:
- Search for the "Webhooks" app and select it.
- Choose the "Custom webhook" trigger.
- Click "Add", give your webhook a name (e.g., "Book Scraper Hook"), and click "Save".
- Make will generate a unique URL. Copy this URL and paste it into your `.env` file as `MAKE_WEBHOOK_URL`.
- Make will now be "listening" for data. You might need to run your script once to send sample data so Make can determine the data structure.
Add Subsequent Modules:
- Click the "+" to add another module after the webhook.
- Choose the app you want to send data to (e.g., Google Sheets, Trello, Slack, Airtable).
- Configure the module. For example:
  - Google Sheets "Add a Row": Select your spreadsheet and sheet. Map the fields from the webhook data (e.g., `1.title`, `1.author`, `1.ai_summary`) to the columns in your sheet.
Data Structure: The webhook will receive a JSON payload. Below is a sample (a matching TypeScript interface is sketched after these steps):
{ "title" : "string", "author" : "string", "current_price" : "number", "original_price" : "number", "url" : "string", "relevance_score" : "number", "value_score" : "number", "job_id" : "number", "discount_amount" : "number", "discount_percent" : "number", "summary" : "string" } -
Activate Your Scenario:
- Once configured, turn your scenario "ON" (usually a toggle switch at the bottom).
- Ensure "Run immediately" is enabled if you want it to process data as soon as it arrives.
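For reference, the same webhook payload expressed as a TypeScript interface. The fields and types are taken directly from the sample above; the interface name is arbitrary and the project may define its own type.

```ts
// Shape of the JSON document POSTed to the Make.com webhook for each book.
interface BookWebhookPayload {
  title: string;
  author: string;
  current_price: number;
  original_price: number;
  url: string;
  relevance_score: number;
  value_score: number;
  job_id: number;
  discount_amount: number;
  discount_percent: number;
  summary: string;
}
```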
Screenshots
- Below are some screenshots of the Make.com scenario and the Google Sheet.
- Website Structure Dependent: The scraper's reliability heavily depends on the HTML structure of BookDP.com.au. Changes to the website may break the scraper, requiring updates to selectors.
- Rate Limiting & IP Bans: Aggressive scraping can lead to IP blocks or CAPTCHAs.
- AI Model Costs & Limits: AI API calls incur costs and are subject to rate limits by the provider.
- AI Accuracy: The quality of AI enrichment depends on the model used and the clarity of prompts. AI may occasionally produce inaccurate or nonsensical results.
- Make.com Plan Limitations: Your Make.com plan might have limits on the number of operations or data transfer.
- Error Handling: While basic error handling is included, complex scenarios (e.g., network timeouts during AI calls, specific website errors) might require more sophisticated retry mechanisms.
- Implement proxy rotation and user-agent switching.
- More sophisticated error handling and retry logic.
- GUI or web interface for easier use.






