This project was developed as part of the Caprae Capital AI Readiness Challenge by Kavya Bhardwaj.
It is an automated tool that extracts structured company data (leads) using a mix of APIs, regular expressions, and AI models.
Given a list of company names, it fetches:
- Official Website (via Serper API)
- Emails (via Hunter.io and website scraping fallback)
- CEO Name (via DataForSEO SERP + Hugging Face AI QA model)
- LinkedIn URL (via Google SERP matching)
- Revenue Estimates (regex from live SERPs + AI fallback)
- Employee Estimates (regex logic + Hugging Face fallback)
- Exports everything to a downloadable `.csv`
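
The website scraping fallback for emails can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `extract_emails` and `EMAIL_RE` are assumptions, and the sketch runs a regex over raw HTML with only the standard library, whereas the project lists BeautifulSoup for this step.

```python
import re
from html import unescape

# Pragmatic pattern for common addresses; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique, lower-cased email addresses found in raw HTML, in order."""
    text = unescape(html)  # turn entities such as &#64; back into "@"
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for address in EMAIL_RE.findall(text):
        seen.setdefault(address.lower(), None)
    return list(seen)


# Made-up sample page, used only to exercise the function.
page = '<a href="mailto:Info@Example.com">Mail</a> <p>sales&#64;example.com</p>'
print(extract_emails(page))
```

In a fuller pipeline the HTML would presumably come from fetching the company website, and this step would run only when Hunter.io returns no addresses.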

| Data Field | Powered By |
|---|---|
| Website | Serper API |
| Emails | Hunter.io API, Regex (fallback) |
| CEO Name | DataForSEO API + Hugging Face AI |
| LinkedIn Profile | DataForSEO API |
| Revenue | DataForSEO API, Regex, Hugging Face AI |
| Employees | DataForSEO API, Regex, Hugging Face AI |
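
Several fields in the table combine a regex pass with an AI model. One way such a chain could be wired is sketched below, under stated assumptions: `regex_revenue`, `extract_with_fallback`, and `fake_qa_model` are hypothetical names, and the stub stands in for a Hugging Face question-answering model so the example runs without downloads or API keys.

```python
import re
from typing import Callable, Optional, Tuple

# Matches figures such as "$391.04B", "$2.5 billion" or "$800 million".
REVENUE_RE = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?\s?(?:billion|million|trillion|[BMT])\b",
    re.IGNORECASE,
)

Extractor = Callable[[str], Optional[str]]


def regex_revenue(snippet: str) -> Optional[str]:
    """Cheap first pass: pull a dollar figure straight out of the snippet."""
    match = REVENUE_RE.search(snippet)
    return match.group(0) if match else None


def fake_qa_model(snippet: str) -> Optional[str]:
    """Hypothetical stand-in for an AI question-answering call."""
    match = re.search(r"\w+ billion dollars", snippet)
    return match.group(0) if match else None


def extract_with_fallback(
    snippet: str, primary: Extractor, fallback: Extractor
) -> Tuple[Optional[str], str]:
    """Try the primary extractor, then the fallback; report which one answered."""
    value = primary(snippet)
    if value:
        return value, "regex"
    value = fallback(snippet)
    if value:
        return value, "ai"
    return None, "none"


# The second snippet is a made-up sentence with no "$" figure, so the regex misses it.
print(extract_with_fallback("Apple reported revenue of $391.04B.", regex_revenue, fake_qa_model))
print(extract_with_fallback("Revenue was roughly twelve billion dollars.", regex_revenue, fake_qa_model))
```

With the `transformers` library, the fallback could plausibly be a `pipeline("question-answering")` call, which takes a `question` and a `context` and returns a dict with an `answer` field. Returning the source label alongside the value makes it easy to see later which rows came from the model instead of the regex.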

- Regex + AI fallback pipeline
- Fallback email scraping using BeautifulSoup
- Range filtering for employees (e.g., 1K–1M)
- Currency-aware parsing (USD, INR, Crore)
- Retry logic for missing data
- Fully exportable CSV
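
The currency-aware parsing and employee range filtering listed above might look something like this. It is a sketch under assumptions, not the project's code: `parse_money`, `parse_employees`, and the lookup tables are hypothetical names, amounts without a currency symbol but with a crore or lakh unit are assumed to be INR, and "range filtering" is read here as discarding counts outside 1K to 1M.

```python
import re
from typing import Optional, Tuple

# Multipliers for the magnitude words and suffixes handled in this sketch.
MULTIPLIERS = {
    "k": 1e3, "thousand": 1e3,
    "lakh": 1e5,
    "m": 1e6, "million": 1e6,
    "cr": 1e7, "crore": 1e7,  # 1 crore = 10 million
    "b": 1e9, "billion": 1e9,
}
CURRENCIES = {"$": "USD", "usd": "USD", "₹": "INR", "inr": "INR", "rs": "INR"}

MONEY_RE = re.compile(
    r"(?P<sym>\$|₹|USD|INR|Rs\.?)?\s*"
    r"(?P<num>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?P<unit>crore|cr|billion|million|thousand|lakh|[kmb])?\b",
    re.IGNORECASE,
)
EMPLOYEE_RE = re.compile(
    r"(?P<num>\d[\d,]*(?:\.\d+)?)\s*(?P<unit>[km])?\+?\s+employees",
    re.IGNORECASE,
)


def parse_money(text: str) -> Optional[Tuple[Optional[str], float]]:
    """Return (currency code or None, amount in base units) for the first figure."""
    match = MONEY_RE.search(text)
    if not match:
        return None
    unit = (match.group("unit") or "").lower()
    amount = float(match.group("num").replace(",", "")) * MULTIPLIERS.get(unit, 1)
    symbol = (match.group("sym") or "").lower().rstrip(".")
    currency = CURRENCIES.get(symbol)
    if currency is None and unit in ("cr", "crore", "lakh"):
        currency = "INR"  # assumption: Indian units imply rupees
    return currency, amount


def parse_employees(text: str, low: int = 1_000, high: int = 1_000_000) -> Optional[int]:
    """Return the first employee count inside [low, high], else None."""
    for match in EMPLOYEE_RE.finditer(text):
        unit = (match.group("unit") or "").lower()
        count = int(float(match.group("num").replace(",", "")) * MULTIPLIERS.get(unit, 1))
        if low <= count <= high:
            return count
    return None


print(parse_money("₹12,114 Cr"))
print(parse_employees("Apple has about 164,000 employees worldwide"))
```

Normalizing every amount to base units with an explicit currency code keeps rows comparable in the exported CSV, and the range check is a cheap guard against stray numbers (years, office counts) that a loose regex can pick up from search snippets.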

| Company | CEO | Revenue | Employees |
|---|---|---|---|
| Apple | Tim Cook | $391.04B | ~164,000 |
| Zomato | Deepinder Goyal | ₹12,114 Cr | ~4,000 |
| Google | Sundar Pichai | $350.02B | ~183,000 |

- Install requirements (in a local shell, drop the leading `!`):

      !pip install transformers beautifulsoup4 requests pandas

- Paste the code into a Google Colab notebook or a local Python script.
- Run:

      companies = ["Apple", "Google", "Zomato"]
      results = process_companies(companies)

- Export CSV:

      import pandas as pd
      pd.DataFrame(results).to_csv("lead_results.csv", index=False)

Feel free to connect:
- bhardwajkavya099@gmail.com

Built by Kavya Bhardwaj