Doctoralia Web Crawler

A web crawler and scraper that extracts data from the Doctoralia site.

The following information is collected for each doctor:

  • Name
  • Image_link
  • Specializations
  • Experiences
  • City
  • State
  • Address
  • Address_telephone
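As an illustration of the fields above, one scraped doctor could be represented by a record like the following. This is only a sketch: the class name `DoctorRecord`, the field types, and the sample values are assumptions, since the README lists the field names but not how the script stores them.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DoctorRecord:
    """Hypothetical container for one doctor; types are assumed."""
    name: str
    image_link: str = ""
    specializations: List[str] = field(default_factory=list)
    experiences: List[str] = field(default_factory=list)
    city: str = ""
    state: str = ""
    address: str = ""
    # Assumed to be a list, since an address may have several telephones.
    address_telephone: List[str] = field(default_factory=list)


# Placeholder values for illustration only, not real scraped data.
record = DoctorRecord(
    name="Example Name",
    specializations=["Example specialization"],
    city="Example City",
)

# asdict() yields a plain dict, convenient for writing to CSV or JSON.
row = asdict(record)
```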

How to install and run

After activating your Python virtual environment (venv), run the command below to install the dependencies:

pip install -r requirements.txt

Libraries and files

  • chromedriver.exe - Web driver used by Selenium to control Chrome. This executable is for Windows x64. If you are not comfortable using this .exe file, or you use a different operating system, you can download the correct version at Selenium Chrome webdriver
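If you download a different driver, the script needs to know where it is. The sketch below shows one way to pick the driver file by operating system and hand it to Selenium. It assumes Selenium 4 and a driver stored next to the script; the helper names `chromedriver_path` and `make_driver` are hypothetical and may not match how `DoctoraliaWebCrawler.py` actually loads the driver.

```python
import platform
from pathlib import Path


def chromedriver_path(base_dir=".", system=None):
    """Return the expected ChromeDriver path for the given OS name."""
    system = system or platform.system()
    # Windows builds carry the .exe suffix; Linux and macOS builds do not.
    filename = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return Path(base_dir) / filename


def make_driver(driver_path):
    """Start Chrome through Selenium 4 using an explicit driver path."""
    # Imported here so the path helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=str(driver_path)))
```

Calling `make_driver(chromedriver_path())` would then open a Chrome session with whichever driver matches the current system.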

How to use

python DoctoraliaWebCrawler.py

Observations about the target site

  • Pagination is limited to 100 pages, with a fixed 20 doctors per page
  • A doctor can have multiple addresses. This project extracts only the first address and the telephone numbers for that address
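The two observations above have practical consequences, sketched below: a single paginated listing can yield at most 100 × 20 = 2000 doctors, and only the first address entry is kept. The constant names, the `first_address` helper, and the dictionary keys are hypothetical, chosen for illustration rather than taken from the script.

```python
MAX_PAGES = 100          # pagination limit observed on the site
DOCTORS_PER_PAGE = 20    # fixed page size observed on the site

# Upper bound on doctors reachable through one paginated listing.
MAX_DOCTORS_PER_LISTING = MAX_PAGES * DOCTORS_PER_PAGE


def first_address(addresses):
    """Return the first address entry, or None when there are none."""
    return addresses[0] if addresses else None


# Placeholder structure for illustration only; the keys are assumed.
addresses = [
    {"address": "First example address", "telephones": ["0000-0000"]},
    {"address": "Second example address", "telephones": []},
]
kept = first_address(addresses)
```

Because of this cap, covering more than 2000 doctors would require splitting the crawl into several narrower listings, for example by city or specialization, assuming the site offers such filters.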