- Make sure you have Python 2.7 installed
- Make sure you have lxml 4.1.0 installed via
pip install lxml==4.1.0
- Make sure you are connected to the Internet and that the target website is currently functional
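The first two prerequisites can be checked from Python before running either scraper. The sketch below is only an illustration: `check_prerequisites` is a hypothetical helper, not part of this repository, and it assumes the scripts need exactly Python 2.7 and lxml 4.1.0 as listed above. It does not test the network or the target website.

```python
import sys


def check_prerequisites():
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # The scripts are described as targeting Python 2.7.
    if sys.version_info[:2] != (2, 7):
        problems.append('Python 2.7 expected, found %d.%d' % sys.version_info[:2])
    try:
        from lxml import etree
    except ImportError:
        problems.append('lxml is not installed (try: pip install lxml==4.1.0)')
    else:
        # LXML_VERSION is a tuple such as (4, 1, 0, 0).
        if etree.LXML_VERSION[:3] != (4, 1, 0):
            problems.append('lxml 4.1.0 expected, found %s' % etree.__version__)
    return problems
```

Calling `check_prerequisites()` and printing each returned string gives a quick summary of what still needs to be installed.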
Scrape the BMJ's Doc2Doc Archives in 3 Steps
- Specify the output file to write results to by editing the
doc2doc.py
file's main entry point, e.g. Spidey().crawl('doc2doc.csv')
- Run the script via command line or terminal
python doc2doc.py
which will create a comma-separated output file (unit tests available)
- You can open the output file in a program such as Microsoft Excel or LibreOffice Calc for customizing columns
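If no spreadsheet program is at hand, the output can also be inspected with Python's standard `csv` module. This is a minimal sketch, not part of the repository: `preview_csv` is a hypothetical helper, and it assumes the file is UTF-8 encoded. It makes no assumption about which columns the scraper writes.

```python
import csv
import sys


def _open_for_csv(path):
    # Python 2's csv module expects a binary file; Python 3 expects text with newline=''.
    if sys.version_info[0] < 3:
        return open(path, 'rb')
    return open(path, 'r', newline='', encoding='utf-8')


def preview_csv(path, limit=5):
    """Return up to `limit` rows of a comma-separated file as lists of strings."""
    rows = []
    with _open_for_csv(path) as handle:
        for index, row in enumerate(csv.reader(handle)):
            if index >= limit:
                break
            rows.append(row)
    return rows
```

For example, `preview_csv('doc2doc.csv')` would return the first five rows of the file produced in the steps above.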
Scrape the Optimal Health Network (OHN) Live Chat Archives in 3 Steps
- Specify the output file to write results to by editing the
ohn.py
file's main entry point, e.g. Spidey().crawl('ohn.csv')
- Run the script via command line or terminal
python ohn.py
which will create a comma-separated output file (unit tests available)
- You can open the output file in a program such as Microsoft Excel or LibreOffice Calc for customizing columns
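Columns can also be customized in a script rather than a spreadsheet. The sketch below copies selected columns, by position, into a new file. `select_columns` is a hypothetical helper rather than something the repository provides, the UTF-8 encoding is an assumption, and the column positions must be chosen after inspecting the actual output, since the column layout is not documented here.

```python
import csv
import sys


def _open_csv(path, mode):
    # Python 2's csv module wants binary files; Python 3 wants text with newline=''.
    if sys.version_info[0] < 3:
        return open(path, mode + 'b')
    return open(path, mode, newline='', encoding='utf-8')


def select_columns(in_path, out_path, indices):
    """Copy only the columns at the given zero-based positions, in the given order."""
    with _open_csv(in_path, 'r') as source, _open_csv(out_path, 'w') as target:
        writer = csv.writer(target)
        for row in csv.reader(source):
            # Skip positions that a short row does not have.
            writer.writerow([row[i] for i in indices if i < len(row)])
```

For example, `select_columns('ohn.csv', 'ohn_subset.csv', [0, 2])` would keep the first and third columns; the name `ohn_subset.csv` is only a placeholder.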