Latest commit: hwsamuel, "Added scripts for scraping OHN data archives", 4ac76d0, Oct 17, 2017 (6 commits)

Name	Last commit message	Last commit date
README.md	Updated documentation for Doc2Doc and OHN plus unit tests	Oct 17, 2017
doc2doc.csv	Initial scraped Doc2Doc dataset added	Oct 16, 2017
doc2doc.py	Added unit tests	Oct 17, 2017
ohn.csv	Added scripts for scraping OHN data archives	Oct 17, 2017
ohn.py	Added scripts for scraping OHN data archives	Oct 17, 2017

General Requirements for All Spiders

Make sure you have Python 2.7 installed
Make sure you have lxml installed, pinned to version 4.1.0, via pip install lxml==4.1.0
Make sure you are connected to the Internet and that the target website is currently functional
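
The first two requirements can be checked from a script before a spider is run. The following is a minimal sketch, not part of the repository: the function name check_environment and its parameters are hypothetical, it only confirms that lxml can be imported (it does not compare version numbers), and it does not probe the network or the target website.

```python
from __future__ import print_function

import importlib
import sys


def check_environment(version_info=None, import_module=importlib.import_module):
    """Return a list of human-readable problems; an empty list means none found.

    Both parameters are hypothetical hooks added so the check can be
    exercised without changing the real interpreter or installed packages.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []

    # The README asks for Python 2.7 specifically.
    if tuple(version_info[:2]) != (2, 7):
        problems.append(
            "Python 2.7 expected, found %d.%d" % (version_info[0], version_info[1])
        )

    # Only importability is checked here, not the pinned 4.1.0 version.
    try:
        import_module("lxml.etree")
    except ImportError:
        problems.append("lxml is not installed (pip install lxml==4.1.0)")

    return problems


if __name__ == "__main__":
    found = check_environment()
    if found:
        for problem in found:
            print("Problem:", problem)
    else:
        print("Python and lxml look fine")
```

An empty list means both checks passed; connectivity and the state of the target website still have to be confirmed by hand.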

Scrape the BMJ's Doc2Doc Archives in 3 Steps

Specify the output file for the results by editing the main entry point of doc2doc.py, e.g. Spidey().crawl('doc2doc.csv')
Run the script from a command line or terminal with python doc2doc.py, which creates a comma-separated output file (unit tests are available)
You can open the output file in a program such as Microsoft Excel or LibreOffice Calc to customize the columns
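
The call Spidey().crawl('doc2doc.csv') suggests a spider object whose crawl method takes the output path and writes the rows it collects as CSV. The sketch below illustrates that pattern only and is not the repository's implementation: ExampleSpider, fetch_rows, FIELDS, and the column names are all hypothetical, and the network crawl is replaced by fixed sample rows so the example runs offline.

```python
import csv
import sys


def open_csv_for_writing(path):
    """Open a file for the csv module on either Python 2 or Python 3."""
    if sys.version_info[0] < 3:
        return open(path, "wb")  # Python 2: csv expects a binary file
    return open(path, "w", newline="")  # Python 3: text file, no newline translation


class ExampleSpider(object):
    """Hypothetical stand-in that mirrors the crawl(output_path) call shape."""

    # Assumed column names; the real output columns may differ.
    FIELDS = ["thread", "author", "message"]

    def fetch_rows(self):
        """Placeholder for the real crawl; yields fixed sample rows."""
        yield {"thread": "t1", "author": "alice", "message": "hello, world"}
        yield {"thread": "t1", "author": "bob", "message": "a reply"}

    def crawl(self, output_path):
        """Write every fetched row to output_path and return the row count."""
        count = 0
        with open_csv_for_writing(output_path) as handle:
            writer = csv.DictWriter(handle, fieldnames=self.FIELDS)
            writer.writeheader()
            for row in self.fetch_rows():
                writer.writerow(row)
                count += 1
        return count
```

With this shape, changing the output file is a one-line edit to the argument passed to crawl, which matches the first step above.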

Scrape the Optimal Health Network (OHN) Live Chat Archives in 3 Steps

Specify the output file for the results by editing the main entry point of ohn.py, e.g. Spidey().crawl('ohn.csv')
Run the script from a command line or terminal with python ohn.py, which creates a comma-separated output file (unit tests are available)
You can open the output file in a program such as Microsoft Excel or LibreOffice Calc to customize the columns
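
Column customization can also be scripted instead of done in a spreadsheet. The following is a minimal sketch using Python's standard csv module; the function name customize_columns is hypothetical, and the column names in the usage note are assumptions, since the actual headers of ohn.csv and doc2doc.csv are not listed here.

```python
import csv
import sys


def _open_csv(path, mode):
    """Open a CSV file in a way the csv module accepts on Python 2 and 3."""
    if sys.version_info[0] < 3:
        return open(path, mode + "b")
    return open(path, mode, newline="")


def customize_columns(input_path, output_path, columns):
    """Copy input_path to output_path, keeping only `columns`, in that order.

    Columns absent from the input are written as empty cells.
    Returns the number of data rows written.
    """
    kept = 0
    with _open_csv(input_path, "r") as src:
        reader = csv.DictReader(src)
        with _open_csv(output_path, "w") as dst:
            # extrasaction="ignore" drops input columns that were not requested.
            writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, customize_columns('ohn.csv', 'ohn_trimmed.csv', ['author', 'message']) would keep two columns, assuming the scraped file actually has headers with those names.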

About

Collection of scripts for data scraping tasks on various health forums

scraper health-forums

AGPL-3.0 license
