Web scraper for news from CNN Lite via lite.cnn.com. Installable via Python pip.

paulzuradzki/cnnlite

cnnlite

Description

cnnlite is a web scraper that collects articles from CNN Lite at https://lite.cnn.com.

Installation

$ pip install cnnlite

Usage

# import the package
import cnnlite
from pprint import pprint

# start up scraper
scraper = cnnlite.CNNLite()

# export all articles to a json file
# default name: 'cnn_lite_<timestamp>.json'
scraper.to_json_file()

# show sample of headlines and URL list
print(scraper.headlines[:5])
print(scraper.urls[:5])

# large collection / nested dict containing each article for the day
docs = scraper.all_articles

# articles can be accessed one at a time too
article_name = scraper.headlines[0]
pprint(docs[article_name])
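
Once articles have been exported with `to_json_file()`, the file can be read back with the standard library alone. The sketch below is not part of cnnlite: `load_latest_export` is a hypothetical helper. It assumes the export keeps the default `cnn_lite_<timestamp>.json` name and that the file holds the same nested dict as `scraper.all_articles`, keyed by headline.

```python
import json
from pathlib import Path


def load_latest_export(directory="."):
    """Return the contents of the newest cnn_lite_*.json file, or None.

    Hypothetical helper, not part of cnnlite. Assumes exports use the
    default 'cnn_lite_<timestamp>.json' naming.
    """
    # Sort candidate files by modification time so the newest comes last.
    candidates = sorted(
        Path(directory).glob("cnn_lite_*.json"),
        key=lambda path: path.stat().st_mtime,
    )
    if not candidates:
        return None
    with candidates[-1].open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    articles = load_latest_export()
    if articles is None:
        print("No export found; run scraper.to_json_file() first.")
    else:
        # Assumption: top-level keys are headlines, as in scraper.all_articles.
        for headline in list(articles)[:5]:
            print(headline)
```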

Sample CNN Lite Home Page and Output JSON

Bot Etiquette

Be a good bot and comply with the publisher's robots.txt guidelines.
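
One way to check a URL against robots.txt rules is Python's standard `urllib.robotparser` module. The sketch below is not part of cnnlite: `is_allowed`, the `my-bot` user agent, and `SAMPLE_RULES` are made up for illustration, and the sample rules are not CNN's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; not the publisher's real file.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_RULES, "my-bot", "https://lite.cnn.com/"))
print(is_allowed(SAMPLE_RULES, "my-bot", "https://lite.cnn.com/private/page"))
```

To check against the live file instead, `RobotFileParser` can fetch it with `set_url("https://lite.cnn.com/robots.txt")` followed by `read()`, after which `can_fetch` works the same way.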
