sports_page_classification_and_analysis

Created a TF-IDF classifier that detects whether an OCR scan of an arbitrary newspaper page is a sports page, with high precision but low recall. Used it to extract the sports pages from my college's publicly available 120-year archive of frequently published newspaper editions. I then analyzed and visualized interesting data drawn exclusively from those sports pages.
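
The classifier's code is not shown here, so the following is only a minimal sketch of how a TF-IDF page classifier with a precision-favoring threshold could look. It assumes a nearest-centroid decision rule, which may differ from what this project actually uses. All names (`train`, `sports_score`, `is_sports_page`, the `threshold` parameter) and the toy training pages are hypothetical and only there for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the OCR text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def normalize(vec):
    """Scale a sparse vector to unit length (empty dict if it is all zeros)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf(text, idf):
    """Unit-length TF-IDF vector; terms never seen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def centroid(vectors):
    """Unit-length sum of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # for mappings, Counter.update adds the values
    return normalize(total)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(sports_pages, other_pages):
    """Hypothetical training step: one TF-IDF centroid per class."""
    idf = fit_idf(sports_pages + other_pages)
    return {
        "idf": idf,
        "sports": centroid([tfidf(p, idf) for p in sports_pages]),
        "other": centroid([tfidf(p, idf) for p in other_pages]),
    }


def sports_score(page, model):
    """How much closer the page is to the sports centroid than to the other one."""
    vec = tfidf(page, model["idf"])
    return cosine(vec, model["sports"]) - cosine(vec, model["other"])


def is_sports_page(page, model, threshold=0.3):
    """A higher threshold accepts fewer pages: precision up, recall down."""
    return sports_score(page, model) >= threshold


# Toy training data (invented for this sketch, not taken from the archive).
SPORTS = [
    "the team won the game with a late touchdown",
    "coach praised the players after the basketball game",
    "varsity team scores in final inning of baseball game",
]
OTHER = [
    "the faculty senate voted on the new tuition budget",
    "students protest the dining hall prices",
    "the lecture series hosts a visiting professor of history",
]
model = train(SPORTS, OTHER)
```

Calling `is_sports_page(ocr_text, model, threshold=...)` on each page's OCR text would then select candidate sports pages. Raising `threshold` is one way to get the high-precision, low-recall behavior described above: fewer pages pass, but those that do are more likely to be genuine sports pages.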