Created a TF-IDF classifier that detects whether an OCR scan of an arbitrary newspaper page is a sports page, with high precision but low recall. I used it to extract the sports pages it identified from my college's publicly available 120-year archive of frequently published newspaper editions. I then analyzed and visualized interesting data drawn exclusively from those sports pages.
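
The summary above does not say how the classifier was built, so the following is only a minimal sketch of one way a TF-IDF page classifier with a precision-favoring threshold could look. It uses only the Python standard library. The nearest-centroid scoring, the `SportsPageClassifier` name, the `threshold` parameter, and the four toy pages are all assumptions made for illustration, not the actual implementation or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only; OCR noise such as stray digits is dropped.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Term frequency times IDF, L2-normalized; terms unseen during fitting are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    # Component-wise mean of sparse vectors.
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SportsPageClassifier:
    """Hypothetical nearest-centroid classifier over TF-IDF vectors."""

    def __init__(self, threshold=0.2):
        # A higher threshold accepts fewer pages: precision tends to rise, recall to fall.
        self.threshold = threshold

    def fit(self, pages, labels):
        self.idf = fit_idf(pages)
        vecs = [tfidf_vector(p, self.idf) for p in pages]
        self.sports = centroid([v for v, y in zip(vecs, labels) if y])
        self.other = centroid([v for v, y in zip(vecs, labels) if not y])
        return self

    def score(self, page):
        # Positive when the page is closer to the sports centroid than to the other one.
        v = tfidf_vector(page, self.idf)
        return cosine(v, self.sports) - cosine(v, self.other)

    def is_sports(self, page):
        return self.score(page) >= self.threshold


# Toy training pages, invented for illustration only.
pages = [
    "varsity football team wins homecoming game with late touchdown",
    "basketball team scores in overtime to win league game",
    "faculty senate votes on new curriculum and tuition budget",
    "student council election results and campus housing news",
]
labels = [True, True, False, False]

clf = SportsPageClassifier(threshold=0.2).fit(pages, labels)
sports_query = "football game score touchdown team"
news_query = "tuition budget vote by faculty senate"
```

Calling `clf.is_sports(sports_query)` accepts the sports-like page and `clf.is_sports(news_query)` rejects the other one. Refitting with a much stricter `threshold` rejects even the sports-like page, which is the kind of trade that yields high precision at the cost of recall.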