GitHub - emmaolof/NFL-draft-picks---ETL: An ETL Project to retrieve NFL draft picks data from various websites, transformed in python and stored data in a MongoDB database. Key tools: Python, Beautiful Soup, Splinter/Selenium, MongoDB.

ETL Project Final Report

Brent Thomas Emmanuel Olofinkua Matt Houser

February, 29 2020

Extract: your original data sources and how the data was formatted (CSV, JSON, pgAdmin 4, etc).
Transform: what data cleaning or transformation was required.
Load: the final database, tables/collections, and why this was chosen.

Extract For this project we are asked to create and perform ETL successfully on multiple sources of data. We did this by extracted data from two separate websites, bleacherreport.com and USAtoday.com. These websites contained articles that discussed the all-time NFL draft picks, more specifically the best and worst picks for each team. The bleacher report article discussed first round picks only while the USA today article included all rounds.

Each team’s best and worst NFL draft picks (1st round only) https://bleacherreport.com/articles/2767040-every-nfl-teams-best-and-worst-1st-round-draft-pick-of-the-super-bowl-era#slide4

Each team’s best and worst NFL draft picks (all rounds) https://ftw.usatoday.com/2018/04/nfl-best-worst-picks-team

Transform To clean and transform our datasets, within Jupyter Notebook we first created a path to open chrome driver and defined the URL. We then used find.all to display all the paragraphs, titles, and headers within each article for reference.

Using a for-loop we were able to print these references to narrow down results. We then used the import re (regular expression) function to include only the teams with their corresponding best and worst draft picks. After this, we ran an additional for-loop that would loop through all three pieces of data (team, best pick, worst pick) and separate them for loading purposes.

When scraping the Bleacher Report article, we came across an issue with a blank string interrupting the data, causing an extra space between players and throwing the data set off. We needed to filter the blank lines out that were occurring around when there was a break in the html.

Load After we scraped our web pages and loaded them into the data frames, we created a local connection to mongoDB and loaded both collections to our NFL_Best_Worst_db in Mongo.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
ETL Project Final Report.docx		ETL Project Final Report.docx
Project 1.ipynb		Project 1.ipynb
README.md		README.md
chromedriver.exe		chromedriver.exe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages