###Proposal
####Eden Zik and Kahlil Oppenheimer

Click here to view the proposal:

#####Summary

Data analysis tools are in extremely high demand. However, one known problem in data analysis is inconsistent formatting across data sources: is my data compatible with the analysis tool I’m trying to use? The problem is only exacerbated when the same data is represented inconsistently across different media. Take, for example, a team of two, each trying to represent the same structured data. Say one person creates an Excel spreadsheet while the other writes a Python script to populate a CSV file. What if they chose inconsistent names for the values of enumerated types (e.g. “female” vs. “f” vs. “Female”)? What if one person left some cells blank while the other didn’t? What if the person entering data into Excel made typos that corrupt aggregation results? Suddenly, the team’s simple task of merging and aggregating their data with an analysis tool becomes much more complicated.
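As a minimal sketch of the cleanup this scenario calls for, inconsistent spellings of an enumerated value can be mapped to one canonical label, with blank cells and unrecognized values (likely typos) surfaced as missing instead of silently kept. The alias table and the `canonical_gender` helper are hypothetical names chosen for illustration; they are not part of any tool mentioned in this proposal.

```python
from typing import Optional

# Hypothetical alias table: every accepted spelling maps to one canonical label.
GENDER_ALIASES = {
    "f": "female",
    "female": "female",
    "m": "male",
    "male": "male",
}


def canonical_gender(raw: Optional[str]) -> Optional[str]:
    """Return the canonical label, or None for blank or unrecognized input."""
    if raw is None:
        return None
    key = raw.strip().lower()
    if not key:
        return None  # blank cell: treat as missing rather than guessing
    return GENDER_ALIASES.get(key)  # None flags a likely typo for review


# The same attribute as two teammates might have entered it.
values = ["female", "f", "Female", "", "femlae"]
cleaned = [canonical_gender(v) for v in values]
```

Returning `None` for anything outside the alias table is one possible design choice: it keeps typos out of aggregation results until someone reviews them.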

To deal with this, many analysts spend a significant amount of time writing scripts to transform multiple data sources into one standard format. This “data wrangling” often requires manually transforming columns and rows, correcting individual values, and aggregating data from multiple sources with different layouts. Fortunately, two applications were created to help with this process: Stanford’s Data Wrangler and OpenRefine (formerly Google Refine). Although both projects were developed with the similar goal of providing a visual, intuitive interface for data wrangling, they offer a short-term solution that does not scale. Both Data Wrangler and OpenRefine export their “wrangled” result as one massive CSV file. This is fine for small-scale individual tasks, but a single flat file does not scale the way modern data needs to. Our project will take this output CSV and transform it into a SQL relational database normalized to a level the user specifies. This transformation gives the user the well-known benefits of a relational database management system, such as referential integrity, data consistency, and easier compatibility with analysis tools. Our project aims to provide data analysts with the ultimate tool for normalizing structured but unwrangled data.
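To make the idea concrete, here is a rough sketch of turning a flat CSV into a small normalized schema, using Python's standard `csv` and `sqlite3` modules. The sample data, the two-table schema, and the `load_normalized` function are assumptions made for illustration only; they do not describe the project's actual design.

```python
import csv
import io
import sqlite3

# Hypothetical flat CSV, standing in for a wrangled export.
FLAT_CSV = """name,gender,department
Ada,female,Engineering
Grace,female,Engineering
Alan,male,Research
"""


def load_normalized(csv_text: str) -> sqlite3.Connection:
    """Split the repeated department column into its own table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
    conn.executescript(
        """
        CREATE TABLE department (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE person (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            gender        TEXT NOT NULL,
            department_id INTEGER NOT NULL REFERENCES department(id)
        );
        """
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Store each distinct department once, then refer to it by id.
        conn.execute(
            "INSERT OR IGNORE INTO department (name) VALUES (?)",
            (row["department"],),
        )
        (dept_id,) = conn.execute(
            "SELECT id FROM department WHERE name = ?", (row["department"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO person (name, gender, department_id) VALUES (?, ?, ?)",
            (row["name"], row["gender"], dept_id),
        )
    conn.commit()
    return conn


conn = load_normalized(FLAT_CSV)
```

With foreign keys enabled, inserting a `person` row that points at a nonexistent `department_id` raises `sqlite3.IntegrityError`, which is the kind of referential integrity a single flat CSV cannot provide.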
