Preliminary
- Wrangle data into relational form
- Open Eclipse
- Go to File -> Import -> Maven -> Existing Maven Projects
- Select the "wizard" folder
- Once everything is imported, click "Run" (the green arrow)
- When Eclipse asks for a build goal, enter "jetty:run" in the dialog
- Press the green arrow again and you're done!
- Detect hard functional dependencies
- Iterate over all ordered pairs of attributes (every permutation of two attributes)
- For each pair (a, b), run a query to check whether any value of a is associated with more than one value of b; if so, a -> b is not a functional dependency
- Record all FDs that hold so that statistical analysis can be run on them later
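
The pairwise check above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes the data sits in a SQLite table, and the table name `people`, its columns, and the helper `find_functional_dependencies` are all hypothetical. Note that `COUNT(DISTINCT ...)` ignores NULLs, so missing values would need separate handling.

```python
import sqlite3
from itertools import permutations

def find_functional_dependencies(conn, table, columns):
    """Return every ordered pair (a, b) for which a -> b holds in the table."""
    fds = []
    for a, b in permutations(columns, 2):
        # a -> b is violated if some value of a maps to more than one distinct b.
        query = (
            f'SELECT 1 FROM "{table}" '
            f'GROUP BY "{a}" HAVING COUNT(DISTINCT "{b}") > 1 LIMIT 1'
        )
        violated = conn.execute(query).fetchone() is not None
        if not violated:
            fds.append((a, b))
    return fds

# Hypothetical sample data, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (zip TEXT, city TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("10001", "New York", "Ann"),
        ("10001", "New York", "Bob"),
        ("60601", "Chicago", "Ann"),
    ],
)
fds = find_functional_dependencies(conn, "people", ["zip", "city", "name"])
print(fds)  # [('zip', 'city'), ('city', 'zip')]
```

Grouping by a and counting distinct values of b lets the database do the work in one pass per pair, and `LIMIT 1` stops at the first violation.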
- If two attributes do not form a functional dependency but are highly correlated:
- Typo detection: check whether a -> b actually holds and was ruled out only because some values of b are typos
- Detect typos using edit distance plus some clustering
- Otherwise, prompt the user to verify
- Another method of detecting typos: use the position of the error within the word (typos are more likely to be in the middle) to decide whether a value is a typo
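
One way the edit-distance idea could look in practice is sketched below. It assumes the values of b observed for a single value of a have already been collected, treats the most frequent value as the intended one, and flags minority values within a small edit distance of it. The helper names, the `max_distance=2` threshold, and the sample values are illustrative assumptions; the clustering and position-based heuristics are not covered here.

```python
from collections import Counter

def edit_distance(s, t):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]

def likely_typos(values, max_distance=2):
    """Map each minority value to the majority value it probably misspells."""
    counts = Counter(values)
    canonical, _ = counts.most_common(1)[0]
    return {
        v: canonical
        for v in counts
        if v != canonical and edit_distance(v, canonical) <= max_distance
    }

# Hypothetical values of b (city) seen for one value of a (zip code).
city_values_for_one_zip = ["New York", "New York", "New Yrok", "Boston"]
suspects = likely_typos(city_values_for_one_zip)
print(suspects)  # {'New Yrok': 'New York'}
```

Values such as "Boston" that are far from the majority value are not flagged, so they would fall through to the step where the user is asked to verify.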