A simple project that parses the word lists from the "index" section of Rosetta Stone's course-contents PDFs to determine how many words there are.

src/parse_index.lua parses the vocab/rosetta_eng_n.txt files (where n is 1-5).

In the vocab directory, the rosetta_eng_n.txt files (where n is 1-5) contain the words as listed in the index on Rosetta Stone's site.

For example, see the level 1 index at http://resources.rosettastone.com/assets/ce/1312988037/assets/pdfs/course_contents/rs/en-US/level_1/ENG.pdf

The vocab_eng_n.txt files (where n is 1-5) are the processed versions of the rosetta_eng_n.txt files, produced by running each one through src/parse_index.lua.

vocab_eng_full.txt is the concatenation of all the vocab_eng_n.txt files.

vocab_eng_full_sort is the sorted, uniq'd version of that file.
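
The concatenate, sort, and uniq steps could be reproduced with something like the sketch below. It is not taken from the repository: the function name build_full_vocab is hypothetical, and it assumes the five vocab_eng_n.txt files sit together in one directory (passed as the first argument) with one word per line.

```shell
# Hypothetical helper, not part of the repository.
# Assumes vocab_eng_1.txt .. vocab_eng_5.txt live in the directory given
# as $1 (defaults to "vocab") and hold one word per line.
build_full_vocab() {
    dir="${1:-vocab}"

    # Concatenate the five per-level files, in level order.
    : > "$dir/vocab_eng_full.txt"
    for n in 1 2 3 4 5; do
        cat "$dir/vocab_eng_$n.txt" >> "$dir/vocab_eng_full.txt"
    done

    # Sort the combined list and drop duplicate lines.
    sort "$dir/vocab_eng_full.txt" | uniq > "$dir/vocab_eng_full_sort"

    # Print the number of distinct words.
    wc -l < "$dir/vocab_eng_full_sort" | tr -d ' '
}
```

Calling build_full_vocab with the directory that holds the per-level files writes both output files there and prints the distinct word count, which is the number the project is after.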