The challenge is to build a lightweight spelling checker by leveraging bloom filter without having to filter through a dictionary file (dict.txt), thus minimizing disk and memory usage.
bloom_filter/__init__.py
contain the bloom filter class implementation which includes calculate_optimal_m_k(n, fp)
function that takes the number of items to be stored in the filter and the false positives as arguments. This function is used to calculate the most efficient number of bits per item and number of hash functions needed to achieve the false positive. The bloom filter class also include the compute_hashes(self, data)
function which uses a non-cryptographic hash function Fowler–Noll–Vo hash algorithm to generate the hash values to represent items in our bloom filter data structure.
bloom_filter/__main__.py
contains build_bloom_filter_from_file
helper for loading words in the dictionary into the bloom filter. The filter called words.bf
is then generated and saved in the root directory.
-
Usage:
spellcheck < -build > <dictionary_file> [-fp <false_positive_rate>]
./spellcheck -build dict.txt -fp 0.01
-
Output:
Bloom filter built and saved to words.bf
The words.bf
is loaded and the Bloom Filter Class is reconstructed and we can use our query(self, item)
function to check if a word exist in the bloom filter
-
Usage
spellcheck < -check > <words>
./spellcheck -check The challege is to build your own micro spel cecker that can determeane if a word is probably spelt correcly
-
Output:
These words are spelt wrong: challege spel cecker determeane correcly