GitHub - timi-codes/bloom-filter-spelling-checker: Lightweight spelling checker using bloom filter. The challenge is to build a lightweight spelling checker by leveraging bloom filter probabilistic data structure

Bloom Filter Spelling Checker

The challenge is to build a lightweight spelling checker by leveraging bloom filter without having to filter through a dictionary file (dict.txt), thus minimizing disk and memory usage.

Bloom Filter

bloom_filter/__init__.py contain the bloom filter class implementation which includes calculate_optimal_m_k(n, fp) function that takes the number of items to be stored in the filter and the false positives as arguments. This function is used to calculate the most efficient number of bits per item and number of hash functions needed to achieve the false positive. The bloom filter class also include the compute_hashes(self, data) function which uses a non-cryptographic hash function Fowler–Noll–Vo hash algorithm to generate the hash values to represent items in our bloom filter data structure.

Build Bloom Filter from Dictionary

bloom_filter/__main__.py contains build_bloom_filter_from_file helper for loading words in the dictionary into the bloom filter. The filter called words.bf is then generated and saved in the root directory.

Usage: spellcheck < -build > <dictionary_file> [-fp <false_positive_rate>]
```
./spellcheck -build dict.txt -fp 0.01
```

Output:

Bloom filter built and saved to words.bf

Check Spelling

The words.bf is loaded and the Bloom Filter Class is reconstructed and we can use our query(self, item) function to check if a word exist in the bloom filter

Usage spellcheck < -check > <words>

./spellcheck -check  The challege is to build your own micro spel cecker that can determeane if a word is probably spelt correcly

Output:

These words are spelt wrong:
challege
spel
cecker
determeane
correcly

References

Coding Challenge #53 - Bloom Filter Spell Checker

Bloom Filters Explained

FNV Hash

Bloom Filter Calculator

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.github/workflows		.github/workflows
bloom_filter		bloom_filter
test		test
.gitignore		.gitignore
README.md		README.md
dict.txt		dict.txt
spellcheck		spellcheck
words.bf		words.bf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bloom Filter Spelling Checker

Bloom Filter

Build Bloom Filter from Dictionary

Check Spelling

References

About

Languages

timi-codes/bloom-filter-spelling-checker

Folders and files

Latest commit

History

Repository files navigation

Bloom Filter Spelling Checker

Bloom Filter

Build Bloom Filter from Dictionary

Check Spelling

References

About

Topics

Resources

Stars

Watchers

Forks

Languages