Bloom filters are approximate data-structures hat says with 100% certainity that an element does not belong to a set
- False positivity rate increases with the increase in the size of the filter
- It was created by a researcher called bloom, so itβs named after him
- Filter = Bit Array
- We do not have to re-implement bloom filter
- There are libraries in every single language
- Redis has it as ne of itβs core features (Now-a-days people mostly go with this)
- Use it whenever
- You need to insert but not remove the data
- You need a NO with 100% certainity
- Having false-positivity is okay
Eg: Medium recommendation, feed generation, web crawlers, linkedin feed
Letβs say that we store each of these (apple, ball, cat) in an Array (Filter)
- Respective item will be stored in the array based on the hash function
- So we are only utilizing 1B (One bit) in the array (Itβs very space efficient)
What about finding if an elephant is stored in the array?
- Letβs get the hash β
Elephant β f(elephant) % 8 = 5
- So at the fifth index in the array, we donβt find any value, so there is no elephant in the list
But, when bloom filter says the item is present β We still need to check