Back-to-Basics Weekend Reading - Counting Bloom Filters
I am in India for the AWS Summits in Mumbai, Chennai and Bangalore (next week). As always in India I have an amazing time, the events are packed, the participants are extremely enthusiastic and eager to learn, the customers very appreciative and the food is just amazing.
This weeks reading was triggered by a note from Matt Wood who ran into a great in-depth analysis of the Bloom Filter data structure by Michael Nielsen on his Data Driven Intelligence blog. I love probabilistic data structures and Bloom filters have unique properties of possible false positives, but no false-negatives. They have been used in many network devices, network protocols and distributed applications where a question like “have I possibly seen this before” needs to be able to operate at very large scale.
Why Bloom filters work the way they do, Michael Nielsen, Data driven Intelligence, September 26,2012
In 2000 an improvement on the original Bloom filters called Counting Bloom Filters was published as part of the Summary Cache protocol. In Counting Blooms filters deletions to the sets can be applied more easily.
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol, Li Fan, Pei Cao, Jussara Almeida, Anrei Broder, IEEE/ACM Transactions on Networking, 8(3):281-293,2000.