SE-Radio Episode 358: Probabilistic Data Structure for Big Data Problems

Filed in Episodes on February 27, 2019

Andrii Gakhov, author of the book Probabilistic Data Structures and Algorithms for Big Data Applications, talks about probabilistic data structures and their application to the big data domain. Host Robert Blumen spoke with Dr. Gakhov about how probabilistic data structures differ from their exact counterparts; hash functions, both cryptographic and non-cryptographic; space versus accuracy tradeoffs; space versus processing time tradeoffs; and the main problem domains: membership testing, cardinality, frequency, similarity, and rank. The discussion covers Bloom filters for membership testing (performance characteristics, use cases, design patterns that use Bloom filters for lookup problems, and how they are implemented); Linear Counting and HyperLogLog for cardinality estimation (use cases in web applications and implementation); Count-Min Sketch for frequency estimation; and existing library support. The episode also asks whether probabilistic data structures (PDS) should be taught in beginning courses.
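To make the membership-testing idea and the space versus accuracy tradeoff concrete, here is a minimal Bloom filter sketch in Python. It is not taken from the episode or the book. The class name, parameter names, and the choice of SHA-256 with double hashing are assumptions made so the example needs only the standard library; practical implementations often prefer a faster non-cryptographic hash. The sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        m = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_bits = max(1, m)
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k bit positions from two 64-bit slices of one
        # SHA-256 digest (double hashing) instead of k independent hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Report "possibly present" only if all derived bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
seen.add("alice@example.com")
print("alice@example.com" in seen)  # an added item is always reported as present
print(seen.num_bits, seen.num_hashes)  # bits and hash count chosen for the target rate
```

Lowering `false_positive_rate` grows `num_bits`, which is the space versus accuracy tradeoff in miniature: the filter never stores the items themselves, so a "present" answer can be wrong, while an "absent" answer cannot.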
