probabilistic data structures in real life
TRANSCRIPT
POSSIBLE SOLUTIONS
Brute force (15 TB of transactional data)
Sampling (1% of users => 1.2 mb / b.o.)
Magic tool (?!)
Estimator
HyperLogLog can estimate the number of unique elements in sets with more than 1 000 000 000 distinct members, with about 1% error, using only about 4 KB of memory
50 000 000 basic operations
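To make the estimator concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under stated assumptions, not the tool the slides refer to: the class name, the BLAKE2b hash, the 64-bit hash width and the default of 2^12 registers are all choices made for this example. It uses the textbook register value (position of the first 1-bit, i.e. leading zeros plus one) and the standard small-range correction.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical class, not a library API)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m   # a Python list; a packed array would be far smaller

    def add(self, item):
        # Deterministic 64-bit hash of the item (BLAKE2b is an assumption of this sketch).
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        index = h >> (64 - self.p)                   # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Position of the first 1-bit in the remainder (leading zeros + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank             # each register keeps its maximum

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for user_id in range(100_000):
    hll.add(user_id)
    hll.add(user_id)  # duplicates do not change the sketch
print(round(hll.estimate()))
```

The relative standard error of this scheme is roughly 1.04 / sqrt(m), so the number of registers is the knob that trades memory for accuracy.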
HYPERLOGLOG INTUITION
00101010101010001111010101101 => a[2] = 0
10010101010100101010101001011 => a[9] = 1
00000101010100101010101110101 => a[0] = 1
01010101010100100101010101010 => a[5] = 1
01010000000000000000000000010 => a[5] = 23
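The examples above can be reproduced with a few lines of Python. This is a sketch of one reading of the slide, inferred from the numbers rather than stated in it: the first four bits of each hash are taken as the register index, the register stores the number of leading zeros in the remaining bits, and each register keeps the largest value it has seen. The function name `split_hash` is made up for this example.

```python
def split_hash(bits, index_bits=4):
    """Return (register index, leading-zero count of the remaining bits)."""
    index = int(bits[:index_bits], 2)            # first 4 bits, read as a number
    rest = bits[index_bits:]
    zeros = len(rest) - len(rest.lstrip("0"))    # zeros before the first 1-bit
    return index, zeros


hashes = [
    "00101010101010001111010101101",
    "10010101010100101010101001011",
    "00000101010100101010101110101",
    "01010101010100100101010101010",
    "0101" + "0" * 23 + "10",   # the last example: index 0101, then 23 zeros
]

registers = {}
for bits in hashes:
    index, zeros = split_hash(bits)
    # A register only ever grows: keep the maximum seen so far.
    registers[index] = max(registers.get(index, 0), zeros)

print(registers)  # a[5] ends at 23, overwriting the earlier value 1
```

A long run of leading zeros is rare, so seeing one in a register suggests that many distinct hashes have landed there; that is the signal the estimator combines across registers.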
CORNER CASES
|(A not(B)) C| => |A C|
|A ∪ not(B)| = |Everything| - |B| + |A ∩ B|
|A ∩ not(B)| => |A| - |A ∩ B|
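The two rewrites involving not(B) can be sanity-checked on small exact sets. The sketch below uses plain Python sets instead of HyperLogLog sketches, and the function names and sample sets are invented for illustration. With real sketches every count on the right-hand side would be an estimate, so the errors of the individual terms add up.

```python
def union_with_complement(n_everything, n_b, n_a_and_b):
    """|A ∪ not(B)| = |Everything| - |B| + |A ∩ B|"""
    return n_everything - n_b + n_a_and_b


def difference(n_a, n_a_and_b):
    """|A ∩ not(B)| = |A| - |A ∩ B|"""
    return n_a - n_a_and_b


# Hypothetical sample data: 20 users, two overlapping segments.
everything = set(range(20))
A = set(range(0, 8))
B = set(range(5, 12))
not_B = everything - B

# Compare the rewritten formulas with the directly computed sets.
assert union_with_complement(len(everything), len(B), len(A & B)) == len(A | not_B)
assert difference(len(A), len(A & B)) == len(A & not_B)
```

The point of both rewrites is that the complement never has to be materialised: only counts of ordinary sets and of the intersection of A and B appear on the right-hand side.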