On the k-Independence Required by Linear Probing and Minwise Independence
Mihai Pătrașcu, Mikkel Thorup
ICALP’10

Linear Probing
[Figure: hash table with cells f, a, c, z, m, j; labels: insert(x), h(x)]
[Knuth’63] E[time of one operation] = O(1): “birth of algorithm analysis”
But assumes h is a truly random function ⇒ not an algorithm, but a heuristic
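As a reminder of what is being analyzed, here is a minimal sketch of linear probing. The function names `insert` and `lookup`, the fixed table size, and the absence of deletions and resizing are assumptions made for illustration; the slides do not show code.

```python
from typing import Callable, List, Optional


def insert(table: List[Optional[str]], key: str, h: Callable[[str], int]) -> int:
    """Insert key by scanning right from h(key); return the number of probes."""
    b = len(table)
    pos = h(key) % b
    for probes in range(1, b + 1):
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return probes
        pos = (pos + 1) % b  # wrap around at the end of the table
    raise RuntimeError("table is full")


def lookup(table: List[Optional[str]], key: str, h: Callable[[str], int]) -> bool:
    """Scan right from h(key) until the key or an empty cell is found."""
    b = len(table)
    pos = h(key) % b
    for _ in range(b):
        if table[pos] is None:
            return False
        if table[pos] == key:
            return True
        pos = (pos + 1) % b
    return False
```

The cost of an operation is the number of probes, which is governed by the length of the run of occupied cells around h(x).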

Implementable Hash Functions
k-independence [Wegman, Carter FOCS’79]
As we draw h from a family H:
• uniformity: (∀) x ∈ U, h(x) uniform in [b]
• independence: (∀) distinct x₁, …, xₖ ∈ U, h(x₁), …, h(xₖ) i.i.d.
Possible implementation:
• let U = prime field
• draw a₀, …, aₖ₋₁ ∈ U randomly
• h(x) = ( aₖ₋₁ xᵏ⁻¹ + … + a₁x + a₀ ) mod b
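The polynomial construction above can be sketched as follows. The prime `P`, the names `poly_hash` and `draw_poly_hash`, and the use of Horner's rule are choices made for this illustration, not taken from the slides. Keys are assumed to be integers below `P`, and the final `mod b` is only approximately uniform unless b divides the field size.

```python
import random
from typing import Callable, List

P = (1 << 61) - 1  # a Mersenne prime; assumed larger than every key


def poly_hash(coeffs: List[int], x: int, b: int, p: int = P) -> int:
    """Evaluate (a_{k-1} x^{k-1} + ... + a_1 x + a_0 mod p) mod b.

    coeffs is [a_0, a_1, ..., a_{k-1}]; evaluation uses Horner's rule.
    """
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc % b


def draw_poly_hash(k: int, b: int, rng: random.Random) -> Callable[[int], int]:
    """Draw k random coefficients and return the resulting hash function."""
    coeffs = [rng.randrange(P) for _ in range(k)]
    return lambda x: poly_hash(coeffs, x, b)
```

For example, `draw_poly_hash(5, 1024, random.Random())` gives a hash function from a degree-4 polynomial, which is the k = 5 case discussed later in the talk.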

Understanding Linear Probing
[Pagh, Pagh, Ružić STOC’07] 5-independence suffices!
Let N(v) = # leaves under v = 2^level(v)
[Figure: binary tree over the table cells f, a, c, z, m, with a node v marked]
v is “dangerous” if at least ¾ N(v) elements hash under it
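To make the definition concrete, the sketch below finds the dangerous nodes of a perfect binary tree built over the table positions. It assumes the table size b is a power of two, that leaves are level 0, and that "dangerous" means at least ¾ N(v) keys hash under v; the function name `dangerous_nodes` and the (level, index) labeling are hypothetical.

```python
from typing import List, Tuple


def dangerous_nodes(hashes: List[int], b: int) -> List[Tuple[int, int]]:
    """Return (level, index) for every node v with >= 3/4 * N(v) keys under it.

    hashes holds h(x) for each key x; level 0 is the leaves, N(v) = 2**level.
    """
    assert b > 0 and b & (b - 1) == 0, "b is assumed to be a power of two"
    counts = [0] * b
    for hv in hashes:
        counts[hv] += 1

    result = []
    level = 0
    while True:
        n_v = 1 << level
        for i, c in enumerate(counts):
            if 4 * c >= 3 * n_v:  # c >= (3/4) * N(v), kept in integers
                result.append((level, i))
        if len(counts) == 1:
            break
        # Move one level up: each parent sums the counts of its two children.
        counts = [counts[2 * i] + counts[2 * i + 1] for i in range(len(counts) // 2)]
        level += 1
    return result
```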

Understanding Linear Probing
Main Lemma: If h(x) is in a run of length 2^k ⇒ the level-(k−1) ancestor of h(x) or a sibling must be dangerous
[Figure: insert(x) at h(x) inside a run, with starred nodes above it; one of these must be dangerous]

Understanding Linear Probing
Look at “construction time” = time to insert n elements
Main lemma ⇒ construction time ≤ Σ_{dangerous v} [N(v)]²
E[construction time] ≤ Σ_v [N(v)]² · Pr[v dangerous]
2-independence ⇒ Chebyshev bound ⇒ Pr[v dangerous] ≤ 1/N(v)
⇒ E[construction time] ≤ Σ_v N(v) = O(n lg n)
Just a classic balls-in-bins analysis!
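The O(n lg n) total can be checked by summing level by level. This worked step assumes a perfect binary tree over a table of size b, with b a power of two and b = Θ(n); the level index ℓ is notation introduced here, and constant factors are suppressed as on the slide.

```latex
\sum_{v} N(v)
  \;=\; \sum_{\ell=0}^{\lg b} \underbrace{\frac{b}{2^{\ell}}}_{\text{nodes at level } \ell} \cdot \underbrace{2^{\ell}}_{N(v)}
  \;=\; b\,(\lg b + 1)
  \;=\; O(n \lg n)
```

Every level contributes the same amount b, so the logarithmic factor comes purely from the number of levels.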

Understanding Linear Probing
4-independence ⇒ 4th moment bound ⇒ Pr[v dangerous] ≤ 1/[N(v)]²
⇒ E[construction time] ≤ Σ_v O(1) = O(n)
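With the 4th moment bound, each node contributes only a constant, so the sum reduces to counting nodes. As a worked step, assuming a perfect binary tree over a table of size b = Θ(n) with b a power of two (an assumption of this sketch) and suppressing constant factors:

```latex
\mathbf{E}[\text{construction time}]
  \;\le\; \sum_{v} [N(v)]^{2} \cdot \frac{1}{[N(v)]^{2}}
  \;=\; \sum_{v} 1
  \;=\; 2b - 1
  \;=\; O(n)
```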

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | O(n lg n), Ω(n lg n) [PPR] | | O(n) | O(n) |
| Time/operation | | O(lg n) | O(lg n) | O(1) |

One query with k-independence = keys arrange themselves by (k-1)-independence + the query hits a random location

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | | Θ(n) | Θ(n) |
| Time/operation | ?? | O(lg n) | O(lg n) | Θ(1) |

Do we really need “one more” for bounds per operation?

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | | Θ(n) | Θ(n) |
| Time/operation | Θ(√n) | O(lg n) | O(lg n) | Θ(1) |

Do we really need “one more” for bounds per operation? YES.
Nasty 2-independent family such that:
• often, (∃) run of √n elements;
• the query often falls in this bad run.

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | ?? | Θ(n) | Θ(n) |
| Time/operation | Θ(√n) | O(lg n) | O(lg n) | Θ(1) |

Could 3-independence help?

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | Ω(n lg n) | Θ(n) | Θ(n) |
| Time/operation | Θ(√n) | O(lg n) | O(lg n) | Θ(1) |

Could 3-independence help? NO
Distribute keys down a tree:
[Figure: n keys at a node split between its two children. Case 1: n/2 and n/2. Case 2: 0 and n. Label: cost Ω(n²)]
If Pr[Case 2] ≈ 1/n ⇒ 3-independence!

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | Θ(n lg n) | Θ(n) | Θ(n) |
| Time/operation | Θ(√n) | Θ(lg n) | ?? | Θ(1) |

Can both phenomena hit you simultaneously?

Understanding Linear Probing

| | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|
| Construction time | Θ(n lg n) | Θ(n lg n) | Θ(n) | Θ(n) |
| Time/operation | Θ(√n) | Θ(lg n) | O(lg n), Ω(lg n) | Θ(1) |

Can both phenomena hit you simultaneously? YES
Apply the bad 3-independent distribution only on the query path ⇒ 4-independent!
[Nasty proof]

Minwise Independence
Problem: how many packets pass through both A and B?
Jaccard coefficient: |A ∩ B| / |A ∪ B|
Algorithm:
• hash and keep min h(A), min h(B)
• Pr[min h(A) = min h(B)] = |A ∩ B| / |A ∪ B|
• repeat to estimate accurately
[Figure: sets A and B]
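The estimation loop above can be sketched as follows. The names `draw_hash` and `estimate_jaccard` are illustrative, keys are assumed to be integers below `P`, and the linear hash used here is only 2-independent. It is a stand-in to keep the example short, not a minwise independent family, so the estimate it produces carries no accuracy guarantee.

```python
import random
from typing import Callable, List, Set

P = (1 << 61) - 1  # prime modulus; keys are assumed to lie in [0, P)


def draw_hash(rng: random.Random) -> Callable[[int], int]:
    """Draw h(x) = (a*x + c) mod P with a != 0 (a stand-in hash family)."""
    a = rng.randrange(1, P)
    c = rng.randrange(P)
    return lambda x: (a * x + c) % P


def estimate_jaccard(A: Set[int], B: Set[int], hashes: List[Callable[[int], int]]) -> float:
    """Fraction of hash functions under which min h(A) equals min h(B)."""
    matches = sum(1 for h in hashes if min(map(h, A)) == min(map(h, B)))
    return matches / len(hashes)
```

In a streaming setting only the current minimum per hash function needs to be stored for each of A and B, which is what makes the scheme attractive for counting shared packets.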

Hashing Guarantees
Minwise Independence: for any S, x ∈ S: Pr[ h(x) = min h(S) ] = 1 / |S|
Implies: Pr[min h(A) = min h(B)] = |A ∩ B| / |A ∪ B|
Minwise independence not easy to obtain

Hashing Guarantees
ε-Minwise Independence: for any S, x ∈ S: Pr[ h(x) = min h(S) ] = (1 ± ε) / |S|
Implies: Pr[min h(A) = min h(B)] = (1 ± ε) |A ∩ B| / |A ∪ B|
[Indyk SODA’99] Any c·lg(1/ε)-independent family is ε-minwise independent
Here: Some c’·lg(1/ε)-independent families are not ε-minwise independent

What it All Means
All our hash families are artificial… we understand the k-wise independence concept
In practice:
• (a * x) >> shift
More results:
• Ω(n lg n)-construction for linear probing
• terrible minwise behavior
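The expression `(a * x) >> shift` is read here as the usual multiply-shift scheme on w-bit words, where the product is truncated to w bits before shifting. That reading, the word size `W = 64`, and the names `multiply_shift` and `draw_multiply_shift` are assumptions of this sketch. Python integers do not overflow, so the truncation is made explicit with a mask.

```python
import random
from typing import Callable

W = 64  # assumed machine word size


def multiply_shift(a: int, x: int, l: int, w: int = W) -> int:
    """Return the top l bits of (a * x) mod 2**w."""
    mask = (1 << w) - 1
    return ((a * x) & mask) >> (w - l)


def draw_multiply_shift(l: int, rng: random.Random) -> Callable[[int], int]:
    """Pick a random odd w-bit multiplier a; hash values lie in [0, 2**l)."""
    a = rng.randrange(1, 1 << W, 2)
    return lambda x: multiply_shift(a, x, l)
```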

In practice:
• (a * x) >> shift
• tabulation-based hashing
Forthcoming paper: Simple tabulation (3-wise independent) achieves
• linear probing in O(1) time (+ Chernoff concentration!)
• o(1)-minwise independence
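A minimal sketch of simple tabulation hashing, for readers who have not seen it: the key is split into c characters, each character indexes its own table of random values, and the results are XORed. The function name `draw_simple_tabulation` and its parameters are hypothetical, and returning the tables alongside the hash function is done only so the example can be inspected.

```python
import random
from typing import Callable, List, Tuple


def draw_simple_tabulation(
    c: int, bits_per_char: int, out_bits: int, rng: random.Random
) -> Tuple[Callable[[int], int], List[List[int]]]:
    """Build c random lookup tables, one per key character."""
    tables = [
        [rng.getrandbits(out_bits) for _ in range(1 << bits_per_char)]
        for _ in range(c)
    ]
    char_mask = (1 << bits_per_char) - 1

    def h(x: int) -> int:
        out = 0
        for t in tables:
            out ^= t[x & char_mask]  # look up the current low character
            x >>= bits_per_char      # move on to the next character
        return out

    return h, tables
```

For 32-bit keys, `draw_simple_tabulation(4, 8, 32, random.Random())` uses four tables of 256 entries each, small enough to stay in cache.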

The polynomial hash function:
• performance not understood
• but not too good in practice…

The End
Open problem: cuckoo hashing
• 6-independence needed [Cohen, Kane]
• O(lg n)-independence suffices