lecture 12-cs648-2013
Randomized Algorithms CS648
Lecture 12: Hashing - II
RECAP OF LAST LECTURE
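Problem Definition
• $U = \{0, 1, \ldots, m-1\}$, called universe
• $S \subseteq U$ and $s = |S|$
• Examples: [...]
Aim: Given a set $S \subseteq U$, build a data structure storing $S$ s.t. we can answer in O(1) time:
"Does $i \in S$?" for any given $i \in U$.
Hashing
• Hash table $T$: an array of size $n$.
• Hash function $h : U \to \{0, 1, \ldots, n-1\}$
Answering a Query "Does $i \in S$?":
1. $k \leftarrow h(i)$;
2. Search the list stored at $T[k]$.
Properties of $h$:
• $h(i)$ computable in O(1) time.
• Space required by $h$: O(1).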
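The query procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the class name `ChainedHashTable`, the sample set, and the choice $h(i) = i \bmod n$ are assumptions made for the example.

```python
class ChainedHashTable:
    """Hash table T of size n; T[k] holds the list of elements hashed to k."""

    def __init__(self, elements, n, h):
        self.n = n
        self.h = h                          # hash function: U -> {0, ..., n-1}
        self.T = [[] for _ in range(n)]
        for i in elements:
            self.T[h(i)].append(i)

    def contains(self, i):
        """Answer "Does i belong to S?" by scanning the list stored at T[h(i)]."""
        return i in self.T[self.h(i)]


n = 5                                       # assumed table size
table = ChainedHashTable([3, 8, 14, 21], n, lambda i: i % n)
print(table.contains(14), table.contains(9))   # True False
```

The cost of `contains` is the length of one list, which is why the rest of the lecture is about keeping those lists short.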
[Figure: hash table $T$ with cells indexed $0, 1, \ldots$, holding the elements of $S$]
How many bits are needed to encode $h$?
Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.
Worst case time complexity of searching an item $i$: No. of elements in $S$ colliding with $i$.
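To make the definition concrete, here is a small sketch that counts colliding pairs and the longest list for a given hash function. The helper names `count_collisions` and `worst_case_search`, the sample set, and the modulus 5 are illustrative assumptions.

```python
from collections import Counter


def count_collisions(S, h):
    """Number of unordered pairs {i, j} of S with h(i) == h(j)."""
    sizes = Counter(h(i) for i in S)
    return sum(k * (k - 1) // 2 for k in sizes.values())


def worst_case_search(S, h):
    """Length of the longest list, i.e. the worst-case cost of one search."""
    return max(Counter(h(i) for i in S).values())


S = [3, 8, 13, 14, 21]
h = lambda i: i % 5                 # assumed hash function for the example
print(count_collisions(S, h))       # 3  (3, 8 and 13 all land in cell 3)
print(worst_case_search(S, h))      # 3
```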
Universal Hash Family
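Definition: A collection $H$ of hash-functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \neq j$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$, where $n$ is the size of the hash table and $h \in_r H$ means that $h$ is picked uniformly at random from $H$.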
This definition appears strange in the beginning! But we shall soon see that there is a very natural way to arrive at this definition.
Perfect hashing using O($s^2$) space
Let $H$ be a Universal Hash Family. Let $X$: the number of collisions for $S$ (with $s = |S|$) when $h \in_r H$.
Question: What is $E[X]$?
Lemma1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Lemma2: For $n = cs^2$, there will be no collision with probability at least $\frac{1}{2}$.
Algorithm1: Perfect hashing for $S$
Fix $n \leftarrow cs^2$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ the number of collisions for $S$ under $h$.
Until $t = 0$.
Build the hash table.
Theorem: A perfect hash function can be computed for $S$ in expected O($s^2$) time.
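A minimal Python sketch of the repeat-until-collision-free idea follows. It assumes the family $h_x(i) = ((ix) \bmod p) \bmod n$ with $p$ a prime larger than every element of $S$ and constant $c = 2$; the function names and the sample set are hypothetical.

```python
import random
from collections import Counter


def count_collisions(S, h):
    """Number of unordered pairs of S that h sends to the same cell."""
    sizes = Counter(h(i) for i in S)
    return sum(k * (k - 1) // 2 for k in sizes.values())


def build_perfect_table(S, p, c=2):
    """Retry random hash functions until no two elements of S collide."""
    n = c * len(S) ** 2                      # fix n = c * s^2
    while True:
        x = random.randint(1, p - 1)         # pick h uniformly from the family
        h = lambda i, x=x: ((i * x) % p) % n
        if count_collisions(S, h) == 0:      # until t = 0
            table = [None] * n
            for i in S:
                table[h(i)] = i
            return h, table


S = [12, 47, 58, 73, 90]
h, table = build_perfect_table(S, p=101)     # 101 is prime and exceeds max(S)
print(all(table[h(i)] == i for i in S))      # True
```

Since each attempt succeeds with probability at least 1/2, the loop runs twice in expectation; each attempt costs O($s$) for counting plus O($s^2$) for allocating the table.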
HASHING WITH OPTIMAL SPACE AND WORST CASE O(1) SEARCH TIME
Optimal space hashing with worst case O(1) search time
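$H$ be a Universal Hash Family. $X$: no. of collisions for $S$ when $h \in_r H$.
Lemma1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Question: What is $E[X]$ when $n = cs$?
Answer: $E[X] \le \frac{s-1}{2}$.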
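The answer follows from linearity of expectation. A short derivation, assuming $s = |S|$, table size $n$, and universality constant $c$ (symbol names chosen here for illustration):

```latex
\begin{align*}
E[X] &= \sum_{\{i,j\} \subseteq S,\ i \neq j} P_{h \in_r H}[h(i) = h(j)]
      \;\le\; \binom{s}{2} \cdot \frac{c}{n}
      \;=\; \frac{c}{n} \cdot \frac{s(s-1)}{2}, \\
n = cs \;&\Longrightarrow\; E[X] \le \frac{s-1}{2}.
\end{align*}
```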
$H$ be a Universal Hash Family. $X$: no. of collisions for $S$ when $h \in_r H$.
Lemma1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.
Algorithm:
Fix $n \leftarrow cs$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ no. of collisions for $S$ under $h$;
Until $t \le s$;
Build the hash table $T$; //primary hash table
For each $0 \le i < n$
If size of list $T[i]$ > 1
1. Build a perfect hash table for list $T[i]$;
2. Make $T[i]$ point to this hash table;
[Figure: primary hash table $T$ with cells $0, 1, \ldots$]
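The two-level construction can be sketched as below. This is an illustrative sketch under stated assumptions: the family $h_x(i) = ((ix) \bmod p) \bmod n$ with $p$ prime and larger than every element, constant $c = 2$, and the acceptance rule "retry until the number of collisions is at most $s$". For simplicity it builds a secondary table for every non-empty list, not only for lists of size greater than 1. All function names are hypothetical.

```python
import random
from collections import Counter


def random_hash(p, n):
    """Pick h_x(i) = ((i*x) mod p) mod n with x uniform in {1, ..., p-1}."""
    x = random.randint(1, p - 1)
    return lambda i: ((i * x) % p) % n


def collisions(S, h):
    sizes = Counter(h(i) for i in S)
    return sum(k * (k - 1) // 2 for k in sizes.values())


def build_perfect(S, p, c=2):
    """Collision-free table of size c*|S|^2 (retry until no collision)."""
    n = c * len(S) ** 2
    while True:
        h = random_hash(p, n)
        if collisions(S, h) == 0:
            table = [None] * n
            for i in S:
                table[h(i)] = i
            return h, table


def build_two_level(S, p, c=2):
    s = len(S)
    n = c * s
    while True:                              # primary level: retry until t <= s
        h = random_hash(p, n)
        if collisions(S, h) <= s:
            break
    lists = [[] for _ in range(n)]
    for i in S:
        lists[h(i)].append(i)
    # secondary level: one perfect hash table per non-empty list
    secondary = [build_perfect(b, p, c) if b else None for b in lists]
    return h, secondary


def lookup(structure, i):
    """Worst-case O(1): one primary hash, one secondary hash, one comparison."""
    h, secondary = structure
    entry = secondary[h(i)]
    if entry is None:
        return False
    h2, table = entry
    return table[h2(i)] == i


S = random.sample(range(10000), 200)
structure = build_two_level(S, p=10007)      # 10007 is prime and exceeds max(S)
print(all(lookup(structure, i) for i in S))  # True
```

Because the primary level is accepted only when $t \le s$, the total size of the secondary tables is at most $c(2t + s) \le 3cs$.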
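Let $s_i$: number of elements in $T[i]$.
Is there any relation between $t$ and the $s_i$'s? Every pair of elements in the same list is one collision, so $t = \sum_i \frac{s_i(s_i-1)}{2}$.
Extra space required: $\sum_i c\,s_i^2 = c\sum_i s_i(s_i-1) + c\sum_i s_i = 2ct + cs$, which is O($s$) whenever $t \le s$.
[Figure: primary table $T$; each cell points to a secondary hash table with cells $0, 1, 2, \ldots$]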
Theorem: A given set $S$ of size $s$ can be preprocessed in expected O($s$) time to build a data structure (2-level hash table) of O($s$) size such that any search query can be answered in worst case O(1) time.
WHY SUCH A DEFINITION FOR UNIVERSAL HASH FAMILY ?
Why does hashing work so well in Practice ?
A simple hash function: $h(i) = i \bmod n$.
• $h$ works so well in practice because the set $S$ is usually a uniformly random subset of $U$. As a result, two elements of $S$ collide with probability about $1/n$.
• It is easy to fool this hash function such that it achieves O($s$) search time.
This makes us think: "Can we achieve expected O(1) search time for any given set $S$?"
(A similar question about Quick Sort led to Randomized Quick Sort.)
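A tiny sketch of how a fixed hash function can be fooled, assuming the simple function is $h(i) = i \bmod n$: an adversary who knows $n$ picks only multiples of $n$. The values of `n` and `S_bad` are made up for the illustration.

```python
from collections import Counter

n = 10                                       # assumed table size
S_bad = [k * n for k in range(1, 8)]         # adversarial set: multiples of n
sizes = Counter(i % n for i in S_bad)        # list length per cell under i mod n
longest = max(sizes.values())
print(len(sizes), longest)                   # 1 7: one cell holds all 7 elements
```

Every search in this table scans a list of length $s$, which is exactly the O($s$) behaviour mentioned above.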
Universal Hash Family
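A simple hash function: $h(i) = i \bmod n$.
Definition: A collection $H$ of hash-functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \neq j$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$.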
A SIMPLE AND COMPACT UNIVERSAL HASH FAMILY
The starting point
The simple hash function: $h(i) = i \bmod n$.
Problem: Two elements $i, j$ in $S$ are bound to collide if $n$ divides $|i - j|$.
Is there some operation which, when applied over any $i, j$, distributes $|i - j|$ uniformly at random over an interval $[0, 1, \ldots]$ of integers?
mod operation
$a$: a non-negative integer
$n$: a positive integer
$a \bmod n \in \{0, 1, \ldots, n-1\}$.
Question: How is $|i \bmod n - j \bmod n|$ related to $|i - j| \bmod n$?
Consider some Examples:
• $|55 \bmod 31 - 43 \bmod 31| = 12$ and $|55 - 43| \bmod 31 = 12$
• $|91 \bmod 31 - 102 \bmod 31| = 20$ and $|91 - 102| \bmod 31 = 11$
Answer: Let $\Delta = |i - j| \bmod n$. Then $|i \bmod n - j \bmod n| \in \{\Delta, n - \Delta\}$.
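The two examples can be checked mechanically. The sketch below assumes the relation being illustrated is $|i \bmod n - j \bmod n| \in \{\Delta, n - \Delta\}$ with $\Delta = |i - j| \bmod n$; the helper name `check` is hypothetical.

```python
def check(i, j, n):
    """Return (|i mod n - j mod n|, |i - j| mod n, whether the relation holds)."""
    delta = abs(i - j) % n
    diff = abs(i % n - j % n)
    return diff, delta, diff in (delta, n - delta)


print(check(55, 43, 31))     # (12, 12, True)
print(check(91, 102, 31))    # (20, 11, True)
```

In the second example the two quantities differ, but they add up to $n = 31$.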
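mod operation
$p$: a prime number
Consider any $x \in \{1, \ldots, p-1\}$.
Question: What can we say about the set $\{ix \bmod p \mid 1 \le i \le p-1\}$?
Example: $p = 7$, $x = 3$ (the last row shows $x = 4$).
$i$          : 1 2 3 4 5 6
$3i \bmod 7$ : 3 6 2 5 1 4
$4i \bmod 7$ : 4 1 5 2 6 3
Fact: $\{ix \bmod p \mid 1 \le i \le p-1\} = \{1, \ldots, p-1\}$ for all $x \in \{1, \ldots, p-1\}$.
Proof: $ix \bmod p = jx \bmod p$ $\Rightarrow$ $p$ divides $(ix - jx)$ $\Rightarrow$ $p$ divides $(i-j)x$ $\Rightarrow$ $p$ divides $(i-j)$ or $p$ divides $x$. Not possible, since $0 < |i-j| < p$ and $0 < x < p$.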
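A quick way to see the fact, and why primality matters, is to list the images directly. The sketch assumes the set in question is $\{ix \bmod p \mid 1 \le i \le p-1\}$; the function name `images` and the composite modulus 6 used for contrast are additions for illustration.

```python
def images(x, p):
    """The list (i*x mod p) for i = 1, ..., p-1."""
    return [(i * x) % p for i in range(1, p)]


print(images(3, 7))          # [3, 6, 2, 5, 1, 4]
print(images(4, 7))          # [4, 1, 5, 2, 6, 3]

# For prime p = 7, every x in {1, ..., 6} permutes {1, ..., 6}.
print(all(sorted(images(x, 7)) == list(range(1, 7)) for x in range(1, 7)))   # True

# For a composite modulus the property can fail.
print(images(3, 6))          # [3, 0, 3, 0, 3]
```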
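mod operation
$p$: a prime number
Consider any $x \in \{1, \ldots, p-1\}$.
Fact: $\{ix \bmod p \mid 1 \le i \le p-1\} = \{1, \ldots, p-1\}$ for all $x \in \{1, \ldots, p-1\}$.
Question: If $x \in_r \{1, \ldots, p-1\}$, then what can we say about $ix \bmod p$?
Answer: $ix \bmod p$ is distributed uniformly at random over $\{1, \ldots, p-1\}$.
Can you now see that the above answer plays the key role in formulating the hash function?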
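Good fact: An element $i$ is mapped to a random element in $\{1, \ldots, p-1\}$.
Slightly bad fact: Once element $i$ is mapped to a location, the mapping of $j$ is no more random.
So it is not clear whether the difference between the images of $i$ and $j$ is distributed uniformly at random over its range. So let us look at $ix \bmod p$ a bit more closely…
[Figure: $i$ and $i + \Delta$, and their images under $ix \bmod p$]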
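Probability of collision between $i$ and $j$
Let $h_x(i) = (ix \bmod p) \bmod n$.
$i$ and $j$ will collide under $h_x$ if $|ix \bmod p - jx \bmod p|$ is divisible by $n$.
Question: What is the relation between $|ix \bmod p - jx \bmod p|$ and $|i-j|x \bmod p$?
Answer: $|ix \bmod p - jx \bmod p|$ is either $|i-j|x \bmod p$ or $p - (|i-j|x \bmod p)$.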
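Lemma: If $i$ and $j$ collide under $h_x$, then either $|i-j|x \bmod p$ is divisible by $n$ or $p - (|i-j|x \bmod p)$ is divisible by $n$.
(Students must realize that this is a necessary condition and not a sufficient condition for collision. To get an idea, study the example given at the last slide of this lecture.)
$\{|i-j|x \bmod p \mid x \in \{1, \ldots, p-1\}\} = \{1, \ldots, p-1\}$
Let $x \in_r \{1, \ldots, p-1\}$.
Probability of collision between $i$ and $j$
$\le$ P($|i-j|x \bmod p$ is divisible by $n$ or $p - (|i-j|x \bmod p)$ is divisible by $n$)
$\le$ 2 P($|i-j|x \bmod p$ is divisible by $n$)
$= \frac{2\lfloor (p-1)/n \rfloor}{p-1} \le \frac{2}{n}$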
Theorem: Let $h_x(i) = (ix \bmod p) \bmod n$, then $H = \{h_x \mid 1 \le x \le p-1\}$ is universal.
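For small parameters the bound can be checked exhaustively. The sketch below assumes the family is $h_x(i) = ((ix) \bmod p) \bmod n$ with $x \in \{1, \ldots, p-1\}$ and a universe contained in $\{0, \ldots, p-1\}$; the values $p = 31$ and $n = 5$ are arbitrary choices for the check, and it verifies the constant 2 rather than proving it.

```python
from fractions import Fraction
from itertools import combinations


def collision_probability(i, j, p, n):
    """Exact probability, over x uniform in {1, ..., p-1}, that h_x(i) == h_x(j)."""
    hits = sum(1 for x in range(1, p) if (i * x % p) % n == (j * x % p) % n)
    return Fraction(hits, p - 1)


p, n = 31, 5                                  # assumed small prime and table size
worst = max(collision_probability(i, j, p, n)
            for i, j in combinations(range(p), 2))
print(worst <= Fraction(2, n))                # True
```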
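Example
$p = 7$, [...]. Observe that [...] $= 1$.
Question: How many collisions between [...] and [...]?
Answer: two (for $x = 3, 4$). Here [...] for $x = 4$, and [...] for $x = 3$.
Question: [...]
Answer: No collisions! (although [...] for [...] here.)
Table storing $ix \bmod 7$ (row $x$, column $i$):
        $i$ = 1 2 3 4 5 6
$x$ = 1:      1 2 3 4 5 6
$x$ = 2:      2 4 6 1 3 5
$x$ = 3:      3 6 2 5 1 4
$x$ = 4:      4 1 5 2 6 3
$x$ = 5:      5 3 1 6 4 2
$x$ = 6:      6 5 4 3 2 1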
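The gap between "necessary" and "sufficient" can be explored with the table for $p = 7$. In the sketch below the table size $n = 3$ and the pairs $(1, 2)$ and $(2, 3)$ are assumptions picked for illustration; they are not necessarily the values used on the slide. The function names are hypothetical.

```python
p, n = 7, 3        # p = 7 matches the table; n = 3 is an assumed table size


def colliding_x(i, j):
    """All x in {1, ..., p-1} for which h_x(i) == h_x(j)."""
    return [x for x in range(1, p) if (i * x % p) % n == (j * x % p) % n]


def lemma_condition_x(i, j):
    """All x for which d = |i-j|x mod p satisfies: n divides d or n divides p - d."""
    delta = abs(i - j)
    result = []
    for x in range(1, p):
        d = (delta * x) % p
        if d % n == 0 or (p - d) % n == 0:
            result.append(x)
    return result


print(colliding_x(1, 2))          # [3, 4]
print(colliding_x(2, 3))          # []
print(lemma_condition_x(2, 3))    # [1, 3, 4, 6]
```

Under these assumed values the pair $(2, 3)$ never collides even though the lemma's condition holds for four values of $x$, so the condition is necessary but not sufficient.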
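Homework:
Let $h_{x,y}(i) = ((ix + y) \bmod p) \bmod n$. Then prove that $H = \{h_{x,y} \mid 1 \le x \le p-1,\ 0 \le y \le p-1\}$ is universal. In particular, show that for any $i \neq j$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{1}{n}$.
Hence it is slightly better than the hash family discussed just now.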
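Before attempting the proof, the claim can be sanity-checked by brute force. The sketch assumes the homework family is the standard $h_{x,y}(i) = ((ix + y) \bmod p) \bmod n$ with $x \in \{1, \ldots, p-1\}$ and $y \in \{0, \ldots, p-1\}$, and that the target bound is $1/n$; the parameters $p = 13$, $n = 4$ are arbitrary. A passing check is evidence, not a proof.

```python
from fractions import Fraction
from itertools import combinations


def collision_probability(i, j, p, n):
    """Exact probability over uniform (x, y) that h_{x,y}(i) == h_{x,y}(j)."""
    hits = sum(1
               for x in range(1, p)
               for y in range(p)
               if ((i * x + y) % p) % n == ((j * x + y) % p) % n)
    return Fraction(hits, (p - 1) * p)


p, n = 13, 4                                  # assumed small prime and table size
worst = max(collision_probability(i, j, p, n)
            for i, j in combinations(range(p), 2))
print(worst <= Fraction(1, n))                # True
```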