CS 221 Guest lecture: Cuckoo Hashing Shannon Larson March 11, 2011




Learning Goals

• Describe the cuckoo hashing principle
• Analyze the space and time complexity of cuckoo hashing
• Apply the insert and lookup algorithms in a cuckoo hash table
• Construct the graph for a cuckoo table

Remember Graphs?

• A set of nodes
• A set of edges
• Here: [Figure: example graph]

Graph Cycles

• A graph cycle is a path of edges such that the first and last vertices are the same

v1, v2, v5, v3, v4, v1
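As a quick check on the definition, here is a minimal Python sketch that tests whether a sequence of vertices forms a cycle. The edge set is hypothetical: only the cycle v1, v2, v5, v3, v4, v1 comes from the slide, and the graph is treated as undirected for simplicity.

```python
def is_cycle(path, edges):
    """Return True if `path` follows edges and ends where it starts."""
    if len(path) < 2 or path[0] != path[-1]:
        return False
    # Every consecutive pair must be joined by an edge (undirected here).
    return all((a, b) in edges or (b, a) in edges
               for a, b in zip(path, path[1:]))


# Hypothetical edge set that contains the slide's cycle.
EDGES = {(1, 2), (2, 5), (5, 3), (3, 4), (4, 1)}

print(is_cycle([1, 2, 5, 3, 4, 1], EDGES))  # True
print(is_cycle([1, 2, 3, 1], EDGES))        # False: (2, 3) is not an edge
```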

Recall Hashing

• A hash function h
– Takes the target x
– Hashes x to a bucket h(x)
• Perfect hashing is ideal:
– O(1) lookup
– O(1) insert
• Perfect hashing is not realistic!

Cuckoo Hashing: the idea

• Remember the cuckoo bird?
– Shares a nest with other species…
– …then kicks the other species out!
• Same idea with cuckoo hashing
– When we insert x, we “kick out” whatever occupies the nest
– The displaced item then finds a new, alternate home

Why is this cool?

• Perfect hashing guarantees
– O(1) lookup, O(1) insert
• Cuckoo hashing guarantees
– O(1) lookup
– O(1) insert**

• Other hashing strategies can’t guarantee this!

• Also, it’s an option for your final project

** There’s a caveat here, but we’ll see it later

Cuckoo Hashing: Two Nests

• Suppose we have TWO hash tables, T1 and T2
– they each have a hash function: h1 for T1, h2 for T2
– we prefer T1, but if we have to move we’ll go to T2
– if we’re in T2 and have to move, we’ll go back to T1
• This is our collision strategy for cuckoo hashing
– Different from linear probing/open addressing
– Different from trees

Cuckoo Hashing: Example

• We want to insert x
• There are no conflicts anywhere

[Figure: the two tables, with x stored at h1(x) and h2(x) marked as its alternate]

Cuckoo Hashing: Example

• Now we want to insert y
• There are no conflicts anywhere

[Figure: the two tables holding x and y]

Cuckoo Hashing: Example

• To insert z, we need h1(z), which x already occupies
• Move x to its alternate, h2(x)

[Figure: the two tables with x, y, and z; the collision is marked “oh no!”]

Cuckoo Hashing: Example

• Now we insert z into h1(z)

[Figure: the two tables after x has moved, captioned “NOW we’re fine!”]

Cuckoo Hashing: Example

• The final table after inserting x, y, z in order

[Figure: the two tables holding x, y, and z]
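The two-table walkthrough above can be replayed in a few lines of Python. This is a minimal sketch under stated assumptions, not the lecture's implementation: the names `t1`/`t2`, the cap `max_kicks`, and the hard-coded hash values in `H1` and `H2` are all hypothetical, chosen so that z collides with x in the first table.

```python
class TwoTableCuckoo:
    """Two tables, one hash function each; new items prefer table 1."""

    def __init__(self, size, h1, h2, max_kicks=8):
        self.t1 = [None] * size
        self.t2 = [None] * size
        self.h1, self.h2 = h1, h2
        self.max_kicks = max_kicks  # assumed cap on displacement rounds

    def lookup(self, key):
        return self.t1[self.h1(key)] == key or self.t2[self.h2(key)] == key

    def insert(self, key):
        if self.lookup(key):
            return
        for _ in range(self.max_kicks):
            # Take the slot in table 1, picking up whatever was there.
            i = self.h1(key)
            key, self.t1[i] = self.t1[i], key
            if key is None:
                return
            # The evicted item moves to its slot in table 2.
            j = self.h2(key)
            key, self.t2[j] = self.t2[j], key
            if key is None:
                return
        # A real table would rehash here; this sketch only reports it.
        raise RuntimeError(f"too many displacements; {key!r} has no slot")


# Hypothetical hash values: z collides with x in table 1.
H1 = {"x": 0, "y": 1, "z": 0}
H2 = {"x": 2, "y": 3, "z": 1}

table = TwoTableCuckoo(4, H1.__getitem__, H2.__getitem__)
for item in ("x", "y", "z"):
    table.insert(item)

print(table.t1)  # ['z', 'y', None, None]
print(table.t2)  # [None, None, 'x', None]
```

Inserting z evicts x from table 1, and x lands in its alternate slot in table 2, which matches the order of moves on the slides.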

Why two tables?

• Two tables, one for each hash function
• Simple to visualize, simple to implement
• But, why two?
• One table works just as well!
• Just as simple to implement (all one table)

One Table Example

• Let’s insert x again, with hash functions h1 and h2
• Again, h1 is preferred

[Figure: one table, with x stored at h1(x) and h2(x) marked as its alternate]

One Table Example

• Now insert y
• No conflicts, no problem

[Figure: one table holding x and y, with h1(y) and h2(y) marked]

One Table Example

• Now insert z
• But, another conflict with x:

[Figure: one table holding x and y; z collides at h1(z) (“oh no!”), with h2(z) also marked]

One Table Example

• First, move x to h2(x)

[Figure: x moving to h2(x), which frees h1(z) for z]

One Table Example

• Now we move z to h1(z)

[Figure: one table holding x, y, and z]

One Table Example

• Final table after inserting x, y, z in order

[Figure: one table holding x, y, and z]

Graph Representation

• How can we represent our table?
• Why not a graph?
– Nodes are every possible table entry
– Edges are inserted entries
• This is a directed graph
• Direction from current location TO alternate location
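To make the graph construction concrete, here is a minimal Python sketch that derives the edges from a one-table layout. The function name `cuckoo_graph`, the 0-based slot numbers, and the hash values in `H1` and `H2` are assumptions for illustration; the slides number slots from 1.

```python
def cuckoo_graph(table, h1, h2):
    """One directed edge per stored item: current slot -> alternate slot."""
    edges = {}
    for slot, item in enumerate(table):
        if item is None:
            continue  # empty slots are nodes with no outgoing edge
        a, b = h1(item), h2(item)
        alternate = b if slot == a else a
        edges[item] = (slot, alternate)
    return edges


# Hypothetical hash values for a 4-slot table.
H1 = {"x": 0, "y": 1, "z": 0}
H2 = {"x": 2, "y": 3, "z": 1}

# Layout after inserting x, y, z in order: z took slot 0, x moved to slot 2.
table = ["z", "y", "x", None]

print(cuckoo_graph(table, H1.__getitem__, H2.__getitem__))
# {'z': (0, 1), 'y': (1, 3), 'x': (2, 0)}
```

Each edge points from where an item currently lives to where it would go if displaced, so following edges traces the chain of moves a new insert could trigger.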

Graph Example

• Remember our one-table example?

[Figure: the one-table example (x, y, z in a table with slots 1 to 4) next to its graph on nodes 1, 2, 3, 4]

Infinite Insert

• Suppose we insert something, and we end up in an infinite loop
– Or, “too many” displacements
– Some pre-defined maximum based on table size

Example: Loops

• Remember our one-table example?

[Figure: the one-table example (x, y, z in a table with slots 1 to 4) next to its graph on nodes 1, 2, 3, 4]

Example: Loops

• Let’s insert w: no conflicts still

[Figure: the table and its graph after inserting w]

Example: Loops

• Now let’s insert a: displace x

[Figure: the table and its graph as a arrives and displaces x]

Example: Loops

• Now the displaced item is placed, and the item it lands on is displaced in turn (put in 4)

[Figure: the table now reads a, y, x; graph on nodes 1, 2, 3, 4]

Example: Loops

• Now the displaced item is placed, and the item it lands on is displaced in turn (put in 3)

[Figure: the table reads a, y, x; graph on nodes 1, 2, 3, 4]

Example: Loops

• Notice what happens to the graph
• We keep going and going and going…

[Figure: graph on nodes 1, 2, 3, 4]

Analysis: Loops

• Remember infinite loops in a new insert?
• In the graph, this is a closed loop
– We might forever re-do the same displacements
• The probability of getting a loop increases dramatically once we’ve inserted more than N/2 elements
– N is the number of buckets (size of table)
– This is from the research on cuckoo hashing

Analysis: Loops

• What can we do once we get a loop?
– Rebuild, same size (ok solution)
– Double table size (better solution)
• We’ll need new hash functions for both

Analysis

• Lookup has O(1) time
– At MOST two places to look, ever
– One location per hash function
• Insert has amortized O(1) time
– Think of this as “in the long run”
– In practice we see O(1) time insert
– You’ll see amortized analysis in CPSC 320
• Remember the “grass and trees” analysis?

Lookup: The Code

Return the position of x (either h1(x) or h2(x)).
Otherwise, return false.

lookup(x)
    return T[h1(x)] = x or T[h2(x)] = x

Insert: The Code

Given a table (array) T and item x to insert:

insert(x)
    if lookup(x) return;          // if it’s already here, done
    pos <- h1(x);                 // store h1(x)
    for i <- 1 to M               // loop at most M times
        if T[pos] empty
            T[pos] <- x
            return;               // if T[pos] empty, done
        swap x and T[pos];        // put x in T[pos]
        if pos = h1(x)            // now we’re displacing
            pos <- h2(x)
        else
            pos <- h1(x)
    rehash();                     // if we couldn’t stop, rehash
    insert(x);                    // then insert currently displaced
end
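The lookup and insert pseudocode above translates fairly directly to Python. What follows is a minimal sketch rather than the lecture's reference implementation. The class name `CuckooTable`, the salted use of Python's built-in `hash()` in place of h1 and h2, the choice M = table size, and doubling the table on rehash are all assumptions.

```python
import random


class CuckooTable:
    """One-table cuckoo hashing; keys must be hashable and not None."""

    def __init__(self, size=8):
        self.size = size
        self.table = [None] * size
        self._new_hashes()

    def _new_hashes(self):
        # Assumption: two random salts turn hash() into stand-ins for h1, h2.
        self.salts = (random.random(), random.random())

    def _h(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def lookup(self, key):
        # At most two slots to check, one per hash function.
        return (self.table[self._h(0, key)] == key
                or self.table[self._h(1, key)] == key)

    def insert(self, key):
        if self.lookup(key):
            return
        pos = self._h(0, key)
        for _ in range(self.size):            # M = table size (assumed)
            if self.table[pos] is None:
                self.table[pos] = key
                return
            # Swap: key takes the slot, the old occupant is now in hand.
            key, self.table[pos] = self.table[pos], key
            # Send the displaced item to its other slot.
            if pos == self._h(0, key):
                pos = self._h(1, key)
            else:
                pos = self._h(0, key)
        self._rehash()                         # too many displacements
        self.insert(key)                       # place the item still in hand

    def _rehash(self):
        items = [k for k in self.table if k is not None]
        self.size *= 2                         # "double table size" option
        self.table = [None] * self.size
        self._new_hashes()                     # new hash functions
        for k in items:
            self.insert(k)


t = CuckooTable(size=4)
for k in range(50):
    t.insert(k)
print(all(t.lookup(k) for k in range(50)))  # True
print(t.lookup(999))                        # False
```

Starting from four slots forces several rehashes along the way, so the example exercises the loop limit as well as the ordinary displacement path.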

Analysis: Load Factor

• What is load?
– The average fill factor (% full) of the table
• What about cuckoo hash tables?
– For two hash functions, load factor < 50%
• Remember loops?
– For three hash functions, we get about 91%
• That’s pretty great, actually!

More hash functions

• What would this look like?
• We would have three tables (simple case)
– One hash function per table
• Or, we would have two alternates (one table)

More hash functions

• What would this look like?
• Each entry has TWO alternates, not one

[Figure: one table holding x, y, and z, each with two alternate slots]

More hash functions

• When something comes in new (insert)
– Put it in h1(x)
• If it’s displaced, check h2(x)
– If that’s full, go to h3(x)
• To lookup, we just look in h1(x), h2(x), or h3(x)
– Still constant time!
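A three-function variant can be sketched in Python as follows. Two details here are assumptions rather than the lecture's method: the sketch tries all three candidate slots for an empty one before displacing anything, and when all three are full it evicts a randomly chosen occupant. The names `lookup3`, `insert3`, `max_kicks`, and the toy hash functions in `HASHES` are hypothetical.

```python
import random


def lookup3(table, hashes, key):
    """Check each candidate slot; with three functions that is three probes."""
    return any(table[h(key)] == key for h in hashes)


def insert3(table, hashes, key, max_kicks=16, rng=random):
    """Return None on success, or the item left without a slot."""
    if lookup3(table, hashes, key):
        return None
    for _ in range(max_kicks):
        slots = [h(key) for h in hashes]
        for s in slots:                      # first, second, then third choice
            if table[s] is None:
                table[s] = key
                return None
        # All candidates are full: evict one at random (assumed policy).
        s = rng.choice(slots)
        key, table[s] = table[s], key
    return key                               # caller would rehash at this point


# Toy hash functions: keys 0, 16, 32, 48 all share slots 0, 5, and 10.
HASHES = [lambda k: k % 16, lambda k: (k + 5) % 16, lambda k: (k + 10) % 16]

table = [None] * 16
for k in (0, 16, 32):
    insert3(table, HASHES, k)
print(table[0], table[5], table[10])  # 0 16 32

leftover = insert3(table, HASHES, 48)  # a fourth key cannot fit in three slots
print(leftover in (0, 16, 32, 48))     # True
```

Three keys that would all fight over one slot with a single hash function coexist here, which is the intuition behind the higher load factor; a fourth key sharing the same three slots still cannot be placed.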

Even better load?

• Currently we’ve only put one item per bucket
• What if we had two cells per bucket?

[Figure: buckets with two cells each, holding (x, w), (y, a), and (z)]

• What about collision strategies?
– Round-robin (cells take turns swapping out)
– FIFO (oldest resident gets kicked out)


Links & Resources

• http://en.wikipedia.org/wiki/Cuckoo_hashing
• http://www.ru.is/faculty/ulfar/CuckooHash.pdf
• http://www.it-c.dk/people/pagh/papers/cuckoo-undergrad.pdf
• No neat animations on the internet…yet!
– Possible personal project?
– Brownie points?
– Pre-coop project?