
DESIGN & ANALYSIS OF ALGORITHM
02 – HASHING (CONTD.)

Informatics Department

Parahyangan Catholic University

ANALOGY

Let's say that you have a drawer full of socks, 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to ensure that you have at least one matching pair?

How about 20 red socks, 12 blue socks, and 8 green socks?

How about an unlimited number of red, blue, green, yellow, and purple socks?

ANALOGY

In a city of 2 million people, no one has more than 1.5 million hairs on his/her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?

PIGEONHOLE PRINCIPLE

In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon.

-- Wikipedia

n = the range of possible keys
m = the size of the hash table

COLLISION

When the range of possible keys is larger than the table size (n > m), two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2).

This condition is known as a collision, and a collision resolution strategy is required.

[Figure: the keys yellow, orange, red, green, blue, black, and white mapped into a smaller table]
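A small Python sketch of the pigeonhole argument applied to hashing. The table size m = 5 and the toy hash function (sum of character codes, reduced mod m) are assumptions chosen only for illustration; with 7 keys and 5 slots, at least 7 - 5 = 2 keys must land in an already-used slot, whatever hash function is used.

```python
# Hypothetical setup: 7 keys, a table of m = 5 slots, and a toy hash function.
keys = ["yellow", "orange", "red", "green", "blue", "black", "white"]
m = 5

def h(key):
    # Toy hash (an assumption): sum of character codes, reduced mod m
    return sum(ord(c) for c in key) % m

first_owner = {}   # slot index -> first key that landed there
collisions = []    # (earlier key, new key, slot) for every collision
for key in keys:
    idx = h(key)
    if idx in first_owner:
        collisions.append((first_owner[idx], key, idx))
    else:
        first_owner[idx] = key

# More keys than slots: the pigeonhole principle guarantees collisions.
print(len(collisions), "collision(s):", collisions)
```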

COLLISION HANDLING: 3 STRATEGIES

Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing

Separate chaining

Coalesced hashing

COLLISION HANDLING: OPEN ADDRESSING

In open addressing, a colliding entry is placed in a new slot in the same table.

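[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman are hashed into slots J (74), K (75), L (76), M (77), and N (78); the table holds John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, and Jane Smith / 521-1234. Where should Kayla Newman go?]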
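A minimal Python sketch of the open addressing figure. It assumes the hash is the ASCII code of the first letter (as the slot labels J (74) to N (78) suggest), a table of m = 128 slots, linear probing, and the insertion order John, Lisa, Kenny, Jane, Kayla; the function names are hypothetical.

```python
m = 128                      # assumed table size (one slot per ASCII code)
table = [None] * m

def first_letter_hash(name):
    # Hypothetical initial hash matching the slot labels J (74), K (75), ...
    return ord(name[0])

def insert_linear(name, phone):
    """Linear probing: try h, h+1, h+2, ...; return (slot, slots examined)."""
    for i in range(m):
        idx = (first_letter_hash(name) + i) % m
        if table[idx] is None:
            table[idx] = (name, phone)
            return idx, i + 1
    raise RuntimeError("hash table is full")

results = {}
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    results[name] = insert_linear(name, phone)
    slot, examined = results[name]
    print(f"{name}: slot {slot} ({chr(slot)}), {examined} slot(s) examined")
```

Under these assumptions Jane Smith lands in slot M (77) and Kayla Newman in slot N (78), each after examining 4 slots.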

COLLISION HANDLING: SEPARATE CHAINING

In separate chaining, colliding entries are stored in linked lists in a separate area, outside the table.

[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman are hashed into slots J (74) to M (77); the slots point to linked lists holding John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, and Kayla Newman 418-4222.]
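The same data with separate chaining, sketched in Python under the same assumed first-letter hash and 128-slot table. Python lists stand in for the linked lists, and new entries are appended at the tail of their chain; the function names are hypothetical.

```python
m = 128                              # assumed table size (one slot per ASCII code)
buckets = [[] for _ in range(m)]     # each slot holds its own chain

def first_letter_hash(name):
    # Hypothetical hash matching the slot labels J (74), K (75), ...
    return ord(name[0]) % m

def insert_chained(name, phone):
    buckets[first_letter_hash(name)].append((name, phone))   # add at the tail

def search_chained(name):
    """Return (phone, chain entries examined); phone is None if absent."""
    chain = buckets[first_letter_hash(name)]
    for examined, (key, phone) in enumerate(chain, start=1):
        if key == name:
            return phone, examined
    return None, len(chain)

for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    insert_chained(name, phone)

print(buckets[74])                      # chain for J
print(search_chained("Kayla Newman"))   # found as the 2nd entry of chain K
```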

COLLISION HANDLING: COALESCED HASHING

Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table.

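[Figure: the keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman are hashed into slots J (74) to N (78); the colliding entry Kayla Newman / 418-4222 is stored in an empty slot of the same table and linked into the chain.]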
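A Python sketch of coalesced hashing for the same five entries. The 5-slot table (slots 0 to 4 standing for J to N), the `slot_of` hash, and the policy of putting a colliding key into the highest-numbered empty slot and appending it to the end of its chain are assumptions; this is one common variant, not necessarily the one drawn in the figure.

```python
class CoalescedHashTable:
    """Chains are linked through empty slots of a single table (no deletion)."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.keys = [None] * m
        self.values = [None] * m
        self.next = [None] * m   # index of the next slot in the chain, or None
        self.free = m - 1        # scans downward for empty slots

    def _find_free(self):
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        return self.free         # -1 means the table is full

    def insert(self, key, value):
        idx = self.hash_fn(key) % self.m
        if self.keys[idx] is None:               # home slot is empty
            self.keys[idx], self.values[idx] = key, value
            return True
        while True:                              # walk the chain from the home slot
            if self.keys[idx] == key:
                self.values[idx] = value         # key already present: update it
                return True
            if self.next[idx] is None:
                break
            idx = self.next[idx]
        free = self._find_free()
        if free < 0:
            return False
        self.keys[free], self.values[free] = key, value
        self.next[idx] = free                    # link the new slot into the chain
        return True

    def search(self, key):
        idx = self.hash_fn(key) % self.m
        while idx is not None and self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = self.next[idx]
        return None


def slot_of(name):
    # Hypothetical hash: slots 0..4 stand for J, K, L, M, N in the figure
    return ord(name[0]) - ord("J")


t = CoalescedHashTable(5, slot_of)
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    t.insert(name, phone)
print(t.keys)
print(t.next)
```

Here Jane Smith is stored in slot 4 (N) and linked from slot 0 (J), and Kayla Newman is stored in slot 3 (M) and linked from slot 1 (K). Searching for Kenny Baker examines one slot.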

PERFORMANCE ANALYSIS

What are the advantages and disadvantages of the three collision handling methods? How do we compare them? What measurements can we use?

Load factor: what is the average number of elements stored in a slot?

Probe number: how many slots do we need to examine before finding an empty slot?

EXAMPLE :: OPEN ADDRESSING

Load factor = 1 (because every slot only has 1 element)

Probe number for “Lisa Smith” = ?
Probe number for “Kayla Newman” = ?

[Figure: the open addressing table from the earlier slide, with slots J (74) to N (78) holding John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman / 418-4222.]

EXAMPLE :: SEPARATE CHAINING

Load factor = # of probes (because colliding elements are stored in a linked list)

Probe number for “Jane Smith” = ?
Probe number for “Kayla Newman” = ?

[Figure: the separate chaining table from the earlier slide, slots J (74) to M (77), with linked lists holding John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, and Kayla Newman 418-4222.]

What if we insert new elements at the beginning of the list?
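A Python sketch of inserting at the beginning of a chain. The `Node` class and the function names are hypothetical, and only the two entries that share the first letter K (Kenny Baker, then Kayla Newman) are used. Head insertion costs O(1) because the chain is never walked, and the most recently inserted entry is found first.

```python
class Node:
    """One entry of a separate-chaining list (hypothetical helper)."""
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node

def insert_at_head(head, key, value):
    # O(1): the new node simply points at the old head of the chain
    return Node(key, value, head)

def search(head, key):
    """Return (value, nodes examined); value is None if the key is absent."""
    examined = 0
    node = head
    while node is not None:
        examined += 1
        if node.key == key:
            return node.value, examined
        node = node.next
    return None, examined

chain = None
chain = insert_at_head(chain, "Kenny Baker", "418-4165")
chain = insert_at_head(chain, "Kayla Newman", "418-4222")
print(search(chain, "Kayla Newman"))
print(search(chain, "Kenny Baker"))
```

The trade-off is that head insertion does not detect a duplicate key; if keys must be unique, a search is still needed before inserting.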

EXAMPLE :: COALESCED HASHING

[Figure: the coalesced hashing table from the earlier slide, slots J (74) to N (78), holding John Smith, Lisa Smith, Kenny Baker, and Jane Smith.]

Load factor = 1 (because every slot only has 1 element)

What is the advantage of this method?
How many slots must be checked to insert Kenny Baker?
How many slots must be checked to search for Kenny Baker?

OPEN ADDRESSING

In open addressing, a colliding entry is placed in a new slot in the same table, using a hash function h(k,i), where i is the probe number.

There are generally 3 techniques to decide the next slot to be filled:
  - linear probing
  - quadratic probing
  - double hashing

The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence.

OPEN ADDRESSING: LINEAR PROBING

Define

$h(k,i) = (h'(k) + i) \bmod m$

where h'(k) is the initial hash function, and i is the probe number for key k.

Example: m = 13
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6
k = 19: h'(k) = 6 (collision)
    h(k,1) = (6+1) mod 13 = 7
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6 (collision)
    h(k,2) = (5+2) mod 13 = 7 (collision)
    h(k,3) = (5+3) mod 13 = 8

Linear probing suffers from primary clustering:
  - Clusters arise because an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m.
  - There are only m distinct probe sequences.
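The linear probing example can be replayed with a short Python sketch. The initial hash h'(k) = k mod m is an assumption, though it reproduces every h'(k) value in the example for m = 13.

```python
def h_prime(k, m):
    # Assumed initial hash h'(k) = k mod m (matches the example for m = 13)
    return k % m

def h(k, i, m):
    # Linear probing: h(k, i) = (h'(k) + i) mod m
    return (h_prime(k, m) + i) % m

def insert(table, k):
    """Place k at the first free slot of its probe sequence; return that slot."""
    m = len(table)
    for i in range(m):
        idx = h(k, i, m)
        if table[idx] is None:
            table[idx] = k
            return idx
    return None   # all m slots are occupied

table = [None] * 13
placed = {k: insert(table, k) for k in (5, 18, 19, 31)}
print(placed)   # {5: 5, 18: 6, 19: 7, 31: 8}
```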

OPEN ADDRESSING: LINEAR PROBING

idx | Data
1   | A
2   | B
3   | C
4   | D
5   | E
6   | F
7   |
8   |
9   |
…   | …

Every key k with h'(k) between 1 and 6 will be placed in slot 7, the first empty slot after the cluster.
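A Python sketch that makes the clustering probability concrete. The table size m = 13 is an assumption (the table above does not fix it), and slots 1 to 6 are treated as occupied, as in the table; the code lists every initial hash value whose linear-probing sequence ends in slot 7.

```python
m = 13                          # assumed table size
occupied = {1, 2, 3, 4, 5, 6}   # slots holding A..F in the table above

def landing_slot(initial):
    """Slot where a new key with h'(k) = initial ends up under linear probing."""
    i = 0
    while (initial + i) % m in occupied:
        i += 1
    return (initial + i) % m

feeds_slot_7 = [s for s in range(m) if landing_slot(s) == 7]
print(feeds_slot_7)               # [1, 2, 3, 4, 5, 6, 7]
print(len(feeds_slot_7), "/", m)  # 7 / 13
```

Seven of the thirteen possible initial slots end in slot 7, which matches (i+1)/m with i = 6 if every initial slot is equally likely.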

OPEN ADDRESSING: QUADRATIC PROBING

Define

$h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$

where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are some constants.

Example: m = 13, c1 = 2, c2 = 3
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10
k = 19: h'(k) = 6
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10 (collision)
    h(k,2) = (5 + 2*2 + 3*2^2) mod 13 = 8
k = 32: h'(k) = 6 (collision)
    h(k,1) = (6 + 2*1 + 3*1^2) mod 13 = 11

h(k,1) is not necessarily adjacent to h'(k), so quadratic probing avoids the primary clustering problem.

However, keys with the same h'(k) follow the same probe sequence. This leads to a milder form of clustering, called secondary clustering (again, there are only m distinct probe sequences).
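The quadratic probing example can be replayed in Python. As with linear probing, h'(k) = k mod m is an assumption that happens to match the example's numbers; c1 = 2 and c2 = 3 come from the example.

```python
def h(k, i, m=13, c1=2, c2=3):
    # Quadratic probing; h'(k) = k mod m is assumed
    return (k % m + c1 * i + c2 * i * i) % m

def insert(table, k):
    m = len(table)
    for i in range(m):
        idx = h(k, i, m)
        if table[idx] is None:
            table[idx] = k
            return idx
    return None   # can fail even when empty slots remain: not every slot is reachable

table = [None] * 13
placed = {k: insert(table, k) for k in (5, 18, 19, 31, 32)}
print(placed)   # {5: 5, 18: 10, 19: 6, 31: 8, 32: 11}
```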


Observe these 2 cases, where h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2, and c2 = 3):

Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5.

Only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6.

probe # | h(k,i) for h'(k) = 5 | h(k,i) for h'(k) = 6
1       | 10                   | 11
2       | 8                    | 9
3       | 12                   | 0
4       | 9                    | 10
5       | 12                   | 0
6       | 8                    | 9
7       | 10                   | 11
8       | 5                    | 6
9       | 6                    | 7
10      | 0                    | 1
11      | 0                    | 1
12      | 6                    | 7
13      | 5                    | 6

This suggests that some slots might get filled with higher probability than others.
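The table above can be regenerated with a short Python sketch (the function name is our own). It lists h(k,i) for probes 1 to 13 and the set of slots each starting point can ever reach.

```python
m, c1, c2 = 13, 2, 3

def probe_sequence(initial, length):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m for i = 0 .. length-1."""
    return [(initial + c1 * i + c2 * i * i) % m for i in range(length)]

for initial in (5, 6):
    seq = probe_sequence(initial, m + 1)     # probes 0..13
    print(f"h'(k) = {initial}: probes 1..13 = {seq[1:]}")
    print(f"  reachable slots: {sorted(set(seq))}")
```

Only 7 of the 13 slots are ever reached from each starting point, because the sequence repeats with period m.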


The choice of m, c1, and c2 is important:
  - For m = 2^n, a good choice is c1 = c2 = 0.5.
  - For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0, (m-1)/2].

Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0

Probe # | h(k,i) | Probe # | h(k,i)
0       | 0      | 8       | 4
1       | 1      | 9       | 13
2       | 3      | 10      | 7
3       | 6      | 11      | 2
4       | 10     | 12      | 14
5       | 15     | 13      | 11
6       | 5      | 14      | 9
7       | 12     | 15      | 8
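A quick Python check of the m = 16 example (names are our own). With c1 = c2 = 0.5 and h'(k) = 0, h(k,i) is the triangular number i(i+1)/2 mod m, computed here in integer arithmetic; the last line checks that the probes visit every slot for several powers of two, which is a spot check rather than a proof.

```python
def triangular_probes(m):
    # c1 = c2 = 0.5 and h'(k) = 0: h(k, i) = (i + i^2) / 2 mod m (i + i^2 is always even)
    return [((i + i * i) // 2) % m for i in range(m)]

seq = triangular_probes(16)
print(seq)   # [0, 1, 3, 6, 10, 15, 5, 12, 4, 13, 7, 2, 14, 11, 9, 8]

# For m = 2, 4, ..., 1024 the m probes visit all m slots exactly once
print(all(sorted(triangular_probes(2 ** n)) == list(range(2 ** n)) for n in range(1, 11)))
```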

OPEN ADDRESSING: DOUBLE HASHING

Define

$h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$

where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a second hash function, different from h1(k).

Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will have different probe sequences, provided that h2(a) ≠ h2(b).

OPEN ADDRESSING: DOUBLE HASHING

h2(k) must be relatively prime to m.

Examples:
  - Let m be a power of 2, and let h2(k) always return an odd number.
  - Let m be prime, and let h2(k) always return a positive integer less than m.

There are Θ(m²) distinct probe sequences.
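A Python sketch of double hashing with m = 13. The pair h1(k) = k mod 13 and h2(k) = 1 + (k mod 11) is an assumption (a common textbook choice); because m is prime and h2(k) always lies in 1..12, h2(k) is relatively prime to m. Keys 5, 18, and 31 all share h1(k) = 5 yet follow different probe sequences, and each full sequence visits all 13 slots.

```python
m = 13

def h1(k):
    return k % m

def h2(k):
    # Hypothetical second hash: always in 1..m-1, hence relatively prime to prime m
    return 1 + (k % (m - 2))

def double_hash(k, i):
    # h(k, i) = (h1(k) + i * h2(k)) mod m
    return (h1(k) + i * h2(k)) % m

for k in (5, 18, 31):
    print(k, "h2 =", h2(k), "first probes:", [double_hash(k, i) for i in range(5)])
```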

BASIC HASH TABLE OPERATIONS

INSERT(key, value): we have discussed this a lot

value SEARCH(key): similar to INSERT

DELETE(key): do not delete the value; mark it “deleted” instead

(why?)
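One way to explore the "(why?)" is a small Python experiment under assumed parameters: m = 13, linear probing with h'(k) = k mod 13, and a `DELETED` marker object whose name is our own. Three keys that all hash to slot 5 occupy slots 5, 6, and 7; truly emptying slot 6 makes the search for the key in slot 7 stop too early, while marking slot 6 keeps the search going.

```python
M = 13
EMPTY = None
DELETED = object()                   # hypothetical "deleted" marker (tombstone)

def probe(k, i):
    return (k % M + i) % M           # linear probing, assumed h'(k) = k mod M

def insert(table, k):
    for i in range(M):
        idx = probe(k, i)
        if table[idx] is EMPTY or table[idx] is DELETED:
            table[idx] = k
            return idx
    return None

def search(table, k):
    for i in range(M):
        idx = probe(k, i)
        if table[idx] is EMPTY:
            return None              # an empty slot ends the probe sequence
        if table[idx] is not DELETED and table[idx] == k:
            return idx
    return None

table = [EMPTY] * M
for k in (5, 18, 31):                # all three hash to slot 5
    insert(table, k)                 # they land in slots 5, 6, 7

naive = table.copy()
naive[6] = EMPTY                     # really delete 18
print(search(naive, 31))             # None: the search stops at slot 6

marked = table.copy()
marked[6] = DELETED                  # mark 18 as deleted instead
print(search(marked, 31))            # 7
```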

In separate chaining, all three operations are merely inserting into, searching, and deleting from the appropriate linked list.

When should we stop searching?

INSERTION IN OPEN ADDRESSING

INSERT(key, value)
// returns true if key is successfully inserted
// returns false otherwise
i = 0
while (i < m)
    idx = HASHFUNCTION(key, i)
    if (table[idx] is empty or marked as deleted)
        table[idx] = (key, value)
        return true
    else
        i = i + 1
return false

SEARCHING IN OPEN ADDRESSING

SEARCH(key)
// returns the associated value if key is found
// returns null otherwise
i = 0
while (i < m)
    idx = HASHFUNCTION(key, i)
    if (table[idx] is empty)
        return null    // reached an empty slot, so key must not be in the hash table
    else if (table[idx] not marked as deleted AND table[idx].key == key)
        return table[idx].value    // key found
    else
        i = i + 1    // try the next slot
return null    // tried all m possible slots and key not found

DELETION IN OPEN ADDRESSING

DELETE(key)
// returns the associated value if key is found and successfully deleted
// returns null otherwise
i = 0
while (i < m)
    idx = HASHFUNCTION(key, i)
    if (table[idx] is empty)
        return null    // reached an empty slot, so key must not be in the hash table
    else if (table[idx] not marked as deleted AND table[idx].key == key)
        temp = table[idx].value
        mark table[idx] as deleted
        return temp
    else
        i = i + 1    // try the next slot
return null    // tried all m possible slots and key not found
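The three procedures INSERT, SEARCH, and DELETE can be gathered into one runnable Python sketch. The class name, the `hash_function(key, i, m)` signature, the `_DELETED` marker object, and the linear-probing function in the usage lines are assumptions for illustration; like the pseudocode, `insert` does not check whether the key is already present.

```python
from typing import Any, Callable, Optional

_DELETED = object()   # marker ("tombstone") for deleted slots


class OpenAddressingTable:
    """Open-addressing hash table following the INSERT/SEARCH/DELETE pseudocode."""

    def __init__(self, m: int, hash_function: Callable[[Any, int, int], int]):
        self.m = m
        self.hash_function = hash_function    # plays the role of HASHFUNCTION(key, i)
        self.table: list = [None] * m          # None = empty slot

    def insert(self, key, value) -> bool:
        for i in range(self.m):
            idx = self.hash_function(key, i, self.m)
            slot = self.table[idx]
            if slot is None or slot is _DELETED:
                self.table[idx] = (key, value)
                return True
        return False                           # all m probes hit occupied slots

    def search(self, key) -> Optional[Any]:
        for i in range(self.m):
            idx = self.hash_function(key, i, self.m)
            slot = self.table[idx]
            if slot is None:
                return None                    # empty slot: key is not in the table
            if slot is not _DELETED and slot[0] == key:
                return slot[1]                 # key found
        return None                            # tried all m slots

    def delete(self, key) -> Optional[Any]:
        for i in range(self.m):
            idx = self.hash_function(key, i, self.m)
            slot = self.table[idx]
            if slot is None:
                return None
            if slot is not _DELETED and slot[0] == key:
                self.table[idx] = _DELETED     # mark instead of emptying the slot
                return slot[1]
        return None


def linear(key: int, i: int, m: int) -> int:
    # Assumed probe function: linear probing with h'(k) = k mod m
    return (key % m + i) % m


t = OpenAddressingTable(13, linear)
for k in (5, 18, 31):
    t.insert(k, f"value-{k}")
print(t.delete(18))                          # value-18
print(t.search(31))                          # value-31
print(t.insert(44, "value-44"), t.table[6])  # True (44, 'value-44')
```

Because 44 mod 13 = 5, the new key reuses the slot marked as deleted (slot 6), while the search for 31 walks past that slot instead of stopping there.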
