hashing

Hashing

Main ch. 11.2-11.5 Background:

– We want to store a collection of data. We want to add to, delete from, and search in the collection

– What is the average case complexity of add, delete, and search if:

The collection is stored as an unsorted array The collection is stored as a sorted array The collection is in a binary search tree

We Want to Do Better

Hashing has good average case behavior

Suppose– You want to keep track of students via their

student ID’s– If students ID’s range from 0-99 this is easy– What if SS type numbers are used?

This type of data is known as sparse data The SS# becomes a key for obtaining the data

Suppose

We want to keep track of only a small number of students in an array of size 10, and suppose we use SS#’s as keys

– We need a function that maps key values (SS #’s) to array indices (integers between 0 and 9)

– Such a function is called a hash function.– An example hash function could be: hash(ssn) =

ssn%10

Choosing a HashingFunction

For the previous example, we could have used:

– hash(ssn) = the first number in the ssn– We want a hash function that uniformly distributes the

keys throughout the array. This is called uniform hashing.– If you use a division hash function (remainder of division),

it is best to have a table size that is a prime number of the form 4k+3.

– see Main, p. 552 for other kinds of hash functions

What could go wrong?

If possible, store an object with key value key in array[hash(key)].

This is not always possible: you may want to add an object whose key value hashes to an index that’s already in use.

– This is called a collision.– What is the big-oh time complexity of a hash

table lookup (search) if there are no collisions?

Handlihng Collisions

Linear probing– Place the object in the next open spot– How would you find an object in a hash table

that uses linear probing?– How would you delete an object from a hash

table that uses linear probing?– See the example in Main p. 550-551

Linear probing

Performance isn’t all that great Easy to implement As the hash table gets fuller, larger and

larger consecutive stretches of the array will be filled. This is called clustering.

Double Hashing

If there is a collision, hash the key again, using a second hash function.

Double hashing is also called rehashing

Chained Hashing

Each element in the array can hold a list of elements.

Hash the key and put the object in the list in array[hash(key)]

See the demo at: http://www2.ics.hawaii.edu/~richardy/project/hash/applet.html

A Hash Function forNames

private int hashFunction( String name ) {int hashValue = 0;char cName = name.toCharArray();

for (int j=0; j < cName.length; j++) {hashValue += cName[j];

}

return hashValue % size;}

Note that size is previously defined as the size of the hash table.

Time Analysis

The load factor of a hash table is defined as follows:

array s table' theof size

tablein the elements ofnumber

Searching with LinearProbing

In a non-full hash-table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately:

)1

11(

2

1

Searching withDouble Hashing

In a non-full hash-table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately:

)1ln(

Searching withChained Hashing

In a non-full hash-table, using uniform hashing, the average number of table elements examined in a successful search is approximately:

21

Ave. # of Elements Examined

During a Search (Main p. 561)

Load Factor

Linear Probing

Double Hashing

Chained Hashing

0.5 1.5 1.39 1.25

0.6 1.75 1.53 1.3

0.7 2.17 1.72 1.35

0.8 3.0 2.01 1.4

0.9 5.5 2.56 1.45

1.0 NA NA 1.5

hashing

Documents