hashing
DESCRIPTION
Hashing. Main ch. 11.2-11.5 Background: We want to store a collection of data. We want to add to, delete from, and search in the collection What is the average case complexity of add, delete, and search if: The collection is stored as an unsorted array - PowerPoint PPT PresentationTRANSCRIPT
Hashing
Main ch. 11.2-11.5 Background:
– We want to store a collection of data. We want to add to, delete from, and search in the collection
– What is the average case complexity of add, delete, and search if:
The collection is stored as an unsorted array The collection is stored as a sorted array The collection is in a binary search tree
We Want to Do Better
Hashing has good average case behavior
Suppose– You want to keep track of students via their
student ID’s– If students ID’s range from 0-99 this is easy– What if SS type numbers are used?
This type of data is known as sparse data The SS# becomes a key for obtaining the data
Suppose
We want to keep track of only a small number of students in an array of size 10, and suppose we use SS#’s as keys
– We need a function that maps key values (SS #’s) to array indices (integers between 0 and 9)
– Such a function is called a hash function.– An example hash function could be: hash(ssn) =
ssn%10
Choosing a HashingFunction
For the previous example, we could have used:
– hash(ssn) = the first number in the ssn– We want a hash function that uniformly distributes the
keys throughout the array. This is called uniform hashing.– If you use a division hash function (remainder of division),
it is best to have a table size that is a prime number of the form 4k+3.
– see Main, p. 552 for other kinds of hash functions
What could go wrong?
If possible, store an object with key value key in array[hash(key)].
This is not always possible: you may want to add an object whose key value hashes to an index that’s already in use.
– This is called a collision.– What is the big-oh time complexity of a hash
table lookup (search) if there are no collisions?
Handlihng Collisions
Linear probing– Place the object in the next open spot– How would you find an object in a hash table
that uses linear probing?– How would you delete an object from a hash
table that uses linear probing?– See the example in Main p. 550-551
Linear probing
Performance isn’t all that great Easy to implement As the hash table gets fuller, larger and
larger consecutive stretches of the array will be filled. This is called clustering.
Double Hashing
If there is a collision, hash the key again, using a second hash function.
Double hashing is also called rehashing
Chained Hashing
Each element in the array can hold a list of elements.
Hash the key and put the object in the list in array[hash(key)]
See the demo at: http://www2.ics.hawaii.edu/~richardy/project/hash/applet.html
A Hash Function forNames
private int hashFunction( String name ) {int hashValue = 0;char cName = name.toCharArray();
for (int j=0; j < cName.length; j++) {hashValue += cName[j];
}
return hashValue % size;}
Note that size is previously defined as the size of the hash table.
Time Analysis
The load factor of a hash table is defined as follows:
array s table' theof size
tablein the elements ofnumber
Searching with LinearProbing
In a non-full hash-table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately:
)1
11(
2
1
Searching withDouble Hashing
In a non-full hash-table with no removals, and using uniform hashing, the average number of table elements examined in a successful search is approximately:
)1ln(
Searching withChained Hashing
In a non-full hash-table, using uniform hashing, the average number of table elements examined in a successful search is approximately:
21
Ave. # of Elements Examined
During a Search (Main p. 561)
Load Factor
Linear Probing
Double Hashing
Chained Hashing
0.5 1.5 1.39 1.25
0.6 1.75 1.53 1.3
0.7 2.17 1.72 1.35
0.8 3.0 2.01 1.4
0.9 5.5 2.56 1.45
1.0 NA NA 1.5