DATA STRUCTURES AND ALGORITHMS
Lecture Notes 7
Prepared by İnanç TAHRALI
REVIEW
We have investigated the following ADTs:
• Lists: array, linked list
• Stacks
• Queues
• Trees: binary trees, binary search trees, AVL trees
What about their running times?
Running times of important operations
Operation            Insertion   Deletion    Find
Array                O(n)        O(n)        O(n)
Linked list          O(1)        O(n)        O(n)
Balanced tree (AVL)  O(log n)    O(log n)    O(log n)
Can we reduce these running times further?
ROAD MAP: HASHING
• General idea
• Hash function
• Separate chaining
• Open addressing
• Rehashing
Hashing
Hashing: the implementation technique behind hash tables.
• Hash table: an array of elements with a fixed size, TableSize.
• Search is performed on a part of the item, called the key.
• Each key is mapped to a number in the range 0 to TableSize − 1, which is used as an array index.
• The mapping is done by a hash function, which should be simple to compute and should ensure that any two distinct keys get different cells.
How can insert, delete and find be performed in O(1) time?
An ideal hash table
• Each key is mapped to a different index. This is not always possible: there are many keys but only a finite number of indexes.
• Keys should be distributed evenly over the cells.
Considerations:
• Choose a hash function.
• Decide what to do when two keys hash to the same value.
• Decide on the table size.
Hash function
If the keys are integers, the hash function returns Key mod TableSize.
Ex: TableSize = 10 and Keys = 120, 330, 1000. Every key ends in 0, so all of them hash to cell 0.
TableSize should be prime.
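The effect of a non-prime table size is easy to check with a short sketch. The helper names `hashInt` and `cellsFor` are hypothetical, and keys are assumed to be non-negative. With TableSize = 10 only the last decimal digit matters, so 120, 330 and 1000 all land in cell 0; with a prime size such as 13 these three keys spread out.

```cpp
#include <vector>

// Integer hash from the slide: Key mod TableSize (assumes key >= 0).
int hashInt( int key, int tableSize )
{
    return key % tableSize;
}

// Hypothetical helper: the cell index of each key, in input order.
std::vector<int> cellsFor( const std::vector<int> & keys, int tableSize )
{
    std::vector<int> cells;
    for( int k : keys )
        cells.push_back( hashInt( k, tableSize ) );
    return cells;
}
```

For example, `cellsFor( {120, 330, 1000}, 10 )` gives `{0, 0, 0}`, while `cellsFor( {120, 330, 1000}, 13 )` gives `{3, 5, 12}`.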
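Hash function
If the keys are strings:
Add up the ASCII values of the characters. This is a poor choice when TableSize is large and the keys are short.
Ex: TableSize = 10000 and keys have at most 8 characters. An ASCII value is at most 127, so the hash value is at most 127 * 8 = 1016 < 10000, and most of the table is never used.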
int hash( const string & key, int tableSize )
{
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal += key[ i ];          // add the ASCII value of each character
    return hashVal % tableSize;
}
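A quick check of the range problem, as a sketch: with 8-character keys the sum can never exceed 127 * 8 = 1016, so in a table of 10000 cells only indexes 0 to 1016 are ever produced. The name `sumHash` is hypothetical (chosen to avoid clashing with `std::hash`), and the `unsigned char` cast is an added assumption so that non-ASCII bytes cannot make the sum negative.

```cpp
#include <string>

// Sum-of-ASCII-values hash, as on the slide.
int sumHash( const std::string & key, int tableSize )
{
    int hashVal = 0;
    for( char ch : key )
        hashVal += static_cast<unsigned char>( ch );   // never negative
    return hashVal % tableSize;
}
```

An 8-character key made of the largest ASCII value, `std::string( 8, '\x7f' )`, hashes to 1016, the highest cell this function can reach for such keys.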
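Hash function
If the keys are strings:
Use all characters: $\sum_{i=0}^{KeySize-1} Key[KeySize - i - 1] \cdot 32^i$
Problem: for long keys the early characters do not count, because their high powers of 32 overflow the integer. Remedies: use only some of the characters, e.g. only the characters in odd positions.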
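The sum can be evaluated with Horner's rule, one multiplication per character. The sketch below assumes 32-bit unsigned arithmetic (`std::uint32_t`, which wraps modulo 2^32); the name `horner32` is hypothetical. Since 32 = 2^5, the first character is multiplied by 2^(5(KeySize−1)); for KeySize ≥ 8 that factor is at least 2^35, which is 0 modulo 2^32, so the first character has no effect at all.

```cpp
#include <cstdint>
#include <string>

// Computes sum of key[KeySize - i - 1] * 32^i with Horner's rule:
// h = (...((key[0]) * 32 + key[1]) * 32 + ...) * 32 + key[KeySize - 1].
// Unsigned arithmetic wraps modulo 2^32, so high powers of 32 vanish.
std::uint32_t horner32( const std::string & key )
{
    std::uint32_t h = 0;
    for( char ch : key )
        h = h * 32 + static_cast<unsigned char>( ch );
    return h;
}
```

For example, `horner32( "abcdefgh" )` and `horner32( "zbcdefgh" )` are equal, while the 7-character keys `"abcdefg"` and `"zbcdefg"` still hash differently.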
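Hash function
If the keys are strings:
Use only the first three characters: 729 * key[2] + 27 * key[1] + key[0] (27 = 26 letters + blank, and 729 = 27²).
If the keys are not random (e.g. English words, whose first three letters are far from uniformly distributed), part of the table is never used.
int hash( const string & key, int tableSize )
{
    // assumes key.length( ) >= 3
    return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
}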
A good hash function
int hash( const string & key, int tableSize )
{
    unsigned int hashVal = 0;          // unsigned: overflow wraps around instead of being undefined

    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key[ i ];   // Horner's rule with multiplier 37

    return hashVal % tableSize;        // never negative, so no correction is needed
}
Collision
The main programming detail is collision resolution. If an element being inserted hashes to the same value as an already inserted element, there is a collision.
There are several methods to deal with this problem:
• Separate chaining
• Open addressing
Separate Chaining Hash Table
Keep a list of all elements that hash to the same value.
TableSize = 10 is not a good choice, because 10 is not prime.
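A minimal separate-chaining sketch, assuming non-negative `int` keys and hash(x) = x mod TableSize. It is not the lecture's `HashTable<HashedObj>`: the standard `std::list` stands in for the `List`/`ListItr` classes, the size is used as given (no `nextPrime`), and the class name `ChainedTable` and its `loadFactor`/`listLength` members are hypothetical. Each operation touches only the one list that the key hashes to.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Separate chaining: one std::list per cell.
class ChainedTable
{
  public:
    explicit ChainedTable( int size ) : lists( size ), count( 0 ) { }

    // Search only the list that x hashes to.
    bool contains( int x ) const
    {
        const std::list<int> & whichList = lists[ index( x ) ];
        return std::find( whichList.begin( ), whichList.end( ), x ) != whichList.end( );
    }

    // Duplicates are ignored; new elements go to the front of their list.
    void insert( int x )
    {
        if( contains( x ) )
            return;
        lists[ index( x ) ].push_front( x );
        ++count;
    }

    void remove( int x )
    {
        std::list<int> & whichList = lists[ index( x ) ];
        auto itr = std::find( whichList.begin( ), whichList.end( ), x );
        if( itr != whichList.end( ) )
        {
            whichList.erase( itr );
            --count;
        }
    }

    // Load factor: number of elements / TableSize (the average list length).
    double loadFactor( ) const { return static_cast<double>( count ) / lists.size( ); }

    int listLength( int cell ) const { return static_cast<int>( lists[ cell ].size( ) ); }

  private:
    std::vector<std::list<int>> lists;   // the array of lists
    int count;                           // number of stored elements

    int index( int x ) const { return x % static_cast<int>( lists.size( ) ); }
};
```

Inserting the ten keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 into `ChainedTable t( 10 )` leaves two elements in cells 1, 4, 6 and 9 and none in cells 2, 3, 7 and 8, with `t.loadFactor( )` equal to 1.0.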
Type declaration for separate chaining hash table
template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

  private:
    vector<List<HashedObj> > theLists;   // The array of Lists
    const HashedObj ITEM_NOT_FOUND;
};

int hash( const string & key, int tableSize );
int hash( int key, int tableSize );
/* Construct the hash table. */
template <class HashedObj>
HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
{
}

/* Make the hash table logically empty. */
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    for( int i = 0; i < theLists.size( ); i++ )
        theLists[ i ].makeEmpty( );
}

/* Deep copy. */
template <class HashedObj>
const HashTable<HashedObj> & HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
        theLists = rhs.theLists;
    return *this;
}
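The constructors call `nextPrime`, which the slides do not show. A minimal sketch, assuming it returns the smallest prime that is at least `n`; `isPrime` is a hypothetical helper, and trial division is fast enough for table sizes. Under this assumption `nextPrime( 14 )` is 17, the size used in the rehashing example.

```cpp
// Hypothetical helper: trial division up to sqrt(n).
bool isPrime( int n )
{
    if( n < 2 )
        return false;
    for( int d = 2; d * d <= n; ++d )
        if( n % d == 0 )
            return false;
    return true;
}

// Smallest prime >= n.
int nextPrime( int n )
{
    while( !isPrime( n ) )
        ++n;
    return n;
}
```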
/* Remove item x from the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    theLists[ hash( x, theLists.size( ) ) ].remove( x );
}

/* Find item x in the hash table. */
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
{
    ListItr<HashedObj> itr;
    itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
    if( itr.isPastEnd( ) )
        return ITEM_NOT_FOUND;
    else
        return itr.retrieve( );
}
/* Insert item x into the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
    ListItr<HashedObj> itr = whichList.find( x );

    if( itr.isPastEnd( ) )
        whichList.insert( x, whichList.zeroth( ) );
}
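Analysis
Let λ be the load factor of a hash table: λ = number of elements / TableSize, which is the average length of a list.
• Successful find: about 1 + λ/2 comparisons (the matching node plus, on average, half of the other nodes in its list) + time to evaluate the hash function.
• Unsuccessful find and insert: λ comparisons + time to evaluate the hash function.
• Good choice: λ ≈ 1.
Disadvantage of separate chaining: memory must be allocated and deallocated for the list nodes, which slows it down.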
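Open Addressing
If a collision occurs, try alternate cells $h_0(x), h_1(x), h_2(x), \ldots$ until an empty cell is found:
$h_i(x) = (\mathrm{hash}(x) + F(i)) \bmod \mathrm{TableSize}$, with $F(0) = 0$
All elements are stored in the table itself, so λ < 1.
Good choice: λ < 0.5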
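Linear Probing
• F is a linear function of i: F(i) = i.
• Example: insert the keys {89, 18, 49, 58, 69} into a table of size 10 (hash(x) = x mod 10).
• When 49 is inserted, a collision occurs (cell 9 holds 89): 49 is put into the next available spot, cell 0, wrapping around the end of the table.
• 58 collides with 18, 89 and 49 before it is placed in cell 1.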
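A sketch of linear probing on this example, assuming non-negative `int` keys, hash(x) = x mod TableSize, and -1 (the hypothetical `EMPTY_CELL`) as the empty-cell marker. The function `insertLinear` is also hypothetical; it reports how many occupied cells it ran into, so the three collisions of 58 (with 18, 89 and 49) are visible.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // assumption: keys are non-negative ints

// Insert x with linear probing, F(i) = i: try hash, hash + 1, hash + 2, ...
// (mod TableSize). Returns the number of collisions. Assumes the table is not full.
int insertLinear( std::vector<int> & table, int x )
{
    int size = static_cast<int>( table.size( ) );
    int pos = x % size;
    int collisions = 0;
    while( table[ pos ] != EMPTY_CELL )
    {
        ++collisions;
        pos = ( pos + 1 ) % size;   // next cell, wrapping around at the end
    }
    table[ pos ] = x;
    return collisions;
}
```

After inserting 89, 18, 49, 58 and 69 into a 10-cell table, cells 0, 1, 2, 8 and 9 hold 49, 58, 69, 18 and 89: the five keys already form one contiguous block.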
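Linear Probing
Problem: it is not easy to delete an element, because it may have caused a collision before; removing it would break the probe sequence of elements inserted after it. Solution: mark the element as deleted (lazy deletion).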
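Why a plain delete breaks linear probing, as a small sketch: in the example above, 49 sits in cell 0 only because cell 9 held 89. If 89 is physically removed, a search for 49 stops at the now-empty cell 9 and fails; marking the cell DELETED keeps the probe chain intact. The `Cell` struct, the `CellState` enum and `findLinear` are hypothetical names; keys are assumed non-negative with hash(x) = x mod TableSize.

```cpp
#include <vector>

enum class CellState { EMPTY, ACTIVE, DELETED };

struct Cell
{
    int key = 0;
    CellState state = CellState::EMPTY;
};

// Linear-probing search. It stops only at an EMPTY cell: DELETED cells are
// skipped, since an element further along the probe sequence may have
// collided with them.
bool findLinear( const std::vector<Cell> & table, int x )
{
    int size = static_cast<int>( table.size( ) );
    int pos = x % size;
    for( int probes = 0; probes < size && table[ pos ].state != CellState::EMPTY; ++probes )
    {
        if( table[ pos ].state == CellState::ACTIVE && table[ pos ].key == x )
            return true;
        pos = ( pos + 1 ) % size;
    }
    return false;
}
```

With 89 in cell 9 and 49 in cell 0, setting cell 9 to `EMPTY` makes `findLinear( table, 49 )` return false; setting it to `DELETED` instead keeps 49 reachable while 89 is no longer found.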
Problem: Primary Clustering
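Linear Probing
Analysis: expected number of probes for load factor λ
Unsuccessful search and insertion: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
Successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
Problem: Primary Clustering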
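Evaluating the linear probing estimates, 1/2 (1 + 1/(1 − λ)²) for unsuccessful search and insertion and 1/2 (1 + 1/(1 − λ)) for successful search, shows how fast primary clustering hurts: at λ = 0.5 an insertion expects 2.5 probes, at λ = 0.9 about 50.5. A minimal sketch with hypothetical function names, valid for 0 ≤ λ < 1:

```cpp
// Expected probes for an unsuccessful search or an insertion (linear probing).
double linearUnsuccessful( double lambda )
{
    return 0.5 * ( 1 + 1 / ( ( 1 - lambda ) * ( 1 - lambda ) ) );
}

// Expected probes for a successful search (linear probing).
double linearSuccessful( double lambda )
{
    return 0.5 * ( 1 + 1 / ( 1 - lambda ) );
}
```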
Quadratic Probing
F(i) is a quadratic function, e.g. F(i) = i².
Quadratic Probing
When 49 collides with 89, the next position attempted is one cell away (cell 0).
58 collides at position 8. The cell one away is tried, and another collision occurs. 58 is then inserted into the cell 2² = 4 away (cell 2).
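The same five keys with quadratic probing, as a sketch under the same assumptions as for linear probing (non-negative `int` keys, hash(x) = x mod 10, hypothetical `EMPTY_CELL` = -1 marker; `insertQuadratic` is a hypothetical name). Each probe is computed from the home cell as home + i², so 58 ends up in cell 2 and 69 (home cell 9, then 0, then 9 + 4 = 13 mod 10) in cell 3.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // assumption: keys are non-negative ints

// Insert x with quadratic probing, F(i) = i^2: try home, home + 1, home + 4,
// home + 9, ... (mod TableSize). Returns the cell used.
// Assumes an empty cell is reachable (see the theorem on prime table sizes).
int insertQuadratic( std::vector<int> & table, int x )
{
    int size = static_cast<int>( table.size( ) );
    int home = x % size;
    int pos = home;
    for( int i = 1; table[ pos ] != EMPTY_CELL; ++i )
        pos = ( home + i * i ) % size;
    table[ pos ] = x;
    return pos;
}
```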
Quadratic Probing
• Solves the primary clustering problem.
• Not all empty cells may be accessed: the probe sequence may loop around full cells, so no empty cell is found even though the hash table is not full.
• Theorem: if the table size is prime and λ < 0.5, a new element can always be inserted.
• Problem: secondary clustering! Elements that hash to the same position probe the same sequence of alternative cells.
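Why the theorem holds: if two probes (h + i²) and (h + j²) with 0 ≤ i < j ≤ ⌊TableSize/2⌋ landed in the same cell of a prime-sized table, the prime TableSize would divide (j − i)(j + i), which is impossible because both factors lie between 1 and TableSize − 1. So the first ⌊TableSize/2⌋ + 1 probes are all distinct, and if fewer than half the cells are full, one of them must be empty. A sketch that counts distinct probe positions (the function name is hypothetical) makes the contrast with a non-prime size visible:

```cpp
#include <set>

// Number of distinct cells among (home + i*i) mod tableSize, i = 0 .. tableSize / 2.
int distinctQuadraticProbes( int home, int tableSize )
{
    std::set<int> cells;
    for( int i = 0; i <= tableSize / 2; ++i )
        cells.insert( ( home + i * i ) % tableSize );
    return static_cast<int>( cells.size( ) );
}
```

For the prime size 11 all 6 probes are distinct; for size 16 the 9 probes from home cell 0 visit only 4 different cells (0, 1, 4, 9).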
Type declaration for open addressing hash table
template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ),
        array( rhs.array ), currentSize( rhs.currentSize ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

    enum EntryType { ACTIVE, EMPTY, DELETED };

  private:
    struct HashEntry
    {
        HashedObj element;
        EntryType info;

        HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
          : element( e ), info( i ) { }
    };

    vector<HashEntry> array;
    int currentSize;
    const HashedObj ITEM_NOT_FOUND;

    bool isActive( int currentPos ) const;
    int findPos( const HashedObj & x ) const;
    void rehash( );
};
/* Construct the hash table. */
template <class HashedObj>
HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
{
    makeEmpty( );
}

/* Make the hash table logically empty. */
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    currentSize = 0;
    for( int i = 0; i < array.size( ); i++ )
        array[ i ].info = EMPTY;
}
/* Find item x in the hash table. */
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return array[ currentPos ].element;
    else
        return ITEM_NOT_FOUND;
}

/* Method that performs quadratic probing resolution. */
template <class HashedObj>
int HashTable<HashedObj>::findPos( const HashedObj & x ) const
{
    int collisionNum = 0;
    int currentPos = hash( x, array.size( ) );

    // Stop at an empty cell or at the cell holding x (DELETED cells are skipped)
    while( array[ currentPos ].info != EMPTY &&
           array[ currentPos ].element != x )
    {
        currentPos += 2 * ++collisionNum - 1;   // move to the next quadratic probe
        if( currentPos >= array.size( ) )        // wrap around the end of the table
            currentPos -= array.size( );
    }
    return currentPos;
}
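The increment `2 * ++collisionNum - 1` in findPos reaches the next quadratic probe without any multiplication, because consecutive squares differ by an odd number (here i is the value of collisionNum after the increment):
$$ i^2 - (i-1)^2 = 2i - 1 \quad\Rightarrow\quad h_i(x) = \bigl(h_{i-1}(x) + 2i - 1\bigr) \bmod \mathrm{TableSize} $$
As long as i stays at most TableSize/2 (which the theorem guarantees when the table size is prime and λ < 0.5), the increment 2i − 1 is smaller than TableSize, so the new position is below 2 · TableSize and a single subtraction replaces the mod operation.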
/* Return true if currentPos exists and is active. */
template <class HashedObj>
bool HashTable<HashedObj>::isActive( int currentPos ) const
{
    return array[ currentPos ].info == ACTIVE;
}

/* Remove item x from the hash table (lazy deletion). */
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].info = DELETED;
}

/* Insert routine with quadratic probing. */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;                                  // x is already present
    array[ currentPos ] = HashEntry( x, ACTIVE );
    ++currentSize;                               // rehashing is added in the rehashing section
}
/* Deep copy. */
template <class HashedObj>
const HashTable<HashedObj> & HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
    {
        array = rhs.array;
        currentSize = rhs.currentSize;
    }
    return *this;
}
Double Hashing
Use a second hash function: F(i) = i * hash2(x).
Poor example: hash1(x) = X mod 10, hash2(x) = X mod 9, TableSize = 10.
If X = 99, what happens? hash2(99) = 0, so every probe returns to the same cell.
Rule: hash2(x) ≠ 0 for any X.
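A sketch that lists the probe positions for this poor choice (hash1(x) = x mod 10, hash2(x) = x mod 9, TableSize = 10); the function name is hypothetical. For x = 99 every probe lands in cell 9, so if that cell is full the insertion never terminates.

```cpp
#include <vector>

// First `count` probe positions (hash1(x) + i * hash2(x)) mod 10 for the slide's poor example.
std::vector<int> poorDoubleHashProbes( int x, int count )
{
    const int tableSize = 10;
    int h1 = x % 10;
    int h2 = x % 9;            // 0 whenever x is a multiple of 9, e.g. x = 99
    std::vector<int> probes;
    for( int i = 0; i < count; ++i )
        probes.push_back( ( h1 + i * h2 ) % tableSize );
    return probes;
}
```

For x = 89, hash2 is 8 and the probes move through cells 9, 7, 5, 3; for x = 99 they stay at 9, 9, 9, 9.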
Double Hashing
Good choice: hash2(x) = R − (X mod R), where R is a prime smaller than TableSize. Its values lie between 1 and R, so it is never 0.
Double Hashing
Example: hash2(x) = 7 − (X mod 7)
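Double hashing on the earlier example keys, as a sketch: TableSize = 10 as in the linear and quadratic probing examples, hash1(x) = x mod 10, and hash2(x) = 7 − (x mod 7), i.e. R = 7. The name `insertDouble` and the `EMPTY_CELL` = -1 marker are hypothetical. Because hash2 differs per key, 49 and 69, which share home cell 9, follow different probe sequences (steps of 7 and 1) and end up in cells 6 and 0.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // assumption: keys are non-negative ints

// Double hashing: probe hash1, hash1 + hash2, hash1 + 2 * hash2, ... (mod TableSize)
// with hash1(x) = x mod TableSize and hash2(x) = R - (x mod R).
// Returns the cell used. Assumes an empty cell is reachable.
int insertDouble( std::vector<int> & table, int x, int R )
{
    int size = static_cast<int>( table.size( ) );
    int h1 = x % size;
    int h2 = R - x % R;          // always between 1 and R, never 0
    int pos = h1;
    for( int i = 1; table[ pos ] != EMPTY_CELL; ++i )
        pos = ( h1 + i * h2 ) % size;
    table[ pos ] = x;
    return pos;
}
```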
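Analysis: Random Collision Resolution
Assume the probes are independent, so there is no clustering problem.
Unsuccessful search and insertion: count the probes until an empty cell is found. The fraction of cells that are empty is $1 - \lambda$, so the expected number of probes is $\frac{1}{1-\lambda}$.
Successful search: let $P(X)$ be the number of probes used when element $X$ was inserted; finding $X$ later costs the same. Averaging over the $N$ elements, whose insertions happened at load factors between 0 and $\lambda$:
$\frac{1}{N}\sum_X P(X) \approx \frac{1}{\lambda}\int_0^{\lambda} \frac{1}{1-x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}$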
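Evaluating the random-probing estimates, 1/(1 − λ) for unsuccessful search and insertion and (1/λ) ln(1/(1 − λ)) for successful search, with hypothetical function names: at λ = 0.5 an insertion expects 2 probes and a successful search about 1.39; even at λ = 0.9 a successful search needs only about 2.56 probes.

```cpp
#include <cmath>

// Expected probes for an unsuccessful search or an insertion (0 <= lambda < 1).
double randomUnsuccessful( double lambda )
{
    return 1 / ( 1 - lambda );
}

// Expected probes for a successful search (0 < lambda < 1).
double randomSuccessful( double lambda )
{
    return std::log( 1 / ( 1 - lambda ) ) / lambda;
}
```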
Rehashing
If λ gets large, the number of probes increases: operations start taking too long, and insertions might fail.
Solution: rehash into a larger table (usually about twice the size).
When to rehash:
• when λ > 0.5, or
• when an insertion fails.
Rehashing Example
Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7 with H(X) = X mod 7. Linear probing is used to resolve collisions.
Rehashing Example
If 23 is inserted, the table is over 70 percent full (5 of 7 cells).
A new table is created. 17 is the first prime at least twice as large as the old size (2 * 7 = 14), so
H_new(X) = X mod 17
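The example can be replayed with a short sketch (hypothetical names `insertLinear` and `rehashInto`; non-negative `int` keys and -1 as the empty-cell marker are assumptions). After 13, 15, 24, 6 and 23 the old table holds 6, 15, 23, 24 in cells 0 to 3 and 13 in cell 6. Re-inserting the elements in array order into a table of size 17 with X mod 17 places 6 in cell 6, 23 in cell 7 (after colliding with 6), 24 in cell 8, 13 in cell 13 and 15 in cell 15.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // assumption: keys are non-negative ints

// Linear-probing insert with H(X) = X mod TableSize (assumes a free cell exists).
void insertLinear( std::vector<int> & table, int x )
{
    int size = static_cast<int>( table.size( ) );
    int pos = x % size;
    while( table[ pos ] != EMPTY_CELL )
        pos = ( pos + 1 ) % size;
    table[ pos ] = x;
}

// Rehash: build an empty table of newSize cells and re-insert every element
// of the old table, scanning the old array from cell 0 upward.
std::vector<int> rehashInto( const std::vector<int> & oldTable, int newSize )
{
    std::vector<int> newTable( newSize, EMPTY_CELL );
    for( int x : oldTable )
        if( x != EMPTY_CELL )
            insertLinear( newTable, x );
    return newTable;
}
```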
Rehashing
• Rehashing is an expensive operation: its running time is O(N), since all N elements are re-inserted.
• Rehashing frees the programmer from worrying about the table size.
• Amortized analysis: averaged over N operations, each operation takes O(1) time.
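A small simulation makes the amortized bound concrete. It is a sketch under simplifying assumptions: the table doubles exactly (instead of growing to `nextPrime( 2 * size )`) as soon as it is more than half full, and a rehash costs one move per element currently stored. The function name `rehashMoves` is hypothetical. Starting from 2 cells, 1000 insertions cause 1033 moves in total, about one extra move per insertion, because each rehash happens only after the number of elements has roughly doubled.

```cpp
// Total element moves caused by rehashing over n insertions, starting from
// initialSize cells and doubling whenever the table becomes more than half full.
long long rehashMoves( int n, int initialSize )
{
    long long moves = 0;
    long long size = initialSize;
    for( int count = 1; count <= n; ++count )   // count = elements stored after this insert
    {
        if( count > size / 2 )                  // same test as ++currentSize > array.size( ) / 2
        {
            moves += count;                     // every stored element is re-inserted
            size *= 2;                          // simplification: exact doubling
        }
    }
    return moves;
}
```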
/* Insert routine with quadratic probing (and rehashing). */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;
    array[ currentPos ] = HashEntry( x, ACTIVE );

    // Rehash as soon as the table becomes more than half full
    if( ++currentSize > array.size( ) / 2 )
        rehash( );
}

/* Expand the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::rehash( )
{
    vector<HashEntry> oldArray = array;

    // Create a new, empty table at least twice as large (prime size)
    array.resize( nextPrime( 2 * oldArray.size( ) ) );
    for( int j = 0; j < array.size( ); j++ )
        array[ j ].info = EMPTY;

    // Copy the active elements over; insert recomputes each position
    currentSize = 0;
    for( int i = 0; i < oldArray.size( ); i++ )
        if( oldArray[ i ].info == ACTIVE )
            insert( oldArray[ i ].element );
}