DATA STRUCTURES AND ALGORITHMS

Lecture Notes 7

Prepared by İnanç TAHRALI

REVIEW

We have investigated the following ADTs:

LISTS: Array, Linked List
STACKS
QUEUES
TREES: Binary Trees, Binary Search Trees, AVL Trees

What about their running times?

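Running times of important operations

Structure       insertion    deletion    find
Array           O(n)         O(n)        O(n)
Linked list     O(1)         O(n)        O(n)
Tree (AVL)      O(log n)     O(log n)    O(log n)

The tree row assumes a balanced search tree such as an AVL tree. An unbalanced binary search tree can degrade to O(n).

Can we reduce these running times further?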

ROAD MAP: HASHING

General Idea
Hash Function
Separate Chaining
Open Addressing
Rehashing

Hashing

Hashing: the implementation of hash tables.
Hash table: an array of elements with a fixed size, TableSize.
Search is performed on a part of the item, called the key.
Each key is mapped into a number in the range 0 to TableSize-1, which is used as an array index.
The mapping is done by a hash function. It should be simple to compute and should ideally ensure that any two distinct keys get different cells.

How can we perform insert, delete and find operations in O(1) time on average?

An ideal hash table

Each key is mapped to a different index. This is not always possible: there are many keys but only a finite number of indexes.
Keys should be distributed evenly among the cells.

Considerations:
Choose a hash function.
Decide what to do when two keys hash to the same value.
Decide on the table size.

Hash function

If the keys are integers, the hash function returns Key mod TableSize.
Ex: TableSize = 10, Keys = 120, 330, 1000. All three keys end in 0, so they all hash to cell 0.
TableSize should be prime.

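Hash function

If the keys are strings, one option is to add the ASCII values of the characters.
Problem: if TableSize is large and the keys are short, the keys are not distributed well.
Ex: TableSize = 10000 and a key has at most 8 characters. An ASCII value is at most 127, so the hash value is at most 127 * 8 = 1016 < 10000, and only cells 0 to 1016 can ever be used.

int hash( const string & key, int tableSize )
{
    int hashVal = 0;

    for( int i = 0; i < key.length( ); i++ )
        hashVal += key[ i ];

    return hashVal % tableSize;
}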

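Use all characters:

$$ \text{hash}(\text{Key}) = \sum_{i=0}^{\text{KeySize}-1} \text{Key}[\text{KeySize}-i-1] \cdot 32^{i} $$

Problem: for long keys the early characters do not count. They are multiplied by large powers of 32, so their contribution overflows out of the integer.
Alternatives: use only some of the characters, e.g. the characters in odd positions.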

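Use only the first three characters: key[0] + 27 * key[1] + 729 * key[2]

If the keys are not random (e.g. English words, whose first three letters are far from uniformly distributed), only some part of the table is used.

int hash( const string & key, int tableSize )
{
    // Assumes key has at least three characters
    return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
}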

A good hash function

int hash( const string & key, int tableSize )
{
    int hashVal = 0;

    // Horner's rule with multiplier 37
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key[ i ];

    hashVal %= tableSize;
    if( hashVal < 0 )        // overflow may have made hashVal negative
        hashVal += tableSize;

    return hashVal;
}

Collision

The main programming detail is collision resolution.
If an element being inserted hashes to the same value as an already inserted element, there is a collision.
There are several methods to deal with this problem:
Separate chaining
Open addressing

Separate Chaining Hash Table

Keep a list of all elements that hash to the same value.

Note: the example uses TableSize = 10, which is not a good choice because 10 is not prime.

Type declaration for separate chaining hash table

template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

  private:
    vector<List<HashedObj> > theLists;   // The array of Lists
    const HashedObj ITEM_NOT_FOUND;
};

int hash( const string & key, int tableSize );
int hash( int key, int tableSize );

/* Construct the hash table. */
template <class HashedObj>
HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
{
}

/* Make the hash table logically empty. */
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    for( int i = 0; i < theLists.size( ); i++ )
        theLists[ i ].makeEmpty( );
}

/* Deep copy. */
template <class HashedObj>
const HashTable<HashedObj> &
HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
        theLists = rhs.theLists;
    return *this;
}

/* Remove item x from the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    theLists[ hash( x, theLists.size( ) ) ].remove( x );
}

/* Find item x in the hash table. */
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
{
    ListItr<HashedObj> itr;
    itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
    if( itr.isPastEnd( ) )
        return ITEM_NOT_FOUND;
    else
        return itr.retrieve( );
}

/* Insert item x into the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
    ListItr<HashedObj> itr = whichList.find( x );

    if( itr.isPastEnd( ) )
        whichList.insert( x, whichList.zeroth( ) );
}

Analysis

Let λ be the load factor of a hash table:
λ = number of elements / TableSize
λ is the average length of a list.
Successful find: about 1 + λ/2 comparisons (the matching element plus, on average, half of the other elements in its list) + time to evaluate the hash function.
Unsuccessful find & insert: λ comparisons + time to evaluate the hash function.
Good choice: λ ≈ 1.
Disadvantage of separate chaining: memory must be allocated and deallocated for the list nodes.

Open Addressing

If a collision occurs, try alternate cells h0(x), h1(x), h2(x), … until an empty cell is found:
hi(x) = (hash(x) + F(i)) mod TableSize, with F(0) = 0
All elements are stored in the table itself, so λ < 1.
Good choice: λ < 0.5

Linear Probing

• F is a linear function of i
– F(i) = i
• Insert keys {89, 18, 49, 58, 69}
• When 49 is inserted, a collision occurs
– Put it into the next available spot, cell 0 (the search wraps around)
• 58 collides with 18, 89 and 49

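Linear Probing

Problem: it is not easy to delete an element. The element may have caused a collision before, so emptying its cell would cut the probe sequence of elements inserted after it.
Solution: mark the element as deleted (lazy deletion).

Problem: Primary Clustering. Blocks of occupied cells form, and any key that hashes into a block needs several probes and then makes the block larger.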

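Linear Probing

Analysis

Expected number of probes for insertion and unsuccessful search (I & U):

$$ \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right) $$

Expected number of probes for successful search (S):

$$ \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right) $$

Problem: Primary Clustering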

Quadratic Probing

F(i) is a quadratic function.
Ex: F(i) = i²

Quadratic Probing

When 49 collides with 89, the next position attempted is one cell away.
58 collides at position 8. The cell one away is tried, and another collision occurs. 58 is then inserted into the cell 2² = 4 away.

Quadratic Probing

Solves the primary clustering problem.
Problem: not all empty cells may be accessed. The probe sequence may loop around full cells, so an empty cell is not found even though the table is not full.
Theorem: If the table size is prime and λ < 0.5, a new element can always be inserted.
Problem: Secondary clustering! Elements that hash to the same position probe the same sequence of alternative cells.

Type declaration for open addressing hash table

template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ),
        array( rhs.array ), currentSize( rhs.currentSize ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

    enum EntryType { ACTIVE, EMPTY, DELETED };

  private:
    struct HashEntry
    {
        HashedObj element;
        EntryType info;

        HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
          : element( e ), info( i ) { }
    };

    vector<HashEntry> array;
    int currentSize;
    const HashedObj ITEM_NOT_FOUND;

    bool isActive( int currentPos ) const;
    int findPos( const HashedObj & x ) const;
    void rehash( );
};

/* Construct the hash table. */
template <class HashedObj>
HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
{
    makeEmpty( );
}

/* Make the hash table logically empty. */
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    currentSize = 0;
    for( int i = 0; i < array.size( ); i++ )
        array[ i ].info = EMPTY;
}

/* Find item x in the hash table. */
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return array[ currentPos ].element;
    else
        return ITEM_NOT_FOUND;
}

/* Method that performs quadratic probing resolution.
   Returns the cell where x is, or where it would be inserted. */
template <class HashedObj>
int HashTable<HashedObj>::findPos( const HashedObj & x ) const
{
    int collisionNum = 0;
    int currentPos = hash( x, array.size( ) );

    while( array[ currentPos ].info != EMPTY &&
           array[ currentPos ].element != x )
    {
        currentPos += 2 * ++collisionNum - 1;   // add 2i - 1 to move from offset (i-1)^2 to i^2
        if( currentPos >= array.size( ) )       // wrap around; one subtraction suffices while λ < 0.5
            currentPos -= array.size( );
    }
    return currentPos;
}

/* Return true if currentPos exists and is active. */
template <class HashedObj>
bool HashTable<HashedObj>::isActive( int currentPos ) const
{
    return array[ currentPos ].info == ACTIVE;
}

/* Remove item x from the hash table (lazy deletion: the cell is only marked DELETED). */
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].info = DELETED;
}

/* Insert routine with quadratic probing. If x is already present, do nothing. */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;

    array[ currentPos ] = HashEntry( x, ACTIVE );
    ++currentSize;   // without rehashing, λ must be kept below 0.5 by the caller
}

/* Deep copy. */
template <class HashedObj>
const HashTable<HashedObj> &
HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
    {
        array = rhs.array;
        currentSize = rhs.currentSize;
    }
    return *this;
}

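Double Hashing

Use a second hash function: F(i) = i * hash2(x)
Poor example:
hash2(x) = X mod 9
hash1(x) = X mod 10
TableSize = 10
If X = 99, what happens? hash2(99) = 0, so every probe returns to the same cell (99 mod 10 = 9) and no alternative cell is ever tried.
Requirement: hash2(x) ≠ 0 for every X.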

Double Hashing

Good choice: hash2(x) = R - (X mod R)
where R is a prime smaller than TableSize. Then hash2(x) is always between 1 and R, so it is never 0.


Double Hashing

hash2(x) = 7 – (X mod 7)

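Analysis: random collision resolution

Assume the probes are independent (random collision resolution), so there is no clustering problem.

Unsuccessful search and insert: the number of probes until an empty cell is found.
The fraction of cells that are empty is 1 - λ, so the expected number of probes is 1 / (1 - λ).

Successful search: let P(X) be the number of probes used when the element X was inserted. Finding X later takes the same P(X) probes. The N elements were inserted while the load factor grew from 0 to λ, so the average is approximately

$$ \frac{1}{N}\sum_{X} P(X) \approx \frac{1}{\lambda}\int_{0}^{\lambda}\frac{1}{1-x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1-\lambda} $$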

Rehashing

If λ gets large, the number of probes increases. Operations start taking too long, and insertions might fail.
Solution: rehash into a table with a larger TableSize (usually about twice as large).
When to rehash:
if λ > 0.5
if an insertion fails

Rehashing Example

Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7.
H(X) = X mod 7
Linear probing is used to resolve collisions.

Rehashing Example

If 23 is inserted, the table is over 70 percent full (5 of 7 cells).
A new table is created.
17 is the first prime larger than twice the old table size (2 * 7 = 14), so
Hnew(X) = X mod 17

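Rehashing

Rehashing is an expensive operation: its running time is O(N).
Rehashing frees the programmer from worrying about the table size.
Amortized analysis: a rehash happens only after about N/2 insertions since the previous one, so its O(N) cost, averaged over those operations, adds only a constant to each. Operations take O(1) time on average.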

/* Insert routine with quadratic probing (with rehashing). */
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    // Insert x as active
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;
    array[ currentPos ] = HashEntry( x, ACTIVE );

    // Rehash when the table becomes more than half full
    if( ++currentSize > array.size( ) / 2 )
        rehash( );
}

/* Expand the hash table. */
template <class HashedObj>
void HashTable<HashedObj>::rehash( )
{
    vector<HashEntry> oldArray = array;

    // Create a new, empty table about twice as large
    array.resize( nextPrime( 2 * oldArray.size( ) ) );
    for( int j = 0; j < array.size( ); j++ )
        array[ j ].info = EMPTY;

    // Copy the active elements into the new table
    currentSize = 0;
    for( int i = 0; i < oldArray.size( ); i++ )
        if( oldArray[ i ].info == ACTIVE )
            insert( oldArray[ i ].element );
}
