A Knowledge Sharing Session on Unit IV: Tables (DSPS)

Upload: aiko-woodard

Posted on 02-Jan-2016



Tables | Unit IV of DSPS (SE-Comp)

Syllabus

Symbol Tables: Static and dynamic tree tables, AVL trees, AVL tree implementation, algorithms and analysis of AVL trees.

Hash Tables: Basic concepts, hash function, hashing methods, collision resolution, bucket hashing, dynamic hashing.

Part I: Symbol Tables

Static and dynamic tree tables, AVL trees, AVL tree implementation, algorithms and analysis of AVL trees.

Part II: Hash Tables

Basic concepts, hash function, hashing methods, collision resolution, bucket hashing, dynamic hashing.

Symbol Table | Why Symbol Table

What does a compiler do?

• Lexical analysis: detects inputs with illegal tokens, e.g. main$ ();

• Parsing: detects inputs with ill-formed parse trees, e.g. missing semicolons.

• Semantic analysis: the last "front end" phase; catches all remaining errors.

Symbol Table | Why Symbol Table

Typical Semantic Errors

• Multiple declarations: a variable should be declared (in the same region) at most once.

• Undeclared variable: a variable should not be used before being declared.

• Type mismatch: the type of the left-hand side of an assignment should match the type of the right-hand side.

• Wrong arguments: methods should be called with the right number and types of arguments.

Symbol Table | Aim of Symbol Table

Purpose of Symbol Table

– Keep track of the names declared in the program.

– Names of variables, classes, fields, methods, etc.

Symbol Table | Symbol Table Stores

What It Contains

A symbol table associates a name with a set of attributes, e.g.:

• kind of name (variable, class, field, method, etc.)

• type (int, float, etc.)

• nesting level

• memory location (i.e., where it will be found at runtime).

Symbol Table | Symbol Table Revisit

In short:

During lexical analysis: symbols are found and added to the symbol table.

During syntactic analysis: information about each symbol is filled in.

During semantic analysis: that information is used for type checking.

Symbol Table | Why Is the Symbol Table Important?

Information provided by the symbol table:

• Given an identifier, which name is it?

• What information is to be associated with a name? (the actual characters of the name, type, storage allocation info (number of bytes), line number where declared, lines where referenced, scope)

• How do we access this information?

• How do we associate this information with a name?

Symbol Table | Reminder on Symbol Table

Note:

• A name can represent a variable, type, constant, parameter, record, record field, procedure, array, label, or file.

Symbol Table | Operations on Symbol Table

Operations needed:

• determining whether a string has already been stored
• inserting an entry for a string
• deleting a string when it goes out of scope

This requires three functions:

1. lookup(s): returns the index of the entry for string s, or 0 if there is no entry
2. insert(s): adds a new entry for string s and returns its index
3. delete(s): deletes s from the table (or, typically, hides it)
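The three functions can be sketched in C as follows. This is a minimal sketch, assuming an unsorted array with a fixed capacity; MAXSYMS, struct entry and the hidden flag are illustrative assumptions, not part of the slides. Slot 0 is left unused so that lookup can return 0 for "no entry", and delete only hides an entry, so leaving a scope uncovers the outer declaration of the same name.

```c
#include <string.h>

#define MAXSYMS 100   /* hypothetical capacity */

/* One symbol-table entry; index 0 is unused so 0 can mean "not found". */
struct entry {
    char name[32];
    int hidden;       /* 1 once the name has gone out of scope */
};

static struct entry table[MAXSYMS + 1];
static int count = 0;  /* entries occupy table[1..count] */

/* lookup(s): index of the most recent visible entry for s, or 0 if none. */
int lookup(const char *s)
{
    for (int i = count; i >= 1; i--)   /* newest first, so inner scopes win */
        if (!table[i].hidden && strcmp(table[i].name, s) == 0)
            return i;
    return 0;
}

/* insert(s): add a new entry for s and return its index (0 if the table is full). */
int insert(const char *s)
{
    if (count == MAXSYMS)
        return 0;
    count++;
    strncpy(table[count].name, s, sizeof table[count].name - 1);
    table[count].name[sizeof table[count].name - 1] = '\0';
    table[count].hidden = 0;
    return count;
}

/* delete(s): hide the most recent visible entry for s instead of removing it. */
void delete(const char *s)
{
    int i = lookup(s);
    if (i != 0)
        table[i].hidden = 1;
}
```

For a program with a global a and a local a inside procedure P, inserting both makes lookup("a") return the local entry; calling delete("a") when P's scope ends makes it return the global one again. Because lookup scans the whole array, it costs O(n).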

Symbol Table | Symbol Table Examples

Example

01 PROGRAM Main
02 GLOBAL a,b
03 PROCEDURE P (PARAMETER x)
04 LOCAL a
05 BEGIN {P}
06 …a…
07 …b…
08 …x…
09 END {P}
10 BEGIN {Main}
11 Call P(a)
12 END {Main}

Symbol Table | Unsorted List

Symbol table for the example program, stored as an unsorted list (Declared, Referenced and Other are the "other attributes"):

Name | Characteristic Class | Scope | Declared | Referenced | Other
Main | Program | 0 | Line 1 | |
a | Variable | 0 | Line 2 | Line 11 |
b | Variable | 0 | Line 2 | Line 7 |
P | Procedure | 0 | Line 3 | Line 11 | 1, parameter, x
x | Parameter | 1 | Line 3 | Line 8 |
a | Variable | 1 | Line 4 | Line 6 |

Lookup complexity: O(n)

Symbol Table | Sorted List

The same symbol table, stored as a list sorted by name:

Name | Characteristic Class | Scope | Declared | Referenced | Other
a | Variable | 0 | Line 2 | Line 11 |
a | Variable | 1 | Line 4 | Line 6 |
b | Variable | 0 | Line 2 | Line 7 |
Main | Program | 0 | Line 1 | |
P | Procedure | 0 | Line 3 | Line 11 | 1, parameter, x
x | Parameter | 1 | Line 3 | Line 8 |

Lookup complexity: O(log n)

Worst case: O(n)

Two issues:

1. Interface: how to use symbol tables.

2. Implementation: how to implement them.

Basic Implementation Techniques

Considerations:

• Number of names
• Storage space
• Retrieval time

<1> Unordered list (linked list/array)

<2> Ordered list
» binary search on arrays
» expensive insertion
(+) good for a fixed set of names (e.g. reserved words, assembly opcodes)

<3> Binary search tree
» On average, searching takes O(log n) time.
» However, names in programs are not chosen randomly, so the tree can become unbalanced.

<4> AVL tree

<5> Hash table: most common
(+) constant time (on average)

Static Tree Table

If the symbols are known in advance:

• No insertion or deletion is allowed.
• The cost of searching for symbols of higher frequency should be small.
• Huffman tree and OBST (optimal binary search tree)

[Fig: Optimal search tree when the frequencies of the symbols are specified (keys: if, do, Read, while)]

[Fig: Huffman tree (leaves a, b, c, d, e; edges labelled 0 and 1)]
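The OBST idea can be sketched with the usual dynamic-programming recurrence: the best tree for keys i..j tries every key r as the root and adds the total frequency of i..j, because every key in that range ends up one level deeper. This is a simplified sketch: it counts only successful searches (the textbook OBST also weighs unsuccessful searches), treats the root as depth 1, and MAXN and obst_cost are hypothetical names. The slide does not give frequencies, so any values passed in are made up for illustration.

```c
#include <limits.h>

#define MAXN 16   /* hypothetical upper bound on the number of keys */

/* Minimum total search cost of a BST over keys 0..n-1 (already sorted),
   where freq[i] is how often key i is searched and a key at depth d
   costs d comparisons. Returns -1 if n is out of range. */
int obst_cost(const int freq[], int n)
{
    int cost[MAXN][MAXN];   /* cost[i][j]: best cost for keys i..j */

    if (n <= 0 || n > MAXN)
        return -1;
    for (int len = 1; len <= n; len++) {
        for (int i = 0; i + len - 1 < n; i++) {
            int j = i + len - 1;
            int w = 0;                         /* total frequency of keys i..j */
            for (int k = i; k <= j; k++)
                w += freq[k];
            int best = INT_MAX;
            for (int r = i; r <= j; r++) {     /* try every key as the root */
                int left  = (r > i) ? cost[i][r - 1] : 0;
                int right = (r < j) ? cost[r + 1][j] : 0;
                if (left + right < best)
                    best = left + right;
            }
            cost[i][j] = best + w;             /* subtrees sit one level deeper */
        }
    }
    return cost[0][n - 1];
}
```

For three sorted keys with frequencies {34, 8, 50}, obst_cost returns 142: the most frequent key is the root (50 × 1), the first key is its left child (34 × 2) and the middle key sits at depth 3 (8 × 3). The three nested loops make this O(n³).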

Dynamic Tree Tables

• Symbols are inserted as and when they come.
• Deletion is also possible.
• E.g. AVL tree

[Fig: BST with root 50; children 32 and 60; leaves 20, 45, 55, 68]
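A dynamic tree table can be kept balanced with AVL insertion. The sketch below is a minimal C version, assuming integer keys and ignoring duplicates; the node layout and function names are illustrative. After each recursive insert the node's height is updated, and if the balance factor (left height minus right height) leaves the range -1..1, one of the four standard cases (LL, RR, LR, RL) is fixed with one or two rotations.

```c
#include <stdlib.h>

/* AVL node; a leaf has height 1 and an empty tree height 0. */
struct node {
    int key, height;
    struct node *left, *right;
};

static int height(const struct node *t) { return t ? t->height : 0; }

static void update(struct node *t)
{
    int hl = height(t->left), hr = height(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

static struct node *rotate_right(struct node *y)   /* fixes a left-heavy node */
{
    struct node *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

static struct node *rotate_left(struct node *x)    /* fixes a right-heavy node */
{
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert key into tree t and return the (possibly new) root. */
struct node *avl_insert(struct node *t, int key)
{
    if (t == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();                      /* out of memory: give up in this sketch */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = avl_insert(t->left, key);
    else if (key > t->key)
        t->right = avl_insert(t->right, key);
    else
        return t;                         /* duplicate: nothing to do */

    update(t);
    int balance = height(t->left) - height(t->right);
    if (balance > 1) {                    /* left subtree too tall */
        if (key > t->left->key)           /* LR: turn it into LL first */
            t->left = rotate_left(t->left);
        return rotate_right(t);           /* LL */
    }
    if (balance < -1) {                   /* right subtree too tall */
        if (key < t->right->key)          /* RL: turn it into RR first */
            t->right = rotate_right(t->right);
        return rotate_left(t);            /* RR */
    }
    return t;
}
```

Inserting 50, 32, 60, 20, 45, 55, 68 in that order needs no rotations and gives the tree in the figure. Inserting 1 to 7 in ascending order, which would turn a plain BST into a chain, instead yields a tree of height 3 rooted at 4.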

Part II: Hash Tables

Hash Table | Motivation

Where will hashing be used?

1. docDict
2. Databases
3. Compilers
4. Network routers and servers
5. Substring search
6. Cryptography

Symbol Table | Why Hash Table

A Problem?

• We have to store some records and perform the following:
– add a new record
– delete a record
– search for a record by key

Find a way to do these efficiently! → Hashing

Hash Table | Unsorted Array

Use an array to store the records, in unsorted order:

1. add: add the record as the last entry; fast, O(1)

2. delete a target: slow at finding the target, fast at filling the hole (just move the last entry into it); O(n)

3. search: sequential search; slow, O(n)
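A minimal C sketch of the unsorted-array approach, assuming integer ids and a fixed capacity CAP (both hypothetical). The point to notice is del: finding the target is a linear scan, but the hole is filled in O(1) by moving the last entry into it, which is why the order of the records is not preserved.

```c
#define CAP 1000   /* hypothetical capacity */

struct record { int id; double score; };

static struct record recs[CAP];
static int n = 0;   /* number of records in recs[0..n-1] */

/* add: append as the last entry, O(1). Returns 0 if the array is full. */
int add(int id, double score)
{
    if (n == CAP)
        return 0;
    recs[n].id = id;
    recs[n].score = score;
    n++;
    return 1;
}

/* search: sequential scan, O(n). Returns the position or -1. */
int search(int id)
{
    for (int i = 0; i < n; i++)
        if (recs[i].id == id)
            return i;
    return -1;
}

/* delete: finding the target is O(n); filling the hole is O(1),
   because the last entry is simply moved into it. */
int del(int id)
{
    int i = search(id);
    if (i < 0)
        return 0;
    recs[i] = recs[n - 1];
    n--;
    return 1;
}
```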

Hash Table | Sorted Array

Use an array to store the records, keeping them in sorted order:

1. add: insert the record in its proper position; much record movement, slow, O(n)

2. delete a target: how to handle the hole after deletion? Much record movement, slow, O(n)

3. search: binary search; fast, O(log n)
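The sorted-array version can be sketched the same way. For brevity it stores only the keys (an assumption; a real table would keep whole records), and CAP is hypothetical. Binary search makes search O(log n), but add and del have to shift every later key by one place to keep the array sorted, which is the O(n) record movement mentioned above.

```c
#define CAP 1000   /* hypothetical capacity */

static int keys[CAP];   /* kept in ascending order */
static int n = 0;

/* search: binary search, O(log n). Returns the position or -1. */
int search(int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] == key)
            return mid;
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* add: shift larger keys one place right to open a gap, O(n). */
int add(int key)
{
    if (n == CAP)
        return 0;
    int i = n - 1;
    while (i >= 0 && keys[i] > key) {
        keys[i + 1] = keys[i];
        i--;
    }
    keys[i + 1] = key;
    n++;
    return 1;
}

/* delete: close the hole by shifting later keys one place left, O(n). */
int del(int key)
{
    int pos = search(key);
    if (pos < 0)
        return 0;
    for (int i = pos; i < n - 1; i++)
        keys[i] = keys[i + 1];
    n--;
    return 1;
}
```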

Hash Table | Linked List

Store the records in an (unsorted) linked list:

1. add: fast if one can insert the node anywhere; O(1)

2. delete a target: fast at disposing of the node, but slow at finding the target; O(n)

3. search: sequential search; slow, O(n) (with only a linked list we cannot use binary search, even if the list is sorted)

Hash Table | More Approaches

What is the solution, then? The following have better performance but are more complex:

1. Hash table

2. Tree (BST, heap, …)

Hash Table | Array as Table?

studid | name | score
0012345 | sandy | 81.5
0033333 | bubli | 90
0056789 | david | 56.8
... | ... | ...
9801010 | peter | 20
9802020 | manali | 100
... | ... | ...
9903030 | tushar | 73
9908080 | Namrata | 49

Hash Table | Array as Table?

[Fig: huge array A[0..9999999]; A[12345] = andy, 81.5; A[33333] = betty, 90; A[56789] = david, 56.8; A[9908080] = bill, 49; all other slots empty]

One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345].

Hash Table | What's Wrong Then?

Consider this problem. We want to store 1,000 student records and search them by student id.

Hash Table | What's Wrong Then?

1. Keys may not be nonnegative integers.

2. Gigantic memory hog.

Hash Table | What's Wrong Then?

1. Keys may not be nonnegative integers.
Solution: prehash (map each key to a nonnegative integer).

2. Gigantic memory hog.
Solution: hashing (reduce the universe of all keys to a reasonable size).

Hash Table | Direct Hashing Table

• Each slot, or position, corresponds to a key in the universe U.

• If there is an element x with key k, then T[k] contains a pointer to x.

• Otherwise, T[k] is empty, represented by NIL.

Hash Table | Direct Hashing Table

Store the records in a huge array where the index corresponds to the key:

• add: very fast, O(1)
• delete: very fast, O(1)
• search: very fast, O(1)
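A direct hashing (direct-address) table can be sketched in C as an array of pointers indexed by the key itself. This assumes every key lies in a small universe 0..U-1, with U = 100000 chosen arbitrarily; each operation is a single array access, hence O(1). With the slides' universe 0..9999999, the same array of 10,000,000 pointers would take about 80 MB on a machine with 8-byte pointers, which is the memory-hog problem.

```c
#include <stddef.h>

#define U 100000   /* hypothetical key universe 0..U-1 */

struct record { int key; double score; };

/* T[k] points to the record with key k, or is NULL (NIL) if there is none. */
static struct record *T[U];

void da_insert(struct record *x) { T[x->key] = x; }      /* O(1) */
void da_delete(int k)            { T[k] = NULL; }       /* O(1) */
struct record *da_search(int k)  { return T[k]; }       /* O(1) */
```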

Hash Table | Hash Function

function Hash(key: KeyType): integer;

Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1000 records to the integers 0..999, one to one: no two different keys map to the same number.

H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
…
H(‘9908080’) = 3

Hash Table | Hash Table

[Fig: table with slots 0..999; slot 3: 9908080, bill, 49; slot 67: 0033333, betty, 90; slot 134: 0012345, andy, 81.5; slot 764: 0056789, david, 56.8]

To store a record, we compute Hash(stud_id) for the record and store it at location Hash(stud_id) of the array. To search for a student, we only need to peek at location Hash(target stud_id).

Hash Table | Division Method

h(k) = k mod m

Ex: key mod size: 2201 mod 1000 = 201

Hash Table | Collision

A collision occurs when different keys map to the same index, i.e. h(k1) = h(k2) = i with k1 ≠ k2.

Ex: 5 mod 11 and 27 mod 11 both give index 5.
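The division method and the collision example can be checked with a tiny C sketch; hash_div and collides are illustrative names. A common guideline is to choose m as a prime that is not close to a power of 2: if m were 2^p, h(k) would simply be the p lowest bits of k and would ignore the rest of the key.

```c
/* Division method: h(k) = k mod m, for a table of size m > 0. */
unsigned hash_div(unsigned k, unsigned m)
{
    return k % m;
}

/* Two different keys collide when they hash to the same index. */
int collides(unsigned k1, unsigned k2, unsigned m)
{
    return k1 != k2 && hash_div(k1, m) == hash_div(k2, m);
}
```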

Hashing

• Widely useful technique for implementing dictionaries
• Constant time per operation (on average)
• Best case O(1), worst case O(n)

[Fig: a key is passed through f() to obtain the address of its record in a table (slots 0 to 5)]

Characteristics of a Good Hash Function

• Quick computation.
• It should spread keys evenly: uniform distribution.
• It should avoid collisions, although collision-free hashing happens only in very rare cases (e.g. see the birthday paradox).
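The birthday paradox can be made concrete. If n keys are hashed uniformly at random into m slots, the probability that at least one pair collides is 1 - (m-1)/m × (m-2)/m × ... × (m-n+1)/m. The sketch below (collision_probability is an illustrative name) computes it; with m = 365 "birthdays", 23 keys already give a probability just above 0.5, so collisions have to be handled, not merely avoided.

```c
/* Probability that n keys hashed uniformly into m slots produce
   at least one collision. */
double collision_probability(int n, int m)
{
    double p_distinct = 1.0;   /* probability that all keys so far differ */
    for (int i = 1; i < n; i++)
        p_distinct *= (double)(m - i) / m;
    return 1.0 - p_distinct;
}
```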

Hash Functions

• Direct hashing
• Digit extraction
• Modulo-division method
• Mid-square method
• Folding method
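Three of these methods are sketched below in C. The digit positions, the number of middle digits and the 3-digit part size are arbitrary choices made for the examples, not values from the slides; in practice the result would usually also be reduced mod the table size. Direct hashing uses the key itself, and the modulo-division method is h(k) = k mod m.

```c
/* Digit extraction: build the index from selected digits of the key.
   Here (an arbitrary choice) the 1st, 3rd and 4th digits of a 6-digit key. */
unsigned digit_extract(unsigned key)          /* e.g. 379452 -> 394 */
{
    unsigned d1 = key / 100000 % 10;
    unsigned d3 = key / 1000 % 10;
    unsigned d4 = key / 100 % 10;
    return d1 * 100 + d3 * 10 + d4;
}

/* Mid-square: square the key and keep its middle digits. For keys of up
   to 4 digits the square has up to 8 digits; keep digits 3..6. */
unsigned mid_square(unsigned key)             /* e.g. 9452 -> 89340304 -> 3403 */
{
    unsigned long long sq = (unsigned long long)key * key;
    return (unsigned)(sq / 100 % 10000);
}

/* Fold shift: cut the key into 3-digit parts (taken from the right),
   add them, and keep the last 3 digits of the sum. */
unsigned fold_shift(unsigned long long key)   /* e.g. 123456789 -> 1368 -> 368 */
{
    unsigned long long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return (unsigned)(sum % 1000);
}
```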

Hash Table | Collision Resolution

1. Hashing with separate chaining (open hashing): unlimited space, since each chain can keep growing

2. Hashing with open addressing (closed hashing)

Hash Table | Collision Resolution Strategies

• Separate chaining
• Open addressing
  – Linear probing (LP)
    – LP with chaining
      – LP with chaining, without replacement
      – LP with chaining, with replacement
    – LP without chaining
  – Quadratic probing
  – Double hashing

Hash Table | Chained Hash Table

[Fig: array of slots 0..HASHMAX; slots that no key maps to point to nil; one chain holds the record Key: 9903030, name: tom, score: 73]

One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
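A minimal C sketch of a chained hash table, assuming non-negative integer keys, a score field standing in for the rest of the record, and HASHMAX = 11 chains (all illustrative choices). New records go to the head of their chain, so insertion is O(1); search and delete walk only the chain the key hashes to.

```c
#include <stdlib.h>

#define HASHMAX 11   /* number of chains (hypothetical value) */

struct rec {
    unsigned long key;
    double score;            /* stands in for the rest of the record */
    struct rec *next;
};

static struct rec *table[HASHMAX];   /* every chain starts out empty (nil) */

/* Insert at the head of the chain for key: O(1); chains grow as needed. */
int chain_insert(unsigned long key, double score)
{
    struct rec *r = malloc(sizeof *r);
    if (r == NULL)
        return 0;
    r->key = key;
    r->score = score;
    r->next = table[key % HASHMAX];
    table[key % HASHMAX] = r;
    return 1;
}

/* Search walks only the chain that key hashes to. */
struct rec *chain_search(unsigned long key)
{
    for (struct rec *p = table[key % HASHMAX]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Delete unlinks the first record with this key from its chain. */
int chain_delete(unsigned long key)
{
    struct rec **pp = &table[key % HASHMAX];
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return 0;
    struct rec *dead = *pp;
    *pp = dead->next;
    free(dead);
    return 1;
}
```

With HASHMAX = 11, keys 5 and 27 collide (both hash to 5) and simply share the same chain.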

Hash Table | Rehashing

Rehashing is required:

• when the table is completely full
• with quadratic probing, when the table is half full
• when an insertion fails due to overflow

• The table size is doubled after rehashing.
• The mod value is changed to the new size.
* Very costly: a new table must be created and every entry of the old table inserted into it using the new hash function.

Hash Table | Rehashing

It is more efficient to rehash when the load factor is ≥ 70%, where the load factor is

l = h / t

with h the total number of mapped (occupied) locations and t the total number of locations.
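The load-factor rule can be combined with linear probing as sketched below. This is an illustration under assumptions: non-negative integer keys, the threshold l = h/t ≥ 0.7 checked after each insert (compared as 10·h ≥ 7·t to avoid floating point), and doubling as the growth policy. Names such as struct table and table_insert are hypothetical. Rehashing allocates the larger table and re-inserts every key with the new mod value, which is why it is costly.

```c
#include <stdlib.h>

/* Open-addressing table with linear probing. */
struct table {
    int *keys;   /* stored keys */
    int *used;   /* used[j] == 1 if slot j is occupied (the slides' T[]) */
    int t;       /* total number of locations */
    int h;       /* number of mapped (occupied) locations */
};

int table_init(struct table *tb, int size)
{
    tb->keys = calloc(size, sizeof *tb->keys);
    tb->used = calloc(size, sizeof *tb->used);
    tb->t = size;
    tb->h = 0;
    if (tb->keys == NULL || tb->used == NULL) {
        free(tb->keys);
        free(tb->used);
        return 0;
    }
    return 1;
}

/* Plain linear-probing insert; returns the slot used or -1 if full. */
static int place(struct table *tb, int key)
{
    int j = key % tb->t;
    for (int i = 0; i < tb->t; i++) {
        if (!tb->used[j]) {
            tb->keys[j] = key;
            tb->used[j] = 1;
            tb->h++;
            return j;
        }
        j = (j + 1) % tb->t;
    }
    return -1;
}

/* Rehash: a table twice as large, every old key re-inserted with the new mod. */
static int rehash(struct table *tb)
{
    struct table bigger;
    if (!table_init(&bigger, 2 * tb->t))
        return 0;                          /* keep the old table if memory runs out */
    for (int j = 0; j < tb->t; j++)
        if (tb->used[j])
            place(&bigger, tb->keys[j]);
    free(tb->keys);
    free(tb->used);
    *tb = bigger;
    return 1;
}

/* Insert, then rehash once the load factor h/t reaches 0.7 (10h >= 7t). */
int table_insert(struct table *tb, int key)
{
    if (place(tb, key) < 0)
        return 0;
    if (10 * tb->h >= 7 * tb->t)
        rehash(tb);
    return 1;
}

/* Linear-probing search; returns the slot or -1. */
int table_search(const struct table *tb, int key)
{
    int j = key % tb->t;
    for (int i = 0; i < tb->t; i++) {
        if (!tb->used[j])
            return -1;                     /* empty slot reached: key absent */
        if (tb->keys[j] == key)
            return j;
        j = (j + 1) % tb->t;
    }
    return -1;
}
```

Starting from 10 slots, the first six inserts leave the load factor at 0.6; the seventh reaches 0.7, so the table grows to 20 slots and all seven keys are re-placed using mod 20.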

Types of linear probing: with chaining, with and without replacement.

Note: Try to solve all the examples taken in class on transparencies and on the board… you can take them from the book…

Extendible Hashing

• All the techniques so far are suited to small data.
• When the data becomes bulky, there will be too many disk accesses.
• In that case, use extendible hashing.
• It uses binary (disk) codes to map locations to binary values, e.g. a hash table of size 4 with 4 slots: 00, 01, 10, 11.
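The directory idea can be sketched in C. This is an in-memory illustration only, with several assumptions: the key itself is used as its hash value, the directory is indexed by the low global_depth bits (some textbooks use the high-order bits instead), buckets hold BUCKET_SIZE = 2 keys, and the directory is capped at 2^MAX_DEPTH entries. When a bucket overflows it is split on its next bit; if its local depth already equals the global depth, the directory is doubled first.

```c
#include <stdlib.h>

#define BUCKET_SIZE 2   /* hypothetical bucket capacity */
#define MAX_DEPTH 8     /* hypothetical limit on the global depth */

struct bucket {
    int local_depth;
    int count;
    unsigned keys[BUCKET_SIZE];
};

static struct bucket *dir[1 << MAX_DEPTH];   /* 2^global_depth entries in use */
static int global_depth = 0;

/* Directory index = the low global_depth bits of the key. */
static unsigned dir_index(unsigned key) { return key & ((1u << global_depth) - 1); }

static struct bucket *new_bucket(int depth)
{
    struct bucket *b = calloc(1, sizeof *b);
    if (b != NULL)
        b->local_depth = depth;
    return b;
}

int eh_init(void) { return (dir[0] = new_bucket(0)) != NULL; }

int eh_search(unsigned key)
{
    struct bucket *b = dir[dir_index(key)];
    for (int i = 0; i < b->count; i++)
        if (b->keys[i] == key)
            return 1;
    return 0;
}

int eh_insert(unsigned key)
{
    for (;;) {
        struct bucket *b = dir[dir_index(key)];
        if (b->count < BUCKET_SIZE) {              /* room left: done */
            b->keys[b->count++] = key;
            return 1;
        }
        if (b->local_depth == global_depth) {      /* double the directory first */
            if (global_depth == MAX_DEPTH)
                return 0;
            for (unsigned i = 0; i < (1u << global_depth); i++)
                dir[i + (1u << global_depth)] = dir[i];   /* new half mirrors old */
            global_depth++;
        }
        unsigned bit = 1u << b->local_depth;       /* split on this bit */
        struct bucket *b1 = new_bucket(b->local_depth + 1);
        if (b1 == NULL)
            return 0;
        b->local_depth++;
        int old = b->count;
        b->count = 0;
        for (int i = 0; i < old; i++) {            /* redistribute the full bucket */
            unsigned k = b->keys[i];
            if (k & bit)
                b1->keys[b1->count++] = k;
            else
                b->keys[b->count++] = k;
        }
        for (unsigned i = 0; i < (1u << global_depth); i++)
            if (dir[i] == b && (i & bit))          /* entries with the bit set move */
                dir[i] = b1;
        /* loop again: the target bucket may still be full */
    }
}
```

Inserting 0 to 7 into an empty structure ends with global depth 2, i.e. the four directory entries 00, 01, 10, 11 from the slide, each pointing to a bucket that holds two keys.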

Implementation

The following are some examples of how to create the structure and apply a hash function to it:

1. Linear probing with store and search
2. Double hashing
3. Quadratic probing

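Linear Probe (store)

/* Insert key with linear probing. MAX is the table size (assumed to be
   #defined elsewhere); T[j] == 1 marks slot j as occupied. */
int insert_LP(int hashtable[], int key, int T[])
{
    int i, j;
    j = key % MAX;                 /* mapped (home) location */
    for (i = 0; i < MAX; i++) {
        if (T[j] == 0) {           /* empty slot found */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
        j = (j + 1) % MAX;         /* next location, in a circular way */
    }
    return -1;                     /* table is full */
}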

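Search in LP

Same loop as the store routine; only the condition inside it changes. Without deletions, the search can also stop at the first empty slot, because the key would have been stored there:

int search_LP(int hashtable[], int key, int T[])
{
    int i, j = key % MAX;
    for (i = 0; i < MAX; i++) {
        if (T[j] == 1 && hashtable[j] == key)
            return j;              /* key found */
        if (T[j] == 0)
            return -1;             /* empty slot: key is not present */
        j = (j + 1) % MAX;
    }
    return -1;
}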

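Double Hashing

/* f1 and f2 are the two hash functions (not defined on the slide);
   f2(key) is the step size: it must never be 0 and should be coprime
   with MAX (e.g. MAX prime) so that every slot can be reached. */
int insert_DH(int hashtable[], int key, int T[])
{
    int i, j, start, u;
    start = f1(key) % MAX;         /* first mapped location */
    u = f2(key);                   /* increment between probes */
    for (i = 0; i < MAX; i++) {
        j = (start + i * u) % MAX;
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                     /* no empty slot on this probe sequence */
}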

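Quadratic Probing

int insert_QP(int hashtable[], int key, int T[])
{
    int i, j, start;
    start = key % MAX;             /* first mapped location */
    for (i = 0; i < MAX; i++) {
        j = (start + i * i) % MAX; /* probes start, start+1, start+4, start+9, ... */
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;  /* quadratic probing may miss empty slots: rehash when half full */
}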