Post on 15-Jan-2016
More on Data Structures in C
CS-2301 B-term 2008 1
More on Lists and Trees
Introduction to Hash Tables
CS-2301, System Programming for Non-majors
(Slides include materials from The C Programming Language, 2nd ed., by Kernighan and Ritchie and from C: How to Program, 5th ed., by Deitel and Deitel)
Linked List (review)
• Linear data structure
• Easy to grow and shrink
• Easy to add and delete items
• Time to search for an item – O(n)
“Big-O” notation:– means “order of”
Definition: Big-O, “of the order of …”
• A characterization of the number of operations in an algorithm in terms of the number of data items involved
• O(n) means that the number of operations to complete the algorithm is proportional to n
• E.g., searching a list with n items requires, on average, n/2 comparisons with payloads
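The n/2 figure can be checked by counting. In the minimal C sketch below (an int payload and the helper names searchCounting and averageComparisons are assumptions), the list holds 0, 1, …, n-1 and we search once for every item. Finding the k-th item costs k comparisons, so the total is 1 + 2 + … + n = n(n+1)/2 and the average is (n+1)/2, i.e., about n/2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the slides leave the payload type open, int is assumed. */
struct listItem {
    int payload;
    struct listItem *next;
};

/* Linear search that also adds the number of payload comparisons to *comparisons. */
struct listItem *searchCounting(struct listItem *head, int key, long *comparisons)
{
    for (struct listItem *p = head; p != NULL; p = p->next) {
        ++*comparisons;
        if (p->payload == key)
            return p;
    }
    return NULL;
}

/* Links the caller's array into the list 0, 1, ..., n-1, searches once for
   every key, and returns the average number of comparisons per search. */
double averageComparisons(struct listItem nodes[], int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        nodes[i].payload = i;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    long total = 0;
    for (int key = 0; key < n; key++)
        searchCounting(&nodes[0], key, &total);
    return (double)total / n;   /* n(n+1)/2 comparisons over n searches = (n+1)/2 */
}
```

For n = 100 this gives 5050 / 100 = 50.5 comparisons per search.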
Big-O (continued)
• O(n): proportional to n – i.e., linear
• O(n²): proportional to n² – i.e., quadratic
• O(kⁿ): proportional to kⁿ – i.e., exponential
• …
• O(log n): proportional to log n – i.e., sublinear
• O(n log n)
  – Worse than O(n), better than O(n²)
• O(1): independent of n – i.e., constant
Anecdote & Questions:–
• In the design of electronic adders, what is the order of the carry-propagation?
• What is the order of floating point divide?
• What is the order of floating point square root?
• What program have we studied in this course that is O(2ⁿ), i.e., exponential?
Questions on Big-O?
Linked List (continued)
[Diagram: four items chained together, each holding a payload and a next pointer]
struct listItem *head;
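A minimal C sketch of the basic operations on the singly-linked list pictured above, assuming an int payload; the helper names pushFront, find and freeList are hypothetical. find shows why search is O(n): it may have to follow every next pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout for the slide's listItem (int payload is a placeholder). */
struct listItem {
    int payload;
    struct listItem *next;
};

/* Inserts a new item at the front; returns 0 on success, -1 if malloc fails
   (in which case the list is left unchanged). */
int pushFront(struct listItem **head, int payload)
{
    assert(head != NULL);
    struct listItem *item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->payload = payload;
    item->next = *head;
    *head = item;
    return 0;
}

/* O(n) search: follows next pointers until the payload matches or the list ends. */
struct listItem *find(struct listItem *head, int payload)
{
    for (struct listItem *p = head; p != NULL; p = p->next)
        if (p->payload == payload)
            return p;
    return NULL;
}

/* Releases every item, saving each next pointer before freeing its owner. */
void freeList(struct listItem *head)
{
    while (head != NULL) {
        struct listItem *next = head->next;
        free(head);
        head = next;
    }
}
```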
Doubly-Linked List (review)
[Diagram: four items chained together, each holding a payload plus prev and next pointers]
struct listItem *head, *tail;
AddAfter(item *p, item *new)
Simple linked list
{
    new->next = p->next;
    p->next = new;
}
Doubly-linked list
{
    /* assumes p is not the last item (p->next != NULL) */
    new->next = p->next;
    new->prev = p->next->prev;
    p->next = p->next->prev = new;
}
AddAfter(item *p, item *new)
Simple linked list
{
    new->next = p->next;
    p->next = new;
}
Doubly-linked list
{
    new->next = p->next;
    new->prev = p;
    if (p->next != NULL)        /* p is not the last item */
        p->next->prev = new;
    p->next = new;
}
[Diagram: three items of a doubly-linked list, showing the pointer changes]
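Both versions of AddAfter can be exercised as a small self-contained C sketch. The node types sitem and ditem (with int payloads) and the function names are hypothetical. The doubly-linked version tests p->next before touching p->next->prev, since that pointer is NULL when p is the last item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types for the two kinds of list (int payloads assumed). */
typedef struct sitem { int payload; struct sitem *next; } sitem;
typedef struct ditem { int payload; struct ditem *prev, *next; } ditem;

/* Singly-linked: splice new in right after p. */
void addAfterSingle(sitem *p, sitem *new)
{
    assert(p != NULL && new != NULL);
    new->next = p->next;
    p->next = new;
}

/* Doubly-linked: the same splice, plus the two back pointers. If the list
   also keeps a tail pointer, the caller must move it when p was the tail. */
void addAfterDouble(ditem *p, ditem *new)
{
    assert(p != NULL && new != NULL);
    new->next = p->next;
    new->prev = p;
    if (p->next != NULL)
        p->next->prev = new;
    p->next = new;
}
```

Usage: starting from a single item a, adding c after a and then b after a yields a <-> b <-> c.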
deleteNext(item *p)
Simple linked list
{
    if (p->next != NULL)
        p->next = p->next->next;   /* the removed item is not freed here */
}
Doubly-linked list
• Complicated
• Easier to use deleteItem
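The slide's deleteNext unlinks the next item but never releases it. A minimal sketch that also frees it, assuming every item was obtained from malloc and has an int payload (both assumptions):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct item { int payload; struct item *next; } item;   /* assumed layout */

/* Unlinks the item after p and frees it; does nothing if p is the last item. */
void deleteNext(item *p)
{
    assert(p != NULL);
    item *victim = p->next;
    if (victim != NULL) {
        p->next = victim->next;   /* bypass the victim */
        free(victim);             /* only valid if the victim came from malloc */
    }
}
```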
deleteItem(item *p)
Simple linked list
• Not possible without having a pointer to the previous item!
Doubly-linked list
{
    if (p->next != NULL)
        p->next->prev = p->prev;
    if (p->prev != NULL)
        p->prev->next = p->next;
    /* if p was the first or last item, head or tail must also be updated */
}
[Diagram: three items of a doubly-linked list, showing the middle item being unlinked]
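When the list keeps head and tail pointers (struct listItem *head, *tail), deleting the first or last item must also move those pointers. A sketch under stated assumptions: the dlist header type holding head and tail is hypothetical, and payloads are ints.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed doubly-linked item and a hypothetical list header. */
typedef struct item { int payload; struct item *prev, *next; } item;
typedef struct { item *head, *tail; } dlist;

/* Unlinks p. An item with no successor is the tail, and one with no
   predecessor is the head, so the header is updated in those cases.
   The caller still owns p's memory. */
void deleteItem(dlist *list, item *p)
{
    assert(list != NULL && p != NULL);
    if (p->next != NULL)
        p->next->prev = p->prev;
    else
        list->tail = p->prev;
    if (p->prev != NULL)
        p->prev->next = p->next;
    else
        list->head = p->next;
    p->prev = p->next = NULL;   /* detach p completely */
}
```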
Special Cases of Linked Lists
• Queue:–
  – Items always added to tail
  – Items always removed from head
• Stack:–
  – Items always added to head
  – Items always removed from head
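Both special cases need only O(1) work per operation. Below is a minimal C sketch, assuming int payloads; the queue header type and the function names are hypothetical. The stack works entirely at the head. The queue also keeps a tail pointer so that adding at the tail does not require walking the list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct item { int payload; struct item *next; } item;   /* assumed layout */
typedef struct { item *head, *tail; } queue;                    /* hypothetical header */

/* Stack: add at the head. Returns -1 if malloc fails. */
int push(item **top, int payload)
{
    item *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    p->payload = payload;
    p->next = *top;
    *top = p;
    return 0;
}

/* Stack: remove from the head. Returns -1 if the stack is empty. */
int pop(item **top, int *payload)
{
    item *p = *top;
    if (p == NULL) return -1;
    *payload = p->payload;
    *top = p->next;
    free(p);
    return 0;
}

/* Queue: add at the tail. Returns -1 if malloc fails. */
int enqueue(queue *q, int payload)
{
    assert(q != NULL);
    item *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    p->payload = payload;
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;          /* queue was empty */
    q->tail = p;
    return 0;
}

/* Queue: remove from the head. Returns -1 if the queue is empty. */
int dequeue(queue *q, int *payload)
{
    item *p = q->head;
    if (p == NULL) return -1;
    *payload = p->payload;
    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;       /* queue is now empty */
    free(p);
    return 0;
}
```

Pushing 1, 2, 3 pops them back as 3, 2, 1. Enqueuing 1, 2, 3 dequeues them as 1, 2, 3.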
Bubble Sort a Linked List
item *BubbleSort(item *p) {
    if (p->next != NULL) {
        item *q = p->next, *qq = p;
        for (; q != NULL; qq = q, q = q->next)
            if (p->payload > q->payload) {
                item *temp = p->next;
                p->next = q->next;
                q->next = temp;
                qq->next = p;
                p = q;
            }
        p->next = BubbleSort(p->next);
    }
    return p;
}
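Tracing the slide's BubbleSort when q is p's immediate successor (qq == p) shows that the swap leaves q->next == q, so at least that case needs attention. For comparison, here is a sketch of a conventional bubble sort on a singly-linked list. It swaps adjacent nodes by relinking them through a pointer-to-pointer. The item layout (int payload) and the name bubbleSortList are assumptions.

```c
#include <stddef.h>

typedef struct item { int payload; struct item *next; } item;   /* assumed layout */

/* Each pass walks the list through pp, which points at the link (head or some
   next field) that leads to the current pair a, b. Swapping the pair then takes
   three pointer updates, even when a is the head. Passes repeat until one makes
   no swaps: O(n^2) comparisons in the worst case. */
item *bubbleSortList(item *head)
{
    int swapped;
    do {
        swapped = 0;
        for (item **pp = &head; *pp != NULL && (*pp)->next != NULL; pp = &(*pp)->next) {
            item *a = *pp, *b = a->next;
            if (a->payload > b->payload) {
                a->next = b->next;   /* a now points past b */
                b->next = a;         /* b comes before a */
                *pp = b;             /* the incoming link now leads to b */
                swapped = 1;
            }
        }
    } while (swapped);
    return head;   /* the head may have changed */
}
```

As with the slide's version, the caller must use the returned pointer as the new head.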
Potential Exam Questions
• Analyze BubbleSort to determine if it is correct, and fix it if incorrect.
• Hint: you need to define “correct”
• Hint 2: you need to define a loop invariant to convince yourself
• Draw a diagram showing the nodes, pointers, and actions of the algorithm
Observations:–
• What is the order of the Bubble Sort algorithm?
• Answer: O(n²)
• Note that Quicksort is faster
  – Pages 87 & 110 in Kernighan and Ritchie
• Potential exam question:– why?
Questions?
Binary Tree (review)
• A linked list but with two links per item
struct treeItem {
    type payload;
    struct treeItem *left;
    struct treeItem *right;
};
[Diagram: seven items linked as a binary tree, each holding a payload plus left and right pointers]
Binary Trees (continued)
• Two-dimensional data structure
• Easy to grow and shrink
• Easy to add and delete items at leaves
  – More work needed to insert or delete branch nodes
• Search time is O(log n) when the tree is kept in sorted order
  – If tree is reasonably balanced
  – Degenerates to O(n) in worst case if unbalanced
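The O(log n) versus O(n) contrast can be seen by building the same seven values into a sorted binary tree in two different orders. A minimal C sketch, assuming an int payload in place of the slide's "type" and the rule that smaller values go left; the names treeInsert, treeSearch, treeHeight and treeFree are hypothetical.

```c
#include <stdlib.h>

struct treeItem {
    int payload;                 /* assumed in place of the slide's "type" */
    struct treeItem *left;       /* smaller payloads */
    struct treeItem *right;      /* larger payloads */
};

/* Adds payload as a new leaf and returns the root (duplicates are ignored).
   If malloc fails, the value is simply not added. */
struct treeItem *treeInsert(struct treeItem *root, int payload)
{
    if (root == NULL) {
        struct treeItem *t = malloc(sizeof *t);
        if (t != NULL) {
            t->payload = payload;
            t->left = t->right = NULL;
        }
        return t;
    }
    if (payload < root->payload)
        root->left = treeInsert(root->left, payload);
    else if (payload > root->payload)
        root->right = treeInsert(root->right, payload);
    return root;
}

/* Each comparison discards one sub-tree, so the number of steps is bounded by
   the height: about log2(n) if balanced, up to n if degenerate. */
struct treeItem *treeSearch(struct treeItem *root, int payload)
{
    while (root != NULL && root->payload != payload)
        root = (payload < root->payload) ? root->left : root->right;
    return root;
}

/* Number of levels on the longest root-to-leaf path (0 for an empty tree). */
int treeHeight(const struct treeItem *root)
{
    if (root == NULL)
        return 0;
    int l = treeHeight(root->left), r = treeHeight(root->right);
    return 1 + (l > r ? l : r);
}

/* Frees both sub-trees before the node itself. */
void treeFree(struct treeItem *root)
{
    if (root == NULL)
        return;
    treeFree(root->left);
    treeFree(root->right);
    free(root);
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 gives a balanced tree of height 3. Inserting 1 through 7 in order gives a chain of height 7, which is the degenerate O(n) case.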
Order of Traversing Binary Trees
• In-order• Traverse left sub-tree (in-order)• Visit node itself• Traverse right sub-tree (in-order)
• Pre-order• Visit node first• Traverse left sub-tree• Traverse right sub-tree
• Post-order• Traverse left sub-tree• Traverse right sub-tree• Visit node last
Homework #5
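The three orders differ only in where the visit happens relative to the two recursive calls. In this C sketch, "visiting" a node is assumed to mean appending its payload to an array, and payloads are assumed to be ints. The function names are hypothetical.

```c
#include <stddef.h>

struct treeItem { int payload; struct treeItem *left, *right; };   /* int payload assumed */

/* "Visit" = append the node's payload to out[*n]. */
static void visit(const struct treeItem *t, int out[], int *n)
{
    out[(*n)++] = t->payload;
}

void inOrder(const struct treeItem *t, int out[], int *n)
{
    if (t == NULL) return;
    inOrder(t->left, out, n);     /* left sub-tree */
    visit(t, out, n);             /* node itself */
    inOrder(t->right, out, n);    /* right sub-tree */
}

void preOrder(const struct treeItem *t, int out[], int *n)
{
    if (t == NULL) return;
    visit(t, out, n);             /* node first */
    preOrder(t->left, out, n);
    preOrder(t->right, out, n);
}

void postOrder(const struct treeItem *t, int out[], int *n)
{
    if (t == NULL) return;
    postOrder(t->left, out, n);
    postOrder(t->right, out, n);
    visit(t, out, n);             /* node last */
}
```

For a root 2 with children 1 (left) and 3 (right), the orders are 1 2 3 (in-order), 2 1 3 (pre-order) and 1 3 2 (post-order).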
Example of Binary Tree
x = (a.real*b.imag - b.real*a.imag) / sqrt(a.real*b.real - a.imag*b.imag)
Expression tree (each node's operands are indented beneath it):
=
    x
    /
        -
            *
                .  (a, real)
                .  (b, imag)
            *
                .  (b, real)
                .  (a, imag)
        sqrt
            -
                …
Question
• What kind of traversal order is required for this expression?
• In-order?
• Pre-order?
• Post-order?
Binary Trees in Compilers
• Used to represent the structure of the compiled program
• Optimizations
  – Common sub-expression detection
  – Code simplification
  – Loop unrolling
  – Parallelization
  – Reductions in strength, e.g., substituting additions for multiplications
  – Many others
Questions about Trees?
(or about Homework 5?)
New Challenge
• What if we have a data structure that needs to be accessed by value in constant time?
• I.e., O(log n) is not good enough!
• Need to be able to add or delete items
• Total number of items unknown
  – But an approximate maximum might be known
Examples
• Anti-virus scanner
• Symbol table of compiler
• Virtual memory tables in operating system
• Bank account for an individual
Observation
• Arrays provide constant time access …
• … but you have to know which element you want!
• Also
  – Not easy to grow or shrink
  – Not open-ended
• Can we do better?
Answer – Hash Table
• Definition:– Hash Table
  – A data structure comprising an array (for constant time access)
  – A set of linked lists (one for each array element)
  – A hashing function to convert a value to an array index
• Definition:– Hashing function (or simply hash function)
  – A function that takes the value in question and “randomizes” it to produce an index
  – So that non-randomness of the values does not concentrate too many elements around a few indices in the array
• See §6.6 in Kernighan & Ritchie
Hash Table Structure
[Figure: an array of item pointers; each array element heads a linked list of nodes, each with data and next fields]
Guidelines for Hash Tables
• Lists from each item should be short• I.e., with short search time (approximately constant)
• Size of array should be based on expected # of entries
• Err on large side if possible
• Hashing function• Should “spread out” the values relatively uniformly
• Multiplication and division by prime numbers usually works well
Example Hashing Function
• P. 144 of K & R
#define HASHSIZE 101
unsigned int hash(char *s)
{
    unsigned int hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}
• Note the choice of prime numbers (31 and 101) to “mix it up”
Using a Hash Table
struct item *lookup(char *s)
{
    struct item *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;   /* found */
    return NULL;         /* not found */
}
• Hash table is indexed by hash value of s
• Traverse the linked list to find item s
Using a Hash Table (continued)
struct item *addItem(char *s, …)
{
    struct item *np;
    unsigned int hv;
    if ((np = lookup(s)) == NULL) {    /* not present yet */
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;               /* out of memory */
        /* fill in s and data */
        np->next = hashtab[hv = hash(s)];
        hashtab[hv] = np;
    }
    return np;
}
• Inserts new item at head of the list indexed by hash value
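Putting the pieces together, here is a minimal runnable hash table under stated assumptions. Each item stores only its key string in data. The slide's extra addItem parameters ("…") are left out, and the table is a file-scope array hashtab of HASHSIZE list heads. The key is copied with malloc and strcpy, so the table does not depend on the caller's buffer.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

/* Assumed item layout: the key string plus the chain pointer. */
struct item {
    char *data;
    struct item *next;
};

static struct item *hashtab[HASHSIZE];   /* one list head per bucket, initially all NULL */

static unsigned int hash(const char *s)
{
    unsigned int hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Walks only the one bucket that s hashes to. */
struct item *lookup(const char *s)
{
    for (struct item *np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;            /* found */
    return NULL;                  /* not found */
}

/* Returns the existing item for s, or adds a new one at the head of its
   bucket's list. Returns NULL if memory runs out. */
struct item *addItem(const char *s)
{
    struct item *np = lookup(s);
    if (np == NULL) {
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        np->data = malloc(strlen(s) + 1);
        if (np->data == NULL) {
            free(np);
            return NULL;
        }
        strcpy(np->data, s);
        unsigned int hv = hash(s);
        np->next = hashtab[hv];   /* new item goes in front of the old list */
        hashtab[hv] = np;
    }
    return np;
}
```

Adding the same key twice returns the same item, so addItem never creates duplicates.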
Hash Table Summary
• Widely used for constant time access
• Easy to build and maintain
• There is both an art and a science to the choice of hashing functions
• Consult textbooks, web, etc.
Questions?