Hash Tables CS-2301, B-Term 2009 1
Hash Tables and Constant Access Time
CS-2301, System Programming for Non-Majors
(Slides include materials from The C Programming Language, 2nd edition, by Kernighan and Ritchie and from C: How to Program, 5th and 6th editions, by Deitel and Deitel)
New Challenge
• What if we require a data structure that has to be accessed by value in constant time?
• I.e., O(log n) is not good enough!
• Need to be able to add or delete items
• Total number of items unknown
  • But an approximate maximum might be known
Examples
• Anti-virus scanner
• Symbol table of compiler
• Virtual memory tables in operating system
• Bank or credit card account for a person
Example – Validate a Credit Card
• 16-digit credit card numbers
  • 10^16 possible card numbers
• Sparsely populated space
  • E.g., 10^8 MasterCard holders, similar for Visa
• Not “random” enough for a binary tree
  • Too many single branches → really deep searches
• Need to respond to customer in 1-2 seconds
  • 1000s or tens of 1000s of customers per second!
Same is true for
  • ATM card numbers
  • Bank account numbers
  • Etc.
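The "single branches" problem can be sketched as follows. This is a minimal illustration, not code from the course: `struct tnode`, `tree_insert`, and `tree_depth` are hypothetical names, and the tree is a plain unbalanced binary search tree. Keys that arrive in sorted order (as sequentially issued account numbers might) all go to the right, so the tree is as deep as it is large.

```c
#include <stdlib.h>

/* Hypothetical node type, used only for this illustration. */
struct tnode {
    long key;
    struct tnode *left, *right;
};

/* Insert key into an unbalanced binary search tree.
 * Returns the root of the (possibly new) subtree. */
struct tnode *tree_insert(struct tnode *root, long key) {
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;                 /* out of memory */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key);
    else if (key > root->key)
        root->right = tree_insert(root->right, key);
    return root;                          /* duplicates are ignored */
}

/* Depth = number of nodes on the longest path from root to a leaf. */
int tree_depth(const struct tnode *root) {
    if (root == NULL)
        return 0;
    int l = tree_depth(root->left);
    int r = tree_depth(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the keys 1, 2, ..., 100 in that order gives a tree of depth 100, so a search degrades to O(n) instead of O(log n).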
Example — Anti-Virus Scanner
• Look at each sequence of bytes in a file
• See if it matches against library of virus patterns
• If so, flag it as a possible problem
• How many possible patterns? Tens of thousands!
Anti-Virus Scanner (continued)
• Time to scan a file?
  • O(length) × O(# of patterns)
• Can we do better?
  • Store patterns in a tree
  • O(length) × O(log (# of patterns))
• Can we do even better?
  • Yes: a hash table. Today’s topic.
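One way the hash-table version of the scan might look is sketched below. It rests on several assumptions that are not in the slides: every pattern is exactly `PATLEN` bytes long, patterns live in a small fixed pool, and the names `add_pattern`, `scan`, and `hash_bytes` are made up for this example. Real scanners are considerably more elaborate.

```c
#include <stddef.h>
#include <string.h>

#define PATLEN   4      /* assumption: every pattern is exactly 4 bytes */
#define NBUCKETS 101    /* assumption: small prime number of lists */
#define MAXPAT   64     /* assumption: capacity of this sketch */

struct pattern {
    unsigned char bytes[PATLEN];
    struct pattern *next;
};

static struct pattern pool[MAXPAT];
static size_t npool = 0;
static struct pattern *buckets[NBUCKETS];

/* Hash PATLEN bytes to a list index. */
static unsigned hash_bytes(const unsigned char *p) {
    unsigned h = 0;
    for (size_t i = 0; i < PATLEN; i++)
        h = p[i] + 31 * h;
    return h % NBUCKETS;
}

/* Add a pattern to the library. Returns 1 on success, 0 if the pool is full. */
int add_pattern(const unsigned char *p) {
    if (npool == MAXPAT)
        return 0;
    struct pattern *e = &pool[npool++];
    memcpy(e->bytes, p, PATLEN);
    unsigned h = hash_bytes(p);
    e->next = buckets[h];
    buckets[h] = e;
    return 1;
}

/* Scan a file image; return the offset of the first match, or -1. */
long scan(const unsigned char *file, size_t len) {
    for (size_t i = 0; i + PATLEN <= len; i++) {
        const unsigned char *window = file + i;
        for (struct pattern *e = buckets[hash_bytes(window)]; e != NULL; e = e->next)
            if (memcmp(e->bytes, window, PATLEN) == 0)
                return (long)i;
    }
    return -1;
}
```

Each window of the file costs one hash plus a walk down one (ideally short) list, so the scan is roughly O(length) regardless of how many patterns are stored.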
Requirement
• In these applications (and many like them), need constant time access
• I.e., O(1)
• Need to access by value!
Observation
• Arrays provide constant time access …
• … but you have to know which element you want!
  • We only know the contents of the item we want!
• Also
  • Not easy to grow or shrink
  • Not open-ended
• Can we do better?
Definition – Hash Table
• A data structure comprising an array
  • for constant time access
• A set of linked lists
  • one list for each array element
• A hashing function (also known as a hash function) to convert search key to array index
  • a randomizing function to assure uniform distribution of values across array indices
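In C, the array and the lists might be declared roughly as follows. This is a sketch: the names `struct item`, `data`, `next`, `hashtab`, and `HASHSIZE` are the ones used in the code on later slides, but the exact declarations are an assumption, since the slides do not show them.

```c
#include <stddef.h>

#define HASHSIZE 101                 /* number of array elements */

/* One node of a linked list. */
struct item {
    char *data;                      /* the search key (part of the payload) */
    struct item *next;               /* next item in the same list, or NULL */
};

/* The array: one list head per element.
 * Static storage is zero-initialized, so every list starts out empty (NULL). */
static struct item *hashtab[HASHSIZE];
```

The hashing function is the third ingredient: it maps a search key to an index in the range 0 to HASHSIZE - 1.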
Definition – Search Key
• A value stored as (part of) the payload of the item you are looking for
• E.g., your credit card number
• Your account number at Amazon
• A pattern characteristic of a virus
• Need to find the item containing that value (i.e., that key)
Definition – Hash Function
• A function that randomizes the search key to produce an index into the array
  • So that non-random keys don’t concentrate around a subset of the indices in the array
• Always returns the same value for the same key
• See §6.6 in Kernighan & Ritchie
Hash Table Structure
[Figure: "The array" of item entries across the top; below it, "The lists": each array element heads a linked list of nodes, each with data and next fields.]
Hash Table Structure (continued)
[Figure: the array of item entries, each heading a linked list of nodes with data and next fields.]
• Note that some of the lists are empty
• Average length of list should be in single digits
Guidelines for Hash Tables
• Lists from each item should be short
  • I.e., with short search time (approximately constant)
• Size of array should be based on expected # of entries
  • Err on large side if possible
• Hashing function
  • Should “spread out” the values relatively uniformly
  • Multiplication and division by prime numbers usually works well
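These guidelines can be checked with a little arithmetic: the average list length is the number of entries divided by the array size, and the spread can be measured by counting how many keys land on each index. The sketch below assumes the K&R-style hash with an array size of 101; `average_list_length` and `longest_list` are hypothetical helper names.

```c
#include <stddef.h>

#define HASHSIZE 101    /* assumption: same array size as the K&R example */

/* K&R-style string hash (assumed here; see p. 144 of K & R). */
static unsigned hash(const char *s) {
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Average list length for a given number of entries and array size. */
double average_list_length(size_t n_items, size_t n_buckets) {
    return (double)n_items / (double)n_buckets;
}

/* Count how many of the n keys fall on each array index.
 * Fills counts[] and returns the length of the longest list. */
size_t longest_list(const char *keys[], size_t n, size_t counts[HASHSIZE]) {
    size_t longest = 0;
    for (size_t i = 0; i < HASHSIZE; i++)
        counts[i] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t c = ++counts[hash(keys[i])];
        if (c > longest)
            longest = c;
    }
    return longest;
}
```

For instance, 505 entries in an array of 101 elements give an average list length of 5, which is within the single-digit target. If the longest list is far above the average for realistic keys, the hashing function is probably not spreading values well.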
Example Hashing Function
• P. 144 of K & R

#define HASHSIZE 101

/* hash: form hash value for string s */
unsigned int hash(char *s) {
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

Note the prime numbers (31 and HASHSIZE = 101) used to “mix it up”
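A few values worked by hand make the arithmetic concrete. The sketch below is a slight variation on the function above, not the K&R text itself: it takes a `const char *` and casts each character to `unsigned char` so that characters above 127 cannot contribute negative values. The sample results assume ASCII.

```c
#define HASHSIZE 101

/* Variation on the K&R hash: const-correct, and each byte is read as unsigned. */
unsigned int hash(const char *s) {
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;   /* multiply by a prime, add next byte */
    return hashval % HASHSIZE;                        /* reduce to an index 0..100 */
}
```

With ASCII, `hash("a")` is 97 % 101 = 97, and `hash("ab")` is (98 + 31 × 97) % 101 = 3105 % 101 = 75. Different keys can still land on the same index: `hash("e")` and `hash("Aj")` are both 0, which is why each array element heads a list.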
Using a Hash Table
struct item *lookup(char *s) {
    struct item *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;    /* found */
    return NULL;          /* not found */
}
Hash table is indexed by hash value of s
Traverse the linked list to find item s
Using a Hash Table (continued)
struct item *addItem(char *s, …) {
    struct item *np;
    unsigned int hv;

    if ((np = lookup(s)) == NULL) {    /* not already in the table */
        np = malloc(sizeof(struct item));
        if (np == NULL)
            return NULL;               /* out of memory */
        /* fill in s and data */
        np->next = hashtab[hv = hash(s)];
        hashtab[hv] = np;
    }
    return np;
}
Inserts new item at head of the list indexed by hash value
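Putting the pieces together, a small self-contained table might look like the sketch below. It makes several assumptions beyond the slides: the item stores only the key string (the elided parameters of `addItem` are left out), the key is copied so the table owns its memory, and `add_item` and `remove_item` are hypothetical names. The slides ask for deletion but do not show it, so `remove_item` is one possible approach rather than the course's version.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct item {
    char *data;                  /* copy of the key; owned by the table */
    struct item *next;
};

static struct item *hashtab[HASHSIZE];

static unsigned hash(const char *s) {
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Find s; returns the item, or NULL if absent. */
struct item *lookup(const char *s) {
    for (struct item *np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;
    return NULL;
}

/* Add s if absent. Returns the item, or NULL if out of memory. */
struct item *add_item(const char *s) {
    struct item *np = lookup(s);
    if (np != NULL)
        return np;               /* already present */
    np = malloc(sizeof *np);
    if (np == NULL)
        return NULL;
    np->data = malloc(strlen(s) + 1);
    if (np->data == NULL) {
        free(np);
        return NULL;
    }
    strcpy(np->data, s);
    unsigned hv = hash(s);
    np->next = hashtab[hv];      /* insert at head of the list */
    hashtab[hv] = np;
    return np;
}

/* Hypothetical deletion: unlink and free s. Returns 1 if removed, 0 if absent. */
int remove_item(const char *s) {
    for (struct item **pp = &hashtab[hash(s)]; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp(s, (*pp)->data) == 0) {
            struct item *dead = *pp;
            *pp = dead->next;    /* bypass the node being removed */
            free(dead->data);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

Walking the list through a pointer-to-pointer (`pp`) lets `remove_item` unlink the head of a list and an interior node with the same code, with no special case for the first element.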
Challenge
• In what kinds of situations in your field might you need a hash table?
Hash Table Summary
• Widely used for constant time access
• Easy to build and maintain
• There is an art and science regarding the choice of hashing functions
• Consult textbooks, web, etc.
Questions?