Hashing
CSE 331, Section 2
James Daly
Reminders
• Homework 3 is out
  • Due Thursday in class
• Spring Break is next week
• Homework 4 is out
  • Due after Spring Break
Review: Sets
• Containers for determining membership in a group
• Elements are unique
• Two main types
  • Ordered tree sets
  • Unordered hash sets

| Language | Ordered | Unordered |
|----------|---------|-----------|
| C++ | set | unordered_set |
| Java | TreeSet | HashSet |
| C# | SortedSet | HashSet |
Review: Set Operations
• Add / Insert
• Remove / Delete
• Exists / Find
• Size / IsEmpty
• Iterator
• Clear / RemoveAll
• Sometimes Union (AddAll) / Intersection (RetainAll)
Direct Addressing Table
• An element with key k is stored in slot k
  • Search(T, k) = O(1)
  • Insertion(T, k) = O(1)
  • Deletion(T, k) = O(1)
• Problem: number of keys can be large ($2^{32}$)
[Diagram: table T holding the keys 1, 2, 6, 7, 9]
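A minimal sketch of a direct addressing table, assuming small non-negative integer keys. The class name `DirectAddressTable` and the parameter `universe_size` are hypothetical, chosen only for illustration.

```python
class DirectAddressTable:
    """Direct addressing: the key itself is the slot index."""

    def __init__(self, universe_size):
        # One slot per possible key, so memory grows with the
        # universe of keys, not with the number of stored keys.
        self.slots = [False] * universe_size

    def insert(self, k):
        self.slots[k] = True       # O(1)

    def delete(self, k):
        self.slots[k] = False      # O(1)

    def search(self, k):
        return self.slots[k]       # O(1)


T = DirectAddressTable(universe_size=10)
for key in (1, 2, 6, 7, 9):
    T.insert(key)
```

With 32-bit keys the same design would need about four billion slots even to store a handful of elements, which is the problem hashing addresses.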
Hashing
• Store an element with key k in slot h(k)
• h(k) maps the universe U of keys into slots of a hash table
• Example
  • T with slots [0, 1, …, m – 1]
  • h: U → {0, 1, …, m – 1}
• Key → O(1) hash → address
Diagram
[Diagram: h(k) maps the actual keys, a subset of the universe U, to slots 0, 1, 2, …, m – 1 of table T]
Example
• Students with unique IDs
  • A: 10001
  • B: 10002
  • C: 10003
• h(s) = s.id % 10
Problem
• What if several keys hash to the same value?
• Several solutions
  • Knock out the old
  • Discard the new
  • Chaining (keep a list)
  • Probing (try another location)
Chaining
[Diagram: table T with slots 0, 1, 2, …, m – 1; the colliding elements A, B, C are kept in a list attached to their slot]
Chained Hash
• Insert(T, k)
  • Insert k into the list T[h(k)]: O(1)
• Search(T, k)
  • Search for an element with key k in list T[h(k)]
  • O(|T[h(k)]|): the size of the chain at h(k)
• Deletion
  • Delete element with key k in list T[h(k)]: O(|T[h(k)]|)
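The three operations can be sketched as a small chained hash set. This is an illustration under stated assumptions, not a definitive implementation: the class name `ChainedHashSet` is hypothetical, keys are integers, and h(k) = k mod m. Because a set keeps elements unique, this insert scans the chain first, so it costs O(chain length) rather than the O(1) of a blind insert.

```python
class ChainedHashSet:
    """Hash set with chaining: each slot holds a list of keys."""

    def __init__(self, m=10):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m

    def insert(self, k):
        chain = self.table[self._h(k)]
        if k not in chain:          # uniqueness check scans the chain
            chain.append(k)

    def search(self, k):
        # Cost is proportional to the chain length at h(k).
        return k in self.table[self._h(k)]

    def delete(self, k):
        chain = self.table[self._h(k)]
        if k in chain:
            chain.remove(k)


ids = ChainedHashSet(m=10)
for student_id in (10001, 10002, 10003, 10011):
    ids.insert(student_id)          # 10001 and 10011 share slot 1
```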
Chained Hash
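[Diagram: two chained tables T with slots 0, 1, 2, …, m – 1. Bad: lots of stuff chained in one slot. Good: the elements are spread across the slots.]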
Analysis
• Assumption: simple uniform hashing
  • Each key is equally likely to be hashed to any slot
  • Independent of the other keys
• Load Factor: average number of keys per slot
  • α = n / m
• Expected search cost:
  • Θ(1 + α): hash cost + search through the list
  • Θ(1) if α = O(1)
Analysis
• Load factor is more important than the table size!
Birthday Problem
• What is the probability that there will be no collisions?
• Approximately 45 people in the room
• Probability everyone has a different birthday?
• Load factor: 12.3%
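The question can be answered numerically. The sketch below assumes birthdays are uniform and independent over 365 days (the simple uniform hashing assumption); the function name `prob_no_collision` is hypothetical.

```python
def prob_no_collision(n, m):
    """Probability that n keys, hashed uniformly and independently
    into m slots, all land in distinct slots."""
    p = 1.0
    for i in range(n):
        # The (i+1)-th key must avoid the i slots already taken.
        p *= (m - i) / m
    return p


load_factor = 45 / 365
p = prob_no_collision(45, 365)
print(f"load factor = {load_factor:.1%}, P(no collision) = {p:.1%}")
```

Under these assumptions the probability of no collision comes out at roughly 6%, so collisions are very likely even at a load factor of about 12%.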
Hashing
• Two central problems
  • Design a good hash function
    • Distributes keys uniformly into the table
    • Regularity in the key distribution should not affect the uniformity
    • (e.g., even-numbered keys shouldn’t end up using only half the slots)
  • Resolve collisions
Hash Functions
• A good hash function:
  • Has equal probability of hashing a key in each slot
  • Must be fast
Sample Hash Function
• Hash function for integers
  • h(x) = x mod b
  • For some constant b
• Consider $b = 2^r$
  • $1011101_2 \bmod 2^3 = 101_2 = 5$
  • h(x) returns the last r bits
  • Not good! Too easy to game.
• Typically b is chosen to be prime
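A short sketch of why a power-of-two modulus is easy to game: keys that differ only in their high bits all collide. The four sample keys are made up for illustration, and 7 stands in for "some prime b".

```python
def h(x, b):
    """Modular hash for non-negative integers."""
    return x % b


x = 0b1011101                      # 93 in decimal
last_three_bits = h(x, 2**3)       # same result as x & 0b111

# Four keys that share their last three bits (101).
keys = [0b0000101, 0b0001101, 0b1011101, 0b1111101]
print([h(k, 8) for k in keys])     # [5, 5, 5, 5]
print([h(k, 7) for k in keys])     # [5, 6, 2, 6]
```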
String hash functions
• “pt” = <112, 116> (ASCII values)
• Sum of ASCII values
  • 112 + 116 = 228
  • Same as for “tp” (bad)
• Weighted sum
  • 112 * 1 + 116 * 2 = 344
  • Same as “rs”: 114 * 1 + 115 * 2 = 344
String hash functions
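• Geometric Series
  • $h(a_0 a_1 a_2 a_3) = a_0 b^0 + a_1 b^1 + a_2 b^2 + a_3 b^3$
• More generally
  • $h(a_0 a_1 \dots a_{n-1}) = \sum_{i=0}^{n-1} a_i b^i$
• Usually b is chosen to be prime
• Java uses this with b = 31 for String.hashCode() (with the powers in reverse order, so the first character gets the highest power)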
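The string hashes discussed on these slides can be compared side by side. This is a sketch: all four function names are hypothetical, and `java_style_hash` only approximates String.hashCode() (it applies Horner's rule with b = 31 and keeps an unsigned 32-bit value, whereas Java returns a signed int).

```python
def sum_hash(s):
    """Plain sum of character codes: ignores character order."""
    return sum(ord(c) for c in s)


def weighted_sum_hash(s):
    """Weights 1, 2, 3, ...: still collides easily."""
    return sum(ord(c) * (i + 1) for i, c in enumerate(s))


def poly_hash(s, b=31):
    """Geometric series: a_0 * b^0 + a_1 * b^1 + ..."""
    return sum(ord(c) * b**i for i, c in enumerate(s))


def java_style_hash(s, b=31):
    """Horner's rule; the first character gets the highest power."""
    h = 0
    for c in s:
        h = (h * b + ord(c)) & 0xFFFFFFFF   # keep 32 bits
    return h


print(sum_hash("pt"), sum_hash("tp"))                    # 228 228
print(weighted_sum_hash("pt"), weighted_sum_hash("rs"))  # 344 344
print(poly_hash("pt"), poly_hash("tp"), poly_hash("rs")) # 3708 3588 3679
```

The polynomial version separates all three strings that the simpler schemes confuse, though collisions remain possible for any hash with a bounded range.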
Other hash functions
• Lots of them!
  • Murmur Hash
  • Fowler-Noll-Vo
• Some have different purposes
  • Cryptographic (non-invertible)
    • Used to validate integrity of message
    • SHA-1
    • MD5
Open vs Closed Addressing
• Talked about Closed Addressing
  • Item always ends up in the same slot
  • Uses chaining or similar structure
• Open Addressing
  • Item may end up in different location
  • Probes alternate locations if the item isn’t found
Open Addressing
• No storage is used outside of the table itself
• Insertion systematically probes the table until an empty slot is found
• Hash function depends on both the key and the probe number
  • h : U × {0, 1, …, m – 1} → {0, 1, …, m – 1}
  • Probe sequence <h(k, 0), h(k, 1), …, h(k, m – 1)> should be a permutation of {0, 1, …, m – 1}
Linear Probing
• Given ordinary hash function h’(k),
  • h(k, i) = (h’(k) + i) mod m
• Example:
  • h’(k) = k mod 11
  • h(k, i) = ((k mod 11) + i) mod 11
Example
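Table T with slots 0–10 (m = 11), initially empty
• Insert 15:
  • h’(15) = 15 mod 11 = 4
  • h(15, 0) = 4, so 15 goes in slot 4
• Insert 4:
  • h’(4) = 4
  • h(4, 0) = 4 (occupied)
  • h(4, 1) = 5, so 4 goes in slot 5
• Insert 16:
  • h’(16) = 16 mod 11 = 5
  • h(16, 0) = 5 (occupied)
  • h(16, 1) = 5 + 1 = 6, so 16 goes in slot 6
• Primary Clustering: slots 4, 5, 6 now form one contiguous run (15, 4, 16)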
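The three insertions above can be reproduced with a short linear probing sketch, assuming m = 11 and h’(k) = k mod 11 as in the example. The names `M`, `table`, and `insert` are hypothetical.

```python
M = 11
table = [None] * M


def h(k, i):
    """Linear probing: start at k mod M, then step one slot at a time."""
    return ((k % M) + i) % M


def insert(k):
    """Place k in the first empty slot of its probe sequence."""
    for i in range(M):
        slot = h(k, i)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("table is full")


for key in (15, 4, 16):
    print(key, "->", insert(key))   # 15 -> 4, 4 -> 5, 16 -> 6
```

Every key whose probe sequence enters the run 4, 5, 6 extends that same run, which is what makes primary clustering grow.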
Double Hashing
• Given two ordinary hash functions h1(k) and h2(k)
• h(k, i) = (h1(k) + i * h2(k)) mod m
• h2(k) must be non-zero (and relatively prime to m for the probe sequence to reach every slot)
Example
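h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
[Diagram: table T with slots 0–12; entries recoverable from the slide include 79 in slot 1, 69 in slot 4, 72 in slot 7, and 50 in slot 11]
• Insert 14
  • h1(14) = 14 mod 13 = 1
  • h2(14) = 1 + (14 mod 11) = 4
  • h(14, 0) = 1 (occupied)
  • h(14, 1) = 1 + 4 = 5, so 14 goes in slot 5
• Delete 72
  • h1(72) = 72 mod 13 = 7
  • h(72, 0) = 7, so 72 is found in slot 7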
Example
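h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
[Diagram: table T with slots 0–12 after the previous operations; 79 in slot 1, 69 in slot 4, 14 in slot 5, 50 in slot 11, and slot 7 no longer holds 72]
• Delete 98
  • h1(98) = 7
  • h2(98) = 11
  • h(98, 0) = 7
  • h(98, 1) = 5
  • h(98, 2) = 3
• When can we stop?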
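One common answer to this question, offered here as an assumption since the slide leaves it open, is to mark deleted slots with a tombstone: a search skips tombstones and stops only at a slot that has never been used (or after m probes). The sketch below uses the slide's h1 and h2 with m = 13; the sentinel names `EMPTY` and `DELETED` and the function names are hypothetical.

```python
EMPTY, DELETED = object(), object()   # sentinel markers
M = 13
table = [EMPTY] * M


def h1(k):
    return k % 13


def h2(k):
    return 1 + (k % 11)


def probe(k, i):
    return (h1(k) + i * h2(k)) % M


def insert(k):
    """Assumes k is not already present; reuses tombstoned slots."""
    for i in range(M):
        s = probe(k, i)
        if table[s] is EMPTY or table[s] is DELETED:
            table[s] = k
            return s
    raise RuntimeError("table is full")


def find(k):
    for i in range(M):
        s = probe(k, i)
        if table[s] is EMPTY:          # never used: k cannot be further on
            return None
        if table[s] is not DELETED and table[s] == k:
            return s
    return None                        # probed all M slots


def delete(k):
    s = find(k)
    if s is not None:
        table[s] = DELETED             # leave a tombstone, not EMPTY
    return s


for k in (79, 69, 72, 50):
    insert(k)
slot_14 = insert(14)      # probes slot 1 (occupied), then slot 5
slot_72 = delete(72)      # slot 7 becomes a tombstone
result_98 = find(98)      # probes 7 (tombstone), 5 (holds 14), 3 (empty)
```

In this run the search for 98 stops at slot 3, the first never-used slot in its probe sequence. Had slot 7 simply been reset to empty, any key stored past slot 7 in a probe sequence would have become unreachable.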
Rehashing
• Efficiency degrades as load factor increases
  • Dependent on number of items and table size
• Need to increase the table size occasionally when adding items
• Need to move items
  • Requires slight adjustments to the hash function
  • Mod by new table size
Rehashing
• Rehashing requires Θ(n) time
• Don’t increase size by a fixed amount
  • Causes average time to also be Θ(n)
• Grow by a multiplicative factor instead (double)
  • Θ(n) once every Θ(n) inserts
  • Amortized Θ(1) time per insert
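The difference between the two growth policies can be checked by counting how many element moves rehashing causes. This is a simplified sketch: it assumes a rehash happens only when the table is completely full and starts from capacity 1; the function name `total_moves` and the increment of 10 are hypothetical.

```python
def total_moves(n, grow):
    """Count element moves caused by rehashing while inserting n items.

    `grow` maps the current capacity to the next capacity.
    """
    capacity, size, moves = 1, 0, 0
    for _ in range(n):
        if size == capacity:        # table full: move every element
            moves += size
            capacity = grow(capacity)
        size += 1
    return moves


n = 10_000
doubling = total_moves(n, lambda c: 2 * c)    # 1 + 2 + 4 + ... + 8192
fixed = total_moves(n, lambda c: c + 10)      # rehash every 10 inserts
print(doubling, fixed)                        # 16383 4996000
```

With doubling the total number of moves stays below 2n, so the cost per insert averages out to a constant. With a fixed increment the total grows quadratically in n.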
Rehashing
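[Diagram: a small table with slots 0–6 holding the keys 7, 9, 11, 15, 23, 73, 86 is rehashed into a table with slots 0–12: 15 in slot 2, 7 in slot 7, 73 and 86 in slot 8, 9 in slot 9, 23 in slot 10, 11 in slot 11]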
Applications – Pattern Matching
• For a given string, sub, test whether it is a substring of another (larger) string S
• Example: Sub = “ACGT”
• |S| = n, |Sub| = m
• Cost = O(m n)
[Diagram: Sub = “ACGT” aligned against a window of S]
RabinKarp(s[1..n], sub[1..m])
    hsub ← hash(sub[1..m])
    For i = 1 to n – m + 1
        hs ← hash(s[i..i+m-1])
        If hs = hsub
            If s[i..i+m-1] = sub
                Return i
    Return not found

String comparison is still needed: h(s1) = h(s2) does not mean s1 = s2
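The pseudocode recomputes the window hash from scratch at every position. A usual refinement, sketched here as an assumption rather than as the slides' own method, is a rolling polynomial hash that updates the window hash in O(1). The base `b = 256` and modulus `q = 1_000_003` are illustrative choices, indices are 0-based, and -1 means "not found".

```python
def rabin_karp(s, sub, b=256, q=1_000_003):
    """Return the first index where sub occurs in s, or -1."""
    n, m = len(s), len(sub)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(b, m - 1, q)                  # weight of the leading character
    hsub = hs = 0
    for j in range(m):                       # hash sub and the first window
        hsub = (hsub * b + ord(sub[j])) % q
        hs = (hs * b + ord(s[j])) % q
    for i in range(n - m + 1):
        # Equal hashes only suggest a match, so confirm by comparing.
        if hs == hsub and s[i:i + m] == sub:
            return i
        if i < n - m:                        # roll: drop s[i], append s[i + m]
            hs = ((hs - ord(s[i]) * high) * b + ord(s[i + m])) % q
    return -1


print(rabin_karp("TTACGTACGTS", "ACGT"))     # 2
```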
Bloom-Filter
• Set membership detection
  • Space-efficient data structure used to test for membership in a set
  • Uses several hash functions and a bitset
  • Every bit hi(k) must be set for k to be reported as in the set
• Allows false positives, but not false negatives
  • Probably “yes”, definitely “no”
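A minimal Bloom filter sketch, under stated assumptions: the class name `BloomFilter`, the sizes (64 bits, 3 hash functions), and the trick of deriving the hash functions by salting a SHA-256 digest with the index i are all illustrative choices, not something the slides specify.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Simulate several hash functions by salting one digest with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "probably yes"; False means "definitely no".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for item in ("X", "Y", "Z"):
    bf.add(item)
```

An item that was added always reports True because its bits are never cleared. An item that was never added can still report True if other items happened to set all of its bits, which is the false positive case.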
Bloom-Filter
[Diagram: a bitset with six bits set to 1 after adding the elements {X, Y, Z}]
Map / Dictionary
• Abstract data type representing a partial function
• Relates keys to values
  • Keys are unique
  • Values might not be
• Two main types (like sets)
  • Ordered tree maps
  • Unordered hash maps

| Language | Ordered | Unordered |
|----------|---------|-----------|
| C++ | map | unordered_map |
| Java | TreeMap | HashMap |
| C# | SortedDictionary | Dictionary |
Map / Dictionary
[Diagram: a set of keys mapped to a set of values]
Map / Dictionary Methods
• Insert / Put: inserts tuple
• Get / At: gets value from key
  • Often indexer (operator[]) to do both put and get
• Remove / Delete: removes tuple by key
• KeyExists
• Iterator
• Size / IsEmpty
• Clear
TreeMap
Key: Name, Value: Cell #

| Name | Cell # |
|------|--------|
| John | 555-3612 |
| Jacob | 555-3147 |
| Mary | 555-1243 |
| Mathew | 555-2179 |
| Luke | 555-7293 |
| Mark | 555-3479 |
| Sarah | 555-5394 |

Lookup: Mary? → 555-1243
Hash Map
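| Slot | Entry |
|------|-------|
| 0 | Sarah, 555-5394 |
| 1 | |
| 2 | John, 555-3612 |
| 3 | Jacob, 555-3147 |
| 4 | |
| 5 | Mary, 555-1243 |
| 6 | Mathew, 555-2179 |
| 7 | Mark, 555-3479 |
| 8 | |
| 9 | Luke, 555-7293 |

Lookup: Mary? h(Mary) = 5 → 555-1243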