25 string matching
TRANSCRIPT
![Page 1: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/1.jpg)
Analysis of AlgorithmsString matching
Andres Mendez-Vazquez
November 24, 2015
1 / 40
![Page 2: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/2.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
2 / 40
![Page 3: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/3.jpg)
What is string matching?
Given two sequences of characters drawn from a finite alphabet Σ,T [1..n] and P [1..m]
a b c a b a a b c a b a c a b
a b a a
TEXT T
PATTERN Ps=3
Valid Shift
3 / 40
![Page 4: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/4.jpg)
Where...
A Valid ShiftP occurs with a valid shift s if for some 0 ≤ s ≤ n −m =⇒T [s + 1..s + m] == P [1..m].
Otherwiseit is an invalid shift.
ThusThe string-matching problem is the problem of of finding all valid shiftsgiven a patten P on a text T .
4 / 40
![Page 5: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/5.jpg)
Where...
A Valid ShiftP occurs with a valid shift s if for some 0 ≤ s ≤ n −m =⇒T [s + 1..s + m] == P [1..m].
Otherwiseit is an invalid shift.
ThusThe string-matching problem is the problem of of finding all valid shiftsgiven a patten P on a text T .
4 / 40
![Page 6: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/6.jpg)
Where...
A Valid ShiftP occurs with a valid shift s if for some 0 ≤ s ≤ n −m =⇒T [s + 1..s + m] == P [1..m].
Otherwiseit is an invalid shift.
ThusThe string-matching problem is the problem of of finding all valid shiftsgiven a patten P on a text T .
4 / 40
![Page 7: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/7.jpg)
Possible Algorithms
We have the following onesAlgorithm Preprocessing Time Worst Case Matching TimeNaive 0 O ((n −m + 1) m)
Rabin-Karp Θ (m) O ((n −m + 1) m)Finite Automaton O (m |Σ|) Θ (n)Knuth-Morris-Pratt Θ (m) Θ (n)
5 / 40
![Page 8: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/8.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
6 / 40
![Page 9: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/9.jpg)
Notation and Terminology
DefinitionWe denote by Σ∗ (read “sigma-star”) the set of all finite length stringsformed using characters from the alphabet Σ.
ConstraintWe assume a finite length strings.
Some basic conceptsThe zero empty string, ε, also belong to Σ∗.The length of a string x is denoted |x|.The concatenation of two strings x and y, denoted xy, has length|x|+ |y|
7 / 40
![Page 10: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/10.jpg)
Notation and Terminology
DefinitionWe denote by Σ∗ (read “sigma-star”) the set of all finite length stringsformed using characters from the alphabet Σ.
ConstraintWe assume a finite length strings.
Some basic conceptsThe zero empty string, ε, also belong to Σ∗.The length of a string x is denoted |x|.The concatenation of two strings x and y, denoted xy, has length|x|+ |y|
7 / 40
![Page 11: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/11.jpg)
Notation and Terminology
DefinitionWe denote by Σ∗ (read “sigma-star”) the set of all finite length stringsformed using characters from the alphabet Σ.
ConstraintWe assume a finite length strings.
Some basic conceptsThe zero empty string, ε, also belong to Σ∗.The length of a string x is denoted |x|.The concatenation of two strings x and y, denoted xy, has length|x|+ |y|
7 / 40
![Page 12: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/12.jpg)
Notation and Terminology
DefinitionWe denote by Σ∗ (read “sigma-star”) the set of all finite length stringsformed using characters from the alphabet Σ.
ConstraintWe assume a finite length strings.
Some basic conceptsThe zero empty string, ε, also belong to Σ∗.The length of a string x is denoted |x|.The concatenation of two strings x and y, denoted xy, has length|x|+ |y|
7 / 40
![Page 13: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/13.jpg)
Notation and Terminology
DefinitionWe denote by Σ∗ (read “sigma-star”) the set of all finite length stringsformed using characters from the alphabet Σ.
ConstraintWe assume a finite length strings.
Some basic conceptsThe zero empty string, ε, also belong to Σ∗.The length of a string x is denoted |x|.The concatenation of two strings x and y, denoted xy, has length|x|+ |y|
7 / 40
![Page 14: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/14.jpg)
Notation and Terminology
PrefixA string w is a prefix x, w @ x if x = wy for some string y ∈ Σ∗.
SuffixA string w is a suffix x, w A x if x = yw for some string y ∈ Σ∗.
PropertiesPrefix: If w @ x ⇒ |w| ≤ |x|Suffix: If w A x ⇒ |w| ≤ |x|The ε is both suffix and prefix of every string.
8 / 40
![Page 15: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/15.jpg)
Notation and Terminology
PrefixA string w is a prefix x, w @ x if x = wy for some string y ∈ Σ∗.
SuffixA string w is a suffix x, w A x if x = yw for some string y ∈ Σ∗.
PropertiesPrefix: If w @ x ⇒ |w| ≤ |x|Suffix: If w A x ⇒ |w| ≤ |x|The ε is both suffix and prefix of every string.
8 / 40
![Page 16: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/16.jpg)
Notation and Terminology
PrefixA string w is a prefix x, w @ x if x = wy for some string y ∈ Σ∗.
SuffixA string w is a suffix x, w A x if x = yw for some string y ∈ Σ∗.
PropertiesPrefix: If w @ x ⇒ |w| ≤ |x|Suffix: If w A x ⇒ |w| ≤ |x|The ε is both suffix and prefix of every string.
8 / 40
![Page 17: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/17.jpg)
Notation and Terminology
PrefixA string w is a prefix x, w @ x if x = wy for some string y ∈ Σ∗.
SuffixA string w is a suffix x, w A x if x = yw for some string y ∈ Σ∗.
PropertiesPrefix: If w @ x ⇒ |w| ≤ |x|Suffix: If w A x ⇒ |w| ≤ |x|The ε is both suffix and prefix of every string.
8 / 40
![Page 18: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/18.jpg)
Notation and Terminology
PrefixA string w is a prefix x, w @ x if x = wy for some string y ∈ Σ∗.
SuffixA string w is a suffix x, w A x if x = yw for some string y ∈ Σ∗.
PropertiesPrefix: If w @ x ⇒ |w| ≤ |x|Suffix: If w A x ⇒ |w| ≤ |x|The ε is both suffix and prefix of every string.
8 / 40
![Page 19: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/19.jpg)
Notation and Terminology
In additionFor any string x and y and any character a, we have w A x if andonly if aw A axIn addition, @ and A are transitive relations.
9 / 40
![Page 20: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/20.jpg)
Notation and Terminology
In additionFor any string x and y and any character a, we have w A x if andonly if aw A axIn addition, @ and A are transitive relations.
9 / 40
![Page 21: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/21.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
10 / 40
![Page 22: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/22.jpg)
The naive string algorithm
AlgorithmNAIVE-STRING-MATCHER(T ,P)
1 n = T .length2 m = P.length3 for s = 0 to n −m4 if P [1..m] == T [s + 1..s + m]5 print “Pattern occurs with shift” s
ComplexityO((n −m + 1)m) or Θ
(n2)
if m =⌊n
2⌋
11 / 40
![Page 23: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/23.jpg)
The naive string algorithm
AlgorithmNAIVE-STRING-MATCHER(T ,P)
1 n = T .length2 m = P.length3 for s = 0 to n −m4 if P [1..m] == T [s + 1..s + m]5 print “Pattern occurs with shift” s
ComplexityO((n −m + 1)m) or Θ
(n2)
if m =⌊n
2⌋
11 / 40
![Page 24: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/24.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
12 / 40
![Page 25: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/25.jpg)
A more elaborated algorithm
Rabin-Karp algorithmLets assume that Σ = {0, 1, ..., 9} then we have the following:
Thus, each string of k consecutive characters is a k-length decimalnumber:c1c2 · · · ck−1ck = 10k−1c1 + 10k−2c2 + . . .+ 10ck−1 + ck
Thusp correspond the decimal number for pattern P [1..m].ts denote decimal value of m-length substring T [s + 1..s + m] fors = 0, 1, ...,n −m.
13 / 40
![Page 26: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/26.jpg)
A more elaborated algorithm
Rabin-Karp algorithmLets assume that Σ = {0, 1, ..., 9} then we have the following:
Thus, each string of k consecutive characters is a k-length decimalnumber:c1c2 · · · ck−1ck = 10k−1c1 + 10k−2c2 + . . .+ 10ck−1 + ck
Thusp correspond the decimal number for pattern P [1..m].ts denote decimal value of m-length substring T [s + 1..s + m] fors = 0, 1, ...,n −m.
13 / 40
![Page 27: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/27.jpg)
A more elaborated algorithm
Rabin-Karp algorithmLets assume that Σ = {0, 1, ..., 9} then we have the following:
Thus, each string of k consecutive characters is a k-length decimalnumber:c1c2 · · · ck−1ck = 10k−1c1 + 10k−2c2 + . . .+ 10ck−1 + ck
Thusp correspond the decimal number for pattern P [1..m].ts denote decimal value of m-length substring T [s + 1..s + m] fors = 0, 1, ...,n −m.
13 / 40
![Page 28: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/28.jpg)
A more elaborated algorithm
Rabin-Karp algorithmLets assume that Σ = {0, 1, ..., 9} then we have the following:
Thus, each string of k consecutive characters is a k-length decimalnumber:c1c2 · · · ck−1ck = 10k−1c1 + 10k−2c2 + . . .+ 10ck−1 + ck
Thusp correspond the decimal number for pattern P [1..m].ts denote decimal value of m-length substring T [s + 1..s + m] fors = 0, 1, ...,n −m.
13 / 40
![Page 29: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/29.jpg)
Then
PropertiesClearly ts == p if and only if T [s + 1..s + m] == P [1..m], thus s is avalid shift.
14 / 40
![Page 30: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/30.jpg)
Now, think about this
What if we put everything in a single word of the machineIf we can compute p in Θ (m).If we can compute all the ts in Θ (n −m + 1).
We will getΘ (m) + Θ (n −m + 1) = Θ (n)
15 / 40
![Page 31: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/31.jpg)
Now, think about this
What if we put everything in a single word of the machineIf we can compute p in Θ (m).If we can compute all the ts in Θ (n −m + 1).
We will getΘ (m) + Θ (n −m + 1) = Θ (n)
15 / 40
![Page 32: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/32.jpg)
Now, think about this
What if we put everything in a single word of the machineIf we can compute p in Θ (m).If we can compute all the ts in Θ (n −m + 1).
We will getΘ (m) + Θ (n −m + 1) = Θ (n)
15 / 40
![Page 33: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/33.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
16 / 40
![Page 34: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/34.jpg)
Remember Horner’s Rule
ConsiderWe can compute, the decimal representation by the Horner’s rule:
p = P[m] + 10(P[m − 1] + 10(P[m − 2] + ... + 10(P[2] + 10P[1])...)) (1)
Thus, we can compute t0 using this rule in
Θ (m) (2)
17 / 40
![Page 35: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/35.jpg)
Remember Horner’s Rule
ConsiderWe can compute, the decimal representation by the Horner’s rule:
p = P[m] + 10(P[m − 1] + 10(P[m − 2] + ... + 10(P[2] + 10P[1])...)) (1)
Thus, we can compute t0 using this rule in
Θ (m) (2)
17 / 40
![Page 36: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/36.jpg)
Example
If you have the following set of digits2 4 0 1
Using the Horner’s Rule for m = 4
2401 = 1 + 10× (0 + 10× (4 + 10× 2))= 2× 103 + 4× 102 + 0× 10 + 1
18 / 40
![Page 37: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/37.jpg)
Example
If you have the following set of digits2 4 0 1
Using the Horner’s Rule for m = 4
2401 = 1 + 10× (0 + 10× (4 + 10× 2))= 2× 103 + 4× 102 + 0× 10 + 1
18 / 40
![Page 38: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/38.jpg)
Then
To compute the remaining values, we can use the previous value
ts+1 = 10(ts − 10m−1T [s + 1]
)+ T [s + m + 1] . (3)
Notice the followingSubtracting from it 10m−1T [s + 1] removes the high-order digit fromts.Multiplying the result by 10 shifts the number left by one digitposition.Adding T [s + m + 1] brings in the appropriate low-order digit.
19 / 40
![Page 39: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/39.jpg)
Then
To compute the remaining values, we can use the previous value
ts+1 = 10(ts − 10m−1T [s + 1]
)+ T [s + m + 1] . (3)
Notice the followingSubtracting from it 10m−1T [s + 1] removes the high-order digit fromts.Multiplying the result by 10 shifts the number left by one digitposition.Adding T [s + m + 1] brings in the appropriate low-order digit.
19 / 40
![Page 40: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/40.jpg)
Then
To compute the remaining values, we can use the previous value
ts+1 = 10(ts − 10m−1T [s + 1]
)+ T [s + m + 1] . (3)
Notice the followingSubtracting from it 10m−1T [s + 1] removes the high-order digit fromts.Multiplying the result by 10 shifts the number left by one digitposition.Adding T [s + m + 1] brings in the appropriate low-order digit.
19 / 40
![Page 41: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/41.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 42: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/42.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 43: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/43.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 44: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/44.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 45: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/45.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 46: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/46.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 47: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/47.jpg)
Example
Imagine that you have...index 1 2 3 4 5 6digit 1 0 2 4 1 0
1 We have t0 = 1024 then we want to calculate 02412 Then we subtract
(103 × T [1]
)== 1000 of 1024
3 We get 00244 Multiply by 10, and we get 02405 We add T [5] == 16 Finally, we get 0241
20 / 40
![Page 48: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/48.jpg)
Remarks
FirstWe can extend this beyond the decimals to any d digit system!!!
What happens when the numbers are quite large?We can use the module
MeaningCompute p and ts values modulo a suitable modulus q.
21 / 40
![Page 49: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/49.jpg)
Remarks
FirstWe can extend this beyond the decimals to any d digit system!!!
What happens when the numbers are quite large?We can use the module
MeaningCompute p and ts values modulo a suitable modulus q.
21 / 40
![Page 50: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/50.jpg)
Remarks
FirstWe can extend this beyond the decimals to any d digit system!!!
What happens when the numbers are quite large?We can use the module
MeaningCompute p and ts values modulo a suitable modulus q.
21 / 40
![Page 51: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/51.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
22 / 40
![Page 52: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/52.jpg)
Remember Hash Functions
Yes, we are mapping the large numbers into the set
{0, 1, 2, ..., q − 1}
23 / 40
![Page 53: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/53.jpg)
Then, to reduce the possible representation
Use the module of qIt is possible to compute p mod q in Θ (m) time and all the ts mod q inΘ (n −m + 1) time.
Something NotableIf we choose the modulus q as a prime such that 10q just fits within onecomputer word, then we can perform all the necessary computations withsingle-precision arithmetic.
24 / 40
![Page 54: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/54.jpg)
Then, to reduce the possible representation
Use the module of qIt is possible to compute p mod q in Θ (m) time and all the ts mod q inΘ (n −m + 1) time.
Something NotableIf we choose the modulus q as a prime such that 10q just fits within onecomputer word, then we can perform all the necessary computations withsingle-precision arithmetic.
24 / 40
![Page 55: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/55.jpg)
Why 10q?
After all10q is the number that will subtracting for or multiplying by!!!
We use “truncated division” to implement the modulo operationFor example given two numbers a and n, we can do the following
q = trunc( a
n
)
Then
r = a − nq
25 / 40
![Page 56: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/56.jpg)
Why 10q?
After all10q is the number that will subtracting for or multiplying by!!!
We use “truncated division” to implement the modulo operationFor example given two numbers a and n, we can do the following
q = trunc( a
n
)
Then
r = a − nq
25 / 40
![Page 57: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/57.jpg)
Why 10q?
After all10q is the number that will subtracting for or multiplying by!!!
We use “truncated division” to implement the modulo operationFor example given two numbers a and n, we can do the following
q = trunc( a
n
)
Then
r = a − nq
25 / 40
![Page 58: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/58.jpg)
Why 10q?
ThusIf q is a prime we can use this as the element of truncated division thenr = a − 10q.
Truncated AlgorithmTruncated-Module(a, q)
1 r = a − 10q2 while r > 10q3 r = r − 10q4 return r
26 / 40
![Page 59: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/59.jpg)
Why 10q?
ThusIf q is a prime we can use this as the element of truncated division thenr = a − 10q.
Truncated AlgorithmTruncated-Module(a, q)
1 r = a − 10q2 while r > 10q3 r = r − 10q4 return r
26 / 40
![Page 60: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/60.jpg)
Then
ThusThen, we can implement this in basic arithmetic CPU operations.
Not only thatIn general, for a d-ary alphabet, we choose q such that dq fits within acomputer word.
27 / 40
![Page 61: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/61.jpg)
Then
ThusThen, we can implement this in basic arithmetic CPU operations.
Not only thatIn general, for a d-ary alphabet, we choose q such that dq fits within acomputer word.
27 / 40
![Page 62: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/62.jpg)
In general, we have
The following
ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q (4)
where h ≡ dm−1(mod q) is the value of the digit “1” in the high-orderposition of an m-digit text window.
HereWe have a small problem!!!
Question?Can we differentiate between p mod q and ts mod q?
28 / 40
![Page 63: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/63.jpg)
In general, we have
The following
ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q (4)
where h ≡ dm−1(mod q) is the value of the digit “1” in the high-orderposition of an m-digit text window.
HereWe have a small problem!!!
Question?Can we differentiate between p mod q and ts mod q?
28 / 40
![Page 64: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/64.jpg)
In general, we have
The following
ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q (4)
where h ≡ dm−1(mod q) is the value of the digit “1” in the high-orderposition of an m-digit text window.
HereWe have a small problem!!!
Question?Can we differentiate between p mod q and ts mod q?
28 / 40
![Page 65: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/65.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 66: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/66.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 67: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/67.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 68: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/68.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 69: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/69.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 70: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/70.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 71: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/71.jpg)
What?
Look at this with q = 1114 mod 11 == 325 mod 11 == 3
But 14 6= 25
However11 mod 11 == 025 mod 11 == 3
We can say that 11 6= 25!!!
This meansWe can use the modulo to differentiate numbers, but not to exactly to sayif they are equal!!!
29 / 40
![Page 72: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/72.jpg)
Thus
We have the following logicIf ts ≡ p (mod q) does not mean that ts == p.If ts 6≡ p (mod q), we have that ts 6= p.
Fixing the problemTo fix this problem we simply test to see if the hit is not spurious.
NoteIf q is large enough, then we can hope that spurious hits occur infrequentlyenough that the cost of the extra checking is low.
30 / 40
![Page 73: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/73.jpg)
Thus
We have the following logicIf ts ≡ p (mod q) does not mean that ts == p.If ts 6≡ p (mod q), we have that ts 6= p.
Fixing the problemTo fix this problem we simply test to see if the hit is not spurious.
NoteIf q is large enough, then we can hope that spurious hits occur infrequentlyenough that the cost of the extra checking is low.
30 / 40
![Page 74: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/74.jpg)
Thus
We have the following logicIf ts ≡ p (mod q) does not mean that ts == p.If ts 6≡ p (mod q), we have that ts 6= p.
Fixing the problemTo fix this problem we simply test to see if the hit is not spurious.
NoteIf q is large enough, then we can hope that spurious hits occur infrequentlyenough that the cost of the extra checking is low.
30 / 40
![Page 75: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/75.jpg)
Thus
We have the following logicIf ts ≡ p (mod q) does not mean that ts == p.If ts 6≡ p (mod q), we have that ts 6= p.
Fixing the problemTo fix this problem we simply test to see if the hit is not spurious.
NoteIf q is large enough, then we can hope that spurious hits occur infrequentlyenough that the cost of the extra checking is low.
30 / 40
![Page 76: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/76.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
31 / 40
![Page 77: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/77.jpg)
The Final AlgorithmRabin-Karp-Matcher(T , P, d, q)
1 n = T .length2 m = P.length3 h = dm−1 mod q // Storing the reminder of the highest power4 p = 05 t0 = 06 for i = 1 to m // Preprocessing7 p = (dp + P [i]) mod q8 t0 = (dp + T [i]) mod q9 for s = 0 to n −m10 if p == ts11 if P [1..m] == T [s + 1..s + m] // Actually a Loop12 print “Pattern occurs with shift” s13 if s < n −m14 ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q
32 / 40
![Page 78: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/78.jpg)
The Final AlgorithmRabin-Karp-Matcher(T , P, d, q)
1 n = T .length2 m = P.length3 h = dm−1 mod q // Storing the reminder of the highest power4 p = 05 t0 = 06 for i = 1 to m // Preprocessing7 p = (dp + P [i]) mod q8 t0 = (dp + T [i]) mod q9 for s = 0 to n −m10 if p == ts11 if P [1..m] == T [s + 1..s + m] // Actually a Loop12 print “Pattern occurs with shift” s13 if s < n −m14 ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q
32 / 40
![Page 79: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/79.jpg)
The Final AlgorithmRabin-Karp-Matcher(T , P, d, q)
1 n = T .length2 m = P.length3 h = dm−1 mod q // Storing the reminder of the highest power4 p = 05 t0 = 06 for i = 1 to m // Preprocessing7 p = (dp + P [i]) mod q8 t0 = (dp + T [i]) mod q9 for s = 0 to n −m10 if p == ts11 if P [1..m] == T [s + 1..s + m] // Actually a Loop12 print “Pattern occurs with shift” s13 if s < n −m14 ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q
32 / 40
![Page 80: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/80.jpg)
The Final AlgorithmRabin-Karp-Matcher(T , P, d, q)
1 n = T .length2 m = P.length3 h = dm−1 mod q // Storing the reminder of the highest power4 p = 05 t0 = 06 for i = 1 to m // Preprocessing7 p = (dp + P [i]) mod q8 t0 = (dp + T [i]) mod q9 for s = 0 to n −m10 if p == ts11 if P [1..m] == T [s + 1..s + m] // Actually a Loop12 print “Pattern occurs with shift” s13 if s < n −m14 ts+1 = (d(ts − T [s + 1]h) + T [s + m + 1]) mod q
32 / 40
![Page 81: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/81.jpg)
Complexity
Preprocessing1 p mod q is done in Θ(m).2 t ′ss mod q is done in Θ(n −m + 1).
Then checking P[1..m] == T [s + 1..s + m]In the worst case, Θ((n −m + 1)m)
33 / 40
![Page 82: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/82.jpg)
Complexity
Preprocessing1 p mod q is done in Θ(m).2 t ′ss mod q is done in Θ(n −m + 1).
Then checking P[1..m] == T [s + 1..s + m]In the worst case, Θ((n −m + 1)m)
33 / 40
![Page 83: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/83.jpg)
Still we can do better!
FirstThe number of spurious hits is O (n/q).
BecauseWe can estimate the chance that an arbitrary ts will be equivalent to pmod q as 1/q.
PropertiesSince there are O (n) positions at which the test of line 10 fails (Thus, youhave O (n/q) non valid hits) and we spend O (m) time per hit
34 / 40
![Page 84: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/84.jpg)
Still we can do better!
FirstThe number of spurious hits is O (n/q).
BecauseWe can estimate the chance that an arbitrary ts will be equivalent to pmod q as 1/q.
PropertiesSince there are O (n) positions at which the test of line 10 fails (Thus, youhave O (n/q) non valid hits) and we spend O (m) time per hit
34 / 40
![Page 85: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/85.jpg)
Still we can do better!
FirstThe number of spurious hits is O (n/q).
BecauseWe can estimate the chance that an arbitrary ts will be equivalent to pmod q as 1/q.
PropertiesSince there are O (n) positions at which the test of line 10 fails (Thus, youhave O (n/q) non valid hits) and we spend O (m) time per hit
34 / 40
![Page 86: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/86.jpg)
Finally, we have thatThe expected matching timeThe expected matching time by Rabin-Karp algorithm is
O(n) + O(
m(
v + nq
))
where v is the number of valid shifts.
In additionIf v = O(1) (Number of valid shifts small) and choose q ≥ m such thatnq = O(1) (q to be larger enough than the pattern’s length).
ThenThe algorithm takes O(m + n) for finding the matches.Finally, because m ≤ n, thus the expected time is O(n).
35 / 40
![Page 87: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/87.jpg)
Finally, we have thatThe expected matching timeThe expected matching time by Rabin-Karp algorithm is
O(n) + O(
m(
v + nq
))
where v is the number of valid shifts.
In additionIf v = O(1) (Number of valid shifts small) and choose q ≥ m such thatnq = O(1) (q to be larger enough than the pattern’s length).
ThenThe algorithm takes O(m + n) for finding the matches.Finally, because m ≤ n, thus the expected time is O(n).
35 / 40
![Page 88: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/88.jpg)
Finally, we have thatThe expected matching timeThe expected matching time by Rabin-Karp algorithm is
O(n) + O(
m(
v + nq
))
where v is the number of valid shifts.
In additionIf v = O(1) (Number of valid shifts small) and choose q ≥ m such thatnq = O(1) (q to be larger enough than the pattern’s length).
ThenThe algorithm takes O(m + n) for finding the matches.Finally, because m ≤ n, thus the expected time is O(n).
35 / 40
![Page 89: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/89.jpg)
Finally, we have thatThe expected matching timeThe expected matching time by Rabin-Karp algorithm is
O(n) + O(
m(
v + nq
))
where v is the number of valid shifts.
In additionIf v = O(1) (Number of valid shifts small) and choose q ≥ m such thatnq = O(1) (q to be larger enough than the pattern’s length).
ThenThe algorithm takes O(m + n) for finding the matches.Finally, because m ≤ n, thus the expected time is O(n).
35 / 40
![Page 90: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/90.jpg)
Finally, we have thatThe expected matching timeThe expected matching time by Rabin-Karp algorithm is
O(n) + O(
m(
v + nq
))
where v is the number of valid shifts.
In additionIf v = O(1) (Number of valid shifts small) and choose q ≥ m such thatnq = O(1) (q to be larger enough than the pattern’s length).
ThenThe algorithm takes O(m + n) for finding the matches.Finally, because m ≤ n, thus the expected time is O(n).
35 / 40
![Page 91: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/91.jpg)
Outline
1 String MatchingIntroductionSome Notation
2 Naive AlgorithmUsing Brute Force
3 The Rabin-Karp AlgorithmEfficiency After AllHorner’s RuleGeneraiting Possible MatchesThe Final AlgorithmOther Methods
4 Exercises
36 / 40
![Page 92: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/92.jpg)
We have other methods
We have the following onesAlgorithm Preprocessing Time Worst Case Matching TimeRabin-Karp Θ (m) O ((n −m + 1) m)
Finite Automaton O (m |Σ|) Θ (n)Knuth-Morris-Pratt Θ (m) Θ (n)
37 / 40
![Page 93: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/93.jpg)
Remarks about Knuth-Morris-Pratt
The AlgorithmIt is quite an elegant algorithm that improves over the state machine.
How?It avoid to compute the transition function in the state machine by usingthe prefix function π
It encapsulate information how the pattern matches against shifts ofitself
38 / 40
![Page 94: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/94.jpg)
Remarks about Knuth-Morris-Pratt
The AlgorithmIt is quite an elegant algorithm that improves over the state machine.
How?It avoid to compute the transition function in the state machine by usingthe prefix function π
It encapsulate information how the pattern matches against shifts ofitself
38 / 40
![Page 95: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/95.jpg)
However, At the same time (Circa 1977)
Boyer–Moore string search algorithmIt was presented at the same time
It is used in the GREP function for pattern matching in UNIXActually is the basic algorithm to beat when doing research in this area!!!
Richard Cole (Circa 1991)He gave a a proof of the algorithm with an upper bound of 3mcomparisons in the worst case!!!
39 / 40
![Page 96: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/96.jpg)
However, At the same time (Circa 1977)
Boyer–Moore string search algorithmIt was presented at the same time
It is used in the GREP function for pattern matching in UNIXActually is the basic algorithm to beat when doing research in this area!!!
Richard Cole (Circa 1991)He gave a a proof of the algorithm with an upper bound of 3mcomparisons in the worst case!!!
39 / 40
![Page 97: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/97.jpg)
However, At the same time (Circa 1977)
Boyer–Moore string search algorithmIt was presented at the same time
It is used in the GREP function for pattern matching in UNIXActually is the basic algorithm to beat when doing research in this area!!!
Richard Cole (Circa 1991)He gave a a proof of the algorithm with an upper bound of 3mcomparisons in the worst case!!!
39 / 40
![Page 98: 25 String Matching](https://reader030.vdocuments.net/reader030/viewer/2022033101/58ed74a61a28ab50248b468b/html5/thumbnails/98.jpg)
Exercises
32.1-132.1-232.1-432.2-132.2-232.2-332.2-4
40 / 40