kmp & bm copy

Exact String Matching BCS-4

Upload: hessam-yusaf

Post on 07-Jul-2015


TRANSCRIPT

Page 1: Kmp & bm   copy

Exact String Matching

BCS-4

Page 2: Kmp & bm   copy

Kafil Hussain (Sp12-BsCS-020)

Asad Iqbal (Sp12-BsCS-048)

Ehtisham Arshad (FA11-BsCS-059)

Hissam Yousaf (Sp12-BsCS-036)

Page 3: Kmp & bm   copy

Exact String Matching Algorithms

Knuth-Morris-Pratt (KMP)

Boyer-Moore (BM)

Page 4: Kmp & bm   copy

The goal of any string-searching algorithm is to determine whether or not a match of a particular string exists within another (typically much longer) string.

Many such algorithms exist, with varying efficiencies.

• Knuth-Morris-Pratt (KMP)

• Boyer-Moore (BM)

Page 5: Kmp & bm   copy

Introduction

The algorithm was conceived in 1974 by Donald Knuth and Vaughan Pratt, and independently by James H. Morris. The three published it jointly in 1977.

KMP is a linear-time algorithm for the string matching problem; every character of the text is checked.

Page 6: Kmp & bm   copy

Introduction

Developed in 1977, the BM string search algorithm is a particularly efficient algorithm.

This algorithm’s execution time can be sub-linear, as not every character of the string to be searched needs to be checked.

Page 7: Kmp & bm   copy
Page 8: Kmp & bm   copy

Left to Right Check

Scans the string from left to right to match a given pattern.

If the characters match at the current index, the next index is checked; otherwise the pattern moves to the right along the string.

Character Skip using KMP table

Partial_length - 1 = index into the KMP table (for the initial match)

Partial_length - table value at that index = SKIP
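The skip rule above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: the names build_kmp_table and kmp_search are assumptions, and indices are 0-based here while the slides count from 1.

```python
def build_kmp_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the "KMP table")."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter prefix
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Return the 0-based index of the first match, or -1 if there is none."""
    if not pattern:
        return 0
    table = build_kmp_table(pattern)
    start = 0    # current alignment of the pattern in the text
    matched = 0  # partial match length at this alignment
    while start + len(pattern) <= len(text):
        if text[start + matched] == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                return start
        elif matched == 0:
            start += 1  # nothing matched: move one position right
        else:
            # SKIP = partial length - table value at index (partial length - 1)
            skip = matched - table[matched - 1]
            start += skip
            matched -= skip  # the characters that remain matched after the skip
    return -1


print(build_kmp_table("abaa"))              # [0, 0, 1, 1]
print(kmp_search("abcabaabcabac", "abaa"))  # 3
```

The text pointer never moves backwards: after a mismatch, only the alignment of the pattern changes.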

Page 9: Kmp & bm   copy

Step 1: compare p[1] with S[1]

S: a b c a b a a b c a b a c
p: a b a a

Step 2: compare p[2] with S[2]

S: a b c a b a a b c a b a c
p: a b a a

Page 10: Kmp & bm   copy

Step 3: compare p[3] with S[3]

S: a b c a b a a b c a b a c
p: a b a a

A mismatch occurs here: p[3] is a, but S[3] is c.

Since a mismatch is detected, shift ‘p’ one position to the right and perform steps analogous to those from step 1 to step 3.

Page 11: Kmp & bm   copy

Final Step:

S: a b c a b a a b c a b a c
p:       a b a a

Finally, a match is found after ‘p’ has been shifted three positions to the right.
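The walkthrough above shifts the pattern one position at a time. A small Python sketch can replay it and record where each alignment fails; the function name trace_shifts is an assumption, and positions are 0-based, so the slide's p[3] appears as index 2.

```python
def trace_shifts(text, pattern):
    """Slide the pattern right one position at a time.
    Returns (shift, mismatch_index) pairs; mismatch_index is None on a match."""
    trace = []
    for shift in range(len(text) - len(pattern) + 1):
        mismatch = None
        for j, ch in enumerate(pattern):
            if text[shift + j] != ch:
                mismatch = j  # first pattern position that disagrees
                break
        trace.append((shift, mismatch))
        if mismatch is None:
            break  # full match found at this shift
    return trace


S = "abcabaabcabac"
p = "abaa"
print(trace_shifts(S, p))  # [(0, 2), (1, 0), (2, 0), (3, None)]
```

The last pair confirms the match after three shifts. A table-driven KMP search would reach the same position while skipping alignments that cannot match.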

Page 12: Kmp & bm   copy
Page 13: Kmp & bm   copy

Bad Character Rule

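Applies when a pattern character, compared starting from the rightmost position of the pattern, does not match the text character aligned with it. The pattern is shifted so that the mismatched text character lines up with its rightmost occurrence in the pattern, or past that character if it does not occur in the pattern at all.

Good Suffix Rule

Applies when some characters at the end of the pattern have already matched the text before a mismatch occurs. The pattern is shifted so that another occurrence of the matched suffix in the pattern lines up with the text.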
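As a rough illustration of the bad character rule only (the good suffix rule needs a second, more involved table and is not shown), the shift can be computed from a last-occurrence table. The helper names below are assumptions made for this sketch.

```python
def last_occurrence(pattern):
    """Map each character to the index of its rightmost occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bad_character_shift(pattern, j, text_char):
    """Shift suggested when pattern[j] mismatches text_char.
    Characters absent from the pattern count as occurring at index -1."""
    last = last_occurrence(pattern).get(text_char, -1)
    return max(1, j - last)  # never shift backwards or stay in place


print(bad_character_shift("STING", 4, "R"))  # 5: R is not in the pattern
print(bad_character_shift("STING", 4, "S"))  # 4: line up with the S at index 0
print(bad_character_shift("STING", 2, "G"))  # 1: G only occurs to the right
```

The max(1, ...) guard covers the case where the rightmost occurrence lies to the right of the mismatch; full Boyer-Moore relies on the good suffix rule to do better there.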

Page 14: Kmp & bm   copy

Step 1: Try to match the first m characters (m is the pattern length)

Pattern: STING
String:  A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

This fails. Slide the pattern right to look for other matches. Since R isn’t in the pattern, slide the pattern past R.

Page 15: Kmp & bm   copy

Step 2:

Pattern:      STING
String:  A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Fails again. The text character under the pattern’s rightmost position, S, occurs in the pattern exactly once, so slide until the two S’s line up.

Pattern:          STING
String:  A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No C in pattern. Slide past it.

Page 16: Kmp & bm   copy

Final Step:

Pattern:                                 STING
String:  A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Match found.
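The STING example can be replayed with a simplified search that uses the bad character rule alone. This is a sketch under that assumption, not full Boyer-Moore (the good suffix rule is omitted), and the function name is hypothetical.

```python
def bm_bad_character_search(text, pattern):
    """Right-to-left comparison with bad-character shifts only.
    Returns (match_index, alignments_tried); match_index is -1 if not found."""
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost occurrences
    m, n = len(pattern), len(text)
    alignments = []
    s = 0  # current alignment of the pattern in the text
    while s + m <= n:
        alignments.append(s)
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare from the right end of the pattern
        if j < 0:
            return s, alignments  # every character matched
        s += max(1, j - last.get(text[s + j], -1))
    return -1, alignments


text = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"
index, tried = bm_bad_character_search(text, "STING")
print(index)  # 32
print(tried)  # [0, 5, 9, 14, 19, 24, 29, 32]
```

Only eight alignments are examined for a 45-character text, which is the sub-linear behaviour the slides describe: most text characters are never looked at.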

Page 17: Kmp & bm   copy

Pattern (length)   1st time (ms)   2nd time (ms)   3rd time (ms)   4th time (ms)   5th time (ms)
Hi (2)             8               9               6               10              9
Pakistan (8)       20              19              22              20              21
Longest (30)       38              46              39              37              43

The table shows that KMP performs best on short strings and patterns. Its slowest runs are on longer strings or patterns.

Avg time for shortest (2) = 8.4 ms
Avg time for intermediate (8) = 20.4 ms
Avg time for longest (30) = 40.6 ms

Page 18: Kmp & bm   copy

Pattern (length)   1st time (ms)   2nd time (ms)   3rd time (ms)   4th time (ms)   5th time (ms)
Hi (2)             378             512             555             445             380
Pakistan (8)       27              25              24              29              35
Longest (30)       17              16              17              18              11

Avg time for shortest (2) = 454 ms
Avg time for intermediate (8) = 28 ms
Avg time for longest (30) = 15.8 ms

The table shows that BM performs best on longer strings and patterns. Its slowest runs are on short strings or patterns.
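Recomputing the averages from the five runs listed in each table is simple arithmetic. The sketch below copies the measurements from the two tables; the dictionary and function names are assumptions.

```python
# Times in ms, copied from the KMP and BM tables (five runs per pattern).
kmp_runs = {
    "Hi": [8, 9, 6, 10, 9],
    "Pakistan": [20, 19, 22, 20, 21],
    "Longest": [38, 46, 39, 37, 43],
}
bm_runs = {
    "Hi": [378, 512, 555, 445, 380],
    "Pakistan": [27, 25, 24, 29, 35],
    "Longest": [17, 16, 17, 18, 11],
}


def averages(runs):
    """Mean time per pattern, rounded to one decimal place."""
    return {name: round(sum(times) / len(times), 1) for name, times in runs.items()}


print(averages(kmp_runs))  # {'Hi': 8.4, 'Pakistan': 20.4, 'Longest': 40.6}
print(averages(bm_runs))   # {'Hi': 454.0, 'Pakistan': 28.0, 'Longest': 15.8}
```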

Page 19: Kmp & bm   copy

[Chart: processing time (ms)]

On average, for sufficiently large alphabets (8 characters), Boyer-Moore has a fast running time and performs a sub-linear number of character comparisons.

On average and in the worst case, Boyer-Moore is faster than “Boyer-Moore-like” algorithms.

Page 20: Kmp & bm   copy

The running time of the Knuth-Morris-Pratt algorithm is proportional to the time needed to read the characters in the text and the pattern. In other words, for a text of length n and a pattern of length m, the worst-case running time of the algorithm is O(m + n), and it requires O(m) extra space for the table.
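One way to see the linear bound is to count character comparisons during the search. In the sketch below (function names are assumptions), every comparison either advances the text position or shortens the current partial match, so the search phase uses at most 2n comparisons.

```python
def build_table(pattern):
    """Prefix table: O(m) time and O(m) extra space."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_count(text, pattern):
    """Return (match positions, number of text/pattern character comparisons).
    Assumes a non-empty pattern."""
    table = build_table(pattern)
    matches, comparisons, k = [], 0, 0
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if pattern[k] == ch:
                k += 1
                break
            if k == 0:
                break
            k = table[k - 1]  # shorten the partial match and retry
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going to find later matches
    return matches, comparisons


text = "a" * 20 + "b"
matches, comparisons = kmp_count(text, "aaab")
print(matches)                       # [17]
print(comparisons <= 2 * len(text))  # True
```

The input is chosen to force a fallback at almost every text position, which is close to the worst case for this loop.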

Page 21: Kmp & bm   copy

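• Boyer-Moore requires a preprocessing time of O(m + σ), where m is the pattern length and σ is the alphabet size.

• The worst-case running time of the BM algorithm is O(mn), where n is the text length.

• In the best case, the Boyer-Moore algorithm runs in O(n/m).

• When the pattern does not occur in the text, BM makes at most 3n character comparisons in the worst case.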
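The best and worst cases above can be made concrete by counting comparisons. The sketch below uses only the bad character rule and moves one position after a full match, so it is a simplified stand-in for Boyer-Moore; it does not demonstrate the 3n bound, which depends on the good suffix rule. The function name is an assumption.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a bad-character-only search
    that reports every alignment (shifting by one after a full match)."""
    last = {ch: i for i, ch in enumerate(pattern)}
    m, n = len(pattern), len(text)
    comparisons = 0
    s = 0
    while s + m <= n:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            s += 1  # full match: simplest possible continuation
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return comparisons


n, m = 1000, 5
print(count_comparisons("x" * n, "abcde"))  # 200, i.e. n / m
print(count_comparisons("a" * n, "a" * m))  # 4980, i.e. (n - m + 1) * m
```

In the first call every alignment fails on its first comparison and the pattern jumps m positions; in the second every alignment matches completely, which is the O(mn) behaviour.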

Page 22: Kmp & bm   copy

KMP and Boyer-Moore find applications in many core digital systems and processes, e.g.:

• Digital libraries

• Screen scrapers

• Word processors

• Web search engines

• Spam filters

• Natural language processing

Page 23: Kmp & bm   copy

Thank you