algorithms and data structures - part 5

Upload: swapnil-deshmukh

Post on 03-Jun-2018


  • 8/11/2019 Algorithms and Data Structures - Part 5


    PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sun, 22 Dec 2013 17:47:20 UTC

    Algorithms and Data Structures Part 5: String Matching (Wikipedia Book 2014)

    By Wikipedians

    Editors: Reiner Creutzburg, Jenny Knackmu


    Contents

    Articles

    String Matching
    String (computer science)
    String searching algorithm
    Knuth–Morris–Pratt algorithm
    Boyer–Moore string search algorithm

    References

    Article Sources and Contributors
    Image Sources, Licenses and Contributors

    Article Licenses

    License


    String Matching

    String (computer science)

    In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and/or the length changed, or it may be fixed (after creation). A string is generally understood as a data type and is often implemented as an array of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. A string may also denote more general arrays or other sequence (or list) data types and structures.

    Depending on the programming language and the precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements.

    When a string appears literally in source code, it is known as a string literal and has a representation that denotes it as such.

    In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set called an alphabet.

    Formal theory

    Let $\Sigma$ be a non-empty finite set of symbols (alternatively called characters), called the alphabet. No assumption is made about the nature of the symbols. A string (or word) over $\Sigma$ is any finite sequence of symbols from $\Sigma$. For example, if $\Sigma = \{0, 1\}$, then 01011 is a string over $\Sigma$.

    The length of a string is the number of symbols in the string (the length of the sequence) and can be any non-negative integer. The empty string is the unique string over $\Sigma$ of length 0, and is denoted $\varepsilon$ or $\lambda$.

    The set of all strings over $\Sigma$ of length $n$ is denoted $\Sigma^n$. For example, if $\Sigma = \{0, 1\}$, then $\Sigma^2 = \{00, 01, 10, 11\}$. Note that $\Sigma^0 = \{\varepsilon\}$ for any alphabet $\Sigma$.

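    The set of all strings over $\Sigma$ of any length is the Kleene closure of $\Sigma$ and is denoted $\Sigma^*$. In terms of $\Sigma^n$,

    $$\Sigma^{*} = \bigcup_{n \in \mathbb{N} \cup \{0\}} \Sigma^{n}$$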

    For example, if $\Sigma = \{0, 1\}$, then $\Sigma^* = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, \ldots\}$. Although $\Sigma^*$ itself is countably infinite, each element of $\Sigma^*$ has finite length.
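The definitions of $\Sigma^n$ and $\Sigma^*$ can be illustrated with a short Python sketch. It assumes the alphabet is given as a Python string of single-character symbols, and the function names are illustrative rather than standard. Enumerating $\Sigma^0, \Sigma^1, \Sigma^2, \ldots$ in turn reaches every string after finitely many steps, which is one way to see that $\Sigma^*$ is countably infinite even though each of its elements is finite.

```python
from itertools import count, islice, product
from typing import Iterator


def strings_of_length(alphabet: str, n: int) -> list[str]:
    """Return Sigma^n: every string of length n over the given alphabet."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=n)]


def kleene_star(alphabet: str) -> Iterator[str]:
    """Lazily enumerate Sigma^*: first Sigma^0, then Sigma^1, Sigma^2, and so on."""
    for n in count(0):
        yield from strings_of_length(alphabet, n)


print(strings_of_length("01", 2))          # ['00', '01', '10', '11']
print(list(islice(kleene_star("01"), 7)))  # ['', '0', '1', '00', '01', '10', '11']
```

Because `kleene_star` is a generator, it never tries to materialize the infinite set; callers take as many elements as they need.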

    A set of strings over $\Sigma$ (i.e. any subset of $\Sigma^*$) is called a formal language over $\Sigma$. For example, if $\Sigma = \{0, 1\}$, the set of strings with an even number of zeros ($\{\varepsilon, 1, 00, 11, 001, 010, 100, 111, 0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111, \ldots\}$) is a formal language over $\Sigma$.


    Concatenation and substrings

    Concatenation is an important binary operation on $\Sigma^*$. For any two strings $s$ and $t$ in $\Sigma^*$, their concatenation is defined as the sequence of symbols in $s$ followed by the sequence of characters in $t$, and is denoted $st$. For example, if $\Sigma = \{a, b, \ldots, z\}$, $s$ = bear, and $t$ = hug, then $st$ = bearhug and $ts$ = hugbear.

    String concatenation is an associative but non-commutative operation. The empty string $\varepsilon$ serves as the identity element; for any string $s$, $\varepsilon s = s \varepsilon = s$. Therefore, the set $\Sigma^*$ and the concatenation operation form a monoid, the free monoid generated by $\Sigma$. In addition, the length function defines a monoid homomorphism from $\Sigma^*$ to the non-negative integers (that is, a function $L: \Sigma^* \to \mathbb{N} \cup \{0\}$ such that $L(st) = L(s) + L(t)$ for all $s, t \in \Sigma^*$).

    A string $s$ is said to be a substring or factor of $t$ if there exist (possibly empty) strings $u$ and $v$ such that $t = usv$. The relation "is a substring of" defines a partial order on $\Sigma^*$, the least element of which is the empty string.

    Prefixes and suffixes

    A string $s$ is said to be a prefix of $t$ if there exists a string $u$ such that $t = su$. If $u$ is nonempty, $s$ is said to be a proper prefix of $t$. Symmetrically, a string $s$ is said to be a suffix of $t$ if there exists a string $u$ such that $t = us$. If $u$ is nonempty, $s$ is said to be a proper suffix of $t$. Suffixes and prefixes are substrings of $t$.

    Rotations

    A string $s = uv$ is said to be a rotation of $t$ if $t = vu$. For example, if $\Sigma = \{0, 1\}$, the string 0011001 is a rotation of 0100110, where $u$ = 00110 and $v$ = 01.
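A quick way to test the rotation relation is sketched in Python below (function names are illustrative). Two strings of equal length are rotations of each other exactly when one occurs inside the other concatenated with itself. The second helper recovers a split $s = uv$ with $t = vu$ by trying every cut point, and it reproduces the example above.

```python
from typing import Optional


def is_rotation(s: str, t: str) -> bool:
    """True if s = uv and t = vu for some strings u and v."""
    return len(s) == len(t) and s in t + t


def split_rotation(s: str, t: str) -> Optional[tuple[str, str]]:
    """Return the first (u, v) with s == u + v and t == v + u, or None."""
    for cut in range(len(s) + 1):
        u, v = s[:cut], s[cut:]
        if v + u == t:
            return u, v
    return None


print(is_rotation("0011001", "0100110"))     # True
print(split_rotation("0011001", "0100110"))  # ('00110', '01')
```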

    Reversal

    The reverse of a string is a string with the same symbols but in reverse order. For example, if $s$ = abc (where a, b, and c are symbols of the alphabet), then the reverse of $s$ is cba. A string that is the reverse of itself (e.g., $s$ = madam) is called a palindrome, which also includes the empty string and all strings of length 1.

    Lexicographical ordering

    It is often useful to define an ordering on a set of strings. If the alphabet $\Sigma$ has a total order (cf. alphabetical order), one can define a total order on $\Sigma^*$ called lexicographical order. For example, if $\Sigma = \{0, 1\}$ and $0 < 1$, then the lexicographical order on $\Sigma^*$ includes the relationships $\varepsilon < 0 < 00 < 000 < \ldots < 0001 < 001 < 01 < 010 < 011 < 0110 < \ldots$

    ...
        else if cnd > 0 then
            let cnd ← T[cnd]

        (third case: we have run out of candidates. Note cnd = 0)
        else
            let T[pos] ← 0, pos ← pos + 1
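The fragment above is the tail of the KMP table-building loop. A runnable Python sketch of the whole routine follows. The initialization is not part of the fragment, so the starting values T[0] = -1, T[1] = 0, pos = 2 and cnd = 0 are assumptions borrowed from the usual zero-based presentation, and the function name kmp_table is hypothetical.

```python
def kmp_table(w: str) -> list[int]:
    """Partial-match (failure) table T for the word w, with T[0] = -1.

    For pos >= 1, T[pos] is the length of the longest proper prefix of w[:pos]
    that is also a suffix of w[:pos].
    """
    n = len(w)
    t = [0] * (n + 2)   # two spare slots so T[0] and T[1] exist even for short words
    t[0] = -1           # assumed convention; T[1] stays 0
    pos, cnd = 2, 0     # assumed starting values (not part of the fragment above)
    while pos < n:
        if w[pos - 1] == w[cnd]:
            # first case: the current candidate substring continues
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            # second case: it doesn't, but we can fall back to a shorter candidate
            cnd = t[cnd]
        else:
            # third case: we have run out of candidates (cnd == 0)
            t[pos] = 0
            pos += 1
    return t[:n]


print(kmp_table("ABCDABD"))  # [-1, 0, 0, 0, 0, 1, 2]
```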

    Efficiency of the table-building algorithm

    The complexity of the table algorithm is O(n), where n is the length of W. Apart from some initialization, all the work is done in the while loop, so it suffices to show that this loop executes in O(n) time, which will be done by simultaneously examining the quantities pos and pos - cnd. In the first branch, pos - cnd is preserved, as both pos and cnd are incremented simultaneously, but pos itself increases. In the second branch, cnd is replaced by T[cnd], which we saw above is always strictly less than cnd, thus increasing pos - cnd. In the third branch, pos is incremented and cnd is not, so both pos and pos - cnd increase. Since pos ≥ pos - cnd, this means that at each stage either pos or a lower bound for pos increases. Equivalently, the sum pos + (pos - cnd) grows by at least 1 on every iteration and never exceeds 2n. Since the algorithm terminates once pos = n and pos - cnd begins at 1, it must terminate after at most 2n iterations of the loop. Therefore the complexity of the table algorithm is O(n).
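The amortized argument can be checked mechanically. The Python sketch below repeats the same loop (under the same assumed starting values pos = 2 and cnd = 0, which the pseudocode fragment does not show) and asserts on every iteration that pos + (pos - cnd) strictly increases, which is exactly what bounds the iteration count by 2n. The name kmp_table_with_count is hypothetical.

```python
def kmp_table_with_count(w: str) -> tuple[list[int], int]:
    """Build the KMP table and return it with the number of while-loop iterations."""
    n = len(w)
    t = [0] * (n + 2)
    t[0] = -1
    pos, cnd = 2, 0
    iterations = 0
    while pos < n:
        potential = pos + (pos - cnd)   # always below 2n inside the loop
        if w[pos - 1] == w[cnd]:
            cnd += 1                    # branch 1: pos - cnd preserved, pos grows
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = t[cnd]                # branch 2: T[cnd] < cnd, so pos - cnd grows
        else:
            t[pos] = 0                  # branch 3: pos and pos - cnd both grow
            pos += 1
        iterations += 1
        assert pos + (pos - cnd) > potential, "potential must grow every iteration"
    return t[:n], iterations


table, steps = kmp_table_with_count("AABAACAABAA")
print(table, steps)
```

Running it over many words and comparing `steps` against `2 * len(w)` is a cheap empirical check of the bound.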

    Efficiency of the KMP algorithm

    Since the two portions of the algorithm have, respectively, complexities of O(k) and O(n), the complexity of the overall algorithm is O(n + k).
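Putting the two portions together, the Python sketch below builds the table once (O(n) in the length of W) and then scans S once (O(k) in the length of S), for O(n + k) overall. It assumes the zero-based conventions T[0] = -1 and T[1] = 0, returns only the first match, and uses -1 to signal that there is none; these choices and the function names are illustrative, not taken from this text.

```python
def kmp_table(w: str) -> list[int]:
    """Failure table with T[0] = -1 (same construction as the table-building loop)."""
    t = [0] * (len(w) + 2)
    t[0] = -1
    pos, cnd = 2, 0
    while pos < len(w):
        if w[pos - 1] == w[cnd]:
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = t[cnd]
        else:
            t[pos] = 0
            pos += 1
    return t[:len(w)]


def kmp_search(s: str, w: str) -> int:
    """0-based index of the first occurrence of w in s, or -1 if there is none."""
    if not w:
        return 0
    t = kmp_table(w)
    m, i = 0, 0  # m: start of the current candidate in s; i: position in w
    while m + i < len(s):
        if w[i] == s[m + i]:
            if i == len(w) - 1:
                return m
            i += 1
        elif t[i] > -1:
            # keep the t[i] characters already known to match and retry
            m += i - t[i]
            i = t[i]
        else:
            # mismatch on the first character of w: slide forward by one
            m += 1
            i = 0
    return -1


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # 15
```

Each iteration either advances i or strictly increases m, and neither ever moves backward, which is where the O(k) bound for the scan comes from.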

    These complexities are the same, no matter how many repetitive patterns are inW or S .

    Variants

    A real-time version of KMP can be implemented using a separate failure function table for each character in the alphabet. If a mismatch occurs on character $x$ in the text, the failure function table for character $x$ is consulted for the index $i$ in the pattern at which the mismatch took place. This will return the length of the longest substring ending at $i$ matching a prefix of the pattern, with the added condition that the character after the prefix is $x$. With this restriction, character $x$ in the text need not be checked again in the next phase, and so only a constant number of operations are executed between the processing of each index of the text. This satisfies the real-time computing restriction.
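The per-character tables described above are closely related to the KMP string-matching automaton. The automaton folds the failure function into one transition per (state, character) pair, so each text character costs a single table lookup. The Python sketch below builds that automaton rather than the exact per-character failure tables worded above, so treat it as a related formulation, not the variant itself. It assumes the alphabet is known in advance and given as a string, and the function names are hypothetical.

```python
def kmp_automaton(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[state][c]: length of the longest prefix of pattern that is a suffix of
    pattern[:state] + c (state = number of pattern characters currently matched)."""
    if not pattern or not set(pattern) <= set(alphabet):
        raise ValueError("pattern must be non-empty and drawn from the alphabet")
    delta = [dict.fromkeys(alphabet, 0)]
    delta[0][pattern[0]] = 1
    restart = 0  # state the automaton would be in after reading pattern[1:state]
    for state in range(1, len(pattern) + 1):
        row = dict(delta[restart])              # on a mismatch, behave like the restart state
        if state < len(pattern):
            row[pattern[state]] = state + 1     # on a match, advance
            restart = delta[restart][pattern[state]]
        delta.append(row)
    return delta


def realtime_search(text: str, pattern: str, alphabet: str) -> list[int]:
    """0-based start positions of all occurrences; one table lookup per text character."""
    delta = kmp_automaton(pattern, alphabet)
    state, hits = 0, []
    for pos, c in enumerate(text):
        state = delta[state].get(c, 0)  # characters outside the alphabet restart matching
        if state == len(pattern):
            hits.append(pos - len(pattern) + 1)
    return hits


print(realtime_search("abababa", "aba", "ab"))  # [0, 2, 4]
```

The trade-off is the one the text implies: O(kn) table space for an alphabet of size k in exchange for a constant amount of work per text character.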

    The Booth algorithm uses a modified version of the KMP preprocessing function to find the lexicographically minimal string rotation. The failure function is progressively calculated as the string is rotated.
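The Python sketch below follows one commonly published form of Booth's algorithm. It scans the doubled string s + s, maintaining a KMP-style failure function f for the best rotation start k found so far and moving k whenever a smaller candidate is detected, for O(n) total work. Treat it as illustrative: the function name is hypothetical, and it is worth cross-checking against a brute-force minimum over all rotations.

```python
def least_rotation(s: str) -> int:
    """Start index of the lexicographically minimal rotation of s (Booth's algorithm)."""
    doubled = s + s
    f = [-1] * len(doubled)   # failure function of the current candidate rotation
    k = 0                     # start of the best rotation found so far
    for j in range(1, len(doubled)):
        c = doubled[j]
        i = f[j - k - 1]
        while i != -1 and c != doubled[k + i + 1]:
            if c < doubled[k + i + 1]:
                k = j - i - 1          # a smaller rotation starts here
            i = f[i]
        if i == -1 and c != doubled[k]:
            if c < doubled[k]:         # c itself starts a smaller rotation
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1           # extend the failure function
    return k


print(least_rotation("bca"))  # 2, the rotation "abc"
```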

    References

    Knuth, Donald; Morris, James H., Jr.; Pratt, Vaughan (1977). "Fast pattern matching in strings"[1]. SIAM Journal on Computing 6 (2): 323–350. doi:10.1137/0206024[2]. Zbl 0372.68005[3].

    Cormen, Thomas; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Section 32.4: The Knuth-Morris-Pratt algorithm". Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill. pp. 923–931. ISBN 0-262-03293-7. Zbl 1047.68161[4].

    Crochemore, Maxime; Rytter, Wojciech (2003). Jewels of stringology. Text algorithms. River Edge, NJ: World Scientific. pp. 20–25. ISBN 981-02-4897-0. Zbl 1078.68151[5].

    Szpankowski, Wojciech (2001). Average case analysis of algorithms on sequences. Wiley-Interscience Series in Discrete Mathematics and Optimization. With a foreword by Philippe Flajolet. Chichester: Wiley. pp. 15–17, 136–141. ISBN 0-471-24063-X. Zbl 0968.68205[6].

    External links

    String Searching Applet animation[7]
    An explanation of the algorithm[8] and sample C++ code[9] by David Eppstein
    Knuth-Morris-Pratt algorithm[10] description and C code by Christian Charras and Thierry Lecroq
    Explanation of the algorithm from scratch[11] by FH Flensburg
    Breaking down steps of running KMP[12] by Chu-Cheng Hsieh
    NPTELHRD YouTube lecture video[13]

    References

    [1] http://citeseer.ist.psu.edu/context/23820/0
    [2] http://dx.doi.org/10.1137%2F0206024
    [3] http://www.zentralblatt-math.org/zmath/en/search/?format=complete&q=an:0372.68005
    [4] http://www.zentralblatt-math.org/zmath/en/search/?format=complete&q=an:1047.68161
    [5] http://www.zentralblatt-math.org/zmath/en/search/?format=complete&q=an:1078.68151
    [6] http://www.zentralblatt-math.org/zmath/en/search/?format=complete&q=an:0968.68205
    [7] http://www.cs.pitt.edu/~kirk/cs1501/animations/String.html
    [8] http://www.ics.uci.edu/~eppstein/161/960227.html
    [9] http://www.ics.uci.edu/~eppstein/161/kmp/
    [10] http://www-igm.univ-mlv.fr/~lecroq/string/node8.html
    [11] http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
    [12] http://oak.cs.ucla.edu/cs144/examples/KMPSearch.html
    [13] http://www.youtube.com/watch?v=Zj_er99KMb8


    Boyer–Moore string search algorithm

    In computer science, the Boyer-Moore string search algorithm is an efficient string searching algorithm that is the standard benchmark for practical string search literature.[1] It was developed by Robert S. Boyer and J Strother Moore in 1977. The algorithm preprocesses the string being searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for applications in which the pattern is much shorter than the text or persists across multiple searches. The Boyer-Moore algorithm uses information gathered during the preprocessing step to skip sections of the text, resulting in a lower constant factor than many other string algorithms. In general, the algorithm runs faster as the pattern length increases.

    Definitions

    A N P A N M A N -
    P A N - - - - - -
    - P A N - - - - -
    - - P A N - - - -
    - - - P A N - - -
    - - - - P A N - -
    - - - - - P A N -

    Alignments of pattern PAN to text ANPANMAN, from k=3 to k=8. A match occurs at k=5.

    S[i] refers to the character at index i of string S, counting from 1. S[i..j] refers to the substring of string S starting at index i and ending at j, inclusive.

    A prefix of S is a substring S[1..i] for some i in range [1, n], where n is the length of S. A suffix of S is a substring S[i..n] for some i in range [1, n], where n is the length of S.

    The string to be searched for is called the pattern and is referred to with symbol P. The string being searched in is called the text and is referred to with symbol T. The length of P is n. The length of T is m. An alignment of P to T is an index k in T such that the last character of P is aligned with index k of T. A match or occurrence of P occurs at an alignment if P is equivalent to T[(k-n+1)..k].

    Description

    The Boyer-Moore algorithm searches for occurrences of P in T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m - n + 1), Boyer-Moore uses information gained by preprocessing P to skip as many alignments as possible.

    The algorithm begins at alignment k = n, so the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and k in T, moving backward: the strings are matched from the end of P to the start of P. The comparisons continue until either the beginning of P is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted to the right according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T, which means no further matches will be found.
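To make this vocabulary concrete, here is a minimal Python sketch of the brute-force baseline the description contrasts with. It visits every alignment k (1-based, as in the definitions) and compares P against T from the end of P backward, but it always shifts by one position because it applies no shift rules. The function name is hypothetical.

```python
def all_alignments_search(text: str, pattern: str) -> list[int]:
    """Return every alignment k (1-based, last character of P under T[k]) where P occurs."""
    n, m = len(pattern), len(text)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    matches = []
    for k in range(n, m + 1):          # the m - n + 1 alignments k = n, ..., m
        i, h = n, k                    # 1-based positions in P and T
        while i >= 1 and pattern[i - 1] == text[h - 1]:
            i -= 1                     # move backward through P ...
            h -= 1                     # ... and through T in lockstep
        if i == 0:                     # reached the beginning of P: a match
            matches.append(k)
    return matches


print(all_alignments_search("ANPANMAN", "PAN"))  # [5]
```

Boyer-Moore keeps this right-to-left comparison loop and replaces the fixed `k + 1` step with the largest shift its rules allow.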

    The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P .


    Shift Rules

    The Bad Character Rule

    Description

    - - - - X - - K - - -
    A N P A N M A N A M -
    - N N A A M A N - - -
    - - - N N A A M A N -

    Demonstration of bad character rule with pattern NNAAMAN.

    The bad-character rule considers the character in T at which the comparison process failed (assuming such a failure occurred). The next occurrence of that character to the left in P is found, and a shift which brings that occurrence in line with the mismatched occurrence in T is proposed. If the mismatched character does not occur to the left in P, a shift is proposed that moves the entirety of P past the point of mismatch.

    Preprocessing

    Methods vary on the exact form the table for the bad character rule should take, but a simple constant-time lookup solution is as follows: create a 2D table which is indexed first by the index of the character c in the alphabet and second by the index i in the pattern. This lookup returns the occurrence of c in P with the next-highest index j < i, or -1 if there is no such occurrence. The proposed shift is then i - j, with O(1) lookup time and O(kn) space, assuming a finite alphabet of size k.
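Here is a minimal Python sketch of that 2D table, stored as one row per alphabet character. It assumes 0-based pattern indices (which is what makes the -1 sentinel yield a shift of i + 1, moving P past the mismatch) and an alphabet passed in as a string. Characters outside that alphabet are treated as never occurring in P. Names are hypothetical. The usage line replays the NNAAMAN figure: the mismatch is at 0-based pattern index 3 against text character N, and the nearest N to its left is at index 1, giving a shift of 2.

```python
def bad_character_table(pattern: str, alphabet: str) -> dict[str, list[int]]:
    """table[c][i] = largest j < i with pattern[j] == c, or -1 if there is none.

    Indices are 0-based. Space is O(k * n) for an alphabet of size k.
    """
    table = {}
    for c in alphabet:
        row = []
        last = -1
        for i, ch in enumerate(pattern):
            row.append(last)          # occurrences strictly to the left of i
            if ch == c:
                last = i
        table[c] = row
    return table


def bad_character_shift(table: dict[str, list[int]], c: str, i: int) -> int:
    """Proposed shift i - j after text character c mismatched pattern[i]."""
    j = table[c][i] if c in table else -1   # characters outside the alphabet never occur in P
    return i - j


table = bad_character_table("NNAAMAN", "AMN")
print(bad_character_shift(table, "N", 3))  # 2
```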

    The Good Suffix Rule

    Description

    - - - - X - - K - - - - -
    M A N P A N A M A N A P -
    A N A M P N A M - - - - -
    - - - - A N A M P N A M -

    Demonstration of good suffix rule with pattern ANAMPNAM.

    The good suffix rule is markedly more complex in both concept and implementation than the bad character rule. It is the reason comparisons begin at the end of the pattern rather than the start, and is formally stated thus:

    Suppose for a given alignment of P and T, a substring t of T matches a suffix of P, but a mismatch occurs at the next comparison to the left. Then find, if it exists, the right-most copy t' of t in P such that t' is not a suffix of P and the character to the left of t' in P differs from the character to the left of t in P. Shift P to the right so that substring t' in P aligns with substring t in T. If t' does not exist, then shift the left end of P past the left end of t in T by the least amount so that a prefix of the shifted pattern matches a suffix of t in T. If no such shift is possible, then shift P by n places to the right. If an occurrence of P is found, then shift P by the least amount so that a proper prefix of the shifted P matches a suffix of the occurrence of P in T. If no such shift is possible, then shift P by n places, that is, shift P past t.


    Preprocessing

    The good suffix rule requires two tables: one for use in the general case, and another for use when either the general case returns no meaningful result or a match occurs. These tables will be designated L and H respectively. Their definitions are as follows:

    For each i, L[i] is the largest position less than n such that string P[i..n] matches a suffix of P[1..L[i]] and such that the character preceding that suffix is not equal to P[i-1]. L[i] is defined to be zero if there is no position satisfying the condition.

    Let H[i] denote the length of the largest suffix of P[i..n] that is also a prefix of P, if one exists. If none exists, let H[i] be zero.

    Both of these tables are constructible in O(n) time and use O(n) space. The alignment shift for index i in P is given by n - L[i] or n - H[i]. H should only be used if L[i] is zero or a match has been found.
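The definitions of L and H can be checked with a deliberately naive Python sketch that follows them literally with 1-based indices (index 0 of each list is unused, and L[1] and H[1] are left at zero). It is roughly cubic-time, not the O(n) construction mentioned above. It also assumes that a copy starting at position 1, which has no preceding character, satisfies the "different preceding character" condition. With the pattern ANAMPNAM from the good suffix figure, a mismatch at P[5] leaves the good suffix P[6..8] = NAM, and n - L[6] reproduces the shift of 4 shown there. The function name is hypothetical.

```python
def good_suffix_tables(p: str) -> tuple[list[int], list[int]]:
    """Naive L and H tables for pattern p, using the 1-based definitions above."""
    n = len(p)
    L = [0] * (n + 1)
    H = [0] * (n + 1)
    for i in range(2, n + 1):
        suffix = p[i - 1:]               # P[i..n]
        size = len(suffix)
        # Try candidate positions j = n-1, n-2, ... and keep the largest that works.
        for j in range(n - 1, size - 1, -1):
            start = j - size + 1         # the copy would be P[start..j]
            preceded_ok = start == 1 or p[start - 2] != p[i - 2]
            if p[start - 1:j] == suffix and preceded_ok:
                L[i] = j
                break
        # Longest suffix of P[i..n] that is also a prefix of P.
        for length in range(size, 0, -1):
            if p[n - length:] == p[:length]:
                H[i] = length
                break
    return L, H


L, H = good_suffix_tables("ANAMPNAM")
print(len("ANAMPNAM") - L[6])  # 4
```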

    The Galil Rule

    A simple but important optimization of Boyer-Moore was put forth by Galil in 1979. As opposed to shifting, the Galil rule deals with speeding up the actual comparisons done at each alignment by skipping sections that are known to match. Suppose that at an alignment $k_1$, P is compared with T down to character c of T. Then if P is shifted to $k_2$ such that its left end is between c and $k_1$, in the next comparison phase a prefix of P must match the substring $T[(k_2 - n)..k_1]$. Thus if the comparisons get down to position $k_1$ of T, an occurrence of P can be recorded without explicitly comparing past $k_1$. In addition to increasing the efficiency of Boyer-Moore, the Galil rule is required for proving linear-time execution in the worst case.

    Performance

    The Boyer-Moore algorithm as presented in the original paper has worst-case running time of O(n+m) only if the pattern does not appear in the text. This was first proved by Knuth, Morris, and Pratt in 1977, followed by Guibas and Odlyzko in 1980 with an upper bound of 5m comparisons in the worst case. Richard Cole gave a proof with an upper bound of 3m comparisons in the worst case in 1991.

    When the pattern does occur in the text, the running time of the original algorithm is O(nm) in the worst case. This is easy to see when both pattern and text consist solely of the same repeated character: every one of the m - n + 1 alignments is then a match, and each is verified with n comparisons. However, inclusion of the Galil rule results in linear runtime across all cases.
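The Python sketch below counts character comparisons to illustrate both claims on the repeated-character case. It is a simplified illustration under stated assumptions. Mismatches shift by a single position (no bad-character or good-suffix rule). The Galil-style skip is applied only after a full occurrence, when the pattern is shifted by its smallest period p, which is the least shift that leaves a proper prefix over the match. The first n - p characters of the pattern are then known to match and are not compared again. Function names are hypothetical.

```python
def smallest_period(p: str) -> int:
    """Smallest period of p: len(p) minus the length of its longest proper border."""
    border = [0] * len(p)   # border[i] = longest proper border of p[:i + 1]
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = border[k - 1]
        if p[i] == p[k]:
            k += 1
        border[i] = k
    return len(p) - (border[-1] if p else 0)


def search_counting(text: str, pattern: str, galil: bool) -> tuple[list[int], int]:
    """Right-to-left search returning (0-based match starts, character comparisons)."""
    n, m = len(pattern), len(text)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    period = smallest_period(pattern)
    matches, comparisons = [], 0
    s, known = 0, 0   # known: pattern[:known] is already known to match at this alignment
    while s <= m - n:
        i = n - 1
        while i >= known:                     # never re-compare the known prefix
            comparisons += 1
            if pattern[i] != text[s + i]:
                break
            i -= 1
        if i < known:                         # everything right of the known prefix matched
            matches.append(s)
            s += period                       # least shift leaving a prefix on the match
            known = n - period if galil else 0
        else:
            s += 1                            # simplified: no bad-character/good-suffix rule
            known = 0
    return matches, comparisons


print(search_counting("a" * 20, "aaaa", galil=False)[1])  # 68 = 4 * 17
print(search_counting("a" * 20, "aaaa", galil=True)[1])   # 20
```

Without the skip, each of the 17 alignments costs 4 comparisons; with it, only the first alignment pays the full 4 and every later one costs a single comparison.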

    Implementations

    Various implementations exist in different programming languages. In C++, Boost provides the generic Boyer–Moore search[2] implementation under the Algorithm library.

    Below are a few simple implementations.
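As one possible simple implementation, the Python sketch below combines a bad-character shift with a precomputed good-suffix shift array and takes the larger of the two at each mismatch. Two simplifications are assumptions of this sketch rather than details from the text. First, the bad-character rule uses only the rightmost occurrence of each character in the whole pattern, instead of the 2D table described under Preprocessing. Second, the good-suffix shifts come from a common array-based formulation of the strong good suffix rule, instead of the L and H tables. Helper names are hypothetical, and the suffix-length helper is quadratic for clarity.

```python
def _suffix_lengths(p: str) -> list[int]:
    """suff[i] = length of the longest common suffix of p[:i + 1] and p (naive)."""
    n = len(p)
    suff = [0] * n
    for i in range(n):
        k = 0
        while k <= i and p[i - k] == p[n - 1 - k]:
            k += 1
        suff[i] = k
    return suff


def _good_suffix_shifts(p: str) -> list[int]:
    """gs[i] = safe shift when p[i] mismatches after p[i + 1:] matched (strong rule)."""
    n = len(p)
    suff = _suffix_lengths(p)
    gs = [n] * n
    # A prefix of p matches a suffix of the good suffix.
    j = 0
    for i in range(n - 1, -1, -1):
        if suff[i] == i + 1:               # p[:i + 1] is also a suffix of p
            while j < n - 1 - i:
                if gs[j] == n:
                    gs[j] = n - 1 - i
                j += 1
    # The good suffix reoccurs, preceded by a different character.
    for i in range(n - 1):
        gs[n - 1 - suff[i]] = n - 1 - i
    return gs


def boyer_moore(text: str, p: str) -> list[int]:
    """0-based starts of all occurrences of p in text."""
    n, m = len(p), len(text)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    gs = _good_suffix_shifts(p)
    last = {c: i for i, c in enumerate(p)}   # rightmost occurrence of each character
    hits = []
    s = 0
    while s <= m - n:
        i = n - 1
        while i >= 0 and p[i] == text[s + i]:   # compare right to left
            i -= 1
        if i < 0:
            hits.append(s)
            s += gs[0]                     # shift so a proper prefix overlaps the match
        else:
            bad_char = i - last.get(text[s + i], -1)
            s += max(gs[i], bad_char)
    return hits


print(boyer_moore("ANPANMAN", "PAN"))  # [2]
```

Taking the maximum is safe because each rule on its own only rules out alignments that cannot match.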

    Variants

    The Boyer-Moore-Horspool algorithm is a simplification of the Boyer-Moore algorithm using only the bad character rule.
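Here is a minimal Python sketch of Horspool's simplification. It assumes the common formulation in which the shift is looked up for the text character aligned with the last pattern position (not the mismatched character), using that character's last occurrence in the pattern excluding the final position. The function name is hypothetical.

```python
def horspool(text: str, pattern: str) -> list[int]:
    """0-based starts of all occurrences, using Horspool's single shift table."""
    n, m = len(pattern), len(text)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    # Distance from the last occurrence of c in pattern[:-1] to the end of the pattern;
    # later positions overwrite earlier ones, so the rightmost occurrence wins.
    shift = {c: n - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    s = 0
    while s <= m - n:
        i = n - 1
        while i >= 0 and pattern[i] == text[s + i]:   # compare right to left
            i -= 1
        if i < 0:
            hits.append(s)
        # Shift on the text character under the last pattern position; n if absent.
        s += shift.get(text[s + n - 1], n)
    return hits


print(horspool("ANPANMAN", "PAN"))  # [2]
```

Every shift in the table is at least 1, because the final pattern position is excluded when the table is built.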

    The Apostolico-Giancarlo algorithm speeds up the process of checking whether a match has occurred at the given alignment by skipping explicit character comparisons. This uses information gleaned during the preprocessing of the pattern in conjunction with suffix match lengths recorded at each match attempt. Storing suffix match lengths requires an additional table equal in size to the text being searched.


    References

    [1] Hume and Sunday (1991) [Fast String Searching] SOFTWARE PRACTICE AND EXPERIENCE, VOL. 21(11), 1221–1248 (NOVEMBER 1991)
    [2] http://www.boost.org/doc/libs/1_53_0/libs/algorithm/doc/html/algorithm/Searching.html#the_boost_algorithm_library.Searching.BoyerMoore

    External links

    Original paper on the Boyer-Moore algorithm (http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf)
    An example of the Boyer-Moore algorithm (http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html) from the homepage of J Strother Moore, co-inventor of the algorithm
    Richard Cole's 1991 paper proving runtime linearity (http://www.cs.nyu.edu/cs/faculty/cole/papers/CHPZ95.ps)


    Article Sources and Contributors

    String (computer science) Source: https://en.wikipedia.org/w/index.php?oldid=586639232 Contributors: 216.60.221.xxx, A4b3c2d1e0f, Ahoerstemeier, Alai, Alan Millar, Alfabalon, AnOddName, Andreas Kaufmann, Andres, Andrew Helwer, Andy Dingley, Anphanax, Anthony Borla, Arthur Frayn, AxelBoldt, B4hand, BIL, Beland, Bevo, Bkkbrad, Black Falcon, Bogdangiusca, Boleslav Bobcik, Borgx, Bryan Derksen, BurntSky, C45207, CanadianMaritimer, CanisRufus, Captain Conundrum, Castaa, Cedders, Cgtdk, Charles Matthews, Charvest, Chris the speller, Christopherlin, Conversion script, Courcelles, Cybercobra, Dainomite, Damian Yerrick, Dcoetzee, Denispir, Dennis714, Dereckson, Derek farn, Doctorfluffy, Doug Bell, Dreadstar, Dreftymac, Drj, Drphilharmonic, Dysprosia, Elassint, Eloquence, Error, Fabartus, Fatboar, Forderud, Fropuff, Fudo, Furrykef, GNRY09, Gaiacarra, Garyzx, Georg Peter, Ghewgill, Giftlite, GoingBatty, Gparker, Gregbard, Gumum, Gurch, Gwil, Gyro Copter, Hairy Dude, Hippopha, Hornlo, Howcheng, IOLJeff, Ian Pitchford, Icep, JDP90, JLaTondre, Jarble, Jay-Sebastos, Jc3s5h, Jeremysr, Jiri 1984, Jncraton, John254, JonathanCross, Jonnabuz, Jordandanford, Kbdank71, Komarov om, Koyaanis Qatsi, Kusunose, Kyng, Lambiam, LilHelpa, Linas, Loadmaster, Local.empire, Luke Igoe, Mad Tinman, Marc van Leeuwen, MattGiuca, Maximaximax, Michael Hardy, Mikeblas, Minghong, MisterSheik, Mjb, Mkweise, Mlpkr, Mojo Hand, Murray Langton, Murtasa, Mythas11, Nasnema, Nbarth, Neelix, Nevyn, Obradovic Goran, Oleg Alexandrov, OpinionPerson, Pantser, Patrick, Pengo, Perique des Palottes, Peterdjones, Pexatus, Philg88, Pimlottc, Pinguin.tk, Plugwash, Pnm, Ptarjan, Qwertyus, R. S. Shaw, RTC, Richard W.M. Jones, RogerofRomsey, Rory O'Kane, Ruud Koot, S.rvarr.S, Sahirshah, Scarfboy, Sebbe, Seec77, Sewing, Shahab, Shirifan, Sietse Snel, Slady, Spearhead, Spitzak, Stephen Gilbert, StuartBrady, TBloemink, Taemyr, TakuyaMurata, Teles, Tentinator, The Anome, The Thing That Should Not Be, TheIncredibleEdibleOompaLoompa, Thumperward, Tigrisek, Tobias Bergemann, Tompsci, Tortoise3, Treekids, Ubermonkey, Underrated1 17, Urod, Uuf6429, Vadmium, Wayfarer, Wikipelli, WinterSpw, Witharebelyell, Zundark, , 149 anonymous edits

    String searching algorithm Source: https://en.wikipedia.org/w/index.php?oldid=585633011 Contributors: A3RO, Algebran, Alvestrand, Andreas Kaufmann, Angela, Ascnder, Ayonggu114ster, B4hand, Bender235, Bisqwit, Boleslav Bobcik, Borgx, BrokenSegue, CRGreathouse, Catmoongirl, Conversion script, Dcirovic, Dcoetzee, Drake Wilson, Dummy6277, Excirial, Fredrik, Hariva, HerrMister, Ijustam, Jafet, Jan.papousek, Jarble, Jaredwf, Jwpat7, KerinthIT, Kku, Kragen, Kumioko (renamed), Ltickett, Macrakis, Mandarax, Materialscientist, MaxEnt, MegaHasher, Mikeblas, Mlpkr, Mordomo, Ms2ger, NJM, Neilbeach, Netrapt, Nils Grimsmo, Nils.grimsmo, Nixdorf, Nyxos, OldCodger2, PFHLai, Phe, PhilKnight, Plugwash, Pne, Poor Yorick, Quuxplusone, Ruud Koot, Sam Hocevar, Shadowhillway, Shehzad.kazmi, Shekharsuman93, Sniffnoy, Squash, Squidonius, SummerWithMorons, Szabolcs Nagy, Taw, Thosylve, Tony1, TripleF, Tristan Schmelcher, Trusilver, Watcher, Webmeischda, 91 anonymous edits

    Knuth–Morris–Pratt algorithm Source: https://en.wikipedia.org/w/index.php?oldid=586295098 Contributors: A5b, Acntx, Adityasinghhhh, Adityasinghhhhh, Almi, Amitchaudhary, Andrew Helwer, Antaeus Feldspar, Arlolra, Axings, Bgwhite, B ikri, Blinken, Borgx, Bruguiea, Bryan Derksen, Byronknoll, Chadernook, Chester br, Chucheng, C rescent Moon, Curly Turkey, David Eppstein, Dcoetzee, Deltahedron, Diagonalfish, Dmshafi, Ee19921, Elias, Erroneous01, Fibonacci, Glrx, GregorB, Haojin, Hariva, Hddqsb, Hobsonlane, J04n, Jagat sastry, Jaredwf, Javy413, Jeremiah Mountain, Jocapc, Johnuniq, Jon Awbrey, KSmrq, Krischik, LOL, Little Mountain 5, LokiClock, MadLex, Madoka, Magicheader, Magioladitis, Mandarax, Mark T, Mhss, Michael Hardy, Mikespedia, NeilFraser, Niceguyedc, Olau, OnePlusTwelve, PACO, Peni, Phe, Pranith, Pratik.mallya, Quuxplusone, RainCT, Raknarf44, Rich Farmbrough, Ruud Koot, Ryan Reich, ShellKinney, Sikuyihsoy, Smallman12q, Spencer4Hire, Swift, Timwi, Tom Alsberg, Tregoweth, Tushicomeng, Versus, Wahas1234, Wikibob, Winston Chuen-Shih Yang, Wisiti, Ww, Xterminatrix, Ycl6, Zhaladshar, Znora, 167 anonymous edits

    Boyer–Moore string search algorithm Source: https://en.wikipedia.org/w/index.php?oldid=585928282 Contributors: Abednigo, Adfellin, Alex.mccarthy, Ancheta Wis, Andrew Helwer, Antaeus Feldspar, Art1x com, Aunndroid, BD2412, Ba labiot, Barry Fruitman, Beland, Biker Biker, Billlava, Billyoneal, Bisqwit, Blueyoshi321, Booyabazooka, Borgx, Brunobowden, Chokfung, ChrisGualtieri, Cneubauer, Cwalgampaya, Czlaner, Damian Yerrick, DaveWF, Dcoetzee, Dekart, Deqing.huang, DocWatson42, Donner60, Dpakoha, Duplicity, Edsarian, Eeppeliteloop, Elassint, Evgeny Lykhin, Eyal0, Fbriere, Fib, Fre aky Dug, Fredrik, Furrykef, Greenrd, Icktoofay, IgushevEdward, Infinito, J12f, Jashmenn, Jemfinch, Jinghaoxu, JoeMarfice, JustAHappyCamper, Jy2wong, Karnan, Kayvee, Klutzy, Kostmo, Kri, Kucyla, Lauren Lester, Lisamh, Lumpynifkin, M.O.X, Martinkunev, Mathiasl26, Maximus Rex, Mboverload, Mi1ror, Mikeblas, Moink, Mr flea, Murray Langton, Neelpulse, Nemo bis, Nickjhay, Nneonneo, Ott2, PedR, Phe, PhilKnight, Plindenbaum, Pne, Quuxplusone, RJFJR, Radagast83, Rich Farmbrough, Ruud Koot, Ryan Reich, SeekerOfThePath, Smallman12q, Snowgene, SummerWithMorons, Szabolcs Nagy, Thegeneralguy, Tide rolls, Tim Starling, Tim.head, Tobias Bergemann, Triddle, TripleF, Vacation9, Watcher, Wikibob, Wthrower, Ww, Xillion, YUL89YYZ, Zearin, 218 anonymous edits


    Image Sources, Licenses and Contributors

    Image:DFA search mommy.svg Source: https://en.wikipedia.org/w/index.php?title=File:DFA_search_mommy.svg License: Public Domain Contributors: Dcoetzee, Jochen Burghardt, Kilom691


    License

    Creative Commons Attribution-Share Alike 3.0
    //creativecommons.org/licenses/by-sa/3.0/