aho-corasick automata - stanford...

445
Aho-Corasick Automata

Upload: others

Post on 08-Oct-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Aho-Corasick Automata

Page 2: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

String Data Structures

● Over the next few days, we're going to be exploring data structures specifically designed for string processing.

● These data structures and their variants are frequently used in practice

Page 3: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Looking Forward

● Today: Aho-Corasick Automata● A fast data structure for string matching.

● Thursday: Suffix Trees● An absurdly versatile string data structure.

● Tuesday: Suffix Arrays● Suffix-tree like performance with array-like

space usage.

Page 4: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

String Searching

Page 5: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The String Searching Problem

● Consider the following problem:

Given a string T and k nonempty stringsP₁, …, Pₖ, find all occurrences of P₁, …, Pₖ

in T.● T is called the text string and P₁, …, Pₖ are

called pattern strings.● This problem was originally studied in the

context of compiling indexes, but has found applications in computer security and computational genomics.

Page 6: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

Page 7: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Some Terminology

● Let m = |T|, the length of the string to be searched.

● Let n = |P₁| + |P₂| + … + |Pₖ| be the total length of all the pattern strings.

● Let Lmax be the length of the longest pattern string.

● Assume that strings are drawn from an alphabet Σ, where |Σ| is some constant.

● We'll use these terms when talking about the runtime of the algorithms and data structures we'll explore over the next couple of days.

Page 8: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

How quickly can we solve the stringsearching problem?

Page 9: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Let's start with a naïve approach.

Page 10: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 11: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 12: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 13: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 14: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 15: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 16: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 17: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 18: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 19: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 20: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 21: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 22: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 23: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 24: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 25: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 26: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 27: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 28: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 29: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 30: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 31: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 32: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 33: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 34: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 35: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 36: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 37: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

a b

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

For each position in T: For each pattern string Pᵢ:

Check if Pᵢ appears at that position.

Page 38: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Analyzing Our Approach

● As before, let m be the length of the text and n the total length of the pattern strings.

● For each character of the text string T, in the worst case, we scan over all n total characters in the patterns.

● Time complexity: O(mn).● Is this a tight bound?

Page 39: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Θ(mn)

a

a

Pattern Strings

a a a a a a a aa

a a

a a a a

a

Page 40: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Can we do better?

Page 41: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 42: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 43: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 44: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 45: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 46: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 47: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 48: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 49: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 50: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 51: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 52: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 53: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 54: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

a b e d g e t

e

e t

u t

Page 55: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Parallel Searching

● Idea: Rather than searching the pattern strings in serial, try searching them in parallel.

● Intuitively, this should cut down on a lot of the unnecessary rescanning that we're doing.

● Challenge: How exactly do we do this in practice?

Page 56: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

Page 57: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

Page 58: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

This data structure is called a trie. It comes from the word retrieval. It is not

pronounced like “retrieval.”

This data structure is called a trie. It comes from the word retrieval. It is not

pronounced like “retrieval.”

Page 59: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Representing Tries

● Each trie node needs to store pointers to its children.

● There are many different data structures we could use to store these pointers.

● For today, we'll assume we have an array of |Σ| pointers, one per possible child.

● You'll explore variants on this strategy in the problem set.

a

c

t

Page 60: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Representing Tries

● Each trie node needs to store pointers to its children.

● There are many different data structures we could use to store these pointers.

● For today, we'll assume we have an array of |Σ| pointers, one per possible child.

● You'll explore variants on this strategy in the problem set.

Page 61: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 62: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 63: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 64: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 65: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 66: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 67: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 68: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 69: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 70: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 71: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 72: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 73: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 74: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 75: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 76: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a t

a t e

a b

a b o

b e

b e d

e d g

g

Pattern Strings

e

e t

u t

a

b

o

u

t

t

e

b

e

d

e

d

g

e

g

e

t

a b e d g e t

Page 77: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Analyzing our New Algorithm

● Let's suppose we've already constructed the trie. How much work is required to perform the match?

● For each character of T, we inspect as most as many characters as exist in the deepest branch of the trie.

● Time complexity: O(mLmax), where Lmax is the length of the longest pattern string. (Do you see why?)

● In the (reasonable) case where Lmax is much smaller than n, this is a huge win over before. If Lmax is “objectively” small, this is a pretty good runtime.

● How much time does it take to build the trie?

Page 78: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

Page 79: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

Page 80: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e a

Page 81: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e a t

Page 82: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e a t e

Page 83: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e a t e

Page 84: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e a t e

Page 85: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

Page 86: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

Page 87: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

Page 88: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

Page 89: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

Page 90: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 91: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 92: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 93: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 94: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 95: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

Page 96: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b e

Page 97: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b e

Page 98: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b eb

Page 99: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b eb e

Page 100: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b eb e

Page 101: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building a Trie

● Claim: Given a set of strings P₁, …, Pₖ of total length n, it's possible to build a trie for those strings in time Θ(n).

a t e

a n

a t e

n

a t

b eb e

Page 102: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Our Strategies

● Following our foray into RMQ, we'll say that a solution to multi-string matching runs in time ⟨p(m, n), q(m, n)⟩ if the preprocessing time is p(m, n) and the matching time is q(m, n).

● We now have two approaches:● No preprocessing: ⟨O(1), O(mn)⟩.● Trie searching: ⟨O(n), O(mLmax)⟩.

● Can we do better?

Page 103: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

t

r

o a r s

a t

Page 104: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

Page 105: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 106: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 107: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 108: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 109: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 110: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 111: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 112: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 113: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 114: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 115: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 116: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 117: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 118: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 119: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 120: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s

Page 121: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r sThis red link is called a suffix

link. We'll talk about them more formally in a minute.

This red link is called a suffix link. We'll talk about them more formally in a minute.

Page 122: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 123: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 124: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 125: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 126: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 127: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 128: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 129: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 130: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 131: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 132: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

o a r t

Page 133: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

Page 134: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 135: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 136: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 137: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 138: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 139: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 140: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 141: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 142: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 143: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 144: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 145: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 146: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 147: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a t

Page 148: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

Page 149: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 150: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 151: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 152: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 153: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 154: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 155: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 156: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 157: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 158: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 159: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 160: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 161: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 162: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 163: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 164: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r sIn general, suffix links might jump the

red cursor forward more than one step. The number of steps taken is equal to

the change of depth in the trie.

In general, suffix links might jump the red cursor forward more than one step. The number of steps taken is equal to

the change of depth in the trie.

Page 165: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 166: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 167: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 168: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 169: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 170: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 171: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o a r s o a r s

Page 172: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 173: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 174: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 175: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 176: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 177: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 178: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 179: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 180: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 181: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 182: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 183: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 184: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 185: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 186: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 187: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 188: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 189: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a r

s o a

Pattern Strings

ta

r

o a r s

tr

s o a

o a r s

r

t

a t

s o t a t

Page 190: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Suffix Links

● A suffix link (sometimes called a failure link) is a red edge from a trie node corresponding to string α to the trie node corresponding to a string ω such that ω is the longest proper suffix of α that is still in the trie.

● Intuition: When we hit a part of the string where we cannot continue to read characters, we fall back by following suffix links to try to preserve as much context as possible.

● Every node in the trie, except the root (which corresponds to the empty string ε), will have a suffix link associated with it.

Page 191: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Why Suffix Links Matter

● Suffix links can substantially improve the performance of our string search.

● At each step, we either● advance the black (end) pointer forward in the trie, or● advance the red (start) pointer forward.

● Each pointer can advance forward at most O(m) times.● This reduces the amount of time spent scanning

characters from O(mLmax) down to Θ(m).

● This is only useful if we can compute suffix links quickly... which we'll see how to do later.

Page 192: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

A Problem with our Optimization

Page 193: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

s

Pattern Strings

t i n

i

t i n g

Page 194: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 195: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 196: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 197: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 198: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 199: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 200: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 201: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 202: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 203: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 204: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 205: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 206: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

What Happened?

● Our heavily optimized string searcher no longer starts searching from each position in the string.

● As a result, we now might forget to output matches in certain cases.

● We need to figure out● when this happens, and● how to correct for it.

Page 207: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 208: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 209: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 210: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 211: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 212: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 213: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 214: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g We missed the pattern string i because it's a proper suffix sti.

We missed the pattern string i because it's a proper suffix sti.

Page 215: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 216: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 217: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g We missed both tin and in because each is a proper

suffix of stin.

We missed both tin and in because each is a proper

suffix of stin.

Page 218: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 219: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

How do we address this?

Page 220: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 221: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 222: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 223: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 224: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 225: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 226: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 227: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 228: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 229: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 230: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 231: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 232: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 233: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 234: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 235: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 236: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 237: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 238: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 239: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Problem

● The approach described previously will ensure that we don't miss any patterns, but it increases the complexity of the search.

● Specifically, we do O(Lmax) additional work at each character following suffix links backwards.

● This brings our search cost back up to O(mLmax) again – and that's too slow.

● Can we do better?

Page 240: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 241: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 242: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 243: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

This blue arrow is called an output link. Whenever we visit this gold node, we'll output the string represented by the node at the end of the blue arrow.

This blue arrow is called an output link. Whenever we visit this gold node, we'll output the string represented by the node at the end of the blue arrow.

Page 244: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 245: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 246: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

By precomputing where we eventually need to end up, we can instantly read off any extra patterns to emit at this

point. As you'll see, we can precompute these links really quickly!

By precomputing where we eventually need to end up, we can instantly read off any extra patterns to emit at this

point. As you'll see, we can precompute these links really quickly!

Page 247: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 248: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 249: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Even nodes that themselves correspond to are patterns might need output links if other patterns also end

at the corresponding string.

Even nodes that themselves correspond to are patterns might need output links if other patterns also end

at the corresponding string.

Page 250: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 251: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 252: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Notice that the blue edges here form a linked list. If we visit this node, we

need to output everything in the chain, not just the “tin” node we're

immediately pointing at.

Notice that the blue edges here form a linked list. If we visit this node, we

need to output everything in the chain, not just the “tin” node we're

immediately pointing at.

Page 253: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 254: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 255: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 256: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 257: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 258: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 259: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 260: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 261: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 262: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 263: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 264: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 265: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 266: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 267: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 268: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 269: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 270: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Final Matching Algorithm

● Start at the root node in the trie.● For each character c in the string:

● While there is no edge labeled c:– If you're at the root, break out of this loop.– Otherwise, follow a suffix link.

● If there is an edge labeled c, follow it.● If the current node corresponds to a pattern,

output that pattern.● Output all words in the chain of output links

originating at this node.

Page 271: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Runtime Impact

Page 272: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a a

a

Page 273: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 274: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 275: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 276: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 277: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 278: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 279: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 280: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 281: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 282: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 283: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 284: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 285: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 286: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 287: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 288: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 289: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 290: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 291: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 292: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 293: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 294: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 295: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 296: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 297: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 298: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 299: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 300: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 301: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 302: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a aa

a a

a

Pattern Strings

a a a

a

a a aa a a a a

a

a a a

Page 303: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Runtime

● In the worst case, we may have to spend a huge amount of time listing off all the matches in the string.

● This isn't the fault of the algorithm – any algorithm that matches strings this way would have to spend the time reporting matches.

● To account for this, let z denote the number of matches reported by our algorithm.

● The runtime of the match phase is then Θ(m + z), with the m term coming from the string scanning and the z term coming from the matches.● You sometimes hear algorithms whose runtime depends on how

much output is generated referred to as output-sensitive algorithms.

Page 304: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Where We Are

● Given the matching automaton (which is called an Aho-Corasick automaton or an AC automaton), we can find all occurrences of the pattern strings in any text of length m in time Θ(m+z).

● To see whether this is worthwhile, we need to see how quickly we can build the automaton.

Page 305: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Time-Out for Announcements!

Page 306: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Problem Set One

● As a friendly reminder, Problem Set One is due this Thursday at 3:00PM.● All solutions must be submitted electronically through

GradeScope. We strongly recommend leaving a few hours' buffer time so that you can get everything set up properly.

● If you haven't started yet... you probably should go and do that. ☺

● We've got office hours throughout the week if you have questions and you're welcome to ask questions on Piazza.

Page 307: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

● Stanford WiCS is hosting HackOverflow, a hackathon for programmers of all skill levels. It's coming up on Saturday, April 16 from 10AM – 10PM. Everyone is welcome!

● Highly recommended! If you've never been to a hackathon before, this is one of the best places to start.

● Want to attend? RSVP using this link.● Want to volunteer at the event or serve as a

mentor? RSVP at this link.

HackOverflow

Page 308: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

oSTEM Mixer

● Stanford's chapter of oSTEM (Out in STEM) is hosting a mixer event tomorrow, April 6, at 6PM at the LGBT-CRC.

● Interested in attending? Want to get involved in oSTEM leadership? Feel free to stop on by! Everyone is welcome.

● If you'd like to RSVP, you can use this link.

Page 309: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Back to CS166!

Page 310: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building the Aho-Corasick Automaton

Page 311: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Building the Automaton

● To construct the Aho-Corasick automaton, we need to● construct the trie,● construct suffix links, and● construct output links.

● We know we can build the trie in time Θ(n) using our logic from before.

● How quickly can we construct suffix and output links?

Page 312: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

Page 313: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 314: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 315: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 316: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 317: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 318: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 319: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 320: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 321: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 322: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 323: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 324: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 325: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 326: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 327: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

An Initial Algorithm

● Here is a simple, brute-force approach for computing suffix links:● For each node in the trie:

– Let α be the string that this particular node corresponds to.– For each proper suffix ω of α:

● Look up ω in the trie.● If the search ends up at some trie node, point the suffix link there

and stop.

● This approach is not very efficient – that doubly-nested loop is exactly the sort of thing we're trying to avoid.

● Can we do better?

Page 328: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 329: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 330: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o

e

s

o

t a

s

c

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a t

t

a

Page 331: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Fast Suffix Link Construction

Page 332: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

● Key insight: Suppose we know the suffix link for a node labeled w. After following a trie edge labeled a, there are two possibilities.

● Case 1: xa exists.

w wa

x xaa

a

w a

x a

Page 333: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

● Key insight: Suppose we know the suffix link for a node labeled w. After following a trie edge labeled a, there are two possibilities.

● Case 2: xa does not exist.

w wa

x

a

w a

x

y ay yaa

Page 334: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

● Key insight: Suppose we know the suffix link for a node labeled w. After following a trie edge labeled a, there are two possibilities.

● Case 2: xa does not exist.

w wa

x

a w a

x

yy

z zaa z a

Page 335: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

● To construct the suffix link for a node wa:● Follow w's suffix link to node x.● If node xa exists, wa has a suffix link to xa.● Otherwise, follow x's suffix link and repeat.● If you need to follow backwards from the root, then wa's

suffix link points to the root.

● Observation 1: Suffix links point from longer strings to shorter strings.

● Observation 2: If we precompute suffix links for nodes in ascending order of string length, all of the information needed for the above approach will be available at the time we need it.

Page 336: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 337: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 338: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 339: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 340: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 341: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 342: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 343: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 344: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 345: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 346: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 347: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 348: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 349: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 350: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 351: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 352: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 353: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 354: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 355: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 356: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 357: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 358: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 359: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 360: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 361: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 362: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 363: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 364: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 365: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 366: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 367: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 368: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 369: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 370: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 371: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 372: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 373: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 374: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 375: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 376: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 377: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 378: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 379: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 380: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 381: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 382: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 383: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 384: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 385: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 386: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 387: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 388: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a t

s

Pattern Strings

e

c o a t

t

s

a

o

a

c

Page 389: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Constructing Suffix Links

● Do a breadth-first search of the trie, performing the following operations:● If the node is the root, it has no suffix link.● If the node is one hop away from the root, its suffix link

points to the root.● Otherwise, the node corresponds to some string wa.● Let x be the node pointed at by w's suffix link. Then, do the

following:– If the node xa exists, wa's suffix link points to xa.– Otherwise, if x is the root node, wa's suffix link points to the root.– Otherwise, set x to the node pointed at by x's suffix link and

repeat.

Page 390: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Analyzing Efficiency

● How much time does it take to actually build all the suffix links?

● When filling in any individual suffix link, we might have to keep walking backwards in the trie following suffix links repeatedly while searching for a place to extend.

● Intuitively, it seems like it should be quadratic in the length of the longest string in the trie.

● Is that bound tight?

Page 391: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Analyzing Efficiency

● Claim: The previously-described algorithm for computing suffix links takes time O(n).

● Intuition: Focus on any one word in the trie. As you add suffix links, keep track of the depth of the node pointed at by the current node's suffix link.

Page 392: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

o a t

et

s

o

t a

s

a

c

Page 393: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 394: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 395: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 396: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 397: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 398: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 399: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 400: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 401: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 402: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 403: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

a

et

s

o

t a

o a t s

c

Page 404: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Construction Efficiency

● Focus on the time to fill in the suffix links for a single pattern of length h.

● The gold node (where the previous suffix link points) begins at the root. At each step, the gold node● takes some number of steps backward, then● takes at most one step forward.

● The gold node cannot take more steps backward than forward. Therefore, across the entire construction, the gold node takes at most h steps backward.

● Total time required to construct suffix links for a pattern of length h: O(h).

● Total time required to construct all suffix links: O(n).

Page 405: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Computing Output Links

Page 406: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 407: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 408: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 409: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 410: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 411: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 412: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 413: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 414: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 415: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 416: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 417: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 418: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 419: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 420: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 421: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 422: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

s t i n g

Page 423: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Idea

● Some trie nodes represent strings that have a pattern string as a proper suffix.

● Our goal is to introduce output links so that, when these nodes are visited, the automaton outputs all the suffixes that end there.

Page 424: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Output Links, Formally

● The output link at a node corresponding to a string w points to● the node corresponding to the longest proper

suffix of w that is a pattern, or● null if no such suffix exists.

● By always pointing to the node corresponding to the longest such word, we ensure that we chain together all the patterns using output links.

Page 425: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 426: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 427: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 428: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 429: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 430: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 431: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 432: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

We want the gold node to point to the first node reachable by suffix links that's also a pattern.

The blue node (at the end of the suffix link) isn't a pattern, but it knows where the first pattern is. We set the gold node's output link to equal the blue node's output link.

We want the gold node to point to the first node reachable by suffix links that's also a pattern.

The blue node (at the end of the suffix link) isn't a pattern, but it knows where the first pattern is. We set the gold node's output link to equal the blue node's output link.

Page 433: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 434: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 435: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 436: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 437: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 438: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

Page 439: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

i n

n

s t

t

i n

s

Pattern Strings

t i n

i

t i n g

i

n gi

We have the gold node point to the blue node because the blue

node corresponds to a word.

We have the gold node point to the blue node because the blue

node corresponds to a word.

Page 440: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Filling In Output Links

● Initially, set every node's output link to be a null pointer.

● While doing the BFS to fill in suffix links, set the output link of the current node v as follows:● Let u be the node pointed at by v's suffix link.● If u corresponds to a pattern, set v's output link to u

itself.● Otherwise, set v's output link to u's output link.

● Time complexity of building all output links: O(n).

Page 441: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Net Complexity

● Our preprocessing time is● Θ(n) work to build the trie,● O(n) work to fill in suffix links, and● O(n) work to fill in output links.

● Total preprocessing time: Θ(n).

Page 442: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

The Final Totals

● We now have a multi-string search data structure with time complexity

⟨O(n), O(m + z)⟩. ● In other words, this is exceptionally good

in the case where there are a fixed set of patterns and a variable string to search.

Page 443: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Where We're Going

● A powerful data structure called the suffix tree lets us solve this problem in

⟨O(m), O(n + z)⟩. ● In other words, it excels when there's a

fixed string to search and a variable set of patterns.

Page 444: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

More to Explore

● There are a number of other approaches to solving this problem, and there's often a large gap between theory and practice!

● The Boyer-Moore algorithm searches for a single pattern in a large text. It can actually run in sublinear time if the string searched for isn't present, but runs in quadratic case if a match exists.

● The Commentz-Waltz algorithm generalizes Boyer-Moore for multiple strings and has similar time guarantees, but is faster in practice.

● The Knuth-Morris-Pratt algorithm is a special case of the Aho-Corasick algorithm when there is just one pattern. You'll explore it on the upcoming problem set (after the TAs confirm it's not too difficult to derive it. ☺)

Page 445: Aho-Corasick Automata - Stanford Universityweb.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/02/Slides02.pdfLooking Forward Today: Aho-Corasick Automata A fast data structure

Next Time

● Suffix Trees● A highly versatile, flexible, powerful data

structure for string processing.

● Patricia Tries● Shrinking down trie space usage.

● Applications of RMQ● Getting some mileage out of Fischer-Heun.