SCALED Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan

Post on 22-Dec-2015


SCALED Pattern Matching

Amihood Amir, Ayelet Butman, Moshe Lewenstein

Bar-Ilan University and Johns Hopkins University

Motivation

Searching for Templates in Aerial Photographs

Input: Aerial photo, template

Task: Search for all locations where the template appears in the image

Model

• Low level (pixel level): avoid costly processing

• Asymptotically efficient solutions

• Serial exact algorithms

Types of Approximations

Local errors: level of detail, occlusion, noise

Results (AL-88, AF-95):

• O(n² log m): mismatches

• O(n²k²(: edit distance, k errors, rectangular patterns

• O(n²k√(m log m) √(k log k)): edit distance, k errors, half-rectangular patterns

Types of Approximation

Orientation

Results: O(n²m ) (FU-98); O(n²m³) (ACL-98)

Scaling

Natural scales, results: O(n), 1-d (EV-88); O(n² log |Σ|), 2-d (ALV-92); O(n²), dictionary (AC-96)

Real scales, this result: O(n), 1-d, truncation


It seems daunting, but…

CPM 2003, Morelia, Mexico

Problem: inherently inexact

What if the occurrence is 1½ times bigger?

What is the meaning of "½ a pixel"?

Solutions until now: Natural Scales. Consider only discrete scales 1, 2, 3, 4, 5

Definition

Text: n × n. Pattern: m × m.

Find all occurrences of the pattern in the text in all discrete sizes.

Discrete exact Scaled Matching

T:

A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A

P:

A A A
A C A
A A A

Discrete exact Scaled Matching

P:

Z U Y
K V S
X E T

P³:

Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
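The construction of P³ from P can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `scale_pattern` and the list-of-lists representation are assumptions made here, not part of the slides.

```python
def scale_pattern(pattern, s):
    """Return `pattern` scaled by the natural number s:
    every symbol becomes an s-by-s block of that symbol."""
    scaled = []
    for row in pattern:
        # Repeat each symbol s times horizontally.
        stretched = [sym for sym in row for _ in range(s)]
        # Repeat the stretched row s times vertically.
        scaled.extend(list(stretched) for _ in range(s))
    return scaled


P = [list("ZUY"), list("KVS"), list("XET")]
P3 = scale_pattern(P, 3)
for row in P3:
    print(" ".join(row))
```

Running it prints the 9 × 9 array shown for P³ on the slide.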

Idea: Fix a scale s

Constant amount of work for each square (s-block)

[Figure: the text partitioned into s × s blocks, n/s blocks along each side]

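Algorithm time

Time for scale s: $\left(\frac{n}{s}\right)^2 = \frac{n^2}{s^2}$

Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$

The sum $\sum_{s} \frac{1}{s^2}$ converges to a constant, making the total time O(n²).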
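As a worked version of the convergence claim, the finite sum can be bounded by the full series, whose value is the well-known constant π²/6. The explicit constant is added here for illustration and does not appear on the slide.

```latex
\sum_{s=1}^{n/m} \frac{n^2}{s^2}
  \;=\; n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}
  \;\le\; n^2 \sum_{s=1}^{\infty} \frac{1}{s^2}
  \;=\; \frac{\pi^2}{6}\, n^2
  \;=\; O(n^2).
```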

Problem: Real scales

Was open even for strings…

How do we define?

aabcccbb scaled to 2: aaaabbccccccbbbb

aabcccbb scaled to 1½: aaa b cccc bbb (truncate ½b, truncate ½c)

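Formally

Text $T$, $|T| = n$

Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$

Denote $a^r = aa \cdots a$ ($r$ times)

Problem Definition 1

Input: Pattern $P$, Text $T$

Output: All text locations where $a_1^{c_1} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, with $c_1 \ge \lfloor r_1 \alpha \rfloor$ and $c_j \ge \lfloor r_j \alpha \rfloor$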

Remark

α ≥ 1 means we only scale "up".

Reasons:

• Avoid the conceptual problem of loss of resolution.

• From "far enough" away everything looks the same.

• By our definition, for k < 1/m there is a match at every text location.

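Simplify definition

$a^2 b^1 c^3 b^4$ scaled to $\frac{3}{2}$: $a^{\lfloor 2 \cdot \frac{3}{2} \rfloor} b^{\lfloor 1 \cdot \frac{3}{2} \rfloor} c^{\lfloor 3 \cdot \frac{3}{2} \rfloor} b^{\lfloor 4 \cdot \frac{3}{2} \rfloor} = a^3 b^1 c^4 b^6$

Definition 2: Look for $a_1^{\lfloor r_1 \alpha \rfloor} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{\lfloor r_j \alpha \rfloor}$ in the text.

Example: P = aabcccbbbb

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by def. 2: daaaabccccbbbbbbbe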
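Scaling a string by a real factor with truncation can be illustrated with a short Python sketch. The helper names `run_lengths` and `scale_string` are hypothetical, and the use of `Fraction` (to keep the floor exact) is a choice made here rather than something the slides prescribe.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def run_lengths(s):
    """Split s into (symbol, run length) pairs, e.g. 'aab' -> [('a', 2), ('b', 1)]."""
    return [(sym, len(list(group))) for sym, group in groupby(s)]


def scale_string(s, alpha):
    """Scale every run of s by alpha and truncate the result (floor)."""
    alpha = Fraction(alpha)  # exact arithmetic, so the floor is never off by one
    return "".join(sym * floor(r * alpha) for sym, r in run_lengths(s))


print(scale_string("aabcccbb", 2))               # aaaabbccccccbbbb
print(scale_string("aabcccbb", Fraction(3, 2)))  # aaabccccbbb
```

Applied to P = aabcccbbbb with scale 3/2, the same function yields a³b¹c⁴b⁶, which is the part of daaabccccbbbbbbe between the delimiting d and e.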

Why are definitions equivalent?

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:

P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4

T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
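The split into a symbol part and a length part is a run-length encoding. A minimal Python sketch follows; the function name `split_runs` is an assumption made for illustration.

```python
from itertools import groupby


def split_runs(s):
    """Return (symbol part, length part) of s, i.e. its run-length encoding."""
    symbols, lengths = [], []
    for sym, group in groupby(s):
        symbols.append(sym)
        lengths.append(len(list(group)))
    return "".join(symbols), lengths


# The pattern and text of the example, written run by run to make the lengths visible.
P = "a" * 2 + "b" + "c" * 3 + "b" * 4
T = "d" + "a" * 3 + "b" + "c" * 4 + "b" * 6 + "e"

print(split_runs(P))  # ('abcb', [2, 1, 3, 4])
print(split_runs(T))  # ('dabcbe', [1, 3, 1, 4, 6, 1])
```

The split takes a single pass over each string, which is in line with the O(n+m) bound quoted for this step.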

Time

Time for split: O(n+m)

Finding Ps in Ts: O(n+m) (e.g., KMP)

HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n)).

Why?

- Find $a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor}$ in time O(f(n)).

- For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n)+n) = O(f(n))

Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting <text, pattern> pair.

This is the interval of possible scales, since:

• ⌊(t/p)·p⌋ = t, and for every α < t/p, ⌊αp⌋ < t

• ⌊((t+1)/p)·p⌋ = t+1, and for every α ≥ (t+1)/p, ⌊αp⌋ > t

Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: O(nm)

Example:

PL = 2 1 2 3 2

TL = 2 4 2 4 7 4 5 3

Location 1: [1, 3/2) [4, 5)

The intersection is empty, thus no scaled match in location 1. But…

Location 2: [2, 5/2) [2, 3) [2, 5/2) [7/3, 8/3) [2, 5/2)

The intersection is [7/3, 5/2), thus there is a scaled match in location 2.
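The naïve interval test can be written out as a short Python sketch. The names `scale_interval` and `naive_scaled_matches` are hypothetical. The sketch compares every entry of PL, including the first and last, against the text, and starts the lower bound at 1 to reflect the restriction α ≥ 1; both are assumptions made for this illustration.

```python
from fractions import Fraction


def scale_interval(t, p):
    """Scales alpha with floor(alpha * p) == t form the interval [t/p, (t+1)/p)."""
    return Fraction(t, p), Fraction(t + 1, p)


def naive_scaled_matches(PL, TL):
    """Return (location, lo, hi) for every 1-based location where all the
    intervals share a common scale in [lo, hi). PL is assumed non-empty."""
    matches = []
    for start in range(len(TL) - len(PL) + 1):
        lo, hi = Fraction(1), None  # only scales alpha >= 1 are allowed
        for p, t in zip(PL, TL[start:]):
            a, b = scale_interval(t, p)
            lo = max(lo, a)
            hi = b if hi is None else min(hi, b)
        if lo < hi:  # non-empty half-open intersection
            matches.append((start + 1, lo, hi))
    return matches


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(PL, TL))
# [(2, Fraction(7, 3), Fraction(5, 2))]
```

Each location costs O(m) interval updates, which gives the O(nm) total stated on the slide.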

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets Σs and Σt.

s parameterize-matches t if ∃ a bijection Π: Σs → Σt such that Π(s) = t.

Example:

s = a a b b b

t = x x y y y

Π(a) = x, Π(b) = y
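Whether two equal-length strings parameterize-match can be checked directly by building the bijection symbol by symbol. The sketch below is a plain check of the definition, with a hypothetical function name `p_match`; it is not the linear-time pattern matching algorithm cited as AFM-94.

```python
def p_match(s, t):
    """Return True if some bijection between the symbols of s and the
    symbols of t maps s onto t, position by position."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one afterwards,
        # so any later disagreement in either direction breaks the bijection.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))  # True, with a -> x and b -> y
print(p_match("aabbb", "xxyyx"))  # False, b would map to both y and x
print(p_match("ab", "xx"))        # False, the mapping is not one-to-one
```

Because the function only iterates and hashes, it also works on lists of integers such as PL and a window of TL.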

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g., Σ = {1, …, n}), parameterized matching can be done in time O(n).

The reduction

Lemma: ∃ α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

Possibility 1

TL: a, c ≠ a

PL: b, b

W.l.o.g. c ≥ a+1. For c = a+1 (smallest possible), the intervals are [a/b, (a+1)/b) and [(a+1)/b, (a+2)/b), which do not intersect.

Possibility 2

TL: a, a

PL: b, c ≠ b

W.l.o.g. c ≥ b+1. For c = b+1, the intervals are [a/b, (a+1)/b) and [a/(b+1), (a+1)/(b+1)).

Intersection not empty only if (a+1)/(b+1) > a/b, i.e.

ab + b > ab + a

b > a

But this can never happen if α ≥ 1.

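Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.

2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

Example

PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

p-matches: locations 1, 2, 6 and 7

Location 1: [5/2, 3) ∩ [2, 7/3) = ∅

Location 2: [3, 7/2) ∩ [5/3, 2) = ∅

Location 6: [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2), a scaled match

Location 7: [5, 11/2) ∩ [2, 7/3) = ∅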

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most O(√m) different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location. Total: O(n√m)
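The step from the sum bound to the O(√m) bound can be spelled out as follows, assuming (as the slide's wording suggests) that the j different numbers in PL are distinct positive integers, so their sum is at least 1 + 2 + … + j.

```latex
\frac{j(j+1)}{2} \;=\; \sum_{k=1}^{j} k
  \;\le\; \sum_{k=1}^{j} P_{i_k} \;\le\; m
\quad\Longrightarrow\quad
j^2 < 2m
\quad\Longrightarrow\quad
j < \sqrt{2m} = O(\sqrt{m}).
```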

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most 2n/j p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is O((2n/j) · j) = O(n).

Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in a p-match:

PL: $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$

TL: $a_1, a_2, \ldots, a_j$

Since the $a_k$ are j different positive numbers, $\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern ≥ (x·j²)/2

But: there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore…

Total count without overlaps ≥ $\frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly x·j/2 ≤ n, thus x ≤ 2n/j.

Open Problem

Give 1-d algorithm linear in run-length compressed text and pattern

Page 5: S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University

Types of ApproximationOrientation results O(nsup2m ) FU-98

O(nsup2msup3) ACL-98

Scaling Natural scales results O(n) 1-d EV-88

O(nsup2 log |Σ|) 2-d ALV-92

O(nsup2) dictionary AC-96

Real scales this result O(n) 1-d truncation

5

It seems daunting buthellip

CPM 2003 Morelia Mexico

Problem inherently inexact

What if occurrence is 1frac12 times bigger

What is the meaning of ldquofrac12 a pixelrdquo

Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5

DefinitionText Pattern

Find all occurrences of the pattern in the text in all discrete sizes

m

m

n

n

Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A

A A A A A A A A A A C C A A C A

A A A C C A A A A A C C A A A A

A A A C C A A A A A A A A

A A A A A A A A A A A A A

A A A A A A A A A C C A A

A A A A A A A A A C C A A

A A A C C C A A A A A A A

A A A C C C A A A A A A A

A A A C C C A A A A C A A

A A A A A A A A A A A A A

A A A A A A A A A C C A C

A A A A A A A A A A A A A

Discrete exact Scaled Matching

P Z U Y K V S X E T

Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T

Idea Fix a scale s

Constant amount of work for each square (s-block)

s

s

nns

Algorithm time

Time for scale s

Total time

converges to a constant

Making the total time O(nsup2)

sn2

2

mn

mn

ss ssn n

1

2

122

2

1

Problem Real scales

Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c

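Formally

Text: $T$, $|T| = n$

Pattern: $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$

Denote $a^r = aa \cdots a$ ($r$ times).

Problem Definition 1
Input: pattern $P$, text $T$.
Output: all text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, where $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$.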

Remark

$\alpha \ge 1$ means we only scale "up". Reasons:
- Avoid the conceptual problem of loss of resolution.
- From "far enough" away everything looks the same.
- By our definition, for $k < 1/m$ there is a match at every text location.

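Simplify definition

Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text.

Example: $P$ = aabcccbbbb $= a^2 b^1 c^3 b^4$. Scaled by $\alpha = 3/2$: $a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor} = a^3 b^1 c^4 b^6$

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by definition 2: daaaabccccbbbbbbbe (the first and last runs are longer than the scaled pattern requires)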

Why are definitions equivalent

Split text and pattern into a symbol part ($T_S$, $P_S$) and a length part ($T_L$, $P_L$).

Example:
$P$ = aabcccbbbb, $P_S$ = abcb, $P_L$ = 2 1 3 4
$T$ = daaabccccbbbbbbe, $T_S$ = dabcbe, $T_L$ = 1 3 1 4 6 1
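The split into a symbol part and a length part is a run-length encoding. A minimal sketch, assuming strings as input; the name `rle_split` is hypothetical:

```python
from itertools import groupby


def rle_split(s):
    """Split `s` into its symbol part and its length part.

    Returns (symbols, lengths): one symbol and one run length
    per maximal run of equal characters. Illustrative helper only.
    """
    symbols = []
    lengths = []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


print(rle_split("aabcccbbbb"))
print(rle_split("daaabccccbbbbbbe"))
```

The split takes one pass over each string, in line with the $O(n+m)$ time stated for this step.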

Time

Time for split: $O(n+m)$

Finding $P_S$ in $T_S$: $O(n+m)$ (e.g. KMP)

HARD PART: Finding $P_L$ in $T_L$

Definitions are Equivalent

Claim: Solving definition 2 in time $O(f(n))$ implies solving definition 1 in time $O(f(n))$.

Why:
- Find $a_2^{r_2} \cdots a_{j-1}^{r_{j-1}}$ in time $O(f(n))$.
- For each match, verify the 1st and last symbol in constant time in $T_S$ and $T_L$.

Total time: $O(f(n) + n) = O(f(n))$

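Naïve algorithm for matching $P_L$ in $T_L$

For each text location, position the pattern starting at that location and calculate the interval $[t/p, (t+1)/p)$ for each resulting <text, pattern> pair $(t, p)$.

This is the interval of possible scales, since:
- $\lfloor (t/p) \cdot p \rfloor = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$;
- $\lfloor ((t+1)/p) \cdot p \rfloor = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$.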

Check intersection

If the intersection of all intervals is not empty, then there is a match. Time: $O(nm)$

Example:
$P_L$ = 2 1 2 3 2
$T_L$ = 2 4 2 4 7 4 5 3

Location 1: the first two pairs give $[1, 3/2)$ and $[4, 5)$. The intersection is empty, thus there is no scaled match at location 1. But…

Location 2: the intervals are $[2, 5/2)$, $[2, 3)$, $[2, 5/2)$, $[7/3, 8/3)$, $[2, 5/2)$. The intersection is $[7/3, 5/2)$, thus there is a scaled match at location 2.
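The naïve interval check can be sketched directly from the description above. This is a sketch under stated assumptions: locations are 0-indexed here (the slides count from 1), exact `Fraction` arithmetic stands in for real numbers, the lower bound starts at 1 because only $\alpha \ge 1$ is allowed, and the function names are hypothetical.

```python
from fractions import Fraction


def scale_interval(pl, tl, start):
    """Intersect the intervals [t/p, (t+1)/p) for pl placed at tl[start].

    Returns (lo, hi) with lo < hi if some scale alpha >= 1 lies in
    [lo, hi), otherwise None. Assumes pl is non-empty.
    """
    lo = Fraction(1)   # only scaling "up" is allowed
    hi = None          # None means "no upper bound yet"
    for p, t in zip(pl, tl[start:start + len(pl)]):
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
        if lo >= hi:   # intersection already empty
            return None
    return lo, hi


def naive_scaled_match(pl, tl):
    """Return [(location, (lo, hi))] for every scaled match, in O(nm) time."""
    matches = []
    for start in range(len(tl) - len(pl) + 1):
        interval = scale_interval(pl, tl, start)
        if interval is not None:
            matches.append((start, interval))
    return matches


result = naive_scaled_match([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4, 5, 3])
print(result)
```

On the slide's example the only match is at index 1 (location 2 in the slide's 1-based numbering), with scale interval $[7/3, 5/2)$.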

Improvement: Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings $s$ and $t$, $|s| = |t|$, over alphabets $\Sigma_s$ and $\Sigma_t$.

$s$ parameterize-matches $t$ if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example: $s$ = a a b b b, $t$ = x x y y y, with $\Pi(a) = x$ and $\Pi(b) = y$.
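Checking whether two equal-length strings parameterize-match amounts to building the bijection symbol by symbol and rejecting any conflict. A minimal sketch; the name `p_match` is hypothetical, and this pairwise check is not the linear-time text-searching algorithm cited on the next slide:

```python
def p_match(s, t):
    """Return True if some bijection between the symbols of s and t maps s to t."""
    if len(s) != len(t):
        return False
    forward = {}    # symbol of s -> symbol of t
    backward = {}   # symbol of t -> symbol of s
    for a, x in zip(s, t):
        # setdefault records the pairing on first sight and
        # returns the earlier pairing afterwards.
        if forward.setdefault(a, x) != x:
            return False   # a would need two different images
        if backward.setdefault(x, a) != a:
            return False   # two symbols would share the image x
    return True


print(p_match("aabbb", "xxyyy"))
print(p_match("aabbb", "xxyyx"))
```

Keeping both directions of the mapping is what enforces a bijection rather than just a function.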

Parameterized Matching

Claim (AFM-94): For $\Sigma$ that can be sorted in linear time (e.g. $\Sigma = \{1, \ldots, n\}$), parameterized matching can be done in time $O(n)$.

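The reduction

Lemma: There exists $\alpha \ge 1$ for which $P_L$ matches $T_L$ at location $i$ scaled to $\alpha$ only if $P_L$ p-matches $T_L$ at $i$.

Proof: Assume $P_L$ does not p-match $T_L$ at location $i$. The possible situations are:

Possibility 1: $P_L$ has the same number $b$ at two positions, while $T_L$ has $a$ and $c \ne a$ at the corresponding positions. Wlog $c \ge a+1$. For $c = a+1$ (smallest possible):

$[a/b, (a+1)/b) \cap [(a+1)/b, (a+2)/b) = \emptyset$

Possibility 2: $T_L$ has the same number $a$ at two positions, while $P_L$ has $b$ and $c \ne b$ at the corresponding positions. Wlog $c \ge b+1$. For $c = b+1$ the intervals are $[a/b, (a+1)/b)$ and $[a/(b+1), (a+1)/(b+1))$. The intersection is not empty only if

$(a+1)/(b+1) > a/b$, i.e.
$ab + b > ab + a$
$b > a$

But this can never happen if $\alpha \ge 1$, since scaling up gives $a = \lfloor \alpha b \rfloor \ge b$.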

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in $P_L$.

1. P-match $P_L$ in $T_L$.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in $T_L$.

End Algorithm

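Example

$P_L$ = 2 3 2 3 2, so $P_{i_1} = 2$, $P_{i_2} = 3$

$T_L$ = 5 6 5 6 5 6 10 6 10 6 10 7

p-match without a scaled match: 5 6 5 6 5 (location 1) p-matches $P_L$, but $[5/2, 3) \cap [2, 7/3) = \emptyset$.

Scaled match: 6 10 6 10 6 (location 6) p-matches $P_L$, and $[3, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$.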
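The two-step algorithm (p-match first, then intersect one interval per distinct pattern number) can be sketched as follows. Assumptions: the p-matching step here is a simple per-location check costing $O(nm)$ overall, used as a stand-in for the linear-time method the slides cite; locations are 0-indexed; the function names are hypothetical.

```python
from fractions import Fraction


def p_match_mapping(pl, window):
    """Return the pattern-number -> text-number bijection, or None."""
    forward, backward = {}, {}
    for p, t in zip(pl, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(pl, tl):
    """Return the 0-indexed locations where pl matches tl for some alpha >= 1."""
    hits = []
    for start in range(len(tl) - len(pl) + 1):
        mapping = p_match_mapping(pl, tl[start:start + len(pl)])
        if mapping is None:
            continue   # step 1: not even a p-match
        # Step 2: one interval [t/p, (t+1)/p) per distinct pattern number.
        lo = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            hits.append(start)
    return hits


hits = real_scaled_matches([2, 3, 2, 3, 2],
                           [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7])
print(hits)
```

Because a p-match guarantees that equal pattern numbers face equal text numbers, verification only needs one interval per distinct number rather than one per position.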

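Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s, since $j$ distinct positive integers sum to at least $j(j+1)/2$.

Time: $O(n)$ for parameterized matching ($\Sigma = \{1, 2, \ldots, n\}$), $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$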

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let $|P| = m$, $|T| = n$, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in $P_L$. Then there are at most $2n/j$ p-matches of $P_L$ in $T_L$.

Meaning: Since verification time is $O(j)$ per p-match, the lemma implies that total verification time is $O((2n/j) \cdot j) = O(n)$.

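Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in $P_L$. In a p-match, these positions are aligned with $j$ different numbers $a_1, a_2, \ldots, a_j$ of $T_L$:

$P_L$: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
$T_L$: $a_1$ $a_2$ … $a_j$

Since the $a_k$ are distinct positive integers, $\sum_{k=1}^{j} a_k \ge j^2/2$.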

Lemma's proof (cont.)

Let $x$ be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is $\ge (x j^2)/2$.

But: there are overlaps. How many?

For each text location, at most $j$ matches will count it. Therefore…

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{xj}{2}$

Clearly $x \cdot j/2 \le n$, thus $x \le 2n/j$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.

  • SCALED Pattern Matching
  • Motivation
  • Slide 3
  • Model
  • Types of Approximations
  • Types of Approximation
  • It seems daunting buthellip
  • CPM 2003 Morelia Mexico
  • Problem inherently inexact
  • Definition
  • Discrete exact Scaled Matching
  • Slide 12
  • Idea Fix a scale s
  • Algorithm time
  • Problem Real scales
  • Formally
  • Remark
  • Simplify definition
  • Why are definitions equivalent
  • Time
  • Definitions are Equivalent
  • Naiumlve algorithm for matching PL in TL
  • Check intersection
  • Slide 24
  • Improvement ndash Parameterized Matching
  • Parameterized Matching
  • Slide 27
  • The reduction
  • Possibility 1
  • Possibility 2
  • Algorithm for Real Scaled String Matching
  • Example
  • Important Fact
  • Tighter analysis
  • Proof of Lemma
  • Lemmarsquos proof (cont)
  • Slide 37
  • Open Problem
Page 10: S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University

Discrete exact Scaled Matching

T:
A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A

P:
A A A
A C A
A A A

Discrete exact Scaled Matching

P:
Z U Y
K V S
X E T

P³ (P scaled to 3):
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
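The scaling shown on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `scale_2d` and the list-of-lists representation are assumptions, not something the slides define.

```python
def scale_2d(pattern, s):
    """Return `pattern` with every symbol blown up into an s x s block."""
    scaled = []
    for row in pattern:
        # Repeat every symbol s times horizontally ...
        stretched = [sym for sym in row for _ in range(s)]
        # ... and repeat the stretched row s times vertically.
        scaled.extend(list(stretched) for _ in range(s))
    return scaled


# The 3 x 3 pattern from the slide, scaled to 3.
P = [list("ZUY"), list("KVS"), list("XET")]
P3 = scale_2d(P, 3)
for row in P3:
    print(" ".join(row))
```

Scaling to s = 1 returns a copy of the pattern itself, which matches the convention that scale 1 is an ordinary exact occurrence.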

Idea: Fix a scale s

Constant amount of work for each square (s-block).

[Figure: the n × n text divided into s × s blocks, n/s blocks along each side.]

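Algorithm time

Time for scale s: $O\left(\frac{n^2}{s^2}\right)$

Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$

The sum $\sum_{s} \frac{1}{s^2}$ converges to a constant, making the total time $O(n^2)$.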
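The convergence claim can be made explicit. The following is a short derivation under two assumptions read off the slide's garbled formula: that scale s costs on the order of n²/s², and that the scales range from 1 to n/m.

```latex
\sum_{s=1}^{n/m} \frac{n^2}{s^2}
  \;\le\; n^2 \sum_{s=1}^{\infty} \frac{1}{s^2}
  \;=\; \frac{\pi^2}{6}\, n^2
  \;<\; 2n^2
  \;=\; O(n^2).
```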

Problem: Real scales

Was open even for strings…

How do we define it?

aabcccbb
Scaled to 2: aaaabbccccccbbbb
Scaled to 1½: aaa b cccc bbb (truncate ½b, truncate ½c)
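The truncation rule in this example can be sketched as follows. The function name `scale_string` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption made here to avoid floating-point surprises at interval boundaries; the slides do not prescribe an implementation.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(s, alpha):
    """Scale each run of equal symbols by alpha, truncating fractional symbols."""
    alpha = Fraction(alpha)
    out = []
    for sym, run in groupby(s):
        r = len(list(run))                     # run length r
        out.append(sym * floor(r * alpha))     # keep floor(r * alpha) copies
    return "".join(out)


print(scale_string("aabcccbb", 2))               # aaaabbccccccbbbb
print(scale_string("aabcccbb", Fraction(3, 2)))  # aaabccccbbb
```

With scale 1½ the single b would become 1½ b's and the three c's would become 4½ c's; both halves are dropped, which is the "truncate" annotation on the slide.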

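Formally

Denote $a^r = \underbrace{a\,a \cdots a}_{r \text{ times}}$.

Problem Definition 1

Input: Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$; Text $T$, $|T| = n$.

Output: All text locations where $a_1^{c_1} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, where $c_1 \ge \lfloor r_1 \alpha \rfloor$ and $c_j \ge \lfloor r_j \alpha \rfloor$.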

Remark

α ≥ 1 means we only scale "up".

Reasons:

• Avoid the conceptual problem of loss of resolution.
• From "far enough" away everything looks the same.
• By our definition, for k < 1/m there is a match at every text location.

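Simplify definition

Definition 2: Look for $a_1^{\lfloor r_1 \alpha \rfloor} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{\lfloor r_j \alpha \rfloor}$ in the text.

Example: P = aabcccbbbb = $a^2 b^1 c^3 b^4$. Scaled to $\alpha = \frac{3}{2}$ it becomes $a^{\lfloor 2 \cdot \frac{3}{2} \rfloor} b^{\lfloor \frac{3}{2} \rfloor} c^{\lfloor 3 \cdot \frac{3}{2} \rfloor} b^{\lfloor 4 \cdot \frac{3}{2} \rfloor} = a^3 b^1 c^4 b^6$.

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by def. 2: daaaabccccbbbbbbbe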

Why are definitions equivalent?

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
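The split into a symbol part and a length part is ordinary run-length encoding. A minimal sketch, where the helper name `split_runs` is an assumption introduced for illustration:

```python
from itertools import groupby


def split_runs(s):
    """Split a string into its symbol part and its list of run lengths."""
    symbols, lengths = [], []
    for sym, run in groupby(s):
        symbols.append(sym)
        lengths.append(len(list(run)))
    return "".join(symbols), lengths


Ps, PL = split_runs("aabcccbbbb")
Ts, TL = split_runs("daaabccccbbbbbbe")
print(Ps, PL)  # abcb [2, 1, 3, 4]
print(Ts, TL)  # dabcbe [1, 3, 1, 4, 6, 1]
```

One left-to-right pass suffices, which is consistent with the linear time the slides quote for the split.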

Time

• Time for split: O(n+m)
• Finding Ps in Ts: O(n+m) (e.g. KMP)
• HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n)).

Why:

• Find $a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor}$ in time O(f(n)).
• For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n)+n) = O(f(n))

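Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $\left[\frac{t}{p}, \frac{t+1}{p}\right)$ for each resulting <text, pattern> pair (t, p).

This is the interval of possible scales, since:

• $\left\lfloor \frac{t}{p} \cdot p \right\rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$
• $\left\lfloor \frac{t+1}{p} \cdot p \right\rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$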

Check intersection

If the intersection of all intervals is not empty, then there is a match. Time: O(nm).

Example

PL: 2 1 2 3 2
TL: 2 4 2 4 7 4 5 3

Location 1: $\left[1, \frac{3}{2}\right)$, $[4, 5)$, … The intersection is empty, thus no scaled match in location 1. But…

Location 2: $\left[2, \frac{5}{2}\right)$, $[2, 3)$, $\left[2, \frac{5}{2}\right)$, $\left[\frac{7}{3}, \frac{8}{3}\right)$, $\left[2, \frac{5}{2}\right)$. The intersection is $\left[\frac{7}{3}, \frac{5}{2}\right)$, thus there is a scaled match in location 2.
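The naïve O(nm) procedure can be sketched as below. The names `scale_interval` and `naive_scaled_matches` are hypothetical, locations are reported 1-based as on the slides, every run is treated as exact, and the lower bound α ≥ 1 is added explicitly; all of these are assumptions of this sketch rather than details given in the deck.

```python
from fractions import Fraction


def scale_interval(t, p):
    """Scales alpha with floor(alpha * p) == t form the interval [t/p, (t+1)/p)."""
    return Fraction(t, p), Fraction(t + 1, p)


def naive_scaled_matches(PL, TL):
    """Return (location, lo, hi) for every location whose intervals intersect."""
    found = []
    for i in range(len(TL) - len(PL) + 1):
        intervals = [scale_interval(t, p) for t, p in zip(TL[i:], PL)]
        lo = max([Fraction(1)] + [a for a, _ in intervals])  # enforce alpha >= 1
        hi = min(b for _, b in intervals)
        if lo < hi:                       # half-open intervals: non-empty iff lo < hi
            found.append((i + 1, lo, hi))
    return found


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(PL, TL))
# [(2, Fraction(7, 3), Fraction(5, 2))]
```

The single reported match is location 2 with scale interval [7/3, 5/2), agreeing with the worked example.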

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

s parameterize-matches t if there exists a bijection $\Pi : \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example

s = a a b b b
t = x x y y y

$\Pi(a) = x$, $\Pi(b) = y$
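A direct check of this definition for two equal-length strings can be sketched as follows. The function name `p_match` is an assumption, and this is a plain two-dictionary test of the bijection, not the linear-time text-searching algorithm of AFM-94.

```python
def p_match(s, t):
    """Return True if some bijection between the alphabets maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # forward enforces that the mapping is a function, backward that it is injective.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))  # True
print(p_match("aabbb", "xxyyx"))  # False
print(p_match("ab", "xx"))        # False
```

The third call fails because two different symbols would have to map to the same symbol, which a bijection does not allow.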

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}), parameterized matching can be done in time O(n).

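The reduction

Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

Possibility 1

The same number b appears twice in PL, but the aligned numbers in TL are a and c ≠ a:

PL: … b … b …
TL: … a … c …

wlog c ≥ a+1

For c = a+1 (smallest possible):

$\left[\frac{a}{b}, \frac{a+1}{b}\right) \cap \left[\frac{a+1}{b}, \frac{a+2}{b}\right) = \emptyset$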

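Possibility 2

The same number a appears twice in TL, but the aligned numbers in PL are b and c ≠ b:

PL: … b … c …
TL: … a … a …

wlog c ≥ b+1

For c = b+1 the two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$.

Intersection not empty only if

$\frac{a+1}{b+1} > \frac{a}{b}$, i.e.

$ab + b > ab + a$, i.e.

$b > a$

But this can never happen if α ≥ 1, because then the text length a is at least the pattern length b.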

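Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \dots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

Example

PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

PL p-matches TL at locations 1, 2, 6 and 7:

• Location 1 (2→5, 3→6): $\left[\frac{5}{2}, 3\right) \cap \left[2, \frac{7}{3}\right) = \emptyset$
• Location 2 (2→6, 3→5): $\left[3, \frac{7}{2}\right) \cap \left[\frac{5}{3}, 2\right) = \emptyset$
• Location 6 (2→6, 3→10): $\left[3, \frac{7}{2}\right) \cap \left[\frac{10}{3}, \frac{11}{3}\right) = \left[\frac{10}{3}, \frac{7}{2}\right)$, a scaled match
• Location 7 (2→10, 3→6): $\left[5, \frac{11}{2}\right) \cap \left[2, \frac{7}{3}\right) = \emptyset$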
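The two-step algorithm can be sketched as below. Step 1 is done here by a naïve window-by-window bijection test, which is an assumption made for brevity and is not the linear-time parameterized matching the slides rely on; step 2 checks one interval per different number in PL. The names `p_match_map` and `real_scaled_matches` are hypothetical.

```python
from fractions import Fraction


def p_match_map(PL, window):
    """Return the bijection {pattern number: text number}, or None if no p-match."""
    forward, backward = {}, {}
    for p, t in zip(PL, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(PL, TL):
    """Return the 1-based locations where PL matches TL for some scale alpha >= 1."""
    found = []
    m = len(PL)
    for i in range(len(TL) - m + 1):
        mapping = p_match_map(PL, TL[i:i + m])     # step 1: p-match
        if mapping is None:
            continue
        # Step 2: intersect one interval [t/p, (t+1)/p) per different number p.
        lo = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            found.append(i + 1)
    return found


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(PL, TL))  # [6]
```

Because a p-match maps equal pattern numbers to equal text numbers, verifying a candidate needs only as many intervals as there are different numbers in PL, which is what the later analysis exploits.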

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}); $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$.
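The square-root bound follows from a short counting argument, spelled out here as a sketch: the j different numbers in PL are distinct positive integers, so their sum is at least 1 + 2 + … + j, and that sum cannot exceed the pattern length m.

```latex
m \;\ge\; \sum_{k=1}^{j} P_{i_k}
  \;\ge\; \sum_{k=1}^{j} k
  \;=\; \frac{j(j+1)}{2}
  \;>\; \frac{j^2}{2}
\quad\Longrightarrow\quad
j \;<\; \sqrt{2m} \;=\; O(\sqrt{m}).
```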

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is

$O\left(\frac{2n}{j} \cdot j\right) = O(n)$

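Proof of Lemma

In a p-match, the 1st appearances of $P_{i_1}, \dots, P_{i_j}$ in PL are aligned with text elements $a_1, a_2, \dots, a_j$ in TL:

PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
TL: $a_1$ $a_2$ … $a_j$

The $a_k$ are distinct positive integers, so

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$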

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is

$\ge \frac{x j^2}{2}$

But: there are overlaps. How many?

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.





  • SCALED Pattern Matching
  • Motivation
  • Slide 3
  • Model
  • Types of Approximations
  • Types of Approximation
  • It seems daunting buthellip
  • CPM 2003 Morelia Mexico
  • Problem inherently inexact
  • Definition
  • Discrete exact Scaled Matching
  • Slide 12
  • Idea Fix a scale s
  • Algorithm time
  • Problem Real scales
  • Formally
  • Remark
  • Simplify definition
  • Why are definitions equivalent
  • Time
  • Definitions are Equivalent
  • Naiumlve algorithm for matching PL in TL
  • Check intersection
  • Slide 24
  • Improvement ndash Parameterized Matching
  • Parameterized Matching
  • Slide 27
  • The reduction
  • Possibility 1
  • Possibility 2
  • Algorithm for Real Scaled String Matching
  • Example
  • Important Fact
  • Tighter analysis
  • Proof of Lemma
  • Lemmarsquos proof (cont)
  • Slide 37
  • Open Problem
Page 15: S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University

Formally

nTT ||

mPrrrP aaa j

j ||21

21

aaaa crrc jjjj

121

121

1

rcrc jj

11

Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where

appears for some

r timesr

Remark

α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of

loss of resolution

From ldquofar enoughrdquo away everything looks the same

By our definition for klt1m there is a match at every text location

Simplify definition

bcba4312 2

323

23

23

aaaa rrrrjj

jj

121121

Definition 2 Look for in the textExample P=aabcccbbbb

Match by definition 2 daaabccccbbbbbbe Match by definition 1

but not by def 2 daaaabccccbbbbbbbe

Why are definitions equivalent

Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461

Time

Time for split O(n+m)

Finding Ps in Ts O(n+m) (eg KMP)

HARD PART Finding PL in TL

Definitions are Equivalent

aa rrj

j

1212

Claim Solving def 2 in time O(f(n))

Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and

TLTotal time O(f(n)+n)=O(f(n))

Naiumlve algorithm for matching PL in TL

For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair

This is the interval of possible scales since

tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t

Check intersectionIf intersection of all intervals is not

empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

The intersection is empty thus no scaled match in location 1 Buthellip

Check intersectionIf intersection of all intervals is not

empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

The intersection is [7352) thus there is a scaled match in location 2

Improvement ndash Parameterized Matching

Introduced Baker 1994

Motivation ldquocopyingrdquo code

Parameterized Matching

Input two strings s and t |s|=|t| over alphabets sums and sumt

s parameterize matches t if bijection sums sumt such that (s) = t

exist

(a)=x

(b)=y

Π Π

ΠΠ

a ab b b

x xy y y

Example

Parameterized Matching

Claim (AFM-94)

For Σ that can be sorted in linear time (eg Σ=1 n)

Parameterized matching can be done in time O(n)

The reduction

1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

Proof Assume PL does not p-match TL at

location i

The possible situations are

Possibility 1wlog c ge a+1

For c = a+1 (smallest possible)

TL

PL

a

b b

cnea

b

a

b

a

b

a

b

a 211

Possibility 2

wlog c ge b+1

Intersection not empty only if

(a+1)(b+1) gt ab ie

ab+b gt ab+a

bgta

But this can never happen if α ge 1

TL

PL

a

b cneb

a

1

11

1

b

a

b

a

b

a

b

a

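Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

Example

PL = 2 3 2 3 2, so $P_{i_1} = 2$, $P_{i_2} = 3$.
TL = 5 6 5 6 5 6 10 6 10 6 10 7

PL p-matches TL at locations 1, 2, 6 and 7. Only location 6 is a scaled match:

Location 1: $[5/2, 3) \cap [2, 7/3) = \emptyset$
Location 2: $[3, 7/2) \cap [5/3, 2) = \emptyset$
Location 6: $[3, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$
Location 7: $[5, 11/2) \cap [2, 7/3) = \emptyset$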
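The two steps can be sketched as follows. This is an illustration under assumptions rather than the talk's implementation: the names `p_match` and `real_scaled_matches` are hypothetical, and the p-matching step is done by a direct per-window check, so this sketch runs in O(nm) rather than the O(n) obtained with a linear-time parameterized matcher. What it does show is the verification step touching only the j distinct numbers of PL.

```python
from fractions import Fraction

def p_match(s, t):
    """Direct bijection check between two equal-length sequences (hypothetical helper)."""
    fwd, bwd = {}, {}
    return all(fwd.setdefault(a, b) == b and bwd.setdefault(b, a) == a
               for a, b in zip(s, t))

def real_scaled_matches(PL, TL):
    """1-based locations where PL matches TL under some real scale alpha >= 1."""
    if not PL:
        return []
    m = len(PL)
    # offset of the first appearance of each distinct number in PL
    first = {}
    for k, p in enumerate(PL):
        first.setdefault(p, k)
    found = []
    for start in range(len(TL) - m + 1):
        window = TL[start:start + m]
        if not p_match(PL, window):          # step 1: p-match
            continue
        # step 2: intersect only the j intervals of the distinct numbers
        lo = max(Fraction(window[k], p) for p, k in first.items())
        hi = min(Fraction(window[k] + 1, p) for p, k in first.items())
        if max(lo, 1) < hi:                  # assumption: alpha >= 1
            found.append(start + 1)
    return found

print(real_scaled_matches([2, 3, 2, 3, 2],
                          [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]))
```

Once a window p-matches, every repeated pattern number is aligned with the same text number, so checking one interval per distinct number is enough.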

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most O(√m) different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location. Total: O(n√m).
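The O(√m) bound follows because j distinct positive integers sum to at least 1 + 2 + … + j = j(j+1)/2, and that sum cannot exceed m. A small sketch (the function name `max_distinct_lengths` is hypothetical) computes the largest possible j for a given m:

```python
from math import isqrt

def max_distinct_lengths(m):
    """Largest j with j(j+1)/2 <= m, i.e. the most distinct run lengths
    a pattern of total length m can have (hypothetical helper)."""
    return (isqrt(8 * m + 1) - 1) // 2

for m in (1, 10, 100, 10_000):
    j = max_distinct_lengths(m)
    # j is feasible and j + 1 is not
    assert j * (j + 1) // 2 <= m < (j + 1) * (j + 2) // 2
    # hence j stays below sqrt(2m)
    assert j * j <= 2 * m
    print(m, j)
```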

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $2n/j$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is

$O((2n/j) \cdot j) = O(n)$.

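Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL, and the text elements $a_1, a_2, \ldots, a_j$ aligned with them in a p-match:

PL: $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$
TL: $a_1, a_2, \ldots, a_j$

The $a_k$ are j distinct positive integers, so

$\sum_{k=1}^{j} a_k \ge j^2/2$

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is

$\ge x j^2 / 2$.

But there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{xj}{2}$.

Clearly $xj/2 \le n$, thus $x \le 2n/j$.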

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.


Page 22: S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University

Check intersectionIf intersection of all intervals is not

empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

The intersection is empty thus no scaled match in location 1 Buthellip

Check intersectionIf intersection of all intervals is not

empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

The intersection is [7352) thus there is a scaled match in location 2

Improvement ndash Parameterized Matching

Introduced Baker 1994

Motivation ldquocopyingrdquo code

Parameterized Matching

Input two strings s and t |s|=|t| over alphabets sums and sumt

s parameterize matches t if bijection sums sumt such that (s) = t

exist

(a)=x

(b)=y

Π Π

ΠΠ

a ab b b

x xy y y

Example

Parameterized Matching

Claim (AFM-94)

For Σ that can be sorted in linear time (eg Σ=1 n)

Parameterized matching can be done in time O(n)

The reduction

1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

Proof Assume PL does not p-match TL at

location i

The possible situations are

Possibility 1wlog c ge a+1

For c = a+1 (smallest possible)

TL

PL

a

b b

cnea

b

a

b

a

b

a

b

a 211

Possibility 2

wlog c ge b+1

Intersection not empty only if

(a+1)(b+1) gt ab ie

ab+b gt ab+a

bgta

But this can never happen if α ge 1

TL

PL

a

b cneb

a

1

11

1

b

a

b

a

b

a

b

a

Algorithm for Real Scaled String Matching

Let Pi1 Pi2 Pij be the different numbers in PL

1 P-match PL in TL2 For each match chack intersection

of intervals between Pi1 Pij and corresponding symbols in TL

End Algorithm

PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

TL = 5 6 5 6 5 6 10 6 10 6 10 7

scaled match

Example

2133 32

21

3121 2232

3121 2255

3231

21 3333

Important Fact

So there are at most O(radicm) different Pikrsquos

Time O(n) for parameterized matching (Σ=12

hellipn) O(radicm) verification for each location Total O(nradicm)

mi

j

kP

k

1

Tighter analysis

Upper bound on the number of possible p-matches

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL.

Then there are at most 2n/j p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is

O((2n/j) · j) = O(n)

Proof of Lemma

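1st appearance of $P_{i_1}, \dots, P_{i_j}$

PL: $P_{i_1}, P_{i_2}, \dots, P_{i_j}$

TL: $a_1, a_2, \dots, a_j$ (p-match)

Since the $a_k$ are distinct positive integers,

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$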

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is

$\ge \frac{x j^2}{2}$

But there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.
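The bound can be sanity-checked on small inputs. The sketch below is illustrative only: it assumes TL is the sequence of run lengths of the text, so that n equals the sum of TL, and it counts p-matches by brute force. The names `count_p_matches` and `lemma_bound` are hypothetical.

```python
def count_p_matches(pl, tl):
    """Count locations where pl p-matches tl, by checking each location directly."""
    count = 0
    for i in range(len(tl) - len(pl) + 1):
        forward, backward = {}, {}
        # all() stops at the first pair that breaks the bijection.
        if all(forward.setdefault(p, t) == t and backward.setdefault(t, p) == p
               for p, t in zip(pl, tl[i:i + len(pl)])):
            count += 1
    return count


def lemma_bound(pl, tl):
    """Return 2n/j, with n the uncompressed text length and j the distinct values in pl."""
    n = sum(tl)        # assumption: run lengths in TL add up to |T| = n
    j = len(set(pl))
    return 2 * n / j


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(count_p_matches(PL, TL), "<=", lemma_bound(PL, TL))
```

A check like this does not prove the lemma; it only confirms that the counting argument is consistent with a concrete instance.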

Open Problem

Give a 1-d algorithm that is linear in the run-length compressed text and pattern.

Page 34: S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University

Proof of Lemma

1st appearance of Pi1 Pij

PL Pi1 Pi2 Pij

TL a1 a2 aj

m-match

2

2

1

ja

j

ki

Lemmarsquos proof (cont)

Let x be the total number of p-matches in the text

The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

ge (xjsup2)2

But There are overlaps How many

Lemmarsquos proof (cont)

For each text location at most j matches will count it Thereforehellip

Total count without overlaps ge

Clearly xmiddotj2 le n thus x le (2n)j

2

1

2

2

xjxjj

Open Problem

Give 1-d algorithm linear in run-length compressed text and pattern
