Post on 22-Dec-2015
SCALED Pattern Matching
Amihood Amir, Ayelet Butman, Moshe Lewenstein
Bar-Ilan University and Johns Hopkins University
Motivation
Searching for Templates in Aerial Photographs
Input: aerial photo, template
Task: search for all locations where the template appears in the image
Model
• Low level (pixel level): avoid costly processing
• Asymptotically efficient solutions
• Serial exact algorithms
Types of Approximations
Local errors: level of detail, occlusion, noise
Results:
• O(n² log m): mismatches
• O(n²k²): edit distance, k errors, rectangular patterns
• O(n²k √(m log m) √(k log k)): edit distance, k errors, half-rectangular patterns
References: AL-88, AF-95
Types of Approximation
Orientation
Results:
• O(n²m⁵) [FU-98]
• O(n²m³) [ACL-98]
Scaling
Natural scales, results:
• O(n), 1-d [EV-88]
• O(n² log |Σ|), 2-d [ALV-92]
• O(n²), dictionary [AC-96]
Real scales, this result:
• O(n), 1-d, truncation
It seems daunting, but…
CPM 2003, Morelia, Mexico
Problem: inherently inexact
What if the occurrence is 1½ times bigger?
What is the meaning of “½ a pixel”?
Solutions until now: Natural Scales - consider only discrete scales 1, 2, 3, 4, 5
Definition
Input: text (n × n), pattern (m × m)
Find all occurrences of the pattern in the text, in all discrete sizes
Discrete exact Scaled Matching
P:
A A A
A C A
A A A
T:
A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
Discrete exact Scaled Matching
P:
Z U Y
K V S
X E T
P³ (P scaled to 3):
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
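The scaling of P to P³ can be sketched in a few lines of Python. This is an illustration only: the function name `scale_pattern` and the list-of-lists representation are assumptions made here, not something taken from the slides.

```python
def scale_pattern(pattern, s):
    """Scale a 2-d pattern to the natural number s: each symbol becomes an s-by-s block."""
    if s < 1:
        raise ValueError("scale must be a natural number >= 1")
    scaled = []
    for row in pattern:
        # Repeat every symbol s times horizontally.
        wide_row = [symbol for symbol in row for _ in range(s)]
        # Repeat the widened row s times vertically.
        scaled.extend(list(wide_row) for _ in range(s))
    return scaled


P = [list("ZUY"), list("KVS"), list("XET")]
P3 = scale_pattern(P, 3)
for row in P3:
    print(" ".join(row))
```

Printing `P3` reproduces the 9 × 9 grid of the slide, one row per line.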
Idea: Fix a scale s
Constant amount of work for each s × s square (s-block)
(Figure: the text divided into s-blocks of side s, with n/s blocks along each side of length n.)
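Algorithm time
Time for scale s: $O\left(\frac{n^2}{s^2}\right)$
Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$
The sum $\sum_{s \ge 1} \frac{1}{s^2}$ converges to a constant,
making the total time O(n²)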
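As a rough numeric check of this bound, the sketch below sums the per-scale cost over all natural scales and compares it with the limit of the series. It assumes exactly one unit of work per s-block, which is a simplification made here; the function name `total_block_work` is likewise hypothetical.

```python
import math


def total_block_work(n, m):
    """Sum the per-scale cost (n/s)^2 over all natural scales s = 1 .. n // m."""
    return sum((n / s) ** 2 for s in range(1, n // m + 1))


n, m = 1024, 4
work = total_block_work(n, m)
# n^2 times the limit of sum_{s >= 1} 1/s^2, which is pi^2 / 6.
bound = (math.pi ** 2 / 6) * n ** 2
print(work <= bound)
```

Whatever the values of n and m, the total stays below a constant multiple of n², which is what makes the overall time O(n²).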
Problem: Real scales
Was open even for strings…
How do we define?
aabcccbb
Scaled to 2: aaaabbccccccbbbb
Scaled to 1½: aaa b cccc bbb (truncate ½ b, truncate ½ c)
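Formally
Text: $T$, $|T| = n$
Pattern: $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$
Denote $a^r = \underbrace{aa \cdots a}_{r \text{ times}}$
Problem Definition 1
Input: Pattern $P$, Text $T$
Output: All text locations where $a_1^{c_1} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, with $c_1 \ge \lfloor r_1 \alpha \rfloor$ and $c_j \ge \lfloor r_j \alpha \rfloor$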
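Scaling with truncation can be illustrated with a short Python sketch. The name `scale_string` is an assumption made for this example, and exact rational arithmetic (`fractions.Fraction`) is used so that the truncation is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction
from itertools import groupby


def scale_string(text, alpha):
    """Scale each run a^r of text to a^floor(r * alpha), for a real scale alpha >= 1."""
    alpha = Fraction(alpha)
    if alpha < 1:
        raise ValueError("only scales alpha >= 1 are considered")
    pieces = []
    for symbol, run in groupby(text):
        r = sum(1 for _ in run)  # length of this run
        pieces.append(symbol * math.floor(r * alpha))  # truncate the fractional part
    return "".join(pieces)


print(scale_string("aabcccbb", 2))
print(scale_string("aabcccbb", Fraction(3, 2)))
```

The two calls reproduce the slide's examples for scales 2 and 1½.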
Remark
α ≥ 1 means we only scale “up”
Reasons:
• Avoid the conceptual problem of loss of resolution
• From “far enough” away everything looks the same
• By our definition, for a scale k < 1/m there is a match at every text location
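Simplify definition
Definition 2: Look for $a_1^{\lfloor r_1 \alpha \rfloor} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{\lfloor r_j \alpha \rfloor}$ in the text
Example: P = aabcccbbbb = $a^2 b^1 c^3 b^4$, scaled to 3/2: $a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor}$
Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by def. 2: daaaabccccbbbbbbbe (the first and last runs are longer than the scaled pattern requires)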
Why are definitions equivalent
Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL)
Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
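The split into a symbol part and a length part is a run-length encoding, sketched below in Python. The function name `split_runs` is hypothetical, and the strings are built from their runs only to make the run lengths easy to read.

```python
from itertools import groupby


def split_runs(text):
    """Split a string into its symbol part and its length part (run-length encoding)."""
    symbols, lengths = [], []
    for symbol, run in groupby(text):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


P = "aa" + "b" + "ccc" + "bbbb"
T = "d" + "aaa" + "b" + "cccc" + "b" * 6 + "e"
Ps, PL = split_runs(P)
Ts, TL = split_runs(T)
print(Ps, PL)
print(Ts, TL)
```

A single left-to-right pass suffices, in line with the O(n+m) split time stated on the next slide.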
Time
Time for split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g. KMP)
HARD PART: Finding PL in TL
Definitions are Equivalent
Claim: Solving def. 2 in time O(f(n)) implies solving def. 1 in time O(f(n))
Why:
- Find $a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor}$ in time O(f(n))
- For each match, verify the 1st and last symbol in constant time in Ts and TL
Total time: O(f(n)+n) = O(f(n))
Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting ⟨text, pattern⟩ pair ⟨t, p⟩
This is the interval of possible scales, since:
(t/p)·p = t, and for every α < t/p, ⌊αp⌋ < t
((t+1)/p)·p = t+1, and for every α ≥ (t+1)/p, ⌊αp⌋ > t
Check intersection
If the intersection of all intervals is not empty, then there is a match
Time: O(nm)
Example
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3
Location 1: [1, 3/2), [4, 5), …
The intersection is empty, thus there is no scaled match at location 1. But…
Location 2: [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2)
The intersection is [7/3, 5/2), thus there is a scaled match at location 2
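The naïve O(nm) check can be sketched as follows. This is a minimal illustration under the assumptions that PL is non-empty, that locations are reported 0-based (so the slide's location 2 is index 1), and that the restriction α ≥ 1 is enforced by intersecting with [1, ∞); the function name `naive_scaled_matches` is hypothetical.

```python
from fractions import Fraction


def naive_scaled_matches(PL, TL):
    """Return (location, low, high) for every 0-based location whose scale interval is non-empty."""
    matches = []
    for i in range(len(TL) - len(PL) + 1):
        window = TL[i:i + len(PL)]
        # Each <t, p> pair allows exactly the scales in [t/p, (t+1)/p); also require alpha >= 1.
        low = max([Fraction(1)] + [Fraction(t, p) for t, p in zip(window, PL)])
        high = min(Fraction(t + 1, p) for t, p in zip(window, PL))
        if low < high:  # the intersection [low, high) is non-empty
            matches.append((i, low, high))
    return matches


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(PL, TL))
```

On the slide's example the only surviving location is index 1, with the interval [7/3, 5/2).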
Improvement – Parameterized Matching
Introduced: Baker, 1994
Motivation: “copying” code
Parameterized Matching
Input: two strings s and t, |s| = |t|, over alphabets Σs and Σt
s parameterize-matches t if there exists a bijection Π: Σs → Σt such that Π(s) = t
Example
s = a a b b b
t = x x y y y
Π(a) = x, Π(b) = y
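A direct check of this definition for two equal-length strings might look like the sketch below. It tests the bijection over the symbols that actually occur, and the name `p_match` is an assumption made for the example.

```python
def p_match(s, t):
    """Return True if some bijection between the symbols of s and of t maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # forward keeps the mapping a function; backward keeps it one-to-one.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))
print(p_match("aabbb", "xxyyx"))
```

This compares one aligned pair of strings; it is not the linear-time text-searching algorithm that the following claim refers to.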
Parameterized Matching
Claim (AFM-94):
For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}),
parameterized matching can be done in time O(n)
The reduction
Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i
Proof: Assume PL does not p-match TL at location i
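The possible situations are:
Possibility 1
PL: b … b
TL: a … c, with c ≠ a
wlog c ≥ a+1
For c = a+1 (smallest possible), the two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a+1}{b}, \frac{a+2}{b}\right)$, which do not intersect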
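Possibility 2
PL: b … c, with c ≠ b
TL: a … a
wlog c ≥ b+1
For c = b+1, the two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$
Intersection not empty only if
$\frac{a+1}{b+1} > \frac{a}{b}$, i.e.
ab+b > ab+a
b > a
But this can never happen if α ≥ 1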
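Algorithm for Real Scaled String Matching
Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL
1. P-match PL in TL
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL
End Algorithm
Example
PL = 2 3 2 3 2, $P_{i_1}$ = 2, $P_{i_2}$ = 3
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-matches: locations 1, 2, 6, 7
Location 1: [5/2, 3) ∩ [2, 7/3) is empty
Location 2: [3, 7/2) ∩ [5/3, 2) is empty
Location 6: [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2)
Location 7: [5, 11/2) ∩ [2, 7/3) is empty
scaled match: location 6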
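The two steps of the algorithm can be put together in a short sketch. It makes several assumptions that are not in the slides: the p-matching step is done naively per window rather than with the linear-time method of AFM-94, locations are 0-based, and the names `p_match_mapping` and `real_scaled_matches` are hypothetical. What it does keep is the key idea that only the j different numbers of PL need an interval check.

```python
from fractions import Fraction


def p_match_mapping(pattern, window):
    """Return the pattern-value -> text-value bijection if the window p-matches, else None."""
    forward, backward = {}, {}
    for p, t in zip(pattern, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(PL, TL):
    """Return the 0-based locations of TL where PL has a real scaled match with alpha >= 1."""
    matches = []
    for i in range(len(TL) - len(PL) + 1):
        mapping = p_match_mapping(PL, TL[i:i + len(PL)])
        if mapping is None:
            continue  # step 1 failed: no p-match here, so no scaled match either
        # Step 2: one interval [t/p, (t+1)/p) per different number p in PL.
        low = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        high = min(Fraction(t + 1, p) for p, t in mapping.items())
        if low < high:
            matches.append(i)
    return matches


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(PL, TL))
```

For the example above the only location returned is index 5, where the scales [10/3, 7/2) map 2 to 6 and 3 to 10.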
Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
The $P_{i_k}$ are different positive integers, so their sum is at least j(j+1)/2
So there are at most O(√m) different $P_{i_k}$’s
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}); O(√m) verification for each location
Total: O(n√m)
Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL
Then there are at most 2n/j p-matches of PL in TL
Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is
O((2n/j) · j) = O(n)
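Proof of Lemma
Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL and the text elements aligned with them in a p-match:
PL: … $P_{i_1}$ … $P_{i_2}$ … $P_{i_j}$ …
TL: … $a_1$ … $a_2$ … $a_j$ …
Because a p-match is a bijection, $a_1, \ldots, a_j$ are j different positive integers, so
$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$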
Lemma’s proof (cont.)
Let x be the total number of p-matches in the text
The sum of all text elements that match 1st occurrences of $P_{i_k}$’s in the pattern
≥ (x·j²)/2
But: there are overlaps. How many?
Lemma’s proof (cont.)
For each text location, at most j matches will count it. Therefore…
Total count without overlaps ≥ $\frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$
Clearly x·j/2 ≤ n, thus x ≤ (2n)/j
Open Problem
Give a 1-d algorithm that is linear in the run-length compressed text and pattern
Types of ApproximationOrientation results O(nsup2m ) FU-98
O(nsup2msup3) ACL-98
Scaling Natural scales results O(n) 1-d EV-88
O(nsup2 log |Σ|) 2-d ALV-92
O(nsup2) dictionary AC-96
Real scales this result O(n) 1-d truncation
5
It seems daunting buthellip
CPM 2003 Morelia Mexico
Problem inherently inexact
What if occurrence is 1frac12 times bigger
What is the meaning of ldquofrac12 a pixelrdquo
Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5
DefinitionText Pattern
Find all occurrences of the pattern in the text in all discrete sizes
m
m
n
n
Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A
A A A A A A A A A A C C A A C A
A A A C C A A A A A C C A A A A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
Discrete exact Scaled Matching
P Z U Y K V S X E T
Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T
Idea Fix a scale s
Constant amount of work for each square (s-block)
s
s
nns
Algorithm time
Time for scale s
Total time
converges to a constant
Making the total time O(nsup2)
sn2
2
mn
mn
ss ssn n
1
2
122
2
1
Problem Real scales
Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c
Formally
nTT ||
mPrrrP aaa j
j ||21
21
aaaa crrc jjjj
121
121
1
rcrc jj
11
Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where
appears for some
r timesr
Remark
α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of
loss of resolution
From ldquofar enoughrdquo away everything looks the same
By our definition for klt1m there is a match at every text location
Simplify definition
bcba4312 2
323
23
23
aaaa rrrrjj
jj
121121
Definition 2 Look for in the textExample P=aabcccbbbb
Match by definition 2 daaabccccbbbbbbe Match by definition 1
but not by def 2 daaaabccccbbbbbbbe
Why are definitions equivalent
Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461
Time
Time for split O(n+m)
Finding Ps in Ts O(n+m) (eg KMP)
HARD PART Finding PL in TL
Definitions are Equivalent
aa rrj
j
1212
Claim Solving def 2 in time O(f(n))
Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and
TLTotal time O(f(n)+n)=O(f(n))
Naiumlve algorithm for matching PL in TL
For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair
This is the interval of possible scales since
tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)
The intersection is empty thus no scaled match in location 1 Buthellip
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)
The intersection is [7352) thus there is a scaled match in location 2
Improvement ndash Parameterized Matching
Introduced Baker 1994
Motivation ldquocopyingrdquo code
Parameterized Matching
Input two strings s and t |s|=|t| over alphabets sums and sumt
s parameterize matches t if bijection sums sumt such that (s) = t
exist
(a)=x
(b)=y
Π Π
ΠΠ
a ab b b
x xy y y
Example
Parameterized Matching
Claim (AFM-94)
For Σ that can be sorted in linear time (eg Σ=1 n)
Parameterized matching can be done in time O(n)
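The reduction
Lemma: There exists $\alpha \ge 1$ for which PL matches TL at location i scaled to α only if PL p-matches TL at i.
Proof: Assume PL does not p-match TL at location i. The possible situations are:
Possibility 1
PL: ... b ... b ...
TL: ... a ... c≠a ...
w.l.o.g. c ≥ a+1. For c = a+1 (smallest possible):
$[a/b, (a+1)/b) \cap [(a+1)/b, (a+2)/b) = \emptyset$
Possibility 2
PL: ... b ... c≠b ...
TL: ... a ... a ...
w.l.o.g. c ≥ b+1. For c = b+1 the intervals are $[a/b, (a+1)/b)$ and $[a/(b+1), (a+1)/(b+1))$.
Intersection not empty only if
$(a+1)/(b+1) > a/b$, i.e.
ab+b > ab+a, i.e.
b > a
But this can never happen if α ≥ 1, since then $a = \lfloor \alpha b \rfloor \ge b$.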
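Algorithm for Real Scaled String Matching
Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.
1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.
End Algorithm
Example
PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-match at location 1 (5 6 5 6 5): [5/2, 3) ∩ [2, 7/3) is empty, so no scaled match.
p-match at location 6 (6 10 6 10 6): [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2), so there is a scaled match.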
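The two steps of the algorithm can be sketched as follows. This is a hedged illustration, not the authors' implementation: the names are hypothetical, the p-match step uses a simple per-window check (O(nm) overall) instead of the linear-time method cited as AFM-94, and α ≥ 1 is assumed. Verification looks only at the first occurrence of each distinct value in PL, which suffices once the window is known to p-match.

```python
from fractions import Fraction


def p_match(s, t):
    """True if a bijection maps s onto t position by position."""
    forward, backward = {}, {}
    for a, b in zip(s, t):
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return len(s) == len(t)


def real_scaled_match(pl, tl):
    """Return (1-based location, lo, hi) for every scaled match of pl in tl."""
    m = len(pl)
    first = {}  # distinct value of PL -> index of its first occurrence
    for idx, p in enumerate(pl):
        first.setdefault(p, idx)
    matches = []
    for i in range(len(tl) - m + 1):
        window = tl[i:i + m]
        if not p_match(pl, window):      # step 1: p-match filter
            continue
        # step 2: intersect intervals for the j distinct values only
        lo = max([Fraction(1)] + [Fraction(window[k], p) for p, k in first.items()])
        hi = min(Fraction(window[k] + 1, p) for p, k in first.items())
        if lo < hi:
            matches.append((i + 1, lo, hi))
    return matches


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_match(PL, TL))
```

On the slide's example the p-match filter keeps locations 1, 2, 6 and 7, and only location 6 passes verification, with feasible scales in [10/3, 7/2).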
Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
So there are at most O(√m) different $P_{i_k}$'s, because j distinct positive integers sum to at least j(j+1)/2.
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location. Total: O(n√m)
Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.
Then there are at most 2n/j p-matches of PL in TL.
Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is
O((2n/j) · j) = O(n)
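Proof of Lemma
Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL. In a p-match they are aligned with j distinct text values:
PL: $P_{i_1}\ P_{i_2}\ \ldots\ P_{i_j}$
TL: $a_1\ a_2\ \ldots\ a_j$
$\sum_{k=1}^{j} a_k \ge j^2/2$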
Lemma's proof (cont)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is
$\ge x j^2 / 2$
But there are overlaps. How many?
For each text location, at most j matches will count it. Therefore…
Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$
Clearly $x \cdot j/2 \le n$, thus $x \le 2n/j$.
Open Problem
Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.
Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A
A A A A A A A A A A C C A A C A
A A A C C A A A A A C C A A A A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
Discrete exact Scaled Matching
P Z U Y K V S X E T
Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T
Idea Fix a scale s
Constant amount of work for each square (s-block)
s
s
nns
Algorithm time
Time for scale s
Total time
converges to a constant
Making the total time O(nsup2)
sn2
2
mn
mn
ss ssn n
1
2
122
2
1
Problem Real scales
Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c
Formally
nTT ||
mPrrrP aaa j
j ||21
21
aaaa crrc jjjj
121
121
1
rcrc jj
11
Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where
appears for some
r timesr
Remark
α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of
loss of resolution
From ldquofar enoughrdquo away everything looks the same
By our definition for klt1m there is a match at every text location
Simplify definition
bcba4312 2
323
23
23
aaaa rrrrjj
jj
121121
Definition 2 Look for in the textExample P=aabcccbbbb
Match by definition 2 daaabccccbbbbbbe Match by definition 1
but not by def 2 daaaabccccbbbbbbbe
Why are definitions equivalent
Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461
Time
Time for split O(n+m)
Finding Ps in Ts O(n+m) (eg KMP)
HARD PART Finding PL in TL
Definitions are Equivalent
aa rrj
j
1212
Claim Solving def 2 in time O(f(n))
Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and
TLTotal time O(f(n)+n)=O(f(n))
Naiumlve algorithm for matching PL in TL
For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair
This is the interval of possible scales since
tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)
The intersection is empty thus no scaled match in location 1 Buthellip
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)
The intersection is [7352) thus there is a scaled match in location 2
Improvement ndash Parameterized Matching
Introduced Baker 1994
Motivation ldquocopyingrdquo code
Parameterized Matching
Input two strings s and t |s|=|t| over alphabets sums and sumt
s parameterize matches t if bijection sums sumt such that (s) = t
exist
(a)=x
(b)=y
Π Π
ΠΠ
a ab b b
x xy y y
Example
Parameterized Matching
Claim (AFM-94)
For Σ that can be sorted in linear time (eg Σ=1 n)
Parameterized matching can be done in time O(n)
The reduction
1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i
Proof Assume PL does not p-match TL at
location i
The possible situations are
Possibility 1wlog c ge a+1
For c = a+1 (smallest possible)
TL
PL
a
b b
cnea
b
a
b
a
b
a
b
a 211
Possibility 2
wlog c ge b+1
Intersection not empty only if
(a+1)(b+1) gt ab ie
ab+b gt ab+a
bgta
But this can never happen if α ge 1
TL
PL
a
b cneb
a
1
11
1
b
a
b
a
b
a
b
a
Algorithm for Real Scaled String Matching
Let Pi1 Pi2 Pij be the different numbers in PL
1 P-match PL in TL2 For each match chack intersection
of intervals between Pi1 Pij and corresponding symbols in TL
End Algorithm
PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches
TL = 5 6 5 6 5 6 10 6 10 6 10 7
scaled match
Example
2133 32
21
3121 2232
3121 2255
3231
21 3333
Important Fact
So there are at most O(radicm) different Pikrsquos
Time O(n) for parameterized matching (Σ=12
hellipn) O(radicm) verification for each location Total O(nradicm)
mi
j
kP
k
1
Tighter analysis
Upper bound number of possible p-matches
Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL
Then there are at most n2j p-matches of PL in TL
Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is
O((n2j) middot j) = O(n)
Proof of Lemma
1st appearance of Pi1 Pij
PL Pi1 Pi2 Pij
TL a1 a2 aj
m-match
2
2
1
ja
j
ki
Lemma's proof (cont)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is
$\ge \frac{x j^2}{2}$
But: there are overlaps. How many?
Lemma's proof (cont)
For each text location, at most j matches will count it. Therefore...
Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{xj}{2}$
Clearly $\frac{xj}{2} \le n$, thus $x \le \frac{2n}{j}$.
Open Problem
Give 1-d algorithm linear in run-length compressed text and pattern
Idea: Fix a scale s
Constant amount of work for each square (s-block).
[Figure: the n × n text divided into s × s blocks, n/s blocks along each side.]
Algorithm time
Time for scale s: $O(\frac{n^2}{s^2})$
Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$
The sum $\sum_{s} \frac{1}{s^2}$ converges to a constant,
making the total time $O(n^2)$.
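A quick numeric check of this bound, as a sketch: the function name `total_work` is an assumption, and the constant used is the well-known limit of the series, π²/6.

```python
import math


def total_work(n, m):
    """Sum n^2 / s^2 over the discrete scales s = 1 .. n // m."""
    return sum(n * n / (s * s) for s in range(1, n // m + 1))


n, m = 1000, 10
# The series sum of 1/s^2 is bounded by pi^2 / 6, so the total stays O(n^2).
print(total_work(n, m) <= (math.pi ** 2 / 6) * n * n)
```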
Problem: Real scales
Was open even for strings...
How do we define?
aabcccbb scaled to 2: aaaabbccccccbbbb
aabcccbb scaled to 1½: aaab cccc bbb (truncate ½ b, truncate ½ c)
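The truncation rule in this example can be sketched in a few lines. This is an illustration only; the helper name `scale` is an assumption, and exact fractions are used so that a scale such as 1½ is not subject to floating-point rounding.

```python
import math
from fractions import Fraction
from itertools import groupby


def scale(s, alpha):
    """Scale every run of s by alpha, truncating fractional run lengths."""
    out = []
    for sym, run in groupby(s):
        r = len(list(run))                      # length of this run
        out.append(sym * math.floor(alpha * r))  # keep floor(alpha * r) copies
    return "".join(out)


print(scale("aabcccbb", 2))
print(scale("aabcccbb", Fraction(3, 2)))
```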
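Formally
Text $T$, $|T| = n$.
Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$.
Denote $a^r = a a \cdots a$ ($r$ times).
Problem Definition 1
Input: Pattern $P$, Text $T$.
Output: All text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, with $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$.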
Remark
α ≥ 1 means we only scale "up".
Reasons:
Avoid the conceptual problem of loss of resolution.
From "far enough" away, everything looks the same.
By our definition, for k < 1/m there is a match at every text location.
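Simplify definition
Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text.
Example: P = aabcccbbbb, α = 3/2:
$a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor}$
Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by def 2: daaaabccccbbbbbbbe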
Why are definitions equivalent
Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).
Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
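The split into a symbol part and a length part is a run-length encoding. A minimal sketch, with `split_runs` as a hypothetical helper name:

```python
from itertools import groupby


def split_runs(s):
    """Split s into its symbol part and its list of run lengths."""
    symbols, lengths = [], []
    for sym, run in groupby(s):
        symbols.append(sym)
        lengths.append(len(list(run)))
    return "".join(symbols), lengths


print(split_runs("aabcccbbbb"))
print(split_runs("daaabccccbbbbbbe"))
```

The split is a single left-to-right pass over the string.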
Time
Time for split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g. KMP)
HARD PART: Finding PL in TL
Definitions are Equivalent
Claim: Solving def 2 in time O(f(n)) implies solving def 1 in time O(f(n)).
Why?
- Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.
Total time: O(f(n) + n) = O(f(n))
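Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval $[\frac{t}{p}, \frac{t+1}{p})$ for each resulting <text, pattern> pair.
This is the interval of possible scales, since:
$\lfloor \frac{t}{p} \cdot p \rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$;
$\lfloor \frac{t+1}{p} \cdot p \rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$.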
Check intersection
If the intersection of all intervals is not empty, then there is a match.
Time: O(nm)
Example:
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3
Location 1: $[1, \frac{3}{2})$ $[4, 5)$
The intersection is empty, thus no scaled match at location 1. But...
Check intersection
Location 2: $[2, \frac{5}{2})$ $[2, 3)$ $[2, \frac{5}{2})$ $[\frac{7}{3}, \frac{8}{3})$ $[2, \frac{5}{2})$
The intersection is $[\frac{7}{3}, \frac{5}{2})$, thus there is a scaled match at location 2.
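The interval check for a single location can be sketched as below. The function name `scale_interval` and the 1-based `loc` argument are assumptions chosen to mirror the slide's numbering. Like the slide's example, the sketch does not clip the result to α ≥ 1.

```python
from fractions import Fraction


def scale_interval(PL, TL, loc):
    """Intersect [t/p, (t+1)/p) over the window starting at 1-based loc.
    Returns (lo, hi) for a non-empty intersection, otherwise None."""
    window = TL[loc - 1:loc - 1 + len(PL)]
    if len(window) < len(PL):          # pattern does not fit at this location
        return None
    lo = max(Fraction(t, p) for p, t in zip(PL, window))
    hi = min(Fraction(t + 1, p) for p, t in zip(PL, window))
    return (lo, hi) if lo < hi else None


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(scale_interval(PL, TL, 1))
print(scale_interval(PL, TL, 2))
```

Running this for every location costs O(m) per location, which gives the O(nm) bound stated on the slide.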
Simplify definition
Definition 2: Look for $a_1^{r_1} a_2^{r_2} \cdots a_{j-1}^{r_{j-1}} a_j^{r_j}$ in the text.
Example: P = aabcccbbbb = $a^2 b^1 c^3 b^4$
Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by def. 2: daaaabccccbbbbbbbe
Why are definitions equivalent
Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).
Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
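The split into a symbol part and a length part is a run-length encoding. A minimal Python sketch, where the function name `run_length_split` is an assumption made for illustration:

```python
from itertools import groupby


def run_length_split(s):
    """Split s into (symbol part, length part), one entry per maximal run."""
    symbols, lengths = [], []
    for ch, run in groupby(s):       # groupby yields maximal runs of equal symbols
        symbols.append(ch)
        lengths.append(len(list(run)))
    return "".join(symbols), lengths
```

Applied to the example, `run_length_split("aabcccbbbb")` gives Ps = abcb with PL = 2 1 3 4, and the text daaabccccbbbbbbe gives Ts = dabcbe with TL = 1 3 1 4 6 1. The split takes one pass over each string, in line with the O(n+m) split time.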
Time
Time for split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g., KMP)
HARD PART: Finding PL in TL
Definitions are Equivalent
Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n)).
Why?
- Find $a_2^{r_2} \cdots a_{j-1}^{r_{j-1}}$ in time O(f(n)).
- For each match, verify the 1st and last symbols in constant time in Ts and TL.
Total time: O(f(n) + n) = O(f(n))
Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval $[t/p, (t+1)/p)$ for each resulting <text, pattern> pair (t, p).
This is the interval of possible scales, since
$(t/p) \cdot p = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$;
$((t+1)/p) \cdot p = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$.
Check intersection
If the intersection of all intervals is not empty, then there is a match. Time: O(nm)
Example
PL: 2 1 2 3 2
TL: 2 4 2 4 7 4 5 3
Location 1: [1, 3/2) [4, 5)
The intersection is empty, thus no scaled match in location 1. But…
Location 2: [2, 5/2) [2, 3) [2, 5/2) [7/3, 8/3) [2, 5/2)
The intersection is [7/3, 5/2), thus there is a scaled match in location 2.
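The interval check can be written down directly. The following Python sketch is an illustration of the naïve O(nm) procedure, with hypothetical function names and exact rational arithmetic via `fractions.Fraction`. It assumes a non-empty pattern and uses only the [t/p, (t+1)/p) intervals, without any extra constraint on the scale.

```python
from fractions import Fraction


def scale_interval(pl, tl, start):
    """Intersect [t/p, (t+1)/p) over the pairs aligned at 0-based offset start.
    Returns (lo, hi) for the half-open interval [lo, hi), or None if empty."""
    lo, hi = Fraction(0), None
    for p, t in zip(pl, tl[start:start + len(pl)]):
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
    return (lo, hi) if lo < hi else None


def naive_scaled_matches(pl, tl):
    """Map each 1-based matching location to its interval of possible scales."""
    result = {}
    for i in range(len(tl) - len(pl) + 1):
        interval = scale_interval(pl, tl, i)
        if interval is not None:
            result[i + 1] = interval
    return result
```

For PL = 2 1 2 3 2 and TL = 2 4 2 4 7 4 5 3 the only entry is location 2, with interval [7/3, 5/2), in agreement with the example.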
Improvement: Parameterized Matching
Introduced: Baker, 1994
Motivation: "copying" code
Parameterized Matching
Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.
s parameterize-matches t if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.
Example
s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$
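A direct check of the definition for two given strings might look as follows. This is a sketch with a hypothetical function name. It takes the alphabets to be the symbols that actually occur in s and t, and it tests one pair of equal-length strings in linear time; it is not a linear-time algorithm for finding all p-matches of a pattern in a longer text.

```python
def parameterize_matches(s, t):
    """Return True if a one-to-one symbol renaming turns s into t."""
    if len(s) != len(t):
        return False
    fwd, back = {}, {}
    for a, x in zip(s, t):
        # fwd must be a function and back its inverse, so the map is a bijection.
        if fwd.setdefault(a, x) != x or back.setdefault(x, a) != a:
            return False
    return True
```

For instance, `parameterize_matches("aabbb", "xxyyy")` holds with a ↦ x and b ↦ y, while "ab" does not parameterize-match "xx" because two symbols would map to the same one.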
Parameterized Matching
Claim (AFM-94):
For Σ that can be sorted in linear time (e.g., $\Sigma = \{1, \dots, n\}$),
parameterized matching can be done in time O(n).
The reduction
Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.
Proof: Assume PL does not p-match TL at location i.
The possible situations are:
Possibility 1
PL: b … b
TL: a … c, with c ≠ a
wlog c ≥ a+1
For c = a+1 (smallest possible), the two intervals are
$[a/b, (a+1)/b)$ and $[(a+1)/b, (a+2)/b)$,
which do not intersect.
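Possibility 2
PL: b … c, with c ≠ b
TL: a … a
wlog c ≥ b+1
For c = b+1 the two intervals are
$[a/b, (a+1)/b)$ and $[a/(b+1), (a+1)/(b+1))$.
Intersection not empty only if
$(a+1)/(b+1) > a/b$, i.e.
$ab + b > ab + a$, i.e.
$b > a$.
But this can never happen if α ≥ 1, since then a ≥ b.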
Parameterized Matching
Input two strings s and t |s|=|t| over alphabets sums and sumt
s parameterize matches t if bijection sums sumt such that (s) = t
exist
(a)=x
(b)=y
Π Π
ΠΠ
a ab b b
x xy y y
Example
Parameterized Matching
Claim (AFM-94)
For Σ that can be sorted in linear time (eg Σ=1 n)
Parameterized matching can be done in time O(n)
The reduction
1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i
Proof Assume PL does not p-match TL at
location i
The possible situations are
Possibility 1wlog c ge a+1
For c = a+1 (smallest possible)
TL
PL
a
b b
cnea
b
a
b
a
b
a
b
a 211
Possibility 2
wlog c ge b+1
Intersection not empty only if
(a+1)(b+1) gt ab ie
ab+b gt ab+a
bgta
But this can never happen if α ge 1
TL
PL
a
b cneb
a
1
11
1
b
a
b
a
b
a
b
a
Algorithm for Real Scaled String Matching
Let Pi1 Pi2 Pij be the different numbers in PL
1 P-match PL in TL2 For each match chack intersection
of intervals between Pi1 Pij and corresponding symbols in TL
End Algorithm
PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches
TL = 5 6 5 6 5 6 10 6 10 6 10 7
scaled match
Example
2133 32
21
3121 2232
3121 2255
3231
21 3333
Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
So there are at most O(√m) different Pik's.
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location. Total: O(n√m).
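The step from the sum bound to the O(√m) count can be written out as follows, assuming (as the slide's wording suggests) that the j different numbers in PL are distinct positive integers, so that the k-th smallest of them is at least k:

```latex
\[
\frac{j(j+1)}{2} \;=\; \sum_{k=1}^{j} k \;\le\; \sum_{k=1}^{j} P_{i_k} \;\le\; m
\quad\Longrightarrow\quad
j \;<\; \sqrt{2m} \;=\; O(\sqrt{m}).
\]
```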
Tighter analysis
Upper bound on the number of possible p-matches.
Lemma: Let |P| = m, |T| = n, and let Pi1, Pi2, ..., Pij be the different numbers in PL.
Then there are at most 2n/j p-matches of PL in TL.
Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is
O((2n/j) · j) = O(n)
Proof of Lemma
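Consider the 1st appearance of each of Pi1, ..., Pij in PL, and the text numbers a1, ..., aj aligned with them in a p-match:
PL: Pi1 Pi2 ... Pij
TL: a1 a2 ... aj
Since a1, ..., aj are distinct positive integers,
$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$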
Lemma's proof (cont.)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of Pik's in the pattern, taken over all p-matches, is
≥ x·j²/2
But: there are overlaps. How many?
Lemma's proof (cont.)
For each text location, at most j matches will count it. Therefore:
Total count without overlaps ≥ $\frac{1}{j} \cdot \frac{x j^2}{2} = \frac{xj}{2}$
Clearly x·j/2 ≤ n, thus x ≤ 2n/j
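The bound can be sanity-checked numerically on the earlier example. This sketch assumes that n is the uncompressed text length, i.e. the sum of the numbers in TL, and counts p-matches by a naive scan; `count_p_matches` is a hypothetical helper.

```python
def count_p_matches(PL, TL):
    """Count the windows of TL that p-match PL (naive scan)."""
    m = len(PL)
    count = 0
    for i in range(len(TL) - m + 1):
        fwd, bwd = {}, {}
        ok = all(fwd.setdefault(p, t) == t and bwd.setdefault(t, p) == p
                 for p, t in zip(PL, TL[i:i + m]))
        count += ok
    return count

PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
j = len(set(PL))  # number of different numbers in PL
n = sum(TL)       # assumed: length of the uncompressed text
x = count_p_matches(PL, TL)
print(x, 2 * n / j)  # 4 82.0
```

Here the four p-matches sit well below the bound of 82; the lemma only promises an upper limit, not tightness.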
Open Problem
Give 1-d algorithm linear in run-length compressed text and pattern