Permuted Scaled Matching
Ayelet Butman, Noa Lewenstein, Ian Munro
Scaled matching
Input: Text T = t1,…,tn
Pattern P = p1,…,pm
Scaling: P[i] = p1…p1 p2…p2 … pm…pm, where each character is repeated i times.
Output: All text locations j for which there exists an i such that P[i] matches at j.
Scaled matching
[Figure: the pattern, its 2-scaling bb cc aa aa, and the text location where the scaled pattern matches.]
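The definition of P[i] can be sketched in a few lines. This is a brute-force illustration only, not the talk's method; the function names `scale` and `scaled_matches` are hypothetical, and locations are 0-based.

```python
def scale(pattern: str, i: int) -> str:
    """Return P[i]: every character of the pattern repeated i times."""
    return "".join(ch * i for ch in pattern)


def scaled_matches(text: str, pattern: str) -> list[tuple[int, int]]:
    """Brute force: list (location, scale) pairs where P[scale] occurs in text."""
    hits = []
    m = len(pattern)
    if m == 0:
        return hits
    # Only scales whose scaled pattern still fits into the text are tried.
    for i in range(1, len(text) // m + 1):
        scaled = scale(pattern, i)
        start = text.find(scaled)
        while start != -1:
            hits.append((start, i))
            start = text.find(scaled, start + 1)
    return sorted(hits)
```

For example, `scale("bcaa", 2)` gives `bbccaaaa`, and that string occurs at location 1 of the text `abbccaaaab`.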
Permutation matching
Input: Text T = t1,…,tn
Pattern P = p1,…,pm
Permutation (of pattern): pπ(1) pπ(2) … pπ(m), where π is a permutation on [m].
Output: All text locations j where a pattern permutation occurs.
[Figure: a permutation of the pattern aligned under a matching text location.]
Permutation matching
• Easy to solve in O(n) time (linear-size alphabets).
• The pattern matching version of Jumbled Indexing.
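One common way to get the linear-time bound is a sliding window over character counts. The sketch below is an assumption about how this can be done, not necessarily the construction the speakers had in mind; `permutation_matches` is a hypothetical name and locations are 0-based.

```python
from collections import Counter


def permutation_matches(text: str, pattern: str) -> list[int]:
    """Return all start locations where some permutation of pattern occurs."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # diff[c] = (count of c in pattern) - (count of c in current window)
    diff = Counter(pattern)
    nonzero = len(diff)  # number of characters whose diff is not 0

    def add(ch: str, delta: int) -> None:
        nonlocal nonzero
        before = diff[ch]
        after = before + delta
        diff[ch] = after
        if before == 0 and after != 0:
            nonzero += 1
        elif before != 0 and after == 0:
            nonzero -= 1

    for ch in text[:m]:
        add(ch, -1)
    hits = [0] if nonzero == 0 else []
    for j in range(m, n):
        add(text[j], -1)      # character entering the window
        add(text[j - m], +1)  # character leaving the window
        if nonzero == 0:
            hits.append(j - m + 1)
    return hits
```

Each step touches two counters, so the scan is linear in n once dictionary access is treated as constant time.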
Scaled permutation matching
• Match: First Permutation and then Scaling.
[Figure: the pattern is first permuted and then scaled to aa bb cc aa, which is aligned under the matching text location.]
• Butman-Eres-Landau [04]: Scaled Permutation Matching in O(n) time.
• Open: Can one do the reverse efficiently, i.e. scaling first and then permutation?
• Hard?
How can we solve it? First: a naïve algorithm
Permuted scaled matching
Input: Text T=t1,…,tn
Pattern P=p1,…,pm
Output: All text locations j where a permuted scaled match exists, i.e. where a permutation of some scaling of P occurs.
Permuted scaled matching
[Figure: the pattern is scaled to bb cc aa aa, and a permutation of that scaling is aligned under the matching text location.]
Naïve algorithm
[Figure, shown over three slides: the text T and the pattern P; candidate windows are tried for k = 1 and then for k = 2.]
Naïve algorithm
1. Construct a table R of size (n+1)×|Σ| such that R(i,j) = #σj(T[0, i]) for i ≥ 0 and R(−1, j) = 0.
2. For every 0 ≤ i < j ≤ n−1 such that j − i + 1 = km for some natural number k ≥ 1 do:
   a. Let r(l) = (R(j,l) − R(i−1,l)) / #σl(P).
   b. If r(l) = k for each l, 0 ≤ l ≤ |Σ| − 1, then announce that i is a k-scaled appearance.
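The two steps above can be sketched as follows. A few choices here are assumptions made for the sketch: row 0 of `R` stands in for R(−1, ·), the test `r(l) = k` is written as a multiplication so that characters absent from P need no division, and windows of length exactly m (k = 1 with any start) are included. The name `naive_permuted_scaled` is hypothetical.

```python
def naive_permuted_scaled(text: str, pattern: str) -> list[tuple[int, int, int]]:
    """Return (i, j, k) for every window T[i..j] that is a permuted k-scaling of P."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    alphabet = sorted(set(text) | set(pattern))
    index = {ch: l for l, ch in enumerate(alphabet)}
    pcount = [pattern.count(ch) for ch in alphabet]

    # R[i + 1][l] = occurrences of alphabet[l] in text[0..i]; R[0] is all zeros.
    R = [[0] * len(alphabet)]
    for ch in text:
        row = R[-1][:]
        row[index[ch]] += 1
        R.append(row)

    hits = []
    for i in range(n):
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            # Equivalent to r(l) = k for every character l.
            if all(R[j + 1][l] - R[i][l] == k * pcount[l]
                   for l in range(len(alphabet))):
                hits.append((i, j, k))
    return hits
```

On the text `aabb` with pattern `ab`, the sketch reports the whole text as a 2-scaled appearance and the middle `ab` as a 1-scaled appearance.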
Naïve algorithm
[Figure, animated over many slides: the table R is filled in column by column for text positions −1 to 12, with one row per character a, b, c. Then, for the pattern P with #a = 2 and #b = #c = 1, candidate windows are tested for k = 1 and k = 2 by computing r(l) for each character and comparing it with k.]
Naïve algorithm
The running time is O(n²|Σ|/m): step 2 examines O(n²/m) substrings and checks each of them in O(|Σ|) time.
Better?
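Properties
• Mod-Equivalency: i and j are mod-equivalent if for every character σ (with frequency c in P):
  #σ in T[0,i] mod c = #σ in T[0,j] mod c
• Equal-Quotients: i and j have equal quotients for characters a and b (with frequencies ca and cb in P) if:
  ⌊(#a in T[0,i]) / ca⌋ − ⌊(#b in T[0,i]) / cb⌋ = ⌊(#a in T[0,j]) / ca⌋ − ⌊(#b in T[0,j]) / cb⌋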
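The two properties can be written as small predicates. This is a sketch under stated assumptions: the equal-quotients test is read off the worked examples in the talk (floor of prefix count over pattern frequency, differenced between the two characters), every text character is assumed to occur in P, and the function names are hypothetical.

```python
def prefix_counts(text: str, i: int, chars: str) -> dict[str, int]:
    """Count each character of `chars` in text[0..i]; i = -1 is the empty prefix."""
    prefix = text[: i + 1]
    return {ch: prefix.count(ch) for ch in chars}


def mod_equivalent(text: str, pattern: str, i: int, j: int) -> bool:
    """True if every prefix count agrees modulo its frequency in the pattern."""
    chars = "".join(sorted(set(pattern)))
    ci = prefix_counts(text, i, chars)
    cj = prefix_counts(text, j, chars)
    return all(ci[ch] % pattern.count(ch) == cj[ch] % pattern.count(ch)
               for ch in chars)


def equal_quotients(text: str, pattern: str, i: int, j: int,
                    a: str, b: str) -> bool:
    """True if the quotient difference for characters a and b is the same at i and j."""
    ci = prefix_counts(text, i, a + b)
    cj = prefix_counts(text, j, a + b)
    fa, fb = pattern.count(a), pattern.count(b)
    return ci[a] // fa - ci[b] // fb == cj[a] // fa - cj[b] // fb
```

With a pattern such as `aabc` (#a = 2, #b = #c = 1), two locations can be mod-equivalent and still fail the equal-quotients test, which is why both properties are needed.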
Mod-equivalent
[Figure: the prefix-count table of the text T for positions −1 to 12, with rows a, b, c; the pattern P has #a = 2 and #b = #c = 1.]
Example 1: locations 3 and 11 have prefix counts (a, b, c) = (1, 1, 2) and (3, 3, 6).
  a: 3 mod 2 = 1 mod 2;  b: 3 mod 1 = 1 mod 1;  c: 6 mod 1 = 2 mod 1. The locations are mod-equivalent.
Example 2: locations 2 and 10 have a-counts 0 and 3.
  a: 3 mod 2 ≠ 0 mod 2. The locations are not mod-equivalent.
Example 3 (a different text): locations 3 and 11 have prefix counts (a, b, c) = (1, 1, 2) and (5, 3, 4).
  a: 5 mod 2 = 1 mod 2;  b: 3 mod 1 = 1 mod 1;  c: 4 mod 1 = 2 mod 1. The locations are mod-equivalent.
Equal-quotients
• Equal-Quotients: i and j have equal quotients for characters a and b (with frequencies ca and cb in P) if:
  ⌊(#a in T[0,i]) / ca⌋ − ⌊(#b in T[0,i]) / cb⌋ = ⌊(#a in T[0,j]) / ca⌋ − ⌊(#b in T[0,j]) / cb⌋
[Figure: prefix-count tables for the example texts; in the first two examples the pattern P has #a = 2 and #b = #c = 1.]
Example 1: locations 3 and 11 have prefix counts (a, b, c) = (1, 1, 2) and (5, 3, 4).
  a-b: ⌊5/2⌋ − ⌊3/1⌋ = ⌊1/2⌋ − ⌊1/1⌋ (both sides are −1)
  b-c: ⌊3/1⌋ − ⌊4/1⌋ = ⌊1/1⌋ − ⌊2/1⌋ (both sides are −1)
Example 2: locations 3 and 11 have prefix counts (a, b, c) = (1, 1, 2) and (3, 3, 6).
  a-b: ⌊3/2⌋ − ⌊3/1⌋ ≠ ⌊1/2⌋ − ⌊1/1⌋ (−2 versus −1)
Example 3 (a text over {a, b}, pattern with #a = #b = 3): locations 3 and 15 have prefix counts (a, b) = (3, 1) and (10, 6).
  a-b: ⌊10/3⌋ − ⌊6/3⌋ = ⌊3/3⌋ − ⌊1/3⌋ (both sides are 1)
Theorem
T[i+1, j] is a permuted k-scaling of P for some k iff
1. Locations i and j of T are mod-equivalent.
2. Locations i and j of T satisfy the equal-quotients property for each pair of characters.
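The theorem can be checked exhaustively on small inputs. The sketch below reads locations i and j as prefix ends, so the window tested is T[i+1..j], with i = −1 standing for the empty prefix. It rewrites "equal quotients for each pair" as "every character's quotient grows by the same amount between i and j", which is the same condition rearranged. All names are hypothetical, and the text is assumed to use only characters of P.

```python
from collections import Counter
from itertools import combinations


def is_permuted_scaling(window: str, pattern: str) -> bool:
    """Direct test: the window holds exactly k copies of each pattern character, k >= 1."""
    if not window or len(window) % len(pattern):
        return False
    k = len(window) // len(pattern)
    return Counter(window) == Counter({ch: k * c for ch, c in Counter(pattern).items()})


def criterion(text: str, pattern: str, i: int, j: int) -> bool:
    """Mod-equivalence plus equal quotients for locations i < j."""
    freq = Counter(pattern)
    ci, cj = Counter(text[: i + 1]), Counter(text[: j + 1])
    mod_ok = all(ci[ch] % f == cj[ch] % f for ch, f in freq.items())
    # Equal quotients for all pairs <=> all quotient growths are identical.
    growth = {cj[ch] // f - ci[ch] // f for ch, f in freq.items()}
    return mod_ok and len(growth) == 1


def check_theorem(text: str, pattern: str) -> bool:
    """Compare the criterion with the direct test on every pair -1 <= i < j <= n-1."""
    return all(
        criterion(text, pattern, i, j)
        == is_permuted_scaling(text[i + 1 : j + 1], pattern)
        for i, j in combinations(range(-1, len(text)), 2)
    )
```

Calling `check_theorem` on a handful of short texts is a quick sanity check of the statement, not a proof.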
[Figure: each location (columns i and j) has a vector with one mod-equivalence entry per character (a, b, c, d, e, f) followed by one equal-quotients entry per adjacent character pair (a-b, b-c, c-d, d-e, e-f).]
Putting it together
[Figure, animated: the vectors form the table R, each tagged with its location; they are sorted with radix sort, grouped into equivalence classes, and the appearances within each class are announced.]
Putting it all together
Algorithm:
1. Build a table R of size n×(2|Σ|+1).
2. For every 0 ≤ i ≤ n−1:
   o For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)
   o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i,j) = the equal-quotients difference of T[0, i] for a pair of adjacent characters (a-b, b-c, …)
3. Each vector is associated with its location i.
4. Sort the vectors using radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each equivalence class containing locations i1, i2, …, il, announce the appearances T[i+1, j] for each i, j ∈ {i1, i2, …, il} such that i < j.
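The steps above can be sketched as follows. Several things here are assumptions of the sketch rather than the talk's construction: a dictionary keyed by the vector replaces the radix sort and grouping steps, the empty prefix is included as location −1 so that windows starting at position 0 are found, and the text is required to use only characters of P. The names are hypothetical.

```python
from collections import defaultdict


def signature_classes(text: str, pattern: str) -> dict[tuple, list[int]]:
    """Map each vector (mod entries, then adjacent quotient differences) to its locations."""
    alphabet = sorted(set(pattern))
    pos = {ch: l for l, ch in enumerate(alphabet)}
    freq = [pattern.count(ch) for ch in alphabet]
    counts = [0] * len(alphabet)
    classes: dict[tuple, list[int]] = defaultdict(list)

    def signature() -> tuple:
        mods = [c % f for c, f in zip(counts, freq)]
        quots = [c // f for c, f in zip(counts, freq)]
        diffs = [quots[l] - quots[l + 1] for l in range(len(quots) - 1)]
        return tuple(mods + diffs)

    classes[signature()].append(-1)  # empty prefix
    for i, ch in enumerate(text):
        if ch not in pos:
            raise ValueError(f"character {ch!r} does not occur in the pattern")
        counts[pos[ch]] += 1
        classes[signature()].append(i)
    return classes


def all_appearances(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return every (start, end) with T[start..end] a permuted scaling of P."""
    hits = []
    for locs in signature_classes(text, pattern).values():
        for a in range(len(locs)):
            for b in range(a + 1, len(locs)):
                hits.append((locs[a] + 1, locs[b]))
    return sorted(hits)
```

Building the vectors from scratch at every location costs O(|Σ|) per location, which matches the n|Σ| term of the stated bound; reporting the pairs adds the occ term.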
Theorem
• The running time of the permuted scaled matching algorithm is:
O(n|Σ|+occ).
Output representation
• The output of the algorithm, whose size we denoted occ, may be as large as O(n²/m).
• Example:
  o Text aⁿ.
  o Pattern aᵐ.
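For the unary example the number of appearances can be counted directly: every substring whose length is a positive multiple of m is an appearance. A small sketch (the function name is hypothetical):

```python
def count_unary_appearances(n: int, m: int) -> int:
    """Number of substrings of a^n whose length is k*m for some k >= 1."""
    # A substring of length k*m can start at n - k*m + 1 positions.
    return sum(n - k * m + 1 for k in range(1, n // m + 1))
```

For n = 1000 and m = 10 this gives 49600, close to n²/(2m) = 50000, which illustrates the quadratic growth in n.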
Output representation
• To reduce the large number of appearances, set the output to the shortest match at each text location i.
[Figure: a text T and a pattern P, with the shortest match marked at a text location.]
Claim
• Let i < j < h be three text locations.
• Assume T[i, j] is a permuted scaled appearance of P.
• Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P.
[Figure, shown over three slides: the windows T[i, j], T[j + 1, h] and T[i, h] marked on the text T.]
Putting it all together
Algorithm:
1. Build a table R of size n×(2|Σ|+1).
2. For every 0 ≤ i ≤ n−1:
   o For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)
   o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i,j) = the equal-quotients difference of T[0, i] for a pair of adjacent characters (a-b, b-c, …)
3. Each vector is associated with its location i.
4. Sort the vectors using radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each entry q’ containing the linked list i1, i2, …, il, announce the appearance T[i_r + 1, i_(r+1)] for each pair of consecutive locations i_r, i_(r+1) in the list.
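The shortest-match variant only needs, for each vector, the most recent location that produced it. The sketch below makes the same assumptions as before (empty prefix as location −1, text restricted to the characters of P) and uses a dictionary of last-seen locations in place of the sorted linked lists; the name `shortest_appearances` is hypothetical.

```python
def shortest_appearances(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) of the shortest appearance starting at each matching location."""
    alphabet = sorted(set(pattern))
    pos = {ch: l for l, ch in enumerate(alphabet)}
    freq = [pattern.count(ch) for ch in alphabet]
    counts = [0] * len(alphabet)

    def signature() -> tuple:
        mods = [c % f for c, f in zip(counts, freq)]
        quots = [c // f for c, f in zip(counts, freq)]
        diffs = [quots[l] - quots[l + 1] for l in range(len(quots) - 1)]
        return tuple(mods + diffs)

    last_seen = {signature(): -1}  # vector -> latest location with that vector
    hits = []
    for i, ch in enumerate(text):
        if ch not in pos:
            raise ValueError(f"character {ch!r} does not occur in the pattern")
        counts[pos[ch]] += 1
        sig = signature()
        if sig in last_seen:
            # Consecutive locations of one class give the shortest appearance.
            hits.append((last_seen[sig] + 1, i))
        last_seen[sig] = i
    return hits
```

By the claim, any longer appearance is a concatenation of these shortest ones, and at most one pair is reported per text location, so the output has size O(n).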
Running Time
• Permuted Scaled Matching: the running time is O(n|Σ|).
For efficiency
• Need to generate the vectors quickly.
• Need to compare vectors quickly.
Idea: hash
• Need a hash on vectors that can be updated quickly when the vector changes very little.
• Use: a hash similar to Karp-Rabin.
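One way to realise such a hash is a polynomial fingerprint of the vector, updated only at the entries that change when the location advances by one. This is a sketch of the idea, not the talk's exact scheme: the constants `MOD` and `BASE` are illustrative choices (a randomized variant would draw the base at random), equal hashes only suggest equal vectors because collisions are possible, and all names are hypothetical.

```python
MOD = (1 << 61) - 1  # a Mersenne prime, chosen only for illustration
BASE = 1_000_003     # hypothetical fixed base


def vector(counts: list[int], freq: list[int]) -> list[int]:
    """Mod entries followed by quotient differences of adjacent characters."""
    mods = [c % f for c, f in zip(counts, freq)]
    quots = [c // f for c, f in zip(counts, freq)]
    return mods + [quots[l] - quots[l + 1] for l in range(len(quots) - 1)]


def full_hash(vec: list[int]) -> int:
    """Fingerprint computed from scratch: sum of vec[t] * BASE**t modulo MOD."""
    return sum(v * pow(BASE, t, MOD) for t, v in enumerate(vec)) % MOD


def incremental_hashes(text: str, pattern: str) -> list[int]:
    """Fingerprints for locations -1, 0, ..., n-1 with O(1) hash updates per step."""
    alphabet = sorted(set(pattern))
    s = len(alphabet)
    pos = {ch: l for l, ch in enumerate(alphabet)}
    freq = [pattern.count(ch) for ch in alphabet]
    power = [pow(BASE, t, MOD) for t in range(2 * s - 1)]
    mods = [0] * s
    h = 0
    hashes = [h]  # the all-zero vector of the empty prefix
    for ch in text:
        l = pos[ch]  # raises KeyError if ch does not occur in the pattern
        new_mod = (mods[l] + 1) % freq[l]
        h = (h + (new_mod - mods[l]) * power[l]) % MOD  # one mod entry changes
        mods[l] = new_mod
        if new_mod == 0:  # the quotient of character l went up by one
            if l > 0:      # pair (l-1, l): difference decreases by 1
                h = (h - power[s + l - 1]) % MOD
            if l < s - 1:  # pair (l, l+1): difference increases by 1
                h = (h + power[s + l]) % MOD
        hashes.append(h)
    return hashes
```

Each step adjusts at most one mod entry and two quotient-difference entries, so the fingerprint of the next location follows from the previous one with a constant number of arithmetic operations.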
[Figure: the vectors of locations i and i+1 side by side. Moving from i to i+1 causes at most 1 change among the mod-equivalence entries (a, b, c, d, e, f) and at most 2 changes among the equal-quotients entries (a-b, b-c, c-d, d-e, e-f). An example on a text T and pattern P follows.]
• The running time can be improved to:
  o Deterministic: O(n log |Σ|)
  o Randomized: O(n)