Download - Permuted Scaled Matching Ayelet Butman Noa Lewenstein Ian Munro

Permuted Scaled Matching

Ayelet Butman, Noa Lewenstein, Ian Munro

Scaled matching

Input: Text T = t_1,…,t_n

Pattern P = p_1,…,p_m

Scaling: P[i] = p_1…p_1 p_2…p_2 … p_m…p_m, where each character is repeated i times.

Output: All text locations j for which there exists an i such that P[i] matches at j.
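The definition can be illustrated with a minimal brute-force sketch. The function names `scale` and `scaled_matches` are hypothetical, and the search simply tries every scale in turn; it is not meant to be efficient.

```python
def scale(pattern, k):
    """Return P[k]: every character of the pattern repeated k times."""
    return "".join(ch * k for ch in pattern)


def scaled_matches(text, pattern):
    """Brute force: list (location, scale) pairs where a scaled pattern occurs."""
    if not pattern:
        return []
    hits = []
    k = 1
    while k * len(pattern) <= len(text):
        scaled = scale(pattern, k)
        start = text.find(scaled)
        while start != -1:
            hits.append((start, k))
            start = text.find(scaled, start + 1)
        k += 1
    return hits
```

For example, `scale("bcaa", 2)` gives `"bbccaaaa"`.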

[Figure: scaled matching example. A 2-scaled pattern (bb cc aa aa) is aligned with its occurrence in the text.]

Permutation matching

Input: Text T = t_1,…,t_n

Pattern P = p_1,…,p_m

Permutation (of pattern): p_π(1) p_π(2) … p_π(m), where π is a permutation on [m].

Output: All text locations j where a pattern permutation occurs.

[Figure: permutation matching example. A permutation of the pattern is aligned with its occurrence in the text.]

Permutation matching

• Easy to solve in O(n) time (linear size alphabets).

• The pattern matching version of Jumbled Indexing.
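One common way to get the linear bound is a sliding window over character counts. The sketch below is an illustration under that assumption, not necessarily the method the slides have in mind; `permutation_matches` is a hypothetical name, and the O(n) bound here relies on constant-time dictionary operations.

```python
from collections import Counter


def permutation_matches(text, pattern):
    """Return all start positions j where text[j:j+m] is a permutation of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # diff[ch] = (count of ch in the current window) - (count of ch in the pattern)
    diff = Counter(text[:m])
    diff.subtract(pattern)
    nonzero = sum(1 for v in diff.values() if v != 0)
    hits = [0] if nonzero == 0 else []
    for j in range(1, n - m + 1):
        # Slide the window: drop text[j-1], add text[j+m-1].
        for ch, delta in ((text[j - 1], -1), (text[j + m - 1], +1)):
            before = diff[ch]
            diff[ch] = before + delta
            nonzero += (diff[ch] != 0) - (before != 0)
        if nonzero == 0:
            hits.append(j)
    return hits
```

Each step touches two counters, so the whole scan does a constant amount of work per text location.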

Scaled permutation matching

• Match: first permutation and then scaling.

[Figure: the pattern is first permuted, and the permuted pattern is then scaled (aa bb cc aa) to match the text.]

• Butman-Eres-Landau [04]: scaled permutation matching in O(n) time.

• Open: can one do the reverse efficiently, i.e. scaling first and then permutation?

• Hard?

How can we solve it? First: a naïve algorithm.

Permuted scaled matching

Input: Text T = t_1,…,t_n

Pattern P = p_1,…,p_m

Output: All text locations j at which a permuted scaled match exists, i.e. where a permutation of some scaling of P occurs.

[Figure: permuted scaled matching example. A 2-scaled pattern (bb cc aa aa) is matched against a permuted occurrence in the text.]

Naïve algorithm

[Figure: a four-character pattern P over {a, b, c} is compared against windows of a text T of length 13, first with k=1 and then with k=2.]

Naïve algorithm

1. Construct a table R of size (n+1)×|Σ| such that R(i, j) = #σ_j(T[0, i]) for i ≥ 0 and R(−1, j) = 0.

2. For every 0 ≤ i < j ≤ n−1 such that j−i+1 = km for some natural number k ≥ 1 do:

   a. Let r(l) = (R(j, l) − R(i−1, l)) / #σ_l(P).

   b. If r(l) = k for each l, 0 ≤ l ≤ |Σ|−1, then announce that i is a k-scaled appearance.
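The steps above can be sketched as follows. This is a minimal illustration with some assumptions that are not in the slides: Σ is taken to be the set of characters of P, text characters outside Σ are tracked in an extra column so that any window containing one is rejected, and the test r(l) = k is written as a multiplication to avoid division. The name `naive_permuted_scaled` is hypothetical.

```python
def naive_permuted_scaled(text, pattern):
    """Return (i, k) pairs such that text[i:i+k*m] is a permuted k-scaling of pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    sigma = sorted(set(pattern))                    # assumption: alphabet of P
    freq = [pattern.count(ch) for ch in sigma]      # the values #sigma_l(P)
    col = {ch: l for l, ch in enumerate(sigma)}
    # prefix[i + 1][l] plays the role of R(i, l); prefix[0] is the row R(-1, .) = 0.
    # The last column counts characters that do not occur in P (assumption).
    prefix = [[0] * (len(sigma) + 1)]
    for ch in text:
        row = prefix[-1][:]
        row[col.get(ch, len(sigma))] += 1
        prefix.append(row)
    hits = []
    for i in range(n):
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            lo, hi = prefix[i], prefix[j + 1]
            if hi[-1] != lo[-1]:
                break  # a foreign character is in the window and in all longer ones
            # r(l) == k, written as a product instead of a quotient
            if all(hi[l] - lo[l] == k * freq[l] for l in range(len(sigma))):
                hits.append((i, k))
    return hits
```

For the text `"aabb"` and the pattern `"ab"`, the whole text is a permuted 2-scaling and `"ab"` at location 1 is a 1-scaling.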

[Figure: step-by-step construction of the table R for a text T of length 13 (columns −1 to 12, rows a, b, c), followed by window checks against a pattern P with #a=2, #b=#c=1. For k=1, one window gives r(l)=1 for every character (a match) and another window gives a ratio of 0 (no match). For k=2, a window gives r(l)=2 for every character (a match).]

Naïve algorithm

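The running time is O(n|Σ| + n²|Σ|/m), where m = |P|: building R takes O(n|Σ|) time, and step 2 checks O(n²/m) pairs (i, j) in O(|Σ|) time each.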

Better?

Properties

• Mod-Equivalency: i and j are Mod-Equivalent if for every character σ (with frequency c in P):

#σ in T[0,i] mod c = #σ in T[0,j] mod c

• Equal-Quotients: i and j have equal-quotients for characters a & b if:

⌊#a(T[0,j]) / #a(P)⌋ − ⌊#b(T[0,j]) / #b(P)⌋ = ⌊#a(T[0,i]) / #a(P)⌋ − ⌊#b(T[0,i]) / #b(P)⌋


Mod-equivalent

[Figure: prefix-count table for a text T of length 13 (columns −1 to 12, rows a, b, c) and a pattern P with #a=2, #b=#c=1.]

Worked checks from the table (counts at the later location on the left, at the earlier location on the right):

• Locations 3 and 11 are mod-equivalent:

a (#a=2): 3 mod 2 = 1 mod 2

b (#b=1): 3 mod 1 = 1 mod 1

c (#c=1): 6 mod 1 = 2 mod 1

• Locations 2 and 10 are not mod-equivalent:

a (#a=2): 3 mod 2 ≠ 0 mod 2

• In a second text, locations 3 and 11 are mod-equivalent:

5 mod 2 = 1 mod 2, 3 mod 1 = 1 mod 1, 4 mod 1 = 2 mod 1

Equal-quotients

• Equal-Quotients: i and j have equal-quotients for characters a & b if:

⌊#a(T[0,j]) / #a(P)⌋ − ⌊#b(T[0,j]) / #b(P)⌋ = ⌊#a(T[0,i]) / #a(P)⌋ − ⌊#b(T[0,i]) / #b(P)⌋

[Figure: prefix-count tables for texts of length 13 (columns −1 to 12, rows a, b, c) and a pattern P with #a=2, #b=#c=1.]

Worked checks from the tables (counts at the later location on the left, at the earlier location on the right):

• Locations 3 and 11 have equal quotients:

a & b: ⌊5/2⌋ − ⌊3/1⌋ = ⌊1/2⌋ − ⌊1/1⌋ (both sides equal −1)

b & c: ⌊3/1⌋ − ⌊4/1⌋ = ⌊1/1⌋ − ⌊2/1⌋ (both sides equal −1)

• In a second text, locations 3 and 11 do not have equal quotients:

a & b: ⌊3/2⌋ − ⌊3/1⌋ ≠ ⌊1/2⌋ − ⌊1/1⌋ (−2 versus −1)

• In a text over {a, b} with a pattern P such that #a=#b=3, locations 3 and 15 have equal quotients:

a & b: ⌊10/3⌋ − ⌊6/3⌋ = ⌊3/3⌋ − ⌊1/3⌋ (both sides equal 1)

Here the a-counts 10 and 3 differ mod 3, so these two locations are not mod-equivalent: equal quotients alone is not enough.

Theorem

For i < j, T[i+1, j] is a permuted k-scaling of P for some k ≥ 1 iff

1. Locations i and j of T are mod-equivalent.

2. Locations i and j of T satisfy the equal-quotients property for each pair of characters.
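The theorem can be checked empirically on small inputs. The sketch below compares the two conditions against a direct count-based test for every pair of locations. It assumes that every text character occurs in P, that counts are prefix counts over T[0, i] with location −1 standing for the empty prefix, and that the window tested for locations i < j is T[i+1, j]. All function names are hypothetical.

```python
from collections import Counter


def prefix_counts(text, sigma):
    """counts[i + 1][l] = occurrences of sigma[l] in text[0..i]; counts[0] is location -1."""
    col = {ch: l for l, ch in enumerate(sigma)}
    counts = [[0] * len(sigma)]
    for ch in text:
        row = counts[-1][:]
        row[col[ch]] += 1  # assumes every text character occurs in the pattern
        counts.append(row)
    return counts


def mod_equivalent(ci, cj, freq):
    return all(x % c == y % c for x, y, c in zip(ci, cj, freq))


def equal_quotients(ci, cj, freq):
    qi = [x // c for x, c in zip(ci, freq)]
    qj = [y // c for y, c in zip(cj, freq)]
    # Adjacent pairs suffice: equality for (a, b) and (b, c) implies it for (a, c).
    return all(qi[l] - qi[l + 1] == qj[l] - qj[l + 1] for l in range(len(freq) - 1))


def is_permuted_scaling(window, pattern):
    """Direct test: window has exactly k copies of each pattern character, k >= 1."""
    k, rest = divmod(len(window), len(pattern))
    if k < 1 or rest:
        return False
    return Counter(window) == Counter({ch: k * c for ch, c in Counter(pattern).items()})


def theorem_agrees(text, pattern):
    sigma = sorted(set(pattern))
    freq = [pattern.count(ch) for ch in sigma]
    counts = prefix_counts(text, sigma)
    for i in range(-1, len(text)):
        for j in range(i + 1, len(text)):
            ci, cj = counts[i + 1], counts[j + 1]
            by_theorem = mod_equivalent(ci, cj, freq) and equal_quotients(ci, cj, freq)
            if by_theorem != is_permuted_scaling(text[i + 1:j + 1], pattern):
                return False
    return True
```

Checking only adjacent character pairs is a design choice that keeps the test at |Σ|−1 comparisons per pair of locations.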

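[Figure: the vector stored for each text location. Its first part (Mod-Equivalent) holds one entry per character a, b, c, d, e, f; its second part (Equal-quotients) holds one entry per pair of adjacent characters a-b, b-c, c-d, d-e, e-f. The vectors of two locations i and j are compared entry by entry. A small example over the characters a, b, c compares the vectors at locations 2 and 8.]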

Putting it together

[Figure: the vectors of all text locations form the rows of the table R. Each row is tagged with its location i, the rows are sorted, and equal rows are grouped into classes of locations i_1, i_2, i_3, ...]

Putting it all together

Algorithm:

1. Build a table R of size n×(2|Σ|+1).

2. For every 0 ≤ i ≤ n−1:

   o For every 0 ≤ j ≤ |Σ|−1: R(i, j) = #σ_j(T[0, i]) mod #σ_j(P)

   o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i, j) = the equal-quotients value of location i for a pair of adjacent characters σ, σ′, that is, ⌊#σ(T[0, i]) / #σ(P)⌋ − ⌊#σ′(T[0, i]) / #σ′(P)⌋

3. Each vector (row of R) is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each equivalence class containing locations i_1, i_2, ..., i_l, announce appearances T[i + 1, j] for each i, j ∈ {i_1, i_2, ..., i_l} s.t. i < j.
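A minimal sketch of these steps follows. It departs from the slides in three labelled ways: vectors are grouped with a dictionary instead of Radix sort, a vector for location −1 (the empty prefix) is included so that appearances starting at location 0 are found, and an extra entry counts text characters that do not occur in P so that windows containing them are never reported. The name `permuted_scaled_matches` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def permuted_scaled_matches(text, pattern):
    """Return all (start, end) pairs, inclusive, of permuted scaled appearances."""
    sigma = sorted(set(pattern))                 # assumption: alphabet of P
    freq = [pattern.count(ch) for ch in sigma]
    col = {ch: l for l, ch in enumerate(sigma)}
    counts = [0] * len(sigma)
    foreign = 0                                  # characters not in P (assumption)

    def vector():
        mods = [x % c for x, c in zip(counts, freq)]
        quot = [x // c for x, c in zip(counts, freq)]
        diffs = [quot[l] - quot[l + 1] for l in range(len(quot) - 1)]
        return tuple(mods + diffs + [foreign])

    # Group locations by vector; a dict replaces the sort-and-group steps.
    classes = defaultdict(list)
    classes[vector()].append(-1)                 # location -1: the empty prefix
    for i, ch in enumerate(text):
        if ch in col:
            counts[col[ch]] += 1
        else:
            foreign += 1
        classes[vector()].append(i)

    # Every pair i < j in a class gives the appearance T[i + 1, j].
    return sorted((i + 1, j)
                  for locations in classes.values()
                  for i, j in combinations(locations, 2))
```

Building each vector from scratch costs O(|Σ|) per location, which matches the n|Σ| term of the running time; the pair enumeration accounts for the occ term.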

Theorem

• The running time of the permuted scaled matching algorithm is:

O(n|Σ|+occ).

Output representation

• The output of the algorithm, whose size we denoted occ, may be as large as O(n²/m).

• Example: text a^n, pattern a^m.

Output representation

• To reduce the large number of appearances, set the output to the shortest match at each text location i.

Claim

• Let i < j < h be three text locations.

• Assume T[i, j] is a permuted scaled appearance of P.

• Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P.

Putting it all together

Algorithm:

1. Build a table R of size n×(2|Σ|+1).

2. For every 0 ≤ i ≤ n−1:

   o For every 0 ≤ j ≤ |Σ|−1: R(i, j) = #σ_j(T[0, i]) mod #σ_j(P)

   o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i, j) = the equal-quotients value of location i for a pair of adjacent characters σ, σ′, that is, ⌊#σ(T[0, i]) / #σ(P)⌋ − ⌊#σ′(T[0, i]) / #σ′(P)⌋

3. Each vector (row of R) is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each entry q′ containing the linked list i_1, i_2, ..., i_l, announce the appearance T[i_r + 1, i_{r+1}] for each pair of consecutive locations i_r, i_{r+1} in the list.
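The shortest-match variant can be sketched in a single pass: remember the last location at which each vector was seen, and report the window between that location and the current one. As assumptions not taken from the slides, a dictionary replaces the sorted table and linked lists, location −1 stands for the empty prefix, and an extra vector entry counts characters that do not occur in P. The name `shortest_matches` is hypothetical.

```python
def shortest_matches(text, pattern):
    """For each start with a match, report its shortest match as (start, end) inclusive."""
    sigma = sorted(set(pattern))                 # assumption: alphabet of P
    freq = [pattern.count(ch) for ch in sigma]
    col = {ch: l for l, ch in enumerate(sigma)}
    counts = [0] * len(sigma)
    foreign = 0                                  # characters not in P (assumption)

    def vector():
        mods = [x % c for x, c in zip(counts, freq)]
        quot = [x // c for x, c in zip(counts, freq)]
        diffs = [quot[l] - quot[l + 1] for l in range(len(quot) - 1)]
        return tuple(mods + diffs + [foreign])

    last_seen = {vector(): -1}                   # location -1: the empty prefix
    hits = []
    for j, ch in enumerate(text):
        if ch in col:
            counts[col[ch]] += 1
        else:
            foreign += 1
        v = vector()
        if v in last_seen:
            # Consecutive locations i_r, i_{r+1} of one class: T[i_r + 1, i_{r+1}].
            hits.append((last_seen[v] + 1, j))
        last_seen[v] = j
    return hits
```

By the claim above, longer appearances at the same start can be recovered by chaining consecutive shortest matches, so at most one appearance per text location needs to be reported.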

Running Time

• Permuted Scaled Matching: the running time is O(n|Σ|).

For efficiency

• Need to generate the vectors quickly.

• Need to compare vectors quickly.

Idea: hash

• Need a hash on vectors that can be updated quickly when the vector changes very little.

• Use: a hash similar to Karp-Rabin.

[Figure: the vectors of locations i and i+1 side by side. Moving from i to i+1 causes at most 1 change in the Mod-Equivalent part and at most 2 changes in the Equal-quotients part.]

• The running time can be improved to:

   o Deterministic: O(n log |Σ|)

   o Randomized: O(n)
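The hashing idea can be illustrated with a polynomial hash over the vector entries, updated in constant time per changed entry. Because one text character changes at most one mod entry and at most two quotient-difference entries, each step needs at most three updates. This is only a sketch of the idea: the modulus `P_MOD`, the base `BASE`, the vector layout, and the function names are all assumptions, and the slides do not specify the hash beyond "similar to Karp-Rabin". A non-empty pattern is assumed.

```python
P_MOD = (1 << 61) - 1   # hash modulus, a Mersenne prime (assumption)
BASE = 1_000_003        # hash base (assumption)


def full_hash(vec):
    """Hash of a whole vector, computed from scratch in O(len(vec))."""
    return sum(x * pow(BASE, k, P_MOD) for k, x in enumerate(vec)) % P_MOD


def incremental_hashes(text, pattern):
    """Yield (location, hash, vector); the vector is a live list, not a copy."""
    sigma = sorted(set(pattern))
    s = len(sigma)
    freq = [pattern.count(ch) for ch in sigma]
    col = {ch: l for l, ch in enumerate(sigma)}
    power = [pow(BASE, k, P_MOD) for k in range(2 * s)]
    # Layout: s mod entries, s - 1 adjacent quotient differences, 1 foreign count.
    vec = [0] * (2 * s)
    h = 0

    def bump(k, delta):
        nonlocal h
        vec[k] += delta
        h = (h + delta * power[k]) % P_MOD

    yield -1, h, vec
    for i, ch in enumerate(text):
        l = col.get(ch)
        if l is None:
            bump(2 * s - 1, 1)           # character not in P
        elif vec[l] + 1 < freq[l]:
            bump(l, 1)                   # residue grows, quotient unchanged
        else:
            bump(l, -vec[l])             # residue wraps to 0, quotient grows by 1
            if l > 0:
                bump(s + l - 1, -1)      # difference for the pair (l-1, l)
            if l < s - 1:
                bump(s + l, 1)           # difference for the pair (l, l+1)
        yield i, h, vec
```

Locations with equal vectors always receive equal hashes; distinct vectors may collide, so a real implementation would need to bound or verify collisions.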
