New Lower Bounds for the Maximum Number of Runs in a String Wataru Matsubara 1 , Kazuhiko Kusano 1 , Akira Ishino 1 , Hideo Bannai 2 , Ayumi Shinohara 1 1 Tohoku University, Japan 2 Kyushu University, Japan


New Lower Bounds for the Maximum Number of Runs in a String

Wataru Matsubara¹, Kazuhiko Kusano¹, Akira Ishino¹, Hideo Bannai², Ayumi Shinohara¹

¹ Tohoku University, Japan
² Kyushu University, Japan


Contents

Introduction
New lower bounds
A brief history of results on bounds
Simple heuristics for generating run-rich strings
Analyzing asymptotic lower bounds
Discussion
Conclusion and further research

2 Prague Stringology Conference 2008


Runs

A run is an occurrence of a periodic factor that is
  non-extendable (maximal),
  of exponent at least two, and
  primitive-rooted.

Example: in aabaabaaaacaacac, the prefix aabaabaa = (aab)^(8/3) is a run with

  period: 3
  root: aab
  exponent: 8/3
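The definition above can be checked mechanically. The following is a minimal brute-force sketch, not the authors' program: the name `find_runs` and the 0-based, end-exclusive interval convention are assumptions made for illustration. For every candidate period p it collects the maximal stretches on which s[i] = s[i+p], keeps those whose length is at least 2p, and records the smallest period found for each interval.

```python
def find_runs(s):
    """Return {(start, end): period} for every run in s (end is exclusive)."""
    n = len(s)
    runs = {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            # Extend to the right while the factor s[i : j + p] keeps period p.
            while j + p < n and s[j] == s[j + p]:
                j += 1
            # s[i : j + p] cannot be extended with period p;
            # keep it only if its exponent is at least two.
            if j - i >= p:
                # Periods are tried in increasing order, so the first period
                # recorded for an interval is its smallest one.
                runs.setdefault((i, j + p), p)
            i = j + 1
    return runs


example = "aabaabaaaacaacac"
for (start, end), period in sorted(find_runs(example).items()):
    print(example[start:end], "period", period, "exponent", f"{end - start}/{period}")
```

For the example string, the result contains the interval (0, 8) with period 3, which is the run aabaabaa = (aab)^(8/3) shown above.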


Number of runs: ρ(n)

run(w): the number of runs in string w
ρ(n) = max{ run(w) : |w| = n }, the maximum number of runs in a string of length n

For any string w of length n,

$\frac{\mathrm{run}(w)}{|w|} \le \frac{\rho(n)}{n}$

Example

n     1  2  3  4  5  6  7  8  9  10  11  12  …
ρ(n)  0  1  1  2  2  3  4  5  5   6   7   8  …

run(aabaabbaabaa) = 8
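For small n the table can be reproduced by exhaustive search. The sketch below is illustrative only: `count_runs` and `rho` are hypothetical names, and it assumes that a binary alphabet is enough to reach the maximum for these small lengths.

```python
from itertools import product


def count_runs(s):
    """Brute-force count of the runs in s."""
    n, seen = len(s), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:  # maximal factor with period p and exponent >= 2
                seen.add((i, j + p))
            i = j + 1
    return len(seen)


def rho(n, alphabet="ab"):
    """Maximum of count_runs over all strings of length n (assumed alphabet)."""
    return max(count_runs("".join(t)) for t in product(alphabet, repeat=n))


print([rho(n) for n in range(1, 11)])
print(count_runs("aabaabbaabaa"))
```

The search space doubles with every extra character, so this only works for short strings; that is the motivation for the heuristic search described later.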


[Figure: Max Number of Runs in a String. Known bounds on ρ(n), as multiples of n.]

Upper bounds:
  cn [Kolpakov & Kucherov ’99]
  5n [Rytter ’06]
  3.48n [Puglisi et al. ’08]
  3.44n [Rytter ’07]
  1.6n [Crochemore & Ilie ’08]
  1.048n [Crochemore et al. ’08]

Lower bound:
  0.927n [Franek et al. ’03] [Franek & Yang ’06]


Our result: New lower bound

We discovered a run-rich string τaababaababbabaababaababbabaababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaabab

run(τ) = 1455, |τ| = 1558

New lower bound:

$\rho(n) \ge \frac{1455}{1558}\,n \approx 0.933n$

Known best lower bound [Franek et al. ’03]:

$\rho(n) \ge \frac{3}{1+\sqrt{5}}\,n \approx 0.927n$


How to generate run-rich strings

run(τ) = 1455, |τ| = 1558:

$\frac{\mathrm{run}(\tau)}{|\tau|} = \frac{1455}{1558} = 0.9338\ldots$

Let τ' = τ[1:1557] (delete the last character). The number of runs does not decrease drastically: run(τ') = 1453, |τ'| = 1557:

$\frac{\mathrm{run}(\tau')}{|\tau'|} = \frac{1453}{1557} = 0.9332\ldots$

So, to generate a run-rich string, we only have to append a single character to a run-rich string.


The search starts with the single string “a” in the buffer. At each round, two new strings are created from each string in the buffer by appending “a” or “b”. The new strings are then sorted by their number of runs, and only as many as fit in the buffer are retained for the next round.

Buffer size: 10

[Figure: search tree of candidate strings, by length]

Length 1: a
Length 2: aa, ab
Length 3: aaa, aab, aba, abb
Length 4: aaaa, aaab, aaba, aabb, abaa, abab, abba, abbb

Length 5 candidates (string, number of runs):

aaaaa 1   aaaab 1   aaaba 1   aaabb 2
aabaa 2   aabab 2   aabba 2   aabbb 2
abaaa 1   abaab 1   ababa 1   ababb 2
abbaa 2   abbab 1   abbba 1   abbbb 1

SelectTop10:

aaabb 2   aabaa 2   aabab 2   aabba 2   aabbb 2
ababb 2   abbaa 2   aaaaa 1   aaaab 1   aaaba 1


Buffer at length 5 (string, number of runs):

aaabb 2   aabaa 2   aabab 2   aabba 2   aabbb 2
ababb 2   abbaa 2   aaaaa 1   aaaab 1   aaaba 1

Candidates of length 6:

aaabba   aaabbb   aabaaa   aabaab   aababa
aababb   aabbaa   aabbab   aabbba   aabbbb
ababba   ababbb   abbaaa   abbaab   aaaaaa
aaaaab   aaaaba   aaaabb   aaabaa   aaabab

SelectTop10:

aabaab 3   aababb 3   aabbaa 3   aaabba 2   aaabbb 2
aabaaa 2   aababa 2   aabbab 2   aabbba 2   aabbbb 2

Candidates of length 7:

aabaaba   aabaabb   aababba   aababbb   aabbaaa
aabbaab   aaabbaa   aaabbab   aaabbba   aaabbbb
aabaaaa   aabaaab   aababaa   aababab   aabbaba
aabbabb   aabbbaa   aabbbab   aabbbba   aabbbbb

SelectTop10:

aabaabb 4   aabbabb 4   aabaaba 3   aababba 3   aababbb 3
aabbaaa 3   aabbaab 3   aaabbaa 3   aababaa 3   aabbaba 3

The strings in the buffer become run-rich.
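The rounds above amount to a beam search over binary strings. A minimal sketch follows, assuming a brute-force run counter and a stable sort for ties; `count_runs` and `beam_search` are hypothetical names, and the tie-breaking order used on the slides is not specified, so the buffer contents may differ from the slides when many candidates have the same number of runs.

```python
def count_runs(s):
    """Brute-force count of the runs in s."""
    n, seen = len(s), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:  # maximal factor with period p and exponent >= 2
                seen.add((i, j + p))
            i = j + 1
    return len(seen)


def beam_search(target_len, width=10):
    """Grow strings one character at a time, keeping the `width` best per round."""
    buffer = ["a"]
    while len(buffer[0]) < target_len:
        # Two candidates per buffered string: append "a" or "b".
        candidates = [s + c for s in buffer for c in "ab"]
        # Sort by number of runs, largest first (Python's sort is stable).
        candidates.sort(key=lambda s: -count_runs(s))
        buffer = candidates[:width]
    return buffer


for s in beam_search(7):
    print(s, count_runs(s))
```

Counting runs from scratch for every candidate is the expensive part here; a faster run counter would allow longer strings and larger buffers.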


Improving lower bound of ρ(n) (1/2)

We discovered a run-rich string τ such that run(τ) = 1455, |τ| = 1558:

$\rho(n) \ge \frac{1455}{1558}\,n \approx 0.933n$

run(τ^2) = 2915, |τ^2| = 2 · 1558 = 3116:

$\rho(n) \ge \frac{2915}{3116}\,n \approx 0.935n$

Improved!!

run(τ^2) > 2 run(τ)


Improving lower bound of ρ(n) (2/2)

Using the run-rich string τ, can we push the lower bound even higher?

k   run(τ^k)   |τ^k|   ρ(n)/n ≥ run(τ^k)/|τ^k|
1   1455       1558    0.933889
2   2915       3116    0.935494
3   4374       4674    0.935815
4   5833       6232    0.935976
5   7292       7790    0.936072
6   8751       9348    0.936136
7   10210      10906   0.936182
8   11669      12464   0.936216
:   :          :       :

Next, we give a formula that calculates the number of runs in w^k.


Number of runs in w^k

Theorem. Let w be a string of length n. For any k ≥ 2, run(w^k) = Ak − B, where A = run(w^3) − run(w^2) and B = 2 run(w^3) − 3 run(w^2).
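The formula can be sanity-checked numerically. The sketch below is an illustration, not a proof: the helper names and the small test strings are assumptions. It compares the formula with a brute-force count for a few short strings, then applies it to the values run(τ^2) = 2915 and run(τ^3) = 4374 reported for τ.

```python
def count_runs(s):
    """Brute-force count of the runs in s."""
    n, seen = len(s), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:  # maximal factor with period p and exponent >= 2
                seen.add((i, j + p))
            i = j + 1
    return len(seen)


def runs_in_power(r2, r3, k):
    """run(w^k) = A*k - B for k >= 2, given r2 = run(w^2) and r3 = run(w^3)."""
    a = r3 - r2
    b = 2 * r3 - 3 * r2
    return a * k - b


# Compare the formula with direct counting on a few short strings.
for w in ["aab", "abba", "aabaa", "aabaabbaabaa"]:
    r2, r3 = count_runs(w * 2), count_runs(w * 3)
    for k in range(2, 7):
        print(w, k, count_runs(w * k), runs_in_power(r2, r3, k))

# For tau: A = 4374 - 2915 = 1459 and B = 2 * 4374 - 3 * 2915 = 3.
for k in range(2, 9):
    runs = runs_in_power(2915, 4374, k)
    print(k, runs, 1558 * k, round(runs / (1558 * k), 6))
```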


Proof of the theorem (1/4)

If two strings w^k and w are concatenated, the number of runs in w^(k+1) changes in two cases.

Case (a): increase. A new run may be created at the border between the two strings.

abba + abba → abbaabba


Proof of the theorem (2/4)

If two strings w^k and w are concatenated, the number of runs in w^(k+1) changes in two cases.

Case (b): decrease. A suffix run in w^k and a prefix run in w may be merged into one run in w^(k+1).

aabaaaabaa + aabaaaabaa → aabaaaabaaaabaaaabaa
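Both effects can be observed on the two example strings with a brute-force run counter. This is a sketch for illustration; `count_runs` is a hypothetical helper, not the authors' code.

```python
def count_runs(s):
    """Brute-force count of the runs in s."""
    n, seen = len(s), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:  # maximal factor with period p and exponent >= 2
                seen.add((i, j + p))
            i = j + 1
    return len(seen)


for w in ["abba", "aabaaaabaa"]:
    single = count_runs(w)
    doubled = count_runs(w + w)
    # doubled > 2 * single: runs created at the border dominate (case a).
    # doubled < 2 * single: merged runs dominate (case b).
    print(w, single, doubled, 2 * single)
```

For abba the doubled string has more than twice the runs of one copy, because of the new runs at the border. For aabaaaabaa it has fewer, because runs of the two copies merge.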


Proof of the theorem (3/4)

By the periodicity lemma, w^k contains no run longer than 2|w| except the whole string w^k.

Hence, for any k ≥ 3, run(w^k) − run(w^(k−1)) = c (a constant).

[Figure: w^k drawn as consecutive copies w w w w w]


Proof of the theorem (4/4)

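Theorem. Let w be a string of length n. For any k ≥ 2, run(w^k) = Ak − B, where A = run(w^3) − run(w^2) and B = 2 run(w^3) − 3 run(w^2).

Proof. For any k ≥ 3, run(w^k) − run(w^(k−1)) is a constant, so it equals A = run(w^3) − run(w^2). Therefore run(w^k) = run(w^2) + (k − 2)A = Ak − (2A − run(w^2)) = Ak − B, since 2A − run(w^2) = 2 run(w^3) − 3 run(w^2) = B.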


Asymptotic behavior of ρ(n)

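Theorem. For any string w and any ε > 0, there exists a positive integer N such that for any n ≥ N,

$\frac{\rho(n)}{n} > \frac{\mathrm{run}(w^3) - \mathrm{run}(w^2)}{|w|} - \varepsilon$

Proof (sketch). Take k with k|w| ≤ n < (k+1)|w|. Then

$\frac{\rho(n)}{n} \ge \frac{\mathrm{run}(w^k)}{n} > \frac{Ak - B}{(k+1)|w|}$

The right-hand side tends to A/|w| as k grows, so it exceeds A/|w| − ε for all n ≥ N once N is large enough.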


Discovered run-rich strings

We found some run-rich strings by using heuristic search. The strings in the buffer are sorted with respect to r(w^3) − r(w^2), instead of r(w), to improve the asymptotic behavior.

Length of τ   r(τ)     r(τ^2)   r(τ^3)   ρ(n)/n ≥
125           110      227      343      0.928
1558          1455     2915     4374     0.93645
60064         56714    113448   170181   0.944542
105405        99541    199103   298664   0.944557
184973        174697   349417   524136   0.944565   (current best lower bound)

See our web site [http://www.shino.ecei.tohoku.ac.jp/runs]
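The last column follows from the other columns through the asymptotic theorem: each bound is (r(τ^3) − r(τ^2)) / |τ|. A short sketch that recomputes it from the table values (the variable and function names are chosen here for illustration):

```python
# (|tau|, r(tau^2), r(tau^3)) for each row of the table above.
rows = [
    (125, 227, 343),
    (1558, 2915, 4374),
    (60064, 113448, 170181),
    (105405, 199103, 298664),
    (184973, 349417, 524136),
]


def asymptotic_bound(length, r2, r3):
    """Lower bound on rho(n)/n for large n: (run(w^3) - run(w^2)) / |w|."""
    return (r3 - r2) / length


for length, r2, r3 in rows:
    print(length, f"{asymptotic_bound(length, r2, r3):.6f}")
```

The printed values agree with the table up to the last displayed digit; the table appears to round some rows and truncate others.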


Discussion

What is the class of run-rich strings?
  Sturmian words are not run-rich [Rytter2008]: for any Sturmian word w,

  $\frac{\mathrm{run}(w)}{|w|} \le 0.8$

  Is there any recursive construction of a sequence of run-rich strings?

We believe that compression holds a clue to understanding them.
  The run-rich string τ (|τ| = 184973) can be represented by only 24 LZ factors.


LZ-factorization of τ ( |τ| = 184973 )

a / (0,1) / b / (1,3) / (1,4) / (2,8) / (5,13) / (12,19) / (26,31) / (49,38) / (50,63) / (89,93) / (113,162) / (57,317) / (249,693) / (275,984) / (879,2120) / (942,3041) / (2811,6521) / (2999,9374) / (8764,20072) / (9332,28878) / (27096,45341) / (38210,67195)

τ = aababaababbabaababaababbabaababab…

LZ(τ) = a, (0,1), b, (1,3), (1,4), (2,8), (5,13), …
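The factor list can be expanded back into a string. The sketch below assumes that each pair is (0-based start position of an earlier occurrence, length) and that a copy may overlap the text it is producing; the slides do not state this convention explicitly, and `decode` is a hypothetical helper.

```python
FACTORS = [
    "a", (0, 1), "b", (1, 3), (1, 4), (2, 8), (5, 13), (12, 19),
    (26, 31), (49, 38), (50, 63), (89, 93), (113, 162), (57, 317),
    (249, 693), (275, 984), (879, 2120), (942, 3041), (2811, 6521),
    (2999, 9374), (8764, 20072), (9332, 28878), (27096, 45341),
    (38210, 67195),
]


def decode(factors):
    """Expand literal characters and (start, length) copy factors."""
    out = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)  # literal: a single new character
        else:
            start, length = factor
            # Copy one character at a time so that a factor may read
            # characters it has just written (self-overlapping copy).
            for t in range(length):
                out.append(out[start + t])
    return "".join(out)


text = decode(FACTORS)
print(len(FACTORS), len(text))
print(decode(FACTORS[:7]))
```

Under this assumption the 24 factors expand to 184973 characters, matching |τ|, and the first seven factors give a 31-character string that agrees with the beginning of the prefix of τ displayed above.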


Conclusion

We introduced a new approach for analyzing lower bounds using heuristic search.

We improved the lower bound on the maximum number of runs in a string. The new lower bound is 0.944565n.


Further research

Improving the heuristic algorithm
  Speeding up the counting of runs in strings
  Finding good heuristics
  Guessing run-rich strings in compressed form (LZ factors)

Analyzing the class of run-rich strings
  Is there any recursive construction of a sequence of run-rich strings?
  Relation with compression

Algorithms for finding all runs in strings
  Processing a compressed string without decompression


[Figure: Max Number of Runs in a String. Known bounds on ρ(n), as multiples of n, including the new lower bound.]

Upper bounds:
  cn [Kolpakov & Kucherov ’99]
  5n [Rytter ’06]
  3.48n [Puglisi et al. ’08]
  3.44n [Rytter ’07]
  1.6n [Crochemore & Ilie ’08]
  1.048n [Crochemore et al. ’08]

Lower bounds:
  0.927n [Franek et al. ’03]
  0.944565n [Matsubara et al. ’08]

Thank you for your attention.


Appendix


Conjecture: ρ(n) < n

[Plot: ρ(n) against n, both axes from 0 to 60]