HABATAKITAI Laboratory: Everything is String.
Computing Reversed Lempel-Ziv Factorization Online
Shiho Sugimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
Kyushu University, Japan
Outline
• Reversed LZ factorization without self-references (RLZ)
• Online RLZ algorithm by Kolpakov and Kucherov
• New online RLZ algorithm using O(n log σ) bits of space
• Reversed LZ factorization with self-references (RLZS)
• New online RLZS algorithm using O(n log n) bits of space
• New online RLZS algorithm using O(n log σ) bits of space
n: the length of the input string
σ: the alphabet size
Background
• LZ factorization was proposed in 1977 [Ziv & Lempel, 1977].
– Applications: data compression, etc.
• Reversed LZ factorization (RLZ for short) was proposed in 2009 [Kolpakov & Kucherov, 2009].
– Applications: finding gapped palindromes, etc.
LZ factorization without self-references [Ziv & Lempel, 1977]
The LZ factorization without self-references of a string w of length n is a factorization s1, s2, ..., sm such that
• w = s1 s2 … sm
• si is the longest non-empty prefix of w[|s1…si−1|+1..n] that is also a substring of w[1..|s1…si−1|], if such a prefix exists
• si = w[|s1…si−1|+1] otherwise
Ex) w = a b b a a a a b b b a c
LZ factorization without self-references: s1 | s2 | … | s9 = a | b | b | a | a | aa | bb | ba | c
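The definition above can be checked with a direct sketch in Python. The function name `lz_factorize` is ours, not from the slides, and plain substring search stands in for any index structure, so this runs in quadratic time or worse.

```python
def lz_factorize(w):
    """Naive LZ factorization without self-references (illustrative sketch)."""
    factors, k = [], 0
    while k < len(w):
        length = 0
        # Extend while w[k:k+length+1] still occurs inside the processed prefix w[:k].
        while k + length < len(w) and w[k:k + length + 1] in w[:k]:
            length += 1
        length = max(length, 1)  # a character not seen before forms a factor by itself
        factors.append(w[k:k + length])
        k += length
    return factors


print(lz_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'a', 'aa', 'bb', 'ba', 'c']
```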
Reversed LZ factorization without self-references (RLZ) [Kolpakov & Kucherov, 2009]
The RLZ without self-references of a string w of length n is a factorization f1, f2, ..., fm such that
• w = f1 f2 … fm
• fi is the longest non-empty prefix of w[|f1…fi−1|+1..n] that is also a substring of w[1..|f1…fi−1|]^R, if such a prefix exists (x^R denotes the reversal of x)
• fi = w[|f1…fi−1|+1] otherwise
Ex) w = a b b a a a a b b b a c
RLZ without self-references: f1 | f2 | … | f7 = a | b | ba | a | aabb | ba | c
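The only change from the LZ definition is that each factor is searched for in the reversal of the processed prefix. A minimal sketch, again with plain substring search (the name `rlz_factorize` is an assumption made for illustration):

```python
def rlz_factorize(w):
    """Naive RLZ factorization without self-references (illustrative sketch)."""
    factors, k = [], 0
    while k < len(w):
        reversed_prefix = w[:k][::-1]  # w[1..k]^R in the slides' notation
        length = 0
        # Extend while the candidate still occurs in the reversed prefix.
        while k + length < len(w) and w[k:k + length + 1] in reversed_prefix:
            length += 1
        length = max(length, 1)  # no occurrence: the factor is a single character
        factors.append(w[k:k + length])
        k += length
    return factors


print(rlz_factorize("abbaaaabbbac"))
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```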
KK algorithm [Kolpakov & Kucherov, 2009]
• Computes RLZ in an online manner.
• Works in O(n log n) bits of space and O(n log σ) time (on the word RAM model).
– Constructs the suffix tree for the reversed prefixes online.
– Computes the RLZ factors from the suffix tree.
– Blumer's version of Weiner's algorithm achieves the above complexity [Blumer et al., 1985] [Weiner, 1973].
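To illustrate what "online" means here, the sketch below consumes one character at a time and reports a factor as soon as it can no longer be extended. It is not the KK algorithm: the suffix tree of the reversed prefix is replaced by Python's `in` operator, so the control flow is online but the stated time and space bounds are not achieved. The class and method names are hypothetical.

```python
class OnlineRLZ:
    """Online RLZ without self-references, using a naive substring test."""

    def __init__(self):
        self.text = []     # characters read so far
        self.start = 0     # start of the factor currently being built
        self.ref = ""      # reversal of text[:start] (stand-in for the suffix tree)
        self.factors = []  # completed factors

    def feed(self, c):
        """Consume one character; completed factors accumulate in self.factors."""
        cur = "".join(self.text[self.start:])
        self.text.append(c)
        if cur + c in self.ref:
            return                 # the current factor can still be extended
        if cur:
            self._close(cur)       # cur is maximal, so c starts the next factor
            if c in self.ref:
                return             # c occurs in the new reversed prefix: keep building
        self._close(c)             # c does not occur there: single-character factor

    def _close(self, factor):
        self.factors.append(factor)
        self.start += len(factor)
        self.ref = "".join(reversed(self.text[:self.start]))

    def finish(self):
        """Flush the factor still being built when the input ends."""
        cur = "".join(self.text[self.start:])
        if cur:
            self._close(cur)
        return self.factors


rlz = OnlineRLZ()
for ch in "abbaaaabbbac":
    rlz.feed(ch)
print(rlz.finish())
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```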
[Figure: the KK algorithm on w = a b b a a a a b b b a c. The suffix tree of the reversed prefix is extended online as the factors f1, ..., f5 are computed: Stree(ε), Stree(a^R), Stree((ab)^R), Stree((abba)^R), Stree((aabba)^R).]
This suffix tree requires O(n log n) bits of space.
We propose a new online RLZ algorithm which uses only O(n log σ) bits of space (σ ≤ n is the alphabet size).
For O(n log σ) bits of space
• We utilize the idea of Starikovskaya's algorithm.
– It computes the LZ factorization online in O(n log σ) bits of space and O(n log^2 n) time [Starikovskaya, 2012].
• We divide the input string into blocks of length r = O(log_σ n).
– Each block is replaced by a meta-character.
Ex) w = a b b | a a a | a b b | b a c | ……, r = 3
meta-characters: B A B C ……
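The block decomposition in the example can be reproduced with a few lines of Python. This is only a sketch: the function name is ours, and naming meta-characters A, B, C, ... by the lexicographic rank of the block is an assumption chosen because it matches the slide. Any injective naming would serve, and an online algorithm would have to assign names without sorting all blocks first.

```python
def to_meta_characters(w, r):
    """Split w into full blocks of length r and replace each block by one symbol."""
    # A trailing partial block (fewer than r characters) is left out.
    blocks = [w[i:i + r] for i in range(0, len(w) - len(w) % r, r)]
    # Assumption: name blocks A, B, C, ... by lexicographic rank (fine for up to 26 blocks).
    name = {b: chr(ord("A") + i) for i, b in enumerate(sorted(set(blocks)))}
    return blocks, "".join(name[b] for b in blocks)


blocks, meta = to_meta_characters("abbaaaabbbac", 3)
print(blocks)  # ['abb', 'aaa', 'abb', 'bac']
print(meta)    # BABC
```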
Our online RLZ algorithm
• For fi of length shorter than r, we use a suffix trie of reversed subwords of length 2r.
– It can find fi in o(n) bits of space and O(|fi| log σ) time.
• For fi of length at least r, we use a suffix tree of reversed blocks (meta-characters).
– It can find fi in O(n log σ) bits of space and O(|fi| log^2 n) time.
Theorem
We can compute RLZ without self-references online in O(n log σ) bits of space and O(n log^2 n) time.
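The short-factor case can be sketched as follows. The slides do not say where the subwords of length 2r are taken; the code assumes one window starting at every block boundary, which guarantees that every substring of length at most r + 1 lies inside some window. The nested-dict trie and both function names are illustrative only, and no attempt is made to meet the o(n)-bit space bound.

```python
def build_short_trie(prefix, r):
    """Trie (nested dicts) holding every suffix of every reversed length-2r window."""
    root = {}
    for start in range(0, len(prefix), r):   # assumed: windows begin at block boundaries
        window = prefix[start:start + 2 * r][::-1]
        for i in range(len(window)):          # insert each suffix of the reversed window
            node = root
            for ch in window[i:]:
                node = node.setdefault(ch, {})
    return root


def short_match(trie, pattern, r):
    """Length of the longest prefix of pattern found in the trie, capped at r."""
    node, length = trie, 0
    for ch in pattern[:r]:
        if ch not in node:
            break
        node = node[ch]
        length += 1
    return length


w = "abbaaaabbbac"
print(short_match(build_short_trie(w[:4], 3), w[4:], 3))  # 1: the factor "a" is shorter than r
print(short_match(build_short_trie(w[:5], 3), w[5:], 3))  # 3: the cap r is reached
```

A result below r is the exact factor length. A result equal to r only says the factor has length at least r, which is the case the slides hand over to the suffix tree of meta-characters.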
LZ factorization with self-references [Ziv & Lempel, 1977]
The LZ factorization with self-references of a string w of length n is a factorization t1, t2, ..., tm such that
• w = t1 t2 … tm
• ti is the longest non-empty prefix of w[|t1…ti−1|+1..n] that is also a substring of w[1..|t1…ti|−1], if such a prefix exists (the earlier occurrence may overlap ti itself: a self-reference)
• ti = w[|t1…ti−1|+1] otherwise
Ex) w = a b b a a a a b b b a c
LZ factorization with self-references: t1 | t2 | … | t8 = a | b | b | a | aaa | bb | ba | c
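With self-references, a candidate factor of length L is searched for in w[1..k+L−1] instead of w[1..k], so its earlier occurrence only has to start before the factor does. A naive sketch (the function name is ours):

```python
def lz_self_factorize(w):
    """Naive LZ factorization with self-references (illustrative sketch)."""
    factors, k = [], 0
    while k < len(w):
        length = 0
        # A candidate of length length+1 must occur in w[:k + length],
        # i.e. its earlier occurrence may overlap the factor but must start before k.
        while k + length < len(w) and w[k:k + length + 1] in w[:k + length]:
            length += 1
        length = max(length, 1)
        factors.append(w[k:k + length])
        k += length
    return factors


print(lz_self_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'aaa', 'bb', 'ba', 'c']
```

The factor aaa is the self-referencing one: its source starts one position earlier and overlaps it.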
Reversed LZ factorization with self-references
The RLZ with self-references (RLZS) of a string w of length n is a factorization g1, g2, ..., gm such that
• w = g1 g2 … gm
• gi is the longest non-empty prefix of w[|g1…gi−1|+1..n] that is also a substring of w[1..|g1…gi|−1]^R, if such a prefix exists (the reversed occurrence may overlap gi itself: a self-reference)
• gi = w[|g1…gi−1|+1] otherwise
Ex) w = a b b a a a a b b b a c
RLZ with self-references (RLZS): g1 | g2 | … | g5 = a | bba | aaabb | ba | c
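A naive sketch of the RLZS definition follows. One point differs from the previous factorizations: a candidate of length L can be valid even when length L − 1 is not (at position 2 of the example, bba is valid but b and bb are not), so the sketch tests every length and keeps the longest. The function name is an assumption for illustration.

```python
def rlzs_factorize(w):
    """Naive RLZ factorization with self-references (illustrative sketch)."""
    factors, k, n = [], 0, len(w)
    while k < n:
        best = 1  # fall back to a single character when no candidate is valid
        for length in range(1, n - k + 1):
            # The candidate must occur in the reversal of w[:k + length - 1].
            if w[k:k + length] in w[:k + length - 1][::-1]:
                best = length
        factors.append(w[k:k + best])
        k += best
    return factors


print(rlzs_factorize("abbaaaabbbac"))
# ['a', 'bba', 'aaabb', 'ba', 'c']
```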
Online computation of RLZS
Ex) w = a b b a a a a b b b a c
w[1..1] = a
w[1..2] = a b
w[1..3] = a b b
w[1..4] = a b b a
w[1..5] = a b b a a
w[1..6] = a b b a a a
w[1..7] = a b b a a a a
w[1..8] = a b b a a a a b
w[1..9] = a b b a a a a b b
w[1..10] = a b b a a a a b b b
w[1..11] = a b b a a a a b b b a
w[1..12] = a b b a a a a b b b a c
Reversed LZ factorization with self-references
Every self-referencing factor is a suffix of a palindrome.
Ex) w = a b b a a a a b b b a c, with RLZS factors a | bba | aaabb | ba | c: g2 = bba is a suffix of the palindrome abba, and g3 = aaabb is a suffix of the palindrome bbaaaabb.
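The reason is that a reversed source which overlaps its own factor forces the whole span from the start of the source to the end of the factor to read the same in both directions. The sketch below finds that span for a given factor; the function name and the 0-indexed arguments `k` (factor start) and `length` are ours.

```python
def palindrome_witness(w, k, length):
    """Return a palindrome ending with the factor w[k:k+length], or None.

    None means no reversed source of the factor overlaps the factor itself.
    """
    factor = w[k:k + length]
    for a in range(k):  # the source w[a:a+length] must start before the factor
        overlaps = a + length > k
        if overlaps and w[a:a + length][::-1] == factor:
            return w[a:k + length]  # spans from the source start to the factor end
    return None


w = "abbaaaabbbac"
print(palindrome_witness(w, 1, 3))  # abba      (factor bba)
print(palindrome_witness(w, 4, 5))  # bbaaaabb  (factor aaabb)
print(palindrome_witness(w, 9, 2))  # None      (factor ba is not self-referencing)
```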
Online RLZS in O(n log n) bits of space
We can compute each RLZS factor gi by
• using the KK algorithm,
– in a total of O(n log n) bits of space and O(n log σ) time, and
• computing the longest palindrome that ends at each position, online,
– in a total of O(n log n) bits of space and O(n) time, by modifying Manacher's algorithm [Manacher, 1975].
Theorem
We can compute RLZS online in O(n log n) bits of space and O(n log σ) time.
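The quantity needed from the palindrome side is, for each position, the length of the longest palindrome ending there. The sketch below computes it left to right with a direct palindrome test; it is a simple stand-in and not the modified Manacher algorithm, so it does not run in O(n) time. It relies on one fact: stripping both end characters of a palindrome ending at position i leaves a palindrome ending at position i − 1, so the answer grows by at most 2 per step.

```python
def longest_suffix_palindromes(w):
    """lps[i] = length of the longest palindrome ending at position i (0-indexed)."""
    lps = []
    for i in range(len(w)):
        # Upper bound from the previous position, clipped to the prefix length.
        upper = lps[-1] + 2 if lps else 1
        length = min(upper, i + 1)
        # Shrink until the suffix of w[:i+1] of this length is a palindrome;
        # length 1 always succeeds, so the loop terminates.
        while w[i - length + 1:i + 1] != w[i - length + 1:i + 1][::-1]:
            length -= 1
        lps.append(length)
    return lps


print(longest_suffix_palindromes("abbaaaabbbac"))
# [1, 1, 2, 4, 2, 3, 4, 6, 8, 3, 5, 1]
```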
Suffix palindromes
• All suffix palindromes of a string of length n can be represented by O(log n) arithmetic progressions [Apostolico, 1995].
Ex) w = a b a b a c a b a b a d a b a b a c a b a b a
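For the example string, the suffix palindrome lengths and a grouping into arithmetic progressions can be listed with a short sketch. The greedy grouping below (maximal runs with a constant difference) is our own choice for illustration; the O(log n) bound comes from the cited result, not from this code.

```python
def suffix_palindrome_lengths(w):
    """Lengths of all palindromic suffixes of w, in increasing order (naive test)."""
    return [k for k in range(1, len(w) + 1) if w[-k:] == w[-k:][::-1]]


def as_progressions(lengths):
    """Greedily group increasing lengths into (first, difference, count) runs."""
    runs = []
    for x in lengths:
        if runs:
            first, diff, count = runs[-1]
            last = first + diff * (count - 1)
            # A one-element run accepts any next value; otherwise the gap must match.
            if count == 1 or x - last == diff:
                runs[-1] = (first, x - last, count + 1)
                continue
        runs.append((x, 0, 1))
    return runs


w = "ababacababadababacababa"
print(suffix_palindrome_lengths(w))                   # [1, 3, 5, 11, 23]
print(as_progressions(suffix_palindrome_lengths(w)))  # [(1, 2, 3), (11, 12, 2)]
```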
Online computation of suffix palindromes
• What happens to the suffix palindromes when a new character is appended?
Ex) w = a b a b a b a b
wa = a b a b a b a b a
wc = a b a b a b a b c
[Figure: a suffix palindrome of w preceded by a character x, with a new character appended to w. Two cases are illustrated, x = a and x = b.]
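The update rule can be sketched directly on explicit length lists: a suffix palindrome P of w, preceded by a character x, survives as x P c after appending c exactly when x = c. The sketch keeps every length individually rather than using arithmetic progressions, so it does not reflect the compact representation; the function name is ours.

```python
def extend_suffix_palindromes(w, lengths, c):
    """Lengths of the suffix palindromes of w + c, given those of w."""
    new = [1]                      # the single character c
    if w and w[-1] == c:
        new.append(2)              # cc
    for k in lengths:
        # w[-k-1] is the character x just before the suffix palindrome of length k.
        if k < len(w) and w[-k - 1] == c:
            new.append(k + 2)      # x P c with x == c
    return sorted(new)


w = "abababab"
lengths = [1, 3, 5, 7]             # b, bab, babab, bababab
print(extend_suffix_palindromes(w, lengths, "a"))  # [1, 3, 5, 7, 9]
print(extend_suffix_palindromes(w, lengths, "c"))  # [1]
```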
Computing RLZS in O(n log σ) bits of space
We can compute each RLZS factor gi by
• using our RLZ algorithm,
– in a total of O(n log σ) bits of space and O(n log^2 n) time, and
• computing the longest palindrome that ends at each position, online,
– in a total of O(log^2 n) bits of space and O(n log n) time.
Theorem
We can compute RLZS online in O(n log σ) bits of space and O(n log^2 n) time.
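One way to read the combination of the two ingredients: a factor starting at position k is either found in the reversal of w[1..k] (an ordinary RLZ step) or it is the tail of a palindrome that starts before k. The offline sketch below takes the longer of the two candidates. It is our interpretation under that assumption, uses naive subroutines throughout, and says nothing about how the slides' algorithm decides online when a factor is complete.

```python
def rlzs_via_palindromes(w):
    """RLZS factors as the longer of an RLZ step and a palindrome-tail candidate."""
    n = len(w)
    # lps[i]: length of the longest palindrome ending at position i (naive computation).
    lps = [max(m for m in range(1, i + 2)
               if w[i - m + 1:i + 1] == w[i - m + 1:i + 1][::-1])
           for i in range(n)]
    factors, k = [], 0
    while k < n:
        # Candidate 1: longest prefix of w[k:] that occurs in the reversed prefix.
        ref, length = w[:k][::-1], 0
        while k + length < n and w[k:k + length + 1] in ref:
            length += 1
        # Candidate 2: w[k:e] whenever some palindrome ending at e-1 starts before k.
        for e in range(k + 1, n + 1):
            if e - lps[e - 1] < k:
                length = max(length, e - k)
        length = max(length, 1)
        factors.append(w[k:k + length])
        k += length
    return factors


print(rlzs_via_palindromes("abbaaaabbbac"))
# ['a', 'bba', 'aaabb', 'ba', 'c']
```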
The problems of RLZS
• RLZS was too difficult for us to factorize.
Proof: There is a mistake in the proceedings of PSC (p. 114).
[Figure: two factorizations of a b b a a a a b b b a c.]
• We do not yet know of an application for RLZS.
Conclusion
RLZ online algorithms

self-references   O(n log n) bits of space                     O(n log σ) bits of space
without           O(n log σ) time [Kolpakov & Kucherov, 2009]  O(n log^2 n) time (this work)
with              O(n log σ) time (this work)                  O(n log^2 n) time (this work)

n: the length of the input string
σ: the alphabet size