compression of palindromes and regularity.kyodo/kokyuroku/contents/pdf/2051-24.pdf · compression...
TRANSCRIPT
Compression of Palindromes and Regularity.
Kayoko Shikishima-Tsuji
Center for Liberal Arts Education and Research
Tenri University
1 Introduction
In [1], a property of clickstream data at a view of database is discussed and it is shown that page repetitions occur for the majority as a very specific structure, namely in the form of nested palindromes. A kind of function CFP (Compress First Palindrome) required for an algorithm which extracts these structures in linear time is introduced.
In this paper, we define a rewriting system R which covers CFP and consider relation between R and CFP. We adopt the name “wrinkled word” instead of “nested palindrome” and it means a word which has a non-trivial palindrome as a factor. The set of all wrinkled words is regular, though the set of all palindromes is not regular. We also give automata on some alphabets which accept all wrinkled words.
2 Preliminaries
We assume the reader to be familiar with basic concepts as alphabet, word, language, regular expression and automaton (for more details see [2]).
Words together with the operation of concatenation form a free monoid, which is usually denoted by Σ∗ for a finite alphabet Σ. The length of a finite word w is the number of not necessarily distinct symbols it consists of and is written by � .
The empty word is denoted by � and � . For a word � for
� , a factor of � is � , where � , and the reverse � of
� is � . A word � is said to be palindrome if � . If a palindrome
� is not in � , the palindrome � is non-trivial, otherwise � is trivial. If a
word � has at least one non-trivial palindrome as a factor, � is said to be wrinkled.
w
λ λ = 0 w = a1a2!ana1,a2 ,!,an ∈Σ w ai!aj 1≤ i ≤ j ≤ n wR
w an!a2a1 p∈Σ∗ p = pR
p Σ∪{λ} p p
w∈Σ∗ w
�1
A string rewriting system � on � is a subset of � . We define reduction relation on � that is induced by � is defined as follows: for every � , � if and only if there exists � such that for some � , �
and � . By � , we denote the reflexive transitive closure of � . If �
and there is no � such that � , then � is irreducible; otherwise, � is
reducible. The set of all irreducible words with respect to � is denoted by
� . For � , if � and � is irreducible, then � is a normal form for
� . If, for all � , � and � imply that there exists � such
that � and � , we say that � is confluent. If for all � , �
and � imply that there exists � such that � and � , we say that � is locally confluent. If there is no infinite sequence � where
� , then � is said to be noetherian.
Two string rewriting systems � and � on � are called equivalence if � implies � and � implies � and then we denote � .
Compress First Palindrome CFP : � is defined as follows (see [1]) : if � is a wrinkled word, for the left most � of � where � �
� , there are � such that � and we define � and if
� is a non-wrinkled word, we define � . Since a wrinkled word � has only finite non-trivial palindromes as factor and � , then we can
define � � where � is a large enough number such that
� ( � ). 3 Regularity of the set of all wrinkled words.
Proposition 1. Let � and � be string rewriting systems � �and . Then R and are
equivalent.
Proof) Since � , it is clear that if � for some � , then we have � .
On the other side, let � for some � , then there exist � ,� and a palindrome � such that � and � . If � ,
R Σ Σ∗ × Σ∗
Σ∗ R u,v∈Σ∗
u→R v (l,m)∈R x, y∈Σ∗ u = xly
v = xmy →R∗ →R x ∈Σ∗
y∈Σ∗ x→R y x x
→R
IRR(R) x, y∈Σ∗ x→R∗ y y y
x w, x, y∈Σ∗ w→R∗ x w→R
∗ y z ∈Σ∗
x→R∗ z y→R
∗ z R w, x, y∈Σ∗ w→R x
w→R y z ∈Σ∗ x→R∗ z y→R
∗ zR x1 →R x2 →R!
x1, x2 ,!∈Σ∗ R
R S Σw→R
∗ z w→S∗ z w→S
∗ z w→R∗ z R ≅ S
Σ∗ → Σ∗
w∈Σ∗ aba w a∈Σ, b∈Σ∪{λ},
a ≠ b u,v∈Σ∗ w = uabav CFP(w) = uv
w∈Σ∗ CFP(w) = w ww > CFP(w)
CFP∞ (w) = CFPn (w) n
CFPn (w) = CFPn+1(w) = CFP(CFPn (w))
R S R = {apa→R a | a∈Σ,p is a palindrome} S = aba→S a | a∈Σ, b∈Σ∪ λ{ }{ } S
R ⊇ S w→S z w, z ∈Σ∗
w→R zw→R z w, z ∈Σ∗ u,v∈Σ∗
a∈Σ p∈Σ∗ w = uapav z = uav p∈Σ∪{λ}
�2
then � . If � , then � is written � by � , � and � . Since � and so on, we have � .
q.e.d.
The following lemma is well-known (see [3]).
Lemma 1. If a string rewriting system � is noetharian and locally confluent, then � is confluent.
Proposition 2. Let � and � be string rewriting systems � �and . Then R and S are
confluent.
Proof) By Proposition 1 and Lemma 1, it is enough to prove that S is locally confluent.(a) If � where � and � for some
� , then � , � and � , � .
(b) If � and � where � , � . Since � , � , � , � , we have � where � when � and � when � .
q.e.d.
Proposition 3. Let � be one of string rewriting systems of Proposition 2. If, for � and a natural number n, � , then � . On the other hand, if � for � , then there exist a natural number n and � such that � and � .
Proof) If � , it is obvious that � . If � has no non-trivial palindrome as a factor, then we have � and � . We may assume that � is wrinkled. There exists a finite sequence: � , � such that � is wrinkled and � is not wrinkled. Then we have a sequence � . Since � is not wrinkled, � has no non-trivial palindrome as a factor and � . By Proposition 2, the rewriting system � is confluent and then we have � .
The following corollary is clear by Proposition 3.
w→S z p∉Σ∪{λ} p xbcbxR c∈Σ∪{λ} b∈Σx ∈Σ∗ w = uaxbcbxRav→S uaxbx
Ravw = uaxbcbxRav→S
∗ uav = z
RR
R S R = {apa→R a | a∈Σ,p is a palindrome} S = aba→S a | a∈Σ, b∈Σ∪ λ{ }{ }
w = u1v1u2v2u3 u1,v1,u2 ,v2 ,u3 ∈Σ∗ v1 →S a,v2 →S ba, b∈Σ w→S u1au2v2u3 w→S u1v1u2bu3 u1au2v2u3 →S u1au2bu3u1v1u2bu3 →S u1au2bu3w = u1vu3 v∈{aaa, abaa, aaba, abab} u1,u3 ∈Σ∗ a,b∈Σ
aaa→S∗ a abaa→S
∗ a aaba→S∗ a abab→S
∗ ab w→S∗ u1 ′v u3
′v = a v∈{aaa, abaa, aaba} ′v = ab v = aaba
R 'w, z ∈Σ∗ CFPn (w) = z w→R '
∗ zw→R '
∗ z w, z ∈Σ∗ z '∈Σ∗
CFPn (w) = z ' w→R '∗ z '
CFP(w) = z w→R ' zw w = z
CFP(w) = w ww = w0 CFP(w) = w1,!,CFP
n (w) = wn wn−1 wn
w→S w1 →S!→S wn wn
wn wn ∈IRR(R ')R ' w→R '
∗ wn
�3
Corollary 1. Let � be the string rewriting system � �� . Then we have � .
By Corollary 1, we have � = {the set of all non-wrinkled words}The following lemma is well-known (see [3]).
Lemma 2. A string rewriting system � is finite, then � is a regular set.
By Proposition 3 and Lemma 2, we have the following theorem.
Theorem 1. The language NW = {the set of all non-wrinkled words} and the language W = {the set of all wrinkled words} are both regular.
Proof) The language NW is regular and then W =(NW)c is regular.q.e.d.
The set of all palindromes are contest-free but not regular. W = {the set of all wrinkled words}= � is regular.
4 Examples.
For � , we give an automaton on � which accepts all wrinkled words on � .
Example 1. If � , then NW = {� | w is non-wrinkled word} is the empty set.
Example 2. If � , then NW is the set {ab, ba}
Example 3. If � , then NW is the set { � | � is a factor of (abc)* ∪
(acb)* such that | u | > 1}.
By the following automaton A = (Q, Σ, δ, i, F) (Figure 1), NW is accepted,
where � , � is the initial state and � is a set of final states.
R R = {apa→R a | a∈Σ,p is a palindrome} CFP∞ (Σ∗) = IRR(R)
CFP∞ (Σ∗)
R IRR(R)
= CFP∞ (Σ∗)
w |w has at least one non-trivial palindrome as a factor{ }
Σ ≤ 4 ΣΣ
Σ = {a} w∈Σ∗
Σ = {a,b}
Σ = {a,b,c} u ∈Σ∗ u
Q = {i, 1, 2,!, 9} i ∈ Q F =Q
�4
Example 4. If � , then NW is the language which is accepted by the following automaton A = (Q, Σ, δ, i, F) (Figure 2), where � , � is the initial state and � is a set of final states. In Figure 2, states � and the following transition functions which start from these states are omitted for simplicity: � , � , � , � , � , � , � , � , � , � , � , � , � , � , � , � (see Figure 3).
Σ = {a,b,c,d}Q = {i, 1, 2,!,16}
i ∈ Q F =Qi, 13,14,15,16
a : i→13 b : i→14 c : i→14 d : i→15 b :13→ 2c :13→ 8 d :13→10 a :14→11 c :14→ 7 d :14→1 a :15→ 5 b :15→ 6d :15→ 9 a :16→ 3 b :16→12 a :16→ 4
�5
2
75
10
86
11
1
9
3
4
12
c
d
d
d
d
d
d
c
c
c
c
c
b
a
aa
a
a
b
b
b
bb
a
Figure 2
Figure 1
2
3
i
1
c
a
a b
cb
5
6
7
4
c
c
ab
b
8 9
b
a c
a
References.
[1] Michel Speiser, Gianluca Antonini, Abderrahim Labbi, Juliana Sutanto: On nested palindromes in clickstream data. The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12-16, 2012. ACM 2012: 1460-1468
[2] Peter Linz, An Introduction to Formal Language and Automata, Jones and Bartlett Publishers, Inc. , USA 2006, ISBN, 0763737984
[3] Ronald V. Book, Friedrich Otto: String-Rewriting Systems. Texts and Monographs in Computer Science, Springer 1993, ISBN 978-3-540-97965-4
Center for Liberal Arts Education and ResearchTenri University1050 Somanouchi, Tenri, Nara 632-8510, JapanE-mail address: tsuji sta.tenri-u.ac.jp
天理大学・総合教育研究センター 辻佳代子
�6
2
7 5
10
8
611
1 9
3 4
12
d
d d
d
c
c
c
c
a
a
a
b
b
b a
i
13
14 15
16
b
Figure 3