Ravello, 19-20-21/09 C.E.

On some researches...

Chiara Epifanio

Outline

• Compact representation of local automata

• The multidimensional Critical Factorization Theorem

The multidimensional Critical Factorization Theorem

Chiara Epifanio, Filippo Mignosi

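• A word is a sequence of characters over an alphabet $A$:

$w \in A^{\{1,2,\dots,n\}}$, $A^{\mathbb{N}}$, or $A^{\mathbb{Z}}$

• $w = a_1 \dots a_n$ is periodic if $\exists\, p \in \mathbb{N}$ such that

$w(x+p) = w(x) \quad \forall x,\ 1 \le x \le n-p$

• $p$ is a period of $w$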

• a word may have more than one period (e.g. abaababaabaababaaba, which has periods 8 and 13)

• the smallest period of w is called “the” period of w.
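The definition of period can be checked mechanically. The following is a minimal sketch, assuming 0-based string indexing (the slides index from 1); the function names `periods` and `the_period` are illustrative and do not come from the talk.

```python
def periods(w: str) -> list:
    """All p with 1 <= p <= |w| such that w[x + p] == w[x] wherever both exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[x] == w[x + p] for x in range(n - p))]


def the_period(w: str) -> int:
    """The smallest period of a non-empty word."""
    return periods(w)[0]


word = "abaababaabaababaaba"
print(periods(word))     # the list contains both 8 and 13
print(the_period(word))  # 8
```

Note that p = |w| always satisfies the condition vacuously, so every non-empty word has at least one period.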

A factor $v = w_j \dots w_{j+n-1}$ of length $n$ of $w$ is a repetition of order $\alpha$ if there exists a natural number $p$, $0 < p \le n$, such that $w_i = w_{i+p}$ for $i = j, \dots, j+n-1-p$ and such that $n/p \ge \alpha$. The number $p$ is called a period of the repetition. The smallest period of the repetition is called the period of the repetition.

Ex: abaaba is a repetition of

• period 6 and order 1

• period 5 and order 6/5

• period 3 and order 2
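The orders in this example follow directly from the ratio n/p. As a small sketch (the helper name `repetition_orders` is hypothetical), one can list every period of a factor together with the largest order it supports:

```python
from fractions import Fraction


def repetition_orders(v: str) -> dict:
    """Map each period p of v (0 < p <= n) to the exact ratio n/p."""
    n = len(v)
    return {p: Fraction(n, p)
            for p in range(1, n + 1)
            if all(v[i] == v[i + p] for i in range(n - p))}


for p, order in sorted(repetition_orders("abaaba").items(), reverse=True):
    print(f"period {p}, order {order}")
```

A factor with period p is a repetition of order α for every α up to n/p, so the dictionary values are upper bounds on the order.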

A word $w$ has a central repetition of order $\alpha$ in position $i$ if there exists a factor $v$ centered in $i$ that is a repetition of order $\alpha$. In this case we denote by $c_\alpha(w,i)$ the smallest period among all the central repetitions of order $\alpha$ in position $i$, and we call it the central local period of order $\alpha$ in $i$.

[Figure: a factor v of w centered in position i]

We denote by $P_\alpha(w)$ the maximum of the central local periods of order $\alpha$ in $w$.

A position $i$ is critical if $c_\alpha(w,i) = P_\alpha(w)$.

The Critical Factorization Theorem

Let $w$ be a word having length $|w| \ge 2$. In every sequence of $l \ge \max\{1, p(w)-1\}$ consecutive positions there is a critical one, and $P_\alpha(w) = p(w)$, for $\alpha = 2$.

The Critical Factorization Theorem in particular states that for $\alpha = 2$ there exists at least one point such that the central local period detected at this point coincides with the (global) period of the word, i.e., there exists an integer $j$, $1 \le j < |w|$, such that $c_\alpha(w,j) = p(w)$, $\alpha = 2$.

We have given a new proof for $\alpha = 4$.
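For the classical one-dimensional case with order 2, the statement can be explored by brute force. The sketch below assumes the usual textbook definition of local period, in which the repeated square around a cut may overhang either end of the word; the slides define central repetitions through factors of w, so this is an illustration under that assumption and not the authors' proof technique. Position i denotes the cut between `w[i-1]` and `w[i]`, and the function names are hypothetical.

```python
def period(w: str) -> int:
    """Smallest global period of a non-empty word."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[x] == w[x + p] for x in range(n - p)))


def local_period(w: str, i: int) -> int:
    """Smallest p such that w[j] == w[j + p] for every j in [i - p, i - 1]
    for which both j and j + p fall inside the word (overhang allowed)."""
    n = len(w)
    for p in range(1, n + 1):
        lo, hi = max(0, i - p), min(i, n - p)
        if all(w[j] == w[j + p] for j in range(lo, hi)):
            return p
    return n  # not reached: p = n always passes the test above


def critical_positions(w: str) -> list:
    """Cuts 1 .. |w|-1 whose local period equals the global period."""
    p = period(w)
    return [i for i in range(1, len(w)) if local_period(w, i) == p]


w = "abaab"
print([local_period(w, i) for i in range(1, len(w))])  # [2, 3, 1, 3]
print(period(w), critical_positions(w))                # 3 [2, 4]
```

In this example the global period is 3 and it is reached at cuts 2 and 4, so every window of max(1, 3 - 1) = 2 consecutive cuts contains a critical one.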

[Figure: the words uv and vw overlapping on v, giving uvw]

Lemma 1

Let $u$, $v$, $w$ be words such that $uv$ and $vw$ have period $p$ and $|v| \ge p$. Then the word $uvw$ has period $p$.

(cf. Lemma 8.1.2, Lothaire 2, chapter 8)

[Figure: a factor v inside the word w]

Lemma 2

Suppose that $w$ has period $q$ and that there exists a factor $v$ of $w$ with $|v| \ge q$ that has period $r$, where $r$ divides $q$. Then $w$ has period $r$.

(cf. Lemma 8.1.3, Lothaire 2, chapter 8)

Fine and Wilf Theorem

Let $w$ be a word having periods $p$ and $q$, with $q \le p$. If

$|w| \ge p + q - \gcd(p,q)$,

then $w$ also has period $\gcd(p,q)$.
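The bound in the theorem can be probed experimentally. The following sketch (function names are illustrative) checks the statement on all short binary words and shows, on the word abaababaabaababaaba, that the bound cannot be lowered: that word has length 19 = 8 + 13 - gcd(8,13) - 1 and periods 8 and 13, but not period 1.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    """True if w[i] == w[i + p] for every i with i + p < |w|."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def fine_wilf_holds(w: str) -> bool:
    """Check the Fine and Wilf statement on every pair of periods of w."""
    n = len(w)
    ps = [p for p in range(1, n + 1) if has_period(w, p)]
    return all(has_period(w, gcd(p, q))
               for p in ps for q in ps
               if n >= p + q - gcd(p, q))


# Exhaustive check over binary words of length 1..10.
print(all(fine_wilf_holds("".join(t))
          for n in range(1, 11) for t in product("ab", repeat=n)))

# One letter short of the bound: both periods hold, gcd does not.
w = "abaababaabaababaaba"
print(len(w), has_period(w, 8), has_period(w, 13), has_period(w, 1))
```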

Multidimensional case

(Multidimensional periodicity was introduced by Amir and Benson for the design of Pattern Matching algorithms (1991). Since then, lots of people worked on it giving slightly different definitions).

If $u$ is a factor of $w$, then $v$ is a periodicity vector for $u$ if

$w((x,y)+v) = w(x,y)$

$\forall (x,y) \in \mathrm{Dom}(u)$ such that $((x,y)+v) \in \mathrm{Dom}(u)$.

$v$ is a periodicity vector for $w$ if $w((x,y)+v) = w(x,y)$ $\forall (x,y)$.

A factor $u$ of $w$ is lattice-periodic with respect to $v_1$ and $v_2$ if every $v \in \langle v_1, v_2 \rangle$ is a periodicity vector for $u$.

a b c d a
f g h e f
c d a b c
h e f g h
a b c d a

$L = \langle (2,2), (-2,2) \rangle = \langle (2,2), (4,0) \rangle$
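Both claims of this example can be checked numerically: that the generators are periodicity vectors of the 5 x 5 word, and that the two pairs of generators span the same lattice. The sketch below assumes the word is stored as a list of rows with `rows[y][x]` the letter at position (x, y); the function names are hypothetical, and lattice membership is decided with Cramer's rule on the 2 x 2 integer system.

```python
def is_periodicity_vector(rows, v) -> bool:
    """True if rows[y][x] == rows[y + dy][x + dx] whenever both cells exist."""
    dx, dy = v
    height, width = len(rows), len(rows[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                if rows[y][x] != rows[y2][x2]:
                    return False
    return True


def in_lattice(p, v1, v2) -> bool:
    """True if p = s*v1 + t*v2 with integers s, t (v1, v2 independent)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("generators must be linearly independent")
    s_num = p[0] * v2[1] - p[1] * v2[0]
    t_num = v1[0] * p[1] - v1[1] * p[0]
    return s_num % det == 0 and t_num % det == 0


def same_lattice(a, b) -> bool:
    """Two generator pairs span the same lattice iff each contains the other."""
    return (all(in_lattice(g, *b) for g in a)
            and all(in_lattice(g, *a) for g in b))


rows = ["abcda", "fghef", "cdabc", "hefgh", "abcda"]
for v in [(2, 2), (-2, 2), (4, 0), (1, 0)]:
    print(v, is_periodicity_vector(rows, v))
print(same_lattice(((2, 2), (-2, 2)), ((2, 2), (4, 0))))
```

The equality of the two lattices comes down to (-2,2) = (2,2) - (4,0) and (4,0) = (2,2) - (-2,2).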

Given a subgroup $H$ of $\mathbb{Z}^d$, a transversal $T_H$ of $H$ is a subset of $\mathbb{Z}^d$ such that for any element $i \in \mathbb{Z}^d$ there exists a unique element $j \in T_H$ such that $i - j \in H$.

An $n$-cubic factor $v$ is a repetition of order $\alpha$ if

• $v$ is $L$-periodic, $L$ a lattice;

• $n$ is such that $n/h_L \ge \alpha$, where $h_L$ is the smallest integer such that every hypercube of side $h_L$ contains a transversal of $L$.

The lattice $L$ is called a period of the $\alpha$-repetition $v$.
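For a two-dimensional lattice given by two independent generators, the side h_L can be found by brute force. The sketch below is an illustration under stated assumptions and not an algorithm from the talk: it relies on the facts that the number of cosets of L equals the absolute value of the determinant of the generators, and that the set of cosets met by a square does not depend on where the square is placed, so testing the square with corner (0,0) is enough. The function names are hypothetical.

```python
from itertools import count, product


def in_lattice(p, v1, v2) -> bool:
    """True if p = s*v1 + t*v2 with integers s, t (v1, v2 independent)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("generators must be linearly independent")
    s_num = p[0] * v2[1] - p[1] * v2[0]
    t_num = v1[0] * p[1] - v1[1] * p[0]
    return s_num % det == 0 and t_num % det == 0


def smallest_square_side(v1, v2) -> int:
    """Smallest h such that the square [0, h) x [0, h) meets every coset of L."""
    index = abs(v1[0] * v2[1] - v1[1] * v2[0])  # number of cosets of L
    for h in count(1):
        reps = []  # one representative per coset seen so far
        for p in product(range(h), repeat=2):
            if not any(in_lattice((p[0] - r[0], p[1] - r[1]), v1, v2)
                       for r in reps):
                reps.append(p)
        if len(reps) == index:
            return h


print(smallest_square_side((2, 2), (4, 0)))  # 4
```

For L = <(2,2),(4,0)> the lattice has 8 cosets; a 3 x 3 square has 9 points but meets only 7 of them, so a side of 4 is needed.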

A word $w$ has a central repetition of order $\alpha$ in position $j \in \mathbb{Z}^d$ if there exists a factor $v$ of $w$ centered in $j$ that is a repetition of order $\alpha$.

If $w$ has at least one central repetition of order $\alpha$ and period $L$ in $j$, let

$H = \{h_L \text{ s.t. every hypercube of side } h_L \text{ contains a transversal of } L\}$.

We denote $c_\alpha(w,j) = \min(H)$.

Let $P_\alpha(w) = \limsup\{c_\alpha(w,j),\ j \text{ position in } w\}$.

Lemma 3

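Let $v_1$ and $v_2$ be two factors of the same word $w$ on $\mathbb{Z}^d$ that both have a subgroup $H$ as period. If $\mathrm{sh}(v_1) \cap \mathrm{sh}(v_2)$ contains a transversal of $H$, then the factor $v$ having shape $\mathrm{sh}(v) = \mathrm{sh}(v_1) \cup \mathrm{sh}(v_2)$ also has period $H$.

[Figure: the shapes sh(v1), sh(v2) and sh(v)]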

Lemma 4

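Let $v_1$ and $v_2$ be two factors of the same word $w$ on $\mathbb{Z}^d$ such that $\mathrm{sh}(v_2) \subseteq \mathrm{sh}(v_1)$. Suppose that $v_1$ has period $H_1$ and that $v_2$ has period $H_2$, with $H_1$ a subgroup of $H_2$, and that $\mathrm{sh}(v_2)$ contains a transversal of $H_1$. Under these hypotheses $v_1$ has period $H_2$.

[Figure: the shapes sh(v1) and sh(v2)]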

A generalization of the Fine & Wilf Theorem

If w has two periodicity vectors v1 and v2 and w is “big enough” with respect to v1 and v2, then w is lattice-periodic with respect to v1 and v2.

The multidimensional Critical Factorization Theorem

• Informally, the C.F.T. states that the maximal local repetition of order 2 is also a period of the whole word.

• But …. there is no total order among lattices!!

• Our solution is to order lattices by using the length hL of the side of the smallest hypercube that contains a transversal of L.

• We have further to prove that all the lattices with same maximal hL coincide over the word.

• To do this, for the moment, we lose the tightness of the local repetition order (4 instead of 2).

Theorem

Let $w$ be a cubic bidimensional word and $X$ be a cube included in the shape of $w$.

• Every cube $T \subseteq X$ of side $\max(1, P_4(X)-1)$ contains a position $l$ such that $c_4(w,l) = P_4(w)$.

• Let $v$ be the factor of $w$ whose shape is the intersection between $\mathrm{sh}(w)$ and the union $X'$ of the shapes of the 4-repetitions centered in positions $l \in X$ such that $c_4(w,l) = P_4(X)$. Then $v$ has period $L$, where $L$ is a subgroup such that every cube of side $P_4(X)$ contains a transversal of $L$.

[Figure: the shape sh(v)]

Proof of the theorem

[Figure: proof scheme. Lemma 4, the Fine & Wilf generalization and Lemma 3 lead to the thesis.]

Conclusions and open problems

• Importance of the extension to the d-dimensional case ($d \ge 2$).

• Difficulties of such an extension (new definitions, extension of already known results).

• It is known that for d=1 the tight value is $\alpha = 2$. It remains an open problem to find the tight value of $\alpha$ for any dimension.

• Applications.

Compact representation of local automata

M. Crochemore, C. Epifanio, R. Grossi, F. Mignosi

Compacting is a standard technique for reducing the size of data structures such as factor automata, DAWGs and suffix trees; it consists in replacing paths in the automaton with single edges.

In 2000 Crochemore, Mignosi, Restivo and Salemi gave an algorithm for “self-compressing” tries of antifactorial binary sets of words. The aim of that algorithm was to represent antidictionaries compactly, so that they can be sent to the decoder of a static compression scheme. We have worked on an improvement of that algorithm that works for sets of words over any alphabet.

The suffix trie Tr(w) of a word w is a trie whose set of leaves is the set of suffixes of w that do not appear previously as a factor in w.
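As an illustration of this definition, here is a minimal sketch that builds a suffix trie as nested dictionaries (one dictionary per node, keyed by letter) by inserting every suffix. It is a quadratic-time construction meant only for small examples; the function names are hypothetical.

```python
def suffix_trie(w: str) -> dict:
    """Insert every suffix of w into a trie of nested dicts."""
    root = {}
    for i in range(len(w)):
        node = root
        for ch in w[i:]:
            node = node.setdefault(ch, {})
    return root


def leaves(node: dict, prefix: str = "") -> list:
    """Labels of the root-to-leaf paths, in alphabetical order."""
    if not node:
        return [prefix]
    out = []
    for ch in sorted(node):
        out.extend(leaves(node[ch], prefix + ch))
    return out


def count_nodes(node: dict) -> int:
    """Number of nodes in the subtree, the node itself included."""
    return 1 + sum(count_nodes(child) for child in node.values())


trie = suffix_trie("abaab")
print(leaves(trie))       # ['aab', 'abaab', 'baab']
print(count_nodes(trie))  # 12
```

For w = abaab the suffixes ab and b already occur earlier in w as factors, so they end at internal nodes and only aab, baab and abaab are leaves.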

The suffix tree T(w) of a word w is a compressed suffix trie, where only leaves and forks are kept. Each edge is labelled with a substring of w. In this way the number of nodes of T(w), leaves included, is smaller than 2|w|.

But if the labels of the arcs are stored explicitly, the implementation can have quadratic size. The simple solution is to represent labels by pairs of integers, (position, position) or (position, length), and to keep the text aside.
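The (position, length) representation can be sketched as follows. This is a naive illustration and not one of the linear-time constructions: trie nodes are identified with the factors of w, the root, the forks and the leaves are kept, and each compacted edge stores where its label occurs in w. The names `factors` and `compact_edges` are hypothetical.

```python
def factors(w: str) -> set:
    """All factors of w, the empty word included (one per trie node)."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def compact_edges(w: str) -> dict:
    """Map (parent, child) pairs of kept nodes to a (position, length) label."""
    nodes = factors(w)
    alphabet = sorted(set(w))

    def children(u):
        return [u + a for a in alphabet if u + a in nodes]

    # Keep the root, the leaves (no child) and the forks (two or more children).
    kept = {u for u in nodes if u == "" or len(children(u)) != 1}
    edges = {}
    for u in kept:
        for x in children(u):
            while x not in kept:        # follow the unary path down
                x = children(x)[0]
            start = w.find(x) + len(u)  # the label is w[start:start + length]
            edges[(u, x)] = (start, len(x) - len(u))
    return edges


w = "abaab"
for (u, x), (pos, length) in sorted(compact_edges(w).items()):
    print(repr(u), "->", repr(x), (pos, length), w[pos:pos + length])
```

Each label costs two integers regardless of its length, which is why the text w has to be kept alongside the tree in this representation.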

There are classical on-line linear-time implementations. All of them use the suffix link function s, which is defined over all the nodes of the suffix trie and of the suffix tree by

• s(root) = root

• s(v) = v’, where v = av’, v being identified with the label of the path from the root to v and a being the first letter of v.

Our new approach is basically the same as that of the suffix tree, but we compact a bit less, i.e. we keep all the nodes of the suffix tree and some more nodes of the trie, namely all the nodes v of the trie such that s(v) is a node of the suffix tree.

In this case, for any arc of the form (v,v’) with label a in the trie we have an arc (v,x) with the same label in our compacted trie T2(w), where x is

• v’, if v’ ∈ T2(w);

• the first node in T2(w) that is a descendant of v’ in the original trie, if v’ ∉ T2(w).

In this second case, we consider that (v,x) represents the whole path from v to x in the suffix trie, and we add a sign + to node x in order to maintain this information.
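The node selection described here can be sketched on a small word. The code below is an illustration under assumptions and not the authors' compacting algorithm: trie nodes are identified with factors of w, the suffix link of a non-empty node v is taken to be `v[1:]`, the suffix tree is taken to consist of the root, the forks and the leaves, and only the node sets are computed (arcs and + signs are left out). The function name `node_sets` is hypothetical.

```python
def factors(w: str) -> set:
    """All factors of w, the empty word included (one per trie node)."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def node_sets(w: str):
    """Return the node sets of the suffix trie, the suffix tree and T2(w)."""
    trie = factors(w)
    alphabet = set(w)

    def out_degree(u):
        return sum(1 for a in alphabet if u + a in trie)

    # Suffix tree: the root, the leaves and the forks of the trie.
    tree = {u for u in trie if u == "" or out_degree(u) != 1}
    # Extra nodes: trie nodes v whose suffix link s(v) = v[1:] is a tree node.
    extra = {v for v in trie if v and v[1:] in tree}
    return trie, tree, tree | extra


trie, tree, t2 = node_sets("abaab")
print(len(trie), sorted(tree), sorted(t2))
```

On w = abaab the trie has 12 nodes, the suffix tree keeps 5 of them, and T2(w) keeps 8: the nodes b, aa and ba are added because their suffix links point to the root or to the fork a.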

To complete the definition of T2(w) we keep the suffix link function over these nodes.

Notice that, by definition, for any node v of T2(w), s(v) is always a node of the suffix tree T(w) and hence it also belongs to T2(w).

This new approach lets us avoid keeping the text aside.

State of the art

• We have given compacting and decompacting algorithms;

• we have proved that the number of nodes in our compacted suffix tree is still linear;

• we have given an algorithm that can be used to check whether a pattern is present in a text, without “decompacting” the automaton;

• we are currently running some experiments on the Calgary and Canterbury corpora.