Internal Pattern Matching Queries in a Text and Applications

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

University of Warsaw, Poland

SODA 2015, San Diego, California, USA

January 4, 2015

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 1/16

Classic Data Structures for Text Processing

T: a b a a b a b a a b a a b a b a a b

[Diagram: text T, construction, data structure, queries]

Indexing Subwords
Given a pattern P, find all occurrences of P in T.
  Example: P: a b a b a, pos: 4, 12
  Indexing queries: a string is part of the query input.

Subword Equality
Given two subwords x, y of T, decide whether x = y.
  Example: T[4..8] =? T[12..16], answer: YES
  Internal queries (this talk): concern subwords of T only (identified by positions).
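
The two query types on this slide can be contrasted with a naive sketch in Python. This is not the data structure from the talk: `find_occurrences` and `subwords_equal` are hypothetical names chosen for illustration, positions are 1-based and inclusive as on the slide, and each query is answered by direct comparison instead of a preprocessed index.

```python
T = "abaababaabaababaab"  # the text from the slide, written without spaces


def find_occurrences(text: str, pattern: str) -> list[int]:
    """Indexing query: 1-based start positions of pattern in text (naive scan)."""
    m = len(pattern)
    return [i + 1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def subwords_equal(text: str, i: int, j: int, k: int, l: int) -> bool:
    """Internal query: is text[i..j] equal to text[k..l]? (1-based, inclusive)"""
    return text[i - 1:j] == text[k - 1:l]


print(find_occurrences(T, "ababa"))     # [4, 12]
print(subwords_equal(T, 4, 8, 12, 16))  # True
```

The indexing query receives the string P itself, so reading the input alone costs time proportional to |P|; the internal query receives only four integers.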

Internal Queries

Internal queries

Solve a problem for inputs being subwords of a given text T.

  O(1) query input size allows for O(1) query time,
  Ω(|P|) required if the “pattern” P is in the input.

Motivation:

  All input data must be known in advance
  Primitives for algorithms and data structures
    linear size and O(1) query time crucial for applicability,
    efficient construction important for applications in algorithms.

Internal Queries: Previous Techniques

[Figures: a suffix tree of an example text, with leaves labeled by suffix positions; range successor queries]

Suffix tree + problem-specific tools:

  Longest common prefix
  Min/Max suffix

  + efficient, often O(1) time queries with O(n) space,
  − limited use.

Suffix tree + orthogonal range queries:

  Pattern matching-related
  Period queries

  + wide applicability,
  − lower bounds for query time, slow construction.

Internal Pattern Matching Queries

Fact

Let Occ(x, y) be the positions in y where occurrences of x start.
Occ(x, y) forms an arithmetic progression if |y| ≤ 2|x|.

x: a b a b a b a
y: b b a b a b a b a b a b a a

Occ(x, y) = 3, 5, 7

Problem (Internal Pattern Matching Queries)

Given subwords x and y of T with |y| ≤ 2|x|, report all occurrences of x in y (as an arithmetic progression).

  Θ(|y| / |x|) space is necessary to encode Occ(x, y),
  Occ(x, y) can be computed with Θ(|y| / |x|) IPM Queries.
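
The Fact and the reduction to Θ(|y| / |x|) IPM Queries can be checked on small inputs with a brute-force sketch in Python. All function names are hypothetical, the query is simulated by naive comparison instead of the O(1)-time structure from the talk, and the covering by windows of length 2|x| − 1 is one possible choice assumed here, not necessarily the one the authors use.

```python
from typing import Optional


def occurrences(x: str, y: str) -> list[int]:
    """Occ(x, y): 1-based start positions of x in y, by naive comparison."""
    m = len(x)
    return [i + 1 for i in range(len(y) - m + 1) if y[i:i + m] == x]


def ipm_query(x: str, y: str) -> Optional[tuple[int, int, int]]:
    """Return Occ(x, y) as (first, step, count), or None if it is empty.

    Requires |y| <= 2|x|, the condition under which the Fact applies.
    """
    if len(y) > 2 * len(x):
        raise ValueError("IPM query requires |y| <= 2|x|")
    occ = occurrences(x, y)
    if not occ:
        return None
    step = occ[1] - occ[0] if len(occ) > 1 else 0
    # Sanity check of the Fact: consecutive occurrences are equally spaced.
    assert all(b - a == step for a, b in zip(occ, occ[1:]))
    return occ[0], step, len(occ)


def occurrences_via_ipm(x: str, y: str) -> list[int]:
    """Occ(x, y) for arbitrary |y|, using about |y| / |x| simulated IPM queries."""
    m = len(x)
    if m == 0:
        raise ValueError("x must be non-empty")
    result = []
    # Windows of length 2m - 1 starting every m positions: each occurrence
    # starts in exactly one window and fits entirely inside it.
    for start in range(0, max(len(y) - m + 1, 0), m):
        answer = ipm_query(x, y[start:start + 2 * m - 1])
        if answer is not None:
            first, step, count = answer
            result.extend(start + first + k * step for k in range(count))
    return result


print(ipm_query("abababa", "bbabababababaa"))  # (3, 2, 3), i.e. positions 3, 5, 7
```

Each answer is a constant-size triple, which is what makes an O(1)-time query conceivable when |y| ≤ 2|x|.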

IPM Queries: Our Results

Problem (Internal Pattern Matching Queries)

Given subwords x and y of T with |y| ≤ 2|x|, report all occurrences of x in y (as an arithmetic progression).

Theorem

IPM Queries can be answered in O(1) time by a data structure of size O(n), which can be constructed in O(n) time.

Technical assumptions:

  Word-RAM model with word-size w = Ω(log n).
  Construction: randomized (Las Vegas, expected time).

Applications

Prefix-Suffix and Period Queries

x: b a b a b a b a b b a a
y: b a b b a b a b a b a

[Figure: x and y shown as subwords of T, with the lengths d and 2d marked]

lengths: 4, 6, 8

Problem (Prefix-Suffix Queries)

Given subwords x and y of T and d ∈ N, report all prefixes of x of length between d and 2d that are also suffixes of y.

O(1) time using IPM Queries.

x: a b b a b a b b a b a b b a b a b b

Problem (Period Queries)

Given a subword x of T, report all periods of x.

O(log |x|) time using IPM Queries.
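
Both examples on this slide can be reproduced with a brute-force sketch in Python. The names `prefix_suffix` and `periods` are hypothetical, the value d = 4 is an assumption consistent with the reported lengths 4, 6, 8, and the reduction from Period Queries to O(log |x|) Prefix-Suffix Queries with doubling d is one natural reading of the stated bound, not a transcription of the authors' algorithm.

```python
def prefix_suffix(x: str, y: str, d: int) -> list[int]:
    """Lengths l in [d, 2d] such that the length-l prefix of x is a suffix of y."""
    upper = min(2 * d, len(x), len(y))
    return [l for l in range(d, upper + 1) if x[:l] == y[len(y) - l:]]


def periods(x: str) -> list[int]:
    """All periods p of x with 1 <= p <= |x| (p = |x| is counted as trivial)."""
    n = len(x)
    if n == 0:
        return []
    borders = set()
    d = 1
    while d < n:
        # A border of x is a proper prefix of x that is also a suffix of x.
        borders.update(l for l in prefix_suffix(x, x, d) if l < n)
        d *= 2  # the ranges [1,2], [2,4], [4,8], ... cover all lengths below n
    # A border of length l corresponds to the period n - l.
    return sorted({n - l for l in borders} | {n})


print(prefix_suffix("bababababbaa", "babbabababa", 4))  # [4, 6, 8]
print(periods("abbababbababbababb"))                    # [5, 10, 15, 18]
```

The loop in `periods` issues one Prefix-Suffix Query per power of two below |x|, which matches the logarithmic number of queries in the stated bound.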

Applications in Substring Compression

T : y :i j

i ′ j ′x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]?

LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 43: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Applications in Substring Compression

T : y :i ji ′ j ′

x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]?

LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 44: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Applications in Substring Compression

T : y :i ji ′ j ′

x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]? LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 45: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Applications in Substring Compression

T : y :i ji ′ j ′

x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]? LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 46: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Applications in Substring Compression

T : y :i ji ′ j ′

x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]? LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 47: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Applications in Substring Compression

T : y :i ji ′ j ′

x :

Generalized Substring Compressionserver

clientT [i , j ]? LZ (y)

y :

T [i ′, j ′]? LZ (x |y)

x :

Problem (Cormode & Muthukrishnan; SODA 2005)

Given subwords x and y , compute LZ (x |y), i.e., the section of theLempel-Ziv (LZ77) factorization LZ(y$x) that corresponds to x .

data structure of O(n) size,

query time improved from O(C log |x |C logε n) to

O(C log log |x |C logε n) where C is the output size,

combines orthogonal range queries with IPM Queries.

Tomasz Kociumaka, J. Radoszewski, W. Rytter, T. Waleń Internal Pattern Matching Queries in a Text 9/16

Page 48: Internal Pattern Matching Queries in a Text and Applicationskociumaka/files/soda2015b_slides.pdf · SODA 2015 San Diego, California, USA January 4, 2015 Tomasz Kociumaka, J. Radoszewski,

Other Applications

Problem (Cyclic Equivalence Queries)

Decide whether given subwords x and y are cyclic shifts of each other (x = uv and y = vu for some strings u and v).

O(1) query time using IPM Queries.

Definition (δ-subrepetition)

A δ-subrepetition is a fragment x such that per(x) ≤ |x|/(1+δ) and x cannot be extended (to the left or right) preserving per(x).

Improved algorithm for finding δ-subrepetitions in a word: first (expected) linear-time algorithm for δ = Θ(1).

More applications through wavelet suffix trees: substring suffix rank & selection, substring compression using Burrows-Wheeler transform.
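Both notions on this slide have simple brute-force checks, sketched here under stated assumptions. Cyclic equivalence uses the standard fact that y is a cyclic shift of x exactly when |x| = |y| and y occurs in xx; this reduction to pattern matching takes linear time here, not the O(1) of the slide. The period condition uses the KMP border array. The function names are illustrative, and maximality of a δ-subrepetition (non-extendability) is not checked.

```python
def cyclic_equivalent(x, y):
    """True iff x = uv and y = vu for some strings u, v."""
    return len(x) == len(y) and y in x + x


def shortest_period(x):
    """per(x) computed from the KMP border array; assumes x is non-empty."""
    border = [0] * len(x)
    k = 0
    for i in range(1, len(x)):
        while k > 0 and x[i] != x[k]:
            k = border[k - 1]
        if x[i] == x[k]:
            k += 1
        border[i] = k
    # The longest border has length border[-1]; the shortest period is its complement.
    return len(x) - border[-1]


def satisfies_period_bound(x, delta):
    """Checks only the inequality per(x) <= |x| / (1 + delta)."""
    return shortest_period(x) * (1 + delta) <= len(x)


print(cyclic_equivalent("abaab", "ababa"))
print(shortest_period("abaab"))
```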


High-level Ideas


IPM Queries: Outline

[Figure: fragment x with its representative z of length 2^k, and fragment y.]

Main idea: Design (locally consistent) representative assignment

representatives of length 2^k with |x|/4 < 2^k ≤ |x|/2,

store information about location of representative only.

Query algorithm:

1. Compute the representative z assigned to x.

2. Find occurrences of z (as representatives) within y.

3. Check which occurrences of z extend to occurrences of x.
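The three query steps can be sketched with naive components. This is only an illustration of the control flow, not the constant-time structure from the talk: the representative is taken to be the leftmost lexicographically smallest substring of length 2^k (a stand-in for the assignment described in the talk), occurrences of z are found by plain scanning, and each one is verified by direct comparison. The names `representative`, `occurrences` and `ipm_query` are illustrative, and x is assumed non-empty.

```python
def representative(x):
    """Return (offset, length) of a representative of x.

    length is the largest power of two with length <= |x|/2 (hence > |x|/4);
    for |x| < 2 no such power exists and x itself is used.
    """
    m = len(x)
    if m < 2:
        return 0, m
    length = 1
    while 4 * length <= m:  # doubling keeps 2 * length <= m
        length *= 2
    offset = min(range(m - length + 1), key=lambda i: x[i:i + length])
    return offset, length


def occurrences(z, y):
    """All starting positions of z in y, in increasing order."""
    out, j = [], y.find(z)
    while j != -1:
        out.append(j)
        j = y.find(z, j + 1)
    return out


def ipm_query(x, y):
    """Starting positions of x in y, via the representative of x."""
    offset, length = representative(x)              # step 1
    z = x[offset:offset + length]
    result = []
    for j in occurrences(z, y):                     # step 2
        s = j - offset                              # implied start of x
        if s >= 0 and y[s:s + len(x)] == x:         # step 3
            result.append(s)
    return result


print(ipm_query("aba", "ababa"))
```

Every occurrence of x at position s contains an occurrence of z at s + offset, so no occurrence is missed.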


Periodic Representatives

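[Figure: fragments x and y.]

Representative: any periodic subword of appropriate length.

use combinatorial and algorithmic tools for repetitions.

Query algorithm when the run does not cover x:

1. Extend z to a run. (Suppose it does not cover x.)

2. Find runs of the same period in y.

3. Generate candidates using run endpoints.

4. Check which candidates are indeed occurrences.

Query algorithm when the run covers x:

1. Extend z to a run. (Suppose it covers x.)

2. Find runs of the same period in y.

3. Synchronize runs using Lyndon roots.

4. Output occurrences in bulk.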
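Two primitives used in these steps can be sketched naively. `extend_to_run` grows a fragment with a known period p to the maximal fragment with the same period, and `lyndon_root` returns the lexicographically smallest rotation of a period-length string, which gives runs of the same period a common alignment point. Both names and the character-by-character scanning are assumptions for illustration; the talk relies on precomputed tools for repetitions instead.

```python
def extend_to_run(t, i, j, p):
    """Maximal [a, b) containing [i, j) such that t[a:b] has period p.

    Assumes 0 <= i < j <= len(t), 1 <= p <= j - i, and that t[i:j] has period p.
    """
    a, b = i, j
    while a > 0 and t[a - 1] == t[a - 1 + p]:   # extend to the left
        a -= 1
    while b < len(t) and t[b] == t[b - p]:      # extend to the right
        b += 1
    return a, b


def lyndon_root(u):
    """Lexicographically smallest rotation of u (u is one period of the run)."""
    return min(u[k:] + u[:k] for k in range(len(u)))


t = "xabababy"
a, b = extend_to_run(t, 2, 5, 2)   # start from "bab", period 2
print(t[a:b], lyndon_root(t[a:a + 2]))
```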


Non-periodic Representatives

[Figure: several neighbouring fragments x with their representatives z, and fragment y.]

Non-periodic representatives:

occurrences of z in y cannot overlap too much;

O(1) occurrences within y.

Representative assignment:

minimal subword of appropriate length wrt. a random order

guaranteed local consistency,

neighbouring patterns often share the representative,

in total O(n) representatives with O(n) occurrences as a representative.
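The assignment rule can be tried out on a small text. The sketch below gives every distinct substring of a fixed length a random rank and, for each window of length m, picks the position of the minimum-rank substring inside it (leftmost on ties). The function name, the explicit `length` parameter and the seeded generator are assumptions for illustration; in the talk the length is the power of two tied to |x|.

```python
import random


def assign_representatives(t, m, length, seed=0):
    """For every window t[s:s+m], return the start of its representative.

    The representative is the substring of the given length that is minimal
    with respect to a random order on substrings (leftmost on ties).
    """
    rng = random.Random(seed)
    rank = {}
    for i in range(len(t) - length + 1):
        rank.setdefault(t[i:i + length], rng.random())  # one rank per distinct substring
    reps = []
    for s in range(len(t) - m + 1):
        window = range(s, s + m - length + 1)
        reps.append(min(window, key=lambda i: (rank[t[i:i + length]], i)))
    return reps


t = "abaababaabaababaab"
reps = assign_representatives(t, m=6, length=2, seed=1)
print(reps)
print(len(set(reps)), "distinct representative positions for", len(reps), "windows")
```

Because the rank depends only on the content of a substring, equal windows receive the same relative representative, which is the local consistency property; consecutive windows tend to share a position as long as the current minimum stays inside the window.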


Conclusions & Open Problems

Our results:

an O(n)-size data structure for IPM Queries with O(1)-time queries and expected O(n)-time construction,

several applications of IPM Queries: further internal queries, faster algorithms.

Open problems:

More applications of IPM Queries.

Design a deterministic construction algorithm... or an algorithm running in O(n) time w.h.p.

Can shortest periods be computed in o(log |x|) time? Is there any lower bound for this problem?


Thank you

Thank you for your attention!
