Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo S. Srinivasa Rao, IT University of Copenhagen


Post on 18-Dec-2015


Page 1:

Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees

Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo

S. Srinivasa Rao, IT University of Copenhagen

Page 2:

Background: Succinct Data Structures

What are succinct data structures?
  Jacobson 1989
Why succinct data structures?
  Large data sets in modern applications: textual, genomic, spatial or geometric
  An implementation: Delpratt et al. 2006
Succinct integrated encodings
  Main data and auxiliary data structures

Page 3:

Our Problem: Succinct Indexes

Use of the concept in previous work
  Compact PAT trees: Clark & Munro 1996
  Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
  Upper bounds: Sadakane & Grossi 2006
Definition of succinct indexes in data structure design
  ADT: primitive access operators
  Succinct index: more powerful operators

Page 4:

Succinct Integrated Encodings

[Figure: main data + auxiliary data structures, together supporting navigational operations]

Page 5:

Succinct Indexes

[Figure: main data + succinct index, together supporting navigational operations]

Page 6:

Succinct Indexes vs. Integrated Encodings

Maximizing the freedom of the encoding of the main data

Allowing incremental design

Supporting implicit data

Page 7:

Strings: Definitions

Notation
  Alphabet: [σ] = {1, 2, …, σ}
  String: S[1..n]
Operations
  string_access(x): S[x]
  string_rank(α, x): number of occurrences of α in S[1..x]
  string_select(α, r): position of the rth occurrence of α in S

Page 8:

Strings: An Example

S = a a b a c c c d a d d a b b b c

string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
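
The three operations can be pinned down with a direct, linear-time reference implementation. This is only a sketch of their semantics (positions numbered from 1, as on the slides), not a succinct structure; the function signatures taking the string `S` as a first argument are an assumption made for illustration.

```python
def string_access(S, x):
    """Return S[x], with positions numbered from 1."""
    return S[x - 1]


def string_rank(S, alpha, x):
    """Number of occurrences of alpha in S[1..x]."""
    return S[:x].count(alpha)


def string_select(S, alpha, r):
    """Position of the r-th occurrence of alpha in S, or None if there is none."""
    seen = 0
    for pos, c in enumerate(S, start=1):
        if c == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


# The string from the example slide.
S = "aabacccdaddabbbc"
```

On this string the three calls `string_access(S, 8)`, `string_rank(S, "a", 8)` and `string_select(S, "b", 3)` reproduce the values d, 3 and 14 shown on the slide.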

Page 9:

Strings: Previous Results

Succinct Integrated Encodings
  Wavelet trees: Grossi et al. 2003
    Space: nH0 + o(n)∙lg σ bits
    Time: O(lg σ) for all three operations
  Golynski et al. 2006
    Space: n (lg σ + o(lg σ)) bits
    Time: O(lg lg σ) for string_access and string_rank, O(1) for string_select
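
To make the O(lg σ) bound for wavelet trees concrete, here is a minimal pointer-based sketch supporting access and rank. It assumes integer symbols and stores plain prefix-count lists instead of succinct bit vectors, so it illustrates the level-by-level navigation only, not the space bound. The class name `WaveletTree` and its methods are hypothetical.

```python
class WaveletTree:
    """Toy wavelet tree over integer symbols in [lo, hi] (not space-efficient)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        # ones[i] = number of symbols > mid among the first i symbols of seq
        self.ones = [0]
        for c in seq:
            self.ones.append(self.ones[-1] + (c > mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def access(self, i):
        """Symbol at position i (1-based); one step per tree level."""
        node = self
        while node.lo != node.hi:
            if node.ones[i] - node.ones[i - 1]:   # symbol went to the right child
                i, node = node.ones[i], node.right
            else:                                 # symbol went to the left child
                i, node = i - node.ones[i], node.left
        return node.lo

    def rank(self, c, i):
        """Occurrences of symbol c among the first i symbols."""
        node = self
        if not node.lo <= c <= node.hi:
            return 0
        while node.lo != node.hi:
            if i == 0:
                return 0
            mid = (node.lo + node.hi) // 2
            if c <= mid:
                i, node = i - node.ones[i], node.left
            else:
                i, node = node.ones[i], node.right
        return i


# The example string from the earlier slide, with a..d mapped to 1..4.
seq = [ord(c) - ord("a") + 1 for c in "aabacccdaddabbbc"]
wt = WaveletTree(seq)
```

Each query halves the symbol range at every level, which is where the lg σ factor comes from.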

Page 10:

Strings: Our Results

Succinct Indexes
  ADT: string_access in f(n, σ) time
  Space: n∙o(lg σ) bits
  Operations
    string_rank: O(lg lg σ ∙ lg lg lg σ ∙ (f(n, σ) + lg lg σ))
    string_select: O(lg lg lg σ ∙ (f(n, σ) + lg lg σ))
    Other operations: negations
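
The split between the ADT and the index can be illustrated with a toy index that never sees the string itself, only a `string_access` callable. The sketch below stores sampled symbol counts and scans at most one block per rank query. It is a deliberately simple stand-in, not the construction behind the bounds above; `SampledRankIndex` and the `block` parameter are hypothetical names.

```python
class SampledRankIndex:
    """Toy index that reads the string only through access(x), x in 1..n."""

    def __init__(self, access, n, block=4):
        self.access, self.n, self.block = access, n, block
        # samples[k] holds the symbol counts over S[1..k*block]
        self.samples = [{}]
        counts = {}
        for x in range(1, n + 1):
            c = access(x)
            counts[c] = counts.get(c, 0) + 1
            if x % block == 0:
                self.samples.append(dict(counts))

    def rank(self, alpha, x):
        """Occurrences of alpha in S[1..x]: one sample lookup plus a short scan."""
        k = x // self.block
        r = self.samples[k].get(alpha, 0)
        for p in range(k * self.block + 1, x + 1):
            r += self.access(p) == alpha
        return r

    def select(self, alpha, r):
        """Position of the r-th alpha, found by binary search on rank."""
        if r < 1 or self.rank(alpha, self.n) < r:
            return None
        lo, hi = 1, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(alpha, mid) >= r:
                hi = mid
            else:
                lo = mid + 1
        return lo


S = "aabacccdaddabbbc"
index = SampledRankIndex(lambda x: S[x - 1], len(S))
```

Because the index depends only on the callable, the main data can be stored in any encoding, or computed on the fly, as long as it answers `string_access`.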

Page 11:

Binary Relations: Definitions

Notation
  Binary relation: R ⊆ [n] × [σ]
  Number of objects: n; number of labels: σ
  Number of object-label pairs: t
Operations
  object_access(x, r): rth label associated with x
  label_access(x, α): whether x is associated with α
  label_rank(α, x): number of objects labeled α up to object x
  label_select(α, r): rth object labeled α

Page 12:

Binary Relations: An Example

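Relation matrix (rows: labels 1..σ, here σ = 4; columns: objects 1..n, here n = 5):

  object    1 2 3 4 5
  label 1   0 1 0 1 0
  label 2   0 0 0 1 0
  label 3   1 0 1 1 0
  label 4   1 1 0 0 1

object_access(1, 2) = 4
label_access(2, 3) = false
label_rank(3, 4) = 3
label_select(4, 3) = 5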
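
The four operations can be checked against the example with a naive matrix-based sketch. It assumes that rows are labels 1..4 and columns are objects 1..5, which is the orientation that reproduces all four answers on the slide. The functions take the matrix `M` explicitly, an assumption made for illustration.

```python
# Rows are labels 1..4, columns are objects 1..5 (orientation inferred from
# the answers on the slide, not stated there explicitly).
M = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
]


def label_access(M, x, alpha):
    """Whether object x is associated with label alpha."""
    return M[alpha - 1][x - 1] == 1


def object_access(M, x, r):
    """The r-th label associated with object x, or None if there is none."""
    labels = [a for a in range(1, len(M) + 1) if label_access(M, x, a)]
    return labels[r - 1] if 1 <= r <= len(labels) else None


def label_rank(M, alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(M[alpha - 1][:x])


def label_select(M, alpha, r):
    """The r-th object labeled alpha, or None if there is none."""
    objects = [x for x, bit in enumerate(M[alpha - 1], start=1) if bit]
    return objects[r - 1] if 1 <= r <= len(objects) else None
```

With this orientation the relation has t = 9 object-label pairs.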

Page 13:

Binary Relations: Previous Results

Succinct Integrated Encodings
  Barbay et al. 2006
    Space: t (lg σ + o(lg σ)) bits
    Time: O(lg lg σ) for object_access, label_rank and label_access, O(1) for label_select

Page 14:

Binary Relations: Our Results

Succinct Indexes
  ADT: object_access in f(n, σ, t) time
  Space: t∙o(lg σ) bits
  Time
    label_rank and label_access: O(lg lg σ ∙ lg lg lg σ ∙ (f(n, σ, t) + lg lg σ))
    label_select: O(lg lg lg σ ∙ (f(n, σ, t) + lg lg σ))

Page 15:

Multi-labeled Trees: Definitions

Notation
  Number of nodes: n
  Number of labels: σ
  Number of node-label pairs: t
Operations
  α-descendant
  α-child
  α-ancestor

Page 16:

Multi-labeled Trees: An Example

[Figure: a tree with nodes numbered 1 to 11, each annotated with a set of labels drawn from {a, b, c, d}]

Node 2 is a c-ancestor of node 6

Node 6 is a b-descendant of node 2

Node 10 is a d-child of node 8
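
The three query types can be spelled out on a small tree. The sketch below reads "α-ancestor of x" as an ancestor of x that carries label α, and likewise for children and descendants. The 6-node tree, its label sets and the function names are hypothetical; the tree drawn on the slide cannot be recovered from the transcript.

```python
# Hypothetical 6-node tree (not the one on the slide): parent[v] is v's parent.
parent = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5}
labels = {1: {"a"}, 2: {"c", "d"}, 3: {"b"}, 4: {"a", "b"}, 5: {"d"}, 6: {"b", "d"}}


def ancestors(v):
    """Yield the proper ancestors of v, nearest first."""
    v = parent[v]
    while v is not None:
        yield v
        v = parent[v]


def alpha_ancestors(v, alpha):
    """Ancestors of v that carry label alpha, nearest first."""
    return [u for u in ancestors(v) if alpha in labels[u]]


def alpha_children(v, alpha):
    """Children of v that carry label alpha."""
    return [u for u in sorted(parent) if parent[u] == v and alpha in labels[u]]


def alpha_descendants(v, alpha):
    """Proper descendants of v that carry label alpha."""
    return [u for u in sorted(parent) if alpha in labels[u] and v in ancestors(u)]
```

In this toy tree, node 2 is a c-ancestor of node 4, nodes 3 and 4 are b-descendants of node 2, and node 6 is a d-child of node 5.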

Page 17:

Multi-labeled Trees: Previous Results

Labeled trees
  Geary et al. 2004
  Ferragina et al. 2005
  Barbay et al. 2006
Multi-labeled trees
  Barbay et al. 2006

Page 18:

Multi-labeled Trees: Our Approach

Traversal orders
  Preorder
  DFUDS order
Ordinal trees: DFUDS
  Benoit et al. 1999 & 2005
  Jansson et al. 2007
2 binary relations
  Nodes in preorder & labels
  Nodes in DFUDS order & labels

[Figure: the example tree with its nodes numbered in the two traversal orders]
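
The two traversal orders can be computed as follows. The sketch assumes that DFUDS order numbers the root first and then, for each node visited in preorder, numbers all of its children consecutively; the tree and the function names are hypothetical.

```python
def preorder(children, root):
    """Nodes in preorder: a node first, then its subtrees from left to right."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children.get(v, [])))
    return order


def dfuds_order(children, root):
    """Root first; then, visiting nodes in preorder, append each node's children."""
    order = [root]
    for v in preorder(children, root):
        order.extend(children.get(v, []))
    return order


# Hypothetical tree: node 1 has children 2 and 5, node 2 has 3 and 4, node 5 has 6.
children = {1: [2, 5], 2: [3, 4], 5: [6]}
pre = preorder(children, 1)
dfuds = dfuds_order(children, 1)
```

In preorder the descendants of a node occupy a contiguous range, while in DFUDS order the children of a node do. Keeping one binary relation per order lets a query by label be restricted to such a range.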

Page 19:

Multi-labeled Trees: Our Results

Succinct Indexes
  ADT: node_label(x, r)
  Supporting α-child/descendant queries: t∙o(lg σ) bits
  Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
  Supporting α-child/descendant/ancestor queries of node x after another node y

Page 20:

Applications

Compressed Succinct Encodings
  Strings
    Space: nHk + o(n lg σ) bits
    Operations
      string_access: O(1)
      string_rank: O((lg lg σ)^2 ∙ lg lg lg σ)
      string_select: O(lg lg σ ∙ lg lg lg σ)
    First high-order entropy-compressed encoding supporting rank/select efficiently
  Other Data Structures
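
The rank and select times for compressed strings on this slide agree with substituting a constant-time string_access, f(n, σ) = O(1), into the general succinct-index bounds for strings. A short check, assuming lg lg σ ≥ 1:

```latex
\begin{align*}
\text{string\_rank:}\quad
  & O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr) \\
  &= O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (1 + \lg\lg\sigma)\bigr)
   = O\bigl((\lg\lg\sigma)^{2} \cdot \lg\lg\lg\sigma\bigr), \\[4pt]
\text{string\_select:}\quad
  & O\bigl(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr) \\
  &= O\bigl(\lg\lg\lg\sigma \cdot (1 + \lg\lg\sigma)\bigr)
   = O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma\bigr).
\end{align*}
```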

Page 21:

Applications (Continued)

High-order entropy-compressed text indexes for large alphabets
  Notation: n = text size, σ = alphabet size, m = pattern length, occ = number of occurrences
  Our results
    Space: nHk + o(n lg σ) bits
    Pattern searching: O(m lg lg σ + occ lg^(1+ε) n lg lg σ)
  Previous results: a lg σ factor instead of lg lg σ, or incompressible

Page 22:

Conclusions

We showed the importance of succinct indexes in the design of succinct data structures by designing:
  A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
  The first high-order entropy-compressed representation of strings supporting rank/select
  High-order entropy-compressed text indexes for large alphabets

Page 23:

Conclusions (Continued)

The concept of succinct indexes is useful in designing succinct data structures … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.

Page 24:

Thank you!