paolo ferragina, università di pisa compressed rank & select on general strings paolo ferragina...
TRANSCRIPT
Paolo Ferragina, Università di Pisa
Compressed Rank & Select
on general strings
Paolo FerraginaDipartimento di Informatica, Università di Pisa
Paolo Ferragina, Università di Pisa
Generalised Rank and Select
Rank(c,i) = #c in L[1,i]
Select(c,i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Rank( a , 7 ) = 4Select( a , 2 ) = 3
Paolo Ferragina, Università di Pisa
Generalised Rank and Select
If is small (i.e. constant) Build binary Rank data structure for each symbol of
Rank takes O(1) time and small space
If is large (words ?) Need a smarter solution: Wavelet Tree data structure
Algorithmic reduction:
>> Reduce Rank&Select over arbitrary strings
... to Rank&Select over binary strings
Paolo Ferragina, Università di Pisa
The Wavelet Tree
a c
b r
d
abracadabra
aacaaa brdbr
brbr
rr?
aaaaa?
bb?
d?
Paolo Ferragina, Università di Pisa
The Wavelet Tree
a c
b r
d
abracadabra
aacaaa brdbr
brbr
abracadabra01100010110
aacaaa001000
brdbr00100
brbr0101
01100010110
001000 00100
0101
Fact. Given the tree and the binary strings,we can recover the original string !!
In any case, O(|| log ||) bits.
Easier Alphabetic order + Heap structure
Paolo Ferragina, Università di Pisa
brdbr00100
abracadabra01100010110
brbr0101
aacaaa001000
The Wavelet Tree
a c
b r
d
Rank(b,8)
Rank(b,3)
Rank(b,2)
Reducetorightsymbols
Reducetoleftsymbols
It’s binary
Every step can be turned to binary
Paolo Ferragina, Università di Pisa
abracadabra01100010110
Rank1(8)=3
Rank0(2) = 2 – Rank1(1)= 1 Rank0(3) =
3 – Rank1(3)= 2brbr0101
brdbr00100
aacaaa001000
The Wavelet Tree
a c
b r
d
Generalised R&S implemented with log || binary R&S
Rank(b,8)
Right move=Rank1
Left move=Rank0
Left move=Rank0
Select is similar
Paolo Ferragina, Università di Pisa
Representing Trees
Paolo FerraginaDipartimento di Informatica, Università di Pisa
Standard representationBinary tree: each node has twopointers to its left and right children
An n-node tree takes2n pointers or 2n lg n bits.
Supports finding left child or right child of a node (in constant time).
For each extra operation (eg. parent, subtree size) we have to pay additional n lg n bits each.
x
xxxx
x xx x
Can we improve the space bound? There are less than 22n distinct binary trees on n nodes.
2n bits are enough to distinguish between any two different binary trees.
Can we represent an n node binary tree using 2n bits?
Binary tree representation A binary tree on n nodes can be represented using
2n+o(n) bits to support:
parent left child right child
in constant time.
Heap-like notation for a binary tree
1
11 1
1 1
1
1
00 00
0 000
0
Add external nodes
Label internal nodes with a 1and external nodes with a 0
Write the labels in level order
1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0
One can reconstruct the tree from this sequence
An n node binary tree can be represented in 2n+1 bits.
What about the operations?
Heap-like notation for a binary tree
1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
8
5 764
32
1
9
17161514
13121110
1
87
654
32
1 2 3 4 5 6 7 8
parent(x) = On red (⌊x/2⌋)
left child(x) = On green(2x)
right child(x) = On green(2x+1)
x x: # 1’s up to x (Rank)
x x: position of x-th 1 (Select)