random access to grammar compressed strings
TRANSCRIPT
![Page 1: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/1.jpg)
Random Access to Grammar Compressed Strings
Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann
![Page 2: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/2.jpg)
Random Access to Compressed Strings
textDNAXML
![Page 3: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/3.jpg)
Random Access to Compressed Strings
textDNAXML
• What is the ith character?
• What is the substring at [i,j]?
• Does pattern P appear in text? (perhaps with k errors?)
![Page 4: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/4.jpg)
Random Access to Grammar Compressed Strings
• Grammar based compression captures many popular compression schemes with no or little blowup in space [Charikar et al. 2002, Rytter 2003].
• Lempel-Ziv family, Sequitur, Run-Length Encoding, Re-Pair, ...
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
≤n
AGTAGTAG N = 8
X7 → X6X3X6 → X5X5X5 → X3X4X4 → TX3 → X1X2X2 → GX1 → A
n = 7
![Page 5: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/5.jpg)
Tradeoffs and Results
• What is the ith character?
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
O(N) spaceO(1) query
O(n) spaceO(n) query
O(n) spaceO(log N) query
• What is the substring at [i,j]?O(n) spaceO(log N + j - i) query
![Page 6: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/6.jpg)
Application: Black-Box Compressed String Matching
• Does “AGGA” appear in the text (perhaps with k errors)?
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 7: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/7.jpg)
Application: Black-Box Compressed String Matching
• Total time O(n (log N + m + Blackbox(m))).
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
26
✓✓m+k m+k
• Does “AGGA” appear in the text (perhaps with k errors)?
![Page 8: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/8.jpg)
Extension: Compressed Trees
• Linear space in compressed tree.
• Fast navigation operations (select, access, parent, depth, height, subtree_size, first_child, next_sibling, level_ancestor, nca).
b c
db
ac a
d
b
c d
b
b
c d
c d b
db
ac a
c d
![Page 9: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/9.jpg)
Heavy Path Decomposition
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 10: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/10.jpg)
Heavy Path Decomposition
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 11: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/11.jpg)
Heavy Path Decomposition
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 12: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/12.jpg)
Random Access Query
• The path from root to i goes through O(log N) heavy paths
• Query: Binary search all heavy paths on the way
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 i 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
O(log n) ⋅ O(log N)
![Page 13: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/13.jpg)
Random Access Query
• The path from root to i goes through O(log N) heavy paths
• Query: Binary search all heavy paths on the way
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 i 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
O(log n) ⋅ O(log N)
![Page 14: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/14.jpg)
Interval Biased Search Tree
• The path from root to i goes through O(log N) heavy paths
• Query: Binary search all heavy paths on the way
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 i 4 5 6 7 8
N
n1 1
1
1 1
26
3 3
2 1
1 1
2
O(log n) ⋅ O(log N)O(log N/x)
x =
0 N
?
x
Telescopes to O(log N)
• Space: Each IBSTs uses linear space => total O(n2) space for all heavy paths.
![Page 15: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/15.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 16: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/16.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path representation.
• In-path: O(log N/x) time, total O(n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
![Page 17: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/17.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path representation.
• In-path: O(log N/x) time, total O(n) space.
• Between-paths: O(log N/x) time, total O(n log n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2 log n
![Page 18: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/18.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path decomposition.
• In-path: O(log N/x) time, total O(n) space.
• Between-paths: O(log N/x) time, total O(n log n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
2
log n n/logn leaves
logn leaves
logn leaves
logn leaves
![Page 19: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/19.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path decomposition.
• In-path: O(log N/x) time, total O(n) space.
• Between-paths: O(log N/x) time, total O(n log n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
2
log n n/logn leaves
logn nodes
logn nodes
logn nodes
![Page 20: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/20.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path decomposition.
• In-path: O(log N/x) time, total O(n) space.
• Between-paths: O(log N/x) time, total O(n log n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
2
log n n/logn leaves
logn nodes
logn nodes
logn nodes
O(nα(n))
![Page 21: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/21.jpg)
O(n) Representation of Heavy Paths
• Search for i on heavy path = lowest ancestor of distance i.
• A heavy path decomposition of heavy path decomposition.
• In-path: O(log N/x) time, total O(n) space.
• Between-paths: O(log N/x) time, total O(n log n) space.
X3
X2X1
X5
X4 X3
X2X1
X5
X4
X1 X2
X3X6
X7
A G T A G T A G1 2 3 4 5 6 7 8
N
n
X7
X6
X5
X3
X2 X1
Xi Xj
Xk Xl Xp
X4
G A T
XtXm
1 1
1
1 1
26
3 3
2 1
1 1
2
2
log n n/logn leaves
logn nodes
logn nodes
logn nodes
O(nα(n))O(n) with bittricks
![Page 22: Random Access to Grammar Compressed Strings](https://reader030.vdocuments.net/reader030/viewer/2022021503/58699b941a28abdd708ba1e9/html5/thumbnails/22.jpg)
Summary
• Random access and substring decompression.
• O(n) space and O(log N + length of substring) time.
• Black compressed (approximate) string matching.
• Random access in compressed trees.