Compressed Full-Text Indexes for Highly Repetitive Collections
Lectio praecursoria
Jouni Sirén 29.6.2012
ALGORITHM
Sadakane: New text indexing functionalities of the compressed suffix
Burrows, Wheeler: A block sorting lossless data compression algorithm
Ferragina, Manzini: Indexing compressed text
Grossi, Vitter: Compressed suffix arrays and suffix trees with
Navarro, Mäkinen: Compressed full-text indexes
Raman, Raman, Rao: Succinct indexable dictionaries with applications
Ferragina, Manzini, Mäkinen, Navarro: Compressed representations of
Sadakane: Compressed suffix trees with full functionality
Manber, Myers: Suffix arrays: A new method for on-line string searches
Are there papers with Sadakane as the first author?
How many papers have Sadakane as the first author?
What are the papers with Sadakane as the first author?
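Each of these questions can be answered by scanning the list of papers once. A minimal Python sketch, assuming every record is a string of the form "Authors: title" as on the slide (titles are left truncated exactly as they appear there). The function names are illustrative and not taken from the talk.

```python
RECORDS = [
    "Sadakane: New text indexing functionalities of the compressed suffix",
    "Burrows, Wheeler: A block sorting lossless data compression algorithm",
    "Ferragina, Manzini: Indexing compressed text",
    "Grossi, Vitter: Compressed suffix arrays and suffix trees with",
    "Navarro, Mäkinen: Compressed full-text indexes",
    "Raman, Raman, Rao: Succinct indexable dictionaries with applications",
    "Ferragina, Manzini, Mäkinen, Navarro: Compressed representations of",
    "Sadakane: Compressed suffix trees with full functionality",
    "Manber, Myers: Suffix arrays: A new method for on-line string searches",
]


def first_author(record):
    """Return the first name in the author list that precedes the first colon."""
    authors = record.split(":", 1)[0]
    return authors.split(",")[0].strip()


def exists(records, name):
    """Are there papers with `name` as the first author?"""
    return any(first_author(r) == name for r in records)


def count(records, name):
    """How many papers have `name` as the first author?"""
    return sum(1 for r in records if first_author(r) == name)


def list_papers(records, name):
    """What are the papers with `name` as the first author?"""
    return [r for r in records if first_author(r) == name]
```

All three functions look at every record, so the cost of a query grows with the size of the list.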
DATA STRUCTURE
• What if we have to preserve the original order of the records?
• We may want even faster queries.
• Perhaps there are too many records to fit into memory.
• Then we probably need another data structure.
INDEX
Sadakane: New text indexing functionalities of the compressed suffix
Burrows, Wheeler: A block sorting lossless data compression algorithm
Ferragina, Manzini: Indexing compressed text
Grossi, Vitter: Compressed suffix arrays and suffix trees with
Navarro, Mäkinen: Compressed full-text indexes
Raman, Raman, Rao: Succinct indexable dictionaries with applications
Ferragina, Manzini, Mäkinen, Navarro: Compressed representations of
Sadakane: Compressed suffix trees with full functionality
Manber, Myers: Suffix arrays: A new method for on-line string searches
Navarro
Raman
Grossi
Burrows
Ferragina
Manber
Sadakane
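One way to read this slide is as a separate structure that maps each first author to the positions of the matching records, while the records themselves stay in their original order. A small Python sketch under that assumption; `build_index` and the dictionary layout are hypothetical choices, not the talk's own design.

```python
from collections import defaultdict

RECORDS = [
    "Sadakane: New text indexing functionalities of the compressed suffix",
    "Burrows, Wheeler: A block sorting lossless data compression algorithm",
    "Ferragina, Manzini: Indexing compressed text",
    "Grossi, Vitter: Compressed suffix arrays and suffix trees with",
    "Navarro, Mäkinen: Compressed full-text indexes",
    "Raman, Raman, Rao: Succinct indexable dictionaries with applications",
    "Ferragina, Manzini, Mäkinen, Navarro: Compressed representations of",
    "Sadakane: Compressed suffix trees with full functionality",
    "Manber, Myers: Suffix arrays: A new method for on-line string searches",
]


def first_author(record):
    """Return the first name in the author list that precedes the first colon."""
    return record.split(":", 1)[0].split(",")[0].strip()


def build_index(records):
    """Map each first author to the 0-based positions of that author's records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[first_author(record)].append(position)
    return dict(index)


index = build_index(RECORDS)
# Queries now look up one key instead of scanning every record.
sadakane_papers = [RECORDS[i] for i in index.get("Sadakane", [])]
```

The records are never reordered, and the index holds only author names and positions, so it can be much smaller than the data it points to.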
FULL-TEXT INDEX
$
ACGTACTG$
ACTG$
CGTACTG$
CTG$
G$
GACGTACTG$
GTACTG$
TACTG$
TG$
GACGTACTG$
Suffix Array
10 2 6 3 7 9 1 4 5 8
GACGTACTG$
Suffix Array
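The suffix array of the example text can be reproduced with a few lines of Python. This is a naive sketch that sorts the suffixes directly, which is fine for a slide-sized example but not how large texts are indexed. The function names are illustrative, and positions are 1-based to match the slide.

```python
def build_suffix_array(text):
    """Return the 1-based starting positions of the suffixes in lexicographic order."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    return [i + 1 for i in order]


def find_occurrences(text, sa, pattern):
    """Binary search the suffix array for the suffixes that start with `pattern`."""
    def prefix(k):
        start = sa[k] - 1
        return text[start:start + len(pattern)]

    # Lower bound: first suffix whose prefix is not smaller than the pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose prefix is larger than the pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


TEXT = "GACGTACTG$"
SA = build_suffix_array(TEXT)
print(SA)                                # [10, 2, 6, 3, 7, 9, 1, 4, 5, 8]
print(find_occurrences(TEXT, SA, "AC"))  # [2, 6]
```

All occurrences of a pattern form one contiguous range of the suffix array, which is why two binary searches are enough.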
• While a character takes 1 byte, each pointer requires 4 or 8 bytes.
• Suffix array usually requires 5 or 9 times more space than the text.
• We need something smaller to handle large texts.
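The "5 or 9 times" figure can be read as the text and its suffix array stored together: one byte per character plus one pointer per character. A tiny Python sketch of that arithmetic, assuming this reading; the function name is hypothetical.

```python
def text_plus_suffix_array(text_size, pointer_bytes):
    """Space for the text (1 byte per character) plus one pointer per character.

    The result is in the same unit as `text_size`.
    """
    return text_size * (1 + pointer_bytes)


# 4-byte pointers can address at most 2**32 positions; larger texts need 8 bytes.
print(text_plus_suffix_array(1, 4))   # 5
print(text_plus_suffix_array(1, 8))   # 9
print(text_plus_suffix_array(42, 8))  # 378
```

The last line matches the figures quoted later in the talk for a 42 gigabyte collection.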
COMPRESSED INDEX
• Ferragina, Manzini 2000, 2005: FM-index
• Grossi, Vitter 2000, 2005: Compressed Suffix Array
• Use Burrows-Wheeler transform to simulate the suffix array.
• Compresses to 40% to 80% of text size.
• Yet some data should compress better.
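To illustrate how the Burrows-Wheeler transform can stand in for the suffix array, here is a minimal Python sketch of backward search, the counting query of BWT-based indexes. It is a teaching sketch, not the FM-index or the Compressed Suffix Array themselves: the BWT is stored as a plain string and rank is computed by counting characters, whereas real indexes use compressed rank structures.

```python
def bwt(text):
    """Burrows-Wheeler transform of a text that ends with a unique '$'."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # The character preceding each sorted suffix; index -1 wraps to the final '$'.
    return "".join(text[i - 1] for i in order)


def count_occurrences(bwt_string, pattern):
    """Count occurrences of `pattern` using only the BWT (backward search)."""
    # smaller[c] = number of characters in the text that are smaller than c.
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    smaller, total = {}, 0
    for ch in sorted(counts):
        smaller[ch] = total
        total += counts[ch]

    # [lo, hi) is the range of sorted suffixes prefixed by the pattern suffix
    # processed so far; the pattern is read from its last character backwards.
    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + bwt_string[:lo].count(ch)
        hi = smaller[ch] + bwt_string[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


B = bwt("GACGTACTG$")
print(B)                            # GGTAAT$CGC
print(count_occurrences(B, "AC"))   # 2
print(count_occurrences(B, "ACT"))  # 1
```

The search never touches the suffix array or the original text, which is what makes it possible to store only a compressed form of the BWT.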
HIGHLY REPETITIVE DATA
Individual Genomes
G A C G T A - C T G C A G A T G - T A A T G C
G A C G T A - C T G C A G A T G C T A A T C C
G A C G T A - - - G C A G A T G C T A A T G C
G A C G T A - C T G C A G - T G C T A A T G C
G A C G T A - - - G C A G A T G C T A A T C C
G A C G T A - C T G C T G A T G C T A A T G C
G A C G T A C C T G C A G A T G C T A A T G C
G A C G T A C C T G C A G - T G C T A A T G C
G A C G T A - C T G C T G A T G C T A A T G C
G A C G T A - C T G C A G A T G C T A A T C C
Version History
dhcp-eduroam-hy-138-42:thesis jltsiren$ svn diff -r 662 thesis.tex
Index: thesis.tex
===================================================================
--- thesis.tex (revision 662)
+++ thesis.tex (working copy)
@@ -23,7 +23,7 @@
 \isbnpdf{978-952-10-8052-4}
 \issn{1238-8645}
 \printhouse{Unigrafia}
-\pubpages{108 + 72} % FIXME
+\pubpages{97 + 63}
 \supervisorlist{Veli Mäkinen, University of Helsinki, Finland}
 \preexaminera{Kunihiko Sadakane, National Institute of Informatics, Japan}
 \preexaminerb{Jorma Tarhio, Aalto University, Finland}
Suffix array construction: 378 gigabytes
Finnish language Wikipedia with full version history: 42 gigabytes
Run-length compressed suffix array: 4.4 gigabytes
Do we have 378 gigabytes of memory?
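Why run-length compression helps on such data can be seen on a toy example: in the BWT of a highly repetitive text, equal characters cluster into long runs. The Python sketch below is an idealized illustration, assuming 50 exact copies of a short string with no variation and no separators. It is not the RLCSA itself, which also needs supporting structures for queries.

```python
from itertools import groupby


def bwt(text):
    """Burrows-Wheeler transform of a text that ends with a unique '$'."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in order)


def run_length_encode(s):
    """Collapse each maximal run of equal characters into (character, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


single = "GACGTACTG$"
collection = "GACGTACTG" * 50 + "$"

single_runs = run_length_encode(bwt(single))
collection_runs = run_length_encode(bwt(collection))

print(len(single), len(single_runs))
print(len(collection), len(collection_runs))
```

The collection is about 45 times longer than the single copy, yet its BWT has only a few more runs, so the run-length encoding grows far more slowly than the text.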
INDEX CONSTRUCTION
Data Construction RLCSA
Suffix Array Direct Construction
INDEXING AUTOMATA
UNIVERSITY OF HELSINKI
FACULTY OF SCIENCE
Indexing Finite Language Representation of Population Genotypes
Jouni Sirén, Niko Välimäki, Veli Mäkinen
ABSTRACT
Compressed full-text indexes [6] based on the Burrows-Wheeler transform (BWT) are widely used in bioinformatics. Their most successful application so far has been mapping short reads to a reference sequence (e.g. Bowtie [3], BWA [4], SOAP2 [5]). These indexes use the BWT to simulate the suffix tree or the suffix array (SA), while using much less space than either of them. A simple generalization allows indexing a set of sequences.
We propose a biologically motivated generalization of the BWT to finite languages. Given a multiple alignment of sequences (e.g. individual genomes), we build a compressed index capable of simulating the suffix array over plausible recombinations of the sequences. Alternatively, we start from a reference sequence and a set of mutations, and build the index over sequences containing any subset of the mutations.
Our approach is based on finite automata. We start with an automaton recognizing the input language. This automaton is transformed into an equivalent automaton, where each state corresponds to a lexicographic range of suffixes of the language. A generalization of the XBW transform for labeled trees [2] is used to index the transformed automaton.
FULL-TEXT INDEXES FOR PATTERN MATCHING AND SEQUENCE ANALYSIS
[Figure: suffix tree, suffix array (SA), sorted suffixes, and BWT of an example text. SA column: 10 2 6 3 7 9 1 4 5 8.]
A MATCH IN MULTIPLE ALIGNMENT
[Figure: a pattern match in a multiple alignment of sequences.]
INITIAL AUTOMATON AND SORTED AUTOMATON
[Figure: the initial automaton and the sorted automaton. States of the sorted automaton are labeled #, GA, GT, ACTA, CTA, ACG, CG, AT, TGT, TA, AG, ACC, ACTG, CC, CTG, TG$, G$, $.]
GENERALIZED COMPRESSED SUFFIX ARRAY
$ ACC ACG ACTA ACTG AG AT CC CG CTA CTG G$ GA GT TA TG$ TGT #
BWT: G T G G T T G A A A AC AT # CT CG C A $
Edges: 1 1 1 1 1 1 1 1 1 1 1 1 100 1 100 1 1 1
Basic operations are about 2 times slower than in regular BWT-based indexes. For reasonable mutation frequencies $f$, the expected size of the sorted automaton is $n(1 + f)^{O(\log n)}$, where $n$ is the length of the reference sequence. For $1/f = \Omega(\log n)$, this becomes $O(n)$. In our experiments, an index built for the human reference genome and the genetic variation found in the Finnish population sample of the 1000 Genomes Project took approximately 2.8 gigabytes.
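The step from the general size bound to the linear bound can be spelled out as follows. This is a short sketch of the standard argument, with the constants $c$ and $c'$ introduced here only for illustration: write the exponent as $c \log n$ and assume $1/f \ge c' \log n$. Using $1 + f \le e^{f}$:

```latex
\[
  n\,(1+f)^{c \log n}
  \;\le\; n\, e^{c f \log n}
  \;\le\; n\, e^{c / c'}
  \;=\; O(n).
\]
```

In words, when mutations are rare enough that $f \log n$ stays bounded, the extra factor is a constant.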
FUTURE DIRECTIONS
• With our current algorithm, the construction of a genome-scale index requires 12 hours and 192 gigabytes of memory. We are currently investigating other algorithms, such as external memory construction and distributed construction in the MapReduce framework [1].
• In principle, our index can be used in any algorithm using a regular BWT-based index. What can be done efficiently in practice?
• We are currently investigating several ways to use the generalized index in read alignment. Are there other applications where our index could be superior to the existing approaches?
REFERENCES
[1] J. Dean, S. Ghemawat: Simplified Data Processing on Large Clusters. OSDI 2004.
[2] P. Ferragina et al.: Compressing and indexing labeled trees, with applications. Journal of the ACM, 2009.
[3] B. Langmead et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009.
[4] H. Li, R. Durbin: Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 2009.
[5] R. Li et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 2009.
[6] G. Navarro, V. Mäkinen: Compressed full-text indexes. ACM Computing Surveys, 2007.