Post on 23-May-2020
Phylogenies from Large Samples of Bacterial Genomes
Bernhard Haubold
MPI for Evolutionary Biology, Plön
June 10, 2016
Bernhard Haubold (MPI Plön) Whole Genome Phylogenies RaMi-NGS 2016 1 / 17
Overview
From genomes to phylogenies
Approximate alignments
From Genomes to Phylogenies—1
Genomes → Alignment → Distance Matrix → Tree

      S1    S2    S3    S4
S1    0
S2    d2,1  0
S3    d3,1  d3,2  0
S4    d4,1  d4,2  d4,3  0

[Figure: genomes S1 to S4, their alignment, the distance matrix above, and the resulting tree with leaves S3, S4, S1, S2]
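The last step of the pipeline, from distance matrix to tree, can be sketched in a few lines. The slides do not say which tree-building method is used, so UPGMA is assumed here purely for brevity; neighbor joining is a common alternative. The function name `upgma` and all distance values are hypothetical.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as a Newick string.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Branch lengths are omitted to keep the sketch short.
    """
    d = dict(dist)                      # work on a copy
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


# Hypothetical distances d(i,j) for the four genomes on the slide.
names = ["S1", "S2", "S3", "S4"]
dist = {
    frozenset(("S1", "S2")): 0.02,
    frozenset(("S3", "S4")): 0.04,
    frozenset(("S1", "S3")): 0.10,
    frozenset(("S1", "S4")): 0.10,
    frozenset(("S2", "S3")): 0.10,
    frozenset(("S2", "S4")): 0.10,
}
tree = upgma(names, dist)
print(tree)  # ((S1,S2),(S3,S4));
```

Only the lower triangle of the matrix is needed because the distances are symmetric, which is why the sketch keys them by unordered pairs.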
From Genomes to Phylogenies—2
Genomes → Alignment → Distance Matrix → Tree
Step speeds: Genomes → Alignment is slow; Alignment → Distance Matrix is fast; Distance Matrix → Tree is fast.
[Figure: genomes S1 to S4, alignment, lower-triangular distance matrix, and tree, with each step labelled by its speed]
From Genomes to Phylogenies—3
Genomes → Alignment → Distance Matrix → Tree
Step speeds: Genomes → Alignment is slow; Alignment → Distance Matrix is fast; Distance Matrix → Tree is fast.
[Figure: the same annotated pipeline, now with the program andi marked on it]
Approximate Alignment
Only consider pairs of sequences.
[Figure: a pair of sequences, Q and S]
Anchors
[Figure: anchors between the sequences Q and S]
Anchors:
  - Unique
  - Cannot be extended (maximal)
  - Longer than random match
  - Equidistant
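The "longer than random match" criterion can be made concrete with a rough calculation. The slides do not give the threshold andi applies, so the sketch below rests on two simplifying assumptions of its own: bases are uniform and independent, and a union bound over all positions in S is acceptable. The function `min_anchor_length` and the parameter `p` are hypothetical.

```python
def min_anchor_length(subject_length, p=0.05, alphabet_size=4):
    """Smallest match length L with subject_length * alphabet_size**-L <= p.

    Under uniform, independent bases, alphabet_size**-L is the chance that a
    given string of length L matches at one position of S; multiplying by the
    number of positions bounds the chance of a match anywhere in S.
    """
    if subject_length <= 0 or not 0 < p < 1:
        raise ValueError("need subject_length > 0 and 0 < p < 1")
    length = 0
    bound = float(subject_length)
    while bound > p:
        bound /= alphabet_size  # one more matching base
        length += 1
    return length


# For a subject of one million bases, matches shorter than this are
# plausibly due to chance under the stated assumptions.
print(min_anchor_length(1_000_000))  # 13
```

The threshold grows only logarithmically with the length of S, so even for whole bacterial genomes short exact matches are enough to serve as anchors.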
Anchor Distance
g1 AATGCCACCGGGTGATGATAGCCTCGATAGGCCGCAGGTCTCGCGGGGAAATC
g2 GCGAGAGCGCACCAGCGGGTGATGATAGCCTGGATAGGCCGCAGGACGGT
$d_a = \frac{1}{20 + 13} \approx 0.03$
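The arithmetic on this slide can be reproduced as follows. This is a sketch under one assumption: that the numerator 1 is the number of mismatches and that 20 and 13 are the lengths of the two aligned stretches of g1 and g2. The helper name `anchor_distance` is hypothetical.

```python
def anchor_distance(mismatches, segment_lengths):
    """Mismatches per aligned site, summed over all aligned segments."""
    total = sum(segment_lengths)
    if total == 0:
        raise ValueError("no aligned sites")
    return mismatches / total


# Values taken from the slide: one mismatch over 20 + 13 aligned sites.
d_a = anchor_distance(1, [20, 13])
print(round(d_a, 2))  # 0.03
```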
Searching
[Figure: sequence Q searched against sequence S]
Compute index of S:
  - Time- & memory-intensive step
  - Parallelize
Search index of S with Q: quick
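The split between an expensive indexing step and a cheap search step can be illustrated with a suffix array. The slide does not name the index structure, so a plain suffix array built by naive sorting is assumed here for illustration; the function names are hypothetical and the construction is far too slow for real genomes.

```python
def build_suffix_array(s):
    """Index step: start positions of all suffixes of s in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(s, sa, pattern):
    """Search step: longest prefix of pattern that occurs in s.

    Returns (match length, start position in s), or (0, -1) if nothing matches.
    """
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the insertion point of pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    best = (0, -1)
    # The best match is one of the two suffixes next to the insertion point.
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_length(s[sa[idx]:], pattern)
            if n > best[0]:
                best = (n, sa[idx])
    return best


S = "ACGTACGGA"
sa = build_suffix_array(S)        # done once per subject
print(longest_match(S, sa, "ACGGTT"))  # (4, 4)
```

Building the index is the costly part; each lookup afterwards needs only a logarithmic number of suffix comparisons.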
Implementation
Program: andi (ANchor DIstances)
Code: www.github.com/evolbioinf/andi
Accuracy
[Figure: anchor distance d_a (log axis, 10^-5 to 10^0) against substitutions per site K (log axis, 10^-4 to 10^-1); curves: d_a, ideal]
Problems at High Substitution Rates
[Figure: d_a and the fraction of failed d_a estimations against substitutions per site K, for K between 0.4 and 0.7; curves: d_a, ideal, failed d_a]
29 Escherichia coli Genomes
mugsy: 2 h 29 min; andi: 16.7 s
[Figure: two phylogenies of the same 29 genomes, one per program; scale bar 0.002 in both]
Taxa, in the leaf order extracted from both trees:
E. coli IAI1
E. coli SE11
E. coli E24377A
S. sonnei Ss046
S. boydii Sb227
S. boydii CDC 3083-94
S. flexneri 5 str. 8401
S. flexneri 2a str. 2457T
S. flexneri 2a str. 301
E. coli ATCC 8739
E. coli HS
E. coli str. K-12 substr. MG1655
E. coli str. K12 substr. W3110
E. coli str. K12 substr. DH10B
E. coli BW2952
S. dysenteriae Sd197
E. coli O55:H7 str. CB9615
E. coli O157:H7 EDL933
E. coli O157:H7 str. Sakai
E. coli UMN026
E. coli IAI39
E. coli SMS-3-5
E. coli O127:H6 E2348/69
E. coli 536
E. coli ED1a
E. coli CFT073
E. coli S88
E. coli UTI89
E. coli APEC O1
About 500-fold speedup (2 h 29 min = 8,940 s; 8,940 s / 16.7 s ≈ 535)
Time & Memory
[Figure: run time in seconds (axis 0 to 250) and memory in GB (axis 0 to 5) against the number of processors (0 to 30); curves: Time, Memory; one region of the plot is labelled "Laptop Zone"]
3085 Streptococcus pneumoniae Genomes (2.2 Mb)
4 h 37 min on 24-core computer; 9.2 GB RAM
Chewapreecha et al. (2014). Nature Genetics, 46:305–309.
Summary
From genomes to phylogenies
  genome → alignment → distance matrix → tree
Approximate alignments
  genome → alignment → distance matrix → tree
ANchor DIstances: andi
  - accurate & scalable to thousands of genomes
  - www.github.com/evolbioinf/andi
  - Ubuntu 16.04 (Xenial Xerus)
Acknowledgments
Fabian Klötzl, Plön
Peter Pfaffelhuber, Freiburg