Post on 21-Dec-2015
A Large Version of the Small Parsimony Problem
Optimally reconstruct ancestral sequences given
- unrooted phylogeny (hence ‘small’ parsimony problem)
- multiple alignment
- affine gap cost function
Jakob Fredslund*, Jotun Hein**, Tejs Scharling*
* Bioinformatics Research Center, Aarhus University, Denmark
** Department of Statistics, University of Oxford, United Kingdom
Overview
• Introduction
• Examples
• Gap graph construction
• Theory
• Results
• Conclusions
Small Parsimony, No Gaps
Algorithm due to Fitch-Hartigan-Sankoff: calculate N(A, C, G, T) in each node (the minimal cost of the subtree rooted at this node with nucleotide X in the root) going up, then backtrack going down.
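The up/down passes can be sketched in Python for a single alignment column. This is only an illustration under stated assumptions, not the authors' implementation: the tree is a rooted nested tuple with one-letter leaves, and `unit_cost` (0 for a match, 1 for a mismatch) is a hypothetical substitution cost that the slides do not specify.

```python
NUCLEOTIDES = "acgt"

def unit_cost(x, y):
    # Hypothetical substitution cost: 0 for a match, 1 for a mismatch.
    return 0 if x == y else 1

def up_pass(node, cost=unit_cost):
    """Going up: annotate every node with its table N as (table, children)."""
    if isinstance(node, str):  # leaf: only the observed nucleotide is free
        table = {x: (0 if x == node else float("inf")) for x in NUCLEOTIDES}
        return (table, [])
    children = [up_pass(child, cost) for child in node]
    table = {
        x: sum(min(child_table[y] + cost(x, y) for y in NUCLEOTIDES)
               for child_table, _ in children)
        for x in NUCLEOTIDES
    }
    return (table, children)

def down_pass(annotated, parent=None, cost=unit_cost):
    """Going down: pick an optimal nucleotide per node given the parent's."""
    table, children = annotated
    if parent is None:
        best = min(NUCLEOTIDES, key=lambda x: table[x])
    else:
        best = min(NUCLEOTIDES, key=lambda x: table[x] + cost(parent, x))
    return (best, [down_pass(child, best, cost) for child in children])

# One alignment column over five leaves.
tree = (("a", "c"), ("a", ("a", "g")))
annotated = up_pass(tree)
root_table = annotated[0]      # N(a), N(c), N(g), N(t) at the root
labels = down_pass(annotated)  # (nucleotide, [labelled children])
```

For an unrooted phylogeny the root can be placed on any edge when the substitution cost is symmetric; the sketch simply takes the nesting of the tuples as given.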
Small Parsimony, Large Version
1: ac-a---gattc
2: acgac---atcc
3: gc-----gagcc
4: -agacttgt---
5: aagtcttagt-c
g(k) = 12 + 2*k
(note: alignment is given)
Two Steps
1) Find optimal set of indels to explain gaps
2) Assign nucleotides optimally (FHS)
So: focus on indels
Tracing Evolution
What events could explain this alignment?
cagtta
gcag--a
-cagtta
-cag--a
-ctg--a
Tracing Evolution
cagtta
cagtta
Tracing Evolution
cagtta caga
cagtta
cag--a
cagtta
Tracing Evolution
cagtta caga
ctga
cagtta
cag--a
ctg--a
caga
Tracing Evolution
cagtta caga
ctga
gcag--a
cagtta
cag--a
ctg--a
-cagtta
-cag--a
-ctg--a
gcaga
Indels Affect Full Subtrees
cagtta caga
ctga
gcaga
gcag--a
-cagtta
-cag--a
-ctg--a
All sequences in right subtree have gaps in blue indel’s position
All sequences in left subtree have gaps in green indel’s position
Direction of Evolution?
cagtta caga
ctga
gcaga
gcag--a
-cagtta
-cag--a
-ctg--a
deletion of tt
insertion of tt
Since we don’t know the direction, we refer to insertions/deletions as indels. And remember: an indel creates gaps in a full subtree.
Explaining Gaps With Indels
g(k) = a + bk
(Anonymous nucleotides denoted by n)
Two indels of length 2 cost 2*(a+2b)
Three indels of length 1 cost 3*(a+b)
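The two totals can be compared directly. The sketch below plugs in a = 12 and b = 2 from the earlier slide's g(k) = 12 + 2*k; the helper names `gap_cost` and `explanation_cost` are made up for illustration.

```python
def gap_cost(k, a=12, b=2):
    """Affine gap cost g(k) = a + b*k for one indel of length k."""
    return a + b * k

def explanation_cost(indel_lengths, a=12, b=2):
    """Total cost of explaining the gaps with indels of the given lengths."""
    return sum(gap_cost(k, a, b) for k in indel_lengths)

two_long = explanation_cost([2, 2])        # 2*(a+2b) = 32
three_short = explanation_cost([1, 1, 1])  # 3*(a+b)  = 42

# With a cheap gap opening and an expensive extension the ranking flips.
flipped = explanation_cost([1, 1, 1], a=1, b=5) < explanation_cost([2, 2], a=1, b=5)
```

Since 2*(a+2b) - 3*(a+b) = b - a, the explanation with two longer indels is cheaper exactly when b < a, so the better explanation depends on the gap cost function.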
Larger Example
N8, N9, N10, N11, N12, N13: ??? Complex problem! (We are not aware of any upper time bound.)
Gap Graph Construction
Represent in a concise way all gaps and how they are connected: in a graph.
Gap Intervals
1. Find gap intervals.
No optimal indel ‘stops’ in the middle of a gap interval:
it is cheaper to extend the indel making the first gap than to open a new one.
(by triangle inequality)
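For the affine cost g(k) = a + bk, the inequality behind this argument can be written out as follows, assuming a non-negative gap-opening cost a (an assumption; the slides do not state it). Here k_1 and k_2 are the lengths of the two pieces that a split indel would have:

```latex
g(k_1 + k_2) = a + b\,(k_1 + k_2)
             \le 2a + b\,(k_1 + k_2)
             = (a + b\,k_1) + (a + b\,k_2)
             = g(k_1) + g(k_2), \qquad a \ge 0 .
```

So one indel spanning both pieces never costs more than two separate indels, and is strictly cheaper when a > 0.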
Gap Graph Vertices
2. Create minimal tree coverings:
For each gap interval, find minimal number of subtrees with gaps in all leaves, covering all gaps
Each vertex represents:
a) subtree with gaps in all leaves
b) region of alignment
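A minimal sketch of the covering step for one gap interval, under assumptions that are not in the slides: the tree is rooted and written as nested tuples of leaf names, and `gapped` is the set of leaves that have gaps in the interval. In a rooted tree the maximal all-gap subtrees form the covering with the fewest subtrees. In an unrooted tree a subtree is one side of an edge, so the rooting can matter and this sketch does not handle that case.

```python
def leaves(node):
    """Set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def minimal_covering(node, gapped):
    """Leaf sets of the maximal subtrees whose leaves all lie in `gapped`."""
    here = leaves(node)
    if here <= gapped:
        return [frozenset(here)]  # whole subtree is gapped: one potential indel
    if isinstance(node, str):
        return []                 # a leaf without a gap contributes nothing
    covering = []
    for child in node:
        covering.extend(minimal_covering(child, gapped))
    return covering

# Hypothetical five-leaf tree; s1, s2 and s4 have gaps in the interval.
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
cover = minimal_covering(tree, {"s1", "s2", "s4"})
```

Each returned leaf set, together with the alignment region of the gap interval, corresponds to one gap graph vertex in the sense of (a) and (b) above.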
Gap Graph Connections
3. Create connection between vertices v and w if they represent neighboring gaps.
v → w : all v’s gaps continue in w
(a special-case connection exists; see paper)
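One possible reading of the connection rule, as a sketch: the `Vertex` record and the adjacency test are assumptions made for illustration, and the special-case connection mentioned above is not modelled. Two vertices count as neighbors when their alignment regions are adjacent, and v → w is recorded when every gapped leaf of v is also a gapped leaf of w.

```python
from typing import NamedTuple

class Vertex(NamedTuple):
    leaves: frozenset  # leaves of the all-gap subtree
    start: int         # first alignment column of the region (inclusive)
    end: int           # last alignment column of the region (inclusive)

def connections(vertices):
    """Directed edges (v, w): regions adjacent and all of v's gaps continue in w."""
    edges = []
    for v in vertices:
        for w in vertices:
            if v is w:
                continue
            adjacent = v.end + 1 == w.start or w.end + 1 == v.start
            if adjacent and v.leaves <= w.leaves:
                edges.append((v, w))
    return edges

v = Vertex(frozenset({"s1"}), 3, 4)
w = Vertex(frozenset({"s1", "s2"}), 5, 7)
u = Vertex(frozenset({"s3"}), 5, 7)
edges = connections([v, w, u])  # only v -> w qualifies
```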
Interpreting a Gap Graph Vertex
A vertex is a potential indel: one indel could have created all gaps in the subtree.
Either one indel created all gaps in the subtree (vertex confirmed), or the vertex is decomposed into several indels (further ‘down’ in the tree).
Goal: confirm or decompose vertices with respect to the gap cost function.
Theory Needed Here..
We Need Optimality Proof
A gap graph may be huge, thus representing an enormous
number of potential indels. We need to show two things:
P1: that all optimal indels are represented in the gap graph;
P2: how to ‘resolve the graph’ to determine the set of optimal indels.
P1 is proved directly in the paper (Theorem 1).
Resolving the Gap Graph
To determine the optimal set of indels, we need to reduce the potentially huge graph while keeping the optimal solution.
Theorem 2 and a set of subsequent lemmas serve this purpose by identifying certain local graph configurations that can be reduced.
Preprocess the gap graph (perform local reductions) by applying the lemmas.
Preprocessing Earlier Example
Iteratively apply lemmas to reduce the graph.
Solving Earlier Example
After preprocessing: resolve remaining graph by checking all combinations
decompose
Solving Earlier Example
Placing indels in the tree:
After Local Preprocessing
• In longer examples there will be many undecided vertices (purple) after preprocessing.
• Find possible decompositions for each vertex and check all combinations in each chain – number of combinations exponential in chain length
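Checking all combinations in a chain can be sketched as a brute-force search. Everything here is hypothetical scaffolding: `options` lists the candidate resolutions for each undecided vertex in the chain, and `toy_cost` is an invented scoring function standing in for the real gap-cost evaluation, which the slides do not spell out.

```python
from itertools import product
from math import prod

def resolve_chain(options, total_cost):
    """Try every combination of per-vertex choices and keep the cheapest."""
    best = min(product(*options), key=total_cost)
    return best, total_cost(best)

def count_combinations(options):
    """Number of combinations examined: the product of the option counts."""
    return prod(len(choices) for choices in options)

# Toy chain of three undecided vertices, each either confirmed or decomposed.
options = [["confirm", "decompose"]] * 3
per_vertex = [
    {"confirm": 4, "decompose": 2},
    {"confirm": 2, "decompose": 5},
    {"confirm": 4, "decompose": 2},
]

def toy_cost(choice):
    # Invented cost: a per-vertex term plus a penalty of 12 whenever two
    # neighbouring vertices in the chain are resolved differently.
    local = sum(per_vertex[i][c] for i, c in enumerate(choice))
    breaks = sum(12 for x, y in zip(choice, choice[1:]) if x != y)
    return local + breaks

best, cost = resolve_chain(options, toy_cost)
n = count_combinations(options)  # 2**3 combinations for this chain
```

In this toy chain the middle vertex would prefer to be confirmed on its own, yet the cheapest overall combination decomposes all three, which is why neighbouring vertices have to be considered together. With two options per vertex the count doubles with every vertex added to the chain.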
Execution Times..?
Worst-case: exponential.
Average times for random alignments with 60% gaps:
60% gaps is a lot..
Real Genome Analysis
B.ES.89.S61K15, B.FR.83.HXB2, B.GA.88.OYI, B.GB.83.CAM1, B.NL.86.3202A21, B.TW.94.TWCYS, B.US.86.AD87,
B.US.84.NY5CG, and B.US.83.SF2
Nine HIV-1 subtypes from the Los Alamos HIV database (tree constructed with Quicktree).
Length: 9868. Running Time: 1 sec
Conclusions
• Concise way of representing alignment gaps
• Theoretically sound framework for proving optimality
• Graph reductions lead to fast resolution