![Page 1: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/1.jpg)
Linear Programming for
Phylogenetic Reconstruction
Based on Gene Rearrangements
Jijun Tang
Department of Computer Science and Engineering
University of South Carolina
– p. 1/30
![Page 2: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/2.jpg)
Acknowledgment
• Joint work with Bernard Moret (University of New Mexico).
• Supported by the National Science Foundation and the University of South Carolina.
– p. 2/30
![Page 3: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/3.jpg)
Overview
• Introduction to gene-order data
• GRAPPA and the computational challenge
• Linear programming setup
• Experimental design
• Experimental results
• Conclusions
– p. 3/30
![Page 6: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/6.jpg)
What Is A Phylogeny?
• The evolutionary history of a group of organisms
• Usually takes the form of a tree:
  • Modern organisms are placed at the leaves
  • Edges denote evolutionary relationships
– p. 4/30
![Page 7: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/7.jpg)
Example
– p. 5/30
![Page 10: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/10.jpg)
Gene-Order Data
• A chromosome can be represented by an ordering of signed genes
  • Linear or circular
  • The sign of a gene represents its orientation
• The gene order can be rearranged by evolutionary events such as:
  • Inversion, transposition, and inverted transposition
  • Deletion and insertion
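The three rearrangement events can be sketched on a signed gene order stored as a Python list. This is only an illustration: the function names and the index conventions (half-open segments `genome[i:j]`, linear rather than circular chromosome) are assumptions, not part of the talk.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    segment = [-g for g in reversed(genome[i:j])]
    return genome[:i] + segment + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k] (i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Like transposition, but the moved segment is also inverted."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


genome = [1, 2, 3, 4, 5, 6, 7, 8]
print(inversion(genome, 1, 4))                  # [1, -4, -3, -2, 5, 6, 7, 8]
print(transposition(genome, 1, 4, 6))           # [1, 5, 6, 2, 3, 4, 7, 8]
print(inverted_transposition(genome, 1, 4, 6))  # [1, 5, 6, -4, -3, -2, 7, 8]
```

Each function returns a new list, so the original gene order is left untouched.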
– p. 6/30
![Page 11: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/11.jpg)
Gene-Order Rearrangements
[Figure: a circular genome with genes 1 to 8, and the three genomes obtained from it by an inversion, a transposition, and an inverted transposition.]
– p. 7/30
![Page 16: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/16.jpg)
Reconstruction Methods
• Distance-based methods: neighbor-joining and its variants
• Bayesian method: Badger
• Maximum parsimony based on encoding: MPBE, MPME
• Direct optimization methods: BPAnalysis, GRAPPA, MGR
– p. 8/30
![Page 19: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/19.jpg)
Direct Optimization Methods
• Goal: reconstruct the phylogeny with the minimum number of rearrangement events
• Computationally hard even for only three genomes
  • The median problem for three genomes is NP-hard under general distance definitions
  • Find the content of the median genome that minimizes the sum of the distances from the median to the three genomes
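To make the median objective concrete, here is a minimal sketch that scores a candidate median under the breakpoint distance. The choice of breakpoint distance, the circular-genome convention, and all function names are assumptions for illustration; the talk only says "general distance definition".

```python
def adjacencies(genome):
    """Return the set of adjacencies of a signed circular genome.

    The adjacency (a, b) reads as (-b, -a) on the other strand,
    so each adjacency is stored in one canonical form.
    """
    n = len(genome)
    result = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not appear in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def median_score(candidate, genomes, distance=breakpoint_distance):
    """Sum of distances from a candidate median to the given genomes."""
    return sum(distance(candidate, g) for g in genomes)


g1 = [1, 2, 3, 4, 5, 6]
g2 = [1, -3, -2, 4, 5, 6]   # g1 with genes 2..3 inverted
g3 = [1, 2, 3, -6, -5, -4]  # g1 with genes 4..6 inverted
print(breakpoint_distance(g1, g2))     # 2
print(median_score(g1, [g1, g2, g3]))  # 4
```

The hard part of the median problem is the search over all candidate genomes; this sketch only evaluates one candidate.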
– p. 9/30
![Page 20: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/20.jpg)
Reconstruction Example
[Figure: example reconstruction of a phylogeny over circular genomes with 12 signed genes.]
– p. 10/30
![Page 24: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/24.jpg)
GRAPPA
• Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
• Started as an effort to reimplement the BPAnalysis of Sankoff and Blanchette
• Uses algorithmic techniques to improve speed
  • A tightened lower bound to discard bad trees before scoring them
  • Profiling, cache awareness, etc.
– p. 11/30
![Page 28: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/28.jpg)
Algorithm Outline
• Consider each tree topology in turn
• For each tree
  • Test the lower bound; if it exceeds the best score so far, continue to the next tree
  • Initialize the internal nodes by some means
  • Compute medians of three iteratively until no change occurs
• Return the lowest-scoring tree
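The outline above can be sketched as a generic search loop with lower-bound pruning. This is not GRAPPA's code: `lower_bound` and `score` are hypothetical callables standing in for the bound test and the median-based scoring step, and the toy numbers are made up.

```python
def best_trees(topologies, lower_bound, score):
    """Return (best_score, trees) over the candidate topologies.

    lower_bound(tree) must never exceed score(tree); score(tree) stands in
    for the expensive step (initialize internal nodes, iterate medians).
    """
    best = float("inf")
    winners = []
    for tree in topologies:
        if lower_bound(tree) > best:
            continue  # cannot beat the best so far; skip the expensive scoring
        s = score(tree)
        if s < best:
            best, winners = s, [tree]
        elif s == best:
            winners.append(tree)
    return best, winners


# Toy stand-ins: three "trees" with made-up bounds and scores.
toy_bounds = {"t1": 8, "t2": 11, "t3": 9}
toy_scores = {"t1": 10, "t2": 12, "t3": 9}
print(best_trees(["t1", "t2", "t3"], toy_bounds.get, toy_scores.get))
# (9, ['t3'])
```

In the toy run, `t2` is never scored because its bound (11) already exceeds the best score found so far (10).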
– p. 12/30
![Page 33: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/33.jpg)
Scoring a Tree
[Figure: tree with leaves 1 to 5 and internal nodes A, B, and C.]
– p. 13/30
![Page 38: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/38.jpg)
Computational Challenge
• Scoring a tree is very expensive
• When the genomes are distant, a single median may take days or months to solve
• The median problems must be solved iteratively
• Can we find the tree score without solving the median problems?
– p. 14/30
![Page 43: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/43.jpg)
Linear Programming Approach
• Goal: minimize the tree length
• What do we know?
  • The pairwise distance matrix
  • A given tree topology
• Approach:
  • Find useful constraints
  • Use linear programming to minimize the tree length
– p. 15/30
![Page 46: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/46.jpg)
Median Problem
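[Figure: genomes 1, 2, 3 with pairwise distances $d_{12}$, $d_{13}$, $d_{23}$, and a median genome 0 at distances $d_{01}$, $d_{02}$, $d_{03}$ from them.]

By the triangle inequality,

$$d_{01} + d_{02} + d_{03} \ge \frac{d_{12} + d_{23} + d_{13}}{2}$$

In more than 98% of cases, equality holds:

$$d_{01} + d_{02} + d_{03} = \frac{d_{12} + d_{23} + d_{13}}{2}$$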
– p. 16/30
![Page 47: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/47.jpg)
Constraint on Internal Node
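[Figure: internal node $M$ joined to its three neighbors $A$, $B$, $C$ by edges of length $d_k$, $d_{k+1}$, $d_{k+2}$; the neighbors are at pairwise distances $d_{A,B}$, $d_{A,C}$, $d_{B,C}$.]

$$\forall M:\quad d_k + d_{k+1} + d_{k+2} = \frac{d_{A,B} + d_{A,C} + d_{B,C}}{2}$$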
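A small worked instance of this constraint may help. The numbers are made up for illustration, and the assignment of $d_k$, $d_{k+1}$, $d_{k+2}$ to the edges from $M$ to $A$, $B$, $C$ (in that order) is an assumption about the figure's labeling.

```latex
\begin{align*}
d_{A,B} = 6,\quad d_{A,C} = 8,\quad d_{B,C} = 10
  \;\Longrightarrow\;
  d_k + d_{k+1} + d_{k+2} &= \frac{6 + 8 + 10}{2} = 12 .
\end{align*}
% If, in addition, every path through M is tight, i.e.
% d_k + d_{k+1} = d_{A,B},  d_k + d_{k+2} = d_{A,C},  d_{k+1} + d_{k+2} = d_{B,C},
% the three edge lengths are determined individually:
\begin{align*}
d_k     &= \tfrac{1}{2}\,(d_{A,B} + d_{A,C} - d_{B,C}) = \tfrac{1}{2}(6 + 8 - 10) = 2, \\
d_{k+1} &= \tfrac{1}{2}\,(d_{A,B} + d_{B,C} - d_{A,C}) = \tfrac{1}{2}(6 + 10 - 8) = 4, \\
d_{k+2} &= \tfrac{1}{2}\,(d_{A,C} + d_{B,C} - d_{A,B}) = \tfrac{1}{2}(8 + 10 - 6) = 6 .
\end{align*}
```

The constraint itself only fixes the sum (12 here); the individual values follow only under the extra tightness assumption.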
– p. 17/30
![Page 50: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/50.jpg)
Equations
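[Figure: tree on leaves $1, \dots, N$ with internal nodes $N+1, \dots, 2N-2$; edges are labeled $d_1, \dots, d_{2N-3}$, and node pairs are labeled with distances $d_{1,2}$, $d_{1,N+2}$, $d_{2,N+2}$, …, $d_{2N-3,N-1}$, $d_{2N-3,N}$, $d_{N-1,N}$.]

$$d_1 + d_2 + d_3 = \frac{d_{1,2} + d_{2,N+2} + d_{1,N+2}}{2}$$

$$\cdots$$

$$d_{2N-5} + d_{2N-4} + d_{2N-3} = \frac{d_{2N-3,N-1} + d_{N-1,N} + d_{2N-3,N}}{2}$$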
– p. 18/30
![Page 53: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/53.jpg)
Problems
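[Figure: tree on leaves $1, \dots, N$ with internal nodes $N+1, \dots, 2N-2$; edges are labeled $d_1, \dots, d_{2N-3}$, and node pairs are labeled with distances $d_{1,2}$, $d_{1,N+2}$, $d_{2,N+2}$, …, $d_{2N-3,N-1}$, $d_{2N-3,N}$, $d_{N-1,N}$.]

• There are ≈ 5N variables, but only N − 2 equations
• There are many (redundant) triangle inequalities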
– p. 19/30
![Page 54: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/54.jpg)
Inequality Equations
• We want to pick a minimum number of inequalities that covers all the variables
• We know only the distance matrix and the tree topology
• Choice: for each pair of genomes, find the two shortest paths from one to the other, and build one inequality for each path
– p. 20/30
![Page 55: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/55.jpg)
Inequality Equations
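[Figure: tree on leaves $1, \dots, N$ with internal nodes $N+1, \dots, 2N-2$; edges are labeled $d_1, \dots, d_{2N-3}$, and node pairs are labeled with distances $d_{1,2}$, $d_{1,N+2}$, $d_{2,N+2}$, …, $d_{2N-3,N-1}$, $d_{2N-3,N}$, $d_{N-1,N}$.]

$$d_{1,2} \le d_1 + d_3$$

$$d_{N-1,N} \le d_{2N-4} + d_{2N-3}$$

$$\cdots$$

$$d_{1,N-1} \le d_{1,N+2} + \cdots + d_{2N-3,N-1}$$

$$d_{1,N-1} \le d_{1,N+2} + \cdots + d_{2N-5} + d_{2N-4}$$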
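A simplified sketch of how such path inequalities could be generated from a tree topology follows. It is not the scheme of the talk: it builds only one inequality per leaf pair, over edge-length variables alone, and then checks a candidate assignment rather than solving the linear program. The quartet, its edge names, and the distances are invented for illustration.

```python
from itertools import combinations


def leaf_paths(edges, leaves):
    """For each pair of leaves, list the edges on the unique tree path between them.

    edges maps an edge name to its two endpoints.
    """
    neighbors = {}
    for name, (u, v) in edges.items():
        neighbors.setdefault(u, []).append((v, name))
        neighbors.setdefault(v, []).append((u, name))

    def path(src, dst, seen):
        if src == dst:
            return []
        for nxt, name in neighbors[src]:
            if nxt not in seen:
                rest = path(nxt, dst, seen | {nxt})
                if rest is not None:
                    return [name] + rest
        return None

    return {(a, b): path(a, b, {a}) for a, b in combinations(leaves, 2)}


def violated(paths, lengths, dist):
    """Leaf pairs whose path length falls below the known pairwise distance."""
    return [pair for pair, names in paths.items()
            if sum(lengths[e] for e in names) < dist[pair]]


# Quartet with leaves 1..4 and internal nodes 5, 6 (N = 4, so 2N - 3 = 5 edges).
edges = {"d1": (1, 5), "d2": (5, 6), "d3": (2, 5), "d4": (3, 6), "d5": (4, 6)}
paths = leaf_paths(edges, [1, 2, 3, 4])
dist = {(1, 2): 3, (1, 3): 7, (1, 4): 9, (2, 3): 6, (2, 4): 8, (3, 4): 6}
lengths = {"d1": 2, "d2": 3, "d3": 1, "d4": 2, "d5": 4}

print(paths[(1, 2)])                   # ['d1', 'd3']
print(violated(paths, lengths, dist))  # []
print(sum(lengths.values()))           # 12
```

Each entry of `paths` corresponds to one inequality of the form "known distance ≤ sum of the edge lengths on the path"; a solver would minimize the sum of the edge lengths subject to these.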
– p. 21/30
![Page 56: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/56.jpg)
Sum-up
• Examine every tree
• For each tree (with N genomes)
  • Minimize the sum of the 2N − 3 edge lengths
  • ≈ 5N variables in total
  • N − 2 equations, < 2N(N − 1) inequalities
• These numbers are relatively small if N < 20
• Use lp_solve to find the length of the tree
• Return the tree(s) with the minimum length
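To get a feel for these sizes, the sketch below tabulates the counts from this slide for a given N, together with the number of unrooted binary topologies on N labeled leaves, (2N − 5)!!, which is a standard count and not stated in the talk. The function names are illustrative.

```python
def num_unrooted_binary_trees(n):
    """(2n - 5)!! unrooted binary topologies on n >= 3 labeled leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def lp_size(n):
    """Problem sizes per tree, using the counts given on the slide."""
    return {
        "topologies": num_unrooted_binary_trees(n),
        "edge_lengths": 2 * n - 3,
        "approx_variables": 5 * n,
        "equations": n - 2,
        "inequalities_bound": 2 * n * (n - 1),
    }


print(lp_size(10))
# {'topologies': 2027025, 'edge_lengths': 17, 'approx_variables': 50,
#  'equations': 8, 'inequalities_bound': 180}
```

Each linear program stays small, but the number of topologies to examine grows very quickly with N.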
– p. 22/30
![Page 57: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/57.jpg)
Experimental Design
• Real datasets—limited samples
• Simulation
  • Generate a tree (true tree) from different topologies: uniform, birth-death, ...
  • Assign edge lengths based on the expected evolutionary rate
  • Assign gene content to each genome based on the edge length
  • Use GRAPPA to find a tree (inferred tree)
• Compare the inferred tree with the true tree to determine the accuracy
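The step that evolves a genome along one edge might be sketched as follows. This is a minimal sketch under a stated assumption: the only rearrangement event is a random inversion of a signed gene order, which the slides do not specify. The function name, the seed, and the chosen number of events are hypothetical.

```python
import random

def apply_random_inversions(genome, n_events, rng):
    """Return a copy of a signed gene order after n_events random inversions."""
    genome = list(genome)
    for _ in range(n_events):
        # Pick two distinct cut points, so the inverted segment is never empty.
        i, j = sorted(rng.sample(range(len(genome) + 1), 2))
        # Reverse the segment genome[i:j] and flip the sign of every gene in it.
        genome[i:j] = [-g for g in reversed(genome[i:j])]
    return genome

rng = random.Random(0)           # fixed seed so the run is repeatable
ancestor = list(range(1, 201))   # identity order on 200 genes
n_events = 20                    # assumed edge length: 20 events
child = apply_random_inversions(ancestor, n_events, rng)
print(child[:10])
```

Applying this along every edge of the true tree, starting from the root genome, would yield the leaf genomes that serve as input to the reconstruction method.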
– p. 23/30
![Page 58: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/58.jpg)
Topological Accuracy
– p. 24/30
![Page 60: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/60.jpg)
Topological Accuracy
• False positive:an edge is in the inferred tree,not in the true tree
• False negative:an edge is in the true tree,not in the inferred tree
Goal: to minimize FP and FN
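One common way to compute these counts is to identify each internal edge with the bipartition of the leaf set it induces and then compare the two sets of bipartitions. The slides do not spell out this convention, so it is an assumption here, and the function names and the two 5-leaf example trees are hypothetical.

```python
def bipartitions(adj, leaves):
    """Return the set of nontrivial leaf bipartitions induced by the internal edges."""
    leaves = frozenset(leaves)
    splits = set()

    def leaves_below(node, parent):
        if node in leaves:
            return frozenset([node])
        below = frozenset()
        for nxt in adj[node]:
            if nxt != parent:
                side = leaves_below(nxt, node)
                # Edge (node, nxt) splits the leaves into `side` and its complement.
                if 1 < len(side) < len(leaves) - 1:
                    splits.add(frozenset([side, leaves - side]))
                below |= side
        return below

    root = next(n for n in adj if n not in leaves)  # start from any internal node
    leaves_below(root, None)
    return splits

def fp_fn(true_adj, inferred_adj, leaves):
    """Count edges only in the inferred tree (FP) and only in the true tree (FN)."""
    true_splits = bipartitions(true_adj, leaves)
    inferred_splits = bipartitions(inferred_adj, leaves)
    return len(inferred_splits - true_splits), len(true_splits - inferred_splits)

leaves = [1, 2, 3, 4, 5]
# True tree ((1,2),3,(4,5)) and inferred tree ((1,3),2,(4,5)); nodes 6..8 are internal.
true_adj = {1: [6], 2: [6], 3: [7], 4: [8], 5: [8],
            6: [1, 2, 7], 7: [6, 3, 8], 8: [7, 4, 5]}
inferred_adj = {1: [6], 3: [6], 2: [7], 4: [8], 5: [8],
                6: [1, 3, 7], 7: [6, 2, 8], 8: [7, 4, 5]}
print(fp_fn(true_adj, inferred_adj, leaves))
```

Dividing each count by the number of internal edges of the corresponding tree turns the counts into rates.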
– p. 24/30
![Page 61: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/61.jpg)
Simulation Details
• Number of genomes (N): 12
• Number of genes (n): 200, 500 and 1000
• Expected number of events on each edge: 0.05n to 0.15n
• Tree topologies: uniform and birth-death
• Datasets on each combination: 10
– p. 25/30
![Page 66: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/66.jpg)
FN (500 genes, BD tree)
[Plot: FN rate (n = 500), vertical axis with ticks 0 to 20, against r, horizontal axis with ticks 24 to 72; curves for NJ and LP]
– p. 26/30
![Page 67: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/67.jpg)
FP (500 genes, BD tree)
[Plot: FP rate (n = 500), vertical axis with ticks 0 to 20, against r, horizontal axis with ticks 24 to 72; curves for NJ and LP]
– p. 27/30
![Page 68: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/68.jpg)
FN (1000 genes, uniform tree)
[Plot: FN rate (n = 1000), vertical axis with ticks 0 to 25, against r, horizontal axis with ticks 48 to 144; curves for NJ and LP]
– p. 28/30
![Page 69: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/69.jpg)
FP (1000 genes, uniform tree)
[Plot: FP rate (n = 1000), vertical axis with ticks 0 to 25, against r, horizontal axis with ticks 48 to 144; curves for NJ and LP]
– p. 29/30
![Page 70: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene](https://reader031.vdocuments.net/reader031/viewer/2022021900/5b575e837f8b9a022e8da2b2/html5/thumbnails/70.jpg)
Conclusion
• Linear programming gives us a new and accurate method for difficult datasets
• Can be applied to any distance
• Has potential to be used for large and complex genomes
• Can be extended to solve the median problems
– p. 30/30