parsimony genome 559: introduction to statistical and computational genomics elhanan borenstein
Parsimony
Genome 559: Introduction to Statistical and Computational Genomics
Elhanan Borenstein
Who am I?
  Faculty at Genome Sciences
  Computational systems biologist
  Training: CS, physics, hi-tech, biology
  Research interests: Complex biological networks | Evolutionary dynamics | Microbial communities and metagenomics
What will change? Not much!
  Informatics: From sequence to genes and to systems
  Programming: More emphasis on design and coding practices
    Tip of the day
    Coding style
Website: http://elbo.gs.washington.edu/courses/GS_559_12_wi/
A quick review
Trees:
  Represent sequence relationships
  A sequence tree has a topology and branch lengths (distances)
  The number of tree topologies grows very fast!
Distance trees:
  Compute pairwise corrected distances
  Build tree by sequential clustering algorithm (UPGMA or Neighbor-Joining)
  These algorithms don't consider all tree topologies, so they are very fast, even for large trees.
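To get a feel for how fast the number of topologies grows, here is a minimal sketch. It assumes the standard count for unrooted, strictly bifurcating trees with labeled leaves, (2n-5)!! for n >= 3 leaves; the formula and the function name `num_unrooted_trees` are not taken from the slides.

```python
def num_unrooted_trees(n_leaves: int) -> int:
    """Count unrooted bifurcating topologies with labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # with fewer than 3 leaves there is only one possible shape
    count = 1
    # Double factorial: 3 * 5 * ... * (2n - 5)
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

With 4 species this gives 3 trees, with 10 species it already gives 2,027,025.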
“Maximum Parsimony Algorithm”
A fundamentally different method:
Instead of reconstructing a tree, we will search for the best tree.
“Pluralitas non est ponenda sine necessitate”
(Maximum) Parsimony Principle
“Pluralitas non est ponenda sine necessitate” (plurality should not be posited without necessity)
  William of Ockham (c. 1288 – c. 1348)
Occam’s Razor: Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred.
“When you hear hoof beats, think horses, not zebras”
  Medical diagnosis
The KISS principle: “Keep It Simple, Stupid!”
  Kelly Johnson, Engineer
“Make everything as simple as possible, but not simpler”
  Albert Einstein
Parsimony principle for phylogenetic trees
Find the tree that requires the fewest evolutionary changes!
Consider 4 species
Sequence data: (alignment figure; the positions in the alignment are usually called “sites”)
The same approach would work for any discrete property that can be associated with the various species:
  Gene content (presence/absence of each gene)
  Morphological features (e.g., “has wings”, purple or white flowers)
  Numerical features (e.g., number of bristles)
Parsimony Algorithm
1) Construct all possible trees
2) For each site in the alignment and for each tree count the minimal number of changes required
3) Add all sites up to obtain the total number of changes for each tree
4) Pick the tree with the lowest score
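The four steps above can be sketched for the 4-species case, where only three unrooted trees exist. This is an illustration under stated assumptions, not the course's code: the alignment `toy_alignment` is made up, the function names are hypothetical, and step 2 is done by brute force (trying every state at the two internal nodes) rather than by a dedicated algorithm.

```python
from itertools import product


def site_changes(tree, states):
    """Minimal changes at one site on a 4-leaf unrooted tree ((a, b), (c, d)).

    Brute force: try every state at the two internal nodes and count
    the edges whose two endpoints differ.
    """
    (a, b), (c, d) = tree
    alphabet = set(states.values())
    best = None
    for x, y in product(alphabet, repeat=2):
        changes = (
            (states[a] != x) + (states[b] != x)    # two leaf edges on one side
            + (x != y)                             # the internal edge
            + (states[c] != y) + (states[d] != y)  # two leaf edges on the other side
        )
        if best is None or changes < best:
            best = changes
    return best


def parsimony_score(tree, alignment):
    """Step 3: add up the per-site minimal changes for one tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        site_changes(tree, {sp: seq[i] for sp, seq in alignment.items()})
        for i in range(n_sites)
    )


def most_parsimonious(alignment):
    """Steps 1 and 4: build the three 4-species trees, return the best one."""
    first, *rest = list(alignment)
    trees = [((first, p), tuple(s for s in rest if s != p)) for p in rest]
    scores = {tree: parsimony_score(tree, alignment) for tree in trees}
    return min(scores, key=scores.get), scores


# Made-up toy alignment (NOT the data from the slides).
toy_alignment = {"H": "acat", "C": "acgt", "G": "ccgg", "O": "ccag"}
best_tree, all_scores = most_parsimonious(toy_alignment)
print(best_tree, all_scores)
```

On this toy data the three trees score 4, 6 and 5, so the tree that groups H with C is picked.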
Consider 4 species
Sequence data: (alignment figure)
All possible unrooted trees:
  H closest to C, or
  H closest to G, or
  H closest to O
For each site and for each tree count the minimal number of changes required:
Consider site 1
(Tree figure: nodes labeled with the site 1 states, a and c)
What is the minimal number of evolutionary changes that can account for the observed pattern?
(Note: This is the “small parsimony” problem)
Consider site 2
Uninformative (no changes)
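A site that needs the same number of changes on every tree cannot help choose between trees. A minimal sketch of a check for this, assuming the usual textbook criterion (a site is parsimony-informative when at least two different states each occur in at least two species); the criterion and the name `is_parsimony_informative` are not from the slides:

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each appear in two or more species."""
    counts = Counter(column)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


print(is_parsimony_informative("cccc"))  # constant site, no changes on any tree
print(is_parsimony_informative("accc"))  # one odd species, 1 change on any tree
print(is_parsimony_informative("aacc"))  # the trees differ in the changes needed
```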
Consider site 3
Put sites 1 and 3 together
Now put all of them together
Parsimony scores (one per tree): 9, 8, 7
Which tree is the most parsimonious?
The parsimony algorithm
1) Construct all possible trees (Too many! Search algorithm)
2) For each site in the alignment and for each tree count the minimal number of changes required (How? Fitch’s algorithm)
3) Add all sites up to obtain the total number of changes for each tree
4) Pick the tree with the lowest score
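As a preview of how the counting step can work, here is a minimal sketch of the bottom-up pass of Fitch's algorithm for a single site. The assumptions are mine, not the slides': the tree is written as nested 2-tuples of species names, the unrooted tree is rooted on an arbitrary edge (the count does not depend on where the root is placed), and the site states in `site` are made up.

```python
def fitch(tree, states):
    """Return (state_set, changes) for a rooted binary tree of nested 2-tuples.

    Leaves are species names (strings); `states` maps each name to its
    character at the site being scored.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0  # a leaf: its own state, no changes yet
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Made-up states for one site (illustrative only).
site = {"H": "a", "C": "a", "G": "c", "O": "c"}
_, changes_hc = fitch((("H", "C"), ("G", "O")), site)
_, changes_hg = fitch((("H", "G"), ("C", "O")), site)
print(changes_hc, changes_hg)  # 1 2
```

Summing this count over all sites gives the parsimony score of one tree.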