2015-10-13 Phylogenetic tree
TRANSCRIPT
23/4/19 2
Evolution
Evolution of organisms is driven by:
Diversity: different individuals carry different variants of the same basic blueprint.
Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
23/4/19 3
Basic Assumptions
More closely related organisms have more similar genomes.
Highly similar genes are homologous (have the same ancestor).
A universal ancestor exists for all life forms.
Phylogenetic relations can be expressed by a dendrogram (a "tree").
23/4/19 4
Phylogenetic tree
A phylogenetic tree is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.
23/4/19 5
Common Phylogenetic Tree Terminology
[Figure: a tree with terminal nodes A, B, C, D, and E, labeled with the ancestral node (root of the tree), internal nodes, branches (lineages), and terminal nodes.]
23/4/19 6
Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: a tree with leaves Taxon A, Taxon B, Taxon C, Taxon E, and Taxon D, listed from top to bottom.]
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
The other dimension (along the branches) either can have no scale, can be proportional to genetic distance or amount of change (for 'phylograms' or 'additive trees'), or can be proportional to time.
The tree and the nested parentheses say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
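The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming a topology-only string without branch lengths or labels on internal nodes; the function names `parse_newick`, `leaves`, and `clades` are illustrative and not part of the slides.

```python
def parse_newick(text):
    """Parse a topology-only string such as '((A,(B,C)),(D,E))' into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return tuple(children)
        # A leaf name runs until the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaves(node):
    """Return the set of taxon names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Return the leaf set of every internal node (every clade) in the tree."""
    if isinstance(node, str):
        return []
    found = [frozenset(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found


tree = parse_newick("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(sorted(clade))
```

The printed clades include {B, C}, {A, B, C}, and {D, E}, but not {A, B}, which matches the reading of the tree given above.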
23/4/19 7
Historical Note
Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).
Since then, the focus has been on objective criteria for constructing phylogenetic trees.
Thousands of articles in the last decades.
Important for many aspects of biology:
  Classification
  Understanding biological mechanisms
23/4/19 8
Morphological vs. Molecular
Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
Modern biological methods allow the use of molecular features:
  Gene sequences
  Protein sequences
Analysis is based on homologous sequences in different species.
23/4/19 9
Morphological topology
[Figure: mammalian tree built from morphological features, with taxa from bonobo to platypus grouped into Archonta, Glires, Ungulata, Carnivora, Insectivora, and Xenarthra.]
(Based on McKenna and Bell, 1997)
23/4/19 10
From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use.
23/4/19 11
Mitochondrial topology (based on Pupko et al.)
[Figure: mammalian tree built from mitochondrial sequences, with taxa from donkey to platypus grouped into Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles + Shrews, Afrotheria, Xenarthra, and Lagomorpha + Scandentia.]
23/4/19 12
What can we get from phylogenetic trees?
A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
  Which species are the closest living relatives of modern humans?
  Did the infamous Florida Dentist infect his patients with HIV?
23/4/19 13
Which species are the closest living relatives of modern humans?
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas.
[Figure: tree of chimpanzees, bonobos, humans, gorillas, and orangutans drawn on a time axis in millions of years ago (MYA).]
23/4/19 14
Did the Florida Dentist infect his patients with HIV?
Phylogenetic tree of HIV sequences from the dentist, his patients, and local HIV-infected people:
[Figure: tree with leaves for the dentist, patients A to G, and local controls (2, 3, 9, 35). Patient lineages are annotated "Yes" or "No".]
Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.
23/4/19 17
Inferring evolutionary relationships between the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root:
[Figure: an unrooted tree of taxa A, B, C, and D with the root position marked, next to the resulting rooted tree with leaves A, B, C, D.]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
23/4/19 18
Now, try it again with the root at another position:
[Figure: the same unrooted tree of taxa A, B, C, and D with the root marked at another position, next to the resulting rooted tree.]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
23/4/19 19
An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees.
[Figure: unrooted tree 1 on taxa A, B, C, and D with root positions numbered 1 to 5, and the corresponding rooted trees 1a, 1b, 1c, 1d, and 1e.]
These trees show five different evolutionary relationships among the taxa!
23/4/19 20
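Each unrooted tree theoretically can be rooted anywhere along any of its branches.
[Figure: unrooted trees with an increasing number of taxa (up to A to F).]
For N taxa:
Number of unrooted trees = $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$
Number of rooted trees = $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$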
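To get a feeling for how fast these counts grow, the two formulas can be evaluated directly. This is a small sketch; the function names are illustrative, and the formulas assume strictly bifurcating trees with labeled leaves.

```python
from math import factorial


def unrooted_tree_count(n):
    """Number of unrooted bifurcating trees on n labeled taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_tree_count(n):
    """Number of rooted bifurcating trees on n labeled taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(3, 11):
    print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

An unrooted tree on N taxa has 2N - 3 branches, and each branch is a possible root position, so the rooted count is the unrooted count times 2N - 3. For N = 4 this gives 3 unrooted trees with 5 rootings each, 15 rooted trees in total.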
23/4/19 21
There are two major ways to root trees:
By outgroup: uses taxa (the "outgroup") that are known to fall outside of the group of interest (the "ingroup"). Requires some prior knowledge about the relationships among the taxa.
By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. This assumption is built into some of the distance-based tree building methods.
[Figure: a tree with a marked outgroup, and an unrooted tree of taxa A, B, C, and D with branch lengths 10, 2, 3, 5, and 2.]
d(A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
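Midpoint rooting can be sketched in a few lines. The tree layout below is an assumption: the slide only fixes the path A to D (10 + 3 + 5), so attaching B (length 2) next to A and C (length 2) next to D, and the internal node names `u` and `v`, are hypothetical.

```python
from itertools import combinations

# Assumed layout: A and B hang off internal node u, C and D off internal node v.
EDGES = {("A", "u"): 10, ("B", "u"): 2, ("u", "v"): 3, ("C", "v"): 2, ("D", "v"): 5}


def adjacency(edges):
    """Turn an edge-length table into a symmetric adjacency map."""
    adj = {}
    for (a, b), length in edges.items():
        adj.setdefault(a, {})[b] = length
        adj.setdefault(b, {})[a] = length
    return adj


def find_path(adj, start, goal, seen=()):
    """Return the unique path between two nodes of a tree as a list of nodes."""
    if start == goal:
        return [start]
    for nxt in adj[start]:
        if nxt not in seen:
            rest = find_path(adj, nxt, goal, seen + (start,))
            if rest:
                return [start] + rest
    return None


def path_length(adj, path):
    return sum(adj[a][b] for a, b in zip(path, path[1:]))


def midpoint_root(edges):
    """Return (most distant pair, their distance, edge holding the root, offset on that edge)."""
    adj = adjacency(edges)
    leaf_names = sorted(node for node in adj if len(adj[node]) == 1)
    paths = [find_path(adj, a, b) for a, b in combinations(leaf_names, 2)]
    longest = max(paths, key=lambda p: path_length(adj, p))
    total = path_length(adj, longest)
    walked = 0
    for a, b in zip(longest, longest[1:]):
        if walked + adj[a][b] >= total / 2:
            return (longest[0], longest[-1]), total, (a, b), total / 2 - walked
        walked += adj[a][b]


print(midpoint_root(EDGES))
```

With these lengths the most distant pair is A and D at distance 18, and the root falls on the branch leading to A, 9 units away from A.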
23/4/19 22
Two Methods of Tree Construction
Distance: a tree built by recursively combining the two nodes with the smallest distance.
Parsimony: a tree with the minimum total number of character changes between nodes.
23/4/19 23
Types of data used in phylogenetic inference:
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG
Distance-based methods: transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.
            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----
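The step from aligned characters to a distance matrix can be illustrated with the simplest dissimilarity, the fraction of sites at which two sequences differ. This is a sketch with illustrative names (`p_distance`, `distance_matrix`). On the five sequences above it gives the values above the diagonal of the matrix; the values below the diagonal come from some other measure that the slide does not name, so they are not reproduced here.

```python
from itertools import combinations

ALIGNMENT = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances, keyed by (name, name) for each unordered pair."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


for pair, value in distance_matrix(ALIGNMENT).items():
    print(pair, f"{value:.2f}")
```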
23/4/19 24
Distance-Based Method
Input: distance matrix between species.
For two sequences $s_i$ and $s_j$, perform a pairwise (global) alignment. Let $f$ = the fraction of sites with different residues. Then
$d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}f\right)$ (Jukes-Cantor model)
Outline:
Cluster species together.
Initially, clusters are singletons.
At each iteration, combine the two "closest" clusters to get a new one.
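The Jukes-Cantor correction is a one-line function of the observed fraction f. The sketch below uses an illustrative function name and rejects f >= 3/4, where the logarithm is undefined (the sequences look saturated with changes).

```python
import math


def jukes_cantor_distance(f):
    """Jukes-Cantor distance for an observed fraction f of differing sites."""
    if not 0 <= f < 0.75:
        raise ValueError("f must satisfy 0 <= f < 0.75")
    return -0.75 * math.log(1 - 4 * f / 3)


for f in (0.0, 0.05, 0.20, 0.50):
    print(f, round(jukes_cantor_distance(f), 4))
```

The corrected distance is always at least as large as f, and the gap widens as f grows, because the model accounts for multiple changes at the same site that a simple count cannot see.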
23/4/19 25
Unweighted Pair Group Method using Arithmetic Averages (UPGMA)
UPGMA is a type of Distance-Based algorithm
UPGMA steps:
1. Cluster the two species with the smallest distance, putting them into a single group.
2. Recalculate the distance matrix with the new group against the other groups, averaging the distances from the members of the new group.
3. With the new distance matrix, repeat step 1 until all species have been grouped.
23/4/19 27
UPGMA Step 1
Species   A      B      C      D
B         9      –      –      –
C         8      11     –      –
D         12     15     10     –
E         15     18     13     5
Merge D & E
[Figure: tree joining D and E.]
Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5
d(DE)A = 0.5 * (dDA + dEA) = 0.5 * (12 + 15) = 13.5
d(DE)B = 0.5 * (dDB + dEB) = 0.5 * (15 + 18) = 16.5
d(DE)C = 0.5 * (dDC + dEC) = 0.5 * (10 + 13) = 11.5
23/4/19 28
UPGMA Step 2
Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5
Merge A & C
[Figure: tree joining D and E, and tree joining A and C.]
Species   B      AC
AC        10     –
DE        16.5   12.5
23/4/19 29
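UPGMA Steps 3 & 4
Species   B      AC
AC        10     –
DE        16.5   12.5
Merge B & AC
[Figure: tree joining D and E, and tree joining B to the group A, C.]
Merge ABC & DE
[Figure: final tree joining the group A, C, B to the group D, E.]
(((A,C),B),(D,E))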
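The worked example can be reproduced with a short UPGMA sketch. Names and the data layout are illustrative. One assumption to note: the code weights each merged group by its number of species, which is the usual UPGMA rule; this coincides with the 0.5 * (...) averages on the slides whenever the two merged groups have the same size, as they do in steps 1 to 3.

```python
LEAF_DISTANCES = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
    ("A", "E"): 15, ("B", "E"): 18, ("C", "E"): 13, ("D", "E"): 5,
}


def upgma(leaf_distances):
    """Return (nested-parentheses tree, list of (merged group, distance at merge))."""
    sizes = {}  # group label -> number of species in the group
    dist = {}   # frozenset of two group labels -> distance
    for (a, b), value in leaf_distances.items():
        sizes[a] = sizes[b] = 1
        dist[frozenset((a, b))] = value
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # step 1: closest two groups
        left, right = sorted(pair)
        merged = f"({left},{right})"
        merges.append((merged, dist[pair]))
        # Step 2: size-weighted average distance from the new group to every other group.
        updated = {}
        for other in sizes:
            if other not in pair:
                d_left = dist[frozenset((left, other))]
                d_right = dist[frozenset((right, other))]
                updated[other] = (sizes[left] * d_left + sizes[right] * d_right) / (
                    sizes[left] + sizes[right]
                )
        dist = {key: value for key, value in dist.items() if not key & pair}
        for other, value in updated.items():
            dist[frozenset((merged, other))] = value
        sizes[merged] = sizes.pop(left) + sizes.pop(right)
    return merges[-1][0], merges


tree, merges = upgma(LEAF_DISTANCES)
print(tree)
for group, distance in merges:
    print(group, distance)
```

The merge order is D with E (distance 5), A with C (8), B with the group A, C (10), and finally the two remaining groups, which gives the same tree as the slides.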
23/4/19 30
Most Parsimonious Tree (MP Tree)
Optimality criterion: the 'most-parsimonious' tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Parsimony score: number of character changes (mutations) along the evolutionary tree.
Most parsimonious tree: the tree with the minimal parsimony score.
Example:
[Figure: two trees on the sequences AGA, AAA, AAG, and GGA, with inferred ancestral sequences and the number of changes marked on each branch. Score = 4 for the first tree, Score = 3 for the second.]
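A parsimony score for a given tree can be computed with Fitch's algorithm, a standard method that the slides do not name. The sketch below makes two assumptions: trees are nested pairs of sequences, and the two topologies are chosen to reproduce the scores 4 and 3 of the example, since the figure's exact trees are not recoverable from the transcript.

```python
def fitch_site(tree, site):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):  # a leaf holds its observed character
        return {tree[site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, site)
    right_states, right_cost = fitch_site(right, site)
    common = left_states & right_states
    if common:  # children can agree, so no change is needed at this node
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def leaf_sequences(tree):
    if isinstance(tree, str):
        return [tree]
    return [seq for child in tree for seq in leaf_sequences(child)]


def parsimony_score(tree):
    """Sum of the minimum number of changes over all alignment columns."""
    length = len(leaf_sequences(tree)[0])
    return sum(fitch_site(tree, site)[1] for site in range(length))


tree_one = (("AGA", "AAA"), ("AAG", "GGA"))  # assumed topology with score 4
tree_two = (("AGA", "GGA"), ("AAA", "AAG"))  # assumed topology with score 3
print(parsimony_score(tree_one), parsimony_score(tree_two))
```

Grouping AGA with GGA lets the G in the middle position arise once instead of twice, which is where the second tree saves one change.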
23/4/19 31
There are many trees...
We cannot go over all the trees, so we need a way to find the best one. There are approximate solutions, but what if we want to make sure we find the global optimum?
There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND. It is a general technique in computer science that can be applied to phylogeny.
23/4/19 32
BRANCH AND BOUND
To illustrate the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution: the traveling salesperson problem (TSP). Later, when the general BNB method is understood, we will see how to apply it to finding the MP tree.
23/4/19 33
Branch and Bound for TSP
Find a minimum-cost round-trip path that visits each intermediate city exactly once.
[Figure: weighted graph on the cities A, B, C, D, E, F, and G.]
Greedy approach: A,G,E,F,B,D,C,A = 251
23/4/19 34
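Search all possible paths
[Figure: the weighted graph on the cities A to G, and the tree of all paths starting from A.]
All paths
  AG (20)
    AGF (88)
      AGFB
      AGFE
      AGFC
    AGE (55)
  AB (46)
  AC (93)
    ACB (175)
      ACBE (257)
    ACD
    ACF
Best estimate: 251
A partial path that already costs more than the best estimate, such as ACBE (257 > 251), cannot lead to a better round trip, so it is not extended further.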
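The idea can be sketched in code. The edge weights of the slide's graph cannot be fully recovered from the transcript, so the cost table below is a hypothetical five-city example, and all names are illustrative. As on the slide, the greedy tour supplies the first best estimate, and a partial path is abandoned as soon as its cost reaches that estimate.

```python
# Hypothetical symmetric travel costs (not the slide's graph).
COST = {
    ("A", "B"): 30, ("A", "C"): 10, ("A", "D"): 19, ("A", "E"): 8,
    ("B", "C"): 3, ("B", "D"): 7, ("B", "E"): 6,
    ("C", "D"): 2, ("C", "E"): 20, ("D", "E"): 4,
}
CITIES = ["A", "B", "C", "D", "E"]


def cost(a, b):
    return COST[(a, b)] if (a, b) in COST else COST[(b, a)]


def tour_cost(tour):
    return sum(cost(a, b) for a, b in zip(tour, tour[1:]))


def greedy_tour(cities, start):
    """Always move to the nearest unvisited city, then return to the start."""
    tour, remaining = [start], set(cities) - {start}
    while remaining:
        nearest = min(sorted(remaining), key=lambda city: cost(tour[-1], city))
        tour.append(nearest)
        remaining.remove(nearest)
    return tour + [start]


def branch_and_bound(cities, start):
    """Exact search; returns the best tour, its cost, and the number of paths examined."""
    first = greedy_tour(cities, start)
    best = {"tour": first, "cost": tour_cost(first), "visited": 0}

    def extend(path, so_far):
        best["visited"] += 1
        if so_far >= best["cost"]:  # bound: this path cannot beat the best estimate
            return
        if len(path) == len(cities):
            total = so_far + cost(path[-1], start)
            if total < best["cost"]:
                best["tour"], best["cost"] = path + [start], total
            return
        for city in cities:  # branch: try every unvisited city next
            if city not in path:
                extend(path + [city], so_far + cost(path[-1], city))

    extend([start], 0)
    return best


print(greedy_tour(CITIES, "A"), tour_cost(greedy_tour(CITIES, "A")))
print(branch_and_bound(CITIES, "A"))
```

The bound is safe only because all costs are non-negative: extending a path can never make it cheaper. The search still returns the exact optimum, but it skips every subtree whose prefix is already too expensive.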
23/4/19 35
Back to finding the MP tree
Finding the MP tree
BNB helps, though it is still exponential…
23/4/19 36
The MP search tree
[Figure: an unrooted tree of taxa 1, 2, and 3, followed by trees of taxa 1 to 4 and a tree of taxa 1 to 5.]
4 is added to branch 1.
5 is added to branch 2.
There are 5 branches.
23/4/19 37
The MP search tree
4 is added to branch 1.
[Figure: the search tree annotated with parsimony scores. Root: 30. Second level: 43, 39, 55. Third level: 52 54 52 53 58 61 56 59 61 69 53 51 42 47 47.]
23/4/19 38
MP-BNB
4 is added to branch 1.
[Figure: the search tree annotated with parsimony scores, as on the slide "The MP search tree".]
Best (minimum) value = 52
23/4/19 39
MP-BNB
4 is added to branch 1.
[Figure: the search tree annotated with parsimony scores, as on the slide "The MP search tree".]
Best record = 52
23/4/19 47
MP-BNB
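[Figure: the search tree after branch and bound. Root: 30. Second level: 43, 39, 55. Third level (visited trees only): 52 54 52 53 58 53 51 42 47 47.]
Best tree: MP score = 42
Total # trees visited: 14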
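The count of 14 can be checked with a small sketch of branch and bound over the scored search tree. The grouping of the third-level scores under their parents is an assumption inferred from the numbers (the five scores missing from the final slide, 61 56 59 61 69, are taken to lie below the node scored 55), and the left-to-right visiting order 43, 55, 39 is likewise assumed. The pruning rule relies on the fact that adding a taxon can never lower the parsimony score.

```python
# Each node is (parsimony score, list of child nodes); the grouping is assumed.
def leaf_nodes(scores):
    return [(score, []) for score in scores]


SEARCH_TREE = (30, [
    (43, leaf_nodes([52, 54, 52, 53, 58])),
    (55, leaf_nodes([61, 56, 59, 61, 69])),
    (39, leaf_nodes([53, 51, 42, 47, 47])),
])


def mp_branch_and_bound(root):
    """Return (best complete-tree score, number of trees whose score was examined)."""
    best = float("inf")
    visited = 0

    def visit(node):
        nonlocal best, visited
        score, children = node
        visited += 1
        if score >= best:  # bound: no tree below this node can beat the record
            return
        if not children:   # a complete tree with a new best score
            best = score
            return
        for child in children:
            visit(child)

    visit(root)
    return best, visited


print(mp_branch_and_bound(SEARCH_TREE))
```

Under these assumptions the record is 52 after the first group of complete trees, the node scored 55 is cut off without looking at its five children, and the node scored 39 leads to the final record of 42.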