54406741 molecular evolution
-
7/24/2019 54406741 Molecular Evolution
1/32
Introduction to Molecular Evolution
Mike Thomas
October 3, 2002
-
What we can learn from multiple sequence alignments
An alignment is a hypothesis about the relatedness of a set of genes
This information can be used to reconstruct the evolutionary history of those genes
The history of the genes can provide us with information about the structure, function, and significance of a gene or family of genes
We can also use the reconstructed history to test hypotheses about evolution itself:
  Rates of change
  The degree of change
  Implications of change, etc.
We can then pose and test hypotheses about the evolution of phenomena unrelated to the genes:
  Evolution of flight in insects
  Evolution of humans
  Evolution of disease
-
Assumptions made by phylogenetic methods:
The sequences are correct
The sequences are homologous
Each position is homologous
The sampling of taxa or genes is sufficient to resolve the problem of interest
Sequence variation is representative of the broader group of interest
Sequence variation contains sufficient phylogenetic signal (as opposed to noise) to resolve the problem of interest
Each position in the sequence evolved independently
-
How do you extract this information from an alignment?
-
Answer: a tree
A phylogenetic tree is a hierarchical, graphical representation of relationships
[Figure: Haeckel's Tree of Life, labeled with "Higher" organisms and "Lower" organisms]
-
Other Ways to Represent Phylogenies
(a) Cladogram showing the phylogenetic relationships between four species
(b) Relationships of the same four species represented as a set of nested parentheses
(c) Evolutionary relationships of the same four species with nine synapomorphies (shared, derived characters) plotted on the branches
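The nested-parentheses representation in panel (b) can be illustrated with a short Python sketch. This is only an illustration: the species names and the nested-tuple encoding of the tree are hypothetical and are not taken from the slide's figure.

```python
def to_parentheses(tree):
    """Render a tree, given as nested tuples of leaf names, as nested parentheses."""
    if isinstance(tree, str):
        # A leaf is written as its own name.
        return tree
    # An internal node is its children, comma-separated, wrapped in parentheses.
    return "(" + ",".join(to_parentheses(child) for child in tree) + ")"

# Hypothetical four-species tree: A and B are sisters, then C, then D.
example = ((("A", "B"), "C"), "D")
print(to_parentheses(example))  # (((A,B),C),D)
```

Each pair of parentheses encloses one clade, so the innermost pair holds the two most closely related species.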
-
Using Phylogeny to Understand Gene Duplication and Loss
A. A gene tree
B. The gene tree superimposed on a species tree, allowing identification of the duplication and loss events
-
Problems with Phylogenetic Inference
1. How do we know what the potential candidate trees are?
2. How do we choose which tree is (most likely) the true tree?
-
Number of Possible Trees
Number of taxa or genes    Number of possible rooted trees
3                          3
4                          15
5                          105
7                          10,395
[Figure: three trees on taxa A, B, and C, with tip orders B A C, A B C, and C B A]
-
Recipe for reconstructing a phylogeny
1. Select an optimality criterion
2. Select a search strategy
3. Use the selected search strategy to generate a series of trees, and apply the selected optimality criterion to each tree, always keeping track of the "best" tree examined thus far
-
Search strategy: Which is the right tree?
When $m$ is the number of taxa, the number of possible rooted trees is:
$\frac{(2m-3)!}{2^{m-2}(m-2)!}$
For 10 taxa, the number of trees is 34,459,425
Many trees can be discarded because they are obviously wrong
Sometimes, there is a general or even specific grouping that can serve as a start for the tree search
There are a number of approaches to tree searches that can be used
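The growth in the number of candidate trees can be checked directly. The sketch below assumes the count meant here is the number of rooted, bifurcating trees, (2m-3)! / [2^(m-2) (m-2)!]; the function name is made up for illustration.

```python
from math import factorial

def rooted_tree_count(m):
    """Number of rooted, bifurcating trees for m taxa: (2m-3)! / (2^(m-2) * (m-2)!)."""
    if m < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * m - 3) // (2 ** (m - 2) * factorial(m - 2))

for m in (3, 4, 5, 7, 10):
    print(m, rooted_tree_count(m))
# 3 3
# 4 15
# 5 105
# 7 10395
# 10 34459425
```

Even at 10 taxa an exhaustive search has to score tens of millions of trees, which is why the choice of search strategy matters.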
-
Search Strategies
Strategy                    Type
Stepwise addition           Algorithmic
Star decomposition          Algorithmic
Exhaustive                  Exact
Branch & bound              Exact
Branch swapping             Heuristic
Genetic algorithm           Heuristic
Markov Chain Monte Carlo    Heuristic
But, we still need to evaluate the trees in order to identify the one most likely to be the true tree
-
Choose an optimality criterion to evaluate trees
Commonalities can be found, but how can these be used to evaluate a tree?
-
General differences between optimality criteria
Minimum Evolution | Maximum Parsimony | Maximum Likelihood
Model based | "Model free" | Model based
Can account for many types of sequence substitutions | Assumes that all substitutions are equal | Can account for many types of sequence substitutions
Works well with strong or weak sequence similarity | Works only when sequence similarity is high | Works well with strong or weak sequence similarity
Computationally fast | Computationally fast | Computationally slow
Well understood statistical properties (easy to test) | Poorly understood statistical properties (hard to test) | Well understood statistical properties (easy to test)
Can accurately estimate branch lengths (important for molecular clocks) | Cannot estimate branch lengths accurately | Can estimate branch lengths with some degree of accuracy
-
Maximum Parsimony
1. The parsimony score is the minimum number of required changes, or steps
2. Only shared, derived characters are used
3. The score for each character (site) is called the character score
4. The site lengths, added over all sites, give the tree length
5. The tree (out of all examined trees) with the lowest tree length is the most parsimonious tree, and the one most likely to be the true tree
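One way to compute a character score and a tree length is Fitch's algorithm, sketched below. The slides do not name a specific algorithm, so treat this as one possible implementation. The taxon names, sequences, and nested-tuple trees are invented for the example. The sketch counts every site, but sites that are not parsimony-informative add the same number of steps to every tree and so do not change the ranking.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, steps) for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping leaf name -> observed state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_score(tree[0], states)
    right_set, right_steps = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed here.
        return common, left_steps + right_steps
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, alignment):
    """Sum the character scores over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_score(tree, site)[1]
    return total

# Hypothetical four-taxon alignment (names and sequences are made up).
alignment = {"W": "AAGT", "X": "AAGC", "Y": "GTGC", "Z": "GTAC"}
tree1 = (("W", "X"), ("Y", "Z"))
tree2 = (("W", "Y"), ("X", "Z"))
print(tree_length(tree1, alignment), tree_length(tree2, alignment))  # 4 6
```

Here `tree1` needs fewer steps than `tree2`, so of these two candidates it is the more parsimonious.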
-
Example: Maximum Parsimony
[Figure: two candidate trees for taxa F, G, H, and X, with character-state changes (numbers of toes, teeth, ribs, and vertebrae; lobe shape; leg length) mapped onto the branches and a tree length, in steps, given for each tree]
-
Simple example of parsimony with sequence data
-
Another example with nucleotide data
(a) Alignment of four hypothetical DNA sequences
(b) Most parsimonious rooted cladogram for this alignment
(c) Corresponding unrooted cladogram
-
Issues & problems with parsimony
Multiple trees may be the most parsimonious (have the same tree length)
  A consensus tree can be constructed to visualize the congruity & discontinuity between these
Branch lengths (and, therefore, rates of change) cannot be accurately estimated
No explicit model of change is used, even when one might be well supported
The most parsimonious tree(s) may not be the true tree
-
Minimum Evolution (Distance)
1. All data are used, even though some may not be shared, derived characters
2. The branch lengths represent distance between a taxon and an ancestor, given an assumed model of evolution
3. The pairwise distances are calculated for each pair of taxa, given an assumed model of evolution
4. The tree length is the sum of branch lengths across a tree
5. The tree (out of all examined trees) with the lowest tree length is the minimum evolution tree, and the one most likely to be the true tree
-
The tree is different than a parsimony tree
(a) Hypothetical evolutionary relationships between three DNA sequences, in which the horizontal branch lengths are proportional to the number of character state changes along the branches
(b) Topology of the parsimonious cladogram that would be constructed from the sequence similarities produced by such an evolutionary history if multiple substitutions had occurred at several sites
-
Models of evolution: choosing parameters
Factors that Affect Phylogenetic Inference
1. Relative base frequencies (A, G, T, C)
2. Transition/transversion ratio
3. Number of substitutions per site
4. Number of nucleotides (or amino acids) in sequence
5. Different rates in different parts of the molecule
6. Synonymous/nonsynonymous substitution ratio
7. Substitutions that are uninformative or obfuscatory
   1. Parallel substitutions
   2. Convergent substitutions
   3. Back substitutions
   4. Coincidental substitutions
In general, the more factors that are accounted for by the model (i.e., more parameters), the larger the error of estimation. It is often best to use fewer parameters by choosing the simpler model.
-
Some distance models: p-distance
$p = n_d / n$, where $n$ is the number of sites (nucleotides or amino acids), and $n_d$ is the number of differences between the two sequences examined
Very robust when divergence times are recent and the effect of complicating phenomena is minor
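The p-distance is simple enough to compute by hand, but a short Python sketch makes the definition concrete. The two sequences are invented, and the function assumes they are already aligned, equal in length, and free of gaps.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites, p = n_d / n, for two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n = len(seq1)
    n_d = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return n_d / n

# Two hypothetical aligned sequences that differ at 2 of 10 sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```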
-
Some distance models: Jukes-Cantor
Used to estimate the number of substitutions per site
The expected number of substitutions per site is:
$d = 3\alpha t = -\frac{3}{4}\ln\left[1 - \frac{4}{3}p\right]$,
where $p$ is the proportion of difference between 2 sequences
Variance can be calculated
No assumptions are made about nucleotide frequencies, or differential substitution rates
[Figure: substitution matrix with rows and columns A, T, C, G]
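A minimal sketch of the Jukes-Cantor correction, d = -(3/4) ln[1 - (4/3)p], is given below. The function name and the input check are illustrative assumptions. The check reflects that the logarithm is only defined for p below 0.75.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p) for a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

# The corrected distance always exceeds the raw proportion of differences,
# because it also counts substitutions hidden by multiple hits at one site.
for p in (0.05, 0.2, 0.5):
    print(p, round(jukes_cantor_distance(p), 4))
```

The gap between p and d widens as p grows, which is why the uncorrected p-distance is mainly suited to recently diverged sequences.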
-
Some distance models: Kimura two-parameter
Used to estimate the number of substitutions per site
$d = 2rt$, where $r$ is the substitution rate (per site, per year) and $t$ is the generation time
Accounts for different transition and transversion rates
No assumptions are made about nucleotide frequencies; variance is greater than Jukes-Cantor
[Figure: substitution diagram linking the pyrimidines (C, T) and the purines (A, G); α = transition rate, β = transversion rate]
These are treated the same for long divergence times.
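The slide does not show the estimator itself. The sketch below uses the standard Kimura two-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The sequences are invented, and the code assumes aligned, gap-free, upper-case DNA.

```python
import math

# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n = len(seq1)
    ts = sum(1 for pair in zip(seq1, seq2) if pair in TRANSITIONS)
    tv = sum(1 for a, b in zip(seq1, seq2) if a != b and (a, b) not in TRANSITIONS)
    P, Q = ts / n, tv / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

# Hypothetical pair with one transition (A->G) and one transversion (C->A) in 10 sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAA"), 4))
```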
-
Other models
Hasegawa, Kishino, Yano (HKY): corrects for unequal nucleotide frequencies and takes transition/transversion bias into account
Unrestricted model: allows different rates between all pairs of nucleotides
General Time Reversible model: allows different rates between all pairs of nucleotides and corrects for unequal nucleotide frequencies
Many other models have been invented to correct for specific problems
The more parameters are introduced, the larger the variance becomes
-
Ways to build trees with distance models: ME
Minimum Evolution (ME) trees can be found by exhaustive searches or heuristic searches (starting with a reasonable tree or eliminating unlikely possible trees)
For each tree examined, the total tree length is calculated as the sum of branch lengths calculated using a given model
ME, like Maximum Parsimony, may generate a number of equal-scoring ME trees and may not actually result in the true tree
Many other models have been invented to correct for specific problems
-
Ways to build trees with distance models: UPGMA
UPGMA (unweighted pair-group method using arithmetic averages)
Generally accurate for molecular evolution when substitution rates are relatively constant, but this can rarely be assumed to be true
Method:
  Distances for each pair of taxa are computed using the chosen distance method
  The pair with the smallest value d are combined into a single, composite taxon
  The distances from this composite taxon to all other taxa are computed
  The next pair with the smallest d is chosen (including consideration of pairings with the composite taxon)
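The UPGMA steps can be sketched in a few lines of Python. This version returns only the branching order as nested tuples and omits branch heights. The taxon names, the distances, and the dictionary-of-frozensets encoding of the distance matrix are assumptions made for the example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names -- list of taxon names
    dist  -- dict mapping frozenset({a, b}) -> distance between taxa a and b
    """
    clusters = {name: 1 for name in names}  # cluster -> number of taxa it contains
    d = dict(dist)
    while len(clusters) > 1:
        # Choose the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the composite taxon to each remaining cluster is the
        # average over all member taxa, i.e. a size-weighted mean.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

# Hypothetical distances consistent with a constant substitution rate.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # ('D', ('C', ('A', 'B')))
```

A and B are joined first, then C is added to that composite taxon, and D joins last.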
-
Ways to build trees with distance models: Neighbor Joining
Neighbor Joining (NJ) is a very robust method that is accurate even when substitution rates are not constant, and generally recovers the ME tree (although this is not always the case)
Method:
  We construct a "star" tree and compute the sum of all branches, $S_O$ (this will be greater than the sum of all branches for the final tree, $S_F$)
  We then pick a pair of taxa to be "neighbors" (say, taxa 1 & 2) and compute the sum of all branches, $S_{1,2}$
  All other pairs of taxa are then placed as neighbors and the sum of all branches computed
  The neighbors whose pairing results in the greatest reduction in the sum of all branches will be kept
  Then, another round of neighbor joining is conducted, including using the neighbor pair retained in the first round
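The sketch below implements neighbor joining for topology only. Instead of computing each sum of branch lengths explicitly, it picks neighbors with the commonly used Q criterion, Q(i,j) = (n-2) d(i,j) - r(i) - r(j), where r is a row sum of the distance matrix. Minimising Q should select the same pair as minimising the sum of branch lengths, but that substitution is a choice made here, not something stated on the slide. Names and distances are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Topology-only neighbor joining.

    names -- list of taxon names
    dist  -- dict mapping frozenset({a, b}) -> distance
    Returns a tuple of the last three nodes (an unrooted tree), each a name or nested tuple.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}

        def q(pair):
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - r[a] - r[b]

        # Keep the pair of neighbors that most reduces the total branch length.
        a, b = min(combinations(nodes, 2), key=q)
        u = (a, b)
        nodes = [x for x in nodes if x not in (a, b)]
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            d[frozenset((u, k))] = 0.5 * (
                d[frozenset((a, k))] + d[frozenset((b, k))] - d[frozenset((a, b))]
            )
        nodes.append(u)
    return tuple(nodes)

# Hypothetical distances from a tree ((A,B),(C,D)) in which B and D evolve fast.
pairs = {("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
         ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6}
dist = {frozenset(k): v for k, v in pairs.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))  # ('C', 'D', ('A', 'B'))
```

In this example the smallest raw distance is between A and C, yet the Q criterion still groups A with B, which illustrates why the method tolerates unequal substitution rates.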
-
Example: The evolution of flight in stoneflies
[Figure: stonefly phylogeny; scale in substitutions/site; defined outgroup taxa marked]
Reconstruction of the Plecoptera order (stoneflies) from 18S rRNA sequence
Kimura 2-parameter distance used
Tree rooted with known outgroup species
Neighbor-Joining tree-building method used to construct first tree; tree search was conducted to ensure that the NJ was also the ME tree
Characters related to flight were then mapped onto the tree
-
Maximum Likelihood
1. The site likelihoods represent probability of data for one site given an assumed model of evolution
2. Overall likelihood is the product of the site likelihoods
3. Trees are evaluated by comparing log-likelihood scores
4. Likelihood scores are comparable across models as well as trees, so it provides a way of testing the goodness of fit of a model
5. The tree (out of all examined trees) with the highest likelihood score is the maximum likelihood tree, and the one most likely to be the true tree
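The relationship between site likelihoods and the overall log-likelihood can be shown with the smallest possible case: two sequences joined by a single branch, scored under the Jukes-Cantor model. This is a deliberately reduced sketch, not a tree search. The sequences and the candidate branch lengths are invented, and real programs compare whole trees in the same way.

```python
import math

def site_likelihood(a, b, d):
    """Probability of seeing bases a and b at one site under Jukes-Cantor at distance d."""
    e = math.exp(-4.0 * d / 3.0)
    # Probability that the base is unchanged, or has become one specific other base.
    p_pair = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    # 0.25 is the assumed equal frequency of the starting base.
    return 0.25 * p_pair

def log_likelihood(seq1, seq2, d):
    """Sum of log site likelihoods, i.e. the log of the product of site likelihoods."""
    return sum(math.log(site_likelihood(a, b, d)) for a, b in zip(seq1, seq2))

seq1, seq2 = "ACGTACGTAC", "ACGTTCGTAA"
candidates = [0.05, 0.1, 0.2326, 0.5, 1.0]
for d in candidates:
    print(d, round(log_likelihood(seq1, seq2, d), 3))
best = max(candidates, key=lambda d: log_likelihood(seq1, seq2, d))
print("best candidate:", best)  # best candidate: 0.2326
```

The best-scoring candidate has the highest (least negative) log-likelihood, and for this pair it coincides with the Jukes-Cantor distance for p = 0.2.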
-
Next Tuesday:
1. Examples of phylogenetic reconstructions
2. Uses of phylogenetic trees
3. Other research using molecular evolution
Next Thursday: exam
All material through next Tuesday (10/8) will be covered by the exam