54406741 molecular evolution

Upload: nahrul-ney

Post on 23-Feb-2018

  • 7/24/2019 54406741 Molecular Evolution

    1/32

    Introduction to Molecular Evolution

    Mike Thomas

    October 3, 2002

    2/32

    What we can learn from multiple sequence alignments

    An alignment is a hypothesis about the relatedness of a set of genes

    This information can be used to reconstruct the evolutionary history of those genes

    The history of the genes can provide us with information about the structure, function, and significance of a gene or family of genes

    We can also use the reconstructed history to test hypotheses about evolution itself:

        Rates of change

        The degree of change

        Implications of change, etc.

    We can then pose and test hypotheses about the evolution of phenomena unrelated to the genes:

        Evolution of flight in insects

        Evolution of humans

        Evolution of disease

    3/32

    Assumptions made by phylogenetic methods:

    The sequences are correct

    The sequences are homologous

    Each position is homologous

    The sampling of taxa or genes is sufficient to resolve the problem of interest

    Sequence variation is representative of the broader group of interest

    Sequence variation contains sufficient phylogenetic signal (as opposed to noise) to resolve the problem of interest

    Each position in the sequence evolved independently

    4/32

    How do you extract this information from an alignment?

    5/32

    Haeckel's Tree of Life

    "Higher" organisms

    "Lower" organisms

    Answer: a tree

    A phylogenetic tree is a hierarchical, graphical representation of relationships

    6/32

    Other Ways to Represent Phylogenies

    (a) Cladogram showing the phylogenetic relationships between four species

    (b) Relationships of the same four species represented as a set of nested parentheses

    (c) Evolutionary relationships of the same four species with nine synapomorphies (shared, derived characters) plotted on the branches
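The nested-parentheses representation is commonly written in what is known as Newick notation. The sketch below is only an illustration: the four taxon names A to D and the nested-tuple data structure are assumptions, since the species shown on the slide are not recoverable.

```python
def to_parentheses(tree):
    """Render a nested-tuple tree as a nested-parentheses string."""
    if isinstance(tree, str):
        # A leaf is just a taxon name.
        return tree
    # An internal node is a tuple of subtrees.
    return "(" + ",".join(to_parentheses(child) for child in tree) + ")"


# Hypothetical four-species tree: A and B are sisters, then C, then D.
example_tree = ((("A", "B"), "C"), "D")
newick = to_parentheses(example_tree) + ";"
print(newick)
```

Each pair of parentheses encloses one clade, so the depth of nesting mirrors the branching order of the cladogram.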

    7/32

    Using Phylogeny to Understand Gene Duplication and Loss

    A. A gene tree

    B. The gene tree superimposed on a species tree, allowing identification of the duplication and loss events

    8/32

    Problems with Phylogenetic Inference

    1. How do we know what the potential candidate trees are?

    2. How do we choose which tree is (most likely) the true tree?

    9/32

    Number of Possible Trees

    Number of taxa or genes    Number of possible rooted trees
    3                          3
    4                          15
    5                          105
    7                          10,395

    [Figure: the three possible rooted trees for taxa A, B, and C]

    10/32

    Recipe for reconstructing a phylogeny

    1. Select an optimality criterion

    2. Select a search strategy

    3. Use the selected search strategy to generate a series of trees, and apply the selected optimality criterion to each tree, always keeping track of the "best" tree examined thus far
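The recipe can be sketched as a simple loop. This is a minimal illustration, assuming that lower scores are better (as with tree length); the function name `best_tree` and the candidate trees with their scores are made up for the example.

```python
def best_tree(candidate_trees, score):
    """Apply `score` to each candidate and keep the lowest-scoring one seen so far."""
    best, best_score = None, float("inf")
    for tree in candidate_trees:      # the search strategy supplies the candidates
        s = score(tree)               # the optimality criterion scores each one
        if s < best_score:            # keep track of the best tree so far
            best, best_score = tree, s
    return best, best_score


# Hypothetical candidates with made-up tree lengths, for illustration only.
made_up_lengths = {"((A,B),C)": 7, "((A,C),B)": 9, "((B,C),A)": 8}
tree, length = best_tree(made_up_lengths, made_up_lengths.get)
print(tree, length)
```

For a criterion where higher is better, such as likelihood, the same loop works if the score is negated.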

    11/32

    Search strategy: Which is the right tree?

    When $m$ is the number of taxa, the number of possible rooted trees is:

    $$\frac{(2m-3)!}{2^{m-2}\,(m-2)!}$$

    For 10 taxa, the number of trees is 34,459,425

    Many trees can be discarded because they are obviously wrong

    Sometimes, there is a general or even specific grouping that can serve as a start for the tree search

    There are a number of approaches to tree searches that can be used
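The tree-count formula is easy to check numerically. A minimal sketch, assuming the count refers to rooted, bifurcating trees; the function name `rooted_tree_count` is made up for this example.

```python
from math import factorial


def rooted_tree_count(m):
    """Number of rooted bifurcating trees for m taxa: (2m-3)! / (2^(m-2) (m-2)!)."""
    if m < 2:
        raise ValueError("need at least two taxa")
    # The division is always exact, so integer division is safe here.
    return factorial(2 * m - 3) // (2 ** (m - 2) * factorial(m - 2))


for m in (3, 4, 5, 7, 10):
    print(m, rooted_tree_count(m))
```

The rapid growth of this number is the reason exhaustive searches are only practical for small data sets.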

    12/32

    Search Strategies

    Strategy                    Type
    Stepwise addition           Algorithmic
    Star decomposition          Algorithmic
    Exhaustive                  Exact
    Branch & bound              Exact
    Branch swapping             Heuristic
    Genetic algorithm           Heuristic
    Markov Chain Monte Carlo    Heuristic

    But we still need to evaluate the trees in order to identify the one most likely to be the true tree

    13/32

    Choose an optimality criterion to evaluate trees

    Commonalities can be found, but how can these be used to evaluate a tree?

    14/32

    General differences between optimality criteria

    Minimum Evolution

        Model based

        Can account for many types of sequence substitutions

        Works well with strong or weak sequence similarity

        Computationally fast

        Well understood statistical properties (easy to test)

        Can accurately estimate branch lengths (important for molecular clocks)

    Maximum Parsimony

        "Model free"

        Assumes that all substitutions are equal

        Works only when sequence similarity is high

        Computationally fast

        Poorly understood statistical properties (hard to test)

        Cannot estimate branch lengths accurately

    Maximum Likelihood

        Model based

        Can account for many types of sequence substitutions

        Works well with strong or weak sequence similarity

        Computationally slow

        Well understood statistical properties (easy to test)

        Can estimate branch lengths with some degree of accuracy

    15/32

    Maximum Parsimony

    1. The parsimony score is the minimum number of required changes, or steps

    2. Only shared, derived characters are used

    3. The score for each character (site) is called the character score

    4. The character scores added over all sites give the tree length

    5. The tree (out of all examined trees) with the lowest tree length is the most parsimonious tree, and most likely to be the true tree
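One common way to compute the minimum number of changes per site is Fitch's algorithm. The slides do not name an algorithm, so treating Fitch's method as the scoring step is an assumption here, as are the four made-up sequences and the nested-tuple tree format.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one site on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> observed state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_steps + right_steps
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Sum the character scores over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical alignment of four short sequences, for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "CTGC", "D": "CTGT"}
candidates = [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]
for candidate in candidates:
    print(candidate, tree_length(candidate, alignment))
```

The candidate with the smallest printed tree length is the most parsimonious of the three topologies examined.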

    16/32

    Example: Maximum Parsimony

    [Figure: two alternative trees for the same four taxa, with morphological character changes (toes, teeth, ribs, vertebrae, lobe shape, leg length) mapped onto the branches; each tree is labeled with its tree length in steps]

    17/32

    Simple example of parsimony with sequence data

    18/32

    Another example with nucleotide data

    (a) Alignment of four hypothetical DNA sequences

    (b) Most parsimonious rooted cladogram for this alignment

    (c) Corresponding unrooted cladogram

    19/32

    Issues and problems with parsimony

    Multiple trees may be the most parsimonious (have the same tree length)

        A consensus tree can be constructed to visualize the congruity and discontinuity between these

    Branch lengths (and, therefore, rates of change) cannot be accurately estimated

    No explicit model of change is used, even when one might be well supported

    The most parsimonious tree(s) may not be the true tree

    20/32

    Minimum Evolution (Distance)

    1. All data are used, even though some may not be shared, derived characters

    2. The branch lengths represent distance between a taxon and an ancestor, given an assumed model of evolution

    3. The pairwise distances are calculated for each pair of taxa, given an assumed model of evolution

    4. The tree length is the sum of branch lengths across a tree

    5. The tree (out of all examined trees) with the lowest tree length is the minimum evolution tree, and most likely to be the true tree

    21/32

    The tree is different than a parsimony tree

    (a) Hypothetical evolutionary relationships between three DNA sequences, in which the horizontal branch lengths are proportional to the number of character state changes along the branches

    (b) Topology of the parsimonious cladogram that would be constructed from the sequence similarities produced by such an evolutionary history if multiple substitutions had occurred at several sites

    22/32

    Models of evolution: choosing parameters

    Factors that Affect Phylogenetic Inference

    1. Relative base frequencies (A, G, T, C)

    2. Transition/transversion ratio

    3. Number of substitutions per site

    4. Number of nucleotides (or amino acids) in sequence

    5. Different rates in different parts of the molecule

    6. Synonymous/nonsynonymous substitution ratio

    7. Substitutions that are uninformative or obfuscatory

        1. Parallel substitutions

        2. Convergent substitutions

        3. Back substitutions

        4. Coincidental substitutions

    In general, the more factors that are accounted for by the model (i.e., more parameters), the larger the error of estimation. It is often best to use fewer parameters by choosing the simpler model.

    23/32

    Some distance models: p-distance

    $p = n_d / n$, where $n$ is the number of sites (nucleotides or amino acids), and $n_d$ is the number of differences between the two sequences examined

    Very robust when divergence times are recent and the effect of complicating phenomena is minor
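The p-distance is simply the fraction of aligned sites that differ. A minimal sketch, assuming two gap-free sequences of equal length; the function name and the example sequences are made up.

```python
def p_distance(seq1, seq2):
    """Proportion of sites at which two aligned sequences differ (n_d / n)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq1)
    n_d = sum(a != b for a, b in zip(seq1, seq2))
    return n_d / n


# Hypothetical sequences differing at 2 of 10 sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

Because multiple substitutions at the same site are counted at most once, the p-distance underestimates the true amount of change as divergence grows.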

    24/32

    Some distance models: Jukes-Cantor

    Used to estimate the number of substitutions per site

    The expected number of substitutions per site is:

    $$d = 3\alpha t = -\frac{3}{4}\ln\left[1 - \frac{4}{3}p\right],$$

    where $p$ is the proportion of sites that differ between the two sequences

    Variance can be calculated

    No allowance is made for unequal nucleotide frequencies or for differential substitution rates

    [Figure: substitution rate matrix among A, T, C, and G, with the same rate $\alpha$ for every type of change]
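The Jukes-Cantor correction can be applied directly to an observed proportion of differences. A minimal sketch; the function name is made up, and the guard reflects the fact that the logarithm is undefined once $p$ reaches 3/4.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p) for an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for p in (0.05, 0.2, 0.5):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as $p$, and the gap widens as the sequences diverge, which is the effect of accounting for unseen multiple substitutions.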

    25/32

    Some distance models: Kimura two-parameter

    Used to estimate the number of substitutions per site

    $d = 2rt$, where $r$ is the substitution rate (per site, per year) and $t$ is the time since the two sequences diverged

    Accounts for different transition and transversion rates

    No allowance is made for unequal nucleotide frequencies; variance is greater than for Jukes-Cantor

    [Figure: substitutions among the pyrimidines (C, T) and the purines (A, G); $\alpha$ = transition rate, $\beta$ = transversion rate]

    Transitions and transversions are treated the same for long divergence times.
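In practice the Kimura two-parameter distance is estimated from the observed proportions of transitions ($P$) and transversions ($Q$) using the standard formula $d = -\frac{1}{2}\ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right]$. This formula is not given on the slide; it is supplied here as the usual estimator, and the function name and example sequences are made up. The sketch assumes gap-free DNA sequences containing only A, C, G, and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    n = len(seq1)
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


# Hypothetical pair with one transition (A/G) and one transversion (A/C) in 10 sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```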

    26/32

    Other models

    Hasegawa, Kishino, Yano (HKY): corrects for unequal nucleotide frequencies and takes transition/transversion bias into account

    Unrestricted model: allows different rates between all pairs of nucleotides

    General Time Reversible model: allows different rates between all pairs of nucleotides and corrects for unequal nucleotide frequencies

    Many other models have been invented to correct for specific problems

    The more parameters are introduced, the larger the variance becomes

    27/32

    Ways to build trees with distance models: ME

    Minimum Evolution (ME) trees can be found by exhaustive searches or heuristic searches (starting with a reasonable tree or eliminating unlikely possible trees)

    For each tree examined, the total tree length is calculated as the sum of branch lengths calculated using a given model

    ME, like Maximum Parsimony, may generate a number of equal-scoring ME trees and may not actually result in the true tree

    28/32

    Ways to build trees with distance models: UPGMA

    UPGMA (unweighted pair-group method using arithmetic averages)

    Generally accurate for molecular evolution when substitution rates are relatively constant, but this can rarely be assumed to be true

    Method:

        Distances for each pair of taxa are computed using the chosen distance method

        The pair with the smallest value d is combined into a single, composite taxon

        The distances from this composite taxon to all other taxa are computed

        The next pair with the smallest d is chosen (including consideration of pairings with the composite taxon)
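The UPGMA steps can be sketched in a few lines. This is an illustration only: the distance matrix is made up, the tree is returned as nested tuples without branch lengths, and the dictionary keyed by `frozenset` pairs is just one convenient way to store pairwise distances.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names : list of taxon names
    dist  : dict mapping frozenset({a, b}) -> distance between taxa a and b
    """
    clusters = {name: 1 for name in names}   # cluster -> number of taxa it contains
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the composite taxon to every other cluster: the
        # size-weighted mean, which equals the average over all member pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances among four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
upgma_tree = upgma(names, dist)
print(upgma_tree)
```

In this made-up example A and B are joined first, then C joins the composite taxon, and D joins last.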

    29/32

    Ways to build trees with distance models: Neighbor Joining

    Neighbor Joining (NJ) is a very robust method that is accurate even when substitution rates are not constant, and generally recovers the ME tree (although this is not always the case)

    Method:

        We construct a "star" tree and compute the sum of all branches, $S_O$ (this will be greater than the sum of all branches for the final tree, $S_F$)

        We then pick a pair of taxa to be "neighbors" (say, taxa 1 & 2) and compute the sum of all branches, $S_{1,2}$

        All other pairs of taxa are then placed as neighbors and the sum of all branches computed

        The neighbors whose pairing results in the greatest reduction in the sum of all branches will be kept

        Then, another round of neighbor joining is conducted, including using the neighbor pair retained in the first round
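The pair-selection step of one neighbor-joining round can be sketched with the widely used criterion $Q(i,j) = (n-2)\,d_{ij} - \sum_k d_{ik} - \sum_k d_{jk}$, choosing the pair with the smallest $Q$. Using this criterion in place of explicitly computing each branch-length sum is an assumption of this sketch, not something stated on the slide. The five-taxon distance matrix and the function name are made up, and only the first round's choice of neighbors is shown.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair of taxa that neighbor joining would join first."""
    n = len(names)
    # Sum of distances from each taxon to all the others.
    total = {
        i: sum(dist[frozenset((i, k))] for k in names if k != i)
        for i in names
    }

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - total[i] - total[j]

    return min(combinations(names, 2), key=q)


# Hypothetical distances among five taxa.
taxa = ["a", "b", "c", "d", "e"]
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
distances = {frozenset(pair): value for pair, value in raw.items()}
first_pair = nj_pick_pair(taxa, distances)
print(first_pair)
```

In this example d and e have the smallest raw distance, yet a and b are chosen, which illustrates how neighbor joining differs from simply joining the closest pair.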

    30/32

    Example: The evolution of flight in stoneflies

    [Figure: stonefly tree with a scale in substitutions/site and the defined outgroup taxa marked]

    Reconstruction of the Plecoptera order (stoneflies) from 18S rRNA sequence

    Kimura 2-parameter distance used

    Tree rooted with known outgroup species

    Neighbor-Joining tree-building method used to construct first tree; tree search was conducted to ensure that the NJ tree was also the ME tree

    Characters related to flight were then mapped onto the tree

    31/32

    Maximum Likelihood

    1. The site likelihoods represent the probability of the data for one site given an assumed model of evolution

    2. Overall likelihood is the product of the site likelihoods

    3. Trees are evaluated by comparing log-likelihood scores

    4. Likelihood scores are comparable across models as well as trees, so they provide a way of testing the goodness of fit of a model

    5. The tree (out of all examined trees) with the highest likelihood is the maximum likelihood tree, and most likely to be the true tree
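The relationship between site likelihoods and the overall log-likelihood can be illustrated with the simplest possible case: two sequences separated by a distance $d$ under the Jukes-Cantor model. This is a deliberately reduced sketch, not a full tree likelihood; the function name and sequences are made up, and $d$ must be greater than zero when the sequences differ.

```python
import math


def jc_log_likelihood(seq1, seq2, d):
    """Log-likelihood of distance d (substitutions/site) for two aligned sequences."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # probability of one specific matching pair
    p_diff = 0.25 * (0.25 - 0.25 * e)   # probability of one specific mismatched pair
    # The product of site likelihoods becomes a sum of logs.
    return sum(math.log(p_same if a == b else p_diff) for a, b in zip(seq1, seq2))


# Hypothetical sequences differing at 2 of 10 sites.
s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"
d_hat = -0.75 * math.log(1.0 - (4.0 / 3.0) * 0.2)   # Jukes-Cantor distance for p = 0.2
for d in (0.1, d_hat, 0.5):
    print(round(d, 4), round(jc_log_likelihood(s1, s2, d), 4))
```

The log-likelihood peaks at the Jukes-Cantor distance, so in this two-sequence case the distance correction and the maximum likelihood estimate coincide.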

    32/32

    Next Tuesday:

    1. Examples of phylogenetic reconstructions

    2. Uses of phylogenetic trees

    3. Other research using molecular evolution

    Next Thursday: exam

    All material through next Tuesday (10/8) will be covered by the exam