Post on 15-Nov-2018

  • Bioinformatics: Phylogenetic inference

    Why do we care?

    Rita Castilho, rcastil@ualg.pt

    What for?

    Forensics

    Predicting the evolution of the influenza virus

    Predicting the functions of uncharacterized genes

    Drug discovery

    Vaccine development

    Uses of phylogenies: Taxonomy

    Similar organisms are grouped together

    Clades share common evolutionary history

    Phylogenetic classification names clades

    Source: Inoue, J.G., Miya, M., Tsukamoto, K., Nishida, M. 2003. Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the ancient fish. Molecular Phylogenetics and Evolution, 26: 110-120.

    Pryer et al. 2001

  • Uses of phylogenies: Character evolution

    What did the ancestral Darwin's Finch eat?

    Example of correlated character evolution

    Figure legend: Granivore, Insectivore, Folivore. Panels: MP and two ML reconstructions.

    Schluter et al. 1997

    Uses of phylogenies: Ecology

    Study the evolution of ecological interaction and behavior

    Why might two related species have a different ecology?

    e.g. social vs. solitary, drought tolerant vs. mesophytic, parasitic vs. free living, etc.

    What are the causes of these differences? Is the environment causing these differences? Can we infer which condition is ancestral?

    Examples of phylogenetic ecology

    Evolutionary ecology of mate choice in swordtail fish (genus Xiphophorus)

    Morris et al. 2003

  • Uses of phylogenies: Co-evolution

    Compare divergence patterns in two groups of tightly linked organisms (e.g. hosts and parasites or plants and obligate pollinators)

    Look at how similar the two phylogenies are

    Look at host switching

    Evolutionary arms races

    Traits in one group track traits in another group

    e.g. toxin production and resistance in prey/predator or plant/herbivore systems, floral tube and proboscis length in pollination systems

    Example of host-parasite phylogeny

    Uses of phylogenies: Phylogenetic geography

    Sometimes called historical biogeography or phylogeography

    Map the phylogeny with geographical ranges of populations or species

    Understand geographic origin and spread of species

    Look at similarities between unrelated organisms

    Understand repeated patterns in distributions e.g. identifying glacial refugia

    Example of phylogeography

    Independent sites of pig domestication

    Larson et al. 2005

  • Uses of phylogenies: Estimating Divergence Times

    Estimate when a group of organisms originated

    Uses information about phylogeny and rates of evolutionary change to place timescales on tree

    Needs calibration with fossils

    Combined with mapping characters, correlate historical events with character evolution

    e.g. Radiation of flowering plants in the Cretaceous

    Example of timescales on phylogenies

    Timing the evolution of sociality in sweat bees to a warm period in geologic history

    Brady et al. 2006

    Uses of phylogenies: Medicine

    Learn about the origin of diseases

    Look for disease resistance mechanisms in other hosts to identify treatment and therapy in humans

    Multiple origins of HIV from SIV (Simian Immunodeficiency Virus)

    From: Understanding Evolution. HIV: the ultimate evolver. http://evolution.berkeley.edu/evolibrary/article/0_0_0/medicine_04

  • Severe acute respiratory syndrome

    Example of disease phylogeny

    Wendong et al. 2005

    Methicillin-resistant Staphylococcus aureus

    Figure legend: Asia, Europe, South America, Australasia, North America

    Example of disease phylogeny

    Harris et al. 2010

  • Example of medical forensics

    A dentist who was infected with HIV was suspected of infecting some of his patients in the course of treatment

    HIV evolves very quickly (10^-3 substitutions/year)

    Possible to trace the history of infections among individuals by conducting a phylogenetic analysis of HIV sequences

    Samples were taken from dentist, patients, and other infected individuals in the community

    Study found 5 patients had been infected by the dentist

    Source: Ou et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science, 256: 1165-1171.

    Example 2

  • Phylogeny and molecular evolution

    Rita Castilho, rcastil@ualg.pt

    Determining the common origin of organisms

    What are phylogenies for?

    Latimeria

    Protopterus

    Which living group is the closest relative of the tetrapods?

  • Three main assumptions

    Any group of organisms is related through a common ancestor.

    There is a pattern of divergence that is bifurcating. There are exceptions, such as lateral gene transfer.

    The characteristics of organisms change over time.

    Figure: orbit migration (unmigrated orbit, orbit eclipses dorsal midline, migrated orbit) in Citharus, Psettodes, Amphistium/Heteronectes and Trachinotus.

    How to build Phylogenetic Trees

    Select Sequences

    Align Sequences

    Choose model and method; Build tree

    Evaluate Tree

    Interpret Phylogeny

    Evaluation outcomes: Good (proceed to interpretation) or Needs Improvement (return to an earlier step)

  • Estimating Genetic Differences

    Figure: differences between sequences (y-axis, 0 to 1.5) plotted against time (x-axis, 0 to 75), showing the expected differences.

    Estimating Genetic Differences

    If all nucleotides are equally likely, the observed difference would plateau at 0.75.

    Simply counting differences underestimates distances: it fails to account for multiple hits.

    Figure: differences between sequences (y-axis, 0 to 1.5) plotted against time (x-axis, 0 to 75), comparing expected differences with observed differences.
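The plateau at 0.75 can be checked numerically. The sketch below assumes the Jukes-Cantor model (equal base frequencies, equal substitution rates), under which the expected observed proportion of differing sites is p = 3/4 (1 - exp(-4d/3)) for a true distance of d substitutions per site. The function name `expected_observed_difference` is illustrative and does not come from the slides.

```python
import math


def expected_observed_difference(d):
    """Expected proportion of differing sites under the (assumed) Jukes-Cantor
    model when the true distance is d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The observed difference lags further behind the true distance as d grows
# and approaches, but never exceeds, 0.75.
for d in (0.1, 0.5, 1.0, 1.5, 5.0):
    p = expected_observed_difference(d)
    print(f"true distance {d:4.1f} -> expected observed difference {p:.3f}")
```

For small d the two quantities are nearly equal, which is why raw counts are a reasonable distance only for closely related sequences.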

  • Figure: number of changes between two sequences. Seq 1: A G C G A G; Seq 2: G C G G A C

    One substitution happened: one substitution is visible

    Two substitutions happened: only one substitution is visible

    Two substitutions happened: no visible substitution
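The cases above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not in the slides: a random 1000-site ancestor, substitutions that hit sites uniformly at random, and every replacement base equally likely. The helper names `evolve` and `visible_differences` are hypothetical.

```python
import random

BASES = "ACGT"


def evolve(seq, n_substitutions, rng):
    """Apply n_substitutions random substitutions to seq.
    Each substitution changes the base at one site, but the same site
    may be hit more than once (a multiple hit)."""
    seq = list(seq)
    for _ in range(n_substitutions):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)


def visible_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
for n in (100, 500, 1000, 3000):
    descendant = evolve(ancestor, n, rng)
    seen = visible_differences(ancestor, descendant)
    print(f"{n} substitutions happened, {seen} are visible")
```

The visible count can never exceed the number of substitutions that happened, and the gap widens as more sites are hit repeatedly or revert to the ancestral base.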

  • Models of evolution

    Page RDM, Holmes EC (1998) Molecular Evolution: a phylogenetic approach. Blackwell Science, Oxford.

    Impact of models: 3 sequences

    http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Exercises/nj.html

    Sequence 1: AGC, sequence 2: AAC, sequence 3: ACC

    Sequences 1 and 2 differ at 1 out of 3 positions = 1/3

    Sequences 1 and 3 differ at 1 out of 3 positions = 1/3

    Sequences 2 and 3 differ at 1 out of 3 positions = 1/3

    Observed differences:

        1      2      3
    1   -
    2   0.333  -
    3   0.333  0.333  -
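The observed proportions above can be reproduced by counting mismatches directly. This is a minimal sketch; the function name `p_distance` and the dictionary layout are illustrative choices, not part of the original exercise.

```python
from itertools import combinations

# The three aligned sequences from the slide, keyed by sequence number.
sequences = {1: "AGC", 2: "AAC", 3: "ACC"}


def p_distance(a, b):
    """Proportion of aligned sites at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


for i, j in combinations(sequences, 2):
    d = p_distance(sequences[i], sequences[j])
    print(f"sequences {i} and {j}: {d:.3f}")
```

Every pair differs at exactly one of three sites, so all three printed values are 0.333.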

  • JC69 model (Jukes-Cantor, 1969)

    http://www.bioinf.manchester.ac.uk/resources/phase/manual

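    The Jukes-Cantor distance is

    $$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}P\right)$$

    where P is the proportion of nucleotides that are different in the two sequences (the observed differences above) and ln is the natural log function. To calculate the JC distances from the observed differences above, with P = 1/3 for every pair:

    $$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{1}{3}\right) \approx 0.441$$

    Sequences: AGC, AAC, ACC

    Observed differences:

        1      2      3
    1   -
    2   0.333  -
    3   0.333  0.333  -

    JC distances:

        1      2      3
    1   -
    2   0.441  -
    3   0.441  0.441  -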
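As a quick check of the arithmetic, the Jukes-Cantor correction can be evaluated in a few lines. This is a sketch only; the function name `jc69_distance` is an illustrative choice, and the guard reflects the fact that the logarithm is undefined once P reaches 0.75.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p) for an observed
    proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Each pair among AGC, AAC, ACC has p = 1/3.
print(round(jc69_distance(1 / 3), 3))  # 0.441
```

The corrected value (0.441) is larger than the raw proportion (0.333) because the model allows for substitutions hidden by multiple hits.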

    K80 model (Kimura, 1980), or Kimura 2P

    Kimura's two-parameter model (K2P) incorporates the observation that the rate of transitions per site (a) may differ from the rate of transversions (b), giving a total rate of substitutions per site of a + 2b (there are three possible substitutions: one transition and two transversions). The transition:transversion ratio a/b is often represented by the letter kappa (κ).

    In the K2P model the number of nucleotide substitutions per site is given by:

    $$d = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$$

    where P and Q are the proportional differences between the two sequences due to transitions and transversions, respectively.

    Sequences: AGC, AAC, ACC

    K80 model (Kimura, 1980), or Kimura 2P

    Sequences 1 and 2 (AGC, AAC) differ by one transition.

    Sequences 1 and 3 (AGC, ACC) differ by one transversion.

    Sequences 2 and 3 (AAC, ACC) differ by one transversion.

    K2P distances:

        1      2      3
    1   -
    2   0.549  -
    3   0.477  0.477  -
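The K2P distances can be recomputed from the sequences by classifying each mismatch as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. The sketch below assumes sequences contain only A, C, G and T; the names `is_transition` and `k80_distance` are illustrative. With three sites, a pair differing by one transition gets about 0.549, and any pair differing by one transversion gets about 0.477.

```python
import math

PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(x, y):
    """True if x -> y is a change within purines or within pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))


def k80_distance(a, b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(a)
    # P: proportion of sites differing by a transition
    # Q: proportion of sites differing by a transversion
    P = sum(is_transition(x, y) for x, y in zip(a, b)) / n
    Q = sum(x != y and not is_transition(x, y) for x, y in zip(a, b)) / n
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))


sequences = {1: "AGC", 2: "AAC", 3: "ACC"}
for i, j in ((1, 2), (1, 3), (2, 3)):
    print(f"sequences {i} and {j}: {k80_distance(sequences[i], sequences[j]):.3f}")
```

Unlike the Jukes-Cantor correction, which treats all three pairs identically, this model separates the transition pair from the two transversion pairs.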

  • Observed differences

        1      2      3
    1   -
    2   0.333  -
    3   0.333  0.333  -

    Jukes-Cantor model

        1      2      3
    1   -
    2   0.441  -
    3   0.441  0.441  -

    Kimura 2P

        1      2      3
    1   -
    2   0.549  -
    3   0.477  0.477  -

    Note how applying different models to the same sequences gives different distances.

    Estimating Genetic Differences