Post on 03-Jan-2016


Evolutionary Biology Concepts

Molecular Evolution

Phylogenetic Inference

BIO520 Bioinformatics Jim Lund

Reading: Ch7

Evolution

Evolution is a process that results in heritable changes in a population spread over many generations.

"In fact, evolution can be precisely defined as any change in the frequency of alleles within a gene pool from one generation to the next." - Helena Curtis and N. Sue Barnes, Biology, 5th ed. 1989 Worth Publishers, p.974

Levels of Evolution

• Changes in allele frequencies within a species.

• Speciation.

Molecular changes:

– Single bp changes.

– Genomic changes (alterations in large DNA segments).

Branching Descent

Populations Individuals

Phylogeny

Branching diagram showing the ancestral relations among species.

“Tree of Life”

History of evolutionary change

FRAMEWORK for INFERENCE

The framework for phylogenetics

• How do we describe phylogenies?

• How do we infer phylogenies?

Inheritance

DNA → RNA → Protein → Function

Ancestral Node or ROOT of the Tree

Internal Nodes or Divergence Points (represent hypothetical ancestors of the taxa)

Branches or Lineages

Terminal Nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

Common Phylogenetic Tree Terminology

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

[Figure: tree with terminal taxa, top to bottom: Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]

No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.

This dimension either can have no scale (for ‘cladograms’), or can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive’ trees).

These say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
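The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming a simplified form of the notation with taxon names, commas, and parentheses only (no branch lengths). The helper names `parse_newick`, `leaves`, and `clades` are hypothetical and used only for illustration.

```python
def parse_newick(text):
    """Parse names, commas, and parentheses into nested tuples of taxon names."""
    pos = 0

    def parse_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a taxon name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaves(node):
    """Return the terminal taxa below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def clades(node):
    """Return the sorted leaf set of every internal node, root first."""
    if isinstance(node, str):
        return []
    result = [sorted(leaves(node))]
    for child in node:
        result.extend(clades(child))
    return result


tree = parse_newick("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(clade)
```

Each printed list is one clade. The groups B,C and D,E and A,B,C appear as clades, matching the reading of the tree given above.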

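[Figure: the same four-taxon tree (Taxon A, B, C, D) drawn twice. In one drawing the branch lengths (labelled 1, 1, 6, 5) are proportional to genetic change; in the other the branch lengths have no meaning.]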

Two types of trees

Cladogram Phylogram or additive tree

Meaning of branch length differs.

All show the same evolutionary relationships, or branching orders, between the taxa.

Rooted vs Unrooted Trees

More Trees

[Figure: tree with terminal taxa A, B, C, D, E, F]

Trees-3

[Figure: tree with terminal taxa A, B, C, D, E, F]

Extinction

[Figure: tree with terminal taxa A, B, C, D, E, F]

Population Genetic Forces

• Natural Selection (fitness)

• Drift (homozygosity by chance)

– much greater in small populations

• Mutation/Recombination (variation)

• Migration

– homogenizes gene pools
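Why drift is much greater in small populations can be illustrated numerically. This sketch assumes the idealized Wright-Fisher model (not named in the slides), under which expected heterozygosity in a diploid population of size N decays by a factor of (1 - 1/(2N)) per generation. The function name and the example values are hypothetical.

```python
def expected_heterozygosity(h0, pop_size, generations):
    """Expected heterozygosity after drift alone in an idealized diploid population.

    Assumes the Wright-Fisher model: H_t = H_0 * (1 - 1/(2N))^t.
    """
    return h0 * (1.0 - 1.0 / (2.0 * pop_size)) ** generations


# Compare how much variation remains after 50 generations of drift.
for pop_size in (10, 100, 1000):
    remaining = expected_heterozygosity(0.5, pop_size, 50)
    print(f"N = {pop_size:4d}: expected heterozygosity = {remaining:.3f}")
```

The smaller the population, the faster heterozygosity is lost, which is the "homozygosity by chance" effect listed above.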

Hardy-Weinberg Paradigm

p + q = 1

p^2 + 2pq + q^2 = 1
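As a worked instance of the two equations, the sketch below computes the expected genotype frequencies for a single locus with two alleles. The function name `hardy_weinberg`, the genotype labels, and the allele frequency 0.7 are illustrative assumptions, not values from the slides.

```python
def hardy_weinberg(p):
    """Return expected genotype frequencies for allele frequencies p and q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.7)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2f}")
print(f"total: {sum(freqs.values()):.2f}")
```

The three frequencies always sum to 1, because p^2 + 2pq + q^2 = (p + q)^2.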

Modes of speciation

Speciation can occur in many ways. Among the most common are:

• Geographic isolation.

• Reproductive isolation.

– Sexual selection.

– Behavioral isolation.

DNA, protein sequence change

Multiple Changes/No Change

..CCU AUA GGG..

..CCC AUA GGG..

..CCC AUG GGG..

..CCC AUG GGC..

..CCU AUG GGC..

..CCU AUA GGC..

5 mutations

1 DNA change (net)

0 amino acid changes (net)

Enumerating bp/aa changes underestimates evolutionary change
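The counts for the sequence series above can be checked with a short script. This is a minimal sketch: the codon table covers only the six codons that appear in the example (standard genetic code), and the variable and function names are hypothetical.

```python
# Partial standard genetic code: only the codons used in the example.
CODON_TABLE = {
    "CCU": "Pro", "CCC": "Pro",
    "AUA": "Ile", "AUG": "Met",
    "GGG": "Gly", "GGC": "Gly",
}

# The six successive states of the sequence, oldest first.
history = [
    "CCUAUAGGG",
    "CCCAUAGGG",
    "CCCAUGGGG",
    "CCCAUGGGC",
    "CCUAUGGGC",
    "CCUAUAGGC",
]


def differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(rna):
    """Translate an RNA string codon by codon using the partial table."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3)]


# Mutations that actually happened, summed over each step of the history.
mutation_events = sum(differences(a, b) for a, b in zip(history, history[1:]))
# Differences visible when only the first and last sequences are compared.
net_base_changes = differences(history[0], history[-1])
net_aa_changes = differences(translate(history[0]), translate(history[-1]))

print(mutation_events, net_base_changes, net_aa_changes)  # prints: 5 1 0
```

Comparing only the first and last sequences recovers one of the five mutations, which is why simple counting underestimates the amount of change.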

Mechanisms of DNA Sequence Change

Neutral Drift vs Natural Selection

Traditional selection model

Neutral (Kimura/Jukes)

Pan-neutralism

Rate of change (evolution) of hemoglobin protein

Each point on the graph is for a pair of species, or groups of species. From Kimura (1983) by way of Evolution, Ridley, 3rd ed.

Mutation rate varies Gene-to-Gene

Protein        Rate (x 10^9 yr)

Lysozyme       2.0

Insulin        0.4

Histone H4     0.01

Rate varies Site-to-Site

Protein        Coding    Silent

Albumin        0.9       6.7

Histone H4     0.03      6.1

Average        0.9       4.6

Rate varies Site-to-Site

From Evolution, Mark Ridley, 3rd ed.

Constraints on “Silent” Changes

• Codon biases: translation rates

• Transcription elongation rates

– polymerase ‘pause’ sites

• “Silent” regulatory elements

– select for or against presence/absence

• Overall genome structure

DNA, Protein Similarity

• Similarity by common descent

– phylogenetic

• Similarity by convergence (rare)

– functional importance

• Similarity by chance

– random variation not limitless

– particular problem in wide divergence

Homology: similar by common descent

CCCAGG

CCCAAG

CCCAAA

CCTAAA
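One simple way to quantify similarity among the four sequences above is to count the positions at which each pair differs. This is only an illustrative sketch with hypothetical names. A raw difference count is not a phylogenetic method in itself and does not correct for multiple changes at the same site.

```python
from itertools import combinations

sequences = ["CCCAGG", "CCCAAG", "CCCAAA", "CCTAAA"]


def differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Pairwise difference counts, keyed by the pair of sequences compared.
pairwise = {(a, b): differences(a, b) for a, b in combinations(sequences, 2)}

for (a, b), count in pairwise.items():
    print(f"{a} vs {b}: {count}")
```

Sequences that differ at one position are candidates for being each other's closest relatives, which gives a first hint of the branching order.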

Inferring Trees and Ancestors

CCCAGG   CCCAAG->

CCCAAG   CCCAAA->

CCTAAA   CCTAAA->

CCTAAC

Not always straightforward. The data doesn’t always give a single, correct answer.

Homology, Orthology, Paralogy

Paralogy Trap

Improper Inference

Garbage in, garbage out!

Our Goals

• Infer Phylogeny

– Optimality criteria

– Algorithm

• Phylogenetic inference

– (interesting ones)

Watch Out

“The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.”

“…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”
