
Key Aspects of Molecular Evolution

1. Why study phylogenetics and molecular evolution
2. Key aspects: genetic code, mutations
3. Alignments and homology
4. Construction of phylogenetic trees
5. Substitution models

Why study phylogenetics and molecular evolution?

“Nothing in biology makes sense except in the light of evolution”
- Theodosius Dobzhansky, 1973 (The American Biology Teacher 35:125)

“Nothing in evolutionary biology makes sense except in the light of a phylogeny”
- Jeff Palmer, Douglas Soltis, Mark Chase, 2004 (American J. Botany 91: 1437-1445)

The evolutionary thinking

• Russel Wallace writes to Charles Darwin (June 17th 1858)

Ernst Haeckel (mid-19th Century): the tree of life

The neo-synthesis (Fisher, Haldane, and Wright, 1930-1950)

The molecular REvolution

• Nuttall, 1904: Serological cross-reactions to study phylogenetic relationships among various groups of animals.

• Watson and Crick's beautiful helix!

• Zuckerkandl and Pauling, 1965: molecular clocks.

• Fitch & Margoliash, 1967: Construction of phylogenetic trees. A method based on mutation distances as estimated from cytochrome c sequences is of general applicability (Science, 155:279-284).

• Kimura, 1968: Evolutionary rate at the molecular level (Nature, 217:624-626).

The birth of molecular evolution

Genetic code

In the RNA that encodes a protein, each triplet of bases is recognized by the ribosome as a code for a specific amino acid.

This genetic code is nearly universal across organisms, with only a few exceptions such as mitochondria.

There are 64 possible triplets: 61 sense codons (encoding 20 amino acids) and 3 nonsense codons (stop codons).

A reading frame that can encode a protein (open reading frame, ORF) starts with a codon for methionine and ends with a stop codon.
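The codon counts above can be checked with a minimal sketch that builds the standard genetic code table and translates a toy ORF. The DNA alphabet (T instead of U), the helper name `translate_orf`, and the example sequence are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids are listed with the first base varying
# slowest and the third base fastest, each in T, C, A, G order. '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")


def translate_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns the protein string, or None if no complete ORF is found.
    """
    start = dna.find("ATG")
    if start < 0:
        return None
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return None  # ran off the end without meeting a stop codon


print(len(CODON_TABLE), len(sense_codons), stop_codons)
print(translate_orf("CCATGGCTTAAGG"))  # hypothetical toy sequence
```

The toy sequence contains ATG, GCT, and TAA in frame, so the sketch reads a two-residue ORF (methionine, alanine) before the stop.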

Phylogenetic Inference. Key aspects: genetic code, mutations.

Point Mutations

• Errors in the replication of genetic information can result in the incorporation of a noncomplementary nucleotide: point mutations.

• Point mutations at the 1st, 2nd, and 3rd codon positions usually (96%), always (100%), and rarely (30%) result in an amino-acid change, respectively.

• Point mutations that do not result in an amino-acid change are called synonymous.

• Point mutations that result in an amino-acid change are called non-synonymous.
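The per-position percentages quoted above can be approximated by enumerating every single-base change in the standard genetic code. This is a rough sketch under stated assumptions: only the 61 sense codons are used as starting points, every base change is weighted equally, and a change to a stop codon is counted as non-synonymous. The function name `nonsynonymous_fraction` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nonsynonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1, or 2)
    that alter the encoded amino acid, starting from sense codons only."""
    changed = 0
    total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # assumption: stop codons are not used as starting points
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODON_TABLE[mutant] != amino_acid:
                changed += 1
    return changed / total


fractions = [nonsynonymous_fraction(p) for p in range(3)]
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.1%} non-synonymous")
```

Under these counting assumptions the fractions come out close to the 96%, 100%, and 30% on the slide; the exact figures shift slightly with how stop codons and unequal mutation rates are treated.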


Transitions and transversions

[Figure: diagram of substitutions among the four bases A, C, T, and G]

Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) mutations: Pu-Pu, Py-Py.

Transversions are purine-to-pyrimidine mutations or the reverse: Pu-Py or Py-Pu.

4 possible transitions: A↔G, C↔T (counting each direction separately).

8 possible transversions: A↔C, A↔T, G↔C, G↔T (counting each direction separately).

Thus, if mutations were random, transversions would be twice as likely as transitions.

Due to steric hindrance (as well as negative selection!), the opposite is true: transitions generally occur more often than transversions [2-15 times more, depending on the gene region and the species].
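The 4-versus-8 count above can be reproduced with a short sketch that classifies every ordered pair of distinct bases. The function name `mutation_type` is an illustrative choice.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(ref, alt):
    """Classify a point mutation from base `ref` to base `alt`."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


counts = {"transition": 0, "transversion": 0}
for ref, alt in permutations("ACGT", 2):  # all 12 ordered pairs of distinct bases
    counts[mutation_type(ref, alt)] += 1

print(counts)
```

Of the 12 directed changes, 4 are transitions and 8 are transversions, which is where the expected 1:2 ratio under random mutation comes from.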

Point mutations and the genetic code

Indel Mutations

• Errors in the replication of genetic information can also result in deletions or insertions of one or more nucleotides: indel mutations.

• When three (or a multiple of three) nucleotides are inserted or deleted in coding regions, the ORF remains intact, but one (or more) amino acids are inserted or deleted.

• In any other case, indel mutations disrupt the ORF and the resulting gene codes for an entirely different protein, with a different length than the original one.

• Viruses often encode several proteins from a single gene by using overlapping ORFs.
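The multiple-of-three rule for indels can be illustrated with a small sketch that splits a coding sequence into codons before and after a deletion. The toy sequence and the helper names `preserves_reading_frame` and `split_codons` are assumptions made for this example.

```python
def preserves_reading_frame(indel_length):
    """An indel keeps the downstream reading frame only if its length
    is a multiple of three."""
    return indel_length % 3 == 0


def split_codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "ATGGCTGAAACCTAA"                 # hypothetical ORF: ATG GCT GAA ACC TAA
in_frame = original[:3] + original[6:]       # delete 3 nucleotides (the GCT codon)
frameshift = original[:3] + original[4:]     # delete 1 nucleotide

print(split_codons(original))
print(split_codons(in_frame))    # one codon missing, the rest unchanged
print(split_codons(frameshift))  # every downstream codon changes
```

In the single-nucleotide deletion, all codons after the start are different and the original stop codon is no longer read in frame, which is why the resulting protein differs in both sequence and length.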


Genetic mechanisms exploited by viruses to generate variability

Mutation and hypermutation

Insertions and deletions

Recombination

Reassortment

Quasispecies dynamics

A complex population of genetically closely related variants

Low fidelity of the RNA-dependent RNA polymerase

[Figure: quasispecies in sequence space, showing concentration, the master sequence, and the mutant spectrum (Schuster P, 2008)]


Cooperation and complementation


Alignments

• There are three main methods of sequence alignment:

1) Manual

2) Automatic (Dynamic programming, Progressive alignment)

3) Combined

Phylogenetic Inference. Alignments and homology

Alignments

• An alignment is a hypothesis of positional homology between nucleotides or amino acids.

• Homologous sequences are usually aligned such that homologous sites form columns in the alignment.

Easy

Human beta       PEEKSAVTALWGKV
Horse beta       GEEKAAVLALWDKV
Human alpha      PADKTNVKAAWGKV
Horse alpha      AADKTNVKAAWSKV
Whale myoglobin  EGEWQLVLHVWAKV
Lamprey globin   AAEKTKIRSAWAPV
Lupin globin     ESQAALVKSSWEEF

Difficult due to indels

Human beta       VHLTN--FFESFGDLST
Horse beta       VQLSN--FFDSFGDLSN
Human alpha      -VLSGAHYFPHF-DLS-
Horse alpha      -VLSGGHYFPHF-DLS-
Whale myoglobin  -VLSEADKFDRFKHLKT
Lamprey globin   APLSYSTFFPKFKGLTT
Lupin globin     GALTNANLFSFLKGTSE


Dynamic programming

• Dynamic programming (Needleman and Wunsch, 1970; Gotoh, 1982) is an exhaustive method that finds the best alignment by assigning substitution scores to all pairs of aligned residues and gap penalties (GP) to indels.

• To prevent excessive use of gaps, indels are usually penalized using these gap penalties.

• Alignment programs have separate penalties for inserting a gap (gap opening) and for extending a gap (gap extension).

G A T T T C      G A T - T T C
G A A T T C      G A - A T T C


Scoring Scheme:
• Match: +1
• Mismatch: 0
• Indel: -1
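As a minimal sketch, the scoring scheme can be applied column by column to any gapped pair of sequences. The function name `score_alignment` and the use of "-" as the gap character are assumptions; the three example alignments are the GATTC versus GAATTC alignments worked through in this section.

```python
def score_alignment(top, bottom, match=1, mismatch=0, indel=-1):
    """Sum the column scores of a pairwise alignment written with '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += indel
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("GATTC-", "GAATTC"))  # gap placed at the end
print(score_alignment("GA-TTC", "GAATTC"))  # gap after the second column
print(score_alignment("G-ATTC", "GAATTC"))  # gap after the first column
```

Placing the gap at the end forces two mismatches and scores 2, while either internal gap placement keeps five matches and scores 4.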

G  A  T  T  C  -
G  A  A  T  T  C
1  1  0  1  0 -1

Alignment score = 2

G  A  -  T  T  C
G  A  A  T  T  C
1  1 -1  1  1  1

Alignment score = 4


G  -  A  T  T  C
G  A  A  T  T  C
1 -1  1  1  1  1

Alignment score = 4
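The search for the best-scoring alignment can be sketched as a basic Needleman-Wunsch implementation with a single linear indel penalty, using the scoring scheme from this section (match +1, mismatch 0, indel -1). This is an illustrative version, not the separate gap-opening and gap-extension scheme that alignment programs use; the function name and return format are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, indel=-1):
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel          # a prefix aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + indel, score[i][j - 1] + indel)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best, aligned_top, aligned_bottom = needleman_wunsch("GATTC", "GAATTC")
print(best)
print(aligned_top)
print(aligned_bottom)
```

Several alignments can share the best score, so the traceback returns only one of them; which one depends on the order in which the three moves are tried.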


Multiple Sequence Alignments

• Phylogenetic trees are based on multiple sequence alignments.

• Dynamic programming can be used to align multiple sequences, but the time required grows exponentially with the number of sequences.

• Until the end of 1989, multiple sequence alignments were assembled by hand because the exhaustive alignment of more than five or six sequences is computationally infeasible.

• Now, most multiple sequence alignments are constructed by the method known as progressive sequence alignment (Feng and Doolittle, 1987; Higgins and Sharp, 1988).
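The exponential growth can be made concrete with a back-of-the-envelope sketch: exhaustive dynamic programming over k sequences fills a k-dimensional table with one axis per sequence, and each cell looks back at up to 2^k - 1 neighbouring cells. The sequence length of 100 and the function names are assumptions chosen for illustration.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the dynamic programming table for the given sequence lengths."""
    return prod(n + 1 for n in lengths)


def predecessors_per_cell(k):
    """Each cell can be reached from any non-empty subset of the k axes stepping back."""
    return 2 ** k - 1


for k in range(2, 7):
    cells = dp_cells([100] * k)
    print(f"{k} sequences of length 100: {cells:.3e} cells, "
          f"{predecessors_per_cell(k)} predecessors per cell")
```

Two sequences of length 100 need about ten thousand cells, while six need on the order of 10^12, which is consistent with the practical limit of five or six sequences mentioned above.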


Progressive Alignment

• Progressive alignment is a heuristic method: it does not guarantee an alignment with the best possible score under a given scoring formula.

• 1) Perform all possible pairwise alignments between each pair of sequences using a fast, approximate method.

• 2) Calculate the ‘distance’ between each pair of sequences and construct a crude “guide tree” with the Neighbor-Joining method.

• 3) Build up the alignment gradually by following the branching order in the tree, treating each step as a dynamic programming pairwise alignment in which each member of a ‘pair’ may contain more than one sequence.
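Steps 1 and 2 can be sketched roughly as follows, using the five example sequences from the progressive-alignment walkthrough in this section. As a stand-in for the "fast/approximate method", this sketch uses the similarity ratio from Python's standard `difflib.SequenceMatcher`; that substitution is an assumption for illustration and is not the method used by real alignment programs. The guide tree itself is not built here, only the pairwise distances and the closest pair.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Unaligned example sequences 1-5 from the walkthrough.
sequences = {
    1: "gctcgatacgatacgatgactagcta",
    2: "gctcgatacaagacgatgacagcta",
    3: "gctcgatacacgatgactagcta",
    4: "gctcgatacacgatgacgagcga",
    5: "ctcgaacgatacgatgactagct",
}


def approximate_distance(a, b):
    """Crude distance: 1 minus the difflib similarity ratio (0 for identical strings)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


# Step 1: all pairwise comparisons. Step 2 (first half): turn them into distances.
distances = {
    (i, j): approximate_distance(sequences[i], sequences[j])
    for i, j in combinations(sorted(sequences), 2)
}

for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(distance, 3))

closest_pair = min(distances, key=distances.get)
print("closest pair:", closest_pair)
```

A guide tree would then be built from this distance table, and the closest sequences would be aligned first, with later steps adding sequences or groups in the order given by the tree.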


Progressive alignment - step 1

Unaligned sequences:

1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgacagcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct

Sequences 1 and 2 aligned:

1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta

Sequences 3 and 4 aligned:

3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga

Alignment (1, 2) + alignment (3, 4):

1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga

Progressive alignment - final step

Alignment (1, 2, 3, 4) + sequence 5:

1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
5. -ctcga-acgatacgatgactagct-

[Figure: guide tree for sequences 1-5, shown at each step]


[Figure: phylogenetic tree of sequences labeled BR.<year>.<isolate> (for example BR.96.RJ081, BR.03.56ST, BR.01.M19) together with the consensus BbrCons; values 100, 100, and 77 shown on branches; scale bar 0.05]

Construction of phylogenetic trees

• Inferring phylogenetic relationships from molecular sequences requires choosing one of the many available methods.

• Phylogenetic inference is often treated as a “black box” in which “sequences go in and trees come out”.

Main objectives

1. Develop a conceptual framework for understanding the theoretical (philosophical) foundations that distinguish the different inference methods (classification of methods).

2. Present the use of models and assumptions in phylogenetics.

Tree Reconstruction Methods

                                                      Character-based methods   Distance-based methods
Methods based on an explicit model of evolution       Maximum-likelihood        Neighbour-Joining
                                                      Bayesian Inference        Minimum Evolution
                                                                                UPGMA
Methods not based on an explicit model of evolution   Maximum-parsimony


Neighbor-Joining method

The NJ method (Saitou and Nei, 1987) is a heuristic method for estimating the minimum evolution tree.

The NJ method is based on the minimum evolution principle and constructs internal nodes by joining nearest neighbors (two taxa connected by a single node) at each step.

Distance Matrix

PAM        Spinach    Rice  Mosquito  Monkey   Human
Spinach        0.0    84.9     105.6    90.8    86.3
Rice          84.9     0.0     117.8   122.4   122.6
Mosquito     105.6   117.8       0.0    84.7    80.8
Monkey        90.8   122.4      84.7     0.0     3.3
Human         86.3   122.6      80.8     3.3     0.0


Distance 3.3 (Human - Monkey) is the minimum. So we'll join Human and Monkey to MonHum.
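The slides pick the pair with the smallest raw distance. The criterion published by Saitou and Nei instead corrects each distance for how far the two taxa are from everything else, commonly written as Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and joins the pair with the lowest Q. A minimal sketch of that criterion on the five-taxon matrix from this section follows; the function name `q_matrix` is an illustrative choice.

```python
from itertools import combinations

taxa = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
# PAM distance matrix from this section, rows and columns in `taxa` order.
matrix = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def q_matrix(d):
    """Neighbor-Joining Q criterion for every pair of taxa (lower is better)."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    return {
        (i, j): (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        for i, j in combinations(range(n), 2)
    }


q = q_matrix(matrix)
i, j = min(q, key=q.get)
first_pair = (taxa[i], taxa[j])
print(first_pair, round(q[(i, j)], 1))
```

For this matrix the Q criterion and the smallest raw distance agree on the first join, Monkey with Human, although the two rules can disagree on other data sets.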

Neighbor-Joining (1)

[Figure: Monkey and Human joined at the node Mon-Hum; Spinach, Mosquito, and Rice not yet joined]

Recalculate the distance matrix again...


PAM        Spinach    Rice  Mosquito  MonHum
Spinach        0.0    84.9     105.6    88.6
Rice          84.9     0.0     117.8   122.5
Mosquito     105.6   117.8       0.0    82.8
MonHum        88.6   122.5      82.8     0.0

[Figure: Mosquito joined to Mon-Hum at the node Mos-(Mon-Hum); Spinach and Rice not yet joined]




PAM         Spinach    Rice  MosMonHum
Spinach         0.0    84.9       97.1
Rice           84.9     0.0      120.2
MosMonHum      97.1   120.2        0.0

[Figure: Spinach and Rice joined at the node Spin-Rice, alongside Mos-(Mon-Hum)]




Neighbor-Joining (4)

PAM         SpinRice  MosMonHum
SpinRice         0.0      108.7
MosMonHum      108.7        0.0

[Figure: final join, (Spin-Rice)-(Mos-(Mon-Hum))]
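The recalculated matrices in this walkthrough are consistent with a simplified rule: join the pair with the smallest distance, then set the distance from the new node to every other taxon to the average of the two joined members' distances. The sketch below reproduces that simplified procedure as inferred from the numbers on the slides; it is not the full Saitou and Nei algorithm, and the function name `join_by_closest_pair` and the nested-label format are assumptions.

```python
from itertools import combinations

taxa = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
matrix = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def join_by_closest_pair(labels, d):
    """Repeatedly join the closest pair; `d` maps frozenset({a, b}) to a distance.

    Returns the list of joins as (member_a, member_b, distance_at_join).
    """
    labels = list(labels)
    d = dict(d)
    joins = []
    while len(labels) > 1:
        a, b = min(combinations(labels, 2), key=lambda pair: d[frozenset(pair)])
        joins.append((a, b, d[frozenset((a, b))]))
        merged = f"({a},{b})"
        for other in labels:
            if other not in (a, b):
                # Distance to the new node: plain average of the two members' distances.
                d[frozenset((merged, other))] = (
                    d[frozenset((a, other))] + d[frozenset((b, other))]
                ) / 2
        labels = [x for x in labels if x not in (a, b)] + [merged]
    return joins


distances = {
    frozenset((taxa[i], taxa[j])): matrix[i][j]
    for i, j in combinations(range(len(taxa)), 2)
}
joins = join_by_closest_pair(taxa, distances)
for a, b, distance in joins:
    print(f"join {a} + {b} at distance {distance:.1f}")
```

The join order matches the slides: Monkey with Human, then Mosquito, then Spinach with Rice, then the two groups. The final distance differs slightly from the 108.7 on the slide because the slide rounds to one decimal at every step, while this sketch keeps full precision.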

Unrooted Neighbor-Joining Tree

[Figure: unrooted tree with tips Human, Monkey, Mosquito, Rice, and Spinach]

Distance-based methods

Advantages:

- very fast

- allows the use of an explicit model of evolution

Disadvantages:

- only produces one best tree (we do not get any idea about other potential trees)

- reduces all sequence information into a single distance value

- generally outperformed by Maximum likelihood or Bayesian methods in choosing the correct tree in computer simulations

