Key Aspects of Molecular Evolution
1. Why study phylogenetics and molecular evolution
2. Key aspects: genetic code, mutations
3. Alignments and homology
4. Construction of phylogenetic trees
5. Substitution models
Why study phylogenetics and molecular evolution?
“Nothing in biology makes sense except in the light of evolution”
- Theodosius Dobzhansky, 1973 (The American Biology Teacher 35:125)
“Nothing in evolutionary biology makes sense except in the light of a phylogeny”
- Jeff Palmer, Douglas Soltis, Mark Chase, 2004 (American J. Botany 91: 1437-1445)
The evolutionary thinking
• Alfred Russel Wallace writes to Charles Darwin (June 17th 1858)
Ernst Haeckel (mid-19th century): the tree of life
The neo-synthesis (Fisher, Haldane, and Wright, 1930-1950)
The molecular REvolution
• Nuttall, 1904: serological cross-reactions used to study phylogenetic relationships among various groups of animals.
• Watson and Crick, 1953: the beautiful double helix!
• Zuckerkandl and Pauling, 1965: molecular clocks.
• Fitch & Margoliash, 1967: Construction of phylogenetic trees. A method based on mutation distances as estimated from cytochrome c sequences is of general applicability (Science, 155:279-284).
• Kimura, 1968: Evolutionary rate at the molecular level (Nature, 217:624-626).
The birth of molecular evolution
Genetic code
• In the RNA that encodes a protein, each triplet of bases (codon) is recognized by the ribosome as the code for a specific amino acid.
• The genetic code is nearly universal across organisms, with only a few exceptions such as mitochondrial genomes.
• There are 64 possible triplets: 61 sense codons (encoding the 20 amino acids) and 3 nonsense codons (stop codons).
• A reading frame that is able to encode a protein (open reading frame, ORF) starts with a codon for methionine and ends with a stop codon.
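The codon counts above can be checked with a short sketch. It assumes the standard genetic code written in the DNA alphabet (T instead of U) and laid out in the conventional TCAG order; the function name `translate_orf` and the example sequence are illustrative, not part of the original slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(AMINO_ACIDS) - {"*"}


def translate_orf(dna):
    """Translate from the first ATG (methionine) up to the first in-frame stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":          # stop codon closes the ORF
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE), len(sense_codons), len(amino_acids), stop_codons)
print(translate_orf("CCATGGCTGAAACTTAAGG"))  # hypothetical sequence containing one ORF
```

The lookup table is built from a 64-letter string so that the whole code fits on one line; a real analysis would normally rely on an established library rather than a hand-written table.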
Phylogenetic Inference / Key aspects: genetic code, mutations
Point Mutations
• Errors in duplication of genetic information can result in the incorporation of a noncomplementary nucleotide: point mutations.
• Point mutations at the 1st, 2nd, and 3rd codon position usually (96%), always (100%), and rarely (30%) result in an amino-acid change, respectively.
• Point mutations that do not result in an amino-acid change are called synonymous.
• Point mutations that result in an amino-acid change are called non-synonymous.
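The per-position percentages quoted above can be approximated by enumerating every single-base change in the standard genetic code. This is a sketch under stated assumptions: the code table is the standard one in TCAG order, changes that start from a stop codon are ignored, and a change to a stop codon is counted as an amino-acid change. Under these assumptions the enumeration gives roughly 96%, 100% and 31% for the three positions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nonsynonymous_fraction(position):
    """Fraction of single-base changes at codon position 0, 1 or 2 that alter the product."""
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":                      # assumption: skip stop codons as starting points
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            changed += CODON_TABLE[mutant] != aa   # a new stop also counts as a change
    return changed / total


for pos in range(3):
    print(f"codon position {pos + 1}: {100 * nonsynonymous_fraction(pos):.1f}% non-synonymous")
```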
Transitions and transversions
[Figure: the four bases A, G, C and T, with arrows marking transitions and transversions]
• Transitions are changes between two purines (A ↔ G) or between two pyrimidines (C ↔ T): Pu-Pu, Py-Py.
• Transversions are changes from a purine to a pyrimidine or the reverse: Pu-Py or Py-Pu.
• 4 possible transitions: A ↔ G, C ↔ T (each direction counted separately).
• 8 possible transversions: A ↔ C, A ↔ T, G ↔ C, G ↔ T (each direction counted separately).
• Thus, if mutations were random, transversions would be 2 times more likely than transitions.
• Due to steric hindrance (as well as negative selection!), the opposite is true: transitions generally occur more often than transversions [2-15 times more often, depending on the gene region and the species].
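The purine/pyrimidine rule above translates directly into code. The following is a minimal sketch: the function names `classify` and `count_ti_tv` are illustrative, and the two sequences are assumed to be already aligned and of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x, y):
    """Return 'transition', 'transversion', or None when the two bases are identical."""
    if x == y:
        return None
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ti_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Columns containing a gap or any non-ACGT symbol are skipped.
    """
    ti = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        kind = classify(x, y)
        if kind == "transition":
            ti += 1
        elif kind == "transversion":
            tv += 1
    return ti, tv


# Enumerate the 12 possible directed changes: 4 transitions and 8 transversions.
changes = [(x, y) for x in "ACGT" for y in "ACGT" if x != y]
n_transitions = sum(classify(x, y) == "transition" for x, y in changes)
n_transversions = len(changes) - n_transitions
print(n_transitions, n_transversions)
print(count_ti_tv("AGCT", "GGAT"))  # hypothetical aligned pair
```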
Point mutations and the genetic code
Indel Mutations
• Errors in duplication of genetic information can also result in deletions or insertions of one or more nucleotides: indel mutations.
• When three (or a multiple of three) nucleotides are inserted or deleted in a coding region, the ORF remains intact, but one (or more) amino acids are inserted or deleted.
• In any other case, the indel shifts the reading frame: every codon downstream of the indel is read differently, so the resulting gene codes for an entirely different protein, usually with a different length than the original one.
• Viruses often encode several proteins from a single gene by using overlapping ORFs.
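The difference between an in-frame deletion and a frameshift can be illustrated with a toy ORF. This is a sketch only: the ORF is hypothetical, the standard genetic code in TCAG order is assumed, and `translate` simply reads codons from position 0 until a stop codon or the end of the sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Read codons from the first base until a stop codon; a trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGGCTGAAACTTAA"            # hypothetical ORF: Met-Ala-Glu-Thr-stop
in_frame = orf[:3] + orf[6:]       # delete one whole codon (3 nucleotides)
frameshift = orf[:3] + orf[4:]     # delete a single nucleotide

print(translate(orf))         # original protein
print(translate(in_frame))    # one amino acid missing, rest unchanged
print(translate(frameshift))  # downstream codons are all read differently
```

In the frameshifted sequence the original stop codon is no longer in frame, which is why the product can also change in length.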
Genetic mechanisms exploited by viruses to generate variability
Mutation and hypermutation
Insertions and deletions
Recombination
Reassortment
Quasispecies dynamics
A complex population of closely related genetic variants
Low fidelity of the RNA-dependent RNA polymerase
[Figure: concentration plotted over sequence space, showing the master sequence surrounded by its mutant spectrum (Schuster P 2008)]
Cooperation and complementation
Alignments
• There are three main methods of sequence alignment:
1) Manual
2) Automatic (dynamic programming, progressive alignment)
3) Combined
Phylogenetic Inference / Alignments and homology
• An alignment is a hypothesis of positional homology between nucleotides or amino acids.
• Homologous sequences are usually aligned such that homologous sites form columns in the alignment.
Easy:
  Human beta        PEEKSAVTALWGKV
  Horse beta        GEEKAAVLALWDKV
  Human alpha       PADKTNVKAAWGKV
  Horse alpha       AADKTNVKAAWSKV
  Whale myoglobin   EGEWQLVLHVWAKV
  Lamprey globin    AAEKTKIRSAWAPV
  Lupin globin      ESQAALVKSSWEEF
Difficult due to indels:
  Human beta        VHLTN--FFESFGDLST
  Horse beta        VQLSN--FFDSFGDLSN
  Human alpha       -VLSGAHYFPHF-DLS-
  Horse alpha       -VLSGGHYFPHF-DLS-
  Whale myoglobin   -VLSEADKFDRFKHLKT
  Lamprey globin    APLSYSTFFPKFKGLTT
  Lupin globin      GALTNANLFSFLKGTSE
Dynamic programming
• Dynamic programming (Needleman and Wunsch, 1970; Gotoh, 1982) is an exhaustive method that finds the best alignment by assigning substitution scores to all pairs of aligned residues and gap penalties (GP) to indels.
• Gap penalties are used to prevent excessive use of gaps in the alignment.
• Alignment programs have separate penalties for inserting a gap (gap opening) and for extending a gap (gap extension).
The same two sequences aligned without gaps (left) and with gaps (right):
  G A T T T C        G A T - T T C
  G A A T T C        G A - A T T C
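To make the distinction between gap opening and gap extension concrete, the sketch below scores an already-aligned pair of rows with separate penalties. The penalty values (gap opening -2, gap extension -1, match +1, mismatch 0) are hypothetical, and the convention used here, in which the first position of a gap pays the opening penalty and each further position pays the extension penalty, is only one of several in use.

```python
def affine_alignment_score(row1, row2, match=1, mismatch=0, gap_open=-2, gap_extend=-1):
    """Score two aligned rows ('-' marks a gap) with separate opening/extension penalties."""
    score = 0
    previous_gap_row = None          # which row carried a gap in the previous column
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            gap_row = 1 if x == "-" else 2
            # Continuing a gap in the same row is an extension; otherwise a new gap opens.
            score += gap_extend if previous_gap_row == gap_row else gap_open
            previous_gap_row = gap_row
        else:
            score += match if x == y else mismatch
            previous_gap_row = None
    return score


print(affine_alignment_score("GATTTC", "GAATTC"))    # ungapped alignment from the slide
print(affine_alignment_score("GAT-TTC", "GA-ATTC"))  # gapped alignment from the slide
print(affine_alignment_score("GATTACA", "GA---CA"))  # hypothetical pair with one long gap
```

With these illustrative values, one gap of length three is cheaper than three separate gaps of length one, which is the behaviour the separate penalties are meant to produce.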
Dynamic programming: scoring candidate alignments
Scoring scheme:
• Match: +1
• Mismatch: 0
• Indel: -1

  G  A  T  T  C  -
  G  A  A  T  T  C
  1  1  0  1  0 -1     Alignment score = 2

  G  A  -  T  T  C
  G  A  A  T  T  C
  1  1 -1  1  1  1     Alignment score = 4

  G  -  A  T  T  C
  G  A  A  T  T  C
  1 -1  1  1  1  1     Alignment score = 4
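The scores above can be reproduced, and the best achievable score found, with a small dynamic programming sketch in the style of Needleman and Wunsch. It assumes the slide's scoring scheme (match +1, mismatch 0, indel -1) with a single linear gap penalty rather than separate opening and extension penalties; the function names are illustrative.

```python
def score_alignment(row1, row2, match=1, mismatch=0, indel=-1):
    """Sum the column scores of two aligned rows ('-' marks a gap)."""
    total = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            total += indel
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def needleman_wunsch_score(a, b, match=1, mismatch=0, indel=-1):
    """Best global alignment score of a and b, filled in row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * indel              # prefix of a aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * indel              # prefix of b aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diagonal, F[i - 1][j] + indel, F[i][j - 1] + indel)
    return F[-1][-1]


for top in ("GATTC-", "GA-TTC", "G-ATTC"):
    print(top, score_alignment(top, "GAATTC"))
print("best score:", needleman_wunsch_score("GATTC", "GAATTC"))
```

The table has (len(a)+1) x (len(b)+1) cells, so the pairwise case is cheap; only the score is returned here, and recovering the alignment itself would require an additional traceback step.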
Multiple Sequence Alignments
• Phylogenetic trees are based on multiple sequence alignments.
• Dynamic programming can be used to align multiple sequences, but the time required grows exponentially with the number of sequences.
• Until the end of 1989, multiple sequence alignments were assembled by hand because the exhaustive alignment of more than five or six sequences is computationally unfeasible.
• Now, most multiple sequence alignments are constructed by the method known as progressive sequence alignment (Feng and Doolittle, 1987; Higgins and Sharp, 1988).
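A rough sense of the exponential growth can be had by counting the cells of the dynamic programming table. The sketch assumes N sequences of equal length L, so the table has (L + 1)^N cells; the sequence length of 100 is an arbitrary illustration, and the count ignores the additional work needed per cell.

```python
def dp_cells(seq_length, n_sequences):
    """Number of cells in an exhaustive dynamic programming table for n sequences."""
    return (seq_length + 1) ** n_sequences


for n in range(2, 7):
    print(f"{n} sequences of length 100: {dp_cells(100, n):,} cells")
```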
Progressive Alignment
• Progressive alignment is a heuristic method: it does not guarantee an alignment with the best possible score under a given scoring formula.
1) Perform all possible pairwise alignments between the sequences using a fast/approximate method.
2) Calculate the ‘distance’ between each pair of sequences and construct a crude “guide tree” with the Neighbor-Joining method.
3) The alignment is gradually built up by following the branching order in the tree, with each step being treated as a dynamic programming pairwise alignment, sometimes with each member of a ‘pair’ having more than one sequence.
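Steps 1 and 2 can be sketched as a pairwise distance matrix. The example below uses the five example sequences from the progressive alignment slides and, as an assumption made only for illustration, takes the edit distance (number of substitutions and indels) as the 'distance'; real alignment programs use their own fast approximate measures, and the guide tree construction itself is not shown.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,             # deletion
                               current[j - 1] + 1,          # insertion
                               previous[j - 1] + (x != y))) # match or substitution
        previous = current
    return previous[-1]


def distance_matrix(seqs):
    """All pairwise distances, as a symmetric list of lists."""
    n = len(seqs)
    return [[edit_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


sequences = [
    "gctcgatacgatacgatgactagcta",   # 1
    "gctcgatacaagacgatgacagcta",    # 2
    "gctcgatacacgatgactagcta",      # 3
    "gctcgatacacgatgacgagcga",      # 4
    "ctcgaacgatacgatgactagct",      # 5
]
matrix = distance_matrix(sequences)
for label, row in enumerate(matrix, 1):
    print(label, row)
```

A matrix like this one is what a tree-building method such as Neighbor-Joining would take as input to produce the guide tree.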
Progressive alignment - step 1
Input sequences:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgacagcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct
Sequences 1 and 2 are aligned:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
[Figure: guide tree with leaves 1, 2, 3, 4 and 5]
Progressive alignment - step 2
Current state:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct
Sequences 3 and 4 are aligned:
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
[Figure: guide tree with leaves 1, 2, 3, 4 and 5]
Progressive alignment - step 3
The alignment of sequences 1 and 2 is aligned with the alignment of sequences 3 and 4:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
+
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
Result:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
[Figure: guide tree with leaves 1, 2, 3, 4 and 5]
Progressive alignment - final step
Sequence 5 is added to the alignment of sequences 1 to 4:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
+
5. ctcgaacgatacgatgactagct
Result:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
5. -ctcga-acgatacgatgactagct-
[Figure: guide tree with leaves 1, 2, 3, 4 and 5]
[Figure: phylogenetic tree with about 130 tips labeled in the form BR.yy.isolate (for example BR.96.RJ081, BR.03.56ST, BR.01.M19) plus the tip BbrCons; node support values of 100, 100 and 77 are shown; scale bar = 0.05]
Construction of phylogenetic trees
• Inferring phylogenetic relationships from molecular sequences requires choosing one of the many available methods.
• Phylogenetic inference is often treated as a “black box”: sequences go in and trees come out.
Main objectives
1. Develop a conceptual framework for understanding the theoretical (philosophical) foundations that distinguish the different inference methods (classification of methods).
2. Present the use of models and assumptions in phylogenetics.
Tree Reconstruction Methods
• Character-based methods: Maximum-likelihood, Bayesian Inference, Maximum-parsimony
• Distance-based methods: Neighbour-Joining, Minimum Evolution, UPGMA
• The methods are further divided into those based on an explicit model of evolution and those not based on an explicit model of evolution.
Neighbor-Joining method
The NJ method (Saitou and Nei, 1987) is a heuristic method for estimating the minimum evolution tree.
The NJ method is based on the minimum evolution principle and constructs internal nodes by joining nearest neighbors (two taxa connected by a single node) at each step.
Distance Matrix (PAM)
            Spinach    Rice  Mosquito  Monkey   Human
Spinach         0.0    84.9     105.6    90.8    86.3
Rice           84.9     0.0     117.8   122.4   122.6
Mosquito      105.6   117.8       0.0    84.7    80.8
Monkey         90.8   122.4      84.7     0.0     3.3
Human          86.3   122.6      80.8     3.3     0.0
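In the published NJ algorithm, the pair to join is not chosen from the raw distances but from a criterion that corrects each distance for how far the two taxa are from everything else, commonly written as Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The sketch below applies that criterion to the distance matrix above; the function name `nj_q_matrix` is illustrative, and only the first pair selection is shown, not the full tree construction.

```python
taxa = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
D = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def nj_q_matrix(D):
    """Q(i, j) = (n - 2) * d(i, j) - total(i) - total(j) for every pair i < j."""
    n = len(D)
    totals = [sum(row) for row in D]     # total distance from each taxon to all others
    return {(i, j): (n - 2) * D[i][j] - totals[i] - totals[j]
            for i in range(n) for j in range(i + 1, n)}


Q = nj_q_matrix(D)
i, j = min(Q, key=Q.get)                 # the pair with the smallest Q is joined first
print(taxa[i], taxa[j], round(Q[(i, j)], 1))
```

For this matrix the criterion selects Monkey and Human, the same pair as the smallest raw distance.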
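Neighbor-Joining (1)
Distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into MonHum.
Note that this walkthrough is simplified: it joins the pair with the smallest distance at each step, whereas the full NJ algorithm selects the pair using a criterion that also accounts for each taxon's total distance to all other taxa.
[Figure: Monkey and Human joined at node Mon-Hum; Spinach, Mosquito and Rice not yet joined]
Recalculate the distance matrix...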
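Neighbor-Joining (2)
PAM         Spinach    Rice  Mosquito  MonHum
Spinach         0.0    84.9     105.6    88.6
Rice           84.9     0.0     117.8   122.5
Mosquito      105.6   117.8       0.0    82.8
MonHum         88.6   122.5      82.8     0.0
Each distance to MonHum is the average of the distances to Monkey and to Human, e.g. Spinach - MonHum = (90.8 + 86.3) / 2 ≈ 88.6.
The minimum is now 82.8 (Mosquito - MonHum), so Mosquito is joined to MonHum.
[Figure: Mosquito joined to Mon-Hum at node Mos-(Mon-Hum); Spinach and Rice not yet joined]
Recalculate the distance matrix...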
Neighbor-Joining (3)
PAM          Spinach    Rice  MosMonHum
Spinach          0.0    84.9       97.1
Rice            84.9     0.0      120.2
MosMonHum       97.1   120.2        0.0
The minimum is now 84.9 (Spinach - Rice), so Spinach and Rice are joined into SpinRice.
[Figure: Spinach and Rice joined at node Spin-Rice, alongside the Mos-(Mon-Hum) group]
Recalculate the distance matrix...
Neighbor-Joining (4)
PAM          SpinRice  MosMonHum
SpinRice          0.0      108.7
MosMonHum       108.7        0.0
Only two groups remain, and they are joined: (Spin-Rice)-(Mos-(Mon-Hum)).
[Figure: complete joining order, with nodes Mon-Hum, Mos-(Mon-Hum), Spin-Rice and (Spin-Rice)-(Mos-(Mon-Hum))]
Unrooted Neighbor-Joining Tree
[Figure: unrooted tree with tips Human, Monkey, Mosquito, Rice and Spinach]
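The worked example above can be replayed in a few lines. Note that this sketch implements the simplified procedure shown on the slides (join the pair with the smallest distance, then average the two joined rows), not the complete NJ algorithm; the function name `cluster_by_averaging` is illustrative. Because the slides round at every step while the code does not, the later distances agree with the slides only to within about 0.1.

```python
def cluster_by_averaging(names, D):
    """Repeatedly join the closest pair and average its distances to the remaining groups."""
    names = list(names)
    D = [row[:] for row in D]            # work on a copy of the matrix
    merges = []
    while len(names) > 1:
        n = len(names)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: D[pair[0]][pair[1]])
        merges.append((names[i], names[j], D[i][j]))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from each remaining group to the new node: mean of the two joined rows.
        new_row = [(D[i][k] + D[j][k]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] for a in keep]
        for row, d in zip(D, new_row):
            row.append(d)
        D.append(new_row + [0.0])
        names = [names[k] for k in keep] + [f"({names[i]},{names[j]})"]
    return merges


taxa = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
D = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]
merges = cluster_by_averaging(taxa, D)
for left, right, distance in merges:
    print(f"join {left} + {right} at distance {distance:.2f}")
```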
Distance-based methods
Advantages:
- very fast
- allows the use of an explicit model of evolution
Disadvantages:
- only produces one best tree (we do not get any idea about other potential trees)
- reduces all sequence information into a single distance value
- generally outperformed by Maximum likelihood or Bayesian methods in choosing the correct tree in computer simulations