EEES 4150/5150, Evolution Sigler
Chapter 12 (Strickberger) Molecular Phylogenies and Evolution
METHODS FOR DETERMINING PHYLOGENY
In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern molecular methods of classification can overcome many of the pitfalls associated with traditional methods. Molecular methods compare antibodies, DNA, RNA, or amino acid sequences to give insight into the relatedness of organisms. For example, evolutionary changes are indicated by substitutions in nucleotide and amino acid sequences. Such comparisons are possible even when no morphological, behavioral, or ecological links are present.
Amino acid sequences
Comparing amino acid sequences in a homologous protein can provide information about the relationships between different organisms. Hemoglobin was the first protein investigated. Its structure includes a porphyrin (heme, which can reversibly bind oxygen) attached to a globin polypeptide chain (>140 aa long).
Found in vertebrates (hemoglobin-like molecules are also found in invertebrates, plants, and fungi). The conserved nature of this molecule implies an early origin in evolution.
In a normal adult: four polypeptide chains, two α and two β (α₂β₂).
Minor adult component: α₂δ₂.
Fetal hemoglobin: α₂γ₂.
Myoglobin (in muscle) and the ε chains of embryonic hemoglobin occur in particular tissues or developmental stages.
Differences in hemoglobin structures indicate two kinds of evolution:
1. Different globin chains (α, β, γ, δ, ε) arose, producing the variety carried by a particular organism.
Why might differing structures of globin chains arise? Each is a variation on the same globin theme:
1. The chains are roughly the same length.
2. Their sequences are similar at many positions.
3. Their 3-D structures are similar → similar function.
4. The β, γ, and δ genes are closely linked on chromosome 11.
What does this close linkage tell you about the evolution of hemoglobin?
2. Once produced, each globin chain followed its own evolutionary path. This led to changes in its amino acid sequence in different species.
GENE DUPLICATION AND DIVERGENCE
The differing globins likely did not evolve independently and then accidentally converge in sequence and function.
Question: How did differing molecules of such similar structure and function arise? Linkage studies suggest that duplication of an original globin-type gene took place. Once copies of the gene were present, each could theoretically undergo independent evolution, leading to today’s α, β, γ, δ, and ε chains.
Question: How could you tell which one came first? And which one is the most “recent”?
The temporal order of hemoglobin chain evolution can be deduced by comparing amino acid sequences.
The greater the sequence difference, the longer the time since the common ancestor (and, consequently, the greater the evolutionary distance).
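The count described above can be sketched in Python. This minimal version assumes the two protein sequences have already been aligned to the same length (alignment itself is not shown). It counts the sites at which they differ. The function name and the short example sequences are hypothetical.

```python
def count_differences(seq1: str, seq2: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


# Hypothetical aligned fragments (one-letter amino acid codes).
print(count_differences("MVLSAE", "MVHSAD"))  # 2
```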
We know:
The myoglobin chain differs most from the others (different amino acids at > 100 sites).
α differs from β at 77 sites.
β differs from γ at 39 sites but from δ at only 10 sites.
This implies that:
1. The myoglobin gene formed from an early duplication.
2. A later duplication separated the α and β genes.
3. β and δ represent the latest duplication.
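The reasoning above can be sketched in Python. The sketch ranks chain pairs from most to least different, assuming differences accumulate at a roughly constant rate. The notes report myoglobin only as differing at "> 100 sites," so the value 101 below is a hypothetical stand-in. Pairing myoglobin with α is purely for illustration.

```python
# Pairwise amino acid differences between globin chains, taken from the notes.
# Myoglobin is only reported as "> 100 sites"; 101 is a stand-in value.
DIFFERENCES = {
    ("myoglobin", "alpha"): 101,
    ("alpha", "beta"): 77,
    ("beta", "gamma"): 39,
    ("beta", "delta"): 10,
}


def duplication_order(diffs):
    """Return chain pairs from earliest split (most different) to latest."""
    return sorted(diffs, key=diffs.get, reverse=True)


for rank, pair in enumerate(duplication_order(DIFFERENCES), start=1):
    print(rank, pair, DIFFERENCES[pair])
```

The earliest split is the pair with the most differences, and β/δ comes out as the most recent duplication.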
How do gene duplications arise? One mechanism is unequal crossing over, which leaves one recombinant chromosome with extra (duplicated) material and the other with a deletion.
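A toy Python model of unequal crossing over follows. It assumes each chromosome is a list of gene labels. It also assumes the two homologs pair out of register, so the breaks fall at different positions. The gene labels, break points, and function name are hypothetical. In real cells, misalignment typically occurs between repeated, similar sequences.

```python
def unequal_crossover(chrom1, chrom2, break1, break2):
    """Swap chromosome ends after breaks at different positions.

    chrom1 is broken after index break1 and chrom2 after index break2.
    For homologs carrying the same genes with break1 > break2, the first
    product repeats the segment chrom1[break2:break1] and the second lacks it.
    """
    product1 = chrom1[:break1] + chrom2[break2:]
    product2 = chrom2[:break2] + chrom1[break1:]
    return product1, product2


homolog = ["geneA", "globin", "geneB"]
dup, deletion = unequal_crossover(homolog, homolog, break1=2, break2=1)
print(dup)       # ['geneA', 'globin', 'globin', 'geneB']
print(deletion)  # ['geneA', 'geneB']
```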
Some duplicated genes have evolved completely different functions although they share similar amino acid sequences. For example, the amino acid sequence of α-lactalbumin is similar to that of lysozyme. Further evolutionary relationships: both are products secreted through ducts, and for both, a sugar is the substrate.
Once genes are duplicated, how long does it take for divergence to occur? That depends on the number of amino acid substitutions necessary to produce a different function. However, even one amino acid change can cause drastic changes. Example: a single amino acid substitution can convert lactate dehydrogenase (LDH) into malate dehydrogenase (MDH).
LDH is used in glycolysis (pyruvate → lactate).
MDH is used in the Krebs cycle (malate → oxaloacetate).
The change is a substitution of arginine for glutamine at position 102 of the polypeptide. Is it ironic that this simple amino acid change drives a major functional alteration, yet in two closely related pathways?
DETERMINING MOLECULAR PHYLOGENIES
One can estimate the evolutionary similarity between two genes as follows. For each position, determine the minimum number of nucleotide mutations needed to transform the amino acid in one sequence into the amino acid at the same position in the other sequence. Then sum over all positions.
Choosing the explanation that requires the fewest mutations is referred to as “parsimony”.
Example: it is more parsimonious to explain a phenylalanine codon (UUU) as arising from a single nucleotide substitution in a serine codon (UCU → UUU) than from a triple nucleotide substitution in a glutamic acid codon (GAA → UUU).
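The parsimony count can be sketched in Python using the standard genetic code. For two amino acids, take the smallest number of base differences between any codon of one and any codon of the other. Then sum over aligned positions to get a minimum mutational distance. This is a simplified sketch: it ignores which codon was actually present and treats every substitution equally. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons listed in UCAG order (first base varies slowest).
# '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))


def codons_for(aa):
    """All codons that encode a one-letter amino acid code."""
    return [codon for codon, encoded in CODE.items() if encoded == aa]


def min_mutations(aa1, aa2):
    """Fewest base substitutions turning any codon for aa1 into any codon for aa2."""
    return min(
        sum(b1 != b2 for b1, b2 in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def minimum_mutational_distance(protein1, protein2):
    """Sum of per-position minimum mutations for two aligned protein sequences."""
    return sum(min_mutations(a, b) for a, b in zip(protein1, protein2))


print(min_mutations("F", "S"))  # 1  (e.g. UCU -> UUU)
print(min_mutations("F", "E"))  # 3  (e.g. GAA -> UUU)
```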
Once parsimonious evolutionary distances between species are established, phylogenetic relationships can be resolved. Example: assume the most parsimonious mutational distance for a protein is 25 between species A and B, 20 between A and C, and 30 between B and C. Which two are most closely related? Assign legs x, y, and z to represent the numbers of mutations responsible for their divergence.
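The phylogenetic relationship can be portrayed as an unrooted tree in which A, B, and C join at a single node, with leg x leading to A, leg y to C, and leg z to B.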
The length of each leg can be calculated. The A–B distance (25) is 5 mutations less than the C–B distance (30); therefore, x is 5 mutations less than y:
$$(y + z) - (x + z) = 30 - 25 \quad\Rightarrow\quad y - x = 5$$
Since x + y = 20 (the A–C distance) and y − x = 5, we can determine y:
$$(x + y) + (y - x) = 20 + 5 \quad\Rightarrow\quad 2y = 25 \quad\Rightarrow\quad y = 12.5$$
By substitution, x = 7.5. Finally, z is the A–B distance minus x:
$$z = 25 - x = 25 - 7.5 = 17.5$$
yielding:
[Figure: estimated branch positions]
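The same algebra works for any three distances. Each leg equals half of (the sum of the two distances involving that taxon, minus the third distance). A short Python sketch, with a hypothetical function name, reproduces the example's values:

```python
def three_taxon_legs(d_ab, d_ac, d_bc):
    """Legs (x, y, z) to A, C and B on an unrooted three-taxon tree.

    Solves x + z = d_ab, x + y = d_ac, y + z = d_bc.
    """
    x = (d_ab + d_ac - d_bc) / 2
    y = (d_ac + d_bc - d_ab) / 2
    z = (d_ab + d_bc - d_ac) / 2
    return x, y, z


print(three_taxon_legs(25, 20, 30))  # (7.5, 12.5, 17.5)
```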
Using mutational data, we can generate phylogenetic trees that display relationships among different organisms. These trees are calculated with complex mathematical algorithms. Many trees are possible, but only one will represent the true phylogeny. How well supported are the branches of a given tree? Bootstrapping calculates the proportion of trees in which a node appears when the data are repeatedly resampled with replacement.
Example: resample the aligned sequence sites, with replacement, one hundred times to produce one hundred data sets and hence one hundred trees. In any one resampling, some sites are omitted and others appear more than once.
Each resampling generates a tree in which a particular node (branch point) may or may not occur.
The bootstrap value is the frequency (% of trees) in which the same branch appears.
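A minimal bootstrap sketch in Python rests on strong simplifying assumptions. Sites (alignment columns) are resampled with replacement. Instead of building a full tree for each replicate, the sketch takes the pair of taxa with the fewest differences as the first clade to form, which is what UPGMA clustering does at its first step. The bootstrap value for a pair is the percentage of replicates in which that pair is grouped. The alignment and function names are hypothetical.

```python
import random
from itertools import combinations


def differences(seq1, seq2, columns):
    """Count differences between two aligned sequences at the given columns."""
    return sum(seq1[i] != seq2[i] for i in columns)


def closest_pair(alignment, columns):
    """Pair of taxa with the fewest differences (the first pair UPGMA would join)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: differences(alignment[pair[0]], alignment[pair[1]], columns),
    )


def bootstrap_support(alignment, pair, replicates=100, seed=1):
    """Percentage of resampled data sets in which `pair` is grouped first."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Resample sites with replacement: some are dropped, some repeated.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        if closest_pair(alignment, columns) == target:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical aligned sequences.
alignment = {
    "A": "ACGTACGTAC",
    "B": "TGCATGCATG",
    "C": "ACGTACGTAC",
}
print(bootstrap_support(alignment, ("A", "C")))  # 100.0
```

In this toy alignment A and C are identical, so they are grouped in every replicate. With real data, sites conflict and support falls below 100%.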
NUCLEIC ACID PHYLOGENIES BASED ON DNA-DNA HYBRIDIZATIONS
Homology among genes from different organisms can be estimated by measuring the degree to which homologous nucleotide sequences in single strands pair up to form double strands. This is referred to as DNA reassociation. DNA is isolated from two organisms (X and Y), dissociated into single strands, and then allowed to reassociate into X-Y hybrid double strands.
The reassociation process can be monitored by measuring the absorbance at 260 nm (A260) on a spectrophotometer.
Single-stranded DNA absorbs more strongly than double-stranded DNA, so as the DNA reassociates (becomes double stranded), the A260 decreases. The rate of reassociation increases with the degree of homology between the DNA strands in the mixture.
The method can be used to compare simple or complex mixtures of DNA comprising billions of nucleotides. DNA reassociation has been used to deduce the phylogenetic relationships of primates.
Note: paleontological evidence suggests that the lineages of Old World monkeys and of apes and humans diverged 33 million years ago. A 7.7 °C change in the thermal stability of hybrid DNA formed between humans and Old World monkeys has also been observed. This implies that every 1 °C shift in DNA thermal stability represents about 4.3 million years (33/7.7) of primate evolution. This allows the placement of the bottom x-axis in Fig. 12-11. DNA hybridization techniques have their detractors: one criticism is that DNA hybridization compresses all divergence information into a single distance measurement.
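The calibration is a simple proportion. A Python sketch follows, assuming (as the notes do) a constant rate of change. The function name is hypothetical.

```python
# Calibration values quoted in the notes.
DIVERGENCE_MY = 33.0  # Old World monkey vs ape-human split, millions of years ago
DELTA_TM_C = 7.7      # difference in hybrid DNA thermal stability, degrees C

# Assumes change accumulates at a constant rate along both lineages.
MY_PER_DEGREE = DIVERGENCE_MY / DELTA_TM_C


def divergence_time(delta_tm_c):
    """Convert a thermal-stability difference (degrees C) into millions of years."""
    return delta_tm_c * MY_PER_DEGREE


print(round(MY_PER_DEGREE, 1))  # 4.3
```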
NUCLEIC ACID PHYLOGENIES BASED ON RESTRICTION ENZYME SITES
Restriction enzymes recognize short (4–8 bp), specific nucleotide sequences and cleave the DNA at these sites. Example: EcoRI recognizes the sequence
5’ – GAATTC – 3’
3’ – CTTAAG – 5’
and cuts (restricts) the DNA between the G and the A on each strand.
Because DNA sequences differ among species, the placement of restriction sites is species-specific (sometimes strain-specific).
Therefore, each species’ DNA yields fragments of characteristic lengths after restriction.
Many restriction enzymes are available with which DNA can be restricted.
Therefore, complex mixtures of DNA fragment lengths can be generated, each representative of a different species: restriction fragment length polymorphisms (RFLPs).
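A Python sketch of a single-enzyme digest follows. It assumes linear DNA, one strand written 5'→3', and the EcoRI site GAATTC cut between G and A (offset 1). The site is palindromic, so scanning one strand finds every site. The sequences and function names are hypothetical. A substitution that destroys one site changes the fragment pattern, which is the basis of an RFLP.

```python
def cut_positions(seq, site="GAATTC", offset=1):
    """Positions at which the enzyme cuts: each site start plus `offset`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site="GAATTC", offset=1):
    """Lengths of the fragments produced by cutting a linear sequence."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


species_x = "AAGAATTCTTGGAATTCCC"
species_y = "AAGAATTCTTGGACTTCCC"  # second site lost by an A -> C substitution
print(fragment_lengths(species_x))  # [3, 9, 7]
print(fragment_lengths(species_y))  # [3, 16]
```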
Restriction maps allow comparisons between species. Example: mitochondrial DNA from humans and apes was restricted with 19 different enzymes, which cleaved the DNA at approximately 50 sites. Comparing the placement of these sites can yield evolutionary data.
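The notes do not say which statistic was used to compare site placements. One commonly used measure, swapped in here as an assumption, is Nei and Li's proportion of shared restriction sites, F = 2n_xy / (n_x + n_y). Here n_x and n_y are the numbers of sites in each sequence, and n_xy is the number they share. The site positions below are hypothetical.

```python
def shared_site_fraction(sites_x, sites_y):
    """Nei and Li's F: proportion of restriction sites shared by two sequences."""
    sites_x, sites_y = set(sites_x), set(sites_y)
    shared = len(sites_x & sites_y)
    return 2 * shared / (len(sites_x) + len(sites_y))


# Hypothetical site positions (bp) on two restriction maps.
species_1 = {120, 850, 2300, 4100}
species_2 = {120, 850, 2300, 5200}
print(shared_site_fraction(species_1, species_2))  # 0.75
```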
In agreement with previously determined phylogenies, humans share many more restriction sites with chimpanzees and gorillas than with orangutans and gibbons. However, the branching order among humans, chimpanzees, and gorillas is unclear.
NUCLEIC ACID PHYLOGENIES BASED ON NUCLEOTIDE SEQUENCE COMPARISONS AND HOMOLOGIES
The most accurate method for determining phylogenetic relationships between different organisms is the direct comparison of DNA sequences of the same (homologous) gene. Databases such as GenBank (NCBI) archive large volumes of sequence data. Sequence information has suggested several evolutionary events:
1. Extensive horizontal gene transfer between genomes.
2. A considerable amount of gene duplication (e.g., 25% of the Bacillus subtilis genome).
3. Many Archaea protein sequences are more similar to those of Bacteria than to those of eukaryotes.
4. Proteins used in replication, transcription, and translation show greater similarity between Archaea and eukaryotes.
5. About 50% of the genes identified in sequenced genomes have no known function.
6. Roughly 480 genes might be the minimum required for life, based on sequencing of the Mycoplasma genitalium genome.
The primary targets for sequencing analyses to determine phylogeny are the ribosomal RNA genes. The gold standard today is the 16S rRNA gene (prokaryotes) and the 18S rRNA gene (eukaryotes); however, Strickberger focuses on the 5S rRNA gene. Why the rRNAs?
1. They are universally distributed in all organisms. As components of the protein synthesis machinery, they have similar functions in different organisms.
2. The genes feature highly conserved regions as well as variable regions, with a constant secondary structure. This allows DNA sequences to be aligned and compared.
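Conserved and variable regions can be located in an alignment by scoring each column. As a sketch, the Python code below scores each column by the fraction of sequences sharing its most common base. It assumes the sequences are already aligned. The fragments, the threshold, and the function names are hypothetical.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common base at each aligned column."""
    n = len(aligned)
    return [Counter(column).most_common(1)[0][1] / n for column in zip(*aligned)]


def conserved_columns(aligned, threshold=1.0):
    """Indices of columns whose conservation is at least `threshold`."""
    return [i for i, c in enumerate(column_conservation(aligned)) if c >= threshold]


# Hypothetical aligned rRNA fragments.
fragments = ["ACGUA", "ACGUU", "ACCUG"]
print(conserved_columns(fragments))  # [0, 1, 3]
```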
RATES OF MOLECULAR CHANGE: EVOLUTIONARY CLOCKS
Inherent in all phylogenies is the assumption that evolutionary differences arose through mutational differences: the greater the number of differences, the greater the evolutionary distance between organisms. In Figure 12-11, the evolutionists used a time scale in which each 1 °C change in DNA thermal stability reflected 4.3 million years of evolutionary time. In doing so, they assumed that mutations accumulated at a constant rate over time.
This implies that an “evolutionary clock” determines the rate at which many mutations accumulate.
Evidence for such a clock can be found in the amino acid sequence of hemoglobin. Compare the α-hemoglobin amino acid sequences of several different organisms to that of sharks. How many differences are detected?

Organism      Differences from shark α-hemoglobin
Carp          85
Salamander    84
Chicken       83
Mouse         79
Human         79

All of these lineages have been separated from the shark lineage for the same length of time, and they show similar numbers of differences. Thus, although these organisms have undergone very different amounts of morphological change, the rate of amino acid substitution appears to have been roughly constant.
The number of sequence differences in β-hemoglobin chains correlates with the time to a common ancestor for many organism pairs:

Organism pair                 aa changes per 100 codons   Time to common ancestor (million years)
Human/monkey                  5                           30
Human/cattle                  18                          90
Marsupial/placental mammal    27                          130
Bird/mammal                   32                          250
Shark/bony vertebrate         65                          500

If evolutionary clocks exist, then two consequences can be expected:
1. The lines of descent leading from a common ancestor to all contemporary descendants should have similar rates of fixed mutations.
2. The rate of fixation in one gene relative to the rates of fixation in other genes stays the same throughout any line of descent.
These expectations were tested: the amino acid sequences of seven proteins in 17 vertebrate taxa were examined. An evolutionary clock was “calibrated” using known and accepted dates of divergence, and the temporal length of each line of descent was calculated.
The number of nucleotide substitutions that occurred over a given length of time was compared among the 17 taxa.
The results suggested that the rate at which the individual proteins changed varied significantly among the different lines of descent. This indicated that molecular change was not uniform for a specific protein or taxon. However, when nucleotide substitutions are averaged over all seven proteins for each branching point in the phylogeny, the rate of molecular change over time is approximately constant.
This procedure calibrates an evolutionary clock for these particular proteins and allows us to link change to time. The need to average the nucleotide substitutions among the different genes to achieve a linear relationship between time and nucleotide substitutions tells us that no single evolutionary clock applies to every nucleotide sequence. Why?
Perhaps selection acts with different intensity on different genes, fixing some mutations more readily than others.
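Calibrating a clock amounts to relating counts of change to known divergence times. As a sketch, the Python code below fits a least-squares line to the β-hemoglobin table given earlier in this section, simply restating its five rows. The fit estimates an average rate and shows how closely the points follow a straight line. This is a plain linear fit chosen for illustration, not the averaging procedure used in the seven-protein study. The function name is hypothetical.

```python
# Rows from the beta-hemoglobin table in the notes:
# (organism pair, time to common ancestor in millions of years, aa changes per 100 codons)
BETA_HEMOGLOBIN = [
    ("Human/monkey", 30, 5),
    ("Human/cattle", 90, 18),
    ("Marsupial/placental mammal", 130, 27),
    ("Bird/mammal", 250, 32),
    ("Shark/bony vertebrate", 500, 65),
]


def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept and Pearson correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


times = [t for _, t, _ in BETA_HEMOGLOBIN]
changes = [c for _, _, c in BETA_HEMOGLOBIN]
slope, intercept, r = fit_line(times, changes)
# slope is about 0.12 aa changes per 100 codons per million years; r is about 0.98
print(f"slope={slope:.3f}, intercept={intercept:.1f}, r={r:.2f}")
```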