
EEES 4150/5150, Evolution Sigler


Chapter 12 (Strickberger) Molecular Phylogenies and Evolution

METHODS FOR DETERMINING PHYLOGENY

In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern molecular methods of classification can overcome many of the pitfalls associated with traditional methods. Molecular methods compare antibodies, DNA, RNA, or amino acid sequences to give insight into the relatedness of organisms. For example, evolutionary changes are indicated by substitutions in nucleotide and amino acid sequences. Such comparisons are possible even when no morphological, behavioral, or ecological links are present.

Amino acid sequences

Comparing amino acid sequences in a homologous protein can provide information about the relationships between different organisms.
Hemoglobin was the first protein investigated.
Its structure includes a porphyrin (heme, which can reversibly bind oxygen) attached to a globin polypeptide chain (> 140 aa long).



Found in vertebrates (hemoglobin-like molecules are also found in plants, fungi, and invertebrates). The conserved nature of this molecule implies an early origin in evolution.

In a normal adult: four polypeptide chains, two α and two β (α2β2)
Also in adults (minor component): α2δ2
Fetal hemoglobin: α2γ2
Myoglobin and ε chains of hemoglobin are present in some tissues.

Differences in hemoglobin structures indicate two kinds of evolution.

1. Differing globin chains (α, β, γ, δ, ε) arose, producing the variety carried by a particular organism.

Why might differing structures of globin chains arise? Each is a variation on the same globin theme:
1. The chains are nearly the same length.
2. Sequence similarity at many positions.
3. Similar 3-D structure → similar function.
4. The β, γ, and δ genes are closely linked on chromosome 11.

What does this close linkage tell you about the evolution of hemoglobin?



2. Once produced, each globin chain followed its own evolutionary path.

This led to changes in its amino acid sequence in different species.



GENE DUPLICATION AND DIVERGENCE

The differing globins likely did not evolve independently, and then accidentally converge in sequence and function.

Question: How did differing molecules of such similar structure and function arise?
Linkage studies suggest that gene duplication of an original globin-type gene took place. Once copies of the gene were present, each could theoretically undergo independent evolution, leading to today's α, β, γ, δ, and ε chains.

Question: How could you tell which one came first? And which one is the most "recent"?

The temporal order of hemoglobin chain evolution can be deduced by comparing amino acid sequences.
The greater the sequence difference between two chains, the longer the time since their common ancestor (and, consequently, the greater the evolutionary distance).

We know:
The myoglobin chain differs most from the others (different amino acids at > 100 sites).
α differs from β at 77 sites.
β differs from γ at 39 sites but differs from δ at only 10 sites.

Implies that:
1. The myoglobin gene formed from an early duplication.
2. A later duplication separated the α and β genes.
3. β and δ represent the latest duplication.
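The ordering argument above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `duplication_order` are assumptions made for the example, and because the notes give the myoglobin figure only as "> 100", the value 101 is a placeholder lower bound rather than a measured count.

```python
# Pairwise amino acid differences quoted in the notes.
# The notes say only "> 100" for myoglobin; 101 is a placeholder lower bound.
differences = {
    ("myoglobin", "other globins"): 101,
    ("alpha", "beta"): 77,
    ("beta", "gamma"): 39,
    ("beta", "delta"): 10,
}


def duplication_order(diffs):
    """Return chain pairs sorted from the oldest split to the most recent.

    Assumes more differences means more time since the duplication.
    """
    return sorted(diffs, key=diffs.get, reverse=True)


order = duplication_order(differences)
for rank, pair in enumerate(order, start=1):
    print(rank, pair, differences[pair])
```

Sorting by difference count puts the myoglobin split first and the β/δ split last, matching the order of duplications listed above.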




How do gene duplications arise?
Unequal crossing over, which results in increased chromosome material on one of the chromosomes.

Some duplicated genes have evolved completely different functions although they share similar amino acid sequences.
The amino acid sequence of the α-lactalbumin protein is similar to that of lysozyme.
Further relationships between the two include: both are products secreted through ducts, and in both cases a sugar is the substrate.

Once genes are duplicated, how long does it take for divergence to occur?
It depends on the number of amino acid substitutions necessary to produce a differing function. However, even one amino acid change can drive drastic changes.

Example: A single amino acid substitution can convert lactate dehydrogenase (LDH) to malate dehydrogenase (MDH).
LDH is used in glycolysis (pyruvate → lactate).
MDH is used in the Krebs cycle (malate → oxaloacetate).
The change is generated by a substitution of arginine for glutamine at the 102nd polypeptide position.

Is it ironic that this simple amino acid change drives a major functional alteration, but between two closely related pathways?



DETERMINING MOLECULAR PHYLOGENIES

One can estimate the evolutionary similarity between two genes by determining the minimum number of mutations necessary to transform one amino acid in one sequence to another amino acid in the same position in the other sequence.

Minimizing the number of mutations necessary to drive a given change is referred to as “parsimony”.

Example: it is easier to explain that a phenylalanine codon (UUU) arose from a single nucleotide substitution in a serine codon (UCU → UUU) than from a triple nucleotide substitution in a glutamic acid codon (GAA → UUU).
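The codon comparison in this example amounts to counting the positions at which two codons differ. A minimal sketch follows; the function name `codon_distance` is hypothetical, and the sketch compares one specific codon pair at a time rather than searching all synonymous codons for the true minimum.

```python
def codon_distance(codon_a, codon_b):
    """Count the nucleotide positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)


# Serine (UCU) to phenylalanine (UUU): one substitution.
print(codon_distance("UCU", "UUU"))  # 1

# Glutamic acid (GAA) to phenylalanine (UUU): three substitutions.
print(codon_distance("GAA", "UUU"))  # 3
```

Under parsimony, the serine origin is preferred because it requires the smaller number of changes.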

Once parsimoniously determined evolutionary distances between species are established, phylogenetic relationships can be resolved.

Example: Assume the most parsimonious mutational distance between a protein in species A and B is 25, between A and C is 20, and between B and C is 30. Which two are most closely related?
Assign legs x, y, and z to represent the numbers of mutations responsible for their divergence.



The phylogenetic relationship can be portrayed as follows:

[Figure: three-species tree in which leg x leads to A, leg y leads to C, and leg z leads to B]

The length of the legs can be calculated:
The A – B distance (25) is 5 mutations less than the C – B distance (30). Therefore, x is 5 mutations less than y.

      y + z = 30     (C – B distance)
   − (x + z = 25)    (A – B distance)
      y − x = 5

Since y + x = 20 (A – C distance) and y − x = 5, we can determine y.

      y + x = 20
   + (y − x = 5)
      2y    = 25

y = 12.5, and by substitution, x = 7.5
z = A – B distance − x = 25 − 7.5 = 17.5, yielding:

[Figure: tree with estimated branch positions, x = 7.5 (to A), y = 12.5 (to C), z = 17.5 (to B)]
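The leg calculation above generalizes to any three pairwise distances. The sketch below solves the same three equations directly; the function name `three_point_legs` is an assumption for illustration and does not come from the text.

```python
def three_point_legs(d_ab, d_ac, d_bc):
    """Solve for the three leg lengths of a three-species tree.

    Each pairwise distance is the sum of the two legs it spans, so each
    leg is half of (the two distances touching that species minus the
    distance that does not).
    """
    leg_a = (d_ab + d_ac - d_bc) / 2
    leg_b = (d_ab + d_bc - d_ac) / 2
    leg_c = (d_ac + d_bc - d_ab) / 2
    return leg_a, leg_b, leg_c


# Distances from the example: A-B = 25, A-C = 20, B-C = 30.
# In the notes, x is the leg to A, z the leg to B, and y the leg to C.
x, z, y = three_point_legs(25, 20, 30)
print(x, y, z)  # 7.5 12.5 17.5
```

The legs add back up to the original distances (x + z = 25, x + y = 20, y + z = 30), which is a quick check on the arithmetic.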



Using mutational data, we can generate phylogenetic trees that display relationships among varying organisms.
These are calculated with complex mathematical algorithms.
Many trees are possible, but only one will represent the true phylogeny.

How do we know when a tree is the best one?
Bootstrapping calculates the proportion of trees in which a node appears when the data are repeatedly resampled with replacement.

Example: Resample the sequence positions that feature differences one hundred times to produce one hundred trees. In each resampling, some positions are omitted and some appear more than once.

Each resampling generates a tree in which a particular node (branch point) may or may not occur.

The bootstrap value is the frequency (% of the time) with which the same branch appears.
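The resampling idea can be sketched as follows. This is a deliberately reduced stand-in, not a real tree-building program: with only three taxa, the "tree" is summarized by which pair of sequences is closest, and the toy alignment, function names, and seed are all assumptions made for the example.

```python
import random


def count_differences(seq_a, seq_b, columns):
    """Count differing sites between two sequences over the given columns."""
    return sum(1 for i in columns if seq_a[i] != seq_b[i])


def closest_pair(alignment, columns):
    """Return the pair of taxa with the fewest differences.

    Ties are broken by listing order, which is a simplification.
    """
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(
        pairs,
        key=lambda p: count_differences(alignment[p[0]], alignment[p[1]], columns),
    )


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage of resampled alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        # Sample columns with replacement: some are omitted, some repeat.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        if closest_pair(alignment, columns) == pair:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment (not real data).
toy_alignment = {
    "A": "ACGTACGTACGT",
    "B": "ACGTTCGTTCGA",
    "C": "ACGAACGTACGT",
}

best = closest_pair(toy_alignment, range(12))
support = bootstrap_support(toy_alignment, best)
print(best, support)
```

A grouping that survives most resamplings gets a high bootstrap value; one that depends on a handful of sites gets a low value.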

NUCLEIC ACID PHYLOGENIES BASED ON DNA-DNA HYBRIDIZATIONS

Homology among genes from different organisms can be estimated by measuring the degree to which homologous nucleotide sequences in single strands pair up to form double strands. This is referred to as DNA reassociation.
DNA is isolated from two organisms (X and Y) and dissociated into single strands, then allowed to reassociate into X-Y hybrid double strands.



The reassociation process can be monitored by noting the A260 (absorbance at 260 nm) on a spectrophotometer.
As the DNA reassociates (becomes double stranded), the A260 will decrease.
The rate of reassociation is proportional to the homology of the DNA strands in the mixture.

The method can be used to compare simple or complex mixtures of DNA comprising billions of nucleotides.
DNA reassociation has been used to deduce the phylogenetic relationships of primates.



Note: paleontological evidence suggests that the lineages of Old World monkeys and apes-humans diverged 33 million years ago. A 7.7 °C change in the thermal stability of hybrid DNA from humans and Old World monkeys has also been observed.
This implies that every 1 °C shift in DNA thermal stability represents a 4.3 million year interval in the evolution of primates (33 / 7.7 ≈ 4.3).
This allows the placement of the bottom x-axis in Fig. 12-11.

DNA hybridization techniques have their detractors.
DNA hybridization compresses all divergence information into a single distance measurement.

NUCLEIC ACID PHYLOGENIES BASED ON RESTRICTION ENZYME SITES

Restriction enzymes recognize short (4 – 8 bp), specific nucleotide sequences and cleave the DNA at these sites.
Example: EcoRI recognizes the sequence:

5' – GAATTC – 3'
3' – CTTAAG – 5'

and will cut (restrict) the DNA between the G and the A.



Since the DNA from different species exhibits differing sequences, the placement of restriction sites will be species- (sometimes strain) specific.

Therefore, each species’ DNA will have fragments of characteristic length following enzyme restriction.

There are many restriction enzymes available with which DNA can be restricted.

Therefore, complex mixtures of DNA fragment lengths can be generated, each characteristic of a different species. These differences are restriction fragment length polymorphisms (RFLPs).
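How a single nucleotide difference changes the fragment pattern can be sketched with a simulated EcoRI digest. The two sequences below are invented for illustration, the function names are hypothetical, and the molecule is treated as linear (a circular molecule such as mitochondrial DNA would give one fewer fragment per digest).

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts between the G and the A of GAATTC


def cut_positions(sequence, site=ECORI_SITE, offset=CUT_OFFSET):
    """Return the index of every cut made at occurrences of the site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site=ECORI_SITE, offset=CUT_OFFSET):
    """Return the fragment lengths produced by digesting a linear sequence."""
    edges = [0] + cut_positions(sequence, site, offset) + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical sequences that differ at one nucleotide (A vs. C),
# which destroys the second EcoRI site in species_2.
species_1 = "AAGAATTCTTTTGAATTCAA"
species_2 = "AAGAATTCTTTTGACTTCAA"

print(fragment_lengths(species_1))  # [3, 10, 7]
print(fragment_lengths(species_2))  # [3, 17]
```

One substitution inside a recognition site removes a cut, so the two species yield different sets of fragment lengths from the same enzyme.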

Restriction maps can allow a comparison between species.
Example: mitochondrial DNA from humans and apes was restricted with 19 different enzymes. The enzymes cleaved the DNA at approximately 50 sites.
Comparison of site placement can yield evolutionary data.



In agreement with previously determined phylogenies, humans share many more restriction sites with chimpanzees and gorillas than with orangutans and gibbons. However, the branching order is unclear.

NUCLEIC ACID PHYLOGENIES BASED ON NUCLEOTIDE SEQUENCE COMPARISONS AND HOMOLOGIES

The most accurate method for determining phylogenetic relationships between different organisms is the direct comparison of DNA sequences of the same (or homologous) gene.
Databases archive volumes of sequence data, e.g., GenBank (NCBI).

Sequence information has suggested several evolutionary events:
1. Extensive horizontal gene transfer between genomes.
2. A considerable amount of gene duplication (e.g., 25% of the Bacillus subtilis genome).
3. Many Archaea protein sequences are more similar to those of Bacteria than to those of eukaryotes.
4. Proteins used in replication, transcription, and translation show greater similarity between Archaea and eukaryotes.
5. 50% of genes have no known function.
6. Roughly 480 genes might be the minimum required for life (based on sequencing of the Mycoplasma genitalium genome).

Primary targets for sequencing analyses to determine phylogeny are the ribosomal RNA genes.
The gold standard today is the 16S rRNA gene (prokaryotes) and the 18S rRNA gene (eukaryotes); however, Strickberger focuses on the 5S rRNA gene.

Why the rRNAs?
1. Universally distributed in all organisms.
   Component of the protein synthesis machinery = similar function in different organisms.
2. The gene features regions that are highly conserved as well as regions that are variable.
   Constant secondary structure.
   Allows for alignment and comparison of DNA sequences.
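Once two sequences are aligned, the simplest comparison is the proportion of aligned positions that differ. The sketch below is one such measure under stated assumptions: the sequences are hypothetical, the function name `p_distance` is chosen for the example, positions containing a gap character are skipped, and no correction for multiple substitutions at the same site is applied.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Hypothetical aligned fragments (not real rRNA data).
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```

Conserved regions make it possible to line the sequences up in the first place; the variable regions are where differences of this kind accumulate.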

RATES OF MOLECULAR CHANGE: EVOLUTIONARY CLOCKS

Inherent in all phylogenies is the idea that evolutionary differences arose due to mutational differences. The greater the number of differences, the greater the evolutionary distance between organisms.
In Figure 12-11, the evolutionist used a time scale (1 °C of DNA thermal stability change reflected 4.3 million years of evolutionary time). In doing this, they assumed that mutations occurred at a fixed rate over time.
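The calibration behind that time scale is a single division, sketched below. The 33 million years and 7.7 °C come from the notes; the function names and the 2.0 °C query value are hypothetical and serve only to show how the fixed-rate assumption converts a thermal stability difference into time.

```python
DIVERGENCE_MYA = 33.0  # Old World monkey / ape-human split (from the notes)
DELTA_TM_C = 7.7       # thermal stability change, in degrees C (from the notes)


def million_years_per_degree(divergence_mya, delta_tm_c):
    """Calibrate the clock: million years represented by a 1 degree C shift."""
    return divergence_mya / delta_tm_c


def estimate_divergence(delta_tm_c, rate):
    """Convert a thermal stability difference into time, assuming a fixed rate."""
    return delta_tm_c * rate


rate = million_years_per_degree(DIVERGENCE_MYA, DELTA_TM_C)
print(round(rate, 1))  # 4.3

# Hypothetical query: a 2.0 degree C difference between two other primates.
print(round(estimate_divergence(2.0, rate), 1))  # 8.6
```

Every date read off the axis inherits the assumption that this one calibration rate holds across all the lineages compared.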



This implies that an "evolutionary clock" determines the rate at which many mutations occur.

Evidence can be found in the amino acid sequence of hemoglobin.
Compare the α-hemoglobin amino acid sequence of several different organisms to that of sharks. How many differences are detected?

Organism      Differences from shark
Carp          85
Salamander    84
Chicken       83
Mouse         79
Human         79

The results show that although considerable morphological changes have occurred in these organisms, their α chains differ from the shark sequence by similar amounts, as expected if mutations accumulated at a roughly constant rate.

The number of sequence differences in β-hemoglobin chains correlates with the time to a common ancestor for many organism pairs:

Organism pair                 aa changes per 100 codons   Time to common ancestor (million years)
Human/monkey                  5                           30
Human/cattle                  18                          90
Marsupial/placental mammal    27                          130
Bird/mammal                   32                          250
Shark/bony vertebrate         65                          500

If evolutionary clocks exist, then two consequences can be expected:

1. The lines of descent leading from a common ancestor to all contemporary descendants should have similar rates of fixed mutations.

2. The proportional rate of fixation that occurs in one gene relative to the rates of fixation in other genes stays the same throughout any line of descent.

These expectations were tested:
The amino acid sequences of seven proteins in 17 vertebrate taxa were examined.
An evolutionary clock was "calibrated" using known and accepted dates of divergence.
The temporal length of each line of descent was calculated.



The number of nucleotide substitutions that occurred over a given length of time was compared among the 17 taxa.

The results suggested that the rate at which the individual proteins changed varied significantly among the differing lines of descent.
This indicated that molecular changes were not uniform for a specific protein or taxon.

However, when nucleotide substitutions are averaged over all seven proteins for each branching point in the phylogeny, the rate of molecular change over time is constant.

This procedure calibrates an evolutionary clock for these particular proteins and allows us to link change to time.
The need to average the nucleotide substitutions among the different genes in order to achieve a linear relationship between time and nucleotide substitutions tells us that no one evolutionary clock applies to every nucleotide sequence. Why?

Perhaps selection intensity fixes some mutations more securely than others.
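As a rough check on how uniform the clock is, the β-hemoglobin figures given earlier in this section can be converted into rates. The numbers are those quoted in the notes; the variable names are assumptions, and dividing changes by time is only a first approximation that ignores multiple substitutions at the same site.

```python
# (organism pair, aa changes per 100 codons, time to common ancestor in Myr)
beta_globin = [
    ("Human/monkey", 5, 30),
    ("Human/cattle", 18, 90),
    ("Marsupial/placental mammal", 27, 130),
    ("Bird/mammal", 32, 250),
    ("Shark/bony vertebrate", 65, 500),
]

# Changes per 100 codons per million years for each pair.
rates = {pair: changes / time for pair, changes, time in beta_globin}

for pair, rate in rates.items():
    print(f"{pair:28s} {rate:.3f}")

slowest = min(rates, key=rates.get)
fastest = max(rates, key=rates.get)
print("slowest:", slowest, "fastest:", fastest)
```

The rates fall between roughly 0.13 and 0.21 changes per 100 codons per million years: the same order of magnitude, but not identical, which is consistent with the point that a clock is an approximation that differs among genes and lineages.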
