codon usage




Page 1: Codon Usage

1

Codon Usage

Dan Graur

Page 2: Codon Usage

2

Because of the degeneracy of all genetic codes, 18-20 amino acids are encoded by more than one codon (2, 3, 4, or 6).
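The degeneracy classes can be checked by counting codons per amino acid. The sketch below assumes the standard genetic code only (the slide speaks of all genetic codes, which is why it gives a range of 18-20); the names `GENETIC_CODE` and `degeneracy_by_amino_acid` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (an assumption: variant codes differ slightly).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy_by_amino_acid(code=GENETIC_CODE):
    """Return {amino acid: number of synonymous codons}, ignoring stops."""
    return dict(Counter(aa for aa in code.values() if aa != "*"))


degeneracy = degeneracy_by_amino_acid()
# How many amino acids fall in each degeneracy class (1, 2, 3, 4, or 6).
class_sizes = dict(Counter(degeneracy.values()))
# Amino acids encoded by more than one codon.
multi_codon = sum(1 for n in degeneracy.values() if n > 1)

print(class_sizes, multi_codon)
```

In the standard code, 18 amino acids have more than one codon, and the class sizes for 2, 3, 4, and 6 codons are 9, 1, 5, and 3.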

Page 3: Codon Usage

3

If synonymous mutations are strictly neutral, synonymous codons should be used randomly, at frequencies dictated by genomic GC content.

Page 4: Codon Usage

4

Codon-usage bias

Page 5: Codon Usage

5

Measures of codon-usage bias

Page 6: Codon Usage

6

The relative synonymous codon usage (RSCU) is the number of times a codon appears in a gene divided by the number of expected occurrences under equal codon usage.

$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n}\sum_{j=1}^{n} X_j}$$

where n = number of synonymous codons (1 ≤ n ≤ 6) for the amino acid under study, and $X_i$ = number of occurrences of codon i.

If the synonymous codons of an amino acid are used with equal frequencies, their RSCU values will equal 1.
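As a minimal sketch of the RSCU definition, the function below takes the counts for one synonymous family and divides each by the mean count. The function name `rscu` and the example counts are illustrative assumptions; the counts must list every synonymous codon of the amino acid, including those never observed.

```python
def rscu(counts):
    """RSCU for one amino acid.

    counts maps each synonymous codon to its number of occurrences X_i.
    Every synonymous codon must be present, with 0 if it was not observed.
    """
    n = len(counts)                 # number of synonymous codons
    total = sum(counts.values())
    if total == 0:
        raise ValueError("amino acid does not occur in the gene")
    expected = total / n            # occurrences under equal codon usage
    return {codon: x / expected for codon, x in counts.items()}


# Illustrative counts only (not real gene data).
valine_counts = {"GUU": 60, "GUC": 2, "GUA": 28, "GUG": 10}
valine_rscu = rscu(valine_counts)
print(valine_rscu)
```

By construction, the RSCU values of a family sum to n, so values above 1 mark codons used more often than expected under equal usage.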

Page 7: Codon Usage

7

The codon adaptation index (CAI) measures the degree to which genes use preferred codons.

We first compile a table of RSCU values for highly expressed genes. From this table, it is possible to identify the codons that are most frequently used for each amino acid. The relative adaptiveness of a codon ($w_i$) is computed as

$$w_i = \frac{\mathrm{RSCU}_i}{\mathrm{RSCU}_{\max}}$$

where $\mathrm{RSCU}_{\max}$ = the RSCU value for the most frequently used codon for an amino acid.

Page 8: Codon Usage

8

The CAI value for a gene is calculated as the geometric mean of $w_i$ values for all the codons used in that gene:

$$\mathrm{CAI} = \left(\prod_{i=1}^{L} w_i\right)^{1/L}$$

where L = number of codons.
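The two CAI steps (relative adaptiveness, then geometric mean) can be sketched as follows. The function names and the reference RSCU values are illustrative assumptions, not data from highly expressed genes. Two choices here are specific to this sketch: codons missing from the reference table are skipped, and a gene containing a codon with w = 0 gets a CAI of 0.

```python
import math


def relative_adaptiveness(families):
    """Compute w_i = RSCU_i / RSCU_max within each synonymous family.

    families is an iterable of {codon: RSCU} dicts, one per amino acid,
    compiled from a reference set of highly expressed genes.
    """
    w = {}
    for family in families:
        rscu_max = max(family.values())
        for codon, value in family.items():
            w[codon] = value / rscu_max
    return w


def cai(codons, w):
    """Geometric mean of w over the codons of a gene.

    Codons absent from w are skipped (a choice made in this sketch).
    """
    values = [w[c] for c in codons if c in w]
    if not values:
        raise ValueError("no codon of the gene appears in the w table")
    if min(values) == 0.0:
        return 0.0  # a single zero makes the geometric mean zero
    # Sum of logs avoids underflow of the product for long genes.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical reference RSCU values for two amino acids.
reference = [
    {"GUU": 2.4, "GUC": 0.08, "GUA": 1.12, "GUG": 0.4},
    {"UUU": 0.34, "UUC": 1.66},
]
w = relative_adaptiveness(reference)
print(cai(["GUU", "UUC", "GUA"], w))
```

A gene that uses only the most frequent codon of each family has w = 1 at every position and therefore CAI = 1.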

Page 9: Codon Usage

9

The effective number of codons (ENC)

$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$

where $F_i$ (i = 2, 3, 4, or 6) is the average probability that two randomly chosen codons for an amino acid with i codons will be identical.

ENC values range from 20 (the number of amino acids), which means that the bias is at a maximum and only one codon is used from each synonymous-codon group, to 61 (the number of sense codons), which indicates no codon-usage bias.
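The two limits of ENC can be verified numerically. In this sketch, `homozygosity` is simply the probability that two codons drawn with replacement are identical, the sum of squared codon frequencies. That is an assumption made for illustration: published estimators add a finite-sample correction that is omitted here. Both function names are hypothetical.

```python
def homozygosity(counts):
    """Probability that two codons drawn with replacement are identical.

    counts lists the occurrences of each synonymous codon of one amino acid.
    """
    total = sum(counts)
    return sum((x / total) ** 2 for x in counts)


def enc(f2, f3, f4, f6):
    """ENC from the average F of each degeneracy class (2, 3, 4, 6)."""
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


# Equal usage of all synonymous codons: F_i = 1/i.
no_bias = enc(
    homozygosity([5, 5]),
    homozygosity([5, 5, 5]),
    homozygosity([5, 5, 5, 5]),
    homozygosity([5, 5, 5, 5, 5, 5]),
)

# Only one codon used per amino acid: F_i = 1.
max_bias = enc(
    homozygosity([10, 0]),
    homozygosity([10, 0, 0]),
    homozygosity([10, 0, 0, 0]),
    homozygosity([10, 0, 0, 0, 0, 0]),
)

print(no_bias, max_bias)
```

The coefficients 9, 1, 5, and 3 are the numbers of amino acids with 2, 3, 4, and 6 codons in the standard code, and the constant 2 accounts for the two single-codon amino acids.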

Page 10: Codon Usage

10

                            Escherichia coli     Saccharomyces cerevisiae
Amino acid       Codon      High      Low        High      Low
Leucine          UUA          1%      20%          8%      25%
                 UUG          1%      15%         89%      25%
                 CUU          2%      12%          0%      12%
                 CUC          3%      11%          0%       9%
                 CUA          1%       5%          3%      15%
                 CUG         92%      37%          0%      14%
Valine           GUU         60%      27%         52%      28%
                 GUC          2%      25%         48%      19%
                 GUA         28%      16%          0%      30%
                 GUG         10%      32%          0%      23%
Isoleucine       AUU         16%      46%         42%      43%
                 AUC         84%      37%         58%      22%
                 AUA          0%      17%          0%      35%
Phenylalanine    UUU         17%      67%         10%      69%
                 UUC         83%      33%         90%      31%

Page 11: Codon Usage

11

Universal and species-specific patterns of codon usage

Page 12: Codon Usage

12

The genome hypothesis

All genes in a genome tend to have the same coding strategy. That is, they employ the codon catalog similarly and show similar choices between synonymous codons.

Different taxa have different coding strategies.

Richard Grantham

Page 13: Codon Usage

13

Are there universal preferences?

There are NO universally preferred or universally avoided codons.

There may be some universal preferences and avoidances as far as codon neighbor pairs are concerned. For example, the pair NNG GNN, where N stands for all four possible nucleotides, seems to be preferred, while the pair NNG CNN seems to be avoided.

Page 14: Codon Usage

14

Biases in synonymous codon usage can be caused by:

(1) mutational biases

(2) selection favoring preferred codons

(3) purifying selection against disfavored codons

Page 15: Codon Usage

15

Mutational Biases

If unequal codon usage is due to biases in mutation patterns, then the magnitude and the direction of the bias are expected to be more or less the same for all codon families and for all genes, regardless of function or expression level.

Page 16: Codon Usage

16

Mutational Biases

Let us assume that the mutation pattern in an organism tends to result in AT-rich sequences. Under such a mutational regime, all four-fold degenerate codon families are expected to exhibit a preference for codons ending in A or T. Thus, the preferred codons for valine should be GTA and GTT, and the preferred codons for arginine should be CGA and CGT.

Page 17: Codon Usage

17

Mutational Biases

Some bacterial genomes (e.g., Mycoplasma capricolum) exhibit this type of consistent codon-usage bias.

Codon family    Amino acid    T/A in 3rd (%)    G/C in 3rd (%)
CU              LEU                93                 7
GU              VAL                95                 5
UC              SER                98                 2
CC              PRO                95                 5
AC              THR                98                 2
GC              ALA                94                 6
CG              ARG               100                 0
GG              GLY                95                 5
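A table of this kind can be tallied directly from a coding sequence. The sketch below is a minimal illustration, assuming an in-frame coding sequence in the DNA or RNA alphabet; the function name and the toy sequence are hypothetical. The eight two-letter prefixes are the four-codon families listed in the table.

```python
from collections import defaultdict

# Four-codon families keyed by their first two bases (DNA alphabet).
FOURFOLD_FAMILIES = {
    "CT": "Leu", "GT": "Val", "TC": "Ser", "CC": "Pro",
    "AC": "Thr", "GC": "Ala", "CG": "Arg", "GG": "Gly",
}


def third_position_at_percent(cds):
    """Return {prefix: % of codons ending in A or T} per four-codon family.

    Families that do not occur in the sequence are left out of the result.
    """
    cds = cds.upper().replace("U", "T")
    at_count = defaultdict(int)
    total = defaultdict(int)
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        prefix = codon[:2]
        if prefix in FOURFOLD_FAMILIES:
            total[prefix] += 1
            if codon[2] in "AT":
                at_count[prefix] += 1
    return {p: 100.0 * at_count[p] / total[p] for p in total}


# Toy sequence: three valine codons (GTA, GTT, GTG) and one alanine (GCC).
result = third_position_at_percent("GTAGTTGTGGCC")
print(result)
```

Under a purely mutational explanation, the percentages would be expected to be similar across all eight families and across genes.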

Page 18: Codon Usage

18

Mutational Biases

In Escherichia coli, there is no such consistent bias.

Codon family    Amino acid    T/A in 3rd (%)    G/C in 3rd (%)
CU              LEU                20                80
GU              VAL                42                58
UC              SER                49                51
CC              PRO                36                64
AC              THR                34                66
GC              ALA                39                61
CG              ARG                47                53
GG              GLY                45                65

Page 19: Codon Usage

19

(2) positive selection favoring preferred codons

(3) purifying selection against disfavored codons


Page 22: Codon Usage

22

(2) positive selection is expected to accelerate the rate of substitution

(3) purifying selection is expected to slow down the rate of substitution

There is a negative correlation between codon usage bias and rate of synonymous substitution.


Page 24: Codon Usage

24

Two selective factors have been convincingly invoked to explain codon usage bias.

(1) translation optimization

(2) folding stability of the mRNA

Page 25: Codon Usage

25

The translation efficiency of a codon is related to the relative quantity of tRNA molecules that recognize the particular codon.

Page 26: Codon Usage

26

Page 27: Codon Usage

27

Codon usage is related to translation efficiency

Page 28: Codon Usage

28

Toshimichi Ikemura

Page 29: Codon Usage

29

Is codon usage bias uniform along the length of the mRNA?

For many highly expressed genes, codons recognized by low-abundance tRNAs are overrepresented at the 5′ end of the coding region. This pattern suggests that ribosomes translate more slowly over the first 50 codons or so (the so-called ramp stage) and then translate the remainder of the mRNA at full speed.

Page 30: Codon Usage

30

What purpose does the ramp serve in translation?

Slowing translation elongation immediately after initiation generates more uniform spacing between ribosomes further down the mRNA, which prevents ribosome congestion, translation stalling, and termination.

Page 31: Codon Usage

31

Another potential role for the ramp involves protein folding. The length of the ramp corresponds well to the length of the polypeptide needed to fill the exit tunnel of the ribosome, so the nascent peptide chain should emerge from the ribosome as it transitions from the slow ramp stage to the fast stage of elongation. This raises the possibility that the slowdown in the ramp might increase the fraction of correctly folded product.

Page 32: Codon Usage

32

Folding stability of the mRNA

RNA is synthesized as single strands of ribonucleotides.

Intrastrand base pairing will produce two-dimensional (2D) structures.

Page 33: Codon Usage

33

Folding stability of the mRNA

The stability of a secondary structure is quantified as the amount of free energy released or used in forming it. A positive free energy means that work is required to form the structure; a negative free energy means that stored work is released when it forms. The more negative the free energy of a structure, the more likely that structure is to form, because more stored energy is released.

Page 34: Codon Usage

34

Folding stability of the mRNA

Free energies are additive, so one can determine the total free energy of a secondary structure by adding all the component free energies.

The local folding energy, ΔG, was calculated along the mRNA sequence using a sliding window 30 nucleotides (nt) long, moving downstream from the start codon in steps of 10 nt (for a total of 13 windows). To quantify the deviation from expectation given a gene's amino-acid sequence and codon-usage bias, ΔG was also calculated for 1000 permuted mRNA sequences, obtained by randomly reshuffling synonymous codons within each gene. A Z-score, ZΔG, was then calculated by comparing the ΔG of the real mRNA segment with the distribution of ΔG values of the permuted sequences. ZΔG measures the extent to which local mRNA stability deviates from expectation. A positive ZΔG means that local mRNA stability is reduced, and a negative ZΔG means that it is increased. For each window, a genome-wide mean ZΔG was obtained by averaging the corresponding ZΔG values over all genes in a genome.

Page 35: Codon Usage

35

Folding stability of the mRNA

ΔG = Local free energy of a sequence.

Expectation = mean local free energy of 1000 permuted sequences.

ZΔG = A measure of the extent to which a local ΔG value deviates from expectation.

Page 36: Codon Usage

36

Folding stability of the mRNA

A positive ZΔG means that local mRNA stability is reduced.

A negative ZΔG means that local mRNA stability is increased.
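The windowing, synonymous-codon shuffling, and Z-score steps can be sketched as below. This is an illustration under stated assumptions, not the method used in the cited study: `toy_energy` is a placeholder that is NOT a real folding energy (a real analysis would call an RNA-folding program at that point), the standard genetic code is assumed, the population standard deviation is an arbitrary choice, and all function names are hypothetical.

```python
import random
import statistics
from itertools import product

BASES = "TCAG"
# Standard genetic code, DNA alphabet, codon order TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def windows(seq, size=30, step=10, count=13):
    """Up to `count` windows of `size` nt, starting at the start codon."""
    starts = range(0, step * count, step)
    return [seq[i:i + size] for i in starts if i + size <= len(seq)]


def shuffle_synonymous(cds, rng):
    """Permute codons among the positions that encode the same amino acid."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    positions = {}
    for idx, codon in enumerate(codons):
        positions.setdefault(GENETIC_CODE[codon], []).append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def toy_energy(window):
    """PLACEHOLDER energy: more G/C gives a more negative value."""
    return -float(sum(base in "GC" for base in window))


def z_score(real, null):
    """(real - mean of null) / standard deviation of null; 0 if no spread."""
    sd = statistics.pstdev(null)
    return (real - statistics.mean(null)) / sd if sd > 0 else 0.0


def z_scores(cds, energy=toy_energy, n_perm=1000, seed=0):
    """One Z-score per window, real sequence versus permuted sequences."""
    rng = random.Random(seed)
    real = [energy(w) for w in windows(cds)]
    permuted = [
        [energy(w) for w in windows(shuffle_synonymous(cds, rng))]
        for _ in range(n_perm)
    ]
    return [
        z_score(dg, [p[k] for p in permuted]) for k, dg in enumerate(real)
    ]
```

Calling `z_scores(cds)` on an in-frame coding sequence of at least 150 nt returns 13 values, one per window. With a real energy function, a positive value indicates a window that is less stable than expected from the gene's own amino-acid sequence and codon usage.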

Page 37: Codon Usage

37

Codon arrangement along the mRNA

The arrangement of different codons along the length of the mRNA influences translation efficiency. In the autocorrelated pattern, when an amino acid recurs in the protein, there is a strong propensity to use the same codon the second time as that for the first occurrence of the amino acid. In the anticorrelated pattern, when an amino acid recurs in the protein, there is a strong tendency to use a different codon the second time from that used in the first occurrence of the amino acid.
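One simple way to look for these patterns is to count how often a recurring amino acid reuses its codon. The sketch below compares each occurrence with the most recent previous occurrence of the same amino acid, which is a choice made here for illustration. It assumes the standard genetic code and an in-frame DNA sequence; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, DNA alphabet, codon order TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
DEGENERACY = Counter(GENETIC_CODE.values())


def codon_reuse_fraction(cds):
    """Fraction of amino-acid recurrences that reuse the previous codon.

    Amino acids with a single codon are ignored, because they always reuse
    it. Returns None when no amino acid recurs.
    """
    last_codon = {}
    same = 0
    total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = GENETIC_CODE[codon]
        if aa == "*" or DEGENERACY[aa] == 1:
            continue
        if aa in last_codon:
            total += 1
            if codon == last_codon[aa]:
                same += 1
        last_codon[aa] = codon
    return same / total if total else None


print(codon_reuse_fraction("GTTGTTGTA"))
```

The raw fraction is only informative relative to a baseline, for example the same statistic computed on sequences whose synonymous codons have been shuffled: a value well above the baseline suggests an autocorrelated pattern, and a value well below it an anticorrelated one.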

Page 38: Codon Usage

38

Page 39: Codon Usage

39

Some organisms display biased codon usage; others do not.

Certain organisms, such as the bacterium Helicobacter pylori and humans, present little evidence of translational selection, while others, such as the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, and the fly Drosophila melanogaster, show a marked codon bias due to selection.

Page 40: Codon Usage

40

A possible explanation was suggested by dos Reis et al. (2004), who found that tRNA-gene redundancy and genome size interact in determining translational selection and codon-usage bias.

They suggested that an optimal combination of these factors exists for which the action of translational selection is maximal.

Page 41: Codon Usage

41

The magnitude of selection was maximal in genomes 1-30 Mb in size that contain 150-600 tRNA-specifying genes.

Both Helicobacter pylori and humans fall outside this range.

The genome of Helicobacter pylori contains only 36 tRNA-coding genes (only one tRNA gene is present in two copies).

The haploid genome size of humans is approximately 3,500 Mb.

Page 42: Codon Usage

Subramanian S. 2008. Genetics 178:2429-2432