Best presentation on codon bias and its application

MASTER’S SEMINAR ON

CODON USAGE BIAS AND ITS UTILIZATION

Speaker: Deshmukh Abasaheb, ID: 49489

Content:

• Introduction
• Theory behind the codon bias
• Pattern of codon usage bias in species
• Factors affecting codon bias
• Effect of codon usage bias
• Measurement of codon usage bias
• Application of codon usage bias
• Conclusion

CODON USAGE BIAS

Codon usage bias is the difference in the frequency of occurrence of synonymous codons in coding DNA or in mRNA transcripts.

What is a codon?

What is a synonymous codon?

Why do synonymous codons differ in their frequency of occurrence?

GENETIC CODE

• The genetic code is the set of rules by which information encoded in mRNA sequences is converted into proteins (amino acid sequences) by living cells

• A codon is a triplet of bases that encodes a particular amino acid

• As there are four bases, there are 64 different codon combinations (4 x 4 x 4 = 64)

• The order of the codons determines the amino acid sequence for a protein

• The coding region normally starts with a START codon (AUG) and terminates with one of three STOP codons (UAA, UGA or UAG).

The Genetic Code

• DEGENERATE
• ORDERED
• COMMA-FREE
• NEARLY UNIVERSAL
• COMPOSED OF NUCLEOTIDE TRIPLETS
• NON-OVERLAPPING

Because of the degeneracy of the genetic code, 18 of the 20 amino acids are encoded by more than one codon (2, 3, 4, or 6); only methionine and tryptophan have a single codon.
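The counts above can be checked with a short sketch. It assumes the standard genetic code written as a 64-letter string in T, C, A, G base order; the names `CODON_TABLE`, `family_sizes` and `size_distribution` are illustrative and not part of the seminar.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def family_sizes(table=CODON_TABLE):
    """Map each amino acid to its number of synonymous codons (stops excluded)."""
    return Counter(aa for aa in table.values() if aa != "*")


def size_distribution(table=CODON_TABLE):
    """Count how many amino acids have 1, 2, 3, 4 or 6 codons."""
    return dict(sorted(Counter(family_sizes(table).values()).items()))


print(len(CODON_TABLE))      # 64 codons in total
print(size_distribution())   # {1: 2, 2: 9, 3: 1, 4: 5, 6: 3}
```

The distribution (9 twofold, 1 threefold, 5 fourfold and 3 sixfold amino acids) gives the same coefficients that appear in the ENC formula later in the deck.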


DEGENERACY OF CODON

• Nondegenerate sites: codon positions where mutations always result in amino acid substitutions. Example: the first position of TTT (phenylalanine), CTT (leucine), ATT (isoleucine) and GTT (valine).

• Twofold degenerate sites: codon positions where two of the four nucleotides specify the same amino acid. Example: GAT and GAC code for aspartic acid, whereas GAA and GAG code for glutamic acid.

• Threefold degenerate sites: there is only one, the 3rd position of the isoleucine codons. ATT, ATC and ATA all encode isoleucine, but ATG encodes methionine.

• Fourfold degenerate sites: codon positions where changing the nucleotide to any of the 3 alternatives has no effect on the amino acid.

• Five amino acids are each encoded by 4 codons that differ only in the third position. These third positions are “fourfold degenerate” sites.
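As a minimal sketch of the site classes above, the function below counts how many of the four bases at a codon position leave the amino acid unchanged. The standard genetic code is assumed, and the helper name `site_degeneracy` is hypothetical. This simple count is only one way to define degeneracy; more refined schemes exist.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_degeneracy(codon, position, table=CODON_TABLE):
    """Return how many of the four bases at `position` (0, 1 or 2) keep the amino acid.

    1 = nondegenerate, 2 = twofold, 3 = threefold, 4 = fourfold.
    """
    amino_acid = table[codon]
    unchanged = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if table[variant] == amino_acid:
            unchanged += 1
    return unchanged


print(site_degeneracy("TTT", 0))  # first position of a Phe codon
print(site_degeneracy("GAT", 2))  # third position of an Asp codon
print(site_degeneracy("ATT", 2))  # third position of an Ile codon
print(site_degeneracy("GTT", 2))  # third position of a Val codon
```

The four calls correspond to the four examples in the list: nondegenerate, twofold, threefold and fourfold, in that order.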

Codon-usage bias

Codon bias is the probability that a given codon will be used to code for an amino acid over a different codon which codes for the same amino acid.

Brian Clark (1970)


Theory for codon usage bias

Biases in synonymous codon usage can be explained by:
(1) mutational bias theory
(2) selection favoring preferred codon theory

Mutational bias theory (Osawa et al., 1988)

If the unequal codon usage is due to biases in mutation patterns, then the magnitude and direction of the bias are expected to be more or less the same for all codon families and for all genes, regardless of function or expression level. Suppose the mutation pattern in an organism tends to produce AT-rich sequences. Under such a mutational regime, all fourfold degenerate codon families are expected to prefer codons ending in A or T. Thus, the preferred codons for valine should be GTA and GTT, and the preferred codons for arginine should be CGA and CGT.

Mutational biases can arise from:

• Chemical decay of nucleotide bases (Kaufmann and Paules, 1996)

• Non-uniform DNA repair

• Non-random replication errors

Mutational biases

Some bacterial genomes (e.g., Mycoplasma capricolum) exhibit this type of consistent codon-usage bias. Mycoplasma capricolum shows an A+T-rich mutational pattern.

Codon family   Amino acid   A/T in 3rd (%)   G/C in 3rd (%)
CU             Leu          93               7
GU             Val          95               5
UC             Ser          98               2
CC             Pro          95               5
AC             Thr          98               2
GC             Ala          94               6
CG             Arg          100              0
GG             Gly          95               5

Because most amino acids allow synonymous substitutions that change GC content at the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third-position GC content.

For individual amino acids as well, usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias.

Principal findings: Two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias.

Moreover, the usage of some codons appears to be a nonlinear function of GC bias. These patterns are described by a continuous-time Markov chain model of GC-biased synonymous substitution.

This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine.

The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human, respectively.

When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human, respectively.

(Oct. 2010)
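GC3, as used above, can be computed directly from a coding sequence. The sketch below is a simplified version that counts every codon, including Met, Trp and stop codons, which carry no synonymous choice; stricter definitions restrict the count to synonymous third positions. The function name `gc3` is an assumption made for illustration.

```python
def gc3(cds):
    """Fraction of codons in an in-frame coding sequence whose third base is G or C."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    if not third_bases:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Codons ATG, GCC, AAA, TAA end in G, C, A, A.
print(gc3("ATGGCCAAATAA"))  # 0.5
```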

Theory of selection favoring preferred codon

Two selective factors have been convincingly invoked to explain codon usage bias:
(1) translation optimization
(2) folding stability of the protein

Selection for translation efficiency

The correlation between codon frequency and the abundance of cognate tRNAs was a compelling argument for natural selection choosing between synonymous codons.

The three main parameters that affect translation efficiency are (Kurland 1991):
(i) the maximum turnover of ribosomes,
(ii) the efficiency of aminoacyl-tRNA matching,
(iii) ternary complex concentrations.

Ikemura (1981b) described an optimal codon as one that satisfies certain rules of codon choice; the predominant rule is that optimal codons are translated by the most abundant cognate tRNA.

Codons that are recognized by the major tRNAs are translated 3–6 fold faster than their synonyms (Sorensen, Kurland and Pedersen 1989).

The translation efficiency of a codon is related to the relative quantity of tRNA molecules that recognize the particular codon. - Ikemura (1981b)

Selection for translation accuracy

The rate of initial codon recognition can vary up to 25-fold, with optimal codons being recognised most rapidly, which results in faster translation (Curran and Yarus 1988).

Since the number of ribosomes is often limiting, the consequence of faster translation is that ribosomes spend less time on each mRNA, which increases the number of free ribosomes and the number of mRNAs translated per ribosome (Ikemura 1985).

It has been estimated that in E. coli the non-optimal Asn codon AAU can be mistranslated eight to ten times more often than its optimal synonym AAC (Parker et al. 1983; Precup and Parker 1987).

Similarly, in some contexts the non-optimal codon UUU (Phe) is frequently misread as a leucine codon (Parker et al. 1992).

Is codon usage bias uniform along the length of the mRNA?

For many highly expressed genes, codons recognized by low-abundance tRNAs are overrepresented in the 5’ part of the coding region. This pattern suggests that ribosomes translate more slowly over the initial 50 codons or so (the so-called ramp stage) and then translate the remainder of the mRNA at full speed (Tuller et al. 2010).

What purpose does the ramp play in translation?

Slower elongation at the start, followed by faster elongation, effectively generates more uniform spacing between ribosomes further down the mRNA.

This prevents ribosome congestion, translation stalling and termination.

Selection for stable protein folding

Another potential role for the ramp involves protein folding.

The length of the ramp corresponds well to the length of the polypeptide needed to fill the exit tunnel of the ribosome, so the nascent peptide chain should emerge from the ribosome as it transitions from the slow ramp stage to the fast stage of elongation.

This raises the possibility that the slowdown in the ramp might increase the fraction of correctly folded product.

Codon arrangement along the mRNA

The arrangement of different codons along the length of the mRNA influences translation efficiency. In the autocorrelated pattern, when an amino acid recurs in the protein, there is a strong propensity to use the same codon the second time as for the first occurrence of the amino acid. In the anticorrelated pattern, when an amino acid recurs in the protein, there is a strong tendency to use a different codon the second time from that used in the first occurrence of the amino acid.
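One rough way to look for the autocorrelated or anticorrelated pattern is to measure how often a recurring amino acid reuses the codon from its previous occurrence. The sketch below assumes the standard genetic code; `codon_reuse_fraction` is a hypothetical helper, not a published statistic. Whether a value counts as high or low depends on the codon frequencies of the gene, so it should be compared with shuffled versions of the same sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_reuse_fraction(cds, table=CODON_TABLE):
    """Fraction of recurrences of an amino acid that reuse its previously used codon."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    last_codon = {}  # amino acid -> codon used at its previous occurrence
    same = total = 0
    for codon in codons:
        amino_acid = table[codon]
        # Stops, Met and Trp offer no synonymous choice, so they are skipped.
        if amino_acid in "*MW":
            continue
        if amino_acid in last_codon:
            total += 1
            same += codon == last_codon[amino_acid]
        last_codon[amino_acid] = codon
    return same / total if total else float("nan")


# Leu: CTG -> CTG (same), CTG -> CTT (different); Lys: AAA -> AAG (different).
print(codon_reuse_fraction("CTGAAACTGAAGCTT"))
```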


Amino acid       Codon   Escherichia coli     Saccharomyces cerevisiae
                         High      Low        High      Low
Leucine          UUA     1%        20%        8%        25%
                 UUG     1%        15%        89%       25%
                 CUU     2%        12%        0%        12%
                 CUC     3%        11%        0%        9%
                 CUA     1%        5%         3%        15%
                 CUG     92%       37%        0%        14%
Valine           GUU     60%       27%        52%       28%
                 GUC     2%        25%        48%       19%
                 GUA     28%       16%        0%        30%
                 GUG     10%       32%        0%        23%
Isoleucine       AUU     16%       46%        42%       43%
                 AUC     84%       37%        58%       22%
                 AUA     0%        17%        0%        35%
Phenylalanine    UUU     17%       67%        10%       69%
                 UUC     83%       33%        90%       31%

Universal and species-specific patterns of codon usage

Jan C. Biro (2008): studies on the origin and evolution of codon bias.

Codon usage in nuclear genes of four monocot species (rice, maize, wheat, barley) and three dicot species (Arabidopsis, Nicotiana tabacum and Pisum sativum) was analysed to find general patterns in the codon choice of plant species.

1. Codon bias was correlated with GC content at the third codon position.
2. GC contents were higher in monocot species than in dicot species at all codon positions.
3. The high GC contents of monocot species might be the result of a relatively stronger mutational bias that occurred in the lineage of Poaceae species.
4. In both groups, the ENC for most genes was similar to the ENC expected from GC content at the third codon position.
5. G- and C-ending codons were detected as the “preferred” codons in monocot species.
6. Pyrimidines (C and T) are used more frequently than purines (G and A) in fourfold degenerate codon groups.

The genome hypothesis

All genes in a genome tend to have the same coding strategy. That is, they employ the codon catalog similarly and show similar choices between synonymous codons.

Different taxa have different coding strategies.

Richard Grantham


Are there universal preferences?

There are NO universally preferred or universally avoided codons.

There may be some universal preferences and avoidances as far as codon neighbor pairs are concerned. For example, the pair NNG CNN, where N stands for any of the four possible nucleotides, seems to be preferred, while the pair NNG GNN seems to be avoided.

Factors influencing codon usage bias in the genome

1. Selection for optimized translation

• Translation is energetically expensive, so inefficient and inaccurate translation depletes cellular resources.

• Thus, during evolution, mutations that reduce the energy required for translation have been favoured.

• Using optimal codons is the best way to reduce the energy required, as it can increase both the accuracy and the efficiency of translation.

• The optimal synonymous codon can be identified from tRNA abundance: codons with abundant cognate tRNAs, and codons that bind their cognate tRNA more strongly, will be preferred.

This leads to bias in codon usage.

Studies of host-phage relationships have revealed that the codon usages of Staphylococcus aureus phages and T4 phages are highly influenced by the codon biases of their hosts. This suggests that the phages’ codon usages are largely determined by the most abundant tRNAs of their hosts.

Recent studies have revealed that the model of selection for translation efficiency may simple…

In a study of 102 bacterial species, codon usage bias in highly expressed genes seemed to result from the selection of optimal codons associated with the most frequent tRNA genes; the increase in frequency of these tRNA genes, in turn, also results from codon usage bias.

This leads to the concept of CO-EVOLUTION, which has led to the idea that different tRNA abundances evolved directly from pre-existing codon biases.

This suggests that tRNA abundance is a consequence of codon bias, not its determining factor.

Why are some tRNAs more abundant than others?

2. Gene expression

One widely studied force behind codon usage bias is gene expression.

A highly expressed gene is a gene that is expressed often, producing greater than average levels of protein. e.g. Housekeeping genes.

A broadly expressed gene is one that is expressed in many tissues.

Studies of the genomes of a wide variety of organisms have revealed a correlation between gene expression level and codon usage bias, namely that high gene expression leads to high codon bias.

In genes that are translated often and at high volumes, codon bias appears to be especially high because the cost of a missense error is increased.

3. Rate of evolution

• Studies on Saccharomyces cerevisiae, Drosophila melanogaster, Escherichia coli, and Salmonella typhimurium have revealed a significant negative correlation between codon usage bias and the rate of nucleotide substitution at silent sites.

• The study that looked at Escherichia coli and Salmonella typhimurium found that, additionally, highly expressed genes have high codon bias and low rates of synonymous substitution.

• Codon preferences reflect a balance between mutational biases and natural selection for translational optimization.

• Since optimal codons are favored by selection, a synonymous substitution to a non-optimal codon would decrease fitness.

• Selection among synonymous codons therefore constrains the rate of silent substitution in some genes.

4. Secondary structure

• The secondary structural constraints of DNA also play an active role in determining the codon preferences of genes.

• This was interpreted to suggest that these structural constraints, such as the flexibility and folding stability of DNA and mRNA, play a more important role in determining codon usage than translational constraints do.

• The transcription of DNA is highly constrained by the ability of DNA strands to bend and flex during transcription. These structural properties are influenced by base sequence and length, which may influence codon bias and which often correlate with gene expression levels.

5. Nucleotide composition

Variation in the G+C content of silent sites is the major source of variation in codon usage (Fennoy and Bailey-Serres 1993).

There is a correlation between gene density and G+C content, but the location of genes appears to be independent of tissue and time (Bernardi 1993).

There is also a correlation between the G+C content of the 1st and 3rd codon positions of genes (Eyre-Walker 1991).

6. Protein length

The protein length and codon usage bias are positively correlated in both Saccharomyces cerevisiae and Escherichia coli.

Translational selection has been used to explain this correlation.

The cost of translating a protein is proportional to its length, so there is greater pressure to select the most accurate codons in longer genes to avoid missense errors, which explains the positive correlation.

A new index of codon bias, the Measure Independent of Length and Composition (MILC), was developed to control for the influence of gene length on codon bias.

Large, randomly generated sequence sets were used to test for dependence on (i) sequence length, (ii) overall amount of codon bias and (iii) codon bias variation in the sequences.

Toshimichi Ikemura

7. Time

The time and speed of expression are determinants of codon bias, that is, when during the life of a cell and how quickly replication takes place.

Fast-growing bacteria have more abundant, less diverse tRNAs, leading to higher codon bias in highly expressed genes. Their highly expressed genes tend to have significantly high CAI values.

In slow-growing organisms with low codon biases, by contrast, CAI is a less effective indicator of highly expressed genes.

The time of replication plays an important role in codon biases within genes and genomes.

For example, in Populus tremula, tissues whose cells are undergoing rapid growth and division show a stronger association between codon bias and gene expression than cells of the same tissues growing at a slower pace.

Subramanian S. 2008. Genetics 178:2429-2432

8. Genome size

Dos Reis et al. (2004) found that tRNA-gene redundancy and genome size are interacting forces in determining translational selection and codon-usage bias.

They suggested that an optimal combination of these factors exists for the maximization of translational selection.

The magnitude of selection was maximal in genomes 1-30 Mb in size that contain 150-600 tRNA-specifying genes.

The genome of Helicobacter pylori contains only 36 tRNA-coding genes (only one tRNA gene having two copies).

The haploid genome size of humans is approximately 3,500 Mb.

Both Helicobacter pylori and humans fall outside this range.

Role of codon usage bias in growth & development of organism

• Effect on RNA secondary structures
• A major determinant of mRNA stability
• Effect on the speed of translation initiation and translation elongation (translation efficiency)
• Effect on protein folding
• Effect on heterologous gene expression
• Optimal codons maintain an efficient growth rate with greater accuracy

Measures of codon-usage bias

• The relative synonymous codon usage (RSCU) is the number of times a codon appears in a gene divided by the number of occurrences expected under equal codon usage.

$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n}\sum_{j=1}^{n} X_j}$$

• $n$ = number of synonymous codons ($1 \le n \le 6$) for the amino acid under study

• $X_i$ = number of occurrences of codon $i$.

• If the synonymous codons of an amino acid are used with equal frequencies, their RSCU values will equal 1.
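The RSCU definition can be sketched as follows, assuming the standard genetic code and an in-frame coding sequence. The function name `rscu` and the choice to skip amino acids that do not occur in the sequence (where RSCU is undefined) are assumptions of this example.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds, table=CODON_TABLE):
    """Return {codon: RSCU} for every amino acid that occurs in the sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, amino_acid in table.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene: RSCU undefined
        expected = total / len(codons)  # occurrences expected under equal usage
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


values = rscu("CTGCTGCTGTTA")  # four leucines: CTG x3, TTA x1
print(values["CTG"], values["TTA"], values["CTT"])
```

With six leucine codons and four leucines, each codon is expected 4/6 times, so CTG scores 4.5, TTA 1.5 and the unused codons 0.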

The Codon Adaptation Index (CAI)

To study the role of selection in producing high codon bias, a statistic called the Codon Adaptation Index (CAI) is calculated (Sharp & Li, 1987).

The pattern of codon usage in very highly expressed genes can reveal:
(i) which of the alternative synonymous codons for an amino acid is the most efficient for translation;
(ii) the relative extent to which other codons are disadvantageous.

The codon adaptation index (CAI) measures the degree to which genes use preferred codons.

We first compile a table of RSCU values for highly expressed genes. From this table, it is possible to identify the codons that are most frequently used for each amino acid. The relative adaptiveness of a codon ($w_i$) is computed as

$$w_i = \frac{\mathrm{RSCU}_i}{\mathrm{RSCU}_{\max}}$$

where $\mathrm{RSCU}_{\max}$ = the RSCU value for the most frequently used codon for an amino acid.

The CAI value for a gene is calculated as the geometric mean of the $w_i$ values for all the codons used in that gene:

$$\mathrm{CAI} = \left(\prod_{i=1}^{L} w_i\right)^{1/L}$$

where $L$ = number of codons.
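A minimal sketch of the CAI calculation follows. It assumes the standard genetic code and a user-supplied list of highly expressed reference genes. The function names and the pseudocount of 0.5 for codons never seen in the reference set are assumptions of this example, made so that a single unobserved codon does not force the geometric mean to zero. Met, Trp and stop codons are skipped because they carry no synonymous choice.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split an in-frame sequence into codons (a trailing partial codon is dropped)."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5, table=CODON_TABLE):
    """Compute w_i = RSCU_i / RSCU_max from a set of highly expressed genes."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    families = defaultdict(list)
    for codon, amino_acid in table.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)
    weights = {}
    for codons in families.values():
        if len(codons) == 1:
            continue  # Met and Trp: no synonymous alternatives
        observed = {c: counts[c] or pseudocount for c in codons}
        top = max(observed.values())
        # Within one family RSCU is proportional to the count, so the ratio of
        # counts equals RSCU_i / RSCU_max.
        for codon in codons:
            weights[codon] = observed[codon] / top
    return weights


def cai(cds, weights):
    """Geometric mean of w_i over the codons of a gene that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["CTGCTGCTGCTT"])  # toy reference: CTG x3, CTT x1
print(cai("CTGCTG", w))  # uses only the preferred Leu codon
print(cai("CTGCTT", w))  # geometric mean of 1 and 1/3
```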

The effective number of codons (ENC)

$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$

where $F_i$ (i = 2, 3, 4, or 6) is the average probability that two randomly chosen codons for an amino acid with i codons will be identical.

ENC values range from 20 (the number of amino acids), which means that the bias is at a maximum and only one codon is used from each synonymous-codon group, to 61 (the number of sense codons), which indicates no codon-usage bias.
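The ENC formula can be sketched as below. The seminar does not say how $F_i$ is estimated, so this example assumes a sample homozygosity estimate for each amino acid, $F = (n\sum_j p_j^2 - 1)/(n - 1)$ with $n$ the number of codons observed for that amino acid and $p_j$ the codon frequencies, averaged within each degeneracy class. The fallback for a missing isoleucine class and the cap at 61 are likewise choices of this sketch; the function name `enc` is illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def enc(cds, table=CODON_TABLE):
    """Effective number of codons: ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    families = defaultdict(list)
    for codon, amino_acid in table.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)
    f_values = defaultdict(list)  # family size -> F of each amino acid in that class
    for codons in families.values():
        size = len(codons)
        n = sum(counts[c] for c in codons)
        if size == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_sq = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_sq - 1) / (n - 1)
        if f > 0:
            f_values[size].append(f)
    mean_f = {size: sum(v) / len(v) for size, v in f_values.items()}
    # Isoleucine is the only threefold family; if it is absent, this sketch
    # falls back to the average of F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    missing = [size for size in (2, 3, 4, 6) if size not in mean_f]
    if missing:
        raise ValueError(f"cannot estimate F for family sizes {missing}")
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(61.0, value)  # short sequences can overshoot the theoretical maximum
```

A sequence that always uses one codon per amino acid gives every $F_i = 1$ and therefore ENC = 20, the maximum-bias end of the range described above.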

Applications of codon bias

1. Phylogeny investigation

• Codon usage divergence is correlated with evolutionary distance (Grantham et al. 1981; Long and Gillespie 1991; Maruyama et al. 1986).

• Although phylogenies based on codon usage may appear to have practical application, phylogenies are best investigated by comparative analysis of homologous sequences (Sharp 1986).

• Codon usage can converge in evolutionarily distant species due to similar mutational bias.

• Nesti and co-workers (1995) also presented a phylogeny based on codon usage divergence.

2. Codon Usage as a Tool for Gene Prediction

• Knowledge of codon usage preference can be applied to the prediction of open reading frames (Borodovsky et al. 1995; Krogh et al. 1994).

• Most modern gene prediction programs use codon usage patterns as well as dinucleotide and short oligonucleotide patterns to predict open reading frames (Karlin and Cardon 1994).

• The GeneMark prediction program (Borodovsky et al. 1994a; Borodovsky and McIninch 1993) has been used to identify the coding sequences from two major shotgun genome sequencing projects (Fleischmann et al. 1995; Fraser et al. 1995).

• Although modern gene prediction programs can learn from a sample of genes, a more in-depth knowledge of codon usage variations can greatly improve their predictive properties (Borodovsky et al. 1995).

• Heterologous genes might contain codons that are rarely used in the desired host, come from organisms that use a non-canonical code, or contain expression-limiting regulatory elements within their coding sequence.

• Improvements in the speed and cost of gene synthesis have facilitated the complete redesign of entire gene sequences to maximize the likelihood of high protein expression.

• Gene design and the use of synthetic genes offer a mechanism by which researchers can take much greater control of heterologous protein expression.

• As well as manipulating codon biases, peptide tags can be added, splice sites removed and restriction sites placed as desired.

• The cost and fidelity of gene synthesis appear to be making synthetic genes increasingly cost-effective.

• Improving expression by codon optimization

• Improving expression by changing host tRNA pool
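Codon optimization in its simplest form replaces each codon with the host's preferred synonym while leaving the protein unchanged. The sketch below is illustrative only. The `PREFERRED` map covers just the four amino acids listed in the E. coli "High" column of the codon usage table earlier in this deck, other amino acids are left untouched, and the function name `optimize` is hypothetical. Real gene design also weighs mRNA structure, restriction sites and other constraints that are ignored here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Most frequent codon in highly expressed E. coli genes, taken from the deck's
# table (CUG 92%, GUU 60%, AUC 84%, UUC 83%), written with T instead of U.
PREFERRED = {"L": "CTG", "V": "GTT", "I": "ATC", "F": "TTC"}


def translate(cds, table=CODON_TABLE):
    """Translate an in-frame DNA or RNA sequence into one-letter amino acids."""
    seq = cds.upper().replace("U", "T")
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))


def optimize(cds, preferred=PREFERRED, table=CODON_TABLE):
    """Swap each codon for the host-preferred synonym when one is known."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        optimized.append(preferred.get(table[codon], codon))
    return "".join(optimized)


original = "ATGTTATTTATAGTGTAA"  # Met-Leu-Phe-Ile-Val-stop with rare codons
redesigned = optimize(original)
print(redesigned)
print(translate(original) == translate(redesigned))  # protein is unchanged
```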

[Figure labels: fast, slow; optimal codon, non-optimal codon]

• Codon usage also impacts gene expression at the level of mRNA decay.

• Codon identity correlates with yeast mRNA half-lives transcriptome-wide.

• Converting non-optimal codons to optimal codons increases mRNA stability.

• Codon optimality impacts translational elongation rate.

• Proteins with related function are coordinated at the level of optimal codon content.

Detection of lateral gene transfer

• Transfer of genetic material between different genomes

• Occurs via mobile genetic elements: plasmids, transposons (jumping genes), bacteriophages

• In prokaryotes, mechanisms include: a. transformation b. conjugation c. transduction

Lateral gene transfer between two different species can be detected using CAI and ENC.

If individuals of two different species show the same gene expression and lateral gene transfer between these species is suspected, compare the CAI and ENC of the gene in the two species; if the values are the same, this suggests that both possess the gene as a result of lateral gene transfer.

Conclusion

Any queries?
