
BioSystems 85 (2006) 99–106

Analysis of codon usage pattern in the radioresistant bacterium Deinococcus radiodurans

Qingpo Liu ∗

Department of Agronomy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310029, China

Received 17 September 2005; received in revised form 6 December 2005; accepted 12 December 2005

Abstract

The main factors shaping codon usage bias in the Deinococcus radiodurans genome are reported. Correspondence analysis (COA) was carried out to analyze synonymous codon usage bias. The results showed that the main trend was strongly correlated with gene expression level as assessed by “Codon Adaptation Index” (CAI) values, a result that was confirmed by the distribution of genes along the first axis. The results of correlation analysis, variance analysis and the neutrality plot indicated that gene nucleotide composition clearly contributed to codon bias. CDS length was also a key factor in dictating codon usage variation: genes with longer CDSs generally showed more biased codon usage and higher expression levels. Further, the hydrophobicity of each protein also played a role in shaping codon usage in this organism, as confirmed by the significant correlation between the positions of genes on the first axis and their hydrophobicity values (r = −0.100, P < 0.01). In summary, gene expression level played a crucial role in shaping the codon usage pattern of D. radiodurans, whereas nucleotide mutational bias, CDS length and the hydrophobicity of each protein played only minor roles. Notably, 19 codons, defined here for the first time as “optimal codons”, may provide useful clues for molecular genetic engineering and evolutionary studies.

© 2005 Elsevier Ireland Ltd. All rights reserved.

Keywords: Deinococcus radiodurans; Codon usage; Correspondence analysis

∗ Tel.: +86 571 86971611; fax: +86 571 86971117. E-mail address: [email protected].

0303-2647/$ – see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2005.12.003

1. Introduction

Non-random use of synonymous codons exists universally both within and between organisms. It is an important scientific issue that has been correlated with many factors, such as base compositional mutation bias (Karlin and Mrazek, 1996; Hou and Yang, 2003), gene expression level (Sharp and Li, 1986; Duret and Mouchiroud, 1999; Peixoto et al., 2003; Romero et al., 2003), gene length (Moriyama and Powell, 1998), tRNA abundance (Percudani et al., 1997; Duret, 2000), protein structure (Gu et al., 2004b), codon–anticodon interaction (Shi et al., 2001), the hydropathy level of each protein, amino acid conservation (Romero et al., 2000), etc. However, genome compositional mutation bias and natural selection, with different relative importance in different species, are the main contributors to codon bias (Singer and Hickey, 2003; Peixoto et al., 2003; Romero et al., 2003; Gupta et al., 2004). In some prokaryotic genomes, the codon usage pattern was attributable to the equilibrium between natural selection and compositional mutation bias (Bulmer, 1988; Sharp et al., 1993). However, translational selection at synonymous sites played key roles in shaping codon usage of thermophilic prokaryotes (Lynn et al., 2002) and Arabidopsis thaliana (Chiapello et al., 1998). In contrast, in some prokaryotes with extremely high AT or GC contents (Sharp et al., 1993; Gu et al., 2004a,b; Adams and Antoniw, 2004) and in human (Karlin and Mrazek, 1996), mutation bias rather than translational selection was the most important determinant of codon usage variation. In addition, in the Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, A. thaliana and Oryza sativa genomes, there was a strongly significant correlation between gene expression level and codon usage bias (Ikemura, 1981; Sharp et al., 1986; Duret and Mouchiroud, 1999). Compared with lowly expressed genes, highly expressed genes exhibit great variation in codon usage, reflecting the stronger selective constraint on highly expressed genes to optimize translation efficiency and accuracy by using a more restricted set of “preferred” synonymous codons (Bulmer, 1988; Miyasaka, 2002), and suggesting that codon usage pattern has a functional significance (Singer and Hickey, 2003). In many cases, codon usage mirrored the distribution of tRNA abundance (Ikemura, 1981; Percudani et al., 1997; Moriyama and Powell, 1997; Duret, 2000), and the “preferred” codons were those best recognized by the most abundant tRNA species (Ikemura, 1981). Therefore, analysis of codon usage data has both theoretical and practical significance in understanding the basics of molecular biology.

Deinococcus radiodurans is a Gram-positive, red-pigmented, non-motile bacterium that was originally identified as a contaminant of irradiated canned meat. To date, D. radiodurans is the most radiation-resistant organism among all species in the Deinococcus genus, which makes it an ideal candidate for bioremediation of sites contaminated with radiation and toxic chemicals (Battista, 1997). The complete genome sequence of D. radiodurans has been determined (White et al., 1999). Thus, it is of interest to understand codon usage in this peculiar organism. The author believes that analysis of codon usage data may give some clues to better understand the features of genomic organization and evolutionary information for this bacterium. The present study performed a comprehensive analysis of codon usage bias in the D. radiodurans genome, using multivariate statistical analysis, variance analysis and correlation analysis, and also determined its optimal codons.

2. Materials and methods

2.1. Sequence data

The complete genome sequence of D. radiodurans, with an average GC content of 67.6%, was obtained from TIGR. According to the annotated coordinates (start and stop codon locations), coding sequences (CDSs) were retrieved with a PERL script. To minimize sampling errors, only CDSs with a length of at least 300 bases (2829 in total) were used for further analysis.

2.2. Measurement index of codon usage

The GC3s value is the frequency of G or C at the third, synonymously variable coding position (excluding Met, Trp and termination codons). GC12 is the average of GC1 and GC2, and is used for neutrality plot analysis. To normalize codon usage within datasets of differing amino acid compositions, relative synonymous codon usage (RSCU) values are calculated by dividing the observed codon usage by that expected when all codons for the same amino acid are used equally (Sharp and Li, 1986). The “effective number of codons” (ENC) is often used to measure the magnitude of codon bias for an individual gene, yielding values ranging from 20, for a gene with extreme bias using only one codon per amino acid, to 61, for a gene with no bias using synonymous codons equally (Wright, 1990). The “Codon Adaptation Index” (CAI) is used to estimate the extent of bias toward codons that are known to be preferred in highly expressed genes. A CAI value lies between 0 and 1.0, and a higher value means a stronger codon usage bias (Sharp and Li, 1987). CAI has been reported to be the best theoretical measure of gene expression and has been used extensively as a measure of gene expression level (Naya et al., 2001; Gupta et al., 2004). The reference sequences used for calculating CAI values in this study are the genes coding for ribosomal proteins (Peixoto et al., 2003; Gupta et al., 2004).

2.3. Correspondence analysis (COA)

The relationship between variables and samples can be explored using multivariate statistical analysis. This approach has been successfully used to investigate the variation of RSCU values among genes (Morton, 1999; Musto et al., 2001; Grocock and Sharp, 2002; Singer and Hickey, 2003; Peixoto et al., 2003; Romero et al., 2003; Gupta et al., 2004). The most commonly used method is correspondence analysis (Greenacre, 1984), in which all genes are plotted in a 59-dimensional hyperspace according to their usage of the 59 sense codons. This method can detect differences in codon usage between genes and identify the codons involved. Sequences in which a given codon is used in a similar fashion lie close to each other on the graph. CodonW Version 1.4 was used to perform the COA.
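As an illustration of the RSCU definition in Section 2.2, whose values are the input to the COA, the following Python sketch computes RSCU from in-frame codon counts. It is not the CodonW implementation: the function names, the two-family codon table and the toy sequence are hypothetical and chosen only to keep the example short.

```python
from collections import Counter

# Hypothetical subset of synonymous families (RNA alphabet). A full analysis
# would list all 18 degenerate amino acids covering the 59 sense codons.
FAMILIES = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Asp": ["GAC", "GAU"],
}

def count_codons(cds):
    """Count in-frame codons of a coding sequence given as DNA or RNA letters."""
    seq = cds.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

def rscu(counts, families=FAMILIES):
    """RSCU = observed count / count expected under equal synonymous usage."""
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values

# Toy fragment: 6 Ala codons (4 GCC, 1 GCG, 1 GCA) and 2 Asp codons (both GAC).
toy = "GCCGCCGCCGCCGCGGCAGACGAC"
toy_rscu = rscu(count_codons(toy))
```

Within each family the RSCU values sum to the number of synonymous codons, so a value above 1 indicates a codon used more often than expected under equal usage.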

2.4. Statistical analysis

A Chi-square test was employed to examine the significance of codon usage differences between two datasets (McInerney, 1998). For each of the 59 sense codons, the Chi-square test involves a 2 × 2 table with one degree of freedom, in which the first row contains the observed counts of the codon being analyzed in the two datasets, whereas the second row contains the total counts of its synonymous alternatives. Significance was examined at the 5% level (Chi-square value of 3.841). In addition, a PERL program was developed to perform variance analysis, which was used to estimate the significance of differences between sequence groups. Correlation analysis was carried out using Spearman’s rank correlation method as implemented in SPSS Version 12.0.
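The per-codon test described above can be sketched as follows. This is a minimal illustration that assumes the ordinary Pearson Chi-square statistic without continuity correction, which the text does not specify; the function name and the counts are hypothetical.

```python
CRITICAL_5PCT_DF1 = 3.841  # 5% threshold for one degree of freedom, as quoted in the text

def chi_square_2x2(codon_a, codon_b, others_a, others_b):
    """Pearson Chi-square for a 2 x 2 table (no continuity correction assumed).

    Row 1: counts of the codon in datasets A and B.
    Row 2: counts of its synonymous alternatives in datasets A and B.
    """
    a, b, c, d = codon_a, codon_b, others_a, others_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts: the codon occurs 90 times in dataset A and 40 times in
# dataset B, against 110 and 160 synonymous alternatives, respectively.
stat = chi_square_2x2(90, 40, 110, 160)
significant = stat > CRITICAL_5PCT_DF1
```

Repeating the call for each of the 59 sense codons gives one statistic per codon, each compared against the same threshold.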

3. Results

3.1. Gene expression level and codon usage bias

A COA of RSCU values was conducted, in which the first axis accounted for 22.8% of the total inertia of the 59-dimensional space, whereas the next three axes accounted for only 5.7%, 4.7% and 4.1%, respectively, indicating that the first axis was the major explanatory axis for interpreting codon usage variation among genes. The position of each gene on the plane defined by the first two axes is displayed in Fig. 1. It was interesting to note that some putatively highly expressed genes, such as those encoding ribosomal proteins, ATP synthase, elongation factor and trigger factor, were clustered at the extreme left of the first axis, while some transferases and hypothetical proteins were at the other extreme, suggesting that gene expression level was primarily responsible for separating genes according to their codon usage along the first axis. In addition, gene expression level assessed by CAI values was significantly negatively correlated with both the positions of genes along the first axis and ENC values (r = −0.817 and −0.801, respectively, P < 0.01). Axis 1 coordinates were also significantly negatively correlated with GC content, CDS length and the hydrophobicity of each protein (r = −0.120, −0.189 and −0.100, respectively, P < 0.01). However, these correlation coefficients were far smaller than that for gene expression level, suggesting that gene expression level should be the major source of codon usage variation, while the last three factors seemed to play a minor role in shaping codon usage in this organism.

Fig. 1. Distribution of D. radiodurans genes on the plane defined by the first two main axes of the correspondence analysis. Open squares and triangles indicate genes with extremely high and low CAI values used as the High and Low datasets, respectively.

3.2. Nucleotide compositional constraint analysis

The plot of ENC against GC3s (Nc-plot) was used to detect codon usage variation among genes. Wright (1990) argued that comparing the actual distribution of genes with the distribution expected under no selection could indicate whether codon usage bias is influenced by factors other than compositional constraints. The Nc-plot of D. radiodurans genes (Fig. 2) showed that a majority of the points with low ENC values lay below the expected curve, although a few genes lay on the expected curve. In addition, a significantly negative correlation (r = −0.125, P < 0.01) between GC content and ENC values was found. Taken together, it could be concluded that, apart from gene expression level, compositional constraints must contribute to the codon usage pattern of this genome.

To investigate codon usage variation among genes with different GC content, the 2829 genes were classified into 4 groups. The results of variance analysis showed a general tendency for genes with lower GC content to have higher ENC values (Table 1). However, no clear statistical significance was found when only CDSs with GC content larger than 0.65 were compared. On the other hand, a neutrality plot (GC12 versus GC3; Sueoka, 1988) was constructed to examine the association between mutation bias and codon bias. Fig. 3 showed that this organism had a wide range of GC3 values (0.28–0.94). Nevertheless, there was only a very small correlation coefficient at the 0.05 level in the neutrality plot of D. radiodurans (r = −0.039), suggesting low mutation bias or high conservation of GC content throughout the genome, which indirectly demonstrated that mutational bias was less important for codon bias than gene expression level.

Fig. 2. Nc-plot (ENC values vs. GC3s) of the D. radiodurans genes. The continuous curve represents the expected curve between GC3s and ENC under random codon usage.
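The expected curve of an Nc-plot is commonly drawn from the approximation given by Wright (1990), ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The sketch below assumes this form was used for Fig. 2, which the text does not state explicitly; the function names are illustrative only.

```python
def expected_enc(gc3s):
    """Expected ENC when GC3s alone constrains codon choice (Wright, 1990)."""
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)

def below_curve(enc, gc3s):
    """True when a gene's observed ENC lies under the expected curve."""
    return enc < expected_enc(gc3s)

# Points for drawing the continuous curve at GC3s = 0.00, 0.05, ..., 1.00.
curve = [(s / 100, expected_enc(s / 100)) for s in range(0, 101, 5)]
```

A gene falling well below the curve is more biased than its GC3s alone would predict, which is the pattern the text describes for the low-ENC genes.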


Table 1
Variance analysis of ENC variation among different GC content groups

Group   GC content   Number of observations   ENC (mean ± standard deviation)
1       <0.55        41                       55.39 ± 5.09 a
2       0.55–0.65    537                      39.83 ± 7.72 b
3       0.65–0.70    1434                     36.65 ± 5.23 c
4       >0.70        817                      36.59 ± 4.35 cd

Note: The same superscript letters (a–d) within the same column mean no statistically significant difference between any two groups (P > 0.05). In contrast, different superscript letters mean a statistically significant difference (P < 0.05).

Fig. 3. Neutrality plot of D. radiodurans genes.
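A neutrality plot of this kind can be sketched in Python as below. This is an illustration under stated assumptions rather than the procedure used in the study: GC content is computed over all codons of a CDS, the relationship is summarized by a least-squares slope and a Pearson coefficient, and the function names and numeric values are hypothetical.

```python
from math import sqrt

def gc_by_position(cds):
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:n_codons * 3:3]  # every third base, starting at pos
        fractions.append(sum(b in "GC" for b in bases) / n_codons)
    return tuple(fractions)

def neutrality_point(cds):
    """One gene's coordinates on the plot: (GC12, GC3)."""
    gc1, gc2, gc3 = gc_by_position(cds)
    return (gc1 + gc2) / 2, gc3

def slope_and_r(xs, ys):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the variables")
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical per-gene values: GC3 on the x axis, GC12 on the y axis.
gc3_values = [0.60, 0.70, 0.80, 0.90]
gc12_values = [0.60, 0.62, 0.61, 0.63]
slope, r = slope_and_r(gc3_values, gc12_values)
```

A slope near zero means that GC12 barely tracks GC3, the situation usually read as a weak contribution of mutation bias.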

3.3. Relationship between CDS length, gene expression level and codon usage bias

The results of correlation analyses between CDS length and axis 1 coordinates, CAI, ENC and GC3s values showed that the four correlation coefficients (r = −0.189, 0.211, −0.160 and 0.214, respectively, P < 0.01) were all significant, suggesting that genes with longer CDSs generally tend to have more biased codon usage and higher expression levels. Furthermore, the 2829 genes were classified into 5 groups (length ≤ 499 bp, 500–999 bp, 1000–1499 bp, 1500–2499 bp and ≥ 2500 bp) according to CDS length. A detailed variance analysis of CAI, ENC and GC3s values among the different CDS length groups was made. The results showed that, with a few exceptions, the differences between any two groups were statistically significant, although the differences among average GC3s values were not as strong as those for CAI and ENC (Table 2). However, the difference between groups with lengths longer than 2500 bp and shorter than 1500 bp was more apparent. Overall, longer CDSs exhibited a greater degree of codon usage bias, higher expression level and a stronger preference for G/C-ending codons.

Fig. 4. Plot of the two most important axes after correspondence analysis of RSCU values. Open and full squares indicate the genes that are located on the leading and lagging strands, respectively.

3.4. Strand-specific compositional constraint analysis

According to the genomic annotation in TIGR, about 52.4% of genes were on the leading strand and 47.6% on the lagging strand. Nevertheless, the average GC contents of genes on the two strands (67.5% and 67.6%, respectively) were almost equal. In addition, whether genes were located on the leading or the lagging strand, their positions on the plane defined by the first two axes were well mixed (Fig. 4), suggesting almost no strand-specific codon usage variation. To test this further, a Chi-square test was performed to examine the difference between the two datasets. The results showed that only 5 out of 59 codons (GGA, CUU, UUA, ACU and GUU) exhibited statistically significant differences. Thus, it could be inferred that the nucleotide composition of the leading and lagging strands had no appreciable effect on codon usage variation among genes.

Table 2
Variance analysis of CAI, ENC and GC3s values among different CDS length groups

Group   CDS length (bp)   Number of observations   CAI (mean ± S.D.)   ENC (mean ± S.D.)   GC3s (mean ± S.D.)
1       ≤499              526                      0.525 ± 0.122 e     38.72 ± 7.28 a      0.804 ± 0.089 e
2       500–999           1238                     0.553 ± 0.097 d     37.78 ± 5.73 b      0.823 ± 0.091 cd
3       1000–1499         681                      0.570 ± 0.098 c     37.04 ± 5.87 bc     0.826 ± 0.105 bc
4       1500–2499         316                      0.597 ± 0.090 ab    35.97 ± 5.62 d      0.836 ± 0.097 b
5       ≥2500             68                       0.607 ± 0.077 a     34.77 ± 4.53 e      0.851 ± 0.074 a

Note: The same superscript letters (a–e) within the same column mean no statistically significant difference between any two groups (P > 0.05). In contrast, different superscript letters mean a statistically significant difference (P < 0.05).

3.5. Translational optimal codons

To understand which triplets were used more frequently among the highly expressed genes, 5% of the total genes (140 genes) with extremely high and with extremely low CAI values were used as the High and Low datasets, respectively, and their codon usage patterns were compared. The results of the Chi-square test showed that 52 out of 59 codons exhibited clearly different usage with a P-value less than 0.05 (Table 3), which indicated that highly expressed genes displayed a specific codon usage pattern distinguishing them from weakly expressed genes. Table 3 showed that 19 codons ending with C/G (marked ** in Table 3) were much preferred in highly expressed genes. The significantly positive correlation between CAI and GC3s values (r = 0.656, P < 0.01) supported this observation. The overall codon usage of highly expressed genes showed the expected bias toward GC-rich codons. However, it appeared that neither simple mutational bias nor nearest-neighbor dependent mutational bias (Bulmer, 1990) could explain the usage of all codons. For example, among Ser codons AGC was about 2.8 times as frequent as UCG, while among Arg codons CGC was used about 12 times more often than CGG (Table 3). These results strongly suggested that natural selection must be operative, as was reported for several unicellular species (Romero et al., 2000; Grocock and Sharp, 2002; Peixoto et al., 2003).

4. Discussion

In the “neutral theory”, mutations at degenerate coding positions should be selectively neutral, thus resulting in random synonymous codon choice (Nakamura et al., 1999). However, numerous studies have reported that many factors shape species-specific codon usage. A typical example was Thermotoga maritima, whose codon usage was the result of mutational bias, translational selection, hydropathy of each protein, the economy of the cell, anaerobic condition and usage of Cys (Zavala et al., 2002). In other words, with the development of genome projects for many organisms, it seems that this hypothesis is not sufficient to explain codon usage variation, although it cannot simply be rejected. In this study, the factors involved in dictating codon usage of the D. radiodurans genome included at least gene expression level, gene compositional constraint, CDS length and the hydrophobicity of each protein. However, strand-specific compositional constraint had essentially no influence on codon bias.

GC content could be one of the most important factors in the evolution of genomic structures (Bellgard et al., 2001). Ikemura (1985) demonstrated that the correlation between codon usage bias and GC content in surrounding non-coding regions could be taken as support for directional mutation pressure. Codon usage bias of human genes was related to location in the genome because of the mosaic patterns of GC content (Bernardi, 1993; Karlin and Mrazek, 1996). As discussed by Sharp and Matassi in their review (1994), codon usage in the mammalian genome could reflect the physical location of the genes, which in turn might simply reflect differences in mutation patterns. In contrast, in the Entamoeba histolytica (Romero et al., 2000) and Streptococcus pneumoniae (Hou and Yang, 2002) genomes, nucleotide compositional pressure played minor roles in codon usage variation, while in the Chlamydomonas reinhardtii (Naya et al., 2001) and Echinococcus spp. (Fernandez et al., 2001) genomes, which were GC-rich, there was no clear association between codon bias and GC content. In the D. radiodurans genome, there was clear heterogeneity of synonymous codon usage among genes: the GC content of different genes varied from 34.1% to 77.1%, with a mean value of 67.6% and S.D. of 4.3%. However, gene compositional mutational bias was not the crucial determinant in shaping codon usage in this genome. D. radiodurans is known to possess a very strong ability to repair damaged DNA efficiently and accurately under extreme stress conditions. It has a unique capability to resist the induction of mutations by a broad range of mutagenic agents (Bessman et al., 1996). For example, after the induction of hundreds of double-stranded breaks by 1.75 Mrads of ionizing radiation, most cells restored the genome in a little over 24 h without rearrangement or increased mutation frequency (White et al., 1999). Therefore, it could be speculated that the lesser importance of mutational bias for codon usage variation might be closely related to the ability of this organism to survive potentially damaging effects.

Table 3
Translational optimal codons of the D. radiodurans genome

AA    Codon   RSCU (High)   RSCU (Low)   χ2
Ala   GCA     0.080         0.639        504.5
Ala   GCC**   2.540         1.299        292.0
Ala   GCG     1.149         1.175        0.2
Ala   GCU     0.231         0.887        411.3
Cys   UGC**   1.842         1.265        8.7
Cys   UGU     0.158         0.735        36.0
Asp   GAC**   1.890         1.088        121.4
Asp   GAU     0.110         0.912        687.0
Glu   GAA     1.182         1.068        3.2
Glu   GAG     0.818         0.932        6.0
Phe   UUC**   1.731         1.056        70.8
Phe   UUU     0.269         0.944        231.2
Gly   GGA     0.070         0.671        504.1
Gly   GGC**   3.212         1.595        310.3
Gly   GGG     0.320         0.924        263.9
Gly   GGU     0.397         0.810        123.8
His   CAC**   1.913         1.180        44.5
His   CAU     0.087         0.820        217.3
Ile   AUA     0.000         0.551        424.7
Ile   AUC**   2.472         1.204        151.7
Ile   AUU     0.528         1.245        147.1
Lys   AAA     0.222         0.968        383.6
Lys   AAG**   1.778         1.032        93.3
Leu   CUA     0.013         0.334        207.0
Leu   CUC**   2.274         1.270        149.5
Leu   CUG**   3.474         2.223        127.4
Leu   CUU     0.131         0.935        395.1
Leu   UUA     0.006         0.348        230.4
Leu   UUG     0.103         0.890        409.7
Asn   AAC**   1.923         1.091        86.9
Asn   AAU     0.077         0.909        516.6
Pro   CCA     0.033         0.759        347.9
Pro   CCC**   2.563         1.201        187.1
Pro   CCG     1.153         1.220        0.8
Pro   CCU     0.251         0.820        151.2
Gln   CAA     0.166         0.614        217.1
Gln   CAG**   1.834         1.386        31.6
Arg   AGA     0.031         0.530        209.2
Arg   AGG     0.051         0.623        226.3
Arg   CGA     0.039         0.753        297.8
Arg   CGC**   4.665         1.702        463.5
Arg   CGG     0.378         1.487        283.8
Arg   CGU     0.837         0.905        1.2
Ser   AGC**   3.535         1.625        201.9
Ser   AGU     0.143         1.115        293.8
Ser   UCA     0.036         0.647        225.0
Ser   UCC     0.917         0.899        0.1
Ser   UCG**   1.275         0.938        17.5
Ser   UCU     0.093         0.775        215.7
Thr   ACA     0.025         0.614        393.3
Thr   ACC**   3.258         1.529        237.7
Thr   ACG     0.579         1.071        80.9
Thr   ACU     0.137         0.786        300.3
Val   GUA     0.038         0.529        393.9
Val   GUC     1.390         1.281        2.6
Val   GUG**   2.462         1.442        136.2
Val   GUU     0.111         0.748        407.6
Tyr   UAC**   1.888         1.143        53.6
Tyr   UAU     0.112         0.857        307.4

Note: Optimal codons, defined by a significant Chi-square test result between highly and lowly expressed genes together with a higher RSCU value in highly expressed genes, are indicated. ** P < 0.01.

In the D. radiodurans genome, little is known about the expression level of individual genes. However, expression level can be evaluated effectively by reference to genes whose expression is known to be high in other organisms, such as those encoding ribosomal proteins, elongation factors and metabolic enzymes. In addition, EST counting is known to be efficient for assessing gene expression level. Nevertheless, because of the limited number of ESTs (37 ESTs described for D. radiodurans to date) and the inexact prediction of expression level by EST counting, I preferred to use the “Codon Adaptation Index” rather than EST counts to evaluate the expression level of the examined genes. CAI has been widely used to examine gene expressivity and is now considered a well-accepted measure of gene expression (Hou and Yang, 2003; Romero et al., 2003; Peixoto et al., 2003; Gupta et al., 2004).
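For orientation, the CAI of Sharp and Li (1987) is the geometric mean of the relative adaptiveness values w of the codons in a gene, where w is the usage of a codon in the reference set divided by the usage of the most frequent synonymous codon. The Python sketch below illustrates that definition only. The codon table is a hypothetical two-family subset, the reference counts are invented rather than taken from the ribosomal protein genes used in this study, and the 0.5 pseudo-count for unused codons is an assumption made here to avoid taking the logarithm of zero.

```python
from math import exp, log

# Hypothetical subset of synonymous families (RNA alphabet).
FAMILIES = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Asp": ["GAC", "GAU"],
}

def relative_adaptiveness(ref_counts, families=FAMILIES):
    """w = codon count / count of the most used synonym in the reference set."""
    w = {}
    for codons in families.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumption: unused codons get a pseudo-count of 0.5 so that log(w) exists.
            w[c] = max(ref_counts.get(c, 0), 0.5) / top
    return w

def cai(codons, w):
    """Geometric mean of w over the codons of one gene that have a w value."""
    logs = [log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons in this gene")
    return exp(sum(logs) / len(logs))

# Invented reference-set counts, for illustration only.
reference = {"GCC": 40, "GCG": 20, "GCU": 10, "GCA": 10, "GAC": 30, "GAU": 3}
weights = relative_adaptiveness(reference)
gene_cai = cai(["GCC", "GCC", "GAC", "GCG"], weights)
```

A gene built entirely from the most used codon of each family scores 1.0, and each less favoured codon pulls the geometric mean down.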

There was no correlation between codon bias and gene expression level in human (Karlin and Mrazek, 1996) and SARS-CoV (Gu et al., 2004a), suggesting that the codon usage pattern was not affected by translational selection. In contrast, codon usage was strongly affected by natural selection at the translational level in Sinorhizobium meliloti (Peixoto et al., 2003), thermophilic prokaryotes (Singer and Hickey, 2003), S. cerevisiae (Sharp and Cowe, 1991), C. elegans (Stenico et al., 1994; Duret and Mouchiroud, 1999), D. melanogaster (Duret and Mouchiroud, 1999; Miyasaka, 2002), Cyprinidae (Romero et al., 2003), Xenopus laevis (Musto et al., 2001), A. thaliana (Duret and Mouchiroud, 1999), as well as D. radiodurans, where highly expressed genes tended to use optimal codons to increase their translational accuracy and efficiency (Sharp et al., 1993).

Codon bias was affected by gene length to a certain degree. In eukaryotic organisms, such as C. elegans (Marais and Duret, 2001), D. melanogaster (Miyasaka, 2002) and O. sativa, a significant negative relationship between codon usage bias and gene length was found. Moriyama and Powell (1997) proposed an explanation for this phenomenon: if shorter proteins can perform similar functions to those of longer proteins, longer proteins become energetically expensive and disadvantageous, so that the selective constraint acting to reduce the size of highly expressed genes dominates the relationship between codon usage bias and gene length. On the contrary, in E. coli (Eyre-Walker, 1996), Yersinia pestis (Hou and Yang, 2003) and D. radiodurans, codon bias was significantly positively correlated with gene length. In addition, in the Pseudomonas aeruginosa (Gupta and Ghosh, 2001) and S. pneumoniae (Hou and Yang, 2002) genomes, as in E. coli, longer genes generally tended to be more biased. Eyre-Walker (1996) considered this positive correlation reasonable, attributing it to selection to avoid misincorporation errors during translation. However, there were no universal rules about the relationship between codon bias and gene length across the genomes studied, and the reasons for this discrepancy are not yet clearly understood.

As for the definition of “optimal codons”, Duret and Mouchiroud (1999) regarded optimal codons as those whose frequency has been shown to increase with gene expression, whereas Sharp and coworkers defined optimal codons as those showing a statistically significant increase in frequency between genes with low and high codon usage bias (Stenico et al., 1994). In this study, I not only performed statistical analysis but also took the RSCU values into consideration. Overall, 19 codons were defined for the first time as the optimal codons of D. radiodurans (Table 3), which should be useful in the design of degenerate primers, the introduction of point mutations, the modification of heterologous genes, and the investigation of the evolutionary mechanisms of species at the molecular level. As reported by Kawabe and Miyashita (2003), in both dicotyledon and monocotyledon species, when the third codon position was modified to a G or C, increased expression of the modified gene was observed. D. radiodurans is regarded as the sole representative among the six closely related species of radioresistant Deinococci because of its natural transformability (White et al., 1999). Therefore, analysis of codon usage data and the determination of optimal codons should greatly facilitate its further genetic manipulation and evolutionary study.

Acknowledgements

This work was supported by the Key Research Project of Zhejiang Province (2003C22007). I am also thankful to Dr. Koichiro Matsuno and the anonymous reviewer for their helpful and valuable comments.

References

Adams, M.J., Antoniw, J.F., 2004. Codon usage bias amongst plant viruses. Arch. Virol. 149, 113–135.

Battista, J.R., 1997. Against all odds: the survival strategies of Deinococcus radiodurans. Annu. Rev. Microbiol. 51, 203–224.

Bellgard, M., Schibeci, D., Trifonov, E., Gojobori, T., 2001. Early detection of G + C differences in bacterial species inferred from the comparative analysis of the two completely sequenced Helicobacter pylori strains. J. Mol. Evol. 53, 465–468.

Bernardi, G., 1993. The isochore organization of the human genome and its evolutionary history—a review. Gene 135, 57–66.

Bessman, M.J., Frick, D.N., O’Handley, S.F., 1996. The MutT proteins or “Nudix” hydrolases, a family of versatile, widely distributed, “housecleaning” enzymes. J. Biol. Chem. 271, 25059–25062.

Bulmer, M., 1988. Are codon usage patterns in unicellular organisms determined by selection-mutation balance? J. Evol. Biol. 1, 15–26.

Bulmer, M., 1990. The effects of context on synonymous codon usage in genes with low codon usage bias. Nucleic Acids Res. 18, 2869–2873.

Chiapello, H., Lisacek, F., Caboche, M., Henaut, A., 1998. Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209, GC1–GC38.

Duret, L., 2000. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 16, 287–289.

Duret, L., Mouchiroud, D., 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. 96, 4482–4487.

Eyre-Walker, A., 1996. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy? Mol. Biol. Evol. 13, 867–872.

Fernandez, V., Zavala, A., Musto, H., 2001. Evidence for translational selection in codon usage in Echinococcus spp. Parasitology 123, 203–209.

Greenacre, M.J., 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.

Grocock, R.J., Sharp, P.M., 2002. Synonymous codon usage in Pseudomonas aeruginosa PA01. Gene 289, 131–139.

Gu, W., Zhou, T., Ma, J., Sun, X., Lu, Z., 2004a. Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res. 101, 155–161.

Gu, W., Zhou, T., Ma, J., Sun, X., Lu, Z., 2004b. The relationship between synonymous codon usage and protein structure in Escherichia coli and Homo sapiens. Biosystems 73, 89–97.

Gupta, S.K., Bhattacharyya, T.K., Ghosh, T.C., 2004. Synonymous codon usage in Lactococcus lactis: mutational bias versus translational selection. J. Biomol. Struct. Dyn. 21, 1–9.

Gupta, S.K., Ghosh, T.C., 2001. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene 273, 63–70.

Hou, Z.C., Yang, N., 2002. Analysis of factors shaping S. pneumoniae codon usage. Acta Genet. Sinica 29, 747–752.

Hou, Z.C., Yang, N., 2003. Factors affecting codon usage in Yersinia pestis. Acta Biochim. Biophys. Sinica 35, 580–586.

Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer-RNAs and the occurrence of the respective codons in its protein genes—a proposal for a synonymous codon choice that is optimal for the Escherichia coli translational system. J. Mol. Biol. 151, 389–409.

Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.

Karlin, S., Mrazek, J., 1996. What drives codon choices in human genes? J. Mol. Biol. 262, 459–472.

Kawabe, A., Miyashita, N.T., 2003. Patterns of codon usage bias in three dicot and four monocot plant species. Genes Genet. Syst. 78, 343–352.

Lynn, D.J., Singer, G.A.C., Hickey, D.A., 2002. Synonymous codon usage is subject to selection in thermophilic bacteria. Nucleic Acids Res. 30, 4272–4277.

Marais, G., Duret, L., 2001. Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans. J. Mol. Evol. 52, 275–280.

McInerney, J.O., 1998. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc. Natl. Acad. Sci. U.S.A. 95, 10698–10703.

Miyasaka, H., 2002. Translation initiation AUG context varies with codon usage bias and gene length in Drosophila melanogaster. J. Mol. Evol. 55, 52–64.

Moriyama, E.N., Powell, J.R., 1997. Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45, 514–523.

Moriyama, E.N., Powell, J.R., 1998. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26, 3188–3193.

Morton, B.R., 1999. Strand asymmetry and codon usage bias in the chloroplast genome of Euglena gracilis. Proc. Natl. Acad. Sci. U.S.A. 96, 5123–5128.

Musto, H., Cruveiller, S., Onofrio, G.D., Romero, H., Bernardi, G., 2001. Translational selection on codon usage in Xenopus laevis. Mol. Biol. Evol. 18, 1703–1707.

Nakamura, Y., Gojobori, T., Ikemura, T., 1999. Codon usage tabulated from the international DNA sequence database. Nucleic Acids Res. 27, 292.

Naya, H., Romero, H., Carels, N., Zavala, A., Musto, H., 2001. Translational selection shapes codon usage in the GC-rich genomes of Chlamydomonas reinhardtii. FEBS Lett. 501, 127–130.

Peixoto, L., Zavala, A., Romero, H., Musto, H., 2003. The strength of translational selection for codon usage varies in the three replicons of Sinorhizobium meliloti. Gene 320, 109–116.

Percudani, R., Pavesi, A., Ottonello, S., 1997. Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J. Mol. Biol. 268, 322–330.

Romero, H., Zavala, A., Musto, H., 2000. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res. 28, 2084–2090.

Romero, H., Zavala, A., Musto, H., Bernardi, G., 2003. The influence of translational selection on codon usage in fishes from the family Cyprinidae. Gene 317, 141–147.

Sharp, P.M., Cowe, E., 1991. Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7, 657–678.

Sharp, P.M., Li, W.H., 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38.

Sharp, P.M., Li, W.H., 1987. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.

Sharp, P.M., Matassi, G., 1994. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4, 851–860.

Sharp, P.M., Stenico, M., Peden, J.F., Lloyd, A.T., 1993. Codon usage: mutational bias, translational selection, or both? Biochem. Soc. Trans. 21, 835–841.

Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143.

Shi, X.F., Huang, J.F., Liang, C.R., Liu, S.Q., Xie, J., Liu, C.Q., 2001. Is there a close relationship between synonymous codon bias and codon–anticodon binding strength in human genes? Chin. Sci. Bull. 12, 1015–1019.

Singer, G.A.C., Hickey, D.A., 2003. Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene 317, 39–47.

Stenico, M., Lloyd, A.T., Sharp, P.M., 1994. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22, 2437–2446.

Sueoka, N., 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U.S.A. 85, 2653–2657.

White, O., Eisen, J.A., Heidelberg, J.F., Hickey, E.K., Peterson, J.D., Dodson, R.J., Haft, D.H., Gwinn, M.L., Nelson, W.C., Richardson, D.L., et al., 1999. Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science 286, 1571–1577.

Wright, F., 1990. The “effective number of codons” used in a gene. Gene 87, 23–29.

Zavala, A., Naya, H., Romero, H., Musto, H., 2002. Trends in codon and amino acid usage in Thermotoga maritima. J. Mol. Evol. 54, 563–568.