Position-specific codon conservation in hypervariable gene families


Post on 16-Sep-2016






    Disulfide bonds between conserved cysteine residues maintain the structural integrity of surface loops in large families of extracellular ligands and receptors, most strikingly in the variable loops of venom-derived toxins [1]. Because cysteine can be encoded by two codons (TGT or TGC), T/C mutations in the third nucleotide of the triplet are expected to be silent mutations, that is, not selected for or against by the evolutionary forces acting on such peptides. Surprisingly, an analysis of cysteine codon usage in a number of hypervariable gene families reveals strict codon conservation in specific positions adjacent to or within the hypervariable regions of these genes. This phenomenon suggests the possible existence of specific positional codon-conservation mechanisms in certain genes and, furthermore, it can be used as a functional-genomics tool to identify critical residues in a particular protein family.

    The conopeptides are a large family of venom-derived toxins, recently suggested to be undergoing accelerated evolution for hypervariability in the mature toxin domain [2].

    To estimate the possible range of variability of conopeptides, we examined their precursor cDNAs currently available in GenBank and, after eliminating redundant sequences, chose the largest available family (the so-called scaffold VI/VII grouping) for further study. The resulting multiple sequence alignment consisted of 53 conopeptide precursors from nine different species. As expected from previous studies of this family [2], the open reading frames revealed strong conservation in their N-terminal signal sequences, dropping somewhat in the pro-domain, and with almost no conservation in the mature toxin segment except for the invariant cysteine residues (Fig. 1a). Most strikingly, examination of the corresponding nucleotide alignments revealed that five out of the six cysteines in these peptides exhibit a pronounced position-specific codon conservation (Fig. 1b). This position-specific codon conservation is all the more remarkable because it appears in the most hypervariable region of the sequence (Fig. 1c). It is not a reflection of a global codon bias in these species

    Position-specific codon conservation in hypervariable gene families

    Outlook GENOME ANALYSIS Codon conservation

    TIG February 2000, volume 16, No. 2. 0168-9525/00/$ see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(99)01956-3

    Silvestro G. Conticello silvoc@wicc.weizmann.ac.il

    Yitzhak Pilpel* tpilpel@genetics.med.harvard.edu

    Gustavo Glusman* bmgustav@bioinfo.weizmann.ac.il

    Mike Fainzilber mike.fainzilber@weizmann.ac.il

    Laboratory of Molecular Neurobiology, Department of Biological Chemistry; and *The Crown Human Genome Center, Department of Molecular Genetics; Weizmann Institute of Science, 76100 Rehovot, Israel.


    [Figure 1, panels (a) to (c). Region labels: Signal, Propeptide, Mature toxin. Cysteine positions marked: Cys1, Cys2, Cys3+4, Cys5, Cys6.]

    FIGURE 1. Position-specific cysteine codon conservation in conopeptides

    (a) Sequence logo [12] from alignment of 53 conopeptides from nine species. Note the high conservation in the signal peptide, lower in the pro region, and high variability in the inter-cysteine loops of the mature peptide. (b) Cysteine codon conservation from the conopeptide multiple alignment. The probabilities of obtaining the observed codon biases were estimated from a binomial distribution assuming a priori probabilities of 43.5% TGC versus 56.5% TGT, calculated from the codon bias tables for the five most sequenced molluscan species. p values for the cysteine codon biases are highly significant for Cys1 (p < 10⁻¹⁹), Cys2 (p < 10⁻⁵), Cys3 (p < 10⁻¹⁹), Cys4 (p < 10⁻¹⁴) and Cys5 (p < 10⁻⁹), and on the borderline for Cys6 (p = 0.01). (c) Hypervariability in the immediate sequence environment of the conserved codons. Protein sequence variability at each alignment position was calculated as previously described [13]. The solid horizontal line represents an average variability taken from analysis of our 260 BLAST data sets (see text), and the two dashed lines represent the margins of two standard deviations in each direction. Note the extreme variability of the sequence environment, compared with the high conservation of cysteines and their codons.

    because the codon usage of TGC/TGT in molluscs is close to 50%. Furthermore, the preferred codon for Cys1, Cys3, Cys4 and Cys5 in these conopeptides is TGC, whereas Cys2 is preferentially encoded by TGT, and Cys6 shows a less-biased ratio of TGC/TGT (Fig. 1b).
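    The binomial estimate described in the Figure 1 legend can be illustrated with a minimal Python sketch. Only the a priori TGC probability (43.5%) and the number of aligned sequences (53) come from the text; the per-position codon count, the function name `binomial_upper_tail`, and the choice of a one-sided tail are assumptions made for illustration, since the article does not report raw counts or state which tail was used.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A priori probability of TGC quoted in the Figure 1 legend (molluscan codon tables).
P_TGC = 0.435

# Number of aligned conopeptide precursors, from the text.
n_sequences = 53

# HYPOTHETICAL count of sequences carrying TGC at one cysteine position.
# This value is NOT from the paper; it only illustrates the calculation.
n_tgc = 50

# One-sided probability of seeing at least n_tgc TGC codons by chance
# (the one-sided choice is an assumption of this sketch).
p_value = binomial_upper_tail(n_tgc, n_sequences, P_TGC)
print(f"P(X >= {n_tgc} of {n_sequences}) = {p_value:.3g}")
```

    For a position that prefers TGT, as Cys2 does, the same function can be called with the TGT count and the complementary prior of 0.565.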

    To find out whether this observation also holds in other variable gene families, we performed automated BLAST [3] searches of GenBank to identify sets of similar sequence stretches of 50 residues terminating on a cysteine. The resulting data were then restricted to 260 alignments containing 50 or more members, and these sets were analysed for conservation of the cysteine codon versus the average T:C ratio in the nine nucleotide positions immediately after the cysteine codon (these nucleotides do not form part of the original BLAST query). After removal of the alignments showing an overall C or T bias in the nine neighboring nucleotide positions, a number of large gene families with specific cysteine codon conservation adjacent to an unbiased T/C environment were identified (Fig. 2). Most of the identified families are genes with hypervariable regions and, intriguingly, for some of them (mucins, T-cell receptors, immunoglobulin heavy chain, and zinc-finger domain families), the conserved cysteine codon is the one immediately preceding the hypervariable region. It is noteworthy that these families do not reveal different cysteine codon biases at different positions, as observed for the conotoxins.
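    The screening step described above, comparing the cysteine codon with the T/C composition of the nine nucleotides that follow it, might be sketched as below. This is an illustrative reconstruction and not the authors' pipeline: the function names, the toy alignment, and the two thresholds (`codon_cutoff`, `env_tolerance`) are hypothetical, and the article gives no numerical criterion for what counts as an "overall C or T bias". The sketch also assumes ungapped nucleotide sequences already aligned so that the cysteine codon starts at the same index in each.

```python
from collections import Counter


def cys_codon_tgc_fraction(seqs, cys_start):
    """Fraction of TGC among the TGC/TGT codons found at index cys_start."""
    codons = Counter(s[cys_start:cys_start + 3] for s in seqs)
    total = codons["TGC"] + codons["TGT"]
    return codons["TGC"] / total if total else None


def environment_c_fraction(seqs, cys_start, window=9):
    """Fraction of C among the C/T bases in the `window` nucleotides after the codon."""
    start = cys_start + 3
    counts = Counter()
    for s in seqs:
        counts.update(base for base in s[start:start + window] if base in "CT")
    total = counts["C"] + counts["T"]
    return counts["C"] / total if total else None


def is_candidate(seqs, cys_start, codon_cutoff=0.9, env_tolerance=0.15):
    """Flag a strongly biased Cys codon next to a roughly unbiased T/C environment.

    Both thresholds are HYPOTHETICAL; the article does not state its cutoffs.
    """
    codon = cys_codon_tgc_fraction(seqs, cys_start)
    env = environment_c_fraction(seqs, cys_start)
    if codon is None or env is None:
        return False
    codon_conserved = codon >= codon_cutoff or codon <= 1 - codon_cutoff
    env_unbiased = abs(env - 0.5) <= env_tolerance
    return codon_conserved and env_unbiased


# Toy alignment (invented data): Cys codon at index 0, then nine nucleotides.
toy_alignment = [
    "TGCACTGGCTAT",
    "TGCGCTACCTGA",
    "TGCTTCAGGCAT",
    "TGCCATGTCACG",
]
print(is_candidate(toy_alignment, 0))
```

    Measuring the environment as a C fraction around 0.5 is a simplification; the article instead compares the codon bias with the average T:C ratio and tests significance against species-specific codon tables.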

    Although global codon biases in different phyla are well documented [4–7], the only example for a region-restricted codon preference in a defined gene family that we are aware of is a preference for readily mutable codons in the hypervariable regions of immunoglobulins [8]. This tendency has been suggested to act as a possible facilitator of accelerated somatic mutation [9,10]. Our observations suggest that an even more cogent phenomenon might occur in gene families that reveal expanded variability, such as venom-derived toxins and various recognition molecules in the immune system. The stringent position-specific conservation of cysteine codons before or within hypervariable regions of these different gene families suggests that a specific mechanism might have arisen to ensure and maintain the observed codon conservation. Such a mechanism could have arisen in order to conserve the structurally crucial cysteines, thereby imposing the observed codon conservation as a byproduct of conservation of the encoded amino acid residue. Although our observations do not shed light on the molecular nature of such a mechanism, they do indicate that it is likely to work at the level of specific DNA or RNA recognition and/or modification. Thus, one possibility is that specific protecting molecules



    [Figure 2 legend keys: Environment, Cys codon; C, T.]

    FIGURE 2. Cysteine codon conservation

    Cys codon conservation at identified positions in a number of gene families, compared with average C:T ratios calculated over nine neighboring nucleotide positions following the cysteine codon. The probabilities of obtaining the observed cysteine codon bias were estimated from a binomial distribution assuming a priori probabilities of 42% TGC and 58% TGT, according to the human codon bias table, except for the Trypanosoma mucins, for which the trypanosome bias of 66% TGC and 34% TGT was used. All biases shown are statistically significant (p values from 10⁻¹³ to 10⁻³⁸). Gene families in the panel: mucins (MUC); L-type calcium channels (CaCh); T-cell receptor γ CDR3 (TRGV); T-cell receptor α CDR3 (TRAV); T-cell receptor δ CDR3 (TRDV); Ig heavy chain CDR3 (IGHV); Ig constant regions (IGC1, IGC2); HLA (HLA1, HLA2, HLA3); pregnancy-specific glycoproteins (PSG1, PSG2); zinc-finger domain proteins (ZnF1, ZnF2) and cytochrome P450 (CYT).


    [Figure 3, panels (a) and (b). Panel (a): sequence logo over alignment positions 1 to 12. Panel (b): observed (O) versus expected (E) bars for Y codons, D codons and P codons.]

    FIGURE 3. Tyrosine codon conservation

    A position-specific tyrosine codon conservation in olfactory receptors. (a) Sequence logo in the region of the consensus Met, Ala, Tyr, Asp, Arg, Tyr (MAYDRY) derived from multiple alignment of 71 olfactory receptor DNA sequences. (b) O (observed) versus E (expected, as calculated from human codon bias table) codon usage for the conserved residues in the alignment. P value for Y3: p < 10⁻¹⁹ (calculated as above).

    bind to cysteine codons in hypervariable regions of selected genes, in order to protect them from the enhanced mutagenesis that might occur in close proximity. These protective molecules, whose role might be analogous to that of a lithographic mask, would thereby impose the observed codon conservation as a byproduct of the mechanism for conservation of the encoded amino acid residue. More intriguingly, one might speculate that complexes of such hyperprotected codons could actually target mutability by providing recognition sites for a mutator complex, which could then operate on adjacent sequence stretches.

    Regardless of the molecular or evolutionary mechanisms u

