genomicstructure dnaencoding the lymphocyte homing ...proc. nadl. acad. sci. usa vol. 89, pp....

5
Proc. Nadl. Acad. Sci. USA Vol. 89, pp. 12160-12164, December 1992 Immunology Genomic structure of DNA encoding the lymphocyte homing receptor CD44 reveals at least 12 alternatively spliced exons (CD44 gemn) GAVIN R. SCREATON, MARTYN V. BELL, DAVID G. JACKSON, FRANCOIS B. CORNELIS, ULRICH GERTH, AND JOHN I. BELL Molecular Immunology Group, Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DU, United Kingdom Communicated by David Weatherall, August 27, 1992 (received for review June 26, 1992) ABSTRACT The CD44 molecule is known to display ex- tensive size heterogeneity, which has been attributed both to alternative splicing and to differential glycosylation within the exracellular domain. Although the presence of several alter- native exons has been partly inferred from cDNA sequencing, the precise intron-exon oration of the CD44 gene has not been described to date to our knowledge. In the present study we describe the structure of the human CD44 gene, which contains at least 19 exons s ning some 50 kiobases of DNA. We have identified 10 alternatively spliced exons within the extracellular domain, including 1 exon that has not been previously reported. In addition to the cluson or excusion of whole exons, more diversity is generated through the uztion of internal splice donor and acceptor sites within 2 of the individual exons. The variation previously reported for the cytoplasmic domain is shown to result from the alternative splicing of 2 exons. The genomic structure of CD44 reveals a remarkable degree of complexity, and we confirm the role of alternative splicing as the basis of the structural and functional diversity seen in the CD44 molecule. The human CD44 glycoprotein [Pgp-1 (1), HCAM (2), Hermes antigen (3), ECMR III (4)] has been proposed to function as a lymph node homing receptor on circulating lymphocytes. Expressed on a wide range of different tissues, the CD44 molecule also binds the extracellular matrix components hy- aluronic acid (5), fibronectin (6), and collagen (4) as well as the cytoskeletal protein ankyrin (7). Several antibodies recogniz- ing CD44 have been shown to induce lymphocyte activation (8, 9) and to inhibit lymphopoiesis (10). In addition, CD44 can mediate both homotypic and heterotypic cell adhesion (11, 12). The many functional roles of this molecule may relate to its considerable size heterogeneity, which cannot be accounted for simply by differences in glycosylation (5, 13, 14). Recently, we and others have isolated a number of differ- ent isoforms of CD44 (15-18). These have been characterized by cDNA sequencing and appear to arise by alternative splicing in two different regions, the membrane proximal extracellular domain and the cytoplasmic tail. Some of this variation has been shown to produce functional changes in the molecule. For example, the presence of the 396-base-pair (bp) insert in the epithelial variant of CD44 reduces the affinity for hyaluronic acid (5). Study of these cDNA variants has given some insight into the genomic organization of CD44 but has not allowed the precise characterization of the number and boundaries of exons that encode the variant region of the molecule. In the present paper we have cloned the gene for CD44 from a human yeast artificial chromosome (YAC) and char- acterized its genomic structure.* These studies reveal a remarkable degree of complexity within the structure of the CD44 gene and demonstrate conclusively the role of alter- native splicing in generating the structural heterogeneity that is characteristic of the CD44 molecule. MATERIALS AND METHODS Isolatin, Chatization, and Sudoing of a CD44- Contining YAC. The primer pair Fil-Ril was used to screen a YAC library provided by the U.K. Human Genome Mapping Resource Centre (19) by the PCR method (20). Total yeast DNA from positive YACs was prepared in agarose blocks (21) and analyzed by pulsed-field gel electrophoresis, Southern transfer, and hybridization to 32P-labeled CD44 cDNA variant E (17). A partial restriction map was generated by the partial digestion and indirect end-labeling method (21). To obtain subclones of the YAC, total yeast DNA in agarose blocks from one of the YAC clones was digested for 2 min with 1 unit of Taq I per jig of DNA. After phenol/ chloroform extraction, the fragments were cloned into the unique Cla I site of pL53In (22) and used to transform Escherichia coli DH5a cells. Clones containing CD44 were identified by colony hybridization to 32P-labeled CD44 vari- ant E (17) and intron PCR products (this paper). Plasmid DNA was prepared by standard methods (23), and the CD44 exon content was determined by dot-blot hybridization to 32P-end-labeled oligonucleotides. PCR Amplification. One-hundred nanograms of total yeast DNA or 500 ng of human placental DNA was used as template in a 50-jSu reaction mix that contained 10 mM Tris-HCl (pH 8.4), 50 mM KCl, 2.5 mM MgCl2, 250 L&M each dNTP, the appropriate primers at 200 nM, and 2 units of Taq DNA polymerase. The cycle parameters were 5 min at 950C, 1 min at 630C, and 3 min at 720C (1 cycle) followed by 1 mm at 950C, 1 min at 630C, and 3 min at 720C for 40 cycles. PCR products were verified by direct sequencing across the splice junctions. Sequencing of double-stranded templates, either PCR products or plasmids, was performed by using 32Pend- labeled primers and Taq DNA polymerase (24). PCR primers used to amplify introns are given in Table 1. Sinl Prie PCR. In cases where it was not possible to amplify whole introns by PCR and no YAC subclones were available, we developed a single-primer PCR method to se- quence across the splice junction. Buffer conditions were the same as for standard PCR (see above), and the primer concen- trations were doubled. Three separate 60-cycle amplifications used annealing temperatures of 5(YC, 60C, and 70CC. The products of these three reactions were pooled and sequenced directly; an internal primer was used to provide specificity. Abbreviations: YAC, yeast artificial chromosome; 3' UTR, 3' un- translated region. *The sequences reported in this paper have been deposited in the GenBank data base (accession nos. L05407-LO5424 for CD44 exons 2-19). 12160 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact. Downloaded by guest on June 27, 2021

Upload: others

Post on 07-Feb-2021

9 views

Category:

Documents


0 download

TRANSCRIPT

  • Proc. Nadl. Acad. Sci. USAVol. 89, pp. 12160-12164, December 1992Immunology

    Genomic structure of DNA encoding the lymphocyte homingreceptor CD44 reveals at least 12 alternatively spliced exons

    (CD44 gemn)

    GAVIN R. SCREATON, MARTYN V. BELL, DAVID G. JACKSON, FRANCOIS B. CORNELIS, ULRICH GERTH,AND JOHN I. BELLMolecular Immunology Group, Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DU, United Kingdom

    Communicated by David Weatherall, August 27, 1992 (receivedfor review June 26, 1992)

    ABSTRACT The CD44 molecule is known to display ex-tensive size heterogeneity, which has been attributed both toalternative splicing and to differential glycosylation within theexracellular domain. Although the presence of several alter-native exons has been partly inferred from cDNA sequencing,the precise intron-exon oration of the CD44 gene has notbeen described to date to our knowledge. In the present studywe describe the structure of the human CD44 gene, whichcontains at least 19 exons s ning some 50 kiobases of DNA.We have identified 10 alternatively spliced exons within theextracellular domain, including 1 exon that has not beenpreviously reported. In addition to the cluson or excusion ofwhole exons, more diversity is generated through theuztionof internal splice donor and acceptor sites within 2 of theindividual exons. The variation previously reported for thecytoplasmic domain is shown to result from the alternativesplicing of 2 exons. The genomic structure of CD44 reveals aremarkable degree of complexity, and we confirm the role ofalternative splicing as the basis of the structural and functionaldiversity seen in the CD44 molecule.

    The human CD44 glycoprotein [Pgp-1 (1), HCAM (2), Hermesantigen (3), ECMR III (4)] has been proposed to function as alymph node homing receptor on circulating lymphocytes.Expressed on a wide range of different tissues, the CD44molecule also binds the extracellular matrix components hy-aluronic acid (5), fibronectin (6), and collagen (4) as well as thecytoskeletal protein ankyrin (7). Several antibodies recogniz-ing CD44 have been shown to induce lymphocyte activation(8, 9) and to inhibit lymphopoiesis (10). In addition, CD44 canmediate both homotypic and heterotypic cell adhesion (11, 12).The many functional roles of this molecule may relate to itsconsiderable size heterogeneity, which cannot be accountedfor simply by differences in glycosylation (5, 13, 14).

    Recently, we and others have isolated a number of differ-ent isoforms ofCD44 (15-18). These have been characterizedby cDNA sequencing and appear to arise by alternativesplicing in two different regions, the membrane proximalextracellular domain and the cytoplasmic tail. Some of thisvariation has been shown to produce functional changes inthe molecule. For example, the presence ofthe 396-base-pair(bp) insert in the epithelial variant of CD44 reduces theaffinity for hyaluronic acid (5). Study ofthese cDNA variantshas given some insight into the genomic organization ofCD44but has not allowed the precise characterization of thenumber and boundaries of exons that encode the variantregion of the molecule.

    In the present paper we have cloned the gene for CD44from a human yeast artificial chromosome (YAC) and char-acterized its genomic structure.* These studies reveal a

    remarkable degree of complexity within the structure of theCD44 gene and demonstrate conclusively the role of alter-native splicing in generating the structural heterogeneity thatis characteristic of the CD44 molecule.

    MATERIALS AND METHODSIsolatin, Chatization, and Sudoing of a CD44-

    Contining YAC. The primer pair Fil-Ril was used toscreen a YAC library provided by the U.K. Human GenomeMapping Resource Centre (19) by the PCR method (20). Totalyeast DNA from positive YACs was prepared in agaroseblocks (21) and analyzed by pulsed-field gel electrophoresis,Southern transfer, and hybridization to 32P-labeled CD44cDNA variant E (17). A partial restriction map was generatedby the partial digestion and indirect end-labeling method (21).To obtain subclones of the YAC, total yeast DNA in

    agarose blocks from one of the YAC clones was digested for2 min with 1 unit of Taq I per jig of DNA. After phenol/chloroform extraction, the fragments were cloned into theunique Cla I site of pL53In (22) and used to transformEscherichia coli DH5a cells. Clones containing CD44 wereidentified by colony hybridization to 32P-labeled CD44 vari-ant E (17) and intron PCR products (this paper). PlasmidDNA was prepared by standard methods (23), and the CD44exon content was determined by dot-blot hybridization to32P-end-labeled oligonucleotides.PCR Amplification. One-hundred nanograms of total yeast

    DNA or 500 ng of human placental DNA was used astemplate in a 50-jSu reaction mix that contained 10 mMTris-HCl (pH 8.4), 50mM KCl, 2.5 mM MgCl2, 250 L&M eachdNTP, the appropriate primers at 200 nM, and 2 units of TaqDNA polymerase. The cycle parameters were 5 min at 950C,1 min at 630C, and 3 min at 720C (1 cycle) followed by 1 mmat 950C, 1 min at 630C, and 3 min at 720C for 40 cycles. PCRproducts were verified by direct sequencing across the splicejunctions. Sequencing of double-stranded templates, eitherPCR products or plasmids, was performed by using 32Pend-labeled primers and Taq DNA polymerase (24). PCR primersused to amplify introns are given in Table 1.Sinl Prie PCR. In cases where it was not possible to

    amplify whole introns by PCR and no YAC subclones wereavailable, we developed a single-primer PCR method to se-quence across the splice junction. Buffer conditions were thesame as for standard PCR (see above), and the primer concen-trations were doubled. Three separate 60-cycle amplificationsused annealing temperatures of 5(YC, 60C, and 70CC. Theproducts of these three reactions were pooled and sequenceddirectly; an internal primer was used to provide specificity.

    Abbreviations: YAC, yeast artificial chromosome; 3' UTR, 3' un-translated region.*The sequences reported in this paper have been deposited in theGenBank data base (accession nos. L05407-LO5424 for CD44 exons2-19).

    12160

    The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement"in accordance with 18 U.S.C. §1734 solely to indicate this fact.

    Dow

    nloa

    ded

    by g

    uest

    on

    June

    27,

    202

    1

  • Proc. Natl. Acad. Sci. USA 89 (1992) 12161

    Table 1. Primers for PCR amplification of intronsPrimer SequenceF2 AACCTGCCGCTTTGCAGGTGTATTR3 GAAGCAATATGTGTCATACTGGGAGF4 CTCCACCTGAAGAAGATTGTACATCF5 TTACACCTTTTCTACTGTACACCCCR5 TAGCAGGGATTCTGTCTGTGCTGTF6 CACTTTGATGAGACTAGTGCTACAGR6 CAGCCATTTGTGTTGTTGTGTGAAGF7 GTACGTCTTCAAATACCATCTCAGCR7 CTTCATCATCATCAATGCCTGATCCF8 ACGGGCCTTTGACCACACAAAACAR8 TTTGAATGGCTTGGGTTCCACTGGF9 CCCCTCATTCACCATGAGCATCATR9 GCTTGTAGAATGTGGGGTCTCTTCF10 CCAGGCAACTCCTAGTAGTACAACR10 GAATGGGAGTCTTCTCTGGGTGTTF1l CTCAGCTCATACCAGCCATCCAATR11 CCATCCTTCTTCCTGCTTGATGACF12 TGGACTCCAGTCATAGTACAACGCR12 GTCATTGAAAGAGGTCCTGTCCTGF13 GCAGAGTAATTCTCAGAGCTTCTCTR13 TGTCAGAGTAGAAGTTGTTGGATGGF14 ATGTCACAGGTGGAAGAAGAGACCR14 TGGAATCTCCAACAGTAACTGCAGTF15 tgcagtactgaccttcctgattgcR15 TCAGATCCATGAGTGGTATGGGACF16 ggggggaaatcttttgtgtaatgctR16 TTTGGGGTGTCCTTATAGGACCAGF17 ccagaattacctgcctattggctgR17 GACTGCAATGCAAACTGCAAGAATCR18 AGGGAAAATGAGGAAGCTGGAAGGF19 gatgatctgatgcctgagtcactgR19 ccttaccattctcagctcttcctg

    Sequences begin with the 5' nucleotide. Primers in lowercaseletters are based upon the intron sequence and will amplify across thenearest adjacent exon. The numbers correspond to the exon nearestto the primer (eg., the combination of F2 with R3 will amplify theintron between exons 2 and 3 and will have part of exons 2 and 3 ateach end).

    RESULTS AND DISCUSSIONGenomic Map of CD44. A YAC clone containing the CD44

    gene was obtained by PCR screening ofa library provided bythe U.K. Medical Research Council Human Genome Map-ping Resource Centre. Restriction and pulsed-field gel elec-trophoresis analyses (not shown) indicate that this clone isapproximately 320 kilobases (kb) in size and that the CD44

    Ab .

    gene is contained within a 105-kb Sal I fragment and occupiessome 50-60 kb of genomic DNA upon adjacent Cla I frag-ments of approximately 29 and 31 kb.A partial restriction map of the YAC is shown in Fig. 1A

    with the CD44 portion enlarged in Fig. 1B. The 5' end of thegene is located by the sites for Sma I and Sac II, which arepresent in the published sequence approximately 200 bp 5' ofthe initiation codon (18). Portions of the CD44 gene, eitherintron PCR products or plasmid subclones, were assigned tointervals in the map by restriction analysis (Fig. 1C).Most of the introns, except those between exons 1 and 2, 3

    and 4, 5 and 6, and 18 and 19, were readily amplified fromgenomic DNA; PCR products or subclones were sequenced toreveal the precise splice boundaries and the flanking splicesites. The coding sequence of the gene and intron sizesestimated fromPCR are shown in Fig. 2 along with the flankingintron sequences; the intron at the 3' end of exon 3 was notamplified by either conventional or single-primer PCR.Membrane Proximal Extracellular Domain. The region of

    the CD44 gene encoding the membrane-proximal extracellu-lar domain spans 25 kb ofgenomic DNA and contains at least10 alternatively spliced exons (5-14) including one, exon 6,that has not been previously described (Fig. 3). Nine of thesevariable-region exons (exons 6-14) can be skipped by alter-native splicing, while exon 5 contains an alternative splicedonor site; in addition, exon 7 contains two distinct spliceacceptor sites. A variant has also been described in the rat,where the homologue ofhuman exon 15 has been spliced out(25), giving even greater potential diversity to this region.Exon 6 was identified in cDNA from the colon carcinoma

    cell line HT-29 by PCR, where we designed primers to searchfor new exons by amplification between variable regionexons that were presumed to be adjacent. Amplificationbetween exons 5 and 7 consistently produced a product 320bp long in addition to the expected product of 191 bp (Fig. 4).The larger product was directly sequenced and shown tocontain the 129-bp insert, exon 6 (Fig. 2). The sequence ofexon 6 encodes a 43-amino acid peptide that contains a highproportion of serine and threonine residues, a feature that isshared with the other alternative exons, allowing for exten-sive 0-linked glycosylation. This new exon was also detectedin several other cDNAs including mammary and bladdertumors by PCR (data not shown). We have preliminaryevidence for further alternatively spliced exons lying be-tween exons 5 and 6 in the mouse (data not shown).The cDNA sequences for this variable region can now be

    located on our new exon map (Fig. 3); exons 14, 13, 12, 7, and11 correspond to exons 1-5 previously reported from our

    '1I I I

    300 200 1OQ - - -

    K A

    I I IT I

    A

    AX

    11 17

    1 2 3 4 5 6 7 a 9 10 11 12 13 14 15 16 17 18 19L -La'_ " .. s A .....-.-L -"RRR RCP R RA PR R R P

    R R RR A PAR Rp44.2 p44.4 p.44 19 .15

    p45 I 9 R RAPp44.11

    o 10KEI _ C

    FIG. 1. Genomic organization of the CD44 gene. (A) Partial restriction map of the whole YAC. (B) Terminal portion shown in more detail.(C) Exon distribution aligned with the YAC restriction map and plasmid subclones. Restriction site symbols are as follows: K, Sac II; A, SmaI; L, Sal I; C, Cla I; X, Xho I; R, EcoRI; P, Kpn I. The open and solid triangles represent the URA3 and TRPI YAC vector arms, respectively.The dashed lines in C represent introns, the sizes of which can only be estimated from the restriction data and cannot be directly measured.

    - .... . -mI MI

    m I if I I -1

    Immunology: Screaton et al.

    I I

    Dow

    nloa

    ded

    by g

    uest

    on

    June

    27,

    202

    1

  • 12162 Immunology: Screaton et al. Proc. Nati. Acad. Sci. USA 89 (1992)

    D X W WHV AAA W L C L V P L 8 L A Q I D1ami .A¶_CAAGT__-OOCACOCAGCCTOO0OACTCTOOC1V....C...T..cC.O.cO..A..gtgagtcccgCgec.gcau... .... aattctatattcttcccotagATL N I T C R F A a v F H V 9 X t0 a R Y 8 I 8 R T Z A A D L C K A F N 8 T L P T N A Q N 3

    K A L 8 I a FF T C R Y a F I I a H V V I P RAA:AOTCT0AOCA¶OOAT¶¶rAO dXTOCARgtaagagaccagcaccgc..cg....S....... 3..... ctgttgtttgttctctttetc A__I H P N 8 I C A A N N T O V Y I L T O N T O Q Y D T Y C F N A 0 A

    _ _ \ _ _ > _ _ .... ... ctttttttcctacctcatagCTP P L I D C T 8 V T D L P N A F D a P I T I T I

    TAAAA CMC CAACCATtAtAtgtotettettot...........................................tgtctctctcta.atct..... ....gta.5..t tecctctcccagcTATTV N R D. a T R Y V Q K a B Y R T N P L D I Y P 8 N P T D D D V O8 aO 8 8 1 R 8 8 T 8 a

    a Y I F Y T F 8 T V H P I P D * D 8 P V I T D 8 T D R I P A T T_._uoas4t.... ttca

    L 08 T AT A T T A T K R Q I T W D W F O W L F L P 80 1 N H L N T Tactattattacaac AlaA

    T 3 8 N T I 8 A a W 9 P N t t N 2 DT Q N A aACACAAATOGOCTgtaatgagttattoatcC...... .....2=0 .. ..atgcac__ATtaTcA A AA

    3 R D R H L 8 F 8 a 8 a I D D D * D F I 8 8 T I1ctagaotsatceattocag.... ...k .... .. ttcaatcatcgttatcacag

    O T T P R A F D H T K Q N Q D W T Q W N P 0 H 8 N P I V L L Q T T T R M T DTVAAAscCOGAICTACTCPAFCAACCACAMOATQACT~taatgggmttgtrcatat

    V D R N G T T A Y 9 G N W N P R A H P P L I H H 9 H H Ett.... 0.7kb... o..tttacctctca

    I I I T P H 8 T 8 T I Q A T P 0 8 T T I Z T AGA GA GAGACCCCACATICTACAAOCACA~gteaagcaagatggoggtcgg ..... Skb...... ....... aactgatttettectcacT Q K Z Q W F O N R III 0a Y R Q T P R I D O H O T T a T A AACCCAOAAO0ACWI¶'IZOCAACAOAT~OCATG&OOO&TATVOCCAAACAtCCAOMAAOACTCCCM¶VOACAAC&OOOAC&OC eaatggatggtttaacaag .... l.Skb ....uoll.

    A 8 A H T 8 H P M. Q a R T T P 8 P Z D 8 8 W T D F F N P I 8 M P M1 a R a. .ctcaaactgcatggtcc___ _ _= _Ssoac

    H Q A a R R M D M D 80 H 8 T T L Q P T A N PCATCAAGCAGOAAGAAGOATOcgtoatagcctcgagtttt .... .l.kb......t..oo.actatttgattcctt T__CA

    N T O L V I D L D R T G P L 0 M T T Q Q O N 8_ _ AACOgtaogaataocgatgctcag... .2.3kb.... tWOU13.... ttcattcctcagaaoAcAA T

    Q 8 F 8 T 8 N Z G L ZS Z D 1C D H P T T 8 T L T 8 8 N_AAC _TACA _ _ATOA _C¶¶OOAMAMATAAOACCA1iCACAC1CTAC¶VTOACATVMOCtg~oagggettataaoocctag.... .......z14.... ctgattccaccetc

    R N D V T a G R R D P N H 8 Z a 8 T T L L Z a Y T 8 H Y P N T It 3 9 R T r I P V T

    O A K T O8 F 0 V T A V T V O D O N 8 N V N R 0 L 0 0IACAATOcT¶wrCwrcOrtAAltv geaaettggceatttattatc... .5.b ....Bm15..... tcctgattg

    D Q D T F H P 8 a a 8 H T T M a 8 B 8 D actcartec _ Ttcaaaactgcttte.....5kb... luxmlf .gtgttgtttttcccc

    H O H 0 8 Q 1 0 0 A N T T O O P I R T P Q I P 1tt_ _AA AA t gtgagettcaaatttga.g... 2.5kb... . 17.. . atttaattactc

    W L I I L A S L L A L A L I L A V C I A V N 8 R R Ratacc gATTOOTOAVGarZ.cc.........................CATT'OCATIVAACAOTCGAGAA.gtaaggggctgtcctggggg..... .6kb... .tosS.... ct

    attttotgggaaactgtFaMMC0TCAAGM A.AG.Q-A

    C O Q K K K L V I N 80 NT1CCCTO~kPt1t1'OOC'1,ATTFAAATATGAATTTCCTattgtgtgCatcagoccttt ..2p19 .... attttttaoattatt _ _AOOO A V Z D R K P 8O L N I AA 8 08 Q I M V H L V 1 008 I9 T P D Q F M Aa D I T R NL Q N V D ItK -I V **

    FIG. 2. Nucleotide sequence for the CD44 gene and predicted amino acid sequence ofCD44 molecule. cDNA sequence is in uppercase letters,with the junctional intron sequence in lowercase letters. Intron lengths are based upon PCR; ifno distance is given, amplification failed to bridgethe gap, and an estimate of the distance from restriction data is given in Fig. 1. Translation termination codons are marked "***," and the longA+T-rich tract associated with the short-tail 3' untranslated region (UTR) is underlined. The sequence for exon 1, coding for the putative leaderpeptide, is taken from ref. 18 and our own cDNA sequence. Exons 5-14 encode the variable region exons of the proximal extracellular domain,with the two arrows marking the alternative splice donor and acceptor sites in exons 5 and 7, respectively (15, 18).

    laboratory (17). Ofthe five domains reported by Hoffman et al.,three represent more than one exon; thus, exon 7 = domain 1,exons 8 and 9 = domain II, exons 10 and 11 = domain III, exons12 and 13 = domain IV, and exon 14 = domain V.The sequence between exons 12-14 is the only variable

    portion ofthe molecule for which genomic sequence has beenpreviously reported (26); however, some of this sequencediffers from that generated in our study. We could find no

    evidence of polymorphism to account for these differencesafter sequencing the region from nine different haplotypesfrom four individuals and the YAC clone.

    CytoplasMic Tail. cDNA variants ofCD44 that encode long(27) and short (2) cytoplasmic tails have been reported. Themore abundant form has a 210-bp coding for a 70-amino acidcytoplasmic tail followed by a 3' UTR extending for either0.12, 0.69, or 3 kb downstream of the translation stop codon

    Dow

    nloa

    ded

    by g

    uest

    on

    June

    27,

    202

    1

  • Proc. Natl. Acad. Sci. USA 89 (1992) 12163

    LP 'I'M

    aExonno 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    b

    C

    d

    Membrane Proximal Variable Domain Tail Exons

    FIG. 3. (a) Schematic exon map showing all 19 exons ofCD44 (see Fig. 2). Open boxes represent constitutive exons, and solid boxes representalternative exons that can be wholly spliced out. Alternative splice donor and acceptor sites on exons 5 and 7 are marked with arrows, and thecrosshatching represents the 3' UTR sequence. Exons encoding the leader peptide LP, and transmembrane region TM are marked. (b and c)Hemopoietic variants of CD44 encoding short (b) or long (c) cytoplasmic tails. (d) Exons encoding the epithelial form of the molecule.

    (18). In the less abundant form, the DNA encoding thecytoplasmic tail is truncated to 9 bp and is followed by adistinct 3' UTR. As the junction between short and long tailscontains the invariant part of the splice donor consensus GTin humans, baboons (28), and mice (29), it has been proposedthat the short tail is generated by splicing from an internalsplice donor sequence within the long tail to a downstream 3'untranslated exon (30). This is not the case; in fact charac-terization of the exons encoding the cytoplasmic tails revealsthat the coding and 3' UTR of the long tail are both carriedon exon 19, while exon 18 carries the 3' UTR associated withthe short-tailed variant of CD44 and exon 17 carries thetransmembrane domain and first three amino acids of thecytoplasmic tail, which are shared by both long- and short-tailed variants (Fig. 2).

    R R R R6 7 8 9

    R R R R R R10 11 12 13 14 15

    603-

    310-281234-

    W.194-_

    118

    FIG. 4. Autoradiogram of PCR products generated by usingcDNA from the colon cancer line HT-29 as template. A commonprimer F5 was used in all of the reactions along with exon-specificprimers for the variable region exons, R6-R14. The product gener-ated by the combination of F5 with R15 spans the whole of theinsertion site in the membrane-proximal extracellular domain. Prod-ucts were run on a 2% agarose gel and then transferred ontoHybond-N+ (Amersham). After transfer, the membrane was probedwith the oligonucleotide R5, which lies internal to F5 in the backbonesequence of the molecule and will detect all CD44-containing prod-ucts. The figure shows the two products generated with the primerpair F5-R7, the upper band contains exon 6 and also shows thediversity of transcripts produced by this cell line with more than oneproduct in most of the lanes. A major band of 534 bp in the productsgenerated by using the primer pair of F5-R15, which span the wholeinsertion site, corresponds to the epithelial variant of CD44, con-taining variable exons 12-14, while the lowest band of 148 bprepresents the hemopoietic variant lacking exons 6-14 (Fig. 3).

    Concusions. With 12 of 19 exons being capable of alter-native splicing and the common use of alternative donor oracceptor splice sites within exons, CD44 is potentially one ofthe most extensively alternatively spliced genes reported todate. Both the fibronectin gene (31) and CD45 (32) containthree alternatively spliced exons, whereas the rat a tropo-myosin gene (33) and NCAM (34) can alternatively splice 10of 15 exons and 12 of 27 exons, respectively. None of theseexamples contain a tandem array of at least 10 exons thatappear to be selected at random as is the case with CD44,which gives enormous potential diversity. Tissue-specificpatterns of alternative splicing have been shown to haveimportant functional correlates in several systems includingsex determination in Drosophila (35) and ligand specificity forfibronectin (31). In CD44 the role ofone alternatively splicedvariant in enhancing the metastatic potential of rat tumors(25) indicates that the membrane proximal extracellular do-main may have important effects in mediating cell adhesionand targeting. This is supported by the effect of the intro-duction of exons 12-14 in abrogating hyaluronic acid bindingby the human epithelial variant (5). The functional roles ofmany other splice variants of CD44 remain to be established.The alternative splicing of the cytoplasmic tail may relate

    to the potential role ofthe CD44 molecule in T-cell activation.Alteration in the length of the cytoplasmic tail may modulateintracellular signaling, as the cytoplasmic tail has potentialsites for the action of both protein kinases A and C (14).Furthermore, it has been shown that CD44 associates withcytoskeletal actin filaments and ankyrin, presumably via thecytoplasmic tail. In macrophages activation increases CD44phosphorylation and reduces contacts with the cytoskeleton(14). In lymphocytes CD44 has been demonstrated to beassociated with protein kinase C. Protein kinase C activationby anti-CD44 monoclonal antibodies may be involved in bothankyrin binding (36) and the promotion ofT-cell adhesion viaLFA-1 (lymphocyte function-association antigen 1) (11). Inaddition, variation in the length of the cytoplasmic tail mayalso affect the ability of the molecule to bind specific ligandsthrough the N-terminal domain, as the short tail has beenshown to reduce hyaluronic acid binding by the hemopoieticvariant (37). Interestingly the DNA sequence for the short-tail 3' UTR carries a long A+T-rich tract (Fig. 2). SimilarA+U-rich elements have been found in the 3' UTRs ofseveral mRNAs coding for growth factors and protoonco-genes and are implicated in targeting the mRNA for rapidturnover (38).The tissue-specific patterns ofCD44 alternative exon usage

    suggest that the regulation of splicing not only is responsiblefor the characteristic size heterogeneity but also may affectthe function of the molecule. Although individual cell lines

    Immunology: Screaton et al.

    Dow

    nloa

    ded

    by g

    uest

    on

    June

    27,

    202

    1

  • 12164 Immunology: Screaton et al.

    appear to express a number of different CD44 transcripts(Fig. 4), it remains to be established whether single cells alsoexpress more than one isoform of CD44.Comparison of splice donor and acceptor sites flanking the

    alternatively spliced exons has revealed no obvious variationfrom consensus sequences. However, analysis of the poly-pyrimidine tract from -14 to -4 bp before the splice acceptorsite reveals a pyrimidine content of 87% for the constitutiveexons (1-5 and 15-17) compared with 65% for the alternativeexons (6-14, 18, and 19). Increasing the pyrimidine contentof this tract has been demonstrated to reverse exon skippingin artificial constructs (39).The genomic structure of CD44 presented here should

    facilitate the characterization of isoforms expressed in anumber of different cell types including tumors and lympho-cytes. In lymphocytes the regulated expression of CD44isoforms may be involved in activation, targeting, and on-togeny, while in nonlymphoid tissues it may play an impor-tant role in the control of cell adhesion. Future studies on thenature and control of alternative splicing of CD44 togetherwith functional studies of the alternatively spliced exons,particularly their ligand specifities, may help to shed furtherlight onto these events.

    We thank Zoe Christolodoulou, Mike Francis, and Kay Davies andthe Medical Research Council Human Genome Mapping ResourceCentre for assistance in isolating the CD44 YAC; Michael Reth forthe plasmid vector; David Simmons for the HT 29 cDNA; KenSmyth for mammary and bladder tumor RNAs; Penny Catlin forproducing numerous oligonucleotides; Charles Bangham for discus-sions on the single primer PCR sequencing method; and Sarah Woodfor her help in preparing the manuscript. This work was supportedby the Wellcome Trust. G.R.S. is a Medical Research CouncilTraining Fellow.

    1. Zhou, D. F., Ding, J. F., Picker, L. J., Bargatze, R. F.,Butcher, E. C. & Goeddel, D. V. (1989) J. Immunol. 143,3390-3395.

    2. Goldstein, L. A., Zhou, D. F., Picker, L. J., Minty, C. N.,Bargatze, R. F., Ding, J. F. & Butcher, E. C. (1989) Cell 56,1063-1072.

    3. Jalkanen, S. T., Bargatze, R. F., Herron, L. R. & Butcher,E. C. (1986) Eur. J. Immunol. 16, 1195-1202.

    4. Wayner, E. A., Carter, W. G., Piotrowicz, R. S. & Kunicki,T. J. (1988) J. Cell Biol. 107, 1881-1891.

    5. Stamenkovic, I., Aruffo, A., Amiot, M. & Seed, B. (1991)EMBO J. 10, 343-348.

    6. Jalkanen, S. & Jalkanen, M. (1992) J. Cell Biol. 116, 817-825.7. Kalomiris, E. L. & Bourguignon, L. Y. (1988) J. Cell Biol. 106,

    319-327.8. Denning, S. M., Le, P. T., Singer, K. H. & Haynes, B. F.

    (1990) J. Immunol. 144, 7-15.9. Huet, S., Groux, H., Caillou, B., Valentin, H., Prieur, A. M.

    & Bernard, A. (1989) J. Immunol. 143, 798-801.

    10. Miyake, K., Medina, K. L., Hayashi, S., Ono, S., Hamaoka,T. & Kincade, P. W. (1990) J. Exp. Med. 171, 477-488.

    11. Koopman, G., van Kooyk, Y., de Graaff, M., Meyer, C. J.,Figdor, C. G. & Pals, S. T. (1990) J. Immunol. 145, 3589-3593.

    12. Shimizu, Y., Van Seventer, G., Siraganian, R., Wahl, L. &Shaw, S. (1989) J. Immunol. 143, 2457-2463.

    13. Brown, T. A., Bouchard, T., St John T., Wayner, E. & Carter,W. G. (1991) J. Cell Biol. 113, 207-221.

    14. Camp, R. L., Kraus, T. A. & Pure, E. (1991) J. Cell Biol. 115,1283-1292.

    15. Hofmann, M., Rudy, W., Zoller, M., Tolg, C., Ponta, H.,Herrlich, P. & Gunthert, U. (1991) Cancer Res. 51, 5292-5297.

    16. Dougherty, G. J., Landorp, P. M., Cooper, D. L. &Humphries, R. K. (1991) J. Exp. Med. 174, 1-5.

    17. Jackson, D. G., Buckley, J. & Bell, J. I. (1992) J. Biol. Chem.267, 4732-4739.

    18. Shtivelman, E. & Bishop, J. M. (1991) Mol. Cell. Biol. 11,5446-5453.

    19. Brownstein, B. H., Silverman, G. A., Little, R. D., Burke,D. T., Korsmeyer, S. J., Schlessinger, D. & Olson, M. V.(1989) Science 244, 1348-1351.

    20. Green, E. D. & Olson, M. V. (1990) Proc. Natl. Acad. Sci.USA 87, 1213-1217.

    21. Anand, R., Riley, J. H., Butler, R., Smith, J. C. & Markham,A. F. (1990) Nucleic Acids Res. 18, 1951-1956.

    22. Auch, D. & Reth, M. (1990) Nucleic Acids Res. 18, 6743-6744.23. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular

    Cloning:A Laboratory Manual (Cold Spring Harbor Lab., ColdSpring Harbor, NY).

    24. Murray, V. (1989) Nucleic Acids Res. 17, 8889.25. Gunthert, U., Hofmann, M., Rudy, W., Reber, S., Zoller, M.,

    Haussmann, I., Matzku, S., Wenzel, A., Ponta, H. & Herrlich,P. (1991) Cell 65, 13-24.

    26. Cooper, D. L., Dougherty, G., Harm, H. J., Jackson, S., Bap-tist, E. W., Byers, J., Datta, A., Phillips, G. & Isola, N. R.(1992) Biochem. Biophys. Res. Commun. 182, 569-578.

    27. Stamenkovic, I., Amiot, M., Pesando, J. M. & Seed, B. (1989)Cell 56, 1057-1062.

    28. Isacke, C. M., Sauvage, C. A., Hyman, R., Lesley, J., Schulte,R. & Trowbridge, I. S. (1986) Immunogenetics 23, 326-332.

    29. Wolffe, E. J., Gause, W. C., Pelfrey, C. M., Holland, S. M.,Steinberg, A. D. & August, J. T. (1990) J. Biol. Chem. 265,341-347.

    30. Goldstein, L. A. & Butcher, E. C. (1990) Immunogenetics 32,389-397.

    31. Schwarzbauer, J. E. (1991) Bioessays 13, 527-533.32. Rothstein, D. M., Saito, H., Streuli, M., Schlossman, S. F. &

    Morimoto, C. (1992) J. Biol. Chem. 267, 7139-7147.33. Lees, M. J. & Helfman, D. M. (1991) Bioessays 13, 429-437.34. Reyes, A. A., Small, S. J. & Akeson, R. (1991) Mol. Cell. Biol.

    11, 1654-1661.35. Baker, B. S. (1989) Nature (London) 30, 521-524.36. Kalomiris, E. L. & Bourguignon, L. Y. (1989) J. Biol. Chem.

    264, 8113-8119.37. Lesley, J., He, Q., Miyake, K., Hamann, A., Hyman, R. &

    Kincade, P. W. (1992) J. Exp. Med. 175, 257-266.38. Shaw, G. & Kamen, R. (1986) Cell 46, 659-667.39. Dominski, Z. & Kole, R. (1991) Mol. Cell. Biol. 11, 6075-6083.

    Proc. Nad. Acad. Sci. USA 89 (1992)

    Dow

    nloa

    ded

    by g

    uest

    on

    June

    27,

    202

    1