

Proc. Natl. Acad. Sci. USA
Vol. 86, pp. 9647-9651, December 1989
Biochemistry

Organization of the human lipoprotein lipase gene and evolution of the lipase gene family

(gene structure/5'-flanking sequence/intron loss/tyrosine sulfation/exon shuffling)

TODD G. KIRCHGESSNER*†‡, JEAN-CLAUDE CHUAT§, CAMILLA HEINZMANN*†, JACQUELINE ETIENNE§, STEPHANE GUILHOT§, KAREN SVENSON*†, DETLEV AMEIS*¶, CATHERINE PILON§, LUC D'AURIOL§, ALI ANDALIBI*†, MICHAEL C. SCHOTZ*‖, FRANCIS GALIBERT§, AND ALDONS J. LUSIS*†

Departments of *Medicine and †Microbiology, University of California, Los Angeles, CA 90024; ‖Lipid Research, Veterans Administration, Wadsworth Medical Center, Los Angeles, CA 90073; and §Laboratoire d'Hématologie Expérimentale, Centre Hayem, Hôpital Saint-Louis, 75475 Paris Cedex 10, France

Communicated by Joseph L. Goldstein, August 14, 1989

ABSTRACT The human lipoprotein lipase gene was cloned and characterized. It is composed of 10 exons spanning ≈30 kilobases. The first exon encodes the 5'-untranslated region, the signal peptide plus the first two amino acids of the mature protein. The next eight exons encode the remaining 446 amino acids, and the tenth exon encodes the long 3'-untranslated region of 1948 nucleotides. The lipoprotein lipase transcription start site and the sequence of the 5'-flanking region were also determined. We compared the organization of genes for lipoprotein lipase, hepatic lipase, pancreatic lipase, and Drosophila yolk protein 1, which are members of a family of related genes. A model for the evolution of the lipase gene family is presented that involves multiple rounds of gene duplication plus exon-shuffling and intron-loss events.

Lipoprotein lipase (LPL) functions in the catabolism of triglyceride-rich lipoproteins in the circulation. It resides on the luminal surface of capillary endothelial cells where it hydrolyzes triglycerides in chylomicrons and very low density lipoproteins, thereby delivering fatty acids to tissues for storage or oxidation. Individuals genetically deficient in LPL activity exhibit extreme postprandial hypertriglyceridemia (1). The enzyme appears to be regulated at both transcriptional and posttranscriptional levels during differentiation and in response to nutritional and hormonal changes (2). Several hormones play a role in LPL expression, including insulin, thyroid hormone, and glucocorticoids (3).

LPL is a member of a gene family of proteins that includes hepatic lipase (HL), pancreatic lipase (PL), and the Drosophila yolk proteins 1, 2, and 3 (YP1, YP2, and YP3) (4-8). The LPL- and HL-encoding genes are dispersed, mapping to human chromosomes 8p22 and 15q21, respectively (9), while the PL-encoding gene has not yet been mapped. HL hydrolyzes triglyceride from very low density lipoprotein remnant particles in the liver, whereas PL hydrolyzes dietary lipid in the intestine. These two proteins have catalytic and structural properties similar to LPL. However, the Drosophila yolk proteins, while exhibiting sequence similarity, lack lipase activity and otherwise show no obvious functional similarity to these enzymes.

We now report the organization of the human lipoprotein lipase-encoding gene and the sequence of its 5'-flanking region.** Based on a comparison of intron-exon patterns, we propose a model for the evolution of the lipase gene family.

MATERIALS AND METHODS

Three libraries were used: a partial Alu I-Hae III digest of human genomic DNA cloned into the Charon 4A vector (10), a partial Sau3A digest of genomic DNA cloned into the EMBL4 vector (11), and a partial Mbo I digest of genomic DNA cloned into the EMBL3 vector (Clontech). LPL clones LPL35, LPL37, and LPL46 (8) containing the complete human LPL 3.5-kilobase (kb) cDNA were used to screen the above libraries by plaque hybridization. Five overlapping clones (clones 2, 3, 11, 13, and 15/E) spanning ≈50 kb and containing the complete human LPL gene were isolated and characterized. The sequence of LPL flanking, exonic, and bordering intronic regions was determined in both orientations by the dideoxy chain-termination method.

For S1 nuclease analysis, a 5'-end-labeled single-stranded DNA probe, mapping from 124 nucleotides (nt) downstream from nt 1 to the -240 position in Fig. 5, was prepared and hybridized to human adipose poly(A)+ or yeast RNA, digested with S1 nuclease, and analyzed as described (12). For primer-extension analysis, an end-labeled 17-mer antisense oligonucleotide that maps to 42 nt downstream from nt 1 (Fig. 5) was hybridized to 3 μg of human adipose poly(A)+ RNA and extended with avian myeloblastosis virus reverse transcriptase in 50 mM Tris·HCl, pH 8.3/10 mM MgCl2/20 mM KCl/5 mM dithiothreitol/1 mM dNTPs for 30 min at 37°C.

Protein sequences were aligned using University of Wisconsin Genetics Computer Group sequence analysis software according to the algorithm of Needleman and Wunsch (13).
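For readers unfamiliar with the algorithm of Needleman and Wunsch (13), the following is a minimal sketch of global alignment by dynamic programming. It is not the Genetics Computer Group implementation used here: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions, whereas protein alignments of the kind shown in Fig. 2A would normally use an amino acid substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (illustrative scoring, assumed values).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The traceback prefers the diagonal move when several moves are equally good, so only one of possibly many optimal alignments is returned.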

RESULTS

Structure of the Lipoprotein Lipase Gene. Five overlapping human genomic clones containing LPL sequences in bacteriophage λ vectors were isolated and characterized (Fig. 1). The restriction map of the insert sequences was determined for six enzymes (BamHI, HindIII, EcoRI, Pst I, Pvu II, and Xba I), and restriction fragments corresponding to cDNA sequences were subcloned for the sequencing of exonic and flanking regions. The predicted restriction map of the gene is in agreement with results obtained by Southern blot analysis of human genomic DNA (data not shown), indicating that rearrangements or deletions have not occurred during the cloning of the DNA.

The LPL gene is composed of 10 exons spanning ≈30 kb. The first exon encodes 188 nt of 5'-untranslated sequence and coding sequence for the entire signal peptide plus the first two amino acid residues of the mature protein. Exons 2 through 9, with lengths ranging from 105 to 243 nt, contain the remaining 1339 nt of coding sequence beginning within the codon for the third amino acid of the mature protein and ending within the translation stop codon. The unusually long 3'-untranslated region of LPL mRNA, encompassing 1948 nt, is encoded in the tenth and final exon. Our sequencing data are summarized in Table 1.

Abbreviations: LPL, lipoprotein lipase; HL, hepatic lipase; PL, pancreatic lipase; YP1, -2, and -3, yolk protein 1, 2, and 3; nt, nucleotide(s).
‡To whom reprint requests should be addressed at: Department of Microbiology, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90024.
¶Present address: Medizinische Klinik, Universitäts-Krankenhaus Eppendorf, Martinistrasse 52, 2000 Hamburg 20-07, F.R.G.
**The sequence reported in this paper has been deposited in the GenBank data base (accession no. M29549).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

FIG. 1. Organization of the human LPL gene. Exons (boxes 1-10) and intronic regions are shown on the top row with a restriction map underneath (B, BamHI; H, HindIII; P, Pst I; R, EcoRI; V, Pvu II; and X, Xba I). Sites indicated with asterisks are polymorphic in the general population (C.H., T.G.K., and A.J.L., unpublished work). Position on the map of the five λ clones used to characterize the gene is also shown. A scale in kb is indicated.

Evolution of the Lipase Genes. To gain insight into the evolution of the lipase gene family, we compared the genes of four members of this family: human LPL, human HL, canine PL, and Drosophila YP1. Relevant protein sequences were aligned to maximize amino acid identity and similarity based on functional and evolutionary relatedness (Fig. 2A). A comparison of the intron/exon organization of the three lipase genes reveals that the LPL and HL genes are very close in structure and are more similar to each other than to the PL gene. To simplify the discussion of introns within the family, all intron numbers mentioned subsequently will refer to those shown in Fig. 2B. While LPL and presumably HL (exons 9 and 10 in the HL gene have not been fully characterized) have 10 exons interrupted by 9 introns in identical locations and phases (introns 2, 4, 6, 7, 9, and 11-14), the PL gene is composed of 13 exons separated by 12 introns (introns 1, 2, 3, and 5-13). The first intron for both LPL and HL (intron 2) occurs just downstream from the signal peptide-cleavage site for both LPL and HL (7 and 19 nt, respectively), whereas the first intron in the PL gene (intron 1) separates the short 5'-untranslated sequence from the beginning of the coding sequence, and the second intron is located immediately after the signal peptide.

Among the lipases, there are a total of seven introns that are strictly conserved with respect to both the amino acid positions and the codon phases they interrupt and a total of seven introns that are not conserved (Fig. 2 A and B). The unpaired introns include some that result from intron-loss or intron-gain events and one that may be due to a process referred to as "intron sliding" (see below). Introns 1, 3, 5, 8, and 10 of the PL gene have no counterpart in the LPL/HL gene arrangement. Significant sequence homology with the other lipases exists on either side of each of these introns, ruling out intron sliding as a mechanism for the divergence of two adjacent introns (one in PL and the other in LPL/HL) from a common position for these PL-specific introns. Clearly, these introns have either been lost in the LPL/HL genes or gained in the PL gene since their divergence. The same can be said for introns 4 and 14. However, in this case an intron loss must have occurred in the PL gene or a gain in the LPL/HL genes because this intron is absent in the PL gene.
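The intron bookkeeping in the two preceding paragraphs can be checked with simple set operations. The sketch below uses only the intron numbers quoted in the text (numbering as in Fig. 2B); the variable names are illustrative, and intron 2 is counted as shared even though its exact position differs between the LPL/HL and PL genes.

```python
# Intron numbers as listed in the text (numbering follows Fig. 2B).
LPL_HL_INTRONS = {2, 4, 6, 7, 9, 11, 12, 13, 14}   # 9 introns, 10 exons
PL_INTRONS = {1, 2, 3} | set(range(5, 14))          # introns 1, 2, 3, and 5-13

# Introns present in both arrangements, and those unique to one of them.
shared = sorted(LPL_HL_INTRONS & PL_INTRONS)
pl_only = sorted(PL_INTRONS - LPL_HL_INTRONS)
lpl_hl_only = sorted(LPL_HL_INTRONS - PL_INTRONS)

if __name__ == "__main__":
    print("shared:", shared)
    print("PL only:", pl_only)
    print("LPL/HL only:", lpl_hl_only)
    print("exons in LPL/HL:", len(LPL_HL_INTRONS) + 1)
    print("exons in PL:", len(PL_INTRONS) + 1)
```

Under these inputs the shared set has seven members and the two unpaired sets together have seven, matching the counts given above.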

Table 1. Exon sizes and sequence at intron/exon boundaries of the LPL gene

Exon  5' splice donor   Intron size  3' splice acceptor  Codon phase  Exon length, base pairs  Amino acids encoded  nt flanking intron
1     CCGACCgtaagt...   ≈9.2 kb      ...ttccagAAAGAA     I            276                      29                   276/277
2     TGGACGgtaagg...   ≈4.3 kb      ...cttcagGTAACA     0            161                      54                   437/438
3     ATGGAGgtaaga...   ≈0.9 kb      ...ccaaagGAGGAG     0            180                      60                   617/618
4     TTACTGgtaaga...   ≈0.8 kb      ...tttaagGCCTCG     I            112                      37                   729/730
5     TTGGAGgtaaat...   ≈1.9 kb      ...acccagATGTGG     I            234                      78                   963/964
6     ACAAAGgtaggc...   ≈2.8 kb      ...caacagTCTTCC     I            243                      81                   1206/1207
7     CACTCTgtgagt...   ≈1.6 kb      ...gtttagGCCTGA     II           121                      41                   1327/1328
8     GAAAAAgtaatt...   ≈0.9 kb      ...ccacagGGTGAT     II           183                      61                   1510/1511
9     AGGCTGgtgagc...   ≈3.1 kb      ...tctcagAAACTG     II           105                      34                   1615/1616
10    TACCAC end        -            -                   -            1948                     -                    3563 (end)

Exon and intron sequences are shown in upper- and lowercase letters, respectively. The approximate size of each intron is given in kb. Intron phase I, II, and 0 refer to introns that interrupt after the first, second, and third nucleotide of a codon, respectively.
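The entries of Table 1 are linked by simple arithmetic: the nucleotide position at each exon boundary is the running sum of exon lengths, and the codon phase of each intron is the number of coding nucleotides upstream of it, modulo 3. The sketch below recomputes both from the exon lengths in the table and the 188-nt 5'-untranslated length stated in the text; the function and constant names are illustrative.

```python
from itertools import accumulate

# Exon lengths (base pairs) for exons 1-10, from Table 1.
EXON_LENGTHS = [276, 161, 180, 112, 234, 243, 121, 183, 105, 1948]
# Length of the 5'-untranslated sequence in exon 1, from the text.
UTR5_LENGTH = 188
# Phase naming used in Table 1: remainder 1 -> I, 2 -> II, 0 -> 0.
PHASE_LABELS = {0: "0", 1: "I", 2: "II"}


def exon_end_positions(lengths):
    """mRNA position of the last nucleotide of each exon (running sum)."""
    return list(accumulate(lengths))


def intron_phases(lengths, utr5):
    """Phase of the intron that follows each exon except the last one."""
    phases = []
    coding_nt = -utr5  # coding nucleotides seen so far (exon 1 starts in the 5' UTR)
    for length in lengths[:-1]:
        coding_nt += length
        phases.append(PHASE_LABELS[coding_nt % 3])
    return phases


if __name__ == "__main__":
    print(exon_end_positions(EXON_LENGTHS))
    print(intron_phases(EXON_LENGTHS, UTR5_LENGTH))
    print(sum(EXON_LENGTHS[1:9]))  # coding nucleotides in exons 2-9
```

The recomputed positions and phases agree with every row of the table, and the coding length of exons 2 through 9 comes to the 1339 nt quoted in the text.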

Finally, the difference in the position of intron 2 among the LPL/HL and PL genes is best explained by intron sliding (17). According to this model, the difference results from the creation of new splice donor or acceptor sites within exons or introns during evolution, replacing preexisting sites. This change results in a shift in intron position between related genes with an insertion or deletion of coding sequence. Intron 2 interrupts in the same phase and is separated by only 12 amino acids in the LPL/HL and PL genes. In addition, there are no amino acid sequence similarities between the LPL/HL sequences and PL sequence within this region. If intron sliding is the mechanism responsible for this difference in the position of intron 2, a shift of the translation start site towards the COOH terminus must have occurred in the LPL/HL sequence subsequent to the sliding event, thus masking the gain in protein-coding sequence of these genes relative to PL that would have resulted from the change in the splice site.

FIG. 2. Comparison of the LPL, HL, PL, and YP1 gene organizations. (A) Alignment of human LPL, human HL, canine PL, and Drosophila YP1 protein sequences maximizing matches of amino acid (one-letter code) identity and similarity. Numbers reflect human LPL amino acid positions in the alignment; arrows indicate sites of signal peptide cleavage; heavy boxes indicate the positions where introns interrupt the coding sequence [a box surrounding a residue(s) indicates that the intron interrupts within a codon; a box between residues indicates that the intron interrupts between codons]; the closed triangle indicates a position where 44 amino acids from the YP1 sequence were removed to optimize the alignment. *, Translation stop codon. (B) Intron/exon organization of the four genes. Numerals I, II, and 0 refer to intron phase (introns in phase I, II, and 0 interrupt coding sequence after the first, second, and third nucleotides, respectively, in a codon); the open triangle below the second exon in YP1 indicates the position of an apparent insertion of 44 amino acids (see text). Note that the last LPL exon is not drawn to scale. [Sequence alignment, gene diagrams, and the remainder of the legend are not legible in this copy.]

Evolution of Drosophila YP1. The three Drosophila yolk proteins YP1, -2, and -3 are highly homologous (50-60% amino acid sequence identity) and have evolved from a common ancestral gene (18). They are also members of the lipase gene family. The sequence of one of the yolk proteins, YP1, is aligned with the three lipases in Fig. 2. It is apparent that while YP1 has retained considerable sequence homology with the other members of this family, it has undergone both intron and exon losses since its divergence. It has retained intron 2, whereas it lacks introns 1 and 3-11. The last two exons found in the lipases are also absent in YP1. Like the three lipases, YP1 has also diverged considerably in its NH2-terminal region.

Aside from the loss of most ancestral introns, what is most striking about the YP1 gene is an apparent insertion of 44 amino acids unrelated to the other members of the gene family within the second exon. This appears to result from an exon-shuffling event that occurred subsequent to the divergence of the gene family. All three yolk proteins in Drosophila are known to undergo a posttranslational sulfation of tyrosine residues. YP2 is sulfated at a single tyrosine residue, whereas YP1 and YP3 contain 2 mol of sulfated tyrosine per mol of protein (19). The structural determinants of tyrosine sulfation sites have been well characterized (Fig. 3). The position of the single sulfated tyrosine in YP2 has been determined (19), and the amino acid sequence surrounding this site is shown in the top row in Fig. 3. Although the two sites of tyrosine sulfation in YP1 have not been determined, there are a total of three possible sites in this protein based on the consensus sequence for sulfation. Two of the putative sites occur within the length of 44 amino acids in exon 2 that is unrelated to the three lipases. Together, these two putative sites comprise a majority of the 44 residues. In addition, the single known YP2 tyrosine sulfation site and two putative sites in YP3 are found in similar regions of unrelated sequence in the same relative positions as the extra sequence in YP1 when they are aligned with the three lipases (data not shown). However, no putative sulfation sites could be found in human LPL, HL, or canine PL. Thus, it is likely that a separate exon containing this unrelated sequence was inserted into the YP ancestral sequence before the duplications leading to the multiple yolk proteins but subsequent to the divergence from the lipases. The point of insertion of the element is very near (12 nt downstream) intron 5 of the putative ancestral gene, suggesting that it may have resulted from an exon-shuffling event.

Transcription Start Site and 5'-Flanking Region of the LPL Gene. The transcription start site for the human LPL gene was determined by primer-extension and S1 nuclease protection analysis of the 5' end of LPL mRNA (Fig. 4). The position of the cap site predicted by the two methods was in agreement, corresponding to 14 nt upstream from the start of the human cDNA sequence reported earlier (8). The sequence of 730 nt upstream from the cap site of the LPL gene is shown in Fig. 5. Sequences corresponding to the TATA and CAAT elements seen in the promoters of most genes (20) are found at positions -27 and -65, respectively. Several additional potentially important regulatory sequences also occur in the 5'-flanking region (Fig. 5).

FIG. 4. Determination of the transcription start site for the human LPL gene. (A) S1 nuclease analysis. Human adipose poly(A)+ RNA or yeast tRNA was hybridized to a 338-nt end-labeled single-stranded DNA probe and digested with S1 nuclease. The size of the protected fragment was determined by analysis on a denaturing polyacrylamide gel. A dideoxynucleotide sequencing ladder made with the same primer and template used to generate the S1 probe was run as marker. At left is shown the sequence in the region of the protected fragment (of opposite sense to that shown in Fig. 5) with the arrow pointing to the start site predicted by size of the S1 nuclease-protected fragment. (B) Primer-extension analysis. An end-labeled 17-mer antisense oligonucleotide beginning 56 nt downstream from the transcription start site shown in Fig. 5 was hybridized to human adipose poly(A)+ RNA and used as a primer for DNA synthesis by reverse transcriptase. The length of the major extension product (56 base pairs, determined by electrophoresis on a sequencing gel with a sequencing ladder as marker) predicts the same start site for transcription indicated by the arrow in A.

FIG. 3. Amino acid sequence of nonhomologous inserted region in YP1 containing putative tyrosine sulfation sites. The sequence of the 44 amino acids within YP1 exon 2 (Fig. 2) with no homology to the lipases is shown on the bottom line. Two sequences suggested to be YP1 tyrosine sulfation sites by Huttner and Baeuerle (19) based on homology with a sulfation site consensus sequence (see text) are underlined. The top line shows the sequence surrounding the single tyrosine in YP2 known to be sulfated (19). The sulfated or putative sulfated tyrosines are highlighted by the shaded box. Acidic amino acids (open boxes) and glycines and prolines (overlined) indicate amino acids that are in agreement with the sulfation-site consensus sequence. [Sequence panel not legible in this copy.]

-730 AAATGGAATC ATACAATATG TGTCTTTTGC GACTATCTTC TTTCACTTAT
-680 CATAACTCAA TACGGCTTTA GATTATTTGA CCTCGA~GCCTCTGA
-630 ACATAAAATA TTATCCTTGC ATTCCTTGAT GAGTTTGAGG ATTGAGAATA
-580 ATTTGCATGA GACAAAAATT AGAAACTAGT TAGAGCAAGT AGGCTTTTCT
-530 CCATCACATA AGCTGATCCA TCTTGCCAAT GTTAAAACAC CAGATTGTAC
-480 AAGCACAAGC TGGGACGCAA TGTGTGTCCC TCTATCCCTA CATTGACTTT
-430 GCGGGGGTGG GGATGGGGTG CGGGGTGAGT GAGGGAGGAC TGCAAGTGAC
-380 AAACAGGA -AAAAGA GAGGTGTATT AAAGTGCCGA TCAAATGTAA
-330 TTTAACAGCT AAACTTTCCC TCCTTGGAAA ACAGGTGATT GTTGAGTATT
-280 TAACGTGAAT CGATGTAAAC CTGTGTTTGG TGCTTAGACA GGGGGCCCCC
-230 GGGTAGAGTG GAACCCCTTA AGCTAAGCGA ACAGGAGCCT AACAAAGCAA
-180 ATTTTTCCGT CTGCCCTTTC CCCCTCTTCT CGTTGGCAGG GTTGATCCTC
-130 ATTACTGTTT GCTCAAACGT TTAGAAGTGA ATTTAGGTCC CTCCCCCCAA
 -80 CTTATGATTT TATAGCCAAT AGGTGATGAG GTTIATTTGC AIATTTCCAG
 -30 TCACATAAGC AGCCTTGGCG TGAAAACAGT GTCAGACTCG ATTCCCCCTC

FIG. 5. 5'-Flanking sequence of the human LPL gene. Arrow indicates start of transcription (nt 1). TATA (-27 to -22) and CAAT (-65 to -61) box homologies are underlined; sequences with perfect homology to the octamer motif (-46 to -39 and -580 to -573), a glucocorticoid response element core (-644 to -639), and the reverse orientation of the cAMP-responsive element consensus sequence (-372 to -366) are boxed (20). [Residues in and next to the boxed elements near -644, -372, and -46 are damaged in this copy and are left as extracted.]

DISCUSSION

The organization of the LPL gene, containing 10 exons, is very similar to the HL gene; all introns compared interrupt coding sequence in identical phases, resulting in exons of the same or nearly the same size. On the other hand, the gene for a third lipase of the gene family, PL, is distinctly different in its organization. It has a total of 13 exons and, although six introns are identical in position among the genes for the three enzymes, seven introns are not strictly conserved, including four introns in PL with no counterpart in the LPL/HL gene arrangement and one intron in the LPL/HL arrangement not found in the PL gene. These differences probably reflect both functional and evolutionary divergence in this family.

There is now considerable data supporting the view that introns were present very early in evolution, before the prokaryotic/eukaryotic divergence, and that they served to mediate the assembly of blocks of coding sequence into genes (21-23). Presumably, introns were subsequently lost in prokaryotes and simple eukaryotes as their genomes became streamlined for rapid DNA replication. In addition, exon shuffling, resulting in genes composed of exons encoding domains recruited from other genes, has been clearly demonstrated for a number of genes, including the LDL receptor (24) and various members of the serine protease family (25).

Our data suggest that the ancestral lipase gene had a minimum of 14 introns (Fig. 2B). After early gene-duplication events, there was a loss of introns 1, 3, 5, 8, and 10 followed by a second duplication leading to the mammalian LPL and HL. In a second path taken after the initial duplication events, introns 4 and 14 were lost, leading to the arrangement seen in the mammalian PL gene. According to this model (Fig. 6), the PL gene most resembles the primordial gene because it has undergone the fewest changes. In yet a third path taken after early duplication, there was an addition of an exon encoding a tyrosine sulfation site(s), the loss of most of the introns (introns 1, 3-11, and the introns mediating the exon-shuffling event) plus further gene duplications, leading to the three Drosophila yolk proteins (Fig. 6). The identification of an exon in another gene that resembles the inserted sequence in YP1 would provide further evidence for the exon-shuffling hypothesis.

FIG. 6. A model for the evolution of the lipase gene family. Beginning with an ancestral gene containing 14 introns, this model suggests that the six known members of the lipase family evolved by a series of gene duplication, intron and exon loss, and exon-shuffling events (see Discussion). tyr, tyrosine.

By estimating the number of amino acid substitutions per site that have occurred during the evolution of LPL, HL, and PL, Datta et al. (6) have suggested that PL is evolving at twice the rate of HL and seven times the rate of LPL. This is noteworthy in the context of the present model for intron-mediated evolution of the lipases. Although the exonic organization of the PL gene appears to have changed the least since the divergence of the three lipases, PL coding sequence has evolved very rapidly. The LPL and HL genes, on the other hand, have undergone significant intron losses and a second round of gene duplication, but both genes (particularly LPL) have subsequently evolved at a much slower rate.

In conclusion, these results have clarified the evolution of the lipase gene family. The information presented here should also be useful in examining the molecular basis of inherited defects of LPL and the transcriptional regulation of the gene.

Note. During preparation of this manuscript, Deeb and Peng reported the structure of the human LPL gene (26).

The authors thank Dr. George Scheele for providing information on the structure of the pancreatic lipase gene before publication and Dr. Richard Lawn for his critical review of the manuscript. This work was supported by grants from the National Institutes of Health (HL 28481), the American Heart Association, Greater Los Angeles Affiliate (492-IG15), the Centre National de la Recherche Scientifique through UPR-41 and from the Institut National de la Santé et de la Recherche Médicale (Grant 867004 to J.E.). D.A. is the recipient of a fellowship from the Deutsche Forschungsgemeinschaft (Am 65/2-1). A.J.L. is an Established Investigator of the American Heart Association.

1. Brunzell, J., Iverius, P. H., Scheibel, M. S. & Fujimoto, W. (1989) in Lipoprotein Deficiency Syndromes, eds. Angel, A. & Frolich, J. (Plenum, New York), pp. 227-239.
2. Kirchgessner, T. G., LeBoeuf, R. C., Langner, C. A., Zollman, S., Chang, C. H., Taylor, B. A., Schotz, M. C., Gordon, J. I. & Lusis, A. J. (1989) J. Biol. Chem. 264, 1473-1482.
3. Garfinkel, A. S. & Schotz, M. C. (1987) in Plasma Lipoproteins, ed. Gotto, A. M., Jr. (Elsevier, New York), pp. 335-357.
4. Komaromy, M. C. & Schotz, M. C. (1987) Proc. Natl. Acad. Sci. USA 84, 1526-1530.
5. Kirchgessner, T. G., Svenson, K. L., Lusis, A. J. & Schotz, M. C. (1987) J. Biol. Chem. 262, 8463-8466.
6. Datta, S., Luo, C.-C., Li, W.-H., Van Tuinen, P., Ledbetter, D. H., Brown, M. A., Chen, S.-H., Liu, S.-W. & Chan, L. (1988) J. Biol. Chem. 263, 1107-1110.
7. Bownes, M., Shirras, A., Blair, M., Collins, J. & Coulson, A. (1988) Proc. Natl. Acad. Sci. USA 85, 1554-1557.
8. Wion, K. L., Kirchgessner, T. G., Lusis, A. J., Schotz, M. C. & Lawn, R. M. (1987) Science 235, 1638-1641.
9. Sparkes, R. S., Zollman, S., Klisak, I., Kirchgessner, T. G., Komaromy, M. C., Mohandas, T., Schotz, M. C. & Lusis, A. J. (1987) Genomics 1, 138-144.
10. Lawn, R. M., Fritsch, E. F., Parker, R. C., Blake, G. & Maniatis, T. (1978) Cell 15, 1157-1174.
11. Eladeri, M. E., Syed, S. H., Guilhot, S., d'Auriol, L. & Galibert, F. (1986) Biochem. Biophys. Res. Commun. 140, 313-319.
12. Greene, J. M. (1987) in Current Protocols in Molecular Biology, eds. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Smith, J. A., Seidman, J. G. & Struhl, K. (Wiley, New York), pp. 4.6.1-4.6.6.
13. Needleman, S. B. & Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453.
14. Mickel, F. S., Weidenbach, F., Swarovsky, B., LaForge, K. S. & Scheele, G. A. (1989) J. Biol. Chem. 264, 12895-12901.
15. Hovemann, B., Galler, R., Walldord, U., Kupper, H. & Bautz, E. K. F. (1981) Nucleic Acids Res. 9, 4721-4731.
16. Stahnke, G., Sprengel, R., Augustin, J. & Will, H. (1987) Differentiation 35, 45-52.
17. Craik, C. S., Rutter, W. J. & Fletterick, R. (1983) Science 220, 1125-1129.
18. Garabedian, M. J., Shirras, A. D., Bownes, M. & Wensink, P. (1987) Gene 55, 1-8.
19. Huttner, W. B. & Baeuerle, P. A. (1988) Mod. Cell. Biol. 6, 97-140.
20. Mitchell, P. J. & Tjian, R. (1989) Science 245, 371-378.
21. Gilbert, W., Marchionni, M. & McKnight, G. (1986) Cell 46, 151-154.
22. Gilbert, W. (1985) Science 228, 823-824.
23. Darnell, J. E. & Doolittle, W. F. (1986) Proc. Natl. Acad. Sci. USA 83, 1271-1275.
24. Südhof, T. C., Russel, D. W., Goldstein, J. L., Brown, M. S., Sanchez-Pescador, R. & Bell, G. I. (1985) Science 228, 893-895.
25. Rogers, J. (1985) Nature (London) 315, 458-459.
26. Deeb, S. & Peng, R. (1989) Biochemistry 28, 4131-4135.
