YEAST VOL. 10: S81-S91 (1994)

XI. Yeast Sequencing Reports

Sequencing of a 13.2 kb Segment Next to the Left Telomere of Yeast Chromosome XI Revealed Five Open Reading Frames and Recent Recombination Events with the Right Arms of Chromosomes III and V

DESPINA ALEXANDRAKI†‡* AND MARIA TZERMIA†

†Foundation for Research and Technology-HELLAS, Institute of Molecular Biology and Biotechnology and ‡University of Crete, Department of Biology, P.O. Box 1527, Heraklion 711 10 Crete, Greece

Received 15 August 1993; accepted 1 December 1993

We report the entire sequence of a 13.2 kb segment next to the left telomere of chromosome XI of Saccharomyces cerevisiae. A 1.2 kb fragment near one end is 91% homologous to the right arm of chromosome III and 0.7 kb of that is 77% homologous to the right arm of chromosome V. Five open reading frames are included in the sequenced segment. Two of them are almost identical to the known YCR104W and YCR103C hypothetical proteins of chromosome III. A third one contains a region homologous to the Zn(2)-Cys(6) binuclear cluster pattern of fungal transcriptional activators. The fourth one, part of which is similar to the mammalian putative transporter of mevalonate, has the structure of membrane transporters. The fifth one is similar to yeast ferric reductase. The sequence has been deposited in the EMBL data library under Accession Number X75950.

KEY WORDS - Genome sequencing; Saccharomyces cerevisiae; chromosome XI; chromosomal recombination; telomeres; gene redundancy.

INTRODUCTION

In the course of the European Community (BRIDGE) project to sequence Saccharomyces cerevisiae chromosome XI, we have determined the complete sequence of 13 213 base pairs on a DNA fragment mapped next to the left telomere (about 200 nucleotides away from a 1 kb telomeric sequence, unpublished results). This fragment contains five open reading frames (ORFs), the potential functions of which are discussed below.

MATERIALS AND METHODS

Strains and vectors

Cosmids pUKG040 and pEKG100 were provided in Escherichia coli strain TG1 (Δ(lac pro), thi1, supE44, hsdD5, F′ (traD36, proA+B+, lacIq, lacZΔM15)) from Agnès Thierry and Bernard Dujon (Thierry and Dujon, in preparation). They are cosmids from libraries of chromosome XI, derivatives of cosmids pOU61 cos and pWE15 respectively, containing overlapping partial Sau3AI yeast DNA fragments. Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1), and pUC18 and pUC19 vectors were used for all subsequent subcloning and sequencing steps.

*Author to whom correspondence should be addressed.

CCC 0749-503X/94/SQ0081-11 © 1994 by John Wiley & Sons Ltd

Sequencing strategy

We have used directed sequencing of ordered restriction fragments. Cosmid DNAs were digested with EcoRI and electrophoresed in low melting point agarose. Four EcoRI fragments were purified and subcloned into pUC18 or pUC19 vectors.


[Figure 1: (a) EcoRI restriction map, with EcoRI sites at positions 1524, 1585, 3343, 10248 and 13208 and fragment sizes of 1523, 61, 1758, 6905 and 2960 bp; (b) six-phase ORF map showing ORFs F711, D123, A110, F705 and B473.]

Figure 1. (a) EcoRI restriction map of the 13 213 base pair segment. The arrows indicate the beginning (Sau3AI sites) of the sequences included in the two overlapping cosmids. The numbers below the bar indicate the size of each EcoRI fragment. (b) 6-phase ORF map of the 13 213 base pairs. Small bars indicate initiation codons and full bars indicate stop codons. The location and the direction of five ORFs are indicated by arrows. The number in the name of each ORF indicates its size in amino acids. The ORF names were assigned by MIPS.

The order of the EcoRI fragments is shown on the map of Figure 1a. The sequences of the two 5′ EcoRI fragments and 2 kb of the third one have been determined from cosmid pUKG040. The rest of the reported sequence derives from cosmid pEKG100. The additional sequence of 24.6 kb included in the pEKG100 insert has been reported separately (Tzermia et al., 1994).
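The ordering of EcoRI fragments relies on locating every GAATTC site and measuring the distances between cuts. The following is a minimal sketch of that bookkeeping, not the procedure used in this work: the function name `digest` and the toy sequence are hypothetical, and the cut is assumed to fall after the first base of the site (G^AATTC) on a linear molecule.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site left on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence (hypothetical) with two EcoRI sites.
toy = "AAAAGAATTCTTTTTTGAATTCCC"
fragments = digest(toy)
print(fragments)  # three fragments whose lengths add up to len(toy)
```

A very short fragment, such as the 61-base EcoRI fragment reported here, is easy to miss on a gel but appears directly in such a list.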

Double stranded template DNAs were prepared by alkaline lysis followed by Qiagen-tip selection (Qiagen Inc.) or PEG precipitation (Ausubel et al., 1987). They were subsequently sequenced using [35S]dATP and the Sequenase kit (United States Biochemical Corp.) following the supplier's protocols. Sequencing of both strands of the EcoRI fragments, subcloned in both orientations, was performed by the 'universal' or the 'reverse' M13 primers on nested ExoIII-S1 nuclease (Ausubel et al., 1987) deletions. Synthetic oligonucleotides corresponding to internal sequences (prepared on an Applied Biosystems synthesizer by the Department of Microchemistry at I.M.B.B.-Crete) were used as primers to fill in the gaps. The junctions between the sequenced EcoRI fragments have been established by PCR sequencing of PCR products, which were synthesized from oligonucleotide primers near the ends of the fragments, using cosmid DNA as template, [32P]ATP-labelled primers and the fmol DNA sequencing kit (Promega Corp.). Samples of sequenced DNAs were analysed on 40 cm long 6% or 4% polyacrylamide gels with single or double loadings.

Sequence analysis software

Endonuclease restriction, 6-phase ORF mapping and hydrophobicity profiles of the sequences were accomplished by the DNA Strider software (Marck, 1988). Comparisons of nucleotide and amino acid sequences were made to the GenBank, EMBL, SWISS-PROT and NBRF libraries using the GCG package software by us on the I.M.B.B. MicroVAX and by the staff at MIPS.
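A 6-phase ORF map scans the three reading frames of each strand for stretches that begin with an initiation codon and end at the first in-frame stop codon. The sketch below illustrates the idea only; it is not the DNA Strider implementation, and the function names, the ATG-only start rule and the toy sequence are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int, min_codons: int):
    """Yield (start, end) 0-based half-open coordinates of ATG...stop ORFs."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                      # first ATG opens the ORF
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3         # stop codon included in the span
            start = None


def six_frame_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) for every ORF of at least min_codons.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                hits.append((strand, frame + 1, start, end))
    return hits


# Toy example (hypothetical sequence): one 6-codon ORF in frame 3 of the + strand.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(six_frame_orfs(toy, min_codons=5))
```

With `min_codons=100` the same scan applies the ORF size cut-off used in this report.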

RESULTS AND DISCUSSION

Sequence determination

The reported sequence was determined from overlapping ExoIII-produced deletions and internal oligo-priming to fill in the gaps. An average length of 315 nucleotides was read from each sequencing reaction. Readings up to 400 bases were achieved on 4% polyacrylamide gels. Compressions seen at several specific positions were resolved by repeating the sequencing reactions using dITP (5 different instances). Sequence assembly was performed manually according to restriction maps and the sequences obtained from PCR connecting fragments. Sequence alignments of both strands were done using the GCG program. The final sequence contained an additional 61-base EcoRI fragment following the first (5′) EcoRI site, which had not been detected at the original gel electrophoretic analysis of the cosmid DNA (Figure 1a).

[Figure 2. Complete nucleotide sequence of the 13 213 bases, with the EcoRI sites and the Sau3AI start of the pEKG100 insert marked; deposited in the EMBL data library under Accession Number X75950.]

[Figure 3: bars for chromosome V (right end), chromosome XI (left end) and chromosome III (right end, shown twice).]

Figure 3. Diagram of the homologies between different chromosomes. Shaded bars indicate the chromosomes. Numbers at the bottom of each bar indicate the base coordinates as given in the BlastA analysis. Numbers on top of each bar indicate the percentage of base pair identities.

Sequence analysis

Six phase ORF map analysis of the 13.2 kb fragment revealed five ORFs > 100 codons (Figure 1b). Their sizes range from 110 to 711 codons and they constitute 48.2% of the entire sequence (6366 bases). This percentage is low compared to the average chromosome content in coding sequences and it is probably due to the location of this fragment near the end region of the chromosome (Oliver et al., 1992).
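The coding figures quoted above can be checked from the ORF names, since, according to the Figure 1 legend, the number in each name gives the ORF size in amino acids. A short worked calculation, assuming that stop codons are not counted (an assumption that reproduces the 6366 bases quoted in the text):

```python
# ORF sizes in codons, read from the ORF names (D123, A110, F705, B473, F711).
orf_sizes = {"D123": 123, "A110": 110, "F705": 705, "B473": 473, "F711": 711}
segment_length_bp = 13213

coding_bp = 3 * sum(orf_sizes.values())          # stop codons not counted
coding_percent = 100 * coding_bp / segment_length_bp

print(coding_bp)                 # 6366
print(round(coding_percent, 1))  # 48.2
```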

The complete sequence of the 13 213 bases is given in Figure 2. FastA (Pearson and Lipman, 1988) and BlastA (Altschul et al., 1990) analyses of the sequenced segment revealed extensive homologies to known sequences on different yeast chromosomes (Figure 3). More specifically: a) a region of 1182 bases showed an overall identity of 90.7% (ranging from 75% to 100%) to the right arm of chromosome III; b) 159 bases of that showed 77% identity to a second region of chromosome III; c) 638 bases were found 77% identical to the right arm of chromosome V and d) homology has been detected to the right arm of chromosome II (Becker, personal communication). Obviously, these homologies are due to recombination events between fragments near the ends of the chromosomes mentioned, as has been previously reported (Oliver et al., 1992).

Analysis of the putative ORF products

The putative translation products of the identified ORFs have been compared to protein databases using FastA (Table 1). For better evaluation of the significance of each obtained score, we have also included the highest FastA score, obtained by the comparison of each ORF to itself. Optimum scores higher than 200 have been considered as significant. Lower scores due to homologies in restricted areas of the protein sequences indicated conservation of specific domains. Protein patterns (motifs) have been identified by the ProSite program (Bairoch, 1991) of the GCG package. Our findings on each individual ORF are discussed below.

Table 1. Best optimized FastA scores obtained by the comparison of the putative translation product of each ORF with the protein databases

ORF | Homologous or identical protein | Optimized score | Highest score | Reference
D123 | S. cerevisiae Hypothetical protein YCR104W (124aa), 98.3% identity in 120aa | 544 | 554 | van der Linden et al., 1992; EMBL: X59720
D123 | S. cerevisiae SYGP-ORF12 (120aa), 86.2% identity in 123aa | 485 | 554 | Mulligan et al., 1993, unpublished; EMBL: L10830
A110 | S. cerevisiae Hypothetical protein YCR103C (111aa), 79.3% identity in 111aa | 496 | 655 | van der Linden et al., 1992; EMBL: X59720
F705 | S. cerevisiae CYP1 (HAP1) regulatory protein (1483aa), 28% identity in 130aa | 155 | 3624 | Verdiere, 1988; EMBL: X13793
B473 | Chinese hamster (MEV) mevalonate transporter (494aa), 28.1% identity in 153aa | 182 | 2515 | Kim et al., 1992; EMBL: S48888
F711 | S. cerevisiae (FRE1) Ferric reductase (686aa), 24.5% identity in 693aa | 542 | 3723 | Dancis et al., 1992; EMBL: M86908
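To illustrate how the self-comparison score helps to judge each hit, the Table 1 scores can be expressed as a fraction of the highest attainable score and checked against the cut-off of 200 stated above. This is a minimal sketch: the ratio is an illustrative normalization introduced here, not a statistic reported in this work, and the names `TABLE_1` and `summarize` are hypothetical.

```python
# (ORF, best database hit, optimized FastA score, self-comparison score), from Table 1.
TABLE_1 = [
    ("D123", "YCR104W", 544, 554),
    ("D123", "SYGP-ORF12", 485, 554),
    ("A110", "YCR103C", 496, 655),
    ("F705", "CYP1 (HAP1)", 155, 3624),
    ("B473", "MEV", 182, 2515),
    ("F711", "FRE1", 542, 3723),
]
SIGNIFICANCE_THRESHOLD = 200  # cut-off stated in the text


def summarize(rows, threshold=SIGNIFICANCE_THRESHOLD):
    """Return (orf, hit, optimized/self ratio, significant?) for each row."""
    return [
        (orf, hit, optimized / self_score, optimized > threshold)
        for orf, hit, optimized, self_score in rows
    ]


for orf, hit, ratio, significant in summarize(TABLE_1):
    print(f"{orf:5s} {hit:12s} ratio={ratio:.2f} significant={significant}")
```

The two hits below the cut-off (F705 and B473) are those for which the similarity is confined to a restricted region of the protein.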

ORF D123 is included in the region which is closely related to the right arms of yeast chromosomes III and V. It is almost identical to YCR104W, a hypothetical protein in the HMR 3′ region on chromosome III, and very similar to the SYGP-ORF12 encoded by a gene contained in a region of 36772 base pairs of chromosome V between the known genes MAK10, AFG18 on its 5′ side and CYC7 on its 3′ side (Mulligan et al., unpublished, Mortimer et al., 1989) (Figure 4a). The function of both of these hypothetical proteins remains unknown. As has been already reported for YCR104W (Bork et al., 1992), D123 also showed similarities to the yeast temperature-shock inducible protein TIP1 (Kondo and Inouye, 1991) (27.3% identity in 99 overlapping amino acids, FastA score: 144) and to the yeast serine rich, glucose induced protein SRP1 (Marguet et al., 1988) (26.3% identity in 99 overlapping amino acids, FastA score: 121), the function of which is similarly unknown. All of these proteins, including D123, start with a putative hydrophobic signal sequence, which is followed by a conserved domain of about 90 residues including the stress-induced protein motif (P-W-Y-[ST](2)-R-L). This domain is followed, in SRP1 (total length of 254 amino acids) and TIP1 (total length of 210 amino acids) proteins only, by a repetitive serine and alanine rich region (Figure 4b). According to Kondo and Inouye (1991) and Marguet et al. (1988), there is a family of several genes in different chromosomes which cross-hybridize to both TIP1 and SRP1 sequences but some of these may not be highly expressed genes, since only three distinct transcripts have been detected. We have not examined the expression of the gene encoding D123 protein. However, its proximity to the telomere could indicate that it is expressed at low levels or under specific conditions (Sandell and Zakian, 1992). In fact, we have noticed, by FastA analyses, that the TIP1 and SRP1 proteins are more similar to the SYGP-ORF12 (30.6% identity in 108 overlapping amino acids and 29.3% identity in 99 overlapping amino acids, respectively) than they are to the D123 and YCR104W ORFs.

Figure 4. (a) Multiple alignment of the sequences of the D123 ORF, the YCR104W hypothetical protein and SYGP-ORF12 using the CLUSTAL program. (b) Multiple alignment of the sequences of the D123 ORF, YCR104W, SYGP-ORF12, TIP1 and SRP1 proteins. The stress-induced protein motif is shadowed. Asterisks indicate residue identities and dots indicate conservative substitutions.

A110     MEMLLFLNESYIFHRLRMWSTVLWHSCVFVCVECENANYRVPR-CLIKPF-SVPVTFPFS
YCR103C  MEMLLFLNESYIFHRFRMWSIVLWHSCVFVCAECGNANYRGAG-VPCKTLLRAPVKFPLS
DYC101   MEMLLFLNESYIFHRFRMWSIVLWHSCVFVCAECGNAYYRGAGGCLEKPF-CAPVKFPFS

A110     VKKNIRILDLDPRTEAYCLSPYSVCSKRLPCKKYFYLLNSYNIKRVLGVVYC
YCR103C  VKKNIRILDLDPRSEAYCLSLNSVCFKRLPFNKYFHLLNSYNIKRVLGVVYC
DYC101   VKKNIRILDLDPRSEAYCLSHHLVCPKRFPCKATSLLL----------IPEG

Figure 5. Multiple alignment of the sequences of the A110 ORF, YCR103C and DYC101 hypothetical proteins using the CLUSTAL program.

ORF A110 is also contained in the region of homology between chromosomes XI and III. It is similar to the hypothetical protein YCR103C. It contains several non-conservative amino acid substitutions which may imply that the two products serve different functions. A third ORF of 101 codons (DYC101 sequence communicated by H. Domdey), that was identified on chromosome II, is also homologous to A110, exhibiting 76.3% identity in 97 overlapping amino acids (FastA score: 442). That ORF resembles YCR103C as much as A110 does (79.4% identity in 97 overlapping amino acids). A multiple alignment of the three sequences revealed more residue substitutions between A110 and the other two proteins, except for the last 14 amino acids at the carboxy terminus which are identical in A110 and YCR103C ORFs and absent from the chromosome II ORF (Figure 5).
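The identity percentages quoted for these comparisons are counts of identical residues over the aligned positions. A minimal sketch of that calculation follows, using the first 30 aligned residues of A110 and YCR103C as printed in Figure 5; the function name and the choice to skip gapped columns are assumptions of this example, and FastA reports identity over the overlap it selects, so values for whole proteins will differ.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# First 30 aligned residues of A110 and YCR103C (from Figure 5).
a110 = "MEMLLFLNESYIFHRLRMWSTVLWHSCVFV"
ycr103c = "MEMLLFLNESYIFHRFRMWSIVLWHSCVFV"
print(f"{percent_identity(a110, ycr103c):.1f}% identity over {len(a110)} residues")
```

The two sequences differ at two of these 30 positions (L/F and T/I), so this window is 28/30 identical.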

The FastA alignment of the F705 ORF product showed homology to the yeast protein CYP1 (HAP1) only in 130 overlapping residues of its amino terminus (residues 7 to 136 in F705 and 49 to 170 in CYP1). In fact, the alignment varied depending on the program used, except for a stretch of 36 residues, 30 of which make up the fungal Zn(2)-Cys(6) binuclear cluster domain. F705 was also regionally similar to several proteins that contain the same motif (FastA scores: 100-155). All of these proteins are transcriptional activators (GAL4, MAL63, LEU3, etc.) that bind DNA with their cysteine-rich amino terminus in a zinc dependent fashion (Coleman, 1992). The identified motif in F705 starts at residue 23 (SCHFCRVRKLKCDRVRPFCGSCSSRNRKQC) and follows the consensus sequence: [GAS]-C-x(2)-C-[RKH]-x(2)-[RK]-x-[RK]-C-x(~,9)-C-x(2)-C-x(6,8)-C. F705 also contains, near its carboxy terminus, a second rare motif characterizing membrane proteins involved in sugar transport, starting at residue 629 (MEKIGRRAFNKG) and following the consensus sequence: [LIVMST]-[DE]-x-[LIVMFA]-G-R-[RK]-x(4,6)-G. However, its significance is questionable since the hydrophobicity profile of F705 protein does not reveal the typical transmembrane domains (data not shown), unless it is a novel type of protein that can mediate both interaction with a sugar and transcriptional regulation.

The FastA search for ORF B473 product revealed low similarities to a number of membrane proteins. The most significant similarity was between its amino terminus and a mammalian membrane protein, the putative transporter of mevalonate (Figure 6a). This similarity does not necessarily indicate a directly homologous molecule but some sort of membrane transporter. In fact, the hydrophobicity profile of ORF B473 showed 12 membrane spanning stretches typical of such proteins (Figure 6b). The profile also included one central and one carboxy terminal hydrophilic region which followed the transmembrane segments 6 and 12 respectively, similarly to the profile of MEV protein and other membrane transporters (Culham et al., 1993). The main difference was at the amino terminus (~30 residues) of ORF B473 which appeared hydrophilic. A transcript corresponding to the ORF B473 was not detected in extracts of cells grown in standard YPD (Guthrie and Fink, 1991) growth medium (data not shown).
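The consensus sequences quoted for F705 are written in PROSITE pattern notation, in which `x` stands for any residue, square brackets list the allowed residues and parentheses give a repeat count or range. The sketch below shows how such a pattern can be translated into a regular expression and tested against the sugar-transport motif found in F705. It is an illustration under stated assumptions, not the ProSite program of the GCG package: the helper `prosite_to_regex` is hypothetical and handles only the elements used here (plus `{...}` exclusions).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                                # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"         # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(residue + repeat)
    return "".join(parts)


# Sugar-transport consensus and the F705 peptide starting at residue 629.
SUGAR_TRANSPORT = "[LIVMST]-[DE]-x-[LIVMFA]-G-R-[RK]-x(4,6)-G"
regex = prosite_to_regex(SUGAR_TRANSPORT)
print(regex)
print(bool(re.fullmatch(regex, "MEKIGRRAFNKG")))
```

Using `re.search` instead of `re.fullmatch` would scan a whole protein sequence for occurrences of the pattern.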


[Figure 6: (a) sequence alignment of B473 and MEV; (b) hydrophobicity plots for B473 and MEV, with residue position (100 to 400) on the horizontal axis and hydropathy (-4 to 4) on the vertical axis.]

Figure 6. (a) Alignment of the 180 aminoterminal residues of ORF B473 with the 156 aminoterminal residues of the mammalian membrane transporter MEV. (b) Hydrophobicity profiles (Kyte and Doolittle, 1982) of ORF B473 and MEV protein.

The absence of a detectable transcript may indicate that the gene is expressed only under specific conditions, or that it is repressed because of its position near the telomere.
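The hydrophobicity analysis described above for ORF B473 can be illustrated with a minimal sketch of a Kyte and Doolittle (1982) sliding-window profile. The function name `kd_profile`, the 9-residue window and the constant `B473_NTERM` (the first 30 residues of ORF B473 as printed in Figure 6a) are assumptions made for illustration; the window size and software actually used for Figure 6b are not stated here.

```python
# Kyte and Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# First 30 residues of ORF B473, transcribed from Figure 6a (illustrative input).
B473_NTERM = "MSEERHEDHHRDVENKLNLNGKDDINGNTS"


def kd_profile(seq, window=9):
    """Return the mean hydropathy of every full window along seq.

    The default window of 9 is an assumption chosen for this sketch.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    profile = kd_profile(B473_NTERM, window=9)
    mean_nterm = sum(KD[r] for r in B473_NTERM) / len(B473_NTERM)
    # A negative mean corresponds to a hydrophilic stretch on this scale.
    print("windows:", len(profile))
    print("mean hydropathy of the 30 N-terminal residues:", round(mean_nterm, 2))
```

On this scale positive values are hydrophobic and negative values hydrophilic, so a run of strongly positive windows is what would be read as a candidate membrane-spanning stretch, while the amino-terminal residues of ORF B473 average below zero.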

The product of ORF F711 showed significant similarity to the known yeast FRE1 ferric reductase. We have proven by biochemical, genetic and structural analyses that it is also a membrane protein that can reduce environmental ferric iron to its intracellular ferrous form. The expression of this non-essential gene is down-regulated by the presence of iron in the growth medium, and its RNA is not detectable under standard YPD medium growth conditions (Georgatsou and Alexandraki, submitted for publication).

ACKNOWLEDGEMENTS We thank Bernard Dujon for the excellent coordination of the Chromosome XI sequencing project. We thank Irmi Becker and all MIPS staff for help with the sequence analysis. We thank Horst Domdey for allowing us to use the unpublished DYClOl hypothetical protein from chromosome II. We also thank Yannis Papanikolaou for help with the computer analyses at IMBB, Georgia Houlaki for help with the preparation of the figures and the co-contractor George Thireos for helpful discussions. This work was supported by the Commission of the European Communities under the BRIDGE program of the Division of Biotechnology and by the Greek Ministry of Industry, Energy and Technology.

REFERENCES Altschul, S. F., Gish, W., Miller, W., Myers, E. and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. and Struhl, K. (Eds) (1987). Current Protocols in Molecular Biology. Greene Publishing Associates and Wiley-Interscience.

Bairoch, A. (1991). A dictionary of sites and patterns in proteins. Nucl. Acid Res. 16, 2241-2245.

Bork, P., Ouzounis, C., Sander, C., Scharf, M., Schneider, R. and Sonnhammer, E. (1992). Comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III. Protein Sci. 1, 1677-1690.

Coleman, J. E. (1990). Zinc proteins: enzymes, storage proteins, transcription factors and replication proteins. Annu. Rev. Biochem. 61, 897-946.

Culham, D. E., Lasby, B., Marangoni, A. G., Milner, J. L., Steer, B. A., van Nues, R. W. and Wood, J. M. (1993). Isolation and sequencing of Escherichia coli gene proP reveals unusual structural features of the osmoregulatory proline/betaine transporter, ProP. J. Mol. Biol. 229, 268-276.

Dancis, A., Roman, D. J., Anderson, G. J., Hinnebusch, A. G. and Klausner, R. D. (1992). Ferric reductase of Saccharomyces cerevisiae: Molecular characterization, role in iron uptake and transcriptional control by iron. Proc. Natl. Acad. Sci. U.S.A. 89, 3869-3873.

Georgatsou, E. and Alexandraki, D. Two distinctly regulated genes are required for ferric reduction, the first step of iron uptake in Saccharomyces cerevisiae. (Submitted for publication)

Guthrie, C. and Fink, G. R. (Eds) (1991). Guide to yeast genetics and molecular biology. Methods in Enzymology, vol. 194. Academic Press, New York.

Higgins, D. G. and Sharp, P. M. (1988). Clustal: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237-244.

Kim, C. M., Goldstein, J. L. and Brown, M. S. (1992). cDNA cloning of MEV, a mutant protein that facilitates cellular uptake of mevalonate and identification of the point mutation responsible for its gain of function. J. Biol. Chem. 267, 23113-23121.

Kondo, K. and Inouye, M. (1991). TIP1, a cold shock inducible gene of Saccharomyces cerevisiae. J. Biol. Chem. 266, 17537-17544.

Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydrophobic character of a protein. J. Mol. Biol. 157, 105-132.

Marck, C. (1988). ‘DNA Strider’: a ‘C’ program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. Acid Res. 16, 1829-1836.

Marguet, D., Guo, X. J. and Lauquin, G. J.-M. (1988). Yeast gene SRP1 (serine rich protein): Intragenic repeat structure and identification of a family of SRP1 related DNA sequences. J. Mol. Biol. 202, 455-470.

Mortimer, R. K., Schild, D., Contopoulou, C. R. and Kans, J. A. (1989). Genetic map of Saccharomyces cerevisiae, Edition 10. Yeast 5, 321-403.

Oliver, S. G. et al. (1992). The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.

Pearson, W. R. and Lipman, D. J. (1988). Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 85, 2444-2448.

Sandell, L. L. and Zakian, V. A. (1992). Telomeric position effect in yeast. Trends Cell Biol. 2, 10-14.

Tzermia, M., Horaitis, O. and Alexandraki, D. (1994). The complete sequencing of a 24.6 kb segment of yeast chromosome XI identified the known loci URA1, SAC1 and TRP3, and revealed 6 new open reading frames including homologues to the threonine dehydratases, membrane transporters, hydantoinases and the phospholipase A2-activating protein. Yeast 10.