

doi:10.1016/j.jmb.2006.10.004 J. Mol. Biol. (2007) 365, 621–636

Characteristics Affecting Expression and Solubilization of Yeast Membrane Proteins

Michael A. White1, Kathleen M. Clark2, Elizabeth J. Grayhack1,2 and Mark E. Dumont1,2*

1Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
2Department of Pediatrics, University of Rochester Medical Center, Rochester, NY 14642, USA

Biochemical and structural analysis of membrane proteins often critically depends on the ability to overexpress and solubilize them. To identify properties of eukaryotic membrane proteins that may be predictive of successful overexpression, we analyzed expression levels of the genomic complement of over 1000 predicted membrane proteins in a recently completed Saccharomyces cerevisiae protein expression library. We detected statistically significant positive and negative correlations between high membrane protein expression and protein properties such as size, overall hydrophobicity, number of transmembrane helices, and amino acid composition of transmembrane segments. Although expression levels of membrane and soluble proteins exhibited similar negative correlations with overall hydrophobicity, high-level membrane protein expression was positively correlated with the hydrophobicity of predicted transmembrane segments. To further characterize yeast membrane proteins as potential targets for structure determination, we tested the solubility of 122 of the highest expressed yeast membrane proteins in six commonly used detergents. Almost all the proteins tested could be solubilized using a small number of detergents. Solubility in some detergents depended on protein size, number of transmembrane segments, and hydrophobicity of predicted transmembrane segments. These results suggest that bioinformatic approaches may be capable of identifying membrane proteins that are most amenable to overexpression and detergent solubilization for structural and biochemical analyses. Bioinformatic approaches could also be used in the redesign of proteins that are not intrinsically well-adapted to such studies.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: membrane proteins; detergents; yeast; protein overexpression; structural genomics

*Corresponding author

Present address: M. A. White, Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA.

Abbreviations used: TX-100, Triton X-100; LDAO, lauryldimethylamine-N-oxide; C8E4, tetraethyleneglycol monooctyl ether; FC-12, FOS-choline 12®; OG, n-octyl-β-D-glucoside; DDM, n-dodecyl-β-D-maltoside; cmc, critical micelle concentration; ORF, open reading frame; SGD, Saccharomyces Genome Database.

E-mail address of the corresponding author: [email protected]

0022-2836/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.

Introduction

Integral membrane proteins present a significant problem for the ongoing intensive efforts to expand the availability of three-dimensional protein structures.1 Whereas membrane proteins comprise up to 30% of the proteome and perform crucial functions in a wide variety of cellular processes, they are poorly represented among available high-resolution crystal structures.2 This lack of structural data has significantly hindered progress towards gaining a detailed understanding of how membrane proteins carry out their functions. Most currently available high-resolution membrane protein structures have been obtained using proteins derived from prokaryotes†. However, many important classes of eukaryotic membrane proteins, such as G-protein coupled receptors and receptor tyrosine kinases, are entirely absent or poorly represented among available structures.

†See the comprehensive lists of membrane protein crystal structures at http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html and http://www.blanco.biomol.uci.edu/Membrane_Proteins_xtal.html

The most severe difficulties in structure determination of membrane proteins arise from the processes of expression, purification, and crystallization.1,3–5 Compared with soluble proteins, membrane proteins are difficult to overexpress at high levels in a form that yields high quality crystals. Thus, the first eukaryotic transmembrane protein structures that were solved were obtained using proteins derived from native tissues. It is only very recently that the first structures of eukaryotic membrane proteins have been determined using recombinantly expressed proteins, and in each case yeast was used as the heterologous host.6–8 Until the challenge of obtaining usable protein through recombinant expression is met, determination of eukaryotic membrane protein structures will be restricted to the small subset of such proteins that are expressed at high levels in native tissues. Recombinant expression is also necessary to obtain structures of mutant forms of membrane proteins that are important for understanding the mechanisms of their functions.

Another major obstacle to structural studies of membrane proteins arises from the need to solubilize these proteins in detergent for purification and biophysical characterization. The chosen detergent must effectively solubilize the protein without inhibiting function, without causing irreversible denaturation, and without interfering with purification or crystallization. Currently, there is little basic understanding of the detailed interactions between proteins and detergents that could serve as the basis for a rational protocol for deciding which detergents would be suitable for use with particular proteins, and even empirical rationales for choosing detergents to solubilize a given membrane protein are lacking. In some cases, it is possible to initially solubilize a protein in one detergent, and then exchange it into a different detergent; thus it may be desirable to identify detergents that are most generally useful in achieving the solubilization of a wide variety of membrane proteins. The problems associated with detergent selection are particularly acute in a high-throughput context where extensive screening of dozens of detergents for use with each of a large number of proteins is not feasible.

The difficulty of overexpressing and solubilizing membrane proteins also impedes their biochemical characterization. Demonstrating or characterizing the biochemical function of a particular polypeptide chain requires separation of that chain from other polypeptides, which, in the case of membrane proteins, almost always involves the use of detergents. Establishing conditions for general, genome-wide solubilization of membrane proteins is an important step in extending mass-spectrometry-based proteomic analyses to these proteins.9–11

Furthermore, biochemical genomics, the use of genomic expression libraries for identification of unknown genes encoding proteins with assayable biochemical functions,12 has not generally been applied to membrane-associated activities, in part because of the difficulty of establishing uniform conditions for solubilization of the genomic complement of membrane proteins in functional form.

There is currently little information available on membrane protein expression, solubilization, purification, and crystallization on a genomic scale. Membrane proteins have generally been excluded from most structural genomics pipelines, consistent with reports that the presence of predicted transmembrane helices in a protein is strongly correlated with failure to determine the structure in a high-throughput context.5,13,14 Data on expression are available from several projects that have conducted cloning and expression of entire proteomes in bacteria;13–15 however, there are currently few published analyses of such data specifically devoted to membrane proteins.16,17 In addition, several medium-scale projects have attempted to gather statistics on expression and detergent solubility of prokaryotic membrane proteins.18,19 Since the traditional one-at-a-time approach for membrane protein structure determination has proven to be so difficult, the availability of genome-wide data on expression and solubilization could enhance the efficiency of structure determination efforts by making it possible to identify those membrane proteins that will be most experimentally tractable or by informing rational approaches for mutagenesis of membrane proteins aimed at improving expression or solubilization.

Recently we cloned and expressed a nearly complete genomic complement of yeast Saccharomyces cerevisiae proteins in a homologous yeast host.12 A total of 5573 open reading frames (ORFs) cloned into a PGAL1-regulated expression vector were tested for expression, including 1092 predicted membrane proteins. This collection is called the MORF (movable ORF) library. We classified the expression level of each protein as either high, medium, low, or not detected. Among predicted membrane proteins in this library were 263 that could be expressed at levels comparable to the most highly expressed soluble proteins (≥1 mg/l), indicating that they are good candidates for structural experiments. Here, we identify protein properties that correlate with the recombinant expression potential of yeast membrane proteins, the first such analysis covering nearly all of the predicted membrane proteins of a eukaryotic organism. In order to evaluate the effectiveness of various detergents in solubilizing a large set of proteins, we selected 122 highly expressed membrane proteins and tested their solubility patterns in six commonly used detergents. These experiments have uncovered significant differences in the abilities of different detergents to solubilize yeast membrane proteins, and have allowed us to identify protein properties associated with solubility in some detergents.

Results

Protein characteristics that exhibit similar correlations with levels of soluble and membrane protein expression

As described previously, we tested the expression of 5573 S. cerevisiae proteins, including 1092 predicted integral membrane proteins.12 Each protein in this study was expressed in yeast as a fusion protein with a 19 kDa C-terminal tag containing a 6-His sequence, an HA epitope, a rhinovirus 3C protease recognition site, and a ZZ (protein A IgG-binding) domain. Levels of protein expression were determined by immunoblotting using anti-HA antibodies, and were scored on a four-level scale: no detectable expression, low, medium, or high expression. To identify factors that affect the potential for high-level recombinant expression of yeast integral membrane proteins, we compared predicted membrane proteins scored as high-expressing (263 proteins) to those scored as low-expressing (378 proteins). Membrane proteins that were not expressed at detectable levels (132 proteins) were omitted from our analysis because of the possibility that their failure to express resulted from problems with cloning, yeast transformation, or ORF annotation, rather than from the intrinsic characteristics of the protein.

We examined the percentage of high expressing proteins in categories defined according to nine different general protein characteristics (Table 1). Six protein characteristics exhibited significant correlations with membrane protein expression level: size, hydropathy (GRAVY score),20 native expression level (molecules per cell),21 isoelectric point, codon adaptation index, and the percentage of aromatic residues in the total protein (Figure 1). As indicated in Figure 1, in each of these six cases a similar correlation was observed between the relevant protein property and expression levels of both membrane and soluble proteins.

Table 1. Protein characteristics tested for association with expression

A. General protein characteristics
Codon usage, codon adaptation index
Molecules per cell under chromosomal expression
Percentage of total protein residues that are aromatic
Isoelectric point
Size (kDa)
GRAVY score (overall protein hydrophobicity)
Homolog in yeast or other organism
Cellular localization
Predicted to contain a signal peptide

B. Membrane protein characteristics
Number of predicted transmembrane segments
N and C-terminal orientation across membrane
Average transmembrane segment length
Percentage of protein in transmembrane segments
Percentage of transmembrane residues that are hydrophobic (WFLIVMY)
Percentage of transmembrane residues that are charged/polar
Percentage of transmembrane residues that are aromatic (WYF)

Figure 1. Protein expression dependence on general protein properties. Bar graphs show the total number of predicted membrane proteins (high expressers + low expressers, right axis) in each bin; line plots show the proportion of high expressers for membrane proteins (♦, continuous line) and soluble proteins (●, broken line) as a function of the specified protein property. The indicated p-values refer to membrane protein plots. They were determined by a chi-squared test for trend in proportions as described in Materials and Methods.

One of the strongest predictors of protein expression level for both soluble and membrane proteins is protein size, such that smaller proteins are generally more likely to be highly expressed than larger proteins. Less than 20% of membrane proteins larger than 80 kDa are highly expressed compared with more than 40% of proteins smaller than 60 kDa (Figure 1(a)). The size-dependent decrease in expression levels of membrane proteins is observed at a somewhat smaller size range (41 kDa–60 kDa) than for soluble proteins, where it is not observed until the proteins reach sizes of 61 kDa–80 kDa. For the smallest size range of proteins (≤20 kDa), expression actually increases with increasing size. Less than 40% of soluble and membrane proteins in this smallest range are highly expressed, compared with more than 60% of membrane and soluble proteins in the 21 kDa–40 kDa range. This may reflect the higher percentage of reading frames in the smallest size category that are classified as “dubious” and are not likely to be efficiently expressed in cells.

A negative correlation between recombinant overexpression and protein hydrophobicity that has previously been noted for proteins in general13,14,22 appears to hold for yeast membrane proteins as well (Figure 1(b)).

We observe an overall correlation between native expression level and the level of protein in the overexpression MORF library; however, this is a general trend with many exceptions. Fully 30%–40% of soluble and membrane proteins in the lowest native expression category can be expressed to the highest levels from the MORF vector.

No correlation was observed between expression levels of a protein and (1) the presence of a homolog of the protein in yeast or other organisms; (2) known or predicted cellular localization; or (3) whether the protein contains a predicted signal peptide targeting it to the secretory pathway. To test for an association between expression level and protein function or physiological process, we used the SGD GO Term Finder23 to identify gene ontology (GO) categories that were over-represented in either expression category. We found only one significant association between a GO category and expression level: v-SNARE and t-SNARE proteins involved in transit of proteins through the secretory pathway tended to be highly expressed. Three of 10 annotated t-SNARE proteins and 7 of 14 annotated v-SNARE proteins were in the highest expressing class of proteins, while none of the SNARE proteins were in the lowest expressing class. Despite this particular case, the functions and localizations of membrane proteins do not generally impact the ability to express high levels of proteins in our system.
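As an illustration of the kind of test named in the Figure 1 legend, the following Python sketch bins proteins by a property and applies a chi-squared test for linear trend in proportions (the Cochran-Armitage form, with one degree of freedom). The procedure actually used for the p-values is the one described in Materials and Methods; the function names, the default integer bin scores, and the counts in the usage lines are assumptions made for illustration and are not data from this study.

```python
import math


def count_by_bin(values, is_high, edges):
    """Count high expressers and all proteins in bins (edges[i], edges[i+1]]."""
    n_bins = len(edges) - 1
    high = [0] * n_bins
    total = [0] * n_bins
    for value, flag in zip(values, is_high):
        for i in range(n_bins):
            if edges[i] < value <= edges[i + 1]:
                total[i] += 1
                high[i] += int(flag)
                break
    return high, total


def trend_in_proportions(high, total, scores=None):
    """Chi-squared test for linear trend in proportions across ordered bins.

    high[i] and total[i] are the number of high expressers and the number of
    all proteins in bin i. Returns (chi-squared statistic, p-value), 1 d.f.
    """
    if scores is None:
        # Assumption: equally spaced scores 1, 2, 3, ... for the ordered bins.
        scores = list(range(1, len(total) + 1))
    n = sum(total)
    p_bar = sum(high) / n  # overall proportion of high expressers
    if p_bar in (0.0, 1.0):
        raise ValueError("trend test is undefined when all outcomes are equal")
    sx = sum(t * s for t, s in zip(total, scores))
    sxx = sum(t * s * s for t, s in zip(total, scores))
    numerator = sum(
        s * (h - t * p_bar) for h, t, s in zip(high, total, scores)
    ) ** 2
    denominator = p_bar * (1.0 - p_bar) * (sxx - sx * sx / n)
    chi2 = numerator / denominator
    # Upper tail of a chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts per size bin, NOT values from this study.
    example_high = [30, 20, 10]
    example_total = [50, 50, 50]
    chi2, p = trend_in_proportions(example_high, example_total)
    print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Using equally spaced scores treats the bins as evenly ordered categories; bin midpoints (for example, in kDa) could be passed as `scores` instead if the spacing of the bins is meant to matter.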

Membrane-specific properties affecting levels of protein overexpression

We tested for an association between level of expression and seven properties that are specific to membrane proteins, focusing on the number and characteristics of transmembrane segments (Table 1; Figure 2). As reported previously,12 there is an inverse correlation between the number of transmembrane segments in a protein and level of expression. This is examined in more detail in Figure 2(a). Approximately 50% of proteins with fewer than five transmembrane helices were expressed to the highest levels, whereas less than 20% of the proteins with seven or more transmembrane helices were expressed at these high levels. However, the decrease in expression level with increasing number of transmembrane segments is not monotonic over the full range of numbers of transmembrane segments.

Since the size of an integral membrane protein is to some degree correlated with the number of transmembrane segments in the protein, we also examined the relationship between the percentage of the protein found in membrane spanning regions and expression level (Figure 2(b)). Over half of the predicted transmembrane proteins with 20% or less of their residues in transmembrane segments were highly expressed, whereas the fraction of high-expressing proteins drops to approximately 30% for proteins with more than 20% of their residues in transmembrane segments. In order to further distinguish between the effects of size and transmembrane helices, we examined the effects of overall size on expression for proteins with a given number of transmembrane segments (Figure 2(c)–(e)). The negative correlation between size and high expression is maintained for proteins with one to three transmembrane segments (Figure 2(c) and (d)) but not for proteins with more than ten transmembrane segments (Figure 2(e)). For proteins with other numbers of transmembrane segments, the numbers of proteins in each size range were not sufficient to allow detection of significant correlations.

Figure 2. Protein expression dependence on protein size and number of transmembrane segments. Bar graphs show the total number of predicted membrane proteins (high expressers + low expressers, right axis) in each bin; line plots show the proportion of high expressing membrane proteins as a function of the specified property. The indicated p-values were determined as described in Materials and Methods. In (c)–(e), the effects of protein size were examined for proteins that contain set numbers of predicted transmembrane segments.

The relationship between expression level and sequence in transmembrane segments is distinctly different than that between expression and overall protein sequence in that there is a significant positive correlation between expression levels and the hydrophobicity of predicted transmembrane segments (Figure 3(a)). Fewer than 40% of proteins with less than 70% hydrophobic amino acids in their transmembrane segments are highly expressed, while nearly 60% of proteins with more than 70% hydrophobic residues in their transmembrane segments are highly expressed. This effect is even more striking when viewed in terms of the percentage of charged and polar residues in the transmembrane regions (Figure 3(b)). Proteins with transmembrane segments composed of less than 8% charged or polar residues express at high levels twice as frequently as proteins with more than 18% of such residues. This is in marked contrast to the negative effect of overall protein hydrophobicity on expression levels shown in Figure 1(b).

Figure 3. Protein expression dependence on hydrophobicity of predicted transmembrane segments. Bar graphs show the total number of predicted membrane proteins (high expressers + low expressers, right axis) in each bin; line plots show the proportion of high expressing membrane proteins as a function of the specified property.

Figure 4. Titrations testing detergent/lysate ratio for solubilizing conditions. Little variation in solubility was observed for most detergents over a broad range of detergent/lysate volume ratios. Each panel shows the results of a representative immunoblot comparing total lysate to supernatants from centrifuged samples treated with detergent-free buffer or varying ratios of detergent/lysate solutions. The black triangles indicate the increasing ratios of volumes of stock detergent to lysate across the lanes of the gel. The actual ratios used were: 1% and 2% OG, 6:1, 17:1, 40:1; 1% DDM, 9:1, 10:1, 13:1, 18:1, 30:1; 1% FC-12, 5:1, 8:1, 17:1, 40:1. Results for titrations with C8E4 are not shown, since none of the initial trial proteins were significantly soluble in this detergent.

We found no correlation between expression and any of the other membrane-specific parameters we tested, including average transmembrane segment length, percentage of transmembrane residues that are aromatic, and topological orientation of protein N and C termini with respect to the membrane (Table 1; and data not shown).
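The contrast drawn in this section between overall hydrophobicity and the composition of transmembrane segments can be made concrete with a short Python sketch. It assumes that the GRAVY score is computed in the conventional way, as the mean Kyte-Doolittle hydropathy over all residues, and that predicted transmembrane segments are supplied as 1-based inclusive (start, end) positions, for example parsed from TMHMM output. The residue sets WFLIVMY and WYF are taken from Table 1; the function names, the segment format, and the example sequence are hypothetical. The charged/polar residue set is not defined in this section and is therefore left out.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

HYDROPHOBIC = frozenset("WFLIVMY")  # residue set listed in Table 1
AROMATIC = frozenset("WYF")         # residue set listed in Table 1


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def tm_composition(sequence, segments):
    """Summarize predicted transmembrane (TM) segments of one protein.

    segments: list of (start, end) tuples, 1-based and inclusive
    (an assumed input format, not one prescribed by the source).
    """
    residues = "".join(sequence[start - 1:end] for start, end in segments)
    if not residues:
        raise ValueError("at least one non-empty TM segment is required")
    n_tm = len(residues)
    return {
        "n_segments": len(segments),
        "pct_in_tm": 100.0 * n_tm / len(sequence),
        "pct_hydrophobic": 100.0 * sum(aa in HYDROPHOBIC for aa in residues) / n_tm,
        "pct_aromatic": 100.0 * sum(aa in AROMATIC for aa in residues) / n_tm,
    }


if __name__ == "__main__":
    # Hypothetical sequence and segment, for illustration only.
    seq = "MKRLLIVFWYAAKKDD"
    print(round(gravy(seq), 2))
    print(tm_composition(seq, [(4, 10)]))
```

Keeping the whole-protein score and the segment-level percentages as separate functions mirrors the distinction made in the text: the two quantities show opposite associations with expression, so they should not be collapsed into a single hydrophobicity value.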

Detergent solubility patterns of membrane proteins

Despite the importance of detergent solubilization for membrane protein biochemistry and structural biology, there are few general principles available to guide researchers in the selection of the appropriate detergent for a specific protein.3,24 To investigate the interactions of detergents with different proteins, we determined the solubility of a set of 122 high-expressing predicted membrane proteins in six commonly used detergents: Triton X-100 (TX-100), lauryldimethylamine-N-oxide (LDAO), FOS-choline 12 (FC-12, dodecylphosphocholine), tetraethyleneglycol monooctyl ether (C8E4), n-octyl-β-D-glucoside (OG), and n-dodecyl-β-D-maltoside (DDM). Four of the detergents used, LDAO, DDM, OG, and C8E4, are among the most commonly used detergents in membrane protein crystallography‡. FC-12, though not commonly used for crystallography, has been used in NMR studies of membrane proteins, has shown promising results in our laboratory (N. Fedoriw, K. Robinson, K.M.C. and M.E.D., unpublished results), and is being used in a number of structural and biochemical studies in other laboratories.18,25–29 For comparison, we also included Triton X-100, a detergent used extensively in biochemical studies of membrane proteins.

‡ See the online list maintained by Hartmut Michel at http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html

To establish standard conditions for testing protein solubilization, we considered two major parameters: the critical micelle concentration (cmc) and the molar ratio of detergent to protein. To maximize the effectiveness of solubilization, all detergents were present in solubilization trials at concentrations several times higher than their cmc values (see Materials and Methods). To ensure that the amounts of detergent added for solubilization were not limiting, we performed titration experiments varying the ratio of cell lysate volume to detergent volume (Figure 4). In preliminary tests of three trial proteins using the standard detergent concentrations and standard amounts of cell lysate (see Materials and Methods), we found maximum solubility at detergent/lysate ratios of ~15:1 (μl detergent solution:μl lysate) with little additional change at increased detergent concentrations. Thus, we used this ratio for all further solubility tests.

Of the 122 proteins tested, only nine (7.4%) exhibited more than 25% solubility in the absence of detergent. (A listing of the solubilities of individual tested proteins is available as Supplementary Data.) Since we selected membrane proteins for our experiments based only on TMHMM predictions, this confirms previous reports of a low rate of false positive identifications for the TMHMM algorithm.30 TMHMM predicted a total of 1155 transmembrane proteins in the MORF collection, in contrast to HMMTOP, which predicted 2018 membrane proteins, most likely including a greater number of false positives. Of the nine proteins that were soluble without detergent, two (Pan5p, Grx2p) are known cytosolic proteins31,32 and were falsely predicted by TMHMM to each contain a single transmembrane segment. Another protein, YHR138Cp, also predicted to contain a single transmembrane segment, is homologous to the cytosolic protein Pbi2p,33 and, thus, is also likely to be a false positive prediction of TMHMM. Two additional proteins, Gas4p and Kar2p, that are each predicted to contain one transmembrane segment, have been reported to be only peripherally associated with the membrane.34,35 This leaves four proteins (Tvp15p, Cho1p, Ssu1p, Uip3p; ∼3% of the total) that are predicted to contain multiple transmembrane segments and are annotated in SGD as authentic membrane proteins, but show at least partial solubility in the absence of detergent. The basis for this discrepancy is unknown.

Figure 5. Representative detergent solubility patterns. Each panel shows the results of an immunoblot for one protein comparing total lysate to the supernatant of solubilized samples or a non-solubilized negative control after centrifugation at 109,000g for 1 h. Fully solubilized samples show a band intensity equal to the lysate control, while poorly solubilized samples show a weak band intensity. The solubility of the samples was scored on a scale of 0–4, with 4 indicating full solubilization, as described in Materials and Methods.

Nearly all of the remaining proteins fell into only a few solubility patterns (Table 2; Figure 5). Only three proteins were classified as being insoluble (specifically, less than 50% soluble) in all detergents, but all three of these were still ∼25% soluble in FC-12. The number of predicted transmembrane segments in these proteins ranged from 1 (in Dap2p) to 12 (in Alg7p) and there was no apparent similarity among them. Six proteins (Yur1p, Spc2p, Erp1p, Erp3p, Erp4p and YOR105Wp) were insoluble in all but one detergent, and that detergent was always FC-12. One of these six proteins, YOR105Wp, is a short ORF that has been classified as “dubious”. The remaining five ORFs in this class (Yur1p, Spc2p, Erp1p, Erp3p and Erp4p) are proteins that are predicted to contain one to two transmembrane segments and have been localized to the Golgi or endoplasmic reticulum (ER). Erp1p, Erp3p, and Erp4p are p24-class ERP proteins involved in ER to Golgi transport.36–38
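The grouping of proteins into the patterns of Table 2 can be sketched in Python as follows. The sketch assumes that each protein has a solubility score of 0 to 4 in each of the six detergents and that a protein counts as soluble when its score is 3 or 4 (at least 50% soluble, as in the Figure 7 legend). The pattern names follow Table 2; the function name, the dictionary layout, and the example scores are hypothetical and are not measurements from this study.

```python
from collections import Counter

DETERGENTS = ("TX-100", "LDAO", "FC-12", "C8E4", "OG", "DDM")
ZWITTERIONIC = frozenset({"LDAO", "FC-12"})
ALL_SIX = frozenset(DETERGENTS)


def solubility_pattern(scores, threshold=3):
    """Assign one protein to a Table 2 pattern from its 0-4 scores.

    scores: dict mapping each detergent name to an integer score (0-4).
    threshold: minimum score counted as soluble (assumed: 3, i.e. >= 50%).
    """
    soluble = frozenset(d for d in DETERGENTS if scores[d] >= threshold)
    if soluble == ALL_SIX:
        return "All 6 detergents"
    if soluble == ALL_SIX - {"C8E4"}:
        return "All except C8E4"
    if soluble == ALL_SIX - {"C8E4", "OG"}:
        return "All except C8E4 and OG"
    if soluble == ZWITTERIONIC | {"DDM"}:
        return "Zwitterionic detergents and DDM"
    if soluble == ZWITTERIONIC:
        return "Zwitterionic detergents only"
    if len(soluble) == 1:
        return "Only 1 detergent"
    if not soluble:
        return "Insoluble in all"
    return "Miscellaneous"


if __name__ == "__main__":
    # Hypothetical score sets, for illustration only.
    examples = {
        "protein_A": {"TX-100": 4, "LDAO": 4, "FC-12": 4, "C8E4": 3, "OG": 3, "DDM": 4},
        "protein_B": {"TX-100": 3, "LDAO": 4, "FC-12": 4, "C8E4": 1, "OG": 2, "DDM": 3},
        "protein_C": {"TX-100": 1, "LDAO": 0, "FC-12": 3, "C8E4": 0, "OG": 0, "DDM": 2},
    }
    tally = Counter(solubility_pattern(s) for s in examples.values())
    for pattern, count in tally.items():
        print(pattern, count)
```

Comparing the full set of solubilizing detergents against each named pattern, rather than testing detergents one at a time, sends any unlisted combination to "Miscellaneous", which is how a protein soluble in everything except FC-12 and DDM would be handled.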

Of the 113 proteins that showed broader solubility, 57% were soluble in at least four different detergents (Table 2). The two most common solubility patterns we observed were solubility in all six detergents (31% of proteins tested) and solubility in all detergents except C8E4 and OG (21% of proteins tested). With a few exceptions, the proteins that could be solubilized in the less efficient detergents tended to also be soluble in the more efficient detergents (Figure 6), indicating that solubility patterns are not dominated by specific protein–detergent interactions.

Table 2. Detergent solubility patterns

Solubility pattern                   No. of proteins   Total (%)
All 6 detergents                     35                31.0
All except C8E4                      6                 5.3
All except C8E4 and OG               24                21.2
Zwitterionic detergents and DDM      12                10.6
Zwitterionic detergents only         18                15.9
Only 1 detergent (a)                 6                 5.3
Insoluble in all (b)                 3                 2.7
Miscellaneous                        4                 3.5

(a) This detergent is always FC-12.
(b) Always 25%–50% soluble in FC-12.

Figure 6. Overlap of soluble protein sets among six different detergents. The Venn diagram illustrates overlapping solubilities among all six detergents tested. Three proteins were insoluble in all detergents, and one protein, Pmp2p, was soluble in all detergents except FC-12 and DDM; these proteins are not represented in the diagram.

The structures of the detergents appeared to be related to their solubilization efficiency (Figure 7): the zwitterionic detergents LDAO and FC-12 were the most effective at solubilizing, while the short chain detergents C8E4 and OG were the least effective. Zwitterionic detergents are known to solubilize membrane proteins more efficiently than non-ionic detergents.24,39 The poor solubilization efficiency of C8E4 and OG, compared with DDM, is somewhat unexpected, given their broad use in membrane protein crystallography.

Figure 7. Solubilization efficiency of detergents. Black bars indicate the number of proteins that were at least 50% soluble in a given detergent (score of 3 or 4) out of a possible 113 proteins. Hatched bars show the number of proteins that were marginally soluble in a given detergent (score of 2). The zwitterionic detergents LDAO and FC-12 were the most effective solubilizers.

We used an approach similar to the one described above for protein expression to look for associations between protein characteristics and solubility in particular detergents. Because LDAO and FC-12 solubilized nearly all of the proteins tested, it is apparent that few protein characteristics affect solubility in these detergents and we therefore did not analyze these data. Furthermore, when we tested the protein characteristics listed in Table 1 for associations with protein solubilities in TX-100 and DDM, we found no significant correlations.

Solubilities in the two least-effective detergents, OG and C8E4, were inversely correlated with protein size and number of transmembrane segments, and were also affected by the amino acid composition of the transmembrane segments, although the statistical significance of these correlations is limited by the small sample sizes. The most significant finding was that the solubilities of the proteins tested in these two detergents progressively decreased with increasing number of predicted transmembrane segments and larger size (Figure 8(a) and (b)). Proteins with transmembrane segments composed of more than 10% charged and polar amino acids were also somewhat less likely to be soluble than proteins with less hydrophilic transmembrane regions (Figure 8(c)), and proteins with more than 70% hydrophobic residues in predicted transmembrane segments were nearly twice as likely to be soluble in OG and C8E4 as proteins with less than 70% hydrophobic residues (Figure 8(d)). No other significant correlations were detected between solubilities of different proteins in OG and C8E4 and the properties listed in Table 1, including sub-cellular localization of proteins.

Figure 8. Dependence of solubility on protein properties. Bar graphs show the total number of proteins tested for solubility in each bin; line plots show the percentage of proteins soluble for DDM (green), TX-100 (red), OG (♦, black) and C8E4 (blue) as a function of the specified protein characteristic. Correlations between DDM and TX-100 solubility and these protein properties were not statistically significant; p-values for OG and C8E4 were determined by a chi-squared test for trend in proportions as described in Materials and Methods.

The limited solubilities of some proteins in OG and C8E4 could be caused either by the relatively short lengths of the acyl chains of these detergents, or by the specific chemical configurations of their hydrophilic and hydrophobic moieties. To distinguish between these possibilities, we examined the effect of varying acyl chain lengths within different families of detergents on the ability to solubilize selected proteins that exhibited moderate solubilities in the original set of tested detergents. Several tested proteins exhibited increased solubilities with increased detergent chain lengths. Cho1p, Ost6p, and Sec66p exhibit only slight solubilities in OG, but are more readily solubilized by nonyl and decyl glucosides (Figure 9(a)). However, these proteins are not completely solubilized even in the longer chain detergents, suggesting either that they are intrinsically slowly or inefficiently solubilized or that they exist in heterogeneous states in the cell lysates. Pmc1p shows greater solubility in longer chain polyoxyethylene detergents (Figure 9(b)), but several other proteins, including Ost6p and the product of the YGL080w gene, showed little or no increased solubility in the longer chain members of the polyoxyethylene family (Figure 9(b)). Several proteins tested with the families of FOS-choline and maltoside detergents showed increased solubilities as chain lengths increased up to 14 carbon atoms (Figure 9(c) and (d)).

Discussion

The ability to obtain useful amounts of solubilized membrane proteins will be critical for structural and biochemical analyses of the large numbers of these proteins present in all genomes. However, because of the difficulty of membrane protein biochemistry, most previous analyses of protein production on a genomic scale have focused on soluble proteins. Because of the differences between post-translational modification systems of different organisms and the complete absence of successful X-ray structure determinations of eukaryotic membrane proteins expressed in bacterial hosts, there are significant advantages to purification of eukaryotic membrane proteins from a homologous eukaryotic expression system. Thus, we have investigated the ability to overexpress the genomic complement of predicted transmembrane proteins of the yeast S. cerevisiae in a yeast host. Our study of expression focused on a comparison between the 263 most highly expressed and the 378 lowest expressed predicted membrane proteins in a nearly complete genomic collection of cloned yeast ORFs.12 Among these 263 highly expressed proteins are 90 with human orthologs (as annotated in SGD), and over 40 with published activity assays. Furthermore, we have assembled a database of the detergent solubilities of a representative subset of approximately half of the high-expressing membrane proteins (see Supplementary Data). Considering the general difficulty of expressing, purifying, and determining the structures of membrane proteins, such a priori knowledge of the highest expressing, most readily solubilized membrane proteins in a genome allows efforts in structure determination and biochemical characterization to be focused on the subset of target ORFs with the highest likelihood of success. Our results also point to general characteristics of membrane proteins that may be useful in predicting the success of heterologous membrane protein production in yeast, as well as, possibly, other expression systems.

We have identified parameters that specifically affect membrane protein expression, as well as parameters that affect membrane and soluble protein expression in similar ways. The association of protein size, isoelectric point, codon usage, and native number of molecules per cell with the ability to overexpress the general genomic complement of yeast proteins and, in some cases, with expression in other systems has been noted previously.5,12,13,22

However, the correlation of expression with isoelectric point and size was not observed for bacterial expression of Caenorhabditis elegans proteins.14 In addition to these properties, we find a negative correlation between the overall percentage of aromatic amino acids in both soluble and transmembrane proteins and their levels of overexpression that has not been reported previously. While the physical explanations for the relationships between these protein properties and expression levels are not known, the similarity of the correlations for membrane and soluble proteins indicates that none of these properties affects biosynthetic or degradative processes unique to membrane proteins.

Figure 9. Effects of detergent tail lengths on membrane protein solubilization. Proteins exhibiting limited solubility in OG and C8E4 were further tested for solubilities in families of related detergents with different tail lengths as described in Materials and Methods. In each case, the total protein in an equivalent amount of lysate is shown in the left-hand lane. Panels show experiments in the following detergent families: (a) glucosides; (b) polyoxyethylenes; (c) FOS-choline series; (d) maltosides.

In a previous analysis of the potential for overexpression of 372 Escherichia coli inner membrane proteins with cytoplasmic C-termini in a homologous E. coli host, no significant correlations were reported between any protein properties and level of overexpression.16 The difference between these results and the correlations we have uncovered may be explained by any of several considerations. (1) Differences between the processes of membrane protein expression in comparing E. coli with yeast or in comparing proteins with cytoplasmic C-termini with the general population. (2) Differences between the sensitivity and dynamic range of the detection method we used (immunoblotting) compared with the GFP fluorescence used by Daley et al.16 (3) Differences in the measures of statistical significance used in the two studies. Whereas Daley et al. only presented correlation coefficients, the p values we report are capable of detecting even moderate correlations that are statistically significant.

The likelihood of being able to express high levels of both soluble and transmembrane proteins in the yeast system decreases significantly with increased overall hydrophobicity of the protein as measured by the GRAVY score and number of predicted transmembrane segments. Similar trends have been detected in previous studies of expression.5,12,13,22

However, in contrast to this negative correlation between expression and overall hydrophobicity, we find a significant positive association between levels of expression and the hydrophobicity of the amino acid sequences within predicted transmembrane segments (Figure 3(a) and (b)). This means that the behavior of transmembrane segments making up only a small fraction of overall sequence information in a protein can be a limiting factor affecting expression levels. The positive correlation between hydrophobicity of transmembrane segments and expression could reflect increased efficiency of translation provided by altered association with ribosomal components or signal recognition apparatus, increased efficiency of integration into membranes via the translocon, decreased degradation due to enhanced stability of folding, or preferential localization to particular membrane sub-domains.

Despite the diversity of the machinery mediating membrane insertion in different cellular compartments that might have been expected to lead to different expression efficiencies for proteins with different intracellular trafficking patterns, no correlation was detected between the potential for overexpression and sub-cellular localization, N and C-terminal topology, or the presence of a signal peptide. This suggests that there is reserve capacity to allow overexpression of proteins with various topologies in diverse cellular compartments.

Toxicity of overexpressed proteins does not appear to be a major barrier to successful membrane protein expression in our system, since we have found that overexpression of membrane proteins is not significantly more toxic than overexpression of the general population of yeast proteins. We previously reported that only 88 proteins (1.6%) out of the total collection of 5573 were lethal when tested for growth on solid media (2% (w/v) raffinose + 2% (w/v) galactose), and that, despite this toxicity, 67 of these proteins were present at detectable levels after a 4 h induction in liquid media.12 An additional 320 proteins in the collection (5.7% of the total) caused slow growth. Among predicted membrane proteins, 1.7% (19 proteins) were lethal and 6.0% (66 proteins) caused slow growth. As was the case with the entire collection, most lethal membrane proteins could still be produced at detectable levels under our induction conditions (79% of lethal membrane proteins versus 76% of all lethal proteins). We also detected no significant correlation between expression level and protein toxicity: 15 (5.7%) of the high expressing membrane proteins and 26 (6.9%) of the low expressing membrane proteins were lethal or caused slow growth. A recent study of 567 overexpressed yeast membrane proteins found that 93% of these proteins caused growth defects.17 These researchers measured growth in liquid media on microtiter plates and were thus able to detect effects on growth not apparent in our previously reported results for growth on solid medium. Despite this, viewed in the context of our focus on producing high levels of protein for purification, our results suggest that growth inhibition due to protein overexpression is not a major obstacle.

Several lines of evidence suggest that membrane proteins are expressed in the MORF system in functional forms. First, overexpressed membrane proteins with C-terminal fusions in general appear to insert correctly into the membrane. A recent study reported the experimentally determined transmembrane orientations of the C termini of 468 yeast membrane proteins expressed from multi-copy plasmids as fusions to large C-terminal tags, under control of a strong promoter.40 The authors found that the C-terminal orientations of their overexpressed membrane proteins matched results from other published reports in nearly all cases where there was independent data. Second, in our previously published report on the MORF library, we showed that glycoproteins, including many membrane proteins, are efficiently processed in this collection, especially in comparison with a similar collection expressing N-terminally tagged proteins.12 Third, we detected ATPase activity in membranes isolated from three strains expressing known membrane ATPases (Figure 10). Fourth, numerous yeast membrane proteins have previously been shown to be functional when overexpressed with C-terminal tags.41–52 Finally, we do not observe any accumulation of insoluble aggregates when protein expression is induced.

Figure 10. Three known membrane ATPases are expressed in active form. Membranes isolated from strains expressing the known membrane ATPases Pma1p, Adp1p, and Ena5p exhibit specific ATP hydrolysis activities above the activity found in membranes isolated from a control strain expressing the pheromone receptor Ste2p. Each assay was performed in duplicate as described in Materials and Methods. (a) Pma1p and the Ste2p control were assayed at pH 6. (b) Adp1p, Ena5p, and the Ste2p control were assayed at pH 7.5.

We have characterized the detergent solubilities of approximately half of the most highly expressed predicted membrane proteins in a nearly comprehensive yeast protein expression library (see Supplementary Data). These tests provide a measure of the ability of detergents to solubilize proteins directly from yeast membranes, an important first step in protein purification. However, given the diversity of the targeted ORFs, examining the suitability of the tested detergents for subsequent steps of purification, for protein crystallization, or for maintenance of native structure, was beyond the scope of the present study. In our tests, the zwitterionic detergents LDAO and FC-12 were the most efficient at solubilizing yeast membrane proteins, consistent with results of previous studies with smaller sets of proteins.18,39 The high efficiency of FC-12 suggests that this could be a useful detergent for structural work or high-throughput biochemical and proteomic applications. FC-12 has been used successfully in several NMR structural studies.25,26,53 One membrane protein crystal structure has been reported in FC-14.54 The non-ionic detergents DDM and TX-100 were less effective than the zwitterionic detergents, but were capable of solubilizing more than 60% of the tested proteins. Since DDM has the same hydrocarbon tail structure as LDAO and FC-12, its bulky and non-ionic head group appears to interfere with its solubilizing ability relative to these zwitterionic detergents. Among non-ionic detergents, however, DDM was the most efficient, and since non-ionic detergents may be less denaturing than zwitterionic detergents,24 DDM could represent the best compromise between solubilization efficiency and functional preservation in a high-throughput context.

Of the detergents commonly used for X-ray crystallography, OG and C8E4 were the least generally effective for solubilization of our representative set of membrane proteins. Despite the relatively high cmc values of these two detergents (Table 3), the relative insensitivity of solubilization efficiency to changes in the relative amounts of lysate and detergent over the range we used (Figure 4) indicates that their ineffectiveness is not due to low numbers of micelles. Instead, their reduced effectiveness is likely to be the result of their shorter hydrophobic tail lengths or altered head-group properties. The differences we observe in solubilization by the three different tested detergents that all contained 12-carbon tails (LDAO, FC-12, and DDM) confirm the role of the detergent head group as a determinant of effective protein solubilization. However, solubilization trials that we conducted with families of similar detergents with different chain lengths also show that, at least in some cases, solubilization of proteins from yeast lysates can be improved by the use of detergents with longer acyl chains. Notably, we encountered cases in which limited solubilities of certain proteins in a commonly used detergent, OG, could be significantly improved by using nonyl glucoside, a less commonly used detergent that differs from OG only by the presence of one additional carbon in the acyl chain.

Table 3. Critical micelle concentrations (cmc) of detergents

Detergent   cmc (mM) (a)   cmc (% (w/v)) (a)
C8E4        8.0            0.25
OG          18.0           0.53
DDM         0.17           0.0087
FC-12       1.5            0.047
LDAO        1.0            0.023
TX-100      0.23           0.015

(a) Source: Anatrace catalog (available at http://www.anatrace.com/downloads.htm).

We find that proteins with the highest content of hydrophobic amino acids in their predicted transmembrane segments are the most efficiently solubilized by the detergents OG and C8E4. Since these two detergents have short hydrophobic tails, this would appear to contradict simple models of membrane protein solubilization in which the most strongly hydrophobic detergents would be required to solubilize the most hydrophobic transmembrane segments. Instead, it may be the case that the most hydrophobic transmembrane segments partition effectively into the interior of micelles of even short-chain detergents whereas transmembrane segments with more polar sequences partition less well into micelles. Our results suggest that micelles composed of long-chain detergents are more tolerant of different ranges of protein hydrophobicity than micelles of short-chain detergents.

Although to date there have been few reported successes with membrane proteins in structural genomics projects,55 our results show that expression and solubilization of membrane proteins can, in fact, benefit from high-throughput approaches for identifying promising structural targets from a genomic complement of membrane proteins. The use of high-throughput approaches also allowed us to evaluate properties of membrane ORF sequences that can serve as the basis for a priori selection of protein targets with the highest likelihood of successful expression and solubilization.

Modification of sequences of transmembrane proteins based on the correlations we have described may also provide a rational strategy for improving expression and solubilization of target proteins for structure determination, although in each case the functional effect of the mutations would need to be assessed. In a previous study, Kiefer and colleagues were able to design mutant G protein coupled receptors with substantially improved expression by using a multiple linear regression approach to correlate sequences of the loop regions with levels of expression of eleven receptors.56 However, unlike proteins in our system, these proteins were expressed into inclusion bodies. Other groups have attempted to improve the thermostability of membrane proteins by the introduction of mutations in the transmembrane regions.57,58 Our results suggest that increasing the proportion of hydrophobic amino acids in transmembrane segments of membrane proteins or decreasing the overall content of aromatic residues could be used similarly to improve expression as a step toward improving the success rate for the difficult process of structure determination of membrane proteins.

§ http://www.cbs.dtu.dk/services/TMHMM/
∥ http://www.phobius.binf.ku.dk/
¶ http://www.enzim.hu/hmmtop/

Materials and Methods

Identification and topology prediction of membrane proteins

Our approach to identifying probable membrane proteins was a modification of the procedure used to identify membrane proteins in our initial analysis of membrane protein expression in our S. cerevisiae library.12 Two different transmembrane helix prediction programs were used to identify and classify membrane proteins in the yeast genome. TMHMM is reported to be the best program for distinguishing membrane proteins from non-membrane proteins, especially on a genomic scale.30,59,60 We used TMHMM v. 2.0§61 to predict 1155 integral membrane proteins in the MORF collection. From this set of 1155 proteins, we removed 63 that were predicted by the Phobius program∥62 to have only a signal peptide and no transmembrane segments; this left us with a total of 1092 proteins predicted to have one or more transmembrane helices. While TMHMM is reported to be the most accurate for distinguishing membrane proteins from soluble proteins, it is not necessarily the best for determining the actual topology of a membrane protein.60,63,64 We used HMMTOP65¶ predictions as our best guess of the topology of the membrane proteins in our collection. In two cases (Ste2p and Pmc1p) we were aware of good data suggesting a topology different from the HMMTOP prediction and we used the empirically determined topologies in our analysis.
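The filtering step described above amounts to a set difference between the TMHMM predictions and the Phobius signal-peptide-only calls. The following is a minimal sketch of that logic, not the procedure actually used; the function name and the ORF identifiers are hypothetical placeholders, and parsing of real TMHMM or Phobius output is not shown.

```python
def filter_membrane_candidates(tmhmm_hits, phobius_signal_only):
    """Keep ORFs predicted as membrane proteins by TMHMM, dropping those
    that Phobius predicts to have only a signal peptide and no
    transmembrane segment."""
    return set(tmhmm_hits) - set(phobius_signal_only)


# Hypothetical ORF identifiers, for illustration only.
tmhmm_hits = {"ORF_A", "ORF_B", "ORF_C", "ORF_D"}
phobius_signal_only = {"ORF_B", "ORF_X"}

kept = filter_membrane_candidates(tmhmm_hits, phobius_signal_only)
print(sorted(kept))

# The counts reported in the text follow the same arithmetic.
print(1155 - 63)
```

Applied to the numbers in the text, removing the 63 signal-peptide-only predictions from the 1155 TMHMM predictions leaves the 1092 proteins analyzed.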

Overexpression of membrane proteins for solubilization experiments

Membrane proteins were expressed in strains from the MORF S. cerevisiae library.12 ORFs in this collection are cloned into a multi-copy PGAL1-regulated plasmid fused to a 19 kDa C-terminal His6-HA-ZZ tag. Yeast cultures in SD–Ura (containing 2% (w/v) dextrose) were grown to an A600 of approximately 2 and used to inoculate fresh 4 ml cultures in –Ura, 2% (w/v) raffinose medium. These cultures were grown for 15 h to an A600 of approximately 1.2 and induced with 2 ml of 3×YPGalactose medium (yeast extract, peptone, galactose), resulting in a final concentration of 1×YP and 2% (w/v) galactose. All yeast cell growth was performed at 30 °C. After 6 h, yeast cells were harvested by removing them from the culture medium, washing in 1 ml of ice-cold water and freezing the cell pellets at −70 °C.

Yeast cells were resuspended in lysis buffer (50 mM Tris (pH 7.5), 1 mM EDTA, 1 M NaCl, 1× Complete Protease Inhibitor Cocktail™ (Roche), 2.5 μg/ml of pepstatin, 1 mM DTT) at a volume of 300 μl lysis buffer per unit A600 at harvest. Cells were broken with 0.5 mm Zirconia/Silica beads (Biospec) by vortexing ten times for 20 s at 4 °C, with 1 min pauses on ice. Lysed samples were centrifuged in a microcentrifuge at 4 °C and 150g for 5 min. The supernatant was transferred to a fresh tube and frozen at −70 °C. Bradford assays66 of lysates from ten proteins indicated that the average total protein content in lysates was 460 μg/ml.
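The dilution and protein-mass figures in this protocol can be checked with simple arithmetic. The sketch below is illustrative only; the function names are our own, and the inputs are the values quoted in the text (2 ml of 3× medium added to a 4 ml culture, and a 9 μl lysate aliquot at the average 460 μg/ml).

```python
def final_fold(stock_fold, stock_volume_ml, culture_volume_ml):
    """Fold-concentration of a stock after it is added to a culture."""
    return stock_fold * stock_volume_ml / (stock_volume_ml + culture_volume_ml)


def protein_mass_ug(concentration_ug_per_ml, volume_ul):
    """Protein mass (ug) in an aliquot of the given volume (ul)."""
    return concentration_ug_per_ml * volume_ul / 1000.0


yp_final = final_fold(3.0, 2.0, 4.0)           # 3x stock diluted 2 ml into 6 ml total
aliquot_mass = protein_mass_ug(460.0, 9.0)     # 9 ul of lysate at 460 ug/ml

print(f"final medium strength: {yp_final:.2f}x")
print(f"protein per 9 ul aliquot: {aliquot_mass:.2f} ug")
```

By the same dilution, a final galactose concentration of 2% (w/v) implies 6% (w/v) galactose in the 3× stock; that stock concentration is an inference from the stated volumes rather than a value given in the text.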

Detergent solubilization of membrane proteins

An aliquot of 9 μl of lysate (approximately 4.1 μg total protein) was added to 141 μl of detergent solution or buffer without detergent. The buffer consisted of 20 mM Hepes (pH 7.5), 500 mM NaCl, and 10% (v/v) glycerol. Detergent solutions consisted of buffer and one of the following detergents (concentrations are w/v): 1% Triton X-100 (TX-100, Pierce), 1% lauryldimethylamine-N-oxide (LDAO, Anatrace), 2% tetraethyleneglycol monooctyl ether (C8E4, Anatrace), 1% FOS-choline 12 (FC-12, Anatrace), 2% n-octyl-β-D-glucoside (OG, Anatrace), or 1% n-dodecyl-β-D-maltoside (DDM, Anatrace). Samples were solubilized at room temperature for 1 h and then centrifuged at 21 °C in a Beckman Ti 42.2 rotor at 30,000 rpm (109,000g) for 1 h. After centrifugation, 16 μl of supernatant or control sample was mixed with 4 μl of SDS loading buffer, heated for 5 min at 37 °C and electrophoresed on a 26-well Criterion 8%–16% SDS–polyacrylamide gel (BioRad). Immunoblotting was performed using rat anti-HA primary antibody (clone 3F10, Roche) and HRP goat anti-rat secondary antibody (Jackson) for detection. Unless indicated, all additional detergents were purchased from Anatrace.

For each protein, the intensity of the signals on immunoblots from the detergent-solubilized samples was compared with the no-spin, no-detergent, total protein control and scored on a scale of 0–4, with 0 representing no signal and 4 representing a band intensity equal to the control. A score of 0–1 indicates insolubility in a particular detergent, while a score of 3–4 indicates nearly complete solubility in that detergent. Samples scored as 2 were marginally soluble; only samples with a score of 3 or 4, corresponding to over 50% solubility, were considered soluble in our analysis.
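The scoring rule above can be expressed as a small classification function. This is a sketch of the thresholds as stated in the text, not the software used in the study; the function names and the example scores are hypothetical.

```python
def classify_solubility(score):
    """Map an immunoblot score (0-4) to the categories used in the text:
    0-1 insoluble, 2 marginally soluble, 3-4 soluble (over 50%)."""
    if score not in (0, 1, 2, 3, 4):
        raise ValueError("score must be an integer from 0 to 4")
    if score >= 3:
        return "soluble"
    if score == 2:
        return "marginal"
    return "insoluble"


def solubility_pattern(scores):
    """Return the sorted tuple of detergents in which a protein counts as soluble.

    `scores` maps detergent name to its 0-4 score for one protein."""
    return tuple(sorted(d for d, s in scores.items()
                        if classify_solubility(s) == "soluble"))


# Hypothetical scores for a single protein, for illustration only.
example_scores = {"LDAO": 4, "FC-12": 4, "DDM": 3, "TX-100": 3, "OG": 2, "C8E4": 1}
print(solubility_pattern(example_scores))
```

Grouping proteins by the tuple returned from `solubility_pattern` would give pattern counts of the kind summarized in Table 2, under the assumption that each protein has one score per detergent.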

Database construction

Annotations for the proteins in our datasets were obtained from the following sources: size, pI, codon usage indices, aromaticity, and GRAVY scores were obtained from the Saccharomyces Genome Database (SGD; footnote a); localization data and number of molecules per cell were obtained from SGD and the UCSF Yeast GFP Fusion Localization Database (footnote b);21,67 homolog data were obtained from the SGD file of best hits against model organisms, which was generated with NCBI BLASTP; properties of transmembrane regions were determined by using a Perl script to parse HMMTOP output files and protein sequences obtained from SGD. Hydrophobic amino acids were classified based on the whole-residue hydrophobicity scales by Wimley and White.68
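GRAVY scores in this study were taken from SGD rather than computed here. For readers who wish to reproduce such a score for an arbitrary sequence, the following minimal sketch assumes the standard definition of GRAVY as the mean Kyte-Doolittle hydropathy value over all residues. The function name and example sequences are illustrative and are not taken from the Perl script mentioned above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_score(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


# Short illustrative peptides (not sequences from the study).
print(round(gravy_score("GAVL"), 2))   # hydrophobic stretch, positive score
print(round(gravy_score("KDEN"), 2))   # charged/polar stretch, negative score
```

Positive scores indicate an overall hydrophobic sequence and negative scores an overall hydrophilic one, which is the sense in which overall hydrophobicity is used in the expression analysis.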

Statistical analysis

A chi-square test for trend in proportions or Fisher's Exact Test was used to test for an association between protein properties and expression or detergent solubility. The 2×k contingency tables were generated for each protein variable, with k being the number of variable categories (such as sub-cellular localization), or the number of bins into which numerical variables were divided (such as ranges for hydrophobicity scores). Bins were chosen to represent biologically reasonable increments of the variables, as well as to ensure a reasonable distribution of our sample set among the bins, but our conclusions were found to be insensitive to modest variations in binning. When a tested protein feature had only two possible categories, such as the presence or absence of a signal peptide, Fisher's Exact Test was used to calculate p-values; in all other cases a chi-square trend test was used.

The null hypothesis for the statistical tests was that expression or solubility was independent of the tested protein property. More specifically, the p-value represents the probability of obtaining, by chance, a distribution of cell counts at least as extreme as that in a particular contingency table, given the expected distribution determined by the totals of the columns and rows of the table. For the chi-square trend test, the alternative hypothesis was that expression or solubility was dependent on the given protein characteristic, and that dependence was directional based on the numerical (as opposed to purely categorical) ordering of the property being tested. We used a significance cutoff of p<0.05, meaning that we accepted a 5% probability of incorrectly rejecting a true null hypothesis.

a Saccharomyces Genome Database: ftp://ftp.yeastgenome.org/yeast/

b http://www.yeastgfp.ucsf.edu/

These tests were carried out using the R statistical computing environment, version 2.1.0 (footnote c). We used the functions fisher.test and prop.trend.test for the two statistical tests. For prop.trend.test, the number of high expressers or soluble proteins is the number of events (x), the number of total proteins per bin is the number of trials (n), and the upper boundary of each bin is one element of the score vector.
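To make the trend test concrete, the sketch below re-implements the chi-square test for trend in proportions in Python. It is not the R code used in the study; it assumes the usual form of the statistic (a one-degree-of-freedom chi-square on the score-weighted deviation of the per-bin proportions from the pooled proportion), which is our understanding of what prop.trend.test computes. The bin counts and score values in the example are hypothetical.

```python
import math


def prop_trend_test(events, trials, scores=None):
    """Chi-square test for trend in proportions (1 degree of freedom).

    events: number of high expressers (or soluble proteins) per bin (x)
    trials: total number of proteins per bin (n)
    scores: numeric score per bin, e.g. the bin's upper boundary;
            defaults to 1, 2, ..., k
    Returns (chi-square statistic, p-value)."""
    if scores is None:
        scores = list(range(1, len(events) + 1))
    if not (len(events) == len(trials) == len(scores)):
        raise ValueError("events, trials and scores must have equal length")

    total = sum(trials)
    pooled = sum(events) / total                      # overall proportion
    mean_score = sum(n * s for n, s in zip(trials, scores)) / total

    # Score-weighted deviation of observed events from the pooled proportion.
    numerator = sum(x * (s - mean_score) for x, s in zip(events, scores)) ** 2
    denominator = pooled * (1 - pooled) * sum(
        n * (s - mean_score) ** 2 for n, s in zip(trials, scores))
    if denominator == 0:
        raise ValueError("test undefined: no variation in outcome or scores")

    statistic = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical bins: soluble proteins out of proteins tested, scored by the
# upper boundary of each (invented) transmembrane-segment bin.
soluble = [30, 25, 12]
tested = [60, 70, 60]
upper_bounds = [2, 6, 12]
statistic, p_value = prop_trend_test(soluble, tested, upper_bounds)
print(f"X-squared = {statistic:.3f}, p = {p_value:.4f}")
```

Using the upper boundary of each bin as its score, as described above, means that unevenly spaced bins contribute unevenly spaced scores, so the test is sensitive to the numerical ordering of the property rather than only to the rank of the bin.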

ATPase assays

Following cell lysis as described above and centrifugation at 150g for 5 min, the supernatant was transferred to a fresh tube and centrifuged at 24,600g for 45 min at 4 °C. The supernatant was removed and membranes were resuspended in resuspension buffer (50 mM Tris (pH 7.5), 500 mM NaCl, 10% glycerol, 1 mM EDTA) at 25 μl per A600-ml of original cell culture (50% of the original volume of lysis buffer added to harvested cell pellets). Membranes were stored at −20 °C. Protein concentration in membrane preparations was measured by Bradford assay.66

ATPase assays were performed by the addition of 5 μl resuspended membranes (∼1.5 μg protein) to 95 μl of ATPase cocktail (see below) containing 10 mM ATP, followed by mixing and incubation at 37 °C for 10 min. Reactions were stopped by the addition of 100 μl of 10% SDS; as a negative control, each reaction was done in parallel with a reaction to which the SDS was added before incubation. One hundred microlitres of Taussky and Shorr phosphate detection reagent69 was added, color was developed for 2 min, and the A700 was measured. The activity was determined by subtracting the A700 of the negative control from that of the positive reaction. This value was converted to nmol of phosphate using a standard curve.

The ATPase cocktail at pH 6 contained 50 mM Mes buffer, 9 mM MgSO4, and 10 mM NaN3, and was adjusted to pH 6 with KOH. ATP was added to the cocktail immediately before use. The assay buffer at pH 7.5 contained 50 mM Tris instead of Mes buffer. The Taussky and Shorr reagent was prepared as described.69
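The conversion from absorbance to activity can be sketched as follows. This is an illustration under stated assumptions, not the calculation used for Figure 10: it assumes a linear phosphate standard curve passing through the origin, and the slope and absorbance readings are hypothetical. The protein amount (about 1.5 μg) and incubation time (10 min) are the values given in the text.

```python
def phosphate_released_nmol(a700_reaction, a700_control, nmol_per_a700):
    """Phosphate released (nmol): control-subtracted A700 times the slope of
    an assumed linear standard curve (nmol phosphate per absorbance unit)."""
    return (a700_reaction - a700_control) * nmol_per_a700


def specific_activity(phosphate_nmol, protein_ug=1.5, minutes=10.0):
    """Specific activity in nmol phosphate per min per ug membrane protein."""
    return phosphate_nmol / (protein_ug * minutes)


# Hypothetical readings and standard-curve slope, for illustration only.
released = phosphate_released_nmol(a700_reaction=0.45, a700_control=0.15,
                                   nmol_per_a700=50.0)
activity = specific_activity(released)
print(f"{released:.1f} nmol phosphate, {activity:.2f} nmol/min/ug")
```

Subtracting the SDS-first control removes phosphate that is present before incubation or released non-enzymatically, so only ATP hydrolysis during the 10 min incubation contributes to the reported activity.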

Acknowledgements

We thank Drs Gregory Tombline and Zulfiqar Ahmad for assistance in setting up the ATPase assays. This work was supported in part by NIH grant HG02311 to Eric Phizicky and a subcontract to M.E.D. of grant U54 GM074899 to the Center for High Throughput Structural Biology, P.I. George DeTitta.

c R Core Development Team: http://www.r-project.org/

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2006.10.004

References

1. Loll, P. J. (2003). Membrane protein structural biology: the high throughput challenge. J. Struct. Biol. 142, 144–153.

2. White, S. H. (2004). The progress of membrane protein structure determination. Protein Sci. 13, 1948–1949.

3. Wiener, M. C. (2004). A pedestrian guide to membrane protein crystallization. Methods, 34, 364–372.

4. Werten, P. J., Remigy, H. W., de Groot, B. L., Fotiadis, D., Philippsen, A., Stahlberg, H. et al. (2002). Progress in the analysis of membrane protein structure and function. FEBS Letters, 529, 65–72.

5. Canaves, J. M., Page, R., Wilson, I. A. & Stevens, R. C. (2004). Protein biophysical properties that correlate with crystallization success in Thermotoga maritima: maximum clustering strategy for structural genomics. J. Mol. Biol. 344, 977–991.

6. Jidenko, M., Nielsen, R. C., Sorensen, T. L., Moller, J. V., le Maire, M., Nissen, P. & Jaxel, C. (2005). Crystallization of a mammalian membrane protein overexpressed in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA, 102, 11687–11691.

7. Long, S. B., Campbell, E. B. & Mackinnon, R. (2005). Crystal structure of a mammalian voltage-dependent Shaker family K+ channel. Science, 309, 897–903.

8. Tornroth-Horsefield, S., Wang, Y., Hedfalk, K., Johanson, U., Karlsson, M., Tajkhorshid, E. et al. (2006). Structural mechanism of plant aquaporin gating. Nature, 439, 688–694.

9. Wu, C. C. & Yates, J. R., 3rd (2003). The application of mass spectrometry to membrane proteomics. Nature Biotechnol. 21, 262–267.

10. Santoni, V., Molloy, M. & Rabilloud, T. (2000). Membrane proteins and proteomics: un amour impossible? Electrophoresis, 21, 1054–1070.

11. Navarre, C., Degand, H., Bennett, K. L., Crawford, J. S., Mortz, E. & Boutry, M. (2002). Subproteomics: identification of plasma membrane proteins from the yeast Saccharomyces cerevisiae. Proteomics, 2, 1706–1714.

12. Gelperin, D. M., White, M. A., Wilkinson, M. L., Kon, Y., Kung, L. A., Wise, K. J. et al. (2005). Biochemical and genetic analysis of the yeast proteome with a movable ORF collection. Genes Dev. 19, 2816–2826.

13. Christendat, D., Yee, A., Dharamsi, A., Kluger, Y., Savchenko, A., Cort, J. R. et al. (2000). Structural proteomics of an archaeon. Nature Struct. Biol. 7, 903–909.

14. Luan, C. H., Qiu, S., Finley, J. B., Carson, M., Gray, R. J., Huang, W. et al. (2004). High-throughput expression of C. elegans proteins. Genome Res. 14, 2102–2110.

15. Lesley, S. A., Kuhn, P., Godzik, A., Deacon, A. M., Mathews, I., Kreusch, A. et al. (2002). Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline. Proc. Natl Acad. Sci. USA, 99, 11664–11669.

16. Daley, D. O., Rapp, M., Granseth, E., Melen, K., Drew, D. & von Heijne, G. (2005). Global topology analysis of the Escherichia coli inner membrane proteome. Science, 308, 1321–1323.

17. Osterberg, M., Kim, H., Warringer, J., Melen, K., Blomberg, A. & von Heijne, G. (2006). Phenotypic effects of membrane protein overexpression in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA, 103, 11148–11153.

18. Eshaghi, S., Hedren, M., Nasser, M. I., Hammarberg, T., Thornell, A. & Nordlund, P. (2005). An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins. Protein Sci. 14, 676–683.

19. Korepanova, A., Gao, F. P., Hua, Y., Qin, H., Nakamoto, R. K. & Cross, T. A. (2005). Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli. Protein Sci. 14, 148–158.

20. Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.

21. Ghaemmaghami, S., Huh, W. K., Bower, K., Howson, R. W., Belle, A., Dephoure, N. et al. (2003). Global analysis of protein expression in yeast. Nature, 425, 737–741.

22. Goh, C. S., Lan, N., Douglas, S. M., Wu, B., Echols, N., Smith, A. et al. (2004). Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis. J. Mol. Biol. 336, 115–130.

23. Dwight, S. S., Harris, M. A., Dolinski, K., Ball, C. A., Binkley, G., Christie, K. R. et al. (2002). Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucl. Acids Res. 30, 69–72.

24. le Maire, M., Champeil, P. & Moller, J. V. (2000). Interaction of membrane proteins and lipids with solubilizing detergents. Biochim. Biophys. Acta, 1508, 86–111.

25. MacKenzie, K. R., Prestegard, J. H. & Engelman, D. M. (1997). A transmembrane helix dimer: structure and implications. Science, 276, 131–133.

26. Arora, A., Abildgaard, F., Bushweller, J. H. & Tamm, L. K. (2001). Structure of outer membrane protein A transmembrane domain by NMR spectroscopy. Nature Struct. Biol. 8, 334–338.

27. Li, X. D., Villa, A., Gownley, C., Kim, M. J., Song, J., Auer, M. & Wang, D. N. (2001). Monomeric state and ligand binding of recombinant GABA transporter from Escherichia coli. FEBS Letters, 494, 165–169.

28. Loll, P. J., Tretiakova, A. & Soderblom, E. (2003). Compatibility of detergents with the microbatch-under-oil crystallization method. Acta Crystallog. sect. D, 59, 1114–1116.

29. West, M., Park, D., Dodd, J. R., Kistler, J. & Christie, D. L. (2005). Purification and characterization of the creatine transporter expressed at high levels in HEK293 cells. Protein Expr. Purif. 41, 393–401.

30. Kall, L. & Sonnhammer, E. L. (2002). Reliability of transmembrane predictions in whole-genome data. FEBS Letters, 532, 415–418.

31. White, W. H., Gunyuzlu, P. L. & Toyn, J. H. (2001). Saccharomyces cerevisiae is capable of de novo pantothenic acid biosynthesis involving a novel pathway of beta-alanine production from spermine. J. Biol. Chem. 276, 10794–10800.

32. Pedrajas, J. R., Porras, P., Martinez-Galisteo, E., Padilla, C. A., Miranda-Vizuete, A. & Barcena, J. A. (2002). Two isoforms of Saccharomyces cerevisiae glutaredoxin 2 are expressed in vivo and localize to different subcellular compartments. Biochem. J. 364, 617–623.

33. Slusarewicz, P., Xu, Z., Seefeld, K., Haas, A. & Wickner, W. T. (1997). I2B is a small cytosolic protein that participates in vacuole fusion. Proc. Natl Acad. Sci. USA, 94, 5582–5587.

34. Hamada, K., Terashima, H., Arisawa, M., Yabuki, N. & Kitada, K. (1999). Amino acid residues in the omega-minus region participate in cellular localization of yeast glycosylphosphatidylinositol-attached proteins. J. Bacteriol. 181, 3886–3889.

35. Tokunaga, M., Kawamura, A. & Kohno, K. (1992). Purification and characterization of BiP/Kar2 protein from Saccharomyces cerevisiae. J. Biol. Chem. 267, 17553–17559.

36. Springer, S., Chen, E., Duden, R., Marzioch, M., Rowley, A., Hamamoto, S. et al. (2000). The p24 proteins are not essential for vesicular transport in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA, 97, 4034–4039.

37. Lussier, M., Sdicu, A. M., Camirand, A. & Bussey, H. (1996). Functional characterization of the YUR1, KTR1, and KTR2 genes as members of the yeast KRE2/MNT1 mannosyltransferase gene family. J. Biol. Chem. 271, 11001–11008.

38. Mullins, C., Meyer, H. A., Hartmann, E., Green, N. & Fang, H. (1996). Structurally related Spc1p and Spc2p of yeast signal peptidase complex are functionally distinct. J. Biol. Chem. 271, 29094–29099.

39. Navarrete, R. & Serrano, R. (1983). Solubilization of yeast plasma membranes and mitochondria by different types of non-denaturing detergents. Biochim. Biophys. Acta, 728, 403–408.

40. Kim, H., Melen, K., Osterberg, M. & von Heijne, G. (2006). A global topology map of the Saccharomyces cerevisiae membrane proteome. Proc. Natl Acad. Sci. USA, 103, 11142–11147.

41. Zhao, R. & Reithmeier, R. A. (2001). Expression and characterization of the anion transporter homologue YNL275w in Saccharomyces cerevisiae. Am. J. Physiol. Cell Physiol. 281, C33–C45.

42. Ernst, R., Klemm, R., Schmitt, L. & Kuchler, K. (2005). Yeast ATP-binding cassette transporters: cellular cleaning pumps. Methods Enzymol. 400, 460–484.

43. Mullner, H., Deutsch, G., Leitner, E., Ingolic, E. & Daum, G. (2005). YEH2/YLR020c encodes a novel steryl ester hydrolase of the yeast Saccharomyces cerevisiae. J. Biol. Chem. 280, 13321–13328.

44. Schmitz, C., Kinner, A. & Kolling, R. (2005). The deubiquitinating enzyme Ubp1 affects sorting of the ATP-binding cassette-transporter Ste6 in the endocytic pathway. Mol. Biol. Cell, 16, 1319–1329.

45. Iida, K., Tada, T. & Iida, H. (2004). Molecular cloning in yeast by in vivo homologous recombination of the yeast putative alpha1 subunit of the voltage-gated calcium channel. FEBS Letters, 576, 291–296.

46. Lemaire, C., Guibet-Grandmougin, F., Angles, D., Dujardin, G. & Bonnefoy, N. (2004). A yeast mitochondrial membrane methyltransferase-like protein can compensate for oxa1 mutations. J. Biol. Chem. 279, 47464–47472.

47. Mitsui, K., Ochi, F., Nakamura, N., Doi, Y., Inoue, H. & Kanazawa, H. (2004). A novel membrane protein capable of binding the Na+/H+ antiporter (Nha1p) enhances the salinity-resistant cell growth of Saccharomyces cerevisiae. J. Biol. Chem. 279, 12438–12447.

48. Mitsui, K., Yasui, H., Nakamura, N. & Kanazawa, H. (2005). Oligomerization of the Saccharomyces cerevisiae Na+/H+ antiporter Nha1p: implications for its antiporter activity. Biochim. Biophys. Acta, 1720, 125–136.

49. Schnabl, M., Oskolkova, O. V., Holic, R., Brezna, B., Pichler, H., Zagorsek, M. et al. (2003). Subcellular localization of yeast Sec14 homologues and their involvement in regulation of phospholipid turnover. Eur. J. Biochem. 270, 3133–3145.

50. Bajaj, A., Celic, A., Ding, F. X., Naider, F., Becker, J. M. & Dumont, M. E. (2004). A fluorescent alpha-factor analogue exhibits multiple steps on binding to its G protein coupled receptor in yeast. Biochemistry, 43, 13564–13578.

51. Tam, A., Schmidt, W. K. & Michaelis, S. (2001). The multispanning membrane protein Ste24p catalyzes CAAX proteolysis and NH2-terminal processing of the yeast a-factor precursor. J. Biol. Chem. 276, 46798–46806.

52. Dolence, J. M., Steward, L. E., Dolence, E. K., Wong, D. H. & Poulter, C. D. (2000). Studies with recombinant Saccharomyces cerevisiae CaaX prenyl protease Rce1p. Biochemistry, 39, 4096–4104.

53. Lauterwein, J., Bosch, C., Brown, L. R. & Wuthrich, K. (1979). Physicochemical studies of the protein-lipid interactions in melittin-containing micelles. Biochim. Biophys. Acta, 556, 244–264.

54. Bass, R. B., Strop, P., Barclay, M. & Rees, D. C. (2002). Crystal structure of Escherichia coli MscS, a voltage-modulated and mechanosensitive channel. Science, 298, 1582–1587.

55. Lunin, V. V., Dobrovetsky, E., Khutoreskaya, G., Zhang, R., Joachimiak, A., Doyle, D. A. et al. (2006). Crystal structure of the CorA Mg2+ transporter. Nature, 440, 833–837.

56. Kiefer, H., Vogel, R. & Maier, K. (2000). Bacterial expression of G-protein-coupled receptors: prediction of expression levels from sequence. Recept. Chann. 7, 109–119.

57. Zhou, Y. & Bowie, J. U. (2000). Building a thermostable membrane protein. J. Biol. Chem. 275, 6975–6979.

58. Xie, G., Gross, A. K. & Oprian, D. D. (2003). An opsin mutant with increased thermal stability. Biochemistry, 42, 1995–2001.

59. Moller, S., Croning, M. D. & Apweiler, R. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646–653.

60. Chen, C. P., Kernytsky, A. & Rost, B. (2002). Transmembrane helix predictions revisited. Protein Sci. 11, 2774–2791.

61. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.

62. Kall, L., Krogh, A. & Sonnhammer, E. L. (2004). A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036.

63. Ikeda, M., Arai, M., Lao, D. M. & Shimizu, T. (2002). Transmembrane topology prediction methods: a reassessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol. 2, 19–33.

64. Lehnert, U., Xia, Y., Royce, T. E., Goh, C. S., Liu, Y., Senes, A. et al. (2004). Computational analysis of membrane proteins: genomic occurrence, structure prediction and helix interactions. Quart. Rev. Biophys. 37, 121–146.

65. Tusnady, G. E. & Simon, I. (2001). The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.

66. Bradford, M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of dye-binding. Anal. Biochem. 72, 248–254.

67. Huh, W. K., Falvo, J. V., Gerke, L. C., Carroll, A. S., Howson, R. W., Weissman, J. S. & O'Shea, E. K. (2003). Global analysis of protein localization in budding yeast. Nature, 425, 686–691.

68. Wimley, W. C., Creamer, T. P. & White, S. H. (1996). Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides. Biochemistry, 35, 5109–5124.

69. Taussky, H. H. & Shorr, E. (1953). A microcolorimetric method for the determination of inorganic phosphorus. J. Biol. Chem. 202, 675–685.

Edited by J. Bowie

(Received 13 July 2006; received in revised form 27 September 2006; accepted 3 October 2006)
Available online 6 October 2006