Original Manuscript

Diverse patterns of protein subcellular localization change inferred from 100,000s of

microscopy images

Alex X Lu1, Yolanda T Chong2, Bob Strome3, Ian Hsu3, Louis-François Handfield1, Oren

Kraus4, Brenda J Andrews2, Alan M Moses1,3,5

1 Department of Computer Science, University of Toronto

2 Terrence Donnelly Centre for Cellular & Biomolecular Research, University of

Toronto

3 Department of Cell & Systems Biology, University of Toronto

4 Department of Electrical Engineering, University of Toronto

5 Center for Analysis of Genome Evolution and Function, University of Toronto

Abstract

Evaluating protein localization changes on a systematic level is a powerful tool for

understanding how cells respond to environmental, chemical, or genetic perturbations. To

date, work in understanding these proteomic responses through high-throughput imaging has

catalogued localization changes independently for each perturbation. To distinguish changes

that are targeted responses to the specific perturbation and more generalized programs, we

developed a scalable approach to visualize the localization behavior of proteins across

multiple experiments as a quantitative pattern. By applying this approach to 24 experimental

screens consisting of nearly 400,000 images, we differentiate specific responses from more

generalized ones, discover nuance in the localization behavior of stress-responsive proteins,

and form hypotheses by clustering proteins with similar patterns. In one case, we confirmed

using time-lapse fluorescence imaging that, as predicted by our cluster analysis, Rtg3 shows

stress-modulated nuclear pulses. In another case, we confirmed that Yju3 shows an

unexpected relocalization to the nucleus in a Hsl1 deletion background. Our work

demonstrates how the growing scale of imaging data can be exploited to yield unexpected

new cell biology that would not have been evident when analysing each dataset

independently.

1. Introduction

Controlling the subcellular localization of proteins has long been understood as an

important component of a cell’s regulatory toolkit in response to perturbations (1–3), such as

drug treatments, genetic mutations, or environmental stressors. Towards the goal of

systematically characterizing these proteome dynamics, high-throughput technologies have

been employed to systematically gather data about protein localization in cells (4–6). Here,

we focus on GFP-tagged cell libraries using automated high-throughput microscopy (7,8),

which yield terabyte-scale image datasets (9–11) that show the subcellular localization of

each individual protein in a cell over varied perturbations. Using these so-called image

screens, protein localization changes have been systematically identified under a handful of

genetic and environmental perturbations (12–15).

Excitingly, the growing quantity and diversity of perturbations imaged opens the

possibility of analysing these screens in concert, rather than individually for each

perturbation. Multi-perturbation strategies have already been demonstrated through widely-

adopted cluster analysis methods (16) on microarray and RNA-seq data, providing a template

for how to discover complex proteome localization dynamics. One well-known example is the

identification of the environmental stress response in yeast; by appending the results of 142

different stressful microarray experiments into a single vector for each gene, then clustering

the resultant matrix to find shared patterns in the transcriptional regulation of genes across

experiments, Gasch et al. identified a set of ~900 genes whose transcript level changed under

most environmental stress perturbations, and differentiated this shared response from genomic

responses to specific stressors (17). Inspired by these approaches in genomic expression data,

we developed a similar strategy for protein localization changes. We reasoned that doing so

would provide the context to determine if a localization change is specific to a perturbation, or

a more general response found across multiple perturbations, potentially leading to a richer

understanding of protein function and of how cells permute, adjust, and repurpose these

functions to adapt to a wide range of perturbations.

To analyze the localization patterns from multiple screens, scalable data acquisition

and analysis methods that produce quantitative, comparable summaries of protein localization

changes are required. Using automated fluorescence microscopy, images can be acquired with

relatively consistent lighting and growth conditions (12–14). For these data, analysis methods

have focused upon finding differences in the annotations and localization classes of cells (12–

14), either through visual inspection or supervised machine learning, which are performed one

screen at a time. Some previous clustering of protein localization changes has been

demonstrated on spatial mass spectrometry data, but only with a single drug treatment and

only capturing changes in relative distribution between the cytoplasm, nucleus, and nucleolus

(18). Recently, we developed an unsupervised localization change detection algorithm (19)

that requires no training of parameters on each screen and scales easily to large image

collections. Because it describes relative patterns of change quantitatively rather than relying

on predefined classes, it is applicable to all subcellular localization patterns. Here, we exploit

the increasing scale of image data to provide mutual context for these protein-level responses.

By using “big data” in cell biology (20), we can interpret each response in the context of the

others and generate hypotheses that would not have been evident from analysing each dataset

independently.

Using data from automated microscopy and our new analysis method, we present, to

our knowledge, the first quantitative, multi-perturbation cluster analysis of protein

localization changes. We show that our data acquisition and analysis methods can scale to

analyses of 100,000s of images. We find that protein profiles for localization changes can be

separated into clusters ranging from being specific to one perturbation, to being shared

between various permutations of perturbations, to exhibiting different patterns depending

upon the perturbation. Moreover, we find functionally specific themes for many of these

clusters consistent with the known biological effects of the perturbations. We show that

although some protein localization changes are accompanied by transcriptional changes, in

general subcellular localization is an independent layer of regulation and that shared patterns

of localization changes cannot be simply explained by physical interactions or subcellular

compartments. We perform small-scale experiments to confirm several novel observations,

illustrating how these large-scale exploratory analyses yield specific experimental hypotheses.

2. Results

2.1 An unsupervised analysis of protein localization changes in over 280,000 images

In this work, we sought to understand protein localization changes across a wide

range of perturbations. To achieve this, we first gathered a dataset comprising 17 image

screens of the yeast GFP-fusion collection (20) under various drug treatments and genetic

mutations. Fifteen screens were previously published in the CYCLoPs database (9), and consist

of 3 wild-type replicates, 3 replicates of the rpd3Δ deletion, and 3 timepoints each of rapamycin

(RAP), hydroxyurea (HU), and α-factor (AF) treatment. We generated a further 2

screens, consisting of 2 replicates of iki3Δ. Altogether, this dataset encompasses 4143

individual yeast genes for a total of 281,724 images and an estimated 20.1 million single cells.

Next, to facilitate comparison between experiments, we converted these images into

quantitative measures of localization change, or protein change profiles. We use an

unsupervised localization change detection method (19) performed on a set of interpretable

single cell measurements for yeast cells (21). Briefly, we measure features that track the

distribution of GFP-tagged protein relative to certain cell landmarks, such as the average

distance to the cell center or cell edge. We combine these single cell measurements for each

protein to produce a representation of the “average” cell for each protein in each image

screen. Then, the change detection method calculates a vector of z-scores that report the

relative change for each protein between two image screens, in a way that corrects for

differences stemming from systematic biases between image screens or morphological

variability between compartments (see Methods section 4.2 for more information, including

important technical considerations in our method). We applied these methods to produce a

protein change profile that represents the localization change for each protein in each image

screen, each relative to a standard reference wild-type. An example showing more detailed

information about our features can be found in Supplementary Figure 1.
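
As a rough illustration of the idea of a protein change profile (not the published algorithm of ref. 19), the sketch below standardizes the feature differences of one protein against the differences seen in its most similar proteins in the reference screen. The function name, the neighbour count `k`, the generic protein names, and all feature values are assumptions made for illustration only.

```python
import math

def change_profile(ref, pert, protein, k=3):
    """Z-score each feature difference of `protein` against the differences
    observed in its k most similar proteins (similarity is measured in the
    reference screen). ref, pert: dict protein -> list of feature values."""
    # Feature-wise difference (perturbation minus reference) for every protein.
    diff = {p: [b - a for a, b in zip(ref[p], pert[p])] for p in ref}

    # Rank the other proteins by Euclidean distance in the reference screen.
    neighbours = sorted(
        (p for p in ref if p != protein),
        key=lambda p: math.dist(ref[protein], ref[p]),
    )[:k]

    profile = []
    for j, d in enumerate(diff[protein]):
        local = [diff[p][j] for p in neighbours]
        mean = sum(local) / len(local)
        var = sum((x - mean) ** 2 for x in local) / (len(local) - 1)
        sd = math.sqrt(var) or 1e-9  # guard against a zero spread
        profile.append((d - mean) / sd)
    return profile

# Toy "average cell" features (hypothetical values): every protein drifts by a
# small systematic offset between screens, but protA also moves in feature 0.
ref = {"protA": [0.50, 0.50], "protB": [0.52, 0.48], "protC": [0.49, 0.51],
       "protD": [0.51, 0.50], "protE": [0.10, 0.90]}
pert = {"protA": [0.22, 0.52], "protB": [0.54, 0.49], "protC": [0.50, 0.53],
        "protD": [0.53, 0.51], "protE": [0.12, 0.91]}

profile_a = change_profile(ref, pert, "protA")
profile_b = change_profile(ref, pert, "protB")
```

In this toy case the shared offset is absorbed by the neighbour mean, so only the protein that genuinely moves receives a large z-score in the affected feature.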

We sought to aggregate the protein change profiles for individual perturbations into a profile

that would represent multiple perturbations. We did so by appending individual protein

change profiles for the same protein. The heat map in Figure 1 shows these aggregated

vectors. Because localization changes are reported as a relative measure between the wild-

type and perturbation, the z-scores constituting the profiles are larger in magnitude where

localization changes are stronger relative to fluctuations observed in similar proteins. We

found that we could discern in which perturbations a protein changed localization (or not) by

assessing intense (or dim) patterns in the profiles. As an estimate of positive

predictive value, we randomly drew 20 images from the set of 85 proteins that had bright

patterns (and thus strong evidence for localization changes) in the RAP2 screen, curated by

qualitatively selecting clusters of bright protein change profiles from our heat map. We

found that 10 of 20 had visually obvious localization changes, while another 3 were

considered ambiguous for localization changes (suggesting a positive predictive value of ~50%).

Moreover, we found that we could use comparisons of two wild-types relative to the reference

wild-type as controls: where there are similar protein change profiles for our other wild-type

screens, localization changes likely arise from technical issues, experimental variability, or

strain-specific effects, rather than being specifically induced by perturbations.
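
The aggregation step described above amounts to concatenating per-screen profiles in a fixed screen order. A minimal sketch follows; the function name, the screen labels, and the z-scores are hypothetical, and restricting the output to proteins present in every screen is an assumption of this sketch rather than a detail stated in the text.

```python
def aggregate_profiles(screens, screen_order):
    """Append per-screen change profiles into one vector per protein.

    screens: dict screen name -> dict protein -> list of z-scores.
    Only proteins found in every screen are kept (an assumption), so that
    all aggregated vectors align column by column.
    """
    shared = set.intersection(*(set(screens[s]) for s in screen_order))
    return {
        p: [z for s in screen_order for z in screens[s][p]]
        for p in sorted(shared)
    }

# Toy z-scores for two features in two hypothetical screens.
screens = {
    "RAP1": {"protA": [6.1, -0.4], "protB": [0.2, 0.1]},
    "HU1": {"protA": [5.8, 0.3], "protB": [-0.1, 7.2], "protC": [1.0, 1.0]},
}
aggregated = aggregate_profiles(screens, ["RAP1", "HU1"])
```

Keeping the screen order fixed means that a given column always refers to the same feature in the same perturbation, which is what makes the rows of the heat map comparable.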

Figure 1. A clustered heat map of the aggregated protein change profiles. This heat

map shows a subset of the proteome filtered for strong protein change profiles by requiring

that the aggregated protein change profile have at least 3 z-scores above an absolute value of

5.0. The dendrogram represents the result of applying average linkage hierarchical

agglomerative clustering on the aggregated protein change profiles. Examples of clusters

have been highlighted by yellow boxes on the heat map; clusters for which we have

annotations are labelled and presented in Table 2. The three panels of microscopy images

correspond to three examples of protein localization changes found in our heat map clusters.

See text for more details.
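
The filtering rule in the caption (at least 3 z-scores with absolute value above 5.0) can be written as a one-line predicate. This is a sketch: the function name and the toy profiles are hypothetical, and only the two threshold values come from the caption.

```python
def has_strong_profile(profile, min_count=3, threshold=5.0):
    """True if at least `min_count` z-scores exceed `threshold` in absolute value."""
    return sum(1 for z in profile if abs(z) > threshold) >= min_count

# Toy aggregated profiles (hypothetical values).
profiles = {
    "protA": [6.1, -5.4, 5.8, 0.3],   # three strong z-scores: kept
    "protB": [0.2, 0.1, -0.1, 7.2],   # one strong z-score: dropped
}
strong = [p for p in profiles if has_strong_profile(profiles[p])]
```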

| Cluster | Perturbation(s) | Annotation | P-Value | Examples |
| --- | --- | --- | --- | --- |
| A | α-factor, iki3Δ | Cytoskeleton-Dependent Cytokinesis [GO:0061640] | 3.77E-07 | Bud3, Bud4, Cdc10, Cdc11, Hof1, Myo1 |
| | | Cell Cycle [GO:0007049] | 9.21E-05 | |
| B | Hydroxyurea | Cell Cycle | - | Cdc28, Cdc7, Clb2 |
| C | Rapamycin, Hydroxyurea | RNA Metabolic Process [GO:0016070] | 6.70E-04 | Msn2, Dot6, Rtg1, Rtg3, Enp1, Tsr1, Stb3, Utp20 |
| | | Stress-Responsive Transcription Factors | - | Msn2, Dot6, Rtg1, Rtg3 |
| D | Hydroxyurea, α-factor | Extrinsic Component of Vacuolar Membrane [GO:0000306] | 0.028823 | Iml1, Vac14, Tco89 |
| | | DNA Replication | - | Pol12, Mcm7, Ctf18 |
| E | rpd3Δ, Hydroxyurea | Triglyceride Lipase Activity [GO:0004806] | 0.029239 | Tgl4, Tgl5 |
| | | Metabolic Activity | - | Tgl4, Tgl5, Rnr3, Gdb1 |
| F | Rapamycin | Negative Regulator of Hydrolase Activity [GO:0051346] | 2.70E-04 | Stf1, Inh1, Yhr138c, Pbi2 |
| | | Enzyme Inhibitor Activity [GO:0004857] | 0.001441 | |
| G | rpd3Δ | Annotated by Chong et al. (2015) to localize away from cytoplasm in rpd3Δ | - | Acs1, Rad7, Mni1, Yjr008w, Ycr061w, Pab1, Pus4, Yor292c |
| H | iki3Δ | Ribosomal Subunits | - | Rps9a, Rpl13b, Rps0b |
| I | Rapamycin | Cytosolic Ribosome [GO:0022626] | 0.010879 | Rpl23a, Rpl40a, Rpl40b, Rps10a |
| | | Ribosomal Subunits | - | |
| J | Rapamycin | Diacylglycerol cholinephosphotransferase activity [GO:0004142] | 0.043859 | Ept1, Cpt1 |
| | | Phospholipid metabolism | - | Ipt1, Gpt2, Ept1, Cpt1 |
| K | rpd3Δ, Rapamycin, iki3Δ | Vacuole [GO:0005773] | 9.39E-05 | Bap2, Itr1, Hxt2, Aqr1, Dip5 |
| | | Ion transmembrane transporter activity [GO:0015075] | 0.020585 | |
| L | Rapamycin, Hydroxyurea | Bounding membrane of organelle [GO:0098588] | 0.027761 | Ams1, Kch1, Ymd8, Pho91, Mnn2, Och1, Kre2, Gnt1 |
| | | Protein Glycosylation [GO:0006486] | 0.020786 | Mnn2, Och1, Kre2, Gnt1 |
| M | α-factor | Cellular response to pheromone [GO:0071444] | 5.37E-04 | Fus1, Kar4, Ste2, Aga2, Crz1 |

Table 2. Annotations for the clusters presented in Figure 1. Selected significant gene

ontology enrichments are presented with their respective GO accession numbers and p-values

(full lists can be found in Supplementary Data 3). Some observations were not represented

well by enrichment. For example, while many cell cycle proteins are in cluster A, there were

some key cell cycle proteins in clusters B and D that were distinguished in their pattern of

localization changes. We provide some manual annotations for these observations.

2.2 Clustering of multi-perturbation z-score vectors reveals shared and specific

localization changes supported by literature

To test the hypothesis that proteins with similar profiles may be linked functionally,

we grouped together proteins by clustering our aggregated protein change profiles. The heat

map in Figure 1 shows the 1159 vectors with the most significant changes. To our knowledge,

this represents the first quantitative comparison of protein localization changes under multiple

experimental conditions. We observe that the overall landscape of patterns of localization

changes is complex, and includes a mixture of localization changes specific to one

perturbation, and localization changes shared between various combinations of the

perturbations we tested.
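
For readers unfamiliar with the clustering step, the following is a minimal, unoptimized sketch of average-linkage agglomerative clustering on aggregated profiles. It assumes Euclidean distance between profiles, which the text here does not specify, and stops at a chosen number of clusters instead of building a full dendrogram. The protein names and z-scores are toy values.

```python
import math

def average_linkage(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering.

    vectors: dict name -> list of floats. Returns a list of sorted name lists.
    """
    names = list(vectors)
    # Pairwise Euclidean distances between individual profiles (assumed metric).
    d = {(a, b): math.dist(vectors[a], vectors[b]) for a in names for b in names}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy aggregated profiles: two proteins respond in the first two columns,
# two respond only in the third column.
toy = {
    "protA": [8.0, 7.0, 0.0],
    "protB": [7.0, 8.0, 0.5],
    "protC": [0.0, 0.2, 9.0],
    "protD": [0.5, 0.0, 8.0],
}
groups = average_linkage(toy, n_clusters=2)
```

The naive merge loop is cubic in the number of profiles, so for a matrix of roughly a thousand vectors a library implementation would be the practical choice.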

We hypothesized that the rapamycin and hydroxyurea treatments would share some

localization changes, because both drugs induce cellular stress responses (22,23). We found a

cluster of shared protein change profiles (Figure 1C, Table 2C) between the two drug

treatments that included stress-responsive proteins such as Msn2, a master regulator of the

yeast general environmental stress response (17), in addition to various proteins involved in

ribosomal biogenesis and rRNA processing, which may relate to the observation that

ribosomal subunit genes are transcriptionally repressed under the environmental stress response

(17). We find many subtleties in the localization behavior of stress-responsive proteins within

this cluster; for example, while Msn2 is shared between the rapamycin and hydroxyurea

perturbations, we found that Tsr1, a protein predicted to move from the cytoplasm to the

nucleolus under general stress by a predictive proteome-wide analysis of stress responses

(24), is shared between the rpd3Δ, rapamycin, and hydroxyurea perturbations. Additionally,

while Msn2 has a strong pattern in its protein change profile through all three time points of

hydroxyurea perturbation but in only the first time point of rapamycin perturbation, Stb3, a

repressor of growth under stress controlled by localization between the nucleus and cytoplasm

(25), has a strong pattern in its protein change profile in all time points of both perturbations.

Furthermore, not all stress-related proteins reside in the same cluster; for example, Lap4

(shown as an example in Figure 1), a hydrolase which relocates to the vacuole under

starvation conditions (26), exhibited a strong shared pattern in its protein change profile

between the rapamycin and rpd3Δ perturbations, consistent with the images that show that the

protein is now localized to the vacuole and exhibits an increase in the frequency of its

cytoplasmic foci localization.

We found a similar subtlety in cell cycle proteins. Both hydroxyurea and α-factor

perturbations induce cell cycle arrest (27,28), and consistent with this we found that some

proteins involved in DNA replication (Pol12, Mcm7) and regulation of transcription at the

G1/S transition (Swi6) had similar strong patterns in their respective protein change profiles

(Figure 1D, Table 2D). Some of these localization changes are subtle. We show Mcm7 as an

example in Figure 1; the wild-type cells exhibit a heterogeneous mixture of cytoplasmic or

nuclear-localized protein, while in both hydroxyurea and α-factor, the localization is now

almost entirely nuclear, with the distribution of the nuclear-localized protein becoming

irregular compared to the more circular wild-type nuclear distribution. The Mcm complex,

including Mcm7, is known to relocalize according to cell cycle stage (29). Another cluster in

α-factor includes cytokinesis and cell cycle proteins (Figure 1A, Table 2A); some of these

profiles are shared with the iki3Δ perturbation; Iki3 is an Elongator complex subunit that regulates

sensitivity to G1 cell cycle arrest (30).

In addition to clusters with shared patterns, we also found specific clusters that

contained proteins consistent with the characterized biological effects of the respective

perturbations. α-factor is a pheromone that induces the yeast mating response. Consistent with

this role, we find a cluster of proteins involved in the mating response (Figure 1M, Table

2M), all of which have a strong protein change profile specific to α-factor. This

includes the transcription factor Kar4, which is induced as a regulator of downstream events

in the mating response (31). The cluster also includes the transcription factor Crz1, which

responds to calcium by rapid, stochastic pulsing between the nucleus and cytoplasm; a recent

study of single-cell calcium dynamics of pheromone-treated yeast indicates that cells respond

with dramatically more frequent bursts of calcium occurrence dependent upon pheromone

concentration, inducing nuclear localization of Crz1 during these bursts (32). Images for Crz1

in the wild-type versus α-factor screens (shown in Figure 1) confirm that while the protein is

predominantly cytoplasmic in the wild-type, in a proportion of cells Crz1 now also localizes to

the nucleus under α-factor perturbation. As another example of a cluster with profiles

specific to only one perturbation, we found a cluster specific to the rapamycin perturbation

that includes proteins that regulate hydrolase activity (Figure 1F; Table 2F), which may relate

to the autophagic processes induced by rapamycin (33).

Our clustered data is available in Supplementary Data 2. A list of proteins that change

for each perturbation is available in Supplementary Data 4, curated by selecting clusters of

strong patterns of protein change profiles from the heat map in Figure 1.

2.3 Developing hypotheses from our exploratory analysis: examples of three

hypotheses derived from clustering of protein localization changes

The extent to which we can interpret our clusters within known literature varies. For

example, while the mating response is well-studied and provides a clear-cut means to evaluate

the specific localization changes we found in the α-factor perturbation, other clusters are

supported by more circumstantial evidence or have little context from literature to form an

interpretation. By integrating the observations made possible by our exploratory analysis of

our protein change profiles with a closer qualitative analysis of images of the proteins

implicated in these clusters, the data can be mined to develop informed hypotheses for

experimental follow-up. In this section, we show three examples of this process.

2.3.1 Certain stress-responsive transcription factors that exhibit stochastic pulsatility

may be controlled by the same mechanism

We observed a small cluster of three transcription factors, Msn2, Dot6, and Rtg3,

with patterns shared between the hydroxyurea and rapamycin perturbation. All three proteins

exhibited a protein change profile indicating that they were becoming more compact in

distribution and closer to the cell center under both hydroxyurea and rapamycin perturbation

(Figure 3 - A). Evaluating the images (Figure 3 - B) confirmed that these proteins were

moving from cytoplasm to nuclei under these perturbations. The import of these transcription

factors into the nucleus is expected; all three respond in this manner due to the inhibition of

TORC1 under rapamycin (22), and Msn2 and Dot6 have both been documented to respond in

this manner through a previous manual assessment of localization changes under hydroxyurea

(13).

Figure 3. Heat map of the protein change profiles (A) and images (B) for a

cluster of stress-responsive transcription factors. We show images of each respective protein

under standard media, rapamycin treatment at 60 minutes and 220 minutes, and hydroxyurea

treatment at 80 minutes and 160 minutes.

We detected interesting coordinated time dependencies in these responses. All three

transcription factors are localized to the nucleus throughout all three time points of

hydroxyurea perturbation, but strongly localize to the nucleus at the first time point of

rapamycin perturbation before returning closer to the wild-type baseline in later time points.

While there are subtle differences between the three proteins (for example, Msn2 only

diminishes in relative localization to the nucleus rather than returning fully to the cytoplasm as we

see in the other proteins), the general trend is preserved.

These similarities in their protein change profile patterns led us to hypothesize that

Msn2, Dot6, and Rtg3 may be regulated by a common mechanism. Msn2 has known

mechanisms behind its localization changes, exhibiting stochastic oscillation (or pulsing)

between the cytoplasm and nucleus regulated by the cAMP-protein Kinase A (PKA) pathway

(34). Moreover, pulsing in Msn2 is affected by the degree of stress, with low stress levels

inducing a cytoplasmic steady state, high stress inducing a nuclear steady state, and only

intermediate levels of stress inducing pulsing (34); rapamycin has been specifically shown to

induce nuclear accumulation of Msn2 (35). We speculate that coordinated changes in pulsing

behavior under rapamycin or hydroxyurea treatment could underlie the similar patterns of

localization changes between these proteins. However, while a proteome-wide screen

confirms pulsatile dynamics for Msn2 and Dot6, Rtg3 was not found to pulse (36). To test

whether Rtg3 also shows condition specific pulsing dynamics, we produced time-lapse

movies of Rtg3 in standard growth media and under rapamycin treatment.

We observed Rtg3 pulses (in both the GFP-collection strain and an independently

constructed Rtg3-GFP fusion strain; see Methods) in both standard media and under

rapamycin treatment. However, we found that, as predicted by the cluster analysis,

rapamycin treatment increases the duration of Rtg3 pulses. In standard media, pulsing

single cells tend to show frequent oscillation between the nucleus and cytoplasm, while under

rapamycin treatment, pulsing single cells tend to show a single prolonged pulse in our movies.

We quantified single cell dynamics in our independent Rtg3-GFP strain (examples of pulsing

single cells in Figure 4A and 4B). The proportion of cells with a prolonged pulse (defined as

a longest pulse of more than 120 minutes; shown in Figure 4C) significantly

differs under rapamycin treatment relative to the wild-type (26/53 cells in wild-type, 36/50

cells under rapamycin treatment, p = 0.03, Fisher’s Exact Test).
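
The comparison above is a 2x2 contingency test. As a sketch, the counts reported in the text (26 of 53 wild-type cells and 36 of 50 rapamycin-treated cells with a prolonged pulse) can be run through a plain-Python two-sided Fisher's exact test. The function name is hypothetical, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the reported p-value was computed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (the small tolerance absorbs floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: wild-type, rapamycin. Columns: prolonged pulse, no prolonged pulse.
p_value = fisher_exact_two_sided(26, 53 - 26, 36, 50 - 36)
```

With these counts the result falls below the conventional 0.05 threshold, in line with the p-value reported in the text.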

Figure 4. Single cell localization dynamics for Rtg3, for a cell with frequent

oscillations between the nucleus and cytoplasm (A) and a cell with a single prolonged pulse

(B). We show stills from the time-lapse movie of each cell at various timepoints. (C) shows

that the proportion of cells with prolonged pulses increases in rapamycin treatment relative to

wild-type media.
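
To make the prolonged-pulse criterion concrete, the sketch below classifies a single-cell trace by its longest run of consecutive nuclear frames, using the 120-minute cutoff from the text. The per-frame nuclear/cytoplasmic calls, the 10-minute frame interval, and the function names are all assumptions of this sketch; the text here does not describe how frames were scored.

```python
def longest_pulse_minutes(nuclear, frame_interval_min):
    """Length in minutes of the longest run of consecutive nuclear frames."""
    longest = current = 0
    for is_nuclear in nuclear:
        current = current + 1 if is_nuclear else 0
        longest = max(longest, current)
    return longest * frame_interval_min

def has_prolonged_pulse(nuclear, frame_interval_min, cutoff_min=120):
    """True if the longest pulse exceeds the cutoff (120 minutes in the text)."""
    return longest_pulse_minutes(nuclear, frame_interval_min) > cutoff_min

# Toy traces with a hypothetical 10-minute frame interval.
oscillating = [True, True, False, False] * 10            # repeated 20-minute pulses
prolonged = [False] * 3 + [True] * 15 + [False] * 2      # one 150-minute pulse

oscillating_is_prolonged = has_prolonged_pulse(oscillating, 10)
prolonged_is_prolonged = has_prolonged_pulse(prolonged, 10)
```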

2.3.2 Different sets of ribosomal subunits respond specifically depending on the

perturbation and may exhibit extra-ribosomal functions

We found evidence of localization changes in many ribosomal subunits, many of

which were indicated by their protein change profiles to respond specifically to one

perturbation only. As examples, we show two sets of ribosomal subunits in Figure 5; one set

of proteins has a strong protein change profile pattern in the last two timepoints of rapamycin

perturbation, with some proteins showing a redistribution from the nucleus to the cytoplasm

in the images, while the other set has a strong protein change profile pattern in iki3Δ

perturbation, with some proteins showing a change from a homogeneous cytoplasmic

population to a bimodal cytoplasmic-low expression one. These condition-specific

localization changes for particular ribosomal subunits contrast with the tightly coordinated

response to stress observed at the transcriptional level (17).

Figure 5. Heat map for ribosomal subunits taken from two distinct clusters, one that

responded specifically in rapamycin (A – top) and one that responded specifically in iki3Δ (A

– bottom). We show some images of the relevant perturbation and the reference wild-type for

the changes in some subunits in B and C.

While the role of the perturbation-specific localization changes that we identify in

these ribosomal subunits is not entirely clear, a growing body of research suggests that

specific ribosomal subunits may have extra-ribosomal functions ranging from ribosomal

biogenesis to DNA repair to adhesive growth (37). For example, Rpl40a and Rpl40b are

ubiquitin-ribosomal protein fusions known to contribute to 27SB pre-rRNA maturation in the

nucleolus through cleavage of the fusion (38). We find that Rpl40a and Rpl40b both redistribute

from the nucleus to the cytoplasm in the later two time points of rapamycin perturbation,

which we speculate reflects the reduction in ribosomal biogenesis and rRNA synthesis and

processing activity caused by rapamycin (39). From this example, we believe that

understanding protein localization changes in ribosomal subunits, and under what

perturbations these changes occur, can assist the exploration of extra-ribosomal functions.

2.3.3 Membrane proteins exhibit a reciprocal pattern of localization changes between

hydroxyurea and rapamycin

We observed a cluster of proteins that had strong protein change profile patterns in

both the rapamycin and hydroxyurea perturbations, but with different patterns in their

respective perturbations. In the rapamycin perturbations, the direction of our features

indicated that the proteins were becoming closer to the cell center and further from the cell

edge relative to the wild-type, while in the hydroxyurea perturbations, the same proteins were

predicted to become further from the cell center and closer to the cell edge. Evaluating images

for these proteins (Figure 6) identified that most were membrane proteins, with some localized

to the plasma membrane and others to the Golgi. In the wild-type, these proteins are localized to both the

membrane and the vacuole; under hydroxyurea treatment, this distribution of protein shifts

towards the membrane more, whereas in rapamycin perturbation, it shifts towards the vacuole

more, confirming the trends predicted by our protein change profile patterns.
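
A reciprocal pattern of this kind can be flagged by looking for proteins whose z-scores for the same feature are strong in both perturbations but opposite in sign. The sketch below is one way to express that check. The function name, the threshold of 5.0 (borrowed from the Figure 1 filter), and the toy z-scores are assumptions, and a real profile would contain several features per screen.

```python
def reciprocal_proteins(rap, hu, threshold=5.0):
    """Proteins whose z-scores for one feature are strong in both
    perturbations but point in opposite directions."""
    return sorted(
        p for p in rap.keys() & hu.keys()
        if abs(rap[p]) > threshold
        and abs(hu[p]) > threshold
        and rap[p] * hu[p] < 0
    )

# Toy z-scores for a single hypothetical feature (distance to cell center).
rap = {"protA": -7.5, "protB": -6.0, "protC": 0.4}
hu = {"protA": 6.8, "protB": -5.5, "protC": 8.1}
reciprocal = reciprocal_proteins(rap, hu)
```

Here only the first protein qualifies: the second changes in the same direction in both perturbations, and the third changes in only one.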

Figure 6. Heat map of the protein change profiles (A) and images (B) for a cluster of

proteins with reciprocal protein change profiles between hydroxyurea and rapamycin

treatment. We show images for three proteins from this cluster, under 220 minutes of

rapamycin treatment, standard media, and 160 minutes of hydroxyurea treatment.

While we do not have a clear hypothesis for the mechanism behind this reciprocal

change between the hydroxyurea and rapamycin perturbations, what is interesting about these

membrane proteins is not just that they change localization in both the rapamycin and

hydroxyurea perturbations, but that they change localization differently between the

perturbations. Because our method visualizes localization changes quantitatively and

simultaneously for both perturbations, these patterns were immediately obvious from the heat

map. In addition, global clustering enabled visual identification of these changes. Because

some of the localization changes are subtle, they may have been missed or attributed to

technical variation if viewed in isolation. By grouping these proteins with more pronounced

examples of changes of a similar nature, these changes were easier to identify.

2.4 Cluster associations found by our unsupervised exploration of localization

changes are complementary to other high-throughput experiments

Next, we wanted to test if the associations between proteins inferred from our

exploratory analysis of protein localization changes were unique to analysing protein

localization.

First, to determine that we were not simply detecting protein localization changes

resulting from transcript-driven protein abundance changes, we tested the overlap of our

predicted localization changes with transcriptional responses for perturbations for which we

could find comparable microarray data. We present this data qualitatively as a heat map,

which presents the microarray data (as log fold-change from wild-type) as columns to the

right of the localization change vectors for each perturbation. This visualization allows us to

assess if clusters of protein change profiles also have large transcript changes in the relevant

perturbations. The data for the full heat map can be found in Supplementary Data 5; we show

important components in Figure 7A.

Figure 7. Comparisons of our clusters of protein change profiles with other high-

throughput experimental data. A) shows some exemplar localization change clusters with

transcriptional data side-by-side for the relevant perturbations; note that the transcript vectors

and protein change profiles are shown on different scales. B) shows the enrichment of 15

clusters for physical interactions, reported as z-scores. C) shows our protein change profiles,

represented as a scatterplot, labelled with manual annotations for the wild-type localization of

the proteins. We show 608 proteins, filtered for having at least 2 z-scores above an absolute

value of 6.0. Some clusters that we previously identified in Table 2 are circled on the

scatterplot.

We found that some clusters of localization change correspond with large transcript

changes. In the α-factor perturbation, the cluster of mating pathway components (Kar4, Aga2, Fus1,

etc.; Figure 7A – I) had both strong transcript vectors and protein change profiles, as did

some (Bud3, Bud4, etc.; Figure 7A – II), but not all, cell cycle proteins. Some stress-related

genes common to both the rapamycin and hydroxyurea perturbations have large transcript

changes (Tsr1, Enp1, etc.; Figure 7A – III), but not the transcription factors that we

previously identified in section 2.3.3 (Dot6, Msn2, Rtg3; Figure 7A – III). In contrast, many

clusters do not agree well with transcript changes. For example, a cluster of proteins specific

to the rapamycin perturbation containing some kinases (Pkc1, Rtk1, Npr1; Figure 7A – IV),

and a cluster of proteins involved in DNA repair responding specifically in the hydroxyurea

perturbation (Cdc28, Rad54, Cdc7, etc.; Figure 7A – V) mostly do not exhibit large transcript

changes. The reciprocal pattern of membrane protein localizations between the rapamycin and

hydroxyurea perturbations, identified in 2.3.2, also does not exhibit large transcript changes

(Figure 7A – VI).

For ribosomal subunits in particular, we found that the relationship between transcript

vectors and protein change profiles was complex. First, we observed that some ribosomal

subunits with a pattern specific to the rapamycin perturbation (Rps10a, Rpl40b, Rpl40a;

Figure 7A – VII) had large transcript changes in the rapamycin perturbation. At the same

time, we observed some ribosomal subunits with strong transcript changes in the rapamycin

and hydroxyurea perturbations (Rps18a, Rpl14a, Rps27b, Rpl27b, Rpl2a; Figure 7A – VIII)

that had strong protein change profiles in neither. Finally, we observed ribosomal proteins

that had strong protein change profiles in both the rapamycin and hydroxyurea perturbations

(Rpl23a, Rpl27a; Figure 7A – IX), that also had strong transcript changes in the rapamycin

perturbation but not the hydroxyurea.

Next, we asked if our clusters of protein change profiles were dominated by

interacting proteins from the same complex moving together. We hypothesized that the

underlying causes behind localization changes were diverse, and that coordinated localization

changes would not necessarily require that the participating proteins had to interact, such as if

a set of proteins were independently handling modular aspects of a response. To test overlap

with protein-protein interactions, we selected 15 clusters of protein change profiles using the

dendrogram presented in Figure 1, ranging from 5 to 37 proteins with an average cluster size

of 15.7 (lists of proteins in each cluster can be found in Supplementary Data 6). We evaluated

if these clusters were enriched for physical interactions from low-throughput experimental

sources using the BIOGRID database (40). Our results are summarized as z-scores indicating

the degree of enrichment relative to null expectation in Figure 7B. Only 5 clusters were

enriched for physical interactions, while 10 clusters were not.
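
The enrichment calculation can be sketched as follows. This is only an illustration, assuming that the null expectation is estimated by repeatedly drawing random protein sets of the same size as the cluster from the background population; the text does not state how the null was constructed. The function names and the toy interaction set are hypothetical.

```python
import random
import statistics
from itertools import combinations


def count_interactions(proteins, interactions):
    """Count unordered pairs within `proteins` that appear in `interactions`."""
    return sum(
        1
        for a, b in combinations(sorted(proteins), 2)
        if frozenset((a, b)) in interactions
    )


def enrichment_z(cluster, background, interactions, n_draws=1000, seed=0):
    """Z-score of within-cluster interactions against random same-size sets.

    Assumption of this sketch: the null is sampled from `background`
    (a list of protein names) without replacement.
    """
    rng = random.Random(seed)
    observed = count_interactions(cluster, interactions)
    null = [
        count_interactions(rng.sample(background, len(cluster)), interactions)
        for _ in range(n_draws)
    ]
    mean = statistics.fmean(null)
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0  # no variation in the null; the z-score is undefined
    return (observed - mean) / sd


# Toy data: three mutually interacting proteins in a background of 40.
background = [f"P{i}" for i in range(40)]
known = {
    frozenset(pair)
    for pair in [("P0", "P1"), ("P0", "P2"), ("P1", "P2"), ("P10", "P20")]
}
z = enrichment_z(["P0", "P1", "P2"], background, known)
print(f"z = {z:.1f}")
```

A cluster whose members rarely interact would give a z-score near zero under this scheme, which is the situation reported for 10 of the 15 clusters.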

Finally, we wanted to determine if our clusters of protein change profiles consisted

primarily of proteins that shared a subcellular localization in the wild-type or not. In section

2.3.3, we showed an example of proteins in different compartments (cell membrane and

Golgi) showing similar patterns of behavior. We wanted to assess more systematically if we

were capturing protein behavior in our analysis, rather than simply recapitulating proteins that

shared localizations and moved together; we reasoned that if proteins moved under the same

sets of perturbations, even from and to different places, they could still be regulated in the

same way, such as being phosphorylated by the same kinase. To do this, we visualized our

protein change profiles as a scatterplot using a technique that represents the aggregated

profiles in two dimensional space (Figure 7C), and color-coded the points according to

manually assigned annotation classes for a wild-type GFP-fusion yeast collection by Huh et al

(20). Many of the clusters we presented in Table 2 are also represented by clusters of

proximate points in the scatterplot; we label some of the proteins previously shown.

Generally, most clusters are heterogeneous in their composition of localization

classes. The cluster of proteins responding specifically to rpd3Δ (Table 2G; includes Pus1,

Acs1) is an example of a highly heterogeneous cluster. In many cases, there is some local bias

towards some classes. For example, the cluster of pheromone response proteins (Table 2M;

includes Aga1, Kar4, Fus1) has a mixture of cytoplasm and nucleus, and vacuole localized

proteins. This reflects how this cluster consists of both pheromone-responsive transcription

factors like Kar4, in addition to downstream responses like Fus1, a cell fusion protein that

facilitates vacuole mixing (41). Similarly, while the clustered stress-responsive proteins

Msn2/Dot6/Rtg3 (discussed in section 2.3.1) all share a cytoplasm/nucleus localization,

proximate proteins (close to Tsr1 in Figure 7C) are more heterogeneous. Some clusters do

exhibit primarily a single localization class, however. For example, the cluster of cytokinesis

proteins (Table 1A; includes Bud3, Bud4, Myo1) is mostly bud neck; we note that these

proteins also formed the cluster that was most strongly enriched for physical interactions in

the previous comparison (Figure 7B – 1).

2.5 Exploratory analysis of kinase deletion screens reveals more limited

localization changes

Next, we explored protein localization changes across image screens for 7 kinase

deletion mutants: elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ, and vhs1Δ. Because

phosphorylation is a well-characterized mechanism for regulating protein localization (2), we

anticipated that we would identify localization changes in these mutants. This dataset

contained 116,004 images and about 5.19 million single cells. These image screens represent

a more challenging case for exploring protein localization change dynamics for several

reasons. First, we had less prior context to draw upon. While some information exists on the

phenotypes that these deletions induce, little is known about the proteomic dynamics induced

by the kinase deletions. Second, we found that this dataset was more technically challenging.

The average number of cells identified in the kinase deletion screens was ~744,000 cells per

screen (compared to ~1.22 million per screen in the analysis in Figure 1). Furthermore, we do

not have replicates or multiple time points in the kinase deletion screens, making it more

challenging to confirm the reproducibility of the signals we observe. Nevertheless, we applied

the change detection and aggregated profile clustering strategy that we presented in sections

2.1 and 2.2 to the kinase deletion mutants, and visualized the most significant results as a

clustered heat map (Figure 8A); the full clustered data is available in Supplementary Data 7.

Figure 8. A) shows a clustered heat map of the aggregated protein change profiles for

the kinase deletion perturbations. This heat map shows a subset of the proteome filtered for

strong protein change profiles by requiring that the aggregated protein change profile have at

least 2 z-scores above an absolute value of 6.0. We highlight some clusters. B) Overlay of

GFP-fluorescence patterns (green) of a Yju3-GFP fusion protein with DIC images to show the

cells (shades of grey) for an independently-produced set of strains and imaged under a

different microscope.

We find that most strong protein change profiles in the kinase deletion screens are

specific, occurring under just one kinase deletion perturbation. These results contrast with the

general stress-related protein localization changes that we found in previous perturbations.

These patterns (such as those in Msn2, Dot6, Rtg3, Tsr1, Stb3 and Lap4, previously discussed

in section 2.2) do not appear as generalized responses in the kinase deletion perturbations. We

do find several proteins in clusters specific to the elm1Δ perturbation previously shown to

relocate to nuclear or cytoplasmic foci under DNA replication stress, including Lap4, Hsp42,

Apj1, and Pph21 (13). Interestingly, we also find that Gre3, previously found to have an

abundance change but no localization change under DNA replication stress, shows a dramatic

localization change from cytoplasm to nucleus under elm1Δ perturbation in most of the cells

(Figure 8). Notably, elm1Δ induces an irregular morphology where the cells become

elongated, which is not observed in the other kinase deletions.

While most clusters for our kinase perturbations appear to be specific, we do find

some shared protein change profile patterns. A small set of protein change profiles shared

between the hal5Δ and the kin2Δ perturbations consists of Sol4, Hsp30, and Hor2, all of

which are linked to various stress responses (13,42,43). We show Hsp30 in Figure 8, and note

that in both perturbations, a previously rare vacuolar phenotype in the wild-type becomes

much more frequent. Hsp30 has been shown to be activated independently of the general

stress-related transcription factors Msn2 and Msn4 (47), congruent with our observation that

Msn2 and other stress-related transcription factors are absent as a generalized program in the

kinase deletion perturbations. Additionally, we find two proteins that appear to have

reciprocal patterns of protein change profiles between the kin2Δ and mck1Δ perturbations,

Qri5 and Gln3. We show Gln3 in Figure 8; in the wild-type, the protein localizes as a

heterogeneous mixture of cytoplasmic and cytoplasmic-and-nucleus cells, but in kin2Δ, the

cells appear to be mostly cytoplasmic, whereas in mck1Δ, there appear to be relatively more

cytoplasmic-and-nucleus cells.

Like the results for our previous screens, we found strong protein change profiles in

ribosomal subunits that were specific to some perturbations. A cluster for the hal5Δ

perturbation contained the ribosomal subunits Rps23a, Rpl7b, and Rpl3. We show Rps23a in

Figure 8 as an example, with the localization appearing to be more strongly nuclear in the

perturbation than the wild-type.

We also found some strong specific protein change profile patterns for proteins that

we considered unusual in their trend of localization change, or in the context of the kinase.

For example, for hsl1Δ, we found Yju3, a monoglyceride lipase that is known to localize to

lipid particles and membranes (44); as Hsl1 regulates the morphogenesis checkpoint (45), we

could not understand the link from literature alone. Qualitative analysis of the associated

images indicates that while Yju3 in the wild-type does localize in an ER and cytoplasmic

pattern, a large proportion of the cells now exhibit a localization change to the

nucleus/nucleolus in the hsl1Δ perturbation (Figure 8). To confirm that this observation was

not simply due to contamination or another sample-specific technical issue, we verified the

phenotype with an independently-produced set of strains and a separate microscope (Figure

8B). We confirmed that Yju3 was localized in a punctate cytoplasmic pattern in the wild-type

consistent with an ER localization, but changed to the nucleus under hsl1Δ perturbation,

confirming our observations in the GFP-library image collection. We regard Yju3’s

localization to the nucleus as unexpected in the context of its role as a lipase; we hypothesize

that this observation may reflect an unknown additional role for Yju3.

3. Discussion

Adding context to patterns of protein localization change deepens biological insight.

It is one thing to ask what proteins are changing for a perturbation; it is a more focused query

to ask what proteins are changing in a similar way for a perturbation, but specific to that

perturbation only. For example, for the α-factor treatment, we resolved a specific cluster of

proteins involved in the mating response. Another cluster in the α-factor perturbation, which

included DNA replication factors, had similar protein change profiles, but was shared with the

hydroxyurea perturbation. Had we clustered the α-factor localization change data

individually, the two clusters would have likely been mixed. With the addition of the

hydroxyurea perturbation, these clusters could be individually resolved. From examples like

these, we believe that this increased resolution offered by a multi-perturbation context

empowers identification of functional relationships between proteins by association. This will

likely further improve as high-throughput imaging experiments expand to include more

perturbations.

Moreover, representing protein localization changes comparably between

perturbations permits the detection of nuanced trends in protein responses. We showed that

components of the yeast stress response could be differentiated in subtle ways. Some proteins,

like Stb3 in rapamycin, responded throughout perturbations, whereas other proteins, like

Msn2, responded in a temporal manner even in the same perturbations. Some proteins, like

Tsr1, responded in more perturbations than others. Some proteins, like Lap4, responded in

different combinations of stressful perturbations than others. Many of these observations are

consistent with research on stress-responsive proteins: for instance, the temporal nature of

Msn2 can be attributed to its oscillatory pulsing behavior between the cytoplasm and nucleus

(34), and the more specialized response we observed in Lap4 can be attributed to its selective

transport to the vacuole by the autophagosome (26). Our approach summarizes various facets

of protein responses in a compact visualization. In doing so, we provide a glimpse of the

complex proteomic behaviors that occur within cells.

An important feature of our change detection and clustering approach is that it is

unbiased by prior biological knowledge. For this reason, it is encouraging that we recapitulate

much known biology in our clusters; the fact that we can find many specific examples of

known responses to perturbations provides evidence to support that our approach reflects

underlying biology. Furthermore, the unbiased nature of our approach permits the association

of new genes with previously known responses. For example, we found that Crz1 was

clustered with proteins in the mating response in the α-factor perturbation, a result that was

initially unexpected given that Crz1 is not a canonical part of the mating response.

Interestingly, another study independently linked Crz1 to the mating pheromone (32),

confirming that even well-characterized pathways can benefit from unbiased, high-throughput

exploratory analyses of protein localization changes.

In our examples of how to apply this approach for hypothesis development, we

identified Rtg3 as a pulsatile transcription factor modulated by rapamycin in this study. Our

results contradict a previous proteome-wide screen that suggested that Rtg3 does not pulse

(36). However, the highly variable dynamics of pulsing from cell-to-cell (34,36) and the large

number of proteins independently tested in the screen, as well as differences in microscopy

and imaging conditions, are plausible reasons for the discrepancy. In contrast to the time-lapse

movie data used to analyse pulsing, our image data is much less comprehensive, consisting of

still images of just three coarse time-points. Rather, our discovery was powered by

associating protein behavior, by observing that Rtg3 had a similar protein change profile to

other pulsatile transcription factors whose dynamics were affected by rapamycin. That we

could make findings missed by stronger experimental approaches demonstrates how looking

for protein properties by association can be surprisingly powerful.

We believe that our approach strongly complements human evaluation of images,

instead of fully replacing it, because an expert observer can provide the critical context to

evaluate if the signals found by our unsupervised approach are biological or technical. At the

same time, careful human evaluation takes too long to look at hundreds of thousands of

images over the whole proteome, many of which may only have subtle trends; our method

drastically reduces this search space, allowing observers to target proteins with strong and

interesting signals, and equips observers with knowledge of which specific screens to look at,

the general nature of the putative change, and if the change is likely to be a more or less

functionally-specific one based upon the combination of perturbations it occurs in.

While our approach is based upon direct imaging of cells, predictive multi-

perturbation approaches of protein localization are also available (24,46–48). While the latter

approach is a valuable alternative where experimental data is not readily generated, we

believe that predictions based on protein-interactions and mRNA expression patterns (24) will

miss much of the localization dynamics. We found that not all clusters of protein change

profiles overlap well with either of these types of datasets, suggesting that the underlying

mechanisms that drive localization changes are diverse. Indeed, in recent predictions of

localization change under stress (24), mitochondrion-to-nucleus and ER-to-Golgi transitions were

predicted to be the most frequent types of change. In contrast, we observe that transitions between the

nucleus and cytoplasm were very common in the localization changes we looked at. While

this could be a technical effect because our change detection method is more sensitive to

localization changes that reflect larger spatial distances (discussed in (19)), we also observe

that many of the transcription factors we observed as changing under stressful conditions

were not predicted to have localization changes by a previous predictive method that relied

upon protein-interaction and mRNA expression datasets (24).

We present this research as a proof-of-principle for combining systematic imaging

experiments. Certainly, our method has many technical limitations. For example, we analysed

two sets of screens in this study, one of lower quality than the other. Because our change

detection algorithm (19) uses real data from our screens to build expectations of what a non-

changing protein looks like, the overall quality of the image screens may affect what

proteomic changes are detected. We noticed that we found considerably smaller clusters of

protein change profiles, and fewer shared patterns of protein change profiles in our lower-

quality kinase deletion screens compared to our first set of screens. We do not expect our

study to capture all localization changes due to the incomplete specificity of our change

detection method, but we demonstrate that analysing even the localization changes we

captured was highly informative.

Future work should further improve our general strategy of analysing protein

localization changes in a multi-perturbation context. We believe that this strategy may have

implications beyond

model organisms. New methods are beginning to emerge for the scalable GFP tagging of

human proteins (49); given that our strategy can potentially elucidate protein pathway

responses and compensatory rearrangements of the proteome in response to drugs, it could

identify potential protein targets for knockout or combination therapies. Importantly, as our

approach can separate more general responses from those more specific to a drug, it may

permit the informed selection of protein targets that minimize side effects. Furthermore, while

we frame our work as relevant to perturbations, our approach can easily be extended to study

proteomic changes from differences in tissue or cell-type (50).

4. Methods

4.1 Experimental Strains and Image Acquisition

Image data for wild-type GFP-tagged yeast cells, in addition to the rpd3Δ, rapamycin,

hydroxyurea, and α-factor perturbations, were taken from the CYCLoPs database (9).

The iki3Δ, elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ and vhs1Δ strains were

constructed and imaged as described in Chong et al. (12). Fluorescent micrographs were

acquired using a high-throughput spinning-disc confocal microscope (Opera, PerkinElmer)

with a water-immersion 60X objective (NA 1.2, image depth 0.6 µm and lateral resolution

0.28 µm). Acquisition settings included using a 405/488/561/640 nm primary dichroic, a 568

nm detector dichroic, a 520/35 nm filter in front of camera 1 (12-bit CCD) and a 600/40 nm

filter in front of camera 2 (12-bit CCD). Excitation was conducted using 488 nm (blue) and

561 nm (green) lasers at maximum power. Eight images were acquired for each well, 4 in the

red channel and 4 in the green channel with simultaneous acquisition of red and green

channels (binning = 1, focus height 2 µm) and an 800 ms exposure time per site.

For images that we present in this study, we crop a 300x300 pixel box from the

respective images showing representative cells for the sample. These images are output by

our single cell segmentation program (21), and show blue lines around the cells indicating the

results of the segmentation, and white circles connecting mother and bud cells. Cells

considered to be artifacts are indicated by white crossed-out regions in the images. These

images are rescaled in contrast using the highest and lowest intensity pixels in the images; we

preserve this contrast to better show subcellular localization patterns for proteins expressed at

lower abundances.

For the GFP-tagged Rtg3 and Yju3 strains presented in Figure 4C and 8B, respectively, strains

were generated as fusion products via homologous recombination of the native genomic

sequence. Direct transformation of a linear PCR product containing codon-optimized GFP

coding sequence and a selectable resistance marker flanked by gene-specific sequence yielded

C-terminally tagged fusion products which were then isolated by the conferred drug resistance.

To prepare strains for microscopy, strains were inoculated into synthetic minimal media and

grown overnight at 30˚C. Prior to imaging, stationary cultures were diluted 1/10 in fresh

media and grown at 30˚C for 4 hours to ensure log-phase growth and proper expression of

GFP fusion products.

To produce time-lapse movies of the Rtg3 strains, cells were dosed with 200 ng/ml of

rapamycin for the rapamycin treatment perturbation, and imaged at 22˚C for 16 hours at 2.5

minute intervals for 4 z-stacks at 1 micrometer intervals. Cells were imaged using a Nikon

spinning disk confocal microscope using a 60X oil-immersion objective. GFP excitation was

at 488 nm. To analyse frames of the codon-optimized GFP strain, segmentation and tracking

were conducted using Matlab on the brightfield image to identify cell peripheries. The

localization score of the GFP signal is calculated by taking the difference between the average

intensity of the 10% brightest pixels within a cell and the average intensity of the other 90%

of pixels in the cell; under this quantification, a nuclear localization will have a higher score

than a cytoplasmic localization. To identify pulses, a two-state hidden Markov model was

applied across the series of localization scores for the entire timeframe of each single cell.
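
The localization score and the two-state decoding step can be illustrated with a short sketch. The score follows the definition given above. The decoding part is an assumption-laden simplification: it uses Gaussian emissions with fixed, hand-chosen parameters, a symmetric transition matrix and Viterbi decoding, none of which are specified in the text, and the function names are hypothetical.

```python
import math


def localization_score(pixels, top_fraction=0.10):
    """Mean of the brightest `top_fraction` of pixels minus the mean of the rest."""
    ordered = sorted(pixels, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))
    top, rest = ordered[:n_top], ordered[n_top:]
    if not rest:
        raise ValueError("need at least two pixels")
    return sum(top) / len(top) - sum(rest) / len(rest)


def gaussian_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(scores, means, sds, p_stay=0.9):
    """Most likely state path (0 = baseline, 1 = pulse) for one cell.

    Assumptions of this sketch: Gaussian emissions with fixed parameters,
    a symmetric transition matrix, and a uniform initial distribution.
    """
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    # best[s]: best log-probability of any path ending in state s
    best = [math.log(0.5) + gaussian_logpdf(scores[0], means[s], sds[s]) for s in (0, 1)]
    back = []
    for x in scores[1:]:
        step, pointers = [], []
        for s in (0, 1):
            stay = best[s] + log_stay
            switch = best[1 - s] + log_switch
            pointers.append(s if stay >= switch else 1 - s)
            step.append(max(stay, switch) + gaussian_logpdf(x, means[s], sds[s]))
        best = step
        back.append(pointers)
    # trace the best path backwards from the most likely final state
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


bright_spot = [10.0] * 9 + [100.0]  # signal concentrated in one pixel
diffuse = [20.0] * 10               # signal spread evenly
trace = [0, 1, 0, 10, 11, 9, 0, 1]  # toy series of localization scores
states = viterbi_two_state(trace, means=(0, 10), sds=(2, 2))
```

In practice the emission and transition parameters would be estimated from the data rather than fixed by hand as they are here.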

For confocal analysis of the Yju3 strain, cells were imaged using a Leica TCS SP8

confocal microscope with a 63X oil-immersion objective and a 10X eyepiece. GFP

excitation was at 488 nm.

4.2 Image Analysis and Quantification of Localization Changes

To quantify the patterns of GFP in the cells in our images, we use the single-cell

segmentation and feature extraction program as described in (21), specialized for the

segmentation and measurement of GFP-tagged yeast cells. We modify the method of

averaging single-cell features by reducing the number of bins for cell size from 10 to 5, to

increase the number of cells within each bin.

An important property of these features is that they track the concentration and

distribution of GFP-tagged proteins relative to certain cellular landmarks. As opposed to

features that only track the shape of the GFP pattern, these features allow us to track relative

distributions between localizations when a protein is localized to two or more compartments,

and shifts the ratio of protein between these compartments, such as in the examples in section

2.3.3. As subcellular organization emerges from affinity and equilibria, it can be fuzzy in its

dynamic equilibria as opposed to sharply defined binaries (51), reinforcing the need for the

descriptions of protein distribution we pursue here rather than sharply defined categorizations

based upon morphology. However, our features are also sensitive to strong protein abundance

changes that drastically alter local concentrations of proteins, so some abundance changes will

also be represented in our clusters. In considering this trade-off, we realized that the

distinction between protein localization and abundance changes is sometimes arbitrary. For

example, one unclear case is where the protein is not expressed in the wild-type, but shows a

specific localization in the perturbation.

To conduct localization change detection on these features, we apply our previously

described method in (19). This unsupervised change detection method constructs a

conditional expectation for each protein and reports the direction and magnitude of deviation

of the protein’s change compared to this expectation; because these conditional expectations

are constructed locally for each protein, differences in feature measurements between samples

explained by simple variation in the morphology of an organelle or systematic biases between

image screens are de-emphasized, allowing for the fairer comparison of localizations

differently impacted by these effects. The unsupervised change detection method quantifies

the putative localization change for each protein with a shared set of features. This

representation encodes the nature of the localization change in the pattern across our

vector of features; for instance, cytoplasm to the nucleus movements are characterized by

strongly positive z-scores for “distance to cell center” and “distance between proteins”

features, which indicate that the wild-type values for these features are higher than the

perturbation values. Using this method, localization changes are presented in a quantitative

and comparable way that summarizes each localization change as a relative measure between

the wild-type and the perturbation.
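
To make the idea of a locally constructed expectation concrete, the sketch below shows one way such a z-score could be computed. It is not the implementation from (19): it assumes that the expectation for a protein is built from the k proteins closest to it in wild-type feature space, and that the deviation is scaled by the spread of those neighbours' shifts. The function name, data layout and toy values are hypothetical.

```python
import math
import statistics


def change_z_scores(wild_type, perturbed, protein, k=50):
    """Per-feature z-scores for `protein` relative to its k nearest neighbours.

    `wild_type` and `perturbed` map protein name -> list of feature values.
    Shifts are taken as wild-type minus perturbation, so a positive z-score
    means the wild-type value is higher than the local expectation suggests.
    Assumption of this sketch: neighbours are chosen by Euclidean distance
    in wild-type feature space.
    """
    target = wild_type[protein]
    others = [p for p in wild_type if p != protein and p in perturbed]
    neighbours = sorted(others, key=lambda p: math.dist(wild_type[p], target))[:k]
    z = []
    for i in range(len(target)):
        # local expectation: how this feature shifts for similar proteins
        shifts = [wild_type[p][i] - perturbed[p][i] for p in neighbours]
        mean, sd = statistics.fmean(shifts), statistics.stdev(shifts)
        own = target[i] - perturbed[protein][i]
        z.append((own - mean) / sd if sd > 0 else 0.0)
    return z


# Toy data with one feature: protein X shifts strongly, its neighbours do not.
wt = {"A": [10.0], "B": [10.1], "C": [9.9], "D": [10.2], "E": [9.8], "X": [10.0]}
pert = {"A": [10.1], "B": [10.0], "C": [10.0], "D": [10.1], "E": [9.9], "X": [5.0]}
z_x = change_z_scores(wt, pert, "X", k=5)
```

Because the expectation is built only from similar proteins, a shift shared by all of them (for example, a systematic bias between screens) largely cancels out, which is the property described above.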

We use a parameterization of k = 50 for all screens. We increase the leniency of our

filters for sample size reliability compared to the method described in (19), permitting vectors

that have at least 1 cell in each bin rather than requiring at least 5 as originally described; this

permits the retention of more data, for more complete aggregated vectors, at the expense

of some reliability in our feature measurements. To compensate for this, we use the more

robust truncated mean profiling method described in the previous study. We apply the image

segmentation and feature extraction method to each image screen in our dataset, and then the

change detection method between each image screen paired against the WT2 screen, which

served as our common reference wild-type screen for all perturbations and all other wild-

types.

To evaluate sensitivity, we randomly drew 20 images from the collection of proteins

that had strong protein change profiles in the RAP2 screen relative to the WT2 screen. We chose

the RAP2 screen because the rapamycin treatment seemed to induce the most changes other than the

rpd3Δ deletion (which was already used to evaluate the method in a previous study (19)). We

blinded the protein identities and assigned 3 observers to independently annotate both sets of

images. Proteins were considered positive for a visually apparent localization change if at

least 2 out of 3 observers agreed that there was a localization change; proteins were

considered ambiguous for a localization change if at least 2 out of 3 observers considered the

image ambiguous for a localization change, or if 1 observer considered it positive and at least

1 other observer considered it ambiguous. As we found in 2.3.3, some localization changes

found by our algorithm are subtle, and may be challenging to discern without context, so this

approach only provides an estimate of sensitivity.
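
The consensus rule for the three observers can be written out explicitly. This is a minimal sketch; the label strings and the function name are hypothetical, and treating every remaining combination as "negative" is an assumption, since the text defines only the positive and ambiguous cases.

```python
def consensus_call(annotations):
    """Combine three observer labels into a single call.

    Each label is "positive", "ambiguous" or "negative".
    """
    n_pos = annotations.count("positive")
    n_amb = annotations.count("ambiguous")
    if n_pos >= 2:
        return "positive"
    if n_amb >= 2 or (n_pos == 1 and n_amb >= 1):
        return "ambiguous"
    return "negative"  # assumption: everything else counts as no change


calls = [
    consensus_call(["positive", "positive", "negative"]),
    consensus_call(["positive", "ambiguous", "negative"]),
    consensus_call(["positive", "negative", "negative"]),
]
```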

4.3 Clustering and Visualization of Aggregated Vectors

As discussed in section 2.2, the protein change profile for individual perturbations is

assembled into an aggregated profile. We cluster these profiles using the open-source Cluster

3.0 package (52), with hierarchical agglomerative clustering using uncentered correlation

distance and average linkage. While this operation can be performed on the entire set of

profiles, for the heat maps we present in Figure 1 and Figure 6, we use some filters prior to

clustering to reduce the number of profiles visualized to just the strongest signals. We require

that the aggregated vector for the non-kinase deletion perturbations have at least 3 z-scores

with an absolute value over 5.0 and greater than 80% of data present, resulting in 1159 proteins

displayed. As there are fewer perturbations in the aggregated vector for the kinase deletion


perturbations, we use a more lenient threshold of at least 2 z-scores with an absolute value over

5.0 and greater than 80% of data present, for a total of 865 proteins displayed.
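As an illustration of the filter and the distance described in this section, the sketch below applies the two criteria to a single aggregated vector and computes an uncentered correlation distance between two vectors. The function names and the use of `None` for missing data are hypothetical. Restricting the distance to positions present in both vectors is an assumption of this sketch rather than a statement about how Cluster 3.0 treats missing values, and the hierarchical clustering step itself is not reproduced here.

```python
import math


def passes_filter(profile, min_strong, z_cutoff=5.0, min_present=0.8):
    """Return True if an aggregated vector is strong enough to display.

    profile: list of z-scores, with None marking missing data.
    min_strong: required number of z-scores with |z| > z_cutoff
                (3 for the non-kinase set, 2 for the kinase deletion set).
    """
    present = [z for z in profile if z is not None]
    # Require strictly more than 80% of the entries to be present.
    if len(present) <= min_present * len(profile):
        return False
    strong = sum(1 for z in present if abs(z) > z_cutoff)
    return strong >= min_strong


def uncentered_correlation_distance(a, b):
    """1 minus the uncentered correlation, over positions present in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an undefined correlation counts as uncorrelated
    return 1.0 - dot / (norm_a * norm_b)
```

Because the uncentered correlation does not subtract the mean of each vector, two profiles whose z-scores differ only by a positive scale factor have a distance of zero, while profiles with opposite signs are maximally distant.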

To visualize these resultant clustered matrices as heat maps, we use Java Treeview

(53). Following microarray convention, red values indicate positive z-scores and green values

indicate negative z-scores, with the intensity of the color indicating the magnitude. Grey

values indicate missing data.

4.4 Comparison with Other High-Throughput Data Sources

Enrichment analyses of the clusters in Table 2 were conducted using the 1159

proteins visualized in Figure 1 as the background population. We look for enriched Gene Ontology

terms (54) in biological process, cellular component, and molecular function, with

Benjamini-Hochberg correction (55) at a 0.05 FDR, using the YeastMine tool (56) on the

Saccharomyces Genome Database. We report a sample of the significant terms that we found

for clusters in Table 2; full lists of enrichment for each cluster can be found in Supplementary

Data 3.
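To make the enrichment procedure concrete, the sketch below shows a one-sided enrichment p-value followed by the Benjamini-Hochberg step-up procedure. It assumes a hypergeometric test, which the text does not specify; the function names are hypothetical, and this is not the YeastMine implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n proteins from a background of N,
    of which K carry the term (e.g. N = 1159 for the Figure 1 background)."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return one boolean per p-value: True if significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank whose p-value is at or below (rank / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            max_rank = rank
    # Every p-value at or below that rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant
```

In this sketch, each term tested for a cluster would contribute one p-value from `hypergeom_sf`, and the resulting list would be passed to `benjamini_hochberg` with `fdr=0.05`.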

To compare our protein change profile clusters with transcriptional changes for these

perturbations, we used data from four separate genomic expression microarray experiments

for the rpd3Δ (57), hydroxyurea (58), rapamycin (59), and α-factor (60) perturbations in

yeast. We only include time points from these microarray experiments that are earlier than or equivalent to

the corresponding time points in our image screen data. For the proteins

presented in Figure 1, we append this transcriptional data to the aggregated protein change

profiles for the rpd3Δ, hydroxyurea, rapamycin, and α-factor perturbations (in addition to the

non-reference wild-types), and repeat the clustering operation to further resolve any sub-

clusters that may emerge with the inclusion of the transcription data. Data for the full heat

map is available in Supplementary Data 5, while relevant sections are displayed in Figure 6A.

The transcript fold-regulation data is displayed on a different intensity scale than the protein


change profile z-scores to account for the differences in significance between the units, and

with four pixels per column of transcript data as opposed to one pixel for each localization

change feature, to better display the smaller number of columns in the transcript data.

Protein-protein interactions for the Saccharomyces cerevisiae proteome were taken

from the BioGRID database (40). We compared our data with physical interactions from

low-throughput experimental sources only, to improve the confidence of our set of interactions. We

defined 15 clusters of proteins with strong protein change profiles, involving a total of 236

proteins between all clusters, using the dendrogram for the heat map presented in Figure 1;

the specific proteins involved in each cluster can be found in Supplementary Data 6. We

counted the number of physical and low-throughput interactions, respectively, between the

proteins of each cluster individually. To generate null expectations, we randomly shuffled the

236 proteins within our defined clusters while retaining the sizes of each cluster, and counted

the resulting number of interactions within each cluster. We repeated this simulation 10,000

times to produce a mean and standard deviation for number of interactions for randomized

clusters, and report the true number of interactions for each cluster as a z-score against these

expectations.
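The randomization described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the code used for the analysis: the function names are hypothetical, interactions are assumed to be given as deduplicated pairs of protein names, and the population standard deviation is used for the null distribution.

```python
import random
import statistics


def count_within_cluster(clusters, interactions):
    """For each cluster, count interacting pairs with both members in it."""
    counts = []
    for members in clusters:
        member_set = set(members)
        counts.append(sum(1 for a, b in interactions
                          if a in member_set and b in member_set))
    return counts


def shuffled_clusters(clusters, rng):
    """Shuffle proteins across clusters while keeping each cluster's size."""
    pool = [p for members in clusters for p in members]
    rng.shuffle(pool)
    out, start = [], 0
    for members in clusters:
        out.append(pool[start:start + len(members)])
        start += len(members)
    return out


def interaction_z_scores(clusters, interactions, n_iter=10000, seed=0):
    """Z-score of the observed within-cluster interaction count per cluster."""
    rng = random.Random(seed)
    observed = count_within_cluster(clusters, interactions)
    null = [count_within_cluster(shuffled_clusters(clusters, rng), interactions)
            for _ in range(n_iter)]
    z = []
    for i, obs in enumerate(observed):
        draws = [row[i] for row in null]
        mean = statistics.mean(draws)
        sd = statistics.pstdev(draws)
        # A cluster whose null count never varies has no defined z-score.
        z.append((obs - mean) / sd if sd > 0 else float("nan"))
    return z
```

Shuffling only the proteins already assigned to clusters, rather than drawing from the whole proteome, keeps the null expectation conditioned on the same set of strongly changing proteins.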

We use the manual annotations for the wild-type yeast-GFP collection by Huh et al.

(20) to assess wild-type protein localization within our clusters. To reduce the number of

annotations to a reasonable number of classes to color-code within our scatterplot, we

condensed some annotations into a single class. We condense Golgi, punctate composite,

lipid particle, endosome and peroxisome annotations into “punctate organelles”, and spindle

pole, microtubule, and actin into “structural components”. In addition, we simplify vacuole and

vacuole membrane-localized proteins to “vacuole” and nucleus and nucleolus to “nucleus”.

For multiply localized proteins, we include a “cytoplasm and nucleus” and a “cytoplasm

and other” class as the two most common types of multi-localized proteins; proteins that

did not fall into either category were assigned to the more dominant class in the

annotation. For the scatterplot, we only display the 608 strongest localization change signals


for the perturbations presented in Figure 1, found using a filter of at least 2 z-scores with an

absolute value over 6. We use the t-SNE dimensionality reduction method (61) to reduce our

aggregated vectors to two dimensions appropriate for a scatterplot; as pre-processing steps,

we impute missing data using values randomly sampled from a Gaussian constructed for each

column in our data, and apply PCA to reduce the dimensionality to 50 before applying t-SNE.
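The imputation step can be illustrated with a short sketch. It assumes that the Gaussian for each column is parameterized by the mean and standard deviation of that column's observed values, which the text does not state explicitly; the function name and the use of `None` for missing entries are hypothetical.

```python
import random
import statistics


def impute_gaussian(matrix, seed=0):
    """Fill missing entries (None) column by column.

    Each missing value is drawn from a Gaussian whose mean and standard
    deviation are estimated from the observed values in the same column.
    Assumes every column has at least one observed value.
    """
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    filled = [list(row) for row in matrix]  # copy, so the input is untouched
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        mu = statistics.mean(observed)
        sigma = statistics.pstdev(observed)
        for row in filled:
            if row[j] is None:
                row[j] = rng.gauss(mu, sigma)
    return filled
```

The completed matrix would then be passed to PCA and t-SNE, for example through a library such as scikit-learn; that step is omitted here to keep the sketch free of external dependencies.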

Supplementary Figure 1. A visual explanation of the features comprising our

aggregated protein change profiles. For each perturbation, a z-score representing the relative

change between the perturbation and the reference wild-type is generated for 50 features for

each protein. Each feature measures an aspect of GFP distribution, averaged over 1 of 10 bins

of cells designed to track cell cycle stage and differentiate mother and bud cell types. These

z-scores comprise a protein change profile, and the profiles are appended to each other across multiple

perturbations.

References

1. Cyert MS. Regulation of nuclear localization during signaling. J Biol Chem. American Society for Biochemistry and Molecular Biology; 2001 Jun 15;276(24):20805–8.

2. Bauer NC, Doetsch PW, Corbett AH. Mechanisms Regulating Protein Localization. Traffic. 2015 Oct;16(10):1039–61.


3. Protter DSW, Parker R. Principles and Properties of Stress Granules. Trends Cell Biol. 2016;26(9):668–79.

4. Yuet KP, Tirrell DA. Chemical tools for temporally and spatially resolved mass spectrometry-based proteomics. Ann Biomed Eng. NIH Public Access; 2014 Feb;42(2):299–311.

5. Dephoure N, Gygi SP. Hyperplexing: A Method for Higher-Order Multiplexed Quantitative Proteomics Provides a Map of the Dynamic Response to Rapamycin in Yeast. Sci Signal. 2012 Mar 27;5(217):rs2–rs2.

6. Nagaraj N, Alexander Kulak N, Cox J, Neuhauser N, Mayr K, Hoerning O, et al. System-wide Perturbation Analysis with Nearly Complete Coverage of the Yeast Proteome by Single-shot Ultra HPLC Runs on a Bench Top Orbitrap. Mol Cell Proteomics. 2012 Mar 1;11(3):M111.013722–M111.013722.

7. Mattiazzi Usaj M, Styles EB, Verster AJ, Friesen H, Boone C, Andrews BJ. High-Content Screening for Quantitative Cell Biology. Trends Cell Biol. 2016;26(8):598–611.

8. Caicedo JC, Singh S, Carpenter AE. Applications in image-based profiling of perturbations. Curr Opin Biotechnol. 2016;39:134–42.

9. Koh JLY, Chong YT, Friesen H, Moses A, Boone C, Andrews BJ, et al. CYCLoPs: A Comprehensive Database Constructed from Automated Analysis of Protein Abundance and Subcellular Localization Patterns in Saccharomyces cerevisiae. G3 (Bethesda). 2015 Jun;5(6):1223–32.

10. Riffle M, Davis TN. The Yeast Resource Center Public Image Repository: A large database of fluorescence microscopy images. BMC Bioinformatics. 2010 Jan;11:263.

11. Breker M, Gymrek M, Moldavski O, Schuldiner M. LoQAtE—Localization and Quantitation ATlas of the yeast proteomE. A new tool for multiparametric dissection of single-protein behavior in response to biological perturbations in yeast. Nucleic Acids Res. 2014 Jan;42(D1):D726–30.

12. Chong YT, Koh JLY, Friesen H, Kaluarachchi Duffy S, Cox MJ, Moses A, et al. Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis. Cell. 2015 Jun;161(6):1413–24.

13. Tkach JM, Yimit A, Lee AY, Riffle M, Costanzo M, Jaschob D, et al. Dissecting DNA damage response pathways by analysing protein localization and abundance changes during DNA replication stress. Nat Cell Biol. Nature Publishing Group; 2012 Sep 29;14(9):966–76.

14. Kraus OZ, Grys BT, Ba J, Chong Y, Frey BJ, Boone C, et al. Automated analysis of high‐content microscopy data with deep learning. Mol Syst Biol. 2017 Apr 18;13(4).

15. Breker M, Gymrek M, Schuldiner M. A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J Cell Biol. 2013 Mar 18;200(6):839–50.

16. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. National Academy of Sciences; 1998 Dec 8;95(25):14863–8.

17. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. American Society for Cell Biology; 2000 Dec;11(12):4241–57.


18. Boisvert F-M, Lam YW, Lamont D, Lamond AI. A quantitative proteomics analysis of subcellular proteome localization and changes induced by DNA damage. Mol Cell Proteomics. American Society for Biochemistry and Molecular Biology; 2010 Mar;9(3):457–70.

19. Lu AX, Moses AM. An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images. PLoS One. 2016;11(7):e0158712.

20. Huh W-K, Falvo J V, Gerke LC, Carroll AS, Howson RW, Weissman JS, et al. Global analysis of protein localization in budding yeast. Nature. 2003 Oct 16;425(6959):686–91.

21. Handfield L-F, Chong YT, Simmons J, Andrews BJ, Moses AM. Unsupervised clustering of subcellular protein expression patterns in high-throughput microscopy images reveals protein complexes and functional relationships between proteins. PLoS Comput Biol. 2013 Jan;9(6):e1003085.

22. Loewith R, Hall MN. Target of rapamycin (TOR) in nutrient signaling and growth control. Genetics. Genetics Society of America; 2011 Dec;189(4):1177–201.

23. Koç A, Wheeler LJ, Mathews CK, Merrill GF. Hydroxyurea arrests DNA replication by a mechanism that preserves basal dNTP pools. J Biol Chem. American Society for Biochemistry and Molecular Biology; 2004 Jan 2;279(1):223–30.

24. Lee K, Sung M-K, Kim J, Kim K, Byun J, Paik H, et al. Proteome-wide remodeling of protein location and function by stress. Proc Natl Acad Sci U S A. National Academy of Sciences; 2014 Jul 29;111(30):E3157–66.

25. Liko D, Conway MK, Grunwald DS, Heideman W. Stb3 Plays a Role in the Glucose-Induced Transition from Quiescence to Growth in Saccharomyces cerevisiae. Genetics. 2010 Jul 1;185(3):797–810.

26. Suzuki K, Kamada Y, Ohsumi Y. Studies of cargo delivery to the vacuole mediated by autophagosomes in Saccharomyces cerevisiae. Dev Cell. 2002 Dec;3(6):815–24.

27. Day A, Schneider C, Schneider BL. Yeast cell synchronization. Methods Mol Biol. 2004;241:55–76.

28. Udden MM, Finkelstein DB. Reaction order of Saccharomyces cerevisiae alpha-factor-mediated cell cycle arrest and mating inhibition. J Bacteriol. American Society for Microbiology (ASM); 1978 Mar;133(3):1501–7.

29. Nguyen VQ, Co C, Irie K, Li JJ. Clb/Cdc28 kinases promote nuclear export of the replication initiator proteins Mcm2-7. Curr Biol. 2000 Feb 24;10(4):195–205.

30. Butler AR, White JH, Folawiyo Y, Edlin A, Gardiner D, Stark MJ. Two Saccharomyces cerevisiae genes which control sensitivity to G1 arrest induced by Kluyveromyces lactis toxin. Mol Cell Biol. 1994 Sep;14(9):6306–16.

31. Lahav R, Gammie A, Tavazoie S, Rose MD. Role of Transcription Factor Kar4 in Regulating Downstream Events in the Saccharomyces cerevisiae Pheromone Response Pathway. Mol Cell Biol. 2007 Feb 1;27(3):818–29.

32. Carbó N, Tarkowski N, Ipiña EP, Dawson SP, Aguilar PS. Sexual pheromone modulates the frequency of cytosolic Ca(2+) bursts in Saccharomyces cerevisiae. Mol Biol Cell. American Society for Cell Biology; 2017 Feb 15;28(4):501–10.

33. Alvers AL, Wood MS, Hu D, Kaywell AC, Dunn WA, Aris JP, et al. Autophagy is required for extension of yeast chronological life span by rapamycin. Autophagy. NIH Public Access; 2009 Aug;5(6):847–9.

34. Jacquet M, Renault G, Lallet S, De Mey J, Goldbeter A. Oscillatory nucleocytoplasmic shuttling of the general stress response transcriptional activators Msn2 and Msn4 in Saccharomyces cerevisiae. J Cell Biol. 2003 May 12;161(3):497–505.

35. Hall MN, Beck T. The TOR signalling pathway controls nuclear localization of nutrient-regulated transcription factors. Nature. Nature Publishing Group; 1999 Dec 9;402(6762):689–92.

36. Dalal CK, Cai L, Lin Y, Rahbar K, Elowitz MB. Pulsatile dynamics in the yeast proteome. Curr Biol. NIH Public Access; 2014 Sep 22;24(18):2189–94.

37. Lu H, Zhu Y, Xiong J, Wang R, Jia Z. Potential extra-ribosomal functions of ribosomal proteins in Saccharomyces cerevisiae. Microbiol Res. 2015;177:28–33.

38. Fernandez-Pevida A, Rodriguez-Galan O, Diaz-Quintana A, Kressler D, de la Cruz J. Yeast Ribosomal Protein L40 Assembles Late into Precursor 60 S Ribosomes and Is Required for Their Cytoplasmic Maturation. J Biol Chem. 2012 Nov 2;287(45):38390–407.

39. Stauffer B, Powers T. Target of rapamycin signaling mediates vacuolar fission caused by endoplasmic reticulum stress in Saccharomyces cerevisiae. Mol Biol Cell. American Society for Cell Biology; 2015 Dec 15;26(25):4618–30.

40. Oughtred R, Chatr-aryamontri A, Breitkreutz B-J, Chang CS, Rust JM, Theesfeld CL, et al. BioGRID: A Resource for Studying Biological Interactions in Yeast. Cold Spring Harb Protoc. 2016 Jan 4;2016(1):pdb.top080754.

41. Nolan S, Cowan AE, Koppel DE, Jin H, Grote E. FUS1 regulates the opening and expansion of fusion pores between mating yeast. Mol Biol Cell. American Society for Cell Biology; 2006 May;17(5):2439–50.

42. Seymour IJ, Piper PW. Stress induction of HSP30, the plasma membrane heat shock protein gene of Saccharomyces cerevisiae, appears not to use known stress-regulated transcription factors. Microbiology. 1999 Jan 1;145(1):231–9.

43. Pahlman A-K, Granath K, Ansell R, Hohmann S, Adler L. The Yeast Glycerol 3-Phosphatases Gpp1p and Gpp2p Are Required for Glycerol Biosynthesis and Differentially Involved in the Cellular Responses to Osmotic, Anaerobic, and Oxidative Stress. J Biol Chem. 2001 Feb 2;276(5):3555–63.

44. Heier C, Taschler U, Rengachari S, Oberer M, Wolinski H, Natter K, et al. Identification of Yju3p as functional orthologue of mammalian monoglyceride lipase in the yeast Saccharomyces cerevisiae. Biochim Biophys Acta. Elsevier; 2010 Sep;1801(9):1063–71.

45. McMillan JN, Longtine MS, Sia RA, Theesfeld CL, Bardes ES, Pringle JR, et al. The morphogenesis checkpoint in Saccharomyces cerevisiae: cell cycle control of Swe1p degradation by Hsl1p and Hsl7p. Mol Cell Biol. 1999 Oct;19(10):6929–39.

46. Gardy JL, Spencer C, Wang K, Ester M, Tusnády GE, Simon I, et al. PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res. 2003 Jul 1;31(13):3613–7.

47. Horton P, Park K-J, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007 May 8;35(Web Server):W585–7.


48. Lee K, Kim D-W, Na D, Lee KH, Lee D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res. 2006 Oct;34(17):4655–66.

49. Leonetti MD, Sekine S, Kamiyama D, Weissman JS, Huang B. A scalable strategy for high-throughput GFP tagging of endogenous human proteins. Proc Natl Acad Sci. 2016 Jun 21;113(25):E3501–8.

50. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Tissue-based map of the human proteome. Science. 2015 Jan 23;347(6220):1260419–1260419.

51. Kustatscher G, Rappsilber J. Compositional Dynamics: Defining the Fuzzy Cell. Trends Cell Biol. Elsevier; 2016 Nov;26(11):800–3.

52. de Hoon MJL, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004 Jun 12;20(9):1453–4.

53. Saldanha AJ. Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004 Nov 22;20(17):3246–8.

54. Consortium TGO. Gene Ontology Consortium: going forward. Nucleic Acids Res. Oxford University Press; 2015 Jan 28;43(D1):D1049–56.

55. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. JSTOR; 1995;289–300.

56. Balakrishnan R, Park J, Karra K, Hitz BC, Binkley G, Hong EL, et al. YeastMine--an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database (Oxford). Oxford University Press; 2012;2012:bar062.

57. Bernstein BE, Tong JK, Schreiber SL. Genomewide studies of histone deacetylase function in yeast. Proc Natl Acad Sci. 2000 Dec 5;97(25):13708–13.

58. Dubacq C, Chevalier A, Courbeyrette R, Petat C, Gidrol X, Mann C. Role of the iron mobilization and oxidative stress regulons in the genomic response of yeast to hydroxyurea. Mol Genet Genomics. 2006 Feb;275(2):114–24.

59. Hardwick JS, Kuruvilla FG, Tong JK, Shamji AF, Schreiber SL. Rapamycin-modulated transcription defines the subset of nutrient-sensitive signaling pathways directly controlled by the Tor proteins. Proc Natl Acad Sci U S A. National Academy of Sciences; 1999 Dec 21;96(26):14866–70.

60. Roberts CJ, Nelson B, Marton MJ, Stoughton R, Meyer MR, Bennett HA, et al. Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science. 2000 Feb 4;287(5454):873–80.

61. Maaten L van der, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res. 2008;9(Nov):2579–605.
