Original Manuscript
Diverse patterns of protein subcellular localization change inferred from 100,000s of
microscopy images
Alex X Lu1, Yolanda T Chong2, Bob Strome3, Ian Hsu3, Louis-François Handfield1, Oren
Kraus4, Brenda J Andrews2, Alan M Moses1,3,5
1 Department of Computer Science, University of Toronto
2 Terrence Donnelly Centre for Cellular & Biomolecular Research, University of
Toronto
3 Department of Cell & Systems Biology, University of Toronto
4 Department of Electrical Engineering, University of Toronto
5 Center for Analysis of Genome Evolution and Function, University of Toronto
Abstract
Evaluating protein localization changes on a systematic level is a powerful tool for
understanding how cells respond to environmental, chemical, or genetic perturbations. To
date, work in understanding these proteomic responses through high-throughput imaging has
catalogued localization changes independently for each perturbation. To distinguish changes
that are targeted responses to a specific perturbation from more generalized programs, we
developed a scalable approach to visualize the localization behavior of proteins across
multiple experiments as a quantitative pattern. By applying this approach to 24 experimental
screens consisting of nearly 400,000 images, we differentiate specific responses from more
generalized ones, discover nuance in the localization behavior of stress-responsive proteins,
and form hypotheses by clustering proteins with similar patterns. In one case, we confirmed
using time-lapse fluorescence imaging that, as predicted by our cluster analysis, Rtg3 shows
stress-modulated nuclear pulses. In another case, we confirmed that Yju3 shows an
unexpected relocalization to the nucleus in an Hsl1 deletion background. Our work
demonstrates how the growing scale of imaging data can be exploited to yield unexpected
new cell biology that would not have been evident when analysing each dataset
independently.
1. Introduction
Controlling the subcellular localization of proteins has long been understood as an
important component of a cell’s regulatory toolkit in response to perturbations (1–3), such as
drug treatments, genetic mutations, or environmental stressors. Towards the goal of
systematically characterizing these proteome dynamics, high-throughput technologies have
been employed to gather data about protein localization in cells (4–6). Here,
we focus on GFP-tagged cell libraries imaged using automated high-throughput microscopy (7,8),
which yield terabyte-scale image datasets (9–11) that show the subcellular localization of
each individual protein in a cell under varied perturbations. Using these so-called image
screens, protein localization changes have been systematically identified under a handful of
genetic and environmental perturbations (12–15).
Excitingly, the growing quantity and diversity of perturbations imaged opens the
possibility of analysing these screens in concert, rather than individually for each
perturbation. Multi-perturbation strategies have already been demonstrated through widely-
adopted cluster analysis methods (16) on microarray and RNA-seq data, providing a template
for how to discover complex proteome localization dynamics. One well-known example is the
identification of the environmental stress response in yeast; by appending the results of 142
different stressful microarray experiments into a single vector for each gene, then clustering
the resultant matrix to find shared patterns in the transcriptional regulation of genes across
experiments, Gasch et al. identified a set of ~900 genes whose transcript level changed under
most environmental stress perturbations, and differentiated this shared response from genomic
responses to specific stressors (17). Inspired by these approaches in genomic expression data,
we developed a similar strategy for protein localization changes. We reasoned that doing so
would provide the context to determine if a localization change is specific to a perturbation, or
a more general response found across multiple perturbations, potentially leading into a richer
understanding of protein function and of how cells permutate, adjust, and repurpose these
functions to adapt to a wide range of perturbations.
To analyze the localization patterns from multiple screens, scalable data acquisition
and analysis methods that produce quantitative, comparable summaries of protein localization
changes are required. Using automated fluorescence microscopy, images can be acquired with
relatively consistent lighting and growth conditions (12–14). For these data, analysis methods
have focused upon finding differences in the annotations and localization classes of cells (12–
14), either through visual inspection or supervised machine learning, which are performed one
screen at a time. Some previous clustering of protein localization changes has been
demonstrated on spatial mass spectrometry data, but only with a single drug treatment and
only capturing changes in relative distribution between the cytoplasm, nucleus, and nucleolus
(18). Recently, we developed an unsupervised localization change detection algorithm (19)
that requires no training of parameters on each screen and scales easily to large image
collections. Because it describes relative patterns of change quantitatively, rather than
relying on predefined classes, it is applicable to all subcellular localization patterns. Here,
we exploit the increasing scale of image data to place these protein-level responses in mutual
context. By using “big data” in cell biology (20), we can provide this context and generate
hypotheses in tandem, yielding insights that would not have been evident from analysing each
dataset independently.
Using data from automated microscopy and our new analysis method, we present, to
our knowledge, the first quantitative, multi-perturbation cluster analysis of protein
localization changes. We show that our data acquisition and analysis methods can scale to
analyses of 100,000s of images. We find that protein profiles for localization changes can be
separated into clusters ranging from being specific to one perturbation, to being shared
between various permutations of perturbations, to exhibiting different patterns depending
upon the perturbation. Moreover, we find functionally specific themes for many of these
clusters consistent with the known biological effects of the perturbations. We show that
although some protein localization changes are accompanied by transcriptional changes, in
general subcellular localization is an independent layer of regulation and that shared patterns
of localization changes cannot be simply explained by physical interactions or subcellular
compartments. We perform small-scale experiments to confirm several novel observations,
illustrating how these large-scale exploratory analyses yield specific experimental hypotheses.
2. Results
2.1 An unsupervised analysis of protein localization changes in over 280,000 images
In this work, we sought to understand protein localization changes across a wide
range of perturbations. To achieve this, we first gathered a dataset comprising 17 image
screens of the yeast GFP-fusion collection (20) under various drug treatments and genetic
mutations. Fifteen screens were previously published in the CYCLoPs database (9), and consist
of 3 wild-type replicates, 3 replicates of rpd3Δ deletion, and 3 timepoints each of rapamycin
(RAP), hydroxyurea (HU), and α-factor (AF) treatment. We further generated another 2
screens, consisting of 2 replicates of iki3Δ. Altogether, this dataset encompasses 4143
individual yeast genes for a total of 281,724 images and an estimated 20.1 million single cells.
Next, to facilitate comparison between experiments, we converted these images into
quantitative measures of localization change, or protein change profiles. We use an
unsupervised localization change detection method (19) performed on a set of interpretable
single cell measurements for yeast cells (21). Briefly, we measure features that track the
distribution of GFP-tagged protein relative to certain cell landmarks, such as the average
distance to the cell center or cell edge. We combine these single cell measurements for each
protein to produce a representation of the “average” cell for each protein in each image
screen. Then, the change detection method calculates a vector of z-scores that report the
relative change for each protein between two image screens, in a way that corrects for
differences stemming from systematic biases between image screens or morphological
variability between compartments (see Methods section 4.2 for more information, including
important technical considerations in our method). We applied these methods to produce a
protein change profile that represents the localization change for each protein in each image
screen, each relative to a standard reference wild-type. An example showing more detailed
information about our features can be found in Supplementary Figure 1.
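To make the idea of a z-score change profile concrete, the following Python sketch shows one simplified way such a profile could be computed. It is an illustration only and is not the algorithm of (19): we assume here that "similar proteins" are the k nearest proteins in the feature space of the reference screen, and the function name `change_profile`, the parameter `k`, the protein names, and all feature values are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def change_profile(ref, pert, protein, k=3):
    """Simplified z-scores of feature change for one protein.

    ref, pert: dicts mapping protein name -> list of "average cell" features
    in the reference and perturbation screens (hypothetical layout).
    ASSUMPTION: "similar proteins" = k nearest neighbours in the reference.
    """
    others = [p for p in ref if p != protein and p in pert]
    neighbours = sorted(others, key=lambda p: dist(ref[p], ref[protein]))[:k]
    profile = []
    for j in range(len(ref[protein])):
        own_change = pert[protein][j] - ref[protein][j]
        neighbour_changes = [pert[p][j] - ref[p][j] for p in neighbours]
        # Guard against a zero spread among the neighbours
        spread = pstdev(neighbour_changes) or 1e-9
        profile.append((own_change - mean(neighbour_changes)) / spread)
    return profile


# Toy features: [distance to cell center, distance to cell edge] (made up)
reference = {
    "protA": [0.50, 0.50],
    "protB": [0.52, 0.48],
    "protC": [0.40, 0.60],
    "protD": [0.51, 0.49],
    "protX": [0.50, 0.50],
}
perturbed = {
    "protA": [0.52, 0.49],
    "protB": [0.53, 0.47],
    "protC": [0.42, 0.59],
    "protD": [0.54, 0.47],
    "protX": [0.22, 0.80],  # moves strongly towards the cell center
}

z_x = change_profile(reference, perturbed, "protX")
z_a = change_profile(reference, perturbed, "protA")
print("protX:", [round(z, 1) for z in z_x])
print("protA:", [round(z, 1) for z in z_a])
```

In this sketch, subtracting the average change of the neighbours removes shifts shared by similar proteins (for example, a systematic bias between screens), and dividing by their spread expresses the change relative to typical fluctuations.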
We sought to aggregate the protein change profiles for individual perturbations into a profile
that would represent multiple perturbations. We did so by appending individual protein
change profiles for the same protein. The heat map in Figure 1 shows these aggregated
vectors. Because localization changes are reported as a relative measure between the wild-
type and perturbation, the z-scores constituting the profiles are larger in magnitude where
localization changes are stronger relative to fluctuations observed in similar proteins. We
found that we could discern in which perturbations a protein changed localization (or not)
by assessing intense (or dim) patterns in the profiles. As an estimate of positive
predictive value, we randomly drew 20 images from the set of 85 proteins that had bright
patterns (and thus strong evidence for localization changes) in the RAP2 screen, curated by
qualitatively selecting clusters of bright patterns of protein change profiles from our heat map.
We found that 10 of 20 had visually obvious localization changes, while another 3 were
considered ambiguous (suggesting a positive predictive value of ~50%).
Moreover, we found that we could use comparisons of two wild-types relative to the reference
wild-type as controls: where there are similar protein change profiles for our other wild-type
screens, localization changes likely arise from technical issues, experimental variability, or
strain-specific effects, rather than being specifically induced by perturbations.
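The positive predictive value estimate in this paragraph is simple arithmetic, but with only 20 sampled images it carries considerable sampling uncertainty. The sketch below reproduces the estimate from the counts in the text and, as an addition for illustration only, attaches a Wilson score interval. The interval was not part of the analysis described here, and the function and variable names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


n_images = 20     # images drawn at random (from the text)
n_obvious = 10    # visually obvious localization changes (from the text)
n_ambiguous = 3   # ambiguous cases (from the text)

ppv_strict = n_obvious / n_images                    # ambiguous counted as negatives
ppv_lenient = (n_obvious + n_ambiguous) / n_images   # ambiguous counted as positives
low, high = wilson_interval(n_obvious, n_images)

print(f"PPV (strict)  = {ppv_strict:.2f}")
print(f"PPV (lenient) = {ppv_lenient:.2f}")
print(f"Wilson interval for the strict estimate: {low:.2f} to {high:.2f}")
```

Counting the ambiguous images as negatives gives the conservative estimate quoted in the text; counting them as positives gives an upper alternative.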
Figure 1. A clustered heat map of the aggregated protein change profiles. This heat
map shows a subset of the proteome filtered for strong protein change profiles by requiring
that the aggregated protein change profile have at least 3 z-scores above an absolute value of
5.0. The dendrogram represents the result of applying average linkage hierarchical
agglomerative clustering on the aggregated protein change profiles. Examples of clusters
have been highlighted by yellow boxes on the heat map; some clusters for which we have
annotations are labelled and presented in Table 2. The three panels of microscopy images
correspond to three examples of protein localization changes found in our heat map clusters.
See text for more details.
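The aggregation step described in the text (appending per-screen profiles for the same protein) and the filter stated in this caption (at least 3 z-scores above an absolute value of 5.0) can be sketched as follows. This is a minimal illustration, not the code used in this work: the function names, the screen labels, the protein names, and the z-scores are hypothetical, and keeping only proteins present in every screen is an assumption.

```python
def aggregate_profiles(profiles_by_screen, screen_order):
    """Append each protein's per-screen z-score vectors in a fixed screen order.

    ASSUMPTION: only proteins measured in every screen are kept.
    """
    shared = set.intersection(*(set(profiles_by_screen[s]) for s in screen_order))
    return {
        protein: [z for s in screen_order for z in profiles_by_screen[s][protein]]
        for protein in sorted(shared)
    }


def has_strong_profile(profile, min_count=3, threshold=5.0):
    """True if at least `min_count` z-scores exceed `threshold` in absolute value."""
    return sum(abs(z) > threshold for z in profile) >= min_count


# Toy per-screen profiles (made-up values, two features per screen)
profiles_by_screen = {
    "screen_1": {"protA": [6.2, -5.5], "protB": [0.3, -0.8]},
    "screen_2": {"protA": [7.1, 0.4], "protB": [5.6, 1.0]},
}

aggregated = aggregate_profiles(profiles_by_screen, ["screen_1", "screen_2"])
strong = {p: v for p, v in aggregated.items() if has_strong_profile(v)}
print(strong)
```

The same filter with `min_count=2` and `threshold=6.0` would correspond to the stricter selection used for the scatterplot in Figure 7C.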
Cluster | Perturbation(s) | Annotation | P-Value | Examples
A | α-factor, iki3Δ | Cytoskeleton-Dependent Cytokinesis [GO:0061640] | 3.77E-07 | Bud3, Bud4, Cdc10, Cdc11, Hof1, Myo1
  |  | Cell Cycle [GO:0007049] | 9.21E-05 |
B | Hydroxyurea | Cell Cycle | - | Cdc28, Cdc7, Clb2
C | Rapamycin, Hydroxyurea | RNA Metabolic Process [GO:0016070] | 6.70E-04 | Msn2, Dot6, Rtg1, Rtg3, Enp1, Tsr1, Stb3, Utp20
  |  | Stress-Responsive Transcription Factors | - | Msn2, Dot6, Rtg1, Rtg3
D | Hydroxyurea, α-factor | Extrinsic Component of Vacuolar Membrane [GO:0000306] | 0.028823 | Iml1, Vac14, Tco89
  |  | DNA Replication | - | Pol12, Mcm7, Ctf18
E | rpd3Δ, Hydroxyurea | Triglyceride Lipase Activity [GO:0004806] | 0.029239 | Tgl4, Tgl5
  |  | Metabolic Activity | - | Tgl4, Tgl5, Rnr3, Gdb1
F | Rapamycin | Negative Regulator of Hydrolase Activity [GO:0051346] | 2.70E-04 | Stf1, Inh1, Yhr138c, Pbi2
  |  | Enzyme Inhibitor Activity [GO:0004857] | 0.001441 |
G | rpd3Δ | Annotated by Chong et al. (2015) to localize away from cytoplasm in rpd3Δ | - | Acs1, Rad7, Mni1, Yjr008w, Ycr061w, Pab1, Pus4, Yor292c
H | iki3Δ | Ribosomal Subunits | - | Rps9a, Rpl13b, Rps0b
I | Rapamycin | Cytosolic Ribosome [GO:0022626] | 0.010879 | Rpl23a, Rpl40a, Rpl40b, Rps10a
  |  | Ribosomal Subunits | - |
J | Rapamycin | Diacylglycerol cholinephosphotransferase activity [GO:0004142] | 0.043859 | Ept1, Cpt1
  |  | Phospholipid metabolism | - | Ipt1, Gpt2, Ept1, Cpt1
K | rpd3Δ, Rapamycin, iki3Δ | Vacuole [GO:0005773] | 9.39E-05 | Bap2, Itr1, Hxt2, Aqr1, Dip5
  |  | Ion transmembrane transporter activity [GO:0015075] | 0.020585 |
L | Rapamycin, Hydroxyurea | Bounding membrane of organelle [GO:0098588] | 0.027761 | Ams1, Kch1, Ymd8, Pho91, Mnn2, Och1, Kre2, Gnt1
  |  | Protein Glycosylation [GO:0006486] | 0.020786 | Mnn2, Och1, Kre2, Gnt1
M | α-factor | Cellular response to pheromone [GO:0071444] | 5.37E-04 | Fus1, Kar4, Ste2, Aga2, Crz1
Table 2. Annotations for the clusters presented in Figure 1. Selected significant gene
ontology enrichments are presented with their respective GO accession numbers and p-values
(full lists can be found in Supplementary Data 3). Some observations were not represented
well by enrichment. For example, while many cell cycle proteins are in cluster A, there were
some key cell cycle proteins in clusters B and D that were distinguished in their pattern of
localization changes. We provide some manual annotations for these observations.
2.2 Clustering of multi-perturbation z-score vectors reveals shared and specific
localization changes supported by literature
To test the hypothesis that proteins with similar profiles may be linked functionally,
we grouped together proteins by clustering our aggregated protein change profiles. The heat
map in Figure 1 shows the 1159 vectors with the most significant changes. To our knowledge,
this represents the first quantitative comparison of protein localization changes under multiple
experimental conditions. We observe that the overall landscape of patterns of localization
changes is complex, and includes a mixture of localization changes specific to one
perturbation, and localization changes shared between various combinations of the
perturbations we tested.
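As a minimal sketch of the clustering step, the code below applies average linkage hierarchical agglomerative clustering (the method named in the Figure 1 caption) to toy aggregated profiles. It is not the implementation used in this work: the Euclidean distance and the stopping rule (a fixed number of clusters, `n_clusters`) are assumptions made for illustration, and the protein names and z-scores are made up.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles, n_clusters):
    """Merge the two clusters with the smallest average pairwise distance
    until only `n_clusters` remain.

    ASSUMPTIONS: Euclidean distance; stopping at a fixed cluster count.
    """
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Average distance over all cross-cluster pairs of profiles
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy aggregated profiles: two screens with two features each (made up)
toy_profiles = {
    "protA": [8.0, 9.0, 0.0, 0.0],    # changes in the first screen only
    "protB": [9.0, 8.0, 0.0, 0.0],
    "protC": [0.0, 0.0, -7.0, -8.0],  # changes in the second screen only
    "protD": [0.0, 0.0, -8.0, -7.0],
}

clusters = average_linkage(toy_profiles, n_clusters=2)
print(clusters)
```

This naive version recomputes all pairwise linkages at every merge, which is adequate for a toy example but would be slow for the 1159 profiles shown in Figure 1; a library implementation would be preferable at that scale.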
We hypothesized that the rapamycin and hydroxyurea treatment would share some
localization changes, because both drugs induce cellular stress responses (22,23). We found a
cluster of shared protein change profiles (Figure 1C, Table 2C) between the two drug
treatments that included stress-responsive proteins such as Msn2, a master regulator of the
yeast general environmental stress response (17), in addition to various proteins involved in
ribosomal biogenesis and rRNA processing, which may connect to the knowledge that
ribosomal subunits are repressed in transcription under the environmental stress response
(17). We find many subtleties in the localization behavior of stress-responsive proteins within
this cluster; for example, while Msn2 is shared between the rapamycin and hydroxyurea
perturbations, we found that Tsr1, a protein predicted to move from the cytoplasm to the
nucleolus under general stress by a predictive proteome-wide analysis of stress responses
(24), is shared between the rpd3Δ, rapamycin, and hydroxyurea perturbations. Additionally,
while Msn2 has a strong pattern in its protein change profile through all three time points of
hydroxyurea perturbation but in only the first time point of rapamycin perturbation, Stb3, a
repressor of growth under stress controlled by localization between the nucleus and cytoplasm
(25), has a strong pattern in its protein change profile in all time points of both perturbations.
Furthermore, not all stress-related proteins reside in the same cluster; for example, Lap4
(shown as an example in Figure 1), a hydrolase which relocates to the vacuole under
starvation conditions (26), exhibited a strong shared pattern in its protein change profile
between the rapamycin and rpd3Δ perturbations, consistent with the images that show that the
protein is now localized to the vacuole and exhibits an increase in the frequency of its
cytoplasmic foci localization.
We found a similar subtlety in cell cycle proteins. Both hydroxyurea and α-factor
perturbations induce cell cycle arrest (27,28), and consistent with this we found that some
proteins involved in DNA replication (Pol12, Mcm7) and regulation of transcription at the
G1/S transition (Swi6) had similar strong patterns in their respective protein change profiles
(Figure 1D, Table 2D). Some of these localization changes are subtle. We show Mcm7 as an
example in Figure 1; the wild-type cells exhibit a heterogeneous mixture of cytoplasmic or
nuclear-localized protein, while in both hydroxyurea and α-factor, the localization is now
almost entirely nuclear, with the distribution of the nuclear-localized protein becoming
irregular compared to the more circular wild-type nuclear distribution. The Mcm complex,
including Mcm7, is known to relocalize according to cell cycle stage (29). Another cluster in
α-factor includes cytokinesis and cell cycle proteins (Figure 1A, Table 2A); some of these
profiles are shared with the iki3Δ perturbation; Iki3 is an elongator complex subunit that regulates
sensitivity to G1 cell cycle arrest (30).
In addition to clusters with shared patterns, we also found specific clusters that
contained proteins consistent with the characterized biological effects of the respective
perturbations. α-factor is a pheromone that induces the yeast mating response. Consistent with
this role, we find a cluster of proteins involved in the mating response (Figure 1M, Table
2M), all of which have a strong protein change profile specific to α-factor. This
includes the transcription factor Kar4, which is induced as a regulator of downstream events
in the mating response (31). The cluster also includes the transcription factor Crz1, which
responds to calcium by rapid, stochastic pulsing between the nucleus and cytoplasm; a recent
study of single-cell calcium dynamics of pheromone-treated yeast indicates that cells respond
with dramatically more frequent bursts of calcium occurrence dependent upon pheromone
concentration, inducing nuclear localization of Crz1 during these bursts (32). Images for Crz1
in the wild-type versus α-factor screens (shown in Figure 1) confirm that while the protein is
predominantly cytoplasmic in the wild-type, Crz1 also localizes to the nucleus in a proportion
of cells under α-factor perturbation. As another example of a cluster with profiles
specific to only one perturbation, we found a cluster specific to the rapamycin perturbation
that includes proteins that regulate hydrolase activity (Figure 1F; Table 2F), which may relate
to the autophagic processes induced by rapamycin (33).
Our clustered data is available in Supplementary Data 2. A list of proteins that change
for each perturbation is available in Supplementary Data 4, curated by selecting clusters of
strong patterns of protein change profiles from the heat map in Figure 1.
2.3 Developing hypotheses from our exploratory analysis: examples of three
hypotheses derived from clustering of protein localization changes
The extent to which we can interpret our clusters within known literature varies. For
example, while the mating response is well-studied and provides a clear-cut means to evaluate
the specific localization changes we found in the α-Factor perturbation, other clusters are
supported by more circumstantial evidence or have little context from literature to form an
interpretation. By integrating the observations made possible by our exploratory analysis of
our protein change profiles with a closer qualitative analysis of images of the proteins
implicated in these clusters, the data can be mined to develop informed hypotheses for
experimental follow-up. In this section, we show three examples of this process.
2.3.1 Certain stress-responsive transcription factors that exhibit stochastic pulsatility
may be controlled by the same mechanism
We observed a small cluster of three transcription factors, Msn2, Dot6, and Rtg3,
with patterns shared between the hydroxyurea and rapamycin perturbation. All three proteins
exhibited a protein change profile indicating that they were becoming more compact in
distribution and closer to the cell center under both hydroxyurea and rapamycin perturbation
(Figure 3 - A). Evaluating the images (Figure 3 - B) confirmed that these proteins were
moving from cytoplasm to nuclei under these perturbations. The import of these transcription
factors into the nucleus is expected; all three respond in this manner due to the inhibition of
TORC1 under rapamycin (22), and Msn2 and Dot6 have both been documented to respond in
this manner through a previous manual assessment of localization changes under hydroxyurea
(13).
Figure 3. Heat map of the protein change profiles (A) and images (B) for a
cluster of stress-responsive transcription factors. We show images of each respective protein
under standard media, rapamycin treatment at 60 minutes and 220 minutes, and hydroxyurea
treatment at 80 minutes and 160 minutes.
We detected interesting coordinated time dependencies in these responses. All three
transcription factors are localized to the nucleus throughout all three time points of
hydroxyurea perturbation, but strongly localize to the nucleus at the first time point of
rapamycin perturbation before returning closer to the wild-type baseline in later time points.
While there are subtle differences between the three proteins (for example, Msn2 only
diminishes in relative localization to the nucleus rather than returning fully to the cytoplasm as we
see in the other proteins), the general trend is preserved.
These similarities in their protein change profile patterns led us to hypothesize that
Msn2, Dot6, and Rtg3 may be regulated by a common mechanism. Msn2 has known
mechanisms behind its localization changes, exhibiting stochastic oscillation (or pulsing)
between the cytoplasm and nucleus regulated by the cAMP-protein Kinase A (PKA) pathway
(34). Moreover, pulsing in Msn2 is affected by the degree of stress, with low stress levels
inducing a cytoplasmic steady state, high stress inducing a nuclear steady state, and only
intermediate levels of stress inducing pulsing (34); rapamycin has been specifically shown to
induce nuclear accumulation of Msn2 (35). We speculate that coordinated changes in pulsing
behavior under rapamycin or hydroxyurea treatment could underlie the similar patterns of
localization changes between these proteins. However, while a proteome-wide screen
confirms pulsatile dynamics for Msn2 and Dot6, Rtg3 was not found to pulse (36). To test
whether Rtg3 also shows condition specific pulsing dynamics, we produced time-lapse
movies of Rtg3 in standard growth media and under rapamycin treatment.
We observed Rtg3 pulses (in both the GFP-collection strain and an
independently constructed Rtg3-GFP fusion strain; see Methods) in both standard
media and rapamycin treatment. However, we found that, as predicted by the cluster analysis,
rapamycin treatment increases the duration of Rtg3 pulses. In the standard media, pulsing
single cells tend to show frequent oscillation between the nucleus and cytoplasm, while under
rapamycin treatment, pulsing single cells tend to show a single prolonged pulse in our movies.
We quantified single cell dynamics in our independent Rtg3-GFP strain (examples of pulsing
single cells in Figure 4A and 4B). The proportion of cells with a prolonged pulse (defined as
having its longest pulse as greater than 120 minutes and shown in Figure 4C) significantly
differs under rapamycin treatment relative to the wild-type (26/53 cells in wild-type, 36/50
cells under rapamycin treatment, p = 0.03, Fisher’s Exact Test).
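For readers who wish to check the comparison above, the sketch below implements a two-sided Fisher's exact test on a 2x2 table using only the Python standard library and applies it to the cell counts given in the text. The function name is hypothetical, and the two-sided convention used (summing the probabilities of all tables no more likely than the observed one) is an assumption, since the text does not state which variant was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    ASSUMPTION: the p-value sums all tables (with the same margins) whose
    probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)  # tolerance for ties
    )


# Counts from the text: (cells with a prolonged pulse, cells without)
wild_type = (26, 53 - 26)
rapamycin = (36, 50 - 36)

p_value = fisher_exact_two_sided(*wild_type, *rapamycin)
print(f"p = {p_value:.3f}")
```

The printed value can be compared directly with the p-value reported in the text.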
Figure 4. Single cell localization dynamics for Rtg3, for a cell with frequent
oscillations between the nucleus and cytoplasm (A) and a cell with a single prolonged pulse
(B). We show stills from the time-lapse movie of each cell at various timepoints. (C) shows
that the proportion of cells with prolonged pulses increases in rapamycin treatment relative to
wild-type media.
2.3.2 Different sets of ribosomal subunits respond specifically depending on the
perturbation and may exhibit extra-ribosomal functions
We found evidence of localization changes in many ribosomal subunits, many of
which were indicated by their protein change profiles to respond specifically to one
perturbation only. As examples, we show two sets of ribosomal subunits in Figure 5; one set
of proteins has a strong protein change profile pattern in the last two timepoints of rapamycin
perturbation, with some proteins showing a redistribution from the nucleus to the cytoplasm
in the images, while the other set has a strong protein change profile pattern in iki3Δ
perturbation, with some proteins showing a change from a homogeneous cytoplasmic
population to a bimodal cytoplasmic-low expression one. These condition-specific
localization changes for particular ribosomal subunits contrast with the tightly coordinated
response to stress observed at the transcriptional level (17).
Figure 5. Heat map for ribosomal subunits taken from two distinct clusters, one that
responded specifically in rapamycin (A – top) and one that responded specifically in iki3Δ (A
– bottom). We show some images of the relevant perturbation and the reference wild-type for
the changes in some subunits in B and C.
While the role of the perturbation-specific localization changes that we identify in
these ribosomal subunits is not entirely clear, a growing body of research suggests that
specific ribosomal subunits may have extra-ribosomal functions ranging from ribosomal
biogenesis to DNA repair to adhesive growth (37). For example, Rpl40a and Rpl40b are
ubiquitin-ribosomal protein fusions known to contribute to 27SB pre-rRNA maturation in the
nucleolus by the cleaving of the fusion (38). We find that Rpl40a and Rpl40b both redistribute
from the nucleus to the cytoplasm in the later two time points of rapamycin perturbation,
which we speculate reflects the reduction in ribosomal biogenesis and rRNA synthesis and
processing activity caused by rapamycin (39). From this example, we believe that
understanding protein localization changes in ribosomal subunits, and under what
perturbations these changes occur, can assist the exploration of extra-ribosomal functions.
2.3.3 Membrane proteins exhibit a reciprocal pattern of localization changes between
hydroxyurea and rapamycin
We observed a cluster of proteins that had strong protein change profile patterns in
both the rapamycin and hydroxyurea perturbations, but with different patterns in their
respective perturbations. In the rapamycin perturbations, the direction of our features
indicated that the proteins were becoming closer to the cell center and further from the cell
edge relative to the wild-type, while in the hydroxyurea perturbations, the same proteins were
predicted to become further from the cell center and closer to the cell edge. Evaluating images
for these proteins (Figure 6) showed that most were membrane proteins, some localized to the
plasma membrane and others to the Golgi. In the wild-type, these proteins are localized to both the
membrane and the vacuole; under hydroxyurea treatment, this distribution of protein shifts
towards the membrane more, whereas in rapamycin perturbation, it shifts towards the vacuole
more, confirming the trends predicted by our protein change profile patterns.
Figure 6. Heat map of the protein change profiles (A) and images (B) for a cluster of
proteins with reciprocal protein change profiles between hydroxyurea and rapamycin
treatment. We show images for three proteins from this cluster, under 220 minutes of
rapamycin treatment, standard media, and 160 minutes of hydroxyurea treatment.
While we do not have a clear hypothesis for the mechanism behind this reciprocal
change between the hydroxyurea and rapamycin perturbations, what is interesting about these
membrane proteins is not just that they change localization in both the rapamycin and
hydroxyurea perturbations, but that they change localization differently between the
perturbations. Because our method visualizes localization changes quantitatively and
simultaneously for both perturbations, these patterns were immediately obvious from the heat
map. In addition, global clustering enabled visual identification of these changes. Because
some of the localization changes are subtle, they may have been missed or attributed to
technical variation if viewed in isolation. By grouping these proteins with more pronounced
examples of changes of a similar nature, these changes were easier to identify.
2.4 Cluster associations found by our unsupervised exploration of localization
changes are complementary to other high-throughput experiments
Next, we wanted to test if the associations between proteins inferred from our
exploratory analysis of protein localization changes were unique to analysing protein
localization.
First, to determine that we were not simply detecting protein localization changes
resulting from transcript-driven protein abundance changes, we tested the overlap of our
predicted localization changes with transcriptional responses for perturbations for which we
could find comparable microarray data. We present this data qualitatively as a heat map,
which presents the microarray data (as log fold-change from wild-type) as columns to the
right of the localization change vectors for each perturbation. This visualization allows us to
assess if clusters of protein change profiles also have large transcript changes in the relevant
perturbations. The data for the full heat map can be found in Supplementary Data 5; we show
important components in Figure 7A.
Figure 7. Comparisons of our clusters of protein change profiles with other high
throughput experimental data. A) shows some exemplar localization change clusters with
transcriptional data side-by-side for the relevant perturbations; note that the transcript vectors
and protein change profiles are shown on a different scale. B) shows the enrichment of 15
clusters for physical interactions, reported as z-scores. C) shows our protein change profiles,
represented as a scatterplot, labelled with manual annotations for the wild-type localization of
the proteins. We show 608 proteins, filtered for having at least 2 z-scores above an absolute
value of 6.0. Some clusters that we previously identified in Table 2 are circled on the
scatterplot.
We found that some clusters of localization change correspond with large transcript
changes. In the α-factor perturbation, the cluster of mating pathway components (Kar4, Aga2, Fus1,
etc.; Figure 7A – I) had both strong transcript vectors and strong protein change profiles, as did
some, but not all, cell cycle proteins (Bud3, Bud4, etc.; Figure 7A – II). Some stress-related
genes common to both the rapamycin and hydroxyurea perturbations have large transcript
changes (Tsr1, Enp1, etc.; Figure 7A – III), but the transcription factors that we
previously identified in section 2.3.3 (Dot6, Msn2, Rtg3; Figure 7A – III) do not. In contrast, many
clusters do not agree well with transcript changes. For example, a cluster of proteins specific
to the rapamycin perturbation containing some kinases (Pkc1, Rtk1, Npr1; Figure 7A – IV)
and a cluster of proteins involved in DNA repair responding specifically in the hydroxyurea
perturbation (Cdc28, Rad54, Cdc7, etc.; Figure 7A – V) mostly do not exhibit large transcript
changes. The reciprocal pattern of membrane protein localizations between the rapamycin and
hydroxyurea perturbations, identified in 2.3.2, also does not exhibit large transcript changes
(Figure 7A – VI).
For ribosomal subunits in particular, we found that the relationship between transcript
vectors and protein change profiles was complex. First, we observed that some ribosomal
subunits with a pattern specific to the rapamycin perturbation (Rps10a, Rpl40b, Rpl40a;
Figure 7A – VII) had large transcript changes in the rapamycin perturbation. At the same
time, we observed some ribosomal subunits with strong transcript changes in the rapamycin
and hydroxyurea perturbations (Rps18a, Rpl14a, Rps27b, Rpl27b, Rpl2a; Figure 7A – VIII)
that had strong protein change profiles in neither. Finally, we observed ribosomal proteins
that had strong protein change profiles in both the rapamycin and hydroxyurea perturbations
(Rpl23a, Rpl27a; Figure 7A – IX), that also had strong transcript changes in the rapamycin
perturbation but not the hydroxyurea.
Next, we asked if our clusters of protein change profiles were dominated by
interacting proteins from the same complex moving together. We hypothesized that the
underlying causes behind localization changes were diverse, and that coordinated localization
changes would not necessarily require that the participating proteins had to interact, such as if
a set of proteins were independently handling modular aspects of a response. To test overlap
with protein-protein interactions, we selected 15 clusters of protein change profiles using the
dendrogram presented in Figure 1, ranging from 5 to 37 proteins with an average cluster size
of 15.7 (lists of proteins in each cluster can be found in Supplementary Data 6). We evaluated
if these clusters were enriched for physical interactions from low-throughput experimental
sources using the BIOGRID database (40). Our results are summarized as z-scores indicating
the degree of enrichment relative to null expectation in Figure 7B. Only 5 clusters were
enriched for physical interactions, while 10 clusters were not.
Finally, we wanted to determine if our clusters of protein change profiles consisted
primarily of proteins that shared a subcellular localization in the wild-type or not. In section
2.3.3, we showed an example of proteins in different compartments (cell membrane and
Golgi) showing similar patterns of behavior. We wanted to assess more systematically if we
were capturing protein behavior in our analysis, rather than simply recapitulating proteins that
shared localizations and moved together; we reasoned that if proteins moved under the same
sets of perturbations, even from and to different places, they could still be regulated in the
same way, such as being phosphorylated by the same kinase. To do this, we visualized our
protein change profiles as a scatterplot using a technique that represents the aggregated
profiles in two dimensional space (Figure 7C), and color-coded the points according to
manually assigned annotation classes for a wild-type GFP-fusion yeast collection by Huh et al
(20). Many of the clusters we presented in Table 2 are also represented by clusters of
proximate points in the scatterplot; we label some of the proteins previously shown.
Generally, most clusters are heterogeneous in their composition of localization
classes. The cluster of proteins responding specifically to rpd3Δ (Table 2G; includes Pus1,
Acs1) is an example of a highly heterogeneous cluster. In many cases, there is some local bias
towards some classes. For example, the cluster of pheromone response proteins (Table 2M;
includes Aga1, Kar4, Fus1) has a mixture of cytoplasm/nucleus-localized and vacuole-localized
proteins. This reflects how this cluster consists of both pheromone-responsive transcription
factors like Kar4 and downstream response proteins like Fus1, a cell fusion protein that
facilitates vacuole mixing (41). Similarly, while the clustered stress-responsive proteins
Msn2/Dot6/Rtg3 (discussed in section 2.3.1) all share a cytoplasm/nucleus localization,
proximate proteins (close to Tsr1 in Figure 7C) are more heterogeneous. Some clusters do
exhibit primarily a single localization class, however. For example, the cluster of cytokinesis
proteins (Table 1A; includes Bud3, Bud4, Myo1) is mostly bud neck; we note that this
cluster was also the one most strongly enriched for physical interactions in
the previous comparison (Figure 7B – 1).
2.5 Exploratory analysis of kinase deletion screens reveals more limited
localization changes
Next, we explored protein localization changes across image screens for 7 kinase
deletion mutants: elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ, and vhs1Δ. Because
phosphorylation is a well-characterized mechanism for regulating protein localization (2), we
anticipated that we would identify localization changes in these mutants. This dataset
contained 116,004 images and about 5.19 million single cells. These image screens represent
a more challenging case for exploring protein localization change dynamics for several
reasons. First, we had less prior context to draw upon. While some information exists on the
phenotypes that these deletions induce, little is known about the proteomic dynamics induced
by the kinase deletions. Second, we found that this dataset was more technically challenging.
The average number of cells identified in the kinase deletion screens was ~744,000 cells per
screen (compared to ~1.22 million per screen in the analysis in Figure 1). Furthermore, we do
not have replicates or multiple time points in the kinase deletion screens, making it more
challenging to confirm the reproducibility of the signals we observe. Nevertheless, we applied
the change detection and aggregated profile clustering strategy that we presented in sections
2.1 and 2.2 to the kinase deletion mutants, and visualized the most significant results as a
clustered heat map (Figure 8A); the full clustered data is available in Supplementary Data 7.
Figure 8. A) shows a clustered heat map of the aggregated protein change profiles for
the kinase deletion perturbations. This heat map shows a subset of the proteome filtered for
strong protein change profiles by requiring that the aggregated protein change profile have at
least 2 z-scores above an absolute value of 6.0. We highlight some clusters. B) Overlay of
GFP-fluorescence patterns (green) of a Yju3-GFP fusion protein with DIC images to show the
cells (shades of grey) for an independently-produced set of strains and imaged under a
different microscope.
We find that most strong protein change profiles in the kinase deletion screens are
specific, occurring under just one kinase deletion perturbation. These results contrast with the
general stress-related protein localization changes that we found in previous perturbations.
These patterns (such as those in Msn2, Dot6, Rtg3, Tsr1, Stb3 and Lap4, previously discussed
in section 2.2) do not appear as generalized responses in the kinase deletion perturbations. We
do find several proteins in clusters specific to the elm1Δ perturbation previously shown to
relocate to nuclear or cytoplasmic foci under DNA replication stress, including Lap4, Hsp42,
Apj1, and Pph21 (13). Interestingly, we also find that Gre3, previously found to have an
abundance change but no localization change under DNA replication stress, shows a dramatic
localization change from cytoplasm to nucleus under elm1Δ perturbation in most of the cells
(Figure 8). Notably, elm1Δ induces an irregular morphology where the cells become
elongated, which is not observed in the other kinase deletions.
While most clusters for our kinase perturbations appear to be specific, we do find
some shared patterns of protein change profiles. A small set of protein change profiles shared
between the hal5Δ and the kin2Δ perturbations consists of Sol4, Hsp30, and Hor2, all of
which are linked to various stress responses (13,42,43). We show Hsp30 in Figure 8, and note
that in both perturbations, a previously rare vacuolar phenotype in the wild-type becomes
much more frequent. Hsp30 has been shown to be activated independently of the general
stress-related transcription factors Msn2 and Msn4 (47), congruent with our observation that
Msn2 and other stress-related transcription factors are absent as a generalized program in the
kinase deletion perturbations. Additionally, we find two proteins that appear to have
reciprocal patterns of protein change profiles between the kin2Δ and mck1Δ perturbations,
Qri5 and Gln3. We show Gln3 in Figure 8; in the wild-type, the protein localizes as a
heterogeneous mixture of cytoplasmic and cytoplasmic-and-nucleus cells, but in kin2Δ, the
cells appear to be mostly cytoplasmic, whereas in mck1Δ, there appear to be relatively more
cytoplasmic-and-nucleus cells.
Like the results for our previous screens, we found strong protein change profiles in
ribosomal subunits that were specific to some perturbations. A cluster for the hal5Δ
perturbation contained the ribosomal subunits Rps23a, Rpl7b, and Rpl3. We show Rps23a in
Figure 8 as an example, with the localization appearing to be more strongly nuclear in the
perturbation than the wild-type.
We also found some strong specific protein change profile patterns for proteins that
we considered unusual in their trend of localization change, or in the context of the kinase.
For example, for hsl1Δ, we found Yju3, a monoglyceride lipase that is known to localize to
lipid particles and membranes (44); as Hsl1 regulates the morphogenesis checkpoint (45), we
could not understand the link from literature alone. Qualitative analysis of the associated
images indicates that while Yju3 in the wild-type does localize in an ER and cytoplasmic
pattern, a large proportion of the cells now exhibit a localization change to the
nucleus/nucleolus in the hsl1Δ perturbation (Figure 8). To confirm that this observation was
not simply due to contamination or another sample-specific technical issue, we verified the
phenotype with an independently-produced set of strains and a separate microscope (Figure
8B). We confirmed that Yju3 was localized in a punctate cytoplasm pattern in the wild-type
consistent with an ER localization, but changed to the nucleus under hsl1Δ perturbation,
confirming our observations in the GFP-library image collection. We regard Yju3’s
localization to the nucleus as unexpected in the context of its role as a lipase; we hypothesize
that this observation may reflect an unknown additional role for Yju3.
3. Discussion
Adding context to patterns of protein localization change deepens biological insight.
It is one thing to ask what proteins are changing for a perturbation; it is a more focused query
to ask what proteins are changing in a similar way for a perturbation, but specific to that
perturbation only. For example, for the α-factor treatment, we resolved a specific cluster of
proteins involved in the mating response. Another cluster in the α-factor perturbation, which
included DNA replication factors, had similar protein change profiles, but was shared with the
hydroxyurea perturbation. Had we clustered the α-factor localization change data
individually, the two clusters would have likely been mixed. With the addition of the
hydroxyurea perturbation, these clusters could be individually resolved. From examples like
these, we believe that this increased resolution offered by a multi-perturbation context
empowers identification of functional relationships between proteins by association. This will
likely further improve as high-throughput imaging experiments expand to include more
perturbations.
Moreover, representing protein localization changes comparably between
perturbations permits the detection of nuanced trends in protein responses. We showed that
components of the yeast stress response could be differentiated in subtle ways. Some proteins,
like Stb3 in rapamycin, responded throughout perturbations, whereas other proteins, like
Msn2, responded in a temporal manner even in the same perturbations. Some proteins, like
Tsr1, responded in more perturbations than others. Some proteins, like Lap4, responded in
different combinations of stressful perturbations than others. Many of these observations are
consistent with research on stress-responsive proteins: for instance, the temporal nature of
Msn2 can be attributed to its oscillatory pulsing behavior between the cytoplasm and nucleus
(34), and the more specialized response we observed in Lap4 can be attributed to its selective
transport to the vacuole by the autophagosome (26). Our approach summarizes various facets
of protein responses in a compact visualization. In doing so, we provide a glimpse of the
complex proteomic behaviors that occur within cells.
An important feature of our change detection and clustering approach is that it is
unbiased by prior biological knowledge. For this reason, it is encouraging that we recapitulate
much known biology in our clusters; the fact that we can find many specific examples of
known responses to perturbations provides evidence to support that our approach reflects
underlying biology. Furthermore, the unbiased nature of our approach permits the association
of new genes with previously known responses. For example, we found that Crz1 was
clustered with proteins in the mating response in the α-factor perturbation, a result that was
initially unexpected given that Crz1 is not a canonical part of the mating response.
Interestingly, another study independently linked Crz1 to the mating pheromone (32),
confirming that even well-characterized pathways can benefit from unbiased, high-throughput
exploratory analyses of protein localization changes.
In our examples of how to apply this approach for hypothesis development, we
identified Rtg3 as a pulsatile transcription factor modulated by rapamycin in this study. Our
results contradict a previous proteome-wide screen that suggested that Rtg3 does not pulse
(36). However, the highly variable dynamics of pulsing from cell-to-cell (34,36) and the large
number of proteins independently tested in the screen, as well as differences in microscopy
and imaging conditions are plausible reasons for the discrepancy. In contrast to the time-lapse
movie data used to analyse pulsing, our image data is much less comprehensive, consisting of
still images of just three coarse time-points. Rather, our discovery was powered by
associating protein behavior, by observing that Rtg3 had a similar protein change profile to
other pulsatile transcription factors whose dynamics were affected by rapamycin. That we
could make findings missed by stronger experimental approaches demonstrates how looking
for protein properties by association can be surprisingly powerful.
We believe that our approach strongly complements human evaluation of images,
instead of fully replacing it, because an expert observer can provide the critical context to
evaluate if the signals found by our unsupervised approach are biological or technical. At the
same time, careful human evaluation takes too long to look at hundreds of thousands of
images over the whole proteome, many of which may only have subtle trends; our method
drastically reduces this search space, allowing observers to target proteins with strong and
interesting signals, and equips observers with knowledge of which specific screens to look at,
the general nature of the putative change, and if the change is likely to be a more or less
functionally-specific one based upon the combination of perturbations it occurs in.
While our approach is based upon direct imaging of cells, predictive multi-
perturbation approaches of protein localization are also available (24,46–48). While the latter
approach is a valuable alternative where experimental data is not readily generated, we
believe that predictions based on protein-interactions and mRNA expression patterns (24) will
miss much of the localization dynamics. We found that not all clusters of protein change
profiles overlap well with either of these types of datasets, suggesting that the underlying
mechanisms that drive localization changes are diverse. Indeed, in recent predictions of
localization change under stress (24), mitochondrion-to-nucleus and ER-to-Golgi transitions were
predicted to be the most frequent types of change. In contrast, we observe that transitions between the
nucleus and cytoplasm were very common in the localization changes we looked at. While
this could be a technical effect because our change detection method is more sensitive to
localization changes that reflect larger spatial distances (discussed in (19)), we also observe
that many of the transcription factors we observed as changing under stressful conditions
were not predicted to have localization changes by a previous predictive method that relied
upon protein-interaction and mRNA expression datasets (24).
We present this research as a proof-of-principle for combining systematic imaging
experiments. Certainly, our method has many technical limitations. For example, we analysed
two sets of screens in this study, one of lower quality than the other. Because our change
detection algorithm (19) uses real data from our screens to build expectations of what a non-
changing protein looks like, the overall quality of the image screens may affect what
proteomic changes are detected. We noticed that we found considerably smaller clusters of
protein change profiles, and fewer shared patterns of protein change profiles in our lower-
quality kinase deletion screens compared to our first set of screens. We do not expect our
study to capture all localization changes due to the incomplete specificity of our change
detection method, but we demonstrate that analysing even the localization changes we
captured was highly informative.
Future work should further improve our general strategy of analysing protein
localization changes in a multi-perturbation context. We believe that our general strategy of
viewing localization changes in a multi-perturbation context may lead to implications beyond
model organisms. New methods are beginning to emerge for the scalable GFP tagging of
human proteins (49); given that our strategy can potentially elucidate protein pathway
responses and compensatory rearrangements of the proteome in response to drugs, it could
identify potential protein targets for knockout or combination therapies. Importantly, as our
approach can separate more general responses from those more specific to a drug, it may
permit the informed selection of protein targets that minimize side effects. Furthermore, while
we frame our work as relevant to perturbations, our approach can easily be extended to study
proteomic changes from differences in tissue or cell-type (50).
4. Methods
4.1 Experimental Strains and Image Acquisition
Image data for wild-type GFP-tagged yeast cells, in addition to the rpd3Δ, rapamycin,
hydroxyurea, and α-factor perturbations, were taken from the CYCLoPs database (9).
The iki3Δ, elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ and vhs1Δ strains were
constructed and imaged as described in Chong et al. (12). Fluorescent micrographs were
acquired using a high-throughput spinning-disc confocal microscope (Opera, PerkinElmer)
with a water-immersion 60X objective (NA 1.2, image depth 0.6 µm and lateral resolution
0.28 µm). Acquisition settings included using a 405/488/561/640 nm primary dichroic, a 568
nm detector dichroic, a 520/35 nm filter in front of camera 1 (12-bit CCD) and a 600/40 nm
filter in front of camera 2 (12-bit CCD). Excitation was conducted using 488 nm (blue) and
561 nm (green) lasers at maximum power. Eight images were acquired for each well, 4 in the
red channel and 4 in the green channel with simultaneous acquisition of red and green
channels (binning = 1, focus height 2 µm) and an 800 ms exposure time per site.
For images that we present in this study, we crop a 300x300 pixel box from the
respective images showing representative cells for the sample. These images are output by
our single cell segmentation program (21), and show blue lines around the cells indicating the
results of the segmentation, and white circles connecting mother and bud cells. Cells
considered to be artifacts are indicated by white crossed-out regions in the images. These
images are rescaled in contrast using the highest and lowest intensity pixels in the images; we
preserve this contrast to better show subcellular localization patterns for proteins expressed at
lower abundances.
For the GFP-tagged Rtg3 strains presented in Figure 4C and 7B, respectively, strains
were generated as fusion products via homologous recombination of the native genomic
sequence. Direct transformation of a linear PCR product containing a codon-optimized GFP
coding sequence and a selectable resistance marker flanked by gene-specific sequence yielded
C-terminally tagged fusion products, which were then isolated by the conferred drug resistance.
To prepare strains for microscopy, strains were inoculated into synthetic minimal media and
grown overnight at 30˚C. Prior to imaging, stationary cultures were diluted 1/10 in fresh
media and grown at 30˚C for 4 hours to ensure log-phase growth and proper expression of
GFP fusion products.
To produce time-lapse movies of the Rtg3 strains, cells were dosed with 200 ng/ml of
rapamycin for the rapamycin treatment perturbation, and imaged at 22˚C for 16 hours at 2.5
minute intervals for 4 z-stacks at 1 micrometer intervals. Cells were imaged using a Nikon
spinning disk confocal microscope using a 60X oil-immersion objective. GFP excitation was
at 488 nm. To analyse frames of the codon-optimized GFP strain, segmentation and tracking
were conducted using MATLAB on the brightfield image to identify cell peripheries. The
localization score of the GFP signal is calculated by taking the difference between the average
intensity of the 10% brightest pixels within a cell and the average intensity of the other 90%
of pixels in the cell; under this quantification, a nuclear localization will have a higher score
than a cytoplasmic localization. To identify pulses, a two-state hidden Markov model was
applied across the series of localization scores for the entire timeframe of each single cell.
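The localization score and pulse-calling step described above can be illustrated with a minimal Python sketch. It is not the MATLAB code used in this study. The function names, the Gaussian emission model with a shared standard deviation, the uniform starting state, and the `p_stay` transition probability are all assumptions made for illustration; the text specifies only the 10%/90% intensity difference and the use of a two-state hidden Markov model.

```python
import math


def localization_score(pixels, top_fraction=0.10):
    """Mean of the brightest `top_fraction` of pixels minus the mean of the rest."""
    ordered = sorted(pixels, reverse=True)
    n_top = max(1, int(len(ordered) * top_fraction))
    top, rest = ordered[:n_top], ordered[n_top:]
    if not rest:
        raise ValueError("need at least two pixels per cell")
    return sum(top) / len(top) - sum(rest) / len(rest)


def viterbi_two_state(scores, means, sd, p_stay=0.9):
    """Most likely state path (0 = baseline, 1 = pulse) under an assumed
    two-state HMM with Gaussian emissions and symmetric transitions."""
    def log_emit(x, state):
        # Shared sd, so normalizing constants cancel between states.
        return -0.5 * ((x - means[state]) / sd) ** 2

    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # best[s]: log-probability of the best path ending in state s.
    best = [log_emit(scores[0], s) for s in (0, 1)]
    back = []  # back[t][s]: best previous state for state s at step t + 1
    for x in scores[1:]:
        step, new_best = [], []
        for s in (0, 1):
            from_same = best[s] + log_stay
            from_other = best[1 - s] + log_switch
            step.append(s if from_same >= from_other else 1 - s)
            new_best.append(max(from_same, from_other) + log_emit(x, s))
        back.append(step)
        best = new_best
    # Trace back from the most likely final state.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]
```

Under this quantification, a cell whose signal is concentrated in a few bright pixels (for example, a nuclear pattern) scores higher than a cell with uniform signal, and runs of consecutive frames assigned to state 1 would be read as pulses.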
For confocal analysis of the Yju3 strain, cells were imaged using a Leica TCS SP8
confocal microscope with a 63X oil-immersion objective and a 10X eyepiece. GFP
excitation was at 488 nm.
4.2 Image Analysis and Quantification of Localization Changes
To quantify the patterns of GFP in the cells in our images, we use the single-cell
segmentation and feature extraction program as described in (21), specialized for the
segmentation and measurement of GFP-tagged yeast cells. We modify the method of
averaging single-cell features by reducing the number of bins for cell size from 10 to 5, to
increase the number of cells within each bin.
An important property of these features is that they track the concentration and
distribution of GFP-tagged proteins relative to certain cellular landmarks. As opposed to
features that only track the shape of the GFP pattern, these features allow us to track relative
distributions between localizations when a protein is localized to two or more compartments,
and shifts the ratio of protein between these compartments, such as in the examples in section
2.3.3. Because subcellular organization emerges from affinities and dynamic equilibria, localization
can be fuzzy rather than a sharply defined binary (51), reinforcing the need for the
descriptions of protein distribution we pursue here rather than sharply defined categorizations
based upon morphology. However, our features are also sensitive to strong protein abundance
changes that drastically alter local concentrations of proteins, so some abundance changes will
also be represented in our clusters. In considering this trade-off, we realized that the
distinction between protein localization and abundance changes is sometimes arbitrary. For
example, one unclear case is where the protein is not expressed in the wild-type, but shows a
specific localization in the perturbation.
To conduct localization change detection on these features, we apply our previously
described method in (19). This unsupervised change detection method constructs a
conditional expectation for each protein and reports the direction and magnitude of deviation
of the protein’s change compared to this expectation; because these conditional expectations
are constructed locally for each protein, differences in feature measurements between samples
explained by simple variation in the morphology of an organelle or systematic biases between
image screens are de-emphasized, allowing for the fairer comparison of localizations
differently impacted by these effects. The unsupervised change detection method quantifies
the putative localization change for each protein with a shared set of features. This
representation encodes the nature of the localization change in the pattern across our
vector of features; for instance, cytoplasm to the nucleus movements are characterized by
strongly positive z-scores for “distance to cell center” and “distance between proteins”
features, which indicate that the wild-type values for these features are higher than the
perturbation values. Using this method, localization changes are presented in a quantitative
and comparable way that summarizes each localization change as a relative measure between
the wild-type and the perturbation.
We use a parameterization of k = 50 for all screens. We increase the leniency of our
filters for sample size reliability compared to the method described in (19), permitting vectors
that have at least 1 cell in each bin rather than requiring at least 5 as originally described; this
permits the retention of more data, and thus more complete aggregated vectors, at the expense
of some reliability in our feature measurements. To compensate for this, we use the more
robust truncated mean profiling method described in the previous study. We apply the image
segmentation and feature extraction method to each image screen in our dataset, and then the
change detection method between each image screen paired against the WT2 screen, which
served as our common reference wild-type screen for all perturbations and all other wild-
types.
To evaluate sensitivity, we randomly drew 20 images from the collection of proteins
that had strong protein change profiles in the RAP2 screen relative to the WT2 screen. We chose
the RAP2 screen because the rapamycin treatment seemed to induce the most changes other than the
rpd3Δ deletion (which was already used to evaluate the method in a previous study (19)). We
blinded the protein identities and assigned 3 observers to independently annotate both sets of
images. Proteins were considered positive for a visually apparent localization change if at
least 2 out of 3 observers agreed that there was a localization change; proteins were
considered ambiguous for a localization change if at least 2 out of 3 observers considered the
image ambiguous for a localization change, or if 1 observer considered it positive and at least
1 other observer considered it ambiguous. As we found in 2.3.3, some localization changes
found by our algorithm are subtle, and may be challenging to discern without context, so this
approach only provides an estimate of sensitivity.
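The consensus rule for the three observers can be written compactly. The following Python sketch is illustrative only: the label strings (`"change"`, `"ambiguous"`, `"none"`) and the function name are hypothetical, and treating every remaining combination as negative is an assumption, since the text defines only the positive and ambiguous calls.

```python
def consensus_call(labels):
    """Combine three independent observer labels into one call.

    Each label is "change", "ambiguous", or "none" (hypothetical names).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three observer labels")
    n_change = labels.count("change")
    n_ambiguous = labels.count("ambiguous")
    # Positive: at least 2 of 3 observers see a localization change.
    if n_change >= 2:
        return "positive"
    # Ambiguous: at least 2 of 3 say ambiguous, or 1 says change and
    # at least 1 other says ambiguous.
    if n_ambiguous >= 2 or (n_change == 1 and n_ambiguous >= 1):
        return "ambiguous"
    # Assumption: everything else is counted as no visible change.
    return "negative"
```

For example, `consensus_call(["change", "ambiguous", "none"])` returns `"ambiguous"` under this rule, because one observer called a change and another considered the image ambiguous.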
4.3 Clustering and Visualization of Aggregated Vectors
As discussed in section 2.2, the protein change profiles for individual perturbations are
assembled into an aggregated profile. We cluster these profiles using the open-source Cluster
3.0 package (52), with hierarchical agglomerative clustering using uncentered correlation
distance and average linkage. While this operation can be performed on the entire set of
profiles, for the heat maps we present in Figure 1 and Figure 8, we use some filters prior to
clustering to reduce the number of profiles visualized to just the strongest signals. We require
that the aggregated vector for the non-kinase deletion perturbations have at least 3 z-scores
over an absolute value of 5.0 and greater than 80% of data present, resulting in 1159 proteins
displayed. As there are fewer perturbations in the aggregated vector for the kinase deletion
perturbations, we use a more lenient threshold of at least 2 z-scores over an absolute value of
5.0 and greater than 80% of data present, for a total of 865 proteins displayed.
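The pre-clustering filter can be expressed as a small predicate. This Python sketch is a hedged illustration: the function name, the use of `None` to mark missing z-scores, and the strict inequalities are assumptions chosen to match the wording "over an absolute value of 5.0" and "greater than 80% of data present".

```python
def passes_filter(profile, min_strong, z_cutoff=5.0, min_present=0.8):
    """Return True if an aggregated profile is kept for the heat map.

    profile: list of z-scores, with None marking missing data (assumed encoding).
    min_strong: required number of strong z-scores (3 for the non-kinase
        screens, 2 for the kinase deletion screens, as stated in the text).
    """
    present = [z for z in profile if z is not None]
    # Require strictly more than `min_present` of the entries to be present.
    if len(present) <= min_present * len(profile):
        return False
    n_strong = sum(1 for z in present if abs(z) > z_cutoff)
    return n_strong >= min_strong
```

A profile with two strong z-scores would therefore be dropped under the non-kinase threshold (`min_strong=3`) but retained under the more lenient kinase deletion threshold (`min_strong=2`).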
To visualize these resultant clustered matrices as heat maps, we use Java Treeview
(53). As with microarray convention, red values indicate positive z-scores and green values
indicate negative z-scores, with the intensity of the color indicating the magnitude. Grey
values indicate missing data.
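The clustering in this section was performed with Cluster 3.0; the following pure-Python sketch only illustrates what uncentered correlation distance with average linkage computes on a handful of profiles. The function names and the toy protein labels are hypothetical. Skipping positions where either profile has missing data, and returning a distance of 1 when there is nothing to compare, are assumptions rather than a description of the Cluster 3.0 implementation.

```python
import math


def uncentered_distance(a, b):
    """1 minus the uncentered correlation (cosine similarity) of two profiles,
    computed over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: no usable signal is treated as uncorrelated
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(profiles):
    """Naive agglomerative clustering of {name: profile}.

    Returns the merges in order as (members_a, members_b, distance), where
    the distance between clusters is the mean of all pairwise distances.
    """
    names = list(profiles)
    dist = {
        frozenset((p, q)): uncentered_distance(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Because the correlation is uncentered, two profiles that differ only in magnitude (for example, z-scores of 6 and 12 in the same features) have a distance near 0, whereas profiles with opposite signs have a distance near 2. This naive version scales poorly and is meant only for small examples.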
4.4 Comparison with Other High-Throughput Data Sources
Enrichment analyses of the clusters in Table 2 were conducted using the 1159
proteins visualized in Figure 1 as the background population. We look for gene ontology-
enriched terms (54) in biological process, cellular component, and molecular function, with
Benjamini-Hochberg correction (55) at a 0.05 FDR, using the YeastMine tool (56) on the
Saccharomyces Genome Database. We report a sample of the significant terms that we found
for clusters in Table 2; full lists of enrichment for each cluster can be found in Supplementary
Data 3.
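For readers unfamiliar with the correction, the Benjamini-Hochberg procedure can be sketched as follows. In our analysis the correction is applied by the YeastMine tool; the function below is only a hypothetical illustration of the standard procedure, not that tool's code.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr=0.05):
    """Return BH-adjusted p-values and a boolean mask of significant tests."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Scale each sorted p-value by m / rank, then enforce monotonicity by
    # taking the running minimum from the largest p-value downwards.
    scaled = ranked * m / np.arange(1, m + 1)
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted, adjusted <= fdr
```

A term would be reported as enriched when its adjusted p-value is at or below the chosen FDR (0.05 in our analysis).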
To compare our protein change profile clusters with transcriptional changes for these
perturbations, we used data from four separate genomic expression microarray experiments
for the rpd3Δ (57), hydroxyurea (58), rapamycin (59), and α-factor (60) perturbations in
yeast. We only include time points from these microarray experiments that are earlier than or
equivalent to the corresponding time points in our image screen data. For the proteins
presented in Figure 1, we append this transcriptional data to the aggregated protein change
profiles for the rpd3Δ, hydroxyurea, rapamycin, and α-factor perturbations (in addition to the
non-reference wild-types), and repeat the clustering operation to further resolve any sub-
clusters that may emerge with the inclusion of the transcription data. Data for the full heat
map is available in Supplementary Data 5, while relevant sections are displayed in Figure 6A.
The transcript fold-regulation data is displayed on a different intensity scale than the protein
change profile z-scores to account for the differences in significance between the units, and
with four pixels per column of transcript data as opposed to one pixel for each localization
change feature, to better display the smaller number of columns in the transcript data.
Protein-protein interactions for the Saccharomyces cerevisiae proteome were taken
from the BioGRID database (40). To improve the confidence of our set of interactions, we
compared our data only with physical interactions from low-throughput experimental sources. We
defined 15 clusters of proteins with strong protein change profiles, involving a total of 236
proteins between all clusters, using the dendrogram for the heat map presented in Figure 1;
the specific proteins involved in each cluster can be found in Supplementary Data 6. We
counted the number of physical and low-throughput interactions, respectively, between the
proteins of each cluster individually. To generate null expectations, we randomly shuffled the
236 proteins within our defined clusters while retaining the sizes of each cluster, and counted
the resulting number of interactions within each cluster. We repeated this simulation 10,000
times to produce a mean and standard deviation for number of interactions for randomized
clusters, and report the true number of interactions for each cluster as a z-score against these
expectations.
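The shuffling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions rather than our analysis code: proteins are represented as integer indices, clusters as integer labels starting at 0, interactions as a non-empty array of index pairs, and all function names are hypothetical.

```python
import numpy as np


def within_cluster_counts(labels, edges, n_clusters):
    """Count interactions whose two proteins share a cluster, per cluster."""
    a = labels[edges[:, 0]]
    b = labels[edges[:, 1]]
    return np.bincount(a[a == b], minlength=n_clusters)


def interaction_z_scores(labels, edges, n_shuffles=10_000, seed=0):
    """Compare observed within-cluster interaction counts to shuffled clusters."""
    labels = np.asarray(labels)
    edges = np.asarray(edges)
    n_clusters = int(labels.max()) + 1
    observed = within_cluster_counts(labels, edges, n_clusters)

    rng = np.random.default_rng(seed)
    null = np.empty((n_shuffles, n_clusters))
    for i in range(n_shuffles):
        # Permuting the label vector reassigns proteins to clusters at
        # random while keeping the size of every cluster unchanged.
        shuffled = rng.permutation(labels)
        null[i] = within_cluster_counts(shuffled, edges, n_clusters)

    mean = null.mean(axis=0)
    std = null.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(std > 0, (observed - mean) / std, np.nan)
    return observed, mean, std, z
```

A cluster whose randomized counts never vary has no defined z-score in this sketch, so it is returned as NaN.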
We use the manual annotations for the wild-type yeast-GFP collection by Huh et al.
(20) to assess wild-type protein localization within our clusters. To reduce the number of
annotations to a reasonable number of classes to color-code within our scatterplot, we
condensed some annotations into a single class. We condense Golgi, punctate composite,
lipid particle, endosome, and peroxisome annotations into “punctate organelles”, and spindle
pole, microtubule, and actin into “structural components”. In addition, we simplify vacuole and
vacuolar membrane-localized proteins to “vacuole” and nucleus and nucleolus to “nucleus”.
For proteins with multiple localizations, we include “cytoplasm and nucleus” and “cytoplasm
and other” classes as the two most common types of multi-localized proteins; proteins that
did not fall into either category were assigned to the more dominant class in their
annotation. For the scatterplot, we only display the 608 strongest localization change signals
for the perturbations presented in Figure 1, found using a filter of at least 2 z-scores with an
absolute value over 6. We use the t-SNE dimensionality reduction method (61) to reduce our
aggregated vectors to two dimensions appropriate for a scatterplot; as pre-processing steps,
we impute missing data using randomly sampled values from a Gaussian constructed for each
column in our data, and apply PCA to reduce the dimensionality to 50 before applying t-SNE.
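The pre-processing and embedding steps can be sketched as follows. This is an illustration under stated assumptions, not our exact pipeline: the per-column Gaussian is assumed to use the mean and standard deviation of the observed values in that column, every column is assumed to contain at least one observed value, PCA is computed by a plain singular value decomposition, scikit-learn's `TSNE` with default settings stands in for the t-SNE implementation, and all function names are hypothetical.

```python
import numpy as np


def impute_gaussian(X, seed=0):
    """Fill NaNs in each column with draws from a Gaussian fitted to that column."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # work on a copy
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if missing.any():
            observed = X[~missing, j]
            X[missing, j] = rng.normal(
                observed.mean(), observed.std(), size=int(missing.sum())
            )
    return X


def pca_reduce(X, n_components=50):
    """Project mean-centered rows onto the top principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, S.size)
    return U[:, :k] * S[:k]


def embed_profiles(X, seed=0):
    """Impute, reduce to 50 dimensions with PCA, then embed in 2-D with t-SNE."""
    # Imported here so the helpers above work without scikit-learn installed.
    from sklearn.manifold import TSNE

    reduced = pca_reduce(impute_gaussian(X, seed), n_components=50)
    return TSNE(n_components=2, random_state=seed).fit_transform(reduced)
```

Because both the imputation and t-SNE are stochastic, fixing the seed is what makes a given scatterplot layout reproducible in this sketch.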
Supplementary Figure 1. A visual explanation of the features comprising our
aggregated protein change profiles. For each perturbation, a z-score representing the relative
change between the perturbation and the reference wild-type is generated for 50 features for
each protein. Each feature measures an aspect of GFP distribution, averaged over 1 of 10 bins
of cells designed to track cell cycle stage and differentiate mother and bud cell types. These
z-scores comprise a protein change profile, and these profiles are appended to each other across
multiple perturbations.
References
1. Cyert MS. Regulation of nuclear localization during signaling. J Biol Chem. American Society for Biochemistry and Molecular Biology; 2001 Jun 15;276(24):20805–8.
2. Bauer NC, Doetsch PW, Corbett AH. Mechanisms Regulating Protein Localization. Traffic. 2015 Oct;16(10):1039–61.
3. Protter DSW, Parker R. Principles and Properties of Stress Granules. Trends Cell Biol. 2016;26(9):668–79.
4. Yuet KP, Tirrell DA. Chemical tools for temporally and spatially resolved mass spectrometry-based proteomics. Ann Biomed Eng. NIH Public Access; 2014 Feb;42(2):299–311.
5. Dephoure N, Gygi SP. Hyperplexing: A Method for Higher-Order Multiplexed Quantitative Proteomics Provides a Map of the Dynamic Response to Rapamycin in Yeast. Sci Signal. 2012 Mar 27;5(217):rs2–rs2.
6. Nagaraj N, Alexander Kulak N, Cox J, Neuhauser N, Mayr K, Hoerning O, et al. System-wide Perturbation Analysis with Nearly Complete Coverage of the Yeast Proteome by Single-shot Ultra HPLC Runs on a Bench Top Orbitrap. Mol Cell Proteomics. 2012 Mar 1;11(3):M111.013722–M111.013722.
7. Mattiazzi Usaj M, Styles EB, Verster AJ, Friesen H, Boone C, Andrews BJ. High-Content Screening for Quantitative Cell Biology. Trends Cell Biol. 2016;26(8):598–611.
8. Caicedo JC, Singh S, Carpenter AE. Applications in image-based profiling of perturbations. Curr Opin Biotechnol. 2016;39:134–42.
9. Koh JLY, Chong YT, Friesen H, Moses A, Boone C, Andrews BJ, et al. CYCLoPs: A Comprehensive Database Constructed from Automated Analysis of Protein Abundance and Subcellular Localization Patterns in Saccharomyces cerevisiae. G3 (Bethesda). 2015 Jun;5(6):1223–32.
10. Riffle M, Davis TN. The Yeast Resource Center Public Image Repository: A large database of fluorescence microscopy images. BMC Bioinformatics. 2010 Jan;11:263.
11. Breker M, Gymrek M, Moldavski O, Schuldiner M. LoQAtE—Localization and Quantitation ATlas of the yeast proteomE. A new tool for multiparametric dissection of single-protein behavior in response to biological perturbations in yeast. Nucleic Acids Res. 2014 Jan;42(D1):D726–30.
12. Chong YT, Koh JLY, Friesen H, Kaluarachchi Duffy S, Cox MJ, Moses A, et al. Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis. Cell. 2015 Jun;161(6):1413–24.
13. Tkach JM, Yimit A, Lee AY, Riffle M, Costanzo M, Jaschob D, et al. Dissecting DNA damage response pathways by analysing protein localization and abundance changes during DNA replication stress. Nat Cell Biol. Nature Publishing Group; 2012 Sep 29;14(9):966–76.
14. Kraus OZ, Grys BT, Ba J, Chong Y, Frey BJ, Boone C, et al. Automated analysis of high‐content microscopy data with deep learning. Mol Syst Biol. 2017 Apr 18;13(4).
15. Breker M, Gymrek M, Schuldiner M. A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J Cell Biol. 2013 Mar 18;200(6):839–50.
16. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. National Academy of Sciences; 1998 Dec 8;95(25):14863–8.
17. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. American Society for Cell Biology; 2000 Dec;11(12):4241–57.
18. Boisvert F-M, Lam YW, Lamont D, Lamond AI. A quantitative proteomics analysis of subcellular proteome localization and changes induced by DNA damage. Mol Cell Proteomics. American Society for Biochemistry and Molecular Biology; 2010 Mar;9(3):457–70.
19. Lu AX, Moses AM. An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images. PLoS One. 2016;11(7):e0158712.
20. Huh W-K, Falvo J V, Gerke LC, Carroll AS, Howson RW, Weissman JS, et al. Global analysis of protein localization in budding yeast. Nature. 2003 Oct 16;425(6959):686–91.
21. Handfield L-F, Chong YT, Simmons J, Andrews BJ, Moses AM. Unsupervised clustering of subcellular protein expression patterns in high-throughput microscopy images reveals protein complexes and functional relationships between proteins. PLoS Comput Biol. 2013 Jan;9(6):e1003085.
22. Loewith R, Hall MN. Target of rapamycin (TOR) in nutrient signaling and growth control. Genetics. Genetics Society of America; 2011 Dec;189(4):1177–201.
23. Koç A, Wheeler LJ, Mathews CK, Merrill GF. Hydroxyurea arrests DNA replication by a mechanism that preserves basal dNTP pools. J Biol Chem. American Society for Biochemistry and Molecular Biology; 2004 Jan 2;279(1):223–30.
24. Lee K, Sung M-K, Kim J, Kim K, Byun J, Paik H, et al. Proteome-wide remodeling of protein location and function by stress. Proc Natl Acad Sci U S A. National Academy of Sciences; 2014 Jul 29;111(30):E3157–66.
25. Liko D, Conway MK, Grunwald DS, Heideman W. Stb3 Plays a Role in the Glucose-Induced Transition from Quiescence to Growth in Saccharomyces cerevisiae. Genetics. 2010 Jul 1;185(3):797–810.
26. Suzuki K, Kamada Y, Ohsumi Y. Studies of cargo delivery to the vacuole mediated by autophagosomes in Saccharomyces cerevisiae. Dev Cell. 2002 Dec;3(6):815–24.
27. Day A, Schneider C, Schneider BL. Yeast cell synchronization. Methods Mol Biol. 2004;241:55–76.
28. Udden MM, Finkelstein DB. Reaction order of Saccharomyces cerevisiae alpha-factor-mediated cell cycle arrest and mating inhibition. J Bacteriol. American Society for Microbiology (ASM); 1978 Mar;133(3):1501–7.
29. Nguyen VQ, Co C, Irie K, Li JJ. Clb/Cdc28 kinases promote nuclear export of the replication initiator proteins Mcm2-7. Curr Biol. 2000 Feb 24;10(4):195–205.
30. Butler AR, White JH, Folawiyo Y, Edlin A, Gardiner D, Stark MJ. Two Saccharomyces cerevisiae genes which control sensitivity to G1 arrest induced by Kluyveromyces lactis toxin. Mol Cell Biol. 1994 Sep;14(9):6306–16.
31. Lahav R, Gammie A, Tavazoie S, Rose MD. Role of Transcription Factor Kar4 in Regulating Downstream Events in the Saccharomyces cerevisiae Pheromone Response Pathway. Mol Cell Biol. 2007 Feb 1;27(3):818–29.
32. Carbó N, Tarkowski N, Ipiña EP, Dawson SP, Aguilar PS. Sexual pheromone modulates the frequency of cytosolic Ca(2+) bursts in Saccharomyces cerevisiae. Mol Biol Cell. American Society for Cell Biology; 2017 Feb 15;28(4):501–10.
33. Alvers AL, Wood MS, Hu D, Kaywell AC, Dunn WA, Aris JP, et al. Autophagy is required for extension of yeast chronological life span by rapamycin. Autophagy. NIH Public Access; 2009 Aug;5(6):847–9.
34. Jacquet M, Renault G, Lallet S, De Mey J, Goldbeter A. Oscillatory nucleocytoplasmic shuttling of the general stress response transcriptional activators Msn2 and Msn4 in Saccharomyces cerevisiae. J Cell Biol. 2003 May 12;161(3):497–505.
35. Hall MN, Beck T. The TOR signalling pathway controls nuclear localization of nutrient-regulated transcription factors. Nature. Nature Publishing Group; 1999 Dec 9;402(6762):689–92.
36. Dalal CK, Cai L, Lin Y, Rahbar K, Elowitz MB. Pulsatile dynamics in the yeast proteome. Curr Biol. NIH Public Access; 2014 Sep 22;24(18):2189–94.
37. Lu H, Zhu Y, Xiong J, Wang R, Jia Z. Potential extra-ribosomal functions of ribosomal proteins in Saccharomyces cerevisiae. Microbiol Res. 2015;177:28–33.
38. Fernandez-Pevida A, Rodriguez-Galan O, Diaz-Quintana A, Kressler D, de la Cruz J. Yeast Ribosomal Protein L40 Assembles Late into Precursor 60 S Ribosomes and Is Required for Their Cytoplasmic Maturation. J Biol Chem. 2012 Nov 2;287(45):38390–407.
39. Stauffer B, Powers T. Target of rapamycin signaling mediates vacuolar fission caused by endoplasmic reticulum stress in Saccharomyces cerevisiae. Mol Biol Cell. American Society for Cell Biology; 2015 Dec 15;26(25):4618–30.
40. Oughtred R, Chatr-aryamontri A, Breitkreutz B-J, Chang CS, Rust JM, Theesfeld CL, et al. BioGRID: A Resource for Studying Biological Interactions in Yeast. Cold Spring Harb Protoc. 2016 Jan 4;2016(1):pdb.top080754.
41. Nolan S, Cowan AE, Koppel DE, Jin H, Grote E. FUS1 regulates the opening and expansion of fusion pores between mating yeast. Mol Biol Cell. American Society for Cell Biology; 2006 May;17(5):2439–50.
42. Seymour IJ, Piper PW. Stress induction of HSP30, the plasma membrane heat shock protein gene of Saccharomyces cerevisiae, appears not to use known stress-regulated transcription factors. Microbiology. 1999 Jan 1;145(1):231–9.
43. Pahlman A-K, Granath K, Ansell R, Hohmann S, Adler L. The Yeast Glycerol 3-Phosphatases Gpp1p and Gpp2p Are Required for Glycerol Biosynthesis and Differentially Involved in the Cellular Responses to Osmotic, Anaerobic, and Oxidative Stress. J Biol Chem. 2001 Feb 2;276(5):3555–63.
44. Heier C, Taschler U, Rengachari S, Oberer M, Wolinski H, Natter K, et al. Identification of Yju3p as functional orthologue of mammalian monoglyceride lipase in the yeast Saccharomyces cerevisiae. Biochim Biophys Acta. Elsevier; 2010 Sep;1801(9):1063–71.
45. McMillan JN, Longtine MS, Sia RA, Theesfeld CL, Bardes ES, Pringle JR, et al. The morphogenesis checkpoint in Saccharomyces cerevisiae: cell cycle control of Swe1p degradation by Hsl1p and Hsl7p. Mol Cell Biol. 1999 Oct;19(10):6929–39.
46. Gardy JL, Spencer C, Wang K, Ester M, Tusnády GE, Simon I, et al. PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res. 2003 Jul 1;31(13):3613–7.
47. Horton P, Park K-J, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007 May 8;35(Web Server):W585–7.
48. Lee K, Kim D-W, Na D, Lee KH, Lee D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res. 2006 Oct;34(17):4655–66.
49. Leonetti MD, Sekine S, Kamiyama D, Weissman JS, Huang B. A scalable strategy for high-throughput GFP tagging of endogenous human proteins. Proc Natl Acad Sci. 2016 Jun 21;113(25):E3501–8.
50. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Tissue-based map of the human proteome. Science. 2015 Jan 23;347(6220):1260419–1260419.
51. Kustatscher G, Rappsilber J. Compositional Dynamics: Defining the Fuzzy Cell. Trends Cell Biol. Elsevier; 2016 Nov;26(11):800–3.
52. de Hoon MJL, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004 Jun 12;20(9):1453–4.
53. Saldanha AJ. Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004 Nov 22;20(17):3246–8.
54. Consortium TGO. Gene Ontology Consortium: going forward. Nucleic Acids Res. Oxford University Press; 2015 Jan 28;43(D1):D1049–56.
55. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. JSTOR; 1995;57(1):289–300.
56. Balakrishnan R, Park J, Karra K, Hitz BC, Binkley G, Hong EL, et al. YeastMine--an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database (Oxford). Oxford University Press; 2012;2012:bar062.
57. Bernstein BE, Tong JK, Schreiber SL. Genomewide studies of histone deacetylase function in yeast. Proc Natl Acad Sci. 2000 Dec 5;97(25):13708–13.
58. Dubacq C, Chevalier A, Courbeyrette R, Petat C, Gidrol X, Mann C. Role of the iron mobilization and oxidative stress regulons in the genomic response of yeast to hydroxyurea. Mol Genet Genomics. 2006 Feb;275(2):114–24.
59. Hardwick JS, Kuruvilla FG, Tong JK, Shamji AF, Schreiber SL. Rapamycin-modulated transcription defines the subset of nutrient-sensitive signaling pathways directly controlled by the Tor proteins. Proc Natl Acad Sci U S A. National Academy of Sciences; 1999 Dec 21;96(26):14866–70.
60. Roberts CJ, Nelson B, Marton MJ, Stoughton R, Meyer MR, Bennett HA, et al. Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science. 2000 Feb 4;287(5454):873–80.
61. Maaten L van der, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res. 2008;9(Nov):2579–605.