
Mapping the sub-cellular proteome

Computational analyses of high-throughput mass spectrometry-based spatial proteomics data

Laurent Gatto, lg390@cam.ac.uk, @lgatt0

Computational Proteomics Unit, http://cpu.sysbiol.cam.ac.uk/

http://lgatto.github.io/

(Slides @ http://goo.gl/SZRMjg)

14 Oct 2015, CCBI

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

Regulations

Cell organisation

Spatial proteomics is the systematic study of protein localisations.

Image from Wikipedia http://en.wikipedia.org/wiki/Cell_(biology).

Spatial proteomics - Why?

Mis-localisation: disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.

- Abnormal protein localisation leading to loss-of-function effects in diseases (Laurila and Vihinen, 2009).
- Disruption of nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al., 2004).

Re-localisation in:

- Differentiation: Tfe3 in mouse ESCs (Betschinger et al., 2013).
- Metabolism: changes in carbon sources, elemental limitations.


Spatial proteomics - How, experimentally

[Figure: single-cell direct observation by tagging (GFP, epitope, protein-specific antibody) versus population-level subcellular fractionation followed by quantitative mass spectrometry (cataloguing or relative abundance). By number of fractions: 1 fraction, pure fraction catalogue; 2 fractions (enriched and crude), subtractive proteomics (enrichment); n discrete fractions, invariant rich fraction (clustering); n continuous fractions (gradient approaches), PCP (χ²) and LOPIT (PCA, PLS-DA).]

Figure: Organelle proteomics approaches (Gatto et al., 2010)

Fusion proteins and immunofluorescence

Figure: Example of discrepancies between IF and FPs, as well as between FP tagging at the N and C termini (Stadler et al., 2013).

Gradient approaches (n continuous fractions): Dunkley et al. (2006), Foster et al. (2006).

⇒ Explorative/discovery approaches, global localisation maps.

[Figure: experimental workflow: cell lysis, fractionation/centrifugation (e.g. Mitochondrion), quantitation/identification by mass spectrometry.]


Quantitation data and organelle markers

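$$\begin{array}{c|cccc|c} & \text{Fraction}_1 & \text{Fraction}_2 & \cdots & \text{Fraction}_m & \text{markers} \\ \hline p_1 & q_{1,1} & q_{1,2} & \cdots & q_{1,m} & \text{unknown} \\ p_2 & q_{2,1} & q_{2,2} & \cdots & q_{2,m} & \text{loc}_1 \\ p_3 & q_{3,1} & q_{3,2} & \cdots & q_{3,m} & \text{unknown} \\ p_4 & q_{4,1} & q_{4,2} & \cdots & q_{4,m} & \text{loc}_i \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ p_j & q_{j,1} & q_{j,2} & \cdots & q_{j,m} & \text{unknown} \end{array}$$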
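The markers column is what separates the training data (proteins with a known localisation) from the proteins to be classified. A minimal Python sketch of that split, assuming a plain dictionary of hypothetical profiles and labels (an illustration only, not the data structure used by pRoloc):

```python
# Hypothetical toy data: protein -> (fraction profile q, marker label).
# "unknown" marks proteins whose localisation is to be predicted.
data = {
    "p1": ([0.10, 0.30, 0.60], "unknown"),
    "p2": ([0.70, 0.20, 0.10], "loc1"),
    "p3": ([0.20, 0.20, 0.60], "unknown"),
    "p4": ([0.05, 0.15, 0.80], "loc2"),
}


def split_markers(data, unknown="unknown"):
    """Split proteins into labelled markers (training set) and unlabelled ones."""
    markers = {p: (q, y) for p, (q, y) in data.items() if y != unknown}
    unlabelled = {p: q for p, (q, y) in data.items() if y == unknown}
    return markers, unlabelled


markers, unlabelled = split_markers(data)
print(sorted(markers))     # ['p2', 'p4']
print(sorted(unlabelled))  # ['p1', 'p3']
```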

Annotated data sets

- Several mouse E14TG2a embryonic stem cell data sets.
- Human Embryonic Kidney fibroblast cells.
- The Arabidopsis AT_CHLORO database (Ferro et al., 2010).
- Mouse organs (Foster et al., 2006).
- Arabidopsis from callus (Dunkley et al., 2006; Nikolovski et al., 2014) and roots (Groen et al., 2014).
- Drosophila embryos (Tan et al., 2009).
- Chicken DT40 lymphocyte cells (Hall et al., 2009).
- ...

Available in the pRolocdata experiment package.

[Figure: number of PMIDs of spatial/organelle(s) proteomics papers per year, 1996 to 2013.]

Visualisation and classification

[Figure: correlation profiles across fractions for ER, Golgi, mit/plastid, PM and vacuole markers, and a principal component analysis (PC1 vs PC2) of the same data; legend: ER, Golgi, mit/plastid, PM, vacuole, marker, PLS-DA, unknown.]

Figure: From Gatto et al. (2010), Arabidopsis thaliana data from Dunkley et al. (2006).

Data analysis

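$$\begin{array}{c|cccc|c} & \text{Fraction}_1 & \text{Fraction}_2 & \cdots & \text{Fraction}_m & \text{markers} \\ \hline \text{prot}_1 & q_{1,1} & q_{1,2} & \cdots & q_{1,m} & \text{organelle}_1 \\ \text{prot}_2 & q_{2,1} & q_{2,2} & \cdots & q_{2,m} & \text{unknown} \\ \text{prot}_3 & q_{3,1} & q_{3,2} & \cdots & q_{3,m} & \text{organelle}_2 \\ \text{prot}_4 & q_{4,1} & q_{4,2} & \cdots & q_{4,m} & \vdots \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \text{prot}_i & q_{i,1} & q_{i,2} & \cdots & q_{i,m} & \text{organelle}_k \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \text{prot}_n & q_{n,1} & q_{n,2} & \cdots & q_{n,m} & \text{unknown} \end{array}$$

[Figure: the same proteins ($\text{prot}_1, \ldots, \text{prot}_n$) by fractions ($\text{Fraction}_1, \ldots, \text{Fraction}_m$) matrix, without the markers column.]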

[Figure: principal component analysis plot of the data, PC1 (64.36%) vs PC2 (22.34%).]
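PCA projects each protein's fraction profile onto the directions of largest variance; the percentages in the axis labels are the share of the total variance each component explains. A dependency-free teaching sketch using power iteration on the covariance matrix (an illustrative choice; R and pRoloc use their own PCA routines), run on made-up two-dimensional data:

```python
import math


def pca_2d(rows, n_iter=500):
    """Project rows onto their first two principal components.

    Teaching sketch: centre the data, build the sample covariance matrix and
    extract the two leading eigenvectors by power iteration with deflation.
    Returns (scores, percent_variance_explained, components).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_var = sum(cov[a][a] for a in range(m))  # trace = total variance

    components, variances = [], []
    for _ in range(min(2, m)):
        start = [1.0 + j for j in range(m)]  # arbitrary non-symmetric start
        norm0 = math.sqrt(sum(s * s for s in start))
        v = [s / norm0 for s in start]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance along v (the eigenvalue)
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
        components.append(v)
        variances.append(lam)
        # Deflation: remove this component before finding the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)]
               for a in range(m)]

    scores = [[sum(x[j] * v[j] for j in range(m)) for v in components] for x in X]
    percent = [100.0 * lam / total_var for lam in variances]
    return scores, percent, components


rows = [[0, 0], [1, 1], [2, 2], [3, 3], [1, 2], [2, 1]]
scores, percent, comps = pca_2d(rows)
print([round(p, 2) for p in percent])  # [90.91, 9.09]
```

With real data each row would be one protein's profile across the m fractions, and the two score columns give the PC1/PC2 coordinates that are plotted.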

Supervised machine learning

Using labelled marker proteins to match unlabelled proteins (of unknown localisation) with similar profiles and classify them as residents of the corresponding marker organelle class.
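k-nearest neighbours (kNN) is one simple way to perform this matching: an unlabelled profile takes the majority class of its closest markers. A minimal sketch with hypothetical three-fraction marker profiles, Euclidean distance and a plain majority vote (profiles, labels and k are made up for illustration):

```python
import math
from collections import Counter


def knn_classify(markers, query, k=3):
    """Assign `query` to the majority organelle among its k nearest markers.

    markers: list of (profile, organelle) pairs; distances are Euclidean.
    """
    nearest = sorted((math.dist(profile, query), label) for profile, label in markers)
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical marker profiles over three fractions
markers = [
    ([0.80, 0.10, 0.10], "ER"),
    ([0.70, 0.20, 0.10], "ER"),
    ([0.75, 0.15, 0.10], "ER"),
    ([0.10, 0.10, 0.80], "PM"),
    ([0.20, 0.10, 0.70], "PM"),
]
print(knn_classify(markers, [0.72, 0.18, 0.10]))  # ER
print(knn_classify(markers, [0.15, 0.10, 0.75]))  # PM
```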

Current approaches - supervised ML

[Figure: SVM parameter optimisation: F1 score (0.5 to 1.0) as a function of sigma (0.01 to 1000) for cost values from 0.0625 to 16, and PCA plots (PC1 64.36%, PC2 22.34%) of the classification results obtained with optimised parameters and with wrong parameters.]

Figure: Support vector machine classifier with a radial basis function kernel, using the pRoloc Bioconductor package (Gatto et al., 2014; www.bioconductor.org/packages/release/bioc/html/pRoloc.html).
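The sigma parameter controls how quickly kernel similarity decays with distance, which is why poorly chosen parameters give a very different classification. A small sketch of an RBF kernel, assuming the parameterisation k(x, y) = exp(−sigma·‖x − y‖²) used, for example, by kernlab's `rbfdot` (whether pRoloc's sigma maps onto exactly this form is an assumption here). The cost parameter, which penalises misclassified training examples, is not shown:

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel, assumed form exp(-sigma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


# Two hypothetical, fairly similar fraction profiles
x = [0.8, 0.1, 0.1]
y = [0.7, 0.2, 0.1]
sims = {sigma: rbf_kernel(x, y, sigma) for sigma in (0.01, 1, 100, 1000)}
for sigma, k in sims.items():
    print(f"sigma={sigma:>7}: k(x, y) = {k:.3g}")
```

With a very small sigma every pair of profiles looks nearly identical; with a very large sigma even close profiles look unrelated, so the classifier can only memorise its training markers.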

[Figure: F1 scores (0.5 to 1.0) of the knn, nb, nnet, plsda, rf and svm classifiers on the Tan, Tan.PD, Dunkley, Dunkley.PD, Andy, Andy.PD, AT_CHLORO, Nikolovski and Nikolovski.Imp data sets.]

Figure: Comparing classifiers
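Classifier performance in such comparisons is summarised by an F1 score. Assuming the figure reports the macro-averaged F1 (the unweighted mean of per-organelle F1 scores, so small organelles count as much as large ones), it can be computed as follows; the labels are hypothetical:

```python
def macro_f1(truth, predicted):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(truth) | set(predicted))
    pairs = list(zip(truth, predicted))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


truth = ["ER", "ER", "PM", "PM", "mito"]
predicted = ["ER", "PM", "PM", "PM", "mito"]
print(round(macro_f1(truth, predicted), 3))  # 0.822
```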

Limitations

[Figure: PCA plot, PC1 (58.53%) vs PC2 (29.96%); legend: ER/Golgi, mitochondrion, PM, unknown.]

Incomplete annotation, and therefore lack of training data, for many/most organelles. Drosophila data from Tan et al. (2009).

Novelty detection

[Figure, left: PCA plot, PC1 (58.53%) vs PC2 (29.96%); legend: ER/Golgi, mitochondrion, PM, unknown. Right: the same PCA plot after semi-supervised learning; legend: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S.]

Figure: Left: Drosophila data from Tan et al. (2009). Right: semi-supervised learning, Breckels et al. (2013).


Input data: $D = (D_L, D_U)$.

Phenotype modelling: select $D_L^i$ and model $F = D_L^i \cup D_U$ using a GMM (number of clusters estimated using BIC).

Get candidates: members of $D_U$ clustered with $D_L^i$ are considered candidates of class $i$.

Each candidate is tested against an outlier detection algorithm. Candidates classified as members of $i$ are merged with $D_L^i$; those that are rejected are returned to $D_U$.

Proceed to the next class $i$ until all classes have been considered; repeat the whole procedure N times.

Update classes: examples in $D_U$ that are consistently accepted into a single class $i$ are labelled as members of $D_L^i$.

New phenotype: any examples of $D_U$ not merged with any $D_L^i$ and which are consistently clustered together throughout the N iterations are considered members of a new phenotype.

Output: return the unassigned examples, the new $D_L^i$ members and the new phenotypes.
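The "new phenotype" step relies on examples being clustered together consistently across the N iterations. A minimal sketch of that bookkeeping only, with hypothetical cluster assignments; the GMM/BIC clustering and the outlier test are not reproduced, and `consistent_groups` is an illustrative helper rather than the published implementation:

```python
from collections import defaultdict


def consistent_groups(assignments, min_size=2):
    """Group examples that share a cluster in every one of the N iterations.

    assignments: example id -> list of cluster ids, one per iteration.
    Cluster ids only need to be meaningful within a single iteration.
    Returns sorted groups with at least `min_size` members.
    """
    by_signature = defaultdict(list)
    for example, clusters in assignments.items():
        # Identical signatures mean "always clustered together"
        by_signature[tuple(clusters)].append(example)
    groups = [sorted(members) for members in by_signature.values()
              if len(members) >= min_size]
    return sorted(groups)


# Hypothetical cluster ids of unassigned examples over N = 3 iterations
assignments = {
    "a": [1, 4, 2], "b": [1, 4, 2], "c": [1, 5, 2],
    "d": [3, 6, 7], "e": [3, 6, 7], "f": [3, 6, 8],
}
print(consistent_groups(assignments))  # [['a', 'b'], ['d', 'e']]
```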


What about annotation data from repositories such as GO, sequence features, signal peptides, transmembrane domains, images, ...?

- From a user perspective: "free/cheap" vs. expensive
- Abundant (all proteins, 100s of features) vs. (experimentally) limited/targeted (1000s of proteins, 6 to 20 features)
- For localisation in the system at hand: low vs. high quality
- Static vs. dynamic

Number of GO features ≫ number of experimental fractions ⇒ dilution of the experimental data
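The dilution argument can be made concrete with squared Euclidean distances on concatenated feature vectors: a handful of normalised fraction intensities are swamped by hundreds of binary GO features. In this sketch the two primary profiles reuse the O00767 and Q9UKV5 values from the example primary data in this talk, while the GO vectors (500 terms, 20 differences) are hypothetical:

```python
def distance_parts(x_primary, y_primary, x_aux, y_aux):
    """Squared Euclidean distance on concatenated features, split by source."""
    d_primary = sum((a - b) ** 2 for a, b in zip(x_primary, y_primary))
    d_aux = sum((a - b) ** 2 for a, b in zip(x_aux, y_aux))
    return d_primary, d_aux


# Primary data: two 8-channel profiles (O00767 and Q9UKV5 example values)
x_primary = [0.1361, 0.150, 0.1062, 0.147, 0.277, 0.1429, 0.0380, 0.00338]
y_primary = [0.0939, 0.207, 0.0419, 0.204, 0.344, 0.1098, 0.0000, 0.00000]
# Auxiliary data: hypothetical binary GO vectors, 500 terms, 20 differences
x_aux = [1] * 20 + [0] * 480
y_aux = [0] * 500

d_p, d_a = distance_parts(x_primary, y_primary, x_aux, y_aux)
share = d_p / (d_p + d_a)
print(f"primary: {d_p:.4f}, auxiliary: {d_a}, primary share: {share:.2%}")
```

Here the experimental data contribute roughly 0.1% of the squared distance, which motivates combining the two sources through separate neighbour votes rather than by simple concatenation.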


Goal: support/complement the primary target domain (experimental data) with auxiliary data (annotation) features without compromising the integrity of our primary data.

Updated experimental design for:

- primary/experimental data, and
- auxiliary/annotation data.

Learning from heterogeneous data sources: an application in spatial proteomics. Breckels LM, Holden S, Wojnar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS and Gatto L. bioRxiv pre-print http://dx.doi.org/10.1101/022152.

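[Figure: primary experimental data are obtained by cell lysis, fractionation/centrifugation (e.g. Mitochondrion) and quantitation/identification by mass spectrometry; auxiliary data by a database query, extraction of GO CC terms and conversion of the terms to binary. Both data sets are then visualised.]

Auxiliary data (binary matrix, proteins $x_1, \ldots, x_n$ by GO terms $\text{GO}_1, \ldots, \text{GO}_A$):

$$\begin{array}{c|cccc} & \text{GO:0016021} & \text{GO:0005789} & \text{GO:0005783} & \cdots \\ \hline \text{O00767} & 1 & 1 & 1 & \cdots \\ \text{P51648} & 1 & 1 & 0 & \cdots \\ \text{Q2TAA5} & 1 & 1 & 0 & \cdots \\ \text{Q9UKV5} & 0 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{array}$$

Primary experimental data (proteins $x_1, \ldots, x_n$ by quantitation channels):

$$\begin{array}{c|cccccccc} & \text{X113} & \text{X114} & \text{X115} & \text{X116} & \text{X117} & \text{X118} & \text{X119} & \text{X121} \\ \hline \text{O00767} & 0.1361 & 0.150 & 0.1062 & 0.147 & 0.277 & 0.1429 & 0.0380 & 0.00338 \\ \text{P51648} & 0.1914 & 0.205 & 0.0566 & 0.165 & 0.237 & 0.0996 & 0.0180 & 0.02727 \\ \text{Q2TAA5} & 0.1297 & 0.201 & 0.0546 & 0.146 & 0.292 & 0.1463 & 0.0206 & 0.00902 \\ \text{Q9UKV5} & 0.0939 & 0.207 & 0.0419 & 0.204 & 0.344 & 0.1098 & 0.0000 & 0.00000 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{array}$$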
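Converting GO cellular component annotations into the binary auxiliary matrix is a one-hot encoding of each protein's term set. A sketch assuming the annotations have already been retrieved into a dictionary of protein accession to GO CC terms (the database query itself is not shown), restricted to the three terms visible in the example table; note that the helper sorts the columns by GO id:

```python
def go_binary_matrix(annotations):
    """One-hot encode protein -> set of GO CC terms.

    Returns (terms, proteins, matrix); matrix[i][j] is 1 when protein i is
    annotated with term j, else 0. Columns are sorted GO ids.
    """
    terms = sorted(set().union(*annotations.values()))
    proteins = list(annotations)
    matrix = [[1 if t in annotations[p] else 0 for t in terms] for p in proteins]
    return terms, proteins, matrix


# Annotations restricted to the three terms shown in the example table
annotations = {
    "O00767": {"GO:0016021", "GO:0005789", "GO:0005783"},
    "P51648": {"GO:0016021", "GO:0005789"},
    "Q2TAA5": {"GO:0016021", "GO:0005789"},
    "Q9UKV5": set(),
}
terms, proteins, matrix = go_binary_matrix(annotations)
print(terms)
for p, row in zip(proteins, matrix):
    print(p, row)
```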

[Figure: PCA plot, PC1 (40.28%) vs PC2 (25.7%); legend: 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus - Chromatin, Nucleus - Nucleolus, Plasma membrane, Proteasome, unknown.]

Data from mouse stem cells (E14TG2a)

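We use a class-weighted kNN transfer learning algorithm to combine primary and auxiliary data, based on Wu and Dietterich (2004):

$$V(c_i)_j = \theta^*_i \, n^P_{j,i} + (1 - \theta^*_i) \, n^A_{j,i}$$

Classes and weights: $C = \{c_1, \ldots, c_l\}$; $\Theta = \{0, 0.5, 1\}$.

Primary data, with $k^P$ nearest neighbours:

$$L^P = \begin{pmatrix} q_{1,1} & q_{1,2} & \cdots & q_{1,m} \\ q_{2,1} & q_{2,2} & \cdots & q_{2,m} \\ \vdots & \vdots & & \vdots \\ q_{j,1} & q_{j,2} & \cdots & q_{j,m} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}$$

Auxiliary data, with $k^A$ nearest neighbours:

$$L^A = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & & \vdots \\ b_{j,1} & b_{j,2} & \cdots & b_{j,n} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}$$

Neighbour matrices (rows: proteins; columns: classes $c_1, \ldots, c_l$):

$$N^P = \begin{pmatrix} n^P_{1,1} & \cdots & n^P_{1,l} \\ n^P_{2,1} & \cdots & n^P_{2,l} \\ \vdots & & \vdots \end{pmatrix}; \quad N^A = \begin{pmatrix} n^A_{1,1} & \cdots & n^A_{1,l} \\ n^A_{2,1} & \cdots & n^A_{2,l} \\ \vdots & & \vdots \end{pmatrix}$$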


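Example: two unlabelled proteins, 1 and 2, among markers of classes $c_1$, $c_2$ and $c_3$. Each entry of $N^P$ is the fraction of the protein's nearest labelled neighbours (here $k^P = 3$) that belong to each class:

$$N^P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & 3/3 & 0 & 0 \\ p_2 & 1/3 & 2/3 & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$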
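Each row of a neighbour matrix can be computed directly from the labelled data. A sketch with hypothetical two-dimensional coordinates, chosen so that with k = 3 the two rows reproduce the (3/3, 0, 0) and (1/3, 2/3, 0) example; Euclidean distance is an assumption:

```python
import math
from collections import Counter


def neighbour_proportions(labelled, query, classes, k):
    """One row of a neighbour matrix: the fraction of the k nearest labelled
    examples of `query` that belong to each class (Euclidean distance)."""
    nearest = sorted(labelled, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return [counts[c] / k for c in classes]


# Hypothetical labelled markers in a 2D feature space
labelled = [
    ((0, 0), "c1"), ((0, 1), "c1"), ((1, 0), "c1"),
    ((5, 5), "c2"), ((5, 6), "c2"),
    ((10, 0), "c3"), ((10, 1), "c3"),
]
classes = ["c1", "c2", "c3"]
row1 = neighbour_proportions(labelled, (0.3, 0.3), classes, k=3)
row2 = neighbour_proportions(labelled, (3.5, 3.5), classes, k=3)
print(row1)  # [1.0, 0.0, 0.0]
print(row2)  # proportions 1/3, 2/3, 0
```

The same function applied to the auxiliary features (with $k^A$) would give the rows of $N^A$.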


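Weights matrix (labelled data): each candidate combination of per-class weights from $\Theta$ is evaluated on the labelled data, giving an F1 score; the best-scoring combination is retained as $\theta^*$, here $\theta^* = \{1, 0, 1\}$.

$$\begin{array}{c|ccc|c} & c_1 & c_2 & c_3 & \text{F1} \\ \hline \theta_1 & 0 & 0 & 0 & F1_1 \\ \theta_2 & 0 & 0 & 1 & F1_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \theta_i & \cdots & \cdots & \cdots & F1_i \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ & 1 & 1 & 0 & \\ \theta_{|\Theta|^l} & 1 & 1 & 1 & F1_{|\Theta|^l} \end{array}$$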
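Choosing $\theta^*$ by exhaustive search means evaluating $|\Theta|^l$ combinations: 27 for three classes and three weights, but 3^10 = 59,049 for ten classes. A sketch of the search loop; the scoring function is a hypothetical stand-in for a cross-validated F1 on the labelled markers:

```python
from itertools import product


def best_weights(classes, theta_values, score):
    """Exhaustively search per-class weights in Theta^l, keeping the best.

    score: callable taking {class: theta} and returning a number to
    maximise (for example a cross-validated F1 on labelled markers).
    """
    best, best_score = None, float("-inf")
    for combo in product(theta_values, repeat=len(classes)):
        weights = dict(zip(classes, combo))
        s = score(weights)
        if s > best_score:
            best, best_score = weights, s
    return best, best_score


# Hypothetical scorer standing in for cross-validation: it peaks at {1, 0, 1}
target = {"c1": 1, "c2": 0, "c3": 1}


def toy_score(weights):
    return -sum((weights[c] - target[c]) ** 2 for c in weights)


best, best_score = best_weights(["c1", "c2", "c3"], (0, 0.5, 1), toy_score)
print(best)  # {'c1': 1, 'c2': 0, 'c3': 1}
```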


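Class-weighted classifier (unlabelled data):

$$V(c_i)_j = \theta^*_i \, n^P_{j,i} + (1 - \theta^*_i) \, n^A_{j,i}, \qquad y_j = \arg\max_i V(c_i)_j$$

With $\theta^* = \{1, 0, 1\}$ and the primary neighbour matrix rows $p_1 = (3/3, 0, 0)$ and $p_2 = (1/3, 2/3, 0)$:

$$\begin{aligned} V(c_1)_1 &= 1 \times \tfrac{3}{3} + (1 - 1) \times n^A_{1,1} \\ V(c_2)_1 &= 0 \times 0 + (1 - 0) \times n^A_{1,2} \\ V(c_3)_1 &= 1 \times 0 + (1 - 1) \times n^A_{1,3} \\ V(c_1)_2 &= 1 \times \tfrac{1}{3} + (1 - 1) \times n^A_{2,1} \\ V(c_2)_2 &= 0 \times \tfrac{2}{3} + (1 - 0) \times n^A_{2,2} \\ V(c_3)_2 &= 1 \times 0 + (1 - 1) \times n^A_{2,3} \end{aligned}$$

For classes with $\theta^*_i = 1$ only the primary data vote; for $c_2$ ($\theta^*_2 = 0$) only the auxiliary data vote.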

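Collecting the votes of all unlabelled proteins $j$:

$$\begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline 1 & V(c_1)_1 & V(c_2)_1 & V(c_3)_1 \\ 2 & V(c_1)_2 & V(c_2)_2 & V(c_3)_2 \\ \vdots & \vdots & \vdots & \vdots \\ j & V(c_1)_j & V(c_2)_j & V(c_3)_j \end{array} \qquad y_j = \arg\max_i V(c_i)_j$$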
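Putting the pieces together, the class-weighted vote for one unlabelled protein is a per-class convex combination of its primary and auxiliary neighbour proportions, followed by an argmax. A minimal sketch reusing the worked example ($\theta^* = \{1, 0, 1\}$ and the primary rows for proteins 1 and 2); the auxiliary proportions $n^A$ are hypothetical:

```python
def class_weighted_vote(n_p, n_a, theta, classes):
    """Class-weighted kNN vote for one unlabelled protein j.

    V(c_i)_j = theta_i * nP_ji + (1 - theta_i) * nA_ji, then argmax over i.
    theta_i = 1 uses only primary data for class i, 0 only auxiliary data.
    """
    votes = [t * p + (1 - t) * a for t, p, a in zip(theta, n_p, n_a)]
    best = max(range(len(classes)), key=lambda i: votes[i])
    return classes[best], votes


classes = ["c1", "c2", "c3"]
theta_star = [1, 0, 1]
# Primary neighbour rows from the worked example; auxiliary rows are made up
label1, votes1 = class_weighted_vote([3 / 3, 0, 0], [0.2, 0.4, 0.4], theta_star, classes)
label2, votes2 = class_weighted_vote([1 / 3, 2 / 3, 0], [0.0, 0.8, 0.2], theta_star, classes)
print(label1, votes1)  # c1 [1.0, 0.4, 0.0]
print(label2, votes2)
```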

[Figure, panels A to E: F1 scores of the combined, primary-only and auxiliary-only classifiers for each class (40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus - Chromatin, Nucleus - Nucleolus, Plasma membrane, Proteasome); PCA plots with PC1 (3.43%) vs PC2 (2.08%) and PC1 (40.28%) vs PC2 (25.7%); overall F1 scores of the combined, primary and auxiliary classifiers; and classifier weights (0, 1/3, 2/3, 1) per class.]

Data from mouse stem cells (E14TG2a).

From supervised machine learning (SML) to transfer learning: learn from heterogeneous data sources (experimental spatial proteomics and GO annotation, sequence features, imaging data) to infer localisation more reliably (Breckels et al., 2015).

[Figure: classification scores (0.25 to 1.00) for knn, knn-TL, svm and svm-TL, split by outcome (correct, incorrect).]


Dual-localisation: proteins may be present simultaneously in several organelles (e.g. trafficking).

[Figure: PCA plot, PC1 (64.36%) vs PC2 (22.34%); legend: ER lumen, ER membrane, Golgi, Mitochondrion, Plastid, PM, Ribosome, TGN, vacuole, unknown.]

From Betschinger et al. (2013)

[Figure: PCA plot of mouse ESC (E14TG2a) in serum LIF, PC1 (50.05%) vs PC2 (24.61%); legend: Actin cytoskeleton, Cytosol, Endosome, ER/GA, Extracellular matrix, Lysosome, Mitochondria, Nucleus - Chromatin, Nucleus - Nucleolus, Peroxisome, Plasma Membrane, Proteasome, Ribosome 40S, Ribosome 60S, unknown; Tfe3 highlighted.]


Spatial dynamics

Trans-localisation: changes in localisation upon perturbations.

[Figure: PCA plots. Condition 1: PC1 (43.43%) vs PC2 (39.04%). Condition 2: PC1 (39.04%) vs PC2 (30.9%). Legend: cytoplasm, ER, Golgi, Mitochondrial, Nuclei, Plasma membrane, Proteasome & Ribosome, Vacuole, unknown.]

Spatial dynamics

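$$d_1 = \mathrm{dist}\left(\text{profile}^{\text{condition 1}}_{\text{rep 1}}, \text{profile}^{\text{condition 2}}_{\text{rep 1}}\right)$$

$$d_2 = \mathrm{dist}\left(\text{profile}^{\text{condition 1}}_{\text{rep 2}}, \text{profile}^{\text{condition 2}}_{\text{rep 2}}\right)$$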

[Figure: $\log_2(d_1/d_2)$ (−3 to 3) against $(d_1 + d_2)/2$ (0 to 1.5) for each protein.]
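The $d_1$ and $d_2$ distances can be computed per protein from replicated profiles. A sketch assuming Euclidean distance (the slide only writes dist) and made-up three-fraction profiles. One plausible reading of the plot, also an assumption, is that a large $(d_1 + d_2)/2$ with $\log_2(d_1/d_2)$ near 0 marks a change reproduced in both replicates, while a large $|\log_2(d_1/d_2)|$ flags replicate disagreement:

```python
import math


def translocation_stats(profiles):
    """Per protein: mean and log2 ratio of between-condition distances.

    profiles: protein -> {(condition, replicate): profile} for conditions
    1, 2 and replicates 1, 2. d1 compares conditions in replicate 1, d2 in
    replicate 2. Returns protein -> ((d1 + d2) / 2, log2(d1 / d2)).
    """
    stats = {}
    for protein, p in profiles.items():
        d1 = math.dist(p[(1, 1)], p[(2, 1)])
        d2 = math.dist(p[(1, 2)], p[(2, 2)])
        ratio = math.log2(d1 / d2) if d1 > 0 and d2 > 0 else float("nan")
        stats[protein] = ((d1 + d2) / 2, ratio)
    return stats


profiles = {
    # A: same large shift in both replicates
    "A": {(1, 1): [0.8, 0.1, 0.1], (2, 1): [0.1, 0.1, 0.8],
          (1, 2): [0.8, 0.1, 0.1], (2, 2): [0.1, 0.1, 0.8]},
    # B: small, consistent change
    "B": {(1, 1): [0.5, 0.3, 0.2], (2, 1): [0.48, 0.32, 0.2],
          (1, 2): [0.5, 0.3, 0.2], (2, 2): [0.52, 0.28, 0.2]},
    # C: shift seen in replicate 1 only
    "C": {(1, 1): [0.8, 0.1, 0.1], (2, 1): [0.1, 0.1, 0.8],
          (1, 2): [0.8, 0.1, 0.1], (2, 2): [0.78, 0.12, 0.1]},
}
for protein, (mean_d, log_ratio) in translocation_stats(profiles).items():
    print(protein, round(mean_d, 3), round(log_ratio, 2))
```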

[Figure: PCA plots of condition 1 (PC1 43.43%, PC2 39.04%) and condition 2 (PC1 39.04%, PC2 30.9%), with five proteins labelled 1 to 5 highlighted in both; legend: cytoplasm, ER, Golgi, Mitochondrial, Nuclei, Plasma membrane, Proteasome & Ribosome, Vacuole, unknown.]

Beyond organelles: application to PPI/Protein complexes

[Figure: PCA plot of protein complex markers, PC1 (47.02%) vs PC2 (22.25%); legend: 14-3-3, 19S, 20S, 40S, 60S, CCT, eIF3, Ku70/Ku80, PA28, Rab, unknown.]

Figure: Data on proteasome complexes from Fabre et al., Mol Syst Biol (2015), DOI: 10.15252/msb.20145497.

Software for mass spectrometry and (spatial) proteomics

Bioconductor: open source, enables reproducible research and understanding of the data (not a black box), and drives scientific innovation.

- MSnbase – infrastructure to handle quantitative data and metadata (Gatto and Lilley, 2012) (~350 unique IP downloads/month).
- pRoloc and pRolocGUI – dedicated visualisation and ML infrastructure for spatial proteomics (Gatto et al., 2014) (~160 unique IP downloads/month in 2014).
- pRolocdata – structured and annotated spatial proteomics data (Gatto et al., 2014).
- And more generally RforProteomics (Gatto and Christoforou, 2014) (~100 unique IP downloads/month in 2014).

J Betschinger, J Nichols, S Dietmann, PD Corrin, PJ Paddison, and A Smith. Exit from pluripotency is gated by intracellular redistribution of the bHLH transcription factor Tfe3. Cell, 153(2):335–47, Apr 2013. doi:10.1016/j.cell.2013.03.012.

LM Breckels, L Gatto, A Christoforou, AJ Groen, KS Lilley, and MW Trotter. The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics, 88:129–40, Aug 2013.

TPJ Dunkley, S Hester, IP Shadforth, J Runions, T Weimar, SL Hanton, JL Griffin, C Bessant, F Brandizzi, C Hawes, RB Watson, P Dupree, and KS Lilley. Mapping the Arabidopsis organelle proteome. PNAS, 103(17):6518–6523, Apr 2006.

LJ Foster, CL de Hoog, Y Zhang, Y Zhang, X Xie, VK Mootha, and M Mann. A mammalian organelle map by protein correlation profiling. Cell, 125(1):187–199, Apr 2006.

L Gatto and A Christoforou. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta, 1844(1 Pt A):42–51, Jan 2014.

L Gatto and KS Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics, 28(2):288–9, Jan 2012.

L Gatto, JA Vizcaino, H Hermjakob, W Huber, and KS Lilley. Organelle proteomics experimental designs and analysis. Proteomics, 2010.

L Gatto, LM Breckels, S Wieczorek, T Burger, and KS Lilley. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics, Jan 2014.

TR Kau, JC Way, and PA Silver. Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer, 4(2):106–17, Feb 2004.

K Laurila and M Vihinen. Prediction of disease-related mutations affecting protein localization. BMC Genomics, 10:122, 2009.

DJL Tan, H Dvinge, A Christoforou, P Bertone, A Arias Martinez, and KS Lilley. Mapping organelle proteins and protein complexes in Drosophila melanogaster. J Proteome Res, 8(6):2667–2678, Jun 2009.

P Wu and TG Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, New York, NY, USA, 2004. ACM.

Acknowledgements

- Lisa Breckels, Computational Proteomics Unit, Cambridge (ML, algorithms)
- Sean Holden, Computer Laboratory, Cambridge (ML)
- Kathryn Lilley, Cambridge Centre for Proteomics (Proteomics)

Funding: BBSRC, PRIME-XS EU FP7, Software Sustainability Institute (SSI)

Slides available at http://goo.gl/SZRMjg, under a CC-BY license.

Thank you for your attention
