Workshop CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets Malgorzata Nowicka University of Zurich BioC 2017: Where Software and Biology Connect Boston, 28 July 2017


Workshop

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

Malgorzata Nowicka
University of Zurich

BioC 2017: Where Software and Biology Connect
Boston, 28 July 2017

Have you analyzed CyTOF or flow cytometry data before?

My approach to this workshop

•  This is a shortened version of the original workflow
•  The text is essentially the same, but try not to focus on it too much now; you can read it carefully later
•  Ideally, I would do live coding, but because of the time constraints I will:
   –  explain all the code in the vignette (HTML file)
   –  copy-paste it into R, to follow along with you and to have the opportunity to modify some things if needed

Introduction

CyTOF (mass cytometry) experiment

•  Measuring ~40 parameters (metals)
•  Hundreds of cells per second

[Figure: CyTOF (mass cytometry) experiment. Figure adapted from Bendall et al. Science 2011]

Introduction

•  "High-dimensional" cytometry because the number of parameters measured per cell is higher than before (from ~12 to ~40)
•  We refer to this differential approach as "classic"
   –  Common strategy:
      •  identify cell populations of interest by manual gating or automated clustering
      •  determine which of the cell subpopulations or protein markers are associated with a phenotype of interest using statistical tests
•  Other approaches:
   –  Citrus (Bruggner et al. 2014)
   –  CellCnn (Arvaniti and Claassen 2016)
•  Hybrid approaches (thanks to the modularity)
•  This workflow can serve as a template

Overview

1.  Data description
2.  Data pre-processing (CATALYST package)
3.  Data import (flowCore package)
4.  Data transformation: arcsinh with cofactor 5
5.  Spot checks
    –  MDS plots (limma package)
6.  Cell clustering with the FlowSOM and ConsensusClusterPlus packages
    –  over-clustering + manual merging
7.  Visualization of the clusters with
    –  t-SNE (Rtsne package)
    –  heatmaps (pheatmap package)
    –  ggplot2 package
8.  Differential analysis with linear mixed models (lme4 and multcomp packages)
    –  differential cell population abundance
    –  differential marker expression

Data description

•  Subset of the data from the Bodenmiller et al. 2012 study, which used mass cytometry to measure PBMCs
   –  from 8 different patients
   –  after 11 different stimulation conditions as well as in an unstimulated "Reference" state
•  Here, we use the samples that correspond to the BCR-crosslinking (BCRXL) and Reference (Ref) conditions
•  For each sample, 10 cell surface markers and 14 signaling markers were measured

Data pre-processing

•  The pre-processing of the Bodenmiller data, in the form in which it can be downloaded, included removal of debris and de-barcoding
•  In general, pre-processing steps may involve:
   –  normalization using bead standards
   –  de-barcoding
   –  compensation
•  CATALYST package

Data import

•  All the data is available on the Robinson Lab server: http://imlspenticton.uzh.ch/robinson_lab/cytofWorkflow/
•  Downloaded with the download.file() function:
   –  Metadata
   –  FCS files
   –  Panel file
   –  Files with cluster merging

Data import: expr

[Schematic of expr: one row per cell, one column per marker, with the cells of all samples stacked]

Columns (markers): M1, M2, M3, …
Rows (cells), grouped by sample:
   S1: c1, c2, c3, c4
   S2: c5, c6, c7
   S3: c8, c9, c10, c11, c12

Data transformation

•  Skewed distributions
   –  Hard to distinguish between positive and negative populations
•  "Magic" transformation: arcsinh (inverse hyperbolic sine) with cofactor 5
   –  approximately linear for low expression values,
   –  approximately logarithmic for high expression values,
   –  defined for negative and zero values (they do not have to be excluded from the analysis)

[Figure: mass cytometry data on a linear scale and after arcsinh transformation with cofactor 5. Figure adapted from Bendall et al. Science 2011]
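
The transformation can be sketched in a few lines. This is a minimal illustration in Python rather than the R used in the workshop; the function name `arcsinh_transform` and the raw intensities are assumptions made up for the example, not part of the workflow.

```python
import math

def arcsinh_transform(values, cofactor=5.0):
    """Apply arcsinh(x / cofactor) to each raw intensity."""
    return [math.asinh(v / cofactor) for v in values]

# Hypothetical raw intensities, including a negative value and zero.
raw = [-2.0, 0.0, 1.0, 10.0, 100.0, 1000.0]
transformed = arcsinh_transform(raw)

for x, y in zip(raw, transformed):
    print(f"{x:8.1f} -> {y:6.3f}")
```

For small values arcsinh(x / 5) is close to x / 5, and for large values it approaches log(2x / 5), which is where the "linear, then logarithmic" behaviour comes from.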

Diagnostic plots: a quick look at the data

•  Plot of per-sample marker expression distributions, colored by condition
   –  Helps identify problematic samples or markers
•  Cell counts per sample
•  MDS plot (or PCA plot)
   –  Standard in RNA-seq analysis
   –  Generated with the plotMDS() function from the limma package
   –  Because we want to plot samples, the information about cells has to be summarized to the sample level

MDS plot: expr_median_sample_tbl

[Schematic: for each sample, the median (*) of each marker is taken over the cells of that sample]

Sample | Cells                 | M1 | M2 | M3 | …
S1     | c1, c2, c3, c4        | *  | *  | *  |
S2     | c5, c6, c7            | *  | *  | *  |
S3     | c8, c9, c10, c11, c12 | *  | *  | *  |

* - median
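
The summarization to the sample level can be sketched as follows. This is a Python illustration under assumptions, not the workshop's R code: the helper name `median_per_sample` and the toy values are hypothetical, and `expr` is taken to be a list of per-cell rows with one value per marker.

```python
from statistics import median

def median_per_sample(expr, sample_ids):
    """Return {sample: [median of each marker column]}."""
    by_sample = {}
    for row, sample in zip(expr, sample_ids):
        by_sample.setdefault(sample, []).append(row)
    return {
        sample: [median(col) for col in zip(*rows)]  # zip(*rows) iterates over markers
        for sample, rows in by_sample.items()
    }

# Hypothetical toy data: 5 cells x 2 markers from two samples.
expr = [[0.1, 2.0], [0.3, 2.4], [0.2, 1.0], [1.5, 0.5], [1.7, 0.9]]
sample_ids = ["S1", "S1", "S1", "S2", "S2"]

expr_median_sample = median_per_sample(expr, sample_ids)
print(expr_median_sample)
```

The resulting samples-by-markers table is the kind of input an MDS or PCA plot of samples needs.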

MDS plot

Cell population identification

•  Manual gating
•  Comparison of clustering algorithms by Weber and Robinson, Cytometry Part A, 2016
•  FlowSOM (+ ConsensusClusterPlus) is one of the best-performing methods and is very fast
•  3 steps:
   1.  Building a self-organizing map (SOM) with the BuildSOM function: cells are assigned, according to their similarity, to 100 grid points (so-called codes) of the SOM
   2.  Building a minimal spanning tree, which is mainly used for graphical representation of the clusters
   3.  Metaclustering of the SOM codes, performed directly with the ConsensusClusterPlus function
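
To make step 1 more concrete, here is a rough Python sketch of the assignment part only: each cell goes to its closest code. It does not train a SOM and it is not the FlowSOM API. The function name `assign_to_codes`, the use of Euclidean distance, and the toy data are assumptions for illustration.

```python
import math

def assign_to_codes(cells, codes):
    """Return, for each cell, the index of the closest code (Euclidean distance)."""
    return [
        min(range(len(codes)), key=lambda k: math.dist(cell, codes[k]))
        for cell in cells
    ]

# Hypothetical toy data: 4 cells x 2 markers and only 2 codes
# (the SOM described above uses 100 grid points).
cells = [[0.1, 0.2], [0.0, 0.4], [3.1, 2.9], [2.8, 3.3]]
codes = [[0.0, 0.0], [3.0, 3.0]]

assignment = assign_to_codes(cells, codes)
print(assignment)
```

Metaclustering then operates on the codes rather than on individual cells, which is one reason the approach stays fast on large datasets.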

Over-clustering

•  Some level of over-clustering is necessary in order to detect somewhat rare populations
•  In addition, merging can always follow an over-clustering step, but splitting of existing clusters is generally not feasible
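
One way to see why merging is always possible: it is just a relabeling of existing cluster IDs, whereas splitting would require clustering the cells again. The Python sketch below uses a made-up merging table and made-up cluster assignments, neither taken from the workshop data.

```python
from collections import Counter

# Hypothetical merging table: original cluster id -> merged population label.
merging = {1: "T-cells", 2: "T-cells", 3: "B-cells", 4: "rare population"}

# Hypothetical per-cell cluster assignments from an over-clustering step.
cell_clusters = [1, 2, 2, 3, 1, 4, 3, 2]

# Merging is a lookup per cell; no cell has to be re-clustered.
merged_labels = [merging[c] for c in cell_clusters]
counts = Counter(merged_labels)
print(counts)
```

Because the rare population was kept as its own cluster during over-clustering, it survives the merge instead of being absorbed into a larger group.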

Over-clustering

Over-clustering into 20 groups

[Figure: heatmap of the median marker expression (data normalized to 0-1) in each of the 20 clusters, for the surface markers CD7, CD4, CD3, HLA_DR, CD20, IgM, CD123, CD33, CD45 and CD14. Rows are labeled with the cluster number and its share of cells: 1 (3.85%), 5 (1.89%), 20 (0.34%), 15 (7.8%), 12 (24.25%), 18 (1.85%), 3 (14.19%), 4 (1.97%), 9 (16.62%), 2 (2.26%), 17 (0.64%), 19 (1.51%), 8 (8.82%), 14 (2.2%), 10 (2.08%), 6 (3.58%), 7 (4.37%), 11 (0.64%), 13 (0.42%), 16 (0.72%). Figures adapted from Nowicka et al. F1000Research 2017]

Merging by an expert

[Figure: heatmap of the median marker expression (data normalized to 0-1) in each of the 20 clusters, for the surface markers CD7, CD4, CD3, HLA_DR, CD20, IgM, CD123, CD33, CD45 and CD14, with an additional cluster_merging annotation that assigns each cluster to one of the merged populations: B-cells IgM+, B-cells IgM-, CD4 T-cells, CD8 T-cells, DC, NK cells, monocytes, surface-. Figures adapted from Nowicka et al. F1000Research 2017]

Statistical testing – differential abundance

[Figure: stacked bar plot of the proportion (0 to 100) of each cell population (B-cells IgM+, B-cells IgM-, CD4 T-cells, CD8 T-cells, DC, NK cells, monocytes, surface-) in each sample (sample_id: Ref1 to Ref8 and BCRXL1 to BCRXL8), in two panels by condition (Ref, BCRXL)]

Statistical testing – differential abundance

[Figure: stacked bar plot of the proportion of each cell population in each sample (Ref1 to Ref8, BCRXL1 to BCRXL8), by condition (Ref, BCRXL)]

[Figure: proportion of each cell population by condition (Ref, BCRXL), one panel per population (B-cells IgM+, B-cells IgM-, CD4 T-cells, CD8 T-cells, DC, NK cells, monocytes, surface-), with legends for patient_id (Patient1 to Patient8) and condition]

The generalized linear mixed model (GLMM) for each cell population

•  observation-level random effects
•  random intercept for each patient

i = 1, …, 8 – patient ID
j = 1, 2 – condition
Y – number of CD4 cells
π – proportion of CD4 cells
m – total number of cells

$$Y_{ij} \sim \mathrm{Bin}(m_{ij}, \pi_{ij})$$

$$\mathrm{logit}(\pi_{ij}) = \beta_0 + \beta_1 x_{ij} + \xi_{ij} + \gamma_i$$

$$\xi_{ij} \sim N(0, \sigma^2_\xi)$$

$$\gamma_i \sim N(0, \sigma^2_\gamma)$$
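
To get a feel for what this model describes, it can help to simulate from it. The Python sketch below only generates data; it does not fit anything (in the workshop, fitting is done in R with the packages listed in the overview). The function names, the 0/1 coding of the condition, and all parameter values are arbitrary assumptions for illustration.

```python
import math
import random

def inv_logit(eta):
    """Inverse of the logit link: map a linear predictor to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_counts(beta0, beta1, sd_patient, sd_obs, m, n_patients=8, seed=1):
    """Simulate Y_ij ~ Bin(m, pi_ij), logit(pi_ij) = beta0 + beta1*x_ij + xi_ij + gamma_i."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_patients + 1):
        gamma_i = rng.gauss(0.0, sd_patient)   # random intercept for patient i
        for x_ij in (0, 1):                    # assumed coding: 0 = Ref, 1 = BCRXL
            xi_ij = rng.gauss(0.0, sd_obs)     # observation-level random effect
            pi_ij = inv_logit(beta0 + beta1 * x_ij + xi_ij + gamma_i)
            y_ij = sum(rng.random() < pi_ij for _ in range(m))  # one binomial draw
            rows.append((i, x_ij, y_ij, m))
    return rows

# All parameter values are arbitrary, chosen only for illustration.
rows = simulate_counts(beta0=-1.0, beta1=0.5, sd_patient=0.3, sd_obs=0.1, m=1000)
for patient, x, y, total in rows[:4]:
    print(patient, x, y / total)
```

The patient intercept shifts both samples of a patient together, while the observation-level effect lets each sample deviate on its own, which absorbs variation beyond what a plain binomial would allow.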

Statistical testing – differential expression of signaling markers

[Figure: median expression (median_expression) of the 14 signaling markers (antigen: pNFkB, pp38, pStat5, pAkt, pStat1, pSHP2, pZap70, pStat3, pSlp76, pBtk, pPlcg2, pErk, pLat, pS6), one panel per cell population (B-cells IgM+, B-cells IgM-, CD4 T-cells, CD8 T-cells, DC, NK cells, monocytes, surface-), with legends for patient_id (Patient1 to Patient8) and condition (Ref, BCRXL)]

The linear mixed model (LMM) for each cell population

•  random intercept for each patient

i = 1, …, 8 – patient ID
j = 1, 2 – condition
Y – median expression of marker in CD4 cells

$$Y_{ij} \sim N(\mu_{ij}, \sigma^2)$$

$$Y_{ij} = \beta_0 + \beta_1 x_{ij} + \epsilon_{ij} + \gamma_i$$

$$\epsilon_{ij} \sim N(0, \sigma^2)$$

$$\gamma_i \sim N(0, \sigma^2_\gamma)$$
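
A short derivation may help show what the random patient intercept does in this paired design. It assumes, as an illustration not stated on the slide, that the condition is coded $x_{i1} = 0$ for Ref and $x_{i2} = 1$ for BCRXL, and that the residual errors $\epsilon_{i1}$ and $\epsilon_{i2}$ are independent.

$$
Y_{i2} - Y_{i1}
= (\beta_0 + \beta_1 + \epsilon_{i2} + \gamma_i) - (\beta_0 + \epsilon_{i1} + \gamma_i)
= \beta_1 + (\epsilon_{i2} - \epsilon_{i1}),
\qquad
\epsilon_{i2} - \epsilon_{i1} \sim N(0, 2\sigma^2)
$$

Under these assumptions the patient effect $\gamma_i$ cancels in the within-patient difference, so the condition effect $\beta_1$ is judged against residual variation only and not against differences between patients.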
