Festival of Genomics 2016 - Brain talk


Upload: jean-fan

Post on 10-Feb-2017


Category:

Education



TRANSCRIPT

Page 1: Festival Of Genomics 2016 - Brain talk

Jean Fan / Festival of Genomics / June 2016

Jean Fan
NSF GRFP | Bioinformatics and Integrative Genomics PhD Candidate
Kharchenko Lab | Department of Biomedical Informatics | Harvard University

Applying single cell transcriptomics: unraveling the complexity of the brain

Page 2: Festival Of Genomics 2016 - Brain talk


Page 3: Festival Of Genomics 2016 - Brain talk

Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq

Cancer: Valent P, Bonnet D, De Maria R, et al. Cancer stem cell definitions and terminology: the devil is in the details. Nat Rev Cancer. 2012;12(11):767-75.

T Cells: Kaech SM, Cui W. Transcriptional control of effector and memory CD8+ T cell differentiation. Nat Rev Immunol. 2012;12(11):749-61.

Page 5: Festival Of Genomics 2016 - Brain talk

Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq

NPCs: Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci. 2013;14(11):755-69.

Single-cell RNA-seq

Page 6: Festival Of Genomics 2016 - Brain talk

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?

Page 8: Festival Of Genomics 2016 - Brain talk

Challenges: scRNA-seq data is highly variable and noisy
◦ Expect high correlation between replicates

[Figure: Bulk. Axes: expression in bulk replicate 1 vs. expression in bulk replicate 2]

Page 9: Festival Of Genomics 2016 - Brain talk

Challenges: scRNA-seq data is highly variable and noisy
◦ Expect high correlation between replicates
◦ Many differences between individual cells (even of the same type)
◦ Biological vs. technical differences
◦ Focus on the biological variability
◦ Control for the technical variability
  ◦ e.g. measurement failures (drop-outs)

[Figure: Single Cell]

Page 12: Festival Of Genomics 2016 - Brain talk

Previous work: SCDE - use error models to get a better handle on technical noise
◦ Estimate true biological variability of a gene
◦ Account for possible drop-out events

[Figure: Cross-fits (Cell 1 vs. Cell 2); Error Models]

Page 13: Festival Of Genomics 2016 - Brain talk

Previous work: SCDE - use error models to get a better handle on technical noise
◦ Estimate true biological variability of a gene
◦ Account for possible drop-out events
◦ Assess variability of expression, taking into consideration expression magnitude dependencies

[Figure: Variance Normalization]
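
The idea of judging a gene's variability against genes of similar expression magnitude can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `variance_ratios`, the neighbourhood size `k`, and the toy expression values are all hypothetical, and the "expected" variance here is a plain median over the nearest genes by mean, not the error-model-based fit that SCDE uses.

```python
import statistics

def variance_ratios(expr, k=3):
    """Observed variance of each gene divided by the variance expected
    for genes of similar mean expression.

    expr: dict mapping gene name -> list of expression values across cells.
    The expectation is the median variance of the k genes whose means are
    closest to this gene's mean (the gene itself is excluded).
    """
    stats = {g: (statistics.mean(v), statistics.pvariance(v))
             for g, v in expr.items()}
    ratios = {}
    for gene, (mean, var) in stats.items():
        others = [(abs(m - mean), v) for g, (m, v) in stats.items() if g != gene]
        others.sort(key=lambda t: t[0])  # nearest means first
        expected = statistics.median(v for _, v in others[:k])
        ratios[gene] = var / expected if expected > 0 else float("inf")
    return ratios

# Toy data (hypothetical): a low-expression group and a high-expression group.
expr = {
    "geneA": [4, 5, 6, 5, 4, 6],
    "geneB": [5, 6, 5, 4, 5, 5],
    "geneC": [6, 5, 4, 5, 6, 4],
    "geneD": [0, 10, 0, 10, 0, 10],    # bimodal at low magnitude
    "geneE": [5, 5, 4, 6, 5, 5],
    "geneF": [40, 60, 45, 55, 50, 50],  # large raw variance, typical for its magnitude
    "geneG": [42, 58, 50, 50, 44, 56],
    "geneH": [45, 55, 40, 60, 50, 50],
    "geneI": [50, 50, 38, 62, 50, 50],
}

ratios = variance_ratios(expr)
for gene in sorted(ratios, key=ratios.get, reverse=True):
    print(gene, round(ratios[gene], 2))
```

In this toy example geneF has the larger raw variance, but only geneD stands out once each gene is compared with its neighbours in expression magnitude.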

Page 14: Festival Of Genomics 2016 - Brain talk

Error models and normalization help us understand the data on a probabilistic level:

What is the chance this 0 expression in this cell is due to drop-out or true non-expression?

What is the chance that this gene is really this variable given the expected variability for genes at this average expression magnitude?
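
The first question can be made concrete with Bayes' rule. The sketch below assumes a two-component mixture for a single observed count: a low-magnitude Poisson "drop-out" component and a negative binomial "amplified" component. The function name and the parameters `dropout_prior`, `nb_size`, and `dropout_lambda` are hypothetical illustration values, not SCDE's fitted per-cell error models.

```python
import math

def prob_zero_is_dropout(expected_mu, dropout_prior, nb_size=2.0, dropout_lambda=0.1):
    """Posterior probability that an observed zero count is a drop-out.

    expected_mu:    expected count if the gene were successfully measured
    dropout_prior:  prior probability of a drop-out for this gene in this cell
    nb_size:        negative binomial size (dispersion) parameter, assumed
    dropout_lambda: mean of the low-magnitude Poisson drop-out component, assumed
    """
    # P(count = 0 | drop-out): Poisson probability mass at zero
    p0_dropout = math.exp(-dropout_lambda)
    # P(count = 0 | amplified): negative binomial probability mass at zero
    p0_amplified = (nb_size / (nb_size + expected_mu)) ** nb_size
    numerator = dropout_prior * p0_dropout
    return numerator / (numerator + (1.0 - dropout_prior) * p0_amplified)

# A zero for a gene expected to be highly expressed is almost surely a drop-out;
# a zero for a gene expected near zero is plausibly true non-expression.
high = prob_zero_is_dropout(expected_mu=100.0, dropout_prior=0.2)
low = prob_zero_is_dropout(expected_mu=0.5, dropout_prior=0.2)
print(f"expected count 100: P(drop-out | 0) = {high:.3f}")
print(f"expected count 0.5: P(drop-out | 0) = {low:.3f}")
```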

Page 15: Festival Of Genomics 2016 - Brain talk

PAGODA (Pathway And Geneset OverDispersion Analysis) applies error models and variance normalization to characterize heterogeneity and identify subpopulations

pklab.med.harvard.edu/scde

Page 16: Festival Of Genomics 2016 - Brain talk

PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets
◦ Rather than relying on a few genes, look for broader patterns of variability
◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
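
The power argument can be illustrated with a small simulation. It is a sketch under stated assumptions, not PAGODA's procedure: the two subpopulations are known in advance here, the gene-set score is a plain average of member genes, and the sizes and per-gene shift are hypothetical. PAGODA itself works without group labels.

```python
import random
import statistics

def z_score(group_a, group_b):
    """Difference in means divided by its standard error (unequal-variance form)."""
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

rng = random.Random(0)
n_cells, n_genes, shift = 50, 50, 0.3  # hypothetical sizes and per-gene effect

# genes[g] = (values in subpopulation A, values in subpopulation B);
# every gene in the set carries the same small shift between subpopulations.
genes = [([rng.gauss(0.0, 1.0) for _ in range(n_cells)],
          [rng.gauss(shift, 1.0) for _ in range(n_cells)])
         for _ in range(n_genes)]

# Gene-by-gene view: each gene alone gives a weak signal.
per_gene_z = [z_score(a, b) for a, b in genes]

# Gene-set view: average the member genes within each cell, then test the averages.
set_a = [statistics.mean(g[0][i] for g in genes) for i in range(n_cells)]
set_b = [statistics.mean(g[1][i] for g in genes) for i in range(n_cells)]
set_z = z_score(set_a, set_b)

print(f"median per-gene z: {statistics.median(per_gene_z):.2f}")
print(f"gene-set z:        {set_z:.2f}")
```

Because independent noise averages out across the set while the coordinated shift does not, the gene-set statistic grows roughly with the square root of the number of member genes.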

Page 19: Festival Of Genomics 2016 - Brain talk

PAGODA overview: assess expression within annotated pathways and de novo gene sets

Page 21: Festival Of Genomics 2016 - Brain talk

PAGODA overview: Identify pathways and gene sets exhibiting coordinated overdispersion

Page 22: Festival Of Genomics 2016 - Brain talk

PAGODA overview: Remove redundant pathways and gene sets, and visualize

Page 23: Festival Of Genomics 2016 - Brain talk

Pathway-based approach integrates prior knowledge to increase statistical power and provide interpretability of identified subpopulations

(example next)

Page 24: Festival Of Genomics 2016 - Brain talk

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?

Page 25: Festival Of Genomics 2016 - Brain talk

PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations

[Figure axes: cells; pathway clusters]

Kun Zhang

Jerold Chun

Page 32: Festival Of Genomics 2016 - Brain talk


PAGODA integrated with FISH data spatially placed subpopulations

github.com/hms-dbmi/brainmapr

Page 33: Festival Of Genomics 2016 - Brain talk

PAGODA integrated with FISH data spatially placed subpopulations

Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr

Page 36: Festival Of Genomics 2016 - Brain talk

PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity

Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr

Page 38: Festival Of Genomics 2016 - Brain talk

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
  ◦ Alternative splicing

Page 39: Festival Of Genomics 2016 - Brain talk

PAGODA applied to human cortical cells identifies and characterizes subpopulations

Xiaochang Zhang

Chris Walsh

Page 40: Festival Of Genomics 2016 - Brain talk


Marker genes confirm subpopulation identified by PAGODA

Page 43: Festival Of Genomics 2016 - Brain talk

PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells

Needs bulk -> pool single cells
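
Pooling single cells by subpopulation can be sketched as summing per-cell read counts within each group. The sketch below is an assumption-laden illustration: the function `pool_counts`, the feature names, and the counts are hypothetical, the "RG" and "neuron" labels stand in for subpopulation assignments, and the naive inclusion ratio at the end is only a stand-in for the splicing estimate a tool such as MISO would compute from pooled reads.

```python
from collections import defaultdict

def pool_counts(counts, labels):
    """Sum per-cell read counts within each subpopulation.

    counts: dict cell -> dict feature -> read count
    labels: dict cell -> subpopulation label
    Returns dict subpopulation -> dict feature -> summed read count.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for cell, features in counts.items():
        group = labels[cell]
        for feature, n in features.items():
            pooled[group][feature] += n
    return {group: dict(features) for group, features in pooled.items()}

# Hypothetical junction read counts for one cassette exon in five cells.
counts = {
    "cell1": {"exon_included": 3, "exon_skipped": 0},
    "cell2": {"exon_included": 0, "exon_skipped": 1},
    "cell3": {"exon_included": 4, "exon_skipped": 1},
    "cell4": {"exon_included": 0, "exon_skipped": 5},
    "cell5": {"exon_included": 1, "exon_skipped": 6},
}
labels = {"cell1": "RG", "cell2": "RG", "cell3": "RG",
          "cell4": "neuron", "cell5": "neuron"}

pooled = pool_counts(counts, labels)
for group, features in pooled.items():
    total = features["exon_included"] + features["exon_skipped"]
    print(group, features, f"naive inclusion ratio = {features['exon_included'] / total:.2f}")
```

Individual cells have too few reads per junction to estimate inclusion reliably (cell1 and cell2 alone would suggest opposite answers), while the pooled counts behave more like a pure bulk sample.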

Page 44: Festival Of Genomics 2016 - Brain talk

Pure pooled RGs vs neurons lend credence to potential purity concerns with bulk CP vs. VZ

Page 45: Festival Of Genomics 2016 - Brain talk

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
  ◦ Alternative splicing
  ◦ Copy number alteration detection / integrative analysis

Page 46: Festival Of Genomics 2016 - Brain talk

BADGER quantitatively assesses posterior probabilities of copy number alterations

Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
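
The slides do not spell out BADGER's model, so the following is only a toy two-hypothesis Bayes calculation to show what a posterior probability of a copy number alteration can look like. It assumes phased heterozygous SNPs in a region, a candidate deletion of one haplotype, and binomial read sampling; the function name, the `prior`, and the `error_rate` are hypothetical.

```python
import math

def posterior_deletion(alt_reads, total_reads, prior=0.5, error_rate=0.01):
    """Posterior probability that one haplotype is deleted in a region.

    alt_reads:   reads supporting the haplotype that would be lost
    total_reads: all reads over heterozygous SNPs in the region
    prior:       prior probability of the deletion, must lie strictly in (0, 1)
    error_rate:  chance of seeing a lost-haplotype read despite a deletion, assumed
    Neutral model: each read comes from either haplotype with probability 0.5.
    """
    def log_lik(p):
        # Binomial log-likelihood without the coefficient, which cancels in the ratio.
        return alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1.0 - p)

    log_del = math.log(prior) + log_lik(error_rate)
    log_neu = math.log(1.0 - prior) + log_lik(0.5)
    # Normalise in log space to avoid underflow with many reads.
    m = max(log_del, log_neu)
    w_del = math.exp(log_del - m)
    w_neu = math.exp(log_neu - m)
    return w_del / (w_del + w_neu)

# No reads from one haplotype across 20 reads points to a deletion;
# a balanced 9 of 20 points to a neutral region.
p_lost = posterior_deletion(alt_reads=0, total_reads=20)
p_balanced = posterior_deletion(alt_reads=9, total_reads=20)
print(f"0/20 reads from the haplotype: P(deletion) = {p_lost:.4f}")
print(f"9/20 reads from the haplotype: P(deletion) = {p_balanced:.4f}")
```

A real single-cell model would also have to account for drop-out and monoallelic expression, which make a missing allele in one cell much weaker evidence than this toy calculation suggests.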

Page 49: Festival Of Genomics 2016 - Brain talk

BADGER applied to scRNA-seq identified subclonal expansion in progressive MM

Soo Lee

Peter Park

Woong-Yang Park

Hae-Ock Lee

[Figure labels: Initial; Bone Marrow; Ascite; MM34; MM34A]

Page 54: Festival Of Genomics 2016 - Brain talk

PAGODA integrated with BADGER connects genetic with transcriptional heterogeneity

Page 56: Festival Of Genomics 2016 - Brain talk

scRNA-seq contains (noisy) expression as well as (noisy) splicing and some (noisy) genetic information.

Novel statistical and computational methods and techniques are still needed to harness the potential of scRNA-seq data!

Page 57: Festival Of Genomics 2016 - Brain talk

Thanks!

Kharchenko Lab
Peter Kharchenko
Joseph Herman

Park Lab
Soo Lee
Semin Lee

SGI
Hae-Ock Lee

Walsh Lab
Xiaochang Zhang

Funding