Festival of Genomics 2016 - brain talk
Jean Fan / Festival of Genomics / June 2016
Jean Fan | NSF GRFP | Bioinformatics and Integrative Genomics PhD Candidate | Kharchenko Lab | Department of Biomedical Informatics | Harvard University
Applying single cell transcriptomics: unraveling the complexity of the brain
Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq
Valent P, Bonnet D, De Maria R, et al. Cancer stem cell definitions and terminology: the devil is in the details. Nat Rev Cancer. 2012;12(11):767-75.
Cancer
Kaech SM, Cui W. Transcriptional control of effector and memory CD8+ T cell differentiation. Nat Rev Immunol. 2012;12(11):749-61.
T Cells
Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci. 2013;14(11):755-69.
NPCs
Single-cell RNA-seq
Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
Challenges: scRNA-seq data is highly variable and noisy
◦ Expect high correlation between replicates
[Figure: Bulk. x-axis: expression in bulk replicate 1; y-axis: expression in bulk replicate 2]
Challenges: scRNA-seq data is highly variable and noisy
◦ Expect high correlation between replicates
◦ Many differences between individual cells (even of the same type)
◦ Biological vs. technical differences
  ◦ Focus on the biological variability
  ◦ Control for the technical variability
    ◦ e.g. measurement failures (drop-outs)
Single Cell
Previous work: SCDE - use error models to get a better handle on technical noise
◦ Estimate true biological variability of a gene
◦ Account for possible drop-out events
[Figure panels: Cross-fits, Error Models; axes: Cell 1, Cell 2]
◦ Assess variability of expression, taking into consideration its dependence on expression magnitude
Variance Normalization
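The idea of judging a gene's variability against what is expected at its expression magnitude can be sketched as follows. This is a simplified stand-in, not SCDE's actual procedure: it assumes a straight-line trend between log mean and log variance, and the function names and the five example genes are hypothetical.

```python
import math

def fit_log_trend(means, variances):
    """Least-squares line through the points (log mean, log variance)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def overdispersion_ratios(means, variances):
    """Observed variance divided by the variance expected at that mean."""
    slope, intercept = fit_log_trend(means, variances)
    ratios = []
    for m, v in zip(means, variances):
        expected = math.exp(intercept + slope * math.log(m))
        ratios.append(v / expected)
    return ratios

# Hypothetical genes: variance grows with the mean, but the gene at
# index 3 is far more variable than other genes of similar magnitude.
means = [1.0, 2.0, 4.0, 8.0, 16.0]
variances = [2.0, 4.0, 8.0, 64.0, 32.0]

ratios = overdispersion_ratios(means, variances)
most_variable = max(range(len(ratios)), key=ratios.__getitem__)
print(most_variable, [round(r, 2) for r in ratios])
```

A ratio well above 1 flags a gene as more variable than expected for its magnitude; a raw variance ranking would instead favor highly expressed genes.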
Error models and normalization help us understand the data on a probabilistic level:
What is the chance that a zero expression measurement in this cell is due to drop-out rather than true non-expression?
What is the chance that this gene is really this variable given the expected variability for genes at this average expression magnitude?
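The first question can be illustrated with Bayes' rule over a two-component mixture: a low-rate Poisson "drop-out" component and a negative binomial "expressed" component. This is a minimal sketch under stated assumptions, not the SCDE implementation: the fixed drop-out prior, the background rate, the dispersion `size`, and all function names are hypothetical choices for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k reads under a Poisson with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def neg_binomial_pmf(k, mean, size):
    """Negative binomial pmf with variance = mean + mean^2 / size."""
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(size / (size + mean))
               + k * math.log(mean / (size + mean)))
    return math.exp(log_pmf)

def dropout_posterior(count, dropout_prior, expected_mean,
                      size=2.0, background_rate=0.1):
    """P(drop-out | observed count) for one gene in one cell.

    dropout_prior   -- assumed prior probability of a drop-out event
    expected_mean   -- assumed expression magnitude if the gene was detected
    """
    p_drop = dropout_prior * poisson_pmf(count, background_rate)
    p_expr = (1.0 - dropout_prior) * neg_binomial_pmf(count, expected_mean, size)
    return p_drop / (p_drop + p_expr)

# A zero for a gene expected at ~50 reads is almost certainly a drop-out;
# a zero for a gene expected at ~0.5 reads may well be true non-expression.
print(round(dropout_posterior(0, 0.3, 50.0), 3))
print(round(dropout_posterior(0, 0.3, 0.5), 3))
```

The same zero count therefore carries very different information depending on how strongly the gene is expected to be expressed.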
PAGODA (Pathway And Geneset OverDispersion Analysis) applies error models and variance normalization to characterize heterogeneity and identify subpopulations
pklab.med.harvard.edu/scde
PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets
◦ Rather than relying on a few genes, look for broader patterns of variability
◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
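Why coordinated signal across a gene set increases power can be shown with a generic illustration. The sketch below uses Stouffer's method for combining z-scores, which is swapped in here purely for intuition and is not the procedure PAGODA itself uses; it also assumes the per-gene z-scores are independent, and the numbers are hypothetical.

```python
import math

def stouffer_z(z_scores):
    """Combine independent per-gene z-scores into one gene-set z-score."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

single_gene_z = 1.5                  # weak evidence from any one gene
gene_set = [single_gene_z] * 16      # 16 genes all shifted the same way
combined = stouffer_z(gene_set)      # 16 * 1.5 / sqrt(16) = 6.0

print(one_sided_p(single_gene_z))    # not significant on its own
print(one_sided_p(combined))         # highly significant as a set
```

Each gene alone is unconvincing, but sixteen genes moving together give a combined z-score that grows with the square root of the set size.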
PAGODA overview: assess expression within annotated pathways and de novo gene sets
PAGODA overview: Identify pathways and gene sets exhibiting coordinated overdispersion
PAGODA overview: Remove redundant pathways and gene sets, and visualize
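The redundancy-removal step can be pictured as grouping gene sets whose per-cell scores separate the cells in nearly the same way. The following is a rough sketch, assuming a greedy grouping on Pearson correlation with an arbitrary 0.9 threshold; the function names, the threshold, and the three gene-set score profiles are hypothetical and this is not PAGODA's actual clustering.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def collapse_redundant(scores, threshold=0.9):
    """Greedily group gene sets by correlation of their per-cell scores.

    Each gene set joins the first existing group whose representative it
    correlates with at or above `threshold`; otherwise it starts a group.
    """
    groups = []  # list of (representative name, member names)
    for name, profile in scores.items():
        for representative, members in groups:
            if pearson(scores[representative], profile) >= threshold:
                members.append(name)
                break
        else:
            groups.append((name, [name]))
    return groups

# Hypothetical per-cell scores (four cells) for three gene sets.
scores = {
    "cell cycle":   [2.0, 1.8, -1.9, -2.1],
    "mitosis":      [1.9, 2.1, -2.0, -1.8],
    "neurogenesis": [-1.0, 1.2, -1.1, 0.9],
}
groups = collapse_redundant(scores)
print(groups)
```

Here the two proliferation-related sets collapse into one aspect, while the third set, which splits the cells differently, remains separate.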
Pathway based approach integrates prior knowledge to increase statistical power and provide interpretability of identified subpopulations
(example next)
PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
[Figure axes: cells, pathway clusters]
Kun Zhang
Jerold Chun
PAGODA integrated with FISH data spatially placed subpopulations
github.com/hms-dbmi/brainmapr
Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr
PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity
Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr
Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
  ◦ Alternative splicing
PAGODA applied to human cortical cells identifies and characterizes subpopulations
Xiaochang Zhang
Chris Walsh
Marker genes confirm subpopulation identified by PAGODA
PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells
Needs bulk -> pool single cells
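The pooling idea can be sketched with junction read counts. This is only an illustration of why pooling helps: it computes a plain inclusion ratio (percent spliced in), whereas MISO performs Bayesian inference on pooled reads. The function name and the per-cell counts below are hypothetical.

```python
def pooled_psi(cells):
    """Pool junction reads across cells of one subpopulation.

    cells -- list of (inclusion_reads, exclusion_reads), one pair per cell
    Returns (psi, total_reads); psi is None when no reads cover the event.
    """
    inclusion = sum(inc for inc, _ in cells)
    exclusion = sum(exc for _, exc in cells)
    total = inclusion + exclusion
    psi = inclusion / total if total else None
    return psi, total

# Hypothetical counts for one exon: most single cells have too few
# junction reads to say anything on their own.
rg_cells = [(1, 0), (0, 0), (2, 1), (0, 0), (3, 0)]
neuron_cells = [(0, 2), (0, 0), (1, 3), (0, 1), (0, 0)]

print(pooled_psi(rg_cells))      # exon mostly included
print(pooled_psi(neuron_cells))  # exon mostly skipped
```

Because the cells are grouped by subpopulation before pooling, each pool behaves like a bulk sample of a single cell type.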
Pure pooled RGs vs neurons lend credence to potential purity concerns with bulk CP vs. VZ
Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single-cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
  ◦ Alternative splicing
  ◦ Copy number alteration detection / integrative analysis
BADGER quantitatively assesses posterior probabilities of copy number alterations
Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
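What a posterior probability of a copy number alteration means can be illustrated with a toy two-hypothesis model. This is a sketch under strong assumptions, not BADGER's actual model: it assumes per-gene log2 expression ratios (cell vs. a normal reference) are Gaussian, centered at 0 for a copy-neutral region and at -1 for a one-copy deletion, and the prior, standard deviation, and function names are hypothetical.

```python
import math

def normal_log_pdf(x, mean, sd):
    """Log density of a Gaussian, computed directly to avoid underflow."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def deletion_posterior(log2_ratios, prior=0.1, sd=1.0):
    """P(one-copy deletion | expression of the genes in a region).

    log2_ratios -- per-gene log2(cell / reference) for genes in the region
    prior       -- assumed prior probability that the region is deleted
    """
    ll_neutral = sum(normal_log_pdf(x, 0.0, sd) for x in log2_ratios)
    ll_deleted = sum(normal_log_pdf(x, -1.0, sd) for x in log2_ratios)
    log_odds = math.log(prior / (1.0 - prior)) + ll_deleted - ll_neutral
    return 1.0 / (1.0 + math.exp(-log_odds))

# One gene at half expression is weak evidence; eight neighboring genes
# all near half expression make a deletion the more probable explanation.
print(round(deletion_posterior([-1.0]), 3))
print(round(deletion_posterior([-1.2, -0.8, -1.1, -0.9, -1.0, -1.3, -0.7, -1.0]), 3))
```

As with gene sets, noisy single-gene measurements become informative when many genes in the same region shift consistently.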
BADGER applied to scRNA-seq identified subclonal expansion in progressive MM
Soo Lee
Peter Park
Woong-Yang Park
Hae-Ock Lee
[Figure labels: Initial, Bone Marrow, Ascite; MM34, MM34A]
PAGODA integrated with BADGER connects genetic with transcriptional heterogeneity
scRNA-seq contains (noisy) expression as well as (noisy) splicing and some (noisy) genetic information.
Novel statistical and computational methods and techniques are still needed to harness the potential of scRNA-seq data!
Thanks!
Kharchenko Lab: Peter Kharchenko, Joseph Herman
Park Lab: Soo Lee, Semin Lee
SGI: Hae-Ock Lee
Walsh Lab: Xiaochang Zhang
Funding