single cell transcriptomics – a simple walkthrough...oil oil rt reagents gel bead single cell...
TRANSCRIPT
Single Cell Transcriptomics –a simple walkthrough
Adwait Sathe, PhDMcDermott Center Bioinformatics Lab
MCDERMOTT CENTER BIOINFORMATICS LAB
1
RNASeq RNASeq
2
MCDERMOTT CENTER BIOINFORMATICS LAB https://www.10xgenomics.com/solutions/single-cell/
Single cell RNA Sequencing Workflow
Dissociation of solid tissue
Isolate single cellsDROP-Seq, Smart-Seq etc.
RNA
cDNA
Amplified RNA
Amplified cDNA
Sequencing library
Sequencing
Single cell expression profile
Clustering and Cell type identification
novel markers for cell types
Constructing Single Cell Trajectories
RT-PCR
IVT PCR
Novel applications!
RNASeq
Single cell RNASeq
3
MCDERMOTT CENTER BIOINFORMATICS LAB
Pea
rso
n R
Mo
lecu
lar
Det
ecti
on
Lim
it
Accuracy
Sensitivity
Single cell RNA Sequencing methods comparisonRNASeq
Single cell RNASeq
ScRNAseqmethods
most genes per cell among methodsSmart-seq2
• Full-length
• No UMI
• FACS
• Throughput (number of cells) – 102–103
microfluidic platformSmart-seq/C1
• Full length
• No UMI
• Fluidigm C1
• Throughput (number of cells) – 102–103
UMI – unique molecular identifiersChromium,
10X• 3’ counting (only 3’ of transcript sequenced)
• 8-10 bp UMI
• 10X barcoded gel beads
• Throughput (number of cells) – 104–105
4
MCDERMOTT CENTER BIOINFORMATICS LAB Svensson et. Al., Nature Methods (2017)
26bp
10X genomics protocol
https://www.10xgenomics.com/solutions/single-cell/
RNASeq
Single cell RNASeq
ScRNAseqmethods
10X Chromium
Recovery agent
5
MCDERMOTT CENTER BIOINFORMATICS LAB
10X genomics protocol
https://www.10xgenomics.com/solutions/single-cell/
RNASeq
Single cell RNASeq
ScRNAseqmethods
10X Chromium
10X barcoded Gel Beads
Cells &Enzymes
Oil Oil
RT reagents
Gel bead
Single cell
Cells Lysis & RT
Gel Bead-in-Emulsion (GEM)
6
MCDERMOTT CENTER BIOINFORMATICS LAB
10X genomics protocol
https://www.10xgenomics.com/solutions/single-cell/
RNASeq
Single cell RNASeq
ScRNAseqmethods
10X Chromium
10X barcoded Gel Beads
Cells &Enzymes
Oil
10X barcoded cDNA
Cells Lysis & RT
Recovery agent
GEMs broken
7
MCDERMOTT CENTER BIOINFORMATICS LAB
(1) Jeanette Baran-Gale et. al., Brief Funct Genomics, 2017. doi:10.1093/bfgp/elx035(2) https://www.10xgenomics.com/solutions/single-cell
Multiplet Rate (%)# of Cells Loaded # of Cells Recovered
~0.4% ~870 ~500
~0.8% ~1700 ~1000
~2.3% ~5300 ~3000
~3.9% ~8700 ~5000
~7.6% ~17400 ~10000
RNASeq
Single cell RNASeq
ScRNAseqmethods
10X Chromium
DESIGN
10X genomics protocol
(1)
(2)
Determine starting cells:http://satijalab.org/howmanycells
QC metrics for your cell typehttp://10xqc.com/
8
MCDERMOTT CENTER BIOINFORMATICS LAB
65%
Reads QC *
Alignment *
Mapping QC *
Cell QC
Normalization *
Confounders
Differential Expression *
Clustering, Marker gene
PseudotimeAnalysis
FastQC
STAR
RSeqC
edgeR Seurat, > 5000 cells MONOCLE2
Feature SelectionM3Drop
Scater
Scran
Combat
Red: Tools used for each step* : Common to Bulk RNASeq
Single cell RNA Sequencing computational pipelineRNASeq
Single cell RNASeq
ScRNAseqmethods
10X Chromium
DESIGN
Analysis pipeline
9
MCDERMOTT CENTER BIOINFORMATICS LAB
Let us carry out a single cell RNASeqexperiment
• Peripheral blood mononuclear cells (PBMCs) from a healthy donor. PBMCs are primary cells with relatively small amounts of RNA (~1pg RNA/cell).
• Sequenced on Illumina Hiseq4000
• 26bp read1 (16bp Chromium barcode and 10bp UMI), 98bp read2 (transcript), and 8bp I7 sample barcode
Dataset: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.0.1/pbmc8k
RNASeq
Single cell RNASeq
ScRNASeq
10
MCDERMOTT CENTER BIOINFORMATICS LAB
10X genomics summaryRNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
Parameter PBMC Observed PBMC Recommended
Estimated Number of Cells/Barcodes 8381 500-10,000
Fraction Reads in Cell 93.10%
Mean Reads per Cell 93,552 50,000
Median Genes per Cell 1297 1000
Total Genes Detected 21,425
Median UMI counts per cell 4084
2000 UMIs/cell
10000 barcodes/cells
11
MCDERMOTT CENTER BIOINFORMATICS LAB
Potential problems
Macaulay and Voet, PLOS Genetics, 2014.Computational Methods for Analysis of Single Cell RNA-Seq Data, Ion Măndoiu, University of Connecticut.
– Low amplification efficiency, typically less than 10%
– Gene dropout rates
– Cell quality: Live/dead, Missing cells, Multiple cells
– Stochastic: cell cycle phases, Transcriptional bursting
– Cell Capture rate ≠ population frequency (multiples/gel beads, empty beads)
– Algorithm development
RNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
PITFALLS
12
MCDERMOTT CENTER BIOINFORMATICS LAB
Exploring the Quality Control requiredRNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
PITFALLS
QC
13
MCDERMOTT CENTER BIOINFORMATICS LAB
Number of Genes Number of UMIs
1000
2000
3000
4000
5000
10000
20000
30000
0
Percentage of Mitochondrial genes
0
5
10
15
20
PBMC sample
Clustering
• Determine a subset of genes to use for clustering; this is because not all genes are informative, such as those that are lowly expressed.
• The approach is to select gene based on their average expression and variability across cells
• We scale the data and remove unwanted sources of variation (technical, cell cycle stage, batches etc.)
RNASeq
Single cell RNASeq
14
MCDERMOTT CENTER BIOINFORMATICS LAB
ScRNASeq
SUMMARY
PITFALLS
QC
Mean vs Variation: Detection of variable genesRNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
PITFALLS
QC
-5
0
5
10
15
0 1 2 43 5 6
Dis
per
sio
n
Average expression (log Normalized to library size)
Use for clustering
15
MCDERMOTT CENTER BIOINFORMATICS LAB
The goal is to identify the ”elbow” in this plot,After this point dividing into more clusters is not informative
5-10
RNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
0Number of clusters (components)
25 50 75 100
Var
ian
ce e
xpla
ined
per
co
mp
on
ent
0
0.2
0.1
How many clusters are enough to divide the data into meaningful
groups?
16
MCDERMOTT CENTER BIOINFORMATICS LAB
Graph Cluster results T-SNE plot visualization
t-Distributed Stochastic Neighbor Embedding
RNASeq
Single cell RNASeq
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
01234567891011
0
25
-25
0-20-40 4020t-SNE 1
t-SN
E 2
17
MCDERMOTT CENTER BIOINFORMATICS LAB
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
Differential expression Identification of markers for cell types
Markers Cell Type
IL7R CD4 T cells
CD14, LYZ CD14+ Monocytes
MS4A1 B cells
CD8A CD8 T cells
FCGR3A, MS4A7 FCGR3A+ Monocytes
GNLY, NKG7 NK cells
FCER1A, CST3 Dendritic Cells
PPBP Megakaryocytes
Supervised – PBMC known markers
RNASeq
Single cell RNASeq
HETEROGENIETY
KNOWN MARKER
0
25
-25
0-20-40 4020t-SNE 1
t-SN
E 2
18
MCDERMOTT CENTER BIOINFORMATICS LAB
Differential expressionIdentification of novel markers for cell types
Method used is similar to
edgeR
RNASeq
Single cell RNASeq
19
MCDERMOTT CENTER BIOINFORMATICS LAB Cluster No.
Gen
es
Each column is a Cell
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
HETEROGENIETY
KNOWN MARKER
NOVEL MARKER
Differential expressionIdentification of novel markers for cell types
Log1
0(U
MIs
)
RNASeq
Single cell RNASeq
Cluster 10 Cluster 11
20
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
HETEROGENIETY
KNOWN MARKER
NOVEL MARKER
RNASeq
Single cell RNASeq
Differential expressionIdentification of novel markers for cell types
21
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
HETEROGENIETY
KNOWN MARKER
NOVEL MARKER
Cloupe browser further exploration
RNASeq
Single cell RNASeq
MCDERMOTT CENTER BIOINFORMATICS LAB 22
ScRNASeq
SUMMARY
PITFALLS
QC
CLUSTERING
HETEROGENIETY
KNOWN MARKER
NOVEL MARKER
Acknowledgements
McDermott Center Bioinformatics LabChao Xing, PhD - DirectorMohammad KanchwalaAshwani Kumar
McDermott Center NGS core team
Workflow used is based on:
Seurat: Macosko, Basu, Satija et al., Cell, 2015 (Updated approach: Combining dimensional reduction with graph-based clustering)Monocle: Xiaojie Qiu, Andrew Hill, Cole Trapnell et al (2017)
Computational Methods for Analysis of Single Cell RNA-Seq Data, Ion Măndoiu, University of Connecticut
THANK YOU
MCDERMOTT CENTER BIOINFORMATICS LAB 24