lecture 8 understanding transcription rna-seq analysis ... · understanding transcription rna-seq...

44
Lecture 8 Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology David K. Gifford 1

Upload: doantu

Post on 01-Jul-2018

229 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Lecture 8 Understanding Transcription

RNA-seq analysis

Foundations of Computational Systems Biology David K. Gifford

1

Page 2: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Lecture 8 – RNA-seq Analysis

•  RNA-seq principles – How can we characterize mRNA isoform

expression using high-throughput sequencing?

•  Differential expression and PCA – What genes are differentially expressed, and how

can we characterize expressed genes?

•  Single cell RNA-seq – What are the benefits and challenges of working

with single cells for RNA-seq?

2

Page 3: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

RNA-Seq characterizes RNA molecules

A B C Gene in genome

A B C pre-mRNA or ncRNA

transcription

A B C

splicing

A C

export to cytoplasm

mRNA

nucleus

cytoplasm

High-throughput sequencing of RNAs at various stages of processing

Slide courtesy Cole Trapnel Courtesy of Cole Trapnell. Used with permission.

3

Page 4: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

ET Wang et al. Nature 000, 1-7 (2008) doi:10.1038/nature07509

Pervasive tissue-specific regulation of alternative mRNA isoforms.

Courtesy of Macmillan Publishers Limited. Used with permission.Source: Wang, Eric T., Rickard Sandberg, et al. "Alternative Isoform Regulation inHuman Tissue Transcriptomes." Nature 456, no. 7221 (2008): 470-6.4

Page 5: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

RNA-Seq: millions of short reads from fragmented mRNA

Pepke et. al. Nature Methods 2009

Extract RNA from cells/tissue

+ splice junctions!

Courtesy of Macmillan Publishers Limited. Used with permission.Source: Pepke, Shirley, Barbara Wold, et al. "Computation for ChIP-seq and RNA-seq Studies." Nature Methods 6 (2009): S22-32.

5

Page 6: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Mapping RNA-seq reads to a reference genome reveals expression

Sox2

6

Page 7: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Smug1

Reads over exons

Junction reads (split between exons)

RNA-seq reads map to exons and across exons

7

Page 8: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Two major approaches to RNA-seq analysis

1.  Assemble reads into transcripts. Typical issues with coverage and correctness.

2.  Map reads to reference genome and identify isoforms using constraints

•  Goal is to quantify isoforms and determine significance of differential expression

•  Common RNA-seq expression metrics are Reads per killobase per million reads (RPKM) or Fragments per killobase per million (FPKM)

exon 1 exon 2 exon 3

Short sequencing reads, randomly sampled from a

transcript

8

Page 9: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Aligned reads reveal isoform possibilities

A B C

A B C A B C

A B C A B C

identify candidate exons via genomic mapping

Generate possible pairings of exons

Align reads to possible junctions

Courtesy of Cole Trapnell. Used with permission.

Slide courtesy Cole Trapnell

9

Page 10: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

A

B

C

D

E Isoform FractioT1 ψψ11""

T2 ψψ22""T3 ψψ33""

T4 ψψ44""

n

We can use mapped reads to learn the isoform mixture ψ"

Slide courtesy Cole Trapnell

Courtesy of Cole Trapnell. Used with permission.

10

Page 11: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Common reads Common reads

Detecting alternative splicing from mRNA-Seq data Isoforms Inclusion reads

Exclusion reads Given a set of reads, estimate:

= Distribution of isoforms

11

Page 12: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

P(Ri | T=Tj) – Excluded reads

If a single ended read or read pair Ri is structurally incompatible with transcript Tj, then

Ri

Tj

P(R = Ri |T = Tj ) = 0

Intron in Tj

Courtesy of Cole Trapnell. Used with permission.

Slide courtesy Cole Trapnell

12

Page 13: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

P(Ri | T=Tj) – Single end reads

Ri

Tj

Cufflinks assumes that fragmentation is roughly uniform. The probability of observing a fragment starting at a specific position Si in a transcript of length lj is:""

P(S = Si |T = Tj ) =1l j

Transcript length lj

starting position in transcript, Si

Courtesy of Cole Trapnell. Used with permission.

Slide courtesy Cole Trapnell

13

Page 14: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

P(Ri | T=Tj) – Paired end reads

Assume our library fragments have a length distribution described by a probability density F.. Thus, the probability of observing a particular paired alignment to a transcript:""

Tj

Implied fragment length lj(Ri)

Ri

P(R = Ri |T = Tj ) =F(l j (Rj ))

l j

Courtesy of Cole Trapnell. Used with permission.

Slide courtesy Cole Trapnell

14

Page 15: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Estimating Isoform Expression

•  Find expression abundances ψ1,…,ψn for a set of isoforms T1,…,Tn

•  Observations are the set of reads R1,…,Rm

•  Can estimate mRNA expression of each isoform

Ψ =Ψ

argmaxL(Ψ | R)

L(Ψ | R)∝P(R |Ψ)P(Ψ)

using total number of reads that map to a gene and ψ

P(R |Ψ) = Ψ jP(R = Ri |T = Tj )j=0

n∑

i=0

m∏

15

Page 16: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Case study: myogenesis

• 

• 

• 

• 

Tra

nscrip

ts (

%)

0.01 0.1 1 10 100 1000 10000

0.0

0.2

0.4

0.6

0.8

1.0

●● ●● ●● ●● ●● ●● ●● ●● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●●● ●● ●● ●●

●●

●● ●● ●●●●

●●●●

●●

●●

●●

●●

●● ●●●● ●●

●● ●●●● ●●

●●●● ●●

●●

●● ●● ●● ●● ●● ●●●● ●●

●● ●●●●

●●

●●●● ●● ●● ●●● ● ● ● ● ●

● ● ●

●● ● ●

●●

●● ● ●

● ●● ●

●●

●●

●●

●●

●●

●●

●● ● ●

● ● ● ● ●● ● ● ●

●● ●

● ●

●●

●●

●matchcontainedintra−intron

novel isoformrepeatother

Reads per bp

Tra

nscrip

ts

0.01 0.1 1 10 100 1000 10000

02

00

00

50

00

0

Courtesy of Cole Trapnell. Used with permission.

Cufflinks identified 116,839 distinct transcribed fragments (transfrags)

Nearly 70% of the reads in 14,241 matching transcripts

Tracked 8,134 transfrags across all time points, 5,845 complete matches to UCSC/Ensembl/VEGA

Tracked 643 new isoforms of known genes across all points

Transcript categories, by coverage

Slide courtesy Cole Trapnell

16

Page 17: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Case study: myogenesis

•  ~25% of transcripts have light sequence coverage, and are fragments of full transcripts

•  Intronic reads, repeats, and other artifacts are numerous, but account for less than 5% of the assembled reads.

Tra

nscrip

ts (

%)

0.01 0.1 1 10 100 1000 10000

0.0

0.2

0.4

0.6

0.8

1.0

●● ●● ●● ●● ●● ●● ●● ●● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●●● ●● ●● ●●

●●

●● ●● ●●●●

●●●●

●●

●●

●●

●●

●● ●●●● ●●

●● ●●●● ●●

●●●● ●●

●●

●● ●● ●● ●● ●● ●●●● ●●

●● ●●●●

●●

●●●● ●● ●● ●●● ● ● ● ● ●

● ● ●

●● ● ●

●●

●● ● ●

● ●● ●

●●

●●

●●

●●

●●

●●

●● ● ●

● ● ● ● ●● ● ● ●

●● ●

● ●

●●

●●

●matchcontainedintra−intron

novel isoformrepeatother

Reads per bp

Tra

nscrip

ts

0.01 0.1 1 10 100 1000 10000

02

00

00

50

00

0

Transcript categories, by coverage

Courtesy of Cole Trapnell. Used with permission.

Slide courtesy Cole Trapnell

17

Page 18: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Lecture 8 – RNA-seq Analysis

•  RNA-seq principles – How can we characterize mRNA isoform

expression using high-throughput sequencing?

•  Differential expression and PCA – What genes are differentially expressed, and how

can we characterize expressed genes?

•  Single cell RNA-seq – What are the benefits and challenges of working

with single cells for RNA-seq?

18

Page 19: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

19

Page 20: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

20

Page 21: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

21

Page 22: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

22

Page 23: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

23

Page 24: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Scaling RNA-seq data (DESeq)

•  i gene or isoform •  j sample (experiment) •  m number of samples •  Kij number of counts for isoform i in experiment j •  sj sampling depth for experiment j (scale factor)

js =mediani

ijK1m

ivKv=1m∏( )

24

Page 25: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Model for RNA-seq data (DESeq)

•  i gene or isoform p condition •  j sample (experiment) p(j) condition of sample j •  m number of samples •  Kij number of counts for isoform i in experiment j •  qip Average scaled expression for gene i condition p

ijK ~ NB ijµ , ij2σ( )

ij2σ = ijµ + j

2s pv ip( j )q( )ijµ = ip( j )q js

ipq =1

# of replicatesijKjsj in replicates

25

Page 26: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

2 2σ ij = µ ij + s jvp(qip( j ))

Orange Line – DESeq Dashed Orange – edgeR Purple - Poission

Courtesy of the authors. License: CC-BY.Source: Anders, Simon, and Wolfgang Huber. "Differential Expression Analysis for Sequence Count Data." Genome Biology 11, no. 10 (2010): R106.

26

Page 27: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Significance of differential expression using test statistics

•  Hypothesis H0 (null) – Condition A and B identically express isoform i with random noise added

•  Hypothesis H1 – Condition A and B differentially express isoform

•  Degrees of freedom (dof) is the number of free parameters in H1 minus the number of free parameters in H0; in this case degrees of freedom is 4 – 2 = 2 (H1 has an extra mean and variance).

•  Likelihood ratio test defines a test statistic that follows the Chi Squared distribution

iT = 2 log P( iAK |H1)P( iBK |H1)P iAK , iBK |H0( )

P (H0) ≈1−ChiSquaredCDF (T i | dof )27

Page 28: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Courtesy of the authors. License: CC-BY.Source: Anders, Simon, and Wolfgang Huber. "Differential Expression Analysis for Sequence Count Data." Genome Biology 11, no. 10 (2010): R106.

28

Page 29: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Hypergeometric test for overlap significance

N – total # of genes 1000 n1 - # of genes in set A 20 n2 - # of genes in set B 30 k - # of genes in both A and B 3

P k( ) =

n1k

!

"#

$

%& N − n1

n2− k

!

"#

$

%&

Nn2

!

"#

$

%&

P x ≥ k( ) = P(i)i=k

min(n1,n2)∑

0.017 0.020

29

Page 30: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

30

Page 31: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

31

Page 32: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

32

Page 33: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

33

Page 34: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

34

Page 35: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

35

Page 36: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Lecture 8 – RNA-seq Analysis

•  RNA-seq principles – How can we characterize mRNA isoform

expression using high-throughput sequencing?

•  Differential expression and PCA – What genes are differentially expressed, and how

can we characterize expressed genes?

•  Single cell RNA-seq – What are the benefits and challenges of working

with single cells for RNA-seq?

36

Page 37: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Courtesy of Fluidigm Corporation. Used with permission.

37

Page 38: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Single-cell RNA-Seq of LPS-stimulated bone-marrow-derived dendritic cells reveals extensive transcriptome heterogeneity.

AK Shalek et al. Nature 000, 1-5 (2012) doi:10.1038/nature12172

Courtesy of Macmillan Publishers Limited. Used with permission.Source: Shalek, Alex K., Rahul Satija, et al. "Single-cell Transcriptomics Reveals Bimodality in Expression and Splicing in Immune Cells." Nature (2013).38

Page 39: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Analysis of co-variation in single-cell mRNA expression levels reveals distinct maturity states and an antiviral cell circuit.

AK Shalek et al. Nature 000, 1-5 (2012) doi:10.1038/nature12172

Courtesy of Macmillan Publishers Limited. Used with permission.Source: Shalek, Alex K., Rahul Satija, et al. "Single-cell Transcriptomics Reveals Bimodality in Expression and Splicing in Immune Cells." Nature (2013).39

Page 40: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

Analysis of co-variation in single-cell mRNA expression levels reveals distinct maturity states and an antiviral cell circuit.

AK Shalek et al. Nature 000, 1-5 (2012) doi:10.1038/nature12172

Courtesy of Macmillan Publishers Limited. Used with permission.Source: Shalek, Alex K., Rahul Satija, et al. "Single-cell Transcriptomics Reveals Bimodality in Expression and Splicing in Immune Cells." Nature (2013).40

Page 41: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

RNA-seq library complexity can help qualify cells for analysis

Michal Grzadkowski © Michal Grzadkowski. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

41

Page 42: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

RNA-seq library complexity can help qualify cells for analysis

Michal Grzadkowski © Michal Grzadkowski. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

42

Page 43: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

FIN

43

Page 44: Lecture 8 Understanding Transcription RNA-seq analysis ... · Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology ... Rahul Satija, et al. "Single-cell

MIT OpenCourseWarehttp://ocw.mit.edu

7.91J / 20.490J / 20.390J / 7.36J / 6.802J / 6.874J / HST.506J Foundations of Computational and Systems BiologySpring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.