Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data Alex Zelikovsky Department of Computer Science Georgia State University Joint work with Serghei Mangul, Sahar Al Seesi, Adrian Caciula, Dumitru Brinza, Ion Mandoiu


Post on 23-Feb-2016



Page 1: Transcriptome Assembly and Quantification  from Ion  Torrent RNA-Seq Data

Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data

Alex Zelikovsky Department of Computer Science

Georgia State University

Joint work with Serghei Mangul, Sahar Al Seesi, Adrian Caciula, Dumitru Brinza, Ion Mandoiu


Advances in Next Generation Sequencing

http://www.economist.com/node/16349358

• Roche/454 FLX Titanium: 400-600 million bases/run, 400bp avg. length
• Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100bp read length
• SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50bp read length
• Ion Proton Sequencer


RNA-Seq

• Make cDNA & shatter into fragments
• Sequence fragment ends
• Map reads

[Figure: exons A B C D E and isoforms A B C, A C, D E, illustrating three tasks: gene expression, transcriptome reconstruction, and isoform expression.]


Transcriptome Assembly

• Given partial or incomplete information about something, use that information to make an informed guess about the missing or unknown data.


Transcriptome Assembly Types

• Genome-independent reconstruction (de novo)
– de Bruijn k-mer graph

• Genome-guided reconstruction (ab initio)
– Spliced read mapping
– Exon identification
– Splice graph

• Annotation-guided reconstruction
– Use existing annotation (known transcripts)
– Focus on discovering novel transcripts
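As a rough illustration of the de Bruijn k-mer graph named under genome-independent reconstruction, the sketch below links each (k-1)-mer prefix of a k-mer to its (k-1)-mer suffix. The function name `de_bruijn_graph` and the toy reads are hypothetical; real assemblers add error correction, coverage tracking, and graph simplification that are not shown here.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn_graph(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    # Sort successors so the result is deterministic.
    return {node: sorted(succ) for node, succ in graph.items()}


# Hypothetical toy reads (made up for illustration).
toy_reads = ["ACGT", "CGTA"]
toy_graph = de_bruijn_graph(toy_reads, k=3)
print(toy_graph)
```

Overlapping reads share k-mers, so they collapse onto the same path in the graph; that shared path is what a de novo assembler walks to reconstruct a transcript.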


Previous approaches

• Genome-independent reconstruction
– Trinity (2011), Velvet (2008), TransABySS (2008)

• Genome-guided reconstruction
– Scripture (2010): reports “all” transcripts
– Cufflinks (2010), IsoLasso (2011), SLIDE (2012), CLIIQ (2012), TRIP (2012), Traph (2013): minimize the set of transcripts explaining the reads

• Annotation-guided reconstruction
– RABT (2011), DRUT (2011)


Gene representation

• Pseudo-exons - regions of a gene between consecutive transcriptional or splicing events

• Gene - set of non-overlapping pseudo-exons

[Figure: transcripts Tr1, Tr2, Tr3 with exons e1-e6, partitioned into pseudo-exons pse1-pse7 with boundary labels Spse1/Epse1 through Spse7/Epse7.]
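The pseudo-exon definition above can be sketched in a few lines: every exon start or end counts as a boundary, and each exonic stretch between consecutive boundaries becomes one pseudo-exon. This is a minimal sketch, assuming half-open integer coordinates; the function name `pseudo_exons` and the toy gene are hypothetical and are not the coordinates behind the slide's figure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumption)


def pseudo_exons(transcripts: List[List[Interval]]) -> List[Interval]:
    """Split the exonic part of a gene into non-overlapping pseudo-exons."""
    # Every exon start or end marks a transcriptional or splicing event.
    boundaries = sorted({pos for tr in transcripts for exon in tr for pos in exon})
    result = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep the segment only if some exon covers it entirely;
        # otherwise it is intronic in every transcript.
        if any(s <= left and right <= e for tr in transcripts for (s, e) in tr):
            result.append((left, right))
    return result


# Hypothetical toy gene with three transcripts (made-up coordinates).
toy_gene = [
    [(0, 100), (200, 300), (400, 500)],
    [(0, 100), (250, 300), (400, 500)],
    [(50, 100), (400, 450)],
]
toy_pseudo_exons = pseudo_exons(toy_gene)
print(toy_pseudo_exons)
```

With this representation each transcript is a subset of the pseudo-exons, which is what lets a gene be treated as a set of non-overlapping pieces.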


Splice Graph

[Figure: splice graph drawn over the genome, with pseudo-exon nodes 1-9, TSS, and TES.]


MaLTA: Maximum Likelihood Transcriptome Assembly

• Map the RNA-Seq reads to the genome

• Construct the splice graph G(V,E)
– V: exons
– E: splicing events

• Candidate transcripts
– depth-first search (DFS)

• Select candidate transcripts
– IsoEM
– greedy algorithm
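The candidate-transcript step can be pictured as enumerating paths in the splice graph by depth-first search. The sketch below is an illustration under assumptions, not MaLTA's actual code: the graph is an adjacency dictionary over pseudo-exon ids, and the names `enumerate_candidates`, `sources`, and `sinks` are hypothetical stand-ins for transcript start and end nodes.

```python
from typing import Dict, List


def enumerate_candidates(
    graph: Dict[int, List[int]], sources: List[int], sinks: List[int]
) -> List[List[int]]:
    """List every source-to-sink path of an acyclic splice graph via DFS."""
    sink_set = set(sinks)
    paths: List[List[int]] = []

    def dfs(node: int, path: List[int]) -> None:
        path.append(node)
        if node in sink_set:
            # A path ending at a sink is one candidate transcript.
            paths.append(list(path))
        for nxt in graph.get(node, []):
            dfs(nxt, path)
        path.pop()  # backtrack

    for start in sources:
        dfs(start, [])
    return paths


# Hypothetical splice graph: exon 2 is optionally skipped.
toy_splice_graph = {1: [2, 3], 2: [3], 3: [4], 4: []}
toy_candidates = enumerate_candidates(toy_splice_graph, sources=[1], sinks=[4])
print(toy_candidates)
```

The number of paths can grow exponentially with the number of alternative splicing events, which is one reason a selection step is needed afterwards.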


How to select?

• Select the smallest set of candidate transcripts covering all transcript variants

• Transcript: set of transcript variants

Sharmistha Pal, Ravi Gupta, Hyunsoo Kim, et al., Alternative transcription exceeds alternative splicing in generating the transcriptome diversity of cerebellar development, Genome Res. 2011 21: 1260-1272

[Figure labels: alternative first exon, alternative last exon, exon skipping, intron retention, alternative 5' splice junction, alternative 5' splice junction, splice junction]


IsoEM: Isoform Expression Level Estimation

• Expectation-Maximization algorithm

• Unified probabilistic model incorporating
– Single and/or paired reads
– Fragment length distribution
– Strand information
– Base quality scores
– Repeat and hexamer bias correction
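To make the Expectation-Maximization idea concrete, here is a deliberately simplified sketch. It assumes each read comes with precomputed compatibility weights for the isoforms it can originate from, and it omits isoform-length normalization and the bias corrections listed above, so it is not IsoEM itself. The name `em_isoform_frequencies` and the toy reads are hypothetical.

```python
from typing import Dict, List


def em_isoform_frequencies(
    read_weights: List[Dict[str, float]], isoforms: List[str], iterations: int = 100
) -> Dict[str, float]:
    """Estimate isoform frequencies from per-read compatibility weights."""
    # Start from a uniform guess.
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(iterations):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read among its compatible isoforms.
        for weights in read_weights:
            total = sum(freq[iso] * w for iso, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform
            for iso, w in weights.items():
                counts[iso] += freq[iso] * w / total
        assigned = sum(counts.values())
        if assigned == 0.0:
            break
        # M-step: new frequencies are the normalized expected counts.
        freq = {iso: counts[iso] / assigned for iso in isoforms}
    return freq


# Hypothetical reads: 3 unique to ABC, 1 unique to AC, 2 ambiguous.
toy_reads = (
    [{"ABC": 1.0}] * 3 + [{"AC": 1.0}] + [{"ABC": 1.0, "AC": 1.0}] * 2
)
toy_freq = em_isoform_frequencies(toy_reads, ["ABC", "AC"])
print(toy_freq)
```

In this toy case the ambiguous reads end up split in proportion to the evidence from the uniquely assigned reads, which is the core behaviour the EM iteration provides.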


Read-isoform compatibility graph

Weight of the edge between read r and isoform i, summed over alignments a:

$w_{r,i} = \sum_{a} O_a Q_a F_a$
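The edge weight of the read-isoform compatibility graph, read here as w_{r,i} = sum over alignments a of O_a * Q_a * F_a, can be computed as below. Interpreting O_a as a 0/1 strand-consistency indicator, Q_a as a base-quality probability, and F_a as a fragment-length probability is an assumption based on the model components listed for IsoEM; the class `Alignment` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Alignment:
    """One alignment a of a read to an isoform (field names are illustrative)."""
    strand_ok: bool        # O_a: alignment agrees with strand information
    quality_prob: float    # Q_a: probability derived from base qualities
    fragment_prob: float   # F_a: probability of the implied fragment length


def edge_weight(alignments: Iterable[Alignment]) -> float:
    """Sum O_a * Q_a * F_a over all alignments of one read to one isoform."""
    return sum(
        (1.0 if a.strand_ok else 0.0) * a.quality_prob * a.fragment_prob
        for a in alignments
    )


# Hypothetical alignments of one read to one isoform.
toy_alignments = [
    Alignment(True, 0.9, 0.2),
    Alignment(False, 0.8, 0.5),  # wrong strand: contributes nothing
    Alignment(True, 0.5, 0.1),
]
toy_weight = edge_weight(toy_alignments)
print(toy_weight)
```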


Fragment length distribution

[Figure: isoforms A B C and A C, with fragment length distribution plots marking F_a(i) and F_a(j) for isoforms i and j.]


Greedy algorithm

1. Sort transcripts by inferred IsoEM expression levels in decreasing order

2. Traverse transcripts
– Select a transcript if it contains a novel transcript variant
– Continue traversing until all transcript variants are covered
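The two steps above can be sketched as follows. This is a minimal sketch, assuming expression levels and the variant content of each candidate transcript are already available as dictionaries; the function name `greedy_select` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(
    expression: Dict[str, float], variants: Dict[str, Set[str]]
) -> List[str]:
    """Pick transcripts in decreasing expression order until all variants are covered."""
    all_variants: Set[str] = set().union(*variants.values())
    covered: Set[str] = set()
    selected: List[str] = []
    # Step 1: sort by expression level, highest first.
    for transcript in sorted(expression, key=expression.get, reverse=True):
        # Step 2: keep the transcript only if it adds a novel variant.
        if variants[transcript] - covered:
            selected.append(transcript)
            covered |= variants[transcript]
        if covered == all_variants:
            break  # all transcript variants are covered
    return selected


# Hypothetical candidates with made-up expression levels and variants.
toy_expression = {"T1": 50.0, "T2": 30.0, "T3": 20.0, "T4": 5.0, "T5": 1.0}
toy_variants = {
    "T1": {"v1", "v2"},
    "T2": {"v2"},
    "T3": {"v2", "v3"},
    "T4": {"v1", "v3", "v4"},
    "T5": {"v4"},
}
toy_selected = greedy_select(toy_expression, toy_variants)
print(toy_selected)
```

Here T2 is skipped because it adds nothing new, and T5 is never reached because coverage is already complete after T4.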

[Figure, animated over slides 15-25: transcripts sorted by expression levels are checked one by one against the list of transcript variants.]

STOP. All transcript variants are covered.


MaLTA results on GOG-350 dataset

• 4.5M single Ion reads with average read length 121 bp, aligned using TopHat2

• Number of assembled transcripts
– MaLTA: 15385
– Cufflinks: 17378

• Number of transcripts matching annotations
– MaLTA: 4555 (26%)
– Cufflinks: 2031 (13%)


Expression Estimation on Ion Torrent reads

[Figure: bar chart of R2 for IsoEM/Cufflinks estimates vs qPCR, y-axis from 0 to 0.8; bars: IsoEM HBR, Cufflinks HBR, IsoEM UHR, Cufflinks UHR.]

• Squared correlation
– IsoEM / Cufflinks FPKMs vs qPCR values for 800 genes
– 2 MAQC samples: Human Brain Reference (HBR) and Universal Human Reference (UHR)
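The squared correlation reported here can be computed as sketched below. The slides do not say which correlation coefficient or data transformation was used, so this sketch assumes plain Pearson correlation on the values as given; the function name `squared_correlation` and the toy numbers are hypothetical and are not the study's data.

```python
from typing import Sequence


def squared_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (R^2) between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up FPKM estimates and qPCR values for three genes.
toy_fpkm = [1.0, 2.0, 3.0]
toy_qpcr = [1.0, 3.0, 2.0]
toy_r2 = squared_correlation(toy_fpkm, toy_qpcr)
print(toy_r2)
```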


Conclusions

• Novel method for transcriptome assembly

• Validated on Ion Torrent RNA-Seq data

• Compared with Cufflinks:
– similar number of assembled transcripts
– 2x more previously annotated transcripts

• Transcript quantification is useful for transcript assembly
– Open question: better quantification?
