macmanes evolution2014 trimming talk


Uploaded by matthew-macmanes on 08-Jul-2015


DESCRIPTION

This is a talk I presented at the Evolution meeting in Raleigh, NC, in June 2014. It describes work to date on establishing optimal quality-trimming thresholds for mRNA-seq data.

TRANSCRIPT

Page 1

Optimal Trimming of mRNA sequence data

Matthew MacManes, University of New Hampshire

Twitter: @PeroMHC [email protected]

Page 2

Quality trimming of NGS data

• Universal practice

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) by nucleotide position (x-axis, 0 to 150)]

Page 3

Quality trimming of NGS data

[Figure: probability of nucleotide error by nucleotide position (0 to 150), annotated Phred=5]

Page 4

Quality trimming of NGS data

[Figure: probability of nucleotide error by nucleotide position (0 to 150), annotated Phred=10]

Page 5

Quality trimming of NGS data

[Figure: probability of nucleotide error by nucleotide position (0 to 150), annotated Phred=20]
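The thresholds on these slides are Phred scores, which are logarithmic: a score Q corresponds to an error probability P = 10^(-Q/10). Phred=5, 10, and 20 therefore mean roughly a 32%, 10%, and 1% chance that a base call is wrong. A minimal Python sketch of the conversion (the function names are illustrative, not from the talk):

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    # Phred definition: Q = -10 * log10(P)  =>  P = 10 ** (-Q / 10)
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Thresholds from the slides, plus the most lenient trimming level used (Phred=2)
for q in (2, 5, 10, 20):
    print(f"Phred={q:>2}: P(error) = {phred_to_error_prob(q):.3f}")
```

By the same formula, a Phred=2 threshold targets only bases whose calls are more likely wrong than right (error probability above about 0.63).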

Page 6

Trimming Experiment

• 2 Illumina datasets, adapter trimmed.

• Subsampled to 10M, 20M, 50M, 75M, and 100M paired-end (PE) reads.

• Quality trimmed at Phred 0, 2, 5, 10, and 20.

• Assembled using Trinity and SOAPdenovo-Trans

• Developed metrics for evaluating transcriptome assemblies.

MacManes, Frontiers in Genetics 2014
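The slide does not say which trimming tool or settings were used. As a hypothetical illustration only, the sketch below shows one simple scheme: removing bases from the 3' end of a read while their Phred score is below the threshold, similar in spirit to trailing-base trimming. It assumes Phred+33 quality encoding and well-formed 4-line FASTQ records; the `min_len` cutoff is an illustrative parameter, not a value from the study.

```python
def decode_phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]

def trim_trailing(seq: str, qual: str, threshold: int) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below `threshold`.

    A threshold of 0 never removes anything (Phred+33 scores are >= 0).
    """
    scores = decode_phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]

def trim_fastq(lines, threshold: int, min_len: int = 25):
    """Yield (header, seq, plus, qual) for each 4-line FASTQ record after trimming.

    Reads shorter than the hypothetical `min_len` cutoff after trimming are dropped.
    """
    records = iter(lines)
    for header in records:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(records).strip()
        plus = next(records).strip()
        qual = next(records).strip()
        seq, qual = trim_trailing(seq, qual, threshold)
        if len(seq) >= min_len:
            yield header.strip(), seq, plus, qual

# Toy read: quality drops toward the 3' end ('0' = Q15, '(' = Q7, '%' = Q4, '#' = Q2)
read, qual = "ACGTACGTAC", "IIIIII0(%#"
for q in (0, 2, 5, 10, 20):
    print(f"Phred={q:>2}: {trim_trailing(read, qual, q)[0]}")
```

Stricter thresholds shorten reads more. For PE data, dropping short reads independently in each file would break mate pairing, so a real pipeline has to keep both mates in sync.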

Page 7

[Figure: number of nucleotide errors per Mb of assembly at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per read subset (10M, 20M, 50M, 75M, 100M)]

Quality trimming reduces error

MacManes, Frontiers in Genetics 2014
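Errors are reported per megabase so that assemblies of different total size can be compared. The slide does not describe how the errors themselves were counted, so this sketch (hypothetical function names) assumes an error count is already available and shows only the normalization by assembly length.

```python
def assembly_length(fasta_lines) -> int:
    """Total bases across all contigs in a FASTA file (header lines are skipped)."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))

def errors_per_mb(n_errors: int, total_bp: int) -> float:
    """Scale an error count to errors per megabase (1 Mb = 1,000,000 bp) of assembly."""
    if total_bp <= 0:
        raise ValueError("assembly length must be positive")
    return n_errors / total_bp * 1_000_000

# Toy assembly: two contigs totalling 9 bp
fasta = [">contig1\n", "ACGT\n", "AC\n", ">contig2\n", "GGG\n"]
print(errors_per_mb(1, assembly_length(fasta)))
```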

Page 8

[Figure, two panels: number of nucleotide errors per Mb of assembly at each trimming level (No Trim, Phred=2, 5, 10, 20) for SOAPdenovo-Trans assemblies (SOAP10M, SOAP20M), alongside the 10M to 100M read subsets from the previous slide]

Quality trimming reduces error

Page 9

[Figure: percent difference in number of unique BLAST hits at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per read subset (10M, 20M, 50M, 75M, 100M)]

Quality trimming reduces BLAST hits

MacManes, Frontiers in Genetics 2014
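Each trimming level is reported as a percent difference in unique BLAST hits. A minimal sketch, assuming "unique hits" means distinct subject sequences in BLAST tabular output (`-outfmt 6`, where column 2 is the subject ID) and that the untrimmed assembly is the baseline; both are assumptions, not details given on the slide.

```python
def unique_subjects(blast_tab_lines) -> set[str]:
    """Distinct subject IDs (column 2) in BLAST tabular output (-outfmt 6)."""
    subjects = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (present with -outfmt 7)
        subjects.add(line.rstrip("\n").split("\t")[1])
    return subjects

def percent_diff(value: float, baseline: float) -> float:
    """Percent change of a metric relative to the baseline (untrimmed) assembly."""
    return (value - baseline) / baseline * 100

# Toy hit tables: in the trimmed assembly both queries hit the same subject
untrimmed = ["q1\tsubjA\t99.1\n", "q2\tsubjB\t97.3\n"]
trimmed = ["q1\tsubjA\t99.0\n", "q2\tsubjA\t96.8\n"]
n_base, n_trim = len(unique_subjects(untrimmed)), len(unique_subjects(trimmed))
print(f"{percent_diff(n_trim, n_base):+.1f}%")
```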

Page 10

[Figure, two panels: percent difference in number of unique BLAST hits at each trimming level (No Trim, Phred=2, 5, 10, 20) for the 10M to 100M read subsets, and for SOAPdenovo-Trans assemblies (SOAP10M, SOAP20M)]

Quality trimming reduces BLAST hits

Page 11

[Figure: percent difference in number of complete CDS at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per read subset (10M, 20M, 50M, 75M, 100M)]

Quality trimming reduces complete CDS

MacManes, Frontiers in Genetics 2014
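Counting complete coding sequences means deciding whether a transcript contains an open reading frame with both a start and a stop codon. The slide does not name the ORF-calling tool, so the sketch below is a simplified stand-in, not the study's method: it scans the three forward-strand frames for ATG...stop (TAA, TAG, TGA) and applies a hypothetical minimum length of 100 codons. Reverse-strand ORFs are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_complete_orf(seq: str, min_codons: int = 100) -> bool:
    """True if a forward-strand frame of `seq` has an ATG...stop ORF of >= min_codons codons.

    The codon count includes the start and stop codons.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None  # index of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    return True
                start = None  # ORF too short; look for the next ATG
    return False

def count_complete_cds(transcripts, min_codons: int = 100) -> int:
    """Number of transcripts containing at least one complete forward-strand ORF."""
    return sum(has_complete_orf(t, min_codons) for t in transcripts)
```

Comparing this count between a trimmed and an untrimmed assembly gives the kind of percent difference plotted on the slide.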

Page 12

[Figure, two panels: percent difference in number of complete CDS at each trimming level (No Trim, Phred=2, 5, 10, 20) for the 10M to 100M read subsets, and for SOAPdenovo-Trans assemblies (SOAP10M, SOAP20M)]

Quality trimming reduces complete CDS

Page 13

Summary

• Trimming does reduce assembly error, but at the cost of content & contiguity.

• Proposed guidelines:

1. To maximize assembly content and contiguity ➠ Trim at Phred=0 or 2

2. If concerned about error ➠ Trim at Phred=5

3. In most cases, avoid trimming at Phred ≥ 10

MacManes, Frontiers in Genetics 2014

Page 14

Questions? @PeroMHC

[email protected]