macmanes evolution2014 trimming talk

Posted on 08-Jul-2015


DESCRIPTION

This is a talk I presented at the Evolution meeting in Raleigh, NC, in June 2014. It describes work to date on establishing the optimal quality-trimming threshold for mRNA-seq data.

TRANSCRIPT

Optimal Trimming of mRNA sequence data

Matthew MacManes, University of New Hampshire

Twitter: @PeroMHC macmanes@gmail.com

Quality trimming of NGS data

• Universal practice

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) versus nucleotide position (x-axis, 0 to 150)]

[Figure, continued: the same error-probability plot with the trimming threshold marked at Phred=5]

[Figure, continued: the same error-probability plot with the trimming threshold marked at Phred=10]

[Figure, continued: the same error-probability plot with the trimming threshold marked at Phred=20]
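The thresholds in this build map onto error probabilities through the standard Phred definition, Q = -10·log10(P). A minimal Python sketch of the conversion follows; the function names are illustrative and not from the talk.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a base call with Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse mapping: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Non-zero thresholds used in the talk (Phred=0, i.e. no trimming, is P = 1.0).
    for q in (2, 5, 10, 20):
        print(f"Phred={q:>2}  P(error)={phred_to_error_prob(q):.3f}")
```

Run as a script, this prints error probabilities of about 0.631 (Phred=2), 0.316 (Phred=5), 0.100 (Phred=10), and 0.010 (Phred=20), so a Phred=2 cutoff removes only bases that are very likely wrong, while Phred=20 also discards bases with just a 1% chance of error.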

Trimming Experiment

• Two Illumina datasets, adapter-trimmed.

• Subsampled to 10M, 20M, 50M, 75M, and 100M paired-end (PE) reads.

• Quality-trimmed at Phred thresholds of 0, 2, 5, 10, and 20.

• Assembled using Trinity and SOAPdenovo-Trans

• Developed metrics for evaluating transcriptome assemblies.

MacManes, Frontiers in Genetics 2014
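The talk does not name the trimming tool or its algorithm. As a hedged illustration of what "trimming at Phred=T" means, here is a minimal end-trimming sketch assuming Phred+33 FASTQ quality encoding; `decode_phred33` and `trim_low_quality_ends` are hypothetical helpers, and many real trimmers use sliding windows instead.

```python
from __future__ import annotations


def decode_phred33(qual: str) -> list[int]:
    """Convert a Phred+33 FASTQ quality string to integer quality scores."""
    return [ord(c) - 33 for c in qual]


def trim_low_quality_ends(seq: str, qual: str, threshold: int) -> tuple[str, str]:
    """Hypothetical helper: strip bases with quality < threshold from both ends.

    Trimming stops at the first base (from each end) that meets the threshold,
    so a threshold of 0 leaves the read untouched.
    """
    scores = decode_phred33(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Made-up read: '&' = Q5, 'I' = Q40, '5' = Q20, '+' = Q10, '#' = Q2.
    seq = "ACGTACGTAC"
    qual = "&IIIII5+&#"
    for t in (0, 2, 5, 10, 20):
        trimmed_seq, _ = trim_low_quality_ends(seq, qual, t)
        print(t, len(trimmed_seq), trimmed_seq)
```

For this made-up read, thresholds 0 and 2 keep all 10 bases, 5 keeps 9, 10 keeps 7, and 20 keeps 6, mirroring how higher cutoffs discard progressively more sequence.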

[Figure: number of nucleotide errors per Mb of assembly (y-axis ticks 1000 to 1800) at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per subsample size (10M, 20M, 50M, 75M, 100M reads)]

Quality trimming reduces error

MacManes, Frontiers in Genetics 2014

[Figure, second panel: the same metric for SOAPdenovo-Trans assemblies (y-axis ticks 4000 to 7000) at each trimming level, series SOAP10M and SOAP20M]
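Because trimmed assemblies can differ in size, the error count on these slides is normalized per megabase of assembly. The slides do not say how errors were counted, so this sketch assumes you already have a count; `errors_per_mb` is a hypothetical helper.

```python
def errors_per_mb(n_errors: int, assembly_length_bp: int) -> float:
    """Hypothetical helper: nucleotide errors per megabase of assembled sequence."""
    if assembly_length_bp <= 0:
        raise ValueError("assembly length must be positive")
    return n_errors / (assembly_length_bp / 1_000_000)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the talk.
    print(errors_per_mb(1200, 40_000_000))  # 30.0
```

Normalizing this way keeps a smaller assembly from looking more accurate simply because it contains less sequence in which errors could occur.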

[Figure: percent difference in number of unique BLAST hits (y-axis ticks -5 to 1) at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per subsample size (10M, 20M, 50M, 75M, 100M reads)]

Quality trimming reduces BLAST hits

MacManes, Frontiers in Genetics 2014

[Figure, second panel: the same metric for SOAPdenovo-Trans assemblies (y-axis ticks -6 to 0) at each trimming level, series SOAP10M and SOAP20M]
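The slides plot the percent difference in unique BLAST hits but do not define "unique". This sketch assumes it means distinct subject sequences in BLAST tabular output (`-outfmt 6`, where column 2 is the subject ID) and that the baseline is the untrimmed assembly; both helper names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def unique_subject_hits(blast_lines: Iterable[str]) -> int:
    """Count distinct subject IDs (column 2) in BLAST -outfmt 6 output."""
    subjects: set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        subjects.add(line.split("\t")[1])
    return len(subjects)


def percent_diff(trimmed: float, untrimmed: float) -> float:
    """Percent change of a trimmed assembly's metric relative to no trimming."""
    return (trimmed - untrimmed) * 100 / untrimmed


if __name__ == "__main__":
    hits = ["q1\tsA\t99.0", "q2\tsA\t98.5", "q3\tsB\t97.1"]
    print(unique_subject_hits(hits))  # 2
    print(percent_diff(95, 100))      # -5.0
```

Under this reading, a negative value means the trimmed assembly recovered fewer distinct database sequences than the untrimmed one.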

[Figure: percent difference in number of complete CDS (y-axis ticks -15 to 0) at each trimming level (No Trim, Phred=2, 5, 10, 20), one series per subsample size (10M, 20M, 50M, 75M, 100M reads)]

Quality trimming reduces complete CDS

MacManes, Frontiers in Genetics 2014

[Figure, second panel: the same metric for SOAPdenovo-Trans assemblies (y-axis ticks -15 to 0) at each trimming level, series SOAP10M and SOAP20M]
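The slides do not say how complete CDS were identified. As a rough stand-in, this sketch treats a contig as containing a complete CDS if any forward-strand frame has an ATG-initiated open reading frame that ends in an in-frame stop codon and meets a minimum length; the 100-codon default and the helper names are assumptions, and dedicated ORF predictors also scan the reverse strand.

```python
from __future__ import annotations

from typing import Iterable

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_complete_orf(seq: str, min_codons: int = 100) -> bool:
    """Hypothetical check: True if a forward-strand frame holds an ORF that
    begins with ATG, ends with an in-frame stop codon, and spans at least
    min_codons codons (stop included). The reverse strand is not scanned."""
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    return True
                start = None  # too short; look for the next ATG
    return False


def count_complete_cds(contigs: Iterable[str], min_codons: int = 100) -> int:
    """Number of contigs with at least one complete ORF under the rule above."""
    return sum(has_complete_orf(c, min_codons) for c in contigs)


if __name__ == "__main__":
    contigs = ["ATGAAATAA", "CCATGAAATAGC", "ATGAAAAAA"]
    print(count_complete_cds(contigs, min_codons=3))  # 2
```

The third toy contig has a start codon but no in-frame stop, so it counts as incomplete; fragmented or truncated transcripts fail this test in the same way.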

Summary

• Trimming does reduce assembly error, but at the cost of content & contiguity.

• Proposed guidelines.

1. To maximize assembly content and contiguity ➠ Trim at Phred=0 or 2

2. If concerned about error ➠ Trim at Phred=5

3. Probably never trim at Phred ≥ 10

MacManes, Frontiers in Genetics 2014

Questions? @PeroMHC

macmanes@gmail.com
