macmanes evolution2014 trimming talk
Post on 08-Jul-2015
Optimal Trimming of mRNA sequence data
Matthew MacManes, University of New Hampshire
Twitter: @PeroMHC | Email: macmanes@gmail.com
Quality trimming of NGS data
• Universal practice
[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) by nucleotide position (x-axis, 0 to 150).]
[Figures: probability of nucleotide error (y-axis, 0.0 to 0.4) by nucleotide position (x-axis, 0 to 150), shown in turn with Phred=5, Phred=10, and Phred=20.]
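The Phred values on these slides map onto the error-probability axis through the standard definition Q = -10 · log10(P). A minimal sketch of that conversion follows; the function names are illustrative and do not come from the talk.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score implied by an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # The trimming thresholds discussed in the talk.
    for q in (2, 5, 10, 20):
        print(f"Phred={q:>2}  P(error)={phred_to_error_prob(q):.3f}")
```

Phred=5 corresponds to an error probability of about 0.316, Phred=10 to 0.1, and Phred=20 to 0.01, so a Phred=20 cutoff is far stricter than a Phred=5 cutoff.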
Trimming Experiment
• Two Illumina datasets, adapter trimmed.
• Subsampled to 10M, 20M, 50M, 75M, and 100M paired-end (PE) reads.
• Trimmed at Phred 0, 2, 5, 10, and 20.
• Assembled using Trinity and SOAPdenovo-Trans
• Developed metrics for evaluating transcriptome assemblies.
MacManes, Frontiers in Genetics 2014
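To make "trimmed at Phred X" concrete, here is a minimal sketch of one simple trimming rule: strip bases from the 3' end of a read until a base at or above the threshold is reached. It assumes Phred+33 quality encoding, and the function names are hypothetical. It is not the trimming tool or algorithm used in the study, which the slides do not name.

```python
def decode_phred33(quality_string: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def trim_3prime(seq: str, qual: str, threshold: int):
    """Drop trailing bases whose quality is below `threshold`.

    Illustrative rule only: real trimmers often use sliding windows
    or other criteria rather than this base-by-base scan.
    """
    scores = decode_phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Made-up read: qualities are 40, 40, 40, 40, 20, 10, 2, 0.
    read, quals = "ACGTACGT", "IIII5+#!"
    for threshold in (0, 2, 5, 10, 20):
        trimmed, _ = trim_3prime(read, quals, threshold)
        print(threshold, trimmed)
```

Under this rule a threshold of 0 removes nothing, and each higher threshold can only shorten the read further.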
Quality trimming reduces error
[Figure: number of nucleotide errors per Mb of assembly at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). One panel shows the 10M, 20M, 50M, 75M, and 100M read subsets (y-axis ticks 1000 to 1800); a second panel shows SOAP10M and SOAP20M (y-axis ticks 4000 to 7000).]
MacManes, Frontiers in Genetics 2014
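The y-axis metric, nucleotide errors per Mb of assembly, is a simple normalization by assembly size. The sketch below shows only the arithmetic; the function name and the example numbers are made up for illustration, and how errors were counted in the study is not described on the slides.

```python
def errors_per_mb(n_errors: int, assembly_length_bp: int) -> float:
    """Normalize an error count to errors per megabase of assembly."""
    if assembly_length_bp <= 0:
        raise ValueError("assembly length must be positive")
    return n_errors / (assembly_length_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical values, not results from the talk.
    print(errors_per_mb(60_000, 50_000_000))
```

Normalizing this way lets assemblies of different total length be compared on the same axis.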
Quality trimming reduces BLAST hits
[Figure: percent difference in number of unique BLAST hits at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). One panel shows the 10M, 20M, 50M, 75M, and 100M read subsets (y-axis, −5 to 1); a second panel shows SOAP10M and SOAP20M (y-axis, −6 to 0).]
MacManes, Frontiers in Genetics 2014
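A percent difference like the one on the y-axis can be computed against a baseline. The sketch below assumes the baseline is the untrimmed assembly at the same read depth, which the slides imply but do not state; the function name and the counts are hypothetical.

```python
def percent_diff(value: float, baseline: float) -> float:
    """Percent difference of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counts of unique BLAST hits, not data from the talk.
    untrimmed_hits = 10_000
    trimmed_hits = 9_700
    print(percent_diff(trimmed_hits, untrimmed_hits))
```

A negative value means the trimmed assembly recovered fewer unique hits than the baseline.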
Quality trimming reduces complete CDS
[Figure: percent difference in number of complete CDS at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). One panel shows the 10M, 20M, 50M, 75M, and 100M read subsets; a second panel shows SOAP10M and SOAP20M (y-axis, −15 to 0 in both).]
MacManes, Frontiers in Genetics 2014
Summary
• Trimming does reduce assembly error, but at the cost of content & contiguity.
• Proposed guidelines:
1. To maximize assembly content and contiguity ➠ trim at Phred 0 or 2
2. If concerned about error ➠ trim at Phred=5
3. In most cases, do not trim at Phred ≥ 10
MacManes, Frontiers in Genetics 2014
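The guidelines above can be written as a small lookup, for example as a default in a pipeline configuration. This is only a sketch: the priority labels and function names are hypothetical, and the returned values restate the slide rather than add to it.

```python
# Thresholds taken from the summary slide; the keys are illustrative labels.
GUIDELINE_THRESHOLDS = {
    "content_and_contiguity": (0, 2),  # maximize content and contiguity
    "error": (5,),                     # when error is the main concern
}


def suggested_phred_thresholds(priority: str) -> tuple:
    """Return the Phred trimming thresholds suggested for a priority."""
    try:
        return GUIDELINE_THRESHOLDS[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


def is_discouraged(threshold: int) -> bool:
    """True for thresholds the summary advises against (Phred >= 10)."""
    return threshold >= 10


if __name__ == "__main__":
    print(suggested_phred_thresholds("error"))
    print(is_discouraged(20))
```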