FIND MEANING IN COMPLEXITY
© Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved.
Kevin Corcoran, SVP Market Development
PacBio User Group Meeting, September 18, 2013
PacBio User Meeting
Welcome
West Coast User Group Meeting
Thank You!
Your Success Is Our Success
More than 50 customer publications
Special Thanks to our Reception Sponsors
Latest Advances – Last 9 Months
• HGAP & Quiver
• PacBio® RS II
• P4 Enzyme
Finishing Genomes Using Only PacBio® Reads
Hierarchical Genome Assembly Process (HGAP)
• Utilizes all PacBio data from a single long-insert library
– Longest reads for continuity
– All reads for high consensus accuracy
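The "longest reads for continuity" idea can be illustrated by picking a read-length cutoff so that only the longest reads are kept as seeds. The sketch below is a minimal illustration, assuming the cutoff is chosen so the seed reads alone reach a target coverage of the genome; the function name, the target-coverage rule, and the example numbers are hypothetical and are not taken from HGAP itself.

```python
from typing import List


def seed_length_cutoff(read_lengths: List[int], genome_size: int,
                       target_coverage: float) -> int:
    """Hypothetical helper: return the largest length cutoff L such that
    reads of length >= L together supply at least
    target_coverage * genome_size bases.

    Illustrative only; the target-coverage rule is an assumption,
    not HGAP's documented selection procedure.
    """
    needed = target_coverage * genome_size
    total = 0
    # Walk from the longest read downward, accumulating bases until the
    # longest reads alone reach the target coverage.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= needed:
            return length
    raise ValueError("not enough data to reach the target seed coverage")


if __name__ == "__main__":
    reads = [10_000, 8_000, 6_000, 4_000, 2_000, 1_000]
    cutoff = seed_length_cutoff(reads, genome_size=1_000, target_coverage=20)
    # Long reads become seeds; per the slide, all reads (seeds included)
    # still contribute to consensus accuracy.
    seeds = [n for n in reads if n >= cutoff]
    print(cutoff, seeds)
```

Sorting longest-first means the first point where the running total crosses the target is also the largest cutoff that still meets it.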
Quiver: A New Consensus Caller for PacBio® Data
• Can achieve consensus accuracy >QV50 (i.e., >99.999%) using only PacBio reads
• How Quiver works
– Takes multiple reads of a given DNA template and outputs its best estimate of the template sequence
– Uses a QV-aware hidden Markov model to account for sequencing errors, and a greedy algorithm to find the maximum-likelihood template
– Uses an underlying algorithm similar to the one currently used to generate circular consensus sequencing (CCS) reads
• Links:
– www.pacbiodevnet.com/quiver
– https://github.com/PacificBiosciences/GenomicConsensus
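The QV figures quoted here follow the Phred convention, QV = −10 · log10(error rate), which is why QV50 corresponds to 99.999% accuracy and each additional 10 QV cuts the error rate tenfold. A minimal sketch of the conversion in both directions (the function names are hypothetical):

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to accuracy: 1 - 10^(-QV/10)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert accuracy (fraction of correct bases, < 1) to a Phred-scaled QV."""
    error = 1.0 - accuracy
    if error <= 0:
        # A perfect consensus has zero observed errors and no finite QV.
        raise ValueError("accuracy must be < 1")
    return -10.0 * math.log10(error)


if __name__ == "__main__":
    # Reproduce the QV/accuracy pairs used on the accuracy plot's y-axis.
    for qv in range(10, 80, 10):
        print(f"QV {qv}: {qv_to_accuracy(qv) * 100:.5f}%")
```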
SMRT® Sequencing Accuracy
Figure: Consensus accuracy (QV) vs. coverage (0 to 100x) for E. coli, R. palustris, and S. aureus; y-axis from 90% (QV 10) to 99.99999% (QV 70), with "Perfect consensus" marked
Data generated with P4-C2 chemistry on PacBio® RS II; analyzed using Quiver in SMRT® Analysis 2.0.1
PacBio® RS II
Pre- and Post-PacBio® RS II Upgrade
Throughput effectively doubles with no significant changes to other sequencing metrics
Organism | Chemistry | Insert Size | System | Filtered Reads | Filtered Bases (Mb) | Mapped Subreads | Avg Subread Length (bp) | Avg Read Length (bp) | 95th Percentile Read Length (bp) | Max Read Length (bp) | Single-Pass Accuracy
B. subtilis | C2/C2 | 10 kb | RS II | 68,475 | 221.8 | 91,071 | 2,000 | 2,729 | 7,215 | 21,060 | 86.90%
R. palustris | C2/C2 | 10 kb | RS II | 63,115 | 208.4 | 81,998 | 2,104 | 2,784 | 7,457 | 19,798 | 85.50%
E. coli | C2/C2 | 10 kb | RS II | 73,820 | 293.9 | 107,520 | 2,172 | 3,234 | 8,446 | 20,195 | 85.00%
E. coli | C2/C2 | 10 kb | RS | 32,407 | 107.9 | 46,894 | 1,938 | 2,784 | 7,237 | 19,006 | 85.00%
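The "throughput effectively doubles" claim can be checked directly against the two E. coli rows, which differ only in system model. A minimal sketch that computes RS II / RS ratios; the dictionary keys are hypothetical labels, and the values are copied from the table (one run per system, so the ratios are indicative rather than averaged results):

```python
# Values copied from the E. coli rows of the table
# (C2/C2 chemistry, 10 kb insert); one run per system.
RS = {"filtered_reads": 32_407, "filtered_bases_mb": 107.9,
      "mapped_subreads": 46_894, "avg_read_length": 2_784}
RS_II = {"filtered_reads": 73_820, "filtered_bases_mb": 293.9,
         "mapped_subreads": 107_520, "avg_read_length": 3_234}


def fold_changes(before: dict, after: dict) -> dict:
    """Ratio after/before for every metric present in both dicts."""
    return {k: after[k] / before[k] for k in before if k in after}


if __name__ == "__main__":
    for metric, ratio in fold_changes(RS, RS_II).items():
        print(f"{metric}: {ratio:.2f}x")
```

For these rows, the throughput metrics rise by roughly 2.3x to 2.7x, while average read length changes by about 1.16x and single-pass accuracy is unchanged at 85.00%.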
Figure: Megabases per SMRT® Cell (0 to 350) and reads per SMRT® Cell (0 to 80,000) for E. coli (RS and RS II), B. subtilis, and R. palustris
Phi29 DNA Polymerase
• DNA polymerase from Bacillus subtilis phage phi29
• B-family polymerase
• Highly processive
• High fidelity (3′-to-5′ exonuclease proofreading)
• Good strand-displacement activity
• Single subunit (66 kDa)
Berman, A.J. et al., EMBO J. (2007) 26(14):3494-505
P4 Polymerase Performance
• Performance summary of current chemistries*
Enzyme | Sequencing Chemistry | Coverage to Reach QV50 | Mean Mapped Read Length (bp)
P4 | C2 | 32x | 3,995
C2 | C2 | 32x | 3,235
XL | C2 | 70x | 4,012
*E. coli 10 kb library, PacBio® RS II, Stage Start, 120-minute movies
• Key Points
– P4 polymerase shows consensus accuracy similar to that of the C2 polymerase
– P4 polymerase shows read lengths similar to those of the XL polymerase
Typical Results: Accuracy with P4 Polymerase
Figure: Consensus accuracy (QV) vs. coverage for XL-C2, C2-C2, and P4-C2
*E. coli 10 kb library, PacBio® RS II, Stage Start, 120 min movies, C2 sequencing chemistry, SMRT® Analysis 2.0.1
• Key Points
– P4 polymerase shows consensus accuracy similar to that of C2
– P4 polymerase outperforms XL in consensus accuracy
Increasing Read Length Beyond the P4 Enzyme
Photodamage Impacts Read Length
Figure: Dye and polymerase (Pol), showing the polymerase surface that the dye can access
Photodamage Mitigation: Photo-Protected Analogs
• A large macromolecular scaffold can prevent the dye from touching the polymerase
Figure: Dye held away from the polymerase (Pol) by a protecting scaffold; the dye cannot access the polymerase surface
Figure: Read length (bp, 0 to 9,000) by quarter, 2008 to 2013, across successive chemistries: early PacBio chemistries (453, 1012, 1734), LPR, FCR, ECR2, C2–C2, XL, P4–C2, and P5–C3, with P5–C3 reaching ~8,500 bp
P5-C3 Chemistry Average Read Lengths
Throughput: ~350–400 Mb
New Products and Features
• Reagents
– DNA Sequencing Reagent 3.0
– DNA/Polymerase Binding Kit P5
• Instrument Control Software 2.1
– 3-hour movies
– CCS processing off instrument
– Reliability enhancement: instrument adjustment to SMRT® Cell SNR
• SMRT® Analysis 2.1
– Diploid Quiver
– Amplicon haplotype phasing
– BridgeMapper
• Binding Calculator
– Support for size-selected libraries
• New protocol to support Sage Science’s BluePippin™ size selection
– Fully supported 20 kb size-selected library protocol
Social Media Policy
• Talks are ‘tweetable’ unless the speaker says otherwise
• Use the hashtag #PacBioUGM
• Follow us: @PacBio
• Subscribe to our blog at: Blog.pacificbiosciences.com
Morning Agenda
Time | Session | Speaker
9:00 – 9:15 a.m. | Opening Remarks | Kevin Corcoran, Senior Vice President, Market Development, Pacific Biosciences
9:15 – 9:35 a.m. | Overview of JGI’s Microbial and Fungal Reference Assembly Pipeline | Alex Copeland, DOE Joint Genome Institute
9:35 – 9:55 a.m. | The Epigenomic Landscape of Bacteria | Matthew Blow, Ph.D., DOE Joint Genome Institute
9:55 – 10:20 a.m. | Population Genomics & Molecular Diagnostics: 100K Pathogen Genomes | Bart Weimer, Ph.D., UC Davis School of Veterinary Medicine
10:20 – 10:45 a.m. | Population Genomic and Epigenomic Study of Arabidopsis thaliana with SMRT Sequencing | Chongyuan Luo, Ph.D., Salk Institute for Biological Studies
10:45 – 11:20 a.m. | Coffee Break
11:20 – 11:45 a.m. | A Comparison of 454 and PacBio RS in the Context of Characterizing HIV-1 Intra-host Diversity | Lance Hepler, Ph.D., Center for AIDS Research, UC San Diego
11:45 – 12:10 p.m. | PacBio Meets the Microbiome | George Weinstock, Ph.D., Washington University in St. Louis
12:10 – 12:35 p.m. | Genomic Architecture of the KIR and MHC-B and -C Regions in Orangutan | Lisbeth Guethlein, Ph.D., Stanford University School of Medicine
Afternoon Agenda
Time | Session | Speaker
12:35 – 1:55 p.m. | Lunch
1:55 – 2:20 p.m. | Reconstructing Complex Regions of Genomes Using Long-read Sequencing Technology | John Huddleston, M.S., University of Washington
2:20 – 2:45 p.m. | Taking Advantage of Long RNA-Seq Reads | Vince Magrini, Ph.D., Washington University in St. Louis
2:45 – 3:10 p.m. | Chicken in Awesome Sauce: A Recipe for New Transcript Identification | Alisha K. Holloway, Ph.D., Gladstone Institutes
3:10 – 3:45 p.m. | Break
3:45 – 4:10 p.m. | Gene Isoform Identification by Error-corrected PacBio Data | Kin Fai Au, Ph.D., Stanford University
4:10 – 4:35 p.m. | Optimizing BluePippin Size Selection for Increased Subread Lengths on the PacBio RS II | Bobby Sebra, Ph.D., Mt. Sinai School of Medicine
4:35 – 5:00 p.m. | Looking Ahead: Improving Workflows for SMRT Sequencing | Jonas Korlach, Ph.D., Pacific Biosciences
5:00 – 6:30 p.m. | Reception (sponsored by Sage Science)
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.