Post on 27-May-2018
Optimizing Blue Pippin Size-Selection for Increased SubRead Lengths on the RSII
Robert P. Sebra, Ph.D.
Icahn School of Medicine at Mount Sinai
Icahn Institute for Genomics & Multi-scale Biology
@IcahnInstitute
Applications in Clinical Genomics
An Integrated Omics Approach
Requires a Multi-faceted Sequencing Technology Sandbox
▶ Microarray Genotyping and Expression
• 1 x Illumina Bead Array platform
▶ Liquid Handling Automation
• 1 x Agilent Bravo
• 2 x Tecan EVO
• 1 x Beckman FX Robot
▶ Local IT Infrastructure
• 100 TB mirrored primary storage
• 1,500 TB secondary storage
Multi-Platform DNA Sequencing
First Generation
• 1 x Applied Biosystems 3730xl
Second Generation
• 5 x Illumina HiSeq 2500
• 1 x MiSeq
• 1 x Ion Proton
• 2 x Ion PGM
Third Generation
• 2 x PacBio RS II
The SMRT Sequencing Program at Mount Sinai GCF
Human Sequencing
-Complex genetic loci (TR, CNV, rearrangements, etc.)
-Exploring intronic space
-Clinical validation
-Allelic phasing
-High resolution genotyping
-Targeted sequencing
Infectious Disease
-Hospital surveillance
-Rapid microbial finishing
-Phasing plasmids
-Building phylogenies
-Metagenomics
-Understanding co-infection
Epigenetics
-mtDNA disease
-Discovering novel motifs
-Growth phase comparisons
-Virulence factors
-Oxidative or photochemical damage associated w/ cancer
Basic Research
-Full length transcriptome
-Basic genomics research
-Novel Bifx pipelines
-Reducing DNA input
-Methods development for targeted/capture capability
12-month Highlights
~1800 SMRTcells sequenced in the past 14 months
-Throughput has increased by ~4X, but as much as 10X in some cases
-subRL has increased by ~2X, but up to 5X
RSII Upgrades & Sage Blue Pippin SS
Size Selection using the Blue Pippin Technology I
[Figure: PFGE view of sheared DNA vs. Pippin high-pass selected DNA. Lanes: Input 10 kb shear, Input 20 kb shear, >5kb, >10kb. Axes: Size (kb), 1 to 40; [DNA]. Labeled sizes: 48502, 10086, 12220, 8271, 8614.]
Shear to 10kb avg., size select >5kb
Shear to 20kb avg., size select >10kb
4 samples per run. Typical runtimes for DNA >10kb: 2-8 hours.
Size Selection using the Blue Pippin Technology II
Applications Requiring Blue Pippin Size Selection at Sinai
Large Insert Size Selection (0.75% Cassette)
-Removal of fragments <10kb
-Purification of specific DNA (mtDNA)
-Selection of large plasmids
Short Insert Size Selection (2.0% Cassette)
-Amplicon or digested DNA selection (by size)
-Purification of short libraries
-Plasmid selection < 10kb
[Figure panels: Removal of <10kb library; Selected 225-275bp. Captions: Maximize library size for longest N50 subreads; Sample purity]
2kb SMRTbell Library Construction
Obtain tissue, cells, or swab for generating gDNA
Culture colony to generate the appropriate mass
Isolate 6ug gDNA (various methods)
OR
Use 5ug gDNA or Shear to ~20kb
Shear 1ug to 2kb-6kb
Tet Convert, if desired
10ng to WGA if control needed
Blue Pippin Size Select if needed
Sequencing using 20 pM P4-C2, 120’ MB collection
20kb+ SMRTbell Library Construction
Blue Pippin Size Select at >7 or 10kb
Sequencing using 50 pM P4-C2, 120’ MB collection
Option: Mix w/ Plasmid Library
HGAP De Novo Assembly & Variant Analysis
Plasmids Sequenced and/or Base Modification Profile
Standard Pipeline
Plasmid (<10kb) Pipeline
Pipelines
Blue Pippin Size Selection Performance: Yield & Capacity
[Chart markers: 2 SMRTcells, 10 SMRTcells, 40 SMRTcells, 100 SMRTcells]
-Only 1 SS library has insufficient yield for 2 chips
-Average % yield (by mass) = 24%
-Yield is dependent on input DNA size distribution but parameters can be adjusted accordingly (7kb vs 10kb)
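The 24% average yield above gives a rough way to plan input mass. The sketch below is only an illustration: the function name and the 5 ug example input are assumptions, and real yield will vary with the input DNA size distribution.

```python
def expected_recovery_ug(input_ug, yield_fraction=0.24):
    """Estimate DNA mass (ug) recovered after size selection.

    yield_fraction defaults to the 24% average yield (by mass) quoted
    on the slide; it is an average, not a guarantee for any one sample.
    """
    if input_ug < 0:
        raise ValueError("input mass must be non-negative")
    if not 0 <= yield_fraction <= 1:
        raise ValueError("yield_fraction must be between 0 and 1")
    return input_ug * yield_fraction


# Hypothetical example: 5 ug of sheared gDNA loaded on the cassette.
recovered = expected_recovery_ug(5.0)
print(f"Expected recovery: ~{recovered:.1f} ug")
```

A lower `yield_fraction` can be passed in for inputs with many short fragments.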
Tips for Maximizing Yield:
1. Use DNA isolation techniques with gentle steps (Qiagen Tip works well)
2. Conduct AMPure prior to shearing to remove small fragments for true assessment of input mass.
3. Remove any bubbles in cassette by tapping, etc.
4. Wash all loading & elution wells with E-buffer prior to placing sample
5. If bubble is in well, don’t use that well.
6. After sample collection, rinse all elution wells w/ 40uL of E-buffer to collect DNA
67 BPSS libraries
Blue Pippin Size Selection Impact on SubRL Distributions
[Plot annotations: ~12.5kb, ~5kb]
w/o BPSS:
95th % ~ 12,400bp
N50 ~ 5000bp
Best Read ~21,000bp
~240Mb mapped
w/ BPSS:
95th % ~ 19,700bp
N50 ~ 12,500bp
Best Read ~34,500bp
~325Mb mapped / cell
[Plot: % Sequenced Bases vs. subRL Threshold; series labeled 2 SMRTcells and 1 SMRTcell]
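The N50 values and the "% Sequenced Bases vs. subRL Threshold" view can both be derived from a list of subread lengths. The following is a minimal sketch under that assumption; the function names and the toy lengths are hypothetical and are not taken from the slides or from any PacBio tool.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def percent_bases_at_or_above(lengths, threshold):
    """Percent of sequenced bases found in reads of length >= threshold."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("no bases to summarize")
    kept = sum(length for length in lengths if length >= threshold)
    return 100.0 * kept / total


# Toy subread lengths (bp), for illustration only.
toy = [2000, 3000, 5000, 10000]
print(n50(toy))
print(percent_bases_at_or_above(toy, 5000))
```

Evaluating `percent_bases_at_or_above` over a range of thresholds traces out one curve of the plot; the N50 is the threshold at which that curve crosses 50%.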
Impact of BPSS on N50 Length (MRSA)
Rapid & Cost Effective Infectious Disease Surveillance
-Main culprit in healthcare-associated infections
-Regulatory agencies now require mandatory reporting, reimbursements held
-At Sinai, each positive blood stream culture requires a consult due to:
High mortality rates
High treatment failure rates/relapses
Drug resistance
-Accounts for nearly half of all inpatient Infectious Diseases consultations
1. Patient cultures drawn
2. + Cultures inoculated on agar plates in MicroLab
3. Colonies identified by routine MicroLab techniques
4. Colonies streaked for single colony & single colony grown
5. Cells harvested, spun, and lysed for DNA isolation
6. DNA isolated using Qiagen DNATip (or similar kit)
7. gDNA QC & sample preparation (previously shown)
8. PacBio RSII sequencing & HGAP assembly
~ 48 hours & <$300
Pipeline: From Bedside to Sequencing
Example: MRSA
Examples of Microbial Assemblies after BPSS / RSII
Sample      # of SMRTcells  # of Reads  Mean subRL  95th % RL  Coverage  Assembled Bases  N50 of Contigs  Largest Contig  # Contigs
MRSA 1      1               50,000      2,986       12,448     83 X      2.8 Mb           47,597          173,012         87
MRSA 2      1               55,010      3,787       13,038     88 X      2.98 Mb          300,802         619,820         24
MRSA 1’     2               79,032      7,088       19,680     110 X     2.90 Mb          1.96 Mb         1.96 Mb         11
MRSA 3      2               22,106      5,787       15,959     40 X      2.92 Mb          2.92 Mb         2.92 Mb         2
MRSA 4      2               42,253      7,485       20,771     105 X     2.94 Mb          2.94 Mb         2.94 Mb         1
Bacteria 1  4               197,239     5,507       14,785     239 X     4.75 Mb          4.73 Mb         4.73 Mb         3
Bacteria 2  4               267,140     5,424       14,795     355 X     4.53 Mb          4.53 Mb         4.53 Mb         1
Bacteria 3  4               187,322     4,868       13,940     210 X     4.83 Mb          3.46 Mb         3.46 Mb         10
Bacteria 4  4               168,612     5,578       13,981     204 X     4.78 Mb          799,485         2.09 Mb         18
Bacteria 5  4               192,349     5,758       13,909     238 X     4.71 Mb          4.69 Mb         4.69 Mb         2
-Complete microbial genomes in as little as 1-4 SMRT Cells
-Some contigs may be plasmids
-MRSA 1 is reduced from 87 to 11 contigs w/ BPSS (Others to a single contig)
-DNA quality (from bead beating conditions and/or prep) is important
-All assembly stats from HGAP 2.0.x (no partial alignments)
10X Mapped Coverage of Human Genome NA12878
454 Reads
Number of reads ~100M
Mapped coverage ~15X
Single and Paired End Illumina Reads
Number of reads ~100s of M
Mapped coverage ~30X
PacBio Reads
Number of Reads ~12M
Mapped coverage 10X+
Mean sub-read length 2,766
Mean unrolled read length 4,066
95th Percentile 11,630
Accuracy (error-corrected reads) >99%
Hard to Sequence Human Genes w/ Disease Association
Panel of genes we’re interested in for inherited disease screening
Genes involving TNR expansions
CACNA1A gene TNRs Associated w/ Spinocerebellar Ataxia 6
CAG ACC
Calcium channel, voltage-dependent, P/Q type, alpha 1A subunit gene
chr19:13318673-13318711
chr19:13319695-13319721
~10-15X reads span both TNRs
Arrow denotes read spanning exons to link separate TNRs
Variants associated with familial hemiplegic migraine and episodic ataxia
Long Reads Suggest Under Representation of TR Spans in Reference
Reference suggests bias towards repeat length compression
Impact of BPSS & Longer ReadLength Chemistry on NA12878
CLR Means w/ & w/o Blue Pippin Size Select
Mean sub-RL w/o Pippin SS 2,766 bp
Mean sub-RL w/ Pippin SS 4,491 bp
95th Percentile w/o Pippin SS 11,630 bp
95th Percentile w/ Pippin SS 13,266 bp
~22Mb > 10kbp
~82Mb > 5kbp
Fragments < 7kb eliminated
Size Select Protocol was set to select 7kb-50kb
Standard 10-20kb library
Size Selected 10-20kb library
-Blue Pippin electrophoretic size selection improves mean subRL by 62% (2,766 bp to 4,491 bp) by removing DNA fragments < 7kb to avoid small molecule loading bias.
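The 62% figure follows from the mean sub-RL values listed on this slide. As a quick check, the sketch below reruns that arithmetic and applies the same formula to the 95th percentile values; the helper name is an assumption introduced for illustration.

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Values from the slide (bp): without vs. with Pippin size selection.
mean_gain = percent_increase(2766, 4491)
p95_gain = percent_increase(11630, 13266)

print(f"Mean sub-RL gain:     {mean_gain:.1f}%")   # 62.4%
print(f"95th percentile gain: {p95_gain:.1f}%")   # 14.1%
```

The gain is much larger at the mean than at the 95th percentile, as would be expected when size selection mainly removes the short fragments.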
Impact of BPSS & Longer ReadLength Chemistry on NA12878
[Plot: SubRL (bp) distributions; x-axis 0 to 30,000, y-axis 0 to 1.0]
SubRL Statistics
           February    August   Present
Min.             40        40        40
1st Qu.         974     1,324     1,907
Mean          2,165     4,019     5,520
3rd Qu.       2,739     5,855     8,211
Max.         21,458    30,474    34,216
Currently, we generate ~1X NA12878 coverage in 8-10 SMRTcells for ~$1000
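The quoted figure of ~1X NA12878 coverage in 8-10 SMRTcells for ~$1000 can be scaled to a target coverage. The sketch below assumes cells and cost scale linearly with coverage, which the slides do not state; the function name and output keys are hypothetical.

```python
def scale_to_coverage(target_x, cells_per_1x=(8, 10), cost_per_1x_usd=1000):
    """Scale the quoted ~1X figures to a target coverage.

    Assumes SMRTcell count and cost grow linearly with coverage,
    which is a simplification made for this illustration.
    """
    low, high = cells_per_1x
    return {
        "cells_low": low * target_x,
        "cells_high": high * target_x,
        "cost_usd": cost_per_1x_usd * target_x,
    }


# Rough estimate for 10X coverage.
estimate = scale_to_coverage(10)
print(estimate)
```

Under this linear assumption, 10X coverage corresponds to roughly 80-100 SMRTcells and about $10,000.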
Thanks!
Sinai Team
Deena Altman
Ali Bashir
Imane Bourgui
Gintaras Deikus
Andrew Kasarskis
Alona Keren-Paz
Milind Mahajan
Eric Schadt
Anne Schaefer
Harm van Bakel
Ajay Ummat
Cornell
Roger Altman
Russell Durrett
Chris Mason
CSHL
Eric Antoniou
Richard McCombie
Patricia Mocombe
NYGC/Rockefeller
Bob Darnell
NYU
Bo Shopsin
PacBio
Jason Chin
Ellen Paxinos
PacBio Field Staff
Sage Science
Stay Connected with Us!
@IcahnInstitute