Post on 02-Jan-2022
For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.
Comprehensive single-nucleotide, indel, structural, and copy-number variant detection in human genomes with PacBio HiFi reads
Abstract #: 917418
William J. Rowell, Aaron M. Wenger, Armin Töpfer, and Luke Hickey
PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025
HiFi Reads and Circular Consensus Sequencing
Human Genomic Variation
HiFi Reads Detect Small Variants with High Precision and Recall
[Figure: Circularized DNA is sequenced in repeated passes; the polymerase reads are trimmed of adapters to yield subreads; consensus is called from subreads.]
Circular Consensus Sequencing. A linear template sequence is ligated to SMRTbell hairpin adapters, forming a circular template. DNA polymerase synthesizes complementary sequences to both strands of the original linear template, leading to rolling-circle sequencing and multiple passes of the original template. CCS uses the individual subreads to generate a highly accurate consensus sequence (HiFi read).
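To illustrate what "consensus is called from subreads" means, here is a minimal Python sketch using a column-wise majority vote. It is not the CCS algorithm: the real ccs software (https://ccs.how) uses a more sophisticated alignment and polishing procedure. The sketch assumes, hypothetically, that the subreads are already gap-aligned to a common length once reverse-strand passes are reverse-complemented; the names `majority_consensus` and `reverse_complement` are illustrative.

```python
from collections import Counter

# Complement table; '-' marks an alignment gap and maps to itself.
COMPLEMENT = str.maketrans("ACGT-", "TGCA-")


def reverse_complement(seq):
    """Reverse-complement a (possibly gapped) DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def majority_consensus(subreads, strands):
    """Column-wise majority vote over pre-aligned subreads.

    `strands` holds '+' or '-' per subread; '-' passes are
    reverse-complemented so every subread reads in the same orientation.
    Assumes (hypothetically) the oriented subreads share one gapped length.
    """
    oriented = [s if strand == "+" else reverse_complement(s)
                for s, strand in zip(subreads, strands)]
    length = len(oriented[0])
    if any(len(s) != length for s in oriented):
        raise ValueError("subreads must be aligned to a common length")
    consensus = []
    for column in zip(*oriented):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means no base at this column
            consensus.append(base)
    return "".join(consensus)
```

Because each pass is an independent read of the same molecule, a random error in one subread is outvoted by the others, which is the intuition for why more passes yield a more accurate HiFi read.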
[Figure: Variant classes by size: SNVs (1 bp), indels (1-49 bp), and structural variants (≥50 bp), with panel values of 5 Mb, 3 Mb, and 10 Mb, compared for short reads and HiFi reads.]
Variants in a Human Genome. Human genomes differ at many scales, from single-nucleotide variants to large structural variants; structural variants are few by count but contribute most of the basepair differences between two human genomes. Short-read sequencing gives a broad assay of variation, but it misses most structural variants and the small variants in difficult-to-map regions. HiFi reads provide a comprehensive view of variation of all classes, including in difficult regions of the genome.
Eichler, EE (2019) Genetic Variation, Comparative Genomics, and the Diagnosis of Disease, N Engl J Med. DOI: 10.1056/NEJMra1809315
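The size bins in the figure can be written as a simple rule on VCF-style REF/ALT alleles. The sketch below is an illustrative assumption, not a tool from the poster: symbolic alleles such as <DEL> are not handled, and multi-nucleotide substitutions fall into an "other" bin. It also tallies affected basepairs per class, which shows how a few structural variants can outweigh many SNVs in total basepairs.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Assign a size class using the figure's bins.

    SNV: 1 bp substitution; indel: 1-49 bp length change; SV: >=50 bp.
    Assumes plain-sequence REF/ALT alleles (no symbolic alleles like <DEL>).
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "other"  # e.g., multi-nucleotide substitutions; not binned here
    return "indel" if size < 50 else "SV"


def summarize(variants):
    """Count variants and affected basepairs per size class."""
    counts, basepairs = Counter(), Counter()
    for ref, alt in variants:
        cls = classify_variant(ref, alt)
        counts[cls] += 1
        basepairs[cls] += 1 if cls == "SNV" else abs(len(alt) - len(ref))
    return counts, basepairs
```

For example, with three SNVs and one 60 bp insertion, the single SV accounts for 60 of the 63 affected basepairs.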
HiFi Reads Excel at Detecting Structural Variants
Extended pbsv to Call Copy-Number Variants
HiFi Reads Characterize Pathogenic Variant in Mendelian Disease
[Figure: STRC, GRCh37 15:43,891,619-43,911,196 (19 kb). HG002 HiFi reads, phased into haplotype 1 and haplotype 2, compared with HG002 short reads.]
HIFI READS IDENTIFY AND PHASE VARIANTS IN DIFFICULT REGIONS
SMALL VARIANT CALLING WITH HIFI READS UTILIZES A STANDARD WORKFLOW
• HiFi reads match short reads for SNV calling.
• Indel performance has improved rapidly, and further improvements are expected.
Map to reference (pbmm2)
Call variants (DeepVariant)
                 DeepVariant (DV), 30-fold HiFi    GATK, 30-fold NovaSeq
                 Precision    Recall               Precision    Recall
SNV              99.97%       99.97%               99.85%       99.88%
Indel, DV 0.8    96.90%       95.98%               99.37%       99.16%
Indel, DV 1.0    98.94%       98.88%
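Precision and recall in the table are the standard benchmarking ratios of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN). A minimal sketch follows; the counts in any example are hypothetical, and real benchmarking tools also handle variant representation and matching, which this omits. F1 (their harmonic mean) is included only as a convenient single summary; it is not reported on the poster.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from TP, FP, and FN counts.

    Empty denominators yield 0.0 rather than raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, a caller with 90 TP, 10 FP, and 30 FN has precision 0.90 and recall 0.75: few false calls, but a quarter of true variants missed, the same pattern as the short-read SV recall gap.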
[Figure: GRCh37 13:112,993,400-112,994,200 (800 bp).]
HIFI READS SPAN LARGE INSERTIONS AND DETECT VARIANTS IN REPETITIVE REGIONS
STRUCTURAL VARIANT CALLING WITH HIFI READS UTILIZES A STANDARD WORKFLOW
Map to reference (pbmm2)
Call variants (pbsv)
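Because a HiFi read can span an entire insertion or deletion, the event often appears directly in the read's alignment as a long insertion (I) or deletion (D) operation in the CIGAR string. The sketch below extracts such candidate signatures of at least 50 bp from a single alignment. It is a simplified illustration, not pbsv's implementation: it assumes a 0-based reference start and ignores events split across several smaller operations or across reads.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference (SAM specification).
REF_CONSUMING = set("MDN=X")


def sv_signatures(ref_start, cigar, min_size=50):
    """Return (type, ref_pos, length) for insertions/deletions >= min_size.

    `ref_start` is assumed to be the 0-based reference position of the
    first aligned base; clips (S/H) do not move the reference position.
    """
    pos = ref_start
    events = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I" and length >= min_size:
            events.append(("INS", pos, length))
        elif op == "D" and length >= min_size:
            events.append(("DEL", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return events
```

A short-read alignment could not contain a 328-base I operation, since the insertion is longer than the read; this is one reason short-read callers must infer large insertions indirectly.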
                 pbsv, 30-fold HiFi       Manta (DRAGEN 3.5), 30-fold NovaSeq*
                 Precision    Recall      Precision    Recall
Deletions        96.7%        95.0%       94.0%        70.1%
Insertions       96.0%        94.9%       95.3%        54.7%
*https://www.linkedin.com/pulse/dragen-35-out-rami-mehio/
[Figure: 328 bp insertion. HG002 HiFi reads, phased into haplotype 1 and haplotype 2, compared with HG002 short reads.]
PBSV COMBINES READ CLIPPING AND READ DEPTH TO CALL COPY-NUMBER VARIANTS
DETECTS DUPLICATIONS THAT ARE TOO LONG TO SPAN WITH INDIVIDUAL READS
• Determine genome-wide coverage: median coverage of non-gap positions at mapping quality 60.
• Identify candidate CNV breakpoints: positions with multiple clipped reads.
• Evaluate coverage between adjacent candidate breakpoints: calculate a z-score against the Poisson expectation.
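The three steps above can be sketched in Python as follows. This is a toy model under stated assumptions, not pbsv's code: per-position depths are assumed to be precomputed from mapping-quality-60 alignments, the clipped-read threshold `min_clipped=2` is a hypothetical reading of "multiple", and the mean depth of each segment is compared with a Poisson distribution whose mean and variance both equal the genome-wide median λ, giving z = (mean depth - λ) / sqrt(λ). The diploid copy-number estimate is likewise an added assumption.

```python
import math
from collections import Counter
from statistics import median


def genome_wide_coverage(depths, is_gap):
    """Step 1: median depth over non-gap positions.

    Assumes `depths` already counts only alignments with mapping quality 60.
    """
    return median(d for d, gap in zip(depths, is_gap) if not gap)


def candidate_breakpoints(clip_positions, min_clipped=2):
    """Step 2: positions supported by at least `min_clipped` clipped reads."""
    counts = Counter(clip_positions)
    return sorted(pos for pos, n in counts.items() if n >= min_clipped)


def evaluate_segments(depths, breakpoints, expected, ploidy=2):
    """Step 3: score coverage between each pair of adjacent breakpoints.

    Per-position depth is modeled (a simplifying assumption) as
    Poisson(expected), so its variance equals `expected`.
    Returns (start, end, mean_depth, z_score, copy_number) tuples.
    """
    results = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        segment = depths[start:end]
        if not segment:
            continue
        mean_depth = sum(segment) / len(segment)
        z = (mean_depth - expected) / math.sqrt(expected)
        copy_number = ploidy * mean_depth / expected
        results.append((start, end, mean_depth, z, copy_number))
    return results
```

On synthetic data with a 30x median and a 10-position segment at 45x bounded by two clipped-read clusters, the segment scores z ≈ 2.74 with an estimated copy number of 3, consistent with a heterozygous duplication. A real caller would also need to account for the strong correlation between depths at neighboring positions.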
[Figure: HG001 pbsv CNV, GRCh37 6:220,626-410,470 (189 kb), with tracks for genes, segdups, repeats, and HG001 HiFi reads.]
FAME2 DISEASE COHORT
• Familial Adult Myoclonic Epilepsy (FAME) is characterized by 1) myoclonic tremor and 2) myoclonic or tonic-clonic seizures.
• FAME2 is linked to chr2.
• WGS identified a repeat expansion intronic to STARD7 in 158/158 individuals.
HIFI READS CHARACTERIZE A COMPLEX REPEAT IN ONE AFFECTED INDIVIDUAL
TAAAA/TTTTA × 388 (1,942 bp)
TGAAA/TTTCA × 274 (1,370 bp)
TCCCGAGTAGCTGGGATTACAGGCGTCCACCACCATGCCCAGCTAATTTTTGTATTTTTAGTAGAGATGGAGTTTCACCATGTTTCCCAGGCCGGTCTCGAACTCCTGACATCAGGTGATCCGCCCACCTCGGCCTCCCAAAGTCTGGGATTACAGCGTGAGGCCGTTGTGCTTGGCTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTATTTTATTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTATTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTTTATTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTAATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTATTTTATTTTATTTTATTTATTTTATTTTAT
TTTATTTTATTTTATTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTCATTTCATTTCATTCATTTCAATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTTCATTCATTTCATTTCATTTCATTTCATATTCATTTCATTTCATTTCATTTCATTTCATTTCATTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATTTCATAAATAAGACGGAGTTTCGCTCTTGTTGCCCAGGCTAGAGTGCAATGGCACGATCTTGGCTCCCTGCAACCTCCGCCTCCCGTGTTCAAGCGATTCTCTTGCCTCAGTCTCCCGAGTAGCTGGGATTACAGGTATGTGCCACCGTGCCCAGCTAATTTTGTATTTTTAGTAGAGACAGGGTTTCTCCACGTTGGTTAGGCTGGTCTCAAACTCCTGACCTCAGGTGATCGCCTGCCTCAGCCTCCCAAAGTATTGGGATTACAGGCGTGAGCCACTGCGCCTAGCCTATTTTATTTTTTAAGAGACAGTGTAGCTGGGCACGGTGGTTT
[Figure: HiFi read and the subreads from which it was derived.]
Corbett, MA (2019) Intronic ATTTC repeat expansions in STARD7 in familial adult myoclonic epilepsy linked to chromosome 2, Nat Commun. DOI: 10.1038/s41467-019-12671-y
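Motif copy counts like those annotated on the repeat can be approximated by scanning a read for the motif. The sketch below, with hypothetical helper names, reports the total number of non-overlapping copies and the longest uninterrupted tandem run. It would not reproduce the poster's counts exactly: the expansion contains interruptions (for example, TTTA copies between TTTTA copies), and the annotation method used on the poster is not described.

```python
import re


def motif_copies(seq, motif):
    """Total non-overlapping copies of `motif` anywhere in `seq`."""
    return seq.count(motif)


def longest_tandem_run(seq, motif):
    """Return (copies, bp) for the longest uninterrupted tandem run of `motif`.

    Runs are found left to right, so a run is reported in the phase in
    which the motif first matches.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = max((m.group(0) for m in pattern.finditer(seq)), key=len, default="")
    return len(best) // len(motif), len(best)
```

Counting the reverse-strand motif (TTTTA or TTTCA) matches the strand shown in the sequence above; on the opposite strand the same repeat reads as TAAAA or TGAAA.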
https://github.com/google/deepvariant
https://github.com/PacificBiosciences/pbsv
https://ccs.how