ashg 2015 genome in a bottle

Genome in a Bottle: You’ve sequenced. How well did you do? October 9, 2015 Justin Zook, Marc Salit, and the Genome in a Bottle Consortium *Nothing to Disclose

Upload: genomeinabottle

Post on 15-Apr-2017


TRANSCRIPT

Page 1: ASHG 2015 Genome in a bottle

Genome in a Bottle: You’ve sequenced. How well did you do?

October 9, 2015

Justin Zook, Marc Salit, and the Genome in a Bottle Consortium

*Nothing to Disclose

Page 2: ASHG 2015 Genome in a bottle

Sequencing technologies and bioinformatics pipelines disagree

O’Rawe et al. Genome Medicine 2013, 5:28

Page 3: ASHG 2015 Genome in a bottle

Sequencing technologies and bioinformatics pipelines disagree

O’Rawe et al. Genome Medicine 2013, 5:28

Who is right?

Is anyone right?

Page 4: ASHG 2015 Genome in a bottle

Genome in a Bottle Consortium (GIAB)

Hosted by US National Institute of Standards and Technology

Goal: Provide infrastructure to assess confidence in human variant calls

• Appropriately consented, widely available DNA samples, distributed by the Coriell Institute
  – Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST
  – Also, PGP samples are commercially available

• High-accuracy reference data for these samples

• Tools to facilitate their use
  – With the Global Alliance Data Working Group Benchmarking Team

Global Alliance for Genomics and Health: ga4gh.org

Genome in a Bottle: genomeinabottle.org

Page 5: ASHG 2015 Genome in a bottle

GIAB Selected Samples

CEPH/Utah Pedigree 1463
• Grandparents: NA12889, NA12890, NA12891, NA12892
• Parents: NA12877, NA12878 (NA12878 is available as NIST RM8398)
• Children: NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893

Ashkenazi Jewish Trio (new, Personal Genome Project)
• Parents: NA24149, NA24143
• Son: NA24385

Asian (Han Chinese) Trio (new, Personal Genome Project)
• Parents: NA24694, NA24695
• Son: NA24631

Note: Illumina and RTG have used data from the pedigree to improve variant calls in the specific GIAB samples.

Page 6: ASHG 2015 Genome in a bottle

NGS Validation Process using Genomes in Bottles

Sample

gDNA isolation

Library Prep

Sequencing

Alignment/Mapping

Variant Calling

Confidence Estimates

Downstream Analysis

Analytical Process

Genome in a Bottle Scope

Pre-Analytical Process

Clinical Interpretation

GIAB Data

Page 7: ASHG 2015 Genome in a bottle

Pilot Genome: NA12878

Page 8: ASHG 2015 Genome in a bottle

Integrated 14 datasets from 5 platforms to establish Reference SNP/indel Calls for NA12878

Zook et al., Nature Biotechnology, 2014.

~77% High-confidence

~23% Uncertain

Page 9: ASHG 2015 Genome in a bottle

Uses of GIAB NA12878

Oncology – Molecular and Cellular Tumor Markers

“Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection

www.bioplanet.com/gcat

Page 10: ASHG 2015 Genome in a bottle

GeT-RM Browser from NCBI and CDC

• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/

Page 11: ASHG 2015 Genome in a bottle

Global Alliance for Genomics and Health Benchmarking Task Team

• Developed standardized definitions for performance metrics like TP, FP, and FN.

• Developing sophisticated benchmarking tools
  – vcfeval (Len Trigg)
  – hap.py (Peter Krusche)
  – vgraph (Kevin Jacobs)

• Standardized bed files with difficult genome contexts for stratification

Credit: GA4GH, Abby Beeler, Ellie Wood

Stratification of FP Rates

Higher FP rates at Tandem Repeats
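As a rough illustration of how TP, FP, and FN counts turn into stratified performance metrics, the sketch below uses the simplest textbook definitions of precision and recall. These may not match the exact definitions the Benchmarking Task Team standardized, and the strata names and counts are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # calls that match the benchmark
    fp: int  # calls inside benchmark regions that are absent from the benchmark
    fn: int  # benchmark variants that were not called


def precision(c: Counts) -> float:
    """Fraction of calls that are correct: TP / (TP + FP)."""
    called = c.tp + c.fp
    return c.tp / called if called else float("nan")


def recall(c: Counts) -> float:
    """Fraction of benchmark variants recovered: TP / (TP + FN)."""
    truth = c.tp + c.fn
    return c.tp / truth if truth else float("nan")


# Hypothetical counts per stratum, for illustration only.
strata = {
    "all_regions": Counts(tp=9900, fp=50, fn=100),
    "tandem_repeats": Counts(tp=400, fp=40, fn=60),
}

for name, counts in strata.items():
    print(f"{name}: precision={precision(counts):.3f} recall={recall(counts):.3f}")
```

Reporting the metrics per stratum rather than genome-wide is what makes patterns such as elevated FP rates in tandem repeats visible.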

Page 12: ASHG 2015 Genome in a bottle

New GIAB Trios from Personal Genome Project

Page 13: ASHG 2015 Genome in a bottle

Public, unembargoed data from GIAB AJ PGP Trio

Long reads/“Linked” reads
• ~70/30/30x PacBio
  – ~11kb N50
• ~100x BioNano
• ~30x 10X Genomics
• ~20x Moleculo
• Complete Genomics LFR
• ~0.005x Oxford Nanopore

Short reads
• 300x Illumina paired-end
• 15x Illumina 6kb mate-pair
• 100x Complete Genomics
• 60x SOLiD 5500W
• 1000x Ion Proton Exome

http://biorxiv.org/content/early/2015/09/15/026468

Page 14: ASHG 2015 Genome in a bottle

GIAB Analysis Group – New Data Sets

Leaders
• Francisco de la Vega
• Chris Mason
• Tina Graves
• Valerie Schneider
• Justin Zook
• Marc Salit

Status
• Analysis Group Responsibilities:
  – https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
  – https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods:
  – https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
  – https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting data and analyses on GIAB FTP site
• Recruiting people to help with the work

Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios

Page 15: ASHG 2015 Genome in a bottle

Analysis Progress: AJ Trio

• SNPs/indels
  – NIST working on integration
  – 10X/Moleculo/PacBio for difficult-to-map regions
• Assembly
  – 2 de novo assemblies
  – Useful for SV calling
• Structural variants
  – Candidate calls being generated by 15+ groups with >20 different algorithms and 6 datasets
  – 3+ integration methods
• Long-range phasing
  – 2 phased calls so far (CG LFR and 10X)
  – Integration methods needed
• Other analyses
  – CpG methylation with PacBio and Illumina

Page 16: ASHG 2015 Genome in a bottle

GIAB AJ Trio PacBio-only Assemblies

PacBio Only

Input        Algorithm                      # of Contigs  N50     Max     Total
Child        MHAP/Celera (Phillippy Lab)    13,048        4.5Mb   35.1Mb  3.0Gb
Child        Daligner/Falcon (Chin/Bashir)  9,973         7.1Mb   39.2Mb  3.0Gb
Mother       MHAP/Celera (Phillippy Lab)    23,493        1.03Mb  8.9Mb   3.0Gb
Father       MHAP/Celera (Phillippy Lab)    16,326        0.91Mb  9.8Mb   3.0Gb
Merged Trio  Daligner/Falcon (Chin/Bashir)  5,680         9.25Mb  50.3Mb  2.9Gb

Credits: Ali Bashir, Jason Chin, Adam Phillippy, and Serge Koren
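The N50 values reported for these assemblies follow the usual definition: the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of that calculation, assuming contig lengths are already available as a list of integers (the example lengths are made up, not taken from the GIAB assemblies):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig with non-zero length")


# Made-up contig lengths for illustration.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is one reason the slide also lists contig count, maximum contig size, and total assembled length.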

Page 17: ASHG 2015 Genome in a bottle

GIAB AJ Trio Hybrid PacBio/BioNano Assembly

Hybrid (PacBio with BioNano)

Input  Assembly                        Notes     # of Scaffolds  N50     Max     Total
HG002  Falcon                                    248             22.7Mb  92.8Mb  2.38Gb
Trio   Falcon                                    210             29.3Mb  87.6Mb  2.32Gb
Trio   celera (child) + falcon (trio)  Two Step  187             34.3Mb  98.0Mb  2.6Gb

Credits: Ali Bashir, Jason Chin, Alex Hastie

Pendleton et al., Nature Methods, 2015

Page 18: ASHG 2015 Genome in a bottle

Proposed approach to form high-confidence SV (and non-SV) calls

• August 30, 2015: Generate candidate calls
• Nov 1, 2015: Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manual inspection
• Jan 1, 2016: Integrate new and revised calls; manual inspection
• Jan 26, 2016 and beyond: Combine integrated calls; manual inspection; targeted experimental validation?

Page 19: ASHG 2015 Genome in a bottle

Very Preliminary Confirmation of SVs

Integration results from AJ son

• Parliament: BMC Genomics, 2015, 16:286 (performed by Andrew Carroll, DNAnexus)
  – Candidates from Illumina
  – Confirmed by PacBio and/or Illumina
  – ~50% in both technologies
  – ~4.5k deletions, 1k insertions
  – 85% of genotypes consistent within trio
• MetaSV: Bioinformatics, 2015, 31:2741 (performed by Marghoob Mohiyuddin, Bina/Roche)
  – Multiple types of evidence from Illumina

Overlap between call sets (50% reciprocal overlap):
• MetaSV, total 2809: 569 (20%) overlap Parliament calls; 2240 (80%) MetaSV only
• Parliament, total 5467: 977 (18%) overlap MetaSV calls; 4490 (82%) Parliament only
• The overlap counts differ because there is some overlap within the Parliament calls
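The 50% reciprocal overlap criterion can be sketched as follows. This is only an illustration of the general test, not the code used for the comparison on the slide; it assumes calls on the same chromosome represented as half-open (start, end) intervals, and the coordinates are hypothetical.

```python
def reciprocal_overlap(a, b, min_fraction=0.5):
    """True if the shared span covers at least min_fraction of BOTH intervals.

    a and b are (start, end) tuples, half-open, on the same chromosome.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a[1] - a[0])
            and overlap >= min_fraction * (b[1] - b[0]))


def count_matched(calls, other_calls, min_fraction=0.5):
    """Count calls that have at least one reciprocal match in other_calls."""
    return sum(
        any(reciprocal_overlap(c, o, min_fraction) for o in other_calls)
        for c in calls
    )


# Hypothetical deletions: one call in set A matches two overlapping calls in set B.
set_a = [(1000, 2000)]
set_b = [(1000, 2000), (1100, 2000), (5000, 6000)]

print(count_matched(set_a, set_b))  # matched calls counted from A's side
print(count_matched(set_b, set_a))  # matched calls counted from B's side
```

The two counts need not agree: when one call set contains overlapping calls, a single call in the other set can match several of them, which is consistent with the differing overlap counts reported for MetaSV and Parliament.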

Page 20: ASHG 2015 Genome in a bottle

New GIAB GitHub Site

github.com/genome-in-a-bottle

Credit: Chunlin Xiao, NCBI

Page 21: ASHG 2015 Genome in a bottle

WARNINGS

• Easiest to benchmark only within high-confidence bed file
• Benchmark calls/regions tend to be biased towards easier variants and regions
  – Some clinical tests are enriched for difficult sites
• Always manually inspect a subset of FPs/FNs
• Stratification by variant type and region is important
• Always calculate confidence intervals
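One common way to attach a confidence interval to a benchmarking proportion such as sensitivity is the Wilson score interval, sketched below. The slides do not say which interval method GIAB recommends, so the choice of Wilson is an assumption, and the counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials
                               + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical stratum: 98 of 100 benchmark variants detected.
low, high = wilson_interval(98, 100)
print(f"sensitivity 0.98, approx. 95% CI {low:.3f} to {high:.3f}")
```

With only 100 benchmark variants in a stratum, an observed sensitivity of 98% is compatible with a true value several points lower, which is why small strata such as difficult regions need intervals rather than point estimates.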

Page 22: ASHG 2015 Genome in a bottle

Acknowledgments

• FDA: Elizabeth Mansfield, Computing staff
• Many members of Genome in a Bottle
  – New members welcome!
  – Sign up on website for email newsletters

Steering Committee
– Marc Salit
– Justin Zook
– David Mittelman
– Andrew Grupe
– Michael Eberle
– Steve Sherry
– Deanna Church
– Francisco De La Vega
– Christian Olsen
– Monica Basehore
– Lisa Kalman
– Christopher Mason
– Elizabeth Mansfield
– Liz Kerrigan
– Leming Shi
– Melvin Limson
– Alexander Wait Zaranek
– Nils Homer
– Fiona Hyland
– Steve Lincoln
– Don Baldwin
– Robyn Temple-Smolkin
– Chunlin Xiao
– Kara Norman
– Luke Hickey

Page 23: ASHG 2015 Genome in a bottle

For More Information

www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails

github.com/genome-in-a-bottle - Guide to GIAB data & ftp

www.slideshare.net/genomeinabottle

www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser

Data: http://biorxiv.org/content/early/2015/09/15/026468

Global Alliance Benchmarking Team
– ga4gh.org/#/benchmarking-team

Twice yearly workshop (public meetings!)
– Winter: January 28-29, 2016 at Stanford University, California, USA
– Summer at NIST, Maryland, USA

Justin Zook: [email protected]

Marc Salit: [email protected]

Contribute calls or critically evaluate GIAB calls!

NIST/NRC Postdoc Opportunities available!