161115 precision fda giab
Genome in a Bottle: so you’ve sequenced a genome, how well did you do?
Justin Zook and Marc Salit, NIST Genome-Scale Measurements Group
JIMB
November 8, 2016
Sequencing technologies and bioinformatics pipelines disagree
O’Rawe et al. Genome Medicine 2013, 5:28
Bringing Principles of Metrology to the Genome
Reference Materials
• DNA you can buy from NIST
• PGP genomes suitable for commercially derived products
Extensive State-of-the-Art Characterization
• Arbitrated “gold-standard” calls for SNPs and small indels
• Analysis ongoing as technology develops
• Developing benchmarking tools with GA4GH
Technology Development
• RMs used to develop and demonstrate new technology
• Characterization “upgradeable” as technology develops
Human and Microbial Genomic Reference Materials
• Pilot human genome: RM 8398, daughter of Utah/European ancestry (released Spring 2015)
• New materials released Fall 2016 (NEW FOR 2016!)
  – 3 new PGP human genome RMs
    • RM 8391: son of Eastern European Ashkenazim Jewish ancestry
    • RM 8392: trio of Eastern European Ashkenazim Jewish ancestry
    • RM 8393: son of Chinese ancestry
  – 1 microbial genomic RM
    • RM 8375: set of 4 (Salmonella Typhimurium LT2, Staphylococcus aureus, Pseudomonas aeruginosa, and Clostridium sporogenes)
PrecisionFDA “Truth” Challenge
• The first benchmark calls for HG002 were released only after the challenge closed
• Is there evidence that pipelines are “tuned” to NA12878/HG001?
• 35 entries
Global Alliance for Genomics and Health Benchmarking Task Team
• Developed standardized definitions for performance metrics such as true positives (TP), false positives (FP), and false negatives (FN)
• Developing sophisticated benchmarking tools
• Integrated into a single framework with standardized inputs and outputs
• Standardized BED files describing difficult genome contexts, for stratification
https://github.com/ga4gh/benchmarking-tools
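Once a comparison tool has labeled calls as TP, FP, or FN, the summary metrics follow directly. The sketch below is a minimal illustration, not the GA4GH reference code. It assumes, as hap.py's output suggests, that TP is counted separately on the truth side (for recall) and the query side (for precision), because one complex call can match several truth records. The `Counts` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for benchmarking counts."""
    truth_tp: int  # truth variants matched by the query callset
    query_tp: int  # query variants matched to the truth set
    fn: int        # truth variants missed by the query
    fp: int        # query variants absent from the truth set


def precision(c: Counts) -> float:
    # Fraction of query calls that are correct; 0.0 if there are no calls (simplifying choice).
    denom = c.query_tp + c.fp
    return c.query_tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    # Fraction of truth variants that were found; 0.0 if the truth set is empty.
    denom = c.truth_tp + c.fn
    return c.truth_tp / denom if denom else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0
```

For example, `Counts(truth_tp=90, query_tp=92, fn=10, fp=8)` gives a recall of 0.9 and a precision of 0.92.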
Variant types can change when decomposing or recomposing variants:
Complex variant: chr1 201586350 CTCTCTCTCT CA
DEL + SNP:
chr1 201586350 CTCTCTCTC C
chr1 201586359 T A
Credit: Peter Krusche, Illumina; GA4GH Benchmarking Team
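The two representations differ record by record but describe the same haplotype. A naive comparison that matches (POS, REF, ALT) exactly would therefore miss the agreement, while comparing the sequences produced by applying each record set would catch it. This is roughly the idea behind haplotype-aware comparison. The sketch below is a minimal illustration under simplifying assumptions: a single haplotype, non-overlapping records, and only the 10 reference bases shown on the slide. `apply_variants` is a hypothetical helper and is not part of any benchmarking tool. For the DEL and SNP records to coexist on one haplotype, the deletion's REF must end before position 201586359.

```python
def apply_variants(ref: str, ref_start: int, variants) -> str:
    """Apply VCF-style (pos, ref, alt) records to a reference string.

    ref_start is the 1-based coordinate of ref[0]. Records are assumed to lie
    on one haplotype and must not overlap; a ValueError is raised otherwise.
    """
    out = []
    cursor = 0  # 0-based index of the next reference base not yet emitted
    for pos, r, a in sorted(variants):
        i = pos - ref_start
        if i < cursor:
            raise ValueError(f"record at {pos} overlaps a previous record")
        if ref[i:i + len(r)] != r:
            raise ValueError(f"REF mismatch at {pos}")
        out.append(ref[cursor:i])  # unchanged bases before this record
        out.append(a)              # replace REF with ALT
        cursor = i + len(r)
    out.append(ref[cursor:])       # unchanged bases after the last record
    return "".join(out)


ref = "CTCTCTCTCT"  # chr1:201586350-201586359 as shown on the slide
complex_form = [(201586350, "CTCTCTCTCT", "CA")]
decomposed = [(201586350, "CTCTCTCTC", "C"), (201586359, "T", "A")]
```

Both record sets yield the haplotype `CA`, even though one is a single complex record and the other is a deletion plus a SNP.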
Benchmarking Tools
Standardized comparison, counting, and stratification with hap.py + vcfeval (from RTG Tools)
https://precision.fda.gov/ https://github.com/ga4gh/benchmarking-tools
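Stratification amounts to counting benchmark outcomes separately inside each set of difficult regions. The sketch below is a minimal illustration, assuming truth variants have already been labeled TP or FN by a comparison tool such as vcfeval. The function names and input layout are hypothetical and are not the hap.py interface. BED intervals are 0-based and half-open, while VCF POS is 1-based, so positions are converted before the lookup. A linear scan keeps the example short. A real implementation would use sorted or indexed intervals.

```python
from collections import defaultdict


def in_bed(intervals, chrom, pos):
    """True if 1-based VCF position `pos` lies in any 0-based, half-open BED interval."""
    p0 = pos - 1  # VCF POS (1-based) -> BED coordinate (0-based)
    return any(c == chrom and start <= p0 < end for c, start, end in intervals)


def stratified_recall(truth_calls, strata):
    """Recall overall ("all") and within each stratum.

    truth_calls: iterable of (chrom, pos, status), where status is "TP" or "FN".
    strata: dict mapping stratum name -> list of (chrom, start, end) BED intervals.
    """
    counts = defaultdict(lambda: {"TP": 0, "FN": 0})
    for chrom, pos, status in truth_calls:
        counts["all"][status] += 1
        for name, intervals in strata.items():
            if in_bed(intervals, chrom, pos):
                counts[name][status] += 1
    return {
        name: c["TP"] / (c["TP"] + c["FN"])
        for name, c in counts.items()
        if c["TP"] + c["FN"]
    }
```

A variant can fall into several strata at once, so per-stratum counts need not add up to the overall count.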
FN rates high in some tandem repeats
[Figure: FN rate relative to the average (color scale from 0.3x to 30x) in tandem repeats with 2, 3, and 4 bp repeat units, shown for repeats of 11 to 50 bp and 51 to 200 bp]
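“FN rate vs. average” is a ratio. It divides the FN rate within a stratum, FN / (TP + FN) or 1 − recall, by the FN rate over all benchmark variants. A value of 10x therefore means variants in that stratum are missed ten times as often as average. The sketch below is a minimal illustration, assuming per-stratum TP and FN counts are already available. The function names are hypothetical, and the counts in any usage are made up for illustration.

```python
def fn_rate(tp: int, fn: int) -> float:
    """False-negative rate FN / (TP + FN); NaN when the stratum is empty."""
    total = tp + fn
    return fn / total if total else float("nan")


def fn_fold_vs_average(strata_counts, overall):
    """FN rate in each stratum divided by the overall FN rate.

    strata_counts: dict mapping stratum name -> (tp, fn).
    overall: (tp, fn) over all benchmark variants.
    """
    avg = fn_rate(*overall)
    if not avg > 0:  # rejects both 0.0 and NaN
        raise ValueError("overall FN rate must be positive to form a ratio")
    return {name: fn_rate(tp, fn) / avg for name, (tp, fn) in strata_counts.items()}
```

With hypothetical counts of 990 TP and 10 FN overall (1% FN rate), a stratum with 70 TP and 30 FN (30% FN rate) comes out at about 30x.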
What we learned
• Many variant callers perform similarly well overall
• Performance varies across variant types and regions
• No clear evidence that pipelines have been tuned to NA12878
• The benchmark set matters
  – Many variants not yet assessed
    • 15-25% of SNPs
    • 50-80% of indels
• A couple of participants gave feedback to improve the benchmark calls
• Definitions for performance metrics were refined
• Helped form new collaborations within GIAB
Acknowledgements
• NIST
  – Marc Salit
  – Jenny McDaniel
  – Lindsay Vang
  – David Catoe
• Genome in a Bottle Consortium
• GA4GH Benchmarking Team
• FDA
  – Liz Mansfield
  – Zivana Tevak
  – David Litwack
For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis Team Google group emails; links to order NIST RMs
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser
Data: http://www.nature.com/articles/sdata201625
Global Alliance Benchmarking Team– https://github.com/ga4gh/benchmarking-tools
Public workshops
  – Possible structural variant (SV) integration mini-workshop in Spring 2017
  – Next large workshop in Fall 2017
NIST postdoc opportunities available!
Justin Zook: [email protected]
Marc Salit: [email protected]