jan2016 bina giab

Validating, Enhancing and Using GiaB Reference Materials. GiaB Workshop, Stanford University, January 29, 2016. Bina Technologies, Roche Sequencing. Marghoob Mohiyuddin, Mohammad Sahraeian, Hugo Y. K. Lam.

Upload: genomeinabottle

Post on 17-Jan-2017


Page 1: Jan2016 bina giab

Validating, Enhancing and Using GiaB Reference Materials

GiaB Workshop, Stanford University
January 29, 2016

Bina Technologies, Roche Sequencing

Marghoob Mohiyuddin
Mohammad Sahraeian
Hugo Y. K. Lam

Page 2: Jan2016 bina giab

Background


Page 3: Jan2016 bina giab

For Research Use Only. Not for use in diagnostic procedures.

What we do

Page 4: Jan2016 bina giab


Collaborative scientific innovation

SeqAlto

VarSim

Page 5: Jan2016 bina giab

Benchmarking variant-calling

Page 6: Jan2016 bina giab


The VarSim framework

Page 7: Jan2016 bina giab

The VarSim framework

● Simulate and validate whole genomes
● Both somatic and germline simulation supported
● Comprehensive simulation and validation of multiple kinds of variants
  ○ SNPs, small indels, SVs

Page 8: Jan2016 bina giab

Assessing variant-calling

● VarSim simulation of 50x Illumina 2x100bp reads for NA12878
● Error profiles from NA12878 Platinum Genomes sample
● SNVs and small indels from GiaB high-confidence set
● SVs from multiple sources
  ○ Deletions from 1000 Genomes
  ○ Insertions randomly sampled from DGV and sequences from HuRef insertion sequences
  ○ Inversions, duplications randomly sampled from DGV

Page 9: Jan2016 bina giab


Small variant calling accuracy

Page 10: Jan2016 bina giab

Validating GiaB gold set

Page 11: Jan2016 bina giab

GiaB SVs

● GiaB released high-confidence SVs for NA12878
  ○ 2,676 deletions, 68 insertions
● Trio sequences available from Illumina Platinum Genomes
● MetaSV calls SVs by integrating calls from multiple methods
  ○ http://bioinform.github.io/metasv/
● Validation of GiaB gold set using MetaSV trio analysis ensures quality of GiaB gold set

Page 12: Jan2016 bina giab

MetaSV trio analysis

● Validate by analyzing trios (50x coverage)
  ○ MetaSV calls for parents (NA12891, NA12892)
  ○ MetaSV calls for NA12878
● Criteria
  ○ Deletions >= 100bp considered (2,348/88% in GiaB)
  ○ Reciprocal overlap of 50%
  ○ GiaB deletion validated if
      ○ Detected by MetaSV in any parent (multiple samples)
      ○ Detected by MetaSV as high-confidence in NA12878 (multiple methods)
      ○ Reported in previous literature
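The criteria on this slide can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the `Deletion` type, the function names, and the 0-based half-open coordinates are assumptions, and the three validation conditions are read here as alternatives (any one suffices).

```python
from typing import NamedTuple, Optional, Sequence


class Deletion(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of (overlap / len(a)) and (overlap / len(b))."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def is_validated(
    giab: Deletion,
    parent_calls: Sequence[Deletion],
    child_highconf_calls: Sequence[Deletion],
    in_literature: bool = False,
    min_len: int = 100,
    min_ro: float = 0.5,
) -> Optional[bool]:
    """Apply the slide's criteria to one GiaB deletion.

    Returns None when the deletion is shorter than min_len (not considered).
    """
    if giab.end - giab.start < min_len:
        return None

    def hit(calls: Sequence[Deletion]) -> bool:
        return any(reciprocal_overlap(giab, c) >= min_ro for c in calls)

    # Assumption: the three conditions are alternatives, so any one validates.
    return hit(parent_calls) or hit(child_highconf_calls) or in_literature


giab_del = Deletion("chr1", 1000, 2000)
parents = [Deletion("chr1", 1100, 2100)]  # made-up parent call
print(is_validated(giab_del, parents, []))
```

Taking the minimum of the two overlap fractions is what makes the overlap "reciprocal": a short call nested inside a much longer one does not count as a match.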

Page 13: Jan2016 bina giab

MetaSV: an ensemble approach

Page 14: Jan2016 bina giab

MetaSV workflow

● Merge SVs from multiple methods
  ○ SVs reported by multiple methods are high-confidence
  ○ 8 SV callers supported
● Enhanced insertion detection
  ○ Soft-clip analysis + assembly
● Assembly and alignment to refine breakpoints
● Supports Del, Ins, Inv, Dup, Trans.
● Supported in bcbio
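To make the "merge SVs from multiple methods" step concrete, here is a toy sketch of an ensemble merge. It is not MetaSV's implementation: the greedy clustering, the 50% reciprocal-overlap threshold, the `Call` record, the caller names, and the `LowQual` label are all assumptions made for illustration. Only the idea that support from multiple methods marks a call as high-confidence comes from the slide.

```python
from collections import namedtuple

# Hypothetical call record; caller names below are placeholders.
Call = namedtuple("Call", "chrom start end svtype caller")


def reciprocal_overlap(a, b):
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster calls of the same type that reciprocally overlap."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]  # first call of the cluster acts as representative
            if (rep.chrom, rep.svtype) == (call.chrom, call.svtype) and \
                    reciprocal_overlap(rep, call) >= min_ro:
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        merged.append({
            "chrom": cluster[0].chrom,
            "svtype": cluster[0].svtype,
            "start": min(c.start for c in cluster),
            "end": max(c.end for c in cluster),
            "callers": callers,
            # Assumed rule: two or more supporting methods means high-confidence.
            "filter": "PASS" if len(callers) >= 2 else "LowQual",
        })
    return merged


calls = [
    Call("chr1", 1000, 2000, "DEL", "callerA"),
    Call("chr1", 1050, 1950, "DEL", "callerB"),
    Call("chr1", 5000, 5600, "DEL", "callerA"),
]
for sv in merge_calls(calls):
    print(sv)
```

In this sketch the merged interval is simply the union of its members; the slide indicates that the real workflow refines breakpoints with assembly and alignment instead.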

Page 15: Jan2016 bina giab

GiaB deletion validation

Set                                         Total validated   Total not validated   Additionally validated
GiaB HC                                     0 (0%)            2,348 (100%)          0 (0%)
GiaB HC Validated by Parents (MetaSV ALL)   2,302 (98.0%)     46 (2.0%)             2,302 (98.0%)
GiaB HC Validated by Child (MetaSV PASS)    2,306 (98.2%)     42 (1.8%)             4 (0.2%)
GiaB HC Validated by Child (curated)        2,342 (99.7%)     6 (0.3%)              36 (1.5%)
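The figures on this slide read as a cumulative funnel: each step adds newly validated deletions to the running total. The short sketch below recomputes the totals and percentages from the "additionally validated" counts. The function names are illustrative; the counts are the ones shown on the slide.

```python
TOTAL_GIAB_HC = 2348

# (validation step, deletions additionally validated at that step)
STEPS = [
    ("Parents (MetaSV ALL)", 2302),
    ("Child (MetaSV PASS)", 4),
    ("Child (curated)", 36),
]


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def validation_funnel(total, steps):
    """Accumulate additionally validated counts into running totals."""
    rows = []
    validated = 0
    for label, added in steps:
        validated += added
        rows.append((label, validated, total - validated, added))
    return rows


for label, validated, remaining, added in validation_funnel(TOTAL_GIAB_HC, STEPS):
    print(f"{label}: validated {validated} ({pct(validated, TOTAL_GIAB_HC)}%), "
          f"not validated {remaining} ({pct(remaining, TOTAL_GIAB_HC)}%), "
          f"additional {added} ({pct(added, TOTAL_GIAB_HC)}%)")
```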

Page 16: Jan2016 bina giab

Mendelian validation

● MetaSV High Quality Trio Deletions:
  ○ Mendelian inheritance consistency with genotypes
  ○ PASS in child and ALL in parents
  ○ Considering no call as reference call

[Figure: overlap between MetaSV trio deletions and GiaB HC deletions]
MetaSV Trio Dels: 2,582
GiaB HC Dels: 2,348
Common: 2,126
GiaB Private: 222
MetaSV Private: 456 (142 not in literature)

MetaSV PASS Dels: 2,671
96.7% are Mendelian consistent (98.7% if ignoring genotypes)
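A minimal sketch of the Mendelian check described here, assuming diploid genotypes encoded as allele pairs (0 = reference, 1 = deletion) and `None` for a missing call. The encoding and function names are assumptions, as is the reading of "ignoring genotypes" as "the deletion is present in at least one parent".

```python
REF = (0, 0)


def mendelian_consistent(child, mother, father):
    """True if the child's alleles can be drawn one from each parent.

    A missing parental call (None) is treated as a reference call.
    """
    mother = REF if mother is None else mother
    father = REF if father is None else father
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def consistent_ignoring_genotypes(child, mother, father):
    """Assumed looser check: a deletion in the child is seen in some parent."""
    mother = REF if mother is None else mother
    father = REF if father is None else father
    if 1 not in child:
        return True
    return 1 in mother or 1 in father


# Heterozygous child, deletion inherited from the mother, father not called.
print(mendelian_consistent((0, 1), (0, 1), None))
# Homozygous child but only one carrier parent: fails the strict check.
print(mendelian_consistent((1, 1), (0, 1), None))
print(consistent_ignoring_genotypes((1, 1), (0, 1), None))
```

The 96.7% on the slide matches 2,582 / 2,671, that is, the trio deletions as a share of the MetaSV PASS deletions.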

Page 17: Jan2016 bina giab

Trio analysis summary

● GiaB SVs have a high validation rate using MetaSV trio analysis
  ○ Only 6 unvalidated SVs do not have strong support in IGV or SVVIZ
  ○ GiaB deletions of high quality
● Almost all (up to 98.7%) MetaSV PASS calls are Mendelian consistent, making them high-quality
● Significant number (456) of MetaSV trio calls not in GiaB
  ○ Possibly missed due to stringent GiaB requirements since 321 of those in literature
● MetaSV trio validation can help validate and extend the gold set

Page 18: Jan2016 bina giab

Enhancing GiaB gold set

Page 19: Jan2016 bina giab

Work on Jewish trio

● Enhanced MetaSV
  ○ Assembly to get evidence for and refine all kinds of SVs
  ○ Optimizations to speed up analysis
● Calls for Jewish trio submitted
  ○ Will help build high-confidence SV calls for other trios

Page 20: Jan2016 bina giab

MetaSV and Parliament

[Figure: overlap between MetaSV and Parliament calls]
Overlapping: 2240 (MetaSV, 79.7%) / 4490 (Parliament, 82.1%)
MetaSV only: 569 (20.3%)
Parliament only: 977 (17.9%)

MetaSV total 2809

Parliament total 5467 (after uniq)

Reciprocal overlap of 50% was used, no genotype matching performed. With 90% reciprocal overlap, 75.9% MetaSV calls and 54.0% Parliament calls overlapped.
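The overlap percentages at 50% reciprocal overlap can be rechecked from the totals. In the sketch below, the function name and dictionary keys are illustrative, and reading 569 and 977 as the calls private to each tool is an inference from the totals. Each tool's overlap is counted against its own call set, which is why the two overlapping counts differ.

```python
def overlap_summary(total_calls, overlapping_calls):
    """Split one tool's calls into overlapping and private, with percentages."""
    private = total_calls - overlapping_calls
    return {
        "overlapping": overlapping_calls,
        "private": private,
        "pct_overlapping": round(100 * overlapping_calls / total_calls, 1),
        "pct_private": round(100 * private / total_calls, 1),
    }


metasv = overlap_summary(total_calls=2809, overlapping_calls=2240)
parliament = overlap_summary(total_calls=5467, overlapping_calls=4490)

print("MetaSV:", metasv)
print("Parliament:", parliament)
```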

Page 21: Jan2016 bina giab

Summary

● GiaB high-confidence calls regularly used to assess variant-calling pipelines
  ○ For both simulated and real data
● MetaSV was used to perform validation for GiaB deletion SVs for NA12878
● Trio analysis can also be applied to the Jewish and Han samples
● Contributing MetaSV calls for Jewish trio to help build better SV gold sets

Page 22: Jan2016 bina giab

Acknowledgements

Bina: Mohammad Sahraeian, Jian Li, John Mu
GiaB: Justin Zook, Hemang Parikh