The Human Variome Database in Australia in 2014 - Graham Taylor
DESCRIPTION
There are a number of genetics and genomics initiatives underway in Australia, including the Australian node of the Human Variome Project (HVPA), as well as many active research collaborations including familial cancer, endocrine disease, and developmental delay. Most of these projects work with disease-specific databases on a research basis, with the risk that such archives may be ephemeral. HVPA is the only database that is directly integrated with accredited clinical reporting of variants. As such it is designed to capture variants that have passed scrutiny as diagnostically robust, and have therefore already been curated by qualified staff. Registered users access the HVPA database via a secure Internet portal. I will describe three recent developments of the HVPA database and portal: the upgraded search interface, linkage to other datasets via BioGrid using hash-based de-identified case matching, and the introduction of a genome wide database using LOVD3. Finally I will discuss the future direction of the HVPA and the questions of utility, quality control and sustainability of genetic variation databases.
Search interface
The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and incentivises them to participate. Following a request for feedback from users, a series of improvements were implemented, initially on a demonstration server and then on the live server following review by the Steering Committee. The highest priorities were for more information about numbers of times particular variants were recorded, the ability to search by range and to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
Linkage to other datasets
We have implemented the hash key algorithm and work is in progress with BioGrid to link variation data to clinical data sets.
Genome wide database
We have established an HVPA LOVD3 database and are working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database.
Acknowledgments
Genomic Medicine & Translational Pathology, University of Melbourne: Arthur Lian Chi Hsu, Renate Marquis-Nicholson, Sebastian Lunke, Clare Love, Kym Pham, Olga Kondrashova, Matt Wakefield, Tiffany Cowie, Barney Rudzki and Paul Waring
Human Variome Project: Tim Smith, Alan Lo, Melvyn Leong, David Perkins, Heather Howard, Rania Horaitis, Dick Cotton
BioGrid: Maureen Turner, Leon Heffer
Royal College of Pathologists of Australasia: Vanessa Tyrrell
Peter MacCallum Cancer Centre: Ken Doig, Andrew Fellowes
Victorian Clinical Genetics Service: John-Paul Plazzer, Desiree Du Sart
Human Variome Project (Australasia)
• The bigger picture
• Infrastructure and search interface
• Linkage to other datasets
• Panel, exome and genome testing
• Database accreditation
• Next steps
The big picture
• Rediscovery at the genomics community level that data sharing is win-win
• The Genomic Alliance, HGVS, HUGO
– Data standards
– Nomenclature
– Infrastructure
Nature (Perspective) 508 469-475 2014 Guidelines for investigating causality of sequence variants in human disease
D. G. MacArthur, T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, R. B. Altman, S. E. Antonarakis, E. A. Ashley, J. C. Barrett, L. G. Biesecker, D. F. Conrad, G. M. Cooper, N. J. Cox, M. J. Daly, M. B. Gerstein, D. B. Goldstein, J. N. Hirschhorn, S. M. Leal, L. A. Pennacchio, J. A. Stamatoyannopoulos, S. R. Sunyaev, D. Valle, B. F. Voight, W. Winckler & C. Gunter.
Priorities for research and infrastructure development
1. Improved public databases of human genetic variants incorporating explicit, up-to-date supporting evidence for variant implication in disease and audit trails recording changes in interpretation.
2. Improved incentives, and ethical and logistical solutions, for sharing of genetic and phenotypic data from both research and clinical diagnostic laboratories.
3. Public databases of variant and allele frequency data from large sets of population reference samples from a wide range of ancestries.
4. Large-scale genotyping of reported human disease-causing variants in large, well-phenotyped population cohorts, reducing biases in the assessment of the associated penetrance and phenotypic heterogeneity.
5. Development and benchmarking of standardized, quantitative statistical approaches for objectively assigning probability of causation to new candidate disease genes and variants.
Déjà vu all over again?
Nature Genetics 46, 107–115 (2014) Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database
Bryony A Thompson, Amanda B Spurdle, John-Paul Plazzer, Marc S Greenblatt, Kiwamu Akagi, Fahd Al-Mulla, Bharati Bapat, Inge Bernstein, Gabriel Capellá, Johan T den Dunnen, Desiree du Sart, Aurelie Fabre, Michael P Farrell, Susan M Farrington, Ian M Frayling, Thierry Frebourg, David E Goldgar, Christopher D Heinen, Elke Holinski-Feder, Maija Kohonen-Corish, Kristina Lagerstedt Robinson, Suet Yi Leung, Alexandra Martins, Pal Moller, Monika Morak, Minna Nystrom, Paivi Peltomaki, Marta Pineda, Ming Qi, Rajkumar Ramesar, Lene Juel Rasmussen, Brigitte Royer-Pokora, Rodney J Scott, Rolf Sijmons, Sean V Tavtigian, Carli M Tops, Thomas Weber, Juul Wijnen, Michael O Woods, Finlay Macrae & Maurizio Genuardi, on behalf of InSiGHT.
Nature Genetics 46, 107–115 (2014)
1. Leiden Open Variation Database (LOVD)
2. Microattribution using Open Researcher and Contributor ID (ORCID)
3. Variant Interpretation Committee (VIC) applies the 5-tiered classification scheme developed by the International Agency for Research on Cancer (IARC)
4. Endorsed by the Human Variome Project (HVP)
Not everything in the Nature portfolio is gold
It is good to supplement your pocket money
Early nomenclature papers
• Beaudet
• Tsui
• Antonarakis
Translation into diagnostic practice
• 15 years ago Cotton predicted that the majority of human genetic variants will be detected in a diagnostic context
• As NGS moves into a service setting this transition will become even clearer
• Genetic variants will become part of a patient’s medical record
HVPA database
• Primarily for and of diagnostics
• Diagnostic services are busy
• And cash and time limited
• We have to make it easy for them
• And secure
• And useful
• Maybe even essential
HVPA Objective
A national data sharing facility for improving clinical genetic testing services and supporting medical research
Constitutional, not somatic, mutations
NECTAR project grant UoM FE31082
“Clinical and Molecular Data Linkage Tools”, completion date 30th June 2014
Infrastructure and search interface
• Data repository (“the database”)
• Data handling tools that support data upload from laboratories
• Portal through which the database can be browsed
• Website for news and notifications
Human Variome Project Australian Node
What We’ve Done
• NeAT Funding (2010-2011)
– Pilot Phase: 4 labs, 3 diseases (Breast Cancer, Colon Cancer, Huntington’s)
– Portal Launched April 2011
– Molecular Data Only
– Collaboration with Mawson
• NeCTAR Funding (2012-2014)
– 12 more labs + all genes they test for
– Configuration Tool
– Clinical Data/Phenotype Linkage
– Transfer data internationally
What We Built
• Collection Tool
• Portal
• Data Model
• Ethics Processes
• Access & Usage Policy
• Data Sharing Agreements
How it works
• Software to interface with existing LIMS (or lack thereof)
• Collection occurs after report has been issued
• Data types:
– All classified variants reported by a lab
– Benign variants
– NGS/Incidental findings
– Not collecting negative results
• Secure data link between lab and Node
• (Semi)-automatic transfer of data
• Portal to allow interrogation of all Australian data
– http://www.hvpaustralia.org.au
• Linkage key generator
• Submission to BioGrid Platform
Open-Source Solutions
• HVP Portal (v1.0, r512) - A web application which features the basic interface for browsing and querying an HVP node.
– Open source
– MIT License
– Python/Django
• HVP Exporter (v1.0, r512) - Basic HVP exporting tool for laboratories. Features a simple GUI and error checking interface, a plug-in architecture for customisation between sites, and common libraries for working with MS Access and MS Excel data sources.
– Open source
– MIT License
– .NET C#, Python/IronPython
• HVP Importer (v1.0, r512) - A series of tools and web services that receive, decrypt and process information sent by submitting laboratories using the standard transaction XML format.
– Open source
– MIT License
– Python
Access to HVPA
• Controlled Access
– Diagnostic Lab Staff
– Registered Medical Practitioners
– Board Certified Genetic Counsellors
• Online application
HVPA Status at November 2013
Strengths
1. Database available on demand for diagnostic labs
2. Tools for data sharing
3. Community engagement with RCPA (QUPP), SA/Mawson, BioGrid, VCGS
4. National reach with international connections via HVPI, WHO & UNESCO
Weaknesses
1. Performance of the existing HVPA database is limited
2. Laboratory buy-in to the database across Australia is limited
3. The database itself has been hard to access because of low server bandwidth
4. The project has not anticipated the likely impact of next generation sequencing and risks missing inclusion in genomic-scale initiatives now underway.
HVPA 24th March 2014
• 5 laboratories submitting
• 295 Unique Variants
• 27410 Instances
• 25 Registered users
Developments proposed in November

| ID | Area | Idea | Priority |
|---|---|---|---|
| 1 | B. Presentation | Statistics of number of variants for that gene as table or bar graph (# unique, # instances, top 5 qty submitted) | 1 |
| 15 | D. Feedback | Raise a concern about an instance's interpretation | 1 |
| 2 | A. Search | Search by range | 2 |
| 3 | A. Search | Search by genomic position | 2 |
| 4 | A. Search | Filter by pathogenicity | 2 |
| 5 | B. Presentation | Sort by... (pathogenicity, other fields) | 2 |
| 6 | C. Relevant Info | Display links to related database for gene by referencing genenames.org | 2 |
| 7 | A. Search | Wildcard search of variants | 2 |
| 9 | A. Search | Search by disease which shows multiple genes and variant results | 2 |
| 10 | E. NGS | VCF data imports into HVP Australia | 2 |
| 13 | B. Presentation | VarVis - visualisation of gene and variants reported | 2 |
| 11 | B. Presentation | VCF data export from HVP Australia of a set of results | 3 |
| 12 | B. Presentation | At instance level - see other variants from this test/patient | 3 |
| 14 | C. Relevant Info | Capture & display SIFT score | 3 |
| 16 | D. Feedback | Notify labs that the general consensus of pathogenicity of something they submitted has changed/updated, i.e. they submitted benign and it's now likely pathogenic, or submitted unknown and now it's something else | 3 |
| 17 | B. Presentation | Integration with EBI/NCBI tools for queries and displays | 3 |
| 19 | B. Presentation | Display last date uploaded for this variant (or last 10 dates) | 3 |
Accessing the test database
http://115.146.85.61/
Username: lab_tester
Password: hvpaustralia2013
Search Interface
• The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and incentivises them to participate. Following a request for feedback from users, a series of improvements were implemented, initially on a demonstration server and then on the live server following review by the Steering Committee. The highest priorities were for more information about numbers of times particular variants were recorded, the ability to search by range and to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
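The search features listed above (instance counts, search by range, filter by pathogenicity) can be sketched in a few lines. This is only an illustration over in-memory records, not the HVPA portal code: the `Instance` class, its field names and the `search` and `count_instances` helpers are hypothetical, and the pathogenicity scale is assumed to run from 1 (pathogenic) to 5 (certainly benign).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One reported observation of a variant (hypothetical record layout)."""
    gene: str           # HGNC symbol
    chrom: str          # chromosome name
    position: int       # genomic coordinate
    hgvs_c: str         # HGVS cDNA name
    pathogenicity: int  # assumed scale: 1 = pathogenic ... 5 = certainly benign


def search(instances, chrom=None, start=None, end=None, max_class=None):
    """Keep instances on `chrom` within [start, end] with class <= max_class.

    Any argument left as None is not used as a filter.
    """
    hits = []
    for inst in instances:
        if chrom is not None and inst.chrom != chrom:
            continue
        if start is not None and inst.position < start:
            continue
        if end is not None and inst.position > end:
            continue
        if max_class is not None and inst.pathogenicity > max_class:
            continue
        hits.append(inst)
    return hits


def count_instances(instances):
    """Number of recorded instances for each (gene, variant) pair."""
    counts = {}
    for inst in instances:
        key = (inst.gene, inst.hgvs_c)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, `search(records, chrom="17", start=500, end=2000, max_class=2)` would return only pathogenic or possibly pathogenic instances in that window, and `count_instances(records)` gives the unique-variant versus instance tally that users asked for.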
Purpose of the HVPA Database
• Working database
– Record and share diagnostic-quality genetic variation data
– Integrate with clinical phenotype data
– Integrate with international efforts
– Heads up for NGS gene panel data sets
• Test database
– Showcase enhancements
– Real world testing and feedback
– Uses data edited from actual database
– Not accurate or reliable: some parameters edited for test purposes
Major improvements to search facility
Searching by expression match BRCA BR
Instances of a variant
Pathogenic Variants
Direct Import from Results Lists
• Can recover historical data sets
• Reformat on the fly
• Useful as low-overhead catch up to enable labs to transition to using uploading tools as their IT permits
– PathWest (John Bielby)
– Institute of Health and Biomedical Innovation, Queensland (Lyn Griffiths)
– kConFab (Heather Thorne)
– Peter MacCallum Cancer Centre (Ken Doig)
Variant Fields (Mandatory)

| Field | Description | Type | Required |
|---|---|---|---|
| GeneName | Official HGNC symbol | VARCHAR(20) | Mandatory |
| RefSeqName | Name of reference sequence (NCBI's RefSeq project) | VARCHAR(20) | Mandatory |
| RefSeqVer | Version of reference sequence (RefSeq) | VARCHAR(20) | Mandatory |
| cDNA | HGVS variant name (c.) | VARCHAR(255) | At least one required |
| mRNA | HGVS variant name (m.) | VARCHAR(255) | At least one required |
| Genomic | HGVS variant name (g.) | VARCHAR(255) | At least one required |
| Protein | HGVS variant name (p.) | VARCHAR(255) | At least one required |
| Location | Exon or intron number | VARCHAR(255) | |
| Pathogenicity | Level of pathogenicity (1 = Pathogenic, 2 = Possibly pathogenic, 3 = Unknown, 4 = Possibly benign, 5 = Certainly benign) | | Mandatory |
| PatientID | Internal ID for the patient used within the lab | | Mandatory |
| TestID | Internal ID for the test used within the lab | VARCHAR(20) | Mandatory |
| InstanceDate | Date instance was tested | DateTime | Mandatory |
| GenomicRefSeq | Genomic reference sequence | VARCHAR(255) | Mandatory |
| GenomicRefSeqVer | Genomic reference sequence version | VARCHAR(255) | Mandatory |
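A submission check based on the mandatory fields above might look like the following. It is a minimal sketch, not the HVP Exporter or Importer logic: the `validate` function is hypothetical, and it assumes that "at least one required" applies to the four HGVS name fields (cDNA, mRNA, Genomic, Protein) and that Pathogenicity is stored as an integer from 1 to 5.

```python
# Field names are taken from the slide; the grouping below is an assumption.
MANDATORY = (
    "GeneName", "RefSeqName", "RefSeqVer", "Pathogenicity",
    "PatientID", "TestID", "InstanceDate",
    "GenomicRefSeq", "GenomicRefSeqVer",
)
HGVS_NAMES = ("cDNA", "mRNA", "Genomic", "Protein")


def validate(record):
    """Return a list of problems found in one variant record (a dict)."""
    errors = []
    # Every mandatory field must be present and non-empty.
    for field in MANDATORY:
        if record.get(field) in (None, ""):
            errors.append(f"missing mandatory field: {field}")
    # At least one HGVS description of the variant must be supplied.
    if not any(record.get(field) for field in HGVS_NAMES):
        errors.append("at least one of cDNA, mRNA, Genomic, Protein is required")
    # Pathogenicity, when given, must be one of the five classes.
    level = record.get("Pathogenicity")
    if level not in (None, "") and level not in range(1, 6):
        errors.append("Pathogenicity must be an integer from 1 to 5")
    return errors
```

An empty list means the record passes; otherwise each string names one problem, which suits the kind of error-checking interface the exporter is described as providing.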
Variant Fields (Optional)

| Field | Description | Type | Required |
|---|---|---|---|
| PatientAge | Age of patient when test was taken | INT32 | Optional |
| TestMethod | The name of the test method used | VARCHAR(20) | Optional |
| SampleTissue | Type of sample taken | VARCHAR(20) | Optional |
| SampleSource | The source of the sample, e.g. DNA, g.DNA, RNA... | VARCHAR(20) | Optional |
| Justification | Justification by medical scientist | VARCHAR(65535) | Optional |
| PubMed | PubMed Identifier/Digital Object Identifier | VARCHAR(255) | Optional |
| RecordedInDatabase | Whether it is recorded in disease specific or gene specific | Boolean | Optional |
| SampleStored | Whether lab still has sample left | Boolean | Optional |
| VariantSegregatesWithDisease | Whether pedigree was considered during diagnosis of pathogenicity | Boolean | Optional |
| HistologyStored | Whether histograms are stored | Boolean | Optional |
| PedigreeAvailable | Whether organisation has pedigree data | Boolean | Optional |
| SIFTScore | Calculated SIFT score | INT32 | Optional |
Linkage to other datasets
• HVPA have implemented the hash key algorithm and work is in progress with BioGrid to link variation data to clinical data sets.
• More details from Maureen Turner, BioGrid CEO who is speaking at this meeting
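The slides do not spell out the hash key algorithm, so the following only illustrates the general idea of hash-based de-identified matching: each site derives the same opaque key from normalised identifying fields, and records are linked on the key alone. The choice of fields, the normalisation rules and the use of a shared secret with HMAC-SHA-256 are all assumptions made for this sketch.

```python
import hashlib
import hmac
import unicodedata


def normalise(text):
    """Upper-case and strip accents, spaces and punctuation before hashing."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.upper() if ch.isalnum())


def linkage_key(surname, given_name, date_of_birth, sex, secret):
    """Derive a de-identified key; `secret` is a bytes value shared by all sites.

    `date_of_birth` is assumed to be an ISO string such as "1970-01-31".
    """
    message = "|".join(
        [normalise(surname), normalise(given_name), date_of_birth, sex.upper()]
    )
    # A keyed hash stops outsiders rebuilding keys from guessed identities.
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two datasets that hold the same person then share a key without either side exchanging names or dates of birth, which is the property that makes linkage between variation data and clinical data sets possible.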
Cost and performance will force diagnostic labs to adopt NGS as front-line approach
cost per base Illumina share price
Hype cycle
HVPA LOVD3 database pilot
• Established an HVPA LOVD3 database and working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database.
• Includes exome-scale data
• Open access to Coriell cases with no “consent” issues
• Explore staging of variant “credibility classification” and access
Relationship to Gene Panel Databases? e.g. http://genomics.bio21.unimelb.edu.au/lovd/
Melbourne Genomics Health Alliance
• Clinically led, rather than technology driven
• Fostering ‘end use’ of genomic data
• Common clinical repository
• Prospective : first tier test
• Evaluation to inform implementation
• Engineering collaboration
• Fostering system change
• A/Prof Clara Gaff: Program Leader
PARADIGM FOR IMPLEMENTING GENOMIC MEDICINE
Melbourne Genomics Health Alliance
Connected nationally
and internationally
How many variants per exome?
| SNP count | Study |
|---|---|
| 20,000 | Choi et al. PNAS 2009 |
| 142,000 | Mullikin NIH, unpublished 2010 |
| 50,000 | Clark et al. Nature Biotechnology 2011 |
| 125,000 | Smith et al. Genome Biology 2011 |
| 100,000 | Johnston & Biesecker Human Molecular Genetics 2013 |
| 200,000 to 400,000 | Yang et al. N Engl J Med 2013 |

• 20-fold range
• Exome designs vary
• Likely to be higher variant count in African populations as the reference sequence is non-African
Low concordance of multiple variant-calling pipelines
O’Rawe et al. Genome Medicine 2013
• 15 exomes
• 4 families
• HiSeq 2000
• Agilent SureSelect v.2
• ~120X mean coverage
• SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools
• SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%
• 0.5-5.1% variants were called as unique to each pipeline
• Indel concordance was only 26.8% between three indel calling pipelines
• 11% of CG variants that fall within targeted regions in exome sequencing were not called by any of the Illumina-based exome analysis pipelines
• 97.1%, 60.2% and 99.1% of the GATK-only, SOAP-only and shared SNVs can be validated
• 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels can be validated
• Additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family
Low concordance of multiple variant-calling pipelines O’Rawe et al. Genome Medicine 2013, 5:28
SNV concordance: 57.4% Indel concordance 26.8%
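To make the concordance figures concrete, here is one simple way to compute such a number from several call sets. It assumes concordance means "called by every pipeline, as a share of variants called by any pipeline", which may differ in detail from the definition used in the paper; the function names are hypothetical.

```python
def concordance(call_sets):
    """Fraction of all distinct variants that every pipeline called.

    Each call set is an iterable of hashable variant keys, for example
    (chrom, position, ref, alt) tuples.
    """
    sets = [set(calls) for calls in call_sets]
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_to_each(call_sets):
    """For each pipeline, the variants that no other pipeline called."""
    sets = [set(calls) for calls in call_sets]
    result = []
    for i, calls in enumerate(sets):
        others = set().union(*(s for j, s in enumerate(sets) if j != i))
        result.append(calls - others)
    return result
```

With three toy pipelines calling {1, 2, 3}, {2, 3, 4} and {2, 3, 5}, two of the five distinct variants are shared by all, so the concordance is 0.4 and each pipeline has one unique call.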
Venn diagrams of selected CNV detection methods in real data processing
Duan J, Zhang J-G, Deng H-W, Wang Y-P (2013) Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing Technologies. PLoS ONE 8(3): e59128. doi:10.1371/journal.pone.0059128 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059128
Sequence errors
Post processing errors
Remove errors before processing
K-mer selection
Merging forward and reverse reads
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTTTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGTATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGTA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATTAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
TAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATCTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGGAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAGAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
Discard rare reads
Use a HiFi polymerase
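The "discard rare reads" idea can be illustrated by counting identical read sequences and keeping only those seen often enough to be credible. This is a sketch of the general principle rather than the pipeline used for the slides; the function name and the default threshold are arbitrary assumptions.

```python
from collections import Counter


def discard_rare_reads(reads, min_fraction=0.1):
    """Keep sequences that make up at least `min_fraction` of all reads.

    Returns a dict mapping each retained sequence to its read count.
    Sequences seen only once or twice among many reads are more likely to
    carry polymerase or sequencing errors than true variants.
    """
    counts = Counter(reads)
    total = len(reads)
    return {
        seq: n for seq, n in counts.items()
        if n / total >= min_fraction
    }
```

Applied to a pile of amplicon reads like the one shown above, the two dominant sequences (which differ at a single base) would survive, while the singletons carrying scattered one-base changes would be dropped.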
Four capture panels at SOD1
• Known SNV concordance 100%, all assays
• Known indel <6bp concordance 100%, all assays
• Not able to detect c9orf72 hexanucleotide expansion or PRNP octapeptide region repeat with standard pipeline
• Diagnostic yield within appropriate clinical context (based on very limited sample size)
- NimbleGen SeqCap EZ Neuro: 33% (2/6)
- Nextera Neuro: 23% (6/26)
Results – detection of variants
Filtering Variants
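| All variants | None | Qual | Not in Blood |
|---|---|---|---|
| Blood | 9828 | 8551 | NA |
| Frozen | 9920 | 8736 | 126 |
| FFPE | 9709 | 8163 | 199 |

| Variants in Gene List | None | Qual | Not in Blood |
|---|---|---|---|
| Blood | 27 | 18 | NA |
| Frozen | 27 | 23 | 2 (EGFR) |
| FFPE | 25 | 19 | 3 (EGFR, ROS) |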
EGFR p.L858R
EGFR p.T790M
Confirmation by PCR
[Figure: two bar charts, "EGFR normalised" and "KRAS normalised". The EGFR chart (NM_005228.3) has one bar per assay: T790 WT; c.2350T>C, p.S784P; c.2351C>T, p.S784F; c.2354C>T, p.T785I; c.2356G>A, p.V786M; c.2368A>G, p.T790A; c.2369C>T, p.T790M; 828 & 861 wt; c.2572C>A, p.L858M; c.2573_2574delinsGT; c.2573T>A, p.L858Q; c.2573T>G, p.L858R; c.2579A>T, p.K860I; c.2582T>A, p.L861Q; c.2582T>G, p.L861R. The KRAS chart (NM_033360.2) has one bar per codon 12 and 13 assay: c.34G>A, p.G12S; c.34G>C, p.G12R; c.34G>T, p.G12C; c.35G>A, p.G12D; c.35G>C, p.G12A; c.35G>T, p.G12V; c.37G>A, p.G13S; c.37G>C, p.G13R; c.37G>T, p.G13C; c.38G>A, p.G13D; c.38G>C, p.G13A; c.38G>T, p.G13V.]
Auto Upload Database of Results in LOVD
Local LOVD instances sharable via HVPA
• Coriell pedigree comparison
• Subset of 19 genes – targeted by all four assays
• Variant allele frequency cut-off of 35% (interested in germline variants)
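The allele frequency cut-off can be written down directly. The sketch below assumes each call carries counts of alternate-allele reads and total reads under the hypothetical keys `alt_reads` and `total_reads`; only the 35% threshold comes from the slide.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def germline_candidates(calls, min_vaf=0.35):
    """Keep calls whose variant allele frequency is at least `min_vaf`.

    Heterozygous germline variants are expected near 50% and homozygous
    ones near 100%, so a 35% floor removes most low-level artefacts.
    """
    return [
        call for call in calls
        if variant_allele_frequency(call["alt_reads"], call["total_reads"]) >= min_vaf
    ]
```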
Results – detection of variants
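Columns give, for the mother, father and child of each trio: total number of variants detected (Total), non-synonymous variants detected (Non-syn), # variants with GAF <5% (GAF <5%) and # variants with African AF 5% (African AF 5%).

Trio Y077

| Assay | Total: Mother | Total: Father | Total: Child | Non-syn: Mother | Non-syn: Father | Non-syn: Child | GAF <5%: Mother | GAF <5%: Father | GAF <5%: Child | African AF 5%: Mother | African AF 5%: Father | African AF 5%: Child |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NimbleGen SeqCap EZ Neuro | 194 | 241 | 196 | 16 | 22 | 20 | 4 | 5 | 7 | 2 | 3 | 4 |
| Nextera Neuro | 250 | 296 | 283 | 17 | 23 | 22 | 4 | 6 | 7 | 2 | 3 | 4 |
| TruSight One | 121 | 137 | 119 | 16 | 23 | 20 | 3 | 6 | 6 | 1 | 3 | 3 |
| Nextera Exome | 101 | 118 | 114 | 16 | 22 | 22 | 4 | 5 | 7 | 2 | 2 | 4 |

Trio Y117

| Assay | Total: Mother | Total: Father | Total: Child | Non-syn: Mother | Non-syn: Father | Non-syn: Child | GAF <5%: Mother | GAF <5%: Father | GAF <5%: Child | African AF 5%: Mother | African AF 5%: Father | African AF 5%: Child |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NimbleGen SeqCap EZ Neuro | 279 | 245 | 263 | 20 | 20 | 20 | 4 | 5 | 6 | 3 | 2 | 4 |
| Nextera Neuro | 382 | 371 | 342 | 20 | 21 | 21 | 5 | 5 | 6 | 3 | 2 | 4 |
| TruSight One | 148 | 154 | 148 | 18 | 18 | 17 | 4 | 4 | 5 | 3 | 2 | 3 |
| Nextera Exome | 121 | 67 | 66 | 19 | 15 | 16 | 5 | 3 | 4 | 3 | 1 | 3 |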
Example case showing concordance Gene Variant Chr Coordinate zyg Gene Variant Chr Coordinate zyg KEY
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het exome
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het nimbleneuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het nextneuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het trusight1
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TAA>TAA/T 18 21123536 het
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
Describing Coverage
| Metric | MiSeq (12plex) | HiSeq (48plex) |
|---|---|---|
| % target region with non-zero depth | 99.54% | 99.90% |
| % target regions >= 5x | 98.25% | 99.71% |
| % target regions >= 15x | 94.80% | 99.34% |
| % target regions >= 30x | 89.10% | 98.85% |
| % target regions >= 50x | 81.56% | 98.17% |
| Average depth of coverage | 180.76 | 920.84 |
| 5th-centile | 19.08 | 126.75 |
| 20th-centile | 67.42 | 408.83 |
| 50th-centile | 160.42 | 871.17 |
| 95th-centile | 414.00 | 1879.92 |

Mapping quality >= 15; base quality score >= 15
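Coverage statistics of this kind can be derived from a list of per-base depths over the target region. The sketch below is illustrative only: it assumes the depths have already been filtered for mapping and base quality upstream, treats "non-zero depth" as depth of at least 1, and uses a nearest-rank percentile, which may not match the method behind the slide.

```python
def percentile(sorted_depths, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    rank = (p * len(sorted_depths) + 99) // 100  # integer ceil(p * n / 100)
    return sorted_depths[max(rank, 1) - 1]


def coverage_summary(depths, thresholds=(1, 5, 15, 30, 50), centiles=(5, 20, 50, 95)):
    """Summarise per-base depths over a target region."""
    n = len(depths)
    if n == 0:
        raise ValueError("no depths supplied")
    summary = {}
    # Percentage of target bases covered at or above each depth threshold.
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100.0 * sum(d >= t for d in depths) / n
    summary["mean_depth"] = sum(depths) / n
    ordered = sorted(depths)
    for p in centiles:
        summary[f"p{p}"] = percentile(ordered, p)
    return summary
```

Reporting the low centiles alongside the mean matters because a high average can hide a tail of poorly covered targets, which is where variants are missed.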
Coverage reproducibility
Coverage Coefficient of variation
Higher coverage greater reproducibility
Coverage Coefficient of variation
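The coefficient of variation plotted here is the standard deviation of coverage divided by its mean. A minimal sketch, assuming the sample standard deviation is intended and that the values are the depths of one target across replicate samples:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean.

    Needs at least two values; statistics.stdev raises StatisticsError
    otherwise.
    """
    centre = mean(values)
    if centre == 0:
        raise ValueError("mean is zero, coefficient of variation is undefined")
    return stdev(values) / centre
```

A target with depths of 90, 100 and 110 across three samples has a coefficient of variation of 0.1; lower values mean more reproducible coverage.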
Can capture coverage report dosage to diagnostic standards?
[Figure: coverage by sample (axis "samples") and by target (axis "targets"), with autosomal targets and chrX targets shown separately.]
Inter-sample variation is low, But low coverage prevents dosage estimation
Chr X is a good first pass test for dosage
XX vs. XY
8 Female cases and 16 Male cases showing reproducibility of coverage of X loci within each group. Loci with higher SDs were associated with reduced coverage.
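The XX versus XY comparison can be expressed as a per-locus ratio of normalised coverage, which should sit near 2 where dosage is measured reliably. The sketch below is an assumption about how such a check could be computed, not the method behind the plots: each sample is a hypothetical dict holding depths for autosomal targets and for chrX loci, and chrX depth is normalised by the sample's mean autosomal depth.

```python
from statistics import mean


def normalised_x_coverage(sample):
    """chrX depths divided by the sample's mean autosomal depth."""
    baseline = mean(sample["autosomal"])
    return [depth / baseline for depth in sample["chrX"]]


def group_average(samples):
    """Mean normalised coverage at each chrX locus across a group."""
    per_sample = [normalised_x_coverage(s) for s in samples]
    return [mean(locus) for locus in zip(*per_sample)]


def xx_to_xy_ratio(xx_samples, xy_samples):
    """Per-locus ratio of average XX coverage to average XY coverage."""
    return [
        xx / xy
        for xx, xy in zip(group_average(xx_samples), group_average(xy_samples))
    ]
```

Loci whose ratio drifts well away from 2 are candidates for the low-coverage, high-SD loci mentioned above.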
[Figure: two plots comparing the Average XX and Average XY series across chrX loci.]
870
160
Report
Sharing Experience with TruSight One
• In partnership with Illumina, RCPA and the HGSA, Kim Flintoff (Wellington Regional Genetics Laboratory) is leading an evaluation of exon sequencing using Illumina’s TruSight One panel. Two Coriell family trios will be sequenced by New Zealand Genomics Limited and the data will be shared on an HVPA database.
• The VCF file will be available on the HVPA LOVD database and performance stats will also be made available.
Next Steps
• Robust standards for genomic medicine
• Databases and data content
– Access to identified and de-identified data (consent and confidentiality)
– Database accreditation process in prep with RCPA
– Defining the performance of various aligners, variant callers and annotation programs
– Clinical grade Variant Call Format (VCF)
– Metafile covering data trail: what was tested, what was not tested
Standards for Accreditation of DNA Sequence Variation Databases
Under the Quality Use of Pathology Program (QUPP), a national project for the Development of Standards for Accreditation of DNA Sequence Variation Data Bases has been jointly initiated by the Royal College of Pathologists of Australasia (RCPA) and the Human Variome Project (HVP).
Background
• There is a rapidly increasing volume, spectrum, and complexity of genetic tests emerging within diagnostic pathology laboratories. In particular, high throughput sequencing methods such as targeted panel, exome (WES), and whole genome sequencing (WGS) are producing an increasing quantity of genetic data requiring analysis and interpretation, forming a substantial proportion of the workload.
• Currently, there is a plethora of online mutation databases to refer to; however, there is a distinct lack of databases that meet the stringent accuracy and reproducibility standards that the clinical diagnostic environment demands. Additionally, the current databases are “fractured”, with varied access to and sharing of the data within them, and variable quality due to errors and inaccurate data posting, all of which is a clear risk to the quality of patient care. With more widespread, secure sharing of variants and associated phenotypes, the value of cumulative variant information will accelerate the delivery of accurate, actionable, and efficient clinical reports.
• There are currently no standards or equivalent mechanisms for accreditation of databases to ensure the accuracy and quality of uploaded data into any central repository to meet the needs of the clinical diagnostics environment.
Data quality classes
Differentiate between three classes of data:
• Clinically Reported: the class of data that the HVP Australian Node was originally designed to collect and share, namely data that has been generated in a NATA accredited Australian diagnostic laboratory and is able to be included in a clinical report.
• Unreported Clinical quality: data that has been generated in a NATA accredited diagnostic laboratory, but is not capable of being included in a clinical report. This class would consist primarily of next-generation sequencing (NGS) type data.
• Unaccredited: data that has been generated by an Australian laboratory that has not been NATA accredited.
A new filtering option would be made available to allow users to view only data of a certain class.
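The three proposed classes reduce to two yes/no questions about each submission, which the sketch below encodes. The enum, the function names and the `data_class` record key are hypothetical; only the class labels and the rules for assigning them come from the proposal.

```python
from enum import Enum


class DataClass(Enum):
    CLINICALLY_REPORTED = "Clinically Reported"
    UNREPORTED_CLINICAL_QUALITY = "Unreported Clinical quality"
    UNACCREDITED = "Unaccredited"


def classify(nata_accredited, reportable):
    """Assign a data quality class to a submission.

    nata_accredited: the generating laboratory is NATA accredited.
    reportable: the data is able to be included in a clinical report.
    """
    if not nata_accredited:
        return DataClass.UNACCREDITED
    if reportable:
        return DataClass.CLINICALLY_REPORTED
    return DataClass.UNREPORTED_CLINICAL_QUALITY


def filter_by_class(records, wanted):
    """Keep records (dicts with a 'data_class' key) of the wanted class."""
    return [r for r in records if r["data_class"] is wanted]
```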
Beyond the NeCTAR funding
• Academic or charitable funding required
• Integrate NGS data resource into the HVPA portfolio
• Move database development into a medical academic centre of excellence
• Seek active partnerships with current and future collaborators with investment and risk sharing