The Human Variome Database in Australia in 2014 - Graham Taylor

Acknowledgments

Genomic Medicine & Translational Pathology, University of Melbourne: Arthur Lian Chi Hsu, Renate Marquis-Nicholson, Sebastian Lunke, Clare Love, Kym Pham, Olga Kondrashova, Matt Wakefield, Tiffany Cowie, Barney Rudzki and Paul Waring

Human Variome Project: Tim Smith, Alan Lo, Melvyn Leong, David Perkins, Heather Howard, Rania Horaitis, Dick Cotton

BioGrid: Maureen Turner, Leon Heffer

Royal College of Pathologists of Australasia: Vanessa Tyrrell

Peter MacCallum Cancer Centre: Ken Doig, Andrew Fellowes

Victorian Clinical Genetics Service: John-Paul Plazzer, Desiree Du Sart

Human Variome Project (Australasia)

• The bigger picture

• Infrastructure and search interface

• Linkage to other datasets

• Panel, exome and genome testing

• Database accreditation

• Next steps

The big picture

• Rediscovery at the genomics community level that data sharing is win-win

• The Genomic Alliance, HGVS, HUGO

– Data standards

– Nomenclature

– Infrastructure

Nature (Perspective) 508 469-475 2014 Guidelines for investigating causality of sequence variants in human disease

D. G. MacArthur, T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, R. B. Altman, S. E. Antonarakis, E. A. Ashley, J. C. Barrett, L. G. Biesecker, D. F. Conrad, G. M. Cooper, N. J. Cox, M. J. Daly, M. B. Gerstein, D. B. Goldstein, J. N. Hirschhorn, S. M. Leal, L. A. Pennacchio, J. A. Stamatoyannopoulos, S. R. Sunyaev, D. Valle, B. F. Voight, W. Winckler & C. Gunter.

Priorities for research and infrastructure development

1. Improved public databases of human genetic variants incorporating explicit, up-to-date supporting evidence for variant implication in disease and audit trails recording changes in interpretation.

2. Improved incentives, and ethical and logistical solutions, for sharing of genetic and phenotypic data from both research and clinical diagnostic laboratories.

3. Public databases of variant and allele frequency data from large sets of population reference samples from a wide range of ancestries.

4. Large-scale genotyping of reported human disease-causing variants in large, well-phenotyped population cohorts, reducing biases in the assessment of the associated penetrance and phenotypic heterogeneity.

5. Development and benchmarking of standardized, quantitative statistical approaches for objectively assigning probability of causation to new candidate disease genes and variants.

Déjà vu all over again?

Nature Genetics 46, 107–115 (2014) Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database

Bryony A Thompson, Amanda B Spurdle, John-Paul Plazzer, Marc S Greenblatt, Kiwamu Akagi, Fahd Al-Mulla, Bharati Bapat, Inge Bernstein, Gabriel Capellá, Johan T den Dunnen, Desiree du Sart, Aurelie Fabre, Michael P Farrell, Susan M Farrington, Ian M Frayling, Thierry Frebourg, David E Goldgar, Christopher D Heinen, Elke Holinski-Feder, Maija Kohonen-Corish, Kristina Lagerstedt Robinson, Suet Yi Leung, Alexandra Martins, Pal Moller, Monika Morak, Minna Nystrom, Paivi Peltomaki, Marta Pineda, Ming Qi, Rajkumar Ramesar, Lene Juel Rasmussen, Brigitte Royer-Pokora, Rodney J Scott, Rolf Sijmons, Sean V Tavtigian, Carli M Tops, Thomas Weber, Juul Wijnen, Michael O Woods, Finlay Macrae & Maurizio Genuardi, on behalf of InSiGHT.

Nature Genetics 46, 107–115 (2014)

1. Leiden Open Variation Database (LOVD)

2. Micro-attribution using Open Researcher & Contributor Identification (ORCID)

3. Variant Interpretation Committee (VIC) applies the 5-tiered classification scheme developed by the International Agency for Research on Cancer (IARC)

4. Endorsed by the Human Variome Project (HVP)

Not everything in the Nature portfolio is gold

It is good to supplement your pocket money

Early nomenclature papers

• Beaudet

• Tsui

• Antonarakis

Translation into diagnostic practice

• 15 years ago Cotton predicted that the majority of human genetic variants would be detected in a diagnostic context

• As NGS moves into a service setting this transition will become even clearer

• Genetic variants will become part of a patient’s medical record

HVPA database

• Primarily for and of diagnostics

• Diagnostic services are busy

• And cash and time limited

• We have to make it easy for them

• And secure

• And useful

• Maybe even essential

HVPA Objective

A national data sharing facility for improving clinical genetic testing services and supporting medical research

Constitutional, not somatic, mutations

NeCTAR project grant UoM FE31082

“Clinical and Molecular Data Linkage Tools”, completion date 30th June 2014

Infrastructure and search interface

• Data repository (“the database”)

• Data handling tools that support data upload from laboratories

• Portal through which the database can be browsed

• Website for news and notifications

Human Variome Project Australian Node

What We’ve Done

• NeAT Funding (2010-2011)

– Pilot Phase: 4 labs, 3 diseases (Breast Cancer, Colon Cancer, Huntington’s)

– Portal Launched April 2011

– Molecular Data Only

– Collaboration with Mawson

• NeCTAR Funding (2012-2014)

– 12 more labs + all genes they test for

– Configuration Tool

– Clinical Data/Phenotype Linkage

– Transfer data internationally

What We Built

• Collection Tool

• Portal

• Data Model

• Ethics Processes

• Access & Usage Policy

• Data Sharing Agreements

How it works

• Software to interface with existing LIMS (or lack thereof)

• Collection occurs after report has been issued

• Data types:

– All classified variants reported by a lab

– Benign variants

– NGS/Incidental findings

– Not collecting negative results

• Secure data link between lab and Node

• (Semi)-automatic transfer of data

• Portal to allow interrogation of all Australian data

– http://www.hvpaustralia.org.au

• Linkage key generator

• Submission to BioGrid Platform


Open-Source Solutions

• HVP Portal (v1.0, r512): a web application providing the basic interface for browsing and querying an HVP node.

– Open source, MIT License

– Python/Django

• HVP Exporter (v1.0, r512): basic HVP exporting tool for laboratories. Features a simple GUI and error-checking interface, a plug-in architecture for customisation between sites, and common libraries for working with MS Access and MS Excel data sources.

– Open source, MIT License

– .NET C#, Python/IronPython

• HVP Importer (v1.0, r512): a series of tools and web services that receive, decrypt and process information sent by submitting laboratories using the standard transaction XML format.

– Open source, MIT License

– Python

Access to HVPA

• Controlled Access

– Diagnostic Lab Staff

– Registered Medical Practitioners

– Board Certified Genetic Counsellors

• Online application

HVPA Status at November 2013

Strengths

1. Database available on demand for diagnostic labs

2. Tools for data sharing

3. Community engagement with RCPA (QUPP), SA/Mawson, BioGrid, VCGS

4. National reach with international connections via HVPI, WHO & UNESCO

Weaknesses

1. Performance of the existing HVPA database is limited

2. Laboratory buy-in to the database across Australia is limited

3. The database itself has been hard to access because of low server bandwidth

4. The project has not anticipated the likely impact of next generation sequencing and risks missing inclusion in genomic-scale initiatives now underway.

HVPA 24th March 2014

• 5 laboratories submitting

• 295 Unique Variants

• 27410 Instances

• 25 Registered users

Developments proposed in November

ID | Area | Idea | Priority
1 | B. Presentation | Statistics of number of variants for that gene as table or bar graph (# unique, # instances, top 5 qty submitted) | 1
15 | D. Feedback | Raise a concern about an instance's interpretation | 1
2 | A. Search | Search by range | 2
3 | A. Search | Search by genomic position | 2
4 | A. Search | Filter by pathogenicity | 2
5 | B. Presentation | Sort by ... (pathogenicity, other fields) | 2
6 | C. Relevant Info | Display links to related database for gene by referencing genenames.org | 2
7 | A. Search | Wildcard search of variants | 2
9 | A. Search | Search by disease which shows multiple genes and variant results | 2
10 | E. NGS | VCF data imports into HVP Australia | 2
13 | B. Presentation | VarVis: visualisation of gene and variants reported | 2
11 | B. Presentation | VCF data export from HVP Australia of a set of results | 3
12 | B. Presentation | At instance level: see other variants from this test/patient | 3
14 | C. Relevant Info | Capture & display SIFT score | 3
16 | D. Feedback | Notify labs when the general consensus on pathogenicity of something they submitted has changed or been updated, e.g. they submitted benign and it is now likely pathogenic, or submitted unknown and it is now something else | 3
17 | B. Presentation | Integration with EBI/NCBI tools for queries and displays | 3
19 | B. Presentation | Display last date uploaded for this variant (or last 10 dates) | 3

Accessing the test database

http://115.146.85.61/

Username: lab_tester

Password: hvpaustralia2013

Search Interface

• The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and gives them an incentive to participate. Following a request for feedback from users, a series of improvements was implemented, initially on a demonstration server and then on the live server following review by the Steering Committee. The highest priorities were more information about the number of times particular variants were recorded, the ability to search by range, and the ability to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
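The requested search features (wildcard match, search by range, filter by pathogenicity, instance counts) can be illustrated with a minimal in-memory sketch. This is not the HVP Portal code: the `Instance` record, the `search` and `instance_counts` functions, and the sample records are all hypothetical and the coordinates are invented.

```python
from collections import Counter
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Instance:
    gene: str           # HGNC symbol
    cdna: str           # HGVS c. name
    position: int       # genomic coordinate (made-up values below)
    pathogenicity: int  # 1 = Pathogenic ... 5 = Certainly Benign


def search(instances, gene_pattern="*", start=None, end=None, levels=None):
    """Return instances matching a wildcard gene pattern, an optional
    inclusive coordinate range and an optional set of pathogenicity levels."""
    hits = []
    for inst in instances:
        if not fnmatchcase(inst.gene, gene_pattern):
            continue
        if start is not None and inst.position < start:
            continue
        if end is not None and inst.position > end:
            continue
        if levels is not None and inst.pathogenicity not in levels:
            continue
        hits.append(inst)
    return hits


def instance_counts(instances):
    """Count how many times each (gene, variant) pair was recorded."""
    return Counter((inst.gene, inst.cdna) for inst in instances)


# Invented records, for illustration only.
RECORDS = [
    Instance("BRCA1", "c.100A>G", 1000, 1),
    Instance("BRCA1", "c.100A>G", 1000, 1),
    Instance("BRCA2", "c.200C>T", 5000, 3),
    Instance("MLH1", "c.300G>A", 9000, 5),
]

brca_hits = search(RECORDS, gene_pattern="BR*")
pathogenic_in_range = search(RECORDS, start=500, end=6000, levels={1, 2})
counts = instance_counts(RECORDS)
```

A real portal would push these filters into database queries; the list scan here only shows how the three filters combine.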

Purpose of the HVPA Database

• Working database

– Record and share diagnostic-quality genetic variation data

– Integrate with clinical phenotype data

– Integrate with international efforts

– Heads up for NGS gene panel data sets

• Test database

– Showcase enhancements

– Real world testing and feedback

– Uses data edited from actual database

– Not accurate or reliable: some parameters edited for test purposes

Major improvements to search facility

Searching by expression match BRCA BR

Instances of a variant

Pathogenic Variants

Direct Import from Results Lists

• Can recover historical data sets

• Reformat on the fly

• Useful as a low-overhead catch-up to enable labs to transition to using uploading tools as their IT permits

– PathWest (John Bielby)

– Institute of Health and Biomedical Innovation, Queensland (Lyn Griffiths)

– kConFab (Heather Thorne)

– Peter MacCallum Cancer Centre (Ken Doig)

Variant Fields (Mandatory)

Field | Description | Type | Requirement
GeneName | Official HGNC Symbol | VARCHAR(20) | Mandatory
RefSeqName | Name of reference sequence (NCBI's RefSeq project) | VARCHAR(20) | Mandatory
RefSeqVer | Version of reference sequence (RefSeq) | VARCHAR(20) | Mandatory
cDNA | HGVS variant name (c.) | VARCHAR(255) | At least one required
mRNA | HGVS variant name (m.) | VARCHAR(255) | At least one required
Genomic | HGVS variant name (g.) | VARCHAR(255) | At least one required
Protein | HGVS variant name (p.) | VARCHAR(255) | At least one required
Location | Exon or intron number | VARCHAR(255) |

Field | Description | Requirement
Pathogenicity | Level of pathogenicity (1=Pathogenic, 2=Possibly Pathogenic, 3=Unknown, 4=Possible benign, 5=Certainly Benign) | Mandatory
PatientID | Internal ID for the patient used within the lab | Mandatory
TestID | Internal ID for the test used within the lab | Mandatory
InstanceDate | Date instance was tested | Mandatory
GenomicRefSeq | Genomic reference sequence | Mandatory
GenomicRefSeqVer | Genomic reference sequence version | Mandatory

Types (as listed): VARCHAR(20), DateTime, VARCHAR(255), VARCHAR(255)
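As a sketch of how a submitting lab might check a record against the mandatory fields above before upload, the following assumes records are plain dictionaries keyed by the field names on the slide. The `validate` function, the error strings and the example values are hypothetical, and this is not the HVP Exporter's actual error checking.

```python
MANDATORY_FIELDS = (
    "GeneName", "RefSeqName", "RefSeqVer", "Pathogenicity", "PatientID",
    "TestID", "InstanceDate", "GenomicRefSeq", "GenomicRefSeqVer",
)
# At least one of these HGVS names must be present.
VARIANT_NAME_FIELDS = ("cDNA", "mRNA", "Genomic", "Protein")
PATHOGENICITY_LEVELS = {
    1: "Pathogenic",
    2: "Possibly Pathogenic",
    3: "Unknown",
    4: "Possible benign",
    5: "Certainly Benign",
}


def is_blank(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or (isinstance(value, str) and not value.strip())


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field in MANDATORY_FIELDS:
        if is_blank(record.get(field)):
            errors.append("missing mandatory field: " + field)
    if all(is_blank(record.get(f)) for f in VARIANT_NAME_FIELDS):
        errors.append("at least one of cDNA, mRNA, Genomic, Protein is required")
    level = record.get("Pathogenicity")
    if not is_blank(level) and level not in PATHOGENICITY_LEVELS:
        errors.append("Pathogenicity must be an integer from 1 to 5")
    return errors


# Example record with invented values.
example = {
    "GeneName": "BRCA1",
    "RefSeqName": "NM_007294",
    "RefSeqVer": "3",
    "cDNA": "c.100A>G",
    "Pathogenicity": 3,
    "PatientID": "P0001",
    "TestID": "T0001",
    "InstanceDate": "2014-03-24",
    "GenomicRefSeq": "NC_000017",
    "GenomicRefSeqVer": "10",
}
problems = validate(example)
```

Returning a list of messages rather than raising on the first problem lets a lab see every issue in a record at once.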

Variant Fields (Optional)

Field | Description | Type | Requirement
PatientAge | Age of patient when test was taken | INT32 | Optional
TestMethod | The name of the test method used | VARCHAR(20) | Optional
SampleTissue | Type of sample taken | VARCHAR(20) | Optional
SampleSource | The source of the sample, e.g. DNA, g.DNA, RNA... | VARCHAR(20) | Optional
Justification | Justification by medical scientist | VARCHAR(65535) | Optional
PubMed | PubMed Identifier/Digital Object Identifier | VARCHAR(255) | Optional
RecordedInDatabase | Whether it is recorded in a disease-specific or gene-specific database | Boolean | Optional
SampleStored | Whether lab still has sample left | Boolean | Optional
VariantSegregatesWithDisease | Whether pedigree was considered during diagnosis of pathogenicity | Boolean | Optional
HistologyStored | Whether histograms are stored | Boolean | Optional
PedigreeAvailable | Whether organisation has pedigree data | Boolean | Optional
SIFTScore | Calculated SIFT Score | INT32 | Optional

Linkage to other datasets

• HVPA have implemented the hash key algorithm and work is in progress with BioGrid to link variation data to clinical data sets.

• More details from Maureen Turner, BioGrid CEO who is speaking at this meeting
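The slides do not describe the hash key algorithm itself. As an illustration of the general idea only, the sketch below derives a linkage key with a keyed hash (HMAC-SHA256) over normalised identifiers, so that two sites holding the same secret produce the same key without exchanging names. The choice of fields, the normalisation rules and the function names are assumptions, not the HVPA or BioGrid method.

```python
import hashlib
import hmac
import unicodedata


def normalise(text):
    """Upper-case, strip accents and drop anything that is not a letter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def linkage_key(surname, given_name, date_of_birth, sex, secret):
    """Return a hex digest that is stable for the same person and secret.

    date_of_birth is expected as an ISO string (YYYY-MM-DD); secret is bytes
    shared between the linking parties (hypothetical arrangement).
    """
    message = "|".join([
        normalise(surname),
        normalise(given_name),
        date_of_birth.strip(),
        sex.strip().upper(),
    ])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET = b"example-secret-not-for-production"
key_a = linkage_key("O'Brien", "Zoë", "1970-01-31", "f", SECRET)
key_b = linkage_key(" o'brien ", "ZOE", "1970-01-31", "F", SECRET)
key_c = linkage_key("O'Brien", "Zoë", "1970-02-01", "F", SECRET)
```

Here `key_a` and `key_b` agree because normalisation removes case, spacing, punctuation and accents, while `key_c` differs because the date of birth differs.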

Cost and performance will force diagnostic labs to adopt NGS as front-line approach

Figure panels: cost per base; Illumina share price

Hype cycle

HVPA LOVD3 database pilot

• Established an HVPA LOVD3 database and working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database.

• Includes exome-scale data

• Open access to Coriell cases with no “consent” issues

• Explore staging of variant “credibility classification” and access

Relationship to Gene Panel Databases? e.g. http://genomics.bio21.unimelb.edu.au/lovd/

Melbourne Genomics Health Alliance


• Clinically led, rather than technology driven

• Fostering ‘end use’ of genomic data

• Common clinical repository

• Prospective : first tier test

• Evaluation to inform implementation

• Engineering collaboration

• Fostering system change

• A/Prof Clara Gaff: Program Leader

PARADIGM FOR IMPLEMENTING GENOMIC MEDICINE


Melbourne Genomics Health Alliance

Connected nationally and internationally

How many variants per exome?

SNP count | Study
20,000 | Choi et al. PNAS 2009
142,000 | Mullikin NIH, unpublished 2010
50,000 | Clark et al. Nature Biotechnology 2011
125,000 | Smith et al. Genome Biology 2011
100,000 | Johnston & Biesecker Human Molecular Genetics 2013
200,000 to 400,000 | Yang et al. N Engl J Med 2013

• 20-fold range

• Exome designs vary

• Likely to be higher variant count in African populations as the reference sequence is non-African

Low concordance of multiple variant-calling pipelines

O’Rawe et al. Genome Medicine 2013, 5:28

• 15 exomes

• 4 families

• HiSeq 2000

• Agilent SureSelect v.2

• ~120X mean coverage

• SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools

• SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%

• 0.5-5.1% variants were called as unique to each pipeline

• Indel concordance was only 26.8% between three indel calling pipelines

• 11% of Complete Genomics (CG) variants that fall within targeted regions in exome sequencing were not called by any of the Illumina-based exome analysis pipelines

• 97.1%, 60.2% and 99.1% of the GATK-only, SOAP-only and shared SNVs can be validated

• 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels can be validated

• Additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family
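To make the concordance figures above concrete, one common way to compute concordance is the number of variants called by every pipeline divided by the number called by at least one. The sketch below uses that definition as an assumption (the paper's exact definition should be checked), with invented call sets keyed by (chromosome, position, ref, alt).

```python
def concordance(call_sets):
    """Fraction of all called variants that every pipeline agrees on."""
    sets = [set(calls) for calls in call_sets]
    union = set().union(*sets)
    if not union:
        raise ValueError("no variants called by any pipeline")
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_to_each(call_sets):
    """For each pipeline, the variants that no other pipeline called."""
    sets = [set(calls) for calls in call_sets]
    result = []
    for i, own in enumerate(sets):
        others = set().union(*(s for j, s in enumerate(sets) if j != i))
        result.append(own - others)
    return result


# Invented variant calls from three hypothetical pipelines.
v1 = ("1", 1000, "A", "G")
v2 = ("2", 2000, "C", "T")
v3 = ("3", 3000, "G", "A")
v4 = ("4", 4000, "T", "C")
pipeline_a = [v1, v2, v3]
pipeline_b = [v1, v2, v4]
pipeline_c = [v1, v2]

shared_fraction = concordance([pipeline_a, pipeline_b, pipeline_c])
private_calls = unique_to_each([pipeline_a, pipeline_b, pipeline_c])
```

In this toy case two of the four distinct variants are called by all three pipelines, so the concordance is 0.5.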

Low concordance of multiple variant-calling pipelines O’Rawe et al. Genome Medicine 2013, 5:28

SNV concordance: 57.4%; indel concordance: 26.8%

Venn diagrams of selected CNV detection methods in real data processing

Duan J, Zhang J-G, Deng H-W, Wang Y-P (2013) Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing Technologies. PLoS ONE 8(3): e59128. doi:10.1371/journal.pone.0059128 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059128





Sequence errors

Post processing errors

Remove errors before processing

K-mer selection

Merging forward and reverse reads

Figure (axis 0 to 1,600) showing the following read sequences:

CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTTTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGTATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGTA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATTAAGAAATCGATAGCATTTGCA
TAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATCTGCA
CAGAAAAAGTAGGAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAGAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA

Discard rare reads

Use a HiFi polymerase
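The "discard rare reads" idea can be sketched as counting identical merged reads and keeping only sequences seen at least a minimum number of times, on the assumption that sequences seen once or twice are more likely to carry polymerase or sequencing errors. The threshold, function name and toy reads below are hypothetical.

```python
from collections import Counter


def discard_rare_reads(reads, min_count=3):
    """Return {sequence: count} for sequences seen at least min_count times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Toy reads: one abundant sequence plus two single-base variants of it.
reads = (
    ["CAGAAAAAGTAG"] * 5   # abundant read
    + ["CAGAAAGAGTAG"]     # seen once, likely an error
    + ["TAGAAAAAGTAG"] * 3  # seen three times, kept at min_count=3
)
kept = discard_rare_reads(reads, min_count=3)
```

A fixed count threshold is the simplest choice; a real pipeline would more likely scale the threshold with sequencing depth.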

Four capture panels at SOD1

• Known SNV concordance 100%, all assays

• Known indel <6bp concordance 100%, all assays

• Not able to detect the C9orf72 hexanucleotide expansion or the PRNP octapeptide region repeat with the standard pipeline

• Diagnostic yield within appropriate clinical context (based on very limited sample size)

- NimbleGen SeqCap EZ Neuro: 33% (2/6)

- Nextera Neuro: 23% (6/26)

Results – detection of variants

Filtering Variants

All variants | None | Qual | Not in Blood
Blood | 9828 | 8551 | NA
Frozen | 9920 | 8736 | 126
FFPE | 9709 | 8163 | 199

Variants in Gene List | None | Qual | Not in Blood
Blood | 27 | 18 | NA
Frozen | 27 | 23 | 2 (EGFR)
FFPE | 25 | 19 | 3 (EGFR, ROS)
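The filter columns in the table above (None, Qual, Not in Blood) suggest a quality filter followed by subtraction of the variants also seen in the matched blood sample. The sketch below illustrates that sequence under stated assumptions: calls are dictionaries mapping (chromosome, position, ref, alt) to a quality score, the threshold of 30 is arbitrary, and all values are invented.

```python
def quality_filter(calls, min_qual):
    """Keep only calls whose quality score is at least min_qual."""
    return {variant: qual for variant, qual in calls.items() if qual >= min_qual}


def not_in_blood(sample_calls, blood_calls):
    """Variants present in the tumour sample but absent from blood."""
    return set(sample_calls) - set(blood_calls)


# Invented calls: variant key -> quality score.
blood = {
    ("7", 1000, "A", "G"): 60,
    ("12", 2000, "C", "T"): 15,
}
ffpe = {
    ("7", 1000, "A", "G"): 55,   # also in blood, so treated as germline
    ("7", 3000, "T", "G"): 70,   # tumour only, passes quality
    ("6", 4000, "G", "A"): 10,   # tumour only, fails quality
}

ffpe_pass = quality_filter(ffpe, min_qual=30)
# Compare against all blood calls, including low-quality ones, so that a
# weakly supported germline variant is not mistaken for a somatic one.
somatic_candidates = not_in_blood(ffpe_pass, blood)
```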

EGFR p.L858R

EGFR p.T790M

Confirmation by PCR

Figure: EGFR normalised (axis 0 to 250). Assays, EGFR NM_005228.3:

T790 wild type
c.2350T>C, p.S784P
c.2351C>T, p.S784F
c.2354C>T, p.T785I
c.2356G>A, p.V786M
c.2368A>G, p.T790A
c.2369C>T, p.T790M
828 & 861 wild type
c.2572C>A, p.L858M
c.2573_2574delinsGT
c.2573T>A, p.L858Q
c.2573T>G, p.L858R
c.2579A>T, p.K860I
c.2582T>A, p.L861Q
c.2582T>G, p.L861R

Figure: KRAS normalised (axis 0 to 2.0). Assays, KRAS NM_033360.2:

c.34G>A, p.G12S
c.34G>C, p.G12R
c.34G>T, p.G12C
c.35G>A, p.G12D
c.35G>C, p.G12A
c.35G>T, p.G12V
c.37G>A, p.G13S
c.37G>C, p.G13R
c.37G>T, p.G13C
c.38G>A, p.G13D
c.38G>C, p.G13A
c.38G>T, p.G13V

Auto Upload Database of Results in LOVD

Local LOVD instances sharable via HVPA

• Coriell pedigree comparison

• Subset of 19 genes – targeted by all four assays

• Variant allele frequency cut-off of 35% (interested in germline variants)
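The 35% variant allele frequency cut-off can be sketched as a simple filter on read counts. The field names (`alt_reads`, `total_reads`) and the example calls are hypothetical; only the 35% threshold comes from the slide.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def likely_germline(calls, min_vaf=0.35):
    """Keep calls whose variant allele frequency is at least min_vaf."""
    return [
        call for call in calls
        if variant_allele_frequency(call["alt_reads"], call["total_reads"]) >= min_vaf
    ]


# Invented calls for illustration.
calls = [
    {"id": "het_like", "alt_reads": 48, "total_reads": 100},   # about 50%
    {"id": "hom_like", "alt_reads": 99, "total_reads": 100},   # about 100%
    {"id": "low_level", "alt_reads": 12, "total_reads": 100},  # below cut-off
]
kept_calls = likely_germline(calls)
```

Heterozygous germline variants are expected near 50% and homozygous near 100%, so a 35% floor removes most low-level artefacts while tolerating some allelic imbalance.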

Results – detection of variants

Values are given as Mother / Father / Child.

Y077 | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 194 / 241 / 196 | 16 / 22 / 20 | 4 / 5 / 7 | 2 / 3 / 4
Nextera Neuro | 250 / 296 / 283 | 17 / 23 / 22 | 4 / 6 / 7 | 2 / 3 / 4
TruSight One | 121 / 137 / 119 | 16 / 23 / 20 | 3 / 6 / 6 | 1 / 3 / 3
Nextera Exome | 101 / 118 / 114 | 16 / 22 / 22 | 4 / 5 / 7 | 2 / 2 / 4

Y117 | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 279 / 245 / 263 | 20 / 20 / 20 | 4 / 5 / 6 | 3 / 2 / 4
Nextera Neuro | 382 / 371 / 342 | 20 / 21 / 21 | 5 / 5 / 6 | 3 / 2 / 4
TruSight One | 148 / 154 / 148 | 18 / 18 / 17 | 4 / 4 / 5 | 3 / 2 / 3
Nextera Exome | 121 / 67 / 66 | 19 / 15 / 16 | 5 / 3 / 4 | 3 / 1 / 3

Example case showing concordance

KEY: exome, nimbleneuro, nextneuro, trusight1

Gene | Variant | Chr | Coordinate | Zyg || Gene | Variant | Chr | Coordinate | Zyg
APOE | T>T/C | 19 | 45411941 | het || NPC1 | T>T/C | 18 | 21120444 | het
APOE | T>T/C | 19 | 45411941 | het || NPC1 | T>T/C | 18 | 21120444 | het
APOE | T>T/C | 19 | 45411941 | het || NPC1 | T>T/C | 18 | 21120444 | het
APOE | T>T/C | 19 | 45411941 | het || NPC1 | T>T/C | 18 | 21120444 | het
APOE | C>C/T | 19 | 45412040 | het || NPC1 | TA>TA/T | 18 | 21123536 | het
APOE | C>C/T | 19 | 45412040 | het || NPC1 | TAA>TAA/T | 18 | 21123536 | het
ATP7B | G>G/A | 13 | 52511606 | het || NPC1 | C>G/G | 18 | 21124945 | hom
ATP7B | A>A/G | 13 | 52515354 | het || PARK2 | G>G/C | 6 | 162622239 | het
ATP7B | C>C/T | 13 | 52523808 | het || PINK1 | A>A/G | 1 | 20964328 | het
ATP7B | T>T/C | 13 | 52524488 | het || PINK1 | G>G/A | 1 | 20972048 | het
LRRK2 | G>A/A | 12 | 40619082 | hom || PINK1 | G>G/A | 1 | 20975727 | het
LRRK2 | G>A/A | 12 | 40619082 | hom || PINK1 | G>G/A | 1 | 20975727 | het
LRRK2 | G>A/A | 12 | 40619082 | hom || PINK1 | A>A/C | 1 | 20977000 | het
LRRK2 | G>A/A | 12 | 40619082 | hom || PINK1 | A>A/C | 1 | 20977000 | het
LRRK2 | C>C/G | 12 | 40657700 | het || PINK1 | A>A/C | 1 | 20977000 | het
LRRK2 | C>C/G | 12 | 40657700 | het || PINK1 | A>A/C | 1 | 20977000 | het
LRRK2 | C>C/G | 12 | 40657700 | het || PSEN2 | G>G/A | 1 | 227071449 | het
LRRK2 | T>T/A | 12 | 40713901 | het || PSEN2 | G>G/A | 1 | 227071449 | het
LRRK2 | T>T/C | 12 | 40758652 | het || VCP | C>T/T | 9 | 35062972 | hom
NPC1 | G>G/A | 18 | 21119777 | het || VCP | A>A/G | 9 | 35068364 | het

Describing Coverage

Metric | MiSeq (12plex) | HiSeq (48plex)
% target regions with non-zero depth | 99.54% | 99.90%
% target regions >= 5x | 98.25% | 99.71%
% target regions >= 15x | 94.80% | 99.34%
% target regions >= 30x | 89.10% | 98.85%
% target regions >= 50x | 81.56% | 98.17%
Average depth of coverage | 180.76 | 920.84
5th centile | 19.08 | 126.75
20th centile | 67.42 | 408.83
50th centile | 160.42 | 871.17
95th centile | 414.00 | 1879.92

Mapping quality >= 15; base quality score >= 15
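The coverage metrics above can be reproduced from a list of per-target depths. The sketch below assumes integer depths, one per target region, already restricted to reads passing the mapping and base quality thresholds; the function names and the nearest-rank percentile method are choices made for illustration, not necessarily how the reported figures were produced.

```python
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an integer 1-100)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # ceiling division
    return sorted_values[rank - 1]


def coverage_summary(depths, thresholds=(5, 15, 30, 50), centiles=(5, 20, 50, 95)):
    """Summarise per-target depths in the same terms as the table."""
    n = len(depths)
    ordered = sorted(depths)
    summary = {"nonzero_pct": 100.0 * sum(d > 0 for d in depths) / n}
    for t in thresholds:
        summary[f"ge_{t}x_pct"] = 100.0 * sum(d >= t for d in depths) / n
    summary["mean_depth"] = statistics.fmean(depths)
    for c in centiles:
        summary[f"centile_{c}"] = percentile(ordered, c)
    return summary


# Invented depths for ten target regions.
depths = [0, 4, 10, 20, 40, 60, 80, 100, 200, 486]
report = coverage_summary(depths)
```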

Coverage reproducibility

Figure axes: coverage; coefficient of variation

Higher coverage, greater reproducibility
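The coefficient of variation plotted here is the standard deviation divided by the mean. A minimal sketch, assuming each target's depth has been measured across several samples and using the sample standard deviation (the slide does not say which estimator was used); the depths are invented.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Invented depths for one target across three samples.
high_coverage_target = [90, 100, 110]   # mean 100, SD 10
low_coverage_target = [5, 10, 15]       # mean 10, SD 5

cv_high = coefficient_of_variation(high_coverage_target)
cv_low = coefficient_of_variation(low_coverage_target)
```

In this toy example the low-coverage target has the larger coefficient of variation, which is the pattern the slide describes.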

Can capture coverage report dosage to diagnostic standards?

Figure axes: samples vs. targets; samples vs. autosomal targets and chrX targets

Inter-sample variation is low, but low coverage prevents dosage estimation

Chr X is a good first pass test for dosage

XX vs. XY

8 Female cases and 16 Male cases showing reproducibility of coverage of X loci within each group. Loci with higher SDs were associated with reduced coverage.
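The XX vs. XY comparison works as a dosage check because chrX targets should show roughly half the normalised coverage in XY samples. The sketch below is one simple way to express that, assuming per-sample lists of chrX and autosomal target depths; the normalisation by autosomal mean, the function names and the numbers are all illustrative assumptions.

```python
import statistics


def x_to_autosome_ratio(x_depths, autosomal_depths):
    """Mean chrX target depth relative to mean autosomal target depth."""
    return statistics.fmean(x_depths) / statistics.fmean(autosomal_depths)


def estimated_x_copies(ratio, xx_reference_ratio):
    """Scale a sample's ratio so that the XX reference corresponds to 2 copies."""
    return round(2 * ratio / xx_reference_ratio)


# Invented depths for one XX and one XY sample.
autosomal = [100, 120, 80]
x_in_xx_sample = [95, 105, 100]
x_in_xy_sample = [50, 48, 52]

xx_ratio = x_to_autosome_ratio(x_in_xx_sample, autosomal)
xy_ratio = x_to_autosome_ratio(x_in_xy_sample, autosomal)
xy_copies = estimated_x_copies(xy_ratio, xx_reference_ratio=xx_ratio)
```

The same ratio-to-reference calculation could in principle be applied per target to look for deletions or duplications, which is why chrX makes a convenient first-pass test.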

Figure: average XX and average XY coverage across X loci, two panels (x-axis 0 to 80; y-axes 0 to 2 and -0.5 to 3), labelled 870 and 160

Sharing Experience with TruSight One

• In partnership with Illumina, RCPA and the HGSA, Kim Flintoff (Wellington Regional Genetics Laboratory) is leading an evaluation of exon sequencing using Illumina’s TruSight One panel. Two Coriell family trios will be sequenced by New Zealand Genomics Limited and the data will be shared on an HVPA database.

• The VCF file will be available on the HVPA LOVD database and performance stats will also be made available.

Next Steps

• Robust standards for genomic medicine

• Databases and data content

– Access to identified and de-identified data (consent and confidentiality)

– Database accreditation process in prep with RCPA

– Defining the performance of various aligners, variant callers and annotation programs

– Clinical grade Variant Call Format (VCF)

– Metafile covering data trail: what was tested, what was not tested

Standards for Accreditation of DNA Sequence Variation Databases

Quality Use of Pathology Program (QUPP): a national project for the Development of Standards for Accreditation of DNA Sequence Variation Databases has been jointly initiated by the Royal College of Pathologists of Australasia (RCPA) and the Human Variome Project (HVP).

Background

• There is a rapidly increasing volume, spectrum, and complexity of genetic tests emerging within diagnostic pathology laboratories. In particular, high throughput sequencing methods such as targeted panel, exome (WES), and whole genome sequencing (WGS) are producing an increasing quantity of genetic data requiring analysis and interpretation, forming a substantial proportion of the workload.

• Currently, there is a plethora of online mutation databases to refer to; however, there is a distinct lack of databases that meet the stringent accuracy and reproducibility standards that the clinical diagnostic environment demands. Additionally, the current databases are “fractured”, with varied access to and sharing of the data within them, and variable quality due to errors and inaccurate data posting, all of which is a clear risk to the quality of patient care. With more widespread, secure sharing of variants and associated phenotypes, the value of cumulative variant information will accelerate the delivery of accurate, actionable, and efficient clinical reports.

• There are currently no standards or equivalent mechanisms for accreditation of databases to ensure the accuracy and quality of uploaded data into any central repository to meet the needs of the clinical diagnostics environment.

Data quality classes

Differentiate between three classes of data:

• Clinically Reported: the class of data that the HVP Australian Node was originally designed to collect and share, i.e. data that has been generated in a NATA accredited Australian diagnostic laboratory and is able to be included in a clinical report.

• Unreported Clinical quality: data that has been generated in a NATA accredited diagnostic laboratory, but is not capable of being included in a clinical report. This class would consist primarily of next-generation sequencing (NGS) type data.

• Unaccredited: data that has been generated by an Australian laboratory that has not been NATA accredited.

A new filtering option would be made available to allow users to view only data of a certain class.
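The three data classes and the proposed filtering option can be sketched as an enumeration plus a filter. The `DataClass` enum, the `classify` rule and the record layout are hypothetical; they simply restate the definitions above in code.

```python
from enum import Enum


class DataClass(Enum):
    CLINICALLY_REPORTED = "Clinically Reported"
    UNREPORTED_CLINICAL_QUALITY = "Unreported Clinical quality"
    UNACCREDITED = "Unaccredited"


def classify(nata_accredited, reportable):
    """Assign a class from lab accreditation and clinical reportability."""
    if not nata_accredited:
        return DataClass.UNACCREDITED
    if reportable:
        return DataClass.CLINICALLY_REPORTED
    return DataClass.UNREPORTED_CLINICAL_QUALITY


def filter_by_class(records, wanted):
    """Return only the records whose class is in the wanted set."""
    return [r for r in records if r["data_class"] in wanted]


# Invented records for illustration.
records = [
    {"variant": "v1", "data_class": classify(True, True)},
    {"variant": "v2", "data_class": classify(True, False)},   # e.g. NGS data
    {"variant": "v3", "data_class": classify(False, False)},
]
clinical_only = filter_by_class(records, {DataClass.CLINICALLY_REPORTED})
```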

Beyond the NeCTAR funding

• Academic or charitable funding required

• Integrate NGS data resource into the HVPA portfolio

• Move database development into a medical academic centre of excellence

• Seek active partnerships with current and future collaborators with investment and risk sharing
