2015 Tricon – Clinical Grade Annotations – Public Data Resources for Interpreting Genomic Variants

Posted on 15-Jul-2015

Clinical Grade Annotations: Public Data Resources for Interpreting Genomic Variants

February 19, 2015

Gabe Rudy

@gabeinformatics

VP Product Management and Engineering

Golden Helix

My Background

Golden Helix

- Founded in 1998

- Genetic association software

- Analytic services

- Thousands of users worldwide

- Over 800 customer citations in journals

Products I Build with My Team

- SNP & Variation Suite (SVS)

- SNP, CNV, NGS tertiary analysis

- Import and deal with all flavors of upstream data

- VarSeq

- Annotate and filter variants in gene panels, exomes and genomes for clinical labs and researchers.

- GenomeBrowse (Free!)

- Visualization of everything with genomic coordinates, in all standardized file formats.

Agenda

1. How I Met My Exomes

2. Getting High Quality Variant Calls

3. Clinical Grade Candidate Variant Identification

4. Data Sharing and the Maturing of Public Resources

5. NGS Clinical Utopia: Are We There Yet?

Exome Sequencing in Consumer Genomics

Exomes done as part of a pilot program

80x coverage

Raw data with no interpretation

Erin

JIA

Gabe (me)

Ethan

Research or clinical grade?

Total reads: 140M

Uniquely aligned: 87%

Mean target coverage: 105x

% Target at 2x: 97%

% Target at 10x: 94%

% Target at 20x: 89%

% Target at 30x: 83%
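
The "% Target at Nx" rows are the share of targeted bases covered by at least N reads. A minimal sketch of that arithmetic, assuming per-base read depths over the target regions have already been produced by some coverage tool (the helper name `coverage_summary` and the toy depths are hypothetical):

```python
from typing import Dict, Iterable, Sequence


def coverage_summary(depths: Sequence[int],
                     thresholds: Iterable[int] = (2, 10, 20, 30)) -> Dict[str, float]:
    """Summarize per-base read depths over targeted bases (hypothetical helper)."""
    n = len(depths)
    if n == 0:
        raise ValueError("no targeted bases")
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        # Percentage of targeted bases covered by at least t reads
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_at_{t}x"] = 100.0 * covered / n
    return summary


if __name__ == "__main__":
    # Toy depths for ten targeted bases (made-up values, not from the slide)
    toy = [0, 1, 5, 12, 25, 40, 100, 3, 30, 15]
    print(coverage_summary(toy))
```

Reporting both the mean and the threshold percentages matters because a high mean can hide a tail of poorly covered bases.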

Agenda: 2. Getting High Quality Variant Calls

PSPH mis-alignment

Splice Mutation

GRCh38 – Here Now, but no hurry

A better human reference

- Revised Cambridge Reference Sequence (rCRS) MT

- Has centromere models

- ~2000 incorrect alleles fixed

- ~100 assembly gaps updated

NCBI Annotation Release 106 on GRCh38

- dbSNP 141, ClinVar, RefSeqGene

- Ensembl 76 on both

No Population Catalogs

- Some being ported (by Ensembl, dbSNP)

Ts/Tv: GRCh37 2.06558, GRCh38 2.10171

[Chart: My Exome, variant counts by type (snps, mnps, indels, complex) on GRCh37 vs GRCh38; totals 331,824 and 319,442]

Blog Post
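
The Ts/Tv row is the ratio of transition to transversion SNVs, a common sanity check on a call set. A sketch of how it is computed from (REF, ALT) pairs, for example pulled from a VCF; the function names are hypothetical, and records that are not single-base A/C/G/T substitutions are simply skipped:

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """A<->G and C<->T substitutions are transitions; all others are transversions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ts/Tv over (REF, ALT) pairs; non-SNV or non-ACGT records are skipped."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, MNPs, 'N' bases and no-change records
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; Ts/Tv is undefined")
    return ts / tv
```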

Agenda: 3. Clinical Grade Candidate Variant Identification

Baylor Workflow - Clinical Exomes Paper

Disease gene related

Medically actionable deleterious variants

Deleterious variants in ACMG gene list

Deleterious variants

VUS in dominant gene or homozygous in recessive gene

Deleterious variant in gene with no known disease

Annotate, Then Filter and Interpret

Data Sources to Replicate Workflow

1000 Genomes (Phase 1)

“ESP” (NHLBI 6500 Exomes v2)

HGMD (Public vs Professional)

Variant’s Protein Coding Effect

RNA Splicing Effect (dbscSNV)

- Covers −3 to +8 at the 5’ splice site and −12 to +2 at the 3’ splice site

Genes Lists:

- Single-Gene Disorder (OMIM with Inheritance)

- Medically Actionable (114 genes NHLBI study)

- Dominant Inheritance (MedGen)

- ACMG Carrier Panel (ACMG Incidental Findings guidelines)
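
The dbscSNV window is a simple positional rule. A sketch of the check, assuming offsets are counted from the exon/intron junction as described in the docstring; that offset convention and the function name are assumptions for illustration, not dbscSNV's own API:

```python
def in_dbscsnv_window(site: str, offset: int) -> bool:
    """Is a variant inside the splice consensus window covered by dbscSNV?

    Offsets are signed distances from the exon/intron junction in transcript
    orientation (0 is not used): at the 5' (donor) site -1 is the last exonic
    base and +1 the first intronic base; at the 3' (acceptor) site -1 is the
    last intronic base and +1 the first exonic base.
    """
    if offset == 0:
        raise ValueError("offset 0 is not used in this convention")
    if site == "5'":
        return -3 <= offset <= 8
    if site == "3'":
        return -12 <= offset <= 2
    raise ValueError(f"unknown splice site {site!r}; expected \"5'\" or \"3'\"")
```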

My Exome Analyzed

Start: 235,689

[Flowchart of filtering steps; counts in order: 847; 234,842; 224,914; 9,928; 9,069; 807; 859; 40; 242; 13; 59; 565; 0; 624; 624; 255; 20; 20; 20; 0; 0; 598; 644]

• Pathogenic OTC Variant

• What if I got this through BabySeq?
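
The annotate-then-filter cascade can be sketched as a small triage function. This is a simplified illustration, not the Baylor pipeline or the VarSeq implementation: the `Variant` fields, effect labels, allele-frequency cutoff and gene-list names are all hypothetical, and real workflows add inheritance-aware rules (such as VUS in dominant genes vs. homozygous variants in recessive genes) that are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical effect labels treated as "deleterious" in this sketch
DELETERIOUS_EFFECTS = {"stop_gained", "frameshift", "splice_site", "missense"}


@dataclass
class Variant:
    gene: str
    effect: str                  # predicted protein-coding effect (hypothetical labels)
    max_pop_af: Optional[float]  # highest AF across 1000 Genomes / ESP; None if absent
    in_hgmd: bool = False        # reported in HGMD


def triage(variants: List[Variant], af_cutoff: float,
           gene_lists: Dict[str, Set[str]]) -> Dict[str, List[Variant]]:
    """Keep rare, likely deleterious variants and bin them by the first matching gene list."""
    bins: Dict[str, List[Variant]] = {name: [] for name in gene_lists}
    bins["no_known_disease"] = []
    for v in variants:
        # 1. Drop common variants; absence from the catalogs counts as rare
        if v.max_pop_af is not None and v.max_pop_af > af_cutoff:
            continue
        # 2. Keep variants reported in HGMD or with a deleterious predicted effect
        if not (v.in_hgmd or v.effect in DELETERIOUS_EFFECTS):
            continue
        # 3. Bin by gene list, checked in priority order (dict insertion order)
        for name, genes in gene_lists.items():
            if v.gene in genes:
                bins[name].append(v)
                break
        else:
            bins["no_known_disease"].append(v)
    return bins
```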

Agenda: 4. Data Sharing and the Maturing of Public Resources

Annotating against Transcripts

RefSeqGenes – Versioned on RNA sequence

- Annotated against human reference by “Annotation Releases”

- Last on GRCh37 was 105 (2013-08-20) – GRCh38 release 106 (2014-01-17)

- 84,950 transcripts; most are “predicted” (“XM_”) or non-coding

- Standard in the US for reporting variation (e.g. NM_016335.4:c.123C>T)

- UCSC grabs RNA from RefSeq directly and maps it to their genome references “continuously”

Ensembl – Versioned on Alignment

- GENCODE: Well curated subset of high-quality, validated transcripts

- V75 last version of GRCh37, 2014-06-27

- Many specific biotypes, but protein_coding is used for annotation

- Has mappings to RefSeq IDs, but
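
Because RefSeq versions the RNA sequence, a reporting string such as NM_016335.4:c.123C>T carries the accession and its version as separate parts. A minimal regular-expression sketch for only the simplest case, a single-base substitution at an exonic c. position; it is not a general HGVS parser, and the names are hypothetical:

```python
import re
from typing import NamedTuple


class CodingSubstitution(NamedTuple):
    accession: str
    version: int
    position: int
    ref: str
    alt: str


# Only plain exonic substitutions such as "NM_016335.4:c.123C>T".
# Intronic offsets (c.123+1G>A), UTR positions (c.-12, c.*5), indels and
# protein-level (p.) notation are deliberately not handled.
_CSUB = re.compile(
    r"^(?P<acc>[NX]M_\d+)\.(?P<ver>\d+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(hgvs: str) -> CodingSubstitution:
    m = _CSUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs}")
    return CodingSubstitution(m["acc"], int(m["ver"]), int(m["pos"]), m["ref"], m["alt"])
```

The parsed accession.version still says nothing about where the transcript sits on the genome; that depends on the alignment used.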

Reference Sequence Versus Gene Sequence

EMG1 on GRCh37

“Gap” of the mRNA coding sequence versus reference seq:

Handled differently by 3 different “gene alignments”

Reference Sequence Versus Gene Sequence

EMG1 on GRCh38

Reference sequence patched, no gap

Alignments agree

RefSeq Accession Not Sufficient for Var-Tx Interaction

RefSeq defines transcripts as mRNA sequence

NCBI “Annotation Releases” (like v105) provides alignments using “Splign”

UCSC pulls RefSeq mRNA and aligns themselves using “BLAT”

They can choose equally valid but different alignments for the same accession

This alignment of NM_052814.3 places the exon at dramatically different loci.

This results in different annotations for any variant overlapping these exons
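
To see why the accession alone is not enough, consider mapping a genomic position onto a transcript through its exon coordinates. The sketch below is a simplified model (plus strand only, transcript-relative rather than CDS-relative c. numbering, no gaps inside exons), and the made-up coordinates stand in for two alignments of one accession, such as a Splign-based and a BLAT-based one:

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive genomic start and end


def genomic_to_transcript(pos: int, exons: List[Exon]) -> Optional[int]:
    """Map a genomic position to a 1-based transcript position.

    Plus-strand transcripts only; returns None for intronic or flanking positions.
    """
    offset = 0
    for start, end in sorted(exons):
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    return None


# Two hypothetical alignments of the same accession.version that disagree
# on where the second exon lands (coordinates are made up).
alignment_a = [(1000, 1099), (2000, 2099)]
alignment_b = [(1000, 1099), (5000, 5099)]
```

With `alignment_a`, genomic position 2050 is transcript base 151; with `alignment_b` the same position is intronic, and transcript base 151 sits at 5050 instead, so any variant there is annotated differently.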

COSMIC

Does not provide data in an easy-to-use form for NGS

Just announced a change in licensing, effective in March:

- Access to the COSMIC website will stay free for all users.

- The new licensing strategy will charge for-profit organisations to download COSMIC datasets.

- Download by academic and non-profit organisations will remain free.

2015 Roadmap:

- GRCh38

- More curation

- Visualization improvements

ClinVar

Submitters:

- OMIM: Johns Hopkins

- Samuels

- Lab for Molecular Medicine

- Invitae

- Emory Genetics Lab

Star rating system

- 0-4 stars – level of review

ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.
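
The star rating can serve as a minimum review-level filter. In this sketch the review-status strings and their star values follow ClinVar's current documentation; the labels in the 2015 releases differed somewhat, so treat the mapping as an assumption to check against the release in use. The record dicts with a `review_status` key are hypothetical:

```python
from typing import Dict, List

# Assumed mapping of ClinVar review status to stars (labels have changed
# over time; verify against the release you annotate with).
REVIEW_STATUS_STARS: Dict[str, int] = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting interpretations": 1,
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
}


def stars(review_status: str) -> int:
    """Star level for a review status; unknown statuses count as 0 stars."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def filter_by_stars(records: List[Dict[str, str]], min_stars: int) -> List[Dict[str, str]]:
    """Keep records whose review status reaches at least `min_stars`."""
    return [r for r in records if stars(r["review_status"]) >= min_stars]
```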

ClinVitae: ClinVar and Friends by Invitae

Sources:

- ClinVar (62,913)

- Emory (13,365)

- ARUP (2,850)

- Carver Mut (199)

- K Cunningham (581)

79,907 variants, 9,189 genes

- 32,523 Pathogenic

- 38,796 Likely Pathogenic

Provided in HGVS

- 59,878 after mapping to genomic space

BRCA: The back door to Myriad’s database

1995 – Patent issued to Myriad Genetics

June 2013 – Patents invalidated by ruling

Labs setting up Dx have a lot of catching up to do

“Free the Data” and other ways in which Myriad’s data is in ClinVar, etc.

Sharing Clinical Reports Project

BRCA: In my wife

HGMD

Data-mines academic papers for reported functional variants

Also takes submissions; corrections are reviewed by the team

First available in 1996

- Originally 10k variants

- 105k in Public (2014)

- 148k in “Pro” (2014)

Left-Align Delta F508 to Make it Match

Left-Align Annotations

Using a Smith-Waterman algorithm to left-align variants from public databases shows non-obvious differences

NGS alignment and variant calling always produce left-aligned variants

Left-align your databases so they can be annotated
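
Left-alignment can be illustrated with the simple trim-and-extend normalization used by tools such as vt normalize; note this is a different technique from the Smith-Waterman approach mentioned on the slide. A sketch, assuming the reference is a plain Python string, positions are 0-based, and indels keep a VCF-style anchor base (the function name is hypothetical):

```python
from typing import Tuple


def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-align and trim a variant against a reference string.

    `pos` is the 0-based index in `ref_seq` where `ref` starts. Returns the
    normalized (pos, ref, alt), keeping an anchor base for indels as in VCF.
    """
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    while True:
        changed = False
        # Drop a shared trailing base
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
        if not changed:
            break
    # Drop shared leading bases while both alleles stay non-empty
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, on the toy reference "TACACACG", the "CA" deletion written as `left_align("TACACACG", 3, "ACA", "A")` returns `(0, "TAC", "T")`; two database entries describing the same deletion only match once both are normalized this way.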

Changes in Monthly Updates

• 36 variants went missing from the December to January release

• Some were Pathogenic
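
One way to catch disappearing records like these is to diff consecutive releases on a normalized variant key. A sketch, assuming each release has already been loaded into a dict from (chrom, pos, ref, alt) to clinical significance; the loading step, the key choice and the function name are assumptions:

```python
from typing import Dict, Tuple

# (chrom, pos, ref, alt); alleles should be normalized (left-aligned) first
VariantKey = Tuple[str, int, str, str]


def dropped_between_releases(old: Dict[VariantKey, str],
                             new: Dict[VariantKey, str]) -> Dict[VariantKey, str]:
    """Variants present in the old release but missing from the new one,
    mapped to their clinical significance in the old release."""
    return {key: sig for key, sig in old.items() if key not in new}
```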

ClinVar’s VCF File

• ClinVar currently relies on its dbSNP identifier mappings to “build” VCF files

• There are ~14,000 small variants in its database without dbSNP identifiers, and thus missing from the VCF

• ~5K Pathogenic

• Often these variants are in newer dbSNP builds, and the ClinVar mappings are just not updated.

• This variant was in ClinVar, with genomic coordinates, but no RSID:

- HGVS(c.): NM_002894.2:c.298C>T

- Chromosome:Start:Stop: 18:20548818:20548818

- (The RSID was recently added)
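
When building a VCF from database records yourself, a missing rsID does not have to mean a missing record: the VCF ID column accepts "." as a missing value. A minimal sketch of formatting one data line; the helper name and the example alleles are hypothetical:

```python
from typing import Optional


def vcf_data_line(chrom: str, pos: int, ref: str, alt: str,
                  rsid: Optional[str] = None, info: str = ".") -> str:
    """Format one VCF data line (CHROM POS ID REF ALT QUAL FILTER INFO).

    A missing rsID is written as "." so the record is kept rather than dropped.
    """
    fields = [chrom, str(pos), rsid or ".", ref, alt, ".", ".", info]
    return "\t".join(fields)
```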

dbSNP 141 Had Allele Errors

I reported the issue on 7/22/2014

Confirmed on 8/12; a better VCF was generated and placed in a “test” folder

Found more issues

Replaced the official VCF on 02/09/2014

We waited until it was fixed to publish official support

Agenda: 5. NGS Clinical Utopia: Are We There Yet?

NM_002626.4:c.1877G>C in PFKL

NP_002617.3:p.Arg626Pro missense mutation

Predicted damaging by 4/5 functional predictions

VEST3: 0.948, GERP++: 4.59

ExAC and 1kG have a G>A, but G>C is novel

Variants in region are extremely rare (G>C ExAC 4 of 122,364 alleles) – 0.003%

No ClinVar variants for gene

OMIM entry has no known disease association

PubMed search shows few articles; the most recent, a 1998 paper, showed:

- Phosphofructokinase (PFKL) is overexpressed in Down syndrome (DS)

- Transgenic PFKL mice had abnormal glucose metabolism, with a reduced clearance rate from blood and an enhanced metabolic rate in the brain.
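
The 0.003% figure on this slide is an allele frequency: allele count divided by the number of called alleles (the AC and AN values ExAC reports). A quick sketch of the arithmetic; the helper name is hypothetical:

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency = AC / AN, where AN is the number of called alleles."""
    if allele_number <= 0:
        raise ValueError("allele number must be positive")
    return allele_count / allele_number


af = allele_frequency(4, 122_364)
print(f"AF = {af:.2e} ({af * 100:.3f}%)")  # prints: AF = 3.27e-05 (0.003%)
```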

35 LoF Variants, None Homozygous

Training

Most variants are rare or novel

- Training to interpret these is extensive

MD/Pathology background is insufficient

Need a PhD in molecular genetics

There have been only 500 board-certified Clinical Molecular Geneticists since certification started

Let’s share in the learning process

Baylor Exome Sign-Out

Phenotyping and Matchmaking Portals

Diagnosis often requires finding another family to confirm a novel gene-to-phenotype association

Finding a second family:

- Social media

- PhenoDB

- PhenomeCentral.org

- Orphanet – Resources on over 6000 rare diseases and orphan drugs.

- European-centric: GEN2PHEN (G2P)

Matt Might found a second family with NGLY1 deficiency through a blog post that went viral.

N-Glycanase Deficiency

http://www.ngly1.org/

Matthew Might and Matt Wilsey. The shifting model in clinical diagnostics: how next-generation sequencing and families are altering the way rare diseases are discovered, studied, and treated. Genetics in Medicine. March 2014.

Thank you

Heidi Rehm – Chief Laboratory Director at Laboratory for Molecular Medicine, PCPGM

Joel Parker – Cancer Genetics, UNC Chapel Hill

Gerry Higgins – VP, Pharmacogenomic Science, Assure Rx Health

Frank Schacherer – Chief Technical Officer, BIOBASE

Reece Hart – Computational Biologist, Invitae (now 23andMe)

Greta Linse Peterson – Director of Product Management and Quality, Golden Helix

Questions?
