Data Translator: an Open Science Data Platform
for Mechanistic Disease Discovery
Melissa Haendel, PhD
@ontowonka
Prevailing clinical genomic pipelines
leverage only a tiny fraction of the available
data
Under-utilized data
Loss of discriminatory power
More species = more coverage
Number of human protein-coding genes in ExAC DB as per Lek et al. Nature 2016: 19,008
9,739 of 19,008 (51%)
14,779 of 19,008 (78%)
Even inclusion of just four species boosts phenotypic coverage of genes by 38% (51% → 89%)
Combined = 89%: 2,195 + 7,544 + 7,235 = 16,974 of 19,008 (union of coverage in any species)
Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
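The percentages on this slide follow from simple arithmetic on the gene counts. The sketch below reproduces them, assuming the three counts 2,195, 7,544, and 7,235 are disjoint groups whose sum is the union. The group names and that reading of the figure are assumptions, not something the slide states.

```python
# Gene counts are taken from the slide; the group names are illustrative assumptions.
TOTAL_GENES = 19_008  # human protein-coding genes in ExAC (Lek et al. 2016)
GROUP_A, GROUP_B, GROUP_C = 2_195, 7_544, 7_235  # assumed to be disjoint groups


def percent(count: int, total: int = TOTAL_GENES) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


first_set = GROUP_A + GROUP_B          # 9,739 genes
second_set = GROUP_B + GROUP_C         # 14,779 genes
union = GROUP_A + GROUP_B + GROUP_C    # 16,974 genes

print(percent(first_set), percent(second_set), percent(union))
```

Under these assumptions the gain from 51% to 89% is 38 percentage points, which matches the figure quoted on the slide.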
Polydactyly
Triphalangeal thumb
Extra thumb bone
https://radiopaedia.org/cases/triphalangeal-thumb-in-fanconi-anemia
Pajni-Underwood, 2007, http://dev.biologists.org/content/134/12/2359
Different communities use different languages
Challenge: Each data source uses its own vocabulary/ontology
Vocabularies/ontologies: MP, HP, ZFA, DPO, WPO, VT, FYPO, APO, SNOMED, …
Data sources: MGI, HPOA, WB, PB, FB, OMIA, RGD, ZFIN, SGD, IMPC, OMIM…, QTLdb, EHR
Can we help machines understand
phenotypes?
“Triphalangeal thumb”
Human phenotype
I have absolutely no idea what that means
Decomposition of complex concepts allows interoperability
“Triphalangeal thumb” (Human phenotype) and “Polydactyly” (Mouse phenotype) decompose into:
PATO: duplicated
Uberon: Phalanx of manual digit; Autopod
GO: embryonic skeletal system morphogenesis
Species neutral ontologies, homologous concepts
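To illustrate why decomposition helps machines compare phenotypes across species, the sketch below represents each phenotype as a set of (ontology, term) components and measures their overlap. The assignment of components to each phenotype is a hypothetical reading of the slide, and Jaccard overlap is a simple stand-in chosen for illustration, not the similarity method the platform is stated to use.

```python
from typing import Dict, FrozenSet, Tuple

Component = Tuple[str, str]  # (ontology, term label)

# Hypothetical decompositions: the slide names these terms but does not
# spell out which component belongs to which phenotype.
DECOMPOSITIONS: Dict[str, FrozenSet[Component]] = {
    "Triphalangeal thumb": frozenset({
        ("PATO", "duplicated"),
        ("Uberon", "phalanx of manual digit"),
        ("GO", "embryonic skeletal system morphogenesis"),
    }),
    "Polydactyly": frozenset({
        ("PATO", "duplicated"),
        ("Uberon", "autopod"),
        ("GO", "embryonic skeletal system morphogenesis"),
    }),
}


def jaccard(a: FrozenSet[Component], b: FrozenSet[Component]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


human = DECOMPOSITIONS["Triphalangeal thumb"]
mouse = DECOMPOSITIONS["Polydactyly"]
shared = human & mouse
score = jaccard(human, mouse)
print(sorted(shared), score)
```

The two free-text labels share no words, yet their decomposed forms share components, which is the interoperability the slide describes.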
Example case solved by Exomiser
Phenotypic profile
Genes: Heterozygous, missense mutation, STIM-1 | N/A | Stim1Sax/Sax
Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data,
in the absence of traditional data sources
https://exomiser.github.io/Exomiser/
bit.ly/stim1paper
In Genomics England 100K Genomes, of first 1936 diagnosed patients, 82% are in the top 5 Exomiser hits across a range
of rare diseases and family structures
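A figure such as "82% in the top 5 hits" is a top-k hit rate: the fraction of diagnosed cases whose causal gene appears among the first k ranked candidates. The sketch below shows that calculation on made-up data. The function name, case identifiers, and gene names are all hypothetical and do not come from Exomiser or Genomics England.

```python
from typing import Dict, List


def top_k_rate(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int = 5) -> float:
    """Fraction of cases whose true gene is among the first k ranked candidates."""
    if not truth:
        return 0.0
    hits = sum(1 for case, gene in truth.items() if gene in ranked.get(case, [])[:k])
    return hits / len(truth)


# Made-up ranked candidate lists, best candidate first.
ranked = {
    "case1": ["GENE_A", "GENE_B", "GENE_C"],
    "case2": ["GENE_D", "GENE_E", "GENE_F", "GENE_G", "GENE_H", "GENE_I"],
    "case3": ["GENE_J"],
    "case4": ["GENE_K", "GENE_L"],
}
# Made-up confirmed diagnoses.
truth = {"case1": "GENE_B", "case2": "GENE_I", "case3": "GENE_J", "case4": "GENE_L"}

rate = top_k_rate(ranked, truth, k=5)
print(f"top-5 hit rate: {rate:.0%}")
```

In this toy data, case2's true gene is ranked sixth, so it counts as a miss at k = 5.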
Harmonizing diseases, phenotypes, anatomy, and genotypes
91% of our 2.2 Million G2P associations require integrating 2 or more data sources
Enabling phenotype comparisons across
diseases and species
Plain language synonyms for computable
phenotypes
Translational applicability for Fanconi anemia (FA)
Tools can support more rapid diagnostics for FA patients
Integration of data enables mechanistic discovery and new candidate gene targets
Identification of models for FA hypothesis validation
Helping patients contribute data and participate in their ongoing evaluation, care, and science
Acknowledgements
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Nicole Washington
Charite
Sebastian Kohler
Garvan
Tudor Groza
Craig McNamara
RTI
Jim Balhoff
Boston Children’s
Ingrid Holm
Catherine Brownstein
John Brownstein
ClinGen
Heidi Rehm
Larry Babb
Harindra Arachchi
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Maureen Hoatlin
Genomics England/Queen Mary
Damian Smedley
Jules Jacobson
Tomasz Konopka
Pilar Cacheiro
Jackson Laboratory
Peter Robinson
Leigh Carmody
Hannah Blau
EBI
Helen Parkinson
David Osumi-Sutherland
With special thanks to Julie McMurry for excellent graphic design
Johns Hopkins
Chris Chute
Casey Overby
Ada Hamosh
Mayo
Hongfang Liu
Ravi Komandur
UCSC
David Haussler
Benedict Paten
Mark Deikhans
Scripps
Andrew Su
Ben Good
Chunlei Wu
Gregg Stupp
Sanford Health Imagenetics
Neal Boerkoel
Kayli Rageth
Murat Sincan
www.monarchinitiative.org
Chris Mungall, Peter Robinson, Damian Smedley
Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P;
NCI: NCI/Leidos #15X143; BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
extra
Layperson-HPO driven phenotyping tool
https://www.pcori.org/research-results/2017/realization-standard-care-rare-diseases-using-patient-engaged-phenotyping
Genes | Environment | Phenotypes
Standard exchange formats exist for genes (VCF, GFF, BED) … but
for phenotypes? Environment?
Phenotypes: PXF
What does a phenopacket look like?
Alacrima Sleep Apnea Microcephaly
phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"
Clinical labs Public databases Journals
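Because every term in a phenopacket is an identifier in prefix:local form, the structure can be checked by machine. The sketch below mirrors the example profile as a Python dictionary and collects every `id` value that is not a complete identifier. The nesting is one reading of the slide's layout, and the `collect_ids` helper and the `CURIE` pattern are illustrative assumptions, not part of any phenopacket library.

```python
import re

# Assumed pattern for a complete identifier: prefix, colon, non-empty local part.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\S+$")

profile = {
    "phenotype_profile": [{
        "entity": "patient16",
        "phenotype": {
            "types": [{"id": "HP:0000522", "label": "Alacrima"}],
            "onset": {
                "description": "at birth",
                "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
            },
        },
        "evidence": [{
            "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
            "source": [{"id": "PMID:"}],  # left incomplete, as on the slide
        }],
    }]
}


def collect_ids(node):
    """Recursively yield every string stored under an 'id' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "id" and isinstance(value, str):
                yield value
            else:
                yield from collect_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_ids(item)


ids = list(collect_ids(profile))
incomplete = [i for i in ids if not CURIE.match(i)]
print(ids, incomplete)
```

A check like this flags the source entry, whose PMID has no number on the slide.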
Layperson HPO + Phenopackets
Dry eyes Stops breathing during sleep Small head
phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"
• Patient registries
• Social media
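The layperson-to-clinical step can be pictured as a lookup from plain-language phrases to phenotype labels that then feed the `types` list of a phenopacket. The sketch below pairs the phrases and labels by their position on the slides, which is an assumption, and fills in only the one HPO identifier that the slides show. The function and table names are hypothetical.

```python
from typing import Dict, List

# Lay phrase -> clinical label, paired by position on the slides (assumed).
LAY_TO_CLINICAL: Dict[str, str] = {
    "dry eyes": "Alacrima",
    "stops breathing during sleep": "Sleep Apnea",
    "small head": "Microcephaly",
}

# Only the identifier that appears on the slides is filled in.
HPO_IDS: Dict[str, str] = {"Alacrima": "HP:0000522"}


def to_phenotype_types(phrases: List[str]) -> List[dict]:
    """Map lay phrases to phenopacket-style type entries, skipping unknown phrases."""
    entries = []
    for phrase in phrases:
        label = LAY_TO_CLINICAL.get(phrase.strip().lower())
        if label is None:
            continue  # phrase not in the lookup table
        entry = {"label": label}
        if label in HPO_IDS:
            entry["id"] = HPO_IDS[label]
        entries.append(entry)
    return entries


types = to_phenotype_types(["Dry eyes", "Small head", "Tired"])
print(types)
```

Phrases outside the table, such as "Tired" here, are skipped rather than guessed at.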