Global Phenotypic Data Sharing Standards to Maximize Diagnostic Discovery
Melissa Haendel, PhD and Sebastian Köhler, PhD
RD-Action workshop, April 26th and 27th, Brussels
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
What do we mean by phenotype?
= Phenotypic abnormality = clinical feature
A constellation or pattern of clinical features defines a disease:
– [Disease X] ... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.
(Yes, this is a simplification)
Starting point: OMIM
Clinical Synopsis (CS) section
Free-text phenotypic description; very expressive
Online Mendelian Inheritance in Man database
(Un)Controlled Vocabularies
Not designed to be easily machine interpretable
Spelling problems, acronyms, etc.
Homonyms:
... fibrillation ...
fibrillation (= ventricular fibrillation) ≠ fibrillation (= muscle fibrillation)
Why you should care
OMIM query: number of results
large bones 264
large bone 785
enlarged bones 87
enlarged bone 156
big bones 16
huge bones 4
massive bones 28
hyperplastic bones 12
hyperplastic bone 40
bone hyperplasia 134
increased bone growth 612
Motivation
HPO started in 2008
Goal: computer-interpretable clinical features!
Reliable information extraction from databases based on clinical features
Compute similarity between diseases based on clinical features
Compute similarity between patients based on clinical features
Compute similarity between patients and diseases based on clinical features
Interoperability with basic research to improve diagnostic discovery
Easy to use
Freely available
The Human Phenotype Ontology (HPO)
Description of phenotypic abnormalities (or clinical features) in humans
[Diagram: example HPO terms (phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; cerebral inclusion bodies; neurofibrillary tangles; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia), each node being a term. The Clinical Synopses of OMIM:0815 and OMIM:1234 describe the same feature differently: "Neurofibrillary tangles may be present" and "Paired helical filaments".]
The Human Phenotype Ontology (HPO)
Synonyms merged into one term
Textual definitions for each term
[Term]
id: HP:0002185
name: Neurofibrillary tangles
def: "Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form." [HPO:sdoelken]
synonym: "Neurofibrillary tangles may be present" EXACT []
synonym: "Paired helical filaments" EXACT []
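To illustrate how synonyms end up "merged into one term", here is a minimal Python sketch that reads the stanza above (assuming OBO flat-file syntax, with quoted definition and synonym text) and builds a lookup from every label and synonym to the term ID. The helpers parse_stanza and build_index are hypothetical; a real pipeline would more likely use an established ontology parser.

```python
import re

# The HP:0002185 stanza from the slide, written in OBO flat-file syntax.
STANZA = """[Term]
id: HP:0002185
name: Neurofibrillary tangles
def: "Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form." [HPO:sdoelken]
synonym: "Neurofibrillary tangles may be present" EXACT []
synonym: "Paired helical filaments" EXACT []
"""

# Matches lines such as: synonym: "Paired helical filaments" EXACT []
SYNONYM_LINE = re.compile(r'^synonym: "([^"]+)" (EXACT|BROAD|NARROW|RELATED)')


def parse_stanza(text):
    """Collect the id, name and synonym strings of one [Term] stanza."""
    term = {"id": None, "name": None, "synonyms": []}
    for line in text.splitlines():
        if line.startswith("id: "):
            term["id"] = line[4:].strip()
        elif line.startswith("name: "):
            term["name"] = line[6:].strip()
        else:
            match = SYNONYM_LINE.match(line)
            if match:
                term["synonyms"].append(match.group(1))
    return term


def build_index(terms):
    """Map each case-folded label or synonym to the ID of its term."""
    index = {}
    for term in terms:
        for text in [term["name"], *term["synonyms"]]:
            index[text.casefold()] = term["id"]
    return index


index = build_index([parse_stanza(STANZA)])
print(index["paired helical filaments"])  # HP:0002185
```

Whichever wording a clinician or database used, a lookup like this resolves it to the same concept, which is what makes queries such as the "large bones" variants above return consistent results.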
The Human Phenotype Ontology (HPO)
Semantic relations ('subclass of', 'is a')
From top to bottom, terms get more specific
[Diagram: the same example terms (phenotypic abnormality, abnormality of the nervous system, abnormality of the central nervous system, cerebral inclusion bodies, neurofibrillary tangles, abnormality of movement, incoordination, ataxia, gait disturbance, gait ataxia) connected by 'is a' edges, with phenotypic abnormality at the top and more specific terms below it.]
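Because every term points to more general parents, software can walk the 'is a' edges upward to list all ancestors of a term, or measure how deep (specific) it is. A minimal Python sketch; the toy hierarchy reuses the diagram's term names, but its edges are an illustrative assumption rather than a copy of HPO:

```python
from collections import deque

# Toy "is a" edges (child -> parents) built from the diagram's terms.
# The edges are an illustrative assumption, not the exact HPO structure.
IS_A = {
    "abnormality of the nervous system": ["phenotypic abnormality"],
    "abnormality of the central nervous system": ["abnormality of the nervous system"],
    "cerebral inclusion bodies": ["abnormality of the central nervous system"],
    "neurofibrillary tangles": ["cerebral inclusion bodies"],
    "abnormality of movement": ["abnormality of the nervous system"],
    "incoordination": ["abnormality of movement"],
    "ataxia": ["incoordination"],
    "gait disturbance": ["abnormality of movement"],
    "gait ataxia": ["ataxia", "gait disturbance"],
}


def ancestors(term):
    """Every term reachable by following 'is a' edges upward (breadth-first)."""
    seen = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def depth(term):
    """Length of the longest 'is a' path from the term up to the root."""
    parents = IS_A.get(term, [])
    return 1 + max(depth(p) for p in parents) if parents else 0


print(sorted(ancestors("gait ataxia")))
print(depth("gait ataxia"), depth("phenotypic abnormality"))  # 5 0
```

Note that a term may have more than one parent (here gait ataxia is both an ataxia and a gait disturbance), so the hierarchy is a directed acyclic graph rather than a tree.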
Computable phenotype definitions of disease
HPO Terms are used to annotate (describe) diseases
E.g. neurofibrillary tangles is used to annotate Alzheimer Disease:
Orphanet + Monarch:
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
Köhler et al. https://doi.org/10.1093/nar/gkw1039
[Diagram: OMIM:0815 ("Neurofibrillary tangles may be present") and OMIM:1234 ("Paired helical filaments") annotated with the same HPO term, Neurofibrillary tangles.]
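Annotating a disease with a specific term implicitly annotates it with all of that term's ancestors, so a query for a general feature also finds diseases described with more specific terms. A minimal Python sketch under that assumption, using a toy hierarchy (illustrative edges), the slide's placeholder disease IDs, and one invented disease for contrast:

```python
# Toy hierarchy (child -> parents); the edges are illustrative, not copied from HPO.
IS_A = {
    "Neurofibrillary tangles": ["Cerebral inclusion bodies"],
    "Cerebral inclusion bodies": ["Abnormality of the central nervous system"],
    "Abnormality of the central nervous system": ["Phenotypic abnormality"],
    "Gait ataxia": ["Ataxia"],
    "Ataxia": ["Phenotypic abnormality"],
}

# Direct annotations: disease -> terms. OMIM:0815 and OMIM:1234 are the slide's
# placeholder IDs; "Hypothetical disease Z" is invented for contrast.
ANNOTATIONS = {
    "OMIM:0815": {"Neurofibrillary tangles"},
    "OMIM:1234": {"Neurofibrillary tangles"},
    "Alzheimer disease": {"Neurofibrillary tangles"},
    "Hypothetical disease Z": {"Gait ataxia"},
}


def with_ancestors(term):
    """The term plus every ancestor reachable through 'is a' edges."""
    result, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def diseases_with(query_term):
    """Diseases annotated with the query term or with any of its descendants."""
    return sorted(
        disease
        for disease, terms in ANNOTATIONS.items()
        if any(query_term in with_ancestors(t) for t in terms)
    )


print(diseases_with("Abnormality of the central nervous system"))
# ['Alzheimer disease', 'OMIM:0815', 'OMIM:1234']
```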
Why HPO? Existing clinical vocabularies don’t adequately cover phenotypic descriptions
Winnenburg and Bodenreider, 2014
[Chart: percent coverage of phenotypic descriptions by HPO, UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10 and OMIM]
LDDB (✓)
Orphanet (✓) (Use HPO directly)
MedDRA (✓)
UMLS (completely incorporated)
Community contribution
Multiple HPO-specific workshops
Constant discussions via the tracker system and e-mail
We try our best to acknowledge contributors:
+ microattributions
Contributing to and extending HPO
HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
[Chart: translation progress for Italian, Spanish, Russian, French, German, English layperson, Japanese and Chinese, ranging from about 11% to 100%]
Adoption of HPO
Public-facing databases using HPO to annotate patients
Tools ingesting HPO-annotated data:
Köhler et al. https://doi.org/10.1093/nar/gkw1039
Why HPO is a successful standard
• One language shared by “all”
• Synonyms “map” to one concept (HPO term)
• Contains terms that no other ontology has
• Comes with disease annotations! (Not just “yet another clinical terminology”)
• Simple, qualitative phenotyping of deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis
• Documented, traceable editing
• Open science community project with diverse contributors
• Constantly improved and extended, for example: layperson version for patients, language translations, opposite-relations between terms
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
A disease can be described algorithmically as a collection of phenotypes
Patient
Disease X
Differential diagnosis with matching phenotype concepts is already good
Splenomegaly ↔ Increased spleen size
Nasal speech ↔ Nasal voice
These are synonyms in HPO, i.e. they map to the same term
A disease can be described algorithmically as a collection of phenotypes
Patient
Disease X
Differential diagnosis with similar but non-matching phenotypes is difficult
Splenomegaly ↔ Ruptured spleen
Oral motor hypotonia ↔ Decreased muscle mass
Similarity between two terms
[Diagram: term pairs compared through their common ancestors. Oral motor hypotonia and Muscular hypotonia of the trunk share Abnormal muscle tone; Oral motor hypotonia and Abnormality of calvarial morphology share only the root, Phenotypic abnormality. Matches are labeled high, medium and very low scoring.]
Score: Measured by Information Content
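Information content (IC) rewards specific shared ancestors. One common formulation, Resnik similarity, scores two terms by the IC of their most informative common ancestor, with IC(t) = log(N / n_t), where N is the number of annotated diseases and n_t the number annotated with t or a descendant. The slide does not say which exact measure is used; this Python sketch assumes Resnik, with a toy hierarchy (illustrative edges) and hypothetical annotations:

```python
import math

# Toy hierarchy (child -> parents) using the terms on the slide; edges are illustrative.
IS_A = {
    "Abnormal muscle tone": ["Phenotypic abnormality"],
    "Oral motor hypotonia": ["Abnormal muscle tone"],
    "Muscular hypotonia of the trunk": ["Abnormal muscle tone"],
    "Abnormality of calvarial morphology": ["Phenotypic abnormality"],
}

# Hypothetical annotations: four diseases and the terms they are annotated with.
DISEASE_TERMS = {
    "disease 1": {"Oral motor hypotonia"},
    "disease 2": {"Muscular hypotonia of the trunk"},
    "disease 3": {"Abnormality of calvarial morphology"},
    "disease 4": {"Abnormality of calvarial morphology"},
}


def closure(term):
    """The term itself plus all of its ancestors."""
    result, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content(term):
    """IC = log(N / n), n = diseases annotated with the term or a descendant."""
    n = sum(
        1 for terms in DISEASE_TERMS.values()
        if any(term in closure(t) for t in terms)
    )
    if n == 0:
        raise ValueError(f"no annotations for {term!r}")
    return math.log(len(DISEASE_TERMS) / n)


def resnik(a, b):
    """Similarity = IC of the most informative common ancestor of a and b."""
    return max(information_content(t) for t in closure(a) & closure(b))


for other in ["Oral motor hypotonia", "Muscular hypotonia of the trunk",
              "Abnormality of calvarial morphology"]:
    print(f"Oral motor hypotonia vs {other}: {resnik('Oral motor hypotonia', other):.2f}")
# 1.39, 0.69 and 0.00 respectively
```

The root is shared by every disease, so its IC is 0: two terms whose only common ancestor is Phenotypic abnormality contribute nothing to the match.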
Comparing phenotype profiles
E.g. patient-to-disease comparison
[Diagram: a patient's phenotype profile matched against Disease A and Disease B, with high, medium and very low scoring term matches]
The patient's phenotypes are more similar to Disease A, so Orphamizer would rank Disease A before Disease B.
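A simple way to turn term-level scores into a profile-level score, and one common choice in phenotype matching, is a best-match average: for each patient term take its best-scoring disease term, then average. This Python sketch is not the Orphamizer algorithm; the disease profiles and the pairwise scores are hypothetical stand-ins for IC-based term similarities:

```python
# Hypothetical pairwise term similarities (e.g. IC-based scores); unlisted pairs score 0.
TERM_SIM = {
    ("Splenomegaly", "Splenomegaly"): 3.0,
    ("Splenomegaly", "Ruptured spleen"): 1.5,
    ("Oral motor hypotonia", "Muscular hypotonia of the trunk"): 1.2,
}

PATIENT = ["Splenomegaly", "Oral motor hypotonia"]
DISEASES = {
    "Disease A": ["Splenomegaly", "Muscular hypotonia of the trunk"],
    "Disease B": ["Ruptured spleen", "Abnormality of calvarial morphology"],
}


def term_sim(a, b):
    """Symmetric lookup in the toy similarity table."""
    return TERM_SIM.get((a, b), TERM_SIM.get((b, a), 0.0))


def best_match_average(query, target):
    """For every query term take its best match in the target, then average."""
    return sum(max(term_sim(q, t) for t in target) for q in query) / len(query)


ranking = sorted(
    ((name, best_match_average(PATIENT, terms)) for name, terms in DISEASES.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
# Disease A: 2.10
# Disease B: 0.75
```

Averaging over the patient's terms means a single strong match cannot dominate; tools differ in whether they also average in the reverse direction (disease to patient).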
Orphamizer
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based visualization tools
• Phenotype data standards for exchange
The genome is sequenced, but...
3,398 OMIM Mendelian diseases with no known genetic basis
At least 120,000* ClinVar variants with no known pathogenicity
…we still don’t know very much about what it does
*This is more than twice what it was in 2016!
Adding other species’ data helps fill knowledge gaps in the human genome
More species = more coverage
Number of human protein-coding genes in ExAC DB as per Lek et al. Nature 2016: 19,008
Genes with human phenotype data: 9,739 (51%)
Genes with phenotype data from other species: 14,779 (78%)
Union of coverage in any species: 2,195 + 7,544 + 7,235 = 16,974 (89%)
Even inclusion of just four species boosts phenotypic coverage of genes by 38 percentage points (51% → 89%)
Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
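The percentages on the slide can be reproduced from the gene counts. The numbers fit a Venn diagram with 2,195 genes covered only by human data, 7,544 covered by both human and other-species data, and 7,235 covered only by other species; that split is inferred from the arithmetic (2,195 + 7,544 = 9,739 and 7,544 + 7,235 = 14,779), not stated on the slide. A quick Python check:

```python
TOTAL_GENES = 19_008  # human protein-coding genes in ExAC (Lek et al. 2016)

# Venn regions inferred from the slide's numbers (inferred, not stated on the slide).
human_only, both, model_only = 2_195, 7_544, 7_235

human = human_only + both                  # genes with human phenotype data
model = both + model_only                  # genes with other-species phenotype data
union = human_only + both + model_only     # genes with data from any species


def percent(n):
    """Share of all protein-coding genes, rounded to a whole percent."""
    return round(100 * n / TOTAL_GENES)


print(human, percent(human))            # 9739 51
print(model, percent(model))            # 14779 78
print(union, percent(union))            # 16974 89
print(percent(union) - percent(human))  # 38
```

The "38%" boost is therefore a difference in percentage points (89 minus 51), not a relative increase.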
Ulcerated paws, Palmoplantar hyperkeratosis, Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons –https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
Challenge: Each database uses its own vocabulary/ontology
MP, HP
MGI, HPOA
Challenge: Each database uses its own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …NCIT, …
Data sources: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb
Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis”
Human phenotype
I have absolutely no idea what that means
Decomposition of complex concepts using species-neutral terms
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
“Palmoplantar hyperkeratosis” (human phenotype) is decomposed into: increased (PATO), stratum corneum layer of skin (Uberon), autopod (Uberon), keratinization (GO)
Species-neutral ontologies, homologous concepts
How can anatomy be “species-neutral”?
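Once a phenotype is expressed as a quality plus species-neutral anatomy, a human term and an animal term can be compared component by component. This Python sketch assumes a simplified three-part decomposition and a small hand-made mapping that stands in for Uberon (labels only, no real ontology IDs); real logical definitions are richer, and the slide's example also involves the GO process keratinization, omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    """Simplified species-neutral phenotype: quality + affected entity + location."""
    quality: str   # PATO-style quality, e.g. "increased"
    entity: str    # Uberon-style tissue, e.g. "stratum corneum layer of skin"
    location: str  # Uberon-style anatomical region, e.g. "autopod"


# Hypothetical mapping from species-specific words to a species-neutral class,
# standing in for what Uberon provides; labels only, no real ontology IDs.
NEUTRAL_ANATOMY = {
    "hand": "autopod",
    "foot": "autopod",
    "forepaw": "autopod",
    "hindpaw": "autopod",
}


def decompose(quality, entity, location_word):
    """Build a decomposition, generalizing the location to its species-neutral class."""
    return Decomposition(quality, entity, NEUTRAL_ANATOMY.get(location_word, location_word))


# Human: thick skin of the hand; hypothetical animal phenotype: thick skin of the forepaw.
human = decompose("increased", "stratum corneum layer of skin", "hand")
animal = decompose("increased", "stratum corneum layer of skin", "forepaw")
print(human == animal)  # True: both reduce to the same species-neutral description
```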
HPO interoperability and annotations
Example HPO terms: Hyposmia, Abnormality of globe location, Deeply set eyes, Abnormal eye morphology, Motor neuron atrophy
Linked classes from other ontologies: sensory perception of smell, eyeball of camera-type eye, motor neuron (CL)
34,571 annotations in 22 species
157,534 phenotype annotations
2,150 phenotype annotations
11,813 phenotype terms
127,125 rare disease-phenotype annotations
136,268 common disease-phenotype annotations
http://bit.ly/hpo-paper
Which phenotypic profile is most similar?
Model X, Patient, Disease Y
www.owlsim.org
Fuzzy-phenotype matching
But what about the diseases? How to choose which ones? What is their provenance?
A dynamic nosology
Challenge: can we rapidly synergize multiple knowledge sources (classic clinical phenotype-oriented disease classifications and molecular sources) into a dynamic ontology?
Knowledge-based approaches
Logical Definition OWL Ontology Merging
Bayesian OWL Ontology Merging
Data driven
Phenotype and functional ontology networks
Mungall, C. J., et al. (2016). k-BOOM. bioRxiv, 048843. doi:10.1101/048843
DOID (blue), OMIM (brown), MESH (grey), ORDO/Orphanet (yellow)
SubClassOf (solid line), Xref (dashed grey line)
4 disease resources plus mappings: Hemolytic anemia
Coherent disease classification =>
Orphanet
https://github.com/monarch-initiative/monarch-disease-ontology
Columns: “Ontology” | Classes (before merge, after merge) | SubClass axioms | Xrefs
Inputs:
DOID 6878 6012 7082 36656
MESH (D) 11314 4152 19036
OMIM (D) 7783 7783 0 31242
Orphanet (D) 8740 4683 15182 20326
OMIA 4833 4833 3120 355
DC 209 208 310 316
Medic 0 8630 3435
Output:
Merged 39757 27617 44837
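Merging reduces the class count because classes linked by mappings collapse into one. k-BOOM itself is probabilistic: it weighs each xref as possibly an equivalence, subclass, superclass or no relation and searches for the most probable logically coherent ontology. The Python sketch below is a deliberately simpler, deterministic stand-in that treats every xref as an equivalence and collapses linked classes with union-find; all IDs are hypothetical placeholders.

```python
# Hypothetical classes from different disease resources (placeholder IDs, not real ones).
CLASSES = ["DOID:ex1", "OMIM:ex2", "MESH:ex3", "Orphanet:ex4", "DOID:ex5"]

# Hypothetical xrefs; this simplified sketch treats every xref as an equivalence.
XREFS = [("DOID:ex1", "OMIM:ex2"), ("OMIM:ex2", "Orphanet:ex4"), ("DOID:ex1", "MESH:ex3")]


def find(parent, x):
    """Return the representative of x's clique, halving paths as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_cliques(classes, xrefs):
    """Union-find: collapse classes linked by xrefs into merged classes."""
    parent = {c: c for c in classes}
    for a, b in xrefs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    cliques = {}
    for c in classes:
        cliques.setdefault(find(parent, c), set()).add(c)
    return list(cliques.values())


merged = merge_cliques(CLASSES, XREFS)
print(len(CLASSES), "->", len(merged))  # 5 -> 2
```

Treating every xref as an equivalence is exactly what a probabilistic approach avoids: a single loose mapping between a broad and a narrow class would wrongly fuse them here.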
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data
PATIENT EXOME/ GENOME
PATIENT CLINICAL PHENOTYPES
PUBLIC GENOMIC DATA
PUBLIC CLINICAL PHENOTYPE, DISEASE DATA
POSSIBLE DISEASES
DIAGNOSIS & TREATMENT
PATIENT ENVIRONMENT
PUBLIC ENVIRONMENT, DISEASE DATA
PATIENT OMICS PHENOTYPES
PUBLIC OMICS PHENOTYPES, CORRELATIONS
Under-utilized data
Phenotypic profile matching
Combining G2P data for variant prioritization
Whole exome
Remove off-target and common variants
Variant score from allele frequency and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
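The steps above can be sketched in Python to show how a phenotype score can change a ranking that variant data alone would produce. This is not Exomiser's code or its PHIVE score: the frequency cut-off, the variant-score formula, the phenotype scores and the plain mean used to combine them are all illustrative assumptions, and the Mendelian (inheritance) filter is omitted.

```python
from dataclasses import dataclass

MAX_FREQUENCY = 0.01  # assumed cut-off above which a variant counts as "common"


@dataclass
class Variant:
    gene: str
    on_target: bool
    allele_frequency: float  # population allele frequency, 0..1
    pathogenicity: float     # predicted deleteriousness, 0..1


# Hypothetical phenotype scores (0..1): similarity between the patient's HPO profile
# and phenotypes linked to each gene in human, mouse, zebrafish, ... data.
PHENOTYPE_SCORE = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.6}


def variant_score(v):
    """Illustrative: rarer and more damaging variants score higher."""
    rarity = 1.0 - v.allele_frequency / MAX_FREQUENCY
    return rarity * v.pathogenicity


def prioritize(variants):
    # Step 1: remove off-target and common variants.
    kept = [v for v in variants if v.on_target and v.allele_frequency <= MAX_FREQUENCY]
    # Steps 2-3: combine variant and phenotype scores (a plain mean, an assumption).
    scored = [(v.gene, (variant_score(v) + PHENOTYPE_SCORE.get(v.gene, 0.0)) / 2) for v in kept]
    # Step 4: rank the final candidates.
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = [
    Variant("GENE_A", True, 0.0, 0.85),
    Variant("GENE_B", True, 0.0005, 0.99),
    Variant("GENE_C", False, 0.0, 0.90),  # off-target: removed
    Variant("GENE_B", True, 0.2, 0.80),   # common: removed
]
ranking = prioritize(candidates)
for gene, score in ranking:
    print(gene, round(score, 2))
```

Here GENE_B's variant scores higher on its own, but GENE_A ranks first once the phenotype match is included.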
Exomiser results for UDP diagnosed patients
Inclusion of phenotype data improves variant prioritization
In 60% of the first 1,000 genomes at GEL, Exomiser predicts the top candidate. In 86% of cases, Exomiser predicts it within the top 5.
Example case solved by Exomiser
Phenotypic profile compared with genes:
STIM-1: heterozygous, missense mutation (N/A)
Stim1Sax/Sax (mouse)
Ranked the STIM-1 variant as maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources. http://bit.ly/exomiser
Deep phenotyping and “fuzzy” matching algorithms improve diagnostics
4.9% of exomes with dual molecular diagnoses, differentiated with deep phenotyping
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
How much phenotyping is enough?
Enlarged ears (2), Dark hair (6), Female (4), Male (4), Blue skin (1), Pointy ears (1), Hair absent on head (1), Horns present (1), Hair present on head (7), Enlarged lip (2), Increased skin pigmentation (3)
bit.ly/annotationsufficiency
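One way to reason about "enough" phenotyping is information content: a feature shared by most individuals barely narrows the candidates, while a rare one narrows them a lot. This Python sketch assumes the cartoon shows 8 individuals (inferred from Female (4) + Male (4)) and that features are independent; it is a simplification, not necessarily the score used in the linked annotation-sufficiency work.

```python
import math

# Feature counts from the slide. Assumption: the cartoon shows 8 individuals,
# inferred from "Female (4)" + "Male (4)".
COUNTS = {
    "Enlarged ears": 2, "Dark hair": 6, "Female": 4, "Male": 4,
    "Blue skin": 1, "Pointy ears": 1, "Hair absent on head": 1,
    "Horns present": 1, "Hair present on head": 7, "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
}
N_INDIVIDUALS = 8


def information_content(feature):
    """Bits gained by observing the feature: log2(N / count)."""
    return math.log2(N_INDIVIDUALS / COUNTS[feature])


def profile_information(features):
    """Naive sum that treats features as independent (a simplification)."""
    return sum(information_content(f) for f in features)


common_pair = profile_information(["Dark hair", "Hair present on head"])
rare_single = profile_information(["Horns present"])
print(f"{common_pair:.2f} bits vs {rare_single:.2f} bits")  # 0.61 bits vs 3.00 bits
```

With 8 individuals, 3 bits singles out exactly one of them, so one rare, well-chosen feature says more than several common ones.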
Phenotype matching visualization widget
bit.ly/monarch-nar-2016
Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org, http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
Genes + Environment = Phenotypes
Biology central dogma
Standards for exchanging data must be up to these challenges.
Genes + Environment = Phenotypes
Computable encodings are essential
Base pairs, variant notation (e.g. HGVS)
SNOMED-CT, medical procedure coding, Environment Ontology
@ontowonka
Genes, Environment, Phenotypes
Genes: VCF, GFF, BED; Phenotypes: PXF
Standard exchange mechanisms exist for genes … but for phenotypes? Environment?
Introducing PhenoPackets
A packet of phenotype data to be used anywhere, written by anyone
http://phenopackets.org
What does a phenopacket look like?
Alacrima Sleep Apnea Microcephaly
phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"
Clinical labs, public databases, journals
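A phenopacket is structured data, so it can be built, checked and serialized with ordinary tools. This minimal Python sketch mirrors the YAML above as a dict, serializes it to JSON, and flags any id that is not a prefix:local CURIE. The CURIE rule and the helper find_bad_ids are assumptions for illustration, not part of the Phenopackets schema or its reference implementation; the check catches the slide's elided PMID, which has no local identifier.

```python
import json
import re

# The same phenopacket as a Python structure (mirrors the YAML above).
phenopacket = {
    "phenotype_profile": [
        {
            "entity": "patient16",
            "phenotype": {
                "types": [{"id": "HP:0000522", "label": "Alacrima"}],
                "onset": {
                    "description": "at birth",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
            "evidence": [
                {
                    "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
                    "source": [{"id": "PMID:"}],
                }
            ],
        }
    ]
}

# Assumed rule: a prefix, a colon, then a non-empty local identifier.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:\S+$")


def find_bad_ids(node, path="$"):
    """Walk the structure and report every 'id' value that is not a well-formed CURIE."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "id" and not (isinstance(value, str) and CURIE.match(value)):
                problems.append(child)
            else:
                problems.extend(find_bad_ids(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems.extend(find_bad_ids(item, f"{path}[{i}]"))
    return problems


json_text = json.dumps(phenopacket, indent=2)  # ready to deposit or exchange
print(find_bad_ids(phenopacket))
# ['$.phenotype_profile[0].evidence[0].source[0].id']
```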
What about patients? Can they phenotype themselves?
HPO for Patients
http://bit.ly/hpo-biocuration
6,200 plain language terms for patients, families, and non-experts
New software application being developed for patients
Layperson HPO + Phenopackets
Dry eyes, Stops breathing during sleep, Small head
phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"
• Patient registries
• Social media
Journals are now requiring HPO terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo, etc.)
Each article can be associated with a phenopacket
Community “curate-athons” for HPO
Cardiovascular curate-athon at Stanford: about 20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days.
Abnormal complex
• Voltage to be added to all waves: increased, decreased, fluctuating (alternans)
• Duration to be added to all waves: increased, decreased
• P wave: notching, axis
• QRS: fractionation, axis (right/left/extreme)
• Q wave
• R wave
• S wave
• R’ wave
• S’ wave (abnormal only)
• J wave (can be normal variant)
• Epsilon wave (abnormal only)
• Osborn wave (abnormal only)
• Terminal slur wave (can be normal variant)
• Delta wave (abnormal only)
Added 100s of clinically relevant cardiophysiology phenotypes to HPO, new exome analysis possible
Summary
The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community for deep phenotyping rare disease patients
Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates
Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants in WES and WGS data, as well as CNVs
Patients can provide self-phenotyping information as partners in the deep phenotyping process
Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals
Acknowledgements
Orphanet
Ana Rath
Annie Olry
Marc Hanauer
Halima Lourghi
Lawrence Berkeley: Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
RENCI: Jim Balhoff
OHSU: Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Genomics England/Queen Mary
Damian Smedley
Jules Jacobson
Jackson Laboratory: Peter Robinson
Leigh Carmody
With special thanks to Julie McMurry for excellent graphic design
Garvan: Tudor Groza
Craig McNamara
Hipbi / NeuroCure: Dominik Seelow
Markus Schülke-Gerstenfeld
Charité: Dominik Seelow
Tomasz Zemojtel
www.monarchinitiative.org
Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C,
HSN268201400093P; NCATS: UDN U01TR001395,
Biomedical Data Translator: 1OT3TR002019; E-RARE 2015: Hipbi-RD 01GM1608