Making the most of phenotypes in ontology-based biomedical knowledge discovery
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics), Stanford University
@micheldumontier::Biostats:19-02-15
Topics
• Computable Phenotypes
• Methods to compare Phenotypes
• Cross-Species Phenotype Integration
• Applications
– Undiagnosed Diseases
– Drug Target Identification
– Drug Repurposing
Phenotypes
• A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior.
– Qualitative; covers both normal and abnormal phenotypes
– e.g., red eye color, abnormal gait, enlarged colon
Diagnosis uses observable/measured phenotypes
“Phenotypic Profile”
Matching patients to diseases
Differential diagnosis with similar but non-matching phenotypes is difficult
[Figure: Patient and Disease X profiles with related but non-identical phenotypes: Flat back of head, Hypotonia, Abnormal skull morphology, Decreased muscle mass]
Differential diagnosis becomes challenging with rare and complex disorders
• Over 7000 rare diseases
• Prevalence < 1 in 1500–2500
• Most have fewer than 50 case reports
• Nearly 1 in 10 Americans suffer from one or more rare diseases
• Only 250 medicinal products have been approved to diagnose and treat rare diseases
Carpenter syndrome
– acrocephalopolysyndactyly (ACPS) disorder
– 40 cases described in the literature
– prevalence < 1 in 1M
Genotypes + Phenotypes Improve Diagnosis
Remove off-target variants, common variants, and variants not in known disease-causing genes
http://compbio.charite.de/PhenIX/
Target panel of 2741 known Mendelian disease genes
Compare phenotype profiles from: ClinVar, OMIM, Orphanet
Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123
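The two-stage strategy above (filter variants, then rank the surviving genes by phenotype match) can be sketched as follows. This is a minimal illustration, not the PhenIX implementation: the `Variant` record, the 1% allele-frequency cutoff, the gene names, and `phenotype_score` (a plain set-overlap stand-in for the semantic similarity measures discussed later) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    on_target: bool      # hypothetical flag: variant lies inside the sequenced target regions
    allele_freq: float   # population allele frequency

def phenotype_score(patient: set[str], profile: set[str]) -> float:
    """Stand-in score: fraction of patient terms that appear in the gene's phenotype profile."""
    return len(patient & profile) / len(patient) if patient else 0.0

def rank_candidate_genes(variants, panel_genes, gene_profiles, patient, max_af=0.01):
    # Step 1: drop off-target variants, common variants, and variants outside the disease-gene panel.
    kept = {v.gene for v in variants
            if v.on_target and v.allele_freq < max_af and v.gene in panel_genes}
    # Step 2: rank the surviving genes by how well their phenotype profile matches the patient.
    scored = [(g, phenotype_score(patient, gene_profiles.get(g, set()))) for g in kept]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical example data.
variants = [
    Variant("GENE_A", True, 0.001),
    Variant("GENE_B", True, 0.20),   # common variant: filtered out
    Variant("GENE_C", False, 0.0),   # off-target variant: filtered out
    Variant("GENE_D", True, 0.0005),
]
panel = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
profiles = {"GENE_A": {"Hypotonia", "Seizure"},
            "GENE_D": {"Hypotonia", "Flat occiput", "Seizure"}}
ranking = rank_candidate_genes(variants, panel, profiles, {"Hypotonia", "Flat occiput"})
print(ranking)  # [('GENE_D', 1.0), ('GENE_A', 0.5)]
```

Keeping the filter and the ranking as separate steps mirrors the slide: the variant filter shrinks the search space, and the phenotype comparison orders what is left.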
PhenIX helped diagnose 11/40 patients
So how did they do it?
1. Computable representation of phenotypes
2. Methods to compare phenotype profiles
3. Using model organisms to increase coverage of the phenotype space
Difficult to find all results using text searches
The Human Phenotype Ontology: A Computable Representation of Human Phenotypes
11,000+ classes
Follows the True Path Rule
Used to annotate:
• Patients
• Disorders/Diseases
• Genes, Gene Variants, & Genotypes
Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
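The True Path Rule means that annotating an entity with a term implicitly annotates it with every ancestor of that term. A minimal sketch of that propagation, assuming a toy is_a hierarchy whose labels are loosely modeled on HPO (the structure is illustrative, not the real HPO graph):

```python
# Toy is_a hierarchy (child -> parents); illustrative labels, not the real HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Flat occiput": ["Abnormal skull morphology"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def propagate(annotations: set[str]) -> set[str]:
    """Apply the True Path Rule: close an annotation set under is_a."""
    closed: set[str] = set()
    for t in annotations:
        closed |= ancestors(t)
    return closed

print(sorted(propagate({"Flat occiput", "Hypotonia"})))
```

Two annotations expand to six implied terms, which is what lets later similarity measures find overlap between profiles that share no explicitly asserted term.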
HPO has unique terms
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
An increasing number of diseases are described using the HPO
Phenotype annotations per species
http://www.monarchinitiative.org
Phenotype “BLAST”: Which phenotypic profile is most similar?
[Figure: a patient's phenotypic profile compared against the profiles of Disease X and Disease Y]
PhenoTips: Capturing high-quality patient phenotypes
Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347
Semantic Similarity
• Semantic similarity is a metric defined over a set of terms, where the distance between them is based on their meaning.
• It can be estimated by examining, for instance:
– Topological similarity
– Information content
– Statistical co-occurrence
• Widely used in bioinformatics for gene enrichment, function prediction, network screening, clustering, etc.
Measures of Semantic Similarity
Edge-based measures
– Shortest path (Rada)
– Common path
– Scaling by depth, etc.
• Requires uniform distribution of nodes and edges
Node-based measures
– Shared terms
– Common ancestors
– Information content (IC)
• Better able to account for structural heterogeneity
Set comparisons
• Pairwise
– Max/average/sum
– All or best pairs
• Groupwise
– Set, graph, vector
– Various combinations
Implementations
– Semanticmeasureslibrary.org
– OWL-SIM
Semantic Similarity in Biomedical Ontologies. PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
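To make the "pairwise, best pairs, average" combination concrete, here is a minimal sketch that scores two phenotype sets by best-match average. The term-level measure is a simple node-based one (overlap of the two terms' ancestor sets), chosen only for illustration; the toy hierarchy and the patient/disease profiles are hypothetical.

```python
# Toy is_a hierarchy (child -> parents); illustrative labels, not the real HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Flat occiput": ["Abnormal skull morphology"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def term_sim(a: str, b: str) -> float:
    """Node-based term similarity: Jaccard overlap of the two terms' ancestor sets."""
    ca, cb = ancestors(a), ancestors(b)
    return len(ca & cb) / len(ca | cb)

def best_match_average(p: set[str], q: set[str]) -> float:
    """Pairwise set comparison: each term's best match in the other set, averaged both ways."""
    if not p or not q:
        return 0.0
    best_p = [max(term_sim(a, b) for b in q) for a in p]
    best_q = [max(term_sim(a, b) for a in p) for b in q]
    scores = best_p + best_q
    return sum(scores) / len(scores)

patient = {"Flat occiput", "Hypotonia"}
disease_x = {"Abnormal skull morphology", "Decreased muscle mass"}
print(best_match_average(patient, disease_x))  # 0.625
```

Swapping `term_sim` for an IC-based measure (e.g., the IC of the most informative common ancestor) changes the node-based part without touching the set-comparison strategy, which is the separation the slide's taxonomy draws.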
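Term specificity
Corpus-based: $ic(t) = -\log P(t)$, where $P(t)$ is the frequency with which $t$, or any of its descendants, is used in the annotation corpus
Structure-based: $ic(t) = \mathrm{depth}(t) \times \left(1 - \frac{\log(\mathrm{desc}(t) + 1)}{\log(\mathrm{totalterms})}\right)$
by: Heiko Muller, CSIRO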
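Both specificity measures can be computed directly from an ontology and a set of annotations. A minimal sketch, assuming a toy hierarchy, a hypothetical four-disease annotation corpus, depth counted in is_a edges from the root (root = 0), and $P(t)$ taken as the fraction of annotated entities whose propagated annotations include $t$:

```python
import math

# Toy is_a hierarchy (child -> parents); illustrative labels, not the real HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Flat occiput": ["Abnormal skull morphology"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def descendants(term: str) -> set[str]:
    """Strict descendants: every other term that has `term` among its ancestors."""
    return {t for t in PARENTS if t != term and term in ancestors(t)}

def depth(term: str) -> int:
    """Number of is_a edges on the longest path up to the root (root = 0)."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)

def structure_ic(term: str) -> float:
    """ic(t) = depth(t) * (1 - log(desc(t) + 1) / log(total terms))."""
    return depth(term) * (1 - math.log(len(descendants(term)) + 1) / math.log(len(PARENTS)))

def corpus_ic(term: str, corpus: dict[str, set[str]]) -> float:
    """ic(t) = -log P(t); P(t) = share of entities whose annotations, closed under is_a, include t."""
    hits = sum(1 for terms in corpus.values() if any(term in ancestors(t) for t in terms))
    return math.inf if hits == 0 else -math.log(hits / len(corpus))

# Hypothetical annotation corpus.
CORPUS = {
    "Disease X": {"Abnormal skull morphology", "Decreased muscle mass"},
    "Disease Y": {"Decreased muscle mass"},
    "Disease Z": {"Flat occiput", "Hypotonia"},
    "Disease W": {"Hypotonia"},
}
print(structure_ic("Flat occiput"), corpus_ic("Flat occiput", CORPUS))  # 3.0 and log(4)
```

The two views can disagree: the structure-based value depends only on where a term sits in the graph, while the corpus-based value changes whenever the annotation corpus changes.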
Group-wise Similarity
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
$J(g_1, g_2) = \frac{6}{11} \approx 0.55$
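For phenotype profiles, the Jaccard index is usually more informative when computed over annotation sets closed under the True Path Rule, because related but non-identical terms then share ancestors. A minimal sketch, assuming a toy hierarchy and a hypothetical patient and two hypothetical diseases whose labels are borrowed from the matching example earlier in the talk:

```python
# Toy is_a hierarchy (child -> parents); illustrative labels, not the real HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Flat occiput": ["Abnormal skull morphology"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def closure(terms: set[str]) -> set[str]:
    """Annotation set closed under is_a (True Path Rule)."""
    return set().union(*(ancestors(t) for t in terms))

def jaccard(a: set[str], b: set[str]) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

patient = {"Flat occiput", "Hypotonia"}
diseases = {
    "Disease X": {"Abnormal skull morphology", "Decreased muscle mass"},
    "Disease Y": {"Decreased muscle mass"},
}
exact = {d: jaccard(patient, p) for d, p in diseases.items()}                 # no shared terms
semantic = {d: jaccard(closure(patient), closure(p)) for d, p in diseases.items()}
ranking = sorted(semantic, key=semantic.get, reverse=True)
print(exact, semantic, ranking)  # Disease X ranks first
```

Exact term matching scores both diseases as 0, whereas the closure-based comparison gives Disease X 4/7 and Disease Y 2/7, which is the "phenotype BLAST" behavior the earlier slides call for.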
Group-wise Semantic Similarity
IC(g1) = 10.66
IC(g2) = 9.79
IC(g1 ∩ g2) = 2.79
sim(g1, g2) = 0.27
$sim(g_1, g_2) = \frac{1}{2}\left(\frac{IC(g_1 \cap g_2)}{IC(g_1)} + \frac{IC(g_1 \cap g_2)}{IC(g_2)}\right)$
X. Chen et al. Gene. 2012. 509(1):131-5
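The formula can be checked against the slide's numbers: 2.79/10.66 ≈ 0.262 and 2.79/9.79 ≈ 0.285, whose mean is ≈ 0.27. The sketch below reproduces that arithmetic and applies the same formula to sets. It assumes, as a simplification, that the IC of an annotation set is the sum of its terms' IC values; the per-term IC values are hypothetical, and in practice the sets would usually be closed under is_a first.

```python
def sim_from_ic(ic_g1: float, ic_g2: float, ic_shared: float) -> float:
    """sim(g1, g2) = 1/2 * (IC(g1 ∩ g2) / IC(g1) + IC(g1 ∩ g2) / IC(g2))."""
    return 0.5 * (ic_shared / ic_g1 + ic_shared / ic_g2)

def group_ic(terms: set[str], term_ic: dict[str, float]) -> float:
    """Assumption: the IC of an annotation set is the sum of its terms' IC values."""
    return sum(term_ic[t] for t in terms)

def groupwise_sim(g1: set[str], g2: set[str], term_ic: dict[str, float]) -> float:
    ic1, ic2 = group_ic(g1, term_ic), group_ic(g2, term_ic)
    if ic1 == 0 or ic2 == 0:
        return 0.0  # avoid division by zero for uninformative profiles
    return sim_from_ic(ic1, ic2, group_ic(g1 & g2, term_ic))

# Reproduce the slide's arithmetic.
print(round(sim_from_ic(10.66, 9.79, 2.79), 2))  # 0.27

# Hypothetical per-term IC values.
TERM_IC = {"Hypotonia": 1.0, "Seizure": 2.0, "Flat occiput": 3.0}
print(groupwise_sim({"Hypotonia", "Seizure"}, {"Seizure", "Flat occiput"}, TERM_IC))  # ≈ 0.533
```

Averaging the two ratios keeps the measure symmetric while still penalizing a pair in which one profile carries much more information than the other.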
Robustness of phenotype annotations
Image credit: Viljoen and Beighton, J Med Genet. 1992
Schwartz-Jampel syndrome, type I
• Caused by mutation in HSPG2, which encodes a proteoglycan
• ~100 phenotype annotations
Similarity of derived Schwartz-Jampel syndrome profiles
Semantic similarity is robust in the face of missing information
92% of derived profiles are most similar to the original disease profile
[Figure: profile similarity vs. derived profile rank]
Semantic similarity algorithms are sensitive to specificity of information
The more general the phenotype, the poorer the match to the disease
[Figure: profile similarity vs. derived profile rank]
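Both robustness experiments (derived profiles with missing annotations, and derived profiles with more general terms) can be mimicked at toy scale. This is a hedged sketch, not the original study's protocol: the hierarchy and diseases are hypothetical, the similarity is the closure-based Jaccard index, and rank is 1 plus the number of diseases that score strictly higher than the original.

```python
# Toy is_a hierarchy (child -> parents); illustrative labels, not the real HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the head": ["Phenotypic abnormality"],
    "Abnormal skull morphology": ["Abnormality of the head"],
    "Flat occiput": ["Abnormal skull morphology"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Decreased muscle mass": ["Abnormality of the musculature"],
}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def closure(terms: set[str]) -> set[str]:
    return set().union(*(ancestors(t) for t in terms))

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical disease profiles.
DISEASES = {
    "Disease X": {"Flat occiput", "Hypotonia"},
    "Disease Y": {"Abnormal skull morphology", "Decreased muscle mass"},
    "Disease Z": {"Decreased muscle mass"},
}

def rank_of(target: str, profile: set[str]) -> int:
    """Rank of `target` among all diseases by closure-Jaccard similarity to `profile` (1 = best)."""
    query = closure(profile)
    scores = {d: jaccard(query, closure(p)) for d, p in DISEASES.items()}
    return 1 + sum(1 for s in scores.values() if s > scores[target])

def drop_one(profile: set[str]) -> list[set[str]]:
    """Derived profiles with one annotation missing."""
    return [profile - {t} for t in profile]

def generalize(profile: set[str]) -> set[str]:
    """Derived profile with every term replaced by its direct parents."""
    return {p for t in profile for p in PARENTS[t]}

original = DISEASES["Disease X"]
missing_ranks = [rank_of("Disease X", d) for d in drop_one(original)]
general_rank = rank_of("Disease X", generalize(original))
print(missing_ranks, general_rank)  # [1, 1] 2
```

Dropping one annotation still leaves Disease X at rank 1, while generalizing every term lets Disease Y overtake it, which matches the qualitative pattern on these two slides.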
Annotation Sufficiency Score
http://www.phenotips.org
http://www.monarchinitiative.org
Problem: fewer than 40% of human genes are annotated with phenotypes (GWAS + ClinVar + OMIM)
A multi-species inventory of phenotypes from genetic perturbations
Genotype → Phenotype
• Human: ALMS1(NM_015120.4)[c.10775delC] + [-] → obesity, diabetes mellitus, insulin resistance
• Mouse: B6.Cg-Alms1foz/foz/J → increased weight, adipose tissue volume, glucose homeostasis altered
• Zebrafish: kcnj11c14/c14; insrt143/+(AB) → increased food intake, hyperglycemia, insulin resistance
Down Syndrome Mouse
Ts65Dn mice survive to adulthood and express some characteristics of Down syndrome such as developmental delay, hyperactivity, weight problems, craniofacial dysmorphology, impaired learning, and behavior deficit
Each species uniquely covers a different set of phenotypes
This provides an opportunity to use model organism data to inform the study of human disease
Human and model organism phenotypes can be linked to >75% of human genes
Problem: Clinical and model phenotypes are described differently
[Figure: "lung" and related terms in the Mammalian Phenotype ontology (e.g., abnormal lung morphology, abnormal pulmonary alveolus morphology), Mouse Anatomy, FMA, and human development ontologies, linked by is_a (SubClassOf), part_of, develops_from, and surrounded_by relations]
Problem: Each organism uses different vocabularies
Enhance lexical approach with OWL bridging axioms
• Key idea:
– Describe the phenotype in a machine-interpretable way
• Break it down into digestible chunks!
• Logical definition
– The machine will then be able to help you:
• Match phenotypes
• Automate ontology checking and addition of new terms
• Approach:
– Use the Web Ontology Language (OWL), a description logic, to describe phenotypes
– Use OWL reasoning to find connections
Mungall et al. (2012) Genome Biology 13(1):R5; Köhler et al. (2014) F1000Research 2:30; Haendel et al. (2014) JBMS 5:21
Hoehndorf et al. (2011) NAR 39(18):e119; Hoehndorf et al. (2011) Bioinformatics 27(7):1001
[Figure: MP and HP terms defined using Uberon (anatomy), CL (cell types), and PATO (qualities)]
Example logical definitions (MP/HP):
‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some amount
‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some ‘reduced amount’
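The value of such entity–quality definitions is that subsumption between phenotype classes follows from subsumption between their parts: since ‘reduced amount’ is_a amount, the second definition is subsumed by the first. An OWL reasoner performs this inference in general; the sketch below is only a simplified structural check for this one definition pattern, using toy is_a hierarchies that are illustrative and not the real CL or PATO structure.

```python
from typing import NamedTuple, Optional

# Toy is_a hierarchies (child -> parent); illustrative only, not the real CL/PATO structure.
ENTITY_PARENT: dict[str, Optional[str]] = {
    "type B pancreatic cell": "endocrine cell", "endocrine cell": "cell", "cell": None}
QUALITY_PARENT: dict[str, Optional[str]] = {
    "reduced amount": "amount", "increased amount": "amount", "amount": None}

def is_a(term: Optional[str], ancestor: str, parent_map: dict[str, Optional[str]]) -> bool:
    """True if `ancestor` is reachable from `term` by following is_a links (reflexive)."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent_map[term]
    return False

class EQ(NamedTuple):
    """'abnormal phenotype' and has_entity some <entity> and has_quality some <quality>."""
    entity: str
    quality: str

def subsumes(general: EQ, specific: EQ) -> bool:
    """For this pattern, general subsumes specific if both entity and quality are subsumed."""
    return (is_a(specific.entity, general.entity, ENTITY_PARENT)
            and is_a(specific.quality, general.quality, QUALITY_PARENT))

general_def = EQ("type B pancreatic cell", "amount")
specific_def = EQ("type B pancreatic cell", "reduced amount")
print(subsumes(general_def, specific_def))  # True
print(subsumes(specific_def, general_def))  # False
```

Because MP and HP terms are defined against the same shared anatomy, cell-type, and quality ontologies, the same check links a mouse term to a human term even when their labels share no words.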
Monarch Cross-Species Similarity
PhenomeDrug
Computational methods that use phenotypes to predict drug targets, drug effects, and drug indications
Animal models provide insight into on-target effects
• For the majority of the 100 best-selling drugs ($148B in US sales alone), there is a direct correlation between the knockout phenotype of the drug's target and the drug's effect
• Immunological indications
– Anti-histamines (Claritin, Allegra, Zyrtec)
– KO of the histamine H1 receptor leads to decreased responsiveness of the immune system
– Predicts on-target effects: drowsiness, reduced anxiety
Zambrowicz and Sands. Nat Rev Drug Disc. 2003.
Identifying drug targets from mouse knock-out phenotypes
[Figure: a drug inhibits a human gene; a non-functional (null) model of the gene's mouse ortholog shows phenotypes similar to the drug's effects]
Main idea: if a drug's effects match the phenotypes of a null model, this suggests that the drug is an inhibitor of the gene
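The main idea translates into a ranking procedure: score every mouse null model against the drug's effect profile, map each mouse gene to its human ortholog, and rank the human genes. A minimal sketch; the gene identifiers, ortholog table, and phenotype sets are hypothetical, and `coverage` is a placeholder for a real semantic similarity measure.

```python
def coverage(drug_effects: set[str], model_phenotypes: set[str]) -> float:
    """Placeholder score: fraction of drug effects also observed in the null model."""
    return len(drug_effects & model_phenotypes) / len(drug_effects) if drug_effects else 0.0

def rank_targets(drug_effects, knockouts, mouse_to_human, score=coverage):
    """Rank human genes as candidate inhibitor targets by the similarity between the
    drug's effects and the phenotypes of null models of their mouse orthologs."""
    scores: dict[str, float] = {}
    for mouse_gene, phenotypes in knockouts.items():
        human_gene = mouse_to_human.get(mouse_gene)
        if human_gene is None:  # no known human ortholog: skip
            continue
        s = score(drug_effects, phenotypes)
        scores[human_gene] = max(scores.get(human_gene, 0.0), s)  # keep the best model per gene
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data.
drug_effects = {"hepatitis", "hepatomegaly", "erythema"}
knockouts = {"mGeneA": {"hepatitis", "hepatomegaly"},
             "mGeneB": {"erythema"},
             "mGeneC": {"obesity"}}
orthologs = {"mGeneA": "GENEA", "mGeneB": "GENEB"}
ranking = rank_targets(drug_effects, knockouts, orthologs)
print(ranking)
```

Passing the scoring function as a parameter keeps the ortholog mapping and ranking independent of the similarity measure, so an information-weighted measure can be dropped in without other changes.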
Terminological Interoperability (we must compare apples with apples)
[Figure: mouse phenotypes (Mammalian Phenotype Ontology, PhenomeNet) and drug effects (mappings from UMLS to DO, NBO, MP) combined in PhenomeDrug]
Semantic Similarity
Given a drug effect profile D and a mouse model M, we compute the semantic similarity as an information-weighted Jaccard metric.
The similarity measure is asymmetric: it determines how much of the information in a drug effect profile D is covered by a set of mouse model phenotypes M.
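One plausible reading of an asymmetric, information-weighted Jaccard measure is the share of the IC in D's is_a closure that also lies in M's closure. This is an assumption about the form, not the published definition; the hierarchy, labels, and IC values below are hypothetical.

```python
# Toy hierarchy (child -> parents) and IC values; all labels and numbers are hypothetical.
PARENTS = {
    "abnormal phenotype": [],
    "liver phenotype": ["abnormal phenotype"],
    "hepatomegaly": ["liver phenotype"],
    "liver inflammation": ["liver phenotype"],
    "skin phenotype": ["abnormal phenotype"],
    "erythema": ["skin phenotype"],
}
IC = {"abnormal phenotype": 0.0, "liver phenotype": 1.0, "hepatomegaly": 3.0,
      "liver inflammation": 2.5, "skin phenotype": 1.0, "erythema": 2.0}

def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def closure(terms: set[str]) -> set[str]:
    return set().union(*(ancestors(t) for t in terms))

def coverage_sim(d: set[str], m: set[str]) -> float:
    """Share of the information in d's closure that also appears in m's closure (asymmetric)."""
    cd, cm = closure(d), closure(m)
    total = sum(IC[t] for t in cd)
    return sum(IC[t] for t in cd & cm) / total if total else 0.0

drug_effects = {"liver inflammation", "hepatomegaly", "erythema"}
knockout = {"hepatomegaly"}
print(coverage_sim(drug_effects, knockout))  # 4 / 9.5, about 0.421
print(coverage_sim(knockout, drug_effects))  # 1.0
```

The asymmetry matters for target identification: a knockout can explain part of a drug's effects (like the partial percentages reported for diclofenac) without every knockout phenotype being a drug effect.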
Loss of function models predict targets of inhibitor drugs
• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor–target pairs
– 0.76 ROC AUC for human targets (DrugBank)
– 0.81 ROC AUC for mouse targets (STITCH)
• Diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), swelling of the liver (hepatomegaly), and redness of the skin (erythema)
– 49% explained by PPARg knockout
• Peroxisome proliferator-activated receptor gamma (PPARg) regulates metabolism, proliferation, inflammation and differentiation
• Diclofenac is a known inhibitor
– 46% explained by COX-2 knockout
• Diclofenac is a known inhibitor
Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25
Computational Drug Repurposing
• Similarity
– Guilt by association
– If drug i is similar to drug j, and drug i treats disease x, then drug j may treat disease x
• Complementarity
– If the signature of drug i complements/counters the signature of disease x, then drug i may treat disease x
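The similarity ("guilt by association") rule above can be sketched as: for every sufficiently similar drug pair, propose each drug's known indications for the other drug. A minimal sketch; the drug names, disease names, similarity scores, and the 0.5 threshold are hypothetical.

```python
def guilt_by_association(similarity: dict[tuple[str, str], float],
                         indications: dict[str, set[str]],
                         threshold: float = 0.5) -> dict[str, set[str]]:
    """If drug i is similar to drug j and i treats disease x, propose x for j.
    `similarity` holds one score per unordered pair and is treated as symmetric."""
    candidates: dict[str, set[str]] = {}
    for (i, j), s in similarity.items():
        if s < threshold:
            continue
        for a, b in ((i, j), (j, i)):  # apply the rule in both directions
            for disease in indications.get(a, set()) - indications.get(b, set()):
                candidates.setdefault(b, set()).add(disease)
    return candidates

# Hypothetical data.
similarity = {("drug_i", "drug_j"): 0.8, ("drug_i", "drug_k"): 0.2}
indications = {"drug_i": {"disease x"}, "drug_j": {"disease y"}}
print(guilt_by_association(similarity, indications))
```

Only the pair above the threshold contributes, so drug_k receives no candidates; the complementarity rule needs a different comparison, since it looks for opposing rather than shared signatures.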
PhenomeDrug: phenotypic complementarity
• Extends the idea to match opposing drug–disease phenotypes
– Drugs that induce hypotension may be useful in treating hypertension
• Problem: We don't have any information about phenotypic complementarity
– We generated over 300 antonym pairs for the Human Phenotype Ontology
– Developed a measure to compute phenotypic complementarity
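A simple way to use antonym pairs is to "flip" each drug effect to its opposite and then ask how many disease phenotypes the flipped profile covers. This is a hedged illustration, not the complementarity measure developed in PhenomeDrug; the two antonym pairs are illustrative stand-ins for the >300 generated pairs.

```python
# Illustrative antonym pairs between phenotype labels (not the generated HPO antonym set).
ANTONYM_PAIRS = {"Hypotension": "Hypertension", "Bradycardia": "Tachycardia"}
# Symmetric lookup so either member of a pair maps to the other.
ANTONYM = {**ANTONYM_PAIRS, **{b: a for a, b in ANTONYM_PAIRS.items()}}

def complementarity(drug_effects: set[str], disease_phenotypes: set[str]) -> float:
    """Fraction of disease phenotypes that some drug effect directly opposes."""
    if not disease_phenotypes:
        return 0.0
    opposed = {ANTONYM[e] for e in drug_effects if e in ANTONYM}
    return len(opposed & disease_phenotypes) / len(disease_phenotypes)

print(complementarity({"Hypotension", "Headache"}, {"Hypertension", "Tachycardia"}))  # 0.5
```

A fuller version would flip terms and then apply an ontology-aware similarity (e.g., over is_a closures) rather than exact label matching, so that opposing but more specific terms still count.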
Phenotype-Based Drug Repurposing
Preliminary Results
• Suggest that for some well annotated diseases, we recapitulate top drug candidates
• Quality of drug annotation is an issue
– Some drugs have insufficient annotations to find "good" matches
• Full assessment underway
• Pulmonary Arterial Hypertension
Summary
• Ontologies provide the structure and semantics by which phenotypes can be accurately represented and computed with
• Measures of semantic similarity in combination with terminological integration enable a broad diversity of ontology-based analyses, including:
– Diagnosis of rare diseases
– Identifying human drug targets
– Drug repositioning
Acknowledgements
Dumontier Lab
• Tanya Hiebert
• Joachim Baran
PhenomeDrug
• Robert Hoehndorf
• George Gkoutos
Monarch Initiative
• Melissa Haendel
• Peter Robinson
• Chris Mungall
• the Monarch Team
Website: http://dumontierlab.com Presentations: http://slideshare.com/micheldumontier