making the most of phenotypes in ontology-based biomedical knowledge discovery

Making the most of phenotypes in ontology-based biomedical knowledge discovery 1 Michel Dumontier, Ph.D. Associate Professor of Medicine (Biomedical Informatics) Stanford University @micheldumontier::Biostats:19-02-15

Upload: michel-dumontier

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Making the most of phenotypes in ontology-based biomedical knowledge discovery

Making the most of phenotypes in ontology-based biomedical knowledge discovery

Michel Dumontier, Ph.D.

Associate Professor of Medicine (Biomedical Informatics)
Stanford University

@micheldumontier::Biostats:19-02-15

Page 2

Topics

• Computable Phenotypes
• Methods to compare Phenotypes
• Cross-Species Phenotype Integration
• Applications
– Undiagnosed Diseases
– Drug Target Identification
– Drug Repurposing

Page 3

Phenotypes

• A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior.
– Qualitative; covers both normal and abnormal phenotypes
– Examples: red eye color, abnormal gait, enlarged colon

Page 4

Diagnosis uses observable/measured phenotypes

“Phenotypic Profile”

Page 5

Matching patients to diseases

Patient

Disease X

Differential diagnosis with similar but non-matching phenotypes is difficult

Flat back of head Hypotonia

Abnormal skull morphology Decreased muscle mass

Page 6

Differential diagnosis becomes challenging with rare and complex disorders

• Over 7000 rare diseases
• Prevalence: < 1 in 1500-2500
• Most have fewer than 50 case reports
• Nearly 1 in 10 Americans suffer from one or more rare diseases
• Only 250 medicinal products have been approved to diagnose and treat rare diseases

Carpenter Syndrome
- acrocephalopolysyndactyly (ACPS) disorder
- 40 cases described in the literature
- <1 in 1M

Page 7

Genotypes + Phenotypes Improve Diagnosis

Remove off-target variants, common variants, and variants not in known disease-causing genes

http://compbio.charite.de/PhenIX/

Target panel of 2741 known Mendelian disease genes

Compare phenotype profiles from: ClinVar, OMIM, Orphanet

Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123
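The variant-filtering step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the PhenIX implementation: the `Variant` fields, the 1% allele-frequency cutoff, and the gene names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool      # falls inside the captured target region
    allele_freq: float   # population allele frequency


def filter_variants(variants, disease_genes, max_freq=0.01):
    """Keep rare, on-target variants in known disease genes.

    max_freq is an illustrative cutoff, not a value taken from the slide.
    """
    return [
        v for v in variants
        if v.on_target and v.allele_freq <= max_freq and v.gene in disease_genes
    ]


# Hypothetical variants; gene names are placeholders.
variants = [
    Variant("GENE_A", True, 0.0001),   # rare, on target, known gene: kept
    Variant("GENE_A", True, 0.20),     # common: removed
    Variant("GENE_B", False, 0.0001),  # off target: removed
    Variant("GENE_C", True, 0.0001),   # not a known disease gene: removed
]
kept = filter_variants(variants, disease_genes={"GENE_A", "GENE_B"})
print([v.gene for v in kept])
```

Only the surviving variants would then be ranked by comparing phenotype profiles.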

Page 8

PhenIX helped diagnose 11/40 patients

Page 9

So how did they do it?

1. Computable representation of phenotypes
2. Methods to compare phenotype profiles
3. Using model organisms to increase coverage of the phenotype space

Page 10

Difficult to find all results using text searches

Page 11

The Human Phenotype Ontology: A Computable Representation of Human Phenotypes

11,000+ classes

Follows the True Path Rule

Used to annotate:
• Patients
• Disorders/Diseases
• Genes, Gene Variants, & Genotypes

Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
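The True Path Rule says that an annotation to a term implies an annotation to every ancestor of that term. A minimal sketch of that propagation follows; the toy hierarchy and term labels are illustrative only and are not taken from the HPO release.

```python
def ancestors(term, parents):
    """Return the term plus every ancestor reachable via is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents):
    """Apply the true path rule: close a set of annotations under is_a."""
    closed = set()
    for term in annotations:
        closed |= ancestors(term, parents)
    return closed


# Toy is_a hierarchy (child -> parents); labels are illustrative only.
parents = {
    "flat occiput": ["abnormal skull morphology"],
    "abnormal skull morphology": ["abnormality of the skeletal system"],
    "abnormality of the skeletal system": ["phenotypic abnormality"],
}
profile = propagate({"flat occiput"}, parents)
print(sorted(profile))
```

Closing profiles this way is what lets a specific patient term match a more general disease annotation.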

Page 12

HPO has unique terms

Winnenburg and Bodenreider, ISMB PhenoDay, 2014

Page 13

Increased numbers of diseases are described using the HPO

Phenotype annotations per species

http://www.monarchinitiative.org

Page 14

Phenotype “BLAST”: Which phenotypic profile is most similar?

Disease X

Patient

Disease Y

Page 15

PhenoTips: Getting high-quality patient phenotypes

Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347

Page 16

Semantic Similarity

• Semantic similarity is a metric defined over a set of terms, where the distance between them is based on their meaning.

• It can be estimated by examining, for instance,
– Topological similarity
– Information content
– Statistical co-occurrence

• Widely used in bioinformatics for gene enrichment, function prediction, network screening, clustering, etc.

Page 17

= X

similarity

Page 18

Measures of Semantic Similarity

Edge-Based Measures
– Shortest path (Rada)
– Common path
– Scaling by depth, etc.
• Requires uniform distribution of nodes and edges

Node-based Measures
– Shared terms
– Common ancestors
– Information content (IC)
• Better able to account for structural heterogeneity

Set comparisons
• Pairwise
– Max/average/sum
– All or best pairs
• Groupwise
– Set, graph, vector
– Various combinations

Implementations
– Semanticmeasureslibrary.org
– OWL-SIM

Semantic Similarity in Biomedical Ontologies. PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
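As an illustration of the simplest edge-based measure, the sketch below counts is_a edges on the shortest path between two terms (a Rada-style distance). The toy hierarchy is an assumption for the example, and the function is not drawn from either implementation listed on the slide.

```python
from collections import deque


def shortest_path_length(a, b, parents):
    """Number of is_a edges on the shortest path between a and b."""
    # Build an undirected adjacency map from child -> parents edges.
    adj = {}
    for child, ps in parents.items():
        for p in ps:
            adj.setdefault(child, set()).add(p)
            adj.setdefault(p, set()).add(child)
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two terms are not connected


# Toy hierarchy (child -> parents); labels are illustrative only.
parents = {
    "abnormal gait": ["abnormal behavior"],
    "hyperactivity": ["abnormal behavior"],
    "abnormal behavior": ["phenotype"],
    "enlarged colon": ["abnormal morphology"],
    "abnormal morphology": ["phenotype"],
}
d_close = shortest_path_length("abnormal gait", "hyperactivity", parents)
d_far = shortest_path_length("abnormal gait", "enlarged colon", parents)
print(d_close, d_far)
```

A shorter path means more similar terms, which is only meaningful when edges represent comparable semantic steps; that is the uniformity caveat noted on the slide.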

Page 19

Term specificity

Corpus-based:

$$ic(t) = -\log P(t)$$

Structure-based:

$$ic(t) = \mathit{depth}(t) \times \left(1 - \frac{\log(\mathit{desc}(t) + 1)}{\log(\mathit{totalterms})}\right)$$

by: Heiko Muller, CSIRO
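Both notions of term specificity can be computed directly. The sketch below assumes the corpus-based form is the negative log of a term's annotation frequency and the structure-based form is depth scaled by one minus the log-ratio of descendants to total terms; the counts are made up for the example.

```python
import math


def ic_corpus(term, annotation_counts, total_annotations):
    """Corpus-based IC: -log of the fraction of annotations at or below the term."""
    p = annotation_counts[term] / total_annotations
    return -math.log(p)


def ic_structure(depth, n_descendants, total_terms):
    """Structure-based IC: depth(t) * (1 - log(desc(t) + 1) / log(total_terms))."""
    return depth * (1 - math.log(n_descendants + 1) / math.log(total_terms))


# Hypothetical annotation counts (already propagated to ancestors).
counts = {"phenotypic abnormality": 1000, "hypotonia": 10}
ic_root = ic_corpus("phenotypic abnormality", counts, 1000)  # root: no information
ic_leaf = ic_corpus("hypotonia", counts, 1000)               # rarer term: more information
ic_deep_leaf = ic_structure(depth=3, n_descendants=0, total_terms=11000)
print(ic_root, ic_leaf, ic_deep_leaf)
```

In both forms, general terms near the root score low and specific terms score high.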

Page 20

Group-wise Similarity

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

$$J(g_1, g_2) = \frac{6}{11} = 0.55$$
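The Jaccard index is the size of the intersection divided by the size of the union. The term sets below are hypothetical placeholders chosen only so that 6 terms are shared out of 11 in total, matching the slide's numbers; the actual terms behind g1 and g2 are not in the transcript.

```python
def jaccard(a, b):
    """Jaccard index of two term sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder term identifiers: 6 shared terms, 11 terms in the union.
g1 = {f"t{i}" for i in range(1, 10)}   # t1..t9
g2 = {f"t{i}" for i in range(4, 12)}   # t4..t11
score = jaccard(g1, g2)
print(round(score, 2))
```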

Page 21

Group-wise Semantic Similarity

IC(g1) = 10.66
IC(g2) = 9.79
IC(g1 ⊕ g2) = 2.79
-------------------
sim(g1, g2) = 0.27

$$sim(g_1, g_2) = \frac{1}{2}\left(\frac{IC(g_1 \oplus g_2)}{IC(g_1)} + \frac{IC(g_1 \oplus g_2)}{IC(g_2)}\right)$$

X. Chen et al. Gene. 2012. 509(1):131-5
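A worked check of the numbers on this slide, reading the formula as the average of the combined IC relative to each group's own IC, and assuming the second value (9.79) belongs to g2 even though the slide prints the label IC(g1) twice:

```latex
\[
sim(g_1, g_2)
  = \frac{1}{2}\left(\frac{2.79}{10.66} + \frac{2.79}{9.79}\right)
  \approx \frac{1}{2}\,(0.262 + 0.285)
  \approx 0.27
\]
```

Under that reading, the result agrees with the 0.27 reported on the slide.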

Page 22

Robustness of phenotype annotations

Page 23

Image credit: Viljoen and Beighton, J Med Genet. 1992

Schwartz-Jampel Syndrome, Type I
• Caused by mutation of HSPG2, which encodes a proteoglycan
• ~100 phenotype annotations

Page 24

Similarity of Schwartz-Jampel Syndrome derivations

Page 25

Semantic similarity is robust in the face of missing information

92% of derived profiles are most similar to original disease profile

[Plots: Profile Similarity; Derived Profile Rank]

Page 26

Semantic similarity algorithms are sensitive to specificity of information

The more general the phenotype, the poorer the match to the disease

[Plots: Profile Similarity; Derived Profile Rank]

Page 27

Annotation Sufficiency Score

http://www.phenotips.org
http://www.monarchinitiative.org

Page 28

Problem: fewer than 40% of human genes are annotated with phenotypes

GWAS + ClinVar + OMIM

Page 29

A multi-species inventory of phenotypes from genetic perturbations

GENOTYPE: ALMS1(NM_015120.4)[c.10775delC] + [-]
PHENOTYPE: obesity, diabetes mellitus, insulin resistance

GENOTYPE: B6.Cg-Alms1foz/fox/J
PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered

GENOTYPE: kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE: increased food intake, hyperglycemia, insulin resistance

Page 30

Down Syndrome Mouse

Ts65Dn mice survive to adulthood and express some characteristics of Down syndrome such as developmental delay, hyperactivity, weight problems, craniofacial dysmorphology, impaired learning, and behavior deficit

Page 31

Each species uniquely covers a different set of phenotypes

Provides an opportunity to use this information to inform human disease

Page 32

Human and model phenotypes can be linked to >75% of human genes

Page 33

Problem: Clinical and model phenotypes are described differently

Page 34

Problem: Each organism uses different vocabularies

[Figure: the lung and related terms as represented in different ontologies]
• Panels: Mammalian Phenotype, Mouse Anatomy, FMA, Human development
• Edge types: is_a (SubClassOf), part_of, develops_from, surrounded_by
• Phenotype terms: abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology
• Anatomy and development terms: organ system, respiratory system, lower respiratory tract, lung, alveolus, alveolar sac, pulmonary acinus, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity, lung bud, respiratory primordium, pharyngeal region

Page 35

MP, HP

Page 36

Enhance lexical approach with OWL bridging axioms

• Key idea:
– Describe the phenotype in a machine-interpretable way
  • Break it down into digestible chunks!
  • Logical definition
– The machine will then be able to help you
  • Match phenotypes
  • Automate ontology checking and addition of new terms
• Approach:
– Use the Web Ontology Language (OWL), a description logic, to describe phenotypes
– Use OWL reasoning to find connections

Mungall et al. (2012). Genome Biology, 13(1), R5
Köhler et al. (2014) F1000Research 2:30
Haendel et al. (2014) JBMS 5:21
Hoehndorf et al. (2011). NAR 39(18):e119
Hoehndorf et al. (2011) Bioinformatics 27(7):1001

Page 37

MP

Uberon (Anatomy)

CL (cell types)

Page 38

PATO (qualities)

Page 39

MP, HP

‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some ‘amount’

‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some ‘reduced amount’
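The two entity-quality definitions on this slide differ only in the quality, so a reasoner can infer that the 'reduced amount' phenotype is a kind of the 'amount' phenotype. The sketch below mimics that inference with plain dictionaries; it is a toy stand-in for OWL reasoning, and the is_a edges are assumed for the example rather than loaded from CL or PATO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An entity-quality phenotype description."""
    entity: str
    quality: str


# Toy is_a edges (child -> parent); a real system would read these from CL and PATO.
QUALITY_PARENT = {"reduced amount": "amount", "increased amount": "amount"}
ENTITY_PARENT = {}


def is_a(term, ancestor, parent_of):
    """True if term equals ancestor or reaches it by following parent links."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent_of.get(term)
    return False


def subsumes(general, specific):
    """True if every phenotype matching `specific` also matches `general`."""
    return (is_a(specific.entity, general.entity, ENTITY_PARENT)
            and is_a(specific.quality, general.quality, QUALITY_PARENT))


general = EQ("type B pancreatic cell", "amount")
specific = EQ("type B pancreatic cell", "reduced amount")
print(subsumes(general, specific), subsumes(specific, general))
```

Because both vocabularies are decomposed into the same entity and quality ontologies, terms from different species ontologies become comparable.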

Page 40

Monarch Cross-Species Similarity

Page 41

PhenomeDrug

Computational methods that use phenotypes to predict drug targets, drug effects, and drug indications

Page 42

Animal models provide insight into on-target effects

• In the majority of the 100 best-selling drugs ($148B in US alone), there is a direct correlation between knockout phenotype and drug effect
• Immunological Indications
– Anti-histamines (Claritin, Allegra, Zyrtec)
– KO of histamine H1 receptor leads to decreased responsiveness of immune system
– Predicts on-target effects: drowsiness, reduced anxiety

Zambrowicz and Sands. Nat Rev Drug Disc. 2003.

Page 43

Identifying drug targets from mouse knock-out phenotypes

[Diagram labels: drug, effects, inhibits, human gene, ortholog, gene, non-functional gene model, phenotypes, similar]

Main idea: if a drug’s phenotypes match the phenotypes of a null model, this suggests that the drug is an inhibitor of the gene

Page 44

Terminological Interoperability (we must compare apples with apples)

Mouse Phenotypes

Drug effects (mappings from UMLS to DO, NBO, MP)

Mammalian Phenotype Ontology

PhenomeNet

PhenomeDrug

Page 45

Semantic Similarity

Given a drug effect profile D and a mouse model M, we compute the semantic similarity as an information-weighted Jaccard metric.

The similarity measure used is non-symmetrical and determines the amount of information about a drug effect profile D that is covered by a set of mouse model phenotypes M.
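One plausible reading of a non-symmetrical, information-weighted measure is the share of the drug profile's total information content that also appears in the mouse model. The sketch below implements that reading; the exact formula in the cited paper may differ, and the IC values and term sets are invented for the example.

```python
def coverage_similarity(drug_terms, model_terms, ic):
    """IC-weighted share of the drug effect profile covered by the model.

    Assumes both profiles are already closed under the true path rule.
    """
    drug_terms, model_terms = set(drug_terms), set(model_terms)
    total = sum(ic[t] for t in drug_terms)
    if total == 0:
        return 0.0
    shared = sum(ic[t] for t in drug_terms & model_terms)
    return shared / total


# Hypothetical information content values and profiles.
ic = {"hepatitis": 4.0, "hepatomegaly": 3.0, "erythema": 3.0, "abnormal liver": 1.0}
D = {"hepatitis", "hepatomegaly", "erythema"}   # drug effect profile
M = {"hepatitis", "abnormal liver"}             # mouse model phenotypes
sim_dm = coverage_similarity(D, M, ic)
sim_md = coverage_similarity(M, D, ic)
print(sim_dm, sim_md)
```

Swapping the arguments changes the denominator, which is why the two directions give different scores.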

Page 46

Loss of function models predict targets of inhibitor drugs

• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor-target pairs
– 0.76 ROC AUC for human targets (DrugBank)
– 0.81 ROC AUC for mouse targets (STITCH)
• diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), swelling of liver (hepatomegaly), redness of skin (erythema)
– 49% explained by PPARg knockout
  • peroxisome proliferator activated receptor gamma (PPARg) regulates metabolism, proliferation, inflammation and differentiation
  • Diclofenac is a known inhibitor
– 46% explained by COX-2 knockout
  • Diclofenac is a known inhibitor

Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25

Page 47

Computational Drug Repurposing

• Similarity
– Guilt by association
– If drug i is similar to drug j, and drug i treats disease x, then drug j may treat disease x
• Complementarity
– If the signature of drug i complements/counters the signature of disease x, then drug i may treat disease x

Page 48

PhenomeDrug: phenotypic complementarity

• Extends the idea to match opposing drug-disease phenotypes
– Drugs that induce hypotension may be useful in treating hypertension
• Problem: We don’t have any information about phenotypic complementarity
– We generated over 300 antonym pairs for the Human Phenotype Ontology
– Developed a measure to compute phenotypic complementarity
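To make the idea of phenotypic complementarity concrete, the sketch below scores a drug by the fraction of disease phenotypes for which the drug induces the antonym. This is a simplified stand-in, not the measure developed for PhenomeDrug, and the two antonym pairs are examples rather than entries from the actual list of 300+.

```python
# Example antonym pairs; the real resource holds over 300 such pairs.
ANTONYMS = {"hypotension": "hypertension", "hypoglycemia": "hyperglycemia"}
# Make the lookup symmetric so either member of a pair finds the other.
OPPOSITE = {**ANTONYMS, **{v: k for k, v in ANTONYMS.items()}}


def complementarity(drug_effects, disease_phenotypes):
    """Fraction of disease phenotypes countered by an opposite drug effect."""
    drug_effects = set(drug_effects)
    disease_phenotypes = set(disease_phenotypes)
    if not disease_phenotypes:
        return 0.0
    countered = {p for p in disease_phenotypes if OPPOSITE.get(p) in drug_effects}
    return len(countered) / len(disease_phenotypes)


score = complementarity(
    drug_effects={"hypotension", "drowsiness"},
    disease_phenotypes={"hypertension", "headache"},
)
print(score)
```

A fuller measure would presumably weight each countered phenotype by its information content, in line with the similarity measures earlier in the deck.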

Page 49

Phenotype-Based Drug Repurposing

Page 50

Preliminary Results

• Results suggest that for some well-annotated diseases, we recapitulate top drug candidates
• Quality of drug annotation is an issue
– Some drugs have insufficient annotations to find “good” matches

• Full assessment underway

• Pulmonary Arterial Hypertension

Page 51

Summary

• Ontologies provide the structure and semantics by which phenotypes can be accurately represented and computed with

• Measures of semantic similarity in combination with terminological integration enable a broad diversity of ontology-based analyses, including
– Diagnosis of rare diseases
– Identifying human drug targets
– Drug repositioning

Page 52

Acknowledgements

Dumontier Lab
• Tanya Hiebert
• Joachim Baran

PhenomeDrug
• Robert Hoehndorf
• George Gkoutos

Monarch Initiative
• Melissa Haendel
• Peter Robinson
• Chris Mungall
• the Monarch Team

Page 53

[email protected]

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier