
Post on 15-Apr-2017


Making the most of phenotypes in ontology-based biomedical

knowledge discovery

Michel Dumontier, Ph.D.

Associate Professor of Medicine (Biomedical Informatics)
Stanford University

@micheldumontier::Biostats:19-02-15


Topics

• Computable Phenotypes
• Methods to compare Phenotypes
• Cross-Species Phenotype Integration
• Applications
– Undiagnosed Diseases
– Drug Target Identification
– Drug Repurposing


Phenotypes

• A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior.
– Qualitative; covers both normal and abnormal phenotypes
– Examples: red eye color, abnormal gait, enlarged colon


Diagnosis uses observable/measured phenotypes

“Phenotypic Profile”


Matching patients to diseases

Patient

Disease X

Differential diagnosis with similar but non-matching phenotypes is difficult

Flat back of head Hypotonia

Abnormal skull morphology Decreased muscle mass


Differential diagnosis becomes challenging with rare and complex disorders

• Over 7000 rare diseases
• Prevalence < 1 in 1500-2500
• Most have fewer than 50 case reports
• Nearly 1 in 10 Americans suffer from one or more rare diseases
• Only 250 medicinal products have been approved to diagnose and treat rare diseases

Carpenter Syndrome
- acrocephalopolysyndactyly (ACPS) disorder
- 40 cases described in the literature
- <1 in 1M


Genotypes + Phenotypes Improve Diagnosis

Remove off-target, common variants, and variants not in known disease causing genes

http://compbio.charite.de/PhenIX/

Target panel of 2741 known Mendelian disease genes

Compare phenotype profiles from: ClinVar, OMIM, Orphanet

Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123


PhenIX helped diagnose 11/40 patients


So how did they do it?

1. Computable representation of phenotypes
2. Methods to compare phenotype profiles
3. Using model organisms to increase coverage of the phenotype space


Difficult to find all results using text searches


The Human Phenotype Ontology: A Computable Representation of Human Phenotypes

11,000+ classes

Follows the True Path Rule

Used to annotate:
• Patients
• Disorders/Diseases
• Genes, Gene Variants, & Genotypes

Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.


HPO has unique terms

Winnenburg and Bodenreider, ISMB PhenoDay, 2014


Increased numbers of diseases are described using the HPO

Phenotype annotations per species

http://www.monarchinitiative.org


Phenotype “BLAST”: Which phenotypic profile is most similar?

Disease X

Patient

Disease Y
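The "BLAST" idea can be sketched as ranking candidate disease profiles by their similarity to the patient profile. The sketch below is illustrative only: the `PARENTS` hierarchy is a hypothetical toy graph (not the real HPO), and a plain Jaccard index over ancestor closures stands in for whatever measure a production tool would use. It reuses the patient and Disease X phenotypes from the earlier matching slide to show that profiles with no exact term in common can still be ranked.

```python
# Toy is_a hierarchy (child -> parents). Hypothetical; not the real HPO graph.
PARENTS = {
    "flat back of head": ["abnormal skull morphology"],
    "abnormal skull morphology": ["phenotypic abnormality"],
    "hypotonia": ["abnormality of the musculature"],
    "decreased muscle mass": ["abnormality of the musculature"],
    "abnormality of the musculature": ["phenotypic abnormality"],
    "enlarged colon": ["abnormality of the digestive system"],
    "abnormality of the digestive system": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}

def closure(profile):
    """Return every term in the profile plus all ancestors of each term."""
    seen, stack = set(), list(profile)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen

def similarity(profile_a, profile_b):
    """Jaccard index over the ancestor closures of two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank(patient, diseases):
    """Sort disease names by decreasing similarity to the patient."""
    scored = [(name, similarity(patient, terms)) for name, terms in diseases.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

patient = {"flat back of head", "hypotonia"}
diseases = {
    "Disease X": {"abnormal skull morphology", "decreased muscle mass"},
    "Disease Y": {"enlarged colon"},
}
ranking = rank(patient, diseases)
print(ranking)
```

In this toy setup the patient shares no exact term with Disease X, yet Disease X ranks first because the two profiles share ancestors.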


PhenoTips: Getting high quality patient phenotypes

Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347


Semantic Similarity

• Semantic similarity is a metric defined over a set of terms, where the distance between them is based on their meaning.

• It can be estimated by examining, for instance,
– Topological similarity
– Information content
– Statistical co-occurrence

• Widely used in bioinformatics for gene enrichment, function prediction, network screening, clustering, etc.


[Figure: similarity = X]


Measures of Semantic Similarity

Edge-based measures
– Shortest path (Rada)
– Common path
– Scaling by depth, etc.
• Require a uniform distribution of nodes and edges

Node-based measures
– Shared terms
– Common ancestors
– Information content (IC)
• Better able to account for structural heterogeneity

Set comparisons
• Pairwise
– Max/average/sum
– All or best pairs
• Groupwise
– Set, graph, vector
– Various combinations

Implementations
– Semanticmeasureslibrary.org
– OWL-SIM

Semantic Similarity in Biomedical Ontologies. PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
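To make the edge-based versus node-based distinction concrete, here is a minimal sketch on a hypothetical toy hierarchy (the term names and edges are assumptions for illustration, not taken from any real ontology). `shortest_path` counts is_a edges between two terms, in the spirit of the shortest-path measure; `common_ancestors` returns the shared ancestors that node-based measures build on.

```python
from collections import deque

# Toy is_a hierarchy (child -> parents); names are illustrative only.
PARENTS = {
    "flat occiput": ["abnormal skull morphology"],
    "abnormal skull morphology": ["abnormal head morphology"],
    "abnormal head morphology": ["phenotypic abnormality"],
    "hypotonia": ["abnormal muscle physiology"],
    "abnormal muscle physiology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def shortest_path(a, b):
    """Edge-based: number of is_a edges on the shortest undirected path."""
    neighbours = {t: set(ps) for t, ps in PARENTS.items()}
    for t, ps in PARENTS.items():
        for p in ps:
            neighbours[p].add(t)
    queue, dist = deque([a]), {a: 0}
    while queue:
        t = queue.popleft()
        if t == b:
            return dist[t]
        for n in neighbours[t]:
            if n not in dist:
                dist[n] = dist[t] + 1
                queue.append(n)
    return None  # the two terms are not connected

def common_ancestors(a, b):
    """Node-based: ancestors shared by both terms."""
    return ancestors(a) & ancestors(b)

print(shortest_path("flat occiput", "hypotonia"))
print(common_ancestors("flat occiput", "hypotonia"))
```

The path length depends on how finely each branch happens to be modelled, which is one way to see why edge-based measures assume a uniform distribution of nodes and edges.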


Term specificity

Corpus-based

$$ic(t) = -\log P(t)$$

Structure-based

$$ic(t) = \mathrm{depth}(t) \times \left(1 - \frac{\log(\mathrm{desc}(t) + 1)}{\log(\mathrm{total\ terms})}\right)$$

by: Heiko Muller, CSIRO
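Both notions of term specificity can be computed on a small example. The sketch below assumes a hypothetical five-term hierarchy and a hypothetical annotation corpus; `depth` is taken as the shortest distance to the root and P(t) as the fraction of annotated entities carrying t or one of its descendants, both of which are assumptions rather than details stated on the slide.

```python
import math

# Hypothetical toy hierarchy (child -> parents).
PARENTS = {"R": [], "A": ["R"], "B": ["R"], "A1": ["A"], "A2": ["A"]}

# Hypothetical corpus: entity -> directly annotated terms.
ANNOTATIONS = {"d1": {"A1"}, "d2": {"A1", "B"}, "d3": {"A2"}, "d4": {"B"}}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    out = {term}
    for parent in PARENTS[term]:
        out |= ancestors(parent)
    return out

def depth(term):
    """Shortest number of is_a edges from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(depth(p) for p in parents)

def n_descendants(term):
    """Count terms that have `term` as a proper ancestor."""
    return sum(1 for t in PARENTS if t != term and term in ancestors(t))

def ic_structure(term):
    """Structure-based: depth(t) * (1 - log(desc(t) + 1) / log(total terms))."""
    return depth(term) * (1 - math.log(n_descendants(term) + 1) / math.log(len(PARENTS)))

def ic_corpus(term, annotations):
    """Corpus-based: -log P(t), counting annotations to t or its descendants."""
    hits = sum(
        1 for terms in annotations.values()
        if any(term in ancestors(t) for t in terms)
    )
    if hits == 0:
        return math.inf  # never observed in the corpus
    return -math.log(hits / len(annotations))

for term in ("R", "A", "A1"):
    print(term, round(ic_structure(term), 3), round(ic_corpus(term, ANNOTATIONS), 3))
```

Under both definitions the root carries no information and leaf terms carry the most.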


Group-wise Similarity


$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

$$J(g_1, g_2) = \frac{6}{11} = 0.55$$
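As a quick check of the arithmetic, the Jaccard index can be computed on two sets built to share 6 of 11 distinct terms. The term identifiers `T0` to `T10` are placeholders invented for this sketch; the slide does not list the actual terms.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder term identifiers: 8 terms in g1, 9 in g2, 6 shared, 11 in total.
g1 = {f"T{i}" for i in range(0, 8)}
g2 = {f"T{i}" for i in range(2, 11)}

print(round(jaccard(g1, g2), 2))  # 0.55
```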


Group-wise Semantic Similarity

IC(g1) = 10.66
IC(g2) = 9.79
IC(g1 ⊕ g2) = 2.79
-------------------
sim(g1, g2) = 0.27

$$\mathrm{sim}(g_1, g_2) = \frac{1}{2}\left(\frac{IC(g_1 \oplus g_2)}{IC(g_1)} + \frac{IC(g_1 \oplus g_2)}{IC(g_2)}\right)$$

X. Chen et al. Gene. 2012. 509(1):131-5
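The worked numbers on this slide can be reproduced directly, reading the formula as the average of the combined information content normalised by each profile's own information content, and treating 9.79 as the information content of the second profile. The function name `groupwise_sim` is made up for this sketch.

```python
def groupwise_sim(ic_combined, ic_g1, ic_g2):
    """Average of IC(g1 ⊕ g2) / IC(g1) and IC(g1 ⊕ g2) / IC(g2)."""
    return 0.5 * (ic_combined / ic_g1 + ic_combined / ic_g2)

# Values from the slide: IC(g1) = 10.66, IC(g2) = 9.79, IC(g1 ⊕ g2) = 2.79
print(round(groupwise_sim(2.79, 10.66, 9.79), 2))  # 0.27
```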


Robustness of phenotype annotations

Image credit: Viljoen and Beighton, J Med Genet. 1992

Schwartz-Jampel Syndrome, Type I
• Caused by a mutation in Hspg2, which encodes a proteoglycan
• ~100 phenotype annotations


Similarity of Schwartz-Jampel Syndrome derivations


Semantic similarity is robust in the face of missing information

92% of derived profiles are most similar to original disease profile

[Figure panels: Profile Similarity; Derived Profile Rank]


Semantic similarity algorithms are sensitive to specificity of information

The more general the phenotype, the poorer the match to the disease

[Figure panels: Profile Similarity; Derived Profile Rank]


Annotation Sufficiency Score

http://www.phenotips.org
http://www.monarchinitiative.org


Problem: less than 40% of human genes are annotated with phenotypes

GWAS + ClinVar + OMIM


A multi-species inventory of phenotypes from genetic perturbations

GENOTYPE
– B6.Cg-Alms1foz/fox/J
– ALMS1(NM_015120.4)[c.10775delC] + [-]
– kcnj11c14/c14; insrt143/+(AB)

PHENOTYPE
– increased weight, adipose tissue volume, glucose homeostasis altered
– obesity, diabetes mellitus, insulin resistance
– increased food intake, hyperglycemia, insulin resistance


Down Syndrome Mouse

Ts65Dn mice survive to adulthood and express some characteristics of Down syndrome such as developmental delay, hyperactivity, weight problems, craniofacial dysmorphology, impaired learning, and behavior deficit


Each species uniquely covers a different set of phenotypes

Provides an opportunity to use this information to inform human disease


Human and model phenotypes can be linked to >75% human genes


Problem: Clinical and model phenotypes are described differently


Problem: Each organism uses different vocabularies

[Figure: the concept "lung" as it appears in four vocabularies (Mammalian Phenotype, Mouse Anatomy, FMA, Human development), with terms linked by is_a (SubClassOf), part_of, develops_from, and surrounded_by relations. Terms shown: abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology; lung, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity; organ system, respiratory system, lower respiratory tract, alveolar sac, pulmonary acinus, lung alveolus; lung bud, respiratory primordium, pharyngeal region.]


[Figure labels: MP, HP]


Enhance lexical approach with OWL bridging axioms

• Key idea:
– Describe the phenotype in a machine-interpretable way
  • Break it down into digestible chunks!
  • Logical definition
– The machine will then be able to help you
  • Match phenotypes
  • Automate ontology checking and addition of new terms
• Approach:
– Use Web Ontology Language (OWL), a description logic, to describe phenotypes
– Use OWL reasoning to find connections

Mungall et al. (2012). Genome Biology, 13(1), R5
Köhler et al. (2014) F1000Research 2:30
Haendel et al. (2014) JBMS 5:21
Hoehndorf et al. (2011). NAR 39(18):e119
Hoehndorf et al. (2011) Bioinformatics 27(7):1001


MP

Uberon (anatomy)

CL (cell types)


PATO (qualities)


[Figure labels: MP, HP]

‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some amount

‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some ‘reduced amount’
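The two class expressions above differ only in the quality. A hand-rolled sketch of how a reasoner could relate them: represent each phenotype as an entity-quality pair and check subsumption component by component. Everything here is a simplified stand-in for real OWL reasoning; the `EQ` type, the single-parent hierarchies, and the parent of 'type B pancreatic cell' are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

class EQ(NamedTuple):
    """A phenotype decomposed into an affected entity and a quality."""
    entity: str
    quality: str

# Hypothetical single-parent fragments of a quality and a cell-type hierarchy.
QUALITY_PARENT = {"reduced amount": "amount", "increased amount": "amount", "amount": None}
ENTITY_PARENT = {"type B pancreatic cell": "pancreatic cell", "pancreatic cell": None}

def is_a(term: Optional[str], ancestor: str, parent_of: dict) -> bool:
    """Walk up the hierarchy from `term` and report whether `ancestor` is reached."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent_of.get(term)
    return False

def subsumes(general: EQ, specific: EQ) -> bool:
    """True when every instance of `specific` is also an instance of `general`."""
    return (is_a(specific.entity, general.entity, ENTITY_PARENT)
            and is_a(specific.quality, general.quality, QUALITY_PARENT))

any_amount = EQ("type B pancreatic cell", "amount")
reduced = EQ("type B pancreatic cell", "reduced amount")

print(subsumes(any_amount, reduced))  # True
print(subsumes(reduced, any_amount))  # False
```

Because 'reduced amount' is a kind of 'amount', the second expression is inferred to be a subclass of the first, which is the kind of connection that lets terms from different phenotype ontologies be matched.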


Monarch Cross-Species Similarity


PhenomeDrug

Computational methods that use phenotypes to predict drug targets, drug effects, and drug indications


Animal models provide insight into on-target effects

• In the majority of the 100 best-selling drugs ($148B in US alone), there is a direct correlation between knockout phenotype and drug effect
• Immunological indications
– Antihistamines (Claritin, Allegra, Zyrtec)
– KO of histamine H1 receptor leads to decreased responsiveness of immune system
– Predicts on-target effects: drowsiness, reduced anxiety

Zambrowicz and Sands. Nat Rev Drug Disc. 2003.


Identifying drug targets from mouse knock-out phenotypes

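[Figure: a drug and its effects are compared with a non-functional gene model and its phenotypes; where the two are similar, the drug is inferred to inhibit the human gene that is the ortholog of the model gene. Diagram labels: drug, effects, gene, phenotypes, human gene, non-functional gene model, ortholog, similar, inhibits.]

Main idea: if a drug’s phenotypes match the phenotypes of a null model, this suggests that the drug is an inhibitor of the gene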


Terminological Interoperability (we must compare apples with apples)

Mouse Phenotypes

Drug effects (mappings from UMLS to DO, NBO, MP)

Mammalian Phenotype Ontology

PhenomeNet

PhenomeDrug


Semantic Similarity

Given a drug effect profile D and a mouse model M, we compute the semantic similarity as an information-weighted Jaccard metric.

The similarity measure used is non-symmetrical and determines the amount of information about a drug effect profile D that is covered by a set of mouse model phenotypes M.
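One reading of this description that yields a non-symmetrical score is to sum the information content of the terms D shares with M and divide by the total information content of D alone. The normalisation, the function name `coverage_similarity`, and the IC values below are assumptions made for this sketch, not a formula quoted from the slides; both profiles are assumed to be already closed under ancestors.

```python
def coverage_similarity(drug_terms, model_terms, ic):
    """Fraction of the drug profile's information content covered by the model."""
    d, m = set(drug_terms), set(model_terms)
    total = sum(ic[t] for t in d)
    if total == 0:
        return 0.0
    return sum(ic[t] for t in d & m) / total

# Hypothetical information content values per phenotype term.
IC = {"hepatitis": 3.0, "hepatomegaly": 2.0, "erythema": 1.0, "obesity": 4.0}

drug_profile = {"hepatitis", "hepatomegaly", "erythema"}
model_profile = {"hepatitis", "hepatomegaly", "obesity"}

print(round(coverage_similarity(drug_profile, model_profile, IC), 3))  # D covered by M
print(round(coverage_similarity(model_profile, drug_profile, IC), 3))  # M covered by D
```

Swapping the arguments changes the denominator, so the score of D against M generally differs from the score of M against D.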


Loss of function models predict targets of inhibitor drugs

• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor-target pairs
– 0.76 ROC AUC for human targets (DrugBank)
– 0.81 ROC AUC for mouse targets (STITCH)
• diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), swelling of liver (hepatomegaly), redness of skin (erythema)
– 49% explained by PPARg knockout
  • peroxisome proliferator activated receptor gamma (PPARg) regulates metabolism, proliferation, inflammation and differentiation
  • Diclofenac is a known inhibitor
– 46% explained by COX-2 knockout
  • Diclofenac is a known inhibitor

Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25
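For readers less familiar with the validation metric, ROC AUC can be read as the probability that a known drug-target pair receives a higher similarity score than a pair not known to interact. A minimal pairwise implementation is sketched below; the scores and labels are invented for illustration and are unrelated to the reported 0.76 and 0.81.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented similarity scores for four candidate drug-target pairs;
# label 1 marks a known inhibitor-target pair.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```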


Computational Drug Repurposing

• Similarity
– Guilt by association
– If drug i is similar to drug j, and drug i treats disease x, then drug j may treat disease x
• Complementarity
– If the signature of drug i complements/counters the signature of disease x, then drug i may treat disease x


PhenomeDrug: phenotypic complementarity

• Extends the idea to match opposing drug-disease phenotypes
– Drugs that induce hypotension may be useful in treating hypertension
• Problem: We don’t have any information about phenotypic complementarity
– We generated over 300 antonym pairs for the Human Phenotype Ontology
– Developed a measure to compute phenotypic complementarity
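The slides do not define the complementarity measure, so the sketch below is only one plausible shape for it: given a table of antonym pairs, score a drug by the fraction of a disease's phenotypes that are countered by one of the drug's effects. The antonym table, the function name `complementarity`, and the scoring rule are all assumptions for illustration.

```python
# Hypothetical antonym pairs; the actual set of 300+ pairs is not shown in the slides.
ANTONYMS = {
    "hypotension": "hypertension",
    "bradycardia": "tachycardia",
    "hypoglycemia": "hyperglycemia",
}
# Make the lookup symmetric so either member of a pair finds its opposite.
OPPOSITE = {**ANTONYMS, **{v: k for k, v in ANTONYMS.items()}}

def complementarity(drug_effects, disease_phenotypes):
    """Fraction of disease phenotypes whose antonym is among the drug's effects."""
    effects = set(drug_effects)
    phenotypes = set(disease_phenotypes)
    if not phenotypes:
        return 0.0
    countered = {p for p in phenotypes if OPPOSITE.get(p) in effects}
    return len(countered) / len(phenotypes)

drug = {"hypotension", "drowsiness"}
disease = {"hypertension", "tachycardia"}
print(complementarity(drug, disease))  # 0.5
```

A fuller version would presumably weight terms by information content and use the ontology hierarchy, as the similarity measures earlier in the talk do.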


Phenotype-Based Drug Repurposing


Preliminary Results

• Suggest that for some well annotated diseases, we recapitulate top drug candidates

• Quality of drug annotation is an issue
– Some drugs have insufficient annotations to find “good” matches

• Full assessment underway

• Pulmonary Arterial Hypertension


Summary

• Ontologies provide the structure and semantics by which phenotypes can be accurately represented and computed with

• Measures of semantic similarity in combination with terminological integration enable a broad diversity of ontology-based analyses, including
– Diagnosis of rare diseases
– Identifying human drug targets
– Drug repositioning


Acknowledgements

Dumontier Lab
• Tanya Hiebert
• Joachim Baran

PhenomeDrug
• Robert Hoehndorf
• George Gkoutos

Monarch Initiative
• Melissa Haendel
• Peter Robinson
• Chris Mungall
• the Monarch Team


dumontierlab.com
michel.dumontier@stanford.edu

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
