Public Health Ontology 101 Mark A. Musen, M.D., Ph.D. Stanford Center for Biomedical Informatics Research Stanford University School of Medicine Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM


Post on 23-Feb-2016

Page 1: Public Health Ontology 101

Public Health Ontology 101

Mark A. Musen, M.D., Ph.D.

Stanford Center for Biomedical Informatics Research

Stanford University School of Medicine

Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM

Page 2: Public Health Ontology 101

Many Factors can Influence the Effectiveness of Outbreak Detection

• Progression of disease within individuals

• Population and exposure characteristics

• Surveillance system characteristics

From Buehler et al. EID 2003;9:1197-1204

Page 3: Public Health Ontology 101

Computational Challenges

• Access to data

• Interpretation of data

• Integration of data

• Identification of appropriate analytic methods

• Coordination of problem solving to address diverse data sources

• Determining what to report

Page 4: Public Health Ontology 101

Interpretation of data

• Clinical data useful for public health surveillance are often collected for other purposes (e.g., diagnostic codes for patient care, billing)

• Such data may be biased by a variety of factors
  – Desire to protect the patient
  – Desire to maximize reimbursement
  – Desire to satisfy administrative requirements with minimal effort

• Use of diagnostic codes is problematic because precise definitions generally are unknown, both to humans and to computers

Page 5: Public Health Ontology 101

A Small Portion of ICD9-CM

724     Unspecified disorders of the back
724.0   Spinal stenosis, other than cervical
724.00  Spinal stenosis, unspecified region
724.01  Spinal stenosis, thoracic region
724.02  Spinal stenosis, lumbar region
724.09  Spinal stenosis, other
724.1   Pain in thoracic spine
724.2   Lumbago
724.3   Sciatica
724.4   Thoracic or lumbosacral neuritis
724.5   Backache, unspecified
724.6   Disorders of sacrum
724.7   Disorders of coccyx
724.70  Unspecified disorder of coccyx
724.71  Hypermobility of coccyx
724.79  Coccygodynia
724.8   Other symptoms referable to back
724.9   Other unspecified back disorders

Page 6: Public Health Ontology 101

The combinatorial explosion

1970s ICD9: 8 codes

Page 7: Public Health Ontology 101

ICD10 (1999): 587 codes for such accidents

• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income

• W65.40 Drowning and submersion while in bathtub, street and highway, while engaged in sports activity

• X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

Page 8: Public Health Ontology 101

Syndromic Surveillance

• Requires enumeration of relevant “syndromes”

• Requires mapping of codes (usually in ICD9) to corresponding syndromes

• Is complicated by the difficulty of enumerating all codes that appropriately support each syndrome

• Is complicated by lack of consensus on what the “right” syndromes are in the first place
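
The mapping of diagnostic codes to syndromes described on this slide can be pictured as a lookup table. The following is a minimal sketch, not the method of any surveillance system named in this talk. The syndrome names, the assignment of codes to syndromes, and the sample visit list are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical syndrome definitions: each syndrome is a set of ICD-9-CM codes.
# Which codes belong to which syndrome is exactly what the slide says is hard
# to agree on, so these sets are illustrative assumptions only.
SYNDROME_CODES = {
    "respiratory": {"486", "487.1", "786.2"},
    "gastrointestinal": {"787.01", "787.91"},
}


def syndromes_for(code):
    """Return the sorted names of all syndromes whose code set contains `code`."""
    return sorted(name for name, codes in SYNDROME_CODES.items() if code in codes)


def count_syndromes(visit_codes):
    """Tally how many visits fall under each syndrome."""
    counts = Counter()
    for code in visit_codes:
        for syndrome in syndromes_for(code):
            counts[syndrome] += 1
    return counts


# Made-up visit data; 724.2 (Lumbago) maps to no syndrome in this sketch.
visits = ["486", "724.2", "787.91", "487.1"]
counts = count_syndromes(visits)
print(dict(counts))
```

Changing a syndrome definition here means editing a set of codes by hand, which illustrates why enumerating the codes that support each syndrome is laborious and error-prone.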

Page 9: Public Health Ontology 101

There is no consistency in how “syndromes” are defined or monitored

System                        Syndrome
ENCOMPASS                     “Respiratory illness with fever”
CDC MMWR 10/01                “Unexplained febrile illness associated with pneumonia”
RSVP, New Mexico              “Influenza-like illness”
Santa Clara County            “Flu-like symptoms”
Winter Olympics, Utah 2002    “Respiratory infection with fever” consensus definition

Page 10: Public Health Ontology 101

The solution to the terminology mess: Ontologies

• Machine-processable descriptions of what exists in some application area

• Allow computers to reason about
  – Concepts in the world
  – Attributes of concepts
  – Relationships among concepts

• Provide a foundation for
  – Intelligent computer systems
  – The Semantic Web

Page 11: Public Health Ontology 101

What Is An Ontology?

• The study of being

• A discipline co-opted by computer science to enable the explicit specification of
  – Entities
  – Properties and attributes of entities
  – Relationships among entities

• A theory that provides a common vocabulary for an application domain

Page 12: Public Health Ontology 101

Supreme genus: SUBSTANCE

Differentiae: material, immaterial

Subordinate genera: BODY, SPIRIT

Differentiae: animate, inanimate

Subordinate genera: LIVING, MINERAL

Differentiae: sensitive, insensitive

Proximate genera: ANIMAL, PLANT

Differentiae: rational, irrational

Species: HUMAN, BEAST

Individuals: Socrates, Plato, Aristotle, …

Porphyry’s depiction of Aristotle’s Categories

Page 13: Public Health Ontology 101
Page 14: Public Health Ontology 101
Page 15: Public Health Ontology 101
Page 16: Public Health Ontology 101

[Figure: Parts of the heart, from the Foundational Model of Anatomy, drawn as a graph with two kinds of links: Is-a and Part-of. Node labels shown: Heart; Cavity of Heart; Wall of Heart; Right Atrium; Cavity of Right Atrium; Wall of Right Atrium; Fossa Ovalis; Myocardium; Sinus Venarum; SA Node; Myocardium of Right Atrium; Cardiac Chamber; Hollow Viscus; Internal Feature; Organ Cavity; Organ Cavity Subdivision; Anatomical Spatial Entity; Anatomical Feature; Body Space; Organ Component; Organ Subdivision; Viscus; Organ Part; Organ; Anatomical Structure]
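
To make the two link types in this figure concrete, here is a minimal sketch of how Is-a and Part-of links might be stored and followed transitively. The individual edges below are assumptions loosely based on the node labels in the figure. They are not copied from the Foundational Model of Anatomy, and the function name `ancestors` is hypothetical.

```python
# Illustrative edges only: child -> list of parents under each relation.
IS_A = {
    "Right Atrium": ["Cardiac Chamber"],
    "Cardiac Chamber": ["Organ Part"],
    "Organ Part": ["Anatomical Structure"],
    "Heart": ["Organ"],
    "Organ": ["Anatomical Structure"],
}
PART_OF = {
    "Right Atrium": ["Heart"],
    "Wall of Right Atrium": ["Right Atrium"],
    "Myocardium of Right Atrium": ["Wall of Right Atrium"],
}


def ancestors(term, relation):
    """Return every term reachable from `term` by following `relation` links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in relation.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("Myocardium of Right Atrium", PART_OF)))
print(sorted(ancestors("Right Atrium", IS_A)))
```

Keeping the two relations in separate tables lets a program answer "what kind of thing is this?" and "what is this a part of?" independently.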

Page 17: Public Health Ontology 101

The FMA demonstrates that distinctions are not universal

• Blood is not a tissue, but rather a body substance (like saliva or sweat)

• The pericardium is not part of the heart, but rather an organ in and of itself

• Each joint, each tendon, each piece of fascia is a separate organ

These views are not shared by many anatomists!

Page 18: Public Health Ontology 101

Why develop an ontology?

• To share a common understanding of the entities in a given domain
  – among people
  – among software agents
  – between people and software

• To enable reuse of data and information
  – to avoid re-inventing the wheel
  – to introduce standards to allow interoperability and automatic reasoning

• To create communities of researchers

Page 19: Public Health Ontology 101

We really want ontologies in electronic form

• Ontology contents can be processed and interpreted by computers

• Interactive tools can assist developers in ontology authoring

Page 20: Public Health Ontology 101

The NCI Thesaurus in Protégé-OWL

Page 21: Public Health Ontology 101

Goals of Biomedical Ontologies

• To provide a classification of biomedical entities

• To annotate data to enable summarization and comparison across databases

• To provide for semantic data integration

• To drive natural-language processing systems

• To simplify the engineering of complex software systems

• To provide a formal specification of biomedical knowledge

Page 22: Public Health Ontology 101

Biosurveillance Data Sources Ontology

Page 23: Public Health Ontology 101

Ontology defines how data should be accessed from the database

Page 24: Public Health Ontology 101

Ontologies: Good news and bad news

• The Good news
  – Ontologies allow computers to “understand” definitions of concepts and to relate concepts to one another
  – Automated inheritance of attributes makes it very easy to add new concepts to an ontology over time
  – Ontologies can be developed in standard knowledge-representation languages that have wide usage

• The Bad news
  – Most current biomedical ontologies have been developed using non-standard languages
  – It’s still very hard to get people to agree about the content of proposed ontologies
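
The point about automated inheritance of attributes can be illustrated with a small sketch. The concept names are borrowed from labels that appear elsewhere in this talk. The attribute names and values, and the dictionary layout, are hypothetical and do not come from any particular ontology language.

```python
# Hypothetical concept hierarchy: each concept names one parent and the
# attributes it declares locally. Attribute names and values are made up.
CONCEPTS = {
    "Lab Result": {"parent": None, "attrs": {"has_units": True}},
    "Serum Chemistry": {"parent": "Lab Result", "attrs": {"specimen": "serum"}},
    "Electrolytes": {"parent": "Serum Chemistry", "attrs": {}},
}


def attributes(concept):
    """Merge attributes from the root down, so nearer concepts override."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = CONCEPTS[concept]["parent"]
    merged = {}
    for name in reversed(chain):
        merged.update(CONCEPTS[name]["attrs"])
    return merged


# Adding a new concept only requires naming its parent; everything declared
# higher in the hierarchy is inherited automatically.
CONCEPTS["Sodium"] = {"parent": "Electrolytes", "attrs": {"analyte": "Na"}}
print(attributes("Sodium"))
```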

Page 25: Public Health Ontology 101

Computational Challenges

• Access to data

• Interpretation of data

• Integration of data

• Identification of appropriate analytic methods

• Coordination of problem solving to address diverse data sources

• Determining what to report

Page 26: Public Health Ontology 101

The Medical Entities Dictionary (after Cimino)

[Figure: the MED shown together with four systems: Patient Registration, Clinical Laboratory, Radiology, and Pharmacy]

Page 27: Public Health Ontology 101

Ontologies for data integration

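[Figure: an ontology of patient data alongside three patient databases. Ontology labels shown: Lab Result; Hematology; Serum Chemistry; Electrolytes; Aminotransferases; Sodium; HCO3 (canonical data value). Database labels shown: Patient Database 1, Patient Database 2, Patient Database 3, with the local terms HCO3, Bicarbonate, Bicarb, and HCO3–]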
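
The idea in this figure, that several local names for the same laboratory test resolve to one canonical ontology term, can be sketched as follows. This is an illustration under stated assumptions: which database uses which local label, the database identifiers `db1` to `db3`, the `Na` label, and the numeric values are all invented, and the function names are hypothetical.

```python
# Hypothetical mapping from (source database, local test name) to the
# canonical term in the ontology of patient data.
CANONICAL = {
    ("db1", "Bicarbonate"): "HCO3",
    ("db2", "Bicarb"): "HCO3",
    ("db3", "HCO3-"): "HCO3",
    ("db1", "Na"): "Sodium",
}


def to_canonical(source, local_name):
    """Translate a source-specific test name into the shared ontology term."""
    try:
        return CANONICAL[(source, local_name)]
    except KeyError:
        raise KeyError(f"no mapping for {local_name!r} in {source!r}") from None


def integrate(records):
    """Group (source, local_name, value) records by canonical term."""
    merged = {}
    for source, local_name, value in records:
        merged.setdefault(to_canonical(source, local_name), []).append(value)
    return merged


# Made-up results from three databases that name the same test differently.
records = [
    ("db1", "Bicarbonate", 24.0),
    ("db2", "Bicarb", 22.5),
    ("db3", "HCO3-", 26.0),
]
merged = integrate(records)
print(merged)
```

Raising an error on an unmapped name, rather than passing it through, keeps unrecognized local terms from silently entering the integrated data.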

Page 28: Public Health Ontology 101

Computational Challenges

• Access to data

• Interpretation of data

• Integration of data

• Identification of appropriate analytic methods

• Coordination of problem solving to address diverse data sources

• Determining what to report

Page 29: Public Health Ontology 101

Different types of data require different types of problem solvers

• Are the data multivariate or univariate?

• Do the data involve temporal or spatial dimensions?

• Are the data categorical or probabilistic?

• Are the data acquired as a continuous stream or as a batch?

• Is it possible for temporal data to arrive out of order?

• What is the rate of data acquisition, and how much data needs to be processed?

Page 30: Public Health Ontology 101

An ontology of problem solvers for aberrancy detection

[Figure: diagram of problem solvers for temporal aberrancy detection. Labels shown, in no particular order: Aberrancy Detection (Temporal); Aberrancy Detection (Control Chart); Residual-Based; Obtain Baseline Data; Obtain Current Observation; Database Query; Transform Data; Outlier Removal; Smoothing; Estimate Model Parameters; Mean, StDev; GLM Model Fitting; Trend Estimation; ARIMA Model Fitting; Signal Processing Filter; Forecast; Empirical Forecasting; Moving Average; GLM Forecasting; ARIMA Forecasting; Generalized Exponential Smoothing; Compute Expectation; Constant (theory-based); Compute Test Value; Compute Residual; Raw Residual; Z-Score; Evaluate Test Value; Evaluate Residual; Binary Alarm; Layered Alarm; EWMA; Cumulative Sum; P-Value]
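
One path through the labels on this slide (baseline data, Mean and StDev, residual, Z-Score, Binary Alarm) can be sketched in a few lines. This is a simplified illustration and not BioSTORM code. The function name, the alarm threshold of 3.0, and the sample counts are assumptions.

```python
from statistics import mean, stdev


def z_score_alarm(baseline, current, threshold=3.0):
    """Return (z, alarm) for `current` judged against the baseline counts.

    The steps follow labels on the slide: estimate model parameters
    (mean, StDev), compute the expectation, compute the residual, scale
    it to a z-score, and raise a binary alarm above the threshold.
    The default threshold is an assumption, not a recommended setting.
    """
    expected = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    if spread == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    residual = current - expected
    z = residual / spread
    return z, z > threshold


# Made-up daily counts for a single syndrome.
baseline = [10, 12, 11, 9, 13, 10, 12]
z_high, alarm_high = z_score_alarm(baseline, 25)
z_low, alarm_low = z_score_alarm(baseline, 12)
print(round(z_high, 2), alarm_high)
print(round(z_low, 2), alarm_low)
```

Other labels on the slide, such as EWMA or Cumulative Sum, name alternative ways to evaluate the test value and would replace the final comparison.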

Page 31: Public Health Ontology 101

BioSTORM: A Prototype Next-Generation Surveillance System

• Developed at Stanford, initially with funding from DARPA, now from CDC

• Provides a test bed for evaluating alternative data sources and alternative problem solvers

• Demonstrates
  – Use of ontologies for data acquisition and data integration
  – Use of a high-performance computing system for scalable data analysis

Page 32: Public Health Ontology 101

[Figure: BioSTORM Data Flow. Labels shown: Data Sources; Heterogeneous Input Data; Data Regularization Middleware; Data Broker; Data Source Ontology; Semantically Uniform Data; Data Mapper; Mapping Ontology; Customized Output Data; Epidemic Detection Problem Solvers; Control Structure]

Page 33: Public Health Ontology 101

[Figure: Data Broker and Data Source Ontology. Labels shown: Distributed Data Sources; Heterogeneous Data Input; Data Broker; Data Source Ontology; Semantically Uniform Data Objects]
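
As a rough illustration of the data broker idea, the sketch below uses a declarative description of each data source to turn differently shaped input rows into one uniform object type. It is not BioSTORM's implementation or API. The source names, field names, date formats, the `Observation` class, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Observation:
    """A semantically uniform data object (hypothetical shape)."""
    source: str
    when: date
    concept: str
    count: int


# Hypothetical stand-in for a data source ontology: for each source, which
# raw field carries each piece of meaning and how dates are written.
SOURCE_DESCRIPTIONS = {
    "ed_visits": {
        "date_field": "visit_dt", "date_format": "%Y-%m-%d",
        "concept_field": "syndrome", "count_field": "n",
    },
    "pharmacy": {
        "date_field": "SaleDate", "date_format": "%m/%d/%Y",
        "concept_field": "ProductClass", "count_field": "Units",
    },
}


def broker(source, raw_row):
    """Convert one raw row into an Observation using the source description."""
    desc = SOURCE_DESCRIPTIONS[source]
    when = datetime.strptime(raw_row[desc["date_field"]], desc["date_format"]).date()
    return Observation(
        source=source,
        when=when,
        concept=raw_row[desc["concept_field"]],
        count=int(raw_row[desc["count_field"]]),
    )


a = broker("ed_visits", {"visit_dt": "2004-01-15", "syndrome": "respiratory", "n": "7"})
b = broker("pharmacy", {"SaleDate": "01/15/2004", "ProductClass": "cough remedy", "Units": 31})
print(a)
print(b)
```

In this sketch, adding a new data source means adding a description entry rather than writing new conversion code.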

Page 34: Public Health Ontology 101

Biosurveillance Data Sources Ontology

Page 35: Public Health Ontology 101

Ontology defines how data should be accessed from the database

Page 36: Public Health Ontology 101

[Figure: Data Broker and Data Source Ontology. Labels shown: Distributed Data Sources; Heterogeneous Data Input; Data Broker; Data Source Ontology; Semantically Uniform Data Objects]

Page 37: Public Health Ontology 101

[Figure: Data Mapper and Mapping Ontology. Labels shown: Semantically Uniform Data Objects; Data Mapper; Mapping Ontology; Data Source Ontology; Input–Output Ontology; Customized Data Objects; Problem Solver]
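
The data mapper step can be sketched in the same spirit: a declarative mapping states how the fields of a uniform data object are renamed to the inputs a given problem solver expects. The solver names, field names, and sample object below are hypothetical, and the sketch is not taken from BioSTORM.

```python
# Hypothetical mapping descriptions: for each problem solver, how fields of a
# semantically uniform object are renamed to that solver's expected inputs.
MAPPINGS = {
    "z_score_solver": {"when": "day", "count": "value"},
    "spatial_solver": {"when": "timestamp", "count": "cases", "region": "zip"},
}


def map_for_solver(solver, uniform_obj):
    """Build the customized input record for `solver` from a uniform object."""
    mapping = MAPPINGS[solver]
    missing = [field for field in mapping if field not in uniform_obj]
    if missing:
        raise KeyError(f"uniform object lacks fields: {missing}")
    return {target: uniform_obj[source] for source, target in mapping.items()}


# Made-up uniform data object.
uniform = {"when": "2004-01-15", "count": 7, "region": "94305", "concept": "respiratory"}
temporal_input = map_for_solver("z_score_solver", uniform)
spatial_input = map_for_solver("spatial_solver", uniform)
print(temporal_input)
print(spatial_input)
```

Because each solver's input is described separately, the same uniform object can feed several problem solvers without changing the solvers themselves.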

Page 38: Public Health Ontology 101

[Figure: Varying Problem Solvers. Labels shown: Semantically Uniform Data Objects; Data Mapper; Mapping Ontologies; Input–Output Ontologies; Customized Data Objects; Problem Solvers]

Page 39: Public Health Ontology 101

An ontology of problem solvers for aberrancy detection

[Figure: diagram of problem solvers for temporal aberrancy detection. Labels shown, in no particular order: Aberrancy Detection (Temporal); Aberrancy Detection (Control Chart); Residual-Based; Obtain Baseline Data; Obtain Current Observation; Database Query; Transform Data; Outlier Removal; Smoothing; Estimate Model Parameters; Mean, StDev; GLM Model Fitting; Trend Estimation; ARIMA Model Fitting; Signal Processing Filter; Forecast; Empirical Forecasting; Moving Average; GLM Forecasting; ARIMA Forecasting; Generalized Exponential Smoothing; Compute Expectation; Constant (theory-based); Compute Test Value; Compute Residual; Raw Residual; Z-Score; Evaluate Test Value; Evaluate Residual; Binary Alarm; Layered Alarm; EWMA; Cumulative Sum; P-Value]

Page 40: Public Health Ontology 101

[Figure: BioSTORM Data Flow. Labels shown: Data Sources; Heterogeneous Input Data; Data Regularization Middleware; Data Broker; Data Source Ontology; Semantically Uniform Data; Data Mapper; Mapping Ontology; Customized Output Data; Epidemic Detection Problem Solvers; Control Structure]

Page 41: Public Health Ontology 101

Computational Challenges

• Access to data

• Interpretation of data

• Integration of data

• Identification of appropriate analytic methods

• Coordination of problem solving to address diverse data sources

• Determining what to report

Page 42: Public Health Ontology 101

We need to address the challenges of automating surveillance

• Current surveillance systems
  – Require major reprogramming to add new data sources or new analytic methods
  – Lack the ability to select data sources and analytic methods dynamically based on problem-solving requirements
  – Ignore qualitative data and qualitative relationships
  – Will not scale up to the requirements of handling huge data feeds

• The existing health information infrastructure
  – Is all too often paper-based
  – Uses 19th-century techniques for encoding knowledge about clinical conditions and situations
  – Remains fragmented, hindering data access and communication

Page 43: Public Health Ontology 101

The National Center for Biomedical Ontology

• One of three National Centers for Biomedical Computing launched by NIH in 2005

• Collaboration of Stanford, Berkeley, Mayo, Buffalo, Victoria, UCSF, Oregon, and Cambridge

• Primary goal is to make ontologies accessible and usable

• Research will develop technologies for ontology dissemination, indexing, alignment, and peer review

Page 44: Public Health Ontology 101

Our Center offers

• Technology for uploading, browsing, and using biomedical ontologies

• Methods to make the online “publication” of ontologies more like that of journal articles

• Tools to enable the biomedical community to put ontologies to work on a daily basis

Page 45: Public Health Ontology 101

http://bioportal.bioontology.org

Page 46: Public Health Ontology 101

Local Neighborhood view

Browsing/Visualizing Ontologies

Page 47: Public Health Ontology 101
Page 48: Public Health Ontology 101

BioPortal will experiment with new models for

• Dissemination of knowledge on the Web

• Integration and alignment of online content

• Knowledge visualization and cognitive support

• Peer review of online content

Page 49: Public Health Ontology 101

BioPortal is building an online community of users who

• Develop, upload, and apply ontologies

• Map ontologies to one another

• Comment on ontologies via “marginal notes” to give feedback
  – To the ontology developers
  – To one another

• Make proposals for specific changes to ontologies

• Stay informed about ontology changes and proposed changes via active feeds

Page 50: Public Health Ontology 101

Public Health Ontology 101

Mark A. Musen, M.D., Ph.D.

Stanford Center for Biomedical Informatics Research

Stanford University School of Medicine

Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM

Page 51: Public Health Ontology 101