National Cancer Institute - nkos.slis.kent.edu/2008workshop/DeniseWarzel.pdf
National Cancer Institute National Cancer Institute CENDI/NKOS New Dimensions in Knowledge Organization Systems Semantic Interoperability in caBIG™ Leveraging Vocabulary, Metadata Registries and Models September 11, 2008 Denise Warzel Associate Director, Core Infrastructure Program NCI Center for Biomedical Informatics and Information Technology (CBIIT)


Page 1:

CENDI/NKOS New Dimensions in Knowledge Organization Systems

Semantic Interoperability in caBIG™

Leveraging Vocabulary, Metadata Registries and Models

September 11, 2008

Denise Warzel, Associate Director, Core Infrastructure Program

NCI Center for Biomedical Informatics and Information Technology (CBIIT)

Page 2:

Agenda

• caBIG™ Semantic Interoperability Infrastructure

• Why bother?

• NCI’s approach: Terminology + Metadata + Information Models

“The whole is bigger than the sum of the parts”

• Issues relative to implementation (discussion)

Page 3:

Interoperability

• in·ter·op·er·a·bil·i·ty: ability of a system...to use the parts or equipment of another system. [Source: Merriam-Webster web site]

• interoperability: ability of two or more systems or components to exchange information and to use the information that has been exchanged. [Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990]

– Semantic interoperability

– Syntactic interoperability

Page 4:

Why Bother?

• Enabling Discovery requires we get more out of data

– reuse of data outside of primary or original context

– integrate data from disparate sources

– interoperability between systems

• Computable Unambiguous meaning (semantics)

• Computable Unambiguous syntax

– Metadata Registries + Terminologies + Information Models

• Simple programs = primary use of data within the immediate original context

– spreadsheet/statistical packages

– classic closed world RDBMS

– file server/web server

– E.g. Clinical Trial System, Patient Care, Image Analysis

Page 5:

D. Warzel

caBIG™ Semantic Infrastructure

Enterprise Vocabulary

Data Elements

Information Models

Page 6:

Preferred Name

Synonyms

Definition

Relationships

Concept Code

Description Logic
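The concept attributes listed above can be pictured as a simple record. The following is a minimal Python sketch, not the actual vocabulary service data model: the class name `Concept`, the helper `find_by_term`, and the sample synonym and definition text are hypothetical and used only for illustration. The code C1708 is the one shown for Agent and Drug elsewhere in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One vocabulary concept, with the attributes named on the slide."""
    concept_code: str            # stable identifier, e.g. "C1708"
    preferred_name: str
    synonyms: tuple = ()
    definition: str = ""
    relationships: tuple = ()    # (relation name, target concept code) pairs


def find_by_term(concepts, term):
    """Return concepts whose preferred name or a synonym equals term, ignoring case."""
    wanted = term.casefold()
    return [
        c for c in concepts
        if wanted == c.preferred_name.casefold()
        or wanted in (s.casefold() for s in c.synonyms)
    ]


# Illustrative values only: the synonym and definition text are assumptions.
agent = Concept(
    concept_code="C1708",
    preferred_name="Agent",
    synonyms=("Drug",),
    definition="(definition text would go here)",
)

matches = find_by_term([agent], "drug")
print([c.concept_code for c in matches])
```

Looking terms up through the concept code, rather than comparing names directly, is what lets two different labels resolve to the same meaning.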

Page 7:

caGrid is based on Object Oriented Principles

Agent (C1708)

+NSCNumber : char(idl)
+isCMAPAgent : boolean(idl)
+EVSId : string(idl)
+comment : string(idl)
+source : string(idl)
+name : string(idl)

Drug (C1708)

+tradeName : string(idl)
+genericName : string(idl)
+fdaApprovalDate : string(idl)
+NDCCode : string(idl)
+fdaCode : string(idl)

Agent/Drug C1708 = CDE 2223866 v3.0

+NSCNumber : char(idl)
+isCMAPAgent : boolean(idl)
+EVSId : string(idl)
+comment : string(idl)
+source : string(idl)
+name : string(idl)
+tradeName : string(idl)
+genericName : string(idl)
+fdaApprovalDate : string(idl)
+NDCCode : string(idl)

Semantic Signature

if Agent.NSCNumber = Drug.fdaCode

caBIG Interoperability by Objects and CDE
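One way to read the diagram: attributes from different classes can be treated as the same thing when they carry the same semantic signature (concept code plus Common Data Element). The sketch below illustrates that reading in Python. The class names `SemanticSignature` and `Attribute` and the function `same_meaning` are hypothetical, and binding both `NSCNumber` and `fdaCode` to CDE 2223866 v3.0 is an assumption drawn from the slide, not a statement about the actual registry content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticSignature:
    """Hypothetical annotation: concept code plus the CDE an attribute is bound to."""
    concept_code: str
    cde_public_id: int
    cde_version: str


@dataclass(frozen=True)
class Attribute:
    owner: str                     # class name, e.g. "Agent"
    name: str                      # attribute name, e.g. "NSCNumber"
    signature: SemanticSignature


def same_meaning(a: Attribute, b: Attribute) -> bool:
    """Treat two attributes as interchangeable when their signatures are equal."""
    return a.signature == b.signature


# Identifiers from the slide; which attributes share them is an assumption.
shared = SemanticSignature("C1708", 2223866, "3.0")
nsc_number = Attribute("Agent", "NSCNumber", shared)
fda_code = Attribute("Drug", "fdaCode", shared)

# Placeholder signature (public id 0) for an attribute bound to some other CDE.
trade_name = Attribute("Drug", "tradeName", SemanticSignature("C1708", 0, "1.0"))

print(same_meaning(nsc_number, fda_code))    # True
print(same_meaning(nsc_number, trade_name))  # False
```

The comparison never looks at the attribute names, which is the point: `NSCNumber` and `fdaCode` match because of their registered metadata, not because of how they are spelled.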

Page 8:

caBIG™ Community Involvement

• caGrid itself provides no real “data” or “analysis” to caBIG™; it is the enabling infrastructure that allows the community to do so

• Community members add value to the grid as applications, services, and processes (for example: shared workflows)

– caGrid provides the necessary core services, APIs, and tooling

• The real value of the grid comes from bringing this information to the end user

• Community members develop end user applications that consume the resources provided by the grid

Page 9:

Semantics Artifacts in caGrid

Diagram labels: Service; Service API; Core Services; Client; XSD; WSDL; Grid Service; Service Definition; Data Type Definitions; Grid Client; Client API; Object Definitions; Objects; XML; Cancer Data Standards Repository; Global Model Exchange (GME)

Relationship labels: Registered In; Semantically Described In; Serialize To; Validates Against; Client Uses

Page 10:

caGrid Production Environment

Page 11:

Building caBIG™ Compatible Systems

Diagram labels: legacy data; Enterprise Vocabulary Services; Data Standards Repository; Scientific Research; Clinical Trials; Data Elements; Vocabulary for CDE Specification; Domain Object Metadata; Public/Grid APIs; Terminology Node; Domain Object Model; Discovery Services; Verify Credentials; Model Annotations

Page 12:

The ISO 11179 Model and Terminology Linkage in caDSR

Page 13:

ISO 11179 ‘Grammar’

Object Class + Property = Data Element Concept (What is it?)

Value Domain, with its Representation (How do you want to represent it?)

Data Element Concept + Value Domain = Data Element (Common Data Element)

Example labels: Agent; Name; Text; FDA Agent Name; FDA Agent Name Text

Value Domain: FDA Agent Name (String, Max=40)

Example values: Anethole Trithione, Cyclooxygenase Inhibitor, Ginger, Green Tea, Iloprost, Taxol, … Ursodiol

caDSR & ISO 11179 Training, Jennifer Brush, Dianne Reeves
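The grammar on this slide composes naturally into nested records. The following Python sketch is one possible rendering under stated assumptions: the class and method names (`DataElementConcept`, `ValueDomain`, `DataElement`, `accepts`, `validate`) are hypothetical, and pairing "Agent" + "Name" with the "FDA Agent Name" value domain is a reading of the slide's example labels, not the caDSR schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """'What is it?': an object class combined with a property."""
    object_class: str
    property_name: str

    @property
    def name(self) -> str:
        return f"{self.object_class} {self.property_name}"


@dataclass(frozen=True)
class ValueDomain:
    """'How do you want to represent it?': representation, length limit, allowed values."""
    name: str
    representation: str
    max_length: int
    permissible_values: frozenset

    def accepts(self, value: str) -> bool:
        return len(value) <= self.max_length and value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    """Data Element Concept + Value Domain = Data Element."""
    concept: DataElementConcept
    value_domain: ValueDomain

    def validate(self, value: str) -> bool:
        return self.value_domain.accepts(value)


agent_name = DataElementConcept("Agent", "Name")
fda_agent_name = ValueDomain(
    name="FDA Agent Name",
    representation="Text",
    max_length=40,  # "String, Max=40" on the slide
    permissible_values=frozenset({
        "Anethole Trithione", "Cyclooxygenase Inhibitor", "Ginger",
        "Green Tea", "Iloprost", "Taxol", "Ursodiol",
    }),
)
cde = DataElement(agent_name, fda_agent_name)

print(agent_name.name)          # Agent Name
print(cde.validate("Taxol"))    # True
print(cde.validate("Aspirin"))  # False
```

Keeping the concept ("what") separate from the value domain ("how") means the same Data Element Concept can be paired with a different value domain without redefining its meaning.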

Page 14:

NCI Extension: 11179 Grammar + Concepts

Object Class + Property = Data Element Concept (What is it?)

Value Domain, with its Representation (How do you want to represent it?)

Data Element Concept + Value Domain = Data Element (Common Data Element)

Example labels: Agent; Name; Text; FDA Agent Name; FDA Agent Name Text

Concept codes: C1708:C42614:C25704; C17237:C1708.C42614

Enterprise Vocabulary Services: NCI Thesaurus, NCI Metathesaurus

Example values: Anethole Trithione, Cyclooxygenase Inhibitor, Ginger, Green Tea, Iloprost, Taxol, … Ursodiol

Value concept codes: C246, C1323, C2691, C2694, C48397, C1411, … C1818

Zebrafish

Page 15:

No Controlled Terminology?

No computable interoperability

• Systems cannot automatically exchange or use information if they use incompatible codes or tokens to store the data

• Linkage of metadata to terminology concept assures consistent meaning across the enterprise

• Metadata registry information can enable automated mappings/transformations between different tokens for the same code
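The last bullet can be illustrated with a small translation step: if each system's local tokens are registered against shared concept codes, one system's token can be converted to another's by going through the code. This is a minimal Python sketch, not a caDSR or EVS API. The function `build_translator`, the local token lists, and the codes "HYP-1" to "HYP-3" are all hypothetical placeholders.

```python
def build_translator(source_tokens, target_tokens):
    """Map source-system tokens to target-system tokens through shared concept codes.

    Both arguments are dicts of {local token: concept code}.
    """
    code_to_target = {code: token for token, code in target_tokens.items()}

    def translate(token):
        code = source_tokens.get(token)
        if code is None or code not in code_to_target:
            raise KeyError(f"no shared concept for token {token!r}")
        return code_to_target[code]

    return translate


# Hypothetical local code lists; "HYP-n" stands in for real concept codes.
system_a = {"Taxol": "HYP-1", "Green Tea": "HYP-2"}
system_b = {"TAX": "HYP-1", "GT": "HYP-2", "URS": "HYP-3"}

a_to_b = build_translator(system_a, system_b)
print(a_to_b("Taxol"))      # TAX
print(a_to_b("Green Tea"))  # GT
```

Without the shared codes, each pair of systems would need its own hand-built token-to-token table. With them, each system only has to register its tokens once.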

Page 16:

Challenges

• Challenges relative to implementation when building disparate systems that can still interoperate

– Information Modeling vs Domain Modeling (representing context)

– Resolution of concepts from different terminologies: are they the same concept or different concepts?

– Many vocabularies aren’t available programmatically

– Many vocabularies don’t contain identifiers

– Vocabulary concepts usually aren’t versioned

– Vocabulary vs Metadata? (e.g. code sets, permissible value sets) - people get confused