Evaluating Ontology-Mapping Tools: Requirements and Experience

Natalya F. Noy

Mark A. Musen

Stanford Medical Informatics

Stanford University

Types Of Ontology Tools

There is not just ONE class of ONTOLOGY TOOLS

Ontology Tools

Development Tools: Protégé-2000, OntoEdit, OilEd, WebODE, Ontolingua

Mapping Tools: PROMPT, ONION, OBSERVER, Chimaera, FCA-Merge, GLUE

Evaluation Parameters for Ontology-Development Tools

Interoperability with other tools
  Ability to import ontologies from other languages
  Ability to export ontologies to other languages

Expressiveness of the knowledge model

Scalability

Extensibility

Availability and capabilities of inference services

Usability of tools

Evaluation Parameters For Ontology-Mapping Tools

We can try to reuse the evaluation parameters for development tools, but:

Development tools: similar tasks, inputs, and outputs

Mapping tools: different tasks, inputs, and outputs

Development Tools

Input: domain knowledge, ontologies to reuse, requirements

Task: create an ontology

Output: domain ontology

Mapping Tools: Tasks

C = Merge(A, B): iPROMPT, Chimaera

Map(A, B): Anchor-PROMPT, GLUE, FCA-Merge

Articulation ontology between A and B: ONION

Mapping Tools: Inputs

iPROMPT: classes, slots and facets

Chimaera: classes, slots and facets

GLUE: classes, instance data

FCA-Merge: classes, shared instances

OBSERVER: classes, DL definitions

Mapping Tools: Outputs and User Interaction

GUI for interactive merging: iPROMPT, Chimaera

Lists of pairs of related terms: Anchor-PROMPT, GLUE, FCA-Merge

List of articulation rules: ONION

Can We Compare Mapping Tools?

Yes, we can!

We can compare tools in the same group.

How do we define a group?

Architectural Comparison Criteria

Input requirements
  Ontology elements
    Used for analysis
    Required for analysis
  Modeling paradigm
    Frame-based
    Description Logic

Level of user interaction
  Batch mode
  Interactive

User feedback
  Required?
  Used?

Architectural Criteria (cont’d)

Type of output
  Set of rules
  Ontology of mappings
  List of suggestions
  Set of pairs of related terms

Content of output
  Matching classes
  Matching instances
  Matching slots

From Large Pool To Small Groups

Space of mapping tools

Architectural criteria

Performance criterion (within a single group)
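The narrowing step described on this slide can be sketched as a simple grouping: tools that share the same architectural profile land in the same group, and only tools within one group are then compared on performance. The sketch below uses the task and output type listed for each tool on the earlier slides; the dictionary layout, the short profile labels, and the function name `group_by_architecture` are illustrative assumptions, not part of any of the tools.

```python
from collections import defaultdict

# (task, output type) for each tool, taken from the "Tasks" and "Outputs"
# slides. The short labels are assumptions made for this sketch.
TOOLS = {
    "iPROMPT": ("merge", "interactive GUI"),
    "Chimaera": ("merge", "interactive GUI"),
    "Anchor-PROMPT": ("map", "pairs of related terms"),
    "GLUE": ("map", "pairs of related terms"),
    "FCA-Merge": ("map", "pairs of related terms"),
    "ONION": ("articulation", "articulation rules"),
}


def group_by_architecture(tools):
    """Group tool names by their architectural profile (task, output type)."""
    groups = defaultdict(list)
    for name, profile in tools.items():
        groups[profile].append(name)
    return dict(groups)


groups = group_by_architecture(TOOLS)
for profile, names in groups.items():
    print(profile, "->", names)
```

With only two criteria the six tools fall into three groups; adding further criteria (input requirements, level of user interaction) would split the groups further, which is why a performance comparison is meaningful only within a single group.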

Resources Required For Comparison Experiments

Source ontologies
  Pairs of ontologies covering similar domains
  Ontologies of different size, complexity, level of overlap

“Gold standard” results
  Human-generated correspondences between terms
  Pairs of terms, rules, explicit mappings

Resources Required (cont’d)

Metrics for comparing performance
  Precision (how many of the tool’s suggestions are correct)
  Recall (how many of the correct matches the tool found)
  Distance between ontologies
    Use of inference techniques
    Analysis of taxonomic relationships (à la OntoClean)

Experiment controls
  Design
  Protocol

Suggestions that the tool produced

Operations that the user performed

Suggestions that the user followed
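The precision and recall metrics on this slide can be sketched as set operations. In the sketch below, `suggested` stands for the suggestions the tool produced and `correct` for the gold-standard matches (in an interactive setting, plausibly the operations the user performed); their intersection corresponds to the suggestions the user followed. The function name and the example term pairs are hypothetical and are not data from the experiment.

```python
def precision_recall(suggested, correct):
    """Return (precision, recall) for a collection of suggested matches.

    precision = |suggested AND correct| / |suggested|
    recall    = |suggested AND correct| / |correct|
    """
    suggested, correct = set(suggested), set(correct)
    hits = suggested & correct
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical term pairs, for illustration only.
suggested = {
    ("Person", "Individual"),
    ("Paper", "Publication"),
    ("Lab", "Activity"),
}
correct = {
    ("Person", "Individual"),
    ("Paper", "Publication"),
    ("Group", "ResearchGroup"),
    ("Staff", "Employee"),
}

precision, recall = precision_recall(suggested, correct)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here two of the three suggestions are correct (precision 2/3), and two of the four correct matches were found (recall 1/2).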

Where Will The Resources Come From?

Ideally, from researchers that do not belong to any of the evaluated projects

Realistically, as a side product of stand-alone evaluation experiments

Evaluation Experiment: iPROMPT

iPROMPT is
  A plug-in to Protégé-2000
  An interactive ontology-merging tool

iPROMPT uses for analysis
  Class hierarchy
  Slots and facet values

iPROMPT matches
  Classes
  Slots
  Instances

Evaluation Experiment

4 users merged the same 2 source ontologies

We measured
  Acceptability of iPROMPT’s suggestions
  Differences in the resulting ontologies

Sources

Input: two ontologies from the DAML ontology library

CMU ontology:
  Employees of academic organization
  Publications
  Relationships among research groups

UMD ontology:
  Individuals
  CS departments
  Activities

Experimental Design

User’s expertise:
  Familiar with Protégé-2000
  Not familiar with PROMPT

Experiment materials:
  The iPROMPT software
  A detailed tutorial
  A tutorial example
  Evaluation files

Users performed the experiment on their own. No questions or interaction with developers.

Experiment Results

Quality of iPROMPT suggestions:
  Recall: 96.9%
  Precision: 88.6%

Resulting ontologies
  Difference measure: fraction of frames that have different name and type
  Ontologies differ by ~30%
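One way to read the difference measure is sketched below. The slide does not define it precisely, so this sketch makes assumptions: each merged ontology is reduced to a set of (name, type) pairs, one per frame, and the difference is the fraction of frames in the union that appear in only one of the two ontologies. The function name and the example frames are hypothetical.

```python
def frame_difference(frames_a, frames_b):
    """Fraction of frames whose (name, type) pair occurs in only one ontology.

    Assumption: normalizes by the size of the union of both frame sets.
    """
    a, b = set(frames_a), set(frames_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


# Hypothetical merged ontologies produced by two users.
merged_1 = {
    ("Person", "class"),
    ("Publication", "class"),
    ("author", "slot"),
    ("Department", "class"),
}
merged_2 = {
    ("Person", "class"),
    ("Publication", "class"),
    ("author", "slot"),
    ("Dept", "class"),
}

difference = frame_difference(merged_1, merged_2)
print(f"difference={difference:.2f}")
```

A measure like this treats a renamed frame as two differing frames, which is one reason name-based distances can overstate how far apart two merged ontologies are.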

Limitations In The Experiment

Only 4 participants

Variability in Protégé expertise

Recall and precision figures without comparison to other tools are not very meaningful

Need better distance metrics

Research Questions

Which pragmatic criteria are most helpful in finding the best tool for a task?

How do we develop a “gold standard” merged ontology? Does such an ontology exist?

How do we define a good distance metric to compare results to the gold standard?

Can we reuse tools and metrics developed for evaluating ontologies themselves?
