Evaluating Ontology-Mapping Tools: Requirements and Experience
Post on 05-Jan-2016
Natalya F. Noy and Mark A. Musen
Stanford Medical Informatics, Stanford University

Types Of Ontology Tools
There is not just ONE class of ONTOLOGY TOOLS
- Development tools: Protégé-2000, OntoEdit, OilEd, WebODE, Ontolingua
- Mapping tools: PROMPT, ONION, OBSERVER, Chimaera, FCA-Merge, GLUE

Evaluation Parameters for Ontology-Development Tools
- Interoperability with other tools
  - Ability to import ontologies from other languages
  - Ability to export ontologies to other languages
- Expressiveness of the knowledge model
- Scalability
- Extensibility
- Availability and capabilities of inference services
- Usability of tools

Evaluation Parameters For Ontology-Mapping Tools
We can try to reuse the evaluation parameters for development tools, but:

Ontology Tools
- Development tools: similar tasks, inputs, and outputs
- Mapping tools: different tasks, inputs, and outputs

Development Tools
[Figure: input, task, and output of a development tool. Task: create an ontology. Output: a domain ontology.]

Mapping Tools: Tasks
- C = Merge(A, B): iPROMPT, Chimaera
- Map(A, B): Anchor-PROMPT, GLUE, FCA-Merge
- Articulation ontology between A and B: ONION

Mapping Tools: Inputs
[Figure: inputs used by iPROMPT, Chimaera, GLUE, FCA-Merge, and OBSERVER. All five tools take classes as input; the additional inputs shown are slots and facets (two tools), shared instances, instance data, and DL definitions.]
Mapping Tools: Outputs and User Interaction

Can We Compare Mapping Tools?
Yes, we can!
- We can compare tools in the same group
- How do we define a group?

Architectural Comparison Criteria
- Input requirements
  - Ontology elements
    - Used for analysis
    - Required for analysis
  - Modeling paradigm
    - Frame-based
    - Description Logic
- Level of user interaction
  - Batch mode
  - Interactive
- User feedback
  - Required?
  - Used?

Architectural Criteria (cont'd)
- Type of output
  - Set of rules
  - Ontology of mappings
  - List of suggestions
  - Set of pairs of related terms
- Content of output
  - Matching classes
  - Matching instances
  - Matching slots

From Large Pool To Small Groups
[Figure: the space of mapping tools is first partitioned by architectural criteria; a performance criterion is then applied within a single group.]
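
The two-stage idea on this slide (partition by architecture, then compare performance only inside a group) can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the `ToolProfile` fields, the tools `ToolA`, `ToolB`, and `ToolC`, their scores, and the use of the F1 score as the within-group performance criterion. None of these come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical description of a mapping tool (not from the slides)."""
    name: str
    inputs: frozenset      # ontology elements used for analysis
    interaction: str       # "batch" or "interactive"
    output_type: str       # e.g. "list of suggestions"
    precision: float
    recall: float


def f1(tool):
    """Harmonic mean of precision and recall (an assumed ranking criterion)."""
    total = tool.precision + tool.recall
    return 2 * tool.precision * tool.recall / total if total else 0.0


def group_by_architecture(tools):
    """Stage 1: tools with identical architectural features share a group."""
    groups = defaultdict(list)
    for tool in tools:
        key = (tool.inputs, tool.interaction, tool.output_type)
        groups[key].append(tool)
    return dict(groups)


def rank_within_groups(groups):
    """Stage 2: order the tools of each group by the performance criterion."""
    return {key: sorted(members, key=f1, reverse=True)
            for key, members in groups.items()}


# Made-up tools and scores, for illustration only.
tools = [
    ToolProfile("ToolA", frozenset({"classes", "slots"}), "interactive",
                "list of suggestions", precision=0.9, recall=0.8),
    ToolProfile("ToolB", frozenset({"classes", "slots"}), "interactive",
                "list of suggestions", precision=0.7, recall=0.95),
    ToolProfile("ToolC", frozenset({"classes", "instances"}), "batch",
                "set of pairs of related terms", precision=0.85, recall=0.85),
]

groups = group_by_architecture(tools)
ranking = rank_within_groups(groups)
for key, members in ranking.items():
    print(key[1], key[2], [t.name for t in members])
```

Here `ToolA` and `ToolB` land in the same group and are ranked against each other, while `ToolC` sits alone in its group and is not compared with them.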

Resources Required For Comparison Experiments
- Source ontologies
  - Pairs of ontologies covering similar domains
  - Ontologies of different size, complexity, level of overlap
- Gold standard results
  - Human-generated correspondences between terms
  - Pairs of terms, rules, explicit mappings

Resources Required (cont'd)
- Metrics for comparing performance
  - Precision (how many of the tool's suggestions are correct)
  - Recall (how many of the correct matches the tool found)
  - Distance between ontologies
    - Use of inference techniques
    - Analysis of taxonomic relationships (à la OntoClean)
- Experiment controls
  - Design
  - Protocol
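
As a minimal sketch of the precision and recall metrics listed above, assume that both the tool's suggestions and the gold standard are sets of term pairs. The function name `precision_recall` and the example pairs are hypothetical and are not taken from the experiment.

```python
def precision_recall(suggested, gold):
    """Compare a tool's suggested term pairs with a gold-standard set.

    precision = correct suggestions / all suggestions made by the tool
    recall    = correct suggestions / all matches in the gold standard
    """
    suggested = set(suggested)
    gold = set(gold)
    correct = suggested & gold
    precision = len(correct) / len(suggested) if suggested else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Made-up term pairs (term in ontology A, term in ontology B).
gold = {("Person", "Individual"), ("Publication", "Paper"), ("Student", "Student")}
suggested = {("Person", "Individual"), ("Student", "Student"), ("Group", "Activity")}

p, r = precision_recall(suggested, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case two of the three suggestions are correct and two of the three gold-standard matches are found, so both values are 2/3.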

Where Will The Resources Come From?
- Ideally, from researchers who do not belong to any of the evaluated projects
- Realistically, as a side product of stand-alone evaluation experiments

Evaluation Experiment: iPROMPT
- iPROMPT is
  - A plug-in to Protégé-2000
  - An interactive ontology-merging tool
- iPROMPT uses for analysis
  - Class hierarchy
  - Slots and facet values
- iPROMPT matches
  - Classes
  - Slots
  - Instances

Evaluation Experiment
- 4 users merged the same 2 source ontologies
- We measured
  - Acceptability of iPROMPT's suggestions
  - Differences in the resulting ontologies

Sources
Input: two ontologies from the DAML ontology library
- CMU ontology:
  - Employees of an academic organization
  - Publications
  - Relationships among research groups
- UMD ontology:
  - Individuals
  - CS departments
  - Activities

Experimental Design
- Users' expertise:
  - Familiar with Protégé-2000
  - Not familiar with PROMPT
- Experiment materials:
  - The iPROMPT software
  - A detailed tutorial
  - A tutorial example
  - Evaluation files
- Users performed the experiment on their own, with no questions to or interaction with the developers.

Experiment Results
- Quality of iPROMPT suggestions:
  - Recall: 96.9%
  - Precision: 88.6%
- Resulting ontologies
  - Difference measure: fraction of frames that have different name and type
  - Ontologies differ by ~30%
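
The slide does not spell out how the difference measure is normalized, so the following is only one possible reading. It assumes each frame is identified by a (name, type) pair and reports the share of frames, out of all frames in either ontology, that appear in only one of them. The function name `frame_difference` and the sample frames are hypothetical.

```python
def frame_difference(ontology_a, ontology_b):
    """Fraction of frames whose (name, type) pair occurs in only one ontology.

    Assumption: the denominator is the union of frames from both ontologies.
    The slides do not state the normalization actually used.
    """
    frames_a = set(ontology_a)
    frames_b = set(ontology_b)
    all_frames = frames_a | frames_b
    if not all_frames:
        return 0.0
    unmatched = frames_a ^ frames_b  # frames present on one side only
    return len(unmatched) / len(all_frames)


# Two made-up merged ontologies produced by different users.
merged_by_user_1 = {
    ("Person", "class"), ("Student", "class"),
    ("name", "slot"), ("advisor", "slot"),
}
merged_by_user_2 = {
    ("Person", "class"), ("Student", "class"),
    ("name", "slot"), ("supervisor", "slot"),
}

d = frame_difference(merged_by_user_1, merged_by_user_2)
print(f"difference={d:.2f}")
```

With these sample frames, two of the five distinct frames are unmatched, giving a difference of 0.4. A name-only or type-only comparison would give different numbers, which is one reason the later slides ask for better distance metrics.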

Limitations In The Experiment
- Only 4 participants
- Variability in Protégé expertise
- Recall and precision figures without comparison to other tools are not very meaningful
- Need better distance metrics

Research Questions
- Which pragmatic criteria are most helpful in finding the best tool for a task?
- How do we develop a gold standard merged ontology? Does such an ontology exist?
- How do we define a good distance metric to compare results to the gold standard?
- Can we reuse tools and metrics developed for evaluating ontologies themselves?