State of the Art for Ontology Repositories Frank Olken National Science Foundation CISE/IIS/III folken@nsf.gov Presentation to Ontology Summit NIST Gaithersburg, MD v05 April 28, 2008


State of the Art for Ontology Repositories

Frank Olken
National Science Foundation
CISE/IIS/III
folken@nsf.gov

Presentation to Ontology Summit
NIST
Gaithersburg, MD
v05
April 28, 2008


April 28, 2008 F. Olken, Ontology Summit 2008

Disclaimer

Opinions expressed in this talk are solely those of the author and do not reflect the positions of the National Science Foundation, CISE, IIS, or Lawrence Berkeley National Laboratory.


This talk:

I will address key issues in the design and implementation of ontology repositories and some of the major technologies being used to address these issues.


Outline

- What is an ontology repository?
- Why does one want one?
- Macro vs. Micro Issues
- Implementation Issues


Implementation Issues

- Ontology acquisition, ingestion
- Macro vs. micro issues
- Centralized vs. Decentralized
- Ontology representation
- Ontology search, query
- Ontology Integration
- Auxiliary tools
- SOA, etc.


What is an Ontology Repository?

- System for storing, searching, retrieving multiple ontologies
- Support for ontology integration
- Variously:
  - Tools for ontology creation, editing, visualization
  - Tools for ontology annotation, curation, ...


Multiple Ontologies

This is the source of the hardest problems in building ontology repositories:
- Scale
- Diverse ontology representations
- Ontology integration (mapping)
- Namespace issues
- Complex provenance issues


Why would you want an OR?

- You need to deal with multiple ontologies
- Usual reasons for ontologies:
  - Natural Language Processing support
  - Data Integration, Exchange
  - Data semantics
  - Support for DB queries
  - DB, application design
  - Classification / Indexing of documents, etc.
  - Creation / maintenance / use of controlled vocabularies


Ontology Acquisition

- Manual acquisition and loading
  - e.g., XMDR
  - Useful if ontology representations are very diverse
- Spidering the web to find ontologies (e.g., Nutch)
- Google (etc.) search to find ontologies
- How does one recognize an ontology?
  - Use of OWL, RDF, CL, etc.
  - Lots of is-a, part-of relations ...
  - Comments that assert file is an ontology
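
The three recognition cues above could be combined into a crude scoring heuristic. The following is only a sketch: the regular expressions, the `MIN_RELATIONS` cutoff, and the two-cue threshold are all assumptions made for illustration, and ordinary English ("this is a test") will trigger false positives on the relation cue.

```python
import re

# Illustrative cue patterns; none of these come from the talk itself.
LANGUAGE_MARKUP = re.compile(r"owl:Ontology|rdf:RDF|rdfs:subClassOf", re.I)
RELATION = re.compile(r"\b(?:is[-_ ]a|part[-_ ]of)\b", re.I)
SELF_DESCRIBED = re.compile(r"(?:#|//|<!--)[^\n]*\bontology\b", re.I)
MIN_RELATIONS = 3  # assumed meaning of "lots of" relations


def ontology_cues(text):
    """Report which of the three cue families are present in a file's text."""
    return {
        "language_markup": bool(LANGUAGE_MARKUP.search(text)),
        "many_relations": len(RELATION.findall(text)) >= MIN_RELATIONS,
        "self_described": bool(SELF_DESCRIBED.search(text)),
    }


def looks_like_ontology(text, threshold=2):
    """Guess that a file is an ontology when at least `threshold` cues fire."""
    return sum(ontology_cues(text).values()) >= threshold
```

A crawler could use such a score only as a first filter; parsing the candidate file is still needed to confirm it.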


Ontology Ingestion

- Parsing ontology, syntactic validation
- Consistency checking (no cycles in partial orders: taxonomies, partonomies)
- Conversion to common representation (?)
  - Syntactic translation
  - Semantic translation
    - e.g., CWA vs. OWA
- Indexing, transitive closure computations, ...
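
The "no cycles" consistency check can be sketched as a depth-first search over the direct is-a or part-of edges. This is one standard way to detect a cycle, not necessarily what any particular repository does; the function name and the (child, parent) edge format are assumptions, and the recursion would need to be made iterative for very deep hierarchies.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated last), or None.

    `edges` is an iterable of (child, parent) pairs, e.g. immediate is-a links.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

An ingestion pipeline could reject (or flag for curation) any taxonomy for which `find_cycle` returns a non-empty result.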


Centralized vs. Federated Architectures

- Centralized: collect ontologies into one place
  - High startup, maintenance costs
  - Fast retrieval, facilitates integration
- Federated: ontologies stay put
  - Low startup, maintenance costs
  - Lower performance, reliability
  - More requirements on ontology sites
- Hybrid
  - Centralize ontology-level metadata, indices
  - Leave individual ontologies in place


Macro vs. Micro-level Issues

- Macro-level
  - Searching across a collection of ontologies and their metadata
- Micro-level
  - Searching, inferencing within individual ontologies


Macro & Micro similarities

Most (not all) macro and micro level issues are essentially the same and can use the same technologies for implementation.


Macro-level Support

- Over collections of ontologies
- Use an ontology of ontologies
  - e.g., taxonomy of subject matter
  - Ontology of ontology metadata


Ontology Search

- Text-based search
  - Natural language definitions
  - Symbols
  - e.g., Lucene, UIMA
- Semantic search
  - Over ontology representation (RDF, OWL, CL)
  - e.g., SPARQL, etc.
  - e.g., faceted search (e.g., Siderean)
  - e.g., navigation over taxonomies, etc.
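
Text-based search over symbols and natural language definitions is usually built on an inverted index. The sketch below shows the idea in plain Python under simplifying assumptions (lowercased alphanumeric tokens, AND semantics, no ranking); it is not how Lucene or UIMA are implemented, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(definitions):
    """Map each token to the set of terms whose symbol or definition contains it.

    `definitions` is a dict: term symbol -> natural language definition.
    """
    index = defaultdict(set)
    for term, text in definitions.items():
        for token in TOKEN.findall((term + " " + text).lower()):
            index[token].add(term)
    return index


def search(index, query):
    """Return the terms that match every token of the query (AND semantics)."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

The same index could serve macro-level search if the "definitions" are ontology-level descriptions rather than term definitions.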


Ontology Representations

- Text
- Frames (OBO)
- Graphs (RDF)
- Logics (OWL-DL, OWL Full, CL)


Text Representation

Obvious candidate for representing informal ontologies with natural language definitions, etc.

A lowest common denominator representation for more formal ontology representations

Readily supports handling diverse ontology representations (must add tags for underlying ontology representation language)

Only supports text search directly


Frame Representations

- Each frame is a collection of:
  - (slot, value) pairs, or
  - (slot, value list) pairs
- Originally deployed in Lisp
- Secondary storage
  - Each frame is a BLOB
  - Or, decompose into finer-grained DB entries
- Current uses: OBO (Open Biological Ontology) format
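
To make the two secondary-storage options concrete, here is a minimal sketch of a frame as a mapping from slots to value lists, with one method per option. The `Frame` class, its method names, and the use of JSON for the BLOB are assumptions for illustration; they do not reflect the OBO format or any specific frame system.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    slots: dict = field(default_factory=dict)  # slot name -> list of values

    def add(self, slot, value):
        """Append a value to a slot, creating the slot if needed."""
        self.slots.setdefault(slot, []).append(value)

    def to_blob(self):
        """Option 1: serialize the whole frame as one opaque BLOB."""
        payload = {"name": self.name, "slots": self.slots}
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    def to_rows(self):
        """Option 2: decompose into fine-grained (frame, slot, value) rows."""
        return [(self.name, slot, value)
                for slot, values in self.slots.items()
                for value in values]


frame = Frame("nucleus")
frame.add("is_a", "organelle")
frame.add("part_of", "cell")
frame.add("synonym", "cell nucleus")
```

The BLOB form is cheap to store and fetch whole, while the row form lets a database index and query individual slots.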


Graph Representations

- a.k.a. semantic networks, semantic graphs
- Examples: RDF, RDF schemas, XLinks
- List of edges, each edge:
  - Subject
  - Predicate (relation name, attribute name)
  - Object (or attribute value)
- Very flexible
- Only support binary relations directly
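
A small sketch of the edge-list view, and of the usual workaround for the binary-relation limitation: a relation with three or more participants is represented by an intermediate node with one edge per role. The helper names, the "type" predicate, and the sample data are hypothetical.

```python
# Each edge is a (subject, predicate, object) triple.
triples = [
    ("car", "is_a", "vehicle"),
    ("wheel", "part_of", "car"),
    ("car", "wheel_count", 4),
]


def objects(triples, subject, predicate):
    """Return all objects reachable from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reify(relation_id, relation_type, **roles):
    """Encode an n-ary relation instance as binary edges on a new node.

    For example, "supplier S sells part P to project J" becomes a node
    with one outgoing edge per role.
    """
    edges = [(relation_id, "type", relation_type)]
    edges += [(relation_id, role, filler) for role, filler in roles.items()]
    return edges
```

For instance, `reify("sale1", "Sale", supplier="S", part="P", project="J")` yields four binary edges that together stand for one ternary fact.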


Types of Graphs

- Trees
  - Simple taxonomies (is-a), partonomies (part-of)
- Multi-faceted classifications
  - Taxonomies with multiple facets
  - e.g., Vehicles: purpose, propulsion, wheels, axles, color
- Directed acyclic graphs
  - Multiple inheritance
  - Partial orders


Types of graphs

- Arbitrary directed graphs
  - Allow arbitrary binary relationships
- Named graphs
  - Allow separate inclusion hierarchy
  - Allow edges to point to/from subgraphs


Partial Orders

Many ontologies are partial orders (i.e., directed acyclic graphs), e.g., taxonomies, partonomies, ...

Mappings among partially ordered ontologies should be “order preserving”

See work of Cliff Joslyn (PNNL)


Note:

- RDF graphs are collections of edges (triples)
- No naked nodes allowed


Graph Implementations

- Represent graph as:
  - Triple store (as on previous slide)
  - Quad store (supports named graphs)
- Standalone system, relational DBMS, column store


Quad stores & Named graphs

Quad stores allow named graphs (named graph, subject, predicate, object)

Named graphs (quads) allow one to name subgraphs (collections of edges) and to refer to them by name

Hence, subjects and objects are no longer just nodes, but may be subgraphs (collections of edges)
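
The following sketch illustrates the quad idea: each edge carries the name of the graph it belongs to, and a graph name can itself appear as the subject of another edge. `QuadStore` and its methods are hypothetical names for an in-memory toy, not the API of any real quad store.

```python
class QuadStore:
    """Toy in-memory store of (graph, subject, predicate, object) quads."""

    def __init__(self):
        self.quads = set()

    def add(self, graph, subject, predicate, obj):
        self.quads.add((graph, subject, predicate, obj))

    def graph(self, name):
        """Return the triples (edges) that belong to one named graph."""
        return {(s, p, o) for g, s, p, o in self.quads if g == name}

    def match(self, graph=None, subject=None, predicate=None, obj=None):
        """Return quads matching the given fields; None acts as a wildcard."""
        pattern = (graph, subject, predicate, obj)
        return {q for q in self.quads
                if all(want is None or want == got
                       for want, got in zip(pattern, q))}


store = QuadStore()
store.add("anatomy_v1", "nucleus", "part_of", "cell")
# The graph name "anatomy_v1" is used here as a subject: an edge about a subgraph.
store.add("metadata", "anatomy_v1", "source", "http://example.org/anatomy")
```

This is what makes per-ontology provenance and versioning statements straightforward in a quad store.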


Secondary storage of graphs

- Long skinny relations
  - Triples or quads
- Column stores (MonetDB, Vertica)
- Multiple indices sorted by: subject, predicate, object, combinations, ...
- Clusters of edges (Cogito)
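
As an illustration of a "long skinny relation" with multiple sort orders, the sketch below stores triples in one three-column table with three composite indices. It uses SQLite from the Python standard library purely for convenience; SQLite is a row store, not a column store, and the table, index, and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    -- One index per access pattern: by subject, by predicate, by object.
    CREATE INDEX idx_spo ON triples (subject, predicate, object);
    CREATE INDEX idx_pos ON triples (predicate, object, subject);
    CREATE INDEX idx_osp ON triples (object, subject, predicate);
""")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("cat", "is_a", "mammal"),
     ("dog", "is_a", "mammal"),
     ("mammal", "is_a", "animal"),
     ("paw", "part_of", "cat")],
)


def subjects_with(conn, predicate, obj):
    """Find subjects by (predicate, object); idx_pos can serve this lookup."""
    rows = conn.execute(
        "SELECT subject FROM triples "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]
```

The extra indices speed up lookups from any direction at the cost of additional storage and slower inserts.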


Semantic graph query languages

- SPARQL is now the primary candidate
  - Undergoing W3C “standardization”
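
The core of a SPARQL query is a basic graph pattern: a set of triple patterns with shared variables. The sketch below evaluates such a pattern naively in plain Python to show the semantics; it is not a SPARQL engine, and the function name, data, and `http://example.org/` prefix are made up. The pattern it evaluates corresponds roughly to the query in the leading comment.

```python
# Roughly equivalent SPARQL (illustrative prefix):
#   PREFIX : <http://example.org/>
#   SELECT ?x WHERE { ?x :is_a ?c . ?c :is_a :animal }


def match_pattern(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms that are strings starting with '?' are variables; all other
    terms must match the triple component exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_pattern(triples, rest, candidate)


data = [
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("dog", "is_a", "mammal"),
    ("quartz", "is_a", "mineral"),
]
query = [("?x", "is_a", "?c"), ("?c", "is_a", "animal")]
answers = sorted(b["?x"] for b in match_pattern(data, query))
```

Real engines replace the nested scan with index lookups and join ordering, which is where the storage choices on the previous slide matter.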


Logic-based Ontology Representations

- Description Logic (e.g., OWL-DL)
  - Restricted to make it decidable and computationally tractable
  - Typically lacks cardinality constraints, arithmetic
- Datalog (Horn clause logic + recursion)
  - Prolog based
- First Order Logic (e.g., Common Logic)
- IKL (FOL + name propositions)


Logic-based representations

- Precise, formal semantics
- Expressiveness (esp. FOL)
- Issues of scaling, decidability, computational tractability
  - Esp. for FOL
- Description Logics growing usage
- DL + rules languages to approx. FOL


Ontology Integration

- Construct mappings between entities (concepts) in pairs of ontologies
- Mapping relations: same_as, is_a, part_of, units_conversion
- Specify mappings via: frames, graphs, or logic
  - Graph-based mappings (C. Joslyn, PNNL)
  - Logic-based mappings (PROMPT, N. Noy)


Partial Orders

Many ontologies are partial orders (i.e., directed acyclic graphs), e.g., taxonomies, partonomies, ...

Mappings among partially ordered ontologies should be “order preserving”

See work of Cliff Joslyn (PNNL)
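
One way to read "order preserving": whenever a is below b in the source ontology, the image of a must be below (or equal to) the image of b in the target. The check below is a simple sketch under that reading, with hypothetical function names; it is not taken from Joslyn's work, and the quadratic closure loop is only suitable for small examples.

```python
def reachable(edges):
    """Return all (lower, upper) pairs implied by the direct edges."""
    pairs = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def is_order_preserving(mapping, source_edges, target_edges):
    """True if a <= b in the source implies mapping[a] <= mapping[b] in the target.

    Pairs with an unmapped endpoint are ignored; mapping two ordered
    concepts to the same target concept is allowed.
    """
    target_order = reachable(target_edges)
    for a, b in reachable(source_edges):
        if a in mapping and b in mapping:
            fa, fb = mapping[a], mapping[b]
            if fa != fb and (fa, fb) not in target_order:
                return False
    return True
```

A mapping tool could run such a check to flag candidate mappings that invert or break the hierarchy.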


Materialization of Partial Orders

- Partial orders = taxonomies, partonomies
- Typically specified as direct “edges”
  - Immediate is-a, or part-of relations
- Naïve implementation requires repeated traversal of the partial order graph.
- Materialization of the transitive closure of the partial order (e.g., taxonomy) can reduce query times
- However, initialization and maintenance are expensive in time and storage
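
The trade-off can be seen in a small sketch that precomputes, for every node, the full set of its ancestors from the direct edges. The function name and the (child, parent) edge format are assumptions, and the input is assumed to be acyclic (a cycle would cause unbounded recursion).

```python
from collections import defaultdict


def materialize_ancestors(direct_edges):
    """Map each node to the set of all its ancestors.

    `direct_edges` holds (child, parent) pairs, i.e. immediate is-a or
    part-of relations. Assumes the edges form a directed acyclic graph.
    """
    parents = defaultdict(set)
    for child, parent in direct_edges:
        parents[child].add(parent)

    ancestors = {}

    def resolve(node):
        if node not in ancestors:
            found = set()
            for parent in parents.get(node, ()):
                found.add(parent)
                found |= resolve(parent)
            ancestors[node] = found
        return ancestors[node]

    for node in list(parents):
        resolve(node)
    return ancestors


closure = materialize_ancestors(
    [("cat", "mammal"), ("mammal", "animal"), ("cat", "pet")]
)
# After materialization, "is cat an animal?" is a single set lookup.
is_animal = "animal" in closure["cat"]
```

The closure can hold far more pairs than the direct edge list, and any edit to the taxonomy may require recomputing many entries, which is the storage and maintenance cost noted above.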


Ontology Constraints

- Type constraints
- Range, domain constraints
- Cardinality constraints on relations
- DB integrity constraints
  - Functional dependencies
  - Inclusion dependencies (foreign key constraints)
  - Invertibility
  - Disjointness (of subclasses)
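
Two of these constraints are easy to check directly over a set of triples. The sketch below tests a functional (at most one value) property and the disjointness of two classes; the function names, the "instance_of" predicate, and the sample data are hypothetical.

```python
from collections import defaultdict


def functional_violations(triples, predicate):
    """Subjects with more than one distinct value for a functional predicate."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            values[s].add(o)
    return sorted(s for s, objs in values.items() if len(objs) > 1)


def disjointness_violations(triples, class_a, class_b,
                            type_predicate="instance_of"):
    """Individuals asserted to belong to both of two disjoint classes."""
    members = defaultdict(set)
    for s, p, o in triples:
        if p == type_predicate:
            members[o].add(s)
    return sorted(members[class_a] & members[class_b])


facts = [
    ("alice", "birth_year", 1970),
    ("alice", "birth_year", 1971),
    ("bob", "birth_year", 1980),
    ("rex", "instance_of", "Dog"),
    ("rex", "instance_of", "Cat"),
    ("tom", "instance_of", "Cat"),
]
```

Here `functional_violations(facts, "birth_year")` flags alice and `disjointness_violations(facts, "Dog", "Cat")` flags rex. Note that this treats the data under a closed-world reading; under open-world semantics a reasoner would draw different conclusions from the same facts.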


Need for Provenance

Fiction: Ontologists write definitions ab initio

Reality: Most “definitions” are written by:

- Administrators (e.g., Code of Federal Regulations)
- Legislatures (legislation)
- Judges (court decisions)
- Professional bodies (accounting regulations)


Implications for Provenance

- We need to track the provenance of definitions
- Typically this requires citations to external documents
- May also require tracking of individual “definition” decisions ...
- Varying granularity requirements
  - Individual definitions
  - Collections of axioms, definitions
- Examples: see ISO 11179, XMDR


Other Tools

- Ontology creation tools
- Ontology editors
- Ontology differencing tools
- Ontology modularization tools (clustering, etc.)
- Ontology export
- Ontology visualization (e.g., graph visualization)
- Version management
- Access control


SOA: Service Oriented Architecture

- Very popular
- Permit distributed implementations
- Two major alternatives:
  - REST (Representational State Transfer)
    - Built on HTTP (GET, PUT, DELETE, POST operators)
    - URL/URI addresses for all objects
  - SOAP/WSDL
    - Based on XML Remote Procedure Calls
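
To illustrate the REST style, the sketch below maps HTTP operators onto repository operations, with every ontology addressed by its own path. It is a toy dispatcher, not a web server: the `OntologyResource` class, the `/ontologies/...` paths, and the choice of status codes are assumptions, and POST is deliberately left unimplemented.

```python
class OntologyResource:
    """Toy REST-style handler: one URI path per stored ontology document."""

    def __init__(self):
        self.store = {}  # path -> ontology document (text)

    def handle(self, method, path, body=None):
        """Return (status_code, response_body) for one HTTP-style request."""
        if method == "GET":
            if path in self.store:
                return (200, self.store[path])
            return (404, None)
        if method == "PUT":
            created = path not in self.store
            self.store[path] = body
            return (201 if created else 200, None)
        if method == "DELETE":
            if self.store.pop(path, None) is not None:
                return (204, None)
            return (404, None)
        return (405, None)  # e.g. POST, not handled in this sketch


repo = OntologyResource()
repo.handle("PUT", "/ontologies/wine", "<rdf:RDF/>")
```

In a real deployment the same dispatch logic would sit behind an HTTP server, which is most of what the REST approach requires.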


REST vs SOAP

- REST
  - Simple to implement
  - Requires little more than:
    - HTTP server
    - XML parsers
- SOAP
  - Much more software complexity
  - Lots of software tooling from commercial vendors
  - Better security?


My advice on REST vs. SOAP:

Use REST.


Ontology Repository Related Standards

- ISO/IEC 11179 Metadata Registries (version 3.0 of Part 3)
- OMG ODM Ontology Definition Metamodel
- ISO 13250 Topic Maps
- XML Topic Maps Specification (topicmaps.org)
- W3C OWL recommendations
- W3C RDF recommendations


Ontology Related Standards

- ISO/IEC 24707 Common Logic
- ISO TC 37 Terminology Services Standards
- W3C SKOS Simple Knowledge Organization System Reference
- ISO/IEC 19763 Metamodel Framework for Interoperability (Ontology metadata)


Recapitulation

Ontology Repositories support storage, search, retrieval of multiple ontologies and ontology integration

Macro-level & Micro-level support and search pose similar problems

A common ontology representation is desirable, but difficult

Multiple ontology representations and ontology integration are the most difficult aspects.


Acknowledgements

This work was supported by NSF IPA agreement with LBNL, IRD support.

My earlier work on ontology repositories at LBNL was supported by EPA and DOD.

The author would like to thank Joel Sachs, Mark Musen, Natasha Noy, Eric Neumann, Bob MacGregor, Cliff Joslyn, Kevin Keck, Elise Kendall, Mala Mehrotra, Dan Abadi, Deb McGuiness, et al. for their remarks to me about knowledge representation, ontology repositories and ontology mappings.


Contact Information

Frank Olken
National Science Foundation
4201 Wilson Blvd., Suite 1125
Arlington, VA 22230
Email: folken@nsf.gov
Tel: 703-292-8930 (receptionist)
Tel: 703-292-7350 (direct)