State of the Art for Ontology Repositories
Frank Olken
National Science Foundation
CISE/IIS/III
[email protected]
Presentation to Ontology Summit
NIST
Gaithersburg, MD
v05
April 28, 2008
April 28, 2008 F. Olken, Ontology Summit 2008 2
Disclaimer
Opinions expressed in this talk are solely those of the author and do not reflect the positions of the National Science Foundation, CISE, IIS, or Lawrence Berkeley National Laboratory.
This talk:
I will address key issues in the design and implementation of ontology repositories and some of the major technologies being used to address these issues.
Outline
What is an ontology repository?
Why does one want one?
Macro vs. Micro Issues
Implementation Issues
Implementation Issues
Ontology acquisition, ingestion
Macro vs. micro issues
Centralized vs. Decentralized
Ontology representation
Ontology search, query
Ontology Integration
Auxiliary tools
SOA, etc.
What is an Ontology Repository?
System for storing, searching, retrieving multiple ontologies
Support for ontology integration
Variously:
  Tools for ontology creation, editing, visualization
  Tools for ontology annotation, curation, ....
Multiple Ontologies
This is the source of the hardest problems in building ontology repositories:
  Scale
  Diverse ontology representations
  Ontology integration (mapping)
  Namespace issues
  Complex provenance issues
Why would you want an OR?
You need to deal with multiple ontologies
Usual reasons for ontologies:
  Natural Language Processing support
  Data Integration, Exchange
  Data semantics
  Support for DB queries
  DB, application design
  Classification / Indexing of documents, etc.
  Creation / maintenance / use of controlled vocabularies
Ontology Acquisition
Manual acquisition and loading
  e.g., XMDR
  Useful if ontology representations are very diverse.
Spidering the web to find ontologies (e.g., Nutch)
Google (etc.) search to find ontologies
How does one recognize an ontology?
  Use of OWL, RDF, CL, etc.
  Lots of is-a, part-of relations ...
  Comments that assert file is an ontology
Ontology Ingestion
Parsing ontology, syntactic validation
Consistency checking (no cycles in partial orders: taxonomies, partonomies)
Conversion to common representation (?)
  Syntactic translation
  Semantic translation
    e.g., CWA vs. OWA
Indexing, transitive closure computations, ...
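The "no cycles in partial orders" check during ingestion can be sketched as a depth-first search over the direct is-a or part-of edges. This is only an illustrative sketch: the function name `find_cycle` and the (child, parent) edge format are assumptions, not part of any particular repository.

```python
from collections import defaultdict

def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the relation is acyclic.

    edges: iterable of (child, parent) pairs, e.g. direct is-a or part-of links.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = defaultdict(int)

    def visit(node, path):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:    # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

The recursion is fine for small examples; a very deep taxonomy would need an iterative traversal instead.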
Centralized vs. Federated Architectures
Centralized: collect ontologies into one place
  High startup, maintenance costs
  Fast retrieval, facilitates integration
Federated: ontologies stay put
  Low startup, maintenance costs
  Less performance, reliability
  More requirements on ontology sites
Hybrid
  Centralize ontology level metadata, indices
  Leave individual ontologies in place
Macro vs. Micro-level Issues
Macro-level
  Searching across a collection of ontologies and their metadata
Micro-level
  Searching, inferencing, within individual ontologies
Macro & Micro similarities
Most (not all) macro and micro level issues are essentially the same and can use the same technologies for implementation.
Macro-level Support
Over collections of ontologies
Use an ontology of ontologies
  e.g., taxonomy of subject matter
  Ontology of ontology metadata
Ontology Search
Text-based search
  Natural language definitions
  Symbols
  E.g., Lucene, UIMA
Semantic Search
  Over ontology representation (RDF, OWL, CL)
  e.g., SPARQL, etc.
  e.g., faceted search (e.g., Siderean)
  e.g., navigation over taxonomies, etc.
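Text-based search over natural language definitions can be illustrated with a tiny inverted index. This is a toy stand-in for what an engine such as Lucene does at scale, not Lucene's API; the function names and the `ex:` term identifiers are made up for the example.

```python
import re
from collections import defaultdict

def build_index(definitions):
    """definitions: dict mapping term id -> natural language definition."""
    index = defaultdict(set)
    for term_id, text in definitions.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(term_id)
    return index

def search(index, query):
    """Return ids of terms whose definition contains every query word."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical definitions, for illustration only.
definitions = {
    "ex:Car": "A wheeled motor vehicle used for transport",
    "ex:Boat": "A vehicle used for transport on water",
}
index = build_index(definitions)
```

Here `search(index, "water vehicle")` would match only `ex:Boat`. The same index works at the macro level if the indexed "documents" are ontology descriptions rather than individual term definitions.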
Ontology Representations
Text
Frames (OBO)
Graphs (RDF)
Logics (OWL-DL, OWL Full, CL)
Text Representation
Obvious candidate for ontology representation of informal ontologies, with natural language definitions, etc. ....
A lowest common denominator representation for more formal ontology representations
Readily supports handling diverse ontology representations (must add tags for underlying ontology representation language)
Only supports text search directly
Frame Representations
Each frame is a collection of:
  (slot, value) pairs or (slot, value list)
Originally deployed in Lisp
Secondary Storage
  Each frame is a BLOB
  Or, decompose into finer grained DB entries
Current uses:
  OBO (open biological ontology) format
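The two secondary-storage options for frames can be sketched as follows: serialize the whole frame as one BLOB, or decompose it into one row per (slot, value). The slot names are loosely modeled on OBO-style stanzas, and the `EX:` identifiers and helper names are hypothetical.

```python
import json

# A frame as a mapping from slot name to a list of values.
frame = {
    "id": ["EX:0002"],
    "name": ["sports car"],
    "is_a": ["EX:0001"],
    "synonym": ["roadster", "sportscar"],
}

# Option 1: store the whole frame as a single opaque BLOB.
blob = json.dumps(frame)

# Option 2: decompose into finer grained entries, one row per value.
def frame_to_rows(frame_id, frame):
    """Decompose a frame into (frame_id, slot, position, value) rows."""
    return [
        (frame_id, slot, pos, value)
        for slot, values in frame.items()
        for pos, value in enumerate(values)
    ]

def rows_to_frame(rows):
    """Rebuild the slot -> value list mapping from decomposed rows."""
    rebuilt = {}
    for _, slot, _, value in sorted(rows, key=lambda r: (r[1], r[2])):
        rebuilt.setdefault(slot, []).append(value)
    return rebuilt
```

The BLOB is cheap to write and read back whole, but only the decomposed rows let a database index and query individual slots.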
Graph Representations
a.k.a. Semantic networks, semantic graphs
Examples: RDF, RDF schemas, XLinks
List of edges, each edge:
  Subject
  Predicate (relation name, attribute name)
  Object (or attribute value)
Very flexible
Only support binary relations directly
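A graph stored as a list of (subject, predicate, object) edges, with lookup by pattern, might look like the sketch below. The `ex:` names and the `match` helper are illustrative assumptions; matching with wildcards is roughly what a single triple pattern in a graph query language does.

```python
# Each edge is (subject, predicate, object); objects may be attribute values.
triples = [
    ("ex:SportsCar", "is_a", "ex:Car"),
    ("ex:Car", "is_a", "ex:Vehicle"),
    ("ex:Wheel", "part_of", "ex:Car"),
    ("ex:Car", "wheel_count", 4),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return edges matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
```

For example, `match(triples, obj="ex:Car")` returns every edge pointing at `ex:Car`. A relation among three or more things has no direct encoding here and would need an intermediate node with one binary edge per participant.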
Types of Graphs
Trees
  Simple Taxonomies (isa), Partonomies (partof)
Multi-faceted Classifications
  Taxonomies with multiple facets
  e.g., Vehicles: purpose, propulsion, wheels, axles, color
Directed acyclic graphs
  Multiple inheritance
  Partial orders
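A multi-faceted classification of vehicles can be sketched as items tagged with independent facets, filtered by any combination of facet values. The items, facet values, and function names below are invented for illustration.

```python
# Each item is classified independently along several facets.
vehicles = {
    "bicycle": {"purpose": "personal", "propulsion": "human", "wheels": 2},
    "sedan": {"purpose": "personal", "propulsion": "engine", "wheels": 4},
    "delivery van": {"purpose": "cargo", "propulsion": "engine", "wheels": 4},
}

def facet_filter(items, **wanted):
    """Return names of items whose facets match every requested value."""
    return sorted(
        name
        for name, facets in items.items()
        if all(facets.get(f) == v for f, v in wanted.items())
    )

def facet_counts(items, facet):
    """Count items per value of one facet, as a faceted browser would show."""
    counts = {}
    for facets in items.values():
        value = facets.get(facet)
        counts[value] = counts.get(value, 0) + 1
    return counts
```

Unlike a single tree, no facet is privileged: the same item is reachable by purpose, by propulsion, or by wheel count.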
Types of graphs
Arbitrary directed graphs
  Allows arbitrary binary relationships
Named graphs
  Allows separate inclusion hierarchy
  Allow edges to point to/from subgraphs
Partial Orders
Many ontologies are Partial Orders (i.e., directed acyclic graphs), e.g., taxonomies, partonomies, ...
Mappings among partially ordered ontologies should be “order preserving”
See work of Cliff Joslyn (PNNL)
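One common reading of "order preserving" is that the mapping is monotone: whenever a is below b in the source ontology, the image of a is below or equal to the image of b in the target. The sketch below checks that property under this assumption; it is not taken from the cited work, and the function names and example terms are hypothetical.

```python
def ancestors(edges):
    """Map each node to the set of nodes strictly above it (input assumed acyclic)."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    closure = {}

    def up(node):
        if node not in closure:
            closure[node] = set()
            for p in parents[node]:
                closure[node] |= {p} | up(p)
        return closure[node]

    for node in parents:
        up(node)
    return closure

def is_order_preserving(mapping, source_edges, target_edges):
    """True if a <= b in the source implies mapping[a] <= mapping[b] in the target."""
    src, tgt = ancestors(source_edges), ancestors(target_edges)
    for a, above in src.items():
        for b in above:
            if a in mapping and b in mapping:
                fa, fb = mapping[a], mapping[b]
                if fa != fb and fb not in tgt.get(fa, set()):
                    return False
    return True
```

Terms left out of `mapping` are simply ignored, so partial mappings can be checked as they are built.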
Note:
RDF graphs are collections of edges (triples)
No naked nodes allowed
Graph Implementations
Represent graph as:
  Triple store (as on previous slide)
  Quad store (support named graphs)
Standalone system, relational DBMS, column store
Quad stores & Named graphs
Quad stores allow named graphs (named graph, subject, predicate, object)
Named graphs (quads) allow one to name subgraphs (collections of edges) and to refer to them by name
Hence, subjects and objects are no longer just nodes, but may be subgraphs (collections of edges)
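A minimal sketch of quads, assuming a plain list of (graph, subject, predicate, object) tuples: the graph name groups edges into a subgraph, and that name can itself appear as the subject or object of another edge. All identifiers (`g:vehicles`, `ex:ExampleCurator`, the helper functions) are hypothetical.

```python
quads = [
    ("g:vehicles", "ex:Car", "is_a", "ex:Vehicle"),
    ("g:vehicles", "ex:SportsCar", "is_a", "ex:Car"),
    ("g:parts", "ex:Wheel", "part_of", "ex:Car"),
    # An edge whose subject is a named graph, kept in a metadata graph.
    ("g:meta", "g:vehicles", "curated_by", "ex:ExampleCurator"),
]

def edges_in(quads, graph_name):
    """Return the (subject, predicate, object) edges of one named graph."""
    return [(s, p, o) for g, s, p, o in quads if g == graph_name]

def statements_about_graph(quads, graph_name):
    """Return edges that use the named graph itself as subject or object."""
    return [(s, p, o) for _, s, p, o in quads if graph_name in (s, o)]
```

This is one way to hang provenance or other metadata on a whole collection of edges rather than on each edge separately.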
Secondary storage of graphs
Long skinny relations
  Triples or quads
Column stores (MonetDB, Vertica)
Multiple indices sorted by: subject, predicate, object, combinations, ...
Clusters of edges (Cogito)
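The idea of keeping multiple indices sorted by different column orders can be sketched in memory with sorted lists and binary search. This is an assumed illustration of the general technique, not the layout of any of the systems named above; the class name and the choice of SPO, POS, and OSP orders are mine. It assumes all values are strings so that tuples compare cleanly.

```python
from bisect import bisect_left

class TripleIndex:
    """Keeps three sorted copies of the edge list: SPO, POS and OSP order."""

    def __init__(self, triples):
        self.spo = sorted((s, p, o) for s, p, o in triples)
        self.pos = sorted((p, o, s) for s, p, o in triples)
        self.osp = sorted((o, s, p) for s, p, o in triples)

    @staticmethod
    def _prefix_scan(index, prefix):
        """Return all entries of a sorted index that start with prefix."""
        start = bisect_left(index, prefix)   # first entry >= prefix
        out = []
        for entry in index[start:]:
            if entry[:len(prefix)] != prefix:
                break
            out.append(entry)
        return out

    def by_subject(self, s):
        return self._prefix_scan(self.spo, (s,))

    def by_predicate_object(self, p, o):
        return [(s, p, o) for _, _, s in self._prefix_scan(self.pos, (p, o))]

    def by_object(self, o):
        return [(s, p, o) for _, s, p in self._prefix_scan(self.osp, (o,))]
```

Each extra index multiplies storage and update cost, which is the trade-off behind choosing which column orders to maintain.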
Semantic graph query languages
SPARQL is now the primary candidate
Undergoing W3C “standardization”
Logic-based Ontology Representations
Description Logic (e.g., OWL-DL)
  Restricted to make it decidable and computationally tractable
  Typically, lacks cardinality constraints, arithmetic
Datalog (Horn clause logic + recursion)
Prolog based
First Order Logic (e.g., Common Logic)
IKL (FOL + name propositions)
Logic-based representations
Precise, formal semantics
Expressiveness (esp. FOL)
Issues of scaling, decidability, computational tractability
  Esp. for FOL
Description Logics growing usage
DL + rules languages to approx. FOL
Ontology Integration
Construct mappings between entities (concepts) in pairs of ontologies
Mapping relations: same_as, is_a, part_of, units_conversion
Specify mappings via: frames, graphs, or logic
  Graph-based mappings (C. Joslyn, PNNL)
  Logic-based mappings (PROMPT, N. Noy)
Materialization of Partial Orders
Partial orders = taxonomies, partonomies
Typically specified as direct “edges”
  Immediate is-a, or part-of relations
Naïve implementation requires repeated traversal of the partial order graph.
Materialization of the transitive closure of the partial order (e.g., taxonomy) can reduce query times
However, initialization and maintenance are expensive in time and storage
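The trade-off can be seen in a deliberately naive sketch: materialize every implied (descendant, ancestor) pair once, then answer is-a queries with a single set lookup. The function names are assumptions, and the fixed-point loop is chosen for clarity, not efficiency.

```python
def materialize(edges):
    """Return the set of all (descendant, ancestor) pairs implied by direct edges."""
    closure = set(edges)
    changed = True
    while changed:                       # repeat until no new pair appears
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def is_a(closure, x, y):
    """Answer 'is x a kind of y?' with one set lookup, no graph traversal."""
    return x == y or (x, y) in closure

direct = [("poodle", "dog"), ("dog", "mammal"), ("mammal", "animal")]
closure = materialize(direct)
```

Three direct edges grow to six stored pairs here; for a chain of n nodes the closure holds n(n-1)/2 pairs, and any inserted or deleted edge forces the closure to be updated, which is the initialization and maintenance cost noted above.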
Ontology Constraints
Type constraints
Range, domain constraints
Cardinality constraints on relations
DB Integrity constraints
  Functional dependencies
  Inclusion dependencies (foreign key constraints)
  Invertibility
  Disjointness (of subclasses)
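Several of these constraints can be checked mechanically over a triple list, in the style of database integrity checking. The sketch below treats missing type information as a violation, i.e. it assumes closed-world checking; an open-world reasoner would instead infer the missing type. The function signature and example data are hypothetical.

```python
from collections import Counter

def check_constraints(triples, types, domain, range_, max_card, disjoint):
    """Return a list of violations.

    types: node -> set of classes; domain / range_: predicate -> required class;
    max_card: predicate -> max objects per subject;
    disjoint: pairs of classes that may not share an instance.
    """
    violations = []
    for s, p, o in triples:
        if p in domain and domain[p] not in types.get(s, set()):
            violations.append(f"domain: {s} {p} {o}")
        if p in range_ and range_[p] not in types.get(o, set()):
            violations.append(f"range: {s} {p} {o}")
    counts = Counter((s, p) for s, p, _ in triples)
    for (s, p), n in counts.items():
        if p in max_card and n > max_card[p]:
            violations.append(f"cardinality: {s} has {n} values for {p}")
    for node, classes in types.items():
        for a, b in disjoint:
            if a in classes and b in classes:
                violations.append(f"disjoint: {node} is both {a} and {b}")
    return violations
```

A cardinality limit of 1 on a predicate is the triple-store analogue of a functional dependency from subject to object.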
Need for Provenance
Fiction: Ontologists write definitions ab initio
Reality: Most “definitions” are written by:
  Administrators (e.g., Code of Federal Regulations)
  Legislatures (legislation)
  Judges (court decisions)
  Professional bodies (accounting regulations)
Implications for Provenance
We need to track the provenance of definitions
Typically this requires citations to external documents
May also require tracking of individual “definition” decisions ....
Varying granularity requirements
  Individual definitions
  Collections of axioms, definitions
Examples: see ISO 11179, XMDR
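The varying granularity requirement can be sketched as a store that accepts provenance either for one definition or for a named collection, and falls back from the former to the latter on lookup. This is an assumed toy structure, not the ISO 11179 or XMDR data model; every class, field, and identifier here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Citation:
    source: str          # external document, e.g. a regulation or statute
    locator: str = ""    # section, page, or paragraph within the source

@dataclass
class Provenance:
    citations: list = field(default_factory=list)
    decided_by: str = ""     # who made the "definition" decision
    decided_on: str = ""     # date of the decision, as an ISO 8601 string

class ProvenanceStore:
    """Attach provenance to a single definition or to a named collection."""

    def __init__(self):
        self._by_target = {}
        self._members = {}   # collection name -> set of definition ids

    def annotate(self, target, provenance):
        self._by_target[target] = provenance

    def add_to_collection(self, collection, definition_id):
        self._members.setdefault(collection, set()).add(definition_id)

    def lookup(self, definition_id):
        """Prefer definition-level provenance, else fall back to a collection's."""
        if definition_id in self._by_target:
            return self._by_target[definition_id]
        for collection, members in self._members.items():
            if definition_id in members and collection in self._by_target:
                return self._by_target[collection]
        return None
```

In a quad store, the collection-level case maps naturally onto statements about a named graph.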
Other Tools
Ontology Creation tools
Ontology Editors
Ontology Differencing tools
Ontology modularization tools (clustering, etc.)
Ontology Export
Ontology Visualization (e.g., graph visualization)
Version management
Access control
SOA: Service Oriented Architecture
Very popular
Permit distributed implementations
Two major alternatives:
  REST (Representational State Transfer)
    Built on HTTP (get, put, delete, post operators)
    URL/URI addresses for all objects
  SOAP/WSDL
    Based on XML Remote Procedure Calls
REST vs SOAP
REST
  Simple to implement
  Requires little more than:
    HTTP server
    XML parsers
SOAP
  Much more software complexity
  Lots of software tooling from commercial vendors
  Better security ?
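To give a feel for how little a REST interface needs, here is a sketch using only Python's standard `http.server`: each ontology lives at a URL path, and GET, PUT, and DELETE read, store, and remove it. The in-memory `STORE`, the `dispatch` helper, the port, and paths such as `/ontologies/vehicles` are all assumptions; POST is left unimplemented and answers 405.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}   # URL path -> ontology document (bytes); in-memory for illustration

def dispatch(method, path, body=b""):
    """Map an HTTP method and URL path to (status code, response body)."""
    if method == "GET":
        return (200, STORE[path]) if path in STORE else (404, b"")
    if method == "PUT":
        created = path not in STORE
        STORE[path] = body
        return (201 if created else 204, b"")
    if method == "DELETE":
        return (204, b"") if STORE.pop(path, None) is not None else (404, b"")
    return (405, b"")        # method not supported in this sketch

class OntologyHandler(BaseHTTPRequestHandler):
    def _handle(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        status, payload = dispatch(self.command, self.path, body)
        self.send_response(status)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_PUT = do_DELETE = do_POST = _handle

def serve(port=8000):
    """Run the sketch server until interrupted."""
    HTTPServer(("localhost", port), OntologyHandler).serve_forever()
```

Calling `serve()` starts the server; a real repository would add parsing and validation of the uploaded documents, persistent storage, and access control.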
My advice on REST vs. SOAP:
Use REST.
Ontology Repository Related Standards
ISO/IEC 11179 Metadata Registries (version 3.0 of Part 3)
OMG ODM Ontology Definition Metamodel
ISO 13250 Topic Maps
XML Topic Maps Specification (topicmaps.org)
W3C OWL recommendations
W3C RDF recommendations
Ontology Related Standards
ISO/IEC 24707 Common Logic
ISO TC 37 Terminology Services Standards
W3C SKOS Simple Knowledge Organization System Reference
ISO/IEC 19763 Metamodel Framework for Interoperability (Ontology metadata)
Recapitulation
Ontology Repositories support storage, search, retrieval of multiple ontologies and ontology integration
Macro-level & Micro-level support and search pose similar problems
A common ontology representation is desirable, but difficult
Multiple ontology representations and ontology integration are the most difficult aspects.
Acknowledgements
This work was supported by NSF IPA agreement with LBNL, IRD support.
My earlier work on ontology repositories at LBNL was supported by EPA and DOD.
The author would like to thank Joel Sachs, Mark Musen, Natasha Noy, Eric Neumann, Bob MacGregor, Cliff Joslyn, Kevin Keck, Elisa Kendall, Mala Mehrotra, Dan Abadi, Deb McGuinness, et al. for their remarks to me about knowledge representation, ontology repositories and ontology mappings.
Contact Information
Frank Olken
National Science Foundation
4201 Wilson Blvd., Suite 1125
Arlington, VA 22230
Email: [email protected]
Tel: 703-292-8930 (receptionist)
Tel: 703-292-7350 (direct)