

IBM China Research Lab (IBM 中国研究院)

© Copyright IBM Corporation 2006

Ontology Storage, Reasoning and Query: Methods, Systems and Applications

Li Ma (马力), [email protected], IBM China Research Lab

Outline

Introduction
- Ontology Management
- Ontology-based Data Management

Ontology Storage, Reasoning and Query
- Triple Store
- Reasoning
- SPARQL Query Language
- Faceted Search

Systems and Applications
- Sesame, Jena, OWLIM, SOR
- Master Data Management

Objectives

- Understand issues in ontology management
- Understand the use of ontology for data management
- Learn the core technical pillars of the Semantic Web
- Learn systems and methods for building Semantic Web applications

What Is an Ontology?

Recall what we learned before. An ontology
- defines the terms and concepts used to describe and represent an area of knowledge;
- is a specification of a conceptualization: a formal way of writing down what we think about a domain.

Examples of Ontology

In the Extended Lehigh University Benchmark ontology (UOBM, 1,587 lines):

. . .
<owl:Class rdf:about="http://uob.iodt.ibm.com/univ-bench-dl.owl#AssistantProfessor">
  <rdfs:subClassOf>
    <owl:Class rdf:about="http://uob.iodt.ibm.com/univ-bench-dl.owl#Professor"/>
  </rdfs:subClassOf>
  <rdfs:label>assistant professor</rdfs:label>
</owl:Class>
. . .

Ontology: one possible formal definition

An ontology is a 3-tuple $O := (C, R, A)$, where
- $C$ is the set of concepts, $C = \{c_1, c_2, \dots, c_n\}$;
- $R$ is the set of relations, $R = \{h, r_1, r_2, \dots, r_n\}$ with $R \subseteq 2^{C \times C}$, where $h$ is the concept hierarchy;
- $A$ is the set of axioms, $A = \{a_1, a_2, \dots, a_n\}$; each axiom is expressed in some logical language, e.g. $\mathit{father}(a, b) \wedge \mathit{father}(a, c) \rightarrow b = c$;
- optionally, a symbol mapping function.

Ontology Reuse

Ontologies
- UMLS (Unified Medical Language System): 178,904 frames, 7 relations, 1,729,817 assertions
- WordNet (an online lexical reference system): 217,623 frames, 17 relations, 385,771 assertions
- IFW/FS-BOM (Financial Service - Business Object Model): 387 classes, 5,878 relations
- Caper (an ontology used by a case-based AI planner): 116,297 frames, 112 relations, 1,768,832 assertions
- GO (Gene Ontology, an ontology for gene terms): 18,059 classes, 1 relation
- NCI Thesaurus (an ontology by the National Cancer Institute): 72,603 classes, 70 relations
- Cyc (a formalized representation of fundamental human knowledge): 2,843 classes, 1,242 relations
- UNSPSC (Universal Standard Products and Services Classification Code): 9,795 classes, 1 relation

Why Ontology Management

Ontologies evolve over time because of:
- changes in the real world (changes in the domain);
- adaptations to different tasks (changes in conceptualization);
- alignments to other ontologies (changes in specification).

Solution: a change management methodology is needed that involves
- advanced versioning methods
- configuration management

An ontology management system facilitates ontology reuse by:
- open storage, identification and versioning;
- providing smooth access to existing ontologies and advanced support in adapting ontologies to specific domain and task circumstances;
- fully employing the power of standardization.

Questions in Ontology Management

- how to align different domain descriptions
- how to handle changes over time
- how to maintain versions of ontologies
- how to store ontologies
- how to identify and retrieve ontologies
- ... etc.

Issues in Ontology Management

Storage
- accessibility (client/server, peer-to-peer, etc.)
- classification (classifying ontologies in order to reorganize and reuse them)
- module structure (facilitates reuse, mapping and integration)

Identification
- unique identifier

Versioning
- versioning is critical for ensuring consistency among different versions of ontologies

Search and Query
- keyword-based searching or other advanced searching
- browsing

Editing
- remote and cooperative editing

Reasoning (deriving consequences from an ontology)
- ontology evaluation and verification
- any query-answering behavior

Alignment
- ontologies can be integrated or separated; in both cases, they need to be aligned

Alignment and Mapping

Why:
- Different departments and individual employees create domain-specific ontologies that capture specific aspects of their knowledge.

How:
- Special mapping ontologies must be created to link the different terminologies and modeling styles used in these domain-specific ontologies, creating bridges between separated pieces of knowledge.
- These bridges, along with the domain ontologies, are then used to perform cross-ontology information search and retrieval.

Three types of mapping:
- Inter-model mapping: mapping the ontology language constructs for ontology translation
- Inter-schema mapping: defining the relation between ontology elements for data translation
- Model-to-schema mapping: combining the above two

Outline

Introduction
- Ontology Management
- Ontology for Data Management

Ontology Storage, Reasoning and Query
- Triple Store
- Reasoning on Large-Scale Data
- SPARQL Query Language
- Faceted Search

Systems and Applications
- Sesame, Jena, OWLIM, SOR
- Master Data Management


Semantic Data Management

Mapping existing data to Ontology - Adapting SQL (or XML) Databases

Semantic Data Management: different levels of conceptual granularity and term variants

Queries
1. Find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies. (FOO should be returned.)
2. Find a software company that has wireless telecom products and is held by a Canadian company. (BAR should be returned.)

Company info.

ID  Name  Region     Tel.   Shareholders1  Shareholders2
1   BAR   Bei Jing   xxxxx  FOO            TIT
2   FOO   Paris      xxxxx  GUC            Null
3   ROL   New York   xxxxx  BAR            TIT
4   EDOX  New York   xxxxx  ROL            Null
5   GUC   Vancouver  xxxxx  CHA            Null

Company business

ID  Name  Business_1      Business_2
1   BAR   Memory          Wireless software
2   FOO   Optical comm.   Wireless comm.
3   ROL   Banking Solut.  Null
4   EDOX  Memory          Main Board

Ontology-based semantic query

The same two queries over the Company info. and Company business tables can be answered with the help of an ontology:
- FOO is retrieved using transitive closure and subsumption inference.
- BAR is retrieved using classification and subsumption inference.

[Figure: the ontology used for the queries. Classes Company, Shareholder, Region and Business are connected by the relations Hold, Located in and Conduct. A Business taxonomy covers Finance (Banking, ...) and IT (Telecom, PC, Hardware, Software, ...), with more specific terms such as Optical, Wireless, Wireless Software, Main board, Memory and Solution. A Region taxonomy covers Asia (East Asia, China, BeiJing), Europe (France, Paris) and America (North America, USA, NY, Canada, Vancouver).]
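The first query can be sketched in a few lines of Python to show what "transitive closure and subsumption inference" amount to. The table contents are transcribed from the slide; the child-to-parent edges in `PARENT` are an assumption read off the taxonomy figure, and all names (`SHAREHOLDERS`, `is_a`, `all_shareholders`, `query1`) are hypothetical helpers, not part of any system discussed in these slides.

```python
# Toy data transcribed from the two tables on the slide.
SHAREHOLDERS = {          # company -> its direct shareholders
    "BAR": ["FOO", "TIT"],
    "FOO": ["GUC"],
    "ROL": ["BAR", "TIT"],
    "EDOX": ["ROL"],
    "GUC": ["CHA"],
}
REGION = {"BAR": "Bei Jing", "FOO": "Paris", "ROL": "New York",
          "EDOX": "New York", "GUC": "Vancouver"}
BUSINESS = {
    "BAR": ["Memory", "Wireless software"],
    "FOO": ["Optical comm.", "Wireless comm."],
    "ROL": ["Banking Solut."],
    "EDOX": ["Memory", "Main Board"],
}

# Assumed fragment of the taxonomies (child -> parent); the exact edges are a guess.
PARENT = {
    "Paris": "France", "France": "Europe",
    "Bei Jing": "China", "China": "East Asia", "East Asia": "Asia",
    "New York": "USA", "USA": "North America",
    "Vancouver": "Canada", "Canada": "North America",
    "North America": "America",
    "Optical comm.": "Telecom", "Wireless comm.": "Telecom", "Telecom": "IT",
    "Wireless software": "Software", "Software": "IT",
    "Memory": "Hardware", "Main Board": "Hardware", "Hardware": "IT",
    "Banking Solut.": "Banking", "Banking": "Finance",
}


def is_a(term, ancestor):
    """Subsumption: walk child -> parent links until the ancestor is found."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def all_shareholders(company):
    """Transitive closure of the shareholder relation (direct and indirect)."""
    seen, stack = set(), list(SHAREHOLDERS.get(company, []))
    while stack:
        holder = stack.pop()
        if holder not in seen:
            seen.add(holder)
            stack.extend(SHAREHOLDERS.get(holder, []))
    return seen


def query1(company="EDOX"):
    """Shareholders of `company` located in Europe with an IT business."""
    return sorted(
        h for h in all_shareholders(company)
        if is_a(REGION.get(h), "Europe")
        and any(is_a(b, "IT") for b in BUSINESS.get(h, []))
    )


print(query1())
```

Without the taxonomy, a plain SQL filter on Region = 'Europe' would find nothing, because the table only stores 'Paris'; the subsumption walk is what bridges the two levels of granularity.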

Semantic Data Management: hidden linkages

Query: Which genes may be affected by the drug OLANZAPINE? (G1 and MAFD1 GENE should be returned.)

ID  Drugs       Symptom        Comments
1   OLANZAPINE  HALLUCINATION  Effective
2   OLANZAPINE  ANXIETY        Good
3   LITHIUM     DEPRESSION     Null

ID  Disease              Symptom_1   Symptom_2
1   ALZHEIMER'S DISEASE  ANXIETY     MEMORY LOSS
3   SCHIZOPHRENIA        DELUSION    HALLUCINATION
2   BIPOLAR DISORDER     DEPRESSION  HALLUCINATION

ID  Disease           Gene        Comments
1   SCHIZOPHRENIA     G1          NULL
2   BIPOLAR DISORDER  MAFD1 GENE  NULL
3   BIPOLAR DISORDER  DIBD1 GENE  NULL

Ontology-based semantic query

For the same query over the same three tables, an ontology links the classes Drug, Symptom, Disease and Gene through the relations Reduce, HasSym, Treat and Associatedwith.

G1 and MAFD1 GENE are retrieved along the Drug-Symptom-Disease-Gene affection path.

Semantic Data Management

Key Motivations
- Reduce complexity and simplify integration of IT systems and applications through the effective use of ontologies (metadata), and improve understanding through a shared business vocabulary (one language).
- Implicit linkages are made explicit, helping users discover hidden relationships.
- Ontologies (domain metadata) are managed separately and effectively rather than mixed with data, for better ontology sharing/reuse and rapid development of semantic-rich applications.

Capabilities
- Management of Data Concepts and Schemas (Metadata)
  - Storing and querying ontologies (shared vocabularies of business concepts, metadata)
  - Mapping database schemas to the ontology to formally capture the semantics of corporate data
- Semantic Data Validation
  - Using a description logic reasoner to validate the enterprise ontology
  - Using inference rules to validate the integrity of the data against a set of restrictions; the inference rules automatically identify inconsistencies when querying for information
- Semantic Query
  - Semantic relationships and taxonomies for text/content analysis and search
  - Ontology-based query on the existing data


Ontology-base (Ontology Repository)


Ontology with Large-scale Instances

RDF and OWL ontology repositories

Problem definition
- The continued rapid growth of ontologies in various domains requires efficient methods and tools for their storage and inference.
- Develop high-performance ontology repositories applicable in real business.

How to solve
- Build the storage model on a well-optimized RDBMS
- Provide powerful inference support (subset of OWL DL)
- Support an expressive query language, SPARQL
- Rich full-text search capability, such as faceted search

Results
- RDF and OWL ontology inference method
- Ontology storage tool
  - RDF ontology repository
  - OWL ontology repository

Summary

Topics                Function: Ontology Repository          Function: Semantic data management
Storage               Triple-based relational tables         Mapping between ontology and DB schema
Inference capability  RDF inference; powerful OWL inference  Less expressive inference
                      (OWL DL)
Query evaluation      SPARQL query on SQL query engine       SPARQL query + query rewriting
Capability            Pure ontology storage tool;            Use ontology to model existing data
                      ontology + instance storage tool       and provide semantic query

Outline

Introduction
- Ontology Management
- Ontology for Data Management

Ontology Storage, Reasoning and Query
- Triple Store
- Reasoning on Large-Scale Data
- SPARQL Query Language
- Faceted Search

Systems and Applications
- Sesame, Jena, OWLIM, DLDB-OWL, SOR
- Master Data Management

Triple Store: RDF Data Model

Data model for expressing knowledge
- basic building block: the statement

  <person001> <name> "Jeen" .

- groups of statements form graphs

[Figure: an RDF graph in which person001 has the name "Jeen", an email address and a worksIn link to project001; project001 has the name "SOR" and a projectMemberEmail link.]

Triple Store: RDFS

RDF Schema is a vocabulary description language
- it allows the specification of a domain vocabulary and a way to structure it
- Class, Property, subClassOf, subPropertyOf, domain, range

Formal semantics add simple reasoning capabilities:
- class and property subsumption
- domain and range inference

[Figure: example graph in which person001 has rdf:type Researcher, Researcher is an rdfs:subClassOf Person, and name is an rdf:Property with an rdfs:domain; the classes have rdf:type rdfs:Class.]
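The two reasoning capabilities named on this slide, subsumption and domain inference, can be illustrated with a naive fixpoint over a set of triples. This is a minimal sketch, not the inference procedure of any system covered here: it implements only three RDFS rules (subClassOf transitivity, type propagation along subClassOf, and rdfs:domain), and the `ex:` resources, including `ex:Agent` and `ex:person002`, are made up for the example.

```python
RDF_TYPE, SUBCLASS, DOMAIN = "rdf:type", "rdfs:subClassOf", "rdfs:domain"


def rdfs_closure(triples):
    """Apply three RDFS rules until no new triple appears (naive fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == SUBCLASS:
                # rdfs:subClassOf is transitive
                new |= {(s, SUBCLASS, o2) for s2, p2, o2 in facts
                        if p2 == SUBCLASS and s2 == o}
                # instances of a subclass are instances of the superclass
                new |= {(x, RDF_TYPE, o) for x, p2, c in facts
                        if p2 == RDF_TYPE and c == s}
            elif p == DOMAIN:
                # using property s on x implies that x belongs to the domain class o
                new |= {(x, RDF_TYPE, o) for x, p2, _ in facts if p2 == s}
        if new <= facts:
            return facts
        facts |= new


graph = {
    ("ex:Researcher", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:person001", RDF_TYPE, "ex:Researcher"),
    ("ex:name", DOMAIN, "ex:Person"),
    ("ex:person002", "ex:name", "Jeen"),
}
closure = rdfs_closure(graph)
for triple in sorted(closure - graph):
    print(triple)
```

The loop recomputes every rule on every pass, which is fine for a handful of triples; stores that materialize the closure over large data need something incremental.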

Triple Store: OWL

<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xml:base="http://www.ibm.com/crl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">

  <owl:Ontology rdf:about="">
    <rdfs:comment>An example to show differences between owl-lite, DLP and owl-DL</rdfs:comment>
  </owl:Ontology>

  <owl:Class rdf:about="#Faculty">
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Professor" />
      <owl:Class rdf:about="#Post-Doc" />
      <owl:Class rdf:about="#Lecturer" />
    </owl:unionOf>
  </owl:Class>

  <owl:Class rdf:ID="Ph.D Student">
    <rdfs:subClassOf>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class rdf:about="#Person" />
          <owl:Restriction>
            <owl:onProperty rdf:resource="#take" />
            <owl:someValuesFrom>
              <owl:Class rdf:about="#Ph.D course" />
            </owl:someValuesFrom>
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
    </rdfs:subClassOf>
  </owl:Class>

[Figure: graph view of the two class definitions. Faculty is the owl:unionOf Professor, Post-Doc and Lecturer; Ph.D Student is an rdfs:subClassOf a blank node that is the owl:intersectionOf Person and a restriction on the take property (Ph.D course).]

  <owl:Class>
    <owl:complementOf>
      <owl:Class rdf:about="#Ph.D Student" />
    </owl:complementOf>
  </owl:Class>

  <owl:Class rdf:about="#Major">
    <owl:oneOf rdf:parseType="Collection">
      <owl:Thing rdf:about="#CS" />
      <owl:Thing rdf:about="#EE" />
      <owl:Thing rdf:about="#Physics" />
    </owl:oneOf>
  </owl:Class>

  <owl:Class rdf:ID="FrenchCitizens">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasNationality" />
        <owl:hasValue rdf:resource="#France" />
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>

  <owl:Class rdf:ID="multiple nationality citizens">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasNationality" />
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">2</owl:minCardinality>
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>

</rdf:RDF>

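[Figure: graph view of the remaining definitions. An anonymous class is the owl:complementOf Ph.D student; Major is defined by owl:oneOf over CS, EE, ...; FrenchStudent is linked to France through hasNationality.]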


Triple Store: Generic Storage Model

Triple Store: Problems

- Does not leverage patterns in the data
- Cannot leverage locality (spatial/temporal)
- Excessive load time (cannot use the database loader)
- Database optimizer is useless because it has no statistics, e.g. (?var, ex:empId, 123) vs. (?var, ex:gender, "M")

Alternatives: native RDF store, object-relational store, property tables
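The statistics problem can be made concrete with a small experiment. The sketch below assumes a generic store with a single `triples(s, p, o)` table (the table and column names are illustrative) and uses Python's built-in `sqlite3` module with synthetic data. Both triple patterns from the slide become the same SQL shape on the same columns, although one matches a single row and the other matches half of the employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Generic storage model: one table holds every statement.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

rows = []
for i in range(1000):
    emp = f"ex:emp{i}"
    rows.append((emp, "ex:empId", str(i)))
    rows.append((emp, "ex:gender", "M" if i % 2 == 0 else "F"))
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)


def count_matches(predicate, obj):
    """Number of subjects matching the pattern (?var, predicate, obj)."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM triples WHERE p = ? AND o = ?", (predicate, obj))
    return cur.fetchone()[0]


selective = count_matches("ex:empId", "123")    # matches a single employee
unselective = count_matches("ex:gender", "M")   # matches every second employee
print(selective, unselective)
```

Because the values of all predicates share the one `o` column, column-level statistics mix employee ids with gender codes, which is one way to read the "no statistics" complaint. A property table with separate `empId` and `gender` columns would give the optimizer per-attribute value distributions.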


Triple Store: Binary Storage Model

Too many tables?


Triple Store: Improved Generic Storage Model

Triple Store: Native Storage Model

- OWLIM: persistence based on N-Triples files
- HStar: persistence based on an XML storage model

Summary and Questions

Categories of triple stores:
- Generic store with a very simple schema
- Binary store leveraging OR/OO DB features
- Native store without complicated transaction and access control

We will introduce optimizations of triple stores in the next class:
- Jena's property table
- Index mechanisms
- Optimization on the triple table
- Binary store vs. generic store

Ontologies in Semantic Web

Statistics based on a WWW 2006 paper by T.D. Wang: a survey of 1,211 ontologies.

[Figure: share of the surveyed ontologies by expressivity level, with the values 46%, 32% and 22% and one level marked "not measured in paper". The levels shown are RDFS (DL), Description Horn Logic, OWL Lite and OWL DL. The construct labels on the figure include subsumption between classes (C, D) and between properties (R, P), Domain and Range; restrictions over R.C; transitivity, symmetry and inverse; and, for OWL DL, cardinality restrictions nR and oneOf.]

Shallow ontologies include a relatively simple TBox and are mainly used to organize very large numbers of instances.
Deep ontologies consist of complex concepts and relations and are often used to classify complex sets of properties as certain sorts of object.

Nigel Shadbolt, Tim Berners-Lee and Wendy Hall, The Semantic Web Revisited, IEEE Intelligent Systems 21(3), pp. 96-101, 2006

Ontology Reasoning

Existing ontology persistence systems can be roughly categorized into two classes by how they reason:

- DL-based systems: the database serves mainly for scalable storage and convenient retrieval, and classic DL tableaux algorithms are used for reasoning. Query answering is reduced to checking the satisfiability of the KB.
  - Instance Store: role-free system, focused on instance classification.
  - IBM SHER Engine: uses summarization and filtering technologies.

- Rule-based systems: translate DL constructs into rules. Some DL constructs (e.g. existential restrictions) are either partially forbidden (as DLP does) or assigned new meanings (as OWL Flight does). Unlike DL tableaux algorithms, queries are evaluated by forward chaining or backward chaining.
  - KAON2: reduces OWL to disjunctive Datalog programs, extended with DL-safe rules.
  - OWLIM: materializes the inferred closure of OWL KBs.
  - Sesame RDF database: materializes inferred results in the database.
  - Jena2 with RDBMS support: uses an external reasoning engine in main memory.
  - Oracle RDF database: supports user-defined rules and materializes a rule index.

Ontology Reasoning

Description Logic Programs (DLP)
- An intersection of Description Logics and Logic Programming that can be implemented by a set of rules.
- Some DL constructs (e.g. existential restrictions) are partially forbidden in subsumption axioms.

These systems are in essence knowledge bases that use databases as a persistent store and focus on the ontology reasoning problem. The discrepancies between ontologies and databases receive less attention.

Discrepancies Between Ontologies and Databases

Ontologies and Description Logic (OWL DL)
- Open World Assumption (allows incomplete information in the ABox)
- Restrictions are used for reasoning
  - e.g. takeCourse rdfs:domain People, John takeCourse English001
- Monotonic negation
- Reasoning in OWL DL is NExpTime-complete
  - TBox reasoning can be done well
  - ABox reasoning is not scalable

Databases
- Closed World Assumption (information is understood as complete)
- Constraints are used for checking
- Non-monotonic negation
- Industry-strength tools

[Figure: relational tables People and takeCourse with the columns pID, courseID and name.]
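The difference between "restrictions for reasoning" and "constraints for checking" can be shown on the takeCourse example. In this sketch the same two inputs are read once as a database with a foreign-key style constraint and once as an ontology with an rdfs:domain axiom. The function names and the extra person "Mary" are hypothetical and only serve the illustration.

```python
people = {"Mary"}                          # rows of a People table
take_course = [("John", "English001")]     # rows of a takeCourse table


def check_constraint(people, take_course):
    """Closed world: rows that reference an unknown person are violations."""
    return [(p, c) for p, c in take_course if p not in people]


def infer_domain(people, take_course):
    """Open world: takeCourse rdfs:domain People adds John to People."""
    return set(people) | {p for p, _ in take_course}


violations = check_constraint(people, take_course)
inferred_people = infer_domain(people, take_course)
print(violations)
print(sorted(inferred_people))
```

The database view rejects the John row as inconsistent with what it knows, while the ontology view treats the same row as evidence of a previously unstated fact.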

Ontology Reasoning

We will introduce more details in the next class:
- Forward chaining
- Backward chaining
- Ways to scale up classic DL reasoners
- Ways to bridge the discrepancies between ontologies and databases

Semantic Web Query and Search

- Keyword search (SWOOGLE)
- SPARQL query
- Faceted browsing
- Visualization

Query Semantic Web

- Data access: SPARQL
- Information organisation: OWL, RDFS
- Information format: RDF
- Identification: URIs
- Serialization: XML

Query Patterns

[Figure: several example query patterns drawn as graphs. The nodes are Company, Listed Company, Advisory Company and Person; the edges are the relations ShareHolding, Guarantee, Administering, Advising, Occupy and Relative.]


SPARQL

SPARQL = Query Language + Protocol + XML Results Format

• Access and query RDF graphs

• HTTP and SOAP

• Results: fixed XML form for further transformation

• Product of the RDF Data Access Working Group

• Status: W3C Candidate Recommendation

SPARQL Query

PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?title2
WHERE {
  ?doc dc:title "SPARQL at speed" .
  ?doc dc:creator ?c .
  ?docOther dc:creator ?c .
  ?docOther dc:title ?title2
}

• On a papers database: "Find other papers by the authors of a given paper."

SPARQL Query

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX shop: <http://example/shop#>

SELECT ?title ?name ?price
WHERE {
  ?doc dc:title ?title .
  FILTER regex(?title, "SPARQL") .
  ?doc dc:creator ?c .
  ?c foaf:name ?name .
  OPTIONAL { ?doc shop:price ?price }
}

• "Find books with 'SPARQL' in the title. Get the authors' names and the price (if available)."

• Multiple vocabularies

SPARQL Query and Inference

An RDF graph may be backed by inference
- OWL, RDFS, application, rules

Data:
:x rdf:type :C .
:C rdfs:subClassOf :D .

Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type
WHERE {
  ?x rdf:type ?type .
}

Result:
--------
| type |
========
| :C   |
| :D   |
--------

SPARQL: Data Virtualization

SPARQL as integrator
- Data remains where it is
- Existing applications are untouched
- Data appears as RDF; queries are remapped to the native form

SPARQL to SQL
- Direct mapping of tables: semi-automatic generation of the mapping
- Modelled (D2RQ): high-quality mapping, manually developed
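To give a feel for what "remap query to native form" involves, here is a minimal sketch that translates a list of triple patterns into SQL over a generic `triples(s, p, o)` table, using one table alias per pattern and equality conditions for shared variables. It is an illustration only: the function `bgp_to_sql` and the table layout are assumptions, it does not reflect how D2RQ or any other mapping tool works, and it handles neither FILTER nor OPTIONAL nor a pattern list without variables.

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate triple patterns into one SQL query with a self-join per pattern."""
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in seen:
                    # a repeated variable becomes a join condition
                    where.append(f"{ref} = {seen[term]}")
                else:
                    seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                # a constant becomes a bound parameter
                where.append(f"{ref} = ?")
                params.append(term)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:title", "SPARQL at speed"),
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:alice"),
])

sql, params = bgp_to_sql([
    ("?doc", "dc:title", "SPARQL at speed"),
    ("?doc", "dc:creator", "?c"),
])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Each additional triple pattern adds another self-join on the same table, which is one reason the storage model and its indexes matter so much for query performance.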

Federated Query: Single Point of Access

[Figure: a Query Broker (SPARQL => SPARQL) receives a SPARQL query and forwards SPARQL queries to the sources DocDB, CorpLDAP and RDF.]

Inputs:
- Service description
- Information directory
- Request

Outputs:
- Unified results

SPARQL Query Processing

Find those who worked on a product bought by a specific customer, and return their name and their age if they are younger than 35; and those whose department sold a product to a specific customer and whose age is older than 50.

PREFIX md: <http://crl.ibm.com/MDM#>
SELECT ?person ?age
WHERE {
  { <customer> md:buyProducts ?product .
    ?person md:workOn ?product .
    OPTIONAL { ?person md:age ?age . FILTER (?age < 35) } }
  UNION
  { <customer> md:buyProducts ?product .
    ?product md:producedBy ?department .
    ?person md:belongTo ?department .
    OPTIONAL { ?person md:age ?age . FILTER (?age > 50) } }
}

Query Pattern Tree

ORPattern (Vars: ?person, ?age)
- andPattern P1 (Vars: ?person, ?product): <customer> md:buyProducts ?product . ?person md:workOn ?product
  - OPPattern P3 (Vars: ?person, ?age): ?person md:age ?age . FILTER (?age < 35)
- andPattern P2 (Vars: ?person, ?product, ?department): <customer> md:buyProducts ?product . ?product md:producedBy ?department . ?person md:belongTo ?department
  - OPPattern P4 (Vars: ?person, ?age): ?person md:age ?age . FILTER (?age > 50)


A Relational Algebra for SPARQL

Evaluation of Different Access Methods

- Compared to zero-effort interfaces (keyword search)
- Users were given simple tasks (e.g. find all blue-eyed terrorists)
- Results:
  - higher solution rate, preferred interface
  - complex queries are difficult for people
  - ranking is not intuitive

Faceted Search

Facet = metadata element
- e.g. 'author', 'title', 'date', 'type'

Facets have values
- e.g. 'author is J. Brown'

In collections, facet values are related
- e.g. author 'J. Brown' is connected to title 'Once upon a time ...'

Faceted search = choose a facet value and see all related facets and values
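The definition above can be turned into a few lines of Python: selecting a facet value filters the collection, and the remaining items determine which facet values (and how many hits each) are offered next. The item records and the helper names `select` and `facet_counts` are invented for this sketch.

```python
from collections import Counter

ITEMS = [
    {"author": "J. Brown", "type": "HTML Document", "date": "2005"},
    {"author": "J. Brown", "type": "XML Document", "date": "2006"},
    {"author": "A. Smith", "type": "HTML Document", "date": "2006"},
]


def select(items, constraints):
    """Keep the items whose facets match every chosen facet value."""
    return [it for it in items
            if all(it.get(f) == v for f, v in constraints.items())]


def facet_counts(items):
    """For every facet, count how many items carry each value."""
    counts = {}
    for it in items:
        for facet, value in it.items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts


hits = select(ITEMS, {"author": "J. Brown"})
counts = facet_counts(hits)
for facet, values in counts.items():
    print(facet, dict(values))
```

Recomputing the counts after each selection is what lets the interface show only the options that still lead to results, so the user never drills down into an empty set.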

Faceted Search

Problem solved
- the user has problems specifying a query
- over- and under-specification

Solution
- show all options
- give ways to drill down into the information

Applied
- database selection (e.g. job sites), e-commerce (e.g. travel), enhancement of (full-text) search

Example of Faceted Search

[Screenshot annotations]
- Facet: Type
- Facet values: Adobe AD, HTML Document, XML Document
- Number of instances per facet value

Example of Faceted Search

[Screenshot annotations]
- facets, categories and result counts in each category
- specify keywords for search
- details of the current query (editable)
- the list of search results


Facets are Data Views

Each navigation facet is driven by a SPARQL query on the underlying repository

SPARQL queries can retrieve and transform the data to provide a facet ‘view’

Spectacle uses the query results to populate the facet with values


BrowseRDF.com


Search and Explore

[UI mockup: Semantics Technologies | IBM China Research Lab]
- Search box with the keyword "websphere" and a "Go!" button.
- Query Navigation panel: "WebSphere" (~170,000).
- Schema Search and Navigation panel (search Categories), listing facets with result counts:
  - Industry: Automotive (162), Banking (132), Healthcare (203), ...
  - Brand: WebSphere (362), Workplace (12), Rational (13), ...
  - Product: WebSphere Application Server (162), WebSphere Portal Server (132), WebSphere Process Server (203), ...
- Instance Search and Navigation panel (Items search "websphere"): about 170,000 results, each shown with attributes such as price, year and type and a full-text description, followed by a "next page" link.

[UI mockup: the same "WebSphere" (~170,000) view with two callouts]
- Restart the query
- Customizable layout via max/minimize and drag & drop

[UI mockup: selecting the Banking category refines the query]
- Query Navigation: "WebSphere" & Banking (~1000).
- The categories are recomputed for the narrowed result set:
  - Banking: Core Systems (16), Customer Insight (13), Payments (20), ...
  - Brand: WebSphere (32), Workplace (1), Rational (3), ...
  - Product: WebSphere Application Server (12), WebSphere Portal Server (32), WebSphere Process Server (23), ...
- Items search: about 1,000 results, again listed with price, year, type and description.

[UI mockup: relationship search on the current result set]
- Query Navigation: "WebSphere" & Banking (~1000), Any Item (~3000), (~4000).
- search Relationships lists relation types with counts: Replaced-By (200), Cross Sell (3500) with the subtypes HV Cross-Sell (3000) and AB Cross-Sell (500), Sold-In (100), Up-Sell (105), Manufactured-By (105).
- Item Relations search: about 4,000 results, e.g. "Item 1 sold-in store1, store2, store3, ...", "Item 1 hv cross-sell item2, item3, item 4, ...", "item 2 replaced-by item4, item7".
- Selecting Cross Sell narrows the view to HV Cross-Sell (3000) and AB Cross-Sell (500), with about 3,500 results; the Query Navigation then shows Cross Sell (~3500) and Any Item (~2500).

[UI mockup: back to category search on the related items]
- Query Navigation: "WebSphere" & Banking (~1000), Cross Sell (~3500), Any Item (~2500).
- Categories:
  - Solution: Core Systems (16), Customer Insight (13), OLAP (20), ...
  - Brand: WebSphere (32), Workplace (1), Rational (3), ...
  - Product: WebSphere Application Server (12), WebSphere Portal Server (32), WebSphere Process Server (23), ...
- Items search: about 2,500 results.

It starts over again on category constraints: search and query refinement is an iterative process in which the user can go back at any time.

The Query Navigation panel is also expandable to occupy more screen space.

Outline

Introduction
- Ontology Management
- Ontology-based Data Management

Ontology Storage, Reasoning and Query
- Triple Store
- Reasoning on Large-Scale Data
- SPARQL Query Language
- Faceted Search

Systems and Applications
- Sesame, Jena, OWLIM, SOR
- Master Data Management

Sesame

- A framework for storage, querying and inferencing of RDF and RDF Schema
- A Java library for handling RDF
- A database server for (remote) access to repositories of RDF data
- Open source project by Aduna

http://www.openRDF.org/

Sesame features

- Light-weight yet powerful Java API
- Highly expressive query and transformation languages
  - SeRQL, SPARQL
- High scalability (O(10^7) RDF triples on desktop hardware)
- Various backends
  - Native Store
  - RDBMS (MySQL, Oracle 10, DB2, PostgreSQL)
  - main memory
- Reasoning support
  - RDF Schema reasoner
  - OWL DLP (OWLIM)
  - domain reasoning (custom rule engine)
- Rio Toolkit: parsers and writers for different RDF syntaxes
  - RDF/XML, Turtle, N3, N-Triples, TriX

Sesame Architecture

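[Figure: Sesame architecture]

Layer labels: Remote Access, Repository Services, Modules, SAILs

Clients: Client1 (HTTP), Client2 (RMI), Client3 (SOAP)

Protocol handlers: HTTP Handler, SOAP Handler

Modules: Admin Module, Query Module, Export Module

Storage And Inference Layers (SAIL)

Physical storage: DB, Files, …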

Sesame Architecture

[Figure: Sesame component stack, with annotations]

RDF Model
- The core RDF model, containing objects and interfaces for URIs, blank nodes, literals, statements.

Rio (RDF I/O)
- Set of parsers and writers for RDF/XML, Turtle, N3, N-Triples. Can be used separately.

SAIL API (Storage And Inference Layer)
- System API for ‘wrapping’ storage backend

SAIL Query Model (SeRQL, SPARQL)
- Declarative querying and other ‘higher-level’ functions on SAILs

Repository Access API
- Main access API of Sesame
- Offers developer-friendly methods for manipulating RDF data (query, adding, removing, updating)

application (local)
- Local apps can just include (parts of) Sesame as a Java library and use it to process RDF data efficiently.

HTTP Server
- Allows deployment of Sesame as a web-enabled database server (e.g. in Tomcat). Implements a superset of the SPARQL protocol (HTTP REST).

application (remote, via HTTP / SPARQL protocol)
- Remote apps can communicate over the Web with a Sesame server and update data or do queries.

The SAIL API

Storage And Inferencing Layer

Abstraction from physical storage
- allows other Sesame components to function on any type of store
- can be used as a wrapper layer for a particular data source

System Internal API
- application developers typically do not use it directly
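
The idea of abstracting from physical storage can be illustrated with a small Python sketch. It is not the Sesame SAIL interface (which is a Java API); the class names `TripleStore`, `MemoryStore`, `DictWrapperStore` and the function `count_subjects` are hypothetical and chosen only for illustration. The point is that a higher-level component written against the abstract interface works unchanged on a native in-memory store and on a wrapper around an existing data source.

```python
from abc import ABC, abstractmethod


def _matches(pattern, triple):
    """True if every non-None pattern position equals the triple position."""
    return all(q is None or q == v for q, v in zip(pattern, triple))


class TripleStore(ABC):
    """Hypothetical storage abstraction; not the actual Sesame SAIL API."""

    @abstractmethod
    def add(self, s, p, o):
        """Store one statement."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""


class MemoryStore(TripleStore):
    """Keeps statements in a Python set."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        return (t for t in self._triples if _matches((s, p, o), t))


class DictWrapperStore(TripleStore):
    """Wraps an existing data source (a dict of records) as read-only triples."""

    def __init__(self, records):
        self._records = records

    def add(self, s, p, o):
        raise NotImplementedError("read-only wrapper")

    def match(self, s=None, p=None, o=None):
        for subj, fields in self._records.items():
            for pred, obj in fields.items():
                if _matches((s, p, o), (subj, pred, obj)):
                    yield (subj, pred, obj)


def count_subjects(store, p, o):
    """A higher-level component that works on any TripleStore."""
    return len({s for s, _, _ in store.match(p=p, o=o)})


memory = MemoryStore()
memory.add("ex:alice", "rdf:type", "ex:Student")
memory.add("ex:bob", "rdf:type", "ex:Student")
wrapped = DictWrapperStore({"ex:carol": {"rdf:type": "ex:Student"}})
```

Here `count_subjects(memory, "rdf:type", "ex:Student")` and `count_subjects(wrapped, "rdf:type", "ex:Student")` run the same code path against two different backends.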

The Repository Access API

A single Java object representation for a Sesame database, offering methods for
- evaluating a query and retrieving the result
- adding RDF data from local file, from the web, as a text string, etc.
- adding/removing (sets of) RDF statements
- starting/stopping transactions
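
To make the list of operations concrete, here is a minimal Python sketch of a repository object with the same kinds of methods. It is an assumption-laden toy, not Sesame's Repository Access API: the class `Repository`, its method names, and the whitespace-separated text syntax accepted by `add_text` are all hypothetical. Transactions are modelled by editing a working copy that replaces the committed state only on `commit`.

```python
class Repository:
    """Hypothetical in-memory repository; names do not mirror Sesame's API."""

    def __init__(self):
        self._statements = set()
        self._pending = None          # working copy while a transaction is open

    def begin(self):
        if self._pending is not None:
            raise RuntimeError("transaction already open")
        self._pending = set(self._statements)

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._statements, self._pending = self._pending, None

    def rollback(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending = None

    def _target(self):
        # Outside a transaction, changes apply to the committed state directly.
        return self._statements if self._pending is None else self._pending

    def add(self, statements):
        self._target().update(statements)

    def remove(self, statements):
        self._target().difference_update(statements)

    def add_text(self, text):
        """Add 'subject predicate object' lines (a toy syntax, not N-Triples)."""
        self.add(tuple(line.split(maxsplit=2))
                 for line in text.splitlines() if line.strip())

    def query(self, s=None, p=None, o=None):
        """Evaluate one triple pattern against the committed state."""
        pattern = (s, p, o)
        return sorted(t for t in self._statements
                      if all(q is None or q == v for q, v in zip(pattern, t)))


repo = Repository()
repo.add_text("ex:a rdf:type ex:Student\nex:b rdf:type ex:Student")

repo.begin()
repo.remove([("ex:a", "rdf:type", "ex:Student")])
repo.rollback()                                   # the removal is discarded
after_rollback = repo.query(p="rdf:type")

repo.begin()
repo.remove([("ex:a", "rdf:type", "ex:Student")])
repo.commit()                                     # the removal becomes visible
after_commit = repo.query(p="rdf:type")
```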

OWLIM: A Native RDF Store

OWLIM is an OWL DLP In-Memory SAIL for Sesame 1.1.

SAIL = storage and inference layer

OWLIM supports partial, forward-chaining-based reasoning over OWL DLP.

It is open source.

OWLIM “reasons” in memory, but has a comprehensive persistence and backup strategy.

Persistence is based on N-Triples files:
- A number of files can be given as “pre-loaded”
- Only one of them can be updated (the “main trunk”)

Special attention is paid to ensuring that no loss of information or inconsistency can be caused by an unexpected interruption.

The strategy for synchronizing the in-memory representation with the persistent files is configurable.

TRREE, http://www.ontotext.com/trree/, stands for Triple Reasoning and Rule Entailment Engine. TRREE performs reasoning based on forward-chaining of entailment rules over RDF triple patterns with variables.
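
Forward-chaining of entailment rules over triple patterns with variables can be sketched as follows. This is a naive illustration of the general technique, not TRREE's implementation: the function names, the `?x` variable convention, the two RDFS-style rules, and the `ex:` facts are assumptions made for the example. Each rule is a list of premise patterns plus one conclusion pattern, and the rules are applied repeatedly until no new triple is derived.

```python
def is_var(term):
    """Variables are written with a leading question mark, e.g. '?x'."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None           # variable already bound to something else
        elif p != t:
            return None               # constant does not match
    return extended


def solutions(premises, triples, binding=None):
    """Yield every variable binding that satisfies all premise patterns."""
    binding = {} if binding is None else binding
    if not premises:
        yield binding
        return
    for triple in triples:
        extended = match(premises[0], triple, binding)
        if extended is not None:
            yield from solutions(premises[1:], triples, extended)


def forward_chain(triples, rules):
    """Apply the rules until no new triple can be derived (a fixpoint)."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Materialise the bindings first so the set is not modified
            # while it is being iterated.
            for b in list(solutions(premises, closure)):
                new = tuple(b.get(term, term) for term in conclusion)
                if new not in closure:
                    closure.add(new)
                    changed = True
    return closure


RULES = [
    # Subclass transitivity.
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    # Instances of a class are instances of its superclasses.
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
]

FACTS = {
    ("ex:AssistantProfessor", "rdfs:subClassOf", "ex:Professor"),
    ("ex:Professor", "rdfs:subClassOf", "ex:Faculty"),      # hypothetical axiom
    ("ex:alice", "rdf:type", "ex:AssistantProfessor"),      # hypothetical fact
}

closure = forward_chain(FACTS, RULES)
```

The sketch rescans the whole triple set in every round, which is simple but slow; a production engine would typically restrict each round to bindings that involve newly derived triples.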

Jena Architecture

[Figure: Jena architecture]

Network API: Joseki

Query: RDQL, other QL

Inferences: DAML API, RDF-S API

RDF API

Stores: main memory, relational database, Berkeley DB

Readers: ARP, n-triples, N3

Writers: RDF/XML, n-triples, N3

IBM Scalable Ontology Repository

[Figure: IBM SOR architecture]

Stages: Import, Storage, Query Answering, Reasoning

Components
- OWL documents, OWL Parser, DB Translator, TBox Translator
- Persistent Store
- SPARQL Processor, Query Adaptor, Users
- Reasoners: DL Reasoner, Rule Inference Engine (labelled in different builds of the slide as Simplified, Enhanced, or Lightweight Datalog Engine)
- SHER: DL Reasoner, ABox Summarizer, ABox Filters

Data flows (arrow labels)
- Generate EODM models from documents
- Load and traverse EODM ABox; insert ABox assertions into DB
- Load and traverse EODM TBox; insert TBox
- Retrieve TBox; retrieve subsumption; insert TBox
- SPARQL queries and results
- Generate SPARQL memory model
- SPARQL2SQL translation; return results
- Retrieve data for query answering & reasoning; insert data for reasoning
- Generate reasoning task; returned results by SQLs; return results
- Membership and relationship query

IBM SOR: Inference

The method combines TBox inference by a DL reasoner with logic rules for ABox inference. This ensures that inference over OWL ontologies restricted to DLP is sound and complete, and that the important subsumption relationships among classes and properties are made explicit.

TBox inference of DL reasoner + ABox logic rules
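
The two-stage combination can be sketched in Python under strong simplifying assumptions. The function `tbox_closure` is only a stand-in for the DL reasoner: it computes the transitive closure of told subclass axioms, whereas a real DL reasoner also derives subsumptions from complex class expressions. The function `abox_rule`, the class names, and the individual `alice` are hypothetical. Because the TBox stage has already made all subsumptions explicit, the ABox rule needs only a single pass.

```python
def tbox_closure(sub_class_of):
    """Stand-in for the DL reasoner: close told subclass axioms transitively."""
    supers = {c: set(parents) for c, parents in sub_class_of.items()}
    changed = True
    while changed:
        changed = False
        for parents in supers.values():
            # Superclasses of the current superclasses that are not yet recorded.
            extra = set().union(*(supers.get(p, set()) for p in parents)) - parents
            if extra:
                parents |= extra
                changed = True
    return supers


def abox_rule(type_assertions, supers):
    """ABox rule: if x is an instance of C and C is subsumed by D, x is a D."""
    inferred = set(type_assertions)
    for individual, cls in type_assertions:
        for sup in supers.get(cls, ()):
            inferred.add((individual, sup))
    return inferred


# Stage 1: TBox inference makes the class hierarchy explicit.
told = {"AssistantProfessor": {"Professor"}, "Professor": {"Faculty"}}
supers = tbox_closure(told)

# Stage 2: ABox rules use the explicit hierarchy.
abox = abox_rule({("alice", "AssistantProfessor")}, supers)
```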

Application: Semantic Master Data Management

“Master data is data that is shared across systems (such as lists or hierarchies of customers, suppliers, accounts, or organizational units) and is used to classify and define transactional data.” [IDC]

Examples
- Sell Product A to Customer X on 1/1/06 for $100.
- With Master Data, we should be able to answer questions such as the following.

What is a “customer”?
- It is a subclass of People with the specific attributes A, B, C …

How to add a new customer?
- Defines the workflow

How to know that 2 customers refer to the same identity?
- Defines some business rules
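
As a purely illustrative example of such a business rule, the sketch below treats two customer records as the same identity when their e-mail addresses match, or when both name and birth date match. The rule itself, the field names, and the sample records are hypothetical and are not taken from any IBM product.

```python
def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(str(value).lower().split())


def same_identity(a, b):
    """Hypothetical rule: equal e-mail, or equal name and equal birth date."""
    email_a, email_b = a.get("email"), b.get("email")
    if email_a and email_b and normalize(email_a) == normalize(email_b):
        return True
    name_a, name_b = a.get("name"), b.get("name")
    birth_a, birth_b = a.get("birth_date"), b.get("birth_date")
    if not (name_a and name_b and birth_a and birth_b):
        return False                  # not enough evidence to decide
    return normalize(name_a) == normalize(name_b) and birth_a == birth_b


c1 = {"name": "Li  Wei", "email": "LI.WEI@example.com", "birth_date": "1980-02-03"}
c2 = {"name": "li wei", "email": "li.wei@example.com"}
c3 = {"name": "Li Wei", "birth_date": "1975-07-01"}
```

With these records, `c1` and `c2` match through the e-mail address, while `c1` and `c3` share a name but differ in birth date and therefore do not match.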

What Is Master Data Management?

Decouples master information from individual applications

Becomes a central, application independent resource

Simplifies ongoing integration tasks and new app development

Ensures consistent master information across transactional and analytical systems

Addresses key issues such as data quality and consistency proactively rather than “after the fact” in the data warehouse

[Figure: Existing Applications (each with its own Master Data), Historical / Analytical Systems, New Applications, and a Master Data Management System]

Semantic Master Data Management

Please see the demo

Summary

Introduction
- Ontology Management
- Ontology based Data Management

Ontology Storage, Reasoning and Query
- Triple Store
- Reasoning on Large-Scale Data
- SPARQL Query Language
- Faceted Search

Systems and Applications
- Sesame, Jena, OWLIM, SOR
- Master Data Management

References

These slides contain content from the following sources:

Knowledge Systems Course, Aduna, Jeen Broekstra, 2006.

Ontology Management, Mohammad Al-Najjar, 2006.

Faceted Browsing for the Semantic Web, Eyal Oren, Renaud Delbru, Stefan Decker, 2006.

Publishing data on the Web (with SPARQL), Andy Seaborne.

AI and the Semantic Web, http://www.w3.org/2006/Talks/0718-aaai-tbl/Overview.html#(1), Tim Berners-Lee, 2006.

OWLIM, www.inrialpes.fr/exmo/people/zimmer/SDK-meeting/Pr..., Damyan Ognyanov, 2005.

Ontology Library Systems: The key to successful Ontology Re-use, Ying Ding and Dieter Fensel.

Jena Persistent Storage Property Table Design, jena.hpl.hp.com/juc2006/proceedings/wilkinson/slides.ppt, Kevin Wilkinson, 2006.

Managing Voluminous RDF Description Bases, S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, K. Tolle, 2001.

Jena: A Semantic Web Toolkit, Stefan Miot, 2003.

Minerva: A Scalable OWL Ontology Storage and Inference System, Jian Zhou, Li Ma, Qiaoling Liu, Lei Zhang, Yong Yu, and Yue Pan, 2006.

HStar-a Semantic Repository for Large Scale OWL Documents, Yan Chen, Jianbo Ou, Yu Jiang, Xiaofeng Meng, 2006.

Jena: Java and .Net Semantic Web Framework (http://jena.sourceforge.net/)