sem tech 2010_integrity_constraints

Using OWL in Closed World Applications Evren Sirin, CTO Clark & Parsia, LLC [email protected]

Upload: clark-parsia-llc

Post on 18-Dec-2014

Page 1: Sem tech 2010_integrity_constraints

Using OWL in Closed World Applications

Evren Sirin, CTO
Clark & Parsia, LLC

[email protected]

Page 2: Sem tech 2010_integrity_constraints

Who are we?

• Clark & Parsia is a semantic software startup
  – HQ in Washington, DC & office in Boston
• Provides software development and integration services
• Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers

http://clarkparsia.com/
Twitter: @candp

Page 3: Sem tech 2010_integrity_constraints

Some Applications

• Customer and product data
  – Find which customer would be interested in buying a certain product
• System and component descriptions
  – Configure components to build a desired system
• Workforce and employee data
  – Locate employees with desired expertise
• Patient history and drug data
  – Detect and prevent potentially harmful drug interactions

Page 4: Sem tech 2010_integrity_constraints

Common Theme

• There is data and lots of it!
• Adding semantics to the data helps a lot
  – Sometimes simple taxonomies, but other times, complex ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
  – Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc.

Page 5: Sem tech 2010_integrity_constraints

Data Validation

• Fundamental data management problem
  – Verify data integrity and correctness
  – Enforce validity of updates
• Relevant in many scenarios
  – Storing data for stand-alone applications
  – Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
  – Harder to achieve as data semantics increase and/or more expressive integrity conditions are required

Page 6: Sem tech 2010_integrity_constraints

Disclaimer

• Data validity is not important for every use case
  – Invalid data may be fine for an application
  – Invalidity may even be a requirement
• Focus of this talk is cases where data consistency and integrity are crucial

Page 7: Sem tech 2010_integrity_constraints

Roadmap for an App

• How to build one of these applications?
  – Represent data as RDF triples
    • First step for accomplishing data integration and analysis
  – Enrich data with more semantics (RDFS, OWL)
    • Infer implicit information from explicit assertions
  – Ensure data validity
    • Detect errors in the data
  – Do something cool with the data
    • Obviously...

Page 10: Sem tech 2010_integrity_constraints

Reasoning Example

• Input ontology

    # Schema: every manager is an employee
    Manager subClassOf Employee
    # Instance data: Person0853 is a manager
    Person0853 type Manager

• Output inferences

    # Person0853 is an employee
    Person0853 type Employee
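The inference on this slide can be sketched as a tiny forward-chaining loop. This is only an illustration under stated assumptions: triples are plain string arrays, the class and method names are hypothetical, and a real OWL reasoner such as Pellet works very differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy rule: (x type C) and (C subClassOf D) imply (x type D). Names are hypothetical. */
final class SubclassInference {

    /** Returns only the newly inferred triples, each as {subject, predicate, object}. */
    static List<String[]> infer(List<String[]> triples) {
        Set<String> known = new HashSet<>();
        for (String[] t : triples) known.add(String.join(" ", t));

        List<String[]> all = new ArrayList<>(triples);
        List<String[]> inferred = new ArrayList<>();
        boolean changed = true;
        while (changed) {                              // repeat until nothing new is derived
            changed = false;
            for (String[] inst : new ArrayList<>(all)) {
                if (!inst[1].equals("type")) continue;
                for (String[] sub : new ArrayList<>(all)) {
                    if (!sub[1].equals("subClassOf") || !sub[0].equals(inst[2])) continue;
                    String[] derived = {inst[0], "type", sub[2]};
                    if (known.add(String.join(" ", derived))) {
                        all.add(derived);
                        inferred.add(derived);
                        changed = true;
                    }
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<String[]> input = List.of(
            new String[] {"Manager", "subClassOf", "Employee"},   // schema
            new String[] {"Person0853", "type", "Manager"});      // instance data
        for (String[] t : infer(input)) {
            System.out.println(String.join(" ", t));   // prints: Person0853 type Employee
        }
    }
}
```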

Page 11: Sem tech 2010_integrity_constraints

Validating RDF Data

• Common misunderstanding
  – RDFS/OWL is to RDF what XML Schema is to XML
  – Describe integrity conditions in RDFS or OWL
    • Typing constraints - RDFS domain/range
    • Participation constraints - OWL some values restrictions
    • Uniqueness constraints - OWL cardinality restriction
  – Use a reasoner to find inconsistencies
• Problem: Open World Assumption

Page 12: Sem tech 2010_integrity_constraints

Closed vs. Open World

• Two different views on truth:
  – CWA (Closed World Assumption): Any statement that is not known to be true is false
  – OWA (Open World Assumption): A statement is false only if it is known to be false
• Used in different contexts
  – Databases use CWA because (typically) they contain complete information
  – Ontologies use OWA because (typically) they don't... that is, they contain incomplete information
• Data validation results are significantly different when using CWA instead of OWA

Page 13: Sem tech 2010_integrity_constraints

Typing Constraint

• Only managers can supervise employees
• Input ontology
  o supervises domain Manager
  o Person085 supervises Person173

               OWA     CWA
  Consistent   true    false

  Reason (OWA): Infer that Person085 type Manager
  Reason (CWA): Assume that Person085 type not Manager
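The two columns of the table can be contrasted in a few lines of code. A minimal sketch, assuming triples are string arrays and the domain axiom is stored as an ordinary triple; the class and method names are hypothetical and do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;

/** Two readings of "supervises domain Manager" over the same triples (hypothetical names). */
final class TypingConstraint {

    static boolean hasType(List<String[]> data, String individual, String cls) {
        for (String[] t : data) {
            if (t[0].equals(individual) && t[1].equals("type") && t[2].equals(cls)) return true;
        }
        return false;
    }

    /** Type statements required by the domain axioms but not asserted in the data. */
    static List<String> missingDomainTypes(List<String[]> data) {
        List<String> missing = new ArrayList<>();
        for (String[] axiom : data) {
            if (!axiom[1].equals("domain")) continue;          // {property, "domain", class}
            for (String[] t : data) {
                if (t[1].equals(axiom[0]) && !hasType(data, t[0], axiom[2])) {
                    missing.add(t[0] + " type " + axiom[2]);
                }
            }
        }
        return missing;
    }

    /** OWA reading: the missing statements are inferred, so nothing is inconsistent. */
    static List<String> inferUnderOwa(List<String[]> data) {
        return missingDomainTypes(data);
    }

    /** CWA reading: a missing statement is taken to be false, so the data is invalid. */
    static boolean validUnderCwa(List<String[]> data) {
        return missingDomainTypes(data).isEmpty();
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"supervises", "domain", "Manager"},
            new String[] {"Person085", "supervises", "Person173"});
        System.out.println("OWA infers: " + inferUnderOwa(data));   // [Person085 type Manager]
        System.out.println("CWA valid:  " + validUnderCwa(data));   // false
    }
}
```

The same axiom drives both methods; only the treatment of the missing statement differs.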

Page 14: Sem tech 2010_integrity_constraints

Participation Constraint

• Each supervisor must supervise at least one employee
• Input axioms
  o Supervisor subClassOf supervises some Employee
  o Person085 type Supervisor

               OWA     CWA
  Consistent   true    false

  Reason (OWA): Infer that Person085 supervises _:b and _:b type Employee
  Reason (CWA): Assume that Person085 supervises _:b does not exist
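Read under CWA, the participation constraint becomes a plain lookup over asserted triples. The sketch below assumes string-array triples; the class name, method names, and the extra Person632 data are hypothetical and added only to show a passing case.

```java
import java.util.ArrayList;
import java.util.List;

/** Closed-world check for "Supervisor subClassOf supervises some Employee" (hypothetical names). */
final class ParticipationConstraint {

    static boolean contains(List<String[]> data, String s, String p, String o) {
        for (String[] t : data) {
            if (t[0].equals(s) && t[1].equals(p) && t[2].equals(o)) return true;
        }
        return false;
    }

    /** Returns members of cls that have no asserted property value typed as filler. */
    static List<String> findViolations(List<String[]> data, String cls, String property, String filler) {
        List<String> violations = new ArrayList<>();
        for (String[] member : data) {
            if (!member[1].equals("type") || !member[2].equals(cls)) continue;
            boolean satisfied = false;
            for (String[] t : data) {
                if (t[0].equals(member[0]) && t[1].equals(property)
                        && contains(data, t[2], "type", filler)) {
                    satisfied = true;
                    break;
                }
            }
            if (!satisfied) violations.add(member[0]);
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"Person085", "type", "Supervisor"},      // from the slide
            new String[] {"Person632", "type", "Supervisor"},      // hypothetical passing case
            new String[] {"Person632", "supervises", "Person173"},
            new String[] {"Person173", "type", "Employee"});
        // Person085 has no asserted supervisee, so it is the only violation.
        System.out.println(findViolations(data, "Supervisor", "supervises", "Employee"));
    }
}
```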

Page 15: Sem tech 2010_integrity_constraints

Uniqueness Constraint

• Employees can have at most one supervisor
• Input axioms
  o supervises InverseFunctional
  o Person085 supervises Person173
  o Person632 supervises Person173

               OWA     CWA
  Consistent   true    false

  Reason (OWA): Infer that Person085 sameAs Person632
  Reason (CWA): Assume that Person085 sameAs Person632 does not hold
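Under CWA, two different names are taken to denote two different individuals unless sameAs is asserted, so the uniqueness check reduces to grouping subjects by object. A minimal sketch with hypothetical class and method names, assuming string-array triples:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Closed-world check for an inverse-functional property (hypothetical names). */
final class UniquenessConstraint {

    /** Maps each object of the property to its subjects, keeping only objects with two or more. */
    static Map<String, Set<String>> findViolations(List<String[]> data, String property) {
        Map<String, Set<String>> subjectsByObject = new TreeMap<>();
        for (String[] t : data) {
            if (t[1].equals(property)) {
                subjectsByObject.computeIfAbsent(t[2], k -> new TreeSet<>()).add(t[0]);
            }
        }
        // An object with a single subject satisfies the constraint.
        subjectsByObject.values().removeIf(subjects -> subjects.size() < 2);
        return subjectsByObject;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"Person085", "supervises", "Person173"},
            new String[] {"Person632", "supervises", "Person173"});
        // prints: {Person173=[Person085, Person632]}
        System.out.println(findViolations(data, "supervises"));
    }
}
```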

Page 16: Sem tech 2010_integrity_constraints

Workarounds for CW

• Manually close the world
  – Declare all individuals different from each other
  – Count existing property values and add a max cardinality restriction
  – Make all disjointness statements explicit and add negated types to individuals
• Drawbacks
  – Can be computationally expensive
  – Likely to be error-prone

Page 17: Sem tech 2010_integrity_constraints

Problem Summary

• Definitions in an OWL schema may have two purposes
  – Infer new statements
  – Check if existing statements are valid
• Using OWA for validation is undesirable
  – Not always but in many cases
• In a problem domain we may have:
  – Complete knowledge about some parts of the domain
  – Incomplete knowledge about the other parts

Page 18: Sem tech 2010_integrity_constraints

Integrity Constraint Solution

• We defined an alternative semantics for OWL
  – Integrity Constraint (IC) semantics use CWA
  – Can be combined with regular inference axioms
• Ontology developer chooses which axioms will be interpreted with...
  – OWA - regular OWL axiom, or
  – CWA - integrity constraint

Page 19: Sem tech 2010_integrity_constraints

IC Extension

• Syntax specification
  – How do we syntactically say an axiom is an IC and not a regular OWL axiom?
• Semantics specification
  – How exactly do we interpret an IC?
• Validation algorithm
  – Given the semantics, how do we check for IC violations?

Page 20: Sem tech 2010_integrity_constraints

IC Syntax

• Similar approach to using owl:imports
• Define a new annotation property in a new namespace

    Ont1 owl:imports Ont2
    Ont1 ic:imports IC1

• Backward compatible, requires minimal changes in tools

Page 21: Sem tech 2010_integrity_constraints

IC Semantics

• OWL semantics based on model theory
  – Similar to First Order Logic
  – Formal, precise, and unambiguous
• IC semantics specification
  – Extends OWL model theory
  – Change a couple of basic definitions, everything else follows
• Details published in technical papers
  – We are submitting a W3C member submission soon

Page 22: Sem tech 2010_integrity_constraints

Use Case: SKOS

• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic structure and content of concept schemes
  – Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc.
• SKOS data model specification
  – Informal (Text): http://www.w3.org/TR/skos-reference/
  – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf

Page 23: Sem tech 2010_integrity_constraints

SKOS Example

skos-constraints.ttl

    # Constraints from SKOS reference expressed as ICs
    skos:related propertyDisjointWith skos:broaderTransitive

skos-reference.ttl

    # SKOS reference ontology that contains inference rules
    skos:broaderTransitive Transitive
    skos:broader subPropertyOf skos:broaderTransitive

skos-invalid.ttl

    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ;
       owl:imports skos-reference.ttl ;
       ic:imports skos-constraints.ttl .

    A skos:broader B ;
      skos:related C .
    B skos:broader C .

Page 24: Sem tech 2010_integrity_constraints

Explanation

VIOLATION: A violates related propertyDisjointWith broaderTransitive
  INFERRED: A related C
    ASSERTED: A related C
  INFERRED: A broaderTransitive C
    ASSERTED: A broader B
    ASSERTED: B broader C
    ASSERTED: broader subPropertyOf broaderTransitive
    ASSERTED: broaderTransitive Transitive
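The explanation combines an open-world inference step with a closed-world check, which can be sketched as follows. This is an illustration only: triples are string arrays, term names are assumed to contain no spaces, only asserted related triples are checked, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Infer broaderTransitive, then check it is disjoint from related (hypothetical names). */
final class SkosDisjointnessCheck {

    /** Inference step: broader implies broaderTransitive, which is then closed transitively. */
    static Set<String> broaderTransitive(List<String[]> data) {
        Set<String> pairs = new HashSet<>();
        for (String[] t : data) {
            if (t[1].equals("broader")) pairs.add(t[0] + " " + t[2]);
        }
        boolean changed = true;
        while (changed) {                                  // naive fixpoint, fine for small data
            changed = false;
            for (String ab : new ArrayList<>(pairs)) {
                for (String bc : new ArrayList<>(pairs)) {
                    String[] x = ab.split(" ");
                    String[] y = bc.split(" ");
                    if (x[1].equals(y[0]) && pairs.add(x[0] + " " + y[1])) changed = true;
                }
            }
        }
        return pairs;
    }

    /** Validation step: a pair that is both related and broaderTransitive is a violation. */
    static List<String> findViolations(List<String[]> data) {
        Set<String> closure = broaderTransitive(data);
        List<String> violations = new ArrayList<>();
        for (String[] t : data) {
            if (t[1].equals("related") && closure.contains(t[0] + " " + t[2])) {
                violations.add(t[0] + " " + t[2]);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"A", "broader", "B"},
            new String[] {"A", "related", "C"},
            new String[] {"B", "broader", "C"});
        System.out.println(findViolations(data));          // prints: [A C]
    }
}
```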

Page 25: Sem tech 2010_integrity_constraints

Another SKOS Example

skos-xl.ttl

    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1

skos-data.ttl

    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ;
       owl:imports skos-xl.ttl .

    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .

Result: Consistent

Page 26: Sem tech 2010_integrity_constraints

Another SKOS Example

skos-xl.ttl

    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1

skos-data.ttl

    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ;
       owl:imports skos-xl.ttl ;
       ic:imports skos-xl.ttl .

    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .

Result: IC Violation
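The difference between the two results comes down to how a missing skosxl:literalForm value is treated. Under OWA the value may exist but be unknown, so the data is consistent. Under CWA the asserted values are simply counted. A minimal sketch of the closed-world count, with hypothetical class and method names and prefixes dropped for brevity:

```java
import java.util.ArrayList;
import java.util.List;

/** Closed-world check for "cls subClassOf property cardinality n" (hypothetical names). */
final class CardinalityConstraint {

    /** Reports members of cls whose number of asserted property triples differs from expected. */
    static List<String> findViolations(List<String[]> data, String cls, String property, int expected) {
        List<String> violations = new ArrayList<>();
        for (String[] member : data) {
            if (!member[1].equals("type") || !member[2].equals(cls)) continue;
            int count = 0;
            for (String[] t : data) {
                // Assumes the data holds no duplicate triples, so triples equal values.
                if (t[0].equals(member[0]) && t[1].equals(property)) count++;
            }
            if (count != expected) violations.add(member[0] + " has " + count + " value(s)");
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"A", "labelRelation", "LabelA"},
            new String[] {"LabelA", "type", "Label"});
        // LabelA has no asserted literalForm, so CWA reports it.
        System.out.println(findViolations(data, "Label", "literalForm", 1));
    }
}
```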

Page 27: Sem tech 2010_integrity_constraints

Linked Data Application

• Large amounts of instance data
• Validate before publishing/consuming LOD
• Instance data + Inference axioms + Constraints
  – Infer new facts using inference axioms with OWA
  – Validate data using constraints with CWA
  – Inference axioms and constraints are both expressed in OWL

Page 28: Sem tech 2010_integrity_constraints

Validation Algorithm

• An automated translation algorithm
• Automatically maps an OWL IC to ...
  – A SPARQL query, or
  – A RIF rule
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation

Page 29: Sem tech 2010_integrity_constraints

SPARQL Translation

Supervisor subClassOf supervises some Employee

    SELECT * {
      ?x type Supervisor.
      NOT EXISTS {
        ?x supervises ?y.
        ?y type Employee.
      }
    }
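The translation for this one axiom shape can be sketched as a small query builder. This is not the Pellet IC validator's implementation; the class name, method name, and example.org IRIs are hypothetical. The generated text uses the SPARQL 1.1 spelling FILTER NOT EXISTS and the keyword a for rdf:type, where the slide uses a shorthand.

```java
/** Builds a violation query for ICs of the form "C subClassOf p some D" (hypothetical names). */
final class IcToSparql {

    /** Each solution for ?x is a member of C with no p-value that is a member of D. */
    static String someValuesFromViolationQuery(String subjectClass, String property, String fillerClass) {
        return String.join("\n",
            "SELECT ?x WHERE {",
            "  ?x a <" + subjectClass + "> .",
            "  FILTER NOT EXISTS {",
            "    ?x <" + property + "> ?y .",
            "    ?y a <" + fillerClass + "> .",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Placeholder IRIs standing in for Supervisor, supervises, and Employee.
        System.out.println(someValuesFromViolationQuery(
            "http://example.org/Supervisor",
            "http://example.org/supervises",
            "http://example.org/Employee"));
    }
}
```

An empty result set means the constraint holds for the data the query was run against.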

Page 30: Sem tech 2010_integrity_constraints

RIF Translation

Supervisor subClassOf supervises some Employee

    Forall ?x ?y (
      invalid() :- And (
        ?x[type -> Supervisor]
        Naf And (
          ?x[supervises -> ?y]
          ?y[type -> Employee] )))

Page 31: Sem tech 2010_integrity_constraints

Solution Summary

• Separate ICs from regular OWL axioms
  – No new syntax
  – Import-based mechanism
• Alternative semantics for ICs
  – Extends OWL model theory
  – Provides the meanings of ICs formally
• Validation algorithm
  – Translate ICs to another formalism
  – SPARQL or RIF engines can be used

Page 32: Sem tech 2010_integrity_constraints

Performance

• Using ICs can improve performance!
• Expressive OWL reasoning is not easy
• Profiles of OWL defined for tractable reasoning
  – OWL 2 QL, OWL 2 EL, OWL 2 RL
  – Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce the overall expressivity

Page 33: Sem tech 2010_integrity_constraints

Prototype

• Pellet IC validator
  – Translates ICs into SPARQL queries automatically
  – Executes SPARQL queries with Pellet
  – Query results show constraint violations
  – Automatically explains constraint violations
• Free download
  – http://clarkparsia.com/pellet/icv

Page 34: Sem tech 2010_integrity_constraints

Code Example

    // Create an inferencing model using the Pellet reasoner
    // (r is the Pellet reasoner instance; its creation is not shown on the slide)
    InfModel dataModel = ModelFactory.createInfModel(r);

    // Load the schema and instance data into Pellet
    dataModel.read( "file:data.rdf" );
    dataModel.read( "file:schema.owl" );

    // Create the IC validator and associate it with the dataset
    JenaICValidator validator = new JenaICValidator(dataModel);

    // Load the constraints into the IC validator
    validator.getConstraints().read("file:constraints.owl");

    // Get the constraint violations
    Iterator<ConstraintViolation> violations = validator.getViolations();

Page 35: Sem tech 2010_integrity_constraints

Next Steps

• W3C Member submission for IC semantics
• Robust IC validator implementation
  – Incremental validation
  – Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
  – Scalable reasoning + validation

Page 36: Sem tech 2010_integrity_constraints

References

• Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008.
• Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. To appear, The 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010.

Page 37: Sem tech 2010_integrity_constraints

Questions