Mapping integrity constraint ontology to relational databases

December 2010, 17(6): 113–121 www.sciencedirect.com/science/journal/10058885 http://www.jcupt.com

The Journal of China Universities of Posts and Telecommunications

OUYANG Dan-tong1, 2, CUI Xian-ji1, 2, YE Yu-xin1, 2

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China

2. Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China

Abstract

An integrity constraint is a formula that checks whether all necessary information has been explicitly provided. Integrity constraints can be added to an ontology to support data-centric applications. In this paper, a set of constraint axioms called IC-mapping axioms is stated. Based on these axioms, a special ontology with integrity constraints, suited to mapping ontology knowledge to data in relational databases, is defined. It is generated from an ordinary ontology through integrity checking and modification. Using the traditional mapping approaches, it can still be mapped to relational databases as a normal ontology. In addition, a novel mapping approach named IC-based mapping is proposed for this special ontology. The detailed algorithm is put forward and compared with existing approaches used in Semantic Web applications. The results show that our method is superior to the traditional approaches.

Keywords Semantic Web, integrity constraint, ontology, relational database

1 Introduction

The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [1]. As the basis of the Semantic Web, ontology plays a key role by providing a source of shared and precisely defined terms that can be used to describe Web resources. Description logics (DLs) are a family of logic-based knowledge representation languages [2–3]. They are highly expressive and provide the logical underpinning for the Web ontology language (OWL), a well-known language for ontology modeling in the Semantic Web [4].

The Semantic Web has been applied successfully in many domains, such as information retrieval and e-commerce. However, it is still far from widespread practical use. Since the vast majority of data on the World Wide Web (about 77.13%) are still stored in relational databases [5], Semantic Web applications cannot freely access and manipulate these data. Moreover, databases answer queries by query matching, while ontologies answer them by reasoning (for example with the tableau algorithm [6] or logic programming [7]). Because databases do not infer implicit knowledge, query execution is faster in databases than in ontologies. Therefore, storing ontology knowledge in relational databases using existing storage methods is an important issue for Semantic Web related fields. In this way, a semantic query can be translated into a database query.

Received date: 13-01-2010. Corresponding author: YE Yu-xin, E-mail: [email protected]. DOI: 10.1016/S1005-8885(09)60534-3

Over the last few years, several approaches have been proposed for this task and have achieved good results. The horizontal approach keeps a single common table in the database [8]: each ontology instance is a record and each property of the instance is a column. Agrawal further demonstrated that the descriptions of classes, properties and their instances can be stored in a single table called the vertical table [9]. Afterwards, Dehainsala proposed the class decomposition approach [10], which stores the ontology data by creating a table for each class in the ontology. The Instance Store uses a DL reasoner to infer TBox information and stores it in a relational database [11]. Recently, Garcia proposed DBOWL, a persistent and scalable OWL reasoner that stores Web ontology language description logic (OWL DL) ontologies in relational databases [12]. It uses four types of tables: ontology information tables that store the identity (ID) and uniform resource identifier (URI) of ontologies; TBox tables for the relationships between concepts (/roles); ABox tables for each class (/property); and tables for each kind of class description in the ontology.

However, the mainstream mapping approaches above do not consider data integrity, which may lead to incomplete and inconsistent query results. Therefore, in contrast to the traditional mapping mechanisms, we use integrity constraints to improve the quality of ontology information storage and the query efficiency. An integrity constraint is a formula that checks whether the data conform to guidelines specified by the database administrator. Sirin presented a method for extending an OWL knowledge base to accommodate integrity constraints in the ontology [13]. Similarly, Motik proposed that TBox axioms can be used to mimic the integrity constraints of relational databases [14]. In this paper, we put forward the IC-mapping axioms to represent integrity constraints and to map an ontology to relational databases, with the aim of addressing the integrity problem described above.

The paper is organized as follows: Sect. 2 presents the representation of the integrity constraints in the ontology. Sect. 3 generates the ontology with integrity constraints by checking the integrity of ontology and modifying it. Sect. 4 proposes a new mapping method of the IC-mapping ontology to relational databases called IC-based mapping approach. Sect. 5 presents our experimental results that compare IC-based mapping approach with the existing approaches used in Semantic Web applications. Sect. 6 concludes this paper.

2 Representation of the integrity constraints

In order to ensure the integrity of the ontology, we first define the axioms used for representing integrity constraints in the ontology. Since each TBox (/RBox) axiom can be interpreted as an integrity constraint, these axioms can also be used as integrity constraints.

2.1 IC-mapping axioms

It is well known that OWL DL is the variant of OWL that corresponds to the DL SHOIN(D) [15]. For this reason, in the rest of this paper we use class (/property, instance) in OWL and concept (/role, individual) in DLs interchangeably. The set of SHOIN(D)-concepts (/roles) is built using the following grammar:

1) $C \rightarrow \top \mid \bot \mid C \mid \neg C \mid C_1 \sqcap C_2 \mid C_1 \sqcup C_2 \mid \forall R.C \mid \exists R.C \mid {\leq} nR \mid {\geq} nR \mid \{o_1, o_2, \ldots, o_n\} \mid {\leq} nT \mid {\geq} nT \mid \exists T_1, T_2, \ldots, T_n.D \mid \forall T_1, T_2, \ldots, T_n.D$

2) $D \rightarrow d \mid \{c_1, c_2, \ldots, c_n\}$

Here $\top$ abbreviates $C \sqcup \neg C$ and $\bot$ abbreviates $\neg\top$. $C$ is an atomic concept name; $R$ and $T_i$ are an abstract role name and a concrete role name, respectively; $n$ is a nonnegative integer; $d$ is a datatype; and $o_i$, $c_i$ are an abstract individual and a concrete individual, respectively.

A DL-based ontology knowledge base is composed of the TBox (/RBox) and the ABox [2]. The TBox is the set of terminological axioms of the form $C_1 \sqsubseteq C_2$, $C_1 \sqcap C_2 \sqsubseteq \bot$ or $C_1 \equiv C_2$, where $C_1$, $C_2$ are concepts. $C_1 \equiv C_2$ is equivalent to $(C_1 \sqsubseteq C_2) \wedge (C_2 \sqsubseteq C_1)$, and $C_1 \sqcap C_2 \sqsubseteq \bot$ is equivalent to $C_1 \sqsubseteq \neg C_2$. The RBox is the set of axioms of the form $\mathrm{trans}_C(R)$, $R \sqsubseteq S$ and $T \sqsubseteq U$, where $R$, $S$ are abstract roles and $T$, $U$ are concrete roles. It is used for representing the relationships and attributes of concepts and roles. The ABox is the set of assertions of the form $C(a)$ or $R(a, b)$, where $C$ is a concept, $R$ is a role, and $a$, $b$ are individuals. It is used for representing the facts of the ontology.
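As a small illustration of the three components, the following knowledge base reuses names that appear in the examples and tables later in this paper (GraduateStudent, Person, takesCourse, undergraduateDegreeFrom). The particular grouping is our own illustrative assumption and is not taken from the benchmark data.

```latex
\begin{align*}
\text{TBox:}\quad & \mathit{GraduateStudent} \sqsubseteq \mathit{Person} \\
\text{RBox:}\quad & \mathit{undergraduateDegreeFrom} \sqsubseteq \mathit{degreeFrom} \\
\text{ABox:}\quad & \mathit{GraduateStudent}(\mathit{GraduateStudent4}),\quad
                    \mathit{takesCourse}(\mathit{GraduateStudent4}, \mathit{Course18})
\end{align*}
```

Under the standard semantics, the TBox axiom together with the first assertion entails $\mathit{Person}(\mathit{GraduateStudent4})$, although this fact is not stated explicitly in the ABox.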

Relational databases not only store a set of facts (similar to the ABox assertions) but also need to hold the mapped ontology TBox axioms. Since the expressivities of ontologies and relational databases differ [14], the ontology itself should have corresponding constraints to be satisfied. Integrity constraints with respect to the ontology are divided into two parts: constraints that restrict the relationships (/attributes) of concepts and roles, and constraints that check whether the ontology instances satisfy given conditions. Therefore, both cases need to be considered when checking the integrity of the ontology. We define the constraint axioms as follows.

Definition 1 An axiom is called an IC-mapping axiom if it is used for detecting errors and/or missing information in the ontology and for mapping the ontology to relational databases, rather than for inferring implicit information.

The IC-mapping axioms include TBox constraint axioms and ABox constraint axioms which present the constraints of the concepts (/roles) and the instances, respectively. In the following, we state the axioms in detail.

2.2 Representation of TBox constraints

1) Inclusion constraints $C_1 \sqsubseteq C_2$: the inclusion constraint $C_1 \sqsubseteq C_2$ states that for each instance $a \in C_1$, $a \in C_2$ must hold. The inclusion constraint $R \sqsubseteq S$ states that for each instance pair $(a, b) \in R$, $(a, b) \in S$ must hold.

2) Inverse constraints $\mathrm{inv}_C(R)$: the inverse constraint $\mathrm{inv}_C(R)$ states that for each instance pair $(a, b) \in R$, $(b, a) \in R^-$ must hold.

3) Transitivity constraints $\mathrm{trans}_C(R)$: the transitivity constraint $\mathrm{trans}_C(R)$ states that for each two instance pairs $(a, b) \in R$ and $(b, c) \in R$, $(a, c) \in R$ must hold.

4) Typing constraints: typing constraints are used for checking whether the properties are correctly typed. For each property $R$, we must specify the domain and (/or) range. A property of an ontology is either an ObjectProperty or a DatatypeProperty. For an ObjectProperty, the typing constraints are as follows:

The general form of the domain constraint is $\exists R.\top \sqsubseteq C$, where $R$ is a property of class $C$; it states that the domain type of the property $R$ must be $C$.

The general form of the range constraint is $\exists R^-.\top \sqsubseteq C$, where $R$ is a property of class $C$; it states that the range type of the property $R$ must be $C$.

The domain constraint of a DatatypeProperty is the same as that of an ObjectProperty, and the range constraint of a DatatypeProperty is a special type of class used for specifying the domain of the property value. We use a datatype to represent it. The general form of the datatype constraint is $\exists D^-.\top \sqsubseteq d$, where $D$ is a DatatypeProperty and $d$ is the value domain of $D$. It is used for specifying the domain of the property value in the relational database: for each instance $a \in C$, the value type of the property $D$ must be $d$. Relational databases assign domain types to columns, and typing constraints are used in practice for determining the physical layout of the database. All columns are assumed to draw their values from a common countable domain set.

For example, in a TBox, one can state that the property name must be specified as a DatatypeProperty on GraduateStudent that takes strings as values. These statements are expressed using the following TBox axioms:

$\exists \mathit{name}.\top \sqsubseteq \mathit{GraduateStudent}$; $\exists \mathit{name}^-.\top \sqsubseteq \mathit{string}$.

The first three types of constraints ensure the stated characteristics of the properties in the TBox axioms, while the typing constraints specify the types of the property values when tables are created in the relational database.
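The first three constraints can be read as simple set conditions. The following is a minimal sketch of how their violations could be detected, assuming concepts are held as Python sets of individuals and roles as sets of pairs. The function names, the role subOrganizationOf and its individuals are illustrative assumptions and are not part of the checking procedure of Sect. 3.

```python
# Illustrative check of the inclusion, inverse and transitivity constraints.
# Concepts are sets of individuals; roles are sets of (a, b) pairs.

def inclusion_violations(sub, sup):
    """Members of `sub` missing from `sup`; works for concepts and for roles."""
    return set(sub) - set(sup)

def inverse_violations(role, inverse_role):
    """Pairs (a, b) in `role` whose mirror (b, a) is missing from `inverse_role`."""
    return {(a, b) for (a, b) in role if (b, a) not in inverse_role}

def transitivity_violations(role):
    """Pairs (a, c) required by one transitivity step but absent from `role`."""
    return {(a, d) for (a, b) in role for (c, d) in role
            if b == c and (a, d) not in role}

# Hypothetical data: GraduateStudent12 has not been asserted to be a Person.
graduate_students = {"GraduateStudent4", "GraduateStudent12"}
persons = {"GraduateStudent4"}
sub_organization_of = {("Group0", "Department0"), ("Department0", "University0")}

missing_persons = inclusion_violations(graduate_students, persons)
missing_pairs = transitivity_violations(sub_organization_of)
```

Each function returns the offending individuals or pairs, so an empty result means that the corresponding constraint holds on the explicit data.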

2.3 Representation of ABox constraints

1) Number constraints: the general form of the number constraint is $C_1 \sqsubseteq \otimes nR.C_2$, where $\otimes \in \{\geq, \leq, =\}$, $C_1$, $C_2$ are concepts, $R$ is a role, and $n$ is a nonnegative integer. The constraint restricts the number of $R$-relationships in which each instance of $C_1$ participates with instances of $C_2$.

The at-most constraint $C_1 \sqsubseteq {\leq} nR.C_2$ states that for each instance of the concept $C_1$, the number of values of the property $R$ in $C_2$ must be no more than $n$.

The at-least constraint $C_1 \sqsubseteq {\geq} nR.C_2$ states that for each instance of the concept $C_1$, the number of values of the property $R$ in $C_2$ must be no less than $n$.

The existential constraint $C_1 \sqsubseteq \exists R.C_2$ states that for each instance in $C_1$, a value of the property $R$ in $C_2$ must have been explicitly specified. According to the semantics of $\exists R.C$, it can be seen as a special case of the at-least constraint. Namely, $C_1 \sqsubseteq \exists R.C_2$ is equivalent to $C_1 \sqsubseteq {\geq} 1R.C_2$.

The unique constraint $C_1 \sqsubseteq {=} 1R.C_2$ states that for each instance in $C_1$, the number of values of the property $R$ in $C_2$ must be exactly 1. It is equivalent to $(C_1 \sqsubseteq {\geq} 1R.C_2) \wedge (C_1 \sqsubseteq {\leq} 1R.C_2)$.

In particular, the concept $C_2$ in a number constraint may be a nominal $\{o\}$, which can be considered a special concept that contains only the instance $o$.

For example, in a TBox, one can state that each GraduateStudent must take 1–3 GraduateCourses and be advised by a Professor. These statements are expressed using the following TBox axioms: $\mathit{GraduateStudent} \sqsubseteq {\leq} 3\,\mathit{takesCourse}.\mathit{GraduateCourse}$; $\mathit{GraduateStudent} \sqsubseteq {\geq} 1\,\mathit{takesCourse}.\mathit{GraduateCourse}$; $\mathit{GraduateStudent} \sqsubseteq \exists \mathit{advisor}.\mathit{Professor}$.

2) Value constraints: value constraints specify that a value of a property must satisfy some special condition, for example that the age of a GraduateStudent should be larger than 21. In practice we use a concrete datatype to implement the value constraints: a special type $d$ is defined according to the condition. The general form of the value constraint is $C \sqsubseteq \exists R.d$, where $R$ is a property of class $C$ and $d$ is a concrete datatype. Value constraints are thus a special case of the existential constraints.

For example, in a TBox, one can state that the age of a GraduateStudent must be larger than 21. The statement is expressed by the TBox axiom $\mathit{GraduateStudent} \sqsubseteq \exists \mathit{age}.(\min 21)$, where (min 21) is a newly defined type meaning that the value is larger than 21.

The number constraints are used as the criterion of the mapping mechanism, which is discussed in detail in Sect. 4. Furthermore, the existential constraint is similar to the not-null restriction; the unique constraint is similar to the unique restriction; and the domain constraints and the value constraints correspond to the check constraints in relational databases.
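The correspondence with relational restrictions can be sketched with a small table definition. The example below uses Python's built-in sqlite3 module as a stand-in for a relational database. The table layout and the age column are assumptions made for illustration, not the schema used in our experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE GraduateStudent (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE,      -- unique restriction
        advisor TEXT NOT NULL,             -- not-null restriction (existential constraint)
        age     INTEGER CHECK (age > 21)   -- check constraint (value constraint, min 21)
    )
""")
conn.execute(
    "INSERT INTO GraduateStudent (name, advisor, age) VALUES (?, ?, ?)",
    ("GraduateStudent4", "Assistantprofessor4", 25),
)

def try_insert(name, advisor, age):
    """Return True if the row is accepted, False if a restriction rejects it."""
    try:
        conn.execute(
            "INSERT INTO GraduateStudent (name, advisor, age) VALUES (?, ?, ?)",
            (name, advisor, age),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A row without an advisor, with an age of 20, or with an already stored name is rejected by the database itself, which is the behaviour the IC-mapping axioms are meant to guarantee before the data are mapped.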

To map the integrity constraint ontology to a relational database, the constraints discussed in this section must be satisfied. Otherwise, the mapped data may violate the integrity of the relational database.
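The ABox constraints of this section can be checked by counting explicit assertions. The following is a minimal sketch under that closed-world reading; the function names and the age values are hypothetical, while the takesCourse pairs follow the example data used later in Table 4.

```python
from collections import Counter

def number_violations(instances, role, fillers, op, n):
    """Instances of C1 that violate C1 ⊑ (op n R.C2), counting explicit pairs only."""
    counts = Counter(a for (a, b) in role if b in fillers)
    holds = {">=": lambda k: k >= n, "<=": lambda k: k <= n, "=": lambda k: k == n}[op]
    return {a for a in instances if not holds(counts[a])}

def value_violations(instances, values, condition):
    """Instances whose value is missing or fails `condition` (C ⊑ ∃R.d)."""
    return {a for a in instances if a not in values or not condition(values[a])}

STUDENTS = {"GraduateStudent4", "GraduateStudent12", "GraduateStudent17"}
TAKES_COURSE = {
    ("GraduateStudent4", "Course18"), ("GraduateStudent4", "Course56"),
    ("GraduateStudent12", "Course15"), ("GraduateStudent12", "Course28"),
    ("GraduateStudent17", "Course20"), ("GraduateStudent17", "Course32"),
    ("GraduateStudent17", "Course54"),
}
COURSES = {course for (_, course) in TAKES_COURSE}
AGES = {"GraduateStudent4": 25, "GraduateStudent12": 20}  # hypothetical values

too_few = number_violations(STUDENTS, TAKES_COURSE, COURSES, ">=", 2)
too_many = number_violations(STUDENTS, TAKES_COURSE, COURSES, "<=", 2)
bad_age = value_violations(STUDENTS, AGES, lambda age: age > 21)
```

The existential constraint is covered by calling number_violations with ">=" and n = 1, and the unique constraint by "=" and n = 1.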

3 IC-mapping ontology

In this section, in order to map the integrity constraint ontology to the relational database, we first need a strategy to ensure the integrity of the ontology. It consists of two steps: the first checks whether the ontology satisfies the IC-mapping axioms; the second modifies an ontology that violates them so that it becomes an integrity constraint ontology. Finally, we give the definition of the integrity constraint ontology with IC-mapping axioms.

3.1 Integrity constraints checking

To simplify the checking, we define the extended ontology with IC-mapping axioms as follows.

Definition 2 An extended ontology is a triple $O = (A, N, C')$ such that

1) $A$ is a finite set of ABox assertions.

2) $N$ is a finite set of normal TBox axioms.

3) $C'$ is a finite set of IC-mapping TBox axioms.

Since the integrity constraints in the ontology are represented in the form of IC-mapping axioms, we only need to check whether the ontology satisfies the IC-mapping axioms in order to check its integrity. Thus, the TBox axioms are divided into IC-mapping axioms and normal axioms. The normal axioms are only used for inferring new information.

We check the integrity of the ontology using the well-known logic programming method discussed in Ref. [14]. The theoretical foundation is as follows: if $I \models C'$ for each minimal Herbrand model $I$ of $A \cup N$, then the ontology satisfies the constraints. The main idea of the checking algorithm is as follows.

Firstly, according to the standard DL semantics, translate each normal TBox axiom in $N$ into a first-order formula $\pi(N)$ and put the result into a logic program $A_{LP}(N)$. Convert each clause $\neg A_1 \vee \ldots \vee \neg A_n \vee B_1 \vee \ldots \vee B_m$ into a rule $A_1 \wedge \ldots \wedge A_n \rightarrow B_1 \vee \ldots \vee B_m$.

Secondly, obtain the minimal Herbrand universe of $A \cup N$. For each rule in the logic program in which a variable $x$ occurs in the head but not in the body, add the literal $A_{HU}(x)$ to the rule body; for each individual $a$ occurring in $A \cup N$, add an assertion $A_{HU}(a)$. Add the following rule for each $n$-ary function symbol $f$:

$A_{HU}(x_1) \wedge \ldots \wedge A_{HU}(x_n) \rightarrow A_{HU}(f(x_1, x_2, \ldots, x_n))$.

Thirdly, translate each constraint TBox axiom in $C'$ into a first-order formula $\pi(C')$. With each subformula $\varphi$ of $\pi(C')$ we associate a unique predicate $E_\varphi$, and $\pi(C')$ is decomposed into a set of simple formulas

$\mu(\varphi): C_1 \wedge \ldots \wedge C_n \wedge E_1 \wedge \ldots \wedge E_{i-1} \rightarrow E_\varphi$.

Compute the following logic program $A_{IC}(C')$:

$A_{IC}(C') = \mu(\varphi) \cup \bigcup_{\psi \in \mathrm{sub}(\varphi)} A_{IC}(\psi)$

The IC-mapping ontology $O$ satisfies $C'$ if and only if $A_{LP}(A \cup N) \cup A_{IC}(C') \models_c E_\varphi$, where $\models_c$ denotes the well-known entailment in stratified logic programs.

While entailment in the logic program is used to check integrity, some new implicit knowledge is derived. The instances that satisfy the constraints may consist of two parts: explicit data in the ontology, and data that do not appear explicitly in the ontology but are derived during entailment. Since this implicit information also needs to be mapped to the relational database, it must be added to the ontology. If the explicit data already satisfy the constraints, nothing needs to be done. Otherwise, the implicit data are added to the ontology so that the ontology data satisfy the constraints. In this way, we complete the constraint checking for all the instance data in the ontology. In the next part, we modify the normal ontology into the integrity constraint ontology.
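The role of the minimal model can be illustrated on a much simpler fragment than the full translation of Ref. [14]. The sketch below assumes that the normal axioms are only atomic inclusions (function-free Horn rules with one body atom), computes their least model by naive forward chaining, and then evaluates one existential constraint against that model. The class AssistantProfessor, the function names and the fact encoding are illustrative assumptions.

```python
def least_model(facts, inclusions):
    """Least model of the facts under inclusion rules body(x..) -> head(x..)."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for pred, *args in list(model):
            for body, head in inclusions:
                if pred == body and (head, *args) not in model:
                    model.add((head, *args))
                    changed = True
    return model

def satisfies_existential(model, c1, role, c2):
    """Check C1 ⊑ ∃R.C2 as a constraint: every C1 member needs a stated R-filler in C2."""
    members = {f[1] for f in model if len(f) == 2 and f[0] == c1}
    witnessed = {f[1] for f in model
                 if len(f) == 3 and f[0] == role and (c2, f[2]) in model}
    return members <= witnessed

# ABox facts are tuples: (concept, a) or (role, a, b).
FACTS = {
    ("GraduateStudent", "GraduateStudent4"),
    ("advisor", "GraduateStudent4", "Assistantprofessor4"),
    ("AssistantProfessor", "Assistantprofessor4"),
}
NORMAL_AXIOMS = [("AssistantProfessor", "Professor")]  # AssistantProfessor ⊑ Professor

model = least_model(FACTS, NORMAL_AXIOMS)
constraint_holds = satisfies_existential(model, "GraduateStudent", "advisor", "Professor")
```

In this example the constraint only holds because the fact Professor(Assistantprofessor4) is derived from the normal axiom, which is the kind of implicit data that has to be added to the ontology before mapping.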

3.2 IC-mapping ontology generating

For the data that do not satisfy the constraints, we work with knowledge engineers to modify them so that they meet the constraints. The knowledge engineers are required to supply the missing data, correct the incomplete data and delete the duplicate data as required. The concrete actions are as follows.

If the instance $a \in C_1$ does not satisfy the inclusion constraint $C_1 \sqsubseteq C_2$, the instance $a$ is not an instance of concept $C_2$. The knowledge engineer needs to add the assertion $C_2(a)$ to the ontology knowledge base.

If the instance pair $(a, b) \in R$ does not satisfy the inverse constraint, there is no instance pair $(b, a) \in R^-$. The knowledge engineer needs to add the assertion $R^-(b, a)$ to the ontology knowledge base.

If the two instance pairs $(a, b) \in R$ and $(b, c) \in R$ do not satisfy the transitivity constraint, there is no instance pair $(a, c) \in R$. The knowledge engineer needs to add the assertion $R(a, c)$ to the ontology knowledge base.

If the property does not satisfy the typing constraints, the knowledge engineer must specify the required domain (/range) value, which must be in $C$.

If the type of the property value does not satisfy the datatype constraint, the knowledge engineer must specify the relevant type $d$.

If the instance $a \in C_1$ does not satisfy the existential constraint $C_1 \sqsubseteq \exists R.C_2$, there is no instance $b \in C_2$ such that $R(a, b)$ holds. The knowledge engineer needs to add an instance $b_u$ to the class $C_2$ such that $R(a, b_u)$ holds.

If the instance $a \in C_1$ does not satisfy the at-least constraint $C_1 \sqsubseteq {\geq} nR.C_2$, assume that there are already $m$ instances in $C_2$ such that $R(a, b_1), \ldots, R(a, b_m)$ hold, where $0 \leq m < n$. The knowledge engineer needs to add $t$ instances $b_{u1}, b_{u2}, \ldots, b_{ut}$ to the class $C_2$ such that $R(a, b_{u1}), \ldots, R(a, b_{ut})$ hold, where $0 < t \leq n$ and $t + m \geq n$.

If the instance $a \in C_1$ does not satisfy the at-most constraint $C_1 \sqsubseteq {\leq} nR.C_2$, assume that there are already $m$ instances in $C_2$ such that $R(a, b_1), \ldots, R(a, b_m)$ hold, where $m > n$. The knowledge engineer needs to delete $t$ unnecessary instances $b_{u1}, b_{u2}, \ldots, b_{ut}$ from the class $C_2$ for which $R(a, b_{u1}), \ldots, R(a, b_{ut})$ hold, where $0 < t \leq m$ and $m - t \leq n$.

If the instance $a \in C$ does not satisfy the value constraint $C \sqsubseteq \exists R.d$, the value of the property $R$ is not in the datatype $d$. The knowledge engineer needs to modify the value so that it lies in the datatype $d$.

In this way we obtain a special ontology that is complete and satisfies all the given IC-mapping axioms. To facilitate the mapping, we define this special ontology as follows.

Definition 3 Let $O = (A, N, C')$ be an extended ontology. If the IC-mapping axioms $C'$ are satisfied in $O$, we say that $O$ is an IC-mapping ontology.
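The purely additive repair actions of Sect. 3.2 can be sketched mechanically, as below. This is only an illustration under the same set-based encoding of concepts and roles. In our approach the choices are made by a knowledge engineer, and the placeholder naming scheme (suffix _u1, _u2, ...) and function names are assumptions.

```python
def repair_inclusion(sub, sup):
    """Add every member of C1 to C2 so that C1 ⊑ C2 holds."""
    return set(sup) | set(sub)

def repair_inverse(role, inverse_role):
    """Add (b, a) to the inverse role for every (a, b) in the role."""
    return set(inverse_role) | {(b, a) for (a, b) in role}

def repair_transitive(role):
    """Add pairs until the role is transitively closed."""
    closed = set(role)
    while True:
        new = {(a, d) for (a, b) in closed for (c, d) in closed if b == c} - closed
        if not new:
            return closed
        closed |= new

def repair_at_least(a, role, fillers, n):
    """Add t = n - m placeholder fillers so that `a` has at least n R-fillers in C2."""
    m = sum(1 for (x, b) in role if x == a and b in fillers)
    placeholders = [f"{a}_u{i}" for i in range(1, max(0, n - m) + 1)]
    new_role = set(role) | {(a, b) for b in placeholders}
    return new_role, set(fillers) | set(placeholders)

closed_role = repair_transitive({(1, 2), (2, 3), (3, 4)})
new_role, new_fillers = repair_at_least(
    "GraduateStudent4", {("GraduateStudent4", "Course18")}, {"Course18"}, 2
)
```

The at-most and value repairs are omitted on purpose, since deciding which assertion to delete or which value to correct is a modelling decision rather than a mechanical step.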

4 IC-based mapping approach

The structure of the IC-mapping ontology, which satisfies the IC-mapping axioms, is different from that of a normal ontology. According to this special feature of the IC-mapping ontology, we put forward a novel approach to map it to a relational database. It stores the ABox and the TBox separately and maps the ABox assertions according to the IC-mapping axioms. Since the mapping approach is based on the integrity constraints, we call it the IC-based mapping approach.

4.1 TBox mapping

We first map the TBox axioms to relational databases. Since the information to be queried is the relationship among the classes (/properties) rather than the constraints on the ontology, the IC-mapping axioms do not need to be mapped to relational databases. Thus, in this paper, only the normal axioms are mapped to relational databases, while the IC-mapping axioms are used for checking the integrity of the ontology and for mapping the assertions to relational databases.

Compared with the assertions in the ABox, the number of normal axioms is quite small, and the TBox axioms have no fixed structure, so they can be stored in a single simple table. The table has three columns: subject, predicate and object. The subject and object represent the two related classes (/properties), and the predicate represents the relationship between them. In addition, id is set as the primary key. This layout is simple to implement and makes it easy to modify the ontology structure. Table 1 shows the TBox mapping table.

Table 1 TBox mapping

Id  Subject                   Predicate      Object
1   GraduateStudent           SubClassOf     Person
2   GraduateStudent           Type           Class
3   UndergraduateDegreeFrom   SubPropertyOf  DegreeFrom
4   UndergraduateDegreeFrom   Domain         GraduateStudent
5   UndergraduateDegreeFrom   Range          University
6   UndergraduateDegreeFrom   Type           ObjectProperty
7   Name                      Type           DatatypeProperty
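The TBox table can be sketched with Python's built-in sqlite3 module, used here as a stand-in for the relational database. The rows are those of Table 1, while the column types and the helper function superclasses are illustrative assumptions.

```python
import sqlite3

TBOX_ROWS = [
    ("GraduateStudent", "SubClassOf", "Person"),
    ("GraduateStudent", "Type", "Class"),
    ("UndergraduateDegreeFrom", "SubPropertyOf", "DegreeFrom"),
    ("UndergraduateDegreeFrom", "Domain", "GraduateStudent"),
    ("UndergraduateDegreeFrom", "Range", "University"),
    ("UndergraduateDegreeFrom", "Type", "ObjectProperty"),
    ("Name", "Type", "DatatypeProperty"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBox ("
    "id INTEGER PRIMARY KEY, "   # id is the primary key, as in Table 1
    "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO TBox (subject, predicate, object) VALUES (?, ?, ?)", TBOX_ROWS
)

def superclasses(cls):
    """Direct superclasses of `cls` as stored in the TBox table."""
    rows = conn.execute(
        "SELECT object FROM TBox WHERE subject = ? AND predicate = 'SubClassOf'",
        (cls,),
    )
    return [row[0] for row in rows]
```

Because every normal axiom is a plain triple, adding or changing an axiom is a single-row insert or update and never alters the schema.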

4.2 ABox mapping

In the following, we describe the approach for mapping ABox assertions to relational databases. We mainly use the number constraints and typing constraints to create tables, specify the domains of property values and store the ABox assertions of the IC-mapping ontology in the relational database.

Since each class of the ontology is similar to a relational table, the instances of the class can be seen as a set of records with similar properties. We can create a relational table for each class in the database. The table name is the same as the class name, each row (record) of the table represents an instance of the ontology class, and its columns consist of the set of applicable properties used by its instances. Since each instance of a class is unique, we set the instance id of the class as the primary key. This is similar to the class decomposition mapping approach.

Since the instance data of the IC-mapping ontology satisfy number constraints $C_1 \sqsubseteq \otimes nR.C_2$, the number of records that an instance contributes to a table for the same number constraint is not fixed when the instance data are mapped to the relational database. To resolve this problem and optimize the above mapping approach, we create two types of tables, class tables and constraint tables, to distinguish the data that are subject to the constraints from normal data.

Since not all class definitions in the TBox include number constraint axioms, we give formal definitions to distinguish the instances in the ABox.

Definition 4 Let the axiom $C_1 \sqsubseteq \otimes nR.C_2$ be a number constraint, let $\{C_1\}$ be the set of classes that appear on the left-hand side of such axioms, and let $R$ be the constraint property. For each class $C_1^i$ in the TBox, if $C_1^i \in \{C_1\}$ we call it a constraint class; otherwise we call it a normal class.

Definition 5 For each instance $a \in C_1^i$, if $C_1^i$ is a constraint class we call $a$ a constraint instance; otherwise we call it a normal instance.

The concrete implementation of the IC-based mapping approach is as follows. Each relational table to be created needs the domain of every property value, so first of all, when tables are created, the domain of each property value is specified according to the typing constraints. For the normal instances, a class table is created for each class as in the class decomposition approach, except that its columns consist of the applicable properties of its instances without the constraint properties that appear in number constraints. For the constraint instances, a constraint table is created for each number constraint. The table name is the constraint property name, and the columns subject and object, which represent the domain and range of the property, are both not null. In addition, id is set as the primary key. In this way, we can store all the data that satisfy the number constraints.

The IC-based approach has two advantages over the class decomposition approach. On the one hand, the class table of the former is shorter than that of the latter. Take the at-least constraint as an example: $\mathit{GraduateStudent} \sqsubseteq {\geq} 2\,\mathit{takesCourse}.\mathit{GraduateCourse}$.

Table 2 shows the class table produced by the class decomposition approach, while Tables 3 and 4 show the corresponding tables produced by the IC-based approach. In Table 2, the instance GraduateStudent4 generates two records that are identical except for the value of the property takesCourse. This leads to redundancy, which grows with $n$. Since the constraint properties are extracted into a separate table, there is no such redundancy in Table 3, which makes Table 3 shorter than Table 2. On the other hand, the query performance of the former approach is higher than that of the latter. Since the constraint property is extracted separately, only the corresponding constraint table needs to be searched when querying the data that satisfy the constraints. Furthermore, a shorter table may shorten the query time. Thus, our approach requires less space and offers efficient query performance.

Table 2 Class table GraduateStudent (class decomposition approach)

Id  Name               Advisor              TeachingAssistantOf  UndergraduateDegreeFrom  MemberOf     TakesCourse
1   GraduateStudent4   Assistantprofessor4  Course12             University640            Department0  Course18
2   GraduateStudent4   Assistantprofessor4  Course12             University640            Department0  Course56
3   GraduateStudent12  Fullprofessor1       Null                 University184            Department1  Course15
4   GraduateStudent12  Fullprofessor1       Null                 University184            Department1  Course28
5   GraduateStudent17  Associateprofessor2  Course52             University617            Department3  Course20
6   GraduateStudent17  Associateprofessor2  Course52             University617            Department3  Course32
7   GraduateStudent17  Associateprofessor2  Course52             University617            Department3  Course54

Table 3 Class table GraduateStudent (IC-based approach)

Id  Name               Advisor              TeachingAssistantOf  UndergraduateDegreeFrom  MemberOf
1   GraduateStudent4   Assistantprofessor4  Course12             University640            Department0
2   GraduateStudent12  Fullprofessor1       Null                 University184            Department1
3   GraduateStudent17  Associateprofessor2  Course52             University617            Department3

Table 4 Constraint table takesCourse

Id  Subject            Object
1   GraduateStudent4   Course18
2   GraduateStudent4   Course56
3   GraduateStudent12  Course15
4   GraduateStudent12  Course28
5   GraduateStudent17  Course20
6   GraduateStudent17  Course32
7   GraduateStudent17  Course54
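The size difference between Table 2 and Tables 3 and 4 can be reproduced with a few lines of arithmetic. The sketch below counts rows and stored values (excluding the Id column) for the takesCourse data shown above; the function names are illustrative assumptions.

```python
TAKES_COURSE = {
    "GraduateStudent4": ["Course18", "Course56"],
    "GraduateStudent12": ["Course15", "Course28"],
    "GraduateStudent17": ["Course20", "Course32", "Course54"],
}
# Name, Advisor, TeachingAssistantOf, UndergraduateDegreeFrom, MemberOf
SINGLE_VALUED_COLUMNS = 5

def class_decomposition_size(takes_course):
    """Rows and stored values when every course repeats the whole student record."""
    rows = sum(max(1, len(courses)) for courses in takes_course.values())
    return rows, rows * (SINGLE_VALUED_COLUMNS + 1)

def ic_based_size(takes_course):
    """Class-table rows, constraint-table rows and total stored values."""
    class_rows = len(takes_course)
    constraint_rows = sum(len(courses) for courses in takes_course.values())
    values = class_rows * SINGLE_VALUED_COLUMNS + constraint_rows * 2
    return class_rows, constraint_rows, values
```

For the data above, the class decomposition layout needs 7 rows and 42 stored values, while the IC-based layout needs 3 class-table rows plus 7 constraint-table rows and 29 stored values in total.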

4.3 Algorithm

The basic idea of the algorithm is as follows. Firstly, create a TBox table with three columns, transform all the normal axioms into a set of simple statements, and insert each triple into the TBox table. Secondly, for each class in the TBox, divide the properties of the class into normal properties and constraint properties. Thirdly, create a class table for each class whose columns are the normal properties, and create a constraint table with two columns for each constraint. Finally, transform the assertions into a set of statements and insert each triple into the corresponding table. If the predicate of the statement is a normal property, the object of the statement is inserted into the corresponding class table. Otherwise, the predicate is a constraint property, and the subject and the object of the statement are inserted into the corresponding constraint table.

The variables used in the algorithm are as follows: $A$ is the set of assertions, $N$ is the set of normal axioms, and $IC$ is the set of number constraints of the form $C_1 \sqsubseteq \otimes nR.C_2$. $\{P\} = \{P_{class}\} \cup \{P_{con}\}$, where $P_{class}$ is a property of a class table and $P_{con}$ is a property of a constraint table. $\{C\}$ is the set of classes, and a statement can be seen as a triple (sub, pred, obj).

Algorithm: IC-based mapping approach

Input: IC-mapping ontology O = (A, N, IC)

Output: relational database DB

begin
  DB := {}
  {P_con} := {}
  // store the TBox axioms
  transform N into the set of statements
  repeat
    get next statement
    insert (sub, pred, obj) into TBox
  until no next statement
  // create tables
  for each class C1 ∈ {C}
    {P_class} := all the properties of C1
    for each axiom that begins with C1
      if axiom ∈ IC
        {P_con} := {P_con} ∪ {R}
        {P_class} := {P_class} − {R}
    create table C1 whose properties are {P_class}
    DB := DB ∪ table C1
  for each property P1 ∈ {P_con}
    create table P1 whose properties are {subject, object}
    DB := DB ∪ table P1
  // insert data into tables
  transform A into the set of statements
  for each class C1 ∈ {C}
    repeat
      get next statement
      if sub is an instance of C1
        if pred ∉ {P_con}
          insert obj into the table C1 under the property pred
    until no next statement
  for each property P1 ∈ {P_con}
    repeat
      get next statement
      if pred = P1
        insert (sub, obj) into the table P1
    until no next statement
end
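The algorithm above can be sketched in runnable form as follows. This is an illustrative version and not the implementation used in the experiments: it uses Python's built-in sqlite3 module instead of MySQL, stores all columns as text rather than applying the typing constraints, assumes that ontology names are trusted identifiers distinct from id and name, and takes the class of each instance and the properties of each class as given inputs (class_properties and instance_class are hypothetical parameters).

```python
import sqlite3

def ic_based_mapping(assertions, normal_axioms, number_constraints,
                     class_properties, instance_class):
    """Map an IC-mapping ontology to an in-memory relational database.

    assertions / normal_axioms: iterables of (sub, pred, obj) statements
    number_constraints: (C1, R) pairs taken from axioms C1 ⊑ ⊗nR.C2
    class_properties: dict mapping each class to its list of properties
    instance_class: dict mapping each instance to its class
    """
    db = sqlite3.connect(":memory:")

    # Step 1: store the normal TBox axioms as triples.
    db.execute("CREATE TABLE TBox (id INTEGER PRIMARY KEY, "
               "subject TEXT, predicate TEXT, object TEXT)")
    db.executemany(
        "INSERT INTO TBox (subject, predicate, object) VALUES (?, ?, ?)",
        normal_axioms)

    # Step 2: split the properties of each class into P_class and P_con.
    constrained = set(number_constraints)
    p_con = {r for (_, r) in constrained}
    p_class = {c: [p for p in props if (c, p) not in constrained]
               for c, props in class_properties.items()}

    # Step 3: one class table per class, one constraint table per property in P_con.
    for c, props in p_class.items():
        columns = "".join(f', "{p}" TEXT' for p in props)
        db.execute(f'CREATE TABLE "{c}" (id INTEGER PRIMARY KEY, '
                   f'name TEXT NOT NULL UNIQUE{columns})')
    for r in sorted(p_con):
        db.execute(f'CREATE TABLE "{r}" (id INTEGER PRIMARY KEY, '
                   'subject TEXT NOT NULL, object TEXT NOT NULL)')

    # Step 4: one row per instance, then route each assertion by its predicate.
    for inst, c in instance_class.items():
        db.execute(f'INSERT INTO "{c}" (name) VALUES (?)', (inst,))
    for sub, pred, obj in assertions:
        if pred in p_con:
            db.execute(f'INSERT INTO "{pred}" (subject, object) VALUES (?, ?)',
                       (sub, obj))
        elif pred in p_class[instance_class[sub]]:
            db.execute(f'UPDATE "{instance_class[sub]}" SET "{pred}" = ? '
                       'WHERE name = ?', (obj, sub))
    return db

db = ic_based_mapping(
    assertions=[("GraduateStudent4", "advisor", "Assistantprofessor4"),
                ("GraduateStudent4", "takesCourse", "Course18"),
                ("GraduateStudent4", "takesCourse", "Course56")],
    normal_axioms=[("GraduateStudent", "SubClassOf", "Person")],
    number_constraints=[("GraduateStudent", "takesCourse")],
    class_properties={"GraduateStudent": ["advisor", "takesCourse"]},
    instance_class={"GraduateStudent4": "GraduateStudent"},
)
```

In the usage example, GraduateStudent4 occupies a single row of the class table, and its two takesCourse values go to the constraint table, mirroring Tables 3 and 4.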

5 Experiments

To evaluate the performance of the IC-based mapping approach, we created an IC-mapping ontology. This ontology is derived from Univ-Bench [16], developed at Lehigh University. Univ-Bench describes universities, departments and the activities that occur at them. For the purposes of the experiment, we extended its expressivity from OWL Lite to OWL DL, obtained the IC-mapping axioms from the OWL ontology and checked the integrity of the ontology. Finally, the data that did not satisfy the IC-mapping axioms were modified and completed so that the result is an IC-mapping ontology.

After generating the data set, the IC-mapping ontology data were stored in the relational database in four ways: the vertical approach, the class decomposition approach, the DBOWL-based approach and the IC-based approach. All experiments were conducted in the following environment:

1) 1.9 GHz AMD Athlon 64 X2 dual core processor 3600+ CPU; 1.00 GB of RAM; 150 GB of hard disk.

2) Windows XP Professional OS; Java SDK 3.2.2; MySQL 5.0.

As discussed in Sect. 4, the IC-based mapping approach requires little storage space and offers efficient query performance. In this section we carry out experiments to verify this. Fig. 1 compares the length of the class tables created by the class decomposition approach and the IC-based approach. Since the vertical table is much longer than the others and the class tables created by the DBOWL-based approach are the same as those of the class decomposition approach, we only compare the class decomposition approach and the IC-based approach. Take the class table GraduateStudent as an example. The horizontal axis represents the number of ontology files and the vertical axis represents the length of the class table. Fig. 1 indicates that the class table in the IC-based approach is shorter than in the class decomposition approach, and that the gap widens as the ontology scale increases.

Fig. 1 Table length with respect to different scales of datasets

In the following, we compare the query time of the four approaches. The 14 queries are provided by the Lehigh University Benchmark (LUBM) [16], and the ontology data contain 7 888 triples. We executed each query 100 times and took the average as its final query time. We issued structured query language (SQL) statements directly rather than using a semantic query language such as the RDF query language (RQL) or the SPARQL protocol and RDF query language (SPARQL). The purpose is to exclude the time needed to translate such queries into the corresponding SQL statements, so that translation does not affect the experimental results. Fig. 2 illustrates the result of the experiment. The horizontal axis represents the query statements and the vertical axis represents the query time for the vertical approach, the class decomposition approach, the DBOWL-based approach, and the IC-based approach. Fig. 2 indicates that the IC-based approach is superior to the other three approaches in the majority of cases, and also shows that the query time varies with the query statement.

Fig. 2 Query time comparison with respect to each approach
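The measurement procedure described above (run each SQL statement 100 times and report the mean) can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the MySQL 5.0 server used in the experiments, and the table contents, the labels q1 and q2, and the helper average_query_time are hypothetical rather than taken from LUBM or from the authors' Java implementation.

```python
import sqlite3
import time


def average_query_time(conn, sql, runs=100):
    """Return the mean wall-clock time (seconds) over `runs` executions of `sql`."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the whole query is timed
        total += time.perf_counter() - start
    return total / runs


# Hypothetical stand-in data: an in-memory class table with 1000 instances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GraduateStudent (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO GraduateStudent VALUES (?, ?)",
    [(f"s{i}", f"name{i}") for i in range(1000)],
)

# Hypothetical queries; the 14 LUBM queries would be listed here instead.
queries = {
    "q1": "SELECT id FROM GraduateStudent WHERE name = 'name7'",
    "q2": "SELECT COUNT(*) FROM GraduateStudent",
}

results = {label: average_query_time(conn, sql) for label, sql in queries.items()}
for label, seconds in results.items():
    print(f"{label}: {seconds * 1000:.3f} ms (mean of 100 runs)")
```

Averaging over many runs reduces the influence of caching and scheduling noise on any single measurement, which is presumably why a repeated measurement was chosen.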

In general, the shorter the table and the fewer the tables to be joined, the higher the query efficiency. From Fig. 2, we can see that the query time of the vertical approach is much longer than that of the others in the majority of cases. This is because the vertical approach needs to join multiple tables and the vertical table is much longer than the other tables, whereas the other approaches do not need the join operation and carry out their queries on a short table, as in q3. Furthermore, because of the redundancy in the class table of the class decomposition approach, the class table in the IC-based approach is shorter than that in the class decomposition approach, as in q1. As the scale of the integrity constraints grows, the gap becomes much greater. In addition, since the DBOWL-based approach creates a table for each class and each property, there may be too many tables in the relational database, and extra time is needed to determine which tables to query. Thus, for the IC-mapping ontology, using the IC-based approach may improve the query efficiency.
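The join argument can be made concrete with a small sketch. It assumes a simplified schema that is not the authors' actual one: a single triple table (subject, predicate, object) for the vertical approach and a one-row-per-instance GraduateStudent table for a class-table layout. The table names, column names, and sample rows are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Vertical layout (assumed): every statement is one row of a single triple table.
cur.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
cur.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("s1", "rdf:type", "GraduateStudent"),
        ("s1", "name", "Ann"),
        ("s1", "memberOf", "dept0"),
        ("s2", "rdf:type", "GraduateStudent"),
        ("s2", "name", "Bob"),
        ("s2", "memberOf", "dept1"),
    ],
)

# Class-table layout (assumed): one row per instance, one column per property.
cur.execute(
    "CREATE TABLE GraduateStudent (id TEXT PRIMARY KEY, name TEXT, memberOf TEXT)"
)
cur.executemany(
    "INSERT INTO GraduateStudent VALUES (?, ?, ?)",
    [("s1", "Ann", "dept0"), ("s2", "Bob", "dept1")],
)

# "Names of graduate students in dept0" on the vertical table:
# the triple table is joined with itself once per extra property.
vertical_sql = """
SELECT n.object
FROM triples AS t
JOIN triples AS n ON n.subject = t.subject AND n.predicate = 'name'
JOIN triples AS m ON m.subject = t.subject AND m.predicate = 'memberOf'
WHERE t.predicate = 'rdf:type' AND t.object = 'GraduateStudent'
  AND m.object = 'dept0'
"""

# The same question on the class table needs no join at all.
class_sql = "SELECT name FROM GraduateStudent WHERE memberOf = 'dept0'"

vertical_rows = cur.execute(vertical_sql).fetchall()
class_rows = cur.execute(class_sql).fetchall()
triple_count = cur.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
class_count = cur.execute("SELECT COUNT(*) FROM GraduateStudent").fetchone()[0]
print(vertical_rows, class_rows, triple_count, class_count)
```

Both queries return the same answer, but the vertical one scans a table with three rows per student and joins it with itself twice, while the class-table query reads one short table once.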

In addition, for some query statements the difference in query efficiency is not very clear when the dataset is small. To address this, it is necessary to increase the scale of the dataset and observe how the efficiency changes. We stored ontology data of different sizes using the three approaches. The test data have 7 888, 32 155, 99 189, and 328 436 triples, respectively.

Fig. 3 illustrates the result of the experiment. The horizontal axis represents the scale of the datasets and the vertical axis represents the query time for the three approaches. The query time is the average of the query times of these query statements. Fig. 3 indicates that the efficiency gap becomes clearer as the ontology scale increases. Furthermore, we can infer that the scale of the datasets used in the second experiment is sufficient to illustrate the advantage, so there is no need to enlarge the datasets any further.


Fig. 3 Query time with respect to different scales of datasets

6 Conclusions

Motivated by the incompleteness problem encountered in mapping ontologies to relational databases, we have defined the IC-mapping axioms to mimic the integrity constraints in relational databases. According to these axioms, we have checked the integrity of the IC-mapping ontology and modified a normal ontology into an IC-mapping ontology. Moreover, we have proposed the IC-based mapping approach, which uses the IC-mapping axioms to optimize the existing mapping approaches. Finally, it has been compared with the vertical approach and the class decomposition approach. The query performance results show that the IC-based mapping approach is superior to these two approaches in the majority of cases, and that the difference becomes more evident as the size of the data sets increases.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (60973089, 60873148), the Jilin Province Science and Technology Development Plan (20101501, 20100185), the Erasmus Mundus External Cooperation Window’s Project (EMECW): Bridging the Gap (155776-EM-1-2009-1-IT-ERAMUNDUS-ECW-L12), and the Foundation of Key Laboratory of SCKE of Ministry of Education (450060326019).

References

1. Berners-Lee T, Hendler J, Lassila O. The Semantic Web. Scientific American, 2001, 284(5): 35−43.

2. Baader F, Calvanese D, McGuinness D, et al. The description logic handbook: theory, implementation and application. Second Edition. London, UK: Cambridge University Press, 2007

3. Motik B, Grau B, Horrocks I, et al. Representing ontologies using description logics, description graphs, and rules. Artificial Intelligence, 2009, 173(14): 1275−1309

4. Antoniou G, Harmelen F. Web ontology language-OWL. Handbook on Ontologies. International Handbooks on Information Systems. Berlin, Germany: Springer-Verlag, 2004: 67−92

5. Chang K, He B, Li C, et al. Structured databases on the Web: observations and implications. SIGMOD Record, 2004, 33(3): 61−70

6. Tobies S. Complexity results and practical algorithms for logics in knowledge representation. PhD Thesis. Aachen, Germany: RWTH Aachen University, 2001

7. Motik B, Rosati R. A faithful integration of description logics with logic programming. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Jan 6−12, 2007, Hyderabad, India. Menlo Park, CA, USA: AAAI Press, 2007: 477−482

8. Pan Z, Heflin J. DLDB: extending relational databases to support semantic Web queries. Proceedings of the 1st International Workshop on Practical and Scalable Semantic Systems (PSSS'03), Oct 20, 2003, Sanibel Island, FL, USA. 2003: 43−48

9. Agrawal R, Somani A, Xu Y. Storage and querying of e-commerce data. Proceedings of the 27th International Conference on Very Large Databases (VLDB’01), Sep 11−14, 2001, Roma, Italy. Roma, Italy: Morgan Kaufmann Publishers, 2001: 149−158

10. Dehainsala H, Pierra G, Bellatreche L. OntoDB: An ontology-based database for data intensive applications. Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA’07), Apr 9−12, 2007, Bangkok, Thailand. LNCS 4443. Berlin, Germany: Springer-Verlag, 2007: 497−508

11. Bechhofer S, Horrocks I, Turi D. The instance store: system description. Proceedings of the 20th International Conference on Automated Deduction (CADE’05), July 22−27, 2005, Tallinn, Estonia. LNCS 3632. Berlin, Germany: Springer-Verlag, 2005: 177−181

12. García M, Montes J. Complete OWL-DL reasoning using relational databases. Proceedings of the Database and Expert Systems Applications (DEXA’09), Aug 31−Sep 4, 2009, Linz, Austria. LNCS 5690. Berlin, Germany: Springer-Verlag, 2009: 435−442

13. Sirin E, Smith M, Wallace E. Opening, closing worlds: on integrity constraints. Proceedings of the 5th Workshop on OWL: Experiences and Directions (OWLED’08), Collocated with the 7th International Semantic Web Conference (ISWC’08), Oct 26−27, 2008, Karlsruhe, Germany. 2008

14. Motik B, Horrocks I, Sattler U. Bridging the gap between OWL and relational databases. Journal of Web Semantics, 2009, 7(2): 74−89

15. Horrocks I, Schneider P. Reducing OWL entailment to description logic satisfiability. Journal of Web Semantics, 2004, 1(4): 345−357

16. Guo Y, Pan Z, Heflin J. LUBM: a benchmark for OWL knowledge base systems. Journal of Web Semantics, 2005, 3(2): 158−182

(Editor: WANG Xu-ying)
