
Mapping Databases To Ontologies To Design And Maintain Data In A Semantic Web Environment

Olivier Curé
ISIS Laboratory, Office #240
Université de Marne-la-Vallée
Champs-sur-Marne, 77424, France

Abstract. This paper presents a global framework which enables the end-user to design, enrich and maintain an ontology from an existing database. These functionalities facilitate the development and maintenance of Semantic Web applications from frequently updated, domain-specific databases. The efficiency of this framework is evaluated through the study of a medicine-oriented Semantic Web application which benefits from all the features of the DBOM framework.

Keywords: Ontology, Knowledge base, Database, Mapping.

1. INTRODUCTION

The Semantic Web is an extension of the current Web, which is considered syntactic, in which semantic markup languages enable computers to process and interpret Web resources. The backbone of this next-generation web is the notion of ontology, understood in its information systems and computer science sense [7][8] rather than in its philosophical sense. The World Wide Web Consortium (W3C) has recently released a set of recommendations, on the Resource Description Framework (RDF) and the Web Ontology Language (OWL), to tackle the design of ontologies for the Semantic Web.

This paper addresses two crucial issues for the development of Semantic Web applications. The first is the design of ontologies, which is considered difficult even though effective and usable tools are now available (e.g. Protégé [6]). The second is data acquisition, another cumbersome task given the current Web's data overload. Most of these data are stored in databases and in unstructured text-based documents, usually generated by word processors.

In [3], the authors emphasize that “databases are similar to knowledge bases because they are usually used to maintain models of some domain of discourse”. The DBOM (DataBase Ontology Mapping) framework exploits this proposition and deals with both ontology design and data acquisition issues via mapping with databases. This framework also proposes maintenance features for an OWL ontology that take into account updates of an existing relational database. A strength of the DBOM framework is its ability to design domain and application ontologies [8] from a domain-specific database, meaning that an important part of the data available for this domain is contained and well described in the database. A valuable example in the context of XIMSA (see section 4) is an exhaustive database of all OTC (Over-The-Counter) drugs available in France. An important fact about such databases is that update operations may also update the domain. Thus the ontology needs to be synchronized with the database. The proposed framework offers such features and extends them to permit symmetrical maintenance solutions, meaning that controls and maintenance operations are performed both ways: from the database to the ontology (ontology updating, e.g. adding new instances) and from the ontology to the database (e.g. consistency checking).

This paper is organized as follows:

- Section 2 highlights the motivation behind the DBOM framework based on studies of available solutions in this field.

- Section 3 proposes an overview of the DBOM framework which describes mapping, enrichment and maintenance issues.

- Section 4 proposes a full-fledged example based on DBOM: the XIMSA (eXtended Interactive Multimedia System for Auto-medication) project.

2. MOTIVATION

The potential collaboration between database schemata and ontologies is an active field, especially since the emergence of the Semantic Web. A SWAD-Europe deliverable [12] emphasizes that many implementations (D2R Map [2], KAON Reverse [11], etc.) are already tackling this issue for the RDFS and OWL ontology languages. These solutions help to design an ontology from a database schema and to instantiate classes from tuples. A study of [12] highlights the absence of a global framework which enables the maintenance of ontologies from databases. These functionalities are quite important for Semantic Web applications where the TBox and ABox (respectively terminological and assertional knowledge) states of the knowledge base must be synchronized with the current domain state. Available solutions update ontologies by recreating, from scratch, all the instances from the database via a mapping process. Such an approach may be difficult to handle with large ontologies (up to a few megabytes) and frequently updated databases, which makes scalability an important issue. DBOM addresses this issue by providing a dynamic maintenance solution between the database and the ontology.

DBOM focuses on the development of web ontologies used in the context of Semantic Web applications. A major constraint for such a framework is that a fully automatic solution to convert data from a relational schema into a Description Logic knowledge base schema (with TBox and ABox) can never prove to be fully satisfactory. One reason is that relational schemata are usually the only source of information, since (Extended) Entity-Relationship diagrams, which are richer in terms of semantics (cardinality constraints, etc.), are generally not available. Thus the solution needs to be semi-automatic, meaning that the DBOM end-user / ontology designer (referred to as the designer in the rest of this paper) builds the database-to-ontology mapping file. The designer is then the source of information which makes it possible to map the semantically poor database schema to a richly axiomatized ontology. Once validated, the mapping file is automatically processed.

SYSTEMICS, CYBERNETICS AND INFORMATICS VOLUME 4 - NUMBER 4 ISSN: 1690-4524

The approach adopted by DBOM is that only a portion of the database tables and attributes necessarily map to the ontology. Nevertheless, the designer still has the freedom to map as many relations and attributes as desired to the ontology. But in order to ensure fast navigation and effective reasoning, ontologies designed with a minimum of classes and properties are preferable. For example, the design of the XIMSA ontology (see section 4) does not incorporate attributes such as drug price because this field is not useful in the current XIMSA reasoning context. Such information is nevertheless valuable to end-users and is still accessible via bridges, designed within DBOM, between the database and the ontology.

The vision of the DBOM framework is to develop applications that use the ontology for inference purposes and are able to bind the inference results to the database, thus providing exhaustive information to the end-user. In order to reach this goal, the designer of the mapping must be aware of the database schema and needs a clear vision of the characteristics of the Semantic Web application about to be implemented.

The main motivation behind the maintenance features lies in the database / ontology separation. This separation requires that the database schema is not modified during mapping processing and enables users of the DBOM framework to benefit from:

– database features which are not available in ontology engineering, such as concurrency control, transactions, crash recovery, advanced storage techniques and query languages.

– OWL ontologies and the reasoning tools of the underlying Description Logics, e.g. Racer [9], which make it possible to check the consistency of the ontology.

3. DESCRIPTION OF COMPONENTS

The DBOM framework is implemented in Java and is based on Hewlett Packard's Semantic Web Jena API [10]. Jena has been selected because it is an efficient open-source framework popular in the Semantic Web developer community. Among its interesting features, Jena provides a programmatic environment for RDF, RDFS and OWL, including a rule-based inference engine.

The description of the DBOM framework is divided into four distinct components: design, enrichment, ontology maintenance and database maintenance. A global vision of the framework is proposed in Figure 1.

Fig. 1 Global vision of interactions with the DBOM framework.

3.1 Ontology design

DBOM's ontology design implementation respects two of the main prerequisites described in section 2:

– a semi-automatic solution is adopted. The ontology designer uses his knowledge of the database schema to design the mapping to the ontology. The mapping is supported by an XML file which is validated against an XML Schema when it is parsed. Thus the designer needs expertise in DBMS, XML and ontology modeling.

– the existing database schema is not changed. Thus the actions performed on the database are data manipulation operations of the SQL language (DML). This approach is adopted because the designer has skills in using DBMS, thus writing SQL queries should not be a difficult task. Storing SQL queries directly in the XML mapping file improves readability and usability. The ability to easily create individuals from instance data spread over several tables is another advantage provided by the use of SQL queries.

The first step for the designer in the XML mapping file is to provide the information related to the DBMS connection. DBOM enables connection with RDBMSes offering JDBC and ODBC access. Following this section, namespace definitions can be declared. Then follows the heart of the ontology design, with the creation of concepts and properties.

The ontology design phase consists in describing the relations, attributes and possible conditions involved in the design of the ontology classes and properties (datatype and object properties). The mapping addresses the design of an ontology according to the constructors of the OWL Lite and OWL DL languages. This approach enables DBOM to generate the TBox with class and property hierarchies as well as restrictions. The assumption of acyclicity is common in Description Logic knowledge bases [3] and ensuring it is the responsibility of the designer.

The mapping must conform to a set of rules:

– class individuals follow the format className_xxx. The className is provided in the mapping file by the designer. The xxx value is equal to the primary key value of the corresponding database relation. DBOM allows the creation and manipulation of compound identifiers. This rule ensures the correspondence between instances of the ontology and tuples of the database.

– for ontology readability, class individuals must be provided with text attributes stored in an rdfs:label element and/or datatype properties defined in the mapping file.
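The naming rule can be illustrated with a minimal Python sketch. It is not taken from the DBOM code base (which is written in Java): the function name individual_id and the choice of an underscore to join the parts of a compound key are assumptions made for illustration only.

```python
def individual_id(class_name, key_values):
    """Build an individual identifier of the form className_xxx.

    key_values holds the primary key value(s) of the source tuple.
    Joining a compound key with underscores is an assumed convention;
    the paper only states that compound identifiers are supported.
    """
    if not key_values:
        raise ValueError("at least one primary key value is required")
    return class_name + "_" + "_".join(str(value) for value in key_values)


# A single-column key and a hypothetical two-column key.
print(individual_id("Person", [101]))
print(individual_id("Parent", [101, 102]))
```

Because the identifier embeds the primary key, the tuple behind any individual can be located again without an additional lookup table.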

The purpose of the rest of this section is to emphasize the main construction possibilities of the mapping. For readability reasons, the code fragments are kept to their minimal and essential forms. It should be stressed that definitions are not provided for namespaces, abstract classes, etc. An examination of the XML Schema [5] presents all the functionalities of the mapping language. In order to provide a concrete example, the description of the mapping is based on the following simple relational schema:

    gender(idGender, valueGender)
    person(idPerson, lastNamePerson, firstNamePerson, zipcodePerson, idGender#)
    parent(idChildPerson#, idParentPerson#)

The definition of the Person ontology concept is described with the following XML code fragment:

    <dbom:class dbom:name="Person" dbom:src="person">
      <dbom:sqlstatement query="SELECT person.idPerson, person.lastNamePerson, person.firstNamePerson FROM person;"/>
      <dbom:bind>
        <dbom:id>
          <dbom:value>1</dbom:value>
        </dbom:id>
        <dbom:label>
          <dbom:value>2</dbom:value>
          <dbom:value>3</dbom:value>
        </dbom:label>
      </dbom:bind>
    </dbom:class>

The dbom:class element maps to the creation of an OWL concept named after the value of the dbom:name attribute (in this example, Person). An equivalent dbom:subclass element makes it possible to design class hierarchies from SQL statements and requires the name of the extended class.

The design of richly axiomatized concepts is also possible with the DBOM design mapping. Let us suppose we would like to design a Man concept corresponding to male persons from the person relation. In order to store the corresponding ontology restrictions, DBOM proposes constructors such as onProperty and hasValue. someValuesFrom and allValuesFrom, as well as all other OWL Lite and OWL DL constructs, are also accessible via the mapping file. The following example provides the definition of the Man concept as being a person with a 'male' value for the object property 'hasGender':

    <dbom:definition dbom:className="Man" dbom:defType="equivalentClass">
      <dbom:conjunct>
        <dbom:element dbom:className="Person"/>
        <dbom:element dbom:type="hasValueRestriction" dbom:objPropertyName="hasGender" dbom:value="Gender_1"/>
      </dbom:conjunct>
    </dbom:definition>

DBOM mapping features enable end-users to define all kinds of properties:

– datatype properties are easily integrated within ontologies according to a mapping between relation attributes and the range of the property. In the previous example, we can assign a hasZipCode property to the Person class.

– all kinds of object properties (functional, transitive, symmetric, etc.) can be defined with or without property restrictions. In the previous example, a hasParent object property may be defined with the class Person as domain and range. This property may be declared transitive so that ancestors can be reached recursively.
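To make the effect of a transitive hasParent property concrete, the following Python sketch computes the ancestors of an individual from direct hasParent assertions. In DBOM this inference would be delegated to a reasoner; the dictionary layout and the function name ancestors are assumptions made for this illustration.

```python
def ancestors(individual, has_parent):
    """Return every individual reachable through hasParent assertions.

    has_parent maps an individual identifier to the list of its direct
    parents; following the edges repeatedly mimics what declaring the
    property transitive lets a reasoner conclude.
    """
    found = set()
    stack = [individual]
    while stack:
        current = stack.pop()
        for parent in has_parent.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Direct assertions only: Person_3 -> Person_2 -> Person_1.
has_parent = {
    "Person_3": ["Person_2"],
    "Person_2": ["Person_1"],
}
print(sorted(ancestors("Person_3", has_parent)))
```

Only the two direct assertions are stored, yet both Person_2 and Person_1 are returned for Person_3.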

Property hierarchies are also supported by the mapping. With the previous example, we can define a hasFather property as a sub-property of the hasParent property. The corresponding SQL statement would be:

    SELECT person.idPerson, person.lastNamePerson, person.firstNamePerson
    FROM person, gender
    WHERE person.idGender = gender.idGender
      AND valueGender LIKE 'male';


3.2 Ontology enrichment

The performance of the enrichment phase benefits from the storage of SQL queries in the mapping file. The query elements (attributes in the SELECT clause, table names in the FROM clause, etc.) are not decomposed in the XML file, thus no SQL query generation is necessary. A simple parsing of the mapping file enables query execution in the database.

The SQL statements in the DBOM mapping file support the creation of class instances and assign values to instance properties. In the previous Person concept example, each dbom:value element specifies the position, in the SELECT list of the corresponding SQL query, of the field which is bound to the enclosing element. In this example, the idPerson column (first in the list) maps to the rdf:ID of a Person individual. The following person table tuple:

    (101, 'Cure', 'Olivier', 77500, 1)

will generate the following individual:

    <person rdf:ID="person_101">
      <rdfs:label>Cure Olivier</rdfs:label>
    </person>

The enrichment implementation is considerably facilitated by the use of the Jena API, which proposes classes and methods to generate OWL Lite and OWL DL ontologies in various formats (N3, N-TRIPLES, RDF/XML). The order in which the Jena methods are called is important: first the TBox creation, with datatype properties, classes and object properties, then the ABox creation, with concept instantiations and instance property assignments.

3.3 Ontology maintenance

The maintenance phase is concerned with (non-schema) updates of the database which also need to be applied to the ontology. The enrichment phase is usually performed once for an ontology schema while maintenance may be processed several times.

The ontology maintenance mechanism is centered on the fact that database updates may imply ontology updates. Whether they do depends on the presence or absence of the updated attribute in the ontology. The implementation relies on SQL triggers and on Java methods external to the database.

Triggers offer a facility to autonomously react to database events by evaluating a data-dependent condition and by executing an action whenever the condition is satisfied. The DBOM framework makes extensive use of the Event-Condition-Action (ECA) trigger model. The maintenance module requires triggers for all three events (Insert, Update and Delete) for all relations involved in the ontology mapping. With large ontologies, this may involve the creation of a large number of triggers. It is known that the design of triggers by hand is a difficult and error-prone task. DBOM proposes to generate all triggers automatically, given the assumption that the database schema assigns ON DELETE and ON UPDATE clauses for all relations when required by primary/foreign key binding. This solution offers the opportunity to fire triggers, and maintain the ontology, using a cascading approach. Figure 2 presents the general policy for the generation of ontology concept triggers:

    for each concrete class definition in the mapping file {
        generate an INSERT trigger for the involved relation, with a restriction in the WHEN clause if required.
        generate a DELETE trigger for the involved relation, with a restriction in the WHEN clause if required.
        for each attribute in the SELECT clause {
            generate an UPDATE trigger for that attribute for the involved relation, with a restriction in the WHEN clause if required.
        }
    }

Fig. 2 Algorithm for concept triggers.
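The policy of Figure 2 can be sketched as a small Python generator of trigger definitions. Several points are assumptions made for illustration: the SQLite trigger dialect, the trigger naming scheme, the {row} placeholder used to write a WHEN restriction once for both NEW and OLD rows, and the dbom_notify function standing in for the external action.

```python
def concept_triggers(class_name, relation, attributes, when=None):
    """Return CREATE TRIGGER statements for one concrete class mapping.

    attributes lists the columns of the SELECT clause; when is an
    optional restriction in which {row} is replaced by NEW or OLD.
    dbom_notify is a hypothetical function standing in for the action.
    """
    def trigger(event, suffix, row, column=None):
        target = "UPDATE OF " + column if column else event
        condition = " WHEN " + when.replace("{row}", row) if when else ""
        return (
            "CREATE TRIGGER dbom_%s_%s AFTER %s ON %s FOR EACH ROW%s "
            "BEGIN SELECT dbom_notify('%s', '%s'); END;"
            % (class_name, suffix, target, relation, condition,
               class_name, event)
        )

    statements = [
        trigger("INSERT", "ins", "NEW"),
        trigger("DELETE", "del", "OLD"),
    ]
    # One UPDATE trigger per attribute of the SELECT clause.
    for column in attributes:
        statements.append(trigger("UPDATE", "upd_" + column, "NEW", column))
    return statements


for statement in concept_triggers(
    "Man", "person", ["lastNamePerson", "firstNamePerson"],
    when="{row}.idGender = 1",
):
    print(statement)
```

A class mapped from a SELECT clause with n attributes yields n + 2 triggers, which shows why writing them by hand quickly becomes impractical for large mappings.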

We omit the trigger generation algorithms related to properties, which follow straightforwardly from the policy given in Figure 2.

All the generated triggers are characterized by being:

– fired after the occurrence of events.

– executed at the row level, meaning that the rule is triggered separately for each tuple.

– related to actions which are automatically executed by external generic programs written in Java.
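These three characteristics can be observed with an in-memory SQLite database, used here only as a convenient stand-in for the RDBMS. In this sketch a Python callback registered under the hypothetical name dbom_notify plays the role of the external Java program; the trigger names and the shape of the recorded events are also assumptions.

```python
import sqlite3

# Events that would be forwarded to the ontology maintenance module.
events = []


def dbom_notify(class_name, event, key):
    """Stand-in for the external program invoked by each trigger."""
    events.append((event, "%s_%s" % (class_name, key)))
    return 0


conn = sqlite3.connect(":memory:")
conn.create_function("dbom_notify", 3, dbom_notify)
conn.executescript("""
CREATE TABLE person (idPerson INTEGER PRIMARY KEY, lastNamePerson TEXT);

CREATE TRIGGER person_ins AFTER INSERT ON person FOR EACH ROW
BEGIN SELECT dbom_notify('Person', 'INSERT', NEW.idPerson); END;

CREATE TRIGGER person_upd AFTER UPDATE OF lastNamePerson ON person FOR EACH ROW
BEGIN SELECT dbom_notify('Person', 'UPDATE', NEW.idPerson); END;

CREATE TRIGGER person_del AFTER DELETE ON person FOR EACH ROW
BEGIN SELECT dbom_notify('Person', 'DELETE', OLD.idPerson); END;
""")

conn.execute("INSERT INTO person VALUES (101, 'Cure')")
conn.execute("INSERT INTO person VALUES (102, 'Dupont')")
conn.execute("UPDATE person SET lastNamePerson = 'Durand' WHERE idPerson = 101")
conn.execute("DELETE FROM person WHERE idPerson = 102")
print(events)
```

Each statement touches one row, so each fires its trigger once, after the event, and the callback receives the identifier of the affected individual.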

The generation of triggers involves XML parsing and pattern matching operations on the SQL queries. Performance of this task could be improved by decomposing the SQL query elements, but at the price of the usability and readability of the mapping file (see sections 3.1 and 3.2). The next version of DBOM will be provided with a QBE-like (Query By Example) solution for the creation of the mapping file. This approach should offer the best of both worlds: fast access to decomposed query elements and usability.

3.4 Database maintenance

The last component of DBOM is related to database maintenance, which is ensured by checking the consistency of an ABox w.r.t. a TBox. This database maintenance is executed after an effective ontology maintenance, i.e. one in which at least one trigger has been fired. The objective of this maintenance is to ensure that new instances and property values are consistent with the semantics of the ontology, something the DBMS cannot check due to its lack of detailed semantics.

DBOM uses Racer [9] because it offers the following characteristics:

– it is the only publicly available DL system that supports full ABox reasoning for an expressive DL,

– it enables a Java application to communicate with a DL reasoner via the DIG interface [1], thus proposing an HTTP-based standard for connecting our client program.

The current philosophy of the database maintenance is not to act directly on the database. Thus the approach adopted is to propose a log file, in an XML format, to the end-user. This log file presents the inconsistencies detected by Racer, providing the following information: the date of detection of the inconsistency and the identifier of the inconsistent individual.
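A log of this kind could be produced as in the following Python sketch. The paper does not specify the log format, so the element and attribute names (inconsistencies, inconsistency, date, individual) are hypothetical, as is the sample identifier Drug_42.

```python
import xml.etree.ElementTree as ET
from datetime import date


def log_inconsistencies(individual_ids, detected_on):
    """Serialize detected inconsistencies as a small XML document.

    Element and attribute names are assumptions: the paper only states
    that the log records a detection date and an individual identifier.
    """
    root = ET.Element("inconsistencies")
    for identifier in individual_ids:
        entry = ET.SubElement(root, "inconsistency")
        entry.set("date", detected_on.isoformat())
        entry.set("individual", identifier)
    return ET.tostring(root, encoding="unicode")


xml_text = log_inconsistencies(["Drug_42"], date(2004, 6, 1))
print(xml_text)
```

Since the identifier follows the className_xxx rule, the administrator can go straight from a log entry to the offending tuple in the database.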

4. THE XIMSA EXAMPLE

The main features (design, enrichment and maintenance) of the DBOM framework are used in the XIMSA system [4]. The XIMSA web application aims at providing information and services to the general public on mild clinical signs and related treatments and medications. XIMSA is supported by Semantic Web technologies such as an ontology written in OWL DL. The auto-medication ontology has been designed using DBOM based on a drug database which contains all drugs available in France. For each drug, the database gathers all the data of the Summary of Product Characteristics (SPC) plus extra information such as opinions from health care professionals and a drug rating.

The design of the ontology is the result of a collaboration with the clinical pharmacology department at the Cochin hospital in Paris. DBOM has proved to be valuable during this collaboration because:

– it helps all actors of the design (health care professionals and ontology designer) to concentrate solely on the conceptual tasks,

– the ontology becomes a concrete, useful entity, instantiated with thousands of objects, within seconds after the conceptualization.

At the time of writing this paper, the XIMSA ontology incorporates concepts and properties concerned with the field of drug auto-medication, specialized in detecting drug contra-indications and patient allergies. Future versions of XIMSA will require extending the ontology with new concepts and properties. DBOM functionalities will prove useful during this extension phase because additional features are defined at the design level and are automatically processed at the enrichment and maintenance levels.

The integration of the ontology in XIMSA enables the patient to check drug prescriptions against his Simplified Electronic Health Record (SEHR). The SEHR is created and maintained within the web-based XIMSA interface and enables the patient to store general (name, gender, date of birth, etc.) and health-related (clinical and drug interactions, drugs being prescribed, etc.) information. An accurate and up-to-date SEHR assists the system in providing safe drug prescriptions to a specific patient. A rule-based mechanism handles the ontology, the SEHR and a particular request of the end-user (for example requiring an antitussive drug) to propose hyperlinks to drugs that are safe to prescribe. The result of the inference mechanism provides identifiers of eligible ontology instances. Due to the correspondence between the ontology instances and the database tuples, the end-user can click on a hyperlink and obtain all the information stored in the database about this drug, including information not contained in the ontology (e.g. drug price).
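The bridge from an inferred instance back to its database tuple can be sketched as follows in Python. The drug table, its columns and the sample row are hypothetical, since the paper does not publish the XIMSA schema, and the sketch assumes a single integer primary key; table and column names are expected to come from the mapping file, never from end-user input.

```python
import sqlite3


def tuple_for_individual(conn, individual, table, key_column):
    """Fetch the database tuple behind an identifier such as Drug_7.

    Assumes a single integer primary key placed after the last
    underscore. table and key_column come from the mapping file.
    """
    _, _, key = individual.rpartition("_")
    cursor = conn.execute(
        "SELECT * FROM %s WHERE %s = ?" % (table, key_column), (int(key),)
    )
    row = cursor.fetchone()
    return dict(row) if row is not None else None


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become accessible by column name
# Hypothetical schema and data, for illustration only.
conn.execute(
    "CREATE TABLE drug (idDrug INTEGER PRIMARY KEY, nameDrug TEXT, priceDrug REAL)"
)
conn.execute("INSERT INTO drug VALUES (7, 'ExampleTussin', 4.5)")

row = tuple_for_individual(conn, "Drug_7", "drug", "idDrug")
print(row)
```

The price column is returned even though it is absent from the ontology, which is the kind of complementary information the bridge is meant to expose.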

An interesting fact about the XIMSA system is the omnipresence of the drug database. Retrieval queries are involved in the SEHR creation and maintenance as well as during the drug proposition. A secured administrator page is also provided to maintain the database. This part of the application is bound to transaction, concurrency control and crash recovery issues. Such a system gains considerably, in terms of functionality and usability, from the database / ontology separation and collaboration.

Finally, updates of the ontology trigger a consistency check. Let us consider the addition of a new drug in the database where the therapeutic class is not consistent with the Recommended International Non-proprietary Name (RINN, the active molecule) of the drug. Although the statement recorded in the database is valid, its semantics is wrong. The consistency checking of the ontology will report, in a log file, that an inconsistent statement has been added to the database.

5. CONCLUSION

The DBOM framework enables the design and maintenance of high-quality ontologies characterized by correctness, minimally redundant data and richly axiomatized descriptions. Correctness is provided by the capture of the intuitions of domain experts, which is facilitated by a collaboration with the designer focused on conceptual matters. This collaboration also benefits from an abstraction of the knowledge representation language and from fast access to a realistic, richly instantiated ABox. Minimal redundancy is provided by the database / ontology separation, provided that the designer respects the DBOM philosophy: relations and attributes not concerned with inference mechanisms should not appear in the ontology and should stay solely in the database. Rich axiomatization implies that descriptions are “sufficiently” detailed. DBOM satisfies this requirement by providing the ability to add restrictions during concept and property construction.

These qualities, together with the ability to serve as a fast prototyping tool for knowledge bases, suggest that DBOM can be useful for the development of Semantic Web applications. XIMSA remains a valuable experience in the development of such elementary yet useful applications. The dual, synchronized maintenance of the database and the ontology highlights the importance of such a feature in a knowledge-based environment.

Although the XIMSA auto-medication ontology contains more than 6000 drug products, 1500 RINN and 500 therapeutic classes, we believe that studies with larger ontologies, meaning larger TBoxes and ABoxes, need to be conducted. Performance surveys should also be conducted with such ontologies.

The DBOM framework also requires the implementation of a graphical QBE-like solution for the design of the database-to-ontology mapping file.

6. REFERENCES

[1] Bechhofer, S.: The DIG description logic interface: DIG/1.1. In Proceedings of the 2003 Description Logic Workshop (DL 2003), 2003.

[2] Bizer, C.: D2R MAP - A Database to RDF Mapping Language. The Twelfth International World Wide Web Conference (WWW2003), Budapest, Hungary, May 2003. Available online at: http://www.bizer.de/.

[3] Borgida, A.; Lenzerini, M.; Rosati, R.: Description logics for data bases. Chapter 16 in The description logic handbook, edited by Baader, F.; Calvanese, D.; McGuinness, D.; Nardi, D.; Patel-Schneider, P. Cambridge University Press.

[4] Cure, O.: XIMSA: eXtended Interactive Multimedia System for Auto-medication. To appear in IEEE CBMS 2004 (Computer-Based Medical Systems). Bethesda, USA. June 2004.

[5] DBOM: Data Base Ontology Mapping home page at: http://www.univ-mlv.fr/~ocure/dbom.html/.

[6] Gennari, J.; Musen, M.; Fergerson, R.; Grosso, W.; Crubezy, M.; Eriksson, H.; Noy, N.; Tu, S.: The evolution of Protégé: an environment for knowledge-based systems development. International Journal of Human-Computer Studies, 58:89-123, 2003.

[7] Gruber, T.: A translation approach to portable ontology specifications. Knowledge Acquisition. Vol. 5. 1993. 199-220.

[8] Guarino, N.: “Formal Ontology and Information Systems”. In the Proceedings of FOIS 1998. Also in Frontiers in Artificial Intelligence and Applications, IOS Press, Washington, DC, 1998.

[9] Haarslev, V.; Möller, R.: Racer: A Core Inference Engine for the Semantic Web. In the proceedings of EON 2003 (Evaluation of Ontology-based Tools). Proceedings available online at http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-87/.

[10] Jena framework home page: http://jena.sourceforge.net/

[11] Karlsruhe Ontology project web page at: http://km.aifb.uni-karlsruhe.de/kaon2/frontpage.

[12] SWAD-Europe Deliverable 10.2: Mapping Semantic Web Data with RDBMSes. Available online at: http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/.
