ontology · pdf fileontology-basedstructuredwebdatawarehousesforsustainable interoperability:...

27
Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool Selma Khouri ab , Ilyes Boukhari a , Ladjel Bellatreche a , Eric Sardet c , St´ ephane Jean a , Michael Baron a a LISI/ENSMA - Poitiers University Futuroscope - France b National High School for Computer Science (ESI), Algiers - Algeria c CRITT Informatique, Futuroscope - France The spectacular growth of the Internet and its widespread adoption by worldwide corporations lead to an enormous quantity of heterogeneous, distributed and autonomous data sources. To facilitate the access to these huge amounts of data and make these sources interoperable, two technologies may be combined: data warehousing and ontologies. Data warehouses are designed to aggregate data and allow decision makers in these companies to obtain accurate, complete and up to date information. In the past decade, data warehouse technology (DWT) has been successfully applied in several domains such as telecommunication, retail, finance and many other industries. It supports a wide range of applications throughout the enterprise. The DWT has been largely used to offer sustainable solutions for enterprises. On the other hand, ontologies are models for specifying the semantics of concepts used by various heterogenous sources in a well defined and unambiguous way. Ontologies exist in various domains (E-commerce, Engineering, Tourism, etc.) and are used to increase interoperability between sources. They may be used to improve communication between decision makers and users collaborating together, by specifying the semantics of the used concepts. In this paper, we propose a methodology for designing data warehousing applications from various sources. Each source has its local ontology referencing a global one. For satisfying its local requirements and giving to sources more autonomy, each source may specialize/extend the global ontology. The presence of ontologies has three main contributions: (i) each owner of each source may use it to define his/her requirements, (ii) it reduces most important types of conflicts that may exist in sources and requirements (schematic and semantic) and (iii) it facilitates the sustainable urbanisation of the target data warehouse. Our methodology is supported by a case tool facilitating the tasks of data warehouse designers. Keywords: Structured Web Data, Integration Systems, Ontology-based Databases, Requirement Engineer- ing, Case Tool, OWL formalism, PLIB formalism. 1. Introduction The rapid development of Information Technol- ogy solutions enforces the theory stating that the World is a Global Village. This theory breaks ge- ographical barriers and distances between com- panies and organisations. Faced to this globaliza- tion phenomenon, enterprises have had to adapt their traditional behaviours to reach high level of competitiveness. To achieve this goal, enter- prises need to share data and knowledge beyond their own boundaries by the mean of advanced Web technologies. New types of applications are emerging such as: e-tailing, e-government, e- learning following the concept of e-entreprise, on- line services like finance, publishing and mar- keting, all managed by web-based enterprises. We also face an outburst of start-ups and spin off that are blooming every day, which need to integrate existing web data into their own in- formation systems. Additionally, recent Google research projects demonstrated that increasing quantities of these web data are structured. The prime example of such data is the deep web, re- ferring to content on the web that is stored in databases and served by querying HTML forms (14). More recent examples of structure are a va- 1

Upload: hoangliem

Post on 26-Mar-2018

219 views

Category:

Documents


1 download

TRANSCRIPT

Ontology-based structured web data warehouses for sustainable

interoperability: requirement modeling, design methodology and tool

Selma Khouri ab, Ilyes Boukhari a, Ladjel Bellatreche a, Eric Sardet c, Stephane Jean a, Michael Baron a

aLISI/ENSMA - Poitiers UniversityFuturoscope - France

bNational High School for Computer Science (ESI), Algiers - Algeria

cCRITT Informatique, Futuroscope - France

The spectacular growth of the Internet and its widespread adoption by worldwide corporations lead to anenormous quantity of heterogeneous, distributed and autonomous data sources. To facilitate the access to thesehuge amounts of data and make these sources interoperable, two technologies may be combined: data warehousingand ontologies. Data warehouses are designed to aggregate data and allow decision makers in these companies toobtain accurate, complete and up to date information. In the past decade, data warehouse technology (DWT)has been successfully applied in several domains such as telecommunication, retail, finance and many otherindustries. It supports a wide range of applications throughout the enterprise. The DWT has been largelyused to offer sustainable solutions for enterprises. On the other hand, ontologies are models for specifying thesemantics of concepts used by various heterogenous sources in a well defined and unambiguous way. Ontologiesexist in various domains (E-commerce, Engineering, Tourism, etc.) and are used to increase interoperabilitybetween sources. They may be used to improve communication between decision makers and users collaboratingtogether, by specifying the semantics of the used concepts. In this paper, we propose a methodology for designingdata warehousing applications from various sources. Each source has its local ontology referencing a global one.For satisfying its local requirements and giving to sources more autonomy, each source may specialize/extendthe global ontology. The presence of ontologies has three main contributions: (i) each owner of each source mayuse it to define his/her requirements, (ii) it reduces most important types of conflicts that may exist in sourcesand requirements (schematic and semantic) and (iii) it facilitates the sustainable urbanisation of the target datawarehouse. Our methodology is supported by a case tool facilitating the tasks of data warehouse designers.

Keywords: Structured Web Data, Integration Systems, Ontology-based Databases, Requirement Engineer-ing, Case Tool, OWL formalism, PLIB formalism.

1. Introduction

The rapid development of Information Technol-ogy solutions enforces the theory stating that theWorld is a Global Village. This theory breaks ge-ographical barriers and distances between com-panies and organisations. Faced to this globaliza-tion phenomenon, enterprises have had to adapttheir traditional behaviours to reach high levelof competitiveness. To achieve this goal, enter-prises need to share data and knowledge beyondtheir own boundaries by the mean of advancedWeb technologies. New types of applications areemerging such as: e-tailing, e-government, e-

learning following the concept of e-entreprise, on-line services like finance, publishing and mar-keting, all managed by web-based enterprises.We also face an outburst of start-ups and spinoff that are blooming every day, which need tointegrate existing web data into their own in-formation systems. Additionally, recent Googleresearch projects demonstrated that increasingquantities of these web data are structured. Theprime example of such data is the deep web, re-ferring to content on the web that is stored indatabases and served by querying HTML forms(14). More recent examples of structure are a va-

1

2

riety of annotation schemes (e.g., Flickr, the ESPgame, Google Co-op) that enable people to addlabels to content on the web, and Google Base aservice that allows users to load structured datafrom any domain they desire into a central repos-itory (14). In such applications, web data arestored within structural databases. Sharing thesemountains of web data efficiently between differ-ent companies collaborating together is a crucialissue that needs to be addressed. Note that thesedata are particularly heterogeneous, distributed,and autonomous and evolving in a dynamic envi-ronment.

Data integration is one of the relevant solutionsthat offers these functionalities to companies. Itcan be viewed as a process by which several het-erogeneous sources are consolidated into a sin-gle data source associated with a global schema.It recently received a great attention from aca-demic and industrial communities. This is dueto many data management applications for ex-ample peer-to-peer applications, data warehouses(DW), E-commerce, and Web services. In a de-cisional context, DW integration systems are themost suitable solutions offering various analysisand visualization tools used within a decision-making process. A DW can be seen as an inte-gration system, where relevant data of varioussources are extracted, transformed and material-ized in a warehouse (18) (Figure 3). A DW hasadditionally a multidimensional layer organizingits data into central subjects to analyze (Factconcept) according to different analysis perspec-tives (Dimension concepts). Dimensions can alsobe organized into hierarchies representing differ-ent levels of granularity. This multidimensionalaspect enables various OLAP (Online Analyti-cal Processing) analyses for decision makers, thattraditional integration systems do not. DWs havebeen successfully applied in various applicationsfields like telecommunications1, health care (62),and by different companies such as ContinentalAirlines, Wal-Mart, Toyota and Hewlett-Packard(43).

The main task in data integration is the iden-tification of syntactic and semantic conflicts be-

1Teradata company: http://www.teradata.com

Figure 1. Types of semantics conflicts.

tween heterogeneous data. Different categories ofconflicts may be encountered at the schema leveland at the data level and should be solved. Gohet al. (32) suggest the following taxonomy (Figure1): naming conflicts, scaling conflicts, confound-ing conflicts and representation conflicts. For ex-ample a scaling conflict occurs when the price ofa product is given in Dollar for a company in theUnited States or in Euro for a company in Eu-rope.The lack of a generally agreed terminology fol-

lowed by enterprises collaborating together hasbeen recognized as the bottleneck for efficientintegration (19). An effective approach to dealwith integration and interoperability is the useof common standards. Many integration applica-tions used domain ontologies as a reference model,where data reconciliation and mapping is ob-tained according to concepts of the domain ontol-ogy. Several ontology-based integration systemswere proposed (16; 32; 45). Based on the way howontologies are employed in these systems, (65),three different architectures are distinguished(Figure 2): (i) single ontology methods relatingeach source to the same global domain ontology,which limits considerably source schematic auton-omy, (ii) multiple ontologies methods where eachsource has its own ontology developed withoutrespect to other sources, which requires difficulttasks of inter-ontology mapping and (iii) hybridmethods where each source has its own ontology,but all ontologies are connected by some means

3

to a common shared vocabulary. The main as-sumption of these systems is that all the sourcesuse the same shared ontology. This assumption isrelaxed by the emergence of many dedicated ref-erence models covering many industrial areas. Wecan cite global standards and models developed intourism industry that are used by big companieslike Open Travel Alliance (www.opentravel.org)and Hospitality Industry Technology IntegrationStandards (www.hitis.org), or the standard ISO10303 for product information in industrial envi-ronments, etc. Those ontologies have been exten-sively used for traditional Web applications (Webservices, pervasive computing, etc.) and Web 2.0,like ontologies FOAF2 and SIOC3 used for so-cial networks domain. Ontologies are specified us-ing different formalisms developed during last fif-teen years like languages of the semantic Web(Daml+Oil, RDF, RDFS and OWL) for web ap-plications, and PLIB (Parts Library) (54) for-malism used for engineering applications. OWL iscurrently accepted as W3C recommendation forpublishing ontologies on the Web. The PLIB for-malism defines a domain-oriented (for engineeringproducts) ontology model that has been devel-oped during the 90’s at the ISO level. The resultis a standard series, ISO 13584.

�� ������������ ������������� ���������������������� ���� ������������� ���������������������� ���� �����������

�� �� ���� ������ ������� �������� ������ ��������Figure 2. Different Ontology Architectures

The outburst of ontologies in various domainsand their use by different companies leads to

2http://www.foaf-project.org3http://sioc-project.org/

the creation of important amounts of web datareferencing ontologies. These data are calledOntology-Based Data (OBD). Some solutionsproposed to manage OBD in main memory likeJena (50) and OWLIM (44). Such systems al-low fast loading and fast update of data but be-come very fastidious when data grows and a bigamount of semantic data is available. To over-come this problem, database solutions have beenproposed offering efficient storage and queryingmechanisms for ontological data. The generateddatabases are called ontology-based databases(OBDB). OBDB are databases storing both on-tology and data in the same database schema (fig-ure 8).

OBDBs kept the attention of both industrial(like Oracle and IBM) and academic (OntoDB)communities, where different OBDBs were pro-posed like Rdfsuite (4), OntoDB (22), Sesame(12) and Oracle (21). Those OBDBs rapidly be-come candidate sources for the integration pro-cess. OBDB are suitable solutions for managingOBD for OLTP (OnLine Transactional Process-ing) applications, but not for OLAP applications.Following the same idea of OBDB, Ontology-Based Data Warehouses (ODBW) are adequatesolutions for analyzing those ontology-based data.Our architecture scenario is in this case similarto hybrid integration methods of Figure 2 intro-duced previously. Furthermore, some recent re-search efforts in DW design proposed ontology-based methods and frameworks for the definitionof the DW model. This is due to the role of do-main ontologies in clarifying semantics of sourcesduring the integration process, and their strongsimilarities with conceptual models (26). Ontolo-gies can actually be seen as conceptual models ofa whole domain via a logical theory offering rea-soning capabilities that conceptual models do nothave.

Additionally to sources, user requirements areactually recognized as an important componentfor DW design. Some design methods followed adata-driven approach generating the DW modelonly from data sources. Different studies iden-tified limitations of this approach and proposedrequirements-driven or mixed design methods. Itis currently recognized that DW life cycle includes

4

Figure 3. Traditional interoperability scenario by DWs.

a first phase of requirements analysis (35). Wenoticed furthermore that users’ requirements arenot only used during the design phase, but arealso used during different exploitation phases ofDW life cycle:

• requirements are saved a posteriori as users’queries in most DBMS (like Oracle or SqlServer) for optimization purposes,

• user’s preferences, which are considered asnon-functional requirements, are exploitedfor personalization and recommendationprocesses like in (29),

• quality of the DW can be measured accord-ing to its capacity to fulfil the different re-quirements and goals like in (64). Require-ments are also traced in order to managethe evolution of the decisional application.

All these works, suppose the existence and avail-ability of requirements. However, most of integra-tion systems proposed (virtual and materialized)do not care sufficiently about user’s requirementsand do not give them their right value within thesystem developed.

We propose in our approach to specify user’srequirements at the ontological level, and also tostore these requirements in the DW repository.This is particularly important when we know thatrequirements analysis step is a tedious task re-quiring much time and effort from users and from

designers. One cannot afford to redo this costlytask more than once (Figure 4). Actually, the fu-sion between the two domains (Requirement En-gineering and ontologies) aroused the interest ofresearch community since the 80s where ontolo-gies showed their effectiveness for requirementsspecification, their unification, their formalizationand for reasoning about requirements. Ontologiesallow requirements engineers to analyze a require-ments specification with respect to the semanticsof the application domain. They are useful for de-tecting incompleteness and inconsistency as satedin IEEE requirements specification (1). More con-cretely and in the same way as for data sources,ontologies are used to identify and manage se-mantic conflicts between requirements formulatedby different users that do no share the same vo-cabulary.Another dimension that has to be considered

for integration systems is the complex dynam-icity of the environment they evolve in, contin-ually shaken by new requirements and businessneeds. This dimension is called sustainable inter-operability, which goal is to make interoperabilitybetween companies easily adaptable to environ-ment changes. Changes can be of different forms:changes occurring in sources, in business require-ments or in the ontology. Changes can occur attwo levels: data level and schema level. An ef-fective evolution policy must be able to predictand limit certain changes during the design phase

5

of the integration system, and should be able toadapt the system according to additional changesoccurring once the system is developed.We propose in this paper a methodology for

designing OBDWs using domain ontologies. Thepresence of sources and requirements specified atthe ontological level is used: (i) during the in-tegration process to eliminate semantic conflictsbetween sources and requirements, (ii) during de-sign phase to specify the different deign models(conceptual and logical) and (iii) to forecast theevolution of the system once developed.

1.1. Contributions

The main contributions of this paper are:

1. A new classification of DW design methods,showing their convergence to ontology-baseconceptualisation.

2. A methodology for designing DWs fromOBDB and users’s requirements supportedby a case tool.

3. A new structure for representing both dataand requirements and their respective se-mantics in the OBDW.

4. A discussion about different DW evolutionscenarios, and how our defined OBDW canprevent environment changes.

1.2. Organization of the Paper

The rest of this paper is organized as follows.Section 2 presents the case study that will be usedas an illustrating example that we consider alongthe paper. Section 3 presents the background re-quired for facilitating the understanding of ourapproach. Section 4 presents the related work,where a classification of the different DW designmethods is proposed. Section 5 presents our DWdesign methodology comprising a description ofa proposed requirements model, the design pro-cess used to make requirements persistent withinthe DW model, and the definition of the OBDW.Section 6 describes a tool implementing the de-sign methodology proposed. Section 7 discussesdifferent evolution scenarios of the developed sys-tem. Section 8 concludes the paper and suggestssome future issues.

2. Case study

The ontology depicted in figure 5, that willbe used as example along the paper, refers to therental car ontology4, representing cars reservationdomain. Our scenario is that an online companyis managed in a given region but have Branchesbased in different countries. Customers of thiscompany from a given Country can rent a Carof a given CarModel. Once the request validated,a Rental agreement is made. The Reservation ofthe car has a certain Duration and can be Can-celled according to certain conditions.

Decision makers of this company aim at per-forming OLAP analysis related to sales of agree-ments. For this reason, they specify some deci-sional requirements such as:

• Pricing Summary Report provides a reportof all reservations made at a given reserva-tion date. The report lists totals of best-Price, basicPrice and rate of guaranteed.Note that the rate is equal to the sum [ba-sicPrice - bestPrice].

• This requirement aims at retrieving the 10reservations with the highest value for allcustomers in a given region.

• This requirement aims at retrieving the 5branches of all regions with highest numberof customers having cancelled their reserva-tion.

• This requirement consists in finding the av-erage price of reservations of a given carmodel.

3. Background

In this section, we present in details four con-cepts that we consider important for our work:Ontologies, OBDBs, Materialized integration sys-tems (DW) and Requirements. We will explaineach concept in the following sections.

3.1. C1: Ontologies

Ontologies have been defined for various do-mains for a wide range of applications. As a con-

4www.lsi.upc.edu/oromero/EUCarRental.owl

6

Figure 4. New interoperability scenario by DWs.

sequence, existing ontologies are not all alike. Inorder to define precisely the type of ontologiesneeded in our work, we propose the following tax-onomy of ontologies.

3.1.1. Taxonomy: the onion model

Two main categories of ontologies haveemerged in the literature (55). Conceptual On-tologies (CO) represent object categories andproperties that exist in a given domain whereasLinguistic Ontologies (LO) represent the termsused in a given domain eventually in differentnatural languages. However existing COs presentquite different characteristics according to the on-tology model used to define them and to the ap-plications they are designed for. As a consequencewe have divided this category into two subcate-gories. This new classification is based on the no-tions of primitive and defined concepts given byGruber (36). Primitive concepts define the bor-der of the domain conceptualized by the ontology.Each concept in the domain is represented in aunique way by a primitive concept. Primitive con-cepts are the foundations on which new conceptscalled defined concepts can be introduced. Thesedefined concepts are specified by a complete ax-

iomatic definition expressed in terms of otherconcepts (primitive or defined). For example theconcept Car-Reservation is defined as a type ofRental-Agreement concept. Based on this distinc-tion between primitive and defined concepts, ourontology taxonomy is composed of three cate-gories:

• Canonical Conceptual Ontologies (CCO)represent all the concepts in a given domainin a single way without any redundancy.Thus, CCOs only include primitive con-cepts. An example of a CCO for electroniccomponents can be found in (IEC61360-4,1999). Defining a canonical (non redun-dant) vocabulary in a given domain is par-ticularly useful for data exchange. For ex-ample, in the STEP project, exchange mod-els are defined in the EXPRESS language asa STEP Application Protocol (AP). Thesecanonical exchange models are used by in-dustrial users to exchange product descrip-tions between different organizations. Thus,CCOs can also be used as exchange models.PLIB projet (54), which is the continuity ofSTEP project, aims at providing a canoni-

7

Figure 5. Rental car ontology model.

cal representation of concepts of a given do-main. In that representation context, prop-erties play a key role. First, classes are onlyspecified when they are required to define aproperty domain. Secondly, each propertyis defined in the domain of a class, and ismeaningful only for that class and its pos-sible subclasses.

• Non Canonical Conceptual Ontologies(NCCO) include not only primitive con-cepts but also defined concepts. Using de-fined concepts, different reasoning taskscan be performed on the ontology (e.g.,classification of classes or instances). More-over, defined concepts can also be used by auser to query and manipulate the ontology.NCCO are supported by different languagesof the semantic web like RDF and OWL.

• Linguistic Ontologies (LO) define the termsthat appear in a given domain. LOs in-clude relationships between terms such assynonymous-of or homonymous-of. Theyare useful for system-user communicationas well as providing the terms used in agiven domain in different natural languages.

Wordnet5 is a well-known example of suchontologies.

Currently ontology design focuses on one ofthese three categories, the canonical, non canon-ical or linguistic part. However, as we have seenpreviously, these three categories of ontologies arecomplementary. The model, presented in Figure6, called the onion model (37), illustrates howthese three categories of ontologies can be com-bined. The CCO layer provides the foundation torepresent and exchange the knowledge of a do-main. It is extended by a NCCO layer to mapdifferent conceptualization made on this domain.Finally the LO layer is the natural language rep-resentation of all the concepts defined in the CCOand NCCO layers.

Ontologies that we use in our work follow thismodel. In next sections we will see how we canbenefit from the different layers of an ontologythat follows this model.

3.1.2. Formal Definition

We focus in our study on conceptual ontologies,whose schema can be formally defined as follows:

5http://www.cogsci.princeton.edu/wn

8

Figure 6. The onion model of domain ontologies.

(9): O :< C,P,Sub,Applic >:

• C is the set of classes describing the conceptsof a given domain.

• P is the set of properties describing the in-stances of C.

• Sub : C → 2C is the subsumption function,which associates each class Ci to its directsubsumed classes. Two subsumption rela-tionships are introduced in our framework:(i) OOSub: describing the usual subsump-tion of inheritance relationship, where thewhole set of applicable properties is inher-ited. (ii) OntoSub: describing a subsump-tion relationship without inheritance, wherea part of the whole applicable propertiesmay be imported from a subsuming class tothe subsumed one (Figure 7). OntoSub isthe formal operator of modularity. OntoSubis formalized by ’case-of’ relationships inPLIB formalism. Similar mechanisms existin OWL ontologies. (66) gives an overviewof different modularity methods defined forOWL ontologies.

• Applic : C → 2P is a function that asso-ciates to each ontology class, the properties

that are applicable for each instance of thisclass.

3.1.3. Ontologies VS Conceptual models

Conceptual ontologies can be seen as the con-tinuity of conceptual models describing a wholedomain. Ontologies have actually been used fordesigning various database applications. The firstworks proposed to use ontologies in the de-sign of traditional DBs. (63) proposed a methodsimilar to Chen methodology (20) for design-ing databases by adding a semantic layer andproposed to assist designers to define concep-tual models using linguistic ontologies. (24) pro-posed an extension of the ANSI/X3/SPARC ar-chitecture giving ontologies their place during de-sign process of database scheme. (8) proposed amethodology for designing OBDB covering con-ceptual and logical design phase basing on func-tional dependencies embedded in the used ontol-ogy. Ontologies are effectively similar to concep-tual models as they both define a conceptualiza-tion of the universe of discourse using a set ofclasses and properties, but ontologies have spe-cific characteristics that gives the added value:

• Modeling Objective: An ontology describesconcepts and properties of a domain inde-pendently of any objective. A conceptual

9

Figure 7. OntoSub operator.

model prescribes concepts and propertiesof a domain according to applicative objec-tives.

• Consensuality: the consensual aspect thatcharacterizes ontologies facilitates the de-signer task by offering him a global viewof the domain of interest.

• Concepts identification: data in a concep-tual model have a meaning only in the par-ticular context for which the model is de-fined. Concepts in an ontology have univer-sal identifiers and can be referred from anyother context. This eases ontology usage fordata exchange or integration.

• Reasoning : ontologies are formal models of-fering reasoning capabilities that can beused either to check consistency of infor-mation or to deduce new information fromexisting facts.

Figure 8. General architecture of OntoDB.

3.2. C2: Ontology-Based DataBase

3.2.1. Storage of ontologies in databases:

the OntoDB ontology-based

database

As we said in the introduction, several OBDBhave been proposed by industrial and academiccommunities. In this paper, we consider OBDBfollowed OntoDB architecture. This architectureis composed of 4 parts as illustrated in Fig. 8.

Parts 1 and 2 are traditional parts available inall DBMSs, namely the meta-base part that con-tains the system catalog and the data part thatcontains instance data. Parts 3 (ontology) allowsto represent ontologies in the database. The ontol-ogy model supported by this architecture is thePLIB ontology model but we will see that thepart 4 of this architecture can be used to adaptand extend this architecture to other ontologymodels such as OWL. The part 4 (meta-schema)is specific to OntoDB. It records the ontologymodel into a reflexive meta model. For the ontol-ogy part, the meta-schema part plays the samerole as the one played by the meta-base in tradi-tional databases. Indeed, this part may allow: (1)generic access to the ontology part, (2) supportof evolution of the used ontology model, and (3)storage of different ontology models.

Let us now see how the different concepts arestored in the 4 parts of OntoDB. We use the ex-ample presented in Fig. 9.

10

Figure 9. An toy example of an ontology and itsinstances

Figure 10. Storage of instance data in the datapart of OntoDB

Part 2 stores instance data following a rela-tional approach. Indeed, all instances of a givenontology class is stored in a single table. However,contrary to relational databases, all properties ofa given class may not be valued by instances ofthis class. Thus, the table used to store instancesof a class only includes columns for the propertiesthat are used at least by one instance of the class.Fig. 10 shows an example of the storage of the in-stances of the Customer and Address classes ofan ontology.

Part 3. of OntoDB stores ontologies using a re-lational schema defined according to the ontology

Class

ID Name

1 Person� ������� ! "# ���… …

Class

ID Name

1 Person� ������� ! "# ���… …

SubClassOf

ID VALUE

2 1

… …

SubClassOf

ID VALUE

2 1

… …

Property

ID Name

1 name � $%�! $# ���… …

Property

ID Name

1 name � $%�! $# ���… …

Domain

prop class

1 1� &… …

Domain

prop class

1 1� &… …

Range

prop type

1 xsd:string�xsd:integer!xsd:string

… …

Range

prop type

1 xsd:string�xsd:integer!xsd:string

… …

Figure 11. Storage of ontologies in the ontologypart of OntoDB

Figure 12. Storage of ontologies in the metaschema part of OntoDB

model supported. Fig. 11 presents the main tablesof this schema. These tables are used to store thehierarchy of classes as well as the properties ofthe ontology.Finally, part 4 stores the ontology model used

to define ontologies of part 3. Fig. 12 presents thestorage of the main components of an ontologymodel i.e., constructors of classes and properties.This schema can be extended to support a specificontology model.Managing all the data stored in OntoDB us-

ing SQL would be difficult since this language isbased on the relational model and thus does notinclude any operators for ontologies. As a con-sequence we have designed a specific languagecalled OntoQL.

3.2.2. Querying ontologies and its in-

stances: the OntoQL query language

Specific languages have been designed to queryontologies and their instances such as SPARQL6.The OntoQL language that we have proposed hasthree main characteristics that distinguish it from

6http://www.w3.org/TR/rdf-sparql-query/

11

other proposed languages: (1) the OntoQL lan-guage is independent of a given ontology model.Indeed, it is based on a core ontology models con-taining the constructors shared by different on-tology models and this core ontology model canbe extended by the OntoQL language itself, (2)the OntoQL language exploits the linguistic infor-mation that may be associated to a conceptualontology allowing to express queries in differentnatural languages and (3) the OntoQL languageis compatible with the SQL language and thus,it allows to exploit data at the logical level of anOBDB. Moreover, it extends this language to ex-ploit data at the ontological level, independentlyof the logical representation of the data, and stillto manipulate the structure of these data fromthis level.Let’s now see different OntoQL statements that

allow users to manipulate all the data of OntoDB.These statements are based on the ontology ex-ample shown in Fig. 9.Queries one instance data can be expressed. For

example the following query searches the nameand country of all the direct instances of the Per-son class (keyword ONLY).

SELECT name, address.country

FROM ONLY (Person)

As we can see, the OntoQL language has a syn-tax similar to SQL and provides operators to nav-igate through the hierarchy of classes (by default,a query searches on all direct and indirect in-stances of a class) and the composition of proper-ties through path expressions (address.country).The OntoQL language exploits the LO layer ofan ontology to allow user to express their queriesin different natural languages. For example if allthe concepts of our example of ontology are as-sociated to a term in French defined in the LOpart of the ontology, the previous query can beexpressed as follow

SELECT nom, adresse.pays

FROM ONLY (Personne)

USING LANGUAGE FR

The OntoQL language also supports query on on-tologies. For example the following query searches

the names in French and in English of all theclasses of the ontology.

SELECT #name[fr], #name[en]

FROM #class

As we can see all the elements of the ontologylevel (e.g, class or name) are prefixed by #; Thisprefix is used to distinguish query on instancesfrom query on ontologies.

Since query on ontologies and query on in-stances can be expressed, these two capabilitiescan be combined in OntoQL. The following queryexploits this capability to find all direct and indi-rect instances of the Person class showing in thesame time the name in English of the direct classthe instance belongs to.

SELECT p.name, TYPEOF(p.#name[en])

FROM Person p

The TYPEOF operator retrieves the belongingclass of an instance. Thus this query retrieves allpersons and it shows if this person is a customer.

The OntoQL language is also equipped with adefinition and manipulation language. In particu-lar, the definition language can be used to extendthe ontology model used to define ontologies. Forexample the following statement extends the on-tology model with the AllValuesFrom constructorfrom the OWL ontology model.

CREATE ENTITY #OWLAllValuesFrom UNDER #Class

(#onProperty REF(#Property),

#allValuesFrom REF(#Class))

An OWL restriction is a particular class. Thusthe new constructor #OWLAllValuesFrom ex-tends the basic constructor of class (#Class). AnOWL restriction is defined on a particular prop-erty (#onProperty) and takes its values from agiven class (#allValuesFrom). This new construc-tor is added to the meta schema part of OntoDB.

To conclude this part, we show in Fig. 13 thecomplete architecture of our software to handleontologies and their instances. OntoDB storesall the data and OntoQL provides an access atthe knowledge level to query and exchange thesedata.

12

Figure 13. OntoDB and its associated access lay-ers OntoQL and OntoML

3.3. C3: DWs seen as materialized integra-

tion systems

DWs are actually considered as integration sys-tems materialized by multidimensional conceptsfor facilitating OLAP analysis. We explain thesetwo aspects in the following sections.

3.3.1. Integration

The ontology used in our design approach isa conceptual domain ontology covering and inte-grating the set of OBDBs. Formally, a data inte-gration system is a triple I :< S,G,M >, whereS is a set of source schemas which describe thestructure of sources participating in the integra-tion process, G is the global schema which pro-vides a reconciled and an integrated schema andM is the mapping between G and S which estab-lishes the connection between the elements of theglobal schema and those of the sources (9).

In our approach, S, G and M are defined asfollows:

• S is represented by the set of OBDB con-taining their own local ontology Oi :<Ci, Pi, Subi, Applici > defined from a globalshared ontology,

• G is the global ontology (GO) formalized asGO :< Cp, Pp, Subp, Applicp >

• M is the mapping between GO and

OBDBs represented by the subsomptionrelationship OntoSubi,p : Cp → 2Ci be-tween GO et Oi that associates to each classcp of Cp the set of classes ci ∈ Ci that aresubsumed directly by cp:∀cp ∈ Cp, OntoSubi,p(cp) = {ci ∈ Ci|(cpsubsume ci) ∧ (∀c′i|ci ∈ Subi(c

′i) ⇒ c′i /∈

OntoSubi,p(cp)) ∧ (∀c′p ∈ Subp(cp) ⇒ ci /∈OntoSubi,p(c

′p))}.

Several automatic integration scenarios may bedefined in this context. We analyzed and formal-ized in (9) three scenarios corresponding to dif-ferent articulations between locale ontologies andthe global one: (i) FragmentOnto that assumesthat the shared ontology is complete enough tocover the needs of all local sources. Each localontology is a fragment of the shared ontology.(ii) ExtendOnto where the integrated ontology iscomposed of the shared ontology extended withall the local specialization of the variations of lo-cal sources. (iii) ProjOnto where each local sourcedefines its ontology by projecting its instancesonto the applicable properties of its smallest sub-suming class in the shared ontology. We focus forthe rest of the method on the global ontology GOcovering the available OBDBs.Additionally to integration methods, and to

prepare data integration process, some exchangebased formats have been proposed to facilitateand automate data exchange such OntoML. On-toML stands for ”Product Ontology Mark-upLanguage” (ISO13584-32). It has been developedin the ISO TC184/SC4 framework. It is an XMLSchema designed for use by applications thatneed to exchange and process PLIB compliantdomain ontologies, possibly together with theirrelated instances, in various Web-oriented envi-ronments. When designing OntoML, an underly-ing idea was to clearly separate the representa-tion of ontology concepts from the representationof instances. Indeed, in an engineering context,what is important is to ensure that instances ex-changed between business partners can be inter-preted without any ambiguity, whatever be theunderlying ontology language used to describethe related classes and properties (product ex-change interoperability). For that purpose, the

13

ISO 29002 series has been developed. It aimsto provide general mechanisms guaranteeing thatthe concept information can be exchanged inde-pendently from the data model of the underly-ing ontology model. Therefore, ISO 2002 provides(among other things) a neutral concept identifi-cation mechanism (ISO29002-5) and a neutral in-stance representation mechanism (ISO29002-10)(reference to the underlying class associated to aset of property reference/typed value couples).

3.3.2. Multidimensional paradigm

Multidimensional design organizes data intoFacts (subject of analysis) and Dimensions (anal-ysis perspectives). Dimensions can form hierar-chies, where each level of the hierarchy representsa level of granularity. Fact tables are composedof measures (fact attributes) and dimensions arecomposed of dimension attributes. Different stud-ies proposed models following the multidimen-sional paradigm. After studying different multidi-mensional normal forms proposed, we retain thefollowing multidimensional constraints:

• Data summarization: data summarizationguarantees a correct aggregation of data(46) by means of three necessary conditionsagreed by most multidimensional models:(i) Disjointness : meaning that the sets ofobjects to be aggregated must be disjoint,(ii) Completeness : meaning that the unionof subsets must constitute the entire set,and (iii) Compatibility of the Dimension,the kind of measure being aggregated andthe aggregation function used. Many designmethods ensure the first two conditions bylooking for many-to-one relationships be-tween each fact and its dimension and aone-to-many relationship between levels ofthe same hierarchy. The compatibility con-dition requires to access semantics of datato check whether it is possible to apply ag-gregation operation to aggregate measuresalong dimension levels. These three condi-tions are considered as the first step ensur-ing the quality of the DW model and somestudies used them to propose multidimen-sional normal forms.

• Dimension hierarchies: (48) enumerates andclassifies different types of dimension hierar-chies (simple, multiple, generalized). A mul-tidimensional model should represent thesedifferent hierarchies. When translating hi-erarchies to the logical level, some of themmay have the same relational representa-tion. It is thus important to keep trace ofdimension semantics within the DW.

Three different logical implementations of amultidimensional model are possible: Multidi-mensional OLAP approach (MOLAP), Rela-tional OLAP approach (ROLAP) and HybridOLAP approach (HOLAP). MOLAP approachstores multidimensional data in array-based mul-tidimensional storage and implement the OLAPoperations over these special data structures.Each dimension of the DW model is representedas an array dimension. MOLAP approach offersgood access times, but the array update and man-agement are more difficult.

ROLAP approach maps DW multidimensionalmodel into relational tables. Each fact conceptbecomes a fact table having measure attributesand each dimension concept becomes a dimen-sion table having dimension attributes. Fact ta-bles store secondary key attributes of dimensionstables related to the fact. ROLAP servers includeoptimization, scalability, implementation of ag-gregation navigation logic and additional services,which make the relational approach the most pop-ular representation adopted by several importantDBMS editors. ROLAP systems model DW appli-cations by means of a star schema or its variant.A star schema consists of a huge fact table and anumber of dimension tables ( 14). The fact table isjoined to dimension tables. The snowflake schemais a variant of the star schema where dimensiontables are split into several dimension level tablesaccording to their granularity.

HOLAP approach combines both MOLAP andROLAP approaches to get benefits from the scal-ability of ROLAP and the faster computation ofMOLAP. HOLAP manages frequently used data(generally aggregated data) in a MOLAP systemand other data in a ROLAP system.

14

Figure 14. DW star schema

3.3.3. C4: Requirements engineering

Requirements engineering (RE) plays a cru-cial role in information systems development pro-cesses to reduce the risk of failure. Requirementsof an enterprise-wide DW system determine itsfunctional behaviour and its available informa-tion, for example what data must be accessible,how it is transformed and organized, as well ashow it is aggregated or calculated. Requirementsenable stakeholders to communicate the purpose,establish the direction and set the expectationsof information goals for the enterprise. On theother hand, the development team of a data ware-house system expects a complete, correct and un-ambiguous specification of the system to build,which means a further refinement of the businessrequirements from the stakeholders.

Generally, a RE process can be divided intofour activities: (i) requirements elicitation, (ii) re-quirements specification, (iii) requirements vali-dation and (iiii) requirements management. Therequirements specification presents a significantstage that relates to the requirements formaliza-tion. A good requirements specification impliesthus a good DW design. There are several tech-niques for requirements specification to assist re-quirements analysts and stakeholders in produc-ing requirements specification of higher quality,and some of them are put into practice in indus-try:

• Informal techniques: are built in naturallanguage, sometimes with structuring rules.Their use introduces risks of ambiguitiesbecause neither their syntax, nor their se-mantics are perfectly defined. Among thesetechniques, we can quote questionnaires andinterviews.

• Semi-formal techniques: are generally basedon graphic notations with a specified syntaxallowing having a clear vision of the system.These models are good vectors of communi-cation between designers and users system.Among these techniques we can quote UMLnotation.

• Formal techniques: are based on mathemat-ical or logical notations which provide a pre-cise and no ambiguous framework for re-quirements modelling. We can quote modelsusing the B notation and ontological mod-els.

4. Related work: Towards a conceptual

continuity of DW design methods

Many recent studies proposed to combine DWand ontology technologies principally for thefollowing tasks: Extract-Transform-Load (ETL)process, validation of the multidimensional struc-ture, and DW design. Some proposals exploit on-tologies for the ETL phase: (15) proposed ETLmethod for integrating data from sources intothe DW using an ontology as a global schemadescribing sources, (60) proposed to automatethe ETL process by constructing OWL ontol-ogy linking schemas of sources to the DW tar-get schema. In another perspective, some propos-als use ontologies for the validation or extensionof the DW multidimensional schema: (40) pro-posed to use hyperonymy and hyponymy rela-tionships of Wordnet ontology to enrich and com-plete dimensions hierarchies. To the best of ourknowledge, only three studies proposed the useof ontologies for designing DW applications (57;51; 41). Before describing these three works, wefirst present a classification of DW design meth-ods that shows their convergence to the ontologi-

15

Figure 15. Classification of DW design methods.

cal methods through a conceptual continuity ap-proach.When exploring the main DW design works,

we notice a trend of methods following a supply-driven approach, where the DW model is gener-ated from an analysis of data sources (38; 61).User’s requirements are ignored in these works.The domain of requirement analysis is related toRequirements Engineering (RE). RE for DW de-sign aims at identifying needs of users and deci-sion makers to implement a DW satisfying them.The experience showed that revisiting require-ments specification is generally needed even afterDW implementation7. Many methods (47; 67),called demand-driven methods, were proposed togenerate a DW model after analysing users’ re-quirements. Other methods (11; 30), called mixedmethods, generate the DW model from bothsources and requirements.These methods generate DWmodel either from

logical schemas of sources (38), or from concep-tual schemas of sources (61; 34) requiring gener-ally reverse engineering efforts, or recently fromontological schemas (Figure 15). Ontologies havebeen proposed in DW design, essentially as a so-lution to data integration problems caused by se-mantic heterogeneities. These ontological meth-ods are mainly supply-driven methods, where do-main ontologies are used as conceptual schemasintegrating data sources. (57) defines the DWmultidimensional model (facts and dimensions)from an OWL ontology by identifying functionaldependencies (Functional ObjectProperties) be-

7http://www.redbooks.ibm.com

tween ontological concepts. (51) defines a frame-work for analyzing annotations of the semanticweb by designing a semi-structured DW.

From requirements perspective, we notice thatproposed methods define DW schema from in-formal requirements models (28), or from semi-formal models (generally UML models or modelsof i*8 framework) (49; 47; 13). These methodsconsider users’ requirements as the set of speci-fications of the given application. Ontologies onthe other hand can be seen as the set of spec-ification of the entire domain. We proposed in(41) an ontological design method that consistsin defining a conceptual model from an OWL do-main ontology integrating data sources and speci-fying requirements. This method is also supportedby a design tool (42). Users’ requirements in thismethod are expressed in terms of concepts of thedomain ontology to generate a DW ontology, fromwhich a multidimensional model is defined. Fol-lowing the hybrid approach, (58) also proposeda mixed method producing a multidimensionalmodel from an OWL ontology describing sources.Requirements are then used to identify the ETLoperations needed for mapping the sources to tar-get data stores.

In this paper, we propose an ontological designmethod following a mixed approach, extendingour previous method (41), where we define a newDW storage structure representing requirementspersistently. Requirements analysis in DW designlitterature can differ according to the object ana-lyzed. We distinguish process-driven, user-drivenor goal-driven analysis. Process-driven analysis(47; 13; 59) analyzes requirements by identify-ing business processes of the organization. User-driven analysis (67) identifies requirements ofeach target user and unifies them in a globalmodel. Goal driven analysis (11; 30; 40) identifiesgoals and objectives that guide decisions of theorganization at different levels. It is recognizedby different authors that goal-driven analysis pro-vides a good definition of users requirements (56).

8i* is a framework using notions of Goals and Agents. i*is used for identifying users of the system to develop, andrepresent them into social actors dependent each otherthrough the goals to realize, tasks to be made and re-sources used.

16

Figure 16. Criteria for analyzing related works.

Identifying goals of users at the beginning of theproject is an important step in the developmentprocess, especially in decisional applications thatneeds to analyze the activity of an organizationand where goals is an important indicator of thisactivity. To define requirements, we thus followedthe goal-driven approach that has been used inDW context (11; 30).

According to the four concepts detailed in thebackground section (Schemas, Sources, Integra-tion, Requirements) (See Figure 16), we summa-rize the most relevant works discussed in Table1:

Our proposed approach defines the DW fromthe global ontology integrating available OBDBsources. Requirements are specified at the onto-logical level, and are made persistent in the DWstructure.

5. Our Design Approach

Our objective is to define an OBDW mak-ing users requirements persistent within the DWstructure. To fulfill this objective, we need to de-fine: (i) a requirements model, (ii) the processmaking those requirements persistent and (iii) thestructure of the OBDW proposed. The followingsections will describe each step.

5.1. The requirements model

The requirements model we propose followsa goal-driven approach. A goal is an objec-tive that the system under consideration should

achieve. Goal-oriented requirements specificationproduces the so called goal graphs, which rep-resent goals and their logical relationships in anAND-OR graph form ( 18). From our analysis ofvarious existing works suggested in goal-orientedlitterature, we propose our Goal-oriented modelto assist DW analysts and stakeholders in pro-ducing requirements specification of high quality.We formalize a goal based on the GQM definition(Goal Question Metric) (6): ”A goal is defined foran object, for a variety of reasons, with respect tovarious models of quality, from various points ofview, relative to a particular environment”.Fig. 17.b presents the goal oriented require-

ments model we defined, represented as an UMLclass diagram designed with Eclipse Papyrus plu-gin. This model is composed of a main class Goal-Model that stores all Goals, and is aimed at pro-moting requirements reuse.Let us consider the first requirement of the

case study Goal1: ”This requirement provides areport of all reservations made at a given reser-vation date. The report lists totals of bestPrice,basicPrice and rate of guaranteed cancelled. Notethat the rate is equal to the sum [bestPrice - ba-sicPrice].”. A goal is described by the follow-ing properties: goal identifier (Id), name (Goal1),description, purpose (Report) and its priority(mandatory or optional goal). Three reflexive re-lationships between goals are distinguished: (i)Influence relationships: positive or negative in-fluence relationships exist between goals. For ex-ample Goali : Reduce number of cars for branchC negatively influences Goalj : Increase rentalagreements for branch C. (ii) ParentChild rela-tionships: decomposing a general goal into sub-goals using AND/OR relations, (iii) GoalRelationrepresenting relations of conflicts, refinement,containment and equivalence between goals.A goal is characterized by three classes related

to Goal class through aggregation relationships,that we call coordinates of the goal. They areused for easing the gathering of the goals fromusers. Coordinates are: a Metric measuring thesatisfaction of a goal (sum [bestPrice - basicPrice]for Goal1, a Result to analyze (bestPrice, ba-sicPrice and rate for Goal1), and Parameters re-flecting the object of the goal (reservation date

17

Authors/Criteria Type of Schema Type of Sources DW Integration Requirements(34) Conceptual or relational Traditional No No(38) Relational Traditional No No(11) Conceptual Traditional No Yes(30) Relational Traditional No Yes(57) Ontological Traditional No No(58) Ontological Traditional Yes Yes(51) Ontological Ontology-based No Yes(41) Ontological Traditional No Yes

Table 1DW design works analyzed following 4 criteria

Figure 17. Requirement model connected to the ontology model

18

Figure 18. Graph of goals of the rental car domain

for Goal1).Two types of goals are identified: functional

and non-functional. A non-functional require-ment is defined as an attribute or constraint ofthe system (such as security, performance, flexi-bility, etc) (31). Each goal involves one or moreactors that interact with the system to fulfil thegoal (eg. Sales manager for Goal1). Having a setof goals, a graph of goals can be produced as pre-sented in figure 18. The graph is composed ofthe different goals as nodes and the described re-lationships between goals as edges.

Users’ goals are expressed at the ontologicallevel where we defined a mapping between co-ordinates of each goal (Metric, Result and Pa-rameter) and the properties of the domain ontol-ogy. Coordinates of the goal are expressed usingthe properties of the ontology in order to allowthe designer to choose the most relevant proper-ties of each class that can be used to define thegoal. The domain ontology represents the globalmodel integrating the whole sources. Goals areused in this process, to inform us about the mostrelevant information to store in the DW model.Figure 17.a represents a fragment of the OWLontology meta-model.

Fig. 19.c and Fig. 19.d presents an example ofinstantiation of the ontology and goal models us-ing Goal1 requirement expressed on classes andproperties of Rental car ontology.

5.2. The design process proposed

DW design typically includes the three usualdesign phases: conceptual, logical and physical

phases. We first extract a DW ontology from theglobal domain ontology (GO), that is consideredas the DW conceptual model that will be an-notated by multidimensional concepts. A set oftransformation rules is used to translate the con-ceptual model into a logical relational model.

5.2.1. Definition of the DW ontology

Extracting the DW ontology

A DW ontology (DWO) is defined as a mod-ule of the Global Ontology (GO) by extractingconcepts and properties used to express usersgoals. This DWO represents the DW concep-tual model that will be stored in the DW struc-ture for describing its data semantics. Three sce-narios are possible: (1) DWO = GO: the GOcorresponds exactly to users’ goals and require-ments, (2) DWO ⊂ GO: the DWO is extractedfrom the GO using OntoSub operator and cov-ers all requirements, (3) DWO ⊃ GO: the GOdoes not fulfill the entire users’ requirements. Thedesigner extracts the fragment of the GO cor-responding to requirements and can locally en-rich it with new concepts and properties. Fol-lowing the first formalization of domain ontolo-gies, the DWO is formally defined as follows:O :< CDW ,PDW ,SubDW ,ApplicDW >.

Once the DWO is defined, we exploit its rea-soning capabilities to correct all inconsistencies,and to reason about the goals model.

Reasoning on the DW ontology

Two usual reasoning mechanisms are first used:checking the consistency of the ontology (classesand instances) and inferring subsumption rela-tionships. This reasoning is supported by mostexisting reasoners, and allows the detection ofdesign errors. Another reasoning mechanism isused in order to propagate influence relation-ships between goals. Influence relationships areused afterwards for exploring the multidimen-sional structure of the DW model, where weconsider ’fact’ concepts as central concepts and’dimensions’ as concepts influencing them. Forpropagating influence relationships, we definedpropagating rules inspired from works on casualgraphs (17) that use the same semantic of influ-

19

ence relationships. Two operations are used: addi-tion corresponding to the idea of cumulative influ-ences of numerous paths having the same targetgoal, and multiplication operation correspondingto the idea of transitivity of influences in a path.Those rules are applied on the graph of goals tofind a sort of transitive closure of influence rela-tionships.Based on the goal model of figure Fig. 17.b, we

defined the following propagating rules:Positive − influence(G1, G2) ∧ Negative −

influence(G2, G3) −→ Negative −influence(G1, G3)Negative − influence(G1, G2) ∧ Negative −

influence(G2, G3) −→ Positive −influence(G1, G3)Positive − influence(G1, G2) ∧ Positive −

influence(G2, G3) −→ Positive −influence(G1, G3)Negative − influence(G1, G2) ∧ Positive −

influence(G2, G3) −→ Negative −influence(G1, G3)Positive − influence(G1, G2) ∧ Negative −

influence(G1, G3) −→ Undetermined −influence(G1, G2)

Annotating the DW ontology

The multidimensional role of concepts andproperties are then identified based on an anal-ysis of defined goals. We consider for each goal,properties used to define the result (or its metric)as potential fact measures as it represents a resultto analyze. Parameters are entities possibly influ-encing a goal result and are thus considered as po-tential dimension attributes. Parameters of goalsinfluencing this given goal (existing and inferred)are also considered as potential dimensions. Letus take the following goals ”Increase rental agree-ments for country C” and the goal ”Reduce num-ber of cars for all branches” influencing the firstgoal negatively. Parameters Branch (of the sec-ond goal) and Country (of the first goal) are bothcandidates to represent dimensions for the mea-sure Rental Agreements (See figure 20 for thisexample). We validate the link between facts andtheir potential dimensions by looking for (1,n) re-lationships from the DW ontology. (1,n) relation-ships are identified as direct or transitive func-

Figure 20. Influence relationships used for com-pleting DW multidimensional structure

tional ObjectProperties in OWL ontologies (57).Algorithm 1 formalizes these steps.

5.2.2. Definition of the DW logical model

The logical model of the DW is generated bytranslating the annotated DWO into a relationalmodel. Several works in the literature proposedmethods for translating ontologies described ina given formalism (PLIB, OWL, RDF) to a re-lational or object-relational representation (27).This translation can follow three possible rela-tional representations (2): vertical, binary andhorizontal (See Figure 21). Vertical representa-tion is used for RDF ontologies, and stores datain a unique table of three columns (subject, pred-icate, object). In a binary representation, classesand properties are stored in tables of differentstructures. Horizontal representation translateseach class as a table having a column for eachproperty of the class. We proposed in (23) a set oftranslation rules for representing PLIB and OWLontology (classes, properties and restrictions) ina relational schema following the binary and hor-izontal representations.

In our case study scenario (ontology model andrequirements), the DWO was equal to the GO.The multidimensional algorithm identified twofacts: (i) Rental-Agreement having the followingdimensions Rental-duration, Car-Model, Country,customer, and (ii) Branch fact having Customerand Country as dimensions. Figure 22 presentsthe relational star schema obtained after trans-lating the DWO into a relational schema.

20

Figure 19. Instantiation of the requirements and ontological models

Figure 22. Rental car Star schema defined by the design process proposed.

21

begin

for Each goal G doEach ontological property used asresult or metric of G is a measurecandidate;Each property used as a parameter ofG is a dimension attribute candidate;Parameters of the goals influencinggoal G are dimension attributes of themeasure identified for G;Classes that are domains of propertiesthat are measures candidates arefacts candidates;Classes that are domains of dimensionattributes are dimension candidates;if fact class F is linked to adimension by (1,n) relationship then

keep the two classes in the model;else

Rend

eject the dimension class;

end

Hierarchies between dimensions areconstructed by looking for (1,n)relationships between classesidentified as dimensions (for eachfact);

end

end

Algorithm 1: Multidimensional annotations

5.3. Definition of the OBDW

To validate our proposal, we choose the OBDBOntoDB (22) introducing an additional partcalled the meta-schema (Figure 23). The DWontology defined is loaded in OntoDB. The meta-schema is extended to include the requirementsmodel (Fig. 17.b). This extension is done bydefining OntoQL statements. OntoDB currentlysupports only the horizontal relational represen-tation for the data part. The main characteristicof this solution is that it materializes all differentmodels used along the life cycle of DW applica-tions: the conceptual model (represented by theontology), the logical model that may represented

Figure 21. Rental car Star schema defined by thedesign process proposed.

following a binary or horizontal relational repre-sentation according to the choice of the designer,the meta-schema extended by the requirementsmodel we defined for storing requirements persis-tently in the DW, and the local ontology definingsemantics of stored data and requirements.

6. Case tool implementing the proposed

method

To validate our proposal, a case tool has beendeveloped implementing the design steps de-scribed (Figure 24). This tool is designed to offerdesigners similar functionalities offered by classi-cal database case tool tools such as: PowerAMC,Rational Rose, etc. We chose OWL formalism asan example to implement our design process. Ourtool is a graphical system developed in Java Fig-ure 25. Considered OBDBs reference an exist-ing integrated ontology (GO) formalized in OWL.The GO integrating the selected OBDBs is iden-tified by its URI, and its information is displayed

22

Figure 23. OBDW: Extended OBDB of type III by the requirements model.

on the same way as Proteg editor. Classes arepresented as a hierarchy. The selection of a classdisplays its information (Concept name, descrip-tion, properties, super-classes and instances). Ac-cess to all ontologies is made through the OWLAPI. Designers can either express users’ require-ments through an interface allowing them to se-lect the different classes and properties used inrequirements or by choosing a text file contain-ing these requirements. Requirements expressedmust follow the goal model we presented above.The tool checks the syntax of each requirement.The projection of requirements on the GO gen-erates the DW ontology (DWO) that is a mod-ule of the GO. DWO is extracted using ProSeplug-in available within Protege editor, ensuringthe logical completeness of the extracted ontol-ogy (39). Fact++ reasoner is invoked to clas-sify the DWO classes taxonomy and to check itsconsistency. Influence rules defined on goals areimplemented using SWRL (Semantic Web RuleLanguage) language 9, which must be combinedwith Jess 10 inference engine to execute definedSWRL rules and apply them on the DWO. Aparser analyzes requirements specified in order toidentify the multidimensional role of classes and

9http://www.w3.org/Submission/SWRL/10http://www.jessrules.com/

properties according to the annotation algorithmdefined (Algorithm 1). The logical star schemais then defined after translating the DWO anno-tated. This multidimensional model is presentedin two ways: (i) as a tree whose nodes are identi-fied facts, their measures, their respective dimen-sions and dimensions attributes, (ii) graphicallyas a relational star schema that can be printed.The designer can modify this model graphicallyby adding and deleting classes and properties. Ademonstration video of this tool is available athttp://www.lisi.ensma.fr/members/bellatreche.

7. Open issues on evolution in OBDW de-

fined

The second issue when making heterogeneoussystems interoperable is to keep and maintain thisinteroperability operational to different changescharacterizing the complex environment thesesystems evolve in. This new direction is calledsustainable interoperability. Increasingly researchefforts are proposed in this direction for evolvinginteroperable systems sustainably (52; 7; 33; 25;45). Managing the DW evolution implies main-taining and updating the warehouse so that itfulfils the organization requirements as well aspossible.Based on concepts seen in the background, we

23

Figure 24. Architecture of the proposed case tool.

discuss in what follows some scenarios of DW evo-lutions and we show how our ontology-based datawarehouse can adapt to different changes aris-ing to ontologies, sources and requirements. Thesescenarios are as follows: (i) The changes occur inthe ontology but does not affect the DW model,(ii) The changes occur in the ontology and affectthe DW model, (iii) The changes occur in require-ments, (iv) The changes occur in sources.

7.1. Ontology changes do not affect the

DW model

The DW model we defined is designed basedon domain ontology. An ontology, by definitionis a shared and consensual conceptualization ofa given domain. This is one of the advantagesof considering the domain ontology schema as afirst design level where most domain changes arealready limited. One can make the reasonable as-sumption that the shared ontology evolves slowly.However, in some few cases the ontology canevolve. (53) identified three causes for an ontol-

Figure 25. Interfaces from the case tool proposed.

ogy evolution: (i) Changes in the domain (domainevolution due to changes in the real world), (ii)Changes in conceptualization (changes in usageperspective), (iii) Changes in the explicit spec-ification (occur when an ontology is translatedto another knowledge-representation language).These changes can interest decision makers ornot. If not, the DW model remains the same. Thesecond case where the changes interest decisionmakers is discussed in the following point.

7.2. Ontology changes affect the DW

model

In this scenario, changes occurring in the on-tology have to be impacted on the DW model.The exchange interoperability is based on the useof a shared ontology for which definitions havebeen defined at a given time. At that given time,the interpretation of concepts descriptions is ef-fectively unambiguous. But, during an ontologylifetime, evolutions/corrections may be requiredand, in order to preserve the right interpretationof concepts at any time, a versioning managementmust be provided. In PLIB for example, some on-tology change management rules have been de-fined. These rules are based on the ontology con-tinuity principle (10). The idea is to keep trackof evolutions by ensuring in a single ontology(the more recent one) that whatever have beenits evolutions, the interpretation of any conceptdescriptions that was conforming to that ontol-ogy at a given instant can be performed. As aconsequence, it means for instance that a con-

24

cept cannot be removed, but must be annotated” deprecated”. This evolution management dis-tinguishes ontology evolution that corresponds torevision changes (no impact in the concept de-scription interpretation), version change (the oldontology cannot interpret all the evolutions pro-vided by the new ontology) and error correction(the concept shall be deprecated).

7.3. Requirements change

Handling requirements evolution means main-taining the changes that occur to the decisionalrequirements and studying their impact on theDW model. Indeed, new requirements may ariseand others may become inappropriate. Thesechanges can either just be updated the DWmodel, or can affect the whole DW structure (forexample by rising new facts and dimensions). Therequirements model in our approach is stored inthe DW structure extending the DW conceptualmodel. The impact of each requirement changeon DW schema and data can thus be identifiedand can be managed more accurately. As statedin (3), when requirements and their connectionwith semantics sources are represented using for-mal expressions and stored on dedicated knowl-edge bases, this facilitates a sustainable interop-erability through the creation of systems capableto reason and deduce mapping readjustments ac-cording to requirements changes.

The impact of requirements changes on otherrequirements is also an important dimension. Achange made to one requirement may affect sev-eral other requirements making them to change aswell. Neglecting these dependencies can increasethe cost of implementing a requirement, and inturn cause budget or schedule problems (5)

7.4. Sources change

Changes in a DW sources, like in the tradi-tional databases, have two categories: (i) Contentchanges (insertion, update, deletion of data andrecords), (ii) Schema changes (addition, modifi-cation and dropping data structures and theirattributes) (52). In order to tackle the problemof schema changes, two different ways are possi-ble: schema evolution and schema versioning. Thefirst approach consists in updating a schema and

transforming data from an old schema into a newone, and only the current version of a schema ispresent. In contrast, the second approach keepstrack of the history of all versions of a schema(52). We proposed in (52) a framework for man-aging the versioning model of a DW constructedform ontology-based sources. The approach dealswith asynchronous evolution of local ontologiesof sources and the shared one. An important ad-vantage in our approach is the link between thedifferent models. Changes can thus be identifiedand managed on the conceptual level (the localontologies), and can be propagated to the othermodels along the design cycle (logical and physi-cal models).

8. Conclusion

Systems interoperability is one of the most im-portant issues that needs to be solved for enter-prises that have to deal with different partnerscollaborating beyond their own frontiers. In or-der to cooperate and to exchange information inan efficient way, enterprises data need to be inte-grated into a unified view. Ontologies have beenproved to be effective solutions for the developingintegration solutions allowing enterprise sourcesto share the same vocabulary. The presence ofontologies during the development of integrationsystems facilitates the management of data andrequirements ambiguity.In parallel to this situation, ontologies have beenextensively adopted by companies. As conse-quence, mountains of ontological data are pro-duced. To offer solutions for storing, managing,querying these data, ontology based database(OBDB) systems have been proposed by aca-demic and industrial communities. As conse-quence, ontological instances become a gold minefor decision makers to get efficient knowledge toimprove the profit of their enterprises. Data ware-house is the enterprise memory that materializesdata from different sources. In this paper, we pro-pose a new approach for designing ontology-baseddata warehouse from a set of OBDBs. Addition-ally to the nature of sources participating in theconstruction of our data warehouse, a second im-portant component that our methodology consid-

25

ers is the user’s requirements. We propose to ex-tend the use of ontologies, on the same way asfor data sources, to clarify (eliminate ambiguity)and unify users requirements. User’s requirementsare furthermore stored in the DW model. Thepresence of these requirements may contribute inreducing the complexity of other phases of theDW life cycle (optimization, personalization andevolution management). To do so, a requirementmodel following a goal-driven formalism is givenand mechanism to ensure their persistency intothe DW is developed. The final DW structuredefined is validated within the OBDB OntoDB,extending its ontological meta-schema by the re-quirements model. A tool implementing the de-sign method proposed is developed. A discussionabout different evolution scenarios and their man-agement within our proposed design framework ispresented.This work leads to many other tasks currently

in progress including: (i) evaluating our methodfor large scale case study, (ii) a validation of differ-ent evolution management scenarios, (iii) study ofthe impact of requirements persistency in differ-ent phases of life cycle of DW development (queryprocessing and physical design) and (iv) the ex-tension of our design method (including user re-quirement persistency) for OLTP applications.

References

[1] IEEE 830. Recommended practice for software re-quirements specifications. 1998.

[2] D.J. Abadi, A. Marcus, S. Madden, and K. Hollen-bach. Sw-store: a vertically partitioned dbms forsemantic web data management. VLDB Journal,18(2):385–406, 2009.

[3] C. Agostinho and R. Jardim-Goncalves. Dynamicbusiness networks: A headache for sustainable sys-tems interoperability. In OTM Workshops, pages194–204, 2009.

[4] S. Alexaki, Christophides V., G. Karvounarakis,D. Plexousakis, and K. Tolle. In proceedings ofthe second international workshop on the semanticweb. In SemWeb, 2001.

[5] A. Aybuke and W. Claes. Engineering and ManagingSoftware Requirements. Springer, 2005.

[6] V. Bazili, C. Gianluigi, and H. D. Rombach. The goalquestion metric approach. Technical report, Com-puter science Technical Report Series CS-TR-2956,

1992.[7] B. Bebel, J. Eder, C. Koncilia, T. Morzy, and

R. Wrembel. Creation and management of versionsin multiversion data warehouse. In In Proc. ACMSAC, pages 717–723, 2004.

[8] L. Bellatreche, Y. Ait Ameur, and C. Chakroun. Adesign methodology of ontology based database ap-plications. Logic Journal of the IGPL, Oxford Uni-versity Press, 19(5):648–665, 2011.

[9] L. Bellatreche, D. Nguyen Xuan, G. Pierra, andH. Dehainsala. Contribution of ontology-baseddata modeling to automatic integration of elec-tronic catalogues within engineering databases.Computers in Industry Journal Elsevier, 57(8-9):711–724, 2006.

[10] L. Bellatreche, G. Pierra, and E. Sardet. EvolutionManagement of Data Integration Systems by theMeans of Ontological Continuity Principle. RecentTrends in Information Reuse and Integration Book,edited by Springer, 2011.

[11] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, andS. Paraboschi. Designing data marts for data ware-houses. ACM Transactions on Software Engineer-ing and Methodology, 10(4):452–483, 2001.

[12] J. Broekstra, A. Kampman, and F. V. Harmelen.Sesame: A generic architecture for storing andquerying rdf and rdf schema. In International Se-mantic Web Conference, pages 54–68, 2002.

[13] R. Bruckner, B. List, and J. Schiefer. developing re-quirements for data warehouse systems with usecase. Seventh Americas Conference on InformationSystems, 2001.

[14] M. J. Cafarella and A. Y. Halevy. Web data man-agement. In Proceedings of the ACM SIGMOD In-ternational Conference on Management of Data,pages 1199–1200, 2011.

[15] D. Calvanese, G. Giacomo, M. Lenzerini, D. Nardi,and R. Rosati. Data integration in data warehous-ing. Int. J. Cooperative Inf. Syst., 10(3):237–271,2001.

[16] S. Castano, V. Antonellis, and S. D. C. Vimer-cati. Global viewing of heterogeneous data sources.Transactions on Knowledge and Data Engineering,13(2):277–297, 2001.

[17] B. Chaib-draa. Causal maps: theory, implementation,and practical applications in multiagent environ-ments. IEEE TKDE, 14(6):1201–1217, 2002.

[18] S. Chaudhuri and U. Dayal. n overview of data ware-housing and olap technology. A Sigmod Record,26(1):65–74, March 1997.

[19] D. Chen, G. Doumeingts, and F. Vernadat. Archi-tectures for enterprise integration and interoper-

26

ability: Past, present and future. Special issue onEnterprise Integration and Interoperability in Man-ufacturing Systems. Computers In Industry. Else-vier, 59(5), 2008.

[20] P. Chen. The entity relationship model - towardsa unified view of data. ACM Transactions onDatabase Systems (TODS), 1(1):9–36, 1976.

[21] E. I. Das, G. Eadon Chong, and J. Srinivasan.Supporting ontology-based semantic matching inrdbms. In VLDB, pages 1054–1065, 2004.

[22] H. Dehainsala, G. Pierra, and L. Bellatreche. Ontodb:An ontology-based database for data intensive ap-plications. In DASFAA, pages 497–508, April 2007.

[23] C. Fankam. Ontodb2 : un systme flexible et efficientde base de donnes base ontologique pour le websmantique et les donnes techniques. Ph.d. thesis,Poitiers University, 2009.

[24] C. Fankam, S. Jean, L. Bellatreche, andY. Ait Ameur. Extending the ansi/sparc ar-chitecture database with explicit data semantics:An ontology-based approach. In Second EuropeanConference on Software Architecture (ECSA’08,pages 318– 321, 2008.

[25] J. Ferreira, C. Agostinho, J. Sarraipa, and R. Jardim-Goncalves. Monitoring morphisms to support sus-tainable interoperability of enterprise systems. InRobert Meersman, Tharam S. Dillon, and PilarHerrero, editors, OTM Workshops, volume 7046 ofLecture Notes in Computer Science, pages 71–82.Springer, 2011.

[26] E. Franconi, F. Baader, U. Sattler, and P. Vassil-iadis. Fundamentals of Data Warehousing, chap-ter Multidimensional Data Models and Aggrega-tion. Springer Verlag Berlin Heidelberg, 2000.

[27] A. Gali, C.X Chen, K. Claypool, and R. Uceda-Sosa.From ontology to relational databases. ER Work-shops, pages 278–289, 2004.

[28] I. Gam and C. Salinesi. A requirement-drivenapproach for designing data warehouses.REFSQ’2006, 2006.

[29] A. Giacometti, P. Marcel, E. Negre, and A. Soulet.Query recommendations for olap discovery drivenanalysis. International Journal of Data Warehous-ing and Mining IJDWM, 2011.

[30] P. Giorgini, S. Rizzi, and M. Garzetti. Goal-orientedrequirement analysis for data warehouse design.DOLAP’05, pages 47–56, 2005.

[31] M. Glinz. On non-functional requirements. 15th IEEEInternational Requirements Engineering Confer-ence RE07, pages 21–26, 2007.

[32] C.H. Goh, S. Bressan, E. Madnick, and M.D. Siegel.Context interchange: New features and formalisms

for the intelligent integration of information. ACMTransactions on Information Systems, 17(3):270–293, 1999.

[33] M. Golfarelli, J. Lechtenborger, S. Rizzi, andG. Vossen. Schema versioning in data warehouses:enabling cross-version querying via schema aug-mentation.Data and Knowledge Engineering, 2006.

[34] M. Golfarelli and S. Rizzi. Methodological frameworkfor data warehouse design. In DOLAP, pages 3–9.ACM, 1998.

[35] M. Golfarelli and S. Rizzi. Data Warehouse De-sign: Modern Principles and Methodologies. Mc-Graw Hill, 2009.

[36] T. R. Gruber. A translation approach to portableontology specifications. In Knowledge Acquisition,5:199–220, 1993.

[37] S. Jean, G. Pierra, and Y. Aıt Ameur. Domainontologies: A database-oriented analysis. In JoseA. Moinhos Cordeiro, Vitor Pedrosa, Bruno En-carnacao, and Joaquim Filipe, editors, WEBIST(1), pages 341–351. INSTICC Press, 2006.

[38] M. R. Jensen, T. Holmgren, and T.B. Pedersen. Dis-covering multidimensional structure in relationaldata. 6th International Conference on Data Ware-housing and Knowledge Discovery, LNCS Springer,3181:138–148, 2004.

[39] E. Jimenez-Ruiz, B.C. Grau, U. Sattler, T. Schnei-der, and R. Berlanga Llavori. Safe and economicre-use of ontologies: A logic-based methodologyand tool support. In Franz Baader, Carsten Lutz,and Boris Motik, editors, Description Logics, vol-ume 353 of CEUR Workshop Proceedings. CEUR-WS.org, 2008.

[40] Mazon J.N., J. Trujillo, M. A. Serrano, and M. Pi-attini. Improving the development of data ware-houses by enriching dimension hierarchies withwordnet. in: M. Collard (Ed.), ODBIS, 4623 ofLecture Notes in Computer Science, Springer:85–101, 2006.

[41] S. Khouri and L. Bellatreche. A methodology and toolfor conceptual designing a data warehouse fromontology-based sources. DOLAP10, pages 19–24,2010.

[42] S. Khouri and L. Bellatreche. Dwobs: Data warehousedesign from ontology-based sources. In DASFAA,pages 438–441, 2011.

[43] M. A. King. A realistic data warehouse project: An in-tegration of microsoft acces and microsoft excel ad-vanced features and skills. Journal of InformationTechnology Education Innovations in Practice., 8,2009.

[44] A. Kiryakov, D. Ognyanov, and D. Manov. Owlim

27

- a pragmatic semantic repository for owl. WISEWorkshops, pages 182–192, 2005.

[45] H. Kondylakis and D. Plexousakis. Exelixis: evolv-ing ontology-based data integration system. In Pro-ceedings of the ACM SIGMOD International Con-ference on Management of Data, pages 1283–1286,2011.

[46] H.G. Lenz and A. Shoshani. Summarizability in olapand statistical data bases. In SSDBM, pages 132–143, 1997.

[47] B. List, J. Schiefer, and A. M. Tjoa. Process-orientedrequirement analysis supporting the data ware-house design process a use case driven approach.11th International Conference on DEXA, pages593–603, 2000.

[48] E. Malinowski and E. Zimanyi. Hierarchies in amultidimensional model: From conceptual model-ing to logical representation. Data Knowl. Eng.,59(2):348–377, 2006.

[49] J. Mazon, J. Pardillo, E. Soler, O. Glorio, and J. Tru-jillo. Applying the i* framework to the develop-ment of data warehouses. In The 3rd Internationali* Workshop - istar08, pages 79–82, 2008.

[50] B. McBride. Jena: Implementing the rdf modeland syntax specification. Technical report HewlettPackard Laboratories, 2001.

[51] V. Nebot, R. Berlanga, J. M. Perez, M. J. Aram-buru, and T. B. Pedersen. Multidimensional inte-grated ontologies: A framework for designing se-mantic data warehouses. Journal on Data Seman-tics JoDS, 13:1–36, 2009.

[52] D. Nguyen Xuan, L. Bellatreche, and G. Pierra. Aversioning management model for ontology-baseddata warehouses. In DaWaK, pages 195–206, 2006.

[53] Klein M.C.A. Noy N.F. Ontology evolution: Notthe same as schema evolution. Knowl. Inf. Syst.,6(4):428–440, 2004.

[54] G. Pierra. Context-explication in conceptual ontolo-gies: Plib ontologies and their use for industrialdata. Journal of Advanced Manufacturing Systems,World Scientific Publishing Company, 2006.

[55] G. Pierra. Context representation in domain ontolo-gies and its use for semantic integration of data.Journal of Data Semantics (JoDS), 10:174–211,2008.

[56] C. Rolland. Reasoning with goals to engineer require-ments. Enterprise Information Systems Springer,pages 1–8, 2005.

[57] O. Romero, D. Calvanese, A. Abello, andM. Rodriguez-Muro. Discovering functional depen-dencies for multidimensional design. DOLAP09,pages 1–8, 2009.

[58] O. Romero, A. Simitsis, and A. Abello. Gem:Requirement-driven generation of etl and multidi-mensional conceptual designs. In DaWaK, pages80–95, 2011.

[59] J. Schiefer, B. List, and R.M. Bruckner. A holisticapproach for managing requirements of data ware-house systems. 8th Americas Conference on Infor-mation Systems, 2002.

[60] D. Skoutas and A. Simitsis. Designing etl processesusing semantic web technologies. In Proc. of theACM 9th International Workshop on Data Ware-housing and OLAP, pages 67–74, 2006.

[61] I.Y. Song, R. Khare, and B. Dai. Samstar: Asemi-automated lexical method for generating starschemas from an er diagram. DOLAP07, pages 9–16, 2007.

[62] N. Stolba. Towards a sustainable data warehouse ap-proach for evidence-based heathcare. Ph.d. thesis,Vienna University of Technology, 2007.

[63] V. Sugumaran and V. C. Storey. The role of domainontologies in database design: An ontology man-agement and conceptual modeling environment.ACM TODS, 31(3):10641094, 2006.

[64] P. Vassiliadis, M. Bouzeghoub, and C. Quix. Towardsquality-oriented data warehouse usage and evolu-tion. Inf. Syst., 25(2):89–115, 2000.

[65] H. Wache, T. Vogele, U. Visser, H. Stucken-schmidt, G. Schuster, H. Neumann, and S. Hubner.Ontology-based integration of information - a sur-vey of existing approaches. Proceedings of the In-ternational Workshop on Ontologies and Informa-tion Sharing, pages 108–117, August 2001.

[66] P. Bao J. Wang Y., Haase. A survey of formalismsfor modular ontologies. In: IJCAI 2007. WorkshopSWeCKa., January 2007.

[67] R. Winter and B. Strauch. A method for demanddriven information requirements analysis in datawarehousing projects. 36th HICSS, page 231, 2003.