dwh - a datawarehouse conceptual data model for multidimensional

10

Click here to load reader

Upload: ymego

Post on 02-May-2017

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

A Data Warehouse Conceptual Data Modelfor Multidimensional Aggregation

Enrico FranconiDept. of Computer Science

Univ. of ManchesterManchester M13 9PL, UK

[email protected]

Ulrike SattlerLuFG Theoretical Computer Science

RWTH AachenD-52074 Aachen, Germany

[email protected]

Abstract

This paper presents a proposal for a Data Ware-house Conceptual Data (CDWDM) Model whichallows for the description of both the relevant ag-gregated entities of the domain—together withtheir properties and their relationships with otherrelevant entities—and the relevant dimensions in-volved in building the aggregated entities. Theproposed CDWDM is able to capture the databaseschemata expressed in an extended version of theEntity-Relationship Data Model; it is able to in-troduce complex descriptions of the structure ofaggregated entities and multiply hierarchically or-ganised dimensions; it is based on DescriptionLogics, a class of formalisms for which it is possi-ble to study the expressivity in relation with decid-ability of reasoning problems and completeness ofalgorithms.

1 Introduction

Data Warehouse—and especially OLAP—applications askfor the vital extension of the expressive power and func-tionality of traditional conceptual modelling formalisms inorder to cope withaggregation. Still, there have been fewattempts [Catarciet al., 1995, Cabibbo and Torlone, 1998]to provide such an extended modelling formalism, despite

The copyright of this paper belongs to the paper’s authors. Permission tocopy without fee all or part of this material is granted provided that thecopies are not made or distributed for direct commercial advantage.

Proceedings of the International Workshop on Design andManagement of Data Warehouses (DMDW’99)Heidelberg, Germany, 14. - 15.6. 1999

(S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)

http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/

the fact that (1) experiences in the field of databases haveproved that conceptual modelling is crucial for the design,evolution, and optimisation of a database, (2) a great va-riety of data warehouse system are on the market, mostof them providing some implementation of multidimen-sional aggregation, and (3) query optimisation with aggre-gated queries [Nuttet al., 1998, Cohenet al., 1999] is evenmore crucial for data warehouses than it is for databases—which makessemanticquery optimisation using a concep-tual model even more important. As a consequence ofthe absence of a such an extended modelling formalism,a comparison of different systems or language extensionsfor query optimisation is difficult: a common framework inwhich to translate and compare these extensions is missing,new query optimisation techniques developed for extendedschema and/or query languages cannot be compared appro-priately.

In order to address these questions, a formal frameworkmust be developed that encompasses the abstract principlesof the data warehouse related extensions of traditional rep-resentation formalisms. In this paper, we present some pre-liminary outcome from the research done within the “Foun-dations of Data Warehouse Quality” (DWQ) long term re-search project, funded by the European Commission (n.22469) under the ESPRIT Programme. With respect tothe global picture, the role of our research within DWQis to study a formal framework at theconceptual level(seeFigure 1). The conceptual data model we are investigat-ing should be able to abstract and describe the entities andrelations which are relevant both in the whole enterprise,and in the user analysis of such information. In the follow-ing, we will refer to this formalism as the Data WarehouseConceptual Data Model (DWCDM).

1.1 A Data Warehouse Conceptual Data Model

A DWCDM must provide means for the representation of amultidimensionalconceptual view of data. More precisely,a DWCDM provides the language for defining multidimen-

E. Franconi, U. Sattler 13-1

Page 2: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

DataWarehouse

Store

ClientDataStore

SourceDataStore

Conceptuallevel

Logicallevel

Physicallevel

DataWarehouse

schema

Sourceschema

Clientschema

Sourcemodel

Enterprisemodel

Clientmodel

Figure 1: The role played by the Data Warehouse Concep-tual Data Model with respect to the DWQ architecture.

sional information within a conceptual model in the datawarehouse global information base. As stated above, themodel is of support for the conceptual design of a datawarehouse, for query and view management, and for up-date propagation: it serves as a reference meta-model forderiving the inter-relations among entities, relations, aggre-gations, and for providing the integrity constraints neces-sary to reduce the design and maintenance costs of a datawarehouse. Hence a DWCDM must be expressive enoughto describe both the abstract business domain concernedwith the specific application (Enterprise model)—just likea conceptual schema in the traditional database world—andthe possible views of the enterprise information a specificuser may want to analyse (Client model)—with particularemphasis on the aggregated views, which are peculiar to adata warehouse architecture (see Figure 1). A multidimen-sional modelling object in the logical perspective—e.g., amaterialised view, a query, or a cube—should always berelated with some (possibly aggregated) entity in the con-ceptual schema.

In the following, we will briefly introduce theideas behind a multidimensional data model (see, e.g.,[Agrawalet al., 1995, Cabibbo and Torlone, 1998]) andcompare it with a traditional relational data model. Amore comprehensive introduction has been done in theforthcoming book “Fundamentals of Data Warehousing”[Baaderet al., 1999], Chapter 4 onMultidimensional Ag-gregation.

Relational database tables contain records (or rows).Each record consists of fields (or columns). In a normal re-lational database, a number of fields in each record (keys)may uniquely identify each record. In contrast, a multidi-mensional database contains�-dimensional arrays (some-times calledhypercubesor cubes), where each dimension

JanFeb

Juice

Cola

Soap

NS

W

10

13

REGION

PRODUCT

MONTH

Figure 2: Sales volume as a function of product, time, lo-cation.

has an associated hierarchy of levels of consolidated data.For instance, a spatial dimension might have a hierarchywith levels such as country, region, city, office.

Measures (which are also known as variables ormetrics)—like Sales in the example, or budget, revenue,inventory, etc.—in a multidimensional array correspond tocolumns in a relational database table whose values func-tionally depend on the values of other columns. Valueswithin a table column correspond to values for that mea-sure in a multidimensional array: measures associate val-ues with points in the multi-dimensional world. For ex-ample, the measure of the sales of the product Cola, inthe northern region, in January, is 13,000. Thus, a dimen-sion acts as an index for identifying values within a multi-dimensional array. If one member of the dimension is se-lected, then the remaining dimensions in which a range ofmembers (or all members) are selected defines a sub-cube.If all but two dimensions have a single member selected,the remaining two dimensions define a spreadsheet (or aslice or a page). If all dimensions have a single memberselected, then a single cell is defined. Dimensions offer avery concise, intuitive way of organising and selecting datafor retrieval, exploration and analysis. Usual pre-definedor user-defined dimension levels (or Roll-Ups ) for aggre-gating data in DW are: temporal (e.g., year vs. month),geographical/spatial (e.g., Rome vs. Italy), organisational(meaning the hierarchical breakdowns of your organisa-tion, e.g., Institute vs. Department), and physical (e.g., Carvs. Engine).

A value in a single cell may represent anaggre-gatedmeasure computed from more specific data at somelower level of the same dimensions. Aggregation in-volves computingaggregation functions—according tothe attribute hierarchy within dimensions or to cross-dimensional formulas—for one or more dimensions. Forexample, the value 13,000 for the sales in January, may

E. Franconi, U. Sattler 13-2

Page 3: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

Calls (av. duration) Date (Day)1/1/99 2/1/99 3/1/99 4/1/99 5/1/99 � � � � � � � � � � � �

Cell

Sou

rce

(Poi

ntTy

pe)

Land Line ��

Direct Data

PABX

Calls (av. duration) Date (Week Day)Mon Tue Wed Thu Fri Sat Sun

Sou

rce

(Cus

tom

erTy

pe)

Consumer ��

Business

Figure 3: The cubes reporting the average duration of calls by dates in days and sources in point types, and by dates at thelevel of week days and sources at the level of customer types.

have been consolidated as the sum of the disaggregated val-ues of the weekly (or day-by-day) sales. Another exampleintroducing an aggregation grounded on a different dimen-sion is the cost of a product—e.g., a car—as being the sumof the costs of all of its components.

In order to provide an adequate conceptualisation ofmultidimensional information, a DWCDM should pro-vide the possibility of explicitly modelling the relevantaggregationsand dimensions. According to a conser-vative point of view, a desirable DWCDM should ex-tend some standard modelling formalism (such as Entity-Relationship) to allow for the description of both aggre-gated entities of the domain—together with their proper-ties and their relationships with other relevant entities—and the dimensions involved. This document is about aproposal for a Data Warehouse Conceptual Data Modelbased on the Entity-Relationship model where aggrega-tions and dimensions are first class citizens. The datamodel it is based onDescription Logics(DL), which havebeen proved useful for a logical reconstruction of the mostpopular conceptual data modelling formalisms, includ-ing the (enhanced) ER model. Advantages of using De-scription Logics are their high expressivity combined withdesirable computational properties—such as decidability,soundness and completeness of deduction procedures. Thedevised logic has a decidable reasoning problem, thus al-lowing for automated reasoning over the whole concep-tual representation. The presented framework extendsthe ideas pursued in [Calvaneseet al., 1998b] regardingconceptual modelling using Description Logics as a datamodel, and the Information Integration framework pre-

sented in [Calvaneseet al., 1998a, Calvaneseet al., 1998c]based on an extended Description Logics data modelfor both the conceptual and the logical levels; our pro-posal is compatible with the DWCDM presented in[Calvaneseet al., 1998c].

The paper is organised as follows. Section 2 infor-mally introduces an extended ER formalism which allowsfor the description of the explicitstructureof multidimen-sional aggregations; the section briefly describes the se-mantics of the conceptual data model in terms of a logi-cal representation of multidimensional databases, as pro-posed by [Cabibbo and Torlone, 1998]. Section 3 will pro-pose a basic modelling language—based on DescriptionLogics—which is expressive enough to capture the Entity-Relationship Data Model. The core part of the paper (Sec-tion 4) shows how it is possible to translate a schema ex-pressed in the extended ER with aggregations in a suitableDescription Logics theory, allowing for reasoning servicessuch as satisfiability of a schema or the computation of alogically implied statement, such as an implicit taxonomiclink between entities.

2 Modelling the Structure of Aggregation

We introduce in this section an extension of the Entity-Relationship Conceptual Data Model for representing thestructureof aggregations. Thus, a conceptual schema willbe able to describe abstract properties of multidimensionalcubes, their interrelationships, and, most notably, theircomponents. A Data Warehouse Conceptual Schema maycontain detailed descriptions of the structure of aggregates,

E. Franconi, U. Sattler 13-3

Page 4: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

Call

duration

1,11,1

1,1

Pointcode

typeDay

Mon Tue Wed Thu Fri Sat Sun Consumer Business

Cell Land Line Direct Line PABX

Dest

Source

Date

X X

X X

Figure 4: The Conceptual Data Warehouse Schema for the base data considered in Figure 3.

but it may not explicitly include aggregation functions.

Aggregations are first class citizens of the representationlanguage: it is possible to describe the components of ag-gregations, and the relationships that the properties of thecomponents may have with the properties of the aggrega-tion itself; it is possible to build aggregations out of otheraggregations, i.e., it is possible for an aggregation to beexplicitly composed by other aggregations. This approachclosely resembles the one pursued by [Catarciet al., 1995,De Giacomo and Naggar, 1996], in the sense of proposinga conceptual data model in which aggregations are first-class entities intensionally described by means of theircomponents.

As we have pointed out, the description of an aggre-gation is not going to include a specification ofhow val-ues of its attributes are computed from attribute values ofits components using aggregation functions such as min,average, or sum. While including such constructs in theconceptual model is obviously important, if we restrict ourattention to data models which are computable (in a gen-eral sense), then we should be very conservative. The rea-son for this comes from an important result of the researchwithin the DWQ project which identifies the borders for thepossible extensions of a Data Warehouse Conceptual DataModel towards the explicit inclusion of aggregation func-tions [Baader and Sattler, 1998]. It has turned out that theexplicitpresence of aggregation functions, when viewed asa means to define new attribute values for aggregated enti-ties, and built-in predicates in a concrete domain increasesthe expressive power of the basic conceptual model in sucha way that all interesting inference problems may easilybecome undecidable. Moreover, this result is very tightlybounded: extending a very weak Conceptual Data Modelallowing only basic constructs with a weak form of aggre-gation already leads to the undecidability of reasoning –i.e., no terminating procedure solving the reasoning prob-lem may ever exist. On the other hand, recent researchhas shown that appropriate restrictions of the allowed ag-

gregation functions yield decidability of these problems.These results concern (1) the use of aggregation functionsin nested concepts, and (2) concrete domains like the inte-gers, the non-negative integers, the rationals, and the reals.

2.1 An extended Entity-Relationship Model

As stated in [Agrawalet al., 1995], a “good” data ware-house system should support user-definablemultiple hier-archies alongarbitrary dimensions. In Section 1.1 we havebriefly defined a dimension as an index for identifying mea-sures within a multidimensional data model. In the concep-tual data model, “dimension” is a synonym for a domain ofan attribute (or of attributes) that is structured by a hierar-chy and/or an order. In order to support multiple hierar-chies, the data model must provide means for defining andstructuring these hierarchies, and for arbitrary aggregationalong the hierarchies.

A conceptual data model where both multidimensionalaggregations and multiply hierarchically organised dimen-sions can be abstracted and described can be used inquery languages and for semantic optimization in mul-tidimensional data bases. In fact, in the few attemptswhere a cube algebraintroduces the notion of multi-ple dimensions and of levels within dimensions (e.g.,[Cabibbo and Torlone, 1997, Vassiliadis, 1998]) the DataWarehouse Conceptual Schema can serve as areferencemeta-modelfor deriving the inter-relations among levelsand dimensions.

Let us now consider a concrete example related to theanalysis of the average duration of telephone calls accord-ing to their dates and source types. The base data involvedin the analysis is represented at the conceptual level in Fig-ure 4. In order to perform the analysis, the two tables ofFigure 3 are materialised by the OLAP tool. Each cellin the bi-dimensional cube on top denotes the aggregationcomposed by all the telephone calls issued at some date(expressed as a day of the year) and originated by a partic-ular source (expressed as the type of the calling telephone);

E. Franconi, U. Sattler 13-4

Page 5: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

Call

duration

1,11,1

1,1

Pointcode

typeDay

Consumer Business

Cell Land Line Direct Line PABX

Point TypeAg-1

average(duration)

Dest

Source

Date

X

X X

Figure 5: The Conceptual Data Warehouse Schema for the upper cube considered in Figure 3.

the date and the source are thedimensionsof the cube,while the calls are thetarget. In particular, cell�� is the ag-gregation composed by all those calls issued on 3/1/99 andoriginating from a land line phone point. It is clear that� �

may include more than one call, and it may itself have someproperties which depend on all of its components. For ex-ample,�� may have the propertyaverage(duration)which denotes the average duration of all the calls issuedon 3/1/99 and originating from a land line phone point. Ofcourse, this property may be computed by an appropriateaggregation function from the propertyduration of thecomponents.

An adequate basic conceptual schema for this simplemultidimensional information base should include the baseentities such asCall, Day, andPhone Point and rela-tions such asdate andsource. Moreover, the schemashould also include an additionalaggregated entity, sayAg-1, namely the class denoting the aggregations of callsby date and source; such an aggregated entity can alsohave attributes such asaverage(duration). We canalso say thatAg-1 aggregates telephone calls accordingto the (basic) levelDay and the levelPoint Type ofthe dimensionsdate andsource, respectively. The en-tity Point Type is itself an aggregation, aggregating allthe specific telephone points according to their four basictypes. It is clear that�� is one of the aggregations denotedby Ag-1.

Figure 5 presents the schema in a variant of the Entity-Relationship data model. The particular way of repre-senting aggregated entities in the figure is inspired by[Catarciet al., 1995, De Giacomo and Naggar, 1996].

If we also consider as part of the multidimensional infor-mation base theaggregated viewrepresented by the secondcube of Figure 3—denoting the aggregation composed by

the telephone calls issued at some day of the week and orig-inated from some source type of a different level as before(aggregated now according to consumer and business typepoints)—more conceptual entities come into play. Figure 6presents the extensions required to the original schema.

Cell �� is the aggregation composed by all calls issuedon Friday from a consumer type phone. Similar to� �,�� may have the propertyaverage(duration)whichcomputesthe average duration of all those calls.

Thus, we need to add both a new aggregated entity andthe definitions of the newly introduced levels for the dimen-sionsdate andsource. The new aggregated entity,Ag-2, aggregates calls according to the levelWeek day andthe levelCustomer Type of the dimensionsdate andsource respectively. Then�� is one of the aggregationsdenoted byAg-2. The levelWeek Day is obtained by ag-gregating days from the partitioning of theDay entity intoseven sub-entities, namely the seven days of the week. ThelevelCustomer Type is obtained by aggregating phonepoints from the partitioning of thePoint entity into thetwo sub-entitiesConsumer andBusiness. CustomerType is called simple aggregation, since there is no dimen-sion involved in its definitions.Customer Type andWeek Day arelevelsin themultiply hierarchically organ-isedsource and date dimensions.

We do not formally define in this paper the syntax of theextended ER model.

2.2 Semantics of the extended ER Model

The semantics of an ER schema is given in terms of le-gal multidimensional database states, i.e. multidimensionaldatabases which conform to the constraints imposed by theschema. We consider as a starting point the ER semantics

E. Franconi, U. Sattler 13-5

Page 6: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

Call

duration

1,11,1

1,1

Pointcode

typeDay

Mon Tue Wed Thu Fri Sat Sun Consumer Business

Customer TypeWeek Day Ag-2

average(duration)

Dest

Source

Date

X X

Figure 6: The Conceptual Data Warehouse Schema for the lower cube of Figure 3.

introduced in [Calvaneseet al., 1998b], recasted to copewith multidimensional information. For we have chosenthe multidimensional logical data model�� introducedby [Cabibbo and Torlone, 1998].�� is independent ofany specific implementation of multidimensional databases(ROLAP or proprietary MOLAP), thus providing an ab-stract and general framework for the logical representationof multidimensional data. In [Cabibbo and Torlone, 1998]it is shown how a�� logical schema can be translatedinto a ROLAP logical representation in the form of a “star”schema, and into a general MOLAP logical representationin the form of sparse multidimensional arrays.�� abstracts notions such as dimension hierarchies

and levels, fact tables, cubes, and measures. As expected,dimensions are organised into hierarchies of levels, cor-responding to the various granularity of the basic data.Within a dimension, levels are related through roll-up func-tions. The central element of a�� schema is thef-table,representing factual data. Anf-table is the abstract logicalrepresentation of a multidimensional cube, and it is a func-tion associating symbolic coordinates (one per involved di-mension) to measures. According to the authors, a mul-tidimensional database state is thus aninstanceof a��logical schema: it is the description of the specific f-tablesinvolved, in the form, for example, of tables describing themapping from coordinates to measures.

Thus, a particular ER diagram denotes a set of multidi-mensional database states, i.e., the set of all possible mul-tidimensional databases described as�� instances whichconform to the diagram itself – i.e., they are legal states. If adiagram is inconsistent, then no multidimensional databasemay conform to it.

3 The basic Modelling Language

In this section we give a brief introduction to a ba-sic Description Logic, which will serve as the ba-

sic representation language for our DWCDM pro-posal. With respect to the formal apparatus, we willstrictly follow the concept language formalism intro-duced by [Schmidt-Schauß and Smolka, 1991] whose ex-tensions have been summarised in [Doniniet al., 1996,Calvaneseet al., 1999].

The basic types of a concept language areconcepts,roles, and features. A concept is a description gath-ering the common properties among a collection of in-dividuals; from a logical point of view it is a unarypredicate. Inter-relationships between these individualsare represented either by means of roles (which are in-terpreted as binary relations) or by means of features(which are interpreted as partial functions). Both rolesand features can be used to individuals to certain prop-erties. In the following, we will consider the Descrip-tion Logic����� [Horrocks and Sattler, 1999], extend-ing��� with features (i.e., functional roles), inverse roles,role composition, and role restrictions.

According to the syntax rules of Figure 7,����� con-cepts(denoted by the letters� and�) are built out ofcon-cept names(denoted by the letter�), roles(denoted by theletter���), andfeatures(denoted by the letters� ); rolesare built out ofrole names(denoted by the letter� ) andfeatures are built out offeature names(denoted by the letter�); it is worth noting that features are considered as specialcases of roles.

Let us now consider the formal semantics of the�����. We define themeaningof concepts as sets ofindividuals—as for unary predicates—and the meaning ofroles as sets of pairs of individuals—as for binary predi-cates. Formally, aninterpretationis a pair� � ��� � ���consisting of a set�� of individuals (thedomainof �) anda function�� (theinterpretation functionof �) mapping ev-ery concept� to a subset�� of �� , every role� to a sub-set�� of���� , and every feature to a partial function

E. Franconi, U. Sattler 13-6

Page 7: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

��� � � � � (concept name)� � ��� (top)� � ������ (bottom)�� � ���� �� (complement)� �� � ���� � � � � � � (conjunction)� �� � �� � � � � � � (disjunction)��.� � �� � �� (univ. quantifier)�.� � ����� � �� (exist. quantifier)� � � �������� �� (undefinedness)� � � ��� � �� (selection)

�� � � � (role name)� � � (feature)��� � ������� �� (inverse role)��� � ������� � �� (range restriction)� Æ � �������� � � � � � � (role chain)

�� � � � � (feature name)� Æ �������� � � � � � (feature chain)

Figure 7: Syntax rules for the����� Description Logic.

� from�� to�� , such that the equations in Figure 8 aresatisfied.

A knowledge base, in this context, is a finite set� of ter-minological axioms; it can also be called aterminologyorTBox. For a concept name�, and (possibly complex) con-cepts���, terminological axioms are of the form�

� �

(concept definition),� � (primitive concept definition),� � (general inclusion statement). An interpretation�satisfies� � if and only if the interpretation of� isincluded in the interpretation of�, i.e.,� � � �� . It isclear that the last kind of axiom is a generalisation of thefirst two: concept definitions of the type�

� �—where

� is a concept name—can be reduced to the pair of ax-ioms �� �� and�� ��. Another class of termino-logical axioms—pertaining to roles���—are of the form� �. Again, an interpretation� satisfies� � if andonly if the interpretation of�—which is now a set ofpairsof individuals—is included in the interpretation of�, i.e.,�� � �� . A non-empty interpretation� is a modelof aknowledge base� iff every terminological axiom of� issatisfied by�. If � has a model, then it issatisfiable; thus,checking for KB satisfiability is deciding whether there isat least one model for the knowledge base.� logically im-pliesan axiom� (written� �� �) if � is satisfied by everymodel of�. We say that a concept� is subsumedby aconcept� in a knowledge base� (written� �� � �)if �� � �� for every model� of �. A concept� is sat-isfiable, given a knowledge base�, if there is at least onemodel� of � such that�� � �, i.e.� �� �

� �. Con-

cept subsumption can be reduced to concept satisfiabilitysince� is subsumed by� in � if and only if �� � ��� isunsatisfiable in�.

����� was designed such that it is able to encodedatabase schemas expressed in the most interesting Se-mantic Data Models and Object-Oriented Data Models

�� � ��

�� � �

����� � �� � ��

�� ���� � �� ��

�� ���� � �� ���

���.��� � �� � �� � � . ��� � � �� � � ���

��.��� � �� � �� � . ��� � � �� � � ���

�� �� � �� � ��� ��

�� � ��� � �� � ����� � ����� � ���

������ � ���� � � �� ��� � � � �� � ���

������ � �� ��� � ���

�� Æ ��� � ��Æ ��

Figure 8: The semantics of�����.

[Horrocks and Sattler, 1999, Calvaneseet al., 1998b]. Re-cently, strictly more expressive conceptual data modelsbased on DLs have been considered, most notably the��� conceptual data modelling formalism.��� wasfirst introduced by [Calvaneseet al., 1998a] as a meansfor encoding conjunctive queries over expressive seman-tic data models for information systems such as ex-tended Entity Relationship in the context of schema in-tegration. We have chosen to limit the expressivity ofthe full ��� since we are looking for a language im-plementable with the current technology, but still capa-ble to encode an interesting enhancement of the ER for-malism. In particular, we have developed sophisticatedreasoning algorithms for it [Horrocks and Sattler, 1999]and experimented them using the current academic im-plementations of expressive DLs, namely the systemsFaCT [Horrocks, 1998] and iFaCT. It has been re-cently demonstrated [Horrocks and Patel-Schneider, 1999]that the logic we are considering here allows for the imple-mentation of sound and complete reasoning algorithms thatbehave quite well both in realistic applications and system-atic tests.

4 Encoding ER schemas with Aggregations

It is shown how a schema expressed in the conceptual datamodel informally introduced in the previous section can beexpressed in an����� knowledge base—whose modelscorrespond with legal multidimensional database states ofthe ER diagram—allowing for reasoning services such assatisfiability of a schema or the computation of a logicallyimplied statement.

In the following, we describe the translation between anER diagram and an����� knowledge base.

Definition 1 (Translation)An ER schema� is translated into a corresponding knowl-

edge base� where for each domain, entity, aggregation, orrelationship symbol a concept name is introduced, and for

E. Franconi, U. Sattler 13-7

Page 8: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

each attribute or ER-role symbol1 symbol a feature nameis introduced. The terminology� is defined to contain thefollowing axioms:

� For eachISA link between two entities��� (resp. tworelationships���) in �, � contains:� � (resp.� �)

� For eachPARTITION of an entity� into sub entities�� �� in �, � contains:� �� � � ���� ��� for all � � �

�� � for all �

� For each attribute� in � with domain� of an entity� (resp. of a relationship�),� contains:� � � � (resp.� � � �)

� For each relationship� in � relating � entities�� �� by means of the ER-roles� �

�� ��

��, �

contains:� ���

��� ��� � � ���

��� ���

� For each minimum cardinality constraint� � � in anER-role��

� in � relating a relationship� with andentity � (total or mandatory participation),� con-tains:� ����

� ���.�

� For each aggregationAg in � with targets�� ��,� contains:Ag ��������.�� � �������.�� � � ��

� For each aggregationAg in � involving a target� , �dimensions�� (each one being a relationship in�)and corresponding� levels�� (each one being eitheran entity�� or a simple aggregationAg� in �), �contains:Ag �������.

�������

� ��� ������

��.�� �

�������

� ��� ������

��.��

�� � � � � �

������

� ��� ������

��.��

��� �

� � � �������

� ��� ��� �����

.�� ��������

� ��� ��� �����

.��

�� � � � � �

������� ��� ��� �

����

.��� ���

where��� � �� if the level � is described by anentity ��; otherwise, if the level is described by asimple aggregationAg�, we use its targets��� � �

�� .

1ER-roles are the names given to the arguments of relationships; weassume that a unique name is given within a relationship to each ER-role,representing a specific participation of an entity in the relationship.

Extending the results of [Calvaneseet al., 1994] to thecase of multidimensional databases, it can be proved thatthe translation is correct, in the sense that whenever a rea-soning problem has a specific solution in the ER model,then the corresponding reasoning problem in the DL has acorresponding solution, and vice-versa. This is groundedon the fact that there is a precise correspondence betweenlegal multidimensional databases of� and models of�.Thus, it is possible to exploit DL reasoning procedures forsolving reasoning problems in the ER model. The reason-ing problems we are mostly interested in areconsistencyofa ER schema—which is mapped to a satisfiability problemin the corresponding DL knowledge base—andlogical im-plication within a ER schema—which is mapped to a log-ical implication problem in the corresponding DL knowl-edge base.

The proof is based by establishing the existence of twomappings from legal multidimensional database states of�to models of� and vice-versa. Informally speaking, the ex-istence of the mappings ensures that, whenever an aggrega-tion is satisfiable in�, then a non-empty mapping describ-ing the corresponding f-table in� exists, and vice-versa.The same applies for level orderings and roll-up functionsin �. A more detailed sketch of the proof will be given inthe full paper.

As a final remark, it should be noted that the high ex-pressivity of DL constructs can capture an extended versionof the basic ER model, which includes not only taxonomicrelationships, but also arbitrary boolean constructs to repre-sent so called generalized hierarchies with disjoint unions;entity definitions by means of either necessary or sufficientconditions or both, and integrity constraints expressed bymeans of generalised axioms [Calvaneseet al., 1998b].

Let us now consider the example introduced in Sec-tion 2. We start to (partially) formalise the schema of Fig-ure 4, i.e., the base data. Please recall that every role namewhich appears in the translation of an ER schema in a De-scription Logic knowledge base—with the exception of theaggregation roles—is a functional role name.

��� ��� � �� � ��� � ���

����� ��� � �� � ���� � �����

��� ��� � �� � ���� � �����

����� ��������� ��������

�������� ������ ���������

�������� ������ ���������

The partitioning of days into the seven day of the week istranslated in a similar way.The aggregated entity Customer Type is the simple aggre-gation of telephone points into two categories:

��������-����

E. Franconi, U. Sattler 13-8

Page 9: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

Call

duration

1,11,1

1,1

Pointcode

typeDay

Consumer Business

Cell Land Line Direct Line PABXMobile Call

Dest

Source

Source

Date

X

X X

Figure 9: A Conceptual Data Warehouse Schema introducing the entityMobile Call.

��������.����������.���������� ���������

The Week Day simple aggregation is obtained in a similarway.The aggregated entityAg-2 is defined as being an aggre-gation composed by those calls issued in some day of theweek and originated by either a consumer telephone pointor a business telephone point:

��-� ��������.�� � �������.�� ��-� �������. �������� ������� �����.��

�������� ������� �����.���������������� ������� �����.��������� �������� ���� ����.���������� ���� ����.��� �� � � �������� ���� ����.�����

Recall thatAg-2 is the class of all aggregations such thateach one of them aggregates calls issued at the same day ofthe week and originated from the same telephone point.

Each aggregation of calls belonging to the class denotedby Ag-2 includes either only consumer originated calls oronly business originated calls. In a similar way, each aggre-gation ofAg-2 includes either only calls issued on Mon-day, or only calls issued on Tuesday, etc. Thus, aggrega-tions denoted byAg-2 may be of fourteen possible types:Monday consumer, Monday business, Tuesday consumer,Tuesday business, etc.

As an example of reasoning, let us see a case with aninconsistent aggregation. If we introduce the entity Mo-bile Call as in Figure 9, it turns out that the aggregated en-tity having Mobile Call as target (instead of its super entityCall) and Business as level for the dimension Source is in-consistent, i.e., the materialised cube is necessarily empty.In fact, the translated theory in Description Logics turns outto be unsatisfiable, since mobile calls are originated onlyfrom cell points, which are disjoint from any kind of busi-

ness phone point.

5 Conclusions

We have introduced aData Warehouse Conceptual DataModel, extending the most interesting traditional SemanticData Models and Object-Oriented Data Models, which al-lows the representation of a multidimensional conceptualview of data. We have seen how the proposed conceptualdata model is able to introduce complex descriptions of thestructure of aggregated entities and multiply hierarchicallyorganised dimensions. In order to support multiple hier-archies, the data model provides means for defining andstructuring these hierarchies, and for arbitrary aggregationalong the hierarchies. Our future work will be devoted to afurther development of the data model in order to explicitlyconsider temporal and spatial dimensions, and a study ofthe expressivity in relation with decidability and complex-ity of the refinementreasoning task.

References

[Agrawalet al., 1995] Agrawal, R.; Gupta, A.; andSarawagi, S. 1995. Modeling multidimensionaldatabases. Technical report, IBM Almaden ResearchCenter, San Jose, California. Proc. of ICDE’97.

[Baader and Sattler, 1998] Baader, Franz and Sattler, Ul-rike 1998. Description logics with concrete domains andaggregation. InProceedings of the 13th European Con-ference on Artificial Intelligence (ECAI-98). 336–340.

[Baaderet al., 1999] Baader, Franz; Franconi, Enrico; andSattler, Ulrike 1999.Multidimensional Data Models andAggregation. Springer-Verlag. chapter 4. Edited by M.Jarke, M. Lenzerini, Y. Vassilious and P. Vassiliadis.

[Cabibbo and Torlone, 1997] Cabibbo, Luca and Torlone,Riccardo 1997. Querying multidimensional databases.In proc. Sixth Int. Workshop on Database ProgrammingLanguages (DBPL-97). 253–269.

E. Franconi, U. Sattler 13-9

Page 10: DWH - A DataWarehouse Conceptual Data Model for Multidimensional

[Cabibbo and Torlone, 1998] Cabibbo, Luca and Torlone,Riccardo 1998. A logical approach to multidimensionaldatabases. InProc. of EDBT’98.

[Calvaneseet al., 1994] Calvanese, Diego; Lenzerini,Maurizio; and Nardi, Daniele 1994. A unified frame-work for class-based representation formalisms. InProc. of KR-94, Bonn D.

[Calvaneseet al., 1998a] Calvanese, D.; De Giacomo, G.;Lenzerini, M.; Nardi, D.; and Rosati, R. 1998a. De-scription logic framework for information integration. InProceedings of the 6th International Conference on thePrinciples of Knowledge Representation and Reasoning(KR-98). Morgan Kaufmann. 2–13.

[Calvaneseet al., 1998b] Calvanese, D.; Lenzerini, M.;and Nardi, D. 1998b. Description logics for conceptualdata modeling. In Chomicki, Jan and Saake, G¨unter, ed-itors 1998b,Logics for Databases and Information Sys-tems. Kluwer.

[Calvaneseet al., 1998c] Calvanese, Diego; Giacomo,Giuseppe De; Lenzerini, Maurizio; Nardi, Daniele; andRosati, Riccardo 1998c. Information integration: Con-ceptual modeling and reasoning support. InProc. ofthe 6th Int. Conf. on Cooperative Information Systems(CoopIS’98). 280–291.

[Calvaneseet al., 1999] Calvanese, Diego; De Giacomo,Giuseppe; Lenzerini, Maurizio; and Nardi, Daniele1999. Reasoning in expressive description logics. InRobinson, Alan and Voronkov, Andrei, editors 1999,Handbook of Automated Reasoning. Elsevier SciencePublishers, Amsterdam. To appear.

[Catarciet al., 1995] Catarci, Tiziana; D’Angolini, Gio-vanna; and Lenzerini, Maurizio 1995. Conceptual lan-guage for statistical data modeling.Data & KnowledgeEngineering (DKE)17:93–125.

[Cohenet al., 1999] Cohen, S.; Nutt, W.; and Serebrenik,A. 1999. Rewriting aggregate queries using views. InProc. of PODS’99. To appear.

[De Giacomo and Naggar, 1996] De Giacomo, G. andNaggar, P. 1996. Conceptual data model with structuredobjects for statistical databases. InProceedings of theEighth International Conference on Statistical DatabaseManagement Systems (SSDBM’96). IEEE Computer So-ciety Press. 168–175.

[Donini et al., 1996] Donini, F.; Lenzerini, M.; Nardi, D.;and Schaerf, A. 1996. Reasoning in description logics.In Brewka, G., editor 1996,Principles of KnowledgeRepresentation and Reasoning. Studies in Logic, Lan-guage and Information, CLSI Publications. 193–238.

[Franconi and Sattler, 1999] Franconi, Enrico and Sattler,Ulrike 1999. A data warehouse conceptual data modelfor multidimensional aggregation: a preliminary report.Journal of the Italian Association for Artificial Intelli-genceAI*IA Notizie 9–21.

[Horrocks and Patel-Schneider, 1999] Horrocks, I. andPatel-Schneider, P. F. 1999. Optimising descriptionlogic subsumption.Journal of Logic and Computation.To appear.

[Horrocks and Sattler, 1999] Horrocks, Ian and Sattler,Ulrike 1999. A description logic with transitive and in-verse roles and role hierarchies.Journal of Logic andComputation. To appear.

[Horrocks, 1998] Horrocks, I. 1998. Using an expressivedescription logic: FaCT or fiction? InProc. of the 6�

International Conference on Principles of KnowledgeRepresentation and Reasoning, Trento, Italy. 636–647.

[Nutt et al., 1998] Nutt, Werner; Sagiv, Yehoshua; andShurin, Sara 1998. Deciding equivalences among ag-gregate queries. InProc. of PODS’98. 214–223.

[Schmidt-Schauß and Smolka, 1991] Schmidt-Schauß, M.and Smolka, G. 1991. Attributive concept descriptionswith complements.Artificial Intelligence48(1):1–26.

[Vassiliadis, 1998] Vassiliadis, P. 1998. Modeling multi-dimensional databases, cubes and cube operations. InProc. of the 10th SSDBM Conference, Capri, Italy.

E. Franconi, U. Sattler 13-10