

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Chapter I

Conceptual Modeling Solutions for the Data Warehouse

Stefano Rizzi
DEIS-University of Bologna, Italy

Abstract

In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues. This chapter focuses on a conceptual model called the DFM that suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.


Introduction

Operational databases are focused on recording transactions, thus they are prevalently characterized by an OLTP (online transaction processing) workload. Conversely, data warehouses (DWs) allow complex analysis of data aimed at decision support; the workload they support has completely different characteristics, and is widely known as OLAP (online analytical processing). Traditionally, OLAP applications are based on multidimensional modeling that intuitively represents data under the metaphor of a cube whose cells correspond to events that occurred in the business domain (Figure 1). Each event is quantified by a set of measures; each edge of the cube corresponds to a relevant dimension for analysis, typically associated with a hierarchy of attributes that further describe it. The multidimensional model has a twofold benefit. On the one hand, it is close to the way of thinking of data analysts, who are used to the spreadsheet metaphor; therefore it helps users understand data. On the other hand, it supports performance improvement as its simple structure allows designers to predict the user intentions.

Figure 1. The cube metaphor for multidimensional modeling

Multidimensional modeling and OLAP workloads require specialized design techniques. In the context of design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues. Conceptual modeling is widely recognized to be the necessary foundation for building a database that is well-documented and fully satisfies the user requirements; usually, it relies on a graphical notation that facilitates writing, understanding, and managing conceptual schemata by both designers and users.

Unfortunately, in the field of data warehousing there still is no consensus about a formalism for conceptual modeling (Sen & Sinha, 2005). The entity/relationship (E/R) model is widespread in the enterprises as a conceptual formalism to provide standard documentation for relational information systems, and a great deal of effort has been made to use E/R schemata as the input for designing nonrelational databases as well (Fahrner & Vossen, 1995); nevertheless, as E/R is oriented to support queries that navigate associations between data rather than synthesize them, it is not well suited for data warehousing (Kimball, 1996). Actually, the E/R model has enough expressivity to represent most concepts necessary for modeling a DW; on the other hand, in its basic form, it is not able to properly emphasize the key aspects of the multidimensional model, so that its usage for DWs is expensive from the point of view of the graphical notation and not intuitive (Golfarelli, Maio, & Rizzi, 1998).

Some designers claim to use star schemata for conceptual modeling. A star schema is the standard implementation of the multidimensional model on relational platforms; it is just a (denormalized) relational schema, so it merely defines a set of relations and integrity constraints. Using the star schema for conceptual modeling is like starting to build a complex software system by writing the code, without the support of any static, functional, or dynamic model, which typically leads to very poor results from the points of view of adherence to user requirements, of maintenance, and of reuse.

For all these reasons, in the last few years the research literature has proposed several original approaches for modeling a DW, some based on extensions of E/R, some on extensions of UML. This chapter focuses on an ad hoc conceptual model, the dimensional fact model (DFM), that was first proposed in Golfarelli et al. (1998) and continuously enriched and refined during the following years in order to optimally suit the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, namely facts, dimensions, measures, and hierarchies, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.

After reviewing the related literature in the next section, in the third and fourth sections, we introduce the constructs of DFM for basic and advanced modeling, respectively. Then, in the fifth section we briefly discuss the different methodological approaches to conceptual design. Finally, in the sixth section we outline the open issues in conceptual modeling, and in the last section we draw the conclusions.


Related Literature

In the context of data warehousing, the literature proposed several approaches to multidimensional modeling. Some of them have no graphical support and are aimed at establishing a formal foundation for representing cubes and hierarchies as well as an algebra for querying them (Agrawal, Gupta, & Sarawagi, 1995; Cabibbo & Torlone, 1998; Datta & Thomas, 1997; Franconi & Kamble, 2004a; Gyssens & Lakshmanan, 1997; Li & Wang, 1996; Pedersen & Jensen, 1999; Vassiliadis, 1998); since we believe that a distinguishing feature of conceptual models is that of providing a graphical support to be easily understood by both designers and users when discussing and validating requirements, we will not discuss them.

The approaches to “strict” conceptual modeling for DWs devised so far are summarized in Table 1. For each model, the table shows if it is associated to some method for conceptual design and if it is based on E/R, is object-oriented, or is an ad hoc model.

Table 1. Approaches to conceptual modeling

            E/R extension                 object-oriented               ad hoc
no method   Franconi and Kamble (2004b);  Abelló et al. (2002);         Tsois et al. (2001)
            Sapia et al. (1998);          Nguyen, Tjoa, and Wagner
            Tryfona et al. (1999)         (2000)
method                                    Luján-Mora et al. (2002)      Golfarelli et al. (1998);
                                                                        Hüsemann et al. (2000)

The discussion about whether E/R-based, object-oriented, or ad hoc models are preferable is controversial. Some claim that E/R extensions should be adopted since (1) E/R has been tested for years; (2) designers are familiar with E/R; (3) E/R has proven flexible and powerful enough to adapt to a variety of application domains; and (4) several important research results were obtained for the E/R (Sapia, Blaschka, Hofling, & Dinter, 1998; Tryfona, Busborg, & Borch Christiansen, 1999). On the other hand, advocates of object-oriented models argue that (1) they are more expressive and better represent static and dynamic properties of information systems; (2) they provide powerful mechanisms for expressing requirements and constraints; (3) object-orientation is currently the dominant trend in data modeling; and (4) UML, in particular, is a standard and is naturally extensible (Abelló, Samos, & Saltor, 2002; Luján-Mora, Trujillo, & Song, 2002). Finally, we believe that ad hoc models compensate for the lack of familiarity from designers with the fact that (1) they achieve better notational economy; (2) they give proper emphasis to the peculiarities of the multidimensional model, thus (3) they are more intuitive and readable by nonexpert users. In particular, they can model some constraints related to functional dependencies (e.g., convergences and cross-dimensional attributes) in a simpler way than UML, that requires the use of formal expressions written, for instance, in OCL.

A comparison of the different models done by Tsois, Karayannidis, and Sellis (2001) pointed out that, abstracting from their graphical form, the core expressivity is similar. In confirmation of this, we show in Figure 2 how the same simple fact could be modeled through an E/R based, an object-oriented, and an ad hoc approach.

Figure 2. The SALE fact modeled through a starER (Sapia et al., 1998), a UML class diagram (Luján-Mora et al., 2002), and a fact schema (Hüsemann, Lechtenbörger, & Vossen, 2000)


The Dimensional Fact Model: Basic Modeling

In this chapter we focus on an ad hoc model called the dimensional fact model. The DFM is a graphical conceptual model, specifically devised for multidimensional modeling, aimed at:

• Effectively supporting conceptual design
• Providing an environment on which user queries can be intuitively expressed
• Supporting the dialogue between the designer and the end users to refine the specification of requirements
• Creating a stable platform to ground logical design
• Providing an expressive and non-ambiguous design documentation

The representation of reality built using the DFM consists of a set of fact schemata. The basic concepts modeled are facts, measures, dimensions, and hierarchies. In the following we intuitively define these concepts, referring the reader to Figure 3 that depicts a simple fact schema for modeling invoices at line granularity; a formal definition of the same concepts can be found in Golfarelli et al. (1998).

Definition 1: A fact is a focus of interest for the decision-making process; typically, it models a set of events occurring in the enterprise world. A fact is graphically represented by a box with two sections, one for the fact name and one for the measures.

Figure 3. A basic fact schema for the INVOICE LINE fact

Examples of facts in the trade domain are sales, shipments, purchases, claims; in the financial domain: stock exchange transactions, contracts for insurance policies, granting of loans, bank statements, credit card purchases. It is essential for a fact to have some dynamic aspects, that is, to evolve somehow across time.

Guideline 1: The concepts represented in the data source by frequently-updated archives are good candidates for facts; those represented by almost-static archives are not.

As a matter of fact, very few things are completely static; even the relationship between cities and regions might change, if some border were revised. Thus, the choice of facts should be based either on the average periodicity of changes, or on the specific interests of analysis. For instance, assigning a new sales manager to a sales department occurs less frequently than coupling a promotion to a product; thus, while the relationship between promotions and products is a good candidate to be modeled as a fact, that between sales managers and departments is not, except for the personnel manager, who is interested in analyzing the turnover!

Definition 2: A measure is a numerical property of a fact, and describes one of its quantitative aspects of interest for analysis. Measures are included in the bottom section of the fact.

For instance, each invoice line is measured by the number of units sold, the price per unit, the net amount, and so forth. The reason why measures should be numerical is that they are used for computations. A fact may also have no measures, if the only interesting thing to be recorded is the occurrence of events; in this case the fact schema is said to be empty and is typically queried to count the events that occurred.

Definition 3: A dimension is a fact property with a finite domain and describes one of its analysis coordinates. The set of dimensions of a fact determines its finest representation granularity. Graphically, dimensions are represented as circles attached to the fact by straight lines.

Typical dimensions for the invoice fact are product, customer, agent, and date.


Guideline 2: At least one of the dimensions of the fact should represent time, at any granularity.

The relationship between measures and dimensions is expressed, at the instance level, by the concept of event.

Definition 4: A primary event is an occurrence of a fact, and is identified by a tuple of values, one for each dimension. Each primary event is described by one value for each measure.

Primary events are the elemental information which can be represented (in the cube metaphor, they correspond to the cube cells). In the invoice example they model the invoicing of one product to one customer made by one agent on one day; it is not possible to distinguish between invoices possibly made with different types (e.g., active, passive, returned, etc.) or in different hours of the day.

Guideline 3: If the granularity of primary events as determined by the set of dimensions is coarser than the granularity of tuples in the data source, measures should be defined as either aggregations of numerical attributes in the data source, or as counts of tuples.
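Guideline 3 can be illustrated with a minimal sketch. It assumes a hypothetical data source whose tuples also record the hour of the day, a detail that the fact dimensions (product, customer, agent, date) do not keep; the row layout, the sample values, and the measure names quantity and num_lines are illustrative assumptions, not part of the DFM.

```python
from collections import defaultdict

# Hypothetical source tuples: (product, customer, agent, day, hour, quantity).
# The source is finer than the fact because it also records the hour.
source_rows = [
    ("milk", "Smith", "Black", "2004-03-01", 9, 4),
    ("milk", "Smith", "Black", "2004-03-01", 15, 6),
    ("shirt", "Smith", "Black", "2004-03-01", 11, 2),
]


def build_primary_events(rows):
    """Group source tuples by the fact dimensions (product, customer, agent, day)."""
    events = defaultdict(lambda: {"quantity": 0, "num_lines": 0})
    for product, customer, agent, day, _hour, quantity in rows:
        cell = events[(product, customer, agent, day)]
        cell["quantity"] += quantity  # aggregation of a numerical source attribute
        cell["num_lines"] += 1        # count of source tuples
    return dict(events)


primary_events = build_primary_events(source_rows)
```

Each key of primary_events is a tuple with one value per dimension, so it plays the role of a cube cell; the two source tuples for milk collapse into a single primary event.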

Remarkably, some multidimensional models in the literature focus on treating dimensions and measures symmetrically (Agrawal et al., 1995; Gyssens & Lakshmanan, 1997). This is an important achievement from both the point of view of the uniformity of the logical model and that of the flexibility of OLAP operators. Nevertheless we claim that, at a conceptual level, distinguishing between measures and dimensions is important since it allows logical design to be more specifically aimed at the efficiency required by data warehousing applications.

Aggregation is the basic OLAP operation, since it allows significant information useful for decision support to be summarized from large amounts of data. From a conceptual point of view, aggregation is carried out on primary events thanks to the definition of dimension attributes and hierarchies.

Definition 5: A dimension attribute is a property, with a finite domain, of a dimension. Like dimensions, it is represented by a circle.

For instance, a product is described by its type, category, and brand; a customer, by its city and its nation. The relationships between dimension attributes are expressed by hierarchies.


Definition 6: A hierarchy is a directed tree, rooted in a dimension, whose nodes are all the dimension attributes that describe that dimension, and whose arcs model many-to-one associations between pairs of dimension attributes. Arcs are graphically represented by straight lines.

Guideline 4: Hierarchies should reproduce the pattern of interattribute functional dependencies expressed by the data source.

Hierarchies determine how primary events can be aggregated into secondary events and selected significantly for the decision-making process. The dimension in which a hierarchy is rooted defines its finest aggregation granularity, while the other dimension attributes define progressively coarser granularities. For instance, thanks to the existence of a many-to-one association between products and their categories, the invoicing events may be grouped according to the category of the products.

Definition 7: Given a set of dimension attributes, each tuple of their values identifies a secondary event that aggregates all the corresponding primary events. Each secondary event is described by a value for each measure that summarizes the values taken by the same measure in the corresponding primary events.
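As a rough sketch of Definition 7, the following code rolls primary events up along a many-to-one arc from product to category. The mapping CATEGORY_OF, the sample events, and the choice of sum as the summarizing operator are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical many-to-one arc of the product hierarchy: product -> category.
CATEGORY_OF = {"milk": "food", "bread": "food", "shirt": "clothing"}

# Hypothetical primary events keyed by (product, date) with one additive measure.
primary = {
    ("milk", "2004-03-01"): 10,
    ("bread", "2004-03-01"): 5,
    ("shirt", "2004-03-01"): 2,
    ("milk", "2004-03-02"): 7,
}


def roll_up_to_category(events):
    """Aggregate primary events into secondary events keyed by (category, date)."""
    secondary = defaultdict(int)
    for (product, day), quantity in events.items():
        # Each product has exactly one category, so every primary event
        # contributes to exactly one secondary event.
        secondary[(CATEGORY_OF[product], day)] += quantity
    return dict(secondary)


secondary = roll_up_to_category(primary)
```

Because the arc is many-to-one, the grand total of the measure is the same before and after the roll-up.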

We close this section by surveying some alternative terminology used either in the literature or in the commercial tools. There is substantial agreement on using the term dimensions to designate the “entry points” to classify and identify events; while we refer in particular to the attribute determining the minimum fact granularity, sometimes the whole hierarchies are named as dimensions (for instance, the term “time dimension” often refers to the whole hierarchy built on dimension date). Measures are sometimes called variables or metrics. Finally, in some data warehousing tools, the term hierarchy denotes each single branch of the tree rooted in a dimension.

The Dimensional Fact Model: Advanced Modeling

The constructs we introduce in this section, with the support of Figure 4, are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity. Though some of them are not necessary in the simplest and most common modeling situations, they are quite useful in order to better express the multitude of conceptual shades that characterize real-world scenarios. In particular we will see how, following the introduction of some of these constructs, hierarchies will no longer be defined as trees to become, in the general case, directed graphs.

Descriptive Attributes

In several cases it is useful to represent additional information about a dimension attribute, though it is not interesting to use such information for aggregation. For instance, the user may ask to know the address of each store, but the user will hardly be interested in aggregating sales according to the address of the store.

Definition 8: A descriptive attribute specifies a property of a dimension attribute, to which it is related by an x-to-one association. Descriptive attributes are not used for aggregation; they are always leaves of their hierarchy and are graphically represented by horizontal lines.

There are two main reasons why a descriptive attribute should not be used for aggregation:

Guideline 5: A descriptive attribute either has a continuously-valued domain (for instance, the weight of a product), or is related to a dimension attribute by a one-to-one association (for instance, the address of a customer).

Cross-Dimension Attributes

Definition 9: A cross-dimension attribute is a (either dimension or descriptive) attribute whose value is determined by the combination of two or more dimension attributes, possibly belonging to different hierarchies. It is denoted by connecting through a curve line the arcs that determine it.

For instance, if the VAT on a product depends on both the product category and the state where the product is sold, it can be represented by a cross-dimension attribute as shown in Figure 4.
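The VAT example can be sketched as a lookup keyed by a combination of attributes from two different hierarchies. The product names, state codes, and rates below are invented placeholders, chosen only to show that neither attribute alone determines the value.

```python
# Hypothetical arc of the product hierarchy: product -> category.
CATEGORY_OF = {"milk": "food", "shirt": "clothing"}

# Hypothetical VAT rates, determined by the pair (category, state).
# The figures are made up for illustration.
VAT_RATE = {
    ("food", "CA"): 0.00,
    ("clothing", "CA"): 0.07,
    ("food", "NY"): 0.00,
    ("clothing", "NY"): 0.04,
}


def vat_rate(product, state):
    """VAT depends on the product category and on the state where it is sold."""
    return VAT_RATE[(CATEGORY_OF[product], state)]
```

The same category maps to different rates in different states, and the same state maps to different rates for different categories, which is what makes VAT a cross-dimension attribute in this sketch.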


Convergence

Consider the geographic hierarchy on dimension customer (Figure 4): customers live in cities, which are grouped into states belonging to nations. Suppose that customers are grouped into sales districts as well, and that no inclusion relationships exist between districts and cities/states; on the other hand, sales districts never cross the nation boundaries. In this case, each customer belongs to exactly one nation whichever of the two paths is followed (customer → city → state → nation or customer → sales district → nation).

Definition 10: A convergence takes place when two dimension attributes within a hierarchy are connected by two or more alternative paths of many-to-one associations. Convergences are represented by letting two or more arcs converge on the same dimension attribute.

The existence of apparently equal attributes does not always determine a convergence. If in the invoice fact we had a brand city attribute on the product hierarchy, representing the city where a brand is manufactured, there would be no convergence with attribute (customer) city, since a product manufactured in a city can obviously be sold to customers of other cities as well.
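A convergence implies a constraint on the data: both paths must lead every customer to the same nation. The sketch below checks that constraint on hypothetical mappings; all names and values are illustrative assumptions.

```python
# Hypothetical many-to-one arcs of the customer hierarchy.
CITY_OF = {"Smith": "Los Angeles", "Tremblay": "Montreal"}
STATE_OF = {"Los Angeles": "California", "Montreal": "Quebec"}
NATION_OF_STATE = {"California": "USA", "Quebec": "Canada"}
DISTRICT_OF = {"Smith": "West Coast", "Tremblay": "East Canada"}
NATION_OF_DISTRICT = {"West Coast": "USA", "East Canada": "Canada"}


def nation_via_city(customer):
    """Follow customer -> city -> state -> nation."""
    return NATION_OF_STATE[STATE_OF[CITY_OF[customer]]]


def nation_via_district(customer, nation_of_district=NATION_OF_DISTRICT):
    """Follow customer -> sales district -> nation."""
    return nation_of_district[DISTRICT_OF[customer]]


def convergence_violations(customers, nation_of_district=NATION_OF_DISTRICT):
    """Customers for which the two alternative paths disagree on the nation."""
    return [
        c for c in customers
        if nation_via_city(c) != nation_via_district(c, nation_of_district)
    ]
```

An empty result means the two paths can safely be drawn as converging on nation; a district that crosses a national border would show up as a violation.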

Figure 4. The complete fact schema for the INVOICE LINE fact


Optional Arcs

Definition 11: An optional arc models the fact that an association represented within the fact schema is undefined for a subset of the events. An optional arc is graphically denoted by marking it with a dash.

For instance, attribute diet takes a value only for food products; for the other products, it is undefined.

In the presence of a set of optional arcs exiting from the same dimension attribute, their coverage can be denoted in order to pose a constraint on the optionalities involved. As for IS-A hierarchies in the E/R model, the coverage of a set of optional arcs is characterized by two independent coordinates. Let a be a dimension attribute, and b1, ..., bm be its children attributes connected by optional arcs:

• The coverage is total if each value of a always corresponds to a value for at least one of its children; conversely, if some values of a exist for which all of its children are undefined, the coverage is said to be partial.

• The coverage is disjoint if each value of a corresponds to a value for, at most, one of its children; conversely, if some values of a exist that correspond to values for two or more children, the coverage is said to be overlapped.

Thus, overall, there are four possible coverages, denoted by T-D, T-O, P-D, and P-O. Figure 4 shows an example of optionality annotated with its coverage. We assume that products can have three types: food, clothing, and household; since expiration date and size are defined only for food and clothing, respectively, the coverage is partial and disjoint.
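The two coverage coordinates can be computed from instance data, as in the following sketch. It assumes that each optional child is given as a mapping from parent values to child values, with missing keys standing for undefined values; the function name and the sample products are hypothetical.

```python
def coverage(parent_values, children):
    """Classify the coverage of a set of optional arcs as T-D, T-O, P-D, or P-O.

    children maps each child attribute name to a dict {parent value: child value};
    a parent value missing from the dict means the child is undefined for it.
    """
    # For every parent value, count how many children are defined.
    counts = [
        sum(1 for defined in children.values() if value in defined)
        for value in parent_values
    ]
    total = all(n >= 1 for n in counts)     # at least one child is always defined
    disjoint = all(n <= 1 for n in counts)  # at most one child is ever defined
    return ("T" if total else "P") + "-" + ("D" if disjoint else "O")


# Hypothetical products: one food item, one clothing item, one household item.
products = ["milk", "shirt", "pan"]
optional_children = {
    "expiration date": {"milk": "2005-01-31"},
    "size": {"shirt": "M"},
}
```

With these sample values the household item has neither an expiration date nor a size, so the coverage is partial, and no product has both, so it is disjoint.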

Multiple Arcs

In most cases, as already said, hierarchies include attributes related by many-to-one associations. On the other hand, in some situations it is necessary to include also attributes that, for a single value taken by their father attribute, take several values.

Definition 12: A multiple arc is an arc, within a hierarchy, modeling a many-to-many association between the two dimension attributes it connects. Graphically, it is denoted by doubling the line that represents the arc.


Consider the fact schema modeling the sales of books in a library, represented in Figure 5, whose dimensions are date and book. Users will probably be interested in analyzing sales for each book author; on the other hand, since some books have two or more authors, the relationship between book and author must be modeled as a multiple arc.

Guideline 6: In presence of many-to-many associations, summarizability is no longer guaranteed, unless the multiple arc is properly weighted. Multiple arcs should be used sparingly since, in ROLAP logical design, they require complex solutions.

Summarizability is the property of correctly summarizing measures along hierarchies (Lenz & Shoshani, 1997). Weights restore summarizability, but their introduction is artificial in several cases; for instance, in the book sales fact, each author of a multiauthored book should be assigned a normalized weight expressing her “contribution” to the book.
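The effect of weighting a multiple arc can be seen in a small sketch. The book titles, author names, sales figures, and the equal split of weights are assumptions chosen for illustration; the only property relied upon is that the weights of each book sum to one.

```python
from collections import defaultdict

# Hypothetical sales per book.
sales = {"Book A": 100.0, "Book B": 40.0}

# Hypothetical multiple arc book -> author with normalized weights
# (the weights of each book sum to 1).
AUTHORS = {
    "Book A": {"Rossi": 0.5, "Bianchi": 0.5},
    "Book B": {"Rossi": 1.0},
}


def sales_by_author(book_sales, authors, weighted=True):
    """Aggregate book sales by author, optionally applying the arc weights."""
    totals = defaultdict(float)
    for book, amount in book_sales.items():
        for author, weight in authors[book].items():
            totals[author] += amount * (weight if weighted else 1.0)
    return dict(totals)


weighted_totals = sales_by_author(sales, AUTHORS)
unweighted_totals = sales_by_author(sales, AUTHORS, weighted=False)
```

Summing the weighted totals over all authors gives back the overall sales, whereas the unweighted totals count the coauthored book twice.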

Figure 5. The fact schema for the SALES fact

Shared Hierarchies

Sometimes, large portions of hierarchies are replicated twice or more in the same fact schema. A typical example is the temporal hierarchy: a fact frequently has more than one dimension of type date, with different semantics, and it may be useful to define on each of them a temporal hierarchy month-week-year. Other examples are geographic hierarchies, that may be defined starting from any location attribute in the fact schema. To avoid redundancy, the DFM provides a graphical shorthand for denoting hierarchy sharing. Figure 4 shows two examples of shared hierarchies. Fact INVOICE LINE has two date dimensions, with semantics invoice date and order date, respectively. This is denoted by doubling the circle that represents attribute date and specifying two roles invoice and order on the entering arcs. The second shared hierarchy is the one on agent, that may have two roles: the ordering agent, that is a dimension, and the agent who is responsible for a customer (optional).

Guideline 8: Explicitly representing shared hierarchies on the fact schema is important since, during ROLAP logical design, it enables ad hoc solutions aimed at avoiding replication of data in dimension tables.

Ragged Hierarchies

Let a1, ..., an be a sequence of dimension attributes that define a path within a hierarchy (such as city, state, nation). Up to now we assumed that, for each value of a1, exactly one value for every other attribute on the path exists. In the previous case, this is actually true for each city in the U.S., while it is false for most European countries where no decomposition in states is defined (see Figure 6).

Definition 13: A ragged (or incomplete) hierarchy is a hierarchy where, for some instances, the values of one or more attributes are missing (since undefined or unknown). A ragged hierarchy is graphically denoted by marking with a dash the attributes whose values may be missing.

As stated by Niemi (2001), within a ragged hierarchy each aggregation level has precise and consistent semantics, but the different hierarchy instances may have different length since one or more levels are missing, making the interlevel relationships not uniform (the father of “San Francisco” belongs to level state, the father of “Rome” to level nation).

Figure 6. Ragged geographic hierarchies

There is a noticeable difference between a ragged hierarchy and an optional arc. In the first case we model the fact that, for some hierarchy instances, there is no value for one or more attributes in any position of the hierarchy. Conversely, through an optional arc we model the fact that there is no value for an attribute and for all of its descendents.

Guideline 9: Ragged hierarchies may lead to summarizability problems. A way for avoiding them is to fragment a fact into two or more facts, each including a subset of the hierarchies characterized by uniform interlevel relationships.

Thus, in the invoice example, fragmenting INVOICE LINE into U.S. INVOICE LINE and E.U. INVOICE LINE (the first with the state attribute, the second without state) restores the completeness of the geographic hierarchy.
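The fragmentation suggested by Guideline 9 can be sketched as follows. The geographic data and invoice lines are hypothetical, and the split criterion used here (whether the state is defined) is only one plausible way to obtain fragments with uniform interlevel relationships.

```python
# Hypothetical geographic data; state is None where no decomposition
# in states is defined.
GEOGRAPHY = {
    "San Francisco": {"state": "California", "nation": "USA"},
    "Rome": {"state": None, "nation": "Italy"},
}

# Hypothetical invoice lines: (city of the customer, net amount).
invoice_lines = [("San Francisco", 100), ("Rome", 80), ("San Francisco", 20)]


def father_level(city):
    """Level of the father of a city: 'state' if defined, otherwise 'nation'."""
    return "state" if GEOGRAPHY[city]["state"] is not None else "nation"


def fragment(lines):
    """Split the fact into a fragment with the state attribute and one without."""
    with_state = [line for line in lines if father_level(line[0]) == "state"]
    without_state = [line for line in lines if father_level(line[0]) == "nation"]
    return with_state, without_state


us_lines, eu_lines = fragment(invoice_lines)
```

Within each fragment every city has a father at the same level, so the geographic hierarchy of that fragment is complete.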

Unbalanced Hierarchies

Definition 14: An unbalanced (or recursive) hierarchy is a hierarchy where, though interattribute relationships are consistent, the instances may have different length. Graphically, it is represented by introducing a cycle within the hierarchy.

A typical example of unbalanced hierarchy is the one that models the dependence interrelationships between working persons. Figure 4 includes an unbalanced hierarchy on sale agents: there are no fixed roles for the different agents, and the different “leaf” agents have a variable number of supervisor agents above them.

Guideline 10: Recursive hierarchies lead to complex solutions during ROLAP logical design and to poor querying performance. A way for avoiding them is to “unroll” them for a given number of times.

For instance, in the agent example, if the user states that two is the maximum number of interesting levels for the dependence relationship, the customer hierarchy could be transformed as in Figure 7.
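Unrolling can be sketched as replacing the recursive arc with a fixed number of supervisor attributes. The agent names and the SUPERVISOR_OF mapping below are hypothetical, and padding short chains with None is an assumption of this sketch rather than something the DFM prescribes.

```python
# Hypothetical recursive arc: agent -> supervisor (None for the top agent).
SUPERVISOR_OF = {"Ann": "Bob", "Bob": "Carl", "Carl": "Dana", "Dana": None}


def unroll(agent, levels=2):
    """Return the first `levels` supervisors of an agent as a fixed-length tuple.

    Chains shorter than `levels` are padded with None; levels above the
    requested depth are simply dropped.
    """
    chain = []
    current = agent
    for _ in range(levels):
        current = SUPERVISOR_OF.get(current) if current is not None else None
        chain.append(current)
    return tuple(chain)
```

With two levels, every agent gets exactly two supervisor attributes, so the hierarchy instances all have the same length.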

Dynamic Hierarchies

Time is a key factor in data warehousing systems, since the decision process is often based on the evaluation of historical series and on the comparison between snapshots of the enterprise taken at different moments. The multidimensional models implicitly assume that the only dynamic components described in a cube are the events that instantiate it; hierarchies are traditionally considered to be static. Of course this is not correct: sales managers alternate, though slowly, on different departments; new products are added every week to those already being sold; the product categories change, and their relationship with products changes; sales districts can be modified, and a customer may be moved from one district to another.1

The conceptual representation of hierarchy dynamicity is strictly related to its impact on user queries. In fact, in the presence of a dynamic hierarchy we may picture three different temporal scenarios for analyzing events (SAP, 1998):

• Today for yesterday: All events are referred to the current configuration of hierarchies. Thus, assuming that on January 1, 2005 the agent responsible for customer Smith changed from Mr. Black to Mr. White, and that a new customer O’Hara was acquired and assigned to Mr. Black, when computing the agent commissions all invoices for Smith are attributed to Mr. White, while only invoices for O’Hara are attributed to Mr. Black.

• Yesterday for today: All events are referred to some past configuration of hierarchies. In the previous example, all invoices for Smith are attributed to Mr. Black, while invoices for O’Hara are not considered.

• Today or yesterday (or historical truth): Each event is referred to the configuration hierarchies had at the time the event occurred. Thus, the invoices for Smith up to 2004 and those for O’Hara are attributed to Mr. Black, while invoices for Smith from 2005 are attributed to Mr. White.
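The three scenarios can be made concrete with a small Python sketch built on the Smith/O’Hara example. Everything beyond the names and the January 1, 2005 change is an assumption made for illustration: the invoice amounts, the start date of the Smith/Black assignment, the reference dates, and the helper names `agent_at` and `totals` are all hypothetical.

```python
from datetime import date
from typing import Dict, Optional

# Customer-to-agent assignments with validity intervals (valid_to is exclusive,
# None means "still valid"). The year-2000 start date is an assumption.
ASSIGNMENTS = [
    ("Smith", "Black", date(2000, 1, 1), date(2005, 1, 1)),
    ("Smith", "White", date(2005, 1, 1), None),
    ("O'Hara", "Black", date(2005, 1, 1), None),
]

# Hypothetical invoices: (customer, invoice date, amount).
INVOICES = [
    ("Smith", date(2004, 6, 1), 100),
    ("Smith", date(2005, 3, 1), 200),
    ("O'Hara", date(2005, 4, 1), 50),
]


def agent_at(customer: str, when: date) -> Optional[str]:
    """Return the agent responsible for `customer` on day `when`, if any."""
    for cust, agent, start, end in ASSIGNMENTS:
        if cust == customer and start <= when and (end is None or when < end):
            return agent
    return None


def totals(reference_date: Optional[date] = None) -> Dict[str, int]:
    """Total invoiced amount per agent.

    With a reference date, every invoice is attributed according to the
    hierarchy configuration valid on that date. Without one, each invoice
    uses the configuration valid on its own date (historical truth).
    Invoices whose customer has no agent in the chosen configuration are skipped.
    """
    result: Dict[str, int] = {}
    for customer, invoice_date, amount in INVOICES:
        when = invoice_date if reference_date is None else reference_date
        agent = agent_at(customer, when)
        if agent is not None:
            result[agent] = result.get(agent, 0) + amount
    return result


today_for_yesterday = totals(date(2005, 6, 1))    # {'White': 300, 'Black': 50}
yesterday_for_today = totals(date(2004, 12, 31))  # {'Black': 300}
historical_truth = totals()                       # {'Black': 150, 'White': 200}
```

The sketch keeps validity intervals on the arc between customer and agent; this is one possible way to support all three scenarios, not a prescription for the logical design.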

While in the agent example dynamicity concerns an arc of a hierarchy, the one expressing the many-to-one association between customer and agent, in some cases it may as well concern a dimension attribute: for instance, the name of a product category may change. Even in this case, the different scenarios are defined in much the same way as before.

On the conceptual schema, it is useful to denote which scenarios the user is interested in for each arc and attribute, since this heavily impacts the specific solutions to be adopted during logical design. By default, we will assume that the only interesting scenario is today for yesterday: it is the most common one, and the one whose implementation on the star schema is simplest. If some attributes or arcs require different scenarios, the designer should specify them in a table like Table 2.

Figure 7. Unrolling the agent hierarchy

Additivity

Aggregation requires defining a proper operator to compose the measure values characterizing primary events into measure values characterizing each secondary event. From this point of view, we may distinguish three types of measures (Lenz & Shoshani, 1997):

• Flow measures: They refer to a time period, and are cumulatively evaluated at the end of that period. Examples are the number of products sold in a day, the monthly revenue, the number of those born in a year.

• Stock measures: They are evaluated at particular moments in time. Examples are the number of products in a warehouse, the number of inhabitants of a city, the temperature measured by a gauge.

• Unit measures: They are evaluated at particular moments in time, but they are expressed in relative terms. Examples are the unit price of a product, the discount percentage, the exchange rate of a currency.

The aggregation operators that can be used on the three types of measures are summarized in Table 3.

arc/attribute    today for yesterday    yesterday for today    today or yesterday

customer-resp. agent YES YES YES

customer-city YES YES

sale district YES

Table 2. Temporal scenarios for the INVOICE fact

                 temporal hierarchies   nontemporal hierarchies
flow measures    SUM, AVG, MIN, MAX     SUM, AVG, MIN, MAX
stock measures   AVG, MIN, MAX          SUM, AVG, MIN, MAX
unit measures    AVG, MIN, MAX          AVG, MIN, MAX

Table 3. Valid aggregation operators for the three types of measures (Lenz & Shoshani, 1997)

Definition 15: A measure is said to be additive along a dimension if its values can be aggregated along the corresponding hierarchy by the sum operator, otherwise it is called nonadditive. A nonadditive measure is nonaggregable if no other aggregation operator can be used on it.

Table 3 shows that, in general, flow measures are additive along all dimensions, stock measures are nonadditive along temporal hierarchies, and unit measures are nonadditive along all dimensions.

In the invoice schema, most measures are additive. For instance, quantity has flow type: the total quantity invoiced in a month is the sum of the quantities invoiced in the single days of that month. Measure unit price has unit type and is nonadditive along all dimensions. Though it cannot be summed up, it can still be aggregated by using operators such as average, maximum, and minimum.

Since additivity is the most frequent case, in order to simplify the graphic notation in the DFM, only the exceptions are represented explicitly. In particular, a measure is connected to the dimensions along which it is nonadditive by a dashed line labeled with the other aggregation operators (if any) which can be used instead. If a measure is aggregated through the same operator along all dimensions, that operator can be simply reported on its side (see for instance unit price in Figure 4).
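The rules of Table 3 and Definition 15 lend themselves to a simple lookup. The following Python sketch encodes the table and rejects operators that are not valid for a given measure type and hierarchy kind. The names `VALID_OPERATORS`, `is_additive`, and `aggregate`, as well as the sample values, are assumptions introduced for illustration only.

```python
from statistics import mean
from typing import Sequence

# Table 3 as a lookup: (measure type, hierarchy is temporal?) -> valid operators.
VALID_OPERATORS = {
    ("flow", True): {"SUM", "AVG", "MIN", "MAX"},
    ("flow", False): {"SUM", "AVG", "MIN", "MAX"},
    ("stock", True): {"AVG", "MIN", "MAX"},
    ("stock", False): {"SUM", "AVG", "MIN", "MAX"},
    ("unit", True): {"AVG", "MIN", "MAX"},
    ("unit", False): {"AVG", "MIN", "MAX"},
}

# Plain Python implementations of the four operators.
OPERATORS = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max}


def is_additive(measure_type: str, temporal: bool) -> bool:
    """A measure is additive along a hierarchy if SUM is a valid operator there."""
    return "SUM" in VALID_OPERATORS[(measure_type, temporal)]


def aggregate(values: Sequence[float], operator: str,
              measure_type: str, temporal: bool) -> float:
    """Aggregate `values`, refusing operators that Table 3 does not allow."""
    if operator not in VALID_OPERATORS[(measure_type, temporal)]:
        raise ValueError(
            f"{operator} is not valid for a {measure_type} measure "
            f"along a {'temporal' if temporal else 'nontemporal'} hierarchy"
        )
    return OPERATORS[operator](values)


# Hypothetical daily quantities (a flow measure) rolled up to the month.
monthly_quantity = aggregate([10, 20, 30], "SUM", "flow", temporal=True)  # 60

# Hypothetical unit prices (a unit measure) can be averaged but not summed.
average_price = aggregate([2.0, 4.0], "AVG", "unit", temporal=False)  # 3.0
```

Calling `aggregate([2.0, 4.0], "SUM", "unit", temporal=False)` raises a `ValueError`, which mirrors the statement that unit price is nonadditive along all dimensions.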

Approaches to Conceptual Design

In this section we discuss how conceptual design can be framed within a methodology for DW design. The approaches to DW design are usually classified in two categories (Winter & Strauch, 2003):

• Data-driven (or supply-driven) approaches design the DW starting from a detailed analysis of the data sources; user requirements impact on design by allowing the designer to select which chunks of data are relevant for decision making and by determining their structure according to the multidimensional model (Golfarelli et al., 1998; Hüsemann et al., 2000).

• Requirement-driven (or demand-driven) approaches start from determining the information requirements of end users; how to map these requirements onto the available data sources is investigated only a posteriori (Prakash & Gosain, 2003; Schiefer, List & Bruckner, 2002).

While data-driven approaches somehow simplify the design of ETL (extraction, transformation, and loading), since each piece of data in the DW is rooted in one or more attributes of the sources, they give user requirements a secondary role in determining the information contents for analysis, and give the designer little support in identifying facts, dimensions, and measures. Conversely, requirement-driven approaches bring user requirements to the foreground, but require a larger effort when designing ETL.

Data-Driven Approaches

Data-driven approaches are feasible when all of the following are true: (1) detailed knowledge of data sources is available a priori or easily achievable; (2) the source schemata exhibit a good degree of normalization; (3) the complexity of source schemata is not high. In practice, when the chosen architecture for the DW relies on a reconciled level (or operational data store) these requirements are largely satisfied: in fact, normalization and detailed knowledge are guaranteed by the source integration process. The same holds, thanks to a careful source recognition activity, in the frequent case when the source is a single relational database, well-designed and not very large.

In a data-driven approach, requirement analysis is typically carried out informally, based on simple requirement glossaries (Lechtenbörger, 2001) rather than on formal diagrams. Conceptual design is then heavily rooted on source schemata and can be largely automated. In particular, the designer is actively supported in identifying dimensions and measures, in building hierarchies, in detecting convergences and shared hierarchies. For instance, the approach proposed by Golfarelli et al. (1998) consists of five steps that, starting from the source schema expressed either by an E/R schema or a relational schema, create the conceptual schema for the DW:

1. Choose facts of interest on the source schema.

2. For each fact, build an attribute tree that captures the functional dependencies expressed by the source schema.

3. Edit the attribute trees by adding/deleting attributes and functional dependencies.

4. Choose dimensions and measures.

5. Create the fact schemata.

While step 2 is completely automated, some advanced constructs of the DFM are manually applied by the designer during step 5.

On-the-field experience shows that, when applicable, the data-driven approach is preferable since it reduces the overall time necessary for design. In fact, not only can conceptual design be partially automated, but even ETL design is made easier since the mapping between the data sources and the DW is derived at no additional cost during conceptual design.
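To give an intuition of why step 2 can be automated, the following Python sketch builds an attribute tree by navigating functional dependencies from a chosen fact. It is a deliberately simplified illustration, not the algorithm of Golfarelli et al. (1998): the dependency table `FDS`, its attribute names, and the function `build_attribute_tree` are hypothetical, and real source schemata raise issues (optional relationships, shared attributes, keys) that are ignored here.

```python
from typing import Dict, List, Tuple

# Hypothetical functional dependencies read off a source schema:
# each key functionally determines the attributes listed for it.
FDS: Dict[str, List[str]] = {
    "INVOICE LINE": ["product", "invoice"],
    "invoice": ["date", "customer"],
    "customer": ["city", "agent"],
    "city": ["state"],
    "product": ["category"],
}


def build_attribute_tree(
    root: str,
    fds: Dict[str, List[str]],
    _path: Tuple[str, ...] = (),
) -> Dict[str, dict]:
    """Return the subtree of attributes reachable from `root` as nested dicts.

    Each functional dependency root -> child becomes an arc of the tree.
    Attributes already on the current path are skipped so that cyclic
    dependencies do not cause infinite recursion.
    """
    path = _path + (root,)
    children: Dict[str, dict] = {}
    for child in fds.get(root, []):
        if child in path:
            continue
        children[child] = build_attribute_tree(child, fds, path)
    return children


attribute_tree = {"INVOICE LINE": build_attribute_tree("INVOICE LINE", FDS)}
# attribute_tree["INVOICE LINE"]["invoice"]["customer"]
#   == {"city": {"state": {}}, "agent": {}}
```

Steps 3 to 5 would then operate on such a tree, for example by pruning or grafting branches and by choosing which children of the root become dimensions and which become measures.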

Requirement-Driven Approaches

Conversely, within a requirement-driven framework, in the absence of knowledge of the source schema, the building of hierarchies cannot be automated; the main assurance of a satisfactory result is the skill and experience of the designer, and the designer’s ability to interact with the domain experts. In this case it may be worth adopting formal techniques for specifying requirements in order to more accurately capture users’ needs; for instance, the goal-oriented approach proposed by Giorgini, Rizzi, and Garzetti (2005) is based on an extension of the Tropos formalism and includes the following steps:

1. Create, in the Tropos formalism, an organizational model that represents the stakeholders, their relationships, their goals, as well as the relevant facts for the organization and the attributes that describe them.

2. Create, in the Tropos formalism, a decisional model that expresses the analysis goals of decision makers and their information needs.

3. Create preliminary fact schemata from the decisional model.

4. Edit the fact schemata, for instance, by detecting functional dependencies between dimensions, recognizing optional dimensions, and unifying measures that only differ for the aggregation operator.

This approach is, in our view, more difficult to pursue than the previous one. Nevertheless, it is the only alternative when a detailed analysis of data sources cannot be made (for instance, when the DW is fed from an ERP system), or when the sources come from legacy systems whose complexity discourages recognition and normalization.

Mixed Approaches

Finally, a few mixed approaches to design have also been devised, aimed at joining the facilities of data-driven approaches with the guarantees of requirement-driven ones (Bonifati, Cattaneo, Ceri, Fuggetta, & Paraboschi, 2001; Giorgini et al., 2005). Here the user requirements, captured by means of a goal-oriented formalism, are matched with the schema of the source database to drive the algorithm that generates the conceptual schema for the DW. For instance, the approach proposed by Giorgini et al. (2005) encompasses the following steps:

1. Create, in the Tropos formalism, an organizational model that represents the stakeholders, their relationships, their goals, as well as the relevant facts for the organization and the attributes that describe them.

2. Create, in the Tropos formalism, a decisional model that expresses the analysis goals of decision makers and their information needs.

3. Map facts, dimensions, and measures identified during requirement analysis onto entities in the source schema.

4. Generate a preliminary conceptual schema by navigating the functional dependencies expressed by the source schema.

5. Edit the fact schemata to fully meet the user expectations.

Note that, though step 4 may be based on the same algorithm employed in step 2 of the data-driven approach, here navigation is not “blind” but rather it is actively biased by the user requirements. Thus, the preliminary fact schemata generated here may be considerably simpler and smaller than those obtained in the data-driven approach. Besides, while in that approach the analyst is asked to identify facts, dimensions, and measures directly on the source schema, here such identification is driven by the diagrams developed during requirement analysis.

Overall, the mixed framework is recommendable when source schemata are well-known but their size and complexity are substantial. In fact, the cost of a more careful and formal analysis of requirements is balanced by the quickening of conceptual design.
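One possible reading of requirement-biased navigation is sketched below in Python: functional dependencies are navigated as in the data-driven case, but a branch is kept only if it contains an attribute that the requirements mention. This is an assumption made for illustration, not the procedure of Giorgini et al. (2005); the dependency table `FDS`, the set `REQUIRED`, and the function `biased_tree` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical functional dependencies of the source schema.
FDS: Dict[str, List[str]] = {
    "INVOICE LINE": ["product", "invoice"],
    "invoice": ["date", "customer"],
    "customer": ["city", "agent"],
    "city": ["state"],
    "product": ["category"],
}

# Hypothetical attributes mentioned during requirement analysis.
REQUIRED: Set[str] = {"category", "agent", "date"}


def biased_tree(
    root: str,
    fds: Dict[str, List[str]],
    required: Set[str],
    _path: Tuple[str, ...] = (),
) -> Dict[str, dict]:
    """Navigate dependencies from `root`, keeping only useful branches.

    A child is kept if it is itself required or if its subtree contains
    at least one required attribute; everything else is pruned.
    """
    path = _path + (root,)
    kept: Dict[str, dict] = {}
    for child in fds.get(root, []):
        if child in path:  # guard against cyclic dependencies
            continue
        subtree = biased_tree(child, fds, required, path)
        if child in required or subtree:
            kept[child] = subtree
    return kept


preliminary = biased_tree("INVOICE LINE", FDS, REQUIRED)
# preliminary == {
#     "product": {"category": {}},
#     "invoice": {"date": {}, "customer": {"agent": {}}},
# }
```

In this toy case the geographic branch (city, state) disappears because no requirement refers to it, which illustrates why the preliminary schemata may be smaller than those produced by unbiased navigation.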

Open Issues

A lot of work has been done in the field of conceptual modeling for DWs; nevertheless, some very important issues still remain open. We report some of them in this section, as they emerged during joint discussion at the Perspective Seminar on “Data Warehousing at the Crossroads” that took place at Dagstuhl, Germany in August 2004.

• Lack of a standard: Though several conceptual models have been proposed, none of them has been accepted as a standard so far, and all vendors propose their own proprietary design methods. We see two main reasons for this: (1) though the conceptual models devised are semantically rich, some of the modeled properties cannot be expressed in the target logical models, so the translation from conceptual to logical is incomplete; and (2) commercial CASE tools currently enable designers to directly draw logical schemata, thus no industrial push is given to any of the models. On the other hand, a unified conceptual model for DWs, implemented by sophisticated CASE tools, would be a valuable support for both the research and industrial communities.

• Design patterns: In software engineering, design patterns are a valuable support for designers since they propose standard solutions to address common modeling problems. Recently, some preliminary attempts have been made to identify relevant patterns for multidimensional design, aimed at assisting DW designers during their modeling tasks by providing an approach for recognizing dimensions in a systematic and usable way (Jones & Song, 2005). Though we agree that DW design would undoubtedly benefit from adopting a pattern-based approach, and we also recognize the utility of patterns in increasing the effectiveness of teaching how to design, we believe that further research is necessary in order to achieve a more comprehensive characterization of multidimensional patterns for both conceptual and logical design.

• Modeling security: Information security is a serious requirement that must be carefully considered in software engineering, not in isolation but as an issue underlying all stages of the development life cycle, from requirement analysis to implementation and maintenance. The problem of information security is even bigger in DWs, as these systems are used to discover crucial business information in strategic decision making. Some approaches to security in DWs, focused, for instance, on access control and multilevel security, can be found in the literature (see, for instance, Priebe & Pernul, 2000), but none of them treats security as comprising all stages of the DW development cycle. Besides, the classical security model used in transactional databases, centered on tables, rows, and attributes, is unsuitable for DWs and should be replaced by an ad hoc model centered on the main concepts of multidimensional modeling, such as facts, dimensions, and measures.

• Modeling ETL: ETL is a cornerstone of the data warehousing process, and its design and implementation may easily take 50% of the total time for setting up a DW. In the literature some approaches were devised for conceptual modeling of the ETL process from either the functional (Vassiliadis, Simitsis, & Skiadopoulos, 2002), the dynamic (Bouzeghoub, Fabret, & Matulovic, 1999), or the static (Calvanese, De Giacomo, Lenzerini, Nardi, & Rosati, 1998) points of view. Recently, some interesting work on translating conceptual into logical ETL schemata has also been done (Simitsis, 2005). Nevertheless, issues such as the optimization of ETL logical schemata are not very well understood. Besides, there is a need for techniques that automatically propagate changes that occurred in the source schemas to the ETL process.

Conclusion

In this chapter we have proposed a set of solutions for conceptual modeling of a DW according to the DFM. Since 1998, the DFM has been successfully adopted in real DW projects, mainly in the fields of retail, large distribution, telecommunications, health, justice, and instruction, where it has proved expressive enough to capture a wide variety of modeling situations. Remarkably, in most projects the DFM was also used to directly support dialogue with end users aimed at validating requirements, and to express the expected workload for the DW to be used for logical and physical design. This was made possible by the adoption of a CASE tool named WAND (warehouse integrated designer), entirely developed at the University of Bologna, that assists the designer in structuring a DW. WAND carries out data-driven conceptual design in a semiautomatic fashion starting from the logical schema of the source database (see Figure 8), allows for a core workload to be defined on the conceptual schema, and carries out workload-based logical design to produce an optimized relational schema for the DW (Golfarelli & Rizzi, 2001).

Overall, our on-the-field experience confirmed that adopting conceptual modeling within a DW project brings great advantages since:

• Conceptual schemata are the best support for discussing, verifying, and refining user specifications since they achieve the optimal trade-off between expressivity and clarity. Star schemata could hardly be used for this purpose.

• For the same reason, conceptual schemata are an irreplaceable component of the documentation for the DW project.

• They provide a solid and platform-independent foundation for logical and physical design.

• They are an effective support for maintaining and extending the DW.

• They make turn-over of designers and administrators on a DW project quicker and simpler.

Figure 8. Editing a fact schema in WAND

References

Abelló, A., Samos, J., & Saltor, F. (2002, July 17-19). YAM2 (Yet another multidimensional model): An extension of UML. In Proceedings of the International Database Engineering & Applications Symposium (pp. 172-181). Edmonton, Canada.

Agrawal, R., Gupta, A., & Sarawagi, S. (1995). Modeling multidimensional databases (IBM Research Report). IBM Almaden Research Center, San Jose, CA.

Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology, 10(4), 452-483.

Bouzeghoub, M., Fabret, F., & Matulovic, M. (1999). Modeling data warehouse refreshment process as a workflow application. In Proceedings of the International Workshop on Design and Management of Data Warehouses, Heidelberg, Germany.

Cabibbo, L., & Torlone, R. (1998, March 23-27). A logical approach to multidimensional databases. In Proceedings of the International Conference on Extending Database Technology (pp. 183-197). Valencia, Spain.

Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1998, August 20-22). Information integration: Conceptual modeling and reasoning support. In Proceedings of the International Conference on Cooperative Information Systems (pp. 280-291). New York.

Datta, A., & Thomas, H. (1997). A conceptual model and algebra for on-line analytical processing in data warehouses. In Proceedings of the Workshop for Information Technology and Systems (pp. 91-100).

Fahrner, C., & Vossen, G. (1995). A survey of database transformations based on the entity-relationship model. Data & Knowledge Engineering, 15(3), 213-250.

Franconi, E., & Kamble, A. (2004a, June 7-11). The GMD data model and algebra for multidimensional information. In Proceedings of the Conference on Advanced Information Systems Engineering (pp. 446-462). Riga, Latvia.

Franconi, E., & Kamble, A. (2004b). A data warehouse conceptual data model. In Proceedings of the International Conference on Statistical and Scientific Database Management (pp. 435-436).

Giorgini, P., Rizzi, S., & Garzetti, M. (2005, November 4-5). Goal-oriented requirement analysis for data warehouse design. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 47-56). Bremen, Germany.

Golfarelli, M., Maio, D., & Rizzi, S. (1998). The dimensional fact model: A conceptual model for data warehouses. International Journal of Cooperative Information Systems, 7(2-3), 215-247.

Golfarelli, M., & Rizzi, S. (2001, April 2-6). WAND: A CASE tool for data warehouse design. In Demo Proceedings of the International Conference on Data Engineering (pp. 7-9). Heidelberg, Germany.

Gyssens, M., & Lakshmanan, L. V. S. (1997). A foundation for multi-dimensional databases. In Proceedings of the International Conference on Very Large Data Bases (pp. 106-115), Athens, Greece.

Hüsemann, B., Lechtenbörger, J., & Vossen, G. (2000). Conceptual data warehouse design. In Proceedings of the International Workshop on Design and Management of Data Warehouses, Stockholm, Sweden.

Jones, M. E., & Song, I. Y. (2005). Dimensional modeling: Identifying, classifying & applying patterns. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 29-38). Bremen, Germany.

Kimball, R. (1996). The data warehouse toolkit. New York: John Wiley & Sons.

Lechtenbörger, J. (2001). Data warehouse schema design (Tech. Rep. No. 79). DISDBIS Akademische Verlagsgesellschaft Aka GmbH, Germany.

Lenz, H. J., & Shoshani, A. (1997). Summarizability in OLAP and statistical databases. In Proceedings of the 9th International Conference on Statistical and Scientific Database Management (pp. 132-143). Washington, DC.

Li, C., & Wang, X. S. (1996). A data model for supporting on-line analytical processing. In Proceedings of the International Conference on Information and Knowledge Management (pp. 81-88). Rockville, Maryland.

Luján-Mora, S., Trujillo, J., & Song, I. Y. (2002). Extending the UML for multidimensional modeling. In Proceedings of the International Conference on the Unified Modeling Language (pp. 290-304). Dresden, Germany.

Niemi, T., Nummenmaa, J., & Thanisch, P. (2001, June 4). Logical multidimensional database design for ragged and unbalanced aggregation. In Proceedings of the 3rd International Workshop on Design and Management of Data Warehouses, Interlaken, Switzerland (p. 7).

Nguyen, T. B., Tjoa, A. M., & Wagner, R. (2000). An object-oriented multidimensional data model for OLAP. In Proceedings of the International Conference on Web-Age Information Management (pp. 69-82). Shanghai, China.

Pedersen, T. B., & Jensen, C. (1999). Multidimensional data modeling for complex data. In Proceedings of the International Conference on Data Engineering (pp. 336-345). Sydney, Australia.

Prakash, N., & Gosain, A. (2003). Requirements driven data warehouse development. In Proceedings of the Conference on Advanced Information Systems Engineering: Short Papers, Klagenfurt/Velden, Austria.

Priebe, T., & Pernul, G. (2000). Towards OLAP security design: Survey and research issues. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 33-40). Washington, DC.

SAP. (1998). Data modeling with BW. SAP America Inc. and SAP AG, Rockville, MD.

Sapia, C., Blaschka, M., Hofling, G., & Dinter, B. (1998). Extending the E/R model for the multidimensional paradigm. In Proceedings of the International Conference on Conceptual Modeling, Singapore.

Schiefer, J., List, B., & Bruckner, R. (2002). A holistic approach for managing requirements of data warehouse systems. In Proceedings of the Americas Conference on Information Systems.

Sen, A., & Sinha, A. P. (2005). A comparison of data warehousing methodologies. Communications of the ACM, 48(3), 79-84.

Simitsis, A. (2005). Mapping conceptual to logical models for ETL processes. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 67-76). Bremen, Germany.

Tryfona, N., Busborg, F., & Borch Christiansen, J. G. (1999). starER: A conceptual model for data warehouse design. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP, Kansas City, Kansas (pp. 3-8).

Tsois, A., Karayannidis, N., & Sellis, T. (2001). MAC: Conceptual data modeling for OLAP. In Proceedings of the International Workshop on Design and Management of Data Warehouses (pp. 5.1-5.11). Interlaken, Switzerland.

Vassiliadis, P. (1998). Modeling multidimensional databases, cubes and cube operations. In Proceedings of the 10th International Conference on Statistical and Scientific Database Management, Capri, Italy.

Vassiliadis, P., Simitsis, A., & Skiadopoulos, S. (2002, November 8). Conceptual modeling for ETL processes. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 14-21). McLean, VA.

Winter, R., & Strauch, B. (2003). A method for demand-driven information requirements analysis in data warehousing projects. In Proceedings of the Hawaii International Conference on System Sciences, Kona (pp. 1359-1365).

Endnote

1 In this chapter we will only consider dynamicity at the instance level. Dynamicity at the schema level is related to the problem of evolution of DWs and is outside the scope of this chapter.