

Modeling Issues for Data Warehouses

CMPT 455/826 - Week 7, Day 1

(based on Trujillo)

Sept-Dec 2009 – w7d1 1


This is a tough paper

• This is the toughest paper that we’ve dealt with so far

• It introduces

– a number of concepts that are very important

– in ways that are often difficult to follow

– with a combination of standard and homemade terms

• So, for today

– rather than concentrate on critique items

– we need to concentrate on the concepts


Multidimensional modeling

• Ties together the concepts of:

– a data warehouse

– multidimensional database (MDB)

– online analytical processing (OLAP)

• What are dimensions?

• What are

– data warehouses

– multidimensional database (MDB)

– online analytical processing (OLAP)


Multidimensional modeling

• Structures information into

– facts

– dimensions

• A fact is described by a set of attributes called measures or fact attributes, which

– can be atomic or derived

– are contained in cells or points within the data cube

• We base this set of measures on a set of dimensions that derive from the granularity chosen for representing the facts.

• These dimensions thus present the context for analyzing the facts.

• Dimension attributes

– provide the specifics that characterize dimensions.


Multidimensional modeling

• Facts

– represent many-to-many relationships between all dimensions

– have many-to-one relationships between the fact and every particular dimension

• e.g. a product sale is related to only one product that is sold in one store to one customer at one time

– can also represent many-to-many relationships between particular dimensions

• e.g. one sales slip can contain many products, and one product can be on many sales slips
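The two kinds of relationship above can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes `ProductSale` and `SalesSlipLine`, their fields, and the sample data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


# Hypothetical dimension classes; the names are illustrative only.
@dataclass(frozen=True)
class Product:
    name: str


@dataclass(frozen=True)
class Store:
    name: str


@dataclass(frozen=True)
class Customer:
    name: str


@dataclass(frozen=True)
class ProductSale:
    """Many-to-one: one sale refers to exactly one member of each dimension."""
    product: Product
    store: Store
    customer: Customer
    day: str
    quantity: int


@dataclass(frozen=True)
class SalesSlipLine:
    """Many-to-many: one row per (sales slip, product) pair."""
    slip_id: int
    product: Product


milk, bread = Product("milk"), Product("bread")

# A single sale is tied to one product, one store, one customer, one day.
sale = ProductSale(milk, Store("A"), Customer("Ann"), "2009-10-19", 2)

# Slip 1 holds two products; milk appears on two slips.
slip_lines = [
    SalesSlipLine(1, milk),
    SalesSlipLine(1, bread),
    SalesSlipLine(2, milk),
]

products_on_slip_1 = {line.product.name for line in slip_lines if line.slip_id == 1}
slips_with_milk = {line.slip_id for line in slip_lines if line.product == milk}
```

Modeling the many-to-many case as one row per pair is one common choice; the paper's UML notation expresses the same idea with multiplicities on the association.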


Multidimensional modeling

• The additivity / summarizability concept

– A measure (fact attribute) is additive along a dimension

• if we can use the SUM operator to aggregate attribute values along all hierarchies defined on that dimension

• The aggregation of some fact attributes

– called roll-up in OLAP terminology

– might not be semantically meaningful for all measures along all dimensions

• e.g. number of clients

– estimated by counting the number of purchase receipts for a given product, customer, day, and store

– is not additive along the product dimension. Because the same ticket can include other products, adding up the number of clients for two or more products would lead to inconsistent results.
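The number-of-clients example can be checked with a small Python sketch. The receipt data, the tuple layout, and the helper names `clients` and `group_by` are assumptions made for illustration only.

```python
# Hypothetical receipt lines: (receipt_id, product, store, amount).
lines = [
    (1, "milk", "A", 2.0),
    (1, "bread", "A", 3.0),   # receipt 1 contains two products
    (2, "milk", "A", 2.0),
    (3, "bread", "B", 3.0),
]


def clients(rows):
    """Estimate the number of clients as the number of distinct receipts."""
    return len({row[0] for row in rows})


def group_by(rows, index):
    """Group rows by the value found at the given tuple position."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row)
    return groups


true_clients = clients(lines)

# Rolling up along the product dimension with SUM double-counts receipt 1.
clients_per_product = {p: clients(rs) for p, rs in group_by(lines, 1).items()}
summed_over_products = sum(clients_per_product.values())

# Along the store dimension SUM works here, because each receipt in this
# sample data belongs to exactly one store.
clients_per_store = {s: clients(rs) for s, rs in group_by(lines, 2).items()}
summed_over_stores = sum(clients_per_store.values())

# The sales amount, by contrast, is additive along the product dimension.
sales_per_product = {p: sum(r[3] for r in rs) for p, rs in group_by(lines, 1).items()}
total_sales = sum(row[3] for row in lines)

print(true_clients, summed_over_products, summed_over_stores)  # 3 4 3
```

Summing per-product client counts gives 4 although only 3 receipts exist, which is the inconsistency the slide describes.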


Multidimensional modeling

• The strictness concept

– an object at a hierarchy’s lower level

– belongs to only one higher level object

– e.g. a province can only relate to one country

• The completeness concept

– all members belong to one higher-class object and

– that object consists of those members only

– e.g. only the recorded provinces can form a country.

• In a “complete” classification hierarchy between the country and province levels,

– all the recorded provinces form the country, and

– all the provinces that form the country have been recorded
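One way to make strictness and completeness concrete is to test them on a small roll-up mapping. The sketch below is an interpretation, not the paper's definition in code: it reads "only one" as "exactly one parent", and the functions `is_strict` and `is_complete`, the `reference` mapping, and the toy data (which is deliberately not a full list of provinces) are all hypothetical.

```python
# Hypothetical recorded hierarchy: province -> set of countries it rolls up to.
recorded = {
    "British Columbia": {"Canada"},
    "Alberta": {"Canada"},
    "Washington": {"USA"},
}


def is_strict(parents):
    """Strict: every lower-level member has exactly one higher-level parent."""
    return all(len(ps) == 1 for ps in parents.values())


def is_complete(parents, reference):
    """Complete: for each parent in `reference`, the recorded children are
    exactly the children that the parent is known to consist of."""
    children = {}
    for child, ps in parents.items():
        for parent in ps:
            children.setdefault(parent, set()).add(child)
    return all(children.get(p, set()) == kids for p, kids in reference.items())


# Assumed outside knowledge about what each country consists of (toy data).
reference_small = {"Canada": {"British Columbia", "Alberta"}}
reference_larger = {"Canada": {"British Columbia", "Alberta", "Ontario"}}

# A non-strict variant: one member recorded under two parents.
non_strict = dict(recorded, **{"Lloydminster": {"Alberta", "Saskatchewan"}})
```

With this data, `is_strict(recorded)` holds, `is_strict(non_strict)` does not, and completeness holds against `reference_small` but fails against `reference_larger` because Ontario was never recorded.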


Multidimensional modeling

• Categorization of dimensions

– some attributes are normally valid for all elements within a dimension

– while others are only valid for a subset of elements

– e.g. the attributes alcohol percentage and volume would only be valid for drink products and would be null for food products.

• A proper multidimensional data model

– should consider attributes only when necessary,

– depending on the categorization of dimensions.
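Categorization can be sketched as specialization: attributes that apply only to a subset of products live on a subclass instead of being null everywhere else. The class names, the `price` attribute, and the millilitre unit below are assumptions for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Product:
    """Attributes valid for every element of the product dimension."""
    name: str
    price: float  # hypothetical shared attribute


@dataclass(frozen=True)
class Drink(Product):
    """Attributes valid only for the drink category."""
    alcohol_percentage: float
    volume_ml: int  # unit chosen for this sketch


@dataclass(frozen=True)
class Food(Product):
    """Food products carry no drink-specific attributes at all."""


def attribute_names(obj):
    """List the attributes that actually apply to this dimension member."""
    return [f.name for f in fields(obj)]


beer = Drink("lager", 2.5, 5.0, 330)
rice = Food("rice", 1.2)

print(attribute_names(beer))  # ['name', 'price', 'alcohol_percentage', 'volume_ml']
print(attribute_names(rice))  # ['name', 'price']
```

A flat product table would instead need nullable `alcohol_percentage` and `volume` columns for every food row, which is the situation the slide warns about.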


Multidimensional modeling

• Recommended modeling approach

– Clearly separate the structure of a multidimensional model into

• facts

• dimensions

– Fact classes

• are composite classes

• “in a shared-aggregation relationship of n dimension classes”

• i.e. they relate instances from all dimensions

– A fact object instance

• is always related to object instances from all dimensions


Multidimensional modeling

• Given the basics of their modeling approach

– they then go on to explain how they can annotate

• derived measures (with a “/”)

• table-specific components of the table’s primary key / object ID (“OID”)

• attributes that function as descriptors (“D”)

• constraints on additivity (between braces near the fact table)

• additivity and derivation rules (separate from the diagram)

• that a dimension is a directed acyclic graph (“DAG”)

– they also use various other UML notations

• Is this perhaps a little much semantic loading?


Multidimensional modeling

• Regardless of how we model these various concepts

– it is important that they be considered

– in the design of data warehouses


Dimensional Modeling

(based on Jones)


Characteristics for using Patterns

• The problem that the pattern addresses is identified, recognized, and defined from real world situations.

• A pattern provides an approach for formulating a solution to a real world problem.

• The approach must be defined with respect to the real world context from which the problem emanates.

• The approach is reusable because it has been successfully used to solve recurring real world problems.

• A pattern endures over time.

Dimensional Data Patterns

• involve a commonly known & recognized mental model

– with the intent of increasing the practitioner's ability to understand, remember, and apply the DDPs

• facilitate the identification of commonly used entities

– thereby providing a greater potential for improving design correctness with the initial model

• are common across many dimensional models

– thus reusability is improved and design time may be decreased


Mental Models for DDPs

• Using a story as the basis for Domain DDPs

– Who: the characters involved in the story

– What: the important entities and the ideas for those entities

– When: a particular time frame involved

– Where: the location / setting of the story

– Why: the motivation or the reasons behind the story


Domain DDPs

• A high-level set of domains can then be constructed:

– temporal (when)

– location (where)

– stakeholder (who)

– action (what is done or accomplished)

– object (what)

– qualifier (why)


Commonality of DDPs

• The basic domains can apply to any story

• Experience across stories will recognize commonalities

• Individual stories may contain unique components

– however, many of these components will take on similar patterns

– despite the components having different names
