AN INTRODUCTION TO DIMENSIONAL DATA MODELLING
Ashish Chandwani
Intern, Nationwide Insurance
School: University of Maryland, College Park
CONTENTS
Overview of Data Warehouse
Introduction to Dimensional Modelling
Elements of Dimensional Model
Designing a Dimensional Data Model
Types of Schema
Dimensional Data Model vs Relational Data Model
Data Warehouse
Central repository of integrated data from one or more diverse sources.
Stores current and historical data.
Sometimes referred to as an Enterprise Data Warehouse (EDW).
Data is often collected from multiple sources inside and outside the organization, and processes are deployed to cleanse it and maintain data integrity.
A DW is used for reporting and analysis.
Introduction to Dimensional Modelling
Dimensional Modeling is a database design technique.
It is important for supporting end-user queries about business transactions.
It is intended to support analysis and reporting, and contains business attribute tables (dimensions) and business transaction tables (facts/measures).
It is used as the basis for OLAP (Online Analytical Processing) cubes.
Elements of Dimensional Data Model
Dimension Table: a collection of reference information about a business. E.g., Location, Product and Date are dimensions for certain metrics of organizations such as Nationwide Insurance.
Each dimension table contains attributes that describe the details of the dimension. E.g., a Product dimension can contain product name, type and price.
Each dimension table may also contain hierarchies. E.g., a Location dimension can contain location name, location city, location state and location country.
Fact Table: measurable events for which dimension table data is collected and used for analysis and reporting.
Fact tables can contain information such as sales against a set of dimensions such as Location, Product and Date.
The primary keys of the dimension tables are mapped as foreign keys in the fact tables.
Usually these keys are surrogate keys.
Dimensions provide the context for the business problems, and facts are the measures within those contexts.
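To make the key relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, dim_location, dim_date, fact_sales), the column names and the sample rows are hypothetical, chosen to mirror the Nationwide sales example; a real warehouse would use its own DDL and database engine.

```python
import sqlite3

# Hypothetical star schema for the sales example; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,  -- surrogate key
    product_name  TEXT,
    product_type  TEXT,
    product_price REAL
);
CREATE TABLE dim_location (
    location_key     INTEGER PRIMARY KEY,  -- surrogate key
    location_name    TEXT,
    location_city    TEXT,
    location_state   TEXT,
    location_country TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- e.g. 20150803
    calendar_date TEXT
);
-- Each dimension's primary key appears here as a foreign key.
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    sales_amount REAL NOT NULL
);
INSERT INTO dim_product  VALUES (1, 'Nationwide Personal', 'PL', 10);
INSERT INTO dim_location VALUES (1, 'Branch A', 'Columbus', 'OH', 'USA');
INSERT INTO dim_date     VALUES (20150803, '2015-08-03');
INSERT INTO fact_sales   VALUES (1, 1, 20150803, 250.0);
""")

# A fact row pointing at a product key that is not in dim_product is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 20150803, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Joining the fact back to its dimensions supplies the descriptive context.
row = conn.execute("""
    SELECT p.product_name, l.location_city, d.calendar_date, f.sales_amount
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_location l ON l.location_key = f.location_key
    JOIN dim_date     d ON d.date_key     = f.date_key
""").fetchone()
print(row)  # ('Nationwide Personal', 'Columbus', '2015-08-03', 250.0)
```

The fact row itself holds only keys and a measure; everything a report would display comes from the dimensions it points to.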
Surrogate Keys
Before moving on to designing dimensional models, it’s important to understand the concept of surrogate keys.
A surrogate key is an unintelligent ("dumb") key that, unlike a natural key, is not derived from application data.
A surrogate key is generated artificially to accommodate ongoing changes within the fact and dimension tables.
It is usually an incrementing integer that takes the values 1 to N as rows are added to a data warehouse table.
Why Surrogate Keys
They avoid key conflicts between backend applications.
They keep dimension keys consistent, since different backend applications may use different columns as keys.
They insulate the data warehouse from changes in the backend application data.
They make it possible to keep the history of slowly changing dimensions.
Surrogate keys are usually integers rather than characters.
Surrogate keys are also used for recycling data as per business requirements.
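The incrementing-key and key-conflict points can be illustrated with a minimal Python sketch. SurrogateKeyAssigner, the source-system names and the natural key "P-100" are hypothetical; the sketch assumes a natural key is only unique within its own source system, so lookups use the pair (source system, natural key) and each new pair receives the next integer in a 1..N sequence.

```python
from itertools import count


class SurrogateKeyAssigner:
    """Hypothetical helper that issues integer surrogate keys 1, 2, 3, ...

    Lookups use (source_system, natural_key), so two backend applications
    that reuse the same natural key for different records get distinct keys.
    """

    def __init__(self):
        self._sequence = count(start=1)  # 1..N, one value per new record
        self._lookup = {}

    def key_for(self, source_system, natural_key):
        lookup_key = (source_system, natural_key)
        if lookup_key not in self._lookup:
            self._lookup[lookup_key] = next(self._sequence)
        return self._lookup[lookup_key]


assigner = SurrogateKeyAssigner()
print(assigner.key_for("policy_admin", "P-100"))  # 1
print(assigner.key_for("claims", "P-100"))        # 2: same natural key, other source
print(assigner.key_for("policy_admin", "P-100"))  # 1: existing record keeps its key
```

In a real warehouse this lookup would typically be a persistent key-mapping table maintained by the ETL process rather than an in-memory dictionary.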
Designing a Dimensional Model
Understanding the business problem is the most important step. While designing a data model solution, you should be able to answer: Why, How Much, When/Where/Who, and What.
Designing dimensional models typically involves the following steps:
1. Choose the Business Process (Why)
2. Declare the Grain (How Much)
3. Identify the Dimensions (When/Where/Who, the 3 Ws)
4. Identify the Facts (What)
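One way to keep the four answers together is to record them in a small data structure and check that they agree, for example that every dimension named in the grain has actually been identified. The DimensionalDesign class below is a hypothetical Python sketch, not part of any modelling tool; the sample values follow the Nationwide example used throughout these slides, and the "Week" attribute for Time is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DimensionalDesign:
    """Hypothetical record of the four design decisions."""
    business_process: str                 # Why
    grain: dict[str, str]                 # How Much: dimension -> level of detail
    dimensions: dict[str, list[str]] = field(default_factory=dict)  # When/Where/Who
    facts: list[str] = field(default_factory=list)                  # What

    def problems(self) -> list[str]:
        """Return inconsistencies in the design; an empty list means none were found."""
        issues = [f"grain refers to undeclared dimension: {name}"
                  for name in self.grain if name not in self.dimensions]
        if not self.facts:
            issues.append("no facts identified")
        return issues


design = DimensionalDesign(
    business_process="Sales of certain products in different locations over time",
    grain={"Product": "product", "Location": "location", "Time": "week"},
    dimensions={
        "Product": ["Product Key", "Product Name", "Product Specs", "Product Type"],
        "Location": ["Location Country", "Location City", "Location Street", "Location Name"],
        "Time": ["Week"],
    },
    facts=["Sales"],
)
print(design.problems())  # []
```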
Choose the Business Process
The actual business process the data warehouse should cover.
Describes the problem for which the model should be built.
This is the “why” of building a data model.
Here is a sample business process:
The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.
Declare the Grain
The Grain describes the level of detail needed for the business problem/solution.
It is the lowest level of information stored in any table. This is the “How much” of building a data model.
Sample Grain: The Senior Executives at Nationwide want to determine the sales for certain products in different locations for every week. So the grain is “by product by location by week”.
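Declaring the grain determines how source records are rolled up before they are stored. The Python sketch below aggregates hypothetical transaction-level sales into the “by product by location by week” grain. It assumes that "week" means the ISO calendar week returned by date.isocalendar(); a real warehouse might use its own fiscal calendar instead.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level records: (product, location, sale_date, amount).
transactions = [
    ("Nationwide Pet", "Columbus", date(2015, 8, 3), 35.0),
    ("Nationwide Pet", "Columbus", date(2015, 8, 5), 35.0),
    ("Nationwide Pet", "Columbus", date(2015, 8, 10), 35.0),
    ("Nationwide Personal", "Columbus", date(2015, 8, 4), 10.0),
]


def to_weekly_grain(rows):
    """Sum amounts to one row per (product, location, ISO year, ISO week)."""
    totals = defaultdict(float)
    for product, location, sale_date, amount in rows:
        iso_year, iso_week, _weekday = sale_date.isocalendar()
        totals[(product, location, iso_year, iso_week)] += amount
    return dict(totals)


weekly = to_weekly_grain(transactions)
for key, total in sorted(weekly.items()):
    print(key, total)
```

The two Pet sales on 3 and 5 August 2015 fall in ISO week 32 and become a single row; once data is stored at this grain, the daily detail can no longer be recovered from it.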
Identify the Dimension
Dimensions are the reference information for the business. This step defines the dimension tables with their attributes (columns) and hierarchies. This is the “When, Where and Who” of building a data model.
Sample Dimensions: The Senior Executives at Nationwide want to determine the sales for certain products in different locations for a particular time period.
Dimensions here are: Product, Location and Time.
Dimension attributes: for Product, Product Key (surrogate key), Product Name, Product Specs, Product Type.
Dimension hierarchies: for Location, location country, location city, location street, location name.
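A hierarchy lets the same facts be summarized at different levels. The Python sketch below rolls hypothetical sales up the Location hierarchy (country, city, street, name); the branch names, streets and amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical Location dimension keyed by surrogate key, carrying the
# hierarchy country > city > street > name.
location_dim = {
    1: {"country": "USA", "city": "Columbus", "street": "High St", "name": "Branch A"},
    2: {"country": "USA", "city": "Columbus", "street": "Broad St", "name": "Branch B"},
    3: {"country": "USA", "city": "Dayton", "street": "Main St", "name": "Branch C"},
}

# Hypothetical sales facts: (location_key, sales_amount).
sales_facts = [(1, 100.0), (2, 50.0), (3, 75.0), (1, 25.0)]


def roll_up(facts, level):
    """Sum sales at one level of the Location hierarchy."""
    totals = defaultdict(float)
    for location_key, amount in facts:
        totals[location_dim[location_key][level]] += amount
    return dict(totals)


print(roll_up(sales_facts, "city"))     # {'Columbus': 175.0, 'Dayton': 75.0}
print(roll_up(sales_facts, "country"))  # {'USA': 250.0}
```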
Identify the Fact
Facts are the measurable events described by the dimensions. This is the “What” of building a data model.
Sample Facts: The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.
The fact here is: Sum of Sales by product by location by time.
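A fact table stored at the declared grain can be sketched in SQL through Python's sqlite3 module. The table fact_sales_weekly, its week_key encoding (e.g. 201532 for week 32 of 2015) and the rows are hypothetical. The composite primary key allows only one row per product, location and week, and the query sums sales by product by location over a chosen time period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales_weekly (
    product_key  INTEGER NOT NULL,
    location_key INTEGER NOT NULL,
    week_key     INTEGER NOT NULL,  -- hypothetical encoding: 201532 = week 32 of 2015
    sales_amount REAL NOT NULL,
    PRIMARY KEY (product_key, location_key, week_key)  -- one row per grain
);
INSERT INTO fact_sales_weekly VALUES
    (1, 1, 201532, 25.0),
    (1, 1, 201533, 15.0),
    (3, 1, 201532, 35.0),
    (3, 2, 201533, 35.0),
    (3, 2, 201540, 70.0);
""")

# A second row for an existing (product, location, week) violates the grain.
try:
    conn.execute("INSERT INTO fact_sales_weekly VALUES (1, 1, 201532, 5.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Sum of sales by product by location for weeks 32-33 of 2015.
totals = conn.execute("""
    SELECT product_key, location_key, SUM(sales_amount) AS total_sales
    FROM fact_sales_weekly
    WHERE week_key BETWEEN 201532 AND 201533
    GROUP BY product_key, location_key
    ORDER BY product_key, location_key
""").fetchall()
print(totals)  # [(1, 1, 40.0), (3, 1, 35.0), (3, 2, 35.0)]
```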
Types of Dimensional Model Schemas
A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex: a simple star schema has one fact table, whereas a complex star schema has more than one fact table.
A snowflake schema extends the star schema by normalizing dimension tables into additional, related sub-dimension tables. Snowflake schemas are useful when the dimensions contain low-cardinality attributes.
[Figure: Star Schema and Snowflake Schema diagrams]
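The structural difference can be seen by storing the same Product dimension both ways. In this hypothetical sqlite3 sketch, the star version keeps Product Type on every product row, while the snowflake version moves that low-cardinality attribute into its own table. The product rows come from the Product Dimension example in these slides; the table names and the renamed type value 'PL_NEW' are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product type is repeated on every product row (denormalized).
CREATE TABLE star_dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, product_type TEXT
);
INSERT INTO star_dim_product VALUES
    (1, 'Nationwide Personal', 'PL'),
    (2, 'Nationwide Commercial', 'CL'),
    (3, 'Nationwide Pet', 'PL');

-- Snowflake: the product type lives in its own table (normalized).
CREATE TABLE snow_dim_product_type (
    product_type_key INTEGER PRIMARY KEY, product_type TEXT
);
CREATE TABLE snow_dim_product (
    product_key      INTEGER PRIMARY KEY,
    product_name     TEXT,
    product_type_key INTEGER REFERENCES snow_dim_product_type(product_type_key)
);
INSERT INTO snow_dim_product_type VALUES (1, 'PL'), (2, 'CL');
INSERT INTO snow_dim_product VALUES
    (1, 'Nationwide Personal', 1),
    (2, 'Nationwide Commercial', 2),
    (3, 'Nationwide Pet', 1);
""")

star = conn.execute(
    "SELECT product_name, product_type FROM star_dim_product ORDER BY product_key"
).fetchall()

# The snowflake needs an extra join to recover the same attributes.
snowflake = conn.execute("""
    SELECT p.product_name, t.product_type
    FROM snow_dim_product p
    JOIN snow_dim_product_type t ON t.product_type_key = p.product_type_key
    ORDER BY p.product_key
""").fetchall()

# Renaming a type (hypothetical new value 'PL_NEW') touches every matching
# product row in the star, but a single row in the snowflake.
star_updates = conn.execute(
    "UPDATE star_dim_product SET product_type = 'PL_NEW' WHERE product_type = 'PL'"
).rowcount
snow_updates = conn.execute(
    "UPDATE snow_dim_product_type SET product_type = 'PL_NEW' WHERE product_type = 'PL'"
).rowcount

print(star == snowflake, star_updates, snow_updates)  # True 2 1
```

Both queries return the same rows, but the snowflake pays for its lower redundancy with an extra join at query time.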
Differences between Star and Snowflake Schema

| Property | Star Schema | Snowflake Schema |
|---|---|---|
| Ease of maintenance / change | Difficult to maintain due to high redundancy. | Easy to maintain due to low redundancy. |
| Fact and dimension properties | Dimension and fact tables are denormalized. | Dimension tables are normalized; fact tables are denormalized. |
| Ease of use | Easier to understand due to simpler queries. | Difficult to understand due to more complex queries. |
| Query performance | Good, due to less complex joins (fewer foreign keys). | Poor, due to more complex joins (more foreign keys). |
| Type of data warehouse | Simple relationships (one-to-one / one-to-many). | Complex relationships (many-to-many). |
| When to use | Smaller dimension tables. | Larger dimension tables; a snowflake schema helps reduce space. |

Slowly Changing Dimensions
Sometimes the attribute values within a dimension change to reflect business decisions or rules.
These changes to the dimension data have to be accounted for in the data model.
The changes are unpredictable rather than occurring on a fixed schedule.
Dimensions that change in this way are called Slowly Changing Dimensions (SCDs).
Illustration of Slowly Changing Dimensions - I
Let's consider our example: The Senior Executives at Nationwide want to determine the sales for certain products in certain locations for a particular time period.
Consider the Product Dimension:

| Product Key | Product Name | Product Type | Product Price |
|---|---|---|---|
| 1 | Nationwide Personal | PL | $10 |
| 2 | Nationwide Commercial | CL | $25 |
| 3 | Nationwide Pet | PL | $35 |

Suppose the company decides tomorrow that Nationwide Pet should be classified as Others instead of PL, or decides to change the price of Nationwide Personal from $10 to $15.
How will that affect analysis and reporting, and how do we account for such changes?
Do we keep the old historical data, or do we insert the new data directly?
Methodologies for Handling Slowly Changing Dimensions
Type 1: No need to track historical data; simply overwrite the existing data with the new data. (No history)
Type 2: Historical data should be tracked. Create a new row for the same natural key, but with a different surrogate key. (Full history)
Type 3: Historical data should be partially tracked, between Type 1 and Type 2. Add columns to track the current and previous state of the changing attribute. (Partial history)
Illustrations of SCD Handling Methodologies
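Let's take the product dimension. The Product Type of Nationwide Pet changes from PL to Others.

Type 1, before the change:

| Product Key | Product Name | Product Type | Product Price |
|---|---|---|---|
| 3 | Nationwide Pet | PL | $35 |

Type 1, after the change (the row is overwritten):

| Product Key | Product Name | Product Type | Product Price |
|---|---|---|---|
| 3 | Nationwide Pet | Others | $35 |

Type 2 (a new row with a new surrogate key):

| Product Key | Product Name | Product Type | Product Price | Effective Date | Expiry Date | Latest_Ind |
|---|---|---|---|---|---|---|
| 3 | Nationwide Pet | PL | $35 | 01-01-2000 | 08-10-2015 | N |
| 4 | Nationwide Pet | Others | $35 | 08-11-2015 | 12-31-9999 | Y |

Type 3 (an additional column):

| Product Key | Product Name | Product Type_Old | Product Price | Product Type_New |
|---|---|---|---|---|
| 3 | Nationwide Pet | PL | $35 | Others |
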
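The three methodologies can be sketched as Python functions over an in-memory list of row dictionaries. This is a minimal illustration, not a production ETL routine: the column names, the use of Product Name as the natural key, and the helper pet_row are assumptions, and the open-ended expiry date mirrors the 12-31-9999 value in the Type 2 illustration.

```python
from datetime import date, timedelta

# Open-ended expiry date, mirroring 12-31-9999 in the Type 2 illustration.
HIGH_DATE = date(9999, 12, 31)


def scd_type1(rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["product_name"] == natural_key:
            row[attr] = new_value
    return rows


def scd_type2(rows, natural_key, attr, new_value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["product_name"] == natural_key and r["latest_ind"] == "Y")
    new_row = dict(current,
                   product_key=max(r["product_key"] for r in rows) + 1,
                   effective_date=change_date,
                   expiry_date=HIGH_DATE,
                   latest_ind="Y")
    new_row[attr] = new_value
    current["expiry_date"] = change_date - timedelta(days=1)
    current["latest_ind"] = "N"
    rows.append(new_row)
    return rows


def scd_type3(rows, natural_key, attr, new_value):
    """Type 3: keep the previous value in <attr>_old and the current one in <attr>_new."""
    for row in rows:
        if row["product_name"] == natural_key:
            # First change: the original column becomes <attr>_old.
            # Later changes: the previous <attr>_new becomes <attr>_old.
            row[attr + "_old"] = row.pop(attr) if attr in row else row[attr + "_new"]
            row[attr + "_new"] = new_value
    return rows


def pet_row():
    """Hypothetical helper: the Nationwide Pet row before the change."""
    return {"product_key": 3, "product_name": "Nationwide Pet",
            "product_type": "PL", "product_price": 35}


t1 = scd_type1([pet_row()], "Nationwide Pet", "product_type", "Others")
t2 = scd_type2([dict(pet_row(), effective_date=date(2000, 1, 1),
                     expiry_date=HIGH_DATE, latest_ind="Y")],
               "Nationwide Pet", "product_type", "Others", date(2015, 8, 11))
t3 = scd_type3([pet_row()], "Nationwide Pet", "product_type", "Others")
```

Applied to the Nationwide Pet row, t2 reproduces the Type 2 illustration: key 3 expires on 2015-08-10 with Latest_Ind N, and key 4 carries Others from 2015-08-11 with Latest_Ind Y. Type 1 loses the PL value entirely, which is why it keeps no history.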
Relational vs Dimensional Models
| Relational Data Models | Dimensional Data Models |
|---|---|
| Units of storage are tables. | Units of storage are cubes. |
| Data is normalized. | Data is denormalized. |
| Detailed level of transactions. | Aggregates and measures used for business. |
| Volatile and time variant. | Non-volatile and time invariant. |
| Used for OLTP. | Used for OLAP cubes. |
| Normal reports. | Interactive, user-friendly reports. |