01 presentation dwh
Lnt Infotech Use Only
An Introduction to Data Warehousing
Objectives
• Data Warehouse Overview
• Data Warehouse, OLTP & ODS
• Data Warehouse Architecture
• Data Models in Data Warehousing
• Slowly changing dimensions
• Surrogate Keys
A producer wants to know…
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
• What product promotions have the biggest impact on revenue?
• What is the most effective distribution channel?
Data, data everywhere, yet ...
• I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
• I can’t get the data I need
– need an expert to get the data
• I can’t understand the data I found
– available data poorly documented
• I can’t use the data I found
– results are unexpected
– data needs to be transformed from one form to another
What are the users saying...
• Data should be integrated across the enterprise
• Summary data has a real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required
Data Warehouse
• A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making
-- Bill Inmon, Building the Data Warehouse 1996
What is Data Warehousing
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Forrester Research, April 1996]
[Figure: Data transformed into Information]
Need for Data Warehousing
• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports
Evolution of Data Warehousing
1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Focus on Reporting
Evolution of Data Warehousing
1985 - 1990 : Querying Era
• Adhoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis
Focus on Online Querying
Evolution of Data Warehousing
1990 - 20xx : Analysis Era
• Trend Analysis
• What If ?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Focus on Online Analysis
Data Warehousing Concepts and Terms
Some terms are central to understanding data warehousing concepts:
Operational Data: The data used to run a business. It is what an Online Transaction Processing (OLTP) system typically stores, retrieves and updates. Operational data is usually stored in a relational database, but may also be stored in legacy, hierarchical or flat file formats.
Informational Data: Data stored in a format that makes analysis much easier. Analysis can take the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis. Informational data is created from the wealth of operational data that exists in the business, and it is what makes up a data warehouse.
OLTP Systems Vs Data Warehouse
Remember: between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different
Understanding The Differences Is The Key
Capacity Planning
[Figure: processing power plotted against time of day]
Processing Load Peaks During the Beginning and End of Day
OLTP Vs Data Warehouse
Characteristic | OLTP | Data Warehouse
Orientation | Transaction | Analysis
Data Content | Current values | Summarized, archived, derived; usually historical values
Data Structure | Optimized for transactions; highly normalized | Optimized for complex queries; often de-normalized
OLTP Vs Data Warehouse
Characteristic | OLTP | Data Warehouse
Data Access | Record at a time | Data set at a time
Access Frequency | Read/Update/Delete | Read/Aggregate
Concurrent Users | Many | Few
Data Stability | Dynamic | Static until refreshed
Data Organization | By application | By subject
Usage | Predictable, repetitive | Ad hoc, heuristic
Support | Day-to-day operations | Managerial needs
Response time | Few seconds | Several seconds to minutes
Do we need a separate database ?
• OLTP and data warehousing require two very differently configured systems
• Isolation of Production System from Business Intelligence System
• Significant and highly variable resource demands of the data warehouse
• Cost of disk space no longer a concern
• Production systems not designed for query processing
Why Separate Data Warehouse?
Performance
• Special data organization, access methods, and implementation methods are needed to support the multidimensional views and operations typical of OLAP
• Complex OLAP queries would degrade performance for operational transactions
• Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis
Why Separate Data Warehouse?
Function
• Missing data: Decision support requires historical data, which operational DBs do not typically maintain
• Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources
• Data quality: Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
Operational Data Store - Definition
[Figure: the ODS shown together with the Operational systems, the Data Warehouse and DSS, with flows labeled A, B and C]
Operational Data Store - Definition
A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data
• Data from multiple sources is integrated for a subject
• Example: Can I see the credit report from Accounts, sales from Marketing and the open order report from Order Entry for this customer?
• Identical queries may give different results at different times. Supports analysis requiring current data
• Data is stored only for the current period. Old data is either archived or moved to the Data Warehouse
Operational Data Store
• The ODS applies only to the world of operational systems.
• The ODS contains current valued and near current valued data.
• The ODS contains almost exclusively detail data.
• The ODS requires a full function, update, record oriented environment.
Different kinds of Information Needs
• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of Tuberculosis increased in the last 5 years in the Southern region?
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data structure | Detailed | Detailed and lightly summarized | Detailed and summarized
Data organization | Functional | Subject-oriented | Subject-oriented
Type of data | Homogeneous | Homogeneous | Vast supply of very heterogeneous data
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data update | Field by field | Field by field | Controlled batch
Database size | Moderate | Moderate | Large to very large
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise
Logical Transformation of operational data
Figure 3. Reasons for moving data outside the operations systems
• Different performance requirements
• Combine data from multiple applications
• Data is mostly non-volatile
• Data saved for a long time period
Order processing
• 2 second response time
• Last 6 months orders
Product Price/inventory
• 10 second response time
• Last 10 price changes
• Last 20 inventory transactions
Marketing
• 30 second response time
• Last 2 years programs
Data Warehouse
• Last 5 years data
• Response time 2 seconds to 60 minutes
• Data is not modified
Logical Transformation of operational data
Figure 5. Data warehouse entities align with the business structure
• No data model restrictions of the source application
• Data warehouse model has business entities
Operational systems shown: Order processing, Product Price/inventory, Marketing
Operational data shown: Customer orders, Product price, Available Inventory, Product Inventory, Product Price changes, Customer Profile, Marketing programs
Data Warehouse entities: Customers, Products, Product Inventory, Product Price, Orders
Logical Transformation of operational data
Figure 6. Transformation of the operational state information
• Operational state information is not carried to the data warehouse
• Data is transferred to the data warehouse after all state changes
[Figure: in the Order Processing System an order moves through the states Open, Backorder, Shipped and Closed while inventory goes up and down; the Data Warehouse receives daily closed orders (Orders (Closed)) and a weekly inventory snapshot (Inventory snapshot 1, Inventory snapshot 2)]
Advantages of Data Warehouse
• Time saving: The warehouse enables employees to shift their time from collecting information to analyzing it, which helps the company make better business decisions.
• Efficiency: A DW provides, in one central repository, all the metrics necessary to support decision making through queries and reports.
• Complete documentation: A typical DW objective is to store all the information, including history.
Advantages of Data Warehouse
• Data Integration: The primary goal of every DW is to integrate data, because:
a) Lack of integration is a primary deficiency in current decision support systems.
b) Data content in one file is at a different level of granularity than that in another file.
c) The same data in one file is updated at a different time period than that in another file.
Limitation of Data Warehouse
• High cost of building and on-going maintenance ($3 - 5 million).
• Complexity: A DW has to integrate the data of all the transaction system databases, and hence requires more time to design and build (an average DW requires approx. 3 years to implement).
• The answer to these limitations is Data Marts.
Data Marts
• Subject or Application Oriented Business View of Warehouse
– Quick Solution to a specific Business Problem
– Finance, Marketing, Sales etc.
– Smaller amount of data used for Analytic Processing
A Logical Subset of The Complete Data Warehouse
Data Marts
[Figure: the Marketing Data Mart, Finance Data Mart and Sales Data Mart shown above the Current Level of Detail (Data Warehouse)]
Data Mart Appeal
What is the appeal of the Data Mart?
Why do departments find it convenient to do their decision support processing in their own data mart?
What is wrong with the data warehouse as a basis for standard decision support making?
There are several factors leading to the popularity of the data mart.
Data Mart Appeal
As data warehouses grow:
• The competition to get inside the data warehouse grows fierce. More and more departmental decision support processing is done inside the data warehouse, to the point where resource consumption becomes a real problem
• Data becomes harder to customize
• The cost of doing processing in the data warehouse increases as the volume of data increases
• A department can build a data mart on its own budget, and thereby make all the technological decisions it wants
Summary of Data Mart Appeal
• While the DW was designed to manage the bulk supply of data from its suppliers (i.e. operational systems), and to handle the organization and storage of this data, the “retail stores” or “Data Marts” could be focussed on packaging & presenting selections of data to end-users, often to meet specialized needs.
Data Warehouse and Data Mart
Characteristic | Data Warehouse | Data Marts
Scope | Application neutral; centralized, shared; enterprise | Specific application requirement; department; business process oriented
Data perspective | Historical detailed data; some summary | Detailed (some history); summarized
Subjects | Multiple subject areas | Single partial subject; multiple partial subjects
Data Warehouse and Data Mart
Characteristic | Data Warehouse | Data Marts
Data sources | Many; operational/external data | Few; operational, external data
Implementation time frame | 9-18 months for first stage; multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; durable/strategic; data orientation | Restrictive, non extensible; short life/tactical; project orientation
Data Warehouses or Data Marts
For companies interested in changing their corporate cultures or integrating separate departments, an enterprise wide approach makes sense.
Companies that want a quick solution to a specific business problem are better served by a standalone data mart.
Some companies opt to build a warehouse incrementally, data mart by data mart.
Warehouse or Mart First ?
Data Warehouse First | Data Mart First
Expensive | Relatively cheap
Large development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible
Data Warehousing Model
[Figure: sources (Operational Data, Distributed data, External market data) pass through ETL into the Data Warehouse and Data Marts, which are used by DSS Tools, OLAP Tools and Data Mining]
Typical Data Warehouse Architecture
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Middleware/API; Data Warehouse with Metadata; Data Marts; front-end tools (EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining)]
Multi-tiered Data Warehouse without ODS
Typical Data Warehouse Architecture
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); ODS with Metadata; Data Preparation (Select, Extract, Transform, Load); Data Warehouse with Metadata; Data Marts]
Multi-tiered Data Warehouse with ODS
Application of Data Warehousing
• OLAP
• Data Mining
Commonly used Terms in OLAP
Measure: A numeric quantity that describes the business.
Dimension: A category of information that describes the measure, e.g. the Time dimension.
Attribute: A unique level within a dimension, e.g. Month is an attribute within the Time dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is
Year -- Quarter -- Month -- Day
OLAP : Online Analytical Processing
OLAP is a common use of a data warehouse that involves real-time access to and analysis of multi-dimensional data such as sales information.
The term OLAP was coined to contrast with OLTP (Online Transaction Processing). Key characteristics of OLAP include
• Large data volumes
• Drill down along many dimensions
• Dynamic viewing and analysis of the data from a wide variety of perspectives and through complex formulae
Online Analytical Processing
OLAP EXAMPLE:
An example OLAP database may be comprised of sales data which has been aggregated by region, product type, and sales channel. A typical OLAP query might access a multi-year sales database in order to find all product sales in each region for each product type.
After reviewing the results, an analyst might further refine the query to find sales volume for each sales channel within region/product classifications.
As a last step the analyst might want to perform year-to-year or quarter-to-quarter comparison for each sales channel. This whole process must be carried out on-line with rapid response time so that the analysis process is undisturbed.
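The three steps of this example can be sketched as successive queries against one table. This is only a minimal illustration using Python's built-in sqlite3 module, not an OLAP server; the table name sales, its columns and every row are made up for the example.

```python
import sqlite3

# Hypothetical stand-in for a multi-year sales database.
# Columns: (year, region, product_type, channel, amount)
ROWS = [
    (1995, "East", "Fruit", "Retail", 100.0),
    (1995, "East", "Fruit", "Online", 40.0),
    (1995, "West", "Fruit", "Retail", 70.0),
    (1996, "East", "Fruit", "Retail", 120.0),
    (1996, "East", "Fruit", "Online", 60.0),
    (1996, "West", "Fruit", "Retail", 65.0),
]
COLUMNS = {"year", "region", "product_type", "channel"}


def build_db():
    """Create the in-memory sales table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (year INTEGER, region TEXT, "
        "product_type TEXT, channel TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def sales_by(conn, *columns):
    """Total sales grouped by the given columns (at least one is required)."""
    unknown = set(columns) - COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    cols = ", ".join(columns)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


conn = build_db()
step1 = sales_by(conn, "region", "product_type")             # sales per region and product type
step2 = sales_by(conn, "region", "product_type", "channel")  # refined by sales channel
step3 = sales_by(conn, "channel", "year")                    # year-to-year comparison per channel
```

Each step only changes the grouping columns, which is why a dedicated OLAP engine can answer such refinements quickly from pre-aggregated data.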
Introduction to Cubes
[Figure: a Sales cube with three dimensions: Time (Q1, Q2, Q3, Q4), Product (Grapes, Apples, Melons, Cherries, Pears) and Location (Atlanta, Denver, Detroit)]
Online Analytical Processing
OLAP database servers support common analytical operations including “slicing and dicing”, consolidation and drill-down.
Slicing and Dicing: Refers to the ability to look at the database from different viewpoints. One slice of the sales database might show all sales of a product type within a region. Another slice might show all sales by sales channel within each product type. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.
Consolidation: Involves the aggregation of data, such as simple roll-ups; for example, sales offices can be rolled up to districts and districts rolled up to regions.
Drill-Down: OLAP database servers can also go in the reverse direction and automatically display the detail data that makes up the consolidated data. This is called drill-down. Consolidation and drill-down are inherent properties of OLAP servers.
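A rough sketch of these three operations on a handful of detail rows, in plain Python. The rows, the office/district/region names and the helper functions slice_rows, roll_up and drill_down are all hypothetical; a real OLAP server would do this over pre-built aggregates.

```python
from collections import defaultdict

# Hypothetical detail rows: each sales office belongs to a district,
# and each district belongs to a region.
FACTS = [
    {"office": "O1", "district": "D1", "region": "North", "quarter": "Q1", "sales": 10},
    {"office": "O2", "district": "D1", "region": "North", "quarter": "Q1", "sales": 20},
    {"office": "O3", "district": "D2", "region": "North", "quarter": "Q2", "sales": 30},
    {"office": "O4", "district": "D3", "region": "South", "quarter": "Q1", "sales": 40},
]


def slice_rows(rows, **fixed):
    """Slice: keep only the rows where every named attribute has the given value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def roll_up(rows, level):
    """Consolidation: total sales aggregated to one level of the hierarchy."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[level]] += r["sales"]
    return dict(totals)


def drill_down(rows, parent_level, parent_value, child_level):
    """Drill-down: the child-level detail that makes up one consolidated value."""
    return roll_up(slice_rows(rows, **{parent_level: parent_value}), child_level)


by_region = roll_up(FACTS, "region")                            # offices rolled up to regions
north_detail = drill_down(FACTS, "region", "North", "district")  # what makes up "North"
q1_by_region = roll_up(slice_rows(FACTS, quarter="Q1"), "region")  # a slice along time
```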
Data Mining
Data Mining is also called “Knowledge Discovery in Databases (KDD)”.
Data Mining also refers to “using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value, as no direct use can be made of it; it is the hidden information in the data that is useful.”
Applications of Data Mining
Data mining has varied fields of applications some of which are listed below:
RETAIL/ MARKETING
Identify buying patterns from customers
Find associations among customer demographic characteristics
Predict response to mailing campaigns
BANKING
Detect patterns of fraudulent credit card use
Identify loyal customers
Determine credit card spending by customer groups
Find hidden correlations between different financial indicators
Who uses Data Warehouse
• Managers use sales data to improve forecasting & planning for brands, product lines & business areas.
• Retail purchasing managers use DW to track fast-moving lines & ensure an adequate supply of high demand products.
• Financial analysts use warehouses to manage currency & exchange exposures, oversee cash flow & monitor capital expenditures.
Questions
Introduction to Data Modeling
Objectives
• At the end of this lesson, you will know:
– Data Modeling for Data Warehouse
– What are dimensions and facts
– Star Schema and Snowflake Schemas
– Factless Tables
– Some modeling tools
Data Modeling for Data Warehouse
• How to structure the data in your data warehouse?
• Process that produces abstract data models for one or more database components of the data warehouse
• Modeling for Warehouse is different from that for Operational database
– Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling
Modeling Techniques
• Entity-Relationship Modeling
– Traditional modeling technique
– Technique of choice for OLTP
– Suited for corporate data warehouse
• Dimensional Modeling
– Analyzing business measures in the specific business context
– Helps visualize very abstract business questions
– End users can easily understand and navigate the data structure
Entity-Relationship Modeling - Basic Concepts
• The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.
• The highest art form of ER modeling is to remove all redundancy in the data.
• The result: databases that cannot be queried!
An Order Processing ER Model
[Figure: ER model with the entities Order Header, Order Details, Customer Table, Item Table, Salesrep table, City, Sales District, Sales Region, Sales Country, Product Brand and Product Category, connected by foreign keys (FK)]
Entity-Relationship Modeling - Basic Concepts
• Entity
– Object that can be observed and classified by its properties and characteristics
– Business definition with a clear boundary
– Characterized by a noun
– Example
• Product
• Employee
Entity-Relationship Modeling - Basic Concepts
• Relationship
– Relationship between entities - structural interaction and association
– Described by a verb
– Cardinality
• 1-1
• 1-M
• M-M
– Example : Books belong to Printed Media
Entity-Relationship Modeling - Basic Concepts
• Attributes
– Characteristics and properties of entities
– Example :
• Book Id, Description, book category are attributes of entity “Book”
– Attribute name should be unique and self-explanatory
– Primary Key, Foreign Key, Constraints are defined on Attributes
Entity-Relationship Modeling – Why Not ?
• End users cannot understand or remember an ER model.
• No graphical user interface (GUI) takes a general ER model and makes it usable by end users.
• Software cannot usefully query a general ER model.
• Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.
Dimensional Modeling - Basic Concepts
• Represents the data in a standard, intuitive framework that allows for high-performance access.
• Schema designed to process large, complex, ad hoc and data-intensive queries.
• No concern for concurrency, locking and insert/update/delete performance
• Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
• This characteristic "star-like" structure is often called a star join.
Star Schema Architecture
Star Schema Example
Star Schema with Sample Data
Star Schema Architecture
Sales Fact: time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
Time Dimension: time_key, day_of_week, month, quarter, year, holiday_flag
Product Dimension: product_key, description, brand, category
Store Dimension: store_key, store_name, address, floor_plan_type
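One possible way to write this star schema down as tables, sketched with Python's built-in sqlite3 module. The column names come from the slide; the table names (sales_fact, time_dim, product_dim, store_dim) and all column types are assumptions made for the illustration.

```python
import sqlite3

# Column names are from the slide; table names and types are assumed.
STAR_SCHEMA = """
CREATE TABLE time_dim (
    time_key     INTEGER PRIMARY KEY,
    day_of_week  TEXT,
    month        INTEGER,
    quarter      INTEGER,
    year         INTEGER,
    holiday_flag INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT,
    brand       TEXT,
    category    TEXT
);
CREATE TABLE store_dim (
    store_key       INTEGER PRIMARY KEY,
    store_name      TEXT,
    address         TEXT,
    floor_plan_type TEXT
);
-- The fact table: one foreign key per dimension (together the multipart
-- primary key) followed by the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key    INTEGER NOT NULL REFERENCES store_dim(store_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key)
);
"""


def create_star_schema():
    """Create the four tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    return conn


def column_names(conn, table):
    """List a table's columns; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = create_star_schema()
fact_columns = column_names(conn, "sales_fact")
```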
Star Schema Architecture
The previous example shows a STAR schema.
The name comes from the shape of the design: a single object (the fact table) sits in the middle and is radially connected to the surrounding dimension tables, so the diagram, and the join a query makes across it, looks like a star.
The fact table is the body of the star and the dimension tables are the points of the star.
Star Schema Architecture
FACT TABLES
• The fact table is where the numerical measurements of the business are stored.
• Typically represents a business transaction, or an event that can be used in analyzing a business process
• Sparse: a row exists only for combinations of dimension values that actually occurred
• Access control to sensitive information is maintained in fact tables
• These tables can be very large, with as many as several billion rows.
Star Schema Architecture
Dimension Tables
• The dimension tables are where the textual descriptions of the dimensions of the business are stored.
• Dimension tables are designed especially for selection and grouping.
• There is no access control on these tables; all users can view this information
• These tables are much smaller than the fact tables; a dimension table may contain 10,000 rows of data.
Star Schema Architecture
• Dimension Tables
– Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
– Dimension tables most often contain descriptive textual information
– Determine contextual background for facts
– Examples :
• Time
• Location/Region
• Customers
Star Schema Architecture
• The database consists of a single fact table and a single table for each dimension.
• Each tuple in the fact table consists of a pointer (foreign key) to each of the dimension tables, together with the measures.
• Each dimension table consists of columns that correspond to attributes of the dimension.
Star Schema Architecture
• A key role for dimension table attributes is to serve as the source of constraints in a query or to serve as row headers in the user’s answer set.
• For example, a typical answer set returned from a query looks like this:
Brand | Dollar Sales | Unit Sales
Axon | 780 | 263
Framis | 1044 | 509
Widget | 213 | 444
Zapper | 95 | 39
Star Schema Architecture
• This query finds all the product brands (collections of individual products) that were sold in the first quarter of 1995 and presents the total dollar sales as well as the number of units sold. Attributes of both the product and time dimensions are used: the product dimension provides the row headers (product brands) and the time dimension provides the constraint (first quarter of 1995).
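The slide does not show the query itself. A query of this shape could look like the sketch below, written against cut-down versions of the star schema tables with Python's sqlite3 module. The table names and the individual fact rows are invented for illustration; they are chosen so that the per-brand totals match the answer set shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact  (time_key INTEGER, product_key INTEGER,
                          dollars_sold REAL, units_sold INTEGER);
""")

# Invented sample rows; only the per-brand totals follow the slide.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 1, 1995), (2, 2, 1995)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Axon"), (2, "Axon"), (3, "Framis"), (4, "Widget"), (5, "Zapper")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 500, 163), (1, 2, 280, 100),   # two Axon products in Q1 1995
    (1, 3, 1044, 509),
    (1, 4, 213, 444),
    (1, 5, 95, 39),
    (2, 1, 999, 999),                     # Q2 1995: excluded by the constraint
])

# Brand (product dimension) supplies the row headers;
# year and quarter (time dimension) supply the constraint.
BRAND_QUERY = """
SELECT p.brand, SUM(f.dollars_sold), SUM(f.units_sold)
FROM sales_fact AS f
JOIN product_dim AS p ON p.product_key = f.product_key
JOIN time_dim    AS t ON t.time_key = f.time_key
WHERE t.year = 1995 AND t.quarter = 1
GROUP BY p.brand
ORDER BY p.brand
"""
answer_set = conn.execute(BRAND_QUERY).fetchall()
```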
Components of a Star Schema
Dimension tables, each joined to the fact table on its key:
• Employee_Dim: EmployeeKey, EmployeeID, ...
• Time_Dim: TimeKey, TheDate, ...
• Product_Dim: ProductKey, ProductID, ...
• Customer_Dim: CustomerKey, CustomerID, ...
• Shipper_Dim: ShipperKey, ShipperID, ...
Fact table:
• Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
Figure labels: Dimensional Keys, Multipart Key, Measures
Fact Table & Dimension Tables
• Fact Tables
– Numerical measurements of the business are stored in fact tables.
• Dimension Tables
– Dimensions are attributes about facts.
Dimension Hierarchies
• For each dimension, the set of associated attributes can be structured as a hierarchy
[Figure: example hierarchies]
store → sType
store → city → region
customer → city → state → country
Dimension Hierarchies
store
storeId | cityId | tId | mgr
s5 | sfo | t1 | joe
s7 | sfo | t2 | fred
s9 | la | t1 | nancy
city
cityId | pop | regId
sfo | 1M | north
la | 5M | south
region
regId | name
north | cold region
south | warm region
sType
tId | size | location
t1 | small | downtown
t2 | large | suburbs
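To see how such a hierarchy is used, the sample rows above can be walked from store to city to region. A minimal Python sketch; the dictionary layout and the helper names region_of and stores_per_region are assumptions, while the data values are taken from the tables above.

```python
from collections import Counter

# Sample rows from the tables above, keyed by their identifiers.
STORE = {
    "s5": {"cityId": "sfo", "tId": "t1", "mgr": "joe"},
    "s7": {"cityId": "sfo", "tId": "t2", "mgr": "fred"},
    "s9": {"cityId": "la", "tId": "t1", "mgr": "nancy"},
}
CITY = {
    "sfo": {"pop": "1M", "regId": "north"},
    "la": {"pop": "5M", "regId": "south"},
}
REGION = {"north": "cold region", "south": "warm region"}


def region_of(store_id):
    """Follow the hierarchy store -> city -> region for one store."""
    city_id = STORE[store_id]["cityId"]
    return CITY[city_id]["regId"]


def stores_per_region():
    """Roll the store level up to the region level by counting stores."""
    return dict(Counter(region_of(store_id) for store_id in STORE))


counts = stores_per_region()
```

The same walk in the other direction (region to city to store) is what a drill-down along this dimension would follow.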
Snowflake Schema
Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables
Snowflake Schema
• Dimension tables are normalized by decomposing at the attribute level
• Each dimension has one key for each level of the dimension’s hierarchy
• Good performance when queries involve aggregation
• Complicated maintenance and metadata, explosion in number of tables
• Makes user representation more complex and intricate
Snowflake schema - Example
[Figure: a Fact Table linked to Dim Tables]
Using a Snowflake Schema
• Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
• Product_Dim: ProductKey, Product Name, Product Size, Product Brand ID
• Product_Brand_ID: Product Brand, Product Category ID
• Product_Category_ID: Product Category, Product Category ID
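In a snowflaked product dimension like this one, reaching the category from the fact table takes one join per level of the hierarchy. A small sketch with Python's sqlite3 module; the lower-case table and column names, the single units measure and all sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY,
                               product_category TEXT);
CREATE TABLE product_brand    (product_brand_id INTEGER PRIMARY KEY,
                               product_brand TEXT,
                               product_category_id INTEGER);
CREATE TABLE product_dim      (product_key INTEGER PRIMARY KEY,
                               product_name TEXT,
                               product_size TEXT,
                               product_brand_id INTEGER);
CREATE TABLE sales_fact       (product_key INTEGER, units INTEGER);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO product_category VALUES (?, ?)",
                 [(1, "Beverages"), (2, "Snacks")])
conn.executemany("INSERT INTO product_brand VALUES (?, ?, ?)",
                 [(10, "BrandA", 1), (11, "BrandB", 1), (12, "BrandC", 2)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(100, "Cola", "1L", 10), (101, "Juice", "1L", 11), (102, "Chips", "200g", 12)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(100, 5), (101, 7), (102, 3), (100, 2)])

# fact -> product -> brand -> category: one join per level of the hierarchy.
CATEGORY_QUERY = """
SELECT c.product_category, SUM(f.units)
FROM sales_fact AS f
JOIN product_dim      AS p ON p.product_key = f.product_key
JOIN product_brand    AS b ON b.product_brand_id = p.product_brand_id
JOIN product_category AS c ON c.product_category_id = b.product_category_id
GROUP BY c.product_category
ORDER BY c.product_category
"""
units_by_category = conn.execute(CATEGORY_QUERY).fetchall()
```

In a star schema the brand and category would be columns of the product dimension itself, so the same question would need only one join.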
Conformed Dimensions
• Dimension that means the same thing with every possible fact table that it can be joined with
• Conformed dimensions are most essential
– For the Bus Architecture
– Integrated function of the Data Warehouse
• Some common dimensions are :
– Customer
– Product
– Location
– Time
Surrogate Keys
• All tables (facts and dimensions) should use Data Warehouse generated surrogate keys rather than production keys
– Production keys sometimes get reused
– In case of mergers/acquisitions, protects you from different key formats
– Production systems may change their systems to generalize key definitions
– Using surrogate keys will be faster
– Can handle Slowly Changing Dimensions well
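One simple way to picture surrogate key assignment is a lookup from (source system, production key) to a warehouse-generated integer. The class below is only a sketch under that assumption; the name SurrogateKeyGenerator, the source system names and the key formats are hypothetical, and a real warehouse would typically keep this mapping in a table.

```python
import itertools


class SurrogateKeyGenerator:
    """Hands out small integer keys that are independent of production keys."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._by_production_key = {}

    def key_for(self, source_system, production_key):
        """Return the surrogate key for a production key, creating it on first sight."""
        natural = (source_system, production_key)
        if natural not in self._by_production_key:
            self._by_production_key[natural] = next(self._next_key)
        return self._by_production_key[natural]


keys = SurrogateKeyGenerator()
crm_customer = keys.key_for("crm", "C-001")   # text key from one source system
erp_customer = keys.key_for("erp", 1001)      # numeric key from another system
crm_again = keys.key_for("crm", "C-001")      # same production key, same surrogate
```

Because the warehouse only ever joins on the integer, differing or changing production key formats stay confined to this mapping.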
Slowly Changing Dimensions
Certain kinds of dimension attribute changes need to be handled differently in the Data Warehouse
• Type I - Overwrite
– e.g. Name Correction, Description changes
• Type II - Partition History
– e.g. Packing change, Customer movement
– Create a new dimension record with a new surrogate key
• Type III - Organizational changes
– e.g. Sales Force Reorganization
– Show sales broken down by the new and old organizations
– Need to create an old and a new field
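The three types can be sketched on dimension rows held as Python dictionaries. This is an illustration only: the column names, the is_current flag used to mark the latest Type II row, the old_ prefix for the Type III field and the sample values are all assumptions, not part of the slides.

```python
def apply_type1(rows, customer_id, **changes):
    """Type I: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row.update(changes)


def apply_type2(rows, customer_id, **changes):
    """Type II: close the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    new_row = {
        **current,
        **changes,
        "customer_key": max(r["customer_key"] for r in rows) + 1,
        "is_current": True,
    }
    rows.append(new_row)
    return new_row["customer_key"]


def apply_type3(row, field, new_value):
    """Type III: keep the previous value in an 'old' field next to the new one."""
    row["old_" + field] = row[field]
    row[field] = new_value


customers = [{"customer_key": 1, "customer_id": "C1", "name": "Jon Smith",
              "city": "Pune", "is_current": True}]
apply_type1(customers, "C1", name="John Smith")   # name correction
new_key = apply_type2(customers, "C1", city="Mumbai")  # customer movement

sales_rep = {"rep_key": 1, "rep_id": "R1", "sales_org": "West"}
apply_type3(sales_rep, "sales_org", "Pacific")    # sales force reorganization
```

After the Type II change, old facts still point at surrogate key 1 (Pune) and new facts at the new key (Mumbai), which is how history is partitioned.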
Factless Fact Tables
• For Event Tracking e.g. attendance
• The fact table holds only keys and no measures: Date_Key, Student_Key, Course_Key, Teacher_Key, Facility_Key
• Each key joins to its dimension: Date Dimension, Student Dimension, Course Dimension, Teacher Dimension, Facility Dimension
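Since a factless fact table has no measures, questions are answered by counting rows. A minimal sketch with Python's sqlite3 module; the table name attendance_fact and the sample key values are made up, and the dimension tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only keys, no measure columns: each row records that an attendance event happened.
conn.execute("""
CREATE TABLE attendance_fact (
    date_key     INTEGER,
    student_key  INTEGER,
    course_key   INTEGER,
    teacher_key  INTEGER,
    facility_key INTEGER
)
""")

# Hypothetical events: (date, student, course, teacher, facility)
conn.executemany("INSERT INTO attendance_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 10, 100, 5),
    (1, 2, 10, 100, 5),
    (1, 1, 11, 101, 6),
    (2, 1, 10, 100, 5),
])

# "How many attendances did each course have?" is a COUNT over the events.
attendance_by_course = conn.execute("""
    SELECT course_key, COUNT(*)
    FROM attendance_fact
    GROUP BY course_key
    ORDER BY course_key
""").fetchall()
```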
Examples of Data Modeling Tools
• ERWIN
– Supports Data Warehouse design as a modeling technique
• Powersoft WarehouseArchitect
– Module of Power Designer specifically for DW Modeling
• Oracle Designer
– Can be extended for Warehouse modeling
• Others like Infomodeler, Silverrun are also used
Questions