Agenda
• Common terms used in data warehousing software and what they mean
• Difference between a database and a data warehouse, and how each is optimised
• What is a cube and what are dimensions?
• High-level overview of PerformancePoint
• Difference between a scorecard and a dashboard
• How do the data warehouse, cube and PerformancePoint relate to one another?
• At which point, and how, should calculated fields be added?
• The purpose and definition of Fact Tables, Dimension Tables, etc.
• Quantifiable benefits organisations achieve through data warehousing
Data Warehouse vs Transaction Database
• Transaction Database
  – Handles day-to-day activities
    • Takes Orders
    • Manages Production
    • Ships Orders
    • Runs Accounts
    • Changes frequently (every hour, minute, second)
• Data Warehouse
  – Handles Planning
    • Looks at historical patterns of Sales
    • Shows trends in demand and production
    • Remains mainly static
      – New data is added and/or corrections made infrequently
Data Warehouse Overview
[Diagram: Operational Source Systems → Extract → Data Staging Area → Load → Data Presentation Area → Access → Data Access Tools]
• Operational Source Systems
• Data Staging Area
  – Services: Clean, combine and standardise; conform dimensions; NO USER QUERY SERVICES
  – Data Store: Flat files and relational tables
  – Processing: Sorting and sequential processing
• Data Presentation Area
  – Data Mart 1: DIMENSIONAL; atomic and summary data; based on a single business process
  – Data Mart 2, 3, etc.
  – DW Bus: Conformed Facts and Dimensions
• Data Access Tools
  – Ad Hoc Query Tools
  – Report Writers
  – Analytic and Modelling Applications
  – SQL, MDX, DMX, Excel, Reporting Services, Report Builder, Analysis Services, PerformancePoint
A Data Warehouse
[Diagram components: Data Profiler; Source Systems; Corrections; ETL; Staging Tables; DQ & ETL; Control & Audit; Metadata; Data Quality; DDS; MDB/Cubes]
[Diagram outputs: Reports; Pivot Tables; Ad Hoc Queries; Spreadsheets; Data Mining; Dashboard; Analytics; Scorecards; Other BI Apps; Cubes]

Name            Description
Data Profiler   Analyses number of rows in tables, how many rows contain nulls, etc.
Metadata        Database containing info about the data structure, data meaning, DQ rules, etc.
ETL             Extract, Transform and Load process
MDB             Multi Dimensional Database
The Data Warehouse: Using an Enterprise Data Warehouse
[Diagram components: Data Profiler; Source Systems; Corrections; ETL; Staging Tables; DQ & ETL; Control & Audit; Metadata; Data Quality; EDW; ETL; DDS (two instances)]
[Diagram outputs: BI Apps; Finance Apps; CRM Apps; Reports]

Name            Description
Data Profiler   Analyses number of rows in tables, how many rows contain nulls, etc.
Metadata        Database containing info about the data structure, data meaning, DQ rules, etc.
ETL             Extract, Transform and Load process
EDW             Enterprise Data Warehouse
EXAMPLE OF A MULTI DIMENSIONAL DATABASE
What is a Multi Dimensional Database?
• Consider a sales operation:
  – We know that last year our total Widget Sales were 53,853
  – How were those sales broken down?
• Broken down by Quarter:

         Q1     Q2      Q3      Q4      Total
Sales    8288   16148   18501   10916   53853

But we need more detail: what were the sales of Left Handed, Right Handed and Ambidextrous Widgets?
Widget Sales in more detail

                        Q1     Q2      Q3      Q4      Total
Total Sales             8288   16148   18501   10916   53853
Left Handed Widgets     660    740     794     911
Right Handed Widgets    6128   6509    7707    8342
Ambidextrous Widgets    1500   1650    1499    1663
But we also need to know the sales by area:
Widget Sales in great detail

                        Q1      Q2       Q3       Q4       Total
Sales                   8,278   16,148   18,501   10,916   53,853
Left Handed Widgets     650     740      794      911
  England               300     330      355      461
  Scotland              200     235      260      261
  Wales                 150     165      179      181
  NI                            10                8
Right Handed Widgets    6,128   6,509    7,707    8,342
  England               2,301   2,565    3,412    3,987
  Scotland              1,387   1,454    1,550    1,651
  Wales                 540     600      765      690
  NI                    1,900   1,890    1,980    2,014
Ambidextrous Widgets    1,500   1,650    1,499    1,663
  England               799     808      789      901
  Scotland              400     501      367      460
  Wales                 300     341      320      299
  NI                    1                23       3
The Cube
[Figure: the detailed widget sales table drawn as a cube with three labelled edges: 4 labels (quarters), 3 labels (products), 4 labels (regions)]
• This structure can hold a certain number of data elements.
• The number of elements is the total number of separate labels multiplied together
• i.e. this structure can hold 4 x 3 x 4 = 48 data elements.
• Which makes it look a lot like a cube…
• That’s as far as the cube analogy can go, because a real data warehouse will have many different sets of independent labels
  – They are called Dimensions
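The label arithmetic above can be sketched in a few lines of Python. This is only an illustration: the three label lists are taken from the widget sales table, and the dictionary used to hold cell values is an assumed representation, not how any particular cube product stores its data.

```python
from itertools import product

# The three sets of labels (dimensions) from the widget example.
quarters = ["Q1", "Q2", "Q3", "Q4"]
products = ["Left Handed", "Right Handed", "Ambidextrous"]
regions = ["England", "Scotland", "Wales", "NI"]

# Each combination of one label per dimension is one cell of the cube.
cells = list(product(quarters, products, regions))
cell_count = len(quarters) * len(products) * len(regions)

# A sparse representation (an assumption for this sketch): only cells
# that actually hold a value are stored, keyed by their three labels.
cube = {
    ("Q1", "Right Handed", "England"): 2301,
    ("Q2", "Right Handed", "England"): 2565,
}

print(cell_count, len(cells))
print(cube[("Q1", "Right Handed", "England")])
```

Adding a fourth dimension (for example a hypothetical Customer dimension) would simply add another list to the product, which is why the cube picture stops being literal after three dimensions.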
Dimension Tables
• Dimension Tables contain the names of each member of the dimension:

Product_ID (Primary Key)   Product_Name          Category
101                        Left Handed Widget    Retail
102                        Right Handed Widget   Retail
103                        Ambidextrous Widget   Specialist
Fact Table
Region_ID Product_ID Quarter Units Price
1 101 1 300 45.20
1 101 2 330 45.20
1 101 3 355 45.20
1 101 4 461 44.00
1 102 1 200 39.00
1 102 2 235 39.00
1 102 3 260 38.50
1 102 4 261 38.50
Fact Table & Dimension Table Relationship
Region_ID Product_ID Quarter Units Price
1 101 1 300 45.20
1 101 2 330 45.20
1 101 3 355 45.20
1 101 4 461 44.00
1 102 1 200 39.00
1 102 2 235 39.00
1 102 3 260 38.50
1 102 4 261 38.50
Product_ID Product_Name
101 Left Handed Widget
102 Right Handed Widget
103 Ambidextrous Widget
One-to-Many Relationship
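As a rough illustration of the one-to-many relationship, the two tables above can be loaded into an in-memory SQLite database and joined on Product_ID. The table names dim_product and fact_sales are assumptions made for this sketch; the column names and rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        Product_ID   INTEGER PRIMARY KEY,
        Product_Name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        Region_ID  INTEGER NOT NULL,
        Product_ID INTEGER NOT NULL REFERENCES dim_product (Product_ID),
        Quarter    INTEGER NOT NULL,
        Units      INTEGER NOT NULL,
        Price      REAL NOT NULL
    );
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [
    (101, "Left Handed Widget"),
    (102, "Right Handed Widget"),
    (103, "Ambidextrous Widget"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 101, 1, 300, 45.20), (1, 101, 2, 330, 45.20),
    (1, 101, 3, 355, 45.20), (1, 101, 4, 461, 44.00),
    (1, 102, 1, 200, 39.00), (1, 102, 2, 235, 39.00),
    (1, 102, 3, 260, 38.50), (1, 102, 4, 261, 38.50),
])

# One dimension row matches many fact rows; the join swaps the key for the name.
rows = conn.execute("""
    SELECT d.Product_Name, SUM(f.Units)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.Product_ID = f.Product_ID
    GROUP BY d.Product_Name
    ORDER BY d.Product_Name
""").fetchall()
units_by_product = dict(rows)
print(units_by_product)
```

Product 103 has no rows in this fact table extract, so it does not appear in the result of the inner join.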
Common terms used in data warehousing and what they mean - 1
• Normalised Data Structure
  – Structure designed for handling live transactions
• Dimensional Data Structure
  – AKA Denormalised Data Structure
  – Structure designed for querying
• Operational Data Store
  – Often a copy of a transactional database
  – Updated regularly from transactional systems
  – May be used for reporting
Common terms used in data warehousing and what they mean - 2
• Dimensional Modelling
  – Fact Table or Measure Table
    • Holds historical records of events that occurred in a transactional system
  – Conformed Facts
    • Facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts can have the same name in different tables and can be combined and compared mathematically
  – Dimension Table
    • Has a number of Attributes, e.g. Product Name, Category, Colour, etc.
    • Used to slice and dice the data in the Fact Table
  – Attribute
    • Property of a Dimension
  – Conformed Dimension
    • Dimensions are conformed when they are exactly the same (including the keys) or one is a perfect subset of the other
    • The row headers produced in answer sets from two different conformed dimensions must be able to be matched perfectly
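The Conformed Dimension rule can be expressed as a small check. This is a simplified sketch under stated assumptions: each dimension is modelled as a dictionary from key to row, the function name is_conformed is hypothetical, and only row-level subsets are considered (a subset of columns is not modelled here).

```python
def is_conformed(dim_a: dict, dim_b: dict) -> bool:
    """Return True if the two dimensions are identical, or if the smaller
    one is a perfect subset of the larger (same keys, same row content)."""
    small, large = sorted((dim_a, dim_b), key=len)
    return all(key in large and large[key] == row for key, row in small.items())


product_dim = {
    101: "Left Handed Widget",
    102: "Right Handed Widget",
    103: "Ambidextrous Widget",
}
retail_subset = {101: "Left Handed Widget", 102: "Right Handed Widget"}
renamed = {101: "Widget (Left)"}  # same key, different row header

print(is_conformed(product_dim, retail_subset))
print(is_conformed(product_dim, renamed))
```

The second pair fails because the row headers for key 101 could not be matched when results from the two sources are lined up.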
Conformed Dimensions - Example
Business Processes (rows) marked against the Common Dimensions they use (columns): Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper
Retail Sales x x x x
Retail Inventory x x x
Retail Deliveries x x x
Warehouse Inventory x x x x
Warehouse Deliveries x x x x
Purchase Orders x x x x x x
Facts and Dimensions - Example
Common terms used in data warehousing and what they mean - 3
• Slowly Changing Dimension (SCD)
  – A Dimension where the rows change slowly over time. An example would be a product Dimension where the Price attribute changes from year to year as a result of marketing/profitability issues.
• Type 1 SCD
  – Values are overwritten when they change
• Type 2 SCD
  – A new row is written when the value of an attribute changes
• Type 3 SCD
  – The previous value is put into an “Old Value” column
• Data Mart
  – A logical and physical subset of the data warehouse’s presentation area
  – Data Marts can be tied together using Drill-Across queries when their dimensions are conformed
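The three SCD types listed above differ only in what happens to the old value. The sketch below applies each type to the same one-row product dimension, using the 45.20 to 47.90 price change for Left Handed Widgets as the example. The column names (sk, product_id, price, old_price) and the "current" flag on Type 2 rows are assumptions for illustration; the slides only say that a new row is written.

```python
from copy import deepcopy

# A one-row product dimension; "sk" is a hypothetical surrogate key column.
dimension = [{"sk": 1, "product_id": 101, "price": 45.20, "current": True}]


def scd_type1(rows, product_id, new_price):
    """Type 1: overwrite the value in place, so the history is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["product_id"] == product_id:
            row["price"] = new_price
    return rows


def scd_type2(rows, product_id, new_price):
    """Type 2: keep the old row and write a new row with a new key."""
    rows = deepcopy(rows)
    for row in rows:
        if row["product_id"] == product_id and row["current"]:
            row["current"] = False  # assumed flag marking the superseded row
    next_sk = max(row["sk"] for row in rows) + 1
    rows.append({"sk": next_sk, "product_id": product_id,
                 "price": new_price, "current": True})
    return rows


def scd_type3(rows, product_id, new_price):
    """Type 3: move the previous value into an 'old value' column."""
    rows = deepcopy(rows)
    for row in rows:
        if row["product_id"] == product_id:
            row["old_price"] = row["price"]
            row["price"] = new_price
    return rows


type1 = scd_type1(dimension, 101, 47.90)
type2 = scd_type2(dimension, 101, 47.90)
type3 = scd_type3(dimension, 101, 47.90)
print(type1, type2, type3, sep="\n")
```

Only Type 2 grows the table, which is why it needs a key that does not depend solely on the product ID.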
Common terms used in data warehousing and what they mean - 4
• Primary Key
  – Unique Identifier for a record
• Foreign Key
  – A value in a record that refers to a Primary Key in another table
• Surrogate Key
  – AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
    • E.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys
• Grain
  – The meaning of a single row in a table. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. In this case you would not be able to look at sales by the hour, nor could you look at individual sales
• Granularity
  – The level of detail captured in a data warehouse.
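To make the Grain definition concrete, the sketch below loads a handful of individual sales into a fact table whose grain is "Sales volume by Day by Product by Store". The transactions, store name and dates are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical individual sales: (timestamp, product, store, units).
transactions = [
    ("2024-03-01 09:15", "Left Handed Widget", "Store A", 2),
    ("2024-03-01 14:40", "Left Handed Widget", "Store A", 3),
    ("2024-03-01 10:05", "Right Handed Widget", "Store A", 1),
    ("2024-03-02 11:30", "Left Handed Widget", "Store A", 4),
]

# The fact table is keyed by (day, product, store): one row per combination.
sales_fact = defaultdict(int)
for timestamp, product, store, units in transactions:
    day = timestamp.split(" ")[0]  # the time of day is discarded at this grain
    sales_fact[(day, product, store)] += units

for key, units in sorted(sales_fact.items()):
    print(key, units)
```

The two sales of Left Handed Widgets on the first day collapse into a single row, so neither the hour nor the individual sale can be recovered from the fact table afterwards.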
Surrogate Key
• Surrogate Key (AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key)
  – Data Warehouses integrate data from multiple sources and therefore they can’t rely upon an application key in one table being different from an application key in another table in another database.
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
  – Surrogate keys can be integers even if the application key isn’t
    • This saves space
    • e.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys
    • e.g. Data changes over time. As an example, if the price of Left Handed Widgets is increased from 45.20 to 47.90, we need to keep the old data and add new data. Therefore we need a key that doesn’t depend solely upon the product ID
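The CustomerID collision described above can be sketched as follows. The two source systems ("crm" and "billing"), the customer names and the column name customer_sk are all hypothetical; the point is only that the warehouse assigns its own integer key and remembers which source key it came from.

```python
from itertools import count

# Two hypothetical sources that both use CustomerID 1, for different customers.
crm_customers = [{"CustomerID": 1, "name": "Alice"}, {"CustomerID": 2, "name": "Bob"}]
billing_customers = [{"CustomerID": 1, "name": "Carol"}]

next_key = count(start=1)   # generator of new surrogate keys
customer_dim = []           # the warehouse dimension table
key_lookup = {}             # (source, source key) -> surrogate key

for source, rows in (("crm", crm_customers), ("billing", billing_customers)):
    for row in rows:
        surrogate = next(next_key)
        customer_dim.append({
            "customer_sk": surrogate,        # new primary key, no business meaning
            "source": source,
            "source_id": row["CustomerID"],  # original application key, kept for lookup
            "name": row["name"],
        })
        key_lookup[(source, row["CustomerID"])] = surrogate

print(customer_dim)
print(key_lookup[("crm", 1)], key_lookup[("billing", 1)])
```

Fact rows would then carry customer_sk rather than CustomerID, using key_lookup during the load to translate each source key.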
Star Schema
Snowflake Schema
• Star
• Snowflake