Atlanta Microsoft Database Forum
Introduction to Data Warehousing Concepts
Presented by Brian Thomas, Solution Builders, Inc.
March 8, [email protected]
What is a Data Warehouse?
A Data Warehouse is data collected from one or many systems that exist within and outside the organization. The data is structured in such a way as to reduce the amount of time that it takes to produce reliable information.
Why Build a Data Warehouse?
• To Provide a Consistent Common Source for Corporate Information
• To Store Large Volumes of Historical Detail Data from Mission Critical Applications
• To Improve the Ability to Access, Report Against, and Analyze Information
• To Solve or Improve Upon Business Processes
Turning Data into Information
Sales System -> System Generated Reports
Sales Analysis is extrapolated from the System Reports.
Functional Data Warehouse
Turning Data into Information
Functional Data Warehouse of Sales Information
Sales Information is available to a wider audience of decision makers.
Sales System
Functional Data Warehouse
Turning Data into Information
Sales Systems: Division A, Division B, Division C
Centralized Data Warehouse of Sales Data from across the Organization
Analysis performed and Decisions drawn from the Cross Organizational Sales Data
Cross Organizational Functional Data Warehouse
Turning Data into Information
Sales System
Production Systems
Marketing System
System Generated Reports
Corporate Performance Analysis is extrapolated from the System Reports.
Cross Functional Data Warehouse
Turning Data into Information
Sales System
Production Systems
Marketing System
Cross Functional Data Warehouse of Information
Corporate Performance Analysis is available to a wider audience.
Cross Functional Data Warehouse
Turning Data into Information
Division A, Division B, Division C
Centralized Cross Functional Data Warehouse of Information
Analysis is performed and Decisions made from the Cross Functional Organizational Performance Data
Cross Organizational & Cross Functional Data Warehouse
Data Warehouse Architecture
[Architecture diagram. Labels, by section:]
• Source Systems: Division A, Division B, Division C, External Data, Management Systems
• Extraction, Transformation, Load (ETL)
• Data Warehouse Components: Enterprise Data Warehouse, DW / DM, DM; levels labeled Corporate Level, Business Group Level, Divisional Level; axes labeled Increased Level of Standardization and Increased Local Specifications
• Data Access & Query Management Services
• Access Methods: Planning & Forecasting, Performance Management, Scorecards & Dashboards, Analytics & Modeling, Query & Reporting, Portal / Web Interface, Desktop Applications, Printed Reports, Mobile Devices
Data Warehouse Architecture
Source Systems
[Diagram labels: Division A, Division B, Division C, External Data; Extract, Transformation and Load (ETL); Data Staging Area; Data Warehouse Repository]
Data Warehouse Architecture
Data Staging Area
• Subject Area Oriented
• Data Structure more closely mirrors Operational System Data Layouts
• Supports Identification of Changed Data
• Acts as a Working Area to Support the Transformation Process
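One way a staging area can support identification of changed data is to compare a fingerprint of each incoming row with the one kept from the previous load. The following is a minimal sketch of that idea, not the presenter's method; the function names (`row_hash`, `detect_changes`) and the sample customer rows are hypothetical.

```python
import hashlib


def row_hash(row):
    """Fingerprint a row's attributes so changes are cheap to detect."""
    # Sort the keys so the hash does not depend on dict ordering.
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two extracts keyed by business key.

    Returns (inserted, updated, deleted) lists of business keys.
    """
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current
        if k in previous and row_hash(current[k]) != row_hash(previous[k])
    )
    return inserted, updated, deleted


# Hypothetical staged extracts from two consecutive loads.
previous_load = {
    101: {"name": "Acme", "city": "Atlanta"},
    102: {"name": "Globex", "city": "Denver"},
    103: {"name": "Initech", "city": "Dallas"},
}
current_load = {
    101: {"name": "Acme", "city": "Atlanta"},     # unchanged
    102: {"name": "Globex", "city": "Chicago"},   # city changed
    104: {"name": "Umbrella", "city": "Dallas"},  # new row
}

inserted, updated, deleted = detect_changes(previous_load, current_load)
print(inserted, updated, deleted)
```

Only the inserted and updated keys would then need to pass through the transformation step, which is what makes the staging copy useful as a working area.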
Data Warehouse Architecture
Extraction, Transformation & Load (ETL)
• Perform Attribute Standardization and Cleansing
• Apply Business Rules and Calculations
• Consolidate using Matching and Merge / Purge Logic
• Ensure Proper Linking and Tracking of History
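The matching and merge / purge step in the list above can be illustrated with a small sketch. It assumes a deliberately crude matching rule (same normalized name and postal code) and a "first record survives, gaps are filled from later duplicates" merge rule; both rules, the field names, and the sample customers are hypothetical, and real matching logic is usually far more involved.

```python
def match_key(record):
    """Build a crude matching key: lowercased alphanumeric name plus postal code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, record["postal_code"])


def merge_purge(records):
    """Group records by match key and keep one merged survivor per group."""
    survivors = {}
    for record in records:
        key = match_key(record)
        if key not in survivors:
            survivors[key] = dict(record)
            continue
        # Merge: fill in any field the survivor is missing or has empty.
        for field, value in record.items():
            if not survivors[key].get(field):
                survivors[key][field] = value
    return list(survivors.values())


# Hypothetical customer records arriving from different source systems.
customers = [
    {"name": "Acme, Inc.", "postal_code": "30303", "phone": ""},
    {"name": "ACME INC", "postal_code": "30303", "phone": "404-555-0100"},
    {"name": "Globex", "postal_code": "80202", "phone": "303-555-0101"},
]

merged = merge_purge(customers)
for row in merged:
    print(row)
```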
Data Warehouse Architecture
Extraction, Transformation & Load (ETL)
• Lookup Function: App. A: Male, Female; App. B: 1, 0; App. C: x, y; App. D: m, f -> Male, Female
• Conversion Function: App. A: pipeline (cm); App. B: pipeline (inches); App. C: pipeline (mcf); App. D: pipeline (yds) -> pipeline (cm)
• Formatting Function: App. A: Date (julian); App. B: Date (yyyymmdd); App. C: Date (mm/dd/yyyy); App. D: Date (absolute) -> Date (julian)
• Merging Function: App. A: Description; App. B: Description; App. C: Description; App. D: Description -> Description
• Mapping Function: App. A: balance on hand; App. B: current balance; App. C: cash in house; App. D: balance -> Balance
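The lookup, conversion, formatting, and mapping functions on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the presenter's implementation: the meaning of each gender code is assumed from its position on the slide (first value = Male), "julian" is taken to mean year plus day of year (YYYYDDD), only the length units are converted (mcf is a volume and is left out), and all function and constant names are hypothetical.

```python
from datetime import datetime

# Lookup Function: per-application gender codes -> warehouse standard.
# Code meanings are assumed from the order on the slide (first = Male).
GENDER_LOOKUP = {
    "A": {"Male": "Male", "Female": "Female"},
    "B": {"1": "Male", "0": "Female"},
    "C": {"x": "Male", "y": "Female"},
    "D": {"m": "Male", "f": "Female"},
}

# Conversion Function: length units -> centimeters.
CM_PER_UNIT = {"cm": 1.0, "inches": 2.54, "yds": 91.44}

# Formatting Function: per-application date layouts.
DATE_FORMATS = {"B": "%Y%m%d", "C": "%m/%d/%Y"}

# Mapping Function: per-application field names -> one warehouse column.
BALANCE_FIELD = {
    "A": "balance on hand",
    "B": "current balance",
    "C": "cash in house",
    "D": "balance",
}


def standardize_gender(app, code):
    return GENDER_LOOKUP[app][str(code)]


def to_centimeters(value, unit):
    return round(value * CM_PER_UNIT[unit], 2)


def to_julian(app, text):
    """Return YYYYDDD (year plus day of year), one common 'julian' layout."""
    parsed = datetime.strptime(text, DATE_FORMATS[app])
    return parsed.strftime("%Y%j")


def standardize_balance(app, record):
    return {"Balance": record[BALANCE_FIELD[app]]}


print(standardize_gender("B", 1))
print(to_centimeters(10, "inches"))
print(to_julian("C", "01/15/2004"))
print(standardize_balance("C", {"cash in house": 1250.0}))
```

Keeping the per-application rules in plain dictionaries means a new source application is added by extending the tables rather than by writing new code paths.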
Data Warehouse Architecture
Data Warehouse Repository
• Organized around Conformed Dimensions and Facts
• Promotes Usability and Intuitiveness
• Consolidated and Cross-Functional
• Historical and Atomic Representation of Data
• Insulated from Source System Modifications and Additions
Data Warehouse Repository
Star Schema Concepts
Fact Table
This table is the core of the Star Schema Structure and contains the Facts or Measures available through the Data Warehouse.
These Facts answer the questions of “What”, “How Much”, or “How Many”.
Some Examples: Sales Dollars, Units Sold, Gross Profit, Expense Amount, Net Income, Unit Cost, Number of Employees, Turnover, Salary, Tenure, etc.
Data Warehouse Repository
Star Schema Concepts
Dimension Tables
These tables describe the Facts or Measures. These tables contain the Attributes and may also be Hierarchical.
These Dimensions answer the questions of “Who”, “What”, “When”, or “Where”.
Some Examples:
• Day, Week, Month, Quarter, Year
• Sales Person, Sales Manager, VP of Sales
• Product, Product Category, Product Line
• Cost Center, Unit, Segment, Business, Company
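To make the idea of a hierarchical dimension concrete, here is a small sketch that derives the Day, Week, Month, Quarter, Year attributes for one row of a time dimension. The column names `TimeKey` and `TheDate` follow the Time_Dim table shown in the star schema diagram; the `YYYYMMDD` key layout, the use of ISO week numbers, and the function name `time_dim_row` are assumptions made for illustration.

```python
from datetime import date


def time_dim_row(d):
    """Derive the hierarchy attributes for one calendar date."""
    _, iso_week, _ = d.isocalendar()
    return {
        "TimeKey": int(d.strftime("%Y%m%d")),  # assumed surrogate key layout
        "TheDate": d.isoformat(),
        "Week": iso_week,                      # ISO week number (assumption)
        "Month": d.month,
        "Quarter": (d.month - 1) // 3 + 1,
        "Year": d.year,
    }


row = time_dim_row(date(2004, 8, 17))
print(row)
```

Because every level of the hierarchy is stored on the row, a query can group by Quarter or Year without recomputing anything from the date itself.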
Data Warehouse Repository
Star Schema Concepts
• Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Required Data (Business Metrics) or (Measures)...
• Time_Dim: TimeKey, TheDate...
• Employee_Dim: EmployeeKey, EmployeeID...
• Product_Dim: ProductKey, ProductID...
• Customer_Dim: CustomerKey, CustomerID...
• Shipper_Dim: ShipperKey, ShipperID...
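A star schema like the one above can be tried out with Python's built-in `sqlite3` module. The sketch below is a cut-down illustration rather than the presenter's schema: it keeps only two of the five dimensions, and the `Quarter`, `ProductName`, `SalesDollars`, and `UnitsSold` columns and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim (
    TimeKey INTEGER PRIMARY KEY, TheDate TEXT, Quarter TEXT);
CREATE TABLE Product_Dim (
    ProductKey INTEGER PRIMARY KEY, ProductID TEXT, ProductName TEXT);
CREATE TABLE Sales_Fact (
    TimeKey INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
    SalesDollars REAL,   -- measure (hypothetical)
    UnitsSold INTEGER    -- measure (hypothetical)
);
""")

conn.executemany("INSERT INTO Time_Dim VALUES (?, ?, ?)", [
    (1, "2004-02-10", "Q1"),
    (2, "2004-05-12", "Q2"),
])
conn.executemany("INSERT INTO Product_Dim VALUES (?, ?, ?)", [
    (1, "P-APL", "Apples"),
    (2, "P-CHR", "Cherries"),
])
conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?, ?, ?)", [
    (1, 1, 100.0, 10),
    (1, 2, 250.0, 5),
    (2, 1, 150.0, 15),
    (2, 1, 50.0, 5),
])

# The fact table supplies "how much"; the dimensions supply "when" and "what".
rows = conn.execute("""
    SELECT t.Quarter, p.ProductName, SUM(f.SalesDollars), SUM(f.UnitsSold)
    FROM Sales_Fact AS f
    JOIN Time_Dim AS t ON t.TimeKey = f.TimeKey
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY t.Quarter, p.ProductName
    ORDER BY t.Quarter, p.ProductName
""").fetchall()

for row in rows:
    print(row)
```

Every query against the star follows the same shape: join the fact table to the dimensions of interest, group by dimension attributes, and aggregate the measures.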
Data Warehouse Repository
Cube Concepts
[Cube diagram: Product Dimension (Apples, Cherries, Grapes, Melons), Time Dimension (Q1, Q2, Q3, Q4), Markets Dimension (Dallas, Denver, Chicago, Atlanta)]
Data Warehouse Repository
Cube Concepts
[Same cube diagram, with a Sales Fact cell called out]
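The cube on these slides can be thought of as a set of cells, each addressed by one member from every dimension. A minimal sketch of that idea follows; the dimension members come from the slide, while the sales figures and the helper name `slice_total` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, quarter, market, sales dollars).
facts = [
    ("Apples", "Q1", "Atlanta", 120.0),
    ("Apples", "Q4", "Atlanta", 80.0),
    ("Apples", "Q4", "Dallas", 60.0),
    ("Grapes", "Q4", "Denver", 40.0),
    ("Melons", "Q2", "Chicago", 30.0),
]

# Each cell of the cube is keyed by (Product, Time, Market).
cube = defaultdict(float)
for product, quarter, market, dollars in facts:
    cube[(product, quarter, market)] += dollars


def slice_total(cube, product=None, quarter=None, market=None):
    """Sum every cell matching the given members; None means 'all members'."""
    total = 0.0
    for (p, q, m), dollars in cube.items():
        if product in (None, p) and quarter in (None, q) and market in (None, m):
            total += dollars
    return total


print(cube[("Apples", "Q4", "Atlanta")])   # a single cell
print(slice_total(cube, quarter="Q4"))     # a slice across products and markets
print(slice_total(cube, product="Apples"))
```

Fixing one member (for example Q4) and summing over the other dimensions is the "slice" operation that the cube picture is meant to convey.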
Data Warehouse Repository
Storage Concepts
• Relational On-Line Analytical Processing (ROLAP): The information that is stored in the Data Warehouse is held in a relational structure. Aggregations are performed on the fly either by the database or in the analysis tool.
• Multidimensional On-Line Analytical Processing (MOLAP): The information is aggregated in a predefined manner based on the characteristics of the Measures and the defined hierarchy of the Dimensions. Since the data is pre-aggregated, navigating through the hierarchies is nearly instantaneous. The user is simply navigating to a point within the Multidimensional Cube and not performing any on the fly aggregations.
• Hybrid On-Line Analytical Processing (HOLAP): This is a combination of MOLAP and ROLAP. A portion of the data is predefined and aggregated. This would typically be the set of information that is accessed most frequently. Additional detail can be held in a ROLAP structure and allow a user to drill through the MOLAP structure into the ROLAP structure.
Data Warehouse Repository
Cube Concepts
Client perspective     MOLAP      HOLAP      ROLAP
Query performance      Fastest    Faster     Fast
Storage consumption    High       Medium     Low
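The trade-off in the table can be seen in a toy comparison of the two extremes. The sketch below is a simplified model, not how any particular OLAP engine works: the ROLAP-style function aggregates detail rows at query time, while the MOLAP-style cube pre-computes every combination (including an "ALL" rollup member, a convention assumed here) so that a query is a single lookup. The data and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product as cartesian

# Hypothetical detail rows: (product, quarter, sales dollars).
facts = [
    ("Apples", "Q1", 100.0),
    ("Apples", "Q2", 150.0),
    ("Grapes", "Q1", 40.0),
    ("Grapes", "Q2", 60.0),
]


def rolap_query(product=None, quarter=None):
    """ROLAP style: scan the detail rows and aggregate at query time."""
    return sum(
        dollars for p, q, dollars in facts
        if product in (None, p) and quarter in (None, q)
    )


def build_molap_cube():
    """MOLAP style: pre-compute every aggregate, including 'ALL' rollups."""
    cube = defaultdict(float)
    for p, q, dollars in facts:
        for p_key, q_key in cartesian((p, "ALL"), (q, "ALL")):
            cube[(p_key, q_key)] += dollars
    return dict(cube)


MOLAP_CUBE = build_molap_cube()


def molap_query(product="ALL", quarter="ALL"):
    """Answer by direct lookup; no aggregation happens at query time."""
    return MOLAP_CUBE.get((product, quarter), 0.0)


print(rolap_query(product="Apples"), molap_query(product="Apples"))
print(len(facts), "detail rows vs", len(MOLAP_CUBE), "pre-aggregated cells")
```

Both functions return the same totals, but the pre-aggregated cube holds more cells than there are detail rows, which mirrors the higher storage consumption listed for MOLAP. A HOLAP arrangement would keep the frequently used aggregates in the cube and fall back to the detail rows for the rest.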
Where does Microsoft fit in?
• SQL Server DTS
• SQL Server Relational Database and Analysis Services
• SQL Stored Procedures, SQL Views, MDX, and .NET Web Services
• Microsoft Office, Reporting Services and .NET Framework
• SharePoint Portal, Exchange, and .NET Framework
Q & A