Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd [email protected]

Post on 14-Jun-2020


Page 1: Aggregating Knowledge in a Data Warehouse and ...download.microsoft.com/documents/UK/Finland/post/20090203/agg… · 03.02.2009  · Data Warehouse and Multidimensional Analysis Rafal

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Rafal Lukawiecki

Strategic Consultant, Project Botticelli Ltd

[email protected]

Objectives

• Explain the basics of:

1. Data Warehousing

2. ETL

3. OLAP/Multidimensional Data

• Relate the theory to SQL Server 2008 SSAS and SSIS

This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations, used with permission. Thank you to Marin Bezic, Kathy Sabourin, Aydin Gencler, Bryan Bredehoeft, and Chris Dial for all the support. Thank you to Maciej Pilecki for assistance with demos.

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

Portions © 2009 Project Botticelli Ltd & entire material © 2009 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

1. Data Warehouse

Rich Connectivity

Data Providers

• OLE DB

• ODBC

• XML

• DB2

• Oracle

• Teradata

• MySAP

• SAP NetWeaver BI

• Hyperion Essbase

• SQL Server

• SQL Server Analysis Services

• SQL Server Report Server Models

• SQL Server Data Mining Models

• SQL Server Integration Services

Let’s Store the Intelligence: DW

• The SQL Server Analysis Services (SSAS) server is the logical endpoint for data aggregated with SSIS

• However, do not store the actual data in it

• The data physically rests in another database, called a Data Warehouse

• You can manipulate it directly, or build it in parallel with OLTP processing

• Modelling of data stored in DW and analysed using SSAS is at the heart of good Data Warehouse design

Star Schema

Star Schema Benefits

• Transforms normalized data into a simpler model

• Delivers high-performance queries

• Delivers higher performing queries using Star Join Query Optimization

• Uses mature modeling techniques that are widely supported by many BI tools

• Requires low maintenance as the data warehouse design evolves
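A minimal sketch of a star schema and a star-join query, using Python's built-in sqlite3 module rather than SQL Server. All table names, column names, and values are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        -- The fact table holds foreign keys to the dimensions plus an additive measure.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL);
        INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
        INSERT INTO dim_date VALUES (20090115, 2009, 1), (20090420, 2009, 2);
        INSERT INTO fact_sales VALUES
            (1, 20090115, 1200.0), (2, 20090115, 35.0), (1, 20090420, 1150.0);
    """)
    return conn

def sales_by_category_and_quarter(conn):
    """Join the fact table to both dimensions and aggregate the measure."""
    return conn.execute("""
        SELECT p.category, d.quarter, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """).fetchall()

conn = build_star_schema()
for row in sales_by_category_and_quarter(conn):
    print(row)
```

Every query follows the same shape: the fact table in the middle, one join per dimension, then a GROUP BY on dimension attributes.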

Snowflake Dimension Tables

• Define hierarchies using multiple dimension tables

• Support fact tables with varying granularity

• Simplify consolidation of data from multiple sources

Potential for slower query performance in relational reporting

No difference in performance in Analysis Services database

Hierarchies

• Benefits

• View of data at different levels of summarization

• Path to drill down or drill up

• Implementation

• Denormalized star schema dimension

• Normalized snowflake dimension

• Self-referencing relationship

Fact Table Fundamentals

• Collection of measurements associated with a specific business process

• Specific column types

• Foreign keys to dimensions

• Measures – numeric and additive

• Metadata and lineage

• Consistent granularity – the most atomic level by which the facts can be defined

Fact Table Examples

Day grain: reseller sales data by:

• Product

• Order Date

• Reseller

• Employee

• Sales Territory

Quarter grain: sales quota data by:

• Employee

• Time

Date Dimension Table

• Most common dimension used in analysis (aka Time dimension)

• Used consistently with all facts for efficient and flexible analysis

• Useful common attributes – Year, Quarter, Month, Day

• Time series analysis support

• Navigation and summarization enabled with hierarchies, such as calendar or fiscal

• Single table design (typically not snowflake design)

Tip: Format the key of the dimension as yyyymmdd (e.g. 20060925) to make it readily understandable
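A possible way to generate date dimension rows with a yyyymmdd key, sketched in Python. The column names are hypothetical, and the quarter is assumed to be a calendar quarter; a fiscal calendar would need its own rule.

```python
from datetime import date, timedelta

def date_key(d):
    """Return the yyyymmdd integer key for a date."""
    return d.year * 10000 + d.month * 100 + d.day

def build_date_dimension(start, end):
    """Yield one dimension row per calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield {
            "DateKey": date_key(current),
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,  # calendar quarter (assumption)
            "Month": current.month,
            "Day": current.day,
        }
        current += timedelta(days=1)

rows = list(build_date_dimension(date(2006, 9, 24), date(2006, 9, 26)))
print(rows[1]["DateKey"])  # 20060925
```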

Parent-Child Hierarchy

• A dimension that contains a parent attribute

• A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table

• Common examples

• Organizational charts

• General Ledger structures

• Bill of Materials
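A small sketch of a self-referencing dimension and a walk down its hierarchy. The rows, keys, and titles below are invented for illustration; each row simply points at its parent's key in the same table.

```python
from collections import defaultdict

# Hypothetical employee dimension rows: (employee_key, parent_employee_key, title).
EMPLOYEES = [
    (1, None, "Director"),
    (2, 1, "Sales Manager"),
    (3, 1, "Finance Manager"),
    (4, 2, "Sales Rep A"),
    (5, 2, "Sales Rep B"),
]

def build_children_index(rows):
    """Map each parent key to the list of keys that reference it."""
    children = defaultdict(list)
    for key, parent_key, _title in rows:
        children[parent_key].append(key)
    return children

def descendants(children, key):
    """Return every key below `key` in the hierarchy, depth-first."""
    result = []
    for child in children.get(key, []):
        result.append(child)
        result.extend(descendants(children, child))
    return result

index = build_children_index(EMPLOYEES)
print(descendants(index, 1))  # [2, 4, 5, 3]
```

The depth of the tree is not fixed in the table design, which is why organizational charts and bills of materials fit this pattern.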

Parent-Child Hierarchy Example

[Organization chart illustrating a parent-child hierarchy of employees: Brian, Amy, Stacia, Stephen, Shu, Michael, Peter, José, Syed]

Slowly Changing Dimensions

• Support primary role of data warehouse to describe the past accurately

• Maintain historical context as new or changed data is loaded into dimension tables

• Implement changes by Slowly Changing Dimension (SCD) type

• Type 1: Overwrite the existing dimension record

• Type 2: Insert a new ‘versioned’ dimension record

• Type 3: Track limited history with attributes
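The three SCD types can be sketched as follows. This is an illustration over in-memory rows, not how SSIS implements them; the column names (surrogate_key, business_key, start_date, end_date, previous_<column>) are assumptions.

```python
from datetime import date

def apply_type1(rows, business_key, column, new_value):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["business_key"] == business_key:
            row[column] = new_value

def apply_type2(rows, business_key, column, new_value, change_date):
    """Type 2: expire the current row and append a new versioned row."""
    current = next(
        r for r in rows
        if r["business_key"] == business_key and r["end_date"] is None
    )
    next_key = max(r["surrogate_key"] for r in rows) + 1
    current["end_date"] = change_date
    new_row = dict(current)
    new_row.update({column: new_value, "surrogate_key": next_key,
                    "start_date": change_date, "end_date": None})
    rows.append(new_row)

def apply_type3(rows, business_key, column, new_value):
    """Type 3: keep only the previous value, in a companion attribute."""
    for row in rows:
        if row["business_key"] == business_key:
            row["previous_" + column] = row[column]
            row[column] = new_value

# Hypothetical employee whose sales territory changes from 4 to 10.
employees = [{"surrogate_key": 1, "business_key": "E1", "territory": 4,
              "start_date": date(2008, 1, 1), "end_date": None}]
apply_type2(employees, "E1", "territory", 10, date(2009, 2, 3))
print(len(employees))  # 2
```

With Type 2, facts loaded before the change keep pointing at surrogate key 1, so past reports still show the old territory.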

SCD Type 1

• Existing record is updated

• History is not preserved

SCD Type 2

• Existing record is ‘expired’ and new record inserted

• History is preserved

• Most common form of SCD

SCD Type 3

• Existing record is updated

• Limited history is preserved

• Implementation is rare

[Figure: SalesTerritoryKey update to 10]

Let’s Get the Data

• We would like to populate facts and dimensions in our Data Warehouse from OLTP data...

2. Integration and ETL

Let’s do ETL with SSIS

• SQL Server Integration Services (SSIS) service

• SSIS object model

• Two distinct runtime engines:

• Control flow

• Data flow

• 32-bit and 64-bit editions

The Package

• The basic unit of work, deployment, and execution

• An organized collection of:

• Connection managers

• Control flow components

• Data flow components

• Variables

• Event handlers

• Configurations

• Can be designed graphically or built programmatically

• Saved in XML format to the file system or SQL Server

Control Flow

• Control flow is a process-oriented workflow engine

• A package contains a single control flow

• Control flow elements

• Containers

• Tasks

• Precedence constraints

• Variables

Data Flow

• The Data Flow Task

• Encapsulates the data flow engine

• Exists in the context of an overall control flow

• Performs traditional ETL in addition to other extended scenarios

• Is fast and scalable

• Data Flow Components

• Extract data from Sources

• Load data into Destinations

• Modify data with Transformations

• Service Paths

• Connect data flow components

• Create the pipeline
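As a loose analogy only (this is not the SSIS object model), a data flow can be pictured as Python generators chained into a pipeline: a source, a transformation, and a destination. The function names and the sample CSV are hypothetical.

```python
import csv
import io

def csv_source(text):
    """Source: extract rows from CSV text as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))

def derive_total(rows):
    """Transformation: add a derived column to each row as it flows through."""
    for row in rows:
        yield dict(row, total=int(row["quantity"]) * float(row["unit_price"]))

def list_destination(rows, target):
    """Destination: load rows into the target list and return the row count."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count

SAMPLE = "product,quantity,unit_price\nHelmet,2,35.0\nRoad Bike,1,1200.0\n"
warehouse = []
loaded = list_destination(derive_total(csv_source(SAMPLE)), warehouse)
print(loaded)  # 2
```

Nesting the calls plays the role of the paths that connect components: each row streams from source to destination without the whole input being held in memory.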

Data Flow Sources

• Sources extract data from

• Relational tables and views

• Files

• Analysis Services databases

Data Flow Destinations

• Destinations load data to

• Relational tables and views

• Files

• Analysis Services databases and objects

• DataReaders and Recordsets

Enterprise Edition only

Populating Fact Tables

• Fact source

• Transform

• Lookup dimension key

• Lookup failed?

• Y: Insert new dimension record

• N: Continue

• Repeat for each dimension key

• Insert new record
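The fact-loading flow can be sketched as below. It is an assumption-laden illustration: each dimension is modelled as a dictionary from business key to surrogate key, surrogate keys are assumed to be sequential, and the function names are hypothetical.

```python
def lookup_or_insert(dimension, business_key):
    """Look up the surrogate key; if the lookup fails, insert a new dimension record."""
    if business_key not in dimension:
        # Assumes surrogate keys are sequential integers starting at 1.
        dimension[business_key] = len(dimension) + 1
    return dimension[business_key]

def load_facts(source_rows, dimensions, fact_table):
    """Resolve every dimension key for each source row, then insert the fact record."""
    for row in source_rows:
        fact = {"sales_amount": row["sales_amount"]}
        for dim_name, dimension in dimensions.items():  # repeat for each dimension key
            fact[dim_name + "_key"] = lookup_or_insert(dimension, row[dim_name])
        fact_table.append(fact)

dimensions = {"product": {"Helmet": 1}, "reseller": {}}
facts = []
load_facts(
    [
        {"product": "Helmet", "reseller": "R1", "sales_amount": 35.0},
        {"product": "Road Bike", "reseller": "R1", "sales_amount": 1200.0},
    ],
    dimensions,
    facts,
)
print(facts[1])
```

Inserting a placeholder dimension record on a failed lookup keeps the fact row loadable; the remaining attributes of that record can be filled in when the dimension source catches up.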

Populating Dimension Tables

• Dimension source

• Transform

• Correlate records

• New record? Y: Insert new record

• N: Type 1 change? Y: Update changed column(s)

• N: Type 2 change? Y: Expire existing record and insert new record

Data Flow Transformations

• Aggregate, merge, distribute, or modify data

• Include error outputs in some cases

• Transformation categories

• Row

• Rowset

• Split and Join

• Business Intelligence (BI)

• Script

• Other

Row Transformations

• Update column values or create new columns

• Transform each row in the pipeline input

Rowset Transformations

• Create new rowsets that can include

• Aggregated values

• Sorted values

• Sample rowsets

• Pivoted or unpivoted rowsets

• These are the heavyweight transformations of SSIS

• Also called asynchronous components
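One way to picture the difference between row and rowset transformations, as a Python analogy rather than SSIS internals: a row transformation emits each row as soon as it arrives, while an aggregate must read its whole input before it can emit anything. The names and data are hypothetical.

```python
from collections import defaultdict

def uppercase_names(rows):
    """Row transformation: one output row per input row, emitted immediately."""
    for row in rows:
        yield dict(row, name=row["name"].upper())

def aggregate_by_category(rows):
    """Rowset transformation: consumes all input, then emits a new rowset."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["amount"]
    for category in sorted(totals):
        yield {"category": category, "amount": totals[category]}

SALES = [
    {"name": "helmet", "category": "Accessories", "amount": 35.0},
    {"name": "road bike", "category": "Bikes", "amount": 1200.0},
    {"name": "gloves", "category": "Accessories", "amount": 15.0},
]
print(list(aggregate_by_category(SALES)))
```

Because the aggregate holds state for the entire input, it is the step most likely to dominate memory use and run time in a pipeline like this.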

Split and Join Transformations

• Distribute rows to different outputs

• Create copies of the transformation inputs

• Join multiple inputs into one output

• Perform lookup operations

Using SQL Server Integration Services for Aggregating and Deriving Data

3. OLAP/Multidimensional Data

SQL Server 2008 Analysis Services

• OLAP component

• Aggregates and organizes data from business data sources

• Performs calculations difficult to perform using relational queries

• Supports advanced business intelligence, such as Key Performance Indicators

• Data mining component

• Discovers patterns in both relational and OLAP data

• Enhances the OLAP component with discovered results

Cube = Unified Dimensional Model

• Multidimensional data

• Combination of measures and dimensions as one conceptual model

• Measures are sourced from fact tables

• Dimensions are sourced from dimension tables
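A toy illustration of the idea, not of how SSAS stores a cube: the measure is pre-aggregated for every combination of dimension members, including an "All" member per dimension. The fact rows and names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: two dimension attributes and one additive measure.
FACTS = [
    {"country": "UK", "year": 2008, "sales": 100.0},
    {"country": "UK", "year": 2009, "sales": 150.0},
    {"country": "FI", "year": 2009, "sales": 80.0},
]

def build_cube(facts, dimensions, measure):
    """Sum the measure at every coordinate made of members and 'All' levels."""
    cube = defaultdict(float)
    for fact in facts:
        # Each dimension contributes either its own member or the 'All' member.
        choices = [(fact[d], "All") for d in dimensions]
        for coordinate in product(*choices):
            cube[coordinate] += fact[measure]
    return dict(cube)

cube = build_cube(FACTS, ["country", "year"], "sales")
print(cube[("UK", "All")])   # 250.0
print(cube[("All", 2009)])   # 230.0
print(cube[("All", "All")])  # 330.0
```

Slicing then becomes a dictionary lookup at a coordinate, which is the intuition behind answering aggregate queries from a cube instead of scanning the fact table.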

Dimensions

• Members from tables/views in a data source view (based on a Data Warehouse)

• Contain attributes matching dimension columns

• Organize attributes as hierarchies

• One All level and one leaf level

• User hierarchies are multi-level combinations of attributes

• Can be placed in display folders

• Used for slicing and dicing by attribute

Hierarchy

• Defined in Analysis Services

• Ordered collection of attributes into levels

• Navigation path through dimensional space

• Very important to get right!

Customers by Geography: Country > State > City > Customer

Customers by Demographics: Marital > Gender > Customer

Measure Group

• Group of measures with same dimensionality

• Analogous to a fact table

• Cube can contain more than one measure group

• E.g. Sales, Inventory, Finance

• Defined by dimension relationships

Dimensions (rows) used by each Measure Group (columns)

Dimension     Sales   Inventory   Finance

Customers     X

Products      X       X

Time          X       X           X

Promotions    X

Warehouse             X

Department                        X

Account                           X

Scenario                          X

Dimension Relationships

• Define interaction between dimensions and measure groups

• Relationship types

• Regular

• Reference

• Fact (Degenerate)

• Many-to-many

• Data mining

Dimension Model

• Attributes: Customer, City, State, Country, Gender, Marital, Age

• Hierarchies

• Country > State > City > Customer

• Gender > Customer

• Marital > Gender > Customer

Calculations

• Expressions evaluated at query time for values that cannot be stored in fact table

• Types of calculations

• Calculated members

• Named sets

• Scoped assignments

• Calculations are defined using MDX

MDX = MultiDimensional EXpressions
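A Python analogy (not MDX) for why some values are calculated at query time instead of being stored in the fact table: a ratio such as gross margin percentage is not additive, so it must be derived from the aggregated measures of whatever cells the query selects. The measure names and numbers are hypothetical.

```python
def gross_margin_pct(cells):
    """Calculated value: derived from aggregated sales and cost at query time."""
    sales = sum(c["sales"] for c in cells)
    cost = sum(c["cost"] for c in cells)
    return (sales - cost) / sales if sales else None

def naive_average_of_ratios(cells):
    """What you would get by storing the ratio per row and averaging it."""
    ratios = [(c["sales"] - c["cost"]) / c["sales"] for c in cells]
    return sum(ratios) / len(ratios)

cells = [
    {"sales": 100.0, "cost": 50.0},   # 50% margin on a small sale
    {"sales": 300.0, "cost": 270.0},  # 10% margin on a large sale
]
print(gross_margin_pct(cells))         # overall margin from the aggregated totals
print(naive_average_of_ratios(cells))  # differs, because ratios are not additive
```

In the cube, the equivalent would be a calculated member defined over the Sales and Cost measures, so the ratio stays correct at every level of every hierarchy.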

1. Using BIDS to Review Dimension Design

2. Cube Design and Functionality

Summary

• As a platform for enterprise Business Intelligence you should consider three things:

• A Data Warehouse

• Process of Data Integration (incl. ETL)

• Multidimensional Analysis (OLAP)

= SQL Server 2008 Engine, SSIS, and SSAS

• Now you can support decision making and performance management through:

• Reports, dashboards, Excel integration, data mining, and better business software

© 2009 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
