Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd

Posted on 22-Dec-2015






  • Slide 1
  • Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd
  • Slide 2
  • 2 Objectives: Explain the basics of: 1. Data Warehousing, 2. ETL, 3. OLAP/Multidimensional Data. Relate the theory to SQL Server 2008 SSAS and SSIS. This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations, used with permission. Thank you to Marin Bezic, Kathy Sabourin, Aydin Gencler, Bryan Bredehoeft, and Chris Dial for all the support. Thank you to Maciej Pilecki for assistance with demos. The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2009 Project Botticelli Ltd & entire material © 2009 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
  • Slide 3
  • 3 1. Data Warehouse (and its relationship to OLAP)
  • Slide 4
  • 4 Rich Connectivity (data providers): OLE DB, ODBC, DB2, Oracle, XML, SQL Server Analysis Services, SQL Server Report Server Models, SQL Server Data Mining Models, SQL Server Integration Services, MySAP, Hyperion Essbase, SAP NetWeaver BI, SQL Server, Teradata
  • Slide 5
  • 5 Let's Store the Intelligence: DW. A SQL Server Analysis Services server is a logical endpoint for data being aggregated with SSIS, but do not store actual data in it. Data physically rests in another relational database called a Data Warehouse. Modelling of the data stored in the DW and analysed using SSAS is at the heart of good Data Warehouse design.
  • Slide 6
  • 6 Star Schema
  • Slide 7
  • 7 Star Schema Benefits: Transforms normalized data into a simpler model. Delivers high-performance queries. Delivers higher-performing queries using Star Join Query Optimization. Uses mature modeling techniques that are widely supported by many BI tools. Requires low maintenance as the data warehouse design evolves.
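A star schema query can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the shape of such a query, not SQL Server or its Star Join Query Optimization; the table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- yyyymmdd
    calendar_year INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO dim_date VALUES (20060925, 2006), (20070115, 2007);
INSERT INTO fact_sales VALUES (1, 20060925, 1000.0), (2, 20060925, 50.0), (1, 20070115, 1200.0);
""")

# Every dimension is one join away from the fact table, which keeps queries simple.
star_query = """
SELECT p.category, d.calendar_year, SUM(f.sales_amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY p.category, d.calendar_year
ORDER BY p.category, d.calendar_year
"""
sales_by_category_year = conn.execute(star_query).fetchall()
print(sales_by_category_year)
```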
  • Slide 8
  • 8 Snowflake Dimension Tables: Define hierarchies using multiple dimension tables. Support fact tables with varying granularity. Simplify consolidation of data from multiple sources. Potential for slower query performance in relational reporting. No difference in performance in an Analysis Services database.
  • Slide 9
  • 9 OLAP Hierarchies. Benefits: a view of data at different levels of summarization; a path to drill down or drill up. Implementation: denormalized DW star schema dimension; normalized DW snowflake dimension; self-referencing relationship.
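Drilling up and down a hierarchy amounts to summarizing the same facts at different levels. The sketch below assumes a hypothetical calendar hierarchy (year, quarter, month) and made-up sales rows; the `summarize` helper is illustrative and not part of any product.

```python
from collections import defaultdict

# Hypothetical hierarchy levels, ordered from most summarized to most detailed.
LEVELS = ("year", "quarter", "month")

def summarize(rows, level):
    """Total the 'sales' measure at the given level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        # The group key is the path from the top of the hierarchy down to `level`.
        path = tuple(row[name] for name in LEVELS[:depth])
        totals[path] += row["sales"]
    return dict(totals)

rows = [
    {"year": 2006, "quarter": 3, "month": 8, "sales": 100.0},
    {"year": 2006, "quarter": 3, "month": 9, "sales": 50.0},
    {"year": 2006, "quarter": 4, "month": 10, "sales": 25.0},
]

by_year = summarize(rows, "year")        # drilled all the way up
by_quarter = summarize(rows, "quarter")  # one level down
by_month = summarize(rows, "month")      # most detailed level
```

Moving from `by_year` to `by_quarter` is a drill down; the reverse is a drill up.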
  • Slide 10
  • 10 Parent-Child Hierarchy Example: Brian, Amy, Stacia, Stephen, Shu, Michael, Peter, Jos, Syed
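A parent-child hierarchy is usually stored as a self-referencing relationship: each row points at its parent row in the same table. The sketch below uses `sqlite3` and a recursive query to walk such a table. The reporting lines are invented for illustration, because the slide's tree layout did not survive in this transcript, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY,
    name TEXT,
    -- Self-referencing foreign key: NULL marks the top of the hierarchy.
    parent_employee_key INTEGER REFERENCES dim_employee(employee_key)
);
-- Hypothetical reporting structure, for illustration only.
INSERT INTO dim_employee VALUES
    (1, 'Brian', NULL),
    (2, 'Amy', 1),
    (3, 'Stephen', 1),
    (4, 'Michael', 3),
    (5, 'Shu', 3);
""")

# Walk the hierarchy from the root, tracking each member's depth.
org_levels = conn.execute("""
WITH RECURSIVE org(employee_key, name, depth) AS (
    SELECT employee_key, name, 0
    FROM dim_employee
    WHERE parent_employee_key IS NULL
    UNION ALL
    SELECT e.employee_key, e.name, org.depth + 1
    FROM dim_employee AS e
    JOIN org ON e.parent_employee_key = org.employee_key
)
SELECT name, depth FROM org ORDER BY depth, name
""").fetchall()
print(org_levels)
```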
  • Slide 11
  • 11 Fact Table Fundamentals: A collection of measurements associated with a specific business process. Specific column types: foreign keys to dimensions; measures (numeric and aggregatable); metadata and lineage. Consistent granularity: the most atomic level by which the facts can be defined.
  • Slide 12
  • 12 Fact Table Examples: Reseller sales data (day grain) by: Product, Order Date, Reseller, Employee, Sales Territory. Sales quota data (quarter grain) by: Employee, Time.
  • Slide 13
  • 13 Date Dimension Table: Most common dimension used in analysis (aka Time dimension). Used consistently with all facts for efficient and flexible analysis. Useful common attributes: Year, Quarter, Month, Day. Time series analysis support. Navigation and summarization enabled with hierarchies, such as calendar or fiscal. Single table design (typically not snowflake design). Tip: Format the key of the dimension as yyyymmdd (e.g. 20060925) to make it readily understandable.
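The yyyymmdd key tip can be illustrated with a small generator for date dimension rows. This is a minimal sketch: the function name `build_date_dimension` and the column names are assumptions, and it covers only calendar attributes (a fiscal hierarchy would need extra columns).

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer key in yyyymmdd form, readable without a lookup.
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
        })
        current += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2006, 9, 24), date(2006, 9, 26))
print(date_dim[1]["date_key"])
```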
  • Slide 14
  • 14 Slowly Changing Dimensions: Support the primary role of the data warehouse, to describe the past accurately. Maintain historical context as new or changed data is loaded into dimension tables. Implement changes by Slowly Changing Dimension (SCD) type. Type 1: Overwrite the existing dimension record. Type 2: Insert a new versioned dimension record. Type 3: Track limited history with attributes.
  • Slide 15
  • 15 SCD Type 1: Existing record is updated. History is not preserved.
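A Type 1 change can be sketched as a plain in-place update. The example uses `sqlite3` with a hypothetical `dim_employee` table and a made-up business key `E-100`; the territory values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY,
    employee_id TEXT,              -- business key from the source system
    sales_territory_key INTEGER
)""")
conn.execute("INSERT INTO dim_employee VALUES (1, 'E-100', 5)")

# Type 1: overwrite the attribute; the old value (5) is lost.
conn.execute(
    "UPDATE dim_employee SET sales_territory_key = ? WHERE employee_id = ?",
    (10, "E-100"),
)
type1_rows = conn.execute(
    "SELECT employee_key, employee_id, sales_territory_key FROM dim_employee"
).fetchall()
print(type1_rows)
```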
  • Slide 16
  • 16 SCD Type 2: Existing record is expired and a new record is inserted. History is preserved. Most common form of SCD.
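A Type 2 change can be sketched as "expire, then insert". The `start_date`, `end_date`, and `is_current` columns below are one common way to version rows, assumed here for illustration; the table, the helper `apply_type2_change`, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    employee_id TEXT,                                -- business key, shared by versions
    sales_territory_key INTEGER,
    start_date TEXT,
    end_date TEXT,
    is_current INTEGER
)""")
conn.execute(
    "INSERT INTO dim_employee "
    "(employee_id, sales_territory_key, start_date, end_date, is_current) "
    "VALUES ('E-100', 5, '2005-01-01', NULL, 1)"
)

def apply_type2_change(conn, employee_id, new_territory_key, change_date):
    """Expire the current version and insert a new one, in one transaction."""
    with conn:
        conn.execute(
            "UPDATE dim_employee SET end_date = ?, is_current = 0 "
            "WHERE employee_id = ? AND is_current = 1",
            (change_date, employee_id),
        )
        conn.execute(
            "INSERT INTO dim_employee "
            "(employee_id, sales_territory_key, start_date, end_date, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (employee_id, new_territory_key, change_date),
        )

apply_type2_change(conn, "E-100", 10, "2006-09-25")
type2_rows = conn.execute(
    "SELECT employee_key, sales_territory_key, end_date, is_current "
    "FROM dim_employee ORDER BY employee_key"
).fetchall()
print(type2_rows)
```

Facts loaded before the change keep pointing at surrogate key 1, so historical reports still show the old territory.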
  • Slide 17
  • 17 SCD Type 3: Existing record is updated. Limited history is preserved. Implementation is rare. Example: SalesTerritoryKey update to 10.
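A Type 3 change keeps limited history in an extra attribute on the same row. In this sketch the extra column `previous_sales_territory_key` is an assumed name, and the table and starting value are hypothetical; only the update to 10 comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY,
    employee_id TEXT,
    sales_territory_key INTEGER,
    previous_sales_territory_key INTEGER   -- holds only the last prior value
)""")
conn.execute("INSERT INTO dim_employee VALUES (1, 'E-100', 5, NULL)")

# Type 3: move the current value into the "previous" column, then overwrite.
# Both assignments read the row's values from before the update.
conn.execute(
    "UPDATE dim_employee "
    "SET previous_sales_territory_key = sales_territory_key, "
    "    sales_territory_key = ? "
    "WHERE employee_id = ?",
    (10, "E-100"),
)
type3_rows = conn.execute(
    "SELECT sales_territory_key, previous_sales_territory_key FROM dim_employee"
).fetchall()
print(type3_rows)
```

A second change would overwrite the stored previous value, which is why the history is limited.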
  • Slide 18
  • 18 Let's Get the Data: We would like to populate facts and dimensions in our Data Warehouse from OLTP data...
  • Slide 19
  • 19 2. Integration and ETL
  • Slide 20
  • 20 Let's do ETL with SSIS: SQL Server Integration Services (SSIS) service. SSIS object model. Two distinct runtime engines: control flow and data flow. 32-bit and 64-bit editions.
  • Slide 21
  • 21 The Package: The basic unit of work, deployment, and execution. An organized collection of: connection managers, control flow components, data flow components, variables, event handlers, configurations. Can be designed graphically or built programmatically. Saved in XML format to the file system or SQL Server.
  • Slide 22
  • 22 Control Flow: Control flow is a process-oriented workflow engine. A package contains a single control flow. Control flow elements: containers, tasks, precedence constraints, variables.
  • Slide 23
  • 23 Data Flow. The Data Flow Task: performs traditional ETL and more; fast and scalable. Data flow components: extract data from Sources; load data into Destinations; modify data with Transformations. Service paths: connect data flow components; create the pipeline.
  • Slide 24
  • 24 Data Flow Sources: Sources extract data from relational tables and views, files, and Analysis Services databases.
  • Slide 25
  • 25 Data Flow Destinations: Destinations load data to relational tables and views, files, Analysis Services databases and objects, and DataReaders and Recordsets. Enterprise Edition only.
  • Slide 26
  • 26 Populating Fact Tables (flow): Start from the fact source. Transform. Look up the dimension key. Lookup failed? Y: insert new dimension record. N: continue. Repeat for each dimension key. Insert new record.
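The fact-loading flow can be sketched in plain Python, with dictionaries standing in for dimension tables. This is not SSIS code: `lookup_or_insert`, `load_facts`, the field names, and the sample rows are all hypothetical, and new surrogate keys are simply assigned as the next integer.

```python
def lookup_or_insert(dimension, business_key):
    """Return the surrogate key for business_key, adding a dimension row if needed."""
    key = dimension.get(business_key)
    if key is None:  # lookup failed: insert a new dimension record
        key = max(dimension.values(), default=0) + 1
        dimension[business_key] = key
    return key

def load_facts(source_rows, dimensions):
    """Replace business keys with surrogate keys, one dimension at a time."""
    fact_rows = []
    for row in source_rows:
        fact = {"sales_amount": row["amount"]}
        for name, dimension in dimensions.items():  # repeat for each dimension key
            fact[name + "_key"] = lookup_or_insert(dimension, row[name])
        fact_rows.append(fact)  # insert the new fact record
    return fact_rows

# Each dimension maps a business key to its surrogate key.
dimensions = {"product": {"BK-01": 1}, "reseller": {"R-7": 1}}
source = [
    {"product": "BK-01", "reseller": "R-7", "amount": 100.0},
    {"product": "HL-02", "reseller": "R-7", "amount": 25.0},  # unknown product
]
facts = load_facts(source, dimensions)
print(facts)
```

The second source row references a product that is not yet in the dimension, so a new dimension entry is created before the fact is stored.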
  • Slide 27
  • 27 Populating Dimension Tables (flow): Start from the dimension source. Transform. Correlate records. New record? Y: insert new record. N: Type 1 change? Y: update changed column(s). Type 2 change? Y: expire existing record and insert new record.
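The decision part of the dimension-loading flow can be sketched as a function that compares an incoming record with the existing one and returns the actions to take. Which columns count as Type 1 or Type 2 is a design decision; the column sets, the function name `plan_dimension_load`, and the action labels below are assumptions made for this illustration.

```python
# Hypothetical classification of columns by how changes are handled.
TYPE1_COLUMNS = {"phone"}            # overwrite in place
TYPE2_COLUMNS = {"sales_territory"}  # keep history with a new version

def plan_dimension_load(incoming, existing):
    """Return the list of actions for one incoming dimension record."""
    if existing is None:  # new record
        return ["insert"]
    changed = {col for col, value in incoming.items() if existing.get(col) != value}
    actions = []
    if changed & TYPE1_COLUMNS:
        actions.append("update")               # update changed column(s)
    if changed & TYPE2_COLUMNS:
        actions.extend(["expire", "insert"])   # expire old version, add new one
    return actions

existing = {"phone": "111", "sales_territory": "NW"}
new_plan = plan_dimension_load({"phone": "111", "sales_territory": "NW"}, None)
type1_plan = plan_dimension_load({"phone": "555", "sales_territory": "NW"}, existing)
type2_plan = plan_dimension_load({"phone": "111", "sales_territory": "SW"}, existing)
no_change_plan = plan_dimension_load(dict(existing), existing)
```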
  • Slide 28
  • 28 Row Transformations: Update column values or create new columns. Transform each row in the pipeline input.
  • Slide 29
  • 29 Rowset Transformations: Create new rowsets that can include aggregated values, sorted values, sample rowsets, and pivoted or unpivoted rowsets. These are the heavyweight components of SSIS, also called asynchronous components.
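Why aggregating and sorting are heavier than row-by-row work can be seen in a small sketch: no output row can be produced until every input row has been read. This is plain Python, not SSIS; `aggregate_and_sort` and the sample rows are hypothetical.

```python
from collections import defaultdict

def aggregate_and_sort(rows):
    """Total amounts per territory, then sort by total, largest first."""
    totals = defaultdict(float)
    for row in rows:
        # The whole input must be consumed before any total is final.
        totals[row["territory"]] += row["amount"]
    # The output is a new rowset with a different shape from the input.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

rows = [
    {"territory": "NW", "amount": 10.0},
    {"territory": "SW", "amount": 5.0},
    {"territory": "NW", "amount": 20.0},
]
territory_totals = aggregate_and_sort(rows)
print(territory_totals)
```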
  • Slide 30
  • 30 Split and Join Transformations: Distribute rows to different outputs. Create copies of the transformation inputs. Join multiple inputs into one output. Perform lookup operations.
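Splitting and joining can be sketched with two small helpers: one routes each row to one of two outputs based on a condition, the other combines several inputs into one output. These are plain Python stand-ins with hypothetical names (`conditional_split`, `union_all`) and sample data, not SSIS components.

```python
def conditional_split(rows, predicate):
    """Distribute rows to two outputs: those matching the predicate, and the rest."""
    matched, default = [], []
    for row in rows:
        (matched if predicate(row) else default).append(row)
    return matched, default

def union_all(*inputs):
    """Join multiple inputs into one output, keeping every row."""
    return [row for rows in inputs for row in rows]

orders = [
    {"order_id": 1, "amount": 500.0},
    {"order_id": 2, "amount": 20.0},
    {"order_id": 3, "amount": 750.0},
]
large, small = conditional_split(orders, lambda row: row["amount"] >= 100.0)
recombined = union_all(small, large)
```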
  • Slide 31
  • 31 Using SQL Server Integration Services for Aggregating and Deriving Data
  • Slide 32
  • 32 3. OLAP/Multidimensional Data
  • Slide 33
  • 33 SQL Server 2008 Analysis Services. OLAP component: aggregates and organizes data from business data sources; performs calculations difficult to perform using relational queries; supports advanced business intelligence, such as Key Performance Indicators. Data mining component: discovers patterns in both relational and OLAP data; enhances the OLAP component with discovered results.
  • Slide 34
  • 34 Cube = Unified Dimensional Model: Multidimensional data. Combination of measures and dimensions as one conceptual model. Measures are sourced from fact tables. Dimensions are sourced from dimension tables.
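The idea of a cube as measures combined with dimensions can be sketched by pre-computing a measure total for every combination of dimension members. This toy model is only conceptual and says nothing about how Analysis Services stores or computes aggregations; `build_cube` and the sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_cube(facts, dimensions, measure):
    """Total the measure for every subset of dimensions, including the grand total."""
    cube = defaultdict(float)
    for fact in facts:
        for size in range(len(dimensions) + 1):
            for subset in combinations(dimensions, size):
                # A cell is identified by the (dimension, member) pairs that fix it.
                cell = tuple((dim, fact[dim]) for dim in subset)
                cube[cell] += fact[measure]
    return dict(cube)

facts = [
    {"product": "Bike", "year": 2006, "sales": 1000.0},
    {"product": "Helmet", "year": 2006, "sales": 50.0},
    {"product": "Bike", "year": 2007, "sales": 1200.0},
]
cube = build_cube(facts, ("product", "year"), "sales")

grand_total = cube[()]                                      # no dimension fixed
bike_sales = cube[(("product", "Bike"),)]                   # sliced by product
bike_2007 = cube[(("product", "Bike"), ("year", 2007))]     # one fully specified cell
```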
  • Slide 35
  • 35 Dimensions: Members from tables/views in a data source view (based on a Data Warehouse). Contain attributes matching dimension columns. Organize attributes as hierarchies: one All level and one leaf level. User hierarchies are multi-level combinations of attributes. Can be placed in display folders. Used for slicing and dicing by attribute.
  • Slide 36
  • 36 Hierarchy Defin

