Tuesday Introduction to OLAP and Dimensional Modelling

Upload: vracle

Posted on 03-Jun-2018

  • 8/12/2019 Tuesday Introduction to OLAP and Dimensional Modelling

    Introduction to OLAP and Dimensional Modelling

    Tuesday

    Overview: Tuesday

    Format      Time           Description
    Lecture     10:00 - 10:45  Introduction to OLAP and Analysis Services
    Demo        10:45 - 11:30  Dimensional modelling
    Lab         12:15 - 13:00  Practical session: Defining a data source and defining and deploying a cube
    Lab         13:00 - 13:45  Practical session: Modifying measures, attributes and hierarchies
    Lecture     14:30 - 15:15  Observations about design for OLAP and Reporting
    Discussion  15:15 - 16:00  Wrap-up: questions and feedback

    Definition of OLAP

    Fast Analysis of Shared Multidimensional Information (FASMI, Nigel Pendse):

    Fast
    Analysis (statistical and business logic)
    Shared
    Multidimensional
    Information (all of the data and derived information needed)

    Multidimensional

    The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical way to analyze businesses and organizations.
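
    To make "hierarchies and multiple hierarchies" concrete, here is a minimal Python sketch. It assumes hypothetical toy fact rows and a hypothetical fiscal year that starts on 1 July and is named after the calendar year in which it ends. The same order dates roll up through either a calendar hierarchy (Year > Quarter > Month) or a fiscal one (Fiscal Year > Fiscal Quarter).

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy fact rows: (order date, sales amount).
facts = [
    (date(2003, 6, 30), 100.0),
    (date(2003, 7, 1), 50.0),
    (date(2004, 1, 15), 25.0),
]


def calendar_path(d):
    """Calendar hierarchy: (Year, Quarter, Month)."""
    return (d.year, (d.month - 1) // 3 + 1, d.month)


def fiscal_path(d):
    """Hypothetical fiscal hierarchy, year starting 1 July: (FiscalYear, FiscalQuarter)."""
    fiscal_year = d.year + 1 if d.month >= 7 else d.year
    fiscal_quarter = (d.month - 7) % 12 // 3 + 1
    return (fiscal_year, fiscal_quarter)


def roll_up(rows, path, level):
    """Sum the measure for each member at the given depth of a hierarchy."""
    totals = defaultdict(float)
    for d, amount in rows:
        totals[path(d)[:level]] += amount
    return dict(totals)


print(roll_up(facts, calendar_path, 1))  # calendar years
print(roll_up(facts, fiscal_path, 1))    # fiscal years
print(roll_up(facts, calendar_path, 2))  # calendar quarters
```

    Rolling up to the top level of each hierarchy groups the same three rows differently: by calendar year the totals are 150 (2003) and 25 (2004), by fiscal year they are 100 (2003) and 75 (2004).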

    Alternative definition of OLAP (from SAS)

    OLAP is "fast access to large amounts of summarized data".

    This implies the concept of dimensionality: without dimensions, there would be nothing to summarize the data by.

    A further definition is that OLAP provides "the ability of users to conveniently interrogate large amounts of data, at varying levels of detail, across a variety of combinations of business dimensions".
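
    The phrase "across a variety of combinations of business dimensions" can be sketched in Python. The dimension names and rows below are hypothetical toy data. The function computes the measure for every subset of the dimensions, which is similar in spirit to SQL's GROUP BY CUBE; an OLAP server typically precomputes or caches some of these aggregates so that such queries stay fast.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names and toy rows: one value per dimension plus a measure.
DIMENSIONS = ("customer", "product", "year")
rows = [
    {"customer": "A", "product": "Bike", "year": 2003, "sales": 100.0},
    {"customer": "A", "product": "Helmet", "year": 2003, "sales": 20.0},
    {"customer": "B", "product": "Bike", "year": 2004, "sales": 150.0},
]


def cube(rows, dims, measure):
    """Aggregate `measure` over every subset of `dims` (2 ** len(dims) group-bys)."""
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row[measure]
            result[group] = dict(totals)
    return result


summary = cube(rows, DIMENSIONS, "sales")
print(summary[()])                    # grand total
print(summary[("product",)])          # by product
print(summary[("customer", "year")])  # by customer and year
```

    With three dimensions there are 2 ** 3 = 8 group-bys, from the grand total (the empty grouping) down to full customer, product and year detail.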

    Kimball's Four-Step Design Process

    1. Select a business process
    2. Declare the grain
    3. Choose dimensions
    4. Identify facts

    STEP 1: Select a business process

    For our exercise, we will be looking at Internet sales.

    A Quick Look at the Data (1)

    USE AdventureWorksDW;

    -- TOP without ORDER BY returns five arbitrary rows
    SELECT TOP 5
        CustomerKey, ProductKey, OrderDateKey,
        SalesAmount
    FROM FactInternetSales;

    A Quick Look at the Data (2)

    CustomerKey ProductKey OrderDateKey SalesAmount
    ----------- ---------- ------------ ---------------------
    11003       346        1            3399.99
    14501       336        1            699.0982
    21768       310        1            3578.27
    25863       346        1            3399.99
    28389       346        1            3399.99

    STEP 2: Declare the grain
    (What does a row in the fact table mean?)

    In our example, a row is an individual order line: one product on one Internet sales order.

    Design rule: recognise the trade-off. A finer grain supports more detailed analysis, but results in a larger quantity of data.
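
    The grain trade-off can be sketched in Python. Assuming hypothetical fact rows at order-line grain (order number, line number, order date key, product key, sales amount), rolling them up to one row per order date and product shrinks the table, but the order numbers disappear, so a question such as "which products were bought on the same order?" can no longer be answered.

```python
from collections import defaultdict

# Hypothetical fact rows at order-line grain:
# (order number, line number, order date key, product key, sales amount)
order_lines = [
    ("SO1", 1, 1, 346, 100.0),
    ("SO1", 2, 1, 214, 20.0),
    ("SO2", 1, 1, 346, 100.0),
    ("SO3", 1, 2, 310, 150.0),
]


def coarsen(lines):
    """Roll order lines up to one row per (order date key, product key).

    The order and line numbers are dropped, so per-order analysis is no
    longer possible at this coarser grain.
    """
    totals = defaultdict(float)
    for _order, _line, date_key, product_key, amount in lines:
        totals[(date_key, product_key)] += amount
    return dict(totals)


daily = coarsen(order_lines)
print(len(order_lines), "rows at order-line grain")
print(len(daily), "rows at date/product grain")
```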

    STEP 4: Identify facts

    The numeric facts that we will measure

    FactInternetSales

    Schema diagram (tables and the columns visible on the slide):

    DimCustomer: CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, MiddleName
    DimGeography: GeographyKey, City, StateProvinceCode, StateProvinceName, CountryRegionCode, EnglishCountryRegionName, SpanishCountryRegionNa...
    DimProduct: ProductKey, ProductAlternateKey, ProductSubcategoryKey, WeightUnitMeasureCode, SizeUnitMeasureCode, EnglishProductName, SpanishProductName, FrenchProductName, StandardCost, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, ListPrice
    DimTime: TimeKey, FullDateAlternateKey, DayNumberOfWeek, EnglishDayNameOfWeek, SpanishDayNameOfWeek, FrenchDayNameOfWeek, DayNumberOfMonth
    FactInternetSales: ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey
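
    Joining the fact table to its dimensions is what lets users query sales by business terms such as product name and year. The following Python sketch uses the standard-library sqlite3 module (not SQL Server), a few of the column names above plus SalesAmount, and hypothetical toy rows. Dates are stored as ISO text here only because SQLite has no native date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the tables above (toy rows only).
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, EnglishProductName TEXT);
CREATE TABLE DimTime (TimeKey INTEGER PRIMARY KEY, FullDateAlternateKey TEXT);
CREATE TABLE FactInternetSales (
    ProductKey   INTEGER REFERENCES DimProduct (ProductKey),
    OrderDateKey INTEGER REFERENCES DimTime (TimeKey),
    SalesAmount  REAL
);
INSERT INTO DimProduct VALUES (1, 'Bike'), (2, 'Helmet');
INSERT INTO DimTime VALUES (1, '2003-07-01'), (2, '2004-01-15');
INSERT INTO FactInternetSales VALUES (1, 1, 500.0), (2, 1, 20.0), (1, 2, 500.0);
""")

# Star join: the fact table supplies the measure, the dimensions supply the labels.
rows = conn.execute("""
SELECT p.EnglishProductName,
       substr(t.FullDateAlternateKey, 1, 4) AS OrderYear,
       SUM(f.SalesAmount) AS TotalSales
FROM FactInternetSales AS f
JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
JOIN DimTime    AS t ON t.TimeKey = f.OrderDateKey
GROUP BY p.EnglishProductName, OrderYear
ORDER BY p.EnglishProductName, OrderYear
""").fetchall()

for row in rows:
    print(row)
```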

    Demo

    Dimensional modelling

    Lab

    Defining a data source view

    Defining and deploying a cube

    Lab

    Practical session: Modifying measures, attributes and hierarchies

    Lecture: Observations about design for OLAP and Reporting

    The BI Bottleneck (1)

    Report consumers: the report may be electronic, e.g. Excel.
    Power users: capable of some self-service.
    Report authors: they know the data and the business.
    Reporting administrator: knows the database and the data, but not necessarily how it relates to the business.

    Challenge: make reporting more interactive, so that changes can be accommodated without passing along the chain.

    The BI Bottleneck (2)

    Typically, analysts' time is the scarce resource.
    The number of iterations is the killer.
    Sometimes, testing is the bottleneck.

    Possible solution: the analyst spends a bit more time in the first iteration, providing the business user with a more generic, interactive report.

    The BI Bottleneck (3)

    Long lead times
    High development costs
    Apparently small changes to a requirement for a report take a long time to implement.
    Each additional link in the chain that a change request must pass through multiplies the delay by a large factor.

    The Relational Model of Data

    Conceptually, a homogeneous tabular structure:
    Logic: for a declarative query language
    Algebra: for query optimization
    Application interface (e.g. simple reporting tools): application designers and even some end-users can (just about) understand tables.

    The relational model provided a mutually intelligible language for implementers, administrators, developers, researchers and even users.

    Flexible: join anything with anything (cf. OLAP).

    Inadequacy of the Relational Model for Reporting Applications

    Heterogeneous data sources: databases, OLAP, XML Web services, etc.
    The relational model does not fit well with the area between storage and presentation:
    Aggregation hierarchies
    Matrix structures
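
    The "matrix structures" point can be illustrated with a small Python sketch. A relational query returns long-form rows, one per combination that actually occurs; turning them into a row-by-column grid, as a pivot table or matrix report does, is an extra step outside the relational model. The rows below are hypothetical toy data, and `pivot` is a hypothetical helper, not part of any reporting product.

```python
# Hypothetical long-form result, as a relational query returns it:
# one row per (product, year) combination that actually occurs.
long_rows = [
    ("Bike", 2003, 500.0),
    ("Bike", 2004, 500.0),
    ("Helmet", 2003, 20.0),
]


def pivot(triples):
    """Turn (row label, column label, value) triples into a matrix with sorted headers."""
    row_labels = sorted({r for r, _, _ in triples})
    col_labels = sorted({c for _, c, _ in triples})
    cells = {(r, c): v for r, c, v in triples}
    # Combinations with no source row become None (an empty cell in the matrix).
    matrix = [[cells.get((r, c)) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix


rows_hdr, cols_hdr, matrix = pivot(long_rows)
print("Product".ljust(8), *(str(c).rjust(8) for c in cols_hdr))
for label, values in zip(rows_hdr, matrix):
    print(label.ljust(8), *(("-" if v is None else f"{v:.2f}").rjust(8) for v in values))
```

    Note the empty Helmet/2004 cell: the relational result has no row for it, yet the matrix layout still needs a place for it.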

    The Microsoft approach: UDM

    SQL Server Analysis Services 2005 implements the UDM (Unified Dimensional Model).
    Acts as a bridge between users and their data.
    Encapsulates semantics, language and time.
    UDM perspectives allow the user to view subsets of the model.
    Integrated with Data Mining.
    Accessed via SOAP and XML for Analysis.

    UDM

    A UDM provides a single dimensional model for all OLAP analysis and relational reporting needs, so you can use either MDX or SQL.
    Perspectives are the new data marts.
    Cubes are largely transparent, downgraded to the status of caches.
    Commonly, you'll have only one cube, with multiple measure groups and multiple perspectives.
    It's better to think of measure groups instead of cubes; partitions now apply to measure groups.
    Whilst a UDM can gather data from numerous data sources, the need to cleanse data still requires a data warehouse.
    A cube is structured around dimensional attributes (previously known as member properties) rather than dimensional hierarchies. Hence the virtual dimension, as a term, is gone, and the concept has become a real, first-class dimension.
    UDM has five new dimension types: Role Playing, Fact, Reference, Data Mining and Many-to-Many.
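
    The Role Playing type is visible in the schema listed earlier: FactInternetSales carries OrderDateKey, DueDateKey and ShipDateKey, which all refer to the time dimension. In plain SQL each role is a separate aliased join to the one physical table; in UDM, as we understand it, each role instead appears as its own cube dimension built on a single database dimension. The sketch below uses Python's sqlite3 module with hypothetical toy rows to show the relational side only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down tables: one time dimension, three date keys in the fact.
conn.executescript("""
CREATE TABLE DimTime (TimeKey INTEGER PRIMARY KEY, FullDateAlternateKey TEXT);
CREATE TABLE FactInternetSales (
    OrderDateKey INTEGER, DueDateKey INTEGER, ShipDateKey INTEGER, SalesAmount REAL
);
INSERT INTO DimTime VALUES (1, '2003-07-01'), (2, '2003-07-08'), (3, '2003-07-13');
INSERT INTO FactInternetSales VALUES (1, 3, 2, 500.0);
""")

# One physical DimTime table plays three roles via three aliased joins.
row = conn.execute("""
SELECT od.FullDateAlternateKey AS OrderDate,
       dd.FullDateAlternateKey AS DueDate,
       sd.FullDateAlternateKey AS ShipDate,
       f.SalesAmount
FROM FactInternetSales AS f
JOIN DimTime AS od ON od.TimeKey = f.OrderDateKey
JOIN DimTime AS dd ON dd.TimeKey = f.DueDateKey
JOIN DimTime AS sd ON sd.TimeKey = f.ShipDateKey
""").fetchone()

print(row)
```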

    The pre-UDM and UDM stacks

    Pre-UDM stack:
    Dimension model (pivot table)
    Calculations (Excel)
    End-user model (if you are lucky)
    Data source view
    Management settings

    UDM stack:
    Management settings
    End-user model
    Calculations
    Dimensional model
    Data source view

    Enterprise BI with UDM

    [Diagram: DW and two datamarts; MOLAP; UDM; BI applications (XML Web service, reporting tools, OLAP browsers)]

    Desirable features of a BI data model

    The model must facilitate:
    re-use of report "spare parts" by the power users (rather than just the report authors);
    more flexibility for report consumers;
    easier maintenance of the set of all reports used by an enterprise (e.g. avoiding the reporting chain);
    interaction.

    Current design principles

    All about how to make reports look good.
    See, for example, "Microsoft SQL Server 2005 Report Design: Best Practices and Guidelines".
    Some focus on maintenance.
    No focus on re-use.

    Wrap-Up

    Questions

    Feedback