biug 20112026 dimensional modeling and mdx best practices

BIUG 2011, Itay Braun, CTO

Upload: itay-braun

Post on 27-Jun-2015

DESCRIPTION

My presentation at the last Israeli BI User Group. Some of the content is taken from SQLBits and Chris Webb's blog.

TRANSCRIPT

Page 1: Biug 20112026   dimensional modeling and mdx best practices

BIUG, March 2011

Itay Braun

Page 2

Agenda

• Dimension Design
• SSAS Best Practices
• MDX
• Inspired by Vincent Rainardi (http://sqlbits.com/) and Mosha Pasumansky

Page 3

About the company

• Founded in 2007
• The company employs 22 software professionals, each an expert in their field
• Offices in Herzliya Pituach
• Partner of [logo not preserved] in BI and DB
• Partner of [logo not preserved] in [...]
• The leading services company in Israel in [...]
• Leading the BI and DB field at [...]

Page 4

Areas of activity

BI
• BI projects using Microsoft and MicroStrategy technologies
• Specification, construction and maintenance of DWH systems
• Training and advanced courses

DB
• Experts in SQL Server and Sybase
• 24x7 maintenance of critical, complex systems at large companies
• T&P for large DB systems
• Expertise in supporting OEM customers

.NET
• Project development tailored to customer needs
• Solutions based on advanced Microsoft technologies (C#, Silverlight, Web)
• Specification and implementation of integration projects

Page 5

Selected customers

Page 6

White Papers

• Analysis Services 2008 R2 Performance Guide

• Analysis Services 2008 Operation Guide

• Performance Improvements for MDX in SQL Server 2008 Analysis Services

• OLAP Design Best Practices

Page 7

1 or 2 dimensions

a) One Dimension
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don’t have fact tables requiring customer grain.

b) Two Dimensions
• We can get the customer attributes without knowing the account key
• Disadvantage: can’t go from account to customer without going through the fact table (performance)

[Diagram: a) FactTable → DimAccount, which also holds the customer attributes; b) FactTable → DimAccount and FactTable → DimCustomer]

Page 8

1 or 2 dimensions

c) Snowflake
• Dim customer is needed by another fact table
• Modular: 2 separate dim tables, but we can combine them easily to create a bigger dimension
• Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):

    select c.attribute, sum(f.measure1)
    from fact1 f
    inner join dim_account a on f.account_key = a.account_key
    inner join dim_customer c on a.customer_key = c.customer_key
    group by c.attribute

[Diagram: FactTable → DimAccount → DimCustomer]
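A minimal runnable sketch of the snowflake query on this slide, using Python's built-in sqlite3 as a stand-in for the warehouse database. Table and column names follow the slide's query; the sample rows and the attribute values ('Retail', 'Corporate') are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY,
                           customer_key INTEGER REFERENCES dim_customer);
CREATE TABLE fact1        (account_key  INTEGER REFERENCES dim_account,
                           measure1     REAL);
-- Illustrative data only
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO fact1        VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# The fact table only knows the account key; the customer attribute is
# reached through dim_account (fact -> account -> customer).
snowflake_rows = conn.execute("""
    SELECT c.attribute, SUM(f.measure1)
    FROM fact1 f
    INNER JOIN dim_account  a ON f.account_key  = a.account_key
    INNER JOIN dim_customer c ON a.customer_key = c.customer_key
    GROUP BY c.attribute
    ORDER BY c.attribute
""").fetchall()
print(snowflake_rows)
```

The extra hop through dim_account is the cost the slide refers to: in option a) the same breakdown needs only one join.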

Page 9

When to Snowflake

1. When the sub dim is used by several dims

City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured

Replaced by Location/GeoKey pointing to DimLocation / DimGeography

Advantage: consistent hierarchy, i.e. relationship between City, Country & Region.

Weakness: we would lose flexibility. The City-to-Country mapping is more or less fixed, but the grouping of countries might differ between dimensions.

Page 10

When to Snowflake

2. When the sub dim is used by both the main dim and the fact table(s)

• DimCustomer is used in DimAccount, and is also used in the fact table.

• DimManufacturer is used in DimProduct, and is also used in the fact table.

• DimProductGroup is used in DimProduct, and is also used in some fact table.

The alternative is maintaining two full dimensions (star classic).

Page 11

When to Snowflake

3. To make “base dim” and “detail dim”

Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)

Policies for marine, aviation & property classes have different attributes.
• Pull common attributes into 1 dim: DimBasePolicy
• Put class-specific attributes into DimMarine, DimProperty, DimAviation

Ref: Kimball DW Toolkit 2nd edition page 213

Page 12

A dimension with only 1 attribute

Reasons for putting a single attribute in its own dim:
– Keep fact table slim (4 bytes int not 100 bytes varchar)
– When the value changes, we don’t have to update the BIG fact table (ETL performance)
– Grain is much lower than fact table (small dim)
– Yes it’s only 1 attribute today, but in the future there could be another attribute.

Should we put the attribute in the fact table? (like DD = Degenerate Dim)
Probably, if the grain = fact table, and it’s short or it’s a number.

Page 13

Fact Table Primary Key

Should we have a PK?
Yes, if we need to be able to identify each fact row:

1. Need to refer to a fact row from another fact row, e.g. chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table

Some experts totally disagree

[Diagram labels: PK, FK (no RI); PK, FK (not enforced); PK; previous/next transaction; Uniqueness; Related Trans Header - Detail]

Page 14

Fact Table Primary Key

Single or Multi Column?
• Single Column: Generated Identity
• Multi Column: Dimension Keys

Single-column PK is better than multi-column PK because:

1) The combination of dimension keys in a multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.

2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
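A small sketch of point 2, assuming a hypothetical fact_transaction table with a generated single-column key and a prev_fact_key column that points at the previous fact row (no referential integrity enforced). SQLite's AUTOINCREMENT stands in for an identity column here; all names and rows are illustrative, not from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_transaction (
    fact_key      INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated identity
    account_key   INTEGER NOT NULL,
    date_key      INTEGER NOT NULL,
    amount        REAL    NOT NULL,
    prev_fact_key INTEGER                             -- previous row, not enforced
);
INSERT INTO fact_transaction (account_key, date_key, amount, prev_fact_key)
VALUES (10, 20110301, 100.0, NULL),
       (10, 20110303,  40.0, 1),
       (10, 20110307,  25.0, 2);
""")

# Self join on one integer column: current fact row -> previous fact row.
chain = conn.execute("""
    SELECT cur.fact_key, cur.amount, prev.amount
    FROM fact_transaction cur
    LEFT JOIN fact_transaction prev ON cur.prev_fact_key = prev.fact_key
    ORDER BY cur.fact_key
""").fetchall()
print(chain)
```

With a composite key, the same link would need one prev_* column and one join predicate per dimension key.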

Page 15

Fact Table Primary Key

• Advantage: Prevent duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: cluster or not?
– Cluster the PK if: the PK is an identity column
– Don’t cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)

Example of not having a PK

If duplicate fact rows are allowed, e.g. retail DW: Store Key, Date Key, Product Key, Customer Key. Same customer buying the same milk in the same shop on the same day twice.

Page 16

Aggregate Fact Tables

What are they?
• High level aggregation of base fact tables
• A “select group by” query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table
• The report only takes 1 second to run.

[Diagram: Base Fact Tables → Report: 30 mins; Base Fact Tables → Aggregate Fact Table → Report: 1 sec]
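A minimal sketch of the idea, assuming a hypothetical fact_sales base table and an agg_sales_by_day_store aggregate built once during the load. The names and rows are illustrative, and SQLite stands in for the warehouse; the timings on the slide are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER,
                         product_key INTEGER, sales_amount REAL);
INSERT INTO fact_sales VALUES
    (20110301, 1, 100, 10.0), (20110301, 1, 101, 5.0),
    (20110301, 2, 100,  7.5), (20110302, 1, 100, 2.5);

-- Run once as part of the DW load: the expensive GROUP BY is paid here.
CREATE TABLE agg_sales_by_day_store AS
SELECT date_key, store_key,
       SUM(sales_amount) AS sales_amount,
       COUNT(*)          AS row_count
FROM fact_sales
GROUP BY date_key, store_key;
""")

# The report reads the small pre-aggregated table, not the base fact table.
report = conn.execute("""
    SELECT date_key, store_key, sales_amount
    FROM agg_sales_by_day_store
    ORDER BY date_key, store_key
""").fetchall()
print(report)
```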

Page 17

Rapidly Changing Dimension

• Why is it a problem
– Large SCD2 dim
– Attributes change every day
– Slow queries when joined with large fact tables
• What to do
– Put into a separate dim, link direct to fact table.
– Just store the latest, type 1 attributes (or dual)
– Store in the fact table (for small attribute, e.g. indicator)

[Diagram labels: Type2, Type1, Type2, Type2, Type2]

Page 18

Very Large Dimension

Why is it a problem
– SSAS: 4 GB string store limit for dimension
– SSAS: dim is “select distinct” on each attribute
– long processing time
– Difficult to browse high cardinality attribute
– Join with fact tables (performance)

Page 19

Very Large Dimension

What to do
– Split into 2 dims, same grain. Always cut vertically.
– Remove SCD2, or at least apply it only to certain columns.
– Most common: separate the attributes with high cardinality/change frequency

[Diagram label: VLD]

Page 20

Real Time Fact Table

• Reporting the transaction system in real time
• View to union with the normal fact table, or use partitions
• Freezing the dims for key lookup, -3 unknown key
• Key corrections next day

[Diagram labels: Real time partition (intraday today); Main partition (up to last night); Dims as of yesterday; dim key]

Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen, to be resolved next batch
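The three unknown keys can be sketched as a lookup function. This is only an illustration of the convention on the slide: the function name lookup_dim_key, the dict-based dimension and the business keys are all hypothetical.

```python
from typing import Dict, Optional

NULL_IN_SOURCE = -1      # business key is null in the source row
NOT_IN_DIM = -2          # key not found in a dimension that is up to date
NOT_IN_FROZEN_DIM = -3   # key not found because the dim was frozen; fix next batch


def lookup_dim_key(business_key: Optional[str],
                   dim: Dict[str, int],
                   dim_is_frozen: bool) -> int:
    """Return the surrogate key, or one of the unknown keys."""
    if business_key is None:
        return NULL_IN_SOURCE
    if business_key in dim:
        return dim[business_key]
    return NOT_IN_FROZEN_DIM if dim_is_frozen else NOT_IN_DIM


# Dimension as of yesterday, used for the intraday (real time) partition.
customer_dim = {"C001": 1, "C002": 2}
intraday_keys = [lookup_dim_key(k, customer_dim, dim_is_frozen=True)
                 for k in ["C001", None, "C999"]]
print(intraday_keys)
```

Rows loaded with -3 are the ones the next day's batch has to revisit once the dimension has been refreshed.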

Page 21

Dealing with Currency Rates

What for/background/requirements
– Report in 3 reporting currencies, using today's or past rates
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– Had the transactions happened today
– Currency rates historical analysis

[Diagram: Transaction Currency (100 countries, 40 currencies) → Transaction Rates (many transaction dates) → DW Currency (1 currency, e.g. GBP) → Reporting Rates (1 reporting date) → Reporting Currency (3-4 currencies: GBP, USD, EUR, Original)]

Page 23

Dealing with Status

What/background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (no of days between stages)
– How many on each stage

[Diagram labels: Status 1, Status 2, Status 3, Status 4, Status 5, Status 6; date1, date2, date3, date4]

Page 24

Dealing with Status

Approaches
– Accumulating Snapshot Fact, 1 row per application
– SCD2 on DimApp
– App Status fact table

Accumulating Snapshot Fact:

AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
1       1/3/11    1        3/3/11    1        7/3/11    1
2       6/3/11    1        7/3/11    1                  0

App Status fact table:

AppKey  StsKey  StsDateKey
1       1       61
1       2       63
1       3       67
2       1       66
2       2       67

SCD2 on DimApp:

AppKey  AppID  StsKey  StsDate  Current
1       1      1       1/3/11   N
2       1      2       3/3/11   N
3       1      3       7/3/11   Y
4       2      1       6/3/11   N
5       2      2       7/3/11   Y
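A sketch of the two analyses from the previous slide (days between stages, how many on each stage) run against the App Status fact table rows shown here. It assumes that StsDateKey is a sequential day number, so that subtracting keys gives days, and that statuses are visited in StsKey order; the table name fact_app_status is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_app_status (AppKey INTEGER, StsKey INTEGER, StsDateKey INTEGER);
INSERT INTO fact_app_status VALUES (1,1,61), (1,2,63), (1,3,67), (2,1,66), (2,2,67);
""")

# Bottleneck analysis: days between each status and the next one.
days_between = conn.execute("""
    SELECT cur.AppKey, cur.StsKey, nxt.StsKey, nxt.StsDateKey - cur.StsDateKey
    FROM fact_app_status cur
    JOIN fact_app_status nxt
      ON nxt.AppKey = cur.AppKey AND nxt.StsKey = cur.StsKey + 1
    ORDER BY cur.AppKey, cur.StsKey
""").fetchall()

# How many applications currently sit on each stage (latest status per app).
current_stage = conn.execute("""
    SELECT StsKey, COUNT(*)
    FROM (SELECT AppKey, MAX(StsKey) AS StsKey
          FROM fact_app_status GROUP BY AppKey)
    GROUP BY StsKey
    ORDER BY StsKey
""").fetchall()

print(days_between)
print(current_stage)
```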

Page 25

Referenced Dimensions

• Enables using one “master” member
• Not Snowflake dimension
– For ex.
• Dim customers: UK, London, Roman Avramovich.
• Dim Stores: UK, London, Friendly Bikes Store
– What is the total revenue from Internet customers and stores in London?

Page 26

MDX optimization Methodology

• Re-write the MDX code
• Add Aggregations
• Add pre-calculated Measure Groups (ETL)
• Solve the problem using Relational Engine
• Use .NET Stored Procedures.
– The problem can rarely be solved using better hardware.
• Column based Databases

Page 27

• Optimizing MDX
– Baselining Query Speeds
• Clearing the Analysis Services Caches
• Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex)
• Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
• Configuring the Analysis Services Query Log

Page 28

• Cell-by-Cell Mode vs. Subspace Mode

Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.

Page 29

Using Profiler

• So far so good

Page 30

Doesn’t use the cache

Page 31

Subcube

• Granularity
• Slice

Page 32

Granularity

• Single grain
– List of GROUP BY attributes in SQL SELECT
• Mixed grain
– Both Attribute.[All] and Attribute.MEMBERS

Page 33

Granularity

[Grid: columns (All Country, All City), (Countries, All City), (Countries, Cities); rows All Products, Products]

Page 34

Slice

• Single member
– SQL: WHERE City = 'Redmond'
– MDX: [City].[Redmond]
• Multiple members
– SQL: WHERE City IN ('Redmond', 'Seattle')
– MDX: { [City].[Redmond], [City].[Seattle] }

Page 35

Slice at granularity

SQL
    SELECT Sum(Sales), City
    FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
    GROUP BY City

MDX
    SELECT Measures.Sales ON 0,
           NON EMPTY {Redmond, Seattle} ON 1
    FROM Sales_Cube

Page 36

Slice below granularity

SQL
    SELECT Sum(Sales)
    FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')

MDX
    SELECT Measures.Sales ON 0
    FROM Sales_Cube
    WHERE {Redmond, Seattle}
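The SQL halves of the last two slides can be run side by side. A minimal sketch with made-up rows, using SQLite in place of the relational source; only the Sales_Table name, its columns and the city names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Sales REAL);
INSERT INTO Sales_Table VALUES
    ('Redmond', 10.0), ('Redmond', 5.0), ('Seattle', 20.0), ('London', 7.0);
""")

# Slice at granularity: City is filtered AND kept in the result (GROUP BY City).
at_granularity = conn.execute("""
    SELECT City, SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
    GROUP BY City
    ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but aggregated away.
below_granularity = conn.execute("""
    SELECT SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_granularity)
print(below_granularity)
```

Both queries read the same rows; the difference is whether the sliced attribute is part of the result grain.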

Page 37

Examples

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

Page 38

Examples

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

(Seattle, Year.Year.MEMBERS)

Page 39

Examples

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

(Seattle, Year.MEMBERS)

Page 40

Examples

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

({Redmond, Seattle, London}, Year.MEMBERS)

Page 41

Examples

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

({Redmond, Seattle}, {2005, 2006, 2007})

Page 42

Arbitrary shaped subcubes

• What is it?
• How can it happen?
• Why is it so bad?
• How to avoid them?

Page 43

Arbitrary shaped subcubes

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))

Page 44

Arbitrary shaped subcubes

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, SF, Denver]

CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)

Page 45

Arbitrary shaped subcubes

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}

Page 46

Arbitrary shaped subcubes

[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]

Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
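One way to see why these shapes are "arbitrary" is a simplified, single-grain model: treat a subcube as a set of (city, year) cells and ask whether it equals the cross join of its own city and year projections. This sketch is an illustration only, not how Analysis Services classifies subcubes; it ignores the All members and the mixed-grain case, and the helper name is_cross_product is hypothetical.

```python
from itertools import product

cities = ["Redmond", "Seattle", "New York", "London"]
years = [2005, 2006, 2007, 2008]


def is_cross_product(cells):
    """True if the cell set equals the cross join of its per-attribute projections."""
    cells = set(cells)
    city_proj = {city for city, _ in cells}
    year_proj = {year for _, year in cells}
    return cells == set(product(city_proj, year_proj))


# ({Redmond, Seattle}, {2005, 2006, 2007}): a regular rectangle
regular = set(product(["Redmond", "Seattle"], [2005, 2006, 2007]))
# Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005)): an L shape
l_shape = set(product(["Redmond"], years)) | set(product(cities, [2005]))
# CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007): a hole
with_hole = set(product(cities, years)) - {("Seattle", 2007)}
# {(Redmond, 2005), (Seattle, 2006), ...}: a diagonal
diagonal = {("Redmond", 2005), ("Seattle", 2006),
            ("New York", 2007), ("London", 2008)}

for name, cells in [("regular", regular), ("l_shape", l_shape),
                    ("with_hole", with_hole), ("diagonal", diagonal)]:
    print(name, is_cross_product(cells))
```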

Page 47

Arbitrary shapes

• WHERE/Subselect/Aggregate
• Unnatural hierarchies
• Parent-Child (visual totals)
• “Non Leaves” subcube
• Conditional logic (IIF, IF, CASE, CoalesceEmpty etc)
• NonEmpty, Exists

Page 48

WHERE/Subselect

• Severity = '1' OR Priority = '1'
• multiselect
– {USA, London}

Page 49

Mixed grain slicer

[Hierarchy diagram: All → USA (Seattle, New York), UK (London, Bristol)]

Page 50

Mixed grain slicer

[Hierarchy diagram: All → USA (Seattle, New York), UK (London, Bristol)]

[Grid: columns All Cities, Seattle, New York, London, Bristol; rows All Countries, USA, UK]

Page 51

Parent-child

Page 52

Leaves vs. Non Leaves

[Grid: columns (All Country, All City), (Countries, All City), (Countries, Cities); rows All Products, Products; label: Leaves]

Page 53

Problems with arbitrary shapes

• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more (for every topic we discuss, just ask “What will happen with arbitrary shapes”, and I am in trouble)

Page 54

SCOPE

    SCOPE ( [Date].[Month of Year].[All Periods],
            [Date].[Month Name].[All],
            Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                    { ( [DateTool].[Aggregation].DefaultMember,
                        [DateTool].[Comparison].DefaultMember ) } ) );
        ...;
    END SCOPE;

Page 55

Subcube decomposition

    SCOPE ( [Date].[Month of Year].[All Periods],
            [Date].[Month Name].[All],
            Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                    { ( [DateTool].[Aggregation].DefaultMember,
                        [DateTool].[Comparison].DefaultMember ) } ) );
        ...;
    END SCOPE;

[Diagram labels: Scope 1, Scope 2, Scope 3]

Page 56

Subcube decomposition

    SCOPE ( [Date].[Month of Year].[All Periods],
            [Date].[Month Name].[All],
            Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
            Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember ) );
        ...;
    END SCOPE;

    SCOPE ( [Date].[Month of Year].[All Periods],
            [Date].[Month Name].[All],
            [DateTool].[Aggregation].DefaultMember,
            Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember ) );
        ...;
    END SCOPE;

    SCOPE ( [Date].[Month of Year].[All Periods],
            [Date].[Month Name].[All],
            Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
            [DateTool].[Comparison].DefaultMember );
        ...;
    END SCOPE;
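The decomposition can be checked with plain sets: the cross join minus the (default, default) tuple should equal the union of the three rectangular scopes, with no cell covered twice. A small sketch; the member names below are hypothetical placeholders for the DateTool members, which the deck does not list.

```python
from itertools import product

# Hypothetical members; the first of each list plays the DefaultMember.
aggregation = ["DefaultAgg", "YTD", "MTD"]
comparison = ["DefaultCmp", "PrevYear", "Diff"]
agg_default, cmp_default = aggregation[0], comparison[0]

# Original arbitrary shape: Except(Aggregation * Comparison, {(default, default)})
original = set(product(aggregation, comparison)) - {(agg_default, cmp_default)}

agg_rest = [a for a in aggregation if a != agg_default]
cmp_rest = [c for c in comparison if c != cmp_default]

# The three regular (cross join) subcubes from the slide.
scopes = [
    set(product(agg_rest, cmp_rest)),        # non-default x non-default
    set(product([agg_default], cmp_rest)),   # default     x non-default
    set(product(agg_rest, [cmp_default])),   # non-default x default
]

covers = set().union(*scopes) == original
disjoint = sum(len(s) for s in scopes) == len(original)
print(covers, disjoint)
```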

Page 57

MDX Optimization - Tips

• Partial expressions are not cached

    this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);

  Instead, define the partial expression as a hidden calculated member, so that its result can be cached and reused:

    create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;

    this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);

Page 58

Demo

Page 59

SSAS Denali

• Coming in the first half of 2012
• SSAS Tabular Mode
– Cheaper
– Not best of breed
– Uses DAX or MDX
• Have you started working with it?

Page 60

Mobile BI

• Access to BI data and reports from anywhere, at any time and from any device
• Today almost every manager has a smartphone
• Soon every manager will demand BI on their mobile device

"Mobile BI suits those who need to receive tactical information and decide quickly, that is, operational BI. There is no need to push large amounts of data to the mobile device for no reason."

Gartner

Page 61

Social BI

• Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.

• Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more

• Instantly Available via Cloud

Page 62

Social BI

• Deep Personalization

• Enterprise Data Integration

Page 63

Survey

• SQL / SSAS Denali
• Mobile BI
• Social BI

Page 64

Thank you for listening