BIUG 2011: Dimensional Modeling and MDX Best Practices

BIUG, March 2011

Itay [email protected]


Agenda

• Dimension Design
• SSAS Best Practices
• MDX

• Inspired by Vincent Rainardi (http://sqlbits.com/) and Mosha Pasumansky

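About the Company

• Founded in 2007
• The company employs 22 software professionals, each an expert in their field
• Company offices in Herzliya Pituach
• Partners of … in BI and DB
• Partners of … in the field of …
• The leading services company in the field of … in Israel
• Leading the BI and DB field in …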

Areas of Activity

BI
• Delivering BI projects with Microsoft and MicroStrategy technologies
• Design, implementation and maintenance of DWH systems
• Training and advanced courses

DB
• Experts in SQL Server and Sybase
• 24x7 maintenance of critical, complex systems at large companies
• Performance and tuning for large DB systems
• Expertise in supporting OEM customers

.NET
• Project development tailored to the customer's needs
• Solutions based on advanced Microsoft technologies (C#, Web, Silverlight)
• Design and implementation of integration projects

Selected Customers

http://www.profitect.com/

White Papers

• Analysis Services 2008 R2 Performance Guide

• Analysis Services 2008 Operation Guide

• Performance Improvements for MDX in SQL Server 2008 Analysis Services

• OLAP Design Best Practices

http://msdn.microsoft.com/en-us/library/bb934106(v=SQL.105).aspx



http://technet.microsoft.com/en-us/library/cc966399.aspx

1 or 2 dimensions

a) One dimension: FactTable → DimAccount, with the customer attributes stored in DimAccount
• Simplicity: 1 dim
• Hierarchy from the customer attributes to the account attribute
• Use when we don’t have fact tables requiring customer grain.

b) Two dimensions: FactTable → DimAccount and FactTable → DimCustomer
• We can get the customer attributes without knowing the account key.
• Disadvantage: we can’t go from account to customer without going through the fact table, which hurts performance.

1 or 2 dimensions

c) Snowflake: FactTable → DimAccount → DimCustomer

• Use when DimCustomer is needed by another fact table.
• Modular: two separate dim tables, but we can easily combine them to create a bigger dimension.
• Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):

select c.attribute, sum(f.measure1)
from fact1 f
inner join dim_account a on f.account_key = a.account_key
inner join dim_customer c on a.customer_key = c.customer_key
group by c.attribute
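The snowflake query above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch. The table and column names follow the query (fact1, dim_account, dim_customer, measure1, attribute). SQLite stands in for SQL Server, and the sample rows are made up.

```python
import sqlite3

def customer_breakdown(conn):
    """Sum fact1.measure1 by a customer attribute, snowflaking through dim_account."""
    sql = """
        SELECT c.attribute, SUM(f.measure1)
        FROM fact1 f
        INNER JOIN dim_account a ON f.account_key = a.account_key
        INNER JOIN dim_customer c ON a.customer_key = c.customer_key
        GROUP BY c.attribute
        ORDER BY c.attribute
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
    CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY,
                              customer_key INTEGER REFERENCES dim_customer(customer_key));
    CREATE TABLE fact1 (account_key INTEGER REFERENCES dim_account(account_key),
                        measure1 REAL);
    -- Hypothetical sample data: two customers, three accounts.
    INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
    INSERT INTO dim_account VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO fact1 VALUES (10, 100), (11, 50), (20, 300), (20, 25);
""")
print(customer_breakdown(conn))  # [('Corporate', 325.0), ('Retail', 150.0)]
```

The fact table never stores customer_key, so every customer-level question pays for the extra join through dim_account.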

When to Snowflake

1. When the sub dim is used by several dims. Example: City, Country and Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.

They are replaced by a LocationKey/GeoKey pointing to DimLocation / DimGeography.

Advantage: a consistent hierarchy, i.e. the same relationship between City, Country and Region in every dim.

Weakness: we lose flexibility. City-to-Country is more or less fixed, but the grouping of countries (e.g. into regions) might differ between dimensions.
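A minimal sketch of the shared sub-dimension, again using sqlite3 as a stand-in. DimGeography, DimBroker and DimOffice come from the slide. The columns BrokerName and OfficeName and the sample rows are hypothetical. Both dims resolve City → Country → Region through the same rows, which is the consistency advantage. It is also the weakness: there is only one Region grouping for everyone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One geography sub-dimension, shared by several dimensions.
    CREATE TABLE DimGeography (
        GeoKey INTEGER PRIMARY KEY, City TEXT, Country TEXT, Region TEXT);
    -- The dims keep only a GeoKey instead of their own City/Country/Region columns.
    CREATE TABLE DimBroker (BrokerKey INTEGER PRIMARY KEY, BrokerName TEXT,
                            GeoKey INTEGER REFERENCES DimGeography(GeoKey));
    CREATE TABLE DimOffice (OfficeKey INTEGER PRIMARY KEY, OfficeName TEXT,
                            GeoKey INTEGER REFERENCES DimGeography(GeoKey));
    INSERT INTO DimGeography VALUES
        (1, 'London', 'UK', 'Europe'),
        (2, 'Paris', 'France', 'Europe'),
        (3, 'New York', 'USA', 'Americas');
    INSERT INTO DimBroker VALUES (1, 'Broker A', 1), (2, 'Broker B', 3);
    INSERT INTO DimOffice VALUES (1, 'Office X', 2), (2, 'Office Y', 1);
""")

def names_in_region(table, name_col, region):
    """Names in `table` whose GeoKey rolls up to `region` through DimGeography."""
    # table and name_col are fixed identifiers from this sketch, not user input.
    sql = (f"SELECT d.{name_col} FROM {table} AS d "
           "JOIN DimGeography AS g ON d.GeoKey = g.GeoKey "
           f"WHERE g.Region = ? ORDER BY d.{name_col}")
    return [row[0] for row in conn.execute(sql, (region,))]

print(names_in_region("DimBroker", "BrokerName", "Europe"))  # ['Broker A']
print(names_in_region("DimOffice", "OfficeName", "Europe"))  # ['Office X', 'Office Y']
```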

When to Snowflake

2. When the sub dim is used by both the main dim and the fact table(s)

• DimCustomer is used in DimAccount, and is also used in the fact table.

• DimManufacturer is used in DimProduct, and is also used in the fact table.

• DimProductGroup is used in DimProduct, and is also used in some fact tables.

The alternative is maintaining two full dimensions (classic star).

When to Snowflake

3. To make “base dim” and “detail dim”

Examples: insurance classes, account types (banking), product lines, diagnosis and treatment (health care).

Policies for the marine, aviation and property classes have different attributes. Pull the common attributes into one dim, DimBasePolicy, and put the class-specific attributes into DimMarine, DimProperty and DimAviation.

Ref: Kimball, The Data Warehouse Toolkit, 2nd edition, page 213

A dimension with only 1 attribute

Reasons for putting a single attribute in its own dim:
– Keep the fact table slim (a 4-byte int, not a 100-byte varchar)
– When the value changes, we don’t have to update the BIG fact table (ETL performance)
– The grain is much lower than the fact table's, so the dim is small
– Yes, it’s only 1 attribute today, but in the future there could be another attribute.

Should we put the attribute in the fact table instead (like a DD, a Degenerate Dimension)? Probably, if its grain equals the fact table's grain and it is short or numeric.

Fact Table Primary Key

Should we have a PK? Yes, if we need to be able to identify each fact row:

1. We need to refer to a fact row from another fact row, e.g. a chain of events.

2. There are many identical fact rows and we need to update/delete only one of them.

3. To link the fact table to another fact table.

Some experts totally disagree

Diagram: a fact row's PK referenced by an FK in another fact row (no RI, i.e. not enforced), used for related transactions (header-detail), uniqueness, and previous/next transaction links.

Fact Table Primary Key

Single or multi column?
• Single column: a generated identity
• Multi column: the dimension keys

A single-column PK is better than a multi-column PK because:

1) The combination of dimension keys may not be unique. A single-column PK guarantees uniqueness, because it is an identity column.

2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join on the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
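Point 2) can be sketched with sqlite3, where `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of a SQL Server IDENTITY column. The PrevFactKey column is a hypothetical way of recording the "previous fact row" in a chain of events. The self join then compares one integer column, not a set of dimension keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactTransaction (
        FactKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- single-column identity PK
        AccountKey INTEGER,
        DateKey INTEGER,
        Amount REAL,
        PrevFactKey INTEGER  -- hypothetical: FactKey of the previous row in the chain
    );
    INSERT INTO FactTransaction (AccountKey, DateKey, Amount, PrevFactKey) VALUES
        (1, 20110301, 100, NULL),
        (1, 20110303, 40, 1),
        (1, 20110307, 10, 2);
""")

# Self join on a single integer column to pair each row with its predecessor.
rows = conn.execute("""
    SELECT cur.FactKey, cur.Amount, prev.Amount AS PrevAmount
    FROM FactTransaction cur
    LEFT JOIN FactTransaction prev ON cur.PrevFactKey = prev.FactKey
    ORDER BY cur.FactKey
""").fetchall()
print(rows)  # [(1, 100.0, None), (2, 40.0, 100.0), (3, 10.0, 40.0)]
```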

Fact Table Primary Key

• Advantage: prevents duplicate rows; query performance
• Disadvantage: loading performance
• Indexing the PK: clustered or not?
– Cluster the PK if the PK is an identity column.
– Don’t cluster the PK if it is a composite, or when you need the clustered index for query performance (with partitioning).

Example of not having a PK

If duplicate fact rows are allowed, e.g. a retail DW with Store Key, Date Key, Product Key and Customer Key: the same customer buys the same milk in the same shop on the same day, twice.
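A minimal sqlite3 sketch of the retail example, assuming a fact table keyed only by its four dimension keys. Declaring those keys as a composite PK makes the second, perfectly legitimate purchase fail. Without a PK, both rows are kept. Table names and key values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: PK made of the dimension keys.
    CREATE TABLE FactSalesCompositePK (
        StoreKey INTEGER, DateKey INTEGER, ProductKey INTEGER, CustomerKey INTEGER,
        Quantity INTEGER,
        PRIMARY KEY (StoreKey, DateKey, ProductKey, CustomerKey)
    );
    -- Option 2: no PK at all, so identical rows are allowed.
    CREATE TABLE FactSalesNoPK (
        StoreKey INTEGER, DateKey INTEGER, ProductKey INTEGER, CustomerKey INTEGER,
        Quantity INTEGER
    );
""")

# Same customer, same milk, same shop, same day: bought twice.
purchase = (1, 20110301, 42, 7, 1)

def load_twice(table):
    """Try to load the purchase twice; return how many rows made it in."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", purchase)
        except sqlite3.IntegrityError:
            pass  # the composite PK rejects the second, legitimate purchase
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(load_twice("FactSalesCompositePK"))  # 1: one purchase is lost
print(load_twice("FactSalesNoPK"))         # 2: both purchases kept
```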

Aggregate Fact Tables

What are they?
• High-level aggregations of the base fact tables.
• A “select group by” query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place.
• So we run this query in advance as part of the DW load and store the result as an Aggregate Fact Table.
• The report then takes only 1 second to run.

Diagram: Base Fact Tables → Report takes 30 mins; Base Fact Tables → Aggregate Fact Table → Report takes 1 sec.
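The two-step pattern (aggregate during the DW load, report from the aggregate) can be sketched with sqlite3. The sketch assumes yyyymmdd integer date keys, so `DateKey / 100` gives a month key. FactSales, AggFactSalesByProductMonth and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactSales (DateKey INTEGER, ProductKey INTEGER, StoreKey INTEGER, Amount REAL);
    INSERT INTO FactSales VALUES
        (20110301, 1, 1, 10), (20110301, 1, 2, 15),
        (20110302, 1, 1, 20), (20110301, 2, 1, 5);
""")

# DW load step: run the expensive GROUP BY once and persist the result.
conn.executescript("""
    DROP TABLE IF EXISTS AggFactSalesByProductMonth;
    CREATE TABLE AggFactSalesByProductMonth AS
    SELECT DateKey / 100 AS MonthKey, ProductKey,
           SUM(Amount) AS Amount, COUNT(*) AS SalesCount
    FROM FactSales
    GROUP BY DateKey / 100, ProductKey;
""")

# Report step: read the small aggregate instead of scanning the base fact table.
report = conn.execute("""
    SELECT MonthKey, ProductKey, Amount FROM AggFactSalesByProductMonth
    ORDER BY MonthKey, ProductKey
""").fetchall()
print(report)  # [(201103, 1, 45.0), (201103, 2, 5.0)]
```

The aggregate is only as fresh as the last load, so it belongs in the same batch that loads the base fact table.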

Rapidly Changing Dimension

• Why is it a problem?
– Large SCD2 dim
– Attributes change every day
– Slow queries when joined with large fact tables
• What to do
– Put the rapidly changing attributes into a separate dim, linked directly to the fact table.
– Store only the latest values, as type 1 attributes (or dual: type 1 and type 2).
– Store them in the fact table (for small attributes, e.g. an indicator).

Diagram: dimension columns marked as Type 1 or Type 2.
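The row arithmetic behind "put into a separate dim" can be sketched in Python. The sketch assumes the worst case on the slide: the volatile attributes change every day for every customer. In the split design, the separate dim (Kimball calls it a mini-dimension) holds one row per distinct combination of volatile values. The fact row records which combination applied. The band values and counts are hypothetical.

```python
from itertools import product

def scd2_row_count(n_customers, n_days):
    # Worst case for a type 2 customer dim: every customer's volatile
    # attribute changes every day, so each customer gets a new row per day.
    return n_customers * n_days

def split_row_count(n_customers, volatile_attribute_values):
    # Stable attributes stay in the customer dim: one row per customer.
    customer_rows = n_customers
    # Volatile attributes move to a separate dim with one row per distinct
    # combination of values; the fact table links to it directly.
    profile_rows = len(list(product(*volatile_attribute_values)))
    return customer_rows + profile_rows

bands = [["Low", "Medium", "High"], ["Active", "Dormant"]]  # hypothetical attributes
print(scd2_row_count(1_000_000, 365))    # 365000000
print(split_row_count(1_000_000, bands))  # 1000006
```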

Very Large Dimension

Why is it a problem?
– SSAS: 4 GB string store limit for a dimension
– SSAS: a dim is processed with a “select distinct” on each attribute, so processing takes a long time
– High-cardinality attributes are difficult to browse
– Joins with fact tables perform poorly

What to do
– Split it into 2 dims with the same grain. Always cut vertically.
– Remove SCD2, or apply it only to certain columns.
– Most common: separate out the attributes with high cardinality or high change frequency.

Diagram: Very Large Dimension (VLD).

Real Time Fact Table

• Reporting on the transaction system in real time
• A view that unions the real-time table with the normal fact table, or use partitions
• Freeze the dims for key lookup; use -3 for unknown keys
• Key corrections the next day

Diagram: real-time partition (intraday, today); main partition (up to last night); dims as of yesterday.

Unknown keys (dim key values):
-1 = null in source
-2 = not in dim table
-3 = not in dim table because the dim was frozen; to be resolved in the next batch
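A minimal sketch of the key lookup and next-day correction implied by the unknown-key codes. It assumes each intraday fact row keeps its natural key so it can be re-resolved later. The function names, the dict-based dim and the customer codes are hypothetical.

```python
from typing import Dict, List, Optional

NULL_IN_SOURCE = -1       # the source value is null
NOT_IN_DIM = -2           # the value does not exist in the dim table
FROZEN_DIM_PENDING = -3   # not in the dim because the dim was frozen; resolve next batch

def lookup_dim_key(natural_key: Optional[str], dim_keys: Dict[str, int], dim_frozen: bool) -> int:
    """Map a source natural key to a dim surrogate key, or to an unknown-key code."""
    if natural_key is None:
        return NULL_IN_SOURCE
    if natural_key in dim_keys:
        return dim_keys[natural_key]
    # Intraday loads use dims frozen as of yesterday, so a brand-new member
    # cannot be added yet: flag it with -3 instead of -2.
    return FROZEN_DIM_PENDING if dim_frozen else NOT_IN_DIM

def correct_keys(fact_rows: List[dict], dim_keys: Dict[str, int]) -> List[dict]:
    """Next-day key correction: re-resolve rows loaded with -3 against the refreshed dim."""
    corrected = []
    for row in fact_rows:
        if row["dim_key"] == FROZEN_DIM_PENDING:
            row = {**row, "dim_key": lookup_dim_key(row["natural_key"], dim_keys, dim_frozen=False)}
        corrected.append(row)
    return corrected

# Hypothetical intraday load against yesterday's frozen customer dim.
yesterday_dim = {"CUST-1": 101}
intraday_rows = [
    {"natural_key": key, "dim_key": lookup_dim_key(key, yesterday_dim, dim_frozen=True)}
    for key in ["CUST-1", "CUST-9", None]
]
print([r["dim_key"] for r in intraday_rows])  # [101, -3, -1]

# The nightly batch adds CUST-9 to the dim, then fixes yesterday's -3 rows.
tonight_dim = {**yesterday_dim, "CUST-9": 109}
print([r["dim_key"] for r in correct_keys(intraday_rows, tonight_dim)])  # [101, 109, -1]
```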

Dealing with Currency Rates

What for / background / requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rate changes (using fixed rates, e.g. 2010 EOY rates)
– “Had the transactions happened today” analysis
– Historical analysis of currency rates

Diagram: Transaction Currency (original; 100 countries, 40 currencies) → rates (many transaction dates) → DW Currency (1 currency, e.g. GBP) → reporting rates (1 reporting date) → Reporting Currency (3-4 currencies: GBP, USD, EUR)
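The two hops in the diagram can be sketched as two functions. The first converts from the original currency to the DW currency at the transaction-date rate (an ETL step). The second converts from the DW currency to a reporting currency at a single reporting-date rate (a query-time step). GBP as the DW currency follows the slide. The rate tables, their values and the function names are hypothetical. Decimal avoids float rounding in money arithmetic.

```python
from datetime import date
from decimal import Decimal

DW_CURRENCY = "GBP"

# Hypothetical transaction-date rates: GBP per 1 unit of the original currency.
TXN_RATES = {
    ("EUR", date(2011, 3, 1)): Decimal("0.85"),
    ("USD", date(2011, 3, 1)): Decimal("0.62"),
}

# Hypothetical reporting rates: units of reporting currency per 1 GBP.
REPORTING_RATES = {
    ("USD", date(2011, 3, 31)): Decimal("1.6"),
    ("EUR", date(2011, 3, 31)): Decimal("1.15"),
    ("USD", date(2010, 12, 31)): Decimal("1.55"),  # fixed 2010 EOY rate
}

def to_dw_currency(amount: Decimal, currency: str, txn_date: date) -> Decimal:
    """Hop 1: original currency -> DW currency, at the transaction-date rate."""
    if currency == DW_CURRENCY:
        return amount
    return amount * TXN_RATES[(currency, txn_date)]

def to_reporting_currency(dw_amount: Decimal, reporting_currency: str, reporting_date: date) -> Decimal:
    """Hop 2: DW currency -> reporting currency, at one reporting-date rate."""
    if reporting_currency == DW_CURRENCY:
        return dw_amount
    return dw_amount * REPORTING_RATES[(reporting_currency, reporting_date)]

gbp = to_dw_currency(Decimal("100"), "EUR", date(2011, 3, 1))      # 85 GBP
usd_today = to_reporting_currency(gbp, "USD", date(2011, 3, 31))   # 136 USD
usd_fixed = to_reporting_currency(gbp, "USD", date(2010, 12, 31))  # 131.75 USD
```

Picking a fixed reporting date (here 2010 EOY) for every period gives the "without the impact of currency rates" view.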

Dealing with Currency Rates

• A good example can be found here.

http://consultingblogs.emc.com/christianwade/archive/2006/08/24/Currency-Conversion-in-Analysis-Services-2005.aspx

Dealing with Status

What / background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (number of days between stages)
– How many items are at each stage

Diagram: Status 1 through Status 6, with date1 to date4 marking when each stage is reached.

Dealing with Status

Approaches
– Accumulating snapshot fact, 1 row per application
– SCD2 on DimApp
– App status fact table

Accumulating snapshot fact:
AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
1       1/3/11    1        3/3/11    1        7/3/11    1
2       6/3/11    1        7/3/11    1                  0

App status fact table:
AppKey  StsKey  StsDateKey
1       1       61
1       2       63
1       3       67
2       1       66
2       2       67

SCD2 on DimApp:
AppKey  AppID  StsKey  StsDate  Current
1       1      1       1/3/11   N
2       1      2       3/3/11   N
3       1      3       7/3/11   Y
4       2      1       6/3/11   N
5       2      2       7/3/11   Y
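Both questions from the background list (days between stages, and how many items are on each stage) can be answered from the app status fact table. This minimal Python sketch uses the example rows and assumes the dates are d/m/yy (1/3/11 = 1 March 2011). The function names are hypothetical.

```python
from collections import Counter
from datetime import date

# App status fact table rows: (AppKey, StsKey, status date), from the example above.
status_fact = [
    (1, 1, date(2011, 3, 1)), (1, 2, date(2011, 3, 3)), (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)), (2, 2, date(2011, 3, 7)),
]

def in_date_order(rows):
    # Sort by application, then status date, then status key.
    return sorted(rows, key=lambda r: (r[0], r[2], r[1]))

def days_between_stages(rows):
    """Bottleneck analysis: days taken to move from one status to the next."""
    steps_by_app = {}
    for app, status, when in in_date_order(rows):
        steps_by_app.setdefault(app, []).append((status, when))
    result = {}
    for app, steps in steps_by_app.items():
        for (from_sts, from_date), (to_sts, to_date) in zip(steps, steps[1:]):
            result[(app, from_sts, to_sts)] = (to_date - from_date).days
    return result

def count_on_each_stage(rows):
    """How many applications currently sit on each status (their latest row)."""
    latest = {}
    for app, status, _ in in_date_order(rows):
        latest[app] = status  # later rows overwrite earlier ones
    return Counter(latest.values())

print(days_between_stages(status_fact))  # {(1, 1, 2): 2, (1, 2, 3): 4, (2, 1, 2): 1}
print(count_on_each_stage(status_fact))  # one application on status 3, one on status 2
```

The accumulating snapshot stores the same answers pre-computed as columns. The status fact table derives them at query time.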

Referenced Dimensions

• Enables using one “master” member (e.g. London) across several dimensions
• Not a snowflake dimension
– For example:
• Dim Customers: UK, London, Roman Avramovich
• Dim Stores: UK, London, Friendly Bikes Store
– What is the total revenue from Internet customers and stores in London?
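A relational sketch of the London question, using sqlite3. In SSAS, a referenced dimension relationship lets one geography dimension reach each measure group through an intermediate dimension (customer or store). The query below only mimics that with joins and is not how SSAS evaluates it. All table names other than the slide's examples, and the revenue figures, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Master geography dimension, referenced by both customers and stores.
    CREATE TABLE DimGeography (GeoKey INTEGER PRIMARY KEY, Country TEXT, City TEXT);
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, GeoKey INTEGER);
    CREATE TABLE DimStore (StoreKey INTEGER PRIMARY KEY, Name TEXT, GeoKey INTEGER);
    CREATE TABLE FactInternetSales (CustomerKey INTEGER, Revenue REAL);
    CREATE TABLE FactStoreSales (StoreKey INTEGER, Revenue REAL);

    INSERT INTO DimGeography VALUES (1, 'UK', 'London'), (2, 'UK', 'Bristol');
    INSERT INTO DimCustomer VALUES (1, 'Roman Avramovich', 1), (2, 'Another customer', 2);
    INSERT INTO DimStore VALUES (1, 'Friendly Bikes Store', 1);
    INSERT INTO FactInternetSales VALUES (1, 500), (2, 70);
    INSERT INTO FactStoreSales VALUES (1, 250);
""")

# Each fact reaches DimGeography through its own intermediate dimension,
# so "London" is one master member for both facts.
total = conn.execute("""
    SELECT SUM(Revenue) FROM (
        SELECT f.Revenue, c.GeoKey FROM FactInternetSales f
        JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey
        UNION ALL
        SELECT f.Revenue, s.GeoKey FROM FactStoreSales f
        JOIN DimStore s ON f.StoreKey = s.StoreKey
    ) AS sales
    JOIN DimGeography g ON sales.GeoKey = g.GeoKey
    WHERE g.City = 'London'
""").fetchone()[0]
print(total)  # 750.0
```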

MDX optimization Methodology

• Rewrite the MDX code
• Add aggregations
• Add pre-calculated measure groups (ETL)
• Solve the problem using the relational engine
• Use .NET stored procedures
– Only rarely can the problem be solved with better hardware.

• Column-based databases

• Optimizing MDX
– Baselining query speeds:
• Clearing the Analysis Services caches
• Clearing the operating system caches using fsutil.exe or the SSAS stored procedures (CodePlex)
• Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services

• Configuring the Analysis Services Query Log

http://sqlcat.com/whitepapers/archive/2007/12/16/identifying-and-resolving-mdx-query-performance-bottlenecks-in-sql-server-2005-analysis-services.aspx



http://msdn.microsoft.com/en-us/library/cc917676.aspx

• Cell-by-Cell Mode vs. Subspace Mode

Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.

Using Profiler

• So far so good

Doesn’t use the cache

Subcube

• Granularity
• Slice

Granularity

• Single grain
– List of GROUP BY attributes in SQL SELECT

• Mixed grain
– Both Attribute.[All] and Attribute.MEMBERS

Granularity lattice: (All Country, All City), (Countries, All City), (Countries, Cities), each combined with (All Products) or (Products).

Slice

• Single member
– SQL: WHERE City = 'Redmond'
– MDX: [City].[Redmond]

• Multiple members
– SQL: WHERE City IN ('Redmond', 'Seattle')
– MDX: { [City].[Redmond], [City].[Seattle] }

Slice at granularity

SQL:
SELECT Sum(Sales), City FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City

MDX:
SELECT Measures.Sales ON 0, NON EMPTY {Redmond, Seattle} ON 1
FROM Sales_Cube

Slice below granularity

SQL:
SELECT Sum(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')

MDX:
SELECT Measures.Sales ON 0
FROM Sales_Cube
WHERE {Redmond, Seattle}
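The two SQL queries above can be run side by side with sqlite3 to make the difference concrete. At granularity, City is both filtered and grouped, so there is one row per city. Below granularity, City is only filtered, so the rows collapse into one total. Sales_Table follows the slide. The Year column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales_Table (City TEXT, Year INTEGER, Sales REAL);
    INSERT INTO Sales_Table VALUES
        ('Redmond', 2007, 10), ('Redmond', 2008, 20),
        ('Seattle', 2007, 30), ('London', 2007, 99);
""")
cities = ("Redmond", "Seattle")

# Slice at granularity: the sliced attribute is also in the GROUP BY.
at_grain = conn.execute(
    "SELECT SUM(Sales), City FROM Sales_Table WHERE City IN (?, ?) "
    "GROUP BY City ORDER BY City", cities).fetchall()

# Slice below granularity: the sliced attribute is filtered but not grouped.
below_grain = conn.execute(
    "SELECT SUM(Sales) FROM Sales_Table WHERE City IN (?, ?)", cities).fetchall()

print(at_grain)     # [(30.0, 'Redmond'), (30.0, 'Seattle')]
print(below_grain)  # [(60.0,)]
```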

Examples

The examples use a grid of Cities (All Cities, Redmond, Seattle, New York, London) × Years (All Years, 2005, 2006, 2007, 2008); each one highlights a subcube:

• (Seattle, Year.Year.MEMBERS)
• (Seattle, Year.MEMBERS)
• ({Redmond, Seattle, London}, Year.MEMBERS)
• ({Redmond, Seattle}, {2005, 2006, 2007})

Arbitrary shaped subcubes

• What is it?
• How can it happen?
• Why is it so bad?
• How to avoid them?

Examples of arbitrary shapes, on a grid of Cities (All Cities, Redmond, Seattle, New York, London; the second example uses Redmond, Seattle, SF, Denver) × Years (All Years, 2005 to 2008):

• Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
• CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)
• {(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
• Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
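One way to make "arbitrary shape" concrete is to treat a set of cells as regular only if it equals the crossjoin of its own per-attribute projections. This minimal Python sketch applies that test to the examples above. For simplicity it reuses the four cities for the "hole" example and skips the All-member example. The function name is hypothetical.

```python
from itertools import product

def is_regular_subcube(cells):
    """True if the set of (city, year) cells is a plain crossjoin of its projections."""
    cells = set(cells)
    cities = {c for c, _ in cells}
    years = {y for _, y in cells}
    return cells == set(product(cities, years))

CITIES = ["Redmond", "Seattle", "New York", "London"]
YEARS = [2005, 2006, 2007, 2008]

# ({Redmond, Seattle}, {2005, 2006, 2007}): a rectangle, so a regular subcube.
regular = set(product(["Redmond", "Seattle"], [2005, 2006, 2007]))
# Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005)): an L shape.
l_shape = set(product(["Redmond"], YEARS)) | set(product(CITIES, [2005]))
# CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007): a rectangle with a hole.
with_hole = set(product(CITIES, YEARS)) - {("Seattle", 2007)}
# {(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}: a diagonal.
diagonal = set(zip(CITIES, YEARS))

for name, cells in [("regular", regular), ("l_shape", l_shape),
                    ("with_hole", with_hole), ("diagonal", diagonal)]:
    print(name, is_regular_subcube(cells))
```

Only the first example passes. Each of the others must be broken into several regular pieces, or handled as a single irregular set.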

Arbitrary shapes

• WHERE / Subselect / Aggregate
• Unnatural hierarchies
• Parent-child (visual totals)
• “Non leaves” subcube
• Conditional logic (IIF, IF, CASE, CoalesceEmpty etc.)
• NonEmpty, Exists

WHERE/Subselect

• Severity = '1' OR Priority = '1'
• Multiselect
– {USA, London}

Mixed grain slicer

Diagram: the hierarchy All → USA (Seattle, New York) and UK (London, Bristol), also shown as a grid of Cities (All Cities, Seattle, New York, London, Bristol) × Countries (All Countries, USA, UK).

Parent-child

Leaves vs. Non Leaves

Diagram: the granularity lattice (All Country, All City), (Countries, All City), (Countries, Cities) × (All Products), (Products), with the leaves marked.

Problems with arbitrary shapes

• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more (for every topic we discuss, just ask “What will happen with arbitrary shapes?”, and I am in trouble)

SCOPE

SCOPE (
    [Date].[Month of Year].[All Periods],
    [Date].[Month Name].[All],
    Except(
        [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
        { ( [DateTool].[Aggregation].DefaultMember,
            [DateTool].[Comparison].DefaultMember ) }
    )
);
    ...;
END SCOPE;

Subcube decomposition

The Except() set in the SCOPE above is not a crossjoin of per-attribute sets, so it is decomposed into three subcubes (Scope 1, Scope 2 and Scope 3), each of which is a plain crossjoin:

SCOPE (
    [Date].[Month of Year].[All Periods],
    [Date].[Month Name].[All],
    Except( [DateTool].[Aggregation].Members, { [DateTool].[Aggregation].DefaultMember } ),
    Except( [DateTool].[Comparison].Members, { [DateTool].[Comparison].DefaultMember } )
);
    ...;
END SCOPE;

SCOPE (
    [Date].[Month of Year].[All Periods],
    [Date].[Month Name].[All],
    [DateTool].[Aggregation].DefaultMember,
    Except( [DateTool].[Comparison].Members, { [DateTool].[Comparison].DefaultMember } )
);
    ...;
END SCOPE;

SCOPE (
    [Date].[Month of Year].[All Periods],
    [Date].[Month Name].[All],
    Except( [DateTool].[Aggregation].Members, { [DateTool].[Aggregation].DefaultMember } ),
    [DateTool].[Comparison].DefaultMember
);
    ...;
END SCOPE;
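The set identity behind the three SCOPE statements above can be checked in Python. Removing one cell (a0, b0) from A × B leaves the union of three disjoint crossjoins: (A − a0) × (B − b0), {a0} × (B − b0), and (A − a0) × {b0}. The DateTool member names below are hypothetical placeholders.

```python
from itertools import product

def decompose_except(a_members, b_members, a_default, b_default):
    """Split Except(A * B, {(a_default, b_default)}) into three crossjoins."""
    a_rest = [m for m in a_members if m != a_default]
    b_rest = [m for m in b_members if m != b_default]
    return [
        (a_rest, b_rest),       # neither attribute at its default member
        ([a_default], b_rest),  # Aggregation at its default member
        (a_rest, [b_default]),  # Comparison at its default member
    ]

# Hypothetical DateTool members; a real cube's members will differ.
aggregation = ["Current", "YTD", "QTD"]
comparison = ["Current", "PrevYear", "Growth"]

pieces = decompose_except(aggregation, comparison, "Current", "Current")
union = set()
for a_set, b_set in pieces:
    union |= set(product(a_set, b_set))

expected = set(product(aggregation, comparison)) - {("Current", "Current")}
print(union == expected)  # True
```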

MDX Optimization - Tips

• Partial expressions are not cached. Instead of:

This = iif(<expensive expression> >= 0, 1/<expensive expression>, null);

define the partial expression as a hidden calculated member and reference that member instead:

create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;

this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);
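A Python analogy for the tip, not a model of the SSAS formula engine. functools.lru_cache stands in for the engine caching a named calculated member's value, while the inline form recomputes the partial expression each time it appears. The stand-in formula and function names are hypothetical. The formula is chosen so it never returns 0, which keeps the division safe.

```python
from functools import lru_cache

calls = 0

def expensive_expression(cell: int) -> int:
    """Stand-in for <expensive expression>; counts how often it is evaluated."""
    global calls
    calls += 1
    return 2 * cell + 1  # hypothetical formula; never 0 for integer cells

def inline_calc(cell: int):
    # Like the first form: the partial expression appears, and is evaluated, twice.
    return 1 / expensive_expression(cell) if expensive_expression(cell) >= 0 else None

@lru_cache(maxsize=None)
def my_partial_expression(cell: int) -> int:
    # Like the hidden calculated member: one named, cached value per cell.
    return expensive_expression(cell)

def cached_calc(cell: int):
    value = my_partial_expression(cell)
    return 1 / value if value >= 0 else None

calls = 0
for _ in range(3):
    inline_calc(4)
print(calls)  # 6

calls = 0
for _ in range(3):
    cached_calc(4)
print(calls)  # 1
```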

SSAS Denali

• Coming in the first half of 2012
• SSAS Tabular Mode
– Cheaper
– Not best of breed
– Uses DAX or MDX

• Have you started working with it?

Mobile BI

• Access to BI data and reports from anywhere, at any time, on any device

• Today almost every manager has a smartphone
• Soon every manager will demand BI on their mobile device

“Mobile BI suits those who need to receive tactical information and decide quickly, i.e. operational BI. There is no need to push large amounts of data to the mobile device just for the sake of it.”

Gartner

Social BI

• Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.

• Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more

• Instantly Available via Cloud

http://www.microstrategy.com/social-intelligence/enterprise/wisdom/

Social BI

• Deep Personalization

• Enterprise Data Integration

http://www.microstrategy.com/social-intelligence/enterprise/gateway/

Survey

• SQL / SSAS Denali
• Mobile BI
• Social BI

Thank you for listening
