BIUG 20112026: Dimensional Modeling and MDX Best Practices
DESCRIPTION
My presentation at the last Israeli BI User Group. Some of the content is taken from SQLBits and Chris Webb's blog.
TRANSCRIPT
Agenda
• Dimension Design
• SSAS Best Practices
• MDX
• Inspired by Vincent Rainardi (http://sqlbits.com/) and Mosha Pasumansky
About the Company
• Founded in 2007
• The company has 22 software professionals, each an expert in their field
• Company offices in Herzliya Pituach
• Partners of [...] in the BI and DB field
• Partners of [...] in the [...] field
• The leading services company in Israel in the [...] field
• Leading the BI and DB field in [...]
Areas of Activity
BI
• Delivering BI projects with Microsoft and MicroStrategy technologies
• Specification, build and maintenance of DWH systems
• Training and advanced courses
DB
• Experts in SQL Server and Sybase
• 24x7 maintenance of critical and complex systems in large companies
• T&P for large DB systems
• Expertise in assisting OEM customers
.NET
• Project development according to the customer's needs
• Solutions based on advanced Microsoft technologies (C#, Silverlight, Web)
• Specification and build of integration projects
Selected Customers
White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operation Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices
1 or 2 dimensions
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don't have fact tables requiring customer grain.
• We can get the customer attributes without knowing the account key
• Disadvantage: can't go from account to customer without going through the fact table - performance
[Diagram: a) One Dimension: FactTable, DimAccount (with customer attributes). b) Two Dimensions: FactTable, DimAccount, DimCustomer]
1 or 2 dimensions
• Dim customer is needed by another fact table
• Modular: 2 separate dim tables but we can combine them easily to create a bigger dimension
• To get the breakdown of a measure by a customer attribute is a bit more complicated than a)
    select c.attribute, sum(f.measure1)
    from fact1 f
    inner join dim_account a on f.account_key = a.account_key
    inner join dim_customer c on a.customer_key = c.customer_key
    group by c.attribute
[Diagram: c) Snowflake: FactTable, DimAccount, DimCustomer]
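The snowflake query above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the table contents and the attribute values ('Retail', 'Corporate') are made up for illustration and are not from the presentation.

```python
import sqlite3

# Hypothetical miniature snowflake: fact1 -> dim_account -> dim_customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key INTEGER PRIMARY KEY, customer_key INTEGER);
CREATE TABLE fact1        (account_key INTEGER, measure1 REAL);
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO fact1        VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# The fact table only knows the account key, so the customer attribute
# is reached through dim_account (two joins instead of one).
snowflake_rows = conn.execute("""
SELECT c.attribute, SUM(f.measure1)
FROM fact1 f
INNER JOIN dim_account  a ON f.account_key  = a.account_key
INNER JOIN dim_customer c ON a.customer_key = c.customer_key
GROUP BY c.attribute
ORDER BY c.attribute
""").fetchall()

print(snowflake_rows)
```

The extra join is the price of the modular design: the fact table stays at account grain while dim_customer remains reusable by other fact tables.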
When to Snowflake
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured
Replaced by Location/GeoKey pointing to DimLocation / DimGeography
Advantage: consistent hierarchy, i.e. relationship between City, Country & Region.
Weakness: we would lose flexibility. City to Country are more or less fixed, but the grouping of countries might be different between dimensions.
When to Snowflake
2. When the sub dim is used by both the main dim and the fact table(s)
• DimCustomer is used in DimAccount, and is also used in the fact table.
• DimManufacturer is used in DimProduct, and is also used in the fact table.
• DimProductGroup is used in DimProduct, and is also used in some fact table.
The alternative is maintaining two full dimensions (star classic).
When to Snowflake
3. To make “base dim” and “detail dim”
Insurance classes, account types (banking), product lines, diagnosis, treatment (health care)
Policies for marine, aviation & property classes have different attributes.
Pull common attributes into 1 dim: DimBasePolicy
Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213
A dimension with only 1 attribute
Reasons for putting single attribute in its own dim:
– Keep fact table slim (4 bytes int not 100 bytes varchar)
– When the value changes, we don't have to update the BIG fact table – ETL performance
– Grain is much lower than fact table – small dim
– Yes it's only 1 attribute today, but in the future there could be another attribute.
Should we put the attribute in the fact table? (like DD = Degenerate Dim)
Probably, if the grain = fact table, and it's short or it's a number.
Fact Table Primary Key
Should we have a PK?
Yes, if we need to be able to identify each fact row
1. Need to refer to a fact row from another fact row e.g. chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table
Some experts totally disagree
[Diagram: fact table with PK (uniqueness) and FK (no RI, not enforced); related transactions, header - detail; previous/next transaction]
Fact Table Primary Key
Single or Multi Column?
Single Column: Generated Identity
Multi Column: Dimension Keys
Single-column PK is better than multi-column PK because:
1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
Fact Table Primary Key
• Advantage: Prevent duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: cluster or not?
– Cluster the PK if: the PK is an identity column
– Don't cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)
Example of not having a PK
If duplicate fact rows are allowed.
e.g. retail DW: Store Key, Date Key, Product Key, Customer Key
Same customer buying the same milk in the same shop on the same day twice
Aggregate Fact Tables
What are they?
• High level aggregation of base fact tables
• A "select group by" query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table
• The report only takes 1 second to run.
[Diagram: Base Fact Tables → Report: 30 mins. Aggregate Fact Table → Report: 1 sec]
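A minimal sketch of the idea, using Python's sqlite3 with an in-memory database. The table and column names (fact_sales, agg_sales_by_day) and the rows are hypothetical; the point is only that the GROUP BY runs once during the load, and the report then reads the small pre-aggregated table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER,
                         product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES
  (20110301, 1, 100, 5.0), (20110301, 1, 101, 7.0),
  (20110301, 2, 100, 3.0), (20110302, 1, 100, 4.0);

-- Runs once, as part of the DW load, not at report time.
CREATE TABLE agg_sales_by_day AS
  SELECT date_key, SUM(amount) AS amount, COUNT(*) AS row_count
  FROM fact_sales
  GROUP BY date_key;
""")

# The report query touches only the aggregate fact table.
agg_rows = conn.execute(
    "SELECT date_key, amount, row_count FROM agg_sales_by_day ORDER BY date_key"
).fetchall()
print(agg_rows)
```

In a real warehouse the aggregate would be rebuilt or incrementally maintained by the ETL; that part is left out here.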
Rapidly Changing Dimension
• Why is it a problem
– Large SCD2 dim
– Attributes change every day
– Slow query when joined with large fact tables
• What to do
– Put into a separate dim, link direct to fact table.
– Just store the latest, type 1 attributes (or dual)
– Store in the fact table (for small attribute, e.g. indicator)
[Diagram: dimensions labelled Type 2 and Type 1]
Very Large Dimension
Why is it a problem
– SSAS: 4 GB string store limit for dimension
– SSAS: dim is "select distinct" on each attribute – long processing time
– Difficult to browse high cardinality attribute
– Join with fact tables – performance
What to do
– Split into 2 dims, same grain. Always cut vertically.
– Remove SCD2, or at least only certain columns.
– Most common: separate the attributes with high cardinality/change frequency
Real Time Fact Table
• Reporting the transaction system in real time
• View to union with the normal fact table, or use partitions
• Freezing the dims for key lookup, -3 unknown key
• Key corrections next day
[Diagram: real time partition (intraday today) and main partition (up to last night); dims as of yesterday; dim key]
Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen, to be resolved next batch
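The unknown-key convention above can be sketched as a small lookup routine. The function names, the dictionary-based dimension and the sample business keys are hypothetical; only the three key values (-1, -2, -3) and the "resolve next batch" step come from the slide.

```python
NULL_IN_SOURCE = -1      # business key was null in the source row
NOT_IN_DIM = -2          # key not found in an up-to-date dim table
NOT_IN_FROZEN_DIM = -3   # key not found because the dim was frozen

def lookup_key(business_key, dim_keys, frozen=False):
    """Map a source business key to a surrogate key or an unknown key."""
    if business_key is None:
        return NULL_IN_SOURCE
    if business_key in dim_keys:
        return dim_keys[business_key]
    return NOT_IN_FROZEN_DIM if frozen else NOT_IN_DIM

def load_intraday(business_keys, frozen_dim_keys):
    """Real time partition load against the dims as of yesterday."""
    return [{"business_key": bk,
             "dim_key": lookup_key(bk, frozen_dim_keys, frozen=True)}
            for bk in business_keys]

def correct_keys(fact_rows, refreshed_dim_keys):
    """Next batch: re-resolve only the rows that were given -3."""
    for row in fact_rows:
        if row["dim_key"] == NOT_IN_FROZEN_DIM:
            row["dim_key"] = lookup_key(row["business_key"], refreshed_dim_keys)
    return fact_rows

dim_as_of_yesterday = {"C1": 1}
intraday_rows = load_intraday(["C1", None, "C2"], dim_as_of_yesterday)
intraday_keys = [row["dim_key"] for row in intraday_rows]

# After the nightly dim load, customer C2 exists and gets its real key.
corrected_rows = correct_keys(intraday_rows, {"C1": 1, "C2": 2})
corrected_keys = [row["dim_key"] for row in corrected_rows]
print(intraday_keys, corrected_keys)
```

Keeping -3 distinct from -2 is what makes the next-day correction cheap: only rows that were loaded against a frozen dim need to be revisited.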
Dealing with Currency Rates
What for/background/requirements
– Report in 3 reporting currencies, using today rates or past
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– Had the transactions happened today
– Currency rates historical analysis
[Diagram: Transaction Currency (100 countries, 40 currencies) → Transaction Rates (many transaction dates) → DW Currency (1 currency, e.g. GBP) → Reporting Rates (1 reporting date) → Reporting Currency (3-4 currencies: GBP, USD, EUR, Original)]
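One way to read the diagram is as a two-step conversion: each transaction is converted to the single DW currency with the rate of its own transaction date, and the DW amount is then converted to a reporting currency with the rate of one chosen reporting date. The sketch below assumes that reading; the rates, dates and function names are invented for illustration.

```python
from decimal import Decimal

# Hypothetical transaction rates: GBP per 1 unit of transaction currency,
# keyed by (currency, transaction date).
transaction_rates = {
    ("USD", "2011-03-01"): Decimal("0.60"),
    ("USD", "2011-03-02"): Decimal("0.50"),
}

# Hypothetical reporting rates for ONE reporting date:
# units of reporting currency per 1 GBP.
reporting_rates = {
    "GBP": Decimal("1"),
    "USD": Decimal("1.60"),
    "EUR": Decimal("1.20"),
}

def to_dw_currency(amount, currency, transaction_date):
    """Step 1: transaction currency -> DW currency at the transaction date."""
    return amount * transaction_rates[(currency, transaction_date)]

def to_reporting_currency(amount_gbp, reporting_currency):
    """Step 2: DW currency -> reporting currency at the reporting date."""
    return amount_gbp * reporting_rates[reporting_currency]

transactions = [
    (Decimal("100"), "USD", "2011-03-01"),
    (Decimal("200"), "USD", "2011-03-02"),
]
total_gbp = sum(to_dw_currency(*t) for t in transactions)
total_eur = to_reporting_currency(total_gbp, "EUR")
print(total_gbp, total_eur)
```

Swapping reporting_rates for a fixed set (for example end-of-year rates) gives the "analyse over time without the impact of currency rates" requirement without touching the stored DW amounts.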
Dealing with Currency Rates
• A good example can be found here.
Dealing with Status
What/background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (no of days between stages)
– How many on each stage
[Diagram: workflow from Status 1 to Status 6, with date1, date2, date3, date4 marking the transitions]
Dealing with Status
Approaches
– Accumulating Snapshot Fact, 1 row per application
– SCD2 on DimApp
– App Status fact table

Accumulating snapshot fact:
AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
1       1/3/11    1        3/3/11    1        7/3/11    1
2       6/3/11    1        7/3/11    1                  0

App Status fact table:
AppKey  StsKey  StsDateKey
1       1       61
1       2       63
1       3       67
2       1       66
2       2       67

SCD2 on DimApp:
AppKey  AppID  StsKey  StsDate  Current
1       1      1       1/3/11   N
2       1      2       3/3/11   N
3       1      3       7/3/11   Y
4       2      1       6/3/11   N
5       2      2       7/3/11   Y
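The two questions from the previous slide (days between stages, how many on each stage) can be answered directly from the App Status fact rows. The sketch below uses the five rows shown above and assumes that StsDateKey values are consecutive day numbers, so that subtracting two keys gives a number of days; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# (AppKey, StsKey, StsDateKey) rows from the App Status fact table.
status_facts = [(1, 1, 61), (1, 2, 63), (1, 3, 67), (2, 1, 66), (2, 2, 67)]

def days_between_stages(rows):
    """Bottleneck analysis: days spent between consecutive statuses."""
    by_app = defaultdict(dict)
    for app_key, sts_key, date_key in rows:
        by_app[app_key][sts_key] = date_key
    result = {}
    for app_key, stages in by_app.items():
        ordered = sorted(stages)
        for prev_sts, next_sts in zip(ordered, ordered[1:]):
            # Assumption: date keys are consecutive day numbers.
            result[(app_key, prev_sts, next_sts)] = (
                stages[next_sts] - stages[prev_sts]
            )
    return result

def count_per_stage(rows):
    """How many applications currently sit on each status."""
    current = {}
    for app_key, sts_key, _ in rows:
        current[app_key] = max(sts_key, current.get(app_key, sts_key))
    return dict(Counter(current.values()))

stage_days = days_between_stages(status_facts)
stage_counts = count_per_stage(status_facts)
print(stage_days, stage_counts)
```

With an accumulating snapshot the same numbers come from subtracting date columns in a single row; the fact-table form trades that convenience for not having to add columns when a new status appears.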
Referenced Dimensions
• Enables using one "master" member
• Not Snowflake dimension
– For ex.
• Dim customers: UK, London, Roman Avramovich.
• Dim Stores: UK, London, Friendly Bikes Store
– What is the total revenue from Internet customers and stores in London?
MDX optimization Methodology
• Re-write the MDX code
• Add Aggregations
• Add pre-calculated Measure Groups (ETL)
• Solve the problem using Relational Engine
• Use .NET Stored Procedures.
– The problem can rarely be solved using better hardware.
• Column based Databases
• Optimizing MDX
– Baselining Query Speeds
• Clearing the Analysis Services Caches
• Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex)
• Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
• Configuring the Analysis Services Query Log
• Cell-by-Cell Mode vs. Subspace Mode
Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.
Using Profiler
• So far so good
Doesn’t use the cache
Subcube
• Granularity
• Slice
Granularity
• Single grain
– List of GROUP BY attributes in SQL SELECT
• Mixed grain
– Both Attribute.[All] and Attribute.MEMBERS
[Diagram: granularity lattice. Geography: (All Country, All City); (Countries, All City); (Countries, Cities). Product: All Products; Products]
Slice
• Single member
– SQL: WHERE City = 'Redmond'
– MDX: [City].[Redmond]
• Multiple members
– SQL: WHERE City IN ('Redmond', 'Seattle')
– MDX: { [City].[Redmond], [City].[Seattle] }
Slice at granularity
SQL
    SELECT Sum(Sales), City
    FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
    GROUP BY City
MDX
    SELECT Measures.Sales ON 0,
           NON EMPTY {Redmond, Seattle} ON 1
    FROM Sales_Cube
Slice below granularity
SQL
    SELECT Sum(Sales)
    FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
MDX
    SELECT Measures.Sales ON 0
    FROM Sales_Cube
    WHERE {Redmond, Seattle}
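The difference between the two SQL forms on the last two slides can be seen on a toy table. This sketch runs both queries with Python's sqlite3; the Sales_Table rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Sales REAL);
INSERT INTO Sales_Table VALUES
  ('Redmond', 10.0), ('Redmond', 5.0), ('Seattle', 20.0), ('London', 7.0);
""")

# Slice at granularity: City is both filtered and grouped,
# so the result keeps one row per selected city.
at_granularity = conn.execute("""
SELECT City, SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped,
# so the selected cities are aggregated into a single value.
below_granularity = conn.execute("""
SELECT SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_granularity, below_granularity)
```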
Examples
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Examples
(Seattle, Year.Year.MEMBERS)
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Examples
(Seattle, Year.MEMBERS)
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Examples
({Redmond, Seattle, London}, Year.MEMBERS)
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Examples
({Redmond, Seattle}, {2005, 2006, 2007})
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Arbitrary shaped subcubes
• What is it?
• How can it happen?
• Why is it so bad?
• How to avoid them?
Arbitrary shaped subcubes
Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Arbitrary shaped subcubes
CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, SF, Denver]
Arbitrary shaped subcubes
{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
Arbitrary shaped subcubes
Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
[Grid: columns All Years, 2005, 2006, 2007, 2008; rows All Cities, Redmond, Seattle, New York, London]
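A simplified way to think about these examples: a set of cells is a regular subcube when it equals the cross product of the members it uses on each attribute, and arbitrary shaped otherwise. The sketch below tests exactly that and nothing more. It is an illustration of the idea only, not how the SSAS engine classifies subcubes, and the helper name is hypothetical.

```python
from itertools import product

def is_arbitrary_shape(cells):
    """True if the cells are NOT a full cross product of their members."""
    cells = set(cells)
    if not cells:
        return False
    # Project the tuples onto each attribute (city, year, ...).
    axes = [set(axis) for axis in zip(*cells)]
    return cells != set(product(*axes))

cities = ["Redmond", "Seattle", "New York", "London"]
years = [2005, 2006, 2007, 2008]

# ({Redmond, Seattle}, {2005, 2006, 2007}): a plain rectangle.
rectangle = set(product(["Redmond", "Seattle"], [2005, 2006, 2007]))

# Union((Redmond, all years), (all cities, 2005)): a cross shape.
cross = {("Redmond", y) for y in years} | {(c, 2005) for c in cities}

# CrossJoin(cities, years) - (Seattle, 2007): a rectangle with a hole.
hole = set(product(cities, years)) - {("Seattle", 2007)}

# {(Redmond, 2005), (Seattle, 2006), ...}: a diagonal.
diagonal = set(zip(cities, years))

shape_flags = [is_arbitrary_shape(s) for s in (rectangle, cross, hole, diagonal)]
print(shape_flags)
```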
Arbitrary shapes
• WHERE/Subselect/Aggregate
• Unnatural hierarchies
• Parent-Child (visual totals)
• "Non Leaves" subcube
• Conditional logic (IIF, IF, CASE, CoalesceEmpty etc)
• NonEmpty, Exists
WHERE/Subselect
• Severity = '1' OR Priority = '1'
• Multiselect
– {USA, London}
Mixed grain slicer
[Hierarchy: All; USA (Seattle, New York); UK (London, Bristol)]
Mixed grain slicer
[Hierarchy: All; USA (Seattle, New York); UK (London, Bristol)]
[Grid: columns All Cities, Seattle, New York, London, Bristol; rows All Countries, USA, UK]
Parent-child
Leaves vs. Non Leaves
[Diagram: granularity lattice with the leaves marked. Geography: (All Country, All City); (Countries, All City); (Countries, Cities). Product: All Products; Products]
Problems with arbitrary shapes
• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more (for every topic we discuss, just ask "What will happen with arbitrary shapes", and I am in trouble)
SCOPE
    SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except(
            [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
            { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) }
        )
    );
        ...;
    END SCOPE;
Subcube decomposition
    SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except(
            [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
            { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) }
        )
    );
        ...;
    END SCOPE;
[Diagram: the scoped region split into Scope 1, Scope 2 and Scope 3]
Subcube decomposition
    SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
        Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
    );
        ...;
    END SCOPE;
    SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        [DateTool].[Aggregation].DefaultMember,
        Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
    );
        ...;
    END SCOPE;
    SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
        [DateTool].[Comparison].DefaultMember
    );
        ...;
    END SCOPE;
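The decomposition can be checked with plain set arithmetic: the original scope is "every (Aggregation, Comparison) pair except the pair of defaults", and the three replacement scopes are each a full cross product. The sketch below verifies that the three pieces do not overlap and together cover exactly the original region. The member names are hypothetical stand-ins for the DateTool members.

```python
from itertools import product

# Hypothetical members; the first of each list plays the DefaultMember.
aggregation = ["Agg Default", "YTD", "MTD"]
comparison = ["Cmp Default", "Previous Year", "Difference"]
agg_default, cmp_default = aggregation[0], comparison[0]
agg_rest = [m for m in aggregation if m != agg_default]
cmp_rest = [m for m in comparison if m != cmp_default]

# Original scope: the full cross join minus one tuple (arbitrary shape).
original = set(product(aggregation, comparison)) - {(agg_default, cmp_default)}

# The three regular replacement scopes, each a plain cross product.
both_non_default = set(product(agg_rest, cmp_rest))
default_aggregation = set(product([agg_default], cmp_rest))
default_comparison = set(product(agg_rest, [cmp_default]))

pieces = [both_non_default, default_aggregation, default_comparison]
covers_original = set().union(*pieces) == original
no_overlap = sum(len(p) for p in pieces) == len(original)
print(covers_original, no_overlap)
```

Because the pieces are disjoint, assigning the same expression in each of the three scopes touches every cell of the original region exactly once.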
MDX Optimization - Tips
• Partial expressions are not cached
    this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);
Instead, put the expensive expression in a hidden calculated member so that its result can be cached:
    create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;
    this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);
Demo
SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
– Cheaper
– Not best of breed
– Uses DAX or MDX
• Have you started working with it?
Mobile BI
• Access to BI data and reports from anywhere, at any time and from any device
• Today almost every manager has a smartphone
• Soon every manager will demand BI on their mobile device
"Mobile BI suits those who need to receive tactical information and decide quickly, that is, operational BI. There is no need to push large amounts of data to the mobile device for no reason."
Gartner
Social BI
• Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.
• Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more
• Instantly Available via Cloud
Social BI
• Deep Personalization
• Enterprise Data Integration
Survey
• SQL / SSAS Denali
• Mobile BI
• Social BI
Thank you for listening