1 1

The Analyst’s Perspective: Ad-hoc Analysis with Microsoft PowerPivot and Office 2010 Excel

Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd [email protected]

2 2

Objectives

Introduce powerful self-service analysis with PowerPivot

Show use of Microsoft SQL Server 2008 Analysis Services Data Mining

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations, used with permission. Thank you to Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft, Marin Bezic, and Donald Farmer with his entire team for all the support.

3

PowerPivot

4 4

Massive Data Volumes

With a few mouse clicks, a user can create and publish intuitive and interactive self-service analysis solutions

6

1. Analysing Massive Data Volumes Using PowerPivot

2. Slicer as a Better Filter

7 7

Reporting Services as a Data Source

[Diagram: Published Reports; SharePoint Farm; Report-Based Data Feeds; OLTP and OLAP Data Sources]

8

1. Report as a Data Source for Analysis

9 9

Share and Collaborate

With SharePoint:

Publish your PowerPivots as Web applications for your team

Schedule data refreshes to keep your analysis up-to-date

Manage security just like a document

10 10

PowerPivot Infrastructure Overview

[Diagram: Power User; Excel, RB, PerfPoint; PowerPivot Add-In; Standard User; Browser; Data Sources; SharePoint Farm; NLB; WFE; App Servers; Excel Services; PowerPivot Mid-Tier; AS Engine; Content DBs]

11 11

PowerPivot Infrastructure: Excel

[Diagram: Power User; Excel, RB, PerfPoint; PowerPivot Add-In; Standard User; Browser; Data Sources; SharePoint Farm; NLB; WFE; App Servers; Excel Services; Gemini Mid-Tier; Gemini Engine; Content DBs]

• Use of IMBI Engine: In-Memory Column-Based store

• Once data is imported, all calculations are performed on the client

• Excel now has its own local SSAS engine

• Added Excel power functions for Gemini called DAX (Data Analysis eXpressions)

• Use of a new compression algorithm to significantly compress the data, roughly 10:1

• Added slicer functionality: not just for UI but for smoother SharePoint integration
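To give a feel for why an in-memory column-based store can compress data well, here is a minimal Python sketch of dictionary encoding followed by run-length encoding on one column. It is an illustration only: the slides do not describe the engine's actual compression algorithm, and the helper names and the sample column are hypothetical.

```python
from itertools import groupby


def dictionary_encode(column):
    """Replace each distinct value with a small integer ID (hypothetical helper)."""
    ids = {}
    encoded = []
    for value in column:
        if value not in ids:
            ids[value] = len(ids)
        encoded.append(ids[value])
    return ids, encoded


def run_length_encode(encoded):
    """Collapse runs of equal IDs into (id, run_length) pairs."""
    return [(key, len(list(group))) for key, group in groupby(encoded)]


def decode(ids, runs):
    """Rebuild the original column from the dictionary and the runs."""
    values = {i: v for v, i in ids.items()}
    return [values[i] for i, n in runs for _ in range(n)]


# Made-up column: a sorted, low-cardinality attribute such as a country name.
country = ["Ireland"] * 4 + ["Poland"] * 3 + ["USA"] * 5
ids, encoded = dictionary_encode(country)
runs = run_length_encode(encoded)
```

Twelve strings become three dictionary entries and three (id, length) pairs. Values within one column tend to repeat far more than values across one row, which is the intuition behind storing data by column.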

12 12

PowerPivot SharePoint Integration: ECS Viewing

[Diagram: Power User; Excel, RB, PerfPoint; Data Sources; Standard User; Browser; Excel Web Access; SharePoint Farm; NLB; WFE; App Servers; Excel Services; PowerPivot Mid-Tier; AS Engine; Content DBs]

13 13

PowerPivot SharePoint Integration: Server Action

[Diagram: Power User; Excel, RB, PerfPoint; Data Sources; Standard User; Browser; Excel Web Access; SharePoint Farm; NLB; WFE; App Servers; Excel Services; PowerPivot Mid-Tier; AS Engine; Content DBs]

14 14

Data Analysis Expressions (DAX)

Simple Excel-style formulas

Define new fields in the PivotTable field list

Enable Excel users to perform powerful data analysis using the skills they already have

Has elements of MDX but does not replace MDX

15 15

Data Analysis Expressions (DAX)

No notion of addressing individual cells or ranges

DAX functions refer to columns in the data

Sample DAX expressions and what they mean:

= [First Name] & " " & [Last Name]: string concatenation, just like Excel

=SUM(Sales[Amount]): the SUM function takes a column name instead of a range of cells

=RELATED(Product[Cost]): the new RELATED function follows a relationship between tables
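The three sample expressions can be mimicked in a few lines of Python to show what "referring to columns" means in practice. This is a rough sketch of the semantics, not how PowerPivot evaluates DAX: the tables, their values, and the helpers `column_sum` and `related` are all made up for illustration.

```python
# Two small tables; every name and value here is an illustrative assumption.
products = {
    "P1": {"Name": "Bike", "Cost": 300.0},
    "P2": {"Name": "Helmet", "Cost": 20.0},
}
sales = [
    {"ProductKey": "P1", "Amount": 500.0},
    {"ProductKey": "P2", "Amount": 35.0},
    {"ProductKey": "P1", "Amount": 450.0},
]
customers = [{"First Name": "Ada", "Last Name": "Lovelace"}]


def column_sum(table, column):
    """Like =SUM(Sales[Amount]): aggregate a whole column, not a cell range."""
    return sum(row[column] for row in table)


def related(row, key_column, lookup_table, column):
    """Like =RELATED(Product[Cost]): follow the relationship to another table."""
    return lookup_table[row[key_column]][column]


# A calculated column is evaluated once for every row of its table.
for row in customers:
    row["Full Name"] = row["First Name"] + " " + row["Last Name"]
for row in sales:
    row["Cost"] = related(row, "ProductKey", products, "Cost")

total_amount = column_sum(sales, "Amount")
```

Note that nothing in the sketch addresses a cell such as A1 or a range such as A1:A3. Every formula names a column and is applied to all of its rows.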

16 16

DAX Aggregation Functions

DAX implements aggregation functions from Excel, including SUM, AVERAGE, MIN, MAX, and COUNT, but instead of taking multiple arguments (a list of ranges), they take a reference to a column

DAX also adds some new aggregation functions which aggregate any expression over the rows of a table

SUMX (Table, Expression)

AVERAGEX (Table, Expression)

COUNTAX (Table, Expression)

MINX (Table, Expression)

MAXX (Table, Expression)
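A minimal Python sketch of the idea behind the X functions: each one evaluates an expression for every row of a table and then aggregates the results. The function bodies are an approximation of the behaviour described above, the sample table is made up, and a blank is modelled as `None`, which is an assumption.

```python
def sumx(table, expression):
    """Sum the expression evaluated for each row."""
    return sum(expression(row) for row in table)


def averagex(table, expression):
    """Average the expression evaluated for each row (table must not be empty)."""
    values = [expression(row) for row in table]
    return sum(values) / len(values)


def minx(table, expression):
    return min(expression(row) for row in table)


def maxx(table, expression):
    return max(expression(row) for row in table)


def countax(table, expression):
    """Count rows where the expression is not blank (None in this sketch)."""
    return sum(1 for row in table if expression(row) is not None)


# Made-up sales rows for illustration.
sales = [
    {"Quantity": 2, "UnitPrice": 10.0, "Discount": None},
    {"Quantity": 1, "UnitPrice": 25.0, "Discount": 0.1},
    {"Quantity": 3, "UnitPrice": 5.0, "Discount": None},
]


def line_total(row):
    return row["Quantity"] * row["UnitPrice"]


revenue = sumx(sales, line_total)
```

The point of the X variants is that the per-row expression (`Quantity * UnitPrice` here) does not have to exist as a stored column before it can be aggregated.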


17 17

More than 80 Excel Functions in DAX

Date and Time: DATE, DATEVALUE, DAY, EDATE, EOMONTH, HOUR, MINUTE, MONTH, NOW, SECOND, TIME, TIMEVALUE, TODAY, WEEKDAY, WEEKNUM, YEAR, YEARFRAC

Information: ISBLANK, ISERROR, ISLOGICAL, ISNONTEXT, ISNUMBER, ISTEXT

Logical: AND, IF, IFERROR, NOT, OR, FALSE, TRUE

Math and Trig: ABS, CEILING, ISO.CEILING, EXP, FACT, FLOOR, INT, LN, LOG, LOG10, MOD, MROUND, PI, POWER, QUOTIENT, RAND, RANDBETWEEN, ROUND, ROUNDDOWN, ROUNDUP, SIGN, SQRT, SUM, SUMSQ, TRUNC

Statistical: AVERAGE, AVERAGEA, COUNT, COUNTA, COUNTBLANK, MAX, MAXA, MIN, MINA

Text: CONCATENATE, EXACT, FIND, FIXED, LEFT, LEN, LOWER, MID, REPLACE, REPT, RIGHT, SEARCH, SUBSTITUTE, TRIM, UPPER, VALUE

18 18

Example: Functions over a Time Period

TotalMTD (Expression, Date_Column [, SetFilter])

TotalQTD (Expression, Date_Column [, SetFilter])

TotalYTD (Expression, Date_Column [, SetFilter] [,YE_Date])

OpeningBalanceMonth (Expression, Date_Column [,SetFilter])

OpeningBalanceQuarter (Expression, Date_Column [,SetFilter])

OpeningBalanceYear (Expression, Date_Column [,SetFilter] [,YE_Date])

ClosingBalanceMonth (Expression, Date_Column [,SetFilter])

ClosingBalanceQuarter (Expression, Date_Column [,SetFilter])

ClosingBalanceYear (Expression, Date_Column [,SetFilter] [,YE_Date])
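As a rough illustration of what the "to date" totals compute, the Python sketch below sums an amount from the start of the month, quarter, or year up to a chosen date. It assumes a calendar year ending on 31 December and ignores the optional SetFilter and YE_Date arguments. The data and the function names are made up, and this is not the DAX implementation.

```python
from datetime import date

# Made-up (date, amount) rows for illustration.
sales = [
    (date(2009, 12, 30), 50.0),
    (date(2010, 1, 5), 100.0),
    (date(2010, 1, 20), 40.0),
    (date(2010, 2, 3), 60.0),
    (date(2010, 4, 1), 10.0),
]


def total_between(rows, start, end):
    """Sum amounts whose date falls in the closed interval [start, end]."""
    return sum(amount for day, amount in rows if start <= day <= end)


def total_mtd(rows, as_of):
    """Month-to-date total: from the first day of the month up to as_of."""
    return total_between(rows, as_of.replace(day=1), as_of)


def total_qtd(rows, as_of):
    """Quarter-to-date total: from the first day of the calendar quarter."""
    first_month = 3 * ((as_of.month - 1) // 3) + 1
    return total_between(rows, date(as_of.year, first_month, 1), as_of)


def total_ytd(rows, as_of):
    """Year-to-date total, assuming the year starts on 1 January."""
    return total_between(rows, date(as_of.year, 1, 1), as_of)


mid_february = date(2010, 2, 15)
```

For `mid_february`, the month-to-date total covers only the February row, while the quarter-to-date and year-to-date totals also include the two January rows. The December 2009 row falls outside all three.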

19

1. Simplicity of DAX to Relate and Analyse Data

20

Data Mining

21 21

What does Data Mining Do?

Explores Your Data

Finds Patterns

Performs Predictions

22 22

Typical Uses

Data Mining:

Seek Profitable Customers

Understand Customer Needs

Anticipate Customer Churn

Predict Sales & Inventory

Build Effective Marketing Campaigns

Detect and Prevent Fraud

Correct Data During ETL

23 23

Server Mining Architecture

[Diagram: Analysis Services Server; Mining Model; Data Mining Algorithm; Data Source; Deploy; BIDS, Excel, Visio, SSMS; Excel/Visio/SSRS/Your App; OLE DB/ADOMD/XMLA; App Data]

24 24

Mining Process

[Diagram: Training data; DM Engine; Mining Model; Data to be predicted; Mining Model; DM Engine; With predictions]

25

Microsoft Decision Trees

Use for:

Classification: churn and risk analysis

Regression: predict profit or income

Association analysis based on multiple predictable variables

Builds one tree for each predictable attribute

Fast
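To show how a tree decides which attribute to split on, here is a small Python sketch that scores candidate splits by information gain (reduction in entropy). This is a textbook criterion used as a stand-in: the slides do not say which scoring method Microsoft Decision Trees uses, and the customer data is invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Pick the attribute whose split gives the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Made-up training cases: does the customer buy a bike?
customers = [
    {"Commute": "Short", "Owns Car": "No", "Buys Bike": "Yes"},
    {"Commute": "Short", "Owns Car": "Yes", "Buys Bike": "Yes"},
    {"Commute": "Long", "Owns Car": "No", "Buys Bike": "No"},
    {"Commute": "Long", "Owns Car": "Yes", "Buys Bike": "No"},
]

root_attribute = best_split(customers, ["Commute", "Owns Car"], "Buys Bike")
```

In this toy data set, commute distance separates buyers from non-buyers perfectly, while car ownership tells us nothing, so the first split is on "Commute". A full tree repeats the same choice inside each branch.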

26

1. Decision Trees for Classification of Customers’ Buying Potential

27 27

Profitability and Risk

Finding what makes a customer profitable is also classification or regression

Typically solved with: Decision Trees (Regression), Linear Regression, and Neural Networks or Logistic Regression

Often used for prediction

Important to predict the probability of the predicted outcome, or the expected profit

Risk scoring: Logistic Regression and Neural Networks

28 28

Neural Network & Logistic Regression

Applied to: Classification, Regression

Great for finding complicated relationships among attributes

Difficult to interpret results

Gradient Descent method

LR is NNet with no hidden layers

[Diagram: Input Layer (Age, Education, Sex, Income); Hidden Layers; Output Layer (Loyalty)]
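The statement that logistic regression is a neural network with no hidden layers can be made concrete with a short Python sketch: the inputs feed a single sigmoid output unit, and the weights are fitted by plain batch gradient descent. The training data, learning rate, and epoch count are arbitrary assumptions, and this is not the Analysis Services implementation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(weights, bias, features):
    """One output unit connected directly to the inputs: no hidden layer."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Batch gradient descent on the log-loss of the single sigmoid unit."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up inputs scaled to 0..1 (for example age and income); label 1 = loyal.
rows = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic(rows, labels)
```

The output is a probability, which is why this family of models suits risk scoring. Adding hidden layers between the inputs and the output unit turns the same structure into a neural network, at the cost of weights that are harder to interpret.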

29

1. Neural Networks for Predicting Lending Risk

30

Time Series

Uses:

Forecast sales

Inventory prediction

Web hits prediction

Stock value estimation

Regression trees with extras
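The core idea of regressing a series on its own past values can be sketched in Python with a first-order autoregression fitted by least squares. This is only the simplest building block, chosen for illustration: the Microsoft Time Series algorithm is considerably richer, and the sales figures and function names are made up.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b."""
    xs = series[:-1]
    ys = series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def forecast(series, steps):
    """Roll the fitted equation forward, feeding each prediction back in."""
    a, b = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = a * last + b
        predictions.append(last)
    return predictions


# Made-up monthly sales that grow by 10 units each month.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]
next_two_months = forecast(monthly_sales, 2)
```

Because each forecast is built on the previous forecast, errors compound as the horizon grows, which is one reason short-term and long-term prediction are often handled by different techniques.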

31

1. Forecasting Sales with Time Series

32 32

Data Mining Techniques

Algorithm: Description

Decision Trees: Finds the odds of an outcome based on values in a training set

Association Rules: Identifies relationships between cases

Clustering: Classifies cases into distinctive groups based on any attribute sets

Naïve Bayes: Clearly shows the differences in a particular variable for various data elements

Sequence Clustering: Groups or clusters data based on a sequence of previous events

Time Series: Analyzes and forecasts time-based data, combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA (in SQL 2008) for long-term accuracy

Neural Nets: Seeks to uncover non-intuitive relationships in data

Linear Regression: Determines the relationship between columns in order to predict an outcome

Logistic Regression: Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state

34 34

Summary

Self-service analysis is now very powerful

Works with huge data sets

PowerPivot for columnar and multidimensional analysis

Data Mining for pattern discovery

To start, all you need is PowerPivot, Excel 2010, and perhaps SQL Server Analysis Services

35 35

© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.