the information herein is for informational purposes only and represents the opinions and views of...

22
Advanced Analytics with Excel and Data Mining using SQL Server 2008 R2 Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd [email protected], @rafaldotnet

Upload: michael-gaines

Post on 16-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

Advanced Analytics with Excel and Data Mining using SQL Server 2008 R2Rafal Lukawiecki

Strategic Consultant, Project Botticelli [email protected], @rafaldotnet

Page 2: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

2

Acknowledgments & Notes

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation.  Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

This seminar is based on a number of sources including a few dozen of Microsoft-owned presentations, used with permission. Thank you: Michael Tejedor, Chris Volckerick, Tara Seppa, Morris Novello, Bryan Bredehoeft, Chris Dial, Donald Farmer, Sabrena McBride, Peter Sprague, and Diwakar Rajagopal.

Page 3: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

3

• Introduce Data Mining• Demonstrate several DM techniques using

Excel

Objectives

Page 4: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

4

What does Data Mining Do?

Explores Your Data

Finds Patterns

Performs Prediction

s

Page 5: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

5

• Technologies for analysis of data and discovery of (very) hidden patterns

• Fairly young (<20 years old) but clever algorithms developed through database research

• Uses a combination of statistics, probability, artificial intelligence, machine learning, and database technologies

Data Mining

Page 6: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

6

Typical Uses

Predictive

Analysis

Seek Profitable Customers

Understand

Customer Needs

Anticipate Customer

Churn

Predict Sales &

Inventory

Build Effective Marketing Campaign

s

Detect and

Prevent Fraud

Correct Data

During ETL

Page 7: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

7

DM – Part Of Microsoft SQL Server

Page 8: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

9

Mining Model Mining ModelMining Model

Mining Process

DM EngineDM Engine

Training data

Data to be predictedMining Model

With predictions

Page 9: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

10

• Free add-in for Excel 2007 (and Visio)• Works with 32 bit edition of Office 2010• 64 bit version in the future

• Requires SQL Server Analysis Services

• Analyze Tab – simpler to use• Data Mining Tab – full power

Data Mining Add-Ins for Excel

Page 10: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

11

Clustering

• Applied to • Segmentation:

Customer grouping, mailing campaign

• Also: classification and regression

• Anomaly detection• Discrete and

continuous data

Page 11: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

1212

Demo1. Finding Outliers with

Data Mining

Page 12: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

13

ClusteringOutlier/Anomaly Detection

Male Female

Son

Daughter

Parent

Age

Page 13: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

14

Time Series

• Uses:• Forecast sales• Inventory prediction• Web hits prediction• Stock value

estimation• Regression trees

with extras

Page 14: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

1515

Demo1. Forecasting Sales with

Time Series

Page 15: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

16

Association Rules

• Use for:• Market basket

analysis• Cross selling and

recommendations• Advanced data

exploration• Finds frequent

itemsets and rules• Sensitive to

parameters

Page 16: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

1717

Demo1. Market Basket

Analysis

Page 17: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

18

Sequence Clustering

• Analysis of:• Customer behaviour• Transaction patterns• Click stream• Customer

segmentation• Sequence prediction

Page 18: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

1919

Demo1. Predicting Customer

Behaviour

Page 19: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

22

AlgorithmsAlgorithm Description

Decision Trees Finds the odds of an outcome based on values in a training set

Association Rules

Identifies relationships between cases

Clustering Classifies cases into distinctive groups based on any attribute sets

Naïve Bayes Clearly shows the differences in a particular variable for various data elements

Sequence Clustering

Groups or clusters data based on a sequence of previous events

Time Series Analyzes and forecasts time-based data combining the powerof ARTXP (developed by Microsoft Research) for short-term predictionswith ARIMA (in SQL 2008) for long-term accuracy.

Neural Nets Seeks to uncover non-intuitive relationships in data

Linear Regression

Determines the relationship between columns in order to predict an outcome

Logistic Regression

Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state

Page 20: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

23

Time Series

Sequence Clustering

Neural Nets

Naïve Bayes

Logistic Regression

Linear Regression

Decision Trees

Clustering

Association Rules

Classifica

tion

Estimatio

n

Segmentation

Associa

tion

Forecasti

ng

Text Analysis

Advanced Data

Exploration

Page 21: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

24

• Data Mining is one of several advanced analytical tools of SQL Server 2008 R2

• Excel makes it easier to use• See what is hiding in your data

Summary

Page 22: The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material

25

© 2011 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation.  Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.