the information herein is for informational purposes only and represents the opinions and views of...
TRANSCRIPT
Advanced Analytics with Excel and Data Mining using SQL Server 2008 R2Rafal Lukawiecki
Strategic Consultant, Project Botticelli [email protected], @rafaldotnet
2
Acknowledgments & Notes
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
This seminar is based on a number of sources including a few dozen of Microsoft-owned presentations, used with permission. Thank you: Michael Tejedor, Chris Volckerick, Tara Seppa, Morris Novello, Bryan Bredehoeft, Chris Dial, Donald Farmer, Sabrena McBride, Peter Sprague, and Diwakar Rajagopal.
3
• Introduce Data Mining• Demonstrate several DM techniques using
Excel
Objectives
4
What does Data Mining Do?
Explores Your Data
Finds Patterns
Performs Prediction
s
5
• Technologies for analysis of data and discovery of (very) hidden patterns
• Fairly young (<20 years old) but clever algorithms developed through database research
• Uses a combination of statistics, probability, artificial intelligence, machine learning, and database technologies
Data Mining
6
Typical Uses
Predictive
Analysis
Seek Profitable Customers
Understand
Customer Needs
Anticipate Customer
Churn
Predict Sales &
Inventory
Build Effective Marketing Campaign
s
Detect and
Prevent Fraud
Correct Data
During ETL
7
DM – Part Of Microsoft SQL Server
9
Mining Model Mining ModelMining Model
Mining Process
DM EngineDM Engine
Training data
Data to be predictedMining Model
With predictions
10
• Free add-in for Excel 2007 (and Visio)• Works with 32 bit edition of Office 2010• 64 bit version in the future
• Requires SQL Server Analysis Services
• Analyze Tab – simpler to use• Data Mining Tab – full power
Data Mining Add-Ins for Excel
11
Clustering
• Applied to • Segmentation:
Customer grouping, mailing campaign
• Also: classification and regression
• Anomaly detection• Discrete and
continuous data
1212
Demo1. Finding Outliers with
Data Mining
13
ClusteringOutlier/Anomaly Detection
Male Female
Son
Daughter
Parent
Age
14
Time Series
• Uses:• Forecast sales• Inventory prediction• Web hits prediction• Stock value
estimation• Regression trees
with extras
1515
Demo1. Forecasting Sales with
Time Series
16
Association Rules
• Use for:• Market basket
analysis• Cross selling and
recommendations• Advanced data
exploration• Finds frequent
itemsets and rules• Sensitive to
parameters
1717
Demo1. Market Basket
Analysis
18
Sequence Clustering
• Analysis of:• Customer behaviour• Transaction patterns• Click stream• Customer
segmentation• Sequence prediction
1919
Demo1. Predicting Customer
Behaviour
22
AlgorithmsAlgorithm Description
Decision Trees Finds the odds of an outcome based on values in a training set
Association Rules
Identifies relationships between cases
Clustering Classifies cases into distinctive groups based on any attribute sets
Naïve Bayes Clearly shows the differences in a particular variable for various data elements
Sequence Clustering
Groups or clusters data based on a sequence of previous events
Time Series Analyzes and forecasts time-based data combining the powerof ARTXP (developed by Microsoft Research) for short-term predictionswith ARIMA (in SQL 2008) for long-term accuracy.
Neural Nets Seeks to uncover non-intuitive relationships in data
Linear Regression
Determines the relationship between columns in order to predict an outcome
Logistic Regression
Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
23
Time Series
Sequence Clustering
Neural Nets
Naïve Bayes
Logistic Regression
Linear Regression
Decision Trees
Clustering
Association Rules
Classifica
tion
Estimatio
n
Segmentation
Associa
tion
Forecasti
ng
Text Analysis
Advanced Data
Exploration
24
• Data Mining is one of several advanced analytical tools of SQL Server 2008 R2
• Excel makes it easier to use• See what is hiding in your data
Summary
25
© 2011 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
Portions © 2011 Project Botticelli Ltd & entire material © 2011 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.