dama-chicago, june 15,2016 · • leverage in-house methods/tools expertise 2. engage a...

15
information engineering Associates DAMA-CHICAGO, JUNE 15,2016 Predictive Analytics: A Statistical Primer for Data Modelers Presenter: Bob Conway Information Engineering Associates [email protected] 303-885-4811 Copyright 2014© Information Engineering Associates. All Rights Reserved. 1

Upload: others

Post on 06-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

information engineering Associates

DAMA-CHICAGO, JUNE 15,2016Predictive Analytics:A Statistical Primer for Data Modelers

Presenter:Bob ConwayInformation Engineering [email protected]

Copyright 2014© Information Engineering Associates. All Rights Reserved. 1

Page 2: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Predictive Analytics: Agenda

•  What – Contrast to Descriptive Analytics •  Why – Value Proposition for Predictive Analytics •  How – Statistical basis of Predictive Analytics •  Getting Started with Predictive Analytics •  Q&A

Page 3: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

‘Traditional’ DW/BI Architecture

ERP

CRM

RPTG INTG

A_ERP

A_CRM

ETL1

ETL1

ETL2 ETL3

… …

SystemsOfRecord DataWarehouse/BusinessIntelligence

•  Source-centric•  Non-transformed

•  Normalized•  Granular

•  Denormalized•  Mul7-dimensional•  aggrega7on

Page 4: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Descriptive Analytics (OLAP)

COGNOS

Bus Obj

QlikView

ETL3

How effective was that promotional campaign?

What is the MTBF for part components? Supplier dependent?

Are customer transaction patterns coincident with fraud?

Descriptive Analytics – What has happened? Trends, patterns, exceptions in historic data Monitor/Control, business process improvement

… Reporting Layer

retail manufacturing

financial services

Page 5: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Predictive Analytics

SAS

MatLab

Excel

Forecastfuturesales?àInventorylevelàLaborneeds?

OpAmalequipmentmaintenanceschedule?

Futurejetfuelpricesforhedgecontracts?

Predic'veAnaly'cs–Whatwillhappen?AdvancedstaAsAcsàmathemaAcalmodelsForecastfuturestate/behavior

ERP

CRM

A_E

A_C

INT RPT

… …

ext ETL P.A. Sandbox

granular denormalized (OBFT)

retail equip operation

airlines

Page 6: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Predictive Analytics Lifecycle

Copyright 2014© Information Engineering Associates. All Rights Reserved. 6

Value

1Business

Understanding

2Data

PreparaAon

3Data

ExploraAon

4StaAsAcalModeling

5EvaluaAon

6Deployment

Page 7: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Statistics 101

Mean(average),X=Σ(Xi)/N

Variance,Sx2=Σ(Xi–X)2/(N-1)=sx2

StandardDeviaAon,sx=√sx2

Page 8: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Simple Linear Regression

Covariance,Sxy2=Σ(Xi–X)(Yi–Y)/(N-1)=sxy2Slope,b=sxy/sxIntercept,a=Y–bX

Y= a +bX

Best Fit: minimizes variance between predicted values and observed values

Page 9: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Simple Correlation

$-

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

$800.00

0 200 400 600 800 1000

r=0.58

CorrelaAonCoefficient,r=sxy/sxsy -1<=r<=+1Correla'on=/=>CauseandEffect

0<=r2<=+1 r2==>fracAonofYvarianceaaributabletoX

$-

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

$800.00

0 200 400 600 800 1000

r=0.8

r=0.0

Page 10: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Multiple Correlation & Regression

$-

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

0 200 400 600 800 1000 $-

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

$800.00

0 50 100

$- $100.00 $200.00 $300.00 $400.00 $500.00 $600.00 $700.00 $800.00

0 2 4 6 8

HH Income (x1000) Customer Age

MercedesSales=a+b(Income)+c(Age)+d(HHCars)R2=0.89

Cars/HH

r=0.8 r=0.3 r=0.6

Variable Sales Income Age Cars

Sales 1.0 0.8 0.6 0.3

Income 1.0 0.7 0.4

Age 1.0 0.3

Cars 1.0

Page 11: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Partial Correlation

MercedesSales=a’+b’(HHIncome)-c’(Age)

Variable Sales Income Age Cars

Sales 1.0 0.8 0.6 0.3

Income 1.0 0.7 0.4

Age 1.0 0.3

Cars 1.0

Variable Sales Income Age* Cars*

Sales 1.0 0.79 -0.48 0.09

Income 1.0

Age 1.0

Cars 1.0

OriginalCorrela'onMatrix Stepwise(Par'al)Correla'onMatrix

Page 12: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Nonlinear Regression

0

20

40

60

80

100

120

0 5 10 15 20 25

Y=a+bX+cX2

0

20

40

60

80

100

120

0 2 4 6 8 10 12 14

Y=a10-bX

Does the mathematical model make business sense?

Page 13: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Periodic (Cyclic) Models

$-

$500

$1,000

$1,500

$2,000

$2,500

$3,000

$3,500

$4,000

1 3 5 7 9 11 13 15 17 19 21 23 25 27

Time DJIA = a + bT + cSIN(Tc -T) + dSIN(Td – T) + …

b=~1.10

Fourier transform

Page 14: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates

Getting Started Suggestions

1.  Bone up on statistics/predictive modeling •  www.coursera.org – free, online classes •  Books – “Predictive Analytics” by Conrad Carlberg •  Leverage in-house methods/tools expertise

2.  Engage a user-partner/business problem •  Evaluate current forecast process and impact •  Explore the data – metrics, sensitivity variables •  Prototype a model – test it against actual data

3.  Estimate business impact ($) – what if scenarios, precasting 4.  Get started with simple tools (Excel with advance stat plug-ins) 5.  Parallel trials with current process and proposed predictive model

•  Continuous monitoring and refinement

Page 15: DAMA-CHICAGO, JUNE 15,2016 · • Leverage in-house methods/tools expertise 2. Engage a user-partner/business problem • Evaluate current forecast process and impact • Explore

Information Engineering Associates Questions

Copyright 2014© Information Engineering Associates. All Rights Reserved. 15