Copyright © 2006, SAS Institute Inc. All rights reserved. Predictive Modeling Concepts and Algorithms Russ Albright and David Duling SAS Institute

Predictive Modeling Concepts and Algorithms

Russ Albright and David Duling

SAS Institute

Predictive Modeling Landscape

1. Background

2. Modeling Overview

3. Models

4. Model Assessment and Selection

5. Model Deployment / Scoring

Use Cases for Data Mining

1. Offline applications
   • Campaign planning
   • Adverse event detection

2. On-demand applications
   • Front-office data collection and recommendation

3. Real-time applications
   • Transaction processing
   • Fraud detection
   • Website product recommendation

4. Real-time modeling and scoring of data streams (the future!)
   • Mega data streams
   • Internet traffic
   • Satellite transmissions
   • Digital data acquisition

Background - Enterprise Miner Functionality

Sample

Explore

Modify

Model

Assess

Background - Predictive Modeling Terminology

[Figure: data sets labeled Training Data, Validation and Test Data, and Scoring Data. Rows are Observations; columns are Variables/Features/Attributes. Other labels: Actual Target, Predicted Target (Output).]

Modeling Overview

What do we mean by prediction?

What is a predictive model?

• Classification/discriminant model – target is categorical, usually binary

• Regression model – target is continuous

Given {x(i),y(i)},

y=f(x,θ)

E(y|x,θ)

p(y|x,θ)
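One way to connect the last two expressions, offered as a reading aid rather than as the presenters' own derivation: if the target is assumed to take only the values 0 and 1, the conditional mean is the probability of class 1.

```latex
% Assumption: y takes values in {0, 1}
\begin{aligned}
E(y \mid x, \theta)
  &= 0 \cdot p(y = 0 \mid x, \theta) + 1 \cdot p(y = 1 \mid x, \theta) \\
  &= p(y = 1 \mid x, \theta)
\end{aligned}
```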

Consider the following data

Predict the Response for a new value of Attribute

[Figure: scatter plot of Response versus Attribute.]

The Simplest Model: y = Ȳ, the mean response

[Figure: Response versus Attribute with the mean model drawn as a horizontal line.]

What about a polynomial?

[Figure: Response versus Attribute with a polynomial fit.]

What about a better polynomial?

[Figure: Response versus Attribute with a more flexible polynomial fit.]

Now acquire more data and call it “validation data”

The blue model is said to overfit the training data.

The mean model is said to underfit the training data.

[Figure: Response versus Attribute, showing Training and Validation points together with the fitted models.]
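The underfit/overfit contrast above can be reproduced in a few lines. This is a minimal sketch, not the presenters' code: the data are hypothetical, and the over-flexible model is a Lagrange interpolating polynomial (degree n-1 through all n training points), swapped in for the slide's polynomial because it needs no libraries.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Mean model: ignore x and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares straight line (closed form for one attribute)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_interpolant(xs, ys):
    """Polynomial of degree n-1 through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict

# Hypothetical data: roughly y = 2x + 1 with small hand-picked noise.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.5, 2.6, 5.3, 6.4, 9.4, 10.8]
x_valid = [0.5, 1.5, 2.5, 3.5, 4.5]
y_valid = [1.7, 4.2, 5.9, 8.4, 9.8]

models = {
    "mean": fit_mean(x_train, y_train),
    "line": fit_line(x_train, y_train),
    "interpolant": fit_interpolant(x_train, y_train),
}
train_err = {name: mse(m, x_train, y_train) for name, m in models.items()}
valid_err = {name: mse(m, x_valid, y_valid) for name, m in models.items()}

for name in models:
    print(f"{name:12s} train={train_err[name]:.3f} valid={valid_err[name]:.3f}")
```

On this data the interpolant has zero training error but a larger validation error than the straight line, while the mean model is poor on both sets.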

Models

Linear Regression

[Figure: scatter plot of Y against X1 and X2.]

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Logistic Regression (Generalized Linear Model)

$\log\left(\frac{p_j}{1 - p_j}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

0-1 target/response variable

Fit $p_j = p(y_j = 0 \mid x) = 1 - p(y_j = 1 \mid x)$
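To make the logistic link concrete, here is a small sketch that converts a linear predictor into a probability and back. The coefficient values and function names are hypothetical, chosen only for illustration. As on the slide, the modeled probability is the one for y = 0, and the probability for y = 1 is its complement.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x1, x2, b0, b1, b2):
    """Probability whose log-odds equal b0 + b1*x1 + b2*x2."""
    return inverse_logit(b0 + b1 * x1 + b2 * x2)

# Hypothetical coefficients (not estimated from any data).
b0, b1, b2 = -1.0, 0.8, -0.5

p_y0 = predict_probability(2.0, 1.0, b0, b1, b2)  # p(y = 0 | x), slide convention
p_y1 = 1.0 - p_y0                                 # p(y = 1 | x)

print(round(p_y0, 4), round(p_y1, 4))
```

Applying `logit` to the returned probability recovers the linear predictor, which is the relationship the logistic regression equation states.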

Idea: What if we break the data into smaller chunks to identify local phenomena?

[Figure: Response versus Attribute, with the data divided into local segments.]

Decision Trees
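The "smaller chunks" idea can be sketched as a single tree split on one attribute: choose the threshold that minimizes the total squared error, then predict the mean response on each side. This is an illustrative one-level tree on hypothetical data, not the Enterprise Miner algorithm.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1], best[2], best[3]

def predict(x, split):
    """Predict the local mean on whichever side of the threshold x falls."""
    threshold, left_mean, right_mean = split
    return left_mean if x <= threshold else right_mean

# Hypothetical data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

split = best_split(xs, ys)
print(split, predict(2, split), predict(5, split))
```

A full decision tree repeats this search recursively inside each chunk until a stopping rule is met.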

Neural Networks

ftp://ftp.sas.com/pub/neural/FAQ.html

Evolution of model training error and validation error

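[Figure: Model Error plotted as training proceeds from Initialization, with Training Error and Validation Error curves. The Optimal fit separates the Underfitting region from the Overfitting region.]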
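The picture suggests a simple selection rule: keep the model from the point where validation error is lowest. The sketch below assumes the horizontal axis counts training iterations and uses made-up error values; the function name is hypothetical.

```python
def best_iteration(validation_errors):
    """Index of the lowest validation error (the first one on ties)."""
    return min(range(len(validation_errors)),
               key=validation_errors.__getitem__)

# Hypothetical error curves: training error keeps falling,
# validation error falls and then rises again.
training_errors = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10, 0.06]
validation_errors = [0.95, 0.70, 0.52, 0.45, 0.43, 0.47, 0.55, 0.66]

stop_at = best_iteration(validation_errors)

for i, (tr, va) in enumerate(zip(training_errors, validation_errors)):
    if i < stop_at:
        region = "underfitting"
    elif i == stop_at:
        region = "optimal fit"
    else:
        region = "overfitting"
    print(f"iteration {i}: train={tr:.2f} valid={va:.2f} {region}")
```

Training error alone cannot locate this point because it decreases throughout, which is why a separate validation set is needed.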

Memory Based Reasoning (Nearest Neighbors)

[Figure: scatter plot of Y against X1 and X2, with a group of points labeled Neighbors.]
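A nearest-neighbor prediction can be sketched as follows: find the k stored observations closest to the new point and average their targets. Euclidean distance, k = 3, and the data are assumptions made for illustration; with a 0-1 target the average can be read as an estimated probability.

```python
import math

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(query, rows, k=3):
    """Average the targets of the k training rows nearest to query."""
    neighbors = sorted(rows, key=lambda row: distance(query, row[0]))[:k]
    return sum(y for _, y in neighbors) / len(neighbors)

# Hypothetical training rows: ((x1, x2), y) with a 0-1 target.
rows = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
    ((4.0, 4.2), 1), ((4.1, 3.9), 1), ((3.8, 4.0), 1),
]

print(knn_predict((1.1, 1.0), rows, k=3))
print(knn_predict((4.0, 4.0), rows, k=3))
print(knn_predict((1.1, 1.0), rows, k=5))
```

No model is fitted in advance: the training data are kept in memory and consulted at prediction time, which is where the name "memory based reasoning" comes from.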

Model Assessment and Selection – Lift Charts

[Figure: test data table with columns Actual Target, Predicted Target (Output), and Decision. Visible predicted outputs: .9, .8, .3, .6.]
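As a rough illustration of what a lift chart plots, the sketch below ranks test cases by predicted score and compares the response rate among the top-ranked cases with the overall response rate. The data and the function name are hypothetical and are not the values from the slide.

```python
def lift_at_depth(actual, predicted, depth):
    """Response rate in the top `depth` scored cases over the base rate."""
    if not 1 <= depth <= len(actual):
        raise ValueError("depth must be between 1 and the number of cases")
    base_rate = sum(actual) / len(actual)
    if base_rate == 0:
        raise ValueError("lift is undefined without any positive cases")
    ranked = sorted(zip(predicted, actual), reverse=True)
    top_rate = sum(a for _, a in ranked[:depth]) / depth
    return top_rate / base_rate

# Hypothetical test data: 10 cases, 4 of them actual responders.
predicted = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for depth in (2, 5, 10):
    print(depth, round(lift_at_depth(actual, predicted, depth), 2))
```

Lift always returns to 1 at full depth, since selecting every case reproduces the base rate.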

Model Assessment and Selection – ROC Curves
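For readers who want to see how an ROC curve is built, here is a minimal sketch: lower the decision threshold one score at a time, record the false positive and true positive rates, and integrate with the trapezoidal rule. The data are hypothetical, and the code assumes both classes are present.

```python
def roc_points(actual, predicted):
    """(false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(predicted, actual), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case in a run of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test data: 10 cases, 4 positives and 6 negatives.
predicted = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

points = roc_points(actual, predicted)
area = auc(points)
print(points)
print(round(area, 3))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.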

5. $ Model Deployment / “Scoring” $

It is definitely not (just) about building the models.

Scoring and Score Code

Monitoring

Batch Score Delivery to Offline Applications

[Diagram components: Data Mining, Model Development, ETL engine, ETL process, Scheduled Scoring, Operations, SAS Scoring, RDB Scoring, C code, PMML engine, Data Store, Scores, BI Application, Campaign Planning, Campaign Execution.]

• ETL for model development and scoring
• Scores generated on nightly basis
• ID and Score data pre-loaded into data store
• Score requests contain ID
• Decision server translates score to action
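The batch flow listed above can be sketched in plain Python. Everything here is a hypothetical stand-in: the scoring formula, field names, threshold, and action names are invented for illustration and are not SAS score code or any product API.

```python
def score_model(record):
    """Stand-in for generated score code; the weights are hypothetical."""
    raw = 0.1 + 0.02 * record["purchases"] - 0.05 * record["complaints"]
    return min(1.0, max(0.0, raw))

def nightly_batch(customers):
    """Scheduled scoring: score every record and pre-load ID -> score."""
    return {cid: score_model(rec) for cid, rec in customers.items()}

def decide(score, threshold=0.5):
    """Decision server: translate a score into an action."""
    return "include_in_campaign" if score >= threshold else "skip"

def handle_request(customer_id, data_store):
    """A score request carries only the ID; the score is looked up."""
    score = data_store.get(customer_id)
    if score is None:
        return "no_score_available"
    return decide(score)

# Hypothetical customer records produced by the ETL process.
customers = {
    "C001": {"purchases": 30, "complaints": 1},
    "C002": {"purchases": 5, "complaints": 2},
}

data_store = nightly_batch(customers)  # runs on the nightly schedule

for cid in ("C001", "C002", "C999"):
    print(cid, handle_request(cid, data_store))
```

The point of the design is that no model is evaluated at request time: the request is a dictionary lookup, so the application stays fast even when the model is expensive to score.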

Thanks!