Predictive Analytics - W. P. Carey School of Business · 2020-01-07 · Regression and Correlation
TRANSCRIPT
M. Janakiram Oct 2017
Predictive Analytics
Mani Janakiram, PhD
Director, Supply Chain Intelligence & Analytics, Intel Corp.
Adjunct Professor of Supply Chain, ASU
October 2017
"Prediction is very difficult, especially if it's about the future." --Niels Bohr, Nobel laureate in Physics
Training Objective
• Data Analytics is becoming a game changer in industry, and it is especially relevant for supply chain functions. Predictive Analytics is an advanced form of Data Analytics that leverages historical data and combines it with forecasting models to predict future outcomes. It can be used across the entire gamut of the supply chain: Plan, Source, Make, Deliver, and Return.
• In this session, we will focus on the what, where, and how of predictive analytics. Techniques covered in this session include regression, time series forecasting, basic machine learning algorithms (clustering, CART), and decision trees. Through use cases, we will demonstrate the application of predictive analytics to supply chain functions.
What if you were able to….• Predict the outcome of your new product launch?
• Evaluate the impact of your marketing campaigns hourly and make inventory adjustments in real-time?
• Predict potential failure of your critical equipment and plan contingencies?
• Predict the market sentiments and buying behavior of your consumers in real-time?
• Predict the outcome of the US Presidential elections?
Data to Insights: Driving Industry 4.0
[Figure: analytics maturity progression, culminating in Cognitive Analytics]
Supply Chain Analytics
Predictive Modeling
Data History → Correlated Variables (x1, x2, x3, …, y) → Build Model: Y = f(x1, x2, x3, …) → Predict Future Y: given the Xs, calculate Y
• Learn the model based on historical data
• Correlation is used to understand the strength of a linear relationship. Beware of confusing correlation with causation
• Predictive techniques (y = f(x)) such as Regression, Time Series Forecasting, Machine Learning, and Decision Trees are used to build models
• Terms such as data mining and machine learning are often used interchangeably
Predictive Model Algorithms (y = f(x))

Supervised Learning (a Y is present):
• Y = Numeric → Regression: Linear Regression, Lasso, Ridge, Best Subsets, Stepwise
• Y = Categorical → Classification: Logistic Regression, K Nearest Neighbors, Support Vector Machines, Neural Networks, Linear Discriminant Analysis, Bayesian Networks
• Regression or Classification: Time Series Forecasting, Decision Trees, Random Forests, Gradient Boosting

Unsupervised Learning (no Y):
• Clustering: K Means, Hierarchical, Principal Components Analysis

= we will look at these today
Better Prediction Better Model!
Three models: under fit (bias), over fit (variance), just right!
[Figure: three fits of y vs. x]
Balance accuracy of fit on the training data against generalization to future or test data to get a better model!
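The under fit / just right / over fit trade-off can be illustrated with a quick sketch (the data and polynomial degrees below are made up for illustration, not from the slides): fitting polynomials of increasing degree to noisy linear data drives the training error down, while generalization to held-out points can suffer.

```python
import numpy as np

# Hypothetical noisy linear data (illustrative only)
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 12)
y_train = 2.0 + 3.0 * x_train + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.02, 0.98, 12)
y_test = 2.0 + 3.0 * x_test + rng.normal(0.0, 0.3, x_test.size)

def fit_rmse(degree):
    """Fit a polynomial of `degree` to the training data and return
    (train RMSE, test RMSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    err_train = y_train - np.polyval(coefs, x_train)
    err_test = y_test - np.polyval(coefs, x_test)
    return (float(np.sqrt(np.mean(err_train ** 2))),
            float(np.sqrt(np.mean(err_test ** 2))))

under_train, under_test = fit_rmse(0)   # under fit: high bias
right_train, right_test = fit_rmse(1)   # just right
over_train, over_test = fit_rmse(6)     # over fit: high variance
```

Because the polynomial families are nested, the training error can only fall as the degree rises; the test error is what exposes the over fit.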
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Regression and Correlation
• Regression analysis is a tool for building statistical models that characterize relationships between a dependent variable and one or more independent variables, all of which are numerical.
  – A regression model that involves a single independent variable is called simple regression. A regression model that involves several independent variables is called multiple regression.
• Correlation is a measure of the linear relationship between two variables, X and Y, and is measured by the (population) correlation coefficient. Correlation coefficients range from −1 to +1. A scatter plot helps evaluate the relationship between two variables.
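The correlation coefficient described above can be computed directly from its definition. A minimal sketch (the function name and sample values are ours, not from the slides):

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient between two variables.
    Returns a value in [-1, +1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing relationship gives r = +1,
# a perfectly linear decreasing relationship gives r = -1.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```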
Linear Regression
• Find the line that minimizes the squared error of the fit (least squares loss function)
[Figure: scatter plot of Y vs. x with fitted line Y = 1 + 0.67x; the vertical error of fit to each point is called a residual]
Fit Y = β0 + β1x (note: the model can extend to any number of x's)
Watch out for Nonsense Correlations!
• Beware lurking variables!
• Correlation does not mean causation
NOTE: images captured from the web
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Based on a sample of 5 previous sales, determine whether the number of cars sold is influenced by the number of TV ads.

Number of TV Ads (x):    1   3   2   1   3
Number of Cars Sold (y): 14  24  18  17  27

Σx = 10, Σy = 100, x̄ = 2, ȳ = 20

ŷ = b0 + b1x

See Excel output
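For reference, the least-squares slope and intercept for this example can be computed by hand or with a short script (a sketch equivalent to what the Excel output reports; variable names are ours):

```python
# Least-squares fit for the Reed Auto data
x = [1, 3, 2, 1, 3]        # number of TV ads
y = [14, 24, 18, 17, 27]   # number of cars sold

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n          # 2.0 and 20.0
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx              # slope
b0 = y_bar - b1 * x_bar     # intercept
print(f"y-hat = {b0} + {b1}x")   # y-hat = 10.0 + 5.0x
```

So each additional TV ad is associated with 5 more cars sold in this sample.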
Simple Linear Regression Excel Output
Multiple Regression Model
Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to years of experience and the score on the firm's programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown below.

Exper.  Score  Salary  |  Exper.  Score  Salary
  4      78    24.0    |    9      88    38.0
  7     100    43.0    |    2      73    26.6
  1      86    23.7    |   10      75    36.2
  5      82    34.3    |    5      81    31.6
  8      86    35.8    |    6      74    29.0
 10      84    38.0    |    8      87    34.0
  0      75    22.2    |    4      79    30.1
  1      80    23.1    |    6      94    33.9
  6      83    30.0    |    3      70    28.2
  6      91    33.0    |    3      89    30.0

y = b0 + b1x1 + b2x2 + … + bpxp

See Excel output
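The multiple regression fit that the Excel output reports can be reproduced with ordinary least squares; a sketch using NumPy (the variable names are ours):

```python
import numpy as np

# Programmer salary survey data
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

# Design matrix with an intercept column: y = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(len(exper)), exper, score])
b, *_ = np.linalg.lstsq(X, np.array(salary), rcond=None)
b0, b1, b2 = b   # intercept, experience coefficient, test-score coefficient
```

Both slopes come out positive: salary rises with experience and with the aptitude-test score.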
Multiple Linear Regression Excel Output
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Time Series Forecasting
• Forecasting is an estimate of future performance based on past/current performance
• Qualitative and quantitative techniques can be used for forecasting
• Forecasting can be used to gain insight into future performance for better planning and managing of future activities
• Forecasting examples:
  – Revenue forecasting
  – Product demand forecasting
  – Equipment purchase
  – Inventory planning, etc.
Forecasting Inputs
Sources:
• Internal: performance history, experience
• External: economy, market, customers, competition
Methods:
• Qualitative: judgment, intuition, expectations
• Quantitative: data, statistical analysis, models
Factors:
• Promotions, price changes, seasonal changes, trends/cycles, customer tastes, gov't regulations
Forecast! The forecast model should be better than naïve models (which use the average or last-period values)
Forecasting Methods
A combination of qualitative and quantitative techniques can be used for better forecasting.
[Figure: taxonomy of forecasting methods, including life cycle models and graphical techniques]
Qualitative Models
• Field Sales Force
  – Bottom-up aggregation of salesmen's forecasts (subjective feel). Surveys are also used very often to get a feel for what the future holds.
• Jury of Executives
  – A forecast developed by combining the subjective opinions of the managers and executives who are most likely to have the best insights. We use it internally (called Forecast Markets) for product forecast and transition.
• Users' Expectations
  – Forecast based on users' inputs (mostly subjective)
• Delphi Method
  – Similar to Jury of Executives, but the process is recursive until consensus among the panel members is reached.
Quantitative Time Series Models
• Naïve Forecast
  – Simple rules based upon prior experience
• Trend
  – Linear, exponential, S-curve, and other types of pattern projection
• Moving Average
  – Un-weighted linear combination of past actual values
• Filters
  – Weighted linear combination of past actual values
• Exponential Smoothing
  – Forecasts obtained by smoothing and averaging past actual values
• Decomposition
  – Time series broken into trend, seasonality, cyclicality, and randomness
• Data Mining
  – Mostly CART and neural network techniques are used for forecasting
Causal/Explanatory Models
• Regression Models:
  – Modeling the value of an output based on the values of one or more inputs. Also called causal models (can have multiple inputs).
• Econometric Models:
  – Simultaneous systems of multiple regression equations
• Life Cycle Models:
  – Diffusion-based approach, used to predict market size and duration based on product acceptance behavior
• ARIMA Models:
  – A class of time series models that take into consideration various behaviors of the response in question, based on past time series data.
• Transfer Function Models:
  – A class of time series models that take into consideration various behaviors of the response along with causal factors, based on past time series data.
Simple Moving Average (or UWMA)
• The Simple Moving Average, or Uniformly Weighted Moving Average (UWMA), for a time period t with a range span of k is:

  $\mathrm{SMA}_t = \frac{1}{k}\sum_{i=1}^{k} x_{t-i}$

• The forecasted value for time t+1 becomes:

  $\mathrm{SMA}_{t+1} = \frac{1}{k}\sum_{i=1}^{k} x_{(t+1)-i}$

[Figure: first 3 data points averaged]
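The UWMA forecast (the mean of the k most recent observations) is one line of code; a minimal sketch with illustrative data:

```python
def simple_moving_average(series, k):
    """Uniformly weighted moving average: the forecast for time t+1
    is the mean of the k most recent observations."""
    if len(series) < k:
        raise ValueError("need at least k observations")
    return sum(series[-k:]) / k

# e.g. the first 3 data points averaged as the forecast for period 4
demand = [20, 24, 22]
forecast = simple_moving_average(demand, k=3)   # (20 + 24 + 22) / 3 = 22.0
```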
Exponentially Weighted Moving Average (EWMA)
• In contrast to the Simple Moving Average, the Exponentially Weighted Moving Average (EWMA) for a time period t uses all previously observed data, instead of just a fixed range.
• The smoothed value for a time period t is indicated by Z_t.
• Each previous observation receives a different weight, with more recent observations weighted more heavily and weights decreasing exponentially.
• How quickly these weights decrease is determined by the smoothing parameter, α. Normally it ranges from 0.2 to 0.4.
  – When α approaches 0, all prior observations have nearly equal weight
  – When α approaches 1, most weight is given to the more recent observations
• Smoothing should be done on a stationary time series without seasonality.
$Z_t = \alpha\, y_t + (1-\alpha)\, Z_{t-1} \quad\text{or}\quad Z_t = Z_{t-1} + \alpha\,(y_t - Z_{t-1})$
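The EWMA recursion Z_t = α·y_t + (1 − α)·Z_{t−1} translates directly into code. A sketch (initializing at the first observation is one common convention; function and variable names are ours):

```python
def ewma(series, alpha):
    """Exponentially weighted moving average:
    z_t = alpha * y_t + (1 - alpha) * z_{t-1},
    initialized at the first observation."""
    z = series[0]
    smoothed = [z]
    for y in series[1:]:
        z = alpha * y + (1 - alpha) * z
        smoothed.append(z)
    return smoothed

# The last smoothed value serves as the forecast for the next period.
history = ewma([10.0, 20.0], alpha=0.5)   # [10.0, 15.0]
```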
Forecasting Scenario*
• Process: Sets of 10 coin flips
• What’s the naïve model?• Can you beat it? If so, how?
Forecasting: Common Pitfalls
• A good-fitting model ≠ an accurate forecast
  – Standard modeling methods can be (ab)used to over-fit the data
Fitting a model to history is easy – good forecasting is not. (SAS Institute)
[Figure: forecast diverging from actual values]
Example: Regression vs. Time Series
• Estimate the CO2 data for the next 2 years
• Intuitively, we could do better than the simple linear regression, right?
[Figure: bivariate fit of CO2 by Year/Month with a linear fit (01/1973–01/1990), and predicted values with a forecast]
Time series example: monthly CO2 concentrations from Mauna Loa captured for 14 years.
A simple linear regression model shows the overall trend of the data, but it is not the best representation of all the patterns in the data!
Time series models can yield better predictive models! They take advantage of correlation(s) between the current value and other responses that occurred previously in time.
Time series: a collection of observations made over equally spaced time intervals. E.g.: cost, inventory levels, shipment times, weekly forecast values.
Components of a Time Series
• Time series data can mainly have the following components: trend, seasonal, cyclical, irregular, and level.
[Figure: example charts of trend, cyclical, seasonal, and level patterns over a year of monthly data]
Trend: a continuing pattern of increase or decrease, either straight-line or curved
Level: the absence of trend, seasonality, or noise. Also check for stability (stationary vs. non-stationary)
Seasonal: a repeating pattern of increases and decreases that occurs with a regular frequency
First plot your data to understand its behavior and relationships
Models to Account for Seasonality
• Think of creating a model that accounts for seasonality as an extension of exponential smoothing.
• There are 3 terms to estimate in a seasonal model: the Level, the Trend, and the Seasonal components.
[Figure: CO2 predicted values with a forecast, annotated with the level, trend, and seasonal components]
Now look at the CO2 data from this perspective!
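One standard way to estimate all three components is additive Holt-Winters smoothing. The slides do not give the equations, so the update rules below are the textbook additive form, and the initialization is a simple convention that assumes at least two full seasons of data:

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """Additive Holt-Winters: estimates level, trend, and seasonal terms,
    then extrapolates `horizon` periods ahead. `m` is the season length."""
    # Simple initialization from the first two seasons (needs len(y) >= 2*m)
    season1, season2 = y[:m], y[m:2 * m]
    level = sum(season1) / m
    trend = (sum(season2) - sum(season1)) / (m * m)
    seasonal = [v - level for v in season1]

    for t in range(m, len(y)):
        last_level = level
        s = seasonal[t % m]
        # Level: deseasonalized observation blended with the prior projection
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        # Trend: smoothed change in level
        trend = beta * (level - last_level) + (1 - beta) * trend
        # Seasonal: smoothed deviation of the observation from the level
        seasonal[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    return [level + (h + 1) * trend + seasonal[(len(y) + h) % m]
            for h in range(horizon)]
```

On a flat series the method should forecast the flat value; on the CO2 data it would track both the upward trend and the annual cycle that the linear fit misses.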
Sample Size Guidance
More is better: 50-100 observations are recommended.
• 1 to 2 years of weekly data
• 4 to 8 years of monthly data
• 12 to 24 years of quarterly data
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Machine Learning (ML) / Data Mining (DM)
ML/DM follows the problem-solving methodology:
1. Set a clear business goal
2. Get the right data
3. Mine the data
4. Determine what is interesting
5. Presentation
[Figure: data flow of pre-processing → training data / testing data → model/result → new data]
Decision Trees (Classification)
• Finds the best factor split points to minimize the classification error
[Figure: scatter plot of x2 vs. x1 with split lines at x1 < 4.5 and x2 < 3, partitioning the two classes of Y]
Predictive Model: If x1 < 4.5 and x2 < 3, then Y = one class; else Y = the other class
• The model is a piecewise constant function
• Y is a 2-level categorical variable
• The model minimizes the classification error at each split
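The search for the "best split point to minimize the classification error" can be shown for a single factor. A toy sketch (the function name and data are ours, not from the slides): for each candidate threshold, predict the majority class on each side and count the misclassifications.

```python
from collections import Counter

def best_split(xs, labels):
    """Exhaustively pick the threshold on one factor that minimizes the
    classification error, predicting the majority class on each side."""
    def errors(threshold):
        err = 0
        for side in ([l for x, l in zip(xs, labels) if x < threshold],
                     [l for x, l in zip(xs, labels) if x >= threshold]):
            if side:
                # misclassified = points not in the side's majority class
                err += len(side) - Counter(side).most_common(1)[0][1]
        return err

    values = sorted(set(xs))
    # candidate thresholds: midpoints between consecutive observed values
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(midpoints, key=errors)

# The classes separate cleanly between 3 and 6, so the best split is 4.5
split = best_split([1, 2, 3, 6, 7, 8], ["A", "A", "A", "B", "B", "B"])
```

A tree-building algorithm such as CART repeats this search over every factor, recursing into each side of the winning split.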
Iris Data – Clustering Analysis
• The Iris data, first published by geneticist and statistician Ronald Fisher in 1936, is widely used in machine learning for benchmarking.
• The sepal length, sepal width, petal length, and petal width are measured in cm on fifty Iris specimens from each of three species:
  1. Iris Setosa
  2. Iris Versicolor
  3. Iris Virginica
• The machine learning goal is to classify the species based on the flower characteristics
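A clustering pass over measurements like these can be sketched with Lloyd's k-means algorithm (pure Python on 2-D points; the toy data and deterministic initialization are ours and for illustration only — real implementations use smarter seeding such as k-means++):

```python
def _sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20):
    """Lloyd's algorithm sketch for k-means clustering of 2-D points.
    Initializes centroids at the first k points (illustrative only)."""
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [(sum(p[0] for p in c) / len(c),
                      sum(p[1] for p in c) / len(c)) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points]

# Usage on two well-separated blobs (toy data, not the iris measurements)
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.0),
          (0.0, 0.5), (10.0, 10.5), (10.5, 10.0)]
labels = kmeans(points, 2)
```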
[Scatterplot matrix of sepal length, sepal width, petal length, and petal width, colored by species (setosa, versicolor, virginica)]
Iris data – Y: 1 = Setosa, 2 = Versicolor, 3 = Virginica; X: Sepal Length, Sepal Width, Petal Length, and Petal Width
Tree building
[Scatterplot matrix of the iris measurements by species, with the optimal split regions (1 and 2) marked for tree building]
Tree building
[Charts of species proportions (setosa, versicolor, virginica): all rows; Split 1 on Petal width ≥ 0.8 vs. Petal width < 0.8; then a further split on Petal width < 1.8 vs. Petal width ≥ 1.8]
Single Tree: Classification vs. Regression
• Classification – categorical response (Species). Output: classification counts and misclassification %.
• Regression – continuous response (Species Number). Output: mean and variance for each node.
Gartner Magic Quadrant - Data Science
Business Analytics
Questions?