Predictive Analytics - W. P. Carey School of Business · 2020-01-07 · Regression and Correlation
TRANSCRIPT
M. Janakiram Oct 2017
Predictive Analytics
Mani Janakiram, PhD
Director, Supply Chain Intelligence & Analytics, Intel Corp.
Adjunct Professor of Supply Chain, ASU
October 2017
"Prediction is very difficult, especially if it's about the future." --Niels Bohr, Nobel laureate in Physics
Training Objective
• Data Analytics is becoming a game changer in industry, and it is especially relevant for supply chain functions. Predictive Analytics is an advanced form of Data Analytics that leverages historical data and combines it with forecasting models to predict future outcomes. It can be used across the entire gamut of the supply chain: Plan, Source, Make, Deliver, and Return.
• In this session, we will focus on the what, where, and how of predictive analytics. Techniques covered in this session include regression, time series forecasting, basic machine learning algorithms (clustering, CART), and decision trees. Through use cases, we will demonstrate the application of predictive analytics to supply chain functions.
What if you were able to….• Predict the outcome of your new product launch?
• Evaluate the impact of your marketing campaigns hourly and make inventory adjustments in real-time?
• Predict potential failure of your critical equipment and plan contingencies?
• Predict the market sentiments and buying behavior of your consumers in real-time?
• Predict the outcome of the US Presidential elections?
Data to Insights: Driving Industry 4.0
[Figure: analytics maturity progression, culminating in Cognitive Analytics]
Supply Chain Analytics
Predictive Modeling
Data History → Correlated Variables (x1, x2, x3, …, y) → Build Model: Y = f(x1, x2, x3, …) → Predict Future Y: given the Xs, calculate Y
• Learn the model based on historical data
• Correlation is used to understand the strength of a linear relationship. Beware of confusing correlation with causation
• Predictive techniques (y = f(x)) such as Regression, Time Series Forecasting, Machine Learning, and Decision Trees are used to build models
• Terms such as data mining and machine learning are often used interchangeably
Predictive Model Algorithms (y = f(x))

Supervised Learning (a Y is present):
• Y = Numeric → Regression: Linear Regression, Lasso, Ridge, Best Subsets, Stepwise
• Y = Categorical → Classification: Logistic Regression, K Nearest Neighbors, Support Vector Machines, Neural Networks, Linear Discriminant Analysis, Bayesian Networks
• Regression or Classification: Time Series Forecasting, Decision Trees, Random Forests, Gradient Boosting

Unsupervised Learning (no Y):
• Clustering: K Means, Hierarchical, Principal Components Analysis

= we will look at these today
Better Prediction Better Model!
Three models: under fit (bias), over fit (variance), just right!
[Figure: three fits of y vs. x]
Balance accuracy of fit on the training data against generalization to future or test data to get a better model!
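The under fit / just right / over fit trade-off can be illustrated with a quick sketch (the data and polynomial degrees below are made up for illustration, not from the slides): fitting polynomials of increasing degree to noisy linear data drives the training error down, while generalization to held-out points can suffer.

```python
import numpy as np

# Hypothetical noisy linear data (illustrative only)
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 12)
y_train = 2.0 + 3.0 * x_train + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.02, 0.98, 12)
y_test = 2.0 + 3.0 * x_test + rng.normal(0.0, 0.3, x_test.size)

def fit_rmse(degree):
    """Fit a polynomial of `degree` to the training data and return
    (train RMSE, test RMSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    err_train = y_train - np.polyval(coefs, x_train)
    err_test = y_test - np.polyval(coefs, x_test)
    return (float(np.sqrt(np.mean(err_train ** 2))),
            float(np.sqrt(np.mean(err_test ** 2))))

under_train, under_test = fit_rmse(0)   # under fit: high bias
right_train, right_test = fit_rmse(1)   # just right
over_train, over_test = fit_rmse(6)     # over fit: high variance
```

Because the polynomial families are nested, the training error can only fall as the degree rises; the test error is what exposes the over fit.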
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Regression and Correlation
• Regression analysis is a tool for building statistical models that characterize relationships between a dependent variable and one or more independent variables, all of which are numerical.
  – A regression model that involves a single independent variable is called simple regression. A regression model that involves several independent variables is called multiple regression.
• Correlation is a measure of the linear relationship between two variables, X and Y, and is measured by the (population) correlation coefficient. Correlation coefficients range from −1 to +1. A scatter plot helps evaluate the relationship between two variables.
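The correlation coefficient described above can be computed directly from its definition. A minimal sketch (the function name and sample values are ours, not from the slides):

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient between two variables.
    Returns a value in [-1, +1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing relationship gives r = +1,
# a perfectly linear decreasing relationship gives r = -1.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```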
Linear Regression
• Find the line that minimizes the squared error of the fit (least squares loss function)
[Figure: scatter plot of Y vs. x with fitted line Y = 1 + 0.67x; the vertical error of fit to each point is called a residual]
Fit Y = β0 + β1x (note: the model can extend to any number of x's)
Watch out for Nonsense Correlations!
• Beware lurking variables!
• Correlation does not mean causation
NOTE: images captured from the web
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Based on a sample of 5 previous sales, determine whether the number of cars sold is influenced by the number of TV ads.

Number of TV Ads (x):    1   3   2   1   3
Number of Cars Sold (y): 14  24  18  17  27

Σx = 10, Σy = 100, x̄ = 2, ȳ = 20

ŷ = b0 + b1x

See Excel output
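For reference, the least-squares slope and intercept for this example can be computed by hand or with a short script (a sketch equivalent to what the Excel output reports; variable names are ours):

```python
# Least-squares fit for the Reed Auto data
x = [1, 3, 2, 1, 3]        # number of TV ads
y = [14, 24, 18, 17, 27]   # number of cars sold

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n          # 2.0 and 20.0
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx              # slope
b0 = y_bar - b1 * x_bar     # intercept
print(f"y-hat = {b0} + {b1}x")   # y-hat = 10.0 + 5.0x
```

So each additional TV ad is associated with 5 more cars sold in this sample.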
Simple Linear Regression Excel Output
Multiple Regression Model
Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to years of experience and the score on the firm's programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown below.

Exper.  Score  Salary  |  Exper.  Score  Salary
  4      78    24.0    |    9      88    38.0
  7     100    43.0    |    2      73    26.6
  1      86    23.7    |   10      75    36.2
  5      82    34.3    |    5      81    31.6
  8      86    35.8    |    6      74    29.0
 10      84    38.0    |    8      87    34.0
  0      75    22.2    |    4      79    30.1
  1      80    23.1    |    6      94    33.9
  6      83    30.0    |    3      70    28.2
  6      91    33.0    |    3      89    30.0

y = b0 + b1x1 + b2x2 + … + bpxp

See Excel output
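The multiple regression fit that the Excel output reports can be reproduced with ordinary least squares; a sketch using NumPy (the variable names are ours):

```python
import numpy as np

# Programmer salary survey data
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

# Design matrix with an intercept column: y = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(len(exper)), exper, score])
b, *_ = np.linalg.lstsq(X, np.array(salary), rcond=None)
b0, b1, b2 = b   # intercept, experience coefficient, test-score coefficient
```

Both slopes come out positive: salary rises with experience and with the aptitude-test score.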
Multiple Linear Regression Excel Output
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Time Series Forecasting
• Forecasting is an estimate of future performance based on past/current performance
• Qualitative and quantitative techniques can be used for forecasting
• Forecasting can be used to gain insight into future performance for better planning and managing of future activities
• Forecasting examples:
  – Revenue forecasting
  – Product demand forecasting
  – Equipment purchase
  – Inventory planning, etc.
Forecasting Inputs
Sources:
• Internal: performance history, experience
• External: economy, market, customers, competition
Methods:
• Qualitative: judgment, intuition, expectations
• Quantitative: data, statistical analysis, models
Factors:
• Promotions, price changes, seasonal changes, trends/cycles, customer tastes, gov't regulations
Forecast! The forecast model should be better than naïve models (which use the average or last-period values)
Forecasting Methods
A combination of qualitative and quantitative techniques can be used for better forecasting.
[Figure: taxonomy of forecasting methods, including life cycle models and graphical techniques]
Qualitative Models
• Field Sales Force
  – Bottom-up aggregation of salesmen's forecasts (subjective feel). Surveys are also used very often to get a feel for what the future holds.
• Jury of Executives
  – A forecast developed by combining the subjective opinions of the managers and executives who are most likely to have the best insights. We use it internally (called Forecast Markets) for product forecast and transition.
• Users' Expectations
  – Forecast based on users' inputs (mostly subjective)
• Delphi Method
  – Similar to Jury of Executives, but the process is recursive until consensus among the panel members is reached.
Quantitative Time Series Models
• Naïve Forecast
  – Simple rules based upon prior experience
• Trend
  – Linear, exponential, S-curve, and other types of pattern projection
• Moving Average
  – Un-weighted linear combination of past actual values
• Filters
  – Weighted linear combination of past actual values
• Exponential Smoothing
  – Forecasts obtained by smoothing and averaging past actual values
• Decomposition
  – Time series broken into trend, seasonality, cyclicality, and randomness
• Data Mining
  – Mostly CART and neural network techniques are used for forecasting
Causal/Explanatory Models
• Regression Models:
  – Modeling the value of an output based on the values of one or more inputs. Also called causal models (can have multiple inputs).
• Econometric Models:
  – Simultaneous systems of multiple regression equations
• Life Cycle Models:
  – Diffusion-based approach, used to predict market size and duration based on product acceptance behavior
• ARIMA Models:
  – A class of time series models that take into consideration various behaviors of the response in question, based on past time series data.
• Transfer Function Models:
  – A class of time series models that take into consideration various behaviors of the response along with causal factors, based on past time series data.
Simple Moving Average (or UWMA)
• The Simple Moving Average, or Uniformly Weighted Moving Average (UWMA), for a time period t with a range span of k is:

  $\mathrm{SMA}_t = \frac{1}{k}\sum_{i=1}^{k} x_{t-i}$

• The forecasted value for time t+1 becomes:

  $\mathrm{SMA}_{t+1} = \frac{1}{k}\sum_{i=1}^{k} x_{(t+1)-i}$

[Figure: first 3 data points averaged]
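The UWMA forecast (the mean of the k most recent observations) is one line of code; a minimal sketch with illustrative data:

```python
def simple_moving_average(series, k):
    """Uniformly weighted moving average: the forecast for time t+1
    is the mean of the k most recent observations."""
    if len(series) < k:
        raise ValueError("need at least k observations")
    return sum(series[-k:]) / k

# e.g. the first 3 data points averaged as the forecast for period 4
demand = [20, 24, 22]
forecast = simple_moving_average(demand, k=3)   # (20 + 24 + 22) / 3 = 22.0
```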
Exponentially Weighted Moving Average (EWMA)
• In contrast to the Simple Moving Average, the Exponentially Weighted Moving Average (EWMA) for a time period t uses all previously observed data, instead of just a fixed range.
• The smoothed value for a time period t is indicated by Z_t.
• Each previous observation receives a different weight, with more recent observations weighted more heavily and weights decreasing exponentially.
• How quickly these weights decrease is determined by the smoothing parameter, α. Normally it ranges from 0.2 to 0.4.
  – When α approaches 0, all prior observations have nearly equal weight
  – When α approaches 1, most weight is given to the more recent observations
• Smoothing should be done on a stationary time series without seasonality.
$Z_t = \alpha\, y_t + (1-\alpha)\, Z_{t-1} \quad\text{or}\quad Z_t = Z_{t-1} + \alpha\,(y_t - Z_{t-1})$
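The EWMA recursion Z_t = α·y_t + (1 − α)·Z_{t−1} translates directly into code. A sketch (initializing at the first observation is one common convention; function and variable names are ours):

```python
def ewma(series, alpha):
    """Exponentially weighted moving average:
    z_t = alpha * y_t + (1 - alpha) * z_{t-1},
    initialized at the first observation."""
    z = series[0]
    smoothed = [z]
    for y in series[1:]:
        z = alpha * y + (1 - alpha) * z
        smoothed.append(z)
    return smoothed

# The last smoothed value serves as the forecast for the next period.
history = ewma([10.0, 20.0], alpha=0.5)   # [10.0, 15.0]
```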
Forecasting Scenario*
• Process: Sets of 10 coin flips
• What’s the naïve model?• Can you beat it? If so, how?
Forecasting: Common Pitfalls
• A good-fitting model ≠ an accurate forecast
  – Standard modeling methods can be (ab)used to over-fit the data
Fitting a model to history is easy – good forecasting is not. (SAS Institute)
[Figure: forecast diverging from actual values]
Example: Regression vs. Time Series
• Estimate the CO2 data for the next 2 years
• Intuitively, we could do better than the simple linear regression, right?
[Figure: bivariate fit of CO2 by Year/Month with a linear fit (01/1973–01/1990), and predicted values with a forecast]
Time series example: monthly CO2 concentrations from Mauna Loa captured for 14 years.
A simple linear regression model shows the overall trend of the data, but it is not the best representation of all the patterns in the data!
Time series models can yield better predictive models! They take advantage of correlation(s) between the current value and other responses that occurred previously in time.
Time series: a collection of observations made over equally spaced time intervals. E.g.: cost, inventory levels, shipment times, weekly forecast values.
Components of a Time Series
• Time series data can mainly have the following components: trend, seasonal, cyclical, irregular, and level.
[Figure: example charts of trend, cyclical, seasonal, and level patterns over a year of monthly data]
Trend: a continuing pattern of increase or decrease, either straight-line or curved
Level: the absence of trend, seasonality, or noise. Also check for stability (stationary vs. non-stationary)
Seasonal: a repeating pattern of increases and decreases that occurs with a regular frequency
First plot your data to understand its behavior and relationships
Models to Account for Seasonality
• Think of creating a model that accounts for seasonality as an extension of exponential smoothing.
• There are 3 terms to estimate in a seasonal model: the Level, the Trend, and the Seasonal components.
[Figure: CO2 predicted values with a forecast, annotated with the level, trend, and seasonal components]
Now look at the CO2 data from this perspective!
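One standard way to estimate all three components is additive Holt-Winters smoothing. The slides do not give the equations, so the update rules below are the textbook additive form, and the initialization is a simple convention that assumes at least two full seasons of data:

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """Additive Holt-Winters: estimates level, trend, and seasonal terms,
    then extrapolates `horizon` periods ahead. `m` is the season length."""
    # Simple initialization from the first two seasons (needs len(y) >= 2*m)
    season1, season2 = y[:m], y[m:2 * m]
    level = sum(season1) / m
    trend = (sum(season2) - sum(season1)) / (m * m)
    seasonal = [v - level for v in season1]

    for t in range(m, len(y)):
        last_level = level
        s = seasonal[t % m]
        # Level: deseasonalized observation blended with the prior projection
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        # Trend: smoothed change in level
        trend = beta * (level - last_level) + (1 - beta) * trend
        # Seasonal: smoothed deviation of the observation from the level
        seasonal[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    return [level + (h + 1) * trend + seasonal[(len(y) + h) % m]
            for h in range(horizon)]
```

On a flat series the method should forecast the flat value; on the CO2 data it would track both the upward trend and the annual cycle that the linear fit misses.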
Sample Size Guidance
More is better: 50-100 observations are recommended.
• 1 to 2 years of weekly data
• 4 to 8 years of monthly data
• 12 to 24 years of quarterly data
Predictive Modeling
• Regression Analysis
• Time Series Forecasting
• Machine Learning & Decision Trees
Machine Learning (ML) / Data Mining (DM)
ML/DM follows the problem-solving methodology:
1. Set a clear business goal
2. Get the right data
3. Mine the data
4. Determine what is interesting
5. Presentation
[Figure: data flow of pre-processing → training data / testing data → model/result → new data]
Decision Trees (Classification)
• Finds the best factor split points to minimize the classification error
[Figure: scatter plot of x2 vs. x1 with split lines at x1 < 4.5 and x2 < 3, partitioning the two classes of Y]
Predictive Model: If x1 < 4.5 and x2 < 3, then Y = one class; else Y = the other class
• The model is a piecewise constant function
• Y is a 2-level categorical variable
• The model minimizes the classification error at each split
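The search for the "best split point to minimize the classification error" can be shown for a single factor. A toy sketch (the function name and data are ours, not from the slides): for each candidate threshold, predict the majority class on each side and count the misclassifications.

```python
from collections import Counter

def best_split(xs, labels):
    """Exhaustively pick the threshold on one factor that minimizes the
    classification error, predicting the majority class on each side."""
    def errors(threshold):
        err = 0
        for side in ([l for x, l in zip(xs, labels) if x < threshold],
                     [l for x, l in zip(xs, labels) if x >= threshold]):
            if side:
                # misclassified = points not in the side's majority class
                err += len(side) - Counter(side).most_common(1)[0][1]
        return err

    values = sorted(set(xs))
    # candidate thresholds: midpoints between consecutive observed values
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(midpoints, key=errors)

# The classes separate cleanly between 3 and 6, so the best split is 4.5
split = best_split([1, 2, 3, 6, 7, 8], ["A", "A", "A", "B", "B", "B"])
```

A tree-building algorithm such as CART repeats this search over every factor, recursing into each side of the winning split.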
Iris Data – Clustering Analysis
• The Iris data, first published by geneticist and statistician Ronald Fisher in 1936, is widely used in machine learning for benchmarking.
• The sepal length, sepal width, petal length, and petal width are measured in cm on fifty Iris specimens from each of three species:
  1. Iris Setosa
  2. Iris Versicolor
  3. Iris Virginica
• The machine learning goal is to classify the species based on the flower characteristics
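A clustering pass over measurements like these can be sketched with Lloyd's k-means algorithm (pure Python on 2-D points; the toy data and deterministic initialization are ours and for illustration only — real implementations use smarter seeding such as k-means++):

```python
def _sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20):
    """Lloyd's algorithm sketch for k-means clustering of 2-D points.
    Initializes centroids at the first k points (illustrative only)."""
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [(sum(p[0] for p in c) / len(c),
                      sum(p[1] for p in c) / len(c)) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points]

# Usage on two well-separated blobs (toy data, not the iris measurements)
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.0),
          (0.0, 0.5), (10.0, 10.5), (10.5, 10.0)]
labels = kmeans(points, 2)
```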
[Scatterplot matrix of sepal length, sepal width, petal length, and petal width, colored by species (setosa, versicolor, virginica)]
Iris data – Y: 1 = Setosa, 2 = Versicolor, 3 = Virginica; X: Sepal Length, Sepal Width, Petal Length, and Petal Width
Tree building
[Scatterplot matrix of the iris measurements by species, with the optimal split regions (1 and 2) marked for tree building]
Tree building
[Charts of species proportions (setosa, versicolor, virginica): all rows; Split 1 on Petal width ≥ 0.8 vs. Petal width < 0.8; then a further split on Petal width < 1.8 vs. Petal width ≥ 1.8]
Single Tree: Classification vs. Regression
• Classification – categorical response (Species). Output: classification counts and misclassification %.
• Regression – continuous response (Species Number). Output: mean and variance for each node.
Gartner Magic Quadrant - Data Science
Business Analytics
Questions?