business data analytics · business data analytics lecture 4 ... marketing and sales customer...

44
Business Data Analytics Lecture 4 MTAT.03.319 The slides are available under creative common license. The original owner of these slides is the University of Tartu.

Upload: others

Post on 11-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Business Data Analytics

Lecture 4

MTAT.03.319

The slides are available under creative common license.

The original owner of these slides is the University of Tartu.

Page 2: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Marketing and Sales

Customer Lifecycle

Management:

Regression Problems

Page 3: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer lifecycle

Page 4: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer lifecycle

Page 5: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Moving companies grow not

because they force people to move

more often, but by attracting new

customers

Page 6: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Relationships based on

commitment

event-based subscription-based

Page 7: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Relationships based on

commitment

• Packers and Movers

• Wedding Planners

Event-based Subscription-based

• Telco

• Banks

• Retail (Walmart, Konsume,

etc)

• Hairdressers

Page 8: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition

Customer lifecycle

Page 9: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer lifecycle

Page 10: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer lifecycle

(Returns)

https://www.sas.com/content/dam/SAS/en_us/doc/analystreport/forrester-analytics-drives-customer-life-cycle-management-108033.pdf

Page 11: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer Lifecycle

(Techniques/Approaches)

• Start with understanding of

your existing customers

(Segment the customers)

• Acquire the customers more

profitable

• Understanding future behavior

with Propensity Model

• Convince/Influencing your

customers to Spend more

• Clustering (Lecture 3)

• Regression techniques

(Present, Lecture 4)

• Classification (Lecture 5)

• Cross-selling/Up-selling

(Lecture ?)

Problems Solutions

Page 12: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Customer lifetime value

(CLV)

Describes the amount of revenue or profit a customer generates

over his or her entire lifetime (10-20 years)

We attempt to minimize “cost per acquisition” (CAC) and

keeping any given customer.

or CLTV, lifetime customer value (LCV), or life-time value (LTV) .

Page 13: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Predictive lifetime value: projects what new

customers will spend over their entire lifetime.

CLV is often referred to two forms of lifetime value analysis:

Historical lifetime value: simply sums up

revenue or profit per customer.

Page 14: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

time

Predicting purchase journeys

?

Page 15: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

“Algorithms predict purchase frequency, average order value,

and propensity to churn to create an estimate of the value of the

customer to the business. Predictive CLV is extremely useful for

evaluating acquisition channel performance, using modeling to

target high value customers, and identifying and cultivating VIP

customers early in their brand journey.”

custora.com

Page 16: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Unsupervised learning Supervised learning

Page 17: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Supervised vs. Unsupervised Learning

The goal of the supervised approach is to learn function that maps input

x to output y, given a labeled set of pairs

The goal of the unsupervised approach is to learn “interesting patterns”

given only an input

Categorize different types of customers ?

How many will leave ? (regression)

If a particular customer will leave or not ? (classification)

Page 18: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Regression vs. Classification

If a particular customer will leave or not ?How many will leave ?

Page 19: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Sleeping habits

4 hours of sleep 8 hours of sleep

exam performance

Page 20: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Linear regression

Page 21: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Linear regression

Page 22: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Linear regression

Page 23: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression

x

y

Task: given a list of observations find a line

that approximates the correspondence in the data

Page 24: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression

output

(dependent variable,

response)

input

(independent variable,

feature,

explanatory variable, etc)

Page 25: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression

intercept (bias)

coefficient (slope, or weight w)

noise (error term, residual)

shows how increases output

if input increases by one unit

mean of y when x=0

shows what we are not able to predict with x

Page 26: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression

Page 27: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression: example

Built-in R dataset: a collection of observations of the Old

Faithful geyser in the USA Yellowstone National Park

> data(faithful)

> head(faithful)

eruptions waiting

1 3.600 79

2 1.800 54

3 3.333 74

4 2.283 62

5 4.533 85

6 2.883 55

the duration of the geyser eruptions (in mins)

the length of the waiting period until the next one (in mins)

> model <- lm(data=faithful, eruptions ~ waiting)

What do we want to model here?

i.e. What is input and output?

> dim(faithful)

[1] 272 2

Page 28: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression: example in R

The fitted model is: eruptions = -1.87 + 0.08 x waiting

What is the eruption time if

waiting was 70?

Page 29: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression: example in R

The fitted model is: eruptions = -1.87 + 0.08 x waiting

> -1.874016 + 70*0.075628

[1] 3.419944

> coef(model)[[1]] + coef(model)[[2]]*70

What is the eruption time if

waiting was 70?

Page 30: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Data

Machine learning “secret sauce”

Test

Train

Page 31: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Prediction Problem

Training Data

(with labeled information)

X 1 -> Y1

X2 -> Y2

:

X100 -> Y200

X101 -> ?

X102 -> ?

:

X110 -> ?

Test Data

(no labeled information)

Page 32: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression

x

y

Task: given a list of observations find a line

that approximates the correspondence in the data

Predict

this Label

Page 33: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression: example in R

train_idx <- sample(nrow(faithful), 172)

train <- faithful[train_idx,]

test <- faithful[-train_idx,]

model <- lm(data=train, eruptions ~ waiting)

test$predictions <- predict(newdata=test, model)

Page 34: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Multiple linear regression

all the same, but instead of one feature, x is a k-dimensional vector

the model is the linear combination of all features:

Page 35: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Multiple linear regression

model_1 <- lm(data=train[,-1], AmountPerCust_2 ~ AmountPerCust_1

+ TransPerCustomer_1 + AmountPerTr_1 + gender + age +

discount_proposed + clicks_in_eshop)

Page 36: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Multiple linear regression

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -121.71416 5.57368 -21.837 < 2e-16 ***

AmountPerCust_1 -0.11450 0.07461 -1.535 0.1259

TransPerCustomer_1 1.10197 0.68570 1.607

0.1090

AmountPerTr_1 0.18432 0.11310 1.630 0.1041

gender1 -1.24804 0.58533 -2.132 0.0337 *

age 0.20375 0.03775 5.397 0.000000132 ***

discount_proposed1 59.90655 2.02840 29.534 < 2e-

16 ***

clicks_in_eshop 23.98903 1.04957 22.856 < 2e-16

***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.948 on 322 degrees of freedom

Multiple R-squared: 0.8529, Adjusted R-squared: 0.8497

F-statistic: 266.7 on 7 and 322 DF, p-value: < 2.2e-16

Interpret

Page 37: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Quality Assessment

• Mean Absolute Error• Mean Square Error• Root Mean Square Error• R2

Page 38: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

MAE and (R)MSE

Reference: https://medium.com/human-in-a-machine-world/mae-and-rmse-which-

metric-is-better-e60ac3bde13d

Loss function/distance:

MAE is more robust to outliers since it

does not make use of square.

With errors, MAE is steady

MSE is more useful if we are

concerned about large errors.

With increase in errors, RMSE

increases as the variance associated

with the frequency distribution of error

magnitudes also increases.

Page 39: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

RMSE Vs. MAE

https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

RMSE should be more useful when large errors are particularly undesirable.

Page 40: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

RMSE Vs. MAE

https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

RMSE does not necessarily increase with the variance of the errors. RMSE

increases with the variance of the frequency distribution of error magnitudes

Page 41: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Simple linear regression: example in R

train_idx <- sample(nrow(faithful), 172)

train <- faithful[train_idx,]

test <- faithful[-train_idx,]

model <- lm(data=train, eruptions ~ waiting)

test$predictions <- predict(newdata=test, model)

MSE <- (1/nrow(test))*sum((test$eruptions - test$predictions)^2)

> MSE

[1] 0.2742695

Page 42: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

R2: Goodness of fit

Reference: https://www.youtube.com/watch?v=w2FKXOa0HGA

R2 -> 1, better as compared to when R2-> 0var(mean) – var(model)

R2 = var(mean)

*Var = variation

Page 43: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

R2 and more

• R-squared cannot determine whether the coefficient estimates and

predictions are biased, which is why you must assess the residual

plots.

• “Adjusted R-square” penalizes you for adding variables which do

not improve your existing model.

• Typically, the more non-significant variables you add into the

model, the gap in R-squared and Adjusted R-squared increases.

Page 44: Business Data Analytics · Business Data Analytics Lecture 4 ... Marketing and Sales Customer Lifecycle Management: Regression Problems. Customer lifecycle. Customer lifecycle. Moving

Demo time!

https://courses.cs.ut.ee/2018/bda/fall/Main/Practice