business data analytics · 2019. 10. 10. · business data analytics lecture 3 repeatable,...

87
Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these slides is the University of Tartu.

Upload: others

Post on 23-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Business Data Analytics

Lecture 3Customer Segmentation

MTAT.03.319

The slides are available under creative common license.

The original owner of these slides is the University of Tartu.

Page 2: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Business Data Analytics

Lecture 3

Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction

Data Exploration

Customer (Data) Segmentation

Lecturer 2

Lecture 1

Descriptive (numbers) & Visualization

Descriptive Analysis

Descriptive Analysis

Page 3: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Marketing and Sales

Customer Segmentation

Page 4: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Outline

Techniques for Customer Segmentation

• Intuition Based

• Historical/Behavior basedRFMValue TierLife Cyclestage

• Data DrivenK-MeansHierarchical ClusteringDBSCAN

Page 5: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

How’d you do it?

Imagine you have become a business owner of a big company

in the automobile sector and wish to redefine the focus of your production line.

You want to know what and how much to produce (SUV, 4x4, coupe, sedan, wagon, etc..)

You have a big client base.

Page 6: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data
Page 7: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Customer segmentation

Persona analysis and

Intuition-basedHistorical/behavioral-based Data-driven

Page 8: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Know your clients

Persona analysis

Page 9: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Persona analysis

identification of customer types based on their preferences

better communication loyalty new customers

Page 10: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

The most attractive customers are

usually those

for which there is a big gap between

their needs

and the current satisfaction of these

needs

Page 11: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Intuition-based

Demographic Attitudinal

Working men in 30s from

Tartu with children

grouping customers

based on their demographic

characteristics

grouping customers

based on their needs

Women who wish to increase

sport activity

and need motivation

Page 12: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Customer segmentation

Intuition-based Historical/behavioral-based Data-driven

Page 13: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Behavioral-basedgrouping customers based of what they done in the

past: purchases,

website browsing, comments

Page 14: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Behavioral-basedgrouping customers based of what they done in the

past: purchases,

website browsing, comments

RFM Value tier Lifecycle stage

Page 15: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

RFM

Page 16: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

RFM

Page 17: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

F(h)R(l)M(h)

h - high

n - neutral

l - low

F(h)R(n)M(h)

F(h)R(h)M(h)

F(h)R(n)M(n)

F(n)R(n)M(n)

F(n)R(h)M(n)

F(n)R(l)M(n)

F(n)R(l)M(l)

F(l)R(l)M(l)

F(n)R(h)M(l) F(l)R(h)M(n)

RFM

Page 18: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Value tier

Grouping customers based on the value they deliver to your

business. Top 1%, top 5 % etc of generated revenue.

Page 19: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Lifecycle stage

Grouping customers based on the type of relationships with

the company/brand

new customer regular loyal returning

Page 20: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Customer segmentation

Intuition-based Historical/behavioral-based Data-driven

RFM Value tier Lifecycle stage

Page 21: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Data-driven segmentation

Automated discovery of the new segments

Depends on your data

Ursus Wehrli

Page 22: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Data-driven segmentation

Automated discovery of the new segments

Depends on your data

Ursus Wehrli

Price sensitivityPromotion sensitivity

Product affinityemail responses

Page 23: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Automated segmentation

Page 24: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Automated segmentationDistances inside

clusters are small:

dense cloud

Page 25: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Automated segmentation

Distances between

clusters are maximized

Page 26: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

…in multidimensional space

Page 27: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Clustering in U.S Army

Page 28: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data
Page 29: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

3 Important Questions

1. How do you represent a cluster of more than

one point ?

2. How do you determine the “nearness” of

clusters?

3. When to stop combining clusters

Source: Stanford University – Hierarchical clustering

Page 30: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

3 Important Questions

1. How do you represent a cluster of more than one

point?

centroid or clusteroid represents a set of points

2. How do you determine the “nearness” of clusters?

Some Distance metric

3. When to stop combining clusters

convergence or some threshold or cohesion

Page 31: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

K-means clustering

Page 32: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

K-meansfixed number of clusters -

you need to choose it

yourself

Page 33: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

K-meansfixed number of clusters -

you need to choose it

yourself

based on the calculation

of averages

Page 34: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Initialization

Pick randomly K

nodes from the all the

nodes as initiators

32

1

K2

K3

K1

Page 35: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

d

d

d d

d(a, b) =

321

K2

K3

K1

4

X

Y

(𝑥4 − 𝑥1)2+(𝑦4 − 𝑦1)

2=d(4, 1)

𝑖=1

𝑛

(𝑖𝑎 − 𝑖𝑏)2

Page 36: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

=

K2

K3

K14

4 goes to the basket

with min distance

321

4d(a, b)

𝑖=1

𝑛

(𝑖𝑎 − 𝑖𝑏)2

Page 37: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

d

d

d

d

=

321

K2

K3

K1

5

5

4

d(a, b)

𝑖=1

𝑛

(𝑖𝑎 − 𝑖𝑏)2

Page 38: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

d

dd

=

3

21

6

K2

K3

K14

5

6

d(a, b)

𝑖=1

𝑛

(𝑖𝑎 − 𝑖𝑏)2

Page 39: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

d

dd

=

3

21

7 K2

K3

K14

5

6

, 7

d(a, b)

𝑖=1

𝑛

(𝑖𝑎 − 𝑖𝑏)2

Page 40: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Assignment step:

distance calculation

Make three K clusters

(once you finish all the

data points)

Page 41: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Update step:

calculation of average

7

Calculate centroid

of 3 clusters

1

𝑛

𝑖=1

𝑖=𝑛

𝑦𝑖C = 1

𝑛

𝑖=1

𝑖=𝑛

𝑥𝑖 ,

Page 42: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Update step:

assignment recalc

K2

K3

K1

7

Page 43: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Update step:

assignment recalc

d

K2

K3

K1

7

7

Datapoint 7 moved from K2 to K1

Page 44: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Convergence: When to stop ?

Iteration 1 :

Iteration n :

Iteration 2 :

Iteration (n-1) :

:

:

K2 K3K1

Different

Different

Same

Page 45: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Exercise

You have a dataset with two dimensions, customer loyalty score and

customer value score, and the following dataset for 6 customers:

loyalty value

1 2.3 8.0

2 5.6 6.7

3 4.2 7.1

4 3.1 6.0

5 4.6 11.0

6 6.0 9.0

Find 3 clusters using k-means

Page 46: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

model = KMeans(n_clusters=3)

model.fit(df)

Python code:

Your result depends

on K

Page 47: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DrawbacksThe number of clusters k is an input parameter:

an inappropriate choice of k may yield poor results.

How to select initial data points ?

Pick the first seed randomly

Pick the second seed as far as possible from the first seed.

Pick the third seed as far as possible from the first two

:

:

Page 48: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Best value of K ?

Drawbacks

Page 49: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DrawbacksBest value of K ?

* Y axis: sum of squared distance or Distortion

Page 50: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Average (calculate the distance from

the centroid for each data point

within that cluster)

Page 51: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DrawbacksConvergence to a local minimum may produce

counterintuitive ("wrong") results

Sensitive to noise and outliers

Page 52: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Drawbacks

Can handle only numerical features

Male Female

Do you know

what is our

Euclidean

distance?

Page 53: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Cannot detect clusters of various shapes

Drawbacks

Page 54: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Food for thought ?

Can a data point belong to more than one cluster ?

Check “Fuzzy C Means Clustering”

Page 55: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Hierarchical clustering

Page 56: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Why?No need to choose number of clusters and worry about

the initialization. Creates a tree, where lower levels are

subclusters of higher levels

Page 57: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Why?Any distance metric can be used (not only Euclidean)

Page 58: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Why?

Easy to visualize, provides

a good summary of the

data structure in terms of

clusters

The plot of hierarchical

clustering is called

dendrogram

Page 59: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Hierarchical clustering

• Agglomerative (bottom up)

Initially each point is a cluster

Repeatedly combine the two “nearest” clusters into one

• Divisive (Top Down)

Start with one cluster and recursively split it

Page 60: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Hierarchical Clustering Agglomerative (bottom up)

(0,0)

(1,2)

(2,1) (4,1)

(5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid

We have 6 clusters. Each data point is a

cluster.

Step 1: Calculate the distance (Euclidean)

among very pair of points

Dis

tance

Page 61: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0)

(1,2)

(2,1) (4,1)

(5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid

Step 2: Pick two points having shortest

distance.

This is first cluster. To represent this cluster,

calculate the centroid (average of two data

points), that is ((1+2)/2, (2+1)/2)

(1.5,1.5)X1

Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 62: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0)

(4,1)

(5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X CentroidNow we have 5 clusters.

Calculate again the distance among all the data points, but

now cluster of (1,2) and (2,1) is represented by X1(1.5, 1.5)

Pick with min distance.

(1.5,1.5)X1

X2 (1.5,1.5) Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 63: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X CentroidNow we have 4 clusters.

Distance between (0,0) and centroid X1 is the minimum

Pay Attention: How to calculate the new centroid ?

(1.5,1.5)X1

X2 (4.5,0.5) Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 64: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid New centroid: is calculated using original

points (and NOT using centroid).

X2 (4.5,0.5)

X3 (1,1)

(1,2)

(2,1)

(1.5,1.5)X1

Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 65: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X CentroidNow 3 clusters

Cluster 1: (5,3)

Cluster 2: [(1,2) , (2,1), (0,0)]

Cluster 3: [ (4,1), (5,0)]

X2 (4.5,0.5)

X3 (1,1)

(1,2)

(2,1)

(1.5,1.5)X1

Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 66: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0) (5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid

X2 (4.5,0.5)

X3 (1,1)

(1,2)

(2,1)

X1(1.5,1.5)

(4,1)

Now 3 clusters

Cluster 1: (5,3)

Cluster 2: [(1,2) , (2,1), (0,0)]

Cluster 3: [ (4,1), (5,0)]

NOTE: To calculate distance: Always use recent

centroids or data points, whichever applicable

Calculate Distance between

(5,3) and X2

X2 and X3

X3 and (5,3)

Dis

tance

Page 67: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0) (5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid Now we have two clusters

X2 (4.5,0.5)

X3 (1,1)

(1,2)

(2,1)

(1.5,1.5)X1

(4,1)

(5,0)

Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 68: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0) (5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X CentroidMight not make any sense to make the last

hierarchical cluster (everything into one) ?

X2 (4.5,0.5)

X3 (1,1)

(1,2)

(2,1)

(1.5,1.5)X1

(4,1)

(5,0)

Dis

tance

Hierarchical Clustering Agglomerative (bottom up)

Page 69: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

(0,0) (5,0)

(5,3)

Source: Stanford – Hierarchical clustering

Dendrogram

Datapoint

X Centroid Select a threshold to select the clusters.

(1,2)

(2,1) (4,1)

(5,0)

Dis

tance

Convergence: When to stop ?

Page 70: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Adv: Clustering products by customer preferences

Page 71: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Did you notice what

distance metric we used ?

Centroid LinkageDistance between centroids

(means) of two clusters

Page 72: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Single Linkage

Complete Linkage

Average Linkage

Centroid Linkage

Produces chains

Pick clusters with minimum distance but

how to calculate distance ?

Distance between closest

elements in clusters

Distance between furthest

elements in clusters

Forces “spherical”

clusters with consistent

diameter

Distance between centroids

(means) of two clusters

Less affected by outliersAverage of all pairwise

distances

Ward’s methodConsider joining two

clusters, how does it

changes the total distance

(variance) from centroids

Variance is the key

Source: http://homepages.inf.ed.ac.uk/vlavrenk/

Page 73: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Non Euclidean space

How to represent a cluster of many points ?

Clustroid: data (point) closest to other points

How do you determine the “nearness” of clusters ?

Treat clustroid as if it were centroid when computing intercluster distances

Possible meaning of closest

Smallest maximum distance to other points

Smallest average distance to other points

Smallest sum of squares of distance to other points

Page 74: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Drawbacks

Sensitivity to outliers

A hierarchical structure is created even when

such structure is not appropriate.

Sometimes the dendrogram is too huge huge to

infer anything meaningful

Page 75: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DBSCANDensity-Based Spatial Clustering of

Applications with Noise

Page 76: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

How about following data

points shapes

Page 77: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DBSCANDensity-Based Spatial Clustering of Applications with Noise

Source: https://medium.com/@elutins/dbscan-what-is-it-when-to-use-it-how-to-use-it-8bd506293818

Inputs:

1)Radius distance/Eps (ε): Distance between two

data points

2) Minimum points: Minimum number of points

which have to be within the distance so they can

form a cluster.

Page 78: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DBSCAN

Page 79: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DBSCAN

Page 80: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

DBSCAN: Pros & ConsAdvantages:1) Great at separating clusters of high density versus clusters of low density within a

given dataset.

2) Is great with handling outliers within the dataset.

Disadvantages:1) Does not work well when dealing with clusters of varying densities.

2) Struggles with high dimensionality data*.

More on DBSCANhttps://towardsdatascience.com/dbscan-clustering-for-data-shapes-k-means-cant-

handle-well-in-python-6be89af4e6ea

* Isolation Forest and Robust Random Cut Forest are two algorithms which

works good for high dimensionality data.

Page 81: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

3 Important Questions

1. How do you represent a cluster of more than one point?

centroid or clusteroid represents a set of points

2. How do you determine the “nearness” of clusters?Some Distance metric

3. When to stop combining clustersconvergence (k-means) or some threshold

(Hierarchical) or cohesion (DBSCAN)

Page 82: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Other clustering methods

source:wikipedia

Page 83: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Additional examples:http://inseaddataanalytics.github.io/INSEADAnalytics/BoatsSegmentationCaseSlides.pdf

Page 84: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data
Page 85: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Summary

Techniques for Customer Segmentation

• Historical/Behavior based

RFM

• Data Driven

K-Means: How to Pick K ? Elbow Method

Hierarchical Clustering: Different Distance metrics

DBSCAN

Page 86: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Demo time!

https://courses.cs.ut.ee/2019/bda/fall/Main/Practice

Page 87: Business Data Analytics · 2019. 10. 10. · Business Data Analytics Lecture 3 Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction Data

Books and links

http://labs.openviewpartners.com/files/2012/10/Cu

stomer-Segmentation-eBook-FINAL.pdf

https://www.putler.com/rfm-analysis/About RFM:

http://proquestcombo.safaribooksonline.com.ezproxy.utlib.ut.e

e/book/databases/business-intelligence/9780470650936

About segmentation in B2B setting: