saks fifth avenue
Post on 15-Apr-2017
Saks Fifth Avenue Customer Behavior Report Based on Data Driven Analysis
Group 8: Linhan Zhang, Zhongyuan Lian, Huiruo Zhang, Yitian Chen
Saks Fifth Avenue is a luxury department store chain that sells high-end brands both
online and offline. The objective of this research is to help Saks Fifth Avenue (hereafter Saks)
decrease its customers' return rate and cancel rate so as to improve customer profitability and
satisfaction. We also want to win back old customers and increase their loyalty.
The original data in our research comes from the Customer Relationship Management
database at Saks. This database records a wide range of historical sales information at the level
of every single order line, covering over 137,000 orders from 100,000 customers. Each order line
records customer information, such as customer ID number and ZIP Code, and transaction-related
items such as order date, shipping date, revenue, and cost.
Since we intend to segment our customers based on their return record, order cancel record,
total profit, and the time of their most recent order, we aggregate all records into a new data file
at the individual customer level. We then choose four key factors as our variables: profit, return
rate, cancel rate, and time duration since last order date. We use K-means cluster analysis as our
major segmentation method. We divide the whole data set into a calibration set and a validation
set, and conduct K-means cluster analysis on each of them to make sure that we do not miss any
meaningful group of customers. Furthermore, we repeat the analysis several times with different
clustering methods to check our results.
Once the outcomes of the K-means cluster analysis match our expectations, we summarize and
interpret our key findings. There are 8 clusters, each with meaningful features of its own.
Among them, three groups interest us most.
The first group makes up about 30% of all customers. These customers generate high profits, and
their return and cancel rates are extremely low. They also have the shortest time duration since
last order date. They are clearly the core customers of our company, and we should take action to
retain them in order to generate more profit. For example, we could offer them better service
and high-quality products to increase their loyalty and satisfaction.
The second group consists of customers who generate relatively high profits for our
company but whose cancel rate is extremely high. These customers could generate a huge
profit for us, yet they are likely to cancel their orders for a variety of reasons. What makes
things worse is that they cause additional costs for our company, since we need to provide
special services when they return items. For these customers, we need to figure out their
true needs and the reasons behind the high cancel rate. They have huge financial potential: if we
can increase their satisfaction, they could become like the first group of customers and generate
a huge profit for the company.
The third group includes customers who generate relatively low profits but whose return rate
is very high. These customers are unsatisfied with our products or services, so they keep returning
their items. This group is a huge financial burden for Saks, so we have to decrease their return
rate by figuring out the reasons and taking action to increase their satisfaction.
Finally, we analyze the major reasons behind high return and cancel rates. Based on our
previous analysis, we provide different managerial recommendations for each group according to
its significance and characteristics. These recommendations will serve to decrease customers'
return rate and cancel rate and eventually increase profits for Saks.
Table of Contents
1. Introduction
2. Background
3. Methodology and Analysis
Definition of Clustering Analysis
Data Obtained and Used
Variables selection and Explanation
Data Preparation
Calibration and Validation
Clustering Settings
Measure Interval: Euclidean Distance
Cluster Method: Ward's Method
Standardization: Z scores
Specific Operations
Findings from Clustering Results
4. Conclusion & Recommendations
5. Limitations and Future Research
6. Appendix
1. Introduction
Imagine you are a store owner selling limited-edition Prada purses, which normally cost more
than $5,000. Which kind of customer is more valuable to you: a customer who spends an average
amount of money but never returns or cancels an order, or a customer who spends a huge amount
of money but ends up returning or cancelling most of their orders? This is a significant but tricky
question for every company, especially for Saks Fifth Avenue, whose unit prices are higher.
It is said that customers are the most valuable equity of a company. As a luxury department
store, Saks sells products that are much more expensive, so every single purchase matters
a great deal to the company financially. As a result, high return and cancel rates are more
damaging for Saks than for regular department stores such as Macy's. Meanwhile, customer
satisfaction and loyalty, which directly decide the company's fate, are also extremely significant
for Saks. It is also important for us to know how often a customer comes back to purchase.
According to our background research, the major managerial issue for Saks is to increase
profit by reducing return and cancel rates and by winning back customers who have not purchased
for more than one year. Through a series of analyses and comparisons, we segment all customers
into 8 groups based on the profit they generate, the time duration since their last order date,
their return rate, and their cancel rate. Each group has its own meaningful features. Some groups
generate the highest profits but have not purchased for more than two years. Others generate high
profit but also have high return or cancel rates. We discuss each cluster in detail in the
following report, elaborating on each group's features and providing managerial
recommendations.
2. Background
Saks Fifth Avenue is a luxury department store chain that was founded in 1867. With such
a long history, Saks has built its own customer pool with a large number of loyal customers.
Most customers shop at Saks for its good service and the latest fashion. Saks carries a number
of world-famous luxury brands, including Gucci, Prada, and FENDI. Staff at Saks are very
professional and usually offer customers thoughtful advice during the purchase process.
However, based on our research, Saks cannot generate as much revenue as it did a few
years ago. Competition between department stores is becoming more and more fierce. Main
competitors of Saks such as Bloomingdale's and Neiman Marcus have put considerable pressure on
Saks through price-off promotions. Even mid-range department stores such as Macy's and
online stores like amazon.com are competing with Saks. More competition means customers have
more choices. For Saks, however, it leads to high return and cancel rates, because once customers
find a lower price on amazon.com, the first action they take is to cancel their orders on our
website. Moreover, high service costs make it more difficult for Saks to generate considerable
profits. As a result, Saks faces many more challenges than it ever did, and it needs to find a way
to solve these problems and keep growing.
In recent years, Saks has introduced its online store and app to enlarge its market
share and attract more young customers. Online shopping is an easier and cheaper way to trade
items for both customers and companies. However, it raises several issues as well. Since Saks sells
a lot of apparel and makeup, customers cannot try these products on before purchasing on
the website. Once customers find that a product does not match their expectations, they
return it. Therefore, online shopping has increased return and cancel rates, which leads the
company to incur additional costs. Our team will help Saks find solutions to these issues
by reducing return and cancel rates as well as increasing customer satisfaction.
3. Methodology and Analysis
The nature of the retailing industry makes a deep understanding of customers critically
important. Saks Fifth Avenue specializes in selling various high-end brands, including Gucci,
Burberry, and Prada. In an effort to increase the company's profit, we notice that
reducing the return rate and cancel rate could play a crucial role in achieving this objective.
Once a customer returns a product or cancels an order, we lose not only the potential profit but
also the effort we previously invested in acquiring this customer and in attracting her to visit our
stores. Therefore, we devote most of our attention to investigating returns and cancellations so
as to obtain actionable insights.
We intend to conduct cluster analysis to segment our historical customers in terms of their
return record, cancelled-order record, and total profit generated throughout their accumulated
purchases at Saks Fifth Avenue, as well as the time of their most recent order. By conducting
cluster analysis, we discover separate groups that differ from each other in these aspects. We then
compare them, identify their differences, and evaluate the possible reasons for these differences.
After understanding the characteristics and implications of these groups, we are able to come up
with corresponding recommendations that can improve their future performance.
The objective of this study is to identify different customer groups in terms of the above
four aspects and screen out specific contact information of the customers in each group for direct
marketing, eventually decreasing return rate and cancel rate, increasing customer satisfaction and
profit, and regaining old customers.
Definition of Clustering Analysis
Originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology
by Zubin in 1938 and Robert Tryon in 1939, cluster analysis is the task of grouping a set of objects
in such a way that objects in the same group (called a cluster) are more similar (in some sense or
another) to each other than to those in other groups (clusters). [Reference] The outcome of cluster
analysis is to create a set of segments from a set of individual samples. Samples in the same
segment share more commonalities with each other than they do with samples from other
segments. In business, Cluster Analysis is a popular and frequently used method to realize market
segmentation, which is an important part of marketing planning.
Our research adopts two types of clustering methods: Hierarchical Clustering and K-means
Clustering. Hierarchical Clustering is useful when the sample size is relatively small. Different
choices of clustering method and measure interval lead to different clustering results. Among
them we select the one that meets our expectations in terms of segment size, segment
characteristics, and between-segment differences. Hierarchical clustering provides exploratory
insight for the subsequent K-means clustering analysis, which, together with hierarchical
clustering, is capable of segmenting large samples. An effective and efficient cluster analysis on
a large data set requires the combination of these two clustering methods. The flowchart (see
Fig. 3.1) shows the procedure of our cluster analysis.
Figure 3.1 Analysis Flowchart
Data Obtained and Used
The data analyzed in our research comes from the Customer Relationship Management
database at Saks. This database records a wide range of historical sales information at the level
of every single order line. Each order line records customer information, including customer ID
number and ZIP Code, and key items recorded during a transaction, such as order date, shipping
date, price, and cost. Different order lines may have the same order number, showing that these
order lines belong to the same order. By the same token, different order numbers may have the same
customer number, meaning that the customer placed these orders at different times.
The data set we obtain is large enough to produce representative insights.
We select the records from 12/16/2004 to 09/17/2012, covering more than 226,000 order
line records. These records reflect over 137,000 orders from 100,000 customers.
Variables selection and Explanation
After understanding the descriptions of the variables in a record line, we determine four
clustering variables. They are: Total Profit, Return Rate, Cancel Rate, and Time Duration since
Last Order Date. These variables are not included in the current data set but can be calculated from
some of the existing variables. There are other variables we will use to describe the features and
attributes of our result segments, including Customer Number, Zip Code, etc. Each of the
clustering variables has its unique meaning and implication for us.
Profit is the most important indicator of a customer's value. The higher the profit a
customer generates, the more imperative it is to retain him/her. Generally, a company has finite
resources available for customer relationship management. If it invests equal resources in every
customer regardless of value, it will inevitably end up with highly profitable customers who are
not served and retained well and with less profitable customers who occupy many resources
but do not create enough profit in return. Therefore, while our final objective is to come up with
actionable recommendations for different customer groups, Total Profit tells us which group
requires more attention and, hence, more resources.
Return Rate conveys important information about the purchasing behavior of a
customer. A high return rate has many possible causes. For example, an unclear or misleading
product description could result in customers' complaints after receiving their packages, which
often lead to returns. A high return rate could also be attributed to a customer's particular
taste. No matter what leads to a high return rate, the higher the return rate, the more profit we
lose. While increasing revenue is one pathway to greater profit, lowering unnecessary loss is also
an effective one. Saks focuses on the high-end niche market. High-end brands, generally speaking,
have smaller sales volumes than lower-tier brands, but they invest more to support their high-end
brand positioning and marketing. Saks's well-qualified salespeople, high rental fees, and high
advertising budget imply a high operational cost. Once a return or cancellation happens, even
though most products can be resold, much of this cost is wasted. This is another reason we attach
great importance to return rate and cancel rate. With these considerations, we select return rate
as one of our clustering variables.
Cancel Rate refers to the ratio of one's cancelled order lines to total order lines. Like
return rate, it has a strong relation to profit, but in a different way. While returning is a
customer's decision after receiving the ordered products, cancelling means a customer changes her
mind before that. We assume that returns mostly come down to dissatisfaction with our
products, whereas cancellations imply unsatisfying purchasing experiences such as chaotic
shopping guidance and poor customer service. As with returns, a decreased cancel
rate brings a corresponding increase in profit. Therefore, we put cancel rate in our variable list.
Time Duration since Last Order Date
Time duration since last order date is the time period between the date a customer placed
his/her last order and the current date. We audit the data set and find that a large number of
customers have not shopped at Saks for a long time. The longer the duration, the higher the
possibility that the customer has already defected. This variable matters to our decision making
because marketing strategies and plans can be totally different for new customers and old ones,
and so can the resulting marketing effects. Relatively new customers are easier to contact and
attract because their contact information is up to date and because they have a stronger
connection with our brand and products. On the other hand, customers who have not come back for
more than three years are of lesser value and priority for the opposite reasons. Therefore,
differentiating new and old customers through clustering is meaningful.
The records in the data set are organized by single order line. Since the objective
of our cluster analysis is to acquire information on an individual basis, we aggregate all
records into a format with customer number as the key. We then audit the aggregated data set
and determine the calculations needed to transform existing variables into the four clustering
variables. Table 3.1 offers a comprehensive explanation of the variables used and the ones we
compute, in the order in which they are used throughout this analysis.
Table 3.1 Overview of the Variables Used
Customer Number: A unique 11-digit numeric customer identification string. Each customer has only one customer number.
ZIP Code: 5-digit ZIP Code referring to the location from which a customer places an order.
Order Number: A unique 9-digit numeric string referring to a specific order. One customer may have placed more than one order, each with a different order number.
Order Line: Line number for each unique product in an order.
Order Date: Date an order was placed.
Quantity: Quantity of a product in an order.
Revenue: The total price of an order line.
Cost: The total cost of an order line.
Return Quantity: The quantity of returned product.
Computed Variables before aggregation
Profit: The profit of a single order line.
Calculation: Profit = Revenue - Cost
Time Duration since Order Date (Month): The time duration between today and the day the order was placed.
Calculation: Time Duration since Order Date = Date of Today - Order Date
Aggregated Variables (Aggregate by Customer Number)
Last ZIP Code: The ZIP Code from which a customer placed his/her last order.
Total Profit: The summed Profit of all of a customer's orders.
Time Duration since Last Order Date (Month): The time duration between today and the day the customer's last order was placed.
Total Quantity: The total quantity of products a customer has ever purchased, including returned and cancelled quantity.
Total Return Quantity: The total quantity of products a customer has ever returned.
Total Cancel Quantity: The total quantity of products a customer has ever cancelled.
Computed Variables after aggregation
Return Rate: The return rate of a customer's historical purchases.
Calculation: Return Rate = Total Return Quantity / Total Quantity
Cancel Rate: The cancel rate of a customer's historical purchases.
Calculation: Cancel Rate = Total Cancel Quantity / Total Quantity
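The calculations in Table 3.1 can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the SPSS procedure used in the study: the field names (`customer_number`, `return_quantity`, and so on), the sample order lines, and the whole-calendar-month convention in `months_between` are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def months_between(earlier, later):
    # Whole calendar months between two dates (an assumed convention).
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def aggregate_by_customer(order_lines, today):
    # Roll order lines up to one record per customer number.
    totals = defaultdict(lambda: {"profit": 0.0, "quantity": 0, "returned": 0,
                                  "cancelled": 0, "last_date": None, "last_zip": None})
    for line in order_lines:
        t = totals[line["customer_number"]]
        t["profit"] += line["revenue"] - line["cost"]  # Profit = Revenue - Cost
        t["quantity"] += line["quantity"] or 0         # missing values count as zero
        t["returned"] += line["return_quantity"] or 0
        t["cancelled"] += line["cancel_quantity"] or 0
        if t["last_date"] is None or line["order_date"] >= t["last_date"]:
            t["last_date"] = line["order_date"]
            t["last_zip"] = line["zip_code"]
    summary = {}
    for customer, t in totals.items():
        qty = t["quantity"]
        summary[customer] = {
            "last_zip": t["last_zip"],
            "total_profit": t["profit"],
            "months_since_last_order": months_between(t["last_date"], today),
            "return_rate": t["returned"] / qty if qty else 0.0,
            "cancel_rate": t["cancelled"] / qty if qty else 0.0,
        }
    return summary


# Hypothetical order lines; None marks a missing quantity field.
SAMPLE_LINES = [
    {"customer_number": "A", "zip_code": "10022", "order_date": date(2012, 1, 15),
     "quantity": 2, "revenue": 500.0, "cost": 300.0,
     "return_quantity": 1, "cancel_quantity": None},
    {"customer_number": "A", "zip_code": "10001", "order_date": date(2012, 6, 1),
     "quantity": 2, "revenue": 200.0, "cost": 120.0,
     "return_quantity": None, "cancel_quantity": 1},
    {"customer_number": "B", "zip_code": "60601", "order_date": date(2010, 3, 10),
     "quantity": 1, "revenue": 100.0, "cost": 60.0,
     "return_quantity": None, "cancel_quantity": None},
]

summary = aggregate_by_customer(SAMPLE_LINES, today=date(2012, 9, 17))
print(summary["A"])
print(summary["B"])
```

In this sketch the rates are fractions between 0 and 1, following the formulas in Table 3.1; multiplying by 100 gives percentages.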
Calibration and Validation
After the four clustering variables are ready, we divide the data set into two subsets: a
Calibration sample set (60% of all records) and a Validation sample set (40% of
all records). The Calibration sample set is used to generate a promising division into
segments, while the Validation sample set is used to verify whether that division is appropriate
and representative. Conducting clustering on both sets ensures that no meaningful segments are
missed. If the two results agree, a clustering on the entire data set is conducted to further
confirm that division. This verification mechanism helps safeguard the accuracy and
representativeness of our results.
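A random 60/40 split of this kind can be sketched as follows. The function name, the fixed seed, and the use of integer customer IDs are our own assumptions for illustration; the study itself performed the split in SPSS, where the shares are only approximately 60% and 40%.

```python
import random


def split_calibration_validation(customer_ids, calibration_share=0.6, seed=0):
    # Shuffle a copy of the IDs, then cut it at the requested share.
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * calibration_share)
    return shuffled[:cut], shuffled[cut:]


ids = list(range(1000))  # hypothetical customer numbers
calibration, validation = split_calibration_validation(ids)
print(len(calibration), len(validation))
```

Every customer lands in exactly one of the two sets, so clustering results from the two sets can be compared without overlap.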
From a random 10% sample of the calibration set, we formulate our approach to
hierarchical clustering. There are three crucial decisions: the cluster method, the
measure interval, and whether or not to standardize the clustering variables.
Measure Interval: Euclidean Distance
Measure Interval determines how the distance between two samples is calculated.
Two popular measure interval metrics are Squared Euclidean Distance and Euclidean Distance.
If the distance between two given samples is X under Euclidean Distance, it
becomes X^2 under Squared Euclidean Distance. Squared Euclidean Distance
amplifies the numeric value of a given distance, and thus the variance between samples is enlarged.
An enlarged variance pushes two samples apart. However, we prefer two similar samples to
converge rather than drift apart. Therefore, we select Euclidean Distance.
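The relationship between the two measures can be checked with a short sketch. The sample points are hypothetical. Note that squaring enlarges distances greater than 1 but shrinks distances smaller than 1, which is worth keeping in mind when working with standardized variables.

```python
import math


def euclidean(a, b):
    # Straight-line distance between two samples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    # Same sum of squared differences, without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))


far_a, far_b = (0.0, 0.0), (3.0, 4.0)    # hypothetical samples far apart
near_a, near_b = (0.0, 0.0), (0.5, 0.0)  # hypothetical samples close together

d_far = euclidean(far_a, far_b)
sq_far = squared_euclidean(far_a, far_b)
d_near = euclidean(near_a, near_b)
sq_near = squared_euclidean(near_a, near_b)
print(d_far, sq_far, d_near, sq_near)
```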
Cluster Method: Ward's Method
Cluster Method determines the criterion for judging the distance between two clusters. Two
alternative methods are Furthest Neighbor and Ward's method. The Furthest Neighbor method
takes the longest distance between any two members of the two clusters as the distance
between the two clusters. This method is effective in identifying small sample groups that are
conspicuously different from the others; correspondingly, its outcome tends to
have the majority of samples concentrated in a few large groups, with the remaining minority of
samples assigned to much smaller groups. Ward's method, on the other hand, uses the sum of
squared errors as the measure of distance and thus tends to produce groups of similar size.
Our analysis aims at identifying groups with different characteristics with respect to the four
clustering variables. The identified groups should be large enough for actionable marketing
campaigns, which means that some of the segments identified by the Furthest Neighbor method might
be too small to meet our expectations. In contrast, Ward's method provides groups with a
relatively even sample distribution and is our choice.
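To make the two criteria concrete, the sketch below computes both between-cluster distances for two small hypothetical clusters. We express Ward's criterion as the increase in the within-cluster sum of squared errors caused by merging two clusters, which is one common formulation; the exact computation inside a statistics package may differ.

```python
import math
from itertools import product


def centroid(cluster):
    # Mean of each coordinate across the cluster's points.
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))


def sse(cluster):
    # Sum of squared distances from each point to the cluster centroid.
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)


def furthest_neighbor_distance(a, b):
    # Longest distance between any member of a and any member of b.
    return max(math.dist(p, q) for p, q in product(a, b))


def ward_merge_cost(a, b):
    # Increase in total SSE if clusters a and b were merged.
    return sse(a + b) - sse(a) - sse(b)


cluster_a = [(0.0, 0.0), (0.0, 2.0)]  # hypothetical clusters
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

furthest = furthest_neighbor_distance(cluster_a, cluster_b)
ward_cost = ward_merge_cost(cluster_a, cluster_b)
print(furthest, ward_cost)
```

Because the merge cost grows with the sizes of the clusters being merged, Ward's criterion discourages absorbing everything into one large cluster, which is consistent with its tendency to produce groups of similar size.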
Standardization: Z scores
Standardization is required when clustering variables produce different weighted
influences on the result. Standardization transforms the variables into comparable forms so that
they have equal influences and significances. In our study, the four variables have obviously
different value ranges and variances. To guarantee an accurate analysis, we standardize them.
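As a minimal sketch of this step, the function below converts one variable into Z scores. It assumes the sample standard deviation (n - 1 denominator), and the input values are hypothetical.

```python
import statistics


def z_scores(values):
    # Subtract the mean and divide by the sample standard deviation.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), an assumption
    return [(v - mean) / sd for v in values]


profits = [10.0, 20.0, 30.0]  # hypothetical customer profits
standardized = z_scores(profits)
print(standardized)
```

After this transformation each variable has mean 0 and standard deviation 1, so a variable measured in dollars no longer dominates one measured as a rate.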
First, for our purposes, we must create several new variables in order to carry out
further analysis. Since customer profit is the key factor in our analysis, we use the following
equation to calculate a new variable named Profit.
Profit = Revenue - Cost (see note 1)
Because we want to know how many months have passed since each customer's last order date, we use
the Date and Time Wizard to create a new variable named Time Duration. Second, we use
Recode into Same Variables to replace the missing values in cancel quantity, return quantity, and
quantity with zero. Third, we aggregate the original data file into a new data file. The break
variable is Customer Number, and the aggregated variables are zip code (last), profit (sum),
last order date (minimum), cancel quantity (sum), return quantity (sum), and quantity (sum).
Finally, because we want to know each customer's return rate and cancel rate, we create two new
variables named Return Rate and Cancel Rate using the following formulas:
Return rate = return quantity / quantity
Cancel rate = cancel quantity / quantity
Note 1: Strictly speaking, the precise total profit of a customer should be calculated by the formula Profit = (Revenue - Cost) * [1 - (Return Quantity + Cancel Quantity) / Quantity]. However, this calculation loses the ability to show a customer's potential purchasing power, since his/her returned and cancelled profit is excluded. In our study, we want to investigate customers' true purchasing power, and therefore we use Profit = Revenue - Cost.
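The difference between the two profit definitions discussed in the footnote can be seen in a small worked example. All figures below are hypothetical.

```python
def potential_profit(revenue, cost):
    # Definition used in the study: Profit = Revenue - Cost.
    return revenue - cost


def realized_profit(revenue, cost, quantity, return_quantity, cancel_quantity):
    # Footnote definition: scale profit by the share of units actually kept.
    kept_share = 1 - (return_quantity + cancel_quantity) / quantity
    return (revenue - cost) * kept_share


# Hypothetical customer: 4 units ordered, 1 returned, 1 cancelled.
potential = potential_profit(revenue=500.0, cost=300.0)
realized = realized_profit(revenue=500.0, cost=300.0, quantity=4,
                           return_quantity=1, cancel_quantity=1)
print(potential, realized)
```

For this customer the potential profit is twice the realized profit, which is the gap the return and cancel rates are meant to expose.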
Standardize Decision Variables
We calculate Z-Scores for all decision variables including Profit, Time Duration,
Return Rate and Cancel Rate. Then we save them as new variables.
Split the Sample
We use Select Cases to split the whole data into a calibration sample which is about 60%
of all data and a validation sample which is about 40% of all data.
First, we choose 10% of the calibration sample as our small subset. Second, we run
Hierarchical Cluster Analysis to determine the number of clusters. We choose Ward's method,
Euclidean distance, and Z scores as our settings. According to the marked line, we choose 6 to 8
as the range of solutions. Based on a comparison of the Custom Tables, we choose 8
as the number of clusters because it gives the clearest and most meaningful managerial insights.
The detailed Custom Tables are attached in the Appendix. Third, we conduct Hierarchical Cluster
Analysis to identify the cluster centers. Finally, we save the outcome in a new data file as initial
seeds (see Table 1 in the Appendix).
K-Means Cluster Analysis
We use the results of the Hierarchical Cluster Analysis as initial seeds and conduct K-means
Cluster Analysis on the Calibration Sample. We choose 80 as the maximum number of iterations and
save cluster membership. There are 60,025 valid cases and 54 missing cases. The Initial Cluster
Centers (see Table 2 in the Appendix), Iteration History (see Table 3 in the Appendix), Final
Cluster Centers (see Table 4 in the Appendix), and Number of Cases in each
Cluster (see Table 5 in the Appendix) are attached in the Appendix.
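The idea of seeding K-means with previously identified cluster centers can be sketched as below. This is a plain textbook version of the algorithm written for illustration, not the SPSS routine used in the study; the points and seeds are hypothetical, and only the cap of 80 iterations is taken from the text.

```python
import math


def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def k_means(points, seeds, max_iterations=80):
    # Start from the given seeds instead of random centers.
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iterations):
        new_labels = assign(points, centers)
        if new_labels == labels:  # memberships stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


# Hypothetical standardized points and seeds from a prior hierarchical run.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = [(0.0, 0.0), (10.0, 10.0)]

centers, labels = k_means(points, seeds)
print(centers, labels)
```

Good seeds matter because K-means only refines the starting centers; poor starting centers can lead to a different and less meaningful final partition.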
To make sure we obtain an optimal result, we use different random subsets and
different methods to conduct Hierarchical Cluster Analysis. When we use Furthest Neighbor and
Squared Euclidean distance, the outcome is clearly inappropriate because most of the data
is concentrated in 2 clusters, while the other 6 clusters have extremely small and meaningless
counts. More importantly, we cannot find the group we are looking for, which has a high return
rate and a high cancel rate. We then save these outcomes as initial seeds in order to run K-means
Cluster Analysis, and conduct K-means Cluster Analysis using the different initial seeds. As
expected, the calibration sample results, the validation sample results, and the all-data results
do not match one another.
Finalize Calibration Results
Based on our previous analysis, we finalize our decision by running K-means Cluster
Analysis on the Calibration sample. The calibration results are shown in Table 3.2.
First, we conduct Hierarchical Cluster Analysis to identify the cluster centers. We again
use Ward's method and Euclidean distance when we run the Hierarchical
Cluster Analysis. Then we run K-means Cluster Analysis on the Validation sample using the new
initial seeds. There are 39,872 valid cases and 45 missing cases. The Initial Cluster
Centers (see Table 6 in the Appendix), Iteration History (see Table 7 in the Appendix), Final
Cluster Centers (see Table 8 in the Appendix), and Number of Cases in each Cluster (see Table 9
in the Appendix) are attached in the Appendix. The validation results are shown in Table 3.3.
Table 3.2 Calibration Clustering Results
Cluster   Count   Mean Profit ($)   Mean Return Rate (%)   Mean Cancel Rate (%)   Mean Time Duration (months)
1          7057        67.6                0.79                   0.44                    81
2          1021       916.98               3.41                   5.43                    26
3         17777       100.48               0.6                    0.15                    13
4          2802       168.81              24.94                  16.47                    35
5         13597        77.49               0.28                   0.09                    61
6          2626        67.79              98.04                   0.02                    41
7         13894        79.26               0.1                    0.02                    38
8          1251       107.68               0.2                   97.09                    46
Compare and Finalize
We compare the calibration results and the validation results, and they are consistent.
In particular, the most managerially meaningful clusters, those with high return and cancel rates,
are consistent. Therefore, we conduct Hierarchical Cluster Analysis to identify the cluster centers,
again using Ward's method and Euclidean distance. Then we run K-means Cluster
Analysis on all data using the new initial seeds. There are 99,897 valid cases and 99 missing
cases. The Initial Cluster Centers (see Table 10 in the Appendix), Iteration History
(see Table 11 in the Appendix), Final Cluster Centers (see Table 12 in the Appendix), and
Number of Cases in each Cluster (see Table 13 in the Appendix) are attached in the Appendix.
The all-data results are shown in Table 3.4.
Table 3.3 Validation Clustering Results
Cluster   Count   Mean Profit ($)   Mean Return Rate (%)   Mean Cancel Rate (%)   Mean Time Duration (months)
1         11681        86.26               0.89                   0.17                    13
2          1604       525.19               4.02                   2.96                    25
3            99      1939                  2.91                  10.45                    27
4          2395        78.24              85.14                   0.01                    40
5         12078        70.47               0.4                    0.09                    69
6         10425        73.82               0.6                    0.07                    41
7           810       161.98               4.05                  42.07                    41
8           780        93.1                0                     99.93                    46
Findings from Clustering Results
Table 3.4 All Data Clustering Results
Cluster   Count   Mean Profit ($)   Mean Return Rate (%)   Mean Cancel Rate (%)   Mean Time Duration (months)
1         11778        67.38               0.8                    0.4                     81
2         23015        78.05               0.11                   0.02                    38
3          1717       914.13               3.47                   5.28                    26
4          4726       172.52              24.78                  16.64                    35
5         22793        77.4                0.29                   0.09                    61
6         29415       100.16               0.59                   0.14                    13
7          4371        67.23              98.08                   0.01                    41
8          2082       103.31               0.17                  97.41                    46
Cluster 3 is one of the key customer clusters because this group contributes the
highest mean profit, $914.13, to us. Its return rate of 3.47% and cancel rate of
5.28% are also relatively low. The average time duration since last order date is 26 months,
which is the second shortest among all clusters.
Cluster 6 is also extremely important for us because its mean profit, $100.16,
is relatively high. Its time since last order is the shortest among all clusters, and its return
rate and cancel rate are both under 1%. Moreover, this cluster has the largest number of
customers, making up nearly 30% of the whole data sample.
Cluster 4 is one of the clusters our team wants to highlight. The customers in this group
have the second highest profit, $172.52, and the third shortest time duration since last order
date, 35 months. However, their return rate and cancel rate are 24.78% and 16.64%
respectively. We can obtain a huge financial return if we can lower their return rate and cancel
rate.
Cluster 7 is another group we want to analyze in depth. The profit of this cluster is
$67.23, its cancel rate is 0.01%, and its time since last order is 41 months. However, we are
surprised that its return rate is 98.08%. This means we have spent a large amount of money serving
this group of customers, and they have a huge negative influence on our company's financial status.
We can cut the company's costs substantially by decreasing this cluster's return rate.
Cluster 8 is surprising as well. Its mean profit is $103.31, the third highest among all
clusters. Its return rate is 0.17%, and its time since the last order date is 46 months.
However, its cancel rate is 97.41%. From our perspective, this group of customers has large
profit potential if we can optimize our purchase process to lower the cancel rate.
Clusters 2 and 5 are also significant for our analysis for several reasons. Both groups have a
large number of customers. Although the mean profit of each group is under $80, their return
rates and cancel rates are extremely low, all under 0.3%. We also need to note that their time
since the last order date is more than 3 years (38 and 61 months), so we must figure out how to
re-engage these old customers.
Cluster 1 is relatively unimportant for this analysis. Although its return rate and cancel rate
are low, its mean profit is only $67.38, and these customers have not bought any product from
our store for more than 6 years. Thus, it is very hard to re-target this group of people.
4. Conclusion & Recommendations
After the data analysis, we segment our customers into 8 groups. Our goal is to decrease the
return rate and cancel rate so that we can improve our customers' profitability and satisfaction.
We also want to regain our old customers and increase their loyalty. Based on Table 3.4, we
create a pie chart (see Fig. 4.1) that illustrates the percentage of total profits that each
segment makes up.
Figure 4.1 Profit Distribution among Groups
From Fig. 4.1, we can clearly see that clusters 6, 2, 5, and 3 contribute the majority (79%) of
our total profits. These customers are our key customers in terms of the total profits they generate.
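The 79% figure can be cross-checked against Table 3.4. The sketch below assumes that a cluster's total profit equals its mean profit multiplied by its customer count; how Fig. 4.1 was actually produced is not stated, so this is only one plausible reading. The names `TABLE_3_4`, `profit_shares`, and `customer_shares` are illustrative.

```python
# Cluster means and customer counts copied from Table 3.4:
# cluster -> (mean profit in $, number of customers).
TABLE_3_4 = {
    1: (67.38, 11778),
    2: (78.05, 23015),
    3: (914.13, 1717),
    4: (172.52, 4726),
    5: (77.40, 22793),
    6: (100.16, 29415),
    7: (67.23, 4371),
    8: (103.31, 2082),
}


def profit_shares(table):
    """Share of total profit per cluster (assumption: total = mean profit x customers)."""
    totals = {cluster: mean * count for cluster, (mean, count) in table.items()}
    grand_total = sum(totals.values())
    return {cluster: total / grand_total for cluster, total in totals.items()}


def customer_shares(table):
    """Share of all customers per cluster."""
    all_customers = sum(count for _, count in table.values())
    return {cluster: count / all_customers for cluster, (_, count) in table.items()}


shares = profit_shares(TABLE_3_4)
key_cluster_share = sum(shares[c] for c in (6, 2, 5, 3))

if __name__ == "__main__":
    for cluster, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"cluster {cluster}: {share:.1%} of total profit")
    print(f"clusters 6, 2, 5 and 3 together: {key_cluster_share:.1%}")
```

Under this assumption, clusters 6, 2, 5, and 3 together account for roughly 79% of total profit, and cluster 6 holds about 29% of all customers, which agrees with the figures quoted in the text.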
According to Table 3.4, customers in cluster 6 have the shortest time since the last
order date, which suggests that these people currently have the highest awareness of Saks among
all customers and a stronger connection to us. We need to retain these customers for long-term
development because they are more likely to bring future profits. In addition, the fact
that their return rate and cancel rate are both low shows that they are currently satisfied with
our products and services.
Cluster 2 has the lowest return rate and the second-lowest cancel rate. These customers are
highly satisfied with our products and services, and they purchase products at Saks with little
hesitation. But they have not placed an order for more than 3 years, so it is important for us to
win them back.
For cluster 5, customers' return rate and cancel rate are low, so their satisfaction is stable.
But they have not placed an order for more than 5 years, and the mean profit of this group is
relatively low. Low return/cancel rates combined with low profits suggest that these customers
may be concerned that the return/cancel process will be troublesome, so they are unwilling to buy
products with a very high price. For these customers, we need to ease their worries and convey
the message that Saks is the ideal store for buying high-end products. Meanwhile, we should update
their personal information and preferences since they have not purchased products from us for
more than 5 years.
Since the mean profit in cluster 3 is the highest, these customers are valuable to us. Their
return rate and cancel rate are relatively low, but we still need to decrease both
in order to increase their satisfaction as much as possible. This group's time since the last
order date is the second shortest, so we need to retain them and persuade them to build a
long-term, trusting relationship with us, helping us generate more profits in the future.
Cluster 8 has the second-lowest return rate, so these people are at least satisfied with the
products that they have already bought. However, their cancel rate is extremely high, which means
we lost most of the potential profits from the purchases they originally intended to make.
Meanwhile, customers in cluster 8 have a long time since their last order date, which may mean
that they are unwilling to purchase from our store because of a bad purchase experience.
For example, they may be disappointed with our website's slow update frequency or long
shipping times. Thus, we may need to regain these customers by setting up specific strategies
that target their needs more efficiently.
Although the profits that cluster 4 brings us are very high, these customers' net profits are
not as high as the table suggests because of their high return rate and cancel rate. This
situation indicates that these customers are dissatisfied with our products or services. We
should improve the quality of our products and optimize our services to convince them to keep
purchasing from Saks with a lower return rate and cancel rate. In this way, we can prevent the
loss of potential profits from these customers.
Whether measured by total profits or mean profits, customers in cluster 7 generate low
profits for us. Their return rate is the highest, which means they return almost all the products
that they purchase. Although our employees spend much time and effort serving them and
trying to meet their needs, these people return most of our products. So there must be something
wrong with our products or services. Since this group of customers currently has a negative
influence on our financial situation, our spending on them will be more productive and efficient
if we can lower their return rate.
In order to provide appropriate recommendations for our customers based on their different
characteristics, we need to analyze some reasons for consumers' return and cancelation behaviors.
The difference between these two behaviors is that returns happen after customers have already
purchased products, whereas cancelations happen before customers have paid for the product.
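To make the two rates concrete, the sketch below aggregates order lines into one record per customer. The report does not give the exact formulas, so the definitions here are assumptions: the return rate is the percentage of a customer's order lines that were returned, and the cancel rate is the percentage that were canceled. `OrderLine` and `customer_rates` are hypothetical names, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OrderLine:
    """One order line; the field names are illustrative, not the CRM schema."""
    customer_id: str
    profit: float
    returned: bool = False   # product was purchased, then sent back
    canceled: bool = False   # order was withdrawn before payment


def customer_rates(lines):
    """Aggregate order lines to one record per customer (assumed rate definitions)."""
    grouped = defaultdict(list)
    for line in lines:
        grouped[line.customer_id].append(line)

    result = {}
    for customer_id, items in grouped.items():
        n = len(items)
        result[customer_id] = {
            "profit": sum(item.profit for item in items),
            "return_rate": 100.0 * sum(item.returned for item in items) / n,
            "cancel_rate": 100.0 * sum(item.canceled for item in items) / n,
        }
    return result


# Made-up sample: customer A has four order lines, customer B has one.
sample = [
    OrderLine("A", 40.0),
    OrderLine("A", 25.0, returned=True),
    OrderLine("A", 0.0, canceled=True),
    OrderLine("A", 35.0),
    OrderLine("B", 60.0),
]
rates = customer_rates(sample)
```

Other definitions are possible, for example rates weighted by revenue rather than by number of order lines, and they would shift the cluster boundaries.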
As we all know, Saks Fifth Avenue is both a retailer and an e-retailer. In our physical stores,
customers return items largely because our staff cannot provide proper product
information or shopping advice. On the other hand, an increasing number of
customers are purchasing products on our official website or app, which raises other problems.
For example, when a customer purchases a pair of shoes on our website, he/she cannot look at or
try on the product in person. Many customers will be disappointed when they receive their
packages because the products do not match their expectations. For these reasons, customers are
more likely to return their products.
In addition, more and more online retailers are appearing, which gives people many opportunities
to compare prices. They can easily find a better price for the same product on other websites,
and once they find it, they will switch to other retailers. Our team has summarized several
possible reasons for return behaviors:
The product itself cannot satisfy our customers. For instance, if a customer bought a sweater
on our website and was not satisfied with the material, she might return the
product.
Another common situation is that the product is damaged during shipping. In
this situation, the customer will almost certainly return the product.
The description of the product is not consistent with the real product, or the details of the
product are not provided clearly. The higher the expectations customers form from
the description on our website, the more disappointed they will be if the product does not match
it.
Shopping guides do not offer clear explanations to our customers. When customers ask our
shopping guides for advice or information in our physical stores, it is possible that the
guides are unable to provide proper advice. Misleading information and advice will
probably result in returns.
Poor post-purchase service is another important factor that causes people to return their
products. Saks is a high-end retailer, so the prices of our products are relatively high. When
customers pay a premium for a product, they have higher expectations of customer
service. If our post-purchase service cannot solve their problems promptly and effectively, they
may return their products as well. For instance, when a customer calls our representative to
request an exchange and we process the request very slowly, the customer may run out of
patience and decide to return the product outright.
Our team has summarized several possible reasons for cancelation behaviors:
Customers make mistakes when they place an order. For instance, they may find that
they chose the wrong size or color when they check out. In this situation, they will
cancel the order and replace it with the correct one, so this kind of cancelation does not
materially affect our sales. However, we still need to provide a clearer website design and
better information to help customers place orders correctly. Another case is that
customers fill in the wrong personal information when they check out, so they need to cancel
the order and order the product again. This case does not have a significant influence on
our profits because customers usually place the order again.
Customers find a better offer on other websites. As more and more online retailers appear,
many customers are used to comparing prices of the same product on different websites before
they check out. Once they find a better offer from another online retailer, they will cancel the
previous order on our website.
Personal factors. Customers often put items in their shopping carts when
they are stimulated by some external incentive, but they still hesitate to buy. Products from
Saks usually have high prices, so a majority of customers need a longer time to consider. After
the impulse fades, most customers think rationally again and decide to
cancel the order.
Based on the previous analysis, Saks can prevent many customer returns and cancelations
by taking practical actions. We hereby provide managerial recommendations based on
each group's characteristics.
Regarding cluster 7, customers generate relatively low profits, but their return rate is the
highest. Clearly, we need to decrease the return rate in order to encourage them to spend
more at Saks. First, we should improve the quality of the information on our website, for
example by providing more detailed product descriptions. In this way, customers can
better understand products before they purchase them.
Second, Saks should use better shipping packaging to protect products from
damage in transit. According to our research, customers care more
about packaging when they pay high prices for products. So, high-quality packaging can not
only convey a good impression of our company but also match customers' expectations.
Besides, this group's profits may be relatively low because of their frequent returns, so
if we can decrease their return rate, their profits should increase somewhat.
Regarding cluster 8, this group generates relatively high profits, but it also has the highest
cancel rate and has not purchased products from us for a long time. Saks should provide these
customers with more straightforward product information when they shop on our
website so as to reduce the chance of misleading them. In addition, Saks should highlight
low stock next to the quantity box to hint to customers that the product may
soon be unavailable. In this way, we can greatly reduce the time they hesitate and
motivate them to pay for the order immediately.
Besides, we can show customers the number of people who are viewing the product at the
same time. Giving them the impression that the product is popular can motivate them to
complete the transaction quickly. A lot of potential profits will be realized if this group's
cancel rate can be decreased. Since we have these customers' contact information, Saks
should send them greeting emails to show that we care. By telling them about recent changes at
our company and our new arrivals, we can spark their interest again.
Regarding cluster 4, the mean profit of this group is the second highest, but its cancel rate
and return rate are relatively high among all groups. First, we need to systematically train
our salespersons and shopping guides so that they can provide more appropriate
advice and information to our customers. Considering that this group has not placed an order
with us for more than two years, it would be helpful to retarget them by sending
promotional emails seasonally, especially around holidays. To prevent returns,
we can also offer them discount coupons for their next purchase if they agree to keep their
products this time. If they insist on returning, we can offer them a partial refund, such as 5%
of the original price, to convince them not to return.
Regarding cluster 3, this group generates much higher profits than the other groups, so these
VIP customers' returns and cancelations have a more serious negative effect on our profits.
Saks should provide a personal shopping guide for each of them so that we can become aware of
and solve their problems correctly and in a timely manner. Saks will gain large financial
returns if we can decrease these VIP customers' return rate to below 1%.
Regarding clusters 2 and 5, the mean profits of these two groups are at a middle level, and
their cancel rates and return rates are extremely low. Based on our previous analysis, these
customers may be concerned that the return and cancelation process would
inconvenience them, so they are unwilling to purchase high-priced items. For these customers, we
need to guarantee that if they are not satisfied with our products, they have multiple
channels to contact us, and we will deal with their problems within 24 hours. We believe that
they will spend more money if Saks's shopping process becomes more convenient.
Regarding cluster 6, this group is the largest, accounting for about 30% of all
customers. Its mean profit is relatively high. More importantly, the time since
their last order is the shortest. In this situation, we should send them promotional emails or
mailings more frequently to maintain their interest and to convince them to keep purchasing
from us. For example, we can send them promotional coupons, such as a 10% discount. We
also want these customers to generate more profits for our company because they have
profit potential. Thus, we can try to offer them information about high-end brands'
products through emails or mailings, in an effort to persuade these customers to buy
higher-priced products.
5. Limitations and Future Research
Though we successfully identify 8 groups with diverse characteristics, we understand our
analysis has its limitations.
We lack some supporting data to inform decision making and reinforce our
recommendations. Our study aims at identifying and investigating actionable customer groups
with unique features. For example, for a group with a high return rate, lowering its return
rate would increase its profit. However, the current data can identify which customers have high
return/cancel rates but does not enable us to investigate why they return and/or cancel
orders. As discussed in the previous sections, the reasons leading to high return/cancel rates
are diverse. Knowing the motivations and reasons behind returns and cancelations would enable us
to improve and to avoid similar situations in the future. Unfortunately, we could not learn
relevant insights from the current data; otherwise, we would have been able to come up with more
specific recommendations for different segments.
For future research, we have to extend the diversity of our data, especially by adding data that
helps us learn the reasons for returns and cancelations. Saks has two major retail channels:
online stores and offline stores. Comprehensively analyzing the entire customer pool requires an
improved data collection mechanism. For the online channel, one suggestion for future data
collection is to add a check box listing possible return/cancel reasons on the after-sale
service page. The check box window would appear when customers apply for a return or a
cancelation so that our database can record and store what we need. By the same token, when
customers return items in offline stores, our sales assistants should also ask for their return
reasons and record them in the sales system.
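As a rough illustration of what such a reason record could look like, the sketch below defines one record per return or cancelation and tallies the reasons. Everything here is hypothetical: the field names, the `Reason` categories (drawn from the reasons listed in Section 4, plus an added catch-all), and the sample records are assumptions, not an existing Saks schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Reasons taken from the lists in Section 4; OTHER is an added catch-all."""
    PRODUCT_UNSATISFACTORY = "product did not satisfy the customer"
    DAMAGED_IN_SHIPPING = "product damaged during shipping"
    DESCRIPTION_MISMATCH = "description did not match the product"
    POOR_SHOPPING_ADVICE = "unclear advice from shopping guide"
    POOR_POST_PURCHASE_SERVICE = "poor post-purchase service"
    ORDER_MISTAKE = "mistake when placing the order"
    BETTER_OFFER_ELSEWHERE = "better offer on another website"
    PERSONAL_FACTORS = "personal factors"
    OTHER = "other"


@dataclass(frozen=True)
class ReasonRecord:
    """One row written when a customer returns or cancels (hypothetical fields)."""
    customer_id: str
    order_id: str
    action: str   # "return" or "cancel"
    channel: str  # "online" (check box) or "offline" (entered by a sales assistant)
    reason: Reason


def reason_counts(records, action):
    """Count how often each reason was given for the chosen action."""
    return Counter(r.reason for r in records if r.action == action)


# Made-up records for illustration only.
records = [
    ReasonRecord("C001", "O100", "return", "online", Reason.DESCRIPTION_MISMATCH),
    ReasonRecord("C002", "O101", "return", "offline", Reason.POOR_SHOPPING_ADVICE),
    ReasonRecord("C001", "O102", "return", "online", Reason.DESCRIPTION_MISMATCH),
    ReasonRecord("C003", "O103", "cancel", "online", Reason.BETTER_OFFER_ELSEWHERE),
]
return_reasons = reason_counts(records, "return")
```

Keeping the customer ID on each record would let these reasons be joined back to the cluster assignments, so that the most frequent reasons could be reported per segment.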
The ultimate goal of analyzing customer information and consumption data is to obtain
financial returns, increased profit for instance. We note that there are various ways to improve
profit. While this study aims at investigating return rate and cancel rate, future research could focus
on improving profit through increasing revenue.
Six-cluster solution (cluster means)
Cluster  Count  Last Order Date  Profit  Return Rate  Cancel Rate
1        627    32               308     18           11
2        1039   76               50      0            0
3        2302   18               85      0            0
4        254    43               66      100          0
5        1684   53               88      0.05         0.11
6        135    48               75      0            100
Seven-cluster solution (cluster means)
Cluster  Count  Last Order Date  Profit  Return Rate  Cancel Rate
1        223    28               611     4            6.6
2        1039   76               50      0            0
3        2302   18               85      0            0
4        404    34               141     26           14
5        254    43               66      100          0
6        1684   53               88      0.05         0.11
7        135    48               75      0            100
Eight-cluster solution (cluster means)
Cluster  Count  Last Order Date  Profit  Return Rate  Cancel Rate
1        223    28               611     4.06         6.6
2        1039   76               50      0            0
3        994    27               58      0            0
4        404    34               141     26           14
5        1308   12               106     0.01         0
6        254    43               65.91   100          0
7        1684   53               88      0.05         0.11
8        135    48               75      0            100
Table 1. Hierarchical Cluster Analysis on 10% Calibration Sample
Cluster Number 1 2 3 4 5 6 7 8
Zscore (Profit) -0.49722 1.41462 -0.56187 -0.26828 -1.16908 0.07006 0.50463 0.29141
Zscore (Return Rate)
Zscore (Cancel Rate)
0.23681 -0.1999 -0.1999 0.70025 -0.1999 -0.1999 -0.19288 6.4128
-0.08598 -0.27678 -0.27678 0.94381 -0.27648 4.42492
Table 2. Initial Cluster Centers for Calibration Sample
Zscore (Time Duration) 2.86974 -0.29206 -0.24894 0.22321 0.02253 -0.20202 -0.07664 -0.15222
Change in Cluster Centers
Iteration  1      2      3         4      5         6      7         8
1          1.299  0.88   0.303     0.49   1.333     0.338  0.324     0.455
2          0.104  0.388  0.401     0.158  0.26      0.004  0.108     0.031
3          0.029  0.412  0.151     0.065  0.064     0.003  0.141     0
4          0.037  0.322  0.083     0.041  0.009     0.001  0.131     0.02
5          0.029  0.254  0.08      0.033  0.018     0      0.105     0
6          0.011  0.207  0.029     0.031  0.03      0      0.055     0.006
7          0.004  0.15   0.018     0.027  0.023     0      0.034     0
8          0.005  0.132  0.012     0.022  0.016     0      0.02      0.002
9          0.004  0.116  0.006     0.018  0.011     0.001  0.009     0.002
10         0.007  0.082  0.006     0.014  0.007     0.001  0.007     0.006
11         0.006  0.057  0.003     0.012  0.006     0      0.005     0
12         0.006  0.055  0.003     0.005  0.004     0      0.003     0
13         0.007  0.046  0.002     0.006  0.004     0      0.001     0
14         0.009  0.033  0.001     0.002  0.006     0      0         0
15         0.016  0.026  0.001     0.003  0.01      0      0.001     0
16         0.014  0.026  0.001     0.005  0.011     0      0.003     0
17         0.006  0.031  0.001     0.005  0.007     0      0.004     0
18         0.004  0.021  0.001     0.005  0.003     0      0.002     0
19         0.001  0.019  0.001     0.003  0.001     0      0.001     0
20         0.001  0.015  0         0.003  0.001     0.001  0         0
21         0      0.015  0         0.001  0.001     0      4.68E-05  0
22         0.001  0.017  0.001     0.003  8.96E-05  0      4.97E-05  0
23         0      0.014  0         0.002  0         0      0         0
24         0      0.007  0         0.002  0         0      0         0
25         0      0.002  0         0.003  0         0      5.31E-05  0
26         0      0      0         0.001  0         0      0         0
27         0      0      3.02E-05  0      0         0      3.86E-05  0
28         0      0      0         0      0         0      0         0
Table 3. Iteration History for Calibration Sample
Cluster Number 1 2 3 4 5 6 7 8
-0.1987 -0.1987 6.2202
Zscore (Time Duration)
Zscore (Return Rate)
Zscore (Cancel Rate) -0.1706 0.15895 -0.1903 0.88898 -0.1938
-0.1915 -0.1269 0.03311
-0.2396 -0.1166 -0.2484 0.89573 -0.2634 4.33294 -0.2722 -0.2672
-0.1925 4.58926 -0.0074 0.37726 -0.1368
Table 4. Final Cluster Centers for Calibration Sample
1.60161 -0.6019 -1.1245 -0.2261 0.8055 0.00026 -0.1061 0.2118
Table 5. Number of Cases in each Cluster for Calibration Sample
Cluster Number 1 2 3 4 5 6 7 8
Zscore(Profit) 0.05353 0.9094 1.63409 0.00658 -0.50291 -1.07304 -0.23003 0.15992
-0.27678 0.00582 -0.27678
Zscore (Cancel Rate) -0.1999 -0.18996 -0.1999 -0.1999 -0.19252 -0.1999
Zscore (Return Rate) -0.27678 -0.27678 -0.27678 4.42075 0.56365
Table 6. Initial Cluster Centers for Validation Sample
Zscore (Time Duration) -0.21249 -0.09599 -0.24407 -0.20105 0.62656 -0.19915 0.30391 -0.02783
Change in Cluster Centers
Iteration  1      2      3      4      5      6      7      9
1          0.405  0.206  1.192  0.294  0.859  0.543  0.579  0.258
2          0.433  0.091  0.721  0.17   0.122  0.387  0.085  0.063
3          0.129  0.134  0.59   0.199  0.062  0.094  0.019  0.003
4          0.024  0.212  0.577  0.136  0.016  0.025  0.007  0.018
5          0.021  0.214  0.534  0.071  0.009  0.017  0.018  0
6          0.012  0.189  0.553  0.024  0.01   0.017  0.011  0
7          0.003  0.135  0.431  0.009  0.006  0.014  0.006  0
8          0.004  0.102  0.404  0.003  0.002  0.009  0.003  0
9          0.006  0.099  0.412  0.002  0.001  0.007  0.004  0
10         0.005  0.098  0.498  0.001  0.001  0.006  0.007  0
11         0.005  0.078  0.383  0.002  0.001  0.004  0.006  0
12         0.004  0.068  0.286  0.002  0      0.004  0.005  0
13         0.005  0.065  0.26   0.002  0.001  0.003  0.008  0
14         0.004  0.057  0.225  0      0      0.003  0.003  0
15         0.004  0.051  0.177  0.001  0.001  0.002  0.006  0
16         0.003  0.042  0.19   0      0      0.001  0.002  0
17         0.003  0.039  0.178  0      0      0.002  0      0
18         0.003  0.033  0.191  0      0      0.001  0      0
19         0.002  0.034  0.175  0.001  0      0.001  0.002  0
20         0.001  0.033  0.257  0      0.001  0.001  0.004  0
21         0.002  0.032  0.21   0      0.001  0      0.002  0
22         0.002  0.033  0.265  0.001  0      0.001  0.008  0
23         0.001  0.023  0.079  0.001  0      0.001  0.002  0
24         0.001  0.015  0.041  0      0      0.001  0.007  0
25         0.001  0.012  0.041  0      0      0.001  0      0
26         0.001  0.007  0      0      0      0      0      0
27         0      0.002  0      0      0      0      0      0
28         0      0.001  0      0      0      0      0      0
29         0      0      0      0      0      0      0      0
Table 7. Iteration History for Validation Sample
Cluster Number 1 2 3 4 5 6 7 9
Zscore (CancelRate) -0.18893 -0.00429 0.49115 -0.19921 -0.19388 -0.19533 2.58208
-0.15752 0.33883 -0.04898
Zscore (ReturnRate) -0.23482 -0.08775 -0.13979 3.7263 -0.25802 -0.24841
Zscore (Profit_sum) -0.08745 2.38355 10.34737 -0.13262 -0.17638
Table 8. Final Cluster Centers for Validation Sample
Zscore (Time Duration) -1.10393 -0.62541 -0.55918 -0.03315 1.15412 0.01119 0.00878 0.20639
1 116812 16043 994 23955 120786 104257 8108 09 780
Table 9. Number of Cases in each Cluster for Validation
Cluster Number 1 2 3 4 5 6 7 8
Zscore (Time Duration)
Zscore (Return Rate)
Zscore (Cancel Rate)
Table 10. Initial Cluster Centers for All Data
3.39681 -0.066 -0.239 0.26632 0.67681 -0.2987 -0.2043 -0.1314
-0.6108 -1.0087 1.36108 -0.2214 0.48402 0.24312 0.04853 0.2496
-0.1109 -0.2767 -0.2768 0.99106 -0.2506 -0.2768 4.42492 -0.2768
0.19242 -0.1999 -0.1999 0.75803 -0.1211 -0.1999 -0.1999 6.4128
Change in Cluster Centers
Iteration  1      2         3      4      5         6         7      8
1          1.463  0.594     1.082  0.447  0.729     0.614     0.325  0.414
2          0.308  0.183     0.411  0.139  0.103     0.195     0.009  0.035
3          0.082  0.109     0.349  0.068  0.075     0.077     0.001  0.011
4          0.042  0.063     0.285  0.051  0.047     0.03      0.001  0.008
5          0.021  0.03      0.233  0.04   0.023     0.017     0      0.005
6          0.011  0.021     0.184  0.03   0.017     0.01      0.001  0
7          0.006  0.015     0.151  0.02   0.013     0.008     0.001  0
8          0.004  0.006     0.121  0.01   0.004     0.007     0      0
9          0.001  0.002     0.109  0.01   0.002     0.005     0      0.008
10         0      0.001     0.087  0.012  0.001     0.003     0      0.003
11         0      0.001     0.051  0.009  0         0.002     0      0.002
12         0      0         0.048  0.009  0         0.002     0.001  0.001
13         0      0         0.036  0.006  0.001     0.001     0      0
14         0      0         0.028  0.006  0         0.001     0      0.002
15         0      0         0.022  0.005  0         0.001     0      0
16         0      5.54E-05  0.019  0.006  0         0         0      0
17         0.001  0         0.017  0.005  0         0         0      0
18         0      9.84E-05  0.018  0.004  9.04E-05  0.001     0      0
19         0      0         0.014  0.002  7.18E-05  0.001     0      0
20         0      7.04E-05  0.011  0.002  0         0         0      0
21         0      0         0.005  0.001  0         0         0      0
22         0      3.14E-05  0.01   0.002  0         0         0      0
23         0      0         0.013  0.002  0         0         0      0
24         0      0         0.01   0.003  0         0         0      0
25         0      0         0.004  0.003  6.85E-05  0         0      0
26         0      0         0.003  0.001  6.85E-05  8.99E-05  0      0
27         0      0         0.004  0.001  6.85E-05  9.97E-05  0      0
28         0      0         0.004  0.001  9.62E-05  0         0      0
29         0      0         0.005  0.002  9.80E-05  0         0      0
30         0      4.69E-05  0.004  0.002  0         7.14E-05  0      0
31         0      4.68E-05  0.006  0.002  0         6.64E-05  0      0
32         0      0         0.004  0.001  0         9.27E-05  0      0
33         0      0         0.004  0      0         0         0      0
34         0      0         0.003  0      0         8.59E-05  0      0
35         0      0         0      0      0         0         0      0
Table 11. Iteration History for All Data
Cluster Number 1 2 3 4 5 6 7 8
Zscore (Cancel Rate) -0.1736 -0.1987 0.14937 0.9007 -0.1939 -0.1904 -0.1992
-0.0092 -0.1946 0.00853
Zscore (Return Rate) -0.2394 -0.2717 -0.1138 0.88815 -0.2632 -0.2489
Zscore (Profit) -0.1938 -0.1337 4.57319 0.39816 -0.1373
Table 12. Final Cluster Centers for All Data
Zscore (Time Duration) 1.60055 -0.1037 -0.6051 -0.2386 0.80404 -1.1248 0.00654 0.20816
Cluster Number   1      2      3     4     5      6      7     8
Number of Cases  11778  23015  1717  4726  22793  29415  4371  2082
Table 13. Number of Cases in each Cluster for All Data
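The appendix tables follow the usual shape of K-means output: standardized (z-score) variables, initial centers, a per-iteration "change in cluster centers", final centers, and the number of cases per cluster. The sketch below shows how such quantities could be produced. It is a minimal illustration, not the authors' procedure: the software they used is not stated, the sample standard deviation (n - 1) and the stop-at-zero-change rule are assumptions, and the toy data is made up.

```python
import math


def zscores(values):
    """Standardize to mean 0 and unit sample standard deviation (n - 1 is an assumption)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


def kmeans(points, initial_centers, max_iter=50):
    """Plain K-means; returns final centers, cluster sizes, and per-iteration center changes."""
    centers = [tuple(c) for c in initial_centers]
    history = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        members = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            members[nearest].append(p)
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else old
            for old, group in zip(centers, members)
        ]
        # One row of the "iteration history": how far each center moved.
        changes = [math.dist(old, new) for old, new in zip(centers, new_centers)]
        history.append(changes)
        centers = new_centers
        if max(changes) == 0.0:
            break
    sizes = [len(group) for group in members]
    return centers, sizes, history


# Toy data (hypothetical, not from the Saks database): profit in $ and return rate in %.
profit = [10.0, 12.0, 11.0, 200.0, 210.0, 205.0]
return_rate = [1.0, 2.0, 1.5, 30.0, 28.0, 32.0]
points = list(zip(zscores(profit), zscores(return_rate)))

centers, sizes, history = kmeans(points, initial_centers=[points[0], points[3]])
```

On this toy data the centers move in the first iteration and stay put in the second, so the history has two rows and ends in a row of zeros, mirroring the final all-zero rows of the iteration history tables.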