saks fifth avenue


Post on 15-Apr-2017


  • Saks Fifth Avenue Customer Behavior Report Based on Data Driven Analysis

    Group 8: Linhan Zhang, Zhongyuan Lian, Huiruo Zhang, Yitian Chen

  • Executive Summary

    Saks Fifth Avenue is a luxury department store chain that sells high-end brands both online and offline. The objective of this research is to help Saks Fifth Avenue (hereafter Saks) decrease its customers' return rate and cancel rate so as to improve customer profitability and satisfaction. We also want to regain old customers and increase their loyalty.

    The original data in our research comes from the Customer Relationship Management Database at Saks. This database records a wide range of historical sales information for every single order line, covering over 137,000 orders from 100,000 customers. Each order line records customer information, namely customer ID number and ZIP Code, and transaction-related items such as order date, shipping date, revenue, and cost.

    Since we intend to segment our customers based on their return record, order cancel record, total profit, and the time of their most recent order, we aggregate all records into a new data file at the individual customer level. We then choose four key variables: profit, return rate, cancel rate, and time duration since last order date. We use K-means cluster analysis as our major segmentation method. We divide the whole data set into a calibration set and a validation set, and conduct K-means cluster analysis on each of them to make sure that we do not miss any meaningful group of customers. Furthermore, we repeat the analysis several times with different methods to explore alternative solutions.

    Once the outcomes of the K-means cluster analysis match our expectations, we summarize and interpret our key findings. There are 8 clusters, each with meaningful features. Three of these groups interest us most.

    The first group makes up about 30% of all customers. These customers generate high profits, their return and cancel rates are extremely low, and they have the shortest time duration since last order date. They are clearly the core customers of our company, and we should take action to retain them in order to generate more profit. For example, we could offer them better services and high-quality products to increase customer loyalty and satisfaction.

    The second group consists of customers who generate relatively high profits for our company but whose cancel rate is extremely high. These customers are able to generate a large profit for us, but they are likely to cancel their orders for various reasons. What makes things worse is that they cause additional costs for our company, since we need to provide special services when they return items. For these customers, we need to figure out their true needs and the reasons for the high cancel rate. They have great financial potential if we can increase their satisfaction: they could become customers like those in the first group and generate a large profit for the company.

    The third group includes customers who generate relatively low profits but whose return rate is very high. These customers are dissatisfied with our products or services, so they keep returning their items. This group is a heavy financial burden for Saks, so we have to decrease their return rate by identifying the reasons and taking action to increase their satisfaction.

    Finally, we analyze the major reasons for high return and cancel rates. Based on our previous analysis, we provide different managerial recommendations for each group according to its significance and characteristics. These recommendations will serve to decrease customers' return rate and cancel rate and eventually increase profits for Saks in the future.

  • Table of Contents

    1. Introduction
    2. Background
    3. Methodology and Analysis
        Definition of Clustering Analysis
        Data Obtained and Used
        Variables selection and Explanation
        Data Preparation
        Calibration and Validation
        Clustering Settings
            Measure Interval: Euclidean Distance
            Cluster Method: Ward's Method
            Standardization: Z scores
        Specific Operations
        Findings from Clustering Results
    4. Conclusion & Recommendations
        Recommendations
    5. Limitations and Future Research
    6. Appendix


    1. Introduction

    Imagine you are a store owner selling a limited-edition Prada purse that normally costs more than $5,000. Which kind of customer is more valuable to you: a customer who spends an average amount of money but never returns or cancels an order, or a customer who spends a huge amount of money but ends up returning or cancelling most of their orders? This is a significant but tricky question for every company, especially for Saks Fifth Avenue, whose unit prices are higher.

    It is said that customers are the most valuable equity of a company. As a luxury department store, Saks sells products that are much more expensive than average, so every single purchase matters a great deal to the company financially. As a result, high return and cancel rates are more damaging for Saks than for regular department stores such as Macy's. At the same time, customer satisfaction and loyalty, which directly decide the company's fate, are also extremely significant for Saks. Moreover, it is important for us to know how often a customer comes back to purchase.

    According to our background research, the major managerial issue for Saks is to increase profit by reducing return and cancel rates and by regaining customers who have not purchased for more than one year. Through a series of analyses and comparisons, we segment all customers into 8 groups based on the profit they generate, the time duration since their last order date, their return rate, and their cancel rate. Each group has its own meaningful features. Some groups generate the highest profits but have not purchased for more than two years. Others generate high profit but also have high return and cancel rates. We discuss each cluster in detail in the following report, elaborating on each group's features and providing managerial recommendations.


    2. Background

    Saks Fifth Avenue is a luxury department store chain that was founded in 1867. With such a long history, Saks has built its own customer pool with a large number of loyal customers. Most customers shop at Saks for its good service and latest fashion. Saks carries a number of world-famous luxury brands, including Gucci, Prada, and FENDI. The staff at Saks are very professional and usually offer customers thoughtful advice during the purchase process.

    However, based on our research, Saks cannot generate as much revenue as it did a few years ago. Competition between department stores is becoming more and more fierce. Main competitors of Saks, such as Bloomingdale's and Neiman Marcus, have put considerable pressure on Saks with price-off promotions. Even mid-range department stores such as Macy's and online stores such as amazon.com are competing with Saks. More competition means customers have more choices. For Saks, however, it leads to high return and cancel rates, because once customers find a lower price on amazon.com, the first action they take is to cancel their orders on our website. Moreover, high service costs make it more difficult for Saks to generate considerable profits. As a result, Saks faces more challenges than it ever did and needs to find a way to solve its problems and keep growing.

    In recent years, Saks has introduced its online store and app to enlarge its market share and attract more young customers. Online shopping is an easier and cheaper way to buy and sell items for both customers and companies. However, it raises several issues as well. Since Saks sells a lot of apparel and makeup, customers cannot try the products before purchasing on the website. Once customers find that a product does not match their expectations, they return it. Therefore, online shopping has increased return and cancel rates, which imposes additional costs on the company. Our team will help Saks find solutions to these issues by reducing return and cancel rates as well as increasing customer satisfaction.


    3. Methodology and Analysis

    The nature of the retailing industry makes a deep understanding of customers highly important. Saks Fifth Avenue specializes in selling various high-end brands, including Gucci, Burberry, and Prada. In an effort to increase the company's profit, we find that reducing the return rate and cancel rate could play a crucial role. Once a customer returns a product or cancels an order, we lose not only the potential profit but also the effort we previously invested in acquiring this customer and in attracting her to visit our stores. Therefore, we focus most of our attention on investigating returns and cancellations so as to obtain actionable insights that we can take advantage of.

    We conduct cluster analysis to segment our historical customers in terms of their return record, cancelled order record, and the total profit generated through their accumulated purchases at Saks Fifth Avenue, as well as the time of their most recent order. Through cluster analysis, we discover separate groups that differ from each other in these aspects. We then compare them, identify their differences, and evaluate the possible reasons for these differences. After understanding the characteristics and implications of these groups, we are able to come up with corresponding recommendations that can improve their future performance.

    The objective of this study is to identify different customer groups in terms of the above four aspects and to extract the contact information of the customers in each group for direct marketing, eventually decreasing return rate and cancel rate, increasing customer satisfaction and profit, and regaining old customers.


    Definition of Clustering Analysis

    Originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology

    by Zubin in 1938 and Robert Tryon in 1939, cluster analysis is the task of grouping a set of objects

    in such a way that objects in the same group (called a cluster) are more similar (in some sense or

    another) to each other than to those in other groups (clusters). [Reference] The outcome of cluster analysis is a set of segments created from a set of individual samples. Samples in the same segment share more commonalities with each other than they do with samples from other segments. In business, cluster analysis is a popular and frequently used method for market segmentation, which is an important part of marketing planning.

    Our research adopts two types of clustering methods: hierarchical clustering and K-means clustering. Hierarchical clustering is useful when the sample size is relatively small. Different choices of cluster method and measure interval lead to different clustering results. Among them we select the one that meets our expectations in terms of segment size, segment characteristics, and between-segment differences. Hierarchical clustering provides exploratory insight for the subsequent K-means clustering analysis, which, together with hierarchical clustering, is capable of segmenting a large sample. An effective and efficient cluster analysis on a large data set requires the combination of these two clustering methods. The flowchart (see Fig. 3.1) demonstrates the procedure of our cluster analysis.


    Figure 3.1 Analysis Flowchart


    Data Obtained and Used

    The data analyzed in our research comes from the Customer Relationship Management Database at Saks. This database records a wide range of historical sales information for every single order line. Each order line records customer information, including customer ID number and ZIP Code, and key items recorded during a transaction, such as order date, shipping date, price, and cost. Different order lines may have the same order number, showing that these order lines belong to the same order. By the same token, different order numbers may have the same customer number, meaning that the customer placed these orders at different times.

    The data set is large enough to produce representative insights. We select the records from 12/16/2004 to 09/17/2012, covering more than 226,000 order line records. These records reflect over 137,000 orders from 100,000 customers.

    Variable Selection and Explanation

    After understanding the descriptions of the variables in a record line, we determine four clustering variables: Total Profit, Return Rate, Cancel Rate, and Time Duration since Last Order Date. These variables are not included in the current data set but can be calculated from some of the existing variables. We use other variables, including Customer Number and ZIP Code, to describe the features and attributes of the resulting segments. Each of the clustering variables has its own meaning and implication for us.

    Total Profit

    Profit is the most important indicator of a customer's value. The higher the profit a customer generates, the more imperative it is to retain him/her. Generally, a company has finite resources available for customer relationship management. If it invests equal resources in every customer regardless of their value, it will inevitably end up with highly profitable customers who are not adequately served and retained, and with less profitable customers who occupy many resources without creating enough profit in return. Therefore, since our final objective is to come up with actionable recommendations for different customer groups, Total Profit tells us which groups require more attention and hence more resources.

    Return Rate

    Return Rate conveys important information about the purchasing behavior of a customer. A high return rate can have many causes. For example, an unclear or misleading product description could lead to customer complaints after packages are received, which often results in returns. A high return rate could also be attributed to a customer's particular taste. Whatever the cause, the higher the return rate, the more profit we lose. While increasing revenue is one pathway to greater profit, lowering unnecessary loss is also an effective one. Saks focuses on the high-end niche market. High-end brands generally have smaller sales volumes than lower-tier brands, but they invest more to support their high-end brand positioning and marketing. Saks's well-qualified salespeople, high rents, and high advertising budget imply a high operational cost. Once a return or cancellation happens, even though most products can be resold, much of that cost is wasted. This is another reason we attach great importance to return rate and cancel rate. With these considerations, we select return rate as one of our clustering variables.

    Cancel Rate

    Cancel Rate refers to the ratio of a customer's cancelled quantity to his/her total ordered quantity (see Table 3.1). Like return rate, it has a strong relation with profit, but in a different way. While returning is a customer's decision after receiving the ordered products, cancelling means the customer changes her mind before that. We assume that return rate mainly reflects dissatisfaction with our products, whereas cancel rate points to unsatisfying purchasing experiences such as confusing shopping guidance and poor customer service. As with returns, a lower cancel rate brings a corresponding increase in profit. Therefore, we include cancel rate in our variable list.

    Time Duration since Last Order Date

    Time duration since last order date is the time period between the date a customer placed his/her last order and the current date. We audit the data set and find that a large number of customers have not shopped at Saks for a long time. The longer the duration, the higher the possibility that the customer has already defected. This variable matters for our decision making because marketing strategies and plans, and their resulting effects, can be totally different for new customers and old ones. Relatively new customers are easier to contact and attract because their contact information is up to date and because they have a stronger connection with our brand and products. On the other hand, customers who have not come back for more than three years are of lesser value and priority for the opposite reasons. Therefore, differentiating new and old customers through clustering is meaningful.

    Data Preparation

    The records in the data set are organized by individual order line. Since the objective of our cluster analysis is to obtain information at the individual customer level, we aggregate all records into a format with customer number as the key. We then audit the aggregated data set and determine the calculations needed to transform existing variables into the four clustering variables. Table 3.1 explains the original variables and the ones we compute, in the order in which they are used throughout this analysis.


    Table 3.1 Overview of the Variables Used

    Variable: Explanation

    Original variables

    Customer Number: A unique 11-digit numeric string identifying a customer. Each customer has only one customer number.
    ZIP Code: The 5-digit ZIP Code of the location where a customer places an order.
    Order Number: A unique 9-digit numeric string referring to a specific order. One customer may have placed more than one order, each with a different order number.
    Order Line: Line number for each unique product in an order.
    Order Date: Date an order was placed.
    Quantity: Quantity of a product in an order.
    Revenue: The total price of an order line.
    Cost: The total cost of an order line.
    Return Quantity: The quantity of returned product.

    Computed variables before aggregation

    Profit: The profit of a single order line. Calculation: Profit = Revenue - Cost
    Time Duration since Order Date (Month): The time duration between today and the day the order was placed. Calculation: Time Duration since Order Date = Date of Today - Order Date

    Aggregated variables (aggregated by Customer Number)

    Last ZIP Code: The ZIP Code where a customer placed his/her last order.
    Total Profit: The summed Profit of all of a customer's orders.
    Time Duration since Last Order Date (Month): The time duration between today and the day the customer's last order was placed.
    Total Quantity: The total quantity of products a customer has ever purchased, including returned and cancelled quantity.
    Total Return Quantity: The total quantity of products a customer has ever returned.
    Total Cancel Quantity: The total quantity of products a customer has ever cancelled.

    Computed variables after aggregation

    Return Rate: The return rate of a customer's historical purchases. Calculation: Return Rate = Total Return Quantity / Total Quantity
    Cancel Rate: The cancel rate of a customer's historical purchases. Calculation: Cancel Rate = Total Cancel Quantity / Total Quantity
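    The aggregation described in Table 3.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the report itself works in a statistics package, and the tuple layout, the sample order lines, and the function names `months_between` and `aggregate_by_customer` are hypothetical. Durations are counted in whole calendar months, which is one possible reading of "Month".

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines, one tuple per line:
# (customer_number, order_date, quantity, revenue, cost, return_qty, cancel_qty)
ORDER_LINES = [
    ("C001", date(2012, 6, 1), 2, 400.0, 250.0, 1, None),
    ("C001", date(2011, 1, 15), 1, 150.0, 100.0, None, None),
    ("C002", date(2010, 3, 9), 4, 900.0, 600.0, None, 2),
]


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def aggregate_by_customer(lines, today):
    """Collapse order lines into one record per customer."""
    totals = defaultdict(
        lambda: {"profit": 0.0, "qty": 0, "ret": 0, "can": 0, "last": None}
    )
    for customer, order_date, qty, revenue, cost, ret, can in lines:
        t = totals[customer]
        t["profit"] += revenue - cost   # Profit = Revenue - Cost
        t["qty"] += qty
        t["ret"] += ret or 0            # missing quantities count as zero
        t["can"] += can or 0
        if t["last"] is None or order_date > t["last"]:
            t["last"] = order_date      # keep the most recent order date
    return {
        customer: {
            "total_profit": t["profit"],
            "return_rate": t["ret"] / t["qty"] if t["qty"] else 0.0,
            "cancel_rate": t["can"] / t["qty"] if t["qty"] else 0.0,
            "months_since_last_order": months_between(t["last"], today),
        }
        for customer, t in totals.items()
    }


customers = aggregate_by_customer(ORDER_LINES, today=date(2012, 9, 17))
```

    Each customer ends up with exactly the four clustering variables, so the result can be passed directly to standardization and clustering.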

    Calibration and Validation

    Once the four clustering variables are ready, we divide the data set into two subsets: a calibration sample (60% of all records) and a validation sample (40% of all records). The calibration sample is used to generate a promising segmentation, while the validation sample is used to verify whether that segmentation is appropriate and representative. Conducting clustering on both sets ensures that no meaningful segments are missed. If the two results agree, we then cluster the entire data set to further test that segmentation. This verification mechanism helps ensure the accuracy and representativeness of our analysis.
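    A minimal sketch of such a 60/40 split, assuming a simple random shuffle of customer identifiers. The function name, the seed, and the exact cut are illustrative assumptions; the report's own tool draws "about 60%" and may not produce an exact split.

```python
import random


def split_calibration_validation(customer_ids, calibration_share=0.6, seed=42):
    """Randomly split ids into a calibration part and a validation part."""
    ids = list(customer_ids)
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    rng.shuffle(ids)
    cut = round(len(ids) * calibration_share)
    return ids[:cut], ids[cut:]


calibration, validation = split_calibration_validation(range(1000))
```

    Every customer lands in exactly one of the two samples, which is what allows the validation sample to act as an independent check.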

    Clustering Settings

    We randomly select 10% of the samples from the calibration set to formulate our approach to hierarchical clustering. There are three crucial decisions: the cluster method, the measure interval, and whether or not to standardize the clustering variables.


    Measure Interval: Euclidean Distance

    The measure interval determines how the distance between two samples is calculated. Two popular metrics are Squared Euclidean Distance and Euclidean Distance. If the distance between two given samples is X under Euclidean Distance, it becomes X² under Squared Euclidean Distance. Squared Euclidean Distance amplifies larger distances (any distance greater than 1 grows when squared), and thus the variance between samples is enlarged. An enlarged variance pushes two samples apart. However, we prefer two similar samples to be close rather than distant. Therefore, we select Euclidean Distance.
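    The relation between the two metrics can be illustrated with a short Python sketch. The function names and the sample points are hypothetical and serve only to show the arithmetic.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


# Two hypothetical pairs of standardized samples.
close_pair = ((0.0, 0.0), (0.5, 0.0))
far_pair = ((0.0, 0.0), (3.0, 4.0))

distances = {
    "close": (euclidean(*close_pair), squared_euclidean(*close_pair)),
    "far": (euclidean(*far_pair), squared_euclidean(*far_pair)),
}
```

    In this example the far pair goes from 5 to 25 when squared, while the close pair goes from 0.5 to 0.25, so squaring stretches the gap between near and far samples without changing which pair is closer.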

    Cluster Method: Ward's Method

    The cluster method determines the criterion used to judge the distance between two clusters. Two alternative methods are Furthest Neighbor and Ward's method. The Furthest Neighbor method takes the longest distance between any two members of the two clusters as the distance between the clusters. This method is effective in identifying small sample groups that are conspicuously different from others; correspondingly, its output tends to have the majority of samples concentrated in a few large groups, with the remaining minority assigned to much smaller groups. Ward's method, on the other hand, uses the sum of squared errors as the measure of distance and thus tends to produce groups of similar size.

    Our analysis aims to identify groups with different characteristics with respect to the four clustering variables. The identified groups should be large enough for actionable marketing campaigns, which means that some of the segments identified by the Furthest Neighbor method might be too small to meet our expectations. In contrast, Ward's method produces groups of relatively even size, so it is our choice.
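    The two criteria compared above can be written down compactly. The sketch below is an illustration, not the statistics package's implementation: `furthest_neighbor_distance` is the largest pairwise distance between members of two clusters, and `ward_merge_cost` uses the standard expression for the increase in the within-cluster sum of squared errors when two clusters are merged. The function names and the example clusters are assumptions.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def furthest_neighbor_distance(a, b):
    """Complete linkage: the largest distance between any two members."""
    return max(math.dist(p, q) for p in a for q in b)


def ward_merge_cost(a, b):
    """Increase in the total sum of squared errors if a and b are merged."""
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * gap


# Two hypothetical clusters of standardized samples.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

complete = furthest_neighbor_distance(cluster_a, cluster_b)
ward = ward_merge_cost(cluster_a, cluster_b)
```

    Because Ward's cost grows with the sizes of the clusters being merged, merging two large clusters is penalized, which is one way to see why the method tends toward groups of similar size.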


    Standardization: Z scores

    Standardization is required when clustering variables would otherwise carry different weights in the result. Standardization transforms the variables into comparable forms so that they have equal influence. In our study, the four variables have clearly different value ranges and variances. To ensure an accurate analysis, we standardize them as Z scores.
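    A Z score subtracts a variable's mean and divides by its standard deviation. A minimal sketch follows, assuming the sample standard deviation (n - 1 in the denominator) is used; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation; an assumption here
    return [(v - m) / s for v in values]


# Hypothetical profits and cancel rates on very different scales.
profit_z = z_scores([10.0, 20.0, 30.0])
cancel_z = z_scores([0.0, 0.5, 1.0])
```

    After standardization both variables take the values -1, 0, and 1 in this example, so neither dominates a Euclidean distance merely because of its original scale.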

    Specific Operations

    First, we create several new variables needed for the analysis. Since customers' profit is the key factor for our analysis, we use the following equation to calculate a new variable named Profit.

    Profit = Revenue - Cost [1]

    Because we want to know how many months have passed since each customer's last order date, we use the Date and Time Wizard to create a new variable named Time Duration. Second, we use Recode into Same Variables to replace the missing values in cancel quantity, return quantity, and quantity with zero. Third, we aggregate the original data file into a new data file. The break variable is Customer Number, and the aggregated variables are zip code (last), profit (sum), last order date (minimum), cancel quantity (sum), return quantity (sum), and quantity (sum). Finally, because we want to know each customer's return rate and cancel rate, we create two new variables named Return Rate and Cancel Rate using the following formulas:

    Return rate = return quantity / quantity

    Cancel rate = cancel quantity / quantity

    [1] Strictly speaking, the precise total profit of a customer should be calculated as Profit = (Revenue - Cost) * [1 - (Return Quantity + Cancel Quantity) / Quantity]. However, this calculation loses the ability to show a customer's potential purchasing power, since the profit on his/her returned and cancelled items is excluded. In our study, we want to investigate customers' true purchasing power, and therefore we use Profit = Revenue - Cost.

    Standardize Decision Variables

    We calculate Z scores for all decision variables (Profit, Time Duration, Return Rate, and Cancel Rate) and save them as new variables.

    Split the Sample

    We use Select Cases to split the whole data set into a calibration sample, which is about 60% of all data, and a validation sample, which is about 40% of all data.

    Hierarchical Clustering

    First, we draw 10% of the calibration sample as a small subset. Second, we run Hierarchical Cluster Analysis to determine the number of clusters, choosing Ward's method, Euclidean distance, and Z scores as our settings. According to the marked line, we choose 6 to 8 as the range of solutions. Based on a comparison of the Custom Tables, we choose 8 as the number of clusters because it gives the clearest and most meaningful managerial insights. The detailed Custom Tables are attached in the Appendix. Third, we conduct Hierarchical Cluster Analysis to identify the cluster centers. Finally, we save the outcome in a new data file as initial seeds (see Table 1 in the Appendix).
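    Turning a hierarchical solution into initial seeds amounts to averaging the standardized variables within each cluster. A minimal sketch, assuming the hierarchical step has already produced one cluster label per sampled customer; the function name, rows, and labels are hypothetical.

```python
from collections import defaultdict


def cluster_centers(rows, labels):
    """Mean of each variable within each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in sorted(groups.items())
    }


# Hypothetical standardized rows (two variables) and their hierarchical labels.
rows = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)]
labels = [1, 1, 2]
seeds = cluster_centers(rows, labels)
```

    The resulting centers play the role of the saved seed file: one starting point per cluster for the subsequent K-means run.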

    K-Means Cluster Analysis

    We use the results of Hierarchical Cluster Analysis as initial seeds and conduct K-means Cluster Analysis on the calibration sample. We set the maximum number of iterations to 80 and save the cluster membership. There are 60,025 valid cases and 54 missing cases. The Initial Cluster Centers (see Table 2 in the Appendix), Iteration History (see Table 3 in the Appendix), Final Cluster Centers (see Table 4 in the Appendix), and Number of Cases in each Cluster (see Table 5 in the Appendix) are attached in the Appendix.
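    Seeded K-means can be sketched in a few lines of Python. This is a plain textbook version (Lloyd's algorithm) started from supplied centers and capped at 80 iterations, offered as an illustration rather than as the statistics package's exact procedure; the function names and sample data are hypothetical.

```python
def nearest_center(row, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centers[k])),
    )


def kmeans(rows, seeds, max_iter=80):
    """Lloyd's algorithm started from the given seeds."""
    centers = [tuple(seed) for seed in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest_center(row, centers) for row in rows]
        if new_assignment == assignment:
            break                           # memberships stopped changing
        assignment = new_assignment
        for k in range(len(centers)):
            members = [r for r, a in zip(rows, assignment) if a == k]
            if members:                     # keep the old center if empty
                centers[k] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centers, assignment


# Hypothetical standardized rows and two seeds from a hierarchical solution.
rows = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, membership = kmeans(rows, seeds=[(1.0, 1.0), (9.0, 9.0)])
```

    Starting from hierarchical seeds instead of random points makes the run repeatable and ties the final clusters to the exploratory solution.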

    Exploring Results

    To make sure we obtain the best result, we use different random subsets and different methods to conduct Hierarchical Cluster Analysis. When we use Furthest Neighbor and Squared Euclidean distance, the outcome is clearly inappropriate because most of the data are concentrated in 2 clusters, while the other 6 clusters have extremely small and meaningless counts. More importantly, we cannot find the group we are looking for, which has a high return rate and a high cancel rate. We then save these outcomes as initial seeds and run K-means Cluster Analysis with the different initial seeds. As expected, the results for the calibration sample, the validation sample, and all data do not match one another in the major clusters.

    Finalize Calibration Results

    Based on our previous analysis, we finalize our decision by running K-means Cluster Analysis on the calibration sample. The calibration results are shown in Table 3.2.


    Validation Sample

    First, we conduct Hierarchical Cluster Analysis to identify the cluster centers, again using Ward's method and Euclidean distance. Then we run K-means Cluster Analysis on the validation sample using the new initial seeds. There are 39,872 valid cases and 45 missing cases. The Initial Cluster Centers (see Table 6 in the Appendix), Iteration History (see Table 7 in the Appendix), Final Cluster Centers (see Table 8 in the Appendix), and Number of Cases in each Cluster (see Table 9 in the Appendix) are attached in the Appendix. The validation results are shown in Table 3.3.

    Table 3.2 Calibration Clustering Results

    Cluster Number                   1                 2                 3                 4
                               Mean   Count      Mean   Count      Mean   Count      Mean   Count
    Profit ($)                 67.6    7057    916.98    1021    100.48   17777    168.81    2802
    Return Rate (%)            0.79    7057      3.41    1021       0.6   17777     24.94    2802
    Cancel Rate (%)            0.44    7057      5.43    1021      0.15   17777     16.47    2802
    Time Duration (months)       81    7057        26    1021        13   17777        35    2802

    Cluster Number                   5                 6                 7                 8
                               Mean   Count      Mean   Count      Mean   Count      Mean   Count
    Profit ($)                77.49   13597     67.79    2626     79.26   13894    107.68    1251
    Return Rate (%)            0.28   13597     98.04    2626       0.1   13894       0.2    1251
    Cancel Rate (%)            0.09   13597      0.02    2626      0.02   13894     97.09    1251
    Time Duration (months)       61   13597        41    2626        38   13894        46    1251


    Compare and Finalize

    We compare the calibration results and the validation results, and they are consistent. In particular, the clusters that are most meaningful from a managerial perspective, those with high return rate and high cancel rate, are consistent. Therefore, we conduct Hierarchical Cluster Analysis to identify the cluster centers, again using Ward's method and Euclidean distance, and then run K-means Cluster Analysis on all data using the new initial seeds. There are 99,897 valid cases and 99 missing cases. The Initial Cluster Centers (see Table 10 in the Appendix), Iteration History (see Table 11 in the Appendix), Final Cluster Centers (see Table 12 in the Appendix), and Number of Cases in each Cluster (see Table 13 in the Appendix) are attached in the Appendix. The results for all data are shown in Table 3.4.

    Table 3.3 Validation Clustering Results (cluster means and number of customers)

    Cluster   Profit ($)   Return Rate (%)   Cancel Rate (%)   Time Duration (months)   Count
    1              86.26              0.89              0.17                       13   11681
    2             525.19              4.02              2.96                       25    1604
    3               1939              2.91             10.45                       27      99
    4              78.24             85.14              0.01                       40    2395
    5              70.47               0.4              0.09                       69   12078
    6              73.82               0.6              0.07                       41   10425
    7             161.98              4.05             42.07                       41     810
    8               93.1                 0             99.93                       46     780


    Findings from Clustering Results

    Cluster 3 is one of the key customer clusters because it contributes the highest mean profit,

    $914.13 per customer. Its return rate (3.47%) and cancel rate (5.28%) are relatively low, and

    its average time since the last order, 26 months, is the second shortest among all clusters.

    Cluster 6 is also extremely important because its mean profit, $100.16, is relatively high.

    Its time since the last order is the shortest among all clusters, and its return rate and cancel

    rate are both under 1%. Moreover, this cluster is the largest, making up nearly 30% of the

    whole sample.

    Cluster 4 is one that our team wants to highlight. Its customers have the second-highest mean

    profit, $172.52, and the third-shortest time since the last order, 35 months. However, their

    return rate and cancel rate are 24.78% and 16.64%, respectively. We could obtain a large

    financial return if we can lower both rates.

    Table 3.4 All Data Clustering Results (cluster means and number of customers)

    Cluster   Profit ($)   Return Rate (%)   Cancel Rate (%)   Time Duration (months)   Count
    1              67.38               0.8               0.4                       81   11778
    2              78.05              0.11              0.02                       38   23015
    3             914.13              3.47              5.28                       26    1717
    4             172.52             24.78             16.64                       35    4726
    5               77.4              0.29              0.09                       61   22793
    6             100.16              0.59              0.14                       13   29415
    7              67.23             98.08              0.01                       41    4371
    8             103.31              0.17             97.41                       46    2082

    Cluster 7 is another group that we want to analyze in depth. Its mean profit is

    $67.23, its cancel rate is 0.01%, and its time since the last order is 41 months. Surprisingly,

    its return rate is 98.08%. This means we have spent a large amount of money serving this

    group of customers, and they have a large negative influence on our company's financial status.

    We can substantially cut the company's costs by decreasing this cluster's return rate.

    Cluster 8 is surprising as well. Its mean profit is $103.31, the third highest among all

    clusters. Its return rate is 0.17%, and its time since the last order is

    46 months. However, its cancel rate is 97.41%. From our perspective, this

    group of customers has large profit potential if we can optimize our purchase process to lower

    the cancel rate.

    Clusters 2 and 5 are also significant for our analysis for several reasons.

    Both groups contain a large number of customers. Although their mean profits are

    both under $80, their return and cancel rates are extremely low, all under 0.3%.

    We also need to note that their time since the last order is more than

    3 years, so we must figure out how to win back these old customers.

    Cluster 1 is relatively unimportant for this analysis. Its mean profit is $67.38 and its return

    and cancel rates are low, but these customers have not bought any product from our store for more than

    6 years. Thus, it is very hard to re-target this group of people.


    4. Conclusion & Recommendations

    After all data analysis, we segment our customers into 8 groups. Our goal is to decrease the return

    rate and cancel rate so that we can improve our customers' profitability and satisfaction. We also

    want to regain our old customers and increase their loyalty. Based on Table 3.4, we create a

    pie chart (see Fig. 4.1) that shows the percentage of total profits contributed by each

    segment.

    Figure 4.1 Profit Distribution among Groups

    From Fig. 4.1, we can clearly see that clusters 6, 2, 5, and 3 contribute the majority (79%) of

    our total profits. These customers are our key customers in terms of total profits they generate.
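
    The shares in Fig. 4.1 can be checked directly against Table 3.4. The short sketch below is

    our own illustration, assuming that a cluster's total profit equals its mean profit multiplied

    by its number of customers; the variable names are hypothetical.

```python
# (mean profit per customer in $, number of customers) for clusters 1-8, copied from Table 3.4
table_3_4 = {
    1: (67.38, 11778),
    2: (78.05, 23015),
    3: (914.13, 1717),
    4: (172.52, 4726),
    5: (77.40, 22793),
    6: (100.16, 29415),
    7: (67.23, 4371),
    8: (103.31, 2082),
}

# Assumption: total profit of a cluster = mean profit x customer count.
total_profit = {c: mean * n for c, (mean, n) in table_3_4.items()}
grand_total = sum(total_profit.values())
profit_share = {c: round(100 * p / grand_total) for c, p in total_profit.items()}

# Share of customers in each cluster, for comparison with the share of profit.
n_customers = sum(n for _, n in table_3_4.values())
customer_share = {c: round(100 * n / n_customers) for c, (_, n) in table_3_4.items()}

# Combined profit share of the four clusters singled out in the text.
key_clusters = (6, 2, 5, 3)
key_share = round(100 * sum(total_profit[c] for c in key_clusters) / grand_total)

for c in sorted(table_3_4):
    print(f"Cluster {c}: {profit_share[c]}% of profit, {customer_share[c]}% of customers")
print(f"Clusters 6, 2, 5 and 3 together: {key_share}% of profit")
```

    Under this assumption the rounded profit shares agree with the pie chart (8%, 18%, 15%, 8%,

    17%, 29%, 3%, 2%), and clusters 6, 2, 5, and 3 together account for 79% of total profits.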

    According to Table 3.4, customers in cluster 6 have the shortest time since their last

    order, which means these people currently have the highest awareness of Saks among all customers

    and a stronger connection to us. We need to retain these customers for long-term

    development because they are more likely to bring future profits. In addition, the fact

    that their return rate and cancel rate are both low shows that they are currently satisfied with our

    products and services.

    Figure 4.1 data (pie chart, Profits Distribution): Group 1 8%, Group 2 18%, Group 3 15%,

    Group 4 8%, Group 5 17%, Group 6 29%, Group 7 3%, Group 8 2%.

    Cluster 2 has the lowest return rate and the second-lowest cancel rate. These customers are highly

    satisfied with our products and services and purchase products at Saks with little

    hesitation. However, they have not placed an order for more than 3 years, so it is important for us to

    retarget them.

    For cluster 5, customers' return rate and cancel rate are low, so their satisfaction is stable.

    However, they have not placed an order for more than 5 years, and the mean profit of this group is

    relatively low. Low return/cancel rates combined with low profits suggest that these customers are

    perhaps concerned that the return/cancel process will be troublesome, so they are unwilling to buy

    very high-priced products. For these customers, we need to ease their worries and convey

    the message that Saks is the ideal store for buying high-end products. Meanwhile, we should update

    their personal information and needs, since they have not purchased products from us for more

    than 5 years.

    Since the mean profit in cluster 3 is the highest, these customers are valuable to us. Their

    return rate and cancel rate are relatively low, but we still need to decrease both rates

    in order to increase their satisfaction as much as possible. This group's time since the last order

    date is the second shortest, so we need to retain them and persuade them to build a long-term,

    trusting relationship with us, which will help us generate more profits in the future.

    Cluster 8 has the second-lowest return rate, so these people are at least satisfied with the

    products that they have already bought. However, their cancel rate is extremely high, which means

    we lost most of the potential profit from the orders they initially intended to place. Meanwhile,

    customers in cluster 8 have a long time since their last order date, which may mean that they

    are unwilling to purchase products from our store because of a bad purchase experience.

    For example, they may be disappointed with our website's slow updating frequency or long

    shipping time. Thus, we may need to regain these customers by setting up specific strategies to

    target their needs more efficiently.

    Although the profits that cluster 4 brings us are very high, these customers' net profits are

    not as high as they appear in the table because of their high return rate and cancel rate. This situation

    indicates that these customers are dissatisfied with our products or services. We should improve the

    quality of our products and optimize our services to convince them to keep purchasing products

    from Saks with a lower return rate and cancel rate. In this way, we can prevent the loss of potential

    profits from these customers.

    Whether measured by total profits or by mean profit, customers in cluster 7 generate low

    profits for us. Their return rate is the highest, which means they return almost all the products that

    they purchase. Although our employees spend much time and effort serving them and

    trying to meet their needs, these people return most of our products, which suggests that something is

    wrong with our products or services. Since this group of customers currently has a negative influence on

    our financial situation, our spending on them will be more productive and efficient if we can

    lower their return rate.

    Recommendations

    In order to provide appropriate recommendations for our customers based on their different

    characteristics, we need to analyze the reasons for consumers' return and cancelation behaviors.

    The difference between these two behaviors is that returns happen after customers have already

    purchased products, whereas cancelations happen before customers have paid for the product.

    Saks Fifth Avenue is both a physical retailer and an e-retailer. In our physical stores,

    customers return items largely because our staff cannot provide proper product

    information or shopping advice. On the other hand, an increasing number of

    customers are purchasing products on our official website or app, which creates other problems. For

    example, when a customer purchases a pair of shoes on our website, he/she cannot look at or try

    on the product in person. Many customers will be disappointed when they receive their packages

    because the products do not match their expectations. For these reasons, customers are

    more likely to return their products.

    In addition, more and more online retailers are appearing, which gives people many opportunities

    to compare prices. They can easily find a better price for the same product on other websites, and

    once they find it, they will switch to other retailers. Our team has summarized several possible

    reasons for return behaviors:

    The product itself cannot satisfy our customers. For instance, if a customer bought a sweater

    on our website and was not satisfied with the material, she might return the

    sweater.

    Another common situation is that the product is damaged during shipping. In

    this situation, the customer will almost certainly return the product.

    The description of the product is not consistent with the real product, or the details of the

    product are not provided clearly. The higher the expectations customers have based on

    the description on our website, the more disappointed they will be if the product doesn't match

    the description.

    Shopping guides don't offer clear explanations to our customers. When customers ask our

    shopping guides for advice or information in our physical stores, it is possible that our

    shopping guides are unable to provide proper advice. Misleading information and advice will

    probably result in returns.

    Poor post-purchase service is another important factor that causes people to return their

    products. Saks is a high-end retailer whose prices are relatively high. When

    customers pay a premium for a product, they have higher requirements for customer

    service. If our post-purchase service cannot solve their problems promptly and effectively, they

    may return their products as well. For instance, when a customer calls our representative to

    request an exchange, if we process this request very slowly, the customer may run out of

    patience and decide to return the product directly.

    Our team has summarized several possible reasons for cancelation behaviors:

    Customers make mistakes when they place an order. For instance, they may find that

    they chose the wrong size or color when they check out. In this situation, they will

    cancel the order and replace it with the correct one, so this kind of cancelation does not

    materially affect our sales. However, we still need to provide a clearer website design and

    better information to help customers place orders correctly. Similarly,

    customers may fill in the wrong personal information when they check out, so they need to cancel

    the order and place it again. This case doesn't have a significant influence on

    our profits because customers usually place the order again.

    Customers find a better offer on other websites. As more and more online retailers appear,

    many customers are used to comparing prices of the same products on different websites before

    they check out. Once they find a better offer from another online retailer, they will cancel the

    previous order on our website.

    Personal factors. Customers often put items in their shopping carts when

    they are stimulated by some external incentive, but they still hesitate to buy. Products from

    Saks usually have high prices, so a majority of customers need a longer time to consider. After

    the impulse disappears, most customers think rationally again and decide to

    cancel the order.

    Based on the previous analysis, Saks can prevent many of customers' return and cancelation

    behaviors by taking practical actions. We hereby provide managerial recommendations based on

    each group's characteristics.

    Regarding cluster 7, customers generate relatively low profits but their return rate is the

    highest. We need to decrease the return rate in order to encourage them to spend

    more at Saks. Firstly, we should improve the quality of the information on our website, such

    as providing more description of product details. In this way, customers can have a

    better understanding before they purchase products.

    Secondly, Saks should use better shipping packaging in order to protect products from being

    damaged by external forces. According to our research, customers care more

    about the packaging when they pay high prices for products. So, delicate packaging can not

    only convey a good impression of our company but also match customers' expectations.

    Besides, this group's profits may be relatively low because of their frequent returns, so

    if we can decrease their return rate, their profits will increase somewhat.

    Regarding cluster 8, this group generates relatively high profits, but it also has the highest cancel

    rate and has not purchased products from us for a long time. Saks should provide these

    customers with more straightforward information about products when they shop on our

    website so as to reduce the probability of misleading them. In addition, Saks should highlight

    low stock next to the quantity box in order to give customers a hint that the product may

    not be available for long. In this way, we can largely reduce the time they hesitate and

    motivate them to pay for the order immediately.

    Besides, we can show customers the number of people who are viewing the product at the

    same time. Giving them the impression that the product is really popular can motivate them to

    complete the transaction quickly. A lot of potential profits will be realized if this group's

    cancel rate can be decreased. Since we have the contact information of these customers, Saks

    should send them greeting emails to show our care. By telling them about recent changes at

    our company and our new arrivals, we can trigger their interest again.

    Regarding cluster 4, the mean profit of this group is the second highest, but their cancel rate

    and return rate are relatively high among all groups. Firstly, we need to systematically train

    our salespersons and shopping guides so that they can provide more appropriate

    advice and information for our customers. Considering that this group has not placed orders

    with us for more than two years, it would be helpful to retarget them by sending them

    promotional emails seasonally, especially for holidays. In order to prevent returns,

    we can also provide them discount coupons for their next purchases if they agree to keep their

    products this time. If they insist on returning, we can offer them a partial refund, such as 5% of the

    original price, to convince them to keep the product.

    Regarding cluster 3, this group generates much higher profits than other groups, so these VIP

    customers' return and cancelation behaviors have more serious negative effects on our profits.

    Saks should provide a personal shopping guide for each of them so that we can be aware of

    and solve their problems promptly and correctly. Saks will gain large financial

    returns if we can decrease these VIP customers' return rate to below 1%.

    Regarding clusters 2 and 5, the mean profits of these two groups are at a middle level, and

    their cancel rate and return rate are extremely low. Based on our previous analysis, these

    customers may have some concerns that the returning and canceling process would bring them

    inconvenience, so they are unwilling to purchase high-priced items. For these customers, we

    need to provide a guarantee that if they are not satisfied with our products, they have multiple

    channels to contact us, and we will deal with their problems within 24 hours. We believe that they

    will spend more money if Saks' shopping process becomes more convenient.

    Regarding cluster 6, this group is the largest, accounting for about 30% of all

    customers. Their mean profit is relatively high. More importantly, the time since

    their last order is the shortest. In this situation, we should send them promotional emails or

    mailings more frequently to maintain their interest and to convince them to keep purchasing

    from us. For example, we can send them promotional coupons, such as a 10% discount. We also

    want these customers to generate more profits for our company because they have

    potential profitability. Thus, we can try to offer them information about high-end brands'

    products through emails or mailings, in an effort to persuade these customers to buy

    higher-priced products.


    5. Limitations and Future Research

    Though we successfully identify 8 groups with diverse characteristics, we understand that our

    analysis has its limitations.

    We lack some supporting data to serve decision making and reinforce our

    recommendations. Our study aims at identifying and investigating actionable customer groups

    with unique features. For example, for a high return rate group, lowering its return

    rate should increase its profit. However, although the current data can identify which customers have

    high return/cancel rates, it does not enable us to investigate why they return and/or cancel

    orders. As discussed in the previous sections, the reasons leading to high return/cancel rates are

    diverse. Knowing the motivations and reasons for returning and cancelling would enable us to improve

    and optimize our processes to avoid similar situations in the future. Unfortunately, we could not learn

    relevant insights from the current data; otherwise we would have been able to come up with more specific

    recommendations for different segments.

    For future research, we have to extend our data diversity, especially by adding data that

    helps us learn the reasons for returns and cancelations. Saks has two major retail channels: online

    stores and offline stores. Comprehensively analyzing the entire customer pool requires an

    improved data collection mechanism. For the online channel, one suggestion for future data

    collection is to add a check box listing possible return/cancel reasons to the after-sale-service page.

    The check box window would appear when customers apply for a return or a cancelation so that our database

    can record and store what we need. By the same token, when customers return products in offline stores,

    our sales assistants should also learn their return reasons and record them in the sales system.
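
    To make this suggestion concrete, the sketch below shows one possible shape for such a reason

    record and a simple tally of the recorded reasons. It is only an illustration: the class names,

    field names, reason codes, and sample rows are hypothetical and do not describe Saks' actual

    systems. The reason list follows the return and cancelation reasons discussed in Section 4.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Reason(Enum):
    # Return reasons discussed in Section 4
    PRODUCT_UNSATISFACTORY = "product did not satisfy the customer"
    DAMAGED_IN_SHIPPING = "product damaged during shipping"
    DESCRIPTION_MISMATCH = "description did not match the product"
    POOR_ADVICE = "unclear advice from shopping guide"
    POOR_POST_PURCHASE_SERVICE = "poor post-purchase service"
    # Cancelation reasons discussed in Section 4
    ORDER_MISTAKE = "mistake when placing the order"
    BETTER_OFFER_ELSEWHERE = "better offer on another website"
    PERSONAL = "personal factors"
    OTHER = "other"


@dataclass(frozen=True)
class ReasonRecord:
    customer_id: str
    order_id: str
    event: str        # "return" or "cancel"
    channel: str      # "online" (check box) or "offline" (entered by a sales assistant)
    reason: Reason
    recorded_on: date


def reason_counts(records, event):
    """Count how often each reason was recorded for one event type."""
    return Counter(r.reason for r in records if r.event == event)


# Hypothetical sample rows, for illustration only.
records = [
    ReasonRecord("C001", "O1001", "return", "online", Reason.DESCRIPTION_MISMATCH, date(2017, 3, 1)),
    ReasonRecord("C002", "O1002", "return", "offline", Reason.POOR_ADVICE, date(2017, 3, 2)),
    ReasonRecord("C003", "O1003", "cancel", "online", Reason.BETTER_OFFER_ELSEWHERE, date(2017, 3, 2)),
    ReasonRecord("C004", "O1004", "return", "online", Reason.DESCRIPTION_MISMATCH, date(2017, 3, 4)),
]

return_reasons = reason_counts(records, "return")
cancel_reasons = reason_counts(records, "cancel")
print(return_reasons.most_common(1))
```

    Joining such records to the customer-level clusters by customer ID would let future research

    compare the leading reasons across segments, for example between clusters 7 and 8.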


    The ultimate goal of analyzing customer information and consumption data is to obtain

    financial returns, such as increased profit. We note that there are various ways to improve

    profit. While this study investigates return rate and cancel rate, future research could focus

    on improving profit through increasing revenue.


    6. Appendix

    Table 1. Hierarchical Cluster Analysis on 10% Calibration Sample (cluster means and counts)

    6-cluster solution
    Cluster   Last Order Date   Profit   Return Rate   Cancel Rate   Count
    1                      32      308            18            11     627
    2                      76       50             0             0    1039
    3                      18       85             0             0    2302
    4                      43       66           100             0     254
    5                      53       88          0.05          0.11    1684
    6                      48       75             0           100     135

    7-cluster solution
    Cluster   Last Order Date   Profit   Return Rate   Cancel Rate   Count
    1                      28      611             4           6.6     223
    2                      76       50             0             0    1039
    3                      18       85             0             0    2302
    4                      34      141            26            14     404
    5                      43       66           100             0     254
    6                      53       88          0.05          0.11    1684
    7                      48       75             0           100     135

    8-cluster solution
    Cluster   Last Order Date   Profit   Return Rate   Cancel Rate   Count
    1                      28      611          4.06           6.6     223
    2                      76       50             0             0    1039
    3                      27       58             0             0     994
    4                      34      141            26            14     404
    5                      12      106          0.01             0    1308
    6                      43    65.91           100             0     254
    7                      53       88          0.05          0.11    1684
    8                      48       75             0           100     135

    Table 2. Initial Cluster Centers for Calibration Sample

    Cluster Number                 1         2         3         4         5         6         7         8
    Zscore (Profit)         -0.49722   1.41462  -0.56187  -0.26828  -1.16908   0.07006   0.50463   0.29141
    Zscore (Return Rate)    -0.08598  -0.27678  -0.27678   0.94381  -0.27648   4.42492  -0.27425  -0.27678
    Zscore (Cancel Rate)     0.23681   -0.1999   -0.1999   0.70025   -0.1999   -0.1999  -0.19288    6.4128
    Zscore (Time Duration)   2.86974  -0.29206  -0.24894   0.22321   0.02253  -0.20202  -0.07664  -0.15222

    Table 3. Iteration History for Calibration Sample (change in cluster centers, by cluster)

    Iteration         1         2         3         4         5         6         7         8
    1             1.299      0.88     0.303      0.49     1.333     0.338     0.324     0.455
    2             0.104     0.388     0.401     0.158      0.26     0.004     0.108     0.031
    3             0.029     0.412     0.151     0.065     0.064     0.003     0.141         0
    4             0.037     0.322     0.083     0.041     0.009     0.001     0.131      0.02
    5             0.029     0.254      0.08     0.033     0.018         0     0.105         0
    6             0.011     0.207     0.029     0.031      0.03         0     0.055     0.006
    7             0.004      0.15     0.018     0.027     0.023         0     0.034         0
    8             0.005     0.132     0.012     0.022     0.016         0      0.02     0.002
    9             0.004     0.116     0.006     0.018     0.011     0.001     0.009     0.002
    10            0.007     0.082     0.006     0.014     0.007     0.001     0.007     0.006
    11            0.006     0.057     0.003     0.012     0.006         0     0.005         0
    12            0.006     0.055     0.003     0.005     0.004         0     0.003         0
    13            0.007     0.046     0.002     0.006     0.004         0     0.001         0
    14            0.009     0.033     0.001     0.002     0.006         0         0         0
    15            0.016     0.026     0.001     0.003      0.01         0     0.001         0
    16            0.014     0.026     0.001     0.005     0.011         0     0.003         0
    17            0.006     0.031     0.001     0.005     0.007         0     0.004         0
    18            0.004     0.021     0.001     0.005     0.003         0     0.002         0
    19            0.001     0.019     0.001     0.003     0.001         0     0.001         0
    20            0.001     0.015         0     0.003     0.001     0.001         0         0
    21                0     0.015         0     0.001     0.001         0  4.68E-05         0
    22            0.001     0.017     0.001     0.003  8.96E-05         0  4.97E-05         0
    23                0     0.014         0     0.002         0         0         0         0
    24                0     0.007         0     0.002         0         0         0         0
    25                0     0.002         0     0.003         0         0  5.31E-05         0
    26                0         0         0     0.001         0         0         0         0
    27                0         0  3.02E-05         0         0         0  3.86E-05         0
    28                0         0         0         0         0         0         0         0

    Table 4. Final Cluster Centers for Calibration Sample

    Cluster Number                 1         2         3         4         5         6         7         8
    Zscore (Profit)          -0.1925   4.58926   -0.0074   0.37726   -0.1368   -0.1915   -0.1269   0.03311
    Zscore (Return Rate)     -0.2396   -0.1166   -0.2484   0.89573   -0.2634   4.33294   -0.2722   -0.2672
    Zscore (Cancel Rate)     -0.1706   0.15895   -0.1903   0.88898   -0.1938   -0.1987   -0.1987    6.2202
    Zscore (Time Duration)   1.60161   -0.6019   -1.1245   -0.2261    0.8055   0.00026   -0.1061    0.2118

    Table 5. Number of Cases in each Cluster for Calibration Sample

    Cluster   Cases
    1          7057
    2          1021
    3         17777
    4          2802
    5         13597
    6          2626
    7         13894
    8          1251
    Valid     60025
    Missing      54

    Table 6. Initial Cluster Centers for Validation Sample

    Cluster Number                 1         2         3         4         5         6         7         8
    Zscore (Profit)          0.05353    0.9094   1.63409   0.00658  -0.50291  -1.07304  -0.23003   0.15992
    Zscore (Return Rate)    -0.27678  -0.27678  -0.27678   4.42075   0.56365  -0.27678   0.00582  -0.27678
    Zscore (Cancel Rate)     -0.1999  -0.18996   -0.1999   -0.1999  -0.19252   -0.1999   2.46949    6.4128
    Zscore (Time Duration)  -0.21249  -0.09599  -0.24407  -0.20105   0.62656  -0.19915   0.30391  -0.02783

    Table 7. Iteration History for Validation Sample (change in cluster centers, by cluster)

    Iteration         1         2         3         4         5         6         7         9
    1             0.405     0.206     1.192     0.294     0.859     0.543     0.579     0.258
    2             0.433     0.091     0.721      0.17     0.122     0.387     0.085     0.063
    3             0.129     0.134      0.59     0.199     0.062     0.094     0.019     0.003
    4             0.024     0.212     0.577     0.136     0.016     0.025     0.007     0.018
    5             0.021     0.214     0.534     0.071     0.009     0.017     0.018         0
    6             0.012     0.189     0.553     0.024      0.01     0.017     0.011         0
    7             0.003     0.135     0.431     0.009     0.006     0.014     0.006         0
    8             0.004     0.102     0.404     0.003     0.002     0.009     0.003         0
    9             0.006     0.099     0.412     0.002     0.001     0.007     0.004         0
    10            0.005     0.098     0.498     0.001     0.001     0.006     0.007         0
    11            0.005     0.078     0.383     0.002     0.001     0.004     0.006         0
    12            0.004     0.068     0.286     0.002         0     0.004     0.005         0
    13            0.005     0.065      0.26     0.002     0.001     0.003     0.008         0
    14            0.004     0.057     0.225         0         0     0.003     0.003         0
    15            0.004     0.051     0.177     0.001     0.001     0.002     0.006         0
    16            0.003     0.042      0.19         0         0     0.001     0.002         0
    17            0.003     0.039     0.178         0         0     0.002         0         0
    18            0.003     0.033     0.191         0         0     0.001         0         0
    19            0.002     0.034     0.175     0.001         0     0.001     0.002         0
    20            0.001     0.033     0.257         0     0.001     0.001     0.004         0
    21            0.002     0.032      0.21         0     0.001         0     0.002         0
    22            0.002     0.033     0.265     0.001         0     0.001     0.008         0
    23            0.001     0.023     0.079     0.001         0     0.001     0.002         0
    24            0.001     0.015     0.041         0         0     0.001     0.007         0
    25            0.001     0.012     0.041         0         0     0.001         0         0
    26            0.001     0.007         0         0         0         0         0         0
    27                0     0.002         0         0         0         0         0         0
    28                0     0.001         0         0         0         0         0         0
    29                0         0         0         0         0         0         0         0

    Table 8. Final Cluster Centers for Validation Sample

    Cluster Number                 1         2         3         4         5         6         7         9
    Zscore (Profit_sum)     -0.08745   2.38355  10.34737  -0.13262  -0.17638  -0.15752   0.33883  -0.04898
    Zscore (Return Rate)    -0.23482  -0.08775  -0.13979    3.7263  -0.25802  -0.24841  -0.08657  -0.27678
    Zscore (Cancel Rate)    -0.18893  -0.00429   0.49115  -0.19921  -0.19388  -0.19533   2.58208   6.40827
    Zscore (Time Duration)  -1.10393  -0.62541  -0.55918  -0.03315   1.15412   0.01119   0.00878   0.20639

    Table 9. Number of Cases in each Cluster for Validation Sample

    Cluster   Cases
    1         11681
    2          1604
    3            99
    4          2395
    5         12078
    6         10425
    7           810
    8             0
    9           780
    Valid     39872
    Missing      45

    Table 10. Initial Cluster Centers for All Data

    Cluster Number                 1         2         3         4         5         6         7         8
    Zscore (Profit)          -0.6108   -1.0087   1.36108   -0.2214   0.48402   0.24312   0.04853    0.2496
    Zscore (Return Rate)     -0.1109   -0.2767   -0.2768   0.99106   -0.2506   -0.2768   4.42492   -0.2768
    Zscore (Cancel Rate)     0.19242   -0.1999   -0.1999   0.75803   -0.1211   -0.1999   -0.1999    6.4128
    Zscore (Time Duration)   3.39681    -0.066    -0.239   0.26632   0.67681   -0.2987   -0.2043   -0.1314

    Table 11. Iteration History for All Data (change in cluster centers, by cluster)

    Iteration         1         2         3         4         5         6         7         8
    1             1.463     0.594     1.082     0.447     0.729     0.614     0.325     0.414
    2             0.308     0.183     0.411     0.139     0.103     0.195     0.009     0.035
    3             0.082     0.109     0.349     0.068     0.075     0.077     0.001     0.011
    4             0.042     0.063     0.285     0.051     0.047      0.03     0.001     0.008
    5             0.021      0.03     0.233      0.04     0.023     0.017         0     0.005
    6             0.011     0.021     0.184      0.03     0.017      0.01     0.001         0
    7             0.006     0.015     0.151      0.02     0.013     0.008     0.001         0
    8             0.004     0.006     0.121      0.01     0.004     0.007         0         0
    9             0.001     0.002     0.109      0.01     0.002     0.005         0     0.008
    10                0     0.001     0.087     0.012     0.001     0.003         0     0.003
    11                0     0.001     0.051     0.009         0     0.002         0     0.002
    12                0         0     0.048     0.009         0     0.002     0.001     0.001
    13                0         0     0.036     0.006     0.001     0.001         0         0
    14                0         0     0.028     0.006         0     0.001         0     0.002
    15                0         0     0.022     0.005         0     0.001         0         0
    16                0  5.54E-05     0.019     0.006         0         0         0         0
    17            0.001         0     0.017     0.005         0         0         0         0
    18                0  9.84E-05     0.018     0.004  9.04E-05     0.001         0         0
    19                0         0     0.014     0.002  7.18E-05     0.001         0         0
    20                0  7.04E-05     0.011     0.002         0         0         0         0
    21                0         0     0.005     0.001         0         0         0         0
    22                0  3.14E-05      0.01     0.002         0         0         0         0
    23                0         0     0.013     0.002         0         0         0         0
    24                0         0      0.01     0.003         0         0         0         0
    25                0         0     0.004     0.003  6.85E-05         0         0         0
    26                0         0     0.003     0.001  6.85E-05  8.99E-05         0         0
    27                0         0     0.004     0.001  6.85E-05  9.97E-05         0         0
    28                0         0     0.004     0.001  9.62E-05         0         0         0
    29                0         0     0.005     0.002  9.80E-05         0         0         0
    30                0  4.69E-05     0.004     0.002         0  7.14E-05         0         0
    31                0  4.68E-05     0.006     0.002         0  6.64E-05         0         0
    32                0         0     0.004     0.001         0  9.27E-05         0         0
    33                0         0     0.004         0         0         0         0         0
    34                0         0     0.003         0         0  8.59E-05         0         0
    35                0         0         0         0         0         0         0         0

    Table 12. Final Cluster Centers for All Data

    Cluster Number                 1         2         3         4         5         6         7         8
    Zscore (Profit)          -0.1938   -0.1337   4.57319   0.39816   -0.1373   -0.0092   -0.1946   0.00853
    Zscore (Return Rate)     -0.2394   -0.2717   -0.1138   0.88815   -0.2632   -0.2489   4.33483   -0.2688
    Zscore (Cancel Rate)     -0.1736   -0.1987   0.14937    0.9007   -0.1939   -0.1904   -0.1992    6.2413
    Zscore (Time Duration)   1.60055   -0.1037   -0.6051   -0.2386   0.80404   -1.1248   0.00654   0.20816

    Table 13. Number of Cases in each Cluster for All Data

    Cluster   Cases
    1         11778
    2         23015
    3          1717
    4          4726
    5         22793
    6         29415
    7          4371
    8          2082
    Valid     99897
    Missing      99