
Expert Systems with Applications 36 (2009) 8071–8075


A recommender system to avoid customer churn: A case study

Yi-Fan Wang a, Ding-An Chiang b, Mei-Hua Hsu c,*, Cheng-Jung Lin b, I-Long Lin d

a Department of Information Management, Chang Gung Institute of Technology, Taiwan
b Department of Information Engineering, Tamkang University, Taiwan
c Center for General Education, Chang Gung Institute of Technology, 261, Wen-Hwa 1st Road, Kwei-Shan, Taoyuan, Taiwan
d Department of Information Management, National Central Police University, Taiwan

Keywords: CRM; Data mining; Decision tree; Recommender system

Abstract

A major concern for modern enterprises is to promote customer value, loyalty and contribution through services that can help establish a long-term, honest relationship with customers. For better customer relationship management, data mining technology is commonly used to analyze large quantities of data about customer bargains, purchase preferences, customer churn, etc. This paper proposes a recommender system that helps wireless network companies understand and avoid customer churn. To ensure the accuracy of the analysis, we use the decision tree algorithm to analyze data of over 60,000 transactions and of more than 4000 members, over a period of three months. The data of the first nine weeks is used as the training data, and that of the last month as the testing data. The results of the experiment are found to be very useful for making strategy recommendations to avoid customer churn.

© 2008 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +886 3 2118999; fax: +886 3 2118866. E-mail address: [email protected] (M.-H. Hsu).

0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.10.089

1. Introduction

With wireless formats becoming more and more mature, wireless equipment, such as access points (APs) and wireless network cards, is getting less expensive, and mobile computing devices, such as notebooks and PDAs, are becoming more popular. According to the Dell’Oro Marketing Survey Report, the wireless local area network (WLAN) market grew steadily by about 40% per year from 2003 to 2006. In 2006, the net value of global wireless networking already exceeded 100 billion dollars. The WLAN, as an extension and supplement to wired networking, will continue to be in great demand for the next generation.

When a wireless network company changes its strategy from product-based to customer-based, customer relationship management (CRM) becomes very important. As the 80/20 rule goes, eighty percent of benefit comes from twenty percent of customers, so it is essential that companies understand customer needs and develop suitable CRM strategies in order to ensure customer retention, loyalty, and satisfaction. For this purpose, companies have to collect information about customers and analyze their behavior patterns; the ability to integrate and utilize such information effectively is therefore crucial to their CRM performance.

However, we have to understand customers’ behavior before we can do an excellent CRM job. To do so, we need to collect all kinds of information about customers and analyze their behavior patterns. That would help us adopt appropriate strategies to prevent customer loss. In fact, a regular trading system accumulates an enormous amount of valuable data, so how to integrate and utilize those data is a critical task. The data mart was created especially for this purpose (Demarest, 1994): it stores all kinds of internal and external data for future analysis.

A recommender system is one that recommends useful information or suggests strategies that users might apply to achieve their goals. The system gives suggestions based on a given event, such as an error, or on observations of the user’s overall behavior. A simple example is a search engine that, when no results are found for a query, suggests alternate keywords or queries that may achieve better results (Diamond Bullet, 2004). Recommender systems have been used successfully for e-commerce, movies, music, books, and Web pages.

Mooney and Roy (2000) suggest a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Miller, Albert, Lam, Konstan, and Riedl (2003) propose the recommender system MovieLens, which builds on and extends a movie recommendation service (http://movielens.umn.edu) that provides movie, DVD, and VHS video recommendations, along with a search capability. Chen and Chen (2001) design the Music Recommender System (MRS), which enables a website to provide a personalized music recommendation service based on music data grouping and user interest. The GroupLens recommender system helps users browse among articles in Usenet news (Konstan et al., 1997; Resnick, Iacovou, Suchak, Bergstrom, & Riedl, 1994). Ringo allows users to get music recommendations online and connect with other music fans (Shardanand & Maes, 1995). Further successful applications can be found at online food stores (Svensson, Laaksolahti, Höök, & Waern, 2000), online book stores such as Amazon.com (Linden, Smith, & York, 2003), etc.

Fig. 1. A simplified architecture of the system. (Diagram labels: Raw Data; Our System, containing Data Mart, Field Selection, Classification, and Recommendation; Users.)

Table 1. Description of members.

No.  Income  Age  Degree
1    1       2    Low
2    1       1    High
3    2       1    High
4    1       2    Low
5    3       2    Low
6    2       2    High
7    2       2    High

2. Preliminary

A decision tree is a data mining technique used for classification and prediction, and it is represented as a tree graph. A record first enters the root node of the tree, where a test decides which child node it goes to next. This process repeats until the record reaches a leaf node.

The well-known decision tree algorithms are ID3 (Iterative Dichotomiser 3, Quinlan, 1983), CART (Classification and Regression Tree, Breiman, Friedman, Olshen, & Stone, 1984), and CHAID (Chi-square Automatic Interaction Detector, Kass, 1980). Although the algorithms differ, their purposes are alike. The decision tree has many advantages: the rules it produces are readily understandable, the data can be classified with little computation, and it can handle both numeric and categorical data.

Many researchers target customers by the segmentation method. Pons (2006) uses biometric technology to bridge the gap between consumer need and the marketer’s perception. Lee and Park (2005) put forward a profitable customer segmentation system based on a customer satisfaction survey. Tsai and Chiu (2004) propose a combination of methodologies that include purchased-based similarity measuring, the clustering algorithm, and the clustering quality function. Kim, Jung, Suh, and Hwang (2006) segment customers and develop strategies based on the customer lifetime value. Hwang, Jung, and Suh (2004) present a case study of the wireless telecommunication industry that categorizes customers according to different customer values. Chalmeta (2006) offers a formal methodology for developing and implementing a CRM system that considers and integrates various aspects, including the defining of a customer strategy, the re-engineering of a customer-oriented business process, the management of human resources and computer systems, as well as the management of change and continuous improvement. Chen, Chiu, and Chang (2005) establish a method of mining changes in customer behavior by integrating customer behavior variables, demographic variables, and the transaction database. Van Raaij, Vernooij, and Van Triest (2003) help a firm determine the profit contribution of customers, using customer profitability analysis. Yuan and Chang (2001) suggest a mixed-initiative synthesized learning approach to better understanding of customers. Bae, Ha, and Park (2005) develop a web-based system for a life insurance company to analyze the voices of customers.

Other researchers attempt to predict customer behavior. Suh, Lim, Hwang, and Kim (2004) propose a methodology that supports web marketing by predicting the purchase probability of anonymous customers. Van den Poel and Buckinx (2005) use logit modeling and forward and backward variable-selection techniques to predict a purchase during the next visit to the website. Lariviere and Van den Poel (2005) predict customer retention and profitability, using random forests and regression forests techniques. Verhoef and Donkers (2001) predict customer potential values for the insurance industry. Lu and Lin (2002) determine customer loyalty levels and predict customer behavior in the market-space by examining the significance of the content, context, and infrastructure.
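The root-to-leaf classification described at the start of Section 2 can be illustrated with a minimal sketch. The `Node`, `Leaf`, and `classify` names, the binary threshold tests, and the toy tree are assumptions made for illustration; they are not taken from any of the cited algorithms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned to every record that reaches this leaf


@dataclass
class Node:
    field: str        # field tested at this node
    threshold: float  # records with value < threshold go to the left child
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def classify(tree, record):
    """Route a record from the root down to a leaf and return the leaf label."""
    node = tree
    while isinstance(node, Node):
        node = node.left if record[node.field] < node.threshold else node.right
    return node.label


# Hypothetical two-level tree; field names and thresholds are illustrative only.
toy_tree = Node("x", 5.0, Leaf("A"), Node("y", 2.0, Leaf("B"), Leaf("C")))
```

Calling `classify(toy_tree, {"x": 7, "y": 1})` tests `x` at the root, moves to the right child, tests `y`, and stops at a leaf.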

3. Proposing a recommender system

Our system is implemented in Visual BASIC on an IBM PC and includes four major modules: data mart, field selection, classification, and recommendation. Fig. 1 is a simplified architecture of the system. In the proposed system, users obtain recommendations from the system. Since the process is application-oriented, different applications may need different classification approaches. For present purposes, the decision tree is employed as our classification function.

3.1. Data mart

Since the raw data contain some unneeded fields, a data mart (Demarest, 1994), or a relational database, is required to store the cleaned data. That is, only useful data is downloaded and reformatted into the data mart (Liu & Setiono, 1996).

3.2. Field selection

Field selection is a step that comes before classification. Many field selection algorithms have been proposed, some of which have reported remarkable accuracy improvement (Liu & Motoda, 1998). In this part, we use consistency measurement to find the important fields. The idea of consistency measurement is to describe the raw data with a minimum number of fields. For example, given the data in Table 1, we can determine by consistency measurement which field, "Income" or "Age", is more important to the field "Degree."

First, the number of inconsistencies between "Income" and "Degree" is computed:

I(Income, Degree) = I(Income(1), Degree) + I(Income(2), Degree) + I(Income(3), Degree) = 1 + 0 + 0 = 1.

So the number of inconsistencies between "Income" and "Degree" is one. In the same way, the number of inconsistencies between "Age" and "Degree" is computed:

I(Age, Degree) = I(Age(1), Degree) + I(Age(2), Degree) = 0 + 2 = 2.

So the number of inconsistencies between "Age" and "Degree" is two. Because "Income" produces fewer inconsistencies than "Age", the field "Income" is more important to the field "Degree".
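The counts above can be reproduced with a short sketch. It assumes the usual reading of an inconsistency count: for each value of a field, the number of records that do not belong to the majority "Degree" class for that value. The text does not spell this definition out, so it is an assumption, although it yields the same totals on the Table 1 rows. The names `MEMBERS`, `COLUMNS`, and `inconsistency` are illustrative.

```python
from collections import Counter, defaultdict

# Rows of Table 1: (No., Income, Age, Degree)
MEMBERS = [
    (1, 1, 2, "Low"),
    (2, 1, 1, "High"),
    (3, 2, 1, "High"),
    (4, 1, 2, "Low"),
    (5, 3, 2, "Low"),
    (6, 2, 2, "High"),
    (7, 2, 2, "High"),
]
COLUMNS = {"Income": 1, "Age": 2, "Degree": 3}


def inconsistency(rows, field, target="Degree"):
    """Sum, over each value of `field`, the records outside the majority target class.

    Assumed definition; the paper only reports the resulting counts.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[COLUMNS[field]]][row[COLUMNS[target]]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


income_count = inconsistency(MEMBERS, "Income")
age_count = inconsistency(MEMBERS, "Age")
```

Under this definition the field with the smaller count is preferred, which matches the choice of "Income" over "Age".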

Table 2. Descriptions of input data.

Variable name   Description
Last_con_Day    Interval (days) since the last use of the company’s wireless network
USER_COUNT      Times of logging in
OUT_Flag        Customer loss or retention flag (YES: customer loss; NO: customer retention)
Result          Loss or extend contract


3.3. Classification

Decision tree algorithms have been well studied (Quinlan, 1983; Kass, 1980; Chiang, Chen, Wang, & Hwang, 2001). In this part, we borrow from Chiang et al. (2001) to find useful rules.

3.4. Recommendation

The conditional rules produced by our decision tree algorithm show the characteristics of customer behavior that can lead to customer loss. Such findings serve as the basis for marketing strategy recommendations to prevent future customer loss.

4. Experimental result

The experiment begins with understanding the goals and needs of the business, transforming the knowledge obtained into questions, and then designing a preliminary plan for the goals. Our concern at this step is to get a deep understanding of the subject of our experiment, which is a wireless networking company. Specifically, the purpose is to learn about its source of operational income, methods of metering and collecting fees, the wiring areas, density and coverage rates in different areas, properties of each installation point, etc.

Fig. 2. Results of Last_con_Day used as the input field of the decision tree algorithm.

Fig. 3. Results of USER_COUNT used as the input field of the decision tree algorithm.

A basic understanding of the data is thus gained, including user data and other useful fields, in terms of data characteristics and properties as well as mutual relationships among different data fields. To ensure the accuracy of the analysis, we use the decision tree algorithm to analyze data of over 60,000 transactions and of more than 4000 members, over a period of three months. The data of the first nine weeks is used as the training data, and that of the last month as the testing data.
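A chronological split of this kind might look like the following sketch. The record layout (a date paired with a payload), the start date, and the function name `split_by_time` are hypothetical; only the nine-week training window comes from the text.

```python
from datetime import date, timedelta


def split_by_time(transactions, start, train_weeks=9):
    """Split (day, payload) records into training and testing sets by date.

    Records dated before start + train_weeks go to training; the rest to testing.
    """
    cutoff = start + timedelta(weeks=train_weeks)
    train = [t for t in transactions if t[0] < cutoff]
    test = [t for t in transactions if t[0] >= cutoff]
    return train, test


# Hypothetical records; the start date and payloads are placeholders, not the paper's data.
start = date(2007, 1, 1)
records = [(start + timedelta(days=d), f"txn-{d}") for d in (0, 30, 62, 63, 89)]
train, test = split_by_time(records, start)
```

Splitting by date rather than at random keeps every testing record later than every training record, so the model is always evaluated on behavior it has not yet seen.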

4.1. Field selection

The task is to select data properties, reduce the data size, and clean the data; this step prepares the data for analysis. The analysis uses the fields derived in this step rather than all raw fields, because some of the raw data, such as amounts and times of commerce, may be linearly correlated, and entering such linearly related fields would lead to an undesirable result.
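One simple way to detect linearly related fields is a pairwise Pearson correlation check, sketched below. This is an illustrative technique and not the consistency measurement the system itself uses for field selection; the field names, sample values, and the 0.95 limit are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (assumes non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_correlated(columns, limit=0.95):
    """Keep a field only if it is not strongly correlated with a field already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < limit for k in kept):
            kept.append(name)
    return kept


# Hypothetical raw fields; 'amount' is exactly 50 * 'times', so one of them is redundant.
raw = {
    "times": [1, 2, 3, 4, 5],
    "amount": [50, 100, 150, 200, 250],
    "idle_days": [9, 1, 7, 2, 6],
}
selected = drop_correlated(raw)
```

Here `amount` is dropped because it carries no information beyond `times`, while `idle_days` is only weakly correlated with `times` and is kept.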

The raw data indicates that the most popular program is the pre-paid card category, which accounts for 77% of all sold programs, and that the second most popular is the monthly rental program, accounting for 11%. In terms of age, the customers are mainly between ages 10 and 49, mostly in their twenties, and there are more male users in every age group. Most users have full-time jobs.

Through field selection, as introduced in Section 3.2, we obtain four fields, including the interval in days since the last use, the user’s total number of connections, and a predicted flag. Descriptions of the fields are shown in Table 2.

4.2. Customer churn analysis using the decision tree

In Fig. 2, Last_con_Day is used as the input field of the decision tree algorithm. When Last_con_Day is shorter than 3.5 days, the saturation for contract extension is 77.7%, which means the user will most likely continue the program.

In Fig. 3, USER_COUNT is used as the input field of the decision tree algorithm. When USER_COUNT is greater than or equal to 10.5 logins, the saturation of contract extension reaches 70.8%, meaning that the user will very likely continue the program.

As shown in Fig. 4, when Last_con_Day is 3.5 days or longer, the possibility of contract extension decreases. It is recommended that the company promote programs to such users.

By contrast, as shown in Fig. 5, when USER_COUNT is greater than or equal to 10.5 logins, the possibility of contract extension increases with frequency of use. It is here recommended that promotion programs be offered to those customers whose USER_COUNT is smaller than 10.5.

Fig. 4. Last interval days and possibilities of contract extension.

Fig. 5. Frequencies of use and possibilities of contract extension.

Fig. 6. Potential and actual contract extensions – by percentage.
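The split values reported for Figs. 2 and 3 can be turned into a simple recommendation rule, sketched below. The sketch assumes that a customer falling outside either "likely to extend" region (Last_con_Day below 3.5, USER_COUNT of at least 10.5) should be offered a promotion; the function names and message strings are hypothetical, and the rules actually generated by the system may be richer.

```python
from typing import List

LAST_CON_DAY_SPLIT = 3.5   # split value reported for Fig. 2
USER_COUNT_SPLIT = 10.5    # split value reported for Fig. 3


def churn_signals(last_con_day: float, user_count: int) -> List[str]:
    """Return the reasons, if any, for flagging a customer as a churn risk."""
    signals = []
    if last_con_day >= LAST_CON_DAY_SPLIT:
        signals.append("long interval since last use")
    if user_count < USER_COUNT_SPLIT:
        signals.append("few logins")
    return signals


def recommend(last_con_day: float, user_count: int) -> str:
    """Map churn signals to a marketing action (illustrative wording)."""
    signals = churn_signals(last_con_day, user_count)
    if not signals:
        return "no action"
    return "offer promotion (" + ", ".join(signals) + ")"
```

For example, `recommend(1, 20)` returns `"no action"`, while a customer who last connected five days ago and has logged in three times triggers both signals.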

4.3. Evaluation

The data of the last month, as previously mentioned, is used as the testing data. The forecasted results are found to be very close to the actual percentages of contract extension, which means the results obtained by the decision tree algorithm are highly useful and applicable (Fig. 6).
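A comparison of forecasted and actual extension percentages could be computed as in the sketch below. The outcome lists are made-up placeholders, not the experiment's data, and the helper names are hypothetical.

```python
def extension_rate(flags):
    """Percentage of customers flagged as extending their contract."""
    return 100.0 * sum(flags) / len(flags)


def evaluate(predicted, actual):
    """Return the predicted rate, the actual rate, and their gap in percentage points."""
    p, a = extension_rate(predicted), extension_rate(actual)
    return p, a, abs(p - a)


# Hypothetical test-month outcomes (True = contract extended); not the paper's data.
predicted = [True, True, False, True, False, True, True, False]
actual = [True, False, True, True, False, True, True, True]
p, a, gap = evaluate(predicted, actual)
```

A small gap between the two rates indicates that the forecast tracks the observed behavior at the aggregate level; it does not by itself show that each individual customer was classified correctly.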

5. Conclusion

We have used data mining to dig out useful information from the large volume of data held by a wireless network company. The information so obtained has been analyzed, using the decision tree algorithm, so that we are able to come up with new marketing and promotion strategies. When we take action, we will have to interact with customers and record all related feedback. Such information can help us reevaluate and improve our model. So there exists a constant need to understand and study the various reasons for customer churn. For instance, the behavior patterns of lost and existing customers must be analyzed and compared in order to identify customers who are likely to be lost and adopt appropriate preventive measures. With our proposed recommender system, different strategies can be made readily available that at once help maintain congenial customer relationships and suit new marketing conditions and circumstances. To avoid customer churn, it is essential that the customers, whether existing or lost, get the message that we do care.

References

Bae, S. M., Ha, S. H., & Park, S. C. (2005). A web-based system for analyzing the voices of all center customers in the service industry. Expert Systems with Applications, 28, 29–41.

Breiman, L., Friedman, K. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Chapman & Hall.

Chalmeta, R. (2006). Methodology for customer relationship management. The Journal of Systems and Software, 79, 1015–1024.

Chen, H.-C., & Chen, A. L. P. (2001). A music recommendation system based on music data grouping and user interests. In Proceedings of CIKM’01 (pp. 231–238). Atlanta, Georgia.

Chen, M.-C., Chiu, A.-L., & Chang, H.-H. (2005). Mining changes in customer behavior in retail marketing. Expert Systems with Applications, 28, 77–781.

Chiang, D. A., Chen, W., Wang, Y.-F., & Hwang, L.-J. (2001). Rules generation from the decision tree. Journal of Information Science and Engineering, 17, 325–339.

Demarest, M. (1994). Building the data mart. DBMA Magazine, 7(8), 44–45.

Diamond Bullet (2004). Provider of usability & web design services. Retrieved May 20, 2004, from <http://www.usabilityfirst.com/glossary/term_962.txl>.

Hwang, H., Jung, T., & Suh, E. (2004). An LTV model and customer segmentation based on customer value: A case study on the wireless telecommunication industry. Expert Systems with Applications, 26, 181–188.

Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of Applied Statistics, 29(2), 119–127.

Kim, S.-Y., Jung, T.-S., Suh, E.-H., & Hwang, H.-S. (2006). Customer segmentation and strategy development based on customer lifetime value: A case study. Expert Systems with Applications, 31, 101–107.

Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77–87.

Lariviere, B., & Van den Poel, D. (2005). Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Systems with Applications, 29, 472–484.

Lee, J. H., & Park, S. C. (2005). Intelligent profitable customers segmentation system based on business intelligence tools. Expert Systems with Applications, 29, 145–152.

Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing: Industry Report, <http://dsonline.computer.org/0301/d/w1lind.htm>.

Liu, H., & Motoda, H. (1998). Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers.

Liu, H., & Setiono, R. (1996). Dimensionality reduction via discretization. Knowledge-Based Systems, 9(1), 67–72.

Lu, H., & Lin, J. C.-C. (2002). Predicting customer behavior in the market-space. A study of Rayport and Sviokla’s framework. Information and Management, 40, 1–10.

Miller, B. N., Albert, I., Lam, S. K., Konstan, J. A., & Riedl, J. (2003). MovieLens unplugged: Experiences with an occasionally connected recommender system. In Proceedings of IUI’03 (pp. 263–266). Miami, FL.

Mooney, R. J., & Roy, L. (2000). Content-based book recommending using learning for text categorization. In Proceedings of digital libraries (pp. 195–204). San Antonio, TX.

Pons, A. P. (2006). Biometric marketing: Targeting the online consumer. Communications of the ACM, 49, 61–65.

Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (pp. 463–482). CA, USA: Morgan Kaufmann.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of Netnews. In Proceedings of CSCW’94 (pp. 175–186). NC: Chapel Hill.

Shardanand, U., & Maes, P. (1995). Social information filtering: Algorithms for automating "word of mouth". In Proceedings of CHI’95 (pp. 210–217). CO: Denver.

Suh, E., Lim, S., Hwang, H., & Kim, S. (2004). A prediction model for the purchase probability of anonymous customers to support real time web marketing: A case study. Expert Systems with Applications, 27, 245–255.

Svensson, M., Laaksolahti, J., Höök, K., & Waern, A. (2000). A recipe based online food store. In Proceedings of IUI 2000 (pp. 260–263). LA: New Orleans.

Tsai, C.-Y., & Chiu, C.-C. (2004). A purchased-based market segmentation methodology. Expert Systems with Applications, 27, 265–276.

Van den Poel, D., & Buckinx, W. (2005). Predicting online-purchasing behaviour. European journal of operational research, 166, 557–575.

Van Raaij, E. M., Vernooij, M. J. A., & Van Triest, S. (2003). The implementation of customer profitability analysis: A case study. Industrial Marketing Management, 32, 573–583.

Verhoef, P. C., & Donkers, B. (2001). Predicting customer potential value an application in the insurance industry. Decision Support Systems, 32, 189–199.

Yuan, S.-T., & Chang, W.-L. (2001). Mixed-initiative synthesized learning approach for web-based CRM. Expert Systems with Applications, 20, 187–200.
