© 2015, IJARCSSE All Rights Reserved Page | 417
Volume 5, Issue 4, 2015 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com
Customer Churn Prediction in Telecommunication Industries using Data Mining Techniques: A Review
Kiran Dahiya (M.Tech Student, MRCE Faridabad, India)
Kanika Talwar (Assistant Professor, MRCE Faridabad, India)
Abstract: With the rapid evolution of digital systems and associated information technologies, there is a growing tendency across the economy to build digital customer relationship management (CRM) systems. This trend is especially evident in the telecommunications industry, where enterprises are becoming increasingly digitalized. Customer churn prediction is a key feature of contemporary telecom CRM systems. A churn prediction model helps customer relationship management retain the customers who are likely to quit. In recent years, a number of ensemble and supervised classifiers and data mining techniques have been used to model churn prediction in telecom. This article presents a state-of-the-art review of the methods and research involved in churn prediction. It assesses frequently used data mining techniques for identifying customer churn patterns in the telecom industry. The recent literature on predictive data mining techniques for customer churn behaviour is reviewed and categorized by the method used, and a discussion of future research directions is presented.
Keywords: Customer churn prediction; machine learning; data mining; feature extraction; feature selection; learning methods; classification system; telecommunication CRM systems; review study.
I. INTRODUCTION
A churn prediction model guides customer relationship management in retaining the customers who are expected to quit. Telecom management strives to acquire precise and timely information about the customers who are inclined to quit, and the churn prediction model plays an important role in this effort. Intense competition and saturated markets have left telecom companies with little margin to ignore a high churn rate. This is principally because acquiring a new customer costs five to six times more than retaining an existing one. Consequently, a competent churn prediction model has a substantial role to play in the telecom industry.
Figure 1. The Churn Prediction Landscape
Contemporary churn prediction systems usually rely on classification algorithms. However, a classifier often suffers from the enormous size and high dimensionality of telecom datasets. Moreover, telecom datasets are usually imbalanced, with few instances of the minority class, which also hinders effective performance. To address the challenges that classifiers face in telecom churn prediction, researchers have proposed various hybrid approaches, in which ensemble classification schemes are typically combined with preprocessing techniques.
Dahiya et al., International Journal of Advanced Research in Computer Science and Software Engineering 5(4),
April- 2015, pp. 417-433
A. Customers Churn
Customers can churn for various reasons, and churn can happen at any time. Churn prediction is made more difficult by the fact that customers can show seasonal behavior. Therefore, customer churn should be defined carefully to avoid misidentifying churned customers. To this end, we take into account the duration of the latest lapse period and changes in customer activity during the same season year over year.
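The definition above can be made concrete with a small sketch. The 60-day lapse threshold, the 50% seasonal activity ratio, and the function name `is_churned` are illustrative assumptions; the text does not give concrete values.

```python
from datetime import date

# Hypothetical thresholds; the text does not specify concrete values.
LAPSE_DAYS = 60        # minimum inactivity before a customer may count as churned
SEASONAL_DROP = 0.5    # activity must fall below 50% of the same season last year


def is_churned(last_activity: date, today: date,
               activity_this_season: float,
               activity_same_season_last_year: float) -> bool:
    """Flag churn only if the latest lapse is long AND activity dropped
    relative to the same season one year earlier."""
    lapse = (today - last_activity).days
    if lapse < LAPSE_DAYS:
        return False
    if activity_same_season_last_year == 0:
        # No seasonal baseline available: fall back to the lapse criterion alone.
        return True
    ratio = activity_this_season / activity_same_season_last_year
    return ratio < SEASONAL_DROP
```

Comparing against the same season of the previous year keeps a customer who is merely seasonally quiet from being labelled a churner.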
Customer churn is a fundamental problem for companies; it is defined as the loss of customers who move to competitors. Being able to predict customer churn in advance gives a company highly valuable insight for retaining and growing its customer base. A wide range of customer churn predictive models has been developed in recent years. The most advanced models use state-of-the-art machine learning classifiers such as random forests [1] [2]. Machine learning classifiers work well if enough human effort is spent on feature engineering, so that a reasonable class boundary can be found in feature space. Thus, having the right features for each particular problem is usually the most important factor.
Figure 2. Nokia Siemens customer acquisition & churn study
To solve these problems, companies have spent considerable feature engineering effort designing specific features for problems such as churn prediction and fraud detection. More problematic still, the features obtained from this manual feature engineering process are usually over-specified and incomplete. Feature engineering becomes suboptimal in companies that hold huge amounts of data. For instance, in some telecommunication companies the data warehouse system holds more than 100,000 variables.
Since deep learning attempts to learn multiple levels of representation and automatically derives good features and representations of the input data, we have recently investigated the application of deep learning to predicting customer churn in prepaid mobile telecommunication networks.
Our main motivation for considering deep learning as a predictive model is to avoid time-consuming feature engineering effort and ideally to increase the predictive performance of previous models (or at least not degrade it).
Well-known companies are also reporting the use of deep learning in commercial products. Microsoft's work on the Mavis speech recognition system represents one of the first examples of using a deep neural network in a commercial product [3]. IBM and Google are also using deep learning models for speech recognition [4] [5] and image processing. Deep learning has also been applied to other domains such as automatic music recommendation [6] and prediction of protein structure [7]. However, to the best of our knowledge this is the first work reporting the use of deep learning for predicting churn in a mobile telecommunication network.
Figure 3. The Opportunities Churn Prediction provides
To investigate the feasibility of using deep learning models in production, we trained and validated the models on large-scale historical data from a telecommunication company with ≈1.2 million customers, spanning sixteen months. This dataset is extremely challenging because the churn rate is very high and all customers are prepaid users, so there is no specific contract termination date and churn must be inferred in advance from similar behaviors. Figure 4 depicts a fragment of the input data as a graph; it can be seen that there are complex underlying interactions among the users.
Figure 4. A real graph representation of telecommunication data where each node identifies a device (IMEI) or a phone number (SIM) and edges represent different types of interactions among them.
In a world of ever-rising competition in the market, firms have become aware that they should put substantial effort not only into persuading customers to sign contracts, but also into retaining current customers. Van Den Poel & Lariviere [8] have shown that in the current situation, where people are given an enormous choice of offers and different service providers, winning new clients is a costly and hard process. Therefore, putting more effort into keeping churn low has become important for service-oriented firms.
Van Den Poel & Lariviere [8] summarize the economic value of customer retention:
lowering the need to pursue new and possibly risky clients, which permits focusing on the demands of current clients;
long-term clients tend to purchase more;
positive word of mouth from satisfied clients is a good way to acquire new clients;
long-term clients are less costly to serve, because of the better knowledge of their demands held in the database;
long-term clients are less sensitive to competitors' marketing activities;
losing clients results in lower sales and an increased need to attract new clients, which is 5-6 times more expensive than the money spent on retaining current customers;
people tend to share negative service experiences with friends more often than positive ones, resulting in a negative image of the company among possible future clients.
Customer Relationship Management (CRM) tools have been developed and applied in order to increase customer acquisition and retention, to grow profitability, and to support important analytical tasks such as predictive modelling and classification. Typically, CRM applications hold an enormous set of information about every individual client. This information is gathered from clients' activity at the company, data entered by the customer during registration, calls to customer-care hotlines, etc. Appropriate analysis of this data can bring remarkable results for marketing purposes, but also for identifying clients who are likely to cancel their contract.
Typically, database entries are scored using a statistical model defined over various attributes that describe the clients. These attributes are often called predictor variables. Higher scores indicate a greater likelihood of churning. Models are built using statistical methods such as regression analysis, classification trees, and neural networks.
B. Customer Churn Analysis for Telecom Industry
Data volume has been rising at an incredible pace over the last two decades due to advances in information technology. At the same time there has been huge progress in data mining. Many new approaches and methods have been added to process data and gather information. The data collected from any source is raw data in which the valuable information is hidden. Data mining can be defined as the process of extracting valuable information from data. Data mining methods have been applied effectively in many different domains.
The toughest difficulty faced by the telecom industry is customer churn. Customer churn models aim to identify customers with a high probability of leaving the service provider. A database of clients who might churn permits the company to target those clients and start retention strategies that decrease the percentage of customers churning.
Figure 5. Architecture and Design of a Platform for Adaptive, Real-time Churn Prediction using Stream Mining
Retaining existing clients is always the preferable option for a company. Attracting new clients costs almost 5-6 times more than retaining existing clients. Attracting a new client involves new recruitment of manpower, the cost of publicity, and discounts. A loyal client, who has been with the business for a long time, tends to produce higher profits and is less sensitive to competitor prices. Such clients also cost less to keep and, in addition, offer valuable word-of-mouth marketing to the business by referring their relatives, friends, and other associates. In the telecom industry, the system is built to offer service to some average number of clients; when the number of clients falls below the planned number, it is considered a loss to the company [9].
A small step towards retaining a current customer can lead to a significant growth in revenues and profits. The need to retain customers calls for customer churn prediction models that are both accurate and comprehensible. The models have to identify customers who are about to churn and their reasons for churning. To avoid losses to the telecom industry, a model should be established that identifies the reasons for churn and the improvements required to retain customers.
II. DECISION TREE BASED METHODS FOR CUSTOMER CHURN PREDICTION
A decision tree is used to predict future trends and to extract models based upon interconnected decisions [10-13]. It works on the principle of categorizing data into classes in accordance with their features. Internal nodes follow the root node and cover all available options [14, 12]. Thus a tree is built with each arc relating to a specific response.
Decision trees are the most commonly used tool for classification and prediction of future events. Such trees are grown in two major steps: building and pruning. During the first phase the data set is partitioned recursively until most of the records in each partition have the same value. The second phase then eliminates some branches that contain noisy data (those with the largest estimated error rate).
CART, a Classification and Regression Tree, is created by recursively dividing the instances into subgroups until a defined criterion has been met. The tree grows until the reduction in impurity falls below a user-defined threshold. Each node in the decision tree is a test condition, and the branching is based on the value of the attribute being tested. The tree represents a group of multiple rule sets. When evaluating a client data set, classification is done by traversing the tree until a leaf node is reached. The label of the leaf node (Churner / Non Churner) is assigned to the client record under assessment. Figure 6 shows a basic churn prediction decision tree for the telecommunication sector.
Figure 6. A simplified churn prediction decision tree
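The traversal described above can be sketched in a few lines of Python. The tests mirror the labels of the simplified tree (age, usual call duration, placed calls), but the assignment of tests to branches and of labels to leaves is an assumption made for illustration, as are the dictionary keys.

```python
# A tiny hand-built tree; the branch and leaf layout is an assumption,
# not a reconstruction of the figure.
TREE = {
    "test": lambda c: c["age"] < 60,
    "yes": {
        "test": lambda c: c["usual_call_minutes"] < 2,
        "yes": "Churner",
        "no": "Non Churner",
    },
    "no": {
        "test": lambda c: c["placed_calls"] > 10,
        "yes": "Non Churner",
        "no": "Churner",
    },
}


def classify(node, customer):
    """Walk from the root to a leaf, following the outcome of each test."""
    while isinstance(node, dict):
        node = node["yes"] if node["test"](customer) else node["no"]
    return node  # a leaf is just the class label
```

Each root-to-leaf path corresponds to one rule, which is why a tree can be read as a group of rule sets.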
Decision trees are often criticized as unsuitable for capturing complex and non-linear relationships among attributes. Nevertheless, research [15-17] shows that both the accuracy of decision trees and their training data requirements are high.
Oseman et al. [18] presented how to apply decision tree classification methods to churn analysis in the telecommunication industry. A sample set was used to test the customer churn problem using the ID3 decision tree. In their results they found that the subscriber's area was the main classification attribute contributing to client churn, in addition to two minor reasons for customers to churn.
[Figure 6 content: a decision tree with the tests "Age < 60", "Usual call duration < 2 min", and "Placed calls > 10", Yes/No branches, and leaves labelled Churner or Non Churner.]
In Taiwan, Wei & Chiu [19] applied C4.5-based procedures to one of the largest local mobile telecommunication companies and identified 28.32% of the subscribers, a group that contained some of the true churners, with a lift factor of 2.30 and a retention time of 14 days. This can be related to the research by Jahromi et al. [20], which aimed at developing a predictive model for client churn in pre-paid mobile telephony companies. They applied decision tree methods such as C5.0 together with a neural network, and it was shown that, based on the improvement measure, decision trees performed better than neural networks. A related study was carried out by Yeshwanth [21], who combined the J48 decision tree with a genetic algorithm and constructed a hybrid evolutionary method for churn prediction in mobile networks. The author achieved 72% accurate results for the largest telecom company in developing countries. Kaur et al. [22] applied Naive Bayes, J48, and support vector machine classifiers to process data so as to identify the important characteristics of customers that help in forecasting the churn of bank clients. In their findings, they concluded that the prediction success rate of the loyal class is lower than that of the churn class. Additionally, they found that the J48 decision tree performed better than the other methods.
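The lift factor mentioned above compares the churn rate within a targeted group to the overall churn rate. A minimal sketch follows; the function name and the subscriber counts are hypothetical and are not taken from [19].

```python
def lift(targeted_churners: int, targeted_size: int,
         total_churners: int, total_size: int) -> float:
    """Lift = churn rate inside the targeted group / churn rate overall."""
    targeted_rate = targeted_churners / targeted_size
    base_rate = total_churners / total_size
    return targeted_rate / base_rate


# Hypothetical counts: 10,000 subscribers with 500 churners overall;
# a model targets 2,000 subscribers, 230 of whom turn out to churn.
example = lift(230, 2000, 500, 10000)  # about 2.3
```

A lift above 1 means that the targeted group is richer in churners than a random sample of the same size would be.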
Soein & Rodpysh [23] performed tests in Iran comparing numerous well-known data mining methods (C5.0, QUEST, CART, CHAID, Bayesian networks, and neural networks) to find the best technique for customer churn prediction at the Iranian Insurance Company. The results showed that the CART decision tree performed better than the other methods. Other scholars, Hadden et al. [24], aimed to specify the most appropriate model for churn prediction analysis. They evaluated different algorithms such as CART trees, neural networks, and regression, and assessed their accuracy in predicting customer churn. They found that decision trees outperform the other methods, with an overall accuracy of 82%.
Qureshi et al. [25] predicted active churners in the telecom industry by applying numerous data mining methods, such as K-Means clustering, logistic regression, neural networks, linear regression, Exhaustive CHAID, CART, QUEST, and CHAID. They found that Exhaustive CHAID performed well relative to all the other methods: it correctly identified 60% of churners, the highest percentage among all the methods. The other decision tree variants, however, did not perform as well as Exhaustive CHAID. Jahromi et al. [20] conducted research aimed at developing a predictive model for customer churn in pre-paid mobile telephony companies. They tested the performance of numerous model-building algorithms such as neural networks, C5.0, CART, and CHAID.
III. NAÏVE BAYES BASED METHODS FOR CUSTOMER CHURN PREDICTION
Naïve Bayes learning produces a probabilistic model of the observed data. Despite its simplicity, Naïve Bayes has been shown to be competitive with more complex algorithms such as neural networks or decision trees in certain domains [26, 27]. Given a training set of instances, each represented as a vector of features $[x_1, x_2, \ldots, x_d]$, the task is to learn from the data so as to predict the most probable class $y_j \in \mathbb{C}$ of a new instance whose class is unknown. Naïve Bayes employs Bayes's theorem to estimate the probabilities of the classes.
$$P(y_j \mid x_1, x_2, \ldots, x_d) = \frac{P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j)}{P(x_1, x_2, \ldots, x_d)} \qquad (1)$$
where $P(y_j)$ is the prior probability of class $y_j$, estimated as its frequency of occurrence in the training data; $P(y_j \mid x_1, x_2, \ldots, x_d)$ is the posterior probability of class $y_j$ after observing the data; $P(x_1, x_2, \ldots, x_d \mid y_j)$ denotes the conditional probability of observing an instance with the feature vector $[x_1, x_2, \ldots, x_d]$ among those having class $y_j$; and $P(x_1, x_2, \ldots, x_d)$ is the probability of observing an instance with feature vector $[x_1, x_2, \ldots, x_d]$ regardless of the class. Since the posterior probabilities over all classes sum to one, $\sum_{y_j \in \mathbb{C}} P(y_j \mid x_1, x_2, \ldots, x_d) = 1$, the denominator on the right-hand side of equation (1) is a normalizing factor and can be omitted.
$$P(y_j \mid x_1, x_2, \ldots, x_d) \propto P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j) \qquad (2)$$
An instance is labelled with the class that has the highest posterior probability, $y_{MAP}$.
$$y_{MAP} = \arg\max_{y_j \in \mathbb{C}} P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j) \qquad (3)$$
In order to estimate the term $P(x_1, x_2, \ldots, x_d \mid y_j)$ by counting frequencies, one needs a huge training set in which every possible combination $[x_1, x_2, \ldots, x_d]$ appears many times, so that the estimates are reliable [26]. Naïve Bayes solves this problem through its naïve assumption that the features that define instances are conditionally independent given the class. Therefore, the probability of observing the combination $[x_1, x_2, \ldots, x_d]$ given the class is simply the product of the probabilities of observing each individual feature value, $P(x_1, x_2, \ldots, x_d \mid y_j) = \prod_{i=1}^{d} P(x_i \mid y_j)$. Substituting this approximation into equation (3) yields the Naïve Bayes classification rule.
$$y_{MAP} = \arg\max_{y_j \in \mathbb{C}} P(y_j) \prod_{i=1}^{d} P(x_i \mid y_j) \qquad (4)$$
As discussed above, for a nominal feature the probability is estimated as the frequency over the training data. For a continuous feature, there are two solutions. The first is to discretize the continuous features, transforming them into nominal ones. The second is to assume that they follow a normal distribution.
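The classification rule of equation (4) can be sketched for nominal features with plain frequency counts. The function names and the toy customer records below are hypothetical and serve only to illustrate the computation; no smoothing is applied here.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[(i, value, y)] = number of class-y rows with x_i == value
    feature_counts = defaultdict(int)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, y)] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts):
    """Return the class maximizing P(y) * prod_i P(x_i | y)."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total                      # prior P(y)
        for i, value in enumerate(row):
            score *= feature_counts[(i, value, y)] / n_y   # P(x_i | y)
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Hypothetical records: (plan type, usage level) with a churn label.
rows = [("prepaid", "high"), ("prepaid", "low"), ("postpaid", "low"),
        ("postpaid", "high"), ("prepaid", "high")]
labels = ["churn", "churn", "stay", "stay", "churn"]
model = train_naive_bayes(rows, labels)
```

With these records, a feature value never seen together with a class drives that class's score to zero, which is the weakness that frequency-based estimates have on small training sets.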
The term $P(x_i \mid y_j)$ is estimated by the fraction $\#D(x_i \mid y_j) / \#D(y_j)$, where $\#D(y_j)$ is the number of instances in the training set having class $y_j$, and $\#D(x_i \mid y_j)$ is the number of these instances having feature value $x_i$ and class $y_j$. If the training data does not contain any instance with this particular combination of class and feature value, $\#D(x_i \mid y_j)$ is zero. The probability estimated according to equation (4) will then be zero in every such case. To avoid this, a correction called the m-estimate is introduced [28, 29].
$$P(x_i \mid y_j) = \frac{\#D(x_i \mid y_j) + m\,P(x_i)}{\#D(y_j) + m} \qquad (5)$$
If the prior probability $P(x_i)$ is unknown, a uniform distribution is assumed, i.e. if a feature has $k$ possible values, then $P(x_i) = 1/k$. The parameter $m$ can be regarded as $m$ additional dummy instances appended to the training set.
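The m-estimate of equation (5) is a one-line computation. The sketch below uses a hypothetical function name and hypothetical counts to show how a zero count is turned into a small positive probability.

```python
def m_estimate(count_xy: int, count_y: int, m: float, prior: float) -> float:
    """Equation (5): (#D(x_i|y_j) + m * P(x_i)) / (#D(y_j) + m)."""
    return (count_xy + m * prior) / (count_y + m)


# Hypothetical case: a feature with k = 4 possible values (uniform prior 1/4),
# a value never observed among 20 instances of the class, and m = 4.
smoothed = m_estimate(count_xy=0, count_y=20, m=4, prior=1 / 4)  # 1/24
```

Setting m to zero recovers the plain frequency estimate, while larger values of m pull the estimate towards the prior.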
S. Balaji [30] used the Naïve Bayesian classification algorithm for customer classification and to predict churners in the life insurance sector. He also used Naïve Bayes classification to classify customers from a larger dataset. The work also analyses the issues of using data mining technology for predicting customer habits. In this analysis, 10,000 samples of life insurance customers were tested; the unprocessed data can be converted into useful information and then into knowledge, for which predictive data mining techniques were used. A posterior classification process was applied to the data. The study reported that the Naïve Bayes classifier was much better than the other classifiers for determining policy preferences for customers, which helps to raise the income of the organization.
IV. NEURAL NETWORKS BASED METHODS FOR CUSTOMER CHURN PREDICTION
In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning algorithms inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" which can compute values from inputs, and are capable of machine learning as well as pattern recognition thanks to their adaptive nature.
Figure 7. There are a number of common activation functions in use with neural networks. This is not an exhaustive list.
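As a rough illustration of how a single neuron computes a value from its inputs, the sketch below applies an activation function to a weighted sum. The three functions shown (sigmoid, tanh, ReLU) are common choices selected here as an assumption; the figure's exact list is not reproduced.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    return math.tanh(z)


def relu(z: float) -> float:
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

A network is built by feeding the outputs of such neurons into further neurons, and learning consists of adjusting the weights and biases.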
Neural networks (NN) are a data mining technique that has the capability of learning from errors [31]. Neural networks are inspired by the brain, in the sense that the brain learns new things that are then communicated via neurons. Similarly, a neural network's neurons, together with a learning algorithm, are able to learn from training data; this is why they are referred to as artificial neural networks [32]. The results of Lazarov & Capota [33] showed that ANNs gave the best results compared with other known algorithms. Furthermore, they argued that a suitable prediction model needs constant updating and should apply a variety of data mining algorithms. Au et al. [34] consider that the major limitation of neural networks is that they hardly uncover patterns in an easily understandable manner.
Their study also showed that neural networks outperform decision trees for churn prediction by identifying more churners than C4.5 decision trees. This is in line with the research by Mozer et al. [35], which shows that a nonlinear neural network outperforms a decision tree and logistic regression. Sharma & Panigrahi [36] propose a neural network-based approach to customer churn prediction for cellular wireless services. The results of experiments on the churn dataset from the UCI repository indicate that the neural network-based approach can predict customer churn with an accuracy of more than 92%. The accuracy achieved by neural networks outweighs their limitations, namely that they need large data sets and a lot of time to determine suitable weights for the predictor attributes [33].
V. CUSTOMER CHURN IN TELECOMMUNICATION
Telecommunication has gained one of the top positions in the list of the fastest-growing industries in the world, covering 90% of the world's population [37]. It is one of the sectors where the customer base plays a very significant role in maintaining revenue [38]. The telecommunication sector is facing a severe threat from customer churn [39-42].
According to [43], the wireless telecom industry faces the threat of losing 27% of its customers every year, which would certainly result in a vast loss of revenue. It is also an accepted fact that acquiring a new customer costs 5-10 times more than retaining an existing customer [44].
Consequently, [45] proposes that a company should pay more attention to retaining its present subscribers than to adding new ones.
At present, business firms pay more attention to building a firm relationship with their customers [46-49]. It has therefore become accepted that the best marketing strategy is to retain existing subscribers or, more simply, to avoid customer churn [49-52]. To tackle this problem, data mining techniques have been shown to be the best tools against the ever-increasing rate of customer churn [47, 53-58].
VI. REGRESSION BASED METHODS FOR CUSTOMER CHURN PREDICTION
Regression analysis is a popular technique used by the researchers dealing with predicting customer satisfaction. It
provides a first step in model development. Mihelis et al. [59] developed a method to determine customer satisfaction
using an ordinal regression based approach. Another model for assessing the value of customer satisfaction was
developed by [60]. They used logistic regression to link satisfaction with attributes of customer retention. They claim that
the logistic function can be interpreted as providing the retention probability. The authors of [61] use a binomial logit model to
determine subscriber churn in the telecommunications industry, based on discrete choice theory. Discrete choice theory is
the study of behaviour in situations where decision makers must select from a finite set of alternatives. According to [62]
regression analysis is fine for determining a probability for prediction; however it is unable to explicitly express the
hidden patterns in a symbolic and easily understandable form.
Figure 8. Illustration of linear regression on a data set.
Hwang et al. [63] discovered that logistic regression performed best for predicting customer churn when compared with
neural networks and decision tree. It should be noted that [63] were investigating a prediction of the customer lifetime
value (CLV), with the intent of including customer churn; they suggest that logistic regression was the best model for
their purpose. The authors believe that many factors could influence these results such as the neural network parameters
chosen and the data that the experiment was based on. The data used for experimentation may have been more suited to a
logistic regression model than that of a neural network or decision tree.
Datta et al. [64] used simple regression to initially predict churn but later experimented with KNN, decision trees and
neural networks. The overall model used to develop the churn prediction platform was done using a neural network.
Their research could not establish a best method. They have stated future directions as including an explanation of
customer behaviour because their model was unable to predict customer churn accurately. They also suggest that
information stored externally to the organisation's database should be included, such as the state of the
telecommunications market and current competing offers. The model suggested by [64] fails to distinguish between
loyal customers, valuable customers and less profitable customers. They suggest that future research should include a
more financial orientated approach by optimising payoff. They further suggest that by concentrating on payoff rather than
churn the developed model would weight those customers bringing in higher profits over those bringing in lesser profits.
Baesens et al. [65] used Bayesian network classifiers for identifying the slope of the customer lifetime value (CLV) for
long-life customers, but used simple linear regression on the historical contributions of each customer to capture their
individual lifecycles. The slope was then separated into either positive or negative classes to represent increased or
decreased spending. This variable was then used as the dependent variable in their study. The CLV was also the focus of
research by [66] who used the Kaplan-Meier estimator to estimate the value, and [67] who link CLV to company
shareholder value.
VII. LOGISTIC REGRESSION BASED METHODS FOR CUSTOMER CHURN PREDICTION
Naïve Bayes is a generative classifier. Given the data $x \in \mathbb{R}^d$ and the class $y \in \{-1, +1\}$, it learns a model of the conditional probability $P(x \mid y)$ and the prior probability $P(y)$, and predicts the class that maximizes the posterior $P(y \mid x)$. Logistic regression, by contrast, is a representative discriminative classifier. It learns a direct map from input $x$ to output $y$ by modelling the posterior probability $P(y \mid x)$ directly [68]. The parametric model proposed by logistic regression has the form
$$P(y = -1 \mid x) = \frac{1}{1 + \exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)} \qquad (1)$$
and
$$P(y = 1 \mid x) = \frac{\exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)}{1 + \exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)} \qquad (2)$$
The main task of logistic regression is to adjust the weights so that the model fits the data as well as possible.
$$w = [w_0, w_1, w_2, \ldots, w_d] \leftarrow \arg\max_{w} \prod_{k} P(y^{(k)} \mid x^{(k)}, w) \qquad (3)$$
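where $w$ is the vector of parameters, $y^{(k)}$ is the observed value of $y$ and $x^{(k)}$ the observed value of $x$ in the $k$-th training instance. The maximization in equation (3) is known as maximum likelihood estimation (MLE). Because the logarithm is monotonic, the same weights maximize the logarithm of the likelihood: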
$$w = [w_0, w_1, w_2, \ldots, w_d] \leftarrow \arg\max_{w} \sum_{k} \ln P(y^{(k)} \mid x^{(k)}, w) = \arg\max_{w} L(w) \qquad (4)$$
where $L(w)$ is called the conditional log-likelihood of the class [69]. Maximization of $L(w)$ can be achieved, for example, by gradient ascent.
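A minimal sketch of maximizing the conditional log-likelihood by batch gradient ascent is given below, using the convention that the positive class has probability exp(z) / (1 + exp(z)) with z the weighted sum of the inputs. The learning rate, the number of steps, and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: exp(z) / (1 + exp(z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(w, xs, ys):
    """L(w) = sum_k ln P(y_k | x_k, w), with y in {-1, +1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        p_pos = sigmoid(z)                       # P(y = +1 | x)
        total += math.log(p_pos if y == 1 else 1.0 - p_pos)
    return total


def fit(xs, ys, lr=0.1, steps=500):
    """Batch gradient ascent on L(w); lr and steps are illustrative choices."""
    d = len(xs[0])
    w = [0.0] * (d + 1)                          # w[0] is the intercept w_0
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = (1.0 if y == 1 else 0.0) - sigmoid(z)
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical one-feature data with overlapping classes.
xs = [[0.0], [1.0], [2.0], [3.0], [1.5], [1.4]]
ys = [-1, -1, 1, 1, -1, 1]
w = fit(xs, ys)
```

The gradient of the log-likelihood with respect to each weight is the sum over instances of (indicator of the positive class minus predicted probability) times the corresponding input, which is what the inner loop accumulates.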
Although designed for continuous features, logistic regression can still handle nominal features and missing values. Nominal features are converted to binary features, and missing values are replaced by the mean (continuous features) or the mode (binary features) of the training data.
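The preprocessing described above can be sketched with the standard library alone. The helper names are hypothetical, and missing values are assumed to be represented by `None`.

```python
from collections import Counter
from statistics import mean


def one_hot(values):
    """Turn one nominal column into binary indicator columns."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows


def impute_mean(column):
    """Replace missing entries of a continuous column by the observed mean."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def impute_mode(column):
    """Replace missing entries of a binary column by the most frequent value."""
    observed = [v for v in column if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]
```

In practice the mean and the mode would be computed on the training data only and then reused for any new data, so that no information leaks from the evaluation set.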
M. Owczarczuk [70] used logistic regression, G. Nie et al. [71] used logistic regression and a decision tree model, and A. Keramati and S. M. S. Ardabili [72] focused on a binomial logistic regression model for churn prediction and identified that customer dissatisfaction, service usage, switching cost, and demographic variables affect customer churn. B. Shim et al. [73] used a decision tree, a neural network, and logistic regression for customer classification and found that the decision tree showed the highest hit ratio among them. P. Kisioglu and Y. I. Topcu [74] applied a Bayesian belief network to find the most important factors affecting customer churn in the telecommunication industry, with the CHAID algorithm used to discretize continuous variables.
VIII. COVERING OR RULE BASED ALGORITHMS FOR CUSTOMER CHURN PREDICTION
There are many families of covering algorithms, such as AQ, CN2, RIPPER and the RULES family, in which rules are induced directly from a given set of training examples. This can be illustrated by the work of Verbeke et al. [75], who applied two novel data mining methods to customer churn prediction. They benchmarked these methods against established techniques such as C4.5, RIPPER, SVM and logistic regression. They used both ALBA and AntMiner+ to induce accurate and comprehensible classification rules. The experimental results showed that the highest accuracy is obtained by combining ALBA with C4.5 or RIPPER, and that the highest sensitivity is obtained when C4.5 and RIPPER are applied to an oversampled dataset.
Figure 9. An instance of the set-covering problem, where X consists of the 12 black points. The greedy algorithm produces a cover of size 4 by selecting the sets S1, S4, S5, and S3 in order, which need not be a minimum-size set cover.
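
The greedy strategy mentioned in the caption of Figure 9 can be sketched as below. The instance used here is a small made-up one (it is not the 12-point instance of the figure), chosen so that the greedy cover is larger than the smallest possible cover.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset that covers the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Ties are broken by dictionary insertion order.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: {S2, S3} is a cover of size 2, but the greedy
# rule starts with the largest set S1 and ends up needing three sets.
toy_subsets = {
    "S1": {1, 2, 3, 4},
    "S2": {1, 2, 5},
    "S3": {3, 4, 6},
}
toy_cover = greedy_set_cover(range(1, 7), toy_subsets)
```

Covering rule learners follow the same pattern, with training examples playing the role of the elements and candidate rules playing the role of the subsets.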
The RULE Extraction System (RULES) is distinguished from other families of covering algorithms by its simplicity and easy flow. The first member of the RULES family, RULES-1 [76], was published in 1995. Since then, several versions of the algorithm have been developed and applied in a number of domains [77]. From the literature review, we found that there has been little research on inductive learning covering algorithms and their applications to customer churn in the telecom industry. RULES family algorithms are very suitable tools for data mining applications. For instance, Aksoy et al. [78] stated that the RULES-3 inductive learning algorithm is a very good choice for data mining. In that study [78] they applied RULES-3 to eleven real-life data sets and compared it with three statistical, two lazy and six rule-based data mining algorithms in terms of learning rate, accuracy and robustness to noisy and incomplete data. The good performance of RULES-3 is attributed to the following features: it can handle large sets of examples without having to break them up into smaller subsets; it produces only rules that contain only relevant conditions; it allows a degree of control over the number of rules to be extracted; it can be applied to problems involving numerical attributes as well as nominal attributes; and it gives the user great flexibility to control the precision of the rules to be generated, which can help in building better models.
IX. STATISTICAL DATA ANALYSIS BASED TECHNIQUES FOR CUSTOMER CHURN PREDICTION
Statistical techniques are a collection of methods applied in data mining to process large volumes of data. They are used to learn the relationships between dependent and independent attributes. This section presents the major statistics-based data mining techniques (linear regression, logistic regression, the Naive Bayes classifier and the k-nearest neighbors algorithm) and their use in the context of customer churn analysis.
Figure 10. Breakdown of Statistical Methods
Regression-based methods have been associated with good results in churn prediction and estimation. The customer churn problem usually has a binary outcome: yes/no, true/false, or churn/no churn. The remaining variables are mostly continuous in nature, which is why logistic regression appeared to be the best choice [79]. Lazarov and Capota [80] discussed the data mining algorithms commonly used in customer churn prediction and analysis.
Regression trees were discussed along with other popular data mining methods such as neural networks, rule-based learning and decision trees. Their conclusion was that good prediction models have to be continuously developed and that a combination of the proposed approaches has to be used. Qureshi et al. [79] also applied logistic regression to telecom industry data to identify churners.
The model performed poorly on churners: only 45% of all churners were correctly identified, which is a very low percentage. In contrast, logistic regression did well on active users, correctly identifying 78% of them. A further application is that of Nie et al. [81], who used two data mining algorithms, decision trees and logistic regression, to construct a churn prediction model. They used credit card data from a real Chinese bank. The test results ranked logistic regression ahead of decision trees.
Naive Bayes is a supervised learning method which makes predictions about unseen data based on Bayes' theorem [80]. The authors of [82] proposed a customer churn prediction model based on the Naive Bayes algorithm and applied it to wireless customer data. It attained 68% accuracy in the first pass, which was based on the Bayesian model. The k-nearest neighbors algorithm is one of the basic approaches of traditional statistical classification. The class label assigned to an unseen instance is the dominant class label among its k neighboring instances; the classifier considers only the k closest entries in the training set [83]. The authors of [84] presented a hybrid approach that combines the k-nearest neighbor algorithm with logistic regression to construct a binary classifier called KNN-LR.
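
The basic k-nearest neighbors rule described above (majority label among the k closest training instances) can be sketched as follows. This shows plain k-NN only, not the KNN-LR hybrid of [84]; the Euclidean distance, the value k = 3 and the two-feature toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign the majority label among the k nearest training instances."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Made-up customers described by two numeric features.
toy_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 6.0), (6.5, 7.0)]
toy_y = ["stay", "stay", "stay", "churn", "churn", "churn"]
```

An odd k is used so that a two-class vote cannot end in a tie; features would normally be scaled first so that no single attribute dominates the distance.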
The authors of [84] compared KNN-LR with logistic regression, C4.5 and the Radial Basis Function (RBF) network. KNN-LR outperformed RBF on all four benchmark datasets. It also outperformed logistic regression on these benchmark datasets, although the two performed very closely on the Wisconsin breast cancer dataset. The results also indicated its advantage over RBF and C4.5, although C4.5 slightly surpassed KNN-LR on the telecom dataset. The new model presented in [85] is a hybrid model that combines a modified k-means clustering algorithm with a classic rule induction technique (FOIL) to predict customer churn behavior.
The model was compared against six techniques, listed as decision tree, k-means, PART, logistic regression, SVM, KNN and OneR, and against other hybrid techniques such as SePI and k-NN-LR. Across these classifiers, the hybrid models and the benchmark datasets, the proposed system was reported to be improved 12 times. Average AUC values (a measure of prediction accuracy) were then computed for each classification technique, and the hybrid model still had the highest average value.
X. GENETIC ALGORITHM BASED TECHNIQUES FOR CUSTOMER CHURN PREDICTION
Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory of evolution. Simply put, the solution to a problem solved by a genetic algorithm is evolved. In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, is evolved toward better solutions.
Figure 11. Illustration of the operation of a simple genetic algorithm. In each iteration, or 'generation', a population of possible solutions is evaluated and the top-ranking solutions are selected as 'parents' of the next generation. The next generation of solutions is formed by recombining elements from the parents, along with occasional random alterations or 'mutations'. The process is repeated so that the 'fitness' of successive populations increases. In this way, the likelihood that the population contains the optimal solution also increases.
Usually, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution typically starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm.
Generally, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated because the maximum number of generations was reached, a satisfactory solution may or may not have been found. Here GA is used to compute the optimum probability for the selection of a cluster head within a range between a lower and an upper bound.
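
The generational loop described above can be sketched as follows. This is a deliberately simple illustration, not the NSGA variant discussed next: the OneMax fitness (count of 1s), tournament selection, single-point crossover, the mutation rate, the elitism step and all parameter values are assumptions.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1s in the bit string (stand-in for a real objective)."""
    return sum(bits)


def tournament(population, fitness, rng, size=3):
    """Stochastic selection: best of a few randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)


def crossover(a, b, rng):
    """Single-point recombination of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(fitness, n_bits=20, pop_size=30, max_generations=100,
           target_fitness=None, seed=0):
    """Evolve bit strings; stop at max_generations or at target_fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed in each evaluated generation
    for _ in range(max_generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Stop early once a satisfactory fitness level is reached.
        if target_fitness is not None and fitness(best) >= target_fitness:
            break
        children = [best]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness), history


best_bits, best_history = run_ga(one_max, target_fitness=20)
```

For feature selection, each bit could indicate whether a feature is kept, and the fitness would be replaced by a validation score of a classifier trained on the selected features.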
B. Huang et al. [86] proposed a genetic algorithm (NSGA) to find a number of feature subsets of different sizes and dimensions. The experiments were carried out using the C4.5 decision tree, and the results showed that the NSGA algorithm is efficient and successful for churn prediction. Y. Huang et al. [87] presented a new approach based on the chi-square method to select features for customer churn prediction and demonstrated the results with five different methods: DT, NB, LR, SVM and DMEL.
XI. CONCLUSION AND DISCUSSION
Retention of potentially churning customers has become as important for service providers as the acquisition of new customers. High churn rates and substantial revenue loss due to churning have turned accurate churn prediction and prevention into a vital business process. Although churn is unavoidable, it can be managed and kept at an acceptable level. There are many different ways of predicting churn, and new techniques continue to emerge. Good prediction models have to be constantly developed, and a combination of the proposed techniques has to be used. Valuable customers have to be identified, which leads to combining churn prediction methods with customer lifetime value techniques.
Customer churn has been identified as a major problem in the telecom industry, and extensive research has been conducted in this area by applying various data mining techniques.
Decision tree based techniques, neural network based techniques and regression techniques are generally applied in customer churn. From the review of the literature we found that decision tree based techniques, especially C5.0 and CART, have outperformed some of the existing data mining techniques such as regression in terms of accuracy. In other cases neural networks outperform the former methods, owing to the size of the datasets used and the different feature selection methods applied.
A large amount of research has been carried out on data mining and its applications in customer churn, but it is still an active research field and researchers are searching for more accurate solutions. In this paper we provide a summary of the different data mining methods and their applications in customer churn prediction.
However, from the literature survey it is evident that there has been little research work on covering algorithms and their applications in customer churn, especially when it comes to applying RULES family algorithms in customer churn analysis.
Future work will apply RULES family techniques to telecom datasets and compare the results with some of the most commonly used techniques in churn prediction, as they are very suitable tools for data mining applications. Machine learning methods will be the best solution for developing an efficient and automated system for churn prediction. Once churners have been identified and the reasons for quitting have been found, rapid action has to be taken by the marketing department in order to prevent churn properly. Usually there is not enough time to address all likely churners. Therefore, further decision making is needed to choose the clients that will be contacted.
Table I: Tabular comparison of surveyed literature
| Author Name | Year | Paper Work | Work Description | Method Used | Special Note |
|---|---|---|---|---|---|
| Idris, A. | 2012 | Genetic Programming and Adaboosting based churn prediction for Telecom | In this paper, the author uses a Genetic Programming (GP) based approach for modeling the challenging problem of churn prediction in telecom. Adaboost-style boosting is used to evolve a number of programs per class. [88] | Genetic Programming (GP), Adaboosting optimization | The predictions are made with the resulting programs using the higher output from a weighted sum of the outputs of the programs per class. |
| Li Peng | 2013 | Telecom customer churn prediction based on imbalanced data re-sampling method | In this article, the authors combine an imbalanced data re-sampling method with a Support Vector Machine (SVM) to address the imbalanced data problem and the resulting poor classification performance. [89] | Support Vector Machine (SVM), data resampling method | Performance is evaluated using metrics that are more suitable for imbalanced data sets; the datasets are obtained from the France telecom operator Orange Telecom and from UCI. |
| Liu, Ling | 2011 | Modeling China Telecom customer churn prediction based on CRISP_DM | It is necessary for China Telecom to apply data mining tools to predict customer churn so as to raise the pertinence of customer marketing decisions. This paper takes the E9 Package project of China Telecom as an example and builds an improved customer churn prediction model based on CRISP_DM. [90] | Cross Industry Standard Process for Data Mining | The results show that modeling with the C5.0 algorithm, while considering misclassification loss, has lower utilization cost and higher prediction effect than the CART algorithm. |
| Yabas, U. | 2012 | Customer Churn Prediction for Telecom Services | The researchers work on data mining methods to accurately predict customers who will change and turn to another provider for the same or similar service. The sample dataset used for the experiments was compiled by Orange Telecom from real data. [91] | Random Forests algorithm | The sample dataset was posted for the 2009 Knowledge Discovery and Data Mining Competition. The authors aim to find alternative methods that can match or improve the recorded highest score with more efficient use of resources. |
| Xu Hong | 2009 | Churn Prediction in Telecom Using a Hybrid Two-phase Feature Selection Method | This paper proposes a hybrid two-phase feature selection method which can effectively reduce the feature dimension and improve predictive performance by using both the traditional expertise approach and the Markov blanket discovery technique. [92] | Markov blanket discovery technique, local causal discovery, Markov blanket induction | Empirical results from a branch of a Chinese wireless telecom company show that it is a feasible and superior method for telecom customer feature selection. The results also show better performance of this method than the method based on the traditional expertise approach. |
| Suresh, L. | 2009 | Analysis and prediction server with column store database - A case study in telecom churn | In this work the authors' attempts to develop such a system, named 'rePivot', are presented. The proposed framework consists of three modules: a column store database to provide quick access to data, a time series ranking module and a probabilistic forecasting module. [93] | Probabilistic Projection Model, multiple linear regression, pattern based approach, Ranking Time Series | A case study of the proposed framework in churn analysis and modeling in telecom has been carried out to test the suitability of the framework for industrial applications. Application of the framework has shown promising results. |
| Ning Lu | 2014 | A Customer Churn Prediction Model in Telecom Industry Using Boosting | This research conducts a real-world study on customer churn prediction and proposes the use of boosting to enhance a customer churn prediction model. Unlike most research that uses boosting as a method to boost the accuracy of a given basis learner, this paper tries to separate customers into two clusters based on the weight assigned by the boosting algorithm. [94] | Logistic regression | As a result of separating customers into clusters, a higher-risk customer cluster has been identified. Logistic regression is used in this research as a basis learner, and a churn prediction model is built on each cluster, respectively. |
| Pushpa | 2013 | Social network classifier for churn prediction in telecom data | This paper addresses the social position of each customer in a network and equivalence approaches to classify the telecom customers. Social position can be evaluated by finding the centrality of a node identified through a number of connections among network members. [95] | Centrality Measures, SNA Techniques (Centrality measures), Social Network Classifier, Regular Equivalence algorithms | Such measures are used to characterize degrees of influence, prominence and importance of certain members. Regular equivalence analysis seeks to identify customers as churners and non-churners based on regularities in the patterns of network ties. |
| Xue Zeng | 2008 | Definition of Misclassification Cost & Redistribution Strategy of Telecom Churn Analysis | To solve the data imbalance problem existing in this field, traditional research always redistributes samples according to misclassification cost. But existing research in this area neither gave a quantitative description of the misclassification cost nor set up a unified method for redistributing samples. [96] | Study Work | To solve these problems, an original mathematical definition of misclassification cost for this domain is set up by taking telecom industry economic factors into consideration, and a redistribution strategy based on this cost is drawn up. |
| Jun Liu | 2010 | Research on customer churn prediction model based on IG_NN double attribute selection | This paper discusses the problem of customer churn prediction, and proposes a customer churn prediction model based on double attribute selection of information gain (IG) and neural network (NN) by analyzing the characteristics of customer churn data. [97] | Information gain (IG) and neural network (NN) | First, the main attributes of the customer churn data are selected using IG, and every main attribute is then analyzed using NN, whose outputs are analyzed by the 80–20 rule to get the key attributes affecting customer churn. Second, the prediction model based on IG_NN is constructed by taking the key attributes as input and customer churn probability as output. The model predicts lost customers for the next month from customer behavior and payment data of a telecom operator collected during the first three months. |
| Yu, W. | 2005 | A churn-strategy alignment model for managers in mobile telecom | In this paper, the authors propose a new model for strategic alignment of churn predictors to an adaptation of the Delta strategic model for firm competitiveness. This model is substantiated using a dataset from Duke University's Teradata Center for CRM. [98] | Delta strategic model, churn-strategy alignment model | Research results contribute to analyzing churn predictors from a new perspective, that of organizational competitiveness strategy. Using factor analysis, the model links high-level churn predictors with competitiveness strategy. |
| Yongbin Zhang | 2011 | Behavior-Based Telecommunication Churn Prediction with Neural Network Approach | A behavior-based telecom customer churn prediction system is presented in this paper. Unlike conventional churn prediction methods, which use customer demographics, contractual data, customer service logs, call details, complaint data, bill and payment as inputs and churn as target output, only customer service usage information is included in this system to predict customer churn using a clustering algorithm. [99] | Neural Network | It can solve the problems which traditional methods have to face, such as missing or unreliable data and the correlation among inputs. |
| Pushpa | 2013 | Sociocentric and egocentric measures for identifying the key players in telecom social network | The typical work on social network analysis includes the construction of both multirelational telecom social networks and ego-networks of telecom customers, for discovering groups of customers who share similar properties and classifying the customers as churners and non-churners. [101] | Clustering based on Sociocentric and egocentric measures | This paper explores both sociocentric and egocentric methods for identifying key players who play important roles in decision making in finding the churn rate of telecom social networks. |
| Hsu, Tsung-Hao | 2014 | Inferring potential users in mobile social networks | To infer potential users, the authors propose a framework including feature extraction, feature selection, and classifier learning to solve the problem. First, they construct a heterogeneous information network from the call detail records of users. Then, they extract the explicit features from potential users' interaction behavior in the heterogeneous information network. [105] | 3-tree Decision tree algorithm in heterogeneous information network | Because users are influenced by their community, the authors extract community-based implicit features of potential users. |
| Ping Chen | 2009 | The design of architecture, workflow, algorithm on grid system for Social Network context prediction analysis | An architecture of DMG (data mining grid) is proposed and a prototype of DMG is designed to solve the computing problem. Two of the important issues in DMG, the design of the workflow service and the distributed data mining algorithm, are investigated. Finally, a sample application in telecom customer churn context prediction analysis is investigated and illustrated. [100] | DMG (data mining grid), distributed data mining algorithm | In the sample application, Centrals, which is computed by a parallel algorithm on DMG, is an important measure in the telecom social network. |
| Li Yi | 2010 | The Explanation of Support Vector Machine in Customer Churn Prediction | In this paper, an explainable prediction model is established to select the optimum features and parameters; the selected optimum parameters are then applied to predicting potential customer churn in one foreign telecom company. [103] | Support Vector Machine | The model not only achieves a desirable prediction but is also explainable through the selected features; a balanced relation between accuracy and explainability of the customer churn prediction model, as well as a unified structural frame for the customer churn prediction model, is thus established. |
For the customers with the highest probability of churning, a prediction is made of how much revenue the service provider is going to get over the period of the customer's stay. In this way valuable customers are identified and efforts are made to retain them.
The customer lifetime value of an existing customer is inversely related to the churn probability of that customer, but a customer's decision to stay is usually coupled with an increase in that customer's lifetime value. Using customer lifetime value in addition to churn prediction can minimize the cost of making a needless retention effort (false positives) and the cost of losing a customer because the model did not predict that the customer is likely to churn (false negatives).
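
One possible way to combine churn probability with customer lifetime value when choosing whom to contact is sketched below. It is an assumed scoring rule, not a method from the surveyed papers: the expected-gain formula, the offer success rate, the contact cost, the budget and the customer records are all hypothetical.

```python
def retention_priority(customers, contact_cost, success_rate, budget):
    """Rank customers by the expected net gain of a retention offer.

    expected gain = churn_prob * success_rate * clv - contact_cost
    Customers with a non-positive expected gain are not contacted.
    """
    scored = []
    for c in customers:
        gain = c["churn_prob"] * success_rate * c["clv"] - contact_cost
        if gain > 0:
            scored.append((gain, c["id"]))
    scored.sort(reverse=True)
    max_contacts = int(budget // contact_cost)
    return [cid for _, cid in scored[:max_contacts]]


# Hypothetical customers: churn probability from a model, CLV in currency units.
toy_customers = [
    {"id": "A", "churn_prob": 0.90, "clv": 100.0},
    {"id": "B", "churn_prob": 0.40, "clv": 2000.0},
    {"id": "C", "churn_prob": 0.80, "clv": 20.0},
    {"id": "D", "churn_prob": 0.05, "clv": 500.0},
]
to_contact = retention_priority(toy_customers, contact_cost=10, success_rate=0.5, budget=20)
```

In this toy case customer B is contacted first despite a moderate churn probability, because the value at risk is large, while the likely churner C is skipped because the offer would cost more than the value it protects.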
REFERENCES
[1] S. Yoon, J. Koehler & A. Ghobarah. (2010) Prediction of Advertiser Churn for Google AdWords. JSM
Proceedings, American Statistical Association.
[2] L. Breiman. (2001) Random Forests. Machine Learning, 45 (1), 5-32.
[3] L. Deng et. al. (2013) Recent Advances in Deep Learning for Speech Research at Microsoft. ICASSP, 8604-
8608.
[4] G. Dahl et. al. (2013) Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout.
ICASSP, 8609-8613.
[5] G. Hinton et. al. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared
Views of Four Research Groups. IEEE Signal Processing Magazine, 29 (6) 82-97.
[6] A. van den Oord, S. Dieleman & B. Schrauwen. (2013) Deep Content-based Music Recommendation. Advances
in Neural Information Processing Systems (NIPS), 2643-2651.
[7] J. Zhou, O. G. Troyanskaya. (2013) Deep Supervised and Convolutional Generative Stochastic Network for
Protein Secondary Structure Prediction. Advances in Neural Information Processing Systems (NIPS).
[8] D. V. den Poel and B. Lariviere. Customer attrition analysis for financial services using proportional hazard
models. European Journal of Operational Research, 157(1):196-217, 2004.
[9] W. Verbeke, D. Martens, C. Mues, and B. Baesens, "Building comprehensible customer churn prediction
models with advanced rule induction techniques," Expert Syst. Appl., vol. 38, no. 3, pp. 2354–2364, 2011.
[10] Chu, B. H., Tsai, M. S., and Ho, C. S., "Towards a hybrid data mining model for customer retention",
Knowledge-Based Systems, 20, 2007, pp. 703–718.
[11] Berry, M. J. A., and Linoff, G. S., "Data mining techniques second edition – for marketing, sales, and customer
relationship management", 2004.
[12] Chen, Y. L., Hsu, C. L., and Chou, S. C., "Constructing a multi-valued and multilabeled decision tree", Expert
Systems with Applications, 25, 2003, 199–209.
[13] Kim, J. K., Song, H. S., Kim, T. S., and Kim, H. K., "Detecting the change of customer behavior based on
decision tree analysis", Expert System with Applications, 22, 2005, 193–205.
[14] Buckinx, W., Moons, E., Poel, D. V. D., and Wets, G., "Customer-adapted coupon targeting using feature
selection", Expert Systems with Applications, 26, 2004, 509–518.
[15] J. Hadden, A. Tiwari, R. Roy, and D. Ruta. Churn Prediction: Does Technology Matter. International Journal of
Intelligent Technology, 1(1):104-110, 2006.
[16] J. Hadden, A. Tiwari, R. Roy, and D. Ruta. Churn Prediction using Complaints Data. International Journal of
Intelligent Technology, 13:158-163, May 2006.
[17] J. R. Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA,
USA, 1993.
[18] K. binti Oseman, S. binti M. Shukor, N. Abu Haris, and F. bin Abu Bakar, "Data Mining in Churn Analysis
Model for Telecommunication Industry," J. Stat. Model. Anal., vol. 1, no. 19–27, 2010.
[19] C.-P. Wei and I. T. Chiu, "Turning telecommunications call details to churn prediction: a data mining
approach," Expert Syst. Appl., vol. 23, no. 2, pp. 103–112, 2002.
[20] A. T. Jahromi, M. Moeini, I. Akbari, and A. Akbarzadeh, "A Dual-Step Multi-Algorithm Approach For Churn
Prediction in Pre-Paid Telecommunications Service Providers," RISUS. J. Innov. Sustain, vol. 1, no. 2, 2010.
[21] V. Yeshwanth, V. V. Raj, and M. Saravanan, "Evolutionary Churn Prediction in Mobile Networks Using Hybrid
Learning," in Twenty-Fourth International FLAIRS Conference, 2011, pp. 471–476.
[22] M. Kaur, K. Singh, and N. Sharma, "Data Mining as a tool to Predict the Churn Behaviour among Indian bank
customers," Int. J. Recent Innov. Trends Comput. Commun., vol. 1, no. 9, pp. 720–725, 2013.
[23] R. A. Soeini and K. V. Rodpysh, "Evaluations of Data Mining Methods in Order to Provide the Optimum
Method for Customer Churn Prediction: Case Study Insurance Industry," 2012 Int. Conf. Inf. Comput. Appl.
(ICICA 2012), vol. 24, pp. 290–297, 2012.
[24] J. Hadden, A. Tiwari, R. Roy, and D. Ruta, "Churn prediction: does technology matter," Int. J. Intell. Technol.,
vol. 1, no. 2, 2006.
[25] S. A. Qureshi, A. S. Rehman, A. M. Qamar, A. Kamal, and A. Rehman, "Telecommunication subscribers' churn
prediction model using machine learning," in Digital Information Management (ICDIM), 2013 Eighth
International Conference on, 2013, pp. 131–136.
[26] Mitchell, T. M. (1997). Machine Learning. McGraw-Hill Science/Engineering/Math.
[27] John, G. H., & Langley, P. (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of
the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 338-345). San Mateo: Morgan Kaufmann.
[28] Gutkin, M. (2008). Feature selection methods for classification of gene expression profiles. Tel-Aviv
University.
[29] Cestnik, B. (1990). Estimating Probabilities: A crucial task in machine learning. Ninth European Conference on
Artificial Intelligence, (pp. 147-149). London.
[30] S. Balaji, S.K. Srinivasta, "Naïve Bayes Classification approach for Mining Life insurance Databases for
Effective Prediction of Customer Preferences over Life Insurance Products", International Journal of Computer
Applications, Vol. 51, No. 3, 2012.
[31] Z. Kasiran, Z. Ibrahim, and M. S. M. Ribuan, "Mobile phone customers churn prediction using Elman and
Jordan Recurrent Neural Network," in Computing and Convergence Technology (ICCCT), 2012 7th
International Conference on, 2012, pp. 673–678.
[32] S. A. Qureshi, A. S. Rehman, A. M. Qamar, A. Kamal, and A. Rehman, "Telecommunication subscribers' churn
prediction model using machine learning," in Digital Information Management (ICDIM), 2013 Eighth
International Conference on, 2013, pp. 131–136.
[33] V. Lazarov and M. Capota, "Churn Prediction," TUM Comput. Sci., 2007.
[34] W. H. Au, K. C. C. Chan, and Y. Xin, "A novel evolutionary data mining algorithm with applications to churn
prediction," Evol. Comput. IEEE Trans., vol. 7, no. 6, pp. 532–545, 2003.
[35] M. C. Mozer, R. Wolniewicz, D. B. Grimes, E. Johnson, and H. Kaushansky, "Predicting subscriber
dissatisfaction and improving retention in the wireless telecommunications industry," Neural Networks, IEEE
Trans., vol. 11, no. 3, pp. 690–696, 2000.
[36] A. Sharma and P. K. Panigrahi, "A Neural Network based Approach for Predicting Customer Churn in Cellular
Network Services," Int. J. Comput. Appl., vol. 27, no. 11, 2011.
[37] Dass, Rajanish and Jain, Rumit, "An Analysis on the factors causing telecom churn: First Findings", AMCIS
2011 Proceedings - All Submissions. 2011, Paper 2.
[38] Adnan Idris, Muhammad Rizwan , Asifullah Khan, "Churn prediction in telecom using Random Forest and PSO
based data balancing in combination with various feature selection strategies", Journal of Computers and
Electrical Engineering, 38 , 2012,1808–1819.
[39] Jae-Hyeon Ahn, Sang-Pil Han, Yung-Seop Lee, "Customer churn analysis: Churn determinants and mediation
effects of partial defection in the Korean mobile telecommunications service industry", Telecommunications
Policy, Volume 30, Issues 10–11, 2006, Pages 552–568.
[40] Kim, Park, and Jeong, "The effects of customer satisfaction and switching barrier on customer loyalty in
Korean mobile telecommunication services", Telecommunications Policy, Volume 28, Issue 2, 2004, Pages
145–159.
[41] Berson, A., Smith, S., and Thearling, K. (1999). Building data mining applications for CRM. New York:
McGraw-Hill.
[42] Madden, G., Savage, S. J., and Coble-Neal, G., "Subscriber churn in the Australian ISP market", Information
Economics and Policy, 11 (2), 1999, 195–207
[43] Chih Ping Wei, I-Tang Chiu, "Turning telecommunications call details to churn prediction: a data mining
approach", Expert Systems with Applications, Volume 23, Issue 2, August 2002, Pages 103–112.
[44] J Lu, "Modeling Customer Lifetime Value Using Survival Analysis: An Application in the Telecommunications
Industry", Data Mining Techniques, 2003, SUGI 28.
[45] Koh, H. C., and Chan, K. L. G., "Data mining and customer relationship marketing in the banking industry",
Singapore Management Review, 24, 2002, 1–28.
[46] Zhen-Yu Chen, Zhi-Ping Fan, Minghe Sun, "A hierarchical multiple kernel support vector machine for
customer churn prediction using longitudinal behavioral data", European Journal of Operational Research, 223,
2012, 461–472.
[47] Kristof Coussement, Dirk Van den Poel, "Churn prediction in subscription services: An application of support
vector machines while comparing two parameter-selection techniques", Expert Systems with Applications,
Volume 34, Issue 1, 2008, Pages 313–327.
[48] Yaya Xie, Xiu Li, E. W. T. Ngai, and Weiyun Ying, "Customer churn prediction using improved balanced
random forests", Expert Systems with Applications, Volume 36, Issue 3, 2009, Pages 5445–5449.
[49] Kim H S, Yoon C H., "Determinants of subscriber churn and customer loyalty in the Korean mobile telephony
market", Telecommunications Policy, Volume 28, Issues 9–10, 2004, Pages 751–765.
[50] Golshan Mohammadi, Reza Tavakkoli-Moghaddam, and Mehrdad Mohammadi, "Hierarchical Neural Regression
Models for Customer Churn Prediction", Journal of Engineering, Volume 2013, 2013, Article ID 543940, 9
pages.
[51] Chih-Fong Tsai, Yu-Hsin Lu, "Customer churn prediction by hybrid neural networks", Expert Systems with
Applications 36, 2009, pp. 12547–12553.
[52] Kim, Park, and Jeong, "The effects of customer satisfaction and switching barrier on customer loyalty in Korean
mobile telecommunication services", Telecommunications Policy, Volume 28, Issue 2, 2004, Pages 145–159.
[53] Au, W. H., Chan, K. C. C., and Yao, X., "A novel evolutionary data mining algorithm with applications to churn
prediction", IEEE Transactions on Evolutionary Computation, 7, 2003, 532–545.
[54] Bin, Peiji, and Juan, "Customer Churn Prediction Based on the Decision Tree in Personal Handyphone System
Service", Service Systems and Service Management, 2007, International Conference, 1-5.
[55] Shin-Yuan Hung, David C. Yen, Hsiu-Yu Wang, "Applying data mining to telecom churn management",
Expert Systems with Applications 31, 2006, 515–524.
[56] J. Hadden, A. Tiwari, R. Roy, and D. Ruta, "Computer assisted customer churn management: State-of-the-art and
future trends", Computers and Operations Research, Volume 34, Issue 10, 2007, Pages 2902–2917.
[57] Lazarov and Capota. "Churn Prediction", 2007, TUM computer science.
[58] Chih Ping Wei, I-Tang Chiu, "Turning telecommunications call details to churn prediction: a data mining
approach", Expert Systems with Applications, Volume 23, Issue 2, August 2002, Pages 103–112.
[59] MIHELIS, G., GRIGOROUDIS, E., SISKOS, Y., POLITIS, Y. & MALANDRAKIS, Y. (2001) Customer
Satisfaction Measurement in the Private Bank Sector. European Journal of Operational Research, 130, 347-360.
[60] RUST, R. T. & ZAHORIK, A. J. (1993) Customer Satisfaction, Customer Retention, and Market Share. Journal
of retailing, 69, 193-215.
[61] KIM, H. & YOON, C. (2004) Determinants of Subscriber Churn and Customer Loyalty in the Korean Mobile
Telephony Market. Telecommunications Policy, 28, 751-765.
[62] AU, W., CHAN, C. C. & YAO, X. (2003) A novel evolutionary data mining algorithm with applications to
churn prediction. IEEE transactions on evolutionary computation, 7, 532-545.
[63] HWANG, H., JUNG, T. & SUH, E. (2004) An LTV Model and Customer Segmentation Based on Customer
Value: A Case Study on the Wireless Telecommunications Industry. Expert systems with applications, 26,
181-188.
[64] DATTA, P., MASAND, B., MANI, D. R. & LI, B. (2001) Automated cellular modelling and prediction on a
large scale. Issues on the application of data mining, 485-502.
[65] BAESENS, B., VERSTRAETEN, G., VAN DEN POEL, D., EGMONT-PETERSEN, M., VAN KENHOVE, P.
& VANTHIENEN, J. (2004) Bayesian Network Classifiers for Identifying the Slope of the Customer Lifecycle
of Long-Life Customers. European Journal of Operational Research, 156, 508-523.
[66] ROSSET, S., NEUMANN, E., EICK, U. & VATNIK, N. (2003) Customer lifetime value models for decision
support. Data mining and knowledge discovery, 7, 321-339.
[67] STAHL, H. K., MATZLER, K. & HINTERHUBER, H. H. (2003) Linking Customer Lifetime Value with
Shareholder Value. Industrial Marketing Management, 32, 267-279.
[68] Ng, A. Y., & Jordan, M. (2002). On Discriminative vs. Generative Classifiers: A comparison of logistic
regression and Naive Bayes. Neural Information Processing Systems: NIPS 14.
[69] Mitchell, T. M. (2005). Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. In
Machine Learning. Unpublished manuscript.
[70] Marcin Owczarczuk, "Churn models for prepaid customers in the cellular telecommunication industry using
large data marts", Expert Systems with Applications 37 (2010) 4710–4712.
[71] Guangli Nie, Wei Rowe, Lingling Zhang, Yingjie Tian, Yong Shi, "Credit card churn forecasting by logistic
regression and decision tree", Expert Systems with Applications 38 (2011) 15273–15285.
[72] Abbas Keramati, Seyed M.S. Ardabili, "Churn analysis for an Iranian mobile operator", Telecommunications
Policy 35 (2011) 344–356.
[73] Beomsoo Shim, Keunho Choi, Yongmoo Suh, "CRM strategies for a small-sized online shopping mall based on
association rules and sequential patterns", Expert Systems with Applications 39 (2012) 7736–7742.
[74] Pınar Kisioglu, Y. Ilker Topcu, "Applying Bayesian Belief Network approach to customer churn analysis: A case
study on the telecom industry of Turkey", Expert Systems with Applications 38 (2011) 7151–7157.
[75] W. Verbeke, D. Martens, C. Mues, and B. Baesens, "Building comprehensible customer churn prediction
models with advanced rule induction techniques," Expert Syst. Appl., vol. 38, no. 3, pp. 2354–2364, 2011.
[76] D. T. Pham and M. S. Aksoy, "RULES: A simple rule extraction system," Expert Syst. Appl., vol. 8, no. 1, pp.
59–65, Jan. 1995.
[77] A. M. AlMana and M. S. Aksoy, "An Overview of Inductive Learning Algorithms," Int. J. Comput. Appl., vol.
88, no. 4, pp. 20–28, 2014.
[78] M. S. Aksoy, H. Mathkour, and B. A. Alasoos, "Performance evaluation of rules-3 induction system on data
mining," Int. J. Innov. Comput. Inf. Control, vol. 6, no. 8, pp. 1–8, 2010.
[79] S. A. Qureshi, A. S. Rehman, A. M. Qamar, A. Kamal, and A. Rehman, "Telecommunication subscribers' churn
prediction model using machine learning," in Digital Information Management (ICDIM), 2013 Eighth
International Conference on, 2013, pp. 131–136.
[80] V. Lazarov and M. Capota, "Churn Prediction," TUM Comput. Sci., 2007.
[81] G. Nie, W. Rowe, L. Zhang, Y. Tian, and Y. Shi, "Credit card churn forecasting by logistic regression and
decision tree," Expert Syst. Appl., vol. 38, no. 12, pp. 15273–15285, 2011.
[82] S. V. Nath and R. S. Behara, "Customer churn analysis in the wireless industry: A data mining approach," in
Proceedings-Annual Meeting of the Decision Sciences Institute, 2003.
[83] M. Kaur, K. Singh, and N. Sharma, "Data Mining as a tool to Predict the Churn Behaviour among Indian bank
customers," Int. J. Recent Innov. Trends Comput. Commun., vol. 1, no. 9, pp. 720–725, 2013.
[84] Y. Zhang, J. Qi, H. Shu, and J. Cao, "A hybrid KNN-LR classifier and its application in customer churn
prediction," in 2007 IEEE International Conference on Systems, Man and Cybernetics, 2007, pp. 3265–3269.
[85] Y. Huang and T. Kechadi, "An effective hybrid learning system for telecommunication churn prediction,"
Expert Syst. Appl., vol. 40, no. 14, pp. 5635–5647, Oct. 2013.
[86] Bingquan Huang, B. Buckley, T.-M. Kechadi, "Multiobjective feature selection by using NSGA-II for customer
churn prediction in telecommunications", Expert Systems with Applications 37 (2010) 3638–3646.
[87] Y. Huang, B. Q. Huang, M. T. Kechadi, "A New Filter Feature Selection Approach for Customer Churn
Prediction in Telecommunications", Proceedings of the IEEM, IEEE (2010) 338-342.
[88] Idris, A., "Genetic Programming and Adaboosting based churn prediction for Telecom", IEEE International
Conference on Systems, Man, and Cybernetics (SMC), 2012.
[89] Li Peng, "Telecom customer churn prediction based on imbalanced data re-sampling method", International
Conference on Measurement, Information and Control (ICMIC), 2013.
[90] Liu, Ling, "Modeling China Telecom customer churn prediction based on CRISP_DM", International
Conference on E-Business and E-Government (ICEE), 2011.
[91] Yabas, U., "Customer Churn Prediction for Telecom Services", IEEE 36th Annual Computer Software and
Applications Conference (COMPSAC), 2012.
[92] Xu Hong, "Churn Prediction in Telecom Using a Hybrid Two phase Feature Selection Method", Third
International Symposium on Intelligent Information Technology Application, 2009. IITA 2009.
[93] Suresh, L., "Analysis and prediction server with column store database—A case study in telecom churn", IEEE
Region 10 Conference (TENCON 2009), 2009.
[94] Ning Lu, "A Customer Churn Prediction Model in Telecom Industry Using Boosting", IEEE Transactions on
Industrial Informatics.
[95] Pushpa, "Social network classifier for churn prediction in telecom data", International Conference on Advanced
Computing and Communication Systems (ICACCS), 2013.
[96] Xue Zeng, "Definition of Misclassification Cost & Redistribution Strategy of Telecom Churn Analysis",
International Conference on Apperceiving Computing and Intelligence Analysis, 2008. ICACIA 2008.
[97] Jun Liu, "Research on customer churn prediction model based on IG_NN double attribute selection", 2nd
International Conference on Information Science and Engineering (ISICE), 2010.
[98] Yu, W., "A churn-strategy alignment model for managers in mobile telecom", Communication Networks and
Services Research Conference, 2005. Proceedings of the 3rd Annual.
[99] Yongbin Zhang, "Behavior-Based Telecommunication Churn Prediction with Neural Network Approach",
International Symposium on Computer Science and Society (ISCCS), 2011.
[100] Ping Chen, "The design of architecture, workflow, algorithm on grid system for Social Network context
prediction analysis", IEEE International Conference on Network Infrastructure and Digital Content, 2009.
IC-NIDC 2009.
[101] Pushpa, "Sociocentric and egocentric measures for identifying the key players in telecom social network",
International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN),
2013.
[102] Jiayin Qi, "A novel and convenient variable selection method for choosing effective input variables for
telecommunication customer churn prediction model", IEEE International Conference on Systems, Man and
Cybernetics, 2009. SMC 2009.
[103] Li Yi, "The Explanation of Support Vector Machine in Customer Churn Prediction", International Conference
on E-Product E-Service and E-Entertainment (ICEEE), 2010.
[104] Au, W.-H., "A novel evolutionary data mining algorithm with applications to churn prediction", IEEE
Transactions on Evolutionary Computation.
[105] Tsung-Hao Hsu, "Inferring potential users in mobile social networks", International Conference on Data
Science and Advanced Analytics (DSAA), 2014.
[106] Zhang Xiao-bin, "Customer-Churn Research Based on Customer Segmentation", International Conference on
Electronic Commerce and Business Intelligence, 2009. ECBI 2009.
[107] Yin Wu, "The study on feature selection in customer churn prediction modeling", IEEE International
Conference on Systems, Man and Cybernetics, 2009. SMC 2009.
AUTHOR PROFILE
Example Person 1 received his MS and PhD degrees in Electrical and Computer Engineering in 2005 and
2007, respectively, from The Ohio State University, Columbus, Ohio, USA. He received his BS degree in Electrical and
Electronics Engineering in 2003 from Bogazici University, Istanbul, Turkey. After completion of his PhD
program, Dr. Vural worked as a Post-Doctoral Researcher at University of California, Riverside from June
2008 to November 2009. He joined
Example Person 2 is a member of the Centre for Communication Systems Research at the University of
Surrey, UK. Klaus earned his Dipl-Ing (FH) at The University of Applied Sciences in Offenburg, Germany, an
MSc from Brunel University, UK and his PhD from the University of Surrey (UK). His research interests
include reconfigurability of the different system levels, including