
© 2015, IJARCSSE All Rights Reserved Page | 417

Volume 5, Issue 4, 2015 ISSN: 2277 128X

International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com

Customer Churn Prediction in Telecommunication Industries using Data Mining Techniques - A Review

Kiran Dahiya, M.Tech Student, MRCE Faridabad, India

Kanika Talwar, Assistant Professor, MRCE Faridabad, India

Abstract—With the rapid evolution of digital systems and related information technologies, there is an emerging tendency across the economy to build digital customer relationship management (CRM) systems. This trend is even more evident in the telecommunications industry, where enterprises are becoming increasingly digitalized. Customer churn prediction is a key feature of a modern telecom CRM system: a churn prediction model guides customer relationship management in retaining the customers who are likely to leave. In recent years, a number of ensemble and supervised classifiers and other data mining techniques have been used to model churn in telecom. This article presents a state-of-the-art review of the methods and research involved in churn prediction, assessing the data mining procedures most frequently used to categorize customer churn patterns in the telecom industry. The recent literature on predictive data mining techniques for customer churn behavior is reviewed and categorized by the method used, and a discussion of future research directions is presented.

Keywords: Customer churn prediction; machine learning; data mining; feature extraction; feature selection; learning methods; classification system; telecommunication CRM systems; review study.

I. INTRODUCTION

A churn prediction model supports customer relationship management in retaining the customers who are expected to leave. Telecom management strives to obtain precise and timely information about the customers who are inclined to leave, and the churn prediction model plays an important role in this effort. Intense competition and saturated markets leave telecom companies little room to ignore a high churn rate, principally because acquiring a new customer costs five to six times more than retaining an existing one. Consequently, an effective churn prediction model has a substantial role to play in the telecom industry.

Figure 1. The Churn Prediction Landscape

Contemporary churn prediction systems usually rely on classification algorithms. However, classifiers often struggle with the very large size and high dimensionality of telecom datasets. Moreover, telecom datasets are usually imbalanced, with far fewer instances of the minority (churner) class, which also hinders effective performance. To address the challenges that classifiers face in telecom churn prediction, researchers have proposed various hybrid approaches in which ensemble classification schemes are typically combined with preprocessing techniques.
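One common preprocessing step for the class imbalance described above is random oversampling of the minority class before training. The sketch below is a minimal illustration, assuming records are `(features, label)` pairs with label 1 for churners; the function name `random_oversample` and the toy data are hypothetical and not taken from any of the reviewed works.

```python
import random
from collections import Counter


def random_oversample(records, seed=0):
    """Duplicate randomly chosen records of the smaller classes until every
    class has as many records as the largest one.

    records: list of (features, label) pairs.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in records)
    target = max(counts.values())
    balanced = list(records)
    for label, count in counts.items():
        pool = [r for r in records if r[1] == label]
        # Sample with replacement to make up this class's shortfall.
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 non-churners (label 0) and 2 churners (label 1).
data = [({"calls": i}, 0) for i in range(8)] + [({"calls": 50}, 1), ({"calls": 60}, 1)]
balanced = random_oversample(data)
print(sorted(Counter(label for _, label in balanced).items()))  # [(0, 8), (1, 8)]
```

Oversampling should be applied only to the training split; evaluating on oversampled data would overstate performance.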


Dahiya et al., International Journal of Advanced Research in Computer Science and Software Engineering 5(4),

April- 2015, pp. 417-433


A. Customer Churn

Customers can churn for various reasons, and churn can happen at any time. Churn prediction is made more difficult by the fact that customers can show seasonal behavior. Therefore, care should be taken in defining customer churn to avoid incorrectly identifying customers as churned. To this end, we take into account the duration of the latest lapse period and the change in customer activity compared with the same season in the previous year.
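The churn definition above can be made concrete as a labeling rule. The sketch below assumes hypothetical monthly activity counts per customer and two illustrative thresholds, a lapse of `lapse_months` consecutive inactive months and a year-over-year drop ratio `yoy_drop`; neither the thresholds nor the function name come from the source.

```python
def label_churn(monthly_activity, lapse_months=2, yoy_drop=0.5):
    """Return True if a customer should be labeled as churned.

    monthly_activity: activity counts, oldest first, covering at least 13
    months so the latest month can be compared with the same month one
    year earlier. Thresholds are illustrative assumptions.
    """
    if len(monthly_activity) < 13:
        raise ValueError("need at least 13 months of history")
    # Rule 1: the most recent months form a long enough lapse (no activity).
    lapsed = all(count == 0 for count in monthly_activity[-lapse_months:])
    # Rule 2: compare with the same month last year, so that a customer who
    # is inactive every year at this season is not mistaken for a churner.
    this_year = monthly_activity[-1]
    last_year = monthly_activity[-13]
    dropped = last_year > 0 and this_year < yoy_drop * last_year
    return lapsed and dropped


# Hypothetical 13-month histories (oldest first).
steady_then_silent = [10] * 11 + [0, 0]   # active a year ago, silent now
seasonal = [0] + [10] * 10 + [0, 0]       # also silent in the same month last year
print(label_churn(steady_then_silent))  # True
print(label_churn(seasonal))            # False
```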

Customer churn is a fundamental problem for companies; it is defined as the loss of customers who move to competitors. Being able to predict customer churn in advance gives a company highly valuable insight for retaining and growing its customer base. A wide range of customer churn prediction models has been developed in recent years. The most advanced models use state-of-the-art machine learning classifiers such as random forests [1] [2]. Machine learning classifiers work well when enough human effort is spent on feature engineering that a reasonable boundary between the classes can be found in feature space. Thus, having the right features for each particular problem is usually the most important factor.
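As a small illustration of the hand-crafted features referred to here, the sketch below aggregates hypothetical call detail records into per-customer features (number of calls, mean call duration, share of short calls). The record layout, the 2-minute threshold, and the feature names are assumptions for illustration only.

```python
from collections import defaultdict


def build_features(call_records, short_call_secs=120):
    """Aggregate (customer_id, duration_secs) call records into features."""
    durations = defaultdict(list)
    for customer_id, duration in call_records:
        durations[customer_id].append(duration)
    features = {}
    for customer_id, ds in durations.items():
        features[customer_id] = {
            "n_calls": len(ds),
            "mean_duration": sum(ds) / len(ds),
            # Fraction of calls shorter than the threshold (2 minutes by default).
            "short_call_share": sum(d < short_call_secs for d in ds) / len(ds),
        }
    return features


calls = [("A", 30), ("A", 300), ("B", 60), ("A", 90)]
feats = build_features(calls)
print(feats["B"])  # {'n_calls': 1, 'mean_duration': 60.0, 'short_call_share': 1.0}
```

In a data warehouse with many thousands of variables, choosing and maintaining such aggregates by hand is exactly the effort that the following paragraphs describe as costly.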

Figure 2. Nokia Siemens customer acquisition & churn study

To address these problems, companies spend considerable feature engineering effort designing specific features for problems such as churn prediction and fraud detection. More problematically, features obtained through this manual feature engineering process are usually over-specified and incomplete. Feature engineering becomes suboptimal in companies that hold huge amounts of data; for instance, in some telecommunication companies the data warehouse holds more than 100,000 variables.

Since deep learning attempts to learn multiple levels of representation and automatically discovers good features and representations for the input data, we have recently investigated the application of deep learning to predicting customer churn in prepaid mobile telecommunication networks.

Our main motivation for investigating deep learning as a predictive model is to avoid time-consuming feature engineering and, ideally, to increase the predictive performance of previous models (or at least not degrade it).

Well-known companies are also reporting the use of deep learning in commercial products. Microsoft's work on the Mavis speech recognition system represents one of the first examples of using a deep neural network in a commercial product [3]. IBM and Google are also using deep learning models for speech recognition [4] [5] and image processing. Deep learning has also been applied to other domains, such as automatic music recommendation [6] and prediction of protein structure [7]. However, to the best of our knowledge, this is the first work reporting the use of deep learning for predicting churn in a mobile telecommunication network.

Figure 3. The Opportunities Churn Prediction provides




To investigate the feasibility of using deep learning models in production, we trained and validated the models using large-scale historical data from a telecommunication company with ≈1.2 million customers, spanning sixteen months. This dataset is extremely challenging because the churn rate is very high and all customers are prepaid users, so there is no explicit contract termination date and churn must be inferred in advance from similar behaviors. Figure 4 depicts a fragment of the input data as a graph representation; it shows that there are complex underlying interactions among the users.

Figure 4. A real graph representation of telecommunication data where each node identifies a device (IMEI) or a phone number (SIM) and edges represent different types of interactions among them.

In a market of ever-increasing competition, firms have become aware that they should devote much effort not only to persuading customers to sign agreements but also to retaining current customers. Van Den Poel & Lariviere [8] have shown that in the current situation, where people are offered an enormous choice of offers and service providers, winning new clients is a costly and difficult process. Therefore, putting more effort into keeping churn low has become important for service-oriented firms.

Van Den Poel & Lariviere [8] summarize the economic value of customer retention:

retaining customers lowers the need to pursue new and possibly risky clients, which allows the company to focus on the needs of current clients;

long-term clients tend to purchase more;

positive word of mouth from satisfied clients is a good way to acquire new clients;

long-term clients are less costly to serve, because the company has better knowledge of their demands;

long-term clients are less sensitive to competitors' marketing actions;

losing clients results in lower sales and an increased need to attract new clients, which is 5-6 times more expensive than retaining current customers;

people tend to share negative service experiences with friends more often than positive ones, resulting in a negative image of the company among potential future clients.

Customer Relationship Management (CRM) tools have been developed and applied to increase customer acquisition and retention, to grow profitability, and to support important business tasks such as predictive modelling and organization. Typically, CRM applications hold an enormous amount of information about each individual client. This information is gathered from clients' activity with the company, data provided by customers during registration, calls to customer care hotlines, and so on. Proper analysis of this data can yield valuable results for marketing decisions, and also for identifying clients who are likely to cancel their agreements.

Typically, database records are scored using a mathematical model defined over various attributes that describe the clients. These attributes are often called predictor variables, and higher scores indicate a greater likelihood of churning. Such models are built using statistical methods such as regression analysis, classification trees, and neural networks.

B. Customer Churn Analysis for Telecom Industry

Data volume has been rising at an incredible pace over the last two decades due to advances in information technology. At the same time, there has been huge progress in data mining, and many new approaches and methods have been added to process data and extract information. The data collected from any source is raw data in which valuable information is hidden. Data mining can be defined as the process of extracting valuable information from data, and data mining methods have been applied successfully in many different domains.

The toughest difficulty faced by the telecom industry is customer churn. Customer churn models aim to identify customers with a high probability of leaving the service provider. A list of clients who might churn allows the company to target those clients and launch retention strategies that reduce the churn rate.




Figure 5. Architecture and Design of a Platform for Adaptive, Real-time Churn Prediction using Stream Mining

Retaining existing clients is always a desirable option for the company. Attracting new clients costs almost 5-6 times more than retaining existing ones, because attracting a new client involves recruiting additional staff, advertising costs, and discounts. A loyal client who has been with the business for a long time tends to generate higher profits and is less sensitive to competitors' prices. Such clients also cost less to keep and, in addition, provide valuable word-of-mouth marketing by recommending the business to their relatives, friends, and other associates. In the telecom industry, the network is built to serve a certain planned number of clients; when the number of clients falls below the planned number, it is considered a loss to the company [9].

A small improvement in retaining current customers can lead to significant growth in revenues and profits. Retaining customers requires churn prediction models that are both accurate and comprehensible. To avoid losses in the telecom industry, models have to identify the customers who are about to churn as well as their reasons for churning; a model should therefore reveal both the reasons for churn and the improvements required to retain customers.

II. DECISION TREE BASED METHODS FOR CUSTOMER CHURN PREDICTION

A decision tree is used to predict future trends and to extract models based on interconnected decisions [10-13]. It works on the principle of classifying data into certain classes according to their features. Starting from the root node, internal nodes test features so that all possible options are covered [14, 12]. Thus a tree is built in which each arc corresponds to a specific response (outcome of a test).

Decision trees are among the most commonly used tools for classification and prediction of future events. Such trees are grown in two major steps: building and pruning. During the first phase, the data set is partitioned recursively until most of the records in each partition have the same class value. The second phase then removes branches that fit noisy data (those with the largest estimated error rate).

CART (Classification and Regression Tree) is built by recursively splitting the set of instances into subgroups until a stopping criterion is met: the tree keeps growing until the reduction of impurity falls below a user-defined threshold. Each internal node in the decision tree is a test condition, and branching is based on the value of the attribute being tested. The tree thus represents a set of rules. When a client record is evaluated, it is classified by traversing the tree until a leaf node is reached; the label of that leaf (Churner / Non Churner) is assigned to the client record under assessment. Figure 6 shows a basic churn prediction decision tree for the telecommunication sector.
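The impurity-based stopping rule can be illustrated with the Gini index, the impurity measure commonly associated with CART. This is a minimal sketch; the threshold value 0.01, the function names, and the toy labels are illustrative assumptions, not parameters from the cited works.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(parent, left, right):
    """Decrease in size-weighted Gini impurity achieved by a binary split."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def should_split(parent, left, right, threshold=0.01):
    """Keep growing only while the impurity reduction reaches the threshold."""
    return impurity_reduction(parent, left, right) >= threshold


parent = ["churner", "churner", "non", "non"]
print(impurity_reduction(parent, ["churner", "churner"], ["non", "non"]))  # 0.5
print(should_split(parent, ["churner", "non"], ["churner", "non"]))       # False
```

A pure split removes all of the parent's impurity, while a split that leaves both children as mixed as the parent yields no reduction and stops growth at that node.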

Figure 6. A simplified churn prediction decision tree

Decision trees are often criticised as unsuitable for capturing complex and non-linear relationships among attributes. Nevertheless, research [15-17] shows that decision trees achieve high accuracy, although their training data requirements are also high.

Oseman et al. [18] showed how to apply classification decision tree methods to churn analysis in the telecommunication industry. A sample data set was used to test the customer churn problem with the ID3 decision tree. Their results showed that the subscriber's area was the main classifying attribute contributing to client churn, in addition to two minor reasons for customers to churn.

[Figure 6: decision tree with the tests Age < 60, Usual call duration < 2 min, and Placed calls > 10; Yes/No branches lead to Churner and Non Churner leaves.]
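ID3, used by Oseman et al., chooses at each node the attribute with the highest information gain (reduction in class entropy). The sketch below shows that selection step on hypothetical toy records; the attribute names `area` and `plan` and the data are invented for illustration and are not the data of [18].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a nominal attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"area": "north", "plan": "basic"},
    {"area": "north", "plan": "premium"},
    {"area": "south", "plan": "basic"},
    {"area": "south", "plan": "premium"},
]
labels = ["churn", "churn", "stay", "stay"]
print(best_attribute(rows, labels, ["area", "plan"]))  # area
```

The full ID3 algorithm applies this selection recursively to each resulting subset until the subsets are pure or no attributes remain.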




In Taiwan, Wei & Chiu [19] applied C4.5-based procedures to data from one of the largest local mobile telecommunication companies; their model identified 28.32% of the subscribers as likely churners, a group containing a portion of the true churners, with a lift factor of 2.30 and a retention time of 14 days. This can be compared with research by Jahromi et al. [20], which aimed at developing a predictive model for client churn in pre-paid mobile telephony companies. They compared decision tree methods such as C5.0 with a neural network and found that, in terms of the improvement measure used, decision trees performed better than neural networks. A related study was carried out by Yeshwanth [21], who combined the J48 decision tree with a genetic algorithm to build a hybrid evolutionary method for churn prediction in mobile networks. The author achieved 72% accuracy for one of the largest telecom companies in developing countries. Kaur et al. [22] applied Naive Bayes, J48, and support vector machine classifiers to identify the important customer characteristics that help in forecasting churn among bank clients. They found that the prediction success rate for the loyal class is lower than that for the churn class. Additionally, they found that the J48 decision tree performed better than the other methods.
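The lift factor reported by Wei & Chiu compares the churn rate among the customers flagged by a model with the churn rate in the whole population. A minimal sketch, assuming customers are ranked by a model score and the top fraction is targeted; the scores and labels below are hypothetical, not data from [19].

```python
def lift(scores, churned, top_fraction):
    """Churn rate among the top-scored customers divided by the overall churn rate.

    scores: model scores (higher means more likely to churn).
    churned: 1 for actual churners, 0 otherwise.
    """
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(churned) / len(churned)
    return top_rate / base_rate


# Hypothetical scores for 10 customers, 2 of whom actually churned.
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.05, 0.15, 0.25]
churned = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(lift(scores, churned, 0.2))  # about 2.5
```

A lift of 1.0 means the model does no better than random targeting; a lift of 2.30 means the targeted group contains churners at 2.3 times the base rate.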

Soein & Rodpysh [23] performed tests in Iran comparing numerous well-known data mining methods (C5.0, QUEST, CART, CHAID, Bayesian networks, and neural networks) to find the best technique for customer churn prediction in an Iranian insurance company. The results showed that the CART decision tree performed better than the other methods. Hadden et al. [24] aimed to identify the most appropriate model for churn prediction analysis. They evaluated different algorithms, such as CART trees, neural networks, and regression, and compared their accuracy in predicting customer churn. They found that decision trees outperformed the other methods, with an overall accuracy of 82%.

Qureshi et al. [25] predicted active churners in the telecom industry by applying numerous data mining methods, such as K-means clustering, logistic and linear regression, neural networks, and the decision trees CHAID, Exhaustive CHAID, CART, and QUEST. They found that Exhaustive CHAID performed better than all the other methods: it correctly identified 60% of churners, the highest percentage among all methods. The other decision tree variants did not perform as well as Exhaustive CHAID.
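Overall accuracy (as in the 82% reported by Hadden et al.) and the percentage of correctly recognized churners (recall on the churn class, as in the 60% reported by Qureshi et al.) can diverge sharply on imbalanced churn data. A minimal sketch with hypothetical labels, in which a model that never predicts churn still looks accurate:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false negatives, false positives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def churner_recall(y_true, y_pred, positive=1):
    """Share of actual churners that the model correctly identifies."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)


# Hypothetical data: 10 churners among 100 customers; the model predicts
# "no churn" for everyone.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
print(accuracy(y_true, y_pred))        # 0.9
print(churner_recall(y_true, y_pred))  # 0.0
```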

Jahromi et al. [20] conducted research aimed at developing a predictive model for customer churn in pre-paid mobile telephony companies. They tested the performance of numerous model-building algorithms, such as neural networks, C5.0, CART, and CHAID.

III. NAÏVE BAYES BASED METHODS FOR CUSTOMER CHURN PREDICTION

Naïve Bayes learning produces a probabilistic model of the observed data. Despite its simplicity, Naïve Bayes has been shown to be competitive with more complex algorithms such as neural networks or decision trees in certain domains [26, 27]. Given a training set of instances, each represented as a vector of features $[x_1, x_2, \ldots, x_d]$, the task is to learn from the data so as to predict the most probable class $y_j \in \mathbb{C}$ of a new instance whose class is unknown. Naïve Bayes uses Bayes' theorem to estimate the class probabilities:

$$P(y_j \mid x_1, x_2, \ldots, x_d) = \frac{P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j)}{P(x_1, x_2, \ldots, x_d)} \qquad (1)$$

where $P(y_j)$ is the prior probability of class $y_j$, estimated as its frequency of occurrence in the training data; $P(y_j \mid x_1, x_2, \ldots, x_d)$ is the posterior probability of class $y_j$ after observing the data; $P(x_1, x_2, \ldots, x_d \mid y_j)$ is the conditional probability of observing an instance with feature vector $[x_1, x_2, \ldots, x_d]$ among those having class $y_j$; and $P(x_1, x_2, \ldots, x_d)$ is the probability of observing an instance with feature vector $[x_1, x_2, \ldots, x_d]$ regardless of class. Since the posterior probabilities over all classes sum to one, $\sum_{y_j \in \mathbb{C}} P(y_j \mid x_1, x_2, \ldots, x_d) = 1$, the denominator on the right-hand side of Eq. (1) is a normalizing factor that does not depend on the class and can be omitted when comparing classes:

$$P(y_j \mid x_1, x_2, \ldots, x_d) \propto P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j) \qquad (2)$$

An instance is labelled with the class that has the highest posterior probability, $y_{MAP}$:

$$y_{MAP} = \arg\max_{y_j \in \mathbb{C}} P(y_j)\,P(x_1, x_2, \ldots, x_d \mid y_j) \qquad (3)$$

To estimate the term $P(x_1, x_2, \ldots, x_d \mid y_j)$ by counting frequencies, one would need a huge training set in which every possible combination $[x_1, x_2, \ldots, x_d]$ appears many times to obtain reliable estimates [26]. Naïve Bayes solves this problem with its naïve assumption that the features describing instances are conditionally independent given the class. Therefore, the probability of observing the combination $[x_1, x_2, \ldots, x_d]$ is simply the product of the probabilities of observing each individual feature value, $P(x_1, x_2, \ldots, x_d \mid y_j) = \prod_{i=1}^{d} P(x_i \mid y_j)$. Substituting this approximation into Eq. (3) yields the Naïve Bayes classification rule:

$$y_{MAP} = \arg\max_{y_j \in \mathbb{C}} P(y_j) \prod_{i=1}^{d} P(x_i \mid y_j) \qquad (4)$$

For nominal features, the probability is estimated as a frequency over the training data. For continuous features, there are two options. The first is to discretize them, converting them into nominal features. The second is to assume that they follow a normal distribution.
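Under the second option, $P(x_i \mid y_j)$ for a continuous feature is replaced by a normal density whose mean and variance are estimated from the training instances of class $y_j$. A minimal sketch; the feature values are hypothetical, and the variance floor `min_var` is an added assumption to avoid division by zero.

```python
import math
import statistics


def fit_gaussian(values, min_var=1e-9):
    """Estimate the mean and (population) variance of a feature within one class."""
    mean = statistics.fmean(values)
    var = max(statistics.pvariance(values, mu=mean), min_var)
    return mean, var


def gaussian_likelihood(x, mean, var):
    """Normal density N(x; mean, var), used in place of P(x_i | y_j)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical monthly call minutes for churners and non-churners.
params = {
    "churn": fit_gaussian([20.0, 30.0, 40.0]),
    "stay": fit_gaussian([200.0, 250.0, 300.0]),
}
x = 35.0
print({label: gaussian_likelihood(x, *p) for label, p in params.items()})
```

These densities simply take the place of the frequency estimates inside the product of Eq. (4).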




For a nominal feature, the term $P(x_i \mid y_j)$ is estimated by the fraction $\#D(x_i \mid y_j) / \#D(y_j)$, where $\#D(y_j)$ is the number of instances in the training set having class $y_j$, and $\#D(x_i \mid y_j)$ is the number of these instances having feature value $x_i$ and class $y_j$. If the training data does not contain any instance with this particular combination of class and feature value, $\#D(x_i \mid y_j)$ is zero, and the product in Eq. (4) becomes zero for every instance with that feature value, regardless of its other features. To avoid this, a correction called the m-estimate is introduced [28, 29]:

$$P(x_i \mid y_j) = \frac{\#D(x_i \mid y_j) + m\,P(x_i)}{\#D(y_j) + m} \qquad (5)$$

If the prior probability $P(x_i)$ is unknown, a uniform distribution is assumed; i.e., if a feature has $k$ possible values, then $P(x_i) = 1/k$. The parameter $m$ can be regarded as $m$ additional virtual instances, distributed according to $P(x_i)$, appended to the training set.
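Equations (4) and (5) together give a complete classifier for nominal features. The sketch below is a minimal illustration, assuming a uniform $P(x_i) = 1/k$ over the values of each feature seen in training and $m > 0$; it sums log-probabilities instead of multiplying probabilities, an added implementation choice that avoids numerical underflow without changing the arg max. The class name and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesMEstimate:
    """Categorical Naive Bayes (Eq. 4) with m-estimate smoothing (Eq. 5)."""

    def __init__(self, m=1.0):
        self.m = m  # must be > 0 so unseen values never get probability zero

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)         # #D(y_j)
        self.value_counts = defaultdict(Counter)    # (feature, y_j) -> value counts
        self.domains = defaultdict(set)             # feature -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.domains[i].add(value)
        return self

    def cond_prob(self, i, value, label):
        prior = 1.0 / len(self.domains[i])          # uniform P(x_i) = 1/k
        count = self.value_counts[(i, label)][value]  # #D(x_i | y_j)
        return (count + self.m * prior) / (self.class_counts[label] + self.m)

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)        # log P(y_j)
            score += sum(math.log(self.cond_prob(i, v, label)) for i, v in enumerate(row))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical (area, plan) records.
rows = [("north", "basic"), ("north", "basic"), ("north", "premium"),
        ("south", "premium"), ("south", "premium"), ("south", "basic")]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
model = NaiveBayesMEstimate(m=1.0).fit(rows, labels)
print(model.predict(("north", "basic")))  # churn
```

With $m = 1$ and $k = 2$, the unseen combination (area = south, class = churn) receives $(0 + 0.5)/(3 + 1) = 0.125$ instead of zero.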

S. Balaji [30] used the Naïve Bayesian classification algorithm to classify customers and predict churners in the life insurance sector, applying it to a large customer dataset. The study also analyses the issues of using data mining technology to predict customer habits. In this analysis, 10,000 samples of life insurance customers were tested, and predictive data mining techniques were used to convert the unprocessed data into useful information and then into knowledge. A posterior classification process was applied to the data. The study reports that the Naïve Bayes classifier performed much better than the other classifiers for determining customers' policy preferences, which can help raise the organization's income.

IV. NEURAL NETWORKS BASED METHODS FOR CUSTOMER CHURN PREDICTION

In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning algorithms inspired by biological neural networks (the central nervous systems of animals, in particular the brain). They are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" that compute values from their inputs; thanks to their adaptive nature, they are capable of machine learning as well as pattern recognition tasks.

Figure 7. There are a number of common activation functions in use with neural networks. This is not an exhaustive list.
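The computation performed by an artificial neuron, and the role of activation functions such as those in Figure 7, can be sketched as follows. The weights are hand-picked hypothetical values; in practice they are learned from training data, for example by backpropagation.

```python
import math


# A few common activation functions.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    return math.tanh(z)


def relu(z):
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A fully connected layer: several neurons reading the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical network: 2 inputs, 2 ReLU hidden units, 1 sigmoid output.
x = [0.5, -1.0]
hidden = layer(x, [[1.0, -2.0], [-1.0, 0.5]], [0.0, 0.1], activation=relu)
output = neuron(hidden, [1.5, -2.0], -0.5)  # a value in (0, 1), e.g. a churn score
print(hidden)  # [2.5, 0.0]
```

The non-linear activations are what allow a network to represent the non-linear decision boundaries mentioned later in this section.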

Neural networks (NN) are a data mining technique capable of learning from errors [31]. Neural networks are inspired by the brain, in the sense that the brain learns new things that are then communicated via neurons. Similarly, a neural network's neurons, combined with a learning algorithm, are able to learn from training data; hence they are referred to as artificial neural networks [32]. The results of Lazarov & Capota [33] showed that ANNs gave the best results compared with the other algorithms considered. Furthermore, they argued that a suitable prediction model needs constant updating and should apply a variety of data mining algorithms. Au et al. [34] consider that the major limitation of neural networks is that they rarely reveal patterns in an easily understandable form.

Their study also showed that neural networks outperform decision trees for churn prediction, identifying more churners than C4.5 decision trees. This is in line with research by Mozer et al. [35], which shows that a nonlinear neural network outperforms a decision tree and logistic regression. Sharma & Panigrahi [36] propose a neural network-based approach to customer churn prediction for cellular wireless services. The results of experiments on the churn dataset from the UCI repository indicate that the neural network-based approach can predict customer churn with an accuracy of more than 92%. The accuracy achieved by neural networks outweighs their limitation of needing large volumes of data and a lot of time to analyse a considerable number of predictor attributes [33].

V. CUSTOMER CHURN IN TELECOMMUNICATION

Telecommunication has attained one of the top positions in the list of the world's fastest-growing industries, covering 90% of the world's population [37]. It is one of the sectors where the customer base plays a truly significant role in maintaining revenue [38]. The telecommunication sector faces a serious threat from customer churn [39-42].

According to [43], the wireless telecom industry faces the threat of losing 27% of its customers every year, which would result in a vast revenue loss. It is also widely accepted that acquiring a new customer costs 5-10 times more than retaining an existing customer [44].




Consequently, [45] proposes that companies should pay more attention to retaining their present subscribers than to adding new ones.
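To put the 27% annual figure from [43] in perspective, it can be converted into an equivalent constant monthly churn rate and into the share of an initial customer base that remains after several years. This is a simple arithmetic sketch that assumes churn stays constant over time, which real customer bases need not satisfy.

```python
def monthly_rate_from_annual(annual_churn):
    """Constant monthly churn rate that compounds to the given annual rate."""
    return 1.0 - (1.0 - annual_churn) ** (1.0 / 12.0)


def remaining_share(annual_churn, years):
    """Fraction of the original customers still active after `years` years."""
    return (1.0 - annual_churn) ** years


print(round(monthly_rate_from_annual(0.27), 4))  # 0.0259
print(round(remaining_share(0.27, 3), 3))        # 0.389
```

Under this assumption, a 27% annual churn rate corresponds to losing roughly 2.6% of customers every month, and after three years only about 39% of the original customers remain.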

Presently, business firms pay more attention to building strong relationships with their customers [46-49]. It has therefore become accepted that the best marketing strategy is to retain existing subscribers, or, more simply, to avoid customer churn [49-52]. To tackle this problem, data mining techniques have been shown to be the best tools for fighting the ever-increasing rate of customer churn [47, 53-58].

VI. REGRESSION BASED METHODS FOR CUSTOMER CHURN PREDICTION

Regression analysis is a popular technique used by the researchers dealing with predicting customer satisfaction. It

provides a first step in model development. Mihelis et al. [59] developed a method to determine customer satisfaction

using an ordinal regression based approach. Another model for assessing the value of customer satisfaction was

developed by [60]. They used logistic regression to link satisfaction with attributes of customer retention. They claim that

the logistic function can be interpreted as providing the retention probability. [61] use a binomial logit model, based on discrete choice theory, to determine subscriber churn in the telecommunications industry. Discrete choice theory is the study of behaviour in situations where decision makers must select from a finite set of alternatives. According to [62], regression analysis is adequate for determining a probability for prediction; however, it is unable to explicitly express the hidden patterns in a symbolic and easily understandable form.

Figure 8. Illustration of linear regression on a data set.

Hwang et al. [63] discovered that logistic regression performed best for predicting customer churn when compared with

neural networks and decision tree. It should be noted that [63] were investigating a prediction of the customer lifetime

value (CLV), with the intent of including customer churn; they suggest that logistic regression was the best model for

their purpose. The authors believe that many factors could influence these results such as the neural network parameters

chosen and the data that the experiment was based on. The data used for experimentation may have been more suited to a

logistic regression model than that of a neural network or decision tree.

Datta et al. [64] used simple regression to initially predict churn but later experimented with KNN, decision trees and

neural networks. The overall model used to develop the churn prediction platform was done using a neural network.

Their research could not establish a best method. They identified future directions, including an explanation of customer behaviour, because their model was unable to predict customer churn accurately. They also suggest that information stored outside the organisation's database should be included, such as the state of the telecommunications market and current competing offers. The model suggested by [64] fails to distinguish between loyal customers, valuable customers, and less profitable customers. They suggest that future research should take a more financially oriented approach by optimising payoff, and that by concentrating on payoff rather than churn, the developed model would weight customers bringing in higher profits over those bringing in lower profits.

Baesens et al. [65] used Bayesian network classifiers for identifying the slope of the customer lifetime value (CLV) for

long-life customers, but used simple linear regression on the historical contributions of each customer to capture their

individual lifecycles. The slope was then separated into either positive or negative classes to represent increased or

decreased spending. This variable was then used as the dependent variable in their study. The CLV was also the focus of

research by [66] who used the Kaplan-Meier estimator to estimate the value, and [67] who link CLV to company

shareholder value.
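The slope-based target used by Baesens et al. can be sketched as an ordinary least-squares fit of each customer's historical contributions against time, with the sign of the slope giving the class. A minimal sketch, assuming equally spaced periods and hypothetical contribution values; the function names are illustrative, and assigning a slope of exactly zero to the negative class is an arbitrary choice made here.

```python
def ols_slope(values):
    """Least-squares slope of values observed at periods 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def spending_class(values):
    """'positive' for increasing contributions, otherwise 'negative'."""
    return "positive" if ols_slope(values) > 0 else "negative"


print(ols_slope([10, 12, 14, 16]))       # 2.0
print(spending_class([30, 28, 25, 20]))  # negative
```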

VII. LOGISTIC REGRESSION BASED METHODS FOR CUSTOMER CHURN PREDICTION

Naïve Bayes is a generative classifier. Given data $x \in \mathbb{R}^d$ and class $y \in \{-1, +1\}$, it learns a model of the class-conditional probability $P(x \mid y)$ and the prior probability $P(y)$, and combines them to obtain $P(y \mid x)$ and predict the most probable class. Logistic regression, in contrast, is a representative discriminative classifier: it learns a direct mapping from input $x$ to output $y$ by modelling the posterior probability $P(y \mid x)$ directly [68]. The parametric model used by logistic regression has the form

$$P(y = -1 \mid x) = \frac{1}{1 + \exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)} \qquad (1)$$

and




$$P(y = +1 \mid x) = \frac{\exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)}{1 + \exp\left(w_0 + \sum_{i=1}^{d} w_i x_i\right)} \qquad (2)$$

The main task of logistic regression is adjusting the weights so that the model fits the data as well as possible.

$$w = [w_0, w_1, w_2, \ldots, w_d] \leftarrow \arg\max_{w} \prod_{k} P(y^{(k)} \mid x^{(k)}, w) \qquad (3)$$

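where $w$ is the vector of parameters, and $y^{(k)}$ and $x^{(k)}$ are the observed values of $y$ and $x$ in the $k$-th training instance. The maximization of Eq. (3) is known as maximum likelihood estimation (MLE). Because the logarithm is monotonically increasing, maximizing the logarithm of the product yields the same weights and turns the product into a sum: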

$$w = [w_0, w_1, w_2, \ldots, w_d] \leftarrow \arg\max_{w} \sum_{k} \ln P(y^{(k)} \mid x^{(k)}, w) = \arg\max_{w} L(w) \qquad (4)$$

where $L(w)$ is called the conditional log-likelihood of the class [69]. Because $L(w)$ is concave in $w$, its maximum can be found, for example, by gradient ascent.
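A minimal sketch of maximizing $L(w)$ by batch gradient ascent, using the logistic model above with labels in $\{-1, +1\}$. For this parametrization the derivative of $\ln P(y \mid x, w)$ with respect to the linear term $z = w_0 + \sum_i w_i x_i$ is $y\,(1 - P(y \mid x, w))$. The learning rate, number of epochs, and toy data are illustrative assumptions; practical implementations usually add regularization and a convergence check.

```python
import math


def predict_proba(w, x):
    """P(y = +1 | x); w[0] is the intercept w_0."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, X, y):
    """L(w) = sum over k of ln P(y^(k) | x^(k), w), labels in {-1, +1}."""
    total = 0.0
    for x, label in zip(X, y):
        p_pos = predict_proba(w, x)
        total += math.log(p_pos if label == 1 else 1.0 - p_pos)
    return total


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Maximize L(w) by batch gradient ascent, starting from w = 0."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            p_pos = predict_proba(w, x)
            p_label = p_pos if label == 1 else 1.0 - p_pos
            g = label * (1.0 - p_label)  # d ln P(y | x) / dz
            grad[0] += g
            for i, xi in enumerate(x, start=1):
                grad[i] += g * xi
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical one-feature data: larger values are associated with churn (+1).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
w = fit_logistic(X, y)
print(predict_proba(w, [0.0]), predict_proba(w, [3.0]))
```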

Although logistic regression is designed for continuous features, it can still handle nominal features and missing values: nominal features are converted to binary features, and missing values are replaced by the mean (continuous features) or the mode (binary features) of the training data.
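The preprocessing described above can be sketched as follows: nominal features are first expanded into one binary column per category, and missing entries are then filled with the column mean (continuous) or column mode (binary). This is a minimal illustration, assuming records are dictionaries with `None` marking missing values and that every column has at least one observed value; the function and field names are hypothetical.

```python
import statistics


def encode_and_impute(rows, continuous, nominal):
    """Return (table, column_names) ready for logistic regression."""
    categories = {f: sorted({r[f] for r in rows if r[f] is not None}) for f in nominal}
    columns = list(continuous) + [f"{f}={c}" for f in nominal for c in categories[f]]
    table = []
    for r in rows:
        vec = [r[f] for f in continuous]
        for f in nominal:
            if r[f] is None:
                vec.extend([None] * len(categories[f]))  # fill after encoding
            else:
                vec.extend(1 if r[f] == c else 0 for c in categories[f])
        table.append(vec)
    n_cont = len(continuous)
    for j in range(len(columns)):
        observed = [vec[j] for vec in table if vec[j] is not None]
        # Mean for continuous columns, mode for binary (one-hot) columns.
        fill = statistics.fmean(observed) if j < n_cont else statistics.mode(observed)
        for vec in table:
            if vec[j] is None:
                vec[j] = fill
    return table, columns


rows = [
    {"minutes": 100.0, "plan": "basic"},
    {"minutes": None, "plan": "premium"},
    {"minutes": 300.0, "plan": None},
    {"minutes": 200.0, "plan": "basic"},
]
table, columns = encode_and_impute(rows, ["minutes"], ["plan"])
print(columns)   # ['minutes', 'plan=basic', 'plan=premium']
print(table[1])  # [200.0, 0, 1]
```

In practice the means and modes should be computed on the training data only and reused for new records.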

M. Owczarczuk [70] used logistic regression, and G. Nie et al. [71] used logistic regression and decision tree models. A. Keramati and S. M. S. Ardabili [72] focused on a binomial logistic regression model for churn prediction and identified customer dissatisfaction, service usage, switching cost, and demographic variables as factors affecting customer churn. B. Shim et al. [73] used decision trees, neural networks, and logistic regression for customer classification and found that the decision tree showed the highest hit ratio among them. P. Kisioglu and Y. I. Topcu [74] applied a Bayesian belief network to find the most important factors affecting customer churn in the telecommunication industry, using the CAID algorithm to discretize continuous variables.

VIII. COVERING OR RULE BASED ALGORITHMS FOR CUSTOMER CHURN PREDICTION

There are several families of covering algorithms, such as AQ, CN2, RIPPER, and the RULES family, in which rules are induced directly from a given set of training examples. An illustration is the work of Verbeke et al. [75], who applied two novel data mining methods, ALBA and AntMiner+, to customer churn prediction and benchmarked them against established techniques such as C4.5, RIPPER, SVM, and logistic regression. ALBA and AntMiner+ were used to induce accurate and comprehensible classification rules. The experimental results showed that the highest accuracy was obtained by combining ALBA with C4.5 or RIPPER, while applying C4.5 and RIPPER to an oversampled dataset gave the highest sensitivity.
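To make the idea of a covering algorithm concrete, the sketch below implements generic sequential covering for nominal attributes: it learns one conjunctive rule at a time for a target class, removes the examples that rule covers, and repeats. It is not an implementation of AQ, CN2, RIPPER, RULES, ALBA, or AntMiner+; the greedy precision heuristic and the toy churn attributes are assumptions chosen for brevity.

```python
def covers(rule, example):
    """A rule is a dict {attribute: value}; it covers an example when every condition holds."""
    return all(example[a] == v for a, v in rule.items())

def learn_one_rule(examples, labels, target, attributes):
    """Greedily add the attribute=value test that most raises precision for `target`."""
    rule, covered = {}, list(range(len(examples)))
    while True:
        pos = sum(labels[i] == target for i in covered)
        if pos == len(covered):                      # rule is pure: stop specializing
            return rule
        best, best_prec, best_cov = None, pos / len(covered), covered
        for a in attributes:
            if a in rule:
                continue
            for v in sorted({examples[i][a] for i in covered}):
                cov = [i for i in covered if examples[i][a] == v]
                p = sum(labels[i] == target for i in cov)
                if p and p / len(cov) > best_prec:
                    best, best_prec, best_cov = (a, v), p / len(cov), cov
        if best is None:                             # no test improves precision
            return rule
        rule[best[0]] = best[1]
        covered = best_cov

def sequential_covering(examples, labels, target, attributes):
    """Learn rules one at a time, removing every example a new rule covers (decision-list style)."""
    rules, remaining = [], list(range(len(examples)))
    while any(labels[i] == target for i in remaining):
        rule = learn_one_rule([examples[i] for i in remaining],
                              [labels[i] for i in remaining], target, attributes)
        if not rule:                                 # nothing left that a rule can separate
            break
        rules.append(rule)
        remaining = [i for i in remaining if not covers(rule, examples[i])]
    return rules

# Invented churn data with hypothetical nominal attributes.
examples = [
    {"contract": "monthly", "complaints": "yes"},
    {"contract": "monthly", "complaints": "no"},
    {"contract": "yearly", "complaints": "yes"},
    {"contract": "yearly", "complaints": "no"},
    {"contract": "monthly", "complaints": "yes"},
]
labels = ["churn", "stay", "stay", "stay", "churn"]
print(sequential_covering(examples, labels, "churn", ["contract", "complaints"]))
```

On this data it returns the single rule "contract = monthly AND complaints = yes", which covers both churners and no active customer.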

Figure 9. An instance (X, F) of the set-covering problem, where X consists of the 12 black points and F is the family of candidate sets. The greedy algorithm produces a cover of size 4 by selecting the sets S1, S4, S5, and S3, in order.
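The greedy procedure in Figure 9 can be sketched as follows: at each step it picks the set that covers the most still-uncovered points. The set names and the small instance in the demo are invented and are not the instance drawn in the figure. Greedy selection is known to give a cover within a logarithmic factor of the minimum size, which is why it can return more sets than a minimum cover needs.

```python
def greedy_set_cover(universe, subsets):
    """Return names of chosen subsets, each time taking the one covering most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        gain = subsets[name] & uncovered
        if not gain:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(name)
        uncovered -= gain
    return chosen

# Hypothetical instance: six elements, four candidate sets.
subsets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 4}}
print(greedy_set_cover(range(1, 7), subsets))  # ['A', 'C']
```

Ties are broken by the order of `subsets`, since `max` returns the first maximal key.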

The RULES (RULe Extraction System) family stands out from other covering families because of its simplicity. RULES-1, the first member of the family [76], was published in 1995, and numerous versions of the algorithm have since been developed and applied in several domains [77].

Our literature review found that little research has been done on inductive learning covering algorithms and their applications to customer churn in the telecom industry, although RULES family algorithms are well suited to data mining applications. For instance, Aksoy et al. [78] stated that the RULES-3 inductive learning algorithm is a very good choice for data mining. In that study [78], they applied RULES-3 to eleven real-life datasets and compared it with three statistical, two lazy, and six rule-based data mining algorithms in terms of learning rate, accuracy, and robustness to noisy and incomplete data. The good performance of RULES-3 is due to the following features: it can handle large sets of examples without having to break them up into smaller subsets; it produces only rules that contain relevant conditions; it allows a degree of control over the number of rules to be extracted; it can be applied to problems involving numerical as well as nominal attributes; and it gives the user great flexibility to control the precision of the generated rules, which can help in building better models.

IX. STATISTICAL DATA ANALYSIS BASED TECHNIQUES FOR CUSTOMER CHURN PREDICTION

Statistical techniques are a collection of methods used in data mining to process large volumes of data and to learn the relationships between dependent and independent attributes. This section presents the major statistical data mining techniques (linear regression, logistic regression, the Naive Bayes classifier, and the k-nearest neighbors algorithm) and their use in customer churn analysis.

Figure 10. Breakdown of Statistical Methods

Regression-based methods have been associated with good results in churn prediction and estimation. In the customer churn problem, the outcome is usually binary: yes/no, true/false, or churn/no churn. Because the remaining (explanatory) variables are mostly continuous, logistic regression seemed to be the best choice [79]. Lazarov and Capota [80] discussed the data mining algorithms commonly used in customer churn prediction and analysis.

Regression tree techniques were discussed alongside other popular data mining methods such as neural networks, rule-based learning, and decision trees. The conclusion was that good prediction models have to be developed continuously and that a combination of the proposed approaches should be used. Qureshi et al. [79] also applied logistic regression to telecom industry data to identify churners.

Logistic regression did not perform well at this task: only 45% of all churners were correctly identified, which is a very low percentage. On the other hand, it correctly identified 78% of all active users. Another application is by Nie et al. [81], who used two data mining algorithms, decision trees and logistic regression, to build a churn prediction model on credit card data from a real Chinese bank. The test results ranked logistic regression ahead of decision trees.
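Figures such as 45% of churners and 78% of active users are per-class recall: the share of actual churners, and of actual non-churners, that the model labels correctly (also called sensitivity and specificity). A minimal sketch of computing both from true and predicted labels follows; the label strings and the toy predictions are invented.

```python
def sensitivity_specificity(y_true, y_pred, positive="churn"):
    """Share of actual churners predicted as churners, and of non-churners predicted as non-churners."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: 4 churners and 6 active users.
y_true = ["churn"] * 4 + ["stay"] * 6
y_pred = ["churn", "churn", "stay", "stay"] + ["stay"] * 5 + ["churn"]
print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.8333333333333334)
```

Reporting both numbers matters for churn data: because churners are usually a minority, overall accuracy can look high even when sensitivity is poor.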

Naive Bayes is a supervised learning method that makes predictions about unseen data based on Bayes' theorem [80]. The authors of [82] proposed a customer churn prediction model based on the Naive Bayes algorithm and applied it to wireless customer data; the Bayesian model achieved 68% accuracy in the first pass.
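A minimal categorical Naive Bayes classifier is sketched below. It picks the class $c$ that maximizes $\ln P(c) + \sum_i \ln P(a_i = v_i \mid c)$, assuming attributes are conditionally independent given the class, and uses Laplace (add-one) smoothing so that an unseen attribute value does not zero out a class. The smoothing choice and the toy attributes are assumptions; the model in [82] may differ.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Count class frequencies and, per class, how often each attribute takes each value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute) -> Counter of values
    domains = defaultdict(set)            # attribute -> values seen in training
    for x, c in zip(examples, labels):
        for a, v in x.items():
            value_counts[(c, a)][v] += 1
            domains[a].add(v)
    return class_counts, value_counts, domains

def predict_naive_bayes(model, x):
    """Return the class maximizing ln P(c) + sum_i ln P(a_i = v_i | c), with add-one smoothing."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)          # log prior
        for a, v in x.items():
            count = value_counts[(c, a)][v]
            score += math.log((count + 1) / (nc + len(domains[a])))
        if score > best_score:
            best, best_score = c, score
    return best

# Invented wireless-customer data with hypothetical attributes.
examples = [
    {"plan": "prepaid", "complaints": "yes"},
    {"plan": "prepaid", "complaints": "yes"},
    {"plan": "postpaid", "complaints": "no"},
    {"plan": "postpaid", "complaints": "no"},
    {"plan": "prepaid", "complaints": "no"},
]
labels = ["churn", "churn", "stay", "stay", "stay"]
model = train_naive_bayes(examples, labels)
print(predict_naive_bayes(model, {"plan": "prepaid", "complaints": "yes"}))  # churn
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied.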

The k-nearest neighbors (k-NN) algorithm is one of the basic approaches of traditional statistical classification. An unseen instance is assigned the class label that dominates among its k neighboring instances; the classifier considers only the k closest entries in the training set [83]. The authors of [84] presented a hybrid of the k-nearest neighbor algorithm and logistic regression for building a binary classifier, called KNN-LR. They compared KNN-LR with logistic regression, C4.5, and a radial basis function (RBF) network. KNN-LR outperformed the RBF network on all four benchmark datasets. It also outperformed logistic regression on these benchmark datasets, although the two performed very closely on the Wisconsin breast cancer dataset. The results likewise indicated its advantage over RBF and C4.5, although C4.5 slightly surpassed KNN-LR on the telecom dataset. The model proposed in [85] is a hybrid that combines a modified k-means clustering algorithm with the classic inductive rule technique FOIL to predict customer churn behavior.
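The basic k-NN rule described above (a majority vote among the k closest training instances) can be sketched as follows, assuming numeric feature vectors and Euclidean distance. It does not implement the KNN-LR hybrid of [84] or the k-means/FOIL model of [85]; the two-feature toy data are invented.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Assign the class label that dominates among the k training instances closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Invented 2-feature data (e.g. scaled usage drop and complaint count).
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["stay", "stay", "stay", "churn", "churn", "churn"]
print(knn_predict(train_x, train_y, (0.2, 0.2)))   # stay
print(knn_predict(train_x, train_y, (5.5, 5.5)))   # churn
```

Because the vote depends on distances, features are normally scaled to comparable ranges first; an odd k avoids ties in two-class problems.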

The proposed model was compared with models based on single techniques, namely decision tree, k-means, PART, logistic regression, SVM, KNN, and OneR, as well as with hybrid techniques such as SePI and k-NN-LR. Across these classifiers, hybrid models, and benchmark datasets, the proposed system performed better in 12 cases. The average AUC (a measure of prediction accuracy) was then computed for each classification technique, and the hybrid model still had the highest average value.
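AUC, used above to compare the classifiers, equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition; labels are assumed to be 1 for churn and 0 otherwise, and the scores are invented.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random churner > score of a random non-churner); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one; unlike accuracy, it does not depend on a chosen decision threshold.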

X. GENETIC ALGORITHM BASED TECHNIQUES FOR CUSTOMER CHURN PREDICTION

The genetic algorithm is a part of evolutionary computing, a rapidly growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory of evolution: the solution to a problem is evolved rather than computed directly. In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, is evolved toward better solutions.




Figure 11. Illustration of the operation of a simple genetic algorithm. In each iteration, or 'generation', a population of candidate solutions is evaluated and the top-ranking solutions are selected as 'parents' of the next generation. The next generation of solutions is formed by recombining elements from the parents, along with occasional random alterations or 'mutations'. The process is repeated so that the 'fitness' of successive populations increases; in this way, the likelihood that the population contains the optimal solution also increases.

Solutions are usually represented in binary as strings of 0s and 1s, but other encodings are also possible. Evolution typically starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population, and they are modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm.

Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm terminates because the maximum number of generations was reached, a satisfactory solution may or may not have been found. Here a GA is used to compute the optimum probability for the selection of a cluster head within a range bounded by lower and upper limits.
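The loop described above can be sketched as a small binary-string GA. The operators are assumptions chosen for brevity: binary tournament selection, one-point crossover, and bit-flip mutation, with the best solution seen so far always kept. It terminates at a maximum number of generations or once an optional target fitness is reached. For churn feature selection, each bit could hypothetically flag whether a feature is used, with fitness given by a classifier's validation score; the demo instead uses the standard "OneMax" toy problem (fitness = number of 1 bits).

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, generations=100,
                      crossover_rate=0.9, mutation_rate=0.02, target=None, seed=0):
    """Evolve bit strings toward higher fitness; returns the best individual found."""
    rng = random.Random(seed)

    def select(population):
        # Binary tournament: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # Initial population of random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):               # stop at the maximum number of generations...
        if target is not None and fitness(best) >= target:
            break                              # ...or once a satisfactory fitness is reached
        children = []
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)
            if rng.random() < crossover_rate:  # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):             # bit-flip mutation
                children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # remember the best solution seen so far
    return best

print(genetic_algorithm(sum, n_bits=20, generations=200, target=20))
```

Keeping the best-so-far individual makes the returned fitness non-decreasing over generations; it does not guarantee that the global optimum is found.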

B. Huang et al. [86] proposed a genetic algorithm (NSGA) to find feature subsets of different sizes and dimensions. The experiments, carried out with the C4.5 decision tree, showed that the NSGA algorithm is efficient and effective for churn prediction. Y. Huang et al. [87] presented a new approach based on the chi-square method to select features for customer churn prediction and demonstrated the results with five different methods: DT, NB, LR, SVM, and DMEL.
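A chi-square feature filter of the kind mentioned for [87] can be sketched as follows: each categorical feature is scored by Pearson's chi-square statistic against the churn label, and features are ranked by that score. This is a generic illustration that assumes categorical (or already discretized) features; the exact procedure in [87] may differ, and the feature names are hypothetical.

```python
from collections import Counter

def chi_square(feature_values, labels):
    """Pearson chi-square statistic of the feature-vs-class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    f_tot = Counter(feature_values)
    c_tot = Counter(labels)
    stat = 0.0
    for f in f_tot:
        for c in c_tot:
            expected = f_tot[f] * c_tot[c] / n   # count expected if feature and class were independent
            stat += (joint[(f, c)] - expected) ** 2 / expected
    return stat

def rank_features(columns, labels):
    """Sort feature names by chi-square score, most class-dependent first."""
    return sorted(columns, key=lambda name: chi_square(columns[name], labels), reverse=True)

labels = ["churn", "churn", "stay", "stay"]
columns = {"complaints": ["yes", "yes", "no", "no"], "region": ["north", "south", "north", "south"]}
print(rank_features(columns, labels))  # ['complaints', 'region']
```

A larger statistic means the feature's distribution differs more across the churn classes; a filter like this is cheap because it never trains the downstream classifier.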

XI. CONCLUSION AND DISCUSSION

Retaining customers who are likely to churn has become as important for service providers as acquiring new customers. High churn rates and the substantial revenue lost to churn have made accurate churn prediction and prevention a vital business process. Although churn is unavoidable, it can be managed and kept at an acceptable level. There are many different approaches to churn prediction, and new techniques continue to emerge. Good prediction models have to be developed continuously, and a combination of the proposed techniques should be used. Valuable customers have to be identified, which leads to combining churn prediction methods with customer lifetime value techniques.


Customer churn has been identified as a major problem in the telecom industry, and extensive research has been conducted on it by applying various data mining techniques.

Decision tree based techniques, neural network based techniques, and regression techniques are the ones most commonly applied to customer churn. Our literature review found that decision tree based techniques, especially C5.0 and CART, have outperformed some existing data mining techniques such as regression in terms of accuracy. In other cases, neural networks outperform these methods, depending on the size of the datasets used and the feature selection methods applied.

Research on data mining and its applications to customer churn is likely to continue at a high rate; it remains an active field in which researchers are searching for more accurate solutions. In this paper we provide a summary of the different data mining methods and their applications in customer churn prediction.

However, the literature survey makes it evident that there has been little research on covering algorithms and their applications to customer churn, especially on applying RULES family algorithms to churn analysis.

Future work will apply RULES family techniques to telecom datasets and compare the results with some of the most commonly used churn prediction techniques, since RULES algorithms are well suited to data mining applications. We expect machine learning methods to be the best solution for developing an efficient, automated churn prediction system. Once churners have been identified and their reasons for leaving have been found, the marketing department has to act quickly to prevent churn. Usually there is not enough time to address all likely churners, so further decision making is needed to choose the clients who will be contacted.




Table I: Tabular comparison of surveyed literatures

Author Name | Year | Paper Work | Work Description | Method Used | Special Note

Idris, A. | 2012 | Genetic Programming and Adaboosting based churn prediction for Telecom | In this paper, the author uses a Genetic Programming (GP) based approach for modeling the challenging problem of churn prediction in telecom. Adaboost-style boosting is used to evolve a number of programs per class. [88] | Genetic Programming (GP), Adaboosting optimization | The predictions are made with the resulting programs using the higher output from a weighted sum of the outputs of the programs per class.

Li Peng | 2013 | Telecom customer churn prediction based on imbalanced data re-sampling method | In this article, the authors combine an imbalanced data re-sampling method with a Support Vector Machine (SVM) to address the poor classification performance caused by imbalanced data. [89] | Support Vector Machine (SVM), data resampling method | Performance is evaluated with metrics more suitable for imbalanced data sets; the datasets are obtained from the French telecom operator Orange Telecom and from UCI.

Liu, Ling | 2011 | Modeling China Telecom customer churn prediction based on CRISP_DM | It is necessary for China Telecom to apply data mining tools to predict customer churn so as to raise the pertinence of customer marketing decisions. This paper takes the E9 Package project of China Telecom as an example and builds an improved customer churn prediction model based on CRISP_DM. [90] | Cross Industry Standard Process for Data Mining | The results show that modeling with the C5.0 algorithm, when misclassification loss is considered, has lower utilization cost and better prediction performance than the CART algorithm.

Yabas, U. | 2012 | Customer Churn Prediction for Telecom Services | The researchers work on data mining methods to accurately predict customers who will switch to another provider for the same or a similar service. The sample dataset used for the experiments was compiled by Orange Telecom from real data. [91] | Random Forests algorithm | The sample dataset was posted for the 2009 Knowledge Discovery and Data Mining Competition. The authors aim to find alternative methods that can match or improve on the highest recorded score with more efficient use of resources.

Xu Hong | 2009 | Churn Prediction in Telecom Using a Hybrid Two-phase Feature Selection Method | This paper proposes a hybrid two-phase feature selection method that can effectively reduce feature dimension and improve prediction performance by using both a traditional expertise approach and a Markov blanket discovery technique. [92] | Markov blanket discovery technique, local causal discovery, Markov blanket induction | Empirical results from a branch of a Chinese wireless telecom company show that it is a feasible and superior method for telecom customer feature selection. The results also show better performance than the method based on the traditional expertise approach.

Suresh, L. | 2009 | Analysis and prediction server with column store database: A case study in telecom churn | This work presents the authors' attempt to develop an analysis and prediction system named 'rePivot'. The proposed framework consists of three modules: a column store database to provide quick access to data, a time series ranking module, and a probabilistic forecasting module. [93] | Probabilistic Projection Model, multiple linear regression, pattern based approach, Ranking Time Series | A case study of the proposed framework in churn analysis and modeling in telecom has been carried out to test its suitability for industrial applications. Application of the framework has shown promising results.

Ning Lu | 2014 | A Customer Churn Prediction Model in Telecom Industry Using Boosting | This research conducts a real-world study on customer churn prediction and proposes the use of boosting to enhance a customer churn prediction model. Unlike most research, which uses boosting to increase the accuracy of a given base learner, this paper separates customers into two clusters based on the weights assigned by the boosting algorithm. [94] | Logistic regression | Separating customers into clusters identified a higher-risk customer cluster. Logistic regression is used as the base learner, and a churn prediction model is built on each cluster separately.

Pushpa | 2013 | Social network classifier for churn prediction in telecom data | This paper uses the social position of each customer in a network and equivalence approaches to classify telecom customers. Social position can be evaluated by finding the centrality of a node, identified through the number of connections among network members. [95] | Centrality measures, SNA techniques (centrality measures), Social Network Classifier, Regular Equivalence algorithms | Such measures are used to characterize the degree of influence, prominence, and importance of certain members. Regular equivalence analysis seeks to identify customers as churners and non-churners based on regularities in the patterns of network ties.

Xue Zeng | 2008 | Definition of Misclassification Cost & Redistribution Strategy of Telecom Churn Analysis | To solve the data imbalance problem existing in this field, traditional research redistributes samples according to misclassification cost. But existing research in this area neither gave a quantitative description of the misclassification cost nor set up a unified method for redistributing samples. [96] | Study Work | To solve these problems, an original mathematical definition of misclassification cost for this domain is set up by taking telecom industry economic factors into consideration, and a redistribution strategy based on this cost is derived.

Jun Liu | 2010 | Research on customer churn prediction model based on IG_NN double attribute selection | This paper discusses the problem of customer churn prediction and, by analyzing the characteristics of customer churn data, proposes a customer churn prediction model based on double attribute selection using information gain (IG) and a neural network (NN). [97] | Information gain (IG) and neural network (NN) | First, the main attributes of the customer churn data are selected using IG, and each main attribute is analyzed with the NN; the outputs are analyzed with the 80–20 rule to obtain the key attributes affecting customer churn. Second, the IG_NN prediction model is constructed with the key attributes as input and customer churn probability as output. The model predicts next month's lost customers from customer behavior and payment data collected by a telecom operator during the first three months.

Yu, W. | 2005 | A churn-strategy alignment model for managers in mobile telecom | In this paper, the authors propose a new model for strategic alignment of churn predictors to an adaptation of the Delta strategic model for firm competitiveness. The model is substantiated using a dataset from Duke University's Teradata Center for CRM. [98] | Delta strategic model, churn-strategy alignment model | The research results contribute to analyzing churn predictors from a new perspective: that of organizational competitiveness strategy. Using factor analysis, the model links high-level churn predictors with competitiveness strategy.

Yongbin Zhang | 2011 | Behavior-Based Telecommunication Churn Prediction with Neural Network Approach | A behavior-based telecom customer churn prediction system is presented in this paper. Unlike conventional churn prediction methods, which use customer demographics, contractual data, customer service logs, call details, complaint data, and bill and payment data as inputs and churn as the target output, this system uses only customer service usage information to predict customer churn with a clustering algorithm. [99] | Neural Network | It can solve problems that traditional methods face, such as missing or unreliable data and correlation among inputs.

Pushpa | 2013 | Sociocentric and egocentric measures for identifying the key players in telecom social network | The work on social network analysis includes the construction of both multirelational telecom social networks and ego-networks of telecom customers, in order to discover groups of customers who share similar properties and to classify customers as churners and non-churners. [101] | Clustering based on sociocentric and egocentric measures | This paper explores both sociocentric and egocentric methods for identifying key players who play important roles in decision making when finding the churn rate of telecom social networks.

Hsu, Tsung-Hao | 2014 | Inferring potential users in mobile social networks | To infer potential users, the authors propose a framework including feature extraction, feature selection, and classifier learning. First, they construct a heterogeneous information network from users' call detail records. Then, they extract explicit features from potential users' interaction behavior in the heterogeneous information network. [105] | 3-tree decision tree algorithm in heterogeneous information network | Because users are influenced by their community, the authors extract community-based implicit features of potential users.

Ping Chen | 2009 | The design of architecture, workflow, algorithm on grid system for Social Network context prediction analysis | The architecture of DMG (data mining grid) is proposed, and a prototype of DMG is designed to solve the computing problem. Two important issues in DMG, the design of the workflow service and the distributed data mining algorithm, are investigated. Finally, a sample application to customer churn context prediction analysis in the telecom field is investigated and illustrated. [100] | DMG (data mining grid), distributed data mining algorithm | In the sample application, Centrals, which is computed by a parallel algorithm on DMG, is an important measure in telecom social networks.

Li Yi | 2010 | The Explanation of Support Vector Machine in Customer Churn Prediction | In this paper, an explainable prediction model is established to select the optimum features and parameters; the selected optimum parameters are then applied to predicting potential customer churn in a foreign telecom company. [103] | Support Vector Machine | The model not only achieves desirable prediction but is also explainable through the selected features; it balances accuracy and explainability in customer churn prediction and establishes a unified structural framework for customer churn prediction models.




For the customers with the highest probability of churning, the revenue a service provider can expect over each customer's remaining tenure is also predicted. In this way valuable customers are identified, and retention efforts are focused on them.

A customer's lifetime value tends to be inversely related to that customer's churn probability, and a customer's decision to stay is usually accompanied by an increase in lifetime value. Using customer lifetime value in addition to churn prediction can minimize both the cost of a needless retention effort (false positives) and the cost of losing a customer because the model did not predict that he or she was likely to churn (false negatives).
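One way to act on this, sketched under stated assumptions, is to rank customers by the expected net gain of contacting them: churn probability times the probability that the retention offer succeeds times customer lifetime value, minus the contact cost, and then contact only the top customers within the campaign budget. The success rate, contact cost, and customer records below are hypothetical parameters, not values from the surveyed work.

```python
def retention_value(p_churn, clv, success_rate, contact_cost):
    """Expected gain from contacting one customer: revenue saved minus the cost of the contact."""
    return p_churn * success_rate * clv - contact_cost

def plan_campaign(customers, budget, success_rate=0.3, contact_cost=10.0):
    """Pick at most `budget` customers with the highest positive expected gain."""
    scored = [(retention_value(c["p_churn"], c["clv"], success_rate, contact_cost), c["id"])
              for c in customers]
    scored = [s for s in scored if s[0] > 0]   # never pay for a contact expected to lose money
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:budget]]

# Hypothetical customers: churn probability from a model, CLV in arbitrary currency units.
customers = [
    {"id": "a", "p_churn": 0.9, "clv": 50.0},
    {"id": "b", "p_churn": 0.5, "clv": 400.0},
    {"id": "c", "p_churn": 0.2, "clv": 100.0},
    {"id": "d", "p_churn": 0.8, "clv": 300.0},
]
print(plan_campaign(customers, budget=2))  # ['d', 'b']
```

Note that customer "a" has the highest churn probability but is not chosen first: weighting by lifetime value shifts limited retention effort toward the customers whose loss would cost the most.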

REFERENCES

[1] S. Yoon, J. Koehler & A. Ghobarah. (2010) Prediction of Advertiser Churn for Google AdWords. JSM

Proceedings, American Statistical Association.

[2] L. Breiman. (2001) Random Forests. Machine Learning, 45 (1), 5-32.

[3] L. Deng et al. (2013) Recent Advances in Deep Learning for Speech Research at Microsoft. ICASSP, 8604-8608.

[4] G. Dahl et al. (2013) Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout. ICASSP, 8609-8613.

[5] G. Hinton et al. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29 (6) 82-97.

[6] A. van den Oord, S. Dieleman & B. Schrauwen. (2013) Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems (NIPS), 2643-2651.

[7] J. Zhou, O. G. Troyanskaya. (2013) Deep Supervised and Convolutional Generative Stochastic Network for

Protein Secondary Structure Prediction. Advances in Neural Information Processing Systems (NIPS).

[8] D. V. den Poel and B. Lariviere. Customer attrition analysis for financial services using proportional hazard

models. European Journal of Operational Research, 157(1):196-217, 2004.

[9] W. Verbeke, D. Martens, C. Mues, and B. Baesens, "Building comprehensible customer churn prediction models with advanced rule induction techniques," Expert Syst. Appl., vol. 38, no. 3, pp. 2354–2364, 2011.

[10] Chu, B. H., Tsai, M. S., and Ho, C. S., "Towards a hybrid data mining model for customer retention",

Knowledge-Based Systems, 20, 2007, pp. 703–718.

[11] Berry, M. J. A., and Linoff, G. S., "Data mining techniques second edition – for marketing, sales, and customer

relationship management", 2004.

[12] Chen, Y. L., Hsu, C. L., and Chou, S. C., "Constructing a multi-valued and multilabeled decision tree", Expert

Systems with Applications, 25, 2003, 199–209.

[13] Kim, J. K., Song, H. S., Kim, T. S., and Kim, H. K., "Detecting the change of customer behavior based on

decision tree analysis", Expert System with Applications, 22, 2005, 193–205.

[14] Buckinx, W., Moons, E., Poel, D. V. D., and Wets, G., "Customer-adapted coupon targeting using feature

selection", Expert Systems with Applications, 26, 2004, 509–518.

[15] J. Hadden, A. Tiwari, R. Roy, and D. Ruta. Churn Prediction: Does Technology Matter. International Journal of

Intelligent Technology, 1(1):104-110, 2006.

[16] J. Hadden, A. Tiwari, R. Roy, and D. Ruta. Churn Prediction using Complaints Data. International Journal of

Intelligent Technology, 13:158-163, May 2006.

[17] J. R. Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA,

USA, 1993.

[18] K. binti Oseman, S. binti M. Shukor, N. Abu Haris, and F. bin Abu Bakar, "Data Mining in Churn Analysis Model for Telecommunication Industry," J. Stat. Model. Anal., vol. 1, no. 19–27, 2010.

[19] C.-P. Wei and I. T. Chiu, "Turning telecommunications call details to churn prediction: a data mining approach," Expert Syst. Appl., vol. 23, no. 2, pp. 103–112, 2002.

[20] A. T. Jahromi, M. Moeini, I. Akbari, and A. Akbarzadeh, "A Dual-Step Multi-Algorithm Approach For Churn Prediction in Pre-Paid Telecommunications Service Providers," RISUS. J. Innov. Sustain, vol. 1, no. 2, 2010.

[21] V. Yeshwanth, V. V. Raj, and M. Saravanan, "Evolutionary Churn Prediction in Mobile Networks Using Hybrid Learning," in Twenty-Fourth International FLAIRS Conference, 2011, pp. 471–476.

[22] M. Kaur, K. Singh, and N. Sharma, "Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers," Int. J. Recent Innov. Trends Comput. Commun., vol. 1, no. 9, pp. 720–725, 2013.

[23] R. A. Soeini and K. V. Rodpysh, "Evaluations of Data Mining Methods in Order to Provide the Optimum Method for Customer Churn Prediction: Case Study Insurance Industry," 2012 Int. Conf. Inf. Comput. Appl. (ICICA 2012), vol. 24, pp. 290–297, 2012.

[24] J. Hadden, A. Tiwari, R. Roy, and D. Ruta, "Churn prediction: does technology matter," Int. J. Intell. Technol., vol. 1, no. 2, 2006.

[25] S. A. Qureshi, A. S. Rehman, A. M. Qamar, A. Kamal, and A. Rehman, "Telecommunication subscribers' churn prediction model using machine learning," in Digital Information Management (ICDIM), 2013 Eighth International Conference on, 2013, pp. 131–136.

[26] Mitchell, T. M. (1997). Machine Learning. McGraw-Hill Science/Engineering/Math.




[27] John, G. H., & Langley, P. (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 338-345). San Mateo: Morgan Kaufmann.

[28] Gutkin, M. (2008). Feature selection methods for classification of gene expression profiles. Tel-Aviv

University.

[29] Cestnik, B. (1990). Estimating Probabilities: A crucial task in machine learning. Ninth European Conference on Artificial Intelligence (pp. 147-149). London.

[30] S. Balaji, S.K. Srinivasta, "Naïve Bayes Classification approach for Mining Life insurance Databases for Effective Prediction of Customer Preferences over Life Insurance Products", International Journal of Computer Applications, Vol. 51, No. 3, 2012.

[31] Z. Kasiran, Z. Ibrahim, and M. S. M. Ribuan, "Mobile phone customers churn prediction using Elman and Jordan Recurrent Neural Network," in Computing and Convergence Technology (ICCCT), 2012 7th





[33] V. Lazarov and M. Capota, "Churn Prediction," TUM Comput. Sci., 2007.

[34] W. H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with applications to churn prediction," Evol. Comput. IEEE Trans., vol. 7, no. 6, pp. 532–545, 2003.

[35] M. C. Mozer, R. Wolniewicz, D. B. Grimes, E. Johnson, and H. Kaushansky, "Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry," Neural Networks, IEEE Trans., vol. 11, no. 3, pp. 690–696, 2000.

[36] A. Sharma and P. K. Panigrahi, "A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services," Int. J. Comput. Appl., vol. 27, no. 11, 2011.

[37] Dass, Rajanish and Jain, Rumit, "An Analysis on the factors causing telecom churn: First Findings", AMCIS

2011 Proceedings - All Submissions. 2011, Paper 2.

[38] Adnan Idris, Muhammad Rizwan, Asifullah Khan, "Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies", Journal of Computers and Electrical Engineering, 38, 2012, 1808–1819.

[39] Jae-Hyeon Ahn, Sang-Pil Han, Yung-Seop Lee, "Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry", Telecommunications Policy, Volume 30, Issues 10–11, 2006, Pages 552–568.

[40] Kim, Park, and Jeong., "The effects of customer satisfaction and switching barrier on customer loyalty in

Korean mobile telecommunication services", Telecommunications Policy. Volume 28, Issue 2, 2004, Pages

145–159.

[41] Berson, A., Smith, S., and Therling, K. (1999). Building data mining applications for CRM. New York:

McGraw-Hill.

[42] Madden, G., Savage, S. J., and Coble-Neal, G., "Subscriber churn in the Australian ISP market", Information

Economics and Policy, 11 (2), 1999, 195–207

[43] Chih Ping Wei, I-Tang Chiu, "Expert Systems with Applications", Volume 23, Issue 2, August 2002, Pages

103–112.

[44] J Lu, "Modeling Customer Lifetime Value Using Survival Analysis: An Application in the Telecommunications

Industry", Data Mining Techniques, 2003, SUGI 28.

[45] Koh, H. C., and Chan, K. L. G., "Data mining and customer relationship marketing in the banking industry",

Singapore Management Review, 24, 2002, 1–28.

[46] Zhen-Yu Chen, Zhi-Ping Fan, Minghe Sun, "A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data", European Journal of Operational Research, 223, 2012, 461–472.

[47] Kristof Coussement, Dirk Van den Poel, "Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques", Expert Systems with Applications, Volume 34, Issue 1, 2008, Pages 313–327.

[48] Yaya Xie, Xiu Li, E. W. T. Ngai, and Weiyun Ying, "Customer churn prediction using improved balanced random forests", Expert Systems with Applications, Volume 36, Issue 3, 2009, Pages 5445-5449.

[49] Kim H S, Yoon C H., "Determinants of subscriber churn and customer loyalty in the Korean mobile telephony

market", Telecommunications Policy, Volume 28, Issues 9–10, 2004, Pages 751–765.

[50] GolshanMohammadi, Reza Tavakkoli-Moghaddam, and MehrdadMohammadi, "Hierarchical Neural Regression

Models for Customer Churn Prediction", Journal of Engineering, Volume 2013 , 2013, Article ID 543940, 9

pages.

[51] Chih-Fong Tsai, Yu-Hsin Lu, "Customer churn prediction by hybrid neural networks‖, Expert System with

Applications 36, 2009, pp. 12547–12553.

[52] Kim, Park, and Jeong, "The effects of customer satisfaction and switching barrier on customer loyalty in Korean

mobile telecommunication services", Telecommunications Policy. Volume 28, Issue 2, 2004, Pages 145–159.

[53] Au, W. H., Chan, K. C. C., and Yao, X., "A novel evolutionary data mining algorithm with applications to churn

prediction", IEEE Transactions on Evolutionary Computation, 7, 2003, 532–545.


[54] Bin, Peiji, and Juan, "Customer Churn Prediction Based on the Decision Tree in Personal Handyphone System Service", Service Systems and Service Management, 2007 International Conference, 1–5.

[55] Shin-Yuan Hung, David C. Yen, Hsiu-Yu Wang, "Applying data mining to telecom churn management", Expert Systems with Applications, 31, 2006, 515–524.

[56] John, Ashutosh, Rajkumar, and Dymitr, "Computer assisted customer churn management: State-of-the-art and future trends", Computers and Operations Research, Volume 34, Issue 10, 2007, Pages 2902–2917.

[57] Lazarov and Capota, "Churn Prediction", 2007, TUM computer science.

[58] Chih Ping Wei, I-Tang Chiu, "Expert Systems with Applications", Volume 23, Issue 2, August 2002, Pages 103–112.

[59] MIHELIS, G., GRIGOROUDIS, E., SISKOS, Y., POLITIS, Y. & MALANDRAKIS, Y. (2001) Customer Satisfaction Measurement in the Private Bank Sector. European Journal of Operational Research, 130, 347-360.

[60] RUST, R. T. & ZAHORIK, A. J. (1993) Customer Satisfaction, Customer Retention, and Market Share. Journal of Retailing, 69, 193-215.

[61] KIM, H. & YOON, C. (2004) Determinants of Subscriber Churn and Customer Loyalty in the Korean Mobile Telephony Market. Telecommunications Policy, 28, 751-765.

[62] AU, W., CHAN, C. C. & YAO, X. (2003) A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation, 7, 532-545.

[63] HWANG, H., JUNG, T. & SUH, E. (2004) An LTV Model and Customer Segmentation Based on Customer Value: A Case Study on the Wireless Telecommunications Industry. Expert Systems with Applications, 26, 181-188.

[64] DATTA, P., MASAND, B., MANI, D. R. & LI, B. (2001) Automated cellular modelling and prediction on a large scale. Issues on the application of data mining, 485-502.

[65] BAESENS, B., VERSTRAETEN, G., VAN DEN POEL, D., EGMONT-PETERSON, M., VAN KENHOVE, P. & VANTHIENEN, J. (2004) Bayesian Network Classifiers for Identifying the Slope of the Customer Lifecycle of Long-Life Customers. European Journal of Operational Research, 156, 508-523.

[66] ROSSET, S., NEUMANN, E., EICK, U. & VATNIK, N. (2003) Customer lifetime value models for decision support. Data Mining and Knowledge Discovery, 7, 321-339.

[67] STAHL, H. K., MATZLER, K. & HINTERHUBER, H. H. (2003) Linking Customer Lifetime Value with Shareholder Value. Industrial Marketing Management, 32, 267-279.

[68] Ng, A. Y., & Jordan, M. (2002). On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. Neural Information Processing Systems: NIPS 14.

[69] Mitchell, T. M. (2005). Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. In Machine Learning. Unpublished manuscript.

[70] Marcin Owczarczuk, "Churn models for prepaid customers in the cellular telecommunication industry using large data marts", Expert Systems with Applications 37 (2010) 4710–4712.

[71] Guangli Nie, Wei Rowe, Lingling Zhang, Yingjie Tian, Yong Shi, "Credit card churn forecasting by logistic regression and decision tree", Expert Systems with Applications 38 (2011) 15273–15285.

[72] Abbas Keramati, Seyed M.S. Ardabili, "Churn analysis for an Iranian mobile operator", Telecommunications Policy 35 (2011) 344–356.

[73] Beomsoo Shim, Keunho Choi, Yongmoo Suh, "CRM strategies for a small-sized online shopping mall based on association rules and sequential patterns", Expert Systems with Applications 39 (2012) 7736–7742.

[74] Pınar Kisioglu, Y. Ilker Topcu, "Applying Bayesian Belief Network approach to customer churn analysis: A case study on the telecom industry of Turkey", Expert Systems with Applications 38 (2011) 7151–7157.

[75] W. Verbeke, D. Martens, C. Mues, and B. Baesens, "Building comprehensible customer churn prediction models with advanced rule induction techniques," Expert Syst. Appl., vol. 38, no. 3, pp. 2354–2364, 2011.

[76] D. T. Pham and M. S. Aksoy, "RULES: A simple rule extraction system," Expert Syst. Appl., vol. 8, no. 1, pp. 59–65, Jan. 1995.

[77] A. M. AlMana and M. S. Aksoy, "An Overview of Inductive Learning Algorithms," Int. J. Comput. Appl., vol. 88, no. 4, pp. 20–28, 2014.

[78] M. S. Aksoy, H. Mathkour, and B. A. Alasoos, "Performance evaluation of rules-3 induction system on data mining," Int. J. Innov. Comput. Inf. Control, vol. 6, no. 8, pp. 1–8, 2010.




[80] V. Lazarov and M. Capota, "Churn Prediction," TUM Comput. Sci., 2007.

[81] G. Nie, W. Rowe, L. Zhang, Y. Tian, and Y. Shi, "Credit card churn forecasting by logistic regression and decision tree," Expert Syst. Appl., vol. 38, no. 12, pp. 15273–15285, 2011.

[82] S. V. Nath and R. S. Behara, "Customer churn analysis in the wireless industry: A data mining approach," in Proceedings - Annual Meeting of the Decision Sciences Institute, 2003.

[83] M. Kaur, K. Singh, and N. Sharma, "Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers," Int. J. Recent Innov. Trends Comput. Commun., vol. 1, no. 9, pp. 720–725, 2013.

[84] Y. Zhang, J. Qi, H. Shu, and J. Cao, "A hybrid KNN-LR classifier and its application in customer churn prediction," in 2007 IEEE International Conference on Systems, Man and Cybernetics, 2007, pp. 3265–3269.


[85] Y. Huang and T. Kechadi, "An effective hybrid learning system for telecommunication churn prediction," Expert Syst. Appl., vol. 40, no. 14, pp. 5635–5647, Oct. 2013.

[86] Bingquan Huang, B. Buckley, T.-M. Kechadi, "Multiobjective feature selection by using NSGA-II for customer churn prediction in telecommunications", Expert Systems with Applications 37 (2010) 3638–3646.

[87] Y. Huang, B. Q. Huang, M. T. Kechadi, "A New Filter Feature Selection Approach for Customer Churn Prediction in Telecommunications", Proceedings of the IEEM, IEEE (2010) 338-342.

[88] Idris, A., "Genetic Programming and Adaboosting based churn prediction for Telecom", IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012.

[89] Li Peng, "Telecom customer churn prediction based on imbalanced data re-sampling method", International Conference on Measurement, Information and Control (ICMIC), 2013.

[90] Liu, Ling, "Modeling China Telecom customer churn prediction based on CRISP_DM", International Conference on E-Business and E-Government (ICEE), 2011.

[91] Yabas, U., "Customer Churn Prediction for Telecom Services", IEEE 36th Annual Computer Software and Applications Conference (COMPSAC), 2012.

[92] Xu Hong, "Churn Prediction in Telecom Using a Hybrid Two phase Feature Selection Method", Third International Symposium on Intelligent Information Technology Application, 2009. IITA 2009.

[93] Suresh, L., "Analysis and prediction server with column store database - A case study in telecom churn", IEEE Region 10 Conference TENCON 2009, 2009.

[94] Ning Lu, "A Customer Churn Prediction Model in Telecom Industry Using Boosting", IEEE Transactions on Industrial Informatics.

[95] Pushpa, "Social network classifier for churn prediction in telecom data", International Conference on Advanced Computing and Communication Systems (ICACCS), 2013.

[96] Xue Zeng, "Definition of Misclassification Cost & Redistribution Strategy of Telecom Churn Analysis", International Conference on Apperceiving Computing and Intelligence Analysis, 2008. ICACIA 2008.

[97] Jun Liu, "Research on customer churn prediction model based on IG_NN double attribute selection", 2nd International Conference on Information Science and Engineering (ISICE), 2010.

[98] Yu, W., "A churn-strategy alignment model for managers in mobile telecom", Communication Networks and Services Research Conference, 2005. Proceedings of the 3rd Annual.

[99] Yongbin Zhang, "Behavior-Based Telecommunication Churn Prediction with Neural Network Approach", International Symposium on Computer Science and Society (ISCCS), 2011.

[100] Ping Chen, "The design of architecture, workflow, algorithm on grid system for Social Network context prediction analysis", IEEE International Conference on Network Infrastructure and Digital Content, 2009. IC-NIDC 2009.

[101] Pushpa, "Sociocentric and egocentric measures for identifying the key players in telecom social network", International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), 2013.

[102] Jiayin Qi, "A novel and convenient variable selection method for choosing effective input variables for telecommunication customer churn prediction model", IEEE International Conference on Systems, Man and Cybernetics, 2009. SMC 2009.

[103] Li Yi, "The Explanation of Support Vector Machine in Customer Churn Prediction", International Conference on E-Product E-Service and E-Entertainment (ICEEE), 2010.

[104] Au, W.-H., "A novel evolutionary data mining algorithm with applications to churn prediction", IEEE Transactions on Evolutionary Computation.

[105] Tsung-Hao Hsu, "Inferring potential users in mobile social networks", International Conference on Data Science and Advanced Analytics (DSAA), 2014.

[106] Zhang Xiao-bin, "Customer-Churn Research Based on Customer Segmentation", International Conference on Electronic Commerce and Business Intelligence, 2009. ECBI 2009.

[107] Yin Wu, "The study on feature selection in customer churn prediction modeling", IEEE International Conference on Systems, Man and Cybernetics, 2009. SMC 2009.

