
International Journal on Computational Science & Applications (IJCSA) Vol.5, No.3, June 2015

DOI:10.5121/ijcsa.2015.5305

Regularized Weighted Ensemble of Deep Classifiers

Shruti Asmita¹ and K.K. Shukla²

¹Department of Computer Science, Banasthali University, Jaipur-302001, Rajasthan, India
²Department of Computer Science and Engineering, Indian Institute of Technology, Banaras Hindu University, Varanasi-221005, Uttar Pradesh, India

ABSTRACT

An ensemble of classifiers improves classification performance because the decisions of many experts are fused to produce the final prediction. Deep learning is a classification approach in which, along with the basic learning step, a fine-tuning step is performed to improve the precision of learning, and ensembles of deep classifiers offer good scope for research. Feature subset selection is another technique for creating the individual classifiers to be fused in ensemble learning. All of these ensemble techniques face the ill-posed problem of overfitting. A regularized weighted ensemble of deep support vector machines is applied to three UCI repository prediction problems, IRIS, Ionosphere and Seed, thereby improving the generalization of the decision boundary between the classes of each data set. The singular value decomposition reduced norm 2 regularization combined with the two-level deep classifier ensemble gives the best result in our experiments.

KEYWORDS

Deep learning, support vector machine, feature subset selection, singular value decomposition, regularization

1. INTRODUCTION

Machine learning is a branch of computational statistics specialized in prediction making. It aims at artificial learning, i.e. the construction of algorithms that are capable of learning from data [1]. Such learning is based on building a model from training data and then making decisions with that model on test data. Supervised machine learning [2] is marked by the presence of a supervisor, in the sense that a training set comprising a number of inputs with their corresponding outputs (associated labels) is provided to the machine for initial learning and model building. Later, with the help of this model, the required output is generated for inputs not present in the training set. Unsupervised learning [2], on the other hand, has no such supervisor; it tries to find hidden relations in unlabelled data. Classification, regression, etc. are supervised learning techniques, whereas clustering, self-organizing maps, etc. are unsupervised techniques. Other learning approaches in existence are semi-supervised learning, reinforcement learning, developmental learning, etc.

In classification [3], the training data is divided into two or more classes. A model must be formed which can distinguish between the categories and place new input instances in the correct class. The performance measure of classification is the classification accuracy, and the goal of any learning method is to achieve the best possible classification accuracy. Several classification algorithms have been applied to various datasets, but there is always scope to improve performance through new techniques; machine learning aims at obtaining high test accuracy. Popular classifiers in wide use include the k nearest neighbour classifier, decision tree classifier, frequent pattern classifier, Bayes classifier, rule based classifier, support vector machine (SVM) classifier, etc. [4]. Among these, the SVM [5] is currently the most studied and implemented classifier because of its high accuracy and its exceptional ability to model complex non-linear decision boundaries by mapping non-linear data to higher dimensions. Hence both linear and non-linear data can be classified well by SVM. Also, because of the support vectors, the compactness of the SVM model is very high. Groups of people can often make better decisions than individuals [6]; hence an ensemble of classification models yields better classification accuracy than an individual classifier model.

The prediction task can be a time series problem, where the training data for model generation is recorded over a long span of time; in such cases batch learning is done [7]. In batch learning, the models generated on the individual batches up to the previous time unit are ensembled to form the resultant model for testing the present batch of data. A prediction task can also be non-time-series, where the training data contains many instances observed at one particular time. Batch learning is not feasible in such classifications since all the instances are equally related to each other. Hence, to obtain an ensemble of classifiers, the techniques available for generating the individual models include bagging [6] with bootstrap subsampling, deep learning and feature subset selection. These techniques aim at increasing the diversity of the ensemble. Even in an ensemble of classifiers, the ill-posed problem of overfitting occurs. This problem can be handled through regularization: the vector norms applied in the regularization process control overfitting by constraining how closely the model fits the training instances.

This paper deals with three prediction problems: first, predicting the type of IRIS plant from among Iris Setosa, Iris Versicolour and Iris Virginica; second, predicting a good or bad radar return from the Ionosphere; and third, predicting the type of wheat kernel from among the Kama, Rosa and Canadian varieties. The prediction is made through a regularized weighted ensemble of deep support vector machine classifiers. The individual models for the ensemble are generated through feature subset selection and deep learning. Weights are assigned to each individual model by a majority voting technique, and these weights are then regularized through four variations, i.e. norm 1, norm 2, Tikhonov and singular value decomposition (SVD) reduced norm 2 regularization. This form of regularization reduces the curvature of each depression and convolution of the non-linear SVM boundary, and hence the loss function is modified to promote generalization and provide the essential curve fitting over the input feature vectors for classification. To the best of our knowledge, this technique of regularizing the weights together with deep learning and such ensemble learning approaches, for dealing with the overfitting of classifiers in supervised machine learning, has not yet been applied to these prediction problems. The rest of the paper first presents the datasets and background concepts, and then the algorithms, frameworks, experimental results and comparative analysis.

2. DATA SET

The three prediction problems used in this paper are summarized in Table 1. The training set and test set comprise 70% and 30% of the whole database respectively. This 7:3 ratio is arbitrary but is chosen because it is a good practical ratio according to most experiments in machine learning.
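As an illustration of such a split (a minimal sketch, not the authors' code, assuming scikit-learn is available as the toolkit), a stratified 70/30 partition of the Iris data can be produced as follows:

```python
# Illustrative sketch: a stratified 70/30 split of the Iris data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)            # 150 instances, 4 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)           # (105, 4) (45, 4), matching Table 1
```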


2.1. IRIS Dataset

The Iris database was created by R.A. Fisher and donated by Michael Marshall in July 1988 [8]. It is a popular dataset and has been used successfully in several prediction and pattern recognition problems. The data set contains 3 classes specifying the type of iris plant, Iris Setosa, Iris Versicolour and Iris Virginica, with 50 instances per class. The classification problem is to predict the category of Iris plant. The four attributes of each record are sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm). Table 2 gives the number of instances of each class in the total, training and test data of the Iris data set, and Table 3 summarizes major previous work on Iris data.

Table 1. Instance distribution in the training and test sets of each data set

S. No. | Dataset    | Year of creation | Classes | Features | Total instances | Training instances | Test instances
1      | IRIS       | 1988             | 3       | 4        | 150             | 105                | 45
2      | Ionosphere | 1989             | 2       | 34       | 351             | 246                | 105
3      | Seed       | 2012             | 3       | 7        | 210             | 147                | 63

Table 2. Number of instances of each class in the total, training and test sets of the Iris data set

S. No. | Class            | Total instances | Training instances | Test instances
1      | Iris Setosa      | 50              | 35                 | 15
2      | Iris Versicolour | 50              | 35                 | 15
3      | Iris Virginica   | 50              | 35                 | 15

Table 3. Previous major experiments reported on the Iris data set and the classification accuracy achieved in each case

S. No. | Year | Problem Statement                                                                                  | Reported accuracy (%)
1      | 2014 | Neuro-fuzzy classifier system [9]                                                                  | 96.70
2      | 2013 | Evolving neural network ensembles using string genetic algorithms for pattern classification [10] | 93.30
3      | 2012 | Hybrid SVM and decision tree classifier [11]                                                       | 97.08
4      | 2012 | Classifier ensemble for SVM [12]                                                                   | 95.00
5      | 2011 | One class SVM weighted bagging [13]                                                                | 92.00
6      | 2010 | Large margin classifier SVM [14]                                                                   | 95.30
7      | 2010 | Feature subset selection in neural network classifier [15]                                         | 97.00
8      | 2008 | SVM based semi supervised classification [16]                                                      | 95.00
9      | 2003 | SVM ensemble with majority voting [17]: SVM / Bagging / Boosting                                   | 96.50 / 96.80 / 97.20


2.2. Ionosphere Dataset

The Ionosphere database was created at Johns Hopkins University and donated by Vince Sigillito in 1989 [18]. The data were collected with a radar system consisting of a phased array of 16 high-frequency antennas, with which the free electrons in the ionosphere are recorded. The two classes into which the returns are categorized are "good" and "bad" ionosphere returns. Predictions are made on the basis of 34 attributes; this large attribute list distinguishes this dataset from the other two mentioned in this section. Table 4 shows the number of instances of each class in the total, training and test data of the Ionosphere data set, and Table 5 lists major previous contributions on this data set.

Table 4. Number of instances of each class in the total, training and test sets of the Ionosphere data set

S. No. | Class             | Total instances | Training instances | Test instances
1      | Good radar signal | 224             | 168                | 56
2      | Bad radar signal  | 127             | 78                 | 49

Table 5. Previous major experiments reported on the Ionosphere data set and the classification accuracy achieved in each case

S. No. | Year | Problem Statement                                                 | Reported accuracy (%)
1      | 2014 | Classifier ensemble based on weighted accuracy and diversity [19] | 94.00
2      | 2014 | Weighted classifier ensemble SVM [20]                             | 94.00
3      | 2013 | Artificial immune recognition through SVM classification [21]     | 93.00
4      | 2013 | One class ensemble classifier majority voting approach [22]       | 89.80
5      | 2010 | Fast local radial basis function kernel SVM classification [23]   | 93.72
6      | 2008 | Oblique decision tree embedded with SVM classification [24]       | 92.59
7      | 2008 | SVM infinite ensemble learning [25]                               | 92.00
8      | 2006 | Evolving ensemble of classifiers with majority voting [26]        | 81.00

2.3. Seed Dataset

The Seed database is a relatively new database and hence has very few previous experiments associated with it. The dataset records geometrical properties of wheat kernels that characterize the three wheat varieties Kama, Rosa and Canadian. The data were collected using X-ray techniques [27]. The seven kernel parameters forming the feature set are area (A), perimeter (P), compactness (C = 4*pi*A/P^2), length of kernel, width of kernel, asymmetry coefficient and length of kernel groove. Table 6 shows the number of instances of each class in the total, training and test sets of the Seed data, and Table 7 lists the major previous similar contribution on this data. (To the best of our knowledge, this data has been used in similar proposals only by its developers to date; hence a single previous work is reported in Table 7.)

Table 6. Number of instances of each class in the total, training and test sets of the Seed data set

S. No. | Class    | Total instances | Training instances | Test instances
1      | Kama     | 70              | 49                 | 21
2      | Rosa     | 70              | 49                 | 21
3      | Canadian | 70              | 49                 | 21

Table 7. Previous major experiments reported on the Seed data set and the classification accuracy achieved in each case

S. No. | Year | Problem Statement                                        | Reported accuracy (%)
1      | 2012 | Complete gradient clustering with K-means algorithm [28] | 92.00

3. BACKGROUND APPROACH

3.1. SVM Classifier

The origin of SVM classifiers lies in VC dimension theory. The VC dimension is defined on a set of functions: it is the maximum number of points that can be separated in all possible ways by that set of functions. Non-linearly separable data are transformed to higher dimensions to achieve classification through SVM (figure 1). The margin between the classes can be a soft margin or a hard margin (figure 2). In soft margin classification, the generated model compensates for misclassified instances, whereas hard margin classification does not allow any misclassification; instead, it plots a strict non-linear boundary to avoid misclassification. SVM classifies the data by optimizing a hinge loss function. Soft margin classification is more prevalent than hard margin classification since the latter suffers from a very high rate of overfitting.

Figure 1. SVM
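As a minimal sketch of this setting (assuming scikit-learn as the SVM implementation; the parameter values C and gamma below are illustrative, not the values used in the paper), a soft-margin RBF SVM can be trained as follows:

```python
# Illustrative sketch of a soft-margin SVM with an RBF kernel.
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)   # scale features to [-1, +1]

clf = SVC(kernel="rbf", C=1.0, gamma=0.5)   # C controls the soft-margin penalty
clf.fit(X, y)
print("support vectors:", clf.support_vectors_.shape)
```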


3.2. Ensemble of Classifiers

Ensemble learning is the process of training multiple learning machines individually and then combining their outputs, similar to a committee of decision makers. The principle behind this method is that the individual predictions, combined appropriately, should have better overall accuracy, on average, than any individual committee member [29]. The prime aggregation methods applied in ensemble learning are voting techniques such as majority voting, Borda count aggregation, behaviour knowledge based aggregation, dynamic classifier selection, etc. [30]. Among these, our proposed learning technique uses majority voting [31] aggregation. The three versions of majority voting are unanimous voting, simple voting and plurality voting, of which plurality voting is the most effective form.

Majority voting in the proposed method aims at giving higher weightage to the more qualified experts in the ensemble of classifiers, where expertise is inversely proportional to classification error.

Figure 2. Hard Margin SVM plot
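The sketch below illustrates weighted plurality voting over the outputs of several classifiers; the particular weighting rule shown (weights derived from one minus the classification error) is an assumption for illustration only.

```python
# Sketch of weighted plurality voting over T classifier outputs.
import numpy as np

def weighted_plurality_vote(predictions, weights):
    """predictions: (T, N) array of class labels from T classifiers for N samples.
    weights: length-T array of non-negative classifier weights."""
    predictions = np.asarray(predictions)
    classes = np.unique(predictions)
    # Accumulate the total weight voting for each class, per sample.
    scores = np.zeros((len(classes), predictions.shape[1]))
    for t, w in enumerate(weights):
        for c_idx, c in enumerate(classes):
            scores[c_idx] += w * (predictions[t] == c)
    return classes[np.argmax(scores, axis=0)]

# Example: weights proportional to (1 - error) of each classifier (assumed scheme).
errors = np.array([0.10, 0.25, 0.40])
weights = 1.0 - errors
preds = np.array([[0, 1, 2, 2],
                  [0, 1, 1, 2],
                  [1, 1, 2, 0]])
print(weighted_plurality_vote(preds, weights))   # -> [0 1 2 2]
```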

3.3. Feature Subset Selection

Feature selection algorithms attempt to select features which are useful and discard features which are unhelpful or even harmful to learning [32]. Feature subset selection is an important pre-processing phase in machine learning [33]. Sometimes features are removed entirely in this phase, even though a removed feature may become important when used in combination with other features. This disadvantage of feature selection can be removed by utilizing it in ensemble learning, where several combinations of features are selected by some algorithm to form the individual models to be ensembled. Common selection algorithms are exhaustive selection (evaluation of all possible subsets of features), branch and bound selection (evaluation using the branch and bound algorithm), sequential forward selection (SFS) (select the best single feature and then add one feature at a time, choosing the combination that maximizes decision accuracy), sequential backward selection (SBS) (start with all features and remove one feature at a time so as to maximize decision accuracy) and best individual feature selection (evaluate all N features individually and take the best set of features) [34]. SFS is a bottom-up procedure and SBS is a top-down procedure. Exhaustive selection is the ideal approach but is feasible only when the number of attributes is small; otherwise the number of possible combinations grows exponentially and becomes impossible to handle. A sketch of SFS is given below.
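The following is a minimal sketch of SFS using cross-validated SVM accuracy as the selection criterion (an illustrative implementation assuming scikit-learn, not the selection procedure used later in the paper):

```python
# Sketch of sequential forward selection (SFS) with an SVM accuracy criterion.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sequential_forward_selection(X, y, n_select):
    remaining, selected = list(range(X.shape[1])), []
    while len(selected) < n_select:
        # Add the single feature that most improves cross-validated accuracy.
        scores = {f: cross_val_score(SVC(kernel="rbf"), X[:, selected + [f]], y,
                                     cv=5).mean() for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_iris(return_X_y=True)
print("selected feature indices:", sequential_forward_selection(X, y, 2))
```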


3.4. Deep Learning

The deep SVM is inspired by the success of deep neural networks [35], deep belief networks [35] and deep Boltzmann machines [36]. A multilayer perceptron with many hidden layers is an example of deep learning. Deep learning is a class of machine learning techniques that learns multiple levels of representation in deep architectures [37]. Conventional classifiers may get trapped in local optima of the objective function, but deep architectures learn feature representations through both supervised training and fine tuning in the further, deeper phases of learning. The first phase of a deep SVM is the standard training process. In the second phase, the kernel activations of the support vectors from the first phase are used as inputs for another SVM, and so on, up to whatever level of tuning is required [38]. Usually the tuning starts to repeat after 3-4 levels of deep learning. This training procedure is greedy in nature, which makes it computationally very efficient. Ensembling the models from each phase of deep learning further increases the precision of the model. However, despite the fine-tuning stages, the model can still overfit the data points due to the non-linear kernel activation learning.
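The two-phase idea can be sketched as follows (an illustrative reading of the procedure, assuming scikit-learn; not the authors' implementation and with an assumed kernel bandwidth): the RBF kernel activations of the level-1 support vectors become the input features of a level-2 SVM.

```python
# Sketch of a two-level deep SVM: level-1 support-vector kernel activations
# feed a level-2 SVM.
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
gamma = 0.5                                        # assumed kernel bandwidth

svm1 = SVC(kernel="rbf", gamma=gamma).fit(X, y)    # phase 1: standard training
sv = svm1.support_vectors_

X_deep = rbf_kernel(X, sv, gamma=gamma)            # kernel activations as new features
svm2 = SVC(kernel="rbf", gamma=gamma).fit(X_deep, y)   # phase 2: deeper level

print("level-1 training accuracy:", svm1.score(X, y))
print("level-2 training accuracy:", svm2.score(X_deep, y))
```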

3.5. Regularization

The concept of regularization came to prominence in the 1990s. In supervised machine learning problems, accurate prediction is more important than a close fit of the function to the data. Hence generalization is preferred; in other words, overfitting of the function has to be checked. In figure 3 the blue curve is a degree-2 curve, the red curve a degree-4 curve and the green curve a degree-8 curve, the highest of the three. The green curve plots the closest-fitting boundary between the two classes, but its test accuracy decreases; the blue curve shows the lowest training accuracy but has the best chance of good test accuracy. The green curve marks overfitting. Hence overfitting occurs when generalization decreases. Regularization is a measure to check this overfitting and provides problem stability. It restricts the hypothesis space to a linear function or a polynomial of a particular degree according to the scenario, and smoothness of the function is ensured by placing the function in a reproducing kernel Hilbert space (RKHS). A regularization parameter λ associated with the regularization term of the optimization function controls the trade-off between stability and accuracy.

Figure 3. Fitting of classifier on the data set

Page 8: Regularized Weighted Ensemble of Deep Classifiers › papers › ijcsa › V5N3 › 5315ijcsa05.pdf · 2015-10-23 · regularized weighted ensemble of deep support vector machine

International Journal on Computational Science & Applications (IJCSA) Vol.5, No.3, June 2015

54

In the case of ensemble learning, regularization can be applied in the optimization of the loss function. By doing this, the degree of the best-fit polynomial is reduced and the test classification accuracy is improved. Alternatively, overfitting can be dealt with by keeping the degree of the best-fit function constant and regularizing the weights assigned to the individual classifiers participating in the ensemble. This reduces the curvature of each positive or negative depression in the curve without reducing the degree of the whole curve; the loss function is thus modified to provide the boundary fitting over the input feature vectors.

Another statistical technique is bootstrap resampling, in which a new dataset DT' is drawn from the previous dataset DT by random sampling with replacement. Bagging is performed by repeating this over several iterations and then performing ensemble learning on the resulting datasets. For a large DT, the number of individual samples not present in any given bootstrapped dataset is large: the probability that the first training sample is not selected in one draw is (1 - 1/N), and the probability that it is never selected is (1 - 1/N)^N [1]. As N tends to infinity this tends to 1/e ≈ 0.37, so only about 63% of the original training samples are represented in any bootstrapped set. Since bagging reduces variance, it provides an alternative approach to regularization [6]: even if each classifier is individually overfit, they are likely to be overfit to different things.
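A quick numerical check of this bootstrap argument (a standalone sketch, not part of the experiments):

```python
# As N grows, (1 - 1/N)^N -> 1/e ~ 0.368, so roughly 63% of the original
# samples appear in any given bootstrapped set.
import math

for N in (10, 100, 1000, 100000):
    p_never = (1 - 1 / N) ** N
    print(N, round(p_never, 4), "included fraction:", round(1 - p_never, 4))

print("limit 1/e =", round(1 / math.e, 4))
```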

4. PROPOSED WORK

In our work, a regularized ensemble of deep SVM classifiers is used, which shows a marked improvement in the classification accuracy of the prediction problems. For training and optimization we have used the popular library libSVM [40, 41]. The ensemble of deep classifiers is generated using four different frameworks, shown in figures 4-7. Figure 4 shows the ensemble of classifiers based on the feature subset selection framework, where the individual models are formed by training on different feature subsets. Even features which do not contribute well in isolation or in the full combination may work well in some combinations; this framework explores the best possible decisions over feature combinations. Figure 5 shows the ensemble of deep classifiers at level 1, where each individual model is generated by the training in one phase of deep learning; this improves the basic training through the fine tuning of the deep phases. Figure 6 shows the ensemble of deep classifiers at level 2, where fine tuning is done at a further level. Figure 7 combines the ideas of figures 4 and 6, i.e. ensemble of deep classifiers learning with feature subset selection.

Figure 4. Ensemble of classifiers based on feature subset selection framework


Figure 5. Ensemble of deep classifiers level 1 framework

Figure 6. Ensemble of deep classifiers level 2 framework


Figure 7. Ensemble of deep classifiers learning with feature subset selection

For the SVM, the loss function optimized is the hinge loss L(f(x), y) = max(0, 1 - y·f(x)). It has been observed that the regularization technique giving the best accuracy for our proposed work combines the singular value decomposition (SVD) reduced weight matrix, with regularization parameter λ1, and the square of norm 2 of the weight matrix, with regularization parameter λ2. The other regularization factors considered are norm 1, norm 2 and Tikhonov regularization. Combining the hinge loss of the weighted ensemble with this penalty, the objective function takes the form of equation (1):

L(β) = Σi max(0, 1 - yi·Σt βt·ft(xi)) + λ1·SVD(β) + λ2·(||β||2)^2     (1)

where ft is the t-th individual classifier and the weights βt are obtained through regularized majority voting.
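A sketch of evaluating this objective for a given weight vector is shown below (binary labels in {-1, +1}; the exact form of the SVD term and the optimizer used in the paper are not specified, so both are assumptions here):

```python
# Sketch: evaluate the regularized weighted-ensemble objective of equation (1)
# for a fixed weight vector beta (not an optimizer).
import numpy as np

def ensemble_objective(beta, F, y, lam1, lam2):
    """F: (N, T) matrix of individual classifier outputs f_t(x_i); beta: (T,) weights."""
    margins = y * (F @ beta)
    hinge = np.maximum(0.0, 1.0 - margins).sum()            # hinge loss of the ensemble
    # SVD-reduced term, assumed here to be the sum of singular values of beta.
    svd_term = np.linalg.svd(np.atleast_2d(beta), compute_uv=False).sum()
    return hinge + lam1 * svd_term + lam2 * np.sum(beta ** 2)   # + squared norm-2 penalty

F = np.array([[+1, -1, +1], [+1, +1, -1], [-1, -1, +1], [-1, +1, -1]], float)
y = np.array([+1, +1, -1, -1], float)
beta = np.array([0.5, 0.3, 0.2])
print(ensemble_objective(beta, F, y, lam1=0.1, lam2=0.1))
```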

Algorithm 1: Regularized ensemble of classifiers using exhaustive feature subset selection

1: Start
2: Find all possible combinations of features
3: Train an SVM classifier on each combination obtained in step 2
4: Estimate the weights {β1, ..., βt} associated with each individual model through regularized majority voting
5: Evaluate the ensemble classifier model
6: Report the ensemble model, the classification accuracy on the test data set and the weights {β1, ..., βt}
7: End

Algorithm 2: Regularized ensemble of classifiers using best N feature subset selection

1: Start
2: Train an SVM classifier on each individual feature
3: Record the accuracy obtained and the corresponding feature, sorted in descending order of accuracy
4: Train SVM classifiers on Classifierset = {Best N, Best N-1, Best N-2, ..., Best 1}
5: Estimate the weights {β1, ..., βt} associated with each member of Classifierset through the regularized majority voting technique
6: Evaluate the ensemble classifier model
7: Report the ensemble model, the classification accuracy on the test data set and the weights {β1, ..., βt}
8: End

Algorithm 3: Regularized ensemble of deep classifiers

1: Start
2: for level = 1 to t
3:     Train an SVM classifier on data set D and record the model generated in [Model]
4:     Generate a new data set D' from the support vectors of the generated model
5:     D = D'
6: end for
7: Estimate the weights {β1, ..., βt} associated with each member of [Model] through the regularized majority voting technique
8: Evaluate the ensemble classifier model
9: Report the ensemble model, the classification accuracy on the test data set and the weights {β1, ..., βt}
10: End

The regularization parameter λ associated with the regularization term controls the trade-off between stability and accuracy. Many regularization techniques exist and this remains a topic of ongoing research. L1 regularization is the norm 1 regularization factor, which penalizes all the factors equally and focuses on selecting only the relevant factors; its numerical definition is λ1·||β||1. The L1 penalty is linear and tends to produce many points with zero curvature; a disadvantage of this regularizer is slow convergence for large-scale problems. Secondly, the L2 regularizer minimizes curvature at all points of the curve by applying a penalty that scales with the square of the curvature; its numerical definition is λ1·||β||2. The complexity of L2 regularization is greater than that of the L1 regularizer. Thirdly, the Tikhonov regularizer is a special case of L2 regularization, numerically defined by the term (λ1)^2·(||β||2)^2. Finally, the SVD reduced norm 2 regularization is represented as λ1·SVD(β) + λ2·(||β||2)^2. SVD plays multiple roles: it can be viewed as a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items, as a method for identifying and ordering the dimensions along which the data points exhibit the most variation, and as a method for data reduction, finding the best approximation of the original data points using fewer dimensions. The regularization path varies with the experimental conditions.
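For concreteness, the four penalty terms can be computed as in the sketch below (the weight matrix shown is hypothetical, and the exact form of the SVD-reduced term is an assumption based on the description above):

```python
# Illustrative computation of the four regularization penalties for a weight matrix beta.
import numpy as np

beta = np.array([[0.6, 0.1], [0.2, 0.5], [0.1, 0.3]])   # hypothetical weight matrix
lam1, lam2 = 0.2, 0.1

l1       = lam1 * np.abs(beta).sum()                     # norm 1: lambda1*||beta||_1
l2       = lam1 * np.sqrt((beta ** 2).sum())             # norm 2: lambda1*||beta||_2
tikhonov = lam1 ** 2 * (beta ** 2).sum()                 # (lambda1)^2 * (||beta||_2)^2
svd_l2   = lam1 * np.linalg.svd(beta, compute_uv=False).sum() \
           + lam2 * (beta ** 2).sum()                    # SVD-reduced norm 2 (assumed form)

print(l1, l2, tikhonov, svd_l2)
```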

5. EXPERIMENTS

In all the experiments listed below, the SVM classifier is used because it evaluates dot products of vectors in the higher-dimensional space to construct the dividing boundary. The choice of kernel function depends on the model to be plotted: a polynomial kernel allows modelling feature conjunctions up to the order of the polynomial, radial basis functions (RBF) allow plotting circular boundaries in higher dimensions, and a linear kernel gives linear boundaries in higher dimensions. Multiclass classification is best achieved through the RBF kernel. If γ is the kernel bandwidth parameter and (Xi, Xj) are the vectors to be transformed to higher dimensions, the RBF kernel is given by equation (2):

K(Xi, Xj) = exp(-γ·||Xi - Xj||^2)     (2)
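A direct evaluation of this kernel (an illustrative check against scikit-learn's implementation, not the paper's code) is:

```python
# Evaluate the RBF kernel of equation (2) directly and via scikit-learn.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(xi, xj, gamma):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

xi, xj, gamma = np.array([1.0, 2.0]), np.array([2.0, 0.0]), 0.5
print(rbf(xi, xj, gamma))                                                    # manual evaluation
print(rbf_kernel(xi.reshape(1, -1), xj.reshape(1, -1), gamma=gamma)[0, 0])   # same value
```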


Another important algorithm used is grid search for parameter estimation. In v-fold cross-validation, the training set is divided into v subsets of equal size; classifiers are trained on v-1 subsets and tested on the remaining subset. Each instance is thus predicted once, and the cross-validation accuracy is the percentage of data correctly classified. The kernel parameters (C, γ) are estimated using cross-validation: various combinations of (C, γ) are tried and the one with the best cross-validation accuracy is picked. In our experiments, the libSVM library [40, 41] is used for training multi-class SVMs with the RBF kernel. The features in the training and test datasets were scaled to the range [-1, +1]. Ten-fold cross-validation with grid search is used to choose the kernel bandwidth parameter γ and the SVM parameter C, with C ranging over [2^-10, 2^-9, ..., 2^5] and γ over [2^-5, 2^-4, ..., 2^10]. The regularization parameters are searched in the ranges 0 < λ1 < 0.5 and 0 < λ2 < 0.5.
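This parameter search can be sketched as follows (assuming scikit-learn's grid search utilities rather than the libSVM command-line tools; the grid bounds follow the ranges stated above):

```python
# Sketch of 10-fold cross-validated grid search over (C, gamma) for an RBF SVM.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)     # scale to [-1, +1]

param_grid = {"C": 2.0 ** np.arange(-10, 6),                  # 2^-10 ... 2^5
              "gamma": 2.0 ** np.arange(-5, 11)}              # 2^-5  ... 2^10
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```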

Five cases of experiment are described below; results of the bagging technique are listed in Table 8 for comparison.

Case 1: Bagging Ensemble of classifiers

Case 2: Ensemble of classifiers based on feature subset selection

Case 3: Ensemble of classifiers in deep learning level 1

Case 4: Ensemble of classifiers in deep learning level 2

Case 5: Ensemble of classifiers in deep learning level 1 with the feature subset selection

Cases 2,3,4,5 have subcases for the following regularization schemes:

Setting 1: SVD reduced Norm 2 regularization

Setting 2: Norm 1 regularization

Setting 3: Norm 2 regularization

Setting 4: Tikhonov regularization

Table 8. Results of the bagging ensemble of classifiers on all three data sets

S. No. | Dataset             | Classification accuracy (%) with bagging ensemble
1      | IRIS data set       | 96.66
2      | Ionosphere data set | 87.87
3      | Seed data set       | 95.38

For feature subset selection, the IRIS data set uses Algorithm 1, i.e. exhaustive feature subset selection, which is the most thorough selection algorithm. For the Ionosphere and Seed data sets the number of features is much larger, so finding all possible attribute combinations is very lengthy and computationally complex; hence both use Algorithm 2, i.e. best N feature subset selection. Figure 8 presents the classification accuracy results of the experiments on the IRIS dataset, and figures 9 and 10 show 2D and 3D scatter plots in which different colours mark the different class vectors. Similarly, figures 11, 12 and 13 are the corresponding results on the Ionosphere dataset, and figures 14, 15 and 16 are the corresponding results on the Seed dataset.


Figure 8. Results of experiments on IRIS Dataset

Figure 9. 2D scatter plots between all pairs of attributes in the IRIS dataset

Figure 10. 3D scatter plots between all pairs of attributes in the IRIS dataset


Figure 11. Results of experiment on Ionosphere dataset

Figure 12. 2D Scatter plot on the best set of features in Ionosphere dataset.

Figure 13. 3D Scatter plot on the best set of features in Ionosphere dataset


Figure 14. Results of experiment on seed dataset

Figure 15. 2D Scatter plot on the best set of features in Seed dataset.


Figure 16. 3D Scatter plot on the best set of features in Seed dataset.

6. OBSERVATION

The results of all three sets of experiments show improved classification accuracy over the major previously reported results in the case of the ensemble of deep classifiers at level 2 with the SVD reduced norm 2 regularization, which reaches nearly 99%. The time taken in this particular case for the various datasets is reported in Table 9; note that the time taken for the Ionosphere data is comparatively larger than for the other two datasets because of its comparatively large number of features. Deep learning on the complete dataset generates better results than deep learning on the feature subset selection schemes, because fine tuning in the presence of all the features works better than fine tuning on feature subsets. The norm 1 penalty removes many noisy features by estimating their coefficients as exactly zero, since it is not differentiable at zero, whereas the norm 2 penalty uses all the input features in classification because it is differentiable at all points. Hence norm 2 regularization achieves higher-order smoothness for curve estimation.
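This L1-versus-L2 behaviour can be illustrated generically (linear models are used here only to make the effect easy to see; this is not the setup of the paper):

```python
# An L1 penalty drives many coefficients exactly to zero; an L2 penalty shrinks
# but keeps them all.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)        # only 2 informative features

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("L1 non-zero coefficients:", np.sum(lasso.coef_ != 0))  # few survive
print("L2 non-zero coefficients:", np.sum(ridge.coef_ != 0))  # all 20 remain
```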

Table 9. Time taken by deep learning level 2 with full-feature-set ensemble learning and SVD reduced norm 2 regularization

S. No. | Dataset             | Time (sec)
1      | IRIS data set       | 16.37
2      | Ionosphere data set | 123.78
3      | Seed data set       | 32.78

Next, since the bagging model includes only about 63% of the original training samples in any bootstrapped set (as discussed in section 3.5), the regularization provided by this technique is not as smooth as that of the ensemble of deep classifiers. The regularizers applied above can also be analysed in terms of worst-case time complexity. Norm 1 regularization performs a total of (t-1) sum operations at run time, giving time complexity O(t), and a degree-one regularization parameter is applied. Norm 2 regularization performs (t-1) sum operations, t squaring operations and 1 square root operation, giving time complexity O(3t). Tikhonov regularization has the same O(3t) complexity as L2 regularization but applies a degree-two regularization parameter. The (SVD + norm 2) regularization involves two expressions, O(t^2) for the SVD computation added to O(3t) for the norm 2 computation; hence its time complexity is O(t^2).

7. CONCLUSION

The deep learning approach for improving classification accuracy is very prevalent in the artificial neural network field, while the deep SVM classifier is still an emerging concept. The experiments here demonstrate good scope for deep learning with SVM classifiers, and regularization of the deep ensemble gives a further improvement in classification accuracy. Many other regularization techniques could be applied for comparison and possibly better results, and other feature selection strategies such as SFS and SBS could also be used for feature subset selection.

REFERENCES

[1] Rob Schapire, "Theoretical Machine Learning", COS 511, Lecture 1, p. 1-6, 2008.
[2] R. Sathya, Annamma Abraham, "Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification", IJARAI, Vol 2, No. 2, 2013.
[3] D. Michie, D.J. Spiegelhalter, C.C. Taylor, "Machine Learning, Neural and Statistical Classification", Tutorial section 2.1, p. 6-16, 1994.
[4] S. B. Kotsiantis, "Supervised Machine Learning: A Review of Classification Techniques", Informatica, Vol 31, p. 249-268, 2007.
[5] Koby Crammer, Yoram Singer, "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines", JMLR 2, p. 256-295, 2001.
[6] Hal Daume III, "A Course in Machine Learning", Ensemble learning, CIML V0.8, Ch 1, p. 148-155, 2012.
[7] A. Vergara, Shankar Vembu, Tuba Ayhan, Margaret A. Ryan, Margie L. Homer, Ramon Huerta, "Chemical gas sensor drift compensation using classifier ensembles", Sensors and Actuators B, 166-167, 2012.
[8] Fisher's IRIS dataset, UCI repository, https://archive.ics.uci.edu/ml/datasets/Iris, 1988.
[9] Vaishali Arya, R.K. Rathy, "An Efficient Neuro-Fuzzy Approach for Classification of Iris Dataset", International Conference on Reliability, Optimization and Information Technology, p. 161-165, 2014.
[10] Xiaoyang Fu, Shuqing Zhang, "Evolving Neural Network Ensembles Using Variable String Genetic Algorithm for Pattern Classification", Sixth International Conference on Computational Intelligence, p. 81-85, 2013.
[11] Anshu Bharadwaj, Sonajharia Minz, "Hybrid Approach for Classification using Support Vector Machine and Decision Tree", International Conference on Advances in Electronics, Electrical and Computer Science Engineering, p. 337-341, 2012.
[12] Hamid Parvin, Sajad Parvin, "Robust Classifier Ensemble for Improving the Performance of Classification", Eleventh Mexican International Conference on Artificial Intelligence, IEEE special session, Vol 11, p. 52-57, 2012.
[13] Xue-Fang Chen, Hong-Jie Xing, Xi-Zhao Wang, "A modified AdaBoost method for one-class SVM and its application to novelty detection", IEEE, Vol 11, p. 3506-3511, 2011.
[14] Hakan Cevikalp, Bill Triggs, Hasan Serhan Yavuz, Yalc, Mahide, Atalay Barkana, "Large margin classifiers based on affine hulls", Elsevier, Vol 73, p. 3160-3168, 2010.
[15] A. Marcano-Cedeño, J. Quintanilla-Domínguez, M.G. Cortina-Januchs, D. Andina, "Feature Selection Using Sequential Forward Selection and Classification Applying Artificial Metaplasticity Neural Network", IEEE, No. 36, p. 2845-2850, 2010.
[16] Narendra S. Chaudhari, Aruna Tiwari, Jaya Thomas, "Performance Evaluation of SVM Based Semi-supervised Classification Algorithm", 10th Intl. Conf. on Control, Automation, Robotics and Vision, No. 10, p. 1942-1947, 2008.
[17] Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim, Sung Yang Bang, "Constructing support vector machine ensemble", Pattern Recognition, Vol 36, p. 2757-2767, 2003.
[18] Vince Sigillito, Ionosphere Dataset, UCI repository, https://archive.ics.uci.edu/ml/datasets/Ionosphere, 1989.
[19] Xiaodong Zeng, Derek F. Wong, Lidia S. Chao, "Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure", The Scientific World Journal, Volume 2014, Article No. 961747, p. 1-12, 2014.
[20] Shasha Mao, Licheng Jiao, Lin Xiong, Shuiping Gou, Bo Chen, Sai-Kit Yeung, "Weighted classifier ensemble based on quadratic form", Elsevier, Vol 48, Issue 5, p. 1688-1706, 2014.
[21] Darwin Tay, Chueh Loo Poh, Richard I. Kitney, "An Evolutionary Data-Conscious Artificial Immune Recognition System", Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, p. 1101-1108, 2013.
[22] Eitan Menahem, Lior Rokach, Yuval Elovici, "Combining One-Class Classifiers via Meta Learning", Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, p. 2435-2440, 2013.
[23] Nicola Segata, Enrico Blanzieri, "Fast and Scalable Local Kernel Machines", JMLR, Vol 11, p. 1883-1926, 2010.
[24] Vlado Menkovski, Ioannis T. Christou, Sofoklis Efremidis, "Oblique Decision Trees Using Embedded Support Vector Machines in Classifier Ensembles", Vol 11, p. 1-6, 2008.
[25] Hsuan-Tien Lin, Ling Li, "Support Vector Machinery for Infinite Ensemble Learning", JMLR, Vol 9, p. 285-312, 2008.
[26] Albert Hung-Ren Ko, Robert Sabourin, Alceu de Souza Britto, "Evolving Ensemble of Classifiers in Random Subspace", Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, p. 1473-1480, 2006.
[27] Gorzata's Seed Data set, UCI repository, https://archive.ics.uci.edu/ml/datasets/seeds, 2012.
[28] M. Charytanowicz, J. Niewczas, P. Kulczycki, P.A. Kowalski, S. Lukasik, S. Zak, "A Complete Gradient Clustering Algorithm for Feature Analysis of X-ray Images", Information Technology in Biomedicine, Springer-Verlag, p. 15-24, 2010.
[29] Gavin Brown, Encyclopaedia of Machine Learning, Vol 1, p. 312-320, 2010.
[30] Robi Polikar, "Ensemble based systems in decision making", IEEE, Vol 6, Issue 3, p. 21-45.
[31] Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim, Sung-Yang Bang, "Support Vector Machine Ensemble with Bagging", Springer, LNCS 2388, p. 397-408, 2002.
[32] David W. Opitz, "Feature Selection for Ensembles", American Association for Artificial Intelligence, AAAI Proceedings No. 99, p. 1-6, 1999.
[33] Mohamed A. Aly, "Novel Methods for the Feature Subset Ensembles Approach", International Journal of Artificial Intelligence and Machine Learning, Vol 6, No. 4, p. 1-7, 2006.
[34] Anil K. Jain, Robert P.W. Duin, Jianchang Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 22, Issue 1, p. 4-37, 2000.
[35] Dong Yu, Li Deng, "Deep Learning and Its Applications to Signal and Information Processing", IEEE Signal Processing Magazine, Vol 28, Issue 1, p. 145-154, 2011.
[36] Nitish Srivastava, Ruslan Salakhutdinov, "Multimodal Learning with Deep Boltzmann Machines", ICML, 25th Annual Conference on Learning Theory, No. 25, p. 1-9, 2012.
[37] Xue-Wen Chen, Xiaotong Lin, "Big Data Deep Learning: Challenges and Perspectives", IEEE Access, Vol 2, p. 514-525, 2014.
[38] Azizi Abdullah, Remco C. Veltkamp, Marco A. Wiering, "An Ensemble of Deep Support Vector Machines for Image Categorization", International Conference of Soft Computing and Pattern Recognition, p. 301-306, 2009.
[39] Hal Daume III, "From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less", p. 1-12, 2004.
[40] C.-C. Chang, C.-J. Lin, "LIBSVM: A Library for Support Vector Machines", Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm, 2001.
[41] Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, "A Practical Guide to Support Vector Classification", Department of Computer Science, National Taiwan University, Taipei 106, Taiwan, p. 1-16, 2003.


Authors

Ms. Shruti Asmita (B.Tech., 2013, KEC Ghaziabad, Uttar Pradesh Technical University, Lucknow) is an M.Tech. Computer Science scholar at Banasthali University, Jaipur, and is pursuing her research internship at IIT-BHU (CSE), Varanasi. Her research interests include data mining, image processing, machine learning and sensor networks.

Dr. K.K. Shukla (Ph.D., 1993, Institute of Technology (BHU), Varanasi) is professor and current head of department at the Indian Institute of Technology, Banaras Hindu University, Varanasi, India. He received his B.Tech. from APSU, Rewa in 1980, M.Tech. from IT (BHU) in 1982 and Ph.D. from IT (BHU) in 1993. He has 30 years of research and teaching experience, more than 120 research papers in reputed journals and conferences, and more than 90 citations. His current research collaborations in India include ISRO and TCS, and his international collaborations include INRIA, France and ETS, Canada. He has authored several popular books on neuro-computers, RTS scheduling, fuzzy modelling and image compression. His research interests include image processing and pattern recognition, fuzzy logic, wireless sensor networks and machine learning.