
International Journal of Mechanical & Mechatronics Engineering IJMME-IJENS Vol:15 No:03 1

150603-7474-IJMME-IJENS © June 2015 IJENS I J E N S

Bearing fault diagnosis using CWT, BGA and Artificial Bee Colony Algorithm

S. Devendiran1, K. Manivannan2, C. Rajeswari3, Joshua Michael Amarnath4 & Apoorv Prasad5

1, 2, &4 School of Mechanical and Building sciences ,VIT University, Vellore, India

3&5 School of Information Technology & Engineering, VIT University, Vellore, India

1 [email protected]

[email protected] 3 [email protected]

4joshua [email protected]

[email protected]

Abstract– Health diagnosis of bearings is essential to reduce breakdowns of rotating machinery. An intelligent method to diagnose bearing faults using vibration signals is proposed. This paper applies a binary genetic algorithm (BGA), a well-known feature selection approach, to select the predominant features, and discusses the role of fitness functions in feature selection by applying different fitness functions in the GA. Vibration signals for various bearing conditions are acquired from a test rig, and statistical features are extracted from the wavelet coefficients obtained by the continuous wavelet transform (CWT). A new heuristic classifier, the artificial bee colony (ABC) algorithm, is applied, and its fault diagnosis results are compared with those of a learning vector quantization (LVQ) classifier on the basis of classification accuracy.

Index Terms-- Test rig, fault diagnosis, continuous wavelet decomposition, statistical features, binary genetic algorithm (BGA), artificial bee colony (ABC) algorithm, learning vector quantization (LVQ).

I. INTRODUCTION

The dynamic performance of rotating components strongly influences the efficiency of any rotating machine. This is particularly true of bearings, which are considered the heart of rotating machinery. Accurate diagnosis of rolling bearing faults can reduce or prevent accidents, since the consequences of a rolling bearing breakdown can be serious [1]. Fault-finding methodologies for rolling bearings have gained importance in protecting machinery from failure [2], and it is equally important to know the nature and severity of a bearing fault in order to select the most appropriate maintenance action. In bearing health diagnostics, vibration analysis is the most preferred, reliable and popular technique [3-4]. Many signal processing techniques have been developed and applied to machine diagnosis in this research area. They include conventional techniques such as spectral analysis [5-6]. Some researchers applied the FFT to the intrinsic mode functions of the Hilbert–Huang transform, which provides multi-resolution over various frequency scales and accounts for the frequency content of the signal and its variation, in bearing health diagnosis [7]. A few researchers compared the Fourier Transform (FT), Windowed Fourier Transform (WFT) and Wavelet Transform (WT) methods for phase calculation and found that the WT and WFT were more appropriate than the FT for calculation at discontinuities [8]. The wavelet transform is applied to find changes in the vibration signals obtained from monitored bearings. In particular, the wavelet and envelope detection (ED) methods were deployed in fault diagnosis of rolling element bearings; the results showed that both methods are effective in finding the bearing fault, but the wavelet method is less time-consuming [9]. Continuous wavelet transforms (CWT) are often used to find singularity points in output signals sampled from machines and, for fault diagnostics, wavelet transform modulus maxima are used to detect abrupt changes in the vibration signals obtained from operating bearings [10]. A specific method, singularity analysis across all scales of the continuous wavelet transform, is performed to identify the location (in time) of defect-induced bursts in the vibration signals. The one-dimensional CWT does not reduce the dimensionality of the raw data, but it preserves features that the DWT misses and is therefore complementary to the DWT; simulation results showed more distinctive fault signatures in the wavelet decomposition coefficients than in the actual signal [11].

In recent years, intelligent fault diagnosis of rolling bearings based on statistical features extracted from time- and frequency-domain vibration signals has received significant attention, because these features contain significant information about component failure [12-13]. Over the last decade, rolling-element bearing health monitoring has been an important research topic in the pattern recognition domain. However, most studies have focused on fault type classification based on acquired vibration fault samples using classifiers such as support vector machines and neural networks [14-15]. Researchers [16] prefer to classify data using a selected subset of features, with the aim of reducing the dimensionality of the data without compromising the useful information and, consequently, reducing computation time. The aim of the present work is to develop a new automatic monitoring and diagnosis procedure that detects the condition of rolling-element bearings at an early stage with less computation time and higher classification accuracy. A traditional frequency spectrum analysis of the normal and faulty signals, interpreted manually by means of a knowledge-based system, is also presented to diagnose whether the defect is in the inner race, outer race or rolling element. Combining this with an automated procedure that uses CWT-based feature extraction, a GA for optimum feature selection and a comparison of results from the ABC and LVQ classifiers has not yet been attempted. This paper is one such attempt to apply the above-mentioned methods to bearing fault diagnosis.

II. EXPERIMENTAL STUDIES

An experimental setup was designed to perform the tests that validate the proposed methodology (Fig. 1). The experimental setup shown in Fig. 2 consists of a variable frequency drive (VFD), a three-phase 0.5 hp AC motor, a bearing, a belt drive, a gearbox and a brake drum dynamometer with a weighing scale. A standard deep groove ball bearing (No. 6001) is used in this experiment. A tri-axial accelerometer (vibration sensor) is fixed on the bearing block to measure the vibration signals. A 24-bit ATA0824DAQ51 data acquisition system was used, and the signals were collected at a sampling frequency of 12800 Hz. The bearing was driven by the motor at a constant rotating speed of 1700 r/min. A constant load was applied by the brake drum dynamometer, and the speed was monitored by a tachometer. The bearing conditions (normal, outer race fault, inner race fault and ball fault, each fault being a 1 mm deep crack) are depicted in Fig. 2. These artificial faults were created using the electric discharge machining process. For each bearing condition, experiments were carried out to acquire time-domain vibration signals for the prescribed load and speed conditions. The FFT (Fast Fourier Transform) is the most common transformation technique in health monitoring because fault conditions are easy to interpret from it, but the WT is more appropriate than the FFT for calculation at discontinuities. The WT transfers the data from the time domain to the time-frequency domain.

Fig. 1. Flow chart for automated bearing condition diagnosis

Fig. 2 shows the test rig together with the normal, outer race fault, inner race fault and ball fault (1 mm crack depth) bearing conditions; the faults were formed using the wire-cut EDM process. Vibration signals were acquired for all bearing conditions. The number of samples for each bearing condition is given in Table I. Sample time-domain signals for the four bearing conditions are illustrated in Fig. 3. The transformation technique commonly used in health monitoring is the Fast Fourier Transform (FFT), which transforms time series data to the frequency domain by decomposing the sampled signal into sine and cosine waves. The FFT is executed on sample signals for all bearing states. The bearing can be diagnosed by analysing abnormal amplitudes in the frequency domain. The frequency of the abnormal vibration is called the fault frequency and is determined by the fault location. The following equations give the fault characteristic frequencies for the different parts of the bearing. The characteristic bearing frequencies are BPFO (Ball Pass Frequency Outer Race), BPFI (Ball Pass Frequency Inner Race), FTF (Fundamental Train Frequency) and BSF (Ball Spin Frequency). These characteristic frequencies are useful for finding defects in the bearing components from the frequency of the component concerned and its harmonics.

Frequency analysis is perhaps the most fundamental approach to bearing condition monitoring and fault detection. Traditionally, locating those frequencies and measuring the amplitude variations at each frequency, its sidebands and its harmonics gives information on the health condition of the bearing (shown in Fig. 4 and Fig. 5).

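Shaft rotational frequency:

$$F_s\ (\mathrm{Hz}) = \frac{\text{Shaft speed}}{60} \tag{1}$$

Ball passing frequency outer race (BPFO):

$$\mathrm{BPFO} = \frac{N_b}{2}\, F_s \left(1 - \frac{B_d}{P_d}\cos\phi\right) \tag{2}$$

Ball passing frequency inner race (BPFI):

$$\mathrm{BPFI} = \frac{N_b}{2}\, F_s \left(1 + \frac{B_d}{P_d}\cos\phi\right) \tag{3}$$

Fundamental train frequency (FTF):

$$\mathrm{FTF} = \frac{F_s}{2} \left(1 - \frac{B_d}{P_d}\cos\phi\right) \tag{4}$$

Ball spin frequency (BSF):

$$\mathrm{BSF} = \frac{P_d}{2 B_d}\, F_s \left(1 - \frac{B_d^{2}}{P_d^{2}}\cos^{2}\phi\right) \tag{5}$$

where $N_b$ is the number of balls, $B_d$ the ball diameter, $P_d$ the pitch diameter and $\phi$ the contact angle.

[Fig. 1 flow chart steps: raw signal extraction; continuous wavelet transform; feature extraction; feature selection by GA; ABC/LVQ classification; classification comparison.]

Fig. 2. Test rig (experimental set up) and different bearing conditions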
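Equations (1) to (5) can be evaluated directly once the bearing geometry is known. The following is a minimal sketch in Python, not the authors' code: the function name `bearing_frequencies` and the geometry values (number of balls, ball diameter, pitch diameter, contact angle) are hypothetical placeholders, since the paper does not list the dimensions of the 6001 bearing. Only the shaft speed of 1700 r/min is taken from the text.

```python
import math

def bearing_frequencies(shaft_rpm, n_balls, ball_dia, pitch_dia, contact_angle_deg=0.0):
    """Return the characteristic bearing frequencies (Hz) of Eqs. (1)-(5)."""
    fs = shaft_rpm / 60.0  # Eq. (1): shaft rotational frequency
    ratio = (ball_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return {
        "Fs": fs,
        "BPFO": 0.5 * n_balls * fs * (1.0 - ratio),                      # Eq. (2)
        "BPFI": 0.5 * n_balls * fs * (1.0 + ratio),                      # Eq. (3)
        "FTF": 0.5 * fs * (1.0 - ratio),                                 # Eq. (4)
        "BSF": 0.5 * (pitch_dia / ball_dia) * fs * (1.0 - ratio ** 2),   # Eq. (5)
    }

# Hypothetical geometry, NOT taken from the paper: 8 balls, 4.76 mm balls, 20 mm pitch.
freqs = bearing_frequencies(shaft_rpm=1700, n_balls=8, ball_dia=4.76, pitch_dia=20.0)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

A quick sanity check on any such implementation is that BPFO + BPFI equals the number of balls times the shaft frequency, which follows from adding Eqs. (2) and (3).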



Fig. 3. Time domain raw signals (a) Normal (b)Outer race fault (c) Ball fault (d)Inner race fault

Fig. 4. Frequency spectrum signals with harmonic variations in characteristic frequencies (a) normal (b) outer race fault (c) ball fault (d) Inner race fault

In the proposed work, the wavelet transform is used as a dimensionality reduction function for the raw data. Features are extracted from the wavelet-transformed data. These features form a transformed space that is used as the input to the next processes, feature selection and then classification. A limitation of the FFT is that it cannot capture the non-stationary transient information in the samples, which is why this paper focuses on the wavelet transform.



TABLE I

Sample structure used for training and testing of proposed

Fig. 5. Frequency spectrum signals (a) normal (b) outer race fault (c) ball fault (d) Inner race fault

III. THEORETICAL BACKGROUND OF WAVELET TRANSFORM

The Wavelet Transform (WT) is a time-frequency decomposition of a sampled signal into "wavelet" basis functions. Wavelet analysis is widely used for decomposing, de-noising and analysing non-stationary signals. At high frequencies the WT gives good time resolution and poor frequency resolution, while at low frequencies it gives good frequency resolution and poor time resolution. Wavelet analysis proceeds by breaking up a signal into shifted and scaled versions of its mother (or original) wavelet, that is, obtaining one high-frequency term from each level and one low-frequency residual from the last level of decomposition. In other words, decomposition is the process of breaking a signal into lower-resolution components level by level. Two categories of transform are widely used in wavelet analysis: the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT). The continuous wavelet transform can create a time-frequency representation of the signal with very good time and frequency localization. This sets the wavelet transform apart from the Fourier Transform, whose result is an accumulation of higher-frequency sine waves spread throughout the frequency axis.

| Bearing condition | Number of samples for training | Number of samples for testing |
|---|---|---|
| Normal | 100 | 20 |
| Inner race fault | 100 | 20 |
| Outer race fault | 100 | 20 |
| Ball fault | 100 | 20 |



TABLE II

The CWT is widely used to divide a continuous-time function into wavelets. The continuous wavelet transform of a time function z(t) is defined as:

$$CWT(x, y) = \int_{-\infty}^{\infty} z(t)\, \psi^{*}_{(x,y)}(t)\, dt \tag{6}$$

where $\psi_{(x,y)}(t)$, generated from the mother (original) wavelet $\psi(t)$, is a continuous function in both the time domain and the frequency domain, and * denotes the complex conjugate. Expanding $\psi_{(x,y)}(t)$ gives

$$\psi_{(x,y)}(t) = \frac{1}{\sqrt{\lvert x \rvert}}\, \psi\!\left(\frac{t - y}{x}\right), \qquad x, y \in \mathbb{R},\ x \neq 0 \tag{7}$$

In general, the mother wavelet serves as the source function from which the translated and scaled daughter wavelets are generated. As given in equation (7), the transformed signal CWT(x, y) is defined on the x-y plane, where x and y change the frequency (scale) and the time location of the wavelet, respectively. When high-frequency resolution is required, decreasing x produces a high-frequency wavelet, and vice versa. As y increases, the wavelet traverses the length of the input signal, and the coefficients increase or decrease in response to changes in the local time and frequency content of the signal. The acquired signals are decomposed using the continuous wavelet transform, and the resulting coefficients are then used for extracting various statistical features. The transform coefficients are a measure of similarity between the raw signal and the daughter wavelets [17].

The Morlet wavelet has equal octave intervals and led to the first formalization of the continuous wavelet transform. It is a cosine function that decays exponentially at both ends (Fig. 4); it looks like an impulse function modulated with a cosine function. The Morlet wavelet is more suitable where variations are found in abnormal stationary signals.

Fig. 4. Morlet wavelet
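To make Eqs. (6) and (7) concrete, the sketch below evaluates the CWT at a single scale by direct summation. It is an illustration under stated assumptions rather than the paper's Matlab implementation: the real-valued Morlet form `exp(-t^2/2) * cos(5t)` is one common definition and is assumed here, and the function names, the `support` cut-off and the use of sample units for the scale are all hypothetical choices.

```python
import math

def morlet(t):
    """Real-valued Morlet wavelet exp(-t^2/2) * cos(5t) (an assumed, common form)."""
    return math.exp(-0.5 * t * t) * math.cos(5.0 * t)

def cwt_single_scale(signal, scale, dt=1.0, support=4.0):
    """Approximate CWT(x=scale, y=b*dt) for every sample index b, per Eqs. (6)-(7)."""
    n = len(signal)
    half = int(math.ceil(support * scale / dt))  # wavelet is negligible beyond this
    norm = dt / math.sqrt(abs(scale))            # 1/sqrt(|x|) times the integration step
    # Sample the scaled wavelet psi((t - y) / x) once; it is real, so psi* = psi.
    kernel = [morlet(k * dt / scale) for k in range(-half, half + 1)]
    coeffs = []
    for b in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            i = b + k
            if 0 <= i < n:
                acc += signal[i] * kernel[k + half]
        coeffs.append(norm * acc)
    return coeffs

# A unit impulse at sample 100 stands in for a defect-induced burst.
impulse = [0.0] * 200
impulse[100] = 1.0
coeffs = cwt_single_scale(impulse, scale=8)
peak_index = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
print(peak_index, round(coeffs[peak_index], 5))
```

For the impulse input the largest coefficient sits at the impulse location, which is the behaviour the text relies on when it says the coefficients measure similarity between the signal and the daughter wavelet.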

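| Feature | Equation | Definition |
|---|---|---|
| Mean | $k_{mean} = \frac{1}{n}\sum_{i=1}^{n} k_i$ | Average of all values in the population |
| Standard deviation | $k_{Sd} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(k_i - k_{mean}\right)^{2}}$ | Square root of an unbiased estimator of the variance of the population |
| Kurtosis | $k_{kur} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(k_i - k_{mean}\right)^{4}}{k_{Sd}^{4}}$ | Fourth central moment of the values, divided by the fourth power of the standard deviation |
| Root mean square | $k_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} k_i^{2}}$ | Square root of the mean of the squared values |
| Variance | $k_{var} = \frac{\sum_{i=1}^{n}\left(k_i - k_{mean}\right)^{2}}{n-1}$ | Measures how far a set of numbers is spread out |
| Peak to RMS | $k_{peak.to.rms} = \frac{k_h}{k_{rms}}$ | Ratio of the largest absolute value to the root mean squared value |
| Peak to peak | $k_{p2p} = k_h - k_l$ | Difference between the largest and smallest values |
| Skewness | $k_{skew} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(k_i - k_{mean}\right)^{3}}{k_{Sd}^{3}}$ | Third central moment of the values, divided by the cube of the standard deviation |
| Minimum | $k_{min} = \min(k_i)$ | Minimum value in the set |
| Maximum | $k_{max} = \max(k_i)$ | Maximum value in the set |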
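The ten features of Table II can be computed from a list of wavelet coefficients in a few lines. The sketch below follows the definitions in the table (unbiased variance, moments normalised by powers of the standard deviation); the function name `statistical_features` and the dictionary keys are illustrative choices, and the toy input is not data from the paper.

```python
import math

def statistical_features(k):
    """Compute the ten Table II features for a sequence of coefficients k."""
    n = len(k)
    mean = sum(k) / n
    variance = sum((v - mean) ** 2 for v in k) / (n - 1)   # unbiased estimator
    sd = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in k) / n)
    largest, smallest = max(k), min(k)
    peak = max(abs(v) for v in k)                           # largest absolute value
    return {
        "mean": mean,
        "standard_deviation": sd,
        "kurtosis": (sum((v - mean) ** 4 for v in k) / n) / sd ** 4,
        "rms": rms,
        "variance": variance,
        "peak_to_rms": peak / rms,
        "peak_to_peak": largest - smallest,
        "skewness": (sum((v - mean) ** 3 for v in k) / n) / sd ** 3,
        "minimum": smallest,
        "maximum": largest,
    }

features = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in features.items():
    print(f"{name}: {value:.5f}")
```

Each signal then contributes one ten-element feature vector, which is the input format assumed by the feature selection and classification stages.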

http://en.wikipedia.org/wiki/Continuous_wavelet_transform



After a series of trial experiments, it was found that a scale factor of 8 is more effective than all other scales. The Morlet wavelet transform is given by equation (8).

( )

√ (

)

(8)

: central frequency of mother wavelet

Figure 3 depicts the raw signals and the corresponding Morlet continuous wavelet transform coefficients for the four types of signal: normal bearing, inner race fault, outer race fault and ball fault. Comparing the raw and transformed signals shows that the anomalies are more distinct in the transformed signal, which helps increase classifier accuracy.

Statistical features such as the mean and standard deviation (listed in Table II) are extracted from the coefficients, and the calculated feature sets (shown in Table III) are used as input for feature selection and classification.

TABLE III

Sample feature values extracted of training data (before normalization) from 1D Morlet-wavelet decomposition for 3 different conditions of bearing fault

IV. GENETIC ALGORITHM

The Genetic Algorithm (GA) is a heuristic search technique that imitates the process of natural selection. It is routinely used to generate useful solutions to search and optimization problems. The algorithm produces an optimal solution based on the Darwinian principle of 'survival of the fittest' through a series of iterative calculations. The GA begins with a set of chromosomes called the population. After initializing the population randomly, the GA evaluates the fitness of each individual. The algorithm then generates successive generations of the population in order to obtain an optimal solution. Successive generations are created using operators inspired by natural evolution, such as mutation and crossover. At each generation a fitness function is used to evaluate the candidate solutions. The crossover and mutation operations are the main parameters that affect the fitness value. Every successive generation is the product of mutation or crossover of the previous generation, and the candidates are chosen according to their fitness value. The candidates (chromosomes) with the best fitness have a higher probability of being selected for reproduction. Thus, after successive generations, the best candidate or solution is obtained [18].

In this paper the genetic algorithm is used to select the best features, which are then used as the input to the classifier. The initial population therefore consists of randomly selected features, and each candidate represents a feature among the total of ten.

The objectives of the GA are:

Establish the least within-class distance

Establish the maximum between-class distance

These two objectives are applied to each feature and the best features are selected. Finding the feature with the minimum within-class distance ensures that the samples lie at the least distance from the centre of their respective class. This gives better resemblance among the samples within each class and improves the chances of correct classification [19]. The within-class distance is defined as:

$$D_c = \frac{1}{n_c}\sum_{j=1}^{n_c}\left(S_j^{c} - M_c\right)^{T}\left(S_j^{c} - M_c\right) \tag{9}$$

where c = 1, 2, 3, 4 is the class, $S_j^{c}$ (j = 1, 2, 3, ..., $n_c$) is the j-th sample of class c and $M_c$ is the mean vector of class c.

Having a maximum between-class distance ensures that the particular feature gives better divergence from the other classes, thereby increasing classification accuracy.

$$D_b = \sum_{c=1}^{4} n_c\left(M_c - M\right)^{T}\left(M_c - M\right) \tag{10}$$

where M is the mean vector of all the classes.

The cost function, or fitness function, defined for the GA is given by equation (11).

| Feature | Normal, sample-1 | Normal, sample-2 | Inner-race fault, sample-1 | Inner-race fault, sample-2 | Outer-race fault, sample-1 | Outer-race fault, sample-2 | Ball fault, sample-1 | Ball fault, sample-2 |
|---|---|---|---|---|---|---|---|---|
| Mean | -0.0003 | 0.00051 | -7.00E-05 | 0.00034 | -0.00049 | 0.00036 | -0.00029 | 2.00E-05 |
| Standard deviation | 0.1755 | 0.16975 | 0.12791 | 0.13382 | 0.10916 | 0.11237 | 0.20063 | 0.23035 |
| Kurtosis | 2.49133 | 2.38996 | 2.3897 | 3.05922 | 4.16713 | 3.72449 | 7.08792 | 6.79926 |
| RMS | 0.17541 | 0.16967 | 0.12784 | 0.13375 | 0.10911 | 0.11231 | 0.20053 | 0.23023 |
| Variance | 0.0308 | 0.02882 | 0.01636 | 0.01791 | 0.01192 | 0.01263 | 0.04025 | 0.05306 |
| Peak2rms | 2.58228 | 2.53587 | 2.67953 | 3.54373 | 4.28417 | 3.72643 | 4.95698 | 5.15222 |
| Peak2peak | 0.89626 | 0.85445 | 0.68458 | 0.91314 | 0.89701 | 0.81399 | 1.97231 | 2.29468 |
| Skewness | 0.00247 | -0.00595 | -0.0057 | -0.00236 | 0.02021 | 0.01894 | 0.02296 | 0.02338 |
| Minimum | -0.45296 | -0.4242 | -0.34256 | -0.47398 | -0.46744 | -0.39547 | -0.99404 | -1.10846 |
| Maximum | -0.45296 | -0.4242 | 0.34201 | 0.43916 | -0.46744 | -0.39547 | 0.97827 | 1.18622 |



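(11)

In equation (11), the cost D combines the within-class distance $D_c$ with the reciprocal of the between-class distance $D_b$. The candidate (chromosome) that best minimizes this function is chosen and given as the classifier input, and the classifier accuracy for various combinations of features is compared. The procedure and pseudo code of the GA are given in Fig. 6(a) and 6(b).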
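The two distances that drive the GA fitness can be computed for any candidate feature subset as sketched below. This is an illustration under assumptions, not the authors' code: the within-class distance is summed over the classes, the between-class distance is weighted by the class sizes, and the helper names (`mean_vector`, `sq_dist`, `class_distances`) and the toy data are hypothetical. How the two values are merged into the single cost of equation (11) is left to the caller.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance (a - b)^T (a - b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_distances(samples_by_class, mask):
    """Return (within-class, between-class) distances for the features where mask == 1."""
    keep = [i for i, bit in enumerate(mask) if bit]
    reduced = {c: [[row[i] for i in keep] for row in rows]
               for c, rows in samples_by_class.items()}
    class_means = {c: mean_vector(rows) for c, rows in reduced.items()}
    overall = mean_vector([row for rows in reduced.values() for row in rows])
    # Assumption: per-class mean squared distance to the class mean, summed over classes.
    within = sum(sum(sq_dist(r, class_means[c]) for r in rows) / len(rows)
                 for c, rows in reduced.items())
    # Assumption: squared distance of each class mean to the overall mean, weighted by class size.
    between = sum(len(rows) * sq_dist(class_means[c], overall)
                  for c, rows in reduced.items())
    return within, between

# Toy data with two classes and two features; feature 0 separates the classes cleanly.
data = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 1.0], [4.0, 3.0]]}
print(class_distances(data, [1, 0]))
print(class_distances(data, [0, 1]))
```

On the toy data the first feature gives zero within-class spread and a large between-class spread, so a GA minimising a cost built from these two quantities would prefer the mask `[1, 0]`.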

V. ARTIFICIAL BEE COLONY ALGORITHM

The Artificial Bee Colony (ABC) algorithm is a swarm-based algorithm [20] modelled on the foraging behavior of honeybees. In ABC the bee colony consists of three types of bees: employed bees, onlooker bees and scout bees. To apply ABC, the optimization problem must be converted into the problem of finding the best parameter vector, one that minimizes an objective function. The algorithm then randomly initializes the solution vectors and iteratively improves them, thereby reaching the best solution it can find. The solution vector is the food source of the foraging bees [16].

The algorithm can be divided into four parts for better understanding.

1. Initialization Phase

2. Employed Bee Phase

3. Onlooker Bee Phase

4. Scout Bee Phase

1) In the initialization phase, all the food source vectors of the population ($X_f$) are initialized. The size of the population is the total number of bees (employed bees + onlooker bees). Each food source contains n parameters ($X_{f,i}$, i = 1, 2, ..., n) which have to be optimized so that the objective function is minimized. The initialization is done by the following equation:

$$X_{f,i} = L_i + \mathrm{rand}(0,1)\,\left(U_i - L_i\right) \tag{12}$$

where $L_i$ and $U_i$ are the lower and upper bounds of parameter i.

Generate random initial population

Evaluate each individual's average fitness (Eqs. 9, 10 and 11)

Repeat

Select best-ranking individuals

Randomly pair individuals to reproduce

Obtain crossover offspring

Obtain mutated offspring

Determine each individual's fitness

Update the population with the new offspring

Until the termination criterion is met

RETURN best individual among population (Selected Feature Vector)

END procedure

Fig. 6 (b) Pseudo code of genetic algorithm
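The pseudo code in Fig. 6(b) can be turned into a small runnable sketch. The version below is a generic binary GA written for illustration only: truncation selection, single-point crossover, bit-flip mutation and all default parameter values are assumptions, and `toy_fitness` is a stand-in for the distance-based cost of the paper.

```python
import random

def binary_ga(fitness, n_bits, pop_size=20, generations=50,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Minimise `fitness` over bit lists of length n_bits (n_bits >= 2, pop_size >= 4)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        parents = ranked[: pop_size // 2]            # select best-ranking individuals
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)            # randomly pair individuals
            if rng.random() < crossover_rate:        # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children                               # update the population
        candidate = min(pop, key=fitness)
        if fitness(candidate) < fitness(best):       # keep the best individual seen so far
            best = candidate
    return best

# Toy cost: number of positions that differ from a hypothetical "ideal" feature mask.
TARGET = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]

def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, TARGET))

best_mask = binary_ga(toy_fitness, n_bits=len(TARGET))
print(best_mask, toy_fitness(best_mask))
```

In the paper's setting each bit would switch one statistical feature on or off, and the cost would be evaluated from the within-class and between-class distances of the selected features.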

[Fig. 6 (a) flow chart steps: initial population generation with different combinations of features; evaluate the fitness of each combination by finding the in-class and between-class distances; choose the best-ranking combinations; pair the combinations; obtain new combinations using the crossover function; obtain new combinations using the mutation function; evaluate the fitness of each combination by calculating the in-class and between-class distances; update the population with the best combinations; terminate? (No: repeat; Yes: final best combination of features).]

Fig. 6 (a) Flow chart of genetic algorithm



2) An employed bee searches for a new food source (solution, $V_f$) that lies near the one in its memory ($X_f$). The new food source is obtained using the equation:

$$V_{f,i} = X_{f,i} + \phi_{f,i}\,\left(X_{f,i} - X_{k,i}\right) \tag{13}$$

where $X_k$ is a randomly selected food source different from $X_f$, i is a randomly chosen parameter index and $\phi_{f,i}$ is a random number in [-1, 1].

The fitness of the newly found solution is evaluated. If its fitness value (nectar amount) is greater than that of the previous solution, the bee memorizes the new solution and discards the old one. The fitness ($fit_f$) is calculated using the formula:

$$fit_f(X_f) = \begin{cases} \dfrac{1}{1 + f_f(X_f)} & \text{if } f_f(X_f) \geq 0 \\[2mm] 1 + \left\lvert f_f(X_f) \right\rvert & \text{if } f_f(X_f) < 0 \end{cases} \tag{14}$$

Here $f_f(X_f)$ is the objective function value of food source $X_f$, which should be minimized. When $f_f(X_f)$ is greater than zero, a larger value gives a lower fitness (making that food source less profitable), and the smaller $f_f(X_f)$ becomes, the higher the fitness value (making the food source more profitable).

3) Onlooker bees, along with scout bees, are categorized as unemployed bees. The employed bees complete the search process and return to share the fitness values (food source information or nectar values) with the onlooker bees. The onlooker bees evaluate the fitness values and choose a food source on a probabilistic basis. The probability $p_f$ with which $X_f$ is chosen by an onlooker bee is defined by the equation:

$$p_f = \frac{fit_f(X_f)}{\sum_{f=1}^{PS} fit_f(X_f)} \tag{15}$$

where the sum runs over all PS food sources in the population.

After an onlooker bee chooses the food source $X_f$, a neighboring food source $V_f$ is determined using equation (13) and its nectar amount (fitness value) is calculated. A greedy selection is then applied between $X_f$ and $V_f$. This ensures that better solutions attract more onlooker bees.

4) Each employed bee tries to improve its food source by searching the neighborhood. If the employed bee fails to improve the food source after a predetermined number of iterations, it becomes a scout bee, moves to a random food source and continues searching. The procedure and pseudo code of the ABC algorithm are given in Fig. 7(a) and 7(b).
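The four phases described above can be sketched as a compact minimiser. This is a basic ABC loop written for illustration, not the authors' Matlab code: the function names, the default parameters, the clipping of candidates to the bounds and the `sphere` test objective are all assumptions. The comments mark where Eqs. (12) to (15) enter.

```python
import random

def abc_fitness(cost):
    """Eq. (14): convert an objective value into a fitness (nectar amount)."""
    return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

def abc_minimize(objective, lower, upper, n_sources=10, limit=20, cycles=100, seed=0):
    """Minimise `objective` inside the box [lower, upper]; needs n_sources >= 2."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_source():  # Eq. (12)
        return [lower[i] + rng.random() * (upper[i] - lower[i]) for i in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_idx = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[best_idx][:], costs[best_idx]

    def try_neighbour(f):
        k = rng.choice([j for j in range(n_sources) if j != f])
        i = rng.randrange(dim)
        candidate = sources[f][:]
        step = rng.uniform(-1.0, 1.0) * (sources[f][i] - sources[k][i])  # Eq. (13)
        candidate[i] = min(max(sources[f][i] + step, lower[i]), upper[i])
        cost = objective(candidate)
        if abc_fitness(cost) > abc_fitness(costs[f]):  # greedy selection
            sources[f], costs[f], trials[f] = candidate, cost, 0
        else:
            trials[f] += 1

    for _ in range(cycles):
        for f in range(n_sources):                     # employed bee phase
            try_neighbour(f)
        weights = [abc_fitness(c) for c in costs]      # proportional to Eq. (15)
        for _ in range(n_sources):                     # onlooker bee phase
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        for f in range(n_sources):                     # memorise the best solution so far
            if costs[f] < best_cost:
                best, best_cost = sources[f][:], costs[f]
        for f in range(n_sources):                     # scout bee phase
            if trials[f] > limit:
                sources[f] = random_source()
                costs[f] = objective(sources[f])
                trials[f] = 0
    return best, best_cost

def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

solution, cost = abc_minimize(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(solution, cost)
```

The best solution is memorised before the scout phase so that abandoning a food source can never discard the best vector found so far.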



Initialization Phase:

Initialize all vectors of the population of food sources, Eq. (12)

Send the employed bees to the current food sources.

REPEAT:

Employed Bee Phase:

For each employed bee, search for a new food source Vf near Xf, Eq. (13)

Evaluate the fitness (fitf) of the newly found food source, Eq. (14)

Apply the greedy selection process on Xf and Vf

Onlooker Bee Phase:

Obtain the fitness information of Xf from the employed bees.

Calculate the probability value pf of each food source, Eq. (15)

A neighbor Vf is chosen using Eq. (13) and its fitness value is evaluated

Apply the greedy selection process on Xf and Vf

For each Scout Bee

If there is an abandoned solution for the scout, replace it with a new, randomly produced solution

Memorize the best solution so far

UNTIL cycle = MAX CYCLE NUMBER or MAX CPU TIME

End procedure

[Fig. 7 (a) flow chart steps: initialize employed bees randomly with solution vectors; send the employed bees to the solution vectors; for each sample, take the distance from the solution vector; the sample belongs to the class of the solution vector with the minimum distance; the mean square error of misclassification is used as the fitness value; employed bees share the fitness value with onlooker bees; onlooker bees find the probabilistic values of the fitness; based on the best probability value, a new solution is chosen; check for abandoned solutions; terminate? (No: repeat; Yes: final class centers).]

Fig. 7 (a) Flow chart of ABC classification algorithm

Fig. 7 (b) Pseudo code of ABC classification algorithm



VI. LEARNING VECTOR QUANTIZATION

Learning Vector Quantization (LVQ) is a supervised classification algorithm and a special case of artificial neural networks. It works on a winner-takes-all basis and is related to Self-Organizing Maps (SOM) [22] and the K-Nearest Neighbor algorithm [23].

Several LVQ algorithms exist, but all are based on the following basic concept [24]:

Weight vectors (class centers) are randomly initialized.

A set of learning sample inputs (Xi) is given to the classifier along with the respective correct class labels.

The distance between each class centre and the input vector is determined and a winner is selected.

This classification method uses a weight vector that is the centre of each class. Initially a set of learning input samples is given to the classifier, each with its correct class label. The weight vector for each class is randomly initialized using an input vector within the class. The classifier then calculates the Euclidean distance (Eq. 16) between the input vector and the class centre for all classes. The input is assigned to the class with the minimum distance. Each classified input is then checked for classification accuracy using the class label that each input vector carries [20]. If the input vector is correctly classified, the centre of the corresponding class is pushed towards the input vector (Eq. 17). Otherwise, the centre of the corresponding class is pushed away from the input vector (Eq. 18). The following equations define the LVQ process:

$$D_i(t) = \left\lVert X_i - W_i(t) \right\rVert \tag{16}$$

$$W_i(t+1) = W_i(t) + \alpha\,\left(X_i - W_i(t)\right) \tag{17}$$

$$W_i(t+1) = W_i(t) - \alpha\,\left(X_i - W_i(t)\right) \tag{18}$$

where $W_i(t)$ is the weight vector (class centre) at iteration t and $\alpha$ is the learning rate.

Subsequently all the inputs are fed to the classifier and the final class centres are obtained. This is called the training phase. In the validation phase a new input is given to the classifier, which calculates the Euclidean distance between the input vector and each class centre obtained after training. The input is assigned to the class with the minimum distance. The pseudo code and the procedure of the LVQ classification algorithm are given in Fig. 8(a) and 8(b).

Fig. 8 (a) Pseudo code of LVQ classification algorithm

Initialize random reference weight vectors, Wi (center)

For each input vector Xi

Repeat until all input vectors are considered:

Using equation (16), compute the Euclidean distance Di between the weight vector Wi and the input vector Xi

Find the weight vector with the minimum Euclidean distance

If the sample is correctly classified:

Push the weight vector towards the input vector (eq. 17)

Else

Push the weight vector away from the input vector (eq. 18)

Reduce learning rate, 𝛼

End
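The LVQ pseudo code maps onto a short training routine. The sketch below is a plain LVQ1-style loop under assumptions: one centre per class, a geometric learning-rate decay, and the names `train_lvq`, `classify`, `alpha` and `decay` are all illustrative rather than taken from the paper. The toy clusters are not experimental data.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two vectors (Eq. 16)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_lvq(samples, labels, alpha=0.3, decay=0.9, epochs=20, seed=0):
    """Train one centre per class and return a dict mapping label -> centre."""
    rng = random.Random(seed)
    centres = {}
    for c in sorted(set(labels)):
        # Initialise each class centre from a randomly chosen sample of that class.
        members = [s for s, lab in zip(samples, labels) if lab == c]
        centres[c] = list(rng.choice(members))
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            winner = min(centres, key=lambda c: euclidean(x, centres[c]))
            # Correct winner: push towards x (Eq. 17); wrong winner: push away (Eq. 18).
            sign = 1.0 if winner == label else -1.0
            centres[winner] = [w + sign * alpha * (xi - w)
                               for w, xi in zip(centres[winner], x)]
        alpha *= decay  # reduce the learning rate
    return centres

def classify(x, centres):
    """Assign x to the class whose centre is nearest."""
    return min(centres, key=lambda c: euclidean(x, centres[c]))

# Two well-separated toy clusters.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
y = [0, 0, 0, 1, 1, 1]
centres = train_lvq(X, y)
print(classify((0.5, 0.5), centres), classify((10.5, 10.5), centres))
```

With the four bearing conditions, `labels` would hold four distinct values and each sample would be the vector of GA-selected features.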



Fig. 8 (b) Flow chart of LVQ classification algorithm

VII. IMPLEMENTATION OF FEATURE SELECTION AND CLASSIFICATION

The Matlab platform is used to execute feature selection and classification. Initially a number of features are extracted from the raw signal. A large number of values in the data increases computational complexity, which affects classifier efficiency. This can be avoided by selecting the best features required for classification and removing the unnecessary ones, which improves the performance of the classifier. Feature selection can be done efficiently with a genetic algorithm. The feature selection is optimized using a distance-based selection method. In this work, the initial population is randomly selected. The number of chromosomes is equal to the number of features taken into the optimized selection process. Ten features are taken as GA input parameters and 50 generations were executed.

LVQ classification method uses a weight vector, which is the

centre for each class. Initially a set of learning input samples

VIII.IMPLEMENTATION of GA

Yes

No

Initialize random weight vectors

Compute Euclidean distance between

each sample and the weight vector

Obtain vectors with minimum Euclidean

distance and classify

Correctly

classified

?

Push weight vectors

towards the input

sample

Push weight vectors away

from the input sample

Reduce learning rate, 𝛼

Terminate?

Class center vector

Yes No



are given to the classifier and each input will have a correct

class label. The weight vector for each class is randomly

initialized using an input vector within the class. The classifier

then calculates the Euclidean distance (Eq. 11) between the

input vector and the class centre for all classes. The input is

classified to the class, which the minimum distance

corresponds to. Each classified input is then checked for

classification accuracy by utilizing the class label information

that each input vector holds [25]. If the input vector is

correctly classified, the centre of the corresponding class is

pushed towards the input vector (Eq. 12). Otherwise, the

centre of the corresponding class is pushed away from the

input vector (Eq. 13). Subsequently all the inputs are fed to the

classifier and the final class centre is obtained. This is called

the training phase. In the validation phase a new input is given

to the classifier and it calculates the Euclidean distance

between the input vector and each class centre (obtained after

training). The input is classified to the class which

corresponds to the minimum distance. The weight vectors

correspond to the class center. ABC algorithm is an

optimization algorithm and is used as a second classifier in this work. It exploits the foraging behavior of bees in order to optimize a problem. To do that, the problem must be defined and a cost function designed accordingly. The cost function differs from one application to another and is designed to return a value that the algorithm minimizes. A classifier is implemented with an optimization algorithm by designing a cost function that returns the mean square error of misclassification, which the algorithm then tries to minimize. The ABC algorithm initializes each employed bee with a random solution vector (a vector holding four class centres). Since 10 employed bees are used, there are 10 solution vectors. The fitness of each vector is then calculated in three steps. First, the Euclidean distance from each sample to the four class centres is found. Second, the sample is assigned to the class corresponding to the minimum distance. Third, the mean square error of classification is calculated; this value is the fitness. The employed bees then share the fitness information with the onlooker bees, which calculate a probability value using Eq. 7 from the data acquired from the employed bees. The solution vector corresponding to the highest probability value is memorized. If a food source does not yield a better fitness after a predefined number of iterations (the limit value), it is considered an abandoned source, and the employed bee corresponding to that source becomes a scout bee that searches for a new one. This process continues until a fixed number of iterations has been carried out. As the iterations increase, the centre of each class moves towards a better position. Initially, the colony size, comprising employed bees and onlooker bees, is taken as 20 and the number of food sources as 10.
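The ABC-based classifier described in this section can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the search bounds taken from the data range, the fitness form 1/(1 + cost), the neighbour move and the default `limit` are assumptions based on the commonly used form of ABC, since Eq. 7 is not reproduced here. The "mean square error of misclassification" is interpreted as the mean of squared 0/1 errors, which equals the misclassification rate.

```python
import math
import random


def classification_error(solution, samples, labels, n_classes):
    """Cost of one food source: fraction of misclassified samples.

    `solution` is a flat list holding n_classes centres back to back;
    labels are assumed to be integers 0 .. n_classes - 1.
    """
    dim = len(samples[0])
    centres = [solution[k * dim:(k + 1) * dim] for k in range(n_classes)]
    wrong = 0
    for x, y in zip(samples, labels):
        dists = [math.dist(c, x) for c in centres]
        if dists.index(min(dists)) != y:  # nearest centre decides the class
            wrong += 1
    return wrong / len(samples)


def abc_classify(samples, labels, n_classes, n_sources=10, limit=20,
                 iterations=100, seed=0):
    """Simplified ABC search for class centres (hypothetical helper)."""
    rng = random.Random(seed)
    n_dims = len(samples[0]) * n_classes
    lo = min(min(x) for x in samples)
    hi = max(max(x) for x in samples)

    def cost(s):
        return classification_error(s, samples, labels, n_classes)

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(n_dims)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    best, best_cost = list(sources[0]), costs[0]

    def try_neighbour(i):
        # Perturb one coordinate relative to another, randomly chosen source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(n_dims)
        candidate = list(sources[i])
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        c = cost(candidate)
        if c < costs[i]:          # greedy selection
            sources[i], costs[i], trials[i] = candidate, c, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n_sources):                 # employed bees
            try_neighbour(i)
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):                 # onlooker bees
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        i_best = min(range(n_sources), key=lambda i: costs[i])
        if costs[i_best] < best_cost:              # memorize the best source
            best, best_cost = list(sources[i_best]), costs[i_best]
        for i in range(n_sources):                 # scout bees
            if trials[i] > limit:
                sources[i], trials[i] = random_source(), 0
                costs[i] = cost(sources[i])
    return best, best_cost
```

With `n_sources=10`, each cycle uses 10 employed and 10 onlooker evaluations, which mirrors the colony size of 20 stated above. Because the cost is a discrete error rate, many neighbour moves leave it unchanged, so the `limit` counter and the scout phase matter more here than for a smooth cost function.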
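The LVQ training procedure described earlier in this section can be illustrated with a short sketch. It assumes the standard LVQ1 update, in which the winning centre moves by a fraction `alpha` of its offset from the input; the learning rate `alpha`, the number of `epochs` and all function names are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_class(centres, x):
    """Return the label whose centre is closest to x."""
    return min(centres, key=lambda label: euclidean(centres[label], x))


def train_lvq(samples, labels, alpha=0.1, epochs=20, seed=0):
    """LVQ1-style training with one centre (weight vector) per class."""
    rng = random.Random(seed)
    # Initialise each class centre with a randomly chosen input of that class.
    centres = {}
    for label in sorted(set(labels)):
        members = [x for x, y in zip(samples, labels) if y == label]
        centres[label] = list(rng.choice(members))
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            winner = nearest_class(centres, x)
            # Pull the winning centre towards x if correct, push it away if not.
            sign = 1.0 if winner == y else -1.0
            centres[winner] = [w + sign * alpha * (xi - w)
                               for w, xi in zip(centres[winner], x)]
    return centres
```

In the validation phase a new feature vector is simply passed to `nearest_class` together with the trained centres.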

VIII. RESULTS AND DISCUSSIONS

The bearing signals were acquired through an accelerometer at a sampling rate of 12800 Hz. After the vibration signals for the four bearing conditions are obtained, they are decomposed and statistical features are extracted from the Morlet wavelet coefficients. For GA-based feature selection, the initial parameters are taken as follows: the population size is 80, the chromosome length is 20 (2 sets of features, each containing 10), and the number of generations is 100. The GA returns the best combination of selected features satisfying the given objective. A gene value greater than 0.5 is treated as a selected feature, and the fitness of the process is also obtained. For example, from the result string 11011000110010001000 the features F1, F2, F4, F5, F9, F10, F13 and F17 are selected and the remaining ones are discarded. The feature subsets selected by the GA are used to train and test the ABC and LVQ classifiers. The total feature set is split into training (70%) and testing (30%) data sets. The samples in the test set are used to evaluate the LVQ and ABC classifiers.
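The mapping from a GA chromosome to the selected feature subset can be shown in a few lines. The helper name `selected_features` is hypothetical; the 0.5 threshold and the example string are the ones quoted above.

```python
def selected_features(chromosome, threshold=0.5):
    """Return 1-based feature names for genes whose value exceeds threshold."""
    return [f"F{i}" for i, gene in enumerate(chromosome, start=1)
            if float(gene) > threshold]


mask = "11011000110010001000"
print(selected_features(mask))
# ['F1', 'F2', 'F4', 'F5', 'F9', 'F10', 'F13', 'F17']
```

The same function accepts a list of real-valued genes, in which case the threshold decides which features are kept.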

TABLE IV
Output efficiencies of LVQ classifier

| LVQ classifier (features used) | Scheme | Normal condition (%) | Inner-race fault (%) | Outer-race fault (%) | Ball fault (%) | Average accuracy on testing data (%) |
|---|---|---|---|---|---|---|
| All 20 features (without feature selection) | LVQ-S1 | 90 | 97 | 95 | 98 | 95.00 |
| F1, F2, F4, F5, F9, F10, F13, F17 | LVQ-S2 | 100 | 100 | 90 | 100 | 97.50 |
| F3, F5, F6, F7, F9, F11, F12, F14 | LVQ-S3 | 100 | 95 | 90 | 100 | 96.25 |
| Random 5 features: F1, F10, F11, F14, F16 | LVQ-S4 | 75 | 100 | 70 | 100 | 86.25 |

The classification results in Table IV and Table V, as well as Fig. 9(a) and Fig. 9(b), give the description of the respective schemes and the corresponding prediction results. Among the LVQ schemes, the overall average testing accuracy is highest for LVQ-S2, at 97.5%. The LVQ classifier without feature selection gives 95% accuracy, whereas with GA-based feature selection (LVQ-S2) it gives 97.5%. ABC with GA gives a classification accuracy of 98.75%, which is the maximum among the two classifiers. Meanwhile, the ABC classifier provides 95.50% accuracy when built without feature selection. The accuracy increases with feature selection using GA even though fewer features are used. Thus we can infer that the combination of GA and ABC (scheme ABC-S2) gives the better result.

TABLE V
Output efficiencies of ABC classifier

| ABC classifier (features used) | Scheme | Normal condition (%) | Inner-race fault (%) | Outer-race fault (%) | Ball fault (%) | Average accuracy on testing data (%) |
|---|---|---|---|---|---|---|
| All 20 features (without feature selection) | ABC-S1 | 92 | 100 | 95 | 95 | 95.50 |
| F1, F2, F4, F5, F9, F10, F13, F17 | ABC-S2 | 95 | 100 | 95 | 100 | 98.75 |
| F3, F5, F6, F7, F9, F11, F12, F14 | ABC-S3 | 100 | 100 | 100 | 90 | 97.50 |
| Random 5 features: F1, F10, F11, F14, F16 | ABC-S4 | 70 | 100 | 95 | 95 | 90.00 |
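The average accuracies in Tables IV and V appear to be plain means of the four per-class accuracies, which can be checked with a few lines of arithmetic. The sketch below assumes an unweighted mean (equal test-set sizes per class are not stated in the text) and uses the values exactly as printed. Under that assumption, every row reproduces its reported average except ABC-S2, whose printed per-class values average 97.50 rather than 98.75, so one of the printed entries for that row may be misprinted; the source does not indicate which.

```python
# Per-class test accuracies (%) as printed in Tables IV and V:
# normal, inner-race fault, outer-race fault, ball fault.
per_class = {
    "LVQ-S1": (90, 97, 95, 98),
    "LVQ-S2": (100, 100, 90, 100),
    "LVQ-S3": (100, 95, 90, 100),
    "LVQ-S4": (75, 100, 70, 100),
    "ABC-S1": (92, 100, 95, 95),
    "ABC-S2": (95, 100, 95, 100),
    "ABC-S3": (100, 100, 100, 90),
    "ABC-S4": (70, 100, 95, 95),
}
# Average accuracies (%) as reported in the last column of each table.
reported = {
    "LVQ-S1": 95.00, "LVQ-S2": 97.50, "LVQ-S3": 96.25, "LVQ-S4": 86.25,
    "ABC-S1": 95.50, "ABC-S2": 98.75, "ABC-S3": 97.50, "ABC-S4": 90.00,
}


def row_mean(values):
    """Unweighted mean of the per-class accuracies."""
    return sum(values) / len(values)


mismatches = [scheme for scheme in per_class
              if abs(row_mean(per_class[scheme]) - reported[scheme]) > 0.005]
print(mismatches)  # ['ABC-S2']
```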

Fig. 9 (a) Classification scheme Vs Accuracy of LVQ



Fig. 9 (b) Classification scheme Vs Accuracy of ABC

IX. CONCLUSION

This paper introduced another effective approach that combines the strength of the genetic algorithm, an optimization technique, in the feature selection process with ABC as a classification algorithm to solve bearing fault diagnosis problems. In feature selection, the GA has proved its capability through quick convergence, strong search capability and the selection of a minimal feature set. The results show how the present approach increases the predictive accuracy for bearing fault diagnosis. The proposed methods are compared without the feature selection process before classification and with the feature selection process, for both classification algorithms, ABC and LVQ. Classification accuracy measures are used to evaluate the performance of the proposed approaches, and they clearly demonstrate the effectiveness of the present approach.

REFERENCES

[1] X. Chiementin, F. Bolaers, J.-P. Dron, Early detection of fatigue damage on rolling element bearings using adapted wavelet, Journal of Vibration and Acoustics 129 (4) (2007) 495–506.
[2] J. Mathew, R.J. Alfredson, The condition monitoring of rolling element bearings using vibration analysis, Trans. ASME, J. Vibr. Acoust. Stress Reliab. Design 106 (1984) 447–453.
[3] C. Scheffer, P. Girdhar, Practical Machinery Vibration Analysis and Predictive Maintenance, Newnes, 2004.
[4] N. Tandon, A. Choudhury, A review of vibration and acoustic measurement methods for the detection of defects in rolling element bearings, Tribology International 32 (1999) 469–480.
[5] T. Harris, Rolling Bearing Analysis, Wiley, New York, 1991.
[6] J. Taylor, The Vibration Analysis Handbook, Vibration Consultants, Tampa, FL, 2003.
[7] V.K. Rai, A.R. Mohanty, Bearing fault diagnosis using FFT of intrinsic mode functions in Hilbert–Huang transform, Mechanical Systems and Signal Processing 21 (2007) 2607–2615.
[8] Zonghua Zhang, Zhao Jing, Zhaohui Wang, Dengfeng Kuang, Comparison of Fourier transform, windowed Fourier transform, and wavelet transform methods for phase calculation at discontinuities in fringe projection profilometry, Optics and Lasers in Engineering 50 (2012) 1152–1160.
[9] P.W. Tse, Y.H. Peng, R. Yam, Wavelet analysis and envelope detection for rolling element bearing fault diagnosis: their effectiveness and flexibilities, Journal of Vibration and Acoustics 123 (2001) 303–310.
[10] K. Mori, N. Kasashima, T. Yoshioka, Y. Ueno, Prediction of spalling on a ball bearing by applying the discrete wavelet transform to vibration signals, Wear 195 (1996) 162–168.
[11] Q. Sun, Y. Tang, Singularity analysis using continuous wavelet transform for bearing fault diagnosis, Mechanical Systems and Signal Processing 16 (2002) 1025–1041.
[12] Xiaoran Zhu, Youyun Zhang, Yongsheng Zhu, Intelligent fault diagnosis of rolling bearing based on kernel neighborhood rough sets and statistical features, Journal of Mechanical Science and Technology 26 (9) (2012) 2649–2657.
[13] Jing Lin, Liangsheng Qu, Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis, Journal of Sound and Vibration 234 (1) (2000) 135–148.
[14] J. Yang, Y. Zhang, Y. Zhu, Intelligent fault diagnosis of rolling element bearing based on SVMs and fractal dimension, Mechanical Systems and Signal Processing 21 (2007) 2012–2024.
[15] H. Wang, P. Chen, Intelligent diagnosis method for rolling element bearing faults using possibility theory and neural network, Computers & Industrial Engineering 60 (2011) 511–518.
[16] Weixiang Sun, Jin Chen, Jiaqing Li, Decision tree and PCA-based fault diagnosis, Mechanical Systems and Signal Processing 21 (2007) 1300–1317.
[17] Jing Lin, Feature extraction of machine sound using wavelet and its application in fault diagnosis, NDT&E International 34 (2001) 25–30.
[18] Cheng-Lung Huang, Chieh-Jen Wang, A GA-based feature selection and parameters optimization for support vector machines, Expert Systems with Applications 31 (2006) 231–240.
[19] Ngoc-Tu Nguyen, Hong-Hee Lee, Jeong-Min Kwon, Optimal feature selection using genetic algorithm for mechanical fault detection of induction motor, Journal of Mechanical Science and Technology 22 (2008) 490–496.
[20] Dervis Karaboga, Celal Ozturk, A novel clustering approach: Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 11 (2011) 652–657.
[21] Changsheng Zhang, Dantong Ouyang, Jiaxu Ning, An artificial bee colony approach for clustering, Expert Systems with Applications 37 (2010) 4761–4767.
[22] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 1997.
[23] Jianye Liu, Yongchun Liang, Xiaoyun Sun, Application of learning vector quantization network in fault diagnosis of power transformer, Proceedings of the IEEE International Conference on Mechatronics and Automation, 2009.
[24] Fahad and Sikander, Classification of textual documents using learning vector quantization, Information Technology Journal 6 (1) (2007) 154–159.
[25] Ouyang Sen, Song Zhengxiang, Wang Jianhua, Chen Degui, Application of LVQ neural networks combined with genetic algorithm in power quality signals classification, IEEE, 2002.
