chapter 3 feature extraction using genetic...


CHAPTER 3

FEATURE EXTRACTION USING GENETIC ALGORITHM

BASED PRINCIPAL COMPONENT ANALYSIS

3.1 INTRODUCTION

Cardiac beat classification is a key process in the detection of

myocardial ischemic episodes in the electrocardiographic signal. Myocardial

ischemia is caused by insufficient blood flow to the muscle tissue of the heart.

This reduced blood supply may be due to narrowing of the coronary arteries,

obstruction by a thrombus, or, less commonly, due to diffuse narrowing of

arterioles and other small vessels within the heart. Ischemia is one of the

leading causes of death in modern societies and, as a consequence, its early

diagnosis and treatment are of great importance. In the ECG signal, ischemia is

expressed as slow dynamic changes of the ST segment and/or the T wave.

Long duration electrocardiography, like Holter recordings or continuous ECG

monitoring in the coronary care unit, is a simple and noninvasive method to

observe such alterations. The development of suitable automated analysis

techniques can make this type of ECG recording very effective in supporting

the physician’s diagnosis and guiding patient management in clinical

practice. Accurate detection of ischemic episodes in the recorded ECG

depends on the correct classification of ischemic cardiac beats. Several

techniques have been proposed for ischemic beat classification, which

evaluate the ST segment changes and the T-wave alterations with different

methodologies.


3.2 PREPROCESSING OF ECG

The main aim of the ECG signal preprocessing is to prepare a

compact description of the ST–T complex, composed from the ST Segment

and the T–wave, for input to the classification methodology with the

minimum loss of information.

ECG recordings used for the diagnosis of ischemic episodes are

affected by noise, which significantly degrades diagnostic accuracy. Better

handling of noisy ECGs can improve the accuracy of the diagnostic methods

and widen their application in everyday practice. There are three types of

noise in the ECG signal:

(a) Power line interference (A/C interference),

(b) Electromyographic contamination (EMG noise), and

(c) Baseline wandering (BW).

A/C interference contaminates the ECG signal with mains-frequency

interference, which is sometimes phase-shifted with respect to the mains

voltage (50 or 60 Hz). EMG noise is correlated with muscle contraction and

overlaps with the frequency spectrum of the ECG signal, so removing the EMG

noise also alters the original ECG signal. Finally, baseline wandering is

caused by respiration and motion artifacts and is generally a low-frequency

noise.

3.3 FEATURE EXTRACTION USING GENETIC PRINCIPAL

COMPONENT ANALYSIS

This section describes the feature extraction process from the beat

signals extracted from the electrocardiograms. Two methods are used for

feature extraction, namely PCA and GPCA. The main goal of this work is to

develop algorithms to automatically detect ischemic episodes. For this

purpose, features based on ST segment deviation, T wave, and QRS complex

morphology changes were extracted.

3.3.1 Feature Extraction Using Principal Component Analysis

Principal Components Analysis (PCA) is an exploratory

multivariate statistical technique for simplifying complex data sets. The PCA

transformation is selected as the tool for reducing the dimensionality of the

extracted ST-T samples. The PCA decomposition is optimal in terms of

second-order statistics, in the sense that it permits an optimal reconstruction

of the original data in the mean-square error sense (subject to the

dimensionality constraint). The PCA transformation describes the original

vectors (ST-T complexes) according to the directions of maximum variance in

the training set. The latter information is obtained by analyzing the data

covariance matrix. The orthogonal eigenvectors of the covariance matrix are

selected as basis functions for the signal projection operation. The

corresponding eigenvalues represent the average dispersion of the projection

of the input vectors onto the corresponding eigenvectors (basis functions).

The numerical value of each eigenvalue quantifies the amount of variance that

is accounted for by projecting the signal onto the corresponding eigenvector.

Accordingly, it represents the contribution of the eigenvector’s analysis

direction to the signal reconstruction in the mean squared error sense. For the

analysis of ECG signal the eigenvalues after the fifth have very small

numerical values. Thus, for the representation of ST-T Complex the first five

PCA coefficients were used to characterize about 97.9% of the signal energy.

A small performance improvement has been observed by using the first five

PCA coefficients instead of four. The five principal components extracted

from the corresponding ST-T Complex are assigned to each QRS fiducial

point. The first principal component (PC) and, to a lesser extent, the second

represent the dominant low-frequency component of the ST-T

Complex; the third, fourth, and fifth contain more high-frequency energy. In

the time series representation of the PCs the ischemic episodes appear as

peaks. A straightforward way for the detection of ischemic beats from the

PCA representation is to use as the input vector the PCA coefficients of a

single beat. This approach clearly accounts only for local information.

Therefore, a better approach that can extract also morphological information

from the ST-T episodes in such a way to distinguish artifacts and to

appreciate even weak ST episodes is necessary. This type of approach should

take into account the information from a sequence of beats instead of a single

beat.

Given ‘n’ observations on ‘m’ variables, the goal of PCA is to reduce the dimensionality of the data matrix by finding ‘r’ new variables, where ‘r’ is less than ‘m’. Principal components project high dimensional data into the subspace spanned by the eigenvectors with the ‘r’ largest eigenvalues while remaining mutually uncorrelated and orthogonal. Each principal component is a linear combination of the original variables. The algorithm below obtains the principal components of a vector set X represented by an N×M matrix, where N represents the number of segments and M represents the dimension of the vectors that constitute the vector set.

The algorithm of PCA is explained as below:

a. Obtain the mean vector ($\bar{x}$):

$$\bar{x} = \frac{1}{N} \sum_{i=0}^{N-1} x_i$$

b. Obtain the covariance matrix:

$$C_x = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \bar{x})(x_i - \bar{x})^T$$

c. Obtain the eigenvectors and eigenvalues:

$$C_x e = \lambda e$$

where $e$ is an eigenvector and $\lambda$ is the corresponding eigenvalue.

d. After creating the eigenspace we can proceed to recognition. Given a new beat $x$ of an individual, the signals are concatenated the same way as in training, the mean vector is subtracted, and the result is projected into the eigenspace:

$$\omega_k = e_k^T (x - \bar{x})$$

for k = 1, ..., M’. These calculated values of $\omega_k$ together form a vector $\Omega^T = [\omega_1, \omega_2, \ldots, \omega_{M'}]$. $\Omega$ is then used to establish which of the pre-defined classes best describes the new signal. The simplest way is to determine the class k that minimizes the Euclidean distance:

$$\epsilon_k^2 = \lVert \Omega - \Omega_k \rVert^2$$

where $\Omega_k$ is a vector describing the kth signal class. A signal is classified as belonging to a certain class when the minimum $\epsilon_k$ (i.e. the maximum matching score) is below a certain threshold.
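Steps a to d can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names (`fit_pca`, `project`, `nearest_class`), the synthetic toy data, and the use of each class's mean projection as its class vector are all hypothetical choices made for the example.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean vector, the top eigenvectors (as columns) and eigenvalues.

    X is an N x M array: N beat segments, each with M samples.
    """
    mean = X.mean(axis=0)                        # step a: mean vector
    centered = X - mean
    cov = centered.T @ centered / X.shape[0]     # step b: covariance matrix (1/N form)
    eigvals, eigvecs = np.linalg.eigh(cov)       # step c: eigen-decomposition (symmetric)
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    keep = order[:n_components]
    return mean, eigvecs[:, keep], eigvals[keep]

def project(x, mean, eigvecs):
    """Step d: subtract the mean and project onto the retained eigenvectors."""
    return eigvecs.T @ (x - mean)

def nearest_class(omega, class_vectors):
    """Return (class index, squared Euclidean distance) of the closest class vector."""
    d2 = ((class_vectors - omega) ** 2).sum(axis=1)
    k = int(np.argmin(d2))
    return k, float(d2[k])

# Toy data, purely illustrative: two synthetic clusters of 8-sample "beats".
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.1, size=(50, 8)) + np.linspace(0, 1, 8)
class_b = rng.normal(0.0, 0.1, size=(50, 8)) - np.linspace(0, 1, 8)
X = np.vstack([class_a, class_b])

mean, eigvecs, eigvals = fit_pca(X, n_components=2)

# Assumption: each class is described by the mean of its projected training beats.
class_vectors = np.array([
    (eigvecs.T @ (class_a - mean).T).mean(axis=1),
    (eigvecs.T @ (class_b - mean).T).mean(axis=1),
])

new_beat = np.linspace(0, 1, 8)
k, dist2 = nearest_class(project(new_beat, mean, eigvecs), class_vectors)
```

A threshold on `dist2` would then decide whether the beat is accepted as a member of class `k`, as described in step d.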

Choosing components and forming a feature vector: The experiments yield 2040 components, corresponding to the dimensionality of the input sequence. Components that contribute significantly to the total energy of the signal are selected. The selected components together must constitute about 99% of the total energy of the signal. This procedure decreases the data dimensionality without significant loss of information. There are at least three proposed ways to eliminate eigenvectors.

First, the eigenvectors with the smallest eigenvalues are eliminated. This can be accomplished by discarding the last 60% of the total number of eigenvectors.

The second way is to use the minimum number of eigenvectors that guarantees that the energy E is greater than a threshold. A typical threshold is 0.9 (90% of the total energy). If we define $E_i$ as the energy of the ith eigenvector, it is the ratio of the sum of all eigenvalues up to and including i over the sum of all the eigenvalues:

$$E_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{k} \lambda_j}$$

where k is the total number of eigenvectors.

The third variation depends upon the stretching dimension. The stretch for the ith eigenvector is the ratio of that eigenvalue over the largest eigenvalue ($\lambda_1$):

$$S_i = \frac{\lambda_i}{\lambda_1}$$
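The energy and stretch criteria can be illustrated with a short NumPy sketch. The function names and the example eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow; the sketch assumes non-negative eigenvalues and a threshold below 1.

```python
import numpy as np

def cumulative_energy(eigvals):
    """E_i: cumulative share of the total eigenvalue sum, largest eigenvalue first."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return np.cumsum(vals) / vals.sum()

def n_components_for_energy(eigvals, threshold=0.9):
    """Smallest number of leading eigenvectors whose energy exceeds the threshold."""
    energy = cumulative_energy(eigvals)
    # side="right" returns the index of the first entry strictly greater than threshold
    return int(np.searchsorted(energy, threshold, side="right") + 1)

def stretch(eigvals):
    """S_i: each eigenvalue divided by the largest eigenvalue."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return vals / vals[0]

# Illustrative eigenvalues only (not taken from the experiments).
example = [5.0, 3.0, 1.5, 0.5]
energy = cumulative_energy(example)           # 0.5, 0.8, 0.95, 1.0
n_keep = n_components_for_energy(example)     # 3 components exceed 90% energy
s = stretch(example)                          # 1.0, 0.6, 0.3, 0.1
```

With these values, two eigenvectors carry only 80% of the energy, so a third is needed to pass the 0.9 threshold.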

In our proposed method, Genetic Algorithm (GA) is used to

select the best eigenvectors.

3.3.2 Genetic Algorithm Approach

Genetic Algorithm is an adaptive heuristic method of global-

optimization searching and it simulates the behaviour of the evolution process

in nature. It maps the search space into a genetic space. That is, every

candidate solution is encoded into a vector called a chromosome. One element

of the vector represents a gene. All of the chromosomes make up a population

and are evaluated according to the fitness function. A fitness value is

used to measure the “fitness” of a chromosome. Initial populations in the

genetic process are randomly created. GA then uses three operators to

produce a next generation from the current generation: reproduction,

crossover, and mutation. GA eliminates the chromosomes of low fitness and

keeps the ones of high fitness. This whole process is repeated, and more

chromosomes of high fitness move to the next generation, until a good

chromosome (individual) is found. The main objective of the genetic feature

selection stage is to reduce the dimensionality of the problem before the

supervised inductive learning process. Among the many wrapper algorithms

used, the Genetic Algorithm (GA), which solves optimization problems using

the methods of evolution, specifically “survival of the fittest”, has proved

promising. GA evaluates each individual’s fitness as well as the quality of

the solution. The fitter individuals are more eligible to enter into the next

generation as a population. After a required number of generations the final

set of optimal population with fittest chromosomes will emerge giving the

solution. The process of selection, crossover and mutation continues for a

fixed number of generations or till a termination condition is satisfied.

Genetic algorithms have been used for selecting the optimal subspace in

which the projected data gives higher recognition accuracy.

3.3.3 Genetic Principal Component Analysis

The input data is transformed to higher dimension using a non-

linear transfer function (polynomial function) and GA is used to select the

optimal subset of the non-linear principal components with the fitness

function taken as the recognition performance. As explained in the previous section, there are three possible ways to eliminate eigenvectors. Here, the GA is used to select the best eigenvectors for PCA. In general PCA, the M eigenvectors having the highest eigenvalues are selected. The main drawback of general PCA is that we cannot expect an equal contribution of principal components from each class, because the principal components are selected based only on the highest eigenvalues.

In the proposed method, only F eigenvectors are chosen for each class, so the reduced feature set contains S = NC × F features, where NC is the number of classes. Here there are two classes: ischemic and non-ischemic. The basic idea is that, instead of choosing the eigenvectors with the highest eigenvalues from the entire eigenspace, the best eigenvectors are chosen for each class based on Euclidean distance. For each class, the principal components are selected using GA as discussed below.

Initially, the eigenspace is grouped based on the number of classes (NC). A set of n eigenvectors is selected from each class at random. The index of each eigenvector is used to construct one chromosome. Similarly, N chromosomes are generated (N = 10). For example, consider the chromosome:

980 726 657 807 240 825 728 771 889 941…..

24 252 220 92 534 155 633 526 689 796…...

Each integer represents one eigenvector. The first integer, 980, stands for the 980th eigenvector from the first class; the 24 in the second row represents the 24th eigenvector from the second class. The total length of the chromosome is equal to the number of principal components required. Here, the size is kept at 600. For each chromosome, the Euclidean distance within the class (W) and between the classes (B) is calculated. The fitness value is calculated as:

f(x) = B / W

The chromosome which has the minimum fitness value (Gmin) is

stored as the best eigenvector set. Then the genetic operators are applied to

search for the optimum set.
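The text does not spell out how W and B are computed, so the sketch below fixes one plausible reading as an assumption: W is the mean Euclidean distance of each projected beat to its own class centroid, and B is the Euclidean distance between the two class centroids, both measured in the sub-space picked out by the chromosome. The function name, the flat (single-row) chromosome and the toy data are hypothetical.

```python
import numpy as np

def fitness(chromosome, projections, labels):
    """f(x) = B / W for the eigenvector indices listed in `chromosome`.

    projections: N x P array, each beat projected on all P candidate eigenvectors.
    labels: length-N array with two distinct class labels.
    """
    sub = projections[:, chromosome]              # keep only the selected components
    classes = np.unique(labels)
    centroids = np.array([sub[labels == c].mean(axis=0) for c in classes])
    # W (assumption): average distance of samples to their own class centroid
    within = np.mean([
        np.linalg.norm(sub[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # B (assumption): distance between the two class centroids
    between = np.linalg.norm(centroids[0] - centroids[1])
    return between / within

# Toy projections: component 0 separates the classes, component 1 does not.
projections = np.array([
    [0.0, 0.0],
    [0.2, 1.0],
    [5.0, 0.1],
    [5.2, 0.9],
])
labels = np.array([0, 0, 1, 1])

f_separating = fitness([0], projections, labels)   # large B, small W
f_noise = fitness([1], projections, labels)        # identical centroids, so B = 0
```

Under this reading, a chromosome built from discriminative components yields a larger ratio than one built from non-discriminative components; how that value is ranked (minimum or maximum) follows whatever convention the search procedure adopts.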

Reproduction (selection) – The selection process selects chromosomes from the mating pool directed by the survival of the fittest concept of natural genetic systems. In the proportional selection strategy adopted in this work, a chromosome is assigned a number of copies proportional to its fitness in the population; these copies go into the mating pool for further genetic operations. Roulette wheel selection is one common technique that implements the proportional selection strategy.

Crossover – is a probabilistic process that exchanges information

between two parent chromosomes for generating two child chromosomes. In

this work, single point crossover with a fixed crossover probability of pc=0.6

is used. For chromosomes of length l, a random integer, called the crossover

point, is generated in the range [1, l-1]. The portions of the chromosomes

lying to the right of the crossover point are exchanged to produce two

offspring.

Mutation – Each chromosome undergoes mutation with a fixed probability pm=0.003. For binary representation of chromosomes, a bit position (or gene) is mutated by simply flipping its value. Since we are considering real numbers, a random position is chosen in the chromosome and replaced by a random number between 0 and 9. The new population is generated after the genetic operators are applied. The current best eigenvector set (Lmin) is selected from the new population and compared with the global one. If the global set has a lower fitness value than the local one, the next iteration continues with the old population. Otherwise, the current population is used for the next iteration. This process is repeated for k iterations. Figure 3.1 shows a flow chart for genetic PCA based feature extraction.

The algorithm is given as:

1. Construct the initial population (p1) with random eigenvectors

2. Calculate the fitness value f(x) = B / W

3. Find out the Global minimum (Gmin)

4. For i = 1 to k do

   a. Perform reproduction

   b. Apply the crossover operator between each parent.

   c. Perform mutation and get the new population (p2)

   d. Calculate the local minimum (Lmin)

   e. If Gmin > Lmin then

      i. Gmin = Lmin

      ii. p1 = p2

5. Repeat

Figure 3.1 A Flow chart for genetic PCA based feature extraction
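The loop above can be sketched in plain Python. This is an illustrative sketch under several assumptions, not the code used in this work: the chromosome is a flat list of eigenvector indices, the score function is supplied by the caller, mutation replaces a gene with a random valid index, and roulette weights are the inverse of the score so that selection pressure agrees with the "keep the minimum" rule of steps 3 to 4e. All function names are hypothetical.

```python
import random

def roulette_select(population, scores, rng):
    """Roulette wheel selection; lower score is treated as better (assumption)."""
    weights = [1.0 / (s + 1e-12) for s in scores]
    return rng.choices(population, weights=weights, k=len(population))

def crossover(a, b, pc, rng):
    """Single-point crossover with probability pc; point drawn from [1, l-1]."""
    if len(a) > 1 and rng.random() < pc:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(chrom, pm, n_eigvecs, rng):
    """With probability pm, replace one random gene by a random eigenvector index."""
    chrom = chrom[:]
    if rng.random() < pm:
        chrom[rng.randrange(len(chrom))] = rng.randrange(n_eigvecs)
    return chrom

def run_ga(score, n_eigvecs, length, pop_size=10, pc=0.6, pm=0.003,
           iterations=50, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population p1 of random eigenvector indices
    p1 = [[rng.randrange(n_eigvecs) for _ in range(length)] for _ in range(pop_size)]
    scores = [score(c) for c in p1]               # step 2
    g_min = min(scores)                           # step 3
    best = p1[scores.index(g_min)][:]
    for _ in range(iterations):                   # step 4
        pool = roulette_select(p1, scores, rng)   # 4a: reproduction
        p2 = []
        for i in range(0, pop_size - 1, 2):       # 4b and 4c: crossover, mutation
            c1, c2 = crossover(pool[i], pool[i + 1], pc, rng)
            p2.append(mutate(c1, pm, n_eigvecs, rng))
            p2.append(mutate(c2, pm, n_eigvecs, rng))
        if len(p2) < pop_size:                    # odd population size
            p2.append(mutate(pool[-1], pm, n_eigvecs, rng))
        new_scores = [score(c) for c in p2]
        l_min = min(new_scores)                   # 4d
        if g_min > l_min:                         # 4e: accept the new population
            g_min = l_min
            best = p2[new_scores.index(l_min)][:]
            p1, scores = p2, new_scores
    return best, g_min

# Toy score for demonstration only: prefer small indices.
def toy_score(chromosome):
    return sum(chromosome)

best, g_min = run_ga(toy_score, n_eigvecs=100, length=6)
```

In practice `score` would wrap the B / W computation for a chromosome; the toy score is used here only so the sketch runs on its own.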


3.4 GENETIC PCA FOR ISCHEMIC BEATS CLASSIFICATION

Electrocardiography is a significant tool in analyzing the condition

of the heart. The ECG is the record of the variation of bioelectric potential

with respect to time as the human heart beats. It provides most valuable

information about the functional characteristics of the heart and cardiovascular

system. Myocardial ischemia is one of the diseases with highest incidence rate

in the industrialized countries. Prolonged severe or repeated ischemic

episodes can provoke irreversible damage to the cardiac tissue. ECG analysis

is not the most accurate method that exists to detect ischemic events. We

propose an improved version of PCA for feature extraction. Here, the

Genetic Algorithm (GA) is combined with PCA to extract more relevant

features. A Back propagation Neural Network is used to classify the beats into

either ischemic or non-ischemic, with the features from the GPCA.

Figure 3.2 Block diagram for GPCA based ischemic beats classification

The classifier employed in this work is a three-layer Back

Propagation Neural Network. The BPN optimizes the net for correct

responses to the training input data set. More than one hidden layer may be

beneficial for some applications, but one hidden layer is sufficient if enough

hidden neurons are used. Initially, the extracted features are normalized

to the range [0, 1]. That is, each value in the feature set is

divided by the maximum value from the set.

[Figure: input neurons connected through weights Wih to hidden neurons (output S1), which connect through weights Who to a single output neuron (output S2)]

Figure 3.3 A Three-layer back propagation network

These normalized values are assigned to the input neurons. The number of hidden neurons is equal to the number of input neurons, and there is only one output neuron. Figure 3.3 shows a three-layer back propagation network for classification. Initial weights are assigned randomly in the range [-0.5, 0.5]. The output from each hidden neuron is calculated using the sigmoid function,

$$S_1 = \frac{1}{1 + e^{-\lambda x}}$$

where $\lambda = 1$ and $x = \sum_i w_{ih} k_i$, where $w_{ih}$ is the weight assigned between the input and hidden layers, and $k_i$ is the input value. The output from the output layer is calculated using the sigmoid function,

$$S_2 = \frac{1}{1 + e^{-\lambda x}}$$

where $\lambda = 1$ and $x = \sum_i w_{ho} S_i$, where $w_{ho}$ is the weight assigned between the hidden and output layers, and $S_i$ is the output value from the hidden neurons. $S_2$ is subtracted from the desired output. Using this error (d) value, the weight change is calculated as:

$$\delta = d \, S_2 \, (1 - S_2)$$

and the weights assigned between the input and hidden layers and between the hidden and output layers are updated as:

$$W_{ho} = W_{ho} + n \, \delta \, S_1$$

$$W_{ih} = W_{ih} + n \, \delta \, k$$

where ‘n’ is the learning rate and ‘k’ is the input value. The outputs from the hidden and output neurons are then calculated again, the error (d) value is checked, and the weights are updated. This procedure is repeated until the network output is equal to the desired output. The network is trained to produce a 1.0 output value for ischemic and 0.1 output value for non-ischemic. The classification performance is validated using the ten-fold validation method and the results were analyzed by using ROC analysis. Figure 3.4 shows a flow chart for the three-layer back propagation neural network classifier.

Figure 3.4 A Flow chart for back propagation neural network classifier
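One training step of such a network can be sketched with NumPy as follows. This is a hedged illustration, not the classifier used in this work. The output delta follows the expression given in the text; for the hidden layer the sketch uses the standard chain-rule form of back propagation (the output delta scaled by the hidden-to-output weight and the hidden sigmoid derivative), which is an assumption that goes beyond the simplified input-to-hidden update written above. The function names, the three-value feature vector, the learning rate and the number of steps are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(k, target, w_ih, w_ho, lr):
    """One forward and backward pass for a single input vector k.

    w_ih: (hidden x input) weights, w_ho: (hidden,) weights to the output neuron.
    Returns the updated weights and the error d measured before the update.
    """
    s1 = sigmoid(w_ih @ k)                      # hidden outputs S1
    s2 = sigmoid(w_ho @ s1)                     # network output S2
    d = target - s2                             # error
    delta = d * s2 * (1.0 - s2)                 # output delta, as in the text
    # Hidden delta: standard back propagation (assumption, see lead-in)
    delta_h = delta * w_ho * s1 * (1.0 - s1)
    w_ho = w_ho + lr * delta * s1
    w_ih = w_ih + lr * np.outer(delta_h, k)
    return w_ih, w_ho, float(d)

rng = np.random.default_rng(0)

# Normalize the (hypothetical) features by their maximum, as described above.
raw = np.array([2.0, 4.0, 1.0])
k = raw / raw.max()

# As many hidden neurons as inputs, one output, weights in [-0.5, 0.5].
w_ih = rng.uniform(-0.5, 0.5, size=(3, 3))
w_ho = rng.uniform(-0.5, 0.5, size=3)

errors = []
for _ in range(200):
    w_ih, w_ho, d = train_step(k, 1.0, w_ih, w_ho, lr=0.5)   # target 1.0: ischemic
    errors.append(abs(d))
```

The list `errors` records how the absolute error shrinks as the single training example is presented repeatedly; a real run would cycle over all training beats in each fold.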


3.5 RESULTS AND DISCUSSION

The European ST-T Database is used for evaluation of our

proposed algorithm. This database consists of 90 annotated excerpts of

ambulatory ECG recordings from 79 subjects. The subjects were 70 men aged

30 to 84, and 8 women aged 55 to 71 (information is missing for one subject). The database includes 367 episodes of

ST segment change, and 401 episodes of T-wave change. Each record is two

hours in duration and contains two signals, each sampled at 250 samples per

second with 12-bit resolution over a nominal 20 millivolt input range.

Two cardiologists worked independently to annotate each record

beat-by-beat and for changes in ST segment and T-wave morphology, rhythm,

and signal quality. ST segment and T-wave changes were identified in both

leads (using predefined criteria which were applied uniformly in all cases),

and their onsets, extrema and ends were annotated. Annotations made by the

two cardiologists were compared, disagreements were resolved by the

coordinating group in Pisa, and the reference annotation files were prepared;

altogether, these files contain 802,866 annotations. Over half (48 of 90

complete records, and reference annotation files for all records) of this

database is freely available from PhysioNet. In this work, full-length ECG signals from 17 patients are used. For short duration analysis, 120 beats are taken from each record, giving 2040 beats in total; for long duration analysis, 20,400 beats are extracted. This dimensionality is reduced by GPCA as discussed in the earlier section. Figure 3.5 compares the sensitivity of the proposed method with that of existing feature extraction methods. As shown in the figure, GPCA gives consistent and improved results.

[Bar chart: sensitivity (percentage, axis from 70 to 95) for PCA, FUZZY and GPCA]

Figure 3.5 Comparison of sensitivity with feature extraction methods

Figure 3.6 shows the Az value of the existing and the proposed methods for automated ischemic beat classification. The area under the ROC curve, usually referred to as the Az index, is an important criterion for evaluating diagnostic performance. The value of Az is 1.0 when the diagnostic detection has perfect performance, which means that the TP rate is 100% and the FP rate is 0%. Table 3.1 shows the performance analysis of ischemic beat classification with sensitivity and Az value.

[Bar chart: Az value (axis from 0.7 to 0.95) for PCA, FUZZY and GPCA]

Figure 3.6 Comparison of Az value with feature extraction methods

Table 3.1 Performance analysis of ischemic beat classification

Methods                        Sensitivity   Az Value
Principal Component Analysis   80%           0.78
Fuzzy Approach                 81%           0.80
Genetic based PCA              92%           0.90

The Receiver Operating Characteristic (ROC) curve is one of the performance

measures for classification. ROC curves measure predictive utility by

showing the trade off between the true-positive rate and the false-positive rate

inherent in selecting specific thresholds on which predictions might be based.

The area under this curve represents the probability that, given a positive case

and a negative case, the classifier rule output will be higher for the positive

case and it is not dependent on the choice of decision threshold. Figure 3.7

shows the ROC curves for comparison of classification performances for the

proposed method.

Figure 3.7 ROC curve analysis of ischemic beat classification

The ROC curve conveniently displays diagnostic accuracy expressed in terms of

sensitivity (or true-positive rate) against (1 - specificity) (or false-positive

rate) at all possible threshold values. Performance of each test is characterized

in terms of its ability to identify true positives while rejecting false positives,

with the following definitions:

False Positive Fraction (FPF) = FP / (TN + FP)

True Positive Fraction (TPF) = TP / (TP + FN)

True Negative Fraction (TNF) = TN / (TN + FP)

False Negative Fraction (FNF) = FN / (TP + FN)

Where TP, TN, FP, and FN are the numbers of true positive, true

negative, false positive, and false negative test results, respectively. Note that

because every actual positive results in either a true positive or a false

negative, while every actual negative results in either a true negative or a false

positive, TPF is the ratio of true positives (actually positive and reported

positive) to actual positives, and TNF is the ratio of true negatives to actual

negatives. Two other quantities of interest for performance characterization

are defined in terms of the above quantities, as follows:

Sensitivity = TPF

Specificity = TNF = 1.0 – FPF
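As a quick worked example, the four fractions can be computed directly from the confusion counts. The sketch below uses the standard definitions, with each denominator being the number of actual positives (TP + FN) or actual negatives (TN + FP), and applies them to the long duration counts listed in Table 3.5. The function names are hypothetical.

```python
def roc_fractions(tp, fp, tn, fn):
    """Return TPF, FNF, TNF and FPF from the confusion counts."""
    positives = tp + fn      # all actually positive cases
    negatives = tn + fp      # all actually negative cases
    return {
        "TPF": tp / positives,   # sensitivity
        "FNF": fn / positives,
        "TNF": tn / negatives,   # specificity
        "FPF": fp / negatives,
    }

def accuracy(tp, fp, tn, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

# Confusion counts from the long duration row of Table 3.5.
m = roc_fractions(tp=2957, fp=2218, tn=14846, fn=378)
acc = accuracy(tp=2957, fp=2218, tn=14846, fn=378)

sensitivity = m["TPF"]           # roughly 0.887
specificity = m["TNF"]           # roughly 0.870
```

These values are in line with the sensitivity, specificity and accuracy reported for the long duration analysis, and they also show that TPF + FNF = 1 and TNF + FPF = 1 by construction.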

Choosing a value of threshold c defines an “operating point,” at

which the test has a particular combination of sensitivity and specificity. A

plot of TPF versus FPF for all possible operating points is the ROC curve for

test X, which makes explicit the trade off between sensitivity and specificity

for the test. Both TPF and FPF range from 0 to 1, so the ROC is often plotted

within a unit square. The results show that our proposed GPCA method

extracts more relevant features than the linear PCA and other methods.
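The threshold sweep and the area under the curve can be made concrete with a small sketch. It builds the ROC points by moving the threshold c over every observed classifier output, integrates the curve with the trapezoid rule, and cross-checks the result against the probabilistic reading of the area (the chance that a randomly chosen positive case scores higher than a randomly chosen negative case). The scores and labels are made-up illustrative values, not experimental results, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPF, TPF) pairs for every threshold c, where 'positive' means score >= c."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]                         # threshold above every score
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def auc_pairwise(scores, labels):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative classifier outputs (1 = ischemic, 0 = non-ischemic).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
az_trapezoid = auc_trapezoid(points)   # 0.8125
az_pairwise = auc_pairwise(scores, labels)   # 0.8125
```

Both computations give the same area for this toy example (13 of the 16 positive/negative pairs are ranked correctly), which illustrates why the Az value does not depend on any single choice of threshold.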

Table 3.2 The value of sensitivity at each fold for different extraction methods

Fold   PCA (%)   Fuzzy (%)   GPCA (%)
1      78        77          90
2      78        78          91
3      79        80          91
4      79        80          92
5      79        80          92
6      79        81          92
7      79        82          93
8      80        82          93
9      81        83          93
10     82        84          93

Table 3.2 shows the sensitivity at each fold. Here the ten-fold validation method has been applied to compare the performance with the linear PCA. The sensitivity is higher than that of the previously described algorithms, and the Az value is also better than that of the other methods.

Table 3.3 shows the performance of GPCA for detection rate. The average sensitivity, specificity and classification accuracy obtained by the evolved BPNNs were approximately 91%, 89.75% and 90.24%, respectively.

Table 3.3 Performance of GPCA for Detection rate

Record   Normal   Abnormal   TP    FP    TN     FN   Detection   FP rate   Se    Sp      Acc
number   beats    beats                              rate (%)    (%)       (%)   (%)     (%)
EO103    101      19         19    8     93     0    100         8         100   92      93
EO104    87       33         25    4     83     8    75          5         75    95      90
EO105    84       36         36    8     76     0    100         10        100   90      93
EO107    105      15         10    9     96     5    60          9         60    92      88
EO108    92       28         28    13    79     0    100         14        100   85      89
EO111    114      6          3     20    94     3    52          18        52    82      80
EO113    90       30         30    11    79     0    100         12        100   86      90
EO114    95       25         25    3     92     0    100         3         100   96      97
EO119    107      13         13    10    97     0    100         9         100   90      91
EO121    98       22         21    6     92     1    95          6         95    94      94
EO122    105      15         14    14    91     1    97          13        97    87      87
EO127    79       41         40    15    64     1    98          19        98    81      86
EO129    112      8          8     17    95     0    100         15        100   84      85
EO151    108      12         12    6     102    0    100         5         100   94      95
EO154    104      16         15    6     98     1    96          6         96    94      94
EO170    102      18         13    8     94     5    74          8         74    92      89
EO207    110      10         10    8     102    0    100         7         100   92      93
Total    1693     347        322   166   1527   25   91          9.8       91    89.75   90.24

Table 3.4 shows the performance analysis of accuracy at each fold. The average training and testing accuracies were 93.58% and 90.14%, respectively. The current approach is able to clarify the type of each detected episode (different types of ST segment vs. T-wave changes) with high rates of sensitivity, specificity and accuracy. Table 3.5 shows the performance analysis of GPCA for long duration ECGs.

Table 3.4 Performance analysis of accuracy at each fold

Fold      No of Training Beats    No of Testing Beats    Training       Testing
          Normal     Abnormal     Normal     Abnormal    Accuracy (%)   Accuracy (%)
F1        1529       307          164        40          89.1           87.01
F2        1544       292          149        55          89.27          84.82
F3        1526       310          167        37          79.55          72.05
F4        1518       318          175        29          93.23          90.06
F5        1521       315          172        32          95.15          91.27
F6        1524       312          169        35          99.56          97.56
F7        1575       261          153        51          100            99.34
F8        1473       363          180        24          100            99.59
F9        1466       370          187        17          91.25          84.25
F10       1516       320          177        27          98.75          95.45
Total     15192      3168         1693       347
Average                                                  93.58          90.14

Table 3.5 Performance analysis of GPCA for Long Duration ECGs

Total    Normal   Abnormal   TP     FP     TN      FN    Detection   FP rate   Se     Sp    Accuracy
beats    beats    beats                                  rate (%)    (%)       (%)    (%)   (%)
20400    17065    3335       2957   2218   14846   378   88.6        12.9      88.6   87    87.3

3.6 SUMMARY OF CONTRIBUTION

In this work, an enhanced version of PCA for ischemia detection has been proposed. The Genetic Algorithm (GA) is combined with PCA to select more relevant principal components from the feature set vector of ECG signals. Initially, the features are extracted from the ECG signals as eigenvectors and eigenvalues. Because the number of samples is large, the dimensionality of this vector space is reduced with the proposed Genetic based Principal Component Analysis (GPCA). These extracted features are fed into a three layer BPN to classify the beats into ischemic or non-ischemic. The results showed that the proposed GPCA method extracts more relevant features than linear PCA, and it was also evaluated on long duration ECG analysis.
