CHAPTER 3
FEATURE EXTRACTION USING GENETIC ALGORITHM
BASED PRINCIPAL COMPONENT ANALYSIS
3.1 INTRODUCTION
Cardiac beat classification is a key process in the detection of
myocardial ischemic episodes in the electrocardiographic signal. Myocardial
ischemia is caused by insufficient blood flow to the muscle tissue of the heart.
This reduced blood supply may be due to narrowing of the coronary arteries,
obstruction by a thrombus, or, less commonly, due to diffuse narrowing of
arterioles and other small vessels within the heart. Ischemia is one of the
leading causes of death in modern societies; consequently, its early
diagnosis and treatment are of great importance. In the ECG signal, ischemia is
expressed as slow dynamic changes of the ST segment and/or the T wave.
Long-duration electrocardiography, such as Holter recordings or continuous ECG
monitoring in the coronary care unit, is a simple and noninvasive method to
observe such alterations. The development of suitable automated analysis
techniques can make this type of ECG recording very effective in supporting
the physician's diagnosis and in guiding patient management in the clinic.
Accurate detection of ischemic episodes in the recorded ECG depends on the
correct classification of ischemic cardiac beats. Several techniques have been
proposed for ischemic beat classification; they evaluate ST segment changes
and T-wave alterations with different methodologies.
3.2 PREPROCESSING OF ECG
The main aim of ECG signal preprocessing is to prepare a compact
description of the ST-T complex, composed of the ST segment and the T wave,
for input to the classification methodology with minimum loss of information.
ECG recordings used for the diagnosis of ischemic episodes are often
affected by noise, which significantly deteriorates diagnostic accuracy.
Better handling of noisy ECGs can improve the accuracy of diagnostic methods
and broaden their use in everyday practice. There are three types of noise in
the ECG signal:
(a) Power line interference (A/C interference),
(b) Electromyographic contamination (EMG noise), and
(c) Baseline wandering (BW).
A/C interference contaminates the ECG signal with mains-frequency
interference (50 or 60 Hz), which is sometimes phase-shifted with respect to
the mains voltage. EMG noise is correlated with muscle contraction and
overlaps the frequency spectrum of the ECG signal, so removing EMG noise
inevitably alters the original ECG signal as well. Finally, baseline wandering
is caused by respiration and motion artifacts and is generally a
low-frequency noise.
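The chapter does not state which filters were used. As a minimal sketch, assuming a 250 Hz sampling rate (the rate of the European ST-T Database used in Section 3.5), the code below suppresses A/C interference with a second-order IIR notch filter and estimates baseline wandering with a 1 s moving average that is then subtracted. The function names, the pole radius `r = 0.98` and the window length are illustrative assumptions, not parameters of this work. EMG noise is not addressed because, as noted above, it overlaps the ECG spectrum.

```python
import numpy as np

def notch_filter(x, fs, f0=50.0, r=0.98):
    """Second-order IIR notch (hypothetical parameters): zeros on the unit
    circle at +/-f0, poles at radius r just inside it, DC gain normalised to 1."""
    w0 = 2.0 * np.pi * f0 / fs
    b = np.array([1.0, -2.0 * np.cos(w0), 1.0])
    a = np.array([1.0, -2.0 * r * np.cos(w0), r * r])
    b *= a.sum() / b.sum()          # make H(z=1) = 1 so the signal level is kept
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):         # direct-form I difference equation
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y

def remove_baseline(x, fs, window_s=1.0):
    """Subtract a moving-average estimate of the low-frequency baseline."""
    x = np.asarray(x, dtype=float)
    w = int(window_s * fs) | 1      # force an odd window length
    padded = np.pad(x, w // 2, mode="edge")   # repeat end samples to avoid edge dips
    baseline = np.convolve(padded, np.ones(w) / w, mode="valid")
    return x - baseline
```

For a 60 Hz mains supply, `notch_filter(x, fs, f0=60.0)` would be used instead; a median filter is a common alternative to the moving average when the baseline estimate must be robust to QRS complexes.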
3.3 FEATURE EXTRACTION USING GENETIC PRINCIPAL
COMPONENT ANALYSIS
This section describes the feature extraction process applied to the beat
signals extracted from the electrocardiograms. Two methods are used for feature
extraction, namely PCA and GPCA. The main goal of this work is to develop
algorithms that automatically detect ischemic episodes. For this purpose,
features based on ST segment deviation, T-wave changes and QRS complex
morphology changes were extracted.
3.3.1 Feature Extraction Using Principal Component Analysis
Principal Component Analysis (PCA) is an exploratory multivariate statistical
technique for simplifying complex data sets. The PCA transformation is selected
as the tool for reducing the dimensionality of the extracted ST-T samples. The
PCA decomposition is optimal in terms of second-order statistics, in the sense
that it permits an optimal reconstruction of the original data in the
mean-square error sense (subject to the dimensionality constraint). The PCA
transformation describes the original vectors (ST-T complexes) along the
directions of maximum variance in the training set, which are obtained by
analyzing the data covariance matrix. The orthogonal eigenvectors of the
covariance matrix are selected as basis functions for the signal projection
operation. The corresponding eigenvalues represent the average dispersion of the
projection of the input vectors onto the corresponding eigenvectors (basis
functions). The numerical value of each eigenvalue quantifies the amount of
variance that is accounted for by projecting the signal onto the corresponding
eigenvector. Accordingly, it represents the contribution of that eigenvector's
direction to the signal reconstruction in the mean-squared error sense.
For the analysis of the ECG signal, the eigenvalues after the fifth have very
small numerical values. Thus, the first five PCA coefficients, which account for
about 97.9% of the signal energy, were used to represent the ST-T complex. A
small performance improvement was observed when using the first five PCA
coefficients instead of four. The five principal components extracted from each
ST-T complex are assigned to the corresponding QRS fiducial point. The first
principal component (PC) and, to a lesser extent, the second represent the
dominant low-frequency component of the ST-T complex; the third, fourth and
fifth contain more high-frequency energy. In the time-series representation of
the PCs, the ischemic episodes appear as peaks. A straightforward way to detect
ischemic beats from the PCA representation is to use the PCA coefficients of a
single beat as the input vector. This approach, however, accounts only for local
information. Therefore, a better approach is needed that also extracts
morphological information from the ST-T episodes, so as to distinguish artifacts
and to detect even weak ST episodes. Such an approach should take into account
the information from a sequence of beats instead of a single beat.
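The text does not specify how the information from a sequence of beats is combined. One simple option, assumed here purely for illustration, is to concatenate the five PCA coefficients of each beat with those of the neighbouring beats, so that each feature vector describes a short window of consecutive ST-T complexes. The function name `beat_windows` and the window length `w` are hypothetical.

```python
import numpy as np

def beat_windows(pca_coeffs, w=3):
    """Stack the PCA coefficients of w consecutive beats into one feature vector.

    pca_coeffs: array of shape (n_beats, n_components), e.g. 5 PCs per beat.
    Returns shape (n_beats - w + 1, w * n_components); row i describes
    beats i .. i+w-1 and can be assigned to the last beat of the window.
    """
    pca_coeffs = np.asarray(pca_coeffs, dtype=float)
    n_beats, _ = pca_coeffs.shape
    if n_beats < w:
        raise ValueError("need at least w beats")
    return np.stack([pca_coeffs[i:i + w].ravel() for i in range(n_beats - w + 1)])
```

A sliding window of this kind lets a classifier see whether an ST-T deviation persists over several beats, which is one way to separate a sustained episode from a single noisy beat.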
Given n observations on m variables, the goal of PCA is to reduce the
dimensionality of the data matrix by finding r new variables, where r is less
than m. Principal components project high-dimensional data into the subspace
spanned by the eigenvectors with the r largest eigenvalues; the components are
mutually uncorrelated and orthogonal. Each principal component is a linear
combination of the original variables. The principal components are obtained
from a vector set X represented by an N×M matrix, where N is the number of
segments and M is the dimension of the vectors that constitute the vector set.
The algorithm of PCA is as follows:
a. Obtain the mean vector:
$$\bar{x} = \frac{1}{N}\sum_{i=0}^{N-1} x_i$$
b. Obtain the covariance matrix:
$$C_x = \frac{1}{N}\sum_{i=0}^{N-1} (x_i - \bar{x})(x_i - \bar{x})^T$$
c. Obtain the eigenvectors and eigenvalues from
$$C_x e = \lambda e$$
where e is an eigenvector and λ is the corresponding eigenvalue.
d. After the eigenspace has been created, recognition can proceed. A new beat
Γ of an individual is arranged in the same way as the training vectors, the
mean vector is subtracted, and the result is projected into the eigenspace:
$$\omega_k = e_k^T (\Gamma - \bar{x}), \qquad k = 1, \ldots, M'$$
These values together form the vector $\Omega^T = [\omega_1, \omega_2, \ldots, \omega_{M'}]$,
which is then used to establish which of the predefined classes best
describes the new signal. The simplest approach is to choose the class k that
minimizes the Euclidean distance
$$\varepsilon_k^2 = \lVert \Omega - \Omega_k \rVert^2$$
where $\Omega_k$ is a vector describing the k-th signal class. A signal is
classified as belonging to that class when the minimum $\varepsilon_k$ (i.e. the
maximum matching score) is below a chosen threshold.
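Steps a to d can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: each class template Ω_k is taken to be the mean projection of that class's training beats, the threshold is a free parameter, and the function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """X: (N, M) matrix with one beat vector per row. Returns the mean vector,
    the top eigenvalues and the matching eigenvectors (as columns)."""
    mean = X.mean(axis=0)                           # step a: mean vector
    Xc = X - mean
    C = Xc.T @ Xc / X.shape[0]                      # step b: covariance, 1/N normalisation
    eigvals, eigvecs = np.linalg.eigh(C)            # step c: C e = lambda e (ascending)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvals[order], eigvecs[:, order]  # columns are e_1 .. e_M'

def project(beats, mean, E):
    """Step d: omega_k = e_k^T (Gamma - mean) for every beat (row) in `beats`."""
    return (np.atleast_2d(beats) - mean) @ E

def classify(beat, mean, E, templates, threshold):
    """Nearest class template by squared Euclidean distance; None if too far."""
    omega = project(beat, mean, E)[0]
    labels = list(templates)
    dists = [np.sum((omega - templates[c]) ** 2) for c in labels]
    best = int(np.argmin(dists))
    return labels[best] if dists[best] < threshold else None
```

A usage pattern is to fit on the training beats, build `templates = {label: project(class_beats, mean, E).mean(axis=0)}` for each class, and then call `classify` on each new beat.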
Choosing components and forming a feature vector: in our experiments, 2040
components are obtained, corresponding to the dimensionality of the input
sequence. Components that contribute significantly to the total energy of the
signal are selected; together they must account for about 99% of the total
energy of the signal. This procedure decreases the data dimensionality without
significant loss of information. There are at least three proposed ways to
eliminate eigenvectors.
The first, mentioned above, is to eliminate the eigenvectors with the
smallest eigenvalues. This can be accomplished by discarding the last
60% of the total number of eigenvectors.
The second way is to use the minimum number of eigenvectors that
guarantees that the energy E is greater than a threshold. A typical
threshold is 0.9 (90% of total energy). If we define E_i as the energy
of the i-th eigenvector, it is the ratio of the sum of all eigenvalues
up to and including i over the sum of all the eigenvalues:
$$E_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{k} \lambda_j}$$
where k is the total number of eigenvectors.
The third variation depends upon the stretching dimension. The
stretch for the i-th eigenvector is the ratio of that eigenvalue to
the largest eigenvalue ($\lambda_1$):
$$S_i = \frac{\lambda_i}{\lambda_1}$$
In the proposed method, a Genetic Algorithm (GA) is used to
select the best eigenvectors.
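The three elimination rules can be expressed directly in terms of the eigenvalues sorted in descending order. The sketch below returns how many leading eigenvectors each rule keeps; the 60% discard fraction and the 0.9 energy threshold come from the text, while the stretch cut-off `s_min` is a hypothetical parameter because no value is given for it. The function names are also illustrative.

```python
import numpy as np

def keep_by_fraction(eigvals, discard=0.6):
    """Rule 1: discard the last 60% of the eigenvectors (smallest eigenvalues)."""
    return int(np.ceil(len(eigvals) * (1.0 - discard)))

def keep_by_energy(eigvals, threshold=0.9):
    """Rule 2: smallest i with E_i = sum_{j<=i} lambda_j / sum_j lambda_j > threshold.
    Assumes threshold < 1."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    energy = np.cumsum(lam) / lam.sum()
    return int(np.argmax(energy > threshold)) + 1

def keep_by_stretch(eigvals, s_min=0.01):
    """Rule 3: keep eigenvectors whose stretch S_i = lambda_i / lambda_1 >= s_min."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return int(np.sum(lam / lam[0] >= s_min))
```

All three rules rank eigenvectors only by eigenvalue, which is the limitation the GA-based selection described next is meant to address.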
3.3.2 Genetic Algorithm Approach
Genetic Algorithm is an adaptive heuristic method of global-optimization search
that simulates the behaviour of the evolution process in nature. It maps the
search space into a genetic space; that is, every possible solution is encoded
into a vector called a chromosome, and each element of the vector represents a
gene. All the chromosomes make up a population and are evaluated according to a
fitness function, whose value measures the "fitness" of a chromosome. The
initial population is created randomly. GA then uses three operators to produce
the next generation from the current one: reproduction, crossover and mutation.
GA eliminates chromosomes of low fitness and keeps those of high fitness. The
whole process is repeated, and more chromosomes of high fitness move to the next
generation, until a good chromosome (individual) is found. The main objective of
the genetic feature selection stage is to reduce the dimensionality of the
problem before the supervised inductive learning process. Among the many wrapper
algorithms used, the GA, which solves optimization problems using the methods of
evolution, specifically "survival of the fittest", has proved promising. GA
evaluates each individual's fitness as well as the quality of the solution. The
fitter individuals are more likely to enter the next generation. The process of
selection, crossover and mutation continues for a fixed number of generations or
until a termination condition is satisfied, at which point the final population
of fittest chromosomes gives the solution. Genetic algorithms have been used to
select the optimal subspace in which the projected data gives higher recognition
accuracy.
3.3.3 Genetic Principal Component Analysis
The input data is transformed to a higher dimension using a non-linear transfer
function (polynomial function), and the GA is used to select the optimal subset
of the non-linear principal components, with recognition performance as the
fitness function. As explained in the previous section, there are three possible
ways to eliminate eigenvectors. Here, the GA is used to select the best
eigenvectors for PCA. In conventional PCA, the M eigenvectors with the highest
eigenvalues are selected. The main drawback of conventional PCA is that the
principal components cannot be expected to contribute equally from each class,
because they are selected based only on the highest eigenvalues.
In the proposed method, only F eigenvectors are chosen for each class, so the
reduced feature set contains S = NC × F features, where NC is the number of
classes. In this case there are two classes: ischemic and non-ischemic. The
basic idea is that, instead of choosing the eigenvectors with the highest
eigenvalues from the entire eigenspace, the best eigenvectors for each class are
chosen based on the Euclidean distance. For each class, the principal components
are selected using the GA as follows.
Initially, the eigenspace is grouped by class (NC groups), and n eigenvectors
are selected at random from each class. The indices of these eigenvectors form
one chromosome. In the same way, N chromosomes are generated (N = 10). For
example, consider the chromosome:
980 726 657 807 240 825 728 771 889 941…..
24 252 220 92 534 155 633 526 689 796…...
Each integer represents one eigenvector: 980 in the first row stands for the
980th eigenvector from the first class, and 24 in the second row represents the
24th eigenvector from the second class. The total length of the chromosome is
equal to the number of principal components required; here, it was set to 600.
For each chromosome, the Euclidean distance within the class (W) and between the
classes (B) is calculated. The fitness value is calculated as:
f(x) = B / W
The chromosome which has the minimum fitness value (Gmin) is
stored as the best eigenvector set. Then the genetic operators are applied to
search for the optimum set.
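The text does not define W and B precisely. A minimal sketch, under assumptions that are ours: eigenvectors are computed from each class's own covariance matrix, a chromosome is a mapping from class label to a list of eigenvector indices (one "row" per class), W is the mean Euclidean distance of each projected beat to its class mean, and B is the mean distance between class means. All function names are hypothetical. The text stores the chromosome with the minimum f as the best set; since a larger B/W usually indicates better class separation, the direction of the search is worth checking against the original implementation.

```python
import numpy as np

def class_eigenvectors(X, y):
    """Eigenvectors (columns, descending eigenvalue) of each class's covariance."""
    bases = {}
    for c in np.unique(y):
        C = np.cov(X[y == c], rowvar=False, bias=True)
        vals, vecs = np.linalg.eigh(C)
        bases[c] = vecs[:, np.argsort(vals)[::-1]]
    return bases

def chromosome_basis(bases, chromosome):
    """chromosome: {class: [eigenvector indices]}; returns the stacked basis."""
    return np.hstack([bases[c][:, idx] for c, idx in chromosome.items()])

def fitness(X, y, basis):
    """f = B / W computed on the data projected onto the selected eigenvectors."""
    Z = (X - X.mean(axis=0)) @ basis
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    # W: mean distance of each sample to the mean of its own class
    W = np.linalg.norm(Z - means[np.searchsorted(classes, y)], axis=1).mean()
    # B: mean distance between every pair of class means
    B = np.mean([np.linalg.norm(means[a] - means[b])
                 for a in range(len(classes)) for b in range(a + 1, len(classes))])
    return B / W
```

Because an orthonormal basis preserves Euclidean distances, using all eigenvectors of one class gives the same ratio as the raw features; selecting a subset is what changes the value.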
Reproduction (selection) – The selection process selects chromosomes for
the mating pool, guided by the survival-of-the-fittest concept of natural
genetic systems. In the proportional selection strategy adopted in this work,
each chromosome is assigned a number of copies in the mating pool proportional
to its fitness in the population; these copies then undergo further genetic
operations. Roulette wheel selection is one common technique that implements
the proportional selection strategy.
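A minimal NumPy sketch of roulette wheel selection, assuming non-negative fitness values where larger is better (the usual convention for proportional selection). The name `roulette_select` is hypothetical.

```python
import numpy as np

def roulette_select(fitness, n_select, rng):
    """Proportional (roulette wheel) selection: individual i is picked with
    probability fitness[i] / sum(fitness). Fitness values must be non-negative."""
    fitness = np.asarray(fitness, dtype=float)
    wheel = np.cumsum(fitness)                  # slot boundaries on the wheel
    spins = rng.random(n_select) * wheel[-1]    # uniform points in [0, total)
    return np.searchsorted(wheel, spins, side="right")
```

The returned indices pick chromosomes for the mating pool, e.g. `pool = population[roulette_select(f, len(population), np.random.default_rng())]`; an individual with zero fitness occupies no slot and is never chosen.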
Crossover – a probabilistic process that exchanges information between two
parent chromosomes to generate two child chromosomes. In this work, single-point
crossover with a fixed crossover probability of pc = 0.6 is used. For
chromosomes of length l, a random integer, called the crossover point, is
generated in the range [1, l-1]. The portions of the chromosomes lying to the
right of the crossover point are exchanged to produce two offspring.
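Single-point crossover as described above, applied to integer chromosomes of eigenvector indices. A short sketch using only the standard library; the function name is hypothetical.

```python
import random

def single_point_crossover(parent1, parent2, pc=0.6, rng=random):
    """With probability pc, cut both parents at a random point in [1, l-1]
    and swap the tails; otherwise return unchanged copies of the parents."""
    l = len(parent1)
    if l < 2 or rng.random() >= pc:
        return list(parent1), list(parent2)
    point = rng.randint(1, l - 1)               # randint includes both ends
    child1 = list(parent1[:point]) + list(parent2[point:])
    child2 = list(parent2[:point]) + list(parent1[point:])
    return child1, child2
```

Passing a seeded `random.Random` instance as `rng` makes the operator reproducible across runs.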
Mutation – Each chromosome undergoes mutation with a fixed probability
pm = 0.003. For a binary representation of chromosomes, a bit position (or gene)
is mutated by simply flipping its value. Since the genes here are numbers rather
than bits, a random position in the chromosome is chosen and replaced by a
random number between 0 and 9. A new population is generated after the genetic
operators are applied. The current best eigenvector set (Lmin) is selected from
the new population and compared with the global best (Gmin). If the global set
has a lower fitness value than the local one, the next iteration continues with
the old population; otherwise, the current population is used for the next
iteration. This process is repeated for k iterations. Figure 3.1 shows a flow
chart for Genetic PCA based feature extraction.
The algorithm is given as:
1. Construct the initial population (p1) with random eigenvectors.
2. Calculate the fitness value f(x) = B / W of each chromosome.
3. Find the global minimum (Gmin).
4. For i = 1 to k do
a. Perform reproduction.
b. Apply the crossover operator to each pair of parents.
c. Perform mutation to obtain the new population (p2).
d. Calculate the local minimum (Lmin).
e. If Gmin > Lmin then
i. Gmin = Lmin
ii. p1 = p2
5. End for; Gmin holds the selected eigenvector set.
Figure 3.1 A Flow chart for genetic PCA based feature extraction
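Putting the steps together, the loop below is a minimal sketch of the procedure above, not the author's code. Assumptions, all hypothetical: chromosomes are flat integer arrays of eigenvector indices; `cost` is any positive function to be minimised (to match the Gmin/Lmin logic one could supply a ratio such as W/B, since a larger B/W usually indicates better separation); selection probabilities are proportional to 1/cost so that low-cost chromosomes are favoured; and mutation replaces one gene with a random digit 0 to 9, as the text describes.

```python
import numpy as np

def genetic_search(cost, n_genes, n_indices, pop_size=10, iterations=50,
                   pc=0.6, pm=0.003, seed=0):
    """Minimise cost(chromosome) over integer chromosomes of eigenvector indices.

    Hypothetical helper following steps 1-5 above. Assumes cost() > 0,
    n_genes >= 2 and n_indices >= 10 (mutation draws a digit 0-9).
    Returns (best chromosome, its cost, best cost of the initial population).
    """
    rng = np.random.default_rng(seed)
    # Step 1: initial population p1 of random eigenvector indices
    p1 = rng.integers(0, n_indices, size=(pop_size, n_genes))
    # Step 2: evaluate every chromosome
    costs = np.array([cost(c) for c in p1])
    # Step 3: global minimum Gmin
    g = int(np.argmin(costs))
    g_best, g_cost = p1[g].copy(), costs[g]
    initial_cost = g_cost
    for _ in range(iterations):                      # Step 4
        # a. Reproduction: roulette wheel with weights 1/cost (low cost favoured)
        w = 1.0 / costs
        parents = p1[rng.choice(pop_size, size=pop_size, p=w / w.sum())]
        # b. Single-point crossover on consecutive pairs with probability pc
        p2 = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                point = rng.integers(1, n_genes)     # crossover point in [1, l-1]
                p2[i, point:] = parents[i + 1, point:]
                p2[i + 1, point:] = parents[i, point:]
        # c. Mutation: with probability pm, one random gene becomes a digit 0-9
        for chrom in p2:
            if rng.random() < pm:
                chrom[rng.integers(0, n_genes)] = rng.integers(0, 10)
        new_costs = np.array([cost(c) for c in p2])
        # d. Local minimum Lmin of the new population
        loc = int(np.argmin(new_costs))
        # e. Keep the new population only if it improves on Gmin
        if g_cost > new_costs[loc]:
            g_best, g_cost = p2[loc].copy(), new_costs[loc]
            p1, costs = p2, new_costs
    return g_best, g_cost, initial_cost
```

Because step e only replaces Gmin when the new population does better, the returned cost can never exceed the best cost of the initial population.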
3.4 GENETIC PCA FOR ISCHEMIC BEATS CLASSIFICATION
Electrocardiography is a significant tool in analyzing the condition
of the heart. The ECG is the record of discrepancy of bioelectric potential
with respect to time as the human heart beats. It provides most valuable
information about the functional characteristics of the heart and cardiovascular
system. Myocardial ischemia is one of the diseases with highest incidence rate
in the industrialized countries. Prolonged severe or repeated ischemic
episodes can provoke irreversible damage to the cardiac tissue. ECG analysis
is not the most accurate method that exists to detect the ischemic events. We
proposed an improved version of PCA for feature extraction. Here, the
Genetic Algorithm (GA) is combined with PCA to extract more relevant
features. A Back propagation Neural Network is used to classify the beats into
either ischemic or non-ischemic, with the features from the GPCA.
Figure 3.2 Block diagram for GPCA based ischemic beats classification
The classifier employed in this work is a three-layer Back Propagation Neural
Network (BPN). The BPN optimizes the network for correct responses to the
training input data set. More than one hidden layer may be beneficial for some
applications, but one hidden layer is sufficient if enough hidden neurons are
used. First, the features obtained from GPCA are normalized to [0, 1]; that is,
each value in the feature set is divided by the maximum value in the set.
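The normalization rule is shown below. Because PCA coefficients can be negative, dividing by the maximum does not by itself guarantee the [0, 1] range; the min-max variant is included as an alternative for signed features, not as the method used here. Both function names are hypothetical.

```python
import numpy as np

def max_normalize(features):
    """Divide every value by the largest value in the feature set, as described
    above. Maps to [0, 1] only when all features are non-negative."""
    features = np.asarray(features, dtype=float)
    return features / features.max()

def min_max_normalize(features):
    """Alternative for signed features such as PCA coefficients:
    (f - min) / (max - min), which always lands in [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(), features.max()
    return (features - lo) / (hi - lo)
```

In a cross-validation setting the maximum (or minimum and maximum) would normally be taken from the training folds only and reused for the test fold.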
Figure 3.3 A Three-layer back propagation network
These normalized values are assigned to the input neurons. The number of hidden
neurons is equal to the number of input neurons, and there is only one output
neuron. Figure 3.3 shows a three-layer back-propagation network for
classification. Initial weights are assigned randomly in the range [-0.5, 0.5].
The output of each hidden neuron is calculated using the sigmoid function
$$S_1 = \frac{1}{1 + e^{-x}}, \qquad x = \sum_i w_{ih} k_i$$
where the slope of the sigmoid is set to 1, $w_{ih}$ is the weight between the
input and hidden layers, and k is the input value. The output of the output
layer is calculated using the sigmoid function
$$S_2 = \frac{1}{1 + e^{-x}}, \qquad x = \sum_i w_{ho} S_i$$
where $w_{ho}$ is the weight between the hidden and output layers and $S_i$ is
the output of the i-th hidden neuron. S2 is subtracted from the desired output
to give the error d, from which the weight change is calculated as:
delta = d * S2 * (1 - S2)
and the weights between the input and hidden layers and between the hidden and
output layers are updated as:
Who = Who + (n * delta * S1)
Wih = Wih + (n * delta * k)
where n is the learning rate and k is the input value. The outputs of the hidden
and output neurons are then recalculated, the error d is checked again, and the
weights are updated. This procedure is repeated until the network output matches
the desired output. The network is trained to produce an output value of 1.0 for
ischemic beats and 0.1 for non-ischemic beats. The classification performance
is validated using ten-fold cross-validation, and the results are analyzed using
ROC analysis. Figure 3.4 shows a flow chart of the three-layer back-propagation
neural network classifier.
Figure 3.4 A Flow chart for back propagation neural network classifier
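A minimal NumPy sketch of the three-layer network described above: as many hidden neurons as inputs, one sigmoid output, weights initialised uniformly in [-0.5, 0.5], and targets of 1.0 (ischemic) and 0.1 (non-ischemic). Two departures are assumptions, stated plainly: bias terms are added, and the hidden-layer weights are updated with the error back-propagated through the output weights (the standard back-propagation rule) rather than with the output delta alone. The class name, learning rate, epoch count and the 0.55 decision threshold (midway between the two targets) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerBPN:
    """n inputs -> n hidden sigmoid neurons -> 1 sigmoid output."""

    def __init__(self, n_inputs, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.uniform(-0.5, 0.5, size=(n_inputs, n_inputs))  # input -> hidden
        self.b_h = rng.uniform(-0.5, 0.5, size=n_inputs)               # hidden biases
        self.W_ho = rng.uniform(-0.5, 0.5, size=n_inputs)              # hidden -> output
        self.b_o = rng.uniform(-0.5, 0.5)                              # output bias
        self.lr = lr

    def forward(self, k):
        s1 = sigmoid(k @ self.W_ih + self.b_h)    # hidden outputs S1
        s2 = sigmoid(s1 @ self.W_ho + self.b_o)   # network output S2
        return s1, s2

    def train(self, X, targets, epochs=2000):
        for _ in range(epochs):
            for k, t in zip(X, targets):          # online (per-beat) updates
                s1, s2 = self.forward(k)
                d = t - s2                                      # error
                delta_o = d * s2 * (1 - s2)                     # output delta
                delta_h = delta_o * self.W_ho * s1 * (1 - s1)   # hidden deltas
                self.W_ho += self.lr * delta_o * s1
                self.b_o += self.lr * delta_o
                self.W_ih += self.lr * np.outer(k, delta_h)
                self.b_h += self.lr * delta_h

    def predict(self, X):
        return np.array([self.forward(k)[1] for k in X])
```

A beat would then be labelled ischemic when `predict` returns a value above 0.55; in practice the threshold is what the ROC analysis below sweeps over.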
3.5 RESULTS AND DISCUSSION
The European ST-T Database is used for evaluation of our
proposed algorithm. This database consists of 90 annotated excerpts of
ambulatory ECG recordings from 79 subjects. The subjects were 70 men aged
30 to 84, and 8 women aged 55 to 71. The database includes 367 episodes of
ST segment change, and 401 episodes of T-wave change. Each record is two
hours in duration and contains two signals, each sampled at 250 samples per
second with 12-bit resolution over a nominal 20 millivolt input range.
Two cardiologists worked independently to annotate each record
beat-by-beat and for changes in ST segment and T-wave morphology, rhythm,
and signal quality. ST segment and T-wave changes were identified in both
leads (using predefined criteria which were applied uniformly in all cases),
and their onsets, extrema and ends were annotated. Annotations made by the
two cardiologists were compared, disagreements were resolved by the
coordinating group in Pisa, and the reference annotation files were prepared;
altogether, these files contain 802,866 annotations. Over half (48 of 90
complete records, and reference annotation files for all records) of this
database is freely available from PhysioNet. In this work, we used the
full-length ECG signals from 17 patients; 120 beats were taken from each signal,
giving 2040 beats in total for the short-duration analysis, and 20,400 beats
were extracted for the long-duration analysis. This dimensionality is reduced by
GPCA as discussed in the earlier section. Figure 3.5 compares the sensitivity
obtained with the proposed and existing feature extraction methods. As the
figure shows, GPCA gives consistently improved results.
[Figure: bar chart of sensitivity (percentage) for PCA, FUZZY and GPCA]
Figure 3.5 Comparison of sensitivity with feature extraction methods
Figure 3.6 shows the Az value of the existing and proposed methods for automated
ischemic beat classification. The area under the ROC curve, usually referred to
as the Az index, is an important criterion for evaluating diagnostic
performance. The value of Az is 1.0 when detection is perfect, which means that
the TP rate is 100% and the FP rate is 0%. Table 3.1 summarizes the sensitivity
and Az value of each method.
[Figure: bar chart of Az value for PCA, FUZZY and GPCA]
Figure 3.6 Comparison of Az value with feature extraction methods
Table 3.1 Performance analysis of ischemic beat classification
Methods                        Sensitivity   Az value
Principal Component Analysis   80%           0.78
Fuzzy Approach                 81%           0.80
Genetic based PCA              92%           0.90
The Receiver Operating Characteristic (ROC) curve is one of the performance
measures for classification. ROC curves measure predictive utility by showing
the trade-off between the true-positive rate and the false-positive rate
inherent in selecting specific thresholds on which predictions might be based.
The area under this curve represents the probability that, given a positive case
and a negative case, the classifier output will be higher for the positive case;
it does not depend on the choice of decision threshold. Figure 3.7 shows the ROC
curves comparing the classification performance of the proposed method.
Figure 3.7 ROC curve analysis of ischemic beat classification
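The probabilistic interpretation of the area under the ROC curve given above suggests a direct way to compute Az from classifier outputs: count, over all positive/negative pairs, how often the positive case scores higher, counting ties as one half (the Mann-Whitney statistic). A minimal pure-Python sketch; `az_value` is a hypothetical name.

```python
def az_value(pos_scores, neg_scores):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The double loop is quadratic in the number of beats; for tens of thousands of beats a rank-based formulation of the same statistic is preferable.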
The ROC curve conveniently displays diagnostic accuracy expressed in terms of
sensitivity (true-positive rate) against 1 - specificity (false-positive rate)
at all possible threshold values. The performance of each test is characterized
in terms of its ability to identify true positives while rejecting false
positives, with the following definitions:
False Positive Fraction (FPF) = FP / (TN + FP)
True Positive Fraction (TPF) = TP / (TP + FN)
True Negative Fraction (TNF) = TN / (TN + FP)
False Negative Fraction (FNF) = FN / (TP + FN)
Where TP, TN, FP, and FN are the numbers of true positive, true
negative, false positive, and false negative test results, respectively. Note that
because every actual positive results in either a true positive or a false
negative, while every actual negative results in either a true negative or a false
positive, TPF is the ratio of true positives (actually positive and reported
positive) to actual positives, and TNF is the ratio of true negatives to actual
negatives. Two other quantities of interest for performance characterization
are defined in terms of the above quantities, as follows:
Sensitivity = TPF
Specificity = TNF = 1.0 – FPF
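The definitions above translate directly into code. As a worked example, the sketch applies them to the pooled counts in the Total row of Table 3.3 (TP = 322, FP = 166, TN = 1527, FN = 25), giving a pooled sensitivity, specificity and accuracy of about 92.8%, 90.2% and 90.6%. These differ slightly from the 91%, 89.75% and 90.24% in that row, which agree, to rounding, with the averages of the per-record percentages. The function name is hypothetical.

```python
def confusion_fractions(tp, fp, tn, fn):
    """TPF, FNF, TNF, FPF as defined above, plus overall accuracy."""
    return {
        "TPF": tp / (tp + fn),    # sensitivity
        "FNF": fn / (tp + fn),
        "TNF": tn / (tn + fp),    # specificity = 1 - FPF
        "FPF": fp / (tn + fp),
        "ACC": (tp + tn) / (tp + fp + tn + fn),
    }

# Pooled counts from the Total row of Table 3.3
pooled = confusion_fractions(tp=322, fp=166, tn=1527, fn=25)
for name, value in pooled.items():
    print(f"{name}: {value:.3f}")
```

Pooling weights every beat equally, whereas averaging per-record rates weights every record equally; records with few abnormal beats (such as EO111) therefore influence the averaged figures more.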
Choosing a value of the threshold c defines an "operating point", at which the
test has a particular combination of sensitivity and specificity. A plot of TPF
versus FPF for all possible operating points is the ROC curve of the test, which
makes explicit the trade-off between sensitivity and specificity. Both TPF and
FPF range from 0 to 1, so the ROC curve is often plotted within a unit square.
The results show that the proposed GPCA method extracts more relevant features
than linear PCA and the other methods.
Table 3.2 The value of sensitivity at each fold for different extraction methods
Fold   PCA (%)   Fuzzy (%)   GPCA (%)
1      78        77          90
2      78        78          91
3      79        80          91
4      79        80          92
5      79        80          92
6      79        81          92
7      79        82          93
8      80        82          93
9      81        83          93
10     82        84          93
Table 3.2 shows the sensitivity at each fold. Ten-fold cross-validation was
applied to compare the performance of GPCA with that of linear PCA and the fuzzy
approach. At every fold, the sensitivity of GPCA is higher than that of the
other methods, and its Az value is also higher (Table 3.1).
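Averaging the per-fold values in Table 3.2 gives the overall sensitivity of each method, which can be compared with Table 3.1. A short check in Python with the standard library; the fold means come out at 79.4% (PCA), 80.7% (Fuzzy) and 92.0% (GPCA).

```python
from statistics import mean

# Per-fold sensitivities (%) copied from Table 3.2
sensitivity_per_fold = {
    "PCA":   [78, 78, 79, 79, 79, 79, 79, 80, 81, 82],
    "Fuzzy": [77, 78, 80, 80, 80, 81, 82, 82, 83, 84],
    "GPCA":  [90, 91, 91, 92, 92, 92, 93, 93, 93, 93],
}
fold_means = {method: mean(values) for method, values in sensitivity_per_fold.items()}
for method, value in fold_means.items():
    print(f"{method}: {value:.1f}%")
```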
Table 3.3 shows the detection performance of GPCA for each record. The average
sensitivity, specificity and classification accuracy obtained by the BPN
classifier were approximately 91%, 89.75% and 90.24%, respectively.
Table 3.3 Performance of GPCA for Detection rate
Record  Normal  Abnormal  TP    FP    TN     FN    Det.    FP      Se     Sp      Acc
        beats   beats                              rate %  rate %  (%)    (%)     (%)
EO103   101     19        19    8     93     0     100     8       100    92      93
EO104   87      33        25    4     83     8     75      5       75     95      90
EO105   84      36        36    8     76     0     100     10      100    90      93
EO107   105     15        10    9     96     5     60      9       60     92      88
EO108   92      28        28    13    79     0     100     14      100    85      89
EO111   114     6         3     20    94     3     52      18      52     82      80
EO113   90      30        30    11    79     0     100     12      100    86      90
EO114   95      25        25    3     92     0     100     3       100    96      97
EO119   107     13        13    10    97     0     100     9       100    90      91
EO121   98      22        21    6     92     1     95      6       95     94      94
EO122   105     15        14    14    91     1     97      13      97     87      87
EO127   79      41        40    15    64     1     98      19      98     81      86
EO129   112     8         8     17    95     0     100     15      100    84      85
EO151   108     12        12    6     102    0     100     5       100    94      95
EO154   104     16        15    6     98     1     96      6       96     94      94
EO170   102     18        13    8     94     5     74      8       74     92      89
EO207   110     10        10    8     102    0     100     7       100    92      93
Total   1693    347       322   166   1527   25    91      9.8     91     89.75   90.24
Table 3.4 shows the accuracy at each fold. The average training and testing
accuracies were 93.58% and 90.14%, respectively. The current approach is able to
distinguish the type of each detected episode (ST segment changes vs. T-wave
changes) with high sensitivity, specificity and accuracy. Table 3.5 shows the
performance of GPCA for long-duration ECGs.
Table 3.4 Performance analysis of accuracy at each fold
Fold    Training beats      Testing beats       Training       Testing
        Normal   Abnormal   Normal   Abnormal   accuracy (%)   accuracy (%)
F1      1529     307        164      40         89.1           87.01
F2      1544     292        149      55         89.27          84.82
F3      1526     310        167      37         79.55          72.05
F4      1518     318        175      29         93.23          90.06
F5      1521     315        172      32         95.15          91.27
F6      1524     312        169      35         99.56          97.56
F7      1575     261        153      51         100            99.34
F8      1473     363        180      24         100            99.59
F9      1466     370        187      17         91.25          84.25
F10     1516     320        177      27         98.75          95.45
Total   15192    3168       1693     347
Average                                         93.58          90.14
Table 3.5 Performance analysis of GPCA for Long Duration ECGs
Total   Normal  Abnormal  TP    FP    TN     FN    Det.    FP      Se     Sp    Acc
beats   beats   beats                              rate %  rate %  (%)    (%)   (%)
20400   17065   3335      2957  2218  14846  378   88.6    12.9    88.6   87    87.3
3.6 SUMMARY OF CONTRIBUTION
In this work, an enhanced version of PCA for ischemia detection has been
proposed. A Genetic Algorithm (GA) is combined with PCA to select the more
relevant principal components from the feature vectors of the ECG signals.
First, features are extracted from the ECG signals as eigenvectors and
eigenvalues. Because the number of samples is large, the dimensionality of this
vector space is reduced with the proposed Genetic based Principal Component
Analysis (GPCA). The extracted features are fed into a three-layer BPN that
classifies the beats as ischemic or non-ischemic. The results showed that the
proposed GPCA method extracts more relevant features than linear PCA, and the
method was also applied to long-duration ECG analysis.