
Upload: yomna-ibrahim-hassan

Post on 18-Nov-2014

ECG beats classification using multiclass SVMs with ECOC

CSE 463 (Neural Networks)

Final Report - Phase 4

Submitted to: Prof. Hazem Abbas

Submitted by: Mostafa Mohamed Hassan Megahid, Yomna Mahmoud Ibrahim Hassan, Yusuf Ibrahim Yusuf

Ain Shams University, Faculty of Engineering

Computer & Systems Department

Abstract: Our project aims to facilitate tracking patterns in heartbeats (in electrocardiographic signals) by using Support Vector Machines with Error Correcting Output Codes to detect the patterns and identify which class of disease the input signal belongs to. The signal is first passed to a preprocessing module, which performs feature extraction and separates the unwanted noise from the pattern. The SVM (Support Vector Machine) module then classifies the signal. In our project, the classification was into four types of beat: normal beat, congestive heart failure beat, ventricular tachyarrhythmia beat and atrial fibrillation beat (1),(6).

In our final implementation, we improved the classification accuracy from an average of 2 out of 8 correct classifications to an average of 2 out of 4. We kept the same preprocessing function for feature extraction and edited the SVM (Support Vector Machine) code to use the Hamming code generation described in Thomas G. Dietterich's paper as the ECOC (Error Correcting Output Code), in order to reach the maximum classification accuracy with the SVM (10),(13).

Introduction: The project's main goal is the classification of ECG (electrocardiographic) beats. The response signal (ECG) differs depending on the type of disease affecting the heart. Noticing that, we extracted certain features of the signal (using the DWT as pre-processing) and, depending on a few of these features, classified the signal into classes.

The main classification was implemented using SVMs (Support Vector Machines) with ECOCs (Error Correcting Output Codes). The reason for using ECOCs is that they increase classification performance: each class is represented with more than one bit, so the chosen neural network is applied once per bit across all classes, leading to higher accuracy.

In the following section, we give a review of the project topic in general (including others' research), then move on to our project's implementation in detail. The last part details all the experiments conducted over the course of the project to reach the final results.

Project Documentation

a- Review of the project topic:

The paper used in the project implements multiclass Support Vector Machines (SVMs) with error correcting output codes to classify ECG beats into four classes: normal beat, congestive heart failure beat, ventricular tachyarrhythmia beat and atrial fibrillation beat. The data set used in the paper was obtained from the Physiobank database (12).

The figure below shows the process of classifying ECG beats used in the paper:

Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is used to perform feature extraction from ECG signals. The number of decomposition levels used in this study was chosen to be four. Thus, the ECG signals were decomposed into the details D1, D2, D3, D4 and one final approximation, A4. The wavelet coefficients were computed using the MATLAB software package (3),(4),(5).

The paper found that the Daubechies wavelet of order 2 (db2) offered the best classification accuracy, at 98.61%.

Feature vectors were calculated for each sub-band. The following table shows the total classification accuracy against the wavelet type used:

Two different experiments with different feature vectors were performed in the paper. In the first experiment, 256 wavelet coefficients for each class were used as feature vectors. In the second experiment, statistical features (maximum, mean, minimum and standard deviation of the wavelet coefficients in each sub-band) were used as feature vectors. Results showed that reducing the dimension of the feature vectors and representing the signals by the selected features significantly increases the classification accuracy.

Support Vector Machine

The classifier in the paper used an already developed concept to improve the results: the SVM decisions are fused using the ECOC approach, which was adopted from the digital communication theory of transmission. The results of the multiple classifiers used to separate the classes two at a time are combined and checked for errors; if there are any, the combined output is approximated to the nearest possible target.

The SVM developed in the paper was trained and tested with 720 vectors of 20 dimensions. The training algorithm of the SVM is based on quadratic programming, and it used RBF kernel functions. The test performance of the classifiers can be determined by computing the sensitivity, specificity and total classification accuracy, which are defined as:

- Sensitivity: number of true positive decisions / number of actually positive cases.

- Specificity: number of true negative decisions / number of actually negative cases.

- Total classification accuracy: number of correct decisions / total number of cases.
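
The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the beat labels ("N", "CHF", "VT", "AF") are hypothetical and do not come from the paper or the toolbox. Sensitivity and specificity are computed one class at a time, treating that class as "positive" and all others as "negative".

```python
def sensitivity_specificity(true_labels, predicted_labels, positive_class):
    """Return (sensitivity, specificity) for one class treated as 'positive'."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive_class:
            if guess == positive_class:
                tp += 1  # true positive decision
            else:
                fn += 1  # actually positive, missed
        else:
            if guess == positive_class:
                fp += 1  # actually negative, flagged as positive
            else:
                tn += 1  # true negative decision
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def total_accuracy(true_labels, predicted_labels):
    """Number of correct decisions divided by the total number of cases."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for eight test beats (illustrative data only).
truth = ["N", "N", "CHF", "VT", "AF", "AF", "VT", "CHF"]
guess = ["N", "CHF", "CHF", "VT", "AF", "N", "VT", "CHF"]

sens_n, spec_n = sensitivity_specificity(truth, guess, "N")
print(sens_n, spec_n, total_accuracy(truth, guess))
```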

Results against MLPNN

The performance of the SVM classifier was tested against a multilayer perceptron neural network (MLPNN), which was implemented with a single hidden layer and 20 inputs. It used a sigmoidal activation function and was trained with the Levenberg-Marquardt algorithm.

As a result, the SVM had a total classification accuracy of 98.61%, whereas the MLPNN had a total classification accuracy of 91.39%. The following table shows the classification accuracy in terms of sensitivity and specificity:

b- Project Implementation:

The project was implemented in three phases. First, we used the DWT to extract features from the ECG signals; then we used the SVM with ECOC toolbox to implement the classification algorithm; finally, we tried changing the error correcting code so that it was better suited to the training and testing sets used. In this section, we describe each phase and how it was implemented.

i. The discrete wavelet transform:

The advantage of the DWT over the Fourier transform is that it performs multi-resolution analysis of signals with localization in both time and frequency, popularly known as time-frequency localization.

Several wavelet families are available for different applications. In the paper, the Daubechies 2 wavelet gave the best classification results, so it is used here.

Implementation of DWT

The Discrete Wavelet Transform can be implemented with filter banks, using a pair of high-pass and low-pass filters as shown in figure (4).
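
As a rough illustration of the filter-bank view, the sketch below applies the standard four-tap db2 low-pass and high-pass filters and keeps every second output, repeating the step on the approximation to obtain D1..D4 and A4. The function names and the periodic (wrap-around) boundary handling are assumptions made for this example; the report does not say how boundaries were treated, and this is not the project's MATLAB code.

```python
import math

# Standard Daubechies-2 (four-tap) analysis filters.
_SQRT3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
LOW_PASS = [(1 + _SQRT3) / _NORM, (3 + _SQRT3) / _NORM,
            (3 - _SQRT3) / _NORM, (1 - _SQRT3) / _NORM]
# Quadrature mirror of the low-pass filter.
HIGH_PASS = [((-1) ** k) * LOW_PASS[3 - k] for k in range(4)]


def dwt_step(signal):
    """One filter-bank level: filter, then keep every second sample.

    Only the retained (even-indexed) outputs are evaluated here.
    Assumes periodic extension at the boundaries.
    """
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for start in range(0, n, 2):
        window = [signal[(start + k) % n] for k in range(4)]
        approx.append(sum(h * x for h, x in zip(LOW_PASS, window)))
        detail.append(sum(g * x for g, x in zip(HIGH_PASS, window)))
    return approx, detail


def dwt_decompose(signal, levels=4):
    """Return ([D1, ..., D_levels], A_levels) by iterating on the approximation."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return details, approx


# Illustrative 256-sample input (not ECG data).
example = [math.sin(0.1 * i) for i in range(256)]
details, a4 = dwt_decompose(example, levels=4)
print([len(d) for d in details], len(a4))
```

With a 256-sample input, the four detail bands and the final approximation together hold 256 coefficients, which matches the size of the coefficient-based feature vectors mentioned in the review.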

This implementation is less efficient, as half of the computed coefficients are simply thrown away in the down-sampling step, and it doesn't lend itself well to parallelism, so the lifting scheme is used instead.

The implementation of the lifting scheme is based on this data dependency diagram:

By implementing only the first stage and setting a = -0.5, b = 0.5 and K = 1, we have essentially implemented db2 using lifting. (K should be √2, but as a property of Support Vector Machines it doesn't matter if a feature vector is distorted in some way, as long as all the other vectors are distorted in the same way.)
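
The data dependency diagram is not reproduced in this transcript, so the following is only a generic sketch of a single lifting stage, not the project's exact scheme. It assumes a predict step and an update step that each use the two nearest neighbours with periodic wrap-around, and it takes a, b and K as parameters with the values quoted above as defaults. It makes no claim to reproduce db2 exactly; it only illustrates that a lifting stage computes just the coefficients it keeps and can be inverted by undoing the steps in reverse order.

```python
def lifting_forward(signal, a=-0.5, b=0.5, k=1.0):
    """One generic lifting stage: split, predict, update, scale.

    Assumed form (not taken from the report's diagram):
      detail[i] = odd[i]  + a * (even[i] + even[i + 1])
      approx[i] = even[i] + b * (detail[i - 1] + detail[i])
    with periodic wrap-around at the ends.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = list(signal[0::2]), list(signal[1::2])
    n = len(even)
    detail = [odd[i] + a * (even[i] + even[(i + 1) % n]) for i in range(n)]
    approx = [even[i] + b * (detail[i - 1] + detail[i]) for i in range(n)]
    return [k * s for s in approx], [d / k for d in detail]


def lifting_inverse(approx, detail, a=-0.5, b=0.5, k=1.0):
    """Undo lifting_forward by reversing each step in turn."""
    n = len(approx)
    s = [v / k for v in approx]
    d = [v * k for v in detail]
    even = [s[i] - b * (d[i - 1] + d[i]) for i in range(n)]
    odd = [d[i] - a * (even[i] + even[(i + 1) % n]) for i in range(n)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


samples = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, detail = lifting_forward(samples)
restored = lifting_inverse(approx, detail)
print(approx, detail, restored)
```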

We could use the resulting DWT coefficients as feature vectors of 256 dimensions, but testing showed that using statistical properties of every sub-band as feature vectors (20 dimensions) improves accuracy and performance, so these are used.
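
A minimal sketch of how such a 20-dimensional vector could be assembled is shown below, using the four statistics named in the review (maximum, mean, minimum, standard deviation) for each of the five sub-bands D1..D4 and A4. The function names, the ordering of the features and the use of the population standard deviation are assumptions for illustration; the report does not specify them.

```python
import statistics


def subband_features(coefficients):
    """Maximum, mean, minimum and standard deviation of one sub-band."""
    return [
        max(coefficients),
        statistics.mean(coefficients),
        min(coefficients),
        statistics.pstdev(coefficients),  # assumption: population std. dev.
    ]


def feature_vector(subbands):
    """Concatenate the four statistics of every sub-band, in order."""
    features = []
    for band in subbands:
        features.extend(subband_features(band))
    return features


# Toy coefficients standing in for D1, D2, D3, D4 and A4 (illustrative only).
toy_subbands = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -0.5, 0.25, -0.25],
    [2.0, 2.0, 2.0, 2.0],
    [-1.0, 0.0, 1.0, 0.0],
    [10.0, 12.0, 11.0, 13.0],
]
vector = feature_vector(toy_subbands)
print(len(vector))  # 5 sub-bands x 4 statistics = 20
```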

ii. The Support Vector Machine:

To implement the SVM we used a toolbox obtained from (2). We used two main parts of the functions provided by the toolbox to implement SVM with ECOC classification:

1- Kernel function implementation. The toolbox provides many kernel functions, such as linear, radial basis function, etc. After examining the available kernel functions we chose the “Radial basis function” (8), as it was recommended in the paper and gave good results in our experiments. Its basic idea is that the predicted target value of an item is likely to be about the same as that of other items with close values of the predictor variables.

2- Neural network training algorithm. We used the decomposition algorithm, which was implemented instead of the original QP (quadratic programming) problem as it is more compatible with larger data sets. Its idea is to divide the main QP into sub-QPs that are all solved in parallel, decreasing training time.
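
To make the radial basis function kernel from item 1 concrete, here is a small sketch of the usual Gaussian form K(x, y) = exp(-gamma * ||x - y||^2). The parameter name `gamma` and its default value are assumptions for this example; the toolbox's own parameterization is not described in the report.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: close inputs give values near 1, distant ones near 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


near = rbf_kernel([1.0, 2.0], [1.1, 2.1])
far = rbf_kernel([1.0, 2.0], [4.0, 6.0])
print(near, far)
```

The kernel value falls off with distance, which matches the idea that items with close predictor values should receive similar predictions.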

More detail about the algorithm and functions used is given in the manual documentation.

iii. The Error correcting codes:

The first error correcting code used in the implementation came from the ready-made ECOCs that were downloaded with the SVM with ECOC toolbox. We used the code closest to our needs: a 15-bit classifier code with 12, 13 and 14 representation bits (13).

These ready-made codes did not give a very accurate classification, as they were not designed for our actual classification problem, which was a four-class classifier.

We then implemented our own Hamming code (11) and modified the structure of the file, which gave inaccurate and unstable results (the values differ with each run of the program).

The final error correcting code used was based on the "exhaustive method" (13), which builds the code from runs of zeros followed by runs of ones so that the rows of the code are independent, which increases performance.
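
The exhaustive construction can be sketched as follows, following the description in (13): for k classes the code has 2^(k-1) - 1 bits, the first row is all ones, and each later row alternates runs of zeros and ones whose length halves from row to row. The function names are hypothetical, and decoding by nearest Hamming distance is shown as the usual way of fusing the per-bit decisions; the toolbox may do this differently.

```python
def exhaustive_code(num_classes):
    """Build the exhaustive ECOC matrix (one row per class)."""
    if not 3 <= num_classes <= 7:
        raise ValueError("the exhaustive method is intended for 3 to 7 classes")
    length = 2 ** (num_classes - 1) - 1
    rows = [[1] * length]  # first class: all ones
    for i in range(2, num_classes + 1):
        run = 2 ** (num_classes - i)  # run length of zeros/ones for row i
        rows.append([(j // run) % 2 for j in range(length)])
    return rows


def hamming_distance(u, v):
    """Number of positions in which two bit strings differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits, code):
    """Return the index of the class whose codeword is nearest to 'bits'."""
    return min(range(len(code)), key=lambda c: hamming_distance(bits, code[c]))


code = exhaustive_code(4)
for row in code:
    print("".join(str(bit) for bit in row))

# One bit classifier gets its answer wrong; the nearest codeword is still class 2.
noisy = list(code[2])
noisy[0] ^= 1
print(decode(noisy, code))
```

For four classes this gives a 7-bit code in which every pair of rows differs in 4 positions, so a single wrong bit classifier can be corrected.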

c- Experiments & results:

First, regarding the trials on the Support Vector Machine module: we first divided the output of the feature extraction into a training set and a testing set, with 12 signals in the training set and 8 signals in the testing set. Using the ECOC structure mentioned below, this resulted in a 25% match between the results and the real classification. On increasing the training set, the accuracy reached 50% on average. The figures below show the output results before and after modification.

For the error correcting code structure, we first tried making random changes to the bits in the attached file (with no particular sequence), but the results reached the optimum at most once in every 10 trials. In the final implementation, we arrived at the proper Hamming code to reach the optimum result by following the "exhaustive method" described in Thomas G. Dietterich's paper. This method is suitable for classification problems with 3 or more classes (but not exceeding 7), as its output satisfies the requirements of ECOCs (row and column separation). With this method implemented, we observed that every run of the code now reaches the maximum accuracy (4 out of 8 classes correct) instead of (3 out of 8 classes correct).

Conclusion:

From the results mentioned above, we conclude that the most suitable form of analysis and feature extraction for real-time data that may be accompanied by noise is the Discrete Wavelet Transform. As for the classification module, Support Vector Machines are among the most suitable networks for multi-class classification.

In addition, using ECOCs increases the reliability of the results, and ECOCs are extremely flexible, as they can be applied to any neural network's output. They can also be adapted to different numbers of classes and different output representations.

For future work, we suggest enlarging the database, increasing the number of features extracted from the signals (to improve pattern matching) and, finally, extending the system to cover more classes (diseases) in the classification.

Appendices:

Appendix A (Programs Listing):

- Preprocessing Code

- Experiment_1 (SVM & ECOC implementation).

Appendix B (Project Manual):

Preprocessing Code

• preprocessing_Fextraction_main: The main function of the feature extraction module. It is divided into cells that load the required data from the input database file, un-sample the signal into fixed-size data (using a buffer), and finally call the "ROI" and "calc_featurevector" functions (explained below).

• ROI: This function calculates the region of interest in the signal. The concept of an ROI is commonly used in medical signal analysis. For example, the boundaries of a tumor may be defined on a signal or in a volume for the purpose of measuring its size. The endocardial border may be defined on a signal, perhaps during different phases of the cardiac cycle, say end-systole and end-diastole, for the purpose of assessing cardiac function.

• Calc_featureVector: After the ROI is specified, this function calculates the values of the features in each signal; its output is the input to the SVM module.

Experiment_1 (SVM & ECOC implementation)

• Experiment_1: The main function of the SVM module. It calls the rest of the functions in the sequence listed below.

• ECOC: Initializes a wrapper for error correcting output codes for a multiclass classification problem with NCLASSES classes and NBITS bits of representation.

• Ecocload: Extracts the Hamming code from the attached file "code 7-4" so that its sequence can be applied to the output.

• SVM: Initializes the Support Vector Machine with a kernel function (here we used the radial basis kernel function).

• Ecoctrain: Trains the SVM structure with the data from the training set, using the ECOC structure loaded by Ecocload.

• Ecocfwd: Produces the final classification; it predicts the output for each item in the testing set according to the training performed.

More information about the code is given in the comments in the code files.

References:

1) ECG beats classification using multiclass support vector machines with error correcting output codes.

2) SVM Toolbox for MATLAB.

3) Stéphane Mallat, A Wavelet Tour of Signal Processing, Second Edition, 1999.

4) Amara Graps, An Introduction to Wavelets.

5) Ingrid Daubechies, Ten Lectures on Wavelets, 1992.

6) Corinna Cortes & Vladimir Vapnik, Support-Vector Networks.

7) Trial database for the SVM toolbox trial.

8) RBF Neural Networks.

9) Decomposition algorithm for SVM training.

10) Research of Thomas G. Dietterich.

11) Hamming code on Wikipedia.

12) MIT-BIH Arrhythmia Database.

13) Solving Multiclass Learning Problems via Error-Correcting Output Codes.