
Page 1: Major Project Presentation

Major Project Presentation on

AUTOMATED DIAGNOSTIC SYSTEM FOR GASTRIC CANCER DETECTION

National Institute of Technology Raipur

Department of Information Technology

Presented by: Ankita Singh [11118011] Mansi Jain [11118045]

Guided by: Mr. Pavan Kumar Mishra Assistant Professor Dept. of IT, NIT Raipur


Page 2: Major Project Presentation

Introduction

• Diseases of the gastrointestinal (GI) tract are a major threat to human health. Gastric cancer is the fourth most common cancer and the second leading cause of cancer death worldwide.

• Wireless capsule endoscopy (WCE) is a revolutionary, patient-friendly imaging technique that enables non-invasive visual inspection of the patient's digestive tract and, in particular, the small intestine.

• However, a major issue with this new technology is the huge number of images produced in each examination, which places a heavy review burden on physicians. It would therefore be very valuable to clinicians if part of the diagnosis could be performed by a computer-aided system.


Page 3: Major Project Presentation

Introduction

Figure 1: A, Endoscopic image of an ulcerating adenocarcinoma; B, ulcerating adenocarcinoma.

Page 4: Major Project Presentation

Outline of proposed methodology

1. Data Acquisition – A total of 100 images were taken: 30 cases of normal and 70 cases of abnormal (ulcerous) condition.

2. Pre-processing – These images are then subjected to different image preprocessing techniques (a minimal sketch follows the list), namely:

• Grayscale conversion

• Noise removal using a median filter

• Image sharpening

• Region-growing segmentation
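
The original pipeline was implemented in MATLAB; the following is only an illustrative sketch of these four preprocessing steps in Python with SciPy and scikit-image. The file name, seed point, and tolerance value are assumptions made purely for the example.

# Illustrative sketch of the preprocessing steps (Python/scikit-image),
# not the project's original MATLAB code.
import numpy as np
from scipy import ndimage
from skimage import io, color, filters, segmentation

rgb = io.imread("wce_image_001.png")        # placeholder file name

# 1. Gray-scale conversion
gray = color.rgb2gray(rgb)

# 2. Noise removal with a 3x3 median filter
denoised = ndimage.median_filter(gray, size=3)

# 3. Image sharpening via unsharp masking
sharp = filters.unsharp_mask(denoised, radius=2, amount=1.0)

# 4. Region-growing segmentation: grow a region from an assumed seed pixel,
#    accepting neighbouring pixels within an intensity tolerance.
seed = (120, 160)                            # assumed seed inside the suspicious region
mask = segmentation.flood(sharp, seed, tolerance=0.08)
region = np.where(mask, sharp, 0.0)          # segmented region used for feature extraction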


Page 5: Major Project Presentation

Outline of proposed methodology

3. Image Processing – Image-processing techniques are used to identify the sets of essential features and to extract these features from the preprocessed image for further processing.

4. Feature Selection – A feature selection algorithm is then applied to select the few extracted features that are most significant and that best describe the image characteristics.

5. Classification – These optimal features are then provided as input to a Naïve Bayes classifier for classification.


Page 6: Major Project Presentation

Workflow of the Methodology


Figure 2: Suggested process.

Page 7: Major Project Presentation

Acquired Images

Figure 3: Different stages of gastric cancer (four acquired WCE images).

Page 8: Major Project Presentation

Feature Extraction

• The objective of feature extraction is to represent the raw image in a reduced and compact form in order to facilitate and speed up decision-making processes such as classification.

• As illustrated in Fig. 3, abnormal regions in CE images with ulcers show noticeable differences in texture compared with their surrounding regions. This property, together with the reviewed literature, encourages us to investigate texture features of CE images.


Page 9: Major Project Presentation

Feature Extraction(2)

• Feature extraction methods analyze the preprocessed images to extract the most prominent features, grouped into sets according to their pixel-intensity relationships and statistics.

• Five sets of statistical texture features, namely intensity histogram, gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), invariant moments, and a mixed feature set, were extracted from each of the 100 images in MATLAB.


Page 10: Major Project Presentation

Intensity Histogram Features

• A total of 6 features were extracted from the intensity histogram (a minimal sketch follows the list):

1. Mean

2. Energy

3. Variance

4. Entropy

5. Skewness

6. Kurtosis
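
As an illustration, these six first-order statistics can be computed directly from the normalised intensity histogram; a minimal NumPy sketch (assuming a gray image scaled to [0, 1] and a 256-bin histogram) is shown below.

# Sketch: intensity-histogram (first-order) texture features with NumPy.
import numpy as np

def histogram_features(gray, bins=256):
    """Mean, energy, variance, entropy, skewness and kurtosis of a gray image in [0, 1]."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()                      # probability of each intensity level
    levels = (edges[:-1] + edges[1:]) / 2.0    # bin centres

    mean = np.sum(levels * p)
    variance = np.sum((levels - mean) ** 2 * p)
    energy = np.sum(p ** 2)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    std = np.sqrt(variance)
    skewness = np.sum(((levels - mean) / std) ** 3 * p) if std > 0 else 0.0
    kurtosis = np.sum(((levels - mean) / std) ** 4 * p) if std > 0 else 0.0
    return mean, energy, variance, entropy, skewness, kurtosis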


Page 11: Major Project Presentation

GLCM Features

• The gray-level co-occurrence matrix, also termed the "spatial gray-level dependency matrix", is one of the most widely used statistical tools for extracting texture information from images.

• A total of 21 features were extracted from the GLCM (a sketch for a few of them follows the full list):

1. Autocorrelation
2. Contrast
3. Correlation
4. Cluster prominence
5. Cluster shade
6. Dissimilarity
7. Energy


Page 12: Major Project Presentation

GLCM Features(2)

8. Entropy
9. Homogeneity
10. Maximum probability
11. Sum of squares
12. Sum average
13. Sum variance
14. Sum entropy
15. Difference variance
16. Difference entropy
17. Information measure of correlation 1
18. Information measure of correlation 2
19. Inverse difference
20. Inverse difference normalized [INN]
21. Inverse difference moment normalized
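
The presentation's own extraction was done in MATLAB; as an illustration only, the sketch below computes a subset of these co-occurrence features in Python with scikit-image. The single-pixel offsets, 256 gray levels, and the manual entropy computation are assumptions of the example, and graycomatrix/graycoprops cover only some of the 21 features listed.

# Sketch: a subset of the GLCM features with scikit-image (>= 0.19 API names).
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.util import img_as_ubyte

def glcm_features(gray):
    """Contrast, correlation, dissimilarity, energy, homogeneity and entropy of a gray image."""
    img = img_as_ubyte(gray)                                  # quantise to 256 gray levels
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)

    feats = {name: graycoprops(glcm, name).mean()
             for name in ("contrast", "correlation", "dissimilarity",
                          "energy", "homogeneity")}

    # Entropy is not provided by graycoprops, so derive it per angle and average.
    entropies = []
    for a in range(glcm.shape[3]):
        p = glcm[:, :, 0, a]
        p = p[p > 0]
        entropies.append(-np.sum(p * np.log2(p)))
    feats["entropy"] = float(np.mean(entropies))
    return feats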


Page 13: Major Project Presentation

GLRLM Features

• The gray-level run-length matrix (GLRLM) is a matrix from which texture features can be extracted for texture analysis. A total of 11 features were extracted from the GLRLM (a minimal sketch follows the list):

1. Short run emphasis [SRE]

2. Long run emphasis [LRE]

3. Gray-level non uniformity [GLN]

4. Run length non uniformity [RLN]

5. Run percentage [RP]

6. Low gray-level run emphasis [LGRE]


Page 14: Major Project Presentation

GLRLM Features(2)

7. High gray-level run emphasis [HGRE]

8. Short run low gray-level emphasis [SRLGE]

9. Short run high gray-level emphasis [SRHGE]

10. Long run low gray-level emphasis [LRLGE]

11. Long run high gray-level emphasis [LRHGE]
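
Run-length matrices are not available in the common Python imaging libraries, so the sketch below builds a simple run-length matrix from scratch and derives two of the features listed above (SRE and LRE). The quantisation to 16 gray levels and the restriction to horizontal runs are simplifying assumptions of the example.

# Sketch: a minimal gray-level run-length matrix (horizontal runs only)
# and two of the listed features, SRE and LRE.
import numpy as np

def glrlm_horizontal(gray, levels=16):
    """P[g, r] = number of horizontal runs of quantised level g with length r + 1."""
    q = np.clip((gray * levels).astype(int), 0, levels - 1)   # gray assumed in [0, 1]
    P = np.zeros((levels, q.shape[1]), dtype=np.int64)
    for row in q:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                P[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        P[run_val, run_len - 1] += 1
    return P

def sre_lre(P):
    """Short-run emphasis and long-run emphasis of a run-length matrix."""
    n_runs = P.sum()
    j = np.arange(1, P.shape[1] + 1, dtype=float)              # run lengths 1..max
    sre = (P / j ** 2).sum() / n_runs
    lre = (P * j ** 2).sum() / n_runs
    return sre, lre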


Page 15: Major Project Presentation

Feature Selection

• A total of 46 features were extracted from each image in the feature extraction process, but all of them cannot be supplied directly to the classification algorithm because the number of features is too high.

• The objective of feature selection is to reduce the dimensionality of feature space by removing redundant, irrelevant, or noisy data.

• It speeds up the data mining algorithm and improves data quality and, in turn, the performance of data mining.

• The WEKA tool was used for feature selection.


Page 16: Major Project Presentation

Feature Selection(2)

• Two feature selection techniques were employed for this purpose (a minimal sketch of both follows):

1. Information Gain Attribute Evaluation
– Evaluates the worth of an attribute by measuring the information gain with respect to the class.
– InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)

2. Gain Ratio Attribute Evaluation
– Evaluates the worth of an attribute by measuring the gain ratio with respect to the class.
– GainR(Class, Attribute) = (H(Class) - H(Class | Attribute)) / H(Attribute)
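
The project used WEKA's InfoGainAttributeEval and GainRatioAttributeEval; the sketch below reproduces both measures for a single continuous feature in NumPy, mirroring the two formulas above. The equal-width discretisation into 10 bins is an assumption of the example.

# Sketch: information gain and gain ratio of one feature with respect to the class,
# mirroring WEKA's InfoGainAttributeEval and GainRatioAttributeEval.
import numpy as np

def entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_and_gain_ratio(feature, labels, bins=10):
    """feature: 1-D array of continuous values; labels: array of class labels."""
    labels = np.asarray(labels)
    binned = np.digitize(feature, np.histogram_bin_edges(feature, bins=bins))

    # H(Class | Attribute): class entropy within each bin, weighted by bin frequency.
    h_cond = 0.0
    for b in np.unique(binned):
        in_bin = binned == b
        h_cond += in_bin.mean() * entropy(labels[in_bin])

    info_gain = entropy(labels) - h_cond                     # H(Class) - H(Class | Attribute)
    h_attr = entropy(binned)
    gain_ratio = info_gain / h_attr if h_attr > 0 else 0.0
    return info_gain, gain_ratio

Scoring each of the 46 extracted features in this way and sorting the scores in descending order is essentially what the Ranker algorithm on the next slide does.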


Page 17: Major Project Presentation

Feature Selection(3)

3. Ranker Algorithm

– Ranker algorithm is used in conjunction with both InfoGainAttributeEval and GainRatioAttributeEval.

– It is a simple algorithm that ranks the individual features based on the scores generated by the respective evaluation techniques.


Page 18: Major Project Presentation

Feature Selection(4)

– Intensity Histogram Features:

Kurtosis Energy Variance Skewness Class

3.32591 0.373544 0.58388 -0.27838 abnormal

3.012475 0.220592 1.663059 0.01345 abnormal

2.875807 0.212768 1.855407 0.082635 abnormal

2.66464 0.34416 0.662323 -0.128 abnormal

2.717288 0.263268 1.104087 -0.10918 abnormal

4.846169 0.441032 2.326847 0.636859 normal

5.442856 0.442208 2.147973 0.577592 normal


Page 19: Major Project Presentation

Feature Selection(5)

– GLRLM Features

RP GLN RLN LRE Class

12.88566 10238.89 19944.25 15.76177 abnormal

17.59285 14542.72 23353.28 23.0428 abnormal

11.30146 7300.354 6478.964 19.88311 abnormal

11.02774 5605.537 4378.65 23.18057 abnormal

14.1735 8866.56 6196.044 17.84978 abnormal

2.833454 790.5157 1363.097 13.25243 normal

2.822276 565.9785 975.357 13.09647 normal


Page 20: Major Project Presentation

Feature Selection(6)

– GLCM Features

Autocorrelation | Cluster Shade | Entropy | Homogeneity | Sum Variance | Difference Variance | Difference Entropy | Class
16255.17 | 24283685 | 0.000902 | 7.516601 | 16282.83 | 62436.55 | 5.326924 | abnormal
13492.13 | 2.01E+08 | 0.000407 | 8.277571 | 13525.86 | 51536.94 | 5.884364 | abnormal
21892.82 | 241290627 | 0.000361 | 8.617996 | 21953.21 | 84401.9 | 5.917933 | abnormal
16109.78 | 71902277.2 | 0.000561 | 7.944364 | 16156.63 | 61791.02 | 5.66429 | abnormal
15746.48 | 18958507.5 | 0.002232 | 8.519832 | 15868.13 | 60655.19 | 6.032412 | abnormal
10334.92 | 160258173 | 0.001304 | 7.870643 | 10410.03 | 39473.03 | 4.901656 | normal
10944.94 | 116586083 | 0.000648 | 7.298406 | 10994.83 | 41728.69 | 5.087692 | normal

Page 21: Major Project Presentation

Classification

• We have chosen the Naïve Bayes classifier for classification of the test data for three major reasons:

1. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods in terms of accuracy.

2. Training takes linear time, rather than the expensive iterative approximation required by many other types of classifiers.

3. Naïve Bayes shows better resilience to missing data than other discriminative algorithms (e.g., SVM).

Page 22: Major Project Presentation

Classification(2)

• Naïve Bayes classifiers can be trained very efficiently in a supervised learning setting. Parameter estimation for Naïve Bayes models uses the method of maximum likelihood.

• The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem.
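
The classification in the project was run through WEKA; as an illustration only, the sketch below trains an equivalent Gaussian Naïve Bayes model with scikit-learn, whose class priors and per-class Gaussian parameters are fitted by maximum likelihood. The feature files, split ratio, and label names are assumptions of the example.

# Sketch: supervised training of a Gaussian Naive Bayes classifier (scikit-learn),
# used here as a stand-in for the WEKA workflow described in the slides.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X = np.load("selected_features.npy")   # assumed (100, n_selected_features) feature matrix
y = np.load("labels.npy")              # assumed labels: 'normal' or 'abnormal'

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = GaussianNB()                     # parameters fitted by maximum likelihood
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))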


Page 23: Major Project Presentation

Experimental Results and Accuracy

• Classification accuracy is calculated as the percentage of test cases that are correctly classified.

• The performance of the classifier was evaluated by analysis of the confusion matrix and the receiver operating characteristic (ROC) curve.
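
Continuing the training sketch from the previous slide (so clf, X_test and y_test are assumed to exist), the confusion-matrix and ROC measures reported on the next slide can be computed as follows; treating "abnormal" as the positive class is an assumption of the example.

# Sketch: confusion-matrix measures and ROC area (scikit-learn), continuing the
# Gaussian Naive Bayes sketch above.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = clf.predict(X_test)
scores = clf.predict_proba(X_test)[:, list(clf.classes_).index("abnormal")]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=["normal", "abnormal"]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)
fpr = fp / (fp + tn)                    # false positive rate = 1 - specificity
auc = roc_auc_score(y_test == "abnormal", scores)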


Page 24: Major Project Presentation

Experimental Results and Accuracy(2)

• The following results were yielded by mixed features on the dataset:

– Accuracy : 92%

– Sensitivity : 90.74%

– Specificity : 89%

– False positive rate : 13.7%

– True positive rate : 92.6%

– ROC area : 93.8%


Page 25: Major Project Presentation

Conclusion

• From the above results, we have achieved our objective of finding the best classifier for the diagnosis of gastric cancer.

• Five sets of features (GLCM, intensity histogram, GLRLM, invariant moments, and mixed features) were extracted.

• These features were then selected and trained in WEKA to determine the set of features that best indicates the presence of ulcerous conditions in the stomach.

• Experimentally, both the GLRLM and the mixed feature set showed excellent accuracy in training as well as testing.


Page 26: Major Project Presentation

References

[1] Karthik Kalyan, Binal Jakhia, Ramachandra Dattatraya Lele, Mukund Joshi and Abhay Chowdhary (2014), "Artificial Neural Network Application in the Diagnosis of Disease Conditions with Liver Ultrasound Images." [Online]. Available: http://dx.doi.org/10.1155/2014/708279

[2] Alexandros Karargyris and Nikolaos Bourbakis, "Detection of Small Bowel Polyps and Ulcers in Wireless Capsule Endoscopy Videos," IEEE Transactions on Biomedical Engineering, vol. 58, no. 10, p. 2777, October 2011.

[3] Baopu Li and Max Q.-H. Meng, "Texture analysis for ulcer detection in capsule endoscopy images," Image and Vision Computing (Elsevier), www.elsevier.com/locate/imavis.

[4] Miaou, S.-G., Chang, F.-L., Timotius, I.K., Huang, H.-C., Su J.-L., Liao, R.-S. and Lin, T.-Y. (2009) A Multi-Stage Recognition System to Detect Different Types of Abnormality in Capsule Endoscope Images. Journal of Medical and Biological Engineering, 29, 114-121.

[5] M. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, pp. 179–187, 1962.


Page 27: Major Project Presentation

References (2)

[6] N. Sharma, A. Bajpai, and R. Litoriya, "Comparison the various clustering algorithms of weka tools," International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 5, pp. 73–80, 2012.

[7] A. Ahmadian, A. Mostafa, M. Abolhassani, N. Alam, and M. Gitti, An Efficient Texture Feature Extraction Method for Classification of Liver Sonography Based on Gabor Wavelet, Medicon, Tehran, Iran, 2004.

[8] M. Pietikäinen, T. Ojala, and Z. Xu, Rotation-Invariant Texture Classification Using Feature Distributions.

[9] Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002) Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning, 46, 389–422. [Online]. Available: http://dx.doi.org/10.1023/A:1012487302797

[10] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.

[11] A. Karargyris and N. Bourbakis, "Three-dimensional reconstruction of the digestive wall in capsule endoscopy videos using elastic video interpolation," IEEE Trans. Med. Imag., vol. 30, no. 4, pp. 957–971, Apr. 2011.


Page 28: Major Project Presentation

Thank You
