
AUTOMATIC MITOSIS DETECTION IN BREAST HISTOPATHOLOGY IMAGES USING KNN CLASSIFIER

G. Usha¹, K. Narasimman², T. Shanmuganathan³, M. Thalaimalaichamy⁴

¹ Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India
² Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, Tamilnadu, India
³ Department of ECE, Hindustan Institute of Technology and Science, Chennai, Tamilnadu, India
⁴ Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India

[email protected]

Abstract: Mitosis is very hard to detect, yet mitotic count is an important factor in the grading of breast cancer. Mitosis is a process in which the nucleus of the cell undergoes various transformations. In addition, different image areas are characterized by different tissue types, which exhibit highly variable appearance. Pixel classifiers are used to solve many detection problems, which are typically characterized by the relatively obvious appearance of the objects to be detected. Here, a KNN classifier is used to detect mitotic candidates from the contour-segmented nuclei regions. The technique uses a stain normalization process to reduce the complexity of segmenting exact nuclei boundaries in large clinical images. The algorithm achieves an average F-score of 99.09% on the mitosis data set.

Keywords: H & E stained images, Reinhard stain normalisation, K-means clustering, KNN

1. Introduction

Mitotic count is one of the most important prognostic factors in breast cancer grading, as it is a key element in the assessment of a tumour. In H & E stained breast histopathology images, mitotic nuclei usually appear as hyperchromatic objects without a clear nuclear membrane [1]. Fig. 1 displays five main evolution phases of mitosis, namely interphase, prophase, metaphase, anaphase and telophase. The shape of the nucleus differs from stage to stage. However, the parts of a dividing nucleus should be counted as a single mitosis since they are not yet separate cells. Because of the large variety of shapes, the low frequency and the small size of mitotic cells, the detection process is time-consuming and extremely difficult. In addition, irregular illumination, non-uniform stain variation and the presence of lymphocyte nuclei make the detection process more difficult [2].

International Journal of Pure and Applied Mathematics, Volume 119, No. 18, 2018, pp. 2795-2805. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ (Special Issue)

In this paper, pre-processing is done using the Reinhard stain normalisation technique. The input images are H & E stained images, in which haematoxylin stains cell nuclei blue and eosin stains proteins red, pink or orange. For segmentation, the K-means clustering algorithm is used to separate the area of interest from the background by partitioning the given data set into a certain number of clusters; K is the number of selected centroids and therefore the number of clusters. The selected clustered image is then classified by means of a KNN classifier.

2. Literature Survey

Detecting mitosis in H & E stained slides of breast cancer is a tedious process because mitotic figures are small and show a large variety of shapes. They can easily be confused with other artefacts present in the image [3]. The Krill Herd algorithm was proposed for solving optimization tasks. It is based on a simulation of the herding behaviour of krill individuals, and the objective function for krill movement is the minimum distance of each individual krill from the highest density of the herd [4]. The number of cells undergoing mitosis plays a vital role in the classification system. Because manual counting is difficult, a computer-assisted system is used to produce more precise results with high accuracy [5]. Image analysis using a multi-threshold concept has been applied in the detection process to improve optimization [6]. The multi-threshold concept is applied in the segmentation of biomedical images so that no cell is left behind.

3. Methods

Classifying the input image involves five steps: image acquisition, pre-processing, segmentation, feature extraction and performance analysis. The acquired input image is pre-processed using stain normalisation, segmented using K-means clustering, and classified using a KNN classifier.


4. Preprocessing

Image pre-processing is the process of enhancing the image. It consists of three major steps: filtering noise from the input image, edge detection to separate the required object from the unwanted background, and binary image conversion (converting the pixel values of the image into zeros and ones). The technique used for pre-processing is Reinhard stain normalization (Fig. 2).

Fig. 1. Samples of mitotic cells in five mitotic phases.

Haematoxylin is a dark blue or violet stain that is basic in nature and binds to basophilic substances such as DNA and RNA, which are acidic. Eosin is a pink or red stain that is acidic in nature and binds to acidophilic substances such as proteins rich in arginine; it colours cytoplasm red and red blood cells cherry red [9]. Haemalum is a complex formed from aluminium and haematein. It stains cell nuclei blue, and counterstaining with an aqueous or alcoholic solution of eosin then colours eosinophilic structures such as proteins in shades of red, pink and orange. The staining of nuclei by haemalum results from a chemical reaction between the dye and cellular components [7].
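The text does not spell out the normalization step itself. In its commonly described form, Reinhard normalization shifts and scales each colour channel of the source image so that its mean and standard deviation match those of a reference image. The sketch below illustrates that idea in plain Python. It assumes the pixels have already been converted to a decorrelated colour space such as Lab (the conversion is omitted), and the function names `match_channel` and `reinhard_normalize` are illustrative, not taken from the paper.

```python
from statistics import fmean, pstdev

def match_channel(values, target_mean, target_std):
    """Shift and scale one colour channel so its mean/std match the target."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A flat channel has no spread to rescale; move it to the target mean.
        return [target_mean for _ in values]
    return [(v - mean) / std * target_std + target_mean for v in values]

def reinhard_normalize(source_channels, reference_channels):
    """Match every channel of the source image to the reference statistics.

    Both arguments are lists of channels; each channel is a flat list of
    pixel values in an (assumed) decorrelated colour space such as Lab.
    """
    normalized = []
    for src, ref in zip(source_channels, reference_channels):
        normalized.append(match_channel(src, fmean(ref), pstdev(ref)))
    return normalized
```

After this step, every normalized channel has the same mean and spread as the reference slide, which is what reduces stain variation between images before segmentation.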

5. Segmentation

The accuracy of the mitotic count depends on the pre-processing, segmentation and classification procedures. Cell nuclei and other cell structures can be differentiated using the stain normalization technique. For the segmentation step, the existing system used the Krill Herd Algorithm (KHA). In the early stages of breast cancer, as in other cancers, the cell membrane usually vanishes, so the background and the nuclei cannot be differentiated easily: no valid threshold (think of it as a line that separates background from nucleus) can be found directly. In KHA, a coloured image of the infected tissue is therefore first converted to a binary (black and white) image of 0s and 1s, because processing a coloured image is complex and time-consuming. In this binary image, the selected pixels are set to 1 and all other pixels are set to 0. The result is a mask image that specifies the centroids of the nuclei regions.

Three thresholds are then selected to differentiate nuclei from cytoplasm, background stroma and vacuoles. The resulting bi-level image is a mask that provides the initial outline for segmenting nuclei with exact boundaries using the Localized Active Contour Model (LACM). Active contour models are a broadly used framework in computer vision for delineating an object contour in a noisy image, with applications such as segmentation and shape recognition. An active contour is an energy-minimizing curve (a spline, that is, a smooth curve through specified points) that is pulled towards object contours while resisting deformation. This describes the existing system. Our proposed system uses the K-means algorithm instead of KHA and LACM, because in LACM the energy minimization means that minute features are not considered over the entire contour, while KHA is not efficient on large data collections and its performance speed is very low.

K-means is a clustering algorithm, used here to segment the selected area from the background. Pre-processing is applied before this segmentation to improve the quality of the image. K-means partitions the given data set into a certain number of clusters (groups of similar items). To see how the clusters are obtained, take the example of an image of 100*100 pixels and select a part of it that has 10*10 pixels. These 10*10 pixels are simply 100 data points. In K-means, K is the number of randomly selected centroids and therefore the number of clusters. Take K = 2: two centroids, C1 and C2, are selected randomly on the 100*100 pixel image. We now have 100 data points and 2 centroids. The distance from each data point to C1 and to C2 is calculated, and each data point is assigned to the centroid at the smaller distance. Suppose that 60 of the data points end up belonging to C1 and the remaining 40 to C2.

The next step is to calculate the mean, i.e. the average, of the 60 pixels assigned to C1 and of the 40 pixels assigned to C2. These averages become the new centroids. The same process is then repeated, calculating the distance of the data points from the new centroids, and so on. After three to four iterations the process can be stopped, because any further centroids will be very near to those from the previous iteration.
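The assign-then-average loop described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' MATLAB implementation: it clusters scalar pixel intensities only, and the names `nearest` and `kmeans_1d`, the fixed iteration cap and the random seed are all assumptions made for the example.

```python
import random

def nearest(value, centroids):
    """Index of the centroid closest to a scalar pixel value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))

def kmeans_1d(values, k, iterations=10, seed=0):
    """Cluster scalar pixel values into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Pick k distinct data values as the randomly selected initial centroids.
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(iterations):
        # Assignment step: each data point joins its nearest centroid.
        labels = [nearest(v, centroids) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # the centroids stopped moving
        centroids = updated
    # Final assignment against the centroids that are returned.
    labels = [nearest(v, centroids) for v in values]
    return centroids, labels
```

For example, `kmeans_1d([10, 12, 11, 200, 205, 198], k=2)` separates the three dark values from the three bright ones within a few iterations, which matches the observation that the centroids settle quickly.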

In segmentation, we extract 2 clusters from the normalized image. In our project the value of K is 4, so there are 4 randomly selected centroids and four clustered images. These images, each grouping similar items, are obtained using the repetition matrix (an inbuilt function) in MATLAB. This is how segmentation is done using the K-means algorithm.

5.1 Advantages of K-means Algorithm

With the K-means algorithm, high performance speed is achieved by means of the repetition matrix. It is efficient on large data collections, and accurate boundaries can be identified using K-means clustering.


6. Nuclei Classification

The classification phase consists of three stages:

• Feature computation
• Feature selection
• Decision fusion of individual classifiers using the KNN classifier framework

7. Feature Computation

Cells that undergo mitosis exhibit variations in texture, shape and size at different stages. Fig. 3(a) shows an example of an input image and Fig. 3(b) displays a zoomed version of a selected segmented region. Useful intensity-based, shape-based and texture-based features of the cells are extracted from the segmented nucleus patch shown in Fig. 3(c). The intensity-based features are Median (M), Variance (V), Kurtosis (K) and Skewness (S). The shape-based features are Area (A), Perimeter (P) and Solidity (SL); these are considered along with thirteen Haralick texture features [3,7].
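As an illustration of the four intensity-based features, the following sketch computes them from a flat list of pixel values taken from a nucleus patch. The paper does not state which statistical conventions were used, so the choices here (population moments, non-excess kurtosis) and the function name `intensity_features` are assumptions.

```python
from statistics import fmean, median

def intensity_features(pixels):
    """Median, variance, kurtosis and skewness of a flat list of pixel values."""
    n = len(pixels)
    mean = fmean(pixels)
    # Central moments of order 2, 3 and 4 (population form, dividing by n).
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    if m2 == 0:
        # A constant patch has no spread; skewness and kurtosis are undefined.
        return {"median": median(pixels), "variance": 0.0,
                "kurtosis": float("nan"), "skewness": float("nan")}
    return {
        "median": median(pixels),
        "variance": m2,
        "kurtosis": m4 / m2 ** 2,    # non-excess kurtosis
        "skewness": m3 / m2 ** 1.5,
    }
```

A library implementation may use sample rather than population moments, so values can differ slightly from this sketch on small patches.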

The Gray Level Co-occurrence Matrix (GLCM) describes how often pairs of pixels with specific values occur next to each other in an image. The GLCM can be estimated along any direction. Here adjacency is taken horizontally (0°), vertically (90°) and along the 45° and 135° diagonals, and the texture features are computed along these four directions. By taking the average over all four directions, thirteen texture features are computed: Autocorrelation, Contrast (C), Correlation (CR), Sum of Squares (SoS), Inverse Difference Moment (IDM), Sum Average (SA), Dissipation, Energy, Entropy (E), Difference Variance (DV), Difference Entropy (DE), Information Measure of Correlation (IMoC) and Cluster Tendency. The final feature vector contains 20 features: the 13 direction-averaged texture features along with four intensity-based and three shape-based features.
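To make the direction averaging concrete, the sketch below builds a normalized, symmetric GLCM for each of the four directions at a distance of one pixel and averages two of the listed features, contrast and energy. It is a simplified illustration: the image is a small nested list of integer grey levels, energy is taken as the sum of squared matrix entries (some tools report its square root), and all function names are assumptions rather than the authors' code.

```python
def glcm(image, d_row, d_col, levels):
    """Normalized, symmetric co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # count the pair in both orders so that
                counts[b][a] += 1  # the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Sum of p(i, j) * (i - j)^2 over the matrix."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

def energy(p):
    """Sum of squared matrix entries."""
    return sum(v * v for row in p for v in row)

# Offsets for 0, 45, 90 and 135 degrees at a distance of one pixel.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def averaged_texture_features(image, levels):
    """Average contrast and energy over the four directions."""
    mats = [glcm(image, dr, dc, levels) for dr, dc in OFFSETS]
    return {
        "contrast": sum(contrast(m) for m in mats) / len(mats),
        "energy": sum(energy(m) for m in mats) / len(mats),
    }
```

The remaining texture features would be computed from the same four matrices and averaged in the same way. The image must be at least 2*2 pixels so that every direction has at least one pixel pair.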


Table 1 – Feature Extraction

Feature Type | Feature Name | Dimension
Intensity based | Median, Variance, Kurtosis and Skewness | 4
Shape based | Area, Perimeter and Solidity | 3
Texture based | Haralick features | 13

8. Feature Selection

The classifier subset evaluator selects a small subset of features that gives the best discriminant information; the subset of features with the highest score is considered the best feature subset. The segmented image is classified using the KNN classifier. The data set taken from the RCC is used as the training set, and the data set of the input image is used as the test set. The algorithm returns the row of the training set matrix that matches the test set, and the classification is made on the basis of this row number.

Normalizing each feature to a uniform range resolves the differences in dynamic range between features. The normalized value N' is given by

$$N' = \frac{N - N_{\min}}{N_{\max} - N_{\min}} \qquad (1)$$

where $N$ is the actual feature value, and $N_{\min}$ and $N_{\max}$ represent the minimum and maximum feature values.
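Eq. (1) is ordinary min-max scaling, applied to one feature column at a time. A minimal sketch follows; the function name and the handling of a constant feature (mapped to zero to avoid dividing by zero) are assumptions, since the paper does not discuss that case.

```python
def min_max_normalize(values):
    """Rescale one feature column to the range [0, 1] as in Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: Eq. (1) would divide by zero, so map to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For a feature with values 2, 4 and 6, the normalized values are 0, 0.5 and 1, so a feature measured in pixels and one measured in grey levels end up on the same scale before distances are computed.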

9. Decision Fusion Using KNN Classifier

The KNN classifier classifies the test set into groups based on the grouping of the training set. The test set and the training set must have the same number of columns. Group is a vector whose distinct values define the grouping of the rows of the training set. The default behaviour of the classifier is to use the majority rule: a sample point is assigned to the class from which the majority of its k nearest neighbours come. The algorithm returns the row of the training set matrix that matches the test set, and the classification is made on the basis of this row number. Comparative results for the proposed KNN classifier and other classifiers are shown in Fig. 4.
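The majority rule can be sketched as follows. This is an illustrative plain-Python version, not the classifier call used by the authors: Euclidean distance, the default k = 3 and the name `knn_classify` are assumptions, and `group` plays the role of the grouping vector described above.

```python
from collections import Counter
import math

def knn_classify(sample, training, group, k=3):
    """Label `sample` by majority vote among its k nearest training rows."""
    # Euclidean distance from the sample to every training row.
    distances = [(math.dist(sample, row), idx) for idx, row in enumerate(training)]
    nearest = sorted(distances)[:k]
    # Majority rule: the most common label among the k nearest rows wins.
    votes = Counter(group[idx] for _, idx in nearest)
    return votes.most_common(1)[0][0]
```

The sample and every training row must have the same number of columns, as the text requires; `math.dist` raises an error otherwise. Features would normally be normalized first so that no single feature dominates the distance.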


10. Experimental results and discussion

A detected mitosis is considered correct if it is located within 8 μm of the centroid of a ground-truth mitosis. The well-known measures used for validation are precision, sensitivity (recall) and F-score. The performance graph of precision, recall and F-score is shown in Fig. 5.

$$\text{precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \times 100 \qquad (2)$$

$$\text{F-score} = \frac{2 \times \text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}} \times 100 \qquad (3)$$

where $N_{TP}$ represents the number of True Positives (TP, correctly detected mitoses) and $N_{FP}$ the number of False Positives (FP, wrongly detected mitoses).
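The 8 μm rule stated at the start of this section determines how detections are counted as true or false positives. The sketch below shows one way to apply it. The pixel size `microns_per_pixel` is a hypothetical parameter, since the scanner resolution is not given here, and the greedy one-to-one matching of detections to ground-truth centroids is an assumption rather than the paper's stated procedure.

```python
import math

def count_detections(detected, ground_truth, microns_per_pixel, max_distance_um=8.0):
    """Return (true_positives, false_positives, false_negatives).

    A detection is a true positive when it lies within max_distance_um of a
    ground-truth centroid that has not already been matched.
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    for det in detected:
        # Closest ground-truth centroid that is still unmatched, if any.
        best = min(unmatched, key=lambda gt: math.dist(det, gt), default=None)
        if best is not None and math.dist(det, best) * microns_per_pixel <= max_distance_um:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    # Ground-truth mitoses left without a detection are false negatives.
    return tp, fp, len(unmatched)
```

Coordinates are in pixels and are converted to microns only for the distance check, so the same function works for any scanner once its resolution is known.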

Fig 5. Performance Graph

Fig 4. Comparative results by the proposed KNN classifier with other classifiers (bar chart of Sensitivity, Precision and F-score for RF, MV-MCS, DBN-MCS and KNN; vertical axis from 0 to 120)


The values obtained are:

Precision= 0.98198198198198

Recall = 1

F-Score = 0.990909090909091
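These figures are mutually consistent, which can be checked with a short sketch. The formula for recall, TP / (TP + FN), is the standard definition and is not written out in the paper. The counts used below (109 true positives, 2 false positives, 0 false negatives) are hypothetical: they are chosen only because they reproduce the values listed above, and the paper does not report the underlying counts. The scores are returned as fractions, as in the list above, rather than multiplied by 100.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall (sensitivity) and F-score as fractions in [0, 1]."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts, chosen only because they reproduce the listed values.
precision, recall, f_score = detection_scores(tp=109, fp=2, fn=0)
```

With these counts, precision is 109/111 and the F-score is 218/220, which agree with the listed values to the digits shown.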

11. Conclusion

The paper proposes an accurate framework for the segmentation and classification of mitotic nuclei in H & E stained breast images. Mitotic count is an important factor in the grading of breast cancer, but mitotic nuclei are very hard to detect and to distinguish from non-mitotic nuclei. The proposed technique first uses Reinhard stain normalization to reduce the segmentation complexity, and then performs the segmentation using the K-means clustering algorithm. For classification, a KNN classifier is used to categorize the image. Sequential feature selection and feature normalization help to enhance the classifier's performance. The proposed technique is evaluated on a publicly available standard dataset. Compared to the existing techniques, the proposed framework gives better performance, and its high sensitivity makes it more realistic for clinical applications.

References

[1] M. M. Dundar et al., "Computerized classification of intraductal breast lesions using histopathological images," IEEE Trans. Biomed. Eng., vol. 58, no. 7, pp. 1977–1984, Jul. 2011.

[2] C. D. Malon and E. Cosatto, "Classification of mitotic figures with convolutional neural networks and seeded blob features," J. Pathol. Inform., vol. 4, no. 1, p. 9, 2013.

[3] L. Roux et al., "Mitosis detection in breast cancer histological images: An ICPR 2012 contest," J. Pathol. Inform., vol. 4, no. 1, p. 8, 2013.

[4] A. H. Gandomi and A. H. Alavi, "Krill herd: A new bio-inspired optimization algorithm," Commun. Nonlinear Sci. Numer. Simul., vol. 17, no. 12, pp. 4831–4845, Dec. 2012.

[5] F. B. Tek et al., "Mitosis detection using generic features and an ensemble of cascade adaboosts," J. Pathol. Inform., vol. 4, no. 1, p. 12, 2013.

[6] Sabina and Nidhi, "Adaptive Multi Thresholding for Breast Cancer Stem Cell Detection - A Review," Dept. of ECE, UIET, Panjab University, Chandigarh-160014, India.

[7] H. Chen, Q. Dou, X. Wang, J. Qin, and P. A. Heng, "Mitosis detection in breast cancer histology images via deep cascaded networks," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 1160–1166.

[8] (2014). MITOS, ICPR 2014 Contest, IPAL UMI CNRS Lab Std. [Online]. Available: http://ipal.cnrs.fr/ICPR2014.

[9] J. Savithri and H. Inbarani, "Comparative Analysis of K-Means, PSO-K-Means, and Hybrid PSO-Genetic K-Means for Gene Expression Data," International Journal of Innovations in Scientific and Engineering Research, vol. 1, no. 1, pp. 43–50, 2014.
