


Under Review for IEEE Transactions on Medical Imaging 2018

Supervised and Unsupervised Tumor Characterization in the Deep Learning Era

Sarfaraz Hussein, Pujan Kandel, Candice W. Bolan, Michael B. Wallace, and Ulas Bagci∗, Senior Member, IEEE.

Abstract—Cancer is among the leading causes of death worldwide. Risk stratification of cancer tumors in radiology images can be improved with computer-aided diagnosis (CAD) tools, which can be made faster and more accurate. Tumor characterization through CADs can enable non-invasive cancer staging and prognosis, and foster personalized treatment planning as a part of precision medicine. In this study, we propose both supervised and unsupervised machine learning strategies to improve tumor characterization. Our first approach is based on supervised learning, for which we demonstrate significant gains with deep learning algorithms, particularly by utilizing a 3D Convolutional Neural Network along with transfer learning. Motivated by radiologists’ interpretations of the scans, we then show how to incorporate task-dependent feature representations into a CAD system via a “graph-regularized sparse Multi-Task Learning (MTL)” framework.

In the second approach, we explore an unsupervised scheme to address the limited availability of labeled training data, a common problem in medical imaging applications. Inspired by learning-from-label-proportion (LLP) approaches, we propose a new algorithm, proportion-SVM, to characterize tumor types. We also seek the answer to a fundamental question: how good are “deep features” for unsupervised tumor classification? We evaluate our proposed approaches (both supervised and unsupervised) on two different tumor diagnosis challenges, lung and pancreas, with 1018 CT and 171 MRI scans, respectively.

Index Terms—Deep learning, Unsupervised Learning, Lung cancer, Pancreatic Cancer, 3D CNN, IPMN, Cancer Diagnosis.

I. INTRODUCTION

Cancer affects a significant portion of the world’s population; approximately 40% of people will be diagnosed with cancer at some point during their lifetime, with an overall mortality of 171.2 per 100,000 people per year (based on 2008-2012 deaths) [1]. Lung and pancreatic cancers are two of the most common cancers. While lung cancer is the largest cause of cancer-related deaths in the world, pancreatic cancer has the poorest prognosis, with a 5-year survival rate of just 7% in the United States [1]. With regard to pancreatic cancer, we focus in this work on the challenging problem of automatic diagnosis of Intraductal Papillary Mucinous Neoplasms (IPMN). IPMN is a pre-malignant condition that, if left untreated, can progress to invasive cancer. IPMNs are mucin-producing neoplasms found in the main pancreatic duct and its branches, and they are radiographically identifiable precursors to pancreatic cancer [2]. Detection and characterization of these lung and pancreatic tumors can aid

∗ indicates corresponding author ([email protected]). S. Hussein and U. Bagci are with the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF), Orlando, FL; P. Kandel, C. Bolan and M. Wallace are with the Mayo Clinic, Jacksonville, FL.

[Figure 1 diagram: Computer Aided Diagnosis (CAD) branches into supervised learning (Deep Learning (CNN), Multi-Task Learning) and unsupervised learning (Clustering, Proportion SVM), evaluated on two case studies: 1. Lung nodule characterization; 2. IPMN classification.]

Figure 1: A block diagram representing the different schemes, methods and experimental case studies presented in this paper. We categorize CAD systems into supervised and unsupervised schemes. For the supervised scheme, we propose a new algorithm consisting of a 3D CNN architecture and graph-regularized sparse multi-task learning, and perform evaluations for lung nodule characterization. For the unsupervised scheme, we propose a new clustering algorithm, ∝-SVM, and test it for the classification of lung nodules and IPMN.

in early diagnosis and, hence, an increased chance of survival through appropriate treatment/surgery plans.

Conventionally, CAD systems are designed to assist radiologists in making accurate and fast clinical decisions by reducing the number of false positives and false negatives. For diagnostic decision making, a higher emphasis is placed on increased sensitivity, as a false flag is more tolerable than a tumor being missed or incorrectly classified as benign. In this regard, computerized analysis of imaging features (and markers) is fast becoming a key instrument for radiologists to improve their diagnostic decisions. In the literature, automated detection and diagnosis methods have been developed for tumors in different organs such as the breast, colon, brain, lung, liver, and prostate. Generally, a CAD system comprises preprocessing and feature engineering (feature extraction and selection), followed by a classification scheme. However, with the success of deep learning, a transition from feature engineering to feature learning has been observed in the medical image analysis literature. Such systems employ Convolutional Neural Networks (CNN) as a feature extractor followed by a conventional classifier such as Random Forest (RF) [3], [4].

This work is a significant extension of our IPMI 2017 study [5]. We present a supervised strategy to perform the risk-stratification task for lung nodules from low-dose CT scans.

arXiv:1801.03230v2 [cs.CV] 29 Jul 2018


We first propose a 3D CNN based discriminative feature extraction strategy for lung nodules. We contend that 3D networks are important for the characterization of lung nodules in CT images, which are inherently 3-dimensional. The use of conventional 2D CNN methods leads to the loss of vital volumetric information, which can be crucial for precise risk assessment of lung nodules. The improved performance of 3D CNN networks over 2D networks is thoroughly discussed in [6]. Using the deep learning based approach, we can also circumvent hand-crafted feature extraction, meticulous feature engineering, and parameter tuning. In the absence of a large number of labeled training examples, we utilize a pre-trained 3D CNN architecture and fine-tune the network with a lung nodule dataset. Inspired by the significance of lung nodule attributes for the clinical determination of malignancy [7], we utilize information about six high-level nodule attributes, namely calcification, spiculation, sphericity, lobulation, margin, and texture (Figure 2-A), to improve automatic benign-malignant classification. Furthermore, we integrate features associated with these high-level nodule attributes obtained from the 3D CNN and incorporate them in a graph-regularized multi-task learning (MTL) framework to yield the final malignancy output. To the best of our knowledge, our work is the first to study transfer learning of a 3D CNN for lung nodule malignancy determination. We also analyze the impact of the aforementioned lung nodule attributes on malignancy determination and find these attributes to be complementary for obtaining the malignancy scores. Additionally, we study different regularizers as well as multi-task learning approaches, such as trace-norm and graph-regularized MTL, for regression.

Lastly, inspired by the successful application of unsupervised methods in other domains, we explore the potential of unsupervised learning strategies for lung nodule and IPMN classification. Specifically, we extract important discriminative information from a large amount of unlabeled data. We employ both hand-crafted features and deep learning features extracted from pre-trained networks for classification. In order to obtain an initial set of labels in an unsupervised fashion, we cluster the samples into different groups in the feature domain. We then propose to train a Proportion-Support Vector Machine (∝SVM) classifier using label proportions rather than instance labels. The trained model is then employed to classify testing samples as benign or malignant.

II. RELATED WORK

This section summarizes advances in machine learning applied to medical imaging and CAD systems developed for lung cancer diagnosis. Since the automatic characterization of IPMN has not been extensively studied in the literature, relevant works are selected from clinical studies; our work will be the first in this regard.

Imaging Features and Classifiers: Although not an objective of this paper, the detection of lung nodules has also been a subject of interest among researchers [8], [9], [10]. The approach in [10] comprises a CNN with dense connections for single-shot detection of lung nodules without explicit false-positive removal. Conventionally, on the other hand, the risk stratification (classification) of lung nodules requires nodule segmentation, computation and selection of low-level features from the image, and the use of a standard classifier/regressor. In the approach by [11], different physical statistics including intensity measures were extracted, and class labels were obtained using Artificial Neural Networks (ANN). In [12], lung nodules were segmented using appearance-based models followed by shape analysis using spherical harmonics; the last step involved k-nearest neighbor based classification. Another approach extended 2D texture features, including Local Binary Patterns (LBP), Gabor and Haralick features, to 3D [13]; classification using a Support Vector Machine (SVM) was performed as the final step. In another approach by [14], segmentation was implemented via 3D active contours, and the rubber band straightening transform was applied to obtain texture features; a Linear Discriminant Analysis (LDA) classifier was applied to get class labels. Lee et al. [15] introduced a feature selection based approach utilizing both clinical and imaging data. Information content and feature relevance were measured using an ensemble of a genetic algorithm (GA) and the random subspace method (RSM); lastly, LDA was applied to obtain the final classification on the condensed feature set. In a recent work, spherical harmonics features were fused with deep learning features [4], and RF classification was then employed for lung nodule characterization. Hitherto, the application of CNN to nodule characterization has been limited to 2D space [16], thus falling short of incorporating vital contextual and volumetric information. In another approach, Shin et al. [17] employed a CNN for the classification of lung nodules. Besides not being fully 3D, that approach did not take high-level nodule attributes into account and requires training an off-the-shelf classifier such as RF or SVM.

Information about different high-level image attributes has been found to contribute to the malignancy characterization of lung nodules. In a study exploring the correlation between malignancy and nodule attributes, [7] found that 82% of the lobulated, 93% of the ragged, 97% of the densely spiculated, and 100% of the halo nodules were malignant in a particular dataset, whereas 66% of the round nodules were determined to be benign. Automatic determination of lung nodule attributes and types was explored in [18], where the objective was to classify six different nodule types: solid, non-solid, part-solid, calcified, perifissural and spiculated nodules. However, that approach is based on a 2D CNN and fell short of estimating the malignancy of lung nodules. In this work, we similarly employ a 3D CNN to mine a discriminative and rich feature set associated with each of these 6 attributes. Finally, we deduce the malignancy likelihood by combining these feature representations via MTL.

IPMN: Although there has been considerable progress in developing automatic approaches to segment the pancreas and its cysts [19], [20], the use of CAD systems to perform fully automatic risk-stratification of IPMNs is limited. For instance, the approach by [21] investigated the influence of 360 imaging features ranging from intensity, texture, and


shape to stratify subjects as low- or high-grade IPMN. In another example, [22] extracted texture features from the solid component of segmented cysts, followed by a feature selection and classification scheme. It is important to note that both of these approaches [21], [22] required segmentation of the cysts or pancreas and were evaluated on CT scans. In contrast, our experimental setup does not require segmentation of the cysts or pancreas, and we evaluate IPMNs on MRI scans, a preferred imaging modality due to the absence of radiation exposure and its improved soft-tissue contrast [23]. To the best of our knowledge, our study is the largest IPMN classification study, consisting of 171 subjects across both modalities (CT and MRI).

Unsupervised Learning: Typically, visual recognition and classification tasks are addressed using supervised learning algorithms, which use labeled data. However, for tasks where manually generating labels for large datasets is laborious and expensive, unsupervised learning methods are of significant value. Unsupervised techniques have been used to solve problems in various domains, ranging from object categorization [24] to speech processing [25]. These methods typically relied on some complementary information provided with the data to improve learning, which may not be available for many classification tasks in medical imaging.

In medical imaging, different approaches have used unsupervised learning for detection and diagnosis problems. The approach by Shin et al. [26] used stacked autoencoders for multiple organ detection in MRI scans. Vaidhya et al. [27] presented a brain tumor segmentation method with a stacked denoising autoencoder, evaluated on multi-sequence MRI images. In the work by Sivakumar et al. [28], the segmentation of lung nodules is performed with unsupervised clustering methods. The work by Kumar et al. [3] used features from an autoencoder for lung nodule classification. These autoencoder approaches, however, did not yield satisfactory classification results. Unsupervised feature learning remains an active research area for the medical imaging community, more recently with Generative Adversarial Networks (GAN) [29]. In order to exploit the information from unlabeled images, Zhang et al. [30] described a semi-supervised method for the classification of four types of nodules. In sharp contrast to these approaches, the unsupervised learning strategies presented in this paper do not involve feature learning using autoencoders. Using sets of hand-crafted as well as pre-trained deep learning features, we propose an unsupervised learning approach in which an initially estimated label set is progressively improved so as to achieve better tumor classification performance.

Our Contributions

A block diagram representing the different supervised and unsupervised schemes is presented in Figure 1. Overall, our main contributions in this work can be summarized as follows:

• For supervised learning, we present a 3D CNN based approach to fully appreciate the anatomical information in 3D, which would otherwise be lost in conventional 2D approaches. Briefly, we apply fine-tuning to avoid the requirement for a large number of volumetric training examples for the 3D CNN: we fine-tune a network (which was trained on 1 million videos) using the CT data.

• We apply graph-regularized sparse MTL to integrate the complementary feature information from lung nodule attributes so as to improve malignancy determination. Figure 2-A shows lung nodule attributes at varying levels of prominence.

As a significant extension of our IPMI 2017 work [5], we make the following additional contributions in this paper:

• We propose and evaluate different supervised and unsupervised methods for both lung nodule and IPMN diagnosis. In an era where the wave of deep learning has swept into almost all domains of visual analysis, we investigate the contribution of features extracted from different deep learning architectures.

• Our work can be considered a significant milestone in studying the potential of unsupervised learning methods for lung nodule and IPMN classification using both hand-crafted and deep learning features. Additionally, in order to alleviate the effect of noisy labels (i.e., mislabeling) obtained during clustering, we propose to employ ∝SVM, which is trained on label proportions. Instead of hard-assigning labels, we estimate the label proportions in a data-driven manner.

Beyond these contributions, we have also extended the evaluation set to include IPMN diagnosis with MRI scans on a dataset consisting of over 170 scans. To the best of our knowledge, this is the first work to investigate the automatic diagnosis of IPMN with MRI.

III. SUPERVISED LEARNING METHODS

A. Problem Formulation

Let X = [x1, x2 . . . xn]^T ∈ R^{n×d} represent the input set, which consists of features from n images of lung nodules, each having dimension d. Each data sample has an attribute/malignancy score given by Y = [y1, y2 . . . yn], where Y^T ∈ R^{n×1}. Generally, in CAD systems, X consists of features extracted from images, and Y represents the malignancy score on a 1-5 scale, where 1 represents benign and 5 represents malignant. In supervised learning, the labeled training data are used to learn the coefficient vector, or regression estimator, W ∈ R^d. During testing, W is then used to estimate Y for an unseen example. For regression, a regularizer is added to prevent over-fitting. Thus, a least-squares regression function with ℓ1 regularization can be represented as:

\min_W \|XW - Y\|_2^2 + \lambda \|W\|_1 . \quad (1)

In the above equation, the sparsity level of the coefficient vector W = [w1, w2 . . . wd] is controlled by λ. It can be


[Figure 2: (A) Lung nodule attributes — example nodules shown with low to high prominence of calcification (Cal), sphericity (Sph), margin (Mar), lobulation (Lob), spiculation (Spic) and texture (Tex), and (g) a histogram of the number of nodules at each malignancy score (1-5). (B) An overview of the proposed supervised approach — per-attribute 3D CNNs (3D convolution, max pooling, fully connected layers yielding 4096-D features) feeding a graph-regularized sparse representation within multi-task learning to produce the malignancy score.]

Figure 2: (A) A visualization of lung nodules having different levels of attributes. Moving from the top (attribute absent) to the bottom (attribute prominently visible), the prominence of the attribute increases. Different attributes, including calcification, sphericity, margin, lobulation, spiculation and texture, can be seen in (a-f). The graph in (g) depicts the number of nodules with different malignancy levels in our experiments using the publicly available dataset [31]. An overview of the proposed 3D CNN based graph-regularized sparse MTL approach is presented in (B).

observed that Eq. (1) is an example of an unconstrained convex optimization problem that is not differentiable at wi = 0, so a closed-form solution with a global minimum is not feasible for Eq. (1). In that case, we can represent the problem in the form of a constrained optimization function:

\min_W \|XW - Y\|_2^2 , \quad \text{s.t.} \ \|W\|_1 \leq t, \quad (2)

where t and λ have an inverse relationship. The function in Eq. (2) is convex and the constraints define a convex set; since the problem is convex, a local minimizer of the objective function subject to the constraints corresponds to a global minimizer. In the following subsections, we extend this supervised setting to deep learning and MTL in order to characterize nodules as benign or malignant.
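To make Eq. (1) concrete, here is a small numpy sketch (ours, not from the paper): when the design matrix has orthonormal columns, the ℓ1-regularized least-squares problem has a closed-form soft-thresholding solution, the same proximal operation that underlies the optimization method used later for the full objective.

```python
import numpy as np

def soft_threshold(z, lam):
    # proximal operator of lam * ||.||_1, applied element-wise
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
# design with orthonormal columns (X^T X = I), so the minimizer of
# ||XW - Y||_2^2 + lam * ||W||_1 is soft_threshold(X^T Y, lam / 2)
X, _ = np.linalg.qr(rng.normal(size=(20, 5)))
Y = rng.normal(size=20)
lam = 0.5
w = soft_threshold(X.T @ Y, lam / 2.0)

def objective(v):
    return np.sum((X @ v - Y) ** 2) + lam * np.sum(np.abs(v))
```

Since the problem is convex, no perturbation of `w` can lower the objective, which makes the closed form easy to sanity-check numerically.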

B. 3D Convolutional Neural Network and Fine-tuning

We use a 3D CNN [6] trained on the Sports-1M dataset [32] and fine-tune it on the lung nodule dataset. Sports-1M consists of 487 classes with 1 million videos. As the lung nodule dataset does not have a large number of training examples, fine-tuning allows us to acquire dense feature representations learned from a large-scale dataset (Sports-1M). The 3D architecture consists of 5 sets of convolution layers, 2 fully-connected layers and 1 soft-max classification layer. Each convolution set is followed by a max-pooling layer. The input consists of 16 non-overlapping slices with a total dimension of 3×16×128×171. The numbers of filters in the first 3 convolution layers are 64, 128 and 256 respectively, whereas there are 512 filters in the last 2 layers. The fully-connected layers have dimension 4096, which is also the length of the feature vectors used as input to the MTL framework. Implementation details are given in Section V-C.
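The shape flow through such a stack of convolution and pooling layers can be sketched with simple arithmetic. This is an illustrative sketch only: the 112×112 crop size and the pooling schedule below follow the standard C3D design and are our assumptions (the paper feeds 16 slices at 128×171 before cropping), and the asymmetric padding of C3D's final pooling layer is ignored.

```python
def conv3d_out(shape, kernel=3, stride=1, pad=1):
    # output size of a 3D convolution per dimension
    return tuple((s + 2 * pad - kernel) // stride + 1 for s in shape)

def pool_out(shape, kernel, stride):
    # output size of a max-pooling layer (no padding)
    return tuple((s - k) // st + 1 for s, k, st in zip(shape, kernel, stride))

# assumed input: 16 frames cropped to 112x112 (depth, height, width)
shape = (16, 112, 112)
# the first pool preserves the temporal dimension; later pools halve all dims
pools = [(1, 2, 2)] + [(2, 2, 2)] * 4
for p in pools:
    shape = conv3d_out(shape)   # 3x3x3 conv, stride 1, pad 1: shape preserved
    shape = pool_out(shape, p, p)
print(shape)   # spatio-temporal size entering the fully-connected layers
```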

C. Multi-task learning (MTL)

Multi-task learning is an approach for learning multiple tasks simultaneously while considering the disparities and similarities across those tasks. Given M tasks, the goal is to improve the learning of the model for task i, where i ∈ M, by using the information contained in all M tasks. We formulate the malignancy determination of lung nodules as an MTL problem in which different tasks represent the different attributes of a nodule. The aim is to learn a joint model while exploiting the dependencies among the attributes (tasks) in feature space. Consider a dataset D comprising M tasks, where each task denotes an attribute; the correlations between these tasks and the shared feature representation are not known a priori. We aim to jointly learn these attributes while exploiting feature-level dependencies so as to improve the regression of malignancy using the other attributes.

As shown in Figure 2-B, we design lung tumor characterization as an MTL problem, where each task has model parameters Wm that are utilized to characterize the corresponding task m. When W = [W1, W2 . . . WM] ∈ R^{d×M} constitutes a rectangular matrix, rank can be considered a natural extension of cardinality, and the nuclear/trace norm leads to low-rank solutions. In some cases, nuclear-norm regularization can be considered the ℓ1-norm of the singular values [33]. The trace norm, the sum of the singular values, is the convex envelope of the rank of a matrix (which itself is non-convex) over the unit ball. After substituting the trace norm for the ℓ1-norm in Eq. (1), the least-squares loss function with trace-norm regularization can be formulated as:

\min_W \sum_{i=1}^{M} \|X_i W_i - Y_i\|_2^2 + \rho \|W\|_* , \quad (3)

where ρ adjusts the rank of the matrix W, and \|W\|_* = \sum_i \sigma_i(W) is the trace norm, with σ denoting the singular values. However, the trace norm's assumption that models share a common subspace is restrictive for some applications.
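As a sketch of why trace-norm regularization yields low-rank W (our illustration, assuming a proximal-style treatment of Eq. (3)): the proximal operator of the trace norm soft-thresholds the singular values, zeroing out the smallest ones.

```python
import numpy as np

def trace_norm_prox(W, rho):
    # proximal operator of rho * ||W||_*: soft-threshold the singular values
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - rho, 0.0)) @ Vt

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 6))          # illustrative d x M parameter matrix
W_low = trace_norm_prox(W, rho=2.0)  # singular values below rho are zeroed
```

Singular values smaller than ρ are removed entirely, which is exactly how the regularizer drives W toward a low-rank (shared-subspace) solution.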

As the task relationships are often unknown and must be learned from data, we represent the tasks and their relations in the form of a graph. Let Υ = (V, E) represent a complete graph in which the nodes V correspond to tasks and the edges E model affinities between tasks. In such a case, a regularization can be applied on the graph to model task dependencies [34]. The complete graph can be modeled as a structure matrix S = [e1, e2 . . . e‖E‖] ∈ R^{M×‖E‖}, where the deviation between pairs of tasks is regularized as:

\|WS\|_F^2 = \sum_{i=1}^{\|E\|} \|W e_i\|_2^2 = \sum_{i=1}^{\|E\|} \|W_{e_i^a} - W_{e_i^b}\|_2^2 , \quad (4)

where e_i^a and e_i^b correspond to the nodes a and b joined by edge i, and e_i ∈ R^M. The matrix S is an incidence matrix in which e_i^a and e_i^b are set to 1 and -1, respectively, if nodes a and b are connected in the graph. Eq. (4) can be rewritten as:

\|WS\|_F^2 = \mathrm{tr}((WS)^T (WS)) = \mathrm{tr}(W S S^T W^T) = \mathrm{tr}(W L W^T), \quad (5)

where L = SS^T is the Laplacian matrix and tr(·) denotes the trace of a matrix. The method used to compute the structure matrix S is discussed in Section V-C.
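The identity in Eq. (5) can be verified numerically. The sketch below (all values illustrative) builds the incidence matrix S for a complete task graph and checks that ‖WS‖²_F equals tr(WLWᵀ) for a random W.

```python
import numpy as np

rng = np.random.default_rng(2)
M, d = 4, 5                       # M tasks with d-dimensional models
W = rng.normal(size=(d, M))
# complete graph on the M tasks: one incidence column per edge (a, b)
edges = [(a, b) for a in range(M) for b in range(a + 1, M)]
S = np.zeros((M, len(edges)))
for i, (a, b) in enumerate(edges):
    S[a, i], S[b, i] = 1.0, -1.0
L = S @ S.T                       # Laplacian of the task graph
lhs = np.linalg.norm(W @ S, "fro") ** 2
rhs = np.trace(W @ L @ W.T)
```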

There may exist disagreements between the scores from different experts (radiologists) due to the inherent uncertainty in their evaluations. For instance, while one radiologist may give a nodule j a malignancy score of x_{j1}, another may give a score of x_{j2}. In order to reflect these uncertainties in our algorithm, we formulate a scoring function that models these inconsistencies:

\Psi(j) = \left( \exp\left( \frac{-\sum_r (x_{jr} - \mu_j)^2}{2 \sigma_j} \right) \right)^{-1} . \quad (6)

For a particular example j, this inconsistency measure is denoted Ψ(j); x_{jr} is the score given by the r-th radiologist (expert), whereas μ_j and σ_j represent the mean and standard deviation of the scores, respectively. For simplicity, we omit the task index; however, this inconsistency score is calculated for all tasks under consideration. The final objective function of graph-regularized sparse least-squares optimization with the inconsistency measure can be expressed as:

\min_W \sum_{i=1}^{M} \underbrace{\|(X_i + \Psi_i) W_i - Y_i\|_2^2}_{①} + \underbrace{\rho_1 \|WS\|_F^2}_{②} + \underbrace{\rho_2 \|W\|_1}_{③} , \quad (7)

where ρ1 tunes the penalty for the graph structure and ρ2 controls the sparsity level. In Eq. (7), the least-squares loss term ① decouples the tasks, whereas ② and ③ model their interdependencies so as to learn a joint representation.
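The inconsistency measure of Eq. (6) can be sketched as follows (our illustration; the behavior at σ_j = 0, i.e. perfect agreement among radiologists, is not specified in the text, so returning 1 there is our assumption):

```python
import numpy as np

def inconsistency(scores):
    # Eq. (6): grows as the radiologists' scores for an example disagree;
    # note (exp(-x))^(-1) = exp(x)
    scores = np.asarray(scores, dtype=float)
    mu, sigma = scores.mean(), scores.std()
    if sigma == 0:               # perfect agreement (our assumed convention)
        return 1.0
    return float(np.exp(np.sum((scores - mu) ** 2) / (2.0 * sigma)))

print(inconsistency([3, 3, 3]))  # radiologists agree
print(inconsistency([1, 3, 5]))  # radiologists disagree: larger value
```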

D. Optimization

To solve Eq. (7), the conventional approach would be standard gradient descent. However, standard gradient descent cannot be applied here because the ℓ1-norm is not differentiable at W = 0, and gradient descent fails to provide sparse solutions [35]. The objective function in the above equation has both smooth and non-smooth convex parts; in this case, the function can be solved by handling the non-smooth part separately. The ℓ1-norm constitutes the non-smooth part, and its proximal operator can be used for this purpose. Therefore, we utilize the accelerated proximal gradient method [36] to solve Eq. (7). The accelerated proximal approach is a first-order gradient method with a convergence rate of O(1/m²), where m is the number of iterations.
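A minimal sketch of accelerated proximal gradient descent for the ℓ1-only case of Eq. (1); this is our illustrative stand-in (a FISTA-style recursion), not the paper's implementation of [36], which handles the full objective of Eq. (7):

```python
import numpy as np

def accel_prox_lasso(X, Y, lam, iters=500):
    # accelerated proximal gradient for min_W ||XW - Y||_2^2 + lam * ||W||_1
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    w = z = np.zeros(X.shape[1])
    t = 1.0
    for _ in range(iters):
        g = z - step * 2.0 * X.T @ (X @ z - Y)       # gradient step
        w_new = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)  # momentum extrapolation
        w, t = w_new, t_new
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]                        # sparse ground truth
Y = X @ w_true + 0.01 * rng.normal(size=50)
w_hat = accel_prox_lasso(X, Y, lam=0.1)
```

The proximal step is the same soft-thresholding operator that gives the closed form in the orthonormal case, applied after each gradient step; the momentum term is what yields the O(1/m²) rate.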

IV. UNSUPERVISED LEARNING METHODS

Since the task of annotating medical images is laborious, expensive and time-consuming, we explore the potential of unsupervised learning approaches for developing CAD systems. Our proposed unsupervised framework includes three steps (Figure 3). First, we perform clustering on the appearance features obtained from the images to estimate an initial set of labels. Using the obtained initial labels, we compute label proportions corresponding to each cluster. We finally train a classifier using the label proportions and clusters to obtain the final classification.

A. Initial Label Estimation

Let X = [x_1, x_2, . . . , x_n]^T ∈ R^{n×d} represent the input matrix which contains features from n images such that x ∈ R^d. In order to obtain an initial set of labels corresponding to each sample, we cluster the data into 2 ≤ k < n clusters using k-means. Let A represent the n × k assignment matrix which denotes the membership assignment of each sample to a cluster. The optimal clustering would minimize the following objective function:

\operatorname*{arg\,min}_{\mu_v, A} \; \sum_{v=1}^{k} \sum_{u=1}^{n} A(u, v) \, \|x_u - \mu_v\|^2, \quad \text{s.t. } A(u, v) \in \{0, 1\}, \;\; \sum_{v=1}^{k} A(u, v) = 1, (8)

where µ_v is the mean of the samples in cluster v. The label c_u for the uth sample can then be estimated as:

c_u = \operatorname*{arg\,max}_{v} A(u, v). (9)

These labels serve as an initial set used to estimate label proportions, which are then used to train a proportion-SVM (∝SVM) for further improvements. Here it is important to state that when data is divided into groups/clusters through clustering, one may assume that each cluster corresponds to a particular class. In our work, clustering is used to estimate an initial set of labels that is progressively refined in the subsequent steps.
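The initial label estimation of Eqs. (8)-(9) amounts to plain k-means; a self-contained numpy sketch follows (Lloyd's iterations with a deterministic farthest-point initialization, which is a choice made for this demo rather than part of the paper's method):

```python
import numpy as np

def initial_labels(X, k=2, iters=50):
    """Estimate initial labels (Eqs. (8)-(9)) with k-means.

    X: (n, d) feature matrix (e.g. GIST descriptors). Returns
    c_u = argmax_v A(u, v) for every sample u.
    """
    # farthest-point initialization: deterministic, avoids degenerate seeds
    mu = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - m) ** 2).sum(1) for m in mu], axis=0)
        mu.append(X[d.argmax()])
    mu = np.array(mu, dtype=float)
    for _ in range(iters):
        # assignment step: A(u, v) = 1 for the nearest centroid
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        c = d.argmin(axis=1)
        # update step: mu_v = mean of the samples assigned to cluster v
        for v in range(k):
            if np.any(c == v):
                mu[v] = X[c == v].mean(axis=0)
    return c

# demo: two well-separated blobs are recovered as the two clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(3.0, 0.3, (20, 4))])
c = initial_labels(X, k=2)
```

In the actual pipeline these cluster indices are only a noisy starting point; the subsequent steps refine them.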



Figure 3: An outline of the proposed unsupervised approach. Given the input images, we compute GIST features and perform k-means clustering to get the initial set of labels, which can be noisy. Using the set of labels, we compute label proportions corresponding to each cluster/group (Eq. (11)). We finally employ ∝SVM to learn a discriminative model using the features and label proportions.

TABLE I: List and details of different experiments performed for supervised and unsupervised learning along with their evaluation sets.

Experiments | Details | Evaluation Set
E1 | Supervised learning, 3D CNN based multi-task learning with attributes, fine-tuning (C3D) network | 3D dataset: malignancy score regression of lung nodules (CT)
E2 | Unsupervised learning, GIST features, proportion-SVM | 2D dataset: lung nodule (CT) and IPMN (MRI) classification
E3 | Unsupervised learning, features from different layers of 2D VGG network | 2D dataset: lung nodule (CT) and IPMN (MRI) classification
E4 | Supervised learning to establish classification upper-bound, GIST and VGG features with SVM and RF | 2D dataset: lung nodule (CT) and IPMN (MRI) classification

B. Learning with the Estimated Labels

Since our initial label estimation approach is unsupervised, there are uncertainties associated with the estimated labels. It is, therefore, reasonable to assume that learning a discriminative model based on these noisy instance level labels can deteriorate classification performance. In order to address this issue, we model the instance level labels as latent unknown variables and thereby consider group/bag level labels.

Inspired by the ∝SVM approach [37], which models the latent instance level variables using the known group level label proportions, we formulate our learning problem such that clusters are analogous to the groups. In our formulation, each cluster j can be represented as a group such that the majority of samples belong to the class j. Considering the groups to be disjoint such that \bigcup_{v=1}^{k} \Omega_v = \{1, 2, \dots, n\}, where Ω_v represents the groups, the objective function of the large-margin ∝SVM after convex relaxation can be formulated as:

\min_{c \in C} \; \min_{w} \left( \frac{1}{2} w^{T} w + K \sum_{u=1}^{n} L\big(c_u, \, w^{T} \phi(x_u)\big) \right),

C = \Big\{ c \;\Big|\; |p_v(c) - p_v| \le \epsilon, \; c_u \in \{-1, 1\}, \; \forall\, v = 1, \dots, k \Big\}, (10)

where p_v(c) and p_v represent the estimated and true label proportions, respectively. In Eq. (10), c is the set of instance level labels, φ(.) is the input feature map, K denotes the cost parameter, and L(.) represents the hinge-loss function for maximum-margin classifiers such as SVM. An alternative approach based on training a standard SVM classifier with clustering assignments is discussed in Section V-D.

The optimization in Eq. (10) is, in fact, an instance of Multiple Kernel Learning, which can be solved using the cutting plane method, where the set of active constraints is incrementally computed. The goal is to find the most violated constraint; however, the objective function still decreases even under further relaxation, i.e., aiming for any violated constraint. Further details about the optimization can be found in [37].
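A highly simplified sketch of the alternating idea behind ∝SVM — not the cutting plane solver of [37] — is given below: with labels fixed, fit a max-margin classifier; with the classifier fixed, relabel each bag so that its positive fraction matches the given label proportion, preferring the highest-scoring samples. The subgradient SVM trainer, the demo data, and all hyperparameters are illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2*||w||^2 + mean hinge loss
    (a plain stand-in for the inner SVM solve)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = (y * (X @ w + b)) < 1.0               # margin violations
        gw = lam * w - (y[viol, None] * X[viol]).sum(0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def proportion_svm(X, bags, props, rounds=10):
    """Alternate between (i) training on the current labels and
    (ii) relabeling each bag to match its label proportion."""
    y = -np.ones(len(X))
    for v, p in props.items():                       # crude initial labeling
        idx = np.where(bags == v)[0]
        y[idx[: int(round(p * len(idx)))]] = 1.0
    for _ in range(rounds):
        w, b = train_linear_svm(X, y)
        score = X @ w + b
        for v, p in props.items():
            idx = np.where(bags == v)[0]
            order = idx[np.argsort(-score[idx])]     # best-scoring first
            y[idx] = -1.0
            y[order[: int(round(p * len(idx)))]] = 1.0
    return y, (w, b)

# demo: bag 0 is mostly positive (10/12), bag 1 mostly negative (2/12)
rng = np.random.default_rng(0)
pos = rng.normal(2.0, 0.3, size=(12, 2))
neg = rng.normal(-2.0, 0.3, size=(12, 2))
X = np.vstack([neg[:2], pos[:10], neg[2:], pos[10:]])
bags = np.array([0] * 12 + [1] * 12)
y_true = np.array([-1] * 2 + [1] * 10 + [-1] * 10 + [1] * 2, dtype=float)
y_hat, _ = proportion_svm(X, bags, {0: 10 / 12, 1: 2 / 12})
```

On this separable toy problem, the alternation recovers the instance-level labels from bag proportions alone; the cutting plane solver of [37] handles the same constraint set far more carefully.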

C. Calculating Label Proportions

In the conventional ∝SVM approach, the label proportions are known a priori. Since our approach is unsupervised, both instance level labels and group label proportions are unknown. Moreover, establishing strong assumptions about the label proportions may affect learning. It is, however, reasonable to assume that a large number of instances in any group carry the same label and there may be a small number of instances which are outliers. The label proportions serve as a soft-label for a bag, where a bag can be considered as a super-instance. In order to determine the label proportions in a data-driven manner, we use the estimated labels obtained from clustering. The label proportion p_j corresponding to the group j can be represented as:

p_j = n^{-1} \sum_{i=1}^{n} I(y_i = j), (11)



Figure 4: Axial T2 MRI scans illustrating the pancreas. The top row shows different ROIs of the pancreas, along with a magnified view of a normal pancreas (outlined in blue). The bottom row shows ROIs from subjects with IPMN in the pancreas, which is outlined in red.

where I(.) is the indicator function, which yields 1 when y_i = j. The ∝SVM is trained using the image features and label proportions to classify the testing data. It is important to mention that the ground truth labels (benign/malignant labels) are used only to evaluate the proposed framework and are not used in estimating label proportions or training the proportion-SVM. In addition, clustering and label proportion calculation are performed only on the training data, and the testing data remains completely unseen by the ∝SVM. The number of clusters is fixed as 2, i.e., the benign and malignant classes, and the clustering result was inspected to assign benign and malignant labels to the clusters.
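Eq. (11) reduces to counting label occurrences; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def label_proportions(initial_labels, k=2):
    """Label proportions of Eq. (11): p_j = n^-1 * sum_i I(y_i = j),
    computed from the noisy initial labels produced by clustering."""
    y = np.asarray(initial_labels)
    n = len(y)
    return {j: float(np.sum(y == j)) / n for j in range(k)}

# four of six initial labels are 0, two are 1
p = label_proportions([0, 0, 1, 0, 1, 0])  # -> {0: 2/3, 1: 1/3}
```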

V. EXPERIMENTS

A. Data for Lung Nodules

For testing and evaluation, we used the LIDC-IDRI dataset from the Lung Image Database Consortium [31], which is one of the largest publicly available lung nodule datasets. The dataset comprises 1018 CT scans with a slice thickness varying from 0.45 mm to 5.0 mm. Up to four radiologists annotated the lung nodules with diameters equal to or greater than 3.0 mm.

We considered nodules which were interpreted by at least three radiologists for evaluation. The number of nodules fulfilling this criterion was 1340. As a nodule may have different malignancy and attribute scores provided by different radiologists, their mean scores were used. The nodules have scores corresponding to these six attributes: (i) calcification, (ii) lobulation, (iii) spiculation, (iv) sphericity, (v) margin and (vi) texture, as well as malignancy (Figure 2). The malignancy scores ranged from 1 to 5, where 1 denoted benign and 5 denoted highly malignant nodules. To account for malignancy indecision among radiologists, we excluded nodules with a mean score of 3. The final evaluation set included 509 malignant and 635 benign nodules. As a pre-processing step, the images were resampled to be isotropic so as to have 0.5 mm spacing in each dimension.

B. Data for IPMN

The data for the classification of IPMN contain axial T2 MRI scans from 171 subjects. The scans were labeled by a radiologist as normal or IPMN. Out of the 171 scans, 38 subjects were normal, whereas the remaining 133 were from subjects diagnosed with IPMN. The in-plane spacing (xy-plane) of the scans ranged from 0.468 mm to 1.406 mm. As pre-processing, we first apply N4 bias field correction [38] to each image in order to normalize variations in image intensity. We then apply a curvature anisotropic image filter to smooth each image while preserving edges. For the experiments, 2D axial slices containing the pancreas (and IPMN) are cropped to generate Regions of Interest (ROIs) as shown in Figure 4. The large intra-class variation, especially due to the varying shapes of the pancreas, can also be observed in Figure 4. A list of the different supervised and unsupervised learning experiments along with their evaluation sets is tabulated in Table I.

C. Evaluation and Results - Supervised Learning

We fine-tuned the 3D CNN trained on the Sports-1M dataset [32], which had 487 classes. In order to train the network with binary labels for malignancy and the six attributes, we used the mid-point as a pivot and labeled samples as positive (or negative) based on their scores being greater (or lesser) than the mid-point. In our context, malignancy and attributes are characterized as tasks. The C3D network was fine-tuned with these 7 tasks, and 10 fold cross-validation was conducted. The requirement to have a large amount of labeled training data was evaded by fine-tuning the network. Since the input to the network required 3 channel image sequences with at


TABLE II: The comparison of the proposed approach with other methods using regression accuracy and mean absolute score difference for lung nodule characterization.

Methods | Accuracy (%) | Mean Score Difference
GIST features + LASSO | 76.83 | 0.675
GIST features + RR | 76.48 | 0.674
3D CNN features + LASSO (Pre-trained) | 86.02 | 0.530
3D CNN features + RR (Pre-trained) | 82.00 | 0.597
3D CNN features + LASSO (Fine-tuned) | 88.04 | 0.497
3D CNN features + RR (Fine-tuned) | 84.53 | 0.550
3D CNN MTL with Trace norm | 80.08 | 0.626
Proposed (3D CNN with Multi-task Learning - Eq. 7) | 91.26 | 0.459

least 16 slices, we concatenated the gray level axial channel as the other two channels.

Additionally, in order to ascertain that all input volumes have 16 slices, we performed interpolation where warranted. The final feature representation was obtained from the first fully connected layer of the 3D CNN, consisting of 4096 dimensions.

For computing the structure matrix S, we calculate the correlation between different tasks by estimating the normalized coefficient matrix W via a least square loss function with lasso, followed by the calculation of the correlation coefficient matrix [34]. In order to get a binary graph structure matrix, we thresholded the correlation coefficient matrix. As priors in Eq. (7), we used ρ1 = 1 and ρ2 = 10. Finally, to obtain the malignancy score for test images, the features from the network trained on malignancy were multiplied with the corresponding task coefficient vector W.
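A minimal numpy sketch of building such a binary structure matrix: correlate per-task coefficient vectors and threshold. The threshold value and the synthetic tasks below are illustrative, not the paper's setting:

```python
import numpy as np

def structure_matrix(W, tau=0.5):
    """Binary task-relatedness graph from a coefficient matrix.

    W: (d, M) matrix whose columns are per-task coefficient vectors.
    Pairwise Pearson correlations between columns are thresholded at
    `tau` (illustrative) to obtain the binary matrix S used in the
    ||WS||_F^2 regularizer of Eq. (7).
    """
    C = np.corrcoef(W.T)                 # (M, M) task correlation matrix
    return (np.abs(C) >= tau).astype(float)

# three tasks: the first two share structure, the third is unrelated
rng = np.random.default_rng(0)
w1 = rng.normal(size=200)
W = np.column_stack([w1, w1 + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
S = structure_matrix(W, tau=0.5)
```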

We evaluated our proposed approach using both classification and regression metrics. For classification, we considered a nodule to be successfully classified if its predicted score lies within ±1 of the ground truth score. For regression, we calculated the average absolute score difference between the predicted score and the true score. The comparison of our proposed MTL approach with other approaches, including GIST features [39], 3D CNN features from the pre-trained network + LASSO, Ridge Regression (RR), and 3D CNN MTL + trace norm, is tabulated in Table II. It can be observed that our proposed graph regularized MTL performs significantly better than the other approaches, both in terms of classification accuracy and mean score difference. The gain in classification accuracy was found to be 15% and 11% over GIST and trace-norm, respectively. In comparison with the pre-trained network, we obtain an improvement of 5% with the proposed MTL. In addition, our proposed approach reduces the average absolute score difference by 32% relative to GIST and by 27% relative to trace-norm.
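The two evaluation metrics can be sketched as follows (the function name and scores are illustrative):

```python
import numpy as np

def nodule_metrics(pred, truth):
    """Metrics used for malignancy scoring: a prediction counts as
    correct if it lies within +/-1 of the ground-truth score; the
    mean absolute score difference is reported alongside."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    accuracy = float(np.mean(np.abs(pred - truth) <= 1.0))
    mean_abs_diff = float(np.mean(np.abs(pred - truth)))
    return accuracy, mean_abs_diff

# differences are 0.8, 0.2, 1.5, 0.5: three of four fall within +/-1,
# so accuracy = 0.75 and the mean absolute difference = 0.75
acc, mad = nodule_metrics([2.2, 4.8, 1.0, 3.5], [3.0, 5.0, 2.5, 3.0])
```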

D. Evaluations and Results - Unsupervised Learning

For unsupervised learning, evaluations were performed on both the lung nodule and IPMN datasets. In order to compute image level features, we used GIST descriptors [39]. The number of clusters is fixed as 2, which accounts for the benign

and malignant classes. The clustering result was checked to assign benign and malignant labels to the clusters. We used 10 fold cross-validation to evaluate our proposed approach. The training samples, along with the label proportions generated using clustering, served as the input to the ∝SVM with a linear kernel.

To evaluate our unsupervised approach, we used accuracy, sensitivity and specificity as metrics. It can be observed in Table III that the proposed combination of clustering and ∝SVM significantly outperforms the other approaches in accuracy and sensitivity. In comparison with clustering+SVM, the proposed framework yields almost a 21% improvement in sensitivity for lung nodules and around a 7% improvement for IPMN classification. The low sensitivity and high specificity of the clustering, clustering+SVM, and clustering+RF approaches can be explained by the disproportionate assignment of instances as benign (normal) by these approaches, which is not found in the proposed approach. At the same time, the proposed approach records around 24% and 9% improvements in accuracy as compared to clustering for lung nodules and IPMN, respectively.

Are Deep Features good for Unsupervised Classification? Given the success of deep learning features for image classification tasks and their popularity with the medical imaging community, we explored their performance for classifying lung nodules and IPMN in an unsupervised manner. For this purpose, we used a pre-trained deep CNN architecture to extract features and then performed clustering to obtain baseline classification performance. We extracted features from fully-connected layers 7 and 8 of Fast-VGG [40], with and without applying the ReLU non-linearity. Classification accuracy using clustering over these features is shown in Figure 5.

It can be seen in Figure 5 that the features with the non-linearity (ReLU) are more discriminative for classification using clustering than those without ReLU. The same trend can be observed for both lung nodule and IPMN classification using the VGG-fc7 and VGG-fc8 layers. Owing to the larger evaluation set, the influence of ReLU is more prominent for lung nodules as compared to IPMN. Although the results between VGG-fc7 and VGG-fc8 are not substantially different,


TABLE III: Average classification accuracy, sensitivity and specificity of the proposed unsupervised approach for IPMN and lung nodule classification, compared with other methods.

Evaluation Set | Methods | Accuracy | Sensitivity | Specificity
IPMN Classification | Clustering | 49.18% | 45.34% | 62.83%
IPMN Classification | Clustering + RF | 53.20% | 51.28% | 69.33%
IPMN Classification | Clustering + SVM | 52.03% | 51.96% | 50.5%
IPMN Classification | Proposed approach | 58.04% | 58.61% | 41.67%
Lung Nodule Classification | Clustering | 54.83% | 48.69% | 60.04%
Lung Nodule Classification | Clustering + RF | 76.74% | 58.59% | 91.40%
Lung Nodule Classification | Clustering + SVM | 76.04% | 57.08% | 91.28%
Lung Nodule Classification | Proposed approach | 78.06% | 77.85% | 78.28%

[Figure 5 charts. Unsupervised classification accuracy of lung nodules using various deep features: VGG-fc7 47.11%, VGG-fc7-ReLU 54.3%, VGG-fc8 52.99%, VGG-fc8-ReLU 55.42%. Unsupervised classification accuracy of IPMN: VGG-fc7 49.12%, VGG-fc7-ReLU 51.47%, VGG-fc8 50.36%, VGG-fc8-ReLU 50.92%.]

Figure 5: Influence of deep learning features obtained from different layers of a VGG network, with and without ReLU non-linearities. The graph on the left shows accuracy, sensitivity and specificity for unsupervised lung nodule classification (clustering), whereas the right one shows the corresponding results for IPMN.

the highest accuracy for IPMN can be obtained by using VGG-fc7-ReLU features, and for lung nodules by using VGG-fc8-ReLU features. The non-linearity induced by ReLU clips the negative values to zero, which can sparsify the feature vector and reduce overfitting. Additionally, it can be seen that GIST features yield performance comparable to deep features (Table III). This can be explained by the fact that the deep networks were trained on the ImageNet dataset, so the filters in the networks were tuned more to the variations in natural images than in medical images. Classification improvement can be expected with unsupervised feature learning techniques such as GANs [29].

Classification using Supervised Learning
In order to establish an upper-bound on the classification performance, we trained a linear SVM and a Random Forest using GIST and different deep learning features with ground truth labels on the same 10 fold cross-validation sets. Table IV lists the classification accuracy, sensitivity, and specificity using GIST, VGG-fc7 and VGG-fc8 features for both IPMN and lung nodules. For both VGG-fc7 and VGG-fc8, we used features after ReLU since they were found to be more discriminative (Figure 5). Interestingly, for lung nodules, VGG-fc7 features along with the RF classifier are found to have results comparable to the combination of

GIST and the RF classifier. This can be explained by the fact that the deep networks were pre-trained on the ImageNet dataset, whereas handcrafted features such as GIST don't require any training. On the other hand, for smaller datasets such as IPMN, deep features are found to perform better than GIST. In order to balance the number of positive (IPMN) and negative (normal) examples, whose imbalance can be a critical drawback otherwise, we performed Adaptive Synthetic Sampling [41]. This was done to generate synthetic examples, in terms of features, from the minority class (normal).
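A toy sketch of the interpolation idea underlying such synthetic sampling: each new minority sample is drawn on the line segment between a minority point and one of its minority-class neighbors. ADASYN proper [41] additionally adapts how many samples each point generates based on its majority-class neighborhood; all names and sizes below are illustrative:

```python
import numpy as np

def oversample_minority(X_min, n_new, n_neighbors=3, seed=0):
    """SMOTE/ADASYN-style oversampling sketch: each synthetic example
    is a random interpolation between a minority sample and one of
    its nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = ((X_min - X_min[i]) ** 2).sum(1)
        d[i] = np.inf                          # exclude the point itself
        j = rng.choice(np.argsort(d)[:n_neighbors])
        lam = rng.random()                     # interpolation coefficient
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# e.g. 38 normal vs. 133 IPMN feature vectors: add 95 synthetic normals
rng = np.random.default_rng(1)
X_normal = rng.normal(0.0, 1.0, size=(38, 8))
X_new = oversample_minority(X_normal, n_new=95)
```

Because every synthetic coordinate is a convex combination of two minority samples, the new points stay inside the minority feature range.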

VI. DISCUSSION AND CONCLUDING REMARKS

In this study, we present a framework for the malignancy determination of lung nodules with 3D CNN based graph regularized sparse MTL. To the best of our knowledge, this is the first work where transfer learning is studied and empirically analyzed for 3D deep networks so as to improve risk stratification. Usually, data sharing for medical imaging is highly regulated, and the availability of experts (radiologists) to label these images is limited. As a consequence, access to crowdsourced, publicly gathered and annotated data such as videos may help in obtaining discriminative features for medical image analysis.


TABLE IV: Classification of IPMN and Lung Nodules using different features and supervised learning classifiers.

Evaluation Set | Features | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%)
IPMN Classification | GIST | SVM | 76.05 | 83.65 | 52.67
IPMN Classification | GIST | RF | 81.9 | 93.69 | 43.0
IPMN Classification | VGG-fc7 | SVM | 84.18 | 96.91 | 44.83
IPMN Classification | VGG-fc7 | RF | 81.96 | 94.61 | 42.83
IPMN Classification | VGG-fc8 | SVM | 84.22 | 97.2 | 46.5
IPMN Classification | VGG-fc8 | RF | 80.82 | 93.4 | 45.67
Lung Nodule Classification | GIST | SVM | 81.56 | 71.31 | 90.02
Lung Nodule Classification | GIST | RF | 81.64 | 76.47 | 85.97
Lung Nodule Classification | VGG-fc7 | SVM | 77.97 | 75.2 | 80.6
Lung Nodule Classification | VGG-fc7 | RF | 81.73 | 78.24 | 84.59
Lung Nodule Classification | VGG-fc8 | SVM | 78.76 | 74.67 | 82.29
Lung Nodule Classification | VGG-fc8 | RF | 80.51 | 76.03 | 84.24

We also analyzed the significance of different imaging attributes corresponding to lung nodules, including spiculation, texture, calcification and others, for risk assessment. Instead of manually modeling these attributes, we utilized a 3D CNN to learn rich feature representations associated with them. The graph regularized sparse MTL framework was employed to integrate the 3D CNN features from these attributes. We found the features associated with these attributes complementary to those corresponding to malignancy.

In the second part of this study, we explored the potential of unsupervised learning for malignancy determination. Since in most medical imaging tasks radiologists are required for annotation, acquiring labels to learn machine learning models is more cumbersome and expensive as compared to other computer vision tasks. In order to address this challenge, we employed clustering to obtain an initial set of labels and progressively refined them with the ∝SVM. We obtained promising results, and our proposed approach outperformed the other methods on the evaluation metrics.

Following up on the application of deep learning to almost all tasks in the visual domain, we studied the influence of different pre-trained deep networks for lung nodule classification. In some instances, we found that commonly used imaging features such as GIST have results comparable to those obtained from pre-trained network features. This observation can be explained by the fact that the deep networks were trained on ImageNet classification tasks, so the filters in the CNN were tuned more to the nuances of natural images than of medical images.

To the best of our knowledge, this is one of the first and the largest evaluations of a CAD system for IPMN classification. CAD systems for IPMN classification are a relatively new research problem, and there is a need to explore the use of different imaging modalities to improve classification. Although MRI remains the most common modality to study pancreatic cysts, CT images can also be used as a complementary imaging modality due to their higher resolution and ability to capture smaller cysts. Additionally, a combination of T2-weighted, contrast-enhanced and unenhanced T1-weighted sequences can help improve the detection and diagnosis of IPMN [23]. In this regard, multi-modal deep learning architectures can be deemed

useful [42]. The detection and segmentation of the pancreas can also be useful to make a better prediction about the presence of IPMN and cysts. Due to its anatomy, the pancreas is a challenging organ to segment, particularly in MRI images. To address this challenge, other imaging modalities can be utilized for the joint segmentation and diagnosis of pancreatic cysts and IPMN. Furthermore, the visualization of activation maps can be quite useful for clinicians to identify new imaging biomarkers that can be employed for diagnosis in the future.

The future prospects of using different architectures to perform unsupervised representation learning using GANs are promising. Instead of using hand-engineered priors for sampling in the generator, the work in [43] learned priors using denoising auto-encoders. For measuring sample similarity for complex distributions such as those in images, [44] jointly trained variational autoencoders and GANs. Moreover, the applications of CatGAN [45] and InfoGAN [46] to semi-supervised and unsupervised classification tasks in medical imaging are worth exploring as well.

There is a lot of potential for research in developing unsupervised approaches for medical imaging applications. Medical imaging has unique challenges associated with the scarcity of labeled examples. Moreover, unless corroborated by biopsy, there may exist a large variability in labeling from different radiologists. Although fine-tuning has helped to address the lack of annotated examples, the performance is limited due to large differences between domains. It is comparatively easier to obtain scan level labels than slice level labels. In this regard, weakly supervised approaches such as multiple instance learning (MIL) can be of great value. Active learning can be another solution to alleviate the difficulty of labeling. Deep learning for joint feature learning and clustering can be employed to obtain data-driven clusters [47]. In the future, these research directions can be pursued to address unique medical imaging challenges and to improve diagnostic decisions in clinics.

REFERENCES

[1] Society, A.C.: Cancer Facts & Figures. American Cancer Society (2016)
[2] Sadot, E., Basturk, O., Klimstra, D.S., Gonen, M., Anna, L., Do, R.K.G., D'Angelica, M.I., DeMatteo, R.P., Kingham, T.P., Jarnagin, W.R., et al.: Tumor-associated neutrophils and malignant progression in intraductal papillary mucinous neoplasms: an opportunity for identification of high-risk disease. Annals of Surgery 262(6), 1102 (2015)
[3] Kumar, D., Wong, A., Clausi, D.A.: Lung nodule classification using deep features in CT images. In: Computer and Robot Vision (CRV), 2015 12th Conference on. pp. 133-138. IEEE (2015)
[4] Buty, M., Xu, Z., Gao, M., Bagci, U., Wu, A., Mollura, D.J.: Characterization of Lung Nodule Malignancy Using Hybrid Shape and Appearance Features. In: MICCAI. pp. 662-670. Springer (2016)
[5] Hussein, S., Cao, K., Song, Q., Bagci, U.: Risk Stratification of Lung Nodules Using 3D CNN-Based Multi-task Learning. In: International Conference on Information Processing in Medical Imaging. pp. 249-260. Springer (2017)
[6] Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV. pp. 4489-4497. IEEE (2015)
[7] Furuya, K., Murayama, S., Soeda, H., Murakami, J., Ichinose, Y., Yauuchi, H., Katsuda, Y., Koga, M., Masuda, K.: New classification of small pulmonary nodules by margin characteristics on high-resolution CT. Acta Radiologica 40(5), 496-504 (1999)
[8] Setio, A.A.A., Ciompi, F., Litjens, G., Gerke, P., Jacobs, C., van Riel, S.J., Wille, M.M.W., Naqibullah, M., Sanchez, C.I., van Ginneken, B.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE TMI 35(5), 1160-1169 (2016)
[9] Setio, A.A.A., Traverso, A., De Bel, T., Berens, M.S., van den Bogaard, C., Cerello, P., Chen, H., Dou, Q., Fantacci, M.E., Geurts, B., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Medical Image Analysis 42, 1-13 (2017)

[10] Khosravan, N., Bagci, U.: S4ND: Single-Shot Single-Scale Lung Nodule Detection. arXiv preprint arXiv:1805.02279 (2018)
[11] Uchiyama, Y., Katsuragawa, S., Abe, H., Shiraishi, J., Li, F., Li, Q., Zhang, C.T., Suzuki, K., Doi, K.: Quantitative computerized analysis of diffuse lung disease in high-resolution computed tomography. Medical Physics 30(9), 2440-2454 (2003)
[12] El-Baz, A., Nitzken, M., Khalifa, F., Elnakib, A., Gimelfarb, G., Falk, R., El-Ghar, M.A.: 3D shape analysis for early diagnosis of malignant lung nodules. In: IPMI. pp. 772-783. Springer (2011)
[13] Han, F., Wang, H., Zhang, G., Han, H., Song, B., Li, L., Moore, W., Lu, H., Zhao, H., Liang, Z.: Texture feature analysis for computer-aided diagnosis on pulmonary nodules. Journal of Digital Imaging 28(1), 99-115 (2015)
[14] Way, T.W., Hadjiiski, L.M., Sahiner, B., Chan, H.P., Cascade, P.N., Kazerooni, E.A., Bogot, N., Zhou, C.: Computer-aided diagnosis of pulmonary nodules on CT scans: segmentation and classification using 3D active contours. Medical Physics 33(7), 2323-2337 (2006)
[15] Lee, M., Boroczky, L., Sungur-Stasik, K., Cann, A., Borczuk, A., Kawut, S., Powell, C.: Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. Artificial Intelligence in Medicine 50(1), 43-53 (2010)
[16] Chen, S., Ni, D., Qin, J., Lei, B., Wang, T., Cheng, J.Z.: Bridging computational features toward multiple semantic features with multi-task regression: A study of CT pulmonary nodules. In: MICCAI. pp. 53-60. Springer (2016)
[17] Shen, W., Zhou, M., Yang, F., Yang, C., Tian, J.: Multi-scale convolutional neural networks for lung nodule classification. In: IPMI. pp. 588-599. Springer (2015)
[18] Ciompi, F., Chung, K., Van Riel, S.J., Setio, A.A.A., Gerke, P.K., Jacobs, C., Scholten, E.T., Schaefer-Prokop, C., Wille, M.M., Marchiano, A., et al.: Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Scientific Reports 7, 46479 (2017)
[19] Zhou, Y., Xie, L., Fishman, E.K., Yuille, A.L.: Deep Supervision for Pancreatic Cyst Segmentation in Abdominal CT Scans. arXiv preprint arXiv:1706.07346 (2017)
[20] Cai, J., Lu, L., Zhang, Z., Xing, F., Yang, L., Yin, Q.: Pancreas segmentation in MRI using graph-based decision fusion on convolutional neural networks. In: MICCAI. pp. 442-450. Springer (2016)
[21] Hanania, A.N., Bantis, L.E., Feng, Z., Wang, H., Tamm, E.P., Katz, M.H., Maitra, A., Koay, E.J.: Quantitative imaging to evaluate malignant potential of IPMNs. Oncotarget 7(52), 85776 (2016)
[22] Gazit, L., Chakraborty, J., Attiyeh, M., Langdon-Embry, L., Allen, P.J., Do, R.K., Simpson, A.L.: Quantification of CT Images for the Classification of High- and Low-Risk Pancreatic Cysts. In: SPIE Medical Imaging. pp. 101340X. International Society for Optics and Photonics (2017)
[23] Kalb, B., Sarmiento, J.M., Kooby, D.A., Adsay, N.V., Martin, D.R.: MR imaging of cystic lesions of the pancreas. Radiographics 29(6), 1749-1765 (2009)

[24] Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: ICCV. vol. 1, pp. 370-377. IEEE (2005)
[25] Kamper, H., Jansen, A., Goldwater, S.: Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. In: Interspeech (2015)
[26] Shin, H.C., Orton, M.R., Collins, D.J., Doran, S.J., Leach, M.O.: Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8), 1930-1943 (2013)
[27] Vaidhya, K., Thirunavukkarasu, S., Alex, V., Krishnamurthi, G.: Multi-modal brain tumor segmentation using stacked denoising autoencoders. In: International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 181-194. Springer (2015)
[28] Sivakumar, S., Chandrasekar, C.: Lung nodule segmentation through unsupervised clustering models. Procedia Engineering 38, 3064-3073 (2012)
[29] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
[30] Zhang, F., Song, Y., Cai, W., Zhou, Y., Fulham, M., Eberl, S., Shan, S., Feng, D.: A ranking-based lung nodule image classification method using unlabeled image knowledge. In: IEEE ISBI. pp. 1356-1359. IEEE (2014)
[31] Armato III, S., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle, D.R., Henschke, C.I., Hoffman, E.A., et al.: The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics 38(2), 915-931 (2011)
[32] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: IEEE CVPR. pp. 1725-1732 (2014)
[33] Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471-501 (2010)
[34] Zhou, J., Chen, J., Ye, J.: MALSAR: Multi-task learning via structural regularization (2012)
[35] Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. Journal of Machine Learning Research 12(Jun), 1865-1892 (2011)

[36] Nesterov, Y.: Introductory lectures on convex optimization: A basic course, vol. 87. Springer Science & Business Media (2013)

[37] Yu, F., Liu, D., Kumar, S., Jebara, T., Chang, S.F.: ∝SVM for Learning with Label Proportions. In: Proceedings of The 30th International Conference on Machine Learning. pp. 504–512 (2013)

[38] Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C.: N4ITK: Improved N3 bias correction. IEEE Transactions on Medical Imaging 29(6), 1310–1320 (2010)

[39] Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV 42(3), 145–175 (2001)

[40] Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. In: British Machine Vision Conference (2014)

[41] He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: IEEE International Joint Conference on Neural Networks (IJCNN), IEEE World Congress on Computational Intelligence. pp. 1322–1328. IEEE (2008)

[42] Ma, L., Lu, Z., Shang, L., Li, H.: Multimodal convolutional neural networks for matching image and sentence. In: IEEE ICCV. pp. 2623–2631 (2015)

[43] Nguyen, A., Yosinski, J., Bengio, Y., Dosovitskiy, A., Clune, J.: Plug & play generative networks: Conditional iterative generation of images in latent space. arXiv preprint arXiv:1612.00005 (2016)

[44] Larsen, A.B.L., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. In: ICML (2016)

[45] Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390 (2015)

[46] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2172–2180 (2016)

[47] Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning. pp. 478–487 (2016)