
The Visual Computer https://doi.org/10.1007/s00371-018-1583-x

ORIGINAL ARTICLE

A novel deep hashing method for fast image retrieval

Shuli Cheng1 · Huicheng Lai1 · Liejun Wang2 · Jiwei Qin3

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract In recent years, deep hashing for image retrieval has become a research hot spot. Although deep hashing algorithms have achieved good results in image retrieval, two basic requirements still need attention: further improving retrieval accuracy and reducing computational complexity. This paper proposes a new Aggregate Deep Fast Supervised Discrete Hashing (ADFSDH) method for highly efficient image retrieval on large-scale datasets. Specifically, to improve performance, the paper first proposes a new Aggregate Deep Convolutional Neural Network (ADCNN) model based on VGG16, VGG19 and transfer learning for effective image feature extraction, which contains two different feature extractors in parallel. Second, the paper proposes a new feature fusion method: when the fusion weights are proportional to the Mean Average Precision (MAP) results of the two feature extractors, we obtain the most accurate description of the image. Finally, to reduce the storage space required by ADCNN and improve its retrieval efficiency, the Fast Supervised Discrete Hashing (FSDH) algorithm, with adjusted parameters, is introduced into the ADCNN model. In addition, ADFSDH unifies feature learning and hash coding in the same framework. Experiments on three datasets (CIFAR10, MNIST and FD-XJ) show that the proposed method outperforms current mainstream approaches in image retrieval.

Keywords Image retrieval · Convolutional Neural Network · Fast Supervised Discrete Hashing · Transfer learning · Aggregate Deep Fast Supervised Discrete Hashing

Corresponding author: Huicheng Lai [email protected]

1 School of Information Science and Engineering, Xinjiang University, No. 666, Victory Rd., Tianshan District, Urumchi 830046, China

2 School of Software Engineering, Xinjiang University, No. 499, Northwest Rd., Saybagh District, Urumchi 830046, China

3 School of Education, Shaanxi Normal University, No. 199, Changan South Rd., Yanta District, Xi'an 710062, China

1 Introduction

Content-based image retrieval (CBIR) has gradually become the mainstream approach; it expresses image content through low-level features. These include local gradient-based feature descriptors such as GIST [1], the Scale-Invariant Feature Transform (SIFT) [2] and the Histogram of Oriented Gradients (HOG) [3]. The convolutional neural network provides an end-to-end learning model that can obtain more representative features. Among early convolutional network structures, LeNet-5 [4] was successfully applied to handwritten character recognition, but its depth needed further optimization. In 2012, AlexNet [5] achieved the best classification result on ImageNet [6], drawing the academic community's attention back to convolutional neural networks. VGGNet [7] showed that increasing network depth helps improve image classification accuracy. ResNet [8] was proposed to solve the degradation problem of deep networks. These studies obtained better image feature representations, but the resulting features incur high computational cost in image retrieval.

To retrieve large-scale, high-dimensional image data efficiently, researchers proposed approximate nearest neighbor (ANN) search. Hashing is the mainstream technique for solving the ANN problem. Locality Sensitive Hashing (LSH) [9] uses random projections to construct hash functions. Spectral Hashing (SH) [10] converts the encoding of image feature vectors into the dimension reduction problem of Laplacian eigenmaps. Supervised Hashing with Kernels (KSH) [11] has better nonlinear mapping capabilities. Supervised Discrete Hashing (SDH) [12] solves the discrete optimization problem directly and iteratively with discrete cyclic coordinate (DCC) descent. Supervised Discrete Hashing with Relaxation (SDHR) [13] optimizes the SDH regression objective. Fast Supervised Discrete Hashing (FSDH) [14] accelerates the algorithm with a very simple yet effective regression of the class labels of training examples to the corresponding hash codes. Encouraged by the strong learning capabilities of CNNs, Convolutional Neural Network Hashing (CNNH) [15] pushed CNN-based deep hashing to the forefront of research.
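Random-projection LSH can be sketched in a few lines. This is a minimal illustration, assuming the sign-of-random-hyperplane variant for real-valued vectors (one of several LSH families); the helper names `make_lsh` and `hamming` are hypothetical. Nearby vectors tend to share most bits, while a vector and its negation differ in every bit.

```python
import random

def make_lsh(dim, n_bits, seed=0):
    """Build a random-projection hash: one Gaussian hyperplane per output bit."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(x):
        # Bit k is +1 if x lies on the positive side of hyperplane k, else -1.
        return [1 if sum(p * v for p, v in zip(plane, x)) >= 0 else -1
                for plane in planes]

    return hash_vector

def hamming(a, b):
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

h = make_lsh(dim=4, n_bits=16)
x = [0.9, 0.1, 0.3, 0.5]
y = [1.0, 0.1, 0.3, 0.5]        # a small perturbation of x
z = [-0.9, -0.1, -0.3, -0.5]    # the exact opposite direction of x
print(hamming(h(x), h(y)), hamming(h(x), h(z)))
```

The hash is data-independent: the hyperplanes are drawn once and never fitted to the data, which is what distinguishes LSH from the learned methods discussed next.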

Building on the outstanding performance of convolutional neural networks and hashing algorithms, this paper proposes a novel deep hashing method named Aggregate Deep Fast Supervised Discrete Hashing (ADFSDH). Compact root bilinear CNN (CRB-CNN) removes the fully connected layers of matconvnet-vgg-m and vgg-verydeep-16 and then performs compact bilinear pooling and builds an inner-product image descriptor [16, 17]. The Aggregate Ensemble model based on deep CNN (AECNN) [18] uses two deep networks, AlexNet and Network in Network (NIN), to obtain image features and computes weighted average feature vectors for image retrieval. Our model is inspired both by aggregate models [16–18] and by deep hashing networks [19–23]. Our main contributions are summarized as follows:

(1) We propose a new feature learning model that contains two different feature extractors in parallel: fine-tuned VGG16 and fine-tuned VGG19. The two parallel extractors each extract a 512-dimensional feature vector for every image.

(2) We propose a new feature fusion method: when the fusion weights are proportional to the MAP results of the two feature extractors, we obtain the most accurate description of the image.

(3) We propose a new way to adjust parameters: we adjust the parameters of the FSDH algorithm and regress the class labels of the training examples to the corresponding hash codes to speed up the algorithm.

(4) We propose a new deep hashing method named ADFSDH, which integrates feature learning and hash coding in one framework for highly efficient image retrieval.

(5) We apply the proposed method to a face database collected by members of the image group of our laboratory.

The rest of the article is organized as follows. We review related work in Sect. 2 and describe the proposed method in Sect. 3. Experimental results are presented in Sect. 4, and Sect. 5 concludes the paper.

2 Related works

Hashing algorithms are fast and need little storage, so they have attracted much attention from researchers. Hashing algorithms fall into three categories: unsupervised, supervised and semi-supervised hashing. Unsupervised hashing methods do not require training labels; examples include the data-independent Locality Sensitive Hashing (LSH) [9] and the data-dependent Spectral Hashing (SH) [10] and Iterative Quantization (ITQ) [24]. Supervised hashing makes full use of supervisory information to learn compact hash codes. Minimal Loss Hashing (MLH) [25] trains hash functions with similarity information. To handle data that are not linearly separable, Supervised Hashing with Kernels (KSH) [11] and Binary Reconstructive Embedding (BRE) [26] learn hash functions that preserve similarity in a kernel space. Weighing the advantages and disadvantages of unsupervised and supervised hashing, researchers proposed Semi-Supervised Hashing (SSH) [27], which minimizes the empirical error while maximizing the variance of the hash bits.

Most existing hashing algorithms are based on low-level features (GIST [1], SIFT [2] and HOG [3]), which lead to low retrieval accuracy. The recently proposed Convolutional Neural Network Hashing (CNNH) [15] first decomposes the similarity matrix to obtain binary codes for the samples and then trains a convolutional neural network (CNN) to fit these codes. CNNH performs better than traditional low-level feature methods, but because the codes are fixed in the first stage, the learned image representations cannot in turn update the binary codes. Lin et al. [28] proposed deep learning of binary hash codes for image retrieval. Its core idea is to insert a new fully connected layer between the penultimate layer of the pre-trained network and the final task layer. This layer uses a sigmoid activation, and its number of nodes equals the target code length. End-to-end fine-tuning then embeds semantic information into the output of the newly inserted layer.
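The latent-layer idea of Lin et al. [28] can be illustrated as a fully connected layer with a sigmoid activation whose width equals the code length. In this sketch the weights are random stand-ins for fine-tuned parameters, and binarizing at a threshold of 0.5 is an assumption made here for illustration; the function names are hypothetical.

```python
import math
import random

def latent_hash_layer(features, weights, bias):
    """Fully connected latent layer with a sigmoid activation.

    `weights` has one row per output node, so its row count is the code length;
    every output lies strictly between 0 and 1.
    """
    return [1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(row, features)) + b)))
            for row, b in zip(weights, bias)]

def binarize(activations, threshold=0.5):
    """Assumed rule: activations above the threshold become bit 1, others bit 0."""
    return [1 if a > threshold else 0 for a in activations]

rng = random.Random(42)
code_length, feat_dim = 12, 8                       # hypothetical sizes
W = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(code_length)]
b = [0.0] * code_length
features = [rng.random() for _ in range(feat_dim)]  # stands in for penultimate-layer output

acts = latent_hash_layer(features, W, b)
code = binarize(acts)
```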

According to recent research, image retrieval methods can be classified into three categories: (1) directly using a convolutional neural network to extract deep image features and then measuring similarity with the Euclidean distance; (2) applying low-level features to hashing algorithms and then measuring similarity with the Hamming distance; (3) combining a convolutional neural network with a hashing method and then measuring similarity with the Hamming distance. The first method requires a large amount of storage and computation. In the second method, the fixed image encoding results in poor image description capability. The third method is a hot topic in recent research. Recent studies show that aggregate network models have better image description capabilities [16–18], while SDH [12], SDHR [13] and FSDH [14] belong to the second category, applying low-level features to hashing algorithms. In addition, Li et al. [29] proposed linear subspace ranking hashing for cross-modal retrieval, and Li et al. [30] proposed deep cross-modal hashing. These cross-modal retrieval methods improve retrieval performance to some extent.
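A rough back-of-the-envelope comparison shows why the first category is expensive and why Hamming distance is cheap. This sketch assumes 512-dimensional float32 features (the size ADCNN produces) versus 32-bit codes packed into integers; real storage overheads depend on the implementation, and the helper names are hypothetical.

```python
def float_feature_bytes(dim, bytes_per_value=4):
    """Storage for one real-valued feature vector (float32 by default)."""
    return dim * bytes_per_value

def hash_code_bytes(n_bits):
    """Storage for one binary code packed into bits, rounded up to whole bytes."""
    return (n_bits + 7) // 8

def hamming_packed(a, b):
    """Hamming distance of two codes packed into ints: XOR, then count the set bits."""
    return bin(a ^ b).count("1")

per_image_float = float_feature_bytes(512)   # 512-d deep feature
per_image_hash = hash_code_bytes(32)         # 32-bit hash code
print(per_image_float, per_image_hash, per_image_float // per_image_hash)  # 2048 4 512
```

Comparing two codes costs one XOR and a bit count, whereas a Euclidean distance on 512-d features needs hundreds of floating-point operations.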

Inspired by the CRB-CNN [17] architecture and the FSDH [14] algorithm, this paper proposes a method named ADFSDH. The algorithm combines a convolutional neural network with a hashing method and then uses the Hamming distance to measure similarity. First, we construct the deep convolutional neural network model; second, we propose a new feature fusion method to describe images better; finally, we adjust the parameters of the hashing algorithm. In addition, we combine feature learning and hash coding in the same framework for fast and efficient image retrieval.

3 Our proposed method

Although deep hashing algorithms have achieved good results in image retrieval, two basic requirements still need attention: further improving retrieval accuracy and reducing computational complexity. An integrated (aggregate) network model is more accurate than a single network model in image retrieval, but it has high computational complexity. Hashing can reduce this complexity to some extent. Based on this consideration, this paper proposes the ADFSDH algorithm to further improve retrieval accuracy while reducing computational complexity.

3.1 The framework of retrieval and feature learning

As shown in Fig. 1, the proposed ADFSDH is composed of the proposed ADCNN and FSDH with adjusted parameters, combining feature learning and hash coding in one framework. The method consists of five main steps: (1) initialize the model by pre-training the VGG16 and VGG19 networks on ImageNet; (2) fine-tune the ADCNN model on the image retrieval target domain; (3) describe images more accurately with a new feature fusion method; (4) generate compact binary hash codes with the FSDH algorithm after adjusting its parameters; (5) combine feature learning and hash coding in the same framework for fast image retrieval. In Fig. 1, the proposed Aggregate Deep Convolutional Neural Network (ADCNN) model, based on VGG16, VGG19 and transfer learning, contains two different feature extractors in parallel for effective image feature extraction. Multi-feature fusion gives better retrieval results than a single feature, so we propose a new image fusion method to obtain a better image description. Because the aggregate model increases the computational complexity of retrieval, we combine feature learning and hash coding in the same framework for highly efficient image retrieval.

Fig. 1 ADFSDH method and retrieval framework (diagram: ImageNet pre-training with convolutional, pooling and fully connected layers for Classes 1 to n; parameter transferring to Feature Extractors 1 and 2; feature fusion image description; Fast Supervised Discrete Hashing; similarity matching of the query vector against the vector space; ranking of the retrieved images)

3.2 The ADCNN model and feature fusion

Although some algorithms and related techniques have been applied to image retrieval [31–34], the ability of low-level features to describe images is limited. We therefore study deep network models and feature fusion to obtain more accurate image descriptions. As shown in Fig. 2, the proposed ADCNN model is based on two convolutional neural networks pre-trained on ImageNet: imagenet-vgg-verydeep-16 (VGG16) and imagenet-vgg-verydeep-19 (VGG19). These networks consist of convolutional, pooling and fully connected layers. All input images are resized to 224×224 pixels. In the target domain, the convolutional neural networks are fine-tuned. The ADCNN model first extracts 512-dimensional feature vectors with the two feature extractors in parallel and then applies the new feature fusion method for a more accurate description of the image. In this fusion method, we obtain the most accurate image description when the fusion weights are proportional to the MAP results of the two feature extractors, where MAP 1 and MAP 2 are the MAP results of image retrieval with feature extractor 1 and feature extractor 2, respectively.
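The MAP-proportional weighting can be sketched as follows. The text states only that the weights follow the two extractors' MAP results; combining the two 512-dimensional vectors by an element-wise weighted sum is an assumption here, and `fusion_weights` and `fuse` are hypothetical names. With the 512-dimensional MAP values of VGG16 (0.563) and VGG19 (0.569) from Table 1, the normalized weights reproduce the 0.497/0.503 row of Table 2.

```python
def fusion_weights(map1, map2):
    """Weights proportional to each extractor's MAP, normalized to sum to 1."""
    total = map1 + map2
    return map1 / total, map2 / total

def fuse(f1, f2, w1, w2):
    """Assumed fusion rule: element-wise weighted sum of two equal-length feature vectors."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same dimension")
    return [w1 * a + w2 * b for a, b in zip(f1, f2)]

# 512-d MAP results of VGG16 (extractor 1) and VGG19 (extractor 2) on FD-XJ, Table 1.
w1, w2 = fusion_weights(0.563, 0.569)
print(round(w1, 3), round(w2, 3))   # 0.497 0.503
```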

3.3 Fast Supervised Discrete Hashing

Fig. 2 ADCNN model and feature fusion method (diagram: CBIR dataset and query image; Feature Extractors 1 and 2; Vector Spaces 1 and 2 with Query Vectors 1 and 2; similarity matching scored as MAP 1 and MAP 2; feature fusion image description)

The proposed ADCNN model extracts a 512-dimensional feature vector for each image. Although ADCNN improves retrieval accuracy, it requires large storage capacity and high computational complexity on large datasets. To retrieve large-scale, high-dimensional image data efficiently, we use the simple and efficient FSDH algorithm and adjust its parameters. The objective function of FSDH is defined as follows:

$$\min_{B,F,W}\ \|B - YW\|_F^2 + \lambda\|W\|_F^2 + \alpha\|B - F(X)\|_F^2 \quad \text{s.t.}\ B = \{b_i\}_{i=1}^{n} \in \{-1, 1\}^{n \times l} \qquad (1)$$

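where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $W \in \mathbb{R}^{c \times l}$ is the projection matrix for the hash codes, and $B = \{b_i\}_{i=1}^{n} \in \{-1, 1\}^{n \times l}$ is the set of hash codes. $X = \{x_i\}_{i=1}^{n}$ is the set of input samples, and $Y = \{y_i\}_{i=1}^{n} \in \mathbb{R}^{n \times c}$ is the label matrix, where c is the number of classes: $y_{ik} = 1$ if $x_i$ belongs to the kth class, and $y_{ik} = 0$ otherwise. λ and α are the regularization coefficients of the second and third terms, respectively; α is the coefficient written as β in Eq. (6) and Algorithm 1.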
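To make the notation of Eq. (1) concrete, the following NumPy sketch builds the one-hot label matrix Y and evaluates the objective for given B, W and F(X). The toy matrices and function names are hypothetical; F(X) is passed in as a precomputed n × l matrix.

```python
import numpy as np

def one_hot_labels(labels, num_classes):
    """Label matrix Y of Eq. (1): Y[i, k] = 1 if sample i is in class k, else 0."""
    Y = np.zeros((len(labels), num_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def fsdh_objective(B, Y, W, FX, lam, alpha):
    """Eq. (1): ||B - YW||_F^2 + lam ||W||_F^2 + alpha ||B - F(X)||_F^2."""
    return (np.linalg.norm(B - Y @ W, "fro") ** 2
            + lam * np.linalg.norm(W, "fro") ** 2
            + alpha * np.linalg.norm(B - FX, "fro") ** 2)

Y = one_hot_labels([0, 2, 1], num_classes=3)            # n = 3 samples, c = 3 classes
B = np.array([[1, -1], [-1, 1], [1, 1]], dtype=float)   # n x l hash codes, l = 2
W = np.array([[1, -1], [1, 1], [-1, 1]], dtype=float)   # c x l; row k is class k's code
# Here YW reproduces B exactly and F(X) = B, so only the lam * ||W||_F^2 term remains.
value = fsdh_objective(B, Y, W, FX=B, lam=1.0, alpha=0.5)
```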

$$F(x) = \phi(x)P \qquad (2)$$

where F(·) gives the approximate hash codes and φ(x) is an m-dimensional row vector obtained with the Gaussian kernel, $\phi(x) = [\exp(-\|x - a_1\|^2/\sigma), \ldots, \exp(-\|x - a_m\|^2/\sigma)]$. Here $\{a_j\}_{j=1}^{m}$ are m anchor examples randomly selected from the feature space, and σ is the Gaussian kernel parameter. The matrix $P \in \mathbb{R}^{m \times l}$ (m rows and l columns) maps the kernel features into the l-dimensional code space. The FSDH algorithm alternates among three steps: the F-step, the G-step and the B-step.
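The anchor-based Gaussian kernel features and the projection P can be sketched in NumPy as below. Random data stands in for ADCNN features, and setting σ to the mean squared sample-to-anchor distance is a common heuristic assumed here, not a value from the paper; P is random only to show the shapes (the F-step learns it).

```python
import numpy as np

def rbf_anchor_features(X, anchors, sigma):
    """Row i is phi(x_i) = [exp(-||x_i - a_1||^2 / sigma), ..., exp(-||x_i - a_m||^2 / sigma)]."""
    sq_dists = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)   # shape (n, m)
    return np.exp(-sq_dists / sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 512))            # placeholder for n = 100 deep features
m, l = 10, 32                              # number of anchors and code length (toy sizes)
idx = rng.choice(len(X), size=m, replace=False)
anchors = X[idx]                           # m randomly selected anchor examples
sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
sigma = sq.mean()                          # heuristic kernel width (assumption)
Phi = rbf_anchor_features(X, anchors, sigma)   # n x m
P = rng.normal(size=(m, l))                # m x l projection (random here; learned by the F-step)
F = Phi @ P                                # approximate hash codes F(X), n x l
```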

F-step: With B fixed, P is obtained by least-squares regression of B on φ(X), as shown in Eq. (3):

$$P = \left(\phi(X)^{T}\phi(X)\right)^{-1}\phi(X)^{T}B \qquad (3)$$
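Eq. (3) is the ordinary least-squares solution of $\min_P \|B - \phi(X)P\|_F^2$. A minimal NumPy sketch, assuming $\phi(X)^T\phi(X)$ is invertible; it solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which yields the same P with better numerical behavior. The toy sizes and the name `f_step` are hypothetical.

```python
import numpy as np

def f_step(Phi, B):
    """Eq. (3): P = (phi(X)^T phi(X))^{-1} phi(X)^T B, computed via a linear solve."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ B)

rng = np.random.default_rng(1)
Phi = rng.random((50, 8))                     # n = 50 samples, m = 8 kernel features
B = np.sign(rng.standard_normal((50, 16)))    # n x l codes in {-1, +1}, l = 16
P = f_step(Phi, B)
```

`np.linalg.lstsq(Phi, B, rcond=None)` gives the same answer here and remains well defined if $\phi(X)^T\phi(X)$ is singular.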

G-step: When P and B are fixed, Eq. (1) can be rewritten as Eq. (4):

$$\min_{W}\ \operatorname{tr}\left(W^{T}\left(Y^{T}Y + \lambda I\right)W\right) - 2\operatorname{tr}\left(W^{T}Y^{T}B\right) \qquad (4)$$

From Eq. (4), W has a closed-form solution:

$$W = \left(Y^{T}Y + \lambda I\right)^{-1}Y^{T}B \qquad (5)$$
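Setting the gradient of Eq. (4) with respect to W to zero gives $(Y^TY + \lambda I)W = Y^TB$, hence Eq. (5). The NumPy sketch below (toy labels and codes, hypothetical names) computes W this way. Because Y is one-hot, $Y^TY$ is diagonal with the class counts $n_k$, so row k of W is the sum of the class-k codes divided by $n_k + \lambda$, a shrunken class-mean code.

```python
import numpy as np

def g_step(Y, B, lam):
    """Eq. (5): W = (Y^T Y + lam I)^{-1} Y^T B, computed via a linear solve."""
    c = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + lam * np.eye(c), Y.T @ B)

labels = np.array([0, 0, 1, 2, 2, 2])
Y = np.eye(3)[labels]                        # one-hot label matrix, n = 6, c = 3
B = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1], [-1, -1], [1, -1]], dtype=float)
W = g_step(Y, B, lam=1.0)
# Class 0: ([1, 1] + [1, -1]) / (2 + 1) = [2/3, 0]; class 1: [-1, 1] / 2;
# class 2: ([-1, -1] + [-1, -1] + [1, -1]) / 4 = [-0.25, -0.75].
```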

B-step: If W and F are fixed, B can be solved by Eq. (6) as follows:

$$B = \operatorname{sgn}\left(YW + \beta F(X)\right) \qquad (6)$$
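Eq. (6) follows because every entry of B is ±1, so $\|B\|_F^2 = nl$ is constant, and minimizing the B-dependent part of Eq. (1), $\|B - YW\|_F^2 + \beta\|B - F(X)\|_F^2$, is equivalent to maximizing $\operatorname{tr}(B^T(YW + \beta F(X)))$, which the sign achieves entry by entry. The sketch below checks this by brute force on a tiny problem; mapping zero entries to +1 is an assumed tie-break, and the helper names are hypothetical.

```python
import itertools
import numpy as np

def b_step(Y, W, FX, beta):
    """Eq. (6): B = sgn(YW + beta F(X)); zero entries are mapped to +1 (assumed tie-break)."""
    B = np.sign(Y @ W + beta * FX)
    B[B == 0] = 1
    return B

def b_terms(B, Y, W, FX, beta):
    """The B-dependent part of Eq. (1): ||B - YW||_F^2 + beta ||B - F(X)||_F^2."""
    return np.sum((B - Y @ W) ** 2) + beta * np.sum((B - FX) ** 2)

rng = np.random.default_rng(2)
Y = np.eye(2)[[0, 1]]                 # n = 2 samples, c = 2 classes
W = rng.standard_normal((2, 3))       # c x l with l = 3
FX = rng.standard_normal((2, 3))      # approximate codes F(X)
beta = 0.5
B = b_step(Y, W, FX, beta)

# Brute force over all 2^(n*l) = 64 sign matrices confirms the closed form is optimal.
best = min(b_terms(np.array(s, dtype=float).reshape(2, 3), Y, W, FX, beta)
           for s in itertools.product([-1, 1], repeat=6))
```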

3.4 ADFSDH algorithm function description

From the feature perspective, the proposed ADCNN obtains a better image representation; from the perspective of storage and computational complexity, FSDH with adjusted parameters requires less storage and computation. Recent research shows that single-model image retrieval performs less well [16–18]. Therefore, this paper further optimizes the network structure and combines feature learning and hash coding in the same image retrieval framework to obtain a deep hashing method. In principle, ADFSDH both obtains a better image representation and requires less storage and computation, so it can adapt to large-scale image retrieval. The ADFSDH algorithm for large-scale image retrieval is presented in Algorithm 1.

Algorithm 1 ADFSDH

Inputs: deep image features extracted by the ADCNN model, $\{x_i\}_{i=1}^{n}$; the corresponding labels $\{y_i\}_{i=1}^{n}$; the binary code length l; the parameters λ and β; and the maximum number of iterations N.

Outputs: hash codes $\{b_i\}_{i=1}^{n} \in \{-1, 1\}^{n \times l}$.

Randomly select m samples $\{a_j\}_{j=1}^{m}$ from the deep image features $\{x_i\}_{i=1}^{n}$;
Compute φ(x) with the RBF kernel;
Initialize $Y = \{Y_{ij}\} \in \mathbb{R}^{n \times c}$, where $Y_{ij} = 1$ if $y_i = j$ and $Y_{ij} = 0$ otherwise;
Initialize W using Eq. (5);
Initialize P using Eq. (3);
Repeat
    B-step: solve B using Eq. (6);
    G-step: solve W using Eq. (5);
    F-step: solve P using Eq. (3);
Until convergence (at most N iterations)
Extract the binary hash code b of the query image;
Compute the Hamming distance between the query hash code b and B;
Sort by Hamming distance and return the m most similar images;
Among the m retrieved images, return the k most similar images ranked by Euclidean distance.
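Putting the steps together, Algorithm 1 can be sketched in NumPy as below. This is an illustration under several assumptions, not the authors' MATLAB implementation: the features are synthetic clusters standing in for ADCNN output, B is initialized randomly (Algorithm 1 does not say how), σ uses the mean-squared-distance heuristic, the F-step uses `np.linalg.lstsq` (identical to Eq. (3) when $\phi(X)^T\phi(X)$ is invertible), and only 30 anchors are used because the toy set is small. λ = 1, β = 2e−5 and N = 5 follow Sect. 4.1; `n_candidates` and `k` play the roles of m and k in the last two steps. All function names are hypothetical.

```python
import numpy as np

def rbf(X, anchors, sigma):
    """Gaussian anchor features: phi(x)_j = exp(-||x - a_j||^2 / sigma)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma)

def sgn(M):
    """Sign with zeros mapped to +1 so that codes stay in {-1, +1} (assumed tie-break)."""
    return np.where(M >= 0, 1.0, -1.0)

def train_fsdh(X, labels, n_bits, lam=1.0, beta=2e-5, n_anchors=30, max_iter=5, seed=0):
    """Alternate B-, G- and F-steps (Eqs. (6), (5), (3)) until B stops changing."""
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(len(X), size=n_anchors, replace=False)]
    sigma = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2).mean()  # heuristic
    Phi = rbf(X, anchors, sigma)                              # n x m
    c = labels.max() + 1
    Y = np.eye(c)[labels]                                     # one-hot, n x c
    A = Y.T @ Y + lam * np.eye(c)                             # reused by every G-step
    B = sgn(rng.standard_normal((len(X), n_bits)))            # random initial codes (assumption)
    W = np.linalg.solve(A, Y.T @ B)                           # initialize W, Eq. (5)
    P = np.linalg.lstsq(Phi, B, rcond=None)[0]                # initialize P, Eq. (3)
    for _ in range(max_iter):
        B_new = sgn(Y @ W + beta * (Phi @ P))                 # B-step, Eq. (6)
        W = np.linalg.solve(A, Y.T @ B_new)                   # G-step, Eq. (5)
        P = np.linalg.lstsq(Phi, B_new, rcond=None)[0]        # F-step, Eq. (3)
        converged = np.array_equal(B_new, B)
        B = B_new
        if converged:
            break
    return B, P, anchors, sigma

def retrieve(q, X, B, P, anchors, sigma, n_candidates=20, k=5):
    """Rank database codes by Hamming distance, then re-rank the top candidates by Euclidean distance."""
    b = sgn(rbf(q[None, :], anchors, sigma) @ P)[0]           # query hash code
    hamming = (B != b).sum(axis=1)
    cand = np.argsort(hamming, kind="stable")[:n_candidates]
    eucl = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(eucl, kind="stable")[:k]]

# Synthetic stand-in for ADCNN features: 3 well-separated classes, 15 samples each.
# An odd class size keeps every entry of the initial W (a sum of +/-1 codes) nonzero.
rng = np.random.default_rng(3)
centers = rng.normal(scale=10.0, size=(3, 20))
labels = np.repeat(np.arange(3), 15)
X = centers[labels] + rng.normal(size=(45, 20))
B, P, anchors, sigma = train_fsdh(X, labels, n_bits=16)
top = retrieve(X[0] + 0.01, X, B, P, anchors, sigma)
```

In this toy run every sample of a class ends up with the same code, because the β-weighted kernel term is tiny compared with YW; the Euclidean re-ranking then orders the Hamming-ranked candidates by their deep features.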

4 Experimental results and analysis

4.1 Experiment environment and assessment

Our experiments were run on a GPU server and on the MATLAB 2017a platform. The graphics card in the server is a GTX 1080, and the personal computer has an Intel(R) Core(TM) i3-4150 processor at 3.5 GHz with 8 GB of RAM. The algorithm was tested on three datasets: CIFAR10, MNIST and FD-XJ. (1) CIFAR10 is a public object recognition dataset collected by Alex Krizhevsky, Vinod Nair and Geoffrey Hinton. It consists of 60,000 images of 32×32 pixels from 10 classes, with 6000 images per class. (2) MNIST consists of 10 classes of handwritten digits from 0 to 9, with input images of 28×28 pixels; it has 60,000 training images and 10,000 test images. (3) FD-XJ is supported by Xinjiang University and was collected by the image team of our laboratory, originally to build an intelligent campus. A total of 253 ethnic minority members participated in its collection, and 13 images with different poses were collected from each member. The input images are 28×28 pixels. During testing, we randomly selected 30% of the samples in each of the three datasets as the test set and used the rest as the training set. In ADFSDH, the parameters λ, β, N and the number of randomly sampled anchor points are set to 1, 2e−5, 5 and 1000, respectively. To evaluate the proposed algorithm, we use precision and recall, two metrics widely used in information retrieval and statistical classification. Because precision and recall trade off against each other at a fixed number of bits, we also use the two most common summary measures, F-Measure and MAP.
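The evaluation measures can be sketched as follows. Precision, recall and their harmonic mean (F-Measure) are standard; for average precision, this sketch averages the precision at each rank where a relevant item appears over the relevant items retrieved, which is one common convention in hashing papers and is assumed here rather than taken from the text. Function names are hypothetical. As a sanity check, the LSH row of Table 8 (precision 0.092, recall 0.151) gives an F-Measure of 0.114.

```python
def precision_recall(retrieved_relevance, total_relevant):
    """Precision and recall of a retrieved set, given 0/1 relevance flags."""
    hits = sum(retrieved_relevance)
    precision = hits / len(retrieved_relevance) if retrieved_relevance else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def average_precision(ranked_relevance):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: average of the per-query average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

print(round(f_measure(0.092, 0.151), 3))   # 0.114
```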

4.2 ADCNN model performance test

In this section, we evaluate the proposed feature learning model, ADCNN, which contains two feature extractors in parallel: fine-tuned VGG16 and fine-tuned VGG19. We extracted image features with 256, 512 and 1024 dimensions; the 512-dimensional features gave the best retrieval performance. Therefore, we use the two parallel extractors to extract 512-dimensional feature vectors and then apply the proposed multi-feature fusion method to obtain more accurate image descriptions. Table 1 shows the MAP results of different models and the effect of the feature dimension. With 512-dimensional features, VGG16 and VGG19 achieve their best MAP results. On the FD-XJ dataset, the MAP of the CRB-CNN [17] model is 0.565 and that of ADCNN is 0.601, a relative improvement of 6.4% over CRB-CNN.

Table 2 shows the MAP results of different weighting schemes. W1 and W2 are the weights of feature extractor 1 and feature extractor 2, respectively. In Table 2, the weights 0.497 and 0.503 are the MAP results of feature extractors 1 and 2 (0.563 for VGG16 and 0.569 for VGG19 in Table 1) normalized to sum to 1. The experimental results show that we obtain the most accurate image description when the weights are proportional to the MAP results of feature extractor 1 and feature extractor 2, and we base the new feature fusion method on this observation. Among the fusion schemes in Table 2, this proportional weighting gives the best MAP (0.601).

Table 1 Deep learning model performance test on FD-XJ

Model               Feature dimension   MAP
matconvnet-vgg-f    256                 0.513
matconvnet-vgg-f    512                 0.523
matconvnet-vgg-f    1024                0.510
matconvnet-vgg-m    256                 0.532
matconvnet-vgg-m    512                 0.533
matconvnet-vgg-m    1024                0.530
vgg-verydeep-16     256                 0.559
vgg-verydeep-16     512                 0.563
vgg-verydeep-16     1024                0.556
vgg-verydeep-19     256                 0.568
vgg-verydeep-19     512                 0.569
vgg-verydeep-19     1024                0.566
CRB-CNN             512                 0.565
Ours (ADCNN)        512                 0.601

Table 2 MAP results for different fusion schemes on FD-XJ

W1      W2      MAP
0       1       0.569
0.1     0.9     0.571
0.3     0.7     0.586
0.497   0.503   0.601
0.7     0.3     0.581
0.9     0.1     0.564
1       0       0.563

The bold values represent the optimal solution for feature fusion

4.3 Comparison of ADFSDH and FSDH performance

Table 1 shows that the proposed ADCNN performs better than VGG16, VGG19 and CRB-CNN, but it requires large storage capacity and high computational complexity. Tables 3 and 4 show that the FSDH algorithm performs well on MNIST and FD-XJ with lower computational complexity; however, FSDH does not use deep image features. Based on these considerations, we proposed the ADFSDH method, and the experimental results support this design. In Table 3, the MAP of our method is 2.2% higher (relative) than that of FSDH on MNIST (0.943 vs. 0.923 at 32 bits); in Table 4, it is 4.9% higher than that of FSDH on FD-XJ (0.786 vs. 0.749 at 32 bits). Therefore, the proposed deep hashing method outperforms FSDH.
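The percentages in this comparison are relative improvements in MAP. A quick check with the 32-bit values from Tables 3 and 4 (`relative_gain` is a hypothetical helper):

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

print(round(relative_gain(0.943, 0.923), 1))   # MNIST, ADFSDH vs. FSDH: 2.2
print(round(relative_gain(0.786, 0.749), 1))   # FD-XJ, ADFSDH vs. FSDH: 4.9
```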


Table 3 The MAP results of FSDH and ADFSDH on MNIST with different code lengths

Model           Bits   Precision   Recall   MAP
FSDH            12     0.913       0.881    0.922
FSDH            24     0.916       0.823    0.922
FSDH            32     0.909       0.782    0.923
FSDH            48     0.873       0.704    0.922
FSDH            64     0.844       0.684    0.923
FSDH            128    0.844       0.684    0.923
Ours (ADFSDH)   12     0.935       0.921    0.941
Ours (ADFSDH)   24     0.914       0.881    0.941
Ours (ADFSDH)   32     0.895       0.849    0.943
Ours (ADFSDH)   48     0.838       0.793    0.941
Ours (ADFSDH)   64     0.812       0.781    0.943
Ours (ADFSDH)   128    0.812       0.781    0.943

The bold values represent the optimal MAP for each code length

Table 4 The MAP results of FSDH and ADFSDH on FD-XJ with different code lengths

Model           Bits   Precision   Recall   MAP
FSDH            12     0.780       0.532    0.746
FSDH            24     0.667       0.431    0.746
FSDH            32     0.637       0.405    0.749
FSDH            48     0.615       0.386    0.746
FSDH            64     0.572       0.353    0.749
FSDH            128    0.563       0.341    0.746
Ours (ADFSDH)   12     0.819       0.577    0.781
Ours (ADFSDH)   24     0.714       0.485    0.781
Ours (ADFSDH)   32     0.668       0.443    0.786
Ours (ADFSDH)   48     0.636       0.415    0.781
Ours (ADFSDH)   64     0.608       0.387    0.786
Ours (ADFSDH)   128    0.587       0.369    0.781

The bold values represent the optimal MAP for each code length

4.4 Verifying the universality of the algorithm

MAP is a standard measure of algorithm quality. Table 5 and Fig. 3 show the MAP of different algorithms on CIFAR10 as the number of bits varies; Table 6 and Fig. 4 show the same on MNIST; and Table 7 and Fig. 5 show the same on FD-XJ. The experimental results show that our method outperforms the other algorithms on all three datasets. On MNIST and FD-XJ, our method performs best with 32-bit hash codes.

Table 5 The MAP results of different algorithms on CIFAR10

Method          12 bits   24 bits   32 bits   48 bits
LSH             0.126     0.129     0.137     0.146
SH              0.132     0.135     0.133     0.124
KSH             0.325     0.337     0.347     0.356
SDH             0.401     0.419     0.434     0.448
SDHR            0.477     0.483     0.496     0.502
CNNH            0.511     0.514     0.517     0.512
FSDH            0.589     0.594     0.607     0.614
Ours (ADFSDH)   0.631     0.642     0.651     0.656

The bold values represent the optimal MAP for each code length

Fig. 3 MAP of different algorithms on CIFAR10 as the number of bits varies (plot: MAP versus length of hash code, 10 to 50 bits, for LSH, SH, KSH, SDH, SDHR, CNNH, FSDH and ADFSDH)

Table 6 The MAP results of different algorithms on MNIST

Method          12 bits   24 bits   32 bits   48 bits
LSH             0.251     0.284     0.338     0.432
SH              0.342     0.385     0.396     0.384
KSH             0.864     0.871     0.874     0.875
SDH             0.913     0.912     0.912     0.913
SDHR            0.915     0.916     0.916     0.915
CNNH            0.918     0.921     0.921     0.919
FSDH            0.922     0.922     0.923     0.922
Ours (ADFSDH)   0.941     0.941     0.943     0.941

The bold values represent the optimal MAP for each code length

When the hash code length is fixed at 32, the precision and recall curves show opposite trends, so we need an additional objective criterion to measure performance. The most common one is the F-Measure. Therefore, with a hash code length of 32, we further test the algorithms. For a Hamming distance less than 2, Table 8 and Fig. 6 show the F-Measure and precision recall rate curves on CIFAR10, and Table 9 and Fig. 7 show the F-Measure and precision recall

123

Page 8: A novel deep hashing method for fast image retrievalstatic.tongtianta.site/paper_pdf/066cca60-a865-11e9-a732... · 2019-07-17 · strong learning capabilities, Convolutional Neural

S. Cheng et al.

10 15 20 25 30 35 40 45 50

Length of hash code

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

MA

PMNIST

LSH

SH

KSH

SDH

SDHR

CNNH

FSDH

ADFSDH

Fig. 4 MAP is the evaluation standard for different algorithms onMNIST when the number of bits is changing

Table 7 The MAP of different algorithms on FD-XJ

Method 12 bits 24 bits 32 bits 48 bits

LSH 0.133 0.178 0.215 0.331

SH 0.162 0.266 0.373 0.425

KSH 0.474 0.514 0.612 0.637

SDH 0.547 0.631 0.648 0.656

CNNH 0.566 0.642 0.651 0.684

SDHR 0.571 0.654 0.723 0.738

FSDH 0.746 0.746 0.749 0.746

Ours (ADFSDH) 0.781 0.781 0.786 0.781

The bold value represent the optimal MAP under a certain bit

10 15 20 25 30 35 40 45 50

Length of hash code

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

MA

P

FD-XJ

LSH

SH

KSH

SDH

SDHR

CNNH

FSDH

ADFSDH

Fig. 5 MAP is the evaluation standard for different algorithms on FD-XJ when the number of bits is changing

Table 8 The F-Measure results of different algorithms on CIFAR10when the number of hash bits is 32

Method Cifar10

Precision Prec-Recall F-Measure Map

LSH 0.092 0.151 0.114 0.137

SH 0.101 0.250 0.144 0.133

KSH 0.195 0.556 0.289 0.347

SDH 0.242 0.656 0.354 0.434

SDHR 0.348 0.609 0.443 0.496

CNNH 0.440 0.650 0.525 0.517

FSDH 0.475 0.752 0.582 0.607

Ours (ADFSDH) 0.636 0.713 0.672 0.651

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recall

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Prec

isio

n

CIFAR10

LSH

SH

KSH

SDH

SDHR

CNNH

FSDH

ADFSDH

Fig. 6 Precision-recall curves with 32 bits on CIFAR10

rate curves on MNIST, and Table 10 and Fig. 8 show theF-Measure and precision recall rate curves on FD-XJ. Theexperimental results show that our algorithm’s F-Measureand MAP are higher than other algorithms in this table, sothe universality of our algorithm is verified.
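The precision, recall and F-Measure in Tables 8–10 can be illustrated with the minimal Python sketch below. It assumes that the items retrieved for a query are those whose Hamming distance is strictly less than a threshold (2, following the text), that relevance means a shared class label, and that precision is 0 when nothing is retrieved; these conventions and the function names are assumptions, not the authors' code. The F-Measure is the harmonic mean 2PR/(P + R); the tabulated values are consistent with applying it to the tabulated precision and recall, e.g. 0.636 and 0.713 give 0.672 for ADFSDH on CIFAR10.

```python
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def precision_recall_within_radius(
    query_code: int,
    query_label: int,
    db_codes: Sequence[int],
    db_labels: Sequence[int],
    threshold: int = 2,
) -> Tuple[float, float]:
    """Precision/recall over items with Hamming distance < threshold (assumed protocol)."""
    retrieved: List[int] = [
        i for i, code in enumerate(db_codes) if hamming(query_code, code) < threshold
    ]
    relevant_total = sum(1 for label in db_labels if label == query_label)
    hits = sum(1 for i in retrieved if db_labels[i] == query_label)
    precision = hits / len(retrieved) if retrieved else 0.0  # convention: empty result -> 0
    recall = hits / relevant_total if relevant_total else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    db_codes = [0b0000, 0b0001, 0b0011, 0b1111]
    db_labels = [0, 0, 1, 0]
    p, r = precision_recall_within_radius(0b0000, 0, db_codes, db_labels)
    print(p, round(r, 3), round(f_measure(p, r), 3))  # 1.0 0.667 0.8
    # Re-deriving the ADFSDH/CIFAR10 entry of Table 8 from its precision and recall:
    print(round(f_measure(0.636, 0.713), 3))  # 0.672
```

A small threshold keeps precision high but misses relevant items at larger distances, which is exactly the precision/recall trade-off that the F-Measure summarizes in a single number.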

Table 9 The F-Measure results of different algorithms on MNIST when the number of hash bits is 32

Method          Precision  Recall  F-Measure  MAP
LSH             0.225      0.508   0.312      0.338
SH              0.324      0.450   0.377      0.396
KSH             0.732      0.853   0.788      0.874
SDH             0.804      0.821   0.812      0.912
SDHR            0.843      0.900   0.871      0.916
CNNH            0.881      0.893   0.887      0.921
FSDH            0.909      0.782   0.841      0.923
Ours (ADFSDH)   0.895      0.849   0.871      0.943

Fig. 7 Precision-recall curves with 32 bits on MNIST (x-axis: recall; y-axis: precision; methods: LSH, SH, KSH, SDH, SDHR, CNNH, FSDH, ADFSDH)

Table 10 The F-Measure results of different algorithms on FD-XJ when the number of hash bits is 32

Method          Precision  Recall  F-Measure  MAP
LSH             0.092      0.604   0.160      0.215
SH              0.133      0.750   0.226      0.373
KSH             0.412      0.754   0.533      0.612
SDH             0.587      0.703   0.640      0.648
SDHR            0.611      0.727   0.664      0.651
CNNH            0.628      0.822   0.712      0.723
FSDH            0.644      0.830   0.725      0.749
Ours (ADFSDH)   0.701      0.803   0.749      0.786

Fig. 8 Precision-recall curves with 32 bits on FD-XJ (x-axis: recall; y-axis: precision; methods: LSH, SH, KSH, SDH, SDHR, CNNH, FSDH, ADFSDH)

4.5 Test results of our algorithm on our own dataset

CRB-CNN was proposed by Alzu'bi et al. [17]. It removes the fully connected layers of matconvnet-vgg-m and vgg-verydeep-16 and then applies compact bilinear pooling and an inner-product image descriptor. Our proposed ADFSDH method consists of the proposed ADCNN and the improved FSDH algorithm, and it combines feature learning and hash coding in one framework. ADCNN has two different feature extractors in parallel, namely fine-tuned VGG16 and fine-tuned VGG19. Table 11 shows the MAP results of the different algorithms and the feature dimension required for each image. Figure 9 shows image retrieval results of CRB-CNN and ADFSDH: the first image in each row is the query image, the first and third rows show the retrieval results of CRB-CNN, and the second and fourth rows show those of ADFSDH. Table 11 and Fig. 9 show that ADFSDH outperforms CRB-CNN [17] in both MAP and retrieval quality.

Table 11 Performance evaluation of our method on our database (FD-XJ)

Model               Feature dimension per image   MAP
matconvnet-vgg-m    256                           0.532
matconvnet-vgg-m    512                           0.533
matconvnet-vgg-m    1024                          0.530
vgg-verydeep-16     256                           0.559
vgg-verydeep-16     512                           0.563
vgg-verydeep-16     1024                          0.556
vgg-verydeep-19     256                           0.568
vgg-verydeep-19     512                           0.569
vgg-verydeep-19     1024                          0.566
CRB-CNN             512                           0.565
Ours (ADCNN)        512                           0.601
Ours (ADFSDH)       32                            0.786
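Table 11 compares 512-dimensional real-valued descriptors (CRB-CNN, ADCNN) with 32-bit ADFSDH codes. The arithmetic below makes the storage gap concrete. It assumes that each real-valued dimension is stored as a 4-byte float32 and that binary codes are packed eight bits per byte; the paper states neither detail, and the function names and the one-million-image database size are hypothetical.

```python
def float_descriptor_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Bytes for one real-valued descriptor (float32 assumed by default)."""
    return dim * bytes_per_value


def packed_code_bytes(bits: int) -> int:
    """Bytes for one binary hash code packed 8 bits per byte (rounded up)."""
    return (bits + 7) // 8


if __name__ == "__main__":
    dense = float_descriptor_bytes(512)  # 512-d descriptor, as for CRB-CNN and ADCNN in Table 11
    code = packed_code_bytes(32)         # 32-bit ADFSDH code
    n_images = 1_000_000                 # hypothetical database size
    print(dense, code, dense // code)         # 2048 4 512
    print(dense * n_images, code * n_images)  # 2048000000 4000000
```

Under these assumptions each 32-bit code fits in a single machine word, so comparing two codes reduces to one XOR and a bit count instead of a 512-dimensional floating-point distance.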

Fig. 9 Image retrieval results of CRB-CNN [17] and ADFSDH (the first image in each row is the query image; the remaining images are the retrieval results of CRB-CNN [17] or ADFSDH)

5 Conclusion

In this paper, we propose a new deep hashing method called ADFSDH, which combines the ADCNN model with the FSDH algorithm after parameter adjustment. Our method consists of five main steps: (1) initializing the model with VGG16 and VGG19 networks pre-trained on ImageNet; (2) fine-tuning the ADCNN model on the target image retrieval domain; (3) describing images more accurately with a new feature fusion method; (4) generating compact binary hash codes with the FSDH algorithm after adjusting its parameters; and (5) combining feature learning and hash coding in the same framework for highly effective image retrieval. Experiments were conducted on two image classification databases (CIFAR10 and MNIST) and on our laboratory's own face database (FD-XJ), which was collected by members of the image group of our laboratory. The experimental results show that ADFSDH is effective. In our experiments, we verified the effectiveness of the proposed ADCNN model, compared ADFSDH with FSDH, and verified the universality of the proposed deep hashing method on three databases. Our proposed method remains a coarse-to-fine search method that improves retrieval speed and reduces storage space. We also compared our algorithm with several state-of-the-art image retrieval algorithms, and the results demonstrate that our algorithm significantly improves retrieval performance.
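Step (3) fuses the outputs of the two parallel extractors. The abstract states that the most accurate description is obtained when the weighting matches the MAP of the two extractors, but the exact fusion operator is not given in this section. The minimal Python sketch below therefore assumes one plausible reading: each feature vector is L2-normalized, scaled by a weight proportional to its extractor's MAP, and the two are concatenated. The MAP values in the example are illustrative placeholders, not results from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def map_proportional_weights(map_a: float, map_b: float) -> Tuple[float, float]:
    """Weights proportional to each extractor's MAP, summing to 1 (assumed rule)."""
    total = map_a + map_b
    if total <= 0:
        raise ValueError("MAP values must sum to a positive number")
    return map_a / total, map_b / total


def fuse_features(
    feat_a: Sequence[float], feat_b: Sequence[float], map_a: float, map_b: float
) -> List[float]:
    """Weighted concatenation of two L2-normalized feature vectors (assumed operator)."""
    w_a, w_b = map_proportional_weights(map_a, map_b)
    return [w_a * x for x in l2_normalize(feat_a)] + [w_b * x for x in l2_normalize(feat_b)]


if __name__ == "__main__":
    # Toy 2-d features standing in for the VGG16 and VGG19 branches;
    # the MAP values 0.6 and 0.4 are illustrative placeholders.
    print(fuse_features([3.0, 4.0], [1.0, 0.0], 0.6, 0.4))
```

Normalizing before weighting keeps one branch from dominating merely because its activations have a larger scale, so the MAP-based weights alone decide each branch's influence on the fused descriptor.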

In this work, we proposed a new deep hashing method based on the performance requirements of image retrieval algorithms. However, in the era of big data, privacy protection in large-scale image retrieval also deserves researchers' attention. Our future research will therefore proceed along two lines: on the one hand, we will continue to study deep hashing image retrieval algorithms; on the other hand, we will study privacy protection in image retrieval.

Acknowledgements This work is supported by the Chinese National Natural Science Foundation (Program Nos. 61471311, 61771416), the Creative Research Groups of Higher Education of Xinjiang Uygur Autonomous Region (Program No. XJEDU2017T002) and the Saier Network Next Generation Internet Technology Innovation Project (Program No. NGII20170325).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Oliva, A., Torralba, A.: Building the gist of a scene: the role of global image features in recognition. Prog. Brain Res. 155, 23–36 (2006)

2. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

3. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)

4. LeCun, Y., Bottou, L., Bengio, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

6. Deng, J., Dong, W., Socher, R., et al.: Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009)

7. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

8. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)

9. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB, vol. 99(6), pp. 518–529 (1999)

10. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)


11. Liu, W., Wang, J., Ji, R., et al.: Supervised hashing with kernels. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 2074–2081 (2012)

12. Shen, F., Shen, C., Liu, W., et al.: Supervised Discrete Hashing. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), vol. 2(3), pp. 37–45 (2015)

13. Gui, J., Liu, T., Sun, Z., et al.: Supervised discrete hashing with relaxation. IEEE Trans. Neural Netw. Learn. Syst. 29(3), 1–10 (2016)

14. Gui, J., Liu, T., Sun, Z., et al.: Fast supervised discrete hashing. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 490–496 (2018)

15. Xia, R., Pan, Y., Lai, H., et al.: Supervised hashing for image retrieval via image representation learning. In: AAAI (2014)

16. Alzu'bi, A., Amira, A., Ramzan, N.: Compact root bilinear CNNs for content-based image retrieval. In: ICIVC, pp. 41–45 (2016)

17. Alzu'bi, A., Amira, A., Ramzan, N.: Content-based image retrieval with compact deep convolutional features. Neurocomputing 249, 95–105 (2017)

18. Huang, H.K., Chiu, C.F., Kuo, C.H., et al.: Mixture of deep CNN-based ensemble model for image retrieval. In: Proceedings of 5th Global Conference on Consumer Electronics, pp. 1–2 (2016)

19. Zhong, G., Xu, H., Yang, P., et al.: Deep hashing learning networks. In: International Joint Conference on Neural Networks (IJCNN), pp. 2236–2243 (2016)

20. Li, J., Li, J.: Supervised hashing binary code with deep CNN for image retrieval. In: Proceedings of 8th International Conference on Biomedical Engineering and Informatics (BMEI), pp. 649–655 (2015)

21. Lai, H., Pan, Y., Liu, Y., et al.: Simultaneous feature learning and hash coding with deep neural networks, pp. 3270–3278 (2015)

22. Peng, T., Li, F.: Image retrieval based on deep Convolutional Neural Networks and binary hashing learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1742–1746 (2017)

23. Liu, H., Wang, R., Shan, S., et al.: Deep supervised hashing for fast image retrieval. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 2064–2072 (2016)

24. Gong, Y., Lazebnik, S., Gordo, A., et al.: Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)

25. Norouzi, M., Blei, D.M.: Minimal loss hashing for compact binary codes. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 353–360 (2011)

26. Kulis, B., Darrell, T.: Learning to hash with binary reconstructive embeddings. In: Advances in Neural Information Processing Systems, pp. 1042–1050 (2009)

27. Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 34(12), 2393–2406 (2012)

28. Lin, K., Yang, H.F., Hsiao, J.H., et al.: Deep learning of binary hash codes for fast image retrieval. In: Proceedings of Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 27–35 (2015)

29. Li, K., Qi, G.J., Ye, J., et al.: Linear subspace ranking hashing for cross-modal retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1825–1838 (2017)

30. Jiang, Q.Y., Li, W.J.: Deep cross-modal hashing. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 3270–3278 (2017)

31. Zheng, Y.T., Neo, S.Y., Chua, T.S., et al.: Toward a higher-level visual representation for object-based image retrieval. Vis. Comput. 25(1), 13–23 (2009)

32. Lavoué, G.: Combination of bag-of-words descriptors for robust partial shape retrieval. Vis. Comput. 28(9), 931–942 (2012)

33. Joia, P., Gomez-Nieto, E., Neto, J.B., et al.: Class-specific metrics for multidimensional data projection applied to CBIR. Vis. Comput. 28(10), 1027–1037 (2012)

34. Li, H., Toyoura, M., Shimizu, K., et al.: Retrieval of clothing images based on relevance feedback with focus on collar designs. Vis. Comput. 32(10), 1–13 (2016)

Shuli Cheng (1990–) is now in the first year of his doctoral studies in Information Science and Engineering at Xinjiang University. His major is computer application technology. He has a deep interest in deep learning and artificial neural networks, and he also conducts research in computer vision.

Huicheng Lai (1963–) received his Ph.D. degree from the School of Information Science and Engineering, Xinjiang University, in 1990. His current research interests include intelligent image processing and computer vision.

Liejun Wang (1975–) received his Ph.D. degree from the School of Information and Communication Engineering, Xi'an Jiaotong University, in 2012. He is also a member of the Education Information Teach and Teaching Committee, a member of the expert group for promoting domestic cryptography in key areas in Xinjiang, director of the Advisory Committee of Educational Information Technology Experts in Xinjiang, director of the main node of the China Education and Scientific Research Network in Xinjiang, and deputy director of the Network Centre. He has presided over four national projects related to network information security and two provincial- and ministerial-level projects, and has published more than 50 core papers. His current research interests include wireless sensor networks, encryption algorithms and intelligent image processing.


Jiwei Qin (1978–) received her Master's and Ph.D. degrees from the School of Computer Architecture, Xi'an Jiaotong University, Xi'an, China, in 2008 and 2013, respectively. She is currently an Engineer in the Network and Information Technology Center of Xinjiang University. Her research interests include intelligent networks, data mining, social network modeling and analysis, E-learning, recommender systems and collaborative filtering. She has directed and participated in a number of research projects and has published papers in many international journals and conference proceedings, including Educational Technology and Society, Sensors and Transducers, and the SAI Computing Conference.
