Multi-Label Remote Sensing Image Retrieval Based on Deep Features

Remote Sensing Laboratory, Dept. of Information Engineering and Computer Science

University of Trento, Via Sommarive, 14, I-38123 Povo, Trento, Italy

STUDENT: Michele Compri

Multi-Label Remote Sensing Image Retrieval By Using Deep Features


THESIS ADVISORS: Begüm Demir (Unitn), Xavier Girò-i-Nieto (UPC)

University of Trento, Italy

Outline

1. Introduction
2. Aim of the Thesis
3. Proposed Approach to Multi-Label RS Image Retrieval
4. Experimental Results
5. Conclusion


Introduction

✓ During the last decade, advances in remote sensing (RS) technology have led to an increasing volume of RS images.

✓ Earth observation (EO) data archives grow rapidly, motivating the need for efficient and effective content-based image retrieval (CBIR) methods.

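[Figure: CBIR pipeline for a query and an archive. Image Representation: the query is described by v = (v_1, ..., v_n) and each archive image by v_1 = (v_11, ..., v_1n), ..., v_k = (v_k1, ..., v_kn); Image Matching: similarity metrics (Euclidean distance, cosine similarity); Ranking.]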
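The matching and ranking steps can be illustrated with a minimal plain-Python sketch. It assumes the feature vectors have already been extracted as equal-length, non-zero lists of floats (cosine similarity is undefined for a zero vector); the function names are illustrative and not taken from the thesis.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors (larger = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_archive(query, archive, metric="cosine"):
    """Return archive indices ordered from most to least similar to the query."""
    if metric == "cosine":
        scores = [cosine_similarity(query, v) for v in archive]
        return sorted(range(len(archive)), key=lambda k: scores[k], reverse=True)
    scores = [euclidean_distance(query, v) for v in archive]
    return sorted(range(len(archive)), key=lambda k: scores[k])

query = [1.0, 0.0, 1.0]
archive = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(rank_archive(query, archive))               # [1, 2, 0]
print(rank_archive(query, archive, "euclidean"))  # [1, 2, 0]
```

Note that the two metrics sort in opposite directions: a higher cosine similarity means a better match, whereas a lower Euclidean distance does.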


Aim of the Thesis

✓ In CBIR systems for RS, images are usually annotated with a single label, which is then used for both image representation and image matching.

✓ Such a strategy does not fit the complexity of RS images, each of which may be associated with multiple labels.

[Figure: example images with their single label and multi-labels: Airplane (Airplane, Cars, Grass, Trees); Parking Lot (Cars, Pavement); Tennis Court (Bare-soil, Court, Grass, Trees).]

Proposed solution: investigate the effectiveness of different deep learning architectures in the framework of multi-label RS image retrieval.


Proposed Approach: General View

[Figure: system overview. A pretrained deep CNN is fine-tuned on the training set T_Tr; the fine-tuned deep CNN is used for feature extraction, and the retrieval stage returns the N retrieved images.]

The system is composed of three main stages:
● Pretrained architecture
● Fine-tuning
● Retrieval


Proposed Approach: Pretrained Architectures

✓ Since training a CNN from scratch takes a long time and a huge amount of data, models pretrained on ImageNet are considered.

✓ In particular, three different architectures pretrained on ImageNet have been considered:

➢ VGG16: a CNN with 16 weight layers (13 convolutional and 3 fully connected (FC) layers), with max pooling layers between the convolutional blocks

➢ Inception V3: an improved version of GoogLeNet with more layers but fewer parameters, mainly because the large FC layers are replaced by global average pooling

➢ ResNet50: a deeper CNN built from residual blocks, whose shortcut connections let the data flow by skipping the convolutional layers of each block

✓ Since RS images differ from the natural images in ImageNet, a fine-tuning approach is adopted to adapt the features to RS data.
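Global average pooling, which Inception V3 uses in place of large FC layers, replaces each channel of the last convolutional feature map with its spatial mean, so the output length equals the number of channels whatever the spatial size. A plain-Python sketch, assuming the feature map is a nested list indexed as [row][column][channel]; in Keras the corresponding layer is `GlobalAveragePooling2D`.

```python
def global_average_pooling(feature_map):
    """Average an H x W x C feature map over its spatial positions -> length-C vector."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c in range(channels):
                pooled[c] += pixel[c]
    return [s / (height * width) for s in pooled]

# A toy 2 x 2 feature map with 3 channels.
fmap = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]],
]
print(global_average_pooling(fmap))  # [4.0, 1.0, 2.0]
```

Because no weights are involved, this step adds no parameters, which is where the parameter saving over FC layers comes from.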

Proposed Approach: Fine-Tuning

[Figure: fine-tuning. The classifier of the pretrained deep CNN is replaced by a new classifier; the new classifier and the high-level layers are trainable on the training set T_Tr, while the remaining layers are frozen.]

➢ Fine-tuning is a transfer learning strategy that reuses the generic features of the pretrained architecture while training only the top of the architecture

➢ Fine-tuning consists of two phases:
■ Replace the classifier
■ Train only the top of the architecture

➢ Since images are multi-labeled, a sigmoid activation is used in the output layer and binary cross-entropy is used as the cost function
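With multi-label outputs, each class gets an independent sigmoid probability, and the loss averages one binary cross-entropy term per class. A plain-Python sketch for a single image, assuming a multi-hot target vector; in Keras this pairing corresponds to a final `Dense` layer with `activation='sigmoid'` compiled with `loss='binary_crossentropy'`.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean per-class binary cross-entropy between 0/1 targets and probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

logits = [2.0, -1.0, 0.0]  # raw scores of the new classifier for 3 classes
targets = [1, 0, 1]        # multi-hot ground truth: classes 0 and 2 are present
probs = [sigmoid(z) for z in logits]
print(probs)
print(binary_cross_entropy(targets, probs))
```

Unlike a softmax, the sigmoid outputs do not have to sum to one, so several classes can be predicted as present at the same time.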


Proposed Approach: Feature Extraction

[Figure: feature extraction. The images of the test set are fed to the fine-tuned deep CNN (VGG16 shown: blocks 1 to 5 followed by the classifier), and the extracted features form the vector v = (v_1, ..., v_n) used for retrieval.]


Proposed Approach: Retrieval

[Figure: image retrieval. The feature vector v = (v_1, ..., v_n) extracted by the fine-tuned deep CNN (VGG16 shown: blocks 1 to 5 followed by the classifier) is matched against the features of the image dataset to retrieve the most similar images.]
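For retrieval, the archive is ranked by similarity to the query feature vector and the N best matches are returned (N = 20 in the experiments). A sketch assuming cosine similarity and a dictionary from hypothetical image ids to feature vectors; the metric actually used in the thesis may differ.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_top_n(query_vec, archive, n=20):
    """Return the ids of the n archive images most similar to the query.

    archive: dict mapping a (hypothetical) image id to its feature vector.
    """
    return heapq.nlargest(
        n, archive, key=lambda img_id: cosine_similarity(query_vec, archive[img_id])
    )

archive = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.0, 0.7],
    "img_d": [1.0, 0.0, 0.1],
}
print(retrieve_top_n([1.0, 0.0, 0.0], archive, n=2))  # ['img_d', 'img_a']
```

`heapq.nlargest` avoids sorting the whole archive when only the top N results are needed; if N exceeds the archive size, all images are returned in ranked order.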


Data Set Description

UC Merced Land Use benchmark archive: 2100 images categorized under 21 land-cover classes (single-label categories) and annotated with 17 primitive classes (multi-labels).

Multi-labels (17 primitive classes): Field, Trees, Airplane, Bare-soil, Chaparral, Buildings, Grass, Sea, Sand, Pavement, Mobile-home, Cars, Ship, Dock, Water, Tanks, Court

Single labels (21 broad categories): Agricultural, Airplane, Baseball Diamond, Beach, Buildings, Chaparral, Dense Residential, Forest, Freeway, Golf Course, Harbor, Intersection, Medium Residential, Mobile Home Park, Overpass, Parking Lot, River, Runway, Sparse Residential, Storage Tanks, Tennis Court

Examples (single label: multi-labels):
Airplane: Airplane, Cars, Grass, Trees
Parking Lot: Cars, Pavement
Tennis Court: Bare-soil, Court, Grass, Trees
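To train with binary cross-entropy, each image's label set has to become a fixed-length 0/1 (multi-hot) vector over the 17 primitive classes. A sketch under the assumption that the columns follow the order in which the classes are listed above; any fixed order works as long as it is used consistently.

```python
# The 17 primitive classes, in the order listed above (column order is an assumption).
PRIMITIVE_CLASSES = [
    "Field", "Trees", "Airplane", "Bare-soil", "Chaparral", "Buildings",
    "Grass", "Sea", "Sand", "Pavement", "Mobile-home", "Cars",
    "Ship", "Dock", "Water", "Tanks", "Court",
]

def to_multi_hot(labels, classes=PRIMITIVE_CLASSES):
    """Encode a set of label names as a 0/1 vector with one entry per class."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]

airplane_scene = to_multi_hot({"Airplane", "Cars", "Grass", "Trees"})
print(sum(airplane_scene), len(airplane_scene))  # 4 17
```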



Experimental Setup

✓ The framework used is Keras, a deep learning Python library that runs on top of Theano, a numerical computation library.

✓ The dataset is split into an 80% training set and a 20% test set (1680 and 420 images).

✓ Different values of each meta-parameter have been tested during fine-tuning.

Meta-parameter | Initial value | Final value
Optimizer | SGD | Adam
Learning rate | 0.001 | 0.01
Weight decay | 0 | 0.3678
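A random 80/20 split of the 2100 images gives 1680 training and 420 test images. The sketch below assumes a plain random split with a fixed seed; the slides do not say whether the split was random or stratified per category.

```python
import random

def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Shuffle image ids and split them into a training list and a test list."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible split
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_dataset(range(2100))
print(len(train_ids), len(test_ids))  # 1680 420
```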



Experimental Setup

✓ For fine-tuning, each architecture is split into fine-tuned layers and frozen layers.

✓ Fine-tuned layers: during training, the weights of these layers are updated according to the considered archive.

✓ Frozen layers: the part of the architecture whose weights do not change (generic features).

Architecture | Fine-tuned layers (top)
VGG-16 | 14-18
Inception V3 | 172-217
ResNet50 | 152-174
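The split between frozen and fine-tuned layers amounts to toggling a per-layer trainable flag by index. A sketch with a minimal stand-in `Layer` class (Keras layers expose a boolean `trainable` attribute that plays the same role); the 19-layer backbone and the reading of the table's ranges as 0-based layer indices are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Hypothetical stand-in for a network layer with a `trainable` flag."""
    name: str
    trainable: bool = True

def freeze_below(layers, first_trainable):
    """Freeze layers[:first_trainable]; keep layers[first_trainable:] trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= first_trainable
    return [layer.name for layer in layers if layer.trainable]

# Hypothetical 19-layer backbone; fine-tune layers 14-18 as in the VGG-16 row.
backbone = [Layer(f"layer_{i}") for i in range(19)]
print(freeze_below(backbone, first_trainable=14))
# ['layer_14', 'layer_15', 'layer_16', 'layer_17', 'layer_18']
```

In Keras, changes to `trainable` only take effect once the model is compiled again.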



Experimental Results

✓ Baseline experiment: performance of the original pretrained deep architectures when retrieving the 20 most similar images.

✓ Performance is evaluated with three metrics: accuracy, precision and recall.

Architecture | Accuracy | Precision | Recall
VGG-16 | 58.22% | 69.40% | 69.95%
Inception V3 | 52.15% | 63.08% | 62.64%
ResNet50 | 66.89% | 76.27% | 78.06%
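The slides do not define the three metrics. A common choice for multi-label retrieval, assumed here, compares the label set Q of the query with the label set R of each retrieved image and averages over the retrieved images: accuracy = |Q ∩ R| / |Q ∪ R|, precision = |Q ∩ R| / |R|, recall = |Q ∩ R| / |Q|. A plain-Python sketch:

```python
def multilabel_scores(query_labels, retrieved_label_sets):
    """Average accuracy, precision and recall of retrieved images w.r.t. the query labels.

    Per retrieved image with label set R and query label set Q (assumed definitions):
      accuracy  = |Q & R| / |Q | R|
      precision = |Q & R| / |R|
      recall    = |Q & R| / |Q|
    """
    q = set(query_labels)
    acc = prec = rec = 0.0
    for labels in retrieved_label_sets:
        r = set(labels)
        common = len(q & r)
        acc += common / len(q | r)
        prec += common / len(r)
        rec += common / len(q)
    n = len(retrieved_label_sets)
    return acc / n, prec / n, rec / n

query = {"Cars", "Pavement"}
retrieved = [{"Cars", "Pavement"}, {"Cars", "Pavement", "Trees"}, {"Grass"}]
print(multilabel_scores(query, retrieved))
```

Unlike single-label evaluation, a retrieved image sharing only some labels with the query earns partial credit instead of counting as a plain miss.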


✓ Performance of the fine-tuned architectures on the top 20 retrieved images:

Architecture | Accuracy | Precision | Recall
VGG-16 | 70.97% | 80.54% | 81.61%
Inception V3 | 66.97% | 76.69% | 77.53%
ResNet50 | 72.51% | 82.18% | 83.05%

✓ Gain of fine-tuning with respect to the models pretrained on ImageNet only (absolute difference, in percentage points):

Architecture | Accuracy | Precision | Recall
VGG-16 | +12.75 | +11.14 | +11.66
Inception V3 | +14.82 | +13.61 | +14.89
ResNet50 | +5.62 | +5.91 | +4.99
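The gain is the element-wise difference between the fine-tuned results and the baseline results, i.e. an absolute gain in percentage points. This short script recomputes it from the two result tables of the experiments:

```python
baseline = {  # models pretrained on ImageNet only (top 20 retrieved images)
    "VGG-16": (58.22, 69.40, 69.95),
    "Inception V3": (52.15, 63.08, 62.64),
    "ResNet50": (66.89, 76.27, 78.06),
}
fine_tuned = {  # after fine-tuning (top 20 retrieved images)
    "VGG-16": (70.97, 80.54, 81.61),
    "Inception V3": (66.97, 76.69, 77.53),
    "ResNet50": (72.51, 82.18, 83.05),
}

def gains(before, after):
    """Absolute gain in percentage points for (accuracy, precision, recall)."""
    return {
        name: tuple(round(a - b, 2) for a, b in zip(after[name], before[name]))
        for name in before
    }

for name, gain in gains(baseline, fine_tuned).items():
    print(name, gain)
# VGG-16 (12.75, 11.14, 11.66)
# Inception V3 (14.82, 13.61, 14.89)
# ResNet50 (5.62, 5.91, 4.99)
```

The models with the weakest baseline gain the most: Inception V3 gains about 15 points in accuracy, while ResNet50 starts highest and gains about 5.6.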


✓ Performance of an SVM using SIFT features vs. the fine-tuned ResNet50:

Method | Accuracy | Precision | Recall
SVM (SIFT features) | 70.39% | 80.32% | 76.08%
Fine-tuned ResNet50 | 72.51% | 82.18% | 83.05%


[Figure: qualitative retrieval example. Query: Intersection (Bare-soil, Buildings, Cars, Grass, Pavement, Trees). For VGG16, Inception V3 and ResNet50, the images retrieved at ranks 1, 10 and 20 are shown. Retrieved images and their labels include: Intersection (Buildings, Cars, Grass, Pavement, Trees); Intersection (Bare-soil, Buildings, Cars, Grass, Pavement, Trees); Tennis Court (Buildings, Cars, Court, Pavement, Trees); Sparse Residential (Buildings, Grass, Pavement, Trees); Medium Residential (Buildings, Cars, Grass, Trees).]


Conclusion

✓ Unlike existing single-label CBIR systems, this work retrieves multi-label RS images and investigates the effectiveness of different deep learning architectures for this task.

✓ Three architectures pretrained on ImageNet are considered: VGG16, Inception V3 and ResNet50.

✓ These off-the-shelf models are fine-tuned with a subset of RS images and their multi-label annotations.

✓ The retrieval experiments show that both the deep architectures and the fine-tuning strategy are effective for multi-label RS image retrieval: fine-tuning improves all three models, and the fine-tuned ResNet50 gives the best accuracy, precision and recall.

✓ Future developments:

▪ Analyze different architectures
▪ Take data augmentation into consideration
▪ Collect more data to train the architectures from scratch


THANKS FOR YOUR ATTENTION!
