YFCC100M HybridNet fc6 Deep Features for Content-Based Image Retrieval

Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro and Fausto Rabitti

[email protected]

YFCC100M HYBRIDNET FC6 DEEP FEATURES FOR CONTENT-BASED IMAGE RETRIEVAL

Multimedia COMMONS Workshop at ACM Multimedia 2016

Amsterdam, The Netherlands, October 15-19


WHERE WE COME FROM AND MOTIVATIONS

CoPhIR – Content-based Photo Image Retrieval

http://cophir.isti.cnr.it

• Flickr 106M Photos (not all CC)

• Metadata for each photo: title, description, author, tags, comments, notes, GPS coordinates, number of views, and number of users who marked the photo as a favorite

• MPEG-7 Visual Features

• Mainly used by the Similarity Search community (144 citations and about 100 requests)

Similarity Search: The Metric Space Approach. Zezula, Amato, Dohnal, Batko

2008



MAJOR RELATED EVENTS

Deep Learning explosion

YFCC100M

The Multimedia Commons Initiative


CONTRIBUTIONS

• HybridNet fc6 Deep Features for YFCC100M images: multimediacommons.wordpress.com/

• CBIR Systems on the YFCC100M

o MI-File: mifile.deepfeatures.org

o Lucene Quantization: melisandre.deepfeatures.org

• Ground-truth Results for evaluating Approximate k-NN (k=10,001): www.deepfeatures.org/

o For 3 types of processing of the neuron activations (features)

o For subsets of the whole collection, taken at each 1M step
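As an illustration of how ground-truth k-NN results can be used to evaluate an approximate search, here is a minimal Python sketch. It assumes Euclidean distance and a tiny in-memory toy dataset; the function names (`exact_knn`, `recall_at_k`) and the toy data are hypothetical and are not taken from the released ground truth, which uses k=10,001 on the full collection.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, dataset, k):
    """Sequential scan: ids of the k nearest items, closest first."""
    scored = ((euclidean(query, vec), idx) for idx, vec in enumerate(dataset))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact k-NN ids that the approximate result contains."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact.intersection(approx_ids)) / len(exact)


# Toy data (hypothetical): five 2-D points and a query at the origin.
dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
query = [0.0, 0.0]

ground_truth = exact_knn(query, dataset, k=3)
approx = [0, 4, 2]  # pretend output of an approximate index
recall = recall_at_k(approx, ground_truth)
```

On this toy data the approximate result misses one of the three true neighbors, so its recall is 2/3. The same comparison could be repeated on each subset of the collection to see how an index behaves as the dataset grows.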



HYBRIDNET

• Trained on 3.5 million images from 1,183 categories:

o ImageNet-ILSVRC: about 1 million images from 888 categories (removing Places 295 duplicates)

o Places 205: about 2.5 million images from 205 categories

Learning Deep Features for Scene Recognition using Places Database. Zhou, Lapedriza, Xiao, Torralba, Oliva. NIPS 2014


WHY HYBRIDNET FC6?

A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval. V. Chandrasekhar, J. Lin, O. Morère, H. Goh, A. Veillard. Signal Processing, Elsevier, 2016


DEEP FEATURES PROCESSING

• We generated 3 distinct features from the fc6 activations:

o Raw (no ReLU) + L2 normalization

o ReLU + L2 normalization

o Binary: a simple binarization of deep features was shown to lead to a negligible performance drop for both classification and detection (PASCAL-CLS in particular).

$$b_i = \begin{cases} 1 & \text{if } f_i > 0 \\ 0 & \text{otherwise} \end{cases}$$

Analyzing the performance of multilayer neural networks for object recognition. P. Agrawal, R. Girshick, and J. Malik. ECCV 2014
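The three processing variants listed on this slide can be sketched in a few lines of plain Python. This is only an illustration on a toy 4-dimensional vector (real fc6 activations are much longer); the function names and the dictionary keys are hypothetical and do not reflect the code used to produce the released features.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def relu(vec):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, x) for x in vec]


def binarize(vec):
    """b_i = 1 if f_i > 0, otherwise 0."""
    return [1 if x > 0 else 0 for x in vec]


def process_fc6(activations):
    """Return the three feature variants for one activation vector."""
    return {
        "raw_l2": l2_normalize(activations),
        "relu_l2": l2_normalize(relu(activations)),
        "binary": binarize(activations),
    }


# Toy activation vector (hypothetical values).
toy = [3.0, -4.0, 0.0, 12.0]
features = process_fc6(toy)
```

Note that the binary variant depends only on the sign of each activation, so it is the same whether or not ReLU and normalization are applied first.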


GT RESULTS www.deepfeatures.org


GT RESULTS (SEQUENTIAL SCANNING)


APPROXIMATE CBIR RESULTS

[Figure: approximate CBIR results, with panels for MI-File and Lucene Quantization]


THE CBIR ONLINE SYSTEMS

• MI-File

o Permutation Based method

o Uses Inverted Files

MI-File: using inverted files for scalable approximate similarity search

G Amato, C Gennaro, P Savino (Multimedia tools and applications)

• Lucene Quantization

o Exploits the sparsity of deep features (after ReLU, about 25% of the components are non-zero)

o Quantization approach to allow text encoding

o Also able to perform text and combined search

Large scale indexing and searching deep convolutional neural network features

G. Amato, F. Debole, F. Falchi, C. Gennaro, and F. Rabitti (DaWaK 2016)
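To make the idea of a text encoding for sparse features concrete, the following Python sketch shows one way such a quantization could work: each non-zero component is mapped to a synthetic term that is repeated in proportion to the component's value, so that a text engine's term frequencies roughly follow the feature values. This is an assumption made for illustration; the quantization factor `q`, the term naming scheme (`f0`, `f1`, ...), and the function name are hypothetical, and the cited DaWaK 2016 paper should be consulted for the actual method.

```python
import math


def encode_as_text(vec, q=8):
    """Encode a non-negative feature vector as a space-separated term string.

    Component i contributes the hypothetical term "f<i>" repeated
    floor(q * value) times; zero (or very small) components produce no terms,
    so sparse vectors yield short documents.
    """
    terms = []
    for i, value in enumerate(vec):
        repeats = math.floor(q * value)
        if repeats > 0:
            terms.extend([f"f{i}"] * repeats)
    return " ".join(terms)


# Toy sparse vector (hypothetical values chosen to be exact in floating point).
vec = [0.0, 0.5, 0.0, 0.25]
doc = encode_as_text(vec, q=8)
```

Here `doc` is the string "f1 f1 f1 f1 f3 f3". Because the result is ordinary text, it can sit in the same index as titles or tags, which is consistent with the slide's point about text and combined search.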


MI-FILE (INDEXING BINARY FEATURES)
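As background for the permutation-based approach behind MI-File, here is a minimal Python sketch of the general idea: each object is represented by the order in which it "sees" a fixed set of reference objects (pivots), and two objects are compared through their permutations, here with the Spearman footrule. The sketch uses Euclidean distance on toy 2-D points for readability (binary features would more naturally use Hamming distance), leaves out the inverted-file index itself, and uses hypothetical names and data throughout.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(obj, pivots):
    """Pivot ids sorted from the closest to the farthest from obj."""
    return sorted(range(len(pivots)), key=lambda p: euclidean(obj, pivots[p]))


def spearman_footrule(perm_a, perm_b):
    """Sum over pivots of the absolute difference between their ranks."""
    rank_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank - rank_b[pivot]) for rank, pivot in enumerate(perm_a))


# Toy pivots and objects (hypothetical values).
pivots = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
a = [1.0, 2.0]
b = [2.0, 3.0]  # close to a
c = [9.0, 1.0]  # far from a

perm_a = permutation(a, pivots)
perm_b = permutation(b, pivots)
perm_c = permutation(c, pivots)

near = spearman_footrule(perm_a, perm_b)
far = spearman_footrule(perm_a, perm_c)
```

Nearby objects tend to order the pivots the same way, so `near` is smaller than `far` on this toy data. An inverted-file index can then store, for each pivot, the objects and the rank at which they place that pivot.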


LUCENE QUANTIZATION (INDEXING RELU L2NORM.)


MI-FILE (COMPARED TO GT FOR RELU-L2NORM)

ONGOING WORK


IMAGE ANNOTATION


CROSS MEDIA RETRIEVAL (RESULTS ON MS-COCO)

• Text queries are translated into HybridNet fc6 visual vectors by a neural network

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287


CROSS MEDIA RETRIEVAL (RESULTS ON YFCC100M)

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287



CONCLUSIONS AND FUTURE WORK

Contributions:

• HybridNet fc6 Deep Features

• CBIR Systems for YFCC100M:

o MI-File mifile.deepfeatures.org

o Lucene Quantization melisandre.deepfeatures.org

• GT k-NN results for evaluating Approximate Search www.deepfeatures.org/

Ongoing and future work:

• HybridNet fc6 PCA256

• Image annotation based on the YFCC100M metadata

• Extracting new features, e.g.: Deep Image Retrieval: Learning Global Representations for Image Search. Albert Gordo, Xerox Research; Jon Almazan, XRCE; Jerome Revaud, Xerox Research; Diane Larlus, Xerox

• Cross-media retrieval: Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287




THANKS!

Questions are welcome

Fabrizio Falchi

[email protected]
