object discovery using cnn features in egocentric videos

Presentacin de PowerPoint

Object Discovery using CNN Features in Egocentric VideosM. Bolaos, M. Garolera and P. Radeva

SenseCam and Narrative wearable camerasLifelogging and Egocentric VisionPresent a new field of research with a high potential:Automatic nutrition diary.Memory aid for mild cognitive impairment patients.

Introduction Scheme Methodology Validation Conclusions

Lifelogging cameras passively acquire pictures 24-7-365. Producing data about the users' typical activities and habits.

Focus of the WorkHypothesis: Individuals environment is constructed by sets of objects that characterize it.3Introduction Scheme Methodology Validation ConclusionsObject Discovery Object Detection?

Goal: To develop automatic techniques for object discovery to characterize the environment of the person wearing the camera.

These are some examples of lifelogging sets aquired with a wearable camera. We can see that some of them capture quite well the objects and environment surrounding the user, but in many occasions, considering the non-intentionality of the aquisition of the pictures, we do not have useful information and moreover many can appear blurry.

Analysing these pictures, we considered that the environment of any individual is composed of and can be described by the objects people and scenes that appear in their daily life images.

Based on this, our goal is to discover the set of objects that characterize this environment using an Object Discovery algorithm. We must note that Object Discovery Object Detection, because we do not use a supervised technique for detecting a specific object, but have to gradually discover new concepts that describe the environment.3

Focus of the WorkHypothesis: Individuals environment is constructed by sets of objects that characterize it.4Introduction Scheme Methodology Validation ConclusionsObject Discovery Object Detection?

Goal: To develop automatic techniques for object discovery to characterize the environment of the person wearing the camera.Person APerson APerson B




Baseline Object Discovery ProposalLee & Graumans Learning the Easy Things First: Self-Paced Visual Category DiscoveryThey propose to use:a) As features for object candidates: LAB histograms: colour information.Pyramid HOG: shape informationSpatial Pyramid Matching: texture information. Visual words technique based on SIFT.b) Iterative clustering-based easy-first discovery.5Introduction Scheme Methodology Validation Conclusions

A good object discovery proposal was given by Lee & Grauman.

They propose a set of colour, shape and texture features for describing each object candidate after the previous application of an object detection method.

Then they apply an interative semi-supervised object discovery method based on starting by the easiest objects first.5

Our ProposalDiscover iteratively and semi-supervisedly the most relevant classes from egocentric videos for a particular user, based on the following methodology:How can we apply a sampling on each image? Objectness object detectorHow can we represent real world objects?CNN featuresHow can we apply a knowledge reuse?Refill strategy How can we discover new concepts? Clustering6Introduction Scheme Methodology Validation Conclusions

In our proposal we want to keep the good aspects of other methods, but at the same time define a new methodology useful for solving the object discovery problem in non-intentional images. Therefore, our newly defined methodology has to tackle some very precise problems:

Sampling? ObjectnessRich representation? CNN featuresKnowledge reuse? RefillRemove FPs?SVM filteringDiscover new concepts?Clustering

We additionally make public the first egocentric lifelogging dataset (EDUB).6

Ego-Object Discovery Scheme (I)Introduction Scheme Methodology Validation Conclusions

Ego-Object Discovery Scheme (II)Introduction Scheme Methodology Validation Conclusions

Ego-Object Discovery Scheme (III)Introduction Scheme Methodology Validation Conclusions

Objectness detectorObject detection method for extracting a set of initial object candidates as an initial step for the object discovery.

Maximizes 3 complementary terms:Multi-scale SaliencyColor ContrastSuperpixels Straddling

10Introduction Scheme Methodology Validation Conclusions

For extracting the set of general object candidates (without an initial specific label), we use Ferraris Objectness.

Their proposal is to use a combination of three measures to choose the best object candidates and an associated objectness score.10

CNN FeaturesWe propose a set of features based on a pre-trained Convolutional Neural Network on ImageNet objects.After deleting the last layer, it can be used as a powerful feature extractor for real-world objects (output of dimension 4096).11Introduction Scheme Methodology Validation Conclusions

For obtaining a rich feature representation for our objects, we used a pre-trained CNN provided by Hinton et al. on general objects images as a feature extractor.11

Refill StrategyRefill strategy: useful for offering a knowledge reuse and an additional context from the objects. Consequently providing more robust clusters.

Bag of Refillusage12Introduction Scheme Methodology Validation Conclusions

In order to have more robuts clusters and reuse the aquired knowledge, we developed the refill strategy, which inserts previously labeled samples during the clustering step.

Then, for deleting a great part of the FPs caused by the object dectector, we trained a Object vs NoObject SVM.12

Dataset1.000 images from a person's work day.50.000 object candidates were extracted.To validate our method, we used the most frequent:Introduction Scheme Methodology Validation Conclusions

Tests SettingsTrue detection of an object candidate, if the Overlapping Score (OS) between the detected region and the Ground Truth is 0.4:

76% of the object detector samples are FPs! (No Objects)Introduction Scheme Methodology Validation Conclusions

Tests ComparisonFinal F-Measure, Purity and Accuracy for each settingF-Measure evolution for each different setting

Introduction Scheme Methodology Validation Conclusions

Real Clusters Obtained (simplified)tvmonitor

handclusterhard samples

cluster

hard samplesIntroduction Scheme Methodology Validation Conclusions

Extended Recent Work (I)Introduction Scheme Methodology Validation ConclusionsEgocentric Dataset of the University of Barcelona (EDUB) 4,912 images Acquired by 4 people in 8 different days11,294 object labels (21 different classes).Tested also on public datasets of intentional images.

MSRCPASCAL 12EDUB

Extended Recent Work (II)Introduction Scheme Methodology Validation ConclusionsSetting 5Added SVM Filter strategy for removing FPs produced by the Object Detectors.

Setting 3Setting 1

Extended Recent Work (III)Introduction Scheme Methodology Validation Conclusions

ConclusionsObject discovery algorithm that outperforms the state of the art and relies on:Refill strategy for providing knowledge reuse.Features extraction method using a pre-trained CNN.SVM filtering for providing a filter of FP candidates.Egocentric lifelogging dataset with object labels (EDUB).Comparison on three datasets EDUB, PASCAL and MSRC.

Future lines:Discovering objects, scenes and people to characterize the users environment, adding also information provided by event segmentation techniques.Algorithm code and dataset to be released soon.


In this work we have proposed a new object discovery methodology that uses:

CNN features for a rich representation.SVM filtering for removing FP candidates.Refill strategy for applying a knowledge reuse.

Moreover:

First public 20

Focus of the Work

Hypothesis: Individuals environment is constructed by sets of objects that characterize it.

Goal: To develop automatic techniques for object discovery to characterize the environment of the person wearing the camera.21Introduction Scheme Methodology Validation ConclusionsWhat is the difference between Object Discovery and Object Detection?




ClusteringWe use an unsupervised clustering for finding groups of similar objects to label, based on three parts:Agglomerative Ward Clustering: Minimum variance measure.Clusters of similar size.Silhouette Coefficient:Selects the best cluster for the user to label.One-Class SVM:Trained on the last labeled cluster (discovers harder samples of the same class).


Our clustering step is based in 3 parts:

Agglomerative ward clustering, which introduces a measure of minimum variance.Silhouette coefficient for selecting the best cluster for the user to label.One-Class SVM trained on the newly labeled samples for applying an expansion on harder instances of the same concept.22

Future Work

Propose an iterative scene and object complementary discovery for improving the context.

Make the method user-discriminative, being able to detect the objects that discriminate the persons environments from the egocentric data.

Algorithm code and dataset to be released soon.23Introduction Scheme Methodology Validation Conclusions

As a future work we plan to:

Define an algorithm for discovering objects, scenes and people and taking advantage of event segmentation techniques for a better characterisation of the user.Iterative scene and object complementary discovery.Make the method user-discriminative for detecting only relevant objects for the user.23

The selected easiest objects are complemented with a 20% of samples from the refill bag. They are equally divided between all the known classes except the NoObjects.

A SVM Filter is used to discard as manyNo Objects as possible (i.e. FP of theObject detector).

Ego-Object Discovery Scheme

0.870.750.680.430.72

0.920.890.800.680.650.600.350.410.260.09

Iteration: 1Iteration: 2

The new (label, sample) pairs are added tothe Bag of Refill.The new (label, sample) pairs are also added tothe Bag of Refill.24Introduction Scheme Methodology Validation Conclusions

Quick overview!24

Used a pre-trained CNN provided by Hinton et al., trained on millions of ImageNet images.

Used the output of the penultimate layer as our features (4096 variables).

The selected easiest objects are complemented with a 20% of labeld samples. Equally divided between all the known classes but the NoObjects.Before starting with the discovery iterations,40% of all the object candidates are inserted into the previous knowledge set.

Algorithm

0.870.750.680.430.72

0.920.890.800.680.650.600.350.410.260.09

object discovery using cnn features in egocentric videos

Software