

Temporal Segmentation of Egocentric Videos to Highlight Personal Locations of Interest

Supplementary Material

Antonino Furnari, Giovanni Maria Farinella, Sebastiano Battiato

Department of Mathematics and Computer Science, University of Catania

{furnari,gfarinella,battiato}@dmi.unict.it

This document is intended for the convenience of the reader and contains supplementary material which could not be included in the submitted paper due to space limits. Specifically, Section 1 reports and discusses additional results to allow a better assessment of the performance of the proposed method.

1 Additional Results

Fig. 1 to Fig. 11 report a graphical representation of the segmentation results related to the proposed method (method [c] in Table 1). Each image reports the results for the multi-class classifier alone (first row), the rejection mechanism (second row) and the HMM (third row). The last row reports the ground truth. As outlined in the manuscript, simple class discrimination (top row) yields noisy predictions when the ground truth frames are negative. The rejection mechanism successfully detects negative segments. The use of an HMM finally helps reduce sudden changes in the predicted labels.
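The temporal smoothing step can be illustrated with a short sketch. The code below is not the paper's implementation; it is a minimal Viterbi decoder over per-frame class probabilities with a "sticky" transition matrix (the `stay_prob` value is a hypothetical choice), showing how a bias towards self-transitions suppresses spurious label changes:

```python
import numpy as np

def viterbi_smooth(frame_probs, stay_prob=0.95):
    """Smooth per-frame class probabilities with a 'sticky' HMM.

    frame_probs: (T, K) array of per-frame class probabilities
    stay_prob:   probability of remaining in the same state between frames
                 (hypothetical value; not taken from the paper)
    Returns the most likely label sequence (the Viterbi path).
    """
    T, K = frame_probs.shape
    # Transition matrix favouring self-transitions to suppress label flicker.
    trans = np.full((K, K), (1.0 - stay_prob) / (K - 1))
    np.fill_diagonal(trans, stay_prob)
    log_trans = np.log(trans)
    log_emit = np.log(frame_probs + 1e-12)

    # Dynamic programming over log-probabilities.
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_emit[0]  # uniform prior over initial states
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]

    # Backtrack the best path.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t]]
    return path
```

With a high self-transition probability, an isolated flicker frame (one frame confidently assigned to a different class than its neighbours) is overridden by its temporal context, which is the qualitative effect visible in the third row of the diagrams.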

In Figs. 12 to 21, we report the segmentation diagrams for the methods compared in Table 1. Fig. 22 reports the segmentation diagram for the concatenation of all sequences. The diagrams are intended for qualitative and comparative assessment of the results. Each diagram reports the ground truth labels as well as the predicted labels for each of the considered methods. The depicted segmentations are the output of the HMM applied to each of the methods. The proposed methods ([c] and [f] in Table 1) on average outperform the competitors and reach remarkable performance in some cases (e.g., Figs. 15, 16, 18, 20). As can be noted, the localization accuracy of the segmentation boundaries is slightly affected by the proposed negative rejection method.

References

1. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv preprint arXiv:1506.02142 (2015)

2. Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: International Conference on Computer Vision (2003)


2 Antonino Furnari, Giovanni Maria Farinella, Sebastiano Battiato

                      Accuracy                          Comp. Performances
Id   Settings    Discrim.   +Rejection   +HMM      Dimensions   Time

[c]  L ND        94.53      85.00        88.63     378 MB       13.10 ms
[f]  L LR        92.31      81.00        85.37     26 MB        10.23 ms
[g]  SIFT        34.64      33.16        –         71 MB        5170.1 ms
[h]  L ND NE     73.84      76.42        79.69     378 MB       12.82 ms
[i]  SVM [5]     87.76      74.14        79.64     423 MB       97.83 ms

Table 1. Comparisons with the state of the art. Architectural settings: L: the convolutional layers are locked; ND: dropout is disabled; LR: the fully connected layers are replaced by a single logistic regression layer; SIFT: the SIFT feature matching baseline; NE: the model is trained on both positive and negative samples; SVM: classification based on one-class and multiclass SVM classifiers.
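The "+Rejection" column measures accuracy after negative frames are rejected. Purely as an illustration (this is not the mechanism used in the paper, which relies on dropout-based uncertainty estimates [1]), a generic rejection rule can threshold the classifier's maximum class probability; the threshold below is a hypothetical value:

```python
import numpy as np

def reject_low_confidence(frame_probs, threshold=0.6):
    """Label frames as negative (-1) when the classifier is not confident.

    frame_probs: (T, K) array of per-frame class probabilities
    threshold:   hypothetical confidence cut-off (not from the paper)
    Returns per-frame labels in {-1, 0, ..., K-1}, where -1 marks frames
    rejected as not belonging to any personal location of interest.
    """
    labels = frame_probs.argmax(axis=1)
    labels[frame_probs.max(axis=1) < threshold] = -1
    return labels
```

Such a rejected sequence would then be fed to the HMM smoothing stage, with the negative class treated as one additional state.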

[Figure: segmentation diagram for Sequence 1; rows: Discr., +Rej., +HMM, GT]

Fig. 1. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 1.

3. Templeman, R., Korayem, M., Crandall, D., Kapadia, A.: PlaceAvoider: Steering First-Person Cameras away from Sensitive Spaces. In: Annual Network and Distributed System Security Symposium. (2014) 23–26

4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)

5. Furnari, A., Farinella, G.M., Battiato, S.: Recognizing personal contexts from egocentric images. In: Workshop on Assistive Computer Vision and Robotics (ACVR) in conjunction with the IEEE International Conference on Computer Vision. (2015)


Highlighting Personal Location of Interest in Egocentric Video 3

[Figure: segmentation diagram for Sequence 2; rows: Discr., +Rej., +HMM, GT]

Fig. 2. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 2.

[Figure: segmentation diagram for Sequence 3; rows: Discr., +Rej., +HMM, GT]

Fig. 3. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 3.


[Figure: segmentation diagram for Sequence 4; rows: Discr., +Rej., +HMM, GT]

Fig. 4. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 4.

[Figure: segmentation diagram for Sequence 5; rows: Discr., +Rej., +HMM, GT]

Fig. 5. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 5.


[Figure: segmentation diagram for Sequence 6; rows: Discr., +Rej., +HMM, GT]

Fig. 6. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 6.

[Figure: segmentation diagram for Sequence 7; rows: Discr., +Rej., +HMM, GT]

Fig. 7. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 7.


[Figure: segmentation diagram for Sequence 8; rows: Discr., +Rej., +HMM, GT]

Fig. 8. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 8.

[Figure: segmentation diagram for Sequence 9; rows: Discr., +Rej., +HMM, GT]

Fig. 9. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 9.


[Figure: segmentation diagram for Sequence 10; rows: Discr., +Rej., +HMM, GT]

Fig. 10. Segmentation results obtained with the proposed method [c] in Table 1 related to Sequence 10.

[Figure: segmentation diagram for all sequences (S1–S10); rows: Discr., +Rej., +HMM, GT]

Fig. 11. Segmentation results obtained with the proposed method [c] in Table 1 related to the concatenation of all sequences.


[Figure: comparative segmentation diagram for Sequence 1; rows: [c], [f], [g], [h], [i], GT]

Fig. 12. Comparative segmentation results of the methods reported in Table 1 related to Sequence 1.

[Figure: comparative segmentation diagram for Sequence 2; rows: [c], [f], [g], [h], [i], GT]

Fig. 13. Comparative segmentation results of the methods reported in Table 1 related to Sequence 2.


[Figure: comparative segmentation diagram for Sequence 3; rows: [c], [f], [g], [h], [i], GT]

Fig. 14. Comparative segmentation results of the methods reported in Table 1 related to Sequence 3.

[Figure: comparative segmentation diagram for Sequence 4; rows: [c], [f], [g], [h], [i], GT]

Fig. 15. Comparative segmentation results of the methods reported in Table 1 related to Sequence 4.


[Figure: comparative segmentation diagram for Sequence 5; rows: [c], [f], [g], [h], [i], GT]

Fig. 16. Comparative segmentation results of the methods reported in Table 1 related to Sequence 5.

[Figure: comparative segmentation diagram for Sequence 6; rows: [c], [f], [g], [h], [i], GT]

Fig. 17. Comparative segmentation results of the methods reported in Table 1 related to Sequence 6.


[Figure: comparative segmentation diagram for Sequence 7; rows: [c], [f], [g], [h], [i], GT]

Fig. 18. Comparative segmentation results of the methods reported in Table 1 related to Sequence 7.

[Figure: comparative segmentation diagram for Sequence 8; rows: [c], [f], [g], [h], [i], GT]

Fig. 19. Comparative segmentation results of the methods reported in Table 1 related to Sequence 8.


[Figure: comparative segmentation diagram for Sequence 9; rows: [c], [f], [g], [h], [i], GT]

Fig. 20. Comparative segmentation results of the methods reported in Table 1 related to Sequence 9.

[Figure: comparative segmentation diagram for Sequence 10; rows: [c], [f], [g], [h], [i], GT]

Fig. 21. Comparative segmentation results of the methods reported in Table 1 related to Sequence 10.


[Figure: comparative segmentation diagram for all sequences (S1–S10); rows: [c], [f], [g], [h], [i], GT]

Fig. 22. Comparative segmentation results of the methods reported in Table 1 related to all sequences.