visual search for musical performances and endoscopic videos

VISUAL SEARCH

FOR MUSICAL PERFORMANCES

AND ENDOSCOPIC VIDEOS

Final Degree Project Dissertation

Telecommunications Engineering

Jennifer Roldán

Supervisors:

Assoc. Prof. Mathias Lux

Assoc. Prof. Xavier Giró

Outline of the Thesis

1. Introduction
   i. Motivation
   ii. Gantt chart. Work Plan

2. Overview. Existing Demo Application

3. Methods
   i. Global features using Late Fusion Methods
   ii. Local features: SIMPLE descriptor

4. Data sets
   i. Musical Performances
   ii. Endoscopic Videos

5. Experiments
   i. Quantitative evaluation
   ii. Qualitative evaluation. Thinking-aloud test

6. Conclusions and Further Work

Sep 2014 – May 2015

Slide 2

Motivation

• Application to cover surgeons’ needs and automate data processing

• Endoscopic videos (confidential data)

• Focus of the project

• Video retrieval on demand for surgeons

• Musical performances (free data set)

• Reproducible results for evaluation

• Quantitative and qualitative studies

Slide 3


Gantt Chart. Work Plan

Slide 4

Use existing tools and define the thesis statements

Experiments with endoscopic videos

Two papers submitted to the 13th CBMI

Project development with the Jiku Mobile data set


Existing Demo Application

Slide 5

Fig. All results are presented in HTML5 and can be viewed in a recent version of common browsers.


Existing Demo Application

Slide 6

Published at the ACM Multimedia Open Source Competition [1]

• Open source library for CBIR

• Based on Lucene

• Java text retrieval framework

• Indexing and Search

• Supports global and local features (integrates up to 20 descriptors)


[1] Mathias Lux. LIRE: Open source image retrieval in Java. In Proceedings of the 21st ACM International Conference on Multimedia, pages 843-846. ACM, 2013.

Methodology

Slide 7

1. Previous methods in demo application:

• Global Features

i. CEDD. Color and Edge Directivity Descriptor

ii. Color Histogram.

iii. PHOG. Pyramid Histogram of Oriented Gradients

• Late Fusion Methods

2. Extend the methods to local features for retrieval

• Use an existing tool to look for better results

• SIMPLE descriptor


Method 1

Global features using Late Fusion

Fig. System architecture: feature extraction and indexing, similarity measure, fusion


Slide 8

Method 1



Global descriptors for each IRM:

1. CEDD

2. Color Histogram

3. PHOG


Slide 9

Method 1



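Normalization:

• Two different approaches

• N limited images:

1. rank: $R_k(n) = \dfrac{N + 1 - r_k(n)}{N}$

2. score: $R_k(n) = \dfrac{r_k(n) - \min(r_k)}{\max(r_k) - \min(r_k)}$

where $r_k(n)$ is the raw rank or score of image $n$ in result list $k$, and $R_k(n)$ is its normalized value.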
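The two normalizations above can be sketched in Python as follows. This is a minimal illustration, not the demo application's code: the function names are hypothetical, ranks are assumed to be 1-based positions in a list limited to N results, and scores are assumed to be plain floats. Whether a high or a low raw score means a better match is not stated on the slide, so the min-max formula is applied exactly as written.

```python
def normalize_by_rank(ranks, n_limit):
    """Map 1-based rank positions to (0, 1]: rank 1 -> 1.0, rank N -> 1/N."""
    return [(n_limit + 1 - r) / n_limit for r in ranks]


def normalize_by_score(scores):
    """Min-max scale raw scores to the interval [0, 1]."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumption: a list with identical scores maps to all zeros.
        return [0.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


if __name__ == "__main__":
    print(normalize_by_rank([1, 2, 4], n_limit=4))
    print(normalize_by_score([2.0, 4.0, 6.0]))
```

Both functions return values on a common scale, which is what makes result lists from different descriptors comparable before fusion.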


Slide 10

Method 1



Fusion Methods:

a. Sum:

$R_t(n) = \sum_{k} R_k(n) = R_1(n) + R_2(n) + \dots + R_K(n)$

b. Sum with combMNZ:

sum × number of IRMs that returned image n

Final Ranked Lists:

1. Sum (ranks)

2. Sum (scores)

3. Sum with combMNZ (ranks)

4. Sum with combMNZ (scores)
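A small Python sketch of the two fusion rules, under stated assumptions: each IRM's result list is represented as a dictionary from an image identifier to its already normalized value, a higher value is taken to mean a better match, and the function name `fuse` is hypothetical. An image missing from a list simply contributes nothing to the sum.

```python
from collections import defaultdict


def fuse(result_lists, use_combmnz=False):
    """Fuse per-IRM results into one ranked list of (image_id, value) pairs.

    result_lists: one dict {image_id: normalized value} per IRM.
    use_combmnz: if True, multiply each sum by the number of IRMs
                 that returned the image.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for results in result_lists:
        for image_id, value in results.items():
            total[image_id] += value
            hits[image_id] += 1
    fused = {
        image_id: total[image_id] * (hits[image_id] if use_combmnz else 1)
        for image_id in total
    }
    # Assumption: larger fused value means better match.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    lists = [{"a": 0.5, "b": 1.0}, {"a": 0.25}]
    print(fuse(lists))
    print(fuse(lists, use_combmnz=True))
```

In the example, image "a" is returned by both lists and "b" by only one, so combMNZ promotes "a" above "b" even though its plain sum is lower.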


Slide 11

Method 2

Local features. SIMPLE descriptor

“Searching Images with MPEG-7 (& MPEG-7-like) Powered Localized dEscriptors (SIMPLE)” [2]

SURF detector + CEDD descriptor

• Extraction of global features as local ones (at image key points)

• Codebook of 512 visual words (VW) using the Bag-of-Visual-Words (BOVW) model

• K-means clustering algorithm with a vocabulary of 512 words
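The Bag-of-Visual-Words step can be illustrated with a short Python sketch. It covers only the quantization stage: the codebook is assumed to have been learned already (for example by k-means, with 512 words in the setup described here), and squared Euclidean distance is an assumption made for simplicity, since the slide does not state which distance is used. The function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor."""
    best_index, best_distance = 0, float("inf")
    for index, word in enumerate(codebook):
        # Squared Euclidean distance (assumed metric).
        distance = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def bovw_histogram(descriptors, codebook):
    """Build a normalized visual-word histogram for one image."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1.0
    total = sum(histogram)
    if total == 0:
        return histogram  # no key points detected
    return [count / total for count in histogram]


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0]]  # 2 words instead of 512
    toy_descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
    print(bovw_histogram(toy_descriptors, toy_codebook))
```

Each image is thus reduced to one fixed-length histogram, regardless of how many key points the detector finds, which is what allows local features to be indexed and compared like a global descriptor.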


Slide 12

[2] Chryssanthi Iakovidou, Nektarios Anagnostopoulos, Athanasios Ch Kapoutsis, Yiannis Boutalis, and Savvas A Chatzichristofis. Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pages 1-6. IEEE, 2014.

Data sets

Video Retrieval for two different cases


Slide 13

1. Musical Performances

2. Endoscopic Videos


Freely available data set, which allows us to compare results

Jiku Mobile data set

• 473 video clips

• Mobile devices

• Multiple users

• 5 events and several performances

Test

• 356 videos randomly selected

• Based on 1 frame per second

• 412 query images

Fig. Query images event domain

Slide 14



2. Endoscopic Videos

Confidential and anonymized data

Live video stream data set

• Surgeons’ recordings in HQ

• Recorded inside their subjects

• Roughly 33 hours covered

• 54 laparoscopy procedures

Test

• 1,276 videos randomly selected

• Based on 5 frames per second

• 600 query images

Fig. Query images medical domain
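Both tests index the videos at a fixed sampling rate (1 frame per second for the musical performances, 5 frames per second for the endoscopic videos). A minimal Python sketch of how the frame indices to extract could be chosen is given below; the function name and the assumption that the native frame rate of each video is known are illustrative and not taken from the slides.

```python
def sampled_frame_indices(total_frames, native_fps, sample_fps=1.0):
    """Return the indices of the frames kept when sampling at sample_fps."""
    if sample_fps <= 0 or sample_fps > native_fps:
        raise ValueError("sample_fps must be in (0, native_fps]")
    step = native_fps / sample_fps  # native frames between two samples
    indices = []
    i = 0
    while int(i * step) < total_frames:
        indices.append(int(i * step))
        i += 1
    return indices


if __name__ == "__main__":
    # A 4-second clip recorded at 25 fps.
    print(sampled_frame_indices(100, native_fps=25, sample_fps=1))
    print(len(sampled_frame_indices(100, native_fps=25, sample_fps=5)))
```

A higher sampling rate yields more indexed frames per video, which increases index size and indexing time in exchange for finer temporal coverage.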


Slide 15

Experiments

Video Retrieval tested by two different evaluations

Slide 16

1. Quantitative evaluation

2. Qualitative evaluation (Thinking-aloud Test)


Evaluation Social Study, at AAU

Quantitative study:

• Find the position, in the result list, of the video to which the query image belongs

• Results Global Features

• Results Local Features

Qualitative study. Thinking-aloud Test

• Interface: semi-interactive web page

• Participants are researchers and non-researchers within the CODE-MM Project

• 6 Volunteers for Musical Performances Test

• 2 Volunteers for Endoscopic Videos Test

Slide 17




Thinking-aloud Test

• Interface: semi-interactive web page, blindly labeled with 3 search engines (A, B, C)

i. Search Engine A: sum of ranks method and global features

ii. Search Engine B: sum of scores method and global features

iii. Search Engine C: SIMPLE (SURF detector + CEDD descriptor)

• Participants must voice their thoughts aloud

• Sessions are recorded

Slide 18



Thinking-aloud Test

Slide 19


Fig. Screenshots of the different movements of the first volunteer

Fig. Screenshot from the thinking aloud test

Fig. Interface for the thinking aloud test

Experiments


Slide 20

1

2


Endoscopic Videos


Table I. Position at which the query's source video is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.

Benchmarking based on the set of 412 queries:



Quantitative Evaluation

Slide 21

Source video of the query image ranked in the first position of the result list:

• Global features: 96.6% of the queries

• Local features: 91.5% of the queries
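The percentages above measure how often the query's source video appears at the top of the result list. A minimal Python sketch of such a measure follows, assuming each query yields an ordered list of video identifiers and that the true source video of each query is known; the function name and data layout are hypothetical.

```python
def hit_rate_at_k(ranked_videos_per_query, source_video_per_query, k=1):
    """Fraction of queries whose source video is among the top k results."""
    hits = sum(
        1
        for ranked, source in zip(ranked_videos_per_query, source_video_per_query)
        if source in ranked[:k]
    )
    return hits / len(source_video_per_query)


if __name__ == "__main__":
    ranked = [["v1", "v2"], ["v3", "v2"], ["v2", "v1"], ["v4", "v5"]]
    sources = ["v1", "v2", "v2", "v9"]
    print(hit_rate_at_k(ranked, sources, k=1))
    print(hit_rate_at_k(ranked, sources, k=2))
```

With k=1 this corresponds to the "first position" figure reported on the slide; larger values of k give the share of queries whose source video is found within the first k results.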



Qualitative Evaluation

Fig. Most used query images in the user test (left to right)

Global features (A, B)

• Search model: abstract exploratory

• Different sub-events, same viewpoint

Local features (C)

• Search model: semantically similar content

• Same performance, different viewpoints

• Good results at earlier positions in the video

Overall impression

Slide 22

Global Features using Late Fusion

SIMPLE: SURF detector + CEDD




Slide 23

Experiments

Slide 24

1

2


Endoscopic Videos



Benchmarking based on the set of 600 queries:


Endoscopic Videos

Quantitative Evaluation

Table II. Position at which the query's source video is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.

Slide 25

Source video of the query image ranked in the first position of the result list:

• Global features: 78.3% of the queries

• Local features: 79.8% of the queries


Endoscopic Videos


Global features (A, B)

• Search model: abstract exploratory

• Relevant shots in the top results (semantically dissimilar)

Local features (C)

• Search model: semantically similar content

• Same movements in surgeries

• Good results for finding the query’s video source

Overall impression

Fig. Shots (photos) manually taken by the surgeon in the course of the procedure.

Slide 26


Fig. Screenshots of the result presentation showing the three top videos and the query image. All results are presented in HTML5 and can be viewed in recent browsers supporting HTML5 videos and JavaScript. Best matching frames are indicated by triangles in the red and grey time line below the video player.

SIMPLE: SURF detector + CEDD descriptor

Slide 27


Conclusions and Further Work

An existing tool is adapted and extended for content-based video retrieval

Slide 28


Global features

Exploratory search mode

Local features

Semantically similar content

Further work:

• Ad-hoc search within surgery procedures

• Faster indexing strategies

• Fusion of local and global features

• Different implementation of the SIMPLE descriptor (Random Detector + modified-CEDD descriptor)

Appendix

Slide 29


[3] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Event Video Retrieval using Global and Local Descriptors in Visual Domain. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015.

[4] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Visual Information Retrieval in Endoscopic Video Archives. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015. Prague, Czech Republic: In press. http://arxiv.org/abs/1504.07874

Two papers were presented in the Special Session on Medical Multimedia Processing [3] [4] (acceptance rate for special sessions: 55%)

Thank you for your attention

Do you have any questions?

7 May 2015
