Visual Search for Musical Performances and Endoscopic Videos


VISUAL SEARCH

FOR MUSICAL PERFORMANCES

AND ENDOSCOPIC VIDEOS

Final Degree Project Dissertation

Telecommunications Engineering

Jennifer Roldán

Supervisors:

Assoc. Prof. Mathias Lux

Assoc. Prof. Xavier Giró

Outline of the Thesis

1. Introduction
   i. Motivation
   ii. Gantt chart. Work Plan
2. Overview. Existing Demo-Application
3. Methods
   i. Global features using Late Fusion Methods
   ii. Local features: SIMPLE descriptor
4. Data sets
   i. Musical Performances
   ii. Endoscopic Videos
5. Experiments
   i. Quantitative evaluation
   ii. Qualitative evaluation. Thinking-aloud test
6. Conclusions and Further Work

Sep 2014 – May 2015

Slide 2

Motivation

• Application to cover the surgeons’ needs and automate data processing
• Endoscopic videos (confidential data)
  • Focus of the project
  • Video retrieval on demand for surgeons
• Musical performances (free data set)
  • Reproducible results for evaluation
  • Quantitative and qualitative studies

Slide 3

Introduction · Overview · Methods · Data sets · Experiments · Conclusions

Gantt Chart. Work Plan

Slide 4

• Use of existing tools and definition of the thesis statements
• Experiments with endoscopic videos
• Two papers submitted to the 13th CBMI workshop
• Project development with the Jiku Mobile data set


Existing Demo Application

Slide 5

Fig. All results are presented in HTML5 and can be viewed in a recent version of common browsers.


Existing Demo Application

Slide 6

Published at the ACM Multimedia Open Source Competition [1]
• Open source library for CBIR
• Based on Lucene
  • Java text retrieval framework
• Indexing and search
• Supports global and local features (integrates up to 20 descriptors)


[1] Mathias Lux. LIRE: Open source image retrieval in Java. In Proceedings of the 21st ACM International Conference on Multimedia, pages 843-846. ACM, 2013.

Methodology

Slide 7

1. Previous methods in the demo application:
• Global features
  i. CEDD. Color and Edge Directivity Descriptor
  ii. Color Histogram
  iii. PHOG. Pyramid Histogram of Oriented Gradients
• Late fusion methods
2. Extension of the methods to local features for retrieval:
• Use an existing tool to look for better results
• SIMPLE descriptor


Method 1

Global features using Late Fusion

Feature extraction and indexing → Similarity measure → Fusion

Fig. System Architecture


Slide 8

Method 1

Global features using Late Fusion

Feature extraction and indexing → Similarity measure → Fusion

Global descriptors for each IRM:

1. CEDD

2. Color Histogram

3. PHOG


Slide 9

Method 1

Global features using Late Fusion

Feature extraction and indexing → Similarity measure → Fusion

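Normalization:
• Two different approaches
• Result lists limited to N images:
1. rank: $R_k(n) = \dfrac{N + 1 - \mathrm{Rk}(n)}{N}$
2. score: $R_k(n) = \dfrac{\mathrm{Rk}(n) - \min(\mathrm{Rk})}{\max(\mathrm{Rk}) - \min(\mathrm{Rk})}$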
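The two normalization approaches on this slide can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: the function names and toy values are hypothetical, and it assumes that a higher raw score means a better match (distance-based scores would have to be inverted first).

```python
def normalize_ranks(ranks, n_limit):
    """Map raw ranks 1..N to (N + 1 - rank) / N, so rank 1 -> 1.0 and rank N -> 1/N."""
    return {image: (n_limit + 1 - rank) / n_limit for image, rank in ranks.items()}


def normalize_scores(scores):
    """Min-max scale raw scores to [0, 1]; a constant list maps to 1.0 everywhere."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {image: 1.0 for image in scores}
    return {image: (s - lo) / (hi - lo) for image, s in scores.items()}


# Hypothetical result list of one retrieval model, limited to N = 4 images.
raw_ranks = {"frame_a": 1, "frame_b": 2, "frame_c": 4}
raw_scores = {"frame_a": 0.9, "frame_b": 0.5, "frame_c": 0.1}

rank_norm = normalize_ranks(raw_ranks, n_limit=4)
score_norm = normalize_scores(raw_scores)
```

Both functions return values in (0, 1] or [0, 1] with the best match closest to 1, which makes the lists of different descriptors comparable before they are fused.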


Slide 10

Method 1

Global features using Late Fusion

Feature extraction and indexing → Similarity measure → Fusion

Fusion Methods:
a. Sum: $R_t(n) = \sum_{k=1}^{K} R_k(n) = R_1(n) + R_2(n) + \dots + R_K(n)$
b. Sum with combMNZ: sum × number of IRMs that returned image n
Final Ranked Lists:
1. Sum (ranks)
2. Sum (scores)
3. Sum with combMNZ (ranks)
4. Sum with combMNZ (scores)
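A small Python sketch of the two fusion methods, under stated assumptions: each retrieval model contributes a dictionary of already normalized values, an image missing from a list contributes 0 to the sum, and all names and numbers are hypothetical toy data rather than results from the thesis.

```python
from collections import defaultdict


def fuse_sum(lists):
    """Sum the normalized values of each image over all lists; absent images add 0."""
    fused = defaultdict(float)
    for values in lists:
        for image, value in values.items():
            fused[image] += value
    return dict(fused)


def fuse_comb_mnz(lists):
    """Multiply each image's summed value by the number of lists that returned it."""
    summed = fuse_sum(lists)
    hits = defaultdict(int)
    for values in lists:
        for image in values:
            hits[image] += 1
    return {image: total * hits[image] for image, total in summed.items()}


def ranked(fused):
    """Order images by fused value, best first (ties broken by name)."""
    return sorted(fused, key=lambda image: (-fused[image], image))


# Toy normalized lists standing in for the CEDD, Color Histogram and PHOG models.
cedd = {"a": 1.0, "b": 0.5}
color = {"b": 0.25, "c": 1.0}
phog = {"b": 0.5, "c": 0.5}
lists = [cedd, color, phog]

sum_order = ranked(fuse_sum(lists))       # c, b, a
mnz_order = ranked(fuse_comb_mnz(lists))  # b, c, a
```

In this toy case image b moves to the top under combMNZ because all three models returned it, which is the effect the multiplication is meant to have.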


Slide 11

Method 2
Local features. SIMPLE descriptor
“Searching Images with MPEG-7 (& MPEG-7-like) Powered Localized dEscriptors (SIMPLE)” [2]
SURF detector + CEDD descriptor
• Extraction of global features as local ones (image key points)
• Codebook of 512 visual words (VW) using the Bag-of-Visual-Words (BoVW) model
• K-means clustering algorithm with a vocabulary of 512 words
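The Bag-of-Visual-Words step can be illustrated with a plain-Python sketch. It is not the SIMPLE or LIRE code: the descriptors are hypothetical 2-D toy vectors instead of CEDD descriptors computed around SURF key points, the vocabulary has 2 words instead of 512, and the k-means routine is a basic Lloyd iteration written for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd k-means; returns k cluster centers (the visual vocabulary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def bovw_histogram(descriptors, codebook):
    """Count how many local descriptors fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


# Toy training descriptors forming two well-separated groups.
train = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans(train, k=2)

# Local descriptors of one hypothetical image, quantized against the codebook.
image_descriptors = [(0.2, 0.1), (0.1, 0.3), (10.5, 10.2)]
hist = bovw_histogram(image_descriptors, codebook)
```

The resulting histogram is the fixed-length vector that represents the image, so images with different numbers of key points can be compared with the same similarity measure.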


Slide 12

[2] Chryssanthi Iakovidou, Nektarios Anagnostopoulos, Athanasios Ch Kapoutsis, Yiannis Boutalis, and Savvas A Chatzichristofis. Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pages 1-6. IEEE, 2014.

Data sets

Video Retrieval for two different cases


Slide 13

1. Musical Performances
2. Endoscopic Videos

Musical Performances
Freely available data set, which allows us to compare results
Jiku Mobile data set
• 473 video clips
• Mobile devices
• Multiple users
• 5 events and several performances
Test
• 356 videos randomly selected
• Based on 1 frame per second
• 412 query images
Fig. Query images event domain

Slide 14


Endoscopic Videos
Confidential and anonymized data
Live video stream data set
• Surgeons’ recordings in HQ
• Inside of their subjects
• Roughly 33 hours covered
• 54 laparoscopy procedures
Test
• 1,276 videos randomly selected
• Based on 5 frames per second
• 600 query images
Fig. Query images medical domain


Slide 15

Experiments

Video Retrieval tested by two different evaluations

Slide 16

1. Quantitative evaluation
2. Qualitative evaluation (Thinking-aloud Test)


Evaluation Social Study, at AAU

Quantitative study:
• Find the position in the result list of the video that the query image belongs to
• Results for global features
• Results for local features
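The quantitative measure can be sketched in Python as below. This is an illustrative assumption about how such a benchmark could be computed, not the evaluation script of the thesis; the video identifiers and result lists are made up.

```python
def source_position(ranked_videos, source_video):
    """1-based position of the source video in the result list, or None if absent."""
    for position, video in enumerate(ranked_videos, start=1):
        if video == source_video:
            return position
    return None


def top1_rate(results):
    """Fraction of queries whose source video is ranked first."""
    hits = sum(1 for ranked, source in results if source_position(ranked, source) == 1)
    return hits / len(results)


# Each entry: (ranked result list returned for a query image, its true source video).
results = [
    (["v3", "v1", "v7"], "v3"),
    (["v2", "v5", "v9"], "v5"),
    (["v4", "v8", "v6"], "v4"),
    (["v1", "v2", "v3"], "v9"),
]

positions = [source_position(ranked, source) for ranked, source in results]
rate = top1_rate(results)
```

Counting how often each position occurs over all queries gives the kind of per-position breakdown that a results table would report, and `rate` corresponds to the share of queries answered at the first position.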

Qualitative study. Thinking-aloud Test
• Interface: semi-interactive web page
• Participants are researchers and non-researchers within the CODE-MM Project
• 6 volunteers for the Musical Performances test
• 2 volunteers for the Endoscopic Videos test

Slide 17


Evaluation Social Study, at AAU

Thinking-aloud Test

• Interface: semi-interactive web page with 3 blindly labeled search engines (A, B, C)
  i. Search Engine A: sum of ranks method and global features
  ii. Search Engine B: sum of scores method and global features
  iii. Search Engine C: SIMPLE (SURF detector + CEDD descriptor)
• Participants must voice their thoughts aloud
• Sessions are recorded

Slide 18


Evaluation Social Study, at AAU

Thinking-aloud Test

Slide 19


Fig. Screenshots of the different movements of the first volunteer

Fig. Screenshot from the thinking aloud test

Fig. Interface for the thinking aloud test

Experiments

Video Retrieval tested by two different evaluations

Slide 20

1. Musical Performances
2. Endoscopic Videos


Table I. Position at which the source video of each query is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.

Benchmarking based on the set of 412 queries:


Musical Performances

Quantitative Evaluation

Slide 21

Source video of the query image ranked in the first position of the result list:
• Global features: 96.6% of the queries
• Local features: 91.5% of the queries


Musical Performances

Qualitative Evaluation

Fig. Most used query images in the user test (left to right)

Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Different sub-events, same viewpoint
Local features (C)
• Search model: semantically similar content
• Same performance, different viewpoints
• Good results in earlier video’s position

Slide 22

Global Features using Late Fusion
SIMPLE: SURF detector + CEDD


Musical Performances

Qualitative Evaluation

Slide 23

Experiments

Slide 24

Video Retrieval tested by two different evaluations
1. Musical Performances
2. Endoscopic Videos


Benchmarking based on the set of 600 queries:


Endoscopic Videos

Quantitative Evaluation

Table II. Position at which the source video of each query is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.

Slide 25

Source video of the query image ranked in the first position of the result list:
• Global features: 78.3% of the queries
• Local features: 79.8% of the queries


Endoscopic Videos

Qualitative Evaluation

Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Relevant shots in the top results (semantically dissimilar)
Local features (C)
• Search model: semantically similar content
• Same movements in surgeries
• Good results for finding the query’s source video
Fig. Shots (photos) manually taken by the surgeon in the course of the procedure.

Slide 26

Qualitative Evaluation

Fig. Screenshots of the result presentation showing the three top videos and the query image. All results are presented in HTML5 and can be viewed in recent browsers supporting HTML5 videos and JavaScript. Best matching frames are indicated by triangles in the red and grey time line below the video player.

SIMPLE: SURF detector + CEDD descriptor

Slide 27


Conclusions and Further Work

An existing tool is adapted and extended for content-based video retrieval

Slide 28


Global features
• Exploratory search mode
Local features
• Semantically similar content
Further work:
• Ad-hoc search within surgery procedures
• Faster indexing strategies
• Fusion of local and global features
• Different implementation of the SIMPLE descriptor (Random Detector + modified-CEDD descriptor)

Appendix

Slide 29


[3] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Event Video Retrieval using Global and Local Descriptors in Visual Domain. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015.

[4] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Visual Information Retrieval in Endoscopic Video Archives. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015. Prague, Czech Republic. In press. http://arxiv.org/abs/1504.07874

Two papers were presented in the Special Session on Medical Multimedia Processing [3] [4] (acceptance rate for special sessions = 55%)

Thank you for your attention

Do you have any questions?

7 May 2015

Visual Search for

Musical Performances

and Endoscopic Videos

Jennifer Roldán