pseudo-relevance feedback for multimedia retrieval

PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL 2011-11709 Seo Seok Jun


Post on 15-Feb-2016


Page 1: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

2011-11709 Seo Seok Jun

Page 2: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Abstract

Video information retrieval
◦ Finding info. relevant to query

Approach
◦ Pseudo-relevance feedback
◦ Negative PRF

Page 3: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Questions

How does this paper approach content-based video retrieval?
What is the advantage of negative PRF?
What does this paper do to remove extreme outliers?

Page 4: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Introduction

Content-based access to video info.

CBVR
◦ Allows users to query and retrieve based on audio and video
◦ Limited to capturing fairly low-level physical features
  Color, texture, shape, …
◦ Difficult to determine similarity metrics
  Different query scenarios -> different similarity metrics
  Animals -> by shape
  Sky, water -> by color

Page 5: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Introduction
◦ Making the similarity metric adaptive

Adapting similarity metric
◦ Automatically discover the discriminating feature subspace
◦ How?
  Cast as a classification problem
  Margin-based classifiers
    SVMs, AdaBoost
    High performance
    Learning the maximal margin hyperplane
  Users’ query only provides a small amount of positive data with no explicit negative data at all

Page 6: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Introduction
◦ Thus, to use these classifiers, more training data is needed
  Negative examples
  Random sampling
    The number of positive data in a collection is very small
    Risk: positive examples might be included as negatives
  In standard relevance feedback
    Ask the user to label
    Tedious!
  Automatic retrieval is essential!
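The random-sampling risk above can be put into numbers with a small hypergeometric calculation. This is only an illustrative sketch: the function names are hypothetical, and any collection size, positive count, or sample size passed in is a made-up figure, not a value from the paper.

```python
from math import comb


def prob_contaminated(collection_size: int, num_positive: int, sample_size: int) -> float:
    """Probability that a uniform random sample (drawn without replacement
    and treated as 'negatives') contains at least one truly positive item."""
    if sample_size > collection_size:
        raise ValueError("sample_size cannot exceed collection_size")
    num_negative = collection_size - num_positive
    # Chance that every sampled item is a true negative.
    all_clean = comb(num_negative, sample_size) / comb(collection_size, sample_size)
    return 1.0 - all_clean


def expected_mislabeled(collection_size: int, num_positive: int, sample_size: int) -> float:
    """Expected number of true positives that end up labeled as negatives."""
    return sample_size * num_positive / collection_size
```

With few positives the expected contamination stays small, but it is never zero, which is one reason to sample negatives from the bottom of a ranked list instead.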

Page 7: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Introduction

Automatic relevance feedback
◦ Based on a generic similarity metric not tailored to specific queries
◦ Negative feedback -> sample the bottom-ranked examples
  Ex) car -> different from query images in “shape”
◦ Feed back negative data
  Re-weight
  Refine the discriminating feature subspace
◦ A learning algorithm would be better than a universal similarity metric (used for all queries)

Page 8: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Introduction

Learning process
◦ Purpose
  Discover a better similarity metric
  Find the most discriminating subspace between positive and negative examples
◦ Cannot produce fully accurate classification
  Training data is too small
◦ Negative distribution -> not reliable!
◦ Risk! -> feedback from an incorrect estimate
◦ Combining! (with the generic similarity metric)

Page 9: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Related work

Briefly discuss some of the features of the complete system
◦ The Informedia Digital Video Library
◦ Relevance and Pseudo-Relevance Feedback

Page 10: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Similar to relevance feedback
◦ Both originated from document retrieval
◦ Without any user intervention
◦ Few studies in multimedia retrieval yet
  Can no longer assume the top-ranked results are always relevant
  Relatively poor performance of visual retrieval

Page 11: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Positive example based learning
◦ Partially supervised learning
◦ Begin with a small # of positive examples
◦ No negative examples
◦ Goal: associate all examples in the collection with one of the given categories
  Our goal?
    Producing a ranked list of the examples

Page 12: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Semi-supervised learning
◦ Two classifiers
◦ Training set of labeled data
◦ Working set of unlabeled data

Transductive learning
◦ Paradigm to utilize the info. of unlabeled data
◦ Successful in image retrieval
◦ Computation is too expensive
  Multimedia -> large collection

Page 13: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Query: text + audio + image/video

Retrieving a set of relevant video shots
◦ Permutation of the video shots
◦ Sorted by their similarity
  Difference (between two video segments) -> similarity metric
◦ Video features
  Multiple perspectives
    Speech transcript, audio, camera motion, video frame

Page 14: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Retrieval as a classification problem
◦ Data collection can be separated into pos/neg
◦ Mean average precision
  Precision and recall are common measures
  But they do not take the rank into consideration
  Area under an ideal recall/precision curve
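To make the rank-sensitivity of this measure concrete, here is a minimal sketch of the standard non-interpolated average precision, averaged over queries to give mean average precision. It is assumed, not confirmed by the slides, that this matches the exact variant used in the paper; the function names are illustrative.

```python
from typing import Iterable, Optional, Sequence


def average_precision(ranked_relevance: Iterable[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance: True where the item at that rank is relevant.
    total_relevant: relevant items in the whole collection; defaults to
    the number of relevant items found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)
```

Two result lists with the same precision and recall can score differently here: placing the relevant shots first gives 1.0, while placing them last gives a much lower value.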

Page 15: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

PRF
◦ Users’ judgment -> output of a base similarity metric
◦ fb: base similarity metric
◦ p: sampling strategy
◦ fl: learning algorithm
◦ g: combination strategy
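The four components listed above can be wired together as in the sketch below. Everything concrete in it is an assumption made for illustration: fb is taken to be negative squared distance to the nearest query example, p takes the k bottom-ranked items as negatives, and fl is a simple mean-difference linear scorer standing in for the SVM used in the paper. The combination g is passed in by the caller.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def nearest_query_similarity(x: Vector, queries: Sequence[Vector]) -> float:
    """fb (assumed): negative squared distance to the closest query example."""
    return -min(sum((a - b) ** 2 for a, b in zip(x, q)) for q in queries)


def sample_bottom_ranked(collection, base_scores, k: int = 2) -> List[Vector]:
    """p: take the k lowest-scoring items under fb as pseudo-negatives."""
    order = sorted(range(len(collection)), key=lambda i: base_scores[i])
    return [collection[i] for i in order[:k]]


def mean_difference_learner(positives, negatives) -> Callable[[Vector], float]:
    """fl stand-in (NOT an SVM): linear scorer along mean(pos) - mean(neg)."""
    dim = len(positives[0])
    pos_mean = [sum(v[d] for v in positives) / len(positives) for d in range(dim)]
    neg_mean = [sum(v[d] for v in negatives) / len(negatives) for d in range(dim)]
    w = [pm - nm for pm, nm in zip(pos_mean, neg_mean)]
    # Bias puts the decision boundary midway between the two class means.
    b = -sum(wi * (pm + nm) / 2 for wi, pm, nm in zip(w, pos_mean, neg_mean))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def prf_rank(queries, collection, fb, p, fl, g) -> List[int]:
    """Return collection indices ranked by the combined score, best first."""
    base_scores = [fb(x, queries) for x in collection]       # base ranking
    negatives = p(collection, base_scores)                   # pseudo-negatives
    learned = fl(list(queries), negatives)                   # query examples are the positives
    learned_scores = [learned(x) for x in collection]
    final = [g(bs, ls) for bs, ls in zip(base_scores, learned_scores)]
    return sorted(range(len(collection)), key=lambda i: final[i], reverse=True)
```

A call such as `prf_rank(queries, collection, nearest_query_similarity, sample_bottom_ranked, mean_difference_learner, lambda b, l: b + l)` runs one round of feedback with no user labeling. The plain sum used for g here skips score normalization.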

Page 16: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Page 17: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Algorithm Details

Base similarity metric
◦ Dissimilarity of x to the query examples q1,…,qn
◦ Score -> for each frame
  But retrieval unit -> shot (multiple frames)
  Choose the maximal score of a frame in one shot

Sampling strategies
◦ From speech transcript -> positive feedback
  Due to the high precision of textual retrieval
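The frame-to-shot step described above is a max over each shot's frame scores. The sketch below assumes the per-frame scores have already been computed by some base metric. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def shot_scores(frame_scores: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each shot by its best-scoring frame (shots with no frames are skipped)."""
    return {shot: max(scores) for shot, scores in frame_scores.items() if scores}


def rank_shots(frame_scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return shot ids ordered from highest to lowest shot score."""
    per_shot = shot_scores(frame_scores)
    return sorted(per_shot, key=per_shot.get, reverse=True)
```

Taking the maximum means a shot is retrieved if any one of its frames matches the query well, even when the other frames do not.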

Page 18: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Algorithm Details

Classification algorithm
◦ SVMs
◦ Posterior probability
  Linearly normalize the score
  Final score = g(fb, fl): a linear combination of fb and fl, weighted by a combinational factor
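A minimal sketch of the normalize-then-combine step follows. Two things are assumptions here: "linearly normalize" is read as min-max scaling to [0, 1], and the combinational factor (called `lam`, a hypothetical name) is placed on the learned score. The transcript does not show where the factor sits in the formula.

```python
from typing import List, Sequence


def linear_normalize(scores: Sequence[float]) -> List[float]:
    """Min-max scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(base_scores: Sequence[float],
            learned_scores: Sequence[float],
            lam: float = 1.0) -> List[float]:
    """g (assumed form): normalized base score + lam * normalized learned score."""
    nb = linear_normalize(base_scores)
    nl = linear_normalize(learned_scores)
    return [b + lam * l for b, l in zip(nb, nl)]
```

With `lam=1.0` the two scores contribute equally, which corresponds to a 1:1 linear combination.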

Page 19: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Algorithm Details

Combination with text retrieval
◦ Externally provided video summaries are a source of textual information
  Posterior probability set to 1 if the keyword exists
◦ Final posterior probability: a sum of three terms, including weighted terms for
  the posterior prob. of transcript retrieval
  the posterior prob. of video summary retrieval
  In the experiment, the weights = 1 and 0.2
◦ Whole video as a unit -> too coarse to be accurate
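The text fusion above can be sketched as a weighted sum. The symbols in the slide's formula did not survive, so the mapping here is an assumption: weight 1 for the transcript term and 0.2 for the video summary term, with the visual score entering unweighted. Matching on any single keyword is also an assumption, and all names are hypothetical.

```python
from typing import Iterable


def summary_posterior(summary_text: str, keywords: Iterable[str]) -> float:
    """1.0 if any query keyword occurs as a word in the video summary, else 0.0."""
    words = set(summary_text.lower().split())
    return 1.0 if any(k.lower() in words for k in keywords) else 0.0


def fuse(visual_score: float,
         transcript_prob: float,
         summary_prob: float,
         w_transcript: float = 1.0,   # assumed to be the weight of 1 on the slide
         w_summary: float = 0.2) -> float:  # assumed to be the weight of 0.2
    """Weighted sum of the visual score and the two text-retrieval posteriors."""
    return visual_score + w_transcript * transcript_prob + w_summary * summary_prob
```

A lower weight on the summary term is consistent with the slide's remark that a whole video is too coarse a unit to be accurate for shot-level retrieval.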

Page 20: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Positive examples
◦ Query examples

Negative examples
◦ Strongest negative examples

Feedback only one time
◦ Computational issue

Automatically feed back the training data based on the generic similarity metric
◦ To learn an adaptive similarity metric
◦ Generalize the discriminating subspace for various queries

Page 21: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Why good?
◦ Good generalization ability of margin-based learning algorithms

Isotropic data distribution -> invalid
◦ Directions vary with different queries, topics
  Sky -> color
  Car -> shape
◦ In this case, PRF provides a better similarity metric than the generic one.

Page 22: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Test two cases
◦ Positive data
  Along the edge of the data collection
  Center of the data collection
◦ In both cases PRF is superior
  Base similarity metric: generic metric
    Cannot be modified across queries

Page 23: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Page 24: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

PRF metric can be adapted based on the global data distribution and training data
◦ By feeding back the negative examples
◦ Near optimal decision boundary

Associate higher score
◦ Farther away from the negative data
◦ Good when positive data are near the margin
  Common in high dimensional spaces

Page 25: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Pseudo-Relevance Feedback

Downside
◦ Some neg. outliers are assigned a higher score than any positive data -> more false alarms
◦ Solution
  Combining the base metric and the PRF metric
  Smooths out most of the outliers
  Just a simple linear combination (1:1)
  Reasonable trade-off between local classification behavior and global discriminating ability

Page 26: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Experiment

Video: TREC Video Retrieval Track
Text: NIST
◦ 40 hours of MPEG-1 video
Audio: splits the audio from the video
◦ Down-samples to 16 kHz, 16-bit samples
Speech recognition system
◦ Broadcast news transcript
Image processing side
◦ Low-level image features: color and texture
◦ Query as XML

Page 27: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Experiment

Page 28: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Results

Page 29: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Results

Page 30: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Results

Page 31: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Results

Page 32: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Results

Page 33: PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL

Conclusion

Classification task
Machine learning theory applied to video retrieval
SVMs learn to weight the discriminating features
Negative PRF
◦ Separate the means of the distributions of the neg. and pos. examples
Smoothing with combination