cvpr2010 tutorial: video search engines


Page 1: cvpr2010 tutorial: video search engines


Video Search Engines

Cees G.M. Snoek & Arnold W.M. Smeulders

Intelligent Systems Lab Amsterdam, University of Amsterdam, The Netherlands

A brief history of television

• From broadcasting to narrowcasting

~1955 ~1985 ~2005

• …to thincasting

~>2008

~2010

Page 2: cvpr2010 tutorial: video search engines


The international business case

• Everybody with a message uses video for delivery

• Growing unmanageable amounts of video

Example from YouTube

• YouTube users are uploading 24 hours of video every minute

[Chart: hours of video uploaded to YouTube per minute, scale 0-30.]

Page 3: cvpr2010 tutorial: video search engines


Example from the Netherlands

• Yearly ingest
– 15,000 hours of video
– 40,000 hours of radio

• Next 6 years
– 137,200 hours of video
– 22,510 hours of film
– 123,900 hours of audio
– 2,900,000 photos

• Europe’s largest digitization project
– >1 petabyte per year

Lack of metadata

Expert-driven search

http://e-culture.multimedian.nl

Page 4: cvpr2010 tutorial: video search engines


Crowd-given search

What others say is in the video.

Raw-driven search

www.science.uva.nl/research/isla

MultimediaN project

Page 5: cvpr2010 tutorial: video search engines


1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing

Problem 1: Variation in appearance

So many images of one thing, due to minor differences in: illumination, background, occlusion, viewpoint, …

• This is the sensory gap

Page 6: cvpr2010 tutorial: video search engines


Problem 2: What defines things?

[Figure: the machine sees only binary strings in multimedia archives, where humans see concepts: suit, basketball, table, tree, US flag, building, aircraft, dog, tennis, mountain, fire.]

• This is the semantic gap

Problem 3: Many things in the world

• This is the model gap

Page 7: cvpr2010 tutorial: video search engines


Problem 4: Vocabulary problem

Query-by-keyword

Query-by-concept

Query-by-example

Query

Find shots of people shaking hands

This is the query-context gap

Query-by-humming

Any combination, any sequence?

Query-by-gesture

Prediction

Problem 5: Use is open-ended

[Diagram: interface dimensions: screen, scope, keywords.]

• This is the interface gap

Page 8: cvpr2010 tutorial: video search engines


Conclusion on problems

• Video search is a diverse and challenge-rich research topic
– Sensory gap
– Semantic gap
– Model gap
– Query-context gap
– Interface gap

Today’s promise

• You will be acquainted with the theory and practice of the concept-based video search paradigm.

• You will be able to recall the five major scientific problems in video retrieval, and explain and value the present-day solutions.

Page 9: cvpr2010 tutorial: video search engines


1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing

A million appearances

There are a million appearances to one concept

Where are the patterns (of the same shoe)?

Page 10: cvpr2010 tutorial: video search engines


Invariance: the need for ~

Somewhere the variance must be removed.

The illumination and the viewing direction are removed as soon as the image has entered.

Common transformations

Illumination transformations: Contrast; Intensity and Shadow; Color

Viewpoint: Rotation and Lateral; Distance; Viewing angle; Projection

Cover: Specular or matte

Occlusion & Clutter, Wear & Tear, Aging, Night & day and so on into increasingly complex transformations.

Page 11: cvpr2010 tutorial: video search engines


More than one transformation

Features of selected points may be good enough to describe object

iff the selection & the feature set are both invariant for scene-accidental conditions. Gevers, TIP 2000

Design of invariants: Orbits

For a property variant under W, observations of a constant property are spread over the orbit. The purpose of an invariant is to capture all of the orbit into one value.

Page 12: cvpr2010 tutorial: video search engines


Example: invariance

Slide credit: Theo Gevers

projection from (R,G,B)-space to (c1, c2, c3)-space:

c1(R,G,B) = arctan( R / max{G, B} )
c2(R,G,B) = arctan( G / max{R, B} )
c3(R,G,B) = arctan( B / max{R, G} )
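A minimal sketch of these invariants in code, assuming a NumPy float RGB image (the epsilon guard against division by zero is an implementation assumption, not part of the formulas):

    import numpy as np

    def c1c2c3(rgb):
        # rgb: float array of shape (H, W, 3)
        R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        eps = 1e-10  # guard against division by zero in dark pixels
        c1 = np.arctan(R / (np.maximum(G, B) + eps))
        c2 = np.arctan(G / (np.maximum(R, B) + eps))
        c3 = np.arctan(B / (np.maximum(R, G) + eps))
        return np.stack([c1, c2, c3], axis=-1)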

Color invariance

Gevers, PRL 1999; Geusebroek, PAMI 2001

      shadows  shading  highlights  ill. intensity  ill. color
E        −        −         −             −             −
W        −        +         −             +             −
C        +        +         −             +             −
M        +        +         −             +             +
N        +        +         −             +             +
(σ = 1, σ = 2)
N        +        +         +             +             +
L        +        +         +             +             −
H        +        +         +             +             −

Retained from 1000 colours:
      σ = 1   σ = 2
H      300     315
N      550     900
C      600     850
W      900     995
E      950     990

Page 13: cvpr2010 tutorial: video search engines


Local shape motivation

Perceptual importance. Concise data.

Robust to occlusion & clutter

Tuytelaars FTCGV 2008

Meet the Gaussians

Taylor expansion at x

Robust additive differentials

For discretely sampled signal use the Gaussians

Dimensions separable

No maxima introduced
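A minimal sketch of measuring such Gaussian observables on a discretely sampled image, assuming scipy is available (the order argument selects the derivative order per axis):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_observables(image, sigma=2.0):
        # zero- and first-order Gaussian receptive field responses
        L  = gaussian_filter(image, sigma)                 # smoothed intensity
        Lx = gaussian_filter(image, sigma, order=(0, 1))   # derivative along x (columns)
        Ly = gaussian_filter(image, sigma, order=(1, 0))   # derivative along y (rows)
        return L, Lx, Ly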

Page 14: cvpr2010 tutorial: video search engines


Meet the Gaussians

The basic video observables are:

Local receptive fields f(x)

The receptive fields up to first order.

Grey value as well as opponent color sets.

Page 15: cvpr2010 tutorial: video search engines


Taxonomy of image structure

Slide credit: Theo Gevers

T-junction Junction

Highlight

Corner

Meet Gabor

The 2D Gabor function is:

h(x, y) = 1 / (2π σx σy) · exp( −(x²/σx² + y²/σy²) / 2 ) · exp( j 2π (ux + vy) )

Tuning parameters: u, v, σ + usual invariants by combination. Manjunath and Ma on Gabor for texture as seen in F-space.
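A sketch of sampling this function on a discrete grid, assuming a single σ for both axes (σx = σy = σ); the grid size is a free parameter:

    import numpy as np

    def gabor_kernel(u, v, sigma, size=31):
        # complex 2D Gabor: Gaussian envelope times complex sinusoid carrier
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
        carrier = np.exp(2j * np.pi * (u * x + v * y))
        return envelope * carrier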

Page 16: cvpr2010 tutorial: video search engines


Local receptive fields F(x)

The receptive fields for (u, v) measured locally

Grey value as well as opponent color sets.

Hoang 2003

Gabor filters: texture

Hoang, ECCV 2002

Original image K-means clustering Segmentation

Page 17: cvpr2010 tutorial: video search engines


Gabor filters: texture

Local receptive field in f(x, t)

Gaussian equivalent over x and t:

zero order; first order over t

Burghouts TIP 2006

Page 18: cvpr2010 tutorial: video search engines


Receptive fields: overview

All observables up to first order color, second order spatial scales, eight frequency bands & first order in time.

Good observables > easy algorithms

Periodicity:

Detect periodic motion by one steered filter:

Deadly simple algorithm…

Burghouts TIP 2006

Page 19: cvpr2010 tutorial: video search engines


Meet the Loweans

So far we paid respect to the spatial order.

Now we will weakly follow the spatial order and form histograms on all directions we encounter locally, … better known as (the second part of) SIFT.

Lowe IJCV 2004

Meet the Loweans

4 x 4 gradient window after thresholding

Histogram of 4 x 4 samples per window in 8 directions

Gaussian weighting around center (σ is 1/2 of σ keypoint)

4 x 4 x 8 = 128 dimensional feature vector

Image: Jonas Hurrelmann
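A rough sketch of just this descriptor stage, assuming precomputed 16 x 16 gradient magnitude and orientation patches around a keypoint (rotation normalization and the Gaussian weighting are omitted for brevity):

    import numpy as np

    def sift_descriptor(mag, ori):
        # mag, ori: 16x16 gradient magnitudes and orientations (radians)
        desc = np.zeros((4, 4, 8))
        bins = ((ori % (2 * np.pi)) / (2 * np.pi) * 8).astype(int) % 8
        for i in range(16):
            for j in range(16):
                desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
        vec = desc.ravel()                    # 4 x 4 x 8 = 128 dimensions
        vec /= np.linalg.norm(vec) + 1e-10
        vec = np.minimum(vec, 0.2)            # threshold large gradients, as in Lowe
        vec /= np.linalg.norm(vec) + 1e-10    # renormalize
        return vec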

Page 20: cvpr2010 tutorial: video search engines


SIFT detection

Slide credit: Jepson 2005

Enriching SIFT (in a nutshell)

Mikolajczyk, IJCV 2005; Ke, CVPR 2004; Van de Sande, PAMI 2010

• Affine SIFT
– Choose prominent direction in SIFT

• PCA-SIFT
– Robust and compact representation

• ColorSIFT
– Add several invariant color descriptors

• TimeSIFT
– Anyone?

Page 21: cvpr2010 tutorial: video search engines


Quality of properties

Original, blurring, JPEG, ill. direction, viewpoint, spectrum

1 in 1000 Harris patches

In the experiment, manual matching of the Harris points.

Quality of properties

Page 22: cvpr2010 tutorial: video search engines


Quality of properties

Good invariants are very powerful.

Tracking invariant appearance

Nguyen PAMI 2003

Page 23: cvpr2010 tutorial: video search engines


Where to sample in the video?

• Video shot is a set of frames representing a continuous camera action in time and space
– Analysis typically on a single key frame per shot

Shot Key Frame

Where to sample in the frame?

Tuytelaars 2008 FTCGV

Page 24: cvpr2010 tutorial: video search engines


Where to sample? Context

What is the object in the middle?

No segmentation … No pixel values of the object …

Context in codebooks

Similarity to K prototype textures

[Figure: each image region is represented by its similarities to K prototype textures, e.g. sky, grass, road.]

Features: Weibull textures

Page 25: cvpr2010 tutorial: video search engines


Interest point examples

Mikolajczyk, CVPR 2006; van de Weijer, PAMI 2006

Original image Harris Laplacian Color salient points

Dense sampling example

Page 26: cvpr2010 tutorial: video search engines


Fast dense descriptors

Uijlings et al, CIVR 2009

Image Patch

Reuse sub-regions: 16x speed-up

Conclusion on measuring features

• Invariance is crucial when designing features
– More invariance means less stable…
– …but more robustness to sensory gap

• Effective features strike a balance between invariance and discriminatory power
– And for video search, efficiency is helpful also…

Page 27: cvpr2010 tutorial: video search engines


And there is always more …

For example:

Local Invariant Feature Detectors: A Survey

Tinne Tuytelaars & Krystian Mikolajczyk

FTCGV 3:3 (177-280)

1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing

Page 28: cvpr2010 tutorial: video search engines


The semantic gap

Quote

The semantic gap is the lack of coincidence between the information that one can extract from the sensory data and the interpretation that the same data has for a user in a given situation.

Arnold Smeulders, PAMI, 2000

The science of labeling

• To understand anything in science, things need a name that is universally recognized

• Worldwide endeavor in naming visual information

living organisms, chemical elements, human genome, ‘categories’, text

Page 29: cvpr2010 tutorial: video search engines


How difficult is the problem?

• Human vision consumes 50% brain power…

Van Essen, Science 1992

Semantic concept detection

• The patient approach
– Building detectors one-at-a-time

A face detector for frontal faces

Page 30: cvpr2010 tutorial: video search engines


A simple face detector

Video document

Concept detection overview

Volleyball

Sport Music Documentary Commercial Cartoon Soap Sitcom Feature film Talk show News

Basketball Semiotic Tennis Car racing Wildlife Football Soccer Ice hockey Financial

Entertainment Information Communication

Semantic

Guest Host Anchor Report Weather Interview

Hunts Violence Car chase Highlights Walking Gathering Graphical Dramatic events Sport events

Dialogue Story Play Break Action

One PhD per detector requires too many students…

Page 31: cvpr2010 tutorial: video search engines


So how about these?

and the thousands of others ………

Basic concept detection

[Diagram. Training: labeled examples (aircraft, outdoor) → feature extraction → supervised learner. Testing: video → feature measurement → classification → “It is an aircraft, probability 0.7; it is outdoor, probability 0.95.”]
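A minimal sketch of this train/test loop, assuming scikit-learn and random stand-in feature vectors in place of real measurements:

    import numpy as np
    from sklearn.svm import SVC

    # training: features of labeled key frames (1 = aircraft, 0 = not)
    X_train = np.random.rand(200, 128)
    y_train = np.random.randint(0, 2, 200)
    clf = SVC(kernel='rbf', probability=True).fit(X_train, y_train)

    # testing: unseen shots -> concept probability, e.g. "aircraft, p = 0.7"
    X_test = np.random.rand(10, 128)
    p_aircraft = clf.predict_proba(X_test)[:, 1]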

Page 32: cvpr2010 tutorial: video search engines


Demo: concept detection

Visualization by Jasper Schulte

Support vector machine

Vapnik, 1995

• Learns statistical model from provided positive and negative examples

• Maximizes margin between two classes in high-dimensional feature space

[Figure: two classes (* positive, + negative) separated with maximal margin.]

Page 33: cvpr2010 tutorial: video search engines


Support vector machine (cont’d)

Vapnik, 1995

• Depends on many parameters
– Select best of multiple parameter combinations
– Using cross validation

[Diagram: feature vector → SVM → semantic concept probability; C = cost of misclassification, K(·) = kernel function.]

Causes of poor generalization

• Over-fitting
– Separate your data

• Curse of dimensionality
– Information fusion helps

Page 34: cvpr2010 tutorial: video search engines


Feature fusion

[Diagram: shot-segmented video and concept confidences are synchronized, normalized, transformed, and concatenated into one feature vector.]

+ Only one learning phase
- Combination often ad hoc
- One feature may dominate
- Curse of dimensionality

Page 35: cvpr2010 tutorial: video search engines


Avoiding dimensionality curse

Leung and Malik, IJCV 2001; Sivic and Zisserman, ICCV 2003

• Codebook aka bag-of-words model
– Create a codeword vocabulary
– Discretize image with codewords
– Represent image as codebook histogram

[Chart: example codebook histogram.]

Emphasizing spatial configurations

Grauman, ICCV 2005; Lazebnik, CVPR 2006; Marszalek, VOC 2007

• Codebook ignores geometric correspondence

• Solution: spatial pyramid
– aggregate statistics of local features over fixed subregions

• For video (sketched below):
– 1x1 entire image
– 2x2 image quarters
– 1x3 horizontal bars
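A sketch of the aggregation, assuming each sampled point carries a codeword index and an (x, y) position; all names here are hypothetical:

    import numpy as np

    def spatial_pyramid(codes, xs, ys, w, h, k=4000):
        hists = [np.bincount(codes, minlength=k)]            # 1x1: entire image
        for i in range(2):                                   # 2x2: image quarters
            for j in range(2):
                m = ((xs >= j * w / 2) & (xs < (j + 1) * w / 2) &
                     (ys >= i * h / 2) & (ys < (i + 1) * h / 2))
                hists.append(np.bincount(codes[m], minlength=k))
        for i in range(3):                                   # 1x3: horizontal bars
            m = (ys >= i * h / 3) & (ys < (i + 1) * h / 3)
            hists.append(np.bincount(codes[m], minlength=k))
        return np.concatenate(hists)                         # (1 + 4 + 3) * k dims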

Page 36: cvpr2010 tutorial: video search engines


Codebook model

• Codebook consists of codewords
– k-means clustering of descriptors
– Commonly 4,000 codewords per codebook

[Diagram: dense OpponentSIFT descriptors → cluster → assign → feature vector (length 4,000).]
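A minimal sketch of building and applying such a codebook, with random stand-in descriptors and a small codebook for illustration (4,000 codewords in practice):

    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    train_desc = np.random.rand(10000, 128)        # stand-in training descriptors
    codebook, _ = kmeans(train_desc, 400)          # k-means codewords

    image_desc = np.random.rand(3000, 128)         # descriptors of one image
    words, _ = vq(image_desc, codebook)            # hard-assign to nearest codeword
    feature = np.bincount(words, minlength=len(codebook))  # codebook histogram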

Codebook assignment

van Gemert, PAMI 2010

● Codeword

Hard assignment vs. soft assignment
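A sketch of the difference: hard assignment gives each descriptor to its single nearest codeword, while soft assignment spreads it over all codewords with a Gaussian kernel on distance; σ here is an assumed smoothing parameter:

    import numpy as np

    def soft_assignment_histogram(desc, codebook, sigma=100.0):
        # pairwise squared distances: (n_desc, n_codewords)
        d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        w /= w.sum(axis=1, keepdims=True)   # each descriptor votes with total weight 1
        return w.sum(axis=0)                # accumulate votes per codeword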

Page 37: cvpr2010 tutorial: video search engines


Fast quantization

Moosmann, PAMI 2008; Uijlings, CIVR 2009

• Random forests
– Randomized process makes it very fast to build
– Tree structure allows fast vector quantization
– Logarithmic rather than linear projection time

• Real-time BoW (!)
– When used with fast dense sampling
– SURF 2x2 descriptor instead of 4x4
– RBF kernel

GPU-empowered quantization

Van de Sande, TMM 2011

• Achieve data-parallelism by writing Euclidean distance in vector form

[Chart: time per image (s) vs. number of SIFT descriptors per image, for CPUs (Xeon 3.4 GHz, Opteron 250 2.4 GHz, Core 2 Duo 6400 2.13 GHz, Core i7 2.66 GHz) and GPUs (GeForce 8800GTX, 128 cores; GeForce GTX260, 216 cores); the GPU gives a 17x speed-up.]
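The vector form relies on the identity ||a − b||² = ||a||² + ||b||² − 2 a·b, which turns all pairwise distances into one matrix product; a NumPy sketch (on a GPU the same expression maps to a single matrix multiply):

    import numpy as np

    def pairwise_sq_dists(A, B):
        # rows of A (n x d) against rows of B (m x d)
        aa = (A ** 2).sum(axis=1)[:, None]   # ||a||^2 as a column
        bb = (B ** 2).sum(axis=1)[None, :]   # ||b||^2 as a row
        return np.maximum(aa + bb - 2.0 * (A @ B.T), 0.0)  # clamp rounding noise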

Page 38: cvpr2010 tutorial: video search engines


Codebook library

• A single codebook is…

     Sampling method          Descriptor    Construction  Assignment
#1   Dense                    OpponentSIFT  K-means       Soft
#2   Harris-Laplace           SIFT          Radius-based  Soft
#3   Dense                    rgSIFT        K-means       Hard
…    Dense + Spatial pyramid  C-SIFT        K-means       Hard

• Codebook library is…
– a configuration of several codebooks

Codebook library (cont’d)

• Concatenate multiple codebooks
– Spatial pyramid adds more dimensions
o 1x1 = 4K
o 2x2 = 16K
o 1x3 = 12K
– Feature vector length easily >100K…

Page 39: cvpr2010 tutorial: video search engines


SVM pre-computed kernel trick

• Use distance between feature vectors (!)

K(x, y) = e^(−γ · dist(x, y))

• Increase efficiency significantly
– Pre-compute the SVM kernel matrix
– Long vectors possible as we only need 2 in memory
– Parameter optimization re-uses pre-computed matrix
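A sketch of the trick with scikit-learn's precomputed-kernel interface, using a stand-in distance matrix; note the distance matrix is computed once and re-used for every γ:

    import numpy as np
    from sklearn.svm import SVC

    D = np.random.rand(200, 200)            # stand-in pairwise distance matrix
    D = (D + D.T) / 2                       # symmetrize
    y = np.random.randint(0, 2, 200)

    for gamma in (0.1, 0.5, 1.0):           # parameter search re-uses D
        K = np.exp(-gamma * D)              # K(x, y) = exp(-gamma * dist(x, y))
        clf = SVC(kernel='precomputed', C=1.0).fit(K, y)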

GPU-empowered pre-computed kernel

Van de Sande, TMM 2011

1. Compute average distances per N² kernel sub-block
2. Compute kernel function values

[Chart: kernel computation time (s) vs. total feature vector length, for 1, 4, 16, and 25 Opteron 250 CPUs (2.4 GHz), a Core i7 (2.66 GHz), and a GPU; annotated speed-ups: 65x and 3x.]

Page 40: cvpr2010 tutorial: video search engines


Feature fusion

Point sampling strategy Color feature extraction Codebook model

Van de Sande, PAMI 2010

[Diagram: image → point sampling (dense sampling, Harris-Laplace salient points) → color feature extraction → codebook model → bag-of-features; the spatial pyramid yields multiple bags-of-features per image.]

+ Codebook reduces dimensionality
- Combination still ad hoc
- One codebook may dominate

Classifier fusion

Page 41: cvpr2010 tutorial: video search engines


Classifier fusion

+ Focus on feature strength

+ Fusion in semantic space

- Expensive learning effort

- Loss of feature correlation

Unsupervised fusion of classifiers

Snoek, TRECVID 2006; Wang, ACM MIR 2007

[Diagram: global, regional, and keypoint image feature extraction feed a support vector machine, a Fisher linear discriminant, and logistic regression, whose outputs are fused by a geometric mean.]

+ Aggregation functions reduce learning effort
+ Efficient use of training examples
- Linear function unlikely to be optimal
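A sketch of such an aggregation function, fusing per-classifier concept probabilities by their geometric mean (the clip guards the logarithm):

    import numpy as np

    def fuse_geometric_mean(probs):
        # probs: (n_classifiers, n_shots) concept probabilities in (0, 1]
        return np.exp(np.log(np.clip(probs, 1e-10, 1.0)).mean(axis=0))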

Page 42: cvpr2010 tutorial: video search engines


Fusing concepts

• Exploitation of concept co-occurrence
– Concepts do not occur in a vacuum

Concept 1

Concept 2

Concept 3

Naphade Trans. MM 2001

Sky Aircraft

How to fuse concepts?

• Learning spatial models

• Learning temporal models

• Include ontologies

Page 43: cvpr2010 tutorial: video search engines


Learning spatial models - explicitly

• Using graphical models
– Computationally complex
– Limited scalability

Qi, TOMCCAP 2009

Learning spatial models - implicitly

• Using support vector machine, or data mining
– Assumes classifier learns relations
– Suffers from error propagation

Weng, ACM MM 2008

Page 44: cvpr2010 tutorial: video search engines


Learning temporal models

• Extend spatial models with time dimension
– Common approach is Hidden Markov Model
– Relatively few have actually considered time…

Ebadollahi, ICME 2006

Including knowledge

• Can ontologies help?
– Symbolic ontologies vs uncertain detectors

Wu, ICME 2004

Page 45: cvpr2010 tutorial: video search engines


Concept detection pipeline

IBM 2003

[Diagram: feature fusion → classifier fusion → concept fusion.]

Page 46: cvpr2010 tutorial: video search engines


Video diver

Wang, ACM MIR 2007

[Diagram: feature fusion → classifier fusion → concept fusion.]

Page 47: cvpr2010 tutorial: video search engines


Semantic Pathfinder

Snoek, PAMI 2006

[Diagram: content analysis step, style analysis step, and context analysis step, each with features extraction (content, layout, capture, textual, context), a supervised learner, and a features combination (visual, semantic, multimodal); select best of 3 paths after validation. Example concepts: animal, vehicle, flag, fire, sports, entertainment, monologue, weather news, Hu Jintao.]

Page 48: cvpr2010 tutorial: video search engines


State-of-the-Art

Snoek et al, TRECVID 2008-2009; Van Gemert et al, PAMI 2010; Van de Sande et al, PAMI 2010

[Diagram: feature fusion and classifier fusion combined.]

Software available for download at http://colordescriptors.com

Page 49: cvpr2010 tutorial: video search engines


Conclusion on: Detecting Semantic Concepts in Video

• We started with invariance and manual labor

• We generalized with machine learning
– …but needed several abstractions to do it appropriately

• For the moment, no one-size-fits-all solution
– Learn optimal machinery per concept

1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing

Page 50: cvpr2010 tutorial: video search engines


Problem 3: Many things in the world

• This is the model gap

Trial 1: counting dictionary words

Biederman, Psychological Review 1987

Slide credit: Li Fei-Fei

Page 51: cvpr2010 tutorial: video search engines


Trial 2: reverse-engineering

Hauptmann, PIEEE 2008

• Estimation by Hauptmann et al.: 5000
– Using manually labeled queries and concepts
– But speculative, and questionable assumptions

‘Google performance’

Oracle Combination + Noise

‘Realistic’ Combination

How to obtain labeled examples?

massive amounts of

3 billion

– …but only human experts provide good quality examples

Page 52: cvpr2010 tutorial: video search engines


Experts start with concept definition

• MM078 - Police/Security Personnel
– Shots depicting law enforcement or private security agency personnel.

Expert annotation tools

Volkmer, ACM MM 2005

• Balance between:
– Spatiotemporal level of annotation detail
– Number of concepts
– Number of positive and negative examples

Page 53: cvpr2010 tutorial: video search engines


LSCOM (Large Scale Concept Ontology for Multimedia)

Naphade, IEEE MM 2006

• Provides manual annotations for 449 concepts
– In international broadcast TV news

• Connection to Cyc ontology

• LSCOM-Lite
– 39 semantic concepts

http://www.lscom.org/

Verified positive examples

• ImageNet (11M images)
– 4000 categories
– > 100 examples

• SUN (130K images)
– 397 scene categories
– > 100 examples

Deng et al, CVPR 2009

Xiao et al, CVPR 2010

Page 54: cvpr2010 tutorial: video search engines


Bridging the model gap

• Requirements
– Generic concept detection method
– Massive amounts of labeled examples
– Evaluation method
– Fair amount of computation

Model gap best treated by TRECVID

• Situation in 2000
– Various concept definitions
– Specific and small data sets
– Hard to compare methodologies

• Since 2001 worldwide evaluation by NIST
– TRECVID benchmark

Page 55: cvpr2010 tutorial: video search engines


NIST TRECVID benchmark

• Promote progress in video retrieval research
– Provide common dataset
– Challenging tasks
– Independent evaluation protocol
– Forum for researchers to compare results

http://trecvid.nist.gov/

Video data sets

• US TV news (’03/’04)

• International TV news (’05/’06)

• Dutch TV infotainment (’07/’08/’09)

Page 56: cvpr2010 tutorial: video search engines


TRECVID 2010: Internet Archive web videos

Expert annotation efforts

[Chart: number of expert-annotated concepts per TRECVID edition (17, 32, 39, 101, 374, … up to ~500), split into LSCOM, MediaMill - UvA, and others.]

Page 57: cvpr2010 tutorial: video search engines


Measuring performance

• Precision: the fraction of retrieved items that is relevant

• Recall: the fraction of relevant items that is retrieved

– inverse relationship

[Illustration: sets of retrieved and relevant items; ranked result list, positions 1-5.]

Evaluation measure

• Average Precision
– Combines precision and recall
– Averages precision after each relevant shot
– Top of ranked list most important

AP = (1/1 + 2/3 + 3/4 + …) / number of relevant documents

[Illustration: ranked result list, positions 1-5.]
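A sketch of the computation from a ranked list, assuming all relevant shots appear somewhere in the list:

    def average_precision(rel):
        # rel[i] = 1 if the shot ranked at position i+1 is relevant, else 0
        hits, ap = 0, 0.0
        for k, r in enumerate(rel, start=1):
            if r:
                hits += 1
                ap += hits / k          # precision after this relevant shot
        return ap / max(1, sum(rel))    # divide by number of relevant documents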

Page 58: cvpr2010 tutorial: video search engines


De facto evaluation standard

Concept examples

Aircraft

Beach

Mountain

Note the variety in visual appearance

People marching

Police/Security

Flower

Page 59: cvpr2010 tutorial: video search engines


TRECVID concept detection task results

• Hard to compare
– Different video data
– Different concepts

• Clear top performers
– Median skews to left
– Learning effect
– Plenty of variation

[Chart: per-edition (2003-2009) distributions of concept detection scores.]

UvA-MediaMill @ TRECVID

Snoek et al, TRECVID 04-09

• 900 other detection systems

Page 60: cvpr2010 tutorial: video search engines


1,000,000 frames analyzed

Snoek, ICME 2005

• Multi-frame biggest improvement in 2008 / 2009
– We analyze up to 10 extra i-frames/shot
– For 2009 yields 1M frames to analyze for the test set

• Need to speed up by being “smart and strong”
– Speed-up feature extraction
– Speed-up quantization
– Speed-up kernel-based learning
– Speed-up by computing

Computing

• Best 2009 system much more efficient than 2008 system
– 6x more visual data analyzed using less compute power

• Some best estimates:
– Visual feature extraction: 8400 Processor-Node-Hours
– Training concept detectors: 4000 PNH
– Applying concept detectors: ~1 week GPU

Page 61: cvpr2010 tutorial: video search engines


MediaMill Challenge

http://www.mediamill.nl/challenge/

• The Challenge provides
– Manually annotated lexicon of 101 semantic concepts
– Pre-computed low-level features
– Trained classifier models
– 5 experiments
– Implementation + results

• The Challenge allows to
– Gain insight in intermediate video analysis steps
– Foster repeatability of experiments
– Optimize video analysis systems on a component level
– Compare and improve

• The Challenge lowers threshold for novice researchers

Columbia374 + VIREO374

• Baseline detectors for 374 concepts

http://www.ee.columbia.edu/ln/dvmm/columbia374/http://www.cs.cityu.edu.hk/~yjiang/vireo374/http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/

Page 62: cvpr2010 tutorial: video search engines


Community myths or facts?

• Chua et al., ACM Multimedia 2007
– Video search is practically solved and progress has only been incremental

• Yang and Hauptmann, ACM CIVR 2008
– Current solutions are weak and generalize poorly

We have done an experiment

• Two video search engines from 2006 and 2009
– MediaMill Challenge 2006 system
– MediaMill TRECVID 2009 system

• How well do they detect 36 LSCOM concepts?

Page 63: cvpr2010 tutorial: video search engines


Four video data set mixtures

• Training: broadcast news (TRECVID 2005) or documentary video (TRECVID 2007)

• Testing: broadcast news or documentary video
– same source: within domain; different source: cross domain

Performance doubled in just 3 years

Snoek & Smeulders, IEEE Computer 2010

• 36 concept detectors

– Even when using training data of different origin
– Vocabulary still limited

Page 64: cvpr2010 tutorial: video search engines


500 detectors, a closer look

The number of labeled image examples used at training time seems decisive in concept detector accuracy.

Demo time

Page 65: cvpr2010 tutorial: video search engines


500 detectors, a closer look

Learning social tag relevance by neighbor voting

Xirong Li, TMM 2009

• Exploit consistency in tagging behavior of different users for visually similar images
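A sketch of the neighbor-voting rule with hypothetical inputs: the relevance of a tag for an image is the number of votes it receives from the image's visual neighbors, corrected by the tag's prior frequency in the collection:

    def tag_relevance(tag, neighbor_tags, collection_tags, k):
        # neighbor_tags: tag sets of the k visually nearest images (other users)
        # collection_tags: tag sets of all images in the collection
        votes = sum(tag in tags for tags in neighbor_tags)
        prior = k * sum(tag in tags for tags in collection_tags) / len(collection_tags)
        return votes - prior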

Page 66: cvpr2010 tutorial: video search engines


Algorithm: tag-relevance learning

Why is this useful?

Image retrieval experiments

• User-tagged image database
– 3.5 million labeled Flickr images

• Visual feature
– A global color-texture feature

• Evaluation set
– 20 concepts

• Evaluation criteria
– Average precision

Page 67: cvpr2010 tutorial: video search engines


Image retrieval experiments

• A standard tag-based retrieval framework
– Ranking function: OKAPI-BM25

• Comparison
– Baseline: retrieval using original tags
– Neighbor: retrieval using learned tag relevance as tag frequency

Results

24% relative improvement

Page 68: cvpr2010 tutorial: video search engines


Updated tag relevance

Results suggest…

• Relevance of a tag can be predicted based on ‘wisdom’ of crowds
– Even with a light-weight visual feature
– And, a small database of 3.5M images

Page 69: cvpr2010 tutorial: video search engines


Conclusion on lexicon learning

Requires
• Invariant features
• Concept detection
• Many, many annotations

Suffers most from
• Weakly labeled visual data
• Transfer across domains
• Measuring performance
• Lots of computation

1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing

Page 70: cvpr2010 tutorial: video search engines


Problem 4: Vocabulary problem

Query-by-keyword

Query-by-concept

Query-by-example

Query

Find shots of people shaking hands

This is the query-context gap

Query-by-humming

Any combination, any sequence?

Query-by-gesture

Prediction

Traditional approaches

• Parse topic-text, reframe as query-by-keyword
– Using speech recognition or closed captions

• Use possible images accompanying the topic for query-by-example
– Using shot-based keyframes

Page 71: cvpr2010 tutorial: video search engines


A new hope?

Quote

We are now seeing researchers starting to use the confidence values from concept detectors within the shot retrieval process, and this appears to be the roadmap for future work in this area.

Alan Smeaton, Inf. Sys., 2007

Video query examples

Find shots of a hockey rink with at least one of the nets fully visible from some point of view.

Find shots of one or more helicopters in flight.

Find shots of a group including at least four people dressed in suits, seated, and with at least one flag.

Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people.

Page 72: cvpr2010 tutorial: video search engines


Typical ‘oracle’ results

Find shots of a graphic map of Iraq, location of Baghdad marked - not a weather map.
→ best detectors: Maps + Overlayed Text

Find shots of George Bush entering or leaving a vehicle (e.g., car, van, airplane, helicopter, etc) (he and vehicle both visible at the same time)
→ best detectors: Iyad Allawi + rocket propelled grenades

How to select relevant detectors automatically?

Detector selection strategies

Find shots of an office setting

Video Query

Visual-based

Ontology-based

Text-based

Page 73: cvpr2010 tutorial: video search engines


Text-based selection

• Represent concept descriptions as term vector
– Exact matching to link queries to detectors
– Vector space model to match queries to descriptions
– Corpus-driven query expansion methods

Recap: concept definition

• MM078 - Police/Security Personnel
– Shots depicting law enforcement or private security agency personnel.

Page 74: cvpr2010 tutorial: video search engines


Visual-based selection

Rasiwasia et al, TMM 2007

[Diagram. Training stage: feature extraction → supervised learner. Testing stage: query image → feature extraction → classify → face p=0.97, outdoor p=0.98, helicopter p=0.43, …]

1. Identify objects in WordNet

Ontology-based selection

“Find a report from the desert showing a house or car on fire.”

Slide credit: Bouke Huurnink

car

desert house

fire

Page 75: cvpr2010 tutorial: video search engines


2. Identify related concept detectors

Ontology-based selection

“Find a report from the desert showing a house or car on fire.”

car

desert house

fire

car

3. Find most similar and specific detector using ontology measure

Ontology-based selection

“Find a report from the desert showing a house or car on fire.”

Wei, TMM 2008

car

vehicle

car

desert

fire

desert building

desert house

fire

Page 76: cvpr2010 tutorial: video search engines


Search strategy combination

1. Parallel combination
– All retrieval results are taken into account simultaneously
– Weighted average of individual results

2. Sequential combination
– Update video retrieval results in succession
– Pseudo-relevance feedback variants

Parallel combination

Paul Natsev, 2005 - 2007

• Estimating weights of individual retrieval modules is the main problem
– Predefine by expert
– Learn from training data
– Oracle
– …

Page 77: cvpr2010 tutorial: video search engines


Sequential combination

Hsu, IEEE MM 2007

• Quality depends on ranking in first stage

Tackling the query-context gap

• Requirements
– Several video retrieval methods (by-keywords / by-example / by-many-concepts)
– Detector selection and combination method
– Training data
– Search topics
– Evaluation method
– Well-defined application domain?

Page 78: cvpr2010 tutorial: video search engines


Query-context gap is reasonably addressed at TRECVID

• Automatic search task
   – Automatically solve 20+ search topics
   – Return 1,000 ranked shot-based results per topic
   – Evaluate using Average Precision (a sketch follows below)

• Drawbacks
   – Queries tend to be overly complex, limited in number, drifting away from real-world usage
   – Inclusion of recall lowers performance
   – Lack of training data

http://trecvid.nist.gov/
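For reference, a sketch of the (non-interpolated) average precision used above. Dividing by all relevant shots, retrieved or not, is what makes the inclusion of recall lower the score.

```python
# Sketch: average precision for one topic over a ranked list of shots.
def average_precision(ranked_shots, relevant):
    hits, ap = 0, 0.0
    for i, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            ap += hits / i            # precision at each relevant shot
    return ap / len(relevant) if relevant else 0.0

# e.g. average_precision(['s3', 's7', 's1'], relevant={'s3', 's1', 's9'})
```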

Query prediction at TRECVID

• Performance is humble
   – Lack of training data
   – Especially in 2007-2008

• Most pronounced
   – 2005: Large semantic lexicon

[Plot: Mean Average Precision per TRECVID edition.]


Conclusion on query prediction

• Retrieval tasks (in)directly covered by concepts in the lexicon benefit from detector selection

• Retrieval tasks not covered by the lexicon result in humble performance only

• We need a better evaluation setup

1. Short course outline

.0 Problem statement
.1 Measuring features
.2 Concept detection
.3 Lexicon learning
.4 Query prediction
.5 Video browsing


Problem 5: Use is open-ended

[Diagram: open-ended modes of use, such as screen, scope, keywords, …]

• This is the interface gap

So many choices for retrieval…

• Why not let the user decide interactively?
   – Navigate through query methods
   – Visualize video retrieval results
   – Learn from browsing behavior


Video search 1.0

Note the influence of textual metadata, such as the video title, on the search results.

Query selection


‘Classic’ Informedia system

Carnegie Mellon University

• First multimodal video search engine

Físchlár

Dublin City University

• Optimized for use by “real” users


IBM iMARS

IBM Research

• A web-based system

http://mp7.watson.ibm.com/

MediaMagic

FxPal

• Focus on the story level


VisionGo

NUS & ICT-CAS

• Extremely fast and efficient

CrossBrowsing through results

Snoek, TMM 2007

[Diagram: results laid out along a Rank axis and a Time axis; a sphere variant also exists.]


Demo: MediaMill video search engine

http://www.mediamill.nl

The RotorBrowser

de Rooij, TMM 2009


Extreme video retrieval = very demanding!

Carnegie Mellon University

• Observation
   – Correct results are retrieved, but not optimally ranked
   – If the user has time to scan results exhaustively, retrieval is a matter of watching, selecting, and sorting quickly

Learning from the user

• Two common approaches
   1. Relevance feedback
   2. Active learning


Relevance feedback

Slide credit: Marcel Worring

• Try to find the boundary in feature space that best separates positive from negative examples

• In the next iteration the user will have provided more samples, hence a better estimate of the boundary

[Diagram: labeled examples in an (F1, F2) feature space; shading encodes the measure of class-membership probability.]
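A minimal sketch of one feedback round, assuming per-shot feature vectors and an SVM as the boundary learner (any probabilistic classifier would do):

```python
# Sketch: refit the separating boundary on all user-labeled shots so far,
# then rank every shot by class-membership probability.
import numpy as np
from sklearn.svm import SVC

def feedback_round(features, labels):
    # labels: {shot_index: 1 (relevant) or 0 (not relevant)} from the user;
    # needs at least one positive and one negative label to fit.
    idx = np.array(list(labels))
    clf = SVC(probability=True).fit(features[idx],
                                    [labels[i] for i in idx])
    prob = clf.predict_proba(features)[:, 1]   # class-membership measure
    return prob, np.argsort(-prob)             # scores, best-first ranking
```

Each round adds labels, so the boundary estimate improves, as the slide describes.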

Active learning

Slide credit: Marcel Worring

• In active learning the system decides which elements to show for feedback and which not

[Diagram: in the (F1, F2) feature space, the system can safely assume a sample deep in the negative region is also negative; for a sample near the boundary it is relevant to know the label.]
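A sketch of the selection step under uncertainty sampling, one common criterion among several: the system requests labels for the shots whose membership probability is closest to 0.5, while confident negatives are never shown.

```python
# Sketch: uncertainty sampling -- ask the user about shots near the boundary.
import numpy as np

def select_for_feedback(prob, labeled, n=10):
    # prob: per-shot class-membership probabilities (e.g. from feedback_round);
    # labeled: indices the user has already judged.
    unlabeled = np.array([i for i in range(len(prob)) if i not in labeled])
    nearest_boundary = np.argsort(np.abs(prob[unlabeled] - 0.5))
    return unlabeled[nearest_boundary][:n]
```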


Demo: ForkBrowser

de Rooij, CIVR 2008

• Learn positive and negative items from user browse behavior

Demo: Timeline navigation

http://hollandsglorieoppinkpop.nl/


The future of video retrieval?

Jonathan Wang, Carnegie Mellon University

Interface gap best addressed by

• TRECVID interactive search task
   – Interactively solve 20+ search topics (10/15 minutes)
   – Return 1,000 ranked shot-based results per topic
   – Evaluate using Average Precision

• VideOlympics showcase


Video browsing at TRECVID

• Wide performance variation
   – # concept detectors
   – Search interface
   – Expert vs novice user

• Most pronounced
   – 2003: Informedia classic
   – 2005: Large semantic lexicon
   – 2008: Active learning

[Plot: Mean Average Precision per TRECVID edition.]

UvA-MediaMill browsers @TRECVID

Snoek et al. TRECVID 04-09

[Plot: CrossBrowser and ForkBrowser results compared against 228 other interactive systems and traditional systems.]


Criticism

• Retrieval performance cannot be the only evaluation criterion
   – Quality of detectors counts
   – Experience of searcher counts
   – Visualization of interface counts
   – Ease of use counts
   – …

Video browsing at VideOlympics

• Promote multiple facets of video search
   – Real-time interactive video search ‘competition’
   – Simultaneous exposure of multiple video search engines
   – Highlight possibilities and limitations of the state-of-the-art


Participants

Video trailer

http://www.VideOlympics.org


Conclusion on video browsing

• Interaction by browsing is indispensable for any practical video search engine

• The system should support the user by active learning and intuitive (mobile) visualizations

Conclusion on: Interactive Video Retrieval

[Diagram: the full pipeline: measuring video features → concept detection → lexicon learning → query prediction → video browsing.]


And there is always more …

• Recommended special issues
   – IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), November 2008
   – Proceedings of the IEEE, 96(4), April 2008
   – IEEE Transactions on Multimedia, 9(5), August 2007

• 300 references on video search
   – Snoek and Worring, Concept-Based Video Retrieval, Foundations and Trends in Information Retrieval, Vol. 2, No. 4, pp. 215-322, 2009.

General references IGeneral references IColor Invariance. Jan-Mark Geusebroek, R. van den Boomgaard, Arnold W. M. Smeulders, H. Geerts. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 23 (12) page 1338 1350 2001Volume 23 (12), page 1338-1350, 2001.

Distinctive Image Features from Scale-Invariant Keypoints. D. G. Lowe. Int'l Journal of Computer Vision, vol. 60, pp. 91-110, 2004.

Large-Scale Concept Ontology for Multimedia. M. R. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. S. Kennedy, A. G. Hauptmann, and J. Curtis,. IEEE MultiMedia, vol. 13, pp. 86-91, 2006.

Efficient Visual Search for Objects in Videos. J. Sivic and A. Zisserman. Proceedings of the IEEE, vol. 96, pp. 548-566, 2008.

High Level Feature Detection from Video in TRECVid: A 5-year Retrospective of Achievements. A. F. Smeaton, P. Over, and W. Kraaij, In Multimedia Content Analysis, Theory and Applications, (A. Divakaran, ed.), Springer, 2008.

Visually Searching the Web for Content. J. R. Smith and S.-F. Chang. IEEE MultiMedia, vol. 4, pp. 12-20, 1997.

Content Based Image Retrieval at the End of the Early Years. Arnold W. M. Smeulders, Marcel Worring, S. Santini, A. Gupta, R. Jain. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 22 (12), page 1349-1380, 2000.


General references IIGeneral references IIThe Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. Cees G. M. Snoek, Marcel Worring, Jan C. van Gemert, Jan-Mark Geusebroek Arnold W M Smeulders ACM Multimedia page 421 430 2006Geusebroek, Arnold W. M. Smeulders. ACM Multimedia, page 421-430, 2006.

The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing. Cees G. M. Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C. Koelma, Frank J. Seinstra, Arnold W. M. Smeulders. IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 28 (10), page 1678-1689, 2006.

A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval. Cees G. M. Snoek, Marcel Worring, Dennis C. Koelma, Arnold W. M. Smeulders. IEEE Trans. Multimedia, Volume 9 (2), page 280-292, 2007.

Adding Semantics to Detectors for Video Retrieval. Cees G. M. Snoek, Bouke Huurnink, Laura Hollink, Maarten de Rijke, Guus Schreiber, Marcel Worring. IEEE Trans. Multimedia, Volume 9 (5), page 975-986, 2007.

The MediaMill TRECVID 2004-2009 Semantic Video Search Engine. Cees G. M. Snoek et al. Proceedings of the TRECVID Workshop, 2004-2009.

Visual-Concept Search Solved?. Cees G.M. Snoek and Arnold W.M. Smeulders. IEEE Computer, Volume 43 (6) (in press), 2010.

General references IIIGeneral references IIIConcept-Based Video Retrieval. Cees G. M. Snoek, Marcel Worring. Foundations and Trends in Information Retrieval, Vol. 4 (2), page 215-322, 2009.

http://www.science.uva.nl/research/publications/

Local Invariant Feature Detectors: A Survey. T. Tuytelaars and K. Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, vol. 3, pp. 177-280, 2008.

Evaluating Color Descriptors for Object and Scene Recognition. Koen E. A. van de Sande, Theo Gevers, Cees G. M. Snoek. IEEE Trans. Pattern Analysis and Machine Intelligence (in press), 2010.

Visual Word Ambiguity. Jan C. van Gemert, Cor J. Veenman, Arnold W. M. Smeulders, Jan-Mark Geusebroek. IEEE Trans. Pattern Analysis and Machine Intelligence (in press), 2009.

Real-Time Bag of Words, Approximately. Jasper R. R. Uijlings, Arnold W. M. Smeulders, R. J. H. Scha. ACM Int'l Conference on Image and Video Retrieval, 2009.

Lessons Learned from Building a Terabyte Digital Video Library. H. D. Wactlar, M. G. Christel, Y. Gong, and A. G. Hauptmann. IEEE Computer, vol. 32, pp. 66-73, 1999.

Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Int'l Journal of Computer Vision, vol. 73, pp. 213-238, 2007.


Contact info

• Cees Snoek
  http://staff.science.uva.nl/~cgmsnoek

• Arnold Smeulders
  http://staff.science.uva.nl/~smeulder

Further information

www.MediaMill.nl