CVPR 2011 Best Student Paper: Recognition Using Visual Phrases

Uploaded by jocelyn-fowler, 18-Jan-2016


Outline
- Introduction
- Related Work
- Approach
  - Phrasal Recognition
  - Decoding Multiple Detections
- Results
- Discussion

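Introduction

Visual phrases (e.g. "person riding horse")
- Traditional approach: detect the individual objects (person, dog, horse), then reason about the relations between them, and clean up the detections with NMS (non-maximum suppression), as in PASCAL and other benchmarks.
- Disadvantage: the appearance of the participating objects can change profoundly when they take part in a relation, so detecting each object on its own is hard. A visual phrase often has a much simpler appearance than its component objects.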

Contributions
- Introducing visual phrases as categories for recognition
- Introducing a novel dataset for phrasal recognition
- A comparison with state-of-the-art methods for modeling interactions
- A decoding algorithm
- Performance results in multi-class object recognition

Related Work

Object recognition
- Deformable templates [IEEE 2001, CVPR 1998]
- Part-based models [CVPR 2005, CVPR 2003]
- Detectors: deformable part-based models [IEEE 2010]

Object interactions
- Focus on relations [ECCV 2008]
- Person with object [CVPR 2010]
- Objects [ECCV 2010]
- Relations between objects (left, right, top, down), with label weights and confidences [ICCV 2010]

Scene understanding
- Representing scenes with global features that capture general information about the image [Vision 2001, CVPR 2006]
- Clustering [ECCV 2008]

Machine translation
- Statistical translation methods [Press 2010]: a translation model, a language model, and a decoding algorithm

- Output: a query sentence; many-to-many translation is allowed

Phrasal Recognition

Dataset
- 8 object classes selected from PASCAL VOC 2008: person, bike, car, dog, horse, bottle, sofa, chair
- A list of 17 visual phrases plus a background class, e.g. dog jumping, horse jumping, person riding horse

Dataset statistics
- 2,769 images (822 of them negative images)
- 120 examples per class on average
- 5,067 bounding boxes (1,796 phrases, 3,271 objects)
- As visual phrases become more complex, the number of training examples decreases

Appearance models
- Deformable part models are trained for the 17 phrases in the dataset, using the provided bounding boxes
- Models for the 8 PASCAL categories are used for the objects

NMS decoding
- NMS assumes near-perfect detectors with excellent, tightly tuned models
- For interactions, a decoding strategy that looks at nearby detections is more natural and performs better than NMS: it greedily searches the space of labels using well-designed features of nearby detections
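For reference, the NMS baseline can be sketched as greedy per-class suppression: sort one category's boxes by confidence, keep the best, and drop any remaining box whose intersection-over-union (IoU) with an already kept box exceeds a threshold. This is a generic sketch, not the code used in the paper; the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], threshold: float = 0.5) -> List[int]:
    """Greedy NMS for ONE category: returns indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # keep box i only if it does not overlap any stronger kept box too much
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

Because suppression looks only at overlap within a single category, a confident "person riding horse" box does nothing to support or suppress the "person" and "horse" boxes inside it; that missing interaction is what the learned decoding is meant to capture.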

Decoding Multiple Detections

All detector responses -> decoding -> final outcome

Decoding process (we compare our decoding algorithm with that of [2] on our phrase dataset)
- Step 1: Construct the features for every bounding box.
- Step 2: Run the learning algorithm to obtain a set of weights that rescore the confidences of the bounding boxes based on their interactions.
- Step 3: Rescore again, repeating until the labeling is optimal.

Baseline decoding: Discriminative models for multi-class object layout [2]

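Notation
- $x_i$: a bounding box in an image.
- An image is represented as a collection of overlapping bounding boxes, $X = \{x_i : i = 1, \dots, M\}$, where $M$ is the total number of bounding boxes and $K$ is the number of categories.
- Each box receives a label $y_i \in \{-1, 1\}$, and $Y = \{y_i : i = 1, \dots, M\}$.
- $S(X, Y) = \sum_{i=1}^{M} w_{c_i}^{\top} \phi(x_i)\, y_i$ is the score of image $X$ with labeling $Y$, where $\phi(x_i)$ is the feature of box $x_i$ and $w_{c_i}$ is the set of weights that corresponds to the class $c_i$ of the bounding box.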
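A small sketch of why a per-box score makes decoding cheap. It assumes labels $y_i \in \{-1, +1\}$ and a score that is a sum of independent per-box terms, $S(X, Y) = \sum_i w_{c_i}^{\top} \phi(x_i)\, y_i$; the weights, class names and feature values below are made up for illustration. Under that assumption the best labeling keeps box $i$ exactly when $w_{c_i}^{\top} \phi(x_i) > 0$, and the code checks this against brute-force enumeration of all $2^M$ labelings.

```python
from itertools import product
from typing import Dict, List, Sequence


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def score(weights: Dict[str, List[float]], classes: List[str],
          feats: List[List[float]], labels: Sequence[int]) -> float:
    """S(X, Y) = sum_i w_{c_i}^T phi(x_i) * y_i, with every y_i in {-1, +1}."""
    return sum(dot(weights[c], f) * y for c, f, y in zip(classes, feats, labels))


def decode(weights: Dict[str, List[float]], classes: List[str],
           feats: List[List[float]]) -> List[int]:
    """Per-box maximization: keep box i (y_i = +1) iff its rescored confidence is positive."""
    return [1 if dot(weights[c], f) > 0 else -1 for c, f in zip(classes, feats)]


def brute_force(weights: Dict[str, List[float]], classes: List[str],
                feats: List[List[float]]) -> List[int]:
    """Exhaustive argmax over all 2^M labelings; only for checking decode() on tiny inputs."""
    best = max(product((-1, 1), repeat=len(feats)),
               key=lambda ys: score(weights, classes, feats, ys))
    return list(best)


# Hypothetical per-class weights and per-box features (2-D for readability).
weights = {"horse": [1.0, -0.5], "person riding horse": [0.8, 0.3]}
classes = ["horse", "person riding horse", "horse"]
feats = [[0.9, 0.4], [0.6, 0.5], [0.2, 0.9]]
print(decode(weights, classes, feats))  # [1, 1, -1]
```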

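Representation
- Image = a set of bounding boxes
- Cues for each box: confidence, overlap, size ratio, and relation to other boxes (above, below, overlapping)
- Features are indexed by window, category, and spatial bins; the representation has $K \times 3 \times 3 + 1$ dimensions

Inference
- Bounding boxes are assumed to be independent given their features, so the best labeling can be found box by box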
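One hypothetical reading of the $K \times 3 \times 3 + 1$ layout, sketched below: one slot holds the box's own confidence, and for every (category, relation, size-ratio bin) cell the feature stores the maximum confidence of a neighboring detection that falls in that cell. The three relations come from the slide; the "above/below by vertical center" test, the size-ratio cut-offs 0.5 and 2, and the use of 0.0 for empty cells are assumptions, not the paper's exact definitions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); y grows downward
RELATIONS = ("above", "below", "overlapping")
N_SIZE_BINS = 3  # hypothetical bins for area(neighbor) / area(box): <0.5, 0.5..2, >2


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def relation(box: Box, other: Box) -> int:
    """Index into RELATIONS describing where `other` lies relative to `box`."""
    intersects = (min(box[2], other[2]) > max(box[0], other[0])
                  and min(box[3], other[3]) > max(box[1], other[1]))
    if intersects:
        return 2
    # assumption: compare vertical centers to decide above vs. below
    return 0 if (other[1] + other[3]) / 2 < (box[1] + box[3]) / 2 else 1


def size_bin(box: Box, other: Box) -> int:
    ratio = area(other) / area(box)  # assumes non-degenerate boxes
    return 0 if ratio < 0.5 else (1 if ratio <= 2.0 else 2)


def features(i: int, boxes: List[Box], cats: List[int],
             conf: List[float], K: int) -> List[float]:
    """phi(x_i) with K*3*3+1 entries: own confidence, then the max confidence of
    neighbors per (category, relation, size bin). Empty cells stay at 0.0, so this
    sketch implicitly assumes non-negative detector confidences."""
    phi = [0.0] * (K * len(RELATIONS) * N_SIZE_BINS + 1)
    phi[0] = conf[i]
    for j, other in enumerate(boxes):
        if j == i:
            continue
        cell = (cats[j] * len(RELATIONS) + relation(boxes[i], other)) * N_SIZE_BINS
        idx = 1 + cell + size_bin(boxes[i], other)
        phi[idx] = max(phi[idx], conf[j])
    return phi


# Category 0 = person, 1 = horse (hypothetical); the horse box overlaps the person box.
boxes = [(10.0, 0.0, 20.0, 15.0), (5.0, 10.0, 30.0, 30.0)]
phi = features(0, boxes, cats=[0, 1], conf=[0.8, 0.9], K=2)
print(len(phi), phi[0], phi[18])  # 19 0.8 0.9
```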

Learning
- A form of max-margin structured learning

Our inner maximization is exact and very fast. We solve this optimization problem with subgradient descent.
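A minimal sketch of max-margin structured learning by subgradient descent, assuming labels $y_i \in \{-1, +1\}$, the per-box score $S(X, Y) = \sum_i w_{c_i}^{\top} \phi(x_i)\, y_i$, and a Hamming loss $\Delta$ that counts wrongly labeled boxes. It minimizes $\frac{\lambda}{2}\|W\|^2 + \sum_n \max_Y [\Delta(Y_n, Y) + S(X_n, Y) - S(X_n, Y_n)]$. Because both the score and the assumed loss decompose over boxes, the inner (loss-augmented) maximization is solved exactly, one box at a time. The Hamming loss, `lam`, `lr`, the $1/\sqrt{t}$ step size and `epochs` are all hypothetical choices; this is a standard structured-SVM recipe, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# One training box: (class index c_i, feature phi(x_i), ground-truth label y_i in {-1, +1})
Example = Tuple[int, List[float], int]


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def loss_augmented_label(w: List[float], phi: List[float], y_true: int) -> int:
    """argmax over y in {-1, +1} of [y != y_true] + y * w^T phi.
    Exact and cheap because the score and the assumed Hamming loss decompose over boxes."""
    s = dot(w, phi)
    return max((-1, 1), key=lambda y: (1.0 if y != y_true else 0.0) + y * s)


def train(images: List[List[Example]], K: int, dim: int,
          lam: float = 0.01, lr: float = 0.1, epochs: int = 50) -> List[List[float]]:
    """Subgradient descent on lam/2 * ||W||^2 + sum over images of the structured hinge loss."""
    W = [[0.0] * dim for _ in range(K)]
    for t in range(1, epochs + 1):
        grad = [[lam * v for v in w] for w in W]  # gradient of the regularizer
        for boxes in images:
            for c, phi, y in boxes:
                y_hat = loss_augmented_label(W[c], phi, y)
                for d in range(dim):  # contributes nothing when y_hat == y
                    grad[c][d] += (y_hat - y) * phi[d]
        step = lr / math.sqrt(t)  # decaying step size (an assumption)
        for c in range(K):
            for d in range(dim):
                W[c][d] -= step * grad[c][d]
    return W


# Toy data: one class, features [confidence, bias]; positives have positive confidence.
images = [[(0, [2.0, 1.0], 1), (0, [-2.0, 1.0], -1)],
          [(0, [1.5, 1.0], 1), (0, [-1.0, 1.0], -1)]]
W = train(images, K=1, dim=2)
preds = [1 if dot(W[c], phi) > 0 else -1 for boxes in images for c, phi, _ in boxes]
print(preds)  # [1, -1, 1, -1]
```

Only boxes whose loss-augmented label disagrees with the ground truth add to the subgradient, so each epoch costs one pass over the boxes.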

Results

Single-category detection
- Deformable part models are trained for the 17 visual phrases; for objects, the trained models from … are used
- PASCAL data: 50 positive and 150 negative examples
- The detectors were trained with at most 50 positive examples
- Precision-recall (PR) curves are reported

Decoding

                     Paper's decoding*   [2]     NMS
Overall AP           0.319               0.313   0.308
Mean per-class AP    0.495               0.493   0.491

[2] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In ICCV, 2009.

Discussion

Summary
- Introduced visual phrases and a phrasal recognition dataset
- A decoding algorithm
- The dimensionality of our features grows with the number of categories

Future work: modeling the relations between
- attributes and objects
- parts and objects
- visual phrases and scenes
- objects and visual phrases, which mirror one another

Experience
- Low complexity
- Uses less data for detection
- The features grow with the number of categories (exponential, 2^n), but we don't need to consider all of the categories when we model the interactions
- Building long enough phrase tables is still a challenge