Semantic Contours from Inverse Detectors
Bharath Hariharan et al. (ICCV 2011)
Problem
• Localizing and classifying category-specific object contours in real-world images
Class-specific contours
Low-level contours (not class-specific)
Naive Solution
• Using the detector outputs directly also picks up contours from the surrounding context
• To avoid this problem, they propose the inverse detector
• Feature vector for pixel (i, j)
The Inverse Detector
• Given the localized contours of an image I and an object detector, the inverse detector produces the object contour image
• I – image
• G – output of the contour detector
• G_ij – score for the likelihood that pixel (i, j) lies on a contour
• R_1, ..., R_l – the l activation windows of the detector
• s_k – score corresponding to activation window R_k
Inverse detector
Feature Vector
• Each detector window is divided into S spatial bins
• Contours are binned into O orientation bins
• For an activation window R_k, each pixel (i, j) is assigned to one of the SO bins
• Feature vector at location (i, j) for activation window R_k:
• index of the bin into which pixel (i, j) falls
• e_n: an SO-dimensional vector with 1 in the nth position and 0 elsewhere
• Feature vector for pixel (i, j): weighted sum of the per-window feature vectors across all activation windows
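The binning and the weighted sum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the weights are assumed to be the detector scores s_k, the S spatial bins are assumed to form a regular grid over the window, the orientation bin of the pixel is assumed to be given, and windows that do not contain the pixel are assumed to contribute nothing. The names `bin_index` and `pixel_feature` and the window layout are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical window layout: (row0, col0, row1, col1), end-exclusive.
Window = Tuple[int, int, int, int]


def bin_index(i: int, j: int, orient_bin: int, window: Window,
              grid: Tuple[int, int], num_orient: int) -> Optional[int]:
    """Index n in [0, S*O) of the bin for pixel (i, j), or None if outside."""
    r0, c0, r1, c1 = window
    if not (r0 <= i < r1 and c0 <= j < c1):
        return None
    rows, cols = grid  # assumption: S = rows * cols cells on a regular grid
    cell_r = (i - r0) * rows // (r1 - r0)
    cell_c = (j - c0) * cols // (c1 - c0)
    spatial = cell_r * cols + cell_c
    return spatial * num_orient + orient_bin


def pixel_feature(i: int, j: int, orient_bin: int, windows: List[Window],
                  scores: List[float], grid: Tuple[int, int],
                  num_orient: int) -> List[float]:
    """Sum of s_k * e_n over all activation windows (assumed weighting)."""
    rows, cols = grid
    feat = [0.0] * (rows * cols * num_orient)  # SO-dimensional vector
    for window, s_k in zip(windows, scores):
        n = bin_index(i, j, orient_bin, window, grid, num_orient)
        if n is not None:
            feat[n] += s_k  # adding s_k * e_n touches only position n
    return feat
```

With two overlapping windows, a pixel inside both gets two non-zero entries, one per window, each equal to that window's score.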
Inverse detectors
• An inverse detector has the following form:
• where the weight vector is learned using a linear SVM on these features
• Complete system: uses inverse detectors to localize semantic contours
• Poselet-type object detectors [1]
• Bottom-up contour detector [2]
[1] Detecting people using mutually consistent poselet activations. L. Bourdev et al., ECCV 2010
[2] Contour detection and hierarchical image segmentation. P. Arbelaez et al., PAMI 2011
Localizing semantic contours using inverse detectors
• The system has two stages:
• Train an inverse detector for each poselet type
• Combine the outputs of these inverse detectors to produce category-specific contours
• Let category C have P corresponding poselets
• Stage 1: train an inverse detector (of the form discussed previously) for each poselet
• Stage 2: combine the outputs of these inverse detectors
• Features: concatenate the outputs of the inverse detectors corresponding to each poselet type
• Train a linear SVM to classify whether each pixel belongs to the object contour
Combining information across categories
• Previous model: considers each category independently
• This model: combines information from across categories
• Two methods are proposed
Method 1
• First level: train a contour detector for each category separately
• Second level: train on the outputs of these contour detectors
• Feature vector at the second level:
Method 2
• Only one level: train on features that are the outputs of the inverse detectors corresponding to the poselets of all categories
• Feature vector at this level:
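To make the difference between the two feature vectors concrete, here is a minimal Python sketch for a single pixel. It assumes, as in the two-stage system, that each first-level contour detector is a linear function of its own category's inverse-detector outputs. Training is omitted, and the function names and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def linear_score(weights: List[float], features: List[float],
                 bias: float = 0.0) -> float:
    """Linear classifier score w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def method1_features(inv_outputs: Dict[str, List[float]],
                     first_level: Dict[str, Tuple[List[float], float]]
                     ) -> List[float]:
    """Method 1: one first-level detector score per category."""
    return [linear_score(first_level[c][0], inv_outputs[c], first_level[c][1])
            for c in sorted(inv_outputs)]


def method2_features(inv_outputs: Dict[str, List[float]]) -> List[float]:
    """Method 2: inverse-detector outputs of all categories, concatenated."""
    feat: List[float] = []
    for c in sorted(inv_outputs):
        feat.extend(inv_outputs[c])
    return feat
```

In this sketch the Method 1 vector has one entry per category, while the Method 2 vector has one entry per poselet summed over all categories, so the second classifier sees the raw inverse-detector outputs rather than per-category summaries.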
Semantic Boundaries Dataset (SBD)
• 8498 training images and 2820 test images (with both instance-specific and class-specific annotations)
Benchmark
• Show the precision-recall curve for a detector producing soft output, parameterized by the detection score
• Report two summary statistics:
• Average precision (AP)
• Maximal F-measure (MF), where F = 2PR/(P+R)
• Precision: fraction of true contours among detections • Recall: fraction of ground-truth contours detected
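As a rough illustration of how the maximal F-measure follows from these definitions, the Python sketch below sweeps the detection threshold over per-pixel scores. It is a simplification: each pixel is compared with its own ground-truth label, whereas contour benchmarks typically match detections to ground truth with some spatial tolerance. The function names are hypothetical.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when pixels with score >= threshold are detections."""
    detected = [s >= threshold for s in scores]
    tp = sum(1 for d, y in zip(detected, labels) if d and y)
    n_det = sum(detected)
    n_pos = sum(labels)
    precision = tp / n_det if n_det else 0.0
    recall = tp / n_pos if n_pos else 0.0
    return precision, recall


def f_measure(p: float, r: float) -> float:
    """F = 2PR / (P + R), defined as 0 when both P and R are 0."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def maximal_f_measure(scores: List[float], labels: List[int]) -> float:
    """Largest F over all thresholds taken from the observed scores."""
    return max(f_measure(*precision_recall(scores, labels, t))
               for t in sorted(set(scores)))
```

Each threshold gives one point on the precision-recall curve, and MF is the best F value along that curve.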
precision and recall are practically zero
Experiments
• 8498 training images and 2820 test images
• Baseline: the low-level contours produced by the bottom-up contour detector [1]
• Both MF and AP improve by a factor of 5 relative to the bottom-up contour detector
• The single-stage contour detector, which combines the outputs of all inverse detectors across all categories, does better than the two-stage detector
• Best performance: means of transportation (aeroplane, bicycle, bus, car, motorbike, train), people, bottles, TV monitors
• Worst performance: chairs, dining tables, potted plants, boats and birds (hard to detect)
[1] Contour detection and hierarchical image segmentation. P. Arbelaez et al., PAMI 2011
Thank you