TRANSCRIPT
Instructor: Mircea Nicolescu
Lecture 17
CS 485 / 685
Computer Vision
Object Recognition Using SIFT Features
1. Match individual SIFT features from an image to a database of SIFT features from known objects (i.e., find nearest neighbors)
2. Find clusters of SIFT features belonging to a single object (hypothesis generation)
3. Estimate object pose (i.e., recover the transformation that the model has undergone) using at least three matches
4. Verify that additional features agree on the object pose
Nearest Neighbor Search
• Linear search: too slow for large databases
• kD-trees: become slow when k > 10
• Approximate nearest neighbor search:
− Best-bin-first [Beis et al. 97] (a modification of the kD-tree algorithm)
− Examine only the N closest bins of the kD-tree
− Use a heap to identify bins in order of their distance from the query
• Can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time.
Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", International Conference on Computer Vision Theory and Applications, 2009.
FLANN - Fast Library for Approximate Nearest Neighbors
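For small databases, the nearest-neighbor step can be done by brute force. Below is a minimal NumPy sketch using Lowe's distance-ratio test; the function name and the 0.8 threshold are illustrative choices, and FLANN would replace the exhaustive search for large databases.

```python
import numpy as np

def match_descriptors(query, database, ratio=0.8):
    """Brute-force nearest-neighbor matching with a distance-ratio test.

    query:    (m, 128) array of query SIFT descriptors
    database: (n, 128) array of model SIFT descriptors
    Returns a list of (query_index, database_index) for accepted matches.
    """
    matches = []
    for i, q in enumerate(query):
        # Euclidean distance from q to every database descriptor
        d = np.linalg.norm(database - q, axis=1)
        nn, second = np.argsort(d)[:2]
        # Accept only if the best match is clearly better than the runner-up
        if d[nn] < ratio * d[second]:
            matches.append((i, nn))
    return matches

# Toy example: 3 database descriptors, 2 query descriptors
rng = np.random.default_rng(0)
db = rng.normal(size=(3, 128))
queries = np.vstack([db[1] + 0.01 * rng.normal(size=128),  # near db[1]
                     rng.normal(size=128)])                # ambiguous
found = match_descriptors(queries, db)
```

The ratio test discards ambiguous matches whose best and second-best distances are similar, which is how most false matches are filtered before clustering.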
Estimate Object Pose
• Now, given feature matches…
− Find clusters of features corresponding to a single object
− Solve for the transformation (e.g., an affine transformation)
• Need to consider clusters of size ≥ 3
• How do we find three “good” (true) matches?
• Pose clustering
− Each feature is associated with four parameters: 2D location (x, y), scale, and orientation.
− For every model-scene match (mi, sj), estimate the similarity transformation (tx, ty, s, θ) between mi and sj, and cast a vote for it.
− The transformation space is 4D: (tx, ty, s, θ)
− Each match votes for its estimated transformation: (tx, ty, s, θ), (t′x, t′y, s′, θ′), …
− Partial voting: vote for neighboring bins as well, and use a large bin size to better tolerate errors
− Transformations that accumulate at least three votes are selected (hypothesis generation)
− Using model-scene matches, compute object pose (i.e., affine transformation) and apply verification
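The voting scheme above can be sketched as a coarse Hough accumulation over (tx, ty, s, θ). This is a toy illustration assuming each match has already yielded a similarity-transform estimate; the bin sizes are arbitrary choices here, and partial voting for neighboring bins is omitted for brevity.

```python
import numpy as np
from collections import defaultdict

def pose_vote(transforms, bin_t=20.0, bin_s=0.5, bin_theta=np.pi / 6):
    """Accumulate votes in a coarse 4-D (tx, ty, s, theta) Hough space.

    transforms: list of (tx, ty, s, theta), one per model-scene match.
    Returns the bins with at least 3 votes (the generated hypotheses),
    mapped to the indices of the matches that voted for them.
    """
    votes = defaultdict(list)
    for idx, (tx, ty, s, theta) in enumerate(transforms):
        key = (int(tx // bin_t), int(ty // bin_t),
               int(np.log2(s) // bin_s),       # scale binned logarithmically
               int(theta // bin_theta))
        votes[key].append(idx)
    return {k: v for k, v in votes.items() if len(v) >= 3}

# Three consistent matches plus one outlier
transforms = [(101, 52, 2.0, 0.10),
              (103, 55, 2.1, 0.12),
              (105, 58, 2.2, 0.09),
              (-40, 200, 0.5, 2.0)]   # outlier votes in a different bin
hypotheses = pose_vote(transforms)
```

The three consistent matches fall in one bin and form a hypothesis; the outlier's bin gets a single vote and is discarded.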
Verification
• Back-project model on the scene and look for additional matches.
• Discard outliers (incorrect matches) by imposing stricter matching constraints (e.g., half error).
• Find additional matches by refining the computed transformation (i.e., iterative affine refinement).
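Solving for the affine transformation from matched point pairs is a linear least-squares problem. A minimal sketch, assuming n ≥ 3 non-collinear model-scene correspondences:

```python
import numpy as np

def fit_affine(model_pts, scene_pts):
    """Least-squares affine fit: scene ≈ A @ model + t.

    model_pts, scene_pts: (n, 2) arrays with n >= 3 correspondences.
    Returns the 2x2 linear part A and the translation t.
    """
    n = len(model_pts)
    # Each correspondence contributes two equations:
    #   sx = a11*x + a12*y + tx,   sy = a21*x + a22*y + ty
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = model_pts   # x-equations
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = model_pts   # y-equations
    M[1::2, 5] = 1.0
    b = scene_pts.reshape(-1)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    A = p[:4].reshape(2, 2)
    t = p[4:6]
    return A, t

# Recover a known transform from noiseless correspondences
A_true = np.array([[1.2, -0.3], [0.4, 0.9]])
t_true = np.array([5.0, -2.0])
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scene = model @ A_true.T + t_true
A_est, t_est = fit_affine(model, scene)
```

With more than three matches the system is overdetermined, so the least-squares fit averages out localization noise; iterating fit and outlier rejection gives the affine refinement mentioned above.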
• Evaluate the probability that each match is correct.
− Use a Bayesian (probabilistic) model to estimate the probability that a model is present, based on the actual number of matching features.
− The Bayesian model takes into account:
− Object size in the image
− Textured regions
− Model feature count in the database
− Accuracy of fit
Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682–688.
Planar Recognition
• Training images (models)
• Reliably recognized at a rotation of 60° away from the camera.
• Affine fit approximates perspective projection.
• Only 3 points are needed for recognition.
3D Object Recognition
• Training images
• Only 3 keypoints are needed for recognition; extra keypoints provide robustness.
• Affine model is no longer as accurate.
Recognition Under Occlusion
Illumination Invariance
Object Categorization
Bag-of-Features (BoF) Models
Good for object categorization
Origin 1: Texture Recognition
• Texture is characterized by the repetition of basic elements or textons.
• Often, it is the identity of the textons, not their spatial arrangement, that matters.
[Figure: each texture image is represented by a histogram over a universal texton dictionary]
Origin 2: Document Retrieval
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
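The orderless representation is just a word-frequency vector over a fixed dictionary. A tiny sketch using the standard library (the dictionary and document are made up for illustration):

```python
from collections import Counter

dictionary = ["sensor", "image", "retinal", "cortex", "eye", "cell"]

def bag_of_words(text, dictionary):
    """Represent a document as frequencies of dictionary words only;
    word order is discarded entirely."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in dictionary]

doc = "The eye sends the retinal image to the cortex cell by cell"
vec = bag_of_words(doc, dictionary)
```

Replacing the word dictionary with a "visual" vocabulary of quantized local features gives exactly the bag-of-features representation described next.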
BoF for Object Categorization
G. Csurka et al., "Visual Categorization with Bags of Keypoints", European Conference on Computer Vision, Czech Republic, 2004.
Need a “visual” dictionary!
BoF: Main Steps
Characterize objects in terms of parts or local features
Step 1: Feature extraction (e.g., SIFT features)
Step 2: Learn “visual” vocabulary
[Figure: features extracted from training images are clustered; the “visual” vocabulary is the set of cluster centers]
Example: K-Means Clustering
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
− Assign each data point to the nearest center.
− Re-compute each cluster center as the mean of all points assigned to it.
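The algorithm above can be written directly in NumPy. A minimal sketch assuming Euclidean distance and a fixed iteration cap (not a production implementation; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means on an (n, d) data matrix. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Randomly pick K data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

# Two well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(20, 2)),
               rng.normal(5, 0.1, size=(20, 2))])
centers, labels = kmeans(X, k=2)
```

For a visual vocabulary, X would hold SIFT descriptors and k would be the vocabulary size, typically in the hundreds or thousands.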
Step 3: Quantize features using “visual” vocabulary
(i.e., represent each feature by the closest cluster center)
Step 4: Represent images by frequencies of “visual words” (i.e., bags of features)
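Steps 3 and 4 together turn an image's feature set into a fixed-length histogram. A minimal sketch, assuming the vocabulary (cluster centers) has already been learned:

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and
    return the normalized word-frequency histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)                        # step 3: quantization
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                        # step 4: frequencies

vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # 3 toy visual words
feats = np.array([[0.1, 0.2], [9.8, 0.1], [9.9, -0.2], [0.2, 9.7]])
h = bag_of_features(feats, vocab)
```

Normalizing by the total count makes histograms of images with different numbers of detected features comparable.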
BoF Object Categorization
• How do we use BoF for object categorization?
• Nearest Neighbor (NN) Classifier
• K-Nearest Neighbor (KNN) Classifier
− Find the k closest points from the training data.
− Labels of the k points “vote” to classify.
− Works well provided there is lots of data and the distance function is good.
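A minimal KNN classifier over BoF histograms, assuming plain Euclidean distance (a histogram-specific distance such as χ² often works better):

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_hists, train_labels, k=3):
    """Label a query histogram by majority vote of its k nearest neighbors."""
    d = np.linalg.norm(train_hists - query, axis=1)
    nearest = np.argsort(d)[:k]                      # indices of k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: class "a" peaks on word 0, class "b" on word 1
train = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0]])
labels = ["a", "a", "a", "b", "b", "b"]
pred = knn_classify(np.array([0.75, 0.15, 0.1]), train, labels, k=3)
```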
• Functions for comparing histograms
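Two common choices are histogram intersection (a similarity, equal to 1 for identical L1-normalized histograms) and the χ² distance (0 for identical histograms). A sketch:

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

def chi2(h1, h2, eps=1e-10):
    """Chi-squared distance; eps guards against empty bins."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.4, 0.4, 0.2])
```

The χ² distance down-weights differences in heavily populated bins, which tends to suit visual-word histograms better than plain Euclidean distance.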
• SVM classifier
Example
Dictionary quality and size are very important parameters!
Appearance-Based Recognition
• Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions).
• Identifying an object implies finding the closest stored image.
• In practice, a subset of all possible appearances is used.
• Images are highly correlated, so “compress” them into a low-dimensional space that captures key appearance characteristics (e.g., use Principal Component Analysis (PCA)).
M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
H. Murase and S. Nayar, Visual Learning and Recognition of 3D Objects from Appearance, International Journal of Computer Vision, vol 14, pp. 5-24, 1995.
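The PCA compression described above can be sketched with an SVD of the mean-centered image matrix, in the spirit of eigenfaces; the toy data and the number of components here are illustrative choices.

```python
import numpy as np

def pca_basis(images, n_components):
    """images: (n, d) matrix, one flattened image per row.
    Returns the mean image and the top principal directions (n_components, d)."""
    mean = images.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(image, mean, basis):
    """Low-dimensional appearance coordinates of one image."""
    return basis @ (image - mean)

# Toy "images" (4 pixels) that vary mostly along one direction
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 0.0, -1.0])
imgs = np.outer(rng.normal(size=50), direction) + 0.01 * rng.normal(size=(50, 4))
mean, basis = pca_basis(imgs, n_components=1)
coords = project(imgs[0], mean, basis)
```

Recognition then reduces to finding the stored image whose projected coordinates are closest to those of the input.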
Image Segmentation
• Goals and difficulties
− The goal of segmentation is to partition an image into regions (e.g., separate objects from the background)
− The results of segmentation are very important in determining the eventual success or failure of image analysis
− Segmentation is a very difficult problem in general!
• Increasing accuracy and robustness
− Introduce enough knowledge about the application domain
− Assume control over the environment (e.g., in industrial applications)
− Select the type of sensors to enhance the objects of interest (e.g., use infrared imaging for target recognition applications)
• Segmentation approaches
− Edge-based approaches:
− Use the boundaries of regions to segment the image
− Detect abrupt changes in intensity (discontinuities)
− Region-based approaches:
− Use similarity among pixels to find different regions
− Theoretically, both approaches should give identical results, but this is not true in practice
Region Detection
• A region is a group of connected pixels with similar properties.
• Region-based approaches use similarity and spatial proximity among pixels to find different regions.
• The goal is to divide the image into regions, so that:
− each region is homogeneous in some sense
− adjacent regions are not homogeneous if taken together, in the same sense.
• Properties for region-based segmentation
− Partition an image R into sub-regions R1, R2, ..., Rn
− Assume P(Ri) is a logical predicate – a property that the pixel values of region Ri satisfy (e.g., intensity between 100 and 120).
− The following properties must hold:
− the union of all sub-regions Ri covers the entire image R
− each Ri is a connected region
− Ri ∩ Rj = ∅ for i ≠ j (regions are disjoint)
− P(Ri) = TRUE for every region Ri
− P(Ri ∪ Rj) = FALSE for any two adjacent regions Ri and Rj
• Main approaches for region detection
− Thresholding (pixel classification)
− Region growing (splitting and merging)
− Relaxation
Thresholding
• The simplest approach to image segmentation is by thresholding:
if f(x,y) < T then f(x,y) = 0 else f(x,y) = 255
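The rule above, as an array operation on a grayscale NumPy image:

```python
import numpy as np

def threshold(f, T):
    """Binarize an image: pixels below T become 0, the rest 255."""
    return np.where(f < T, 0, 255).astype(np.uint8)

img = np.array([[10, 200],
                [90, 130]], dtype=np.uint8)
binary = threshold(img, T=128)
```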
• Automatic thresholding
− To make segmentation more robust, the threshold should be automatically selected by the system.
− Knowledge about the objects, the application, and the environment should be used to choose the threshold automatically:
− Intensity characteristics of the objects
− Sizes of the objects
− Fractions of an image occupied by the objects
− Number of different types of objects appearing in an image
• Choosing the threshold using the image histogram
− Regions with uniform intensity give rise to strong peaks in the histogram
− Multilevel thresholding is also possible
− In general, good thresholds can be selected if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
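A standard way to automate this histogram-based choice is Otsu's method, which picks the threshold maximizing the between-class variance of the two resulting pixel classes. A minimal sketch over a bimodal toy image:

```python
import numpy as np

def otsu_threshold(image):
    """Choose T maximizing between-class variance of the gray-level histogram."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                 # gray-level probabilities
    levels = np.arange(256)
    best_T, best_var = 0, -1.0
    for T in range(1, 256):
        w0, w1 = p[:T].sum(), p[T:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:T] * p[:T]).sum() / w0  # class means
        mu1 = (levels[T:] * p[T:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_T, best_var = T, var_between
    return best_T

# Bimodal toy image: dark background near 50, bright object near 200
rng = np.random.default_rng(0)
img = np.concatenate([rng.integers(40, 60, 500),
                      rng.integers(190, 210, 500)]).astype(np.uint8)
T = otsu_threshold(img)
```

For a histogram with two tall, well-separated peaks, the selected threshold falls in the valley between them, matching the intuition above.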