TRANSCRIPT
Instructor: Mircea Nicolescu
Lecture 17
CS 485 / 685
Computer Vision
Object Recognition Using SIFT Features
1. Match individual SIFT features from an image to a database of SIFT features from known objects (i.e., find nearest neighbors)
2. Find clusters of SIFT features belonging to a single object (hypothesis generation)
3. Estimate object pose (i.e., recover the transformation that the model has undergone) using at least three matches
4. Verify that additional features agree on the object pose
Nearest Neighbor Search
• Linear search: too slow for large databases
• kD-trees: become slow when k > 10
• Approximate nearest neighbor search:
− Best-bin-first [Beis et al. 97] (a modification of the kD-tree algorithm)
− Examine only the N closest bins of the kD-tree
− Use a heap to identify bins in order of their distance from the query
• Can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time.
Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", International Conference on Computer Vision Theory and Applications, 2009.
FLANN - Fast Library for Approximate Nearest Neighbors
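For small databases, the nearest-neighbor step can be done by brute force. Below is a minimal NumPy sketch using Lowe's distance-ratio test; the function name and the 0.8 threshold are illustrative choices, and FLANN would replace the exhaustive search for large databases.

```python
import numpy as np

def match_descriptors(query, database, ratio=0.8):
    """Brute-force nearest-neighbor matching with a distance-ratio test.

    query:    (m, 128) array of query SIFT descriptors
    database: (n, 128) array of model SIFT descriptors
    Returns a list of (query_index, database_index) for accepted matches.
    """
    matches = []
    for i, q in enumerate(query):
        # Euclidean distance from q to every database descriptor
        d = np.linalg.norm(database - q, axis=1)
        nn, second = np.argsort(d)[:2]
        # Accept only if the best match is clearly better than the runner-up
        if d[nn] < ratio * d[second]:
            matches.append((i, nn))
    return matches

# Toy example: 3 database descriptors, 2 query descriptors
rng = np.random.default_rng(0)
db = rng.normal(size=(3, 128))
queries = np.vstack([db[1] + 0.01 * rng.normal(size=128),  # near db[1]
                     rng.normal(size=128)])                # ambiguous
found = match_descriptors(queries, db)
```

The ratio test discards ambiguous matches whose best and second-best distances are similar, which is how most false matches are filtered before clustering.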
Estimate Object Pose
• Now, given feature matches…
− Find clusters of features corresponding to a single object
− Solve for the transformation (e.g., an affine transformation)
• Need to consider clusters of size ≥ 3
• How do we find three “good” (true) matches?
• Pose clustering
− Each feature is associated with four parameters: 2D location (x, y), scale, and orientation.
− For every model-scene match (mi, sj), estimate the similarity transformation (tx, ty, s, θ) between mi and sj, and cast a vote for it.
− The transformation space is 4D: (tx, ty, s, θ)
− Each match votes for its estimated transformation: (tx, ty, s, θ), (t′x, t′y, s′, θ′), …
− Partial voting: vote for neighboring bins as well, and use a large bin size to better tolerate errors
− Transformations that accumulate at least three votes are selected (hypothesis generation)
− Using model-scene matches, compute object pose (i.e., affine transformation) and apply verification
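The voting scheme above can be sketched as a coarse Hough accumulation over (tx, ty, s, θ). This is a toy illustration assuming each match has already yielded a similarity-transform estimate; the bin sizes are arbitrary choices here, and partial voting for neighboring bins is omitted for brevity.

```python
import numpy as np
from collections import defaultdict

def pose_vote(transforms, bin_t=20.0, bin_s=0.5, bin_theta=np.pi / 6):
    """Accumulate votes in a coarse 4-D (tx, ty, s, theta) Hough space.

    transforms: list of (tx, ty, s, theta), one per model-scene match.
    Returns the bins with at least 3 votes (the generated hypotheses),
    mapped to the indices of the matches that voted for them.
    """
    votes = defaultdict(list)
    for idx, (tx, ty, s, theta) in enumerate(transforms):
        key = (int(tx // bin_t), int(ty // bin_t),
               int(np.log2(s) // bin_s),       # scale binned logarithmically
               int(theta // bin_theta))
        votes[key].append(idx)
    return {k: v for k, v in votes.items() if len(v) >= 3}

# Three consistent matches plus one outlier
transforms = [(101, 52, 2.0, 0.10),
              (103, 55, 2.1, 0.12),
              (105, 58, 2.2, 0.09),
              (-40, 200, 0.5, 2.0)]   # outlier votes in a different bin
hypotheses = pose_vote(transforms)
```

The three consistent matches fall in one bin and form a hypothesis; the outlier's bin gets a single vote and is discarded.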
Verification
• Back-project model on the scene and look for additional matches.
• Discard outliers (incorrect matches) by imposing stricter matching constraints (e.g., half error).
• Find additional matches by refining the computed transformation (i.e., iterative affine refinement).
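Solving for the affine transformation from matched point pairs is a linear least-squares problem. A minimal sketch, assuming n ≥ 3 non-collinear model-scene correspondences:

```python
import numpy as np

def fit_affine(model_pts, scene_pts):
    """Least-squares affine fit: scene ≈ A @ model + t.

    model_pts, scene_pts: (n, 2) arrays with n >= 3 correspondences.
    Returns the 2x2 linear part A and the translation t.
    """
    n = len(model_pts)
    # Each correspondence contributes two equations:
    #   sx = a11*x + a12*y + tx,   sy = a21*x + a22*y + ty
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = model_pts   # x-equations
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = model_pts   # y-equations
    M[1::2, 5] = 1.0
    b = scene_pts.reshape(-1)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    A = p[:4].reshape(2, 2)
    t = p[4:6]
    return A, t

# Recover a known transform from noiseless correspondences
A_true = np.array([[1.2, -0.3], [0.4, 0.9]])
t_true = np.array([5.0, -2.0])
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scene = model @ A_true.T + t_true
A_est, t_est = fit_affine(model, scene)
```

With more than three matches the system is overdetermined, so the least-squares fit averages out localization noise; iterating fit and outlier rejection gives the affine refinement mentioned above.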
• Evaluate the probability that each match is correct.
− Use a Bayesian (probabilistic) model to estimate the probability that a model is present, based on the actual number of matching features.
− The Bayesian model takes into account:
− Object size in the image
− Textured regions
− Model feature count in the database
− Accuracy of fit
Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682–688.
Planar Recognition
• Training images (models)
• Reliably recognized at a rotation of 60° away from the camera.
• Affine fit approximates perspective projection.
• Only 3 points are needed for recognition.
3D Object Recognition
• Training images
• Only 3 keypoints are needed for recognition; extra keypoints provide robustness.
• Affine model is no longer as accurate.
Recognition Under Occlusion
Illumination Invariance
Object Categorization
Bag-of-Features (BoF) Models
Good for object categorization
Origin 1: Texture Recognition
• Texture is characterized by the repetition of basic elements or textons.
• Often, it is the identity of the textons, not their spatial arrangement, that matters.
[Figure: each texture image is represented by a histogram over a universal texton dictionary]
Origin 2: Document Retrieval
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
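The orderless representation is just a word-frequency vector over a fixed dictionary. A tiny sketch using the standard library (the dictionary and document are made up for illustration):

```python
from collections import Counter

dictionary = ["sensor", "image", "retinal", "cortex", "eye", "cell"]

def bag_of_words(text, dictionary):
    """Represent a document as frequencies of dictionary words only;
    word order is discarded entirely."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in dictionary]

doc = "The eye sends the retinal image to the cortex cell by cell"
vec = bag_of_words(doc, dictionary)
```

Replacing the word dictionary with a "visual" vocabulary of quantized local features gives exactly the bag-of-features representation described next.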
BoF for Object Categorization
G. Csurka et al., "Visual Categorization with Bags of Keypoints", European Conference on Computer Vision, Czech Republic, 2004.
Need a “visual” dictionary!
BoF: Main Steps
Characterize objects in terms of parts or local features
Step 1: Feature extraction (e.g., SIFT features)
Step 2: Learn “visual” vocabulary
[Figure: features extracted from training images are clustered; the “visual” vocabulary is the set of cluster centers]
Example: K-Means Clustering
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
− Assign each data point to the nearest center.
− Re-compute each cluster center as the mean of all points assigned to it.
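The algorithm above can be written directly in NumPy. A minimal sketch assuming Euclidean distance and a fixed iteration cap (not a production implementation; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means on an (n, d) data matrix. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Randomly pick K data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

# Two well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(20, 2)),
               rng.normal(5, 0.1, size=(20, 2))])
centers, labels = kmeans(X, k=2)
```

For a visual vocabulary, X would hold SIFT descriptors and k would be the vocabulary size, typically in the hundreds or thousands.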
Step 3: Quantize features using “visual” vocabulary
(i.e., represent each feature by the closest cluster center)
Step 4: Represent images by frequencies of “visual words” (i.e., bags of features)
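Steps 3 and 4 together turn an image's feature set into a fixed-length histogram. A minimal sketch, assuming the vocabulary (cluster centers) has already been learned:

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and
    return the normalized word-frequency histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)                        # step 3: quantization
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                        # step 4: frequencies

vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # 3 toy visual words
feats = np.array([[0.1, 0.2], [9.8, 0.1], [9.9, -0.2], [0.2, 9.7]])
h = bag_of_features(feats, vocab)
```

Normalizing by the total count makes histograms of images with different numbers of detected features comparable.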
BoF Object Categorization
• How do we use BoF for object categorization?
• Nearest Neighbor (NN) Classifier
• K-Nearest Neighbor (KNN) Classifier
− Find the k closest points from the training data.
− Labels of the k points “vote” to classify.
− Works well provided there is lots of data and the distance function is good.
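A minimal KNN classifier over BoF histograms, assuming plain Euclidean distance (a histogram-specific distance such as χ² often works better):

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_hists, train_labels, k=3):
    """Label a query histogram by majority vote of its k nearest neighbors."""
    d = np.linalg.norm(train_hists - query, axis=1)
    nearest = np.argsort(d)[:k]                      # indices of k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: class "a" peaks on word 0, class "b" on word 1
train = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0]])
labels = ["a", "a", "a", "b", "b", "b"]
pred = knn_classify(np.array([0.75, 0.15, 0.1]), train, labels, k=3)
```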
• Functions for comparing histograms
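Two common choices are histogram intersection (a similarity, equal to 1 for identical L1-normalized histograms) and the χ² distance (0 for identical histograms). A sketch:

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

def chi2(h1, h2, eps=1e-10):
    """Chi-squared distance; eps guards against empty bins."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.4, 0.4, 0.2])
```

The χ² distance down-weights differences in heavily populated bins, which tends to suit visual-word histograms better than plain Euclidean distance.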
• SVM classifier
Example
Dictionary quality and size are very important parameters!
Appearance-Based Recognition
• Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions).
• Identifying an object implies finding the closest stored image.
• In practice, a subset of all possible appearances is used.
• Images are highly correlated, so “compress” them into a low-dimensional space that captures key appearance characteristics (e.g., use Principal Component Analysis (PCA)).
M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
H. Murase and S. Nayar, Visual Learning and Recognition of 3D Objects from Appearance, International Journal of Computer Vision, vol 14, pp. 5-24, 1995.
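The PCA compression described above can be sketched with an SVD of the mean-centered image matrix, in the spirit of eigenfaces; the toy data and the number of components here are illustrative choices.

```python
import numpy as np

def pca_basis(images, n_components):
    """images: (n, d) matrix, one flattened image per row.
    Returns the mean image and the top principal directions (n_components, d)."""
    mean = images.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(image, mean, basis):
    """Low-dimensional appearance coordinates of one image."""
    return basis @ (image - mean)

# Toy "images" (4 pixels) that vary mostly along one direction
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 0.0, -1.0])
imgs = np.outer(rng.normal(size=50), direction) + 0.01 * rng.normal(size=(50, 4))
mean, basis = pca_basis(imgs, n_components=1)
coords = project(imgs[0], mean, basis)
```

Recognition then reduces to finding the stored image whose projected coordinates are closest to those of the input.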
Image Segmentation
• Goals and difficulties
− The goal of segmentation is to partition an image into regions (e.g., separate objects from the background)
− The results of segmentation are very important in determining the eventual success or failure of image analysis
− Segmentation is a very difficult problem in general!
• Increasing accuracy and robustness
− Introduce enough knowledge about the application domain
− Assume control over the environment (e.g., in industrial applications)
− Select the type of sensors to enhance the objects of interest (e.g., use infrared imaging for target recognition applications)
• Segmentation approaches
− Edge-based approaches:
− Use the boundaries of regions to segment the image
− Detect abrupt changes in intensity (discontinuities)
− Region-based approaches:
− Use similarity among pixels to find different regions
− Theoretically, both approaches should give identical results, but this is not true in practice
Region Detection
• A region is a group of connected pixels with similar properties.
• Region-based approaches use similarity and spatial proximity among pixels to find different regions.
• The goal is to divide the image into regions, so that:
− each region is homogeneous in some sense
− adjacent regions are not homogeneous if taken together, in the same sense.
• Properties for region-based segmentation
− Partition an image R into sub-regions R1, R2, ..., Rn
− Assume P(Ri) is a logical predicate – a property that the pixel values of region Ri satisfy (e.g., intensity between 100 and 120).
− The following properties must hold:
− the union of all sub-regions Ri covers the entire image R
− each Ri is a connected region
− Ri ∩ Rj = ∅ for i ≠ j (regions are disjoint)
− P(Ri) = TRUE for every region Ri
− P(Ri ∪ Rj) = FALSE for any two adjacent regions Ri and Rj
• Main approaches for region detection
− Thresholding (pixel classification)
− Region growing (splitting and merging)
− Relaxation
Thresholding
• The simplest approach to image segmentation is by thresholding:
if f(x,y) < T then f(x,y) = 0 else f(x,y) = 255
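The rule above, as an array operation on a grayscale NumPy image:

```python
import numpy as np

def threshold(f, T):
    """Binarize an image: pixels below T become 0, the rest 255."""
    return np.where(f < T, 0, 255).astype(np.uint8)

img = np.array([[10, 200],
                [90, 130]], dtype=np.uint8)
binary = threshold(img, T=128)
```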
• Automatic thresholding
− To make segmentation more robust, the threshold should be automatically selected by the system.
− Knowledge about the objects, the application, and the environment should be used to choose the threshold automatically:
− Intensity characteristics of the objects
− Sizes of the objects
− Fractions of an image occupied by the objects
− Number of different types of objects appearing in an image
• Choosing the threshold using the image histogram
− Regions with uniform intensity give rise to strong peaks in the histogram
− Multilevel thresholding is also possible
− In general, good thresholds can be selected if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
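A standard way to automate this histogram-based choice is Otsu's method, which picks the threshold maximizing the between-class variance of the two resulting pixel classes. A minimal sketch over a bimodal toy image:

```python
import numpy as np

def otsu_threshold(image):
    """Choose T maximizing between-class variance of the gray-level histogram."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                 # gray-level probabilities
    levels = np.arange(256)
    best_T, best_var = 0, -1.0
    for T in range(1, 256):
        w0, w1 = p[:T].sum(), p[T:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:T] * p[:T]).sum() / w0  # class means
        mu1 = (levels[T:] * p[T:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_T, best_var = T, var_between
    return best_T

# Bimodal toy image: dark background near 50, bright object near 200
rng = np.random.default_rng(0)
img = np.concatenate([rng.integers(40, 60, 500),
                      rng.integers(190, 210, 500)]).astype(np.uint8)
T = otsu_threshold(img)
```

For a histogram with two tall, well-separated peaks, the selected threshold falls in the valley between them, matching the intuition above.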