Post on 15-Jan-2016
![Page 1: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/1.jpg)
Advanced Features
Jana Kosecka
CS223b
Slides from: S. Thrun, D. Lowe, Forsyth and Ponce
![Page 2: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/2.jpg)
Advanced Features: Topics
• Template matching
• SIFT features
• Haar features
![Page 3: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/3.jpg)
Features for Object Detection/Recognition
Want to find… in here
![Page 4: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/4.jpg)
Template Convolution
Pick a template: a rectangular/square region of an image
Goal: find it in the same image, or in images of the same scene from a different viewpoint
![Page 5: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/5.jpg)
Convolution with Templates
% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 255;
imshow(bw)

% apply FFT
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a kernel
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch and subtract its mean
patch = bw(221:240,351:370);
imshow(patch)
patch = patch - (sum(sum(patch)) / size(patch,1) / size(patch, 2));
kernel = zeros(size(bw));
kernel(1:size(patch,1),1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result .^ 1 > 0.5);
imshow(result)

% alternative convolution
imshow(conv2(bw, patch, 'same'))
![Page 6: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/6.jpg)
Template Convolution
![Page 7: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/7.jpg)
Aside: Convolution Theorem
$$F(I \otimes g) = F(I) \cdot F(g)$$
Fourier Transform of g:
$$F(g(x,y))(u,v) = \iint_{\mathbb{R}^2} g(x,y)\, \exp\{-i 2\pi (ux + vy)\}\, dx\, dy$$
F is invertible
Convolution in the spatial domain is multiplication in the frequency domain - often more efficient when a fast FFT is available
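A small numerical check of the convolution theorem, as a sketch only. It is written in Python (not the MATLAB used on the slides) with a naive 1D DFT so it runs without toolboxes; the helper names `dft`, `idft` and `circular_convolve` are made up for this illustration. Note that multiplying DFTs corresponds to circular convolution, which is also what the `fft2`-based slide code computes.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustrative, not fast)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[u] * cmath.exp(2j * cmath.pi * u * k / n) for u in range(n)) / n
            for k in range(n)]

def circular_convolve(a, b):
    """Direct circular convolution in the spatial domain."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0, 0.0, 0.0]   # 1D analogue of kernel(1,1)=1, kernel(1,2)=-1

direct = circular_convolve(signal, kernel)
# Multiply in the frequency domain, then transform back.
via_dft = [v.real for v in idft([p * q for p, q in zip(dft(signal), dft(kernel))])]
```

Both routes give the same sequence up to rounding; the frequency-domain route pays off only when the DFT is replaced by a real FFT.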
![Page 8: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/8.jpg)
![Page 9: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/9.jpg)
![Page 10: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/10.jpg)
Feature Matching with templates
Given a template - find the region in the image with the highest matching score
Matching score - result of convolution is maximal (or use SSD, SAD, NSS similarity measures)
Given a rotated, scaled, perspectively distorted version of the image, can we find the same patch? (We want invariance!)
• Scaling - NO
• Rotation - NO
• Illumination - depends
• Perspective Projection - NO
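A minimal sketch of template matching with the SSD (sum of squared differences) score mentioned above, in plain Python on nested lists. The function names `ssd` and `best_match` and the toy image are assumptions for illustration; with SSD the best match is the minimum score, not the maximum.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    total = 0.0
    for r, row in enumerate(template):
        for c, t in enumerate(row):
            d = image[top + r][left + c] - t
            total += d * d
    return total

def best_match(image, template):
    """Exhaustively slide the template and return the (row, col) with lowest SSD."""
    th, tw = len(template), len(template[0])
    candidates = [(ssd(image, template, r, c), (r, c))
                  for r in range(len(image) - th + 1)
                  for c in range(len(image[0]) - tw + 1)]
    return min(candidates)[1]

image = [
    [0, 0, 0, 0, 0],
    [0, 9, 1, 0, 0],
    [0, 2, 8, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [2, 8]]
match = best_match(image, template)
```

The search only shifts the template, which is why plain template matching copes with translation but not with scaling, rotation or perspective change.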
![Page 11: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/11.jpg)
Scale Invariance: Image Pyramid
![Page 12: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/12.jpg)
Aliasing Effects
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer
Slide credit: Gary Bradski
![Page 13: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/13.jpg)
“Drop” vs “Smooth and Drop”
• Drop every second pixel
• Smooth and Drop every second pixel
Aliasing problems
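The difference between the two strategies can be sketched on a 1D signal. This is an illustrative Python example, assuming a [1, 2, 1]/4 binomial blur as the smoothing filter (the slides do not say which filter is used) and replicated borders.

```python
def drop(signal):
    """Keep every second sample."""
    return signal[::2]

def smooth(signal):
    """Blur with the kernel [1, 2, 1] / 4, replicating the border samples."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(signal))]

# Finest possible stripes: the worst case for subsampling by two.
stripes = [0, 1, 0, 1, 0, 1, 0, 1]
dropped = drop(stripes)                      # every bright stripe vanishes
smoothed_then_dropped = drop(smooth(stripes))  # stripes average out to mid-gray
```

Plain dropping turns the striped signal into a constant 0, a pattern that was never in the input; smoothing first yields values near 0.5, the correct local average.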
![Page 14: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/14.jpg)
Improved Invariance Handling
Want to find… in here
![Page 15: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/15.jpg)
SIFT Features
Invariances:
• Scaling: Yes
• Rotation: Yes
• Illumination: Yes
• Deformation: Not really
Provides good localization: Yes
![Page 16: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/16.jpg)
SIFT Reference
Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
SIFT = Scale Invariant Feature Transform
![Page 17: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/17.jpg)
Invariant Local Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features
![Page 18: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/18.jpg)
Advantages of invariant local features
Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
Distinctiveness: individual features can be matched to a large database of objects
Quantity: many features can be generated for even small objects
Efficiency: close to real-time performance
Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness
![Page 19: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/19.jpg)
SIFT On-A-Slide
1. Enforce invariance to scale: Compute the difference-of-Gaussians maximum for many different scales; apply non-maximum suppression and find local maxima: keypoint candidates.
2. Localizable corner: For each maximum, fit a quadratic function. Compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: Compute the ratio of eigenvalues; drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: Compute orientation, to achieve rotation invariance, by finding the dominant gradient direction in the smoothed image (possibly multiple orientations). Rotate the patch so that the orientation points up.
5. Compute feature signature: Compute a "gradient histogram" of the local image region in a 4x4 pixel region. Do this for 4x4 regions of that size. Orient so that the largest gradient points up (possibly multiple solutions). Result: feature vector with 128 values (16 fields, 8 gradients).
6. Enforce invariance to illumination change and camera saturation: Normalize to unit length to increase invariance to illumination. Then threshold all gradients, to become invariant to camera saturation.
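Step 6 can be sketched in a few lines of Python. The clamp value 0.2 and the final renormalization follow Lowe's paper rather than these slides, and the 4-element vector stands in for a real 128-element descriptor; `sift_normalize` is a name chosen for this example.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sift_normalize(descriptor, clamp=0.2):
    """Unit-normalize, clamp large entries, then renormalize."""
    unit = normalize(descriptor)               # removes overall contrast scaling
    clipped = [min(v, clamp) for v in unit]    # limits the effect of saturated gradients
    return normalize(clipped)

raw = [3.0, 4.0, 0.0, 0.0]        # toy gradient histogram
scaled = [10 * v for v in raw]    # same patch under 10x higher contrast
a = sift_normalize(raw)
b = sift_normalize(scaled)
```

Both inputs map to the same descriptor, which is the illumination invariance the slide refers to.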
![Page 20: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/20.jpg)
Finding “Keypoints” (Corners)
Idea: Find corners, but with scale invariance
Approach: Run linear filter (difference of Gaussians)
Do this at different resolutions of image pyramid
![Page 21: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/21.jpg)
Difference of Gaussians
[Figure: one Gaussian minus another Gaussian equals the difference of Gaussians]
Approximates Laplacian (see filtering lecture)
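A 1D sketch of a difference-of-Gaussians filter in Python. The kernel size 41 and sigmas 4 and 8 loosely mirror the `fspecial` calls in the slides' MATLAB demo but are otherwise arbitrary; `gaussian_kernel`, `dog_kernel` and `filter_valid` are illustrative helpers, not library functions.

```python
import math

def gaussian_kernel(size, sigma):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    half = (size - 1) / 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def dog_kernel(size, sigma):
    """Wide Gaussian (2*sigma) minus narrow Gaussian (sigma)."""
    wide = gaussian_kernel(size, 2 * sigma)
    narrow = gaussian_kernel(size, sigma)
    return [w - n for w, n in zip(wide, narrow)]

def filter_valid(signal, kernel):
    """Slide the kernel over the signal, keeping only fully overlapping positions."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = dog_kernel(41, 4)
flat = filter_valid([5.0] * 60, kernel)                  # constant region
step = filter_valid([0.0] * 40 + [1.0] * 40, kernel)     # single intensity edge
```

Because both Gaussians sum to one, the kernel sums to zero: flat regions give no response, while structure such as the step does, similar to a Laplacian-type filter.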
![Page 22: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/22.jpg)
Difference of Gaussians
surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))

im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;

for i = 1 : 10
  gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
  res = abs(conv2(bw, gaussD, 'same'));
  res = res / max(max(res));
  imshow(res); title(['\bf i = ' num2str(i)]); drawnow
end
![Page 23: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/23.jpg)
Gaussian Kernel Size i=1
![Page 24: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/24.jpg)
Gaussian Kernel Size i=2
![Page 25: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/25.jpg)
Gaussian Kernel Size i=3
![Page 26: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/26.jpg)
Gaussian Kernel Size i=4
![Page 27: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/27.jpg)
Gaussian Kernel Size i=5
![Page 28: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/28.jpg)
Gaussian Kernel Size i=6
![Page 29: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/29.jpg)
Gaussian Kernel Size i=7
![Page 30: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/30.jpg)
Gaussian Kernel Size i=8
![Page 31: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/31.jpg)
Gaussian Kernel Size i=9
![Page 32: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/32.jpg)
Gaussian Kernel Size i=10
![Page 33: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/33.jpg)
Key point localization
In D. Lowe’s paper the image is decomposed into octaves (consecutively sub-sampled versions of the same image)
Instead of convolving with large kernels, within an octave the kernels are kept the same
Detect maxima and minima of difference-of-Gaussian in scale space
Look in a 3x3 neighbourhood in scale and space
[Figure labels: Blur, Resample, Subtract]
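The extremum test can be sketched as follows, assuming the DoG responses are stored as `dog[scale][row][col]` and that the neighbourhood is the 3x3x3 block (26 neighbours) around the sample. The function name and the toy data are illustrative.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours."""
    center = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return all(center > n for n in neighbours) or all(center < n for n in neighbours)

# Three adjacent DoG layers of size 3x3, all zero except the middle sample.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 1.0
peak = is_extremum(dog, 1, 1, 1)

# A stronger response at the next scale means the middle sample is no longer an extremum.
dog[2][1][1] = 2.0
still_peak = is_extremum(dog, 1, 1, 1)
```

Comparing across scale as well as space is what ties each keypoint to a characteristic scale.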
![Page 34: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/34.jpg)
Example of keypoint detection
(a) 233x189 image
(b) 832 DOG extrema
(c) 729 above threshold
![Page 35: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/35.jpg)
![Page 36: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/36.jpg)
Example of keypoint detection
Threshold on value at DOG peak and on ratio of principal curvatures (Harris approach)
(c) 729 left after peak value threshold (from 832)
(d) 536 left after testing ratio of principal curvatures
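A sketch of the curvature-ratio test in Python. It uses the trace and determinant of the 2x2 Hessian of the DoG response so the eigenvalues never have to be computed; the threshold r = 10 is the value from Lowe's paper, not from these slides, and the Hessian entries below are made-up numbers.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal curvatures differ by less than a factor r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or one is zero): not a well-defined peak.
        return False
    # trace^2 / det grows with the ratio of the two eigenvalues.
    return trace * trace / det < (r + 1) ** 2 / r

blob = passes_edge_test(dxx=-2.0, dyy=-2.0, dxy=0.0)    # equal curvature in both directions
edge = passes_edge_test(dxx=-20.0, dyy=-1.0, dxy=0.0)   # strong curvature in one direction only
```

Edge-like responses are rejected because they are poorly localized along the edge direction.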
![Page 37: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/37.jpg)
![Page 38: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/38.jpg)
Select canonical orientation
Create histogram of local gradient directions computed at selected scale
Assign canonical orientation at peak of smoothed histogram
Each key specifies stable 2D coordinates (x, y, scale, orientation)
[Figure: orientation histogram, horizontal axis from 0 to 2π]
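A simplified Python sketch of orientation assignment. It builds a magnitude-weighted histogram of gradient directions with 36 bins (the bin count used in Lowe's paper) and returns the centre of the peak bin; histogram smoothing, Gaussian weighting and multiple peaks are left out, so this is a rough illustration only.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre angle (radians, in [0, 2*pi)) of the strongest gradient bin."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]      # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    peak = max(range(bins), key=lambda b: hist[b])
    return (peak + 0.5) * 2 * math.pi / bins

# Intensity increases from left to right, so every gradient points along +x.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
theta = dominant_orientation(ramp)
```

Rotating the patch by `-theta` before computing the descriptor is what makes the descriptor rotation invariant.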
![Page 39: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/39.jpg)
![Page 40: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/40.jpg)
SIFT vector formation
Thresholded image gradients are sampled over 16x16 array of locations in scale space
Create array of orientation histograms
8 orientations x 4x4 histogram array = 128 dimensions
![Page 41: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/41.jpg)
Nearest-neighbor matching to feature database
Hypotheses are generated by approximate nearest neighbor matching of each feature to vectors in the database
SIFT uses best-bin-first (Beis & Lowe, 97) modification to k-d tree algorithm
Use heap data structure to identify bins in order by their distance from query point
Result: Can give speedup by factor of 1000 while finding nearest neighbor (of interest) 95% of the time
![Page 42: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/42.jpg)
3D Object Recognition
Extract outlines with background subtraction
![Page 43: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/43.jpg)
3D Object Recognition
Only 3 keys are needed for recognition, so extra keys provide robustness
Affine model is no longer as accurate
![Page 44: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/44.jpg)
Recognition under occlusion
![Page 45: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/45.jpg)
Test of illumination invariance
Same image under differing illumination
273 keys verified in final match
![Page 46: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/46.jpg)
Examples of view interpolation
![Page 47: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/47.jpg)
Location recognition
![Page 48: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/48.jpg)
SIFT
Invariances:
• Scaling: Yes
• Rotation: Yes
• Illumination: Yes
• Perspective Projection: Maybe
Provides good localization: Yes
State-of-the-art in invariant feature matching!
Alternative detectors/descriptors/references can be found at http://www.robots.ox.ac.uk/~vgg/software/
![Page 49: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/49.jpg)
SOFTWARE for Matlab (at UCLA)
![Page 50: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/50.jpg)
SIFT demos
Run
sift_compile
sift_demo2
![Page 51: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/51.jpg)
Advanced Features: Topics
• SIFT Features
• Learning with Many Simple Features
![Page 52: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/52.jpg)
A totally different idea
Use many very simple features
Learn cascade of tests for target object
Efficient if:
• features easy to compute
• cascade short
![Page 53: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/53.jpg)
Using Many Simple Features
Viola Jones / Haar Features
(Generalized) Haar Features:
• rectangular blocks, white or black
• 3 types of features:
  • two rectangles: horizontal/vertical
  • three rectangles
  • four rectangles
• in 24x24 window: 180,000 possible features
![Page 54: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/54.jpg)
Integral Image
Def: The integral image at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive.
We can calculate the integral image representation of the image in a single pass:
s(x,y) = s(x,y-1) + i(x,y)
ii(x,y) = ii(x-1,y) + s(x,y)
[Figure: image with origin (0,0), axes x and y, and the region above and to the left of (x,y)]
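The two recurrences translate almost directly into code. A minimal Python sketch, assuming the image is a list of rows indexed as `img[y][x]`; the function name is illustrative.

```python
def integral_image(img):
    """Single-pass integral image using the two recurrences from the slide."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0                                   # running sum down column x
        for y in range(h):
            s += img[y][x]                      # s(x,y)  = s(x,y-1)  + i(x,y)
            left = ii[y][x - 1] if x > 0 else 0
            ii[y][x] = left + s                 # ii(x,y) = ii(x-1,y) + s(x,y)
    return ii

img = [[1, 2, 3],
       [4, 5, 6]]
ii = integral_image(img)
```

Each pixel is visited once, and the bottom-right entry equals the sum of the whole image.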
Slide credit: Gyozo Gidofalvi
![Page 55: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/55.jpg)
Efficient Computation of Rectangle Value
Using the integral image representation one can compute the value of any rectangular sum in constant time.
Example: Rectangle D
ii(4) + ii(1) – ii(2) – ii(3)
As a result two-, three-, and four-rectangular features can be computed with 6, 8 and 9 array references respectively.
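A sketch of the four-lookup rectangle sum and a two-rectangle Haar feature in Python. To avoid bounds checks it assumes an integral image padded with one extra zero row and column, which is a common implementation choice and not something the slide specifies; all function names are illustrative.

```python
def padded_integral(img):
    """Integral image with a leading zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, top, left, height, width):
    """Rectangle sum from four corners: ii(4) + ii(1) - ii(2) - ii(3)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]

def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = padded_integral(img)
inner = rect_sum(ii, 1, 1, 2, 2)             # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 3, 4)   # left two columns minus right two columns
```

This straightforward version does eight lookups for the two-rectangle feature; sharing the two corners on the common edge brings it down to the six references quoted above.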
Slide credit: Gyozo Gidofalvi
![Page 56: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/56.jpg)
Idea 1: Linear Separator
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 57: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/57.jpg)
Linear Separator for Image features
(highly related to Vapnik’s Support Vector Machines)
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 58: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/58.jpg)
Problem
• How to find hyperplane?
• How to avoid evaluating 180,000 features?
Answer: Boosting [AdaBoost, Freund/Schapire]
• Finds small set of features that are “sufficient”
• Generalizes very well (a lot of max-margin theory)
• Requires positive and negative examples
![Page 59: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/59.jpg)
AdaBoost Idea (in Viola/Jones):
Given set of “weak” classifiers:
• Pick best one
• Reweight training examples, so that misclassified images have larger weight
• Reiterate; then linearly combine resulting classifiers
Weak classifiers: Haar features
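The loop described above can be sketched with the textbook discrete AdaBoost in Python. Assumptions: labels are +1/-1, and the weak classifiers are threshold stumps on a single scalar value standing in for thresholded Haar feature responses. This is not the Viola/Jones training code, only a toy illustration of pick, reweight and combine.

```python
import math

def predict_stump(stump, x):
    """A stump (threshold, sign) predicts `sign` for x >= threshold, else `-sign`."""
    threshold, sign = stump
    return sign if x >= threshold else -sign

def weighted_error(stump, xs, ys, weights):
    """Total weight of the examples the stump misclassifies."""
    return sum(w for w, x, y in zip(weights, xs, ys) if predict_stump(stump, x) != y)

def train_adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    stumps = [(t, sign) for t in sorted(set(xs)) for sign in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        # Pick the best weak classifier under the current weights.
        best = min(stumps, key=lambda s: weighted_error(s, xs, ys, weights))
        err = min(max(weighted_error(best, xs, ys, weights), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Reweight: misclassified examples gain weight, then renormalize.
        weights = [w * math.exp(-alpha * y * predict_stump(best, x))
                   for w, x, y in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted (linear) combination of weak classifiers."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]          # no single threshold separates these labels
one_round = [predict(train_adaboost(xs, ys, 1), x) for x in xs]
three_rounds = [predict(train_adaboost(xs, ys, 3), x) for x in xs]
```

One stump cannot fit this data, but the weighted vote of three stumps classifies every training example correctly.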
![Page 60: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/60.jpg)
AdaBoost Idea (in Viola/Jones):
We will discuss the classification later
Sneak preview of AdaBoost and results on face and car detection
… to be continued when discussing object detection and recognition
![Page 61: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/61.jpg)
AdaBoost
[Figure: Weak Classifier 1, weights increased, Weak Classifier 2, Weak Classifier 3]
Final classifier is linear combination of weak classifiers
$$D_{t+1}(i) = \frac{D_t(i)\, e^{-y_i h_t(x_i)}}{Z_t}$$
$$h_t = \arg\min_h \underbrace{\sum_i D_t(i)\, e^{-y_i h(x_i)}}_{Z_t}$$
Freund & Schapire
![Page 62: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/62.jpg)
Adaboost Algorithm
Freund & Schapire
![Page 63: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/63.jpg)
AdaBoost gives efficient classifier:
• Features = Weak Classifiers
• Each round selects the optimal feature given:
  • Previous selected features
  • Exponential Loss
AdaBoost Surprise
• Generalization error decreases even after all training examples 100% correctly classified (margin maximization phenomenon)
![Page 64: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/64.jpg)
Boosted Face Detection: Image Features
“Rectangle filters”
60,000 × 100 = 6,000,000 unique binary features
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 65: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/65.jpg)
Example Classifier for Face Detection
ROC curve for 200 feature classifier
A classifier with 200 rectangle features was learned using AdaBoost
95% correct detection on test set with 1 in 14084 false positives.
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 66: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/66.jpg)
Classifiers are Efficient
Given a nested set of classifier hypothesis classes
[Figure: ROC curve, % Detection (50 to 100) against % False Pos (0 to 50); annotation: "vs false neg determined by"]
[Figure: cascade. IMAGE SUB-WINDOW → Classifier 1 → Classifier 2 → Classifier 3 → FACE; a T result passes the sub-window to the next classifier, an F result rejects it as NON-FACE]
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 67: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/67.jpg)
Cascaded Classifier
[Figure: cascade. IMAGE SUB-WINDOW → 1 Feature (50%) → 5 Features (20%) → 20 Features (2%) → FACE; an F result at any stage rejects the sub-window as NON-FACE]
A 1-feature classifier achieves 100% detection rate and about 50% false positive rate.
A 5-feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative), using data from the previous stage.
A 20-feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative).
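The cumulative rates are just running products of the per-stage false positive rates. A small Python sketch of that arithmetic, together with the early-exit evaluation that makes a cascade cheap; the two toy stage functions are placeholders, not real Haar classifiers.

```python
def cumulative_false_positive_rates(stage_rates):
    """Running product of per-stage false positive rates."""
    rates, running = [], 1.0
    for rate in stage_rates:
        running *= rate
        rates.append(running)
    return rates

def cascade_accepts(window, stages):
    """Evaluate stages in order; all() stops at the first stage that rejects."""
    return all(stage(window) for stage in stages)

# Per-stage rates from the slide: 50%, 40%, 10%.
rates = cumulative_false_positive_rates([0.5, 0.4, 0.1])

# Toy stages on a list of pixel values (hypothetical tests for illustration only).
stages = [lambda w: sum(w) > 0, lambda w: max(w) > 5]
accepted = cascade_accepts([1, 9], stages)
rejected = cascade_accepts([0, 0], stages)   # fails stage 1, stage 2 is never run
```

Since most sub-windows are rejected by the first, cheapest stage, the average cost per window stays close to the cost of a single feature.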
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 68: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/68.jpg)
Output of Face Detector on Test Images
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 69: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/69.jpg)
Solving other “Face” Tasks
Facial Feature Localization
Demographic Analysis
Profile Detection
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 70: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/70.jpg)
Face Localization Features
Learned features reflect the task
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 71: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/71.jpg)
Face Profile Detection
Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
![Page 72: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/72.jpg)
Face Profile Features
![Page 73: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/73.jpg)
Finding Cars (DARPA Urban Challenge)
Hand-labeled images of generic car rear-ends
Training time: ~5 hours, offline
1100 images
Credit: Hendrik Dahlkamp
![Page 74: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/74.jpg)
Generating even more examples
Generic classifier finds all cars in recorded video.
Compute offline and store in database
28700 images
Credit: Hendrik Dahlkamp
![Page 75: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/75.jpg)
Results - Video
![Page 76: Advanced Features Jana Kosecka CS223b Slides from: S. Thurn, D. Lowe, Forsyth and Ponce](https://reader035.vdocuments.net/reader035/viewer/2022062322/56649d375503460f94a0f554/html5/thumbnails/76.jpg)
Summary Viola-Jones
Many simple features
• Generalized Haar features (multi-rectangles)
• Easy and efficient to compute
Discriminative Learning:
• finds a small subset for object recognition
• Uses AdaBoost
Result: Feature Cascade
• 15 fps on 700 MHz laptop (=fast!)
Applications
• Face detection
• Car detection
• Many others