Last update: May 4, 2010
Vision
CMSC 421: Chapter 24
CMSC 421: Chapter 24 1
Outline
♦ Perception generally
♦ Image formation
♦ Early vision
♦ 2D → 3D
♦ Object recognition
Perception generally
Stimulus (percept) S, World W
S = g(W )
E.g., g = “graphics.” Can we do vision as inverse graphics?
W = g−1(S)
Problem: massive ambiguity!
Example
Better approaches
Bayesian inference of world configurations:
P(W | S) = α P(S | W) P(W)

where P(S | W) is the “graphics” model and P(W) is “prior knowledge.”
Better still: no need to recover the exact scene! Just extract the information needed for
– navigation
– manipulation
– recognition/identification
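The Bayesian formulation above can be sketched on a toy discrete world. The stimulus, world states, likelihood table, and priors below are invented purely for illustration.

```python
def posterior(worlds, likelihood, prior, s):
    """Compute P(W | S=s) = alpha * P(s | W) * P(W) over a discrete set of worlds."""
    unnorm = {w: likelihood(s, w) * prior[w] for w in worlds}
    alpha = 1.0 / sum(unnorm.values())       # normalizing constant
    return {w: alpha * p for w, p in unnorm.items()}

# Toy example: two world hypotheses that could produce a "dark blob" stimulus.
worlds = ["shadow", "object"]
prior = {"shadow": 0.7, "object": 0.3}       # prior knowledge: shadows are common
def likelihood(s, w):                        # "graphics": P(stimulus | world)
    table = {("dark_blob", "shadow"): 0.4, ("dark_blob", "object"): 0.9}
    return table[(s, w)]

post = posterior(worlds, likelihood, prior, "dark_blob")
```

Even though the object hypothesis explains the stimulus better, the prior keeps the shadow hypothesis slightly ahead; this is how prior knowledge resolves ambiguity.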
Vision “subsystems”
[diagram: vision “subsystems.” LOW-LEVEL VISION: filters, edge detection, and segmentation turn the image sequence into edges, regions, features, optical flow, and disparity. HIGH-LEVEL VISION: shape-from-contour/motion/shading/stereo, matching, tracking/data association, and object recognition turn these into depth maps, objects, scenes, and behaviors.]
Vision requires combining multiple cues
Image formation
[figure: pinhole camera geometry. Scene point P projects through the pinhole to image point P′ on the image plane, which lies at distance f behind the pinhole; axes X, Y, Z.]
P is a point in the scene, with coordinates (X, Y, Z).
P′ is its image on the image plane, with coordinates (x, y, z).
x = −fX/Z,  y = −fY/Z

where f is a scaling factor (the distance from the pinhole to the image plane).
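A minimal sketch of the projection equations above; the scaling factor and the point coordinates are made-up values.

```python
def project(X, Y, Z, f):
    """Perspective projection of scene point P=(X,Y,Z) to image point P'=(x,y)."""
    x = -f * X / Z
    y = -f * Y / Z
    return x, y

# A point twice as far away projects half as far from the image center:
x1, y1 = project(2.0, 4.0, 10.0, 1.0)
x2, y2 = project(2.0, 4.0, 20.0, 1.0)
```

This is the source of the depth ambiguity discussed earlier: many (X, Y, Z) triples map to the same (x, y).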
Images
195 209 221 235 249 251 254 255 250 241 247 248
210 236 249 254 255 254 225 226 212 204 236 211
164 172 180 192 241 251 255 255 255 255 235 190
167 164 171 170 179 189 208 244 254 234
162 167 166 169 169 170 176 185 196 232 249 254
153 157 160 162 169 170 168 169 171 176 185 218
126 135 143 147 156 157 160 166 167 171 168 170
103 107 118 125 133 145 151 156 158 159 163 164
095 095 097 101 115 124 132 142 117 122 124 161
093 093 093 093 095 099 105 118 125 135 143 119
093 093 093 093 093 093 095 097 101 109 119 132
095 093 093 093 093 093 093 093 093 093 093 119
I(x, y, t) is the intensity at (x, y) at time t
typical digital camera ≈ 5 to 10 million pixels
human eyes ≈ 240 million pixels; about 0.25 terabits/sec
Color vision
Intensity varies with frequency → infinite-dimensional signal
[plots: intensity vs. frequency of the incoming light; sensitivity vs. frequency for each receptor type]
Human eye has three types of color-sensitive cells; each integrates the incoming signal against its own sensitivity curve ⇒ a 3-element intensity vector
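The reduction from an “infinite-dimensional” spectrum to a 3-vector can be sketched by integrating a spectrum against three sensitivity curves. The Gaussian sensitivities and frequency units below are crude stand-ins, not real cone responses.

```python
import math

def gaussian(center, width):
    """A bell-shaped sensitivity curve peaked at `center` (illustrative, not a real cone)."""
    return lambda freq: math.exp(-((freq - center) / width) ** 2)

# Hypothetical peak sensitivities, in arbitrary frequency units, for three receptor types.
sensitivities = [gaussian(c, 0.5) for c in (1.0, 2.0, 3.0)]

def receptor_response(spectrum, freqs):
    """Integrate spectrum * sensitivity over sampled frequencies (rectangle rule)."""
    df = freqs[1] - freqs[0]
    return [sum(spectrum(f) * s(f) for f in freqs) * df for s in sensitivities]

freqs = [0.01 * i for i in range(1, 400)]
def flat_white(f):
    return 1.0                                # equal intensity at every frequency

vec = receptor_response(flat_white, freqs)    # a 3-element vector, as on the slide
```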
Primary colors
Did anyone ever tell you that the three primary colors are red, yellow, and blue?
They weren’t correct.
Did anyone ever tell you that
• red, green, and blue are the primary colors of light, and
• red, yellow, and blue are the primary pigments?
Only partially correct.
What do we mean by “primary colors”?
Ideally, a small set of colors from which we can generate every color the eye can see.
Some colors can’t be generated by any combination of red, yellow, and blue pigment.
Primary colors
Use color frequencies that stimulate the three color receptors
Primary colors of light (additive)
⇔ frequencies where the three color receptors are most sensitive
⇔ red, green, blue

Primary pigments (subtractive):
white minus red = cyan
white minus green = magenta
white minus blue = yellow
“Four color” (or CMYK) printing uses cyan, magenta, yellow, and black.
Edge detection
Edges in image ⇐ discontinuities in scene:
1) depth
2) surface orientation
3) reflectance (surface markings)
4) illumination (shadows, etc.)
Edges correspond to abrupt changes in brightness or color
E.g., a single row or column of an image:
Ideally, we could detect abrupt changes by computing a derivative:
In practice, this has some problems . . .
Noise
Problem: the image is noisy:
The derivative looks like this:
First need to smooth out the noise
Smoothing out the noise
Convolution g = h ∗ f : for each point f[i, j], compute a weighted average g[i, j] of the points near f[i, j]:

g[i, j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} h[u, v] f[i−u, j−v]

Use convolution to smooth the image, then differentiate:
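The discrete convolution above can be implemented directly. The 3×3 averaging kernel and tiny image below are illustrative choices; a Gaussian kernel would be the usual pick for noise smoothing.

```python
def convolve(f, h, k):
    """g[i,j] = sum over u,v in [-k..k] of h[u,v] * f[i-u, j-v], with zero padding."""
    rows, cols = len(f), len(f[0])
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for u in range(-k, k + 1):
                for v in range(-k, k + 1):
                    if 0 <= i - u < rows and 0 <= j - v < cols:
                        total += h[u + k][v + k] * f[i - u][j - v]
            g[i][j] = total
    return g

# A 3x3 averaging kernel spreads a single noisy spike evenly over its neighborhood.
box = [[1 / 9.0] * 3 for _ in range(3)]
noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smooth = convolve(noisy, box, k=1)
```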
Edge detection
Find all pixels at which the derivative is above some threshold:
Label above-threshold pixels with edge orientation
Infer “clean” line segments by combining edge pixels with same orientation
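The thresholding step can be sketched on a single image row. A plain finite difference stands in for the smoothed derivative, and the row values and threshold are invented.

```python
def edge_pixels(row, threshold):
    """Indices where the finite-difference derivative |row[i+1] - row[i]| is large."""
    deriv = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return [i for i, d in enumerate(deriv) if abs(d) > threshold]

row = [93, 95, 97, 180, 185, 250, 251]        # two abrupt brightness jumps
edges = edge_pixels(row, threshold=50)        # indices of the two jumps
```

In a full detector the surviving pixels would then be labeled with an orientation (from the 2-D gradient direction) and linked into line segments.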
Inferring shape
Can use cues from prior knowledge to help infer shape:
Shape from . . .   Assumes
motion             rigid bodies, continuous motion
stereo             solid, contiguous, non-repeating bodies
texture            uniform texture
shading            uniform reflectance
contour            minimum curvature
Inferring shape from motion
Is this a 3-d shape? If so, what shape is it?
Rotate the shape slightly
Match the corresponding pixels in the two images.
Use this to infer the 3-dimensional locations of the pixels.
Inferring shape using stereo vision
Can do something similar using stereo vision
[figure: stereo geometry. The left image and right image each contain a projection of points P and P0 on the perceived object.]
Inferring shape from texture
Idea: assume the actual texture is uniform, and compute the surface shape that would produce the observed distortion.

A similar idea works for shading (assume uniform reflectance, etc.), but interreflections make the perceived intensity a nonlocal computation
⇒ hollows seem shallower than they really are.
Inferring shape from edges
Sometimes we can infer shape from properties of the edges and vertices:
Let’s only consider solid polyhedral objects with trihedral vertices
⇒ only certain types of labels are possible, e.g.,
All possible combinations of vertex/edge labels
[figure: the catalog of possible label combinations for each junction type]
Given a line drawing, use constraint-satisfaction to assign labels to edges
Vertex/edge labeling example
[figure: the line drawing to be labeled]
Use a constraint-satisfaction algorithm to assign labels to edges:
variables = edges, constraints = possible node configurations
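The search can be sketched as backtracking over edge variables. The tiny three-edge instance and its allowed vertex tuples below are invented placeholders; the real catalog of trihedral junction labelings (Huffman/Clowes) is much larger.

```python
def label_edges(edges, vertices, assignment=None):
    """Backtracking CSP: assign '+', '-', or an arrow direction to every edge."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(edges):
        return assignment
    edge = next(e for e in edges if e not in assignment)
    for label in ("+", "-", ">", "<"):
        assignment[edge] = label
        if all(consistent(v, allowed, assignment) for v, allowed in vertices.items()):
            result = label_edges(edges, vertices, assignment)
            if result:
                return result
        del assignment[edge]
    return None

def consistent(vertex, allowed, assignment):
    """A vertex is OK if its (possibly partial) edge labels extend to some allowed tuple."""
    labels = tuple(assignment.get(e) for e in vertex)
    return any(all(l is None or l == a for l, a in zip(labels, tup)) for tup in allowed)

# Toy instance: three edges meeting pairwise at three vertices (constraints invented).
edges = ["ab", "bc", "ca"]
vertices = {("ab", "bc"): [("+", "+"), ("-", "-")],
            ("bc", "ca"): [("+", "-")],
            ("ca", "ab"): [("-", "+")]}
solution = label_edges(edges, vertices)
```

The consistency check after each assignment is what prunes impossible interpretations early, and the `del` line is the backtracking step illustrated on the next slides.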
Vertex/edge labeling example

[figures: snapshots of the line drawing at successive steps of the search]

Start by putting clockwise arrows on the boundary, then label the remaining edges one at a time.

At one point no legal label remains ⇒ need to backtrack. A later dead end ⇒ need to backtrack again.

Found a consistent labeling.
Object recognition
Simple idea:
– extract 3-D shapes from the image
– match against a “shape library”

Problems:
– extracting curved surfaces from the image
– representing the shape of an extracted object
– representing the shape and variability of library object classes
– improper segmentation, occlusion
– unknown illumination, shadows, markings, noise, complexity, etc.

Approaches:
– index into the library by measuring invariant properties of objects
– alignment of image features with projected library object features
– match the image against multiple stored views (aspects) of a library object
– machine learning methods based on image statistics
Handwritten digit recognition
3-nearest-neighbor: 2.4% error
400–300–10 unit MLP (multi-layer perceptron): 1.6% error
LeNet (768–192–30–10 unit MLP): 0.9% error

Current best (kernel machines, vision algorithms): ≈ 0.6% error
♦ A kernel machine is a statistical learning algorithm
♦ The “kernel” is a particular kind of vector function (see Section 20.6)
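The 3-nearest-neighbor baseline can be sketched in a few lines. The 2-pixel “images” below are made up; real digit data would use full pixel vectors.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Majority vote among the k training examples closest to the query (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set: (feature vector, digit label) pairs, invented for illustration.
train = [((0.0, 0.1), "0"), ((0.1, 0.0), "0"), ((0.9, 1.0), "1"),
         ((1.0, 0.9), "1"), ((0.2, 0.1), "0")]
pred = knn_classify(train, (0.05, 0.05))      # nearest three neighbors are all "0"
```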
Shape-context matching
Do two shapes represent the same object?
♦ Represent each shape as a set of sampled edge points p1, . . . , pn
♦ For each point pi, compute its shape context: a function that represents pi’s relation to the other points in the shape

[figures: Shape P, Shape Q, and the polar coordinates (log r, θ) around a sample point]
Shape-context matching contd.
A point’s shape context is a histogram:
a grid of discrete values of log r and θ;
the grid divides space into “bins,” one for each pair (log r, θ)

[figures: log-polar histogram bins around three sample points]
hi(k) = the number of points in the k’th bin for point pi
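Computing one point’s shape context can be sketched as binning the offsets to all other points in log-polar coordinates. The bin edges, angular resolution, and sample points below are arbitrary illustrative choices.

```python
import math

def shape_context(points, i, r_edges, n_theta):
    """Histogram h_i(k): count of the other points falling in each (log r, theta) bin."""
    px, py = points[i]
    hist = [0] * (len(r_edges) * n_theta)
    for j, (qx, qy) in enumerate(points):
        if j == i:
            continue
        log_r = math.log(math.hypot(qx - px, qy - py))
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        r_bin = sum(1 for edge in r_edges if log_r >= edge) - 1   # which radial shell
        t_bin = int(theta / (2 * math.pi) * n_theta)              # which angular wedge
        if 0 <= r_bin < len(r_edges):
            hist[r_bin * n_theta + t_bin] += 1
    return hist

# Four sample edge points; compute the context of the first one.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-2.0, 0.0)]
h0 = shape_context(points, 0, r_edges=[-1.0, 0.5], n_theta=4)
```

The log radius makes the descriptor more sensitive to nearby points than to distant ones, which is what gives shape contexts their robustness.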
Shape-context matching contd.
For pi in shape P and qj in shape Q, let

Cij = (1/2) Σ_{k=1}^{K} [hi(k) − hj(k)]² / [hi(k) + hj(k)]

Cij is the χ² distance between the histograms of pi and qj. The smaller it is, the more alike pi and qj are.
Find a one-to-one correspondence between the points in shape P and thepoints in shape Q, in a way that minimizes the total cost
This is a weighted bipartite matching problem, and can be solved in time O(n³).
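The χ² cost and the one-to-one matching step can be sketched as follows. For clarity this brute-forces all permutations, which only works for tiny n; in practice the O(n³) Hungarian algorithm solves the bipartite matching. The histograms are made up.

```python
from itertools import permutations

def chi2(hi, hj):
    """Cij = 1/2 * sum_k (hi[k] - hj[k])^2 / (hi[k] + hj[k]), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(hi, hj) if a + b > 0)

def best_matching(P_hists, Q_hists):
    """Minimum-total-cost one-to-one correspondence between the points of P and Q."""
    n = len(P_hists)
    cost = [[chi2(P_hists[i], Q_hists[j]) for j in range(n)] for i in range(n)]
    return min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))

# Two-point shapes whose histograms match best in swapped order.
P = [[4, 0, 1], [0, 3, 2]]
Q = [[0, 3, 1], [4, 0, 2]]
match = best_matching(P, Q)                   # pairs P[0] with Q[1], P[1] with Q[0]
```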
Simple nearest-neighbor learning gives 0.63% error rate on NIST digit data
Summary
Vision is hard—noise, ambiguity, complexity
Prior knowledge is essential to constrain the problem
Need to combine multiple cues: motion, contour, shading, texture, stereo
“Library” object representation: shape vs. aspects
Image/object matching: features, lines, regions, etc.