stockman msu/cse fall 20081 computer vision and human perception a brief intro from an ai...

90
Stockman MSU/CSE Fall 20 08 1 Computer Vision and Human Perception A brief intro from an AI perspective

Upload: maximilian-hampton

Post on 18-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 1

Computer Vision and Human Perception

A brief intro from an AI perspective

Page 2: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 2

Computer Vision and Human Perception

What are the goals of CV? What are the applications? How do humans perceive the 3D

world via images? Some methods of processing

images.

Page 3: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 3

Goal of computer vision Make useful decisions about real physical

objects and scenes based on sensed images.

Alternative (Aloimonos and Rosenfeld): goal is the construction of scene descriptions from images.

How do you find the door to leave? How do you determine if a person is friendly

or hostile? .. an elder? .. a possible mate?

Page 4: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 4

Critical Issues Sensing: how do sensors obtain images

of the world? Information: how do we obtain color,

texture, shape, motion, etc.? Representations: what representations

should/does a computer [or brain] use? Algorithms: what algorithms process

image information and construct scene descriptions?

Page 5: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 5

Images: 2D projections of 3D 3D world has color, texture, surfaces,

volumes, light sources, objects, motion, betweeness, adjacency, connections, etc.

2D image is a projection of a scene from a specific viewpoint; many 3D features are captured, some are not.

Brightness or color = g(x,y) or f(row, column) for a certain instant of time

Images indicate familiar people, moving objects or animals, health of people or machines

Page 6: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 6

Image receives reflections Light reaches

surfaces in 3D Surfaces reflect Sensor element

receives light energy

Intensity matters Angles matter Material maters

Page 7: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 7

CCD Camera has discrete elts

Lens collects light rays

CCD elts replace chemicals of film

Number of elts less than with film (so far)

Page 8: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 8

Intensities near center of eye

Page 9: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 9

Camera + Programs = Display Camera

inputs to frame buffer

Program can interpret data

Program can add graphics

Program can add imagery

Page 10: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 10

Some image format issues

Spatial resolution; intensity resolution; image file format

Page 11: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 11

Resolution is “pixels per unit of length”

Resolution decreases by one half in cases at left

Human faces can be recognized at 64 x 64 pixels per face

Page 12: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 12

Features detected depend on the resolution

Can tell hearts from diamonds

Can tell face value

Generally need 2 pixels across line or small region (such as eye)

Page 13: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 13

Human eye as a spherical camera 100M sensing elts in

retina Rods sense intensity Cones sense color Fovea has tightly

packed elts, more cones

Periphery has more rods

Focal length is about 20mm

Pupil/iris controls light entry

• Eye scans, or saccades to image details on fovea

• 100M sensing cells funnel to 1M optic nerve connections to the brain

Page 14: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 14

Image processing operations

Thresholding;Edge detection;

Motion field computation

Page 15: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 15

Find regions via thresholding

Region has brighter or darker or redder color, etc.

If pixel > threshold then pixel = 1 else pixel = 0

Page 16: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 16

Example red blood cell image

Many blood cells are separate objects

Many touch – bad! Salt and pepper

noise from thresholding

How useable is this data?

Page 17: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 17

Robot vehicle must see stop sign

sign = imread('Images/stopSign.jpg','jpg'); red = (sign(:, :, 1)>120) & (sign(:,:,2)<100) & (sign(:,:,3)<80); out = red*200; imwrite(out, 'Images/stopRed120.jpg', 'jpg')

Page 18: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 18

Thresholding is usually not trivial

Page 19: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 19

Can cluster pixels by color similarity and by adjacency

Original RGB Image Color Clusters by K-Means

Page 20: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 20

Some image processing ops

Finding contrast in an image; using neighborhoods of pixels;

detecting motion across 2 images

Page 21: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 21

Differentiate to find object edges For each pixel,

compute its contrast

Can use max difference of its 8 neighbors

Detects intensity change across boundary of adjacent regions

LOG filter later on

Page 22: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 22

4 and 8 neighbors of a pixel 4 neighbors are at

multiples of 90 degrees

. N . W * E . S .

8 neighbors are at every multiple of 45 degrees

NW N NE W * E SW S SE

Page 23: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 23

Detect Motion via Subtraction

Constant background

Moving object Produces

pixel differences at boundary

Reveals moving object and its shapeDifferences computed over

time rather than over space

Page 24: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 24

Two frames of aerial imagery

Video frame N and N+1 shows slight movement: most pixels are same, just in different locations.

Page 25: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 25

Best matching blocks between video frames N+1 to N (motion vectors)

The bulk of the vectors show the true motion of the airplane taking the pictures. The long vectors are incorrect motion vectors, but they do work well for compression of image I2!

Best matches from 2nd to first image shown as vectors overlaid on the 2nd image. (Work by Dina Eldin.)

Page 26: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 26

Gradient from 3x3 neighborhoodEstimate both magnitude and direction of the edge.

Page 27: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 27

Prewitt versus Sobel masks

Sobel mask uses weights of 1,2,1 and -1,-2,-1 in order to give more weight to center estimate. The scaling factor is thus 1/8 and not 1/6.

Page 28: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 28

Computational short cuts

Page 29: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 29

2 rows of intensity vs difference

Page 30: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 30

“Masks” show how to combine neighborhood values

Multiply the mask by the image neighborhood to get first derivatives of intensity versus x and versus y

Page 31: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 31

Curves of contrasting pixels

Page 32: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 32

Boundaries not always found well

Page 33: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 33

Canny boundary operator

Page 34: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 34

LOG filter creates zero crossing at step edges (2nd der. of Gaussian)

3x3 mask applied at each image position

Detects spots

Detects steps edgesMarr-Hildreth theory of edge detection

Page 35: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 35

G(x,y): Mexican hat filter

Page 36: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 36

Positive center

Negative surround

Page 37: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 37

Properties of LOG filter Has zero response on constant

region Has zero response on intensity ramp Exaggerates a step edge by making

it larger Responds to a step in any direction

across the “receptive field”. Responds to a spot about the size of

the center

Page 38: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 38

Human receptive field is analogous to a mask

X[j] are the image intensities.

W[j] are gains (weights) in the mask

Page 39: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 39

Human receptive fields amplify contrast

Page 40: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 40

3D neural network in brain

Level j

Level j+1

Page 41: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 41

Mach band effect shows human bias

Page 42: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 42

Human bias and illusions supports receptive field theory of edge detection

Page 43: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 43

Human brain as a network 100B neurons, or nodes Half are involved in vision 10 trillion connections Neuron can have “fanout” of 10,000 Visual cortex highly structured to

process 2D signals in multiple ways

Page 44: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 44

Color and shading Used heavily in human

vision Color is a pixel property,

making some recognition problems easy

Visible spectrum for humans is 400nm (blue) to 700 nm (red)

Machines can “see” much more; ex. X-rays, infrared, radio waves

Page 45: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 45

Imaging Process (review)

Page 46: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 46

Factors that Affect Perception

• Light: the spectrum of energy that illuminates the object surface

• Reflectance: ratio of reflected light to incoming light

• Specularity: highly specular (shiny) vs. matte surface

• Distance: distance to the light source

• Angle: angle between surface normal and light source

• Sensitivity how sensitive is the sensor

Page 47: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 47

Some physics of color

White light is composed of all visible frequencies (400-700)

Ultraviolet and X-rays are of much smaller wavelength

Infrared and radio waves are of much longer wavelength

Page 48: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 48

Models of Reflectance

We need to look at models for the physics of illumination and reflection that will1. help computer vision algorithms extract information about the 3D world, and2. help computer graphics algorithms render realistic images of model scenes.

Physics-based vision is the subarea of computer visionthat uses physical models to understand image formationin order to better analyze real-world images.

Page 49: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 49

The Lambertian Model:Diffuse Surface Reflection

A diffuse reflectingsurface reflects lightuniformly in all directions

Uniform brightness forall viewpoints of a planarsurface.

Page 50: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 50

Real matte objects

Light from ring around camera lens

Page 51: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 51

Specular reflection is highly directional and mirrorlike.

R is the ray of reflectionV is direction from the surface toward the viewpoint is the shininess parameter

Page 52: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 52

CV: Perceiving 3D from 2D

Many cues from 2D images enable

interpretation of the structure of the 3D

world producing them

Page 53: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 53

Many 3D cues

How can humans and other machines reconstruct the 3D nature of a scene from 2D images?

What other world knowledge needs to be added in the process?

Page 54: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 54

Labeling image contours interprets the 3D scene structure

+

An egg and a thin cup on a table top lighted from the top right

Logo on cup is a “mark” on the material

“shadow” relates to illumination, not material

Page 55: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 55

“Intrinsic Image” stores 3D info in “pixels” and not intensity.

For each point of the image, we want depth to the 3D surface point, surface normal at that point, albedo of the surface material, and illumination of that surface point.

Page 56: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 56

3D scene versus 2D image

Creases Corners Faces Occlusions (for

some viewpoint)

Edges Junctions Regions Blades, limbs, T’s

Page 57: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 57

Labeling of simple polyhedra

Labeling of a block floating in space. BJ and KI are convex creases. Blades AB, BC, CD, etc model the occlusion of the background. Junction K is a convex trihedral corner. Junction D is a T-junction modeling the occlusion of blade CD by blade JE.

Page 58: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 58

Trihedral Blocks World Image Junctions: only 16 cases!

Only 16 possible junctions in 2D formed by viewing 3D corners formed by 3 planes and viewed from a general viewpoint! From top to bottom: L-junctions, arrows, forks, and T-junctions.

Page 59: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 59

How do we obtain the catalog?

think about solid/empty assignments to the 8 octants about the X-Y-Z-origin

think about non-accidental viewpoints

account for all possible topologies of junctions and edges

then handle T-junction occlusions

Page 60: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 60

Blocks world labeling

Left: block floating in space

Right: block glued to a wall at the back

Page 61: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 61

Try labeling these: interpret the 3D structure, then label parts

What does it mean if we can’t label them? If we can label them?

Page 62: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 62

1975 researchers very excited very strong constraints on

interpretations several hundred in catalogue when

cracks and shadows allowed (Waltz): algorithm works very well with them

but, world is not made of blocks! later on, curved blocks world work

done but not as interesting

Page 63: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 63

Backtracking or interpretation tree

Page 64: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 64

“Necker cube” has multiple interpretations

A human staring at one of these cubes typically experiences changing interpretations. The interpretation of the two forks (G and H) flip-flops between “front corner” and “back corner”. What is the explanation?

Label the different interpretations

Page 65: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 65

Depth cues in 2D images

Page 66: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 66

“Interposition” cue

Def: Interposition occurs when one object occludes another object, thus indicating that the occluding object is closer to the viewer than the occluded object.

Page 67: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 67

interposition

• T-junctions indicate occlusion: top is occluding edge while bar is the occluded edge

• Bench occludes lamp post

• leg occludes bench

• lamp post occludes fence

• railing occludes trees

• trees occlude steeple

Page 68: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 68

• Perspective scaling: railing looks smaller at the left; bench looks smaller at the right; 2 steeples are far away

• Forshortening: the bench is sharply angled relative to the viewpoint; image length is affected accordingly

Page 69: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 69

Texture gradient reveals surface orientation

Texture Gradient: change of image texture along some direction, often corresponding to a change in distance or orientation in the 3D world containing the objects creating the texture.

( In East Lansing, we call it “corn” not “maize’. )

Note also that the rows appear to converge in 2D

Page 70: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 70

3D Cues from Perspective

Page 71: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 71

3D Cues from perspective

Page 72: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 72

More 3D cues

Virtual lines Falsely perceived interposition

Page 73: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 73

Irving Rock: The Logic of Perception; 1982

Summarized an entire career in visual psychology

Concluded that the human visual system acts as a problem-solver

Triangle unlikely to be accidental; must be object in front of background; must be brighter since it’s closer

Page 74: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 74

More 3D cues

2D alignment usually means 3d alignment

2D image curves create perception of 3D surface

Page 75: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 75

“structured light” can enhance surfaces in industrial vision

Sculpted object Potatoes with light stripes

Page 76: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 76

Models of Reflectance

We need to look at models for the physics of illumination and reflection that will1. help computer vision algorithms extract information about the 3D world, and2. help computer graphics algorithms render realistic images of model scenes.

Physics-based vision is the subarea of computer visionthat uses physical models to understand image formationin order to better analyze real-world images.

Page 77: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 77

The Lambertian Model:Diffuse Surface Reflection

A diffuse reflectingsurface reflects lightuniformly in all directions

Uniform brightness forall viewpoints of a planarsurface.

Page 78: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 78

Shape (normals) from shading

Cylinder with white paper and pen stripes

Intensities plotted as a surface

Clearly intensity encodes shape in this case

Page 79: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 79

Shape (normals) from shading

Plot of intensity of one image row reveals the 3D shape of these diffusely reflecting objects.

Page 80: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 80

Specular reflection is highly directional and mirrorlike.

R is the ray of reflectionV is direction from the surface toward the viewpoint is the shininess parameter

Page 81: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 81

What about models for recognition

“recognition” = to know again;

How does memory store models of faces, rooms,

chairs, etc.?

Page 82: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 82

Human capability extensive

Child age 6 might recognize 3000 words

And 30,000 objects Junkyard robot must recognize

nearly all objects Hundreds of styles of lamps,

chairs, tools, …

Page 83: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 83

Some methods: recognize Via geometric alignment: CAD Via trained neural net Via parts of objects and how they

join Via the function/behavior of an

object

Page 84: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 84

Side view classes of Ford Taurus (Chen and Stockman)

These were made in the PRIP Lab from a scale model.

Viewpoints in between can be generated from x and y curvature stored on boundary.

Viewpoints matched to real image boundaries via optimization.

Page 85: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 85

Matching image edges to model limbs

Could recognize car model at stoplight or gate.

Page 86: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 86

Object as parts + relations

Parts have size, color, shape Connect together at concavities Relations are: connect, above, right of, inside of,

Page 87: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 87

Functional models Inspired by JJ Gibson’s Theory of

Affordances. An object is what an object does * container: holds stuff * club: hits stuff * chair: supports humans

Page 88: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 88

Louise Stark: chair model Dozens of CAD models of chairs Program analyzed for * stable pose * seat of right size * height off ground right size * no obstruction to body on seat * program would accept a trash can (which could also pass as a container)

Page 89: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 89

Minski’s theory of frames(Schank’s theory of scripts)

Frames are learned expectations – frame for a room, a car, a party, an argument, …

Frame is evoked by current situation – how? (hard)

Human “fills in” the details of the current frame (easier)

Page 90: Stockman MSU/CSE Fall 20081 Computer Vision and Human Perception A brief intro from an AI perspective

Stockman MSU/CSE Fall 2008 90

summary Images have many low level features Can detect uniform regions and contrast Can organize regions and boundaries Human vision uses several simultaneous

channels: color, edge, motion Use of models/knowledge diverse and

difficult Last 2 issues difficult in computer vision