![Page 1: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/1.jpg)
Computer Vision Group, University of California Berkeley
Recognizing objects and actions in images and video
Jitendra Malik
U.C. Berkeley
![Page 2: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/2.jpg)
Collaborators
• Grouping: Jianbo Shi, Serge Belongie, Thomas Leung
• Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren
• Recognition: Serge Belongie, Jan Puzicha, Greg Mori
![Page 3: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/3.jpg)
Motivation
• Interaction with multimedia needs to be in terms that are significant to humans: objects, actions, events, ...
• Feature-based CBIR systems (e.g. color histograms) are a failure for this purpose
• Presently, the best solutions use text, speech recognition, or human annotation
• Goal: Auto-annotation
![Page 4: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/4.jpg)
From images/video to objects
Labeled sets: tiger, grass, etc.
![Page 5: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/5.jpg)
![Page 6: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/6.jpg)
![Page 7: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/7.jpg)
Consistency
[Figure: three human segmentations of the same image, labeled A, B and C]
• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept
• Attention accounts for differences
Perceptual organization forms a tree:
[Figure: segmentation tree. Image → BG, L-bird, R-bird; BG → grass, bush, far; L-bird → head, eye, beak, body; R-bird → head, eye, beak, body]
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
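The refinement relations on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: a segmentation is taken to be a flat list of region labels (one per pixel), and "mutual refinement" is read as "at every pixel, one segmentation's region is contained in the other's". The function names and the toy label lists are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def regions(seg):
    """Map each region label to the set of pixel indices it covers."""
    out = defaultdict(set)
    for idx, label in enumerate(seg):
        out[label].add(idx)
    return out


def is_refinement(fine, coarse):
    """True if every region of `fine` lies inside a single region of `coarse`."""
    if len(fine) != len(coarse):
        raise ValueError("segmentations must cover the same pixels")
    parents = defaultdict(set)
    for f_label, c_label in zip(fine, coarse):
        parents[f_label].add(c_label)
    return all(len(p) == 1 for p in parents.values())


def are_mutual_refinements(seg_a, seg_b):
    """True if, at every pixel, one region is a subset of the other (assumed reading)."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must cover the same pixels")
    ra, rb = regions(seg_a), regions(seg_b)
    for la, lb in set(zip(seg_a, seg_b)):
        a, b = ra[la], rb[lb]
        if not (a <= b or b <= a):
            return False
    return True


# Toy 6-pixel example: A splits the right half of B, C splits the left half.
B = [0, 0, 0, 1, 1, 1]
A = [0, 0, 0, 1, 1, 2]
C = [0, 1, 1, 2, 2, 2]
D = [0, 0, 1, 1, 1, 1]  # cuts across B's boundary, so it is inconsistent with B
```

Here A and C both refine B and are mutual refinements of each other, although neither refines the other globally; D fails the test against B.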
![Page 8: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/8.jpg)
What enables us to parse a scene?
– Low level cues
• Color/texture
• Contours
• Motion
– Mid level cues
• T-junctions
• Convexity
– High level cues
• Familiar Object
• Familiar Motion
![Page 9: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/9.jpg)
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
![Page 10: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/10.jpg)
Finding boundaries: Is texture a problem or a solution?
[Figure panels: image, orientation energy]
![Page 11: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/11.jpg)
Statistically optimal contour detection
• Use humans to segment a large collection of natural images.
• Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.
![Page 12: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/12.jpg)
Orientation Energy
• Gaussian 2nd derivative and its Hilbert pair
• $OE = (I * f_{odd})^2 + (I * f_{even})^2$
• Can detect combination of bar and edge features; also insensitive to linear shading [Perona & Malik 90]
• Multiple scales
![Page 13: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/13.jpg)
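Texture gradient = Chi-square distance between texton histograms in half disks across edge
$$\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$
[Figure: texton histograms i, j, k with example chi-square values 0.1 and 0.8]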
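The chi-square distance between two texton histograms named on this slide is easy to write down directly. The sketch below assumes histograms are plain lists of non-negative bin values of equal length, and it skips bins that are empty in both histograms to avoid dividing by zero; that convention and the function name are assumptions, not something the slide specifies.

```python
def chi_square_distance(h_i, h_j):
    """0.5 * sum over bins of (h_i - h_j)^2 / (h_i + h_j)."""
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:  # a bin empty in both histograms contributes nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


# Identical histograms are at distance 0; normalized histograms with
# disjoint support are at the maximum distance of 1.
same = chi_square_distance([0.5, 0.5], [0.5, 0.5])
disjoint = chi_square_distance([1.0, 0.0], [0.0, 1.0])
```

For normalized histograms the value lies between 0 and 1, which matches the scale of the example values shown on the slide.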
![Page 14: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/14.jpg)
![Page 15: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/15.jpg)
![Page 16: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/16.jpg)
![Page 17: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/17.jpg)
ROC curve for local boundary detection
![Page 18: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/18.jpg)
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
![Page 19: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/19.jpg)
Biological Shape
• D’Arcy Thompson: On Growth and Form, 1917
– studied transformations between shapes of organisms
![Page 20: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/20.jpg)
Deformable Templates: Related Work
• Fischler & Elschlager (1973)
• Grenander et al. (1991)
• von der Malsburg (1993)
![Page 21: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/21.jpg)
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
![Page 22: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/22.jpg)
Comparing Pointsets
![Page 23: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/23.jpg)
Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Compact representation of distribution of points relative to each point
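A minimal sketch of the bin counting described on this slide follows. The specific choices here are assumptions made for illustration: 5 log-spaced radial bins and 12 angular bins, radii normalized by the mean pairwise distance between points, and points falling outside the radial range clamped into the nearest bin rather than dropped. The function names are hypothetical.

```python
import math


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered point pairs."""
    n = len(points)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += math.hypot(points[a][0] - points[b][0],
                                points[a][1] - points[b][1])
    return total / (n * (n - 1) / 2)


def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points relative to points[index]."""
    scale = mean_pairwise_distance(points)  # normalizing gives scale invariance
    px, py = points[index]
    hist = [0] * (n_r * n_theta)
    log_span = math.log(r_outer / r_inner)
    for k, (x, y) in enumerate(points):
        if k == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / scale
        if r <= r_inner:
            r_bin = 0  # clamp very close points into the innermost ring
        else:
            # clamp far points into the outermost ring (a simplification)
            r_bin = min(n_r - 1, int(n_r * math.log(r / r_inner) / log_span))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin * n_theta + t_bin] += 1
    return hist


pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
shifted = [(x + 10, y - 7) for x, y in pts]
scaled = [(2 * x, 2 * y) for x, y in pts]
h = shape_context(pts, 0)
```

Because offsets are taken relative to the reference point and radii are divided by the mean pairwise distance, the histogram is unchanged when the whole point set is translated or uniformly scaled.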
![Page 24: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/24.jpg)
Shape Context
![Page 25: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/25.jpg)
Shape Contexts
• Invariant under translation and scale
• Can be made invariant to rotation by using local tangent orientation frame
• Tolerant to small affine distortion
– Log-polar bins make spatial blur proportional to r
Cf. Spin Images (Johnson & Hebert) - range image registration
![Page 26: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/26.jpg)
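Comparing Shape Contexts
Compute matching costs using chi-square distance between the K-bin histograms $h_i$ and $h_j$ at points i and j:
$$C_{ij} = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$
Recover correspondences by solving the linear assignment problem with costs $C_{ij}$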
[Jonker & Volgenant 1987]
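To make the assignment step concrete, here is a deliberately naive sketch that finds the minimum-cost one-to-one correspondence by trying every permutation. It is a stand-in for illustration only: it is not the Jonker & Volgenant algorithm cited on the slide and is only feasible for a handful of points. The cost values are made up for the example.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force linear assignment: cost[i][j] is the cost of matching i to j."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical 3x3 matching-cost matrix (model points in rows, target in columns).
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.6, 0.8, 0.3],
]
match, total = best_assignment(cost)  # match[i] is the target index for model point i
```

For realistic point counts a polynomial-time solver is needed; SciPy's `scipy.optimize.linear_sum_assignment` is one readily available option.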
![Page 27: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/27.jpg)
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
![Page 28: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/28.jpg)
Fast pruning
• Find best match for the shape context at only a few random points and add up cost
$$\mathrm{dist}(S_{query}, S_i) = \sum_{u=1}^{r} \chi^2\left(SC_{query}^{u}, SC_i^{*}\right), \qquad SC_i^{*} = \arg\min_{j} \chi^2\left(SC_{query}^{u}, SC_i^{j}\right)$$
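The pruning step can be sketched as follows: sample a few shape contexts from the query, find the best chi-square match for each among a stored shape's shape contexts, and add up the costs. Everything concrete here is an assumption for illustration: the function names, the dictionary-based database, the fixed random seed, and the toy histograms.

```python
import random


def chi_square(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def pruning_distance(query_scs, candidate_scs, sample):
    """Sum, over sampled query points, of the best-match cost in the candidate."""
    return sum(
        min(chi_square(query_scs[u], sc) for sc in candidate_scs)
        for u in sample
    )


def shortlist(query_scs, database, r=3, keep=2, seed=0):
    """Rank stored shapes by pruning distance and keep the closest few."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(query_scs)), min(r, len(query_scs)))
    scored = sorted(
        (pruning_distance(query_scs, scs, sample), name)
        for name, scs in database.items()
    )
    return [name for _, name in scored[:keep]]


# Toy data: each shape is a list of (tiny) shape context histograms.
query = [[4, 0, 1], [0, 3, 2], [1, 1, 3], [2, 2, 1]]
database = {
    "same": [list(h) for h in query],
    "other": [[0, 5, 0], [0, 0, 5]],
}
top = shortlist(query, database, keep=1)
```

Only the shapes that survive this cheap pass would go on to the full correspondence and alignment steps.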
![Page 29: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/29.jpg)
Snodgrass Results
![Page 30: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/30.jpg)
Results
![Page 31: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/31.jpg)
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
![Page 32: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/32.jpg)
Thin Plate Spline Model
• 2D counterpart to cubic spline
• Minimizes bending energy
• Solve by inverting linear system
• Can be regularized when data is inexact
Duchon (1977), Meinguet (1979), Wahba (1991)
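The equations on this slide did not survive extraction. For reference, the standard thin plate spline formulation is written out below; the symbols ($w_i$, $a$, $K$, $P$, $v$, $\lambda$) follow common textbook notation and are an assumption about what the slide showed, not a transcription of it.

```latex
% Interpolant for one coordinate of the warp, with n control points (x_i, y_i)
f(x, y) = a_1 + a_x x + a_y y
        + \sum_{i=1}^{n} w_i \, U\big(\lVert (x_i, y_i) - (x, y) \rVert\big),
\qquad U(r) = r^2 \log r^2

% Bending energy minimized by f
I_f = \iint_{\mathbb{R}^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx \, dy

% Linear system for the weights w and the affine part a, given target values v
\begin{pmatrix} K + \lambda I & P \\ P^{T} & 0 \end{pmatrix}
\begin{pmatrix} w \\ a \end{pmatrix}
=
\begin{pmatrix} v \\ 0 \end{pmatrix},
\qquad K_{ij} = U\big(\lVert (x_i, y_i) - (x_j, y_j) \rVert\big),
\quad P_{i\cdot} = (1, x_i, y_i)
```

Setting $\lambda = 0$ gives exact interpolation of the correspondences; $\lambda > 0$ gives the regularized fit mentioned in the last bullet.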
![Page 33: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/33.jpg)
Matching Example
[Figure: model and target shapes]
![Page 34: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/34.jpg)
Outlier Test Example
![Page 35: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/35.jpg)
Synthetic Test Results
[Figure panels: Fish, deformation + noise; Fish, deformation + outliers. Methods compared: ICP, Shape Context, Chui & Rangarajan]
![Page 36: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/36.jpg)
Terms in Similarity Score
• Shape Context difference
• Local Image appearance difference
– orientation
– gray-level correlation in Gaussian window
– … (many more possible)
• Bending energy
![Page 37: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/37.jpg)
Object Recognition Experiments
• Handwritten digits
• COIL 3D objects (Nayar-Murase)
• Human body configurations
• Trademarks
![Page 38: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/38.jpg)
Handwritten Digit Recognition
• MNIST 60 000:
– linear: 12.0%
– 40 PCA + quad: 3.3%
– 1000 RBF + linear: 3.6%
– K-NN: 5%
– K-NN (deskewed): 2.4%
– K-NN (tangent dist.): 1.1%
– SVM: 1.1%
– LeNet 5: 0.95%
• MNIST 600 000 (distortions):
– LeNet 5: 0.8%
– SVM: 0.8%
– Boosted LeNet 4: 0.7%
• MNIST 20 000:
– K-NN, Shape Context matching: 0.63%
![Page 39: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/39.jpg)
![Page 40: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/40.jpg)
Results: Digit Recognition
1-NN classifier using: Shape context + 0.3 * bending + 1.6 * image appearance
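The weighted score above can be sketched as a tiny 1-NN rule. The weights 0.3 and 1.6 come from the slide; the function names, the tuple layout, and the per-exemplar term values are hypothetical and chosen only to show the arithmetic.

```python
def similarity_score(sc_diff, bending, appearance_diff, w_bend=0.3, w_app=1.6):
    """Combined distance: shape context + 0.3 * bending + 1.6 * image appearance."""
    return sc_diff + w_bend * bending + w_app * appearance_diff


def classify_1nn(candidates):
    """candidates: (label, sc_diff, bending, appearance_diff) per stored exemplar."""
    best = min(candidates, key=lambda c: similarity_score(c[1], c[2], c[3]))
    return best[0]


# Hypothetical term values for one query digit against two stored exemplars.
candidates = [
    ("3", 0.20, 0.50, 0.10),  # score 0.20 + 0.15 + 0.16 = 0.51
    ("8", 0.15, 0.90, 0.30),  # score 0.15 + 0.27 + 0.48 = 0.90
]
label = classify_1nn(candidates)
```

Note that the exemplar with the smaller shape context term alone ("8") loses once the bending and appearance terms are added in.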
![Page 41: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/41.jpg)
COIL Object Database
![Page 42: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/42.jpg)
Error vs. Number of Views
![Page 43: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/43.jpg)
Prototypes Selected for 2 Categories
Details in Belongie, Malik & Puzicha (NIPS 2000)
![Page 44: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/44.jpg)
Editing: K-medoids
• Input: similarity matrix
• Select: K prototypes
• Minimize: mean distance to nearest prototype
• Algorithm:
– iterative
– split cluster with most errors
• Result: Adaptive distribution of resources (cf. aspect graphs)
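As a rough illustration of prototype selection, the sketch below runs a plain alternating K-medoids on a precomputed distance matrix. It is not the split-based variant the slide describes: it starts from the first K items and alternates assignment and medoid update until nothing changes. It also assumes dissimilarities rather than similarities as input. Names and data are hypothetical.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a square distance matrix; returns medoid indices."""
    n = len(dist)
    medoids = list(range(k))  # simple deterministic start (an assumption)
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in range(n):
            nearest = min(medoids, key=lambda m: dist[p][m])
            clusters[nearest].append(p)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]  # keep the old medoid if a cluster empties
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][q] for q in members))
            )
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def mean_distance_to_prototype(dist, medoids):
    """The objective from the slide: mean distance to the nearest prototype."""
    return sum(min(dist[p][m] for m in medoids) for p in range(len(dist))) / len(dist)


# Toy 1D data with two obvious groups.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
prototypes = k_medoids(dist, 2)
```

On this toy data the procedure settles on the middle item of each group as its prototype.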
![Page 45: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/45.jpg)
Error vs. Number of Views
![Page 46: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/46.jpg)
Human body configurations
![Page 47: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/47.jpg)
Deformable Matching
• Kinematic chain-based deformation model
• Use iterations of correspondence and deformation
• Keypoints on exemplars are deformed to locations on query image
![Page 48: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/48.jpg)
Results
![Page 49: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/49.jpg)
Trademark Similarity
![Page 50: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/50.jpg)
Recognizing objects in scenes
![Page 51: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/51.jpg)
Shape matching using multi-scale scanning
• Shape context computation (10 Mops)
– Scales * key-points * contour-points (10*100*10000)
• Multi scale coarse matching (100 Gops)
– Scales * objects * views * samples * key-points * dim-sc (10*1000*10*100*100*100)
• Deform into alignment (1 Gops)
– Image-objects * shortlist * (samples)^2 * dim-sc (10*100*10000*100)
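The operation counts on this slide are simple products of the listed factors, which a few lines of arithmetic confirm. The dictionary keys are just labels chosen for this check.

```python
from math import prod

# Factors exactly as listed on the slide for each stage.
stages = {
    "shape context computation": [10, 100, 10_000],
    "multi-scale coarse matching": [10, 1000, 10, 100, 100, 100],
    "deform into alignment": [10, 100, 10_000, 100],
}

ops = {name: prod(factors) for name, factors in stages.items()}

for name, count in ops.items():
    print(f"{name}: {count:.0e} operations")
```

The products are 10^7 (10 Mops), 10^11 (100 Gops) and 10^9 (1 Gops), so coarse matching dominates the cost by two orders of magnitude.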
![Page 52: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/52.jpg)
Shape matching using grouping
• Complexity-determining step: find approximate nearest neighbors of 10^2 query points in a set of 10^6 stored points in the 100-dimensional space of shape contexts.
• Naïve bound of 10^9 can be much improved using ideas from theoretical CS (Johnson-Lindenstrauss, Indyk-Motwani, etc.)
![Page 53: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/53.jpg)
Putting grouping/segmentation on a sound foundation
• Construct a dataset of human segmented images
• Measure the conditional probability distribution of various Gestalt grouping factors
• Incorporate these in an inference algorithm
![Page 54: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/54.jpg)
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
![Page 55: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/55.jpg)
Examples of Actions
• Movement and posture change
– run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
• Object manipulation
– pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
• Conversational gesture
– point, …
• Sign Language
![Page 56: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/56.jpg)
Activities and Situation Classification
• Example: Withdrawing money from an ATM
• Activities constructed by composing actions. Partial order plans may be a good model.
• Activities may involve multiple agents
• Detecting unusual situations or activity patterns is facilitated by the video activity transform
![Page 57: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/57.jpg)
On the difficulty of action detection
• Number of categories
• Intra-Category variation
– Clothing
– Illumination
– Person performing the action
• Inter-Category variation
![Page 58: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/58.jpg)
Objects in space
• Segment/Region-of-interest
• Features (points, curves, wavelet coefficients, …)
• Correspondence and deform into alignment
• Recover parameters of generative model
• Discriminative classifier
Actions in spacetime
• Segment/volume-of-interest
• Features (points, curves, wavelets, motion vectors, …)
• Correspondence and deform into alignment
• Recover parameters of generative model
• Discriminative classifier
![Page 59: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/59.jpg)
Key cues for action recognition
• “Morpho-kinesics” of action (shape and movement of the body)
• Identity of the object/s
• Activity context
![Page 60: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/60.jpg)
Image/Video → Stick figure → Action
• Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
– 2D joint positions
– 3D joint positions
– Joint angles
• Complete representation
• Evidence that it is effectively computable
![Page 61: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/61.jpg)
Tracking by Repeated Finding
![Page 62: Computer Vision Group](https://reader036.vdocuments.net/reader036/viewer/2022062302/58a01cc31a28ab746f8b5036/html5/thumbnails/62.jpg)
Achievable goals in 3 years
• Reasonable competence at object recognition at crude category level (~1000)
• Detection/Tracking of humans as kinematic chains, assuming adequate resolution.
• Recognition of ~10-100 actions and compositions thereof.