recognizing action at a distance a.a. efros, a.c. berg, g. mori, j. malik uc berkeley
![Page 1: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/1.jpg)
Recognizing Action at a Distance
A.A. Efros, A.C. Berg, G. Mori, J. Malik
UC Berkeley
![Page 2: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/2.jpg)
Looking at People
Far field
• 3-pixel man
• Blob tracking
– vast surveillance literature
Near field
• 300-pixel man
• Limb tracking
– e.g. Yacoob & Black, Rao & Shah, etc.
![Page 3: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/3.jpg)
Medium-field Recognition
The 30-Pixel Man
![Page 4: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/4.jpg)
Appearance vs. Motion
Jackson Pollock, Number 21 (detail)
![Page 5: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/5.jpg)
Goals
• Recognize human actions at a distance
– Low resolution, noisy data
– Moving camera, occlusions
– Wide range of actions (including non-periodic)
![Page 6: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/6.jpg)
Our Approach
• Motion-based approach
– Non-parametric; use large amount of data
– Classify a novel motion by finding the most similar motion from the training set
• Related Work
– Periodicity analysis
• Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
– Model-free
• Temporal Templates [Bobick & Davis]
• Orientation histograms [Freeman et al.; Zelnik & Irani]
• Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]
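The classification rule in the approach above (label a novel motion with the label of its most similar training motion) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `classify_by_best_match`, the similarity values, and the three-label training set are all hypothetical, and the similarity matrix is assumed to be precomputed.

```python
import numpy as np

def classify_by_best_match(similarity, train_labels):
    """Label each query item with the label of its most similar training item.

    similarity: array of shape (n_query, n_train); larger means more similar.
    train_labels: sequence of n_train labels.
    """
    similarity = np.asarray(similarity, dtype=float)
    best = similarity.argmax(axis=1)  # index of the best training match per query
    return [train_labels[j] for j in best]

# Hypothetical similarities between 2 query motions and 3 training motions.
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8]])
labels = ["run", "walk left", "swing"]
predicted = classify_by_best_match(sim, labels)
```

A k-nearest-neighbor vote would be a natural variation; the single best match is used here only because that is what the slide states.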
![Page 7: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/7.jpg)
Gathering action data
• Tracking
– Simple correlation-based tracker
– User-initialized
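One way a correlation-based tracker can work is to take a template from the user-initialized frame and, in each new frame, pick the window with the highest normalized cross-correlation. The sketch below assumes grayscale frames stored as 2D arrays and an exhaustive search; the slide does not specify the tracker's details, so the function name and the search strategy are assumptions.

```python
import numpy as np

def track_by_correlation(frame, template):
    """Return (row, col) of the top-left corner where the template best
    matches the frame under normalized cross-correlation (exhaustive search)."""
    frame = np.asarray(frame, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            # Flat patches have zero variance; score them as uncorrelated.
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Hypothetical example: a 2x2 pattern placed at row 2, column 3 of a blank frame.
template = np.array([[1.0, 2.0], [3.0, 4.0]])
frame = np.zeros((6, 6))
frame[2:4, 3:5] = template
position = track_by_correlation(frame, template)
```

In practice the search would be restricted to a neighborhood of the previous position rather than the whole frame.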
![Page 8: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/8.jpg)
Figure-centric Representation
• Stabilized spatio-temporal volume
– No translation information
– All motion caused by person’s limbs
• Good news: indifferent to camera motion
• Bad news: hard!
• Good test to see if actions, not just translation, are being captured
![Page 9: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/9.jpg)
Remembrance of Things Past
• “Explain” novel motion sequence by matching to previously seen video clips
– For each frame, match based on some temporal extent
Challenge: how to compare motions?
[Figure: input sequence, motion analysis, database; action labels: run, walk left, swing, walk right, jog]
![Page 10: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/10.jpg)
How to describe motion?
• Appearance
– Not preserved across different clothing
• Gradients (spatial, temporal)
– Same problem (e.g. contrast reversal)
• Edges/Silhouettes
– Too unreliable
• Optical flow
– Explicitly encodes motion
– Least affected by appearance
– …but too noisy
![Page 11: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/11.jpg)
Spatial Motion Descriptor
Image frame → optical flow $F_{x,y}$ → $F_x, F_y$ → $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
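The pipeline on this slide splits the flow field into horizontal and vertical components, half-wave rectifies each into a positive and a negative channel, and blurs the four results. A rough sketch follows, assuming the optical flow has already been computed by some other means. The box filter is a simple stand-in for whatever blur the authors used, and the names `box_blur` and `motion_channels` are hypothetical.

```python
import numpy as np

def box_blur(channel, radius=1):
    """Blur a 2D array with a (2*radius+1)^2 box filter, edge-padded."""
    channel = np.asarray(channel, dtype=float)
    padded = np.pad(channel, radius, mode="edge")
    size = 2 * radius + 1
    h, w = channel.shape
    out = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)

def motion_channels(flow_x, flow_y, radius=1):
    """Return the four blurred, half-wave rectified channels
    (Fx+, Fx-, Fy+, Fy-) stacked into an array of shape (4, H, W)."""
    fx = np.asarray(flow_x, dtype=float)
    fy = np.asarray(flow_y, dtype=float)
    rectified = [np.maximum(fx, 0), np.maximum(-fx, 0),
                 np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([box_blur(c, radius) for c in rectified])

# Hypothetical 2x2 flow field: horizontal motion only.
flow_x = np.array([[1.0, -2.0], [0.0, 3.0]])
flow_y = np.zeros((2, 2))
channels = motion_channels(flow_x, flow_y)
```

Rectifying before blurring keeps opposite motions from cancelling each other out, which is why the channels are kept separate rather than blurring the signed flow directly.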
![Page 12: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/12.jpg)
Spatio-temporal Motion Descriptor
[Figure: Sequence A and Sequence B along time t, with temporal extent E; frame-to-frame similarity matrix (A vs. B); I matrix and blurry I (E × E); motion-to-motion similarity matrix (A vs. B)]
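The step from frame-to-frame similarity to motion-to-motion similarity can be read as aggregating the frame-level matrix over a temporal window with an E × E kernel: an identity kernel sums similarities along the diagonal, and a blurred identity also tolerates small timing differences. The sketch below assumes per-frame descriptors flattened to vectors and compared with a dot product, and an odd temporal extent E; the function names and those choices are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def frame_similarity(desc_a, desc_b):
    """Frame-to-frame similarity between descriptors of shape (n_a, d) and (n_b, d)."""
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    return a @ b.T

def motion_similarity(frame_sim, kernel):
    """Aggregate a frame-to-frame matrix over time with an E x E kernel (E odd).

    out[i, j] = sum over (u, v) of kernel[u, v] * frame_sim[i + u - h, j + v - h],
    where h = E // 2 and out-of-range entries count as zero.
    """
    frame_sim = np.asarray(frame_sim, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    extent = kernel.shape[0]
    half = extent // 2
    n_a, n_b = frame_sim.shape
    padded = np.pad(frame_sim, half, mode="constant")
    out = np.zeros((n_a, n_b))
    for u in range(extent):
        for v in range(extent):
            out += kernel[u, v] * padded[u:u + n_a, v:v + n_b]
    return out

# Hypothetical example: 4 frames with orthogonal descriptors, identity kernel, E = 3.
descriptors = np.eye(4)
frame_sim = frame_similarity(descriptors, descriptors)
motion_sim = motion_similarity(frame_sim, np.eye(3))
```

Replacing `np.eye(3)` with a kernel that also has smaller off-diagonal weights gives the "blurry I" variant.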
![Page 13: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/13.jpg)
Football Actions: matching
[Figure: input sequence and matched frames, shown as input/matched pairs]
![Page 14: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/14.jpg)
Football Actions: classification
10 actions; 4500 total frames; 13-frame motion descriptor
![Page 15: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/15.jpg)
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
![Page 16: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/16.jpg)
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor.
Woman player used as training, man as testing.
![Page 17: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/17.jpg)
Classifying Tennis
• Red bars show classification results
![Page 18: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/18.jpg)
Querying the Database
[Figure: input sequence queried against the database]
Action Recognition: run, walk left, swing, walk right, jog
Joint Positions:
![Page 19: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/19.jpg)
2D Skeleton Transfer
• We annotate database with 2D joint positions
• After matching, transfer data to novel sequence
– Adjust the match for best fit
Input sequence:
Transferred 2D skeletons:
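Once each input frame has a best-matching database frame, transferring the annotation is essentially a lookup. The sketch below shows only that lookup; the "adjust the match for best fit" refinement is not implemented, and the data layout (a list of joint-name to (x, y) dictionaries) is an assumption made for illustration.

```python
def transfer_skeletons(best_match, database_joints):
    """Copy 2D joint annotations from matched database frames.

    best_match[i] is the database frame index matched to input frame i.
    database_joints[j] maps joint name -> (x, y) for database frame j.
    """
    return [dict(database_joints[j]) for j in best_match]

# Hypothetical annotations for two database frames.
database_joints = [
    {"head": (10, 2), "left_knee": (8, 20)},
    {"head": (11, 2), "left_knee": (12, 19)},
]
skeletons = transfer_skeletons([1, 0, 1], database_joints)
```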
![Page 20: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/20.jpg)
3D Skeleton Transfer
• We populate database with rendered stick figures from 3D Motion Capture data
• Matching as before, we get 3D joint positions (kind of)!
Input sequence:
Transferred 3D skeletons:
![Page 21: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/21.jpg)
“Do as I Do” Motion Synthesis
• Matching two things:
– Motion similarity across sequences
– Appearance similarity within sequence (like VideoTextures)
• Dynamic Programming
input sequence
synthetic sequence
![Page 22: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/22.jpg)
“Do as I Do”
[Figure: Source Motion, Source Appearance, Result]
3400 Frames
![Page 23: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/23.jpg)
“Do as I Say” Synthesis
• Synthesize given action labels
– e.g. video game control
[Figure: action labels (run, walk left, swing, walk right, jog) and the resulting synthetic sequence]
![Page 24: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/24.jpg)
“Do as I Say”
• Red box shows when constraint is applied
![Page 25: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/25.jpg)
Actor Replacement
SHOW VIDEO
![Page 26: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/26.jpg)
Conclusions
• In the medium field, action is about motion
• What we propose:
– A way of matching motions at coarse scale
• What we get out:
– Action recognition
– Skeleton transfer
– Synthesis: “Do as I Do” & “Do as I Say”
• What we learned?
– A lot to be said for the “little guy”!
![Page 27: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/27.jpg)
Thank You
![Page 28: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/28.jpg)
Smoothness for Synthesis
• $W_{act}$ is action similarity between source and target
• $W_{app}$ is appearance similarity within target frames
• For every source frame $i$, find best target frame $\pi_i$ by maximizing the following cost function:
$$\sum_{i=1}^{n} \alpha_{act} W_{act}(\pi_i, i) + \sum_{i=2}^{n} \alpha_{app} W_{app}(\pi_i, \pi_{i-1} + 1)$$
• Optimize using dynamic programming
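A cost of this shape, with one term per source frame plus one term linking consecutive choices, can be maximized exactly with a Viterbi-style dynamic program. The sketch below assumes the action similarity is stored as a matrix indexed (target frame, source frame) and the appearance similarity as a matrix over target frames. The pairwise term scores a chosen target frame against the frame that follows the previous choice in the target video, and is taken as zero when the previous choice is the last target frame. The function name, argument layout, and that boundary convention are assumptions.

```python
import numpy as np

def best_target_path(w_act, w_app, alpha_act=1.0, alpha_app=1.0):
    """Pick one target frame per source frame, maximizing total weighted
    action similarity plus appearance smoothness.

    w_act: (m, n) action similarity, target frame x source frame.
    w_app: (m, m) appearance similarity between target frames.
    Returns a list of n target frame indices.
    """
    w_act = np.asarray(w_act, dtype=float)
    w_app = np.asarray(w_app, dtype=float)
    m, n = w_act.shape
    # trans[p, q]: appearance score for choosing q right after p, i.e. the
    # similarity of q to frame p + 1 (zero if p is the last target frame).
    trans = np.zeros((m, m))
    trans[:-1, :] = w_app[:, 1:].T
    score = alpha_act * w_act[:, 0]
    back = np.zeros((n, m), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + alpha_app * trans  # cand[p, q]
        back[i] = cand.argmax(axis=0)              # best predecessor for each q
        score = cand.max(axis=0) + alpha_act * w_act[:, i]
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]

# Hypothetical example: 3 target frames, 3 source frames, matching in order.
path = best_target_path(np.eye(3), np.eye(3))
```

The run time is O(n m²) for n source frames and m target frames, since every pair of consecutive choices is scored once per source frame.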
![Page 29: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/29.jpg)
The Database Analogy
![Page 30: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/30.jpg)
Conclusions
• Action is about motion
• Purely motion-based descriptor for actions
• We treat optical flow
– Not as measurement of pixel displacement
– But as a set of noisy features that are carefully smoothed and aggregated
• Can handle very poor, noisy data
![Page 31: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/31.jpg)
Cool Video, Attempt II
![Page 32: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/32.jpg)
![Page 33: Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062716/56649dc65503460f94aba279/html5/thumbnails/33.jpg)
Comparing motion descriptors
[Figure: over time t, the frame-to-frame similarity matrix, I matrix, blurry I, and motion-to-motion similarity matrix]