![Page 1: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/1.jpg)
Action Recognition
Computer Vision
Jia-Bin Huang, Virginia Tech
Many slides from D. Hoiem
![Page 2: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/2.jpg)
This section: advanced topics
•Convolutional neural networks in vision
•Action recognition
•Vision and Language
•3D Scenes and Context
![Page 3: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/3.jpg)
What is an action?
Action: a transition from one state to another
• Who is the actor?
• How is the state of the actor changing?
• What (if anything) is being acted on?
• How is that thing changing?
• What is the purpose of the action (if any)?
![Page 4: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/4.jpg)
How do we represent actions?
Categories: walking, hammering, dancing, skiing, sitting down, standing up, jumping
Poses
Nouns and Predicates: <man, swings, hammer>, <man, hits, nail, w/ hammer>
![Page 5: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/5.jpg)
What is the purpose of action recognition?
To describe
![Page 6: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/6.jpg)
What is the purpose of action recognition?
•To predict
![Page 7: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/7.jpg)
What is the purpose of action recognition?
•To understand the intention and motivation
Predicting Motivations of Actions by Leveraging Text, CVPR 2016
![Page 8: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/8.jpg)
How can we identify actions?
Motion
Pose
Held Objects
Nearby Objects
![Page 9: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/9.jpg)
Representing Motion
Bobick & Davis 2001
Optical Flow with Motion History
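A minimal sketch of a motion history image, assuming motion is detected by simple frame differencing (an assumption; the slide pairs motion history with optical flow). The names `update_mhi`, `motion_history`, `tau`, and `diff_thresh` are illustrative and not from the source. Pixels that just moved are set to `tau`, and all other pixels decay by one per frame, so brighter values mark more recent motion.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=10, diff_thresh=0.1):
    """One motion-history step: moving pixels get tau, the rest decay by 1."""
    # Assumed motion mask: absolute frame difference above a threshold.
    moving = np.abs(frame - prev_frame) > diff_thresh
    return np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

def motion_history(frames, tau=10, diff_thresh=0.1):
    """Accumulate a motion history image over a list of grayscale frames."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev_frame, frame in zip(frames[:-1], frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, diff_thresh)
    return mhi

# Toy clip: one bright pixel moving left to right along a single row.
frames = []
for t in range(4):
    f = np.zeros((1, 5))
    f[0, t] = 1.0
    frames.append(f)

mhi = motion_history(frames, tau=10)
print(mhi)  # older motion on the left has decayed more than recent motion
```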
![Page 10: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/10.jpg)
Representing Motion
Space-Time Volumes
Blank et al. 2005
![Page 11: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/11.jpg)
Representing Motion
Efros et al. 2003
Optical Flow with Split Channels
optical flow split into pos/neg channels blurred pos/neg flow
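The split-channel idea can be sketched as follows, assuming the flow is given as two arrays `fx` and `fy`. Each component is half-wave rectified into a positive and a negative channel, and each channel is then blurred. The Gaussian blur, its `sigma`, and all function names here are illustrative assumptions, not details taken from the slide.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(channel, sigma):
    """Separable Gaussian blur with edge padding (output keeps the input shape)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    smooth = lambda v: np.convolve(np.pad(v, r, mode="edge"), k, mode="valid")
    rows = np.apply_along_axis(smooth, 1, channel)
    return np.apply_along_axis(smooth, 0, rows)

def split_flow_channels(fx, fy, sigma=1.5):
    """Return 4 blurred channels: fx+, fx-, fy+, fy- (all non-negative)."""
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([blur(c, sigma) for c in channels])

# Toy flow: everything moves right (fx = 1) and up (fy = -2).
fx = np.ones((8, 8))
fy = -2.0 * np.ones((8, 8))
ch = split_flow_channels(fx, fy)
print(ch.shape)  # (4, 8, 8)
```

Keeping the signs in separate channels means that opposite motions do not cancel when the channels are blurred.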
![Page 12: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/12.jpg)
Representing Motion
Tracked Points
Matikainen et al. 2009
![Page 13: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/13.jpg)
Representing Motion
Space-Time Interest Points
Corner detectors in space-time
Laptev 2005
Examples: moving corner; ball hits wall; balls collide; balls collide (different scale)
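A rough sketch of a space-time corner response, following the usual extension of the Harris detector to video: build a 3x3 second-moment matrix from the x, y, and t gradients, smooth it locally, and score each voxel with det(μ) − k·trace³(μ). The box smoothing (in place of Gaussian smoothing), the value `k=0.005`, and the function names are assumptions made for illustration.

```python
import numpy as np

def box_smooth(vol, radius=1):
    """Average over a (2r+1)^3 neighborhood; a stand-in for Gaussian smoothing."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    out = vol.astype(float)
    for axis in range(3):
        out = np.apply_along_axis(
            lambda v: np.convolve(np.pad(v, radius, mode="edge"), k, mode="valid"),
            axis, out)
    return out

def spacetime_corner_response(video, k=0.005, radius=1):
    """video has shape (T, H, W); returns a response volume of the same shape."""
    gt, gy, gx = np.gradient(video.astype(float))
    grads = (gx, gy, gt)
    # Smoothed second-moment matrix at every voxel, shape (T, H, W, 3, 3).
    mu = np.empty(video.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            mu[..., i, j] = box_smooth(grads[i] * grads[j], radius)
    det = np.linalg.det(mu)
    trace = np.trace(mu, axis1=-2, axis2=-1)
    return det - k * trace ** 3

# A static vertical edge: no temporal change, so no positive response.
static = np.zeros((5, 6, 6))
static[:, :, 3:] = 1.0
static_resp = spacetime_corner_response(static)

# A dot that appears for one frame: varies in x, y, and t at once.
flash = np.zeros((5, 5, 5))
flash[2, 2, 2] = 1.0
flash_resp = spacetime_corner_response(flash)
print(static_resp.max() <= 0, flash_resp[2, 2, 2] > 0)
```

The response is positive only where the image varies in all three directions, which is why a static edge or corner is not selected but an abrupt event is.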
![Page 14: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/14.jpg)
Representing Motion
Space-Time Interest Points
Laptev 2005
![Page 15: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/15.jpg)
Examples of Action Recognition Systems
•Feature-based classification
•Recognition using pose and objects
![Page 16: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/16.jpg)
Action recognition as classification
Retrieving actions in movies, Laptev and Perez, 2007
![Page 17: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/17.jpg)
Remember image categorization…
Training Labels
Training
Images
Classifier Training
Training
Image Features
Image Features
Testing
Test Image
Trained Classifier
Trained Classifier Outdoor
Prediction
![Page 18: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/18.jpg)
Remember spatial pyramids….
Compute histogram in each spatial bin
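As a reminder of the mechanics, here is a small sketch of a spatial pyramid histogram, assuming each local feature has already been quantized to a visual-word index. Histograms are computed per grid cell at each level and concatenated. The function name and arguments are hypothetical, and the per-level weighting used in some formulations is left out.

```python
import numpy as np

def spatial_pyramid_histogram(xs, ys, words, width, height, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids."""
    xs, ys, words = np.asarray(xs), np.asarray(ys), np.asarray(words)
    parts = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of each feature at this level (clamped to the last cell).
        cx = np.minimum((xs * cells / width).astype(int), cells - 1)
        cy = np.minimum((ys * cells / height).astype(int), cells - 1)
        for row in range(cells):
            for col in range(cells):
                in_cell = (cx == col) & (cy == row)
                parts.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(parts)

# Three features in a 10x10 image with a 2-word vocabulary.
pyramid = spatial_pyramid_histogram(
    xs=[1, 9, 9], ys=[1, 1, 9], words=[0, 1, 1],
    width=10, height=10, vocab_size=2, levels=2)
print(pyramid)  # whole-image histogram followed by the four 2x2 cell histograms
```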
![Page 19: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/19.jpg)
Features for Classifying Actions
1. Spatio-temporal pyramids
• Image Gradients
• Optical Flow
![Page 20: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/20.jpg)
Features for Classifying Actions
2. Spatio-temporal interest points
Corner detectors in space-time
Descriptors based on Gaussian derivative filters over x, y, time
![Page 21: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/21.jpg)
Classification
•Boosted stubs for pyramids of optical flow, gradient
•Nearest neighbor for STIP
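A minimal nearest-neighbor classifier, assuming each clip has been summarized as a fixed-length feature vector. The Euclidean distance and the function name are assumptions; the slide does not say which distance is used.

```python
import numpy as np

def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the label of the training vector closest to the query."""
    train = np.asarray(train_features, dtype=float)
    dists = np.linalg.norm(train - np.asarray(query, dtype=float), axis=1)
    return train_labels[int(np.argmin(dists))]

# Toy feature vectors for two action classes.
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_labels = ["drinking", "drinking", "other"]
pred = nearest_neighbor_predict(train_features, train_labels, [0.2, 0.8])
print(pred)
```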
![Page 22: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/22.jpg)
Searching the video for an action
1. Detect keyframes using a trained HOG detector in each frame
2. Classify detected keyframes as positive (e.g., “drinking”) or negative (“other”)
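The two steps above can be sketched as a simple loop. `detect_keyframes` and `classify_action` are hypothetical callables standing in for the trained HOG detector and the action classifier; the threshold and the toy stand-ins below are only for illustration.

```python
def search_video(frames, detect_keyframes, classify_action, threshold=0.5):
    """Run the detector on every frame, then score each detection."""
    hits = []
    for index, frame in enumerate(frames):
        for box in detect_keyframes(frame):        # step 1: candidate keyframes
            score = classify_action(frame, box)    # step 2: positive vs. other
            if score >= threshold:
                hits.append((index, box, score))
    # Most confident detections first.
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Toy stand-ins: frames are strings, the "detector" fires on the word "drink".
frames = ["walk", "drink", "sit", "drink!"]
toy_detector = lambda frame: [(0, 0, 10, 10)] if "drink" in frame else []
toy_classifier = lambda frame, box: 0.9 if frame.endswith("!") else 0.6
hits = search_video(frames, toy_detector, toy_classifier)
print(hits)
```

Running the detector first restricts the classifier to a small set of candidate frames and boxes.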
![Page 23: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/23.jpg)
Accuracy in searching video
Without keyframe detection
With keyframe detection
![Page 24: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/24.jpg)
Learning realistic human actions from movies, Laptev et al. 2008
“Talk on phone”
“Get out of car”
![Page 25: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/25.jpg)
Approach
•Space-time interest point detectors
•Descriptors: HOG, HOF
•Pyramid histograms (3x3x2)
•SVMs with Chi-Squared Kernel
Interest Points
Spatio-Temporal Binning
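A sketch of the last two steps, assuming interest points come with (x, y, t) coordinates and a visual-word index. Points are binned into a 3x3x2 grid (assumed here to mean 3 bins in x, 3 in y, 2 in t), one histogram per cell, and histograms are compared with a chi-squared kernel. The `scale` normalizer and all names are illustrative assumptions.

```python
import numpy as np

def spacetime_grid_histogram(points, words, extent, vocab_size, grid=(3, 3, 2)):
    """points: (N, 3) array of (x, y, t); extent: (width, height, duration)."""
    points = np.asarray(points, dtype=float)
    words = np.asarray(words, dtype=int)
    grid = np.asarray(grid)
    # Cell index of each point along x, y, t (clamped to the last cell).
    cell = np.minimum((points / np.asarray(extent, dtype=float) * grid).astype(int),
                      grid - 1)
    flat = (cell[:, 0] * grid[1] + cell[:, 1]) * grid[2] + cell[:, 2]
    hist = np.zeros((grid.prod(), vocab_size))
    np.add.at(hist, (flat, words), 1)  # count words per cell
    return hist.ravel()

def chi2_distance(h1, h2, eps=1e-10):
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def chi2_kernel(h1, h2, scale=1.0):
    """exp(-chi2 / scale); scale is an assumed normalization constant."""
    return float(np.exp(-chi2_distance(h1, h2) / scale))

hist = spacetime_grid_histogram(
    points=[(0, 0, 0), (99, 99, 9)], words=[0, 1],
    extent=(100, 100, 10), vocab_size=2)
print(hist.shape, chi2_kernel(hist, hist))
```

A kernel of this form can be passed to an SVM as a precomputed Gram matrix.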
![Page 26: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/26.jpg)
Results
![Page 27: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/27.jpg)
Action Recognition using Pose and Objects
Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities, B. Yao and Li Fei-Fei, 2010
Slide Credit: Yao/Fei-Fei
![Page 28: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/28.jpg)
Human-Object Interaction
Torso
Head
• Human pose estimation
Holistic image based classification
Integrated reasoning
Slide Credit: Yao/Fei-Fei
![Page 29: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/29.jpg)
Human-Object Interaction
Tennis racket
• Human pose estimation
Holistic image based classification
Integrated reasoning
• Object detection
Slide Credit: Yao/Fei-Fei
![Page 30: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/30.jpg)
Human-Object Interaction
• Human pose estimation
Holistic image based classification
Integrated reasoning
• Object detection
• Action categorization
Torso
Head
Tennis racket
Activity: Tennis Forehand
Slide Credit: Yao/Fei-Fei
![Page 31: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/31.jpg)
Human pose estimation & Object detection
Human pose estimation is challenging.
Difficult part appearance
Self-occlusion
Image region looks like a body part
• Felzenszwalb & Huttenlocher, 2005
• Ren et al, 2005
• Ramanan, 2006
• Ferrari et al, 2008
• Yang & Mori, 2008
• Andriluka et al, 2009
• Eichner & Ferrari, 2009
Slide Credit: Yao/Fei-Fei
![Page 32: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/32.jpg)
Human pose estimation & Object detection
Human pose estimation is challenging.
• Felzenszwalb & Huttenlocher, 2005
• Ren et al, 2005
• Ramanan, 2006
• Ferrari et al, 2008
• Yang & Mori, 2008
• Andriluka et al, 2009
• Eichner & Ferrari, 2009
Slide Credit: Yao/Fei-Fei
![Page 33: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/33.jpg)
Human pose estimation & Object detection
Facilitate
Given the object is detected.
Slide Credit: Yao/Fei-Fei
![Page 34: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/34.jpg)
Human pose estimation & Object detection
Object detection is challenging
Small, low-resolution, partially occluded
Image region similar to detection target
• Viola & Jones, 2001
• Lampert et al, 2008
• Divvala et al, 2009
• Vedaldi et al, 2009
Slide Credit: Yao/Fei-Fei
![Page 35: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/35.jpg)
Human pose estimation & Object detection
Object detection is challenging
• Viola & Jones, 2001
• Lampert et al, 2008
• Divvala et al, 2009
• Vedaldi et al, 2009
Slide Credit: Yao/Fei-Fei
![Page 36: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/36.jpg)
Human pose estimation & Object detection
Facilitate
Given the pose is estimated.
Slide Credit: Yao/Fei-Fei
![Page 37: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/37.jpg)
Human pose estimation & Object detection
Mutual Context
Slide Credit: Yao/Fei-Fei
![Page 38: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/38.jpg)
Mutual Context Model Representation
A: Activity (e.g., croquet shot, volleyball smash, tennis forehand)
O: Object (e.g., croquet mallet, volleyball, tennis racket)
H: Human pose
• Intra-class variations: more than one H for each A;
• Unobserved during training.
P: Body parts P1, P2, …, PN (lP: location; θP: orientation; sP: scale)
f: Image evidence fO, f1, f2, …, fN. Shape context. [Belongie et al, 2002]
Slide Credit: Yao/Fei-Fei
![Page 39: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/39.jpg)
Learning Results
Cricket defensive shot
Cricket bowling
Croquet shot
Slide Credit: Yao/Fei-Fei
![Page 40: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/40.jpg)
Learning Results
Tennis serve
Volleyball smash
Tennis forehand
Slide Credit: Yao/Fei-Fei
![Page 41: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/41.jpg)
I
Model Inference
The learned models
Slide Credit: Yao/Fei-Fei
![Page 42: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/42.jpg)
I
Model Inference
The learned models
Head detection
Torso detection
Tennis racket detection
Layout of the object and body parts.
Compositional
Inference
[Chen et al, 2007]
$A_1,\ H_1,\ O_1^*,\ \{P_{1,n}^*\}_n$
Slide Credit: Yao/Fei-Fei
![Page 43: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/43.jpg)
I
Model Inference
The learned models
$A_1,\ H_1,\ O_1^*,\ \{P_{1,n}^*\}_n$
$A_K,\ H_K,\ O_K^*,\ \{P_{K,n}^*\}_n$
Output
Slide Credit: Yao/Fei-Fei
![Page 44: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/44.jpg)
Dataset and Experiment Setup
Sport data set: 6 classes [Gupta et al, 2009]
Cricket defensive shot, Cricket bowling, Croquet shot, Tennis forehand, Tennis serve, Volleyball smash
180 training (supervised with object and part locations) & 120 testing images
Tasks:
• Object detection;
• Pose estimation;
• Activity classification.
Slide Credit: Yao/Fei-Fei
![Page 45: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/45.jpg)
Dataset and Experiment Setup
Sport data set: 6 classes [Gupta et al, 2009]
Cricket defensive shot, Cricket bowling, Croquet shot, Tennis forehand, Tennis serve, Volleyball smash
180 training (supervised with object and part locations) & 120 testing images
Tasks:
• Object detection;
• Pose estimation;
• Activity classification.
Slide Credit: Yao/Fei-Fei
![Page 46: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/46.jpg)
Object Detection Results
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision, both from 0 to 1) for Cricket bat, Croquet mallet, Tennis racket, Volleyball, and Cricket ball, with the valid region marked. Curves: Our Method, Sliding window, Pedestrian context. References shown: Andriluka et al, 2009; Dalal & Triggs, 2006.]
Slide Credit: Yao/Fei-Fei
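For reading the precision-recall plots, here is a short sketch of how such a curve is typically computed from scored detections. It assumes each detection has already been marked correct or incorrect against the ground truth; the function and argument names are hypothetical.

```python
def precision_recall_curve(scores, is_correct, num_ground_truth):
    """Return (recall, precision) after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, true_pos = [], 0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            true_pos += 1
        recall = true_pos / num_ground_truth   # fraction of objects found so far
        precision = true_pos / rank            # fraction of detections that are right
        points.append((recall, precision))
    return points

# Three detections, two ground-truth objects.
curve = precision_recall_curve(
    scores=[0.9, 0.8, 0.3], is_correct=[True, False, True], num_ground_truth=2)
print(curve)
```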
![Page 47: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/47.jpg)
Object Detection Results
[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision, both from 0 to 1) for Volleyball and Cricket ball. Curves: Our Method, Pedestrian as context, Scanning window detector.]
[Figure: example detections compared across Sliding window, Pedestrian context, and Our method, for a small object and for background clutter.]
Slide Credit: Yao/Fei-Fei
![Page 48: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/48.jpg)
Dataset and Experiment Setup
Sport data set: 6 classes [Gupta et al, 2009]
Cricket defensive shot, Cricket bowling, Croquet shot, Tennis forehand, Tennis serve, Volleyball smash
180 training & 120 testing images
Tasks:
• Object detection;
• Pose estimation;
• Activity classification.
Slide Credit: Yao/Fei-Fei
![Page 49: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/49.jpg)
Human Pose Estimation Results
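| Method | Torso | Upper Leg | Upper Leg | Lower Leg | Lower Leg | Upper Arm | Upper Arm | Lower Arm | Lower Arm | Head |
|---|---|---|---|---|---|---|---|---|---|---|
| Ramanan, 2006 | .52 | .22 | .22 | .21 | .28 | .24 | .28 | .17 | .14 | .42 |
| Andriluka et al, 2009 | .50 | .31 | .30 | .31 | .27 | .18 | .19 | .11 | .11 | .45 |
| Our full model | .66 | .43 | .39 | .44 | .34 | .44 | .40 | .27 | .29 | .58 |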
Slide Credit: Yao/Fei-Fei
![Page 50: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/50.jpg)
Human Pose Estimation Results
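| Method | Torso | Upper Leg | Upper Leg | Lower Leg | Lower Leg | Upper Arm | Upper Arm | Lower Arm | Lower Arm | Head |
|---|---|---|---|---|---|---|---|---|---|---|
| Ramanan, 2006 | .52 | .22 | .22 | .21 | .28 | .24 | .28 | .17 | .14 | .42 |
| Andriluka et al, 2009 | .50 | .31 | .30 | .31 | .27 | .18 | .19 | .11 | .11 | .45 |
| Our full model | .66 | .43 | .39 | .44 | .34 | .44 | .40 | .27 | .29 | .58 |

[Figure: Tennis serve model. Andriluka et al, 2009 vs. our estimation result.]
[Figure: Volleyball smash model. Andriluka et al, 2009 vs. our estimation result.]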
Slide Credit: Yao/Fei-Fei
![Page 51: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/51.jpg)
Human Pose Estimation Results
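| Method | Torso | Upper Leg | Upper Leg | Lower Leg | Lower Leg | Upper Arm | Upper Arm | Lower Arm | Lower Arm | Head |
|---|---|---|---|---|---|---|---|---|---|---|
| Ramanan, 2006 | .52 | .22 | .22 | .21 | .28 | .24 | .28 | .17 | .14 | .42 |
| Andriluka et al, 2009 | .50 | .31 | .30 | .31 | .27 | .18 | .19 | .11 | .11 | .45 |
| Our full model | .66 | .43 | .39 | .44 | .34 | .44 | .40 | .27 | .29 | .58 |
| One pose per class | .63 | .40 | .36 | .41 | .31 | .38 | .35 | .21 | .23 | .52 |

[Figure: four example estimation results.]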
Slide Credit: Yao/Fei-Fei
![Page 52: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/52.jpg)
Dataset and Experiment Setup
Sport data set: 6 classes [Gupta et al, 2009]
Cricket defensive shot, Cricket bowling, Croquet shot, Tennis forehand, Tennis serve, Volleyball smash
180 training & 120 testing images
Tasks:
• Object detection;
• Pose estimation;
• Activity classification.
Slide Credit: Yao/Fei-Fei
![Page 53: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/53.jpg)
Activity Classification Results
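[Figure: bar chart of classification accuracy. Our model: 83.3%; Gupta et al, 2009: 78.9%; Bag-of-Words: 52.5%.]
[Figure: per-class accuracy chart (axis from 0.5 to 0.9) comparing Bag-of-words SIFT+SVM, Gupta et al, 2009, and Our model; class labels shown include Cricket shot and Tennis forehand.]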
Slide Credit: Yao/Fei-Fei
![Page 54: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/54.jpg)
Motion features – Dense Trajectory
Action Recognition by Dense Trajectories, CVPR 2011
Action Recognition with Improved Trajectories, ICCV 2013
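A simplified sketch of the trajectory idea: follow a point through a sequence of dense flow fields, then describe the track by its frame-to-frame displacements normalized by the total path length. The nearest-pixel flow lookup and the function names are assumptions; this sketch omits dense sampling, flow filtering, and the appearance descriptors computed along each track.

```python
import numpy as np

def track_point(start_xy, flows):
    """Follow one point through per-frame flow fields of shape (H, W, 2)."""
    x, y = start_xy
    track = [(x, y)]
    for flow in flows:
        # Nearest-pixel lookup, clamped to the image (an assumed simplification).
        row = int(round(min(max(y, 0), flow.shape[0] - 1)))
        col = int(round(min(max(x, 0), flow.shape[1] - 1)))
        dx, dy = flow[row, col]
        x, y = x + dx, y + dy
        track.append((x, y))
    return np.array(track, dtype=float)

def trajectory_shape(track, eps=1e-10):
    """Displacement vectors normalized by the sum of their magnitudes."""
    disp = np.diff(track, axis=0)
    total = np.linalg.norm(disp, axis=1).sum()
    return (disp / (total + eps)).ravel()

# Toy flow: every pixel moves one pixel to the right in each of 3 frames.
flow = np.zeros((5, 8, 2))
flow[..., 0] = 1.0
track = track_point((1, 2), [flow, flow, flow])
shape = trajectory_shape(track)
print(track[-1], shape)
```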
![Page 55: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/55.jpg)
Video classification with CNNs
Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014
![Page 56: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/56.jpg)
Video classification with CNNs
Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014
![Page 57: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/57.jpg)
Two-stream CNN
Two-Stream Convolutional Networks for Action Recognition in Videos, NIPS 2014
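One simple way to combine the two streams is late fusion: each stream produces class scores, and the softmax outputs are averaged. The sketch below assumes the per-stream logits are already available; `temporal_weight` and the function names are illustrative, and the networks themselves are not modeled.

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the two streams' class probabilities."""
    probs = ((1.0 - temporal_weight) * softmax(spatial_logits)
             + temporal_weight * softmax(temporal_logits))
    return int(np.argmax(probs)), probs

# The spatial stream weakly prefers class 0; the temporal stream strongly prefers class 1.
label, probs = fuse_two_streams([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print(label, probs)
```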
![Page 58: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/58.jpg)
3D Convolutional Networks
Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015
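To make the operation concrete, here is a naive single-channel 3D convolution (cross-correlation, as in most deep learning libraries) over a clip of shape (T, H, W). It is a slow reference loop for illustration only, with no padding, stride, or multiple channels; the function name is hypothetical.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) clip, with no padding."""
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3x3 kernel that takes a temporal difference at the center pixel.
kernel = np.zeros((3, 3, 3))
kernel[0, 1, 1] = -1.0
kernel[2, 1, 1] = 1.0

# Clip whose brightness increases by 1 every frame.
clip = np.arange(4, dtype=float)[:, None, None] * np.ones((4, 5, 5))
response = conv3d_valid(clip, kernel)
print(response.shape)  # (2, 3, 3)
```

Because the kernel spans several frames, a single filter can respond to motion as well as to appearance.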
![Page 59: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/59.jpg)
Action recognition -> Semantic Role Labeling
![Page 60: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/60.jpg)
![Page 61: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/61.jpg)
Take-home messages
•Action recognition is an open problem.
• How to define actions?
• How to infer them?
• What are good visual cues?
• How do we incorporate higher level reasoning?
![Page 62: Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner](https://reader034.vdocuments.net/reader034/viewer/2022052101/603bb06a3492bd5cb87ed1d6/html5/thumbnails/62.jpg)
Take-home messages
•Some work done, but it is just the beginning of exploring the problem. So far…
• Actions are mainly categorical (could be framed in terms of effect or intent)
• Most approaches are classification using simple features (spatial-temporal histograms of gradients or flow, s-t interest points, SIFT in images)
• Just a couple of works on how to incorporate pose and objects
• Not much idea of how to reason about long-term activities or to describe video sequences