Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses
Date: 2013/05/27
Instructor: Prof. Wang, Sheng-Jyh
Student: Hung, Fei-Fan
Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012)
Outline
• Introduction
  • Intuition and goal
• Model Representation
• Model Learning
  • Obtaining Atomic Poses
  • Training Detectors and Classifiers
  • Estimating Model Parameters
• Model Inference
• Experimental Results
• Conclusion
Why use context in computer vision?
• Simple images vs. human activities
[Figure: results without context vs. with context (~3-4% difference); examples labeled "Without context" and "With mutual context"]
Challenges in Human Pose Estimation
• Human pose estimation is challenging:
  • Difficult part appearance
  • Self-occlusion
  • Image regions that look like a body part
• Object detection facilitates human pose estimation
Challenges in Object Detection
• Object detection is challenging:
  • Objects that are small, low-resolution, or partially occluded
  • Image regions similar to the detection target
• Human pose estimation facilitates object detection
The Goal
• To build a mutual context model for Human-Object Interaction (HOI) activities
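Model Representation
• Modeling the mutual context of objects and human poses
  • A: activity, e.g. Tennis forehand, Croquet shot, Volleyball smash
  • O: objects, e.g. Tennis racket, Tennis ball, Croquet mallet, Volleyball; $O = (O_1, \dots, O_M)$, where M is the number of object bounding boxes
  • H: atomic pose of the human; an activity A can contain more than one atomic pose H
  • P: body parts, $P = (P_1, \dots, P_L)$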
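Model Representation
$\Psi(A, O, H, I) = \phi_1(A, O, H) + \phi_2(O, H) + \phi_3(O, I) + \phi_4(H, I) + \phi_5(A, I)$
• 𝝓1: co-occurrence compatibility between A, O, H
• 𝝓2: spatial relationship between O and H
• 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers
[Figure: graphical model connecting the activity A, the human pose H with body parts P1, P2, …, PL, and the objects O1, O2]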
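𝝓1: Co-occurrence Context
• Co-occurrence between all A, O, H:
  $\phi_1(A, O, H) = \sum_{i=1}^{N_h} \sum_{m=1}^{M} \sum_{j=1}^{N_o} \sum_{k=1}^{N_a} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O_m = o_j) \cdot \mathbf{1}(A = a_k) \cdot \zeta_{i,j,k}$
• $\zeta_{i,j,k}$: strength of the co-occurrence interaction between $h_i$, $o_j$, and $a_k$
• $\mathbf{1}(\cdot)$: indicator function; $N_h$: total number of atomic poses; $N_o$: total number of objects; $N_a$: total number of activity classes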
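Because of the indicator functions, 𝝓1 reduces to one table lookup per labeled box. A minimal sketch in Python, assuming 𝝓1 sums a learned strength ζ_{i,j,k} over every labeled box; the function name `phi1_cooccurrence`, the nested-list layout `zeta[i][j][k]`, the use of `None` for an empty box, and the toy values are all hypothetical.

```python
from typing import List, Optional, Sequence


def phi1_cooccurrence(
    activity: int,
    atomic_pose: int,
    object_labels: Sequence[Optional[int]],
    zeta: List[List[List[float]]],
) -> float:
    """Co-occurrence potential phi_1(A, O, H) (hypothetical sketch).

    zeta[i][j][k] is the co-occurrence strength between atomic pose h_i,
    object o_j and activity a_k. The indicators 1(H = h_i) and 1(A = a_k)
    select a single (i, k) pair, and 1(O_m = o_j) selects one j per labeled
    box, so the quadruple sum becomes one lookup per box.
    A label of None marks a box with no object.
    """
    total = 0.0
    for obj in object_labels:
        if obj is None:  # background box: every indicator 1(O_m = o_j) is zero
            continue
        total += zeta[atomic_pose][obj][activity]
    return total


# Toy table with N_h = 2 atomic poses, N_o = 2 objects, N_a = 2 activities.
zeta = [
    [[1.0, 0.0], [0.5, 0.2]],  # atomic pose h_0: zeta[0][j][k]
    [[0.1, 0.3], [0.0, 2.0]],  # atomic pose h_1: zeta[1][j][k]
]
print(phi1_cooccurrence(activity=1, atomic_pose=1, object_labels=[1, None, 0], zeta=zeta))
```

Here box 0 contributes ζ_{1,1,1} = 2.0, box 1 is empty, and box 2 contributes ζ_{1,0,1} = 0.3.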
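𝝓2: Spatial Context
• Spatial relationship between all O and the different H:
  $\phi_2(O, H) = \sum_{m=1}^{M} \sum_{i=1}^{N_h} \sum_{j=1}^{N_o} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O_m = o_j) \cdot \boldsymbol{\lambda}_{i,j,l}^{\top} \mathbf{b}(\mathbf{x}_l^{H}, O_m)$
• $\boldsymbol{\lambda}_{i,j,l}$: weight of $\mathbf{b}(\mathbf{x}_l^{H}, O_m)$
• $\mathbf{b}(\mathbf{x}_l^{H}, O_m)$: a sparse binary vector that shows the relative location of $O_m$ w.r.t. $\mathbf{x}_l^{H}$, the location of the $l$-th body part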
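A minimal sketch of the spatial term, assuming 𝝓2 adds λ_{i,j,l}ᵀ b(x_l, O_m) over every labeled box and every body part. The 9-bin layout of b (the body part lies before, inside, or after the box extent along each axis) is a hypothetical binning chosen for illustration; only the idea of a sparse binary relative-location vector comes from the slide. Function names and toy weights are also hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def relative_location_bins(part: Point, box: Box) -> List[int]:
    """Hypothetical 9-bin sparse binary vector b(x_l, O_m).

    Along each axis the body part lies before, inside, or after the object
    box's extent, giving a 3x3 grid (row = vertical, column = horizontal).
    Exactly one entry is 1.
    """
    px, py = part
    x0, y0, x1, y1 = box
    col = 0 if px < x0 else (1 if px <= x1 else 2)
    row = 0 if py < y0 else (1 if py <= y1 else 2)
    b = [0] * 9
    b[3 * row + col] = 1
    return b


def phi2_spatial(
    atomic_pose: int,
    parts: Sequence[Point],
    boxes: Sequence[Box],
    object_labels: Sequence[Optional[int]],
    lam: List[List[List[List[float]]]],
) -> float:
    """phi_2(O, H): sum over labeled boxes m and body parts l of
    lam[i][j][l] . b(x_l, O_m), with i the atomic pose and j the object in box m."""
    total = 0.0
    for box, obj in zip(boxes, object_labels):
        if obj is None:  # an empty box contributes nothing
            continue
        for l, part in enumerate(parts):
            b = relative_location_bins(part, box)
            total += sum(w * bit for w, bit in zip(lam[atomic_pose][obj][l], b))
    return total


# Toy weights: N_h = 1, N_o = 1, L = 2 body parts, 9 bins per part.
lam = [[[[float(k) for k in range(9)], [10.0] * 9]]]
parts = [(5.0, 5.0), (15.0, 25.0)]
boxes = [(10.0, 0.0, 20.0, 10.0)]
print(phi2_spatial(0, parts, boxes, [0], lam))
```

The first part lies left of the box and within its vertical extent (bin 3, weight 3.0); the second lies within its horizontal extent and below it (bin 7, weight 10.0).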
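𝝓3: Modeling Objects
• Model O in the image I using object detection scores:
  $\phi_3(O, I) = \sum_{m=1}^{M} \sum_{j=1}^{N_o} \mathbf{1}(O_m = o_j) \cdot \boldsymbol{\gamma}_j^{\top} \mathbf{g}(O_m) + \sum_{m \neq m'} \sum_{j=1}^{N_o} \sum_{j'=1}^{N_o} \mathbf{1}(O_m = o_j) \cdot \mathbf{1}(O_{m'} = o_{j'}) \cdot \boldsymbol{\gamma}_{j,j'}^{\top} \mathbf{b}(O_m, O_{m'})$
• For all objects O:
  • $\mathbf{g}(O_m)$: vector of scores of detecting each object in the box $O_m$
  • $\boldsymbol{\gamma}_j$: weight of $\mathbf{g}(O_m)$ for object $o_j$
• Between $O_m$ and $O_{m'}$:
  • $\mathbf{b}(O_m, O_{m'})$: binary feature vector describing the relationship between $O_m$ and $O_{m'}$
  • $\boldsymbol{\gamma}_{j,j'}$: weight of $\mathbf{b}(O_m, O_{m'})$ for objects $o_j$ and $o_{j'}$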
𝝓4: Modeling Human Pose
• Model the atomic pose that H belongs to and the likelihood of the observed body parts:
  • Gaussian likelihood function of each body part location given the atomic pose
  • Vector of scores of detecting each body part in the image
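𝝓5: Modeling Activity
• Model the HOI activity by training an activity classifier:
  $\phi_5(A, I) = \sum_{k=1}^{N_a} \mathbf{1}(A = a_k) \cdot \boldsymbol{\eta}_k^{\top} \mathbf{s}(I)$
• $\mathbf{s}(I)$: $N_a$-dimensional output of a one-versus-all (OVA) discriminative classifier that takes the image as features
• $\boldsymbol{\eta}_k$: feature weight of $\mathbf{s}(I)$ for activity $a_k$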
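Assuming 𝝓5 takes the form η_kᵀ s(I) for A = a_k, where s(I) stacks the N_a one-versus-all classifier outputs, the potential is a single dot product. A sketch with hypothetical scores and weights; the off-diagonal weights let one classifier's output support or penalize another activity.

```python
from typing import Sequence


def phi5_activity(activity: int, ova_scores: Sequence[float],
                  eta: Sequence[Sequence[float]]) -> float:
    """phi_5(A, I) for A = a_k: the dot product eta[k] . s(I), where s(I)
    holds the N_a one-versus-all classifier outputs for image I."""
    weights = eta[activity]
    if len(weights) != len(ova_scores):
        raise ValueError("eta[k] and s(I) must both have length N_a")
    return sum(w * s for w, s in zip(weights, ova_scores))


# Toy example with N_a = 3 activities (all values hypothetical).
s_I = [0.2, -0.5, 1.0]  # OVA scores for one image
eta = [[1.0, -0.3, 0.0],
       [0.0, 1.0, 0.0],
       [-0.2, 0.0, 1.0]]
for k in range(3):
    print(k, phi5_activity(k, s_I, eta))
```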
Model Properties
• Spatial context between O and H
  • Object detection and human pose estimation facilitate each other
  • Unreliable objects and body parts can be ignored
• Flexible to extend to large-scale datasets and other activities
  • The joint model can share all objects and atomic poses
Model Learning
• Assign human poses to atomic poses
• Train detectors and classifiers
• Estimate parameters by maximum likelihood
Obtaining Atomic Poses
• Use clustering to obtain atomic poses:
  • Normalize the annotations
  • Find missing parts using the nearest visible neighbor
  • Obtain a set of atomic poses by hierarchical clustering with the maximum linkage measure: $d(\mathcal{A}, \mathcal{B}) = \max_{a \in \mathcal{A},\, b \in \mathcal{B}} d(a, b)$
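The three clustering steps can be sketched in Python. All of the following are hypothetical assumptions, not the paper's exact choices: a pose is a list of 2-D part locations with `None` for unannotated parts; normalization removes translation and scale using the visible parts; a missing part is copied from the nearest other pose (mean distance over commonly visible parts) in which that part is visible; and clustering is agglomerative with the maximum (complete) linkage and a fixed distance cut-off.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Pose = List[Optional[Point]]  # one entry per body part; None = not visible


def normalize(pose: Pose) -> Pose:
    """Hypothetical normalization: subtract the mean of the visible parts and
    divide by their RMS distance to that mean (removes translation and scale)."""
    vis = [p for p in pose if p is not None]
    cx = sum(x for x, _ in vis) / len(vis)
    cy = sum(y for _, y in vis) / len(vis)
    scale = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in vis) / len(vis)) or 1.0
    return [None if p is None else ((p[0] - cx) / scale, (p[1] - cy) / scale) for p in pose]


def visible_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance over the parts visible in both poses."""
    d = [math.dist(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    return sum(d) / len(d) if d else math.inf


def fill_missing(poses: List[Pose]) -> List[List[Point]]:
    """Copy each missing part from the nearest pose in which it is visible
    (assumes at least one such pose exists)."""
    filled = []
    for i, pose in enumerate(poses):
        out = []
        for l, p in enumerate(pose):
            if p is None:
                donors = [q for j, q in enumerate(poses) if j != i and q[l] is not None]
                p = min(donors, key=lambda q: visible_distance(pose, q))[l]
            out.append(p)
        filled.append(out)
    return filled


def pose_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def complete_linkage(poses: List[List[Point]], threshold: float) -> List[List[int]]:
    """Agglomerative clustering with d(A, B) = max over a in A, b in B of d(a, b).
    Merging stops when the closest clusters are farther apart than `threshold`;
    each remaining cluster plays the role of one atomic pose."""
    clusters = [[i] for i in range(len(poses))]

    def linkage(ca: List[int], cb: List[int]) -> float:
        return max(pose_distance(poses[i], poses[j]) for i in ca for j in cb)

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


raw = [
    [(0.0, 0.0), (0.0, 2.0), (1.0, -1.0)],      # head, hip, hand raised
    [(10.0, 10.0), (10.0, 14.0), (12.0, 8.0)],  # same shape, moved and scaled x2
    [(0.0, 0.0), (0.0, 2.0), (1.0, 3.0)],       # hand lowered
    [(4.0, 0.0), None, (5.0, 3.0)],             # hand lowered, hip not annotated
]
poses = fill_missing([normalize(p) for p in raw])
print(complete_linkage(poses, threshold=0.5))
```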
Training Detectors and Classifiers
• Object detector in 𝝓3
• Human body part detector in 𝝓4
• Overall activity classifier in 𝝓5
Detectors: deformable part model
Activity classifier: spatial pyramid matching (SPM), SIFT + 3-level image pyramid
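SPM rests on the spatial pyramid match kernel of Lazebnik et al. (listed in the references). A sketch of that kernel for a 3-level pyramid (levels 0, 1, 2), assuming SIFT descriptors have already been quantized into visual-word indices with positions normalized to [0, 1]; the feature format and function names are hypothetical, and the one-versus-all classifier trained on top of the kernel is omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, int]  # (x, y, visual word); x, y normalized to [0, 1]


def level_weight(level: int, num_levels: int) -> float:
    """Level weights from Lazebnik et al.: 1/2^L for level 0 and
    1/2^(L - l + 1) for level l >= 1, where L = num_levels - 1."""
    L = num_levels - 1
    return 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)


def pyramid_histograms(features: Sequence[Feature], num_levels: int = 3) -> List[Counter]:
    """One sparse histogram per level, keyed by (cell_x, cell_y, word);
    level l splits the image into a 2^l x 2^l grid."""
    hists = []
    for level in range(num_levels):
        cells = 2 ** level
        h: Counter = Counter()
        for x, y, word in features:
            cx = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the last cell
            cy = min(int(y * cells), cells - 1)
            h[(cx, cy, word)] += 1
        hists.append(h)
    return hists


def spm_kernel(fx: Sequence[Feature], fy: Sequence[Feature], num_levels: int = 3) -> float:
    """Weighted sum of per-level histogram intersections."""
    total = 0.0
    levels = zip(pyramid_histograms(fx, num_levels), pyramid_histograms(fy, num_levels))
    for level, (a, b) in enumerate(levels):
        intersection = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
        total += level_weight(level, num_levels) * intersection
    return total


fx = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
fy = [(0.1, 0.1, 0), (0.1, 0.9, 1)]
print(spm_kernel(fx, fy))  # 1.25
print(spm_kernel(fx, fx))  # 2.0
```

With L = 2 the level weights are 1/4, 1/4, and 1/2, so matches found in finer grids count more.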
Estimating Model Parameters
• Estimate the parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior on the parameters
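Maximum likelihood with a zero-mean Gaussian prior amounts to maximizing the log-likelihood minus ‖θ‖²/(2σ²). A minimal sketch on a toy log-linear model trained by gradient ascent; the feature function, data, σ² values, learning rate, and step count are hypothetical, and the real model's potentials would take the place of `feat`.

```python
import math
from typing import Callable, List, Sequence, Tuple

FeatureFn = Callable[[float, int], List[float]]


def label_probs(theta: Sequence[float], x: float, labels: Sequence[int],
                feat: FeatureFn) -> List[float]:
    """p(y | x; theta) proportional to exp(theta . f(x, y)), computed stably."""
    scores = [sum(t * f for t, f in zip(theta, feat(x, y))) for y in labels]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(theta: Sequence[float], data: Sequence[Tuple[float, int]],
              labels: List[int], feat: FeatureFn, sigma2: float) -> float:
    """Log-likelihood plus log N(theta; 0, sigma2 I), up to a constant."""
    ll = sum(math.log(label_probs(theta, x, labels, feat)[labels.index(y)]) for x, y in data)
    return ll - sum(t * t for t in theta) / (2.0 * sigma2)


def fit(data: Sequence[Tuple[float, int]], labels: List[int], feat: FeatureFn,
        dim: int, sigma2: float = 1.0, lr: float = 0.1, steps: int = 500) -> List[float]:
    """Gradient ascent: grad = sum_n [f(x_n, y_n) - E_p f(x_n, y)] - theta / sigma2."""
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [-t / sigma2 for t in theta]  # gradient of the Gaussian log-prior
        for x, y in data:
            for d, v in enumerate(feat(x, y)):  # observed features
                grad[d] += v
            for p, y2 in zip(label_probs(theta, x, labels, feat), labels):
                for d, v in enumerate(feat(x, y2)):  # minus expected features
                    grad[d] -= p * v
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Toy problem: x = +1 goes with label 1, x = -1 with label 0.
def feat(x: float, y: int) -> List[float]:
    return [x if y == 1 else 0.0, 1.0 if y == 1 else 0.0]  # [interaction, bias]


data = [(1.0, 1), (1.0, 1), (-1.0, 0), (-1.0, 0)]
labels = [0, 1]
tight = fit(data, labels, feat, dim=2, sigma2=0.5)
loose = fit(data, labels, feat, dim=2, sigma2=100.0)
print(tight, loose)  # the tighter prior keeps theta closer to zero
```

On this separable toy data, plain ML would let θ grow without bound; the prior keeps the estimate finite.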
Learning result
Model Inference
• Given a new image, initialize with the learned results
• Then update iteratively:
  • Update human body parts
  • Update object detection results
  • Update A and H labels
Initialization
• Given a new image, initialize with the learned results:
  • A: activity classification with SPM
  • O: object detection
  • H: human pose estimation with a pictorial structure model
Model Inference: Updating Human Body Parts
• Compute the marginal distribution of the human pose
• Use a mixture of Gaussians to refine the prior of each body part
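One reading of this step, sketched under stated assumptions: the refined prior of a body part is a mixture with one isotropic 2-D Gaussian per atomic pose h_i, centered on that part's location in h_i and weighted by the marginal p(H = h_i). The isotropic covariance, the means, the variances, and the marginal values are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gaussian2d(x: Point, mean: Point, var: float) -> float:
    """Isotropic 2-D Gaussian density N(x; mean, var * I)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * var)) / (2.0 * math.pi * var)


def part_prior(x: Point, pose_marginal: Sequence[float], means: Sequence[Point],
               variances: Sequence[float]) -> float:
    """Mixture-of-Gaussians prior for one body part: component i is the part's
    Gaussian under atomic pose h_i, weighted by the marginal p(H = h_i)."""
    return sum(w * gaussian2d(x, m, v) for w, m, v in zip(pose_marginal, means, variances))


marginal = [0.8, 0.2]                # p(H = h_0), p(H = h_1) from the current model
means = [(10.0, 5.0), (10.0, 15.0)]  # hypothetical hand location in each atomic pose
variances = [4.0, 4.0]
for x in [(10.0, 5.0), (10.0, 10.0), (10.0, 15.0)]:
    print(x, part_prior(x, marginal, means, variances))
```

A confident marginal concentrates the prior around one atomic pose; a flat marginal spreads it over several.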
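Model Inference: Updating Object Detection Results
• Greedy forward search over the terms that involve O, i.e. 𝝓1(A, O, H), 𝝓2(O, H), and 𝝓3(O, I):
  • Initialize with no object in any bounding box
  • Select the (box, object) label that gives the largest increase Δ in the model score
  • Label that box with that object and update the score
  • Stop when the largest Δ < 0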
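The greedy search can be sketched with the model score left as a callback. Assumptions: each step only labels a box that is still empty, and `toy_score` is a hypothetical stand-in for the terms involving O (detector evidence plus a penalty of 1.0 for every pair of boxes with the same label).

```python
from typing import Callable, List, Optional

Labels = List[Optional[int]]  # one entry per bounding box; None = no object


def greedy_forward_search(num_boxes: int, num_objects: int,
                          score: Callable[[Labels], float]) -> Labels:
    """Start with no object in any box; repeatedly apply the single
    (box, object) label with the largest score increase delta; stop when the
    best delta < 0 or every box is labeled."""
    labels: Labels = [None] * num_boxes
    current = score(labels)
    while True:
        best = None  # (delta, box, object)
        for m in range(num_boxes):
            if labels[m] is not None:
                continue
            for j in range(num_objects):
                trial = labels.copy()
                trial[m] = j
                delta = score(trial) - current
                if best is None or delta > best[0]:
                    best = (delta, m, j)
        if best is None or best[0] < 0:
            return labels
        _, m, j = best
        labels[m] = j
        current = score(labels)


det = [[2.0, -1.0], [1.5, -0.5], [-3.0, -2.0]]  # hypothetical det[m][j] scores


def toy_score(labels: Labels) -> float:
    unary = sum(det[m][j] for m, j in enumerate(labels) if j is not None)
    used = [j for j in labels if j is not None]
    same = sum(1 for a in range(len(used)) for b in range(a + 1, len(used))
               if used[a] == used[b])
    return unary - 1.0 * same


print(greedy_forward_search(3, 2, toy_score))  # [0, 0, None]
```

Box 2 stays empty because every label for it lowers the score.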
Model Inference: Updating A and H Labels
• Enumerate all possible A and H labels
• Choose the labels that optimize the model score
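Because the numbers of activities and atomic poses are small, the joint labels can be found by exhaustive enumeration. A sketch with hypothetical per-term tables: the 𝝓5-like term alone favors activity a_1 and the 𝝓4-like term alone favors atomic pose h_0, but the co-occurrence term penalizes that pair, so the joint maximum is (a_1, h_1).

```python
import math
from itertools import product
from typing import Callable, Optional, Tuple


def best_activity_and_pose(
    num_activities: int, num_poses: int, score: Callable[[int, int], float]
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Enumerate every (A, H) pair and return the highest-scoring one."""
    best_pair, best_score = None, -math.inf
    for a, h in product(range(num_activities), range(num_poses)):
        s = score(a, h)
        if s > best_score:
            best_pair, best_score = (a, h), s
    return best_pair, best_score


activity_term = [0.2, 1.0]   # hypothetical, e.g. phi_5 for each activity
pose_term = [0.5, 0.1]       # hypothetical, e.g. phi_4 for each atomic pose
cooc_term = [[0.0, -2.0],    # cooc_term[h][a], e.g. phi_1 with the objects fixed
             [0.0, 0.5]]


def toy_score(a: int, h: int) -> float:
    return activity_term[a] + pose_term[h] + cooc_term[h][a]


print(best_activity_and_pose(2, 2, toy_score))  # best pair (1, 1)
```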
Experimental Results (Sports Dataset)
• Activity classification
Experimental Results (PPMI Dataset)
Conclusion
• Mutual context can significantly improve performance on difficult visual recognition problems
• The joint model can share all the information (objects and atomic poses) across activities
• Training requires annotating all human body parts and objects in the training images
References
• B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2012.
• B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
• B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010.
• S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.
• Wikipedia, “Hierarchical clustering,” http://en.wikipedia.org/wiki/Hierarchical_clustering