![Page 1: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/1.jpg)
Bangpeng Yao Li Fei-Fei
Computer Science Department, Stanford University, USA
Modeling Mutual Context of Object and Human Pose
in Human-Object Interaction Activities
![Page 2: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/2.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 3: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/3.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 4: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/4.jpg)
Human pose estimation & Object detection
Introduction
Right-arm
Left-arm
Torso
Right-leg
Left-leg
Tennis racket
![Page 5: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/5.jpg)
Challenging:
Introduction
![Page 6: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/6.jpg)
Mutual context: human pose estimation and object detection facilitate the recognition of each other
Introduction
![Page 7: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/7.jpg)
Mutual context vs. no mutual context
Introduction
![Page 8: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/8.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 9: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/9.jpg)
Human-object interaction (HOI) activity
![Page 10: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/10.jpg)
A: activity class, e.g., tennis serve, volleyball smash
O: object, e.g., tennis racket, volleyball
H: human pose
P: body parts
f: visual feature
Each A has more than one type of H
HOI activity
![Page 11: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/11.jpg)
Each edge of the model has a potential function and a weight
Potentials among A, O, and H: frequencies of co-occurrence between A, O, and H
Potentials among the object and body parts: spatial relationships, computed from (position, orientation, scale)
The model
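The co-occurrence potentials on this slide can be illustrated with a small counting routine. This is a minimal sketch, not the authors' implementation: the `(activity, object, pose)` tuple layout, the function name `cooccurrence_frequencies`, the normalization by the number of samples, and the toy annotations are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequencies(samples):
    """Estimate pairwise co-occurrence frequencies among A, O, and H.

    `samples` is a list of (activity, object, pose) label tuples. The tuple
    layout and the division by the sample count are assumptions of this sketch.
    """
    names = ("A", "O", "H")
    # One counter per variable pair: (A, O), (A, H), (O, H).
    counts = {pair: Counter() for pair in combinations(names, 2)}
    for sample in samples:
        labeled = dict(zip(names, sample))
        for first, second in counts:
            counts[(first, second)][(labeled[first], labeled[second])] += 1
    total = len(samples)
    return {
        pair: {values: n / total for values, n in counter.items()}
        for pair, counter in counts.items()
    }


# Hypothetical toy annotations, not taken from the sports dataset.
toy_samples = [
    ("tennis serve", "tennis racket", "pose_1"),
    ("tennis serve", "tennis racket", "pose_2"),
    ("volleyball smash", "volleyball", "pose_3"),
    ("volleyball smash", "volleyball", "pose_3"),
]
freqs = cooccurrence_frequencies(toy_samples)
print(freqs[("A", "O")][("tennis serve", "tennis racket")])
```

Label pairs that never occur together are simply absent from the returned dictionaries, which a caller could treat as frequency zero.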
![Page 12: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/12.jpg)
: model the dependence of the object and a body part with their corresponding image evidence
The model
![Page 13: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/13.jpg)
Co-occurrence context for the activity class, object, and human pose
Multiple types of human pose for each activity
Spatial context between object and body parts
Properties of the model
![Page 14: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/14.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 15: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/15.jpg)
The learning step needs to achieve two goals: structure learning and parameter estimation
Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts
Parameter estimation: learn the potential weights that maximize the discrimination between different activities
Model learning
![Page 16: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/16.jpg)
Objective: learn the connectivity pattern between the object, the human pose, and the body parts
Method: hill-climbing approach with a tabu list
Structure learning
![Page 17: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/17.jpg)
The hill-climbing approach adds or removes edges one at a time until a maximum is reached
Hill-climbing structure learning
Human pose
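The edge search described on this slide can be sketched as a small tabu-style hill climb. This is an illustrative sketch under stated assumptions, not the authors' procedure: the `score` callback is a hypothetical stand-in for the model score, and the node names, `tabu_size`, and `patience` stopping rule are invented for the example. Each step toggles (adds or removes) the single best non-tabu edge and remembers the best structure seen so far.

```python
from collections import deque
from itertools import combinations


def hill_climb_structure(nodes, score, max_steps=100, tabu_size=3, patience=5):
    """Toggle one edge per step, skipping recently toggled (tabu) edges.

    `score` maps a frozenset of edges to a number (higher is better); it is a
    hypothetical placeholder for whatever score the learning step optimizes.
    """
    candidates = [frozenset(pair) for pair in combinations(nodes, 2)]
    current = frozenset()
    best, best_score = current, score(current)
    tabu = deque(maxlen=tabu_size)  # oldest entries drop out automatically
    stale = 0  # steps since the best score last improved

    for _ in range(max_steps):
        # Score every neighbor reachable by toggling one non-tabu edge.
        moves = [(score(current ^ {edge}), edge)
                 for edge in candidates if edge not in tabu]
        if not moves:
            break
        move_score, edge = max(moves, key=lambda move: move[0])
        current = current ^ {edge}  # add the edge if absent, else remove it
        tabu.append(edge)
        if move_score > best_score:
            best, best_score, stale = current, move_score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, best_score


# Hypothetical toy problem: the score rewards agreement with a known edge set.
nodes = ["object", "human_pose", "torso", "right_arm"]
target = frozenset({
    frozenset({"object", "right_arm"}),
    frozenset({"human_pose", "torso"}),
    frozenset({"human_pose", "right_arm"}),
})
best_edges, best_value = hill_climb_structure(
    nodes, lambda edges: -len(edges ^ target))
print(sorted(sorted(edge) for edge in best_edges), best_value)
```

The tabu list keeps the search from immediately undoing a recent move, so it can step through a few non-improving structures before the `patience` limit stops it.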
![Page 18: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/18.jpg)
Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities
Each training sample consists of:
- the potential function values, with disconnected edges set to 0
- the human pose H
- the class label A
Samples that share a human pose also share a class label
One weight vector is learned for the r-th sub-class
Max-margin parameter estimation
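One way to read "a weight vector for the r-th sub-class" is that each human pose acts as a sub-class of its activity, and an image is assigned the activity of its best-scoring sub-class. The sketch below assumes exactly that decision rule; the rule itself, the function name `predict_activity`, and the toy weights and features are assumptions for illustration, not details confirmed by the slides.

```python
def predict_activity(x, weights, subclass_to_class):
    """Return (activity label, sub-class index) of the best-scoring weight vector.

    `x` is a feature vector of potential values, `weights[r]` is the weight
    vector of sub-class r, and `subclass_to_class[r]` is that sub-class's
    activity label. All three layouts are assumptions of this sketch.
    """
    scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
    r = max(range(len(scores)), key=scores.__getitem__)
    return subclass_to_class[r], r


# Hypothetical toy model: two pose sub-classes for one activity, one for another.
toy_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_labels = ["tennis serve", "tennis serve", "volleyball smash"]
activity, subclass = predict_activity([0.2, 0.7, 0.1], toy_weights, toy_labels)
print(activity, subclass)
```

Taking the maximum over sub-classes lets an activity with several distinct poses be recognized when any one of its poses fits the image.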
![Page 19: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/19.jpg)
The objective uses the L2 norm of each weight vector and a normalization constant
Multiclass SVM
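For orientation, a generic multiclass max-margin objective with sub-classes can be written as below. The slide's own formula did not survive transcription, so this is a sketch of a standard form rather than a reconstruction of it. Every symbol is an assumption: $x_i$ for the potential values of sample $i$, $c_i$ for its sub-class (pose), $y_i$ for its class label, $y(r)$ for the class of sub-class $r$, $w_r$ for the weight vector of sub-class $r$, $\xi_i$ for a slack variable, and $\beta$ for the normalization constant.

```latex
\min_{w,\,\xi}\ \frac{1}{2}\sum_{r}\lVert w_r\rVert_2^2 + \beta\sum_{i}\xi_i
\quad\text{s.t.}\quad
\xi_i \ge 0,\qquad
w_{c_i}^{\top}x_i - w_r^{\top}x_i \ge 1 - \xi_i
\quad \forall i,\ \forall r:\ y(r)\neq y_i
```

Under this form, a sample only has to outscore sub-classes that belong to other activities, so different poses of the same activity do not compete with each other.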
![Page 20: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/20.jpg)
Using only one human pose for each HOI class is not enough to characterize all the images in that class well
Analysis of our learning algorithm
![Page 21: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/21.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 22: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/22.jpg)
Given a new test image, our objectives are to:
- estimate the pose of the human
- detect the object that is interacting with the human
Model inference, object detection, and human pose estimation
![Page 23: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/23.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 24: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/24.jpg)
Cricket - defensive shot (player and cricket bat)
Cricket - bowling (player and cricket ball)
Croquet - shot (player and croquet mallet)
Tennis - forehand (player and tennis racket)
Tennis - serve (player and tennis racket)
Volleyball - smash (player and volleyball)
30 images for training, 20 for testing
The sports dataset
![Page 25: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/25.jpg)
Better object detection
![Page 26: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/26.jpg)
Compared methods: sliding window detector, pedestrian as context, our method
Better object detection
![Page 27: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/27.jpg)
Pose estimation is still difficult
Multiple poses are better than only one pose
Better pose estimation
![Page 28: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/28.jpg)
Upper: our method
Lower left: object detection by a scanning window
Lower right: pose estimation by the state-of-the-art pictorial structure method
![Page 29: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/29.jpg)
Note: Gupta et al. predominantly use the background scene context
Combining object and pose for HOI activity classification
![Page 30: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/30.jpg)
Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion
Outline
![Page 31: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/31.jpg)
Treat the object and the human pose as the context of each other in different HOI activity classes
Structure learning method: discovers important connectivity patterns between objects and human poses
Further improvements:
- incorporate useful background scene context to facilitate the recognition of the foreground object and activity
- deal with more than one object
Conclusion
![Page 32: Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA](https://reader035.vdocuments.net/reader035/viewer/2022062421/56649cf85503460f949c9094/html5/thumbnails/32.jpg)
Thanks!!!