bangpeng yao li fei-fei computer science department, stanford university, usa

Bangpeng Yao, Li Fei-Fei

Computer Science Department, Stanford University, USA

Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Introduction

Modeling mutual context of object and pose

Model learning

Model inference, object detection, and human pose estimation

Experiments

Conclusion

Outline

Human pose estimation & Object detection

Introduction

Figure labels: right arm, left arm, torso, right leg, left leg, tennis racket

Challenging：

Introduction

Mutual context: human pose estimation and object detection facilitate the recognition of each other

Introduction

Mutual context vs. no mutual context

Introduction



Outline

HOI activity

A: activity class, e.g., tennis serve, volleyball smash

O: object, e.g., tennis racket, volleyball

H: human pose

P: body parts

f: visual features

Each A has more than one type of H

HOI activity

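$e$: an edge of the model; $\phi_e$: potential function; $w_e$: weight

$\phi_e(A, O)$, $\phi_e(A, H)$, $\phi_e(O, H)$: frequencies of co-occurrence between A, O, and H

$\phi_e(O, P_n)$, $\phi_e(H, P_n)$, $\phi_e(P_m, P_n)$: spatial relationships among the object and body parts, computed from their (position, orientation, scale)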

The model
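The co-occurrence potentials can be illustrated with a small counting sketch. This is only an illustration under stated assumptions: the function name `cooccurrence_frequency`, the toy label pairs, and the choice to normalize by the total number of training pairs are hypothetical, since the slides say only that these potentials are frequencies of co-occurrence.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def cooccurrence_frequency(pairs: List[Pair]) -> Dict[Pair, float]:
    """Relative frequency of each (label, label) pair in the training set.

    Assumption: frequencies are normalized by the total number of pairs.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical (activity, object) annotations for four training images.
labels = [
    ("tennis serve", "tennis racket"),
    ("tennis serve", "tennis racket"),
    ("volleyball smash", "volleyball"),
    ("tennis forehand", "tennis racket"),
]
freq = cooccurrence_frequency(labels)
```

The same counting could be applied to (activity, pose) and (object, pose) pairs once pose labels are available.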

$\phi_e(O, f_O)$ and $\phi_e(P_n, f_{P_n})$: model the dependence of the object and of a body part on their corresponding image evidence

The model
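To make the roles of edges, potentials, and weights concrete, here is a minimal scoring sketch. It assumes the overall model score is a weighted sum of edge potentials over the connected edges; the combination rule does not survive in the slide text, so treat that as an assumption. The function name `model_score`, the edge names, and all numbers are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def model_score(potentials: Dict[Edge, float],
                weights: Dict[Edge, float],
                connected: Set[Edge]) -> float:
    """Weighted sum of potentials, counting only edges present in the structure.

    Assumption: a disconnected edge contributes nothing to the score.
    """
    return sum(weights[e] * potentials[e] for e in connected)


# Hypothetical potential values and weights for four candidate edges.
potentials = {("A", "O"): 0.8, ("A", "H"): 0.5,
              ("O", "P1"): 0.3, ("P1", "f_P1"): 0.9}
weights = {("A", "O"): 1.0, ("A", "H"): 2.0,
           ("O", "P1"): 0.5, ("P1", "f_P1"): 1.5}

# The edge ("O", "P1") is left out of the structure in this example.
connected = {("A", "O"), ("A", "H"), ("P1", "f_P1")}
score = model_score(potentials, weights, connected)
```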

Co-occurrence context for the activity class, object, and human pose

Multiple types of human pose for each activity

Spatial context between object and body parts

Properties of the model



Outline

The learning step needs to achieve two goals: structure learning and parameter estimation

Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts

Parameter estimation: estimate the potential weights to maximize the discrimination between different activities

Model learning

Objective: learn the connectivity pattern between the object, the human pose, and the body parts

Method: hill-climbing approach with a tabu list

Structure learning

The hill-climbing approach adds or removes edges one at a time until a maximum is reached

Hill-climbing structure learning
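The search described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `hill_climb_tabu`, the toy scoring function, and the stopping rule (`patience` consecutive non-improving moves) are assumptions. The slides state only that edges are added or removed one at a time and that a tabu list is used.

```python
from collections import deque
from typing import Callable, FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]
Structure = FrozenSet[Edge]


def hill_climb_tabu(candidate_edges: Sequence[Edge],
                    score: Callable[[Structure], float],
                    tabu_size: int = 10,
                    patience: int = 5,
                    max_steps: int = 1000) -> Tuple[Structure, float]:
    """Toggle one edge per step, never revisiting recently seen structures."""
    current: Structure = frozenset()
    best, best_score = current, score(current)
    tabu = deque([current], maxlen=tabu_size)  # recently visited structures
    stale = 0
    for _ in range(max_steps):
        # Each neighbour differs from the current structure by exactly one edge.
        neighbours = [current ^ {edge} for edge in candidate_edges]
        neighbours = [s for s in neighbours if s not in tabu]
        if not neighbours:
            break
        current = max(neighbours, key=score)
        tabu.append(current)
        current_score = score(current)
        if current_score > best_score:
            best, best_score, stale = current, current_score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, best_score


# Toy score (hypothetical): structures closer to a hidden target score higher.
edges = [("O", "H"), ("O", "P1"), ("O", "P2"), ("H", "P1"), ("H", "P2")]
target = frozenset({("O", "H"), ("O", "P1"), ("H", "P2")})


def toy_score(structure: Structure) -> float:
    return -len(structure ^ target)


best_structure, best_value = hill_climb_tabu(edges, toy_score)
```

The tabu list lets the search step through a few non-improving structures without cycling back, which is one common way to avoid stopping at the first local maximum.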

Human pose

Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities

Training samples: $\{(x_i, c_i, y_i)\}$

$x_i$: the potential function values (the value for a disconnected edge is set to 0)

$c_i$: the human pose H

$y_i$: the class label A

If $c_i = c_j$, then $y_i = y_j$

$w_r$: the weight vector for the r-th sub-class

Max-margin parameter estimation

$\|\cdot\|_2$: the L2 norm; $\beta$: normalization constant

Multiclass SVM
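A max-margin objective with one weight vector per sub-class (human pose) can be sketched as below. This is a hedged illustration in the style of a multiclass SVM, not a reproduction of the slide's equation, which did not survive extraction. The function name `max_margin_objective`, the parameter name `beta`, the margin of 1, and the rule that sub-classes of the same activity do not compete with each other are all assumptions.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def max_margin_objective(W: List[List[float]],
                         subclass_to_class: List[int],
                         X: List[List[float]],
                         c: List[int],
                         beta: float) -> float:
    """0.5 * sum_r ||w_r||^2 + beta * sum_i slack_i.

    slack_i is the hinge loss of sample i against the best-scoring
    sub-class that belongs to a different activity class (assumption).
    """
    regularizer = 0.5 * sum(dot(w, w) for w in W)
    slack = 0.0
    for x, ci in zip(X, c):
        yi = subclass_to_class[ci]
        rivals = [dot(W[r], x)
                  for r, cls in enumerate(subclass_to_class) if cls != yi]
        margin = dot(W[ci], x) - max(rivals)
        slack += max(0.0, 1.0 - margin)
    return regularizer + beta * slack


# Toy setup: sub-classes 0 and 1 are two poses of activity 0,
# sub-class 2 is the single pose of activity 1.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
subclass_to_class = [0, 0, 1]
X = [[2.0, 0.0], [0.0, 2.0], [-0.5, 0.0]]
c = [0, 1, 2]
objective = max_margin_objective(W, subclass_to_class, X, c, beta=2.0)
```

In this toy setup only the third sample violates the margin, so it alone contributes slack; a learner would adjust the weight vectors to reduce this objective.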

Using only one human pose per HOI class is not enough to characterize all the images in that class well

Analysis of our learning algorithm



Outline

Given a new test image, our objective is to:
- estimate the pose of the human
- detect the object that is interacting with the human

Model inference, object detection, and human pose estimation



Outline

Cricket - defensive shot (player and cricket bat)

Cricket - bowling (player and cricket ball)

Croquet - shot (player and croquet mallet)

Tennis - forehand (player and tennis racket)

Tennis - serve (player and tennis racket)

Volleyball - smash (player and volleyball)

30 images for training, 20 for testing

The sports dataset

Better object detection

Compared methods: sliding-window detector, pedestrian as context, our method

Better object detection

Pose estimation is still difficult

Multiple poses are better than only one pose

Better pose estimation

Upper: our method

Lower left: object detection by a scanning window

Lower right: pose estimation by the state-of-the-art pictorial structure method

Note: Gupta et al. predominantly use the background scene context

Combining object and pose for HOI activity classification



Outline

Treat object and human pose as the context of each other in different HOI activity classes

Structure learning method: discovers important connectivity patterns between objects and human poses

Future improvements:
- incorporate useful background scene context to facilitate the recognition of foreground objects and activities
- deal with more than one object

Conclusion

Thanks!!!
