School of Electronic Information Engineering , Tianjin University

Human Action Recognition by Learning Bases of Action Attributes and Parts

Jia Pingping

Outline:

1. Action Classification in Still Images

2. Intuition: Action Attributes and Parts

3. Algorithm: Learning Bases of Attributes and Parts

4. Experiments: PASCAL & Stanford 40 Actions

5. Conclusion

Action Classification in Still Images

Low-level feature → High-level representation → "Riding bike"

High-level representation:

- Semantic concepts – Attributes (e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …)
- Objects
- Human poses
- Contexts of attributes & parts

Benefits: incorporates human knowledge; gives more understanding of image content; yields a more discriminative classifier.

Action Attributes and Parts

Attributes: semantic descriptions of human actions (e.g. "riding bike" vs. "not riding bike"). Each attribute is predicted by a discriminative classifier, e.g. SVM.

Parts-Objects: each object is found by a pre-trained detector.

Parts-Poselets: each poselet is found by a pre-trained detector.

a: image feature vector, built from the outputs of attribute classification, object detection, and poselet detection.

Action bases Φ: bases defined over attributes, objects, and poselets.

Bases coefficients w: coefficients of the image on the action bases, used to recognize the action (e.g. "Riding bike").
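The pipeline above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `build_feature_vector` and `reconstruct` are hypothetical, and it assumes that each attribute classifier, object detector, and poselet detector contributes one confidence score to the vector a.

```python
from typing import List, Sequence


def build_feature_vector(attribute_scores: Sequence[float],
                         object_scores: Sequence[float],
                         poselet_scores: Sequence[float]) -> List[float]:
    """Concatenate attribute, object, and poselet scores into the vector a.

    Assumption: one confidence score per classifier/detector.
    """
    return [*attribute_scores, *object_scores, *poselet_scores]


def reconstruct(bases: Sequence[Sequence[float]],
                w: Sequence[float]) -> List[float]:
    """Return the product Phi * w.

    bases[j] holds the j-th action basis (a column of Phi); w[j] is its
    coefficient.
    """
    dim = len(bases[0])
    return [sum(w[j] * bases[j][k] for j in range(len(bases)))
            for k in range(dim)]
```

With learned bases, an image is then described either by a directly or by the coefficients w for which `reconstruct(bases, w)` approximates a.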

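Bases of Attributes & Parts: Training

• Input: $a_1, \dots, a_N$

• Output: $\Phi = [\Phi_1, \dots, \Phi_M]$ and sparse $W = [w_1, \dots, w_N]$

• Jointly estimate $\Phi$ and $W$:

$$\min_{\Phi, W} \sum_{i=1}^{N} \frac{1}{2}\|a_i - \Phi w_i\|_2^2 + \lambda \|w_i\|_1 \quad \text{s.t.}\ \forall j,\ \|\Phi_j\|_1 + \frac{\gamma}{2}\|\Phi_j\|_2^2 \le 1$$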
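To make the training criterion concrete, the sketch below evaluates a sparse-coding objective and checks a constraint on the bases. It is written under assumptions: the objective is taken to be the squared reconstruction error plus an ℓ1 penalty weighted by λ, and the constraint on each basis is taken to be of elastic-net form weighted by γ. The helper names are hypothetical, and the code only scores a candidate (Φ, W); it does not perform the joint optimization.

```python
from typing import Sequence


def l1_norm(v: Sequence[float]) -> float:
    return sum(abs(x) for x in v)


def squared_l2_norm(v: Sequence[float]) -> float:
    return sum(x * x for x in v)


def training_objective(A, bases, W, lam: float) -> float:
    """Sum over images of 0.5 * ||a_i - Phi w_i||_2^2 + lam * ||w_i||_1.

    A[i] is the feature vector a_i, W[i] its coefficients w_i, and
    bases[j] the j-th basis (column of Phi).
    """
    dim = len(bases[0])
    total = 0.0
    for a, w in zip(A, W):
        recon = [sum(w[j] * bases[j][k] for j in range(len(bases)))
                 for k in range(dim)]
        residual = [ai - ri for ai, ri in zip(a, recon)]
        total += 0.5 * squared_l2_norm(residual) + lam * l1_norm(w)
    return total


def bases_feasible(bases, gamma: float) -> bool:
    """Check the assumed constraint ||Phi_j||_1 + gamma/2 * ||Phi_j||_2^2 <= 1."""
    return all(l1_norm(col) + 0.5 * gamma * squared_l2_norm(col) <= 1.0 + 1e-12
               for col in bases)
```

A typical solver alternates between updating W with Φ fixed and updating Φ with W fixed, using a function like this to monitor progress.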

Bases of Attributes & Parts: Testing

• Input: $a$, $\Phi = [\Phi_1, \dots, \Phi_M]$

• Output: sparse $w$

• Estimate $w$:

$$\min_{w} \frac{1}{2}\|a - \Phi w\|_2^2 + \lambda \|w\|_1$$
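At test time the bases are fixed and only the coefficients are estimated. One standard way to solve an ℓ1-penalized least-squares problem of this kind is the iterative shrinkage-thresholding algorithm (ISTA); the slides do not say which solver was used, so the sketch below is a swapped-in illustration with hypothetical function names.

```python
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (the proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(a: Sequence[float], bases: Sequence[Sequence[float]],
                lam: float = 0.1, n_iter: int = 200) -> List[float]:
    """ISTA for min_w 0.5 * ||a - Phi w||_2^2 + lam * ||w||_1.

    bases[j] is the j-th column of Phi.
    """
    n_bases, dim = len(bases), len(a)
    # Step size 1/L, where L bounds the largest eigenvalue of Phi^T Phi.
    # The squared Frobenius norm of Phi is a simple, safe upper bound.
    lipschitz = sum(v * v for col in bases for v in col) or 1.0
    w = [0.0] * n_bases
    for _ in range(n_iter):
        residual = [sum(w[j] * bases[j][k] for j in range(n_bases)) - a[k]
                    for k in range(dim)]
        grad = [sum(bases[j][k] * residual[k] for k in range(dim))
                for j in range(n_bases)]
        w = [soft_threshold(w[j] - grad[j] / lipschitz, lam / lipschitz)
             for j in range(n_bases)]
    return w
```

For orthonormal bases the result reduces to soft-thresholding the projections of a, which is why small responses are driven exactly to zero and w comes out sparse.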

1. PASCAL Action Dataset

http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/

• Contains 9 classes, with 21,738 images in total;

• Randomly select 50% of each class for training/validation and the remaining images for testing;

• 14 attributes, 27 objects, 150 poselets;

• The number of action bases is set to 400 and 600, respectively. The λ and γ values are set to 0.1 and 0.15.
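The per-class random split described above could be implemented along the following lines. This is a minimal sketch: the function name, the fixed seed, and the rounding rule for odd class sizes are assumptions, not details given in the slides.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def split_per_class(labels: Sequence[Hashable], train_fraction: float = 0.5,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign a fraction of each class to training, the rest to testing.

    Returns two sorted lists of sample indices (train, test).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(len(indices) * train_fraction)  # rounds down
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)
```

Splitting inside each class keeps the class proportions the same in the training and testing sets.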

Classification Result

[Figure: per-class classification results on the 9 PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods compared: our method using "a", our method using "w", POSELETS (Maji et al., 2011), SURREY_MK, UCLEAR_DOSP.]

[Figure: visualization of the 400 action bases over attributes, objects, and poselets, with examples for Reading, Riding bike, Riding horse, Using computer, and Walking.]

Control Experiment

[Figure: control experiment comparing the use of "a" and the use of "w" for different components. A: attribute; O: object; P: poselet.]
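A control experiment over components amounts to keeping only some blocks of the feature vector. The sketch below assumes the vector a is laid out as attribute scores, then object scores, then poselet scores, with one score per detector (14, 27, and 150 entries for PASCAL). The names are hypothetical, and which combinations were actually evaluated is not recoverable from the slides, so the enumeration is illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Component sizes for the PASCAL setup; one score per detector is an assumption.
PASCAL_DIMS: Dict[str, int] = {"A": 14, "O": 27, "P": 150}


def component_slices(dims: Dict[str, int]) -> Dict[str, slice]:
    """Map each component name to its slice within the feature vector."""
    slices, start = {}, 0
    for name, size in dims.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


def select_components(a: Sequence[float], keep: str,
                      dims: Dict[str, int] = PASCAL_DIMS) -> List[float]:
    """Keep only the blocks named in `keep`, e.g. "AO" for attributes + objects."""
    slices = component_slices(dims)
    selected: List[float] = []
    for name in dims:
        if name in keep:
            selected.extend(a[slices[name]])
    return selected


def all_subsets(names: Sequence[str] = ("A", "O", "P")) -> List[str]:
    """Enumerate every non-empty combination of components."""
    return ["".join(combo)
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]
```

Each reduced vector can then be fed to the same classifier, or to the same bases-learning step, to measure the contribution of each component.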

2. Stanford 40 Actions

Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking,
Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart,
Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message,
Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper

http://vision.stanford.edu/Datasets/40actions.html

• Contains 40 diverse daily human actions;

• 180∼300 images for each class, 9532 real-world images in total;

• All the images are obtained from Google, Bing, and Flickr;

• Large variations in human pose, appearance, and background clutter.

[Figure: example images from Stanford 40 Actions classes, including Drinking, Gardening, and Smoking cigarette.]

Result:

• Randomly select 100 images in each class for training, and the remaining images for testing.

• 45 attributes, 81 objects, 150 poselets. The number of action bases is set to 400 and 600, respectively. The λ and γ values are set to 0.1 and 0.15.

• Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.

[Figure: per-class classification results on Stanford 40 Actions; legend: Our Method.]

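Control Experiment

[Figure: control experiment on Stanford 40 Actions comparing the use of "a" and the use of "w" for different components. A: attribute; O: object; P: poselet.]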

Conclusion

• Partwise Bag-of-Words (PBoW) Representation: Local feature → Body part localization → PBoW generation

- head-wise BoW
- limb-wise BoW
- leg-wise BoW
- foot-wise BoW
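The PBoW generation step can be pictured as building one visual-word histogram per body part. The sketch below assumes that each local feature has already been quantized to a visual-word index and assigned to one of the four parts by the body part localization step; the function name and the ℓ1 normalization are illustrative choices, not details from the slides.

```python
from typing import Dict, List, Sequence

PARTS = ("head", "limb", "leg", "foot")


def partwise_bow(word_ids: Sequence[int], part_ids: Sequence[str],
                 vocab_size: int) -> Dict[str, List[float]]:
    """Build one normalized bag-of-words histogram per body part.

    word_ids[i] is the visual-word index of local feature i, and
    part_ids[i] is the body part that feature was localized to.
    """
    hists = {part: [0.0] * vocab_size for part in PARTS}
    for word, part in zip(word_ids, part_ids):
        hists[part][word] += 1.0
    for part, hist in hists.items():
        total = sum(hist)
        if total > 0:  # leave empty parts as all-zero histograms
            hists[part] = [count / total for count in hist]
    return hists
```

For example, `partwise_bow([0, 1, 1, 2], ["head", "head", "leg", "foot"], 3)` yields a head histogram split evenly over words 0 and 1 and an all-zero limb histogram.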

• Local Action Attribute Method: 1. Label the action samples according to different parts.

For each part (e.g. Limb, Leg, …), we define a new set of low-level semantics, such as static, vertical move, and horizontal move, to reclassify the training action samples.

• Local Action Attribute Method: 2. For each part, train a set of attribute classifiers according to the set of semantics we define.

• Local Action Attribute Method: 3. For each action sample, map its low-level representation to a middle-level representation through the following framework:

One action sample → Head-wise BoW, Limb-wise BoW, Leg-wise BoW, Foot-wise BoW → combine these four parts to build a new histogram representation of the sample.
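One way to read this mapping is that each part-wise BoW histogram is passed through that part's attribute classifiers, and the resulting scores are concatenated. The sketch below assumes linear classifiers given as (weights, bias) pairs; the slides do not specify the classifier form, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

LinearClassifier = Tuple[Sequence[float], float]  # (weights, bias), assumed form


def attribute_scores(hist: Sequence[float],
                     classifiers: Sequence[LinearClassifier]) -> List[float]:
    """Score one part histogram with each of that part's attribute classifiers."""
    return [sum(w * x for w, x in zip(weights, hist)) + bias
            for weights, bias in classifiers]


def mid_level_descriptor(
        part_hists: Dict[str, Sequence[float]],
        part_classifiers: Dict[str, Sequence[LinearClassifier]],
        part_order: Sequence[str] = ("head", "limb", "leg", "foot"),
) -> List[float]:
    """Concatenate the attribute scores of all parts into one descriptor."""
    descriptor: List[float] = []
    for part in part_order:
        descriptor.extend(attribute_scores(part_hists[part],
                                           part_classifiers[part]))
    return descriptor
```

The descriptor length equals the total number of attribute classifiers over the four parts, independent of the visual vocabulary size.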

• Local Action Attribute Method: 4. Thus, based on local action attributes, we construct a new descriptor of action samples, which can be used for classification.

Classifiers: SVM, K-NN, trained on the training set and evaluated on the testing set.
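As an illustration of the final classification step, here is a minimal K-NN classifier over descriptors. It is a sketch only: Euclidean distance and majority voting are assumed, the function name is hypothetical, and an SVM could be substituted for it.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                x: Sequence[float], k: int = 3) -> Hashable:
    """Return the majority label among the k nearest training descriptors."""
    distances = sorted(
        ((math.dist(sample, x), index) for index, sample in enumerate(train_x)),
        key=lambda pair: pair[0],
    )
    votes = Counter(train_y[index] for _, index in distances[:k])
    return votes.most_common(1)[0][0]
```

Descriptors from the training set serve as `train_x`, and each testing-set descriptor is passed as `x`.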

Thank you
