Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples. Yu-Gang Jiang, Zhenguo Li, Shih-Fu Chang. IEEE Transactions on CSVT, 2011.

Page 1: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Yu-Gang Jiang, Zhenguo Li, Shih-Fu Chang

IEEE Transactions on CSVT 2011

Page 2: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Outline

• Context-based Action Retrieval Framework
• Experiment Results
• Conclusion

Page 3: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Framework

A. Video Representation and Negative Sample Selection

B. Obtaining Action Context
   1. Scene Recognition
   2. Object Recognition

C. Estimating Action-Scene-Object Relationship

D. Incorporating Multiple Contextual Cues

Page 4: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Context-Based Action Retrieval Framework

Page 8: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

A. Video Representation and Negative Sample Selection

• Use the bag-of-features framework
• Use k-means clustering to generate 4000 visual words
• Quantize each video clip into two 4000-D histograms of visual words
• Apply Local and Global Consistency (LGC) [27]
• Pick negative samples after propagation
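The quantization step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the codebook of visual words has already been produced by k-means, uses a toy 3-word codebook instead of 4000 words, and assumes L1 normalization of the histogram. The names `nearest_word` and `bof_histogram` are hypothetical. In the setting of the slides, the same function would be called once per feature type to obtain the two histograms.

```python
import math

def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))

def bof_histogram(descriptors, codebook):
    """Count nearest-word assignments and L1-normalize (normalization is an assumption)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data: 3 visual words and 4 local descriptors (the slides use 4000 words).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bof_histogram(descriptors, codebook)
print(hist)
```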

[27] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Proc. Neural Inform. Process. Syst., 2004, pp. 321–328.
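A rough sketch of LGC-style propagation followed by negative selection is given below. It uses the iterative update from [27], F ← αSF + (1 − α)Y with S the symmetrically normalized affinity matrix. How the slides pick negatives is not spelled out, so taking the lowest-scoring unlabeled samples is an assumption here, as are the Gaussian affinity, the parameter values, and the names `lgc_scores` and `pick_negatives`.

```python
import math

def lgc_scores(points, positives, alpha=0.9, sigma=1.0, iters=200):
    """Propagate positive labels over a Gaussian affinity graph (LGC iteration)."""
    n = len(points)
    # Gaussian affinity with a zero diagonal.
    W = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    y = [1.0 if i in positives else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

def pick_negatives(scores, positives, k):
    """Assumed rule: the k unlabeled samples with the lowest propagated score."""
    unlabeled = [i for i in range(len(scores)) if i not in positives]
    return sorted(unlabeled, key=lambda i: scores[i])[:k]

# Toy data: two clusters, with one labeled positive in the first cluster.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
positives = {0}
scores = lgc_scores(points, positives)
negatives = pick_negatives(scores, positives, k=2)
print(negatives)
```

In this toy setup the selected negatives come from the cluster far from the labeled positive, which is the behavior one would want when only a few positive examples are available.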

Page 9: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Context-Based Action Retrieval Framework

Page 10: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

B. Scene Recognition

• Train a separate classifier for each of the two bag-of-features representations and average their probability predictions
• The scene models are learned with SVMs
• Adopt 10 scene classes: House, Road, Bedroom, Car Interior, Hotel, Kitchen, Living Room, Office, Restaurant, Shop
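The averaging of the two classifiers' probability predictions can be illustrated with a small late-fusion sketch. The probability values below are made up for illustration and do not come from the paper. The function name `average_fusion` is hypothetical, and the SVM training itself is not shown.

```python
SCENES = ["House", "Road", "Bedroom", "Car Interior", "Hotel",
          "Kitchen", "Living Room", "Office", "Restaurant", "Shop"]

def average_fusion(probs_a, probs_b):
    """Average per-class probabilities from two classifiers (late fusion)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]

# Hypothetical outputs of the two per-feature scene classifiers for one clip.
probs_a = [0.05, 0.05, 0.10, 0.05, 0.05, 0.50, 0.10, 0.04, 0.03, 0.03]
probs_b = [0.10, 0.02, 0.08, 0.05, 0.05, 0.30, 0.25, 0.05, 0.05, 0.05]

fused = average_fusion(probs_a, probs_b)
best_scene = SCENES[max(range(len(fused)), key=lambda i: fused[i])]
print(best_scene, fused[SCENES.index(best_scene)])
```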

Page 11: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

B. Object Recognition

• The object detector covers only three classes: person, chair, and car
• Define actions
  – Track objects based on location and box size
  – Discard isolated detections
• Compute the average spatial distance between different types of objects
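One possible reading of the tracking and distance steps is sketched below. The slides do not give the linking rule, so the greedy frame-to-frame linking, the thresholds `max_shift` and `max_scale`, the `Det` record, and the use of box-center distance in pixels are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Det:
    frame: int
    label: str
    cx: float  # box center x
    cy: float  # box center y
    w: float
    h: float

def compatible(a, b, max_shift=0.5, max_scale=1.5):
    """Same class, nearby center, similar box size (thresholds are assumed)."""
    shift = math.hypot(a.cx - b.cx, a.cy - b.cy)
    scale = max(a.w / b.w, b.w / a.w, a.h / b.h, b.h / a.h)
    return (a.label == b.label
            and shift <= max_shift * max(a.w, a.h)
            and scale <= max_scale)

def link_tracks(dets, min_len=2):
    """Greedily link detections in consecutive frames; drop isolated ones."""
    tracks = []
    for d in sorted(dets, key=lambda d: d.frame):
        for t in tracks:
            if t[-1].frame == d.frame - 1 and compatible(t[-1], d):
                t.append(d)
                break
        else:
            tracks.append([d])
    return [t for t in tracks if len(t) >= min_len]

def mean_distance(tracks, label_a, label_b):
    """Average center distance between two different classes in shared frames."""
    dists = []
    for ta in (t for t in tracks if t[0].label == label_a):
        for tb in (t for t in tracks if t[0].label == label_b):
            by_frame = {d.frame: d for d in tb}
            for d in ta:
                other = by_frame.get(d.frame)
                if other is not None:
                    dists.append(math.hypot(d.cx - other.cx, d.cy - other.cy))
    return sum(dists) / len(dists) if dists else None

dets = [
    Det(0, "person", 100, 100, 40, 80), Det(1, "person", 104, 100, 40, 80),
    Det(2, "person", 108, 100, 40, 80),
    Det(0, "chair", 130, 100, 30, 30), Det(1, "chair", 130, 100, 30, 30),
    Det(2, "chair", 130, 100, 30, 30),
    Det(1, "car", 400, 300, 120, 60),  # appears in one frame only
]
tracks = link_tracks(dets)
person_chair = mean_distance(tracks, "person", "chair")
print(len(tracks), person_chair)
```

The single-frame car detection is discarded as isolated, so only the person and chair tracks contribute to the distance.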

Page 12: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

B. Object Recognition

Page 13: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Context-Based Action Retrieval Framework

Page 14: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

C. Estimating Action-Scene-Object Relationship

• Define a context-based inference score that should:
  – Distinguish well between samples from P and N (the positive and negative sample sets)
  – Produce similar scores if two samples are close

Page 15: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

C. Estimating Action-Scene-Object Relationship

• F: prediction matrix of contextual cues (n training samples × m contextual cues)
• c: coefficient vector

[Diagram: the n × m matrix F multiplied by the vector c]
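The product of F and c can be written out in a few lines. In this sketch the values of F and c are invented and c is hand-picked rather than learned, so it only shows how one score per training sample arises from the m contextual cues. The function name `context_scores` is hypothetical.

```python
def context_scores(F, c):
    """Return F @ c: one context-based score per training sample."""
    if any(len(row) != len(c) for row in F):
        raise ValueError("each row of F needs one entry per contextual cue")
    return [sum(f * w for f, w in zip(row, c)) for row in F]

# Toy example: n = 4 samples, m = 3 contextual cues (values are made up).
F = [[0.9, 0.1, 0.8],   # sample 0, assumed positive
     [0.8, 0.2, 0.7],   # sample 1, assumed positive
     [0.1, 0.9, 0.2],   # sample 2, assumed negative
     [0.2, 0.7, 0.1]]   # sample 3, assumed negative
c = [1.0, -1.0, 0.5]    # hand-picked coefficients, not learned

scores = context_scores(F, c)
print(scores)
```

With these toy numbers every assumed positive scores higher than every assumed negative, which is the kind of separation the inference score is meant to provide.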

Page 16: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

C. Estimating Action-Scene-Object Relationship

[Equation image; its two annotated parts are labeled Constraint 1 and Constraint 2]

Page 17: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Context-Based Action Retrieval Framework

Page 18: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

D. Incorporating Multiple Contextual Cues

• Given an action a and a test sample x

[Formula symbols lost in extraction; the quantities defined are:]
– the context weight parameter
– the prediction score of contextual cues on x
– the action prediction score based on raw visual features
– the refined prediction after incorporating contextual cues

Action classes: AnswerPhone, DriveCar, Eat, Kiss, GetOutCar, HandShake, FightPerson, HugPerson, Run, SitDown, SitUp, StandUp
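The combination formula itself is not preserved in this transcript, so the sketch below is not the paper's equation. It assumes a simple additive form, refined = raw + weight × context, purely to show how a context weight could re-rank test samples. All names and numbers are hypothetical.

```python
def refine(raw_score, context_score, weight):
    """Assumed additive combination; the paper's actual formula may differ."""
    return raw_score + weight * context_score

# Made-up scores for three test clips and one action.
raw = {"clip_a": 0.55, "clip_b": 0.60, "clip_c": 0.20}
context = {"clip_a": 0.8, "clip_b": -0.3, "clip_c": 0.1}
weight = 0.5  # hypothetical context weight for this action

refined = {k: refine(raw[k], context[k], weight) for k in raw}
ranking = sorted(refined, key=refined.get, reverse=True)
print(ranking)
```

Here the raw features alone would rank clip_b first, while the supportive context of clip_a moves it to the top after refinement.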

Page 19: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Experiment Results

• Mean average precision (mAP)
• Retrieval Performance by Raw Features
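For reference, mean average precision over ranked retrieval lists can be computed as in the sketch below. It assumes each ranked list contains all relevant items for its query, so average precision is normalized by the number of relevant items found. The relevance lists are toy data, not results from the paper.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; entries are truthy for relevant items."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """Mean of the per-query (here: per-action) average precisions."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Toy ranked lists for two queries (1 = relevant, 0 = not relevant).
runs = [[1, 0, 1, 0], [0, 1]]
ap_first = average_precision(runs[0])
map_score = mean_average_precision(runs)
print(ap_first, map_score)
```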

Page 20: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Experiment Results

• Scene vs. Object

Page 21: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Experiment Results

• Scene vs. Object

Page 22: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Experiment Results

• Comparison to the state of the art
  – SVM learning
  – Movie script-mining

Page 23: Modeling Scene and Object Contexts for Human Action Retrieval with Few Examples

Conclusion

• An algorithm based on the semi-supervised learning paradigm is used to model action-scene-object dependency from limited samples

• This algorithm can be applied to other types of action videos