Juergen Gall
Analyzing Human Behavior in
Video Sequences
Analyzing Human Behavior
Videos
Low-level features, e.g., gradients, optical flow
Analyzing Human Behavior
Human Pose
09.10.2017 Juergen Gall – Institute of Computer Science III – Computer Vision Group
21 Actions from HMDB
928 clips, 33183 frames
HMDB51 (Kuehne et al., ICCV 2011)
Puppet Annotation
Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]
[ http://jhmdb.is.tue.mpg.de ]
Study with Annotated Data (2013)
• Large potential gain from pose features
• Not achievable with existing 2D human pose estimation methods
Gain over the baseline when ground-truth (GT) data is given:
• Low level, given puppet flow: + ~11%
• Mid level, given puppet mask: + ~9%
• High level, given joint positions (pose features): + ~20%
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]
[ http://jhmdb.is.tue.mpg.de ]
CNNs for Pose Estimation
Stack CNNs:
[ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]
Coupled Action Recognition and Pose
Estimation
[ U. Iqbal et al. Pose for Action – Action for Pose. FG 2017 ]
Pose Estimation in Videos
Video datasets for human pose estimation in unconstrained videos
do not exist.
Unconstrained means:
• Publicly available content from the Internet (e.g.,
YouTube)
• Multiple persons in a video (no assumption about
position)
• Arbitrary number of visible joints (truncation and
occlusion)
• Large scale variations (unknown scale)
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose-Track Dataset
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]
[ http://jhmdb.is.tue.mpg.de ]
Pose-Track Dataset
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Challenge ICCV 2017
[ http://posetrack.net/workshops/iccv2017 ]
Pose Track: Simultaneous Pose
Estimation and Tracking
Estimate pose + person association over time:
• Predict body joints (CNN trained on MPII Pose)
• Build a graph with temporal and spatial edges
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
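The graph construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Detection` tuple, the `build_graph` helper, and the `max_frame_gap` parameter are hypothetical names, and restricting temporal edges to detections of the same joint type is an assumption made here for simplicity.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical container for one joint detection.
Detection = namedtuple("Detection", ["frame", "joint_type", "x", "y", "score"])

def build_graph(detections, max_frame_gap=1):
    """Return (spatial_edges, temporal_edges) as lists of index pairs."""
    spatial, temporal = [], []
    for i, j in combinations(range(len(detections)), 2):
        a, b = detections[i], detections[j]
        gap = abs(a.frame - b.frame)
        if gap == 0:
            # Spatial edge: both detections lie in the same frame.
            spatial.append((i, j))
        elif gap <= max_frame_gap and a.joint_type == b.joint_type:
            # Assumption: temporal edges link only detections of the same joint type.
            temporal.append((i, j))
    return spatial, temporal

detections = [
    Detection(0, "head", 10.0, 5.0, 0.9),
    Detection(0, "neck", 10.0, 9.0, 0.8),
    Detection(1, "head", 11.0, 5.0, 0.9),
    Detection(1, "neck", 11.0, 9.0, 0.7),
]
spatial_edges, temporal_edges = build_graph(detections)
```

Here the two frames each contribute one spatial edge, and each joint type contributes one temporal edge between the frames.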
Pose Track: Simultaneous Pose
Estimation and Tracking
Unaries: Confidences of detected joints
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose
Estimation and Tracking
Spatial binaries: Extract a square bounding box around each
detection
Two cases:
• Different joint type:
• Logistic regression based on distance and orientation
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
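For the case of different joint types, a logistic regression on distance and orientation could look like the sketch below. The feature encoding (distance plus cosine and sine of the angle), the function names, and the weights are assumptions chosen for illustration; the weights are made up and would normally be learned from annotated poses.

```python
import math

def pairwise_features(p, q):
    """Distance and orientation of the offset from joint p to joint q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def same_person_probability(p, q, weights, bias):
    """Logistic regression on (distance, cos(angle), sin(angle))."""
    dist, angle = pairwise_features(p, q)
    # Encode the angle with cos/sin so the feature is continuous at +/- pi.
    feats = (dist, math.cos(angle), math.sin(angle))
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights: penalise large distances, favour offsets pointing down the image.
weights, bias = (-0.1, 0.0, 1.0), 2.0
p_near = same_person_probability((10, 10), (10, 30), weights, bias)
p_far = same_person_probability((10, 10), (200, 30), weights, bias)
```

With these illustrative weights, a nearby joint below the reference joint gets a probability above 0.5, while a distant one gets a probability close to zero.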
Pose Track: Simultaneous Pose
Estimation and Tracking
Spatial binaries: Extract a square bounding box around each
detection
Two cases:
• Same joint type:
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose
Estimation and Tracking
Temporal binaries: Compute optical flow (DeepMatching)
Logistic regression:
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
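One way to turn optical flow into a temporal pairwise term is sketched below. It assumes a single feature, the distance between a detection in the next frame and the current detection moved along its flow vector. The feature choice, the function names, and the default `weight` and `bias` values are hypothetical and only illustrate the idea.

```python
import math

def flow_compensated_distance(p, q, flow_at_p):
    """Distance between q and the position of p after moving it along the flow."""
    px, py = p[0] + flow_at_p[0], p[1] + flow_at_p[1]
    return math.hypot(q[0] - px, q[1] - py)

def temporal_probability(p, q, flow_at_p, weight=-0.5, bias=3.0):
    """Logistic regression on the flow-compensated distance (made-up parameters)."""
    d = flow_compensated_distance(p, q, flow_at_p)
    return 1.0 / (1.0 + math.exp(-(bias + weight * d)))

p = (50.0, 40.0)          # detection in frame f
flow = (4.0, -1.0)        # optical flow vector at p, pointing to frame f'
prob_same = temporal_probability(p, (54.0, 39.0), flow)    # lands exactly on q
prob_other = temporal_probability(p, (90.0, 39.0), flow)   # far from the warped position
```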
Pose Track: Simultaneous Pose
Estimation and Tracking
Solve integer linear program:
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose
Estimation and Tracking
To obtain plausible poses, constraints are added:
• Spatial transitivity:
• Temporal transitivity:
• Spatio-temporal transitivity:
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
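The constraint formulas did not survive in the slide text, so the sketch below uses the generic transitivity constraint from graph partitioning: if a and b belong together and b and c belong together, then a and c must belong together, i.e. x_ab + x_bc - 1 <= x_ac. It applies the same rule to every triple of nodes, whether the edges are spatial or temporal. Instead of an ILP solver, it enumerates all labelings of a tiny graph. The helper names and edge costs are hypothetical.

```python
from itertools import combinations, product

def is_transitive(nodes, x):
    """x maps a sorted node pair to 1 (same person) or 0 (different person)."""
    for a, b, c in combinations(nodes, 3):
        # Exactly two "same" labels in a triangle violate transitivity.
        if x[(a, b)] + x[(b, c)] + x[(a, c)] == 2:
            return False
    return True

def best_partition(nodes, cost):
    """Exhaustively minimise sum(cost[e] * x[e]) over all transitive labelings."""
    edges = list(combinations(nodes, 2))
    best, best_x = float("inf"), None
    for labels in product((0, 1), repeat=len(edges)):
        x = dict(zip(edges, labels))
        if not is_transitive(nodes, x):
            continue
        total = sum(cost[e] * x[e] for e in edges)
        if total < best:
            best, best_x = total, x
    return best, best_x

# Made-up costs: negative values favour joining two detections.
cost = {(0, 1): -2.0, (0, 2): 3.0, (1, 2): -2.0}
best_cost, labeling = best_partition([0, 1, 2], cost)
```

Without the constraint, the cheapest labeling would join (0, 1) and (1, 2) but not (0, 2), which does not describe a valid grouping into persons. With the constraint, only one of the two attractive edges is kept.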
Pose Track: Simultaneous Pose
Estimation and Tracking
To obtain plausible poses, constraints are added:
Spatio-temporal consistency:
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose
Estimation and Tracking
Estimate pose + person association over time:
• Predict body joints (CNN trained on MPII Pose)
• Build a graph with temporal and spatial edges
• Partition spatio-temporal graph
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Evaluation
• Pose estimation accuracy (mAP)
• Person association (MOTA)
[ U. Iqbal et al. Pose-Track: Joint Multi-Person
Pose Estimation and Tracking. CVPR 2017 ]
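For reference, MOTA (Multiple Object Tracking Accuracy) is commonly defined as 1 - (FN + FP + IDSW) / GT, with all counts summed over the frames. The sketch below computes it from per-frame counts. The `mota` helper and the example numbers are illustrative, and how the counts are obtained for body joints is not covered here.

```python
def mota(frames):
    """frames: list of (num_gt, false_negatives, false_positives, id_switches)."""
    total_gt = sum(f[0] for f in frames)
    errors = sum(f[1] + f[2] + f[3] for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / total_gt

# Made-up counts for three frames with ten ground-truth objects each.
frames = [(10, 1, 0, 0), (10, 0, 2, 1), (10, 1, 0, 0)]
score = mota(frames)
```

Because false positives are counted as errors, MOTA can become negative when a tracker produces more errors than there are ground-truth objects.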
Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]
[ http://jhmdb.is.tue.mpg.de ]
Video Analysis for Studying the
Behavior of Mice
Recurrent Neural Networks
• Gated units (LSTM/GRU)
Weakly Supervised Learning
• Fully supervised:
• Weakly supervised (transcripts)
[ A. Richard et al. Weakly Supervised Action Learning with RNN
based Fine-to-Coarse Modeling. CVPR 2017 ]
Weakly Supervised Learning
• Represent an activity $a$ like “spoon_powder” by latent
sub-activities $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$
• Optimal number of sub-activities is unknown:
• Many sub-activities for long activities
• Few sub-activities for short activities
$s_1^{(a)}$ $s_2^{(a)}$ $s_3^{(a)}$ $s_4^{(a)}$ $s_5^{(a)}$ $s_6^{(a)}$
Model
• RNN with Gated Recurrent Units (GRU)
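A single GRU update can be written in a few lines. The sketch below uses plain Python lists, omits biases, and follows one common convention for the update gate, in which the new state is (1 - z) * h + z * candidate. Other formulations swap the roles of z and 1 - z. The parameter names `Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh` are assumptions for illustration.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def gru_step(x, h, p):
    """One GRU update; p holds input weights W* and hidden weights U* (no biases)."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h)))  # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h)))  # reset gate
    rh = [a * b for a, b in zip(r, h)]                         # gated previous state
    cand = [math.tanh(a) for a in add(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

# With all-zero weights both gates equal 0.5 and the candidate state is 0,
# so the hidden state is simply halved.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h_new = gru_step([0.3, -0.7], [1.0, -2.0], params)
```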
Model
• A Hidden Markov Model (HMM) enforces a fixed order of
sub-activities: $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$
• The HMMs use the output probabilities of the RNN as input
Model
• Hidden Markov Model (HMM) for each activity
Model
• The transcripts define the order of activities:
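Given a transcript, frames can be assigned to the ordered activities with a Viterbi-style dynamic program over frame-wise log-probabilities. The sketch below is a simplified illustration: it has no length model or transition probabilities, it works on activities rather than sub-activities, and the `align` function and the example probabilities are hypothetical.

```python
import math

def align(log_probs, transcript):
    """log_probs[t][c]: log-probability of class c at frame t.
    transcript: ordered list of classes. Returns one label per frame."""
    T, N = len(log_probs), len(transcript)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per transcript entry")
    NEG = float("-inf")
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][transcript[0]]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]                        # remain in the same entry
            move = score[t - 1][n - 1] if n > 0 else NEG  # advance to the next entry
            best = max(stay, move)
            if best > NEG:
                score[t][n] = best + log_probs[t][transcript[n]]
                back[t][n] = n if stay >= move else n - 1
    # Backtrack from the last transcript entry at the last frame.
    labels, n = [], N - 1
    for t in range(T - 1, -1, -1):
        labels.append(transcript[n])
        n = back[t][n]
    return labels[::-1]

# Made-up frame-wise probabilities for two classes.
probs = [{"pour": 0.9, "stir": 0.1}, {"pour": 0.8, "stir": 0.2},
         {"pour": 0.3, "stir": 0.7}, {"pour": 0.1, "stir": 0.9}]
log_probs = [{c: math.log(v) for c, v in frame.items()} for frame in probs]
labels = align(log_probs, ["pour", "stir"])
```

The transcript only fixes the order of the labels. The dynamic program decides where the boundary between them falls.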
Weakly Supervised Learning
[ A. Richard et al. Weakly Supervised Action Learning with RNN
based Fine-to-Coarse Modeling. CVPR 2017 ]
Results
• Accuracy on unseen sequences (video without
transcript)
[ A. Richard et al. Weakly Supervised Action Learning with RNN
based Fine-to-Coarse Modeling. CVPR 2017 ]
Results
• Accuracy on unseen sequences (video with
transcript)
[ A. Richard et al. Weakly Supervised Action Learning with RNN
based Fine-to-Coarse Modeling. CVPR 2017 ]
Research Unit - Anticipating Human
Behavior
20.09.2016 Research Unit 2535 - Anticipating Human Behavior
[ https://pages.iai.uni-bonn.de/FOR2535 ]
Thank you for your attention.