![Page 1: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/1.jpg)
Spatiotemporal Graphs for Object Segmentation, Human Pose Estimation and
Action Detection in Videos
Mubarak Shah
Center for Research in Computer Vision
University of Central Florida
![Page 2: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/2.jpg)
Spatiotemporal Graphs (STG)
• Video-based problems
• Nodes and edges
• Spatiotemporal
• Type I
• Type II
![Page 3: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/3.jpg)
Type I Spatiotemporal Graph (STG)
• Nodes represent entities in single frames
[Figure: nodes within Frame 1, Frame 2, Frame 3, …]
Nodes can be: object proposals, pixels, super-pixels, object locations, …
Edges can be: color similarities, distances, shape similarities, …
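To make the Type I structure concrete, here is a minimal sketch, assuming each node is an entity that lives in one frame (2-D points stand in for proposals) and that edges only join entities in consecutive frames. The class `TypeISTG`, the method `connect_adjacent_frames`, and the adjacency rule are hypothetical illustrations, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]  # (frame index, entity index within that frame)

@dataclass
class TypeISTG:
    """Type I STG: every node is an entity that lives in a single frame."""
    entities: List[List[object]]  # entities[f] holds the entities (e.g. proposals) of frame f
    edges: Dict[Tuple[Node, Node], float] = field(default_factory=dict)

    def connect_adjacent_frames(self, weight: Callable[[object, object], float]) -> None:
        """Add a weighted edge from every entity in frame f to every entity in frame f+1."""
        for f in range(len(self.entities) - 1):
            for i, a in enumerate(self.entities[f]):
                for j, b in enumerate(self.entities[f + 1]):
                    self.edges[((f, i), (f + 1, j))] = weight(a, b)
```

Any of the edge types listed above (color similarity, distance, shape similarity) can be plugged in as `weight`.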
![Page 4: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/4.jpg)
Type II Spatiotemporal Graph (STG)
• Nodes represent entities in multiple frames
Nodes can be: object tracklets, super-voxels, …
Edges can be: appearance similarities, motion models, overlaps, …
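A Type II node spans several frames. As a hedged sketch, the code below models a tracklet only by its frame range and uses temporal intersection-over-union as the "overlap" edge weight; appearance or motion terms would enter as additional factors. The names `Tracklet`, `temporal_overlap`, and `build_type2_stg` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Tracklet:
    """Type II node: one entity followed over frames start..end (inclusive)."""
    start: int
    end: int

def temporal_overlap(a: Tracklet, b: Tracklet) -> float:
    """Temporal IoU of two frame ranges; 0.0 when they share no frame."""
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start) + 1  # valid because the ranges overlap
    return inter / union

def build_type2_stg(tracklets: List[Tracklet]) -> Dict[Tuple[int, int], float]:
    """Undirected edges (i, j), i < j, between tracklets that share at least one frame."""
    edges = {}
    for i, j in combinations(range(len(tracklets)), 2):
        w = temporal_overlap(tracklets[i], tracklets[j])
        if w > 0:
            edges[(i, j)] = w
    return edges
```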
![Page 5: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/5.jpg)
Examples of Spatiotemporal Graphs
![Page 6: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/6.jpg)
Video Object Segmentation (VOS)
[Figure: original video and object segmentation]
![Page 7: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/7.jpg)
Spatiotemporal Graph (STG): Video Object Segmentation
[Figure: object-proposal nodes in Frame i-1, Frame i, and Frame i+1, between the nodes s and t]
![Page 8: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/8.jpg)
Video Object Co-Segmentation (VOCS)
![Page 9: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/9.jpg)
STG – Video Object Co-Segmentation
[Figure: tracklets from Video 1 and Video 2 as graph nodes]
![Page 10: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/10.jpg)
Human Pose Estimation in Videos (HPEV)
![Page 11: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/11.jpg)
STG – Human Pose Estimation in Videos
[Figure: body-part nodes: head top, head bottom, shoulder, elbow, hand, hip, knee, ankle]
![Page 12: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/12.jpg)
Human Action Detection (HAD)
[Figure: example action: Diving]
![Page 13: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/13.jpg)
Spatiotemporal Context Graphs for Training Videos
[Figure: training videos 1, …, n for action c, their context graphs $G_1(V_1, E_1), \ldots, G_n(V_n, E_n)$, and a composite graph]
![Page 14: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/14.jpg)
Outline
• Video Object Segmentation (VOS)
• Video Object Co-Segmentation (VOCS)
• Human Pose Estimation in Videos (HPEV)
• Human Action Detection (HAD)
![Page 15: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/15.jpg)
Video Object Segmentation (VOS)
Dong Zhang, Omar Javed, and Mubarak Shah, “Video object segmentation through spatially accurate and temporally dense extraction of primary object regions”, CVPR, 2013
![Page 16: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/16.jpg)
Video Object Segmentation (VOS)
• Applications
  • Object Recognition
  • Activity Recognition
  • Surveillance
![Page 17: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/17.jpg)
Video Object Segmentation (VOS)
• Challenges
  • Camera movements
  • Varieties of objects
  • Deformable objects
![Page 18: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/18.jpg)
Framework
Input Video → Object Proposal Generation → Spatiotemporal Graph for Object Selection → GMMs and MRF based Optimization → Object Segmentation
![Page 19: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/19.jpg)
Object Proposal Generation
• Object proposal methods [1,2]
[1] I. Endres and D. Hoiem, “Category Independent Object Proposals”, ECCV, 2010
[2] B. Alexe, T. Deselaers, and V. Ferrari, “What is an object?”, CVPR, 2010
![Page 20: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/20.jpg)
[Figure: ranked object proposals (1, 2, 3, 4, …, 100) for several frames of the SegTrack video “monkeydog”]
Sample a lot of proposals! Select the right ones!
![Page 21: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/21.jpg)
[Figure: ranked object proposals expansion for several frames of the SegTrack video “parachute”, giving multiple proposals per frame]
![Page 22: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/22.jpg)
Spatiotemporal Graph for Object Selection
![Page 23: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/23.jpg)
Unary Edge
[Figure: an object proposal represented by a beginning node and an ending node joined by a unary edge]
The unary edge represents object-ness.
![Page 24: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/24.jpg)
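Unary Edge: Score
$S_{unary} = M(r) + A(r)$
• $A(r)$: appearance score (objectness)
• $M(r)$: average Frobenius norm of the optical flow gradient
$\|U_x\|_F = \left\| \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix} \right\|_F = \sqrt{u_x^2 + u_y^2 + v_x^2 + v_y^2}$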
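A minimal NumPy sketch of the unary score, assuming the optical flow is given as two arrays `u` (horizontal) and `v` (vertical), the region r is a boolean mask, and derivatives are taken with `np.gradient` (central differences). The objectness model behind $A(r)$ is not specified here, so it is passed in as a number. All function names are hypothetical.

```python
import numpy as np

def flow_gradient_frobenius(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Per-pixel ||U_x||_F = sqrt(u_x^2 + u_y^2 + v_x^2 + v_y^2)."""
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (cols, x).
    u_y, u_x = np.gradient(u)
    v_y, v_x = np.gradient(v)
    return np.sqrt(u_x**2 + u_y**2 + v_x**2 + v_y**2)

def motion_score(u: np.ndarray, v: np.ndarray, mask: np.ndarray) -> float:
    """M(r): average of ||U_x||_F over the pixels selected by a boolean mask."""
    norms = flow_gradient_frobenius(u, v)
    return float(norms[mask].mean()) if mask.any() else 0.0

def unary_score(u: np.ndarray, v: np.ndarray, mask: np.ndarray, appearance: float) -> float:
    """S_unary = M(r) + A(r), with A(r) supplied by an external objectness model."""
    return motion_score(u, v, mask) + appearance
```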
![Page 25: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/25.jpg)
Unary Edge: Motion Score
[Figure: original video frame, optical flow, object region (proposal), optical flow gradient, boundary region, and optical flow gradient around the boundary]
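The figure restricts attention to the optical flow gradient around the proposal boundary. One way to obtain such a boundary region is a band between a dilated and an eroded copy of the mask; such a band could then serve as the mask over which flow-gradient norms are averaged. The 4-neighbour morphology and the `width` parameter below are assumptions, not details given in the talk.

```python
import numpy as np

def _dilate4(mask: np.ndarray) -> np.ndarray:
    """One-pixel 4-neighbour dilation using array shifts (no SciPy needed)."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def boundary_band(mask: np.ndarray, width: int = 2) -> np.ndarray:
    """Pixels within `width` 4-neighbour steps of the region boundary, inside or outside."""
    dilated = mask.copy()
    eroded = mask.copy()
    for _ in range(width):
        dilated = _dilate4(dilated)
        # Erosion = complement of the dilated complement; neighbours beyond the
        # image border are ignored, so regions touching the border are not eroded there.
        eroded = ~_dilate4(~eroded)
    return dilated & ~eroded
```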
![Page 26: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/26.jpg)
Binary Edges
[Figure: binary edges between object proposals in Frame i, Frame i+1, and Frame i+2]
![Page 27: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/27.jpg)
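Binary Edge Score
$S_{binary} = \lambda \cdot S_{overlap}(r_m, r_n) \cdot S_{color}(r_m, r_n)$
$S_{color}(r_m, r_n) = \mathrm{hist}(r_m) \cdot \mathrm{hist}(r_n)^T$
$S_{overlap}(r_m, r_n) = \dfrac{|r_m \cap \mathrm{warp}_{mn}(r_n)|}{|r_m \cup \mathrm{warp}_{mn}(r_n)|}$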
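The binary score can be sketched as follows, assuming regions are boolean masks and $\mathrm{warp}_{mn}$ moves $r_n$ into the frame of $r_m$. A real implementation would warp with a dense optical flow field; this sketch substitutes a single integer translation `(dx, dy)` to stay self-contained, and it L2-normalises the histograms (also an assumption). Function names and the default `lam=1.0` are hypothetical.

```python
import numpy as np

def color_similarity(hist_m: np.ndarray, hist_n: np.ndarray) -> float:
    """S_color = hist(r_m) . hist(r_n)^T on L2-normalised histograms (normalisation assumed)."""
    a = hist_m / (np.linalg.norm(hist_m) + 1e-12)
    b = hist_n / (np.linalg.norm(hist_n) + 1e-12)
    return float(a @ b)

def warp_mask(mask_n: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Toy stand-in for warp_mn: shift region r_n by one integer flow vector (dx, dy)."""
    h, w = mask_n.shape
    out = np.zeros_like(mask_n, dtype=bool)
    ys, xs = np.nonzero(mask_n)
    ys2, xs2 = ys + dy, xs + dx
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)  # drop pixels warped off-frame
    out[ys2[keep], xs2[keep]] = True
    return out

def overlap_similarity(mask_m: np.ndarray, warped_n: np.ndarray) -> float:
    """S_overlap: intersection over union of r_m and the warped r_n."""
    union = np.logical_or(mask_m, warped_n).sum()
    return float(np.logical_and(mask_m, warped_n).sum() / union) if union else 0.0

def binary_score(mask_m, mask_n, hist_m, hist_n, flow=(0, 0), lam=1.0) -> float:
    """S_binary = lambda * S_overlap * S_color."""
    warped = warp_mask(mask_n, *flow)
    return lam * overlap_similarity(mask_m, warped) * color_similarity(hist_m, hist_n)
```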
![Page 28: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/28.jpg)
Final Spatiotemporal Graph
[Figure: DAG over object proposals in Frame i-1, Frame i, and Frame i+1, between the nodes s and t]
Goal: Find exactly one object proposal in each frame such that all selected proposals have high object-ness and high similarity across frames.
Find the highest-weighted path in the DAG.
This is the longest path problem in a DAG, which has a dynamic programming solution.
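Because the DAG is layered by frame, every s-to-t path picks exactly one proposal per frame, and the longest path can be found with a Viterbi-style dynamic program in O(F·K²) time for F frames and K proposals per frame. The sketch below assumes the unary and binary scores are already computed; splitting each proposal into a beginning and an ending node only places its unary score on an edge, so the path weight is the same sum. The function name is hypothetical.

```python
from typing import List, Sequence

def select_proposals(unary: Sequence[Sequence[float]],
                     binary: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Highest-weight s -> t path through one proposal per frame.

    unary[f][k]     : unary score of proposal k in frame f
    binary[f][j][k] : binary score from proposal j in frame f to proposal k in frame f+1
    Returns the selected proposal index for every frame.
    """
    best = [list(unary[0])]          # best[f][k] = best path weight ending at proposal k of frame f
    back = [[-1] * len(unary[0])]    # back-pointers to the previous frame
    for f in range(1, len(unary)):
        row, ptr = [], []
        for k, u in enumerate(unary[f]):
            cand = [best[f - 1][j] + binary[f - 1][j][k] for j in range(len(unary[f - 1]))]
            j = max(range(len(cand)), key=cand.__getitem__)
            row.append(cand[j] + u)
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # The sink t is reachable from every proposal in the last frame, so take the best one.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for f in range(len(unary) - 1, 0, -1):
        k = back[f][k]
        path.append(k)
    return path[::-1]
```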
![Page 29: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/29.jpg)
Results
![Page 30: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/30.jpg)
Qualitative Results – “Girl”
[Figure panels: original video, ground truth, selected object proposals, segmentation results]
The region within the red boundary is the object region.
![Page 31: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/31.jpg)
Qualitative Results – “Parachute”
[Figure panels: original video, ground truth, selected object proposals, segmentation results]
The region within the red boundary is the object region.
![Page 32: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/32.jpg)
Qualitative Results – “Birdfall”
[Figure panels: original video, ground truth, segmentation results]
The region within the red boundary is the object region.
![Page 33: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/33.jpg)
Qualitative Results – “Cheetah”
[Figure panels: original video, ground truth, segmentation results]
The region within the red boundary is the object region.
![Page 34: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/34.jpg)
Qualitative Results – “Monkeydog”
[Figure panels: original video, ground truth, segmentation results]
The region within the red boundary is the object region.
![Page 35: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/35.jpg)
SegTrack: Quantitative Results*

| Video | Ours | [14] | [13] | [20] | [6] |
|---|---|---|---|---|---|
| Uses ground truth? | N | N | N | Y | Y |
| Birdfall | 155 | 189 | 288 | 252 | 454 |
| Cheetah | 633 | 806 | 905 | 1142 | 1217 |
| Girl | 1488 | 1698 | 1785 | 1304 | 1755 |
| Monkeydog | 365 | 472 | 521 | 533 | 683 |
| Parachute | 220 | 221 | 201 | 235 | 502 |
| Avg. | 452 | 542 | 592 | 594 | 791 |

\* Average per-frame pixel error rate; the smaller, the better.
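The metric counts mislabelled pixels per frame. The “Avg.” row appears to average over all frames of the five videos rather than over videos: assuming SegTrack frame counts of 30 (Birdfall), 29 (Cheetah), 21 (Girl), 71 (Monkeydog), and 51 (Parachute), a frame-weighted mean reproduces 452 for Ours and 542 for [14]. The helper names below are hypothetical.

```python
import numpy as np

def per_frame_pixel_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average number of mislabelled pixels per frame; pred/gt are (frames, H, W) boolean masks."""
    return float((pred != gt).reshape(len(gt), -1).sum(axis=1).mean())

def frame_weighted_average(errors, frame_counts) -> float:
    """Average over all frames of all videos, i.e. each video weighted by its length."""
    errors = np.asarray(errors, dtype=float)
    counts = np.asarray(frame_counts, dtype=float)
    return float((errors * counts).sum() / counts.sum())
```

For example, `frame_weighted_average([155, 633, 1488, 365, 220], [30, 29, 21, 71, 51])` is about 452.4, whereas the unweighted mean of the same column is about 572.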
![Page 36: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/36.jpg)
Summary
• STG selects the object proposals of the primary moving object
• The selected proposals yield a pixel-level segmentation
• Performance improved by ~20%
![Page 37: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/37.jpg)
![Page 38: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/38.jpg)
How about multiple videos?
![Page 39: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/39.jpg)
Video Object Co-Segmentation (VOCS)
Dong Zhang, Omar Javed, and Mubarak Shah, “Video object co-segmentation by regulated maximum weight cliques”, ECCV, 2014
![Page 40: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/40.jpg)
Video Object Co-Segmentation (VOCS)
• Applications
  • Automatic annotation
  • Unsupervised object detection & recognition
  • Re-identification
[Figure: training image, annotation, testing image]
![Page 41: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/41.jpg)
Video Object Co-Segmentation (VOCS)
• Challenges
  • Appearance variation
  • Multiple object classes
  • High complexity
![Page 42: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/42.jpg)
Framework
Input Videos → Object Proposal Tracklets Generation → Regulated Maximum Weight Cliques for Tracklets → MRF based Optimization → Object Co-Segmentation
![Page 43: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/43.jpg)
Object Proposal Tracklets Generation
![Page 44: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/44.jpg)
[Figure: a video and its object proposals]
![Page 45: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/45.jpg)
[Figure: object proposals in frame 31 tracked backward and forward, producing track 1 and track 2]
$S_{simi}(x_m, x_n) = S_{app}(x_m, x_n) \cdot S_{loc}(x_m, x_n) \cdot S_{shape}(x_m, x_n)$
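The slide shows a proposal from frame 31 tracked backward and forward using $S_{simi}$. A minimal greedy sketch, assuming each step links to the single most similar proposal in the adjacent frame and stops when the best similarity falls to a hypothetical threshold `min_sim`; the actual tracker may associate proposals differently. The function names are hypothetical.

```python
from typing import Callable, List, Sequence

Sim = Callable[[object, object], float]

def make_s_simi(s_app: Sim, s_loc: Sim, s_shape: Sim) -> Sim:
    """S_simi(x_m, x_n) = S_app * S_loc * S_shape."""
    return lambda a, b: s_app(a, b) * s_loc(a, b) * s_shape(a, b)

def track_from_seed(proposals: Sequence[Sequence[object]], seed_frame: int, seed_idx: int,
                    s_simi: Sim, min_sim: float = 0.0) -> List[int]:
    """Greedily grow one tracklet from a seed proposal, backward and then forward.

    Returns one proposal index per frame, or -1 where the track was stopped.
    """
    track = [-1] * len(proposals)
    track[seed_frame] = seed_idx
    for step in (-1, 1):                          # -1: track backward, +1: track forward
        f, cur = seed_frame, proposals[seed_frame][seed_idx]
        while 0 <= f + step < len(proposals) and proposals[f + step]:
            cands = proposals[f + step]
            scores = [s_simi(cur, x) for x in cands]
            k = max(range(len(scores)), key=scores.__getitem__)
            if scores[k] <= min_sim:              # nothing similar enough: stop this direction
                break
            f, cur = f + step, cands[k]
            track[f] = k
    return track
```

Running this from every proposal in every frame gives the tracklet pool described on the next slide.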
![Page 46: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/46.jpg)
[Figure: frame 31 track 1, frame 31 track 2, …, frame 61 track 2]
Repeat for all proposals in all frames.
![Page 47: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/47.jpg)
Regulated Maximum Weight Cliques for Tracklets
![Page 48: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/48.jpg)
[Figure: tracklets from Video 1 and Video 2 grouped into cliques C1 and C2]
Clique 1: all chickens
Clique 2: all turtles
Each tracklet is a node with node weight $W(X) = \sum_{i=1}^{f} S_{object}(x_i)$.
Find Regulated Maximum Weight Cliques with our modified Bron-Kerbosch algorithm.
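The talk uses a modified ("regulated") Bron-Kerbosch algorithm whose modifications are not described here. As a baseline only, the sketch below is the classic Bron-Kerbosch enumeration with pivoting, followed by picking the maximal clique with the largest total node weight (for example, each tracklet's summed $S_{object}$ scores). It is not the regulated variant, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, Set

def maximal_cliques(adj: Dict[int, Set[int]]) -> Iterator[FrozenSet[int]]:
    """Classic Bron-Kerbosch with pivoting: yields every maximal clique of an undirected graph."""
    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    yield from expand(set(), set(adj), set())

def max_weight_clique(adj: Dict[int, Set[int]], weight: Dict[int, float]) -> FrozenSet[int]:
    """With positive node weights, a maximum-weight clique is maximal, so scanning them suffices."""
    return max(maximal_cliques(adj), key=lambda c: sum(weight[v] for v in c))
```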
![Page 49: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/49.jpg)
Results
![Page 50: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/50.jpg)
Chicken & Turtle
Red: first object; green: second object
[Figure: original videos and co-segmentation results]
![Page 51: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/51.jpg)
Elephant & Giraffe
Red: first object; green: second object
[Figure: original videos and co-segmentation results]
![Page 52: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/52.jpg)
Lion & Zebra
Red: first object; green: second object
[Figure: original videos and co-segmentation results]
![Page 53: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/53.jpg)
![Page 54: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/54.jpg)
![Page 55: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/55.jpg)
Quantitative Results: MOViCS Dataset

| Video Set | Ours1 | Ours2 | VCS[4] | ICS[13] |
|---|---|---|---|---|
| Chicken & turtle | 0.860 | 0.860 | 0.65 | 0.08 |
| Zebra & lion | 0.588 | 0.636 | 0.48 | 0.23 |
| Giraffe & elephant | 0.528 | 0.639 | 0.52 | 0.07 |
| Tiger | 0.336 | 0.336 | 0.30 | 0.30 |
| Overall | 0.578 | 0.617 | 0.49 | 0.17 |

Ours1: same parameters for all video sets; Ours2: different parameters for each video set.
Numbers are intersection-over-union scores; the larger, the better.
![Page 56: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/56.jpg)
Summary
• Type I STG for object segmentation
• Type II STG for object co-segmentation
• Results improved more than 20%
![Page 57: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/57.jpg)
![Page 58: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/58.jpg)
What is the most important object?
Human!
![Page 59: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/59.jpg)
Human Pose Estimation in Videos (HPEV)
Dong Zhang and Mubarak Shah, “Human Pose Estimation in Videos”, ICCV, 2015
Dong Zhang and Mubarak Shah, “A Framework for Human Pose Estimation in Videos”, submitted to PAMI, 2016
![Page 60: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/60.jpg)
An Example of Human Segmentation
[Figure: coarse segmentation]
![Page 61: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/61.jpg)
Pose Estimation
![Page 62: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/62.jpg)
Human Pose Estimation in Videos (HPEV)
• Applications
  • Action recognition
  • HCI
  • Surveillance
![Page 63: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/63.jpg)
Human Pose Estimation in Videos (HPEV)
• Challenges
  • Huge appearance variation
  • Multiple people
  • Consistent estimation
![Page 64: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/64.jpg)
Framework
Input Videos → Pose Hypotheses Generation → Body Part Hypotheses Generation → Body Part Tracking → Tree-based Pose Estimation
![Page 65: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/65.jpg)
[Figure: body-part nodes in Frame f, Frame f+1, and Frame f+2, joined by intra-frame and inter-frame edges]
Yellow edges: commonly used intra-frame edges
Blue edges: symmetric intra-frame edges
Red edges: inter-frame edges
These edges create both intra-frame and inter-frame simple cycles. Too many simple cycles: exact inference is NP-hard!
![Page 66: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/66.jpg)
Idea 1: Abstraction
[Figure: real body parts relational graph and abstract body parts relational graph]
Remove intra-frame simple cycles
![Page 67: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/67.jpg)
Idea 2: Association
Pose Relational Graph (Tracklet Graph)
Remove the inter-frame simple cycles
![Page 68: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/68.jpg)
N-Best Hypotheses → Real Body Part Hypotheses → Abstract Body Part Hypotheses → Abstract Body Part Tracklets → Tree-based Pose Estimation
Generate many full-body pose hypotheses for each video frame.
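Once the simple cycles are removed, the body model is a tree, and a tree admits exact dynamic programming. The sketch below is generic max-sum inference over per-part hypotheses (pictorial-structure style), assuming unary scores per hypothesis and pairwise compatibility tables per parent-child edge. It returns only the single best configuration, whereas the pipeline above generates many (N-best) hypotheses; the function name and score layout are hypothetical.

```python
from typing import Dict, List, Tuple

def best_pose_on_tree(children: Dict[int, List[int]], root: int,
                      unary: Dict[int, List[float]],
                      pairwise: Dict[Tuple[int, int], List[List[float]]]) -> Dict[int, int]:
    """Exact max-sum inference over part hypotheses on a tree-structured body model.

    unary[p][i]            : score of hypothesis i for part p
    pairwise[(p, c)][i][j] : compatibility of parent hypothesis i with child hypothesis j
    Returns {part: index of the chosen hypothesis}.
    """
    msg: Dict[int, List[float]] = {}             # best subtree score for each hypothesis of a part
    arg: Dict[Tuple[int, int], List[int]] = {}   # best child hypothesis for each parent hypothesis

    def upward(p: int) -> None:
        score = list(unary[p])
        for c in children.get(p, []):
            upward(c)                            # children first (post-order)
            best_j = []
            for i in range(len(score)):
                cand = [pairwise[(p, c)][i][j] + msg[c][j] for j in range(len(msg[c]))]
                j = max(range(len(cand)), key=cand.__getitem__)
                best_j.append(j)
                score[i] += cand[j]
            arg[(p, c)] = best_j
        msg[p] = score

    upward(root)
    choice = {root: max(range(len(msg[root])), key=msg[root].__getitem__)}
    stack = [root]
    while stack:                                 # top-down backtracking of the stored argmaxes
        p = stack.pop()
        for c in children.get(p, []):
            choice[c] = arg[(p, c)][choice[p]]
            stack.append(c)
    return choice
```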
![Page 69: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/69.jpg)
[Figure: real body part hypotheses marked on the frames]
Generate real body part hypotheses for the frames.
![Page 70: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/70.jpg)
[Figure: combining symmetric parts turns the real body parts relational graph into the abstract body parts relational graph]
Combine symmetric parts.
![Page 71: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/71.jpg)
[Figure: tracklet hypotheses graph]
Get the best tracklets for each part.
![Page 72: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/72.jpg)
[Figure: pose hypotheses graph]
Select the best poses.
![Page 73: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/73.jpg)
Qualitative Results
![Page 74: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/74.jpg)
Outdoor Dataset (video: warmup)
Ours N-Best
![Page 75: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/75.jpg)
Outdoor Dataset (video: bounce)
Ours N-Best
![Page 76: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/76.jpg)
Outdoor Dataset (videos: walk2 and kick)
Ours N-Best
![Page 77: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/77.jpg)
N-Best Dataset (video: baseball)
Ours N-Best
![Page 78: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/78.jpg)
N-Best Dataset (video: walkstraight)
Ours N-Best
![Page 79: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/79.jpg)
HumanEva Dataset (video: Jog)
Ours N-Best
![Page 80: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/80.jpg)
HumanEva Dataset (video: Walking)
Ours N-Best
![Page 81: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/81.jpg)
Quantitative Results
![Page 82: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/82.jpg)
Outdoor Dataset

| Metric | Method | Head | Torso | U.L. | L.L. | U.A. | L.A. | Average |
|---|---|---|---|---|---|---|---|---|
| PCP | Ours | 0.99 | 1.00 | 1.00 | 0.97 | 0.91 | 0.66 | 0.92 |
| PCP | Ramakrishna et al. | 0.99 | 0.86 | 0.95 | 0.96 | 0.86 | 0.52 | 0.86 |
| PCP | Park et al. | 0.99 | 0.83 | 0.92 | 0.86 | 0.79 | 0.52 | 0.82 |
| KLE | Ours | 0.19 | 0.22 | 0.35 | 0.37 | 0.41 | 0.61 | 0.36 |
| KLE | Ramakrishna et al. | 0.39 | 0.58 | 0.48 | 0.48 | 0.88 | 1.42 | 0.71 |
| KLE | Park et al. | 0.44 | 0.58 | 0.55 | 0.69 | 1.03 | 1.65 | 0.82 |

U.L./L.L.: upper/lower legs; U.A./L.A.: upper/lower arms.
Probability of a Correct Pose (PCP) is a precision metric: the larger, the better. Keypoint Localization Error (KLE) is an error metric: the smaller, the better.
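The slides do not define the thresholds behind these metrics. As a hedged sketch, the code below uses the common PCP convention (a part is correct when both endpoints lie within half the ground-truth part length of their true positions, `alpha=0.5`) and computes KLE as the keypoint distance divided by a reference `scale`, whose choice (for instance a head size) is an assumption here. Function names are hypothetical.

```python
import numpy as np

def pcp(pred_parts: np.ndarray, gt_parts: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Per-part correctness.

    pred_parts, gt_parts: (num_parts, 2, 2) arrays holding the two (x, y) endpoints of each part.
    A part is correct when both predicted endpoints lie within
    alpha * (ground-truth part length) of the matching ground-truth endpoints.
    """
    length = np.linalg.norm(gt_parts[:, 0] - gt_parts[:, 1], axis=1)   # (num_parts,)
    err = np.linalg.norm(pred_parts - gt_parts, axis=2)                # (num_parts, 2)
    return np.all(err <= alpha * length[:, None], axis=1)

def kle(pred_kps: np.ndarray, gt_kps: np.ndarray, scale: float) -> np.ndarray:
    """Per-keypoint Euclidean error divided by a reference scale (normalisation assumed)."""
    return np.linalg.norm(pred_kps - gt_kps, axis=1) / scale
```

Averaging `pcp` over frames gives a per-part rate in [0, 1], matching the range of the PCP rows above.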
![Page 83: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/83.jpg)
HumanEva I Dataset

| Metric | Method | Head | Torso | U.L. | L.L. | U.A. | L.A. | Average |
|---|---|---|---|---|---|---|---|---|
| PCP | Ours | 1.00 | 1.00 | 1.00 | 0.94 | 0.93 | 0.67 | 0.92 |
| PCP | Ramakrishna et al. | 0.99 | 1.00 | 0.99 | 0.98 | 0.99 | 0.53 | 0.91 |
| PCP | Park et al. | 0.97 | 0.97 | 0.97 | 0.90 | 0.83 | 0.48 | 0.85 |
| KLE | Ours | 0.16 | 0.42 | 0.13 | 0.15 | 0.20 | 0.24 | 0.22 |
| KLE | Ramakrishna et al. | 0.27 | 0.48 | 0.13 | 0.22 | 1.14 | 1.07 | 0.55 |
| KLE | Park et al. | 0.23 | 0.52 | 0.24 | 0.35 | 1.10 | 1.18 | 0.60 |

PCP is a precision metric: the larger, the better. KLE is an error metric: the smaller, the better.
![Page 84: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/84.jpg)
N-Best Dataset

| Metric | Method | Head | Torso | U.L. | L.L. | U.A. | L.A. | Average |
|---|---|---|---|---|---|---|---|---|
| PCP | Ours | 1.00 | 1.00 | 0.92 | 0.94 | 0.93 | 0.65 | 0.91 |
| PCP | Ramakrishna et al. | 1.00 | 0.69 | 0.91 | 0.89 | 0.85 | 0.42 | 0.80 |
| PCP | Park et al. | 1.00 | 0.61 | 0.86 | 0.84 | 0.66 | 0.41 | 0.73 |
| KLE | Ours | 0.15 | 0.17 | 0.24 | 0.37 | 0.30 | 0.60 | 0.31 |
| KLE | Ramakrishna et al. | 0.53 | 0.88 | 0.67 | 1.01 | 1.70 | 2.68 | 1.25 |
| KLE | Park et al. | 0.54 | 0.74 | 0.80 | 1.39 | 2.39 | 4.08 | 1.66 |

PCP is a precision metric: the larger, the better. KLE is an error metric: the smaller, the better.
![Page 85: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/85.jpg)
Summary
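• HPEV can be naturally formulated with STGs
• STGs can be employed in multiple stages of HPEV
• Better average PCP and KLE than Park et al. and Ramakrishna et al. on the Outdoor, HumanEva I, and N-Best datasets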
![Page 86: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/86.jpg)
Action Localization in Videos through Context Walk
Khurram Soomro, Haroon Idrees, and Mubarak Shah, ICCV, 2015
![Page 87: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/87.jpg)
Action Recognition
[Figure: example actions: Diving, Lifting, Golf, Swing Bench, Walking]
![Page 88: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/88.jpg)
Action Localization
1. Action Recognition
2. Action Detection
   a. Trimmed Videos
      i. Spatio-Temporal
   b. Untrimmed Videos
      i. Temporal
      ii. Spatio-Temporal
[Figure: examples: Diving, Lifting, Swing Bench]
![Page 89: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/89.jpg)
Challenges: Action Localization
• Cluttered Background
• Multiple Actors/Actions
• Untrimmed Videos
[Figure: examples: Basketball Dunk, Salsa Spin, Hand Waving/Clapping/Boxing]
![Page 90: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/90.jpg)
Applications of Action Localization
• Video Search
• Action Retrieval
• Multimedia Event Recounting
• Video Understanding
![Page 91: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/91.jpg)
Existing Solutions to Action Localization
• 1) Learn an action detector
• 2) Exhaustively search in testing videos
• The sliding-window approach is IMPRACTICAL and WASTEFUL, because videos are:
  • Untrimmed (longer duration)
  • High resolution
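A quick count shows why exhaustive search does not scale. Under the deliberately naive assumption that every axis-aligned spatio-temporal box at pixel and frame granularity is a candidate (practical detectors use strides and a few fixed scales), the number of windows is quadratic in each of width, height, and duration:

```python
from math import comb

def num_cuboids(width: int, height: int, frames: int) -> int:
    """Count axis-aligned spatio-temporal boxes in a W x H x T video:
    choose 2 of the W+1 x-boundaries, 2 of the H+1 y-boundaries,
    and 2 of the T+1 temporal boundaries."""
    return comb(width + 1, 2) * comb(height + 1, 2) * comb(frames + 1, 2)
```

For a hypothetical 320×240 video with 1,000 frames, `num_cuboids(320, 240, 1000)` is about 7.4 × 10^14 windows, and longer or higher-resolution videos make this worse.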
![Page 92: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/92.jpg)
Action Localization in Videos through Context Walk
• An efficient approach for action localization
• Uses the context relations that exist in videos: action-scene and intra-action
• Uses action contours instead of bounding boxes
Motivation Context Graph Context Walk CRF Results
![Page 93: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/93.jpg)
Context Relations
• Learn spatio-temporal relations from all the supervoxels to those within the action (actor bounding box)
• Arrows represent three-dimensional displacement vectors capturing:
  • Action-Scene Relations
  • Intra-Action Relations
![Page 94: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/94.jpg)
• Context Graph • Given supervoxels in an nth Training Video
• Construct a directed Graph Gn(Vn, En) for the video • Vn = Supervoxel nodes • En = Spatio-Temporal Relations
• Edges emanate from: All the nodes (supervoxels) Nodes (supervoxels) contained within the Actor Bounding Box
Directed Graph Action-Scene Relations Intra-Action Relations
![Page 95: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/95.jpg)
Context Walk
• Given a testing video:
  1. Construct an undirected graph G(V, E), with edges between spatio-temporal neighbors
  2. Randomly select an initial node
  3. Find the nearest-neighbor supervoxel in the training data
  4. Project its displacement vectors onto the testing supervoxels
  5. Select the next node with maximum probability, then repeat steps 3-5
[Figure label: Training Video Nc]
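Steps 2-5 can be sketched as a simple voting loop. This is a minimal illustration under several assumptions that the slides do not specify: features and positions are compared with Euclidean distance, each projected displacement casts one vote for the testing supervoxel whose centroid is nearest to the projected point, the "probability" is the vote count normalized over all testing supervoxels, and the next node is chosen among all unvisited supervoxels (the neighbor graph from step 1 is not used here). The optional `start` argument is a hypothetical addition that replaces the random initial node for reproducibility.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def dist(a: Vec, b: Vec) -> float:
    """Euclidean distance (assumed metric for features and positions)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def context_walk(test_feats: Dict[int, Vec], test_cents: Dict[int, Vec],
                 train: List[Tuple[Vec, List[Vec]]], steps: int, k: int = 1,
                 start: Optional[int] = None, seed: int = 0):
    """Sketch of a context walk over testing supervoxels.

    train holds (feature, displacement vectors) per training supervoxel; the
    vectors point from that supervoxel to supervoxels of the action.
    Returns the visited nodes and a normalized confidence per supervoxel.
    """
    rng = random.Random(seed)
    votes = {sv: 0.0 for sv in test_feats}
    # Step 2: random initial node (unless a start node is given).
    current = start if start is not None else rng.choice(sorted(test_feats))
    visited = [current]
    for _ in range(steps):
        # Step 3: k nearest training supervoxels in feature space.
        nns = sorted(train, key=lambda tr: dist(tr[0], test_feats[current]))[:k]
        # Step 4: project each displacement from the current centroid and
        # vote for the testing supervoxel closest to the projected point.
        c = test_cents[current]
        for _, disps in nns:
            for d in disps:
                p = [ci + di for ci, di in zip(c, d)]
                target = min(test_cents, key=lambda sv: dist(test_cents[sv], p))
                votes[target] += 1.0
        # Step 5: move to the unvisited supervoxel with the highest confidence.
        candidates = [sv for sv in votes if sv not in visited]
        if not candidates:
            break
        current = max(candidates, key=lambda sv: votes[sv])
        visited.append(current)
    total = sum(votes.values()) or 1.0
    return visited, {sv: v / total for sv, v in votes.items()}


feats = {0: [0.0], 1: [0.4], 2: [2.0]}
cents = {0: [0.0, 0.0, 0.0], 1: [10.0, 0.0, 0.0], 2: [20.0, 0.0, 0.0]}
train = [([0.0], [[10.0, 0.0, 0.0]]), ([2.0], [[-10.0, 0.0, 0.0]])]
visited, conf = context_walk(feats, cents, train, steps=2, start=0)
print(visited)  # [0, 1, 2]
```

Accumulating votes across steps mirrors the idea that each visited supervoxel refines the estimate of where the action is, so only a few supervoxels need to be visited instead of scoring every window.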
![Page 96: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/96.jpg)
Proposed Framework for Context Walk
(a) Segment the video into supervoxels (SVs)
(b) Construct a spatio-temporal graph G(V, E) using all SVs, with one node v per SV and its SV features
(c) Search nearest neighbors (NNs) using SV features, then project displacement vectors
(d) Update the SVs' conditional distribution using all NNs
(e) Select the SV with the highest confidence
(f) Repeat for T steps
(g) Segment action proposals through CRF + SVM classification
[Figure stages: Context Walk; CRF + SVM]
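For step (b), one common reading of "spatio-temporal neighbors" is that two supervoxels are adjacent when any of their voxels touch along x, y or t (6-connectivity). The sketch below assumes step (a) produced a label volume `labels[t][y][x]` holding a supervoxel id per voxel; both the label-volume layout and the 6-connectivity rule are assumptions, not details taken from the slides.

```python
from typing import List, Set, Tuple


def st_neighbor_graph(labels: List[List[List[int]]]) -> Set[Tuple[int, int]]:
    """Undirected edges between supervoxels that touch in x, y or t.

    labels[t][y][x] is the supervoxel id of each voxel; each edge is stored
    once as (smaller id, larger id).
    """
    edges: Set[Tuple[int, int]] = set()
    T, H, W = len(labels), len(labels[0]), len(labels[0][0])
    for t in range(T):
        for y in range(H):
            for x in range(W):
                a = labels[t][y][x]
                # Compare with the next voxel along x, y and t (6-connectivity).
                for dt, dy, dx in ((0, 0, 1), (0, 1, 0), (1, 0, 0)):
                    tt, yy, xx = t + dt, y + dy, x + dx
                    if tt < T and yy < H and xx < W:
                        b = labels[tt][yy][xx]
                        if a != b:
                            edges.add((min(a, b), max(a, b)))
    return edges


# Two 2x2 frames: supervoxel 1 persists; 0 in frame 0 is followed by 2 in frame 1.
volume = [[[0, 0], [1, 1]],
          [[2, 2], [1, 1]]]
print(sorted(st_neighbor_graph(volume)))  # [(0, 1), (0, 2), (1, 2)]
```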
![Page 97: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/97.jpg)
• UCF Sports Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 98: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/98.jpg)
• UCF Sports Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 99: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/99.jpg)
• Sub-JHMDB Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 100: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/100.jpg)
• Sub-JHMDB Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 101: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/101.jpg)
• THUMOS’13 Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 102: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/102.jpg)
• THUMOS’13 Dataset
[Figure: qualitative results; annotated actor bounding box vs. action localization contour]
![Page 103: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/103.jpg)
• Quantitative Results (UCF Sports)
![Page 104: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/104.jpg)
• Quantitative Results (Sub-JHMDB)
![Page 105: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/105.jpg)
• Quantitative Results (THUMOS’13)
![Page 106: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/106.jpg)
Summary
• An efficient and effective approach for action localization
• Learns contextual relations in the form of relative locations between different video regions
• Uses the Context Walk to select a supervoxel at each step and predict the action location
![Page 107: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/107.jpg)
Action Localization in Videos through Context Walk
Khurram Soomro, Haroon Idrees and Mubarak Shah, ICCV-2015
![Page 108: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/108.jpg)
Conclusion
• Generic Object Segmentation in Videos
  • Single video (CVPR-2013)
  • Multiple videos (ECCV-2014)
• Human Pose Estimation in Videos (ICCV-2015)
• Human Action Detection in Videos (ICCV-2015)
![Page 109: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/109.jpg)
YouTube Presentations
https://www.youtube.com/user/UCFCRCV
![Page 110: Spatiotemporal Graphs for Object Segmentation, Human Pose](https://reader031.vdocuments.net/reader031/viewer/2022012205/61de586fa59b13681779df04/html5/thumbnails/110.jpg)
Thank You