ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東
TRANSCRIPT
![Page 1: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/1.jpg)
41st Computer Vision Study Group @ Kanto, CVPR2017 Paper Reading Session (Part 2), 2017.8.19
ArtTrack: Articulated Multi-person Tracking in the Wild
Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, Bernt Schiele
Yukiyoshi Sasao @poyy
![Page 2: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/2.jpg)
Articulated Multi-Person Tracking
Max Planck Institute for Informatics
project : http://pose.mpi-inf.mpg.de/art-track/
demo : https://youtu.be/eYtn13fzGGo
CVPR : https://youtu.be/kdV2sdZ9TWg
code : https://github.com/eldar/pose-tensorflow
Person detection, pose estimation, and tracking of multiple people in monocular RGB video
![Page 3: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/3.jpg)
Related Work : Human Pose Estimation
Joint Training of a Convolutional Network and a
Graphical Model for Human Pose Estimation,
Jonathan Tompson, et al. NIPS2014
Stacked Hourglass Networks for Human Pose
Estimation, Alejandro Newell, et al. ECCV2016
Realtime Multi-Person 2D Pose Estimation using Part
Affinity Fields, Zhe Cao, et al. CVPR2017
Single person
Multiple people
![Page 4: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/4.jpg)
Related Work : Multi Target Tracking
Continuous Energy Minimization for Multitarget
Tracking, Anton Milan, et al. PAMI2014
Near-Online Multi-target Tracking with Aggregated
Local Flow Descriptor, Wongun Choi, et al. ICCV2015
Multiple People Tracking by Lifted Multicut and Person
Re-identification, Siyu Tang, et al. CVPR2017
MOT Challenge : https://motchallenge.net/
Targets are bounding boxes
Mostly scenes of people walking along streets
![Page 5: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/5.jpg)
Related Work : Human Pose Tracking
Flowing ConvNets for Human Pose Estimation in
Videos, Tomas Pfister, et al. ICCV2015
Chained Predictions Using Convolutional Neural
Networks, Georgia Gkioxari, et al. ECCV2016
Thin-Slicing Network: A Deep Structured Model for
Pose Estimation in Videos, Jie Song, et al. CVPR 2017
Single target person; the person's location is given in advance
![Page 6: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/6.jpg)
This method : the first attempt to handle all 3 tasks for multiple people
Same problem setting at CVPR2017:
PoseTrack: Joint Multi-Person Pose Estimation and Tracking, Umar Iqbal, et al.
![Page 7: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/7.jpg)
MPII history
CVPR2014 MPII Human Pose Dataset
CVPR2016 DeepCut
ECCV2016 DeeperCut
CVPR2017 ArtTrack -> extended along the time axis
![Page 8: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/8.jpg)
Method Overview
![Page 9: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/9.jpg)
Formulation
Minimum Cost Subgraph Multicut Problem
node: unary cost
edge: pairwise cost
Temporal direction
Spatial direction
Subgraph Decomposition for Multi-Target Tracking, Tang, et al. CVPR2015
![Page 10: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/10.jpg)
Formulation
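unary cost (node variable in {0,1})
pairwise cost (edge variable in {0,1})
Constraint 1: consistency
Constraint 2: transitivity (figure shows an OK case and an NG case)
Linear programming problem over binary variables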
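
The slide's equation did not survive extraction, so the following is only a minimal sketch of how such a problem can be set up, not the authors' solver. It assumes the usual form of the objective (sum of unary costs over selected nodes plus sum of pairwise costs over joined edges), reads "consistency" as "an edge can be joined only if both endpoints are selected", and checks transitivity on triangles of a complete graph. The function name `solve_subgraph_multicut` and the toy costs are hypothetical, and exhaustive search is only feasible for a handful of nodes.

```python
from itertools import combinations, product


def solve_subgraph_multicut(unary, pairwise):
    """Exhaustively solve a tiny instance on a complete graph.

    unary[v]         : cost added when node v is selected (x_v = 1)
    pairwise[(v, w)] : cost added when v and w are joined (y_vw = 1), with v < w
    Returns (cost, x, y) for the cheapest feasible labeling.
    """
    nodes = sorted(unary)
    edges = list(combinations(nodes, 2))
    triangles = list(combinations(nodes, 3))
    best = (float("inf"), None, None)

    for x_bits in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, x_bits))
        for y_bits in product((0, 1), repeat=len(edges)):
            y = dict(zip(edges, y_bits))
            # Constraint 1 (consistency): a joined edge needs both endpoints selected.
            if any(y[v, w] > min(x[v], x[w]) for v, w in edges):
                continue
            # Constraint 2 (transitivity): in every triangle, two joined edges
            # force the third, so "exactly two joined" is infeasible.
            if any(y[u, v] + y[v, w] + y[u, w] == 2 for u, v, w in triangles):
                continue
            cost = sum(unary[v] * x[v] for v in nodes)
            cost += sum(pairwise[e] * y[e] for e in edges)
            if cost < best[0]:
                best = (cost, x, y)
    return best


# Hypothetical toy costs: negative values favour selecting / joining.
unary = {0: -1, 1: -1, 2: 1}
pairwise = {(0, 1): -2, (0, 2): 1, (1, 2): 1}
cost, x, y = solve_subgraph_multicut(unary, pairwise)
print(cost, x, y)
```

With these toy costs, nodes 0 and 1 are selected and joined into one component while node 2 is discarded, for a total cost of -4.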
![Page 11: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/11.jpg)
Solver
・Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications, Levinkov, et al. CVPR2017.
・KL (Kernighan-Lin) based (applicable when the node term can be ignored)
![Page 12: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/12.jpg)
Part-Detection : Bottom-Up (conventional)
A CNN outputs a heatmap for each part
NMS (Non-Maximum Suppression) outputs the heatmap peaks as candidate points
The unary cost uses the heatmap value
ResNet-101 is used as the CNN
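
The peak-picking step can be sketched as below. This is an illustration in plain Python, assuming a 3x3 neighbourhood and a score threshold of 0.3; both values, the function name `heatmap_peaks`, and the toy heatmap are hypothetical and not taken from the paper.

```python
def heatmap_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for each strict local maximum at or above threshold."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbours inside the map boundaries.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Keep the cell only if it beats every neighbour (non-maximum suppression).
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Hypothetical heatmap for one part type (e.g. "head").
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.5],
]
peaks = heatmap_peaks(heatmap)
print(peaks)
```

Each returned score is the heatmap value at the peak, which is the quantity the slide says feeds the unary cost.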
![Page 13: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/13.jpg)
Part-Detection : Top-Down
Searching for the other parts relative to a reliably detected part (the head) yields a small number of accurate candidate points
![Page 14: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/14.jpg)
Top-Down / Bottom-Up prediction
![Page 15: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/15.jpg)
Spatio-Temporal Graph
Spatial edge
Temporal edge
![Page 16: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/16.jpg)
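Spatial Pairwise (conventional method)
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model, Insafutdinov, et al. ECCV2016
The relative position from a given part is obtained by regression (CNN?),
and the differences (position, angle) from the actual candidate part are taken as Δ and θ.
Logistic regression with these as features
-> pairwise cost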
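
A rough sketch of this pipeline follows, under several assumptions: Δ is taken as the Euclidean distance between the predicted and the actual offset, θ as the absolute angle between them, the logistic-regression weights are made-up numbers, and the probability is turned into a cost with the log-odds `log((1 - p) / p)`. None of these specifics are confirmed by the slide, and all function names are hypothetical.

```python
import math


def pairwise_features(part_a, predicted_offset, part_b):
    """Compare the offset predicted from part_a with the actual offset to part_b."""
    actual = (part_b[0] - part_a[0], part_b[1] - part_a[1])
    # Delta: distance between predicted and actual offset vectors.
    delta = math.hypot(actual[0] - predicted_offset[0],
                       actual[1] - predicted_offset[1])
    # Theta: absolute angle between the two offset vectors, wrapped to [0, pi].
    diff = (math.atan2(actual[1], actual[0])
            - math.atan2(predicted_offset[1], predicted_offset[0]))
    theta = abs(math.atan2(math.sin(diff), math.cos(diff)))
    return delta, theta


def join_probability(delta, theta, weights=(-0.2, -1.5), bias=3.0):
    """Logistic regression on (delta, theta); the weights are illustrative only."""
    z = bias + weights[0] * delta + weights[1] * theta
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_cost(p):
    """Assumed log-odds cost: negative when joining is likely, positive otherwise."""
    return math.log((1.0 - p) / p)


# Candidate exactly where the regression expects it vs. a far-off candidate.
good = pairwise_cost(join_probability(*pairwise_features((10, 10), (0, 20), (10, 30))))
bad = pairwise_cost(join_probability(*pairwise_features((10, 10), (0, 20), (40, 10))))
print(good, bad)
```

A candidate that matches the regressed offset gets a negative cost (the solver is rewarded for joining it), while a badly placed candidate gets a positive cost.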
![Page 17: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/17.jpg)
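Spatial Pairwise (conventional method vs. this method)
Conventional method (Bottom-Up)
Computes the edge cost from the previous page between all pairs of parts
This method (Top-Down)
Computes only the edge costs from the root node to each part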
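
To make the difference in edge count concrete, here is a small sketch that enumerates the edges each scheme would need a cost for. The candidate list and both function names are hypothetical; the first entry is treated as the root node.

```python
from itertools import combinations


def bottom_up_edges(candidates):
    """Fully connected: one edge per unordered pair of candidates."""
    return list(combinations(range(len(candidates)), 2))


def top_down_edges(candidates, root_index=0):
    """Star-shaped: one edge from the root node to every other candidate."""
    return [(root_index, i) for i in range(len(candidates)) if i != root_index]


# Hypothetical candidate parts; index 0 plays the role of the root node.
candidates = ["head", "neck", "l_shoulder", "r_shoulder", "l_hip", "r_hip"]
full = bottom_up_edges(candidates)
star = top_down_edges(candidates)
print(len(full), len(star))
```

For n candidates the fully connected scheme needs n(n-1)/2 edge costs while the star needs n-1, so the gap widens quickly as the number of candidates grows.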
![Page 18: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/18.jpg)
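attractive-repulsive edge (join or separate)
attractive-repulsive edge : connects parts of the same type
Motivation : avoid merging the heads of two nearby people into a single
candidate point at the NMS stage
-> cost inversely proportional to distance
3 types of edge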
![Page 19: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/19.jpg)
spatial connectivity
Person Node (root node)
Bottom-Up (fully connected) / Bottom-Up (sparse) / Top-Down
Long-range edges are less reliable
![Page 20: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/20.jpg)
Temporal Pairwise (matching)
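Features
Logistic regression
Three complementary features
L2 : whether the positions are close; effective for slow motion.
Deep Matching : uses the ratio of matched points as the feature;
usually the most effective when comparing two images.
SIFT : robust to rotation.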
DeepFlow/DeepMatching
Revaud, et al. ICCV2013.
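
Two of the three temporal features can be sketched as follows. This is a simplified illustration, assuming that point correspondences between frames are already available as a list of point pairs, that the matching feature is an intersection-over-union of correspondences inside square boxes around the two detections, and that the box half-size is 10 pixels. The SIFT feature is omitted, and all names and numbers are hypothetical.

```python
import math


def in_box(point, center, half_size):
    """True if point lies inside the square box centred on center."""
    return (abs(point[0] - center[0]) <= half_size
            and abs(point[1] - center[1]) <= half_size)


def temporal_features(det_a, det_b, matches, half_size=10):
    """det_a is a detection in frame t, det_b in frame t+1.

    matches is a list of ((x, y) in frame t, (x, y) in frame t+1) pairs.
    Returns (l2_distance, match_ratio).
    """
    l2 = math.hypot(det_a[0] - det_b[0], det_a[1] - det_b[1])
    from_a = [m for m in matches if in_box(m[0], det_a, half_size)]
    into_b = [m for m in matches if in_box(m[1], det_b, half_size)]
    both = [m for m in from_a if in_box(m[1], det_b, half_size)]
    # Ratio of correspondences linking the two boxes to all correspondences
    # touching either box.
    union = len(from_a) + len(into_b) - len(both)
    ratio = len(both) / union if union else 0.0
    return l2, ratio


# Hypothetical correspondences between two consecutive frames.
matches = [
    ((48, 49), (51, 53)),
    ((55, 52), (58, 56)),
    ((52, 47), (90, 90)),
    ((200, 200), (201, 201)),
]
l2, ratio = temporal_features((50, 50), (53, 54), matches)
print(l2, ratio)
```

The resulting feature vector would then go through a logistic regression to give the temporal pairwise cost, in the same way as the spatial edges.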
![Page 21: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/21.jpg)
Evaluation : Single frame
on MPII Multi Person Val
![Page 22: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/22.jpg)
Evaluation : Single frame, comparison with other methods
on MPII Multi Person Test
[Cao, et al. CVPR2017] (Bottom-Up) 75.6 0.005 (including CNN)
-> Is producing fewer candidate points with an efficient CNN both faster and more accurate?
![Page 23: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/23.jpg)
Evaluation : Pose Tracking
on MPII Video Pose
(dataset created for this work)
![Page 24: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/24.jpg)
Result : Single frame
![Page 26: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/26.jpg)
Result : failure cases (Single frame)
Both BU and TD/BU fail:
because scale is not taken into account
Only TD/BU fails:
because the spatial relations between parts are not modeled explicitly
![Page 27: ArtTrack: Articulated Multi-Person Tracking in the Wild : CV勉強会関東](https://reader036.vdocuments.net/reader036/viewer/2022062311/5a64791a7f8b9a31568b46bf/html5/thumbnails/27.jpg)
Summary
● Achieves spatio-temporal grouping by solving the Minimum Cost Subgraph Multicut Problem.
● The Top-Down model improves accuracy and reduces computation.
● A new “MPII Video Pose” dataset was created and released (the presenter could not find it).
● PoseTrack.net was released as a new benchmark.
https://posetrack.net/
ICCV2017 Workshop
500 video sequences, 20K frames, 120K body pose annotations, 3 challenges