ArtTrack: Articulated Multi-person Tracking in the Wild : Kanto CV Study Group (cv勉強会関東)

41st Computer Vision Study Group @ Kanto, CVPR2017 Reading Session (Part 2), 2017.8.19

ArtTrack: Articulated Multi-person Tracking in the Wild

Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, Bernt Schiele

Yukiyoshi Sasao @poyy

Articulated Multi-Person Tracking

Max Planck Institute for Informatics

project : http://pose.mpi-inf.mpg.de/art-track/

demo : https://youtu.be/eYtn13fzGGo

CVPR : https://youtu.be/kdV2sdZ9TWg

code : https://github.com/eldar/pose-tensorflow

pdf : http://openaccess.thecvf.com/content_cvpr_2017/papers/Insafutdinov_ArtTrack_Articulated_Multi-Person_CVPR_2017_paper.pdf

supp : http://openaccess.thecvf.com/content_cvpr_2017/supplemental/Insafutdinov_ArtTrack_Articulated_Multi-Person_2017_CVPR_supplemental.pdf

arXiv : https://arxiv.org/abs/1612.01465

Person detection, pose estimation, and tracking of multiple people in monocular RGB video

http://www.youtube.com/watch?v=TClSwDRIJUQ

Related Work : Human Pose Estimation

Single person:

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, Jonathan Tompson, et al. NIPS2014

Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, et al. ECCV2016

Multiple people:

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, et al. CVPR2017

Related Work : Multi Target Tracking

Continuous Energy Minimization for Multitarget Tracking, Anton Milan, et al. PAMI2014

Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor, Wongun Choi, et al. ICCV2015

Multiple People Tracking by Lifted Multicut and Person Re-identification, Siyu Tang, et al. CVPR2017

MOT Challenge : https://motchallenge.net/

Targets are bounding boxes

Mostly scenes of people walking along streets

Related Work : Human Pose Tracking

Flowing ConvNets for Human Pose Estimation in Videos, Tomas Pfister, et al. ICCV2015

Chained Predictions Using Convolutional Neural Networks, Georgia Gkioxari, et al. ECCV2016

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos, Jie Song, et al. CVPR2017

These target a single person whose location is given in advance

This method : the first attempt to handle all 3 tasks (detection, pose estimation, tracking) for multiple people

Same problem setting at CVPR2017:

PoseTrack: Joint Multi-Person Pose Estimation and Tracking, Umar Iqbal, et al.

MPII history

CVPR2014 MPII Human Pose Dataset

CVPR2016 DeepCut

ECCV2016 DeeperCut

CVPR2017 ArtTrack -> extended along the time axis

Method Overview

Formulation

Minimum Cost Subgraph Multicut Problem

node: unary cost

edge: pairwise cost

Temporal direction

Spatial direction

Subgraph Decomposition for Multi-Target Tracking, Tang, et al. CVPR2015

Formulation

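Objective: unary cost (node variables in {0,1}) + pairwise cost (edge variables in {0,1})

Constraint 1: consistency

Constraint 2: transitivity (figure: OK and NG examples)

Integer linear programming problem (the variables are binary)

Solver

・Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications, Levinkov, et al. CVPR2017.

・KL (Kernighan-Lin) based (when the node term can be ignored).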
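The objective and the two constraints on this slide can be illustrated with a tiny brute-force search. This is only a sketch under stated assumptions, not the solver used by the authors (they use the Levinkov et al. and KL-based heuristics listed above). The convention that an edge value of 1 means "joined into the same person", the function names, and the example costs are all hypothetical.

```python
from itertools import combinations, product

def objective(x, y, unary, pairwise):
    """Unary costs of selected nodes plus pairwise costs of joined edges."""
    return (sum(unary[v] for v in x if x[v])
            + sum(pairwise[e] for e in y if y[e]))

def is_feasible(x, y, nodes):
    # Constraint 1 (consistency): an edge can be joined only if both
    # of its endpoints are selected.
    for (a, b), joined in y.items():
        if joined and not (x[a] and x[b]):
            return False
    # Constraint 2 (transitivity): if a-b and b-c are joined, a-c must be
    # joined as well, so no triangle may have exactly two joined edges.
    for a, b, c in combinations(nodes, 3):
        if y[(a, b)] + y[(b, c)] + y[(a, c)] == 2:
            return False
    return True

def brute_force(unary, pairwise):
    """Exhaustive search over all binary labelings (tiny graphs only).

    Assumes a complete graph: pairwise has a key (a, b) with a < b
    for every pair of nodes.
    """
    nodes = sorted(unary)
    edges = list(combinations(nodes, 2))
    best = None
    for xs in product([0, 1], repeat=len(nodes)):
        x = dict(zip(nodes, xs))
        for ys in product([0, 1], repeat=len(edges)):
            y = dict(zip(edges, ys))
            if not is_feasible(x, y, nodes):
                continue
            cost = objective(x, y, unary, pairwise)
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best

# Hypothetical costs: negative = "keep / join", positive = "drop / separate".
unary = {0: -1.0, 1: -1.0, 2: -1.0}
pairwise = {(0, 1): -2.0, (0, 2): 3.0, (1, 2): 3.0}
cost, x, y = brute_force(unary, pairwise)
```

With these example costs the search keeps all three candidates, joins candidates 0 and 1 into one cluster, and leaves candidate 2 on its own. Exhaustive search grows exponentially with the number of candidates, which is why heuristic solvers are needed in practice.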

Part-Detection : Bottom-Up (conventional)

A CNN outputs a heatmap for each part

NMS (Non-Maximum Suppression) outputs the peak values as candidate points

The unary cost uses the heatmap value

ResNet-101 is used as the CNN
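The candidate extraction step described above (heatmap, then NMS, then peaks as candidates) can be sketched as follows. This is a minimal pure-Python illustration, not the code from the pose-tensorflow repository; the function name, the threshold, and the window radius are assumptions chosen for the example.

```python
def heatmap_peaks(heatmap, threshold=0.3, radius=1):
    """Return (row, col, score) for every local maximum above threshold.

    heatmap is a list of rows of floats for a single part type.
    A cell is kept if no cell within `radius` has a higher score.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the scores in the (2*radius+1)^2 window, clipped
            # at the image border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            if score >= max(neighbours):
                peaks.append((r, c, score))
    return peaks

# Toy 5x5 heatmap with two true peaks and one suppressed neighbour.
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][1] = 0.9
heatmap[1][2] = 0.5   # next to the 0.9 peak, so it is suppressed
heatmap[3][3] = 0.8
candidates = heatmap_peaks(heatmap)
```

Each returned score is the heatmap value at the peak, which is the quantity the slide says feeds the unary cost. Ties on a flat plateau are all reported; a production implementation would need a tie-breaking rule.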

Part-Detection : Top-Down

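By searching for the other parts relative to a part that is detected reliably (the head), a small number of candidate points is detected with high accuracy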

Top-Down / Bottom-Up prediction

Spatio-Temporal Graph

Spatial edge

Temporal edge

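Spatial Pairwise (conventional method)

DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model, Insafutdinov, et al. ECCV2016

The relative position from a given part is obtained by regression (CNN?), and the differences from the actual candidate part (position, angle) are denoted Δ, θ.

Logistic regression using these as features

-> pairwise cost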
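The step from Δ, θ to a pairwise cost can be sketched as below. This is an illustration under assumptions, not the DeeperCut implementation: the weights and bias are made-up numbers rather than trained values, the feature set is reduced to the two quantities named on the slide, and the log-odds conversion `log((1 - p) / p)` is assumed as the way to turn a probability into a cost.

```python
import math

def pairwise_features(source_xy, predicted_xy, candidate_xy):
    """Compute (delta, theta) for one candidate pair.

    delta: distance between the regressed location and the actual candidate.
    theta: angle between the regressed offset and the actual offset,
           both measured from the source part, in [0, pi].
    """
    delta = math.hypot(predicted_xy[0] - candidate_xy[0],
                       predicted_xy[1] - candidate_xy[1])
    ang_pred = math.atan2(predicted_xy[1] - source_xy[1],
                          predicted_xy[0] - source_xy[0])
    ang_cand = math.atan2(candidate_xy[1] - source_xy[1],
                          candidate_xy[0] - source_xy[0])
    theta = abs(ang_pred - ang_cand)
    theta = min(theta, 2 * math.pi - theta)  # wrap into [0, pi]
    return delta, theta

def join_probability(delta, theta, w_delta=-0.1, w_theta=-2.0, bias=3.0):
    """Logistic regression with hypothetical (untrained) weights."""
    z = w_delta * delta + w_theta * theta + bias
    return 1.0 / (1.0 + math.exp(-z))

def pairwise_cost(p):
    """Assumed log-odds cost: negative when the pair likely belongs together."""
    return math.log((1.0 - p) / p)

# Candidate exactly where the regression predicted it: attractive (negative) cost.
near = pairwise_cost(join_probability(*pairwise_features((0, 0), (10, 0), (10, 0))))
# Candidate far away and in a different direction: repulsive (positive) cost.
far = pairwise_cost(join_probability(*pairwise_features((0, 0), (10, 0), (0, 40))))
```

A negative cost rewards joining the two candidates in the multicut objective, and a positive cost penalizes it.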

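Spatial Pairwise (conventional method / this method)

Conventional method (Bottom-Up)

The edge cost from the previous slide is computed between all pairs of parts

This method (Top-Down)

Edge costs are computed only from the root node to each part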
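The difference in the number of edge costs to evaluate can be made concrete with a small sketch. The part names and helper functions here are hypothetical and the example ignores multiple people; it only contrasts all-pairs connectivity with root-to-part connectivity.

```python
from itertools import combinations

def fully_connected_edges(candidates):
    """Bottom-Up: one edge for every unordered pair of candidates."""
    return list(combinations(candidates, 2))

def root_to_part_edges(root, candidates):
    """Top-Down: edges only from the root node to every other candidate."""
    return [(root, c) for c in candidates if c != root]

candidates = ["head", "neck", "r_shoulder", "l_shoulder", "r_elbow"]
bottom_up = fully_connected_edges(candidates)      # n * (n - 1) / 2 edges
top_down = root_to_part_edges("head", candidates)  # n - 1 edges
```

For n candidates the all-pairs graph has n(n-1)/2 edges while the root-to-part graph has n-1, so the gap widens quickly as the number of candidates grows.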

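attractive-repulsive edge (attract or repel)

attractive-repulsive edge : connects parts of the same type

Why it is needed : at the NMS stage, we want to avoid merging the heads of two nearby people into a single candidate point

-> cost inversely proportional to distance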

3 types of edges

spatial connectivity

Person Node (root node)

Bottom-Up fully connected / Bottom-Up Sparse / Top-Down

Long-range edges are unreliable

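Temporal Pairwise (matching)

Features

Logistic regression

Three complementary features

  L2 : whether the positions are close : effective for slow motion.

  Deep Matching : uses the fraction of matched points as the feature; usually the most effective when comparing two images.

  SIFT : robust to rotation.

DeepFlow/DeepMatching, Revaud, et al. ICCV2013.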
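Two of the temporal features above can be sketched in a few lines. This is a simplified illustration, not the paper's exact feature definition: the box size, the way the matched fraction is normalized, the weights, and the bias are all assumptions, the point matches are taken as given (for example from DeepMatching), and the SIFT feature is omitted.

```python
import math

def l2_distance(p, q):
    """Euclidean distance between a part in frame t and one in frame t+1."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def in_box(point, center, half_size):
    """True if point lies in the square box around center."""
    return (abs(point[0] - center[0]) <= half_size
            and abs(point[1] - center[1]) <= half_size)

def matched_fraction(matches, center_t, center_t1, half_size=10.0):
    """Fraction of matches starting near center_t that end near center_t1.

    matches is a list of ((x, y) in frame t, (x, y) in frame t+1) pairs.
    """
    started = [m for m in matches if in_box(m[0], center_t, half_size)]
    if not started:
        return 0.0
    landed = sum(1 for m in started if in_box(m[1], center_t1, half_size))
    return landed / len(started)

def same_part_probability(features, weights, bias):
    """Logistic regression over the temporal features (hypothetical weights)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: three matches start near the part, two of them land near the
# candidate in the next frame; the fourth match is unrelated.
matches = [((0, 0), (5, 0)), ((2, 2), (7, 2)),
           ((3, -3), (60, 60)), ((100, 100), (105, 100))]
part_t, part_t1 = (0, 0), (5, 0)
features = (l2_distance(part_t, part_t1),
            matched_fraction(matches, part_t, part_t1))
p_same = same_part_probability(features, weights=(-0.05, 4.0), bias=-1.0)
```

The features are complementary in the sense the slide describes: the L2 term alone fails under fast motion, while the match-based term still links the two detections if enough correspondences agree.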

Evaluation : Single frame

on MPII Multi Person Val

Evaluation : Single frame, comparison with other methods

on MPII Multi Person Test

[Cao, et al. CVPR2017] (Bottom-Up) 75.6 0.005 (including CNN)

-> Is it faster and more accurate to have an efficient CNN output fewer candidate points?

Evaluation : Pose Tracking

on MPII Video Pose

(dataset created for this work)

Result : Single frame

Result : Tracking

http://www.youtube.com/watch?v=eYtn13fzGGo&t=8

Result : Failure cases (Single frame)

Both BU and TD/BU fail:

because scale is not taken into account

Only TD/BU fails:

because the positional relations between parts are not modeled explicitly

Summary

● Spatio-temporal grouping is achieved by solving the Minimum Cost Subgraph Multicut Problem.

● The Top-Down model improves accuracy and reduces computation.

● A new “MPII Video Pose” dataset was created and released (I could not find it).

● PoseTrack.net was released as a new benchmark.

https://posetrack.net/

ICCV2017 Workshop

500 video sequences, 20K frames, 120K body pose annotations, 3 challenges
