

Hand Pose Estimation via 2.5D Latent Heatmap Regression

Umar Iqbal 1,2, Pavlo Molchanov 2, Thomas Breuel 2, Juergen Gall 1 and Jan Kautz 2

1 Computer Vision Group, University of Bonn, Germany

2 NVIDIA Research


Comparison with the state-of-the-art

Normalized relative depth

Loss function

Ablative Studies

learnable parameter controls spread

Hadamard product

References

[1] J. Zhang, J. Jiao, M. Chen, L. Qu, X. Xu, Q. Yang. 3D hand pose tracking and estimation using stereo matching. arXiv'16.

[2] C. Zimmermann and T. Brox. Learning to estimate 3D hand pose from single RGB images. In ICCV'17.

[3] T. Simon, H. Joo, I. Matthews, Y. Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In CVPR'17.

[4] F. Mueller, F. Bernard, O. Sotnychenko, D. Mehta, S. Sridhar, D. Casas, C. Theobalt. GANerated hands for real-time 3D hand tracking from monocular RGB. In CVPR'18.

Introduction

Problem: 2D and 3D hand pose estimation

Challenges

Large amounts of appearance variation and self-occlusions

Occlusion due to interaction with objects

Complex hand articulations

Motivation

VR/AR, human-machine interactions, gaming, sign-language recognition

Contributions

A 2.5D pose representation that can be estimated easily from an RGB image

An exact approach to reconstruct 3D hand pose from 2.5D pose

A 2.5D heatmap representation to enable accurate keypoint localization

A CNN architecture to regress 2.5D heatmaps in a latent way

A view-agnostic approach for monocular 2D and 3D hand pose estimation

Overview

Results

3D Pose Reconstruction

2.5D Pose Representation

3D pose estimation is an ill-posed problem due to scale and depth ambiguities

2D pixel coordinates

root-relative depth


Scale Normalization

Scale Recovery
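As a minimal sketch of the 2.5D representation (2D pixel coordinates plus scale-normalized, root-relative depth), the following Python converts a 3D pose into that form. The function name `to_2p5d`, the pinhole intrinsics `fx, fy, cx, cy`, the choice of keypoint 0 as the root, and the bone (0, 1) used for scale normalization are all assumptions made for illustration; the poster does not specify them.

```python
import math

def to_2p5d(joints_3d, fx, fy, cx, cy, root=0, bone=(0, 1)):
    """Convert a 3D pose (list of (X, Y, Z) in camera coordinates) to 2.5D.

    Returns (uv, z_rel, scale): pixel coordinates, scale-normalized
    root-relative depths, and the length of the reference bone.
    The root index and reference bone are illustrative assumptions.
    """
    n, m = bone
    # Scale normalization: measure everything in units of one reference bone.
    scale = math.dist(joints_3d[n], joints_3d[m])
    # 2D pixel coordinates via a pinhole projection.
    uv = [(fx * X / Z + cx, fy * Y / Z + cy) for X, Y, Z in joints_3d]
    # Depth of every keypoint relative to the root, in normalized units.
    z_root = joints_3d[root][2]
    z_rel = [(Z - z_root) / scale for _, _, Z in joints_3d]
    return uv, z_rel, scale

# Toy pose (metres) and the same hand made twice as large and twice as far away.
pose = [(0.00, 0.00, 0.50), (0.05, 0.00, 0.55), (0.10, 0.02, 0.60)]
bigger = [(2 * X, 2 * Y, 2 * Z) for X, Y, Z in pose]

uv_a, z_a, s_a = to_2p5d(pose, 500.0, 500.0, 320.0, 240.0)
uv_b, z_b, s_b = to_2p5d(bigger, 500.0, 500.0, 320.0, 240.0)
```

Both poses produce the same pixel coordinates and the same normalized relative depths, which illustrates the scale and depth ambiguity: only the returned `scale` differs. Multiplying a normalized pose by a known bone length is one possible way to perform scale recovery.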

Latent

Direct

Latent 2.5D Heatmap Regression
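The poster's annotations mention a learnable parameter that controls spread and a Hadamard product. A hedged reading of those labels is sketched below for a single keypoint: a latent 2D map is normalized with a spatial softmax whose temperature `beta` sets the spread, the 2D position is the expectation over that heatmap, and the relative depth is the sum of the element-wise (Hadamard) product of the heatmap with a latent depth map. The softmax normalization, the function name `latent_to_2p5d`, and the toy inputs are assumptions; in a real network `beta` would be trained rather than passed in.

```python
import math

def latent_to_2p5d(latent_2d, latent_depth, beta=1.0):
    """Turn one keypoint's latent maps (H x W nested lists) into (x, y, z_rel)."""
    # Spatial softmax; beta controls how peaked the resulting heatmap is.
    peak = max(max(row) for row in latent_2d)  # subtracted for numerical stability
    exp = [[math.exp(beta * (v - peak)) for v in row] for row in latent_2d]
    total = sum(sum(row) for row in exp)
    heat = [[v / total for v in row] for row in exp]
    # Expected column (x) and row (y) index under the heatmap.
    x = sum(p * j for row in heat for j, p in enumerate(row))
    y = sum(p * i for i, row in enumerate(heat) for p in row)
    # Hadamard product of heatmap and latent depth map, summed over all pixels.
    z = sum(p * d for h_row, d_row in zip(heat, latent_depth)
            for p, d in zip(h_row, d_row))
    return heat, (x, y, z)

latent = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 4.0],
          [0.0, 0.0, 0.0]]
depth = [[0.1, 0.1, 0.1],
         [0.1, 0.1, -0.3],
         [0.1, 0.1, 0.1]]

heat_sharp, (x, y, z) = latent_to_2p5d(latent, depth, beta=10.0)
heat_soft, _ = latent_to_2p5d(latent, depth, beta=0.1)
```

With a large `beta` the heatmap collapses onto the strongest response, so the estimate lands on column 2, row 1 with depth close to -0.3; with a small `beta` the mass spreads over the whole map. Because every step is differentiable, the heatmap itself never needs direct supervision, which is one way to interpret "latent" here.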

Given the 2.5D pose, we need to find the depth of the root keypoint to reconstruct the scale-normalized 3D pose.

Given … and …, there exists a unique 3D pose that satisfies:

The equation can be rewritten in terms of the 2D projections … and relative depths … as follows:

The coefficients of the quadratic equation:

Mean bone length

Kinematic structure of the hand
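One way to make the root-depth recovery concrete is the sketch below. It is an illustrative formulation, not a transcription of the poster's equations: it assumes a pinhole camera, 2D projections already expressed in normalized image coordinates (intrinsics removed), and a single reference bone between keypoints n and m whose length is 1 after scale normalization. Writing each depth as the root depth plus the relative depth and requiring the back-projected bone to have that length gives a quadratic in the root depth, of which the larger root is taken. The names `root_depth` and `reconstruct` are hypothetical.

```python
import math

def root_depth(p_n, p_m, z_n, z_m, bone_length=1.0):
    """Solve a*Z^2 + b*Z + c = 0 for the root depth Z (larger root).

    p_n, p_m: normalized image coordinates (x, y) of the bone's endpoints.
    z_n, z_m: their root-relative depths in scale-normalized units.
    """
    (x_n, y_n), (x_m, y_m) = p_n, p_m
    a = (x_n - x_m) ** 2 + (y_n - y_m) ** 2
    b = 2.0 * ((x_n - x_m) * (x_n * z_n - x_m * z_m)
               + (y_n - y_m) * (y_n * z_n - y_m * z_m))
    c = ((x_n * z_n - x_m * z_m) ** 2 + (y_n * z_n - y_m * z_m) ** 2
         + (z_n - z_m) ** 2 - bone_length ** 2)
    if a == 0.0:
        raise ValueError("both endpoints project to the same point")
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real solution for this 2.5D pose")
    return (-b + math.sqrt(disc)) / (2.0 * a)

def reconstruct(points, z_rel, z_root):
    """Back-project every keypoint: (x*Z, y*Z, Z) with Z = z_root + z_rel."""
    return [(x * (z_root + z), y * (z_root + z), z_root + z)
            for (x, y), z in zip(points, z_rel)]

# Ground truth used to build the toy input: root at (0, 0, 8.0) and a second
# keypoint at (0.6, 0, 8.8), so the bone between them has length 1.
points = [(0.0, 0.0), (0.6 / 8.8, 0.0)]
z_rel = [0.0, 0.8]

z_root = root_depth(points[0], points[1], z_rel[0], z_rel[1])
pose_3d = reconstruct(points, z_rel, z_root)
```

On this toy input the two roots of the quadratic are 8.0 and -9.6; choosing the larger one recovers the depth used to construct the example, and back-projection returns the original scale-normalized keypoints.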

2D Heatmaps

2D Coordinates

Datasets: Stereo Hand Pose, Dexter-Object, Ego-Dexter, MPII+NZSL

Comparison with direct heatmaps