Human Pose Recognition

Upload: clarence-cross

Post on 17-Dec-2015


Page 1: 1.Introduction 2.Article [1] Real Time Motion Capture Using a Single TOF Camera (2010) 3.Article [2] Real Time Human Pose Recognition In Parts Using a

Human Pose Recognition

Contents

1. Introduction

2. Article [1]: Real Time Motion Capture Using a Single TOF Camera (2010)

3. Article [2]: Real Time Human Pose Recognition in Parts from Single Depth Images (2011)

1.1 What Is Pose Recognition?

Fig. from [2]: an input image and its body-part labels (arm, torso, head).

1.2 Motivation

Why do we need this?

Robotics

Smart surveillance

virtual reality

motion analysis

Gaming - Kinect

Kinect – Project Natal

Microsoft Xbox 360 console

“You are the controller”

Launched on 4 November 2010

In its first 60 days on the market it sold over 8M units (Guinness world record)

http://www.youtube.com/watch?v=p2qlHoxPioM

1.3 Challenges

What is the problem?

Real time?

Full solution?

Cheap?

Occlusions?

Light?

Shadows?

Clothes?

1.4 Previous Technology

Mocap using markers: expensive.

Multi-view camera systems: limited applicability.

Monocular: simplified problems.

1.5 New Technology

Time-of-flight (TOF) camera

Dense depth

High frame rate (100 Hz)

Robust to:

Lighting

shadows

other problems.

2. Article [1]

Real Time Motion Capture Using a Single Time-of-Flight Camera

(V. Ganapathi et al., CVPR 2010)

Article Contents

2.1 previous work

2.2 What’s new?

2.3 Overview

2.4 results

2.5 limitations & future work

2.6 Evaluation

2.1 Previous work

Many, many articles (Moeslund et al. 2006 covered 350 articles)

[Figures: examples of earlier work, dated 2006, 2006 and 1998]

2.2 What’s new?

TOF technology

Propagating information up the kinematic chain.

Probabilistic model using the unscented transform.

Multiple GPUs.

2.3 Overview

1. Probabilistic Model

2. Algorithm Overview:

Model Based Hill Climbing Search

Evidence Propagation

Full Algorithm

1. Probabilistic Model

15 body parts

DAG – Directed Acyclic Graph

DBN – Dynamic Bayesian Network

$X_t = \{X_t^i\}_{i=1}^{N}$: pose

$V_t$: speed

$z_t$: range scan

1. Probabilistic Model

Dynamic Bayesian network (DBN). Fig. from [1]

Assumptions:

$P(X_t^i = V_t^i X_{t-1}^i) = 1$

$V_t \mid V_{t-1} \sim N(V_{t-1}, \Sigma)$

Use ray casting to evaluate the distance from each measurement $z^k$.

Goal: find the most likely states given the previous frame's MAP estimate, i.e.:

$\hat{X}_t, \hat{V}_t = \arg\max_{X_t, V_t} \log P(z_t \mid X_t, V_t) + \log P(X_t, V_t \mid \hat{X}_{t-1}, \hat{V}_{t-1})$

2. Algorithm Overview

1. Hill climbing search (HC)

2. Evidence propagation (EP)

2.1 Hill Climbing Search (HC)

Sample $V_t^i$ on a grid around $P(V_t^i \mid V_{t-1}^i)$ (grid spacing 0.05 m).

Calculate $X_t$ from $V_t$ and $\hat{X}_{t-1}$.

Evaluate the likelihood and choose the best point.

Coarse-to-fine grids.

Fig. from [1]
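
The search loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from [1]: the `score` callable, the coordinate-wise neighbourhood, and the rule of halving the step at each level are all made up for brevity. Only the 0.05 starting step comes from the slide.

```python
def hill_climb(start, score, step=0.05, levels=3, max_iters=100):
    """Coordinate-wise hill climbing on a coarse-to-fine grid.

    start  -- initial parameter vector (list of floats)
    score  -- hypothetical callable returning a log-likelihood (higher is better)
    step   -- initial grid spacing (0.05, as on the slide)
    levels -- number of grid refinements (assumed; each level halves the step)
    """
    best = list(start)
    best_score = score(best)
    for _ in range(levels):
        for _ in range(max_iters):
            improved = False
            for i in range(len(best)):
                for delta in (-step, step):
                    cand = list(best)
                    cand[i] += delta
                    s = score(cand)
                    if s > best_score:  # accept only strict improvements
                        best, best_score = cand, s
                        improved = True
            if not improved:
                break  # local optimum at this grid resolution
        step /= 2.0  # refine the grid
    return best, best_score


def toy_score(v):
    # Toy log-likelihood with a single peak at (0.2, -0.1); an assumption.
    return -((v[0] - 0.2) ** 2 + (v[1] + 0.1) ** 2)


estimate, value = hill_climb([0.0, 0.0], toy_score)
```

With a single-peaked score the search reaches the peak. With several peaks it stops at whichever local optimum is nearest to the start.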

2.1 Hill Climbing Search (HC)

The good:

Simple

Fast

Runs in parallel on GPUs

The bad:

Local optima

Ridges, plateaus, alleys

Can lose track when motion is fast or occlusions occur.

2.2 Evidence Propagation

Also has 3 stages:

1. Body part detection (C. Plagemann et al 2010)

2. Probabilistic Inverse Kinematics

3. Data association and inference

2.2.1 Body Part Detection

Bottom up approach:

1. Locate interest points with AGEX (Accumulative Geodesic Extrema).

2. Find orientation.

3. Classify the head, feet and hands using local shape descriptors.

Fig From [3]
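
The first step, finding geodesic extrema, can be illustrated on a toy graph. This is a sketch under assumptions, not the method of [3] as published: it picks, one after another, the node that is geodesically farthest from every node picked so far, using multi-source Dijkstra. The stick-figure graph, its node names and its edge lengths are invented for the example. In practice the graph would be built from the depth image surface.

```python
import heapq


def geodesic_distances(graph, sources):
    """Multi-source Dijkstra. graph: {node: [(neighbor, edge_length), ...]}."""
    dist = {node: float("inf") for node in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, length in graph[node]:
            nd = d + length
            if nd < dist[nbr]:
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def geodesic_extrema(graph, start, k):
    """Pick k nodes, each the farthest (along the graph) from all picked so far."""
    selected = [start]
    extrema = []
    for _ in range(k):
        dist = geodesic_distances(graph, selected)
        farthest = max(dist, key=dist.get)
        extrema.append(farthest)
        selected.append(farthest)
    return extrema


def undirected(edges):
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


# Hypothetical stick figure: body centre "c" linked to five extremities.
body = undirected([
    ("c", "head", 2.0),
    ("c", "lhand", 3.0),
    ("c", "rhand", 3.5),
    ("c", "lfoot", 4.0),
    ("c", "rfoot", 4.5),
])
tips = geodesic_extrema(body, "c", 5)
```

On this toy graph the five extrema are exactly the extremities, which is why such points are natural candidates for head, hands and feet.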

2.2.1 Body Part Detection

Results:

Fig From [3]

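2.2.2 Probabilistic inverse kinematics (EP)

Model parts: $\{p_i\}_{i=1}^{5}$ = {head, hands, legs} of $X$. Detections: $\hat{p}_j$ ($j = 1, \dots, N$). Which detection belongs to which part?

Assume a correspondence $p_i \leftrightarrow \hat{p}_j$.

Need a new MAP conditioned on $\hat{X}_{t-1}, \hat{p}_j$.

Problem: $p_i(V_t, \hat{X}_{t-1}, \hat{V}_{t-1})$ isn’t linear!

Solution: linearize with the unscented Kalman filter.

Then it is easy to determine $P(V_t \mid \hat{V}_{t-1}, \hat{X}_{t-1}, \hat{p}_j)$.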
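
The core idea of the unscented approach can be shown in one dimension. This sketch covers only the unscented transform step, not a full Kalman filter and not the code of [1]. It pushes a few sigma points through the nonlinear function and re-estimates a Gaussian from the results. The scalar state, the `kappa` value and the example functions are assumptions.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using three sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# Sanity check on a linear map, where the result must be exact:
# y = 2x + 1 turns N(1, 0.25) into N(3, 1).
lin_mean, lin_var = unscented_transform_1d(1.0, 0.25, lambda x: 2.0 * x + 1.0)

# A nonlinear map (hypothetical stand-in for a joint-angle-to-position function).
sin_mean, sin_var = unscented_transform_1d(0.0, 0.25, math.sin)
```

No derivatives of the function are needed, which is the practical appeal over a first-order Taylor linearization.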

2.3 Full Algorithm

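Inputs: previous MAP, depth image.

1. HC from the previous MAP gives Xbest.

2. Part detection on the depth image.

3. Remove suggestions that are already explained by body parts.

4. Correspond the rest: $\{(p_i, \hat{p}_j)\}$.

5. For each correspondence: EP, then HC, gives X’.

6. If X’ > Xbest, X’ becomes the new Xbest.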
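
One reading of this flowchart, written as a per-frame loop. It is a sketch of the control flow only. Every callable passed in is a hypothetical stand-in, and the order of the steps is an interpretation of the slide rather than a quotation of [1].

```python
def track_frame(prev_map, depth_image, hill_climb, detect_parts,
                is_explained, propagate_evidence, likelihood):
    """One frame of the combined HC+EP loop (all callables are stand-ins)."""
    x_best = hill_climb(prev_map, depth_image)
    # Keep only detections that the current best pose does not explain.
    detections = [p for p in detect_parts(depth_image)
                  if not is_explained(p, x_best)]
    for detection in detections:
        x_candidate = propagate_evidence(x_best, detection)  # EP
        x_candidate = hill_climb(x_candidate, depth_image)   # HC
        if likelihood(x_candidate, depth_image) > likelihood(x_best, depth_image):
            x_best = x_candidate
    return x_best


# Toy stand-ins: the "pose" is one number and the "depth image" is its true value.
def toy_hc(x, target):
    # Local search that can move at most 1 unit per frame.
    return x + max(-1.0, min(1.0, target - x))


result = track_frame(
    prev_map=0.0,
    depth_image=5.0,
    hill_climb=toy_hc,
    detect_parts=lambda target: [target],
    is_explained=lambda p, x: abs(p - x) < 0.5,
    propagate_evidence=lambda x, p: p,
    likelihood=lambda x, target: -abs(x - target),
)
```

In the toy run HC alone only reaches 1.0, while the detection pulls the estimate to 5.0. This mirrors the fast-motion case where HC by itself loses track.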

2.4 Results

Experiments:

28 real depth image sequences.

Ground truth: tracking markers.

$\epsilon_{avg} = \frac{1}{M} \sum_{i=1}^{M} \| m_i - \hat{m}_i \|$

$m_i$: real marker position

$\hat{m}_i$: estimated position

$\epsilon_{avg} < 0.1$ m: perfect tracks.

$\epsilon_{avg} > 0.3$ m: fault tracking.

Compared 3 algorithms: EP, HC, HC+EP.
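
The error measure is the mean distance between real and estimated marker positions, with 0.1 m and 0.3 m as the thresholds for perfect and faulty tracks. A minimal sketch follows. The function names, the "intermediate" label and the sample coordinates are assumptions made for illustration.

```python
import math


def average_marker_error(true_markers, estimated_markers):
    """Mean Euclidean distance between paired 3D marker positions (metres)."""
    if len(true_markers) != len(estimated_markers) or not true_markers:
        raise ValueError("need two non-empty marker lists of equal length")
    total = sum(math.dist(m, m_hat)
                for m, m_hat in zip(true_markers, estimated_markers))
    return total / len(true_markers)


def classify_track(error, perfect_below=0.1, fault_above=0.3):
    """Thresholds from the slide; the 'intermediate' label is an assumption."""
    if error < perfect_below:
        return "perfect"
    if error > fault_above:
        return "fault"
    return "intermediate"


truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
estimate = [(0.0, 0.0, 0.1), (1.0, 0.3, 0.0)]
err = average_marker_error(truth, estimate)  # errors of 0.1 m and 0.3 m
label = classify_track(err)
```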

2.4 Results

Best: HC+EP; worst: EP.

Runs close to real time.

HC: 6 frames per second.

HC+EP: 4-6 frames per second.

Fig. from [1] (annotation: the harder the sequence, the bigger the difference)

2.4 Results

Extreme case, sequence 27: HC vs. HC+EP (annotation: lose track).

Fig. from [1]

2.5 Limitations & Future work

Limitations:

Manual initialization.

No tracking of more than one person at a time.

Using temporal data: consumes more time, reinitialization problem.

Future work:

Improving the speed.

Combining with color cameras.

Fully automatic model initialization.

Tracking more than one person.

2.6 Evaluation

Well written

Self-contained

Novel combination of existing parts

New technology

Achieves its goals (real time)

Missing examples on the probabilistic model.

Not clear how $X_0$ is defined.

Extensively validated:

Data set and code available

Not enough visual examples in the article

No comparison to different algorithms

3. Article [2]

Real Time Human Pose Recognition In Parts From Single Depth Images

(Shotton et al. & Xbox Incubation, Microsoft Research, 2011)

Article Contents

2.1 previous work

2.2 What’s new?

2.3 Overview

2.4 results

2.5 limitations & future work

2.6 Evaluation

2.1 Previous work

Same as Article [1].

2.2 What’s new?

Using no temporal information: robust and fast (200 frames per second).

Object recognition approach.

Per-pixel classification.

Large and highly varied training dataset.

Fig. from [2]

2.3 Overview

1. Database construction

2. Body part inference and joint proposals:

Goals:

computational efficiency and robustness

1. Database

Pose estimation often has to overcome a lack of training data. Why?

Huge color and texture variability.

Computer simulations don’t produce the range of volitional motions of a human subject.

1. Database

100k mocap frames, synthetic rendering pipeline

Fig. from [2]

1. Database

Real data

Synthetic data

Which is real???

Fig From [2]

2. Body part inference

1. Body part labeling

2. Depth image features

3. Randomized decision forests

4. Joint position proposals

2.1 Body part labeling

31 body parts labeled.

The problem can now be solved by efficient classification algorithms.

Fig. from [2] (example labels: Head Up Right, Head Up Left)

2.2 Depth comparison features

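Simple depth comparison features (1):

$f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$

$d_I(x)$: depth at pixel $x$ in image $I$; $\theta = (u, v)$: offsets.

Offset normalization by $1/d_I(x)$: depth invariant.

Computational efficiency: no preprocessing.

Fig. from [2]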
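
A depth comparison feature takes the difference between the depths probed at two offsets, u and v, around the pixel. Dividing the offsets by the depth at the pixel is what makes the feature depth invariant. A minimal sketch follows. The list-of-rows image layout, the rounding of probe positions and the `BACKGROUND` constant for probes outside the image are assumptions for the example.

```python
BACKGROUND = 1e6  # large depth returned for probes outside the image (assumed value)


def depth_at(image, x, y):
    """Depth at column x, row y, or BACKGROUND when outside the image."""
    rows, cols = len(image), len(image[0])
    if 0 <= y < rows and 0 <= x < cols:
        return image[y][x]
    return BACKGROUND


def depth_feature(image, pixel, u, v):
    """d(x + u/d(x)) - d(x + v/d(x)); assumes the depth at `pixel` is positive."""
    x, y = pixel
    d = depth_at(image, x, y)

    def probe(offset):
        px = x + int(round(offset[0] / d))
        py = y + int(round(offset[1] / d))
        return depth_at(image, px, py)

    return probe(u) - probe(v)


# Toy 3x3 depth image in metres: one far pixel to the right of the centre.
image = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
]
# At depth 2.0 an offset of 2.0 becomes a one-pixel step.
value = depth_feature(image, (1, 1), u=(2.0, 0.0), v=(0.0, -2.0))
off_image = depth_feature(image, (1, 1), u=(4.0, 0.0), v=(0.0, 0.0))
```

Each feature needs only a few memory reads and a subtraction, which is why no preprocessing is required.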

2.3 Randomized Decision forests

How does it work?

Node = feature $f_\theta$ and a threshold $\tau$

Classify pixel $x$:

$P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$

Fig. from [2] (input: pixel x)
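
Classification walks the pixel down every tree, comparing a feature value with the node threshold at each split, and then averages the class distributions stored at the leaves that were reached. The sketch below assumes a nested-dict tree layout, a simplified feature without depth normalization, and made-up leaf probabilities. None of these come from [2].

```python
def toy_feature(image, pixel, u, v):
    """Simplified depth difference with integer offsets and no normalization."""
    x, y = pixel
    return image[y + u[1]][x + u[0]] - image[y + v[1]][x + v[0]]


def classify_pixel(forest, feature_fn, image, pixel):
    """Average the leaf distributions reached in each tree of the forest."""
    totals = {}
    for tree in forest:
        node = tree
        while "leaf" not in node:
            value = feature_fn(image, pixel, node["u"], node["v"])
            node = node["left"] if value < node["threshold"] else node["right"]
        for label, p in node["leaf"].items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(forest) for label, total in totals.items()}


# Two hypothetical one-split trees.
forest = [
    {"u": (1, 0), "v": (0, 0), "threshold": 0.5,
     "left": {"leaf": {"head": 0.8, "torso": 0.2}},
     "right": {"leaf": {"head": 0.1, "torso": 0.9}}},
    {"u": (0, 1), "v": (0, 0), "threshold": 0.5,
     "left": {"leaf": {"head": 0.6, "torso": 0.4}},
     "right": {"leaf": {"head": 0.3, "torso": 0.7}}},
]
image = [
    [2.0, 3.0],
    [2.0, 2.0],
]
posterior = classify_pixel(forest, toy_feature, image, (0, 0))
```

In this run the first tree goes right (feature 1.0) and the second goes left (feature 0.0), so the averaged posterior is head 0.35 and torso 0.65.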

2.3 Randomized Decision forests

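Training algorithm: 1M images, 2000 pixels per image

$\phi^* = \arg\max_{\phi} G(\phi)$, with $\phi = (\theta, \tau)$

$G(\phi) = H(Q) - \sum_{s \in \{l, r\}} \frac{|Q_s(\phi)|}{|Q|} H(Q_s(\phi))$

*H: entropy

Training 3 trees, depth 20, 1M images: about 1 day (1000-core cluster)

1M images × 2000 pixels × 2000 candidate features $f_\theta$ × 50 candidate thresholds = $2 \times 10^{14}$ computations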
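
Training picks, at each node, the split with the largest information gain: the entropy of the samples at the node minus the size-weighted entropy of the two subsets the split produces. A minimal sketch for one feature and a handful of candidate thresholds follows. The sample values and labels are invented. The last line rechecks the computation count from the slide.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, threshold):
    """samples: list of (feature_value, label). Gain of splitting at threshold."""
    labels = [lab for _, lab in samples]
    left = [lab for val, lab in samples if val < threshold]
    right = [lab for val, lab in samples if val >= threshold]
    gain = entropy(labels)
    for part in (left, right):
        gain -= len(part) / len(labels) * entropy(part)
    return gain


def best_threshold(samples, candidates):
    """Candidate threshold with the largest information gain."""
    return max(candidates, key=lambda t: information_gain(samples, t))


samples = [(0.1, "hand"), (0.2, "hand"), (0.8, "head"), (0.9, "head")]
chosen = best_threshold(samples, [0.15, 0.5, 0.85])
gain_at_chosen = information_gain(samples, chosen)

# Cost estimate from the slide: images x pixels x features x thresholds.
total_computations = 1_000_000 * 2000 * 2000 * 50
```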

2.3 Randomized Decision forests

Fig From [2]

Trained tree:

2.4 Joint Position Proposal

Local mode finding approach based on mean shift with a weighted Gaussian kernel.

Density estimator:

$f_c(\hat{x}) \propto \sum_{i=1}^{N} w_{ic} \exp\left(-\left\|\frac{\hat{x} - \hat{x}_i}{b_c}\right\|^2\right)$

$w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$

Fig. from [4] (labels: outliers, center of mass)
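
Mean shift with a weighted Gaussian kernel can be sketched as a fixed-point iteration. Each step moves the estimate to the kernel-weighted average of the points, so a far-away outlier barely pulls it. The points, probabilities, bandwidth and starting position below are assumptions. The weights follow the slide's form, class probability times squared depth.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb to a local mode of sum_i w_i * exp(-||x - x_i||^2 / bandwidth^2)."""
    x = list(start)
    for _ in range(iters):
        num = [0.0] * len(x)
        den = 0.0
        for p, w in zip(points, weights):
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            k = w * math.exp(-sq / bandwidth ** 2)
            den += k
            for d in range(len(x)):
                num[d] += k * p[d]
        if den == 0.0:
            break  # every kernel value underflowed; keep the current estimate
        new_x = [n / den for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Three points of a hypothetical joint cluster plus one outlier (metres).
points = [(0.0, 0.0, 2.0), (0.02, 0.0, 2.0), (-0.02, 0.0, 2.0), (1.0, 1.0, 2.0)]
probs = [0.9, 0.8, 0.8, 0.3]                      # made-up P(c | I, x_i)
weights = [p * pt[2] ** 2 for p, pt in zip(probs, points)]

center_of_mass = [sum(pt[d] for pt in points) / len(points) for d in range(3)]
mode = mean_shift_mode(points, weights, center_of_mass, bandwidth=0.1)
```

The plain center of mass sits at (0.25, 0.25, 2.0), dragged away by the outlier, while the mode found by mean shift lands on the cluster.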

2.4 Results

Experiments:

8800 frames of real depth images.

5000 synthetic depth images.

Also evaluated on the Article [1] dataset.

Measures:

1. Classification accuracy: confusion matrix.

2. Joint accuracy: mean Average Precision (mAP); results within D = 0.1 m count as true positives (TP).

Fault

Fig From [2]

2.4 Results - Classification accuracy

High correlation between real and synthetic.

Tree depth: the parameter with the largest effect.

Fig From [2]

2.4 Results - Joint Prediction

Comparing the algorithm on:

real set (red): mAP 0.731

ground truth set (blue): mAP 0.914

upper body: mAP 0.984

Fig. from [2]

2.4 Results - Joint Prediction

Comparing the algorithm to ideal nearest neighbor (NN) matching and to realistic NN (Chamfer NN).

Fig. from [2]

2.4 Results - Joint Prediction

Comparison to Article [1]:

Run on the same dataset.

Better results (even without temporal data).

Runs 10x faster.

Fig. from [2]

2.4 Results - Joint Prediction

Full rotations and multiple people

Right-left ambiguity

mAP of 0.655 (good for our uses)

Result video

Fig. from [2]

2.5 Limitations & Future work

Future work:

Better synthesis pipeline.

Is there an efficient approach that directly regresses joint positions? (Already done in later work: Efficient offset regression of body joint positions.)

2.6 Evaluation

Well written

Self-contained

Novel combination of existing parts

New technology

Achieves its goals (real time)

Extensively validated:

Used in a real console

Many result graphs and examples (another PDF of supplementary material)

Broad comparison to other algorithms

Data set and code not available

References

[1] V. Ganapathi et al., Real Time Motion Capture Using a Single Time-of-Flight Camera, 2010.

[2] Shotton et al. & Xbox Incubation, Real Time Human Pose Recognition in Parts from Single Depth Images, 2011.

[3] C. Plagemann et al., Real time identification and localization of body parts from depth images, 2010.

[4] Computer Graphics course (046746), Technion.

Questions?