capturing complex spatio-temporal relations among facial...

1
Rensselaer Polytechnic Institute 1 University of Science and Technology of China 2 Ziheng Wang 1 Shangfei Wang 2 Qiang Ji 1 Capturing Complex Spatio-Temporal Relations among Facial Muscles for Facial Expression Recognition 2. Main Idea 3. Related Work Existing approaches o Time-sliced graphical models such as hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs) o Syntactic and description-based approaches Limitations o Can only model a sequence of instantaneously occurring events o Only offer three time point relations: precedes, follows and equals o Typically assume first order Markov property and local stationary transition. o Syntactic and description-based models lack the expressive power to capture uncertainties The proposed model o Model both sequential and overlapping events o Do no reply on local assumptions and capture global relations o Capture a larger variety of complex relations o Remains fully probabilistic 8. Experimental Results Data Sets o CK+: 7 expressions, 327 video sequences o MMI: 6 expressions, 205 video sequences Performance Vs Number of Relation Nodes Performance Vs Related works o ITBN outperforms time-slice based models such as HMM o ITBN achieves comparable and even better performance than the related works, even though it is based purely on the tracking results Relationship Analysis Existing Dynamic Models Proposed Model Sequential events Sequential or overlapping events Local stationary relations Global relations 3 relations (precede, follow, equal) 13 complex relations 4. Interval Temporal Bayesian Network Node: Temporal Entity (Event) o A temporal entity is characterized by a pair Σ, Ω where Σ is a set of all possible states for the temporal entity, and Ω= { , ∈ ℝ: < } is the period of time spanned by the temporal entity. Link: Temporal Dependency o A temporal dependency denoted as describes a temporal relation between two temporal entities = Σ and = Σ with as the reference. o The strength of is quantified by the conditional probability ( ) o 13 temporal dependencies are defined according to Allen’s Interval Algebra B A C I AC I BC I AB Primitive Facial Muscle Event o Definition: each primitive event is the movement of one facial feature point o Duration: the interval from the time the point starts to move till the time it finally comes back o State: one of the moving patterns o Temporal relation: based on the durations of the events, their temporal relations can be uniquely determined Facial Expression Recognition with ITBN o To recognize expressions, we build ITBN’s { := 1, … , }, each of which models one expression. o Given a query sample , its expression is classified as: P 1 P 2 T 1 E 1 y t T 2 E 2 y t R 12 : E 2 overlaps E 1 S 1 S 2 S 3 S m ……………… 6. Modeling Facial Expression with ITBN 7. Learning and Inference Step 1: Temporal Relation Node Selection o It is not necessary to consider the relations among all possible pairs of events o A selection routine is performed to select those that maximize discrimination o The following score is used to rank all the relation nodes. The top L nodes are selected and instantiated in the model Step 2: Structure Learning o Temporal nodes are linked to their corresponding event nodes o Spatial links are learned by finding a network that maximizes the BIC score on the training data Step 3: Parameter Estimation o Parameters include the conditional probability distribution (CPD) , and the CPD o Tree structured CPD is introduced to reduce the number of parameters o Parameters are learned by maximizing the log likelihood = arg max y ( ) ( )= + ( ) > max log ,Θ − Θ log 2 Θ = arg max Θ log (|Θ) CK+ ITBN HMM Lucey et al. CK+ 86.3% 83.5% 83.3% ITBN HMM Zhong et al. CPL CSPL ADL MMI 59.7% 51.5% 49.4% 73.5% 47.8% 9. Conclusions Model a facial expression as a complex activity of temporally sequential or overlapping facial events Propose a novel ITBN to capture global and a larger variety of complex spatio-temporal relations among facial events The proposed model achieves comparable and even better results than existing methods, without using appearance information ITBN can be widely applied for analyzing other complex activities 1. Problem Facial Expression Recognition Happiness Surprise Disgust Sadness A B C I AB I AC I BC B B B B B C B C C C C C B C before meet overlap start during finish equal ℰ,ℛ = =1 =1 Spatial Relations Temporal Relations MMI 5. Implementation of ITBN We propose to implement ITBN with a corresponding Bayesian Network (BN) to exploit the well developed BN mathematical machinery o Circular node: state of each temporal event o Square node: type of temporal relation o Solid links: capture the spatial dependencies among events o Dotted links: connect the relation node with the corresponding temporal event nodes and capture the temporal dependencies Given a set of temporal events ℰ = { } =1 and their pairwise relations ℐ = { } =1 , the joint distribution can be written as t E 1 E 2 E 3 E 4 Model Facial expression as a complex activity consisting of sequential or overlapping facial muscle events Propose an Interval Temporal Bayesian Network (ITBN) to explicitly capture a larger variety of complex spatio-temporal relations among facial muscle events for expression recognition

Upload: others

Post on 22-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Capturing Complex Spatio-Temporal Relations among Facial ...cvrl/zihengwang/papers/cvpr2013poster.pdfThis PowerPoint 2007 template produces a 36x56 inch professional poster. You can

QUICK DESIGN GUIDE (--THIS SECTION DOES NOT PRINT--)

This PowerPoint 2007 template produces a 36x56 inch

professional poster. You can use it to create your

research poster and save valuable time placing titles,

subtitles, text, and graphics.

We provide a series of online tutorials that will guide

you through the poster design process and answer your

poster production questions.

To view our template tutorials, go online to

PosterPresentations.com and click on HELP DESK.

When you are ready to print your poster, go online to

PosterPresentations.com.

Need Assistance? Call us at 1.866.649.3004

Object Placeholders

Using the placeholders

To add text, click inside a placeholder on the poster and type

or paste your text. To move a placeholder, click it once (to

select it). Place your cursor on its frame, and your cursor

will change to this symbol . Click once and drag it to a

new location where you can resize it.

Section Header placeholder

Click and drag this preformatted section header placeholder

to the poster area to add another section header. Use section

headers to separate topics or concepts within your

presentation.

Text placeholder

Move this preformatted text placeholder to the poster to add

a new body of text.

Picture placeholder

Move this graphic placeholder onto your poster, size it first,

and then click it to add a picture to the poster.

QUICK TIPS (--THIS SECTION DOES NOT PRINT--)

This PowerPoint template requires basic PowerPoint (version 2007 or newer) skills. Below is a list of commonly asked questions specific to this template. If you are using an older version of PowerPoint some template features may not work properly.

Template FAQs

Verifying the quality of your graphics

Go to the VIEW menu and click on ZOOM to set your

preferred magnification. This template is at 100% the

size of the final poster. All text and graphics will be

printed at 100% their size. To see what your poster will

look like when printed, set the zoom to 100% and

evaluate the quality of all your graphics before you

submit your poster for printing.

Modifying the layout

This template has four different

column layouts. Right-click

your mouse on the background

and click on LAYOUT to see the

layout options. The columns in

the provided layouts are fixed and cannot be moved

but advanced users can modify any layout by going to

VIEW and then SLIDE MASTER.

Importing text and graphics from external sources

TEXT: Paste or type your text into a pre-existing

placeholder or drag in a new placeholder from the left

side of the template. Move it anywhere as needed.

PHOTOS: Drag in a picture placeholder, size it first,

click in it and insert a photo from the menu.

TABLES: You can copy and paste a table from an

external document onto this poster template. To

adjust the way the text fits within the cells of a table

that has been pasted, right-click on the table, click

FORMAT SHAPE then click on TEXT BOX and change

the INTERNAL MARGIN values to 0.25.

Modifying the color scheme

To change the color scheme of this template go to the

DESIGN menu and click on COLORS. You can choose

from the provided color combinations or create your

own.

© 2013 PosterPresentations.com 2117 Fourth Street , Unit C Berkeley CA 94710 [email protected]

Student discounts are available on our Facebook page.

Go to PosterPresentations.com and click on the FB icon.

Rensselaer Polytechnic Institute1 University of Science and Technology of China2

Ziheng Wang1 Shangfei Wang2 Qiang Ji1

Capturing Complex Spatio-Temporal Relations among Facial Muscles for Facial Expression Recognition

2. Main Idea

3. Related Work

Existing approaches

o Time-sliced graphical models such as hidden Markov models

(HMMs) and dynamic Bayesian networks (DBNs)

o Syntactic and description-based approaches

Limitations

o Can only model a sequence of instantaneously occurring events

o Only offer three time point relations: precedes, follows and

equals

o Typically assume first order Markov property and local

stationary transition.

o Syntactic and description-based models lack the expressive

power to capture uncertainties

The proposed model

o Model both sequential and overlapping events

o Do no reply on local assumptions and capture global relations

o Capture a larger variety of complex relations

o Remains fully probabilistic

8. Experimental Results

Data Sets

o CK+: 7 expressions, 327 video sequences

o MMI: 6 expressions, 205 video sequences

Performance Vs Number of Relation Nodes

Performance Vs Related works

o ITBN outperforms time-slice based models such as HMM

o ITBN achieves comparable and even better performance than

the related works, even though it is based purely on the

tracking results

Relationship Analysis

Existing Dynamic Models Proposed Model

Sequential events Sequential or overlapping events

Local stationary relations Global relations

3 relations (precede, follow, equal) 13 complex relations

4. Interval Temporal Bayesian Network

Node: Temporal Entity (Event)

o A temporal entity is characterized by a pair Σ, Ω where Σ is a

set of all possible states for the temporal entity, and Ω ={ 𝑎, 𝑏 ∈ ℝ: 𝑎 < 𝑏} is the period of time spanned by the

temporal entity.

Link: Temporal Dependency

o A temporal dependency denoted as 𝐼𝑋𝑌 describes a temporal

relation between two temporal entities 𝑋 = Σ𝑋, Ω𝑋 and

𝑌 = Σ𝑌, Ω𝑌 with 𝑋 as the reference.

o The strength of 𝐼𝑋𝑌 is quantified by the conditional probability

𝑃(𝐼𝑋𝑌|Σ𝑋, Σ𝑌)

o 13 temporal dependencies are defined according to Allen’s

Interval Algebra

B A

C

IAC IBC

IAB

Primitive Facial Muscle Event

o Definition: each primitive event is the movement of one facial

feature point

o Duration: the interval from the time the point starts to move

till the time it finally comes back

o State: one of the moving patterns

o Temporal relation: based on the durations of the events, their

temporal relations can be uniquely determined

Facial Expression Recognition with ITBN

o To recognize 𝑁 expressions, we build 𝑁 ITBN’s {𝑀𝑦: 𝑦 =

1,… , 𝑁}, each of which models one expression.

o Given a query sample 𝑥, its expression is classified as:

P1

P2

T1

E1

y

t

T2 E2

y

t

R12: E2 overlaps E1

S1

S2

S3

Sm

………………

6. Modeling Facial Expression with ITBN

7. Learning and Inference

Step 1: Temporal Relation Node Selection

o It is not necessary to consider the relations among all possible

pairs of events

o A selection routine is performed to select those that maximize

discrimination

o The following score is used to rank all the relation nodes. The

top L nodes are selected and instantiated in the model

Step 2: Structure Learning

o Temporal nodes are linked to their corresponding event nodes

o Spatial links are learned by finding a network 𝐺 that maximizes

the BIC score on the training data 𝐷

Step 3: Parameter Estimation

o Parameters include the conditional probability distribution

(CPD) 𝑃 𝐸𝑖 𝜋 𝐸𝑖 , and the CPD 𝑃 𝐼𝑘 𝜋 𝐼𝑘

o Tree structured CPD is introduced to reduce the number of

parameters

o Parameters are learned by maximizing the log likelihood

𝑦∗ = argmaxy

𝑃 𝑥 𝑀𝑦

𝐶𝑜𝑚𝑝𝑙𝑒𝑥𝑖𝑡𝑦(𝑀𝑦)

𝑆(𝐼𝐴𝐵) = 𝐷𝐾𝐿 𝑃𝑖 ∥ 𝑃𝑗 + 𝐷𝐾𝐿(𝑃𝑗 ∥ 𝑃𝑖)

𝑖>𝑗

max𝐺log 𝑃 𝐷 𝐺, Θ −

Θ log𝑁

2

Θ∗ = argmaxΘlog 𝑃(𝐷|Θ)

CK+

ITBN HMM Lucey et al.

CK+ 86.3% 83.5% 83.3%

ITBN HMM Zhong et al.

CPL CSPL ADL

MMI 59.7% 51.5% 49.4% 73.5% 47.8%

9. Conclusions

Model a facial expression as a complex activity of temporally

sequential or overlapping facial events

Propose a novel ITBN to capture global and a larger variety of

complex spatio-temporal relations among facial events

The proposed model achieves comparable and even better results

than existing methods, without using appearance information

ITBN can be widely applied for analyzing other complex activities

1. Problem

Facial Expression Recognition

Happiness Surprise Disgust Sadness

A

B

C

IAB

IAC

IBC

B

B

B

B

B

C

B

C

C

C

C

C

B

C

before

meet

overlap

start

during

finish

equal

𝑃 ℰ,ℛ = 𝑃 𝐸𝑖 𝜋 𝐸𝑖

𝑛

𝑖=1

𝑃 𝐼𝑘 𝜋 𝐼𝑘

𝐾

𝑘=1

Spatial Relations Temporal Relations

MMI

5. Implementation of ITBN

We propose to implement ITBN with a corresponding Bayesian

Network (BN) to exploit the well developed BN mathematical

machinery

o Circular node: state of each temporal event

o Square node: type of temporal relation

o Solid links: capture the spatial dependencies among events

o Dotted links: connect the relation node with the corresponding

temporal event nodes and capture the temporal dependencies

Given a set of temporal events ℰ = {𝐸𝑖}𝑖=1𝑛 and their pairwise

relations ℐ = {𝐼𝑘}𝑘=1𝐾 , the joint distribution can be written as

t

E1

E2

E3

E4

Model Facial expression as a complex activity consisting of

sequential or overlapping facial muscle events

Propose an Interval Temporal Bayesian Network (ITBN) to

explicitly capture a larger variety of complex spatio-temporal

relations among facial muscle events for expression recognition