
Page 1: Tony TUNG @ Matsuyama Lab., Kyoto University 2007-2014

Dynamic Surface

Modeling & Applications

Tony Tung

Matsuyama Laboratory, Kyoto University

2005.07-2005.08

2008.06-2014.09

Page 2

Self-introduction

Interests: computer vision, pattern recognition,

shape modeling, human-computer interaction

Tony TUNG

Matsuyama Laboratory

Graduate School of Informatics, Kyoto University

2005/07/01 - 2005/08/31 : JSPS Summer program (postdoc)

2008/06/01 - 2010/01/31 : Postdoc + JSPS short-term postdoc

2010/02/01 - 2014/09/30 : Assistant Professor (CREST - Kawahara Laboratory)

KAKENHI Wakate B (×2)

JSPS AYAME

Microsoft Research Azure project

Contact: tonytung.org

Page 3

3D Video is:

- Free-viewpoint video

- Image-based system for full surface capture of objects in

motion

- Markerless technique

3D video: full 3D object in motion

3D Video project

Page 4

[Matsuyama et al., CVIU'04]

3D Video project

Applications: preservation of intangible cultural heritage,

medicine (e.g., gait analysis), entertainment (movies,

sport replay), etc.

Page 5

3D Video project

T. Matsuyama, S. Nobuhara, T. Takai, T. Tung

Springer 2012 (book)

Page 6

3D video framework

- Current 3D video studio (3rd) at Kyoto University

• Reconstruction space: 3 m × 3 m × 3 m

• Green background for chroma keying, fluorescent/LED lamps

• 16 video cameras, 1600×1200 @ 25 fps

• Grasshopper IEEE 1394b

• Synchronization by external trigger

• Geometrically calibrated

• PC cluster of 2 nodes (8 cameras per PC)

Page 7

3D video framework

• 3D video data = sequence of 3D mesh models

– Frame-by-frame reconstruction using multiview stereo techniques

Page 8

Page 9

3D Video Reconstruction [CVPR08] [ICCV09]

Page 10

3D video reconstruction

3D video reconstruction from multiview stereo:

[Matsuyama et al., CVIU’04] [Matsuyama et al., Springer’12]

++ using temporal cues:

[Tung et al., CVPR’08] [Tung et al., ICCV’09]

Page 11

3D video super resolution

Image-based super resolution of 3D video

[Tung et al., CVPR’08]

Page 12

Stereo probabilistic fusion

3D video reconstruction from wide baseline stereo and SfM

probabilistic fusion

[Tung et al., ICCV’09]

Page 13

Data size issue

• One or several subjects in the 3D video studio

• 3D surface reconstruction by MVS technique

• Volumetric graph-cuts (5mm resolution)

• Each 3D model = 1.5 MB (30,000 triangles)

• 5 min of 3D video = 11.25 GB

How can we manage this large amount of data?

- How to search (analysis, visualization)

- How to handle inconsistency (storage, transfer)
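The data-size figures above can be reproduced with a quick back-of-the-envelope check (a sketch using only the frame rate and per-model size quoted on these slides):

```python
# Raw 3D video data-size estimate, using the figures from the slide:
# 25 fps capture, ~1.5 MB per reconstructed mesh (~30,000 triangles).
FPS = 25
MODEL_MB = 1.5
MINUTES = 5

frames = FPS * 60 * MINUTES           # number of 3D models in the sequence
total_gb = frames * MODEL_MB / 1000   # MB -> GB (decimal units)
print(frames, total_gb)               # 7500 models, 11.25 GB
```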

Page 14

Topology Dictionary

for 3D Video Understanding [CVPR07] [CVPR09] [PAMI12]

Page 15

3D video sequence

Sequential reconstruction

(Inconsistent topology between frames)

Topology can be used to characterize 3D video data

Page 16

Topology dictionary for 3D video understanding

• Abstraction levels

– Topology-based shape description (frame level)

– Probabilistic motion graph modeling (sequence level)

• Applications

– Analysis: segmentation, annotation, action recognition

– Content-based encoding: summarization, skimming

– Data size compression: storage, streaming

[Tung et al., CVPR’09]

[Tung et al., PAMI’12]

[Matsuyama et al., Springer’12]

Page 17

Topology-based shape description

Morse theory

μ : S → ℝ, with μ a real continuous function

S : manifold surface (mesh surface)

Reeb graph = quotient space of the graph of μ in S × ℝ,

defined by the equivalence relation ~ :

∀(X, Y) ∈ S², X ~ Y ⇔ μ(X) = μ(Y) and X, Y belong to the same connected component of μ⁻¹(μ(X))

[Reeb, 1946]
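On a discrete mesh the quotient can be sketched directly: vertices are binned by the value of μ, connected components within a bin become Reeb-graph nodes, and mesh edges crossing between components become graph edges. A minimal sketch only; `adjacency` and `mu` are hypothetical inputs, and real implementations such as the multiresolution Reeb graph refine the intervals hierarchically.

```python
def discrete_reeb_graph(adjacency, mu, n_bins):
    """Quotient a mesh graph by level sets of mu: vertices fall into
    n_bins intervals of mu; each connected component inside an interval
    becomes one Reeb-graph node; nodes are linked when a mesh edge
    crosses between them.  adjacency: vertex -> set of neighbours."""
    lo, hi = min(mu.values()), max(mu.values())
    width = (hi - lo) / n_bins or 1.0
    bin_of = {v: min(int((mu[v] - lo) / width), n_bins - 1) for v in mu}

    comp, next_id = {}, 0
    for v in mu:                      # flood fill inside each bin
        if v in comp:
            continue
        comp[v] = next_id
        stack = [v]
        while stack:
            u = stack.pop()
            for w in adjacency[u]:
                if w not in comp and bin_of[w] == bin_of[u]:
                    comp[w] = next_id
                    stack.append(w)
        next_id += 1

    edges = {tuple(sorted((comp[u], comp[w])))
             for u in adjacency for w in adjacency[u] if comp[u] != comp[w]}
    return comp, edges

# A 4-vertex chain with mu = height, split into 2 intervals:
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
mu = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
comp, edges = discrete_reeb_graph(adj, mu, 2)
```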

Page 18

Topology-based shape description

• Multiresolution Reeb graphs

[Hilaga et al., SIGGRAPH’01]

[Tung et al., CVPR’07]

- Automatic extraction of graphs

- R, t, scale invariant

- Homotopic

- Multiresolution coarse-to-fine matching

Page 19

Topology-based shape description

Page 20

Reeb graph evaluation

• Robustness to surface noise

Page 21

Reeb graph vs. skeleton

• “Automatic” 3D shape description

Page 22

Topology matching

- Invariance to rotation, translation and scale

- Matching using topological and geometrical

attributes (valence, relative area)

- Coarse-to-fine multiresolution strategy

- Similarity of two models M, N from similarity of topology-consistent node pairs {(mi, nj)} at every level of resolution:

SIM(M, N) = Σ_{r=0..R} Σ_{{i,j}} sim(mi, nj)

[Hilaga et al., SIGGRAPH’01]

[Tung et al., CVPR’07]
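The similarity sum can be written directly from the formula. A sketch: the matched pairs per resolution level are assumed given here, whereas the real matcher selects them coarse-to-fine using the valence and relative-area attributes.

```python
def reeb_similarity(matched_pairs_per_level, sim):
    """SIM(M, N): sum node-pair similarities sim(m_i, n_j) over all
    resolution levels r = 0..R, restricted to the topology-consistent
    matched pairs retained at each level."""
    return sum(sim(m, n)
               for pairs in matched_pairs_per_level
               for m, n in pairs)

# Toy example: two levels, node attributes are relative areas,
# pair similarity is the smaller of the two areas.
pairs = [[(1.0, 0.8)],                 # level r = 0 (coarsest)
         [(0.5, 0.5), (0.3, 0.2)]]     # level r = 1
score = reeb_similarity(pairs, sim=lambda m, n: min(m, n))
```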

Page 23

Performance evaluation

Pose retrieval in 3D video sequences

[Huang et al., 3DPVT'10]

Page 24

Topology clusters

• Dataset clustering using similarity evaluation

Distance matrix {1 - SIM}

Page 25

Topology clusters

• Dataset clustering using similarity evaluation

Repeated poses

Long poses

Short poses

Transitions

Distance matrix {1 - SIM}

Page 26

Topology clusters

• Clustering of (repetitive) atomic actions

Page 27

Topology clusters

• Clustering of (repetitive) atomic actions


Page 28

Topology clusters

• Clustering of (repetitive) atomic actions

Page 29

Topology clusters

• Motion graph structure SIGGRAPH’02: [Arikan&Forsyth] [Kovar et al.] [Lee et al.]

• using statistics on cluster size and occurrence


Page 30

Topology clusters


SUMMARIZATION

• Motion graph structure SIGGRAPH’02: [Arikan&Forsyth] [Kovar et al.] [Lee et al.]

• using statistics on cluster size and occurrence

Page 31

Topology clusters


3D VIDEO SKIMMING

• Motion graph structure SIGGRAPH’02: [Arikan&Forsyth] [Kovar et al.] [Lee et al.]

• using statistics on cluster size and occurrence

Page 32

3D video skimming

Page 33

3D video annotation

• Add semantic information to each topology cluster

Page 34

3D video skimming and annotation

Topology dictionary for 3D video understanding

[Tung et al., CVPR’09] [Tung et al., PAMI’12]

(captions should be automatically displayed with this video)

Page 35

CG models as prior

Page 36

3D video annotation

Page 37

3D video annotation

Topology dictionary for 3D video understanding

[Tung et al., CVPR’09] [Tung et al., PAMI’12]

(captions should be automatically displayed with this video)

Page 38

Invariant Surface Descriptor for

3D Video Encoding [ACCV12] [TVC14]

Page 39

3D video encoding

• 3D video data size is big

– Several GB for a few minutes of a high-resolution sequence

• Impractical for data storage/management

• Impractical for data streaming over network

• Data structure inconsistency prevents existing compression approaches from being efficient

Page 40

3D video encoding

Approach: Geometry image technique (3D to 2D transform)

• Cut open 3D meshes and re-parameterize on plane

• Apply lossless compression (2D video)

See [Gu et al., SIGGRAPH’02]

for synthetic data

Page 41

3D video encoding

Approach: Geometry image technique (3D to 2D transform)

• Cut open 3D meshes and re-parameterize on plane

• Apply lossless compression (2D video)

Solution: stabilize the cuts for optimal encoding

Page 42

Page 43

3D video encoding

Possible scenarios:

1. Meshes are consistent (share same connectivity)

• Synthetic datasets

2. Meshes are inconsistent (different connectivity, resolution)

• Tracking & remeshing [Cagniart et al., ECCV’10]

• Point-to-point surface alignment

“Geodesic mapping”

[Tung et al., CVPR’10][Tung et al., PAMI’14]

Page 44

3D video encoding

Possible scenarios:

1. Meshes are consistent (share same connectivity)

• Synthetic datasets

2. Meshes are inconsistent (different connectivity, resolution)

• Tracking & remeshing [Cagniart et al., ECCV’10]

• Point-to-point surface alignment [Tung et al., CVPR’10] [Tung et al., PAMI’14]

• Geometrical data are inconsistent in time (e.g., raw 3D video)

– Adaptive bitrate streaming (where resolution can vary)

Deformation invariant surface descriptor

[Tung et al., ACCV’12] [Tung et al., Vis. Comp.’14]

Page 45

Invariant shape descriptor

• Define a surface-based shape descriptor

– Graph defined on object’s surface

– Nodes are geodesically consistent across time

• E.g., surface extremal points

[Tung et al., IJSM05] [Tung et al., PAMI12]

Page 46

Invariant shape descriptor

• Define a surface-based shape descriptor

– Graph defined on object’s surface

– Nodes are geodesically consistent across time

• E.g., surface extremal points

– Edges join the nodes

• Defined as paths on the surface

• Maintained geodesically consistent across time

– Using the previous position of the path (vertices)

– Using the shortest path between nodes

• Probabilistic framework (MAP-MRF) to handle

surface non-rigid deformations

Page 47

Page 48

Invariant shape descriptor

1. Invariant to surface deformation and parametrization

2. One-shot parametrization

3. Usable as cut graphs

Page 49

Invariant shape descriptor

Page 50

3D video encoding

Invariant surface-based descriptor for 3D video encoding

[Tung et al., ACCV’12] [Tung et al., Vis. Comp.’14]

Page 51

Dynamic Surface Alignment[CVPR10] [PAMI14]

Page 52

Point-to-point surface alignment

For:

– Shape matching

(retrieval, comparison)

– Motion tracking

– Texture transfer

– …

– 3D video encoding

– Surface dynamics

Page 53

Point-to-point surface alignment

Appearance-based

• color, corners, local features; e.g., see [Ahmed et al., CVPR08]

Geometry-based

• local geometry properties

• mapping/diffusion functions: spherical [Starck et al., ICCV05], embedding [Bronstein et al., TVCG07], multiple maps [Kim et al., SIGGRAPH11], spectral matching [Lombaert et al., PAMI13]

• patch deformation [Cagniart et al., ECCV10]

Have to deal with:

- Inconsistent colors from multiple views

- Poor texture (e.g., solid color clothing)

- Surface noise

Usual process:

1. Find landmark points

2. Refine (interpolate)

Page 54

Point-to-point surface alignment

Sought: a surface mapping between S1 and S2, with a metric on each surface, such that the mapping is a diffeomorphism

Have to deal with:

- Inconsistent colors from multiple views

- Poor texture (e.g., solid color clothing)

- Surface noise

Page 55

Geodesic mapping

1. Define landmark points using geometry-

based approach

2. Choose the landmark points with

minimum ambiguity (coarse-to-fine

strategy)

3. Refine by propagation

Have to deal with:

- Inconsistent colors from multiple views

- Poor texture (e.g., solid color clothing)

- Surface noise

See preliminary work in [Tung et al., CVPR’10]

Page 56

Geodesic mapping

Model:

• Define a smooth bijective map between two manifolds

(S1, g1) and (S2, g2)

• g1 and g2 are geodesic distances

Geodesic consistency of v1 ∈ S1 and v2 ∈ S2 :

• Assuming two sets of N points B1 = {b1,…,bN} ⊂ S1 and B2 = {b’1,…,b’N} ⊂ S2

• ∀i ∈ {1,…,N}, |g1(v1, bi) − g2(v2, b’i)| ≤ ε

The global geodesic distance measures distortion between surface points w.r.t. the N points.
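The consistency condition translates into one line of code. A sketch: the two lists hold precomputed geodesic distances from a candidate pair (v1, v2) to the N landmarks; real distances would come from e.g. Dijkstra on the mesh graph.

```python
def geodesically_consistent(g1_to_landmarks, g2_to_landmarks, eps):
    """Slide's condition: v1 and v2 correspond if for every landmark
    index i, |g1(v1, b_i) - g2(v2, b'_i)| <= eps.
    Inputs are the distances from v1 (resp. v2) to each landmark."""
    return all(abs(a - b) <= eps
               for a, b in zip(g1_to_landmarks, g2_to_landmarks))

# Distances to N = 3 landmarks on two surfaces:
ok = geodesically_consistent([1.0, 2.0, 0.5], [1.05, 1.9, 0.5], eps=0.2)
bad = geodesically_consistent([1.0, 2.0], [1.0, 3.0], eps=0.2)
```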

Page 57

Geodesic mapping

Overview:

Bt ⊂ St and Bt+1 ⊂ St+1 are surface extremal points (see [Tung et al., PAMI2012])

Surface extremal points are critical points of the geodesic integral function μ

Page 58

Geodesic mapping

Overview:

Bt ⊂ St and Bt+1 ⊂ St+1 are surface extremal points (see [Tung et al., PAMI2012])

Geodesic consistency condition can be broken

when surfaces undergo non-rigid deformations!

Page 59

Geodesic mapping

Overview:

Bt ⊂ St and Bt+1 ⊂ St+1 are surface extremal points (see [Tung et al., PAMI2012])

Ambiguity degree A(v ∈ S) for point localization:

measure of the number of points geodesically consistent with v w.r.t. B ⊂ S

Geodesic consistency condition can be broken

when surfaces undergo non-rigid deformations!

Page 60

Geodesic mapping

Recursive mapping:

• Recursively choose Ni points in regions

of low ambiguity w.r.t. N landmarks

• Find corresponding points using N’ ≤ N

(N’ = max number of isoline intersections)

• Set N = Ni

Page 61

Geodesic mapping

N = Ni

Page 62

Geodesic mapping

• Refinement by MRF optimization:

Labeling problem

Global geodesic distance D_N w.r.t. B^t = {b_i^t} and B^{t+1} = {b_i^{t+1}} :

T_p(l_p) : orientation of (p, l_p)

Page 63

Geodesic mapping

• Experimental results: point-to-point surface alignment between consecutive frames

Page 64

Geodesic mapping point-to-point surface alignment

[Cagniart et al., ECCV10] as ground truth [Spectral method] = [Lombaert et al., PAMI13]

Page 65

Geodesic mapping point-to-point surface alignment

[Cagniart et al., ECCV10] as ground truth [Spectral method] = [Lombaert et al., PAMI13]

[Misreconstruction]

Page 66

Geodesic mapping point-to-point surface alignment

• Quantitative evaluations

(compared with [Lombaert et al., PAMI’13] and [Kim et al., SIGGRAPH’11])

Page 67

Geodesic mapping

• Topology change

Regions where no topology change occurred are not affected

[Kim et al., SIGGRAPH’11]

Page 68

Geodesic mapping

• Applications

Page 69

Intrinsic Characterization of

Dynamic Surface[CVPR13] [CVPR14]

Page 70

Natural object dynamics modeling

• Natural scenes are complex but contain statistical regularities

– e.g., water, fire, human actions, etc.

• Dynamics modeling has been used for complex

scene segmentation and classification

– Dynamic textures

• Linear Dynamical Systems (distances, BoS)

[Doretto, IJCV02] [Chan, CVPR05] [Ravichandran, CVPR09]

– Dynamic facial events

• Timing structure of LDS

[Kawashima et al., 2007~2010]

Page 71

Real-world surface dynamics

•Real-world objects in motion exhibit local deformation statistics

•Observation of intrinsic geometry

[Tung et al., CVPR13]

Page 72

Real-world surface dynamics

Bouncing sequence

Shape index observation across time [Koenderink, Vis. Comp. ‘92]

•Real-world objects in motion exhibit local deformation statistics

•Observation of intrinsic geometry

Page 73

Real-world surface dynamics

Samba sequence

Shape index observation across time [Koenderink, Vis. Comp. ‘92]

•Real-world objects in motion exhibit local deformation statistics

•Observation of intrinsic geometry

Page 74

Intrinsic geometry

• Local topology descriptor (Koenderink shape index):

s = (2/π) · arctan((k2 + k1) / (k2 − k1)) ∈ [−1, 1], where k1, k2 are the principal curvatures (k1 ≤ k2)

The shape index varies continuously with respect to surface deformation.
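The descriptor is a one-liner from the principal curvatures. A sketch: this uses one common form of Koenderink's formula (sign conventions vary between papers), and leaves umbilical points (k1 = k2), where the index is undefined, unhandled.

```python
import math

def shape_index(k1, k2):
    """Koenderink shape index from principal curvatures (k1 <= k2):
    s = (2/pi) * arctan((k2 + k1) / (k2 - k1)),  s in [-1, 1].
    Undefined at umbilical points where k1 == k2."""
    if k1 > k2:
        k1, k2 = k2, k1
    if k1 == k2:
        raise ValueError("shape index undefined at umbilical points")
    return (2.0 / math.pi) * math.atan((k2 + k1) / (k2 - k1))

# A symmetric saddle (k = -1, +1) maps to s = 0; a cylinder-like
# patch (k = 0, 1) maps to s = 0.5.
saddle = shape_index(-1.0, 1.0)
ridge = shape_index(0.0, 1.0)
```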

Page 75

Intrinsic geometry

• The average shape index variance gives information on deformation location and relative magnitude

• However, it does not contain information about

acceleration patterns or timing structure

Shape index variance averaged over the sequence.

Page 76

Surface deformation dynamics

• After surface alignment, surface points can be

tracked across time

• Observation of temporal variations of shape

index at each surface point

• Characterization per surface patch

Page 77

Surface deformation dynamics

• After surface alignment, surface points can be

tracked across time

• Observation of temporal variations of shape

index at each surface point

• Characterization per surface patch

Free sequence

Page 78

Surface deformation dynamics

• Dynamics modeling using Hybrid Linear Dynamical

System [Kawashima et al., ICIAP’07]

– Hidden state variable with Markovian dynamics

• Continuous hidden state variable x(t)

• Noisy measurements y(t)

– Linear-Gaussian model

• Y = { y(t) } : observations

• X = { x(t) } : hidden states in continuous state space

• Fi : transition matrix that models the dynamics of Di

• H : observation matrix mapping hidden states to system output by linear

projection

• gi : bias vector, vi(t) : measurement noise, w(t) : observation noise

[Doretto, IJCV02] [Chan, CVPR05] [Ravichandran, CVPR09]
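One LDS of this model can be simulated in a few lines. A minimal sketch with made-up 2-D dimensions, not the paper's fitted model: the hidden state follows x(t+1) = F x(t) + g + v(t) and is observed through y(t) = H x(t) + w(t).

```python
import random

def simulate_lds(F, g, H, x0, steps, noise=0.0, seed=0):
    """Simulate one linear dynamical system D_i:
        x(t+1) = F x(t) + g + v(t)    (hidden state, Markovian)
        y(t)   = H x(t) + w(t)        (noisy linear observation)
    Matrices are lists of rows; v, w are iid Gaussian, std `noise`."""
    rng = random.Random(seed)

    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    x, ys = list(x0), []
    for _ in range(steps):
        ys.append([h + rng.gauss(0.0, noise) for h in matvec(H, x)])
        x = [f + gi + rng.gauss(0.0, noise)
             for f, gi in zip(matvec(F, x), g)]
    return ys

# A contracting 2-D system observed through its first coordinate:
F = [[0.9, 0.0], [0.0, 0.5]]
g = [0.0, 0.0]
H = [[1.0, 0.0]]
ys = simulate_lds(F, g, H, x0=[1.0, 1.0], steps=3)
```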

Page 79

Surface deformation dynamics

• Dynamics modeling using Hybrid Linear Dynamical

System [Kawashima et al., ICIAP’07]

– Model LDS state durations and transitions (i.e., timing

structure)

Page 80

Surface deformation dynamics

– Model state durations and transitions (i.e., timing structure)

Page 81

Bag-of-Systems

• Keypoint classification using bag-of-systems

– Bag-of-feature framework

– Codebook obtained by k-medoid clustering

• Codewords accounting for timing distribution

– Soft-weighting accounting for relative state duration

• Classification using SVM with RBF kernel

– Rigid/non-rigid regions

Page 82

Rigidity-based classification

- Collection of N = 4 LDS per patch

- K=8 codewords

- For each sequence: 25% for training, 75% for testing

(methods compared: [Saisan, ’01], [Ravichandran, CVPR’09], ours)

[Tung et al., CVPR’13]

Page 83

Timing-based local descriptor

I = {overlapping intervals}

[Tung et al., CVPR’14]

• Preserve local structure of surface such as deformation

patterns between neighbor patches

Page 84

Timing-based local descriptor

I = {overlapping intervals}

Yi , Yj : observed signals

[Tung et al., CVPR’14]

• Histogram of timing:

Page 85

Bag-of-Timing paradigm

• Timings of local surface element dynamics are the words of a codebook

– Sparse histogram of dynamic state timings

– Find codewords using k-medoids algorithm

– Soft-weighting of descriptors

• Classification (SVM)/ segmentation of

descriptors

– Different rigidity levels

Page 86
Page 87
Page 88
Page 89

Rigidity-based surface segmentation

Page 90
Page 91

Surface dynamics

Page 92

Rigidity-based surface segmentation

Page 93

Dynamic face

Page 94

3D face dataset

Page 95

Dynamic face

Page 96

Cardiac datasets

Page 97
Page 98

Summary

• 3D video is a markerless surface capture technique for full 3D objects in motion

• 3D video reconstruction state-of-the-art

– Silhouette and stereo fusion

• Topology dictionary for 3D video understanding

– Shape description using Reeb graphs

– Sequence encoding by feature vector clustering

– Probabilistic motion graph model

• Applications: skimming, summarization, annotation, content-based description/encoding.

Page 99

Summary

• Invariant surface-based descriptor

– Geometry video approach

– Deformation invariant surface cut graph

– Probabilistic formulation

– Applications: 3D video data compression for transfer,

storage.

Page 100

Summary

• Point-to-point surface alignment of 3D video

data

– Recursive geodesic mapping

– Ambiguity measure

– Competitive with state-of-the-art

– Accuracy remains to be improved when topology changes occur

– Other intrinsic maps could be used

Page 101

Summary

• Deformable surface dynamics modeling

– Intrinsic surface properties are tracked across time

– Dynamics modeled using a set of LDS with timing

structure information (using Hybrid LDS)

– Timing-based local descriptor

– Applications: rigidity classification, segmentation with

respect to rigidity levels

• Deformation learning using a generative model

Page 102

Multimodal Interaction Dynamics

in Group Discussion

using a Smart Digital Signage[ECCVW12] [HCI13] [THMS14]

[ECCVW 08] [IJNCR14]

Page 103

Human-human interaction

• Human-human interactions for ambient systems

supervising human communications

• Multimodal sensing and analysis of multiparty

interaction for high-level understanding of

human interactions

• Speaker diarization / Visual information processing

• Annotation of comprehension and interest level

• New indexing scheme of speech archives

• Interaction-oriented approach (reaction)

• Non-verbal information (backchannels, nodding, gaze)

Page 104

Related work

• VACE Multimodal meeting corpus [Chen et al., MLMI’06]

• 6 people (round table)

• 12 stereo camera pairs, 3D Vicon IR system, microphones

• AMI meeting corpus [2007]

• 6 cameras, 24 microphones, whiteboard

• IMADE room (poster) [Kawahara et al., Interspeech’08]

• 1 presenter, 2 listeners

• 6-8 multiview video cameras, motion capture (12 markers on body and head), eye-tracking system with accelerometer, microphone array (8-19 channels) and headset

Page 105

Related work

Video capture at IMADE room

Page 106

Why poster sessions?

• Norm in conferences and open labs

• Mixture of lecture and meeting characteristics

• One main speaker with a small audience

• Real-time feedback (backchannels by audience)

• Interactive

• The audience can ask questions or make comments at any time

• Controllable (knowledge/familiarity) and yet real

Page 107

Overview

1. Multimodal capture system

2. Audio and Visual information processing

3. Multimodal interaction dynamics modeling

4. Experimental validation

• Joint-attention estimation

Page 108

Portable multimodal system

• 65” plasma screen

• 19-channel mic array + amplifier

• 6 multiple view video cameras

• Vision camera (UXGA, 25 fps), synchronized & calibrated

• 1 PC with GPU

(Figure: 65” display (160 cm width), 200 cm, 30-40 cm, microphone array)

Demo at IEEE ICASSP’12

Page 109

Multimodal data processing

Page 110

Audio information processing

• Speaker diarization

• Audio segmentation

• Speaker turns

1. Speech enhancement

2. Two GMM models for classification (256 components):

• Speech

• Noise

3. Training by EM [Gomez et al., IEEE Trans. ASLP 2010]

Page 111

Video information processing

• Online head motion tracking (for nodding and turning)

1. Face detection [Viola & Jones, CVPR’01]

• Face feature detection (nose)

2. Depth from stereo

3. Feature tracking using probabilistic model (particle

filter) [ECCVW08] [IJNCR14]

• Likelihood updated with color histograms and depth info

• Cope with missing frames, partial occlusions

Page 112

Video information processing

System demo at IEEE ICASSP2012

Page 113

A/V interaction

• Input: temporal data (e.g., head positions)

• Speaker diarization

• Head motion of each subject

• Dynamics modeling using HDS [Kawashima et al.,

NIPSw’10]

• System of LDS

• Transitions using a Finite State Machine

• Timing structure analysis

(event classification, multimodal interaction modeling)

Page 114

Modeling using HDS

• Linear Dynamical System Di

• Y = { y(t) } : observations

• X = { x(t) } : hidden states in continuous state space

• Fi : transition matrix that models the dynamics of Di

• H : observation matrix that maps hidden states to

system output by linear projection

• gi : bias vector, vi(t) : meas. noise, w(t): obs. noise

Page 115

Modeling using HDS (cont’d)

• Hybrid LDS

1. N LDS Di

2. FSM with N states: S = { qi }

– (N and the LDS parameters are estimated using EM)

• Interval-based representation

• Interval: Ik = < qi , tj >

• Duration: tj = ek - bk

[Kawashima et al., NIPSw’10]
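The interval-based representation above is a run-length view of the decoded FSM state sequence: each maximal run of one state q becomes an interval with a begin frame, end frame, and duration. A sketch with hypothetical frame-indexed states:

```python
def to_intervals(states):
    """Convert a per-frame FSM state sequence into intervals
    I_k = (q, b_k, e_k): each maximal run of one state q becomes one
    interval; the slide's duration is then e_k - b_k."""
    intervals = []
    for t, q in enumerate(states):
        if intervals and intervals[-1][0] == q:
            q0, b, _ = intervals[-1]
            intervals[-1] = (q0, b, t)     # extend the current run
        else:
            intervals.append((q, t, t))    # open a new interval
    return intervals

iv = to_intervals(["q1", "q1", "q2", "q2", "q2", "q1"])
```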

Page 116

Interaction modeling

• Interaction level between multimodal signals

i.e., number of occurrences of synchronized events wrt time

• The distribution of temporal differences of two signals Yk

and Yk’ is modeled by:

Z(Yk, Yk’) = Pr({ bk − bk’ = b, ek − ek’ = e } | {(Ik, Ik’) : [bk, ek] ∩ [bk’, ek’] ≠ ∅})

(Z represents synchronization wrt reaction time)
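An empirical version of Z can be sketched as a normalized histogram of begin/end differences over overlapping interval pairs (the interval endpoints here are hypothetical frame indices):

```python
from collections import Counter

def sync_distribution(I1, I2):
    """Empirical Z(Y_k, Y_k'): normalized histogram of the begin/end
    time differences (b_k - b_k', e_k - e_k'), taken over all interval
    pairs whose time spans overlap.  Intervals are (b, e) tuples."""
    counts = Counter()
    for b1, e1 in I1:
        for b2, e2 in I2:
            if max(b1, b2) <= min(e1, e2):       # overlapping spans
                counts[(b1 - b2, e1 - e2)] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}

# Two signals whose events consistently lag by one frame:
Z = sync_distribution([(0, 4), (10, 14)], [(1, 5), (11, 15)])
```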

Page 117

Experimental results

• Two scenarios with digital signage:

• Poster presentation

• Casual discussion

• Speaker/Audience interaction characterization

• A/V processing

• Multimodal interaction dynamics modeling using 6

states

• Insight about joint-attention

Page 118

Poster presentation

3min

Page 119

Multimodal interaction modeling

• IHDS with 6 modes for head motion

• LDS clustering & parameter optimization by EM

• LDS timing structure and speaker turn synchronization

Head motion dynamics vs. speech turns

Page 120

Joint-attention characterization

• Reaction occurrences to A/V stimuli

Audio stimuli Visual stimuli

Page 121

Casual discussion

3min

Page 122

Multimodal interaction modeling

Head motion dynamics vs. speech turns

Synchronized state

distribution

Page 123

Joint-attention estimation

Audio stimuli Visual stimuli

Page 124

Summary

• Multimodal system with digital signage (smart

poster) for human-human interaction analysis

• Mic array & multiview video

• Poster presentations (1 presenter, 2-3 listeners)

• Multimodal data interaction

• Speaker diarization & dynamical system modeling

(IHDS)

• Joint-attention in group discussion

• Non-verbal events generate more non-verbal

reactions compared to audio events

Page 125

tonytung.org