new geometric disentanglement for generative latent shape modelstaumen/pubs/iccv-2019-poster.pdf ·...

Geometric Disentanglement for Generative Latent Shape Models Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson University of Toronto Vector Institute for AI Samsung AI Center, Toronto Factorizing pose & shape Background: Intrinsic Geometry Geometrically Disentangled VAE: Model Architecture Results: Pose-Aware Retrieval References • Laplace-Beltrami Operator (LBO): Δ g • (ℎ) ≈ መ : - Captures intrinsic geometry (vector signature). - Isometry invariant (e.g., ~ignores articulation). [1]: Achlioptas et al, Learning Representations and Generative Models for 3D Point Clouds, ICLR, 2018. [2]: Esmaeili et al, Structured Disentangled Representations, AISTATS, 2019. Hierarchically Factorized (HF) VAE loss [2]: Penalty on Jacobian of each latent group with respect to another: VAE Loss Function • SMAL and SMPL have separate shape and pose parameters, so we can compute separate retrieval errors for each ( and ). • Ideally, should have high and low ; should have high and low . • Comparisons: full AE and VAE latent vectors. Errors SMAL 0.641 0.743 0.975 0.645 0.938 0.983 0.983 0.993 SMPL 0.856 0.922 0.997 0.928 0.577 0.726 0.709 0.947 Generation/Sampling Autoencoding VAE Operations Failure Modes Datasets: point clouds derived from MNIST, DYNA, SMAL, and SMPL. Interpolations: visualizing movement in and independently largely shows the latter controls intrinsic shape, while the former controls pose. Latent digit subspace traversal can roughly separate shape and style. Each row traverses the marked set. Results: Latent Manipulations Latent interpolations between blue-coloured shapes (SMAL & SMPL). Note: upper-right and lower-left shapes are latent pose/deformation transfers. Vertical: movement in ; horizontal: movement in . • Enables vision, graphics, & robotics tasks (e.g., pose-invariant recognition, manipulation, constrained inference, retrieval, pose transfer). • Spectral methods also factorize pose and shape by separating extrinsic from intrinsic geometry. Goal : disentangle extrinsic pose and intrinsic shape in the latent representation of a 3D point cloud, without annotations. Changing intrinsic shape Metric-altering deformations; they often change the object class/identity. Changing extrinsic pose Often intra-class deformations: rigid transforms, articulation, style-like. Observed trade-off between reconstruction fidelity, prior matching, and disentanglement. Query Retrieval with Retrieval with Predicted Spectrum Rotation VAE Space , , =: AE Space (PointNet) Input: Point cloud Output: Reconstructed point cloud Similar body shape, different pose ≈ Close in መ space Unintuitive intermediates Incomplete disentanglement Encoding failure • Model: two-level VAE (as in [1]). • Pretrain AE space independently. • Disentanglement with constricted dim() forces pose into . Encode into Vary rigid pose Vary non- rigid pose Vary intrinsic shape

Upload: others

Post on 21-Oct-2020

2 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Geometric Disentanglement for Generative Latent Shape Models

Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson

University of Toronto Vector Institute for AI Samsung AI Center, Toronto

Factorizing pose & shape

Background: Intrinsic Geometry

Geometrically Disentangled VAE: Model Architecture Results: Pose-Aware Retrieval

References

• Laplace-Beltrami Operator (LBO): Δg• 𝐿𝐵𝑂𝑆𝑝𝑒𝑐𝑡𝑟𝑢𝑚(𝑠ℎ𝑎𝑝𝑒) ≈ መ𝜆:

- Captures intrinsic geometry (vector signature).

- Isometry invariant (e.g., ~ignores articulation).

[1]: Achlioptas et al, Learning Representations and Generative Models for

3D Point Clouds, ICLR, 2018.

[2]: Esmaeili et al, Structured Disentangled Representations, AISTATS, 2019.

Hierarchically Factorized (HF) VAE loss [2]:

Penalty on Jacobian of each latent

group with respect to another:

VAE Loss

Function

• SMAL and SMPL have separate shape 𝛽 and

pose 𝜃 parameters, so we can compute separate

retrieval errors for each (𝐸𝛽 and 𝐸𝜃).

• Ideally, 𝑧𝐼 should have high 𝐸𝜃 and low 𝐸𝛽; 𝑧𝐸should have high 𝐸𝛽 and low 𝐸𝜃.

• Comparisons: full AE 𝑋 and VAE 𝑧 latent vectors.

Errors 𝑋 𝑧 𝑧𝐸 𝑧𝐼

SMAL𝐸𝛽 0.641 0.743 0.975 0.645

𝐸𝜃 0.938 0.983 0.983 0.993

SMPL𝐸𝛽 0.856 0.922 0.997 0.928

𝐸𝜃 0.577 0.726 0.709 0.947

Generation/Sampling

Autoencoding

VAE Operations

Failure Modes

Datasets: point clouds derived from MNIST, DYNA, SMAL, and SMPL.

Interpolations: visualizing movement in 𝑧𝐸 and 𝑧𝐼 independently largely

shows the latter controls intrinsic shape, while the former controls pose.

Latent digit subspace traversal can

roughly separate shape and style.

Each row traverses the marked set.

𝑧

𝑧𝐼

𝑧𝐸

𝑧

𝑧𝐼𝑧𝐸

Results: Latent Manipulations

Latent interpolations between blue-coloured shapes (SMAL & SMPL). Note: upper-right and lower-left

shapes are latent pose/deformation transfers. Vertical: movement in 𝑧𝐸 ; horizontal: movement in 𝑧𝐼.

𝑧𝐼

𝑧𝐸

• Enables vision, graphics, & robotics tasks (e.g.,

pose-invariant recognition, manipulation,

constrained inference, retrieval, pose transfer).

• Spectral methods also factorize pose and shape

by separating extrinsic from intrinsic geometry.

Goal: disentangle extrinsic pose and

intrinsic shape in the latent representation

of a 3D point cloud, without annotations.

Changing intrinsic shapeMetric-altering deformations; they

often change the object class/identity.

Changing extrinsic poseOften intra-class deformations: rigid

transforms, articulation, style-like.

Observed trade-off between reconstruction fidelity,

prior matching, and disentanglement.

Query Retrieval with 𝑧𝐸 Retrieval with 𝑧𝐼

Predicted

Spectrum

Rotation

VAE Space

𝑧𝑅, 𝑧𝐸 , 𝑧𝐼 =: 𝑧

AE Space

(PointNet)

Input: Point cloud 𝑃

Output: Reconstructed point cloud 𝑃

Similar body shape, different pose ≈ Close in መ𝜆 space

Unintuitive

intermediates

Incomplete

disentanglement

Encoding

failure

• Model: two-level VAE (as in [1]).

• Pretrain AE space independently.

• Disentanglement with constricted

dim(𝑧) forces pose into 𝑧𝐸.

Encode

into 𝑧𝑧𝐸

𝑧𝐼

𝑧𝑅

Vary rigid

pose

Vary non-

rigid pose

Vary intrinsic

shape