
Page 1:

Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification

Presenter: Chunyuan Li

Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin

Duke University

June 30, 2016

Page 2:

Deep Neural Nets for Shape Representations

Shapes in the real world exhibit rich variability.

Goal: learning deep representations of shapes with DNNs. While SGD with backpropagation is the popular training approach, issues exist:

Overfitting: the network makes overly confident predictions.

Page 3:

Stochastic Gradient MCMC: Motivation

Weight Uncertainty of DNNs

! " ! "

! : Input Object " : Output Label : Hidden Units

: Weight with8a fixed value : Weight with a distribution

Posterior inference of weight distributions

Bring MCMC back to the CV community to tackle "big data":

– Traditional MCMC (including Gibbs sampling, HMC, MH, etc.) was popular in CV a decade ago, but is NOT scalable.

– Propose to use scalable MCMC to fill the gap.

Page 4:

Stochastic Gradient MCMC: Algorithm

Implementation
1. Training: adding noise to the parameter update
2. Testing: model averaging
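
A minimal sketch of both steps, assuming a NumPy parameter vector theta and a hypothetical helper stoch_grad_log_post(theta) that returns a minibatch estimate of the gradient of the log-posterior (the helper names are illustrative, not from the slides):

    import numpy as np

    def sgld_step(theta, stoch_grad_log_post, step_size, rng):
        """Training: one SGLD update, i.e. a gradient step plus injected Gaussian noise."""
        grad = stoch_grad_log_post(theta)                         # minibatch gradient estimate
        noise = rng.normal(0.0, np.sqrt(step_size), theta.shape)  # noise scale tied to the step size
        return theta + 0.5 * step_size * grad + noise

    def bayesian_predict(predict_fn, x, weight_samples):
        """Testing: model averaging over weight samples collected during training."""
        probs = [predict_fn(x, theta) for theta in weight_samples]
        return np.mean(probs, axis=0)                             # average predictive distribution

In practice, weight samples are typically collected after a burn-in period and thinned, so only a modest number of weight vectors need to be stored for the test-time average.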

SG-MCMC algorithms and their optimization counterparts

    Algorithms         SG-MCMC    Optimization
    Basic              SGLD       SGD
    Preconditioning    pSGLD*     RMSprop/Adagrad
    Momentum           SGHMC      momentum SGD
    Thermostat         SGNHT      Santa†

[*] Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks, Li et al., AAAI 2016

[†] Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization, Chen et al., AISTATS 2016
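
As an illustration of the table's second row, a rough sketch of a pSGLD-style update under the same assumptions as the SGLD sketch above: it maintains an RMSprop-style running average of squared gradients as a diagonal preconditioner and injects preconditioned noise (the small correction term of the full algorithm in Li et al., AAAI 2016 is omitted here):

    def psgld_step(theta, v, stoch_grad_log_post, step_size, rng, alpha=0.99, lam=1e-5):
        """One preconditioned SGLD update (RMSprop-style preconditioner, correction term dropped)."""
        grad = stoch_grad_log_post(theta)
        v = alpha * v + (1.0 - alpha) * grad * grad        # running second-moment estimate
        precond = 1.0 / (lam + np.sqrt(v))                 # diagonal preconditioner
        noise = rng.normal(0.0, 1.0, theta.shape) * np.sqrt(step_size * precond)
        return theta + 0.5 * step_size * precond * grad + noise, v

Dropping the noise term recovers an RMSprop/Adagrad-style optimizer, which is exactly the correspondence the table summarizes.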

Page 5:

Interpretation of Dropout and Batch Normalization

Dropout/DropConnect and SGLD share the same form of update rule; the only difference is the level of injected noise (see the sketch after this list).

The integration of binary Dropout with SG-MCMC can be viewed as learning weight uncertainty of mixtures of neural networks.

Batch Normalization can accelerate SG-MCMC training. It helps prevent the sampler from getting stuck in the saturated regimes of nonlinearities.
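
To make the first bullet concrete (a schematic simplification, not the paper's exact derivation): both methods can be written as an SGD-style step with injected noise, differing only in the kind and level of that noise. SGLD adds Gaussian noise to the weight update (see sgld_step above), while DropConnect multiplies the weights by binary noise before the gradient is computed:

    def dropconnect_sgd_step(theta, stoch_grad_log_post, step_size, rng, keep_prob=0.5):
        """SGD with DropConnect: multiplicative binary noise on the weights."""
        mask = (rng.random(theta.shape) < keep_prob).astype(theta.dtype)
        grad = stoch_grad_log_post(theta * mask)           # gradient through the masked weights
        return theta + step_size * grad * mask             # only unmasked weights receive an update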

Page 6:

Results: Applications to Shape Classification

A variety of 2D and 3D datasets, including SHREC and ShapeNet, among others.

Empirical observations
The use of Bayesian learning (SG-MCMC or Dropout) slows down training initially. This is likely due to the higher uncertainty imposed during learning, resulting in more exploration of the parameter space.

Increased uncertainty, however, prevents overfitting and eventually results in improved performance.