deep generative learning_icml_part1

Bayesian Posterior Inference in the Big Data Arena Max Welling Anoop Korattikara

Uploaded by: scyfer

Posted on 21-Jun-2015


DESCRIPTION

Max Welling Scyfer workshop at ICML

TRANSCRIPT

1. Bayesian Posterior Inference in the Big Data Arena (Max Welling & Anoop Korattikara)

2. Outline
- Introduction
- Stochastic Variational Inference
  - Variational Inference 101
  - Stochastic Variational Inference
  - Deep Generative Models with SVB
- MCMC with mini-batches
  - MCMC 101
  - MCMC using noisy gradients
  - MCMC using noisy Metropolis-Hastings
  - Theoretical results
- Conclusion

3. Big Data (mine is bigger than yours)
The Square Kilometer Array (SKA) will produce 1 exabyte per day by 2024. (If you are interested in doing approximate inference on this data, talk to me.)

4. Introduction
Why do we need posterior inference if the datasets are BIG?

5. p >> N
Big data may mean large p, small N:
- gene expression data
- fMRI data

6. Planning
Planning against uncertainty needs probabilities.

7. Little Data inside Big Data
Not every data case carries information about every model component. Example: a new user with no ratings (the cold-start problem).

8. Big Models!
- 1943: first neural network (+/- N = 10)
- 1988: NetTalk (+/- N = 20K)
- 2009: Hinton's Deep Belief Net (+/- N = 10M)
- 2013: Google/Y! (+/- N = 10B)
Models grow faster than the useful information in the data.

9. Two Ingredients for Big Data Bayes
Any big-data posterior inference algorithm should:
1. run easily on a distributed architecture;
2. use only a small mini-batch of the data at every iteration.

10. Bayesian Posterior Inference
Variational inference (searches within a variational family Q) vs. sampling (covers all probability distributions):
- deterministic vs. stochastic (sample error)
- biased vs. unbiased
- local minima vs. hard to mix between modes
- easy vs. hard to assess convergence

11. Variational Bayes
Hinton & van Camp (1993); Neal & Hinton (1999); Saul & Jordan (1996); Saul, Jaakkola & Jordan (1996); Attias (1999, 2000); Wiegerinck (2000); Ghahramani & Beal (2000, 2001).
Coordinate descent on Q. (Figure showing P and Q, from Bishop, Pattern Recognition and Machine Learning.)

12. Stochastic VB
Hoffman, Blei & Bach (2010). Stochastic natural gradient descent on Q, with P and Q in the exponential family and Q factorized. At every iteration: subsample n
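To make the mini-batch idea of slides 9 and 12 concrete, here is a minimal sketch of stochastic variational inference on a toy conjugate model: inferring a Gaussian mean with known noise variance. The model, the mean-field Gaussian q(theta) = N(m, s^2), and the use of plain rather than natural gradients are illustrative assumptions, not the exact algorithm from the slides.

```python
import random

random.seed(0)

# Toy model: x_i ~ N(theta, 1) with prior theta ~ N(0, 1).
N = 1000
theta_true = 2.0
data = [random.gauss(theta_true, 1.0) for _ in range(N)]

# Variational family q(theta) = N(m, s^2); optimize m and s by
# stochastic gradient ascent on the ELBO, one mini-batch per step.
m, s = 0.0, 1.0
n = 20       # mini-batch size
lr = 1e-4    # step size

for _ in range(5000):
    batch = random.sample(data, n)
    # Unbiased mini-batch estimate of dELBO/dm: rescale the
    # likelihood term by N/n, keep the prior term exact.
    grad_m = (N / n) * sum(x - m for x in batch) - m
    # dELBO/ds for this model is data-free: -(N+1)*s + 1/s.
    grad_s = -(N + 1) * s + 1.0 / s
    m += lr * grad_m
    s += lr * grad_s

# Conjugacy gives the exact posterior N(sum(x)/(N+1), 1/(N+1))
# to compare against.
post_mean = sum(data) / (N + 1)
post_std = (1.0 / (N + 1)) ** 0.5
print(m, post_mean)
print(s, post_std)
```

Because the model is conjugate, the exact posterior is available as a check: the stochastic estimates m and s should hover near the true posterior mean and standard deviation. Rescaling the mini-batch likelihood gradient by N/n keeps the gradient estimate unbiased, which is exactly the second ingredient from slide 9.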