Bayesian Robust Principal Component Analysis
Presenter: Raghu Ranganathan
ECE / CMR
Tennessee Technological University
January 21, 2011
Reading Group
(Xinghao Ding, Lihan He, and Lawrence Carin)
Paper contribution
■ The problem of matrix decomposition into low-rank and sparse components is considered employing a hierarchical approach
■ The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics
■ The Bayesian framework approximately infers the noise statistics in addition to the low-rank and sparse outlier contributions
■ The model proposed is robust to a broad range of noise levels without having to change the hyper-parameter settings
■ In addition, a Markov dependency between successive rows of the matrix is inferred by the Bayesian model to exploit additional structure in the observed matrix, particularly, in video applications
Introduction
■ Most high-dimensional data such as images, biological data, and social network data (Netflix data) reside in a low-dimensional subspace or low-dimensional manifold
Noise models
■ In low-rank matrix representations, two types of noise models are usually considered
■ One causes small scale perturbation to all the matrix elements, e.g. i.i.d. Gaussian noise added to each element.
■ In this case, if the noise energy is small compared to the dominant singular values of the SVD, it does not significantly affect the principal vectors
■ The second is sparse noise of arbitrary magnitude that impacts only a small subset of matrix elements; for example, a moving object against a static background in video manifests as such sparse noise
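The contrast between the two noise types can be illustrated with a small NumPy sketch (dimensions, noise levels, and seed are invented for illustration): small dense Gaussian noise barely rotates the leading principal direction, while a few large-magnitude outliers can move it substantially.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-2 matrix built from an outer product (sizes are arbitrary).
m, n, r = 60, 40, 2
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

def top_left_vector(M):
    # Leading left singular vector of M.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, 0]

u_clean = top_left_vector(L)

# Type 1: small i.i.d. Gaussian noise on every entry.
dense = L + 0.01 * rng.standard_normal((m, n))
# Type 2: sparse noise of large magnitude on a few entries.
sparse = L.copy()
sparse.flat[rng.choice(m * n, size=10, replace=False)] += 100.0

# |cosine| between clean and perturbed principal vectors (1 = unchanged).
align_dense = abs(u_clean @ top_left_vector(dense))
align_sparse = abs(u_clean @ top_left_vector(sparse))
print(align_dense, align_sparse)
```

The absolute value is taken because singular vectors are defined only up to sign.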
Convex optimization approach
Bayesian approach
■ The observation matrix is considered to be of the form Y = L (low-rank) + S (sparse) + E (noise), with both a sparse noise term S and a dense noise term E present
■ In the proposed Bayesian model, the noise statistics of E are approximately learned, along with S and L
■ The proposed model is robust to a broad range of noise variances
■ The Bayesian model infers approximation to the posterior distributions on the model parameters, and obtains approximate probability distributions for L, S, and E
■ The advantage of the Bayesian model is that prior knowledge is employed in the inference
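A minimal synthetic instance of the assumed observation model Y = L + S + E can make the three components concrete (sizes, rank, outlier fraction, and magnitudes below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic Y = L + S + E: low-rank L, sparse S, dense Gaussian E.
m, n, r = 100, 50, 3
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))    # low-rank

S = np.zeros((m, n))
support = rng.choice(m * n, size=int(0.05 * m * n), replace=False)
S.flat[support] = 10.0 * rng.standard_normal(support.size)       # 5% outliers

E = 0.1 * rng.standard_normal((m, n))                            # dense noise
Y = L + S + E

rank_L = np.linalg.matrix_rank(L)
nnz_S = int(np.count_nonzero(S))
print(rank_L, nnz_S)
```

Recovering L and S from Y alone is exactly the decomposition problem the paper addresses.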
Bayesian approach
■ The Bayesian framework exploits the anticipated structure in the sparse component.
■ In video analysis, it is desired to separate the spatially localized moving objects (sparse component) from the static or quasi-static background (low-rank component) in the presence of frame-dependent additive noise E
■ The correlation between the sparse components of the video from frame to frame (column to column in the matrix) has to be considered
■ In this paper, a Markov dependency in time and space is assumed between the sparse components of consecutive matrix columns
■ This structure is incorporated into the Bayesian framework, with the Markov parameters inferred through the observed matrix
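The kind of temporal persistence this Markov structure exploits can be sketched with a two-state chain on a single pixel's foreground indicator (the transition probabilities here are invented, not values inferred by the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-state Markov chain for one pixel's indicator z_t
# (1 = pixel belongs to the sparse/moving component in frame t).
p01, p11 = 0.02, 0.95   # P(on | previously off), P(on | previously on)
T = 500

z = np.zeros(T, dtype=int)
for t in range(1, T):
    on_prob = p11 if z[t - 1] == 1 else p01
    z[t] = int(rng.random() < on_prob)

# Fraction of steps where the state persists: high, reflecting that a
# foreground pixel tends to stay foreground across consecutive frames.
persistence = float(np.mean(z[1:] == z[:-1]))
print(persistence)
```

An i.i.d. sparsity prior with the same marginal activity would scatter the active entries independently across frames, losing exactly this persistence.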
Bayesian Robust PCA
■ The work in this paper is closely related to the low-rank matrix completion problem where we try to approximate a matrix (with noisy entries) by a low-rank matrix and to predict the missing entries
■ The matrix Y = L + S + E is missing random entries; the proposed model can make estimates for the missing entries (in terms of the low-rank term L)
■ The S term is defined as a sparse set of matrix entries; the location of S must be inferred while estimating the values of L, S, and E
■ Typically, in Bayesian inference, a sparseness promoting prior is imposed on the desired signal, and the posterior distribution of the sparse signal is inferred.
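One such sparseness-promoting construction, used in this model family, is the beta-Bernoulli prior. The toy draw below (hyperparameters a, b chosen arbitrarily so the prior expects roughly 1% activity) shows why it yields exactly zero entries, unlike a Laplacian prior, which only concentrates mass near zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy beta-Bernoulli draw: s_i = b_i * w_i with b_i ~ Bernoulli(pi),
# pi ~ Beta(a, b), w_i ~ N(0, 1).  Entries with b_i = 0 are exactly zero.
n = 10_000
a, b = 1.0, 99.0                     # prior mean activity a/(a+b) = 1%
pi = rng.beta(a, b)
mask = rng.random(n) < pi            # Bernoulli indicators b_i
s = mask * rng.standard_normal(n)    # Gaussian amplitudes on the active set

frac_exact_zero = float(np.mean(s == 0.0))
print(pi, frac_exact_zero)
```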
Bayesian Low-rank and Sparse Model
C. Noise component
■ The measurement noise is drawn i.i.d. from a Gaussian distribution, and the noise affects all measurements
■ The noise variance is assumed unknown and is learned within the model inference; on the slide, the noise is modeled as zero-mean Gaussian with a gamma prior placed on its precision
■ The model can learn different noise variances for different parts of E, i.e., each column of Y (each frame) in general has its own noise level; the noise structure is modified accordingly, with a separate precision per column
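A quick moment-based illustration (not the paper's inference procedure, which places priors on the precisions) of why per-column noise levels are identifiable when each frame has its own variance:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy non-stationary noise: each column (frame) of E has its own standard
# deviation, E[:, j] ~ N(0, sigma_j^2 I).  Sizes and levels are made up.
n_rows, n_cols = 2000, 5
true_sigma = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
E = rng.standard_normal((n_rows, n_cols)) * true_sigma  # per-column scaling

# A simple per-column moment estimate recovers each frame's noise level.
est_sigma = E.std(axis=0)
print(est_sigma)
```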
Relation to the optimization based approach
■ In the Bayesian model, the noise variance need not be known a priori; the model learns the noise statistics during inference
■ For the low-rank component, instead of the nuclear-norm constraint (an ℓ1 penalty imposing sparseness on the singular values), the Gaussian prior together with the beta-Bernoulli distribution is used (a Gaussian prior corresponds to an ℓ2-type penalty)
■ For the sparse component, instead of the ℓ1 constraint, the Gaussian prior together with the beta-Bernoulli distribution is employed to enforce sparsity
■ Compared to the Laplacian prior (which gives many small entries close to 0), the beta-Bernoulli prior yields exactly zero values
■ In Bayesian learning, numerical methods are used to estimate the distributions of the unknown parameters, whereas the optimization-based approach seeks the minimum of a function similar to the negative log-posterior −log p(Y | ·, H)
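For reference, the convex counterpart (Principal Component Pursuit) scores a split Y = L + S by the objective ‖L‖* + λ‖S‖1, the nuclear norm acting as an ℓ1 penalty on singular values. The sketch below (sizes and spike pattern invented; λ = 1/√max(m,n) as in the PCP literature) just evaluates this objective and checks that the true split beats the naive one that lumps everything into L:

```python
import numpy as np

rng = np.random.default_rng(4)

def pcp_objective(L, S, lam):
    # ||L||_* (sum of singular values) + lam * ||S||_1
    return np.linalg.svd(L, compute_uv=False).sum() + lam * np.abs(S).sum()

# Synthetic instance: rank-2 L0 plus 5 large spikes in distinct rows/cols.
m, n, r = 30, 20, 2
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
S0 = np.zeros((m, n))
for k in range(5):
    S0[k, k] = 100.0
Y = L0 + S0

lam = 1.0 / np.sqrt(max(m, n))       # weight commonly used for PCP
true_split = pcp_objective(L0, S0, lam)
naive_split = pcp_objective(Y, np.zeros_like(Y), lam)  # L = Y, S = 0
print(true_split, naive_split)
```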
Markov dependency of Sparse Term in Time and Space
Posterior inference
Experimental results
B. Video example
■ The application of video surveillance with a fixed camera is considered
■ The objective is to reconstruct a near static background and moving foreground from a video sequence
■ The data are organized such that column m of Y is constructed by concatenating all pixels of frame m from a grayscale video sequence
■ The background is modeled as the low-rank component, and the moving foreground as the sparse component.
■ The rank r is usually small for a static background, and the sparse components across frames (columns of Y) are strongly correlated, modeled by a Markov dependency
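The described data layout can be sketched directly (frame size and count below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical frame stack: T grayscale frames of size h x w.
h, w, T = 48, 64, 10
video = rng.random((T, h, w))

# Column m of Y concatenates all pixels of frame m.
Y = video.reshape(T, h * w).T
assert Y.shape == (h * w, T)

# Round trip: column 3 reshapes back exactly to frame 3.
frame3 = Y[:, 3].reshape(h, w)
roundtrip_ok = bool(np.array_equal(frame3, video[3]))
print(roundtrip_ok)
```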
Conclusions
■ The authors have developed a new robust Bayesian PCA framework for analysis of matrices with sparsely distributed noise of arbitrary magnitude
■ The Bayesian approach is found to be robust to densely distributed noise, and the noise statistics may be inferred based on the data, with no tuning of hyperparameters
■ In addition, using the Markov property, the model allows the noise statistics to vary from frame to frame
■ Future research directions involve a moving camera, for which the background would reside in a low-dimensional manifold rather than a low-dimensional linear subspace
■ The Bayesian framework may be extended to infer the properties of the low-dimensional manifold