

MERL - A MITSUBISHI ELECTRIC RESEARCH LABORATORY

    http://www.merl.com

Incremental singular value decomposition of uncertain data with missing values

    Matthew Brand

    TR-2002-24 May 2002

    Abstract

We introduce an incremental singular value decomposition (SVD) of incomplete data. The SVD is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete data does not uniquely specify an SVD, the procedure selects one having minimal rank. For a dense p×q matrix of low rank r, the incremental method has time complexity O(pqr) and space complexity O((p+q)r), better than highly optimized batch algorithms such as MATLAB's svd(). In cases of missing data, it produces factorings of lower rank and residual than batch SVD algorithms applied to standard missing-data imputations. We show applications in computer vision and audio feature extraction. In computer vision, we use the incremental SVD to develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle occlusions/missing points in structure-from-motion factorizations.

    First circulated spring 2001.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Information Technology Center America; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Information Technology Center America. All rights reserved.

Copyright © Mitsubishi Electric Information Technology Center America, 2002

    201 Broadway, Cambridge, Massachusetts 02139


Proceedings of the 2002 European Conference on Computer Vision (ECCV 2002, Copenhagen), Springer Lecture Notes in Computer Science, volume 2350.


Incremental singular value decomposition of uncertain data with missing values

    Matthew Brand

Mitsubishi Electric Research Labs, 201 Broadway, Cambridge, MA 02139, USA

    To appear, Proc. European Conference on Computer Vision (ECCV), May 2002.

Abstract. We introduce an incremental singular value decomposition (SVD) of incomplete data. The SVD is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete data does not uniquely specify an SVD, the procedure selects one having minimal rank. For a dense p×q matrix of low rank r, the incremental method has time complexity O(pqr) and space complexity O((p+q)r), better than highly optimized batch algorithms such as MATLAB's svd(). In cases of missing data, it produces factorings of lower rank and residual than batch SVD algorithms applied to standard missing-data imputations. We show applications in computer vision and audio feature extraction. In computer vision, we use the incremental SVD to develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle occlusions/missing points in structure-from-motion factorizations.

    1 Introduction

Many natural phenomena can be faithfully modeled with multilinear functions, or closely approximated as such. Examples include the combination of lighting and pose [20] and shape and motion [12,3] in image formation, mixing of sources in acoustic recordings [6], and word associations in collections of documents [1,23]. Multilinearity means that a matrix of such a phenomenon's measured effects can be factored into low-rank matrices of (presumed) causes. The celebrated singular value decomposition (SVD) [8] provides a bilinear factoring of a data matrix M,

$$U_{p\times r}\,\mathrm{diag}(s_{r\times 1})\,V^{\top}_{r\times q} \xleftarrow{\ \mathrm{SVD}_r\ } M_{p\times q}, \qquad r \le \min(p,q) \tag{1}$$

where U and V are unitary orthogonal matrices whose columns give a linear basis for M's columns and rows, respectively. For low-rank phenomena, $r_{\mathrm{true}} \ll \min(p,q)$, implying a parsimonious explanation of the data. Since $r_{\mathrm{true}}$ is often unknown, it is common to wastefully compute a large $r_{\mathrm{approx}} \gg r_{\mathrm{true}}$ SVD and estimate an appropriate smaller value $r_{\mathrm{empirical}}$ from the distribution of singular values in s. All but the $r_{\mathrm{empirical}}$ largest singular values in s are then zeroed to give a thin truncated SVD that closely approximates the data. This forms the basis of a broad range of algorithms for data analysis, dimensionality reduction, compression, noise-suppression, and extrapolation.
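As a concrete illustration of this recipe, the following sketch (in Python with NumPy; not from the paper, and the rank-selection threshold is an illustrative assumption) computes a batch SVD, estimates $r_{\mathrm{empirical}}$ from the spectrum, and zeroes the remaining singular values:

    import numpy as np

    # Synthetic rank-5 data: the product of two thin random factors.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))

    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Estimate r_empirical from the singular value distribution;
    # the relative threshold below is an illustrative choice.
    r = int(np.sum(s > 1e-8 * s[0]))

    # Thin truncated SVD: keep only the r largest singular values.
    M_approx = (U[:, :r] * s[:r]) @ Vt[:r, :]
    print(r, np.linalg.norm(M - M_approx))  # r == 5, residual near zero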

The SVD is usually computed by a batch $O(pq^2 + p^2q + q^3)$-time algorithm [8], meaning that all the data must be processed at once, so SVDs of very large datasets are essentially infeasible. Lanczos methods yield thin SVDs in $O(pqr^2)$ time [8], but $r_{\mathrm{true}}$ should be known in advance since Lanczos methods are known to be inaccurate for the smaller singular values [1]. A more pressing problem is that the SVD requires complete data, whereas in many experimental settings some parts of the measurement matrix may be missing, contaminated, or otherwise untrusted. Consequently, a single missing value forces the modeler to discard an entire row or column of the data matrix prior to the SVD. The missing value may be imputed from neighboring values, but such imputations typically mislead the SVD away from the most parsimonious (low-rank) decompositions.


We consider how an SVD may be updated by adding rows and/or columns of data, which may be missing values and/or contaminated with correlated (colored) noise. The size of the data matrix need not be known: the SVD is developed as the data comes in and handles missing values in a manner that minimizes rank. The resulting algorithms have better time and space complexity than full-data batch SVD methods and can produce more informative results (more parsimonious factorings of incomplete data). In the case of dense low-rank matrices, the time complexity is linear in the size and the rank of the data, O(pqr), while the space complexity is sublinear, O((p+q)r).

    2 Related work

SVD updating has a literature spread over three decades [5,4,1,10,7,23] and is generally based on Lanczos methods, symmetric eigenvalue perturbations, or identities similar to equation 2 below. Zha and Simon [23] use such an identity, but their update is approximate and requires a dense SVD. Chandrasekaran et alia [7] begin similarly, but their update is limited to single vectors and is vulnerable to loss of orthogonality. Levy and Lindenbaum [14] exploit the relationship between the QR-decomposition and the SVD to incrementally compute the left singular vectors in $O(pqr^2)$ time; if p, q, and r are known in advance and $p \ge q \gg r$, then the expected complexity falls to O(pqr). However, this is also vulnerable to loss of orthogonality, and results have only been reported for matrices having a few hundred columns.

None of this literature contemplates missing or uncertain values, except insofar as they can be treated as zeros (e.g., [1]), which is arguably incorrect. In batch-SVD contexts, missing values are usually handled via subspace imputation, using an expectation-maximization-like procedure: perform an SVD of all complete columns, regress incomplete columns against the SVD to estimate missing values, then re-factor and re-impute the completed data until a fixpoint is reached (e.g., [21]). This is extremely slow (quartic time) and only works if very few values are missing. It has the further demerit that the imputation does not minimize effective rank. Other heuristics simply fill missing values with row- or column-means [19].
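A minimal sketch of that subspace-imputation loop (Python/NumPy; the function name, the fixed rank r, and the convergence test are illustrative assumptions, not the exact procedure of [21]):

    import numpy as np

    def svd_impute(M, mask, r, iters=100, tol=1e-8):
        # EM-like subspace imputation: fill missing entries, factor at
        # rank r, re-impute from the rank-r reconstruction, and repeat
        # until a fixpoint. mask is True where M is observed.
        X = np.where(mask, M, 0.0)       # initial fill (means also common)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]
            X_new = np.where(mask, M, X_hat)   # keep observed entries
            if np.linalg.norm(X_new - X) < tol:
                return X_new
            X = X_new
        return X

Note that every pass costs a full batch SVD, which is the source of the slowness criticized above.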

In the special case where a matrix M is nearly dense, its normalized scatter matrix $\Sigma$, with $\Sigma_{m,n} \doteq \sum_i M_{i,m} M_{i,n}$, may be fully dense due to fill-in. In that case $\Sigma$'s eigenvectors are M's right singular vectors [13]. However, this method does not lead to the left singular vectors, and it often doesn't work at all because $\Sigma$ is frequently incomplete as well, with undefined eigenvectors.
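For a fully observed M the equivalence is easy to verify numerically (a Python/NumPy sketch for the dense case only, without the per-entry normalization an incomplete matrix would require):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((200, 12))

    # Scatter matrix of M's columns: Sigma = M^T M = V diag(s)^2 V^T.
    Sigma = M.T @ M
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues

    _, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Eigenvectors match right singular vectors up to sign, and
    # eigenvalues are squared singular values.
    cosines = np.abs(np.sum(eigvecs[:, ::-1] * Vt.T, axis=0))
    print(np.allclose(cosines, 1.0),
          np.allclose(np.sqrt(eigvals[::-1]), s))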

    3 Updating an SVD

We begin with an existing rank-r SVD as in equation 1. We have a matrix $C_{p\times c}$ whose columns contain additional multivariate measurements. Let $L \doteq U\backslash C = U^{\top}C$ be the projection of C onto the orthogonal basis U, also known as its eigen-coding. Let $H \doteq (I-UU^{\top})C = C-UL$ be the component of C orthogonal to the subspace spanned by U. (I is the identity matrix.) Finally, let J be an orthogonal basis of H and let $K \doteq J\backslash H = J^{\top}H$ be the projection of C onto the subspace orthogonal to U. For example, $JK \xleftarrow{\ \mathrm{QR}\ } H$ could be a QR-decomposition of H. Consider the following identity:

$$\begin{bmatrix} U & J \end{bmatrix} \begin{bmatrix} \mathrm{diag}(s) & L \\ 0 & K \end{bmatrix} \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^{\top} = \begin{bmatrix} U & (I-UU^{\top})C/K \end{bmatrix} \begin{bmatrix} \mathrm{diag}(s) & U^{\top}C \\ 0 & K \end{bmatrix} \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^{\top} = \begin{bmatrix} U\,\mathrm{diag}(s)\,V^{\top} & C \end{bmatrix} = \begin{bmatrix} M & C \end{bmatrix} \tag{2}$$

Like an SVD, the left and right matrices in the product are unitary and orthogonal. The middle matrix, which we denote Q, is diagonal with a c-column border. To update the SVD we must diagonalize Q. Let

$$U'\,\mathrm{diag}(s')\,V'^{\top} \xleftarrow{\ \mathrm{SVD}\ } Q \tag{3}$$


Fig. 1. A vector is decomposed into components within and orthogonal to an SVD-derived subspace. The parallel component causes the singular vectors to be rotated (see figure 2), while the orthogonal component increases the rank of the SVD.

$$U'' \leftarrow \begin{bmatrix} U & J \end{bmatrix} U'; \qquad s'' \leftarrow s'; \qquad V'' \leftarrow \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix} V' \tag{4}$$

Then the updated SVD is

$$U''\,\mathrm{diag}(s'')\,V''^{\top} = \begin{bmatrix} U\,\mathrm{diag}(s)\,V^{\top} & C \end{bmatrix} = \begin{bmatrix} M & C \end{bmatrix}. \tag{5}$$

The whole update procedure takes $O((p+q)r^2 + pc^2)$ time, spent mostly in the subspace rotations of equation 4. To add rows one simply swaps U for V and U″ for V″.
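The column update of equations 2-5 can be sketched compactly (Python/NumPy; an illustrative sketch, not the paper's implementation, and it omits the rank truncation and missing-data handling developed below):

    import numpy as np

    def svd_append_columns(U, s, V, C):
        # Update M = U diag(s) V^T, with V of shape (q, r), to account
        # for new columns C of shape (p, c), following equations 2-5.
        L = U.T @ C                  # eigen-coding of C in the basis U
        H = C - U @ L                # component of C orthogonal to span(U)
        J, K = np.linalg.qr(H)       # orthogonal basis J of H, K = J^T H

        r, c = s.size, C.shape[1]
        # Q: diag(s) with a c-column border (middle matrix of eq. 2).
        Q = np.block([[np.diag(s), L],
                      [np.zeros((c, r)), K]])

        Up, sp, Vpt = np.linalg.svd(Q, full_matrices=False)  # eq. 3

        # Rotate the enlarged subspaces by the small SVD of Q (eq. 4).
        U2 = np.hstack([U, J]) @ Up
        V2 = np.block([[V, np.zeros((V.shape[0], c))],
                       [np.zeros((c, V.shape[1])), np.eye(c)]]) @ Vpt.T
        return U2, sp, V2

Afterwards np.hstack([M, C]) equals U2 @ np.diag(sp) @ V2.T up to floating-point error (equation 5); truncating sp keeps the factoring thin.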

In practice, some care must be taken to counter numerical error that may make J and U″ not quite orthogonal. We found that applying modified Gram-Schmidt orthogonalization to U″ when the inner product of its first and last columns is more than some small ε away from zero makes the algorithm numerically robust. A much more efficient scheme will be developed below.
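One way to realize that safeguard (a sketch; the drift test and the value of ε are illustrative assumptions):

    import numpy as np

    def reorthogonalize_if_drifted(U, eps=1e-10):
        # Re-orthogonalize U by modified Gram-Schmidt when the inner
        # product of its first and last columns drifts away from zero.
        if abs(U[:, 0] @ U[:, -1]) <= eps:
            return U
        U = U.copy()
        for i in range(U.shape[1]):
            U[:, i] /= np.linalg.norm(U[:, i])
            for j in range(i + 1, U.shape[1]):
                U[:, j] -= (U[:, i] @ U[:, j]) * U[:, i]
        return U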

Fig. 2. Visualization of the SVD update in equation 2. The quasi-diagonal Q matrix at left is diagonalized and the subspaces are counter-rotated to preserve equality.

    3.1 Automatic truncation

Define $\nu \doteq \det(K^{\top}K)$, which is the volume of C that is orthogonal to U. If $\nu$