

MERL - A MITSUBISHI ELECTRIC RESEARCH LABORATORY

    http://www.merl.com

Incremental singular value decomposition of uncertain data with missing values

    Matthew Brand

    TR-2002-24 May 2002

    Abstract

We introduce an incremental singular value decomposition (SVD) of incomplete data. The SVD is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete data does not uniquely specify an SVD, the procedure selects one having minimal rank. For a dense p×q matrix of low rank r, the incremental method has time complexity O(pqr) and space complexity O((p+q)r), better than highly optimized batch algorithms such as MATLAB's svd(). In cases of missing data, it produces factorings of lower rank and residual than batch SVD algorithms applied to standard missing-data imputations. We show applications in computer vision and audio feature extraction. In computer vision, we use the incremental SVD to develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle occlusions/missing points in structure-from-motion factorizations.

    First circulated spring 2001.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Information Technology Center America; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Information Technology Center America. All rights reserved.

Copyright © Mitsubishi Electric Information Technology Center America, 2002

    201 Broadway, Cambridge, Massachusetts 02139


Proceedings of the 2002 European Conference on Computer Vision (ECCV 2002, Copenhagen), Springer Lecture Notes in Computer Science, volume 2350.


Incremental singular value decomposition of uncertain data with missing values

    Matthew Brand

Mitsubishi Electric Research Labs, 201 Broadway, Cambridge, MA 02139, USA

    To appear, Proc. European Conference on Computer Vision (ECCV), May 2002.

Abstract. We introduce an incremental singular value decomposition (SVD) of incomplete data. The SVD is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete data does not uniquely specify an SVD, the procedure selects one having minimal rank. For a dense p×q matrix of low rank r, the incremental method has time complexity O(pqr) and space complexity O((p+q)r), better than highly optimized batch algorithms such as MATLAB's svd(). In cases of missing data, it produces factorings of lower rank and residual than batch SVD algorithms applied to standard missing-data imputations. We show applications in computer vision and audio feature extraction. In computer vision, we use the incremental SVD to develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle occlusions/missing points in structure-from-motion factorizations.

    1 Introduction

Many natural phenomena can be faithfully modeled with multilinear functions, or closely approximated as such. Examples include the combination of lighting and pose [20] and shape and motion [12,3] in image formation, mixing of sources in acoustic recordings [6], and word associations in collections of documents [1,23]. Multilinearity means that a matrix of such a phenomenon's measured effects can be factored into low-rank matrices of (presumed) causes. The celebrated singular value decomposition (SVD) [8] provides a bilinear factoring of a data matrix M,

$$U_{p\times r}\,\mathrm{diag}(s_{r\times 1})\,V^{\top}_{r\times q} \xleftarrow{\ \mathrm{SVD}_r\ } M_{p\times q}, \qquad r \le \min(p,q) \tag{1}$$

where U and V are unitary orthogonal matrices whose columns give a linear basis for M's columns and rows, respectively. For low-rank phenomena, $r_{\mathrm{true}} \ll \min(p,q)$, implying a parsimonious explanation of the data. Since $r_{\mathrm{true}}$ is often unknown, it is common to wastefully compute a large $r_{\mathrm{approx}} \gg r_{\mathrm{true}}$ SVD and estimate an appropriate smaller value $r_{\mathrm{empirical}}$ from the distribution of singular values in s. All but the $r_{\mathrm{empirical}}$ largest singular values in s are then zeroed to give a thin truncated SVD that closely approximates the data. This forms the basis of a broad range of algorithms for data analysis, dimensionality reduction, compression, noise-suppression, and extrapolation.
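As a concrete illustration of this recipe, the following sketch (in Python with NumPy; not from the paper, and the rank-selection threshold is an illustrative assumption) computes a batch SVD, estimates $r_{\mathrm{empirical}}$ from the spectrum, and zeroes the remaining singular values:

    import numpy as np

    # Synthetic rank-5 data: the product of two thin random factors.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))

    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Estimate r_empirical from the singular value distribution;
    # the relative threshold below is an illustrative choice.
    r = int(np.sum(s > 1e-8 * s[0]))

    # Thin truncated SVD: keep only the r largest singular values.
    M_approx = (U[:, :r] * s[:r]) @ Vt[:r, :]
    print(r, np.linalg.norm(M - M_approx))  # r == 5, residual near zero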

The SVD is usually computed by a batch $O(pq^2 + p^2q + q^3)$-time algorithm [8], meaning that all the data must be processed at once, so SVDs of very large datasets are essentially infeasible. Lanczos methods yield thin SVDs in $O(pqr^2)$ time [8], but $r_{\mathrm{true}}$ should be known in advance since Lanczos methods are known to be inaccurate for the smaller singular values [1]. A more pressing problem is that the SVD requires complete data, whereas in many experimental settings some parts of the measurement matrix may be missing, contaminated, or otherwise untrusted. Consequently, a single missing value forces the modeler to discard an entire row or column of the data matrix prior to the SVD. The missing value may be imputed from neighboring values, but such imputations typically mislead the SVD away from the most parsimonious (low-rank) decompositions.


We consider how an SVD may be updated by adding rows and/or columns of data, which may be missing values and/or contaminated with correlated (colored) noise. The size of the data matrix need not be known: the SVD is developed as the data comes in and handles missing values in a manner that minimizes rank. The resulting algorithms have better time and space complexity than full-data batch SVD methods and can produce more informative results (more parsimonious factorings of incomplete data). In the case of dense low-rank matrices, the time complexity is linear in the size and the rank of the data, O(pqr), while the space complexity is sublinear, O((p+q)r).

    2 Related work

SVD updating has a literature spread over three decades [5,4,1,10,7,23] and is generally based on Lanczos methods, symmetric eigenvalue perturbations, or identities similar to equation 2 below. Zha and Simon [23] use such an identity, but their update is approximate and requires a dense SVD. Chandrasekaran et alia [7] begin similarly, but their update is limited to single vectors and is vulnerable to loss of orthogonality. Levy and Lindenbaum [14] exploit the relationship between the QR-decomposition and the SVD to incrementally compute the left singular vectors in $O(pqr^2)$ time; if p, q, and r are known in advance and $p \ge q \gg r$, then the expected complexity falls to O(pqr). However, this is also vulnerable to loss of orthogonality, and results have only been reported for matrices having a few hundred columns.

None of this literature contemplates missing or uncertain values, except insofar as they can be treated as zeros (e.g., [1]), which is arguably incorrect. In batch-SVD contexts, missing values are usually handled via subspace imputation, using an expectation-maximization-like procedure: perform an SVD of all complete columns, regress incomplete columns against the SVD to estimate missing values, then re-factor and re-impute the completed data until a fixpoint is reached (e.g., [21]). This is extremely slow (quartic time) and only works if very few values are missing. It has the further demerit that the imputation does not minimize effective rank. Other heuristics simply fill missing values with row- or column-means [19].
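A minimal sketch of that subspace-imputation loop (Python/NumPy; the function name, the fixed rank r, and the convergence test are illustrative assumptions, not the exact procedure of [21]):

    import numpy as np

    def svd_impute(M, mask, r, iters=100, tol=1e-8):
        # EM-like subspace imputation: fill missing entries, factor at
        # rank r, re-impute from the rank-r reconstruction, and repeat
        # until a fixpoint. mask is True where M is observed.
        X = np.where(mask, M, 0.0)       # initial fill (means also common)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]
            X_new = np.where(mask, M, X_hat)   # keep observed entries
            if np.linalg.norm(X_new - X) < tol:
                return X_new
            X = X_new
        return X

Note that every pass costs a full batch SVD, which is the source of the slowness criticized above.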

In the special case where a matrix M is nearly dense, its normalized scatter matrix $\Sigma$, with $\Sigma_{m,n} \doteq \sum_i M_{i,m} M_{i,n}$, may be fully dense due to fill-in. In that case $\Sigma$'s eigenvectors are M's right singular vectors [13]. However, this method does not lead to the left singular vectors, and it often doesn't work at all because $\Sigma$ is frequently incomplete as well, with undefined eigenvectors.
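For a fully observed M the equivalence is easy to verify numerically (a Python/NumPy sketch for the dense case only, without the per-entry normalization an incomplete matrix would require):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((200, 12))

    # Scatter matrix of M's columns: Sigma = M^T M = V diag(s)^2 V^T.
    Sigma = M.T @ M
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues

    _, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Eigenvectors match right singular vectors up to sign, and
    # eigenvalues are squared singular values.
    cosines = np.abs(np.sum(eigvecs[:, ::-1] * Vt.T, axis=0))
    print(np.allclose(cosines, 1.0),
          np.allclose(np.sqrt(eigvals[::-1]), s))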

    3 Updating an SVD

We begin with an existing rank-r SVD as in equation 1. We have a matrix $C_{p\times c}$ whose columns contain additional multivariate measurements. Let $L \doteq U\backslash C = U^{\top}C$ be the projection of C onto the orthogonal basis U, also known as its eigen-coding. Let $H \doteq (I-UU^{\top})C = C-UL$ be the component of C orthogonal to the subspace spanned by U. (I is the identity matrix.) Finally, let J be an orthogonal basis of H and let $K \doteq J\backslash H = J^{\top}H$ be the projection of C onto the subspace orthogonal to U. For example, $JK \xleftarrow{\ \mathrm{QR}\ } H$ could be a QR-decomposition of H. Consider the following identity:

$$\begin{bmatrix} U & J \end{bmatrix} \begin{bmatrix} \mathrm{diag}(s) & L \\ 0 & K \end{bmatrix} \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^{\top} = \begin{bmatrix} U & (I-UU^{\top})C/K \end{bmatrix} \begin{bmatrix} \mathrm{diag}(s) & U^{\top}C \\ 0 & K \end{bmatrix} \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^{\top} = \begin{bmatrix} U\,\mathrm{diag}(s)\,V^{\top} & C \end{bmatrix} = \begin{bmatrix} M & C \end{bmatrix} \tag{2}$$

Like an SVD, the left and right matrices in the product are unitary and orthogonal. The middle matrix, which we denote Q, is diagonal with a c-column border. To update the SVD we must diagonalize Q. Let

$$U'\,\mathrm{diag}(s')\,V'^{\top} \xleftarrow{\ \mathrm{SVD}\ } Q \tag{3}$$


Fig. 1. A vector is decomposed into components within and orthogonal to an SVD-derived subspace. The parallel component causes the singular vectors to be rotated (see figure 2), while the orthogonal component increases the rank of the SVD.

$$U'' \leftarrow \begin{bmatrix} U & J \end{bmatrix} U'; \qquad s'' \leftarrow s'; \qquad V'' \leftarrow \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix} V' \tag{4}$$

Then the updated SVD is

$$U''\,\mathrm{diag}(s'')\,V''^{\top} = \begin{bmatrix} U\,\mathrm{diag}(s)\,V^{\top} & C \end{bmatrix} = \begin{bmatrix} M & C \end{bmatrix}. \tag{5}$$

The whole update procedure takes $O((p+q)r^2 + pc^2)$ time, spent mostly in the subspace rotations of equation 4. To add rows one simply swaps U for V and U″ for V″.
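The column update of equations 2-5 can be sketched compactly (Python/NumPy; an illustrative sketch, not the paper's implementation, and it omits the rank truncation and missing-data handling developed below):

    import numpy as np

    def svd_append_columns(U, s, V, C):
        # Update M = U diag(s) V^T, with V of shape (q, r), to account
        # for new columns C of shape (p, c), following equations 2-5.
        L = U.T @ C                  # eigen-coding of C in the basis U
        H = C - U @ L                # component of C orthogonal to span(U)
        J, K = np.linalg.qr(H)       # orthogonal basis J of H, K = J^T H

        r, c = s.size, C.shape[1]
        # Q: diag(s) with a c-column border (middle matrix of eq. 2).
        Q = np.block([[np.diag(s), L],
                      [np.zeros((c, r)), K]])

        Up, sp, Vpt = np.linalg.svd(Q, full_matrices=False)  # eq. 3

        # Rotate the enlarged subspaces by the small SVD of Q (eq. 4).
        U2 = np.hstack([U, J]) @ Up
        V2 = np.block([[V, np.zeros((V.shape[0], c))],
                       [np.zeros((c, V.shape[1])), np.eye(c)]]) @ Vpt.T
        return U2, sp, V2

Afterwards np.hstack([M, C]) equals U2 @ np.diag(sp) @ V2.T up to floating-point error (equation 5); truncating sp keeps the factoring thin.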

In practice, some care must be taken to counter numerical error that may make J and U″ not quite orthogonal. We found that applying modified Gram-Schmidt orthogonalization to U″ when the inner product of its first and last columns is more than some small ε away from zero makes the algorithm numerically robust. A much more efficient scheme will be developed below.
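One way to realize that safeguard (a sketch; the drift test and the value of ε are illustrative assumptions):

    import numpy as np

    def reorthogonalize_if_drifted(U, eps=1e-10):
        # Re-orthogonalize U by modified Gram-Schmidt when the inner
        # product of its first and last columns drifts away from zero.
        if abs(U[:, 0] @ U[:, -1]) <= eps:
            return U
        U = U.copy()
        for i in range(U.shape[1]):
            U[:, i] /= np.linalg.norm(U[:, i])
            for j in range(i + 1, U.shape[1]):
                U[:, j] -= (U[:, i] @ U[:, j]) * U[:, i]
        return U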

Fig. 2. Visualization of the SVD update in equation 2. The quasi-diagonal Q matrix at left is diagonalized and the subspaces are counter-rotated to preserve equality.

    3.1 Automatic truncation

Define $\nu \doteq \det(K^{\top}K)$, which is the volume of C that is orthogonal to U. If $\nu$