TRANSCRIPT
Convex Algebraic Geometry of Rank Minimization
Pablo A. Parrilo
International Symposium on Mathematical Programming
August 2009 Chicago
PROBLEM: Find the lowest-rank matrix in a given convex set.
• E.g., given an affine subspace of matrices, find one of lowest possible rank
• In general, NP-hard (e.g., reduction from max cut, sparsest vector, etc.)
Many applications...
Application 1: Quadratic optimization
• Boolean quadratic minimization (e.g., max cut)
• Relax using the substitution X := xxᵀ
• If the solution X has rank 1, then we solved the original problem!
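Concretely, the standard lifting behind this substitution can be sketched as follows (notation mine; the ±1 formulation is the usual one for max cut):

```latex
\min_{x \in \{-1,1\}^n} x^\top Q x
\;=\; \min \left\{ \langle Q, X \rangle : X = xx^\top,\ x \in \{-1,1\}^n \right\}
\;\ge\; \min \left\{ \langle Q, X \rangle : \operatorname{diag}(X) = \mathbf{1},\ X \succeq 0 \right\}.
```

Dropping the rank-1 condition makes the problem an SDP; whenever the optimal X happens to have rank 1, factoring X = xxᵀ recovers an optimal x for the original problem.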
Application 2: Matrix completion
• Partially specified matrix, known pattern
• Often, random sampling of entries
• Applications:
– Partially specified covariances (PSD case)
– Collaborative prediction (e.g., Rennie-Srebro 05, Netflix problem)
[Figure: matrix M with entries Mij known for black cells, unknown for white cells]
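As a toy illustration (my own example, not from the talk): a rank-1 matrix with one fully observed row and column can be completed in closed form, which hints at why low rank makes completion from partial entries possible at all.

```python
import numpy as np

# Toy example (not from the talk): a rank-1 matrix is determined by a single
# row and column (when the shared entry is nonzero), since
# M[i, j] = M[i, 0] * M[0, j] / M[0, 0].
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
M = np.outer(u, v)              # rank-1 ground truth

observed = M.copy()
observed[1:, 1:] = np.nan       # only the first row and column are observed

completed = np.outer(observed[:, 0], observed[0, :]) / observed[0, 0]
assert np.allclose(completed, M)
```

Real completion problems observe scattered entries rather than a full row and column, which is why convex surrogates such as the nuclear norm are needed.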
Application 3: Sum of squares
• Number of squares equals the rank of the Gram matrix
• How to compute a SOS representation with the minimum number of squares?
• Rank minimization with SDP constraints
p(x) = ∑ᵢ qᵢ(x)²
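A small worked example (mine, not from the slides): the Gram matrices of p(x) = x⁴ + 2x² + 1 in the monomial vector z = (1, x, x²)ᵀ form a one-parameter spectrahedron, and the minimum number of squares is the minimum rank over it.

```latex
p(x) = x^4 + 2x^2 + 1 = z^\top Q_t\, z, \qquad
z = \begin{pmatrix} 1 \\ x \\ x^2 \end{pmatrix}, \qquad
Q_t = \begin{pmatrix} 1 & 0 & t \\ 0 & 2 - 2t & 0 \\ t & 0 & 1 \end{pmatrix} \succeq 0
\ \text{ for } t \in [-1, 1].
```

At t = 0 the Gram matrix has rank 3 (three squares); at t = 1 it has rank 1, giving the single-square representation p(x) = (x² + 1)². Finding the minimum-rank Gram matrix is exactly a rank minimization over an SDP feasible set.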
Application 4: System Identification
• “Complexity” of P ≈ rank(Hank(h))
Application 5: Video inpainting
• Given video frames with missing portions
• Reconstruct / interpolate the missing data
• “Simple” dynamics ↔ low-rank (multi-)Hankel
Overview
Rank optimization is ubiquitous in optimization, communications, and control. Difficult, even under linear constraints.
This talk:
• Underlying convex geometry and nuclear norm
• Rank minimization can be efficiently solved (under certain conditions!)
• Links with sparsity and “compressed sensing”
• Combined sparsity + rank
Outline
• Applications
• Convex hulls of varieties
• Geometry and nuclear norm
Rank minimization
PROBLEM: Find the matrix of smallest rank that satisfies the underdetermined linear system A(X) = b.
• Given an affine subspace of matrices, find one of lowest rank
[Figure: 3×3 matrices of rank 2]
Many methods have been proposed (after all, it's a nonlinear program!)
• Newton-like local methods
• Etc.
Sometimes (often?) these work very well. But it is very hard to prove results on global performance.
Geometry of rank varieties
What is the structure of the set of matrices of fixed rank?
An algebraic variety (solution set of polynomial equations), defined by the vanishing of all (k+1) × (k+1) minors of M.
Its dimension is k × (2n − k), and it is nonsingular except at matrices of rank less than or equal to k − 1.
At smooth points M = UΣVᵀ (with U, V ∈ R^{n×k}), there is a well-defined tangent space:
T_M = { U Z₁ᵀ + Z₂ Vᵀ : Z₁, Z₂ ∈ R^{n×k} }
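The dimension formula can be checked numerically; a sketch (my own, assuming the standard parametrization of the tangent space at a rank-k point M = UΣVᵀ as { U Z₁ᵀ + Z₂ Vᵀ }, with generic random U and V):

```python
import numpy as np

# Sanity check (not from the talk): the dimension of the variety of n x n
# matrices of rank k is k(2n - k). We span the tangent space at a generic
# rank-k point and count linearly independent directions.
rng = np.random.default_rng(1)
n, k = 5, 2
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

directions = []
for i in range(n):
    for j in range(k):
        E = np.zeros((n, k))
        E[i, j] = 1.0
        directions.append((U @ E.T).ravel())   # direction of the form U Z1^T
        directions.append((E @ V.T).ravel())   # direction of the form Z2 V^T

dim = np.linalg.matrix_rank(np.array(directions))
assert dim == k * (2 * n - k)   # here 2 * (10 - 2) = 16
```

The 2nk raw directions overlap in the k²-dimensional subspace of the form U A Vᵀ, which is exactly why the count comes out to 2nk − k² = k(2n − k).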
Nuclear norm
‖X‖∗ = ∑ᵢ σᵢ(X)
• Unitarily invariant
• Also known as Schatten 1-norm, Ky Fan r-norm, trace norm, etc.
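A quick numerical sanity check of these properties (a small numpy sketch of my own):

```python
import numpy as np

# Sketch (not from the talk): nuclear norm = sum of singular values,
# and it is invariant under multiplication by orthogonal matrices.
def nuclear_norm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix

assert np.isclose(nuclear_norm(Q @ X), nuclear_norm(X))
assert np.isclose(nuclear_norm(X @ Q), nuclear_norm(X))
```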
• Bad nonconvex problem → convexify! Replace the rank with the nuclear norm.
• Systematic methods to produce (exact or approximate) SDP representations of convex hulls
• Based on sums of squares (Shor, Nesterov, Lasserre, Parrilo, Nie, Helton)
• Parallels/generalizations from combinatorial optimization (theta bodies, e.g., Gouveia-Laurent-Parrilo-Thomas 09)
• Semidefinite programming characterization:
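The characterization is not reproduced in the transcript; presumably it is the standard SDP formulation of the nuclear norm (as in the Recht-Fazel-Parrilo reference below):

```latex
\|X\|_* \;=\; \min_{W_1,\, W_2} \ \tfrac{1}{2}\left(\operatorname{tr} W_1 + \operatorname{tr} W_2\right)
\quad \text{s.t.} \quad
\begin{pmatrix} W_1 & X \\ X^\top & W_2 \end{pmatrix} \succeq 0 .
```

This is what makes nuclear norm minimization under affine constraints directly solvable as a semidefinite program.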
A convex heuristic
• PROBLEM: Find the matrix of lowest rank that satisfies the underdetermined linear system A(X) = b
• Convex optimization heuristic:
– Minimize the nuclear norm (sum of singular values) of X
– This is a convex function of the matrix X
– Equivalent to an SDP problem
• Convex, can be solved efficiently
• Seems to work well in practice
Affine Rank Minimization: min rank(X) s.t. A(X) = b
Relaxation: min ‖X‖∗ s.t. A(X) = b
Let’s see...
• Nuclear norm minimization via SDP (SeDuMi)
How to formalize this?
General recipe:
• Find a deterministic condition that ensures success
• Sometimes, condition may be hard to check
• If so, invoke randomness of problem data, to show condition holds with high probability (concentration of measure)
Compressed sensing - Overview
• Influential recent work of Donoho/Tanner, Candès/Romberg/Tao, Baraniuk,...
• Relies on sparsity (in some domain, e.g., Fourier, wavelet, etc.)
• “Few” random measurements + smart decoding
• Many applications, particularly MRI, geophysics, radar, etc.
• Use a “restricted isometry property” (RIP)
• Then, this heuristic provably works.
• For “random” operators, RIP holds with overwhelming probability
Restricted Isometry Property (RIP)
• Let A : R^{m×n} → R^p be a linear map. For every positive integer r ≤ m, define the r-restricted isometry constant to be the smallest number δr(A) such that
(1 − δr(A)) ‖X‖_F ≤ ‖A(X)‖ ≤ (1 + δr(A)) ‖X‖_F
holds for all matrices X of rank at most r.
• Similar to RIP condition for sparsity studied by Candès and Tao (2004).
RIP ⇒ Heuristic Succeeds
Theorem: Let X0 be a matrix of rank r, and let X* be the solution of A(X) = A(X0) of smallest nuclear norm. Suppose that r ≥ 1 is such that δ5r(A) < 1/10. Then X* = X0.
• Deterministic condition on A
• Checking RIP can be hard
• However, a “random” A will have RIP with high probability
• The constant 1/10 is independent of m, n, r, p
“Random” ⇒ RIP holds
Theorem: Fix 0 < δ < 1. If A is “random”, then for every r there exists a constant c0, depending only on δ, such that δr(A) ≤ δ whenever p ≥ c0 r(2n − r) log(n²), with probability exponentially close to 1.
• Number of measurements: c0 · r(2n − r) · log(n²), i.e., constant × intrinsic dimension × logarithmic factor
• Typical scaling for this type of result.
Comparison with sparsity

Sparse Recovery
• cardinality
• disjoint support implies cardinality of sum is sum of cardinalities
• l1 norm
– dual norm: l∞ norm
– best convex approximation of cardinality on unit ball of l∞ norm
– Optimized via LP
Low-rank Recovery
• rank
• “disjoint” row and column spaces implies rank of sum is sum of ranks
• nuclear norm
– dual norm: operator norm
– best convex approximation of rank on unit ball of operator norm
– Optimized via SDP
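Two of these parallels can be verified directly (a small numpy sketch, my own example): for a diagonal matrix the nuclear norm reduces to the l1 norm of the diagonal, and matrices with disjoint row and column spaces have additive rank.

```python
import numpy as np

# Sketch (not from the talk). For a diagonal matrix, the nuclear norm (sum of
# singular values) equals the l1 norm of the diagonal, and its dual, the
# operator norm, equals the l-infinity norm of the diagonal.
d = np.array([3.0, -1.0, 2.0])
D = np.diag(d)
assert np.isclose(np.linalg.svd(D, compute_uv=False).sum(), np.abs(d).sum())
assert np.isclose(np.linalg.norm(D, 2), np.abs(d).max())

# "Disjoint" row and column spaces: rank of the sum is the sum of the ranks.
A = np.zeros((4, 4)); A[:2, :2] = np.outer([1.0, 2.0], [3.0, 4.0])    # rank 1
B = np.zeros((4, 4)); B[2:, 2:] = np.outer([1.0, 1.0], [1.0, -1.0])   # rank 1
assert np.linalg.matrix_rank(A + B) == np.linalg.matrix_rank(A) + np.linalg.matrix_rank(B)
```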
Secant varieties
• What is common to the two cases? Can this be further extended?
• A natural notion: secant varieties
• Generalize notions of rank to other objects (e.g., tensors, nonnegative matrices, etc.)
• However, technical difficulties:
– In general, these varieties may not be closed
– In general, the associated norms are not poly-time computable
• Many possible approaches:
– Interior-point methods for SDP
– Projected subgradient
– Low-rank parametrization (e.g., Burer-Monteiro)
• Much exciting recent work:
– Liu-Vandenberghe (interior point, exploits structure)
– Cai-Candès-Shen (singular value thresholding)
– Ma-Goldfarb-Chen (fixed point and Bregman)
– Toh-Yun (proximal gradient)
– Lee-Bresler (atomic decomposition)
– Liu-Sun-Toh (proximal point)
Low-rank completion problem
• Recent work of Candès-Recht, Candès-Tao, Keshavan-Montanari-Oh, etc.
• These provide theoretical guarantees of recovery, either using the nuclear norm or alternative algorithms
• E.g., Candès-Recht: O(n^1.2 r log n) randomly sampled entries suffice for exact recovery, with high probability
[Figure: a matrix expressed as the sum of a sparse matrix and a low-rank matrix]
Application: Graphical models
• For a sparse graphical model, the inverse covariance S⁻¹ is sparse.
• No longer true if the graphical model has hidden variables.
• But then it is the sum of a sparse and a low-rank matrix (Schur complement).
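A minimal numerical sketch of the Schur-complement observation (my own toy model, with one hidden variable): the marginal precision of the observed variables equals the sparse observed block minus a correction whose rank is at most the number of hidden variables.

```python
import numpy as np

# Toy model (not from the talk): joint precision matrix K over 4 observed
# variables and 1 hidden variable. The marginal precision of the observed
# block is the Schur complement K_oo - K_oh inv(K_hh) K_ho: a sparse matrix
# minus a correction of rank <= number of hidden variables.
n_obs, n_hid = 4, 1
K = np.eye(n_obs + n_hid) * 3.0
K[:n_obs, n_obs:] = 1.0      # hidden variable coupled to every observed one
K[n_obs:, :n_obs] = 1.0

K_oo = K[:n_obs, :n_obs]
K_oh = K[:n_obs, n_obs:]
K_hh = K[n_obs:, n_obs:]
correction = K_oh @ np.linalg.inv(K_hh) @ K_oh.T

Sigma = np.linalg.inv(K)                           # covariance
marg_prec = np.linalg.inv(Sigma[:n_obs, :n_obs])   # marginal precision

assert np.allclose(marg_prec, K_oo - correction)   # Schur complement identity
assert np.linalg.matrix_rank(correction) == n_hid  # low-rank correction
```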
Application: Matrix rigidity
• Rigidity of a matrix (the minimum number of entries one must change to bring its rank below a given target): NP-hard to compute
• Rigidity bounds have a number of applications:
– Complexity of linear transforms [Valiant 77]
– Communication complexity [Lokam 95]
Identifiability issues
• Problem can be ill-posed: need to ensure that the terms cannot be simultaneously sparse and low-rank.
• Define two geometric quantities, related to the tangent spaces to the sparse and low-rank varieties.
[Figure: “bad” vs. “benign” configurations of the tangent spaces]
• Tangent spaces must intersect transversally
Tangent space identifiability
• Sufficient condition for transverse intersection
• Combinatorial, NP-hard in general
Propose: minimize γ‖S‖1 + ‖L‖∗ subject to S + L = C
• The true pair (S, L) is the unique optimum of the convex program for a range of γ
• Essentially a refinement of the tangent space transversality conditions
Transverse intersection:
Convex recovery:
Summary
• Rank optimization
– Algebraic and geometric aspects
• Theoretical challenges
– Efficient descriptions
– Sharper, verifiable conditions for recovery
– Other formulations (e.g., Ames-Vavasis on planted cliques)
– Finite fields?
• Algorithmic issues
– Reliable, large-scale methods
Want to know more? Details below, and in references therein:
• B. Recht, M. Fazel, P.A. Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, arXiv:0706.4138, 2007.
• V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A. Willsky, Rank-Sparsity Incoherence for Matrix Decomposition, arXiv:0906.2220, 2009.
Thanks for your attention!