Matrix Completion from a Few Entries
Raghunandan Keshavan, Andrea Montanari and Sewoong Oh
Stanford University
Physics of Algorithms, Santa Fe - Aug 31, 2009
R.Keshavan, A.Montanari, S.Oh (Stanford) Physics of Algorithms 2009 Aug 31, 2009 1 / 24
Motivating Example : Recommender System
Netflix Challenge
[Figure: the Netflix ratings matrix M, sparsely filled with ratings from 1 to 5]

5 · 10^5 users, 2 · 10^4 movies, 10^8 ratings
Matrix Completion Problem

[Figure: M (n × nα) = U (n × r) · Σ (r × r) · V^T (r × nα), and the sampled matrix M^E with only the entries in E revealed]

1. Low-rank matrix M = U Σ V^T, with U^T U = n I, V^T V = nα I, and Σ = diag(Σ_1, Σ_2, . . . , Σ_r).
2. Uniformly random sample E ⊂ [n] × [nα], given its size |E|. Here [k] = {1, 2, . . . , k}.
Matrix Completion Problems

For any estimate M̂, let RMSE = (1/√(mn)) ||M − M̂||_F , where ||A||_F^2 = Σ_ij A_ij^2.

Q1. How many samples do we need to get RMSE ≤ δ?
    (1 + α)rn ≲ |E| = O(nr)
Q2. How many samples do we need to recover M exactly?
    n log n ≲ |E| = O(n log n)
Q3. What if the samples are corrupted by noise?
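The RMSE figure of merit above can be checked directly; a minimal numpy sketch (the example matrices are illustrative, not from the talk):

```python
import numpy as np

def rmse(M, M_hat):
    """RMSE = ||M - M_hat||_F / sqrt(m*n), as defined on the slide."""
    m, n = M.shape
    return np.linalg.norm(M - M_hat, ord="fro") / np.sqrt(m * n)

# A rank-1 matrix and an estimate off by a constant 0.1 everywhere:
M = np.outer(np.arange(1.0, 5.0), np.ones(5))
M_hat = M + 0.1
print(rmse(M, M_hat))  # a constant offset of 0.1 gives RMSE = 0.1
```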
Pathological Example

M = e_1 e_1^T : the n × n matrix with M_11 = 1 and all other entries 0.

P(observing M_11) = |E| / n^2
Incoherence Property

M is (µ0, µ1)-incoherent if

A1. M_max ≤ µ0 Σ_1 √r ,
A2. Σ_{a=1}^r U_ia^2 ≤ µ1 r ,  Σ_{a=1}^r V_ja^2 ≤ µ1 r .

[Candès, Recht 2008 [1]]
Previous Work
Theorem (Candès, Recht 2008 [1])
Let M be an n × nα matrix of rank r satisfying the (µ0, µ1)-incoherence condition. If

|E| ≥ C(α, µ0, µ1) r n^{6/5} log n ,

then w.h.p. Semidefinite Programming reconstructs M exactly.
Main Contributions

Questions                    Main Results
1. RMSE                      ≤ C(α) (nr / |E|)^{1/2}
2. Exact Reconstruction      |E| = O(n log n)
3. Noisy Reconstruction      |E| = O(n log n)
   (N = M + Z)               (1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity                O(|E| r log n)
The Algorithm and Main Theorems
Naïve Approach

M^E_ij = M_ij if (i, j) ∈ E, and 0 otherwise.

SVD of the zero-filled matrix: M^E = Σ_{k=1}^n x_k σ_k y_k^T

Rank-r projection:

P_r(M^E) ≡ (n^2 α / |E|) Σ_{k=1}^r x_k σ_k y_k^T
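The rank-r projection above is a truncated SVD of the zero-filled matrix, rescaled by n²α/|E|; a numpy sketch (function and argument names are mine, not from the talk):

```python
import numpy as np

def rank_r_projection(M_E, E_size, r, alpha=1.0):
    """P_r(M^E): truncated rank-r SVD of the zero-filled observed matrix,
    rescaled by n^2 * alpha / |E| to compensate for unobserved entries."""
    n = M_E.shape[0]
    U, s, Vt = np.linalg.svd(M_E, full_matrices=False)
    scale = n**2 * alpha / E_size
    return scale * (U[:, :r] * s[:r]) @ Vt[:r, :]
```

Sanity check on the scaling: with every entry observed (|E| = n²α) the rescaling factor is 1, and a rank-r matrix is reproduced exactly by its top-r singular triplets.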
Naïve Approach Fails

Define: deg(row_i) ≡ # of samples in row i.

For |E| = O(n), M^E has spurious singular values of size Ω(√(log n / log log n)).

Solution: Trimming

M̃^E_ij = 0 if deg(row_i) > 2 E[deg(row_i)],
          0 if deg(col_j) > 2 E[deg(col_j)],
          M^E_ij otherwise.
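The trimming step can be sketched with a boolean mask of observed positions. Note the slide thresholds at twice the *expected* degree; this sketch (my simplification) uses twice the empirical mean degree:

```python
import numpy as np

def trim(M_E, observed):
    """Zero out over-represented rows/columns: any row (column) with more
    than twice the average number of observed entries is set to zero."""
    M_t = M_E.copy()
    row_deg = observed.sum(axis=1)   # samples per row
    col_deg = observed.sum(axis=0)   # samples per column
    M_t[row_deg > 2 * row_deg.mean(), :] = 0.0
    M_t[:, col_deg > 2 * col_deg.mean()] = 0.0
    return M_t
```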
The Algorithm

OptSpace
Input: sample positions E, sample values M^E, rank r
Output: estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T ;
Main Result

Theorem (Keshavan, Montanari, Oh, 2009 [2])
Let M be an n × nα matrix of rank r with entries bounded by M_max. Then, w.h.p.,

(1 / (n M_max)) ||M − P_r(M̃^E)||_F ≤ C(α) √(nr / |E|) ,
The Algorithm

OptSpace
Input: sample positions E, sample values M^E, rank r
Output: estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T ;
3: Minimize F(X, Y) by gradient descent starting at (X_0, Y_0), where

F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j) ∈ E} (M_ij − (X S Y^T)_ij)^2 .
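For fixed X and Y, the inner minimization over S in F(X, Y) is an ordinary least-squares problem, since each observed entry (X S Y^T)_ij is linear in the r² entries of S. A sketch of that inner step (names are mine, not from the talk):

```python
import numpy as np

def best_S(X, Y, E, M_vals):
    """Optimal S in min_S sum_{(i,j) in E} (M_ij - (X S Y^T)_ij)^2.
    Each observed entry contributes one linear equation in the r*r
    unknowns of S: (X S Y^T)_ij = sum_{a,b} X[i,a] * S[a,b] * Y[j,b]."""
    r = X.shape[1]
    # One row of the design matrix per observed entry:
    A = np.array([np.outer(X[i], Y[j]).ravel() for (i, j) in E])
    s_vec, *_ = np.linalg.lstsq(A, np.asarray(M_vals), rcond=None)
    return s_vec.reshape(r, r)
```

In the full algorithm this inner solve is interleaved with gradient steps on X and Y over the observed entries.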
Main Result
Theorem (Keshavan, Montanari, Oh, 2009 [2])
Assume r = O(1), and let M be an n × nα matrix satisfying the (µ0, µ1)-incoherence condition, with σ_1(M)/σ_r(M) = O(1). If

|E| ≥ C' n log n ,

then OptSpace returns, w.h.p., the matrix M.
Comparison
Theorem (Candes, Tao, 2009 [3])
Assume a strongly incoherent matrix M. If |E| ≥ C r n (log n)^6, then Semidefinite Programming returns, w.h.p., the matrix M.
Corrupted Observations

N = M + Z for any n × nα matrix Z.
N^E (defined analogously to M^E) is the input to OptSpace.

Theorem (Keshavan, Montanari, Oh, 2009 [4])
Let N = M + Z with M as above and Z any n × nα matrix. If |E| ≥ C' n log n, then OptSpace with input N^E returns M̂ such that, w.h.p.,

(1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2 ,

provided that the right-hand side is smaller than Σ_r.

||A||_2 = sup_{v ≠ 0} (||Av|| / ||v||)
Corrupted Observations

Z_ij are independent, E[Z_ij] = 0, and P(|Z_ij| ≥ x) ≤ C e^{−x^2 / (2σ^2)}.

Theorem (Keshavan, Montanari, Oh, 2009 [4])
If Z is a random matrix with entries drawn as above, then

||Z^E||_2 ≤ C σ (√α |E| log |E| / n)^{1/2} .

Corollary
Let N = M + Z, with Z distributed as above. Then OptSpace with input N^E returns M̂ such that

(1/√(mn)) ||M − M̂||_F ≤ C α σ (nr / |E|)^{1/2} √(log |E|) .
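The spectral-norm bound on the sampled noise matrix can be checked numerically; a sketch with Gaussian noise for α = 1 (sizes, seed, and sampling probability are illustrative, and the constant C in the theorem is unspecified):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 400, 0.05, 1.0            # illustrative parameters
mask = rng.random((n, n)) < p           # uniform sampling of entries
Z_E = sigma * rng.normal(size=(n, n)) * mask
E_size = mask.sum()
op_norm = np.linalg.norm(Z_E, 2)        # ||Z^E||_2 = largest singular value
# Theorem predicts ||Z^E||_2 = O(sigma * (|E| log|E| / n)^{1/2}) for alpha = 1:
bound_scale = sigma * np.sqrt(E_size * np.log(E_size) / n)
print(op_norm, bound_scale)
```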
Simulation Results

M = U V^T with U_ij ∼ N(0, 1), V_ij ∼ N(0, 1); r = 10, α = 1, n = 1000.
M is declared recovered if ||M − M̂||_F / ||M||_F < 10^{-4}.

[Figure: recovery rate vs. |E|/n (0 to 200) for OptSpace, FPCA, SVT, and ADMiRA, together with the lower bound]
Simulation Results

U_ij ∼ N(0, σ^2 = 20/√n) [5]; Z_ij ∼ N(0, 1); r = 2, α = 1, n = 600.

[Figure: RMSE vs. |E|/n (0 to 600) for Convex Relaxation, ADMiRA, and OptSpace, together with the Oracle Bound]
Conclusion

Main Results
1. RMSE ≤ δ : |E| = O(nr)
2. Exact : |E| = O(n log n)
3. Noisy : ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity : O(|E| r log n)

What's left?
1. Prior knowledge of rank? RankEstimation is exact w.h.p. for |E| = O(n)
2. r = Θ(n^β)? Suboptimal bound

Thank you!
References

[1] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. arXiv:0805.4471, 2008.
[2] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. arXiv:0901.3150, January 2009.
[3] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. arXiv:0903.1476, 2009.
[4] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. arXiv:0906.2027, June 2009.
[5] E. J. Candès and Y. Plan. Matrix completion with noise. arXiv:0903.3131, 2009.