Singular value shrinkage prior:a matrix version of Stein’s prior
Takeru Matsuda
The University of Tokyo
June 19, 2019, Symposium in memory of Charles Stein
June 19, 2019 Symposium in memory of Charles Stein 1 / 37
Motivation
vector James–Stein estimator (1961) Stein’s prior (1974)
matrix Efron–Morris estimator (1972) ?
Stein’s 1974 paper
“Estimation of the mean of a multivariate normal distribution”
1. Introduction
2. Computation of the risk of an arbitrary estimate of the mean
3. The spherically symmetric case
4. The risk of an estimate of a matrix of means
5. Choice of an estimate in the p × p case
6. Directions in which this work ought to be extended
Abstract
Efron–Morris estimator (Efron and Morris, 1972)
M̂_EM(X) = X(I_q − (p − q − 1)(XᵀX)⁻¹)
minimax estimator of a normal mean matrix
natural extension of the James–Stein estimator
↓
Singular value shrinkage prior (M. and Komaki, Biometrika 2015)
πSVS(M) = det(MᵀM)^{−(p−q−1)/2}
superharmonic (ΔπSVS ≤ 0), natural generalization of the Stein prior
works well for low-rank matrices → reduced-rank regression
Empirical Bayes matrix completion (M. and Komaki, 2019)
estimate unobserved entries of a matrix by exploiting low-rankness
Efron–Morris estimator (Efron and Morris, 1972)
Note: singular values of matrices
Singular value decomposition of p × q matrix M (p ≥ q)
M = UΛVᵀ
U: p × q, V: q × q, UᵀU = VᵀV = I_q
Λ = diag(σ₁(M), …, σ_q(M))
σ₁(M) ≥ ⋯ ≥ σ_q(M) ≥ 0 : singular values of M
rank(M) = #{i : σᵢ(M) > 0}
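These definitions can be checked with a short NumPy sketch (illustrative, not part of the slides):

```python
import numpy as np

# Thin SVD of a p x q matrix M (p >= q): M = U diag(s) V^T,
# with U: p x q, V: q x q, U^T U = V^T V = I_q,
# and s the singular values in decreasing order.
rng = np.random.default_rng(0)
p, q, r = 5, 3, 2
M = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank 2 by construction

U, s, Vt = np.linalg.svd(M, full_matrices=False)
assert U.shape == (p, q) and Vt.shape == (q, q)
assert np.allclose(U.T @ U, np.eye(q)) and np.allclose(Vt @ Vt.T, np.eye(q))
assert np.all(np.diff(s) <= 0)               # decreasing singular values
rank = int(np.sum(s > 1e-10))                # rank(M) = #{i : sigma_i(M) > 0}
assert rank == 2
assert np.allclose(M, U @ np.diag(s) @ Vt)   # M = U Λ V^T
```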
Estimation of normal mean matrix
X_ij ∼ N(M_ij, 1) (i = 1, …, p; j = 1, …, q)
estimate M based on X under the Frobenius loss ‖M̂ − M‖²_F
Efron–Morris estimator (= James–Stein estimator when q = 1):
M̂_EM(X) = X(I_q − (p − q − 1)(XᵀX)⁻¹)
Theorem (Efron and Morris, 1972)
When p ≥ q + 2, M̂_EM is minimax and dominates M̂_MLE(X) = X.
Stein (1974) noticed that it shrinks the singular values of the observation toward zero:
σᵢ(M̂_EM) = (1 − (p − q − 1)/σᵢ(X)²) σᵢ(X)
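The singular-value form of the estimator is easy to verify numerically; a minimal sketch (not from the slides) checks that M̂_EM shrinks each σᵢ(X) by exactly the stated factor:

```python
import numpy as np

def efron_morris(X):
    """Efron–Morris estimator: X (I_q - (p-q-1) (X^T X)^{-1})."""
    p, q = X.shape
    return X @ (np.eye(q) - (p - q - 1) * np.linalg.inv(X.T @ X))

rng = np.random.default_rng(1)
p, q = 5, 3
X = 10.0 * rng.standard_normal((p, q))   # large signal keeps the shrinkage factors positive

s = np.linalg.svd(X, compute_uv=False)
s_em = np.linalg.svd(efron_morris(X), compute_uv=False)
# sigma_i(M_EM) = (1 - (p-q-1)/sigma_i(X)^2) sigma_i(X)
assert np.allclose(s_em, (1.0 - (p - q - 1) / s**2) * s)
```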
Numerical results
Risk functions for p = 5, q = 3, σ₁ = 20, σ₃ = 0 (rank 2)
black: MLE, blue: JS, red: EM
M̂_EM works well when σ₂ is small, even if σ₁ is large.
▶ M̂_JS works well when ‖M‖²_F = σ₁² + σ₂² + σ₃² is small.
Numerical results
Risk functions for p = 5, q = 3, σ₂ = σ₃ = 0 (rank 1)
black: MLE, blue: JS, red: EM
M̂_EM attains a constant risk reduction as long as σ₂ = σ₃ = 0, because it shrinks each singular value separately.
Therefore, it works well when M has low rank.
Remark: SURE for matrix mean
orthogonally invariant estimator:
X = UΣVᵀ, M̂ = UΣ(I_q − Φ(Σ))Vᵀ
Stein (1974) derived an unbiased estimate of the risk (SURE):
pq + Σ_{i=1}^{q} [ σᵢ²φᵢ² − 2(p − q + 1)φᵢ − 2σᵢ ∂φᵢ/∂σᵢ ] − 4 Σ_{i<j} (σᵢ²φᵢ − σⱼ²φⱼ)/(σᵢ² − σⱼ²)
▶ regularity conditions → M. and Strawderman (2018)
SURE is also improved by singular value shrinkage (M. and Strawderman, 2018)
▶ extension of Johnstone (1988)
Singular value shrinkage prior
(Matsuda and Komaki, 2015)
Superharmonic prior for estimation
X ∼ N_p(µ, I_p)
estimate µ based on X under the quadratic loss
superharmonic prior:
Δπ(µ) = Σ_{i=1}^{p} ∂²π(µ)/∂µᵢ² ≤ 0
the Stein prior (p ≥ 3) is superharmonic:
π(µ) = ‖µ‖^{2−p}
Bayes estimator with the Stein prior shrinks to the origin.
Theorem (Stein, 1974)
Bayes estimators with superharmonic priors dominate the MLE.
Superharmonic prior for prediction
X ∼ N_p(µ, Σ), Y ∼ N_p(µ, Σ̃)
We predict Y from the observation X (Σ, Σ̃: known)
Bayesian predictive density with prior π(µ):
p̂_π(y | x) = ∫ p(y | µ) π(µ | x) dµ
Kullback–Leibler loss:
D(p(y | µ), p̂(y | x)) = ∫ p(y | µ) log [ p(y | µ) / p̂(y | x) ] dy
Bayesian predictive density with the uniform prior is minimax.
Superharmonic prior for prediction
X ∼ N_p(µ, Σ), Y ∼ N_p(µ, Σ̃)
Theorem (Komaki, 2001)
When Σ̃ ∝ Σ, the Stein prior dominates the uniform prior.
Theorem (George, Liang and Xu, 2006)
When Σ̃ ∝ Σ, superharmonic priors dominate the uniform prior.
Theorem (Kobayashi and Komaki, 2008; George and Xu, 2008)
For general Σ and Σ̃, superharmonic priors dominate the uniform prior.
Motivation
vector: James–Stein estimator µ̂_JS = (1 − (p − 2)/‖x‖²) x ←→ Stein's prior π_S(µ) = ‖µ‖^{−(p−2)}
matrix: Efron–Morris estimator M̂_EM = X(I_q − (p − q − 1)(XᵀX)⁻¹) ←→ ?
note: JS and EM are not generalized Bayes.
Singular value shrinkage prior
πSVS(M) = det(MᵀM)^{−(p−q−1)/2} = Π_{i=1}^{q} σᵢ(M)^{−(p−q−1)}
We assume p ≥ q + 2.
πSVS puts more weight on matrices with smaller singular values, so it shrinks each singular value separately.
When q = 1, πSVS coincides with the Stein prior.
Theorem (M. and Komaki, 2015)
πSVS is superharmonic: ΔπSVS ≤ 0.
Therefore, the Bayes estimator and Bayesian predictive density with respect to πSVS are minimax.
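The equality of the determinant form with the product of singular values, and the q = 1 reduction to Stein's prior, can be verified numerically (illustrative sketch, not from the slides):

```python
import numpy as np

def pi_svs(M):
    """pi_SVS(M) = det(M^T M)^{-(p-q-1)/2} for a p x q matrix M."""
    p, q = M.shape
    return np.linalg.det(M.T @ M) ** (-(p - q - 1) / 2)

rng = np.random.default_rng(2)
p, q = 6, 3
M = rng.standard_normal((p, q))
s = np.linalg.svd(M, compute_uv=False)
# det(M^T M) = prod_i sigma_i(M)^2, so the prior equals prod_i sigma_i(M)^{-(p-q-1)}
assert np.allclose(pi_svs(M), np.prod(s ** (-(p - q - 1))))

# q = 1: pi_SVS reduces to Stein's prior ||mu||^{2-p} = ||mu||^{-(p-2)}
mu = rng.standard_normal((p, 1))
assert np.allclose(pi_svs(mu), np.linalg.norm(mu) ** (2 - p))
```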
Comparison to other superharmonic priors
Previously proposed superharmonic priors mainly shrink to simple subsets (e.g. a point or a linear subspace).
In contrast, our prior shrinks to the set of low-rank matrices, which is nonlinear and nonconvex.
Theorem (M. and Komaki, 2015)
ΔπSVS(M) = 0 if M has full rank.
Therefore, the superharmonicity of πSVS is concentrated on the rank-deficient matrices, in the same way that the Laplacian of the Stein prior becomes a Dirac delta function at the origin.
An observation
James–Stein estimator: µ̂_JS = (1 − (p − 2)/‖x‖²) x
Stein's prior: π_S(µ) = ‖µ‖^{−(p−2)}
Efron–Morris estimator: σ̂ᵢ = (1 − (p − q − 1)/σᵢ²) σᵢ
Singular value shrinkage prior: πSVS(M) = Π_{i=1}^{q} σᵢ(M)^{−(p−q−1)}
Numerical results
Risk functions of Bayes estimators
▶ p = 5, q = 3
▶ dashed: uniform prior, solid: Stein's prior, dash-dot: our prior
σ₁ = 20, σ₃ = 0
[Figure: Frobenius risk as a function of σ₂ ∈ [0, 20]; risk axis ≈ 13–15]
πSVS works well when σ₂ is small, even if σ₁ is large.
▶ Stein's prior works well when ‖M‖²_F = σ₁² + σ₂² + σ₃² is small.
Numerical results
Risk functions of Bayes estimators
▶ p = 5, q = 3
▶ dashed: uniform prior, solid: Stein's prior, dash-dot: our prior
σ₂ = 0, σ₃ = 0
[Figure: Frobenius risk as a function of σ₁ ∈ [0, 20]; risk axis ≈ 0–15]
πSVS attains a constant risk reduction as long as σ₂ = σ₃ = 0, because it shrinks each singular value separately.
Therefore, it works well when M has low rank.
Remark: integral representation
When p > 2q, an integral representation of πSVS is obtained.
▶ dΣ: Lebesgue measure on the space of positive semidefinite matrices
πSVS(M) ∝ ∫ N_{p,q}(0, I_p ⊗ Σ) dΣ
cf. Stein's prior:
π_S(µ) = ‖µ‖^{2−p} ∝ ∫₀^∞ N_p(0, t I_p) dt
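The scalar case can be checked by numerical quadrature; the sketch below (with an ad-hoc integration grid) confirms that the scale mixture ∫₀^∞ N_p(0, t I_p) dt, evaluated at µ, scales as ‖µ‖^{2−p} for p = 5:

```python
import numpy as np

def scale_mixture(mu_norm, p):
    """Numerically integrate the N_p(0, t I_p) density at a point with
    ||mu|| = mu_norm over the scale t (grid choice is ad hoc)."""
    t = np.logspace(-4, 6, 200_000)
    f = (2 * np.pi * t) ** (-p / 2) * np.exp(-mu_norm**2 / (2 * t))
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))   # trapezoid rule

p = 5
ratio = scale_mixture(2.0, p) / scale_mixture(1.0, p)
# pi_S(mu) = ||mu||^{2-p}, so doubling ||mu|| should scale the integral by 2^{2-p} = 1/8
assert np.isclose(ratio, 2.0 ** (2 - p), rtol=1e-3)
```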
Additional shrinkage
Efron and Morris (1976) proposed an estimator that dominates M̂_EM via additional shrinkage toward the origin:
M̂_MEM = X ( I_q − (p − q − 1)(XᵀX)⁻¹ − (q² + q − 2)/tr(XᵀX) · I_q )
Motivated by this estimator, we propose another shrinkage prior:
πMSVS(M) = πSVS(M) ‖M‖_F^{−(q²+q−2)}
Theorem (M. and Komaki, 2017)
The prior πMSVS asymptotically dominates πSVS in both estimation and prediction.
Numerical results
p = 10, q = 3, σ₂ = σ₃ = 0 (rank 1)
black: π_I, blue: π_S, green: πSVS, red: πMSVS
Additional shrinkage improves risk when ‖M‖F is small.
Admissibility results
Theorem (M. and Strawderman)
The Bayes estimator with respect to πSVS is inadmissible.
The Bayes estimator with respect to πMSVS is admissible.
Proof: use Brown’s condition
Addition of column-wise shrinkage
πMSVS(M) = πSVS(M) Π_{j=1}^{q} ‖M_{·j}‖^{−q+1}
M_{·j}: j-th column vector of M
Theorem (M. and Komaki, 2017)
The prior πMSVS asymptotically dominates πSVS in both estimation and prediction.
This prior can be used for sparse reduced-rank regression:
Y = XB + E, E ∼ N_{n,q}(0, I_n ⊗ Σ)
→ B̂ = (XᵀX)⁻¹XᵀY ∼ N_{p,q}(B, (XᵀX)⁻¹ ⊗ Σ)
Stein’s recommendationEfron–Morris estimator
σi =
(1 −
p − q − 1σ2
i
)σi
Singular value shrinkage prior
πSVS(M) =
q∏i=1
σi(M)−(p−q−1)
Stein (1974, Section 5) recommends stronger shrinkage
σi =
(1 −
p + q − 2i − 1σ2
i
)σi
and says it dominates the Efron–Morris estimator.Corresponding prior ?
π(M) =
q∏i=1
σi(M)−(p+q−2i−1)
Empirical Bayes matrix completion
(Matsuda and Komaki, 2019)
Empirical Bayes viewpoint
The Efron–Morris estimator was derived as an empirical Bayes estimator.
M ∼ N_{p,q}(0, I_p ⊗ Σ) ⇔ M_{i·} ∼ N_q(0, Σ)
Y | M ∼ N_{p,q}(M, I_p ⊗ I_q) ⇔ Y_ij ∼ N(M_ij, 1)
Bayes estimator (posterior mean):
M̂_π(Y) = Y(I_q − (I_q + Σ)⁻¹)
Since YᵀY ∼ W_q(I_q + Σ, p) marginally,
E[(YᵀY)⁻¹] = (I_q + Σ)⁻¹ / (p − q − 1)
→ replace (I_q + Σ)⁻¹ in M̂_π(Y) by (p − q − 1)(YᵀY)⁻¹
→ Efron–Morris estimator
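The Wishart identity behind the plug-in step can be checked by Monte Carlo (illustrative sketch; the sample size and tolerance are ad hoc, and Σ is taken diagonal for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 50, 3
Sigma = np.diag([1.0, 2.0, 3.0])   # prior row covariance (diagonal, for simplicity)
S = np.eye(q) + Sigma              # marginal row covariance I_q + Sigma

acc = np.zeros((q, q))
n_mc = 4000
for _ in range(n_mc):
    Y = rng.standard_normal((p, q)) @ np.sqrt(S)   # rows ~ N_q(0, I + Sigma)
    acc += np.linalg.inv(Y.T @ Y)                  # Y^T Y ~ W_q(I + Sigma, p)
emp = acc / n_mc

# E[(Y^T Y)^{-1}] = (I + Sigma)^{-1} / (p - q - 1)
assert np.allclose(emp, np.linalg.inv(S) / (p - q - 1), atol=3e-4)
```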
Matrix completion
Netflix problem
▶ matrix of movie ratings by users
We want to estimate unobserved entries for recommendation → matrix completion.
Many studies have investigated its theory and algorithms.
Matrix completion
Low-rankness of the underlying matrix is crucial in matrix completion.
Existing algorithms exploit this low-rank property.
▶ SVT, SOFT-IMPUTE, OPTSPACE, Manopt, ...
e.g. the SVT algorithm
▶ ‖A‖_*: nuclear norm (sum of singular values)
minimize_M ‖M‖_* subject to |Y_ij − M_ij| ≤ E_ij, (i, j) ∈ Ω
→ sparse singular values (low rank)
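SVT-type methods are built on singular value soft-thresholding, the proximal operator of the nuclear norm; a minimal sketch of that operator (not the full SVT iteration):

```python
import numpy as np

def soft_threshold_sv(Y, tau):
    """Prox of tau * nuclear norm: shrink every singular value by tau (floored at 0)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Y = np.diag([3.0, 1.0, 0.5])
Z = soft_threshold_sv(Y, 1.0)
s_new = np.linalg.svd(Z, compute_uv=False)
assert np.allclose(s_new, [2.0, 0.0, 0.0])   # singular values sparsified -> low rank
```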
EB algorithm
We develop an empirical Bayes (EB) algorithm for matrix completion.
EB is based on the following hierarchical model:
▶ same as in the derivation of the Efron–Morris estimator
▶ C: scalar or diagonal matrix (unknown)
M ∼ N_{p,q}(0, I_p ⊗ Σ)
Y | M ∼ N_{p,q}(M, I_p ⊗ C)
Goal: estimate M from the observed entries of Y
▶ If Y is fully observed, this reduces to the previous problem.
→ EM algorithm!
EB algorithm
E step: estimate (Σ, C) from M and Y
M step: estimate M from Y and (Σ, C)
Iterate until convergence.
Both steps can be solved analytically.
▶ Sherman–Morrison–Woodbury formula
We obtain two algorithms, corresponding to whether C is scalar or diagonal.
EB requires no heuristic parameter tuning other than the convergence tolerance.
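As a toy illustration of the two steps in the fully observed case, here is a naive sketch with the noise variance fixed at c = 1; it is a simplified stand-in (method-of-moments Σ update projected onto PSD matrices), not the paper's exact algorithm:

```python
import numpy as np

def eb_iterate(Y, n_iter=10):
    """Schematic EB iteration for the fully observed model
    M ~ N_{p,q}(0, I_p x Sigma), Y | M ~ N_{p,q}(M, I_p x c I_q), c fixed at 1.
    Simplified stand-in for the slides' E/M steps, not the exact algorithm."""
    p, q = Y.shape
    c = 1.0
    M = Y
    for _ in range(n_iter):
        # "E step": estimate Sigma from the marginal moment Y^T Y / p ~ Sigma + c I
        w, V = np.linalg.eigh(Y.T @ Y / p - c * np.eye(q))
        Sigma = (V * np.maximum(w, 0.0)) @ V.T        # project onto PSD matrices
        # "M step": posterior mean M = Y (Sigma + c I)^{-1} Sigma
        M = Y @ np.linalg.solve(Sigma + c * np.eye(q), Sigma)
    return M

# With rows of M drawn from N_q(0, Sigma), the shrinkage estimate should beat Y itself.
rng = np.random.default_rng(4)
p, q = 2000, 3
Sigma_true = np.diag([9.0, 4.0, 1.0])
M_true = rng.standard_normal((p, q)) @ np.sqrt(Sigma_true)
Y = M_true + rng.standard_normal((p, q))
M_hat = eb_iterate(Y)
assert np.linalg.norm(M_hat - M_true) < np.linalg.norm(Y - M_true)
```

Handling missing entries would add an imputation step inside the loop (fill unobserved entries of Y with the current M), which is omitted here.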
Numerical results
Results on simulated data
▶ 1000 rows, 100 columns, rank = 30, 50% entries observed
▶ observation noise: homogeneous (R = I_q)
error time
EB (scalar) 0.26 4.33
EB (diagonal) 0.26 4.26
SVT 0.48 1.44
SOFT-IMPUTE 0.50 3.58
OPTSPACE 0.89 67.74
Manopt 0.89 0.17
EB has the best accuracy.
Numerical results: rank
Performance with respect to rank
▶ 1000 rows, 100 columns, 50% entries observed
▶ observation noise: unit variance
EB has the best accuracy when r ≥ 20.
Application to real data
Mice Protein Expression dataset
▶ expression levels of 77 proteins measured in the cerebral cortex of 1080 mice
▶ from the UCI Machine Learning Repository
error time
EB (scalar) 0.12 2.90
EB (diagonal) 0.11 3.35
SVT 0.84 0.17
SOFT-IMPUTE 0.29 2.14
OPTSPACE 0.33 12.39
Manopt 0.64 0.19
EB attains the best accuracy.
Future work (tensor case)
How about tensors?
X = (X_ijk)
For tensors, even the definition of rank or singular values is not clear.
Hopefully, some empirical Bayes method will provide a natural shrinkage for tensors.
Summary
Efron–Morris estimator (Efron and Morris, 1972)
M̂_EM(X) = X(I_q − (p − q − 1)(XᵀX)⁻¹)
minimax estimator of a normal mean matrix
natural extension of the James–Stein estimator
↓
Singular value shrinkage prior (M. and Komaki, Biometrika 2015)
πSVS(M) = det(MᵀM)^{−(p−q−1)/2}
superharmonic (ΔπSVS ≤ 0), natural generalization of the Stein prior
works well for low-rank matrices → reduced-rank regression
Empirical Bayes matrix completion (M. and Komaki, 2019)
estimate unobserved entries of a matrix by exploiting low-rankness