Sparse Eigen Methods by D.C. Programming
Bharath K. Sriperumbudur, David A. Torres and Gert R. G. Lanckriet
University of California, San Diego
International Conference on Machine Learning, 2007.
1. Introduction
   Generalized Eigenvalue Problem
   Why Sparsity?
2. Algorithm
   Sparse Generalized Eigenvalue Problem
   Prior Work
   Cardinality Approximation
   D.C. Programming
   Sparse Generalized Eigenvalue Algorithm
3. Experiments & Results
4. Conclusion & Future Work
Generalized Eigenvalue Problem
Eigenvalue problems are popular in machine learning and statistics.

Classification
  Fisher discriminant analysis (FDA)
Dimensionality reduction
  Principal component analysis (PCA)
  Canonical correlation analysis (CCA)
The variational formulation for the generalized eigenvalue problem is given by

    λmax(A, B) = max_x  x^T A x
                 s.t.   x^T B x = 1,        (1)

where x ∈ R^n, A ∈ S^n (symmetric) and B ≻ 0 (positive definite).
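Eq. (1) is solved by off-the-shelf generalized eigensolvers. As an illustrative sketch (the random A and B below are stand-ins, not data from the talk), SciPy's symmetric generalized eigensolver returns eigenvectors normalized so that x^T B x = 1:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                 # symmetric A
B = M @ M.T + 5 * np.eye(5)       # B positive definite

# eigh solves A x = lambda B x; eigenvalues come back in ascending order and
# eigenvectors are normalized so that x^T B x = 1
w, V = eigh(A, B)
lam_max, x = w[-1], V[:, -1]

assert np.isclose(x @ B @ x, 1.0)        # feasibility: x^T B x = 1
assert np.isclose(x @ A @ x, lam_max)    # attains lambda_max(A, B)
```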
Classification
Fisher Discriminant Analysis
finds a one-dimensional subspace onto which the projections of the data are maximally separated between the two classes (binary classification setting). The variational formulation for FDA is given by

    max_x  x^T (μ1 − μ2)(μ1 − μ2)^T x
    s.t.   x^T (Σ1 + Σ2) x = 1.        (2)

Here A = (μ1 − μ2)(μ1 − μ2)^T is the between-class variance and B = Σ1 + Σ2 is the within-class variance.
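Because A is rank one here, the maximizer of Eq. (2) has the familiar closed form x ∝ (Σ1 + Σ2)^{-1}(μ1 − μ2). A sketch on synthetic two-class data (the data below is illustrative, not from the talk) checks this against the generalized eigensolver:

```python
import numpy as np
from scipy.linalg import eigh, solve

rng = np.random.default_rng(1)
X1 = rng.standard_normal((100, 4)) + np.array([2.0, 0.0, 0.0, 0.0])  # class 1
X2 = rng.standard_normal((100, 4))                                    # class 2
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
d = mu1 - mu2
A = np.outer(d, d)                 # between-class variance (rank one)
B = np.cov(X1.T) + np.cov(X2.T)    # within-class variance: Sigma1 + Sigma2

w_closed = solve(B, d)             # closed form: x proportional to B^{-1}(mu1 - mu2)
w_eig = eigh(A, B)[1][:, -1]       # top generalized eigenvector

# the two directions agree up to sign and scale
cos = abs(w_closed @ w_eig) / (np.linalg.norm(w_closed) * np.linalg.norm(w_eig))
assert np.isclose(cos, 1.0)
```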
Dimensionality Reduction
Principal Component Analysis
identifies the direction of maximal variance in the data. The variational formulation for PCA is given by

    max_x  x^T Σ x
    s.t.   x^T x = 1.        (3)

Here A = Σ is the covariance matrix and B = I is the identity matrix.
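With B = I, Eq. (3) is an ordinary symmetric eigenvalue problem. A minimal sketch on synthetic data whose variance is largest along the first coordinate (the data is an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)
# variance is largest along the first coordinate by construction
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.3])
Sigma = np.cov(X.T)

# with B = I, Eq. (3) is a plain symmetric eigenvalue problem
w, V = np.linalg.eigh(Sigma)
pc1 = V[:, -1]                     # direction of maximal variance

assert np.isclose(pc1 @ pc1, 1.0)              # unit norm: x^T x = 1
assert np.argmax(np.abs(pc1)) == 0             # aligned with the high-variance axis
assert np.isclose(pc1 @ Sigma @ pc1, w[-1])    # attains the top eigenvalue
```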
Dimensionality Reduction
Canonical Correlation Analysis
a two-view analogue of PCA between spaces X and Y. The variational formulation for CCA is given by

    max_{wx, wy}  wx^T Sxy wy
    s.t.          wx^T Sxx wx = 1,  wy^T Syy wy = 1.        (4)

In the notation of Eq. (1),

    A = [ 0    Sxy ],    B = [ Sxx   0  ],    x = [ wx ],
        [ Syx   0  ]         [  0   Syy ]         [ wy ]

where

    S = [ Sxx  Sxy ]
        [ Syx  Syy ]

is the covariance matrix between samples from X and Y.
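One common way to compute canonical directions is to pass these block matrices to a generalized eigensolver; this solves a joint-normalization variant of Eq. (4) (a single constraint x^T B x = 1 instead of two), which yields the same directions. A sketch on synthetic two-view data sharing one coordinate (all data below is illustrative):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
z = rng.standard_normal((200, 1))                    # shared latent signal
X = np.hstack([z, rng.standard_normal((200, 2))])    # view X
Y = np.hstack([z, rng.standard_normal((200, 2))])    # view Y
p = X.shape[1]
S = np.cov(np.hstack([X, Y]).T)
Sxx, Syy = S[:p, :p], S[p:, p:]
Sxy, Syx = S[:p, p:], S[p:, :p]

# assemble the block matrices of Eq. (4)
Z = np.zeros((p, p))
A = np.block([[Z, Sxy], [Syx, Z]])
B = np.block([[Sxx, Z], [Z, Syy]])

x = eigh(A, B)[1][:, -1]
wx, wy = x[:p], x[p:]

# the leading canonical directions pick out the shared coordinate
assert np.argmax(np.abs(wx)) == 0
assert np.argmax(np.abs(wy)) == 0
```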
Why Sparsity?
Usually, the solutions of FDA, PCA and CCA are not sparse, which often makes the results difficult to interpret.

PCA/CCA: for better interpretability, a few relevant features are required that explain as much variance as possible. Applications: bioinformatics, finance, etc.

FDA: feature selection aids generalization performance by promoting sparse solutions.

Sparse representations ⇒ better interpretation, better generalization and reduced computational cost.
Sparse Generalized Eigenvalue Problem
The variational formulation for the sparse generalized eigenvalue problem is given by

    max_x  x^T A x
    s.t.   x^T B x = 1,
           ||x||_0 ≤ k,        (5)

where 1 ≤ k ≤ n and ||x||_0 is the cardinality (number of nonzero entries) of x.

Eq. (5) is non-convex and NP-hard, and therefore intractable in general.
Usually, the ℓ1-norm is used as a surrogate for the cardinality constraint.
Even with this relaxation, the problem remains computationally hard.
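To see what an exact solution of Eq. (5) entails: for a fixed support S, the problem reduces to a small generalized eigenvalue problem on the corresponding submatrices, so the exact optimum can be found by enumerating all supports. This brute-force sketch (random toy matrices, for illustration only) is exponential in n, which is the intractability referred to above:

```python
import numpy as np
from itertools import combinations
from scipy.linalg import eigh

def sparse_gev_bruteforce(A, B, k):
    """Exact solution of Eq. (5): for a fixed support S the optimum is
    lambda_max(A_SS, B_SS), so enumerate all (n choose k) supports.
    Exponential in n -- only feasible for toy problems."""
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for support in combinations(range(n), k):
        idx = list(support)
        w, V = eigh(A[np.ix_(idx, idx)], B[np.ix_(idx, idx)])
        if w[-1] > best_val:
            best_val = w[-1]
            best_x = np.zeros(n)
            best_x[idx] = V[:, -1]
    return best_val, best_x

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
B = M @ M.T + 6 * np.eye(6)

val_k2, x_k2 = sparse_gev_bruteforce(A, B, 2)
val_full = eigh(A, B)[0][-1]
assert np.count_nonzero(x_k2) <= 2
assert val_k2 <= val_full + 1e-10   # adding the cardinality constraint can only lower the optimum
```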
Sparse Generalized Eigenvalue Problem
[Figure: level sets of x^T A x together with the constraint set x^T B x = 1 — geometry of the generalized eigenvalue problem.]

KKT condition: Ax = λBx. The generalized eigenvalue problem is very special: its stationary points are exactly the generalized eigenvectors.
Sparse Generalized Eigenvalue Problem

[Figure: geometry of the sparse generalized eigenvalue problem in R^2 — level sets of x^T A x, the constraint set x^T B x = 1, and the ℓ1 ball ||x||_1 ≤ k, shown for k < 1 (infeasible), k = 1 (uninteresting), 1 < k < √2 (interesting), and k ≥ √2 (uninteresting).]

For x ∈ R^n:
  k < 1: infeasible.
  k = 1: uninteresting.
  1 < k < √n: interesting, but hard.
  k ≥ √n: uninteresting (reduces to the generalized eigenvalue problem).
Prior Work
SCoTLASS [Jolliffe et al., 2003]: ℓ1 approximation; a locally convergent algorithm for B = I.

DSPCA [d'Aspremont et al., 2005]: ℓ1 approximation, followed by lifting and a rank-constraint relaxation; a convex semidefinite program (SDP) for B = I. The variational formulation for DSPCA is given by

    max_{X ⪰ 0}  tr(AX)
    s.t.         tr(X) = 1,  1^T |X| 1 ≤ k.        (6)

SPCA [Zou et al., 2004]: an ℓ1-penalized regression algorithm for PCA using the elastic net; solved very efficiently using least-angle regression.
Prior Work
DSPCA is computationally very expensive:
  Interior-point methods: O(n^6 log(1/ε)).
  First-order methods: O(n^4 √(log n)/ε), where ε is the required accuracy on the optimal value.

Convexity vs. Scalability
  The SDP relaxation is the only possible convex approach, but it is prohibitively expensive for large n, and though convex, it is only an approximation to the true solution.
  The alternatives are locally convergent algorithms or expensive mixed-integer programs.

Currently, SPCA is the only viable option for handling very high-dimensional datasets (n on the order of 10,000).
Approximation to ||x||0
Two observations:
  The ℓ1-norm relaxation does not simplify Eq. (5) ⇒ a better approximation to cardinality would improve sparsity.
  The convex SDP approximation to Eq. (5) scales terribly in size ⇒ use a locally convergent algorithm with better scalability.

Eq. (5) can be written in penalized form as

    max_x  x^T A x − ρ ||x||_0
    s.t.   x^T B x ≤ 1,        (7)

where ρ ≥ 0.

Approximate ||x||_0 by Σ_{i=1}^n log(ε + |x_i|), where 0 < ε ≪ 1 avoids problems when one of the x_i is zero.
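A quick numerical illustration (the vectors below are chosen arbitrarily) of why Σ log(ε + |x_i|) acts as a cardinality surrogate: each zero entry contributes log ε ≪ 0, so vectors with more zeros receive a much smaller penalty term, and the gap between two vectors is roughly (−log ε) times their difference in cardinality:

```python
import numpy as np

def log_penalty(x, eps=1e-6):
    # surrogate for ||x||_0: each zero entry contributes log(eps),
    # each nonzero entry roughly log|x_i|
    return np.sum(np.log(eps + np.abs(x)))

sparse = np.array([1.0, 0.0, 0.0, 0.0])   # ||x||_0 = 1
dense = np.array([0.5, 0.5, 0.5, 0.5])    # ||x||_0 = 4

# the sparse vector receives a much smaller (more negative) penalty term
assert log_penalty(sparse) < log_penalty(dense)

# the gap is roughly (-log eps) * (difference in cardinality) = 3 here
gap = (log_penalty(dense) - log_penalty(sparse)) / (-np.log(1e-6))
assert abs(gap - 3) < 0.25
```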
Approximation to ||x||0
The approximation can be interpreted as placing a limiting Student's t-distribution prior over x (leading to an improper prior),

    p(x) ∝ ∏_{i=1}^n 1/|x_i|,

and computing its negative log-likelihood.

With this approximation, Eq. (7) reduces to

    max_x  x^T A x − ρ Σ_{i=1}^n log(ε + |x_i|)
    s.t.   x^T B x ≤ 1.        (8)
Approximation to ||x||0
Proposition
Let x̂ and x̌ be the maximizers of Eq. (7) and Eq. (8), respectively. If x̌ is independent of the choice of ε and δ ≤ |x̌_i| < ∞ for some fixed δ > 0, where i ∈ {j : x̌_j ≠ 0, 1 ≤ j ≤ n}, then

    ||x̌||_0 ≤ ||x̂||_0 + c_δ / log ε,

where c_δ is a constant depending on δ.

Consequently, the global maximizers of Eq. (7) and Eq. (8) have almost the same cardinality.
Approximation to ||x||0
With ε = 0, Eq. (8) can be written as

    max_{x,y}  x^T A x − ρ Σ_{i=1}^n log y_i
    s.t.       (x, y) ∈ F = {(x, y) : x^T B x ≤ 1, −y ⪯ x ⪯ y}.        (9)

Rewriting Eq. (9) as a minimization, we have

    min_{x,y}  I_F(x, y) − ( x^T A x − ρ Σ_{i=1}^n log y_i ),        (10)

where the convex function

    I_F(x, y) = 0 if (x, y) ∈ F,  ∞ if (x, y) ∉ F,

is the indicator function of the convex set F.
D.C. Programming
Definition
Let C be a convex subset of R^n. A real-valued function f : C → R is called a difference of convex functions (d.c.) on C if there exist two convex functions g, h : C → R such that f can be expressed in the form

    f(x) = g(x) − h(x).        (11)

The variational form of a d.c. programming problem is given by

    min_{x ∈ X}  f_0(x)
    s.t.         f_i(x) ≤ 0,  i = 1, ..., m,        (12)

where f_i = g_i − h_i, i = 0, ..., m, are d.c. functions and X is a closed convex subset of R^n.
D.C. Programming
Global optimization approaches such as branch and bound, cutting planes, etc. do not scale to large d.c. problems.

A robust and efficient d.c. minimization algorithm (DCA) was proposed by [Tao and An, 1998].
  DCA is a primal-dual subdifferential method for solving large-scale d.c. programs.
  Based on d.c. duality and local optimality, DCA solves a sequence of convex programs.
  DCA can be understood as the concave-convex procedure (CCCP) [Yuille and Rangarajan, 2003].
Sparse Generalized Eigenvalue Algorithm
Require: A ⪰ 0, B ≻ 0 and ρ ≥ 0
1: Choose x_0 ∈ {x : x^T B x ≤ 1} arbitrarily
2: repeat
3:   x̄* = argmax_x̄  x_l^T A D(x_l) x̄ − (ρ/2) ||x̄||_1
          s.t.  x̄^T D(x_l) B D(x_l) x̄ ≤ 1        (13)
4:   x_{l+1} = D(x_l) x̄*
5: until x_{l+1} = x_l
6: return x_l, x̄*

where D(x) = diag(x).

The algorithm solves a sequence of convex quadratically constrained quadratic programs (QCQPs).
Sparse Generalized Eigenvalue Algorithm
Proposition
Let ρ = 0, let x_l be the output of the sparse generalized eigenvalue algorithm, and let

    x̆ = argmax_x  x^T A x  s.t.  x^T B x = 1.

Then

    x_l^T A x_l = λmax(B^{-1}A)  and  x_l = x̆,

where λmax(B^{-1}A) is the maximum eigenvalue of B^{-1}A.

Local and global solutions are the same for ρ = 0.
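This is easy to check numerically: for ρ = 0 the QCQP step has a linear objective, so it admits a closed form, and (assuming all entries of x_l stay nonzero, so that D(x_l) is invertible) the update collapses to x_{l+1} ∝ B^{-1}A x_l, renormalized so that x^T B x = 1 — a generalized power iteration. A sketch with random matrices (illustrative stand-ins):

```python
import numpy as np
from scipy.linalg import eigh, solve

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
A = M @ M.T                        # A positive semidefinite, as the algorithm requires
N = rng.standard_normal((5, 5))
B = N @ N.T + 5 * np.eye(5)        # B positive definite

x = rng.standard_normal(5)
for _ in range(200):
    x = solve(B, A @ x)            # x_{l+1} proportional to B^{-1} A x_l
    x = x / np.sqrt(x @ B @ x)     # renormalize so that x^T B x = 1

# the iterate attains lambda_max(B^{-1} A), matching the proposition
assert np.isclose(x @ A @ x, eigh(A, B)[0][-1])
```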
Sparse Generalized Eigenvalue Algorithm
Proposition

Let B = I, A ⪰ 0 and ρ = 0. Then the sparse generalized eigenvalue algorithm reduces to the power method for eigenvalue computation.

We call the sparse generalized eigenvalue program with B = I DC-PCA.

Comparison to SCoTLASS

Using DCA, the variational formulation for SCoTLASS is given by

    x̄* = argmax_x̄  x_l^T A x̄ − (ρ/2) ||x̄||_1
         s.t.  x̄^T x̄ ≤ 1,        (14)

with x_{l+1} = x̄*. Eq. (14) differs from DC-PCA only in the multiplicative update by D(x_l). Therefore, DC-PCA yields at least as much sparsity as SCoTLASS.
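The subproblem in Eq. (14) — maximize c^T x̄ − (ρ/2)||x̄||_1 over the unit ℓ2 ball, with c = A x_l — has a well-known closed form: soft-threshold c at ρ/2, then renormalize. A sketch of this step (the vector c below is an arbitrary illustration), with an optimality sanity check against random feasible points:

```python
import numpy as np

def scotlass_step(c, rho):
    """Closed-form maximizer of c^T x - (rho/2)*||x||_1 over {x : ||x||_2 <= 1}:
    soft-threshold c at rho/2, then renormalize (an all-zero result stays zero)."""
    s = np.sign(c) * np.maximum(np.abs(c) - rho / 2.0, 0.0)
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

c = np.array([3.0, -1.0, 0.5, 0.05])
x = scotlass_step(c, rho=1.0)

# entries of c at or below the threshold rho/2 = 0.5 are zeroed out
assert np.count_nonzero(x) == 2
assert np.isclose(np.linalg.norm(x), 1.0)

# sanity check optimality against random feasible points
rng = np.random.default_rng(6)
obj = lambda v: c @ v - 0.5 * np.abs(v).sum()
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= max(1.0, np.linalg.norm(v))
    assert obj(x) >= obj(v) - 1e-9
```

Iterating x_{l+1} = scotlass_step(A @ x_l, rho) is then exactly the update of Eq. (14): larger ρ zeroes out more entries of A x_l and hence more loadings.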
Experiments & Results
Pit props data [Jeffers, 1967]
A benchmark dataset for testing sparse PCA algorithms.
180 observations and 13 measured variables.
The first 6 principal directions are considered, as they capture 87% of the total variance.

Algorithm   Sparsity pattern   Cumulative cardinality   Cumulative variance
SPCA        (7,4,4,1,1,1)      18                       75.8%
DSPCA       (6,2,3,1,1,1)      14                       75.5%
DC-PCA      (6,2,2,1,1,1)      13                       77.1%
DC-PCA      (7,4,4,1,1,1)      18                       81.8%
Pit Props
Figure: (a) cumulative variance and (b) cumulative cardinality for the first 6 sparse principal components (PCs), comparing DC-PCA, DSPCA, SPCA and PCA. DC-PCA* in (a) denotes DC-PCA evaluated at SPCA's sparsity pattern of (7,4,4,1,1,1).
Colon Cancer [Alon et al., 1999]
62 tissue samples (22 normal and 40 cancerous) with gene expression profiles of n = 2000 genes extracted from DNA micro-array data.

Figure: (a) cumulative variance and (b) cumulative cardinality (in units of 10^3) for the first 5 sparse principal components (PCs), comparing PCA, SPCA and DC-PCA.
Scalability
Randomly chosen problems of size n ranging from 10 to 2000.
Linux workstation with a 3 GHz CPU and 4 GB RAM.

[Figure: CPU time (sec) vs. problem size n on a log-log scale for SPCA, DC-PCA and DSPCA.]

The empirical complexity is O(n^p), where p = 1.46 for DC-PCA, p = 1.91 for SPCA and p = 3.92 for DSPCA.
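Empirical exponents like these are typically obtained by fitting a line to (log n, log t) pairs, since t(n) ≈ C·n^p gives log t ≈ p·log n + log C. A sketch with synthetic timings generated from the DC-PCA exponent (the real measurements are not reproduced here):

```python
import numpy as np

# the exponent p in t(n) = C * n^p is the slope of a least-squares
# line fitted in log-log space; timings here are synthetic stand-ins
n = np.array([10.0, 50.0, 100.0, 500.0, 1000.0, 2000.0])
t = 1e-4 * n ** 1.46
p, log_c = np.polyfit(np.log(n), np.log(t), 1)

assert np.isclose(p, 1.46)   # recovers the exponent the data was built with
```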
Controlling sparsity with ρ
DC-PCA & SPCA: setting ρ to achieve the required sparsity k is not straightforward.
DSPCA: k can be used directly in the semidefinite program.

[Figure: percentage of variance explained and number of nonzero loadings as functions of ρ.]
Conclusion
A sparse generalized eigenvalue algorithm based on d.c. programming is proposed.
The sparse PCA algorithm (DC-PCA) is derived as a special case.
DC-PCA achieves better sparsity and scalability than SPCA and DSPCA.

Sparsity parameter
  DC-PCA & SPCA: ρ is difficult to set.
  DSPCA: k is specified explicitly.

Quality of approximation
  It is difficult to characterize theoretically the quality of the DC-PCA solution.
  The quality of the convex relaxation can be studied for DSPCA [El Ghaoui, 2006].
Future Work
Extend the sparse generalized eigenvalue algorithm to any A ∈ S^n.
Explore path-following techniques to efficiently set the sparsity parameter ρ.
Investigate the sparsity paradigm for CCA, which has interesting applications in dictionary translation, music annotation, etc.
References
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues. Cell Biology, 96:6745–6750.

d'Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. G. (2005). A direct formulation for sparse PCA using semidefinite programming. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 41–48. MIT Press, Cambridge, MA.

El Ghaoui, L. (2006). On the quality of a semidefinite programming bound for sparse principal component analysis. arXiv.org.

Jeffers, J. (1967). Two case studies in the application of principal components. Applied Statistics, 16:225–236.

Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12:531–547.

Tao, P. D. and An, L. T. H. (1998). D.c. optimization algorithms for solving the trust region subproblem. SIAM J. Optim., pages 476–505.

Yuille, A. L. and Rangarajan, A. (2003). The concave-convex procedure. Neural Computation, pages 915–936.

Zou, H., Hastie, T., and Tibshirani, R. (2004). Sparse principal component analysis. Technical report, Statistics Department, Stanford University.
Questions
Thank You