sparse eigen methods by d.c. programming · introduction algorithm results conclusion sparse eigen...

36
Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G. Lanckriet University of California, San Diego International Conference on Machine Learning, 2007. Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Upload: others

Post on 16-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Eigen Methods by D.C. Programming

Bharath K. Sriperumbudur David A. TorresGert R. G. Lanckriet

University of California, San Diego

International Conference on Machine Learning, 2007.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 2: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

1 IntroductionGeneralized Eigenvalue ProblemWhy Sparsity?

2 AlgorithmSparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSparse Generalized Eigenvalue Algorithm

3 Experiments & Results

4 Conclusion & Future Work

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 3: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Generalized Eigenvalue ProblemSparsity

Generalized Eigenvalue Problem

Eigenvalue problems are popular in machine learning and statistics.Classification

Fisher discriminant analysis (FDA)

Dimensionality reduction

Principal component analysis (PCA)Canonical correlation analysis (CCA)

The variational formulation for the generalized eigenvalue problem isgiven by

λmax(A,B) = maxx

xTAx

s.t. xTBx = 1, (1)

where x ∈ Rn, A ∈ S

n and B ≻ 0.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 4: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Generalized Eigenvalue ProblemSparsity

Classification

Fisher Discriminant Analysis

finds a 1-D subspace onto which the projections of data lead to amaximal separation between classes (binary classification setting).The variational formulation for FDA is given by

maxx

xT (µ1 − µ2)(µ1 − µ2)

Tx

s.t. xT (Σ1 + Σ2)x = 1. (2)

A = (µ1 − µ2)(µ1 − µ2)T is the between-cluster variance and

B = Σ1 + Σ2 is the within-cluster variance.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 5: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Generalized Eigenvalue ProblemSparsity

Dimensionality Reduction

Principal Component Analysis

identifies the direction of maximal variance.The variational formulation for PCA is given by

maxx

xTΣx

s.t. xTx = 1. (3)

A = Σ is the covariance matrix and B = I is the identity matrix.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 6: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Generalized Eigenvalue ProblemSparsity

Dimensionality Reduction

Canonical Correlation Analysis

two-view PCA between spaces X and Y .The variational formulation for CCA is given by

maxwx , wy

wTx Sxywy

s.t. wTx Sxxwx = 1 ,wT

y Syywy = 1. (4)

A =

(

0 Sxy

Syx 0

)

, B =

(

Sxx 0

0 Syy

)

and x =

(

wx

wy

)

where

S =

(

Sxx Sxy

Syx Syy

)

is the covariance matrix between samples from

X and Y .

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 7: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Generalized Eigenvalue ProblemSparsity

Why Sparsity?

Usually, the solutions of FDA, PCA and CCA are not sparse.

This often makes it difficult to interpret the results.

PCA/CCA: For better interpretability, few relevant features arerequired that explain as much variance as possible.

Applications: bio-informatics, finance etc.

FDA: feature selection aids generalization performance by promotingsparse solutions.

Sparse representation ⇒ better interpretation, better generalizationand reduced computational costs.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 8: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

The variational formulation for the sparse generalized eigenvalueproblem is given by

maxx

xTAx

s.t. xTBx = 1

||x||0 ≤ k, (5)

where 1 ≤ k ≤ n and ||x||0 is the cardinality of x.

Eq. (5) is non-convex, NP-hard and therefore intractable.

Usually, the ℓ1 approximation is used for the cardinality constraint.

The problem is still computationally hard.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 9: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

Level sets of xTAx

Geometry of the generalized eigenvalue problem

KKT condition: Ax = λBx. The generalized eigenvalue problem is veryspecial.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 10: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

k=sqrt(2)

Geometry of the sparse generalized eigenvalue problem

||x||1 ≤ k

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 11: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

k>sqrt(2)

Geometry of the sparse generalized eigenvalue problem

||x||1 ≤ k; uninteresting

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 12: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

k=1

Geometry of the sparse generalized eigenvalue problem

||x||1 ≤ k; uninteresting

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 13: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

k<1

Geometry of the sparse generalized eigenvalue problem

||x||1 ≤ k; infeasible

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 14: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

1<k<sqrt(2)

Geometry of the sparse generalized eigenvalue problem

||x||1 ≤ k; interesting

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 15: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Problem

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

1<k<sqrt(2)

Geometry of the sparse generalized eigenvalue problem

For x ∈ Rn

k < 1, infeasiblek = 1, uninteresting1 < k <

√n, interesting but hard

k ≥ √n, uninteresting (generalized eigenvalue problem).

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 16: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Prior Work

SCoTLASS [Jolliffe et al., 2003]: ℓ1 approximation; locallyconvergent algorithm for B = I.

DSPCA [d’Aspremont et al., 2005]: ℓ1 approximation, followed bylifting and rank constraint relaxation; a convex semidefinite program(SDP) for B = I.

The variational formulation for DSPCA is given by

maxX�0

tr(AX)

s.t. tr(X) = 1, 1T |X|1 ≤ k, (6)

SPCA [Zou et al., 2004]: ℓ1-penalized regression algorithm for PCAusing elastic-net; solved very efficiently using least angle regression.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 17: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Prior Work

DSPCA is computationally very expensive.

Interior point methods: O(n6 log(1/ǫ)).First-order methods: O(n4

log(n)/ǫ), where ǫ is the requiredaccuracy on the optimal value.

Convexity vs. Scalability

SDP relaxation is the only possible convex approach.prohibitively expensive for large n.though convex, it is only an approximation to the true solution.

Options: locally convergent algorithms / expensive mixed-integerprograms.

Currently, SPCA is the only viable option for handling veryhigh-dimensional datasets (on the order of 10,000).

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 18: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Approximation to ||x||0

Two observations

The ℓ1-norm relaxation does not simplify Eq. (5) ⇒ a betterapproximation to cardinality would improve sparsity.The convex SDP approximation to Eq. (5) scales terribly in size ⇒use a locally convergent algorithm with better scalability.

Eq. (5) can be written as

maxx

xTAx − ρ ||x||0

s.t. xTBx≤ 1, (7)

where ρ ≥ 0.

Approximate ||x||0 by∑n

i=1 log(ε + |xi |), where 0 ≤ ε ≪ 1 avoidsproblems when one of the xi is zero.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 19: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Approximation to ||x||0

The approximation can be interpreted as defining a limitingStudent’s t-distribution prior over x (leading to an improper prior)given by

p(x) ∝

n∏

i=1

1

|xi |

and computing its negative log-likelihood.

Eq. (7) reduces to

maxx

xTAx − ρ

n∑

i=1

log(ε + |xi |)

s.t. xTBx ≤ 1. (8)

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 20: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Approximation to ||x||0

Proposition

Let x̂ and x̌ be the maximizers of Eq. (7) and Eq. (8) respectively. If x̌ isindependent of the choice of ε and δ ≤ |x̌i | < ∞ for some fixed δ > 0,where i = {j : |x̌j | 6= 0, 1 ≤ j ≤ n}, then

||x̌||0 ≤ ||x̂||0 +cδ

log ε,

where cδ is a constant dependent on δ.

The global maximizers of Eq. (7) and Eq. (8) have almost the samecardinality.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 21: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Approximation to ||x||0

With ε = 0, Eq. (8) can be written as

maxx,y

xTAx − ρ

n∑

i=1

log yi

s.t. (x, y) ∈ F = {(x, y) : xTBx ≤ 1 ,−y � x � y}. (9)

Rewriting Eq. (9), we have

minx,y IF (x, y) −

(

xTAx − ρn∑

i=1

log yi

)

, (10)

where the convex function IF =

{

0 (x, y) ∈ F

∞ (x, y) /∈ Fis the indicator

function of the convex set, F .

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 22: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

D.C. Programming

Definition

Let C be a convex subset of Rn. A real-valued function f : C → R is

called difference of convex functions (d.c.) on C , if there exist twoconvex functions g , h : C → R such that f can be expressed in the form

f (x) = g(x) − h(x). (11)

The variational form of d.c. programming problems is given by

minx∈X

f0(x)

s.t. fi (x) ≤ 0, i = 1, . . . ,m, (12)

where fi = gi − hi , i = 0, . . . ,m, are d.c. functions and X is aclosed convex subset of R

n.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 23: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

D.C. Programming

Global optimization approaches such as branch and bound, cuttingplanes, etc. are not scalable to large-scale d.c. problems.

A robust and efficient d.c. minimization algorithm (DCA) wasproposed by [Tao and An, 1998].

DCA is a primal-dual subdifferential method for solving large-scaled.c. programs.

Based on the d.c. duality and the local optimality, DCA solves asequence of convex programs.

DCA can be understood as the convex-concave procedure(CCCP) [Yuille and Rangarajan, 2003].

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 24: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Algorithm

Require: A � 0, B ≻ 0 and ρ ≥ 01: Choose x0 ∈ {x : xTBx ≤ 1} arbitrarily2: repeat

3:

x̄∗ = arg maxx̄

xTl AD(xl)x̄ − ρ

2 ||x̄||1

s.t. x̄TD(xl )BD(xl )x̄ ≤ 1 (13)

4: xl+1 = D(xl )x̄∗

5: until xl+1 = xl

6: return xl , x̄∗

where D(x) = diag(x).

solves a sequence of convex quadratically constrained quadraticprograms (QCQPs).

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 25: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Algorithm

Proposition

Let ρ = 0, xl be the output of sparse generalized eigenvalue algorithmand

x̆ = arg maxx

xTAx

s.t. xTBx = 1.

ThenxT

l Axl = λmax(B−1A) and xl = x̆,

where λmax(B−1A) is the maximum eigenvalue of B−1A.

Local and global solutions are the same for ρ = 0.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 26: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Sparse Generalized Eigenvalue ProblemPrior WorkCardinality ApproximationD.C. ProgrammingSolution

Sparse Generalized Eigenvalue Algorithm

Proposition

Let B = I, A � 0 and ρ = 0. Then sparse generalized eigenvaluealgorithm is the power method for eigenvalue computation.

We call the sparse generalized eigenvalue program for B = I asDC-PCA.

Comparison to SCoTLASS

Using DCA, the variational formulation for SCoTLASS is given by

x̄∗ = arg max

x̄x

Tl Ax̄ − ρ

2||x̄||1

s.t. x̄Tx̄ ≤ 1, (14)

with xl+1 = x̄∗.Eq. (14) differs from DC-PCA in the multiplicative update.Therefore, DC-PCA yields atleast as much sparsity as SCoTLASS.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 27: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Experiments & Results

Pit props data [Jeffers, 1967]

A benchmark data to test sparse PCA algorithms.180 observations and 13 measured variables.6 principal directions are considered as they capture 87% of the totalvariance.

Algorithm Sparsity pattern Cumulative cardinality Cumulative varianceSPCA (7,4,4,1,1,1) 18 75.8%

DSPCA (6,2,3,1,1,1) 14 75.5%DC-PCA (6,2,2,1,1,1) 13 77.1%DC-PCA (7,4,4,1,1,1) 18 81.8%

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 28: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Pit Props

1 2 3 4 5 6

30

40

50

60

70

80

Number of principal components

Cum

ulat

ive

varia

nce

(%)

DC−PCADSPCASPCAPCADC−PCA*

(a)

1 2 3 4 5 66

8

10

12

14

16

18

Number of principal components

Cum

ulat

ive

card

inal

ity

DC−PCADSPCASPCA

(b)

Figure: (a) cumulative variance (b) cumulative cardinality for first 6 sparseprincipal components (PCs). DC-PCA∗ in (a) represents DC-PCA evaluated atSPCA’s sparsity pattern of (7,4,4,1,1,1).

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 29: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Colon Cancer [Alon et al., 1999]

62 tissue samples (22 normal and 40 cancerous) with gene expressionprofiles of n = 2000 genes extracted from DNA micro-array data.

1 2 3 4 5

40

45

50

55

60

65

70

Number of principal components

Cum

ulat

ive

varia

nce

(%)

PCADC−PCASPCA

(a)

1 2 3 4 52

4

6

8

10

Number of principal componentsC

umul

ativ

e ca

rdin

ality

(in

103 )

PCASPCADC−PCA

(b)

Figure: (a) cumulative variance (b) cumulative cardinality for first 5 sparseprincipal components (PC).

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 30: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Scalability

Randomly chosen problems of size n ranging from 10 to 2000.

Linux 3 GHz, 4 GB RAM workstation.

101

102

103

10−2

100

102

104

Problem size, n

CP

U ti

me

(sec

)

SPCADC−PCADSPCA

The empirical complexity is O(np) where p = 1.46 for DC-PCA, p = 1.91for SPCA and p = 3.92 for DSPCA.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 31: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Controlling sparsity with ρ

DC-PCA & SPCA

setting ρ is not straight forward to achieve the required sparsity, k.

DSPCA

k can be directly used in the semidefinite program.

0 0.5 1 1.50

5

10

15

20

25

30

35

Per

cent

age

of v

aria

nce

expl

aine

d

ρ0 0.5 1 1.5

0

2

4

6

8

10

12

14

Num

ber

of n

onze

ro lo

adin

gs

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 32: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Conclusion

A sparse generalized eigenvalue algorithm using d.c. programming isproposed.

The sparse PCA algorithm (DC-PCA) is derived as a special case.

DC-PCA has better sparsity and scalability than SPCA and DSPCA.

Sparsity parameter

DC-PCA & SPCA: difficult to set ρ.DSPCA: k is explicitly mentioned.

Quality of approximation

Difficult to characterize (theoretically) the quality of DC-PCAsolution.The quality of convex relaxation can be studied forDSPCA [El Ghaoui, 2006].

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 33: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Future Work

Extend the sparse generalized eigenvalue algorithm to any A ∈ Sn.

Explore path following techniques to efficiently set the sparsityparameter, ρ.

Investigate the sparsity paradigm for CCA which has interestingapplications in dictionary translation, music annotation, etc.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 34: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

References

Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999).Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues.Cell Biology, 96:6745–6750.

d’Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. G. (2005).A direct formulation for sparse PCA using semidefinite programming.In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 41–48, Cambridge, MA.MIT Press.

El Ghaoui, L. (2006).On the quality of a semidefinite programming bound for sparse principal component analysis.arXive.org.

Jeffers, J. (1967).Two case studies in the application of principal components.Applied Statistics, 16:225–236.

Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. (2003).A modified principal component technique based on the LASSO.Journal of Computational and Graphical Statistics, 12:531–547.

Tao, P. D. and An, L. T. H. (1998).D.c. optimization algorithms for solving the trust region subproblem.SIAM J. Optim., pages 476–505.

Yuille, A. L. and Rangarajan, A. (2003).The concave-convex procedure.Neural Computation, pages 915–936.

Zou, H., Hastie, T., and Tibshirani, R. (2004).Sparse principal component analysis.Technical report, Statistics Department, Stanford University.

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 35: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Questions

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming

Page 36: Sparse Eigen Methods by D.C. Programming · Introduction Algorithm Results Conclusion Sparse Eigen Methods by D.C. Programming Bharath K. Sriperumbudur David A. Torres Gert R. G

IntroductionAlgorithm

ResultsConclusion

Thank You

Bharath K. Sriperumbudur Sparse Eigen Methods by D.C. Programming