A penalized matrix decomposition, and its applications

Daniela M. Witten
Thesis Defense
Department of Statistics, Stanford University
June 7, 2010



  • Sparsity

    - Consider a high-dimensional regression problem: we wish to predict y ∈ R^n using X ∈ R^(n×p), where p may be quite large.

    - We can use an L1 or lasso penalty to fit the model y = Xβ + ε in a way that gives sparsity:

      β̂ = argmin_β { ||y - Xβ||_2^2 + λ||β||_1 }

    - An active area of research: Lasso (Tibshirani 1996), Basis Pursuit (Chen, Donoho, and Saunders 1998), LARS (Efron, Hastie, Johnstone, and Tibshirani 2004), Adaptive Lasso (Zou 2006), Group Lasso (Yuan and Lin 2006), Dantzig selector (Candes and Tao 2007), Relaxed Lasso (Meinshausen 2008)

    - Today: sparsity in the unsupervised setting.
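    As a minimal sketch of the lasso criterion above (not part of the talk: the coordinate-descent helper and all data sizes and the value of λ below are illustrative assumptions):

    ```python
    import numpy as np

    def soft(a, c):
        # Soft-thresholding operator: sign(a) * max(0, |a| - c)
        return np.sign(a) * np.maximum(np.abs(a) - c, 0.0)

    def lasso_cd(X, y, lam, n_sweeps=100):
        # Coordinate descent for min_b ||y - Xb||_2^2 + lam * ||b||_1:
        # each coordinate update is a soft-thresholded least-squares fit.
        n, p = X.shape
        b = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)
        for _ in range(n_sweeps):
            for j in range(p):
                r_j = y - X @ b + X[:, j] * b[j]       # partial residual excluding feature j
                b[j] = soft(X[:, j] @ r_j, lam / 2.0) / col_sq[j]
        return b

    rng = np.random.default_rng(0)
    n, p = 50, 100                                      # p >> n: the high-dimensional setting
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                                      # only 5 features carry signal
    y = X @ beta + 0.1 * rng.standard_normal(n)

    bhat = lasso_cd(X, y, lam=20.0)
    print(np.sum(bhat != 0))                            # most coefficients are exactly zero
    ```

    Larger λ zeroes out more coefficients; λ = 0 recovers ordinary least squares when it is defined.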


  • Matrix Decompositions

    Consider an n × p matrix X for which we want a low-rank approximation. For simplicity, assume that the row and column means of X are zero.

  • Matrix Decompositions

    We might want this low-rank approximation in order to

    1. obtain a lower-dimensional projection of the data that captures most of the variability,

    2. achieve a better understanding and interpretation of the data, or

    3. impute missing values, e.g. for movie recommender systems.

  • The singular value decomposition

    We decompose the matrix X as

      X = U D V^T

    where U and V have orthonormal columns and D is diagonal, with d_1 ≥ d_2 ≥ ... ≥ d_p ≥ 0.
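    As a concrete sketch (illustrative; numpy's SVD stands in for the decomposition above, and the matrix is a small random example):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((6, 4))
    X -= X.mean(axis=0)                   # center the columns
    X -= X.mean(axis=1, keepdims=True)    # then the rows (columns stay centered)

    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U D V^T

    # Singular values come out ordered: d[0] >= d[1] >= ... >= 0.
    # The leading triple gives the best rank-1 approximation in Frobenius norm,
    # with squared error equal to the sum of the remaining squared singular values.
    X1 = d[0] * np.outer(U[:, 0], Vt[0])
    resid_sq = np.linalg.norm(X - X1, "fro") ** 2
    print(resid_sq, np.sum(d[1:] ** 2))
    ```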

  • A sparse matrix decomposition

    The SVD has many useful and interesting properties, but in general, the columns of U and V are not sparse; that is, no elements of U and V are exactly zero.

    We want a matrix decomposition with sparse elements, for conciseness, parsimony, and interpretability.

  • Example of the sparse matrix decomposition: Netflix Data

  • Netflix recommendations [figure]

  • Netflix Data

    Over 100 million ratings given by 480,000 users to 18,000 movies.

  • Netflix Results

    Lord of the Rings: The Fellowship of the Ring
    Lord of the Rings: The Two Towers: Extended Edition
    Lord of the Rings: The Fellowship of the Ring: Extended Edition
    Lord of the Rings: The Two Towers
    Lord of the Rings: The Return of the King
    Lord of the Rings: The Return of the King: Extended Edition
    Star Wars: Episode V: The Empire Strikes Back
    Star Wars: Episode VI: Return of the Jedi
    Star Wars: Episode IV: A New Hope
    Raiders of the Lost Ark

  • Applications of the penalized matrix decomposition

    Input matrix               Result
    Data                       data interpretation; missing value imputation / matrix completion
    Variance-covariance        sparse PCA
    Cross-products             sparse CCA
    Dissimilarity              sparse clustering
    Between-class covariance   sparse LDA

  • Criterion for the Singular Value Decomposition

    Recall that the first components u, v, and d of the SVD comprise the best rank-1 approximation to the matrix X, in the sense of the Frobenius norm:

      minimize_{u,v,d} ||X - d u v^T||_F^2 subject to ||u||_2 = 1, ||v||_2 = 1

  • Criterion for the Penalized Matrix Decomposition

    Suppose we add additional penalty terms to that criterion:

      minimize_{u,v,d} ||X - d u v^T||_F^2
      subject to ||u||_2 = ||v||_2 = 1, P_1(u) ≤ c_1, P_2(v) ≤ c_2,

    where P_1 and P_2 are arbitrary penalty functions. We call this the rank-one penalized matrix decomposition.

    For now, let P_1(u) = ||u||_1 and P_2(v) = ||v||_1.

    This encourages sparsity: the sparse matrix decomposition.

    This is related to a proposal of Shen and Huang (2008).


  • More on the Rank-One PMD Model

    - Note that the u, v that minimize

        ||X - d u v^T||_F^2 subject to ||u||_2 = ||v||_2 = 1

      also maximize

        u^T X v subject to ||u||_2 ≤ 1, ||v||_2 ≤ 1.

    - This means that we can rewrite the rank-one PMD criterion as

        maximize_{u,v} u^T X v subject to ||u||_2 ≤ 1, ||v||_2 ≤ 1, ||u||_1 ≤ c_1, ||v||_1 ≤ c_2.

    - With u fixed, the criterion is convex in v, and with v fixed, it is convex in u. This biconvexity leads to a convenient iterative algorithm!


  • Algorithm for Sparse Matrix Decomposition

    1. Initialize v to satisfy the constraints ||v||_2 = 1, ||v||_1 ≤ c_2.

    2. Iterate until convergence:

       - u ← argmax_u u^T X v subject to ||u||_1 ≤ c_1, ||u||_2 ≤ 1.
       - v ← argmax_v u^T X v subject to ||v||_1 ≤ c_2, ||v||_2 ≤ 1.

    For c_1 and c_2 sufficiently small, the resulting u and v will be sparse.

    In the absence of L1 penalties, this yields the rank-one SVD.

  • Soft-thresholding

    To update u with v held fixed, we must solve

      u ← argmax_u u^T X v subject to ||u||_1 ≤ c_1, ||u||_2 ≤ 1.

    It turns out that the solution simply involves soft-thresholding:

      u = S(Xv, Δ) / ||S(Xv, Δ)||_2

    where S(a, Δ) = sgn(a) max(0, |a| - Δ), applied componentwise, and Δ ≥ 0 is chosen so that ||u||_1 ≤ c_1 (Δ = 0 if the constraint is already satisfied).
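    The alternating updates, with the threshold Δ found by binary search, can be sketched as follows. This is an illustrative reimplementation, not the thesis code; the helper names, iteration counts, and problem sizes are all assumptions.

    ```python
    import numpy as np

    def soft(a, d):
        # Componentwise soft-thresholding S(a, d) = sgn(a) * max(0, |a| - d)
        return np.sign(a) * np.maximum(np.abs(a) - d, 0.0)

    def unit_with_l1(a, c):
        # Return S(a, d)/||S(a, d)||_2, with d >= 0 chosen by binary search
        # so that the result satisfies ||.||_1 <= c (assumes 1 <= c <= sqrt(len(a))).
        u = a / np.linalg.norm(a)
        if np.abs(u).sum() <= c:
            return u                         # L1 constraint inactive: d = 0
        lo, hi = 0.0, np.max(np.abs(a))
        for _ in range(60):
            mid = (lo + hi) / 2.0
            s = soft(a, mid)
            nrm = np.linalg.norm(s)
            if nrm > 0 and np.abs(s).sum() / nrm > c:
                lo = mid                     # still too dense: raise the threshold
            else:
                hi = mid
        s = soft(a, hi)
        return s / np.linalg.norm(s)

    def pmd_rank1(X, c1, c2, n_iter=50):
        # Alternate the two closed-form updates:
        #   u <- argmax u'Xv s.t. ||u||_2 <= 1, ||u||_1 <= c1  (soft-threshold Xv)
        #   v <- argmax u'Xv s.t. ||v||_2 <= 1, ||v||_1 <= c2  (soft-threshold X'u)
        v = unit_with_l1(np.linalg.svd(X)[2][0], c2)   # feasible starting v
        for _ in range(n_iter):
            u = unit_with_l1(X @ v, c1)
            v = unit_with_l1(X.T @ u, c2)
        return u, v, u @ X @ v

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 30))
    u, v, d = pmd_rank1(X, c1=2.0, c2=2.0)
    print(np.sum(u == 0), np.sum(v == 0))    # many exact zeros: a sparse rank-1 factor
    ```

    Setting c_1 = sqrt(n) and c_2 = sqrt(p) makes both L1 constraints inactive, and the loop reduces to the power method for the leading singular vectors.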

  • L1 and L2 penalties

    [Video: movie.mpg, illustrating the L1 and L2 penalties]

  • L1 and L2 penalties

    The story in three dimensions [figure]

  • Algorithm in action

    [Figures: the alternating algorithm, showing successive updates of u and v]

  • Extension to Rank-K Decomposition

    - To get the rank-K decomposition, we simply subtract the rank-(K-1) decomposition from the original data matrix X, and apply the rank-1 decomposition to the residuals.

    - In the absence of L1 penalties, this gives the rank-K SVD.
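    A minimal sketch of the deflation idea, using the unpenalized rank-1 SVD as the inner step; with L1 penalties, the rank-1 PMD would be substituted for `rank1` below (the function names and example sizes are illustrative):

    ```python
    import numpy as np

    def rank1(X):
        # Best rank-1 approximation of X: the leading SVD triple (d, u, v).
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return s[0], U[:, 0], Vt[0]

    def rank_k(X, K):
        # Rank-K decomposition by deflation: fit a rank-1 term to the
        # current residual, subtract it off, and repeat K times.
        R = X.copy()
        factors = []
        for _ in range(K):
            d, u, v = rank1(R)
            factors.append((d, u, v))
            R = R - d * np.outer(u, v)
        return factors, R

    rng = np.random.default_rng(2)
    X = rng.standard_normal((8, 5))
    factors, R = rank_k(X, 2)
    # Without penalties, X minus the rank-2 truncated SVD of X equals R.
    ```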

  • Selection of tuning parameters c_1 and c_2

    - Selection of tuning parameters in unsupervised problems is very difficult.

    - We leave out scattered elements of X and choose the tuning parameters such that our low-rank approximation to X optimally estimates the left-out elements.

    - Closely related to proposals by Owen and Perry (2009) and Wold (1978).
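    The leave-out-scattered-elements idea can be sketched as follows: an illustrative EM-style imputation loop around the plain rank-1 SVD. In the actual method, the penalized fit would be used and the held-out error computed over a grid of (c_1, c_2) values; the sizes and noise level here are assumptions.

    ```python
    import numpy as np

    def rank1_impute(X, mask, n_iter=100):
        # Treat the masked entries of X as missing: alternately fill them in
        # from the current rank-1 fit and refit (an EM-style loop).
        Z = np.where(mask, 0.0, X)                 # start left-out entries at 0
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            fit = s[0] * np.outer(U[:, 0], Vt[0])
            Z = np.where(mask, fit, X)             # refill only the left-out entries
        return fit

    rng = np.random.default_rng(3)
    u = rng.standard_normal(30)
    v = rng.standard_normal(20)
    X = np.outer(u, v) + 0.1 * rng.standard_normal((30, 20))

    mask = rng.random(X.shape) < 0.1               # leave out ~10% of the entries
    fit = rank1_impute(X, mask)
    cv_err = np.mean((fit[mask] - X[mask]) ** 2)   # held-out error for this model
    print(cv_err)
    ```

    Repeating this over a grid of tuning parameters and picking the minimizer of `cv_err` gives the selection rule described above.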


  • Example of the Sparse Matrix Decomposition: Netflix Data

  • Netflix Data: Factor 1 - All movies have negative weights

    Lord of the Rings: The Fellowship of the Ring
    Lord of the Rings: The Two Towers: Extended Edition
    Lord of the Rings: The Fellowship of the Ring: Extended Edition
    Lord of the Rings: The Two Towers
    Lord of the Rings: The Return of the King
    Lord of the Rings: The Return of the King: Extended Edition
    Star Wars: Episode V: The Empire Strikes Back
    Star Wars: Episode VI: Return of the Jedi
    Star Wars: Episode IV: A New Hope
    Raiders of the Lost Ark

  • Netflix Data: Factor 5 - Movies with positive weights

    Austin Powers in Goldmember
    Austin Powers: International Man of Mystery
    Austin Powers: The Spy Who Shagged Me
    The Nutty Professor
    Big Momma's House
    Wild Wild West
    Dodgeball: A True Underdog Story
    Anchorman: The Legend of Ron Burgundy
    Mr. Deeds
    Punch-Drunk Love
    Anger Management
    Moulin Rouge
    Spaceballs

  • Netflix Data: Factor 5 - Movies with negative weights

    Star Wars: Episode V: The Empire Strikes Back
    Lord of the Rings: The Two Towers: Extended Edition
    Lord of the Rings: The Fellowship of the Ring: Extended Edition
    Lord of the Rings: The Return of the King: Extended Edition
    Raiders of the Lost Ark
    The Silence of the Lambs
    Rain Man
    We Were Soldiers
    The Godfather
    The Shawshank Redemption: Special Edition
    Saving Private Ryan
    E.T. the Extra-Terrestrial: The 20th Anniversary (Rerelease)
    Finding Nemo (Widescreen)


  • Hierarchical clustering

    There has been a resurgence of interest in hierarchical clustering in the field of genomics.

  • Clustering when p ≫ n

    Suppose we wish to cluster n observations on p features, where p ≫ n.

    - Hierarchical clustering is very subjective: the answer you get depends on what set of features you use. We want a principled way to choose a set of features to use in clustering.

    - If the true classes that we wish to identify are defined on only a subset of the features, then the presence of noise features can obscure this signal. We want a way to adaptively choose the signal features to use in clustering.


  • Example

    A simple example with 10 observations; 2 classes are defined on 10 important features.

  • Example: 10 important features; 10 features total [figure]

  • Example: 10 important features; 500 features total [figure]

  • Example: 10 important features; 5000 features total [figure]

  • Sparse hierarchical clustering results: 10 important features; 5000 features total [figure]

  • Sparse Clustering

    We want a method to hierarchically cluster observations based on a small subset of the features; we will call this sparse hierarchical clustering.

    We want an automated way to

    - find a subset of features to use in the clustering, and

    - obtain a more accurate or interesting clustering using that subset of features.

    Assumption: the dissimilarity measure used is additive in the features: D_{i,i′} = Σ_{j=1}^p d_{i,i′,j}.


  • Dissimilarity matrix for the n observations [figure]

  • Dissimilarity matrix is a sum of dissimilarity matrices over the features [figure]

  • Hierarchical clustering sums the dissimilarity matrices for the features [figure]

  • Weighted sum of the dissimilarity matrices for the features [figure]

  • Sparse hierarchical clustering and the PMD

    Let D denote the n² × p matrix for which column j is the vectorized feature-wise dissimilarity matrix for feature j.

    Then, suppose we apply the PMD to D:

      maximize_{u,w} u^T D w subject to ||u||_2 ≤ 1, ||w||_2 ≤ 1, ||w||_1 ≤ s

    - w_j is a weight on the dissimilarity matrix for feature j.

    - w_j ≥ 0 occurs naturally (we assume that D_{i,i′} ≥ 0).

    - If we re-arrange the elements of Dw into an n × n matrix, then performing hierarchical clustering on this re-weighted dissimilarity matrix gives sparse hierarchical clustering.

    - If w_1 = ... = w_p, then this gives standard hierarchical clustering.

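    The construction above can be sketched end to end on a toy problem (an illustrative reimplementation: the squared-difference dissimilarity, the binary search, and all sizes and constants are assumptions, and the final linkage step is omitted):

    ```python
    import numpy as np

    def soft(a, d):
        # Componentwise soft-thresholding S(a, d) = sgn(a) * max(0, |a| - d)
        return np.sign(a) * np.maximum(np.abs(a) - d, 0.0)

    def unit_with_l1(a, c):
        # S(a, d)/||S(a, d)||_2 with d >= 0 found by binary search so ||.||_1 <= c.
        u = a / np.linalg.norm(a)
        if np.abs(u).sum() <= c:
            return u
        lo, hi = 0.0, np.max(np.abs(a))
        for _ in range(60):
            mid = (lo + hi) / 2.0
            s = soft(a, mid)
            nrm = np.linalg.norm(s)
            if nrm > 0 and np.abs(s).sum() / nrm > c:
                lo = mid
            else:
                hi = mid
        s = soft(a, hi)
        return s / np.linalg.norm(s)

    def sparse_cluster_weights(X, s, n_iter=30):
        # Column j of D is the vectorized squared-difference dissimilarity
        # matrix for feature j; (D @ w).reshape(n, n) is the reweighted
        # dissimilarity matrix that would be fed to hierarchical clustering.
        n, p = X.shape
        D = ((X[:, None, :] - X[None, :, :]) ** 2).reshape(n * n, p)
        w = np.ones(p) / np.sqrt(p)
        for _ in range(n_iter):
            u = D @ w
            u = u / np.linalg.norm(u)        # no L1 penalty on u here
            w = unit_with_l1(D.T @ u, s)     # sparse weights; nonnegativity is automatic
        return w

    rng = np.random.default_rng(4)
    X = rng.standard_normal((20, 50))
    X[:10, :5] += 4.0                        # 2 classes, defined on 5 signal features
    w = sparse_cluster_weights(X, s=3.0)
    print(np.flatnonzero(w))                 # weight concentrates on the signal features
    ```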

  • Sparse hierarchical clustering in action

    A simulated example with 6 classes defined on 200 signal features; 2000 features in total.

    [Figures: dendrograms for standard clustering and sparse clustering, and the feature weights w plotted against feature index (0 to 2000)]

  • An important breast cancer paper

    Nature (2000) 406:747-752.

  • Breast cancer data

    - 65 breast tumor samples for which gene expression data is available. Some samples are replicates from the same tumor (before and after chemo).

    - Clustered based on the full set of 1753 genes first.

    - Clustered based on 496 "intrinsic" genes, for which the variation between different tumors is large relative to the variation within a tumor.

    - Based on the intrinsic gene clustering, determined that 62 of 65 tumors fall into one of four classes: normal-breast-like, basal-like, ER+, Erb-B2+.


  • Clustering using intrinsic genes: normal-breast-like, basal-like, ER+, Erb-B2+

    [Figures: dendrograms for all samples and for the 62 classified samples]

  • Sparse clustering

    We wonder: if we sparsely cluster the 62 observations using all of the genes, can we identify the four classes successfully?

    Three types of clustering:

    1. Standard hierarchical clustering using all 1753 genes.

    2. Sparse hierarchical clustering of all 1753 genes, with the tuning parameter chosen to yield 496 genes.

    3. Standard hierarchical clustering using the 496 genes with highest marginal variance.


  • normal-breast-like, basal-like, ER+, Erb-B2+

    [Figures: dendrograms for standard clustering with all 1753 genes, sparse clustering with the 496 nonzero genes, and standard clustering with the 496 highest-variance genes]

  • Genes with high weights

    #   Gene                                                      Weight
    1   S100 CALCIUM-BINDING PROTEIN A8 (CALGRANULIN A)           0.223
    2   SECRETED FRIZZLED-RELATED PROTEIN 1                       0.2126
    3   ESTROGEN RECEPTOR 1                                       0.2076
    4   KERATIN 17                                                0.1627
    5   HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA                    0.1568
    6   CYTOCHROME P450, SUBFAMILY IIA                            0.155
    7   APOLIPOPROTEIN D                                          0.1509
    8   LACTOTRANSFERRIN                                          0.1471
    9   ESTROGEN RECEPTOR 1                                       0.1405
    10  134783                                                    0.14
    11  HEPATOCYTE NUCLEAR FACTOR 3, ALPHA                        0.1332
    12  HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA LIGHT              0.1309
    13  FATTY ACID BINDING PROTEIN 4, ADIPOCYTE                   0.1292
    14  CERULOPLASMIN (FERROXIDASE)                               0.126
    15  HUMAN SECRETORY PROTEIN (P1.B) MRNA                       0.1208
    16  NON-SPECIFIC CROSS REACTING ANTIGEN                       0.1199
    17  LIPOPROTEIN LIPASE                                        0.1123
    18  IMMUNOGLOBULIN LAMBDA LIGHT CHAIN                         0.112
    19  CRYSTALLIN, ALPHA B                                       0.1108
    20  FATTY ACID BINDING PROTEIN 4, ADIPOCYTE                   0.11
    21  PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8)            0.1099
    22  85660                                                     0.1077
    23  ESTS, HIGHLY SIMILAR TO PROBABLE ATAXIA-TELANGIECTASIA    0.1071
    24  V-FOS FBJ MURINE OSTEOSARCOMA VIRAL ONCOGENE HOMOLOG      0.1056
    25  EPIDIDYMIS-SPECIFIC, WHEY-ACIDIC PROTEIN TYPE             0.1013
    26  ALDO-KETO REDUCTASE FAMILY 1, MEMBER C1                   0.1007