Sparsity with sign-coherent groups of variables via the cooperative-Lasso
TRANSCRIPT

Julien Chiquet (1), Yves Grandvalet (2), Camille Charbonnier (1)
(1) Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
(2) Heudiasyc, CNRS & Université de Technologie de Compiègne
SSB – March 29, 2011

arXiv preprint: http://arxiv.org/abs/1103.2697
R package scoop: http://stat.genopole.cnrs.fr/logiciels/scoop
Notations

Let
- Y be the output random variable,
- X = (X1, …, Xp) be the input random variables, where Xj is the jth predictor.

The data. Given a sample (yi, xi), i = 1, …, n, of i.i.d. realizations of (Y, X), denote
- y = (y1, …, yn)ᵀ the response vector,
- xj = (xj1, …, xjn)ᵀ the vector of data for the jth predictor,
- X the n × p design matrix whose jth column is xj,
- D = {i : (yi, xi) ∈ training set},
- T = {i : (yi, xi) ∈ test set}.
Generalized linear models

Suppose Y depends linearly on X through a function g:

    E(Y) = g(Xβ*).

We predict a response yi by ŷi = g(xiβ̂) for any i ∈ T, by solving

    β̂ = argmax_β ℓ_D(β) = argmin_β Σ_{i∈D} Lg(yi, xiβ),

where Lg is a loss function depending on the function g. Typically,
- if Y is Gaussian and g = Id (OLS), Lg(y, xβ) = (y − xβ)²;
- if Y is binary and g : t ↦ (1 + e⁻ᵗ)⁻¹ (logistic regression), Lg(y, xβ) = −(y·xβ − log(1 + e^{xβ}));
or any negative log-likelihood ℓ of an exponential family distribution.
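Both losses are straightforward to write down in code. Here is a minimal Python sketch (not taken from the scoop package, which is in R), evaluated at a single observation, with eta standing for the linear predictor xiβ:

```python
import numpy as np

def squared_loss(y, eta):
    """Gaussian case, g = Id: L_g(y, x beta) = (y - x beta)^2."""
    return (y - eta) ** 2

def logistic_loss(y, eta):
    """Binary case, logit link: -(y * x beta - log(1 + exp(x beta)))."""
    return -(y * eta - np.log1p(np.exp(eta)))

print(squared_loss(1.3, 0.8))   # 0.25
print(logistic_loss(1, 0.8))    # ~0.371
```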
Estimation and selection at the group level

1. Structure: the set I = {1, …, p} splits into a known partition,

    I = ∪_{k=1}^K Gk, with Gk ∩ Gℓ = ∅ for k ≠ ℓ.

2. Sparsity: the support S of β* has few entries,

    S = {i : β*i ≠ 0}, with |S| ≪ p.

The group-Lasso estimator (Grandvalet and Canu '98, Bakin '99, Yuan and Lin '06):

    β̂^group = argmin_{β∈ℝᵖ} −ℓ_D(β) + λ Σ_{k=1}^K wk ‖β_{Gk}‖,

where
- λ ≥ 0 controls the overall amount of penalty,
- wk > 0 adapts the penalty between groups (dropped hereafter).
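As a minimal numeric sketch of the penalty term (a hypothetical helper, not the scoop implementation):

```python
import numpy as np

def group_penalty(beta, groups, weights=None):
    """Compute sum_k w_k * ||beta_{G_k}||_2 over a partition of indices."""
    if weights is None:
        weights = np.ones(len(groups))
    return sum(w * np.linalg.norm(beta[g]) for w, g in zip(weights, groups))

beta = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(group_penalty(beta, groups))  # sqrt(5) + 0 + 3 ~= 5.24
```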
Toy example: the prostate dataset

Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients:
- lcavol: log(cancer volume)
- lweight: log(prostate weight)
- age: age
- lbph: log(benign prostatic hyperplasia amount)
- svi: seminal vesicle invasion
- lcp: log(capsular penetration)
- gleason: Gleason score
- pgg45: percentage of Gleason scores 4 or 5

[Figure: Lasso coefficient paths versus λ (log scale).]
[Figure: hierarchical clustering of the 8 predictors (dendrogram).]
[Figure: group-Lasso coefficient paths versus λ (log scale).]
Application to splice site detection

Predict the splice site status (0/1) from a sequence of 7 bases and their interactions.

[Figure: information content per position.]

- order 0: 7 factors with 4 levels,
- order 1: C(7,2) factors with 4² levels,
- order 2: C(7,3) factors with 4³ levels,
- using dummy coding for the factors, we form one group per factor (see the sketch below).

[Figure: groups selected by the fit (g4, g5, g18, g42, g44, g45, g49, g54, g61), colored by interaction order (0, 1, 2).]

L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS Series B.
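To make the grouping concrete, here is a hypothetical Python sketch of the dummy coding (toy column names and data, assuming pandas is available; one group per factor):

```python
import pandas as pd

# Two toy positions (order-0 factors) over four sequences: dummy coding
# expands each factor into one indicator column per level, and all
# columns coming from the same factor form one group.
seqs = pd.DataFrame({"pos1": list("ACGT"), "pos2": list("AACC")})
X = pd.get_dummies(seqs)
groups = {}
for col in X.columns:                    # e.g. "pos1_A", "pos1_C", ...
    groups.setdefault(col.split("_")[0], []).append(col)
print(groups)   # {'pos1': [...4 columns...], 'pos2': ['pos2_A', 'pos2_C']}
```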
Group-Lasso limitations

1. Not a single zero should belong to a group with non-zeros.
   Strong group sparsity (Huang and Zhang, '10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely.

2. No sign-coherence within groups.
   Sign-coherence is required when groups gather consonant variables, e.g., groups defined by clusters of positively correlated variables.

The cooperative-Lasso: a penalty which assumes a sign-coherent group structure, that is to say, groups which gather either
- non-positive,
- non-negative,
- or null parameters.
Motivation: multiple network inference

[Figure: three experiments, each followed by its own network inference step.]

A group is a set of corresponding edges across tasks (e.g., the red or blue ones): sign-coherence matters!

J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.
Motivation: joint segmentation of aCGH profiles

[Figure: log-ratios (CNVs) versus position on the chromosome.]

    minimize_{β∈ℝ^{n×p}} ‖β − Y‖², s.t. Σ_{i=1}^p ‖βi − β_{i−1}‖ < s,

where
- Y is an n × p matrix gathering n profiles of size p,
- βi is the size-n vector of the ith probes across the n profiles,
- a group gathers every position i across profiles.

Sign-coherence may avoid inconsistent variations across profiles.

K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
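A minimal Python sketch of the penalty term in this constraint, assuming the convention above (β stored as an n × p array, one column per position):

```python
import numpy as np

def fused_group_penalty(beta):
    """sum_i ||beta_i - beta_{i-1}||_2, where beta_i is the column of the
    n x p array beta holding position i across the n profiles."""
    diffs = np.diff(beta, axis=1)               # beta_i - beta_{i-1}
    return np.linalg.norm(diffs, axis=0).sum()  # 2-norm over profiles

beta = np.array([[0.0, 0.0, 1.0, 1.0],          # profile 1
                 [0.0, 0.0, 1.0, 1.0]])         # profile 2: shared jump
print(fused_group_penalty(beta))                # sqrt(2): one breakpoint
```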
Outline

Definition
Resolution
Consistency
Model selection
Simulation studies
Sibling probe sets and gene selection
Definition
The cooperative-Lasso estimator

Definition:

    β̂^coop = argmin_{β∈ℝᵖ} J(β), with J(β) = −ℓ_D(β) + λ‖β‖_coop,

where, for any v ∈ ℝᵖ,

    ‖v‖_coop = ‖v⁺‖_group + ‖v⁻‖_group = Σ_{k=1}^K ( ‖v⁺_{Gk}‖ + ‖v⁻_{Gk}‖ ),

and
- v⁺ = (v₁⁺, …, vp⁺), with vj⁺ = max(0, vj),
- v⁻ = (v₁⁻, …, vp⁻), with vj⁻ = max(0, −vj).
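A minimal Python sketch of the coop-norm, transcribing the definition directly (again, not the scoop code):

```python
import numpy as np

def coop_norm(v, groups):
    """||v||_coop = sum_k ( ||v+_{G_k}||_2 + ||v-_{G_k}||_2 )."""
    v = np.asarray(v, dtype=float)
    vpos, vneg = np.maximum(v, 0.0), np.maximum(-v, 0.0)
    return sum(np.linalg.norm(vpos[g]) + np.linalg.norm(vneg[g])
               for g in groups)

groups = [np.array([0, 1]), np.array([2, 3])]
print(coop_norm([1.0, 1.0, 1.0, 1.0], groups))   # 2*sqrt(2) ~= 2.83
print(coop_norm([1.0, 1.0, 1.0, -1.0], groups))  # sqrt(2) + 2 ~= 3.41
```

The second call illustrates the key property: a sign-incoherent group costs more than a sign-coherent group of the same magnitude.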
A geometric view of sparsity

[Figures: level sets of ℓ(β1, β2) and the admissible set in the (β1, β2) plane.]

    minimize_{β1,β2} −ℓ(β1, β2) + λΩ(β1, β2)
    ⇕
    maximize_{β1,β2} ℓ(β1, β2) s.t. Ω(β1, β2) ≤ c
Ball crafting: group-Lasso

Admissible set:
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖_group ≤ 1.

[Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.]
Ball crafting: cooperative-Lasso

Admissible set:
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖_coop ≤ 1.

[Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.]
Resolution
Convex analysis: supporting hyperplane

A hyperplane supports a set iff
- the set is contained in one half-space,
- the set has at least one point on the hyperplane.

[Figures: supporting hyperplanes of a convex set in the (β1, β2) plane.]

There are supporting hyperplanes at all points of a convex set: they generalize tangents.
Convex analysis: dual cone and subgradient

Subgradients generalize normals.

[Figures: subgradients of a convex function over the (β1, β2) plane.]

g is a subgradient at x
⇕
the vector (g, −1) is normal to the supporting hyperplane at this point.

The subdifferential at x is the set of all subgradients at x.
Optimality conditions

Theorem. A necessary and sufficient condition for the optimality of β̂ is that the null vector 0 belongs to the subdifferential of the convex function J:

    0 ∈ ∂_β J(β̂) = { v ∈ ℝᵖ : v = −∇_β ℓ(β̂) + λθ },

where θ ∈ ℝᵖ belongs to the subdifferential of the coop-norm. Define

    φj(v) = (sign(vj) v)⁺;

then θ is such that

    ∀k ∈ {1, …, K}, ∀j ∈ Sk(β̂):   θj = β̂j / ‖φj(β̂_{Gk})‖,
    ∀k ∈ {1, …, K}, ∀j ∈ Sk^c(β̂):  ‖φj(θ_{Gk})‖ ≤ 1.

We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).
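The map φj is easy to misread, so here is a minimal Python sketch (a hypothetical helper mirroring the definition above):

```python
import numpy as np

def phi(v, j):
    """phi_j(v) = (sign(v_j) * v)+: keeps the entries of v sharing the
    sign of v_j (in absolute value), with zeros elsewhere."""
    return np.maximum(np.sign(v[j]) * v, 0.0)

v = np.array([2.0, -1.0, 3.0])
print(phi(v, 0))   # [2. 0. 3.]  positive part, since v_0 > 0
print(phi(v, 1))   # [0. 1. 0.]  negative entries, flipped to positive
```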
Linear regression with orthonormal design

Consider

    β̂ = argmin_β { ½‖y − Xβ‖² + λΩ(β) },

with XᵀX = I. Hence (xj)ᵀ(Xβ − y) = βj − β̂j^ols, and, up to constants, the problem is equivalent to

    β̂ = argmin_β { ½‖β − β̂^ols‖² + λΩ(β) }.

We may find a closed form of β̂ for, e.g.,
1. Ω(β) = ‖β‖_lasso,
2. Ω(β) = ‖β‖_group,
3. Ω(β) = ‖β‖_coop.
Lasso: for all j ∈ {1, …, p},

    β̂j^lasso = (1 − λ/|β̂j^ols|)₊ β̂j^ols,  i.e.,  |β̂j^lasso| = (|β̂j^ols| − λ)₊.

Group-Lasso: for all k ∈ {1, …, K} and all j ∈ Gk,

    β̂j^group = (1 − λ/‖β̂_{Gk}^ols‖)₊ β̂j^ols,  i.e.,  ‖β̂_{Gk}^group‖ = (‖β̂_{Gk}^ols‖ − λ)₊.

Coop-Lasso: for all k ∈ {1, …, K} and all j ∈ Gk,

    β̂j^coop = (1 − λ/‖φj(β̂_{Gk}^ols)‖)₊ β̂j^ols,  i.e.,  ‖φj(β̂_{Gk}^coop)‖ = (‖φj(β̂_{Gk}^ols)‖ − λ)₊.

[Figures: Lasso, group-Lasso and coop-Lasso estimates as functions of the OLS coefficients.]
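Under the orthonormal-design assumption, all three estimators reduce to explicit shrinkages of β̂^ols. A minimal Python sketch of the three closed forms above:

```python
import numpy as np

def shrink_lasso(b, lam):
    # componentwise soft-thresholding: sign(b_j) * (|b_j| - lam)+
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def shrink_group(b, groups, lam):
    out = np.zeros_like(b)
    for g in groups:
        nrm = np.linalg.norm(b[g])
        if nrm > lam:                     # the whole group survives or dies
            out[g] = (1.0 - lam / nrm) * b[g]
    return out

def shrink_coop(b, groups, lam):
    out = np.zeros_like(b)
    for g in groups:
        for j in g:                       # scale by the same-sign part
            phi = np.maximum(np.sign(b[j]) * b[g], 0.0)
            nrm = np.linalg.norm(phi)     # >= |b_j|, so no zero division
            if b[j] != 0.0 and nrm > lam:
                out[j] = (1.0 - lam / nrm) * b[j]
    return out

b_ols = np.array([1.0, 0.8, -0.3, 0.9])
groups = [np.array([0, 1]), np.array([2, 3])]
print(shrink_group(b_ols, groups, 0.5))   # shrinks the mixed group as one
print(shrink_coop(b_ols, groups, 0.5))    # kills -0.3, keeps 0.9 (own sign)
```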
Consistency
Linear regression setup: technical assumptions

(A1) X and Y have finite fourth-order moments:

    E‖X‖⁴ < ∞, E|Y|⁴ < ∞;

(A2) the covariance matrix Ψ = E XXᵀ ∈ ℝ^{p×p} is invertible;

(A3) for every k = 1, …, K, if ‖(β*_{Gk})⁺‖ > 0 and ‖(β*_{Gk})⁻‖ > 0, then β*j ≠ 0 for every j ∈ Gk
(groups mixing signs contain no null coefficient: zeros may only occur in sign-coherent groups).
Irrepresentability condition

Define Sk = S ∩ Gk, the support within group k, and, for j ∈ Gk,

    [D(β)]jj = ‖[sign(βj) β_{Gk}]⁺‖⁻¹.

Assume there exists η > 0 such that:

(A4) for every group Gk including at least one null coefficient,

    max( ‖(Ψ_{Sk^c S} Ψ_{SS}⁻¹ D(β*S) β*S)⁺‖, ‖(Ψ_{Sk^c S} Ψ_{SS}⁻¹ D(β*S) β*S)⁻‖ ) ≤ 1 − η;

(A5) for every group Gk intersecting the support and including either positive or negative coefficients, let νk be the sign of these coefficients (νk = 1 if ‖(β*_{Gk})⁺‖ > 0 and νk = −1 if ‖(β*_{Gk})⁻‖ > 0); then

    νk Ψ_{Sk^c S} Ψ_{SS}⁻¹ D(β*S) β*S ⪰ 0,

where ⪰ denotes componentwise inequality.
Consistency results

Theorem. If assumptions (A1–A5) are satisfied with some η > 0, then for every sequence λn = λ₀ n^{−γ} with γ ∈ (0, 1/2),

    β̂^coop →P β*  and  P( S(β̂^coop) = S ) → 1.

Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).
Sketch of the proof

1. Construct an artificial estimator β̃S restricted to the true support S, and extend it with 0 coefficients on S^c.
2. Consider the event En on which β̃ satisfies the original optimality conditions. On En, β̃S = β̂^coop_S and β̂^coop_{S^c} = 0, by uniqueness.
3. We need to prove that lim_{n→∞} P(En) = 1.
4. Derive the asymptotic distribution of the derivative of the loss function, Xᵀ(y − Xβ), from
   - the CLT applied to second-order moments,
   - the optimality conditions on β̃S,
   - the right choice of λn, which provides convergence in probability.
5. Assumptions (A4–A5) require that the limits in probability satisfy the optimality constraints with strict inequalities.
6. As a result, the optimality conditions are satisfied (with non-strict inequalities) with probability tending to 1.
Illustration

Generate data y = Xβ* + σε, with
- β* = (1, 1, −1, −1, 0, 0, 0, 0),
- groups G = {1, 2}, {3, 4}, {5, 6}, {7, 8},
- σ = 0.1, R² ≈ 0.99, n = 20,
- the irrepresentability conditions hold for the coop-Lasso but not for the group-Lasso,
- average over 100 simulations.

[Figures: group-Lasso and coop-Lasso coefficient paths versus log10(λ), with 50% coverage intervals (upper/lower quartiles).]
Model selection
Optimism of the training error

- The training error:

    err = (1/|D|) Σ_{i∈D} L(yi, xiβ̂).

- The test error ("extra-sample" error):

    Err_ex = E_{X,Y}[ L(Y, Xβ̂) | D ].

- The "in-sample" error:

    Err_in = (1/|D|) Σ_{i∈D} E_Y[ L(Yi, xiβ̂) | D ].

Definition (optimism):

    Err_in = err + "optimism".
Cp statistics

For squared-error loss (and some other losses),

    Err_in = err + (2/|D|) Σ_{i∈D} cov(ŷi, yi).

"The amount by which err underestimates the true error depends on how strongly yi affects its own prediction. The harder we fit the data, the greater the covariance will be, thereby increasing the optimism." (ESLII, 5th printing)

Mallows' Cp statistic: for a linear regression fit ŷ with p inputs, Σ_{i∈D} cov(ŷi, yi) = p σ², so that

    Cp = err + 2 (df/|D|) σ̂², with df = p.
Generalized degrees of freedom

Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.

Proposition (Efron '04 + Stein's lemma '81):

    df(λ) ≐ (1/σ²) Σ_{i∈D} cov(ŷi(λ), yi) = E_y[ tr( ∂ŷ(λ)/∂y ) ].

For the Lasso, Zou et al. ('07) show that

    df^lasso(λ) = ‖β̂^lasso(λ)‖₀.

Assuming XᵀX = I, Yuan and Lin ('06) show for the group-Lasso that the trace term equals

    df^group(λ) = Σ_{k=1}^K 1( ‖β̂^group_{Gk}(λ)‖ > 0 ) [ 1 + (pk − 1) ‖β̂^group_{Gk}(λ)‖ / ‖β̂^ols_{Gk}‖ ].
Approximated degrees of freedom for the coop-Lasso

Proposition. Assuming that the data are generated according to a linear regression model and that X is orthonormal, the following expression of df^coop(λ) is an unbiased estimate of df(λ):

    df^coop(λ) = Σ_{k=1}^K [ 1{‖(β̂^coop_{Gk}(λ))⁺‖ > 0} ( 1 + (pk⁺ − 1) ‖(β̂^coop_{Gk}(λ))⁺‖ / ‖(β̂^ols_{Gk})⁺‖ )
                            + 1{‖(β̂^coop_{Gk}(λ))⁻‖ > 0} ( 1 + (pk⁻ − 1) ‖(β̂^coop_{Gk}(λ))⁻‖ / ‖(β̂^ols_{Gk})⁻‖ ) ],

where pk⁺ and pk⁻ are respectively the number of positive and negative entries in β̂^ols_{Gk}.

A ridge reference β̂^ridge_{Gk}(γ) may replace the OLS reference, in which case the factors pk± − 1 become (pk± − 1)/(1 + γ), the OLS norms become the corresponding ridge norms, and pk⁺ and pk⁻ count the positive and negative entries in β̂^ridge_{Gk}(γ).
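A minimal Python sketch of the first (OLS-referenced) expression; the ridge-referenced variant only swaps the reference vector and rescales the pk± − 1 factors:

```python
import numpy as np

def df_coop(beta_coop, beta_ref, groups):
    """Plug-in estimate of df(lambda): sum, over groups and signs, of
    1 + (p_k(sign) - 1) * ||coop signed part|| / ||reference signed part||."""
    df = 0.0
    for g in groups:
        for sign in (+1.0, -1.0):          # positive part, then negative part
            part = np.maximum(sign * beta_coop[g], 0.0)
            ref = np.maximum(sign * beta_ref[g], 0.0)
            if np.linalg.norm(part) > 0.0:  # active signed half-group;
                # under the orthonormal closed form, a non-null coop part
                # implies a non-null reference part
                pk = np.count_nonzero(ref)
                df += 1.0 + (pk - 1) * np.linalg.norm(part) / np.linalg.norm(ref)
    return df
```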
Approximated information criteria

Following Zou et al., we extend the Cp statistic to an "approximated" AIC,

    AIC(λ) = ‖y − ŷ(λ)‖²/σ̂² + 2 df(λ),

and from the AIC there is a (small) step to the BIC:

    BIC(λ) = ‖y − ŷ(λ)‖²/σ̂² + log(n) df(λ).

- K-fold cross-validation works well but is computationally intensive.
- It is required when we do not meet the linear regression setup…
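A minimal Python sketch of the two criteria, given a path of fits and df estimates (σ̂² is assumed pre-estimated, e.g., from OLS residuals):

```python
import numpy as np

def aic(y, y_hat, df, sigma2):
    return np.sum((y - y_hat) ** 2) / sigma2 + 2.0 * df

def bic(y, y_hat, df, sigma2):
    return np.sum((y - y_hat) ** 2) / sigma2 + np.log(len(y)) * df

# Select lambda along a path (y_hat_path[l] and df_path[l] at lambda_l):
# best = min(range(len(lambdas)),
#            key=lambda l: bic(y, y_hat_path[l], df_path[l], sigma2))
```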
Simulation studies
Revisiting Elastic-Net experiments (1)

Generate data y = Xβ* + σε, with
- β* = (0, …, 0 | 2, …, 2 | 0, …, 0 | 2, …, 2), four blocks of size 10,
- G1 = {1, …, 10}, G2 = {11, …, 20}, G3 = {21, …, 30}, G4 = {31, …, 40},
- σ = 15, corr(xi, xj) = 0.5,
- training/validation/test = 100/100/400,
- average over 100 simulations.

[Figure: MSE boxplots for lasso, enet, group and coop.]
Revisiting Elastic-Net experiments (2)

Generate data y = Xβ* + σε, with
- β* = (3, …, 3 | 0, …, 0), with 15 threes and 25 zeros,
- σ = 15,
- G1 = {1, …, 5}, G2 = {6, …, 10}, G3 = {11, …, 15}, G4 = {16, …, 40},
- xj = Z1 + ε, Z1 ~ N(0, 1), for all j ∈ G1,
- xj = Z2 + ε, Z2 ~ N(0, 1), for all j ∈ G2,
- xj = Z3 + ε, Z3 ~ N(0, 1), for all j ∈ G3,
- xj ~ N(0, 1) for all j ∈ G4,
- training/validation/test = 50/50/400,
- average over 100 simulations.

[Figure: MSE boxplots for lasso, enet, group and coop.]
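A Python sketch of this design; the within-group noise level on xj is not specified on the slide, so the 0.01 below is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Z = rng.standard_normal((n, 3))                      # Z1, Z2, Z3
X = np.hstack([
    np.repeat(Z, 5, axis=1) + 0.01 * rng.standard_normal((n, 15)),
    rng.standard_normal((n, 25)),                    # pure-noise group G4
])
beta_star = np.concatenate([np.full(15, 3.0), np.zeros(25)])
y = X @ beta_star + 15.0 * rng.standard_normal(n)    # sigma = 15
```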
Breiman's setup: simulation settings

A wave-like vector of parameters β*:
- p = 90 variables partitioned into K = 10 groups of size pk = 9,
- 3 (partially) active groups, the remaining groups containing only zeros,
- in active groups, β*j ∝ (h − |5 − j|)₊ with h = 1, …, 5.

[Figure: β* for h = 1, …, 5; each active group holds |Sk| = 2h − 1 non-zero coefficients (1, 3, 5, 7 and 9).]
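A minimal Python sketch reproducing the wave-like β* up to the proportionality constant (the slides rescale β* to reach a target R²); indices within a group run from 1 to 9:

```python
import numpy as np

def breiman_beta(h, n_active=3, K=10, pk=9):
    """Wave beta*_j = (h - |5 - j|)+ inside each of the active groups."""
    j = np.arange(1, pk + 1)
    wave = np.maximum(h - np.abs(5 - j), 0)
    beta = np.zeros(K * pk)
    for k in range(n_active):
        beta[k * pk:(k + 1) * pk] = wave
    return beta

print(breiman_beta(h=2))   # 2h - 1 = 3 non-zeros per active group
```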
Breiman's setup: example of solution paths and signal recovery with the BIC choice

The signal is generated so that
- y = Xβ* + σε, with σ = 1 and n = 30 to 500,
- X ~ N(0, Ψ) with Ψij = ρ^|i−j| (ρ = 0.4 in the example),
- the magnitude of β* is chosen so that R² ≈ 0.75.

Remark: the covariance structure is purposely disconnected from the group structure; none of the support recovery conditions is fulfilled.

One-shot sample with n = 120.

[Figures: for the Lasso, group-Lasso and coop-Lasso, the coefficient paths versus log10(λ) and the estimated signal against the true signal.]
Breiman's setup: errors as a function of the sample size n

[Figures: prediction error and sign error versus n ∈ [100, 500] for lasso, group and coop, in three regimes: h = 3, |Sk| = 5 (favoring the Lasso); h = 4, |Sk| = 7 (intermediate); h = 5, |Sk| = 9 (favoring the group-Lasso).]
Sibling probe sets and gene selection
Robust microarray gene selection

Affymetrix chips typically contain multiple probe sets per gene, known as sibling probe sets.

Reasons (Li, Zhu, Cook, BMC Genomics 2008):
1. lack of knowledge: genome annotation maps probe sets to the same genes after chip design,
2. instability: probe sets cross-hybridize in an unpredictable manner,
3. designed on purpose: probe sets specific to RNA variants (splicing).

At least two good reasons to put sibling probe sets in the same group.
Application: basal tumors

Methodology:
1. select a restricted number d of probes from a differential analysis,
2. determine the genes associated with these d probes, and retrieve all the p probes related to these genes, regardless of their signal,
3. fit a model with group penalties where groups are defined by genes.

Breast cancer data set:
- 22,269 probes,
- n = 29 patients with basal tumors,
- predict the response to chemotherapy: pCR / not-pCR.
Pretreatment:
- order the p-values from a differential analysis (Jeanmougin et al., 2011),
- keep the d = 10 most differentiated probes,
- this corresponds to exactly 10 genes, for a total of p = 27 probes.

Methods compared:
1. probes: logistic Lasso on the d = 10 most differentiated probes,
2. lasso: logistic Lasso on the p = 27 probes (no group effect),
3. group: logistic group-Lasso on the p = 27 probes (group effect),
4. coop: logistic coop-Lasso on the p = 27 probes (signed group effect).
Results

Gk (gene symbol)   pk   probes   lasso   group   coop
frmd4b              3     0.38    0.62    0.68   0.75
rnps1               2     0       0       0      0
phlda3              1     1.82    1.93    4.12   7.32
tbc1d22a            3     0       0       0      0
ece1                2     0.89    0       0      1.87
lzts1               6     1.34    1.57    1.15   0
rpp38               1     0.95    0.90    1.92   3.66
gtse1               5     0.88    0.85    1.21   0
pak4                3     1.68    0.96    1.70   4.58
chst10              1     0.79    0.36    1.08   2.50

Table: genes corresponding to the probes selected by differential analysis, size of the groups of probes, and ℓ2-norm of each group of parameters for each estimate.
[Figures: per-gene group norms (bar plots, scale 0 to 6) for the Lasso, group-Lasso and coop-Lasso, over the ten genes listed above.]
[Figure: binomial deviance along the regularization path (versus ‖β̂‖) for probes, lasso, group and coop.]

method   CV(λ*)   CV*
probes    0.511   0.474
lasso     0.513   0.499
group     0.430   0.372
coop      0.263   0.194

Table: best average CV score CV(λ*) and averaged best CV score CV*.
Conclusion

Summary:
- a variant of the group-Lasso which assumes sign-coherent, possibly sparse, groups,
- the coop-Lasso comes with the "usual" accompanying tools: a consistency theorem, model selection criteria, a subset algorithm, and the R package scoop,
- very encouraging results on real genomic data.

Perspectives:
- enhance the algorithms/implementation for large-scale experiments,
- deeper analysis in the gene selection framework,
- other applications in genomics (aCGH segmentation?).