How to lasso positively, quickly and correctly
David Bryant
University of Otago
SUQ 2013
The Lasso (a.k.a. L1 regularization)
Consider the linear inverse problem
y = Xβ + ε.
An L1-regularized solution takes a parameter δ and minimizes the penalized residual

‖Xβ − y‖₂² + δ‖β‖₁.

This has the advantage that the solutions are typically sparse, so the method is used for variable selection.
When δ = 0 we obtain the least squares solution. As δ → ∞, the solutions approach 0.
We can replace ‖β‖₁ with a weighted version ∑ᵢ wᵢ|βᵢ|.
Introduced into statistics by Tibshirani (1996) under the name of ‘the LASSO’.
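As a concrete illustration (my own sketch, not from the talk): the penalized problem can be solved numerically, for instance with scikit-learn's Lasso. Note that scikit-learn minimizes (1/(2n))‖Xβ − y‖² + α‖β‖₁, so its α plays the role of δ only up to a scaling; the data below are synthetic.

```python
# Illustrative sketch: L1-penalized least squares with scikit-learn.
# sklearn's alpha corresponds to delta/(2n) in the slides' notation.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 10 coefficients are nonzero
y = X @ beta_true + 0.1 * rng.standard_normal(n)

model = Lasso(alpha=0.1, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # typically sparse: many entries exactly 0
```

The exact zeros in `coef_` are the point: the L1 penalty performs variable selection, not just shrinkage.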
Computing the Lasso
The computational challenge is to efficiently compute LASSO solutions for a range of δ values (ideally, all of them).
First observation (from the KKT conditions): the set of LASSO solutions

argmin_β {‖Xβ − y‖₂² + δ‖β‖₁}

for δ ≥ 0 equals the set of solutions to

argmin_β {‖Xβ − y‖₂² such that ‖β‖₁ ≤ λ}

for λ ≥ 0.
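A small numerical illustration of this equivalence (my own sketch, using scikit-learn and synthetic data): as the penalty grows, the L1 norm of the penalized solution shrinks, sweeping through the same family of solutions that the constraint ‖β‖₁ ≤ λ traces out.

```python
# Illustrative sketch: ||beta(delta)||_1 is non-increasing in the penalty,
# which is the monotone map between the penalized and constrained forms.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 6))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0, 0.0]) + 0.05 * rng.standard_normal(40)

norms = []
for alpha in [0.01, 0.05, 0.1, 0.5, 1.0]:
    b = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    norms.append(np.abs(b).sum())
print(norms)  # non-increasing as alpha grows
```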
LARS-Lasso algorithm
Let β(λ) denote the optimal solution for a given λ. That is,

β(λ) = argmin_β ‖Xβ − y‖₂² such that ‖β‖₁ ≤ λ.

It is not too hard to show that β(λ) is piecewise linear (as a function of λ).

[Figure: the solution path, running from 0 to the OLS solution.]

The curve starts at 0 and finishes at the un-penalised solution.
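The full piecewise-linear path can be computed in one pass. A hedged sketch (my own, on synthetic data) using scikit-learn's `lars_path` with `method='lasso'`:

```python
# Illustrative sketch: the whole lasso path via LARS. The path starts at
# beta = 0 and, with alpha_min = 0 (the default), ends at least squares.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(30)

alphas, active, coefs = lars_path(X, y, method='lasso')
# coefs has one column per breakpoint of the piecewise-linear path
print(coefs[:, 0])   # all-zero start of the path
print(coefs[:, -1])  # unpenalized (least squares) end of the path
```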
Lasso computations
Osborne, Presnell and Turlach (2000): 394 citations
Efron, Hastie, Johnstone and Tibshirani (2004): 2909 citations
The positive Lasso
In many applications (including ours), we require variable coefficients which are non-negative:

β(λ) = argmin_β ‖Xβ − y‖₂² such that β ≥ 0 and ‖β‖₁ ≤ λ.
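For reference (my own sketch, not from the talk): scikit-learn's Lasso exposes a `positive=True` flag that adds this non-negativity constraint to the penalized problem; the data below are synthetic.

```python
# Illustrative sketch: the positive lasso via sklearn's positive=True flag,
# which constrains every coefficient to be >= 0.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 6))
y = X @ np.array([2.0, 0.0, 1.0, 0.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(40)

model = Lasso(alpha=0.05, fit_intercept=False, positive=True)
model.fit(X, y)
print(model.coef_)  # every entry is >= 0
```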
LARS-LASSO
Efron et al. propose a ‘Positive Lasso Lars’ algorithm.
Let β = 0 and c = X′(y − Xβ).
while ‖c‖ > 0
    A = {i : cᵢ is maximum}.
    Choose the search direction w_A = (X′_A X_A)⁻¹1.
    Move β in direction w_A until
        an entry becomes negative, OR
        new variable(s) join the set of those with maximum cᵢ.
    Update c = X′(y − Xβ).
end

Problem: the algorithm can fail if more than one variable leaves or enters A at any one time.
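A single iteration of the loop above, transcribed into Python (my own sketch on synthetic data; the step-length and tie-handling bookkeeping, which is exactly where the failure can occur, is omitted):

```python
# Illustrative sketch of one positive-lasso LARS step: find the set A of
# maximally correlated variables and the direction w_A = (X'_A X_A)^{-1} 1.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 5))
y = X @ np.array([2.0, 0.0, 1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(30)

beta = np.zeros(5)
c = X.T @ (y - X @ beta)                     # current correlations
A = np.flatnonzero(np.isclose(c, c.max()))   # active set: maximal c_i
XA = X[:, A]
wA = np.linalg.solve(XA.T @ XA, np.ones(len(A)))  # equiangular direction
print(A, wA)
```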
Multiple entries
Is this ‘one-at-a-time’ restriction a problem?

NO: you can always add random noise to break ties.

YES: with a positivity constraint, ties appear as part of the algorithm itself (not just from degenerate data). We also ran into trouble with degeneracies in our own models and problems.
Our algorithm for the positive Lasso
Let β = 0 and c = X′y.
while ‖c‖ > 0
    A = {i : cᵢ is maximum}.
    Choose the search direction w_A = (X′_A X_A)⁻¹1.
    Find v minimizing ‖X_A(v_A − w_A)‖² such that vᵢ ≥ 0 whenever βᵢ = 0.
    Move β in direction v until
        an entry goes negative, OR
        new variable(s) join the set A.
    Update c = X′(y − Xβ).
end
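The new ingredient is the constrained direction-finding step. A hedged sketch (my own, on synthetic data) of that sub-problem using SciPy's bounded least squares, `lsq_linear`:

```python
# Illustrative sketch: project the unconstrained LARS direction w_A onto
# directions that keep zero coefficients non-negative, via bounded
# least squares: min_v ||X_A (v_A - w_A)||^2 s.t. v_i >= 0 where beta_i = 0.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(5)
XA = rng.standard_normal((30, 4))
wA = np.linalg.solve(XA.T @ XA, np.ones(4))
beta_A = np.array([0.5, 0.0, 0.2, 0.0])   # two active coordinates sit at zero

# lower bound 0 wherever beta_i == 0; other coordinates unconstrained
lb = np.where(beta_A == 0, 0.0, -np.inf)
res = lsq_linear(XA, XA @ wA, bounds=(lb, np.full(4, np.inf)))
v = res.x
print(v)  # non-negative on the zero coordinates
```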
KKT conditions
For each λ we want to find

min_β ‖Xβ − y‖₂² such that β ≥ 0 and ‖β‖₁ = λ. (†)

From the KKT conditions:

β solves (†) if and only if β is feasible and cᵢ is maximal for all i with βᵢ > 0.
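This optimality condition is cheap to verify. A small sketch (my own, with a tolerance for numerical ties):

```python
# Illustrative sketch of the slide's KKT check: beta solves (dagger) iff it
# is feasible (beta >= 0) and c_i = (X'(y - X beta))_i attains the maximum
# of c over all i whenever beta_i > 0.
import numpy as np

def is_positive_lasso_optimal(X, y, beta, tol=1e-8):
    """Feasibility plus the maximal-correlation condition on the support."""
    if (beta < -tol).any():
        return False
    c = X.T @ (y - X @ beta)
    return bool(np.all(c[beta > tol] >= c.max() - tol))

# At beta = 0 the support is empty, so the condition holds trivially.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
print(is_positive_lasso_optimal(X, y, np.zeros(2)))  # True (the lambda = 0 point)
```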
One step
Consider moving β in direction v. For γ define

β^γ = β + γv,

so that

c^γ = X′(y − Xβ^γ) = c − γX′Xv.

The conditions that β^γ and c^γ need to satisfy are then:

Feasibility: β^γ ≥ 0.
Optimality: c^γᵢ is maximal for all i such that β^γᵢ > 0.
Increasing: ∑ᵢ β^γᵢ > ∑ᵢ βᵢ.
Defining the search direction
Define A = {i : cᵢ maximal}. The LARS-Lasso algorithm considers the (unconstrained) search direction w, where w_A = (X′_A X_A)⁻¹1.

From the above, the actual search direction should be v satisfying:

For all i ∈ A such that βᵢ = 0, vᵢ ≥ 0.
For all i ∈ A such that βᵢ > 0 or vᵢ > 0, (X′Xv)ᵢ = 1.
For all i ∈ A, (X′Xv)ᵢ ≥ 1.
For all i ∉ A, vᵢ = 0.

These are the KKT conditions for the constrained problem

min_v ‖X_A(v_A − w_A)‖² such that βᵢ = 0 ⇒ vᵢ ≥ 0,

where vᵢ = 0 for all i ∉ A.
An algorithm for the positive Lasso
Let β = 0 and c = X′y.
while ‖c‖ > 0
    A = {i : cᵢ is maximum}.
    Choose the search direction w_A = (X′_A X_A)⁻¹1.
    Find v minimizing ‖X_A(v_A − w_A)‖² such that vᵢ ≥ 0 whenever βᵢ = 0.
    Move β in direction v until
        an entry goes negative, OR
        new variable(s) join the set A.
    Update c = X′(y − Xβ).
end
Phylogenetics
Application: linear models in phylogenetics
Each edge in the tree corresponds to a split (bipartition) of the objects into two parts. These splits and their weights determine evolutionary distances between the objects:

[Figure: a phylogenetic tree on taxa A–F with branch weights.]
y contains the observed distances between objects;
X indicates which splits/branches separate which pairs;
β is the vector of split/branch weights to be inferred.
y = Xβ + ε
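A toy version of this design matrix (my own construction, not from the talk) for the four-taxon tree ((A,B),(C,D)): rows are taxon pairs, columns are splits, and X[pair, split] = 1 exactly when the split separates the pair.

```python
# Illustrative sketch: the split indicator matrix X for ((A,B),(C,D)).
# With no noise, pairwise distances satisfy y = X beta exactly.
import numpy as np
from itertools import combinations

taxa = ['A', 'B', 'C', 'D']
# each split is recorded as one side of the bipartition
splits = [{'A'}, {'B'}, {'C'}, {'D'}, {'A', 'B'}]
beta = np.array([0.1, 0.2, 0.3, 0.1, 0.5])  # branch (split) weights

pairs = list(combinations(taxa, 2))
X = np.array([[1.0 if (a in s) != (b in s) else 0.0 for s in splits]
              for a, b in pairs])
y = X @ beta
print(dict(zip(pairs, y)))  # e.g. d(A,B) = 0.1 + 0.2 = 0.3
```

Here d(A,B) sums only the two pendant weights, while d(A,C) also crosses the internal {A,B}|{C,D} split.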
Application: phylogenetic networks
Most collections of splits do not encode a tree; they can, however, be represented using a split network.

[Figure: a split network on taxa A–F with edge weights.]

Useful for data exploration, since we can depict conflicting signals and represent the amount of noise∗.
Phylogenetic networks
From English accents...
...to Swedish worms.
From the origin of modern wheat...
...to the origin of life.
Networks and overfitting
With phylogenetic networks we intentionally over-fit the data.

In practice, many variables (splits) are eliminated using non-negative least squares (NNLS).

A large component of my student Alethea Rea's Ph.D. thesis was devoted to methods for cleaning up the remainder: the Lasso was an obvious choice.
Numerical issues
Let n be the number of objects. Then:

X is (n choose 2) × (n choose 2).
X′X is typically poorly conditioned.
X is not sparse, but structured, so there are efficient algorithms for computing Xv and X′v.
(X′X)⁻¹ is sparse.
Algorithm steps with numerical issues
Let β = 0 and c = X′y.
while ‖c‖ > 0
    A = {i : cᵢ is maximum}.
    Choose the search direction w_A = (X′_A X_A)⁻¹1.
    Find v minimizing ‖X_A(v_A − w_A)‖² such that vᵢ ≥ 0 whenever βᵢ = 0.
    Move β in direction v until
        an entry goes negative, OR
        new variable(s) join the set A.
    Update c = X′(y − Xβ).
end
Strategies
The key computation is the choice of search direction:

min_v ‖X_A(v_A − w_A)‖² such that βᵢ = 0 ⇒ vᵢ ≥ 0.

We started with PGCG (thanks to John) but had problems with the conditioning of X′_A X_A and with degeneracy.

We ‘regressed’ to an active set method, making use of the sparseness of (X′X)⁻¹, PCG, and the Woodbury formula to solve the sub-problems.
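For reference, a quick numerical check of the Woodbury formula in the form typically used for such sub-problems (my own sketch: the base matrix A is cheap to invert and the update UCV has low rank; data synthetic):

```python
# Illustrative check of the Woodbury identity:
# (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 2
A = np.diag(rng.uniform(1.0, 2.0, n))      # diagonal, so trivially invertible
U = rng.standard_normal((n, k))            # low-rank update factors
C = np.eye(k)
V = U.T

lhs = np.linalg.inv(A + U @ C @ V)
Ainv = np.diag(1.0 / np.diag(A))
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv
print(np.max(np.abs(lhs - rhs)))  # agreement up to rounding error
```

The payoff is that only a k × k system is solved per update instead of an n × n one.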
Simple example: full network
Simple example: lasso networks
Open problems
1. Making a choice of λ.
2. Weights within the penalty function (adaptive lasso?).
3. More general error distributions.
LASSO sampling
What I like most about the LARS and LARS-Lasso algorithms is that you effectively get an estimate for all possible values of λ: these are the points on the path β.

A Lasso-LARS sampler for π(β|y, λ) would produce a ‘nice’ function β : ℝ → ℝⁿ such that, for each λ, β(λ) has the conditional marginal distribution

β(λ) ∼ π(β|y, λ).

Goal:

1. Sample β(λ₀) from π(β|y, λ₀).
2. Compute the remainder of β(λ) deterministically from β(λ₀) (e.g. numerically).
Summary
We propose a way to ‘correct’ the LARS positive-lasso algorithm to account for degeneracies.

Our motivation was applications to phylogenetic networks, though we are exploring other applications.