1
High-dimensional Error Analysis of Regularized M-Estimators
Ehsan Abbasi, Christos Thrampoulidis, Babak Hassibi
Allerton Conference, Wednesday September 30, 2015
2
Linear Regression Model
Estimate unknown signal from noisy linear measurements:
y = A x0 + z
• A: measurement/design matrix
• x0: unknown signal
• z: noise vector
3
M-estimators
For some convex loss function, solve:
• Maximum Likelihood (ML) estimators
?
• least-squares, least-absolute deviations, Huber loss, etc.
Fisher information, consistency, asymptotic normality, Cramér-Rao bound, ML, robust statistics, Huber loss, optimal loss, ...
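As a minimal sketch (the symbols \hat{x} for the estimate and \mathcal{L} for the loss are assumed notation, not fixed by the slides), the generic unregularized M-estimator reads
\[
\hat{x} \;=\; \arg\min_{x}\ \mathcal{L}\big(y - A x\big),
\qquad \text{e.g. } \mathcal{L}(u) = \|u\|_2^2 \ \text{(least-squares)}, \quad \mathcal{L}(u) = \|u\|_1 \ \text{(least-absolute deviations)}.
\]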
4
Why revisit & what changes?
• Traditional: the number of measurements grows, but the ambient dimension n is fixed
• Modern: n is increasingly large (machine learning, image processing, sensor/social networks, DNA microarrays, ...)
• Structured signals: sparse, low-rank, block-sparse, slowly-varying, ...
• Compressive sensing:
Regularized M-estimators (a generic form is sketched below)
• Regularizer is structure-inducing, convex, typically non-smooth: L1, nuclear, L1/L2 norms, total variation, atomic norms, ...
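A sketch of the regularized form referred to above, assuming the loss is denoted \mathcal{L}, the regularizer f, and the tuning parameter \lambda:
\[
\hat{x} \;=\; \arg\min_{x}\ \mathcal{L}\big(y - A x\big) \;+\; \lambda\, f(x),
\qquad f \in \big\{ \|\cdot\|_1,\ \|\cdot\|_{*},\ \|\cdot\|_{1,2},\ \mathrm{TV},\ \text{atomic norms}, \dots \big\}.
\]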
5
Classical question - Modern regime: New results & phenomena
• High-dimensional Proportional regime
?
• Question goes back to the 50's (Huber, Kolmogorov, ...)
• Only very recent advances, special instances, strict assumptions
• No general theory!
Assumption: the measurement matrix A has entries iid Gaussian
• benchmark in CS/statistics theory
• universality
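For concreteness, a sketch of the proportional (linear) regime the talk refers to, with the number of measurements denoted m and the aspect ratio denoted \delta (assumed symbols): both dimensions grow together,
\[
m, n \to \infty, \qquad \frac{m}{n} \to \delta \in (0, \infty).
\]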
6
Contribution
• Assume m and n grow large at a proportional rate
• A has entries iid Gaussian
• mild regularity conditions on the loss, pz, f, and px0
Then, with probability one, the squared error converges, and its limit is characterized by the unique solution to a system of four nonlinear equations in four unknowns (see the sketch below):
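A sketch of the form of the statement only, assuming the error is normalized per coordinate and the relevant unknown is denoted \alpha_* (the remaining unknowns and the equations themselves are not reproduced here):
\[
\frac{1}{n}\,\big\|\hat{x} - x_0\big\|_2^2 \;\longrightarrow\; \alpha_*^2
\qquad \text{with probability one,}
\]
where (\alpha_*, \dots) solves the system of four nonlinear equations in four unknowns.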
7
The Equations
Let's parse them, to get some insight ...
8
The Explicit ones
These quantities appear in the equations explicitly.
9
The Loss and the Regularizer
The loss function and the regularizer appear through their Moreau envelope approximations.
In the traditional regime, the functions themselves appear instead of their Moreau envelopes.
10
The Distributions
The convolution of the pdf of the noise with a Gaussian is a completely new phenomenon compared to the traditional regime.
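A sketch of the object this refers to, assuming the noise density is p_z and the Gaussian has standard deviation \alpha (an assumed symbol): the equations involve the density of z + \alpha g with g standard normal, i.e. the convolution
\[
\big(p_z \ast \varphi_{\alpha}\big)(u) \;=\; \int p_z(u - t)\, \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-t^2/(2\alpha^2)}\, dt .
\]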
11
The Expected Moreau Envelope
• The role of the loss and the regularizer is summarized in their expected Moreau envelopes
• how they affect the error performance of the M-estimator
• (strictly) convex and continuously differentiable, even if the underlying function is non-differentiable!
• generalizes the “Gaussian width” or “Gaussian distance squared” or “statistical dimension”.
• the same properties hold for both the loss and the regularizer (see the sketch below)
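A sketch of the kind of summary quantity this refers to, assuming the expected Moreau envelope of the regularizer is taken along a Gaussian perturbation of x_0 (the symbols \alpha, \tau, h are assumptions; the Moreau envelope e_f is recalled on the next slide):
\[
F(\alpha, \tau) \;:=\; \mathbb{E}\Big[\, e_f\big(x_0 + \alpha h;\ \tau\big) \Big],
\qquad h \sim \mathcal{N}(0, I_n),
\]
and an analogous expectation for the loss.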
12
Reminder: Moreau Envelopes
The Moreau-Yosida envelope of f evaluated at x with parameter τ:
\[
e_f(x; \tau) \;=\; \min_{v}\ \Big\{ f(v) + \tfrac{1}{2\tau}\,\|x - v\|_2^2 \Big\}
\]
• always underestimates f at x. The smaller the τ the closer to f
• smooth approximation: always continuously differentiable in both x and τ (even if f is non-differentiable)
• jointly convex in x and τ
• optimal v is unique (proximal operator)
• everything extends to vector-valued function f
13
Examples
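One standard worked example (not necessarily the one shown on the slide): the Moreau envelope of the absolute value is the Huber function,
\[
e_{|\cdot|}(x;\tau) \;=\;
\begin{cases}
\dfrac{x^2}{2\tau}, & |x| \le \tau,\\[4pt]
|x| - \dfrac{\tau}{2}, & |x| > \tau.
\end{cases}
\]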
14
Set Indicator Function
Gaussian width
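A sketch of the connection, with assumed notation: for the indicator \iota_C of a closed convex set C, the Moreau envelope is a scaled squared distance, and its expectation over a standard Gaussian is the "Gaussian distance squared" mentioned earlier,
\[
e_{\iota_C}(x;\tau) \;=\; \frac{1}{2\tau}\,\mathrm{dist}^2(x, C),
\qquad
\mathbb{E}_{h \sim \mathcal{N}(0, I_n)}\big[\mathrm{dist}^2(h, C)\big].
\]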
15
Summarizing Key Features
• Squared error of general regularized M-estimators
• Minimal and generic regularity assumptions
– non-smooth, heavy-tails, non-separable, ...
• Key role of expected Moreau envelopes
– strictly convex and smooth
– generalize known geometric summary parameters
• Observation: fast solution by simple iterative scheme!
16
Simulations
Optimal tuning?
17
Non-smooth losses
18
Non-smooth losses
Optimal loss?
19
Non-smooth losses
Consistent Estimators?
20
Heavy-tailed noise
• Huber loss function + noise iid Cauchy
• Robustness?
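For reference, the standard Huber loss with threshold \rho (the parameter name is an assumption):
\[
\mathcal{H}_{\rho}(u) \;=\;
\begin{cases}
\dfrac{u^2}{2}, & |u| \le \rho,\\[4pt]
\rho\,|u| - \dfrac{\rho^2}{2}, & |u| > \rho.
\end{cases}
\]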
21
Non-separable loss
Square-root LASSO
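The square-root LASSO is a standard instance of a non-separable loss; a sketch of its usual formulation (with tuning parameter \lambda, an assumed symbol):
\[
\hat{x} \;=\; \arg\min_{x}\ \big\| y - A x \big\|_2 \;+\; \lambda\, \|x\|_1 .
\]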
22
Beyond Gaussian Designs
• analysis framework directly applies to elliptically distributed designs
• For the LASSO we have extended ideas to IRO matrices
• Universality over iid entries (empirical observation), with modified equations
23
Convex Gaussian Min-max Theorem
Apply CGMT to
(PO)
(AO)
Theorem (CGMT) [TAH’15,TOH’15]
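A sketch of the primary/auxiliary optimization pair and the comparison usually stated as the CGMT; the notation follows the cited papers only loosely and should be read as an assumption (G an m x n standard Gaussian matrix, g, h standard Gaussian vectors, \psi convex-concave, S_w, S_u compact):
\[
\text{(PO)}\quad \Phi(G) = \min_{w \in S_w} \max_{u \in S_u}\ u^{\top} G w + \psi(w, u),
\qquad
\text{(AO)}\quad \phi(g, h) = \min_{w \in S_w} \max_{u \in S_u}\ \|w\|_2\, g^{\top} u + \|u\|_2\, h^{\top} w + \psi(w, u).
\]
Then \mathbb{P}(\Phi(G) < c) \le 2\,\mathbb{P}(\phi(g,h) \le c), and if S_w, S_u are convex and \psi is convex-concave, also \mathbb{P}(\Phi(G) > c) \le 2\,\mathbb{P}(\phi(g,h) \ge c), so the behavior of the (PO) can be read off the much simpler (AO).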
24
Proof Diagram
M-estimator
→ (PO)   [Duality]
→ (AO)   [CGMT]
→ (DO): deterministic min-max, optimization in 4 variables
→ The Equations   [First-order optimality conditions]
25
Related Literature
• [El Karoui 2013, 2015]
– Ridge regularization, smooth loss, no structured x0
– Elliptical distributions
– iid entries beyond Gaussian
• [Donoho, Montanari 2013]
– No regularizer
– smooth + strongly convex loss, bounded noise
26
Conclusions
• Master Theorem for general M-estimators
– Minimal assumptions
– 4 nonlinear equations, unique solution, fast iterative solution (why?)
– Summary parameters: expected Moreau envelopes
• Opportunities, lots to be asked...
– Optimal loss function? Optimal regularizer?
– When can we be consistent?
– Optimally tuning the tuning parameter?
LASSO: Linear = Non-linear [TAH'15 NIPS]
• CGMT framework is powerful
• non-linear measurements, y = g(Ax0)
• Beyond squared-error analysis... Apply CGMT for a different set S... [TAYH'15 ICASSP]
Thank You!