
Sketched Learning from Random Features Moments

Nicolas Keriven

École Normale Supérieure (Paris)

CFM-ENS chair in Data Science

(thesis with Rémi Gribonval at Inria Rennes)

ISMP, July 6th 2018

Compressive learning

(Diagram: data → Compression → linear sketch → Learning)

• Sketched learning: first compress the data into a linear sketch [Cormode 2011], then learn from the sketch alone
• Examples of linear sketches: hash tables, count sketches, histograms…
• Advantages: one-pass, streaming, distributed compression, data privacy…
• In this talk: unsupervised learning

How-to: build a sketch

What is a sketch? Any linear sketch = a vector of empirical moments
  ẑ = (1/n) Σᵢ Φ(xᵢ)
for some feature function Φ.

What is contained in a sketch?
• Φ(x) = x: the mean
• Φ(x) = a monomial in x: a higher-order moment
• Φ(x) = indicators of bins: a histogram
• Proposed here: kernel random features [Rahimi 2007] (random projections + non-linearity)

Questions:
• What information is preserved by the sketching?
• How can this information be retrieved?
• What is a sufficient number of features?

Intuition: sketching as a linear embedding
- Assumption: the samples xᵢ are drawn i.i.d. from a distribution π
- Linear operator: A(π) = E_{x∼π}[Φ(x)] (linear in π)
- "Noisy" linear measurement: ẑ = A(π) + e, with a small noise e (of order 1/√n)

Dimensionality-reducing, random, linear embedding: Compressive Sensing?
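The sketching step above fits in a few lines of NumPy. This is an illustrative toy (not the toolbox behind the talk's experiments); the Gaussian frequency distribution and its scale are arbitrary choices, corresponding to a Gaussian kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sketcher(d, m, scale=1.0):
    """Random Fourier feature map Phi(x) = (exp(i w_j . x))_{j=1..m}."""
    W = rng.normal(scale=scale, size=(m, d))  # random frequencies
    def sketch(X):
        # Empirical moment z = (1/n) sum_i Phi(x_i): a linear sketch.
        return np.exp(1j * X @ W.T).mean(axis=0)
    return sketch

# n points in dimension d, compressed to m complex moments.
n, d, m = 10_000, 2, 50
X = rng.normal(size=(n, d))
sketch = make_sketcher(d, m)
z = sketch(X)

# Linearity makes sketches mergeable: sketching two halves of the data and
# averaging gives the sketch of the full dataset -- this is what enables
# one-pass, streaming and distributed compression.
z_merged = 0.5 * (sketch(X[: n // 2]) + sketch(X[n // 2:]))
assert np.allclose(z, z_merged)
```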

Example of applications [Keriven 2016, 2017]

Retrieving a mixture of Diracs from a sketch = k-means
Application: spectral clustering for MNIST classification
- Twice as fast as k-means
- 4 orders of magnitude more memory-efficient

Retrieving GMMs from a sketch
Application: speaker verification [Reynolds 2000]
Error:
• EM on 300,000 samples: 29.53
• 20 kB sketch computed on a 50 GB database: 28.96
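A minimal illustration of "retrieving a mixture of Diracs from a sketch": here a brute-force search over a grid of candidate centroid pairs stands in for the greedy decoders used in the papers, and all sizes are arbitrary toy choices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Toy 1-D data with two clusters: retrieving a mixture of 2 Diracs from the
# sketch plays the role of k-means.
n, m = 5000, 30
X = np.concatenate([rng.normal(-2.0, 0.1, n // 2), rng.normal(2.0, 0.1, n // 2)])
w = rng.normal(size=m)                      # random frequencies
z = np.exp(1j * np.outer(X, w)).mean(0)     # empirical sketch

def residual(centroids):
    # Moment matching: || z - (1/k) sum_j Phi(c_j) ||_2
    zc = np.exp(1j * np.outer(centroids, w)).mean(0)
    return np.linalg.norm(z - zc)

grid = np.linspace(-4.0, 4.0, 81)           # candidate centroid locations
best = min(combinations(grid, 2), key=residual)
centroids = np.sort(best)                   # should land near (-2, 2)
```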

In this talk

Q: Theoretical guarantees?
Inspired by Compressive Sensing:
• 1: with the Restricted Isometry Property (RIP)
• 2: with dual certificates

Outline

• Information-preservation guarantees: a RIP analysis (joint work with R. Gribonval, G. Blanchard, Y. Traonmilin)
• Total variation regularization: a dual certificate analysis
• Conclusion, outlooks

Recall: linear inverse problem

True distribution: π*. Sketch: ẑ ≈ A(π*).
• The estimation problem is a linear inverse problem on measures
• Extremely ill-posed!
• Feasibility? (information preservation: does a best-possible decoding algorithm exist?)
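The "noisy linear measurement" view can be checked numerically. In this toy check, π is a standard Gaussian and the moments are random Fourier features, so A(π) is a characteristic function known in closed form and the noise e = ẑ − A(π) can be measured directly; its norm decays like 1/√n.

```python
import numpy as np

rng = np.random.default_rng(3)

# For x ~ N(0, 1) and random Fourier moments, A(pi) is the characteristic
# function E[exp(i w x)] = exp(-w^2 / 2).
m = 50
w = rng.normal(size=m)                       # 1-D random frequencies
A_pi = np.exp(-0.5 * w**2)                   # exact moments of pi = N(0, 1)

def noise_norm(n):
    x = rng.normal(size=n)
    z_hat = np.exp(1j * np.outer(x, w)).mean(0)   # empirical sketch
    return np.linalg.norm(z_hat - A_pi)

errs = {n: noise_norm(n) for n in (100, 10_000)}
# ||e|| decays like 1/sqrt(n): roughly 10x smaller when n grows 100x
```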

Information preservation guarantees

S: model set of "simple" distributions (e.g. GMMs)

Goal: prove the existence of a decoder robust to noise and stable to modeling error (an "instance-optimal" decoder).

Lower Restricted Isometry Property (LRIP): for all τ, τ′ in the model S,
  ‖τ − τ′‖ ≤ C ‖A(τ) − A(τ′)‖₂
(with ‖τ − τ′‖ measured in a task-dependent metric). The LRIP implies the existence of such a decoder, and the ideal decoder is moment matching, i.e. a Generalized Method of Moments.

New goal: find/construct models and operators that satisfy the LRIP (w.h.p.)

Proving the LRIP

Goal: the LRIP over the whole model S.

Construction of A:
- Kernel mean embedding [Gretton 2006, Borgwardt 2006]
- Random features [Rahimi 2007]
→ Pointwise LRIP, by concentration of the random features.

Extension to the uniform LRIP: via the covering numbers (compactness) of the normalized secant set of the model — a subset of a unit ball (in infinite dimension) that only depends on S.

Main result [Keriven 2016]

Main hypothesis: the normalized secant set has finite covering numbers.
- Classic Compressive Sensing (finite dimension): known
- Here (infinite dimension): technical

Result: for a sketch size
  m ≳ (pointwise concentration) × (dimensionality of the model),
with high probability the excess risk of the decoder is controlled by two terms: the modeling error and the empirical noise.

Application

k-means with mixtures of Diracs (no assumption on the data):
- Hypotheses: separated centroids; bounded domain for the centroids
- Sketch: adjusted random Fourier features (for technical reasons)
- Result: w.r.t. the usual k-means cost (SSE)
- Sketch size: explicit bound in the theorem

GMM with known covariance:
- Hypotheses: sufficiently separated means; bounded domain for the means
- Sketch: Fourier features
- Result: w.r.t. the log-likelihood
- Sketch size: explicit bound in the theorem

Compared to the classical Generalized Method of Moments, the guarantees are of a different nature.

Outline

• Information-preservation guarantees: a RIP analysis
• Total variation regularization: a dual certificate analysis (joint work with C. Poon, G. Peyré)
• Conclusion, outlooks

Total Variation regularization

Previously (RIP analysis): minimization by moment matching,
  min_{τ ∈ S} ‖A(τ) − ẑ‖₂
• Must know the number of components in advance
• Non-convex!

Convex relaxation ("super-resolution"): the Beurling-LASSO (BLASSO) [De Castro 2015],
  min_μ ½ ‖A(μ) − ẑ‖₂² + λ ‖μ‖_TV
• μ: a Radon measure
• ‖μ‖_TV: its total variation (an "L1 norm" for measures)

Questions:
• Is the recovered measure sparse?
• Does it have the right number of components?
• Does it recover the true distribution?
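Restricted to a finite grid of candidate locations, the BLASSO becomes an ordinary nonnegative LASSO on the weight vector (the TV norm reduces to an L1 norm), which a projected ISTA can solve. A toy sketch with hypothetical sizes, frequencies and λ; the spikes are placed on the grid so exact recovery is possible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 1-D, random Fourier moments, two true on-grid spikes.
m, G = 40, 121
w = rng.normal(scale=2.0, size=m)            # random frequencies
grid = np.linspace(-3.0, 3.0, G)             # candidate spike locations
A = np.exp(1j * np.outer(w, grid))           # A[l, j] = Phi_l(grid_j)
true_locs = np.array([-1.0, 1.5])
z = np.exp(1j * np.outer(w, true_locs)) @ np.array([0.5, 0.5])  # noiseless sketch

# Projected ISTA for  min_{a >= 0}  0.5 ||A a - z||^2 + lam * sum(a),
# i.e. the BLASSO restricted to the grid.
lam = 0.1
L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
a = np.zeros(G)
for _ in range(5000):
    grad = (A.conj().T @ (A @ a - z)).real
    a = np.maximum(0.0, a - (grad + lam) / L)

# The recovered weights should concentrate around the true spikes -1.0 and 1.5.
```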

Dual certificates

Dual certificate analysis: η = A*p (p = the Lagrange multiplier), a function such that:
• η = 1 on the support of the true measure
• |η| < 1 otherwise

Step 1: study the certificate of the full (m → ∞) kernel [Candès 2013], assuming the components are sufficiently separated.
Step 2: bound the deviations of the random-feature certificate from the full one.
(Figure: certificates for m = 10, 20, 50.)
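A toy version of this construction, under simplifying assumptions (real features, hypothetical scales; the full analyses also impose η′(c_j) = 0 and prove |η| < 1 everywhere off the support): take the minimal-norm p that interpolates η(c_j) = 1 at the spikes.

```python
import numpy as np

rng = np.random.default_rng(4)

# Real random Fourier features phi(x) = [cos(w_j x), sin(w_j x)] / sqrt(m).
m = 60
w = rng.normal(scale=3.0, size=m)

def feats(x):
    P = np.outer(x, w)
    return np.concatenate([np.cos(P), np.sin(P)], axis=1) / np.sqrt(m)

locs = np.array([-1.0, 1.5])                 # sufficiently separated spikes
Phi = feats(locs)                            # 2 x 2m interpolation constraints
p = np.linalg.pinv(Phi) @ np.ones(2)         # minimal-norm p with eta(c_j) = 1

def eta(x):
    # The candidate certificate eta = A* p, evaluated pointwise.
    return feats(np.atleast_1d(x)) @ p

grid = np.linspace(-4.0, 4.0, 801)
# eta equals 1 at the spikes and stays well below 1 far away from them.
```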


Results for separated GMM

Nicolas Keriven

Assumption: the data are actually drawn from a GMM with sufficiently separated components.

1: Ideal scaling in sparsity (in progress…)
• not necessarily the right number of components, but:
• the mass of the recovered mixture concentrates around the true components
• (weak) robustness to modelling error
• Proof: infinite-dimensional golfing scheme (new)

2: Minimal-norm certificate [Duval, Peyré 2015] (in progress…)
• when n is high enough: sparse solution, with the right number of components
• Proof: adaptation of [Tang, Recht 2013]

13/15
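Under this assumption the population sketch has a closed form: with random Fourier features, each sketch entry is the mixture's characteristic function at a random frequency, so the empirical sketch converges to an explicit moment vector as n grows. A small self-contained check, with an illustrative 1-D GMM (weights, means, bandwidths are not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D GMM: weights, means, standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([0.5, 0.5])

# Draw n samples from the mixture.
n = 100_000
comp = rng.choice(len(w), size=n, p=w)
x = rng.normal(mu[comp], sd[comp])

# Empirical sketch with random Fourier features: z_j = (1/n) sum_i e^{i w_j x_i}.
m = 20
omega = rng.normal(0.0, 1.0, size=m)
z_emp = np.array([np.exp(1j * o * x).mean() for o in omega])

# Population sketch: the GMM's characteristic function at the same frequencies,
# sum_k w_k exp(i w_j mu_k - sd_k^2 w_j^2 / 2).
z_pop = sum(wk * np.exp(1j * omega * mk - 0.5 * (omega * sk) ** 2)
            for wk, mk, sk in zip(w, mu, sd))

err = np.abs(z_emp - z_pop).max()
print(err)  # O(1/sqrt(n)): empirical sketch fluctuates around the closed form
```

This is the sense in which sketched GMM estimation is a generalized method of moments: recovery results like the two above are stated for the decoder applied to this (noisy) moment vector.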


Outline

Information-preservation guarantees: a RIP analysis

Total variation regularization: a dual certificate analysis

Conclusion, outlooks


Sketch learning

Nicolas Keriven

• Sketching: streaming, distributed learning
• An original view on data compression and generalized moments
• Combines random features and kernel mean embeddings with infinite-dimensional compressive sensing

14/15
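The streaming and distributed claims follow directly from linearity of the sketch: the sketch of a union of datasets is the size-weighted average of the chunk sketches, so chunks can be sketched independently and merged, and a stream can update the sketch one sample at a time. A minimal illustration with random Fourier features (dimensions and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 5, 32                     # data dimension, sketch size (illustrative)
Omega = rng.normal(size=(m, d))  # random frequencies, shared by all workers

def sketch(X):
    """Empirical random-Fourier-features sketch: (1/n) sum_i exp(i Omega x_i)."""
    return np.exp(1j * X @ Omega.T).mean(axis=0)

# Two chunks processed separately (e.g. on two machines, or stream segments).
X1, X2 = rng.normal(size=(300, d)), rng.normal(size=(700, d))
n1, n2 = len(X1), len(X2)

merged = (n1 * sketch(X1) + n2 * sketch(X2)) / (n1 + n2)
full = sketch(np.vstack([X1, X2]))
print(np.allclose(merged, full))  # True: merged chunk sketches = full sketch
```

The only shared state is the frequency matrix Omega; each worker's output has fixed size m regardless of how much data it saw.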


Summary, outlooks

Nicolas Keriven

• RIP analysis
• Information-preservation guarantees
• Fine control on noise, modelling error (instance-optimal decoder) and recovery metrics
• Necessary and sufficient conditions

• Dual certificate analysis
• Convex minimization
• In some cases, automatically recovers the right number of components

• Outlooks
• Algorithms for TV minimization
• Other features (not necessarily random…)
• Other « sketched » learning tasks
• Multilayer sketches?

15/15


Thank you!

Nicolas Keriven

• Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning with Random Feature Moments. 2017. arXiv:1706.07180
• Keriven. Sketching for Large-Scale Learning of Mixture Models. PhD thesis. tel-01620815
• Poon, Keriven, Peyré. A Dual Certificates Analysis of Compressive Off-the-Grid Recovery. 2018. arXiv:1802.08464
• Code, applications: nkeriven.github.io