
Page 1: Noise Robustness of Optimization Algorithms via Statistical Queries

Cristóbal Guzmán ([email protected])

CWI Scientific Meeting, 27-11-2015

Page 5: Motivation

• In algorithms: Faster is better, right?

• Sometimes, other objectives are as important

• This time, we care about robustness

Page 10: Example: Gradient Methods

Goal: Find $f^* = \min_{x \in X} f(x)$, where $f : X \to \mathbb{R}$ is a smooth convex function and $X \subseteq \mathbb{R}^d$ is a compact convex set.

• Gradient Descent [Euler:1770]

$$x_{t+1} = x_t - \gamma_t \nabla f(x_t) \;\Rightarrow\; f(x_T) - f^* = O(1/T)$$

• Accelerated Gradient Method [Nesterov:1983]

$$x_t = y_t - \gamma_t \nabla f(y_t), \qquad y_{t+1} = x_t + \frac{\alpha_t - 1}{\alpha_{t+1}}\,(x_t - x_{t-1}) \;\Rightarrow\; f(x_T) - f^* = O(1/T^2)$$
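To make the two rates concrete, here is a minimal sketch (a toy setup of my own, not from the slides): both methods run on a random smooth convex quadratic with step size $1/L$, where $L$ is the smoothness constant.

```python
import numpy as np

# Minimal sketch (toy setup, not from the slides): gradient descent vs.
# Nesterov's accelerated method on f(x) = 0.5 x^T A x - b^T x, step 1/L.
d = 50
rng = np.random.default_rng(0)
Q = rng.standard_normal((d, d))
A = Q.T @ Q / d + 0.01 * np.eye(d)       # positive definite: smooth and convex
b = rng.standard_normal(d)
L = float(np.linalg.eigvalsh(A).max())   # smoothness constant (top eigenvalue)

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
f_star = f(np.linalg.solve(A, b))        # optimal value f*

def gradient_descent(T):
    x = np.zeros(d)
    for _ in range(T):
        x = x - (1 / L) * grad(x)        # x_{t+1} = x_t - gamma * grad f(x_t)
    return f(x) - f_star                 # decays like O(1/T)

def accelerated(T):
    x_prev = x = np.zeros(d)
    alpha = 1.0
    for _ in range(T):
        alpha_next = (1 + np.sqrt(1 + 4 * alpha ** 2)) / 2
        y = x + (alpha - 1) / alpha_next * (x - x_prev)  # momentum step
        x_prev, x = x, y - (1 / L) * grad(y)             # gradient step at y
        alpha = alpha_next
    return f(x) - f_star                 # decays like O(1/T^2)

for T in (50, 200, 800):
    print(T, gradient_descent(T), accelerated(T))
```

Doubling $T$ should roughly halve the gradient-descent gap and quarter the accelerated one, matching the two rates above.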

Page 11: Example: Gradient Methods

[Figure: convergence with an exact gradient oracle, $g(x) = \nabla f(x)$. Source: Moritz Hardt's blog]

Page 12: Example: Gradient Methods

[Figure: convergence with a noisy gradient oracle, $g(x) = \nabla f(x) + \xi$, where $\xi \sim \mathcal{N}(0, \sigma^2)$, $\sigma = 0.1$. Source: Moritz Hardt's blog]
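A hedged sketch of the kind of experiment the two plots contrast (assumptions: toy diagonal quadratic and fixed steps are mine; only the noise model $\xi \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = 0.1$ is from the slide):

```python
import numpy as np

# Run gradient descent and the accelerated method with the noisy oracle
# g(x) = grad f(x) + xi, xi ~ N(0, sigma^2 I), sigma = 0.1 (from the slide).
# Objective f(x) = 0.5 * sum_i diag_i * x_i^2 is a toy choice; f* = 0 at x = 0.
rng = np.random.default_rng(1)
d, sigma, T = 20, 0.1, 500
diag = np.linspace(0.1, 1.0, d)
f = lambda x: 0.5 * np.sum(diag * x * x)
gamma = 1.0 / diag.max()                 # step 1/L

def run(use_momentum):
    x_prev = x = np.full(d, 5.0)
    alpha = 1.0
    for _ in range(T):
        alpha_next = (1 + np.sqrt(1 + 4 * alpha ** 2)) / 2
        beta = (alpha - 1) / alpha_next if use_momentum else 0.0
        y = x + beta * (x - x_prev)                    # momentum (zero for GD)
        g = diag * y + sigma * rng.standard_normal(d)  # noisy gradient oracle
        x_prev, x = x, y - gamma * g
        alpha = alpha_next
    return f(x)

print("gradient descent, noisy oracle:", run(False))
print("accelerated,      noisy oracle:", run(True))
```

On typical runs the accelerated method settles at a higher error level: momentum feeds past oracle errors back into the iterates, which is the robustness concern the deck is raising.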

Page 16: Why Noise Robustness?

• There are other important reasons why acceleration could be harmful

• Overfitting, privacy leakage, etc.

• Goal: A theory of optimization algorithms that incorporates noise sensitivity

[Figure: trade-offs among accuracy $\varepsilon$, iterations $T$, and data]

Page 21: Statistical Queries

• Unknown distribution $D$ supported on $W$

• Algorithm has oracle access to $D$: given a query $\phi : W \to [-1, 1]$, the oracle returns $\mathbb{E}_{w \sim D}[\phi(w)] \pm \tau$

• An oracle query can be emulated by $1/\tau^2$ samples

• Goals

  • Efficiency ~ small number of queries

  • Noise tolerance ~ as large a $\tau$ as possible
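A small sketch of the emulation fact (the constant and the example distribution are illustrative; a rigorous bound adds a $\log(1/\delta)$ factor for confidence $1 - \delta$):

```python
import numpy as np

# Emulate one statistical query phi : W -> [-1, 1] at tolerance tau using
# on the order of 1/tau^2 i.i.d. samples from D; since |phi| <= 1, the
# empirical mean concentrates within tau with high probability.
def sq_oracle(sample_D, phi, tau, rng):
    n = int(np.ceil(4.0 / tau ** 2))            # ~1/tau^2 samples
    return float(np.mean(phi(sample_D(n, rng))))

# Hypothetical example: D = N(0.3, 1) on W = R, query phi(w) = clip(w, -1, 1).
rng = np.random.default_rng(2)
sample_D = lambda n, r: r.normal(0.3, 1.0, size=n)
phi = lambda w: np.clip(w, -1.0, 1.0)
print(sq_oracle(sample_D, phi, tau=0.05, rng=rng))  # E[phi(w)] +/- 0.05 w.h.p.
```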

Page 23: Why Statistical Queries?

SQ algorithms have additional properties, or provide the basis for designing algorithms with additional features:

• Noise tolerance [Kearns:1994]

• Differential privacy [Blum, Dwork, McSherry, Nissim:2005], [Kasiviswanathan, Lee, Nissim, Raskhodnikova, Smith:2011]

• Distributed computation [Balcan, Blum, Fine, Mansour:2012]

• Generalization in adaptive data analysis [Dwork, Feldman, Hardt, Pitassi, Reingold, Roth:2014]

Page 27: Statistical Query Convex Opt.

• Stochastic convex optimization: $\min_{x \in X} \mathbb{E}_{w \sim D}[f(x, w)]$

• Statistical queries

  • Function value: $\phi(\cdot) = f(x, \cdot)$

  • Gradient: $\phi(\cdot) = \dfrac{\partial f(x, \cdot)}{\partial x_i}$, for $i = 1, \ldots, d$

• Difficulty: estimation in 2-norm accumulates errors

$$g_i = \mathbb{E}_{w \sim D}\!\left[\frac{\partial f(x, w)}{\partial x_i}\right] \pm \tau \;\Rightarrow\; \|\nabla \mathbb{E}_w[f(x, w)] - g\|_2 \approx \sqrt{d}\,\tau$$
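An illustrative check of the accumulation (a numeric demonstration, not the paper's algorithm): even when every coordinate estimate $g_i$ is within tolerance $\tau$, the 2-norm error of the full gradient estimate can reach $\sqrt{d}\,\tau$.

```python
import numpy as np

# Take the true gradient to be 0 for simplicity; then any per-coordinate
# error within [-tau, tau] is a valid oracle answer, and the worst case
# puts the 2-norm error at exactly sqrt(d) * tau.
d, tau = 10_000, 0.01
rng = np.random.default_rng(3)
worst_case = np.full(d, tau)                 # every coordinate off by +tau
typical = tau * rng.uniform(-1, 1, size=d)   # arbitrary in-tolerance errors
print(np.linalg.norm(worst_case), np.sqrt(d) * tau)  # both equal 1.0
print(np.linalg.norm(typical))               # still Theta(sqrt(d) * tau)
```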

Page 31: Optimal Estimation Algorithm [Feldman, G., Vempala:2015]

Q: Is it possible to avoid noise accumulation in gradient estimation?

[Figure: algorithm querying the statistical query oracle for $D$ over $W$]

With high probability, $\|\nabla \mathbb{E}_w[f(x, w)] - g\|_2 \approx O(\sqrt{\log d})\,\tau$

Page 34: Further Consequences [Feldman, G., Vempala:2015]

New statistical query algorithms for optimization and machine learning:

• Gradient methods: mirror descent, accelerated method, strongly convex minimization

• Polynomial-time algorithms: center of gravity, interior-point method

• Improved algorithms for high-dimensional classification (Perceptron) and regression

• Improved differentially private convex optimization algorithms

We provide new structural lower bounds for convex optimization.

Page 35: Thank you