quadratic programs cone programsggordon/10725-f12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g.,...

39
Quadratic programs Cone programs 10-725 Optimization Geoff Gordon Ryan Tibshirani

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Quadratic programsCone programs

10-725 OptimizationGeoff Gordon

Ryan Tibshirani

Page 2: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Administrivia

• HW3 back at end of class

• Last day for feedback survey

• All lectures now up on Youtube (and continue to be downloadable from course website)

• Reminder: midterm next Tuesday 11/6!‣ in class, 1 hr 20 min, one sheet (both sides) of notes

2

Page 3: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Quadratic programs

• m constraints, n vars‣ A: Rm×n b: Rm c: Rn x: Rn H: Rn×n

‣ [min or max] xTHx/2 + cTx

‣ s.t. Ax ≤ b or Ax = b [or some mixture]

‣ may have (some elements of) x ≥ 0

• Convex problem if:‣

3

max 2x+x2+y2 s.t.x + y ≤ 4

2x + 5y ≤ 12x + 2y ≤ 5

x, y ≥ 0

Page 4: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

For example

Page 5: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Cone programs

• m constraints, n vars‣ A: Rm×n b: Rm c: Rn x: Rn

‣ Cones K ⊆ Rm L ⊆ Rn

‣ [min or max] cTx s.t. Ax + b ∈ K x ∈ L

‣ convex if

• E.g., K =

• E.g., L =

5

Page 6: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

For example: SOCP• min cTx s.t. Aix + bi ∈ Ki, i = 1, 2, …

6

Page 7: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Conic sections

7

Page 8: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

QPs are reducible to SOCPs

• min xTHx/2 + cTx s.t. …

8

Page 9: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

∃ SOCPs that aren’t QPs?

• QCQP: convex quadratic objective & constraints

• minimize a2 + b2 s.t.‣ a ≥ x2, b ≥ y2

‣ 2x + y = 4

• Not a QP (nonlinear constraints)‣ but, can rewrite as SOCP

9

Page 10: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

More cone programs: SDP

• Semidefinite constraint:‣ variable x ∈ Rn

‣ constant matrices A1, A2, … ∈ Rm×m

‣ constrain

• Semidefinite program: min cTx s.t. ‣ semidefinite constraints

‣ linear equalities

‣ linear inequalities

10

Page 11: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Visualizing S+

• 2 x 2 symmetric matrices w/ tr(A) = 1

Page 12: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

What about 3 x 3?• Try setting entire diagonal to 1/3‣ plot off-diagonal elements (3 of them)

Page 13: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

3×3 symmetric psd matrices

Page 14: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

S+ is self-dual

• S+: { A | A=AT, xTAx ≥ 0 for all x }

• [xTAx ≥ 0 for all x] ⇔ [tr(BTA) ≥ 0 for all psd B]

Page 15: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

How hard are QPs and CPs?

• Convex QP or CP: not much harder than LP!‣ as long as we have an efficient rep’n of the cone

‣ poly(L, 1/ϵ) (L = bit length, ϵ = accuracy)

‣ can we get strongly polynomial (no 1/ϵ)?‣ famous open question, even for LP

• General QP or CP: NP-complete‣ e.g., reduce max cut to QP

15

Page 16: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

QP examples

• Euclidean projection

• LASSO‣ Mahalanobis projection

• Huber regression

• Support vector machine

16

Page 17: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

LASSO example

17

y

x

fit y = ax+bw/ (a,b) sparse

Page 18: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

LASS

O e

xam

ple

Page 19: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Robust (Huber) regression

• Given points (xi, yi)

‣ L2 regression: minw Σi (yi – xiTw)2

• Problem: overfitting!

• Solution: Huber loss‣ minw Σi Hu(yi – xi

Tw)

Hu(z) =

Page 20: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Huber loss as QP

• Hu(z) = mina,b (z + a – b)2 + 2a + 2b ‣ s.t. a, b ≥ 0

Page 21: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Cone program examples

• SOCP‣ (sparse) group lasso

‣ discrete MRF relaxation

‣ [Kumar, Kolmogorov, Torr, JMLR 2008]

‣ min volume covering ellipsoid (nonlinear objective)

21

Page 22: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Cone program examples

• SDP‣ graphical lasso (nonlinear objective)

‣ Markowitz portfolio optimization (see B&V)

‣ max-cut relaxation [Goemans, Williamson]

‣ matrix completion

‣ manifold learning: max variance unfolding

22

Page 23: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Matrix completion

• Observe Aij for ij ∈ E, write Pij = {

• min ||(X–A)○P||2 + λ||X||

23

F *

Page 24: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Max-variance unfoldingaka semidefinite embedding

• Goal: given x1, … xT ∈ Rn

‣ find y1, …, yT ∈ Rk (k ≪ n)

‣ ||yi – yj|| ≈ ||xi – xj|| ∀i,j ∈ E

• If xi were near a k-dim subspace of Rn, PCA!

• Instead, two steps:‣ first look for z1, … zT ∈ Rn with

‣ ||zi – zj|| = ||xi – xj|| ∀i,j ∈ E

‣ and var(z) as big as possible

‣ then use PCA to get yi from zi

24

Page 25: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

MVU/SDE

• maxz tr(cov(z)) s.t. ||zi – zj|| = ||xi – xj|| ∀i,j ∈ E

25

Page 26: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Duality for QPs and Cone Ps

• Combined QP/CP:‣ min cTx + xTHx/2 s.t. Ax + b ∈ K x ∈ L

‣ cones K, L implement any/all of equality, inequality, generalized inequality

26

Page 27: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Primal-dual pair

• Primal:‣ min cTx + xTHx/2 s.t. Ax + b ∈ K x ∈ L

• Dual:‣ max –zTHz/2 – bTy s.t. Hz + c – ATy ∈ L* y ∈ K*

27

Page 28: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

KKT conditions‣ min cTx + xTHx/2 s.t. Ax + b ∈ K x ∈ L

‣ max –bTy – zTHz/2 s.t. Hz + c – ATy ∈ L* y ∈ K*

28

prim

al-

dual

pai

r

Page 29: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

KKT conditions

‣ primal: Ax+b ∈ K x ∈ L

‣ dual: Hz + c – ATy ∈ L* y ∈ K*

‣ quadratic: Hx = Hz

‣ comp. slack: yT(Ax+b) = 0 xT(Hz+c–ATy) = 0

29

Page 30: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Support vector machines(separable case)

Page 31: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Maximizing margin

• margin M = yi (xi . w - b)

• max M s.t. M ≤ yi (xi . w - b)

Page 32: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

For example

32

1 0 1 2 31

0.5

0

0.5

1

1.5

2

2.5

Page 33: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Slacks

• min ||v||2/2 s.t. yi (xiTv – d) ≥ 1 ∀i

33

1 0 1 2 31

0.5

0

0.5

1

1.5

2

2.5

Page 34: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

SVM duality

• min ||v||2/2 – Σsi s.t. yi (xiTv – d) ≥ 1–si ∀i

• min vTv/2 + 1Ts s.t. Av – yd + s – 1 ≥ 0

34

Page 35: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Interpreting the dual• max 1Tα – αTKα/2 s.t. yTα = 0 0 ≤ α ≤ 1

35!! !"#$ " "#$ ! !#$ % %#$ &!!

!"#$

"

"#$

!

!#$

%

%#$

α:

α>0:

α<1:

yTα=0:

Page 36: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

From dual to primal• max 1Tα – αTKα/2 s.t. yTα = 0 0 ≤ α ≤ 1

36!! !"#$ " "#$ ! !#$ % %#$ &!!

!"#$

"

"#$

!

!#$

%

%#$

Page 37: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

A suboptimal support set

37

1 0 1 21

0.5

0

0.5

1

1.5

2

2.5

Page 38: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

SVM duality: the applet

Page 39: Quadratic programs Cone programsggordon/10725-F12/slides/20-qps.pdf · 2012-11-02 · ‣ e.g., reduce max cut to QP 15. Geoff Gordon—10-725 Optimization—Fall 2012 QP examples

Geoff Gordon—10-725 Optimization—Fall 2012

Why is the dual useful?aka the kernel trick

• SVM: n examples, m features‣ primal:

‣ dual:

39

max 1Tα – αTAATα/2 s.t. yTα = 0 0 ≤ α ≤ 1