Lecture 5: optimization and convexity Sanjeev Arora Elad Hazan COS 402 – Machine Learning and Artificial Intelligence Fall 2016


Page 1:

Lecture 5: optimization and convexity

Sanjeev Arora, Elad Hazan

COS 402 – Machine Learning and Artificial Intelligence, Fall 2016

Page 2:

Admin

• Exercise 2 (implementation) next Thu, in class
• Exercise 3 (written), due next Thu
• Movie – "Ex Machina" + discussion panel w. Prof. Hasson (PNI), Wed Oct. 4th 19:30, tickets still available @ Bella room 204 COS

• Next Tue: special guest - Dr. Yoram Singer @ Google

Page 3:

Recap

• Definition + fundamental theorem of statistical learning
• Powerful classes w. low sample complexity error exist (i.e. python programs), but are computationally hard
• Perceptron
• SVM

Page 4:

Agenda

• Convex relaxations
• Convex optimization
• Gradient descent

Page 5:

Mathematical optimization

Input: function $f: K \mapsto \mathbb{R}$, for $K \subseteq \mathbb{R}^d$

Output: point $x \in K$, such that $f(x) \le f(y) \ \ \forall y \in K$
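A minimal worked instance (illustrative only, not from the slides): take a quadratic over an interval,

$$f(x) = (x - 2)^2, \qquad K = [0, 1] \subseteq \mathbb{R}.$$

The unconstrained minimizer $x = 2$ lies outside $K$; since $f$ decreases on $[0,1]$, the constrained output is the boundary point $x = 1$.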

Page 6:

Mathematical optimization

• Continuous functions (back to calculus: derivatives, differentiability, …)
• Vs. combinatorial optimization as in graph algorithms (strong connection)
• Studied since the early 1900's; lots of work in the Soviet Union (central optimization, resource allocation, military applications, etc.)
• Special cases: linear programming, convex optimization, max flow in graphs

Efficient (poly-time) algorithms

Page 7:

Optimization for linear classification

Given a sample $S = \{(x_1, y_1), \dots, (x_m, y_m)\}$, find a hyperplane (through the origin w.l.o.g.) such that:

$$w = \arg\min_{\|w\| \le 1} \ \#\text{ of mistakes}$$

Page 8:

Optimization for linear classification

$$w = \arg\min_{\|w\| \le 1} \left|\left\{\, i \ \text{s.t.}\ \operatorname{sign}(w^\top x_i) \ne y_i \,\right\}\right|$$

Page 9:

Minimization can be hard

Page 10:

Sum of signs → hard

Page 11:

Convex functions: local → global

Sum of convex functions → also convex

Page 12:

Convex relaxation for 0-1 loss

Page 13:

Convex relaxation for linear classification

$$w = \arg\min_{\|w\| \le 1} \sum_i \ell(w^\top x_i, y_i), \quad \text{such as:}$$

1. Ridge / linear regression: $\ell(w^\top x_i, y_i) = (w^\top x_i - y_i)^2$

2. SVM: $\ell(w^\top x_i, y_i) = \max\{0,\, 1 - y_i\, w^\top x_i\}$
3. Logistic regression: $\ell(w^\top x_i, y_i) = \log(1 + e^{-y_i w^\top x_i})$

vs. the original 0-1 objective:

$$w = \arg\min_{\|w\| \le 1} \left|\left\{\, i \ \text{s.t.}\ \operatorname{sign}(w^\top x_i) \ne y_i \,\right\}\right|$$
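A small sketch of the three surrogate losses above (hypothetical helper names, assuming labels $y_i \in \{-1, +1\}$), to make the relaxation concrete:

```python
import numpy as np

def squared_loss(w, x, y):
    # Ridge / linear regression surrogate: (w^T x - y)^2
    return (w @ x - y) ** 2

def hinge_loss(w, x, y):
    # SVM surrogate: max{0, 1 - y * w^T x}
    return max(0.0, 1.0 - y * (w @ x))

def logistic_loss(w, x, y):
    # Logistic regression surrogate: log(1 + exp(-y * w^T x))
    return np.log1p(np.exp(-y * (w @ x)))

# One labeled example in R^2 with y in {-1, +1}
w = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
y = 1.0
print(squared_loss(w, x, y), hinge_loss(w, x, y), logistic_loss(w, x, y))
```

All three are convex in $w$, unlike the 0-1 objective.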

Page 14:

Small recap

• Finding linear classifiers: formulated as mathematical optimization
• Convexity: property that allows local greedy algorithms
• Formulate convex relaxations to linear classification

Next:
• Algorithms for convex optimization

Page 15:

Convexity

A function $f: \mathbb{R}^d \mapsto \mathbb{R}$ is convex if and only if, for all $x, y$:

$$f\!\left(\tfrac{1}{2}x + \tfrac{1}{2}y\right) \le \tfrac{1}{2}f(x) + \tfrac{1}{2}f(y)$$

• Informally: smiley ☺
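A quick numerical illustration of the midpoint definition (a sketch with made-up helper names; a random spot-check gives evidence of convexity, not a proof):

```python
import numpy as np

def midpoint_convex(f, num_trials=1000, dim=2, scale=5.0, seed=0):
    # Spot-check f(x/2 + y/2) <= f(x)/2 + f(y)/2 on random pairs of points.
    rng = np.random.default_rng(seed)
    for _ in range(num_trials):
        x = rng.uniform(-scale, scale, size=dim)
        y = rng.uniform(-scale, scale, size=dim)
        if f((x + y) / 2) > f(x) / 2 + f(y) / 2 + 1e-12:
            return False
    return True

print(midpoint_convex(lambda v: np.sum(v**2)))     # ||v||^2 is convex -> True
print(midpoint_convex(lambda v: -np.sum(v**2)))    # -||v||^2 is concave -> False (almost surely)
print(midpoint_convex(lambda v: np.sin(v).sum()))  # sin is not convex -> False (almost surely)
```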

Page 16:

Calculus reminder: gradient

• Gradient = the vector of coordinate-wise derivatives; its negative is the direction of steepest descent:

$$[\nabla f(x)]_i = \frac{\partial}{\partial x_i} f(x)$$
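As a quick sanity check of this definition (an illustrative sketch, not course code), the gradient can be approximated coordinate by coordinate with finite differences:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Approximate [grad f(x)]_i = d/dx_i f(x) with central differences.
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

# Check on f(x) = ||x||^2, whose exact gradient is 2x.
x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v**2), x))  # ~ [2, -4, 6]
```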

Page 17:

Convexity

(assumes differentiability, o/w subgradient; another alternative: second derivative is non-negative in 1D)

• Alternative definition:

$$f(y) \ge f(x) + \nabla f(x)^\top (y - x)$$

[Figure: $f$ and its tangent at $x$, which lies below the graph at $y$]
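A small numerical check of this inequality (an assumed example using $f(x) = \|x\|^2$ with gradient $2x$, not from the slides):

```python
import numpy as np

def first_order_gap(f, grad, x, y):
    # f(y) - [f(x) + grad f(x)^T (y - x)]; nonnegative for all x, y iff f is convex.
    return f(y) - (f(x) + grad(x) @ (y - x))

f = lambda v: np.sum(v ** 2)   # convex
grad = lambda v: 2 * v
rng = np.random.default_rng(1)
gaps = [first_order_gap(f, grad, rng.normal(size=3), rng.normal(size=3)) for _ in range(5)]
print(gaps)  # all nonnegative: the tangent plane at x stays below f
```

For $f(x) = \|x\|^2$ the gap equals $\|y - x\|^2$, so it is nonnegative exactly as the definition demands.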

Page 18:

Greedy optimization: gradient descent

• Move in the direction of steepest descent, i.e., the negative gradient $-\nabla f(x)$, where

$$[\nabla f(x)]_i = \frac{\partial}{\partial x_i} f(x)$$

$$x_{t+1} \leftarrow x_t - \eta \nabla f(x_t)$$

where $\eta$ is the "step size" or "learning rate".

[Figure: iterates $p_1, p_2, p_3, \dots$ approaching the optimum $p^*$]
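A minimal sketch of this update in code (hypothetical names, fixed step size; illustrative rather than the course's implementation):

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, num_steps=100):
    # x_{t+1} <- x_t - eta * grad f(x_t)
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad(x)
    return x

# Example: minimize f(x) = ||x - c||^2, whose gradient is 2(x - c); minimizer is c.
c = np.array([3.0, -1.0])
print(gradient_descent(lambda x: 2 * (x - c), x0=[0.0, 0.0], num_steps=200))  # ~ [3, -1]
```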

Page 19:

Gradient descent – constrained set

$$y_{t+1} \leftarrow x_t - \eta \nabla f(x_t), \qquad x_{t+1} = \arg\min_{x \in K} \|y_{t+1} - x\|$$
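A sketch of the projected update, assuming for concreteness that $K$ is a Euclidean ball so the projection has a closed form (other constraint sets need their own projection routine):

```python
import numpy as np

def project_onto_ball(y, radius=1.0):
    # Euclidean projection onto K = {x : ||x|| <= radius}:
    # argmin_{x in K} ||y - x|| simply rescales y if it lies outside the ball.
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def projected_gradient_descent(grad, x0, eta=0.1, num_steps=100, radius=1.0):
    # y_{t+1} <- x_t - eta * grad f(x_t);  x_{t+1} = argmin_{x in K} ||y_{t+1} - x||
    x = project_onto_ball(np.array(x0, dtype=float), radius)
    for _ in range(num_steps):
        y = x - eta * grad(x)
        x = project_onto_ball(y, radius)
    return x

# Example: minimize ||x - c||^2 over the unit ball; the unconstrained minimizer c
# lies outside K, so the answer is c rescaled to the boundary.
c = np.array([3.0, 4.0])
print(projected_gradient_descent(lambda x: 2 * (x - c), x0=[0.0, 0.0]))  # ~ [0.6, 0.8]
```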

Page 20:

Convex constraints

Set K is convex if and only if:

$$x, y \in K \ \Rightarrow\ \left(\tfrac{1}{2}x + \tfrac{1}{2}y\right) \in K$$

Page 21:

Gradient descent – constrained set

$$y_{t+1} \leftarrow x_t - \eta \nabla f(x_t), \qquad x_{t+1} = \arg\min_{x \in K} \|y_{t+1} - x\|$$

Let:
• G = upper bound on the norm of the gradients: $\|\nabla f(x_t)\| \le G$
• D = diameter of the constraint set: $\forall x, y \in K,\ \|x - y\| \le D$

Theorem: for step size $\eta = \frac{D}{G\sqrt{T}}$,

$$f\!\left(\frac{1}{T}\sum_t x_t\right) \ \le\ \min_{x^* \in K} f(x^*) + \frac{DG}{\sqrt{T}}$$

Page 22:

Proof:
1. Observation 1:
$$\|x^* - y_{t+1}\|^2 = \|x^* - x_t\|^2 - 2\eta\, \nabla f(x_t)^\top (x_t - x^*) + \eta^2 \|\nabla f(x_t)\|^2$$
2. Observation 2:
$$\|x^* - x_{t+1}\|^2 \le \|x^* - y_{t+1}\|^2$$

This is the Pythagorean theorem: projecting $y_{t+1}$ onto the convex set $K$ cannot increase the distance to $x^* \in K$.

$$y_{t+1} \leftarrow x_t - \eta \nabla f(x_t), \qquad x_{t+1} = \arg\min_{x \in K} \|y_{t+1} - x\|$$

Page 23:

Proof:
1. Observation 1:
$$\|x^* - y_{t+1}\|^2 = \|x^* - x_t\|^2 - 2\eta\, \nabla f(x_t)^\top (x_t - x^*) + \eta^2 \|\nabla f(x_t)\|^2$$
2. Observation 2:
$$\|x^* - x_{t+1}\|^2 \le \|x^* - y_{t+1}\|^2$$

Thus:
$$\|x^* - x_{t+1}\|^2 \le \|x^* - x_t\|^2 - 2\eta\, \nabla f(x_t)^\top (x_t - x^*) + \eta^2 G^2$$

And hence:

$$f\!\left(\frac{1}{T}\sum_t x_t\right) - f(x^*) \ \le\ \frac{1}{T}\sum_t \left[f(x_t) - f(x^*)\right] \quad \text{(Jensen, by convexity)}$$
$$\le\ \frac{1}{T}\sum_t \nabla f(x_t)^\top (x_t - x^*) \quad \text{(first-order convexity condition)}$$
$$\le\ \frac{1}{T}\sum_t \left[\frac{1}{2\eta}\left(\|x^* - x_t\|^2 - \|x^* - x_{t+1}\|^2\right) + \frac{\eta}{2} G^2\right]$$
$$\le\ \frac{1}{2\eta T}\, D^2 + \frac{\eta}{2}\, G^2 \ \le\ \frac{DG}{\sqrt{T}}$$

$$y_{t+1} \leftarrow x_t - \eta \nabla f(x_t), \qquad x_{t+1} = \arg\min_{x \in K} \|y_{t+1} - x\|$$

Page 24:

Gradient descent – constrained set

Theorem: for step size $\eta = \frac{D}{G\sqrt{T}}$,

$$f\!\left(\frac{1}{T}\sum_t x_t\right) \ \le\ \min_{x^* \in K} f(x^*) + \frac{DG}{\sqrt{T}}$$

Thus, to get an $\epsilon$-approximate solution, apply $\frac{D^2 G^2}{\epsilon^2}$ gradient iterations.
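For a sense of scale (an illustrative plug-in of the bound, not a slide example): setting $\frac{DG}{\sqrt{T}} \le \epsilon$ and solving for $T$ gives

$$T \ \ge\ \frac{D^2 G^2}{\epsilon^2}, \qquad \text{e.g. } D = 1,\ G = 1,\ \epsilon = 0.01 \ \Rightarrow\ T \ge 10^4 \text{ iterations.}$$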

Page 25:

GD for linear classification

1. Ridge / linear regression: $\ell(w^\top x_i, y_i) = (w^\top x_i - y_i)^2$
2. SVM: $\ell(w^\top x_i, y_i) = \max\{0,\, 1 - y_i\, w^\top x_i\}$
3. Logistic regression: $\ell(w^\top x_i, y_i) = \log(1 + e^{-y_i w^\top x_i})$

$$w = \arg\min_{\|w\| \le 1} \frac{1}{m}\sum_i \ell(w^\top x_i, y_i)$$

Page 26:

GD for linear classification

$$w = \arg\min_{\|w\| \le 1} \frac{1}{m}\sum_i \ell(w^\top x_i, y_i)$$

$$w_{t+1} = w_t - \eta \cdot \frac{1}{m}\sum_i \ell'(w_t^\top x_i, y_i)\, x_i$$

• Complexity? $\frac{1}{\epsilon^2}$ iterations, each taking ~linear time in the dataset
• Overall $O\!\left(\frac{md}{\epsilon^2}\right)$ running time, where m = # of examples in $\mathbb{R}^d$

• Can we speed it up??
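A hedged sketch of this update for the logistic loss (hypothetical names, synthetic data, and unconstrained for simplicity, i.e. omitting the $\|w\| \le 1$ projection); each step touches all m examples, matching the ~linear per-iteration cost above:

```python
import numpy as np

def logistic_gd(X, y, eta=0.1, num_steps=500):
    # Full-batch gradient descent on (1/m) sum_i log(1 + exp(-y_i w^T x_i)).
    # Gradient of one term: -y_i * x_i / (1 + exp(y_i w^T x_i)).
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(num_steps):
        margins = y * (X @ w)                      # y_i * w^T x_i for all i
        coeffs = -y / (1.0 + np.exp(margins))      # per-example derivative factor
        grad = (coeffs[:, None] * X).mean(axis=0)  # average over the m examples
        w = w - eta * grad                         # each step costs O(m d) time
    return w

# Tiny synthetic example with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]))
w = logistic_gd(X, y)
print(w, np.mean(np.sign(X @ w) == y))  # learned direction and training accuracy
```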

Page 27:

Summary

• Mathematical optimization for linear classification
• Convex relaxations
• Gradient descent algorithm
• GD applied to linear classification