Continuous Optimization: Problems and Successes

DESCRIPTION

Continuous Optimization: Problems and Successes. Tijl De Bie, Intelligent Systems Laboratory, MVSE, University of Bristol, United Kingdom ([email protected]). Motivation: the back-propagation algorithm for training neural networks (gradient descent); support vector machines.

TRANSCRIPT
Continuous Optimization: Problems and Successes
Tijl De Bie, Intelligent Systems Laboratory
MVSE, University of Bristol, United Kingdom
Continuous Optimization (Tijl De Bie)
Slide2
• Back-propagation algorithm for training neural networks (gradient descent)
• Support vector machines
• Convex optimization 'boom' (NIPS, also ICML, KDD...)

Motivation

What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)
Slide3
• Consider $x \in \mathbb{R}^d$, and real-valued functions $f_i$ and $g_i$ defined over $\mathbb{R}^d$.
• Continuous optimization:
  $\min_x f_0(x)$ subject to $f_i(x) \le 0,\ i = 1, \ldots, k$ and $g_i(x) = 0,\ i = 1, \ldots, l$
• Convex optimization: the $f_i$ are convex functions, and the $g_i$ are affine functions.

(Convex) continuous optimization
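The convexity requirement on the $f_i$ can be probed numerically. A minimal NumPy sketch (the function and the sampled points are illustrative, not from the slides) checking midpoint convexity of $f(x) = \|x\|^2$:

```python
import numpy as np

# Illustrative convexity probe: f((x+y)/2) <= (f(x)+f(y))/2 on random segments.
def f(x):
    return float(np.dot(x, x))  # f(x) = ||x||^2, a convex function

rng = np.random.default_rng(0)
d = 5
convex_ok = all(
    f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
    for x, y in ((rng.standard_normal(d), rng.standard_normal(d)) for _ in range(100))
)
print(convex_ok)  # midpoint convexity holds on every sampled pair
```

For $\|x\|^2$ this is the parallelogram law in disguise, so the check passes for every pair.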
Slide4
Convex optimization
Slide5
• General convex optimization approach
  – Start with a guess, iteratively improve until the optimum is found
  – E.g. gradient descent, conjugate gradient, Newton's method, etc.
• For constrained convex optimization: interior point methods
  – Provably efficient (worst case; typical case even better)
  – Iteration complexity: $O(\sqrt{k}\,\log(1/\epsilon))$
  – Complexity per iteration: polynomial
• Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK...)
• Purely declarative
• Book: Convex Optimization (Boyd & Vandenberghe)

Convex optimization
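Gradient descent, the first scheme the slide lists, is a few lines in NumPy. A minimal sketch on a positive-definite quadratic (the matrix, vector, and step size are illustrative):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 x'Qx - b'x with Q positive definite.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b  # gradient of the quadratic objective

x = np.zeros(2)
step = 0.2            # fixed step, small enough for convergence here
for _ in range(200):
    x = x - step * grad(x)

x_star = np.linalg.solve(Q, b)  # exact minimiser, for comparison
print(np.allclose(x, x_star, atol=1e-6))
```

With this step size the iteration contracts geometrically, so 200 steps are far more than needed.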
Slide6
Convex optimization
[Diagram: nested classes of convex optimization problems: LP ⊂ QP ⊂ SOCP ⊂ SDP, inside Cone Programming; Logdet and Geometric Programming alongside, all within Convex Optimization]
Slide7
• Linear objective, linear inequality constraints, affine equality constraints
• Applications:
  – Relaxations of integer LPs
  – Classification: linear support vector machines (SVM), forms of boosting
  – (Lots outside DM/ML)

Linear Programming (LP)

$\min_x c'x$ subject to $g_i(x) = a_i'x - b_i \le 0$, plus affine equality constraints $h_i(x) = 0$
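An LP of this form can be handed directly to an off-the-shelf solver, in the declarative spirit the slides describe. A minimal sketch with SciPy (the problem data are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny LP:  maximise x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0,
# written as minimisation of the negated objective.
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x)     # optimal vertex
print(-res.fun)  # optimal value of x1 + 2*x2
```

The optimum sits at the vertex where both inequality constraints are tight.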
Slide8
• Convex quadratic constraints: $x'B'Bx + a'x + b \le 0$
• LP is a special case where $B = 0$
• Applications:
  – Classification/regression: SVM
  – Novelty detection: minimum volume enclosing hypersphere
  – Regression + feature selection: lasso
  – Structured prediction problems

Convex Quadratic Programming (QP)
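The SVM application can be illustrated with a toy QP: minimise $\|w\|^2$ subject to the margin constraints $y_i\,w'x_i \ge 1$. A sketch using SciPy's general-purpose SLSQP solver (the two data points are made up; a real SVM would use a dedicated QP solver):

```python
import numpy as np
from scipy.optimize import minimize

# Two illustrative points on opposite sides, no bias term.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])

res = minimize(
    lambda w: w @ w,                      # quadratic objective ||w||^2
    x0=np.zeros(2),
    constraints=[{"type": "ineq",         # y_i * w'x_i - 1 >= 0
                  "fun": lambda w: y * (X @ w) - 1.0}],
    method="SLSQP",
)
print(res.x)  # maximum-margin direction
```

Both constraints reduce to $2 w_1 \ge 1$, so the minimiser is $w = (0.5,\ 0)$.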
Slide9
• Second-order cone constraints: $\|Ax + b\|_2 \le c'x + d$
• QCQP is a special case where $c = 0$
• Applications:
  – Metric learning
  – Fermat-Weber problem: find a point in the plane with minimal sum of distances to a set of points
  – Robust linear programming

Second-Order Cone Programming (SOCP)
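The Fermat-Weber problem mentioned above also has a classical fixed-point method, Weiszfeld's iteration, which repeatedly moves to a distance-weighted average of the points. A minimal sketch (the point set is illustrative):

```python
import numpy as np

# Weiszfeld's iteration for the Fermat-Weber point of a set of points.
P = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])

x = P.mean(axis=0)  # start at the centroid
for _ in range(100):
    d = np.linalg.norm(P - x, axis=1)
    if np.any(d < 1e-12):        # iterate landed on a data point: stop
        break
    w = 1.0 / d                  # inverse-distance weights
    x = (w[:, None] * P).sum(axis=0) / w.sum()

print(x)  # for this symmetric square it is the centre
```

For this symmetric configuration the centroid is already the optimum, so the iteration is stationary at (2, 2).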
Slide10
• Constraints requiring a matrix to be positive semi-definite: $F_0 + \sum_k x_k F_k \succeq 0$
• SOCP is a special case: $\|Ax + b\|_2 \le c'x + d$ is equivalent to
  $\begin{pmatrix} (c'x + d)\,I & Ax + b \\ (Ax + b)' & c'x + d \end{pmatrix} \succeq 0$
• Applications:
  – Metric learning
  – Low-rank matrix approximations (dimensionality reduction)
  – Very tight relaxations of graph labeling problems (e.g. Max-cut)
  – Semi-supervised learning
  – Approximate inference in difficult graphical models

Semi-Definite Programming (SDP)
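The embedding of a second-order cone constraint as a PSD constraint can be verified numerically: for each test point, the cone test and the minimum-eigenvalue test on the block matrix should agree. A sketch with illustrative data:

```python
import numpy as np

# Check: ||Ax+b||_2 <= c'x+d  iff  [[(c'x+d)I, Ax+b], [(Ax+b)', c'x+d]] >= 0.
A = np.eye(2)
b = np.array([0.0, 0.0])
c = np.array([1.0, 1.0])
d = 0.0

def soc_holds(x):
    return np.linalg.norm(A @ x + b) <= c @ x + d

def psd_holds(x):
    t = c @ x + d
    u = A @ x + b
    M = np.block([[t * np.eye(2), u[:, None]],
                  [u[None, :],    np.array([[t]])]])
    return np.linalg.eigvalsh(M).min() >= -1e-10

for x in (np.array([1.0, 1.0]), np.array([-1.0, 0.5])):
    print(soc_holds(x), psd_holds(x))  # the two tests agree at each point
```

The equivalence follows from the Schur complement: for $t > 0$ the block matrix is PSD exactly when $t \ge \|u\|_2$.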
Slide11
• Objective and constraints of the form: $\log \sum_k \exp(a_k'x + b_k)$
• Applications:
  – Maximum entropy modeling with moment constraints
  – Maximum likelihood fitting of exponential family distributions

Geometric programming
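This log-sum-exp objective overflows if evaluated naively; the standard trick is to shift the maximum out of the exponent. A sketch with illustrative coefficients:

```python
import numpy as np

# Stable evaluation of log sum_k exp(a_k'x + b_k): shift by the max exponent.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # rows are the a_k'
b = np.array([0.0, 0.0, -1.0])

def logsumexp_affine(x):
    z = A @ x + b
    m = z.max()                         # shifting avoids overflow in exp
    return m + np.log(np.exp(z - m).sum())

x = np.array([500.0, 500.0])            # naive exp(999) would overflow
print(np.isfinite(logsumexp_affine(x)))
```

Here the largest exponent (999) dominates, so the stable form returns a finite value very close to 999 where `np.log(np.sum(np.exp(...)))` would return infinity.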
Slide12
• Objective is the log determinant of a matrix: $\log \det X$
• This is the log of the volume of the parallelepiped spanned by the columns of $X$
• Applications:
  – Novelty detection: minimum volume enclosing ellipsoid
  – Experimental design / active learning (which labels for which data points are likely to be most informative)

Log Determinant Optimization (Logdet)
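The volume interpretation is easy to check numerically; NumPy's `slogdet` computes the log determinant stably. A sketch with an illustrative diagonal matrix:

```python
import numpy as np

# log det X = log of the volume of the parallelepiped spanned by X's columns.
X = np.array([[2.0, 0.0], [0.0, 3.0]])  # columns span a 2-by-3 rectangle

sign, logdet = np.linalg.slogdet(X)      # stable alternative to log(det(X))
print(np.isclose(logdet, np.log(6.0)))   # rectangle area is 6
```

For ill-conditioned matrices `slogdet` avoids the under/overflow that `np.log(np.linalg.det(X))` can hit.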
Slide13
• Eigenvalue problems are not convex optimization problems
• Still, they admit relatively efficient and globally convergent algorithms, and they are a useful primitive:
  – Dimensionality reduction (PCA)
  – Finding relations between datasets (CCA)
  – Spectral clustering
  – Metric learning
  – Relaxations of combinatorial problems

Eigenvalue problems
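PCA, the first primitive listed, reduces to an eigenvalue problem on the sample covariance: the top eigenvector is the direction of maximal variance. A minimal sketch with synthetic data (the scaling matrix is illustrative):

```python
import numpy as np

# PCA as an eigenvalue problem: top eigenvector of the sample covariance.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Z = Z - Z.mean(axis=0)                 # centre the data

cov = (Z.T @ Z) / len(Z)               # sample covariance matrix
evals, evecs = np.linalg.eigh(cov)     # eigh: eigenvalues in ascending order
pc1 = evecs[:, -1]                     # principal direction

print(np.abs(pc1))  # close to the first axis, where the variance was injected
```

Since the data have variance about 9 along the first axis versus 0.25 along the second, the leading eigenvector is essentially the first coordinate axis.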
Slide14
• Very popular in conferences like NIPS, ICML, KDD
• These model classes are sufficiently rich to do sophisticated things
  – Sparsity: L1 norm / linear constraints for feature selection
  – Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems...)
• Declarative nature, little expertise needed
• Computational complexity is easy to understand

The hype
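The sparsity mechanism behind the L1 norm can be seen in its proximal operator, soft-thresholding, which drives small coefficients exactly to zero (this is the building block of lasso-style solvers; the vector and threshold below are illustrative):

```python
import numpy as np

# Soft-thresholding: the proximal operator of lam * ||v||_1.
def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.2, 0.05, -1.5])
print(soft_threshold(v, lam=0.5))  # small entries become exactly zero
```

Entries with magnitude below the threshold vanish entirely, which is why L1-penalised solutions are sparse rather than merely small.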
Slide15
• But:
  – Polynomial time, often with a high exponent
    (e.g. SDP: $O(d^2 q^{2.5})$, sometimes $O(d q^2)$)
  – Convex constraints can be too limitative
• Tendency toward other paradigms:
  – Convex-concave programming (few guarantees, but works well in practice)
  – Submodular optimization (approximation guarantees, works well in practice)

After the hype
Slide16
• "CP: Choosing the best model is an art" (Helmut)
• "CP requires skill and ingenuity" (Barry)
• I understand that in CP there is a hierarchy of propagation methods, but...
• Is there a hierarchy of problem complexities?
  – How hard is it to see whether a constraint will propagate well?
  – Does it depend on the implementation?
  – ...
CP vs Convex Optimization