Continuous Optimization: Problems and Successes

DESCRIPTION

Continuous Optimization: Problems and Successes. Tijl De Bie, Intelligent Systems Laboratory, MVSE, University of Bristol, United Kingdom ([email protected]). Motivation: the back-propagation algorithm for training neural networks (gradient descent); support vector machines.

TRANSCRIPT
Continuous Optimization: Problems and Successes
Tijl De Bie, Intelligent Systems Laboratory
MVSE, University of Bristol, United Kingdom
Continuous Optimization (Tijl De Bie)
Slide2
• Back-propagation algorithm for training neural networks (gradient descent)
• Support vector machines
• Convex optimization 'boom' (NIPS, also ICML, KDD...)

Motivation

What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)
Slide3
• Consider $x \in \mathbb{R}^d$, and real-valued functions $f_i$ and $g_i$ defined over $\mathbb{R}^d$.
• Continuous optimization:
  $\min_x f_0(x)$ subject to $f_i(x) \le 0,\ i = 1, \ldots, k$ and $g_i(x) = 0,\ i = 1, \ldots, l$
• Convex optimization: the $f_i$ are convex functions, and the $g_i$ are affine functions.

(Convex) continuous optimization
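The convexity requirement on the $f_i$ can be probed numerically. A minimal NumPy sketch (the function and the sampled points are illustrative, not from the slides) checking midpoint convexity of $f(x) = \|x\|^2$:

```python
import numpy as np

# Illustrative convexity probe: f((x+y)/2) <= (f(x)+f(y))/2 on random segments.
def f(x):
    return float(np.dot(x, x))  # f(x) = ||x||^2, a convex function

rng = np.random.default_rng(0)
d = 5
convex_ok = all(
    f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
    for x, y in ((rng.standard_normal(d), rng.standard_normal(d)) for _ in range(100))
)
print(convex_ok)  # midpoint convexity holds on every sampled pair
```

For $\|x\|^2$ this is the parallelogram law in disguise, so the check passes for every pair.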
Slide4
Convex optimization
Slide5
• General convex optimization approach
  – Start with a guess, iteratively improve until the optimum is found
  – E.g. gradient descent, conjugate gradient, Newton's method, etc.
• For constrained convex optimization: interior point methods
  – Provably efficient (worst case; typical case even better)
  – Iteration complexity: $O(\sqrt{k}\,\log(1/\epsilon))$
  – Complexity per iteration: polynomial
• Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK...)
• Purely declarative
• Book: Convex Optimization (Boyd & Vandenberghe)

Convex optimization
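Gradient descent, the first scheme the slide lists, is a few lines in NumPy. A minimal sketch on a positive-definite quadratic (the matrix, vector, and step size are illustrative):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 x'Qx - b'x with Q positive definite.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b  # gradient of the quadratic objective

x = np.zeros(2)
step = 0.2            # fixed step, small enough for convergence here
for _ in range(200):
    x = x - step * grad(x)

x_star = np.linalg.solve(Q, b)  # exact minimiser, for comparison
print(np.allclose(x, x_star, atol=1e-6))
```

With this step size the iteration contracts geometrically, so 200 steps are far more than needed.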
Slide6
Convex optimization
[Diagram: nested classes of convex optimization problems: LP ⊂ QP ⊂ SOCP ⊂ SDP, inside Cone Programming; Logdet and Geometric Programming alongside, all within Convex Optimization]
Slide7
• Linear objective, linear inequality constraints, affine equality constraints
• Applications:
  – Relaxations of integer LPs
  – Classification: linear support vector machines (SVM), forms of boosting
  – (Lots outside DM/ML)

Linear Programming (LP)

$\min_x c'x$ subject to $g_i(x) = a_i'x - b_i \le 0$, plus affine equality constraints $h_i(x) = 0$
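An LP of this form can be handed directly to an off-the-shelf solver, in the declarative spirit the slides describe. A minimal sketch with SciPy (the problem data are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny LP:  maximise x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0,
# written as minimisation of the negated objective.
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x)     # optimal vertex
print(-res.fun)  # optimal value of x1 + 2*x2
```

The optimum sits at the vertex where both inequality constraints are tight.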
Slide8
• Convex quadratic constraints: $x'B'Bx + a'x + b \le 0$
• LP is a special case where $B = 0$
• Applications:
  – Classification/regression: SVM
  – Novelty detection: minimum volume enclosing hypersphere
  – Regression + feature selection: lasso
  – Structured prediction problems

Convex Quadratic Programming (QP)
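The SVM application can be illustrated with a toy QP: minimise $\|w\|^2$ subject to the margin constraints $y_i\,w'x_i \ge 1$. A sketch using SciPy's general-purpose SLSQP solver (the two data points are made up; a real SVM would use a dedicated QP solver):

```python
import numpy as np
from scipy.optimize import minimize

# Two illustrative points on opposite sides, no bias term.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])

res = minimize(
    lambda w: w @ w,                      # quadratic objective ||w||^2
    x0=np.zeros(2),
    constraints=[{"type": "ineq",         # y_i * w'x_i - 1 >= 0
                  "fun": lambda w: y * (X @ w) - 1.0}],
    method="SLSQP",
)
print(res.x)  # maximum-margin direction
```

Both constraints reduce to $2 w_1 \ge 1$, so the minimiser is $w = (0.5,\ 0)$.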
Slide9
• Second-order cone constraints: $\|Ax + b\|_2 \le c'x + d$
• QCQP is a special case where $c = 0$
• Applications:
  – Metric learning
  – Fermat-Weber problem: find a point in the plane with minimal sum of distances to a set of points
  – Robust linear programming

Second-Order Cone Programming (SOCP)
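The Fermat-Weber problem mentioned above also has a classical fixed-point method, Weiszfeld's iteration, which repeatedly moves to a distance-weighted average of the points. A minimal sketch (the point set is illustrative):

```python
import numpy as np

# Weiszfeld's iteration for the Fermat-Weber point of a set of points.
P = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])

x = P.mean(axis=0)  # start at the centroid
for _ in range(100):
    d = np.linalg.norm(P - x, axis=1)
    if np.any(d < 1e-12):        # iterate landed on a data point: stop
        break
    w = 1.0 / d                  # inverse-distance weights
    x = (w[:, None] * P).sum(axis=0) / w.sum()

print(x)  # for this symmetric square it is the centre
```

For this symmetric configuration the centroid is already the optimum, so the iteration is stationary at (2, 2).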
Slide10
• Constraints requiring a matrix to be positive semi-definite: $F_0 + \sum_k x_k F_k \succeq 0$
• SOCP is a special case: $\|Ax + b\|_2 \le c'x + d$ is equivalent to
  $\begin{pmatrix} (c'x + d)\,I & Ax + b \\ (Ax + b)' & c'x + d \end{pmatrix} \succeq 0$
• Applications:
  – Metric learning
  – Low-rank matrix approximations (dimensionality reduction)
  – Very tight relaxations of graph labeling problems (e.g. Max-cut)
  – Semi-supervised learning
  – Approximate inference in difficult graphical models

Semi-Definite Programming (SDP)
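The embedding of a second-order cone constraint as a PSD constraint can be verified numerically: for each test point, the cone test and the minimum-eigenvalue test on the block matrix should agree. A sketch with illustrative data:

```python
import numpy as np

# Check: ||Ax+b||_2 <= c'x+d  iff  [[(c'x+d)I, Ax+b], [(Ax+b)', c'x+d]] >= 0.
A = np.eye(2)
b = np.array([0.0, 0.0])
c = np.array([1.0, 1.0])
d = 0.0

def soc_holds(x):
    return np.linalg.norm(A @ x + b) <= c @ x + d

def psd_holds(x):
    t = c @ x + d
    u = A @ x + b
    M = np.block([[t * np.eye(2), u[:, None]],
                  [u[None, :],    np.array([[t]])]])
    return np.linalg.eigvalsh(M).min() >= -1e-10

for x in (np.array([1.0, 1.0]), np.array([-1.0, 0.5])):
    print(soc_holds(x), psd_holds(x))  # the two tests agree at each point
```

The equivalence follows from the Schur complement: for $t > 0$ the block matrix is PSD exactly when $t \ge \|u\|_2$.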
Slide11
• Objective and constraints of the form: $\log \sum_k \exp(a_k'x + b_k)$
• Applications:
  – Maximum entropy modeling with moment constraints
  – Maximum likelihood fitting of exponential family distributions

Geometric programming
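This log-sum-exp objective overflows if evaluated naively; the standard trick is to shift the maximum out of the exponent. A sketch with illustrative coefficients:

```python
import numpy as np

# Stable evaluation of log sum_k exp(a_k'x + b_k): shift by the max exponent.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # rows are the a_k'
b = np.array([0.0, 0.0, -1.0])

def logsumexp_affine(x):
    z = A @ x + b
    m = z.max()                         # shifting avoids overflow in exp
    return m + np.log(np.exp(z - m).sum())

x = np.array([500.0, 500.0])            # naive exp(999) would overflow
print(np.isfinite(logsumexp_affine(x)))
```

Here the largest exponent (999) dominates, so the stable form returns a finite value very close to 999 where `np.log(np.sum(np.exp(...)))` would return infinity.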
Slide12
• Objective is the log determinant of a matrix: $\log \det X$
• This is the log of the volume of the parallelepiped spanned by the columns of $X$
• Applications:
  – Novelty detection: minimum volume enclosing ellipsoid
  – Experimental design / active learning (which labels for which data points are likely to be most informative)

Log Determinant Optimization (Logdet)
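The volume interpretation is easy to check numerically; NumPy's `slogdet` computes the log determinant stably. A sketch with an illustrative diagonal matrix:

```python
import numpy as np

# log det X = log of the volume of the parallelepiped spanned by X's columns.
X = np.array([[2.0, 0.0], [0.0, 3.0]])  # columns span a 2-by-3 rectangle

sign, logdet = np.linalg.slogdet(X)      # stable alternative to log(det(X))
print(np.isclose(logdet, np.log(6.0)))   # rectangle area is 6
```

For ill-conditioned matrices `slogdet` avoids the under/overflow that `np.log(np.linalg.det(X))` can hit.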
Slide13
• Eigenvalue problems are not convex optimization problems
• Still, they admit relatively efficient and globally convergent algorithms, and they are a useful primitive:
  – Dimensionality reduction (PCA)
  – Finding relations between datasets (CCA)
  – Spectral clustering
  – Metric learning
  – Relaxations of combinatorial problems

Eigenvalue problems
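PCA, the first primitive listed, reduces to an eigenvalue problem on the sample covariance: the top eigenvector is the direction of maximal variance. A minimal sketch with synthetic data (the scaling matrix is illustrative):

```python
import numpy as np

# PCA as an eigenvalue problem: top eigenvector of the sample covariance.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Z = Z - Z.mean(axis=0)                 # centre the data

cov = (Z.T @ Z) / len(Z)               # sample covariance matrix
evals, evecs = np.linalg.eigh(cov)     # eigh: eigenvalues in ascending order
pc1 = evecs[:, -1]                     # principal direction

print(np.abs(pc1))  # close to the first axis, where the variance was injected
```

Since the data have variance about 9 along the first axis versus 0.25 along the second, the leading eigenvector is essentially the first coordinate axis.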
Slide14
• Very popular in conferences like NIPS, ICML, KDD
• These model classes are sufficiently rich to do sophisticated things
  – Sparsity: L1 norm / linear constraints for feature selection
  – Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems...)
• Declarative nature, little expertise needed
• Computational complexity is easy to understand

The hype
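The sparsity mechanism behind the L1 norm can be seen in its proximal operator, soft-thresholding, which drives small coefficients exactly to zero (this is the building block of lasso-style solvers; the vector and threshold below are illustrative):

```python
import numpy as np

# Soft-thresholding: the proximal operator of lam * ||v||_1.
def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.2, 0.05, -1.5])
print(soft_threshold(v, lam=0.5))  # small entries become exactly zero
```

Entries with magnitude below the threshold vanish entirely, which is why L1-penalised solutions are sparse rather than merely small.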
Slide15
• But:
  – Polynomial time, often with a high exponent
    (e.g. SDP: $O(d^2 q^{2.5})$, sometimes $O(d q^2)$)
  – Convex constraints can be too limitative
• Tendency toward other paradigms:
  – Convex-concave programming (few guarantees, but works well in practice)
  – Submodular optimization (approximation guarantees, works well in practice)

After the hype
Slide16
• "CP: Choosing the best model is an art" (Helmut)
• "CP requires skill and ingenuity" (Barry)
• I understand that in CP there is a hierarchy of propagation methods, but...
• Is there a hierarchy of problem complexities?
  – How hard is it to see whether a constraint will propagate well?
  – Does it depend on the implementation?
  – ...
CP vs Convex Optimization