TRANSCRIPT
Surrogate models and the optimization of systems in the absence of algebraic models
Nick Sahinidis
Acknowledgments: Alison Cozad, David Miller, Zach Wilson, Atharv Bhosekar, Luis Miguel Rios, Hua Zheng
Carnegie Mellon University
SIMULATION OPTIMIZATION
Pulverized coal plant Aspen Plus® simulation provided by the National Energy Technology Laboratory
EXPERIMENT OPTIMIZATION
• Six functional units: extraction, fermentation (twice), polymerization, extrusion, distillation
• Maximize PTT yield
• Experimental system available in the Unit Operations Lab (Process Systems Engineering)
• Several degrees of freedom
  – Fermenter: pH, T, NaCl concentration, O2 sparging conditions
  – Extruder: input concentrations, spinning conditions (discrete)
CHALLENGES
• No algebraic model
• Complex process alternatives
• Costly simulations: a single run can take seconds, minutes, or hours
• Scarcity of fully robust simulations
[Diagram: optimizer and simulator exchanging decisions and responses; cost vs. reactor size curves for 1, 2, and 3 reactors]
OPTIMIZATION TECHNOLOGY
• Derivative-free optimization (DFO)
  – Optimizes in the absence of algebraic models
  – Derivatives are symbolically unavailable and may be time-consuming to compute numerically
  – Rios and Sahinidis (JOGO, 2013) reviews algorithms and compares 20+ codes on 500+ problems
• Simulation optimization (SO)
  – Addresses noise in function values
  – Amaran, Sahinidis, Sharda and Bury (AOR, 2015) reviews algorithms and applications
DFO ALGORITHMS
• LOCAL SEARCH METHODS
  – Direct local search
    • Nelder-Mead simplex algorithm
    • Generalized pattern search and generating set search
  – Based on surrogate models
    • Trust-region methods
    • Implicit filtering
• GLOBAL SEARCH METHODS
  – Deterministic global search
    • Lipschitzian-based partitioning
    • Multilevel coordinate search
  – Stochastic global optimization
    • Hit-and-run
    • Simulated annealing
    • Genetic algorithms
    • Particle swarm
  – Based on surrogate models
    • Response surface methods
    • Surrogate management framework
    • Branch-and-fit
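The direct local search idea above can be sketched in a few lines. This compass (pattern) search is a generic illustration of the method family, not the implementation of any of the codes discussed here:

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_evals=10000):
    """Minimize f by compass (pattern) search: poll along +/- each
    coordinate direction; shrink the step when no poll point improves.
    Uses function values only, never derivatives."""
    x, fx = list(x0), f(x0)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (+1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = f(trial)
                evals += 1
                if ft < fx:          # accept the first improving poll point
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5              # no improvement: refine the mesh
    return x, fx

# Example: a smooth "black box" whose derivatives are never used
xmin, fmin = pattern_search(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2,
                            [0.0, 0.0])
```

On this smooth test function the search converges to the minimizer (1, -2) using function evaluations alone.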
DFO SOFTWARE
• Local search: FMINSEARCH (Nelder-Mead), DAKOTA PATTERN (PPS), HOPSPACK (PPS), SID-PSM (simplex-gradient PPS), NOMAD (MADS), DFO (trust region, quadratic model), IMFIL (implicit filtering), BOBYQA (trust region, quadratic model), NEWUOA (trust region, quadratic model)
• Global search: DAKOTA DIRECT (DIRECT), TOMLAB GLBSOLVE (DIRECT), TOMLAB GLCSOLVE (DIRECT), MCS (multilevel coordinate search), TOMLAB EGO (RSM using kriging), TOMLAB RBF (RSM using RBF), SNOBFIT (branch-and-fit), TOMLAB LGO (LGO algorithm)
• Stochastic: ASA (simulated annealing), CMA-ES (evolutionary algorithm), DAKOTA EA (evolutionary algorithm), DAKOTA SOLIS-WETS, GLOBAL (clustering multistart), PSWARM (particle swarm)
PROBLEMS SOLVED
PROCESS DISAGGREGATION
• Process simulation: disaggregate the process into process blocks (each block: simulator → model generation)
• Surrogate models: build simple and accurate models with a functional form tailored for an optimization framework
• Optimization model: add algebraic constraints, design specs, heat/mass balances, and logic constraints
LEARNING PROBLEM
Build a model of the output variables z as a function of the input variables x over a specified interval:
• Independent variables x ∈ ℝⁿ: operating conditions, inlet flow properties, unit geometry
• Dependent variables z ∈ ℝᵐ: efficiency, outlet flow conditions, conversions, heat flow, etc.
• The map from x to z is evaluated by a process simulation or experiment
DESIRED MODEL ATTRIBUTES
• We aim to build surrogate models that are
  – Accurate: reflect the true nature of the simulation
  – Simple: interpretable; tailored for algebraic optimization
  – Generated from a minimal data set: reduce experimental and simulation requirements
ALAMO: Automated Learning of Algebraic Models using Optimization
[Flowchart: Start → initial sampling of the black-box function → build surrogate model → adaptive sampling (estimate model error) → model converged? If false, update the training data set and build a new model; if true, stop]
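The loop can be sketched as follows. The linear fit and random-search error sampler below are illustrative placeholders, not ALAMO's actual modeler or sampler:

```python
import random

def alamo_style_loop(black_box, lb, ub, fit, sample_error,
                     n_init=10, tol=1e-3, max_iter=20):
    """Schematic ALAMO-style loop: sample, fit a surrogate, then keep
    adding points where the surrogate disagrees with the black box
    until the sampled model error falls below tol."""
    rng = random.Random(0)
    X = [rng.uniform(lb, ub) for _ in range(n_init)]         # initial sampling
    Z = [black_box(x) for x in X]
    for _ in range(max_iter):
        model = fit(X, Z)                                    # build surrogate model
        x_new, err = sample_error(model, black_box, lb, ub)  # adaptive sampling
        if err < tol:                                        # model converged?
            break
        X.append(x_new)                                      # update training data set
        Z.append(black_box(x_new))
    return model

def fit_line(X, Z):
    """Placeholder modeler: ordinary least-squares straight line."""
    mx, mz = sum(X) / len(X), sum(Z) / len(Z)
    b = (sum((x - mx) * (z - mz) for x, z in zip(X, Z))
         / sum((x - mx) ** 2 for x in X))
    return lambda x: (mz - b * mx) + b * x

def worst_point(model, f, lb, ub, n=200):
    """Placeholder sampler: random search for the largest model error."""
    rng = random.Random(1)
    pts = [rng.uniform(lb, ub) for _ in range(n)]
    x = max(pts, key=lambda p: abs(f(p) - model(p)))
    return x, abs(f(x) - model(x))

# A linear black box is recovered exactly after the first fit
model = alamo_style_loop(lambda x: 3 * x - 1, 0.0, 1.0, fit_line, worst_point)
```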
MODEL COMPLEXITY TRADEOFF
[Plot: model accuracy vs. model complexity, from linear response surfaces (simple) to kriging [Krige, 63], radial basis functions [Buhmann, 00], and neural nets [McCulloch-Pitts, 43] (complex); the preferred region combines low complexity with high accuracy]
MODEL IDENTIFICATION
• Goal: identify the functional form and complexity of the surrogate models
• Functional form: the general functional form is unknown, so our method identifies models built from combinations of simple basis functions
• Step 1: Define a large set of potential basis functions
• Step 2: Model reduction
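Step 1 might look as follows. The particular basis pool here (monomials, exp, log, pairwise products) is an assumption for illustration, not ALAMO's exact basis library:

```python
import math

def candidate_bases(x):
    """Evaluate a pool of simple basis functions at input vector x.
    The pool (powers, exp, log, pairwise products) is only
    illustrative; the actual library is user-configurable."""
    phi = [1.0]                                   # constant term
    for xi in x:
        phi += [xi, xi ** 2, xi ** 3]             # monomials
        phi += [math.exp(xi),
                math.log(xi) if xi > 0 else 0.0]  # exp and (guarded) log
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            phi.append(x[i] * x[j])               # pairwise products
    return phi

# One row of the design matrix for the point x = (2.0, 0.5)
row = candidate_bases([2.0, 0.5])
```

Each sampled point contributes one such row; Step 2 then selects a small subset of these columns.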
OVERFITTING AND TRUE ERROR
[Plot: error vs. complexity; empirical error decreases monotonically with complexity, while true error is U-shaped: underfitting on the left, overfitting on the right, and the ideal model at the minimum of the true error]
MODEL REDUCTION TECHNIQUES
• Qualitative tradeoffs of model reduction methods:
  – Backward elimination [Oosterhof, 63], forward selection [Hamaker, 62], stepwise regression [Efroymson, 60]
  – Regularized regression: penalize the least-squares objective using the magnitude of the regressors [Tibshirani, 95]
  – Best subset methods: enumerate all possible subsets
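Forward selection, one of the reduction methods above, can be sketched with a plain normal-equations solver. This is illustrative only; ALAMO instead solves a best-subset problem to optimality:

```python
import random

def lstsq(A, z):
    """Solve min ||A b - z||^2 via the normal equations with Gaussian
    elimination (adequate for the tiny illustrative fits below)."""
    m, p = len(A), len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(p)]
         + [sum(A[r][i] * z[r] for r in range(m))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p + 1):
                M[r][k] -= f * M[c][k]
    b = [0.0] * p
    for c in reversed(range(p)):
        b[c] = (M[c][p] - sum(M[c][k] * b[k] for k in range(c + 1, p))) / M[c][c]
    return b

def fit_subset(A, z, cols):
    """Least-squares fit using only the basis columns in `cols`."""
    sub = [[row[c] for c in cols] for row in A]
    b = lstsq(sub, z)
    err = sum((zi - sum(bk * row[k] for k, bk in enumerate(b))) ** 2
              for row, zi in zip(sub, z))
    return b, err

def forward_selection(A, z, max_terms):
    """Greedily add the basis column that most reduces the SSE."""
    chosen, path = [], []
    for _ in range(max_terms):
        best = min((c for c in range(len(A[0])) if c not in chosen),
                   key=lambda c: fit_subset(A, z, chosen + [c])[1])
        chosen.append(best)
        path.append((list(chosen), fit_subset(A, z, chosen)[1]))
    return path

# Data from z = 2*x1 + 3*x2 with candidate bases [1, x1, x2, x1*x2]
random.seed(1)
X = [(random.random(), random.random()) for _ in range(200)]
A = [[1.0, x1, x2, x1 * x2] for x1, x2 in X]
z = [2 * x1 + 3 * x2 for x1, x2 in X]
path = forward_selection(A, z, 2)
```

Greedy selection recovers the two true basis columns here, but in general it can miss the best subset, which motivates the exact approach.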
MODEL SIZING
• Complexity = number of terms allowed in the model
• Solve for the best one-term model, then the best two-term model, and so on
• Track a goodness-of-fit measure that is sensitive to overfitting (AICc, BIC, Cp)
• Stop when an added term is not worth the added complexity; in the example shown, the 6th term was not worth it, so the final model includes 5 terms
MODEL SELECTION CRITERIA
• Balance fit (sum of squared errors, SSE) with model complexity (number of terms in the model, denoted by p; n is the number of data points)
• Corrected Akaike Information Criterion: AICc = n log(SSE/n) + 2p + 2p(p+1)/(n − p − 1)
• Mallows' Cp: Cp = SSE/σ̂² − n + 2p
• Hannan-Quinn Information Criterion: HQC = n log(SSE/n) + 2p log log n
• Bayes Information Criterion: BIC = n log(SSE/n) + p log n
• Mean Squared Error: MSE = SSE/(n − p)
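The sizing procedure can be illustrated by scoring models of increasing size with BIC. The SSE values below are synthetic numbers chosen to mimic diminishing returns, not results from the slides:

```python
import math

def bic(sse, n, p):
    """Bayes Information Criterion for a least-squares fit with
    n data points, p model terms, and residual sum of squares sse."""
    return n * math.log(sse / n) + p * math.log(n)

# Illustrative SSE for the best 1-, 2-, ..., 6-term models:
# large gains at first, then diminishing returns (made-up numbers)
n = 50
sse_by_size = {1: 40.0, 2: 9.0, 3: 2.5, 4: 1.1, 5: 0.9, 6: 0.88}
scores = {p: bic(s, n, p) for p, s in sse_by_size.items()}
best_p = min(scores, key=scores.get)   # size with the lowest BIC
```

With these numbers the 6th term does not pay for its complexity penalty, so BIC selects the 5-term model, mirroring the stopping rule described above.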
CPU TIME COMPARISON
[Plot: CPU time (s, log scale from 0.01 to 100,000) vs. problem size (20–80 bases) for Cp, BIC, and AIC/MSE/HQC]
• Eight benchmarks from the UCI and CMU data sets
• Seventy noisy data sets were generated with multicollinearity and increasing problem size (number of bases)
• BIC solves more than two orders of magnitude faster than AIC, MSE and HQC, since it can be optimized directly via a single mixed-integer convex quadratic model
MODEL QUALITY COMPARISON
[Plots vs. problem size (20–80 bases): number of spurious variables included (0–25) and % deviation from the true model (0–10), for Cp, BIC, AIC, HQC, MSE]
• BIC leads to smaller, more accurate models, owing to its larger penalty for model complexity
ALAMO: Automated Learning of Algebraic Models using Optimization
[Flowchart as before, with the adaptive sampling step realized by error maximization sampling: Start → initial sampling → build surrogate model → error maximization sampling → model converged? If false, update the training data set; if true, stop]
ERROR MAXIMIZATION SAMPLING
• Search the problem space for areas of model inconsistency or model mismatch
• Find points that maximize the model error with respect to the independent variables
  – Optimized using a black-box or derivative-free solver (SNOBFIT) [Huyer and Neumaier, 08]
  – Derivative-free solvers work well in low-dimensional spaces [Rios and Sahinidis, 12]
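A minimal sketch of the idea, with plain random search standing in for the derivative-free solver (SNOBFIT) that is actually used:

```python
import random

def error_max_sample(black_box, surrogate, lb, ub, n_trials=500, seed=0):
    """Pick the next training point by (approximately) maximizing the
    squared surrogate error over the box [lb, ub]^d.  Random search
    stands in here for a proper derivative-free solver."""
    rng = random.Random(seed)
    d = len(lb)
    best_x, best_err = None, -1.0
    for _ in range(n_trials):
        x = [rng.uniform(lb[i], ub[i]) for i in range(d)]
        err = (black_box(x) - surrogate(x)) ** 2   # squared model error
        if err > best_err:
            best_x, best_err = x, err
    return best_x, best_err

# The surrogate z = x1 underestimates f = x1 + x1^2 most near x1 = 1,
# so the sampler concentrates the next point there
x_new, err = error_max_sample(lambda v: v[0] + v[0] ** 2,
                              lambda v: v[0], [0.0], [1.0])
```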
COMPUTATIONAL RESULTS
• Goal: compare methods on three target metrics: (1) model accuracy, (2) data efficiency, (3) model simplicity
• Modeling methods compared
  – ALAMO modeler: the proposed best-subset methodology
  – The lasso: lasso regularization
  – Ordinary regression: ordinary least-squares regression
• Sampling methods compared (over the same data set size)
  – ALAMO sampler: the proposed error maximization technique
  – Single LH: a single Latin hypercube (no feedback)
(1) Model accuracy and (2) data efficiency
[Performance profiles: fraction of problems solved vs. normalized test error. Modelers: the ALAMO modeler solved 70% of the problems exactly and 80% with ≤0.5% error (0.005, 0.80), ahead of the lasso and ordinary regression. Samplers: error maximization sampling vs. a single Latin hypercube]
[Performance profiles continued: ALAMO sampler vs. single LH under each modeling method (ordinary regression, ALAMO modeler, the lasso); fraction of problems solved vs. normalized test error]
(3) Model simplicity
[Bar chart: median model complexity by modeling type; the alternatives return more complexity than required]
• Results over a test set of 45 known functions treated as black boxes, with bases that are available to all modeling methods
THEORY UTILIZATION
• Use freely available system knowledge to strengthen the model
  – Physical limits
  – First-principles knowledge
  – Intuition
• Non-empirical restrictions can be applied to general regression problems, combining empirical data with non-empirical information
CONSTRAINED REGRESSION
• Challenging due to the semi-infinite nature of the regression constraints: the surrogate model must satisfy each restriction at every point of the domain, not just at the sampled data
[Diagram: standard regression (easy) vs. regression constrained over the whole domain (tough)]
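One common way to write the constrained regression problem discussed above (the notation is assumed; the slides do not show the formulation):

```latex
\min_{\beta} \ \sum_{i=1}^{n} \left( z_i - \hat{z}(x_i; \beta) \right)^2
\quad \text{s.t.} \quad
g\!\left( \hat{z}(x; \beta),\, x \right) \le 0
\quad \forall x \in [x^L, x^U]
```

The "for all x" constraint is what makes the problem semi-infinite: finitely many parameters β must satisfy infinitely many constraints, one per point of the domain.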
IMPLIED PARAMETER RESTRICTIONS
• Step 1: Formulate the constraint in z- and x-space
• Step 2: Identify a sufficient set of β-space constraints
• In the example shown, 1 parametric constraint maps to 4 β-constraints
• The resulting global optimization problems are solved with BARON: archimedes.cheme.cmu.edu/?q=baron and www.minlp.com
TYPES OF RESTRICTIONS
• Multiple responses: mass balances, sum-to-one, state variables
• Individual responses: mass and energy balances, physical limitations
• Response bounds: pressure, temperature, compositions
• Response derivatives: monotonicity, numerical properties, convexity
• Alternative domains and boundary conditions: safe extrapolation, boundary conditions (e.g., add a no-slip condition to a velocity profile model)
[Diagram: problem space with an extrapolation zone]
CARBON CAPTURE SYSTEM DESIGN
• Discrete decisions: how many units? parallel trains? what technology is used for each reactor?
• Continuous decisions: unit geometries
• Operating conditions: vessel temperature and pressure, flow rates, compositions
• Surrogate models are built for each reactor and technology used
SUPERSTRUCTURE OPTIMIZATION
• Mixed-integer nonlinear programming (MINLP) model
  – Economic model
  – Process model: material balances, hydrodynamic/energy balances, reactor surrogate models
  – Link between economic model and process model
  – Binary variable constraints
  – Bounds for variables
• MINLP solved with BARON
ALAMO REMARKS
• Expanding the scope of algebraic optimization: low-complexity surrogate models strike a balance between optimal decision-making and model fidelity
• Surrogate model identification: simple, accurate models identified via MINLP
• Error maximization sampling: more information found per simulated data point via DFO
NEW DFO ALGORITHMS
• Rios-Sahinidis (2013) conclusions:
  – Deterministic solvers outperform stochastic ones
  – Surrogate-model-based algorithms have an edge
• Solve auxiliary algebraic models to global optimality to expedite the search:
  – Decide where to sample the objective function (guarantee geometry)
  – Construct surrogate models (higher-quality surrogates)
  – Solve trust-region subproblems (escape local minima; guarantee convergence)
• BARON is highly efficient for problems with fewer than 100 variables
• Model-and-search (MAS) local search algorithm
• Branch-and-model (BAM) global search algorithm
MAS LOCAL SEARCH ALGORITHM
• Collect points around the current iterate
• Add n linearly independent points
• Check for a positive basis; add points if necessary
• Build and optimize a linear or quadratic interpolating model
[Diagram: points x1–x5 around the current iterate, within radii r1 and r2]
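The interpolation step can be illustrated in one dimension (a toy stand-in for the model-building step of a MAS-like method, not the MAS implementation):

```python
def quadratic_model_step(points, radius):
    """One model-based step in 1-D: interpolate a quadratic through
    three (x, f(x)) points, then move to its minimizer, clipped to a
    trust region of the given radius around the best known point."""
    (x0, f0), (x1, f1), (x2, f2) = points
    # Divided differences: q(x) = f0 + c1 (x - x0) + c2 (x - x0)(x - x1)
    c1 = (f1 - f0) / (x1 - x0)
    c2 = ((f2 - f1) / (x2 - x1) - c1) / (x2 - x0)
    xb = min(points, key=lambda p: p[1])[0]        # current best point
    if c2 <= 0:                                    # model not convex:
        return xb + radius                         # take a full step instead
    x_star = 0.5 * (x0 + x1 - c1 / c2)             # minimizer of q
    return max(xb - radius, min(xb + radius, x_star))  # clip to trust region

# f(x) = (x - 2)^2 sampled at 0, 1, 3: the model recovers x* = 2 exactly,
# since quadratic interpolation of a quadratic is exact
pts = [(0.0, 4.0), (1.0, 1.0), (3.0, 1.0)]
x_next = quadratic_model_step(pts, radius=5.0)
```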
PRELIMINARY RESULTS WITH MAS
[Performance profile over 359 test problems comparing MAS against existing solvers]
BAM GLOBAL SEARCH ALGORITHM
• Partition the space into a collection of hypercubes
• Eliminate potentially suboptimal hypercubes
• For the remaining hypercubes, fit surrogate models
• Optimize the surrogate models
• Sort hypercubes by predicted optimal value and explore the best
• Sort hypercubes by size and explore the largest
PRELIMINARY RESULTS WITH BAM
[Performance profile comparing BAM against existing solvers]
CONCLUSIONS
• ALAMO provides algebraic models that are
  – Accurate and simple
  – Generated from a minimal number of function evaluations
• ALAMO's constrained regression facility allows modeling of
  – Bounds on response variables
  – Convexity/monotonicity of response variables
• Extensive results with MAS and BAM will be presented at the upcoming INFORMS and AIChE meetings
• ALAMO site: archimedes.cheme.cmu.edu/?q=alamo
• DFO/SO site: archimedes.cheme.cmu.edu/?q=bbo
  – Test sets: http://archimedes.cheme.cmu.edu/?q=sdfolib