Surrogate models and the optimization of systems in the absence of algebraic models

Nick Sahinidis

Acknowledgments: Alison Cozad, David Miller, Zach Wilson, Atharv Bhosekar, Luis Miguel Rios, Hua Zheng


Source: egon.cheme.cmu.edu/ewo/docs/EWO_2015_NVS.pdf (Carnegie Mellon University)



SIMULATION OPTIMIZATION

Pulverized coal plant Aspen Plus® simulation provided by the National Energy Technology Laboratory


EXPERIMENT OPTIMIZATION

[Figure: Process Systems Engineering flowsheet with the units Fermentation (twice), Extraction, Distillation, Polymerization, and Extrusion]

• Six functional units
• Maximize PTT yield
• Experimental system available in the Unit Operations Lab
• Several degrees of freedom
  – Fermenter: pH, T, NaCl concentration, O2 sparging conditions
  – Extruder: input concentrations, spinning conditions (discrete)


CHALLENGES

• No algebraic model
• Complex process alternatives
• Costly simulations
• Scarcity of fully robust simulations

[Figure: an optimizer coupled to a simulator; cost ($) versus reactor size for 1, 2, and 3 reactors; simulation times of seconds, minutes, and hours]


OPTIMIZATION TECHNOLOGY

• Derivative‐free optimization (DFO)
  – Optimizes in the absence of algebraic models
  – Derivatives are symbolically unavailable
  – Derivatives may be time consuming to compute numerically
  – Rios and Sahinidis (JOGO, 2013)
    • Reviews algorithms and compares 20+ codes on 500+ problems
• Simulation‐optimization (SO)
  – Addresses noise in function values
  – Amaran, Sahinidis, Sharda and Bury (AOR, 2015)
    • Reviews algorithms and applications


DFO ALGORITHMS

• LOCAL SEARCH METHODS
  – Direct local search
    • Nelder‐Mead simplex algorithm
    • Generalized pattern search and generating search set
  – Based on surrogate models
    • Trust‐region methods
    • Implicit filtering
• GLOBAL SEARCH METHODS
  – Deterministic global search
    • Lipschitzian‐based partitioning
    • Multilevel coordinate search
  – Stochastic global optimization
    • Hit‐and‐run
    • Simulated annealing
    • Genetic algorithms
    • Particle swarm
  – Based on surrogate models
    • Response surface methods
    • Surrogate management framework
    • Branch‐and‐fit


DFO SOFTWARE

LOCAL SEARCH
• FMINSEARCH (Nelder-Mead)
• DAKOTA PATTERN (PPS)
• HOPSPACK (PPS)
• SID-PSM (Simplex gradient PPS)
• NOMAD (MADS)
• DFO (Trust region, quadratic model)
• IMFIL (Implicit Filtering)
• BOBYQA (Trust region, quadratic model)
• NEWUOA (Trust region, quadratic model)

GLOBAL SEARCH
• DAKOTA SOLIS-WETS (DIRECT)
• DAKOTA DIRECT (DIRECT)
• TOMLAB GLBSOLVE (DIRECT)
• TOMLAB GLCSOLVE (DIRECT)
• MCS (Multilevel coordinate search)
• TOMLAB EGO (RSM using Kriging)
• TOMLAB RBF (RSM using RBF)
• SNOBFIT (Branch-and-Fit)
• TOMLAB LGO (LGO algorithm)

STOCHASTIC
• ASA (Simulated annealing)
• CMA-ES (Evolutionary algorithm)
• DAKOTA EA (Evolutionary algorithm)
• GLOBAL (Clustering – Multistart)
• PSWARM (Particle swarm)
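As a minimal illustration of the derivative‐free setting, the sketch below treats a function as a black box and minimizes it with the Nelder‐Mead method in SciPy, the same algorithm family as FMINSEARCH. The function `black_box`, the starting point, and the tolerances are assumptions made for illustration only; they do not come from the talk.

```python
import numpy as np
from scipy.optimize import minimize

def black_box(x):
    """Hypothetical stand-in for a simulation: only function values are available."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2

# Nelder-Mead works from function values alone; no derivatives are supplied.
result = minimize(
    black_box,
    x0=np.array([3.0, 2.0]),
    method="Nelder-Mead",
    options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000},
)
print("minimizer:", result.x)
print("function evaluations used:", result.nfev)
```

The evaluation count `result.nfev` is the quantity that matters when each evaluation is a costly simulation run.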


PROBLEMS SOLVED


PROCESS DISAGGREGATION

• Process Simulation: Disaggregate process into process blocks
• Surrogate Models: Build simple and accurate models with a functional form tailored for an optimization framework
• Optimization Model: Add algebraic constraints (design specs, heat/mass balances, and logic constraints)

[Figure: Blocks 1, 2, and 3, each consisting of a simulator followed by model generation]


LEARNING PROBLEM

Build a model of output variables z as a function of input variables x over a specified interval

• Independent variables: Operating conditions, inlet flow properties, unit geometry
• Dependent variables: Efficiency, outlet flow conditions, conversions, heat flow, etc.

[Figure: a process simulation or experiment mapping the inputs x to the outputs z]


DESIRED MODEL ATTRIBUTES

• We aim to build surrogate models that are
  – Accurate
    • We want to reflect the true nature of the simulation
  – Simple
    • Interpretable; tailored for algebraic optimization
  – Generated from a minimal data set
    • Reduce experimental and simulation requirements


ALAMO
Automated Learning of Algebraic Models using Optimization

[Flowchart: Start → Initial sampling → Build surrogate model → Adaptive sampling → Model converged? If true, Stop; if false, Update training data set and build the surrogate model again]

[Figure panels: Black-box function; Initial sampling; Build surrogate model (current model); Adaptive sampling (model error); Update training data set (new model)]
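The loop in the flowchart can be sketched in a few lines. This is an illustrative skeleton under stated assumptions, not ALAMO itself: the black box `black_box`, the polynomial model builder `build_surrogate`, the purely random adaptive sampling, and the tolerance are all placeholders chosen to keep the example short.

```python
import numpy as np

def black_box(x):
    """Hypothetical stand-in for a simulator or experiment (one input)."""
    return np.exp(x)

def latin_hypercube_1d(n, lo, hi, rng):
    """One point per equal-width stratum of [lo, hi], in random order."""
    u = (rng.permutation(n) + rng.random(n)) / n
    return lo + (hi - lo) * u

def build_surrogate(x, z, degree=4):
    """Placeholder model builder: polynomial least squares."""
    return np.polynomial.Polynomial.fit(x, z, degree)

def adaptive_loop(lo=0.0, hi=2.0, n_init=8, n_new=5, tol=1e-2, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    x = latin_hypercube_1d(n_init, lo, hi, rng)      # initial sampling
    z = black_box(x)
    for _ in range(max_iter):
        model = build_surrogate(x, z)                # build surrogate model
        x_new = rng.uniform(lo, hi, n_new)           # adaptive sampling (random here)
        z_new = black_box(x_new)
        model_error = np.max(np.abs(z_new - model(x_new)))
        if model_error < tol:                        # model converged?
            break
        x = np.concatenate([x, x_new])               # update training data set
        z = np.concatenate([z, z_new])
    return model, x.size, model_error

model, n_samples, model_error = adaptive_loop()
print(n_samples, model_error)
```

Every new evaluation is added to the training set when the model has not converged, so no simulator call is discarded.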


MODEL COMPLEXITY TRADEOFF

[Figure: model accuracy versus model complexity. Labeled methods: linear response surface, Kriging [Krige, 63], neural nets [McCulloch-Pitts, 43], radial basis functions [Buhman, 00]. A preferred region is marked.]


MODEL IDENTIFICATION

• Goal: Identify the functional form and complexity of the surrogate models
• Functional form:
  – General functional form is unknown: Our method will identify models with combinations of simple basis functions


OVERFITTING AND TRUE ERROR

• Step 1: Define a large set of potential basis functions
• Step 2: Model reduction

[Figure: error versus complexity, showing the empirical error and the true error; the ideal model lies between the underfitting and overfitting regions]


MODEL REDUCTION TECHNIQUES

• Qualitative tradeoffs of model reduction methods
  – Backward elimination [Oosterhof, 63]
  – Forward selection [Hamaker, 62]
  – Stepwise regression [Efroymson, 60]
  – Regularized regression techniques
    • Penalize the least squares objective using the magnitude of the regressors [Tibshirani, 95]
  – Best subset methods
    • Enumerate all possible subsets


MODEL SIZING

• Complexity = number of terms allowed in the model
• Goodness‐of‐fit measure: some measure of error that is sensitive to overfitting (AICc, BIC, Cp)
• Solve for the best one‐term model, then for the best two‐term model, and so on
• In the example shown, the 6th term was not worth the added complexity, so the final model includes 5 terms

[Figure: goodness‐of‐fit measure versus number of terms in the model]
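The sizing procedure can be sketched as follows: solve for the best model of each size in turn and stop once the criterion no longer improves. This is a simplified illustration, not the mixed‐integer formulation used in ALAMO; it enumerates subsets exhaustively, assumes the noise variance `sigma2` is known, and uses a made‐up data set with hypothetical basis names.

```python
from itertools import combinations
import numpy as np

def bic(z, X, cols, sigma2):
    """BIC written as SSE / sigma^2 + p * log(N) for the p chosen columns."""
    Xs = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(Xs, z, rcond=None)
    sse = np.sum((z - Xs @ beta) ** 2)
    return sse / sigma2 + len(cols) * np.log(len(z))

def size_model(z, X, sigma2):
    """Grow the model one term at a time; stop when the criterion stops improving."""
    best_cols, best_score = (), np.inf
    for p in range(1, X.shape[1] + 1):
        # Best p-term model by exhaustive enumeration (only viable for few bases).
        score, cols = min((bic(z, X, c, sigma2), c)
                          for c in combinations(range(X.shape[1]), p))
        if score >= best_score:        # the extra term was not worth it
            break
        best_cols, best_score = cols, score
    return best_cols, best_score

# Made-up data: the response depends on x1 and x2^2 only.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, (2, 40))
names = ["x1", "x2", "x1^2", "x2^2", "x1*x2", "x1^3"]
X = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, x1**3])
noise_sd = 0.05
z = 3 * x1 - 2 * x2**2 + rng.normal(0, noise_sd, 40)

cols, score = size_model(z, X, noise_sd**2)
print([names[c] for c in cols])
```

Exhaustive enumeration grows combinatorially with the number of bases, which is why an optimization formulation is attractive for larger basis sets.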


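MODEL SELECTION CRITERIA

• Balance fit (sum of square errors) with model complexity (number of terms in the model; denoted by p)

Here N is the number of data points, $z_i$ the observed responses, $X_i$ the row of basis function values for point i, $\beta$ the regression coefficients, and $\hat{\sigma}^2$ an estimate of the error variance.

Corrected Akaike Information Criterion:
$AIC_c = N \log\left(\frac{1}{N}\sum_{i=1}^{N}(z_i - X_i\beta)^2\right) + 2p + \frac{2p(p+1)}{N-p-1}$

Mallows’ Cp:
$C_p = \frac{\sum_{i=1}^{N}(z_i - X_i\beta)^2}{\hat{\sigma}^2} + 2p - N$

Hannan‐Quinn Information Criterion:
$HQC = N \log\left(\frac{1}{N}\sum_{i=1}^{N}(z_i - X_i\beta)^2\right) + 2p\,\log(\log N)$

Bayes Information Criterion:
$BIC = \frac{\sum_{i=1}^{N}(z_i - X_i\beta)^2}{\hat{\sigma}^2} + p \log N$

Mean Squared Error:
$MSE = \frac{\sum_{i=1}^{N}(z_i - X_i\beta)^2}{N-p-1}$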
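For a quick numerical check, the five criteria on this slide can be computed from a model's residuals. The sketch below assumes the common forms of these criteria, with N data points, p terms, the sum of squared errors SSE, and an error-variance estimate `sigma2`; the function name `selection_criteria` is hypothetical.

```python
import numpy as np

def selection_criteria(residuals, p, sigma2):
    """Return the five criteria for a p-term model given its residuals."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sse = float(np.sum(r ** 2))
    return {
        "AICc": n * np.log(sse / n) + 2 * p + 2 * p * (p + 1) / (n - p - 1),
        "Cp": sse / sigma2 + 2 * p - n,
        "HQC": n * np.log(sse / n) + 2 * p * np.log(np.log(n)),
        "BIC": sse / sigma2 + p * np.log(n),
        "MSE": sse / (n - p - 1),
    }

# Ten residuals of magnitude 1 for a two-term model with unit variance.
crit = selection_criteria([1, -1] * 5, p=2, sigma2=1.0)
for name, value in crit.items():
    print(f"{name}: {value:.4f}")
```

In the forms assumed here, Cp and BIC are quadratic in the coefficients once the variance estimate is fixed, whereas AICc and HQC involve the logarithm of the SSE and MSE divides by a term that depends on p.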


CPU TIME COMPARISON

[Figure: CPU time (s) on a logarithmic axis from 0.01 to 100000 versus problem size (20 to 80) for Cp, BIC, AIC, MSE, and HQC]

• Eight benchmarks from the UCI and CMU data sets
• Seventy noisy data sets were generated with multicollinearity and increasing problem size (number of bases)
• BIC solves more than two orders of magnitude faster than AIC, MSE and HQC
  – Optimized directly via a single mixed‐integer convex quadratic model


MODEL QUALITY COMPARISON

[Figure, two panels for Cp, BIC, AIC, HQC, and MSE versus problem size (20 to 80): spurious variables included (0 to 25) and % deviation from true model (0 to 10)]

• BIC leads to smaller, more accurate models
  – Larger penalty for model complexity


ALAMO
Automated Learning of Algebraic Models using Optimization

[Flowchart: Start → Initial sampling → Build surrogate model → Adaptive sampling → Model converged? If true, Stop; if false, Update training data set. The adaptive sampling step is annotated with the model error and error maximization sampling.]


ERROR MAXIMIZATION SAMPLING

• Search the problem space for areas of model inconsistency or model mismatch
• Find points that maximize the model error with respect to the independent variables
  – Optimized using a black‐box or derivative‐free solver (SNOBFIT) [Huyer and Neumaier, 08]
  – Derivative‐free solvers work well in low‐dimensional spaces [Rios and Sahinidis, 12]

[Figure: surrogate model]
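Error maximization sampling can be sketched as an optimization over the inputs in which the objective is the mismatch between the black box and the current surrogate. The example below is an assumption-laden illustration: it uses SciPy's `differential_evolution` as a stand-in for SNOBFIT, an absolute squared error as the mismatch measure, and two hypothetical functions `black_box` and `surrogate`.

```python
import numpy as np
from scipy.optimize import differential_evolution

def black_box(x):
    """Hypothetical simulator with two inputs."""
    return np.sin(3.0 * x[0]) + x[1] ** 2

def surrogate(x):
    """Hypothetical current surrogate: linear in x0, so it misses the sine shape."""
    return 3.0 * x[0] + x[1] ** 2

def negative_squared_error(x):
    # Maximizing the error is the same as minimizing its negative.
    return -(black_box(x) - surrogate(x)) ** 2

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
result = differential_evolution(negative_squared_error, bounds, seed=0, tol=1e-8)
x_new, max_sq_error = result.x, -result.fun
print("next sample point:", x_new)
print("largest squared model error found:", max_sq_error)
```

The point `x_new` would be added to the training set before the surrogate is rebuilt; each objective evaluation here costs one call to the black box, which is why low-dimensional search spaces matter.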


COMPUTATIONAL RESULTS

• Goal: Compare methods on three target metrics
  1. Model accuracy
  2. Data efficiency
  3. Model simplicity
• Modeling methods compared
  – ALAMO modeler: Proposed best subset methodology
  – The LASSO: The lasso regularization
  – Ordinary regression: Ordinary least‐squares regression
• Sampling methods compared (over the same data set size)
  – ALAMO sampler: Proposed error maximization technique
  – Single LH: Single Latin hypercube (no feedback)


[Figure, two panels: fraction of problems solved versus normalized test error for ordinary regression, the ALAMO modeler, and the lasso; one panel for error maximization sampling and one for a single Latin hypercube. Annotations: at the point (0.005, 0.80), 80% of the problems had ≤0.5% error; 70% of problems solved exactly.]


[Figure, three panels: fraction of problems solved versus normalized test error, comparing the ALAMO sampler with a single LH for ordinary regression, the ALAMO modeler, and the lasso]


[Figure: median amount of "more complexity than required" for each modeling type]

Results over a test set of 45 known functions treated as black boxes with bases that are available to all modeling methods.


THEORY UTILIZATION

• Use freely available system knowledge to strengthen model
  – Physical limits
  – First‐principles knowledge
  – Intuition
• Non‐empirical restrictions can be applied to general regression problems

[Figure: empirical data combined with non‐empirical information]


CONSTRAINED REGRESSION

• Challenging due to the semi‐infinite nature of the regression constraints

[Figure: standard regression and surrogate model formulation, with parts labeled "easy" and "tough"]


IMPLIED PARAMETER RESTRICTIONS

• Step 1: Formulate constraint in z‐ and x‐space
• Step 2: Identify a sufficient set of β‐space constraints
  – In the example shown, 1 parametric constraint becomes 4 β‐constraints
• Global optimization problems solved with BARON: archimedes.cheme.cmu.edu/?q=baron and www.minlp.com
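The two steps can be illustrated on a small example: a bound on the response over a whole interval is one parametric constraint, and evaluating it at chosen points of x turns it into a finite set of linear constraints on β. The sketch below is a simplification under stated assumptions. It enforces the bound only on a fixed grid, which does not guarantee feasibility between grid points, whereas the talk refers to global optimization with BARON for this purpose. The data, the quadratic model, and the bound z(x) ≥ 0 on [-1, 1] are all made up.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 15)                     # data cover only [0, 1]
z = x ** 2 + rng.normal(0.0, 0.05, x.size)

def basis(points):
    """Columns 1, x, x^2 evaluated at the given points."""
    points = np.atleast_1d(points)
    return np.column_stack([np.ones_like(points), points, points ** 2])

X = basis(x)
A = basis(np.linspace(-1.0, 1.0, 41))             # rows: constraint z(x_k) >= 0

def sse(beta):
    return np.sum((z - X @ beta) ** 2)

def sse_grad(beta):
    return -2.0 * X.T @ (z - X @ beta)

beta_ols = np.linalg.lstsq(X, z, rcond=None)[0]   # standard regression
bound = {"type": "ineq", "fun": lambda b: A @ b, "jac": lambda b: A}
fit = minimize(sse, beta_ols, jac=sse_grad, method="SLSQP", constraints=[bound])

print("unconstrained min over grid:", (A @ beta_ols).min())
print("constrained   min over grid:", (A @ fit.x).min())
```

Because the constraints are linear in β, the constrained fit remains a convex quadratic program; the difficulty lies in choosing the points at which to enforce them.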


TYPES OF RESTRICTIONS

• Individual responses: mass and energy balances, physical limitations
  – Response bounds: pressure, temperature, compositions
  – Response derivatives: monotonicity, numerical properties, convexity
  – Alternative domains: safe extrapolation, boundary conditions
  – Boundary conditions: add no slip to a velocity profile model
• Multiple responses: mass balances, sum‐to‐one, state variables

[Figure: problem space and extrapolation zone]


CARBON CAPTURE SYSTEM DESIGN

• Discrete decisions: How many units? Parallel trains? What technology used for each reactor?
• Continuous decisions: Unit geometries
• Operating conditions: Vessel temperature and pressure, flow rates, compositions
• Surrogate models for each reactor and technology used


SUPERSTRUCTURE OPTIMIZATION

Mixed-integer nonlinear programming model

• Economic model
• Process model
• Material balances
• Hydrodynamic/Energy balances
• Reactor surrogate models
• Link between economic model and process model
• Binary variable constraints
• Bounds for variables

MINLP solved with BARON


ALAMO REMARKS

• Expanding the scope of algebraic optimization
  – Using low‐complexity surrogate models to strike a balance between optimal decision‐making and model fidelity
• Surrogate model identification
  – Simple, accurate model identification (MINLP)
• Error maximization sampling
  – More information found per simulated data point (DFO)

[Figure: surrogate model and new surrogate model]


NEW DFO ALGORITHMS

• Rios‐Sahinidis (2013) conclusions:
  – Deterministic solvers outperform stochastic ones
  – Surrogate‐model‐based algorithms have an edge
• Solve auxiliary algebraic models to global optimality to expedite search
  – Decide where to sample objective function
    • Guarantee geometry
  – Construct surrogate models
    • Higher‐quality surrogates
  – Solve trust‐region subproblems
    • Escape local minima; guarantee convergence
• BARON is highly efficient for problems below 100 variables
• Model‐and‐search (MAS) local search algorithm
• Branch‐and‐model (BAM) global search algorithm


MAS LOCAL SEARCH ALGORITHM

• Collect points around current iterate
• Add points: n linearly independent points
• Check for positive basis. Add points if necessary
• Build and optimize linear or quadratic interpolating model

[Figure: sample points x1 to x5 around the current iterate, with radii r1 and r2]
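The general idea of a model-based local step can be sketched as follows. This is not the MAS algorithm from the talk, whose details are not given on the slide. It is a simplified stand-in that samples the 2n points x ± r·e_i (whose directions form a positive basis), builds an interpolating quadratic with a diagonal Hessian through them, minimizes that model inside the sampling box, and shrinks the radius when the trial point does not improve. The function `f` and all names are hypothetical.

```python
import numpy as np

def model_step(f, x, r):
    """One step: sample around x, interpolate, minimize the model in the box."""
    n = x.size
    f0 = f(x)
    g = np.zeros(n)                     # model gradient
    h = np.zeros(n)                     # model curvature (diagonal only)
    for i in range(n):
        e = np.zeros(n)
        e[i] = r
        f_plus, f_minus = f(x + e), f(x - e)
        g[i] = (f_plus - f_minus) / (2 * r)
        h[i] = (f_plus - 2 * f0 + f_minus) / r ** 2
    # Minimize f0 + g.s + 0.5 * sum(h * s^2) subject to |s_i| <= r.
    newton = np.clip(-g / np.where(h > 0, h, 1.0), -r, r)
    s = np.where(h > 0, newton, np.where(g > 0, -r, r))
    x_trial = x + s
    if f(x_trial) < f0:
        return x_trial, r               # accept the step, keep the radius
    return x, r / 2                     # reject the step, shrink the radius

def f(x):
    """Hypothetical black box: a convex quadratic with a cross term."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 0.5 * x[0] * x[1]

x, r = np.array([3.0, 3.0]), 1.0
for _ in range(200):
    if r < 1e-6:
        break
    x, r = model_step(f, x, r)
print(x, f(x))
```

A full quadratic model would also capture cross terms, at the cost of more sample points per iteration.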


PRELIMINARY RESULTS WITH MAS

[Figure: results on 359 test problems, with the MAS curve labeled]


BAM GLOBAL SEARCH ALGORITHM

• Partition the space into a collection of hypercubes
• Eliminate potentially suboptimal hypercubes
• For remaining hypercubes, fit models
• Optimize surrogate models
• Sort hypercubes by predicted optimal values. Explore the best
• Sort hypercubes by size. Explore the largest
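A one-dimensional toy version of this partition, fit, and explore cycle is sketched below. It is an illustration under stated assumptions rather than the BAM algorithm itself: intervals play the role of hypercubes, the fitted model is a quadratic through three points, the elimination step is omitted, and the function names and the test function are hypothetical.

```python
from functools import lru_cache
import math
import numpy as np

def fit_and_minimize(f, a, b):
    """Fit a quadratic through a, the midpoint, and b; return the sampled
    points, the model minimizer on [a, b], and the model value there."""
    xs = [a, 0.5 * (a + b), b]
    c2, c1, c0 = np.polyfit(xs, [f(v) for v in xs], 2)
    model = lambda v: c2 * v * v + c1 * v + c0
    candidates = [a, b]
    if c2 > 0:  # convex model: its stationary point may lie inside the box
        candidates.append(float(min(max(-c1 / (2 * c2), a), b)))
    x_model = min(candidates, key=model)
    return xs, x_model, model(x_model)

def bam_1d(f, lo, hi, n_iter=10, n_parts=4):
    f = lru_cache(maxsize=None)(f)      # never pay twice for the same point
    edges = [lo + (hi - lo) * i / n_parts for i in range(n_parts + 1)]
    boxes = list(zip(edges[:-1], edges[1:]))
    best_x, best_f = lo, f(lo)
    for _ in range(n_iter):
        predicted = {}
        for box in boxes:
            xs, x_model, predicted[box] = fit_and_minimize(f, *box)
            for v in xs + [x_model]:    # track the best evaluated point
                if f(v) < best_f:
                    best_x, best_f = v, f(v)
        # Explore the box with the best predicted value and the largest box.
        chosen = {min(boxes, key=predicted.get),
                  max(boxes, key=lambda bx: bx[1] - bx[0])}
        boxes = [bx for bx in boxes if bx not in chosen]
        for a, b in chosen:
            boxes += [(a, 0.5 * (a + b)), (0.5 * (a + b), b)]
    return best_x, best_f

x_best, f_best = bam_1d(lambda v: math.sin(5 * v) + 0.5 * v * v, -2.0, 2.0)
print(x_best, f_best)
```

Splitting both the most promising and the largest interval mixes local refinement with exploration of regions that have been sampled least.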


PRELIMINARY RESULTS WITH BAM

[Figure: results plot with the BAM curve labeled]


CONCLUSIONS

• ALAMO provides algebraic models that are
  – Accurate and simple
  – Generated from a minimal number of function evaluations
• ALAMO’s constrained regression facility allows modeling of
  – Bounds on response variables
  – Convexity/monotonicity of response variables
• Extensive results with MAS and BAM will be presented at the upcoming INFORMS and AIChE meetings
• ALAMO site: archimedes.cheme.cmu.edu/?q=alamo
• DFO/SO site: archimedes.cheme.cmu.edu/?q=bbo
  – Test sets: http://archimedes.cheme.cmu.edu/?q=sdfolib