The ALAMO approach to machine learning: Best subset selection, adaptive sampling, and constrained regression

Nick Sahinidis

Acknowledgments: Alison Cozad, David Miller, Zach Wilson


Page 1: Title

The ALAMO approach to machine learning: Best subset selection, adaptive sampling, and constrained regression

Nick Sahinidis

Acknowledgments: Alison Cozad, David Miller, Zach Wilson

Page 2: ALAMO approach to machine learning: adaptive sampling, and

Carnegie Mellon University

MACHINE LEARNING PROBLEM

Build a model of output variables z as a function of input variables x over a specified interval.

Independent variables: operating conditions, inlet flow properties, unit geometry

Dependent variables: efficiency, outlet flow conditions, conversions, heat flow, etc.

The data come from a process simulation or an experiment.

Page 3: ALAMO approach to machine learning: adaptive sampling, and


SIMULATION OPTIMIZATION

Pulverized coal plant Aspen Plus® simulation provided by the National Energy Technology Laboratory

Page 4: ALAMO approach to machine learning: adaptive sampling, and

CHALLENGES

• No algebraic model
• Complex process alternatives
• Costly simulations (a single run may take seconds, minutes, or hours)
• Scarcity of fully robust simulations
• Neither gradient-based nor derivative-free optimization methods are practical in this setting

[Figure: an optimizer querying a simulator in a loop, and a plot of cost versus reactor size for 1, 2, and 3 reactors]

Page 5: ALAMO approach to machine learning: adaptive sampling, and

PROCESS DISAGGREGATION

• Process simulation: disaggregate the process into process blocks (Block 1, Block 2, Block 3), each feeding its own simulator and model-generation step
• Surrogate models: build simple and accurate models with a functional form tailored for an optimization framework
• Optimization model: add algebraic constraints: design specs, heat/mass balances, and logic constraints

Page 6: ALAMO approach to machine learning: adaptive sampling, and

DESIRED MODEL ATTRIBUTES

1. Accurate: we want to reflect the true nature of the system
2. Simple: usable for algebraic optimization; interpretable
3. Generated from a minimal data set: reduce experimental and simulation requirements
4. Obeys physics and user insights: increase fidelity and validity in regions with no measurements

Page 7: ALAMO approach to machine learning: adaptive sampling, and

ALAMO: Automated Learning of Algebraic MOdels

[Flowchart: Start → initial sampling of the black-box function → build surrogate model → adaptive sampling to estimate model error → model converged? If true, stop; if false, update the training data set with the new points and rebuild the model.]
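As a concrete illustration of this loop, here is a minimal, self-contained sketch (not the actual ALAMO implementation; the function, basis, and tolerance are invented for the example): the black box is a toy function, the surrogate is a least-squares fit over a fixed basis, and the adaptive-sampling step adds the grid point of maximum error until the error drops below a tolerance.

```python
def lstsq(X, y):
    """Least squares via normal equations and Gaussian elimination (fine for tiny systems)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):                                   # elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

def black_box(x):
    return x * x                                  # stand-in for a costly simulation

basis = [lambda x: 1.0, lambda x: x]              # surrogate form: z = b0 + b1*x
grid = [i / 100 for i in range(101)]              # candidate points in [0, 1]

def alamo_loop(tol=0.2, max_iters=20):
    data = [0.0, 1.0]                             # initial sampling
    for _ in range(max_iters):
        X = [[f(x) for f in basis] for x in data]         # build surrogate model
        beta = lstsq(X, [black_box(x) for x in data])
        model = lambda x: sum(b * f(x) for b, f in zip(beta, basis))
        # adaptive sampling: find the point of maximum model error
        err, xnew = max((abs(black_box(x) - model(x)), x) for x in grid)
        if err < tol:                             # model converged?
            return beta, data, err
        data.append(xnew)                         # update training data set
    raise RuntimeError("did not converge")

beta, data, err = alamo_loop()
```

On this toy problem the loop converges after one adaptive sample: the best line fit to x² on [0, 1] has maximum error 1/6, just under the tolerance.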

Page 8: ALAMO approach to machine learning: adaptive sampling, and

MODEL COMPLEXITY TRADEOFF

[Figure: model accuracy versus model complexity, from linear response surfaces through radial basis functions (Buhmann, 00) and kriging (Krige, 63) to neural nets (McCulloch-Pitts, 43); the preferred region offers high accuracy at low complexity]

Page 9: ALAMO approach to machine learning: adaptive sampling, and

MODEL IDENTIFICATION

• Identify the functional form and complexity of the surrogate models
• Seek models that are combinations of basis functions:
  1. Simple basis functions
  2. Radial basis functions for non-parametric regression
  3. User-specified basis functions for tailored regression

Page 10: ALAMO approach to machine learning: adaptive sampling, and

OVERFITTING AND TRUE ERROR

• Step 1: Define a large set of potential basis functions
• Step 2: Model reduction: select the subset that optimizes an information criterion

[Figure: error versus complexity; empirical error decreases with complexity while true error is U-shaped, so the ideal model lies between underfitting and overfitting]
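The two steps can be illustrated end to end with a toy best-subset search, scored here by exhaustive enumeration with BIC. ALAMO instead solves this selection problem as a mixed-integer program; the data and basis set below are invented for the example.

```python
import math
from itertools import combinations

def lstsq(X, y):
    """Least squares via normal equations and Gaussian elimination (tiny systems only)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

def bic(sse, n, p):
    # Bayes information criterion: n*log(SSE/n) + p*log(n)
    return n * math.log(sse / n) + p * math.log(n)

# Step 1: a set of potential basis functions (names are illustrative)
bases = {
    "1":   lambda x: 1.0,
    "x":   lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^3": lambda x: x ** 3,
}

# Synthetic data: true model 2x + 3x^3 plus a deterministic "noise" term
n = 31
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [2 * x + 3 * x ** 3 + 0.5 * math.sin(7 * x) for x in xs]

# Step 2: enumerate all subsets and keep the one with the lowest BIC
best = None
for p in range(1, len(bases) + 1):
    for names in combinations(bases, p):
        X = [[bases[nm](x) for nm in names] for x in xs]
        beta = lstsq(X, ys)
        sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
                  for row, yi in zip(X, ys))
        score = bic(sse, n, p)
        if best is None or score < best[0]:
            best = (score, set(names))

selected = best[1]
```

The search recovers exactly the two basis functions present in the true model; the spurious even bases reduce the empirical error negligibly, so the BIC complexity penalty rejects them.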

Page 11: ALAMO approach to machine learning: adaptive sampling, and

MODEL REDUCTION TECHNIQUES

• Qualitative tradeoffs of model reduction methods
  – Stepwise methods: backward elimination [Oosterhof, 63], forward selection [Hamaker, 62], stepwise regression [Efroymson, 60]
  – Regularized regression techniques: penalize the least-squares objective using the magnitude of the regressors
  – Best subset methods: enumerate all possible subsets
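A sketch of greedy forward selection on a toy problem, scored with BIC. The data and bases are invented for the example, and this greedy heuristic is not ALAMO's method, which solves the subset-selection problem exactly.

```python
import math

def lstsq(X, y):
    """Least squares via normal equations and Gaussian elimination (tiny systems only)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

def bic(sse, n, p):
    return n * math.log(sse / n) + p * math.log(n)

bases = {
    "1":   lambda x: 1.0,
    "x":   lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^3": lambda x: x ** 3,
}
n = 31
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [2 * x + 3 * x ** 3 + 0.5 * math.sin(7 * x) for x in xs]

def score_of(names):
    X = [[bases[nm](x) for nm in names] for x in xs]
    beta = lstsq(X, ys)
    sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, ys))
    return bic(sse, n, len(names))

# Forward selection: repeatedly add the basis function that most improves
# the criterion; stop when no addition improves it.
selected = []
best_score = float("inf")
while len(selected) < len(bases):
    score, nm = min((score_of(selected + [name]), name)
                    for name in bases if name not in selected)
    if score >= best_score:
        break
    selected.append(nm)
    best_score = score
```

On this problem the greedy path happens to reach the true subset; in general forward selection can miss it, which is the qualitative tradeoff the slide refers to.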

Page 12: ALAMO approach to machine learning: adaptive sampling, and

MODEL SELECTION CRITERIA

• Balance fit (the sum of squared errors, SSE) with model complexity (the number of terms in the model, denoted by p); n denotes the number of data points

Corrected Akaike information criterion: AICc = n log(SSE/n) + 2p + 2p(p+1)/(n-p-1)

Mallows' Cp: Cp = SSE/s² - n + 2p, where s² estimates the error variance

Hannan-Quinn information criterion: HQC = n log(SSE/n) + 2p log(log n)

Bayes information criterion: BIC = n log(SSE/n) + p log(n)

Mean squared error: MSE = SSE/(n-p)

• Minimizing any of these criteria over subsets of basis functions leads to mixed-integer nonlinear programming models
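These criteria are easy to state as functions of SSE, n, and p. The forms below are the standard textbook ones; the exact constants used inside ALAMO may differ.

```python
import math

def aicc(sse, n, p):
    """Corrected Akaike information criterion."""
    return n * math.log(sse / n) + 2 * p + 2 * p * (p + 1) / (n - p - 1)

def cp(sse, sigma2, n, p):
    """Mallows' Cp; sigma2 is an estimate of the error variance."""
    return sse / sigma2 - n + 2 * p

def hqc(sse, n, p):
    """Hannan-Quinn information criterion."""
    return n * math.log(sse / n) + 2 * p * math.log(math.log(n))

def bic(sse, n, p):
    """Bayes information criterion."""
    return n * math.log(sse / n) + p * math.log(n)

def mse(sse, n, p):
    """Mean squared error with a degrees-of-freedom correction."""
    return sse / (n - p)

# For n = 100 data points, the per-term complexity penalties are roughly
# AICc ~ 2, HQC = 2*log(log(n)) ~ 3.05, and BIC = log(n) ~ 4.61,
# which is why BIC tends to select the smallest models.
```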

Page 13: ALAMO approach to machine learning: adaptive sampling, and

BRANCH-AND-REDUCE

• Constraint propagation and duality-based bounds tightening
  – Ryoo and Sahinidis, 1995, 1996
  – Tawarmalani and Sahinidis, 2004
• Finite branching rules
  – Shectman and Sahinidis, 1998
  – Ahmed, Tawarmalani and Sahinidis, 2004
• Convexification
  – Tawarmalani and Sahinidis, 2001, 2002, 2004, 2005
  – Khajavirad and Sahinidis, 2012, 2013, 2014, 2016
  – Zorn and Sahinidis, 2013, 2013, 2014
• Implemented in BARON
  – First deterministic global optimization solver for NLP and MINLP

Page 14: ALAMO approach to machine learning: adaptive sampling, and

GLOBAL MINLP SOLVERS ON MINLPLIB2

[Figure: fraction of MINLPLib2 instances each solver can solve (CAN_SOLVE), comparing BARON, ANTIGONE, SCIP, LINDOGLOBAL, and COUENNE]

Test set statistics: constraints 1893 (1–164,321), variables 1027 (3–107,223), discrete variables 137 (1–31,824)

Page 15: ALAMO approach to machine learning: adaptive sampling, and

CPU TIME COMPARISON OF METRICS

[Figure: CPU time in seconds, log scale from 0.01 to 100,000, versus problem size (20 to 80 bases), comparing Cp, BIC, and AIC/MSE/HQC]

• Eight benchmarks from the UCI and CMU data sets
• Seventy noisy data sets were generated with multicollinearity and increasing problem size (number of bases)
• BIC solves more than two orders of magnitude faster than AIC, MSE, and HQC

Page 16: ALAMO approach to machine learning: adaptive sampling, and

MODEL QUALITY COMPARISON

[Figure: two plots versus problem size (20 to 80 bases): number of spurious variables included (0 to 25) and % deviation from the true model (0 to 10), comparing Cp, BIC, AIC, HQC, and MSE]

• BIC leads to smaller, more accurate models
  – Larger penalty for model complexity

Page 17: ALAMO approach to machine learning: adaptive sampling, and

ALAMO: Automated Learning of Algebraic MOdels

[Flowchart: the same loop, with the adaptive sampling step expanded: model error is estimated by error maximization sampling, and the maximum-error points are added to the training data set until the model converges.]

Page 18: ALAMO approach to machine learning: adaptive sampling, and

ERROR MAXIMIZATION SAMPLING

• Search the problem space for areas of model inconsistency or model mismatch
• Find points that maximize the error of the current surrogate model with respect to the independent variables
  – Optimized using the derivative-free solver SNOBFIT (Huyer and Neumaier, 2008)
  – SNOBFIT outperforms most derivative-free solvers (Rios and Sahinidis, 2013)
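ALAMO performs this maximization with SNOBFIT. As a stand-in, the sketch below uses random multistart with a simple shrinking-step local search on a toy surrogate/simulator pair; everything here is illustrative.

```python
import random

def surrogate(x):
    return x                       # current surrogate model (toy)

def simulator(x):
    return x * x                   # costly black box (toy stand-in)

def error(x):
    return abs(simulator(x) - surrogate(x))

def maximize_error(lo=0.0, hi=1.0, starts=5, seed=0):
    """Random multistart + shrinking-step hill climbing (a stand-in for SNOBFIT)."""
    rng = random.Random(seed)
    best_x, best_e = lo, error(lo)
    for _ in range(starts):
        x = rng.uniform(lo, hi)
        step = (hi - lo) / 4
        while step > 1e-6:
            moved = False
            for cand in (x - step, x + step):
                if lo <= cand <= hi and error(cand) > error(x):
                    x, moved = cand, True
            if not moved:
                step /= 2               # stuck: refine the search
        if error(x) > best_e:
            best_x, best_e = x, error(x)
    return best_x, best_e

xstar, estar = maximize_error()
```

For this pair the error |x² - x| peaks at x = 0.5, so the point returned would be the next sample added to the training set.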

Page 19: ALAMO approach to machine learning: adaptive sampling, and

CONSTRAINED REGRESSION

[Figure: two fits to the same data; one fits the data better, the other fits the true function better. Constrained regression aims for safe extrapolation beyond the data space, into the extrapolation zone.]

Page 20: ALAMO approach to machine learning: adaptive sampling, and

CONSTRAINED REGRESSION

• Standard regression (easy): fit the surrogate model to data with no restrictions
• Constrained regression (tough): challenging due to the semi-infinite nature of the regression constraints
• Use intuitive restrictions among predictor and response variables to infer nonintuitive relationships between regression parameters

Page 21: ALAMO approach to machine learning: adaptive sampling, and

IMPLIED PARAMETER RESTRICTIONS

• Step 1: Formulate the constraint in z- and x-space
• Step 2: Identify a sufficient set of β-space constraints
• In this example, 1 parametric constraint gives rise to 4 β-constraints
• The resulting global optimization problems are solved with BARON
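A hand-worked miniature of the two steps (illustrative only, much simpler than the slide's example): require z = β0 + β1·x to satisfy z ≥ 0 on x in [0, 1]. Because z is linear in x, the two endpoint conditions β0 ≥ 0 and β0 + β1 ≥ 0 are a sufficient set of β-space constraints, and the small constrained least-squares problem can be solved by enumerating active constraints. ALAMO would hand such subproblems to BARON.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xs = [0.0, 0.5, 1.0]
ys = [-0.2, 0.1, 0.6]          # data that pull the intercept negative

def sse(b0, b1):
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))

def feasible(b0, b1):
    # beta-space constraints implied by z >= 0 on [0, 1] for a linear model
    return b0 >= -1e-12 and b0 + b1 >= -1e-12

candidates = []

# Unconstrained least squares (closed form for intercept + slope)
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
candidates.append((ybar - slope * xbar, slope))

# Constraint b0 >= 0 held active: z = b1 * x
candidates.append((0.0, dot(xs, ys) / dot(xs, xs)))

# Constraint b0 + b1 >= 0 held active: z = b0 * (1 - x)
w = [1 - x for x in xs]
b0a = dot(w, ys) / dot(w, w)
candidates.append((b0a, -b0a))

# Both constraints active
candidates.append((0.0, 0.0))

# Best feasible candidate wins
b0, b1 = min((c for c in candidates if feasible(*c)), key=lambda c: sse(*c))
```

The unconstrained fit has a negative intercept, so it is cut off by the β-space constraints and the active-set solution with β0 = 0 is returned instead.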

Page 22: ALAMO approach to machine learning: adaptive sampling, and

TYPES OF RESTRICTIONS

• Multiple responses: mass balances, sum-to-one, state variables
• Individual responses: mass and energy balances, physical limitations
• Response bounds: pressure, temperature, compositions
• Response derivatives: monotonicity, numerical properties, convexity
• Alternative domains and boundary conditions: safe extrapolation, boundary conditions (e.g., add no-slip to a velocity profile model)

[Figure: problem space with its extrapolation zone]

Page 23: ALAMO approach to machine learning: adaptive sampling, and


CATEGORICAL CORRELATED

Page 24: ALAMO approach to machine learning: adaptive sampling, and


CATEGORICAL CORRELATED

Page 25: ALAMO approach to machine learning: adaptive sampling, and

GROUP CONSTRAINTS

• A collection of basis functions (or of other groups of basis functions) is referred to as a group
• A group is selected if and only if any of its components is selected
• Group constraints:
  – NMT: no more than a specified number of components is selected from a group
  – ATL: at least a specified number of components is selected from a group
  – REQ: if a group is selected, another group must also be selected
  – XCL: if a group is selected, another group must not be selected
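The four constraint types can be sketched as a feasibility check on a candidate selection of basis functions. This is an illustrative encoding with made-up groups, not ALAMO's actual mixed-integer formulation.

```python
# Illustrative groups of basis-function names
groups = {
    "G1": {"x", "x^2", "x^3"},
    "G2": {"log(x)"},
    "G3": {"1/x"},
    "G4": {"exp(x)"},
}

# (type, group, argument) triples; a group counts as selected
# iff any of its members is selected
constraints = [
    ("NMT", "G1", 2),     # no more than 2 components from G1
    ("ATL", "G1", 1),     # at least 1 component from G1
    ("REQ", "G2", "G3"),  # selecting G2 requires G3
    ("XCL", "G4", "G2"),  # selecting G4 excludes G2
]

def group_selected(sel, g):
    return bool(groups[g] & sel)

def satisfies(sel):
    """Check a candidate set of basis functions against all group constraints."""
    for kind, g, arg in constraints:
        if kind == "NMT" and len(groups[g] & sel) > arg:
            return False
        if kind == "ATL" and len(groups[g] & sel) < arg:
            return False
        if kind == "REQ" and group_selected(sel, g) and not group_selected(sel, arg):
            return False
        if kind == "XCL" and group_selected(sel, g) and group_selected(sel, arg):
            return False
    return True
```

In an enumeration- or MIP-based subset search, such a check (or its linear 0-1 encoding) prunes candidate models before they are ever fit.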

Page 26: ALAMO approach to machine learning: adaptive sampling, and


ALAMO TUTORIAL AND APPLICATIONS

Page 27: ALAMO approach to machine learning: adaptive sampling, and

DOWNLOAD

• From The Optimization Firm at http://minlp.com
• ZIP archives of Windows, Linux, and OSX versions
• Free licenses for academics and CAPD companies

Page 28: ALAMO approach to machine learning: adaptive sampling, and


WORKING WITH PRE‐EXISTING DATA SETS

Page 29: ALAMO approach to machine learning: adaptive sampling, and


EXAMPLE ALAMO INPUT FILE

128 alternative models

Page 30: ALAMO approach to machine learning: adaptive sampling, and


ALAMO OUTPUT

Page 31: ALAMO approach to machine learning: adaptive sampling, and

SIMPLE AND ACCURATE MODELS

[Figure: median number of terms in the surrogate model versus number of terms in the true function, by modeling type; some methods use more complexity than required]

Results over a test set of 45 known functions treated as black boxes with bases that are available to all modeling methods.

Page 32: ALAMO approach to machine learning: adaptive sampling, and


WORKING WITH A SIMULATOR

Page 33: ALAMO approach to machine learning: adaptive sampling, and

BUBBLING FLUIDIZED BED ADSORBER

[Diagram: adsorber with solid feed, outlet gas, CO2-rich gas inlet, CO2-rich solid outlet, and cooling water]

Goal: optimize a bubbling fluidized bed reactor by
• Minimizing the increased cost of electricity
• Maximizing CO2 removal

Page 34: ALAMO approach to machine learning: adaptive sampling, and

BUBBLING FLUIDIZED BED ADSORBER

Generate a model of % CO2 removal over the specified range of inputs.

Page 35: ALAMO approach to machine learning: adaptive sampling, and


POTENTIAL MODEL

66 basis functions

Not tractable in an algebraic superstructure formulation!

Page 36: ALAMO approach to machine learning: adaptive sampling, and


• Current model form, model surface, and data points

BUILD MODEL

Page 37: ALAMO approach to machine learning: adaptive sampling, and


• New data point(s) shown 

ADAPTIVE SAMPLING

Page 38: ALAMO approach to machine learning: adaptive sampling, and


• Previous model shown as a mesh

REBUILD MODEL

Page 39: ALAMO approach to machine learning: adaptive sampling, and


• Model error and AIC shown per iteration

REBUILD MODEL

Page 40: ALAMO approach to machine learning: adaptive sampling, and


ADAPTIVE SAMPLING

Page 41: ALAMO approach to machine learning: adaptive sampling, and


• The error is sufficiently low; the model converges

REBUILD MODEL

Page 42: ALAMO approach to machine learning: adaptive sampling, and


OPTIMAL PARETO CURVE

Page 43: ALAMO approach to machine learning: adaptive sampling, and


OPTIMAL PARETO CURVE

Page 44: ALAMO approach to machine learning: adaptive sampling, and


OPTIMAL PARETO CURVE

Surrogate results

Page 45: ALAMO approach to machine learning: adaptive sampling, and


OPTIMAL PARETO CURVE

Surrogate results

Simulation results

Page 46: ALAMO approach to machine learning: adaptive sampling, and

SAMPLING EFFICIENCY

[Figure: fraction of problems solved versus normalized test error, comparing the ALAMO modeler, ordinary regression, and the lasso, and comparing error maximization sampling against a single Latin hypercube]

• 70% of problems solved exactly
• 80% of the problems had ≤0.5% error (the point (0.005, 0.80))
• Results with 45 known functions with bases that are available to all modeling methods

Page 47: ALAMO approach to machine learning: adaptive sampling, and


SUPERSTRUCTURE OPTIMIZATION

Page 48: ALAMO approach to machine learning: adaptive sampling, and

CARBON CAPTURE SYSTEM DESIGN

• Discrete decisions: how many units? parallel trains? what technology used for each reactor?
• Continuous decisions: unit geometries
• Operating conditions: vessel temperature and pressure, flow rates, compositions
• Surrogate models for each reactor and technology used

Page 49: ALAMO approach to machine learning: adaptive sampling, and

BUBBLING FLUIDIZED BED

• Model inputs (16 total)
  – Geometry (3)
  – Operating conditions (5)
  – Gas mole fractions (2)
  – Solid compositions (2)
  – Flow rates (4)
• Model outputs (14 total)
  – Geometry required (2)
  – Operating condition required (1)
  – Gas mole fractions (3)
  – Solid compositions (3)
  – Flow rates (2)
  – Outlet temperatures (3)

[Diagram: bubbling fluidized bed adsorber with solid feed, outlet gas, CO2-rich gas, CO2-rich solid outlet, and cooling water]

Model created by Andrew Lee at the National Energy Technology Laboratory

Page 50: ALAMO approach to machine learning: adaptive sampling, and

EXAMPLE MODELS - ADSORBER

[Diagram: adsorber with solid feed, CO2-rich gas, and cooling water, annotated with example surrogate models]

Page 51: ALAMO approach to machine learning: adaptive sampling, and

SUPERSTRUCTURE OPTIMIZATION

Mixed-integer nonlinear programming model:
• Economic model
• Process model
• Material balances
• Hydrodynamic/energy balances
• Reactor surrogate models
• Link between economic model and process model
• Binary variable constraints
• Bounds for variables

MINLP solved with BARON

Page 52: ALAMO approach to machine learning: adaptive sampling, and

CONCLUSIONS

• ALAMO provides algebraic models that are
  – Accurate
  – Simple
  – Generated from a minimal number of data points
• ALAMO's constrained regression facility allows modeling of
  – Bounds on response variables
  – Variable groups
  – Forthcoming: constraints on gradients of response variables
• Built on top of state-of-the-art optimization solvers
• Available from www.minlp.com