Surrogate and Reduced-Order Models

Problem: Difficult to obtain a sufficient number of realizations of discretized PDE models for Bayesian model calibration, design, and control.

Solution: Construct surrogate models
• Also termed data-fit models, response surface models, emulators, meta-models
• Projection-based models often called reduced-order models

Constitutive Closure Relations: e.g.,

Mass:     $\dfrac{\partial \rho}{\partial t} + \nabla \cdot (\rho v) = 0$

Momentum: $\dfrac{\partial v}{\partial t} = -v \cdot \nabla v - \dfrac{1}{\rho}\nabla p - g\hat{k} - 2\Omega \times v$

Energy:   $\rho c_V \dfrac{\partial T}{\partial t} + p\,\nabla \cdot v = -\nabla \cdot F + \nabla \cdot (k \nabla T) + \rho\, \dot{q}(T, p, \rho), \qquad p = \rho R T$

Water:    $\dfrac{\partial m_j}{\partial t} = -v \cdot \nabla m_j + S_{m_j}(T, m_j, \chi_j, \rho), \quad j = 1, 2, 3$

Aerosol:  $\dfrac{\partial \chi_j}{\partial t} = -v \cdot \nabla \chi_j + S_{\chi_j}(T, \chi_j, \rho), \quad j = 1, \dots, J$

Surrogate Models: Motivation

Example: Consider the heat equation
$$ \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} + f(q), $$
with boundary conditions and initial conditions, and the response
$$ y(q) = \int_0^1\!\int_0^1\!\int_0^1\!\int_0^1 u(t, x, y, z)\, dx\, dy\, dz\, dt. $$
[Figure: unit spatial domain in x, y, z, evolving in t]

Notes:
• Requires approximation of the PDE in 3-D
• What would be a simple surrogate?

Surrogate Models: Motivation (recall)

Consider the same heat equation, boundary and initial conditions, and response $y(q)$ as above.

Question: How do you construct a polynomial surrogate?
• Regression
• Interpolation

Example ($M = 7$ model evaluations, degree $k = 2$): quadratic surrogate
$$ y_s(q) = (q - 0.25)^2 + 0.5 $$

Surrogate and Reduced-Order Models

Issues:
• Techniques for regression versus interpolation
• Grid choices for interpolation
• Techniques for time-dependent problems

Data-Fit Models

Notes:
• Often termed response surface models, surrogates, emulators, meta-models.
• Rely on interpolation or regression.
• Data can consist of high-fidelity simulations or experiments.
• Common techniques: polynomial models, kriging (Gaussian process regression), orthogonal polynomials.

Strategy: Consider the high-fidelity model
$$ y = f(q) $$
with $M$ model evaluations
$$ y_m = f(q_m), \quad m = 1, \dots, M, $$
and let $f_s(q)$ denote a surrogate for $f(q)$.

Statistical Model:
$$ y_m = f_s(q_m) + \varepsilon_m, \quad m = 1, \dots, M $$

Data-Fit Models – Polynomial Emulator

Quadratic Emulator (Regression):
$$ f_s(q; \beta) = \beta_0 + \beta_1 q + \beta_2 q^2 $$

Deterministic System:
$$ y_{obs} = X\beta $$

Least Squares Estimate:
$$ \beta = [X^T X]^{-1} X^T y_{obs} $$

[Figure: quadratic emulator fit to data; response versus parameters q]

Notes:
• Good choice for optimization;
• Accurate approximation may require high-order polynomials;
• Does not provide uncertainty bounds for uncertainty quantification.
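
As a concrete illustration of the least-squares construction above, here is a minimal Python sketch; the high-fidelity model f and the training points are hypothetical stand-ins.

```python
import numpy as np

# Quadratic regression emulator f_s(q; beta) = beta0 + beta1*q + beta2*q^2,
# with beta estimated by least squares, beta = [X^T X]^{-1} X^T y_obs.

def f(q):
    # Placeholder high-fidelity model (assumption, not from the slides)
    return np.exp(-q) * np.sin(2 * np.pi * q)

M = 7
q_train = np.linspace(0.0, 1.0, M)          # training parameters q_1, ..., q_M
y_obs = f(q_train)                           # high-fidelity evaluations y_m = f(q_m)

# Design matrix X with rows [1, q_m, q_m^2]
X = np.vander(q_train, N=3, increasing=True)

# Least squares estimate (lstsq avoids forming [X^T X]^{-1} explicitly)
beta, *_ = np.linalg.lstsq(X, y_obs, rcond=None)

def f_s(q):
    """Quadratic emulator evaluated at new parameter values q."""
    return beta[0] + beta[1] * q + beta[2] * q**2

q_test = np.linspace(0.0, 1.0, 101)
print("max emulator error:", np.max(np.abs(f_s(q_test) - f(q_test))))
```
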
Data-Fit Models – Collocation

Strategy: Consider the high-fidelity model $y = f(q)$ with $M$ model evaluations
$$ y_m = f(q_m), \quad m = 1, \dots, M. $$

Collocation Surrogate:
$$ y_s(q) = f_s(q) = \sum_{m=1}^{M} y_m L_m(q), $$
where $L_m(q)$ is a Lagrange polynomial, which in 1-D is represented by
$$ L_m(q) = \prod_{\substack{j=1 \\ j \ne m}}^{M} \frac{q - q_j}{q_m - q_j}
          = \frac{(q - q_1)\cdots(q - q_{m-1})(q - q_{m+1})\cdots(q - q_M)}
                 {(q_m - q_1)\cdots(q_m - q_{m-1})(q_m - q_{m+1})\cdots(q_m - q_M)}. $$

Note:
$$ L_m(q_j) = \delta_{jm} = \begin{cases} 0, & j \ne m \\ 1, & j = m \end{cases} $$

Result: $y_s(q_m) = f(q_m)$

Data-Fit Models – Collocation (continued)

Alternative: With $M$ samples $y_m$, $m = 1, \dots, M$, write the collocation surrogate as
$$ y_M(q) = \sum_{m=1}^{M} y_m L_m(q), \qquad y_M(q_m) = y_m. $$

Properties:
• Easy to add points.
• Method is nonintrusive and treats the code as a black box.
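
A minimal sketch of the collocation surrogate follows, assuming an illustrative 1-D model and node set; the interpolation property $y_M(q_m) = y_m$ is checked at the end.

```python
import numpy as np

# Collocation surrogate y_M(q) = sum_m y_m L_m(q),
# L_m(q) = prod_{j != m} (q - q_j)/(q_m - q_j).

def f(q):
    # Placeholder high-fidelity model (assumption)
    return 1.0 / (1.0 + 25.0 * q**2)

M = 7
q_nodes = -np.cos(np.pi * np.arange(M) / (M - 1))   # illustrative nodes on [-1, 1]
y_nodes = f(q_nodes)

def lagrange_basis(q, m):
    """L_m(q): product over j != m of (q - q_j)/(q_m - q_j)."""
    terms = [(q - q_nodes[j]) / (q_nodes[m] - q_nodes[j])
             for j in range(M) if j != m]
    return np.prod(terms, axis=0)

def y_surrogate(q):
    q = np.atleast_1d(q)
    return sum(y_nodes[m] * lagrange_basis(q, m) for m in range(M))

# Interpolation property: the surrogate reproduces the data at the nodes
print(np.allclose(y_surrogate(q_nodes), y_nodes))   # True
```
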
Data-Fit Models – Gaussian Process Emulator

Gaussian Process (Kriging):
$$ f_s(q; \beta) = g^T(q)\beta + Z(q) $$
• $g^T(q)\beta$: trend function
• $Z(q)$: Gaussian process with
$$ \text{cov}[Z(q^i), Z(q^j)] = \sigma^2 R(q^i, q^j) + \sigma_0^2\,\delta(q^i, q^j), \qquad
   R(q^i, q^j) = \exp\!\left( -\sum_{k=1}^{p} \bigl| \theta_k (q^i_k - q^j_k) \bigr|^{\gamma_k} \right) $$

Note:
$$ \delta(q^i, q^j) = \begin{cases} 1, & \| q^j - q^i \| = 0 \\ 0, & \text{else} \end{cases} $$

High-Fidelity Model: $M$ evaluations
$$ y_m = f(q_m), \quad m = 1, \dots, M $$

Statistical Model:
$$ y_m = f_s(q_m, \beta) + \varepsilon_m, \quad \varepsilon_m \sim N(0, \sigma_0^2) $$

Universal Kriging: $g^T(q)\beta = \sum_{k=0}^{K} \beta_k g_k(q)$
Ordinary Kriging: $g^T(q)\beta = \beta_0$

Gaussian Process Emulator

For the kriging model $f_s(q; \beta) = g^T(q)\beta + Z(q)$ with correlation
$R(q^i, q^j) = \exp\!\left( -\sum_{k=1}^{p} | \theta_k (q^i_k - q^j_k) |^{\gamma_k} \right)$
and statistical model $y_m = f_s(q_m, \beta) + \varepsilon_m$, $\varepsilon_m \sim N(0, \sigma_0^2)$:

Note: The hyper-parameters $\theta_k, \gamma_k$ are tuned to achieve varying degrees of correlation.

[Figure: $R(q^i, q^j)$ versus $q^i - q^j$ for $\gamma = 0.5, 1, 2$]
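
Since the correlation plot is not reproduced in this transcript, the short sketch below evaluates the power-exponential correlation in 1-D (with $\theta = 1$ assumed) to show how $\gamma$ controls the correlation decay.

```python
import numpy as np

# Power-exponential correlation R(q_i, q_j) = exp(-sum_k |theta_k (q_i_k - q_j_k)|^gamma_k)
# in one dimension with theta = 1, evaluated for the gamma values on the slide.

def correlation(d, theta=1.0, gamma=2.0):
    return np.exp(-np.abs(theta * d) ** gamma)

d = np.array([0.0, 0.5, 1.0, 2.0])               # distances q_i - q_j
for gamma in (0.5, 1.0, 2.0):
    print(f"gamma = {gamma}:", np.round(correlation(d, gamma=gamma), 3))
# gamma = 2 gives a smooth (Gaussian-type) correlation; smaller gamma decays
# more sharply near d = 0 and corresponds to rougher sample paths.
```
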
Gaussian Process Emulator

Note: In the absence of observation noise, the GP surrogate interpolates; i.e.,
$$ y_m = f_s(q_m, \beta), \quad m = 1, \dots, M. $$

Uncertainty Bounds:
[Figure: GP emulator with pointwise uncertainty bounds]

Gaussian Process Emulator

Observations: Take $y_s = [y_1, \dots, y_M]^T$, $\mathbf{1} = [1, \cdots, 1]^T$, and $R_{ij} = R(q^i, q^j)$.

Likelihood:
$$ L(\beta_0, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{M/2}\,|R|^{1/2}}
   \exp\!\left[ -\frac{1}{2\sigma^2}(y_s - \beta_0\mathbf{1})^T R^{-1}(y_s - \beta_0\mathbf{1}) \right] $$

Log-likelihood:
$$ \ell = -\frac{M}{2}\ln(2\pi) - \frac{M}{2}\ln(\sigma^2) - \frac{1}{2}\ln|R|
          - \frac{1}{2\sigma^2}(y_s - \beta_0\mathbf{1})^T R^{-1}(y_s - \beta_0\mathbf{1}) $$

$$ \Rightarrow\; \frac{\partial \ell}{\partial \beta_0}
   = \frac{1}{2\sigma^2}\,\mathbf{1}^T R^{-1}(y_s - \beta_0\mathbf{1}) \cdot 2 = 0
   \;\Rightarrow\; \beta_0(\theta, \gamma) = \left[ \mathbf{1}^T R^{-1}\mathbf{1} \right]^{-1}\left[ \mathbf{1}^T R^{-1} y_s \right] $$

and

$$ \frac{\partial \ell}{\partial \sigma^2}
   = -\frac{M}{2}\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}(y_s - \beta_0\mathbf{1})^T R^{-1}(y_s - \beta_0\mathbf{1}) = 0
   \;\Rightarrow\; \sigma^2(\theta, \gamma) = \frac{1}{M}\left[ y_s - \beta_0\mathbf{1} \right]^T R^{-1}\left[ y_s - \beta_0\mathbf{1} \right] $$

Note: Optimize the likelihood to infer $\theta, \gamma$.
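
A minimal ordinary-kriging sketch of the closed-form estimates above; the model f and the fixed hyper-parameter values $\theta, \gamma$ are illustrative assumptions (in practice they are obtained by optimizing the likelihood, as noted on the slide).

```python
import numpy as np

# Given evaluations y_s = [f(q^1), ..., f(q^M)]^T, build R_ij = R(q^i, q^j) and
# evaluate beta0 = [1^T R^{-1} 1]^{-1} [1^T R^{-1} y_s] and
# sigma^2 = (1/M)(y_s - beta0 1)^T R^{-1} (y_s - beta0 1).

def f(q):
    # Placeholder high-fidelity model (assumption)
    return np.sin(2 * np.pi * q) + q

def corr_matrix(qa, qb, theta=5.0, gamma=2.0):
    d = np.subtract.outer(qa, qb)
    return np.exp(-np.abs(theta * d) ** gamma)

M = 10
q_train = np.linspace(0.0, 1.0, M)
ys = f(q_train)

R = corr_matrix(q_train, q_train)    # a small nugget can be added if R is ill-conditioned
Rinv = np.linalg.inv(R)
one = np.ones(M)

beta0 = (one @ Rinv @ ys) / (one @ Rinv @ one)
resid = ys - beta0 * one
sigma2 = (resid @ Rinv @ resid) / M
print("beta0 =", beta0, " sigma^2 =", sigma2)
```
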
Gaussian Process Prediction

Strategy: Assume we have $M$ points $y_s = [f(q^1), \dots, f(q^M)]^T$ and we want to make a prediction $y_p$ that maximizes the likelihood of the observed data $y_s$ and prediction $y_p$. Form the augmented vector
$$ \tilde{y} = \begin{bmatrix} y_s \\ y_p \end{bmatrix}
             = \begin{bmatrix} f(q^1) \\ \vdots \\ f(q^M) \\ y_p \end{bmatrix}. $$

Let
$$ \tilde{R} = \begin{bmatrix} R & r \\ r^T & 1 \end{bmatrix}
             = \begin{bmatrix} M \times M & M \times 1 \\ 1 \times M & 1 \times 1 \end{bmatrix} $$
and
$$ r(q) = \begin{bmatrix} \text{cov}[Z(q^1), Z(q)] \\ \vdots \\ \text{cov}[Z(q^M), Z(q)] \end{bmatrix}
        = \begin{bmatrix} R(q^1, q) \\ \vdots \\ R(q^M, q) \end{bmatrix}. $$

Gaussian Process Prediction

Log-Likelihood:
$$ \ell = -\frac{M}{2}\ln(2\pi) - \frac{M}{2}\ln\tilde{\sigma}^2 - \frac{1}{2}\ln|\tilde{R}|
          - \frac{1}{2\tilde{\sigma}^2}\left[ \tilde{y} - \beta_0\mathbf{1} \right]^T \tilde{R}^{-1}\left[ \tilde{y} - \beta_0\mathbf{1} \right] $$

Last term:
$$ -\frac{1}{2\tilde{\sigma}^2}
   \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix}^T
   \begin{bmatrix} R & r \\ r^T & 1 \end{bmatrix}^{-1}
   \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix} $$

Note:
$$ \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix}^T
   = \left[ f(q^1) - \beta_0, \dots, f(q^M) - \beta_0 \;\middle|\; y_p - \beta_0 \right] $$

Note: Form the partitioned matrix
$$ \tilde{R}^{-1} = \begin{bmatrix} \tilde{R}^{-1}_{11} & \tilde{R}^{-1}_{12} \\ \tilde{R}^{-1}_{21} & \tilde{R}^{-1}_{22} \end{bmatrix}
 = \begin{bmatrix}
     R^{-1} + R^{-1} r \left( 1 - r^T R^{-1} r \right)^{-1} r^T R^{-1} & -R^{-1} r \left( 1 - r^T R^{-1} r \right)^{-1} \\
     -\left( 1 - r^T R^{-1} r \right)^{-1} r^T R^{-1} & \left( 1 - r^T R^{-1} r \right)^{-1}
   \end{bmatrix} $$

Gaussian Process Prediction

Then
$$ \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix}^T
   \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \\ \tilde{R}_{21} & \tilde{R}_{22} \end{bmatrix}^{-1}
   \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix} $$
$$ = \left[ (y_s - \beta_0\mathbf{1})^T \tilde{R}^{-1}_{11} + (y_p - \beta_0)\tilde{R}^{-1}_{21}, \;\;
            (y_s - \beta_0\mathbf{1})^T \tilde{R}^{-1}_{12} + (y_p - \beta_0)\tilde{R}^{-1}_{22} \right]
   \begin{bmatrix} y_s - \beta_0\mathbf{1} \\ y_p - \beta_0 \end{bmatrix} $$
$$ = F(y_s) + (y_p - \beta_0)\tilde{R}^{-1}_{21}(y_s - \beta_0\mathbf{1})
            + (y_s - \beta_0\mathbf{1})^T \tilde{R}^{-1}_{12}(y_p - \beta_0)
            + (y_p - \beta_0)^2\,\tilde{R}^{-1}_{22} $$
$$ = F(y_s) + \left[ \frac{-2\, r^T R^{-1}(y_s - \beta_0\mathbf{1})}{1 - r^T R^{-1} r}(y_p - \beta_0)
            + \frac{1}{1 - r^T R^{-1} r}(y_p - \beta_0)^2 \right]. $$

Thus $\dfrac{\partial \ell}{\partial y_p} = 0$ yields
$$ \Rightarrow\; \frac{-1}{1 - r^T R^{-1} r}(y_p - \beta_0) + \frac{r^T R^{-1}(y_s - \beta_0\mathbf{1})}{1 - r^T R^{-1} r} = 0. $$

Best Unbiased Estimate for Mean:
$$ y_p(q) = E[f_s(q, \beta_0)] = \beta_0 + r^T(q) R^{-1}(y_s - \beta_0\mathbf{1}) $$

Gaussian Process Prediction

Best Unbiased Estimate for Mean:
$$ y_p(q) = E[f_s(q, \beta_0)] = \beta_0 + r^T(q) R^{-1}(y_s - \beta_0\mathbf{1}) $$

Note:
$$ r(q) = \begin{bmatrix} R(q^1, q) \\ \vdots \\ R(q^M, q) \end{bmatrix}
   \;\Rightarrow\; r(q^j) = \begin{bmatrix} R(q^1, q^j) \\ \vdots \\ R(q^M, q^j) \end{bmatrix}. $$

Then
$$ r^T(q^j) R^{-1}\left[ y_s - \beta_0\mathbf{1} \right]
   = \left[ R(q^1, q^j), \dots, R(q^M, q^j) \right] R^{-1}
     \begin{bmatrix} y_1 - \beta_0 \\ \vdots \\ y_M - \beta_0 \end{bmatrix}
   = y_j - \beta_0, $$
so $y_p(q^j) = y_j$.

Gaussian Process Prediction

Variance:
$$ \text{var}[f_s(q, \beta_0)] = \sigma^2\left[ 1 - r^T R^{-1} r + \frac{(r^T R^{-1}\mathbf{1} - 1)^2}{\mathbf{1}^T R^{-1}\mathbf{1}} \right], $$
where
$$ \sigma^2 = \frac{1}{M}\left[ y_s - \beta_0(\theta, \gamma)\mathbf{1} \right]^T R^{-1}\left[ y_s - \beta_0(\theta, \gamma)\mathbf{1} \right] $$
results from the condition $\dfrac{\partial \ell}{\partial \sigma^2} = 0$.
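
A sketch of the resulting predictor, assuming the same illustrative setup as in the earlier kriging sketch; it evaluates the mean $y_p(q)$ and variance formulas above and checks that the predictor interpolates the data.

```python
import numpy as np

# Kriging predictor:
#   y_p(q) = beta0 + r^T(q) R^{-1} (y_s - beta0 1)
#   var(q) = sigma^2 [1 - r^T R^{-1} r + (r^T R^{-1} 1 - 1)^2 / (1^T R^{-1} 1)]

def f(q):
    # Placeholder high-fidelity model (assumption)
    return np.sin(2 * np.pi * q) + q

def corr(qa, qb, theta=5.0, gamma=2.0):
    return np.exp(-np.abs(theta * np.subtract.outer(qa, qb)) ** gamma)

M = 10
q_train = np.linspace(0.0, 1.0, M)
ys = f(q_train)
R = corr(q_train, q_train)
Rinv = np.linalg.inv(R)
one = np.ones(M)

beta0 = (one @ Rinv @ ys) / (one @ Rinv @ one)
sigma2 = ((ys - beta0) @ Rinv @ (ys - beta0)) / M

def predict(q_new):
    r = corr(q_train, np.atleast_1d(q_new))                  # columns r(q), shape (M, n)
    mean = beta0 + r.T @ Rinv @ (ys - beta0 * one)
    var = sigma2 * (1.0 - np.sum(r * (Rinv @ r), axis=0)
                    + (one @ Rinv @ r - 1.0) ** 2 / (one @ Rinv @ one))
    return mean, var

# The predictor reproduces the data at the q^m (up to round-off), and the
# predictive variance is numerically zero there.
mean, var = predict(q_train)
print(np.max(np.abs(mean - ys)), np.max(np.abs(var)))
```
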
References:
• C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
• A.I.J. Forrester, A. Sóbester and A.J. Keane, Engineering Design via Surrogate Modelling: A Practical Guide, Progress in Astronautics and Aeronautics, John Wiley and Sons, Chichester, UK, 2007.

Example: Scaled Branin Function

Function: scaled Branin test function
• Function moved up 80 and flipped
• Used primarily to test optimization routines
• 3 unique maxima

• N = 1000 randomly selected points used to construct the surrogate model
• m = 100 randomly selected test points used to assess accuracy
• MATLAB fitrgp and predict functions
• Maximum likelihood estimate of hyperparameters
• RMS error over m predictions = 0.0460
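
The MATLAB workflow above is not reproduced here; the following is a rough Python analogue using scikit-learn's GaussianProcessRegressor in place of fitrgp/predict. The particular scaling of the Branin function ("flipped and moved up 80" read as g(q) = 80 − branin(q)), the sampling, and the kernel are assumptions, so the resulting RMS error need not match the 0.0460 quoted above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def branin(q1, q2):
    # Standard Branin function on q1 in [-5, 10], q2 in [0, 15]
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (q2 - b * q1**2 + c * q1 - r)**2 + s * (1 - t) * np.cos(q1) + s

def scaled_branin(Q):
    # Assumed scaling: flipped and shifted up by 80
    return 80.0 - branin(Q[:, 0], Q[:, 1])

rng = np.random.default_rng(0)
Q_train = rng.uniform([-5.0, 0.0], [10.0, 15.0], size=(1000, 2))  # N = 1000 points
Q_test = rng.uniform([-5.0, 0.0], [10.0, 15.0], size=(100, 2))    # m = 100 test points

# Kernel hyper-parameters are tuned by maximum likelihood inside .fit();
# with 1000 points this may take a little while.
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
    Q_train, scaled_branin(Q_train))

y_pred, y_std = gp.predict(Q_test, return_std=True)
rmse = np.sqrt(np.mean((y_pred - scaled_branin(Q_test))**2))
print("RMS error over test points:", rmse)
```
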

Example: Scaled Branin Function

1-D Function:
• Same constants used as before
• Unique maximum
• MATLAB provides 99% prediction intervals
• Adding new data tightens the prediction intervals
• Regression, so not exact at data points

Surrogate Models – Grid Choice

Example: Consider the Runge function
$$ f(q) = \frac{1}{1 + 25 q^2} $$
with uniformly spaced points
$$ q_j = -1 + (j - 1)\frac{2}{M}, \quad j = 1, \dots, M, $$
versus Chebyshev points
$$ q_j = -\cos\!\left( \frac{\pi (j - 1)}{M - 1} \right), \quad j = 1, \dots, M. $$

[Figure: polynomial interpolants of f(q) on the uniform versus Chebyshev grids; the uniform grid produces large oscillations near q = ±1]
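
A short sketch of the grid-choice effect, interpolating the Runge function on uniform versus Chebyshev points; the value of M is illustrative.

```python
import numpy as np

# Compare polynomial interpolation of the Runge function on uniform and
# Chebyshev-spaced nodes; a degree M-1 fit through M points is the interpolant.

def runge(q):
    return 1.0 / (1.0 + 25.0 * q**2)

M = 11
q_uniform = np.linspace(-1.0, 1.0, M)
q_cheb = -np.cos(np.pi * np.arange(M) / (M - 1))    # Chebyshev-type points
q_fine = np.linspace(-1.0, 1.0, 1001)

for name, nodes in [("uniform", q_uniform), ("Chebyshev", q_cheb)]:
    coeffs = np.polynomial.polynomial.polyfit(nodes, runge(nodes), M - 1)
    interp = np.polynomial.polynomial.polyval(q_fine, coeffs)
    err = np.max(np.abs(interp - runge(q_fine)))
    print(f"{name:9s} nodes: max |error| = {err:.3f}")
# The uniform grid exhibits large oscillations near q = +/-1 (Runge phenomenon),
# while the Chebyshev spacing keeps the error small.
```
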
Sparse Grid Techniques

Tensored Grids: Exponential growth as a function of dimension.
Sparse Grids: Same accuracy with a significantly reduced number of points.

Motivation: Do not need the full set of points to achieve the same degree of accuracy. For example, the bivariate monomials by total degree are

Degree | Monomials
0      | 1
1      | x, y
2      | x^2, xy, y^2
3      | x^3, x^2 y, x y^2, y^3
4      | x^4, x^3 y, x^2 y^2, x y^3, y^4

Sparse Grid Techniques

Tensored grids grow exponentially with the dimension p, whereas sparse grids achieve comparable accuracy with far fewer points:

p    | R_ℓ (points per dimension) | Sparse grid R | Tensored grid R = (R_ℓ)^p
2    | 9                          | 29            | 81
5    | 9                          | 241           | 59,049
10   | 9                          | 1,581         | > 3 × 10^9
50   | 9                          | 171,901       | > 5 × 10^47
100  | 9                          | 1,353,801     | > 2 × 10^95

Reduced-Order Models

Based on Projections: e.g., start from an evolution model whose full-order approximate solution satisfies a discretized system.

Goal: Construct a reduced-order solution.

Note:
• Basis construction is the crux of the method.
• Eigenfunctions
• Proper Orthogonal Decomposition (POD)

Reduced-Order Models

Example: Eigenfunctions for the heat equation. Starting from the weak formulation, one chooses a reduced-order basis and a reduced-order model satisfying the corresponding reduced-order system.

Reduced-Order Models

Proper Orthogonal Decomposition (POD):
• Based on snapshots.
• The snapshot set is typically highly redundant.
• POD extracts the coherent structure having the largest mean-square projection onto the set of observations.
• Related to the Karhunen-Loève expansion and the singular value decomposition.

Strategy: Seek basis functions of the reduced-order form constructed from the snapshots.

Note: R is a positive, self-adjoint operator.

Equivalent Problem: Find the largest eigenvalues of R, or equivalently use the singular value decomposition of the snapshot set.
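
The basis-construction equations are not shown in this transcript; the following is a minimal sketch of the standard snapshot/SVD route to a POD basis that the bullets above reference, with a hypothetical snapshot matrix standing in for full-order solutions.

```python
import numpy as np

# POD basis from snapshots via the SVD.  In practice the columns of U_snap are
# full-order solutions (e.g., of the discretized heat equation) at sampled
# times or parameters; here they are a synthetic two-mode stand-in.

n, n_snap = 200, 40
x = np.linspace(0.0, 1.0, n)
t = np.linspace(0.01, 1.0, n_snap)

# Hypothetical snapshot matrix: each column is a profile u(x, t_k)
U_snap = np.array([np.exp(-(np.pi**2) * tk) * np.sin(np.pi * x)
                   + 0.3 * np.exp(-(4 * np.pi**2) * tk) * np.sin(2 * np.pi * x)
                   for tk in t]).T                       # shape (n, n_snap)

# POD modes are the left singular vectors; singular values measure energy
Phi, svals, _ = np.linalg.svd(U_snap, full_matrices=False)

energy = np.cumsum(svals**2) / np.sum(svals**2)
r = int(np.searchsorted(energy, 0.9999)) + 1             # retain 99.99% of the energy
Phi_r = Phi[:, :r]                                        # reduced-order basis

# Reduced-order approximation of the snapshots: U ~= Phi_r (Phi_r^T U)
U_rom = Phi_r @ (Phi_r.T @ U_snap)
print("retained modes r =", r,
      " relative error =", np.linalg.norm(U_snap - U_rom) / np.linalg.norm(U_snap))
```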