Eawag: Swiss Federal Institute of Aquatic Science and Technology
Mechanism-Based Emulation of Dynamic Simulation Models: Concept and Application in Hydrology
Peter Reichert
Eawag Dübendorf and ETH Zürich, Switzerland
Data-driven and physically-based models,
IMS, Singapore, Jan. 2008
Contents
Motivation
Concept of Emulators
  General Concept
  Gaussian Process Emulator
  Dynamic Emulator
Implementation
Application
Discussion and Outlook
Motivation
Problem
Many important systems-analytical techniques, such as optimization, sensitivity analysis, and statistical inference (e.g., Bayesian inference using MCMC), require a large number of model evaluations.
Many environmental simulation models are computationally demanding.
Model-based analysis of environmental systems is often limited by computational requirements.
Solution Strategies
1. Improve the efficiency of the implementation of environmental simulation models.
2. Improve the efficiency of the implementation of systems-analytical techniques.
3. Replace the simulation model by a simplified statistical description, an emulator.
Obviously, all three strategies should be pursued.
This talk is about recent progress with strategy 3: The construction and use of emulators of dynamic environmental simulation models.
Concept
Emulator:
An emulator is a statistical approximation of a deterministic simulation model.
It can be used to interpolate model results between simulation results obtained at carefully chosen design points in the model input space.
Replacing the simulation model by the emulator can tremendously increase the efficiency of analyses (but it also adds uncertainty).
The emulator provides a deterministic interpolation result as well as a probability distribution representing our knowledge of the uncertainty of emulation.
Gaussian Process Emulators:
Emulators have been constructed quite successfully by setting up a Gaussian process prior whose mean is a linear combination of basis functions and then conditioning this prior on the design data.
O'Hagan 2006
Gaussian Process Emulators:
Limitations:
1. Dense output in the time domain leads to numerical difficulties (large size and poor conditioning of matrices to be inverted).
2. The knowledge about the mechanisms built into the simulation program is not used. We can expect to build a better emulator by using this knowledge. This is of particular importance if the design set is small.
This raises the question of how to build an emulator of a dynamic model that resolves both of these issues.
Emulators for Dynamic Models:
Three Options:
1. Application of Gaussian processes with time as an additional input dimension. This can lead to very large, poorly conditioned matrices that must be inverted, and hence to numerical problems.
2. For Markovian or state-space models: Emulate transfer function from one state to the next instead of the complete dynamic response.
3. Use a simple dynamic model as a prior and model innovations as Gaussian processes in the other input dimensions. These Gaussian processes correct for the bias in the simple model.
Emulators for Dynamic Models:
To my knowledge, none of the emulators proposed so far uses our knowledge about the mechanisms implemented in the simulation model (with the exception of a problem-specific choice of basis functions).
Approach proposed in this talk:
Use a simplified, linear state-space model to describe the approximate dynamics of the simulation model.
Formulate the innovations as Gaussian processes of parameters (and potentially other input).
Derive the emulator (posterior) by Kalman smoothing.
Implementation
Construction of Emulators:
We can distinguish five steps of emulator development:
1. Choice of Design Data
2. Choice of a Simplified Probabilistic Model
3. Coupling of Replicated Simplified Models
4. Conditioning the Simplified Model on the Design Data
5. Calculation of Expected Value and Uncertainty
1. Choice of Design Data:
Parameter values are often chosen by Latin hypercube sampling from reasonable domains of the model parameters. However, adaptive sampling schemes that increase the density of sampling points in regions of high variability of results could also be used.
The design data set consists of these parameter values and the corresponding simulation results:
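As a minimal sketch of Latin hypercube sampling, the function below splits each parameter range into as many equal strata as there are design points. It draws one value in every stratum and pairs the strata across parameters by random permutation. This is not the author's code. The function name `latin_hypercube` and the example bounds are hypothetical. The design data set would then be formed by running the simulation model at each returned point.

```python
import random

def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points from the box bounds = [(low, high), ...].

    Each parameter range is split into n_samples equal-width strata; every
    stratum is sampled exactly once, and strata are paired across parameters
    by independent random permutations.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum, then shuffle the stratum order.
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)
        columns.append(values)
    # Transpose: one tuple of parameter values per design point.
    return [tuple(col[j] for col in columns) for j in range(n_samples)]

# Hypothetical two-parameter domain, e.g. a rate constant and a storage capacity.
design = latin_hypercube(5, [(0.01, 0.5), (10.0, 200.0)], seed=1)
```

In practice, a library routine such as SciPy's `scipy.stats.qmc.LatinHypercube` could be used instead.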
2. Choice of a Simplified Probabilistic Model:
The emulator is based on a simplified probabilistic model M′ of the simulation model M.
This model expresses our prior beliefs about the behaviour of the deterministic simulation model.
Its likelihood function is given by:
3. Coupling of Replicated Simplified Models:
The augmented model consists of n replicates of the simplified model for different parameter values:
These models are stochastically coupled.
Probabilities represent here beliefs in a Bayesian sense.
We construct a model with n = nD+1 replicates of the simplified model. These correspond to models for the nD design parameter sets and for the emulation parameter set.
4. Conditioning the Simplified Model on the Design Data:
We calculate the distribution of the last set of components conditional on results for the first nD sets of components:
The emulator is obtained by integrating out additional parameters:
5. Calculation of Expected Value and Uncertainty:
The expected value provides the deterministic emulator:
The variance-covariance matrix of the emulator is a quantification of emulation uncertainty.
Gaussian Process Emulator
1. Choice of Design Data:
Parameter values are often chosen by Latin hypercube sampling from reasonable domains of the model parameters. However, adaptive sampling schemes that increase the density of sampling points in regions of high variability of results could also be used.
The design data set consists of these parameter values and the corresponding simulation results:
2. Choice of a Simplified Probabilistic Model:
The simplified probabilistic model consists of a deterministic model plus a multivariate normal error term with mean zero:
The simplified model can contain additional parameters. Often a linear combination of suitably chosen basis functions is used:
3. Coupling of Replicated Simplified Models:
The augmented model consists of independent replications of the deterministic simplified model and error terms that are stochastically coupled:
3. Coupling of Replicated Simplified Models:
A simple stochastic coupling is obtained by:
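The coupling formula from this slide is not preserved. One common choice in the Gaussian process emulator literature is shown here only as an assumed example, not as the talk's formula. It couples the error terms through a squared-exponential correlation in parameter space. The variance σ² and correlation lengths ℓ_k are hypothetical hyperparameters, and p is the number of parameters:

```latex
\operatorname{Cov}\!\left(\varepsilon(\boldsymbol\theta_i),\, \varepsilon(\boldsymbol\theta_j)\right)
  = \sigma^2 \exp\!\left( - \sum_{k=1}^{p} \frac{(\theta_{i,k} - \theta_{j,k})^2}{\ell_k^2} \right)
```

With this choice, error terms at nearby parameter sets are strongly correlated, while those at distant parameter sets are nearly independent.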
4. Conditioning the Simplified Model on the Design Data:
The augmented model is then multivariate normal. For this reason, we can apply the standard result for conditioning a multivariate normal distribution on some of its components:
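For reference, the standard conditioning result for a partitioned multivariate normal vector is given below. It is a textbook identity, written in generic notation rather than the slide's:

```latex
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}
\sim N\!\left( \begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\;\Longrightarrow\;
\mathbf{x}_2 \mid \mathbf{x}_1 \sim N\!\left( \boldsymbol\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol\mu_1),\;
\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \right)
```

In the emulator, x_1 plays the role of the results at the design points and x_2 that of the output at the new parameter set.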
4. Conditioning the Simplified Model on the Design Data:
This leads to the emulator as a multivariate normal distribution:
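The following NumPy sketch combines steps 2 to 5 for a scalar output. It rests on several assumptions:

- a constant-plus-linear basis;
- the squared-exponential covariance;
- fixed hyperparameters `sigma` and `ell`;
- a generalized-least-squares plug-in estimate of the basis coefficients instead of integrating them out.

All function names are hypothetical. A toy sine function stands in for the simulation results at the design points.

```python
import numpy as np

def sq_exp_cov(a, b, sigma, ell):
    """Squared-exponential covariance between rows of a (m x p) and b (k x p)."""
    a = np.asarray(a, dtype=float) / ell
    b = np.asarray(b, dtype=float) / ell
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return sigma**2 * np.exp(-d2)

def basis(theta):
    """Hypothetical basis functions: a constant plus one linear term per parameter."""
    theta = np.asarray(theta, dtype=float)
    return np.hstack([np.ones((theta.shape[0], 1)), theta])

def gp_emulator(theta_d, y_d, theta_new, sigma=1.0, ell=0.3, nugget=1e-10):
    """Emulator mean and covariance at theta_new, conditioned on the design data."""
    y_d = np.asarray(y_d, dtype=float)
    H = basis(theta_d)
    K = sq_exp_cov(theta_d, theta_d, sigma, ell) + nugget * np.eye(len(y_d))
    # Generalized least squares plug-in estimate of the basis coefficients.
    K_inv_H = np.linalg.solve(K, H)
    beta = np.linalg.solve(H.T @ K_inv_H, K_inv_H.T @ y_d)
    resid = y_d - H @ beta
    k_new = sq_exp_cov(theta_new, theta_d, sigma, ell)  # shape (m, n)
    # Conditional mean: prior mean plus correction from the design residuals.
    mean = basis(theta_new) @ beta + k_new @ np.linalg.solve(K, resid)
    # Conditional covariance (ignores the uncertainty in beta).
    cov = sq_exp_cov(theta_new, theta_new, sigma, ell) - k_new @ np.linalg.solve(K, k_new.T)
    return mean, cov

# Toy "simulation model" evaluated on a one-parameter design (hypothetical).
theta_design = np.linspace(0.0, 1.0, 6)[:, None]
y_design = np.sin(3.0 * theta_design[:, 0])
mean, cov = gp_emulator(theta_design, y_design, np.array([[0.1], [0.5]]))
```

At the design points, the emulator mean reproduces the simulation results and the variance is (numerically) zero. Between them, the variance is positive and quantifies the emulation uncertainty.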
5. Calculation of Expected Value and Uncertainty:
O'Hagan 2006
Dynamic Emulator
Dynamic models (and their emulators) have a structured output:
1. Choice of Design Data:
Parameter values are often chosen by Latin hypercube sampling from reasonable domains of the model parameters. However, adaptive sampling schemes that increase the density of sampling points in regions of high variability of results could also be used.
The design data set consists of these parameter values and the corresponding simulation results:
2. Choice of a Simplified Probabilistic Model:
Concept: use a state-space model and emulate only the "observed" output.
Reasons:
This accounts for the typical "hidden Markov" structure of environmental simulation models.
It allows us to implement an emulator with a simplified (lower-dimensional) state space.
2. Choice of a Simplified Probabilistic Model:
3. Coupling of Replicated Simplified Models:
Augmented Model (1):
3. Coupling of Replicated Simplified Models:
Augmented Model (2):
3. Coupling of Replicated Simplified Models:
Augmented Model (3): Stochastic coupling
4. Conditioning the Simplified Model on the Design Data:
Kalman (forward) filtering (Künsch, 2001):
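The forward filter can be illustrated, under strong simplifying assumptions, for a scalar linear Gaussian state-space model. The model is x_t = a x_{t−1} + w_t and y_t = x_t + v_t, with known variances q and r. The emulator applies the same predict/update recursions to its multivariate augmented model. This scalar version and its names are hypothetical.

```python
def kalman_filter(ys, a, q, r, m0, p0):
    """Forward Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q), v_t ~ N(0, r), prior x_0 ~ N(m0, p0).
    Returns the filtered means and variances of x_1, ..., x_T.
    """
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Prediction step: propagate mean and variance through the dynamics.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update step: weight the new observation by the Kalman gain.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances

means, variances = kalman_filter([2.0], a=1.0, q=0.0, r=1.0, m0=0.0, p0=1.0)
# means == [1.0], variances == [0.5]
```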
4. Conditioning the Simplified Model on the Design Data:
Kalman (backward) smoothing (Künsch, 2001):
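The backward pass can be sketched for the same hypothetical scalar model as a Rauch-Tung-Striebel smoother. It reruns the forward filter, stores the predicted quantities, and then corrects each filtered estimate using information from later observations. This is an illustration under stated assumptions, not the multivariate implementation used for the emulator.

```python
def kalman_smoother(ys, a, q, r, m0, p0):
    """Forward Kalman filter plus Rauch-Tung-Striebel backward pass (scalar model)."""
    m, p = m0, p0
    m_f, p_f, m_p, p_p = [], [], [], []
    for y in ys:
        m_pred = a * m
        p_pred = a * a * p + q
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        m_f.append(m)
        p_f.append(p)
        m_p.append(m_pred)
        p_p.append(p_pred)
    # Backward pass: the last filtered estimate already uses all data.
    m_s, p_s = m_f[:], p_f[:]
    for t in range(len(ys) - 2, -1, -1):
        g = p_f[t] * a / p_p[t + 1]  # smoother gain
        m_s[t] = m_f[t] + g * (m_s[t + 1] - m_p[t + 1])
        p_s[t] = p_f[t] + g * g * (p_s[t + 1] - p_p[t + 1])
    return m_s, p_s

m_s, p_s = kalman_smoother([1.0, 3.0], a=1.0, q=0.0, r=1.0, m0=0.0, p0=1.0)
```

With a constant state (q = 0), both smoothed estimates equal the posterior mean given all observations, 4/3, with variance 1/3.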
5. Calculation of Expected Value and Uncertainty:
Calculation of the expected value and variance-covariance matrix of the last set of components:
Implementation
Because the smoothing step depends on a quantity that involves both the design data and the new parameter values, it is very inefficient.
By using a general matrix identity, we are able to separate out the inversion of the large sub-matrix that depends only on the design data. This makes the procedure much more efficient, as we do not have to perform large matrix inversions when using the emulator at new parameter values.
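The identity used on the slide is not preserved. One standard identity with the described property is block-matrix inversion via the Schur complement. The NumPy sketch below is assumed to be analogous to, not identical with, the author's derivation, and its names are hypothetical. It reuses a precomputed inverse of the design-data block A, so that only the small Schur complement S = D − C A⁻¹ B has to be inverted for each new parameter set.

```python
import numpy as np

def block_inverse(A_inv, B, C, D):
    """Inverse of [[A, B], [C, D]] from a precomputed A_inv (Schur complement form)."""
    A_inv_B = A_inv @ B
    C_A_inv = C @ A_inv
    # Only this small matrix depends on the new parameter values and must be inverted.
    S_inv = np.linalg.inv(D - C @ A_inv_B)
    top_left = A_inv + A_inv_B @ S_inv @ C_A_inv
    top_right = -A_inv_B @ S_inv
    bottom_left = -S_inv @ C_A_inv
    return np.block([[top_left, top_right], [bottom_left, S_inv]])

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 7))
M = X @ X.T + 7.0 * np.eye(7)       # symmetric positive definite test matrix
A_inv = np.linalg.inv(M[:5, :5])    # computed once (the "design data" block)
M_inv = block_inverse(A_inv, M[:5, 5:], M[5:, :5], M[5:, 5:])
```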
Application
Hydrological Model
Simple Hydrological Watershed Model (1):
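$$\frac{dh_s}{dt} = q_\mathrm{rain} - q_\mathrm{runoff} - q_\mathrm{et} - q_\mathrm{lat} - q_\mathrm{gw}$$
$$\frac{dh_\mathrm{gw}}{dt} = q_\mathrm{gw} - q_\mathrm{bf} - q_\mathrm{dp}$$
$$\frac{dh_r}{dt} = q_\mathrm{runoff} + q_\mathrm{lat} + q_\mathrm{bf} - q_r$$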
Kuczera et al. 2006
[Figure: schematic of the soil, groundwater and river compartments with the fluxes q_rain, q_et, q_runoff, q_lat, q_gw, q_bf and q_r]
Simple Hydrological Watershed Model (2):
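$$q_\mathrm{rain} = \mathrm{rain}(t)$$
$$q_\mathrm{runoff} = f_\mathrm{sat}\,\mathrm{rain}(t)$$
$$q_\mathrm{et} = f_\mathrm{pet}\left(1 - \exp(-k_\mathrm{et}\,h_s)\right)\mathrm{pet}(t)$$
$$q_\mathrm{lat} = f_\mathrm{sat}\,q_\mathrm{lat,max}$$
$$q_\mathrm{bf} = k_\mathrm{bf}\,h_\mathrm{gw}$$
$$q_\mathrm{gw} = f_\mathrm{sat}\,q_\mathrm{gw,max}$$
$$q_\mathrm{dp} = k_\mathrm{dp}\,h_\mathrm{gw}$$
$$q_r = k_r\,h_r$$
$$Q_r = A_w\,q_r$$
where $f_\mathrm{sat}$, the saturated fraction of the watershed, is a logistic function of the soil storage $h_s$.
Kuczera et al. 2006
8 model parameters, 3 initial conditions, 1 standard deviation of the observation error.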
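To make the structure of the watershed model concrete, the sketch below evaluates the right-hand sides of the three storage balances (soil, groundwater, river) from the flux definitions of this slide. It is an illustration only. The parameter names and values are hypothetical, and the rate constant in the evapotranspiration term is written `k_et`. The saturated fraction `f_sat` is passed in as a number because its exact functional form is not reproduced here.

```python
import math

def watershed_rhs(h_s, h_gw, h_r, rain, pet, f_sat, p):
    """Time derivatives of soil (h_s), groundwater (h_gw) and river (h_r) storage.

    rain and pet are the current input rates, f_sat (between 0 and 1) is the
    saturated fraction, and p is a dict of hypothetical parameter names.
    """
    q_runoff = f_sat * rain
    q_et = p["f_pet"] * (1.0 - math.exp(-p["k_et"] * h_s)) * pet
    q_lat = f_sat * p["q_lat_max"]
    q_gw = f_sat * p["q_gw_max"]
    q_bf = p["k_bf"] * h_gw
    q_dp = p["k_dp"] * h_gw
    q_r = p["k_r"] * h_r
    dh_s = rain - q_runoff - q_et - q_lat - q_gw
    dh_gw = q_gw - q_bf - q_dp
    dh_r = q_runoff + q_lat + q_bf - q_r
    return dh_s, dh_gw, dh_r

params = {"f_pet": 0.8, "k_et": 0.05, "q_lat_max": 2.0, "q_gw_max": 1.0,
          "k_bf": 0.02, "k_dp": 0.005, "k_r": 0.5}  # hypothetical values
rates = watershed_rhs(50.0, 100.0, 5.0, rain=10.0, pet=3.0, f_sat=0.3, p=params)
```

Summing the three derivatives gives rain minus evapotranspiration, deep percolation and river outflow. This provides a quick check of the water balance.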
Model Application
Data set of the Abercrombie watershed, New South Wales, Australia (2770 km²), kindly provided by George Kuczera (Kuczera et al. 2006).
Box-Cox transformation applied to model output and data to reduce heteroscedasticity of the residuals.
Step function input to account for input data in the form of daily sums of precipitation and potential evapotranspiration.
Daily averaged output to account for output data in the form of daily averaged discharge.
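The three preprocessing steps listed above can be sketched in plain Python. The Box-Cox parameters `lam` and `offset` are hypothetical, since the values used in the application are not given. The step-function input holds each daily sum constant as a per-day rate over that day. The daily average assumes equally spaced model output within each day.

```python
import math

def box_cox(y, lam, offset=0.0):
    """Box-Cox transform; applied identically to model output and data."""
    z = y + offset
    if z <= 0:
        raise ValueError("Box-Cox transform requires y + offset > 0")
    if lam == 0:
        return math.log(z)
    return (z ** lam - 1.0) / lam

def step_input(daily_sums, t):
    """Piecewise-constant input rate (per day) at time t, in days from day 0."""
    return daily_sums[math.floor(t)]

def daily_average(values, steps_per_day):
    """Average equally spaced model output over each day."""
    if len(values) % steps_per_day != 0:
        raise ValueError("output must cover whole days")
    return [sum(values[i:i + steps_per_day]) / steps_per_day
            for i in range(0, len(values), steps_per_day)]

# Residuals are then computed on the transformed scale (hypothetical numbers).
residual = box_cox(4.0, lam=0.5) - box_cox(2.25, lam=0.5)  # 2.0 - 1.0
```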
Linearization
Linearization of model nonlinearities:
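The linearized equations from this slide are not preserved. One standard way to linearize a nonlinear flux is a first-order Taylor expansion around a reference state. The sketch below applies it to an evapotranspiration term of the form f_pet (1 − exp(−k h_s)) pet and uses a central finite difference for the slope. The function names and parameter values are hypothetical, and this is not necessarily the linearization used in the talk.

```python
import math

def linearize(f, x0, eps=1e-6):
    """Return (intercept, slope) such that f(x) is approximately intercept + slope * x near x0."""
    slope = (f(x0 + eps) - f(x0 - eps)) / (2.0 * eps)  # central difference
    intercept = f(x0) - slope * x0
    return intercept, slope

def q_et(h_s, f_pet=0.8, k_et=0.05, pet=3.0):
    """Nonlinear evapotranspiration flux (hypothetical parameter values)."""
    return f_pet * (1.0 - math.exp(-k_et * h_s)) * pet

c0, c1 = linearize(q_et, 50.0)
```

The linear approximation is exact at the reference state and degrades with distance from it.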
Derivation of simplified, linear state-space model:
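As a hedged example of turning a linear reservoir into a discrete-time state-space model, consider dh_r/dt = −k_r h_r + u, where the inflow u (e.g. q_runoff + q_lat + q_bf after linearization) is held constant over each time step, consistent with the step-function input. The exact one-step map is h(t + Δt) = e^{−k_r Δt} h(t) + (1 − e^{−k_r Δt}) u / k_r. The code illustrates this map; it is not the derivation in the talk, and its names are hypothetical.

```python
import math

def discretize_linear_reservoir(k, dt):
    """Coefficients (phi, gamma) of h_next = phi * h + gamma * u for dh/dt = -k*h + u."""
    phi = math.exp(-k * dt)
    gamma = (1.0 - phi) / k
    return phi, gamma

phi, gamma = discretize_linear_reservoir(k=0.5, dt=1.0)
# At the steady state h = u / k (here 1.0 / 0.5 = 2.0), the storage does not change.
h_next = phi * 2.0 + gamma * 1.0
```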
Results
Preliminary results with a simpler model look promising. They demonstrate that the concept works.
Unfortunately, the results for the hydrological model are not yet available.
Discussion
• We developed a general technique of constructing emulators for dynamic simulation models.
• In addition to solving the technical problems of Gaussian process emulation of dynamic models, this technique makes it easy to exploit the mechanisms incorporated in the simulation model. This can be expected to improve the emulator, which is of particular importance if the design set is small.
• There is a need for more research:
   • Gaining more experience with our approach.
   • Extending the approach to the estimation of additional parameters of the simplified model.
   • Learning about the advantages and disadvantages of the different approaches to dynamic emulation.
Acknowledgements
• Collaboration for this paper: Gentry White, Susie Bayarri, Bruce Pitman, Tom Santner, during my stay at SAMSI, NC, USA.
• Hydrological example and data: George Kuczera.
• More interactions at SAMSI: Jim Berger, Fei Liu, Rui Paulo, Robert Wolpert, John Paul Gosling, Tony O'Hagan, and many more.