

Calibration and Reliability in Groundwater Modelling: A Few Steps Closer to Reality (Proceedings of ModelCARE'2002, Prague, Czech Republic, 17-20 June 2002). IAHS Publ. no. 277. 2002. 247

Lessons from analysing trial and error calibrated models for prediction reliability

THEO OLSTHOORN Amsterdam Water Supply, Vogelenzangseweg 21, 2114 BA Vogelenzang, The Netherlands

t.olsthoorn@gwa.nl

EILEEN POETER Colorado School of Mines, International Ground Water Modelling Center, 1516 Illinois Street, Golden, Colorado 80401, USA

JOS MOORMAN Amsterdam Water Supply, Vogelenzangseweg 21, 2114 BA Vogelenzang, The Netherlands

Abstract Statistics, which quantify the accuracy/reliability of a model, are missing in models calibrated by trial and error and therefore rarely reported. Trial and error calibration does not reveal underlying problems and may be completed without enough supporting data. The modeller easily adds parameters to improve the model fit, but is seldom aware of the large uncertainties invoked by over-parameterization. We demonstrate this using an existing freeware calibration tool to statistically analyse a model that was calibrated by trial and error. The statistics are used to improve model reliability through simplification and re-optimization.

Key words Amsterdam Water Supply; calibration; correlation; sensitivity; statistical analysis; The Netherlands; trial and error; UCODE; Vechtplassen

CALIBRATION STATISTICS FOR ANY MODEL

Although there have been four ModelCARE conferences, in practice, most hydrologists still calibrate models by trial and error. At a one-day symposium to evaluate the extent of computer-aided calibration of groundwater models in practice, held in The Netherlands in March 2001, it was disappointing to find that most consultants are still reluctant to use tools such as PEST and UCODE to improve models. They claim that clients are not interested, do not want to spend money on automated calibration, and only want definitive results without uncertainty.

Even if clients are not interested in computer-aided calibration, practicing engineers should be interested. We need to know the quality of our models and the uniqueness of our solution, to understand the confidence associated with our predictions. Use of automated calibration should be standard practice for these reasons alone. Mary Hill's (1998) guidelines provide a template for the profession. The necessary tools, such as MODFLOW2000, PEST and UCODE, are available in the public domain. Therefore, we should challenge our colleagues to use automated calibration, or at least report calibration statistics, because it is easy to do for any model, even if trial and error was the calibration technique (Poeter & Hill, 1997). Such an analysis sheds light on the nature of a model, as illustrated by the example in this paper.


EXAMPLE MODEL

A regional two-aquifer groundwater model, of 400 km² in the central part of The Netherlands, provides an example trial and error calibrated model to illustrate the procedure for evaluating calibration statistics (Moorman, 2000). Distinct zones divide the two aquifers based on pumping tests and fault lines. A sandy hill area in the northeast is the main recharge zone, while a dense network of surface waters with distinct, maintained water levels covers most of the area and controls the shallow groundwater. The Multi-Layer Analytic Element Model (MLAEM; Strack, 1989; De Lange, 1996) simulated the steady state conditions, and calibration was by trial and error, using 100 observations and estimating 21 parameters: precipitation excess, 17 hydraulic conductivities, anisotropy in the northeastern hills, and two vertical resistances, one between the overlying system and the first aquifer and one between the two aquifers. In fact, the spatial variation of the latter resulted in an even larger number of parameters. Although calibration statistics calculated by automated calibration software are only rigorously acceptable for optimal parameter values, which the trial and error approach does not ensure, the statistics of the resulting model provide insight.

Table 1 Overview of the unweighted mean and standard error of the five groups of observations in the hand-calibrated model.

Gr   Description         No.   Mean     Std    Dimension
1    Heads Aq1 Lowland   55    -0.002   0.12   m
2    Heads Aq1 Hills     16    -0.022   0.19   m
3    Heads Aq2 Lowland   22     0.056   0.13   m
4    Heads Aq2 Hills      5    -0.18    0.23   m
All  Heads               98    -0.001   0.14   m
5    Polder Flows         2     1609    1613   m³ day⁻¹

ANALYSIS OF THE TRIAL AND ERROR CALIBRATED MODEL

Any hand-calibrated model can be analysed using a readily-available calibration tool, such as UCODE (Poeter & Hill, 1998), to get calibration statistics and parameter and prediction confidence intervals. This is a straightforward procedure, because UCODE has an option to compute the calibration statistics of an existing model, assuming the model is correct and its parameter values are optimized. The only risk is that this exercise may unveil unexpected deficiencies in the hand-calibration.

To analyse a model for the quality of its calibration, all we need is the residual vector, $e = y(b) - y$, the weights, $\omega$, and the Jacobian or sensitivity matrix, $J$. Here $y$ is a vector of length $N_d$, containing all observations, and $y(b)$ is the vector of model outputs (simulated equivalents) for these observations, which depends on the parameter vector $b$ of length $N_p$. The objective function,

$$S = (y(b) - y)^T \omega \, (y(b) - y) = e(b)^T \omega \, e(b),$$

is minimized by optimization of the parameters in $b$. In general, $\omega$ is a full $N_d \times N_d$ weight matrix, but in practice $\omega$ is typically diagonal, which implies that the observations are uncorrelated. Often, trial and error calibration does not employ weights, except for a conversion factor to render dimensionless residuals for combination in a single objective function (e.g. heads, flows, and concentrations). Further, prior information is generally included formally in the objective function of automated calibration, but only intuitively in trial and error calibration.
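As a minimal sketch of the objective function for a diagonal weight matrix, the following Python fragment combines heads and a flow in one dimensionless sum. The observation values and standard deviations are invented for illustration, and taking each weight as the inverse of an assumed observation variance is one common choice, not something prescribed at this point in the text.

```python
def objective(simulated, observed, weights):
    """Weighted sum of squared residuals, S = e^T w e, for a diagonal w."""
    if not (len(simulated) == len(observed) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(w * (s - o) ** 2 for s, o, w in zip(simulated, observed, weights))


# Hypothetical data: two heads (m) and one flow (m3/day).
observed = [1.20, -0.45, 80000.0]
simulated = [1.30, -0.50, 81000.0]
sigma = [0.10, 0.10, 1600.0]              # assumed standard deviations
weights = [1.0 / s ** 2 for s in sigma]   # assumed weighting: 1 / variance

S = objective(simulated, observed, weights)
print(f"S = {S:.4f}")
```

With these weights each term is a squared residual expressed in units of its own standard deviation, so heads and flows contribute on a comparable scale.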

We start the analysis of the trial and error model by checking the mean of the unweighted residuals,

$$m = \frac{1}{N_d} \sum_{i=1}^{N_d} e_i = -0.001\ \text{m},$$

which is near zero, and therefore acceptable. Because positive and negative residuals cancel in the mean, we also use the root mean square error,

$$\mathrm{RMSE} = \sqrt{\frac{e^T e}{N_d - N_p}} = 0.16\ \text{m},$$

which is small, and again considered acceptable. Next, we check for bias,

$$\mathrm{bias} = \sqrt{\frac{e^T e - (e - \bar{e})^T (e - \bar{e})}{N_d - N_p}} = 0.002\ \text{m},$$

where $\bar{e} = m$ is the mean residual. The bias is close to zero. Adjustment of many parameters in the trial and error process removes the bias, which may, or may not, be supported by substantial objective evidence. Removing bias is easier when more parameters are available for adjustment. However, the modeller may not be aware that the data do not support such adjustments. The current model provides an example of this. The explained variance, $v_e$, of the calibrated model is also encouraging:

$$v_e = \frac{\mathrm{var}(y) - \mathrm{var}(e)}{\mathrm{var}(y)} = 0.99,$$

where $\mathrm{var}(y)$ is the variance of the measured values.
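The four summary statistics above can be sketched in a few lines of Python. The residuals and observations below are invented, and the bias is taken here as the square root of (eᵀe − (e − ē)ᵀ(e − ē))/(N_d − N_p), which is one reading of the definition in the text rather than a confirmed formula.

```python
import math


def residual_stats(e, n_par):
    """Mean, RMSE and bias of unweighted residuals e for n_par parameters."""
    n_obs = len(e)
    dof = n_obs - n_par                       # degrees of freedom, Nd - Np
    mean = sum(e) / n_obs
    sse = sum(x * x for x in e)               # e^T e
    sse_centred = sum((x - mean) ** 2 for x in e)
    rmse = math.sqrt(sse / dof)
    bias = math.sqrt(max(sse - sse_centred, 0.0) / dof)
    return mean, rmse, bias


def explained_variance(observed, e):
    """(var(y) - var(e)) / var(y), using population variances."""
    def var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (var(observed) - var(e)) / var(observed)


# Hypothetical example: five observations, two estimated parameters.
residuals = [0.2, -0.1, 0.1, 0.0, 0.3]
heads = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, rmse, bias = residual_stats(residuals, n_par=2)
ve = explained_variance(heads, residuals)
print(mean, rmse, bias, ve)
```

Note that a high explained variance mainly reflects that the observations span a much larger range than the residuals; it says nothing about whether the parameters are supported by the data.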

GAINING DEEPER INSIGHT

Generally, this is the extent of statistical evaluation in modelling reports, which may convince the commissioning agency of the value of the study. Some trial and error calibration reports consider sensitivity analysis, while in automated calibration calculation of the sensitivity matrix is required. The sensitivity matrix, or Jacobian, $J$, at every iteration step is:

$$J = \begin{pmatrix}
\dfrac{\partial e_1}{\partial \pi_1} & \dfrac{\partial e_1}{\partial \pi_2} & \cdots & \dfrac{\partial e_1}{\partial \pi_{N_p}} \\
\dfrac{\partial e_2}{\partial \pi_1} & \dfrac{\partial e_2}{\partial \pi_2} & \cdots & \dfrac{\partial e_2}{\partial \pi_{N_p}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial e_{N_d}}{\partial \pi_1} & \dfrac{\partial e_{N_d}}{\partial \pi_2} & \cdots & \dfrac{\partial e_{N_d}}{\partial \pi_{N_p}}
\end{pmatrix}$$

which has $N_p$ columns and $N_d$ rows. Here $\pi$ stands for the parameter vector, which may contain $b$ or $\log(b)$, because often the logs of the parameters are optimized instead of the parameters themselves. The final result is similar, but convergence when minimizing $S$ is generally better using $\log(b)$ for appropriate parameters, such as conductivities, flows and concentrations, but not boundary heads. A thorough analysis of the Jacobian of a trial and error-calibrated model, with or without weights, provides invaluable information on the quality of the calibration, the choice of parameters and the uniqueness of the result.


Computing the Jacobian is straightforward, requiring $N_p + 1$ runs with our hand-calibrated model. We can take the partial derivative with respect to $b$ or $\log(b)$ and compute $J$ as dimensionless scaled sensitivities,

$$J_{ij} = \sqrt{\omega_{ii}} \, b_j \frac{\partial e_i}{\partial b_j},$$

which, for small relative increments, $\delta = db_j / b_j$, is equivalent to taking log-sensitivities:

$$d(\ln b_j) = \ln((1 + \delta) b_j) - \ln(b_j) = \ln(1 + \delta) \approx \delta, \quad \text{so} \quad J_{ij} = \sqrt{\omega_{ii}} \, \frac{y_i(b_j + \delta b_j) - y_i(b_j)}{\delta},$$

with $\delta$ a small scalar, e.g. 1% or 5%.

To evaluate whether the model parameters are optimal, we check the partial derivatives of the objective function, which, if made dimensionless, are

$$b_j \frac{\partial S}{\partial b_j} = 2 \, e^T \omega \, \frac{\partial e}{\partial b_j} \, b_j.$$

All of these partial derivatives should be zero at the optimum. In practice they are not exactly zero, as Fig. 1 shows, because convergence is defined at an arbitrary cut-off and the model set-up is never a perfect representation of the field.
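The perturbation procedure can be sketched as follows. The function `strip_heads` is a hypothetical stand-in for the groundwater model (steady heads in a recharged strip between two canals), not MLAEM, and the parameter values, observations and unit weights are invented; only the N_p + 1 run pattern and the scaling follow the text.

```python
import math


def strip_heads(b, xs=(250.0, 500.0, 750.0), length=1000.0):
    """Hypothetical model: h(x) = N x (L - x) / (2 T) between canals at head 0."""
    recharge, transmissivity = b
    return [recharge * x * (length - x) / (2.0 * transmissivity) for x in xs]


def scaled_jacobian(model, b, weights, delta=0.01):
    """Dimensionless scaled sensitivities from Np + 1 model runs."""
    base = model(b)                           # run 1: unperturbed parameters
    J = [[0.0] * len(b) for _ in base]
    for j in range(len(b)):                   # one extra run per parameter
        b_pert = list(b)
        b_pert[j] = b[j] * (1.0 + delta)      # relative increment delta
        pert = model(b_pert)
        for i, (p, s) in enumerate(zip(pert, base)):
            J[i][j] = math.sqrt(weights[i]) * (p - s) / delta
    return base, J


def scaled_gradient(base, observed, weights, J):
    """b_j dS/db_j = 2 e^T w (de/db_j) b_j, with J_ij = sqrt(w_i) b_j de_i/db_j."""
    e = [s - o for s, o in zip(base, observed)]
    return [2.0 * sum(math.sqrt(w) * ei * row[j]
                      for ei, w, row in zip(e, weights, J))
            for j in range(len(J[0]))]


b = [0.001, 500.0]                # recharge (m/day), transmissivity (m2/day)
weights = [1.0, 1.0, 1.0]         # unweighted, as in a hand calibration
observed = [0.18, 0.26, 0.18]     # invented head observations (m)
base, J = scaled_jacobian(strip_heads, b, weights)
grad = scaled_gradient(base, observed, weights, J)
print(J, grad)
```

In this toy model the two columns of `J` are almost exactly proportional with opposite sign, which is the numerical signature of two parameters that heads alone cannot separate.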

We get a different picture (Fig. 2) if we include the two polder flows as prior information in the objective function. From this we conclude that it is important to formally include these flows because they will have a major impact on the result, demonstrating the importance of using flows in conjunction with heads. These figures show that at least some of the parameters may not be optimal and we should reconsider the optimization including the polder flows.

The composite scaled sensitivities, css (Hill, 1998), provide a comprehensive view of the overall sensitivity of the simulated equivalents (model equivalents to the observations) with respect to each of the parameters. The composite scaled sensitivity of parameter $j$ is the square root of the average of the squared scaled sensitivities in column $j$ of the Jacobian:

$$\mathrm{css}_j = \sqrt{\frac{1}{N_d} \sum_{i=1}^{N_d} J_{ij}^2}.$$

The composite scaled sensitivities range over three orders of magnitude in the current model (see Table 2), thus we suspect it is impossible to estimate insensitive parameters objectively, even though it has been done by trial and error.
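A short Python sketch of the composite scaled sensitivity, following the definition in Hill (1998) as the root of the mean squared scaled sensitivity per Jacobian column. The small Jacobian is invented; the list `css_table2` is copied from the CSS column of Table 2 to show the spread between the most and least sensitive parameters.

```python
import math


def composite_scaled_sensitivities(J):
    """css_j = sqrt(mean_i(J_ij^2)) for each column j of the scaled Jacobian."""
    n_obs, n_par = len(J), len(J[0])
    return [math.sqrt(sum(J[i][j] ** 2 for i in range(n_obs)) / n_obs)
            for j in range(n_par)]


# Invented 2 x 2 scaled Jacobian: one sensitive and one insensitive parameter.
J_demo = [[3.0, 0.01],
          [4.0, 0.02]]
css_demo = composite_scaled_sensitivities(J_demo)

# CSS column of Table 2 (trial and error model), parameters 1 to 21.
css_table2 = [5.04, 0.17, 0.05, 5.91, 4.19, 0.85, 1.45, 1.19, 2.68, 0.49, 0.36,
              0.02, 1.04, 2.54, 2.47, 2.32, 1.91, 1.34, 0.43, 0.06, 0.01]
ratio = max(css_table2) / min(css_table2)
print(css_demo, ratio, math.log10(ratio))
```

The ratio of the largest to the smallest tabulated value is close to 600, that is, nearly three orders of magnitude.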

Fig. 1 Partial derivatives of the objective function for the trial and error calibrated model using only heads as observations.

Fig. 2 Partial derivatives of the objective function for the trial and error calibrated model using both heads as observations and polder flows as prior information.

The variance-covariance matrix, $\mathrm{cov} = s^2 (J^T J)^{-1}$, reveals the uncertainty of the estimated parameters. In this case, with 21 parameters, it is a 21 × 21 matrix. Here $s^2$ is the error variance of the weighted residuals, $s^2 = S(b) / (N_d - N_p)$, which follows from the value of the objective function, $S(b)$, evaluated for the optimized parameter values, and $N_d - N_p$, the number of degrees of freedom of the optimization. The diagonal of the covariance matrix gives a first approximation of the error variances of the parameters themselves, the square roots of which are the standard errors. For $\log_{10}$ parameters, these errors refer to the log of the parameter. In that case, a standard error of 1 is equivalent to a multiplicative standard error of a factor of 10 and a standard error of 0.1 is equivalent to a multiplicative standard error of a factor of 1.3, thus any value over, say, 0.5 is quite uncertain. Table 2 contains the statistical information obtained for the parameters. From this table we find that many parameters have no support from the data. If we reject those with a standard deviation of the log parameters larger than 0.5 (coefficient of variation of 3.16), at least 11 out of 21 parameters should be dropped or combined with other parameters if hydrological arguments for this are present.
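The covariance calculation and the 0.5 screening rule can be sketched in plain Python. The 3 × 2 Jacobian and the objective value are invented; the list `std_table2` is copied from the Std column of Table 2. The small Gauss-Jordan inverse is only meant for illustration with a handful of parameters.

```python
import math


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("J^T J is singular: parameters are not separable")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def parameter_covariance(J, S):
    """cov = s^2 (J^T J)^-1 with s^2 = S / (Nd - Np)."""
    n_obs, n_par = len(J), len(J[0])
    JtJ = [[sum(J[i][a] * J[i][b] for i in range(n_obs)) for b in range(n_par)]
           for a in range(n_par)]
    s2 = S / (n_obs - n_par)
    return [[s2 * v for v in row] for row in invert(JtJ)]


# Invented example: three observations, two parameters, S(b) = 0.9.
cov = parameter_covariance([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], S=0.9)
std_err = [math.sqrt(cov[k][k]) for k in range(len(cov))]

# Std column of Table 2 (log10 parameters), parameters 1 to 21.
std_table2 = [0.05, 0.04, 0.12, 0.13, 0.14, 1.31, 1.60, 0.55, 0.18, 0.65, 1.17,
              30.80, 1.03, 0.92, 0.20, 0.15, 0.38, 0.27, 0.73, 9.21, 12.70]
n_unsupported = sum(1 for s in std_table2 if s > 0.5)
factor_at_threshold = 10 ** 0.5   # multiplicative error for a log10 std of 0.5
print(std_err, n_unsupported, factor_at_threshold)
```

Applied to the tabulated standard deviations, the 0.5 threshold flags 11 of the 21 parameters, matching the count given in the text.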

The normalized covariance matrix (correlation matrix), $\mathrm{cor}_{ij} = \mathrm{cov}_{ij} / (\sqrt{\mathrm{cov}_{ii}} \sqrt{\mathrm{cov}_{jj}})$, provides insight into the mutual dependencies of parameters, given the available observations, of the calibrated model. This matrix is included in Table 3. High positive correlations indicate parameters for which parameter variation accomplishes the same model result and high negative ones indicate parameters that counteract each other, so that highly correlated parameters are not separable without further independent information. Such parameters should be tied to other parameters. For example, with head data alone, we cannot estimate recharge and k, only their ratio. This is not obvious from the correlation matrix of the trial and error model due to the use of many conductivity parameters.
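Normalizing a covariance matrix and listing strongly correlated parameter pairs takes only a few lines. The covariance values below are invented, and the cut-off of 0.85 is an assumed screening value, not one stated in the paper.

```python
import math


def correlation_matrix(cov):
    """cor_ij = cov_ij / (sqrt(cov_ii) * sqrt(cov_jj))."""
    n = len(cov)
    sd = [math.sqrt(cov[k][k]) for k in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def correlated_pairs(cor, threshold=0.85):
    """Pairs (i, j) with j < i whose absolute correlation reaches the threshold."""
    return [(i, j) for i in range(len(cor)) for j in range(i)
            if abs(cor[i][j]) >= threshold]


# Invented covariance of three log parameters; the first two move together.
cov_demo = [[0.040, 0.038, 0.000],
            [0.038, 0.040, 0.001],
            [0.000, 0.001, 0.090]]
cor_demo = correlation_matrix(cov_demo)
pairs = correlated_pairs(cor_demo)
print(cor_demo[1][0], pairs)
```

Pairs returned by such a screen are candidates for tying together or fixing, provided there is a hydrological argument for doing so.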


Table 2 Statistical results of the trial and error calibration. (Parameter type is either multiplier or conductivity; Zone is the zone number; Aq. is the aquifer/aquitard number; css is composite scaled sensitivities of the log parameters; Std is log parameter standard deviation.)

No.  Name   Type            Zone  Aq.  CSS    Std     -95% CI  Final  +95% CI
1    rfl    multipl (-)     99    1    5.04   0.05    0.81     1.0    1.23
2    fr     multipl (-)     99    1    0.17   0.04    0.80     1.0    1.14
3    fl     multipl (-)     99    2    0.05   0.12    0.59     1.0    1.83
4    Z0k1   kh (m day⁻¹)    0     1    5.91   0.13    21.7     40.0   74
5    Z0k2   kh (m day⁻¹)    0     2    4.19   0.14    24.2     45.0   84
6    Z1k1   kh (m day⁻¹)    1     1    0.85   1.31    0.11     45.0   18700
7    Z2k1   kh (m day⁻¹)    2     1    1.45   1.60    0.03     50.0   78700
8    Z3k1   kh (m day⁻¹)    3     1    1.19   0.55    3.00     37.5   469
9    Z4k1   kh (m day⁻¹)    4     1    2.68   0.18    12.0     27.5   63
10   Z5k1   kh (m day⁻¹)    5     1    0.49   0.65    1.27     25.0   491
11   Z6k1   kh (m day⁻¹)    6     1    0.36   1.17    0.20     42.5   9280
12   Z7k1   kh (m day⁻¹)    7     1    0.02   30.80   0.00     45.2   1E+38
13   Z1k2   kh (m day⁻¹)    1     2    1.04   1.03    0.36     40.0   4490
14   Z2k2   kh (m day⁻¹)    2     2    2.54   0.92    0.67     44.5   2970
15   Z8k2   kh (m day⁻¹)    8     2    2.47   0.20    21.9     55.6   141
16   Z3k2   kh (m day⁻¹)    3     2    2.32   0.15    34.2     66.6   130
17   Z4k2   kh (m day⁻¹)    4     2    1.91   0.38    13.6     76.0   426
18   Z5k2   kh (m day⁻¹)    5     2    1.34   0.27    10.8     37.8   133
19   Z6k2   kh (m day⁻¹)    6     2    0.43   0.73    1.18     33.3   938
20   Z7k2   kh (m day⁻¹)    7     2    0.06   9.21    2E-17    43.0   1E+20
21   aniso  multipl (-)     99    1    0.01   12.70   5E-26    1.0    2E+25

The css in Table 2 show that recharge is a dominating factor as the driving force of the model, implying that any error in the recharge will propagate to errors in the values of other parameters.

For this model, high correlations occur between the shallow and deep aquifers in five zones where the separating aquitard is virtually absent. Consequently, it would be better to estimate one value of conductivity for both layers in these zones and assume vertical isotropy. In two zones of this model it is impossible to calibrate a transmissivity and the resistance $c = B/k$ of the overlying aquitard simultaneously, because the model is only sensitive to the characteristic length $\lambda = \sqrt{Tc}$ (m). The western part of the model is controlled by artificially-maintained, surface water networks that provide recharge via specified heads with much uncertainty in the parameters, but the correlation matrix identifies problems.

Given this insight, we conclude that the available data did not support the trial and error calibration. This is common for trial and error-calibrated models, because adding parameters is the easiest way to improve model fit and statistics are not available to deter this action. UCODE recalibrates the model using a new, smaller set of seven parameters: recharge is defined from field conditions, the large seepage flows of two large, deep, artificially-maintained polders are explicitly included in the objective function, and weights are based on the variance associated with the field observations. We assume the original value of recharge (0.84 to 1 mm day⁻¹) to be given, because the steady-state values are relatively well known in The Netherlands while we have mainly head observations, thus all estimated parameter values are dependent on the

Table 3 Parameter correlation matrix.

Par# ParNm 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 P

1 rf1 100 0

2 fr 16 100 1

3 fl - 1 3 - 1 3 100 2

4 ZOkO 0 40 79 100 3

5 ZOkO 6 - 6 - 8 3 - 8 7 100 0

6 ZOkO - 1 29 6 20 - 9 100 4

7 ZOkO 4 38 - 1 18 - 4 36 100 4

8 ZOkO 51 - 1 - 2 0 - 1 9 17 - 5 1 100 7

9 ZOkO 26 - 6 68 54 - 5 6 4 6 14 100 7

10 ZOkO 26 - 1 - 7 - 7 6 - 2 1 32 8 100 5

11 ZOkO - 5 5 10 4 4 1 2 1 6 - 2 100 5

12 ZOkO 10 9 - 2 - 1 7 - 2 2 - 3 5 - 1 3 49 100 5

13 ZOkO - 1 0 - 9 - 8 - 1 2 7 - 7 8 - 2 2 - 3 - 1 1 - 2 - 1 - 1 100 4

14 ZOkO - 1 - 3 5 - 1 - 1 8 6 - 4 0 - 9 5 4 - 5 2 - 1 1 12 100 4

15 ZOkO 22 4 - 6 -4 4 - 8 - 1 8 - 7 0 - 4 1 1 - 7 8 100 6

16 ZOkO 39 - 2 - 1 - 1 0 1 6 22 25 6 2 - 8 0 - 4 - 1 7 100 6

17 ZOkO 73 16 - 2 5 - 7 10 0 - 2 20 - 2 2 10 - 1 3 - 1 4 2 - 2 23 13 100 6

18 ZOkO - 2 2 1 15 14 - 1 3 2 - 3 - 3 9 - 4 - 7 1 8 - 8 - 2 - 1 19 - 2 6 3 100 5

19 ZOkO 4 - 1 - 3 1 - 2 2 8 - 4 - 2 6 - 2 2 5 - 6 9 - 2 9 4 2 3 0 16 - 8 100 5

20 ZOkO 0 - 3 2 5 - 8 3 1 - 3 0 4 - 7 3 - 7 9 0 - 2 - 2 0 13 -4 33 100 5

21 aniso 37 - 3 - 2 5 - 2 5 23 - 6 1 92 20 27 1 - 2 - 1 3 - 1 0 20 3 - 3 6 1 -4 100 0

The last column (P) shows the new parameterization applied for the final optimization, in which original parameters are fixed or tied together by a common multiplier. Zero indicates that the parameter is fixed. This combining was motivated as follows: recharge, rfl (1), was fixed as it was considered the given driving force of the model to which everything else is related. Anisotropy was fixed because of its insensitivity. Z1k1 (6), Z2k1 (7), Z1k2 (13) and Z2k2 (14) were combined because the aquitard was missing in that location, which is reflected by the high correlation between parameters 6 and 13, and 7 and 14. Further, Z5k1 (10), Z6k1 (11), Z7k1 (12), Z5k2 (18), Z6k2 (19) and Z7k2 (20) were combined because of aquitard absence. For hydrological reasons Z8k2 (15), Z3k2 (16) and Z4k2 (17) were combined, and similarly Z3k1 (8) and Z4k1 (9). Finally, Z0k1 (4) and Z0k2 (5) are strongly correlated, as are fl (3) and Z0k2 (5). Z0k2 (5) was fixed because it is well known from other studies and thus stabilizes Z0k1 (4) and fl (3). In total 7 out of 21 parameters were retained.

recharge value. Future field data should better define recharge. The model is completely insensitive to anisotropy, so we assume the values of the trial and error model to be correct. Finally, we group the hydraulic conductivities of the correlated K zones discussed above. A more specific argumentation is included in the caption of Table 3. Head observations are believed accurate to approximately 0.25 m in the low region and 0.4 m in the high region of the model; assuming we are 95% confident of this, their variances are 0.016 and 0.04 m², respectively. The coefficient of variation for the large seepage flows of the two deep polders is of the order of 2%.
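The observation variances quoted above follow from treating the stated accuracy as the half-width of a 95% interval of a normal distribution, so that the standard deviation is the half-width divided by 1.96. A short sketch, in which the flow value of 80 000 m³/day is an illustrative round number and the weight as inverse variance is an assumed convention:

```python
def variance_from_half_width(half_width, z=1.96):
    """Variance implied by a 95% accuracy band of +/- half_width."""
    return (half_width / z) ** 2


def variance_from_cv(value, cv):
    """Variance implied by a coefficient of variation cv on a measured value."""
    return (cv * value) ** 2


var_low = variance_from_half_width(0.25)    # lowland heads, m2
var_high = variance_from_half_width(0.40)   # heads in the hills, m2
var_flow = variance_from_cv(80000.0, 0.02)  # illustrative polder flow, (m3/day)2

weights = {name: 1.0 / v for name, v in
           [("head_low", var_low), ("head_high", var_high), ("flow", var_flow)]}
print(round(var_low, 3), round(var_high, 3), var_flow, weights)
```

Rounded, the two head variances reproduce the 0.016 and 0.04 m² given in the text.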

RESULTS OF AUTOMATIC RE-OPTIMIZATION AFTER RE-PARAMETERIZATION

Optimization of the remaining seven parameters is accomplished with UCODE, which took only three iterations. The resulting unweighted residuals are all shown in Fig. 3.

Fig. 3 Final (unweighted) residuals: 98 heads (m) in two aquifers and the two deep polder seepage flows (m³ day⁻¹). (Map showing Horstermeer Polder and Bethune Polder, with residual classes for aquifers 1 and 2.)

Results for the unweighted residuals are presented in Table 4 and can be directly compared with Table 1 for the trial and error calibrated model. As can be seen, all groups have better or equal standard error, using only seven parameters. The differences in the means are better or negligibly worse (overall heads by 8 mm and lowland heads in the second aquifer by 5 mm). Further statistics are calculated using weighted residuals, which were not available for the trial and error model, because weights were not formally applied for that calibration; weights are essential for statistical analysis to combine all observations irrespective of their dimension. We have an explained variance of 0.98 as before.

Table 4 Overview of the unweighted mean and standard error of the five groups of observations after re-parameterization and re-optimization with UCODE.

Gr   Description         No.   Mean     Std    Dimension
1    Heads Aq1 Lowland   55     0.001   0.12   m
2    Heads Aq1 Hills     16     0.015   0.18   m
3    Heads Aq2 Lowland   22     0.061   0.13   m
4    Heads Aq2 Hills      5    -0.14    0.19   m
All  Heads               98     0.009   0.14   m
5    Polder Flows         2     171     627    m³ day⁻¹

Further statistical analysis is done with the weighted simulated values and residuals. The RMSE of the weighted residuals should be approximately 1 as weights are used according to the expected errors. In our case we find that their standard error after regression is 1.038. The average weighted residual is -0.069. Additional results are presented in Table 5. A valid model will have unbiased, random residuals. A number of evaluations can be made to determine if this is the case. For example, a graph relating weighted residuals and weighted simulated values should exhibit a band of uniform width about zero, as is the case for this model (Fig. 4). Also, the runs test compares the number of consecutive series of positive and negative residuals with the expected number of runs in an equally long series of random numbers to see if the series in question can be considered random. In our case we had 43 runs in 100 observations, yielding a runs statistic of -1.50, thus there is a 5-10% chance that these residuals are random, suggesting the spatial pattern of weighted residuals should be checked for bias. Spatial bias is not notable in Fig. 3. If there is spatial bias, it encourages redefining zones and/or regrouping parameters, that is, improvement of the conceptual model.
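The runs statistic can be sketched with the standard Wald-Wolfowitz formulas and a continuity correction of 0.5. The paper does not report how the 100 residuals split into positive and negative values, so the equal split of 50 and 50 used below is an assumption; with it, 43 runs give a statistic of about -1.5, in line with the reported value. Whether UCODE applies exactly this correction is also an assumption here.

```python
import math


def count_runs(residuals):
    """Number of runs of consecutive same-sign residuals (zeros are skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


def runs_statistic(n_pos, n_neg, n_runs):
    """Standardized runs statistic with a 0.5 continuity correction."""
    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n * n * (n - 1.0)))
    if n_runs < expected:
        corrected = n_runs + 0.5      # too few runs: shift towards the mean
    elif n_runs > expected:
        corrected = n_runs - 0.5      # too many runs: shift towards the mean
    else:
        corrected = n_runs
    return (corrected - expected) / math.sqrt(variance)


z = runs_statistic(n_pos=50, n_neg=50, n_runs=43)   # assumed 50/50 split
demo_runs = count_runs([0.1, 0.2, -0.1, -0.3, 0.2])
print(round(z, 2), demo_runs)
```

A strongly negative statistic means fewer sign changes than expected by chance, which is what clustered, spatially biased residuals would produce.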

Table 5 Results of re-optimization with seven parameters (multipliers). (css is composite scaled sensitivity; log(b) is final log parameter value; Std is its standard deviation; and the coefficient of variation, CoefV, is 10^Std.)

Parameter statistics and values:

No.  Name  CSS     log(b)   Std     CoefV   -95%    b_final  +95%
1    fr    0.171   -0.021   0.031   1.073   0.828   0.953    1.100
2    fl    0.054    0.017   0.051   1.125   0.823   1.040    1.310
3    pk1   0.060   -0.017   0.052   1.128   0.758   0.963    1.220
4    pk3   0.118   -0.035   0.055   1.134   0.718   0.924    1.190
5    pk4   0.097    0.086   0.098   1.254   0.777   1.220    1.910
6    pk5   0.030   -0.012   0.059   1.145   0.743   0.973    1.270
7    pk6   0.002   -0.001   0.064   1.160   0.743   0.998    1.340

Correlation matrix:

No.  1      2      3      4      5      6      7
1    100%
2    -33%   100%
3    71%    16%    100%
4    72%    -14%   50%    100%
5    5%     -4%    -2%    17%    100%
6    -5%    -21%   -16%   -44%   -32%   100%
7    -22%   62%    7%     2%     6%     -47%   100%
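The columns of Table 5 are linked by simple back-transformations from the log10 parameter and its standard deviation. The sketch below recomputes three rows; the Student-t value of 1.986 (95%, two-sided, 100 - 7 = 93 degrees of freedom) is an assumption, since the paper does not state which critical value UCODE used, but it reproduces the tabulated limits to within rounding.

```python
T_VALUE = 1.986   # assumed t(0.975, 93); not stated in the paper


def back_transform(log_b, std, t=T_VALUE):
    """Return (-95% limit, value, +95% limit, CoefV) for a log10 parameter."""
    lower = 10 ** (log_b - t * std)
    value = 10 ** log_b
    upper = 10 ** (log_b + t * std)
    coef_v = 10 ** std                # multiplicative standard error
    return lower, value, upper, coef_v


# (log(b), Std) copied from Table 5 for three of the seven multipliers.
table5 = {"fr": (-0.021, 0.031), "fl": (0.017, 0.051), "pk4": (0.086, 0.098)}
results = {name: back_transform(*vals) for name, vals in table5.items()}
for name, (lo, val, hi, cv) in results.items():
    print(f"{name}: {lo:.3f}  {val:.3f}  {hi:.3f}  CoefV {cv:.3f}")
```

Because the interval is symmetric in log space, it is asymmetric around the parameter value itself, which is why the upper limits in Table 5 lie further from b_final than the lower ones.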

Fig. 4 Weighted residuals vs weighted simulated values. (Symbols distinguish heads in aquifer 1 lowland, aquifer 1 hills, aquifer 2 lowland, aquifer 2 hills, and flows.)

Further calibration results are given in Table 5, in which the information is ordered in the same way as Table 2 for the trial and error calibration. From this table it becomes clear that composite scaled sensitivities are more similar, but one multiplier, pk6, is still low compared to multiplier fr. The model is simpler, and perhaps could be made even more so. The confidence intervals of the optimized parameters have been reduced to reasonable limits, as can be seen from the coefficient of variation.

The confidence intervals rely on a linear model and normally distributed residuals. Beale's measure, which is computed by UCODE, measures model linearity. The value for this model is 0.042, which is reported as an effectively linear model. The correlation between the ordered weighted residuals (Fig. 5) and normal order statistics is 98.4%, which is above the critical value of 97.6% at the 5% significance level, and so we accept with 95% certainty that the weighted residuals are independent and normally distributed. Hence, the confidence intervals can be considered accurate for this model set-up.
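A normality check of this kind can be sketched as the correlation between the sorted weighted residuals and approximate expected normal order statistics. The Blom plotting positions used below are an assumed approximation, and the exact definition of the statistic in UCODE may differ (for instance, it may be reported as a squared correlation), so the numbers are illustrative only.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected normal order statistics (Blom plotting positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]


def normal_order_correlation(weighted_residuals):
    """Correlation between sorted residuals and normal scores."""
    e = sorted(weighted_residuals)
    q = normal_scores(len(e))
    me, mq = sum(e) / len(e), sum(q) / len(q)
    sxy = sum((a - me) * (b - mq) for a, b in zip(e, q))
    sxx = sum((a - me) ** 2 for a in e)
    syy = sum((b - mq) ** 2 for b in q)
    return sxy / math.sqrt(sxx * syy)


# Synthetic checks: perfectly normal-shaped values and a skewed set.
scores = normal_scores(20)
r_normal = normal_order_correlation([1.0 + 2.0 * s for s in scores])
r_skewed = normal_order_correlation([math.exp(s) for s in scores])
print(r_normal, r_skewed)
```

The computed correlation is then compared with a tabulated critical value for the given number of observations, as done in the text with 97.6% for 100 observations.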

Fig. 5 Normality plot of the weighted residuals, together with five UCODE generated control groups of simulated values with the same statistics to check deviations from normality for small populations (Cooley & Naff, 1990).

PREDICTION

Sometimes models are made to investigate relationships, but more generally their goal is to make reliable predictions. Automated calibration provides the necessary statistics to calculate confidence intervals on the predictions. For this model we consider the resulting head distribution if one of the low-lying polders reverted to a lake, increasing the water level by 2.9 m. In this case, the current seepage of nearly 80 000 m³ day⁻¹ would vanish. The seepage/leakage of other polders and heads in the region would change in response.

Head increases and their 95% confidence intervals are determined at selected locations using UCODE (Figs 6 and 7). Given the improved model, predictions are more accurate and the confidence intervals are smaller than obtained with the over-parameterized trial and error calibrated model. Trial and error calibration does not provide the statistics necessary to calculate confidence intervals (if such statistics are generated independently, their validity is questionable because there is no guarantee that the parameter values are optimal).
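A linear confidence interval on a prediction can be sketched by propagating the parameter covariance through the prediction sensitivities: the prediction variance is zᵀ cov z, where z holds the derivatives of the prediction with respect to the (log) parameters. All numbers below are invented, the t-value of 1.986 is assumed for 93 degrees of freedom, and the paper does not state whether the intervals reported by UCODE are of this individual linear type.

```python
import math


def prediction_interval(prediction, z, cov, t=1.986):
    """Linear interval: prediction +/- t * sqrt(z^T cov z)."""
    n = len(z)
    variance = sum(z[i] * cov[i][j] * z[j] for i in range(n) for j in range(n))
    half_width = t * math.sqrt(variance)
    return prediction - half_width, prediction + half_width


# Invented example: a head rise of 2.0 m depending on two log parameters.
z_demo = [0.8, -0.5]                       # d(prediction) / d(log b_j)
cov_pred = [[0.0010, 0.0002],
            [0.0002, 0.0030]]              # covariance of the log parameters
lower, upper = prediction_interval(2.0, z_demo, cov_pred)
print(round(lower, 3), round(upper, 3))
```

The off-diagonal covariance terms matter: correlated parameters can either widen or narrow the interval depending on the signs of the prediction sensitivities.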

Fig. 6 Predicted change of daily leakage with 95% confidence interval for three nearby polders if Bethune Polder reverted to a lake.

Fig. 7 Head locations around Bethune Polder, with simulated head (shown below the location marker, (m)) in the first aquifer, head change (shown above marker) due to a surface water head rise from -3.9 to -1.1 m (a.m.s.l.) in the polder, and lower and upper 95% confidence interval of the computed head change (left and right of marker).

The calibration should be formally reconsidered from the perspective of the desired predictions, by evaluating whether parameters that were insensitive under calibration conditions are sensitive in the predictive situation, and whether parameters that were correlated in the calibration are not correlated in the predictive situation. If parameters have become sensitive or uncorrelated in the predictive situation, we should reconsider the calibration and strive to acquire data that will increase the calibration sensitivity to parameters that show sensitivity only in the predictive mode, as well as data that will decrease calibration phase correlation between the parameters that are not correlated in the predictive mode. Neither of these situations occurred in the case considered here.


CONCLUSION

Statistics, which are automatically obtained from calibration tools, are missing if a model is calibrated by trial and error. Statistical analysis of such models is possible and straightforward, either by hand calculation or using readily available calibration tools. This analysis yields the statistics to quantify the confidence limits of the existing model. At the same time it unveils underlying problems with the trial and error calibrated model. These statistics are the key to making a sound re-parameterization, after which the parameters can be optimized automatically to obtain a statistically more robust result with quantified confidence intervals. In the current example, we obtained a better overall result with only seven of 21 original parameters. The model was also better balanced and formally included flows.

We urge that calibration statistics be reported with every model. These statistics are a prerequisite to quantifying confidence associated with model predictions and can be computed readily using publicly available software.

REFERENCES

Cooley, R. L. & Naff, R. L. (1990) Regression Modeling of Ground-Water Flow. Techniques of Water-Resources Investigations of the USGS, Book 3, Chapter B4. US Geological Survey, Washington DC, USA.

De Lange, W. J. (1996) Groundwater modeling of large domains with analytic elements. PhD Thesis, Delft University, RIZA, Lelystad, The Netherlands. ISBN 90-369-4569-0.

Hill, M. C. (1998) Methods and guidelines for effective model calibration. US Geol. Survey. Water Resources Investigations 98-4005.

Moorman, J. H. N. (2000) Statistical analysis of the parameters of a hand-calibrated analytic element model. In: Analytic Element Conference, vol. I (ed. by O. D. L. Strack) (Proc. Third Int. Conf., Brainerd, Minnesota, USA). Dept. Civil Engineering, University of Minnesota, Minnesota, USA.

Poeter, E. P. & Hill, M. C. (1997) Inverse models, a necessary step in groundwater modeling. Groundwater 35(2), 250-260.

Poeter, E. P. & Hill, M. C. (1998) Documentation of UCODE, A computer code for universal inverse modeling. US Geol. Survey. Water Resources Investigations 98-4080.

Strack, O. D. L. (1989) Groundwater Mechanics. Prentice Hall, Englewood Cliffs, New Jersey, USA.