Artificial Neural Networks and Conditional Stochastic Simulations for Characterization of Aquifer Heterogeneity

Item Type: text; Dissertation-Reproduction (electronic)
Authors: Balkhair, Khaled Saeed
Publisher: The University of Arizona
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date: 29/12/2020 20:05:14
Link to Item: http://hdl.handle.net/10150/284451




INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA

800-521-0600


ARTIFICIAL NEURAL NETWORKS AND CONDITIONAL STOCHASTIC SIMULATIONS

FOR CHARACTERIZATION OF AQUIFER HETEROGENEITY

By

Khaled Saeed Balkhair

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF HYDROLOGY AND WATER RESOURCES

In Partial Fulfillment of the Requirements For the Degree of

DOCTOR OF PHILOSOPHY WITH A MAJOR IN HYDROLOGY

In the Graduate College

THE UNIVERSITY OF ARIZONA

1999


UMI Number: 9934843

UMI Microform 9934843 Copyright 1999, by UMI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI 300 North Zeeb Road Ann Arbor, MI 48103


THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have

read the dissertation prepared by Khaled Saeed Balkhair

entitled Artificial Neural Networks and Conditional Stochastic

Simulations for Characterization of Aquifer Heterogeneity

and recommend that it be accepted as fulfilling the dissertation

requirement for the Degree of Doctor of Philosophy

Thomas Maddock III                                  Date

Robert MacNish                                      Date

Peter Wierenga                                      Date

David M. Hendricks                                  Date

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Dissertation Director: Lucien Duckstein             Date


STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College. In all other instances, however, permission must be obtained from the author.

SIGNED:


ACKNOWLEDGEMENTS

I have had the pleasure of knowing my advisor Professor Lucien Duckstein since the fall of 1996, when I took his course in Fuzzy Logic. His enthusiasm, encouragement, and advice were instrumental in helping me complete my dissertation and degree requirements. I would also like to thank my other committee members, Dr. Thomas Maddock III, Dr. Robert MacNish, Dr. Peter Wierenga, and Dr. David Hendricks, for offering ideas and suggestions pertinent to my research both during and after my oral comprehensive exam.

Gratitude is extended to Professor Allan Gutjahr of New Mexico Tech; our discourses on the stochastic approach were highly beneficial. I also extend this gratitude to his former student, Dr. Debra Hughson, for sharing with me her research experience, which helped facilitate implementation of the stochastic approach.

I am greatly indebted for the financial sponsorship provided by my government through the Saudi Culture Mission. This assistance enabled me to focus primarily on my coursework and research.

Special thanks to my friend and classmate Emery Coppola Jr. for helping me review parts of the dissertation manuscript. His suggestions were very helpful.

I feel particularly fortunate to be a graduate of this great school and outstanding Department. There are many names that deserve special thanks. They include Dr. Donald Davis, for his continual support in providing useful references; his office door was always open to me; Teresa Handloser, Academic Advisor, for providing useful administrative advice; and Department secretaries Frances Janssen, Chris Wenger, and Mary Nett, who lent me all sorts of support and help when needed.

I would like to express my deepest gratitude to my parents for their continual support and encouragement; my wife for her patience, care and understanding throughout the tenure of my graduate study; and the rest of my family for their kind words of encouragement.


DEDICATION

To

My parents

My wife

My children

And

My Family


TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES 1

ABSTRACT 1

1. INTRODUCTION 1

1.1 Background 1

1.2 Objectives 2

1.3 Contents of the Dissertation 2

2. SPATIAL HETEROGENEITY AND RANDOM FIELDS 2

2.1 Heterogeneity and Stochastic Process 2

2.2 Geostatistical Representation of Spatial Variability 3

2.3 Random Processes and Random Fields 3

2.4 Stationarity and Ergodicity of Random Processes 3

2.5 Random Field Generators (RFG) 3

2.5.1 Spectral Representation of Random Variables 4

2.5.1.1 Unconditional two-dimensional Random Field Generation 4

2.5.1.2 Conditional two-dimensional Random Fields 4

3. STOCHASTIC MODEL DEVELOPMENT 5

3.1 Introduction 5

3.2 Mathematical Development of the Stochastic Model 5

3.2.1 Spectral Representation of the Flow Equation 5

3.2.2 Ln[T] Spectra and Covariance Function 5

3.2.3 Head Variance and Covariance Functions 6

3.2.4 Cross-Covariance Functions 6

3.3 Conditioning 6


TABLE OF CONTENTS - CONTINUED

3.3.1 Cokriging 64

3.3.2 Iterative Conditioning Procedure 68

4. NEURAL NETWORKS 72

4.1 Introduction 72

4.2 Basis of Neural Networks 76

4.3 Artificial Neurons 79

4.4 Artificial Neural Networks 83

4.5 Example of Artificial Neural Network 85

4.6 Learning and Recall 88

5. BACK-PROPAGATION NEURAL NETWORK 91

5.1 General 91

5.2 Widrow-Hoff Delta Learning Rule 92

5.3 Multi-layer Back-propagation Training 98

5.3.1 Calculation of Weights for the Output-Layer Neurons 102

5.3.2 Calculation of Weights for the Hidden Layer Neurons 106

5.4 Momentum 111

6. STOCHASTIC AND NEURAL NETWORK MODEL APPLICATIONS 113

6.1 Methodology 113

6.2 Iterative Conditional Simulation (ICS) 117

6.3 Neural Network Simulation 121

6.3.1 Development of training and testing patterns 122

7. RESULTS AND DISCUSSION 128

8. CONCLUSIONS AND RECOMMENDATIONS 175

8.1 Conclusions 175

8.2 Recommendations for future work 179


TABLE OF CONTENTS - CONTINUED

REFERENCES


LIST OF FIGURES

Figure 3.1 Illustration of conditioning of a random field on data 69

Figure 4.1 Sketch of Biological Neuron 78

Figure 4.2 Sketch of Artificial Neuron 80

Figure 4.3 Transfer Functions for Neurons 82

Figure 4.4 Architecture of Neural Network 84

Figure 4.5 Feed-Forward Neural Network 86

Figure 5.1 Sketch of an artificial neuron 93

Figure 5.2 Derivatives of Logistic Function 95

Figure 5.3 Sketch of Neuron without Activation Function 97

Figure 5.4 Sketch of multilayer back propagation neural network 101

Figure 5.5 Layers of neurons for calculating weights at output during training 103

Figure 5.6 Representation of neurons for calculating the change of weight in the hidden layer 108

Figure 6.1 Hypothetical Aquifer Layout 116

Figure 6.2 Uniform sampling scheme of transmissivities and head data 118

Figure 6.3 Non-uniform sampling scheme of transmissivities and head data 119

Figure 6.4 Architecture of f-ANN 124

Figure 6.5 Architecture of fh-ANN 125

Figure 7.1 True and estimated f fields for σf² = 1.0, (RF#1) 130

Figure 7.2 True and estimated f fields for σf² = 1.0, (RF#2) 131

Figure 7.3 True and estimated f fields for σf² = 1.0, (RF#3) 132


LIST OF FIGURES - CONTINUED

Figure 7.4 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 1.0, (RF#1, 2, and 3) 134

Figure 7.5 True and estimated f fields for σf² = 2.0, (RF#1) 135

Figure 7.6 True and estimated f fields for σf² = 2.0, (RF#2) 136

Figure 7.7 True and estimated f fields for σf² = 2.0, (RF#3) 137

Figure 7.8 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 2.0, (RF#1, 2, and 3) 138

Figure 7.9 True and estimated f fields for σf² = 5.0, (RF#1) 139

Figure 7.10 True and estimated f fields for σf² = 5.0, (RF#2) 140

Figure 7.11 True and estimated f fields for σf² = 5.0, (RF#3) 141

Figure 7.12 True vs. estimated ICS, fh-ANN, and f-ANN f fields for σf² = 5.0, (RF#1, 2, and 3) 143

Figure 7.13 True and simulated h fields for σf² = 1.0, (RF#1) 144

Figure 7.14 True and simulated h fields for σf² = 1.0, (RF#2) 145

Figure 7.15 True and simulated h fields for σf² = 1.0, (RF#3) 146

Figure 7.16 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 1.0, (RF#1, 2, and 3) 147


LIST OF FIGURES - CONTINUED

Figure 7.17 True and simulated h fields for σf² = 2.0, (RF#1) 148

Figure 7.18 True and simulated h fields for σf² = 2.0, (RF#2) 149

Figure 7.19 True and simulated h fields for σf² = 2.0, (RF#3) 150

Figure 7.20 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 2.0, (RF#1, 2, and 3) 151

Figure 7.21 True and simulated h fields for σf² = 5.0, (RF#1) 152

Figure 7.22 True and simulated h fields for σf² = 5.0, (RF#2) 153

Figure 7.23 True and simulated h fields for σf² = 5.0, (RF#3) 154

Figure 7.24 True vs. simulated ICS, fh-ANN, and f-ANN h fields for σf² = 5.0, (RF#1, 2, and 3) 155

Figure 7.25 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 1.0, (RF#1, 2, and 3) 156

Figure 7.26 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 2.0, (RF#1, 2, and 3) 157

Figure 7.27 True vs. simulated ICS, fh-ANN, and f-ANN head at sampled locations for σf² = 5.0, (RF#1, 2, and 3) 158

Figure 7.28 Mean Square Error (MSE) of 100 f fields for σf² = 1.0 160

Figure 7.29 Mean Square Error (MSE) of 100 f fields for σf² = 2.0 161

Figure 7.30 Mean Square Error (MSE) of 100 f fields for σf² = 5.0 162


LIST OF FIGURES - CONTINUED

Figure 7.31 Mean Square Error (MSE) of 100 h fields for σf² = 1.0 163

Figure 7.32 Mean Square Error (MSE) of 100 h fields for σf² = 2.0 164

Figure 7.33 Mean Square Error (MSE) of 100 h fields for σf² = 5.0 165

Figure 7.34 Mean Square Error (MSE) fluctuations during training 168

Figure 7.35 True vs. estimated ICS, fh-ANN, and f-ANN mean f fields 171

Figure 7.36 True vs. simulated ICS, fh-ANN, and f-ANN mean h fields 172

Figure 7.37 True vs. estimated ICS, fh-ANN, and f-ANN variance of f fields 173

Figure 7.38 True vs. simulated ICS, fh-ANN, and f-ANN variance of h fields 174


LIST OF TABLES

Table 7.1 Scenarios considered in the model applications 129

Table 7.2 MSE of ANNs at the end of 15,000 iterations for

different variances 169


ABSTRACT

Although it is one of the most difficult tasks in hydrology, delineation of aquifer

heterogeneity is essential for accurate simulation of groundwater flow and transport.

There are various approaches used to delineate aquifer heterogeneity from a limited data

set, and each has its own difficulties and drawbacks. The inverse problem is usually used

for estimating different hydraulic properties (e.g. transmissivity) from scattered

measurements of these properties, as well as hydraulic head. Difficulties associated with

this approach are issues of identifiability, uniqueness, and stability. The Iterative

Conditional Simulation (ICS) approach uses kriging (or cokriging), to provide estimates

of the property at unsampled locations while retaining the measured values at the

sampled locations. Although the relation between transmissivity (T) and head (h) in the

governing flow equation is nonlinear, the cross covariance function and the covariance of

h are derived from a first-order-linearized version of the equation. Even if the log

transformation of T is adopted, the nonlinear nature between f (mean-removed Ln[T]) and

h still remains. The linearized relations, based on small perturbation theory, are valid

only if the unconditional variance of f is less than 1.0. Inconsistent transmissivity and

head fields may occur as a result of using a linear relation between T and h.

In this dissertation, Artificial Neural Networks (ANN) are investigated as a means

for delineating aquifer heterogeneity. Unlike ICS, this new computational tool does not

rely on a prescribed relation, but seeks its own. Neural Networks are able to learn

arbitrary non-linear input-output mapping directly from training data and have the very

advantageous property of generalization.


For this study, a random field generator was used to generate transmissivity fields

from known geostatistical parameters. The corresponding head fields were obtained using

the governing flow equation. Both T and h at sampled locations were used as input

vectors for two different back-propagation neural networks designed for this research.

The corresponding values of transmissivities at unsampled locations (unknown),

constituting the output vector, were estimated by the neural networks. Results from the

ANN were compared to those obtained from the ICS approach for different degrees of

heterogeneity. The degree of heterogeneity was quantified using the variance of the

transmissivity field, where values of 1.0, 2.0, and 5.0 were used. It was found that ANN

overcomes the limitations of ICS at high variances. Thus, ANN was better able to

accurately map the highly heterogeneous fields using limited sample points.


1. INTRODUCTION

1.1 Background

The movement of water through natural earth materials is governed not only by the

distribution of inflow in time and space, but also by the nature of the material through

which the water flows. Atmospheric variations and geologic processes, over the long term,

create earth materials that have highly variable hydraulic properties. Such variations in

subsurface conditions are reflected in typical subsurface hydrologic observations.

The variation in hydraulic conductivity is often quantified in terms of standard

deviation of the logarithm of the hydraulic conductivity. Data compilations for many

different types of aquifers and soils [Freeze (1975); Delhomme (1979); Harr (1987)]

indicate that the standard deviation of the natural log of hydraulic conductivity may range

from 0.4 to 4. Similar variation is observed for the depth-averaged transmissivity

[Delhomme (1979)]. These ranges of variation indicate that the hydraulic properties of

many aquifers and soils will vary over several orders of magnitude, even in a given

aquifer or soil type. This suggests that any scheme used to quantify the variability in time

and/or space in a subsurface-flow system must consider the scale of variations in the

earth materials. The scale of variability of hydraulic conductivity may be as low as a

fraction of a meter, and time variations over periods of hours or days may be significant in

many aquifers and soils. Most hydrologic investigations deal with applications that

require predictions of the quantity or quality of water over scales that are much larger

than the scale of variability discussed.
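As a quick sanity check on these magnitudes (an illustration added here, not part of the original text): a ±2σ interval of a lognormal conductivity spans 4σ in natural-log units, i.e. 4σ/ln(10) orders of magnitude.

```python
import math

def decades_spanned(sigma_ln_k, k_sigma=2.0):
    """Orders of magnitude of K covered by a +/- k_sigma interval of ln K."""
    return 2.0 * k_sigma * sigma_ln_k / math.log(10.0)

low = decades_spanned(0.4)   # mildly variable media: ~0.7 decades
high = decades_spanned(4.0)  # strongly heterogeneous media: ~7 decades
```

For the cited range of 0.4 to 4 this gives roughly 0.7 to 7 orders of magnitude, consistent with the statement that hydraulic properties can vary over several orders of magnitude even within one aquifer or soil type.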


In the last two decades, research efforts have focused on developing general

quantitative concepts to describe the spatial variability (heterogeneity) of subsurface

porous media. Much of this research applied a geostatistical approach in a stochastic

modeling framework. The overall goal of stochastic modeling in subsurface hydrology

is to develop methods that can be used to quantify large-scale flow and transport in

complex, naturally variable subsurface flow systems. In this sense, attention is

devoted to the dominant large-scale effects that can be reflected in a solution for the

mean behavior. In many cases, it is found that the mean behavior is very similar to the

classical deterministic descriptions.

Yeh (1992) and Yeh and McCord (1994) provided an overview of several

stochastic approaches developed in the last few years for modeling water flow and solute

transport in heterogeneous aquifers; they classified them into two main categories:

homogeneous or effective parameters and heterogeneous approaches. Most of these

models are known to be valid only if the spatial heterogeneity of the soil is moderate and

are limited to relatively simplified analytical models, which are discussed below.

The effective parameter approach assumes that the heterogeneous geologic

formation can be homogenized to obtain effective parameters with which one can predict

the ensemble behavior of the flow and transport processes. Ensemble is defined as a

collection of all possible values of the variation of a random variable in a stochastic

process. Examples of such studies include those by [Gelhar and Axness (1983), and

Dagan (1988)] for saturated porous media, and those by [Yeh et al. (1985a,b,c);

Mantoglou and Gelhar (1987a,b,c); Russo and Dagan (1991a); and Russo (1993 a,b)] for


unsaturated media. Although these studies have contributed to a better understanding of

flow and transport in heterogeneous aquifers, the major drawback of these approaches is

that they predict only the ensemble behavior of the aquifer, which can be quite different

from that of a particular realization (single possible random field) encountered in reality.

Theoretically, the ensemble behavior approaches the behavior of a single realization after

flow encounters heterogeneity of different scales over a large portion of the aquifer.

Such a theory appears valid for mildly heterogeneous aquifers at certain scales

[Sudicky (1986); and Garabedian et al. (1991)]. As the process grows and encounters

heterogeneities of different scales, the validity of this theory is questionable. Therefore,

for field applications, the predictive ability of the equivalent homogeneous approach is

considered limited and the prediction based on the equivalent homogeneous approach

involves large uncertainties.

The heterogeneous approach is designed to consider the nature of spatial

variability of hydrologic properties of the aquifer with a limited amount of data. Methods

in this approach generally consist of geostatistics, Monte Carlo simulation, and

conditional simulation. Geostatistics is a mathematical interpolation and extrapolation

tool, which uses the spatial statistics of the data set to estimate the property at unsampled

locations. The cokriging approach is computationally economic and is often considered a

more practical and realistic approach than the effective parameter approach. Although

hydraulic head and transmissivity fields derived from cokriging have been found to be

reasonable, there is no guarantee these estimates satisfy the principle of conservation of

mass [Harter and Yeh (1993); and Yeh et al. (1995)]. The fact that a kriged/cokriged map


is inevitably smoother than the true map cannot be ignored. Conditional simulation,

which is a generalization of kriging or cokriging, provides a solution to this problem

[Matheron (1973); and Delhomme (1979)].

Monte Carlo simulation is the most intuitive approach for dealing with spatial

variability in a stochastic sense. Although it belongs to the heterogeneous approach since

hydraulic property at every point in the aquifer is specified, it is, in principle, equivalent

to the effective parameter approach. Both Monte Carlo simulation and the effective

parameter approach derive the mean and variance of the hydraulic head, but Monte Carlo

simulation requires fewer assumptions, and it can predict the shape of the frequency

distribution of the output variables. The principle of Monte Carlo simulation is straightforward. First,

it assumes that the probability distribution of the parameter and its covariance function

are either known or can be derived from available field data. Using some mathematical

techniques, it generates many possible realizations of the hydraulic conductivity field that

conform to the assumed probability distribution and the covariance function. Each

generated realization is input into the flow model from which the corresponding

hydraulic head distribution is determined. Finally, the head distributions resulting from all

realizations are used to determine the expected value, variance, covariance, and

distribution. Typical examples of studies using this approach can be found in Freeze

(1975), and Smith and Freeze (1979).
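The recipe just described — generate many conductivity realizations, run each through a flow model, then take statistics of the simulated heads — can be sketched compactly. The block below is an illustrative toy, not the dissertation's model: it uses steady one-dimensional flow through unit cells in series with independent lognormal conductivities, whereas the cited studies use spatially correlated multidimensional fields.

```python
import math
import random
import statistics

def head_profile(lnK, h_left=1.0, h_right=0.0):
    """Steady 1-D flow through unit cells in series: the flux is uniform,
    so the head drop across cell i is proportional to its resistance 1/K_i."""
    R = [math.exp(-f) for f in lnK]      # cell resistances 1/K_i
    total = sum(R)
    heads, h = [h_left], h_left
    for r in R:
        h -= (h_left - h_right) * r / total
        heads.append(h)
    return heads

# Step 1-2: many realizations of the conductivity field, each run
# through the flow model
random.seed(0)
mid_heads = []
for _ in range(500):
    lnK = [random.gauss(0.0, 1.0) for _ in range(20)]  # sigma_lnK = 1
    mid_heads.append(head_profile(lnK)[10])            # head at the midpoint

# Step 3: ensemble statistics of the simulated heads
mean_h = statistics.mean(mid_heads)
var_h = statistics.variance(mid_heads)
```

With a homogeneous field the profile is linear; heterogeneity makes the midpoint head a random variable whose mean and variance the Monte Carlo loop estimates.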

Conditional simulation is an approach that combines geostatistics and Monte

Carlo simulation. Unlike Monte Carlo simulation, it provides only a subset of all possible

realizations of the hydrologic property, which consists of the values of the properties at


sample locations and conforms to predefined spatial statistics of the hydrologic

property. In this context, realizations that do not agree with measured values at the

sampled locations are discarded. Because the conditional simulation includes the data

values at the sampled location and all possible values at the unsampled locations, the

conditional simulation is considered the most rational approach for dealing with

uncertainties in heterogeneous geologic formations [Yeh (1992) and Yeh and McCord

(1994)]. The complete theory of conditional simulation is given by Matheron (1973) and

Journel and Huijbregts (1978).
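The standard construction behind this combination (as formalized in the conditional-simulation literature cited above; the notation below is illustrative) superimposes a kriged correction on an unconditional realization:

```latex
% f_cs : conditional realization       \hat{f}_k  : field kriged from the data
% f_u  : unconditional realization     \hat{f}_{uk}: field kriged from f_u
%        with the same covariance                    sampled at the data locations
f_{cs}(x) \;=\; \hat{f}_{k}(x) \;+\; \bigl[\, f_{u}(x) - \hat{f}_{uk}(x) \,\bigr]
```

Because kriging reproduces its data exactly, the bracketed correction vanishes at every sample location, so the conditional realization honors the measurements while restoring realistic small-scale variability between them.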

A great challenge facing hydrologists today is posed by the inverse problem of

estimating aquifer hydraulic properties (e.g., transmissivity (T)) from scattered

measurements of these properties and hydraulic head (φ). The difficulties with the inverse

problem are associated with issues of identifiability, uniqueness, and stability, as

discussed by W. Yeh (1986); and Carrera and Neuman (1986b). The problem is further

complicated by the consensus recognition of the inherent heterogeneities of an aquifer's

hydraulic properties, the nonlinear relationships between T and φ, and the fact that

observations of T and φ are usually limited.

Various methods have been developed to solve the inverse problem given scattered

head and conductivity or transmissivity measurements. One popular method among these

is the minimum-output-error-based approach [Yeh and Tauxe (1971); Gavalas et al.

(1976); Willis and Yeh (1987); Cooley (1982); Neuman and Yakowitz (1979); Neuman

(1980); Clifton and Neuman (1982); and Carrera and Neuman (1986a,b)]. A shortcoming

of this approach is that the identity of the estimate is often undefined. In other words, it is


unclear what the transmissivity and head fields derived from these methods represent in

the case where only scattered head and transmissivity measurements are given. Being

unable to ascertain their identities, this approach suffers from the same difficulty as any

manual model calibration approach [e.g., Yeh and Mock (1996)], because the uncertainty

associated with the output cannot be addressed.

The geostatistical approaches, [Kitanidis and Vomvoris (1983); Hoeksema and

Kitanidis (1984); Dagan (1985a); Rubin and Dagan (1987); and Gutjahr and Wilson

(1989)], have received increasing attention recently. The geostatistical approaches to the

inverse problem rely on the use of kriging/cokriging estimation techniques. For example,

kriging is defined as the best linear unbiased predictor, which provides estimates of

the property at unsampled locations but retains the sample values at locations where the

values are known, i.e., kriging is a special type of conditional expectation. In the case of

simulation of flow in large-scale aquifers where only limited amounts of hydraulic

conductivity data are collected, geostatistics is generally found to be useful [Gutjahr

(1981) and Clifton and Neuman (1982)].
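A minimal numerical sketch of this "exact interpolator" property, under assumptions not taken from the dissertation: a zero-mean field, a simple exponential covariance, and invented one-dimensional sample data (simple kriging, solved with a small Gaussian-elimination helper):

```python
import math

def cov(d, sill=1.0, length=2.0):
    """Isotropic exponential covariance C(d) = sill * exp(-|d|/length)."""
    return sill * math.exp(-abs(d) / length)

def solve(A, b):
    """Gaussian elimination with partial pivoting for the kriging system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def simple_krige(xs, vals, x0):
    """Simple kriging of a zero-mean field: estimate and error variance at x0."""
    C = [[cov(xi - xj) for xj in xs] for xi in xs]   # data-data covariances
    c0 = [cov(xi - x0) for xi in xs]                 # data-target covariances
    w = solve(C, c0)                                 # kriging weights
    est = sum(wi * vi for wi, vi in zip(w, vals))
    var = cov(0.0) - sum(wi * ci for wi, ci in zip(w, c0))
    return est, var

xs, vals = [0.0, 3.0, 7.0], [0.8, -0.4, 0.3]
est, var = simple_krige(xs, vals, 5.0)   # estimate between samples
exact, v0 = simple_krige(xs, vals, 3.0)  # at a sample: returns the datum itself
```

At a sampled location the weight vector collapses onto that datum, so the estimate equals the measurement and the kriging variance is zero; away from the data, the estimate is a smoothed combination with positive error variance.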

Similarly, in groundwater hydrology, cokriging using head and transmissivity

values at sample locations has been used to estimate the unknown transmissivity and/or

hydraulic head values at other locations, [Ahmed and de Marsily (1993); Kitanidis and

Vomvoris (1983); Hoeksema and Kitanidis (1984); Gutjahr and Wilson (1989); and

Hoeksema and Kitanidis (1989)]. The resultant hydraulic head and transmissivity fields

are then used to simulate the solute transport. Cokriging is based explicitly on the

statistical characterization of the spatial variability of aquifer log transmissivity, Ln[T].


The idea is to take advantage of the spatial continuity of the Ln[T] field implied by a covariance function or variogram, and to make use of the linearized relationship between Ln[T] and φ implied by the stochastic flow equation. In cokriging, the unknown f (mean-removed Ln[T]) value at a point of interest is estimated by a weighted linear combination of the observed f and h (mean-removed φ). The weights are determined by requiring that the estimator be unbiased and have minimum variance. By casting the problem in a probabilistic framework, Dagan (1982, 1985) and Rubin and Dagan (1987) showed that when the random transmissivity f and head h fields are jointly Gaussian (or multivariate normal), with known mean and covariance, the cokriging estimate and cokriging covariance are equivalent to the conditional mean and conditional covariance of the new joint probability distribution function conditioned on the measurements.

Classical cokriging is a linear predictor. In addition, the cross-covariance function between f and h and the covariance of h required in cokriging are derived from a first-order linearized version of the governing flow equation [Mizell et al. (1982); Kitanidis and Vomvoris (1983); Hoeksema and Kitanidis (1984, 1989)], while the relation between T and φ is nonlinear. Even if the log transformation of T is adopted, the nonlinear nature of the relation between f and h still remains. The linearized relations, based on small perturbation theory, are valid only if the unconditional variance of f is less than 1.0. The nonlinearity in the flow equation implies that in general h will not be normal, and f and h will not be jointly normal, even if f is normal. As a result, the use of classical geostatistical techniques is not justified.


A major problem with the classical geostatistical approaches is that they often produce inconsistent T and φ estimates [Yeh et al. (1993b), (1995a)]. If cokriging/co-conditional simulation is used to obtain both Ln[T] and φ without solving the flow equation, the resulting velocity field is prone to serious mass balance errors, especially for highly heterogeneous aquifers or under highly non-uniform flow conditions. This is clearly undesirable in transport modeling. On the other hand, if cokriging/co-conditional simulation is employed to obtain Ln[T], and the flow equation is solved for the head and velocity fields, the resulting numerical solution of φ is not guaranteed to be, at least approximately, consistent with measured values of φ at measurement locations [Harter and Yeh (1994)].

Carrera and Glorioso (1991) made a comparison between the classical cokriging approach and an iterative statistical inverse approach. They concluded that the basic hypotheses are similar for the two formulations and that the main differences stem from the linearization, performed about the estimated mean in cokriging methods and around the estimated Ln[T] in the iterative statistical approach. As a result, the latter is less constrained by linearity than the former and leads to better estimates and more consistent estimation covariance matrices. However, the identity of the estimate remains unknown.

Yeh et al. (1995a) developed an iterative cokriging-like approach that combines cokriging and a numerical flow model to estimate transmissivity based on observed transmissivities and hydraulic heads in the saturated aquifer. The method suffers from the drawback that the iterative process often diverges for large domains or when the number


of head measurements is small. An alternative iterative co-conditional simulation approach was suggested by Gutjahr et al. (1993, 1995). This iterative approach is more computationally efficient than the one proposed by Yeh et al. (1993, 1995). However, it does not guarantee that the iterative conditional head values will converge to the measured head values at sampling locations.

In light of the above discussion on geostatistical and stochastic approaches, one can say that in many real-world problems, precise solutions do not exist. In these cases, acquiring knowledge by example may be the best alternative. In other words, if it is not possible to describe the logic behind a problem or predict behavior with analytical or numerical solutions to governing equations, traditional predictive analysis will be difficult.

As an alternative solution, the Artificial Neural Network (ANN) will be investigated in this dissertation. This new tool does not rely on a prescribed relation, but seeks its own, and may be superior to the traditional predictive tools. Previous research has shown that the iterative conditional simulation (ICS) outperforms cokriging [Yeh et al. (1996) and Hanna (1995)]. In this research, the predictive capabilities of the neural network will be compared to the ICS approach.

Early work in the ANN technology was done by Rosenblatt (1958, 1962) on the perceptron and by Widrow and Hoff (1960) on the ADALINE. Rosenblatt demonstrated that a McCulloch and Pitts (1943) neuron could be trained to solve any linearly separable problem, i.e., one separable by a hyper-plane [Rumelhart et al., 1986], in a finite number of steps. He called this trained device a perceptron. Widrow developed the device


ADALINE (Adaptive Linear Element, originally the Adaptive Linear Neuron), which can be used as a signal-processing filter. This device has a long history of successful applications, including echo elimination, automated control, and antenna array adjustment; see Widrow and Stearns (1985) for a discussion.
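Rosenblatt's training procedure can be written in a few lines. The following is an illustrative sketch of the perceptron rule on the linearly separable AND problem; the data, zero initialization, and unit learning rate are assumptions for this example, not details from the works cited above:

```python
import numpy as np

# Illustrative sketch of Rosenblatt's perceptron rule on a linearly separable
# problem (logical AND). Data, initialization, and learning rate are assumed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)          # AND is separable by a hyper-plane

w, b = np.zeros(2), 0.0
for epoch in range(100):                         # converges in a finite number of steps
    errors = 0
    for xi, ti in zip(X, t):
        y = float(w @ xi + b > 0)                # McCulloch-Pitts threshold unit
        if y != ti:
            w += (ti - y) * xi                   # perceptron weight update
            b += (ti - y)
            errors += 1
    if errors == 0:                              # a full pass with no mistakes
        break

print([float(w @ xi + b > 0) for xi in X])       # [0.0, 0.0, 0.0, 1.0]
```

For a problem that is not linearly separable (e.g. XOR) this loop never terminates with zero errors, which is precisely the limitation discussed next.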

Minsky and Papert (1969) are often credited with (or accused of!) fostering a pessimistic outlook on the one- or two-layered perceptron networks, which precipitated a dark age for ANN that lasted until the early 1980s. Rumelhart et al. (1986) and McClelland et al. (1986) are often credited with leading the modern renaissance in ANN technology. The criticisms that Minsky and Papert directed at the perceptron neural networks were correct. However, the addition of more complexity into the networks, specifically the addition of a middle (hidden) layer to a multi-layer perceptron network, together with a clear explanation of the back-propagation learning algorithm, overcame many of the limitations of the one- or two-layered perceptron neural networks.

Since 1986, the variety of ANNs has rapidly expanded. In their now-classic work, Rumelhart et al. (1986) and McClelland et al. (1986) described four types of ANNs. One year later, Lippman (1987) reviewed six types of ANNs at the First International Conference on Neural Networks. A few months later, Hecht-Nielsen (1988) described 13 ANNs. In 1988, Simpson circulated a review of 26 types of ANNs that was later published as a book [Simpson (1990)]. Maren et al. (1990) described about 24 ANNs, of which about 12 differ from those described by Simpson (1990). Maren (1991) suggested that the rapid increase is continuing, with the number of ANNs approaching 48.


There is currently a vast array of ANN applications in the cognitive sciences, the neurosciences, engineering, computer science, and the physical sciences.

It may be helpful to think of an ANN as a nonparametric, nonlinear regression technique. In traditional regression, one must decide a priori on a model to which the data will be fitted. The ANN approach is not as restrictive, because the data will be fitted with the best combination of nonlinear or linear functions as necessary, without the researcher rigidly pre-selecting the form of these functions. Neural networks can organize data into the vital aspects or features that enable one pattern to be distinguished from another. This quality of a neural network stems from its adaptability in learning by example and leads to its ability to generalize. Generalization may be thought of as the ability to abstract, or to respond appropriately to input patterns different from those involved in the training of the network.

1.2 Objectives

The main objective of this study is to introduce an Artificial Neural Network (ANN) and explore its feasibility as a new tool in computational hydrology. This tool attempts to learn, capture, and then map conditionally generated random fields of aquifer parameter values.

The second objective is to build ANN models capable of learning effectively, by

example, the large-scale natural variability in subsurface earth materials, and to

efficiently reproduce as output the randomly generated aquifer parameter fields.


The third objective is to identify the strengths and weaknesses of the ANN method

for characterizing variability in aquifer parameter fields, and compare it to the Iterative

Conditional Simulation approach.

1.3 Contents of the Dissertation

Chapter 2 discusses spatial heterogeneity and its geostatistical representation, along

with spectral representations of random variables. In addition, the concepts of random

processes, random fields, and two-dimensional conditional and unconditional random

field generators are introduced. It presents some definitions of probability terms,

including random variables, random fields and their generations, and stochastic process

terms used in the stochastic approach. Chapter 3 presents the stochastic approach

methodology used in this study, along with a mathematical development of the stochastic

model. Chapter 4 introduces artificial neural networks, their basis, mechanism, and

necessary elements. Chapter 5 provides a mathematical development of the feed-forward back-propagation neural network and its algorithm. Chapter 6 presents applications of

both a stochastic model and a back-propagation neural network to a hypothetical aquifer.

Chapter 7 presents and discusses results of the applications. Finally, Chapter 8 presents

conclusions derived from this study, as well as recommendations for future studies.


2. SPATIAL HETEROGENEITY AND RANDOM FIELDS

2.1 Heterogeneity and Stochastic Process

Spatial heterogeneity refers to the variation of a physical property in two- or three-dimensional space. This physical variation is encountered in many earth science

applications; it is of particular interest when studying flow and transport processes in the

saturated and unsaturated zone. When examining soil media, spatial heterogeneity is

observed on many different scales such as the micro-scale of a single pore, the

intermediate scale of laboratory experiments, the scale of field experiments, and the

mega-scale, which encompasses entire regions. This work is not concerned with the

spatial heterogeneity on the micro-scale or pore scale because the governing physical

laws for porous media flow are only valid on a scale larger than the micro-scale.

Bear (1972) defined the "Representative Elementary Volume" (REV) as the smallest volume over which there is a constant "effective" proportionality factor between the flux and the total pressure gradient or total head gradient. This proportionality factor is called the hydraulic conductivity of the REV. By definition of the REV, the hydraulic conductivity does not change rapidly as the volume to which it applies is increased to sizes larger than the REV. This is based on the conceptual notion that either no heterogeneity is encountered at a scale larger than the REV or that heterogeneity occurs on distinct scales, the smallest of which is the REV [de Marsily (1986)]. The latter model assumes

that within each scale relatively homogeneous regions exist. Within these homogeneous

units heterogeneities can only be defined on a significantly smaller scale. Geologists refer


to these different scales as facies [Anderson (1991)], while hydrologists commonly speak

in terms of hydrologic units [Neuman, (1991)]. Analysis of a large number of hydrologic

and geologic data from different sites associated with different scales has shown that the

existence of discrete hierarchical scales for any particular geologic or hydrologic system

vanishes in the global view as the multitude of different geologic or hydrologic units

allows for a continuous spectrum of scales [Neuman, (1990)].

For the scale of the REV, mathematical models based on the physics of flow and

transport in homogeneous porous media have been well-established in the literature and

their accuracy has been verified in many laboratory experiments [Hillel, (1980)]. The

physical meaning of the underlying model parameters is already well understood [Jury,

(1991)].

Depending on the problem formulation, spatially heterogeneous properties can be classified as either measurable or predictable porous media properties. Measurable properties are those seen as the cause of flow and transport behavior in soils, such as pore geometry, the saturated permeability of the soil, the soil textural properties, etc. Predictable properties are usually based on physical laws or functions. This dissertation deals with spatial heterogeneity that is predictable given some knowledge of the heterogeneity of measurable properties.

Stochastic analysis is closely associated with the theory of random processes,

which is a branch of mathematics called probability theory. Probability theory itself is a

branch of mathematics called measure theory. "Probability theory and measure theory

both concentrate on functions that assign real numbers to certain sets in an abstract


space according to certain rules." [Gray and Davisson (1986), p. 27]. The treatment of spatial heterogeneity in terms of random processes is a highly abstract procedure, the appropriateness of which has been questioned. However, this treatment is justified by Athanasios Papoulis [(1984), p. xi]:

"Scientific theories deal with concepts, not with reality. All theoretical results are

derived from certain axioms by deductive logic. In physical sciences the theories

are so formulated as to correspond in some useful sense to the real world,

whatever that may mean. However, this correspondence is approximate, and the

physical justification of all theoretical conclusions must be based on some form of

inductive reasoning."

For a complete derivation of the concepts of random variables, random processes,

and stochastic differential equations there is a vast amount of literature that has been

published in this area for many different applications [see e.g. Gray and Davisson,

(1986); Papoulis, (1984); Priestley, (1981)].

2.2 Geostatistical Representation of Spatial Variability

Aquifers are inherently heterogeneous at various observation scales.

Characterizing the heterogeneity at the scale of our interest generally requires information on hydrologic properties at every point in the aquifer. Such a detailed hydraulic property

distribution in aquifers requires numerous measurements, considerable time, and great

expense, and is generally considered impractical and infeasible. The alternative is to

utilize a small number of samples to estimate the variability of parameters in a statistical


framework. That is, the spatial variation of a hydraulic property is characterized by its probability distribution estimated from samples. Law (1944) and Bennion and Griffiths (1969) reported that the distribution of porosity data in an aquifer is normal. Hoeksema and Kitanidis (1985) suggested that the spatial distribution of the storage coefficient might be log-normal. Hydraulic conductivity distributions are usually reported to be log-normal [Law (1944); Bulness (1946); Bakr (1976); de Marsily (1986); Sudicky (1986); and Jensen et al. (1987)].

Based on such a statistical approach, Freeze (1975) treated hydraulic conductivity as a random variable and analyzed the uncertainty in groundwater flow modeling. However, recent analyses of hydraulic conductivity data showed that, although the hydraulic conductivity values vary significantly in space, the variation is not entirely random, but correlated in space [Bakr (1976); Byers and Stephens (1983); Hoeksema and Kitanidis (1985); and Russo and Bouron (1992)]. Such a correlated nature implies that the parameter values are not statistically independent in space, and they must be treated as a stochastic process instead of a single random variable.

To illustrate the stochastic conceptualization of the spatial variability of hydrologic parameters, the hydraulic conductivity data measured along a vertical borehole are used as an example. The value of hydraulic conductivity at a point x_i, i = 1, 2, 3, ..., n, along the borehole can be conceptualized as one of many possible geological materials that may have been deposited at that given point. Thus, the hydraulic conductivity at that point is a random variable, K(x_i, ω). The ω indicates that there are many possible values of K at x_i. As a result, hydraulic conductivity values of the entire


depth of the borehole may be considered as a collection of many random variables in space. Each K(x_i, ω) has a probability distribution, and these distributions may be interrelated. The probability of finding a particular sequence of hydraulic conductivity values along the borehole, K(x_i, ω_k), depends not only on the probability distribution of the hydraulic conductivity at one location, but also on those at other locations. This implies that the actual hydraulic conductivity values along the borehole are one possible sequence K(x_i, ω_k) out of all the possible sequences K(x_i, ω). In the vocabulary of stochastic processes, the probability of finding that sequence is then defined as the joint probability distribution or joint distribution. All these possible sequences are called an ensemble, and a realization refers to any one of these possible sequences.

The covariance C_ij of two random variables x_i and x_j is a measure of the physical correlation between these variables and is a second moment defined as:

C_ij = E[(x_i - m_i)(x_j - m_j)]                                    (2.1)

where m_i and m_j are the means of x_i and x_j, respectively. If i = j, the covariance function C_ij = Var(x), or σ² (the variance).

The autocorrelation function ρ(ξ), where ξ represents a separation distance between variables, represents the persistence of the value of a property in space. An autocorrelation function is simply defined as the ratio of the covariance function to the variance, i.e.,

ρ(ξ) = C(ξ) / σ²                                                    (2.2)
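Equations (2.1) and (2.2) can be estimated directly from data. The short sketch below does so for a synthetic correlated series standing in for ln K values along a borehole; the AR(1) construction and its parameters are assumptions for illustration, not data from this study:

```python
import numpy as np

# Sketch of estimating equations (2.1)-(2.2) from data. The AR(1) series is a
# synthetic stand-in for ln K along a borehole; parameters are assumed.
rng = np.random.default_rng(0)
n, rho = 5000, 0.8
lnK = np.zeros(n)
for i in range(1, n):                 # correlated series with lag-1 correlation 0.8
    lnK[i] = rho * lnK[i - 1] + rng.normal(scale=np.sqrt(1 - rho**2))

def sample_autocorrelation(x, lag):
    """rho(lag) = C(lag) / sigma^2, with C(lag) estimated as in equation (2.1)."""
    m = x.mean()
    cov = np.mean((x[:len(x) - lag] - m) * (x[lag:] - m))
    return cov / x.var()

print(sample_autocorrelation(lnK, 1))  # close to 0.8 for this series
```

At lag 0 the ratio is exactly 1, since the covariance reduces to the variance.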


Generally, the value of ρ(ξ) of the hydraulic conductivity data tends to drop rapidly as ξ increases. Different autocorrelation models can represent the decline of the correlation. The one commonly used in three dimensions is an exponential decay model [Bakr et al. (1978); Gelhar and Axness (1983); and Yeh et al. (1985a,b,c)]:

ρ(ξ) = exp[ -( |ξ₁|/λ₁ + |ξ₂|/λ₂ + |ξ₃|/λ₃ ) ]                      (2.3)

where ξ is the separation vector (ξ₁, ξ₂, ξ₃), and λ₁, λ₂, and λ₃ are the integral scales (or correlation scales) in the x, y, and z directions, respectively. The integral scale is defined as the area under an autocorrelation function if the area is a positive and non-zero value [Lumley and Panofsky (1964)]. For the exponential model, the integral scale is the separation distance at which the correlation drops to the e⁻¹ level. At this level, the correlation between data points is considered insignificant. Furthermore, if the correlation scales of a random field are the same in all directions, the random field is said to be statistically isotropic. Thus, the autocorrelation function is considered a statistical measure of the spatial structure of hydrogeologic parameters.
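The exponential model of equation (2.3) and its e⁻¹ property can be sketched in a few lines; the correlation scales below are illustrative values, not those of this study:

```python
import numpy as np

# Sketch of the exponential autocorrelation model, equation (2.3). The
# correlation scales are illustrative values, not this study's.
def rho_exp(xi, lam):
    xi, lam = np.asarray(xi, dtype=float), np.asarray(lam, dtype=float)
    return float(np.exp(-np.sum(np.abs(xi) / lam)))

lam = (10.0, 10.0, 1.0)                  # anisotropic: short vertical correlation

# At a separation of one integral scale along an axis, the correlation has
# dropped to e**-1, the level below which it is considered insignificant.
print(rho_exp((10.0, 0.0, 0.0), lam))    # exp(-1) = 0.3678...
```

With the anisotropic scales above, a vertical separation of only one unit produces the same e⁻¹ drop as a horizontal separation of ten units.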


2.3 Random Processes and Random Fields

The term "random process" or "stochastic process" is mostly used if the index set is the time variable, while the term "random field" is commonly applied for index sets of spatial locations. A random process is an infinite collection of random variables, where the random variables are indexed on a discrete or continuous "index set" I, which corresponds to spatial location x in the applications of this research.

The randomness lies in the lack of knowledge, and the inability to acquire it fully, about what these porous medium properties exactly are. Soil physical or chemical properties are commonly determined either by an actual measurement of soil properties or by the intuitive, graphical, or mathematical estimation of soil properties from related data (inverse distance interpolation, kriging, etc.). Both measurement and estimation are associated with errors. The (physically deterministic) errors occurring during the measurement and/or estimation process have the properties of random variables and thus allow a rigorous analysis with statistical tools. This is the key to stochastic analysis and the bridge between reality and conceptual model. Stochastic analysis in subsurface hydrology is about modeling the limitations of our knowledge! How limited our knowledge is will in turn depend on the porous medium heterogeneity [Harter (1994)].

The probability distributions encountered in stochastic modeling are essentially a

reflection of the fuzziness or uncertainty of our knowledge about the soil properties.

Hence, the justification for treating porous media as random fields lies NOT in the

physical nature of the porous medium (which is deterministic) but in the limitation of our

knowledge ABOUT the porous medium. This is not to say however that heterogeneity is


unrelated to the statistical analysis. Indeed, the estimation error is a direct function of the

soil heterogeneity. If the porous medium is relatively homogeneous, the properties of the

soil at unmeasured locations are estimated with great certainty given a few sample data.

On the other hand, if the porous medium is very heterogeneous and soil properties are

correlated over only short distances, an estimation of the exact soil properties at

unmeasured locations is associated with large errors. Hence, the heterogeneity of the soil

is a measure of the estimation error or prediction uncertainty.

2.4 Stationarity and Ergodicity of Random Processes

The sample statistics give a quantitative estimate of the degree of heterogeneity in

the porous medium, which also is an estimate of the expected estimation error. Then two

problems need to be addressed:

1. The sample taken from measuring MANY random variables ONCE must be related to

the MANY possible outcomes of any particular ONE random variable X(x) at

location x.

2. The sample statistics must be related to the ensemble statistics of the random field.

These two points are crucial to the stochastic analysis and in particular the first

one must not be underestimated [Harter (1994)]. Recall that a random field consists of an

infinite number of random variables, each of which has its own marginal pdf. The

random variables in a random field need not have identical probability distributions.

Estimates of soil properties that are conditioned on field data are indeed always random fields with random variables whose probability distribution function is a function


of the location in space. This is because the uncertainty about field properties may vary from location to location, depending on the closeness to a measurement point. In order to determine the probability of occurrence of a particular sequence of random variables, a joint distribution of these random variables, e.g. hydraulic conductivity K(x_i, ω), must be known. Obviously, the joint distribution is not available in real-life situations, because K(x_i, ω_k) values sampled along a borehole represent only one realization out of the ensemble K(x_i, ω). Therefore, one must resort to simplified assumptions, namely, stationarity and ergodicity.

Translating field measurements, a "sample", into statistical parameters defining

random variables is a practical problem. This leads to the problem of deriving

"ensemble" statistical parameters of random fields from a small sample that gives one

measurement of each of an infinite number of random variables. "Sample" statistical parameters and a "sample probability distribution" or histogram of the measured random field parameters can be computed, giving a quantitative estimate of the degree of heterogeneity in the porous medium, which is also an estimate of the expected estimation error.

Only a single realization of the random field is available, since all regional and

sub-regional geologic and other environmental phenomena are unique and do not repeat

themselves elsewhere. Realizations (samples) of random fields are a basic element of the

numerical stochastic analysis. The realizations are often referred to as random fields. In numerical applications, random fields are always discretized in a finite domain.


To simplify the problem, it is assumed that the marginal probability distribution

function of each random variable is identical at every location in the random field. This

implies that the mean, the variance, and the other moments of the probability distribution

are identical for every location in the random field. This property is called "stationarity"

or "strict stationarity". In this study, a weaker form of stationarity is assumed: "second

order stationarity" or "weak stationarity", which requires that only the mean and

covariance are identical everywhere in the random field:

μ_X(x) = μ    for all x                                             (2.4)

Cov_X[X(x), X(x + ξ)] = Cov_X(ξ)                                    (2.5)

The existence of stationarity in porous medium properties cannot be proven rigorously at any single field site. Data are often sparsely distributed. In the best of cases, a linear or higher-order trend can reasonably be removed from the data. For all practical purposes, it is therefore convenient to hypothesize that the field site is a realization of a weakly stationary random field (after removing an obvious trend). This is a reasonable assumption in many field applications. Once this working hypothesis is postulated, the sample of measurements at different locations is treated as if it were a sample of several realizations of the same random variable (i.e. at the same location).
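The detrending step mentioned above can be sketched with a least-squares fit; the data below are synthetic, invented purely for illustration:

```python
import numpy as np

# Sketch of detrending before assuming weak stationarity: fit and remove a
# linear trend, then treat the residual as the stationary field. Synthetic data.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 100.0, 201)                     # measurement locations
lnT = 0.02 * x + rng.normal(scale=0.5, size=x.size)  # obvious trend + variation

slope, intercept = np.polyfit(x, lnT, 1)             # fitted linear trend
residual = lnT - (slope * x + intercept)             # detrended field

print(abs(residual.mean()) < 1e-10)                  # True: residuals are centered
```

The residual field, not lnT itself, is then the quantity whose mean and covariance are assumed identical everywhere, as in equations (2.4) and (2.5).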

In order to solve the problem of relating sample statistics to ensemble parameters, it is necessary that the sample statistics taken from a single realization indeed converge to the ensemble statistics of the random variables as the number of samples increases. A random field or random process that satisfies this property is called "mean ergodic".

Ergodicity means that by observing the spatial variation of a single realization of a


stochastic process, it is possible to determine the statistical property of the process for all

realizations.

Like stationarity, mean ergodicity cannot be measured in a single field site, i.e., a single realization of the hypothetical random field. Hence, mean ergodicity is taken as a

working hypothesis, i.e. it is assumed a priori that the measured sample statistics

converge in the mean square to the true ensemble parameters as the number of samples

increases. Ergodic processes need not be stationary, and similarly stationary random

fields need not be ergodic. The difference between the sample statistical parameters of X

and its ensemble moments is generally referred to as parameter estimation error and will

subsequently be neglected.
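Mean ergodicity can be illustrated numerically: for a stationary, ergodic surrogate field, the spatial average over one long realization approaches the ensemble mean. The AR(1) construction and its parameters below are assumptions for the sketch, not this study's model:

```python
import numpy as np

# Illustrative check of mean ergodicity on a stationary AR(1) surrogate for a
# one-dimensional random field with ensemble mean mu = 0 (parameters assumed).
rng = np.random.default_rng(1)
n, rho, mu = 100_000, 0.9, 0.0
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = mu + rho * (x[i - 1] - mu) + rng.normal(scale=np.sqrt(1 - rho**2))

print(abs(x.mean() - mu))   # small; shrinks as the realization grows longer
```

The stronger the spatial correlation, the slower this convergence, which is why short records from highly correlated fields yield poor ensemble-parameter estimates.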

2.5 Random Field Generators (RFG)

To carry out the conditional simulations, one needs to be able to generate random

fields with prescribed covariance and cross-covariance functions. There are several

techniques to generate these fields.

Harter [1992, 1994] did a comprehensive review of the different techniques to generate unconditional and conditional random fields. The generation of spatially correlated samples of random fields plays a fundamental role in the numerical analysis of stochastic functions. The purpose of random field simulation is to create numerical samples, or "realizations", of stochastic processes with well-defined properties. The simplest and most commonly available form of simulation is the random number generator on a calculator or computer. These readily accessible simulators


generate independent, uniformly distributed random numbers [i.e. samples of a single random variable with a uniform, univariate distribution; Press et al. (1992)].

A particular challenge arises when the random variables are dependent, i.e., spatially correlated and defined through a joint or multivariate distribution. Not only do the generated random fields have to converge (in the mean square) to the desired ensemble mean and variance, and any higher-order moments if appropriate, but they also have to converge (in the mean square) to the desired correlation structure as the number of samples increases. The purpose of a random field generator is to transform an orthogonal realization, consisting of independently generated random numbers with a prescribed univariate distribution, into a correlated random field with the desired joint probability distribution. If the distribution is Gaussian, the joint pdf is expressed by its first two moments, the mean and the covariance.

In practice, the joint probability distribution function is often inferred from field

data obtained at the site of interest. The joint probability distribution is commonly

described by invoking the ergodicity and stationarity hypotheses discussed before, and by

taking the sample mean and the sample covariance function as the moments of the

underlying multivariate pdf. The simulations must be conditioned on the information

known at the measurement points. This amounts to the generation of random variables

with a conditional joint probability distribution function. Tompson et al. (1989) and

Harter (1994) have assessed the numerical efficiency of several popular RFG's.


The most commonly used methods for generating spatially correlated random fields

are:

1. Spectral representation and the fast Fourier transform (FFT);

2. The turning bands method (TB);

3. The matrix LU decomposition method; and

4. The sequential Gaussian simulation (SGS) method.

The first three RFG methods use finite linear combinations of uncorrelated random

variables; thus, the common distribution of these random variables should be one that is

preserved under finite linear combinations [Myers (1989)]. The Gaussian distribution is

frequently used, since it is the only distribution with this property; it is also the simplest,

requiring only the first two moments. The fourth RFG method features a rich

family of spatial structures not limited to the Gaussian distribution. Each of the above

RFG methods can serve as a basis for generating both unconditional and conditional

simulations.

In this study the spectral approach utilizing Fast Fourier Transform (FFT) is used

due primarily to its computational speed advantage and ease of incorporating an existing

subroutine into the algorithm.


2.5.1 Spectral Representation of Random Variables

In the analysis of random processes (time series), "spectral analysis" has been an

important tool for many different tasks, and is a well-established field of probability

theory [cf. Priestley (1981)]. Recently, spectral analysis has also become important for

the study of spatially variable processes (random fields). Gelhar et al. (1974) introduced

spectral analysis to study groundwater systems and it has been applied to a great variety

of subsurface hydrologic problems [Bakr et al. (1978); Gutjahr et al. (1978); Gelhar and

Axness (1983); Yeh et al. (1985a,b); and Li and McLaughlin. (1991)].

Spectral analysis represents a single realization of a random process or of a

random field as a superposition of many (even infinitely many) different (n-dimensional)

sine-cosine waves, each of which has a different amplitude and frequency. Then any

particular realization can be expressed either in terms of a spatial function or in terms of

the frequencies and amplitudes of the sine-cosine waves (called the 'Fourier series' of a

discrete process and the 'Fourier transform' of a continuous process). The latter

are collectively called the "spectral representation" of the random field. The spectral

representation of a single random field realization can intuitively be understood as a field

of amplitudes, where the coordinates are the frequencies of the sine-cosine waves. In

other words, instead of an actual value for each location in space, the spectral

representation gives an amplitude for each possible frequency (wavelength). Note that in n-

dimensional space, n ≤ 3, sine-cosine waves are defined by n-dimensional frequencies

(with one component for each spatial direction) and therefore the spectral representation

of a n-dimensional random field is also n-dimensional.


Each realization of a random field has its own spectral representation. But

obviously the amplitudes of the sine-cosine waves can be defined as random variables

with the frequency domain as the index field. The problem then becomes one of dealing

with the probability space of the spectral representation, which in turn is also a random

field, but defined in the frequency domain (i.e. the probability space of the spatial random field is

mapped onto the probability space of the spectral random field).

There are many advantages to representing a random field in terms of its

underlying spectral properties, (i.e., in terms of the probability distribution of amplitudes

and frequencies of "waves"). Among these, there are two of particular importance. The

first is that the spectral representation of a spatially correlated random field is a random

field with uncorrelated random variables, making the analysis much easier than that of

correlated random variables which require a multivariate joint distribution function. The

second is that the spectral transformation of a partial differential equation is a polynomial

equation, whose solution is much easier to find than that of the partial differential equation

in the spatial domain.
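This second advantage can be illustrated numerically: under a discrete Fourier transform, differentiation becomes multiplication by ik, so a differential operator reduces to an algebraic one. A minimal sketch (the grid size and test function are arbitrary choices, not from this study):

```python
import numpy as np

# Differentiate a periodic function spectrally: transform, multiply the
# Fourier coefficients by ik, and transform back.
n = 256
L = 2 * np.pi
x = np.arange(n) * L / n
u = np.sin(3 * x)

k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)      # angular frequencies
du = np.fft.ifft(1j * k * np.fft.fft(u)).real   # spectral derivative

# matches the analytic derivative 3 cos(3x) to machine precision
assert np.allclose(du, 3 * np.cos(3 * x), atol=1e-10)
```

The same idea, applied to the perturbed flow equation later in Chapter 3, turns the Laplacian into multiplication by -(k1²+k2²).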

The spectral representation of a single realization X(x) of a random field with mean

zero is formally defined in terms of the Fourier-Stieltjes integral, [Wiener (1930)]:

X(x) = \int_{-\infty}^{\infty} e^{i k \cdot x}\, dZ(k) \qquad (2.6)


where the integral is n-dimensional, and Z(k) is a complex-valued function,

called the Fourier-Stieltjes transform of X(x). The Fourier-Stieltjes integral must be

chosen over the more common Fourier-Riemann integral:

f(x) = \int_{-\infty}^{\infty} e^{i k \cdot x}\, g(k)\, dk \qquad (2.7)

where g(k) is the Fourier transform of f(x), because the Fourier-Stieltjes transform Z(k) of

the random field X(x) is generally not differentiable, so one cannot write dZ(k) = z(k) dk. Z(k) can

be understood as an integrated measure of the amplitudes of the frequencies between

(-∞, k) contributing to the realization X(x). The new probability space of the random

variables Z(k) in (2.6) has several important properties:

1. \; E[dZ(k)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i k \cdot x}\, E[X(x)]\, dx

2. \; E\!\left[\,|dZ(k)|^2\right] = S(k)\, dk \qquad (2.8)

3. \; E[dZ(k_1)\, dZ^*(k_2)] = 0 \quad \text{for all } k_1 \neq k_2

The first property states that the mean E[dZ(k)] of the random variables dZ(k) is

equal to the Fourier transform of the mean of the random variables X(x). In this study,

only zero-mean random processes are considered; hence the spectral representations are

also of zero mean. The second property defines the variance of the random variable

dZ(k) as S(k) dk. The term S(k) dk is a measure of the average 'energy per unit area'

or 'power' contribution of the amplitude of a frequency to the random field X(x). S(k) is

called the "spectral density" or "spectrum" of the random field X, which depends purely


on the probabilistic properties of the random field X(x). The third property states that the

increments dZ(k₁) and dZ(k₂) at two different frequencies k₁ and k₂ are uncorrelated;

dZ(k) is called an 'orthogonal' random field. Through (2.8), the first two moments of the

random field dZ(k) are defined solely in terms of the first two moments of the stationary

random field X(x). Hence, if the first two moments of the random field X(x) are known,

then the first two moments of the spectral representation dZ(k) are known. Note that the

spectral representation dZ(k) of a weakly stationary random field X(x) is only stationary

to first order: the mean E[dZ(k)] is constant (first property), but the variance S(k) of the

random field dZ(k) is a function of the location k in the frequency domain (second

property).

In summary, a new probability space, called the spectral representation of a random

field, was defined on the known probability space of a random field. The mapping of a

stationary correlated random field X into its spectral representation dZ provides the

important advantage of creating an equivalent dZ to the random field X that consist of

orthogonal or uncorrelated random variables.

2.5.1.1 Unconditional two-dimensional Random Field Generation

The spectral representation dZ of a correlated random field variable (RFV) X is

itself a RFV of independent random variables with a variance defined by the spectral

density function of X, S(k) dk. It can be shown that the spectral density S(k) is simply

the Fourier transform of the covariance C(ζ) of X, where ζ is the separation distance.


Hence, if random, zero-mean dZ(k) are generated with a variance S(k) dk, then their

inverse Fourier transform yields a correlated random field X(x) with zero mean

and the desired covariance function by virtue of (2.3). RFG's based on Fourier transforms

were first introduced by Shinozuka [1972, 1991]. Gutjahr (1989) describes a two-

dimensional random field generator based on a fast Fourier transform algorithm, which

has been adopted in my study.

In this study, as applied before by Harter (1994), realizations are generated on a

rectangular domain defined over a regular grid centered around the origin, with grid

points being Δx = (Δx₁, Δx₂)ᵀ apart. The size of the domain is defined by MΔx such that

the rectangle spans the area between -MΔx and (M-1)Δx, and the number of grid points

in the random field is 2M by 2M. Since the spectral representation of a stationary random

field is only defined for an infinite domain, it is further assumed that the random field is

periodic with period 2M in both dimensions. This is a necessary assumption for the

formal derivation of its spectral representation, so that the analysis of an infinite process

can be used for the generation of a finite random field. Also, periodic functions are

known to have a discrete rather than a continuous spectrum. Hence, dZ(k) exists only for

discrete k, for which it can be generated such that E[dZ(k)] = 0 and E[|dZ(k)|²] = S(k) dk.

The discretization of X(x) limits the wavelengths "seen" by the discrete random

field to all those that are at least of length 2Δx, i.e. to all (angular) frequencies k ≤

2π/(2Δx). Higher frequencies cannot be distinguished from frequencies within this limit,

an effect referred to as "aliasing". In other words, heterogeneities on a scale smaller than


the discretization Δx are not resolved by the random field. Similarly, the longest possible

wavelength "seen" by a finite random field is less than or equal to 2MΔx, i.e., the lowest

(angular) frequency is Δk = 2π/(2MΔx), and all other frequencies k must be multiples of

Δk. Hence, the spectral representation dZ(k) of a finite, discrete random field X(x) with

(2M)² grid points in 2-D space is also a finite, discrete random field defined on a (2M)²

grid in the 2-D frequency domain. For discrete dZ(k), the Fourier-Stieltjes integral (2.6)

becomes a Fourier series such that:

X(x) = \sum_{m_1=-M}^{M-1} \sum_{m_2=-M}^{M-1} z(k_m)\, e^{i k_m \cdot x}, \qquad k_m = (m_1 \Delta k,\, m_2 \Delta k) \qquad (2.9)

where z(k) are (complex-valued) random Fourier coefficients with the same

properties as dZ(k) in (2.8), namely zero mean, a variance S(k_m)Δk, and all z(k)

mutually independent for k_i ≠ k_j. To ensure that X(x) is a real-valued random field, the z(k)

field must be constructed such that:

z(-k) = z^*(k) \qquad (2.10)

(i.e., random numbers z(k) need only be generated for one half of the

rectangle). The asterisk * stands for complex conjugate. Complex-valued, Gaussian-

distributed z(k) for discrete k_j, j = 1, ..., (2M)²/2, are obtained by generating two

independent Gaussian random numbers α_j and β_j for each k_j, each with zero

mean and variance of ½. For one half of the random field:


z(k_j) = \left[S(k_j)\,\Delta k\right]^{1/2} (\alpha_j + i\,\beta_j) \qquad (2.11)

The other half is obtained through the symmetry relation (2.10). Gutjahr et al.

(1989) showed that the above construction of z(k) satisfies the required properties. The

correlated random field X(x) is obtained by performing the Fourier summation (2.9). The

double summation in (2.9) is most efficiently done by a numerical Fourier transform

technique called the "Fast Fourier transform", or simply FFT, Brigham (1988). FFT

algorithms can be found in many computer libraries, e.g. IBM (1993) and Press et al.

(1992). Most available FFT algorithms are written using the frequency u as argument

instead of the angular frequency k, where k = 2πu. Fourier transform pairs can be defined

as follows:

C(\zeta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i k \cdot \zeta}\, S(k)\, dk

S(k) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i k \cdot \zeta}\, C(\zeta)\, d\zeta \qquad (2.12)

X(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i k \cdot x}\, dZ(k)

Changing the variables of integration from k to u, we can get similar pairs as in

Equations (2.12) [Harter (1994)]. Typical FFT algorithms require the summation in (2.9)

to be over the interval [0, 2M-1] rather than over the interval [-M, M-1]. Using the

periodicity assumption, z(mΔk), m > M-1, are obtained from:


z(mΔk) = z[(m - 2M)Δk], \quad \text{for all } m > M-1

Recalling that Δk = 2π/(2MΔx), this leads to the following construction of the

correlated random field X(x), with entries X(n₁Δx₁, n₂Δx₂), 0 ≤ n₁, n₂ ≤ 2M-1:

X(n_1 \Delta x_1, n_2 \Delta x_2) = \sum_{m_1=0}^{2M-1} \sum_{m_2=0}^{2M-1} z(m_1 \Delta k_1, m_2 \Delta k_2)\, \exp\!\left[2\pi i \left(\frac{m_1 n_1 + m_2 n_2}{2M}\right)\right] \qquad (2.13)

where

z(m_1 \Delta k_1, m_2 \Delta k_2) = \left[S\!\left(\frac{2\pi m_1}{2M\Delta x_1}, \frac{2\pi m_2}{2M\Delta x_2}\right) \Delta k_1\, \Delta k_2\right]^{1/2} (\alpha + i\beta), \qquad \Delta k_i = \frac{2\pi}{2M\Delta x_i} \qquad (2.14)

with α and β being zero-mean, independent, Gaussian-distributed random

numbers of known variance. In this study, random fields are generated using equation

(2.13) with the FOURN subroutine to perform the FFT and with the GAUSDEV and RAN2

subroutines to generate the random numbers α and β. The original implementation of this

RFG was provided by Gutjahr (1989) and has been applied by Harter (1994).
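The construction above can be sketched compactly with numpy's FFT. The sketch below follows (2.13)-(2.14) in spirit, but it is an illustrative assumption throughout: a Gaussian-shaped spectral density stands in for the spectra used in this study, the function name and parameters are invented for the example, and taking the real part of the inverse FFT (with a √2 variance correction) is used as a shortcut for enforcing the symmetry condition (2.10) instead of the FOURN/GAUSDEV/RAN2 implementation.

```python
import numpy as np

def spectral_rfg(M, dx, sigma2, lam, seed=0):
    """Generate a 2M-by-2M correlated Gaussian random field by FFT (sketch)."""
    rng = np.random.default_rng(seed)
    n = 2 * M
    dk = 2 * np.pi / (n * dx)                    # frequency spacing, eq. (2.14)
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)      # discrete angular frequencies
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    S = np.exp(-(K1**2 + K2**2) * lam**2 / 4.0)  # assumed spectral density shape
    S *= sigma2 / (S.sum() * dk * dk)            # normalize: sum(S dk^2) = sigma2
    # z(k) = [S(k) dk^2]^(1/2) (alpha + i beta), alpha, beta ~ N(0, 1/2)
    alpha = rng.normal(0.0, np.sqrt(0.5), (n, n))
    beta = rng.normal(0.0, np.sqrt(0.5), (n, n))
    z = np.sqrt(S * dk * dk) * (alpha + 1j * beta)
    # inverse FFT performs the Fourier summation of (2.13); the real part
    # (times sqrt(2) to restore the variance) yields a real-valued field
    X = np.fft.ifft2(z) * n * n
    return np.sqrt(2.0) * X.real

field = spectral_rfg(M=64, dx=1.0, sigma2=1.0, lam=2.0, seed=1)
```

When the correlation length is small relative to the domain, the sample mean and variance of one realization should lie close to the prescribed zero mean and σ².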

2.5.1.2 Conditional two-dimensional Random Fields

Assume an array of measurements X₁ = {X₁, ..., X_N} is available, and a two-

dimensional conditional random field is generated such that at the locations {x₁, ..., x_N}

the measured values of the random variables X₁ are reproduced, and such that at all other

locations {x_{N+1}, ..., x_n} the generated random numbers {X_{N+1}, ..., X_n} have a sample

mean and sample covariance that converge to the conditional mean [X]^c and conditional

covariance Σ₂₂ in the limit as the number of random fields generated becomes infinite

[Harter (1992, 1994)]:

[X_i]^c = [X_i \mid X_1, \ldots, X_N] = \int_{-\infty}^{\infty} X_i\, f(X_i \mid X_1, \ldots, X_N)\, dX_i

\Sigma_{ij}^c = [X_i X_j \mid X_1, \ldots, X_N] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X_i X_j\, f(X_i, X_j \mid X_1, \ldots, X_N)\, dX_i\, dX_j, \qquad i, j = 1, \ldots, n

To implement the conditional random field generation, Delhomme (1979) used the

following approach based on work by Matheron (1973) and Journel (1974, 1978):

Initially, the measured data X₁ are used to infer the moments, mean and covariance, of the

unconditional joint pdf of the random field. Then an estimate of the conditional mean

[X]^k is obtained by simple kriging. The kriging weights and the estimated conditional

mean [X]^k are retained for the subsequent generation of conditional random fields X_c,

which are constructed as:

X_c = [X]^k + (X_u - [X_u]^k) = [X]^k + e_s \qquad (2.15)

where [X_u]^k is the kriged random field given the simulated data from the

unconditionally generated random field X_u. X_u has a joint probability distribution defined

by the measured moments. [X_u]^k is the simulated equivalent of [X]^k: it preserves the data

X_u in the unconditionally generated random field at, and only at, the locations {x₁, ..., x_N}

where measurements are available at the real field site as well, and consists of the kriged

estimates at all other locations, given the unconditionally simulated


data. The difference (X_u - [X_u]^k) is a realization e_s of a possible estimation error

incurred by estimating the data through the kriged values [X_u]^k. The simulated error is

added to the originally estimated conditional mean [X]^k to obtain a possible conditional

random field X_c.

The simulated estimation error e_s has the same conditional moments as the real

estimation error e = (X - [X]^k), since the unconditional probability density functions of the

real and the simulated fields are identical (neglecting measurement and moment

estimation errors), and because the conditioning occurs at the same locations both at the

field site and in the simulations [Journel (1974) and Delhomme (1979)]. The

unconditional random field generator and the kriging of the generated random field, from

the simulated measurement data, are repeated for each realization. Each simulation yields

a random field of estimation error e_s, which can be added to the kriging estimate of the

real data to obtain a conditional random field.

For a large number of samples, the sample variance of X_{2s}(x_2) will converge in the

mean square to the true conditional variance, or kriging variance, of X_2(x_2) [Delhomme

(1979). This conditioning technique is independent of the method used to generate the

unconditional random field and is not related to the spectral random field generator.
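The construction of (2.15) can be sketched in one dimension. For brevity the unconditional field below is generated by Cholesky factorization rather than by the spectral method (which is legitimate precisely because, as noted above, the conditioning step is independent of the generator), and an exponential covariance with invented parameters is assumed for illustration.

```python
import numpy as np

def cov(xa, xb, sigma2=1.0, lam=3.0):
    # assumed exponential covariance, for illustration only
    return sigma2 * np.exp(-np.abs(xa[:, None] - xb[None, :]) / lam)

def simple_krige(x_data, v_data, x_grid):
    # zero-mean simple kriging: weights solve C(data,data) w = C(data,grid)
    w = np.linalg.solve(cov(x_data, x_data), cov(x_data, x_grid))
    return w.T @ v_data

rng = np.random.default_rng(0)
x_grid = np.arange(50.0)
idx = np.array([5, 20, 40])              # measurement locations
x_data = x_grid[idx]
v_data = np.array([1.2, -0.4, 0.8])      # "measured" values

# unconditional realization X_u with the same covariance (Cholesky)
L = np.linalg.cholesky(cov(x_grid, x_grid) + 1e-10 * np.eye(x_grid.size))
X_u = L @ rng.normal(size=x_grid.size)

# eq. (2.15): X_c = [X]^k + (X_u - [X_u]^k)
X_c = simple_krige(x_data, v_data, x_grid) + X_u - simple_krige(x_data, X_u[idx], x_grid)

# the conditional realization honors the data exactly at the data points,
# because simple kriging is an exact interpolator there
assert np.allclose(X_c[idx], v_data, atol=1e-8)
```

Away from the data points, repeating this with many unconditional realizations produces an ensemble whose variance approaches the kriging variance, as stated above.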


3. STOCHASTIC MODEL DEVELOPMENT

3.1 Introduction

To achieve the above objectives and for a comparison study, a stochastic

conditional simulation approach will be introduced first. Random field generators as

discussed earlier are often used in the stochastic approach to generate realizations of the

transmissivity field under some geostatistical assumptions and known statistical

parameters. These realizations are considered to represent real transmissivity fields

(reality). Since a neural network (NN) learns by example, the above-mentioned

realizations will serve as learning sets during the design of the NN.

An assumption of statistical second-order stationarity is made here in describing the

spatial heterogeneity of the transmissivity. Essentially this means that the statistical

properties are not a function of location but only of separation distance between locations

in the field. In an attempt to circumvent some problems of scale, the method described

here discretizes a block-centered finite-difference grid so that the block size

approximates the scale of the sample volume of pumping tests. Alternatively, the point-

scale process could be integrated over the block domain, but there remains the problem of

initially determining that point-scale covariance. Concerning spatial discretization, there

needs to be at least 10 blocks of the finite difference grid per correlation scale to

adequately model the spatial variability [Van Lent and Kitanidis, 1996] and several

correlation lengths per scale of the bounded domain.


This chapter develops the theory and algorithm for a stochastic approach to

conditioning random simulations of perturbations in log-transmissivity on measurements

of head. Equations for covariances are approximated by a small-perturbation linearization

of the two-dimensional vertically integrated groundwater flow equation. The covariance

equations are then solved, along with equations for head and mean head, by a multigrid

partial differential equation solver developed by Schaffer (1995). Conditioning

on data is accomplished by cokriging using the covariances obtained from the numerical

solution, and head is solved directly from the cokriged transmissivities and the

groundwater flow model. An improvement to the match between simulated heads and

actual head data is accomplished by iteratively cokriging on the differences between

model heads and the data and solving the flow equation.


3.2 Mathematical Development of the Stochastic Model

3.2.1 Spectral Representation of the Flow Equation

The classical, depth-averaged equation for two-dimensional, steady groundwater

flow with heterogeneous transmissivity is written as:

\frac{\partial}{\partial x}\!\left(T \frac{\partial \phi}{\partial x}\right) + \frac{\partial}{\partial y}\!\left(T \frac{\partial \phi}{\partial y}\right) = 0 \qquad (3.1)

where T is transmissivity, φ is hydraulic head, and x and y are the coordinates of

spatial location. Transmissivity is a spatially heterogeneous parameter that is unknown,

except at data locations (pumping tests), and is modeled statistically as a second-order

stationary random field. Uncertainty in the transmissivity parameter then propagates

through the model and results in uncertainty in the hydraulic head. Head also is measured

only at well locations in space. This model can be applied at the basin scale with steady flow

in confined aquifers or in unconfined aquifers where the change in saturated thickness is

negligible. This model is not valid where there is a significant component of vertical

flow, such as may be the case near point sources, sinks, or boundaries or where there is a

significant change in saturated thickness.

Boundary and initial conditions are required in order to complete the model. The general

boundary condition can be formulated as

a\,\nabla\phi \cdot n + b\,\phi = c

in which n is a unit vector outward normal to the boundary, and a, b, and c are constants

whose values determine the boundary conditions. The types of boundaries which will be


considered here are where a = 0 and b = 1 (type 1), and where a = 1 and b = 0 (type 2).

The value of c will be treated as deterministic along the boundary. Spatial location of the

model boundaries is an extremely important aspect of designing a model to represent

some real aquifer system. For this research, however, boundaries are considered to be of

known location, type, and magnitude.

Randomness is introduced and the model is linearized by a first-order small

perturbation expansion. The log-transformed transmissivity is expanded into the sum of a

constant mean, F = E{Ln(T(x))}, and a small zero-mean perturbation, f(x). Likewise,

hydraulic head is the sum of the mean head, H(x) = E{φ(x)}, and a small zero-mean

perturbation in head, h(x).

Equation (3.1) can be written in terms of Ln[T] as:

\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial \ln T}{\partial x}\frac{\partial \phi}{\partial x} + \frac{\partial \ln T}{\partial y}\frac{\partial \phi}{\partial y} = 0 \qquad (3.2)

Assuming head and log-transmissivity to be spatial stochastic processes, they may

be decomposed as the sum of mean and perturbation parts about the mean

\phi(x, y) = H(x) + h(x, y)

Ln\,T(x, y) = F + f(x, y) \qquad (3.3)

In (3.3) H is only a function of the x-direction, implying a unidirectional mean-flow

assumption. Unidirectional mean flow is not excessively restrictive; the coordinate


system can be oriented so that the x-axis and mean-flow directions are parallel. The

Ln[T] process is assumed to have a constant mean, F.

Substituting (3.3) into (3.2) leads to:

\frac{\partial^2 H}{\partial x^2} + \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} + \frac{\partial f}{\partial x}\frac{\partial H}{\partial x} + \frac{\partial f}{\partial x}\frac{\partial h}{\partial x} + \frac{\partial f}{\partial y}\frac{\partial h}{\partial y} = 0 \qquad (3.4)

If the variations in the transmissivity are small, then the second-order terms (products of

perturbations) may be neglected [see Mizell et al., 1982, p. 1054; Dagan, 1982, p. 819].

The equation for mean flow is found by taking the expectation of (3.4):

\frac{d^2 H}{dx^2} + E\!\left[\frac{\partial f}{\partial x}\frac{\partial h}{\partial x}\right] + E\!\left[\frac{\partial f}{\partial y}\frac{\partial h}{\partial y}\right] = 0 \qquad (3.5)

and subtracting (3.5) from (3.4) results in the governing equation of the perturbation in flow,

\frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = J_1 \frac{\partial f}{\partial x} \qquad (3.6)

where J_1 = -dH/dx is the mean gradient. Note that the second-order perturbation

products are neglected.

Dagan (1985) wrote: "In a recent paper, Gutjahr (1984) has shown that if the log-

conductivity and head fields are jointly normal, the first-order approximation becomes

exact and is valid for arbitrary σ_f². The present study indicates that this is not the case,

in the sense that the first-order approximation has to be supplemented by additional

terms. Nevertheless, for small to moderate values of σ_f², it is seen that these additional


terms may be neglected". Recently, Loaiciga and Marino (1990) analyzed the error in

dropping the perturbation terms in (3.6) in the steady-state flow case. They

reinforced those statements by Dagan (1985) and presented a necessary and sufficient

condition for the smallness of these two terms.

An assumption of statistical homogeneity (stationarity) for both T and h

fluctuations is made. Statistically homogeneous processes are described by statistical

properties (covariance and related functions) which depend only on the lag vector

separating observations, and not on actual locations of observations. Assuming statistical

homogeneity allows representation of the input f and output h by Fourier-Stieltjes integrals

[Lumley and Panofsky (1964), p. 16]

f(x) = \int_{-\infty}^{\infty} e^{i k \cdot x}\, dZ_f(k), \qquad h(x) = \int_{-\infty}^{\infty} e^{i k \cdot x}\, dZ_h(k) \qquad (3.7)

where dZf and dZh represent complex Fourier amplitudes of the respective fluctuations

over (k₁, k₂) wave-number space. Substituting (3.7) into (3.6) gives, after

differentiating,

\int_{-\infty}^{\infty} e^{i k \cdot x}\left[-(k_1^2 + k_2^2)\, dZ_h - i k_1 J_1\, dZ_f\right] = 0 \qquad (3.8)


where (3.8) is true if the bracketed term is zero. Thus, an equation relating the

Fourier amplitudes of f and h, by uniqueness of the representation, yields

dZ_h = \frac{-i k_1 J_1}{(k_1^2 + k_2^2)}\, dZ_f \qquad (3.9)

From Lumley and Panofsky (1964, p. 16), the expectation of the product of the

Fourier amplitude with its complex conjugate is the spectrum for the random process. By

this procedure a relationship between the spectra of f and h can be derived from (3.9) as

S_h(k_1, k_2)\, dk = E(dZ_h\, dZ_h^*) = \frac{k_1^2 J_1^2}{(k_1^2 + k_2^2)^2}\, S_f(k_1, k_2)\, dk \qquad (3.10)

where S_h is the head spectrum and S_f is the Ln[T] spectrum (the asterisk designates the

complex conjugate).

The head spectrum (3.10) is directly proportional to the Ln[T] spectrum, and the

proportionality factor is a function of the mean gradient and wave-number space location.

Mathematically, knowledge of either the h or f spectrum would allow evaluation of the

other. In the following section, the Ln[T] spectrum will be identified for use in (3.10).


3.2.2 Ln[T] Spectra and Covariance Function

A spectrum-covariance pair for two-dimensional spatial processes was proposed by

Whittle (1954) based on analysis of agricultural yields. The Whittle spectrum has the

form:

S_f(k_1, k_2) = \frac{\sigma^2 \alpha^2}{\pi (k_1^2 + k_2^2 + \alpha^2)^2}, \qquad \alpha = \frac{\pi}{2\lambda} \qquad (3.11)

where λ is the integral scale and σ² the variance of the Whittle process. The associated

covariance function, obtained from the Fourier transform of S_f [Lumley and Panofsky

(1964)], is formulated in terms of a modified Bessel function:

R_f(\zeta) = \sigma^2\, \alpha\zeta\, K_1(\alpha\zeta) \qquad (3.12)

where ζ is the magnitude of the separation, or lag, vector between observation points,

and K₁ is the modified Bessel function of the second kind, first order. Whittle [(1954), p.

448] describes (3.12), when normalized by the variance, as "the elementary correlation

function in two dimensions, similar to the exponential e^{-ζ/λ} in one dimension." Gelhar

(1976) uses the Whittle spectrum and covariance functions to describe the log-

conductivity process when analyzing two-dimensional phreatic flow.
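The Whittle correlation function ρ(ζ) = αζK₁(αζ) and the relation α = π/(2λ) can be checked numerically. In the sketch below, K₁ is evaluated through its integral representation so that only numpy is needed; the function name, grid sizes, and parameter value are arbitrary choices for the example.

```python
import numpy as np

def bessel_k1(x, t_max=20.0, nt=3000):
    # modified Bessel K1 via K1(x) = integral_0^inf exp(-x cosh t) cosh t dt
    # (assumption: adequate accuracy for the x range used here)
    t = np.linspace(0.0, t_max, nt)
    f = np.exp(-np.outer(x, np.cosh(t))) * np.cosh(t)
    return np.sum(0.5 * (f[:, 1:] + f[:, :-1]) * np.diff(t), axis=1)

alpha = 0.5
zeta = np.linspace(1e-3, 40.0, 1001)
rho = alpha * zeta * bessel_k1(alpha * zeta)   # Whittle correlation, (3.12)/sigma^2

# trapezoid rule for the integral scale: lambda = integral_0^inf rho dzeta
lam = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(zeta))

# rho -> 1 at zero lag, and the integral scale equals pi/(2 alpha)
assert abs(rho[0] - 1.0) < 0.01
assert abs(lam - np.pi / (2 * alpha)) < 0.05
```

The same machinery can be reused for the modified spectra discussed next, whose covariances involve K₀ as well as K₁.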


The head variance is obtained by using the assumed Ln[T] spectrum, (3.11), in

(3.10) and integrating over wave-number space. Note that the singularity at k = 0

remains in the head spectrum, and as a result the integral is divergent. Thus, the Whittle

spectrum will not yield a finite variance for two-dimensional confined groundwater flow

under the assumption of statistically homogeneous h. As an alternative, Mizell (1982)

considered two spectra, A and B, appropriately modified to eliminate the singularity in

the head spectrum. In this study only modified spectrum A will be adopted. The modified

spectrum A for a two-dimensional spatial process is

S_f(k_1, k_2) = \frac{2\sigma^2 \alpha^2 (k_1^2 + k_2^2)}{\pi (k_1^2 + k_2^2 + \alpha^2)^3} \qquad (3.13)

The associated covariance function and integral scale are (see Mizell, 1982 for detail):

R_f(\zeta) = \sigma^2 \left[\frac{\pi\zeta}{4\lambda}\, K_1\!\left(\frac{\pi\zeta}{4\lambda}\right) - \frac{1}{2}\left(\frac{\pi\zeta}{4\lambda}\right)^{2} K_0\!\left(\frac{\pi\zeta}{4\lambda}\right)\right] \qquad (3.14)

where

\lambda = \frac{\pi}{4\alpha}
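The contrast between the Whittle spectrum and spectrum A can be verified numerically. Integrating the head spectrum (3.10) over wave-number space in polar coordinates gives σ_h² = πJ₁² ∫ S_f(k)/k dk (a restatement of (3.16), using the isotropy of S_f); the sketch below, with invented unit parameters and arbitrary cutoffs, shows that this integral keeps growing as the lower cutoff shrinks for the Whittle spectrum, but converges for spectrum A.

```python
import numpy as np

J1, sigma2, alpha = 1.0, 1.0, 1.0   # illustrative unit parameters

def whittle(k):
    # Whittle spectrum, eq. (3.11), isotropic form
    return sigma2 * alpha**2 / (np.pi * (k**2 + alpha**2) ** 2)

def spectrum_a(k):
    # modified spectrum A, eq. (3.13), isotropic form
    return 2 * sigma2 * alpha**2 * k**2 / (np.pi * (k**2 + alpha**2) ** 3)

def head_variance(S, k_min, k_max=50.0, n=200001):
    # sigma_h^2 = pi J1^2 * integral of S_f(k)/k dk, on a log-spaced grid
    k = np.geomspace(k_min, k_max, n)
    f = S(k) / k
    return np.pi * J1**2 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(k))

# Whittle: grows without bound (logarithmically) as the cutoff k_min -> 0
w = [head_variance(whittle, k_min) for k_min in (1e-2, 1e-4, 1e-6)]
# spectrum A: converges to a finite head variance
a = [head_variance(spectrum_a, k_min) for k_min in (1e-2, 1e-4, 1e-6)]

assert w[2] - w[1] > 1.0            # still growing as the cutoff shrinks
assert abs(a[2] - a[1]) < 1e-4      # essentially unchanged: convergent
```

This is exactly the singularity-at-the-origin argument made in the text: spectrum A vanishes like k² at the origin, canceling the 1/k² factor in the head spectrum.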


3.2.3 Head Variance and Covariance Functions

The head covariance function is determined by the Fourier transform of the head

spectrum. The variance of the head process may be found by evaluating the Fourier

transform of the spectrum at zero lag:

R_h(\zeta_1, \zeta_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i k \cdot \zeta}\, S_h(k_1, k_2)\, dk_1\, dk_2 \qquad (3.15)

\sigma_h^2 = R_h(0, 0) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_h(k_1, k_2)\, dk_1\, dk_2 \qquad (3.16)

where the h subscript denotes statistical properties of the head fluctuations, and ζ₁ and ζ₂

are components of the lag vector separating observation points of the process.

Substituting (3.10) into (3.15), assuming (3.13) for the Ln[T] spectrum, and

performing the integration (see Mizell, 1982 for details) leads to the head covariance

function associated with input spectrum A. Written in polar coordinates, this head

covariance function, equation (3.17), is a lengthy expression in cos²χ and in the modified

Bessel functions K₀ and K₁ of argument πζ/(4λ); the full form is given by Mizell (1982).


where χ is the angle between the lag vector and the mean-flow direction, and λ is the

integral scale associated with R_f; equation (3.16) then gives the corresponding head variance.

3.2.4 Cross-Covariance Functions

Evaluation of the cross-covariance function between head and Ln[T], a measure of

dependence between the two processes at various lag distances, first requires evaluation

of the cross-spectrum. Formulation of the cross-spectrum parallels that of the head

spectrum previously discussed. The cross-spectrum is given by the expectation of the

Fourier amplitude of the head fluctuations, (3.9), multiplied by the complex conjugate of

the Ln[T] Fourier amplitude (Lumley and Panofsky, 1964, p. 21):

S_{hf}(k_1,k_2)\, dk_1\, dk_2 = E\big[dZ_h\, dZ_f^{*}\big] = \frac{-\,i\,J\,k_1}{k_1^2 + k_2^2}\, S_{ff}(k_1,k_2)\, dk_1\, dk_2 \qquad (3.18)

The cross-covariance function is the Fourier transform of the cross-spectrum,

R_{hf}(\xi_1,\xi_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} S_{hf}(k_1,k_2)\, e^{i(k_1\xi_1 + k_2\xi_2)}\, dk_1\, dk_2. \qquad (3.19)

Substituting spectrum A, (3.13), into (3.19) and completing the transform (see Mizell, 1982) gives the associated cross-covariance function for spectrum A, which in polar coordinates has the form

R_{hf}(r,\chi) = g_3(r)\cos\chi \qquad (3.20)

where J is the mean hydraulic gradient and the radial function g_3, with g_3(0) = 0, is given in closed form by Mizell (1982).


Note that (3.20) is zero valued when r = 0 or when \chi = \pi/2. Thus, head and Ln[T] fluctuations are uncorrelated at zero lag; E[hf] = 0.

Covariance and correlation functions (correlation functions are the covariance

function divided by the variance) measure the degree of dependence of information

obtained at various lag distances. These functions are constructed by taking the expected

value of the product of two observations made at various lag distances (Lumley and

Panofsky, 1964, p. 14)

\rho_X(\xi) = \frac{E\big[X(x)\,X(x+\xi)\big]}{\sigma_X^2} \qquad (3.21)

where p is the correlation function and X has zero mean. Positive correlation values

imply correlation between observations of similar magnitude (high values with high

values or low with low), and negative values for the correlation function imply

correlation of observations of opposite magnitude (high values with low). Larger

absolute values of correlation indicate stronger dependence between observations at that

lag distance. Maximum correlation occurs at zero lag distance because observations are

compared with themselves.
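The correlation function of Equation (3.21) can be estimated directly from a sampled sequence. The sketch below does so for a synthetic first-order autoregressive series (a hypothetical stand-in for sampled field data) and shows the correlation peaking at zero lag and decaying with distance:

```python
import random

def sample_correlation(x, lag):
    # rho(lag) = E[X(s) X(s+lag)] / var for a mean-removed sequence:
    # the sample analogue of Equation (3.21).
    m = sum(x) / len(x)
    x = [v - m for v in x]                 # remove the mean
    var = sum(v * v for v in x) / len(x)
    cov = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / (len(x) - lag)
    return cov / var

random.seed(0)
# AR(1) series: strongly correlated at short lags, decorrelating with distance.
x, prev = [], 0.0
for _ in range(20000):
    prev = 0.8 * prev + random.gauss(0.0, 1.0)
    x.append(prev)

print(sample_correlation(x, 0))                               # exactly 1.0 at zero lag
print(sample_correlation(x, 1) > sample_correlation(x, 10))   # True: dependence weakens
```

The maximum value of one at zero lag, and the decay of the estimate with lag distance, mirror the behavior described above.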


3.3 Conditioning

Consider the natural log of transmissivity, Ln[T(x)], of an aquifer to be a

stationary stochastic process, with a constant unconditional mean E[Ln(T)] = F and unconditional perturbation f. The corresponding hydraulic head is given by \phi(x) = H(x) + h(x), where H(x) is the mean head and h(x) is the unconditional head perturbation. Suppose that a limited number of transmissivity and head measurements, designated n_f and n_h, respectively, are available for the aquifer, where f_i^* = Ln[T_i^*] - F and h_j^* represent the observed transmissivity and head values, respectively, with i = 1, 2, ..., n_f and j = n_f + 1, ..., n_f + n_h.

One possible approach to conditional simulation is to preserve observed head and

transmissivity values at sample locations, while satisfying both the underlying statistical

properties (i.e. mean and covariance, etc.) of the field and the governing flow equation.

Cokriging, as discussed in the next section, is considered the best linear unbiased

estimator, and is usually used for conditional simulations. In the conditional probability

concept, a head or transmissivity field represents a conditioned realization of \phi or Ln[T] in the ensemble. Infinitely many possible realizations of the conditioned fields \phi or Ln[T]

exist.


3.3.1 Cokriging

Cokriging is an estimation technique useful when two (or more) correlated variables are measured in the field and can be estimated together [de Marsily (1986); Kitanidis and Vomvoris (1983); Neuman (1984); Hoeksema and Kitanidis (1984)].

For example, transmissivity values for an aquifer, which are generally correlated with

hydraulic head, can be estimated from not only measurements of transmissivity itself, but

also simultaneously from hydraulic head measurements.

In cokriging, the unknown mean-removed value of Ln[T] is estimated at any point

in the aquifer by a weighted linear combination of the observed Ln[T] and h. The weights

are determined by requiring the estimator to be unbiased with minimum variance. In this

sense, cokriging is referred to as a Best Linear Unbiased Estimator. Cokriging is also an

exact interpolator; that is, at any measurement point, the cokriged estimated value is

equal to the measured one, and the cokriging variance is zero.

By casting the problem in a probability framework, Dagan (1982a,b; 1985a,b) and Rubin and Dagan (1987) show that when the random transmissivity T and head h

fields are jointly Gaussian, with known mean and covariance, the cokriging estimate and

its covariance are equivalent to the conditional mean and covariance of the new joint pdf

that is conditioned on the measurements. Although cokriging is an optimal and exact estimator, it cannot reproduce all the details of the transmissivity field that have not

been measured, regardless of the quantity and quality of head measurements, because it

reflects conditional expectation. As a result, the cokriged map is inevitably smoother than

the actual one. This would be true even if head information were available everywhere.


In this section, cokriging will be described briefly; detailed discussions of the theory are found in many literature sources, e.g., de Marsily (1986) and Vauclin et al. (1983).

Assume that Z_1(x) and Z_2(x) are two correlated second-order stationary random fields with known means and covariances. The estimation of Z_1 (and Z_2, if necessary) using cokriging is based on the best linear unbiased estimator technique, which is

Z_1^*(x_0) = \sum_{i=1}^{n} \lambda_i Z_1(x_i) + \sum_{k=1}^{m} \mu_k Z_2(x_{n+k}) \qquad (3.22)

where:

Z_1^*(x_0) = the estimate at location x_0,

Z_1(x_i) = the measured values of Z_1 at locations x_1, x_2, ..., x_n,

Z_2(x_{n+k}) = the measured values of Z_2 at locations x_{n+1}, ..., x_{n+m},

\lambda_i and \mu_k = the cokriging weights corresponding to Z_1 and Z_2, respectively.

Note that Z_1 and Z_2 need not be measured at the same locations. The minimal variance criterion requires that:

E\big\{\big[Z_1^*(x_0) - Z_1(x_0)\big]^2\big\} = \text{minimal} \qquad (3.23)

If Z_1 and Z_2 are second-order stationary processes, the cokriging system of equations can

be written as:


\sum_{j=1}^{n} \lambda_j C_{11}(x_i - x_j) + \sum_{k=1}^{m} \mu_k C_{12}(x_i - x_{n+k}) = C_{11}(x_0 - x_i) \quad \text{for } i = 1, ..., n

\sum_{j=1}^{n} \lambda_j C_{21}(x_{n+l} - x_j) + \sum_{k=1}^{m} \mu_k C_{22}(x_{n+l} - x_{n+k}) = C_{21}(x_0 - x_{n+l}) \quad \text{for } l = 1, ..., m \qquad (3.24)

The auto-covariances of Z_1 and Z_2 and the cross-covariance between Z_1 and Z_2 are denoted by C_{11}, C_{22}, and C_{12}, respectively. By solving the system of Equations (3.24),

the cokriging weights can be obtained. Following this, equation (3.22) can be used to

obtain the estimates of either or both variables at any location. Prior to the solution of the

normal system of equations, the auto-covariances and cross covariances as functions of

the separation distance must be known. This information can be obtained from either

field data or some theoretical analyses.
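Once the covariance models are fixed, the cokriging system (3.24) is an ordinary linear system. The sketch below solves it for a tiny one-dimensional example under an assumed proportional coregionalization with an exponential correlation model (all numbers hypothetical), and illustrates the exact-interpolation property: when the estimation point coincides with a Z_1 datum, that datum receives weight one.

```python
import math

def rho(h, L=1.0):
    # Exponential correlation model, assumed here purely for illustration.
    return math.exp(-abs(h) / L)

def cokrige_weights(x1, x2, x0, s1=1.0, s2=0.8, r=0.6):
    # Build and solve the cokriging normal equations (cf. (3.24)) under a
    # proportional coregionalization: C11 = s1^2*rho, C22 = s2^2*rho,
    # C12 = r*s1*s2*rho (positive definite for |r| < 1).
    pts = [(p, 1) for p in x1] + [(p, 2) for p in x2]
    sig = {1: s1, 2: s2}
    def cov(a, ta, b, tb):
        c = sig[ta] * sig[tb] * rho(a - b)
        return c if ta == tb else r * c
    n = len(pts)
    A = [[cov(pts[i][0], pts[i][1], pts[j][0], pts[j][1]) for j in range(n)]
         for i in range(n)]
    b = [cov(pts[i][0], pts[i][1], x0, 1) for i in range(n)]  # estimate Z1(x0)
    # Gaussian elimination with partial pivoting, then back-substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for c2 in range(col, n):
                A[row][c2] -= f * A[col][c2]
            b[row] -= f * b[col]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Z1 measured at 0.0 and 2.0, Z2 at 1.0; estimate Z1 at x0 = 0.0 (a data point):
w = cokrige_weights([0.0, 2.0], [1.0], x0=0.0)
print([round(v, 6) for v in w])  # all weight on the collocated Z1 datum
```

Because x_0 coincides with the first Z_1 sample, the right-hand side equals the corresponding column of the covariance matrix and the solution places unit weight there, which is the exact-interpolation property noted in the previous section.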

One of the applications of the covariance and cross-covariance approximations

developed in section 3.2 is that they can be used for conditioning randomly simulated

fields jointly on head and transmissivity data by linear cokriging estimation. Given a data

vector of transmissivity and hydraulic head measured at spatial locations, the cokriging estimator of the perturbation in Ln[T(x)] at some location x_0 is the sum of the products of the data and the weight applied to each datum. That is:

f_{ck}(x_0) = \sum_{i=1}^{n_f} \lambda_{i0}\, f_i^{*} + \sum_{j=n_f+1}^{n_f+n_h} \mu_{j0}\, h_j^{*} \qquad (3.25)


where f_{ck}(x_0) is the cokriged f value at location x_0. The transmissivity then becomes T_{ck}(x_0) = e^{F + f_{ck}(x_0)}. The cokriging weights \lambda_{i0} and \mu_{j0} associated with point x_0 can be evaluated from:

\sum_{i=1}^{n_f} \lambda_{i0} R_{ff}(x_l, x_i) + \sum_{j=n_f+1}^{n_f+n_h} \mu_{j0} R_{fh}(x_l, x_j) = R_{ff}(x_0, x_l) \quad l = 1, 2, ..., n_f

\sum_{i=1}^{n_f} \lambda_{i0} R_{hf}(x_l, x_i) + \sum_{j=n_f+1}^{n_f+n_h} \mu_{j0} R_{hh}(x_l, x_j) = R_{hf}(x_0, x_l) \quad l = n_f + 1, n_f + 2, ..., n_f + n_h \qquad (3.26)

where R_{ff} and R_{hh} are the covariances of f and h, respectively, and R_{fh} is the cross-covariance of f and h. Note that F is assumed to be known and H(x) is derived by solving the governing flow equation with the transmissivity equal to e^F. Similarly, one can use classical cokriging based on the observed f_i^* and h_j^* to construct a cokriged head field.

That is:

h_{ck}(x_0) = \sum_{i=1}^{n_f} \beta_{i0}\, f_i^{*} + \sum_{j=n_f+1}^{n_f+n_h} \gamma_{j0}\, h_j^{*} \qquad (3.27)

where h_{ck}(x_0) is the cokriged h value at location x_0, and \beta_{i0} and \gamma_{j0} are the cokriging weights, which can be evaluated as follows:

\sum_{i=1}^{n_f} \beta_{i0} R_{ff}(x_l, x_i) + \sum_{j=n_f+1}^{n_f+n_h} \gamma_{j0} R_{fh}(x_l, x_j) = R_{fh}(x_l, x_0) \quad l = 1, 2, ..., n_f

\sum_{i=1}^{n_f} \beta_{i0} R_{hf}(x_l, x_i) + \sum_{j=n_f+1}^{n_f+n_h} \gamma_{j0} R_{hh}(x_l, x_j) = R_{hh}(x_l, x_0) \quad l = n_f + 1, n_f + 2, ..., n_f + n_h \qquad (3.28)


The covariances and cross-covariances in (3.26) and (3.28) are derived from the first-order approximations in equations (3.14), (3.17), and (3.20).

3.3.2 Iterative Conditioning Procedure

Conditioning on data using the cokriging estimator is accomplished by a standard

geostatistical procedure. This procedure is illustrated in Figure (3.1). First, classical

cokriging is performed based on the observed f_i^* and h_j^* to construct f_{ck}(x), a cokriged f field (mean-removed Ln[T]), as shown in Equation (3.25), where ck stands for cokriging. The cokriging coefficients are calculated using Equation (3.26). Similarly, classical cokriging is performed based on the observed f_i^* and h_j^* to construct h_{ck}(x), a cokriged h field (mean-removed \phi), as shown in Equation (3.27), and the cokriging coefficients are calculated using Equation (3.28).

Second, an unconditional realization of the random perturbation of the Ln[T(x)]

field, f_u(x), which maintains the prescribed covariance function, is generated, where the subscript u denotes the unconditional simulation. Cokriging can then be performed using the values sampled at the actual sample locations to derive the cokriging estimate, f_u^*(x), which matches the random field samples at the data locations as shown schematically in Figure (3.1). The cokriging error of the unconditional simulation can be calculated as the difference between the transmissivity value of the given unconditional simulation and its cokriging estimate, represented by [f_u(x) - f_u^*(x)], and is always zero at sample


Figure (3.1) Illustration of conditioning a random field on data


locations. The corresponding unconditional head field, \phi_u(x) = H(x) + h_u(x), is generated by solving the governing flow equation with T_u(x) = e^{F + f_u(x)}.

The head perturbations obtained from the flow model parameterized with the conditioned f field do not exactly match the head perturbation data. This is due, in part, to the first-order linearization approximation [Dagan (1985)]. As the input variance of the f field is reduced, the discrepancy between the data and the model results also diminishes. A departure from a multivariate Gaussian distribution of the correlated f and h variables may also affect this discrepancy [Gutjahr (1984)]. The match between head perturbation data and the head perturbations resulting from the flow model parameterized with the conditioned f field can be improved by iteratively conditioning on head data [Gutjahr et al. (1994)]. The strategy is to take the difference between the head data and the model results and add this difference to the unconditioned head h_u.

First, the unconditioned random field f_u(x) is substituted into the flow model equation, and the resulting perturbations around the mean head H(x) are sampled at the locations corresponding to the observation points in the true field (call these samples h_u^0(x_j)). These samples and the samples of the unconditioned f field are used in the cokriging equation (3.25) to obtain a cokriged f_{ck}(x) field, and the cokriging weights \lambda_{i0} and \mu_{j0} are determined from (3.26) for each location. Then, f_{ck}(x) is substituted into the flow model, and the resulting discrepancy between head perturbation data and the


model results is added to the unconditioned field. At iteration m, the conditioned f field f_c^m(x_0) at location x_0 is

f_c^m(x_0) = f_u(x_0) + \sum_{i=1}^{n_f} \lambda_{i0}\big[f_i^{*} - f_u(x_i)\big] + \sum_{j=n_f+1}^{n_f+n_h} \mu_{j0}\big[h_j^{*} - h_u^{m-1}(x_j)\big] \qquad (3.29)

where h_u^{m-1} is the vector of samples of head perturbations resulting from the flow model at iteration m-1, and \lambda_{i0} and \mu_{j0} are the cokriging weights associated with location x_0.

These iterations are repeated until a prescribed convergence criterion is reached. As a result, T_c(x) = e^{F + f_c(x)} and \phi_c(x) = H(x) + h_c(x) agree with the measured values in the true fields, and the conditioned T_c(x) and the true transmissivity field have the same covariance function. The above procedure can then be repeated using different seed numbers to generate different realizations for conditioning.
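The iterative conditioning loop can be sketched in a few lines. The code below is a structural illustration only: a linear toy function stands in for the groundwater flow solver, and the cokriging update is reduced to direct corrections at the sample indices, so that transmissivity data are honored exactly and the head misfit is folded back into the unconditioned field at each iteration.

```python
def toy_flow_model(f):
    # Hypothetical stand-in for the flow solver: head perturbations
    # respond linearly to the log-transmissivity perturbations.
    return [0.5 * v for v in f]

def condition(f_u, f_obs, h_obs, n_iter=20):
    # f_obs / h_obs: {index: observed value} at sample locations.
    f = list(f_u)
    for _ in range(n_iter):
        for i, val in f_obs.items():
            f[i] = val                      # honor transmissivity data exactly
        h = toy_flow_model(f)
        for j, val in h_obs.items():
            f[j] += (val - h[j]) / 0.5      # fold head misfit back into f
    return f

f_c = condition([0.1, -0.3, 0.7, 0.0], f_obs={0: 1.0}, h_obs={2: 0.2})
print(round(f_c[0], 6), round(toy_flow_model(f_c)[2], 6))  # data honored: 1.0 0.2
```

After convergence the conditioned field reproduces both the transmissivity datum and the head datum, which is the essential property of the iterative scheme described above.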


4. NEURAL NETWORKS

4.1 Introduction

Fundamentally, there are two major approaches in the field of artificial

intelligence (AI) for realizing human intelligence in machines [Munakata, (1994)]. One is

symbolic AI, which is characterized by a high level of abstraction and a macroscopic

view. Classical psychology operates at a similar level, and knowledge engineering

systems and logic programming fall in this category. The second approach is based on

low-level microscopic biological models. It is similar to the emphasis of physiology or

genetics. Artificial neural networks and genetic algorithms are the prime examples of this

latter approach. They originated from modeling of the brain and evolution. However,

these biological models do not necessarily resemble their original biological counterparts.

Neural networks are a new generation of information processing systems that are

deliberately constructed to make use of some of the organizational principles that

characterize the human brain. The main theme of neural network research focuses on

modeling of the brain as a parallel computational device for various computational tasks

that were performed poorly by traditional serial computers. Neural networks have a large

number of interconnected processing elements (nodes) that usually operate in parallel

and are configured in regular architectures. The collective behavior of an NN, like a

human brain, demonstrates the ability to learn, recall, and generalize from training

patterns or data. Since the first application of NNs to consumer products appeared at the

end of 1990, scores of industrial and commercial applications have come into use. Like


fuzzy systems, the applications where neural networks have the most promise are also

those with a real-world flavor, such as speech recognition, speech-to-text conversion,

image processing and visual perception, medical applications, loan applications and

counterfeit checks, and investing and trading.

Models of neural networks are specified by three basic entities: models of the

processing elements themselves, models of interconnections and structures (network

topology), and the learning rules (the ways information is stored in the network).

Each node in a neural network collects the values from all its input connections,

performs a predefined mathematical operation, and produces a single output value. The

information processing of a node can be viewed as consisting of two parts: input and

output. Associated with the input of a node is an integration function (typically a dot

product) which serves to combine information or activation from an external source or

other nodes into a net input to the node. A second action of each node is to output an

activation value as a function of its net input through an activation function, which is

usually nonlinear. Each connection has an associated weight that determines the effect of

the incoming input on the activation level of the node. The weights may be positive

(excitatory) or negative (inhibitory). The connection weights store the information, and a

neural network learning procedure often determines the value of the connection weights.

It is through adjustment of the connection weights that the neural network is able to learn.

In a neural network, each node output is connected through weights to other nodes

or to itself. Hence, the structure that organizes these nodes and the connection geometry

among them should be specified for a neural network. We can first take a node and


combine it with other nodes to make a layer of nodes. Inputs can be connected to many

nodes with various weights, resulting in a series of outputs, one per node. This results in a

single-layer feed-forward network. We can further interconnect several layers to form a

multi-layer feed-forward network. The layer that receives inputs is called the input

layer and typically performs no computation other than buffering of the input signal. The

outputs of the network are generated from the output layer. Any layer between the input

and the output layers is called a hidden layer because it is internal to the network and has

no direct contact with the external environment. There may be from zero to several

hidden layers in a neural network. The two types of networks mentioned above are

feedforward networks since no node output is an input to a node in the same layer or

preceding layer. When outputs are directed back as inputs to same- or preceding-layer

nodes, the network is a feedback network. Feedback networks that have closed loops are

called recurrent networks.
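The layered feed-forward arrangement — each layer's outputs becoming the next layer's inputs, with a logistic activation at every node — can be sketched as follows (all weights and biases are made up for illustration):

```python
import math

def forward(x, layers):
    # Feed-forward pass: each layer is a list of (weights, bias) pairs,
    # one per node; outputs of one layer feed the next, and every node
    # applies a logistic activation to its net input.
    for layer in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
             for ws, b in layer]
    return x

# A 2-3-1 network with hypothetical weights:
hidden = [([0.5, -0.2], 0.1), ([0.3, 0.8], 0.0), ([-0.6, 0.4], -0.1)]
output = [([1.0, -1.0, 0.5], 0.2)]

y = forward([0.9, -0.4], [hidden, output])
print(len(y), 0.0 < y[0] < 1.0)  # 1 True: a single squashed output
```

Because no node's output ever feeds a node in the same or a preceding layer, this sketch is a feed-forward network in the sense defined above; adding such connections would make it a feedback (or recurrent) network.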

The third important element in specifying a neural network is the learning scheme.

Broadly speaking, there are two kinds of learning in neural networks: parameter learning, which concerns the updating of the connection weights in a neural network,

and structure learning, which focuses on the change in the network structure, including

the number of nodes and their connection types. These two kinds of learning can be

performed simultaneously or separately. Each kind of learning can be further classified

into three categories:


supervised learning, reinforcement learning, and unsupervised learning.

Among these, supervised learning is of most concern in this study, as will be

discussed later.

Among the existing neural network models, two of the important ones that started

the modern era of neural networks are the Hopfield network and the back-propagation network. The Hopfield network is a single-layer feedback network with symmetric weights proposed by John Hopfield in 1982. The network has point-attractor dynamics,

making its behavior relatively simple to understand and analyze. This feature provides

the mathematical foundation for understanding the dynamics of an important class of

networks. The back-propagation network is a multi-layer feed-forward network with a

gradient-descent-type learning algorithm, called the back-propagation learning rule [Rumelhart, Hinton, and Williams (1986b)]. In this learning scheme, the error at the output layer is propagated backward to adjust the connection weights of preceding layers

and to minimize output errors. The back-propagation learning rule is an extension of the

work of Rosenblatt, Widrow, and Hoff to deal with learning in complex multi-layer

networks, providing an answer to one of the most severe criticisms of the neural network

field. With this learning algorithm, multi-layer networks can be reliably trained. The

back-propagation network has had a major impact on the field of neural networks and is

the primary method employed in most of the applications.

Neural networks offer the following characteristics and properties:

Nonlinear input-output mapping: Neural networks are able to learn arbitrary nonlinear input-output mappings directly from training data.


Generalization: Neural networks can sensibly interpolate input patterns that are

new to the network. From a statistical point of view, neural networks can fit the desired

function in such a way that they have the ability to generalize to situations that are

different from the collected training data.

Adaptivity: Neural networks can automatically adjust their connection weights, or

even network structures (number of nodes or connection types), to optimize their

behavior as controllers, predictors, pattern recognizers, decision makers, and so on.

Fault tolerance: The performance of a neural network is degraded gracefully under

faulty conditions such as damaged nodes or connections. The inherent fault-tolerance

capability of a neural network stems from the fact that the large number of connections

provides much redundancy, each node acts independently of all the others, and each node

relies only on local information.

4.2 Basis of Neural Networks

The human brain is a very complex system capable of thinking, remembering, and

problem solving. There have been many attempts to emulate brain functions with

computer models, and although rather spectacular achievements have come

from these efforts, all of the models developed to date pale in comparison to the complex

functioning of the human brain.

A neuron is the fundamental cellular unit of the brain's nervous system. It is a

simple processing element that receives and combines signals from other neurons through


input paths called dendrites. If the combined input signal is strong enough, the neuron

"fires," producing an output signal along the axon that connects to the dendrites of many

other neurons. Figure (4.1) is a sketch of a neuron showing the various components. Each

signal coming into a neuron along a dendrite passes through a synapse or synaptic

junction. This junction is an infinitesimal gap in the dendrite that is filled with

neurotransmitter fluid that either accelerates or retards the flow of electrical charges. The

fundamental actions of the neuron are chemical in nature, and this neurotransmitter fluid

produces electrical signals that go to the nucleus or soma of the neuron. The adjustment

of the impedance or conductance of the synaptic gap is a critically important process.

Indeed, these adjustments lead to memory and learning. As the synaptic strengths of the

neurons are adjusted, the brain "learns" and stores information.

When a person is born, the cerebral cortex portion of his or her brain contains

approximately 100 billion neurons. The outputs of each of these neurons are connected

through their respective axons (output paths) to about 1000 other neurons. Each of these

1000 paths contains a synaptic junction by which the flow of electrical charges can be

controlled by a neurochemical process. Hence, there are about 100 trillion synaptic

junctions that are capable of having influence on the behavior of the brain. It is readily

apparent that in our attempts to emulate the processes of the human brain, we cannot

think of billions of neurons and trillions of synaptic junctions. Indeed, the largest of our

neural networks typically contain a few thousand artificial neurons and less than a million

artificial synaptic junctions.


Figure (4.1) Sketch of biological neuron


The one area in which artificial neural networks may have an advantage is speed.

When a person walks into a room, it typically takes another person about half a second to

recognize them. This recognition process involves about 200-250 individual separate

operations within the brain. As a benchmark for speed, this means that the human brain

operates at about 400-500 hertz (Hz). Modern digital computers typically operate at clock speeds between 100 and 500 megahertz (MHz), which means that they have a very large

speed advantage over the brain. However, this advantage is dramatically reduced because

digital computers operate in a serial mode whereas the brain operates in a parallel mode.

However, neural network chips have been developed in recent years that enable neural

computers to operate in a parallel mode.

4.3 Artificial Neurons

An artificial neuron is a model whose components have direct analogs to

components of an actual neuron. Figure (4.2) shows the schematic representation of an

artificial neuron. The input signals are represented by x_0, x_1, x_2, ..., x_n. These signals are

continuous variables, not the discrete electrical pulses that occur in the brain. Each of

these inputs is modified by a weight (sometimes called the synaptic weight) whose

function is analogous to that of the synaptic junction in a biological neuron. These

weights can be either positive or negative, corresponding to acceleration or inhibition of

the flow of electrical signals. This processing element consists of two parts. The first part

simply aggregates (sums) the weighted inputs resulting in a quantity I; the second part is


Figure (4.2) Sketch of artificial neuron


effectively a nonlinear filter, usually called the activation function or transfer function

through which the combined signal flows.

Figure (4.3) shows several possible activation functions. The threshold function as

shown in Figure (4.3a) passes information only when the output I of the first part of the

artificial neuron exceeds the threshold T. The signum function (shown in Figure (4.3b),

sometimes called a quantizer function) passes negative information when the output is

less than the threshold T, and positive information when the output is greater than the

threshold T. More commonly, the activation function is a continuous function that varies

gradually between two asymptotic values, typically 0 and 1, or - 1 and + 1 (called the

sigmoidal function). The most widely used activation function is the logistic function, a member of the sigmoidal activation functions (shown in Figure (4.3c)), and is represented by the equation

\Phi(I) = \frac{1}{1 + e^{-\alpha I}} \qquad (4.1)

where \alpha is a coefficient that adjusts the abruptness of this function as it changes between

the two asymptotic values.
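A minimal sketch of the logistic activation of Equation (4.1), showing the midpoint value, the saturation toward the upper asymptote, and the effect of the steepness coefficient α:

```python
import math

def logistic(I, alpha=1.0):
    # Equation (4.1): squashes any net input to the interval (0, 1);
    # alpha controls how abruptly the transition occurs.
    return 1.0 / (1.0 + math.exp(-alpha * I))

print(logistic(0.0))                                         # 0.5 at the midpoint
print(logistic(10.0) > 0.999)                                # True: near the upper asymptote
print(logistic(2.0, alpha=5.0) > logistic(2.0, alpha=1.0))   # True: larger alpha, steeper rise
```

The bounded output is exactly the "squashing" behavior described next: no matter how large the net input grows, the node's output stays between the two asymptotes.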

A more descriptive term for the activation function is "squashing function," which

indicates that this function squashes or limits the values of the output of an artificial

neuron to values between the two asymptotes. This limitation is very useful in keeping

the output of the processing elements within a reasonable dynamic range. However, there

are certain situations in which a linear relation, sometimes only in the right half-plane, is

used for the activation function. It should be noted, however, that the use of a linear


(a) threshold: \Phi(I) = 1 for I \ge T, \Phi(I) = 0 for I < T

(b) signum: \Phi(I) = +1 for I \ge T, \Phi(I) = -1 for I < T

(c) sigmoid: \Phi(I) = 1/(1 + e^{-\alpha I})

Figure (4.3) Transfer functions for neurons


activation function removes the nonlinearity from the artificial neuron. Without

nonlinearities, a neural network cannot model nonlinear phenomena.

4.4 Artificial Neural Networks

Tsoukalas and Uhrig (1997) defined an artificial neural network as

"A data processing system consisting of a large number of simple, highly

interconnected processing elements (artificial neurons) in an architecture inspired by the

structure of the cerebral cortex of the brain".

These processing elements are usually organized into a sequence of layers or slabs

with full or random connections between the layers. This arrangement as shown in Figure

(4.4) uses the input layer as a buffer that presents data to the network. This input layer is

not a neural computing layer because the nodes have no input weights and no activation

functions. The top layer is the output layer, which presents the output response to a given

input. The other layer (or layers) is (are) called the intermediate or hidden layer(s)

because it (they) usually has (have) no connections to the outside world. Typically the input, hidden, and output layers are designated the i-th, j-th, and k-th layers, respectively.

Two general kinds of neural networks are in use: the hetero-associative neural

network in which the output vector is different from the input vector, and the auto-

associative neural network in which the output is identical to the input.


Figure (4.4) Architecture of neural network


4.5 Example of Artificial Neural Network

A typical neural network is "fully connected" meaning there is a connection

between each of the neurons in any given layer with each of the neurons in the next layer

as shown in Figure (4.5). When there are no lateral connections between neurons in a

given layer or previous layers, the network is said to be a feed-forward network.

Networks with connections from one layer back to a previous layer are called feedback

networks. Lateral connections between neurons in the same layer are also called feedback

connections. In certain cases, a neuron has feedback from its output to its own input. In all cases, these connections have weights that must be trained.

Each of the connections between neurons has an adjustable weight (shown in

Figure (4.5)). This simple neural network is a fully connected, feed-forward network with four neurons in the input layer, three in the middle or hidden layer, and two in the output layer. The individual weights are shown on the connections and are designated by symbols such as wij. For instance, the symbol w26 indicates the weight on the connection between neurons 2 and 6.

Let us consider the neural network of Figure (4.5), which has an input vector X consisting of components x1, x2, x3, x4, and an output vector Y having components y8 and y9. When a signal x1 is applied to neuron 1 in the input layer, the output of x1 goes to each of the artificial neurons in the middle or hidden layer, passing through weights w15, w16, and w17. The input signals x2, x3, and x4 behave in a similar manner,

Figure (4.5) Feedforward Neural Network


sending signals to neurons 5, 6, and 7 with the appropriate weights (shown in Figure (4.5)).

Now consider the behavior of neuron 6. It has four inputs from the four neurons in the input layer that have been modified by the connection weights w16, w26, w36, and w46. The first part of this neuron simply sums up these four weighted inputs. This summation is then passed to the second part of the neuron, which is typically a nonlinear function (e.g. a logistic curve between 0 and 1 as shown in Figure (4.3c)). The output of this activation or squashing function is then sent to neurons 8 and 9 through weights w68 and w69. Neurons 5 and 7 behave in a similar manner. Neurons 8 and 9 collect the weighted inputs from neurons 5, 6, and 7, sum them, and pass the sums through their activation functions to produce y8 and y9, the components of the output vector Y.

It is convenient to use vector and matrix notation in writing the input-output relationship between layers. For simplicity, assuming the activation functions are linear, the input-output relationship can be written in vector-matrix form as:

Vj = Wij · Xi    (4.1)

where Vj is the output vector of the neurons in the j-th layer, Wij is the matrix of weights connecting neurons in layers i and j, and Xi is the input vector. For example, in Figure (4.5), V5, the output of neuron 5 in the j-th layer, can be explicitly written as:

V5 = w15 x1 + w25 x2 + w35 x3 + w45 x4    (4.2)


Similarly, the output vector Yk of layer k can be written as the dot product of the weight matrix Wjk and the input vector Vj, that is

Yk = Wjk · Vj    (4.3)

Combining (4.1) and (4.3), Yk can be written as:

Yk = Wjk · Wij · Xi    (4.4)
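The composition of linear layers in equations (4.1) through (4.4) can be checked numerically. The sketch below uses plain Python lists; the layer sizes (4 inputs, 3 hidden, 2 outputs) follow the example network of Figure (4.5), but all weight and input values are invented for illustration.

```python
# Minimal sketch of the linear input-output relationship of equations (4.1)-(4.4).

def mat_vec(W, x):
    """Multiply matrix W (one row per destination neuron) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical weights from input layer i (4 neurons) to hidden layer j (3 neurons)
W_ij = [[0.1, 0.2, 0.3, 0.4],
        [0.5, 0.4, 0.3, 0.2],
        [0.2, 0.1, 0.0, 0.1]]
# Hypothetical weights from hidden layer j to output layer k (2 neurons)
W_jk = [[0.3, 0.6, 0.9],
        [0.2, 0.4, 0.8]]

X = [1.0, 0.5, -0.5, 2.0]     # an arbitrary input vector

V_j = mat_vec(W_ij, X)        # equation (4.1): hidden-layer outputs
Y_k = mat_vec(W_jk, V_j)      # equation (4.3): output-layer outputs
# Equation (4.4) states the same result as one composed linear map W_jk . W_ij . X
```

With purely linear activations, the two-step computation collapses into a single matrix product, which is exactly why nonlinear activation functions are needed for the hidden layer to add representational power.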

4.6 Learning and Recall

Neural networks perform two major functions: learning and recall. Learning is the

process of adapting the connection weights in an artificial neural network to produce the

desired output vector in response to a stimulus vector presented to the input buffer. Recall

is the process of accepting an input stimulus and producing an output response in

accordance with the network weight structure. Recall occurs when a neural network

globally processes the stimulus presented at its input buffer and creates a response at the

output buffer. Recall is an integral part of the learning process since a desired response to

the network must be compared to the actual output to create an error function.

The learning rules of neural computation indicate how connection weights are adjusted in response to a learning example. In supervised learning, the artificial neural

network is trained to give the desired response to a specific input stimulus. In graded

learning, the output is "graded" as good or bad on a numerical scale, and the connection

weights are adjusted in accordance with the grade.


In unsupervised learning, no specific response is sought, but rather, the response

is based on the network's ability to organize itself. Only the input stimuli are applied to

the input buffers of the network. The network then organizes itself internally so that each

hidden neuron responds strongly to a different set of input stimuli. These sets of input

stimuli represent clusters in the input space (which often represent distinct real-world

concepts or features).

The vast majority of learning in engineering applications involves supervised

learning. In this case, a stimulus buffer representing the input vector is presented at the

input, and another stimulus representing the desired response to the given input is

presented at the output buffer. This desired response must be provided by a

knowledgeable teacher. The difference between actual output and desired response

constitutes an error, which is used to adjust the connection weights. In other cases, the

weights are adjusted in accordance with criteria that are prescribed by the nature of the

learning process, as in competitive learning or in Hebbian learning.

There are a number of common supervised learning algorithms utilized in neural

networks. Perhaps the oldest is Hebbian learning, named after Donald Hebb, who

proposed a model for biological learning (Hebb, 1949), where a connection weight is

incremented if both the input and the desired output are large. This type of learning

comes from the biological world, where a neural pathway is strengthened each time it is

used. "Delta rule" learning takes place when the error (i.e., the difference between the

desired output response and the actual output response) is minimized, usually by a least

squares process. Competitive learning, on the other hand, occurs when the artificial


neurons compete among themselves, and only the one that yields the largest response to a

given input modifies its weight to become more like the input. There is also random

learning in which random incremental changes are introduced into the weights, and then

either retained or dropped, depending upon whether the output is improved or not (based

on whatever criteria the user specifies).
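The Hebbian idea described above, strengthening a connection when both its input and the desired output are large, can be sketched in a few lines. The learning rate and all numeric values here are invented for illustration; real Hebbian variants add normalization or decay terms not shown.

```python
# Minimal sketch of a Hebbian weight update: each weight grows in proportion
# to the product of its input signal and the desired output (invented values).

def hebbian_update(weights, inputs, desired, rate=0.1):
    """Strengthen w_i in proportion to input i times the desired output."""
    return [w + rate * x * desired for w, x in zip(weights, inputs)]

w = [0.0, 0.0, 0.0]
w = hebbian_update(w, inputs=[1.0, 0.0, 1.0], desired=1.0)
# weights on the active pathways (inputs 1 and 3) grow; the idle one is unchanged
```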

In the recall process, a neural network accepts the signal presented at the input

buffer and then produces a response at the output buffer that is determined by the training

of the network. The simplest form of recall occurs when there are no feedback

connections from one layer to another or within a layer (i.e., the signals flow from the input buffer to the output buffer in a "feed-forward" manner). In a feed-forward network,

the response is produced in one cycle of calculations by the computer.


5. BACK-PROPAGATION NEURAL NETWORK

5.1 General

Backpropagation is a systematic method for training multiple (three or more) layer artificial neural networks. The elucidation of this training algorithm in 1986 by Rumelhart, Hinton, and Williams (1986) was the key step in making neural networks

practical in many real-world situations. However, Rumelhart, Hinton, and Williams were

not the first to develop the backpropagation algorithm. It was developed independently

by Parker (1987) in 1987 and earlier by Werbos (1974) in 1974 as part of his Ph.D.

dissertation at Harvard University. Nevertheless, the backpropagation algorithm was

critical to the advances in neural networks because of the limitations of the one- and two-

layer networks discussed previously. Indeed, backpropagation played a critically

important role in the resurgence of the neural network field in the mid-1980s. Today, it is

estimated that 80% of all applications utilize this backpropagation algorithm in one form

or another. In spite of its limitations, backpropagation has dramatically expanded the

range of problems to which neural networks can be applied, perhaps because it has a

strong mathematical foundation.

A backpropagation neural network is typically initialized with random weights on its

synapses. During training when the network is exposed to a data set of input and output

vectors, its weights are incrementally adjusted to reduce the difference between estimated

and actual output vectors. When a sufficiently small difference (mean square error)

between the two output vectors has been achieved, training of the network is terminated.


At this point, the network is ready to categorize previously unknown inputs in the

application it was trained for.

Backpropagation belongs to the class of algorithms that perform gradient descent.

As such, it is a descendant of the Widrow-Hoff or delta rule which will be discussed in

the following section.

Among the advantages of backpropagation is its ability to store numbers of patterns far in excess of its built-in vector dimensionality. On the other hand, it requires very large training sets in the learning phase, and the minima it converges to may or may not be global minima.

5.2 Widrow-Hoff Delta Learning Rule

Consider a typical neuron, shown in Figure (5.1), with inputs xi, weights wi, a summation function in the left half of the neuron, and a nonlinear activation function in the right half. The summation of the weighted inputs, designated by I, is given by:

I = Σ_{i=1}^{n} wi xi    (5.1)

The nonlinear activation function used is the typical sigmoidal function and is given by:

Φ(I) = 1 / (1 + e^(−aI))    (5.2)

This function is, in fact, the logistic function, one of several sigmoidal functions, which monotonically increase from a lower limit (0 or −1) to an upper limit (+1) as I increases.

Figure (5.1) Sketch of an artificial neuron


A plot of a logistic function is shown in Figure (4.3c), in which values vary between 0

and 1, with a value of 0.5 when I is zero. An examination of this figure shows that the

derivative (slope) of the curve asymptotically approaches zero as the input I approaches

minus infinity and plus infinity, and it reaches a maximum value of a/4 when I equals

zero as shown in Figure (5.2). Since this derivative function will be utilized in

backpropagation, it can be reduced to its most simple form. If we take the derivative of

equation (5.2), we get

dΦ(I)/dI = (−1)(1 + e^(−aI))^(−2) e^(−aI)(−a) = a e^(−aI) Φ²(I)    (5.3)

If we solve equation (5.2) for e'"', substitute it into equation (5.3), and simplify,

we get

dΦ(I)/dI = a[1 − Φ(I)]Φ(I) = a(1 − Φ)Φ    (5.4)

where Φ(I) has been simplified to Φ by dropping (I). It is important to note that multi-layer networks have greater representational power than single-layer networks only if nonlinearities are introduced. The logistic function (also called the "squashing" function)

provides the needed nonlinearity. However, in the use of the backpropagation algorithm,

any nonlinear function can be used if it is everywhere differentiable and monotonically

increasing with I. Sigmoid functions, including logistic, hyperbolic tangent, and

arctangent functions, meet these requirements.
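The closed form of equation (5.4) can be verified numerically against a finite-difference slope of equation (5.2). The sketch below uses only the standard library; the gain a = 2 is an arbitrary choice (for which the peak slope a/4 equals 0.5).

```python
import math

# Numerical check of equations (5.2) and (5.4): the logistic activation
# Phi(I) = 1 / (1 + exp(-a*I)) and its derivative a*(1 - Phi)*Phi, which
# peaks at a/4 when I = 0.

def logistic(I, a=2.0):
    return 1.0 / (1.0 + math.exp(-a * I))

def logistic_derivative(I, a=2.0):
    phi = logistic(I, a)
    return a * (1.0 - phi) * phi      # equation (5.4)

# logistic(0) = 0.5, and the slope at I = 0 equals a/4;
# far from zero the slope decays toward 0, the "saturation" discussed below.
```

Comparing `logistic_derivative` with a centered difference of `logistic` at a few points confirms that the algebraic simplification of equations (5.3)-(5.4) is exact.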

The use of a sigmoid (squashing) function provides a form of "automatic gain

control"; that is, for small values of I near zero, the slope of the input-output curve is

Figure (5.2) Derivative of Logistic Function (curves for a = 1, 2, 3, and 6)


steep, producing a high gain. All sigmoid activation functions have derivatives with bell-shaped curves of the type shown in Figure (5.2). The derivative of the sigmoid has its largest

value (slope) in the transition region. Once the unit has been saturated, either positively

or negatively, the derivative of the sigmoid is almost zero. Hence, the derivative is used

to control when units are allowed to make relatively large changes to their input

connections, because it governs the magnitude of the error signal at each output unit.

Output units that are saturated are very slow to change, while output units whose inputs have not yet saturated them can be strongly influenced by an error signal.

The Widrow-Hoff delta learning rule can be derived by considering the node of Figure (5.3), where T is the target or desired value vector and I is defined by equation (5.1) as the dot product of the weight and input vectors and is given by

I = Σ_{i=1}^{n} wi xi    (5.5)

For this derivation, no quantizer or other nonlinear activation function is included, but the result presented here is equally valid when such nonlinear elements are included.

From Figure (5.3), the squared error ε² as a function of all weights wi can be written as:

ε² = (T − I)²    (5.6)

The gradient of the squared error is the vector of partial derivatives with respect to each of these i weights:

Figure (5.3) Sketch of a neuron without activation function


∂ε²/∂wi = −2(T − I) ∂I/∂wi = −2(T − I) xi    (5.7)

Since this gradient involves only the i-th weight component, the summation of equation (5.5) disappears.

The Widrow-Hoff delta-training rule ensures that the change in each weight vector component is proportional to the negative of its gradient:

Δwi = −K ∂ε²/∂wi = 2K(T − I) xi = 2Kε xi    (5.8)

where K is a constant of proportionality. The negative sign is introduced because a minimization process is involved. It is common to normalize the input vector component xi by dividing by |X|². Equation (5.8) now becomes

Δwi = [2K / |X|²] ε xi    (5.9)

if we define the learning constant η to be equal to the terms in the brackets:

η = 2K / |X|²    (5.10)
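Equations (5.5) through (5.10) amount to a one-line weight update, sketched below for a single linear neuron. The input vector, target, and learning constant are invented for illustration, with the normalization of equation (5.10) absorbed into one constant eta.

```python
# Sketch of the Widrow-Hoff delta rule: I = sum(w_i * x_i), error = T - I,
# and each weight moves by eta * error * x_i (equations (5.5)-(5.10)).

def widrow_hoff_step(w, x, target, eta=0.5):
    I = sum(wi * xi for wi, xi in zip(w, x))    # equation (5.5)
    error = target - I                          # epsilon in equation (5.8)
    return [wi + eta * error * xi for wi, xi in zip(w, x)], error

w = [0.0, 0.0]
for _ in range(50):
    w, err = widrow_hoff_step(w, x=[1.0, 0.5], target=2.0)
# repeated steps drive the weighted sum I toward the target of 2.0
```

Because the update moves each weight against the error gradient, the error shrinks by a fixed factor per step on this single training pair, illustrating the least-squares minimization described above.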

5.3 Multi-layer Back-propagation Training

Before discussing the details of the backpropagation process, let us consider the

benefits of the middle layer(s) in an artificial neural network. A network with only two

layers (input and output) can only represent the input with whatever representation


already exists in the input data. Hence, if the data are discontinuous or nonlinearly

separable, the innate representation is inconsistent, and the mapping cannot be learned.

Adding a third (middle) layer to the artificial neural network allows it to develop its own

internal representation of this mapping. Having this rich and complex internal

representation capability allows the hierarchical network to learn any mapping, not just

linearly separable ones.

Some guidance to the number of neurons in the hidden layer is given by

Kolmogorov's theorem as it is applied to artificial neural networks. In any artificial neural

network, the goal is to map any real vector of dimension m into any other real vector of

dimension n. Let us assume that the input vectors are scaled to lie in the region from 0 to

1, but there are no constraints on the output vector. Then, Kolmogorov's theorem tells us

that a three-layer neural network exists that can perform this mapping exactly (not an

approximation) and that the input layer will have m neurons, the output layer will have n

neurons, and the middle layer will have 2m + 1 neurons. Hence, Kolmogorov's theorem

guarantees that a three-layer artificial neural network will solve all nonlinearly separable

problems. What it does not say is that (1) this network is the most efficient one for this

mapping, (2) a smaller network cannot also perform this mapping, or (3) a simpler

network cannot perform the mapping just as well. Unfortunately, it does not provide

enough detail to find and build a network that efficiently performs the mapping we want.

It does, however, guarantee that a method of mapping does exist in the form of an

artificial neural network [Poggio and Girosi, (1990)].


Let us consider the three-layer network shown in Figure (5.4), where all activation

functions are logistic functions. It is important to note that back-propagation can be

applied to an artificial neural network with any number of hidden layers [Werbos,

(1994)]. The training objective is to adjust the weights so that the application of a set of

inputs produces the desired outputs. To accomplish this, the network is usually trained

with a large number of input-output pairs, which we also call examples.

The training procedure is as follows:

1. Randomize the weights to small random values (both positive and negative) to

ensure that the network is not saturated by large values of weights. (If all

weights start at equal values, and the desired performance requires unequal

weights, the network would not train at all.)

2. Select a training pair from the training set.

3. Apply the input vector to network input.

4. Calculate the network output.

5. Calculate the error (the difference between the network output and the desired

output).

6. Adjust the weights of the network in a way that minimizes this error. (This

adjustment process is discussed later.)

7. Repeat steps 2-6 for each pair of input-output vectors in the training set until

the error for the entire system is acceptably low.
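The seven-step procedure above can be sketched as a runnable training loop. To keep the sketch self-contained it uses a single linear neuron with the delta rule in place of the full backpropagation adjustment of Section 5.3, and the training set (a noiseless y = 2·x1 − x2 mapping) is invented for illustration.

```python
import random

# Runnable skeleton of training steps 1-7, with a single linear neuron
# standing in for the full network (all numeric choices are illustrative).

random.seed(0)
weights = [random.uniform(-0.1, 0.1) for _ in range(2)]    # step 1: small random weights

training_set = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

for epoch in range(200):                                   # step 7: repeat over the set
    total_sq_error = 0.0
    for x, target in training_set:                         # steps 2-3: apply a pair
        output = sum(w * xi for w, xi in zip(weights, x))  # step 4: network output
        error = target - output                            # step 5: error
        weights = [w + 0.1 * error * xi                    # step 6: adjust weights
                   for w, xi in zip(weights, x)]
        total_sq_error += error ** 2
    if total_sq_error < 1e-10:                             # acceptably low error
        break
```

The loop recovers weights close to (2, −1), the coefficients used to generate the training pairs.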

Figure (5.4) Sketch of multilayer backpropagation neural network


Training of an artificial neural network involves two passes. In the forward pass the

input signals propagate from the network input to the output. In the reverse pass, the

calculated error signals propagate backward through the network, where they are used to

adjust the weights. The calculation of the output is carried out, layer by layer, in the

forward direction. The output of one layer is the input to the next layer. In the reverse

pass, the weights of the output neuron layer are adjusted first since the target value of

each output neuron is available to guide the adjustment of the associated weights, using

the delta rule. Next, middle layer weights are adjusted. The problem is that the middle-

layer neurons have no target values. Hence, the training is more complicated, because the

error must be propagated back through the network, including the nonlinear functions,

layer by layer.

5.3.1 Calculation of Weights for the Output-Layer Neurons

Let us consider the details of the backpropagation learning process for the weights of the output layer. Figure (5.5) is a representation of a train of neurons leading to the hidden and output layers, with neurons 5 and 8, input weights w25 and w58, and outputs Φ25(I) and Φ58(I), respectively. For illustration purposes, the path with a target value of T8 will be considered here in the calculation of weights. The notation (I) in Φ will be dropped for convenience. The output of the neuron in layer k is subtracted from its target value and squared to produce the squared error signal, which for neuron 8 at layer k is

ε8² = [T8 − Φ58]² = [T8 − y8]²    (5.11)

Figure (5.5) Layers of neurons for calculating weights at output during training


Since only one output error is involved,

ε² = ε8² = [T8 − y8]²    (5.12)

The delta rule indicates that the change in a weight is proportional to the rate of change of the squared error with respect to that weight, that is,

Δw58 = −η58 ∂ε²/∂w58    (5.13)

where η58 is a constant of proportionality called the learning rate. To evaluate this partial derivative, the chain rule of differentiation is used:

∂ε²/∂w58 = (∂ε²/∂y8)(∂y8/∂I58)(∂I58/∂w58)    (5.14)

Each of these terms is evaluated in turn. The partial derivative of equation (5.12) with respect to y8 gives

∂ε²/∂y8 = −2[T8 − y8]    (5.15)

From equation (5.4), we get

∂y8/∂I58 = a y8[1 − y8]    (5.16)

From Figure (5.4), I58 is the sum of the weighted inputs from the middle layer, that is,

I58 = Σ_{j=4}^{6} wj8 Φmj ,  m = j − 3    (5.17)


Taking the partial derivative with respect to w58 yields

∂I58/∂w58 = Φ25    (5.18)

Since we are dealing with one weight, only one term of the summation of equation (5.17) remains. Substituting equations (5.15), (5.16), and (5.18) into equation (5.14) gives

∂ε²/∂w58 = −2a[T8 − y8] y8[1 − y8] Φ25 = −δ58 Φ25    (5.19)

where δ58 is defined as

δ58 = 2a[T8 − y8] y8[1 − y8] = 2ε8 ∂y8/∂I58    (5.20)

Substituting equation (5.19) into equation (5.13) gives

Δw58 = −η58 ∂ε²/∂w58 = η58 δ58 Φ25    (5.21)

or

w58(N + 1) = w58(N) + η58 δ58 Φ25    (5.22)

Equation (5.22) can be written for any neuron in layers j and k as:

wjk(N + 1) = wjk(N) + ηjk δjk Φij    (5.23)

where N is the number of iterations. An identical process is performed for each weight of the output layer to give the adjusted values of the weights. The error term δ58 from equation (5.20) is used to adjust the weights of the output-layer neurons using equations


(5.22) and (5.23). It is useful to discuss why the derivative of the activation function is involved in this process. In equation (5.20), we have calculated an error term which must be propagated back through the network. This error exists because the output neurons

generate the wrong outputs. The reasons are (1) their own incorrect weights and (2) the

middle-layer neurons generated the wrong output. To assign this blame, we

backpropagate the errors for each output-layer neuron, using the same interconnections

and weights as the middle layer used to transmit its outputs to the output layer.

When a weight between a middle-layer neuron and an output-layer neuron is large, and the output-layer neuron has a very large error, the weights of the middle-layer neuron may be assigned a very large error, even if that neuron has a very small output and could not have contributed significantly to the output error. By applying the

derivative of the squashing function, this error is moderated, and only small to moderate

changes are made to the middle-layer weights because of the bell-shaped curve of the

derivative function (shown in Figure (5.2)).
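The output-layer update of equations (5.19) through (5.23) can be sketched for the single path of Figure (5.5). All numeric values below (gain, learning rate, current output, target, and hidden-layer output) are invented for illustration.

```python
# Sketch of the output-layer weight update: for output neuron 8,
# delta_58 = 2a*(T8 - y8)*y8*(1 - y8)  (equation (5.20)), and the weight
# from hidden neuron 5 moves by eta * delta_58 * phi_25 (equations (5.21)-(5.23)).

def output_delta(target, y, a=1.0):
    return 2.0 * a * (target - y) * y * (1.0 - y)     # equation (5.20)

def update_output_weight(w_58, phi_25, target, y, eta=0.25):
    delta_58 = output_delta(target, y)
    return w_58 + eta * delta_58 * phi_25              # equation (5.22)

y_8 = 0.6        # current output of neuron 8 (illustrative)
T_8 = 0.9        # target value (illustrative)
w_new = update_output_weight(w_58=0.5, phi_25=0.8, target=T_8, y=y_8)
# the weight increases, since the output is below its target and phi_25 > 0
```

Note the factor y8(1 − y8) from the sigmoid derivative: exactly as discussed above, it shrinks the update when the output neuron is saturated near 0 or 1.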

5.3.2 Calculation of Weights for the Hidden Layer Neurons

Backpropagation trains hidden layers by propagating the adjusted error back through the network, layer by layer, adjusting the weights of each layer as it goes. The equations for the hidden layer are the same as for the output layer except that the error term δ25 must be generated without a target vector. We must compute δ25 for each


neuron in the middle layer that includes contributions from errors in each neuron in the

output layer to which it is connected.

Consider a single neuron, number 5, in the hidden layer just before the output layer (see Figure (5.5)). In the forward pass, this neuron propagates its output value to neurons 7, 8, and 9 in the output layer through the interconnecting weights w57, w58, and w59. During training, these weights operate in reverse order, passing the value of the error δ from the output layer back to the hidden layer. For example, the error term δ58 is produced when the signal flows back from neuron 8 to neuron 5. Each of these connection weights is multiplied by the δ value of the neuron in the output layer to which it connects. The value of δ needed for the hidden-layer neuron is produced by summing all such products.

The arrangement in Figure (5.6) shows the errors that are backpropagated to produce the change in w25. Since all error terms of the output layer are involved, the partial derivative involves a summation over the outputs of the neurons in the k-th layer. The procedure for calculating δ25 is substantially the same as that for calculating δ58: start with the derivative of the squared error with respect to the weight for the middle layer that is to be adjusted. Then, in a manner analogous to equation (5.13), delta rule training gives

Δw25 = −η25 ∂ε²/∂w25    (5.24)

Figure (5.6) Representation of neurons for calculating the change of weight in the hidden layer


where the total mean square error is now defined as:

ε² = Σ_{k=7}^{9} εk² = Σ_{k=7}^{9} [Tk − yk]²    (5.25)

Since several output errors may be involved, the learning constant η25 is usually, but not necessarily, equal to η58.

Using the chain rule of differentiation, the last term of equation (5.24) can be written as:

∂ε²/∂w25 = Σ_{k=7}^{9} (∂εk²/∂yk)(∂yk/∂I5k)(∂I5k/∂Φ25)(∂Φ25/∂I25)(∂I25/∂w25)    (5.26)

Let us consider neurons 5 and 8. Each term in equation (5.26) is evaluated similarly to equation (5.14). The first two terms are already given by equations (5.15) and (5.16), which are:

∂ε8²/∂y8 = −2[T8 − y8] = −2ε8    (5.27)

∂y8/∂I58 = a y8[1 − y8]    (5.28)

Taking the derivative of equation (5.17) with respect to Φ25 gives

∂I58/∂Φ25 = w58    (5.29)

Note that only one connection is involved.


Changing equation (5.16) to account for the middle layer gives

∂Φ25/∂I25 = a Φ25[1 − Φ25]    (5.30)

Similarly, equation (5.17) can be rewritten in terms of the middle layer by substituting the i-th-layer input xi for the j-th-layer input Φmj; taking the derivative gives

∂I25/∂w25 = x2    (5.31)

Substituting equations (5.28) through (5.31) into equation (5.26) yields

∂ε²/∂w25 = −Σ_{k=7}^{9} 2a[Tk − yk] yk(1 − yk) w5k a Φ25(1 − Φ25) x2    (5.32)

Using the definition of δ58, or more generally δnk, from equation (5.20) yields

∂ε²/∂w2n = −Σ_{k=7}^{9} δnk wnk (∂Φ2n/∂I2n) x2 ,  n = 4, 5, 6    (5.33)

Defining δ2n as

δ2n = Σ_{k=7}^{9} δnk wnk (∂Φ2n/∂I2n)    (5.34)

equation (5.33) becomes

∂ε²/∂w2n = −δ2n x2    (5.35)

Substituting equations (5.33) and (5.34) into equation (5.24) gives

Δw2n(N + 1) = η2n δ2n x2    (5.36)

and hence

w2n(N + 1) = w2n(N) + η2n x2 δ2n(N)    (5.37)

If there is more than one middle layer of neurons, this process moves through the network, layer by layer, toward the input, adjusting the weights as it goes. After completion, a new training input is applied and the process repeats until an acceptably low error is reached. At this point the network is trained.
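The two passes of Sections 5.3.1 and 5.3.2, forward propagation, output-layer deltas, back-propagated hidden-layer deltas, then weight updates, can be sketched end to end in pure Python. The layer sizes (2-2-1), the training pair, the learning rate, the fixed gain a = 1, and the omission of bias terms are all choices of this sketch, not prescriptions from the text.

```python
import math

# End-to-end sketch of one backpropagation training step for a three-layer
# network with logistic activations (all numeric values are illustrative).

def phi(I):
    return 1.0 / (1.0 + math.exp(-I))

def train_step(x, target, W_ij, W_jk, eta=0.5):
    # Forward pass: equations (4.1) and (4.3) with logistic activations
    hidden = [phi(sum(w * xi for w, xi in zip(row, x))) for row in W_ij]
    outputs = [phi(sum(w * h for w, h in zip(row, hidden))) for row in W_jk]
    # Output-layer deltas, as in equation (5.20) (the factor 2a folded into eta)
    d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, outputs)]
    # Hidden-layer deltas, as in equation (5.34): weighted sum of output deltas
    d_hid = [h * (1.0 - h) * sum(d_out[k] * W_jk[k][j] for k in range(len(d_out)))
             for j, h in enumerate(hidden)]
    # Weight updates, equations (5.23) and (5.37)
    for k, dk in enumerate(d_out):
        W_jk[k] = [w + eta * dk * h for w, h in zip(W_jk[k], hidden)]
    for j, dj in enumerate(d_hid):
        W_ij[j] = [w + eta * dj * xi for w, xi in zip(W_ij[j], x)]
    return outputs

W_ij = [[0.2, -0.1], [0.4, 0.3]]   # input -> hidden weights (2 hidden neurons)
W_jk = [[0.1, -0.2]]               # hidden -> output weights (1 output neuron)
for _ in range(3000):
    out = train_step([1.0, 0.5], [0.8], W_ij, W_jk)
# repeated steps drive the network output toward the 0.8 target
```

Note how the hidden-layer delta reuses the forward-pass weights W_jk in reverse, exactly the "weights operate in reverse order" mechanism described above.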

5.4 Momentum

The gradient descent can be very slow if the learning rate η is small and can oscillate widely if η is too large. This problem essentially results from error-surface valleys with steep sides but a shallow slope along the valley floor. One efficient and commonly used method that allows a large learning rate without divergent oscillations is the addition of a momentum term to the normal gradient descent method [Plaut et al., (1986)]. The idea is to give the weight some inertia or momentum so that it tends to change in the direction of the average downhill force that it feels. Hence, equation (5.23) becomes

Δwjk(N + 1) = η δjk Φij + μ Δwjk(N)    (5.38)

Page 116: Artificial neural networks and conditional stochastic ... · Artificial neural networks and conditional stochastic simulations for characterization of aquifer heterogeneity Item Type

112

where μ is the momentum coefficient (typically about 0.9). The new value of the weight then becomes equal to the previous value of the weight plus the weight change of equation (5.38), which includes the momentum term. Equation (5.23) now becomes

wjk(N + 1) = wjk(N) + Δwjk(N + 1)    (5.39)

This process works well in many problems, but not so well in others. Another way of viewing the purpose of momentum is that it overcomes the effects of local minima. The momentum term will often carry the weight-change process through one or more local minima, so that the network can converge toward the global minimum. This is perhaps its most

important function.
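The momentum update of equations (5.38) and (5.39) can be sketched as a two-line rule. The gradient term, learning rate, and starting weight below are invented; the coefficient μ = 0.9 follows the typical value quoted above.

```python
# Sketch of the momentum update: the new weight change blends the current
# gradient step with the previous change, scaled by the momentum coefficient.

def momentum_step(w, prev_dw, grad_term, eta=0.1, mu=0.9):
    """grad_term stands for delta_jk * phi_ij in equation (5.38)."""
    dw = eta * grad_term + mu * prev_dw     # equation (5.38)
    return w + dw, dw                       # equation (5.39)

w, dw = 0.5, 0.0
for _ in range(3):
    w, dw = momentum_step(w, dw, grad_term=1.0)
# with a constant downhill gradient the steps grow: 0.1, then 0.19, then 0.271
```

The growing step sizes under a persistent gradient direction illustrate the "inertia" described above: the weight accelerates along a consistent downhill direction, which is what carries it across shallow local minima.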


6. STOCHASTIC AND NEURAL NETWORK MODEL APPLICATIONS

6.1 Methodology

Assessment of the neural network approach or the stochastic approach under field conditions requires a large amount of transmissivity and head data free of measurement errors, which rarely exist. Model validation under field conditions is another issue

fraught with difficulty and uncertainty. A typical example of this can be viewed by

contrasting two approaches to estimation of the transmissivity distribution in Avra Valley

in Arizona. A composite objective function method was used by Clifton and Neuman (1982), and a geostatistical cokriging method was developed for this site by Rubin and Dagan (1987). There was a dramatic difference between the transmissivity

distributions predicted by these two methods. These differences are particularly disturbing when one recognizes that these two methods are among the more advanced and highly regarded techniques, and that this field involves an atypically large amount of transmissivity data. In any case, the results from this field application are essentially ambiguous; see Gelhar (1993) for more details.

In this study, two-dimensional steady-state saturated flow will be considered in the

stochastic simulations and the neural network design. Boundary conditions will be treated

deterministically so that the only random parameter is spatially heterogeneous

transmissivity. Transmissivity represents the permeability of the porous aquifer matrix,

with respect to the fluid properties of water, integrated vertically over the thickness of the


saturated zone of the aquifer. Data on transmissivity are commonly obtained from

pumping tests where a well is pumped for some period of time and water levels in the

well or nearby observation wells are monitored. Measurements thus obtained often show

less variability than measurements made in the laboratory on "undisturbed" aquifer

matrix samples or measurements obtained by slug injection or withdrawal, because

pumping tests average a greater volume of aquifer material. While the true sample

volume of a pumping test is unknown, treating transmissivity data as point measurements

makes sense only if the scale of the model is much greater than the scale encompassed by

the pumping test. A heterogeneous property averaged over larger volumes of sample

shows less variability and a longer correlation scale [Journel and Huijbregts, 1978] than

does the theoretical underlying point-scale random field. Using a finite element or finite

difference discretization for a numerical solution can introduce an artificial correlation

between transmissivity values unless the element size is much smaller than the

correlation scale of the transmissivity field [Dagan, 1985].

Because of the problems described above, for this study, neural networks and

stochastic approaches were assessed and compared using hypothetical heterogeneous

aquifers. It was assumed that all necessary statistical parameters characterizing the spatial

variability of Ln[T] are known exactly and measurements are considered error-free.

To demonstrate the performance of the neural network simulation, several

realizations of transmissivity fields with different degrees of heterogeneity for a

hypothetical two-dimensional aquifer were generated, and these fields are considered as

the real world or "true" fields. In this analysis, the perturbed f-field (transmissivity) was


generated using a spectral random field generator (Gutjahr, 1989, 1992, 1994), assuming

an exponential covariance function. Using the generated transmissivity fields, head fields

for these hypothetical aquifers were obtained from the governing flow equation (1) using Steve Schaffer's (1995) multi-grid finite difference method, with known boundary

conditions of types 1 and 2. The resultant head field is considered as the real-world or

"true" hydraulic head corresponding to the "true" transmissivity field.
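The spectral generation step can be sketched generically with FFT filtering of white noise; this is not Gutjahr's generator, and the grid spacing, domain size, and seed below are assumptions for illustration:

```python
import numpy as np

def spectral_field(n=32, dx=3.0 / 32, corr_len=0.5, variance=1.0, rng=None):
    # Stationary Gaussian field with exponential covariance
    # C(r) = variance * exp(-r / corr_len), synthesized by FFT filtering
    # of white noise on a periodic n x n grid.
    rng = rng if rng is not None else np.random.default_rng()
    lag = np.minimum(np.arange(n), n - np.arange(n)) * dx    # periodic lags
    r = np.hypot(lag[:, None], lag[None, :])
    cov = variance * np.exp(-r / corr_len)
    eigs = np.fft.fft2(cov).real      # spectrum of the circulant embedding
    eigs = np.maximum(eigs, 0.0)      # clip small negative eigenvalues
    white = rng.normal(size=(n, n))
    return np.fft.ifft2(np.sqrt(eigs) * np.fft.fft2(white)).real

# a perturbed Ln[T] field about the mean value used in this study
f = -3.0 + spectral_field(variance=1.0, rng=np.random.default_rng(0))
```

Averaged over many realizations, the sample variance of the generated fields approaches the target variance, which is the property the conditional simulations rely on.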

The layout of the hypothetical aquifer is shown in Figure (6.1). For the purpose of

random field simulation and numerical solution of the governing flow equation, the

domain was discretized into a grid of 32 × 32 uniform blocks, each measuring 1 × 1. Each

block was assigned a constant mean transmissivity value. The upper and lower sides of

the aquifer were defined as no-flux boundaries and the left and right-hand sides were assigned as prescribed head boundaries (to maintain a uniform gradient of 0.1). Once the

transmissivity field of the hypothetical aquifer was generated and the corresponding head field simulated from it, either a random or specified sampling scheme was then

employed to determine the sample locations for transmissivity and head data in the

hypothetical aquifer. Note that the aquifer dimensions could be scaled by any factor, provided the conductivity and constant-head boundaries are scaled by the same factor (Harvey and Gorelick, 1995).
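A minimal sketch of this boundary-value problem, solved here by simple Jacobi iteration with harmonic-mean inter-block transmissivities; this is illustrative only (the dissertation used Schaffer's multigrid code), and the boundary-head values are assumptions chosen to give the 0.1 gradient:

```python
import numpy as np

def solve_head(T, h_left, h_right, iters=2000):
    # Steady 2-D flow, div(T grad h) = 0, on a uniform block grid:
    # prescribed heads on the left/right columns, no-flux top and bottom.
    ny, nx = T.shape
    hm = lambda a, b: 2.0 * a * b / (a + b)      # harmonic mean of block T's
    Te = np.zeros((ny, nx)); Tw = np.zeros((ny, nx))
    Tn = np.zeros((ny, nx)); Ts = np.zeros((ny, nx))
    Te[:, :-1] = hm(T[:, :-1], T[:, 1:])         # east faces
    Tw[:, 1:] = Te[:, :-1]                       # west faces
    Ts[:-1, :] = hm(T[:-1, :], T[1:, :])         # south faces
    Tn[1:, :] = Ts[:-1, :]                       # north faces (zero at no-flux edges)
    h = np.linspace(h_left, h_right, nx) * np.ones((ny, 1))
    for _ in range(iters):                       # Jacobi sweeps
        num = np.zeros_like(h)
        den = Te + Tw + Tn + Ts
        num[:, :-1] += Te[:, :-1] * h[:, 1:]
        num[:, 1:] += Tw[:, 1:] * h[:, :-1]
        num[:-1, :] += Ts[:-1, :] * h[1:, :]
        num[1:, :] += Tn[1:, :] * h[:-1, :]
        h = num / den
        h[:, 0], h[:, -1] = h_left, h_right      # re-impose prescribed heads
    return h

# uniform Ln[T] = -3 field; head drop chosen to give a gradient of 0.1
T = np.full((32, 32), np.exp(-3.0))
h = solve_head(T, h_left=3.1, h_right=0.0)
```

For a uniform transmissivity field the computed head varies linearly between the prescribed boundaries, which is a quick sanity check before running the solver on heterogeneous realizations.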


[Figure: hypothetical aquifer layout. Grid size = 32 × 32; blocks = 1 × 1 uniform; mean Ln[T] = -3.0; integral scale = 0.5; σ_f² = 1, 2, and 5; no-flow boundaries on the upper and lower sides.]

Figure (6.1) Hypothetical Aquifer Layout


Both the stochastic (ICS) and the neural network approaches are examined and

compared using the mean square error (MSE) criterion:

MSE = (1/N) Σ_{i=1}^{N} (x_i − x̂_i)²        (6.1)

where x_i and x̂_i are the observed and estimated head or transmissivity values at location i, respectively, and N is the total number of nodes compared.
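In code, the criterion of equation (6.1) is simply (the sample arrays are illustrative):

```python
import numpy as np

def mse(observed, estimated):
    # equation (6.1): mean square error over the N compared nodes
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.mean((observed - estimated) ** 2)

mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])   # -> 0.4167 (= 1.25/3)
```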

6.2 Iterative Conditional Simulation (ICS)

Realizations of f-fields for three different cases with an exponential correlation

scale of 0.5 (approximately one sixth of the domain) in both x and y directions were

generated for the stochastic model application. All these cases used a mean transmissivity

value of Ln[T] = -3.0, and each case used a different variance of transmissivity: 1, 2, and 5.

The next step is to obtain the h-field that satisfies the full nonlinear flow equation.

Both the generated f-field and the boundary conditions given by the gradient of h are used to solve for the h-field. Both the f and h fields were sampled at specified locations using both uniform and non-uniform sampling schemes. For the uniform sampling scheme, thirty evenly spaced data locations of both the f and h fields, as shown in Figure (6.2), were sampled. Figure (6.3) shows ten transmissivity and thirty head sample locations for the non-uniform sampling scheme. Note that for both sampling schemes, f and h data are not necessarily sampled at the same locations.



[Figure: 32 × 32 grid with no-flow boundaries on the upper and lower sides, constant-head boundaries on the left and right, and thirty evenly spaced f and h sample locations.]

Figure (6.2) Uniform Sampling Scheme of Transmissivities and Head Data


[Figure: 32 × 32 grid with no-flow boundaries on the upper and lower sides, constant-head boundaries on the left and right, and ten f and thirty h sample locations.]

Figure (6.3) Non-Uniform Sampling Scheme of Transmissivities and Head Data


The final step is to compute covariances and cross-covariances between actual

conditioning locations, and to compute covariances between all grid locations and

conditioning locations to obtain the kriging matrices for conditioning. At this point, the

model is ready for actual conditioning.

To perform an actual conditioning, a new random f-field (realization) has to be generated and conditioned on the previously selected f data values at the specified locations. The corresponding h-field is obtained in the same manner mentioned above. By construction, these two conditioned fields should match, within measurement errors, the values of the sampled data. Note that the conditioned f and h fields do not actually satisfy the flow equation; they satisfy only its linearized form, and the discrepancy is pronounced when high variances of transmissivity are used. To overcome this limitation, an alternative method, which involves iterative conditioning, is used.
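The conditioning step itself can be sketched with simple kriging of the residuals between an unconditional realization and the data. The 1-D grid, covariance, and "measured" values below are hypothetical, and only f data are used for brevity; the co-conditioning on head data described here adds cross-covariances to the same kriging system:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 64)                        # hypothetical 1-D grid
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)   # exponential covariance
L = np.linalg.cholesky(C + 1e-8 * np.eye(64))
f_uncond = L @ rng.normal(size=64)                   # unconditional realization

obs_idx = np.array([5, 20, 40, 60])                  # sampled locations
f_obs = np.array([-0.5, 1.0, 0.2, -1.2])             # "measured" f values

def condition(f_uncond, obs_idx, f_obs, C):
    # simple kriging of the residuals: the conditioned field honors
    # the data exactly at the sampled locations
    C_oo = C[np.ix_(obs_idx, obs_idx)]               # data-data covariance
    W = C[:, obs_idx] @ np.linalg.inv(C_oo)          # kriging weights
    return f_uncond + W @ (f_obs - f_uncond[obs_idx])

f_cond = condition(f_uncond, obs_idx, f_obs, C)
```

At the sampled locations the kriging weights reduce to the identity, so the conditioned realization reproduces the data exactly, which is the matching property the text requires of the conditioned f-field.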

In the iterative approach, the transmissivity field is conditioned on both the head

and transmissivity measurements. This transmissivity field is then used, along with the

boundary conditions, to solve for a new head field. The resulting fields satisfy the

continuity conditions, but the iterated heads may not agree with the measured heads.

Consequently, the new head field at the previous data locations is used again to condition

the transmissivity field, and the process is repeated until the iterated heads are "close" to

the sampled heads. Unfortunately, the solution to the actual flow equation is very

different from the linearized equation for high variances of transmissivities. Thus,

iteration can only reduce the head differences, but it can never completely remove


them. However, this procedure seems to reduce the head differences significantly after four to six iterations.

6.3 Neural Network Simulation

The successful performance of a neural network is sensitive to its physical structure

or architecture (i.e., number of input nodes, hidden layer nodes, and output nodes). The

appropriate architecture for a neural network is highly problem dependent. In order to

map inputs of the real system to the system's outputs, the structure of the neural network

must first be defined. The number of neurons in the input and output layers coincides

with the number of the system inputs and outputs, respectively. The number of neurons

in the hidden layer H must be found empirically. The notation I-H-O is often used to

denote a network with I neurons in the input layer, H neurons in the hidden layer, and O

neurons in the output layer. For example, a 4-5-1 ANN has four, five, and one neuron in

the input, hidden, and output layers, respectively.

To reproduce relationships between input and output data sets of a system, an ANN

must be trained. Training consists of (i) calculating output sets from input sets, (ii)

comparing measured and calculated outputs, and (iii) adjusting the weights and the bias

in the transfer function for each neuron to decrease the difference between measured and

calculated values. Each input-output data set, which is often called a pattern, has to be

presented to the ANN many times until changes in error become insignificant. The

mean square sum of differences between measured and calculated values serves as a

measure of the goodness-of-fit.


In this study, a feed-forward backpropagation neural network was used to map

input of the real system to its output. Knowledge acquisition during training is

accomplished by supervised learning, in which the neural network is provided with the

desired response to the training input patterns. This learning approach allows the neural

network to develop generalization or abstraction capabilities, unlike unsupervised

learning (self-organization) or reinforcement (grading output).

6.3.1 Development of training and testing patterns

Generated transmissivity fields (realizations) as explained in sec. (2.5) served as

input training patterns into the ANN. Each training pattern consists of input and output

sets of data values. Transmissivity values at sampled locations are used as input to the

ANN (total number of neurons in the input layer equals the total number of sampled

locations). The corresponding outputs are the values of transmissivities at unsampled

locations (total number of neurons in the output layer equals the total number of

unsampled locations). The number of neurons in the hidden layer H is usually found empirically; one suggestion is the geometric mean of the numbers of neurons in the input and output layers, as recommended by Maren et al. (1990); another, given by Kolmogorov's theorem (1957), suggests that the number of neurons in the hidden layer is H = 2I + 1 (twice the number of input neurons plus one). Both suggestions were tried in this study, and no significant difference was found in the performance measure of the ANN. In order to keep the ANN as simple as possible, and without compromising performance, the second suggestion was used, guaranteeing a smaller number of connection weights.
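The two rules of thumb are easy to compare for the network sizes used in this study:

```python
import math

def hidden_sizes(n_in, n_out):
    # the two rules of thumb quoted above for the hidden-layer size H
    geometric = round(math.sqrt(n_in * n_out))   # Maren et al. (1990)
    kolmogorov = 2 * n_in + 1                    # 2I + 1
    return geometric, kolmogorov

hidden_sizes(30, 994)   # -> (173, 61): 61 is f-ANN's hidden layer
hidden_sizes(60, 994)   # second value is 121: fh-ANN's hidden layer
```

For 30 inputs and 994 outputs the geometric-mean rule asks for roughly three times as many hidden neurons as the 2I + 1 rule, which is why the latter gives a much smaller weight count.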


In this study, two different artificial neural networks (ANNs) were built to simulate the

transmissivity field of the hypothetical aquifer shown in Figure (6.1). Both ANNs have

three feed-forward layers (input, hidden and output), and they both share the same output

target (transmissivities at unsampled locations). The first ANN, named f-ANN, as shown in Figure (6.4), uses only transmissivity values at the sampled locations in its input layer. The corresponding output layer has neurons representing all unknown (to be estimated) transmissivity values at unsampled locations, as shown in Figure (6.4); this design corresponds to the aquifer with a uniform sampling scheme shown in Figure (6.2). With this input scheme, the f-ANN is written as 30-61-994 (input-hidden-output). The second ANN, named fh-ANN, is the same as f-ANN, except that it utilizes both sampled transmissivity and head fields in its input layer. The fh-ANN is designated as 60-121-994. A sketch of this ANN is shown in Figure (6.5); this design corresponds to the aquifer with a non-uniform sampling scheme shown in Figure (6.3).

The success of the training strongly depends on the normalization of the data and

on the training parameters, namely the learning rate and momentum coefficients.

Unfortunately, there are no general rules for selecting the optimal values of these

parameters and one must resort to experience. In this study, both networks were trained

the same way, using a total of 2000 generated training patterns, and learning rate and

momentum of 0.7 and 0.8 respectively.


[Figure (6.4), architecture of f-ANN, omitted.]


[Figure: three-layer architecture (input, hidden, and output layers) with f and h inputs; fh-ANN (US): 60-121-994, (NUS): 40-81-994.]

Figure (6.5) Architecture of fh-ANN


The training procedure is as follows:

Step 0: Connection weights are first randomly initialized within the interval (-0.3, 0.3), using a standard random generator algorithm that produces uniformly distributed random numbers in a specified range. A small range was selected to ensure that the network was not saturated by large values of the weights.

Step 1: Pairs of training patterns are selected randomly from the training sets and presented to the ANN. Note that the data were normalized to fall within the interval (0.2, 0.8).

Step 2: Input vector is applied to the network input.

Step 3: Network output is calculated.

Step 4: Mean Square Error (MSE) is calculated, which is the difference between

the network output and the desired output.

Step 5: Connection weights are adjusted to minimize the MSE.

Step 6: Steps 1-5 are repeated for each pair of input-output vectors in the training

set, until no significant change in the MSE is detected for the system, or until a

desired value of MSE is reached.
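Steps 0-6 can be sketched as a minimal one-hidden-layer backpropagation loop with momentum. The toy data, network sizes, and stopping threshold below are illustrative assumptions; only the initialization range, normalization interval, and learning-rate/momentum values (0.7 and 0.8) come from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(a, lo=0.2, hi=0.8):
    # Step 1's normalization of the data into (0.2, 0.8)
    return lo + (hi - lo) * (a - a.min()) / (a.max() - a.min())

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy patterns: 4 inputs -> 2 outputs
X = normalize(rng.normal(size=(200, 4)))
Y = normalize(X @ rng.normal(size=(4, 2)))

I, H, O = 4, 9, 2                        # H = 2I + 1 hidden neurons
# Step 0: random weights in (-0.3, 0.3)
W1 = rng.uniform(-0.3, 0.3, (I, H)); b1 = np.zeros(H)
W2 = rng.uniform(-0.3, 0.3, (H, O)); b2 = np.zeros(O)
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)
eta, mu = 0.7, 0.8                       # learning rate and momentum

def net_mse():
    return np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - Y) ** 2)

mse0 = net_mse()
for epoch in range(300):
    for p in rng.permutation(len(X)):    # Step 1: patterns in random order
        x, y = X[p], Y[p]
        hid = sigmoid(x @ W1 + b1)       # Steps 2-3: forward pass
        out = sigmoid(hid @ W2 + b2)
        d_out = (out - y) * out * (1 - out)        # Step 4: output error signal
        d_hid = (d_out @ W2.T) * hid * (1 - hid)   # Step 5: backpropagate
        dW2 = -eta * np.outer(hid, d_out) + mu * dW2   # eq. (5.38)
        dW1 = -eta * np.outer(x, d_hid) + mu * dW1
        W2 += dW2; b2 -= eta * d_out               # eq. (5.39)
        W1 += dW1; b1 -= eta * d_hid
    mse_now = net_mse()
    if mse_now < 1e-4:                   # Step 6: stop at an acceptably low error
        break
```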

After training is completed, the final connection weights are kept fixed, and new

input patterns are presented to the network to produce the corresponding output

consistent with the internal representation of the input/output mapping. Notice that the

nonlinearity of the sigmoidal function in the processing elements allows the neural


network to learn arbitrary nonlinear mappings. Moreover, each node acts independently

of all the others and its functioning relies only on the local information provided through

the adjoining connections. In other words, the functioning of one node does not depend

on the states of those other nodes to which it is not connected. This allows for efficient

distributed representation and parallel processing, and for intrinsic fault-tolerance and

generalization capability.

Two common strategies for measuring how successfully the ANN has been trained

are to test the capability of the network to correctly predict the output for (1) the input

sets that were originally used to train the network, and (2) input sets that were not in the

training set. The first strategy is often referred to as an accuracy performance, and the

second as a generalization performance. Both f-ANN and fh-ANN showed an average

high accuracy performance of 99.9%; the average generalization performance over 100

new patterns was 95.4% and 96.3% for f-ANN and fh-ANN, respectively. Note that these high performance scores do not mean that a perfect match is found between the network output and the desired output; rather, they indicate that the networks are healthy, well trained, and capable of generalizing.
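The two strategies can be sketched with a hypothetical percent-of-variance score; the dissertation does not state the exact formula behind its performance percentages, so the scoring rule and the small arrays below are assumptions:

```python
import numpy as np

def performance(true, pred):
    # hypothetical score: percent of the variance in the true values
    # explained by the predictions (100% = perfect match)
    true, pred = np.asarray(true, float), np.asarray(pred, float)
    return 100.0 * (1.0 - np.mean((true - pred) ** 2) / np.var(true))

train_true = np.array([0.3, 0.5, 0.7, 0.4])
train_pred = train_true.copy()            # network reproduces its training set
test_true = np.array([0.2, 0.6, 0.8])     # patterns not used in training
test_pred = np.array([0.25, 0.55, 0.75])

accuracy = performance(train_true, train_pred)         # accuracy performance
generalization = performance(test_true, test_pred)     # generalization performance
```

Scoring the training inputs measures accuracy performance; scoring held-out inputs measures generalization performance, mirroring the two strategies described above.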


7. RESULTS AND DISCUSSION

To examine both stochastic and neural approaches, 100 f-fields for each of three f variances (1.0, 2.0, and 5.0) were generated, and their corresponding h-fields simulated. Both f and h fields represent real or "true" transmissivity and head distributions over the

entire domain. Both uniform and non-uniform sampling schemes were utilized for both

the stochastic and neural network applications. These applications were used to estimate

the true fields for all realizations. Although similar results were achieved for all fields,

only three randomly selected fields for each variance are discussed in detail below.

Combining the three variances with the two sampling schemes produced six

scenarios, shown in table (7.1). The table also shows the number of f and h data used for

each scenario. The results of the estimation methods are shown in figures, each having

seven contour maps. The contour maps show true and simulated f or h fields generated

by the (ICS) and the two neural networks (fh-ANN, f-ANN), utilizing both the uniform

and non-uniform sampling schemes. Corresponding to these contour maps are scatter-plots depicting real versus simulated f or h values at unsampled locations.


Table (7.1) Scenarios considered in the model applications

Scenario   Sampling scheme      # of f data   # of h data   σ_f²
   1       Uniform (US)              30            30          1
   2       Non-uniform (NUS)         10            30          1
   3       Uniform (US)              30            30          2
   4       Non-uniform (NUS)         10            30          2
   5       Uniform (US)              30            30          5
   6       Non-uniform (NUS)         10            30          5

For visual comparisons, all contour maps of the same figure are scaled with the

same range of colors as shown in the bar-scale. As the legends show, color shading

indicates the magnitudes of the transmissivity values over the domain.

Contour maps are shown in Figures (7.1) through (7.3) for Scenarios 1 and 2. Each

figure represents one of the three randomly selected transmissivity fields of variance 1.0.

For both scenarios, all three methods performed well in estimating the overall spatial

variability of the "true" transmissivity fields. The results of Scenario 1 (uniform

sampling) were always superior to those of Scenario 2 (non-uniform sampling),

especially for the conditional iterative simulation.


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 1, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 2, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.1) True and estimated f fields for σ_f² = 1.0, (RF#1)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 1, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 2, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.2) True and estimated f fields for σ_f² = 1.0, (RF#2)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 1, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 2, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.3) True and estimated f fields for σ_f² = 1.0, (RF#3)


The contour maps show that in general, the neural networks tend to smooth the

transmissivity fields. That is, gradual transitions from lower to higher regions of

transmissivity exist. This contrasts with the ICS, which produces abrupt changes in the

transmissivity fields.

For both Scenarios 1 and 2, the fh-ANN neural network outperformed both the f-ANN neural network and the ICS. This is verified by the computed mean square errors

as shown by the bar charts in Figure (7.4). The scatter-plots also show how uniform

sampling (Scenario 1) resulted in better estimates of the true transmissivity values than

non-uniform sampling (Scenario 2). This is shown by the greater concentration of points

along the 45-degree line for Scenario 1.

The results of Scenarios 3 and 4, shown in Figures (7.5) through (7.7), had similar

behavior to Scenarios 1 and 2. However, since the variance is twice that of Scenarios 1

and 2, the computed mean square errors for Scenarios 3 and 4 were larger, as shown in

Figure (7.8). Also, the discrepancy between the uniform (Scenario 3) and non-uniform

(Scenario 4) sampling estimates increased, particularly for the ICS and the (f-ANN)

neural network.

The results of Scenarios 5 and 6 are shown in Figures (7.9) through (7.11). There

is a significant difference in performance between the ICS and the neural networks. As

shown by the contour maps, regions of shading not seen in the true field appear in the

ICS fields. This indicates poorer performance in estimating the true transmissivity fields.

As with earlier Scenarios, the neural networks were better able to estimate the true fields.

This is shown by the close agreement in color shading of the contour maps. Both the


[Figure: scatter-plots of true vs. estimated f and mean square error bar charts, uniform and non-uniform sampling.]

Figure (7.4) True vs. estimated ICS, fh-ANN, and f-ANN f fields for σ_f² = 1.0, (RF# 1, 2, and 3)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 3, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 4, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.5) True and estimated f fields for σ_f² = 2.0, (RF#1)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 3, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 4, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.6) True and estimated f fields for σ_f² = 2.0, (RF#2)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 3, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 4, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.7) True and estimated f fields for σ_f² = 2.0, (RF#3)


[Figure: scatter-plots of true vs. estimated f, uniform and non-uniform sampling, for RF#1, RF#2, and RF#3.]

Figure (7.8) True vs. estimated ICS, fh-ANN, and f-ANN f fields for σ_f² = 2.0, (RF# 1, 2, and 3)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 5, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 6, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.9) True and estimated f fields for σ_f² = 5.0, (RF#1)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 5, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 6, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.10) True and estimated f fields for σ_f² = 5.0, (RF#2)


[Figure: contour maps of the true f field (a) and the fields estimated under Scenario 5, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 6, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.11) True and estimated f fields for σ_f² = 5.0, (RF#3)


scatter-plots and computed mean square errors shown in Figure (7.12) further support

these observations.

For all scenarios, the corresponding head contour maps and scatter-plots are shown

in Figures (7.13) through (7.24). Because of the non-linearity between f and h, the results

of the scenarios for head are not necessarily consistent with those of transmissivity.

The (f-ANN) neural network generally performed poorly in estimating the true

head field for almost all scenarios. As in the transmissivity case, all estimation methods

performed better with the uniform sampling scheme than with the non-uniform one.

In all scenarios utilizing non-uniform sampling, the (fh-ANN) neural network

outperformed the other two estimation methods. The ICS achieved better results in

estimating the head fields than the transmissivity fields. This is particularly true for the

uniform sampling scheme.

Figures (7.25) through (7.27) are scatter-plots depicting real versus simulated head

at all sampled locations for variances 1.0, 2.0, and 5.0, respectively. Each figure shows

both uniform and non-uniform sampling schemes for each of the three randomly selected

realizations.

For all variances of f, the conditional iterative simulation reproduces the exact

values of transmissivity at the sampled locations. For f variances of 1.0 and 2.0, the ICS

approach preserves the true values of h at the sampled locations. However, at a variance

of 5.0, the true h values were not preserved at all sampled locations, especially for the


[Figure (7.12): scatter-plots of true vs. estimated f at unsampled locations for σ_f² = 5.0, (RF# 1, 2, and 3).]


[Figure: contour maps of the true h field (a) and the fields simulated under Scenario 1, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 2, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.13) True and simulated h fields for σ_f² = 1.0, (RF#1)


[Figure: contour maps of the true h field (a) and the fields simulated under Scenario 1, (b) ICS (US), (d) fh-ANN (US), (f) f-ANN (US), and Scenario 2, (c) ICS (NUS), (e) fh-ANN (NUS), (g) f-ANN (NUS).]

Figure (7.14) True and simulated h fields for σ_f² = 1.0, (RF#2)

Figure (7.15) True and simulated h fields for σf² = 1.0, (RF#3). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]

Figure (7.16) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) h fields for σf² = 1.0, (RF# 1, 2, and 3). [Scatter-plot panels for uniform (hu1-…) and non-uniform (hn1-…) sampling.]

Figure (7.17) True and simulated h fields for σf² = 2.0, (RF#1). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]

Figure (7.18) True and simulated h fields for σf² = 2.0, (RF#2). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]

Figure (7.19) True and simulated h fields for σf² = 2.0, (RF#3). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]


Figure (7.21) True and simulated h fields for σf² = 5.0, (RF#1). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]

Figure (7.22) True and simulated h fields for σf² = 5.0, (RF#2). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]

Figure (7.23) True and simulated h fields for σf² = 5.0, (RF#3). [Panels: (a) true h field; (b) ICS (US); (c) ICS (NUS); (d) fh-ANN (US); (e) fh-ANN (NUS); (f) f-ANN (US); (g) f-ANN (NUS).]


Figure (7.25) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σf² = 1.0, (RF# 1, 2, and 3). [Scatter-plot panels for uniform (hu1-…) and non-uniform (hn1-…) sampling.]

Figure (7.26) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σf² = 2.0, (RF# 1, 2, and 3). [Scatter-plot panels for uniform (hu2-…) and non-uniform (hn2-…) sampling.]

Figure (7.27) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) head at sampled locations for σf² = 5.0, (RF# 1, 2, and 3). [Scatter-plot panels for uniform (hu5-…) and non-uniform (hn5-…) sampling.]


non-uniform sampling scheme. This failure to preserve all true values may be due to the

high variance of f magnifying the highly non-linear relationship between f and h.

Unlike the ICS, the neural network does not preserve the true values of h at the

sampled locations. For the neural network to do so, its estimated transmissivity fields

would have to be conditioned, which is not possible. This is because the neural network,

after sufficient training, maps input to output with a fixed weight vector. This weight

vector does not condition mapping to reproduce values at the sampled locations, but

seeks to minimize the global error.
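The point about fixed weights minimizing global error rather than honoring sample values can be illustrated with any global least-squares fit; the toy data and polynomial model below are illustrative stand-ins, not the dissertation's network:

```python
import numpy as np

# Illustration (not the dissertation's network): a model fit by minimizing
# global squared error does not, in general, reproduce the sampled values
# exactly, whereas conditioning forces exact agreement at sample locations.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 9)                      # 9 "sampled locations"
h = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

# Low-order least-squares fit: one fixed coefficient ("weight") vector.
coeffs = np.polyfit(x, h, deg=3)
h_fit = np.polyval(coeffs, x)

mse = float(np.mean((h_fit - h) ** 2))            # small global error ...
max_miss = float(np.max(np.abs(h_fit - h)))       # ... but samples not preserved
print(mse, max_miss)
```

The fitted curve minimizes the total error over all points, yet `max_miss` stays strictly positive: no sampled value is reproduced exactly, which is the behavior described above for the trained network.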

Figures (7.28) through (7.33) show the mean square error of f or h for each of the 100

realizations for the three variances and two sampling schemes. The legend in each figure

indicates the number of realizations for which the neural networks outperformed the ICS.

For a variance of 1.0 with a uniform sampling scheme, both neural networks

outperformed the ICS for all f realizations, as shown in Figure (7.28a). For the

non-uniform sampling scheme, only the fh-ANN always outperformed the ICS. However, as

shown in Figure (7.28b), the f-ANN outperformed the ICS in only 59 of the 100 f

realizations.
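The bookkeeping behind these comparisons can be sketched as follows; the synthetic arrays and grid size here are placeholders for the actual ensembles of 100 estimated fields:

```python
import numpy as np

# Sketch of the bookkeeping behind Figures (7.28)-(7.33): one MSE per
# realization for each estimator, and a count of realizations in which the
# ANN beat the ICS (the legend numbers). Arrays are synthetic placeholders.
rng = np.random.default_rng(1)
n_real, n_cells = 100, 33 * 33                    # grid size is illustrative
true_f = rng.standard_normal((n_real, n_cells))
ics_f = true_f + 0.5 * rng.standard_normal((n_real, n_cells))
ann_f = true_f + 0.4 * rng.standard_normal((n_real, n_cells))

mse_ics = np.mean((ics_f - true_f) ** 2, axis=1)  # 100 values, one per realization
mse_ann = np.mean((ann_f - true_f) ** 2, axis=1)
ann_wins = int(np.sum(mse_ann < mse_ics))         # legend-style count
print(ann_wins, "of", n_real, "realizations")
```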

Similar results were obtained for a variance of 2.0 for the f fields. For the uniform

sampling scheme, both neural networks always outperformed the ICS, as shown in Figure

(7.29a). For the non-uniform sampling scheme, the fh-ANN and f-ANN neural networks

outperformed the ICS 94 and 60 times, respectively, as shown in Figure (7.29b).

Figure (7.28) Mean Square Error (MSE) of 100 f fields for σf² = 1.0. [(a) fu1 (US): fh-ANN (100), f-ANN (100); (b) fn1 (NUS): fh-ANN (100), f-ANN (59); counts are realizations in which each ANN outperformed the ICS.]

Figure (7.29) Mean Square Error (MSE) of 100 f fields for σf² = 2.0. [(a) fu2 (US): fh-ANN (100), f-ANN (100); (b) fn2 (NUS): fh-ANN (94), f-ANN (60); counts are realizations in which each ANN outperformed the ICS.]

Figure (7.30) Mean Square Error (MSE) of 100 f fields for σf² = 5.0. [(a) fu5 (US): fh-ANN (100), f-ANN (100); (b) fn5 (NUS): fh-ANN (100), f-ANN (100); counts are realizations in which each ANN outperformed the ICS.]

Figure (7.31) Mean Square Error (MSE) of 100 h fields for σf² = 1.0. [(a) hu1 (US): fh-ANN (41), f-ANN (7); (b) hn1 (NUS): fh-ANN (95), f-ANN (8); counts are realizations in which each ANN outperformed the ICS.]

Figure (7.32) Mean Square Error (MSE) of 100 h fields for σf² = 2.0. [(a) hu2 (US): fh-ANN (36), f-ANN (12); (b) hn2 (NUS): fh-ANN (58), f-ANN (2); counts are realizations in which each ANN outperformed the ICS.]

Figure (7.33) Mean Square Error (MSE) of 100 h fields for σf² = 5.0. [(a) hu5 (US): fh-ANN (59), f-ANN (39); (b) hn5 (NUS): fh-ANN (93), f-ANN (36); counts are realizations in which each ANN outperformed the ICS.]


For a variance of 5.0, both neural networks always outperformed the ICS for both

the uniform and non-uniform sampling schemes, as shown in Figures (7.30a) and (7.30b),

respectively. Note that both figures show that about 10 percent of the estimated ICS

fields have very high mean square errors.

The corresponding head fields are shown in Figures (7.31) through (7.33). As

previously mentioned, the relationship between f and h is non-linear. Consequently, the

results for h are not consistent with those obtained for f. For example, as Figure (7.31a)

shows, for a variance of 1.0 with a uniform sampling scheme, the fh-ANN outperformed

the ICS only 41 times, as compared to 100 times obtained for the f realizations. Better

performance was obtained for the same variance with the non-uniform sampling scheme,

as shown in Figure (7.31b). In this case, the fh-ANN outperformed the ICS 95 times.

This supports the earlier observation that the ICS did not estimate the h field as

accurately for the non-uniform sampling scheme as it did for the uniform sampling

scheme. The f-ANN outperformed the ICS only 7 and 8 times for the uniform and

non-uniform sampling schemes, respectively.

For the variance of 2.0, the fh-ANN outperformed the ICS 36 and 58 times for the

uniform and non-uniform sampling schemes, respectively, as shown in Figures (7.32a)

and (7.32b). For the f-ANN, performance under both sampling schemes did not differ

significantly from that at a variance of 1.0.

For the variance of 5.0, both neural networks achieved their best overall

performance against the ICS, as shown in Figures (7.33a) and (7.33b). This is largely a

consequence of significantly poorer ICS performance with higher variance, rather than


improvement in neural network performance. This is underscored by the fact that the

mean square error for all approaches increases as the variance increases.

The level of neural network performance can be attributed to how successfully the

network is trained. The mean square errors for all scenarios as a function of iteration

during training are shown in Figure (7.34). A lower mean square error for the fh-ANN

was always achieved for both uniform and non-uniform sampling schemes, indicating

greater success in training. This is because this neural network, utilizing more processing

elements in its input layer, uses more information to estimate the fields. This behavior is

true for all three f variances, indicating network stability. Table (7.2) tabulates the final

mean square error achieved at the end of 15,000 training iterations for all trained neural

networks.
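The kind of training curve plotted in Figure (7.34) can be sketched with a minimal gradient-descent loop; the single linear neuron below stands in for the backpropagation networks and is purely illustrative:

```python
import numpy as np

# Minimal sketch of tracking MSE during training, as in Figure (7.34).
# A single linear neuron trained by gradient descent stands in for the
# backpropagation networks; the logged curve is what the figure plots.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 4))        # illustrative input patterns
w_true = np.array([0.5, -1.0, 0.25, 2.0])
y = X @ w_true                           # noise-free targets for simplicity

w = np.zeros(4)
lr = 0.05
mse_history = []
for _ in range(500):
    err = X @ w - y
    mse_history.append(float(np.mean(err ** 2)))
    w -= lr * (X.T @ err) / len(y)       # gradient step on the MSE

print(mse_history[0] > mse_history[-1])  # error decreases over training
```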

Figure (7.34) Mean Square Error (MSE) fluctuations during training. [Three panels, for σf² = 1.0, 2.0, and 5.0, each showing training MSE versus iteration (0-16,000) for fh-ANN (US), fh-ANN (NUS), f-ANN (US), and f-ANN (NUS).]


Table (7.2) MSE of ANNs at the end of 15,000 training iterations for different variances

Variance   ANN      Sampling   Training MSE   Testing MSE
1.0        fh-ANN   US         0.0445         0.0481
1.0        fh-ANN   NUS        0.0508         0.0559
1.0        f-ANN    US         0.0625         0.0664
1.0        f-ANN    NUS        0.0680         0.0742
2.0        fh-ANN   US         0.0502         0.0480
2.0        fh-ANN   NUS        0.0505         0.0557
2.0        f-ANN    US         0.0642         0.0668
2.0        f-ANN    NUS        0.0690         0.0756
5.0        fh-ANN   US         0.0473         0.0493
5.0        fh-ANN   NUS        0.0507         0.0568
5.0        f-ANN    US         0.0674         0.0710
5.0        f-ANN    NUS        0.0693         0.0798


The goal of the conditional simulation was not only to preserve the true values at

the sampled locations but also to conform to the pre-established statistics of the true

fields. For all scenarios, the mean and variance were computed for the f and h fields

estimated by the three approaches. Figures (7.35) and (7.36) are scatter-plots depicting

true versus computed means for the f and h fields, respectively. Each represents a

different scenario and exhibits the results for the 100 realizations.

It is clear from both figures that the estimated means of the f and h fields for all three

approaches are in good agreement with the true field for the uniform sampling scheme.

For the non-uniform sampling scheme, the points are scattered about the 45-degree line

with greater dispersion. The ICS always outperformed both neural networks in

preserving the means of the f and h fields. This is because the ICS conditions the field and

preserves its statistics. In contrast, the neural networks are not constrained to preserve

the pre-established statistics.
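The ensemble checks behind these scatter-plots amount to computing the mean and variance of each estimated field and measuring dispersion about the 45-degree line; the arrays below are synthetic placeholders, not the dissertation's fields:

```python
import numpy as np

# Sketch of the checks behind Figures (7.35)-(7.38): per-realization mean
# and variance of the estimated field versus those of the true field.
rng = np.random.default_rng(3)
true_fields = rng.normal(0.0, 1.0, size=(100, 1089))   # 33x33 grid, flattened
est_fields = true_fields + 0.2 * rng.standard_normal((100, 1089))

true_means = true_fields.mean(axis=1)
est_means = est_fields.mean(axis=1)
true_vars = true_fields.var(axis=1)
est_vars = est_fields.var(axis=1)

# Dispersion about the 45-degree line, as in the scatter-plots:
mean_rmse = float(np.sqrt(np.mean((est_means - true_means) ** 2)))
var_rmse = float(np.sqrt(np.mean((est_vars - true_vars) ** 2)))
print(mean_rmse, var_rmse)
```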

Similarly, Figures (7.37) and (7.38) show scatter-plots depicting true versus

computed variances for the f and h fields, respectively. For the f field, the ICS always

outperformed the neural networks for variances 1.0 and 2.0. Poor f-field results for the

ICS were generally obtained for both sampling schemes with a variance of 5.0. This is

consistent with the magnification of the non-linear behavior between f and h at high

variances of f.

Figure (7.35) True vs. estimated ICS (o), fh-ANN (Δ), and f-ANN (+) mean f fields. [Scatter-plot panels for uniform and non-uniform sampling at variances 1.0, 2.0, and 5.0.]

Figure (7.36) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) mean h fields. [Scatter-plot panels for uniform and non-uniform sampling at variances 1.0, 2.0, and 5.0.]

Figure (7.37) True vs. estimated ICS (o), fh-ANN (Δ), and f-ANN (+) variance of f fields. [Scatter-plot panels for uniform and non-uniform sampling at variances 1.0, 2.0, and 5.0.]

Figure (7.38) True vs. simulated ICS (o), fh-ANN (Δ), and f-ANN (+) variance of h fields. [Scatter-plot panels for uniform and non-uniform sampling at variances 1.0, 2.0, and 5.0.]


8. CONCLUSIONS AND RECOMMENDATIONS

8.1 Conclusions

In this research, the feasibility of using ANN as a tool for estimating

transmissivity and the corresponding head fields from limited sample data was explored.

Two different ANN architectures were used: one utilizing f data only, and the other

utilizing both f and h data in the input vector. The ability of these ANNs to estimate the

true fields under different scenarios was compared with ICS. The scenarios considered

different sampling schemes (uniform versus non-uniform) and different f-field variances

(1.0, 2.0, and 5.0). Based upon the results of this research, the following observations

can be made:

1. Although the ANN approach is not physically based like flow or transport simulation

models (stochastic approaches), it shows capability and superiority in learning

highly heterogeneous random fields of transmissivity from a limited amount of

scattered f and h data. This is supported by the high training performance of 99.9%,

with an average generalization performance over 219 patterns of 94.2% and 96% for the f-

ANN and fh-ANN, respectively.

2. As with any physically based model, the ANN performs better when more information is

used. It has been shown that the ANN performed better when both f and h data were

used (fh-ANN), as compared with the f-ANN, where only f data are utilized.

3. It was found that the ICS is very sensitive to the variance of the f field. That is, its

estimates of the / and h fields become markedly worse as the variance increases.


This sensitivity is magnified when non-uniform sampling is used. This indicates that

ICS is a poor estimator in reproducing the true fields for the high variance cases. This

may be due to attempting to condition a relatively large number of f and/or h data that

have high variances.

4. Regardless of the variance of the training set, ANN achieves a similar minimum

global error. Thus, the ANN is able to learn and generalize the heterogeneous nature

of the fields regardless of their spatial variability.

5. ANN overcomes the limitations of ICS at high variances since it produces much

smaller MSE than the ICS.

6. The ANN cannot reproduce the exact h values at the sampled locations. This is because

after sufficient training, the ANN maps input to output with a fixed weight vector.

This weight vector does not condition mapping to reproduce values at the sampled

locations, but seeks to minimize the global error.

7. ANNs are not constrained to preserve the pre-established statistics as in the case of

ICS. This problem cannot be solved internally by the ANN, but may be externally

"solved" by accepting only those outputs that preserve the statistics within some

acceptable tolerance.

8. One clear drawback of the ANN is the 'smoothing effect' that appears in the estimated f

fields. In other words, there was a gradual transition from regions of low to high

values, unlike the sharp boundary interfaces produced by ICS. This smoothing effect

could be due to the high number of neurons in the output layer relative to the input

layer. The ratios of input/output neurons for uniform sampling were 1:33 and 1:16.5


for the f-ANN and fh-ANN, respectively. The ratios, even more extreme for non-uniform

sampling, were 1:203.8 and 1:29 for the f-ANN and fh-ANN, respectively. Such small

input/output ratios are not cited anywhere in the literature, where the number

of input neurons is always much higher than the number of output neurons. Improved

performance would be expected with a larger input/output ratio.

9. In conditioning with ICS, values at unsampled locations (majority) are estimated by

co-kriging, which is a linear estimator. ANN relates input to output through a transfer

or activation function, which can take many different forms, from simple linear to

highly nonlinear. This characteristic helps ANN make better estimates of the output

vector (e.g. transmissivity values at unsampled locations).

10. In addition, larger input vectors (f and/or h) improve ANN learning. This is

underscored by the fact that the fh-ANN significantly outperformed the f-ANN in

estimating the true fields.

11. In the derivation of the stochastic flow equation, an assumption of statistical

homogeneity for both f and h was necessary for the spectral representation of the input f and

output h by a Fourier-Stieltjes integral. For the development of the ANN, such

assumptions are not necessary.

12. The ANN requires a large number of training sets to effectively train the network. This

means thousands of f and/or h data sets are required for a particular field size; this

number increases as the field size is increased or refined. In real world cases, such

data sets rarely, if ever, exist. This means that the real-world case must be

transformed into an equivalent hypothetical case, where random fields are generated


to represent the real field.

13. Once the ANN is trained and the final connection weights obtained, producing

estimates of the true field is faster and simpler with the neural network approach than

with the stochastic simulation or classical co-kriging approach. Faster implies less

computational time, and simpler implies less complex mathematical operations. For example,

it takes only two seconds to produce a single estimate using ANN, as compared to an

average time of 75 seconds for ICS. This CPU time increases exponentially for ICS

when high variances are used. These runs were made on an IBM-compatible PC

with a 350-MHz AMD K6-II CPU.
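The external acceptance step suggested in observation 7 can be sketched as a simple post-filter on ANN outputs; the function name, target statistics, and tolerance below are illustrative assumptions, not part of the dissertation's method:

```python
import numpy as np

# Sketch of the external "acceptance" step from observation 7: keep only
# ANN outputs whose mean and variance fall within a tolerance of the
# pre-established target statistics.
def accept(field, target_mean, target_var, tol=0.1):
    """True if the field's mean and variance match the targets within tol."""
    return (abs(field.mean() - target_mean) <= tol
            and abs(field.var() - target_var) <= tol * target_var)

rng = np.random.default_rng(4)
outputs = rng.normal(0.0, 1.0, size=(50, 1089))          # candidate fields
kept = [f for f in outputs if accept(f, 0.0, 1.0)]
print(len(kept), "of", len(outputs), "fields accepted")
```

Since the filter operates only on the network's outputs, it leaves the trained weights untouched, which is why the text describes the fix as external rather than internal to the ANN.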

In conclusion, the results of this research agree with previous findings where ICS

performed well only at variances less than or equal to 2.0. Above this value, conditioning

head data becomes increasingly problematic. In contrast, the ANN that used

transmissivity and head data performed well, both at low and high variances. However,

when only transmissivity data was used, the ANN did not accurately estimate the head

field. The former case is consistent with real world situations, where both transmissivity

and head data are measured in the field. The non-uniform sampling scheme used in this

study is particularly applicable to many field sites, where head measurements outnumber

transmissivity measurements.


8.2 Recommendations for future work

• In this study, it is assumed that measurements of transmissivity and head are error

free. One may consider errors and/or uncertainty and develop a kind of neuro-fuzzy

system to incorporate these errors and uncertainties.

• The ANN approach can also be tried to solve contaminant transport problems. In this

case, ANN can be built considering velocity field and solute concentration in its input

and/or output vectors.

• Since the ANN learns by example, this approach can be used in cases of transient flow

with source and sink terms. In this case, the ANN can be trained on a time basis.

• Unsaturated flow systems can also be investigated with ANNs.

• Since there is no restriction on the design of the ANN, different architectures can be tried

to solve the same problem. For example, geostatistical parameters and the

coordinates of sampled and unsampled data could be incorporated into the design of

the ANN.

• Besides the backpropagation NN, other ANN types can also be tried, for example

learning vector quantization, self-organizing maps, general regression neural

networks, modular neural networks, radial basis function NNs, and probabilistic NNs.


9. REFERENCES

Ahmed, S.; and G. de Marsily, Cokriged estimation of aquifer transmissivity as an indirect solution of the inverse problem: A practical approach, Water Resour. Res., 29(2), pp.512-53, 1993.

Alabert, F., The practice of fast conditional simulations through the LU decomposition of the covariance matrix, Mathematical Geology, 19, pp.369-386, 1987.

Anderson, M. P., Comment on "Universal scaling of hydraulic conductivities and dispersivities in geologic media", Water Resour. Res., 27, pp.1381-1382, 1991.

Bakr, A.A., Stochastic analysis of the effect of spatial variations in hydraulic conductivity on groundwater flow, Ph.D. dissertation, New Mexico Institute of Mining and Technology, Socorro, 1976.

Bakr, A.A.; L.W. Gelhar; A. L. Gutjahr; and J. R. McMillan, Stochastic analysis of the effect of spatial variability in subsurface flows, 1: Comparison of one- and three-dimensional flows, Water Resour. Res., 14(2), pp.263-271, 1978.

Bear, J., Dynamics of fluids in porous media, Elsevier, New York, 1972.

Bennion, D.W.; and J.C. Griffiths, A stochastic model for predicting variations in reservoir rock properties, Trans. AIME, 237, Part 2, pp.9-16, 1969.

Bras, R.L.; and I. Rodriguez-Iturbe, Random functions and hydrology, Addison-Wesley, Reading, Massachusetts, 1985.

Brigham, E.O., The fast Fourier transform and its applications, Englewood Cliffs, N.J., 1988.

Brooker, P. I., Two-dimensional simulation by turning bands. Mathematical Geology, 17, pp.81-90, 1985.

Bulness, A.C., An application of statistical methods to core analysis data of dolomitic limestone, Trans. AIME, 165, pp.223-240, 1946.

Byers, E.; and D.B. Stephens, Statistical and stochastic analysis of hydraulic conductivity and particle size in a fluvial sand, Soil Science Soc. Am. J., 47, pp.1072-1080, 1983.

Carrera, J.; and L. Glorioso, On geostatistical formulation of the groundwater flow inverse problem, Adv. Water Resour., 14(5), pp.273-283, 1991.


Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions, 1. Maximum likelihood method incorporating prior information, Water Resour. Res., 22(2), pp. 199-210, 1986a.

Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions, 2. Uniqueness, stability, and solution algorithms, Water Resour. Res., 22(2), pp. 211-227, 1986b.

Carrera, J.; and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions, 3. Application to synthetic and field data, Water Resour. Res., 22(2), pp. 228-242, 1986c.

Clifton, P. M.; and S. P. Neuman, Effects of kriging and inverse modeling on conditional simulation of the Avra Valley aquifer in southern Arizona, Water Resour. Res., 18(4), pp. 1215-1234, 1982.

Cooley, R.L., Incorporation of prior information on parameters into nonlinear regression groundwater models, 1. Theory, Water Resour. Res., 18(4), pp. 965-976, 1982.

Dagan, G., A note on higher-order corrections of the head covariances in steady aquifer flow, Water Resour. Res., 21, pp. 573-578, 1985b.

Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities, 1. Conditional simulation and the direct problem, Water Resour. Res., 18(4), pp. 813-833, 1982a.

Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities, 2. The solute transport, Water Resour. Res., 18(4), pp. 835-848, 1982b.

Dagan, G., Stochastic modeling of groundwater flow by unconditional and conditional probabilities: The inverse problem, Water Resour. Res., 21(1), pp. 65-72, 1985a.

Dagan, G., Time-dependent macrodispersion for solute transport in anisotropic heterogeneous aquifers, Water Resour. Res., 24(9), pp. 1491-1500, 1988.

Davis, M.W., Production of conditional simulations via the LU decomposition of the covariance matrix, Mathematical Geology, 19, pp. 91-98, 1987.

de Marsily, G., Quantitative hydrogeology: groundwater hydrology for engineers, Academic Press, San Diego, 1986.

Freeze, R. A., A stochastic-conceptual analysis of one-dimensional groundwater flow in non-uniform, homogeneous media, Water Resour. Res., 11(9), pp. 725-741, 1975.


Garabedian, S. P.; D. R. LeBlanc; L. W. Gelhar; and M. A. Celia, Large scale natural gradient tracer test in sand and gravel, Cape Cod, Massachusetts, 2. Analysis of spatial moments for a nonreactive tracer, Water Resour. Res., 27(5), pp. 911-924, 1991.

Gavalas, G. R.; P. C. Shah; and J. H. Seinfeld, Reservoir history matching by Bayesian estimation, Trans. AIME, 261, pp. 337-350, 1976.

Gelhar, L. W., Effects of hydraulic conductivity variations on groundwater flow, in Proceedings, Second International IAHR Symposium on Stochastic Hydraulics, International Association of Hydraulic Research, Lund, Sweden, 1976.

Gelhar, L. W.; P.Y. Ko; H.H. Kwai; and J. L. Wilson, Stochastic modeling of groundwater systems, R.M. Parsons Laboratory for Water Resources and Hydrodynamics Report 189, MIT, Cambridge, Mass., 1974.

Gelhar, L.W., Stochastic subsurface hydrology, Prentice-Hall, 1993.

Gelhar, L.W.; and C. L. Axness, Three-dimensional stochastic analysis of macrodispersion in aquifers, Water Resour. Res., 19(1), pp. 161-180, 1983.

Gomez-Hernandez, J. J., A stochastic approach to the simulation of block conductivity fields conditioned upon data measured at a smaller scale, Ph.D. dissertation, Department of Applied Earth Sciences, Stanford University, 351 pp., 1991.

Gray, R. M.; and L. D. Davisson, Random processes: a mathematical approach for engineers, 305 pp., Prentice-Hall, Englewood Cliffs, New Jersey, 1986.

Gutjahr, A. L., Kriging in stochastic hydrology: assessing the worth of data, AGU Chapman Conference on Spatial Variability in Hydrologic Modeling, Fort Collins, Colorado, 1981.

Gutjahr, A. L., Q. Bai, S. Hatch, Conditional simulation applied to contaminant flow modeling, Technical Completion Report, Department of Mathematics, New Mexico Institute of Mining and Technology, Socorro, New Mexico, 1992.

Gutjahr, A. L.; and J.L. Wilson, Co-kriging for stochastic flow models, Transport in Porous Media, 4(6), pp. 585-598, 1989.

Gutjahr, A. L.; L.W. Gelhar; A. A. Bakr; and J. R. McMillan, Stochastic analysis of spatial variability in subsurface flows, Part II: Evaluation and application, Water Resour. Res., 14(5), pp. 953-960, 1978.


Gutjahr, A. L.; S. J. Colarullo; and F. M. Phillips, Validation of a two-scale aquifer characterization model using Las Cruces Trench experimental data, EOS, Transactions AGU, p. 250, Fall Meeting abstract, 1993.

Gutjahr, A.L., Fast Fourier transforms for random field generation, Project Report for Los Alamos Grant to New Mexico Tech, Contract number 4-R58-2690R, Department of Mathematics, New Mexico Tech, Socorro, New Mexico, 1989.

Gutjahr, A.L.; S. J. Colarullo; S. Hatch; and L. Hughson, Joint conditional simulations and the spectral approach for flow modeling, J. of Stochastic Hydrology, pp. 80-108, 1995.

Hanna, S., An iterative Monte Carlo technique for estimating conditional mean and variances of transmissivity and hydraulic head fields, Ph.D. dissertation, Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, 1995.

Harr, M.E., Reliability-based design in civil engineering, McGraw-Hill, New York, 1987.

Harter, Th., Conditional simulation: a comprehensive review of some important numerical techniques, preliminary examination report, personal communication, 1992.

Harter, Th., Unconditional and conditional simulation of flow and transport in heterogeneous, variably saturated porous media, Ph.D. dissertation, Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, 1994.

Harter, Th.; and T.-C.J. Yeh, An efficient method for simulating steady unsaturated flow in random porous media: using an analytical perturbation solution as initial guess to a numerical model, Water Resour. Res., 29(12), pp. 4139-4149, 1993.

Harvey, F.H.; and S.M. Gorelick, Mapping hydraulic conductivity: sequential conditioning with measurements of solute arrival time, hydraulic head, and local conductivity, Water Resour. Res., 31(7), pp. 1615-1626, 1995.

Hebb, D., Organization of behavior, John Wiley, New York, 1949.

Hillel, D., Fundamentals of soil physics, 413 pp., Academic Press, New York, 1980.

Hoeksema, R. J.; and P.K. Kitanidis, Comparison of Gaussian conditional mean and kriging estimation in the geostatistical solution of the inverse problem, Water Resour. Res., 21(6), pp. 825-836, 1985.


Hoeksema, R. J.; and P.K. Kitanidis, An application of the geostatistical approach to the inverse problem in two-dimensional groundwater modeling, Water Resour. Res., 20(7), pp. 1003-1020, 1984.

Hoeksema, R. J.; and P.K. Kitanidis, Prediction of transmissivities, heads, and seepage velocities using mathematical modeling and geostatistics, Adv. Water Resour., 12, pp. 90-101, 1989.

Hopfield, J. J., Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. USA, 79, pp. 2554-2558, 1982.

IBM, Engineering and scientific subroutine library, guide and reference, IBM publication.

Journel, A. G., Geostatistics for conditional simulations of ore bodies, Econ. Geol., 69, pp. 673-687, 1974.

Jensen, J.L.; D.V. Hinkley; and L.W. Lake, A statistical study of reservoir permeability: distribution, correlation and averages, Soc. Petr. Eng. Formation Evaluation, pp.461-468, 1987.

Journel, A. G.; and Ch. J. Huijbregts, Mining Geostatistics, 600 pp., Academic Press, San Diego, 1978.

Journel, A.G.; and J.J. Gomez-Hernandez, Stochastic imaging of the Wilmington clastic sequence, 64th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers, San Antonio, Texas, October 8-11, 1989, SPE 19857, pp. 591-606, 1989.

Jury, W. A.; W. R. Gardner; and W. H. Gardner, Soil physics, Wiley, New York, 1991.

Kitanidis, P.K.; and E.G. Vomvoris, A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations, Water Resour. Res., 19(3), pp. 677-690, 1983.

Kolmogorov, A. N., On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, Dokl. Akad. Nauk SSSR, 114, pp. 953-956, 1957 (in Russian).

Law, J., A statistical approach to the interstitial heterogeneity of sand reservoirs, Trans. AIME, 155, pp. 202-222, 1944.

Lent, T. V.; and P. K. Kitanidis, Effects of first-order approximations on head and specific discharge covariances in high-contrast log conductivity, Water Resour. Res., 32(5), pp. 1197-1207, 1996.


Tsoukalas, L. H.; and R. E. Uhrig, Fuzzy and Neural Approaches in Engineering, John Wiley & Sons, Inc., 1997.

Li, S.-G.; and D. McLaughlin, A nonstationary spectral method for solving stochastic groundwater problems: unconditional analysis, Water Resour. Res., 27, pp. 1589-1605, 1991.

Lumley, J.L.; and H.A. Panofsky, The structure of atmospheric turbulence, 239 pp., John Wiley, New York, 1964.

Mantoglou, A.; and J.L. Wilson, The turning bands method for simulation of random fields using line generation by a spectral method, Water Resour. Res., 18(5), pp. 1379-1394, 1982.

Mantoglou, A.; and L.W. Gelhar, Stochastic modeling of large-scale transient unsaturated flow systems, Water Resour. Res., 23(1), pp. 31-46, 1987a.

Mantoglou, A.; and L.W. Gelhar, Capillary tension head variance, mean soil moisture content, and effective specific soil moisture capacity of transient unsaturated flow in stratified soils, Water Resour. Res., 23(1), pp. 47-56, 1987b.

Mantoglou, A.; and L.W. Gelhar, Effective hydraulic conductivities of transient unsaturated flow in stratified soils, Water Resour. Res., 23(1), pp. 57-67, 1987c.

Mantoglou, A., Digital simulation of multi-variate two- and three-dimensional stochastic processes with a spectral turning bands method, Mathematical Geology, 19, pp. 129-149, 1987.

Maren, A. J., C. T. Harston, and R. M. Pap, Handbook of neural computing applications, Academic Press, San Diego, 1990.

Matheron, G., The intrinsic random functions and their applications, Adv. Appl. Probab., 5, pp. 438-468, 1973.

Mizell, S.A.; A. L. Gutjahr; and L.W. Gelhar, Stochastic analysis of spatial variability in two-dimensional steady groundwater flow assuming stationary and nonstationary heads, Water Resour. Res., 18(4), pp. 1053-1067, 1982.

Munakata, T., Commercial and industrial AI, Commun. ACM, 37(3), pp. 23-26, 1994.

Myers, D. E., Vector conditional simulation, Vol. 1, edited by A. Armstrong, pp. 283-293, 1989.


Neuman, S. P., A statistical approach to the inverse problem of aquifer hydrology, 3. Improved solution method and added perspective, Water Resour. Res., 16(2), pp. 331-346, 1980.

Neuman, S. P.; and S. Yakowitz, A statistical approach to the inverse problem of aquifer hydrology, 1. Theory, Water Resour. Res., 15, pp. 845-860, 1979.

Neuman, S. P., Reply, Water Resour. Res., 27, pp. 1383-1384, 1991.

Papoulis, A., Probability, random variables, and stochastic processes, 2nd edition, 576 pp., McGraw-Hill, New York, 1984.

Parker, D., Optimal algorithms for adaptive networks: Second order back-propagation, second order direct propagation, and second order Hebbian learning, Proceedings of the IEEE First International Conference on Neural Networks, Vol. II, San Diego, CA, pp. 593-600, 1987.

Plaut, D., S. Nowlan, and G. Hinton, Experiments on learning by backpropagation, Technical report CMU-CS-86-126, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1986.

Poggio, T.; and F. Girosi, Regularization algorithms for learning that are equivalent to multilayer networks, Science, Vol. 247, pp. 978-982, 1990.

Press, W. H.; W.T. Vetterling; S. A. Teukolsky; and B. P. Flannery, Numerical recipes in FORTRAN, 2nd edition, Cambridge University Press, Cambridge, 1992.

Priestley, M. B., Spectral analysis and time series, 890 pp., Academic Press, San Diego, 1981.

Rubin, Y.; and G. Dagan, Stochastic identification of transmissivity and effective recharge in steady groundwater flow, 1. Theory, Water Resour. Res., 23(7), pp. 1185-1192, 1987.

Rumelhart, D. E.; G. E. Hinton; and R. J. Williams, Learning representations by back-propagating errors, Nature, 323, pp. 533-536, 1986b.

Rumelhart, D. E.; G. E. Hinton; and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing, Vol. 1, D. E. Rumelhart and J. L. McClelland, eds., MIT Press, Cambridge, MA, 1986a.

Russo, D.; and M. Bouton, Statistical analysis of spatial variability in unsaturated flow parameters, Water Resour. Res., 28(7), pp. 1911-1925, 1992.


Russo, D., Stochastic modeling of macrodispersion for solute transport in a heterogeneous unsaturated porous formation, Water Resour. Res., 29(2), pp. 383-397, 1993a.

Russo, D., Stochastic modeling of solute flux in a heterogeneous partially saturated porous formation, Water Resour. Res., 29(6), pp. 1731-1744, 1993b.

Russo, D.; and G. Dagan, On solute transport in a heterogeneous porous formation under saturated and unsaturated water flows, Water Resour. Res., 27(2), pp. 285-292, 1991.

Schaffer, S., A semi-coarsening multigrid method for elliptic partial differential equations with highly discontinuous and anisotropic coefficients, SIAM Journal of Scientific Computing, 1995.

Shinozuka, M., Monte Carlo simulation of structural dynamics, Computers and Structures, 2, pp. 855-875, 1972.

Shinozuka, M.; and G. Deodatis, Simulation of stochastic processes by spectral representation, Appl. Mech. Rev., 44, pp. 191-204, 1991.

Smith, L.; and R.A. Freeze, Stochastic analysis of groundwater flow in a bounded domain, 2. Two-dimensional simulations, Water Resour. Res., 15(6), pp. 1543-1559, 1979.

Sudicky, E. A., A natural gradient experiment on solute transport in a sand aquifer: spatial variability of hydraulic conductivity and its role in the dispersion process, Water Resour. Res., 22(13), pp. 2069-2082, 1986.

Tompson, A. F.; R. Ababou; and L.W. Gelhar, Implementation of the three-dimensional turning bands random field generator, Water Resour. Res., 25(10), pp. 2227-2243, 1989.

Vauclin, M.; D.R. Vieira; G. Vachaud; and D.R. Nielsen, The use of cokriging with limited field soil observations, Soil Science Soc. Am. J., 47, pp. 175-184, 1983.

Werbos, P., Beyond regression: New tools for prediction and analysis in the behavioral sciences, Ph.D. dissertation, Harvard University, Cambridge, MA, 1974.

Wiener, N., Generalized harmonic analysis, Acta Math., 55, pp.117-258, 1930.

Willis, R.; and W.G. Yeh, Groundwater Systems Planning and Management, Prentice-Hall, Englewood Cliffs, NJ, 1987.


Yeh, T.-C. J.; A. L. Gutjahr; and M. Jin, An iterative co-conditional simulation method for solute transport in heterogeneous aquifers, EOS, Transactions, American Geophysical Union, 74, Supplement, p. 251, October 26, 1993b.

Yeh, T.-C. J.; and P.A. Mock, A structured approach for calibrating steady-state ground water flow models, to appear in Ground Water, May-June, 1996.

Yeh, T.-C. J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils, 1: Statistically isotropic media, Water Resour. Res., 21(4), pp. 447-456, 1985a.

Yeh, T.-C. J.; R. Srivastava; A. Guzman; and Th. Harter, A numerical model for two-dimensional water flow and chemical transport, Ground Water, 32, pp. 2-11, 1993a.

Yeh, T.-C.J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils, 2: Statistically anisotropic media with variable α, Water Resour. Res., 21(4), pp. 457-464, 1985b.

Yeh, T.-C.J.; L.W. Gelhar; and A. L. Gutjahr, Stochastic analysis of unsaturated flow in heterogeneous soils, 3: Observations and applications, Water Resour. Res., 21(4), pp. 465-471, 1985c.

Yeh, T-C. J., Stochastic modeling of groundwater flow and solute transport in aquifers, Hydrol. Processes, 5, pp. 369-395, 1992.

Yeh, T-C. J.; A. L. Gutjahr; and M. Jin, An iterative cokriging-like technique for groundwater flow modeling, Ground Water, Jan.-Feb., 1995a.

Yeh, T-C. J.; and J. T. McCord, Review of modeling of water flow and solute transport in the vadose zone: Stochastic Approaches, Technical Report HWR94-040, Hydrology and Water Resources Dept., November 10, 1994.

Yeh, T.-C.J.; M. Jin; and S. Hanna, An iterative inverse method: Conditional effective transmissivity and hydraulic head fields. Water Resour. Res., 32(1), pp. 85-92, 1996.

Yeh, W.W-G., Review of parameter identification procedures in groundwater hydrology: The inverse problem, Water Resour. Res., 22(2), pp. 95-108, 1986.

Yeh, W.W-G., System analysis in groundwater planning and management, J. of Water Resour. Planning and Manag., 118(3), pp. 224-231, 1992.

Yeh, W.W-G.; and G.W. Tauxe, Optimal identification of aquifer diffusivity using quasi-linearization, Water Resour. Res., 7(4), pp. 955-962, 1971.


Zimmermann, D. A.; and J.L. Wilson, TUBA: A computer code for generating two-dimensional random fields via the turning bands method. A: User Guide, Seasoft, Albuquerque, New Mexico, 1990.
