Evaluating conditional density estimation networks as a probabilistic downscaling tool: Application to precipitation in Newfoundland
by
© Rasel Biswas, B.Sc. (Hons), M.Sc.
A report submitted to the School of Graduate Studies in partial fulfillment of the
requirements for the degree of Master of Science.
Department of Scientific Computing
Memorial University of Newfoundland
21st June 2015
St. John’s, Newfoundland
Abstract
The objective of this research is to better quantify the distribution of extreme pre-
cipitation within Newfoundland, for the current climate conditions. The province of
Newfoundland is interested in this information as guidance for climate adaptation
and long-term infrastructure planning purposes. Analyses of extremes are commonly limited
by short periods of observation (often only 30 years or less), resulting in large
uncertainty regarding rare events. These limitations can be addressed by increasing
the period of observation, or partially addressed using synthetic time series generated
by stochastic weather generators. These weather generators attempt to replicate
relevant statistics of the observed climate, using various statistical modelling tools.
Here, a probabilistic neural network-based downscaling method, the conditional density estimation network (CDEN), is used to predict the
probability distribution of precipitation at a study site conditional upon the synoptic
state of the atmosphere; that is, features of a given day’s large-scale atmospheric
state are used to estimate the likelihood of all possible precipitation amounts for that
day. My results show that CDEN-based probabilistic downscaling generally agrees
better with observations than uncorrected reanalysis-based precipitation estimates.
However, it gives significantly higher estimates for extreme events at low frequency
(e.g., 100-year return periods). This approach is demonstrated for St. John’s airport.
Comparison with a nearby station, Windsor Lake, suggests that these higher estimates
may be reasonable; however, further work and a longer observational record are needed
to determine whether these estimates represent added value to the raw observational
record or a shortcoming of our CDEN-based methodology.
Contents

Abstract
List of Tables
List of Figures

1 Introduction

2 Data
2.1 Overview
2.2 Atmospheric Reanalyses
2.3 Station data

3 Methodology
3.1 Neural networks
3.2 Multilayer perceptron (MLP)
3.3 The Conditional Density Estimation Network (CDEN)
3.4 Predictor selection method
3.5 Probability distributions for daily precipitation
3.6 Cross-Validation

4 Application of CDEN to Precipitation Downscaling
4.1 Predictor Selection
4.2 Results and Discussion
4.3 Summary and Conclusion
List of Tables

4.1 Cross-validated model performance statistics for CDEN and NCEP downscaling models: root mean square error (RMSE), correlation (r), hit rate (HIT), false alarm ratio (FAR), true skill statistic (TSS), and bias ratio (BIAS).
List of Figures

3.1 Potential predictors from NCEP reanalysis at locations used in the current study.

3.2 Station data and randomly created precipitation data via the Bernoulli-gamma distribution.

3.3 Cross-validation (CV) method.

4.1 Red and blue locations represent selected predictor locations and potential predictor locations, respectively, for precipitation at the St. John’s airport station.

4.2 Comparison of total seasonal precipitation between the station data and the stochastic weather generator using the CDEN model. The blue line represents the total seasonal precipitation for the station data, and the box plot represents the total seasonal precipitation for the stochastic weather generator.

4.3 Quantile-quantile plot at the St. John’s international airport.

4.4 Comparison between (a) the precipitation data for St. John’s airport and (b) the stochastic weather generator using the CDEN model. The line inside each box represents the median, boxes extend from the 25th to 75th percentiles, and outliers are shown as circles.
Chapter 1
Introduction
The response of precipitation to climate change has been the subject of numer-
ous studies, including annual and seasonal analyses on both global and local scales
(Houghton et al., 1996). Past studies suggest that mean global precipitation has
increased throughout the 20th century; however, trends show considerable spatial
variability (Houghton et al., 1996). For example, in northern Europe, positive trends
have been identified (Forland et al., 1996; Schonwiese et al., 1997), while in parts
of the south trends are often negative (Schonwiese et al., 1997). This variability is
partially explained by the complexity of precipitation responses to changing climate,
which may include adjustments in the frequency of events, the intensity of events,
or both (Trenberth et al., 2003; Allen & Ingram, 2002); furthermore, these responses
may vary across the seasons (Finnis et al., 2007).
Impacts of climate change on intense, but relatively rare, periods of precipitation
are of particular interest. These extreme precipitation events arise from complex interactions
between temperature, moisture, winds, and orography; individual extreme
events typically influence small areas, making it difficult to predict the response of
extremes to climate change. The biggest obstacle in this area of research is a lack of
high quality data with which to constrain results. At the end of the last century, a
significant positive trend in the frequency of extreme rainfall (greater than 50mm per
day) has been identified in the US (Karl et al., 1995, 1998). In Australia, a signific-
ant increase in the 90th and 95th percentiles of daily precipitation was identified by
Suppiah & Hennessy (1995) and Suppiah & Hennessy (1998). Hennessy & Suppiah
(1999) and Plummer et al. (1999) further showed increases in the 99th percentile.
A study by Groisman et al. (1999) provides a useful synopsis of precipitation responses
to climate change, both on global and regional scales. Over the past century in the
US, Australia, and Norway, Groisman et al. (1999) found that total annual precip-
itation had increased, as had the mean intensity of individual precipitation events,
i.e., increasing intensity was contributing to an increase in precipitation amounts. In
order to describe these changes, they compared fitted probability distributions under
warmer and cooler climates; results suggested that parameters associated with the
spread of a distribution (e.g., scale parameters) were most affected by warming, while
those associated with the shape of a distribution were less affected. This translates
to an increase in the likelihood of high precipitation amounts. These changes are in-
dependent of total precipitation; however, some locations (e.g., Siberia) experienced
this increase in intensity without an associated increase in total precipitation, sug-
gesting a compensating decrease in the frequency of events. Generally, however, the
projected increases in total accumulated precipitation were found to be more robust
than changes in extreme (intense) precipitation (Groisman et al., 1999).
A further problem in predicting responses to climate change involves geographic
scale. While the general circulation models (GCMs) typically used for climate
change analyses perform reasonably well over large spatial scales (e.g., 100-1000km),
they do not adequately represent the small (regional or local) scales of greatest in-
terest in climate adaptation planning. This issue compounds problems associated
with spatial variability of precipitation, as small-scale factors such as local topo-
graphy, coastline orientation, and slope aspect (among many others) can exert large
influences on local-scale precipitation, but cannot be captured by GCMs. The heav-
iest (most intense) precipitation events are also typically associated with processes
acting on spatial scales much smaller than GCM resolutions; consequently, heavy
precipitation is reduced as the impact of small-scale, intense events, such as convective
storms, is spread over a very large GCM grid cell. Despite these limitations,
large-scale GCM output can be used to infer local-scale impacts, assuming that a
location’s precipitation is adequately connected to large-scale forcing. The process of
predicting local-scale information from large-scale data is referred to as ‘downscaling’,
and is accomplished through a wide range of techniques (e.g., Benestad et al., 2008).
Broadly, these fall into two categories:
i. Dynamic downscaling, in which higher resolution regional climate models are
nested within low resolution GCM output, and
ii. Statistical downscaling, which capitalizes on long observational records to identify
cross-scale statistical relationships.
There is no single, universally accepted approach to precipitation downscaling, des-
pite a number of comparative studies (e.g., Wigley et al., 1990; Wilby et al., 1998)
and dedicated dynamical downscaling efforts (e.g., Christensen et al., 2007; Goodess,
2011; Gutowski et al., 2010). Commonly used statistical methods include transfer
functions (e.g., regression) (Wigley et al., 1990; Wilby et al., 1998) and weather typ-
ing schemes (e.g., Vrac et al., 2007). Perfect prognosis and model output statistics
methods have been adapted for downscaling projections (Rummukainen, 1997), as
have artificial neural networks (ANNs).
In order for statistical downscaling to be effective, the GCMs being downscaled
must simulate required large-scale variables realistically. When a direct relationship
between large-scale variables is simulated by a model, and local-scale observations
exist, statistical downscaling essentially corrects for model errors (see, e.g., Lenderink
et al., 2007). Most downscaling methods are deterministic, that is, they provide a
single ‘best-guess’ of precipitation given large-scale conditions. However, the same
data used to identify and train downscaling models can be used to develop stochastic
weather generators. These are statistical models that generate synthetic time series
of the local scale variable while reproducing statistical properties of the observations
(e.g., Wilks, 1998; Vrac et al., 2007). Incorporating stochastic weather generators
into downscaling schemes has the potential to improve representation of extreme
events (which are often the primary concern in adaptation planning); deterministic
approaches frequently underestimate these extremes as they strive to predict the most
likely outcome, which is often closer to the long-term mean value (e.g., Fowler et al.,
2007).
Multivariate linear regression (MLR) forms the basis of the most commonly used
statistical downscaling approaches. However, the interest in nonlinear regression
methods, including ANNs, is increasing because of their potential to accommod-
ate complex, nonlinear and time-varying input-output mapping. Relative to MLR,
ANN-based regression is extremely flexible and capable of identifying nonlinear rela-
tionships between input (predictors) and output (predictands), given enough training
data.
If no nonlinear relationship exists, ANNs give similar results to MLR-based methods,
but with additional computational costs, and using a regression methodology that
is much more difficult to interpret (Schoof & Pryor, 2001). However, when applied
to precipitation, ANNs are often able to account for heavy rainfall events missed by
linear regression approaches (Weichert & Burger, 1998). Cannon & McKendry (2002)
found that an ensemble ANN downscaling model was capable of predicting changes
in stream flows using only large-scale atmospheric conditions as model input. Other
nonlinear downscaling methods have been used, to varying effect. These include (but
are not limited to) dynamic neural networks, which accommodate time-varying input-
output relationships (Gautam & Holz, 2000), and Gaussian Processes (e.g., Cai et al.,
2006). A common thread in these analyses is that nonlinear approaches typically of-
fer better performance relative to MLR, at the cost of increased complexity. This is
particularly true in locations where environmental processes are complex and highly
nonlinear (Hsieh, 2009).
The multilayer perceptron (MLP) is the most common ANN used in statistical down-
scaling, and essentially acts as a nonlinear alternative to MLR. MLP can deal with
unspecified interactions between multiple covariates to provide a deterministic predic-
tion. An alternative approach uses a conditional density estimation network (CDEN),
which is a probabilistic extension of the MLP (Gardner & Dorling, 1998; Hsieh &
Tang, 1998; Dawson & Wilby, 2001). CDENs have been successfully applied to vari-
ous prediction problems, including air pollutant concentration estimations (Dorling
et al., 2003) and hydroclimatology (Haylock et al., 2006; Cannon, 2010). Rather
than predicting the most likely outcome, a CDEN predicts probability distribution
parameters, conditional upon the state of predictor variables. As such, it provides
a more detailed prediction, allowing the likelihood of all outcomes to be quantified.
When used in precipitation downscaling, its output can be considered as probabilistic
downscaling.
The current study applies the CDEN method to precipitation in St. John’s, New-
foundland. The purpose is to assess the method’s suitability for climate projection
analyses in the province of Newfoundland and Labrador. Here, a CDEN-based model
is used to produce a conditional precipitation distribution for a modern study period.
Drawing randomly from these distributions produces an extended precipitation time
series. This can be considered a conditional stochastic weather generator. The skill
of the CDEN is assessed against available observations, and the approach is then
used to estimate the magnitude of extreme events at our study location. The method’s
potential value in climate projection is assessed on the basis of this analysis.
Subsequent chapters describe this method in detail. In the current study, extreme
value statistics are based on annual maxima (a ‘block maxima’ approach, rather than
a peaks-over-threshold approach). This was chosen to better emulate the methods
used by Environment Canada for their official intensity/duration/frequency (IDF)
analyses.
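The block-maxima extraction described above can be sketched in a few lines. This is an illustrative stand-in: the data layout and values are hypothetical, not the study's actual records.

```python
import numpy as np

def annual_maxima(years, precip):
    """Block maxima: the largest daily precipitation total within each year."""
    years = np.asarray(years)
    precip = np.asarray(precip, dtype=float)
    return {int(yr): float(precip[years == yr].max()) for yr in np.unique(years)}

# Hypothetical two-year daily series (values are illustrative only)
years = [2000, 2000, 2000, 2001, 2001]
precip = [3.0, 41.5, 0.0, 12.2, 7.9]
print(annual_maxima(years, precip))  # {2000: 41.5, 2001: 12.2}
```

The resulting annual maxima series is what feeds an extreme-value fit, mirroring the block-maxima convention used in Environment Canada's IDF analyses.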
Chapter 2
Data
2.1 Overview
All analyses are based on freely available data sets from Environment Canada and
the U.S. National Centers for Environmental Prediction (NCEP). Predictor variables
(i.e., inputs to downscaling algorithms) were taken from the NCEP/NCAR Reanalysis
(Kalnay et al., 1996), while predictand (output) variables are from the Adjusted and
Homogenized Canadian Climate Data set (AHCCD; Vincent, 1998).
2.2 Atmospheric Reanalyses
The need to monitor climate and weather arises for a variety of reasons, from short-
term planning and hazard preparedness to long-term infrastructure planning (e.g.,
using intensity-duration-frequency curves for precipitation). Climate observing sta-
tions are our primary tool in this effort, providing detailed descriptions of climate
at fixed locations. However, maintaining these observing sites is expensive, and cov-
erage is largely limited to populated land areas. Satellite observations can help fill
gaps in the surface observing network, but provide less detailed coverage (varying
spatial/temporal coverage), and often provide only indirect measurement of many
variables of interest (Climate Change Science Program, 2008).
Reanalyses were originally proposed as a means of addressing the shortcomings of
climate station and satellite-based observations. This approach was inspired by the
use of atmospheric analyses in meteorology, i.e., the merging of current observations
and prior model forecasts into a best-guess of the atmospheric state at a given point
in time. In operational meteorology, analyses provide a starting point for numerical
weather prediction (NWP) models. The approach used to generate analyses evolves
continually as data assimilation and modelling techniques improve, complicating ef-
forts to interpret long-term climate using incompatible analyses. Reanalyses address
this consistency issue, creating a continuous record using a single NWP model and
data assimilation methodology. The result is a gridded data set of all “analysed” vari-
ables with global coverage over the reanalysis period (usually ∼1950s to present). In
essence, reanalyses estimate data at locations away from climate stations or satellite
observations using a combination of all available observations at other locations and
model output; effectively, it provides an intelligent interpolation informed by NWP
physics. Variables that are not directly analysed can be provided by subsequent iter-
ations of the NWP model (referred to as reanalysis forecast variables) (Kalnay, 2003).
Reanalysis data have proven consistent with the trends produced from other observational
datasets, with some regional exceptions, especially since satellite measurements
originated in the late 1970s (Climate Change Science Program, 2008). Unfortunately,
reanalysis treatments of precipitation are often of poor quality relative to
other variables (e.g., temperature or sea level pressure). The primary factors that are
responsible for this inconsistency in the trends are limitations in the models and the
methods used to integrate the datasets (Bromwich & Cullather, 1998). Nevertheless,
reanalysis data helps to identify and explore atmospheric features associated with
weather and climate variability. Reanalysis can assist in these efforts by providing
large-scale context for events such as droughts or floods, allowing us to move beyond
studies at fixed locations.
2.3 Station data
Station data used in the current study were taken from the Adjusted and Homogenized
Canadian Climate Data archive (Vincent, 1998; Vincent & Gullett, 1999; Vincent et
al., 2002; Mekis & Vincent, 2011). This was used in place of raw (unadjusted) station
data, as the influence of factors such as station relocation, changes in observational
practices, and instrument replacement have been removed in the AHCCD. We have
used daily precipitation values: the total precipitation accumulated in a 24-hour period
(mm), calculated as the sum of the station’s adjusted daily rain gauge
and snow observations (converted to water equivalent). AHCCD development is a
continuous process, and adjusted precipitation is updated every year, making it easy
to update or extend our analysis with new data as it becomes available.
Chapter 3
Methodology
3.1 Neural networks
Inspired by the mechanics of biological learning, artificial neural networks (ANNs) are
a family of computational algorithms commonly used for classification and regression
problems. Capable of identifying nonlinear relationships between various paramet-
ers, they are well suited to statistical modelling and prediction in the environmental
sciences. Although building neural networks (“training”) can be an involved process,
and resulting models are often very complex, many easy-to-use software packages are
now available that simplify the process for users with limited formal statistical train-
ing.
In the most common approach to training, a neural network is iteratively shown
paired inputs (predictors) and desired outputs (predictands). This is referred to as
supervised training, as the network is guided towards a known outcome (i.e., the
predictand); it is distinct from unsupervised training, which excludes information
regarding outputs (i.e., no prior information regarding outcomes). In the course of
supervised training, a network adapts to minimize differences between its predictions
and supplied outputs. Essentially, the network assumes a relationship exists between
supplied inputs and outputs, and attempts to capture it. Supervised NNs are com-
monly used as nonlinear analogs of traditional multivariate linear regression.
The goal of training is to optimize the NN’s parameters, which are a series of weights
(w) and biases (b). The number of parameters and structure of the final network are
set by a user prior to training; a description of how weights and biases are structured
is provided in Section 3.2.
3.2 Multilayer perceptron (MLP)
Each iteration of the training process adjusts the weights by a small amount. Suppose
the training process is in its m th iteration; when shown the (m + 1) th set of paired
predictors and predictands, the weights may be updated in the following way:
    w_{ij}^{m+1} = w_{ij}^{m} + \Delta w_{ij}^{m},    (3.1)

where \Delta w_{ij}^{m} is the weight update. The multilayer perceptron (MLP) is a common
neural network model, belonging to the class of supervised networks. Using historical
data, the MLP network is capable of effectively mapping input data to output data
(i.e., they may be used for classification or regression). The MLP consists of layers of
‘nodes’; the number of layers and number of nodes per layer are set by the user prior
to training. All MLPs include an input layer (which accepts predictors as input), an
output layer (producing desired predictands), and an arbitrary number of ‘hidden’
layers, so named because the value of nodes in these layers is not typically examined
or relevant to the user.
Mathematically, a trained MLP can be described as follows. Let
F : X → H
be a simple mathematical model which represents the first layer of MLP mappings,
where X represents the predictor set (inputs), H represents the first layer of MLP
nodes, and F represents the activation (or transfer) function connecting the input
layer to the hidden layer. The transfer function F can be linear or nonlinear; the
hyperbolic tangent and unipolar sigmoid functions are commonly used nonlinear choices.
Let x_i (i = 1, 2, ..., I) ∈ X be a given input set that produces a value for
h_j ∈ H, determined by the weights (w_{j,i}) and biases (b_j) connecting it to the
input values; here, j indexes the nodes in this first hidden layer.
The value of the jth node, h_j, is then calculated as follows:

    h_j = F\left( \sum_i x_i w_{j,i} + b_j \right)    (3.2)
That is, hj is a function of all inputs xi. For an MLP with only a single hidden layer,
the values of H then determine the values of the output layer O:
g : H → O.
Suppose o_k ∈ O and let g_k be the transfer function, which is often chosen to constrain
output within a specified (e.g., physically meaningful) range. Here h_j, the output
from the hidden layer, is multiplied by the interconnecting weights w_{k,j}, and
b_k is the bias for the output layer. Then

    o_k = g_k\left( \sum_j h_j w_{k,j} + b_k \right)    (3.3)
Again, the outputs are a function of all nodes in the hidden layer. For MLPs with
multiple hidden layers, nodes in subsequent layers are determined by the value of
nodes in the previous layer. If the multilayer perceptron is too complex (i.e., with
many hidden layers), this may lead to over-fitting by incorporating the noise, and not
just the desired signal, into predictions. When compared to training data, over-fitted
MLP output will show extremely high agreement with actual outcomes; however, it
will show poor performance when used to predict novel data points not used in the
training period. For this reason, MLP performance must always be evaluated using
‘testing’ data, independent of training data (Hsieh, 2009).
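The forward pass described by Equations (3.2) and (3.3) can be sketched for a single-hidden-layer MLP. This is an illustrative NumPy implementation with arbitrary, untrained weights, not the downscaling model used in the study.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer MLP forward pass.

    Hidden layer (Eq. 3.2): h_j = tanh(sum_i x_i w_{j,i} + b_j)
    Output layer (Eq. 3.3): o_k = sum_j h_j w_{k,j} + b_k  (identity transfer)
    """
    h = np.tanh(W1 @ x + b1)  # hidden node values
    return W2 @ h + b2        # output node values

# Arbitrary untrained weights: 3 inputs -> 4 hidden nodes -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = np.array([0.5, -1.0, 2.0])
print(mlp_forward(x, W1, b1, W2, b2))
```

Training would iteratively adjust W1, b1, W2, b2 via the update rule of Equation (3.1); only the forward mapping is shown here.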
3.3 The Conditional Density Estimation Network
(CDEN)
The CDEN is a probabilistic extension of the classical MLP. Rather than return-
ing the single, deterministic value provided by the standard MLP, a CDEN returns
probability distribution parameters, conditional upon input variables. Training of the
network proceeds in a similar fashion to the traditional MLP, with some adjustment
to the standard cost functions. The current study uses the algorithm described in
Cannon (2008). The goal is to identify the NN’s parameters such that the result-
ing conditional distribution maximizes the probability of the training dataset, rather
than minimizing the error of NN predictions. The CDEN assesses the likelihood of
actual outcomes against a predicted probability density function; as the probability
of the true outcomes increases, the cost function is optimized. A description of this
optimization approach follows.
Suppose x is a continuous sample of data with known probability distribution, f(x, θi),
where θi, i = 1, 2, 3, ...., n are the unknown constant NN weights and biases (hereafter
referred to as NN’s parameters, distinct from statistical distribution parameters) to
be estimated from the sample data. The likelihood function L is a map

    L : Θ → R,

which is given by

    L(x | θ_i) = f(x | θ_i);  θ_i ∈ Θ, i = 1, 2, 3, ..., n,

where Θ is the parameter space. If x_1, x_2, ..., x_n are independent observations, then the likelihood function can be written as

    L = L(x_1, x_2, ..., x_n | θ_1, θ_2, ..., θ_n) = \prod_{i=1}^{n} f(x_i | θ_1, θ_2, ..., θ_n).    (3.4)

Taking the natural logarithm of both sides,

    \ln L = \sum_{i=1}^{n} \ln f(x_i | θ_1, θ_2, ..., θ_n).    (3.5)

Now, the maximum likelihood estimates of θ_1, θ_2, ..., θ_n are obtained by maximizing L or \ln L. Using the definition of a critical point, we can write

    d \ln L / dθ_j = 0;  j = 1, 2, ..., n.    (3.6)

The unknown parameters θ_j (j = 1, 2, ..., n) are found by solving the above system of equations. When using the likelihood function, f is set to a known probability distribution. In the current study, the Bernoulli-gamma distribution, with three distribution parameters, has been used:

    L = \prod_{n=1}^{N} p(y_n | p_n, α_n, β_n),    (3.7)

where

    p(y_n | x_n) = p(y_n | p_n, α_n, β_n)

and p_n = f(x_n), α_n = f(x_n), β_n = f(x_n) are CDEN outputs; that is, the probability distribution parameters vary with the input vectors. Minimizing the negative logarithm of the likelihood function is equivalent to maximizing the likelihood function. Hence, the objective function is

    J = −\sum_{n=1}^{N} \ln p(y_n | w, x_n).

Since x_n and y_n are known from the given data, the unknown weights w are optimized to minimize J.
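The objective function J can be written out directly for the Bernoulli-gamma case. The sketch below assumes the per-observation distribution parameters are supplied directly (in the CDEN they would be network outputs); it uses only the density given in Section 3.5, with toy values.

```python
import math

def bernoulli_gamma_nll(y, p, alpha, beta):
    """Objective J = -sum_n ln p(y_n | p_n, alpha_n, beta_n).

    Wet days (y > 0) contribute ln[p * gamma_pdf(y; alpha, beta)];
    dry days (y == 0) contribute ln(1 - p).
    """
    nll = 0.0
    for yn, pn, an, bn in zip(y, p, alpha, beta):
        if yn > 0:
            # ln of p * (y/beta)^(alpha-1) * exp(-y/beta) / (beta * Gamma(alpha))
            log_density = (math.log(pn) + (an - 1.0) * math.log(yn / bn)
                           - yn / bn - math.log(bn) - math.lgamma(an))
            nll -= log_density
        else:
            nll -= math.log(1.0 - pn)
    return nll

# Toy series: one dry day, one wet day, with hypothetical per-day parameters
y = [0.0, 5.0]
J = bernoulli_gamma_nll(y, p=[0.3, 0.9], alpha=[1.0, 2.0], beta=[1.0, 3.0])
print(J)
```

Training a CDEN amounts to adjusting the network weights so that parameter sets like these drive J down over the whole training record.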
3.4 Predictor selection method
When building statistical models, it is very important to select a set of appropriate
input feature variables. The objective is to find the smallest set of features that en-
sures satisfactory predictive performance. With sufficient time and computing power,
this can be accomplished by training multiple prediction/downscaling models using
every possible combination of predictors, then using the single model that gives the
best performance. However, this is not feasible if
i. very large numbers of potentially useful predictors are available, and/or
ii. training takes significant computing resources.
In these situations, an alternate (and faster) prediction method may be used to test
all predictors first. Decision trees are often used for this purpose. These are com-
putationally efficient tools that can be trained quickly, but have limited predictive
power relative to systems such as neural networks; for example, decision trees cannot
extrapolate outside the range of output data supplied during training (SAS Institute
Inc., 2011). The process partitions the entire output data set into multiple disjoint
subsets, and then fits single responses into each individual region of the tree. The
tree is built by finding the input variable that best splits the output data into two
groups, and then, after separating the data, this process is applied to each sub-group
separately. The process is continued until the subgroups either reach a minimum
size or until no improvement can be made. The earlier and/or more often a given
input variable is selected as the basis for division, the more likely it is to be a useful
predictor. Here, Recursive Partitioning and Regression Trees (RPART) have been
used to screen candidate predictors. These included functions of day of year, sea level
pressure (psl850), relative humidity (rh850), specific humidity (sp850), temperature
(t850) for 850mb, vertical velocity (v500), and geopotential height (z500) fields for
500mb from NCEP reanalysis for the fifteen nearest points to St. John’s. Locations
are shown in Figure 3.1. The nearest and furthest points are 19 km and 476.59 km,
respectively, from St. John’s airport.
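The screening idea, ranking each candidate predictor by how well it splits the predictand, can be illustrated with a single-split (decision-stump) version. This is a simplified stand-in for RPART (which builds full recursive trees), with synthetic data in place of the NCEP fields.

```python
import numpy as np

def best_split_gain(x, y):
    """Variance reduction achieved by the best single binary split on predictor x."""
    order = np.argsort(x)
    ys = y[order]
    total = np.var(ys) * len(ys)
    best = 0.0
    for i in range(1, len(ys)):
        left, right = ys[:i], ys[i:]
        gain = total - (np.var(left) * len(left) + np.var(right) * len(right))
        best = max(best, gain)
    return best

# Synthetic stand-in: y depends strongly on predictor 0, weakly on predictor 1
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * (X[:, 0] > 0) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=200)

gains = [best_split_gain(X[:, j], y) for j in range(X.shape[1])]
ranking = np.argsort(gains)[::-1]  # most useful candidate predictor first
print(ranking)
```

A full tree applies this split search recursively within each subgroup; predictors chosen early and often in that recursion are retained as inputs to the CDEN.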
[Figure: map of the study region (longitude −58 to −46, latitude 44 to 56) marking potential predictor locations.]
Figure 3.1: Potential predictors from NCEP reanalysis at locations used in the current study.
3.5 Probability distributions for daily precipitation
In order to apply the CDEN methodology to precipitation downscaling, a suitable
probability distribution for precipitation at the target location must be selected.
A common choice for daily precipitation totals is the mixed Bernoulli-gamma dis-
tribution, in which a probability of zero precipitation is combined with a gamma
distribution for non-zero precipitation probabilities (e.g., Williams, 1998; Haylock et
al., 2006; Cawley et al., 2007; Cannon, 2008). The current study also adopts the
Bernoulli-gamma distribution for St. John’s, having confirmed the distribution is
suitable for this region. Figure 3.2 compares a histogram of St. John’s daily precipit-
ation to a histogram of randomly sampled data from a Bernoulli-gamma distribution
fitted to this same St. John’s data.
The Bernoulli-gamma distribution has three specified parameters: a probability of
precipitation occurring (p), a shape parameter (α), and a scale parameter (β). Its
probability density function is

    f(y; p, α, β) = p (y/β)^{α−1} exp(−y/β) / (β Γ(α)),  if y > 0,

and

    f(y; p, α, β) = 1 − p,  if y = 0,

where y denotes the amount of precipitation. The mean and variance of the distribution are given respectively as

    µ = p α β

and

    s = p α β² (1 + (1 − p) α).
[Figure: two histograms of daily precipitation amount (mm) versus frequency: station precipitation data (left) and randomly created precipitation data (right).]
Figure 3.2: Station data and randomly created precipitation data via the Bernoulli-gamma distribution.
The current study uses a probabilistic downscaling approach, in which the paramet-
ers of a Bernoulli-gamma distribution are conditional upon daily large-scale forcing;
that is, all three Bernoulli-gamma parameters are allowed to vary in response to daily
conditions, reflecting changes in the likelihood that a given amount of precipitation
will occur. For example, a properly trained CDEN should increase the likelihood of
zero precipitation during synoptic conditions that promote clear skies, and increase
the likelihood of heavy precipitation when conditions are extremely warm, humid,
and highly unstable. Daily CDEN-predicted distributions can then be used as a con-
ditional weather generator, informed by daily large-scale variability; the distribution
can be randomly sampled multiple times to identify a reasonable precipitation range
for that day. This approach provides a greater depth of precipitation information
than single-value deterministic downscaling, and better reflects daily variability than
a weather generator based on a static daily precipitation distribution. Extreme
events are then identified as the maximum daily precipitation of each year; these
annual maxima are extracted from both the observed precipitation data and the
stochastic weather generator samples.
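As a rough illustration of the conditional weather-generator idea, the Python sketch below uses hypothetical daily (p, α, β) arrays standing in for CDEN output (the ranges are invented), draws 30 stochastic realisations of every day, and extracts the annual maxima used in the extreme analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, days_per_year, n_draws = 36, 365, 30
n_days = n_years * days_per_year

# Hypothetical daily conditional parameters, standing in for CDEN output:
# each day gets its own (p, alpha, beta) reflecting that day's large-scale state.
p     = rng.uniform(0.3, 0.7, n_days)
alpha = rng.uniform(0.5, 1.0, n_days)
beta  = rng.uniform(5.0, 15.0, n_days)

# Draw n_draws stochastic realisations of every day from the mixed distribution.
wet = rng.random((n_draws, n_days)) < p
amounts = rng.gamma(alpha, beta, size=(n_draws, n_days))
daily = np.where(wet, amounts, 0.0)

# Annual maxima: one set of 36 extremes per stochastic realisation.
annual_max = daily.reshape(n_draws, n_years, days_per_year).max(axis=2)
print(annual_max.shape)   # (30, 36)
```

This yields 30 × 36 = 1080 equivalent years of annual maxima, the same bookkeeping used later in the report.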
3.6 Cross-Validation
When using any powerful empirical prediction technique, we must be careful to avoid
over-fitting. This can be accomplished by excluding some data from training, al-
lowing the trained prediction algorithm to be tested against independent data. This
is straightforward if very long time series are available, but difficult if only limited
data are available. In this situation, cross-validation is often used to produce a
longer evaluation data set (e.g., Hsieh, 2009). Figure 3.3 illustrates how
cross-validation works.
The process splits the data into a pre-selected number of segments, distributing the
data equally among them; one segment serves as the test data and the remaining
segments as the training data. A prediction for each segment is then produced by a
model trained on the remaining segments. In this way, the full time series is
estimated by a series of independent models.
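The segmentation scheme just described can be sketched as follows (a minimal Python illustration of k-fold splitting over contiguous segments, not the thesis's actual implementation):

```python
import numpy as np

def kfold_segments(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal segments,
    yielding (test_idx, train_idx) pairs as in Figure 3.3."""
    idx = np.arange(n)
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)   # everything not in the test segment
        yield test_idx, train_idx

# Every element appears exactly once as test data across the k folds:
n, k = 20, 4
covered = np.concatenate([test for test, _ in kfold_segments(n, k)])
print(sorted(covered) == list(range(n)))   # True
```

Contiguous segments (rather than random shuffling) are the natural choice for a daily time series, since they keep each test block temporally separated from its training data.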
[Schematic: the time series is split into segments; in each fold one segment is
labelled 'Test' and the remaining segments 'Train'.]

Figure 3.3: Cross-validation (CV) method.
Chapter 4
Application of Conditional Density
Estimation Networks to
Precipitation Downscaling in
Newfoundland
4.1 Predictor Selection
The RPART predictor selection method used here is described in Chapter 3; here,
it has been used to select between the following daily mean fields obtained from
the NCEP reanalysis: sea level pressure (psl850), relative humidity (rh850), spe-
cific humidity (sp850), and temperatures (t850) recorded at the 850mb pressure level
(∼ 1km above sea level), and vertical velocity (v500) and geopotential height (z500)
at the 500mb pressure level (∼ 5.5km above sea level). Data from the fifteen NCEP
grid points nearest the St. John’s airport were included; these all lie within 450km
of the airport. As the expected daily precipitation varies with the annual seasonal
cycle, sine/cosine representations of the seasonal cycle (as a function of day of year)
were added as additional candidate predictors. The following predictors (variable
and location given) were identified as relevant in the RPART analysis and included
in CDEN training: temperature at (−47.5◦, 47.5◦); sea level pressure at (−55◦, 45◦)
and (−55◦, 50◦); vertical velocity at five locations, (−52.5◦, 47.5◦), (−50◦, 47.5◦),
(−55◦, 47.5◦), (−55◦, 45◦), and (−57.5◦, 45◦); and relative humidity at three
locations, (−52.5◦, 47.5◦), (−55◦, 47.5◦), and (−55◦, 45◦). In Figure 4.1, the blue
points show the locations of the potential predictors and the red points show the
locations of the selected (effective) predictors.
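RPART ranks candidate predictors by how much a regression-tree split on each one reduces the variance of the response. The Python sketch below illustrates that criterion with invented predictor names and synthetic data (not the NCEP fields used here); a predictor with real signal wins the ranking:

```python
import numpy as np

def best_split_gain(x, y):
    """Variance reduction achieved by the best single binary split on x --
    the criterion a regression tree (as in RPART) uses to rank predictors."""
    ys = y[np.argsort(x)]                         # order response by predictor value
    total_sse = ((ys - ys.mean()) ** 2).sum()
    best = 0.0
    for i in range(1, len(ys)):                   # try every split point
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = max(best, total_sse - sse)
    return best

rng = np.random.default_rng(1)
n = 500
humidity = rng.normal(size=n)                     # hypothetical predictor with signal
pressure = rng.normal(size=n)                     # hypothetical irrelevant predictor
precip = np.maximum(0.0, 3.0 * humidity + rng.normal(size=n))

scores = {name: best_split_gain(x, precip)
          for name, x in [("humidity", humidity), ("pressure", pressure)]}
print(max(scores, key=scores.get))                # humidity
```

Real RPART grows a full tree recursively and prunes it, but this single-split score captures the ranking idea used for predictor screening.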
[Map of the study region (longitude −58◦ to −46◦, latitude 44◦ to 56◦) showing
potential predictor locations (blue) and effective predictor locations (red).]

Figure 4.1: Red points and blue points represent the selected predictor locations
and the potential predictor locations, respectively, for precipitation at the St.
John's airport station.
4.2 Results and Discussion
The thirteen candidate predictors were applied in the CDEN training process. Daily
data covering January 1st, 1976 through December 31st, 2011 were used. A six-fold
cross-validation approach was adopted, in which the full 1976-2011 daily time series
was separated into six equal segments. Daily precipitation for each segment was
then simulated using a CDEN trained with the remaining five segments, ensuring
the simulations were based on independent data. These simulations consist of daily
Bernoulli-gamma probability distribution parameters, conditional upon the values of
the thirteen predictors: scale and shape parameters, along with the conditional
probability of non-zero precipitation for the day.
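Training a CDEN amounts to minimising the negative log-likelihood of the Bernoulli-gamma density over the training data. The numpy sketch below shows that cost function only (the thesis used an existing R package; this is an independent illustration with synthetic data), verifying that the true parameters score better than a clearly wrong guess:

```python
import numpy as np
from math import lgamma

def bernoulli_gamma_nll(y, p, alpha, beta):
    """Negative log-likelihood of the mixed Bernoulli-gamma density --
    the cost minimised when fitting a CDEN (sketch, not the thesis's R code)."""
    y = np.asarray(y, dtype=float)
    wet = y > 0
    y_safe = np.where(wet, y, 1.0)               # placeholder so dry days stay finite
    wet_ll = (np.log(p) + (alpha - 1) * np.log(y_safe / beta)
              - y_safe / beta - np.log(beta) - lgamma(alpha))
    return -np.where(wet, wet_ll, np.log(1 - p)).sum()

# Synthetic Bernoulli-gamma data with known parameters:
rng = np.random.default_rng(7)
y = np.where(rng.random(5000) < 0.5, rng.gamma(0.7, 12.0, 5000), 0.0)
good = bernoulli_gamma_nll(y, 0.5, 0.7, 12.0)    # true parameters
bad = bernoulli_gamma_nll(y, 0.9, 2.0, 2.0)      # deliberately wrong parameters
print(good < bad)                                # True
```

In the full method a neural network maps the daily predictors to (p, α, β), and this likelihood is summed over all training days.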
In order to evaluate the performance of the CDEN, a 'best-guess' time series of daily
precipitation was constructed and assessed by comparing
i. the median of each daily CDEN-generated Bernoulli-gamma distribution to
ii. the observed daily precipitation totals.
Table 4.1 summarizes results as a series of relevant skill scores (Wilks, 2006). As a
comparison point, the same scores are presented for predictions for St. John’s from
the raw NCEP-NCAR reanalysis daily precipitation field. The reanalysis values give
a benchmark for performance at our study location.
Table 4.1 shows the deterministic and categorical performance of the CDEN-derived
‘best-guess’ time series for St. John’s; this represents a deterministic prediction by the
trained CDEN model. Results show that the root-mean square error (RMSE) of the
deterministic CDEN prediction is not significantly different from NCEP, consistent
with results generated in other studies using a coupled ANN-analog downscaling
model (Cannon, 2008). The correlation coefficient for both methods is statistically
Statistic    CDEN    NCEP
RMSE (mm)    7.57    7.97
r            0.62    0.62
HIT          0.90    0.63
FAR          0.13    0.23
TSS          0.77    0.40
BIAS         1.03    0.82

Table 4.1: Cross-validated model performance statistics for the CDEN and NCEP
downscaling models: root mean square error (RMSE), correlation (r), hit rate (HIT),
false alarm ratio (FAR), true skill statistic (TSS), and bias ratio (BIAS).
significant, but there is no meaningful difference between them. The remaining skill measures in
Table 4.1 are based on categorical predictions; specifically, whether precipitation does
or does not occur. Here, precipitation was considered to have occurred if a prediction
(from the deterministic application of the CDEN or NCEP) gave a daily precipitation
of 0.5mm or more. Four skill measures are considered:
i. the hit rate (HIT) gives the fraction of correctly forecasted events, with HIT =
1 being a perfect score;
ii. the false alarm ratio (FAR) gives the fraction of incorrect ‘yes’ forecasts (i.e.,
predicting an event that does not occur), with a perfect score of FAR = 0;
iii. the true skill statistic (TSS) is the difference between HIT and FAR, with 1 being
a perfect score; and
iv. BIAS (B), calculated as the ratio between the number of times an event is pre-
dicted relative to the number of times an event occurs, with B = 1 being the
ideal score (Wilks, 2006).
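The four categorical measures follow directly from a wet/dry contingency table. A small Python sketch (illustrative data, not the St. John's series), using the 0.5mm wet-day threshold and the TSS = HIT − FAR convention adopted in the text:

```python
import numpy as np

def skill_scores(pred, obs, threshold=0.5):
    """HIT, FAR, TSS (= HIT - FAR, as in the text) and BIAS for daily
    precipitation, using a wet-day threshold in mm."""
    p = np.asarray(pred) >= threshold
    o = np.asarray(obs) >= threshold
    hits = np.sum(p & o)
    false_alarms = np.sum(p & ~o)
    misses = np.sum(~p & o)
    hit = hits / (hits + misses)                 # fraction of events caught
    far = false_alarms / (hits + false_alarms)   # fraction of wrong 'yes' forecasts
    return {"HIT": hit, "FAR": far, "TSS": hit - far,
            "BIAS": (hits + false_alarms) / (hits + misses)}

# Tiny worked example: 4 observed events, 5 'yes' forecasts, 1 false alarm.
pred = [3.0, 0.0, 1.2, 6.0, 0.7, 0.0, 2.0]
obs  = [2.5, 0.0, 0.9, 5.0, 0.0, 0.0, 1.1]
print(skill_scores(pred, obs))
```

For this toy series the scores are HIT = 1.0, FAR = 0.2, TSS = 0.8, and BIAS = 1.25, matching a hand count of the contingency table.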
[Four panels of seasonal total precipitation (mm) versus time, 1976-2008: (a) DJF,
(b) MAM, (c) JJA, and (d) SON, each comparing weather generator data with station
data.]

Figure 4.2: Total seasonal precipitation compared between the station data and the
stochastic weather generator based on the CDEN model. The blue line represents the
seasonal totals from the station data and the box plots represent the seasonal totals
from the stochastic weather generator.
Here, the CDEN predictions give significantly higher skill than the NCEP predictions,
with higher HIT, lower FAR, and better TSS. The CDEN model is slightly over-biased,
while NCEP is significantly under-biased (predicting too few events). Thus, we have
found the CDEN model to be more effective than NCEP in predicting whether
precipitation will or will not occur, even though RMSE and correlation do not show
significant differences. Figure 4.2 compares total seasonal precipitation from
observations (blue line) to the 30-member ensemble generated with the stochastic
generator. Results show reasonable agreement, with seasonal totals usually remaining
within the 95% confidence interval predicted by the model. There are several
instances when observations and the model diverge significantly (e.g., the winter of
1998), although these are relatively rare. Along with the results in Table 4.1, this
plot emphasizes that the CDEN provides a reasonable, though imperfect, fit to
observations.
Figure 4.3: Quantile-quantile plot at the St. John's international airport.

In an attempt to further evaluate the strength of the CDEN approach in simulating
realistic precipitation fields, a quantile-quantile plot was produced (Figure 4.3). Here,
the probabilistic potential of the CDEN approach is explored. Instead of drawing a
single ‘best-guess’ from the daily CDEN-predicted precipitation distributions, 30 ran-
dom samples were taken, providing a greater sense of each day’s likely precipitation.
The result is a CDEN-derived stochastic weather generator sample thirty times the
size of the original observed time series. If the CDEN is well trained and works well,
this stochastic sample should give a better sense of extreme precipitation events than
a short observation time series. If the observation period is long enough, the two
should give similar results.
Figure 4.3 compares quantiles of observed St. John’s precipitation over the train-
ing period (horizontal axis) to quantiles from the CDEN sample (vertical axis). A
reference line in this plot shows what perfect agreement would look like. From Fig-
ure 4.3, it is clear that there is a strong relationship between the observed test data
and the simulated precipitation data for events lower than 70mm. This is a signific-
ant amount of precipitation, exceeded roughly once every 1.33 years in the observed
data. Above 70mm, the quantile-quantile plot suggests the CDEN overpredicts pre-
cipitation amounts, with several large outliers. These outliers, or extreme events, are
important events for climatology, and are used as design criteria in engineering in-
frastructure. Results suggest that either the CDEN stochastic sample overestimates
these large events, or that the observational data underestimates them. Figure 4.4(a)
shows the daily precipitation data at our observation site. In this figure, the
x-axis represents the precipitation interval and the y-axis the amount of
precipitation; that is, observed precipitation has been split into discrete categories
('intervals'; x-axis), with the range of precipitation in each interval shown as a
box-and-whisker plot (y-axis). This figure is presented to highlight aspects of extreme precipitation
at this location; notably, the observational record shows that daily precipitation in
excess of 100mm occurs only four times in the period examined. These extreme
outliers may have a strong impact on the training of the CDEN model, in either a
negative or positive manner. Ideally, these rare events would provide information on
conditions that promote extreme precipitation. However, with so few large events,
the CDEN may not be able to accurately identify predictor/extreme precipitation
relationships, leaving the CDEN unable to simulate these events realistically. This
may be the case here, as the CDEN developed for this paper appears to over-estimate
the magnitude of extreme precipitation relative to observations (Figure 4.3). Fig-
ure 4.4(b) shows the same information as Figure 4.4(a), but this time for a synthetic
time series (CDEN output). Comparing these gives more insight into the CDEN’s
extreme overestimates. The larger stochastic sampling is thirty times the size of the
original time series, allowing a closer examination of the CDEN’s interpretation of
rare events. Results again suggest the CDEN output overestimates the magnitude
of large precipitation events. Comparing results in the high precipitation categories
(>100mm) in Figure 4.4(a) and Figure 4.4(b), the CDEN tends to adopt much higher
values, even reaching 216.7mm in the most extreme case. However, the CDEN ac-
curately captures the relative frequency of extreme events, producing 131 events with
100mm or higher. Given that the stochastic CDEN sample is thirty times the size
of the observation time series, this is a close match to observations (four events in
36 years, compared to 131 in 1080 simulated years). Respectively, these give a mean
frequency of 0.11 and 0.12 extreme events per year, or return periods of 9 and 8.2
years.
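The frequencies and return periods quoted above follow directly from the event counts; a quick Python check:

```python
# Empirical frequency and return period of >=100mm days, from the counts in the text.
obs_events, obs_years = 4, 36        # observed record
sim_events, sim_years = 131, 1080    # stochastic CDEN sample

obs_freq = obs_events / obs_years    # events per year, observed
sim_freq = sim_events / sim_years    # events per year, simulated
print(round(obs_freq, 2), round(sim_freq, 2))                               # 0.11 0.12
print(round(obs_years / obs_events, 1), round(sim_years / sim_events, 1))   # 9.0 8.2
```

The return period is simply the reciprocal of the mean annual event frequency.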
The apparent tendency for the CDEN to ‘overforecast’ the magnitude of extreme
events raises concerns that this statistical model is unsuitable for extreme precipit-
ation analysis in our study region. Alternatively, results could be interpreted as an
indication that very high precipitation events are more likely to occur here than the
raw observational record would suggest.
A recent extreme precipitation analysis conducted using a second data source suggests
this may be the case, and that much larger precipitation events are possible than have
been recorded at St. John’s airport. Specifically, a rain gauge located 1.6km from St.
John’s airport (Windsor Lake) recorded three events that are greater than 100mm
since 1998; respectively, these were 104.9mm, 149.6mm, and 180mm (CBCL Ltd.,
2014). This later amount far exceeds the maximum (144.8mm) recorded at the air-
port, and suggests the CDEN’s larger estimates are warranted (e.g., reasonably likely
to occur). It is important to note that although the CDEN does simulate a handful of
events larger than the maximum 180mm Windsor Lake measurement, they are very
rare: in our 1080-year simulation, only 8 were recorded, giving a return period of 135
years.
Figure 4.4: Comparison between (a) the precipitation data for St. John's airport
and (b) the stochastic weather generator based on the CDEN model. The line inside
each box represents the median, boxes extend from the 25th to 75th percentiles, and
outliers are shown as circles.
4.3 Summary and Conclusion
Results indicate that the CDEN methodology, as applied in the current study, can
offer some useful insight into local precipitation distributions and extreme events.
However, they also suggest the method should be used with caution. As a precipit-
ation downscaling method, the CDEN appears to work reasonably well, although it
retains errors in magnitude comparable to NCEP/NCAR reanalysis estimates. The
CDEN does, however, strongly outperform the reanalysis when predicting whether
precipitation will or will not occur, with much better hit rates, bias, and false alarm
rates. CDEN skill shown here is comparable to skill reported in similar studies of
other locations (e.g., British Columbia; Cannon, 2008).
Although the reasonable performance as a deterministic downscaling technique con-
firms that our CDEN application is capable of capturing relationships between large-
scale forcing and local scale precipitation (Table 4.1), the method is potentially more
valuable as a tool for exploring the broader precipitation probability distribution and
extreme precipitation events. To explore this potential, the CDEN was used as a
conditional weather generator, generating a large synthetic precipitation data set by
randomly sampling CDEN-based daily conditional precipitation distributions. In this
case, each day in our 36-year study period was sampled 30 times, giving 1080 equivalent
years of synthetic data. A comparison with the 36-year station record suggests that
the CDEN overestimates extreme precipitation values (Figures 4.3 and 4.4). However,
our ability to evaluate the CDEN is limited by the available observational record;
with only 36 years examined, it is possible that this record is under-representing ex-
treme events. This is supported by other nearby precipitation records, which include
events much larger than those observed at St. John’s airport over the same period
of record (e.g., Windsor Lake, with a maximum of 180mm in 24 hrs). Keeping this
Chapter 4. Application of CDEN to Precipitation Downscaling 34
in mind, CDEN results suggest that design criteria like the 100-year return period
should be increased relative to values based on observations alone. At present, how-
ever, it would be premature to base design criteria on CDEN output alone.
The novelty of the work lies mostly in the application of the method in a new
geographic context, and in the reframing of the CDEN framework as 'probabilistic'
downscaling. Computing challenges included
i. predictor screening/selection for St. John's,
ii. adaptation of the CDEN algorithm to the current context (including scripting
and testing a new cost function in an existing CDEN package in R), and
iii. application of a cross-validation methodology.
Further work on predictor selection and analyses based on longer data sets may be
able to improve the skill of the CDEN predictions; this work is necessary for this
technique to be truly useful for operational decision making. If improved CDEN skills
are achieved, this tool may prove useful in projecting the impact of climate change
on extreme precipitation in Newfoundland. By then applying a CDEN trained on
observations and reanalyses to 20th- and 21st-century general circulation model
output, it would be possible to explore shifts in precipitation distributions in
detail, even when only relatively short model runs (≤ 100 years) are available. This
is being considered
for future work.
Bibliography
2008 Reanalysis and Attribution: Understanding How and Why Recent Climate
Change Has Varied and Changed: Findings and Summary of the U.S. Climate
Change Science Program Synthesis and Assessment Product 1.3. Climate Change
Science Program.
Allen, M. R. & Ingram, W. J. 2002 Constraints on future changes in climate
and the hydrological cycle. Nature, 419 224-232.
Benestad, R. E., Hanssen-Bauer, I. & Chen, D. 2008 Empirical-statistical
downscaling. World Scientific Publishing Company.
Cai, S., Hsieh, W. W. & Cannon, A. J. 2006 A comparison of Bayesian and
conditional density models in probabilistic ozone forecasting. Hydrological
Processes, 16, 1137-1150.
Barrow, E., Maxwell, B., Gachon, P. & Canada, C. E. 2004 Cli-
mate Variability and Change in Canada: Past, Present and Future. Environment
Canada.
Beniston, M., Stephenson, D. B., Christensen, O. B., Ferro, C. A. T.,
Frei, C., Goyette, S., Halsnaes, K., Holt, T., Jylhä, K. & Koffi, B.
2007 Future extreme events in European climate: an exploration of regional climate
model projections. Climatic Change, 81 (S1), 71-95.
Bromwich, D. H. & Cullather, R. I. 1998 The atmospheric hydrologic cycle
over the Arctic basin from ECMWF re-analyses. Proc. of the WCRP First Int.
Conf. on Reanalyses Silver Spring, MD, WCRP, in press.
Cannon, A. J. & McKendry, I. 2002 A graphical sensitivity analysis for statistical
climate models: Application to Indian monsoon rainfall prediction by artificial
neural networks and multiple linear regression models. International Journal of
Climatology, 22, 1687-1708.
Cannon, A. J. 2008 Probabilistic multisite precipitation downscaling by an expanded
Bernoulli-gamma density network. Journal of Hydrometeorology, 9, 1284-1300.
Cannon, A. J. 2010 A flexible nonlinear modelling framework for non-stationary
generalized extreme value analysis in hydro-climatology. Hydrological Processes, 24,
673-685.
Cawley, G. C., Janacek, G. J., Haylock, M. R. & Dorling, S. R.
2007 Predictive uncertainty in environmental modelling. Neural Networks, 20 (4),
537–549.
CBCL Ltd. 2014 Rennies River Catchment Stormwater Management Plan. CBCL
Project No. 123097 .
Chen, D. & Hellstrom, C. 1999 The influence of the North Atlantic Oscillation
on the regional temperature variability in Sweden: spatial and temporal variations.
Tellus, 51A, 505-516.
Christopher, J. W., Sanabria, A, Corney, S, Grose, M, Holz, M,
Bennett, J, Cechet, R & Bindoff, N.L. 2010 Modelling Extreme Events in
a Changing Climate using Regional Dynamically- Downscaled Climate Projections.
IEMSs 2010 International Congress on Environmental Modelling and Software,
Ottawa.
Dawson, C. & Wilby, R. 2001 Hydrological modelling using artificial neural
networks. Progress in Physical Geography, 25, 80-108.
Christensen, J. H., Carter, T. R., Rummukainen, M. & Amanatidis,
G. 2007 Evaluating the performance and utility of regional climate models: The
PRUDENCE project. Climate Change, 81, 1-6.
Dickinson, R.E., Errico, R.M., Giorgi, F.& Bates, G.T. 1989 A regional
climate model for western United States. Climatic Change, 15, 383-422.
Dorling, S., Foxall, R., Mandic, D. & Cawley, G. 2003 Maximum
likelihood cost functions for neural network models of air quality data. Atmospheric
Environment, 37(24), 3435-3443.
Finnis, J., Holland, M. M., Serreze, M. C. & Cassano, J. J. 2007 Response
of Northern Hemisphere extratropical cyclone activity and associated precipita-
tion to climate change, as represented by the Community Climate System Model.
Journal Geophysics Ressources, 112 G04S42, doi:10.1029/2006JG000286.
Folland, C. & Karl, T. 2001 Observed climate variability and change. In:
Houghton, J.T. et al. (eds.), Climate Change 2001: The Scientific Basis. Cambridge
University Press, 99-181.
Forland, E.J., Engelen, A., Ashcroft, J., Dahlstrom, B., Demaree,
G., Frich, P., Hanssen-Bauer, I., Heino, R., Jonsson, T., Mietus,
M., Muller-Westermeier, G., Palsdottir, T., Tuomenvirta, H. &
Vedin, H. 1996 Change in normal precipitation in the North Atlantic region (2nd
edn). DNMI Report 7-96 Klima.
Fowler, H. J., Blenkinsop, S. & Tebaldi, C. 2007 Linking climate change
modelling to impact studies: Recent advances in downscaling techniques for hydro-
logical modelling. International Journal of Climatology, 27, 1547-1578.
Frei, C., Schär, C., Lüthi, D. & Davies, H. C. 1998 Heavy precipitation
processes in a warmer climate. Geophysical Research Letters, 25, 1431-1434.
Frei, C., Schöll, R., Fukutome, S., Schmidli, J. & Vidale, P. L. 2006
Future change of precipitation extremes in Europe: Inter-comparison of scenarios
from regional climate models. Journal of Geophysical Research, 111.
Gardner, M. W. & Dorling, S. 1998 Artificial neural networks (the multilayer
perceptron): a review of applications in the atmospheric sciences. Atmospheric
Environment, 32, 2627-2636.
Gautam, D. & Holz, K. 2000 Neural network based system identification
approach for the modelling of water resources and environmental systems. Artificial
intelligence methods in civil engineering applications, 87-100.
Giorgi, F. 1990 Simulation of regional climate using a limited area model nested
in a general circulation model. Journal of Climate, 3, 941-963.
Goodess, C. M. 2011 An inter-comparison of statistical downscaling methods for
Europe and European regions: Assessing their performance with respect to extreme
weather events and the implications for climate change applications. Technical
report, Univ. of East Anglia, Norwich, U. K., in press.
Groisman, P., Karl, T., Easterling, D., Knight, R. W., Jamason, P.,
Hennessy, K. J., Suppiah, R., Page, C. M., Wibig, J., Fortuniak, K.,
Razuvaev, V., Forland, E. & Zhai, P. 1999 Changes in the probability of
heavy precipitation: Important indicators of climatic change. Climatic Change,
42, 243-283.
Gutowski, W.J., Arritt, R.W., Kawazoe, S., Flory, D.M., Takle,
E.S., Biner, S., Caya, D., Jones, R.G., Laprise, R., Leung, R.L.,
Mearns, L.O., Moufouma-Okia, W., Nunes, A.M.B.,Qian, Y , Roads,
J.O., Sloan, L.C. & Snyder, M.A. 2010 Regional Extreme Monthly Precip-
itation Simulated by NARCCAP RCMs. Journal of Hydrometeor, 11, 1373-1379.
doi: http://dx.doi.org/10.1175/2010JHM1297.1
Haylock, M. R., Cawley, G. C., Harpham, C., Wilby, R. L. & Goodess,
C. M. 2006 Downscaling heavy precipitation over the United Kingdom: a compar-
ison of dynamical and statistical methods and their future scenarios. International
Journal of Climatology, 26, 1397-1415.
Hellström, C., Chen, D., Achberger, C. & Räisänen, J. 2001 Comparison of
climate change scenarios for Sweden based on statistical and dynamical downscaling
of monthly precipitation. Climate Research, 19, 45-55.
Hennessy, K. J. & Suppiah, R. 1999 Australian rainfall changes, 1910-1955.
Australian Meteorological Magazine, 48, 1-13.
Hostetler, S.W., Bartlein, P.J., Clark, P.U., Small, E.E. & So-
lomon, A.M. 2000 Simulated influence of Lake Agassiz on the climate of Central
North America 11,000 years ago. Nature, 405(6784), 334–337.
Houghton, J.T., Meira Filho, L., Callander, B., Harris, N., Kattenberg,
A. & Maskell, K. (eds.) 1996 IPCC Climate Change: The IPCC Second
Assessment Report. Cambridge University Press: New York.
Hsieh, W. W. & Tang, B. 1998 Applying neural network models to prediction
and data analysis in meteorology and oceanography. Bulletin of the American Met-
eorological Society, 79, 1855-1870.
Hsieh, W. W. 2009 Machine Learning Methods in the Environmental Sciences.
Cambridge University Press, 1st edition.
http://dx.doi.org/10.1017/CBO9780511627217
Iwashima, T.& Yamamoto, R. 1993 A statistical analysis of the extreme events:
long-term trend of heavy daily precipitation. Journal of the Meteorological Society
of Japan, 71, 637-640.
Kaas, E. & Frich, P. 1995 Diurnal temperature range and cloud cover in the nordic
countries: observed trends and estimates for the future. Atmospheric Research, 37,
211-228.
Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D.,
Gandin, L., & Joseph, D. 1996 The NCEP/NCAR 40-year reanalysis project.
Bulletin of the American meteorological Society, 77(3), 437-471.
Kalnay, E. 2003 Atmospheric Modeling, Data Assimilation and Predictability.
Cambridge University Press.
Karl, T. R., Knight, R. W. & Plummer, N. 1995 Trends in high frequency
climate variability in the twentieth century, 217-220.
Karl, T. R. & Knight, R. W. 1998 Secular trends of precipitation amount fre-
quency and intensity in the United States. Bulletin of the American Meteorological
Society, 79, 231-241.
Lenderink, G., Buishand, A. ,& van Deursen, W. 2007 Estimates of future
discharges of the river Rhine using two scenario methodologies: Direct versus delta
approach. Hydrology and Earth System Sciences, 11(3), 1145-1159.
Linderson, M., Achberger, J. & Chen, D. L. 2004 Statistical downscal-
ing and scenario construction of precipitation in scania, southern sweden. Nordic
Hydrology, 35(3), 261-278.
Mekis, E. & Vincent, L.A. 2011 An overview of the second generation adjusted
daily precipitation dataset for trend analysis in Canada. Atmosphere-Ocean, 49(2),
163-177.
Plummer, N., Salinger, M. J., Nicholls, N., Suppiah, R., Hennessy,
K. J., Leighton, R. M., Trewin, B., Page, C. M. & Lough, J. M.
1999 Changes in climate extremes over the Australian region and New Zealand during
the twentieth century. In Weather and Climate Extremes, 183-202. Springer
Netherlands.
Rummukainen, M. 1997 Methods of statistical downscaling of GCM simulations.
SMHI Rapporter. Meteorologi och Klimatologi (Sweden). no. 80.
SAS Institute Inc. 2011 SAS(R) Enterprise Miner 7.1 Extension Nodes:
Developer's Guide. Cary, NC: SAS Institute Inc.
Schonwiese, C. & Rapp, J. 1996 Climate Trend Atlas of Europe Based on
Observations 1891-1990. Kluwer Academic Publishers: Dordrecht.
Schoof, J. & Pryor, S. 2001 Downscaling temperature and precipitation: a
comparison of regression-based methods and artificial neural networks. International
Journal of Climatology, 21, 773-790.
Suppiah, R. & Hennessy, K. J. 1995 Trends in the intensity and frequency of
heavy rainfall in tropical Australia and links with the southern oscillation. World
Meteorological Organization-Publications-WMO TD, 897-901.
Suppiah, R. & Hennessy, K. J. 1998 Trends in total rainfall, heavy rainfall
events and number of dry events in Australia, 1910-1990. International Journal of
Climatology, 18, 1141-1164.
Trenberth, K.E., Karl, T. R. & Knight, R. W. 2003 Modern global climate
change. Science, 302, 1719-1723
Vincent, L. A. 1998 A technique for the identification of inhomogeneities in Ca-
nadian temperature series. Journal of Climate, 11, 1094-1104.
Vincent, L. A. & Gullett, D. W. 1999 Canadian historical and homogen-
eous temperature datasets for climate change analyses. International Journal of
Climatology, 19, 1375-1388.
Vincent, L. A., Zhang, X., Bonsal, B.R. & Hogg, W. D. 2002 Homogen-
ization of daily temperatures over Canada. Journal of Climate, 15, 1322-1334.
Vincent, L. A. & Mekis, E. 2006 Changes in daily and extreme temperature
and precipitation indices for Canada over the twentieth century. Atmosphere-
Ocean, 44 (2), 177-193.
Vrac, M., & Naveau, P. 2007 Stochastic downscaling of precipitation: From
dry events to heavy rainfalls. Water resources research, 43(7).
Weichert, A.& Burger, G. 1998 Linear versus nonlinear techniques in down-
scaling. Climate Research, 8, 83-93.
Wigley, T. M. L., Jones, P. D., Briffa, K. R. & Smith, G. 1990 Obtaining
subgrid scale information from coarse resolution general circulation model output.
Journal of Geophysical Research: Atmospheres (1984-2012), 95(D2), 1943-1953.
Wilby, R. L., Wigley, T. M. L., Conway, D., Jones, P. D., Hewitson,
B. C., Main, J.& Wilks,D. S. 1998 Statistical downscaling of general circula-
tion model output: A comparison of methods. Water resources research, 34(11),
2995-3008.
Wilks, D. S. 1998 Multisite generalization of a daily stochastic precipitation gen-
eration model. Journal of Hydrology, 210, 178–191.
Wilks, D. S. 2006 Statistical Methods in the Atmospheric Sciences. Academic
Press, 627.
Williams, P. M. 1998 Modelling seasonality and trends in daily rainfall data. In
Advances in Neural Information Processing Systems 10, 985-991.