spatial prediction on river networks: comparison of ... · m is the lagrange multiplier from the...

HYDROLOGICAL PROCESSESHydrol. Process. 28, 315–324 (2014)Published online 5 November 2012 in Wiley Online Library(wileyonlinelibrary.com) DOI: 10.1002/hyp.9578

Spatial prediction on river networks: comparison of top-krigingwith regional regression

G. Laaha,1* J.O. Skøien2 and G. Blöschl31 Institute of Applied Statistics and Computing (IASC), University of Natural Resources and Life Sciences, BOKU Vienna, Austria

2 Institute for Environment and Sustainability, Joint Research Centre of the European Commission, Ispra, Italy3 Institute for Hydraulic and Water Resources Engineering (IWI), Vienna University of Technology, Austria

*CCoBOE-m

Co

Abstract:

Top-kriging is a method for estimating stream flow-related variables on a river network. Top-kriging treats these variables as emergingfrom a two-dimensional spatially continuous process in the landscape. The top-kriging weights are estimated by regularising the pointvariogram over the catchment area (kriging support), which accounts for the nested nature of the catchments. We test the top-krigingmethod for a comprehensive Austrian data set of low stream flows. We compare it with the regional regression approach where linearregressionmodels between low stream flow and catchment characteristics are fitted independently for sub-regions of the study area thatare deemed to be homogeneous in terms offlowprocesses. Leave-one-out cross-validation results indicate that top-kriging outperformsthe regional regression on average over the entire study domain. The coefficients of determination (cross-validation) of specific lowstream flows are 0.75 and 0.68 for the top-kriging and regional regression methods, respectively. For locations without upstream datapoints, the performances of the twomethods are similar. For locationswith upstreamdata points, top-kriging performsmuch better thanregional regression as it exploits the low flow information of the neighbouring locations. Copyright © 2012 John Wiley & Sons, Ltd.

KEY WORDS change of support; geostatistics; spatial interpolation; prediction in ungauged basins (PUB); stream distance; lowflows and droughts

Received 20 March 2012; Accepted 1 October 2012

INTRODUCTION

The estimation of stream flow-related variables is usuallybased on regression methods between the stream flowvariable and physiographic catchment characteristics. Ifthe study domain is large or very heterogeneous in terms ofthe low flow processes, a number of authors have suggestedto split the domain into regions and to apply a regressionrelationship to each of the regions independently (e.g.Nathan and McMahon, 1990; Gustard et al., 1992;Aschwanden and Kan, 1999; Laaha and Blöschl, 2006b).This is termed the regional regression approach. Recently,geostatistical methods have been proposed for streamnetworks. One approach is to treat the estimation on a rivernetwork as a one-dimensional problem. Gottschalk (1993a)was probably the first to develop a method for calculatingcovariance along a stream network based on river distance.Ver Hoef et al. (2006) and Cressie et al. (2006) proposeda moving average approach for interpolating on the one-dimensional river system, where the estimation eitherproceeds upstream or downstream. Peterson and Ver Hoef(2010) proposed a combination of these two methods.Garreta et al. (2009) evaluated the three model types in thecontext of summer stream temperature and nitrate concen-tration and found the combined model to be superior toeach of the individual models.

orrespondence to: G. Laaha, Institute of Applied Statistics andmputing (IASC), University of Natural Resources and Life Sciences,KU Vienna, Austriaail: [email protected]

pyright © 2012 John Wiley & Sons, Ltd.

The second approach treats the river network as a two-dimensional problem. In this approach, the runoff processis conceptualised as a spatially continuous process whichexists at any point in the landscape, and stream flow is theintegral of local runoff over the catchment. Sauquet et al.(2000) and Gottschalk et al. (2006) proposed a block-kriging method where spatial dependence of catchmentswith different, non-zero support is modelled by aregularised covariogram. Skøien et al. (2006) extend thework of Sauquet et al. (2000) to account for the strongerspatial correlation between nested basins than betweenun-nested basins. They showed that their method, knownas topological kriging or top-kriging, can be used, in anapproximate way, for a range of stream flow-relatedvariables including variables that do not aggregate linearlyand are non-stationary. Also, they apply a kriging withuncertain data (KUD) estimator (de Marsily, 1986 p. 300;Merz and Blöschl, 2005) to account for local uncertaintiesof the observations.The aim of this paper is twofold: (1) to test top-kriging

in the context of low stream flows, and (2) to compare itwith the regional regression method which constitutes theactual benchmark for low stream flow regionalisation. Theanalyses will be performed on a comprehensive Austriandata set.

TOP-KRIGING METHOD

There are two main groups of processes that control streamflow. The first group consists of runoff generating

Figure 1. Calculation of regularised variogram for two nested catchments(schematic, from Skøien et al., 2006). Catchment i is nested withincatchment j, Ai and Aj are the respective catchment areas (support), and

s and u are the discretisation points within each catchment

316 G. LAAHA ET AL.

processes acting over the catchment area. These processesare continuous in space. The second group is related torunoff aggregation and routing. These processes are relatedto the river network topology. The main idea of top-krigingis to combine the two groups of processes in a geostatisticalframework. For this, runoff generation is conceptualised asa spatially continuous process which exists at any point inthe landscape. Instead of observing a pointwise realisationof this process, data z(A1), z(A2), . . ., z(An) are observed atstream gauges, where

z Aið Þ ¼ 1Aij j

ZAi

z xð Þdx (1)

and Ai denotes the spatial support of z(Ai). For stream flowvariables, Ai is the catchment which drains into a riverlocation xi, and |Ai| is its surface area. If we follow thestream from the source to the mouth, the support increasesas the catchment area increases. In this context, the transferof information between river locations adds up to the area-to-area change of support problem in geostatistics (e.g.Gotway and Young, 2002), based on the classicalEuclidean distance metric. For spatial prediction on a riverlocation x0 with catchment area A0 from non-point samplesz(A1), z(A2), . . ., z(An), the linear block-kriging predictorgiven by

z A0ð Þ ¼Xni¼1

liz Aið Þ (2)

is used. Assuming runoff generation as an intrinsic stationaryrandom process, the optimal weights li can be found bysolving the kriging system:

Xnj¼1

lj�gij � ljs2i þ m ¼ �g0i i ¼ 1;. . .; n

Xni¼1

li ¼ 1

(3)

The �gij refers to the expected semivariance betweentwo observations i and j with non-zero support (see below),m is the Lagrange multiplier from the constraint minimisa-tion, ands2i represents the measurement error or uncertaintyof observation i. The kriging equations (Equation (3))result from minimising prediction mean squared errorsubject to unbiasedness constraints. The use of measure-ment errors in the kriging equations is termed KUD(de Marsily, 1986 p. 300).Since the observations have a non-zero support A, the

expected semivariances �gij between the observations needto be obtained by regularisation (Cressie, 1993, p. 66).This means that instead of one variogram model, a familyof variogram models for different catchment areas(kriging support) is used, which accounts for the differentscales and the nested nature of the catchments. Assumingthe existence of a point semivariogram gp, the expectedsemivariance�gij between two observations with catchmentareas Ai and Aj, respectively, is:

Copyright © 2012 John Wiley & Sons, Ltd.

�gij ¼ 0:5 � Var z Aið Þ � z Aj

� �� ¼ 1

Aij j Aj

�� ZAj

ZAi

gp s� uð Þ ds du

�0:5 �"

1

Aij j2ZAi

ZAi


þ 1

Aj

�� 2ZAj

ZAj


#(4)

s and u are position vectors within each catchment used forthe integration. The first part of this expression integratesall the variance between the two catchments, while thesecond part subtracts the variance within the catchments.Consequently, the �gij will be lowest for close-by locationsat the same river, due to the overlapping support. Intop-kriging, the integrals in Equation (4) are computed bydiscretising the catchment area into a grid of points,according to common geostatistical practice. Figure 1(from Skøien et al., 2006) shows a schematic of two nestedcatchments, their discretisation by a square grid, andthe distances between the discretised points withinthe catchments.

EXAMPLE OF LOW STREAM FLOWS

Data set

We test the performance of top-kriging for the exampleof low stream flows. The data set used in this study consistsof about 8000 catchments in Austria. For 490 of thesecatchments, daily streamflowmeasurements Q are availablefor the period 1977–1996 (Laaha and Blöschl, 2006a).Q was standardised by the catchment area to transfer stream

Hydrol. Process. 28, 315–324 (2014)

317SPATIAL PREDICTION ON A RIVER NETWORK: TOP-KRIGING VERSUS REGRESSION

flow into a continuous spatial variable q with point support.From these data, the specific discharge q95 (ls�1 km�2) thatis exceeded on 95% of the time, Pr(q> q95) = 0.95, wascalculated. q95 was used as the target variable in this study(seeFigure 2). Themeasurement errors of q95were taken fromLaaha and Blöschl (2005 and 2007). From this assessment,the observed low flow characteristics exhibit errors between3% and 24% of q95 for 20- to 5-year stream flow records.The so-obtained data set of low flow characteristics and theirmeasurement errors was used in top-kriging estimation.The top-kriging approachwas comparedwith the regional

regression model presented in Laaha and Blöschl (2006b).The regression model was fitted to low flow data of325 Austrian catchments from a 20-year period, and 31physiographic catchment characteristics were used aspotential predictor variables, including sub-catchmentarea (A), topographic elevation (H), topographic slope (S),precipitation (P), geological classes (G), land use classes(L), and stream network density (D), see Laaha and Blöschl(2006a). The comparison of top-kriging and regionalregression performances was carried out on 300 catchmentswhich constitute the intersection of both data sets.

Estimation of variogram

For applying top-kriging, a variogram model is needed,and following Skøien et al. (2003), a point variogramwith anugget effect of the following shape was used:

gp hð Þ ¼ ahb 1� e� h=cð Þd� �

þ C0p (5)

a, b, c, and d are parameters. a is related to the sill of thevariogram, c is a correlation length, b and d define the longand short distance slope of the variogram in a log-log plot,and C0p is the point nugget effect. Despite it is also possibleto usemany of the classical variogrammodels in top-kriging(e.g. exponential, Gaussian, or spherical variogram), wehave chosen to use the model according to Equation (5) as itproved well suited in an earlier flood regionalisation study(Skøien et al., 2003). To ensure a valid kriging model, the

Figure 2. Situation of stream gauges used in this study. Point symbol colours indirecords. Background shading refers to seasonality types (A–E winter seasonality


point variogram generally needs to be estimated andmodelled with a valid conditionally negative-definitefunction based on the data. We proved the validity of themodel for parameters a, b, c, d> 0 and 2b+ d< 1 byshowing that for all z> 0, e� zg(h) is positive definite(Cressie, 1993, p.86ff). The proof was given through thesufficient condition that a decreasing real function which iseven and convex on (0;1) is positive definite (e.g. Berg andForst, 1975, Theorem 5.4).For stream flow and related variables the point variogram

cannot be directly fitted to the sample variogram because ofthe different support (catchment area) of the measurements.Rather, it can be determined by the back-calculationapproach of Kyriakidis (2004) and Mockus (1998). In thisapproach, a number of theoretical point variograms withdifferent parameter sets are assumed. Then, each pointvariogram model is regularised for all pairs of observationsusing Equation (4), in order to find the point variogramwhose regularisations fit best to sample variogram values.According to Cressie (1985), the sum of weighted squaredresiduals between regularised theoretical semivariances andobserved semivariances was used as the optimality criterion.The observed semivariances were estimated using aclassified variogram estimatorwhich is similar toMatheron’s(1965) traditional estimator. The classes were defined alongdistance between centroids (h), area of the smaller catchmentAi, and area of the bigger catchment Aj.From the automatic fitting procedure, we obtained a point

variogram model with parameters a = 112, b = 0.001,c = 4000, d = 0.100, and C0p = 0.580. Figure 3 shows thepoint variogram together with a number of regularisedvariograms for different catchment areas. In all cases, asquare catchment shape was assumed. As the catchmentarea increases, the gamma values decrease because of thesmoothing effect of regularisation. Due to their differentsupports, catchments of different size will always have asemivariance> 0, also in the hypothetical limiting casewhen the centre-to-centre distance is zero. This is the reasonwhy all variograms between catchments of different size

cate magnitude of specific low flows q95 calculated from observed stream flow, 1–5 summer seasonality). The area shown is Austria which is 600km across

Hydrol. Process. 28, 315–324 (2014)

Figure 3. Variogram model of q95 top-kriging

318 G. LAAHA ET AL.

start with an apparent nugget effect. A scatter plot betweenthe theoretical variogram values gmod and observedvariogram values gobs shows that the variogram fits wellto observations, although the model has a tendency ofunderestimating someof the larger gammavalues (Figure 4).

0 10 15 20 25

010

1520

25

Observed variogram (l/s/km²)²

Reg

ular

ized

theo

retic

al v

ario

gram

(l/s

/km

²)²

5

5

Figure 4. Goodness-of-fit between theoretical and observed variogram

Table I. Components of the regional regression model. R2 denotes tflow q95 [ls

Group Region

A–C Alps

1 Flatland and hilly terrain2 Bohemian Massif3 Foothills of Alps (Upper Austria)4 Flyschzone5 Lower CarinthiaD Pre-Alps (Styria)E Pre-Alps (Vorarlberg)

Symbols: P, PW, PS mean annual / winter / summer precipitation [102mm]),percentages of moderate slope [%]; GQ, GL, GC, GGS area percentages of Qgroundwater table [%]; LF, LR, LWE area percentages of forest / rocks / wet


For the purpose of demonstrating the characteristics oftop-kriging, however, the fit was considered acceptable.

Regression model

For application of the regression model, the study domainwas subdivided into eight geographically contiguousregions of similar low flow seasonality (Figure 2). For eachregion, a multiple regression relationship between low flowcharacteristic q95 and catchment characteristics was fitted todata. Nested catchments were split into sub-catchmentsbetween subsequent stream gauges. To minimise inter-correlations and multicollinearity, a stepwise regressionapproach was adopted usingMallow’s Cp (Weisberg, 1985,p. 216) as the criterion of optimality. The method was mademore robust by an interactive outlier detection based onCook’s distance criterion.The resulting regional regression models are shown in

Table I. The equations suggest that precipitation is one ofthe most important controls of low flows in Austria. Itdetermines the water fluxes into the catchment which areavailable during the recession periods. Hence, it has apositive effect on low flows in both, Alpine and lowlandregions. The positive effect on winter low flows in the Alpsmay be related to a tendency of precipitation periods to begenerally warmer than dry winter periods. Catchmenttopography is represented in all regional models, generallyby one altitude parameter or by one slope parameter. Thetopography further has a strong influence on precipitation,and appears as an equally important control of low flows inAustria as precipitation. Altitude has a positive effect onsummer low flows (less evaporation) and a negative effecton winter low flows (lower temperature), which is mostpronounced in the highest alpine ranges. Slope generally hasa positive effect on low flows; it is possibly correlated withstorage volume in high mountains. Catchment geology isrepresented in many regional models. Low flows increasewith the porosity of the formations, so that sediments andlimestone formations have a positive effect on low flows,and crystalline rocks have a negative effect on low flows.Land use seems to play a subordinate role. Apart from forestareas, the proportion of rocks and the proportion of wetlandsappear in the regression models of the alpine regions, but

he goodness-of-fit coefficient of determination; q95 predicted low�1 km�2]

R2 Model

51% q95 = 0.67 + 0.40∙P+ 0.17∙GQ � 0.01∙GC

+6.43∙LWE+ 0.14∙SM � 0.04∙LR � 0.20∙H0

71% q95 = �0.12 + 0.11∙SM+0.05∙GGS + 0.02∙GC

64% q95 = �3.31 + 1.96∙PW68% q95 = �10.04 � 0.76∙D+3.27∙P � 2.22∙H0

63% q95 = �6.17 + 0.06∙GL+ 2.07∙PS � 0.06∙LF

83% q95 = �17.48 + 3.56∙D+20.06∙LWE

89% q95 = �7.99 + 1.08∙P+ 0.04∙LF

60% q95 = 18.20 � 0.18∙SMO

H0 altitude of the stream gauge [102m], SM mean slope [%], and SSL areauaternary sediments / Limestone / Crystalline rock / areas with shallowland [%]; D stream network density [102m/km2].

Hydrol. Process. 28, 315–324 (2014)


they are rather associated with high mountains orhydrological conditions than with classical land uses.The coefficients of determination of Table I show

that the regional regression model based on seasonalityregions performs well in most regions, with coefficients ofdetermination ranging from 60 to 80%. The exception is theAlpine, winter low flow-dominated region, wherethe coefficient of determination is only 51%. This lowcoefficient of determination may be related to lumpingthree types of seasonality (A, B, C) that do not formcontiguous regions into a single contiguous region. Theregression model for the Pre-Alps of Styria exhibits alarger coefficient of determination of 89%. Overall, theresults show that the seasonality characteristics seem tocontain a lot of information highly relevant to low flowregionalisation.In addition to average errors represented by the

coefficient of determination, it is important to have anunderstanding of the errors committed when predictingat an individual, ungauged site. The regression standarderror of predicting individual observations represents theuncertainty of regression estimates and is given by:

e ¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2 þ se2 q950ð Þ

p(6)

Figure 5. Estimated specific low


where s is the standard deviation of the residuals of multipleregression, andse q950ð Þis the standard error of the predictedmean value q950 (Draper and Smith, 1998, p. 130).The resulting standard errors range between 0.4 and2.8 ls�1 km�2, depending on the region.

Spatial estimates and error standard deviations

The fitted models were applied to estimate specific lowflows q95 for 8000 nodes of the river network. Assumingsub-catchments as homogeneous units with constant q95,these estimates are representative for the river segmentupstream of the node and can thus be plotted as vector mapsof the stream network.The spatial estimates obtained by top-kriging and regional

regression are presented in Figure 5. Although estimatedpatterns of both models are similar on a larger scale, thereare clear differences in terms of small-scale variability, andtop-kriging yields more heterogeneous patterns thanregression. The small-scale variability of estimates corre-sponds with the weight a model gives to local informationrelative to regional information. Regression is fitted to aregional data set without local weighting of data points.Thus, the variability of estimates solely originates from thespatial variability of predictors reflecting topography and

stream flows q95 (ls�1 km�2)

Hydrol. Process. 28, 315–324 (2014)

320 G. LAAHA ET AL.

climate of the study area. Top-kriging, however, distributesweights according to proximity and topology of catchments,depending on the chosen variogram model and the size ofthe local neighbourhood. For the setting used in this study,much weight is given to local data, and the patterns of top-kriging estimates are therefore more heterogeneous thanregression estimates.The uncertainties estimated by top-kriging and regression

are very different for most of the stream network. Theprediction errors of regression (Figure 6b) are nearlyconstant over large areas which, to some degree, correspondto the geographic regions described in section 3.3. Theybasically reflect the estimation variance of each regionalregression model. The prediction errors of top-kriging(Figure 6a) are more heterogeneous. Top-kriging givesrelatively small uncertainties on the main river with errorstandard deviations between 0 and 1 ls�1 km�2. This is onlyslightly larger than the error standard deviations of theobservations. On the other hand, the uncertainties ofsome of the tributaries are considerably larger, in particularof those tributaries where no observations are available.It is interesting that the uncertainty also increases substan-tially with decreasing catchment area. For some of thesmallest catchments, error standard deviations of more than

Figure 6. Error standard deviation of spatial es


8 ls�1 km�2 are estimated. The higher uncertainty for thesmallest catchments is very realistic, because catchments ofsuch size are underrepresented in the sample. Hence, theirprediction will be subject to additional downscaling errors.Because of the tree structure of river networks, scale issuesconstitute a typical problem in river network modelling(Blöschl and Sivapalan, 1995; Skøien and Blöschl, 2006).Top-kriging explicitly takes scaling into account, since it isactually one solution to the traditional change of supportproblem in geostatistics (Gotway and Young, 2002). Top-kriging is therefore the natural way of predicting streamflow-related variables on a river network. Regression,however, ignores the differences of the support along theriver network.

Cross-validation

In order to examine the relative performances of modelsmore quantitatively, we performed leave-one-out cross-validation where one withholds the stream flow observationof a particular data point,makes an estimate for that location,and then compares the estimate with the stream flowobservation, repeating the procedure for all data points. Thisprocedure emulates the case of estimating at sites withoutstream flow observation. For top-kriging, we used the

timates of low stream flows q95 (ls�1 km�2)

Hydrol. Process. 28, 315–324 (2014)


variogrammodel obtained from all n observations to predictby kriging each point in turn using the remaining n�1observations (e.g. see Cressie, 1993, p102). The approachtherefore assumes a known variogram model. For regionalregression, the model consists of individual regressions forcontiguous sub-regions (section 3.3). Because of the largenumber of observations, the boundaries will not changemuch when leaving out single data points, so we did notchange the regions in the cross-validation. For each point inturn, we updated the regression equation for remaining n�1data points, and the so-obtained regression model was usedto predict low flows at the site of interest. From the resultingvector of cv-errors of each model, the cross-validationerror variance Vcv, the root mean squared error rmsecv, thecoefficient of determination R2

cv, and the bias of cross-validation estimates biascv were estimated:

Vcv ¼ 1n

Xni¼1

ecv;i� �2

(7)

rmsecv ¼ffiffiffiffiffiffiffiVcv

p(8)

R2cv ¼

Vq � Vcv

Vq(9)

biascv ¼ 1n

Xni¼1

ecv;i� �

(10)

In these equations, ecv,i is the cv-residual of a model forcatchment i, and Vq is the spatial variance of the observedspecific low flow discharges q95. The root mean squarederror rmsecv and the coefficient of determination R2

cv

provide composite measures of systematic and randomerrors, whereas the of cross-validation estimates biascvprovides a measure of systematic errors only. In addition,relative error measures rrmsecv = rmsecv / mean(q95) andrbiascv = biascv / mean (q95) were calculated.The results of the cross-validation are shown in Figure 7

in terms of the error distribution for the set of 300catchments, and residual statistics are presented in Table II.All residual statistics (R2

cv, rmsecv, biascv, etc.) were estimatedwithout using the 5% outliers for robustness. These outliers

0.2 1.0 5.0 20.0

0.2

1.0

5.0

20.0

Observed q95 (l/s/km²)

Pre

dict

ed q

95 (

l/s/k

m²)

a)

Figure 7. Scatter plots of predicted versus observed specific low st


can be explained by karst effects or seepage and are hence notgenuinely attributable to regionalisation errors. Figure 7,however, does include these outliers. The results indicate thattop-kriging (R2

cv = 0.75, rmsecv = 1.781 ls�1 km�2) clearlyoutperforms regression, which gives R2

cv of 0.68 and rmsecvof 1.999 ls�1 km�2. The regression estimates are nearlyunbiased (biascv =�0.032 ls�1 km�2), while top-krigingslightly overestimates low stream flows (biascv = 0.294ls�1 km�2

, rbiascv = 0.05). The possible reasons will beassessed through regional examples in the next section.

Regional performances

The first example is a typical situation of Tyrolean High-Alps. Catchments in this region are small, and many of thecatchments are headwater catchment, i.e. catchmentswithout upstream data point. Figure 8 shows the estimates(left panels) and error standard deviations (right panels) oftop-kriging and regression plotted along the stream network.The observations are shown as circles, using the same colourcoding as for the estimates. The results of both models differsubstantially. Close to a gauge, top-kriging estimates fit verywell to the observations, and kriging errors are of the samemagnitude as measurement errors. The prediction errors ofregression, however, are much larger than measurementerrors, indicating a lower performance of regression in closeproximity to gauges. The relative performances of modelschange when moving from the gauges to the headwatercatchments (e.g. region indicated by the ellipses in Figure 8).Low flow discharges of Alpine catchments are expected todecrease with catchment altitude, as they are controlled byfreezing processes which are more intense for higheraltitudes. The expected low flow patterns are fairly wellreproduced by regression, as altitude is parameterised in theregression model. Top-kriging, however, overestimatesthese values. The reason is that top-kriging distributesweights according to integral distances of catchment areasand therefore gives most weight to neighbouring gauges atthe same river. Sources therefore correspond to a boundaryarea of top-kriging, and effects similar to the well-knownboundary effects of ordinary-kriging occur. The boundaryeffects are reflected in top-kriging standard deviations(Figure 8c) which increase whenmoving from valleys to the

0.2 1.0 5.0 20.0

0.2

1.0

5.0

20.0

Observed q95 (l/s/km²)

Pre

dict

ed q

95 (

l/s/k

m²)

b)

ream flows q95 (ls�1 km�2) of (a) top-kriging and (b) regression

Hydrol. Process. 28, 315–324 (2014)

Table II. Predictive performance of top-kriging (TK) and regional regression (Reg) for different data situations (all catchments,headwater catchments, non-headwater catchments)

Situation All catchments Headwater Non-headwater

Model TK Reg TK Reg TK Reg

R2cv (-) 0.75 0.68 0.59 0.56 0.91 0.82

rmsecv (ls�1 km�2) 1.781 1.999 2.355 2.427 0.951 1.392

biascv (ls�1 km�2) 0.294 �0.032 0.558 0.119 0.036 �0.167

rrmsecv (-) 0.31 0.34 0.40 0.41 0.16 0.24rbiascv (-) 0.05 �0.01 0.10 0.02 0.01 �0.03

Figure 8. Regional example Tyrol: q95 low flow predictions in ungauged basins by top-kriging (a) and regression (b) for an Alpine region in Austria.(c) and (d) indicate error standard deviations. The area shown is 100 km across. Ellipses indicate a region where top-kriging extrapolates from gauges

situated in the valleys to headwater catchments (without upstream data point) leading to systematic overestimation

a)

d)

c)

b)

Figure 9. Regional example river Mur: q95 low flow predictions in ungauged basins by top-kriging (a) and regression (b) for a major river in Austria. (c) and(d) indicate error standard deviations. The area shown is 100km across. For large rivers and high gauging density (see ellipsis) top-kriging outperforms regression

322 G. LAAHA ET AL.

Copyright © 2012 John Wiley & Sons, Ltd. Hydrol. Process. 28, 315–324 (2014)


sources. In this situation, top-kriging extrapolates the highervalues from gauges in the valleys to the headwaters.The second example is the river Mur and tributaries in

low Alps of Styria, southern Austria. The river Mur is oneof the major rivers in Austria, and river sites are related tolarge catchment areas (up to 4400 km2). Estimates anderror standard deviations of top-kriging and regression arepresented in Figure 9. Along the main river, top-krigingand regression estimates are consistent with observations.The estimates of top-kriging are also consistent withobservations for the larger tributaries, but this is clearlynot true for regression. For locations without upstreamdata point, however, top-kriging yields a rather arbitrarypattern, while regression estimates show the expecteddecrease with catchment altitude. The relative perfor-mances of both methods are manifested in the errorstandard deviations (Figure 9c,d). The errors of top-kriging are low for large rivers and a high gaugingdensity, and the errors increase in the tributaries withdecreasing catchment area. The errors of regression arenearly constant in this region. They do not seem tocontain a lot of information about the regional perform-ance of the model.To generalise the findings from the regional examples

to the whole study area, we conducted a stratified cross-validation analysis where catchments were separated inheadwater catchments (without upstream gauge) and non-headwater catchments (with upstream gauge). For theAustrian setting, headwater catchments typically correspondto small catchments situated in Alpine areas (example 1),and non-headwater catchments correspond to larger catch-ments (example 2).Table II indicates that for non-headwater catchments, top-

kriging is nearly unbiased (biascv = 0.036 ls�1 km�2) andregression yields a slightly negative bias of �0.167ls�1 km�2. These biases are small compared to observedlow flows in Austria, and systematic errors are a minorproblem for the relatively large non-headwater catchments.For headwater catchments, the systematic error of regressionis also low (biascv = 0.119 ls�1 km�2), but top-krigingexhibits somewhat larger systematic errors (biascv = 0.558ls�1 km�2), corresponding to an average overestimation of10%.As it has been discussed above, low flow discharges ofAlpine catchments are expected to decrease with catchmentaltitude, and using spatial information from a downstreamgauge is clearly a biased procedure. For top-kriging, thealtitude gradient of low flows may result in non-stationaryincrements which violate the assumption of intrinsicstationarity. Hence, top-kriging estimates could be furtherimproved by including the altitude gradient in the model, byan approach similar to external drift kriging (Wackernagel,1995). This will likely reduce the bias of headwatercatchments and increase the predictive performance oftop-kriging over the regression model. Laaha et al. (2012)tested top-kriging with external drift to water temperaturesin Austria. The approach assumes a deterministic relation-ship between water temperature and catchment altitude,through performing top-kriging on the residuals of an initialregressionmodel. Their analysis showed that themodel with


external drift was indeed free of regional biases. From theassumptions, the approach seems notably well suited whenthe auxiliary variable has a much lower measurement errorthan the target variable. Otherwise, a combined co-krigingand top-kriging approach would be better suited to includeauxiliary information in the top-kriging model. This,however, would require a more complex algorithm whichincludes cross-variograms in all steps of the analysis.We finally assess the total predictive performance of

models. For headwater catchments, top-kriging (R2cv = 0.59,

rmsecv = 2.355 ls�1 km�2) exhibits somewhat highercoefficient of determination and a somewhat lower rootmean squared error than regression (R2

cv = 0.56, rmsecv =2.427 ls�1 km�2). For the non-headwater catchments, thepredictive performances are generally higher than forheadwater catchments. Here, top-kriging (R2

cv = 0.91,rmsecv = 0.911 ls�1 km�2) clearly outperforms regression(R2

cv = 0.82, rmsecv = 1.392 ls�1 km�2). The cross-validation

analysis confirms the findings from the regional examplesthat top-kriging performs clearly better for the larger non-headwater catchments than regression. For smallercatchments in the tributaries and headwater catchments,the performance of top-kriging is indeed reduced because ofthe bias, but, nevertheless, still higher than the performanceof the regression model. These findings correspond wellwith the results of Castiglioni et al. (2011)where top-krigingwas compared with a smooth regional estimation method,known as physiographical space-based interpolation (PSBI)(Castiglioni et al., 2009) for the case of predicting low flowsof 51 catchments in central Italy. Overall, top-kriging andPSBI have comparable performances (Nash–Sutcliffeefficiencies in cross-validation of 0.89 and 0.83, respect-ively), but top-kriging outperforms PSBI at larger riverbranches while PSBI performs better for headwatercatchments. The main idea of PSBI is to apply standardpoint-kriging methods in a transformed space of catchmentcharacteristics instead of kriging in geographical space.Consequently, PSBI and regressionmodels exploit the samekind of information, i.e. the catchment characteristics, andthis explains the similar features of both concepts. Thecorrelations along the stream network are accounted for onlyindirectly through the catchment characteristics. However,top-kriging exploits the spatial correlations of low flows inan explicit way. Top-kriging is a best linear unbiasedgeostatistical estimation method which takes the specificstructure of river networks into account.

CONCLUSIONS

We compared the performance of top-kriging relative to theregional regression model proposed by Laaha and Blöschl(2006b) by leave-one-out cross-validation. Regional regres-sion is the standard regionalisation approach in low flowhydrology and constitutes a benchmark for any innovativemodel.On average, over theAustrian study area, top-krigingexplains 75% of the variance of the low flows, while theregression models explain 68%. The performance of top-kriging mainly depends on the (intrinsic) homogeneity of

Hydrol. Process. 28, 315–324 (2014)

324 G. LAAHA ET AL.

the observations and the density of the gauging network inthe region. One would expect the performance of top-kriging to increase with increasing density of the networkand increasing catchment size. The latter is because runoff isan integrating process, so low flows tend to vary moresmoothly along the stream for large catchments. Theanalyses of the data set in this paper indicate that this isindeed the case. Table II suggests that top-kriging explains91% and 59% of the variances in the non-headwater andheadwater catchments, respectively. The median catchmentsizes of these two groups of catchments are 252 and 62 km2.The performance of the regressionmodel, in contrast, hingeson the availability of meaningful catchment characteristicsused in the regression model and the degree of correlation ofthese catchment characteristics with the low flows. Lowflow generation is mainly governed by subsurface processeswhich are not accessible by area-wide data collectiontechniques, such as remote sensing. Thismakes it difficult tocollect meaningful catchment characteristics. The choicebetween top-kriging and the regression model shouldtherefore mainly depend on data availability and thecharacteristics of the observations, as the two methods usedifferent sources of information. The data characteristics ofthis case study give some guidance on the choice of method:For top-kriging, the network density is relevant (490observation points over 84 000 km2). For the regressionmodels, the standard error is relevant (0.4 to 2.5 ls�1 km�2,depending on the region).From the conceptualisation of stream flow generation

processes, the distribution of kriging weights, thepredictive performance, and the distribution of uncer-tainty along the stream network, top-kriging appearsnotably well adapted for the stream network problem.Top-kriging is easy to apply as it does not requireadditional auxiliary data. This is a major advantage overthe classical regression approach if the auxiliary data arenot well correlated with the variable of interest orunavailable. Also, top-kriging can be extended tocombine the strengths of regional regression and top-kriging which is a topic we are currently exploring. Top-kriging also offers advantages over one-dimensionalconceptualisations of correlations along the streamnetwork, as there is no need for a decision whether toestimate the variable from upstream or downstreamneighbours, or a combination thereof. We thereforesuggest that top-kriging is the most natural method ofspatial predictions on river networks.

ACKNOWLEDGEMENTS

The authors would like to thank the Austrian ScienceFoundation (FWF), project no. P18993-N10, and theAustrian Academy of Sciences ‘Hydrological Predictabil-ity’ project for financial support. The paper is acontribution to UNESCO’s FRIEND-Water program.Comments from two anonymous reviewers contributedto the improvement of this paper.


REFERENCES

Aschwanden H, Kan C. 1999. Die Abflussmenge Q347 - eineStandortbestimmung. Hydrologische Mitteilungen 27. Landeshydrolo-gie und -geologie, Bern.

Berg C, Forst G. 1975. Potential theory on locally compact Abeliangroups. Springer: Berlin.

Blöschl G, Sivapalan M. 1995. Scale issues in hydrological modelling - areview. Hydrological Processes 9: 251–290.

Castiglioni S, Castellarin A, Montanari A. 2009. Prediction of low-flowindices in ungauged basin through physiographical space-basedinterpolation. Journal of Hydrology 378: 272–280.

Castiglioni S, Castellarin A, Montanari A, Skøien JO, Laaha G, Blöschl G.2011. Smooth regional estimation of low-flow indices: physiographicalspace based interpolation and top-kriging. Hydrology and Earth SystemSciences 15: 715–727. DOI: 10.5194/hess-15-715-2011.

Cressie N. 1985. Fitting variogram models by weighted least squares.Mathematical Geology 17: 563–586.

Cressie N. 1993. Statistics for spatial data, revised edn.Wiley: NewYork, NY.Cressie N, Frey J, Harch B, Smith M. 2006. Spatial prediction on a rivernetwork. Journal of Agricultural, Biological, and EnvironmentalStatistics 11: 127–150.

Draper NR, Smith H. 1998. Applied Regression Analysis, 3rd edn. Wiley:New York, NY.

Garreta V, Monestiez P, Ver Hoef JM. 2009. Spatial modelling andprediction on river networks: up model, down model or hybrid?Environmetrics 21: 439–456. DOI: 10.1002/env.995.

Gottschalk L. 1993a. Correlation and covariance of runoff. StochasticHydrology and Hydraulics 7: 85–101.

Gottschalk L, Krasovskaia I, Leblois E, Sauquet E. 2006. Mapping meanand variance of runoff in a river basin. Hydrology and Earth SystemSciences 10: 469–484.

Gotway CA, Young, LJ. 2002. Combining Incompatible Spatial Data.Journal of the American Statistical Association 97: 632–648.

Gustard A, Bullock A, Dixon JM. 1992. Low flow estimation in the UnitedKingdom. Institute of Hydrology, Report No. 108, 88pp, append.

Kyriakidis PC. 2004. A geostatistical framework for area-to-point spatialinterpolation. Geographical Analysis 36: 259–289.

Laaha G, Blöschl G. 2005. Low flow estimates from short stream flowrecords—a comparison of methods. Journal of Hydrology 306(1/4):264–286.

Laaha G, Blöschl G. 2006a. Seasonality indices for regionalizing lowflows. Hydrological Processes 20: 3851–3878.

Laaha G, Blöschl G. 2006b. A comparison of low flow regionalisationmethods—catchment grouping. Journal of Hydrology 323(1/4): 193–214.

Laaha G, Blöschl G. 2007. A national low flow estimation procedure forAustria. Hydrological Sciences Journal 52(4): 625–644.

Laaha, G, Skøien JO, Blöschl G. 2012. External top-kriging of streamtemperature in Austria. Environmental Modeling and Assessment, underreview.

deMarsily G. 1986.Quantitative hydrogeology. Academic Press: London, UK.Matheron G. 1965. Les variables regionalisées et leur estimation. Masson:Paris, France.

Merz R, Blöschl G. 2005. Flood frequency regionalisation - Spatialproximity vs catchment attributes. Journal of Hydrology 302: 283–306.

Mockus A. 1998. Estimating dependencies from spatial averages. Journalof Computational and Graphical Statistics 7: 501–513.

Nathan RJ, McMahon TA. 1990. Identification of homogeneous regionsfor the purpose of regionalization. Journal of Hydrology 121: 217–238.

Peterson EE, Ver Hoef JM. 2010. A mixed-model moving-averageapproach to geostatistical modelling in stream networks. Ecology 91(3):644–651.

Sauquet E, Gottschalk L, Leblois E. 2000. Mapping average annualrunoff: a hierarchical approach applying a stochastic interpolationscheme. Hydrological Sciences Journal 45: 799–815.

Skøien JO, Blöschl G. 2006. Sampling scale effects in random fields andimplications for environmental monitoring. Environmental Monitoringand Assessment 114(1-3): 521–552.

Skøien JO, Blöschl G, Western AW. 2003. Characteristic space-timescales in hydrology. Water Resources Research 39: 1304–1323.

Skøien JO, Merz R, Blöschl G. 2006. Top-kriging – geostatistics onstream networks. Hydrology and Earth System Sciences 10: 277–287.

Ver Hoef JM, Peterson E, Theobald D. 2006. Spatial statistical models thatuse flow and stream distance. Environmental and Ecological Statistics13: 449–464.

Wackernagel H. 1995. Multivariate Geostatistics. Springer: BerlinHeidelberg, Germany.

Weisberg S. 1985.Applied LinearRegression, 2nd edn.Wiley:NewYork,NY.

Hydrol. Process. 28, 315–324 (2014)

spatial prediction on river networks: comparison of ... · m is the lagrange multiplier from the...

Documents