Cost assessment for PR19: a consultation on
econometric cost modelling
Appendix 1 – Modelling results
March 2018
Cost assessment for PR19: a consultation on econometric cost modelling Appendix 1: Modelling results
Summary
1 Water models
1.1 Water resources models
1.2 Water treatment models
1.3 Water resources plus
1.4 Treated water distribution models
1.5 Network plus water models
1.6 Wholesale water models
2 Wastewater models
2.1 Bioresources models
2.2 Sewage treatment models
2.3 Bioresources plus models
2.4 Sewage collection models
2.5 Network plus wastewater models
2.6 Wholesale wastewater models
3 Retail models
3.1 Bad debt models
3.2 Totex less bad debt models
3.3 Total expenditure models
4 Enhancement expenditure models
4.1 Meeting lead standards costs
4.2 Water new developments and new connections
4.3 First time sewerage costs
4.4 Sewage growth
Summary
This appendix presents the econometric cost models considered in this consultation. It includes
models proposed by us and models proposed by 13 water companies. This appendix
supplements ‘Cost assessment for PR19: a consultation on econometric cost modelling’.
All the econometric models in this appendix are presented in a fixed template. The
table below provides a glossary of the statistical diagnostics used in the templates.
We have published a set of Stata (‘do’) files containing the code to run all our models in this
appendix. We have also published a set of Excel spreadsheets with the
underlying data. We discuss the sources of the data in the main document.
The remainder of this appendix is structured as follows:
section 1 presents the modelling results for wholesale water activities;
section 2 presents the modelling results for wholesale wastewater activities;
section 3 presents the modelling results for retail expenditure, and
section 4 presents the modelling results for enhancement expenditure.
A simple glossary of statistical diagnostics in our templates
P-value of an estimated coefficient
The p-value gives the probability of observing the estimated
coefficient (or one more extreme) if the true value were in fact zero.
A lower value indicates a lower probability of observing the
estimated coefficient if the true value were zero, and can thus be
interpreted as giving a higher degree of confidence that the true
value is not zero, i.e. that there is a relationship between the
dependent and explanatory variables.
In practice, the p-value indicates our confidence in the estimated
coefficient: the lower the p-value, the more confident we are that
the true coefficient is not zero.
Technical comment: because the data form a panel, p-values
in this appendix are based on cluster-robust standard errors.
Asterisks for estimated coefficients
We use the common asterisk notation next to the estimated
coefficients to indicate their statistical significance:
*** indicates 1% significance level
** indicates 5% significance level
* indicates 10% significance level
The more stars, the more confident we are in the value of the
estimated coefficient.
No star indicates a lower level of statistical significance (i.e. there is
less confidence in the value of the estimated coefficient). However,
there is a wide range of confidence levels in this category. As we
say in section 2.1 of the consultation, statistical significance levels of
80% and even 70% may be deemed valid in practical work.
R2 adjusted The adjusted R-squared measures how well the model fits
the data. It measures the proportion of the variation in the dependent
variable (in our case, variation in costs) that can be explained by
the model, with a penalty for each additional explanatory variable.
The statistic normally ranges from 0 to 1. The higher the value, the
better the model fits.
Importantly, R2 measures should only be used to compare models
with the same dependent variable.
Variance Inflation Factor (VIF)
Used to detect multicollinearity. High collinearity means that we
cannot estimate the coefficients with confidence: their variance is
high and their statistical significance low. As a consequence, the
individual coefficient estimates are imprecise and unstable. The VIF
of a variable equals 1/(1 - R2), where R2 comes from regressing that
variable on the other explanatory variables. As a rule of thumb, a
VIF above 4 indicates a medium risk and a VIF above 10 indicates
harmful collinearity.
An exception to this rule is when the model includes a variable and
its quadratic term. In such cases the VIF becomes high due to the
correlation between these two related terms. But while the high
collinearity may impair our ability to accurately estimate the impact
of the individual terms on the dependent variable, it should not
impair our ability to accurately estimate their collective impact.
Since these two terms always move together, the collective impact
is what is important.
Reset test Ramsey's regression equation specification error test (RESET).
Used to detect an inadequate functional form. It is particularly
powerful for detecting whether the model is missing non-linear terms.
The higher the p-value, the more confident we are that the
functional form is adequate.
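As an illustration of the asterisk notation in the glossary, the short Python sketch below maps a p-value to stars. It assumes the usual strict thresholds (p < 0.01, < 0.05, < 0.10); the templates do not say how a p-value lying exactly on a threshold is treated, and the function name is a hypothetical helper, not part of the published Stata files.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation used in the templates.

    Assumed convention: *** if p < 0.01, ** if p < 0.05, * if p < 0.10,
    and no stars otherwise.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# Coefficient and p-value pairs reported for model OWR2 in Template 1.
owr2 = {
    "ln (connected properties)": (1.069, 0.000),
    "ln (average pumping head water resources)": (0.163, 0.139),
}
for name, (coefficient, p_value) in owr2.items():
    print(f"{name}: {coefficient}{significance_stars(p_value)} ({p_value:.3f})")
```

Note that the reported p-values are rounded to three decimals, so a reported 0.000 means a p-value below 0.0005.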
1 Water models
1.1 Water resources models
Template 1. Water resources models proposed by Ofwat
Description of dependent variable
Water resources base costs excluding abstraction charges and items described in section 3 of the main consultation document.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We have used the number of connected properties as the scale variable in our water resources models. We considered that the volume of water, while perhaps a more intuitive scale driver for a water resources business, suffers from endogeneity: it is to an extent under management control, and using it could create a perverse incentive against water efficiency.
We use average pumping head to account for energy costs, which are an important component of water resources costs.
We also considered other factors. We expected a positive coefficient on the number of sources per property and a negative coefficient on the proportion of water from impounding reservoirs. However, the models did not produce the expected results for these variables. A number of companies present models with a positive coefficient on the proportion of water from reservoirs. We question whether this is the expected sign in a water resources model.
Our simple models explain close to 90% of the variation in water resources costs.
Consultation model ID | OWR1 | OWR2
Dependent variable | ln (water resources base costs) | ln (water resources base costs)
ln (connected properties) | 1.026*** (0.000) | 1.069*** (0.000)
ln (average pumping head water resources) | | 0.163 (0.139)
Constant | 1.938** (0.019) | 0.808 (0.460)
R2 adjusted | 0.889 | 0.894
VIF (max) | 1.000 | 1.259
Reset test | 0.562 | 0.62
Estimation method | OLS | OLS
N (sample size) | 107 | 107
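Because both costs and connected properties enter in logs, the coefficient on ln (connected properties) is an elasticity. A minimal Python sketch of what the OWR1 and OWR2 coefficients imply for a 10% increase in connected properties, other drivers held fixed (the helper name is hypothetical):

```python
def cost_multiplier(elasticity: float, pct_change: float) -> float:
    """In ln(cost) = a + b * ln(x) + ..., scaling x by (1 + pct_change)
    scales predicted cost by (1 + pct_change) ** b, other drivers held fixed."""
    return (1.0 + pct_change) ** elasticity


for model, b in [("OWR1", 1.026), ("OWR2", 1.069)]:
    increase = 100.0 * (cost_multiplier(b, 0.10) - 1.0)
    print(f"{model}: 10% more properties -> {increase:.2f}% higher predicted costs")
```

Coefficients close to one mean that costs rise roughly in proportion to the number of connected properties. Whether either estimate differs significantly from one cannot be read from the reported p-values, which test against zero.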
Template 2. Water resources models proposed by Anglian Water
Description of dependent variable
Log of water resources base costs excluding rates and abstraction charges
Acronyms used in explanatory variables
APH = average pumping head
DI = distribution input
WTW = water treatment works
Comments on models (Anglian Water)
Model 1 uses water resources operational parameters as the causal factors; over a single AMP, these factors are exogenous. Models 2, 3 and 4 are based on demographic and geographic factors. This is the most fundamental approach: the causal factors are completely exogenous to WaSCs and WoCs.
All four models are described in detail in our Cost Modelling report – Phase 2, published in March 2018 at http://www.anglianwater.co.uk/about-us/thinking-about-our-future/.
Consultation model ID ANHWR1 ANHWR2 ANHWR3 ANHWR4
Company’s model ID 1 2 3 4
Dependent variable ln (Water resources botex less rates and abstraction costs)
Ln(DI from impounding reservoirs) Ml/d
0.0007 ** (0.046)
Ln(DI from pumped storage reservoirs) Ml/d
-0.00004 (0.797)
Ln(DI from rivers) Ml/d
0.0004 (0.107)
Ln(DI from boreholes) Ml/d
0.0007 ** (0.015)
0.220 *** (0.000)
Ln(DI from rivers & reservoirs) Ml/d
0.377 *** (0.000)
Ln(average DI from surface WTW) Ml/d
0.344 *** (0.000)
Ln(average DI from borehole WTW) Ml/d
-0.104 (0.105)
Ln(APH × DI) Unit: (Ml/d) × m head
0.641 *** (0.000)
0.184 * (0.082)
Ln(Number of sources) 0.173 ** (0.041)
0.268 *** (0.000)
0.429 *** (0.000)
0.101 (0.156)
Ln(Reservoir capacity) Unit: Ml
0.140 *** (0.000)
0.304 *** (0.000)
0.209 *** (0.000)
0.164 *** (0.000)
% DI from groundwater -1.350 ***
(0.000)
% DI from rivers & pumped storage reservoirs
-1.145 *** (0.004)
Volume abstracted/maximum licenced volume
1.427 *** (0.000)
1.420 *** (0.000)
% population in sparse areas (<600 people per sqkm)
-0.284 (0.151)
% population in dense areas (>4000 people per sqkm)
0.686 ** (0.031)
Constant -7.04 *** (0.000)
-3.31 *** (0.000)
-4.90 *** (0.000)
-3.11 *** (0.000)
R2 adjusted 0.902 0.902 0.905 0.912
Reset test 0.697 0.000 0.086 0.602
VIF (max) 9.77 5.19 8.14 4.77
Method OLS OLS OLS OLS
N (sample size) 107 107 107 107
Template 3. Water resources models proposed by Southern Water
Description of dependent variable
Modelled water resources OPEX + modelled water resources base CAPEX
modelled OPEX is total OPEX less third party services, abstraction charges and local authority rates
modelled base CAPEX is maintenance expenditure in infrastructure and non-infrastructure less grants and contributions.
All costs are unsmoothed and deflated to 2016/17 prices using CPIH.
Comments on models (Southern Water)
Water resources models in particular tend to be less robust than other models to statistical sensitivities and alternative model specifications, and predict a relatively wide range of efficiency scores across the industry. The models presented are robust to the inclusion of abstraction charges, but individual companies’ relative performance is sensitive to how modelled cost is defined.
The models control for scale (using unit costs), geological factors, capacity and source type:
Model 7 models cost per connected property, whereas models 8-9 model cost per population served
All models control for sources over DI and reservoir capacity
Models 7-8 control for proportion of DI from reservoirs and model 9 controls for the proportion of DI from boreholes and the proportion of DI from rivers
Model 7 is identical to Yorkshire Water model 14 (Ofwat comment)
Consultation model ID | SRNWR1 | SRNWR2 | SRNWR3
Company’s model ID | 7 | 8 | 9
Dependent variable | ln (resources BOTEX per thousand connected properties) | ln (resources BOTEX per thousand population) | ln (resources BOTEX per thousand population)
Sources over DI | 0.646*** (0.003) | 0.740*** (0.001) | 0.708*** (0.008)
Reservoir capacity (log) | 0.040 (0.17) | 0.044* (0.095) | 0.046* (0.083)
Proportion of DI from reservoirs | 0.731** (0.043) | 0.749** (0.04) |
Proportion of DI from boreholes | | | -0.734** (0.03)
Proportion of DI from rivers | | | -0.793* (0.072)
Constant | -5.428*** (0.000) | -6.296*** (0.000) | -5.563*** (0.000)
R2 adjusted | 0.382 | 0.404 | 0.403
Reset test | 0.069 | 0.201 | 0.172
VIF (max) | 1.499 | 1.499 | 2.728
Method | OLS | OLS | OLS
N (sample size) | 102 | 102 | 102
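The Southern Water models control for scale by using unit costs. One consequence, sketched below in Python, is that the implied elasticity of total BOTEX with respect to connected properties is exactly one: the model is ln(C/N) = a + other terms, so C = N × exp(a + other terms). The sketch assumes none of the other explanatory variables changes with size, and the value used for the other terms is hypothetical.

```python
import math


def total_botex(properties_000: float, constant: float, other_terms: float) -> float:
    """Recover total BOTEX from a model of ln(BOTEX per thousand connected properties).

    ln(C / N) = constant + other_terms  implies  C = N * exp(constant + other_terms),
    where N is connected properties in thousands. No retransformation adjustment
    for predicting levels from a log model is applied here.
    """
    return properties_000 * math.exp(constant + other_terms)


CONSTANT = -5.428   # SRNWR1 constant from the table above
OTHER_TERMS = 0.9   # hypothetical combined effect of the other drivers

small = total_botex(500.0, CONSTANT, OTHER_TERMS)
large = total_botex(1000.0, CONSTANT, OTHER_TERMS)
print(large / small)  # doubling properties doubles predicted BOTEX
```

Any economies or diseconomies of scale in such a model would therefore have to be picked up by the other explanatory variables.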
Template 4. Water resources models proposed by Yorkshire Water
Description of dependent variable
Water resources base costs = operating expenditure less abstraction charges, third party services and local authority rates + capital maintenance expenditure net of grants and contributions (G&C)
The dependent variables are deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
Comments on models (Yorkshire Water)
Water resources models tend to predict a wider efficiency range and have poor statistical properties compared to other models.
Consistent with the PR14 approach, our dependent variable is net of G&C. However, given the lack of a split of G&C between capital maintenance and enhancement expenditure, we have also modelled CAPEX on a gross basis. The statistical performance of the models is broadly similar with and without G&C.
Model 15 was proposed also by South Staffs Water, which made the comment:
The water resources models are not as robust as aggregate or network plus models and therefore the range of modelled costs compared to actual costs is wider. For this reason, we think that it would be inappropriate to set the water resources price control on the basis of a water resources model alone; the price control should take a wider set of information into account.
Consultation model ID | YKYWR1 | YKYSSCWR2 | YKYWR3 | YKYWR4
Company’s model ID | 14 | 15 | 16 | 17
Dependent variable | ln (resources BOTEX per thousand connected properties) | ln (resources BOTEX per thousand connected properties) | ln (resources BOTEX per thousand population) | ln (resources BOTEX per thousand population)
Sources over DI (number / (Ml/d)) | 0.646*** (0.003) | 0.535** (0.025) | 0.740*** (0.001) | 0.708*** (0.008)
Reservoir capacity (Ml) (log) | 0.0400 (0.17) | 0.049* (0.08) | 0.0440* (0.095) | 0.046* (0.083)
% DI from reservoirs | 0.731** (0.043) | | 0.749** (0.04) |
% DI from boreholes | | -0.652* (0.052) | | -0.734** (0.03)
% DI from rivers | | -0.858** (0.039) | | -0.793* (0.072)
Constant | -5.428*** (0.000) | -4.775*** (0.000) | -6.296*** (0.000) | -5.563*** (0.000)
R2 adjusted | 0.382 | 0.389 | 0.404 | 0.403
Reset test | 0.069 | 0.038 | 0.201 | 0.172
VIF (max) | 1.499 | 2.728 | 1.499 | 2.728
Method | OLS | OLS | OLS | OLS
N (sample size) | 102 | 102 | 102 | 102
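Several templates state that costs are expressed in 2016-17 prices using CPIH. A minimal Python sketch of that conversion follows. The index values are placeholders invented for illustration, not official ONS CPIH figures, and the templates do not say whether financial-year averages or another convention were used.

```python
# Placeholder CPIH values for illustration only (NOT official ONS data).
CPIH_INDEX = {
    "2014-15": 100.0,
    "2015-16": 100.5,
    "2016-17": 102.0,
}


def to_base_year_prices(nominal: float, year: str,
                        index: dict = CPIH_INDEX, base_year: str = "2016-17") -> float:
    """Restate a nominal value from `year` in `base_year` prices using the ratio of price indices."""
    return nominal * index[base_year] / index[year]


print(to_base_year_prices(50.0, "2014-15"))  # 50 * 102.0 / 100.0 = 51.0
```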
Template 5. Water resources models proposed by Bristol Water
Description of dependent variable
The dependent variable is Botex per connected property.
Botex = (total opex – business rates – third party costs) + capital maintenance
Comments on models (Bristol Water)
The models presented in this pro forma are based on the Master Wholesale Cost data file dated 27th February 2017, reflecting the latest updates and amendments to the data.
Capital maintenance costs have been smoothed on a three-year rolling-average basis, so four years of data have been modelled (2014-2017). Botex has been calculated on a unit cost basis by dividing costs by the sum of total household and total non-household connected properties at year end, also taken from the six-year wholesale cost data set.
A full description of the work undertaken to arrive at these models is set out in a report by NERA: ‘Comparative Benchmarking Assessment to Support Preparation of Bristol Water’s AMP7 Business Plan’ (December 2017).
Consultation model ID BRLWR1 BRLWR2 BRLWR3
Company’s model ID 4 5 6
Dependent variable Ln(water resource botex per property)
Ln(length of raw mains and conveyors/DI)
0.134* (0.053)
0.109* (0.078)
Year15 dummy -0.039* (0.084)
-0.039 (0.108)
-0.038 (0.107)
Year16 dummy -0.029 (0.569)
-0.035 (0.498)
-0.034 (0.492)
Year17 dummy -0.047 (0.448)
-0.063 (0.305)
-0.063 (0.297)
% of water from reservoirs 0.680** (0.026)
0.673** (0.027)
0.624*** (0.001)
Ln(number of sources/DI) 0.167** (0.025)
-0.034 (0.601)
0.0007 (0.993)
% of water from boreholes -0.224 (0.383)
0.119 (0.657)
Ln(average pumping head resources)
0.180* (0.067)
0.171 (0.113)
Constant -4.006*** -5.013*** -4.872***
R2 adjusted 0.45 0.55 0.54
Reset test 0.43 0.44 0.45
VIF (max) 2.21 3.11 1.51
Method OLS OLS OLS
N (sample size) 68 68 68
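Bristol Water smooths capital maintenance on a three-year rolling average, which is why only four of the six years enter its models (68 observations). The Python sketch below shows the mechanics with hypothetical figures. It assumes a trailing window, in which each year is averaged with the two years before it; with six years of data this leaves the four years ending 2014 to 2017.

```python
def trailing_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; years without a full window of history are dropped."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


years = ["2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17"]
capital_maintenance = [30.0, 36.0, 33.0, 27.0, 39.0, 33.0]  # hypothetical, £m

smoothed = trailing_average(capital_maintenance)
for year, value in zip(years[2:], smoothed):
    print(year, value)
```

Smoothing reduces the influence of lumpy capital programmes on any single year's costs, at the price of losing the first two years of data.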
1.2 Water treatment models
Template 6. Water treatment models proposed by Ofwat
Description of dependent variable
Water treatment base costs excluding items described in section 3 of the main consultation document.
Description of selected explanatory variables
Treated water = total water treated at all ground and surface water works
% boreholes = the percent of distribution input (DI) coming from boreholes, artificial recharge and aquifer storage and recovery water supply schemes
% of water treated in WTW levels 3-6 = the percent of water treated in water treatment works with complexity levels 3 to 6
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We considered connected properties and total treated water as scale variables in our water treatment models.
For treatment complexity we used one of two factors: the percent of distribution input from boreholes, which is typically cheaper to treat than water from surface sources, or the proportion of water treated in works of complexity levels 3 to 6. We considered that treatment works at levels 3-6 provide a better representation of the more complex works than treatment works at levels 4-6. Although level 3 includes traditional treatment methods, a significant number of works with three treatment stages fall into this category, and the boundary between levels 2 and 3 represents a clearer divide between ‘basic’ and ‘complex’ treatment than the boundary between levels 3 and 4.
In some models we control for pumping costs. In some we control for economies of scale at the treatment works level by including a density variable. All coefficients are robust and meet expectations.
Consultation model ID OWT1 OWT2 OWT3 OWT4 OWT5 OWT6 OWT7 OWT8 OWT9 OWT10
Dependent variable (all models) ln (water treatment base costs)
ln (connected properties) .947*** (0.000)
.941*** (0.000)
.949*** (0.000)
.985*** (0.000)
.972*** (0.000)
ln (total water treated) .923*** (0.000)
.913*** (0.000)
.919*** (0.000)
.968*** (0.000)
.962*** (0.000)
% of DI coming from boreholes
-.006* (0.065)
-0.005 (0.109)
-.004** (0.039)
-0.003 (0.104)
-.004** (0.050)
-0.003 (0.151)
% of water treated in WTW levels 3-6
.008*** (0.008)
.006** (0.014)
.008*** (0.000)
.007*** (0.000)
ln (average pumping head for water treatment)
.217*** (0.004)
.200*** (0.004)
.200*** (0.007)
.187*** (0.006)
.156*** (0.009)
.129** (0.013)
.188** (0.019)
.156** (0.030)
ln (weighted average density)
-.157** (0.015)
-.203*** (0.003)
-0.117 (0.142)
-.173** (0.033)
Constant 4.47*** 11.7*** 3.98*** 11.2*** 3.08*** 10.6*** 3.8*** 11.8*** 4.5*** 12.2***
R2 adjusted 0.862 0.865 0.907 0.903 0.91 0.904 0.92 0.922 0.912 0.915
VIF (max) 1.061 1.081 1.101 1.119 1.127 1.142 1.276 1.286 1.264 1.304
Reset test 0.769 0.885 0.076 0.174 0.551 0.294 0.007 0 0.016 0.024
Method OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107 107 107 107
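In these models the percentage variables enter in levels while costs are in logs, so a coefficient b implies that a change of d in the variable changes predicted costs by exp(b × d) - 1. A minimal Python sketch using the first reported coefficient for each complexity variable above follows. It assumes the percentages are measured in percentage points (0 to 100), which the small coefficients suggest but the template does not state.

```python
import math


def pct_change_in_cost(coefficient: float, change: float) -> float:
    """Percentage change in predicted cost when a regressor that enters in
    levels changes by `change` units, in a model of ln(cost)."""
    return 100.0 * (math.exp(coefficient * change) - 1.0)


# -.006: % of DI coming from boreholes; .008: % of water treated in WTW levels 3-6.
print(f"{pct_change_in_cost(-0.006, 10):.2f}%")  # 10 more percentage points from boreholes
print(f"{pct_change_in_cost(0.008, 10):.2f}%")   # 10 more percentage points at levels 3-6
```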
1.3 Water resources plus
Template 7. Water resources plus models proposed by Ofwat
Description of dependent variables
Water resources plus is composed of water resources, raw water distribution and water treatment.
We excluded abstraction charges and cost items described in section 3 of the main consultation document.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We use connected properties as the scale variable. This scale variable has advantages in more aggregated models, as it is completely exogenous and captures more dimensions than other scale variables. It captures both the volume of water and the size of the network, so it acts as a composite of water volume and network length.
The coefficient on properties is lower in model 2 than in model 1. This is because model 2 also includes the number of sources, which accounts for part of the effect of scale.
We use the same variables described above for water treatment with the addition of average pumping head for water resources plus.
As in the water treatment models, the effect of the percent of water treated at complexity levels 3-6 is higher than that of the percent of water coming from boreholes. The weighted average density has a similar effect on costs to that observed in water treatment.
Consultation model ID OWRP1 OWRP2 OWRP3 OWRP4 OWRP5 OWRP6 OWRP7 OWRP8
Dependent variable (all models) ln (water resources plus base costs)
ln (connected properties)
0.996*** (0.000)
0.639*** (0.000)
1.002*** (0.000)
0.991*** (0.000)
1.020*** (0.000)
1.017*** (0.000)
1.024*** (0.000)
1.023*** (0.000)
% of water treated in water treatments in complexity levels 3-6
0.009*** (0.001)
0.007** (0.048)
0.008*** (0.007)
% of DI from boreholes
-.009*** (0.001)
-.009*** (0.001)
-0.005** (0.016)
-0.004** (0.026)
-0.004** (0.023)
ln (weighted average density)
-0.182** (0.022)
-0.152* (0.053)
-0.124* (0.100)
-0.067 (0.377)
ln (distribution input per source)
-.350***
(0.000)
ln (number of sources)
0.356*** (0.001)
ln (average pumping head for water resources plus)
0.297* (0.063)
0.332** (0.015)
0.179 (0.233)
0.278* (0.069)
Constant 4.989*** (0.000)
7.682*** (0.000)
4.353*** (0.000)
5.266*** (0.000)
1.788** (0.025)
2.433*** (0.010)
3.024*** (0.005)
3.042*** (0.009)
R2 adjusted 0.935 0.935 0.936 0.925 0.934 0.935 0.94 0.937
VIF (max) 1.819 5.163 1.113 1.172 1.284 1.231 1.838 1.576
Reset test 0.02 0.119 0.005 0.65 0.782 0.011 0.051 0.022
Estimation method OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107 107
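Several of the OWRP models report RESET p-values below 0.05. To show what the test does, here is a self-contained Python sketch of one common form of Ramsey's RESET: fit the model, add the squared and cubed fitted values, and F-test the two added terms jointly. It is illustrative only and is not a reproduction of the published Stata code: implementations differ in how many powers of the fitted values they add, the data are simulated, and the sketch returns the F-statistic rather than a p-value (which would require the F distribution with q and n - k - q degrees of freedom).

```python
import random


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols(columns, y):
    """OLS via the normal equations. Returns (fitted values, residual sum of squares)."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    return fitted, rss


def reset_f_statistic(columns, y):
    """RESET F-statistic using squared and cubed fitted values.

    `columns` must include a constant. Fitted values are centred and scaled
    before taking powers; this keeps the arithmetic well conditioned and does
    not change the statistic, because the constant and the fitted values are
    already spanned by the model.
    """
    fitted, rss_restricted = ols(columns, y)
    centre = sum(fitted) / len(fitted)
    scale = max(abs(f - centre) for f in fitted)
    z = [(f - centre) / scale for f in fitted]
    added = [[v ** 2 for v in z], [v ** 3 for v in z]]
    _, rss_unrestricted = ols(columns + added, y)
    q = len(added)
    df_resid = len(y) - len(columns) - q
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


rng = random.Random(1)
x = [float(i) for i in range(1, 21)]
constant = [1.0] * len(x)
y_linear = [2.0 + 3.0 * v + rng.gauss(0.0, 1.0) for v in x]  # linear form is correct
y_curved = [v ** 2 + rng.gauss(0.0, 1.0) for v in x]         # true form is quadratic
print("linear truth:", round(reset_f_statistic([constant, x], y_linear), 2))
print("quadratic truth:", round(reset_f_statistic([constant, x], y_curved), 2))
```

Fitting a straight line to the curved data leaves a systematic pattern that the powers of the fitted values pick up, producing a very large F-statistic and hence a low RESET p-value.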
Template 8. Water resources plus models proposed by Wessex Water
Description of dependent variable
Water treatment and resources botex smoothed = Opex + IRE + average MNI over period – third party costs – local authority rates – abstraction charges
Comments on models (Wessex Water)
Variation 1 (v1) models are the exogenous variations of our water treatment & resource models. Variation 2 (v2) models are the endogenous variations of our water treatment & resource models.
All models below give very similar results when estimated using unsmoothed expenditure.
Consultation model ID WSXWRP1 WSXWRP2 WSXWRP3 WSXWRP4
Company’s model ID 2v1 2v2 4v1 4v2
Dependent variable Ln(Smoothed WT&R Botex) (2v1, 2v2); Ln(Smoothed unit WT&R Botex per DI) (4v1, 4v2)
Distribution Input 0.965*** (0.000)
0.994*** (0.000)
Measure of highly dense areas (% area with >6000 people per sqkm)
-0.545** (0.017)
-0.565** (0.013)
Proportion of DI from groundwater sources
-0.296* (0.099)
-0.255 (0.165)
Average Source Size -0.167 (0.587)
-0.179 (0.585)
Proportion of water treated W4+
0.016
(0.211)
0.016 (0.219)
Average pumping head 0.261*** (0.008)
0.411*** (0.004)
0.280*** (0.008)
0.414*** (0.001)
Constant -2.948*** (0.000)
-4.899*** (0.000)
-3.249*** (0.000)
-4.957*** (0.000)
R2 adjusted 0.95 0.94 0.59 0.44
VIF (max) 1.67 12.73 1.67 10.26
Reset test 0.001 0.013 0.000 0.167
Method OLS OLS OLS OLS
N (sample size) 102 102 102 102
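The Wessex Water models illustrate the glossary point that R2 should only be compared across models with the same dependent variable: models 2v1 and 2v2 explain ln(botex), while 4v1 and 4v2 explain ln(botex per DI). The Python sketch below uses made-up data for five hypothetical companies to show that regressing ln(cost) on ln(DI), or ln(cost/DI) on ln(DI), gives identical residuals but different R2, because the unit-cost version has much less variation left to explain.

```python
def simple_ols(x, y):
    """Intercept and slope of y on x, plus the residual and total sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    tss = sum((b - my) ** 2 for b in y)
    return intercept, slope, rss, tss


def adjusted_r2(rss, tss, n, k):
    """Adjusted R2 with k explanatory variables (excluding the constant)."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


# Hypothetical data: ln(DI) and ln(cost) for five companies.
ln_di = [1.0, 2.0, 3.0, 4.0, 5.0]
ln_cost = [1.15, 2.12, 2.98, 3.92, 4.83]
ln_unit_cost = [c - d for c, d in zip(ln_cost, ln_di)]  # ln(cost / DI)

_, b_total, rss_total, tss_total = simple_ols(ln_di, ln_cost)
_, b_unit, rss_unit, tss_unit = simple_ols(ln_di, ln_unit_cost)
n = len(ln_di)
print("slopes:", round(b_total, 3), round(b_unit, 3))   # differ by exactly one
print("RSS:", round(rss_total, 5), round(rss_unit, 5))  # the same, up to rounding
print("adjusted R2:", round(adjusted_r2(rss_total, tss_total, n, 1), 4),
      round(adjusted_r2(rss_unit, tss_unit, n, 1), 4))
```

Both regressions describe exactly the same cost relationship; only the amount of variation in the dependent variable differs, which is why the unit-cost models report much lower R2.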
1.4 Treated water distribution models
Template 9. Treated water distribution models proposed by Ofwat
Description of dependent variables
Treated water distribution base costs excluding cost items described in section 3 of the main consultation document.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We considered two scale variables in our distribution models: number of properties (models 1-4) and length of mains (models 5-8).
When using length of mains as a scale variable, we have also included a density variable. This accounts for the fact that a company that serves a larger population per km of mains may incur higher distribution costs. As expected, the coefficient on the density variable is positive, although quite large. We also present the same models with the weighted average density driver, which produces more plausible values for the estimated coefficient. This coefficient also captures the increased cost of working in highly dense, urban areas.
Other cost drivers we included are the number of booster pumping stations, service reservoirs and water towers per length of main, to account for network complexity.
The percent of mains length laid after 1981 and the percent of mains length refurbished and relined were included as additional drivers of maintenance costs.
Consultation model ID OTWD1 OTWD2 OTWD3 OTWD4 OTWD5 OTWD6 OTWD7 OTWD8
Dependent variable (all models) ln (treated water distribution base costs)
ln (connected properties)
1.086*** (0.000)
1.121*** (0.000)
1.122*** (0.000)
1.157*** (0.000)
ln (lengths of main)
1.106*** (0.000)
1.156*** (0.000)
1.080*** (0.000)
1.124*** (0.000)
% of mains length refurbished and relined
0.465*** (0.000)
0.478*** (0.000)
0.475*** (0.000)
0.488*** (0.000)
0.449*** (0.000)
0.483*** (0.000)
0.446*** (0.002)
0.486*** (0.001)
ln (booster pumping stations per lengths of main)
0.308** (0.030)
0.296*** (0.008)
0.394*** (0.000)
0.310** (0.026)
ln (service reservoirs and water towers per lengths of main)
0.242** (0.045)
0.242** (0.015)
0.258** (0.019)
0.144 (0.411)
% of mains lengths laid post 1981
-.012*** (0.005)
-.012*** (0.002)
-.011*** (0.001)
-.012*** (0.000)
-0.013** (0.019)
-.014*** (0.006)
ln (density)
1.286*** (0.000)
1.191*** (0.000)
ln (weighted average density)
0.296*** (0.001)
0.253*** (0.009)
Constant 4.102*** 3.340*** 3.974*** 3.271*** 3.823*** 3.200*** 7.161*** 6.359***
R2 adjusted 0.966 0.964 0.976 0.975 0.977 0.975 0.961 0.96
VIF (max) 1.132 1.164 1.238 1.266 2.386 2.066 2.913 2.307
Reset test 0.632 0.511 0.033 0.079 0.042 0.064 0.001 0
Estimation method OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107 107
Template 10. Treated water distribution models proposed by Thames Water
Description of dependent variable
Water Distribution totex (including enhancement) net of grants and contributions
Description of selected explanatory variables
% Mains_320mm_450mm = (Potable water mains 320mm-450mm / Length of mains) × 100%
% Mains_450mm_610mm = (Potable water mains 450mm-610mm / Length of mains) × 100%
% Mains_320mm_610mm = ((Potable water mains 320mm-450mm + Potable water mains 450mm-610mm) / Length of mains) × 100%
% Mains_pre1880 = (Total length of mains laid or structurally refurbished pre-1880 / Length of mains) × 100%
% Mains_1921_1940 = (Total length of mains laid or structurally refurbished between 1921 and 1940 / Length of mains) × 100%
Comments on models (Thames Water)
We consider that a translog model is more appropriate because:
the F-tests support the translog, and most of the interaction and square terms in the translog are statistically significant
the estimated coefficients of regional wage are more sensible
other cost drivers (e.g. average pumping head for distribution, mains diameter and age) become statistically significant, and
no evidence of misspecification (see model M4 for example)
The predicted vs. actual cost difference shows a significant improvement when using a translog functional form compared to the Cobb-Douglas (although the improvement in the difference could be in part explained by the inclusion of more explanatory factors, e.g. squares and interactions).
A negative estimated coefficient on regional wages was found when we used water delivered as a scale variable. We therefore used length of mains as the scale variable.
None of the models shows a statistically significant effect of regional wages. However, including regional wages in the models helps to mitigate the more serious problem of omitted variable bias, and avoids the need for any pre-adjustment of costs.
Average pumping head shows a strong, stable and significant effect across all specifications. There is also some indication that the diameter and age of the mains are important drivers of distribution costs. By controlling for these factors, the models provide some evidence that important drivers have not been omitted.
The estimated time trend implies an average cost reduction of about 2% per year across all companies over the period 2011-12 to 2016-17; the trend is statistically significant in some of the models.
Consultation model ID TMSTWD1 TMSTWD2 TMSTWD3 TMSTWD4 TMSTWD5
Company’s model ID 2 3 4 5 6
Dependent variable Ln(Totex Distribution)
Ln(Mains) 0.868** (0.000)
0.934*** (0.000)
0.968*** (0.000)
0.951*** (0.000)
0.968*** (0.000)
Ln(Property Density) 0.632*** (0.005)
0.460* (0.082)
0.572** (0.018)
0.567** (0.031)
0.572** (0.018)
Ln(Mains)_SQ -0.246** (0.022)
-0.192 (0.120)
-0.161 (0.189)
-0.180 (0.133)
-0.161 (0.189)
Ln(Property Density)_SQ 2.967*** (0.009)
2.931*** (0.007)
4.016*** (0.001)
4.238*** (0.001)
4.016*** (0.001)
Ln(Mains)Ln(Density) 0.999*** (0.008)
0.651*** (0.001)
0.725*** (0.003)
0.577*** (0.004)
0.725*** (0.003)
Ln(Regional Wage_water_2soc) 0.571 (0.548)
0.951 (0.378)
0.633 (0.657)
0.571 (0.735)
0.633 (0.657)
Ln(average pumping head distribution) 0.228*** (0.007)
0.154** (0.019)
0.143** (0.032)
0.159** (0.018)
0.144** (0.032)
Time -0.024** (0.038)
-0.026** (0.039)
-0.022 (0.212)
-0.024 (0.257)
-0.022 (0.212)
% mains_320mm_450mm 0.432* (0.079)
% mains_450mm_610mm 0.197 (0.153)
0.268* (0.092)
0.133 (0.395)
0.267* (0.092)
% mains320_610mm
% mains_pre_1880 -0.036 (0.193)
-0.019 (0.498)
-0.036 (0.193)
% mains_1921_1940 0.098 (0.343)
0.132 (0.309)
0.098 (0.343)
Constant 4.686*** (0.000)
4.923*** (0.000)
4.693*** (0.000)
4.813*** (0.000)
4.693*** (0.000)
R2 adjusted 0.966 0.964 0.966 0.969 0.966
Reset test 0.002 0.001 0.003 0.201
VIF (max) 3.58 4.91 4.56 5.54
Method OLS OLS OLS RE OLS
N (sample size) 106 106 106 106 106
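In a translog specification the cost elasticity with respect to mains length is not a single coefficient: it varies with the levels of mains and density. The template does not state the exact parameterisation, so the Python sketch below assumes that the squared term enters as b_MM × (ln Mains)^2 without a 1/2 factor and that the logged variables are measured as deviations from their sample means. Under those assumptions the elasticity is b_M + 2 × b_MM × ln(Mains) + b_MD × ln(Density). It uses the first reported value in each row of the table above, taken here to belong to model TMSTWD1.

```python
def mains_elasticity(ln_mains, ln_density, b_m, b_mm, b_md):
    """d ln(cost) / d ln(mains) for a translog with
    ... + b_m*lnM + b_mm*lnM**2 + b_md*lnM*lnD + ...
    ln_mains and ln_density are deviations from their sample means (assumed)."""
    return b_m + 2.0 * b_mm * ln_mains + b_md * ln_density


B_M, B_MM, B_MD = 0.868, -0.246, 0.999  # Ln(Mains), Ln(Mains)_SQ, Ln(Mains)Ln(Density)

for ln_mains in (-0.5, 0.0, 0.5):
    e = mains_elasticity(ln_mains, 0.0, B_M, B_MM, B_MD)
    print(f"ln mains {ln_mains:+.1f} from mean, density at mean: elasticity {e:.3f}")
```

Under these assumptions the elasticity falls as networks get longer (b_MM is negative), so longer networks would show stronger economies of scale in mains length, and it rises with density (b_MD is positive).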
Template 11. Treated water distribution models proposed by Wessex Water
Description of dependent variable
Water treatment and resources botex = Opex + IRE + average MNI over period – third party costs – local authority rates – abstraction charges
Comments on models (Wessex Water)
The main cost driver is the number of connected properties. The main issue we faced was how to model density. Our preferred approach was to use the number of service reservoirs normalised by property numbers as a measure of density. We submitted aggregate and unit cost models based on the numbers of properties supplied, the normalised number of service reservoirs and average pumping head.
All models below provide very similar results with unsmoothed expenditure.
Consultation model ID WSXTWD1 WSXTWD2
Company’s model ID 2 4
Dependent variable Ln(WD botex smoothed) Ln(WD botex per property smoothed)
Connected Properties 1.087*** (0.000)
Service Reservoirs / 100k properties -3.015*** (0.000)
-2.930*** (0.000)
Service Reservoirs / 100k properties^2 0.557*** (0.000)
0.533*** (0.000)
Average pumping head 0.150* (0.052)
0.190 (0.146)
Constant -0.090 (0.979)
0.623 (0.487)
R2 adjusted 0.97 0.44
VIF (max) 111.25 110.25
Reset test 0.485 0.037
Method OLS OLS
N (sample size) 102 102
1.5 Network plus water models
Template 12. Network plus water models proposed by Ofwat
Description of dependent variable
Network plus water base costs excluding cost items described in section 3 of the main consultation document.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We considered two scale variables in our network plus models: connected properties (models 1-4) and length of mains (models 5-8).
When using length of mains as a scale variable, we have also included a density variable. This is to account for the fact that a company that serves a larger population per km of mains may incur higher distribution costs. As expected, the coefficient on the density variable is positive, albeit quite large. We present the same models with the weighted average density driver, which produces more sensible values for the estimated coefficient. This coefficient also captures the increased cost of working in highly dense/urban areas.
The rationale for all explanatory variables in our network plus models can be found in our comments on the water treatment and treated water distribution models.
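One way to see why a density coefficient close to one is notable: if density is defined as population (or properties) served per km of mains, a log-log model containing ln(mains) and ln(density) can be expanded into separate terms in ln(mains) and ln(population). The minimal sketch below assumes that definition of density; the coefficients are illustrative values of the size reported in the ln (lengths of main) and ln (density) rows and are not claimed to come from a single column. The helper names are hypothetical.

```python
import math

def implied_elasticities(b_mains: float, b_density: float) -> dict:
    """Re-express b_mains*ln(L) + b_density*ln(P/L) as separate elasticities.

    Assumes density = P/L (population or properties per km of mains) and that
    both variables enter the cost model in logs.
    """
    return {
        "mains": b_mains - b_density,  # coefficient on ln(L) once ln(P/L) is expanded
        "population": b_density,       # coefficient on ln(P)
        "joint_scale": b_mains,        # effect of scaling L and P by the same factor
    }

def log_cost_shift(b_mains: float, b_density: float, length: float, pop: float) -> float:
    """Variable part of ln(cost) in the original (mains, density) parameterisation."""
    return b_mains * math.log(length) + b_density * math.log(pop / length)

# Illustrative coefficients only (hypothetical pairing of the two rows)
b_mains, b_density = 1.046, 1.028
elasticities = implied_elasticities(b_mains, b_density)
print({k: round(v, 3) for k, v in elasticities.items()})

# Doubling both mains length and population shifts ln(cost) by b_mains * ln(2)
shift = (log_cost_shift(b_mains, b_density, 2_000, 2_000_000)
         - log_cost_shift(b_mains, b_density, 1_000, 1_000_000))
print(round(shift, 4), round(b_mains * math.log(2), 4))
```

Under this reading, holding population fixed, additional mains add very little cost, while the mains coefficient itself measures the effect of growing the network and the population served together.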
Consultation model ID ONPW1 ONPW2 ONPW3 ONPW4 ONPW5 ONPW6 ONPW7 ONPW8
Dependent variable ------------------ Ln(network plus water base costs) ------------------
ln (connected properties)
1.022*** (0.000)
1.044*** (0.000)
1.064*** (0.000)
1.084*** (0.000)
ln (lengths of main)
1.046*** (0.000)
1.037*** (0.000)
1.044*** (0.000)
1.062*** (0.000)
% of mains length refurbished and relined
0.16 (0.125)
0.163 (0.114)
0.264*** (0.006)
0.265*** (0.006)
0.166 (0.145)
0.232** (0.038)
0.172 (0.166)
0.257** (0.045)
ln (booster pumping stations per lengths of main)
0.278 (0.110)
0.256** (0.037)
0.246* (0.081)
0.416*** (0.007)
ln (service reservoirs and water towers per lengths of main)
0.335** (0.015)
0.337*** (0.009)
0.036 (0.836)
0.237 (0.153)
% of mains length laid or refurbished after 1981
-0.008* (0.065)
-0.006 (0.158)
-0.008* (0.064)
-0.006 (0.123)
-0.010** (0.047)
-0.009 (0.108)
ln (average pumping head for water treatment)
0.084* (0.069)
0.090*** (0.008)
0.091*** (0.008)
0.111*** (0.004)
% of water treated at water treatment works in complexity levels 3-6
0.004* (0.050)
0.003* (0.057)
0.002 (0.201)
0.002 (0.303)
ln (density)
1.028*** (0.000)
1.064*** (0.000)
ln (weighted average density)
0.221*** (0.007)
0.229** (0.014)
Constant
5.266*** (0.000)
5.114*** (0.000)
4.780*** (0.000)
4.774*** (0.000)
5.124*** (0.000)
5.736*** (0.000)
7.074*** (0.000)
7.733*** (0.000)
R2 adjusted 0.971 0.975 0.967 0.97 0.975 0.971 0.967 0.96
VIF (max) 1.532 1.543 1.189 1.306 3.159 2.557 2.883 2.337
Reset test 0.057 0.047 0.355 0.08 0.024 0.013 0 0
Estimation method OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107 107
Template 13. Network plus water models proposed by Anglian Water
Description of dependent variable
Water Network plus base costs excluding local authority rates
Acronyms used in explanatory variables
APH = average pumping head
DI = distribution input
Comments on models (Anglian Water)
The models follow the form developed by the CMA for the 2015 Bristol Determination. The same approach as taken by the CMA was followed in choosing which models to report.
All models are described in detail in our Cost Modelling report – Phase 2, published in March 2018.
Consultation model ID ANHNPW1 ANHNPW2 ANHNPW3 ANHNPW4 ANHNPW5
Company’s model ID 1 2 3 4 5
Dependent variable Water N+ botex: Log Unit Cost Water N+ botex: Log Aggregate
Ln(Water Delivered /Properties)
0.642* (0.064)
0.632*** (0.000)
0.800** (0.017)
0.778** (0.027)
0.637*** (0.000)
Ln(Aggregate length of potable mains)
0.433*** (0.000)
0.456*** (0.000)
-0.482*** (0.000)
-0.603*** (0.000)
-0.573*** (0.000)
Ln(Regional wages) 0.345 (0.563)
0.284 (0.582)
0.914 (0.141)
0.257 (0.663)
0.302 (0.557)
Ln(Aggregate length of potable mains)
1.049*** (0.000)
1.049*** (0.000)
1.030*** (0.000)
% DI from rivers 0.028 (0.777)
% DI from reservoirs 0.307*** (0.001)
Ln(Average Pumping Head) -0.018 (0.801)
0.021 (0.77)
0.107 (0.209)
0.076 (0.381)
0.075 (0.365)
% Water delivered to Non Household customers
-0.778 (0.137)
-0.730* (0.065)
-0.388 (0.475)
-0.572 (0.278)
-0.515 (0.23)
% DI treated using multiple treatment approaches
0.304*** (0.003)
0.285*** (0.004)
0.255** (0.015)
0.258*** (0.01)
Time dummy 1st year -0.231** (0.014)
-0.115* (0.084)
-0.277*** (0.003)
-0.233* (0.012)
-0.116* (0.083)
Time dummy 2nd year -0.127 (0.115)
-0.091 (0.14)
-0.176** (0.025)
-0.130 (0.102)
-0.090 (0.143)
Time dummy 3rd year -0.064 (0.424)
-0.048 (0.43)
-0.094 (0.225)
-0.060 (0.447)
-0.048 (0.431)
Time dummy 4th year -0.110 (0.136)
-0.038 (0.517)
-0.127* (0.074)
-0.109 (0.136)
-0.037 (0.529)
Time dummy 5th year -0.124* (0.095)
-0.135* (0.059)
-0.122* (0.095)
Time dummy 6th year -0.168** (0.022)
-0.163** (0.020)
-0.165** (0.022)
Constant 3.179 (0.105)
3.050* (0.065)
-6.364*** (0.002)
-4.249** (0.031)
-4.418*** (0.01)
R2 adjusted 0.203 0.433 0.963 0.960 0.972
Reset test 0.012 0.002 0.005 0.000 0.000
VIF (max) 6.24 3.78 7.29 6.27 3.78
Method OLS OLS OLS OLS OLS
N (sample size) 125 89 125 125 89
Template 14. Network plus water models proposed by Southern Water
Description of dependent variable
Network plus base costs = modelled OPEX plus modelled base CAPEX.
Modelled OPEX is total OPEX less third party services, abstraction charges and local authority rates.
Modelled base CAPEX is maintenance expenditure in infrastructure and non-infrastructure less grants and contributions.
All costs are unsmoothed and deflated to 2016/17 prices using CPIH.
Comments on models (Southern Water)
Given that Network+ costs are a large proportion of wholesale costs, the models are similar in terms of variable selection:
Model 4 controls for length of mains as a scale driver, whilst models 5-6 use connected properties.
Model 4 uses a simple (linear) measure of density, whilst models 5-6 use a translog relationship.
Models 4-5 use proportion of mains laid before 1980 and proportion of mains renewed/relined as maintenance drivers, whereas model 6 uses the proportion of mains renewed/relined only.
Consultation model ID SRNNPW1 SRNNPW2 SRNNPW3
Company’s model ID 4 5 6
Dependent variable Network+ BOTEX (log)
Connected properties (‘000s) (log) 1.053*** (0.000)
1.039*** (0.000)
Length of mains (km) (log) 1.094*** (0.000)
Properties over mains (‘000s / km) (log)
0.515** (0.044)
Properties over area (‘000s / km2) (log, demeaned)
-0.187* (0.05)
-0.173* (0.059)
Properties over area (‘000s / km2) (log, demeaned) squared
0.306*** (0.006)
0.362*** (0.005)
Proportion of water treated at complexity band 4 and above (%)
0.356** (0.028)
0.394* (0.058)
0.404* (0.091)
Proportion of mains renewed/relined (%)
28.83** (0.036)
24.34* (0.087)
22.72* (0.096)
Proportion of mains laid before 1980 (%)
1.005* (0.059)
0.570 (0.257)
Year 2016 dummy -0.0776** (0.012)
-0.0854*** (0.007)
Constant -5.502*** (0.000)
-3.509*** (0.000)
-3.067*** (0.000)
R2 adjusted 0.956 0.958 0.956
RESET Test 0.112 0.638 0.286
VIF (max) 1.232 1.276 1.267
Method OLS OLS OLS
N (sample size) 102 102 102
Template 15. Network plus water models proposed by Severn Trent Water
Description of dependent variable
All models relate to water network plus base costs: operating expenditure (less third party costs and council tax) + maintenance expenditure (infra and non-infra) gross of grants and contributions.
Description of selected explanatory variables
Length = length of potable and raw water distribution mains
Density(weighted) = Ofwat’s weighted average density from the "Constructed data" folder on SharePoint
Density(Props. per km squared) = number of properties per km squared. The area data used in the denominator is from Ofwat’s Masterfile, with the exception of Southern Water, whose data in that file represents the waste boundaries. We have replaced this with data from our own GIS database.
WTW = water treatment works
GW/SW ratio = the ratio of the number of groundwater to surface water works
Relined&renewed (km) = length of mains relined or renewed
Prop. bands 4-6 = proportion of water treated in sites of categories 4-6
Comments on models (Severn Trent Water)
Model 1 is a Cobb-Douglas specification. A priori, we would expect small positive coefficients on the pumping head, mains repair, and treatment complexity variables. Our expectations have been met in this model. Given that ground water is usually significantly cheaper to treat than surface water, we expect (and see) a small negative coefficient on the variable measuring the ratio of ground to surface water sites. An increase in the scale of operations at a company (production and maintenance), while keeping the average capacity of treatment works unchanged, should result in a broadly equal proportionate change in costs. Therefore, we would also expect the sum of the coefficients on the length, no. of works and mains repair variables to be around 1 (assuming the average company in the industry is operating at constant returns to scale). This is indeed the case in this model.
Model 2 extends model 1 by adding non-linear terms in the length and density variables in order to allow economies of scale/density to vary with firm characteristics. All other variables are retained. While the pumping head and number of works variables are insignificant, they are correctly signed and theoretically important, so they are retained in the model to try to account for energy costs and economies of scale at the asset level. The positive and significant non-linear density term indicates diseconomies of density, which might be expected given that treated water distribution is the single largest element of network plus expenditure and costs tend to increase with density in this area. In the presence of constant returns to scale at the industry average, we would expect the coefficients on the length, no. of works and mains repair (relined/renewed) variables to sum to around 1. These expectations are met almost exactly in the model.
Model 3 extends model 1 by adding non-linear terms in the length and no. of works variables. Previous Severn Trent research suggests an increase in cost elasticity as capacity declines/no. of works increases. However, given that a non-linear term on the "no. of works" variable would be likely to capture both asset-level and firm-level economies of scale, we chose to re-scale it and express it as the number of works per property. This allows it to represent differences in asset capacity more clearly. The positive coefficient on this variable is as expected. Re-scaling the WTW variable also changes our expectations for the magnitude of the length coefficient; the coefficients on length and mains repair are now expected to sum to around 1 in the presence of constant returns to scale. Their sum is only slightly higher than 1.
Model 4 presents a Cobb-Douglas form model with distribution input as the main scale driver. All of the other variables that have been discussed are also included and each coefficient conforms to expectations. However, due to the greater correlation between Ofwat’s new weighted density measure and the distribution input variable, the standard errors are somewhat inflated and more variables are statistically insignificant. The coefficients on the three scale-related variables sum to 1.06. This is close to our expectations.
Model 5 extends model 4 by adding non-linear terms in the distribution input and no. of treatment works variables. For the same reasons as in model 3, we re-scale the treatment works variable and express it as the number of works per property. These non-linear terms add little to the model: both are highly insignificant, and the treatment works variable has a sign we would not expect in the OLS version of the model. However, each of the other coefficients retains a logical sign and magnitude. The coefficients on the distribution input and mains repair (relined/renewed) variables sum to 1.08.
Model 6 builds on Ofwat’s PR14 approach, using a similar density measure, but adds some extra logical variables and addresses water treatment cost drivers differently. Our prior expectations are for small positive coefficients on the pumping head, treatment (prop. bands 4-6), relined/renewed and no. of works variables. These expectations have been met, with most of these coefficients statistically significant; only the average pumping head variable is insignificant, but given the small sample size we place more weight on its magnitude than on its significance. We would also expect the coefficients on the length, no. of works and relined/renewed variables to sum to around 1 in the presence of constant returns to scale. This condition is almost met, with these coefficients summing to 1.06.
Model 7 presents a simpler Cobb-Douglas style version of model 6. The coefficients on each individual variable are very close to expectations, but some of the variables are insignificant. However, we take confidence from the fact that these coefficients are appropriately signed and of a logical magnitude. Given their theoretical importance, they are retained in the model.
As with the other models, the sum of the coefficients on the length, no. of works and mains repair variables comes to around 1 (1.1). This is in line with our expectations.
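The constant-returns check used throughout these comments can be written out. In a log-log (Cobb-Douglas) model, multiplying every scale driver by k multiplies predicted cost by k raised to the sum of their coefficients, so a sum near 1 indicates roughly constant returns. The sketch below is a minimal illustration: the coefficients are the first entries in the Ln(Length), Ln(number of WTW) and Ln(Relined & Renewed) rows, and we assume, without confirming it from the flattened table, that they belong to the same model. The ±0.05 tolerance and the function names are hypothetical.

```python
def scale_elasticity(coefs: dict, scale_vars: list) -> float:
    """Sum of log-log coefficients on the drivers that grow with company size."""
    return sum(coefs[name] for name in scale_vars)

def cost_multiplier(elasticity: float, k: float) -> float:
    """Factor by which predicted cost changes if every scale driver is multiplied by k."""
    return k ** elasticity

def returns_to_scale(elasticity: float, tol: float = 0.05) -> str:
    """Below 1: cost grows less than proportionately (economies of scale); above 1: the reverse.

    The tolerance is an arbitrary illustrative choice, not a statistical test.
    """
    if elasticity < 1 - tol:
        return "increasing returns (economies of scale)"
    if elasticity > 1 + tol:
        return "decreasing returns (diseconomies of scale)"
    return "approximately constant returns"

# Hypothetical pairing: first entries of three rows, assumed to share a column
coefs = {"ln_length": 0.57, "ln_wtw": 0.37, "ln_relined": 0.09}
rts = scale_elasticity(coefs, ["ln_length", "ln_wtw", "ln_relined"])
print(round(rts, 2), returns_to_scale(rts))
print(round(cost_multiplier(rts, 2.0), 3))  # effect of doubling all three drivers
```

A formal version of this check would test the linear restriction that the coefficients sum to one using the estimated covariance matrix, rather than applying a fixed tolerance.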
Consultation model ID SVTNPW1 SVTNPW2 SVTNPW3 SVTNPW4 SVTNPW5 SVTNPW6 SVTNPW7
Company’s model ID 1 2 3 4 5 6 7
Dependent variable ---------------------- Ln(Water N+ botex) ----------------------
Ln(Length) .57*** (.001)
.86*** (0.000)
.99*** (.000)
.85*** (.00)
.80*** (.00)
Ln(Distribution Input) .72*** (.00)
.98*** (.000)
Ln(Dist. Input)^2 .01 (.74)
Ln(number of WTW) .37** (.029)
.07 (.48)
.25* (.06)
.25** (.032)
.1** (.036)
.19* (.052)
Ln(WTW per property) .27** (.014)
Ln(WTW per prop)^2 .19* (.09)
-.02 (.85)
Prop. bands 4-6 .31* (.059)
.39** (.02)
.16 (.25)
.12 (.43)
GW/SW works ratio -.03** (.017)
-.02** (.045)
-.02* (.08)
-.03*** (.00)
-.03*** (.002)
-.02*** (.00)
-.03*** (.00)
Ln(Average pumping head)
.05 (.19)
.1 (.41)
.12 (.47)
.16 (.3)
.17 (.26)
.13 (.14)
.09 (.55)
Ln(Relined & Renewed) .09** (.04)
.07** (.029)
.07 (.12)
.09** (.017)
.1*** (.007)
.11*** (.00)
.12*** (.00)
Ln(Density) .31* (.09)
.39*** (.00)
.33 (.01)
.08 (.31)
.08 (.25)
Ln(Density – props per km^2)
.59*** (.00)
.47*** (.00)
Ln(Length)^2 -.05 (.12)
.02 (.63)
-.03 (.12)
Ln(Density)^2 .19*** (.00)
.37*** (.00)
Ln(Density) X Ln(Length) .12*** (.009)
Prop. bands 4-6 .26* (.056)
.24** (.014)
.12 (.47)
Dummy 2012 -.05** (.04)
-.03 (.55)
-.04 (.42)
-.09* (.057)
-.11** (.027)
-.06 (.194)
-.08 (.13)
Dummy 2013 -.02** (.05)
-.01 (.91)
-.02 (.79)
-.04 (.48)
-.05 (.42)
-.03 (.52)
-.05 (.33)
Dummy 2014 -.08** (.04)
-.07 (.13)
-.07* (.1)
-.1** (.015)
-.11** (.011)
-.08** (.046)
-.1** (.03)
Dummy 2015 -.1** (.039)
-.09* (.054)
-.1** (.05)
-.1** (.02)
-.12** (.021)
-.09** (.049)
-.1** (.04)
Dummy 2016 -.13** (.02)
-.13*** (.00)
-.14*** (.00)
-.14*** (.00)
-.14*** (.00)
-.14*** (.00)
-.13*** (.00)
Constant 4.93*** 4.91*** 5.16*** 5.25*** 5.02*** 5.22***
R2 adjusted .97 .98 .96 .97 .97 .98 .97
Reset test 0 .41 0.17 .009 0.41 0.00 0.01 0.23
VIF(max) 16.6 24 6.3 19.6 5.7 15.2 11.3
Method OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107
Template 16. Network plus water models proposed by South West Water
Description of dependent variable
Modelled OPEX = network+ OPEX – network+ third party – network+ abstraction charges – network+ local authority rates
Modelled base CAPEX = network+ maintenance infra + network+ maintenance non-infra – network+ grants and contributions
Modelled BOTEX = Modelled OPEX + Modelled base CAPEX
Modelled BOTEX+ (growth) = modelled BOTEX + network+ additions to the supply and demand balance + network+ new developments and growth + network+ metering expenditure + network+ resilience
Modelled TOTEX = modelled BOTEX + network+ other capital expenditure infra + network+ other capital expenditure non-infra + network+ infrastructure network reinforcement
Unsmoothed net costs from 2011/12 to 2016/17
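The cost definitions above nest inside one another: BOTEX builds on modelled OPEX and base CAPEX, and the BOTEX+ (growth) and TOTEX measures each add further lines to BOTEX. The sketch below encodes them exactly as stated, with TOTEX adding only the other capital expenditure and network reinforcement lines to BOTEX. The class and field names are hypothetical labels for the cost lines, and the example figures are invented.

```python
from dataclasses import dataclass

@dataclass
class NetworkPlusCostLines:
    """One company-year of network+ cost lines (hypothetical field names)."""
    opex: float
    third_party: float
    abstraction_charges: float
    local_authority_rates: float
    maintenance_infra: float
    maintenance_non_infra: float
    grants_and_contributions: float
    supply_demand_additions: float = 0.0
    new_developments_and_growth: float = 0.0
    metering: float = 0.0
    resilience: float = 0.0
    other_capex_infra: float = 0.0
    other_capex_non_infra: float = 0.0
    network_reinforcement: float = 0.0

    def modelled_opex(self) -> float:
        return (self.opex - self.third_party - self.abstraction_charges
                - self.local_authority_rates)

    def modelled_base_capex(self) -> float:
        return (self.maintenance_infra + self.maintenance_non_infra
                - self.grants_and_contributions)

    def botex(self) -> float:
        return self.modelled_opex() + self.modelled_base_capex()

    def botex_plus_growth(self) -> float:
        return (self.botex() + self.supply_demand_additions
                + self.new_developments_and_growth + self.metering + self.resilience)

    def totex(self) -> float:
        # As defined above: BOTEX plus other capex (infra, non-infra) and network reinforcement
        return (self.botex() + self.other_capex_infra + self.other_capex_non_infra
                + self.network_reinforcement)

lines = NetworkPlusCostLines(
    opex=100.0, third_party=5.0, abstraction_charges=3.0, local_authority_rates=12.0,
    maintenance_infra=40.0, maintenance_non_infra=30.0, grants_and_contributions=10.0,
    supply_demand_additions=5.0, new_developments_and_growth=6.0, metering=2.0,
    resilience=1.0, other_capex_infra=4.0, other_capex_non_infra=3.0,
    network_reinforcement=2.0,
)
print(lines.botex(), lines.botex_plus_growth(), lines.totex())
```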
Comments on models (South West Water)
We have adopted the same approach to modelling network plus wholesale water costs as for aggregate BOTEX, as there were no resources-specific drivers in our models. We have not, at this stage, examined the appropriateness of different estimation approaches. We do note, however, that some models seem more robust than others and clearly this will have implications for identifying relative efficiency.
See our aggregate wholesale water BOTEX submission for a more detailed review of the drivers considered, which were:
Scale (properties)
Density/sparsity (mains per property)
Source type
Maintenance
Each of the network+ water models we have developed captures all of these key cost drivers. The models differ in the way in which density and sparsity are captured.
To explore the impact of density and sparsity on water costs, we considered both a trans-log specification using mains over properties and a simpler model with a single log-linear term. Models 1, 3 and 5 model a trans-log ‘u shape’ relationship between cost and population density/sparsity. Models 2, 4 and 6 use only the log of mains over connected properties, capturing only the impact of sparsity.
We have extended our aggregate BOTEX modelling to models controlling for BOTEX + growth enhancement and TOTEX (see discussion in wholesale water models). As can be seen from the efficiency range charts, modelling BOTEX+ (growth) or TOTEX does not substantially broaden the efficiency ranges. As for wholesale water models, we would recommend that BOTEX+ (growth) and TOTEX modelling approaches are explored to the fullest possible extent at PR19.
All of the BOTEX models estimate statistically significant coefficients which are supported from an operational and economic perspective. The relationship between cost and cost drivers in BOTEX+ (growth) and TOTEX models is broadly similar to that estimated in BOTEX models, although not all coefficients pass statistical significance tests. As with aggregate modelling, the coefficient on the proportion of mains relined or renewed appears large because the underlying variable takes very small values.
While some trans-log models do fail the RESET test, we would note that they lead to narrower efficiency ranges than a log-linear model when modelling BOTEX. Given the operational justification for this specification, we would recommend the exploration of models which control for a ‘u-shape’ relationship between cost and density.
Given our focus on modelling what we consider to be key industry drivers of cost, we have not explored estimation approaches beyond OLS with robust standard errors. We will be considering the most appropriate estimation approaches as part of our consultation response.
All models are broadly robust from a statistical perspective, with the exception of the RESET test for some models.
Adjusted R2 is sufficiently high.
VIF (a measure of collinearity) is well below the ‘rule of thumb’ threshold of 10. Note that for trans-log specifications we demean the squared term to minimise collinearity, with no impact on coefficients or predictions.
We find mixed evidence from the RESET test on whether the model would be improved by the addition of polynomial terms, i.e. given the control variables, whether the model is mis-specified. This is despite including trans-log terms.
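Both diagnostics cited above, and reported in every template, can be computed from auxiliary OLS regressions. The minimal sketch below assumes plain OLS with an intercept and a RESET test that adds the squared and cubed fitted values. This is a common default, but we do not know which powers each company used. It runs on synthetic data, not company data, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def _ols_rss(y, X):
    """Fit OLS with an intercept; return fitted values and the residual sum of squares."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted
    return fitted, float(resid @ resid)

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        _, rss = _ols_rss(target, np.delete(X, j, axis=1))
        tss = float(((target - target.mean()) ** 2).sum())
        out.append(tss / rss)  # identical to 1 / (1 - R^2_j)
    return np.array(out)

def reset_test(y, X, powers=(2, 3)):
    """Ramsey RESET: F-test on powers of the fitted values added to the model."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    fitted, rss_restricted = _ols_rss(y, X)
    extra = np.column_stack([fitted ** p for p in powers])
    _, rss_full = _ols_rss(y, np.column_stack([X, extra]))
    q = len(powers)
    df_resid = len(y) - (X.shape[1] + q + 1)  # +1 for the intercept
    f_stat = ((rss_restricted - rss_full) / q) / (rss_full / df_resid)
    return f_stat, float(stats.f.sf(f_stat, q, df_resid))

# Synthetic data, sized like the company panels (n = 107)
rng = np.random.default_rng(0)
n = 107
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # almost collinear with x1
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs.round(1))

# A linear fit to a curved relationship should be rejected by RESET
y_quad = 1 + 0.5 * x3 + 0.5 * x3 ** 2 + 0.1 * rng.normal(size=n)
_, p_quad = reset_test(y_quad, x3)
print(p_quad)
```

The two nearly collinear columns produce VIFs far above the rule-of-thumb threshold of 10, while the independent column stays close to 1. A RESET p-value below the chosen significance level indicates that powers of the fitted values add explanatory power, which is the mis-specification signal discussed in the comments above.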
Consultation model ID SWBNPW1 SWBNPW2 SWBNPW3 SWBNPW4 SWBNPW5 SWBNPW6
Company’s model ID 1 2 3 4 5 6
Dependent variable Network+ BOTEX (ln) Network+ BOTEX+ (growth) (ln) Network+ TOTEX (growth) (ln)
Properties (ln) 1.025*** (0.000)
1.046*** (0.000)
1.046*** (0.000)
1.060*** (0.000)
1.060*** (0.000)
1.078*** (0.000)
Proportion of mains renewed or relined (%)
32.88*** (0.000)
31.81*** (0.000)
24.77*** (0.005)
24.66*** (0.005)
25.53*** (0.006)
25.39*** (0.008)
Mains over connected properties (ln)
0.379*** (0.000)
0.352** (0.015)
0.390*** (0.000)
0.369*** (0.002)
0.512*** (0.000)
0.484*** (0.000)
Mains over connected properties (ln squared and demeaned)
1.737*** (0.000)
1.072*** (0.001)
1.440*** (0.000)
Proportion of treated surface water (%)
0.163** (0.033)
0.146* (0.069)
0.0548 (0.555)
0.0706 (0.441)
0.0221 (0.821)
0.0433 (0.659)
Properties growth (%) 0.272** (0.012)
0.399*** (0.002)
0.290*** (0.008)
0.462*** (0.002)
Constant -3.819*** (0.000)
-3.797*** (0.000)
-3.899*** (0.000)
-3.984*** (0.000)
-4.244*** (0.000)
-4.359*** (0.000)
R2 adjusted 0.956 0.944 0.950 0.946 0.948 0.942
Reset test 0.286 0.767 0.000 0.025 0.001 0.001
VIF(max) 1.312 1.277 1.337 1.331 1.337 1.331
Method OLS OLS OLS OLS OLS OLS
N (sample size) 102 102 102 102 102 102
Template 17. Network plus water models proposed by Thames Water
Description of dependent variable
Network plus totex = opex + capex (maintenance + enhancement) net of grants & contributions
Description of selected explanatory variables
APH Network=Avg_pmphd_R_T_DN=Avg_PMHD Raw+Avg_PMHD Treatment+Avg_PMHD Distribution
% DI from boreholes= Proportion of distribution input derived from boreholes, excluding managed aquifer recharge (MAR) water supply schemes
Comments on models (Thames Water)
All our network plus (raw + treatment + distribution) models are totex unsmoothed.
We have used the functional form found in Water Distribution as a starting point. We are exploring different functional forms (Cobb-Douglas and Translog) in network plus with different scale variables to determine whether flexible economies of scale/density are appropriate at the network plus level.
There is a potential issue with the way average pumping head has been allocated by companies. As shown in the Water Distribution analysis, average pumping head is an important and statistically significant driver in the cost functions. When we bring this variable into the network plus models (i.e. raw, treatment and distribution), the calculation of average pumping head may be affected by companies misinterpreting the definition of this variable.
This allocation issue might be affecting the performance of the average pumping head variable (in particular for raw water and treatment), so that it is not found to be a statistically significant driver.
Length of mains appears strongly significant, as does its interaction term with property density. Similarly, property density appears to be another relevant driver of costs.
The proportion of mains laid between 1921 and 1940 appears relevant, given its level of significance and the stability of its estimated magnitude across all the models tested.
The coefficient on regional wages is sensible, although not statistically significant. Including sensible estimates of regional wages in the models helps mitigate serious issues such as omitted variable bias, and is preferable to pre-adjusting costs.
By using the proportions of distribution input from impounding reservoirs and boreholes as the main representation of water treatment costs at the network plus level, models M3 and M4 provide good evidence that there are no omitted variable issues. However, these variables are not statistically significant under cluster-robust standard errors.
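Because the Thames Water models include squared and interaction terms in log length and log density, the cost elasticity with respect to each driver is not a single coefficient: it varies with the company's position. The sketch below uses the first value in each translog row of the table that follows (the TMSNPW1/M3 column). It rests on two assumptions that the template does not state: the squared terms enter without a 1/2 factor, and the logs are mean-centred, so that (0, 0) represents the average company. If the logs are not centred, the first-order coefficients no longer give the elasticities at the sample average. The function name is hypothetical.

```python
def translog_elasticities(l, d, b):
    """Cost elasticities w.r.t. mains length and property density in a translog model.

    ln C = ... + bL*l + bD*d + bLL*l**2 + bDD*d**2 + bLD*l*d, with l and d the logs of
    length and density. Assumes no 1/2 factor on the squared terms and mean-centred
    logs, so (0, 0) is the sample average.
    """
    e_length = b["L"] + 2 * b["LL"] * l + b["LD"] * d
    e_density = b["D"] + 2 * b["DD"] * d + b["LD"] * l
    return e_length, e_density

# First value in each translog row of the TMSNPW1 (M3) column
b = {"L": 1.054, "D": 0.373, "LL": -0.023, "DD": 1.775, "LD": 0.496}
for d in (-0.2, 0.0, 0.2):  # log density 0.2 below, at, and 0.2 above the mean
    e_len, e_den = translog_elasticities(0.0, d, b)
    print(f"d={d:+.1f}: length {e_len:.3f}, density {e_den:.3f}")
```

Under these assumptions, the positive interaction term means the elasticity of cost with respect to mains length rises with density. The large squared density term also makes the density elasticity change sign between sparser and denser companies.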
Consultation model ID TMSNPW1 TMSNPW2 TMSNPW3 TMSNPW4
Company’s model ID M3 M4 M5 M9
Dependent variable -------------- Ln(Totex Water NetworkPlus) --------------
Ln(Length of potable and raw water mains)
1.054*** (0.000)
1.062*** (0.000)
1.069*** (0.000)
1.054*** (0.000)
Ln(Property Density) 0.373 (0.105)
0.419* (0.089)
0.363 (0.122)
0.373*** (0.004)
Ln(Length of potable and raw water mains)^2
-0.023 (0.815)
-0.001 (0.991)
-0.0152 (0.883)
-0.023 (0.626)
Ln(Property Density)^2 1.775 (0.225)
2.138 (0.191)
3.014* (0.051)
1.775** (0.030)
Ln(Length of potable and raw water mains)*Ln(Density)
0.496** (0.020)
0.480** (0.026)
0.426** (0.043)
0.496*** (0.000)
Ln(APH Network) 0.046 (0.727)
0.0004 (0.998)
0.177 (0.177)
0.046 (0.510)
Ln(Regional Wage_water_2soc) 0.699 (0.641)
0.407 (0.819)
0.563 (0.716)
0.699 (0.339)
time -0.020 (0.329)
-0.014 (0.527)
-0.017 (0.416)
-0.020 (0.119)
% mains laid between 1921 and 1940
0.211* (0.057)
0.217* (0.089)
0.287*** (0.004)
0.211*** (0.000)
% DI from impounding reservoirs 0.125 (0.144)
0.1287 (0.172)
0.074 (0.417)
0.125*** (0.001)
% DI from boreholes 0.098 (0.247)
0.116 (0.173)
0.016 (0.865)
0.098** (0.024)
Constant 4.997*** (0.000)
4.852*** (0.000)
4.989*** (0.000)
4.997*** (0.000)
R2 adjusted 0.970 0.968 0.971
Reset test 0.061 0.221
VIF (max) 5.75 6.00
Method OLS OLS RE RE
N (sample size) 106 88 106 106
Template 18. Network plus water models proposed by Welsh Water
Description of dependent variable
Network Plus includes costs for Raw Water Distribution, Water Treatment and Treated Water Distribution
Water Network Plus Botex = (Total Operating Expenditure – Third Party Services – Abstraction Charges – Local authority rates) + (Maintaining the long term capability of the assets infra + Maintaining the long term capability of the assets non-infra)
Values rebased to 2016/17 using CPIH.
Comments on models (Welsh Water)
The models aim to capture key cost drivers for the industry. The models include a scale variable, the number of connected properties, alongside variables to capture density and sparsity, treatment complexity and drivers of maintenance.
To capture both density and sparsity, the models include properties over length of mains alongside a squared term. In this way the two variables capture the U-shape relationship between costs and density and sparsity. This specification allows the impact of density on costs to vary according to how dense the company is.
The estimated elasticities of costs with respect to density are reasonable and of the right order across the industry. The model is robust to the removal of the sparsest companies; however, the coefficient on the squared term becomes less significant when the densest company is removed.
The models are broadly robust to using alternative modelled costs (e.g. including abstraction charges, excluding grants and contributions etc), and to alternative estimation techniques such as random effects.
These models have been produced with South West and Bournemouth combined.
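The U-shape described above can be made concrete by computing the density elasticity implied by the linear and squared terms, and the density at which it changes sign. The sketch below uses the WSHNPW1 coefficients from the table that follows (-0.304 and 1.836). It assumes that z is ln(properties over mains) minus its sample mean and that the squared term is the square of that demeaned log. If instead the square was formed first and then demeaned, the formula would need the undemeaned log. The function names are hypothetical.

```python
import math

def density_elasticity(z, b_lin, b_sq):
    """d ln(cost) / d ln(density) when ln(cost) contains b_lin*z + b_sq*z**2.

    z is assumed to be ln(properties over mains) minus its sample mean.
    """
    return b_lin + 2 * b_sq * z

def turning_point(b_lin, b_sq):
    """Value of z at which the elasticity changes sign (the bottom of the U)."""
    return -b_lin / (2 * b_sq)

b_lin, b_sq = -0.304, 1.836  # WSHNPW1 coefficients
z_star = turning_point(b_lin, b_sq)
print(f"turning point z* = {z_star:.4f} "
      f"(density about {math.exp(z_star):.3f}x the geometric mean)")
for z in (-0.3, 0.0, 0.3):
    print(f"z={z:+.1f}: elasticity {density_elasticity(z, b_lin, b_sq):+.3f}")
```

Under these assumptions, costs fall with density for companies sparser than the turning point, reflecting sparsity costs, and rise with density beyond it. This matches the U-shape relationship the comments describe.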
Consultation model ID WSHNPW1 WSHNPW2
Company’s model ID 3 4
Dependent variable Ln(Network plus Botex)
Ln(Connected Properties) (,000)
1.002*** (0.000)
1.010*** (0.000)
Ln (Properties over Mains), demeaned (,000/km)
-0.304* (0.081)
-0.296* (0.085)
Ln (Properties over Mains)^2, demeaned (,000/km)
1.836*** (0.000)
2.075*** (0.000)
% mains renewed and relined 31.91*** (0.002)
32.90*** (0.003)
% of water treated at complexity band 2 and below
-0.532** (0.011)
-0.587*** (0.004)
% of water treated at complexity band 5 and above
0.125 (0.318)
Constant -2.469*** -2.492***
R2 adjusted 0.970 0.970
VIF (max) 1.715 1.261
Reset test 0.036 0.076
Estimation method OLS OLS
N (sample size) 102 102
Template 19. Network plus water models proposed by Yorkshire Water
Description of dependent variable
Network plus base costs = operating expenditure less abstraction charges, third party services and local authority rates + capital maintenance expenditure net of grants and contributions (G&C)
The dependent variables are deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
Comments on models (Yorkshire Water)
The network+ models use a variety of scale variables and density measures. These models are generally robust to alternative modelled costs and estimation techniques, and produce reasonably compact efficiency ranges.
Our dependent variable excludes G&C, consistent with the PR14 approach. However, given the lack of a split of G&C between capital maintenance and enhancement expenditure, we have also modelled CAPEX on a gross basis. The statistical performance of the models is broadly consistent with and without G&C.
Consultation model ID YKYNPW1 YKYNPW2 YKYNPW3
Company’s model ID 11 12 13
Dependent variable Network + BOTEX (log)
Connected properties (‘000s) (log) 1.030*** (0.000)
Population served (‘000s) (log) 1.012*** (0.000)
Length of mains (km) (log) 1.026*** (0.000)
% of mains renewed/relined 32.75** (0.01)
30.16** (0.018)
35.86*** (0.005)
% of mains laid before 1980 0.791* (0.096)
0.623 (0.18)
% of DI from reservoirs 0.327** (0.019)
0.341** (0.012)
% of DI from rivers 0.217 (0.361)
0.220 (0.342)
% of water treated at complexity band 1 and below
-0.692** (0.015)
Properties over area (‘000s / km2) (log, demeaned)
-0.123 (0.169)
-0.188** (0.041)
Properties over area (‘000s / km2) (log, demeaned) squared
0.279** (0.02)
0.246** (0.028)
Properties over mains (‘000s / km) (log)
0.719** (0.039)
Constant -3.437*** (0.000)
-4.004*** (0.000)
-3.410*** (0.002)
R2 adjusted 0.958 0.960 0.952
Reset test 0.560 0.475 0.0615
VIF (max) 1.569 1.569 1.275
Estimation method OLS OLS OLS
N (sample size) 102 102 102
Template 20. Network plus water models proposed by Bristol Water
Description of dependent variable
The dependent variable is Botex per connected property.
Botex = (total opex – business rates – third party costs) + capital maintenance
Comments on models (Bristol Water)
The models and corresponding coefficients presented in this pro forma are based on cost information for 17 companies (data for Bournemouth and South West Water have been appropriately combined). Regressions were run using the Master Wholesale Cost data file dated 27th February 2017, reflecting the latest updates and amendments to the data.
Capital maintenance costs have been smoothed on a three-year rolling-average basis; therefore four years of data have been modelled (2014-2017). Botex has been calculated on a unit cost basis by dividing costs by the sum of total non-household and total household connected properties at year end, both also taken from the six-year wholesale cost data set.
A full description of the work undertaken to arrive at these models is set out in a report by NERA: ‘Comparative Benchmarking Assessment to Support Preparation of Bristol Water’s AMP7 Business Plan’ (December 2017).
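The Bristol Water comments explain why only four of the six years are modelled: a trailing three-year average of capital maintenance is only available from the third year onwards. The minimal sketch below shows that smoothing step and the per-property division. All figures are hypothetical, the function names are invented for illustration, and we assume that years are labelled by year end (2012 to 2017).

```python
WINDOW = 3  # three-year rolling average, as described above

def trailing_mean(values, window=WINDOW):
    """Trailing rolling mean; one value per complete window, so window-1 years drop out."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

def botex_per_property(opex_net, capmaint_smoothed, hh_props, nhh_props):
    """(Opex net of rates and third-party costs + smoothed capital maintenance) / properties."""
    return [(o + c) / (h + n)
            for o, c, h, n in zip(opex_net, capmaint_smoothed, hh_props, nhh_props)]

years = [2012, 2013, 2014, 2015, 2016, 2017]     # year ending, six-year data set
capmaint = [30.0, 33.0, 36.0, 30.0, 27.0, 33.0]  # hypothetical, £m
smoothed = trailing_mean(capmaint)
modelled_years = years[WINDOW - 1:]              # the four modelled years
opex_net = [50.0, 51.0, 52.0, 53.0]              # hypothetical, modelled years only
hh, nhh = [480.0] * 4, [20.0] * 4                # hypothetical, '000 properties
print(modelled_years, smoothed)
print(botex_per_property(opex_net, smoothed, hh, nhh))
```

With 17 companies and four modelled years, this gives the 68 observations reported in the table.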
Consultation model ID BRLNPW1 BRLNPW2 BRLNPW3
Company’s model ID 7 8 9
Dependent variable Ln(network plus botex per property)
Ln(DI/connected property) Ml/d per ‘000 connected property
0.793* (0.063)
1.050** (0.024)
0.973*** (0.003)
Ln(length of mains/ connected property) Km/‘000 connected property
0.607*** (0.008)
0.494** (0.011)
0.461* (0.076)
Share of water treated at level 5 and above (%)
0.257 (0.144)
0.143 (0.364)
Length of mains laid pre-1940/Total length of mains (%)
0.805** (0.012)
0.487 (0.152)
1.197*** (0.000)
Length of renewed and relined mains/Total length of mains (%)
18.12* (0.093)
15.64 (0.157)
34.12** (0.011)
Year15 0.0003 (0.990)
0.003 (0.912)
0.024 (0.468)
Year16 0.024 (0.521)
0.029 (0.432)
0.072 (0.138)
Year17 0.024 (0.461)
0.032 (0.301)
0.0767** (0.046)
Surface water treated / Total water treated (%)
0.526*** (0.006)
Share of water from reservoirs (%) 0.206* (0.059)
Ln(number of sources / DI) 0.113 (0.325)
Ln(average pumping head network) 0.140 (0.294)
Constant -3.733*** -3.881*** -3.437***
R2 adjusted 0.53 0.65 0.70
Reset test 0.67 0.54 0.15
VIF (max) 1.48 1.84 2.94
Estimation method OLS OLS OLS
N (sample size) 68 68 68
Template 21. Network plus water models proposed by South East Water
Description of dependent variables
Modelled OPEX = OPEX – third party – abstraction charges – local authority rates
Modelled base CAPEX = maintenance infra + maintenance non-infra – (grants and contributions)
Modelled BOTEX = Modelled OPEX + Modelled base CAPEX
Costs are modelled on an outturn basis and are unsmoothed.
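The cost definitions above amount to simple arithmetic on reported cost lines. A minimal sketch, with hypothetical figures and illustrative field names (not a prescribed data schema); the regressions then use the natural log of the resulting BOTEX.

```python
import math
from dataclasses import dataclass


@dataclass
class CostLines:
    """Reported cost lines for one company-year (hypothetical figures, £m)."""
    opex: float
    third_party: float
    abstraction_charges: float
    local_authority_rates: float
    maintenance_infra: float
    maintenance_non_infra: float
    grants_and_contributions: float


def modelled_opex(c: CostLines) -> float:
    # OPEX - third party - abstraction charges - local authority rates
    return c.opex - c.third_party - c.abstraction_charges - c.local_authority_rates


def modelled_base_capex(c: CostLines) -> float:
    # maintenance infra + maintenance non-infra - grants and contributions
    return c.maintenance_infra + c.maintenance_non_infra - c.grants_and_contributions


def modelled_botex(c: CostLines) -> float:
    # Modelled OPEX + Modelled base CAPEX
    return modelled_opex(c) + modelled_base_capex(c)


example = CostLines(opex=120.0, third_party=3.0, abstraction_charges=5.0,
                    local_authority_rates=12.0, maintenance_infra=30.0,
                    maintenance_non_infra=25.0, grants_and_contributions=5.0)
botex = modelled_botex(example)
print(botex, math.log(botex))  # the dependent variable enters the models in logs
```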
Comments on models (South East Water)
The model coefficients are broadly robust to alternative modelled costs and estimation techniques.
The number of treatment plants is a material driver of costs for the same reason as the number of sources noted above, and is captured in the models by the number of treatment plants per unit of the scale driver. We have modelled this variable in both levels and logs; the models in levels tend to have marginally superior statistical properties.
The coefficient on the proportion of mains relined/renewed is large only because the variable is numerically small: it takes a maximum value of 0.012 and a minimum value of 0.000207, with a mean of 0.004. The magnitude of the cost adjustment is therefore limited: a coefficient of 25 would imply an estimated elasticity range of approximately 0 to 0.3.
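The elasticity range quoted above follows from how a variable entered in levels behaves in a log-cost model: if ln(cost) includes the term β·x, the cost elasticity with respect to x is β·x, so it scales with the value of x. A minimal check using the figures in the text (β = 25 and the stated minimum, mean and maximum of x), which gives elasticities of about 0.005, 0.1 and 0.3 respectively:

```python
# In ln(cost) = ... + beta * x + ..., with x entered in levels (a proportion,
# not a log), the elasticity d ln(cost) / d ln(x) equals beta * x.

BETA = 25.0  # illustrative coefficient quoted in the text

share_renewed = {"minimum": 0.000207, "mean": 0.004, "maximum": 0.012}


def elasticity_of_level_regressor(beta, x):
    """Cost elasticity of a regressor that enters a log-cost model in levels."""
    return beta * x


for label, x in share_renewed.items():
    e = elasticity_of_level_regressor(BETA, x)
    print(f"{label:>7}: x = {x:.6f}, elasticity = {e:.4f}")
```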
Consultation model ID SEWNPW1 SEWNPW2 SEWNPW3 SEWNPW4 SEWNPW5
Company’s model ID 9 10 11 12 13
Dependent variable Network + BOTEX (log)
Connected properties (‘000s) (log)
1.097*** (0.000)
1.101*** (0.000)
1.089*** (0.000)
Population served (‘000s) (log) 1.095*** (0.000)
DI (Ml/d) (log) 1.057*** (0.000)
Proportion of area with more than 4000 people per km2 (%)
0.452** (0.028)
0.326* (0.057)
0.689*** (0.004)
0.695*** (0.006)
0.689*** (0.002)
Proportion of area with less than 600 people per km2 (%)
0.757*** (0.005)
0.607** (0.012)
0.633*** (0.006)
0.658*** (0.006)
0.530*** (0.005)
% of DI treated at complexity band 3 and above (%)
0.262 (0.277)
0.322 (0.172)
0.429* (0.05)
0.383* (0.078)
0.587*** (0.000)
Proportion of mains renewed/relined (%)
22.66** (0.036)
23.71** (0.024)
21.96** (0.041)
22.95** (0.035)
21.34** (0.047)
Number of treatment works over connected properties (number / ‘000s) (log)
0.0926 (0.126)
Number of treatment works over DI (number / (Ml/d)) (log)
0.0667 (0.245)
Number of treatment works over DI (number / (Ml/d))
1.437** (0.023)
Year 2016 dummy -0.079*** (0.002)
-0.061** (0.012)
-0.080*** (0.003)
-0.079*** (0.004)
-0.083*** (0.001)
Constant -4.616*** (0.000)
-2.802*** (0.000)
-3.589*** (0.000)
-3.710*** (0.000)
-4.039*** (0.000)
R2 adjusted 0.960 0.963 0.961 0.961 0.964
Reset test 0.736 0.373 0.846 0.799 0.963
VIF (max) 2.063 2.000 2.422 2.582 2.608
Estimation method OLS OLS OLS OLS OLS
N (sample size) 102 102 102 102 102
Template 22. Network plus water models proposed by South Staffs Water
Description of dependent variable
Modelled OPEX = [OPEX] – [third party] – [abstraction charges] – [local authority rates]
Modelled base CAPEX = [maintenance infra] + [maintenance non-infra] – [grants and contributions]
Modelled BOTEX = [Controllable OPEX] + [Controllable base CAPEX]
Costs are deflated to 2016/17 prices using CPIH and modelled on an unsmoothed basis.
Comments on models (South Staffs Water)
The coefficients are generally robust to alternative modelled costs and estimation techniques.
The coefficient on the proportion of mains relined/renewed is large only because the variable is numerically small: it takes a maximum value of 0.012 and a minimum value of 0.000207, with a mean of 0.004. The magnitude of the cost adjustment is therefore limited: a coefficient of 25 would imply an estimated elasticity range of approximately 0 to 0.3.
Average pumping head is a known driver of power expenditure, yet in our models this driver was often insignificant and/or had a counter-intuitive sign. This may be due to data problems with the variable, or because its effect is absorbed by other cost drivers in the models. We note, however, that there remains a very strong correlation between average pumping head, distribution input and power costs when modelled separately. Modelling power expenditure separately as a function of average pumping head may be more appropriate, but we appreciate that the consultation will give us the opportunity to study what other companies have observed in this area.
Models containing Ofwat’s density and sparsity measures were considered. Although these models performed reasonably well in statistical diagnostic tests, companies’ estimated performance was sensitive to the choice of threshold. As a robust operational rationale for choosing a particular threshold could not be identified, the models presented include simple density drivers only.
Consultation model ID SSCNPW1 SSCNPW2
Company’s model ID Model 1 Model 2
Dependent variable Network+ BOTEX (log)
Length of mains (km) (log) 1.069*** (0.000)
1.094*** (0.000)
Properties over mains (‘000s / km) (log) 0.577* (0.074)
0.515** (0.044)
% water treated at complexity band 4 and above 0.344 (0.109)
0.356** (0.028)
% of mains renewed/relined 27.53* (0.056)
28.83** (0.036)
% of mains laid before 1980 1.005* (0.059)
Constant -4.446*** (0.000)
-5.502*** (0.000)
R2 adjusted 0.948 0.956
Reset test 0.821 0.112
VIF (max) 1.164 1.232
Estimation method OLS OLS
N (sample size) 102 102
1.6 Wholesale water models
Template 23. Wholesale water models proposed by Ofwat
Description of dependent variables
Wholesale water base costs excluding cost items described in section 3 of the main consultation document.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We considered two scale variables in our wholesale models: connected properties (models 1-6) and length of mains (models 7-12).
When using length of mains as a scale variable, we have also included a density variable. This accounts for the fact that a company serving a larger population per km of mains may incur higher distribution costs. As expected, the coefficient on the density variable is positive, albeit quite large. We also present the same models with the weighted average density driver, which produces more plausible values for the estimated coefficient. This coefficient also captures the increased cost of working in highly dense, urban areas.
The rationale for all explanatory variables in our wholesale water models can be found in our comments on the water treatment and treated water distribution models. All coefficients are reasonably robust and meet expectations.
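The size of the density coefficient is easier to read after rearranging the model. Assume, for illustration only, that the density driver is N/L, where N is the population (or properties) served and L is the length of mains, and ignore the other regressors (some of which are also expressed per length of main and would add further terms in ln L). Then:

```latex
\ln C = \alpha + \beta \ln L + \gamma \ln\!\left(\frac{N}{L}\right) + \dots
      = \alpha + (\beta - \gamma)\ln L + \gamma \ln N + \dots
\quad\Longrightarrow\quad
\left.\frac{\partial \ln C}{\partial \ln L}\right|_{N} = \beta - \gamma ,
\qquad
\left.\frac{\partial \ln C}{\partial \ln N}\right|_{L} = \gamma .
```

Scaling N and L in the same proportion leaves density unchanged, so β remains the elasticity of cost with respect to overall scale; a large positive γ means that, for a given length of mains, serving more people raises costs, while extra mains for a given population raise costs by only β − γ in proportional terms.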
Consultation model ID OWW 1 OWW 2 OWW 3 OWW 4 OWW 5 OWW 6 OWW 7 OWW 8 OWW 9 OWW 10 OWW 11 OWW 12
Dependent variable -------------- ln (wholesale water base costs) --------------
ln (connected properties) 1.109*** (0.000)
1.078*** (0.000)
1.114*** (0.000)
1.053*** (0.000)
1.037*** (0.000)
1.081*** (0.000)
ln (lengths of main) 1.114*** (0.000)
1.072*** (0.000)
1.114*** (0.000)
1.086*** (0.000)
1.031*** (0.000)
1.082*** (0.000)
% mains length refurbished and relined 0.177 (0.126)
0.185* (0.073)
0.191* (0.071)
0.286** (0.014)
0.247** (0.014)
0.276*** (0.006)
0.210* (0.067)
0.174 (0.122)
0.197* (0.071)
0.184 (0.146)
0.13 (0.301)
0.165 (0.173)
ln (booster pumping stations per lengths of main)
0.280** (0.041)
0.392*** (0.006)
0.320* (0.051)
0.353** (0.049)
ln (service reservoirs and water towers per lengths of main)
0.202** (0.029)
0.336*** (0.006)
0.183 (0.162)
0.165 (0.360)
% of lengths of mains laid or refurbished 1981
-0.007* (0.088)
-0.007 (0.116)
-0.007 (0.106)
-0.005 (0.101)
-0.005 (0.197)
-0.006 (0.178)
-0.008* (0.058)
-0.006 (0.136)
-0.007* (0.098)
-0.009* (0.067)
-0.007 (0.183)
-0.008* (0.094)
% of water treated at water treatment works in complexity levels 3-6
0.004 (0.185)
0.003 (0.130)
0.004** (0.030)
ln (average pumping head for water resources plus)
0.272*** (0.007)
0.170* (0.078)
0.199** (0.037)
0.231** (0.011)
0.172* (0.067)
0.196** (0.038)
0.252** (0.031)
0.207* (0.092)
0.231* (0.065)
ln (density) 0.918*** (0.000)
1.148*** (0.000)
1.071*** (0.000)
ln (weighted average density) 0.248*** (0.001)
0.330*** (0.000)
0.290*** (0.001)
Constant 2.287*** 4.324*** 3.394*** 3.696*** 5.780*** 4.840*** 3.249*** 4.244*** 3.508*** 5.614*** 7.216*** 6.139***
R2 adjusted 0.972 0.976 0.975 0.963 0.974 0.973 0.973 0.976 0.974 0.968 0.971 0.969
VIF (max) 1.31 1.725 1.605 1.234 1.254 1.306 1.46 2.829 2.287 1.641 3.182 2.533
Reset test 0.372 0.145 0.684 0.046 0.047 0.161 0.346 0.162 0.476 0.025 0.021 0.019
Estimation method OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 107 107 107 107 107 107 107 107 107 107 107 107
Template 24. Wholesale water models proposed by Southern Water
Description of dependent variable
Wholesale water base costs = (OPEX less third party services, abstraction charges and local authority rates) + (maintenance capital expenditure infra and non-infra less grants and contributions).
All costs are unsmoothed and deflated to 2016-17 prices using CPIH.
Comments on models (Southern Water)
The three BOTEX models vary with the scale driver, density driver and maintenance drivers used:
Model 1 uses length of mains as the scale driver, whilst models 2-3 use connected properties.
Model 1 uses a simple (linear) density measure. Because length of mains also captures aspects of sparsity, the positive coefficient on density is to be expected. Models 2-3 estimate a translog density relationship, using properties over mains and properties over area respectively.
Models 1 and 3 control for the proportion of mains renewed/relined and the proportion of mains laid before 1980. Model 2 controls for the proportion of mains renewed/relined only.
Consultation model ID SRNWW1 SRNWW2 SRNWW3
Company’s model ID 1 2 3
Dependent variable BOTEX (log)
Connected properties (‘000s) (log) 1.028*** (0.000)
1.070*** (0.000)
Length of mains (km) (log) 1.096*** (0.000)
Properties over mains (‘000s / km) (log) 0.502** (0.034)
Properties over mains (‘000s / km) (log, demeaned)
-0.0817 (0.512)
Properties over mains (‘000s / km) (log, demeaned) squared
1.313*** (0.007)
Properties over area (‘000s / km2) (log, demeaned)
-0.155* (0.092)
Properties over area (‘000s / km2) (log, demeaned) squared
0.238* (0.055)
Sources over DI 0.760*** (0.000)
0.344* (0.081)
% DI from reservoirs 0.185* (0.067)
% of water treated at complexity band 4 and above
0.409*** (0.007)
0.486*** (0.009)
% of mains renewed/relined 28.86** (0.035)
29.98*** (0.003)
21.89 (0.119)
% of mains laid before 1980 0.926* (0.059)
0.438 (0.379)
Year 2016 dummy -0.0589** (0.027)
-0.0803*** (0.006)
Constant -5.440*** (0.000)
-3.441*** (0.000)
-3.531*** (0.000)
R2 adjusted 0.962 0.977 0.964
Reset test 0.215 0.608 0.743
VIF (max) 1.232 2.259 1.471
Method OLS OLS OLS
N (sample size) 102 102 102
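In a translog specification with a demeaned log density term and its square, the elasticity of cost with respect to density is not a single number but depends on where a company sits relative to the sample mean. A minimal sketch using the SRNWW2 coefficients reported above (the model in this template with the properties-over-mains translog terms); the deviations evaluated are arbitrary illustrative points, and all other drivers are held fixed:

```python
# SRNWW2 coefficients from the table above.
B_LINEAR = -0.0817   # Properties over mains (log, demeaned)
B_SQUARED = 1.313    # Properties over mains (log, demeaned) squared


def density_elasticity(dev, b1=B_LINEAR, b2=B_SQUARED):
    """d ln(cost) / d ln(density) when ln(density) is `dev` above its sample mean.

    With ln(cost) = ... + b1 * dev + b2 * dev**2 + ..., the derivative with
    respect to dev (and hence ln density) is b1 + 2 * b2 * dev.
    """
    return b1 + 2.0 * b2 * dev


# Deviation at which the elasticity is zero: the bottom of the U-shape.
turning_point = -B_LINEAR / (2.0 * B_SQUARED)

for dev in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(f"log density {dev:+.1f} from mean: elasticity {density_elasticity(dev):+.3f}")
print(f"density component of log cost is lowest at deviation {turning_point:+.4f}")
```

Because the terms are demeaned, the linear coefficient is simply the density elasticity for a company at the sample mean, which is why an insignificant linear term does not by itself imply that density is unimportant.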
Template 25. Wholesale water models proposed by South West Water
Description of dependent variable
Wholesale BOTEX = (OPEX – third party – abstraction charges – local authority rates) + (maintenance infra + maintenance non-infra – grants and contributions)
Wholesale BOTEX+ (growth) = wholesale BOTEX + additions to the supply and demand balance + new developments and growth + metering expenditure + resilience
Wholesale TOTEX = Wholesale BOTEX + other capital expenditure infra + other capital expenditure non-infra + infrastructure network reinforcement
Costs are net and unsmoothed, covering 2011/12 to 2016/17.
Comments on models (South West Water)
We have focused on capturing the key drivers of costs in wholesale water that are operationally robust and statistically valid.
The key drivers we have focused on for aggregate modelling are:
Scale (properties): the number of properties is the most appropriate scale driver for aggregate water costs, as it simultaneously captures the volume of water that requires treatment and the size of the network, as reflected in the number of connections.
Density/sparsity (mains per property): there are increased costs associated with operating in densely populated, urbanised areas (traffic congestion, congested underground utilities, etc.) and in sparsely populated areas (increased travel, leakage control and pumping costs). We selected this measure as it relates most directly to maintenance costs from an operational perspective. In addition, it allowed us to model a U-shaped relationship, so that the costs of operating in areas of more extreme population density and sparsity are accounted for within our translog models.
Source type/treatment process: the type of source determines the resource costs and the quality of the source water, which in turn determines the required complexity of water treatment. The proportion of distribution input that comes from surface water is outside of management control, as the source types available to companies are determined by local geological factors (while the type of treatment process, in contrast, lies partially within management control).
Maintenance: the costs associated with maintaining and repairing assets.
Our models differ across two key parameters: density/sparsity and source water quality.
Models 1, 3 and 5 model a translog ‘U-shaped’ relationship between cost and population density/sparsity. Models 2, 4 and 6 use only the log of mains over connected properties, capturing only the impact of sparsity.
We have extended our BOTEX modelling to models of BOTEX plus growth enhancement, and of TOTEX. We have used the same drivers as in our aggregate BOTEX models, as the regional operating characteristics that increase or decrease BOTEX are also likely to affect the cost of delivering enhancement solutions. In addition, we have augmented these models with a growth driver (the percentage increase in properties) to capture the impact of an increase in customer volumes on: growth enhancement directly; ongoing OPEX and capital maintenance costs; and delivery of programmes recorded under quality enhancement. We were not able to include direct measures of differences in the amount of quality enhancement within our econometric modelling.
While these models do not include a quality enhancement specific driver, they do meet many of the statistical criteria set out by Ofwat (see below). As can be seen from the efficiency range charts, while modelling BOTEX+ (growth) does not widen the efficiency ranges, including quality enhancement to model TOTEX does lead to somewhat broader efficiency ranges.
We would recommend that BOTEX+ (growth) and TOTEX modelling approaches are explored to the fullest possible extent at PR19. Benchmarking companies based on their TOTEX spend plays an important role in capturing the synergies between OPEX and CAPEX spend and ensuring that companies are rewarded for innovative solutions that reduce costs overall rather than in one particular area.
Efficiency ranges vary more between model specifications than across cost categories. The models that include the translog mains over connected properties terms have the narrowest efficiency ranges.
All of the BOTEX models estimate statistically significant coefficients that meet expectations. The same holds for the BOTEX+ (growth) and TOTEX models, although some coefficients are less significant.
Consultation model ID SWBWW1 SWBWW2 SWBWW3 SWBWW4 SWBWW5 SWBWW6
Company’s model ID 1 2 3 4 5 6
Dependent variable Ln(wholesale water botex) Ln(wholesale water botex+growth) Ln(wholesale water totex)
Properties (ln) 1.027*** (0.000)
1.046*** (0.000)
1.049*** (0.000)
1.060*** (0.000)
1.058*** (0.000)
1.073*** (0.000)
% of mains renewed or relined
33.70*** (0.000)
32.74*** (0.000)
27.19*** (0.000)
27.11*** (0.002)
26.19*** (0.005)
26.07*** (0.005)
Mains over connected properties (ln)
0.374*** (0.000)
0.350*** (0.010)
0.371*** (0.000)
0.355*** (0.001)
0.496*** (0.000)
0.473*** (0.000)
Mains over connected properties (ln squared and demeaned)
1.543*** (0.000)
0.856*** (0.003)
1.205*** (0.000)
% of treated surface water
0.193*** (0.010)
0.178** (0.020)
0.113 (0.212)
0.126 (0.158)
0.0782 (0.413)
0.0960 (0.312)
Properties growth (%) 0.285*** (0.007)
0.387*** (0.002)
0.308*** (0.005)
0.452*** (0.002)
Constant -3.729*** (0.000)
-3.710*** (0.000)
-3.820*** (0.000)
-3.888*** (0.000)
-4.129*** (0.000)
-4.225*** (0.000)
R2 adjusted 0.962 0.953 0.954 0.952 0.952 0.947
RESET Test 0.335 0.860 0.013 0.019 0.001 0.002
VIF (max) 1.312 1.277 1.337 1.331 1.337 1.331
Method OLS OLS OLS OLS OLS OLS
N (sample size) 102 102 102 102 102 102
Template 26. Wholesale water models proposed by Welsh Water
Description of dependent variable
Water Botex = (Total Operating Expenditure – Third Party Services – Abstraction Charges – Local authority rates) + (Maintaining the long term capability of the assets infra + Maintaining the long term capability of the assets non-infra)
Values rebased to 2016/17 using CPIH.
Comments on models (Welsh Water)
The submitted botex models aim to capture the key cost drivers for the industry. The models include a scale variable, the number of connected properties, alongside variables to capture density and sparsity, treatment complexity, drivers of maintenance and the size of the sources.
To capture both density and sparsity, the models include properties over length of mains alongside its squared term. Together, these two terms capture the U-shaped relationship between costs and density/sparsity, allowing the impact of density on costs to vary according to how dense the company is. The density variables have been demeaned (the sample mean of the variable is subtracted from each observation) in order to reduce collinearity between the linear and quadratic density terms.
The coefficient on the linear density term is not statistically significant. This is not considered an issue, as the two terms work in conjunction with each other and the coefficient on the squared term is highly significant.
The estimated elasticities of sparsity/density on costs are reasonable and of the right order across the industry. The model is robust to the removal of the most sparse and dense companies.
The models also appear to be robust to using alternative modelled costs (e.g. including abstraction charges, excluding grants and contributions etc) and to alternative estimation techniques such as random effects.
These models have been produced with South West and Bournemouth combined.
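Demeaning helps because, for a variable whose values lie well away from zero, x and x² move together almost one-for-one, whereas the deviation from the mean and its square are much less correlated (and exactly uncorrelated if the distribution is symmetric). A minimal illustration with hypothetical values of a log density driver (not data from the consultation), using a hand-written Pearson correlation so the sketch needs only the standard library:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical log density values for seven companies.
log_density = [0.5, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0]

mean = sum(log_density) / len(log_density)
demeaned = [x - mean for x in log_density]

raw_corr = pearson(log_density, [x ** 2 for x in log_density])
demeaned_corr = pearson(demeaned, [d ** 2 for d in demeaned])

print(f"corr(x, x^2)                 = {raw_corr:.3f}")
print(f"corr(x - mean, (x - mean)^2) = {demeaned_corr:.3f}")
```

With skewed data such as these the correlation is reduced rather than removed. Demeaning is a reparameterisation: with a constant in the model it leaves the fitted values and the shape of the U unchanged, and only alters how the linear coefficient is interpreted.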
Consultation model ID WSHWW1 WSHWW2
Company’s model ID 1 2
Dependent variable Ln(Water Botex)
Ln(Connected Properties) (‘000)
0.986*** (0.000)
1.025*** (0.000)
Ln (Properties over Mains), demeaned (‘000/km)
-0.0250 (0.877)
-0.0625 (0.62)
Ln (Properties over Mains)^2, demeaned (‘000/km)
1.125** (0.019)
1.605*** (0.000)
Number of Sources/DI (nr/Ml/D)
0.659*** (0.002)
0.628*** (0.000)
% mains renewed and relined 29.25*** (0.004)
% water treated at complexity band 2 and below -0.709*** (0.000)
-0.837*** (0.000)
% water treated at complexity band 5 and above 0.213 (0.194)
Year 2016 Dummy -0.110*** (0.006)
-0.0559** (0.019)
Constant -2.186*** -2.518***
Adjusted R-squared 0.974 0.978
VIF (max) 2.046 2.052
Reset test 0.594 0.292
Method OLS OLS
N (sample size) 102 102
Template 27. Wholesale water models proposed by Yorkshire Water
Description of dependent variables
Wholesale water base costs = operating expenditure less abstraction charges, third party services and local authority rates + capital maintenance expenditure net of grants and contributions (G&C)
Wholesale water totex (growth) costs = wholesale water base costs + growth enhancement expenditure.
Modelled growth enhancement expenditure = expenditure on supply side enhancement to the supply/demand balance (peak) + supply side enhancement to the supply/demand balance (average) + demand side enhancement to the supply/demand balance (peak) + demand side enhancement to the supply/demand balance (average) + resilience + new developments + metering for optants + metering for meters introduced by companies + metering for non-household and other.
The dependent cost variables are deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
Comments on models (Yorkshire Water)
The aggregate BOTEX models use a variety of scale variables and density measures. These models are generally robust to alternative modelled costs and estimation techniques, and produce reasonably compact efficiency ranges.
The statistical performance of the models is broadly consistent with and without G&C.
Consultation model ID YKYWW1 YKYWW2 YKYWW3 YKYWW4 YKYWW5 YKYWW6
Company’s model ID 5 6 7 8 9 10
Dependent variable Water BOTEX (log)
Length of mains (km) (log) 1.032*** (0.000)
1.047*** (0.000)
1.060*** (0.000)
Connected properties (‘000s) (log)
1.012*** (0.000)
1.030*** (0.000)
Population served (‘000s) (log) 1.026*** (0.000)
Properties over mains (‘000s / km) (log)
0.932*** (0.000)
0.984*** (0.001)
0.923*** (0.000)
Properties over mains (‘000s / km) (log, demeaned)
-0.131 (0.368)
-0.0890 (0.465)
-0.302** (0.016)
Properties over mains (‘000s / km) (log, demeaned) squared
1.236*** (0.009)
1.320*** (0.006)
1.041** (0.021)
Sources over DI (number / (Ml/d))
0.708*** (0.007)
0.765** (0.010)
0.694*** (0.003)
0.518*** (0.001)
0.754*** (0.000)
0.682*** (0.000)
% of mains renewed/relined 28.79** (0.019)
32.05*** (0.002)
35.40*** (0.001)
32.21*** (0.002)
28.02*** (0.006)
% of mains laid before 1980 0.800** (0.029)
0.904** (0.014)
% DI from reservoirs 0.529*** (0.002)
0.526*** (0.004)
0.548*** (0.000)
0.174 (0.103)
0.186* (0.059)
0.239** (0.013)
Proportion of DI from rivers (%)
0.236 (0.308)
0.229 (0.336)
0.332 (0.127)
% of water treated in band 1 and below
-0.892*** (0.000)
% of water treated in band 2 and below
-0.754*** (0.000)
-0.632*** (0.000)
Constant -3.596*** (0.000)
-3.217*** (0.000)
-4.113*** (0.000)
-2.594*** (0.000)
-2.718*** (0.000)
-3.508*** (0.000)
R2 adjusted 0.962 0.963 0.969 0.975 0.977 0.977
Reset test 0.512 0.727 0.107 0.737 0.639 0.508
VIF (max) 2.415 2.392 2.416 2.022 2.257 2.251
Method OLS OLS OLS OLS OLS OLS
N (sample size) 102 102 102 102 102 102
Template 28. Wholesale water plus models proposed by Yorkshire Water
Consultation model ID YKYWW7 YKYWW8 YKYWW9 YKYWW10
Company’s model ID 1 2 3 4
Dependent variable TOTEX (growth) (log)
Length of mains (km) (log) 1.048*** (0.000)
1.047*** (0.000)
1.052*** (0.000)
1.055*** (0.000)
Properties over mains (‘000s / km) (log) 0.810*** (0.000)
0.796*** (0.000)
0.891*** (0.000)
0.836*** (0.000)
Sources over DI (number / (Ml/d)) 0.494*** (0.001)
0.439*** (0.002)
0.597*** (0.005)
0.442** (0.012)
% of mains renewed/relined 20.13* (0.097)
21.65** (0.05)
% of mains laid before 1980 0.905** (0.02)
0.824** (0.039)
% of DI from reservoirs 0.396*** (0.003)
0.406*** (0.002)
0.399** (0.021)
0.419*** (0.008)
Enhancement to the supply/demand balance over DI
2.034** (0.019)
1.961** (0.018)
1.912* (0.1)
1.741 (0.156)
New properties over connected properties
0.134 (0.387)
0.297** (0.045)
Constant -3.874*** (0.000)
-3.935*** (0.000)
-3.229*** (0.000)
-3.583*** (0.000)
R2 adjusted 0.963 0.963 0.959 0.962
Reset test 0.761 0.720 0.289 0.154
VIF (max) 1.643 1.829 1.563 1.839
Method OLS OLS OLS OLS
N (sample size) 102 102 102 102
Template 29. Wholesale water models proposed by Affinity Water
Description of dependent variable
The dependent variable of the models presented in this template is total smoothed botex per connected property. This includes operating and capital maintenance costs across all the wholesale value chain for the water service.
Operating costs include all operating expenditure except for local authority rates and third party services. Capital maintenance costs are based on the capex category “maintaining the long term capability of assets”, including both infrastructure and non-infrastructure costs.
In order to mitigate the effects of “lumpy” capital investments, and following recent precedent from the CMA, we have smoothed companies’ capital maintenance on a three-year rolling-average basis. Therefore, our models are based on four years of data (2014 to 2017).
Comments on models (Affinity Water)
The models presented in this template are estimated using data from Ofwat’s wholesale water cost assessment dataset from October 2017, which compiles cost and driver data for all companies in England and Wales. Our dataset includes a total of 17 companies, since we have combined the data for Bournemouth and South West Water to treat them as a single merged company.
We have selected the models presented in this template using an innovative tool based on a Monte Carlo simulation. This tool randomly generates and runs a total of 12,000 econometric models based on different combinations of the available cost drivers. We have then selected our initial set of preferred models based on the following filtering criteria:
The models pass the Ramsey RESET test at the 5% significance level
The adjusted R-squared is higher than 0.4
The coefficients of certain cost drivers have an intuitive sign. In particular, we consider that water delivered per property, distribution input per property, and the share of water treated at level 4 (or 5) and above should have a positive relationship with base expenditure per property.
A total of 385 models out of the 12,000 satisfied the above criteria. We have then applied further filtering criteria to narrow down this set of potential models:
We have excluded all models which included leakage or distribution input/property as an explanatory variable. The reason is that leakage is a driver that can be managed by the company to some extent, and including it therefore risks endogeneity bias (as was highlighted by the CMA in its 2015 Final Determination for Bristol Water, page A4(2)-28). Because distribution input includes leakage, the same concern applies to distribution input/property.
We have only included models which contain at least one variable capturing each of the following four effects on companies’ costs: (1) population density, (2) network density, (3) water treatment complexity and (4) the company’s mix of sources. We consider that these are key cost drivers for Affinity Water and for the water industry in general, and if they were omitted from the models, some form of off-model adjustment (e.g. a special factor adjustment) would be required to control for their effect.
A total of 14 models out of the 385 models satisfied these additional criteria. We have then estimated the VIF statistic for each of these 14 models, and selected the top 4 models with the lowest VIF. This is an objective method for minimising the risk that the models are distorted due to the effects of multicollinearity.
This innovative model selection method has the advantage of allowing us to assess a large number of possible models in a systematic and objective way, ensuring that our selected models satisfy key statistical standards from the perspective of the industry as a whole. However, it is a mechanistic method which involves limited expert judgement, and as such does not guarantee that these are the best possible models for explaining water industry costs. Rather, they provide a starting point for developing models that can be applied in the PR19 review.
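The two filtering stages and the final VIF ranking described above can be expressed as a short pipeline. This is a hypothetical sketch, not Affinity Water's tool: each candidate model is represented by summary statistics assumed to have been computed already (RESET p-value, adjusted R-squared, coefficients, effects covered and maximum VIF), and the driver and effect names are illustrative labels.

```python
from dataclasses import dataclass

# Illustrative labels for drivers and effects (hypothetical names).
POSITIVE_SIGN_DRIVERS = {"water_delivered_per_property", "di_per_property",
                         "share_treated_level4_plus"}
EXCLUDED_DRIVERS = {"leakage", "di_per_property"}
REQUIRED_EFFECTS = {"population_density", "network_density",
                    "treatment_complexity", "source_mix"}


@dataclass
class CandidateModel:
    name: str
    reset_p: float     # Ramsey RESET test p-value
    adj_r2: float      # adjusted R-squared
    coefs: dict        # driver label -> estimated coefficient
    effects: set       # which of the four required effects the drivers cover
    max_vif: float     # largest variance inflation factor among the drivers


def passes_first_stage(m):
    """RESET passed at 5%, adjusted R-squared above 0.4, intuitive signs."""
    signs_ok = all(m.coefs[d] > 0 for d in POSITIVE_SIGN_DRIVERS if d in m.coefs)
    return m.reset_p > 0.05 and m.adj_r2 > 0.4 and signs_ok


def passes_second_stage(m):
    """No leakage or DI/property driver, and all four effects represented."""
    return EXCLUDED_DRIVERS.isdisjoint(m.coefs) and REQUIRED_EFFECTS <= m.effects


def select_models(candidates, top_n=4):
    """Apply both filters, then keep the top_n models with the lowest max VIF."""
    shortlist = [m for m in candidates
                 if passes_first_stage(m) and passes_second_stage(m)]
    return sorted(shortlist, key=lambda m: m.max_vif)[:top_n]


ALL = set(REQUIRED_EFFECTS)
candidates = [
    CandidateModel("m1", 0.40, 0.55, {"mains_per_property": 0.9, "share_treated_level4_plus": 0.25}, ALL, 2.8),
    CandidateModel("m2", 0.03, 0.60, {"mains_per_property": 0.8}, ALL, 1.2),              # fails RESET
    CandidateModel("m3", 0.30, 0.50, {"leakage": 0.1, "share_treated_level4_plus": 0.2}, ALL, 1.5),  # leakage
    CandidateModel("m4", 0.50, 0.35, {"mains_per_property": 0.7}, ALL, 1.4),              # adj. R2 too low
    CandidateModel("m5", 0.60, 0.50, {"share_treated_level4_plus": -0.1}, ALL, 1.1),      # wrong sign
    CandidateModel("m6", 0.20, 0.48, {"mains_per_property": 0.8}, ALL - {"source_mix"}, 1.3),  # missing effect
    CandidateModel("m7", 0.25, 0.45, {"mains_per_property": 1.0, "share_treated_level4_plus": 0.2}, ALL, 2.1),
]

print([m.name for m in select_models(candidates)])
```

Ranking by the largest VIF, rather than the average, penalises any single pair of strongly collinear drivers, which matches the stated aim of minimising the risk of multicollinearity distorting a model.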
Consultation model ID AFWWW1 AFWWW2 AFWWW3 AFWWW4
Company’s model ID 1 2 3 4
Dependent variable Ln (total smoothed botex per property)
Ln (length of mains/ connected properties) (km/000s)
0.939*** (0.001)
0.966*** (0.005)
0.978*** (0.005)
0.988** (0.001)
Ln (population/ connected properties) 3.219*** (0.001)
2.807** (0.023)
2.760** (0.025)
3.639*** (0.001)
% of water treated at level 4 or above 0.231* (0.065)
0.357* (0.051)
0.367** (0.041)
0.266 (0.104)
% of water from reservoirs 0.351*** (0.001)
% of water from boreholes -0.026 (0.453)
Surface water treated/ Total water treated
0.281** (0.018)
Ln (water treatment works/ DI) -0.062 (0.385)
-0.077 (0.270)
year15 dummy -0.023 (0.302)
-0.026 (0.227)
-0.026 (0.222)
-0.022 (0.375)
year16 dummy -0.018 (0.634)
-0.032 (0.454)
-0.033 (0.443)
-0.021 (0.622)
year17 dummy -0.022 (0.579)
-0.012 (0.792)
-0.021 (0.625)
-0.006 (0.890)
Constant -7.443*** -7.274*** -7.322*** -7.975***
Adjusted R-squared 0.57 0.41 0.42 0.53
VIF (max) 3.53 3.56 3.52 3.90
Reset test 0.41 0.48 0.37 0.57
Method OLS OLS OLS OLS
N (sample size) 68 68 68 68
Template 30. Wholesale water models proposed by Bristol Water
Description of dependent variable
The dependent variable is Botex per connected property.
Botex = (total opex – business rates – third party costs) + capital maintenance
Comments on models (Bristol Water)
The models and corresponding coefficients presented in this pro forma are based on cost information for 17 companies (data for Bournemouth and South West Water have been appropriately combined). Regressions were run using the Master Wholesale Cost data file dated 27th February 2017, which reflects the latest updates and amendments to the data.
Capital maintenance costs have been smoothed on a three-year rolling-average basis; therefore, four years of data (2014-2017) have been modelled. Botex has been expressed as a unit cost by dividing it by the sum of total household and total non-household connected properties at year end, both also taken from the six-year wholesale cost data set.
A full description of the work undertaken to arrive at these models is set out in a report by NERA: ‘Comparative Benchmarking Assessment to Support Preparation of Bristol Water’s AMP7 Business Plan’ (December 2017).
Consultation model ID BRLWW1 BRLWW2 BRLWW3
Company’s model ID 1 2 3
Dependent variable Ln(total botex per property aggregate)
Ln(DI/ ‘000 connected property) 0.718 (0.116)
0.834** (0.0160)
0.753* (0.055)
Ln(length of mains/ ‘000 connected property) 0.279 (0.231)
0.346* (0.097)
0.454*** (0.009)
Ln(length of raw mains and conveyors/DI) Unit: km per Ml/d
0.041 (0.624)
Share of water treated at level 5 and above (%) 0.354** (0.017)
0.193 (0.222)
Length of mains laid pre-1940/Total length of mains (%)
0.270 (0.358)
0.987*** (0.006)
0.439 (0.140)
Length of renewed and relined mains/Total length of mains (%)
11.11 (0.278)
32.36** (0.021)
17.71* (0.090)
Ln(average pumping head aggregate) 0.228 (0.146)
0.026 (0.806)
0.102 (0.559)
Year15 -0.015 (0.460)
0.017 (0.583)
-0.003 (0.892)
Year16 -0.004 (0.911)
0.062 (0.220)
0.023 (0.590)
Year17 -0.012 (0.699)
0.063* (0.092)
0.019 (0.583)
Surface water treated / Total water treated (%) 0.541** (0.020)
Share of water from reservoirs (%) 0.051
(0.753) 0.272** (0.027)
Ln(number of sources / DI) 0.132
(0.172)
Constant -3.718*** -3.131*** -3.699***
R2 adjusted 0.61 0.73 0.67
Reset test 0.94 0.24 0.74
VIF (max) 1.99 3.66 1.95
Method OLS OLS OLS
N (sample size) 68 68 68
Template 31. Wholesale water models proposed by South East Water
Description of dependent variable
Modelled BOTEX = Modelled OPEX + Modelled base CAPEX
Modelled OPEX = OPEX – third party – abstraction charges – local authority rates
Modelled base CAPEX = maintenance infra + maintenance non-infra – (grants and contributions)
Costs are modelled on an outturn basis and unsmoothed.
Comments on models (South East Water)
The coefficients are broadly robust to alternative modelled costs (e.g. including abstraction charges) and to alternative estimation approaches such as Random Effects.
It is important to capture the impact of the number of sources on expenditure, because the number of sources drives a number of real costs: employment costs (travel time), maintenance costs and capital costs, as each source requires control systems, pumps, borehole maintenance, monitors, chemical deliveries etc. The number of sources over DI variable has a statistically significant coefficient of the expected sign.
The coefficient on the proportion of mains relined/renewed variable is large only because the proportion of mains relined/renewed is a (numerically) small variable, which takes a maximum value of 0.012 and a minimum value of 0.000207, with a mean of 0.004. The magnitude of the cost adjustment is therefore limited: a coefficient of 25 would imply an estimated elasticity range of approximately 0 to 0.3.
The TOTEX (growth) models were developed by adding growth enhancement costs and the corresponding drivers to the BOTEX models. Since each enhancement activity is a small part of TOTEX (growth), and given the relatively small dataset, these enhancement drivers end up statistically insignificant and sometimes have an unintuitive sign. Some coefficients are narrowly insignificant at the 10% level. Because these coefficients have the correct sign from an operational and economic perspective, we considered it appropriate to include them.
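The elasticity range quoted above follows from the functional form. Assuming the share of mains renewed/relined enters the log-cost model in levels (as the size of the coefficient suggests), the elasticity of cost with respect to that share is the coefficient multiplied by the share itself. A short Python check, using the coefficient of 25 and the minimum, mean and maximum values from the comments (the helper name is illustrative):

```python
# Elasticity of cost with respect to a share variable x entered in levels
# in a log-cost model: ln(C) = a + beta * x + ...  =>  d ln(C) / d ln(x) = beta * x.

def elasticity(beta, x):
    return beta * x

beta = 25.0  # illustrative coefficient quoted in the comments
for label, share in [("min", 0.000207), ("mean", 0.004), ("max", 0.012)]:
    print(f"{label}: {elasticity(beta, share):.3f}")
# prints:
# min: 0.005
# mean: 0.100
# max: 0.300
```

At the sample mean the implied elasticity is about 0.1, so a 10% increase in the renewal rate would raise modelled botex by only about 1%.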
Consultation model ID SEWWW1 SEWWW2 SEWWW3 SEWWW4
Company’s model ID 5 6 7 8
Dependent variable Water BOTEX (log)
Connected properties (‘000s) (log) 1.088*** (0.000)
1.065*** (0.000)
Population served (‘000s) (log) 1.084*** (0.000)
Distribution input (Ml/d) (log) 1.046*** (0.000)
% of area with more than 4000 people per km2 0.547*** (0.000)
0.367*** (0.009) 0.243* (0.07)
0.615*** (0.000)
% of area with less than 600 people per km2 0.428*** (0.003) 0.512*** (0.001) 0.367** (0.014)
0.413*** (0.005)
% of DI treated at complexity band 3 and above 0.608*** (0.000)
0.563*** (0.000)
0.617*** (0.000)
0.581*** (0.000)
% of mains renewed/relined 24.38** (0.012)
22.03** (0.037)
23.01** (0.032)
Sources over DI (number / (Ml/d)) 0.669*** (0.001) 0.592*** (0.004) 0.578*** (0.003) 0.709*** (0.001)
Year 2016 dummy -0.072*** (0.003)
-0.076*** (0.001)
-0.058*** (0.008)
-0.113*** (0.005)
Constant -3.862*** (0.000)
-4.649*** (0.000)
-2.843*** (0.000)
-3.577*** (0.000)
R2 adjusted 0.973 0.971 0.973 0.970
Reset test 0.950 0.994 0.808 0.835
VIF (max) 3.000 3.044 2.970 2.996
Estimation method OLS OLS OLS OLS
N (sample size) 102 102 102 102
Template 32. Wholesale water plus models proposed by South East Water
Consultation model ID SEWWW5 SEWWW6 SEWWW7 SEWWW8
Company’s model ID 1 2 3 4
Dependent variable TOTEX (growth) (log)
Connected properties (‘000s) (log) 1.123*** (0.000)
DI (Ml/d) (log) 1.073*** (0.000)
1.065*** (0.000)
1.067*** (0.000)
Proportion of area with more than 4000 people per km2 (%)
0.680*** (0.000)
0.350** (0.029)
0.389** (0.011)
0.380*** (0.006)
% area with less than 600 people per km2 0.554*** (0.000)
0.470** (0.014)
0.536*** (0.002)
0.520*** (0.001)
% water treated at complexity band 3 and above
0.584*** (0.000)
0.637*** (0.000)
0.511*** (0.000)
0.600*** (0.000)
Sources over DI (number / (Ml/d)) (log) 0.170*** (0.000)
0.140** (0.031)
0.127** (0.04)
0.117** (0.05)
% of mains renewed/relined 21.90** (0.02)
21.15* (0.081)
17.37 (0.124)
20.37* (0.076)
New mains over length of mains (%) 0.171 (0.29)
0.268 (0.187)
0.281
(0.185)
Enhancement to the supply/demand balance over DI
1.832* (0.1)
1.884 (0.113)
Constant -3.606*** (0.000)
-2.644*** (0.000)
-2.488*** (0.000)
-2.674*** (0.000)
R2 adjusted 0.977 0.972 0.973 0.974
Reset test 0.538 0.247 0.0881 0.195
VIF (max) 2.524 2.498 2.580 2.599
Estimation method OLS OLS OLS OLS
N (sample size) 102 102 102 102
Template 33. Wholesale water models proposed by South Staffs Water
Description of dependent variable
Modelled OPEX = [OPEX] – [third party] – [abstraction charges] – [local authority rates]
Modelled base CAPEX = [maintenance infra] + [maintenance non-infra] – [grants and contributions]
Modelled BOTEX = [Controllable OPEX] + [Controllable base CAPEX]
Costs are deflated to 2016/17 prices using CPIH and modelled on an unsmoothed basis.
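Deflating to 2016/17 prices multiplies each year's outturn cost by the ratio of the base-year index to that year's index. The sketch below assumes financial-year average index values; the numbers are placeholders, not published CPIH data, and `to_base_prices` is a hypothetical helper.

```python
# Rebasing outturn costs to 2016/17 prices with a price index.
# The index values are hypothetical placeholders, NOT published CPIH figures.
cpih_index = {
    "2011/12": 95.0,
    "2013/14": 98.0,
    "2016/17": 102.0,
}

def to_base_prices(nominal_cost, year, base_year="2016/17", index=cpih_index):
    """Real cost = nominal cost * index(base year) / index(cost year)."""
    return nominal_cost * index[base_year] / index[year]

print(round(to_base_prices(100.0, "2011/12"), 2))  # 107.37
```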
Comments on models (South Staffs Water)
The coefficients are generally robust to alternative modelled costs and estimation techniques.
The coefficient on the proportion of mains relined/renewed variable is large only because it is a small variable which takes a maximum value of 0.012 and a minimum value of 0.000207 with a mean of 0.004. The magnitude of the cost adjustment is therefore limited. A coefficient of 25 would imply an estimated elasticity range of approximately 0 to 0.3.
Average pumping head is a known driver of power expenditure, yet the variable was often insignificant and/or had a counter-intuitive sign. This may be due to data problems with the variable, or because its effect is diluted by the inclusion of other cost drivers. We note, however, that there remains a very strong correlation between average pumping head, distribution input and power costs when power costs are modelled separately. Modelling power expenditure separately as a function of average pumping head may be more appropriate, but we appreciate that the consultation may give us the opportunity to study what other companies have observed in this area.
Models containing Ofwat’s density and sparsity measures were considered. Although these models performed reasonably well in statistical diagnostic tests, the companies' estimated performance was sensitive to the choice of threshold. As a robust operational rationale for choosing a particular threshold could not be identified, the models presented include simple density drivers only.
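The sensitivity to the threshold can be illustrated with a toy calculation of the kind of measure used in the tables (share of area above a density threshold). Everything here is hypothetical: the zone-level data, the threshold values and the function name. Ofwat's actual density measures may be constructed differently.

```python
# Hypothetical zones within one company area: (area in km2, population).
zones = [(120.0, 30_000), (40.0, 100_000), (15.0, 75_000), (5.0, 40_000)]

def share_of_area_above(zones, threshold):
    """Fraction of total area lying in zones denser than `threshold` people per km2."""
    total_area = sum(area for area, _ in zones)
    dense_area = sum(area for area, pop in zones if pop / area > threshold)
    return dense_area / total_area

for threshold in (2000, 4000, 6000):
    print(threshold, round(share_of_area_above(zones, threshold), 3))
# 2000 0.333
# 4000 0.111
# 6000 0.028
```

Moving the threshold from 2000 to 4000 people per km2 cuts this hypothetical company's dense-area share by two thirds, the kind of shift that can reorder companies whose populations are distributed differently.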
Consultation model ID SSCWW1 SSCWW2
Company’s model ID 1 2
Dependent variable Water BOTEX (log)
Length of mains (km) (log) 1.048*** (0.000)
1.029*** (0.000)
Properties over mains (‘000s / km) (log)
1.051*** (0.000)
1.013*** (0.000)
% of water treated at complexity band 2 and below
-0.649*** (0.002)
-0.540** (0.011)
% of DI from reservoirs 0.335*** (0.005)
0.360** (0.011)
Sources over DI (number / (Ml/d)) 0.968*** (0.000)
0.905*** (0.001)
% of mains renewed/relined 29.39*** (0.005)
% of mains laid before 1980 0.402
(0.262)
Constant -2.864*** (0.000)
-2.924*** (0.002)
R2 adjusted 0.972 0.967
Reset test 0.253 0.339
VIF (max) 1.992 2.310
Estimation method OLS OLS
N (sample size) 102 102
2 Wastewater models
2.1 Bioresources models
Template 34. Bioresources models proposed by Ofwat
Description of dependent variable
Bioresources base costs excluding cost items described in section 4 of the main consultation document.
Comments on models
We use properties or sludge produced as a scale variable. For a vertically separated bioresources provider, sludge produced is not under management control (unlike sludge disposed).
To account for disposal costs we use the percentage of sludge disposed to farmland. To account for transport costs we use the percentage of intersiting work done by tanker or truck. In model 3 we add total intersiting work (by all forms of transport) to distinguish vehicle transport (tankers and trucks) from pipe transport.
All estimated coefficients have the expected sign and a plausible magnitude. The percent of total intersiting work in model 3 does not seem to improve the model.
All monetary values have been inflated to 2016-17 prices using the CPIH.
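These bioresources models are log-linear OLS regressions: log cost on log scale, plus percentage variables in levels. The sketch below fits a model of that form to simulated data with NumPy; the data-generating coefficients are assumptions chosen to echo the signs reported, not estimates from the industry dataset, and the sample of 60 simply mirrors the reported sample size.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60  # e.g. 10 companies x 6 years

ln_sludge = rng.normal(4.0, 0.6, n)         # ln(sludge produced), hypothetical
truck_pct = rng.uniform(0.0, 60.0, n)       # % intersiting work by truck/tanker
farmland_pct = rng.uniform(50.0, 100.0, n)  # % sludge disposed to farmland
noise = rng.normal(0.0, 0.1, n)

# Simulated "true" model, loosely mirroring the signs of the Ofwat models.
ln_cost = 0.5 + 0.94 * ln_sludge + 0.02 * truck_pct - 0.02 * farmland_pct + noise

# OLS via least squares: columns are constant, ln(sludge), truck %, farmland %.
X = np.column_stack([np.ones(n), ln_sludge, truck_pct, farmland_pct])
beta, *_ = np.linalg.lstsq(X, ln_cost, rcond=None)
print(np.round(beta, 3))
```

In this form the coefficient on ln(sludge produced) is the cost elasticity with respect to scale, while a coefficient of 0.02 on a percentage variable means one extra percentage point raises modelled cost by roughly 2%.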
Consultation model ID OBR 1 OBR 2 OBR 3
Dependent variable --------- ln (bioresources base costs) ---------
ln (properties) 1.002*** (0.000)
ln (sludge produced) 0.940*** (0.000) 0.912*** (0.000)
% intersiting work done by truck and tanker
0.020*** (0.003)
0.017*** (0.010)
0.019*** (0.008)
% of sludge disposed to farmland -0.021** (0.021)
-0.018** (0.026)
-0.018** (0.025)
ln (intersiting work) 0.061
(0.437)
Constant 3.167** (0.017)
13.261*** (0.000)
12.802*** (0.000)
R2 adjusted 0.862 0.878 0.88
VIF (max) 2.536 2.47 2.671
Reset test 0.011 0.003 0.002
Estimation method OLS OLS OLS
N (sample size) 60 60 60
Template 35. Bioresources models proposed by Anglian Water
Description of dependent variable
Natural log of Bioresources botex excluding rates
Acronyms used in explanatory variables
ttds = thousand tonnes of dry solids
STW = sewage treatment works
Comments on models (Anglian Water)
We have developed three possible model forms for Bioresources:
Model 1 is based on demographic and geographic factors, with causation factors that are exogenous so far as the Bioresources function is concerned.
Models 2 and 3 are based on the nature of the Network Plus asset base, which produces the raw sludge that is in turn the treatment input for the Bioresources function. This approach reflects two points:
1. The existing Network Plus fixed asset base cannot realistically be changed in the short to medium term.
2. Bioresources as a stand-alone function cannot control the Network Plus technology used to produce the sludge it is treating.
Models 4-6 take the operational parameters of the Bioresources function as being the causation factors. Given the asset lives of Bioresources assets, except in the short term, these causation factors are not exogenous so far as the Bioresources function is concerned.
Arable land is the proportion of arable land in each WaSC’s appointed area, as reported by DEFRA. It is intended as a proxy for land bank.
All models are described in detail in our Cost Modelling report – Phase 2, published March 2018: http://www.anglianwater.co.uk/about-us/thinking-about-our-future/
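Some of the specifications below split sludge produced between sparse and non-sparse parts of the area. A minimal sketch of how such terms could be built, assuming `sparsity<600` denotes the share of the appointed area with fewer than 600 people per km2 (the function names are hypothetical). Because ln(S·s) = ln S + ln s, the overall elasticity of cost with respect to sludge produced, holding the share fixed, is the sum of the two coefficients.

```python
import math

def sludge_split_logs(sludge_ttds, sparse_share):
    """Return ln(S * s) and ln(S * (1 - s)) for sludge S and sparse-area share s."""
    if not 0.0 < sparse_share < 1.0:
        # Both logs exist only when the share lies strictly between 0 and 1.
        raise ValueError("sparse_share must lie strictly between 0 and 1")
    return (math.log(sludge_ttds * sparse_share),
            math.log(sludge_ttds * (1.0 - sparse_share)))

def implied_scale_elasticity(b_sparse, b_dense):
    """d ln(C) / d ln(S) with the share held fixed: the sum of both coefficients."""
    return b_sparse + b_dense

sparse_term, dense_term = sludge_split_logs(150.0, 0.3)  # hypothetical company-year
print(round(sparse_term, 3), round(dense_term, 3))
```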
Consultation model ID ANHBR1 ANHBR2 ANHBR3 ANHBR4 ANHBR5 ANHBR6
Company’s model ID 1 2 3 4 5 6
Dependent variable Ln(Bioresources botex)
Ln(Sludge produced x sparsity<600) (ttds)
0.383** (0.035)
Ln(Sludge produced x(1- sparsity<600)) (ttds)
0.462*** (0.000)
Ln(Sludge produced x sparsity<1,150) (ttds)
1.043*** (0.000)
Ln(Sludge produced x(1- sparsity<1,150)) (ttds)
0.217*** (0.000)
ln(Ttds generated by Band5 STWs) (ttds)
0.156
(0.249) 0.280*** (0.027)
ln(Ttds generated by Band6 STWs) (ttds)
0.812*** (0.000)
0.692** (0.000)
ln(Ttds generated by Band1-4 STWs) (ttds)
-0.172 (0.382)
0.139 (0.286)
Ln(Sludge produced) (ttds)
1.086*** (0.000)
1.150*** (0.000)
% tds treated by conventional or advanced anaerobic digestion
-0.992*** (0.000)
-0.713*** (0.000)
-0.803*** (0.000)
-1.010*** (0.000)
-0.804*** (0.000)
-0.858*** (0.000)
Ln(Appointed area) 0.488** (0.013)
Sewered area / Appointed area
2.182* (0.098)
Arable land in appointed area as % of total arable land
3.664** (0.042)
% sludge produced at co-located STW
-0.796*** (0.002)
Sparsity<600/km2
0.964*** (0.001)
Time Trend
0.046** (0.03)
0.036* (0.086)
0.042** (0.047)
0.043** (0.045)
0.041** (0.05)
0.043** (0.035)
Constant -4.719*** (0.002)
-0.375 (0.418)
-0.583 (0.213)
-0.991** (0.037)
-2.489*** (0.000)
-1.615*** (0.002)
R2 adjusted 0.829 0.833 0.823 0.822 0.828 0.838
Reset test 0.480 0.984 0.770 0.011 0.320 0.818
VIF (max) 4.21 7.65 3.05 1.77 2.30 1.81
Method OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60
Template 36. Bioresources models proposed by Southern Water
Description of dependent variable
Modelled OPEX plus modelled base CAPEX.
Modelled OPEX is total OPEX less third party services, abstraction charges and local authority rates.
Modelled base CAPEX is maintenance expenditure in infrastructure and non-infrastructure less grants and contributions.
All costs are unsmoothed and deflated to 2016/17 prices using CPIH.
Comments on models (Southern Water)
The two network+ models are similar to models 1 and 3 of the BOTEX models, providing alternative approaches to control for pumping capacity per length of sewer. These models also appear to estimate coefficients that are operationally intuitive with reasonable statistical properties.
Consultation model ID SRNBR1 SRNBR2 SRNBR3 SRNBR4
Company’s model ID 1 2 3 4
Dependent variable ln (Bioresources BOTEX)
Amount of Sludge produced (log) 1.063*** (0.000)
1.011*** (0.000)
1.046*** (0.000)
1.101*** (0.000)
% of sludge treated using AD or AAD -0.947*** (0.009)
-0.942** (0.018)
-0.718*** (0.000)
% of sludge produced and treated at a site of STW and STC co-location
-0.008** (0.014)
Total measure of intersiting 'work' done (all forms of transportation) per unit sludge produced (log) (km/year)
0.163
(0.207)
% of load treated in small WTWs (bands 1 to 3)
0.052** (0.048)
% of area with more than 2000 people per km2
-0.809*** (0.003)
-0.624* (0.098)
Constant -0.129 (0.627)
-0.804 (0.226)
-0.480** (0.037)
-1.536** (0.011)
R2 adjusted 0.806 0.794 0.813 0.763
VIF (max) 1.753 1.750 1.529 2.553
Reset test 0.142 0.192 0.533 0.0851
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
Template 37. Bioresources models proposed by Severn Trent Water
Description of dependent variable
Sludge base cost
Description of selected explanatory variables
Weighted density: Ofwat's new weighted density index.
Prop. load with tight NH3 consent: the proportion of load that has an ammonia consent of 3mg/l or less. Engineering logic suggests that it would be better also to include the load with consents of between 3mg/l and 5mg/l, but these data were not readily available.
Av. distance intersiting: total intersiting "work" done divided by sludge volume (km/yr).
Av. distance intersiting via pipe: intersiting "work" done by pipeline divided by sludge volume (km/yr).
Av. distance to disposal: total disposal "work" divided by total sludge volume (km/yr).
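The distance measures defined above are ratios of transport "work" to sludge volume. The sketch assumes work is recorded as tonnes of dry solids multiplied by kilometres and that, as the definitions state, both the all-mode and the pipeline-only measures divide by total sludge volume. The movement records, mode labels and function name are invented for illustration.

```python
def average_distance(movements, modes=None):
    """Work by the selected transport modes (all modes if None) divided by total sludge volume.

    movements: (tonnes of dry solids, km travelled, mode) records; work = tonnes * km.
    """
    total_volume = sum(tonnes for tonnes, _, _ in movements)
    work = sum(tonnes * km for tonnes, km, mode in movements
               if modes is None or mode in modes)
    return work / total_volume

# Hypothetical intersiting movements for one year.
movements = [(1000.0, 10.0, "pipe"), (500.0, 40.0, "tanker"), (250.0, 80.0, "truck")]

print(round(average_distance(movements), 2))            # all modes
print(round(average_distance(movements, {"pipe"}), 2))  # pipeline only
```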
Comments on models (Severn Trent Water)
In model 13 we prefer a distance-based measure of disposal work, to reduce correlation with the scale variable, and distance-based measures of intersiting activity, which show less correlation than the "work"-based variables. All coefficients are broadly in line with expectations.
Consultation model ID SVTBR1 SVTBR2
Company’s model ID 13 14
Dependent variable Ln(Sludge base capex smoothed 5 years) Ln(Sludge base capex smoothed 5 years)
Ln(sludge produced) 1.15*** (.00)
1.23*** (.00)
Ln(Weighted average density) -.11 (.27)
-.16*** (.007)
Ln(sludge produced)^2 .18
(.12)
Ln(Weighted Density)^2 .05
(.26)
% sludge treated with anaerobic digestion (conventional and advanced)
-.19 (.39)
-.02 (.91)
Av. distance intersited (km) .19*** (.00)
.27*** (.00)
Av. distance intersited by pipeline (km) -.04*** (.00)
-.045*** (.00)
% sludge treated at STC-STW co-located sites -.42* (.06)
-.36** (.04)
Av. distance to disposal (km) .33** (.012)
.38*** (.003)
Dummy 2012 .18
(.28) .23
(.19)
Dummy 2013 .21
(.18) .26
(.13)
Dummy 2014 .14 (.3)
.16 (.25)
Dummy 2015 .13
(.24) .14
(.22)
Dummy 2016 .03
(.73) .03 (.77)
Constant 4.03*** (.00)
3.77*** (.00)
R2 adjusted .88 .89
Reset test 0.38 0.38
VIF max 4.35 19
Method OLS OLS
N (sample size) 60 60
Template 38. Bioresources models proposed by South West Water
Description of dependent variable in bioresources models
Bioresources = sludge transport + sludge treatment + sludge disposal
Modelled OPEX = bioresources OPEX – bioresources third party – bioresources pensions – bioresources local authority rates
Modelled base CAPEX = bioresources maintenance infra + bioresources maintenance non-infra – bioresources grants and contributions
Modelled BOTEX = modelled OPEX + modelled base CAPEX
Modelled BOTEX+ (growth) enhancement = modelled BOTEX + bioresources first time sewerage + bioresources sludge enhancement (growth) + bioresources new developments and growth + bioresources growth at sewage treatment works + bioresources resilience + bioresources reduce flooding risk for properties
Modelled TOTEX = modelled BOTEX + bioresources other capital expenditure infra + bioresources other capital expenditure non-infra + bioresources infrastructure network reinforcement
Unsmoothed net costs from 2011/12 to 2016/17
Consultation model ID SWBBR1 SWBBR2 SWBBR3 SWBBR4 SWBBR5 SWBBR6 SWBBR7 SWBBR8 SWBBR9
Company’s model ID 1 2 3 4 5 6 7 8 9
Dependent variable Bioresources BOTEX (ln) Bioresources BOTEX+ (growth) (ln) Bioresources TOTEX (ln)
Sludge produced (ln)
0.990*** (0.000)
1.064*** (0.000)
1.020*** (0.000)
1.044*** (0.000)
1.047*** (0.000)
1.114*** (0.000)
1.145*** (0.000)
1.166*** (0.000)
1.223*** (0.000)
Proportion of area with less than 250 people per km2
0.731** (0.030)
0.591* (0.078)
0.586
(0.116)
Number of treatment works per property (ln)
0.220* (0.059)
0.122
(0.282)
0.139 (0.284)
Proportion of load treated at works in size band 1-3
0.0556* (0.052)
0.059** (0.037)
0.0617** (0.034)
Constant -1.441** (0.041)
0.0256 (0.941)
-1.357* (0.063)
-1.577** (0.028)
-0.475 (0.165)
-1.778** (0.014)
-1.966** (0.012)
-0.847*** (0.009)
-2.223*** (0.003)
R2 adjusted 0.739 0.734 0.743 0.765 0.757 0.778 0.787 0.781 0.799
Reset test 0.032 0.045 0.230 0.004 0.002 0.176 0.007 0.008 0.184
VIF max 2.020 3.934 2.269 2.020 3.934 2.269 2.020 3.934 2.269
Method OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60 60 60
Template 39. Bioresources models proposed by United Utilities
Description of dependent variable
Models 8 and 9 are models of bioresources botex.
Botex has been derived by subtracting total enhancement expenditure (table 9, line 36), business rates (table 8 line 8) and third party services (table 8 lines 10 and 18) from net totex (table 8 line 21) for each of the respective value chains.
Each dependent variable includes smoothed base capex which minimises the impact of spikes.
For all models, the dependent variable is included in its logged form and is in 2012/13 CPIH FYA prices.
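The botex derivation above is a simple subtraction over data table lines. The sketch keys invented values by the (table, line) references given in the text; the figures are hypothetical, and the smoothing of base capex mentioned above is not shown.

```python
import math

# Hypothetical values (£m) keyed by (table, line) as cited in the description.
# The figures are invented for illustration only.
lines = {
    (8, 21): 210.0,  # net totex
    (9, 36): 55.0,   # total enhancement expenditure
    (8, 8): 12.0,    # business rates
    (8, 10): 2.0,    # third party services
    (8, 18): 1.0,    # third party services
}

def botex(lines):
    """Net totex less enhancement, business rates and third party services."""
    third_party = lines[(8, 10)] + lines[(8, 18)]
    return lines[(8, 21)] - lines[(9, 36)] - lines[(8, 8)] - third_party

dependent_variable = math.log(botex(lines))  # the models use log botex
print(botex(lines), round(dependent_variable, 4))
```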
Comments on models (United Utilities)
Bioresources models perform well against statistical criteria but have lower explanatory power than econometric models for other subservices.
The R2 values of around 0.8 are acceptable but lower than those seen for other services. Possible reasons are: that companies can substitute activities between different parts of the wastewater value chain, and therefore costs between bioresources and wastewater treatment, more readily than between other service areas; that data quality is worse, partly because of inconsistency between companies in cost allocation and income accounting; that a suitable exogenous land bank variable has not been identified; or that efficiency varies more across companies for this service.
Consultation model ID UUBR1 UUBR2
Company’s model ID 8 9
Dependent variable ln(Bioresources botex [transport, treatment & disposal])
Log(Total sewage sludge produced) 0.985*** (0.000)
1.008*** (0.000)
% 'work' done in sludge disposal operations (all forms of transportation)
0.006* (0.067)
0.006** (0.019)
% of load received by WwTW bands 1-3 5.536* (0.066)
5.627* (0.067)
% WwTW in sparse areas (Arup/Vivid) 0.253
(0.583)
Constant -1.447** (0.039)
-1.610** (0.015)
R2 adjusted 0.797 0.795
VIF (max) 2.51 2.84
Reset test 0.185 0.043
Estimation method OLS OLS
N (sample size) 60 60
Template 40. Bioresources models proposed by Welsh Water
Description of dependent variable
Bioresources includes costs for sludge transport, sludge treatment and sludge disposal
Bioresources Botex = “Total Operating Expenditure” – “Third Party Services” – “Local authority and Cumulo rates” + “Maintaining the long term capability of the assets – infra” + “Maintaining the long term capability of the assets - non-infra”
Values rebased to 2016/17 using CPIH in line with the PR19 Methodology Statement.
Comments on models (Welsh Water)
The submitted Bioresources model controls for the amount of transport required using the proportion of load treated in band 1-3 works and the proportion of sludge produced and treated at a site of STW and STC co-location.
The model’s coefficients have the expected sign and magnitude and perform well on the statistical tests.
This cost segment appears slightly more difficult to model than Network+ and aggregate BOTEX, and the estimated range of efficiency scores across the industry is slightly wider.
Although the models produce statistically insignificant coefficients for some variables, the estimated signs and magnitudes are supported from an operational point of view. The models appear to have appropriate statistical properties and to be reasonably robust to other modelling approaches such as Random Effects and unit cost modelling.
Consultation model ID WSHBR1
Company’s model ID 7
Dependent variable Ln (Bioresources Botex)
Ln (Sewage Sludge Produced) 1.050*** (0.000)
% load treated in band 1-3 works 0.055** (0.024)
% sludge produced and treated at a site of STW and STC co-location
-0.001 (0.784)
Constant -1.399*** (0.006)
R2 adjusted 0.760
VIF (max) 2.586
Reset test 0.293
Estimation method OLS
N (sample size) 60
Template 41. Bioresources models proposed by Wessex Water
Description of dependent variable
Model 1 and 3: Bioresources botex = Opex + Capital Maintenance – Third party costs – Local authority rates – EA charges
Model 2 and 4: Bioresources botex = Opex + IRE + Average MNI over period – Third party costs – Local authority rates – EA charges
Description of selected explanatory variables
Ofwat measure of highly dense areas = the proportion of the company's area of service with more than 6000 people per km2.
Comments on models (Wessex Water)
We include simple models with exogenous drivers. No endogenous variables were included, to help set a level playing field for market opening. The limited number of independent observations restricts the number of variables we could include.
Consultation model ID WSXBR1 WSXBR2 WSXBR3 WSXBR4
Company’s model ID 1 2 3 4
Dependent variable Ln(Bioresources botex) Ln(Smooth bioresources botex) Ln(Unit Bioresources botex per load) Ln(Smooth unit Bioresources botex per load)
Sludge Produced 0.968*** (0.000)
0.952*** (0.000)
Ofwat measure of highly dense areas
-0.529 (0.228)
-0.474 (0.256)
-0.411 (0.195)
-0.384 (0.237)
Constant -0.660 (0.386)
-0.736 (0.269)
-8.760*** (0.000)
-8.910*** (0.000)
R2 adjusted 0.82 0.89 0.077 0.140
VIF (max) 1.67 1.67 1.67 1.67
Reset test 0.000 0.000 0.638 0.662
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
Template 42. Bioresources models proposed by Yorkshire Water
Description of dependent variable
The dependent cost variable modelled is BOTEX. The dependent cost variables are deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
The costs are in net terms, i.e. excluding grants and contributions (G&C), consistent with the PR14 approach. However, given the lack of a split of G&C between capital maintenance and enhancement expenditure, we have also modelled CAPEX on a gross basis.
Comments on models (Yorkshire Water)
The bioresources models below aim to explain variations in bioresources BOTEX through variations in scale, sludge treatment and density (as a possible proxy for sludge transportation requirement).
In models 2 and 3, the estimated coefficients on the density/sparsity variables appear to have the right sign. While these models also have reasonable statistical properties, they result in a relatively wider efficiency range than other parts of the value chain. This indicates possible limitations in using them directly for price-setting purposes, and the need for recourse to other modelling approaches and cross-checking.
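One common way to turn a log-cost regression into efficiency scores is to exponentiate the negative residual, giving the ratio of modelled to actual cost; the "efficiency range" is then the spread of those scores. This is an assumption about how such a range might be computed, not necessarily Yorkshire Water's or Ofwat's method, and the data and function name are invented.

```python
import math

def efficiency_scores(actual, predicted):
    """Score = exp(-(ln actual - ln predicted)) = predicted / actual.

    Scores above 1 mean cost is below the model's prediction.
    """
    return [math.exp(-(math.log(a) - math.log(p))) for a, p in zip(actual, predicted)]

actual = [100.0, 120.0, 90.0]     # hypothetical botex, £m
predicted = [110.0, 100.0, 90.0]  # hypothetical fitted values, £m
scores = efficiency_scores(actual, predicted)
spread = max(scores) - min(scores)
print([round(s, 3) for s in scores], round(spread, 3))
```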
Consultation model ID YKYBR1 YKYBR2 YKYBR3 YKYBR4 YKYBR5 YKYBR6
Company’s model ID 1 2 3 4 5 6
Dependent variable Bioresources BOTEX
Amount of Sludge produced (log) (ttds/ year)
0.920*** (0.000)
1.046*** (0.000)
1.080*** (0.000)
1.127*** (0.000)
1.107*** (0.000)
1.083*** (0.000)
% of sludge treated using AD or AAD
-0.646** (0.0277)
-0.718*** (0.000)
-0.741*** (0.009)
-0.740*** (0.001)
-0.703*** (0.008)
-0.686** (0.0103)
% of area with more than 2000 people per km2
-0.809*** (0.003)
% of area with more than 4000 people per km2
-0.843** (0.0381)
% of area with less than 250 people per km2
0.972*** (0.002)
Resident population per service area (log)
-0.343* (0.060)
Connected properties per service area (log)
-0.340* (0.100)
Constant -0.182 (0.550)
-0.480** (0.037)
-0.769* (0.090)
-1.672*** (0.005)
-0.723** (0.039)
-0.895** (0.046)
R2 adjusted 0.776 0.813 0.798 0.811 0.796 0.794
VIF (max) 1.105 1.529 2.160 2.282 2.716 2.431
Reset test 0.116 0.533 0.834 0.204 0.363 0.279
Estimation method OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60
2.2 Sewage treatment models
Template 43. Sewage treatment models proposed by Ofwat
Description of dependent variables
Sewage treatment base costs excluding cost items described in section 4 of the main consultation document.
Comments on models
We use two alternative scale drivers, properties and total load. We think these are appropriate scale variables for sewage treatment. We control for economies of scale by including the proportion of load treated in small works of bands 1-3. Models 3 to 6 add treatment complexity variables.
All estimated coefficients have the expected sign and are statistically significant, except for the percentage of load received from trade effluent customers, which is weak and not significant.
The RESET test fails in all models. We tested specifications with quadratic and cross-product terms to allow for a more flexible relationship with the scale variable, but this did not improve the RESET results. The RESET test should not be applied mechanically to exclude these models. Rather, it should prompt a specification search, which it did.
All monetary values have been inflated to 2016-17 prices using the CPIH.
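The RESET test re-estimates the model with powers of the fitted values added and tests whether they are jointly significant. The NumPy sketch below computes the F statistic on simulated data; the "Reset test" rows in the tables appear to report the corresponding p-value, which could be obtained from the F(q, n − k − q) distribution (for example with `scipy.stats.f.sf`). Function names and data are hypothetical.

```python
import numpy as np

def ols_rss(X, y):
    """OLS coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f(X, y, powers=(2, 3)):
    """Ramsey RESET F statistic: joint test on powers of the fitted values."""
    beta, rss_restricted = ols_rss(X, y)
    fitted = X @ beta
    X_aug = np.column_stack([X] + [fitted ** p for p in powers])
    _, rss_unrestricted = ols_rss(X_aug, y)
    q = len(powers)
    dof = len(y) - X_aug.shape[1]  # n - k - q
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / dof)

rng = np.random.default_rng(0)
n = 60
x = rng.uniform(0.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
noise = rng.normal(0.0, 0.1, n)
y_linear = 1.0 + 0.9 * x + noise               # correctly specified for a linear fit
y_curved = 1.0 + 0.9 * x + 0.5 * x**2 + noise  # linear fit omits the curvature

f_linear = reset_f(X, y_linear)
f_curved = reset_f(X, y_curved)
print(round(f_linear, 2), round(f_curved, 2))
```

A large statistic (small p-value), as for the curved data here, signals functional-form misspecification or omitted variables; it does not say which.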
Consultation model ID OSWT1 OSWT2 OSWT3 OSWT4 OSWT5 OSWT6
Dependent variable -------- ln (sewage treatment base costs) --------
ln (properties) 1.000***
(0.000)
0.930*** (0.000)
0.899*** (0.000)
ln (load entering treatment works) 0.950*** (0.000)
0.884*** (0.000)
0.859*** (0.000)
% of load treated in STWs bands 1 to 3
0.053** (0.045)
0.054** (0.045)
0.056** (0.024)
0.058** (0.018)
0.056** (0.037)
0.058** (0.029)
% of biological load treated by STWs with an ammonia consent below 1mg
0.028** (0.011)
0.030*** (0.006)
0.028*** (0.004)
0.030*** (0.001)
% of load trade effluent customers received at treatment works
0.032 (0.621)
0.040 (0.516)
Constant 6.370*** (0.002)
3.869* (0.058)
7.154*** (0.001)
4.834** (0.017)
7.395*** (0.001)
5.183** (0.014)
R2 adjusted 0.868 0.864 0.896 0.897 0.898 0.907
VIF (max) 2.273 2.299 2.484 2.488 2.76 2.724
Reset test 0 0 0 0 0 0
Estimation method OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60
Template 44. Sewage treatment models proposed by Thames Water
Description of dependent variable
Sewage treatment botex = opex + capital maintenance (infra and non-infra)
Description of selected explanatory variables
Number of Works= Total number of works in each year for each company
Load Capacity Treatment Works=(Total Load Received)/(Total Number Of Works)
% tight consent NH3 = $\frac{\mathrm{NH3}_{\le 1\,\mathrm{mg/l}}}{\text{Total Load Received}} \times 100\%$
Comments on models (Thames Water)
Based on an F-test, the Cobb-Douglas (CD) specification is preferred over the translog.
The estimated coefficients on the scale variable are strongly significant across all models, ranging from 0.89 to 0.95, which suggests, as expected, the presence of economies of scale.
Neither time dummies nor a time trend has a significant effect.
Load capacity of treatment works, used as a proxy for the capital stock, has a statistically significant effect. However, all the models provide statistical evidence that omitted-variable issues remain. This may indicate that the capital stock needs to be measured more accurately, as it is a fundamental part of the botex cost structure.
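The CD-versus-translog comparison is a nested F-test: the translog adds squared and cross-product log terms, and the test asks whether they reduce the residual sum of squares by more than chance would. A minimal sketch with invented residual sums of squares and model sizes (the function name is hypothetical):

```python
def nested_f(rss_restricted, rss_unrestricted, q, n, k_unrestricted):
    """F statistic for q restrictions; the restricted model is nested in the unrestricted one."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k_unrestricted))

# Hypothetical example: Cobb-Douglas (restricted) vs translog (unrestricted).
# With two logged drivers the translog adds two squared terms and one cross-product, so q = 3.
rss_cd, rss_translog = 2.40, 2.30
n = 60
k_translog = 8  # every estimated coefficient in the translog, including the constant
f_stat = nested_f(rss_cd, rss_translog, q=3, n=n, k_unrestricted=k_translog)
print(round(f_stat, 3))
```

A small F statistic relative to the critical value of the F(q, n − k) distribution favours the simpler Cobb-Douglas form. In that form the scale coefficient is the cost elasticity, so values of 0.89 to 0.95 imply that a 10% larger load raises botex by roughly 9% to 9.5%.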
As an important driver for sewage treatment, we have explored the effect of quality in our models by using the following measure:
$$\text{Quality Tight Consent Max}_{it} = \frac{\max\left(\mathrm{NH3}_{\le 1\,\mathrm{mg/l}},\ \mathrm{BOD}_{\le 7\,\mathrm{mg/l}},\ \mathrm{P}_{\le 0.5\,\mathrm{mg/l}}\right)_{it}}{\text{Total Load Received}_{it}} \times 100\%$$
This variable takes the largest of the loads received under the three tight consents, for ammonia (NH3), BOD and phosphorus (P), as a proportion of the total load received. It captures the tight consents that companies face in NH3, BOD or P. These consents are set exogenously by the Environment Agency and are outside management control. The estimated coefficients on this variable range from 0.0394 to 0.0403 (see models M2 and M3).
Specifically, the variables used in the quality measure are:
$\mathrm{NH3}_{\le 1\,\mathrm{mg/l}}$ = load under NH3 ≤ 1 mg/l consents, in kg BOD5/day
$\mathrm{BOD}_{\le 7\,\mathrm{mg/l}}$ = load under BOD ≤ 7 mg/l consents, in kg BOD5/day
$\mathrm{P}_{\le 0.5\,\mathrm{mg/l}}$ = load under P ≤ 0.5 mg/l consents, in kg BOD5/day
$\text{Total Load Received}_{it}$ = Band 1 + Band 2 + Band 3 + Band 4 + Band 5 + Above Band 5, all in kg BOD5/day
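The quality measure defined above can be computed directly from these load definitions. A short sketch with invented loads (all in kg BOD5/day); the function name is hypothetical.

```python
def quality_tight_consent_max(nh3_load, bod_load, p_load, band_loads):
    """Largest load under any of the three tight consents, as % of total load received.

    All loads in kg BOD5/day; band_loads = [Band 1, ..., Band 5, Above Band 5].
    """
    total_load = sum(band_loads)
    return 100.0 * max(nh3_load, bod_load, p_load) / total_load

bands = [10.0, 20.0, 30.0, 40.0, 100.0, 300.0]  # hypothetical, kg BOD5/day
share = quality_tight_consent_max(nh3_load=120.0, bod_load=80.0, p_load=150.0,
                                  band_loads=bands)
print(share)  # 30.0
```

With a coefficient of about 0.04 per percentage point, a company whose tight-consent share is 10 percentage points higher would, other things equal, have modelled botex roughly exp(0.4) ≈ 1.49 times higher.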
The results show a strong and stable relationship that is statistically significant across a large set of models.
Regional wages have a positive effect, as expected, when estimated by pooled OLS. Initial results showed that the random effects (RE) model tends to underestimate the effect of regional wages and sometimes produces an unexpected negative coefficient, which rules out the use of this estimator.
Consultation model ID TMSSWT1 TMSSWT2
Company’s model ID 2 3
Dependent variable Ln(Botex Treatment)
Ln(Total Load Received) 0.956*** (0.000)
0.951*** (0.000)
% tight consent(NH3,BOD,P) 0.039*** (0.000)
0.040*** (0.000)
Ln(regional wages waste 2soc) 0.827
(0.504) 0.887
(0.510)
Ln(load capacity treatment works) -0.343** (0.029)
-0.346** (0.030)
Constant -7.496** (0.029)
-7.58** (0.039)
R2 adjusted 0.896 0.892
Reset test 0.000 0.000
VIF (max) 4.99 5.14
Method OLS OLS
N (sample size) 60 50
Template 45. Sewage treatment models proposed by United Utilities
Description of dependent variables
Wastewater treatment botex with selected enhancement expenditure, net of grants and contributions.
Botex excludes business rates and third party services.
Enhancement areas that are substitutable with base costs can be integrated with base cost models. In some areas, companies can achieve a service outcome either through spending on enhancement or through more intensive operation or maintenance of their existing assets. Where this is the case, merging relevant enhancement lines into base cost may be expected to improve the explanatory power of the models, especially where the base models include explanatory factors that are causally related to the enhancement lines.
The dependent variable is included in its logged form and is in 2012/13 CPIH FYA prices.
Comments on models (United Utilities)
See United Utilities’ comments on wastewater collection models.
Consultation model ID UUSWT1
Company’s model ID 6
Dependent variable ln(Wastewater treatment [incl selected enhancement])
% of population living in urban areas (Arup/Vivid) 2.263** (0.017)
Log(Total load received) 0.984*** (0)
% load received by WwTW bands 1-3 12.116*** (0.003)
% load received by WwTW with tertiary treatment (TA1/TA2/TB1/TB2) 0.275 (0.402)
2012-13 dummy 0.059 (0.113)
2013-14 dummy 0.024 (0.66)
2014-15 dummy 0.016 (0.818)
2015-16 dummy 0.059 (0.366)
2016-17 dummy 0.072 (0.236)
Constant -10.21*** (0)
R2 adjusted 0.897
VIF (max) 5.89
Reset test 0.0000
Estimation method OLS
N (sample size) 60
Template 46. Sewage treatment models proposed by Wessex Water
Description of dependent variable
Sewage treatment botex smoothed = Opex + IRE + average MNI over period – third party costs – local authority rates – abstraction charges
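The smoothing in this definition comes from replacing each year's maintenance non-infrastructure spend with its period average. A minimal sketch follows. It reads IRE and MNI as infrastructure renewals expenditure and maintenance non-infrastructure expenditure, and takes "over period" to mean the whole sample period; both readings are our assumptions, and all figures are hypothetical.

```python
def smoothed_botex(opex, ire, mni, third_party, la_rates, abstraction):
    """Annual smoothed botex: MNI is replaced by its average over the period.

    Each argument is a list of annual values covering the same years.
    """
    series = [opex, ire, mni, third_party, la_rates, abstraction]
    if len({len(s) for s in series}) != 1:
        raise ValueError("all series must cover the same years")
    avg_mni = sum(mni) / len(mni)  # average MNI over the period
    return [
        o + i + avg_mni - t - r - a
        for o, i, t, r, a in zip(opex, ire, third_party, la_rates, abstraction)
    ]


# Two hypothetical years, £m (illustrative only).
print(smoothed_botex(opex=[10.0, 12.0], ire=[2.0, 2.0], mni=[4.0, 8.0],
                     third_party=[1.0, 1.0], la_rates=[0.5, 0.5],
                     abstraction=[0.0, 0.0]))  # [16.5, 18.5]
```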
Comments on models (Wessex Water)
Variation 1 models are our endogenous STW models; variation 2 models are our exogenous STW models. All the models below produce very similar results when estimated with unsmoothed expenditure.
Consultation model ID WSXSWT1 WSXSWT2 WSXSWT3 WSXSWT4
Company’s model ID 2v1 2v2 4v1 4v2
Dependent variable Ln(Smooth ST botex) Ln(Smooth unit ST botex per load)
Total load (BOD) 0.710*** (0.000)
0.758*** (0.000)
Average size of works (total load / total works)
0.0450 (0.137)
-0.007 (0.841)
Ofwat measure of highly dense areas 0.142
(0.691)
-0.268 (0.404)
Proportion of load undergoing tertiary treatment 0.066 (0.878) 0.042 (0.923) 0.356 (0.535) 0.405 (0.463)
Constant -4.452** (0.014) -4.852** (0.023) -8.032*** (0.000) -8.022*** (0.000)
R2 adjusted 0.88 0.863 0.068 0.12
VIF (max) 1.71 1.70 1.69 1.68
Reset test 0.000 0.000 0.111 0.004
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
2.3 Bioresources plus models
Template 47. Bioresources plus models proposed by Ofwat
Description of dependent variables
Bioresources and sewage treatment base costs, excluding cost items described in section 4 of the main consultation document.
Comments on models
These models combine variables used in our bioresources and sewage treatment models. All cost drivers have the expected sign and are statistically significant. The goodness of fit of all models is high: they explain at least 90 percent of the variance in costs. All monetary values have been inflated to 2016-17 prices using the CPIH.
Consultation model ID OBP1 OBP2 OBP3 OBP4 OBP5 OBP6 OBP7
Dependent variable --------- ln (bioresources plus base costs) ---------
ln (properties) 0.963***
(0.000)
0.976*** (0.000)
0.779*** (0.000)
ln (load) 0.963*** (0.000)
0.911*** (0.000)
0.925*** (0.000)
0.746*** (0.000)
% load treated in STWs bands 1 to 3
0.047** (0.026)
0.047** (0.010)
0.050*** (0.002)
0.052** (0.013)
0.054*** (0.004)
% biological load treated by STWs with an ammonia consent below 1mg
0.012* (0.081)
0.012** (0.029)
% of intersiting work done by truck and tanker
-0.007*** (0.003)
-0.007*** (0.003)
-0.003** (0.037)
-0.004** (0.019)
% of sludge disposed to farmland
-0.011*** (0.000)
-0.013*** (0.000)
Constant 6.561*** (0.000)
8.298*** (0.000)
5.944*** (0.000)
7.654*** (0.000)
5.247*** (0.001)
9.788*** (0.000)
7.966*** (0.000)
R2 adjusted 0.919 0.946 0.953 0.933 0.937 0.903 0.904
VIF (max) 2.273 2.407 2.407 2.405 2.411 1.819 1.814
Reset test 0 0.073 0.003 0 0 0 0
Estimation method OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60
2.4 Sewage collection models
Template 48. Sewage collection models proposed by Ofwat
Description of dependent variable
Sewage collection base costs excluding cost items described in section 4 of the main consultation document.
Comments on models
We use volume and connected properties as alternative scale variables.
We consider that the volume of wastewater is a strong cost driver. The cost of running the wastewater network will be driven more by the volume of wastewater conveyed to the treatment works than by its pollutant load. The volume, rather than the pollutant load, affects pumping costs and the size of pipes, which in turn influence maintenance costs.
The number of connected properties is also a good output driver. However, while it will capture the volume of domestic wastewater, it may not capture the amount of surface water entering the system.
An alternative scale driver not present in our models is sewer length, which performs similarly well.
We included the number of network pumping stations per sewer length to account for network complexity. An alternative to the number of pumping stations might be the capacity of pumping stations, which seems to produce good results as well.
The variables percent of new mains and percent of gravity sewers rehabilitated were included as additional drivers of maintenance costs.
All monetary values have been inflated to 2016-17 prices using the CPIH.
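Re-expressing costs in a different price base is a ratio of index values. The sketch below uses hypothetical index values, not published CPIH figures. It also shows why the choice of price base does not matter for the slope coefficients: moving every observation from one base year to another (for example from the 2012/13 prices used in some company submissions to 2016-17 prices) multiplies all costs by the same factor k. Because ln(kC) = ln k + ln C, an OLS log-cost model with a constant keeps the same slope coefficients and its constant shifts by ln k.

```python
import math


def rebase(value, index_source_year, index_target_year):
    """Express a cost in target-year prices using a price index such as CPIH."""
    return value * index_target_year / index_source_year


# Hypothetical index values (not published CPIH figures).
cost_2016_17_prices = rebase(50.0, index_source_year=100.0, index_target_year=104.0)
print(cost_2016_17_prices)  # 52.0

# Rebasing every observation by the same factor only shifts ln(cost),
# and therefore the regression constant, by ln(factor).
shift = math.log(104.0 / 100.0)
print(round(shift, 4))  # 0.0392
```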
Consultation model ID OSWC1 OSWC2 OSWC3 OSWC4 OSWC5
Dependent variable -------- ln (sewage collection base costs) --------
Log(connected properties) 0.796*** (0.000)
0.870*** (0.000)
0.858*** (0.000)
Log(volume) 0.772*** (0.000)
0.844*** (0.000)
Log(density) 0.703 (0.167)
0.856** (0.029)
Log(pumping stations per sewer length)
0.271** (0.046)
Pumping station per length (not log)
4.502** (0.023)
3.431** (0.046)
3.485* (0.074)
% of gravity sewer rehabilitated
0.294 (0.337)
0.368 (0.158)
0.337 (0.181)
Log(lengths replaced or renewed post 2001)
-0.063* (0.056)
% of lengths replaced or renewed post 2001
-0.007 (0.380)
-0.01 (0.281)
Constant 5.77** 3.58** 6.80*** 5.37*** 5.62***
R2 adjusted 0.889 0.907 0.882 0.886 0.896
VIF (max) 1.168 1.27 2.468 2.337 2.355
Reset test 0.361 0.021 0.032 0.014 0.005
Estimation method OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60
Template 49. Sewage collection models proposed by Thames Water
Description of dependent variable
Sewage collection botex = opex + capital maintenance expenditure (infra and non-infra)
Description of selected explanatory variables
Length of Mains=Total length of "legacy" public sewers as at 31 March
Property Density=(Total Number of connected Properties)/(Length of Public Sewers)
Pumping station Capacity= Total Pumping station capacity (Source: Cost Assessment November 2017, Waste Network sheet)
VolWasteColl=Volume of wastewater receiving treatment at sewage treatment works. This is a good proxy for total volume of wastewater collected as one of the dimensions of the output in sewage collection. (Source: Cost Assessment November 2017, Waste Network sheet)
Population EQV=Current Population Equivalent served by STWs
Comments on models (Thames Water)
We have tested the functional form, Cobb-Douglas (CD) vs. translog, and the results suggest that a CD functional form is more appropriate in this part of the wastewater value chain.
We have used the total length of sewer mains (where mains = total length of "legacy" public sewers as at 31 March) as the scale variable. The results are robust and significant across all the models, with coefficients ranging between 0.64 and 0.73.
Pumping station capacity, as a proxy for the stock of capital, shows a significant effect in all the models explored, with coefficients ranging between 0.12 and 0.20.
All our models suggest that there are no issues with omitted variables.
The estimated results show a positive regional wages effect. However, the random effects version of model M4 underestimates this effect (a similar result is found in sewage treatment).
In models M4 to M6, we include another important dimension of output: the volume of wastewater collected, proxied by the volume of wastewater receiving treatment at sewage treatment works relative to population equivalent. The results provide statistically significant evidence of the important impact of this driver on sewage collection botex. Its coefficient ranges from 0.33 to 0.35, excluding the RE version of model 4.
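For reference, the two functional forms compared above can be written as follows. The notation is ours: C is botex and x_k are cost drivers. Cobb-Douglas is the translog with every γ_kl = 0, so one common way to frame the comparison is as a joint test of those restrictions. The comments do not say which test Thames Water used.

```latex
% Cobb-Douglas: constant elasticities
\ln C_{it} = \alpha + \sum_{k=1}^{K} \beta_k \ln x_{k,it} + \varepsilon_{it}

% Translog: adds squared and interaction terms, with \gamma_{kl} = \gamma_{lk}
\ln C_{it} = \alpha + \sum_{k=1}^{K} \beta_k \ln x_{k,it}
  + \tfrac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{K} \gamma_{kl} \ln x_{k,it} \ln x_{l,it} + \varepsilon_{it}

% Elasticity of cost with respect to driver k under the translog
\frac{\partial \ln C_{it}}{\partial \ln x_{k,it}} = \beta_k + \sum_{l=1}^{K} \gamma_{kl} \ln x_{l,it}
```

Under Cobb-Douglas each driver has a single constant elasticity β_k. This is why a single scale coefficient for sewer length (0.64 to 0.73) can be quoted. Under the translog, the elasticity varies with the level of every driver.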
Consultation model ID TMSSWC1 TMSSWC2 TMSSWC3
Company’s model ID 4 6 7
Dependent variable Ln(Botex Collection)
Ln(Mains) 0.660*** 0.679*** 0.657***
(0.000) (0.000) (0.000)
Ln(Property Density) 1.270** 1.281** 1.244***
(0.011) (0.011) (0.000)
Ln(Regional Wage_waste_2soc) 0.470 0.682 0.597
(0.465) (0.318) (0.286)
Ln(Pumping Station Capacity) 0.127** 0.101* 0.124***
(0.033) (0.062) (0.003)
Ln(VolWasteColl/Population EQV) 0.347** 0.339* 0.354***
(0.029) (0.054) (0.008)
Time -0.005
(0.653)
Constant -3.270 -3.680 -3.556*
(0.140) (0.133) (0.069)
R2 adjusted 0.925 0.928 0.924
Reset test 0.411 0.728 0.470
VIF (max) 2.21 2.37 2.73
Method OLS OLS OLS
N (sample size) 60 50 60
Template 50. Sewage collection models proposed by United Utilities
Description of dependent variable
Model 1 uses sewage collection botex as its dependent variable. Model 5 uses wastewater collection botex, with selected enhancement expenditure as its dependent variable.
Botex has been derived by subtracting total enhancement expenditure (table 9, line 36), business rates (table 8 line 8) and third party services (table 8 lines 10 and 18) from net totex (table 8 line 21) for each of the respective value chains.
Each dependent variable includes smoothed base capex, which minimises the impact of spikes. To avoid further reducing the number of data points available, smoothing is undertaken by adjusting actual base capex using a ‘smoothing factor’: the ratio between smoothed base capex (calculated using an extended dataset) and unsmoothed base capex for each company in each year of the sample period.
The dependent variable is included in its logged form and is in 2012/13 CPIH FYA prices.
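The smoothing-factor approach can be sketched as below. No sample years are lost because the smoothed series draws on pre-sample years from the extended dataset. The smoothing rule (a trailing moving average), the window length and all figures are hypothetical; the description above does not specify how the smoothed series is constructed.

```python
def smoothing_factors(extended_capex, window, n_sample):
    """Ratio of smoothed to unsmoothed base capex for the last n_sample years.

    extended_capex holds annual base capex for one company, including
    pre-sample years. Smoothing here is a trailing moving average of length
    `window` (a hypothetical rule; the actual smoothing method may differ).
    """
    start = len(extended_capex) - n_sample
    if start - window + 1 < 0:
        raise ValueError("extended dataset too short for this window")
    factors = []
    for t in range(start, len(extended_capex)):
        smoothed = sum(extended_capex[t - window + 1 : t + 1]) / window
        factors.append(smoothed / extended_capex[t])
    return factors


def smooth_actual(actual_capex, factors):
    """Apply the smoothing factors to actual in-sample base capex."""
    return [a * f for a, f in zip(actual_capex, factors)]


# Five hypothetical years of capex; the last two are the sample period.
factors = smoothing_factors([8.0, 10.0, 12.0, 14.0, 6.0], window=3, n_sample=2)
adjusted = smooth_actual([14.0, 6.0], factors)
print(adjusted)  # approximately [12.0, 10.667]
```

When actual capex equals the unsmoothed figure used to build the factor, the adjustment returns the smoothed value itself.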
Comments on models (United Utilities)
These models incorporate the findings of the Arup and Vivid Economics reports published alongside this consultation. Engineering criteria consider whether model explanatory variables represent factors that will cause costs in AMP7 and whether the sign and magnitude of model coefficients are consistent with these causal narratives. These criteria thus consider models’ predictive plausibility directly. Statistical criteria are more limited because they appraise models’ predictive power only through models’ performance in historical datasets. With a large number of causal narratives to account for and limited data available, all models will predict costs with error and biases that affect companies in different ways. By choosing suites of models with different underlying assumptions or drawbacks, errors and biases can be reduced though not eliminated, which will improve the accuracy of predictions and reduce risks. The use of a diverse set of models is more likely to achieve this than a set of very similar models, whose errors and biases will be highly correlated with each other.
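The point about diverse versus similar models can be illustrated with a textbook variance calculation. This minimal sketch assumes that each model's prediction error has the same variance and that every pair of errors has the same correlation. Both are simplifications of ours. The sketch covers only the random part of prediction errors, not systematic bias.

```python
def averaged_error_variance(sigma2, n_models, rho):
    """Variance of the equally weighted average of n model errors.

    Assumes each error has variance sigma2 and every pair has correlation rho:
    Var(mean) = sigma2 * (1/n + (1 - 1/n) * rho).
    """
    if n_models < 1:
        raise ValueError("need at least one model")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return sigma2 * (1.0 / n_models + (1.0 - 1.0 / n_models) * rho)


# Four models with unit error variance (illustrative values).
print(averaged_error_variance(1.0, 4, rho=0.9))  # close to 0.925
print(averaged_error_variance(1.0, 4, rho=0.2))  # close to 0.4
```

With highly correlated errors (rho = 0.9), averaging four models barely reduces the error variance. With weakly correlated errors (rho = 0.2), it falls by more than half.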
Consultation model ID UUSWC1 UUSWC2
Company’s model ID 1 5
Dependent variable Sewage collection botex Sewage collection (incl selected enhancement)
Log(total sewer length) 0.371* (0.065) 0.382** (0.033)
Log(Annual urban runoff) (Arup/Vivid) 0.328 (0.214) 0.283 (0.197)
% of population living in urban areas (Arup/Vivid) 0.731 (0.48) 1.190** (0.032)
2012-13 -0.113 (0.457) -0.081 (0.477)
2013-14 -0.0351 (0.719) 0.0389 (0.643)
2014-15 -0.029 (0.691) 0.084 (0.373)
2015-16 -0.0659 (0.568) -0.069 (0.409)
2016-17 0.0092 (0.906) 0.005 (0.924)
Constant -2.319* (0.08) -2.307** (0.033)
R2 adjusted 0.834 0.856
VIF (max) 14.77 14.77
Reset test 0.000 0.0437
Estimation method OLS OLS
N (sample size) 60 60
Template 51. Sewage collection models proposed by Wessex Water
Description of dependent variables
Sewerage Botex = Opex + IRE + Average MNI over period – Third party costs – Local authority rates – Abstraction charges
Comments on models (Wessex Water)
The main cost driver is the number of connected properties. The main issue we faced was how to model density. We found that including measures of density based on sewerage area per connected property produced robust results. The models include aggregate and unit cost models based on the number of connected properties, with linear and quadratic terms in the density measure.
All models below provide very similar results with unsmoothed expenditure.
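With linear and quadratic terms in the density measure, the effect of density on log cost is not a single coefficient. It depends on the level of density and changes sign at a turning point. The sketch below computes both quantities. The coefficients are hypothetical and are not taken from the models in the table.

```python
def marginal_effect(b_linear, b_quadratic, x):
    """d ln(cost) / dx when the model contains b_linear*x + b_quadratic*x**2."""
    return b_linear + 2.0 * b_quadratic * x


def turning_point(b_linear, b_quadratic):
    """Value of x at which the marginal effect changes sign."""
    if b_quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b_linear / (2.0 * b_quadratic)


# Hypothetical coefficients, not taken from the models below.
print(turning_point(0.2, -0.1))          # 1.0
print(marginal_effect(0.2, -0.1, 0.5))   # 0.1
```

Whether a turning point matters in practice depends on whether it falls inside the observed range of the density measure.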
Consultation model ID WSXSWC1 WSXSWC2
Company’s model ID 2 4
Dependent variable Ln(sewage collection botex smoothed) Ln(sewage collection botex per property smoothed)
Connected Properties 0.685*** (0.000)
Sewage Catchment area per 1k properties -0.210 (0.144) 0.036 (0.771)
Sewage Catchment area per 1k properties ^2 -0.143 (0.695) -0.655** (0.075)
Constant -0.639 (0.421) -2.929*** (0.000)
R2 adjusted 0.895 0.328
VIF max 3.48 1.67
Reset test 0.024 0.000
Estimation method OLS OLS
N (sample size) 60 60
2.5 Network plus wastewater models
Template 52. Network plus wastewater models proposed by Ofwat
Description of dependent variable
Network plus wastewater base costs excluding cost items described in section 4 of the main consultation document. These costs include sewage collection and treatment base costs.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
The models contain cost drivers that are relevant for sewage collection or treatment in terms of scale, density, or complexity.
In addition to the variables described in the sewage collection and treatment models, we also test sewer length as an alternative scale variable. The coefficient on the density variable increases substantially when sewer length is included in the model. This is because of the relationship between the variables: density is defined as properties per length of sewer. While the coefficient on density may appear high, it should be considered together with the coefficient on length. A short sewer length contributes to high density, so the two coefficients offset each other to provide, arguably, a plausible outcome.
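The offsetting can be made explicit by substituting the definition of density into the model. This sketch uses our own notation: C is cost, P is properties, L is sewer length and D = P/L, following the definition given in the text.

```latex
\ln C = \alpha + \beta_L \ln L + \beta_D \ln D
      = \alpha + \beta_L \ln L + \beta_D (\ln P - \ln L)
      = \alpha + \beta_D \ln P + (\beta_L - \beta_D) \ln L
```

Holding sewer length fixed, the elasticity of cost with respect to properties is β_D. Holding properties fixed, the elasticity with respect to sewer length is β_L − β_D, so a large density coefficient is partly cancelled through the length term. Scaling properties and sewer length in proportion leaves D unchanged, and cost then rises with elasticity β_L.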
Consultation model ID ONPWW1 ONPWW2 ONPWW3 ONPWW4 ONPWW5 ONPWW6 ONPWW7 ONPWW8 ONPWW9 ONPWW10
Dependent variable ----------------- ln (network plus wastewater base costs) -----------------
ln (properties)
.769*** (0.000)
.774*** (0.000)
.721*** (0.000)
ln (load) .732*** (0.000)
.738*** (0.000)
.690*** (0.000)
ln (volume) .738*** (0.000)
.746*** (0.000)
ln (sewer length)
.769*** (0.000)
.738*** (0.000)
ln (density) .703*** (0.002)
0.688** (0.011)
0.435 (0.436)
1.47*** (0.000)
% lengths of sewer laid post 2001 -.020*** (0.000) -.018*** (0.000) -.016** (0.011) -.020*** (0.000) -.019** (0.012) -.017** (0.012) -.015** (0.018) -0.018 (0.166) -.017*** (0.003) -.016*** (0.004)
% of load, ammonia consent < 1mg
0.019** (0.013)
0.018** (0.021)
Constant 5.16*** 7.11*** 7.67*** 5.17*** 8.10*** 9.99*** 9.43*** 11.8*** 8.84*** 10.6***
R2 adjusted 0.925 0.923 0.888 0.925 0.905 0.904 0.882 0.836 0.92 0.917
VIF (max) 1.016 1.02 1.027 1.025 1.007 1.011 1.015 1.01 1.279 1.291
Reset test 0.003 0 0.025 0.003 0 0 0.032 0 0.008 0.002
Estimation method OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60 60 60 60
Template 53. Network plus wastewater models proposed by Anglian Water
Description of dependent variables
Average system models (1-6): Natural log of wastewater network plus botex excluding rates per system
Passing Distance models: (7-11): Natural log of wastewater network plus botex excluding rates
Acronyms used in explanatory variables
p.e = population equivalent
Comments on models (Anglian Water)
All models are described in detail in our Cost Modelling report – Phase 2, published March 2018: http://www.anglianwater.co.uk/about-us/thinking-about-our-future/
Translog model forms will inevitably see increased multicollinearity (as measured by VIF). This is the downside of the trade-off between the explanatory power of specific coefficients and the additional explanatory power of the model consequent on the inclusion of interacting terms. It is worth noting that in most cases, the coefficients quoted are significant. Furthermore, multicollinearity does not invalidate the model; it just makes it more difficult to interpret specific coefficients.
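The rise in VIF with translog terms can be reproduced on simulated data. This minimal sketch computes VIF_j = 1 / (1 − R_j²) by regressing each column on the others plus a constant. The data are hypothetical log drivers, not company data.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (no constant column).

    Each column is regressed on the others plus a constant; VIF = 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        tss = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - resid @ resid / tss
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
ln_x1 = rng.normal(5.0, 0.5, size=60)   # hypothetical log scale driver
ln_x2 = rng.normal(3.0, 0.5, size=60)   # hypothetical log density driver

cobb_douglas = np.column_stack([ln_x1, ln_x2])
translog = np.column_stack([ln_x1, ln_x2, ln_x1 ** 2, ln_x2 ** 2, ln_x1 * ln_x2])

print(vif(cobb_douglas).max())  # close to 1
print(vif(translog).max())      # very large: squares track the levels
```

Much of this is non-essential collinearity. Centring the logged drivers on their sample means before forming squares and interactions usually lowers the VIFs considerably and leaves the fitted values unchanged.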
Consultation model ID ANHNPWW1 ANHNPWW2 ANHNPWW3 ANHNPWW4 ANHNPWW5 ANHNPWW6 ANHNPWW7 ANHNPWW8 ANHNPWW9 ANHNPWW10 ANHNPWW11
Company’s model ID 1 2 3 4 5 6 7 8 9 10 11
Dependent variable Botex exc rates per system Botex exc rates
Ln(p.e. x(1-Sparsity <600km2)) Unit: Population
0.383*** (0.004)
0.327*** (0.000)
0.382*** (0.000)
0.361*** (0.0)
0.342*** (0.0)
0.338*** (0.0)
0.388*** (0.004)
Ln(p.e x Sparsity <600km2) Unit: Population
1.133*** (0.003)
1.129*** (0.0)
1.037*** (0.0)
0.946*** (0.0)
0.758** (0.013)
0.464*** (0.001)
0.486*** (0.000)
Ln(p.e. x (1 - Sparsity <600km2)) x ln(Total length of sewer) Unit: Population x km
-0.862* (0.094)
-0.528** (0.022)
-0.761*** (0.01)
-0.467*** (0.005)
-0.580* (0.056)
-0.271** (0.043)
0.417*** (0.000)
0.490*** (0.000)
0.365*** (0.000)
0.331*** (0.001)
Ln(p.e. x Sparsity <600km2) x ln(Total length of sewer) Unit: Population x km
1.636** (0.028)
1.448*** (0.000)
1.504*** (0.002)
1.190*** (0.000)
0.947 (0.115)
0.366 (0.12)
0.535*** (0.000)
0.594
Ln(p.e. x (1 - Sparsity <600km2))^2 Unit: Population^2
0.619 (0.221)
0.264 (0.199)
0.525* (0.084)
0.272* (0.065)
0.462 (0.105)
0.276** (0.019)
Combined sewer length as % total sewer length
0.011** (0.028)
0.010*** (0.0)
0.010*** (0.004)
0.008*** (0.0)
0.011*** (0.001)
0.008*** (0.0)
Pump capacity / # Water Recycling Centres (kW/system)
0.002 (0.17)
0.002*** (0.0)
Ln(p.e. x % indigenous sludge) Unit: Population
0.417*** (0.000)
0.4901*** (0.000)
0.365*** (0.000)
0.331*** (0.000)
Ln(p.e. x (1- % indigenous sludge)) Unit: Population
0.535*** (0.000)
0.594*** (0.000)
0.445*** (0.000)
0.286*** (0.000)
Ln(Total length of sewer) 0.275* (0.065)
0.288** (0.040)
-0.035 (0.821)
0.392*** (0.000)
0.511*** (0.003)
Ln(# Water Recycling Centres) x ln(Total length of sewer) Unit: km x system
-0.172** (0.019)
-0.437*** (0.000)
-0.463*** (0.001)
-0.299*** (0.000)
-0.180 (0.344)
Ln(Total length of sewer)^2 0.461*** (0.000)
0.563*** (0.000)
0.612*** (0.001)
0.540*** (0.000)
0.380* (0.077)
Combined sewer length as % total sewer length
0.009*** (0.000)
0.010*** (0.000)
0.004*** (0.009)
0.009*** (0.000)
0.009*** (0.000)
Pump capacity/ Total length of sewer Unit: kW/km
0.1221*** (0.000)
0.146*** (0.000)
0.126*** (0.000)
0.130*** (0.000)
2013 dummy 0.112** (0.014)
0.113 (0.121)
0.123** (0.026)
2014 dummy 0.117*** (0.009)
0.117 (0.109)
0.026 (0.119)
0.025 (0.14)
0.022 (0.544)
0.114** (0.039)
0.020 (0.276)
2015 dummy 0.024 (0.606)
0.020 (0.786)
0.031* (0.072)
0.031* (0.068)
0.029 (0.431)
0.014 (0.8)
0.025 (0.172)
2016 dummy 0.128*** (0.005)
0.125* (0.088)
0.073*** (0.0)
0.071*** (0.0)
0.068* (0.073)
0.099* (0.073)
0.057*** (0.003)
2017 dummy 0.197*** (0.0)
0.193*** (0.009)
0.110*** (0.0)
0.076** (0.045)
0.102*** (0.0)
0.097** (0.012)
0.155*** (0.006)
0.049** (0.04)
0.081*** (0.000)
Constant -0.163* (0.099)
-0.141** (0.015)
-0.0883 (0.21)
-0.035 (0.136)
-0.068 (0.294)
-0.037 (0.218)
0.048** (0.015)
-0.036 (0.402)
0.0267 (0.398)
0.031** (0.046)
0.014 (0.778)
R2 adjusted 0.942 0.933 0.972 0.969 0.981 0.981 0.973 0.949 0.926 0.984 0.983
Reset test N/A 0.0001 N/A N/A N/A 0.000 0.000 0.040 0.000 0.370 N/A
VIF (max) N/A 71.57 N/A N/A N/A 70.62 59.13 25.49 22.99 23.74 N/A
Method RE OLS RE RE RE OLS RE OLS RE RE RE
N (sample size) 60 60 50 50 50 50 60 60 50 50 50
Template 54. Network plus wastewater models proposed by Southern Water
Description of dependent variables
Modelled OPEX plus modelled base CAPEX.
Modelled OPEX is total OPEX less third party services, abstraction charges and local authority rates.
Modelled base CAPEX is maintenance expenditure in infrastructure and non-infrastructure less grants and contributions.
All costs are unsmoothed and deflated to 2016/17 prices using CPIH.
Comments on models (Southern Water)
The two network+ models are similar to models 1 and 3 of the BOTEX models, providing alternative approaches to control for pumping capacity per length of sewer. These models also appear to estimate coefficients that are operationally intuitive with reasonable statistical properties.
Consultation model ID SRNNPWW1 SRNNPWW2
Company’s model ID 1 2
Dependent variable ln (Network+ BOTEX)
Total number of properties (log) (000s) 0.704*** (0.000) 0.679*** (0.000)
Proportion of load with BOD<10mg/L and amm<1mg/L (%) 4.227*** (0.001) 4.410*** (0.001)
Pumping station capacity per km sewer (kW/km) 0.074*** (0.002)
Pumping station capacity per km sewer (log) (kW/km) 0.198*** (0.000)
Proportion of area with more than 4000 people per km2 (%) -0.956*** (0.000) -0.992*** (0.001)
Constant -0.402 (0.376) -0.193 (0.683)
R2 adjusted 0.910 0.914
VIF (max) 4.745 4.679
Reset test 0.034 0.005
Estimation method OLS OLS
N (sample size) 60 60
Template 55. Network plus wastewater models proposed by Severn Trent Water
Description of dependent variable
Models 8-11: Network plus botex gross of grants and contributions
Model 12: Network plus unit cost botex
Description of selected explanatory variables
Load: Total load received, kg BOD5/day
No. of STW's: Total number of sewage treatment works
Density: Properties/mains length
Weighted density: Ofwat's new weighted density index
Tight BOD and N3 consents: Constructed from old June returns data and recent APRs. This is the sum of the number of tight BOD (<10mg/l) and tight ammonia (<5mg/l) consents at large STWs (band 6).
No. of tertiary works: The number of large (band 6) works that have a tertiary treatment stage.
Prop. load with tight N3 consent: The proportion of load that has an ammonia consent of 3mg/l or less. Engineering logic suggests that it would be better to also include the load with consents of between 3mg/l and 5mg/l, but these data were not readily available.
Length/Load: Length of sewerage mains divided by load
No. of STW's/load: No. of STW's divided by load
Sludge vol: Total volume of sludge produced (ttds)
Av. Distance intersiting: Total intersiting "work" done divided by sludge vol. (km/yr)
Av. Distance intersiting via pipe: Intersiting "work" done by pipeline divided by sludge vol. (km/yr)
% anaerobic digestion: % of sludge treated with anaerobic digestion (conventional and advanced)
Av. Distance to disposal: Total disposal "work" divided by total sludge vol. (km/yr)
% collocated sites: % of sludge treated at a site where an STC and an STW are co-located
Comments on models (Severn Trent Water)
Model 8 uses network plus base costs as the dependent variable. The coefficients are in line with expectations.
Model 9 extends model 8 by adding non-linear terms in the load and number of STWs variables. Our prior expectations for the sum of the first-order coefficients on load, number of STWs and the treatment variable remain unchanged and have broadly been met in this model.
Model 10 changes only the treatment variable, and once again our prior expectations are broadly met. The random effects version of the model poses the same problems as in model 9, with the treatment variable negligible in size and highly insignificant.
Model 11 extends model 8 with non-linear terms in density and load and also changes the treatment variable. Expressing the treatment variable as a proportion of load also changes our prior expectations for the load and number of works variables, whose coefficients are now expected to sum to around 1 under constant returns to scale. These coefficients are broadly in line with expectations, as is the coefficient on the treatment variable.
Model 12 is a unit cost model with all drivers scaled by load. This again imposes an assumption of constant returns to scale in load.
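The constant returns to scale restriction in a unit cost model can be seen by rearranging it. This sketch uses our own notation. It assumes the dependent variable is botex C per unit of load Q, that drivers x_k are scaled by Q, and that any remaining controls z (such as density or year dummies) are ratios or indicators that do not change with scale.

```latex
\ln\!\left(\frac{C}{Q}\right) = \alpha + \sum_{k} \beta_k \ln\!\left(\frac{x_k}{Q}\right) + \gamma' z

\Longrightarrow\quad
\ln C = \alpha + \sum_{k} \beta_k \ln x_k + \Bigl(1 - \sum_{k} \beta_k\Bigr) \ln Q + \gamma' z
```

The elasticities on Q and the x_k sum to one by construction. Scaling load and every scaled driver by the same factor therefore scales cost by that factor, whatever values the β_k take.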
Consultation model ID SVTNPWW1 SVTNPWW2 SVTNPWW3 SVTNPWW4 SVTNPWW5
Company’s model ID 8 9 10 11 12
Dependent variable Ln(Network plus waste base costs)
Ln(Load) .54*** (.00) .32** (.02) .4*** (.00) .67*** (.00)
Ln(Length/Load) .61 (.16)
Ln(Density) 1.37** (.00) 1.2*** (.00) 1.4*** (.00) 1.4*** (.00) 1.6*** (.00)
Ln (No. of STW’s/Load) .31*** (.00)
Ln (No. of STW’s) .4** (.00) .41*** (.00) .4*** (.00) .4*** (.00)
Ln (Sum of tight BOD and N3 permits)
.1** (.02)
.17** (.00)
Prop. of load subject to tight ammonia consent
.23 (.2)
.21 (.3)
Ln(Load)^2 -.03 (.55)
-.14* (.058)
.06 (.2)
Ln(Density)^2 4.78*** (.00)
Ln(No. of STW’s)^2 .12 (.67)
.04 (.75)
Ln(Load) X Ln(No. of STW’s)
-.33** (.047)
-.37*** (.00)
Ln(large tertiary works) .27*** (.00)
Dummy 2012 -.05 (.2) -.06 (.23) -.03 (.3) -.04 (.2) -.06 (.17)
Dummy 2013 -.03 (.3) -.03 (.33) -.01 (.78) -.02 (.48) -.03 (.36)
Dummy 2014 -.03 (.4) -.03 (.45) -.001 (.98) -.02 (.47) -.03 (.36)
Dummy 2015 -.02 (.65) .02 (.63) -.001 (.97) -.01 (.78) -.01 (.67)
Dummy 2016 -.01 (.72) -.01 (.65) -.002 (.95) -.01 (.66) -.01 (.65)
Constant 5.34*** (.00) 5.3*** (.00) 5.37*** (.00) 5.2*** (.00) -7.5*** (.00)
R2 adjusted .96 .97 .98 .97 .76
Reset test 0.01 0.001 0.01 0.01 0.01
VIF max 7.7 (Load) 22 (Load) 14 (Load) 7.9 (Load) 5.1
Method OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60
Template 56. Network plus wastewater models proposed by South West Water
Description of dependent variable
Network+ = Sewage collection + Sewage treatment
Modelled OPEX = Network+ OPEX – Network+ Third party – Network+ pensions – Network+ Local authority rates
Modelled base CAPEX = Network+ Maintenance infra + Network+ Maintenance non-infra – Network+ grants and contributions
Modelled BOTEX = Modelled OPEX + Modelled base CAPEX
Modelled BOTEX+ (growth) enhancement = modelled BOTEX + network+ first time sewerage + network+ sludge enhancement (growth) + network+ new developments and growth + network+ growth at sewage treatment works + network+ resilience + network+ reduce flooding risk for properties
Modelled TOTEX = modelled BOTEX + network+ other capital expenditure infra + network+ other capital expenditure non-infra + network+ infrastructure network reinforcement
Unsmoothed net costs from 2011/12 to 2016/17
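The cost definitions above build on one another. The sketch below composes them as written. The field names are hypothetical, the six growth-related enhancement lines are summed into a single field for brevity, and the figures are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class NetworkPlusCosts:
    """One company-year of network plus cost lines (hypothetical field names)."""
    opex: float
    third_party: float
    pensions: float
    la_rates: float
    maint_infra: float
    maint_non_infra: float
    grants_contributions: float
    growth_enhancement: float  # sum of the six growth-related enhancement lines
    other_capex_infra: float
    other_capex_non_infra: float
    network_reinforcement: float


def modelled_botex(c: NetworkPlusCosts) -> float:
    modelled_opex = c.opex - c.third_party - c.pensions - c.la_rates
    modelled_base_capex = c.maint_infra + c.maint_non_infra - c.grants_contributions
    return modelled_opex + modelled_base_capex


def modelled_botex_plus_growth(c: NetworkPlusCosts) -> float:
    return modelled_botex(c) + c.growth_enhancement


def modelled_totex(c: NetworkPlusCosts) -> float:
    return (modelled_botex(c) + c.other_capex_infra
            + c.other_capex_non_infra + c.network_reinforcement)


example = NetworkPlusCosts(opex=100.0, third_party=5.0, pensions=3.0, la_rates=12.0,
                           maint_infra=40.0, maint_non_infra=30.0,
                           grants_contributions=10.0, growth_enhancement=15.0,
                           other_capex_infra=20.0, other_capex_non_infra=10.0,
                           network_reinforcement=5.0)
print(modelled_botex(example), modelled_botex_plus_growth(example), modelled_totex(example))
# 140.0 155.0 175.0
```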
Comments on models (South West Water)
We have adopted the same approach to modelling network plus wholesale wastewater costs as for aggregate wholesale wastewater costs, as there were no bioresources-specific drivers in our aggregate models. We have not, at this stage, examined the appropriateness of different estimation approaches. We do note, however, that some models seem more robust than others and clearly this will have implications for identifying relative efficiency.
See our aggregate wholesale wastewater BOTEX submission for a more detailed review of the drivers considered, which were:
Scale
Sparsity/economies of scale
Local environmental sensitivities (tightness of consents)
Costs of operating and maintaining network assets
Holiday population
We have explored specifications which capture several of these factors, although due to the nature of the data it is not possible to combine all factors into one model. All models capture scale, pumping costs and tightness of consents.
As with aggregate BOTEX, models 1, 2, 5, 6, 9 and 10 also use a metric of population density or sparsity. See the aggregate BOTEX submission for more detail on the rationale for this choice of driver.
As with aggregate BOTEX, models 3, 7 and 11 use the number of sewage treatment works. See the aggregate BOTEX submission for more detail on the rationale for this choice of driver.
As with aggregate BOTEX, models 4, 8 and 12 use the ratio of non-resident to resident population. See the aggregate BOTEX submission for more detail on the rationale for this choice of driver.
We have extended our aggregate BOTEX modelling to models controlling for BOTEX + growth enhancement and TOTEX (see discussion in wholesale wastewater models). As can be seen from the efficiency ranges in the Excel document, while modelling BOTEX+ (growth) does not widen the efficiency ranges, including quality enhancement to model TOTEX does lead to somewhat broader efficiency ranges. As for wholesale water models, we would recommend that BOTEX+ (growth) and TOTEX modelling approaches are explored to the fullest possible extent at PR19.
All of the BOTEX models estimate statistically significant coefficients which are supported from an operational and economic perspective. The relationship between cost and cost drivers in BOTEX+ (growth) and TOTEX models is broadly similar to that estimated in BOTEX models, although not all coefficients pass statistical significance tests. We would note that almost all models considered have significant coefficients on tightness of consents and on one or both of pumping capacity and a measure of sparsity/economies of scale. This would suggest that these drivers have the strongest statistical relationship with cost.
Given our focus on modelling what we consider to be key industry drivers of cost, we have not explored estimation approaches beyond OLS with robust standard errors. We will be considering the most appropriate estimation approaches as part of our consultation response.
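OLS with robust standard errors keeps the OLS coefficients and replaces the conventional covariance matrix with a heteroskedasticity-robust "sandwich" estimator. The comment does not say which robust variant was used. The sketch below uses HC1, which is our assumption, and simulated data. For a company-year panel like this one, standard errors clustered by company are another common choice.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with heteroskedasticity-robust (HC1) standard errors.

    X should include a constant column. HC1 scales the White (HC0) sandwich
    estimator by n / (n - k).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum of e_i^2 x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))


# Simulated data whose error variance grows with |x| (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y = 1.0 + 0.8 * x + rng.normal(scale=0.1 * (1.0 + np.abs(x)))
beta, se = ols_hc1(np.column_stack([np.ones(60), x]), y)
print(beta, se)
```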
All models are broadly robust from a statistical perspective.
Adjusted R2 is sufficiently high.
VIF (a measure of collinearity) is well below the ‘rule of thumb’ threshold of 10.
We find mixed evidence from the RESET test on whether the model would be improved by the addition of polynomial terms, i.e. given the control variables, whether the model is mis-specified.
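The RESET test refits the model with powers of its own fitted values and checks whether these extra terms add explanatory power. A significant F statistic points to mis-specification, such as an omitted non-linearity. The minimal sketch below uses powers 2 and 3, which is a common but not universal choice, and simulated data. The "Reset test" rows in the templates appear to report p-values from tests of this kind.

```python
import numpy as np


def _fit(X, y):
    """Least-squares fit; returns residual sum of squares and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return float(resid @ resid), fitted


def reset_f(X, y, max_power=3):
    """Ramsey RESET F statistic using powers 2..max_power of the fitted values.

    X should include a constant column. Returns (F, numerator df, denominator df).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    rss_restricted, fitted = _fit(X, y)
    extra = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    rss_unrestricted, _ = _fit(np.column_stack([X, extra]), y)
    q = extra.shape[1]
    df_resid = n - k - q
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return f_stat, q, df_resid


rng = np.random.default_rng(2)
x = rng.normal(size=60)
X = np.column_stack([np.ones(60), x])
y_linear = 1.0 + 2.0 * x + rng.normal(size=60)                  # correctly specified
y_curved = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(size=60)   # omitted x**2 term
f_linear, q, df = reset_f(X, y_linear)
f_curved, _, _ = reset_f(X, y_curved)
print(f_linear, f_curved)  # the second is far larger
```

To obtain a p-value, compare the statistic with an F(q, df) distribution, for example with `scipy.stats.f.sf(f_stat, q, df)`.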
Consultation model ID SWBNPWW1 SWBNPWW2 SWBNPWW3 SWBNPWW4 SWBNPWW5 SWBNPWW6 SWBNPWW7 SWBNPWW8 SWBNPWW9 SWBNPWW10 SWBNPWW11 SWBNPWW12
Company’s model ID Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9 Model 10 Model 11 Model 12
Dependent variable Network+ BOTEX (ln) Network+ BOTEX+ (growth) (ln) Network+ TOTEX (ln)
Properties (ln) 0.699*** (0.000)
0.798*** (0.000)
0.741*** (0.000)
0.982*** (0.000)
0.656*** (0.000)
0.740*** (0.000)
0.720*** (0.000)
0.945*** (0.000)
0.732*** (0.000)
0.829*** (0.000)
0.674*** (0.000)
0.984*** (0.000)
Pumping capacity over mains (ln)
0.179*** (0.000)
0.186*** (0.000)
0.166*** (0.000)
0.187*** (0.000)
0.090** (0.040)
0.086* (0.073)
0.088** (0.026)
0.085* (0.066)
0.165*** (0.003)
0.203*** (0.001)
0.161*** (0.000)
0.171*** (0.002)
Proportion of load with BOD<10mg/L and amm<1mg/L
2.173*** (0.000)
1.907*** (0.001)
1.580*** (0.002)
0.325 (0.509)
2.030*** (0.000)
1.934*** (0.000)
1.514*** (0.001)
0.191 (0.662)
2.473*** (0.000)
1.819*** (0.005)
1.604*** (0.003)
0.837 (0.137)
Number of combined sewer overflows per km sewer (ln)
0.159** (0.024)
0.214** (0.011)
0.163** (0.022)
0.107* (0.066)
0.158** (0.021)
0.101* (0.059)
0.170** (0.027)
0.210*** (0.008)
0.171** (0.016)
Proportion of area with more than 2,000 people per km2
-0.517*** (0.007)
-0.458** (0.011)
-0.450** (0.032)
Proportion of area with less than 250 people per km2
0.463** (0.017)
0.501*** (0.008)
0.127 (0.607)
Number of treatment works per property (ln)
0.161** (0.011)
0.143** (0.015)
0.009 (0.894)
Ratio of non-resident to resident population
0.041*** (0.004)
0.046*** (0.001)
0.037** (0.0252)
Constant 0.179 (0.713)
-0.820* (0.094)
-0.474 (0.260)
-2.036*** (0.007)
0.566 (0.199)
-0.346 (0.446)
-0.130 (0.755)
-1.710** (0.011)
0.253 (0.748)
-0.533 (0.409)
0.186 (0.729)
-1.718* (0.050)
R2 adjusted 0.893 0.889 0.881 0.895 0.902 0.901 0.896 0.910 0.889 0.881 0.870 0.891
Reset test 0.000 0.000 0.001 0.000 0.002 0.001 0.000 0.000 0.000 0.001 0.003 0.015
VIF max 7.743 6.336 4.556 8.892 7.743 6.336 4.556 8.892 7.743 6.336 4.556 8.892
Method OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60 60 60 60 60 60
Template 57. Network plus wastewater models proposed by Thames Water
Description of dependent variable
Sewage Network Plus botex = opex + capital maintenance expenditure (infra and non-infra)
Description of selected explanatory variables
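$$\text{Total Load Received}_{it} = \text{Band 1} + \text{Band 2} + \text{Band 3} + \text{Band 4} + \text{Band 5} + \text{Above Band 5} \quad (\text{all in kg BOD}_5/\text{day})$$
$$\text{Load Capacity Treatment Works} = \frac{\text{Total Load Received}}{\text{Total Number of Works}}$$
$$\text{Quality Tight Consent Max}_{it} = \frac{\text{Max Load Received}_{it}\{\text{NH}_3 \le 1\,\text{mg/l},\ \text{BOD} \le 7\,\text{mg/l},\ \text{P} \le 0.5\,\text{mg/l}\}}{\text{Total Load Received}_{it}} \times 100\%$$
$$\text{Property Density} = \frac{\text{Total Number of Connected Properties}}{\text{Length of Public Sewers}}$$
Pumping station capacity = total pumping station capacity (Source: Cost Assessment November 2017, Waste Network sheet).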
For regional wages (2 soc) we use the latest version from January 2018.
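The variable definitions above can be expressed as simple functions of the underlying data. This is a minimal sketch under stated assumptions: the record structure and field names are hypothetical, sewer length is assumed to be in km, and "Max Load Received" is read as the largest of the loads received under the three tight consent types, which is our interpretation rather than Thames Water's documented method.

```python
from dataclasses import dataclass


@dataclass
class WastewaterRecord:
    """Hypothetical inputs for one company-year (field names are illustrative)."""
    band_loads: list[float]                # kg BOD5/day: Band 1..Band 5 and above Band 5
    number_of_works: int                   # total number of treatment works
    tight_consent_loads: dict[str, float]  # load received under each tight consent type
    connected_properties: float
    public_sewer_length_km: float          # assumed unit


def total_load_received(r: WastewaterRecord) -> float:
    return sum(r.band_loads)


def load_capacity_treatment_works(r: WastewaterRecord) -> float:
    # Average load per works: total load received / total number of works.
    return total_load_received(r) / r.number_of_works


def quality_tight_consent_max(r: WastewaterRecord) -> float:
    # Assumption: the max is taken over the loads received under the
    # NH3 <= 1 mg/l, BOD <= 7 mg/l and P <= 0.5 mg/l consents, as a % of total load.
    return 100 * max(r.tight_consent_loads.values()) / total_load_received(r)


def property_density(r: WastewaterRecord) -> float:
    return r.connected_properties / r.public_sewer_length_km


# Hypothetical values for illustration only.
example = WastewaterRecord(
    band_loads=[10, 20, 30, 40, 50, 50],
    number_of_works=4,
    tight_consent_loads={"NH3<=1": 40, "BOD<=7": 60, "P<=0.5": 20},
    connected_properties=1000,
    public_sewer_length_km=50,
)
print(quality_tight_consent_max(example))  # 30.0
```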
Comments on models (Thames Water)
The estimated coefficients on the scale variable are strongly significant across all models, ranging from 0.90 to 1.05, suggesting the presence of economies of scale.
When we run network plus models controlling for density with weighted average population density (wad), the estimated coefficient on regional wages is high (1.205). When density is instead controlled for using property density and the models are estimated by OLS, they produce sensible estimates for the regional wage coefficient, ranging between 0.576 and 1.01, although these are not statistically significant in any model.
Models tend to have a higher adjusted R2 when density is controlled for using property density rather than population density (wad).
All the wastewater network plus models consistently fail the Ramsey RESET test for omitted variables, although the failure is less severe than in the sewage treatment case. This might indicate that the problem remains in the sewage collection models, as none of the sewage treatment models passed the test. It might be explained by the way the capital stock is measured.
We tested time dummies and time trend variables and found no relevant significant effects.
Finally, the quality variable proposed in sewage treatment, $\text{Quality Tight Consent Max}_{it}$, has produced consistent and strongly significant effects across all specifications and models, with coefficients ranging between 0.021 and 0.024.
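In the log-log specifications in these templates, the coefficient on a logged scale driver is a cost elasticity: doubling the driver, holding all other regressors fixed, multiplies predicted cost by 2 raised to the coefficient. A coefficient below 1 implies costs rise less than proportionately with scale, and one above 1 implies they rise more than proportionately. The sketch below, with hypothetical function names, applies this to the two load coefficients in the table that follows (1.049 and 0.903); it uses point estimates only and ignores their standard errors.

```python
def cost_ratio(elasticity: float, scale_factor: float) -> float:
    """Predicted cost multiple when one logged regressor is scaled by scale_factor.

    In ln(C) = a + b * ln(S) + ..., holding every other regressor fixed,
    C(scale_factor * S) / C(S) = scale_factor ** b.
    """
    return scale_factor ** elasticity


def returns_to_scale(elasticity: float, tol: float = 1e-9) -> str:
    """Classify a point estimate of the cost elasticity with respect to scale."""
    if elasticity < 1 - tol:
        return "economies of scale"       # cost rises less than proportionately
    if elasticity > 1 + tol:
        return "diseconomies of scale"    # cost rises more than proportionately
    return "constant returns"


for b in (1.049, 0.903):
    print(b, round(cost_ratio(b, 2.0), 3), returns_to_scale(b))
```

In these two models load also enters through ln(load capacity of treatment works) = ln(total load) - ln(number of works), so the load coefficient on its own measures the effect when the number of works grows in step with load; with the number of works held fixed, the load and load-capacity coefficients add.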
Consultation model ID TMSNPWW1 TMSNPWW2
Company’s model ID 2 3
Dependent variable Ln(Totex Water NetworkPlus)
Ln (Total Load Received) 1.049*** 0.903***
(0.000) (0.000)
Prp Tight Consents Max(NH3, BOD, P) (%) 0.022*** 0.024***
(0.001) (0.000)
Ln(Regional Wages waste 2soc) 0.988 0.576
(0.237) (0.591)
Ln(Property Density) 1.309 1.149
(0.004) (0.003)
Ln(Load Capacity Treatment Works) -0.422*** -0.349***
(0.000) (0.006)
Ln(Pumping Station Capacity) 0.118**
(0.031)
Time -0.014
(0.303)
Constant -4.609*** -3.584***
(0.087) (0.197)
R2 adjusted 0.950 0.959
Reset test 0.018 0.003
VIF (max) 7.13 12.13
Method OLS OLS
N (sample size) 50 60
Template 58. Network plus wastewater models proposed by United Utilities
Description of dependent variable
Wastewater network plus botex, net of grants and contributions.
It excludes business rates and third party services.
Each dependent variable includes smoothed base capex which minimises the impact of spikes.
Comments on models
See United Utilities’ comments on wastewater collection models.
Consultation model ID UUNPWW1
Company’s model ID 4
Dependent variable ln(Wastewater network plus botex)
Log(total load received) 0.864***
(0)
% load received by WwTW bands 1-3 10.687*** (0.003)
% of population living in urban areas (Arup/Vivid)
2.432** (0.022)
% load received by WwTW with tertiary treatment (TA1/TA2/TB1/TB2)
0.259 (0.332)
2012-13 dummy 0.072** (0.028)
2013-14 dummy 0.051
(0.132)
2014-15 dummy 0.030
(0.444)
2015-16 dummy 0.059
(0.159)
2016-17 dummy 0.068
(0.144)
Constant -8.207*** (0.001)
R2 adjusted 0.928
VIF (max) 6.89
Reset test 0.0001
Estimation method OLS
N (sample size) 60
Template 59. Network plus wastewater models proposed by Welsh Water
Description of dependent variables
Wastewater Network Plus includes costs for Sewage Collection and Sewage Treatment
Wastewater Network Plus Botex = “Total Operating Expenditure” – “Third Party Services” – “Local authority and Cumulo rates” + “Maintaining the long term capability of the assets – infra” + “Maintaining the long term capability of the assets - non-infra”
Values rebased to 2016/17 using CPIH in line with the PR19 Methodology Statement.
Comments on models (Welsh Water)
The submitted Wastewater Network Plus model is similar to the aggregate model. This is to be expected, as Wastewater Network Plus accounts for more than 80% of wholesale botex on average across the industry.
The model includes a scale variable, the number of connected properties, alongside variables to capture density, treatment complexity and maintenance drivers.
The model does not pass the model specification test (RESET) at the specified level. Given the relatively small sample for the wastewater industry, coupled with relatively stable cost drivers over time, we have placed importance on model consistency and interpretability from an economic perspective.
Consultation model ID WSHNPWW1
Company’s model ID 6
Dependent variable Ln(Wastewater Botex)
Ln(Connected Properties) (000s)
0.795*** (0.001)
Ln(Pumping Station Capacity per km of sewer) (kW/km)
0.182** (0.034)
Ln(Number of combined sewer overflows per km of combined sewer) (nr/km)
0.203 (0.154)
% of load with BOD<10mg/L and Ammonia <1 mg/L (%) 3.918** (0.016)
% of area with less than 250 people per km2 0.448 (0.163)
Constant -0.774 (0.293)
R2 adjusted 0.904
VIF (max) 6.336
Reset test 0.001
Estimation method OLS
N (sample size) 60
Template 60. Network plus wastewater models proposed by Yorkshire Water
Description of dependent variable
Wastewater network plus base costs = operating expenditure less third party services and local authority rates + capital maintenance expenditure net of grants and contributions (G&C).
The dependent variable is deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
Comments on models (Yorkshire Water)
The Network+ models proposed are similar to the aggregate wholesale models. The general limitations and modelling observations highlighted above also apply here.
The models appear to estimate coefficients of the right sign and appropriate magnitude, and are robust to the various statistical tests.
The statistical performance of the models is broadly consistent with and without G&C.
Consultation model ID YKYNPWW1 YKYNPWW2 YKYNPWW3 YKYNPWW4 YKYNPWW5
Company’s model ID 1 2 3 4 5
Dependent variable Network+ BOTEX
Total number of properties (log) (000s)
0.817*** (0.000)
0.699*** (0.000)
0.724*** (0.000)
0.846*** (0.000)
0.795*** (0.000)
% load with BOD<10mg/L and amm<1mg/L
2.399* (0.079)
4.346** (0.011)
4.288*** (0.002)
1.746* (0.081)
2.995** (0.048)
Pumping station capacity per km sewer (log) (kW/km)
0.238*** (0.001)
0.179** (0.034)
0.213*** (0.001)
0.332*** (0.002)
0.290*** (0.002)
Number of combined sewer overflows per km of sewer (log) (nr/km)
0.202 (0.200)
0.159 (0.194)
0.0580 (0.576)
% of area with more than 2000 people per km2
-0.517 (0.158)
% of area with more than 4000 people per km2
-0.923*** (0.001)
-0.502* (0.060)
% of sewers that are combined sewer
0.858*** (0.008)
0.574* (0.079)
Constant -0.681 (0.385)
0.179 (0.791)
-0.377 (0.597)
-1.832* (0.058)
-1.326 (0.185)
R2 adjusted 0.882 0.893 0.914 0.921 0.926
VIF (max) 6.280 7.743 6.645 4.810 6.435
Reset test 0.000 0.000 0.003 0.646 0.012
Estimation method OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60
2.6 Wholesale wastewater models
Template 61. Wholesale wastewater models proposed by Ofwat
Description of dependent variable
Wholesale wastewater base costs = bioresources, treatment and collection base costs, excluding cost items described in section 4 of the main consultation document.
Comments on models
We used connected properties or load treated as a volume driver.
The coefficient of the number of pumping stations is not significant. If we use capacity of pumping stations instead of number this becomes significant. We will consider the appropriate measure based on responses to this consultation.
All coefficients have the expected sign and plausible magnitude. We considered alternative, more flexible specifications in light of the failure of the RESET test. This search did not yield a better model, and we consider that, despite the low RESET test p-values, the models are appropriate.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Consultation model ID OWWW1 OWWW2 OWWW3 OWWW4 OWWW5 OWWW6 OWWW7 OWWW8
Dependent variable ln(wholesale wastewater base costs)
Ln(properties) 0.976*** (0.000)
0.961*** (0.000)
0.975*** (0.000)
Ln(load) 0.877*** (0.000)
0.852*** (0.000)
0.924*** (0.000)
0.910*** (0.000)
0.921*** (0.000)
% lengths replaced post 2001
-0.013** (0.013)
-0.015** (0.021)
-.013*** (0.003)
-.015*** (0.002)
ln (pumping stations per sewer length)
0.141
(0.207)
% load treated in STWs bands 1-3
0.034* (0.079)
0.019 (0.386)
0.061*** (0.000)
0.052*** (0.000)
0.048*** (0.000)
0.066*** (0.000)
0.055*** (0.000)
0.050*** (0.000)
% load from trade effluent customers
0.069*** (0.010)
0.087*** (0.001)
% sludge disposed to farmland
-.008*** (0.001)
-.009*** (0.000)
Ln(density) 1.170*** (0.009)
0.667* (0.087)
0.742*** (0.003)
1.317*** (0.001)
0.688** (0.032)
0.775*** (0.000)
Constant 8.25*** (0.000)
9.01*** (0.000)
2.23 (0.139)
5.51*** (0.009)
4.45*** (0.001)
-0.94 (0.402)
3.08* (0.076)
1.81* (0.050)
R2 adjusted 0.946 0.951 0.958 0.963 0.966 0.963 0.967 0.971
VIF (max) 2.35 3.545 2.838 2.631 2.671 2.913 2.669 2.699
Reset test 0.001 0 0.01 0.002 0.001 0.002 0 0.01
Method OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60 60
Template 62. Wholesale wastewater models proposed by Southern Water
Description of dependent variable
y = modelled OPEX plus modelled base CAPEX.
Modelled OPEX is total OPEX less third party services, abstraction charges and local authority rates.
Modelled base CAPEX is maintenance expenditure in infrastructure and non-infrastructure less grants and contributions.
All costs are unsmoothed and deflated to 2016/17 prices using CPIH.
Comments on models (Southern Water)
The four models provide alternatives in the following areas:
Pumping station capacity – moving sewage around is a key driver of wastewater costs. Models 1 and 2 control for pumping station capacity per length of sewer in levels, while models 3 and 4 control for pumping station capacity per length of sewer in logs. While regulatory precedent indicates modelling this variable in logarithms, we have presented both alternatives.
Bioresources drivers – models 2 and 4 control for sludge treatment and transport to account for variation in bioresources costs, whilst models 1 and 3 control for density/sparsity as an alternative (possibly capturing some aspect of the need for sludge transport).
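The choice between levels and logs changes how the pumping station coefficient should be read. In a semi-log term γ·x, a one-unit increase in x changes cost by exp(γ) - 1, and the implied elasticity γ·x grows with x; in a log-log term β·ln(x), the elasticity is constant at β. The sketch below uses the coefficients reported in the table that follows for model 1 (0.056, levels) and model 3 (0.153, logs); the function names and the sizes of the changes in x are hypothetical.

```python
import math


def pct_cost_change_levels(gamma: float, delta_x: float) -> float:
    """Semi-log term ln(C) = ... + gamma * x: % change in cost when x rises by delta_x."""
    return (math.exp(gamma * delta_x) - 1.0) * 100.0


def pct_cost_change_logs(beta: float, pct_change_x: float) -> float:
    """Log-log term ln(C) = ... + beta * ln(x): % change in cost when x rises by pct_change_x %."""
    return ((1.0 + pct_change_x / 100.0) ** beta - 1.0) * 100.0


def elasticity_levels(gamma: float, x: float) -> float:
    """Elasticity of cost with respect to x implied by the semi-log term, evaluated at x."""
    return gamma * x


# Coefficients from models 1 (levels) and 3 (logs); the changes in x are hypothetical.
print(round(pct_cost_change_levels(0.056, 1.0), 2))  # 5.76: +1 kW/km of pumping capacity
print(round(pct_cost_change_logs(0.153, 10.0), 2))   # 1.47: +10% pumping capacity per km
```

Under the levels form, two companies with the same coefficient but different pumping capacity per km face different elasticities, whereas the log form imposes the same elasticity on all companies.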
Consultation model ID SRNWWW1 SRNWWW2 SRNWWW3 SRNWWW4
Company’s model ID 1 2 3 4
Dependent variable ln (Wholesale wastewater BOTEX)
Total number of properties (log) (000s) 0.714*** (0.000)
0.771*** (0.000)
0.697*** (0.000)
0.732*** (0.000)
% of load with BOD<10mg/L and amm<1mg/L
3.798*** (0.000)
2.140* (0.074)
3.926*** (0.000)
2.397** (0.045)
Pumping station capacity per km sewer (kW/km)
0.056*** (0.007)
0.0538*** (0.003)
Pumping station capacity per km sewer (log) (kW/km)
0.153*** (0.000)
0.138*** (0.000)
% of area with more than 4000 people per km2
-0.863*** (0.000)
-0.892*** (0.000)
% of sludge treated using AD or AAD -0.305* (0.086)
-0.265 (0.121)
Total measure of intersiting 'work' done (all forms of transportation) per unit sludge produced (log) (km/year)
0.142** (0.016)
0.143** (0.020)
Constant -0.211 (0.600)
-0.774 (0.238)
-0.0710 (0.840)
-0.496 (0.386)
R2 adjusted 0.939 0.927 0.943 0.928
VIF (max) 4.745 6.429 4.679 5.930
Reset test 0.164 0.0353 0.0354 0.0241
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
Template 63. Wholesale wastewater models proposed by Severn Trent Water
Description of dependent variables
Models 1-5: Wholesale botex gross of grants and contributions
Models 6-7: Wholesale unit cost botex
Description of selected explanatory variables
Load: total load received, kg BOD5/day
No. of STW's: total number of sewage treatment works
Density: properties/mains length
Tight BOD and N3 consents: constructed from old June returns data and recent APRs. This is the sum of the number of tight BOD (<10mg/l) and tight ammonia (<5mg/l) consents at large STWs (band 6).
No. of tertiary works: the number of large (band 6) works that have a tertiary treatment stage.
Prop. load with tight N3 consent: the proportion of load that has an ammonia consent of 3mg/l or less. Engineering logic suggests that it would be better also to include the load with consents of between 3mg/l and 5mg/l, but these data were not readily available.
Length/Load: length of sewerage mains divided by load
No. of STW's/load: number of STWs divided by load
Comments on models (Severn Trent Water)
Model 1 OLS: Model 1 presents a log-linear model with the sum of tight BOD and ammonia consents acting as the treatment cost driver. The coefficients are all broadly of the magnitude we would expect and are all significant, and our prior expectations on the three scale-related coefficients are met, with the three summing almost exactly to 1.
Model 2 OLS: Model 2 extends model 1 with non-linear terms in the load and no. of STW variables, as well as an interaction term between the two. The high correlation between the interaction term and other variables in this model led to some coefficient inaccuracy (although our prior hypotheses on the three core variables were still not rejected and all were correctly signed and of a sensible magnitude), which was rectified by changing the treatment variable to the number of tertiary works. Following discussion with Reckon (“Review of Severn Trent’s sewerage cost models”, Reckon for Severn Trent (2018)), who argued against inclusion of an interaction term, we constructed model 3.
Model 3 OLS: Model 3 adds non-linear terms in load and the number of STW’s to model 1. The variables are all statistically significant and of a logical magnitude and our prior expectations are broadly met. The absence of the interaction term improves model stability substantially.
Model 4 RE: Model 4 is a random effects version of model 3 with an interaction term added. The interaction term appears to slightly reduce the stability of the model; however, the coefficients remain in line with our expectations (with or without the interaction term). It should be noted that using only the number of works with tight ammonia consents as a measure of treatment complexity also works quite well in models 1-4, in that the coefficients are in line with expectations. However, these models tend to have greater problems with multicollinearity, with many variables statistically insignificant.
Model 5 OLS: This model is closer to Ofwat’s PR14 specifications, with non-linear terms in the density and load terms. Expressing the treatment variable as a proportion of load also changes our prior expectations on the load and number of works variables, which are now expected to sum to around 1 in the presence of constant returns to scale. These coefficients are broadly in line with expectations.
Model 6 OLS: Model 6 is a unit cost model with all drivers scaled by load. This imposes an assumption (which we consider rather arbitrary) of constant returns to scale in load. While we would have preferred to scale by the number of properties, we found it difficult to obtain sensible coefficients when we adopted that approach.
Model 7 OLS: Model 7 changes only the treatment variable with most other coefficients remaining a similar magnitude to model 6.
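Severn Trent's observation that the unit cost form imposes constant returns to scale in load can be shown directly. The derivation below uses generic notation of our own: C is botex, L is load, X_k are drivers expressed per unit of load (such as the number of STWs or sewer length), and z collects regressors that are already ratios or indicators (such as density or year dummies).

```latex
\ln\!\left(\frac{C}{L}\right) = \alpha + \sum_{k} \beta_k \ln\!\left(\frac{X_k}{L}\right) + \gamma^{\top} z
\quad\Longleftrightarrow\quad
\ln C = \alpha + \Big(1 - \sum_{k} \beta_k\Big)\ln L + \sum_{k} \beta_k \ln X_k + \gamma^{\top} z

% Scale L and every X_k by \lambda > 0, holding z fixed:
\Delta \ln C = \Big(1 - \sum_{k} \beta_k\Big)\ln\lambda + \sum_{k} \beta_k \ln\lambda = \ln\lambda
\quad\Longrightarrow\quad C \mapsto \lambda C
```

Whatever values the β_k take, scaling load and each X_k by the same factor scales predicted cost by that factor, so constant returns in load are assumed by the unit cost form rather than tested.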
Consultation model ID SVTWWW1 SVTWWW2 SVTWWW3 SVTWWW4 SVTWWW5 SVTWWW6 SVTWWW7
Company’s model ID 1 2 3 4 5 6 7
Dependent variable Ln(botex waste) Ln(botex waste per unit load)
Ln(Load) .56*** (.00)
.49*** (.00)
.58*** (.00)
.47** (.02)
.66*** (.00)
Ln(Length/Load) .68 (.14)
.42 (.25)
Ln(Density) 1.07** (.04)
1.05** (.03)
1.06** (.02)
1.15*** (.00)
1.1*** (.00)
1.5*** (.01)
1.31*** (.00)
Ln (No. of STW’s) .34** (.01)
.34** (.02)
.36** (.02)
.39*** (.00)
.35*** (.00)
Ln (No. of STW’s/Load) .28*** (.00)
.32*** (.00)
Ln (Sum of tight BOD and N3 permits)
.09** (.03)
.09** (.03)
.1 (.105)
Ln(Load)^2 -.09 (.28)
.04* (.096)
.01 (.9)
Ln(No. of STW’s)^2 .08 (.67)
.07 (.77)
.2 (.4)
Ln(Load) X Ln(No. of STW’s)
-.32 (.16)
-.27 (.25)
Ln(large tertiary works) .17** (.04)
Prop. of load subject to tight ammonia consent
.25* (.08)
.14 (.56)
Prop. tight BOD load .71
(.23)
Ln(Load)^2 .06
(.11)
Ln(Density)^2 4.95*** (.00)
Dummy 2012 -.02 (.58)
-.01 (.8)
-.02 (.58)
-.02 (.6)
-.01 (.76)
-.02 (.6)
-.03 (.54)
Dummy 2013 -.004 (.00)
.01 (.8)
-.004 (.87)
-.004 (.86)
-.01 (.8)
-.002 (.96)
-.005 (.88)
Dummy 2014 -.002 (.93)
.014 (.65)
-.002 (.9)
-.002 (.9)
-.004 (.88)
-.003 (.91)
-.004 (.89)
Dummy 2015 .003 (.92)
.01 (.7)
.003 (.9)
.003 (.9)
-.01 (.72)
.007 (.79)
.004 (.86)
Dummy 2016 -.006 (.7)
-.00 (.9)
-.006 (.7)
-.001 (.68)
-.006 (.67)
-.004 (.78)
-.004 (.76)
Constant 5.5*** (.00)
5.5*** (.00)
5.49*** (.00)
5.5*** (.00)
5.37*** (.00)
-7.3*** (.00)
-7.3*** (.00)
R2 adjusted .96 .97 .96 .98 .97 .76 .79
Reset test 0.004 0.00 0.00 0.00 0.04 0.001 0.11
VIF (max) 7.7 14 8.4 21.9 7.9 5.1 5.5
Method OLS OLS OLS RE OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60
Template 64. Wholesale wastewater models proposed by South West Water
Description of dependent variable
Modelled OPEX = OPEX – third party – pensions – local authority rates
Modelled base CAPEX = maintenance infra + maintenance non-infra – grants and contributions
Modelled BOTEX = modelled OPEX + modelled base CAPEX
Modelled BOTEX+ (growth) enhancement = modelled BOTEX + first time sewerage + sludge enhancement (growth) + new developments and growth + growth at sewage treatment works + resilience + reduce flooding risk for properties
Modelled TOTEX = modelled BOTEX + other capital expenditure infra +
other capital expenditure non-infra + infrastructure network reinforcement
Unsmoothed net costs from 2011/12 to 2016/17
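The cost definitions above nest within one another: BOTEX is built from modelled OPEX and modelled base CAPEX, and BOTEX+ (growth) and TOTEX each add further lines to BOTEX. The sketch below makes that arithmetic explicit. The class and field names are hypothetical, and the growth enhancement lines are held in a dictionary so that any of the items listed above can be included.

```python
from dataclasses import dataclass, field


@dataclass
class CostLines:
    """Hypothetical cost lines for one company-year (names are illustrative)."""
    opex: float
    third_party: float
    pensions: float
    local_authority_rates: float
    maintenance_infra: float
    maintenance_non_infra: float
    grants_and_contributions: float
    other_capex_infra: float = 0.0
    other_capex_non_infra: float = 0.0
    network_reinforcement: float = 0.0
    # e.g. first time sewerage, sludge enhancement (growth), resilience, ...
    growth_enhancement: dict[str, float] = field(default_factory=dict)


def modelled_opex(c: CostLines) -> float:
    return c.opex - c.third_party - c.pensions - c.local_authority_rates


def modelled_base_capex(c: CostLines) -> float:
    return c.maintenance_infra + c.maintenance_non_infra - c.grants_and_contributions


def modelled_botex(c: CostLines) -> float:
    return modelled_opex(c) + modelled_base_capex(c)


def modelled_botex_plus_growth(c: CostLines) -> float:
    return modelled_botex(c) + sum(c.growth_enhancement.values())


def modelled_totex(c: CostLines) -> float:
    # As defined above, TOTEX adds other capex and network reinforcement to BOTEX.
    return (modelled_botex(c) + c.other_capex_infra
            + c.other_capex_non_infra + c.network_reinforcement)


example = CostLines(
    opex=100.0, third_party=5.0, pensions=3.0, local_authority_rates=12.0,
    maintenance_infra=40.0, maintenance_non_infra=30.0, grants_and_contributions=10.0,
    other_capex_infra=20.0, other_capex_non_infra=10.0, network_reinforcement=5.0,
    growth_enhancement={"first time sewerage": 5.0, "resilience": 2.5},
)
print(modelled_botex(example), modelled_botex_plus_growth(example), modelled_totex(example))
# 140.0 147.5 175.0
```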
Explanatory factors
Data on explanatory factors is taken from the Ofwat industry data-share. Measures of density and sparsity are Ofwat constructed data, using ONS statistics.
Comments on models (South West Water)
We have focused on capturing the key drivers of costs in wholesale wastewater that are operationally robust and statistically valid. We have not, at this stage, examined the appropriateness of different estimation approaches. We do note, however, that some models seem more robust than others and clearly this will have implications for identifying relative efficiency.
The key drivers we have focused on for aggregate wholesale wastewater modelling are:
Scale (properties): there are significant benefits from economies of scale in wastewater services. Properties represents the most appropriate scale driver for aggregate wastewater costs as it captures simultaneously the volume of waste that requires treatment and the size of the network as captured by the number of connections.
Sparsity/economies of scale: the cost of providing wastewater services to a dispersed customer base spread out across a company’s operating area is substantially greater than for serving major urban conurbations. This is most apparent in wastewater treatment, where there are large economies of scale in the size of treatment works (for example, a number of companies have extremely large wastewater treatment works approaching 1,000,000 p.e. and up to 3,000,000 p.e.).
Local environmental sensitivities (tightness of consents): depending on local environmental sensitivities, companies face different costs in treating and disposing of waste. To capture these differences we control for the impact of tight consents as these are outside of management control. (We note, however, that UV consents are only available in the large wastewater dataset, so we have not been able to control for such consents).
Costs of operating and maintaining network assets: the two key asset types we have identified as driving maintenance costs are pumping stations/capacity (driven by the topography and sparsity of the region) and the number of combined sewer overflows (driven by topography and climate).
Holiday population: a large increase in the population in the summer months increases costs over and above treating the same total wastewater flows in a steady state due to the need to build peak capacity and to ramp up and down the treatment process (manifesting as higher chemical and energy use and more maintenance needs).
We have explored specifications which capture several of these factors, although due to the nature of the data it is not possible to combine all factors into one model. All models (i) capture scale, (ii) pumping costs and (iii) tightness of consents.
Models 1, 2, 5, 6, 9 and 10 also use a metric of population density or sparsity to capture the impact of serving populations in remote rural locations. When we include a density measure we find it has a negative effect (in contrast to a positive effect in water) due to the beneficial economies of scale in treatment works discussed above. These metrics fall entirely outside management control, and so can be regarded as an entirely exogenous driver of costs. However, as a fairly generic index of population sparsity it may not capture differences in sewage collection, treatment and bioresources that are explained by factors other than population density, such as topography. These factors constrain economic transport distances and rationalisation potential as well as increasing the unit cost of operational and maintenance activities, even where population density is similar.
As such, Models 3, 7 and 11 use the number of sewage treatment works to more directly capture the incremental additional costs of transporting sewage to a greater number of small sewage treatment works, the economies of scale that companies with fewer larger treatment works serving large urban centres are able to achieve, and the additional bioresources costs that result from having many dispersed sewage treatment works and sludge treatment centres.
As an alternative, Models 4, 8 and 12 use the ratio of non-resident to resident population to control for the impact of the large variation in flows that areas with more holiday population face. This metric falls entirely outside of management control and so can be regarded as an exogenous driver of costs.
We have extended our aggregate BOTEX modelling to models controlling for BOTEX + growth enhancement and TOTEX. We have used the same BOTEX drivers as in our aggregate BOTEX models, as the regional operating characteristics increasing or decreasing BOTEX are also likely to affect the cost of delivering many enhancement solutions. We were not able to include direct measures of differences in the amount of growth or quality enhancement within our econometric modelling.
While these models do not include an enhancement specific driver, they do meet many of the statistical criteria set out by Ofwat (see below). As can be seen from the efficiency range charts, while modelling BOTEX+ (growth) does not widen the efficiency ranges, including quality enhancement to model TOTEX does lead to somewhat broader efficiency ranges.
As for wholesale water models, we would recommend that BOTEX+ (growth) and TOTEX modelling approaches are explored to the fullest possible extent at PR19.
All of the BOTEX models estimate statistically significant coefficients which are supported from an operational and economic perspective. The relationship between cost and cost drivers in BOTEX+ (growth) and TOTEX models is broadly similar to that estimated in BOTEX models, although not all coefficients pass statistical significance tests. We would note that most models considered have significant coefficients on tightness of consents and on one or both of pumping capacity and a measure of sparsity/economies of scale. This would suggest that these drivers have the strongest statistical relationship with cost.
Given our focus on modelling what we consider to be key industry drivers of cost, we have not explored estimation approaches beyond OLS with robust standard errors. We will be considering the most appropriate estimation approaches as part of our consultation response.
All models are broadly robust from a statistical perspective.
Adjusted R2 is sufficiently high.
VIF (a measure of collinearity) is well below the ‘rule of thumb’ threshold of 10.
We find mixed evidence from the RESET test on whether the model would be improved by the addition of polynomial terms, i.e. whether, given the control variables, the model is mis-specified.
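The RESET results quoted throughout are tests of whether powers of a model's fitted values add explanatory power. The Python sketch below computes the RESET F statistic under stated assumptions: it adds the squares and cubes of the fitted values (implementations differ in how many powers they add and in whether they use robust variance estimates), and it returns the statistic with its degrees of freedom rather than a p-value. A p-value would be the upper tail of an F(q, df_resid) distribution, obtainable for example with `scipy.stats.f.sf`. Function names and data are hypothetical.

```python
import numpy as np


def _ols_fit(y: np.ndarray, Z: np.ndarray) -> tuple[float, np.ndarray]:
    """Return the sum of squared residuals and fitted values from OLS of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ coef
    resid = y - fitted
    return float(resid @ resid), fitted


def reset_f_stat(y: np.ndarray, X: np.ndarray, max_power: int = 3) -> tuple[float, int, int]:
    """Ramsey RESET F statistic using powers 2..max_power of the fitted values.

    Returns (F, q, df_resid): q = max_power - 1 added terms, and df_resid is
    the residual degrees of freedom of the augmented regression.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])        # original model with intercept
    ssr_restricted, fitted = _ols_fit(y, Z)
    powers = [fitted ** p for p in range(2, max_power + 1)]
    Z_aug = np.column_stack([Z, *powers])        # augmented model
    ssr_unrestricted, _ = _ols_fit(y, Z_aug)
    q = max_power - 1
    df_resid = n - Z_aug.shape[1]
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)
    return f_stat, q, df_resid


# Hypothetical data: the second outcome has curvature the linear model omits.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, size=60)
noise = rng.normal(0.0, 0.1, size=60)
y_linear = 1.0 + 2.0 * x + noise
y_curved = 1.0 + 2.0 * x + 1.5 * x**2 + noise
f_lin, q, df = reset_f_stat(y_linear, x)
f_cur, _, _ = reset_f_stat(y_curved, x)
print(round(f_lin, 2), round(f_cur, 1), q, df)
```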
Consultation model ID SWBWWW1 SWBWWW2 SWBWWW3 SWBWWW4 SWBWWW5 SWBWWW6 SWBWWW7 SWBWWW8 SWBWWW9 SWBWWW10 SWBWWW11 SWBWWW12
Company’s model ID 1 2 3 4 5 6 7 8 9 10 11 12
Dependent variable Aggregate BOTEX (ln) Aggregate BOTEX+ (growth) (ln) Aggregate TOTEX (ln)
Properties (ln) 0.619*** (0.000)
0.751*** (0.000)
0.752*** (0.000)
0.920*** (0.000)
0.606*** (0.000)
0.708*** (0.000)
0.722*** (0.000)
0.897*** (0.000)
0.688*** (0.000)
0.809*** (0.000)
0.702*** (0.000)
0.974*** (0.000)
Pumping capacity over mains (ln)
0.095** (0.019)
0.120*** (0.006)
0.124*** (0.001)
0.125*** (0.004)
0.036 (0.396)
0.045 (0.332)
0.064 (0.107)
0.046 (0.327)
0.103** (0.050)
0.146*** (0.010)
0.127*** (0.005)
0.119** (0.027)
Proportion of load with BOD<10mg/L and amm<1mg/L
2.290*** (0.000)
1.744*** (0.000)
1.400*** (0.000)
0.248 (0.523)
2.170*** (0.000)
1.886*** (0.000)
1.438*** (0.000)
0.268 (0.403)
2.619*** (0.000)
1.850*** (0.000)
1.551*** (0.000)
0.721 (0.119)
Number of combined sewer overflows per km sewer (ln)
0.0625 (0.277)
0.129* (0.074)
0.081 (0.205)
0.037 (0.419)
0.094* (0.089)
0.041 (0.351)
0.104 (0.139)
0.155** (0.028)
0.113* (0.086)
Proportion of area with more than 2,000 people per km2
-0.661*** (0.000)
-0.536*** (0.000)
-0.565*** (0.003)
Proportion of area with less than 250 people per km2
0.457*** (0.002)
0.475*** (0.002)
0.192 (0.376)
Number of treatment works per property (ln)
0.144*** (0.006)
0.112** (0.024)
0.0102 (0.864)
Ratio of non-resident to resident population
0.037*** (0.001)
0.042*** (0.000)
0.039*** (0.006)
Constant 0.810** (0.049)
-0.428 (0.349)
-0.322 (0.416)
-1.527** (0.011)
0.983** (0.011)
-0.0519 (0.902)
0.0321 (0.938)
-1.293** (0.017)
0.631 (0.385)
-0.366 (0.549)
0.174 (0.742)
-1.603** (0.029)
R2 adjusted 0.930 0.916 0.915 0.920 0.938 0.932 0.929 0.939 0.917 0.905 0.898 0.915
Reset test 0.016 0.000 0.000 0.000 0.108 0.000 0.000 0.000 0.003 0.000 0.002 0.001
VIF (max) 7.743 6.336 4.556 8.892 7.743 6.336 4.556 8.892 7.743 6.336 4.556 8.892
Method OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS OLS
N (sample size) 60 60 60 60 60 60 60 60 60 60 60 60
Template 65. Wholesale wastewater models proposed by United Utilities
Description of dependent variable
Model 7’s dependent variable is wastewater botex, which includes selected enhancement expenditure.
Botex has been derived by subtracting total enhancement expenditure (table 9, line 36), business rates (table 8 line 8) and third party services (table 8 lines 10 and 18) from net totex (table 8 line 21) for each of the respective value chains.
The dependent variable for these models has been adjusted to include selected enhancement expenditure. Enhancement areas that are substitutable with base costs can be integrated with base cost models. In some areas, companies can achieve a service outcome either through spending on enhancement or through more intensive operation or maintenance of their existing assets. Where this is the case, merging relevant enhancement lines into base cost may be expected to improve the explanatory power of base cost models, especially where the base models include explanatory factors that are causally related to the enhancement lines.
The dependent variable includes expenditure associated with NEP - Event Duration Monitoring at intermittent discharges, NEP - Monitoring of pass forward flows at CSOs, Odour, New development and growth, Growth at sewage treatment works (excluding sludge treatment), Resilience, SEMD, Reduce flooding risk for properties and Transferred private sewers and pumping stations.
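United Utilities' derivation of botex from the cost assessment tables, and the addition of the selected enhancement lines described in the paragraphs above, can be written as the following sketch. It assumes the inputs for one value chain are supplied as dictionaries keyed by table line number; the function names are hypothetical, and it omits the smoothing of base capex and the log transformation, whose details are not set out here.

```python
def uu_botex(table8: dict[int, float], table9: dict[int, float]) -> float:
    """Botex for one value chain, keyed by cost assessment table line number.

    botex = net totex (table 8 line 21)
            - total enhancement (table 9 line 36)
            - business rates (table 8 line 8)
            - third party services (table 8 lines 10 and 18)
    """
    return table8[21] - table9[36] - table8[8] - (table8[10] + table8[18])


def uu_dependent_variable(table8: dict[int, float], table9: dict[int, float],
                          selected_enhancement: dict[str, float]) -> float:
    """Botex plus the selected enhancement lines treated as substitutable with base costs."""
    return uu_botex(table8, table9) + sum(selected_enhancement.values())


# Hypothetical values for illustration only.
t8 = {21: 1000.0, 8: 80.0, 10: 15.0, 18: 5.0}
t9 = {36: 300.0}
selected = {"Odour": 10.0, "Resilience": 25.0}
print(uu_botex(t8, t9), uu_dependent_variable(t8, t9, selected))  # 600.0 635.0
```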
The dependent variable includes smoothed base capex, which reduces the impact of year-to-year spikes in expenditure.
For all models, the dependent variable is entered in logged form and is expressed in 2012/13 CPIH FYA prices.
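The derivation above can be sketched in Python. This is a minimal illustration, assuming each cost assessment table line is available as a single figure for one company and year, already expressed in 2012/13 CPIH FYA prices. The class, field and function names are hypothetical and the figures are invented; the capex smoothing step is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class CostLines:
    """Hypothetical container for one company-year of cost assessment data (£m)."""
    net_totex: float             # table 8, line 21
    total_enhancement: float     # table 9, line 36
    business_rates: float        # table 8, line 8
    third_party_services: float  # table 8, lines 10 and 18 combined


def botex(lines: CostLines) -> float:
    """Net totex less total enhancement, business rates and third party services."""
    return (lines.net_totex
            - lines.total_enhancement
            - lines.business_rates
            - lines.third_party_services)


def dependent_variable(lines: CostLines, selected_enhancement: dict[str, float]) -> float:
    """ln(botex + selected enhancement lines), as described for Model 7 (sketch only)."""
    total = botex(lines) + sum(selected_enhancement.values())
    if total <= 0:
        raise ValueError("cost must be positive before taking logs")
    return math.log(total)


# Invented figures for illustration only.
example = CostLines(net_totex=500.0, total_enhancement=120.0,
                    business_rates=30.0, third_party_services=10.0)
print(botex(example))  # 340.0
print(dependent_variable(example, {"Odour": 5.0, "Resilience": 15.0}))
```

Only the enhancement lines listed above are added back; any other enhancement spend stays excluded because it was already removed in the botex step.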
Comments on models (United Utilities)
See United Utilities’ comments on wastewater collection models.
Consultation model ID UUWWW1
Company’s model ID 7
Dependent variable ln(Wastewater botex + selected enhancement expenditure)
Log(total load received) 0.879*** (0)
% of load received by WwTW bands 1 to 3 6.249* (0.097)
% population living in urban area (Arup/Vivid) 1.111 (0.258)
% of load received by WwTW with tertiary treatment 0.296 (0.162)
2012-13 dummy 0.027 (0.354)
2013-14 dummy 0.043** (0.026)
2014-15 dummy 0.057* (0.065)
2015-16 dummy -0.003 (0.917)
2016-17 dummy 0.037 (0.355)
Constant -6.857*** (0.005)
R2 adjusted 0.939
VIF (max) 6.89
Reset test 0.002
Estimation method OLS
N (sample size) 60
Template 66. Wholesale wastewater models proposed by Welsh Water
Description of dependent variable
Wastewater Botex = “Total Operating Expenditure” – “Third Party Services” – “Local authority and Cumulo rates” + “Maintaining the long term capability of the assets – infra” + “Maintaining the long term capability of the assets - non-infra”
2016-17 Cost Assessment Table 8 References:
Wastewater Botex = Line 11 – Line 10 – Line 8 + Line 12 + Line 13
Values rebased to 2016/17 using CPIH in line with the PR19 Methodology Statement.
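The line-reference formula and the rebasing step can be sketched as below. This assumes each table 8 line is a single nominal £m figure for one company-year; the keyword names `line8` to `line13` are hypothetical labels for the table lines cited above, and the index values are invented rather than actual CPIH figures. Which CPIH series (annual or financial year average) was used is not stated here.

```python
def wastewater_botex(line8: float, line10: float, line11: float,
                     line12: float, line13: float) -> float:
    """Line 11 - Line 10 - Line 8 + Line 12 + Line 13 of 2016-17 table 8.

    line11: total operating expenditure
    line10: third party services
    line8:  local authority and cumulo rates
    line12: maintaining the long term capability of the assets, infra
    line13: maintaining the long term capability of the assets, non-infra
    """
    return line11 - line10 - line8 + line12 + line13


def rebase(value: float, index_in_cost_year: float, index_in_base_year: float) -> float:
    """Restate a nominal value in base-year prices using a price index such as CPIH."""
    return value * index_in_base_year / index_in_cost_year


# Illustrative figures only (£m and index points are invented, not actual data).
nominal = wastewater_botex(line8=20.0, line10=5.0, line11=250.0, line12=90.0, line13=60.0)
print(nominal)  # 375.0
print(rebase(nominal, index_in_cost_year=98.0, index_in_base_year=100.0))
```

Rebasing multiplies each year's cost by the ratio of the base-year index to that year's index, so costs from earlier, lower-priced years are scaled up to 2016/17 prices.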
Comments on models (Welsh Water)
The submitted botex models aim to capture the key cost drivers for the industry. The models include a scale variable (the number of connected properties) alongside variables that capture density, treatment complexity and drivers of maintenance.
The model's estimated coefficients have the expected sign and magnitude and are statistically significant. The model has a sufficiently high R2 and is robust to outliers. The model does not pass the model specification test (RESET) at the specified significance level. Given the relatively small sample for the wastewater industry, coupled with relatively stable cost drivers over time, we have placed importance on the model's consistency and on its interpretability from an economic perspective.
Consultation model ID WSHWWW1
Company’s model ID 5
Dependent variable Ln(Wastewater Botex)
Ln(Connected Properties) (000s) 0.755*** (0)
Ln(Pumping Station Capacity per km of sewer) (kW/km) 0.120** (0.0439)
Ln(Number of combined sewer overflows per km of combined sewer) (nr/km) 0.124 (0.321)
% load with BOD<10mg/L and Ammonia <1 mg/L 3.541*** (0.00998)
% of area with less than 250 people per km2 0.437* (0.0685)
Constant -0.423 (0.490)
R2 adjusted 0.929
VIF (max) 6.336
Reset test 0.000
Estimation method OLS
N (sample size) 60
Template 67. Wholesale wastewater models proposed by Yorkshire Water
Description of dependent variable
Wholesale wastewater base costs = operating expenditure less third party services and local authority rates + capital maintenance expenditure net of grants and contributions (G&C).
Modelled TOTEX(Growth) = modelled BOTEX + modelled growth enhancement expenditure.
Modelled BOTEX = OPEX less third party services, pension deficit recovery payments and local authority rates, plus capital maintenance net of grants and contributions.
Modelled Growth enhancement expenditure = expenditure of “first time sewerage” + “New development and growth” + “Sludge enhancement (growth)” + “Growth at sewage treatment works (excluding sludge treatment)” + “Resilience” + “Reduce flooding risk for properties”.
The dependent variables are deflated using CPIH to 2016/17 prices. No smoothing was undertaken.
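The two definitions above can be combined in a short sketch. It assumes each component is a single £m figure for one company-year and that enhancement expenditure is available as a mapping from line name to value; the function and constant names are hypothetical and the example figures are invented.

```python
GROWTH_ENHANCEMENT_LINES = (
    "first time sewerage",
    "New development and growth",
    "Sludge enhancement (growth)",
    "Growth at sewage treatment works (excluding sludge treatment)",
    "Resilience",
    "Reduce flooding risk for properties",
)


def modelled_botex(opex, third_party, pension_deficit, la_rates,
                   capital_maintenance, grants_and_contributions):
    """OPEX less the listed exclusions, plus capital maintenance net of G&C."""
    opex_base = opex - third_party - pension_deficit - la_rates
    return opex_base + (capital_maintenance - grants_and_contributions)


def modelled_totex_growth(botex, enhancement):
    """Modelled BOTEX plus the six growth-related enhancement lines.

    `enhancement` maps enhancement line names to expenditure; lines not in
    GROWTH_ENHANCEMENT_LINES are ignored, and a missing growth line raises.
    """
    missing = [name for name in GROWTH_ENHANCEMENT_LINES if name not in enhancement]
    if missing:
        raise KeyError(f"missing growth lines: {missing}")
    return botex + sum(enhancement[name] for name in GROWTH_ENHANCEMENT_LINES)


# Invented figures for illustration only.
base = modelled_botex(300.0, 10.0, 5.0, 25.0, 200.0, 15.0)
spend = {name: 1.0 for name in GROWTH_ENHANCEMENT_LINES}
spend["Odour"] = 100.0  # not a growth line, so excluded
print(base)                                # 445.0
print(modelled_totex_growth(base, spend))  # 451.0
```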
Comments on models (Yorkshire Water)
The Aggregated BOTEX models are similar to the TOTEX (Growth) models, with the exclusion of the growth enhancement driver. General limitations and modelling observations highlighted under TOTEX (growth) are applicable here.
The models appear to estimate coefficients of the right sign and appropriate magnitude, and are robust to the various statistical tests.
Given the lack of a split of G&C between capital maintenance and enhancement expenditure, we have also modelled CAPEX on a gross basis. The statistical performance is broadly consistent with and without G&C.
The TOTEX (growth) models aim to explain variations in BOTEX through variation in scale, pumping requirements, treatment complexity, density/sparsity, maintenance drivers and a growth enhancement driver (properties growth).
CSOs per km of sewer were controlled for in model 1, and the proportion of combined sewers in models 2, 3 and 4 as an alternative. From an operational point of view, CSOs rather than combined sewers might be a more appropriate driver of maintenance costs. Having said that, the estimated coefficient on this driver (model 1) appears to be statistically insignificant. While this may be due to a data paucity issue, the impact of this driver in the model appears less clear.
The density thresholds have an impact on the models (from a statistical as well as an operational point of view). Density measure 1 (2,000 people and above) might be more appropriate if variation across the industry is important. For density measure 2 (4,000 people and above), the data show variation only for Anglian, Severn Trent, Southern, Thames, United Utilities and Wessex. While we have explored both thresholds (and other density measures), the operational rationale for specific thresholds remains unclear.
The models appear to estimate coefficients of the right sign and appropriate magnitude, and are broadly robust to the various statistical tests.
Consultation model ID YKYWWW1 YKYWWW2 YKYWWW3 YKYWWW4
Company’s model ID 1 2 3 4
Dependent variable ln (Wastewater Aggregate BOTEX)
Total number of properties (log) (000s) 0.619*** (0.000) 0.834*** (0.000) 0.765*** (0.000) 0.779*** (0.000)
% of load with BOD<10mg/L and amm<1mg/L 4.581*** (0.002) 1.569** (0.0167) 2.579*** (0.006) 2.923*** (0.005)
Pumping station capacity per km sewer (log) (kW/km) 0.0954* (0.088) 0.263*** (0.001) 0.217*** (0.006) 0.218*** (0.000)
Number of combined sewer overflows per km of sewer (log) (nr/km) 0.0625 (0.523)
% of area with more than 2000 people per km2 -0.661** (0.013) -0.242 (0.254)
% of area with more than 4000 people per km2 -0.544*** (0.001)
% of combined sewers 0.714*** (0.001) 0.576*** (0.004) 0.407** (0.017)
Constant 0.810 (0.175) -1.424** (0.024) -0.794 (0.138) -0.875* (0.090)
R2 adjusted 0.930 0.942 0.942 0.948
VIF (max) 7.743 4.810 10.34 6.435
Reset test 0.016 0.044 0.047 0.046
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
Template 68. Wholesale wastewater plus models proposed by Yorkshire Water
Consultation model ID YKYWWWP5 YKYWWWP6 YKYWWWP7 YKYWWWP8
Company’s model ID 1 2 3 4
Dependent variable Agg. BOTEX (Growth)
Total number of properties (log) (000s) 0.616*** (0.000) 0.786*** (0.000) 0.716*** (0.000) 0.735*** (0.000)
Pumping station capacity per km sewer (log) (kW/km) 0.032 (0.550) 0.162** (0.020) 0.116 (0.127) 0.123** (0.022)
Number of combined sewer overflows per km of sewer (log) (nr/km) 0.043 (0.613)
% load with BOD<10mg/L and amm<1mg/L 4.328*** (0.006) 1.947** (0.017) 2.949** (0.022) 3.155*** (0.008)
% of area with more than 2000 people per km2 -0.529** (0.050) -0.240 (0.375)
% of area with more than 4000 people per km2 -0.488* (0.098)
% of sewers that are combined sewers 0.534** (0.015) 0.397* (0.070) 0.259 (0.157)
% growth in number of properties 3.075 (0.297) 3.061 (0.259) 2.964 (0.256) 2.082 (0.503)
Constant 0.909 (0.190) -0.821 (0.225) -0.193 (0.801) -0.316 (0.569)
R2 adjusted 0.938 0.943 0.944 0.948
VIF (max) 7.856 4.827 10.34 6.481
Reset test 0.0626 0.0156 0.0324 0.0197
Estimation method OLS OLS OLS OLS
N (sample size) 60 60 60 60
3 Retail models
3.1 Bad debt models
Template 69. Retail bad debt models proposed by Ofwat
Description of dependent variables
Bad debt plus debt management costs per household
The denominator, household, is the total number of connected households receiving either water only, wastewater only or dual services.
Comments on models
The two main variables in our debt per household models are average bill size and a proxy for the propensity to default.
We used three proxies for the propensity to default:
1. Percentage of households with default (eq_lpcf62)
2. Credit risk score derived from all Insight data (eq_rgc102). A higher credit score means a lower risk of default, so we expected a negative coefficient, which is what we estimated.
3. The proportion of people experiencing income deprivation.
The first two variables were provided by United Utilities https://www.unitedutilities.com/corporate/about-us/our-future-plans/looking-to-the-future/ (see retail cost assessment) and are sourced from Equifax. The last variable (income deprivation domain) is sourced from the ONS (DCLG) and the Welsh Government.
The proportion of people in England and Wales experiencing income deprivation is calculated for each country. The criteria for the English and Welsh income deprivation measures are broadly similar, covering income related benefits, tax credit recipients and supported asylum seekers so we have combined the measures to obtain data for England and Wales.
The results were corroborated using other deprivation measures, such as the unemployment rate and the number of mortgage repossessions. The estimated coefficients implied a similar effect, although their statistical significance was slightly lower.
Models 3 and 4 include the total number of households as an additional explanatory variable to capture economies of scale. There is some evidence of economies of scale in models 3 and 4.
Bad debt costs could be affected by the different accounting policies adopted by companies, which could have a large effect on reported annual costs that is unrelated to the underlying cost drivers. For this reason, we present model 6, in which we averaged the data over the four-year period to smooth year-on-year volatility in reported costs.
All monetary values have been inflated to 2016-17 prices using the CPIH.
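The averaging used for model 6 can be sketched as collapsing the panel to one observation per company, consistent with the smaller sample reported for ORDC6. This assumes the average is taken over the raw values of each variable before logging, which the text does not state; the function name and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_over_period(panel):
    """Collapse (company, year, value) rows to one average value per company."""
    values_by_company = defaultdict(list)
    for company, _year, value in panel:
        values_by_company[company].append(value)
    return {company: mean(values) for company, values in values_by_company.items()}


# Hypothetical bad debt cost per household (£) for two companies.
panel = [
    ("Company A", "2013-14", 10.0), ("Company A", "2014-15", 12.0),
    ("Company A", "2015-16", 11.0), ("Company A", "2016-17", 15.0),
    ("Company B", "2013-14", 6.0), ("Company B", "2014-15", 8.0),
]
print(average_over_period(panel))  # {'Company A': 12.0, 'Company B': 7.0}
```

The same function would be applied to each explanatory variable so that every company contributes a single row to the regression.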
Consultation model ID ORDC1 ORDC2 ORDC3 ORDC4 ORDC5 ORDC6
Dependent variable --- ln(bad debt per household) --- sample avg
Ln(number of households) -0.128* (0.083) -0.032 (0.629) -0.053 (0.601)
Ln(bill size) 1.160*** (0.000) 1.138*** (0.000) 1.341*** (0.000) 1.183*** (0.000) 1.095*** (0.000) 1.168*** (0.000)
HHs with default (%) (Eq_lpcf62) 0.050*** (0.006) 0.068*** (0.004)
Income deprivation domain (%) 0.058** (0.032)
Credit risk score (Eq_rgc102) -0.032** (0.034) -0.034** (0.034) -0.036* (0.067)
Constant -5.479*** 0.393 -5.204*** 0.888 -4.580*** 1.467
R2 adjusted 0.79 0.773 0.803 0.771 0.774 0.789
VIF (max) 1.03 1.078 2.843 2.152 1.178 2.221
Reset test 0.146 0.257 0.153 0.352 0.018 0.477
Estimation method OLS OLS OLS OLS OLS OLS
N (sample size) 71 71 71 71 71 17
Template 70. Retail bad debt models proposed by Anglian Water
Description of dependent variable
All models are described in detail in our Cost Modelling report – Phase 2, published March 2018: http://www.anglianwater.co.uk/about-us/thinking-about-our-future/
Description of selected explanatory variables
Deprivation measure – 80th percentile for IMD with billing used as weight
Comments on models (Anglian Water)
Doubtful debt and debt management costs are expected to be a function of:
Average bill size
Customer numbers
Deprivation
Regional unemployment
Regional wages
Consultation model ID ANHRDC1
Company’s model ID 2
Dependent variable Doubtful debt & debt management
Ln(Average bill size) 0.26** (0.050)
Ln(Revenue2) 0.096*** (0.000)
Deprivation measure 0.762* (0.055)
Time trend -0.030 (0.227)
Constant -1.870** (0.018)
R2 adjusted 0.9564
Reset test 0.0005
VIF (max) 3.21
Method OLS
N (sample size) 89
Template 71. Retail bad debt models proposed by United Utilities
Description of dependent variable
Natural logarithm of bad debt costs per household, where bad debt costs are debt management plus doubtful debt.
Comments on models (United Utilities)
Households are counted as one regardless of whether they receive one or two services; this is not a unique customer measure. These models capture economies of scope through the bill size independent variable.
We have found bill size and deprivation to be significant drivers of bad debt cost.
These models all perform well in diagnostic tests not included in the pro-forma, including the Linktest and Shapiro-Wilk test.
Predicted IMD was constructed using a range of factors also provided by Equifax. More details can be found in Reckon LLP (2017) “Capturing deprivation and arrears risk in household retail cost assessment”.
All models are discussed in more detail in Reckon LLP (2018) published alongside this consultation.
Price base is in 2017 CPI terms.
Consultation model ID UURDC1 UURDC2
Company’s model ID BD1_d3 BD1_d5
Dependent variable Ln(bad debt per household)
Revenue per household (£/household) 1.142*** (0.00) 1.115*** (0.00)
Deprivation measure (Units vary across measures) 1.204* (0.085) 3.001** (0.024)
2014 dummy 0.157* (0.076) 0.159* (0.072)
2015 dummy 0.204** (0.038) 0.195** (0.042)
2016 dummy 0.136 (0.11) 0.121 (0.138)
Constant -4.287*** -4.553***
R2 adjusted 0.771 0.786
Reset test 0.428 0.393
VIF (max) 1.54 1.37
Estimation method OLS OLS
N (sample size) 71 71
Template 72. Retail bad debt models proposed by Severn Trent Water
Description of dependent variable
Doubtful debt plus Debt management costs
Description of selected explanatory variables
Bill to income ratio – average bill (total revenue/number of connected households) divided by weekly earnings. In the models, the earnings measure is the average weekly earnings of the lowest-decile earners in the region.
Proportion of private rental properties - proportion of connected households that are rented.
Equifax credit risk (XPCF2) - Equifax “Partial Insight Postcode Event” – Average number of partial insight accounts or county court judgements per household.
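The bill to income ratio defined above can be sketched as follows. This assumes revenue is an annual figure, so the ratio reads as the number of weeks of lowest-decile earnings needed to pay the average annual bill; the function name and inputs are hypothetical and invented.

```python
import math


def bill_to_income_ratio(total_revenue, connected_households, weekly_earnings_p10):
    """Average bill divided by lowest-decile average weekly earnings in the region."""
    average_bill = total_revenue / connected_households
    return average_bill / weekly_earnings_p10


# Invented inputs: £400m revenue, 1m connected households, £250 weekly earnings.
ratio = bill_to_income_ratio(400e6, 1e6, 250.0)
print(ratio)            # 1.6
print(math.log(ratio))  # the regressor enters the model in logs
```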
Comments on models (Severn Trent Water)
All of the coefficients in the model seem reasonable and are in line with expectations.
Consultation model ID SVTRDC1
Company’s model ID 2
Dependent variable Ln(Debt costs)
Ln (connected customers) 0.82*** (0.000)
Proportion metered -0.56** (0.000)
Ln(Bill to income ratio(10th percentile)) 1.34*** (0.000)
Proportion private rental property 0.05*** (0.01)
Equifax credit risk (XPCF2) 0.51** (0.00)
Constant 0.91 (0.13)
R2 adjusted 0.97
Reset test 0.2
VIF (max) 3.4
Estimation method OLS
N (sample size) 71
Template 73. Retail bad debt models proposed by South West Water
Description of dependent variable
Bad debt includes doubtful debt and debt management.
All costs are outturn and are not smoothed.
Description of selected explanatory variable
Deprivation: DCLG and Welsh government statistics
Council tax default rate: DCLG data on council tax collection rates, by local authority (2013/14–2016/17)
Prepayment: Ofwat data release for years 2013/14–2014/15, assumed to remain at the same levels over AMP6
Comments on retail bad debt models (South West Water)
We have focused on capturing the effect of three key cost drivers:
scope, the number of dual customers a company serves;
bill size, which increases a company’s exposure to customers defaulting; and
deprivation, which increases the propensity of customers to default.
We have used income deprivation, collected by the ONS for England and Wales, to capture deprivation levels. We have also explored an alternative deprivation driver, the proportion of council tax defaults. Our bad debt models also include bill size and the Ofwat measure of unique customers from PR14 (the sum of all single customers + 1.3 × the sum of dual customers).
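The PR14 unique customer measure quoted above is a weighted sum, sketched below. The weight of 1.3 on dual service customers is taken from the text; the function name and customer counts are hypothetical, and the model enters the measure in logs.

```python
import math

DUAL_SERVICE_WEIGHT = 1.3  # PR14 weighting of dual service customers


def unique_customers(single_service, dual_service):
    """Single service customers plus 1.3 times dual service customers."""
    return single_service + DUAL_SERVICE_WEIGHT * dual_service


# Invented counts: 600,000 single service and 1,000,000 dual service customers.
measure = unique_customers(600_000, 1_000_000)
print(measure)            # 1900000.0
print(math.log(measure))  # regressor used in the bad debt models
```

The weight means a dual service customer counts for more than a single service customer but less than two, reflecting some shared cost of serving both services.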
Given our focus on modelling what we consider to be key industry drivers of cost, we have not explored estimation approaches beyond OLS with robust standard errors. We will be considering the most appropriate estimation approaches as part of our consultation response.
All models are broadly robust from a statistical perspective.
The RESET test does not suggest that the model is misspecified.
Modelling retail BOTEX or retail OPEX + depreciation has little impact on the model specification or efficiency ranges.
Modelling for 4 years (2013/14-16/17) or for AMP6 only (2015/16-16/17) has little impact on the model specification or efficiency ranges.
Ofwat comment: model 1 has also been proposed by Yorkshire Water and South Staffs Water.
Consultation model ID YKYSSCSWBRDC1 SWBRDC2
Company’s model ID Model 1 Model 2
Dependent variable Bad debt (ln)
Unique customers, Ofwat constructed measure (ln) 0.911*** (0.000) 0.896*** (0.000)
Average combined bill (ln) 0.930*** (0.000) 0.974*** (0.000)
Income deprivation (ln) 0.841*** (0.003)
Council tax default rate (%) 0.198** (0.018)
Constant 6.212*** (0.000) 3.823*** (0.000)
R2 adjusted 0.950 0.949
Reset test 0.601 0.287
VIF (max) 2.96 3.381
Estimation method OLS OLS
N (sample size) 68 68
Template 74. Retail bad debt models proposed by Thames Water
Description of dependent variable
Bad debt costs include doubtful debt and debt management.
Description of selected explanatory variables
Income deprivation AHC - measure of after-housing-costs (AHC) income deprivation established by combining ONS data on (before-housing-costs) income deprivation and HBAI data on AHC income deprivation (ONS, Department for Work and Pensions).
Total internal migration - propensity of people to migrate from/to UK local authorities, sum of inflows and outflows (ONS).
Total international migration - propensity of people to migrate from/to UK local authorities and abroad, sum of inflows and outflows (ONS).
Comments on models (Thames Water)
Transience, in our experience, is a key driver of our bad debt costs (and, to a lesser extent, of customer service costs). The level of transience varies greatly across England and Wales. According to our analysis, transience is 20% higher for Thames Water than for any other company.
The model output below shows that transience is a robust driver in some models of bad debt costs.
We consider that transience should be part of the mix of explanatory variables that Ofwat has regard to in developing its models of bad debt costs.
We refer to Economic Insight’s transience report (available on Water UK’s marketplace of ideas) for a more in-depth discussion of this matter.
Key findings
Both single and dual service customers are significant in models 1 to 4.
Total customers are significant in models 5 and 6; single service customers were dropped from these models on the grounds of insignificance.
IMD income and wholesale bill variables are significant in all models in which they are included.
The significance of the aforementioned variables across models strongly suggests that customer numbers, IMD income and wholesale bill variables should be included as drivers in models of bad debt costs.
The choice of transience measure (internal or international) has only a small impact on R2 and on the transience variable's level of significance. This likely reflects the high degree of correlation between the two measures.
Internal transience is significant at the 1% level in model 1, international transience is significant at 1% level in model 2 and at 5% level in model 6.
Transience is not significant although it has the expected sign in models 3, 4 and 5.
Model 7 – Should a measure of deprivation account for housing costs?
Deprivation is a key driver of bad debt costs. This reflects that the risk of arrears is considerably greater for deprived customers. Models 1-6 use the IMD income measure published by the ONS. It measures the proportion of a local authority’s population with an income of <60% of the UK median income.
The IMD does not account for housing costs. We consider that a measure of the likelihood of households getting into arrears should account for housing costs. Housing costs account for a large proportion of household expenditure and are particularly high in London.
In model 7 we replaced the IMD with AHC, a deprivation measure that we developed by combining data on IMD income with HBAI data on AHC income deprivation. The AHC is statistically significant. Its lower coefficient relative to that of the IMD is a consequence of the proportion of deprived households being higher when accounting for housing costs.
Data on IMD income are published by ONS at the level of local authorities; data on AHC income deprivation are published by DWP, but only at the regional level.
Consultation model ID TMSRDC1 TMSRDC2 TMSRDC3 TMSRDC4 TMSRDC5 TMSRDC6 TMSRDC7
Company's model ID 1 2 3 4 5 6 7
Dependent variable Ln(bad debt related retail operating costs)
Ln(single service customers) 0.535*** (0.000) 0.513*** (0.000) 0.442*** (0.002) 0.482*** (0.000) 0.427*** (0.006)
Ln(dual service customers) 0.121*** (0.000) 0.119*** (0.000) 0.196*** (0.002) 0.183*** (0.002) 0.146*** (0.013)
Ln(total customers) 0.944*** (0.000) 0.919*** (0.000)
IMD income (ONS) (%) 0.189*** (0.000) 0.150*** (0.000) 0.168*** (0.002) 0.133*** (0.008) 0.082*** (0.000) 0.072*** (0.000)
Income AHC (%) 0.106*** (0.003)
Ln(wholesale bill) 1.744*** (0.000) 1.752*** (0.000) 1.188*** (0.003) 1.258*** (0.002) 1.153*** (0.000) 1.189*** (0.000) 1.532*** (0.000)
Total internal migration (%) 0.091*** (0.001) 0.101 (0.109) 0.030 (0.145)
Total international migration (%) 0.291*** (0.001) 0.160 (0.320) 0.131** (0.014)
Constant -14.37*** (0.000) -13.21*** (0.000) -10.95*** (0.000) -10.22*** (0.000) -11.91*** (0.000) -11.65*** (0.000) -11.14*** (0.000)
R2 adjusted 0.9333 0.9347 0.9315 0.9328 0.9619 0.9628 0.9223
Reset test 0.0004 0.0002 0.0031 0.0000 0.0076 0.0277 0.0000
VIF (max) 6.78 6.78 6.78 6.78 2.90 3.14 6.28
Method OLS OLS RE RE OLS OLS OLS
N (sample size) 89 89 89 89 89 89 89
Template 75. Retail bad debt models proposed by Wessex Water and Bristol Water
Description of dependent variable
Bad debt and debt management costs
Comments on models (Wessex Water and Bristol Water)
The models were developed using an objective general-to-specific methodology, which was subject to academic peer review. This generated a suite of 16 econometric models:
Generalised models used a wide set of variables derived from a ‘first principles’ consideration of the drivers of retail costs.
Specific models were estimated taking a ‘liberal’ approach to statistical significance (i.e. including variables that were significant at levels approaching 10%).
‘Alternative’ models were estimated for total retail operating costs, which retained variables that were not significant, but were correctly signed.
Two approaches were used in the inclusion of scale (customer numbers) and scope (dual versus single service): Models ‘A’ include separate variables for the number of dual and single service customers. Models ‘B’ include a variable for total customer numbers, alongside the number of single service customers (where this remains after general to specific modelling).
We think that both approaches to the incorporation of scale and scope are valid, and each has advantages and disadvantages. Using separate dual and single service variables provides a very flexible specification, and the resulting models incorporate a wider range of potentially relevant variables. On the other hand, the coefficients are difficult to interpret, as some companies have no dual service customers. The alternative approach is less flexible, but provides more intuitive coefficient estimates.
Overall, we consider the models across the suite to be valid.
A full description of the work undertaken to arrive at these models is set out in a report by Economic Insight: 'Household retail cost assessment for PR19: final report for Bristol and Wessex Water'.
Consultation model ID WSXRDC1 WSXRDC2 WSXRDC3 WSXRDC4
Company’s model ID A2 A6 B2 B6
Dependent variable ln(bad debt related operating costs)
ln(total customers) 0.979*** (0.000) 0.933*** (0.000)
ln(single service customers) 0.535*** (0.000) 0.532*** (0.000)
ln(dual service customers) 0.121*** (0.000) 0.184*** (0.003)
IMD income (%) 0.189*** (0.000) 0.136*** (0.008) 0.067*** (0.000) 0.055* (0.071)
Property repossessions (%) 0.147** (0.015)
ln(average wholesale bill) 1.744*** (0.000) 1.235*** (0.002) 1.091*** (0.000) 1.165*** (0.000)
Internal population total flow (%) 0.091*** (0.001)
Constant -14.37*** (0.000) -10.25*** (0.000) -11.31*** (0.000) -11.57*** (0.000)
R2 adjusted 0.933 0.926 0.962 0.964
Reset test 0.000 0.003 0.031 0.017
VIF (max) 6.78 6.78 2.07 2.62
Estimation method OLS RE OLS RE
N (sample size) 89 89 89 89
Template 76. Retail bad debt models proposed by Yorkshire Water
Description of dependent variable
Doubtful debt + debt management costs
Costs are neither deflated nor smoothed.
Comments on models (Yorkshire Water)
In developing bad debt models we have considered the same factors as for models of total BOTEX. We identified the same model specifications for bad debt costs as in total BOTEX, although the coefficients themselves are different.
The impact of bill size and deprivation is greater in size, while the relationship between cost and metering is reversed, with greater metering penetration implying lower bad debt costs.
General limitations and modelling observations noted under retail BOTEX apply here as well.
Deprivation: DCLG and Welsh government statistics
Private and social renters: 2011 census data extrapolated forwards using 2016 ONS data on tenure by region.
Ofwat comment: Model 4 was also proposed by South Staffs Water and South West Water.
Consultation model ID YKYSSCSWBRDC1 YKYRDC2 YKYRDC3
Company’s model ID 4 5 6
Dependent variable Bad debt (log)
Unique customers, Ofwat constructed measure (log) 0.911*** (0.000) 0.851*** (0.000) 0.889*** (0.000)
Average combined bill (log) 0.930*** (0.000) 0.981*** (0.000) 0.989*** (0.000)
Income deprivation (log) 0.841* (0.051) 1.056*** (0.031) 0.774** (0.048)
Proportion of private renters (%) 4.078 (0.115)
Proportion of metered customers (%) -0.345 (0.350)
Constant 6.212*** (0.000) 6.144*** (0.000) 6.079*** (0.000)
R2 adjusted 0.950 0.952 0.950
Reset test 0.601 0.275 0.514
VIF (max) 2.96 3.84 3.26
Estimation method OLS OLS OLS
N (sample size) 68 68 68
Template 77. Retail bad debt models proposed by South East Water
Description of dependent variable
Modelled bad debt includes doubtful debt plus debt management
Costs are unsmoothed and in nominal prices
Description of selected explanatory variables
Bill size: Water UK
Deprivation: DCLG and Welsh government statistics
Customer numbers and metered customers: company APRs (2013/14 – 15/16) and Ofwat data release
Comments on models (South East Water)
We have included average bill size as an explanatory factor. However, we would note that the models may not adequately control for the larger economies of scale and legal tools available to WASCs when collecting larger bills.
We believe that using the average combined bill as an explanatory factor for debt related costs could underestimate the costs faced by WOCs in collecting smaller bills, where the same costs are incurred but less debt is collected. In addition, larger combined bills often come with more debt recovery tools and court enforcement options to chase debt and receive a greater return on debt management expenditure.
Based on the diagnostic tests, while bill size does test as being statistically significant, the magnitude of the coefficient requires further examination.
Consultation model ID SEWRDC1 SEWRDC2
Company’s model ID Model 3 Model 4
Dependent variable Bad debt (log)
Total customers (log) 0.904*** (0.000) 0.888*** (0.000)
Average combined bill (log) 0.932*** (0.000) 0.980*** (0.015)
Unemployment (%) 0.126* (0.098) 0.113 (0.108)
Metering -0.300 (0.494)
Constant 3.801*** (0.000) 3.871*** (0.000)
R2 adjusted 0.949 0.949
Reset test 0.835 0.515
VIF (max) 3.20 3.40
Estimation method OLS OLS
N (sample size) 68 68
3.2 Totex less bad debt models
Template 78. Retail other expenditure models proposed by Ofwat
Description of dependent variable
Other residential retail costs per household.
Other retail costs include customer service and meter reading, plus depreciation on capital investment. They exclude expenditure related to third party services.
The denominator, household, is the total number of connected households receiving either water only, wastewater only or dual services.
Comments on models
The main variables in our other retail cost models are the number and type of households served.
All models include the proportion of dual service households (households which receive both water and wastewater services from the same retailer). Providing both services may drive higher retail costs than providing a single service due to additional metering costs and more frequent customer contact. The coefficient is consistent, with a plausible magnitude and reasonable significance across all specifications.
Models 2 and 4 include the proportion of metered households to account for metering costs and, possibly, for higher customer service costs due to more frequent contact. Although the coefficient is not statistically significant in any of the models, its value is plausible and consistent across the different specifications.
Models 3 and 4 also include the total number of connected households to allow for economies of scale. The negative coefficient provides some evidence that the costs per household reduce with the number of households served.
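The economies-of-scale reading can be made explicit. If ln(C/N) = a + β ln N + other terms, then ln C = a + (1 + β) ln N + other terms, so total cost rises less than proportionately with households when β < 0. The sketch below holds the other regressors fixed and uses the OROC3 point estimate of -0.080 purely to illustrate the arithmetic, not as a statement about its precision; the function names are hypothetical.

```python
def total_cost_elasticity(per_household_coef):
    """Elasticity of total cost to households implied by a per-household log model."""
    return 1.0 + per_household_coef


def per_household_cost_ratio(per_household_coef, scale_factor):
    """Multiplicative change in cost per household when households scale by scale_factor."""
    return scale_factor ** per_household_coef


beta = -0.080  # OROC3 coefficient on ln(number of households)
print(total_cost_elasticity(beta))          # 0.92
print(per_household_cost_ratio(beta, 2.0))  # about 0.946: roughly 5% lower per household
```

Doubling households therefore raises total cost by a factor of about 2 × 0.946, i.e. less than double.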
The time dummies suggest that costs have fallen during the PR14 period.
The models have a very low R2, which suggests that the explanatory variables do not explain much of the variation in the dependent variable. To some extent, our modelling suggests that an average cost to serve approach for other retail costs is sensible, with any variation not explained by customer numbers regarded as noise.
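One simple reading of an average cost to serve approach is to apply the industry average cost per household to each company's household count. The sketch below assumes that reading; it is an illustration, not a description of how Ofwat would set allowances, and the function name and figures are hypothetical.

```python
def average_cost_to_serve(costs, households):
    """Apply the industry average cost per household to each company's household count."""
    if len(costs) != len(households):
        raise ValueError("costs and households must have the same length")
    industry_average = sum(costs) / sum(households)
    return [industry_average * h for h in households]


# Invented figures: other retail costs (£m) and households (m) for three companies.
allowances = average_cost_to_serve([30.0, 18.0, 12.0], [2.0, 1.0, 1.0])
print(allowances)  # [30.0, 15.0, 15.0]
```

By construction the allowances sum to total industry cost, and any company-specific deviation from the average is treated as noise or inefficiency.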
All monetary values have been inflated to 2016-17 prices using the CPIH.
Consultation model ID OROC1 OROC2 OROC3 OROC4
Dependent variable ------------ ln(other retail costs per household) ------------
% of dual service households 0.002 (0.132) 0.002 (0.115) 0.003** (0.010) 0.003** (0.016)
% metered households 0.004 (0.227) 0.004 (0.322)
Ln(number of households) -0.080* (0.094) -0.068 (0.208)
2015 dummy 0.036* (0.081) 0.026 (0.279) 0.036* (0.080) 0.028 (0.275)
2016 dummy -0.048 (0.204) -0.067 (0.127) -0.047 (0.220) -0.064 (0.159)
2017 dummy -0.078** (0.043) -0.101** (0.021) -0.069* (0.053) -0.090* (0.052)
Constant 2.752*** 2.552*** 3.784*** 3.457***
R2 adjusted 0.06 0.124 0.117 0.162
VIF (max) 1.493 1.513 2.153 2.212
Reset test 0.497 0.819 0.315 0.907
Estimation method OLS OLS OLS OLS
N (sample size) 71 71 71 71
Template 79. Retail other expenditure models proposed by Anglian Water
Description of dependent variable
All models are described in detail in our Cost Modelling report – Phase 2, published March 2018: http://www.anglianwater.co.uk/about-us/thinking-about-our-future/
Description of selected explanatory variables
Deprivation measure – 80th percentile for IMD with billing used as weight
Comments on models (Anglian Water)
On the basis of section 2.2 of Annex 5 to our report, Other Retail costs are expected to be a function of:
The number of metered customers
The number of unmetered customers
The proportion of customers which take a wastewater service
Regional Wages
Quality of Service.
Consultation model ID ANHROC1 ANHROC2
Company’s model ID 3 4
Dependent variable Other Retail costs Other Retail costs
Ln(# Metered customers) 0.549*** 0.521***
(0.000) (0.000)
Ln(# Unmetered customers) 0.339*** 0.382***
(0.000) (0.000)
Ln(Regional Wages) (£) 1.045** 0.137
(0.014) (0.743)
Sparsity 0.249*** (0.007)
WoC billed wastewater customers as % of total customers -0.301** (0.035)
WaSC billed wastewater customers as % of total customers 0.505*** (0.000)
SIM -0.007 -0.012**
(0.152) (0.021)
Time trend 0.018 0.033*
(0.292) (0.080)
Constant -5.395*** -2.875**
(0.000) (0.021)
R2 adjusted 0.9699 0.9619
Reset test 0.197 0.849
VIF (max) 5.44 5.35
Method OLS OLS
N (sample size) 89 89
Template 80. Retail other expenditure models proposed by United Utilities
Description of dependent variable
Total retail cost per household less costs related to bad debt.
Comments on models (United Utilities)
Households are counted as one regardless of whether they receive one or two services; this is not a unique customer measure. These models capture economies of scope through the dual service independent variable.
The dependent variable is not logged as we consider remaining retail costs are best modelled using an additive, rather than multiplicative, specification.
We do not consider the low R2 to be a significant issue. These specifications model cost per household. If these models were specified as total cost models the R2 would exceed 0.9. We consider the loss of R2 to be worth the additional gain in precision that comes from using a cost per household specification. Additionally, these models perform well on RESET, Linktest and Shapiro-Wilk tests.
The metered services variable accounts for the fact that a customer may receive water and wastewater services from different companies.
All models are discussed in more detail in Reckon LLP (2018) ‘Econometric models for residential retail cost assessment’.
Price base is in 2017 CPI terms.
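United Utilities argues that a per-household specification can have a much lower R2 than a total cost specification without fitting worse. Simulated data can illustrate this point. In the sketch below, the data-generating process, coefficients and sample size are all invented for illustration. It shows only that modelling total cost, whose variation is dominated by a strongly varying size measure (households), mechanically produces a high R2.

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R2 from an OLS fit of y on X (X must already include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(residuals ** 2) / tss


rng = np.random.default_rng(seed=0)
n = 200
households = rng.uniform(0.1, 5.0, size=n)         # millions of households (hypothetical)
driver = rng.uniform(0.0, 10.0, size=n)            # a hypothetical per-household cost driver
noise = rng.normal(0.0, 3.0, size=n)
cost_per_household = 20.0 + 0.5 * driver + noise  # £ per household (hypothetical)
total_cost = cost_per_household * households       # £m

const = np.ones(n)
r2_per_household = r_squared(cost_per_household, np.column_stack([const, driver]))
# Total cost model with the same underlying relationship, scaled up by households
r2_total = r_squared(total_cost, np.column_stack([const, households, households * driver]))

print(f"R2, cost per household model: {r2_per_household:.2f}")
print(f"R2, total cost model:         {r2_total:.2f}")
```

Both regressions describe the same cost relationship. The difference in R2 comes from what the dependent variable is asked to explain, not from a difference in predictive quality.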
Consultation model ID UUROC1 UUROC2
Company’s model ID RR2 RR3
Dependent variable Remaining retail cost per household
Metered services per household 3.87 (0.311)
% dual service 0.753 2.714
(0.702) (0.101)
2014 dummy 0.854 0.56
(0.198) (0.332)
2015 dummy 1.54*** 1.379**
(3.11) (0.02)
2016 dummy 0.234 0.183
(0.598) (0.725)
Constant 12.69*** 15.12***
R2 adjusted 0.155 0.098
VIF (max) 1.69 1.54
Reset test 0.857 0.942
Estimation method OLS OLS
N (sample size) 71 71
Template 81. Retail other expenditure models proposed by Severn Trent Water
Description of dependent variable
Other operating expenditure model: (Operating costs + Depreciation + Amortisation) – (Doubtful debt + Debt management costs)
Description of selected explanatory variables
Bill to income ratio – average bill (total revenue/number of connected households) divided by weekly earnings. In the models, this is average weekly earnings of the lowest decile earners in the region.
Prop. of private rental properties - proportion of connected households that are rented.
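The bill to income ratio defined above divides the average bill (total revenue over connected households) by weekly earnings at the 10th percentile of the regional distribution. The sketch below shows that calculation with hypothetical figures. Because the definition divides an annual bill by weekly earnings, the ratio is effectively "weeks of earnings" rather than a percentage. This sketch assumes total revenue is an annual figure.

```python
import math


def average_bill(total_revenue: float, connected_households: int) -> float:
    """Average bill = total revenue / number of connected households."""
    return total_revenue / connected_households


def bill_to_income_ratio(total_revenue: float, connected_households: int,
                         weekly_earnings_p10: float) -> float:
    """Average bill divided by weekly earnings of the lowest-decile (10th percentile) earners."""
    return average_bill(total_revenue, connected_households) / weekly_earnings_p10


# Hypothetical inputs: £800m total revenue, 2 million connected households and
# £300 weekly earnings at the 10th percentile in the company's region.
ratio = bill_to_income_ratio(800e6, 2_000_000, 300.0)
print(f"Bill to income ratio: {ratio:.3f}; log used in the models: {math.log(ratio):.3f}")
```

Here the £400 average bill is equivalent to about 1.33 weeks of 10th-percentile earnings.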
Comments on models (Severn Trent Water)
The positive coefficient on the “metered customers” variable indicates, as we expected, that metered customers cost more to serve. We also include deprivation measures in this model, as deprivation is likely to influence the scale of retail operations indirectly. As expected, these costs are less responsive to differences in deprivation levels than bad debt costs, but deprivation nonetheless has some impact on the wider retail function.
Consultation model ID SVTROC1
Company’s model ID 3
Dependent variable Ln(Other opex)
Ln (connected customers) 0.88*** (0.00)
Proportion metered 0.46** (0.03)
Ln(Bill to income ratio(10th percentile)) 0.35*** (0.00)
Unemployment % 0.06
(0.25)
High Density (% customers residing in an area with more than 2,000 people per square km) 0.42** (0.01)
Constant 2.42*** (0.00)
R2 adjusted 0.97
Reset test 0.04
VIF (max) 3.1
Estimation method OLS
N (sample size) 71
Template 82. Retail other expenditure models proposed by Wessex Water and Bristol Water
Description of dependent variable
Non-bad debt related retail operating costs: The subset of total retail operating costs not included in bad debt related retail operating costs – that is, all household retail operating costs other than debt management and doubtful debt.
Comments on models
See comments on bad debt models by Wessex Water and Bristol Water.
Consultation model ID WSXROC1 WSXROC2 WSXROC3 WSXROC4
Company’s model ID A3 A7 B3 B7
Dependent variable ln(non-bad debt related operation costs)
ln(total customers) 1.061*** (0.000)
1.069*** (0.000)
ln(single service customers) 0.498*** (0.000)
0.268** (0.025)
-0.120*** (0.000)
-0.138** (0.021)
ln(dual service customers) 0.263*** (0.000)
0.250*** (0.000)
Metered customers (%) 0.014*** (0.000)
0.002 (0.610)
0.005*** (0.004)
0.005 (0.114)
Metered household density (per km mains)
-0.0155*** (0.001)
ln(peak traffic speed) -1.830*** (0.000)
-1.217** (0.047)
-0.257* (0.062)
-0.327 (0.286)
Time trend -0.0372** (0.014)
-0.035*** (0.002)
Constant 4.539*** (0.000)
4.104* (0.067)
-3.200*** (0.000)
-2.820** (0.011)
R2 adjusted 0.8743 0.8539 0.9676 0.9709
Reset test 0.0025 0.0010 0.0273 0.0076
VIF (max) 2.83 1.40 1.44 1.44
Estimation method OLS Random effects OLS Random effects
N (sample size) 89 89 89 89
Template 83. Retail other expenditure models proposed by South Staffs Water, Yorkshire Water and South West Water
Description of dependent variable
Total retail costs less bad debt and debt management costs (customer service OPEX + other OPEX + metering OPEX + capital maintenance)
All costs are unsmoothed and nominal.
Information on selected explanatory variables
Bill size: Water UK
Private and social renters: 2011 census data extrapolated forwards using 2016 regional data on tenure by region from the ONS
Unique customers: based on Ofwat’s PR14 assumption that the cost to serve a dual service customer is 1.3 times that of a single service customer.
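Ofwat's PR14 unique customer measure, as described above, weights each dual service customer at 1.3 single service customers. The minimal sketch below uses hypothetical customer numbers. The function and its `dual_weight` parameter are our own names for illustration.

```python
import math


def unique_customers(single_service: float, dual_service: float, dual_weight: float = 1.3) -> float:
    """PR14-style unique customers: each dual service customer counts as dual_weight single customers."""
    if single_service < 0 or dual_service < 0:
        raise ValueError("customer numbers must be non-negative")
    return single_service + dual_weight * dual_service


# Hypothetical company: 300,000 single service customers (water-only or wastewater-only)
# and 1,000,000 dual service customers.
uc = unique_customers(300_000, 1_000_000)
print(f"{uc:,.0f} unique customers; log used in the models: {math.log(uc):.3f}")
```

Setting `dual_weight` to a different value shows how sensitive the scale variable is to the 1.3 assumption. The flexible specifications discussed below test that sensitivity by including the share of dual customers as a separate regressor.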
Comments on models (South Staffs Water)
Model 9 uses Ofwat’s PR14 unique customer measure. Model 10 uses a flexible specification that allows economies of scope to vary. We find limited difference in outcomes across the two specifications, which suggests that the assumption underpinning the unique customer definition may be broadly appropriate.
Given that retail costs have been changing significantly (primarily becoming more efficient) in recent years, partly driven by the PR14 price control, we would advocate comparative cost assessment using only the most recent data alongside business plan projections.
Comments on models (Yorkshire Water)
Model 9 captures scale, scope and metering costs. Model 11 captures the above as well as a measure of transient population – the proportion of population in social housing.
We included measures of transient population to capture any impact on customer service or other non-bad debt costs. Neither deprivation nor bill size features as an explanatory variable in the BOTEX less bad debt models.
Comments on models (South West Water)
For BOTEX less bad debt models we have focused on capturing the effect of two key cost drivers:
Economies of scope – the number of dual customers a company serves; and
Metering penetration – the proportion of customers with meters.
We also looked at the proportion of revenue from customer prepayments as a potential driver.
Models 9 and 10 capture scale, scope and metering. Model 12 captures the above as well as the proportion of revenue from customer prepayments.
Modelling for four years (2013/14-16/17) or for AMP6 only (2015/16-16/17) has little impact on the model specification or efficiency ranges.
Consultation model ID YKYSSCSWBROC1 SSCSWBROC2 YKYROC3 SWBROC4
Company’s model ID 9 10 11 12
Dependent variable Modelled retail less bad debt (log)
Log(total “unique” customers) 0.984*** – 0.970*** 0.988***
(0.000) – (0.000) (0.000)
Log(total customers) – 0.990*** – –
– (0.000) – –
% dual customers – 0.247* – –
– (0.052) – –
% metered customers 0.383 0.388 0.543*** 0.430***
(0.130) (0.130) (0.067) (0.001)
% customers in social housing – – 1.683 –
– – (0.266) –
% of revenue from pre-payments – – – -1.310***
– – – (0.007)
Constant 9.514*** 9.480*** 9.237*** 9.569***
(0.000) (0.000) (0.000) (0.000)
R2 adjusted 0.964 0.964 0.965 0.967
Reset test 0.179 0.182 0.410 0.231
VIF (max) 1.018 2.33 1.45 1.031
Estimation method OLS OLS OLS OLS
N (sample size) 68 68 68 68
3.3 Total expenditure models
Template 84. Retail totex models proposed by Ofwat
Description of dependent variables
Total residential retail costs per household.
Total residential retail costs = total retail operating costs plus depreciation on capital investment. It excludes third party costs.
The denominator, household, is the total number of connected households receiving either water only, wastewater only or dual services.
Comments on models
The variables in our total retail cost models are those that performed well in our more disaggregated models – the bad debt models and the other cost models.
Models 1, 3 and 4 include the proportion of metered households to account for metering costs and, possibly, for higher customer service costs due to more frequent contact. Although the coefficient is not statistically significant in any of the models, its value is plausible and consistent across the different specifications.
Model 1 includes the proportion of dual service households (households which receive both water and wastewater services from the same retailer). This variable aims to capture higher costs associated with dual customers. It appears to capture the higher impact of dual customers on bad debt due to their higher bill relative to single service customers.
Models 2 to 4 include average bill size. Average bill size is highly significant in all specifications. Its inclusion makes the proportion of dual customers insignificant, as it provides the same information on the effect of bill size on bad debt. We therefore excluded the proportion of dual customers from models 2-4.
Models 3 and 4 include a proxy for the probability of default (the percentage of households with default). Its coefficient has the correct sign, a plausible magnitude and a reasonable level of significance.
Model 4 includes the total number of households to allow for economies of scale. The significant negative coefficient implies that costs per household decrease as the number of households served increases.
The time dummies suggest that costs have dropped in PR14.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Consultation model ID ORTC1 ORTC2 ORTC3 ORTC4
Dependent variable ------------ ln(total retail cost per household) ------------
% of dual service households 0.006*** – – –
(0.000) – – –
Ln(number of households) – – – -0.119**
– – – (0.012)
% metered households 0.005 – 0.004 0.004
(0.167) – (0.420) (0.376)
Ln(bill size) – 0.535*** 0.468*** 0.641***
– (0.000) (0.000) (0.000)
% households with default (Eq_lpcf62) – – 0.026 0.042**
– – (0.173) (0.014)
2015 dummy 0.025 0.034 0.024 0.024
(0.344) (0.156) (0.344) (0.372)
2016 dummy -0.070** -0.029 -0.043 -0.029
(0.046) (0.301) (0.265) (0.446)
2017 dummy -0.133*** -0.090*** -0.096** -0.064*
(0.001) (0.003) (0.012) (0.094)
Constant 2.857*** 0.361 -0.14 0.117
R2 adjusted 0.583 0.612 0.638 0.694
VIF (max) 1.513 1.494 2.019 2.936
Reset test 0.732 0.005 0.033 0.396
Estimation method OLS OLS OLS OLS
N (sample size) 71 71 71 71
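ORTC4 regresses ln(cost per household) on ln(number of households), so its coefficient b translates directly into a total cost elasticity. ln(C/N) = a + b ln N implies ln C = a + (1 + b) ln N, so a 1% increase in households raises total cost by about (1 + b)%, other drivers held constant. The sketch below works through this using the ORTC4 point estimate of -0.119.

```python
# ORTC4 coefficient on ln(number of households); dependent variable is ln(total retail cost per household)
B_LN_HOUSEHOLDS = -0.119


def total_cost_elasticity(b: float) -> float:
    """ln(C/N) = a + b ln N  =>  ln C = a + (1 + b) ln N, so the elasticity of C to N is 1 + b."""
    return 1.0 + b


def cost_per_household_ratio(scale_factor: float, b: float) -> float:
    """Cost per household after multiplying households by scale_factor, relative to before."""
    return scale_factor ** b


print(f"Elasticity of total cost to households: {total_cost_elasticity(B_LN_HOUSEHOLDS):.3f}")
print(f"Cost per household if households double: x{cost_per_household_ratio(2.0, B_LN_HOUSEHOLDS):.3f}")
```

This prints an elasticity of 0.881 and a cost-per-household factor of about 0.921. An elasticity below one is what "economies of scale" means in this specification.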
Template 85. Retail totex models proposed by Anglian Water
Description of dependent variable
Retail botex as defined in Anglian Water’s Cost Modelling report – Phase 2, published March 2018: http://www.anglianwater.co.uk/about-us/thinking-about-our-future/
Description of selected explanatory variables
Deprivation measure – 80th percentile for IMD with billing used as weight.
Comments on models (Anglian Water)
The drivers of the integrated model are the drivers of the DDDM (Doubtful Debt and Debt Management) and Other Retail models, on the assumption that these models have been properly specified.
Consultation model ID ANHRTC1
Company’s model ID Retail
Dependent variable Total Retail botex
Ln(number of metered customers) 0.484*** (0.000)
Ln(number of unmetered customers) 0.347*** (0.000)
Ln(Average bill size) 0.419*** (0.002)
Ln(Regional Wages) 1.263*** (0.001)
Deprivation measure 0.582** (0.020)
Regional unemployment 4.432* (0.092)
% wastewater customers of total customers 0.443*** (0.006)
WoC billed wastewater customers as % of total customers -0.453** (0.028)
Billing complaints per 10,000 customers 0.003** (0.040)
Time trend 0.036* (0.097)
Constant -8.804*** (0.000)
R2 adjusted 0.9797
Reset test 0.055
VIF (max) 15.19
Method OLS
N (sample size) 89
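Each template reports the largest variance inflation factor (VIF) across the explanatory variables. The Anglian Water totex model above reports 15.19, the highest among the retail models shown here. VIF_j = 1/(1 − R_j²), where R_j² comes from regressing regressor j on a constant and the other regressors. The numpy sketch below uses simulated data. A threshold of 10 is a common rule of thumb for concern, not a standard stated in this document.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (regressors only, no constant column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression of column j
    on a constant and all other columns.
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(seed=1)
n = 89  # same sample size as the Anglian model, but the data here are simulated
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # unrelated regressor
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

The two nearly collinear columns both receive very large VIFs, while the unrelated column stays close to 1. A high maximum VIF therefore signals imprecisely estimated individual coefficients rather than a poor overall fit.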
Template 86. Retail totex models proposed by United Utilities
Description of dependent variable
Total retail costs per household.
Households are counted as one regardless of whether they receive one or two services; this is not a unique customer measure. These models capture economies of scope through the bill size/dual service variables.
Price base is in 2017 CPI terms.
Comments on models (United Utilities)
All models are discussed in more detail in Reckon LLP (2018) published alongside this consultation.
Bill ratio was constructed by Reckon LLP to control for the correlation between bill size and proportion of dual service customers. Bill ratio captures the differences between companies’ average bills, while accounting for the differing service mix.
These models perform well on diagnostic tests of model specification, and analysis indicates that the coefficients are robust to omitting observations. We consider this to be a more important test of predictive power than statistical significance.
Model RT4_d2 seeks to capture extreme deprivation: the “top 20 percent” refers to the 20 percent most deprived households, as measured by predicted IMD.
Consultation model ID UURTC1 UURTC2
Company’s model ID RT4_d2 RT4_d4
Dependent variable Ln(total retail costs)
% dual service 0.323* 0.278
(0.06) (0.107)
Bill ratio 0.81*** 0.859***
(0.008) (0.004)
Deprivation measure (units vary by measure) 0.606 1.62
(0.29) (0.177)
2014 dummy 0.065** 0.067**
(0.04) (0.035)
2015 dummy 0.119*** 0.115***
(0.002) (0.002)
2016 dummy 0.042* 0.035
(0.079) (0.117)
Constant 2.279 2.016
R2 adjusted 0.676 0.686
VIF (max) 2.72 2.94
Reset test 0.208 0.615
Estimation method OLS OLS
N (sample size) 71 71
Template 87. Retail totex models proposed by Severn Trent Water
Description of dependent variable
Total revenue = operating costs + Depreciation + Amortisation
Description of selected explanatory variables
Customers - total number of households connected
Unemployment - % of the population in the region that are unemployed
Bill to income ratio – average bill (total revenue/number of connected households) divided by weekly earnings. In the models, this is gross weekly earnings of the lowest decile earners in the region.
Density - Proportion of customers residing in an area with more than 2000 people per square km.
Comments on models (Severn Trent Water)
Model 1 OLS: In results not reported, we find that the fit of the model (as measured by AICc and BIC) improves as we change the denominator of the bill to income ratio variable from the median to the 20th and then the 10th percentile of earnings. We have used the 10th percentile in all of the models included here.
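Severn Trent compares specifications using AICc and BIC. For OLS with normal errors, dropping constants that cancel when models are fitted to the same data: AIC = n ln(RSS/n) + 2k, AICc = AIC + 2k(k + 1)/(n − k − 1) and BIC = n ln(RSS/n) + k ln n. Here k counts the estimated coefficients; conventions differ across software, and some packages also count the error variance. Lower values indicate a better trade-off between fit and complexity. In the sketch below, the residual sums of squares are hypothetical.

```python
import math


def information_criteria(rss: float, n: int, k: int) -> dict[str, float]:
    """AIC, AICc and BIC for an OLS model (Gaussian likelihood, additive constants dropped).

    rss: residual sum of squares; n: observations; k: estimated coefficients incl. constant.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    base = n * math.log(rss / n)
    aic = base + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = base + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical comparison: same number of coefficients (k = 5, n = 71) but different
# bill to income denominators, so only the residual sum of squares differs.
for label, rss in [("median", 1.90), ("20th percentile", 1.75), ("10th percentile", 1.62)]:
    ic = information_criteria(rss, n=71, k=5)
    print(f"{label:>16}: AICc={ic['AICc']:.2f}  BIC={ic['BIC']:.2f}")
```

When the competing models have the same k, as here, both criteria rank them purely by residual sum of squares. The penalty terms matter only when the number of coefficients differs.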
Consultation model ID SVTRTC1
Company’s model ID 1
Dependent variable Ln(Total revenue)
Ln (Customers) 0.87*** (0.00)
Ln(Bill to income ratio(10th percentile)) 0.74*** (0.00)
Unemployment % 0.05
(0.29)
Density 0.42** (0.009)
Constant 3.22*** (0.00)
R2 adjusted 0.98
Reset test 0.012
VIF (max) 2.1
Estimation method OLS
N (sample size) 71
Template 88. Retail totex models proposed by Welsh Water
Description of dependent variable
Retail Operating Costs = “Total Operating Costs”
Comments on models (Welsh Water)
The submitted retail model controls for deprivation within a total operating cost model. Deprivation is measured using either the Income IMD or the IMD score. A comparable measure of IMD has been produced by Economic Insight, as detailed in the accompanying report “Evaluating a predicted IMD approach to debt cost assessment-Final-STC-12-03-18.pdf”. The model also controls for the number of customers, the proportion of metered customers and economies of scope. The proportion of metered properties is not statistically significant but is included for operational reasons.
Economies of scope have been incorporated using two different approaches:
Models 7 and 8 use Ofwat’s PR14 1.3 assumption for dual service customers to calculate the “Ofwat Adjusted Customers”.
Models 8 and 9 use the number of unique accounts (dual service customers are counted as two accounts). The models then include the proportion of dual customers to account for economies of scope. The negative coefficient indicates the presence of economies of scope.
South West and Bournemouth have been modelled separately in these models.
Costs at outturn prices.
Consultation model ID WSHRTC1 WSHRTC2 WSHRTC3 WSHRTC4
Company’s model ID 8 9 10 11
Dependent variable Ln(Retail Operating Costs)
Ln(Unique Accounts) (,000)
0.973*** (0.000)
0.944*** (0.000)
Ln (Ofwat PR14 Adjusted Customers) (,000)
0.946*** (0.000)
0.931*** (0.000)
Ln(Average Wholesale Bill) 0.330*** 0.436*** 0.571** 0.534**
(0.003) (0.001) (0.010) (0.010)
Income IMD (%) 3.842** (0.021)
4.360** (0.011)
IMD Score 0.022** (0.032)
0.030*** (0.005)
% metered customers 0.254 0.251 0.234 0.227
(0.423) (0.402) (0.450) (0.462)
% Dual customers -0.437** (0.014)
-0.309** (0.039)
Constant -6.011*** -6.195*** -7.162*** -6.800***
R2 adjusted 0.980 0.980 0.981 0.980
VIF (max) 3.04 2.87 7.40 5.46
Reset test 0.158 0.143 0.027 0.045
Estimation method OLS OLS OLS OLS
N (sample size) 89 89 89 89
Template 89. Retail totex models proposed by Yorkshire Water
Description of dependent variable
Total retail costs = OPEX + capital maintenance. Costs are not deflated and are unsmoothed.
Comments on models (Yorkshire Water)
We focused on four key cost drivers:
economies of scope, the number of dual customers a company serves;
metering penetration, which drives meter reading costs;
bill size, which increases a company’s exposure to customers defaulting; and
level of deprivation, which increases the propensity of customers to default.
We have additionally considered several measures of transient population to explore the impact that a high turnover of population has on the propensity of customers to default, thereby increasing the costs of debt management and customer service. Source for private and social renters: 2011 census data extrapolated forwards using 2016 regional data on tenure by region from the ONS.
We have used income deprivation to measure deprivation. Our aggregate BOTEX models also include the Ofwat measure of unique customers from PR14 (the sum of single customers + 1.3 × the sum of dual customers) and bill size. We consider one model with an additional control to capture the impact of metering on costs, and another that uses the proportion of the population renting privately to capture population transiency. Source for income deprivation: DCLG and Welsh Government statistics.
Robustness checks:
1. Modelling retail BOTEX as OPEX + depreciation rather than OPEX + capital maintenance does not significantly change the model coefficients or outcomes.
2. Model coefficients remain similar and generally statistically significant if we limit the data to AMP6 alone.
3. Models do not appear to be mis-specified based on the RESET test.
4. Models are generally robust to using the random effects estimator. Some coefficients become statistically insignificant, but their sign and magnitude hold.
5. The models are not materially affected by the exclusion of outliers.
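Yorkshire Water's robustness check 3 relies on the Ramsey RESET test, and every template reports a RESET p-value. The test refits the model with powers of the fitted values added and asks whether those extra terms jointly improve the fit. A low p-value points to functional form misspecification. The numpy sketch below computes the F statistic using squared and cubed fitted values. This is a common choice, but the templates do not say which powers each company used. The p-value would then come from an F distribution with (2, n − k − 2) degrees of freedom, for example via scipy.stats.f.sf, which is not called here.

```python
import numpy as np


def ols_fit(y: np.ndarray, X: np.ndarray) -> tuple[np.ndarray, float]:
    """Return OLS fitted values and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, float(np.sum((y - fitted) ** 2))


def reset_f_statistic(y: np.ndarray, X: np.ndarray) -> tuple[float, int, int]:
    """Ramsey RESET F statistic using squared and cubed fitted values.

    X must include a constant column. Returns (F, numerator df, denominator df).
    """
    n, k = X.shape
    fitted, rss_restricted = ols_fit(y, X)
    # Standardising the fitted values improves numerical conditioning; because the
    # constant and the fitted values are already in the span of X, the augmented
    # model (and hence F) is unchanged.
    z = (fitted - fitted.mean()) / fitted.std()
    _, rss_unrestricted = ols_fit(y, np.column_stack([X, z ** 2, z ** 3]))
    q = 2                 # number of added terms
    df_resid = n - k - q  # residual degrees of freedom of the augmented model
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return f_stat, q, df_resid


# Simulated data (not from any company): one correctly specified relationship
# and one where a straight-line fit misses curvature.
rng = np.random.default_rng(seed=2)
n = 89
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 0.8 * x + rng.normal(0.0, 0.3, size=n)
y_curved = 1.0 + 0.8 * x ** 2 + rng.normal(0.0, 0.3, size=n)

f_linear, q, df_resid = reset_f_statistic(y_linear, X)
f_curved, _, _ = reset_f_statistic(y_curved, X)
print(f"F({q}, {df_resid}) correctly specified: {f_linear:.2f}")
print(f"F({q}, {df_resid}) curved data, linear fit: {f_curved:.2f}")
```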
Ofwat comment: Model 1 was also proposed by South Staffs Water
Consultation model ID YKYSSCRTC1 YKYRTC2 YKYRTC3
Company’s model ID 1 2 3
Dependent variable Modelled retail BOTEX (log)
Unique customers, Ofwat PR14 measure (log) 0.935*** 0.946*** 0.915***
(0.000) (0.000) (0.000)
Average combined bill (log) 0.374*** 0.344*** 0.391***
(0.002) (0.008) (0.000)
Income deprivation (log) 0.302 0.335 0.375
(0.348) (0.287) (0.317)
Proportion of metered customers (%) 0.173
(0.465)
Proportion of private renters (%) 1.375
(0.534)
Constant 9.078*** 9.145*** 9.056***
(0.000) (0.000) (0.000)
R2 adjusted 0.973 0.973 0.973
Reset test 0.410 0.525 0.491
VIF (max) 2.96 3.26 3.84
Estimation method OLS OLS OLS
N (sample size) 68 68 68
Template 90. Retail totex models proposed by Wessex Water and Bristol Water
Description of dependent variable
Total retail operating costs: The totality of household operating retail costs, including opex and capital costs: customer services; debt management; doubtful debts; meter reading; services to developers; other operating expenditure; local authority rates; exceptional items; third party services; depreciation and amortisation.
Comments on models (Wessex Water and Bristol Water)
See comments on bad debt models by Wessex Water and Bristol Water.
Consultation model ID WSXRTC1 WSXRTC2 WSXRTC3 WSXRTC4 WSXRTC5 WSXRTC6 WSXRTC7 WSXRTC8
Company’s model ID A1 A4 A5 A8 B1 B4 B5 B8
Dependent variable ln(total retail operating costs)
ln(total customers)
0.877*** (0.000)
0.966*** (0.000)
1.043*** (0.000)
1.065*** (0.000)
ln(single service customers)
0.536*** (0.000)
0.563*** (0.0000)
0.349*** (0.001)
0.318*** (0.003)
-0.069* (0.087)
-0.134** (0.041)
-0.150** (0.030)
ln(dual service customers)
0.122*** (0.000)
0.159*** (0.0000)
0.226*** (0.000)
0.246*** (0.000)
Metered customers (%) 0.007* (0.062)
0.00198 (0.500)
0.005*** (0.005)
0.002
(0.400)
Metered household density (per km mains)
-0.007** (0.041)
Flats (%) 0.057*** (0.000)
0.060*** (0.001)
0.053
(0.144)
ln(peak traffic speed) -0.364 (0.290)
IMD income (%) 0.164*** (0.000)
0.155*** (0.000)
0.066 (0.167)
0.105* (0.056)
0.027*** (0.001)
0.027*** (0.003)
Property repossessions (%)
0.107*** (0.000)
0.119*** (0.002)
0.121*** (0.000)
0.147*** (0.000)
0.113*** (0.000)
0.130*** (0.000)
Consultation model ID WSXRTC1 WSXRTC2 WSXRTC3 WSXRTC4 WSXRTC5 WSXRTC6 WSXRTC7 WSXRTC8
ln(average wholesale bill) 1.206*** 0.999*** 0.341*** 0.301 0.659*** 0.480*** 0.400*** 0.351**
(0.000) (0.000) (0.000) (0.213) (0.000) (0.000) (0.004) (0.019)
Constant -10.02*** -8.06*** -2.74 -3.84** -6.97*** -6.50*** -5.52*** -5.45***
(0.000) (0.000) (0.103) (0.039) (0.000) (0.000) (0.000) (0.000)
R2 adjusted 0.9284 0.9283 0.8957 0.9060 0.9821 0.9835 0.9815 0.9824
Reset test 0.016 0.006 0.000 0.000 0.204 0.408 0.007 0.017
VIF (max) 6.98 13.49 6.78 8.12 2.62 9.81 5.79 7.84
Estimation method OLS OLS RE RE OLS OLS RE RE
N (sample size) 89 89 89 89 89 89 89 89
Template 91. Retail totex models proposed by South East Water
Description of dependent variable
Total retail costs = total retail OPEX - third party services + capital expenditure
Costs are unsmoothed and in nominal prices
Description of selected explanatory variables
Prepayment: Ofwat data release for years 2013/14-14/15, assumed same levels over AMP6
Comments on models (South East Water)
For our aggregate retail cost models we have considered three key drivers of retail cost: economies of scope, metering and deprivation.
We have excluded average combined bill. We believe there are sufficient explanatory factors to infer debt-related costs, and we are mindful not to add so many explanatory factors that more significant elements of retail functions (e.g. billing, customer queries/investigations and metering) are underplayed. Consequently, we consider a single explanatory factor, the unemployment rate (as considered at PR14), to be a suitable proxy for deprivation and debt-related expenditure.
We consider the proportion of revenue from customer prepayments to be a potential driver of bad debt and customer service costs. We have derived this variable from Ofwat’s publication of the data used in its retail services efficiency report of 28 September 2017. We defined it as the amount of deferred income from customer prepayments divided by appointed revenues.
During the current regulatory period, SEW has become one of the industry-leading companies on the proportion of metered customers, and our cost to serve reporting has indicated the real cost impact of serving metered customers. We are keen to ensure this important factor is not underplayed by the introduction of additional explanatory factors. Including average combined bill appears to reduce the impact of the proportion of metered customers on costs. But, as noted, average combined bill may not take into account possible diseconomies in chasing smaller combined bills: smaller bills generally cannot benefit from tougher legal action options, and are therefore more costly to chase for a smaller reward.
Consultation model ID SEWRTC1 SEWRTC2
Company’s model ID Model 1 Model 2
Dependent variable Modelled retail BOTEX (log)
Unique customers (log) 1.064*** 1.078***
(0.000) (0.000)
Unemployment (%) 0.0644* 0.0448
(0.080) (0.269)
Proportion of metered customers (%) 0.564* 0.574**
(0.082) (0.015)
Proportion of revenue from prepayments (%) -1.581*** (0.016)
Constant 8.965*** 9.106***
R2 adjusted 0.969 0.973
Reset test 0.234 0.658
VIF (max) 1.575 1.661
Estimation method OLS OLS
N (sample size) 68 68
Template 92. Retail totex models proposed by South Staffs Water
Description of dependent variable
Total retail = total retail OPEX – third party services + capital expenditure
Comments on models (South Staffs Water)
Oxera developed models that broadly pass the diagnostic tests; however, we have identified an issue with these models not fully capturing company-specific levels of deprivation. We have found that modelling bill size and deprivation together may work at an industry level, as most companies with higher deprivation are WaSCs with higher combined bills. Such models are, however, unable to capture appropriately the bad debt costs of a WoC with high deprivation levels but a lower bill. It is not appropriate that such companies should be penalised in cost assessment as a result of the statistical distribution of costs across WoCs and WaSCs.
Given this difficulty, we would support validating any econometric modelling with an efficient cost to serve approach for the customer service costs and a separate deprivation model for bad debt and debt collection costs for WoCs.
The models we have included for retail use income deprivation as a cost driver, which we believe to be the most robust measure and the most reflective of our customer base and service area. We have not been able to develop models that use LSOA data on income deprivation; however, this could be a plausible option if the data were robust and an appropriate threshold could be identified.
Given that retail costs have been changing significantly (primarily becoming more efficient) in recent years, partly driven by the PR14 price control, we would advocate comparative cost assessment using only the most recent data alongside business plan projections.
All costs are unsmoothed and modelled in nominal prices.
Ofwat comment: Model 6 is identical to Yorkshire Water’s model 1.
Consultation model ID YKYSSCRTC1 SSCRTC2
Company’s model ID 6 7
Dependent variable Total retail (log)
Unique customers, Ofwat measure (log) 0.935*** –
(0.000) –
Average combined bill (log) 0.374*** 0.415**
(0.002) (0.039)
Income deprivation (log) 0.302 0.351
(0.348) (0.025)
Total customers (log) – 0.938***
– (0.000)
Proportion of dual customers (%) – 0.175
– (0.563)
Constant 9.078*** 8.954***
R2 adjusted 0.973 0.973
Reset test 0.410 0.430
VIF (max) 2.96 7.689
Estimation method OLS OLS
N (sample size) 68 68
4 Enhancement expenditure models
4.1 Meeting lead standards costs
Template 93. Meeting lead standards models proposed by Ofwat
Description of dependent variable
Capital expenditure for meeting lead standards, gross of grants and contributions.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We present three models with alternative scale variables: total water delivered, total population served and total number of lead communication pipes. We also include the number of lead pipes replaced as a proxy for the amount of work done to improve lead standards.
We use smoothed data averaged over three-year periods. The models perform slightly better in the original scale than in the logarithmic scale. However, we note that the constants in models OE1 and OE2 are negative. If this has a large, distortive impact on implied efficient costs for companies (eg if the implied cost allowance is negative), we may use the logarithmic model instead. Predictions from the logarithmic model are obtained by exponentiating the fitted value, so they are always positive even if its constant is negative.
The estimated coefficients are robust and in line with expectations.
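The published OE1 coefficients illustrate the concern about negative constants: a constant of -0.156, 0.0015 per Ml/d of water delivered and 0.0004 per lead communication pipe replaced. Take a hypothetical small company delivering 50 Ml/d and replacing 100 lead pipes. Its linear prediction is -0.156 + 0.075 + 0.040 = -0.041, a negative allowance. A log-log model predicts exp(a + Σ b_j ln x_j), which is positive for any coefficients. In the sketch below, the company inputs and the log-model coefficients are hypothetical, since the template does not report the logarithmic model.

```python
import math

# Published OE1 coefficients (linear model, smoothed data)
OE1_CONSTANT = -0.156
OE1_WATER_DELIVERED = 0.0015  # per Ml/d
OE1_PIPES_REPLACED = 0.0004   # per lead communication pipe replaced


def oe1_prediction(water_delivered_mld: float, pipes_replaced: float) -> float:
    """Linear OE1 prediction, in the units of the dependent variable."""
    return OE1_CONSTANT + OE1_WATER_DELIVERED * water_delivered_mld + OE1_PIPES_REPLACED * pipes_replaced


def log_log_prediction(constant: float, coefficients: list[float], drivers: list[float]) -> float:
    """exp(constant + sum(b_j * ln x_j)): strictly positive whatever the signs of the coefficients."""
    return math.exp(constant + sum(b * math.log(x) for b, x in zip(coefficients, drivers)))


small = oe1_prediction(50.0, 100.0)
print(f"Linear OE1 prediction: {small:.3f}")  # -0.041: a negative allowance

# Hypothetical log-log coefficients, for illustration only
print(f"Log-log prediction: {log_log_prediction(-9.0, [0.9, 0.5], [50.0, 100.0]):.3f}")
```

The OE1 constant is not statistically significant (p = 0.461), so the problem matters mainly for the smallest companies. For them, the constant is large relative to the driver terms.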
Consultation model ID OE1 OE2 OE3
Dependent variable Meeting lead standards costs (smooth)
Water delivered (smooth) (Ml/d) 0.0015*** – –
(0.000) – –
Total population served (smooth) (000’s) – 0.0003*** –
– (0.000) –
Lead communication pipes (number) – – 0.000002*
– – (0.051)
Lead communication pipes replaced (number) 0.0004*** 0.0004*** 0.0004**
(0.000) (0.000) (0.01)
Constant -0.156 -0.109 0.129
(0.461) (0.624) (0.593)
R2 adjusted 0.879 0.862 0.843
VIF (max) 1.752 1.534 2.905
Reset test (p-value) 0.087 0.050 0.000
Estimation method RE RE RE
N (sample size) 48 48 48
4.2 Water new developments and new connections
Template 94. Water new developments and new connections models proposed by Ofwat
Description of dependent variable
Capital expenditure associated with new developments and new connections, gross of grants and contributions.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on new models
We modelled the costs of new developments together with the costs of new connections, for two reasons. First, these activities share common cost drivers. Second, we wanted to mitigate potential cost allocation issues between the two activities: new connections expenditure was not reported separately until we asked companies to do so in December 2017.
We present two alternative models with a single (scale) variable. The models perform reasonably well and the coefficients are in line with expectations. We use data from 2005-06 with a three-year moving average to smooth the lumpiness of the data and mitigate misalignment of the costs and the drivers in any one year.
We will also consider including new developments and new connections costs as part of the wholesale water econometric models.
Consultation model ID                           OE4                 OE5
Dependent variable                              ln smooth (new developments and new connections costs)
ln total population served (smooth) (000’s)     1.061*** (0.000)
ln total number of household and non-household
new connections (smooth) (000’s)                                    1.040*** (0.000)
Constant                                        -6.498*** (0.000)   -0.242 (0.309)
R2 adjusted                                     0.823               0.815
VIF (max)                                       1.000               1.000
Reset test (p-value)                            0.228               0.736
Estimation method                               RE                  RE
N (sample size)                                 70                  70
4.3 First time sewerage costs
Template 95. First time sewerage models proposed by Ofwat
Description of dependent variable
Capital expenditure for new and additional sewage treatment and sewerage assets for first time sewerage schemes to meet the duty under s101A of the Water Industry Act 1991. The expenditure is gross of grants and contributions.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We consider that first time sewerage costs are likely to be related to:
The overall scale of s101A activity, as measured by the number of connectable properties;
The number of s101A schemes completed; and
The average size of a scheme, for which the average number of connectable properties per scheme is a proxy.
Our models use data from 2009-10 onwards with a three-year moving average to smooth the lumpiness of the data and mitigate misalignment of the costs and the drivers in any one year. The models perform better in the original scale than in the logarithmic scale. The estimated coefficients are robust and in line with expectations.
Consultation model ID OE6 OE7 OE8
Dependent variable smooth (first time sewerage costs)
Connectable properties served by s101a schemes (smooth)
0.017*** (0.000)
0.012*** (0.000)
S101a schemes (smooth) 1.245*** (0.000)
0.432*** (0.003)
Average number of connectable properties per s101a scheme (smooth)
0.009* (0.096)
Constant 0.063 (0.842) 0.877** (0.014) 0.584* (0.096)
R2 adjusted 0.824 0.918 0.923
VIF (max) 1.144 1.000 4.588
Reset test (p-value) 0.000 0.945 0.987
Estimation method RE RE RE
N (sample size) 59 59 59
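The tables report the largest variance inflation factor (VIF) across each model's regressors as a check on multicollinearity. In general, the VIF for regressor j is 1/(1 - R_j^2), where R_j^2 comes from regressing that regressor on the others. With two regressors this reduces to 1/(1 - r^2), where r is their correlation, so both share the same VIF. A single-regressor model has a VIF of exactly 1, consistent with the 1.000 values reported for the single-driver models. The sketch below assumes the VIF is computed on the pooled sample of observations, which this appendix does not state. The data are hypothetical.

```python
import math


def vif_two_regressors(x1: list[float], x2: list[float]) -> float:
    """Variance inflation factor shared by both regressors in a two-variable model.

    VIF = 1 / (1 - r**2), where r is the Pearson correlation of x1 and x2.
    Perfectly collinear inputs (r = +/-1) raise ZeroDivisionError.
    """
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    d1 = [v - mean1 for v in x1]
    d2 = [v - mean2 for v in x2]
    cov = sum(a * b for a, b in zip(d1, d2))
    r = cov / math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    return 1.0 / (1.0 - r * r)


# Hypothetical regressors: first uncorrelated, then strongly correlated.
print(vif_two_regressors([1, 2, 3, 4], [1, -1, -1, 1]))  # 1.0
print(vif_two_regressors([1, 2, 3, 4], [2, 4, 6, 9]))
```

The second call returns a VIF far above the values in these tables, which is the pattern that would flag two drivers carrying nearly the same information.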
4.4 Sewage growth
Template 96. Sewage growth models proposed by Ofwat
Description of dependent variable
Capital expenditure associated with three areas: new developments and network growth; growth at sewage treatment works; and reducing sewer flooding risk for properties. The costs are gross of grants and contributions.
All monetary values have been inflated to 2016-17 prices using the CPIH.
Comments on models
We combined the costs of three enhancement activities: new development and network growth; growth at sewage treatment works; and reducing sewer flooding risk. These activities are likely to be driven by similar factors (eg the size of the customer base). Combining them also mitigates potential inconsistencies in how companies allocated costs between reducing sewer flooding risk and new development and network growth.
We present models with two alternative scale variables: resident population and the number of household and non-household properties billed for sewerage. Two of the models also include load per sewage treatment works to capture economies of scale.
We use a three-year moving average to smooth the lumpiness of the data and mitigate misalignment of the expenditure and the drivers in any one year. The models perform better in the original scale than in the logarithmic scale. The estimated coefficients are robust and in line with expectations.
We will also consider including sewage growth costs, among them the costs of new developments, in the wholesale wastewater econometric models.
Consultation model ID                     OE9               OE10              OE11              OE12
Dependent variable                        smooth (sewage growth)
Resident population (smooth) (000s)       0.005*** (0.000)                    0.003* (0.068)
Household and non-household properties
billed for sewerage (smooth)                                0.012*** (0.000)                    0.006* (0.069)
Load per sewage treatment works (smooth)
(kg BOD5/day)                                                                 0.012 (0.113)     0.015** (0.022)
Constant                                  4.869 (0.279)     2.766 (0.603)     7.004 (0.139)     6.109 (0.223)
R2 adjusted                               0.780             0.751             0.817             0.818
VIF (max)                                 1.000             1.000             4.753             3.590
Reset test (p-value)                      0.308             0.154             0.683             0.682
Estimation method                         RE                RE                RE                RE
N (sample size)                           40                40                40                40
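To illustrate how these original-scale models turn drivers into predicted costs, the sketch below applies OE9. On our reading of the table, OE9 uses resident population as its only driver, with a coefficient of 0.005 per thousand residents and a constant of 4.869. The company population of 2.5 million is hypothetical. The costs are in the model's own units, which this section does not restate.

```python
# OE9 as read from the table: smoothed sewage growth costs regressed on
# resident population (000s). The constant is not statistically significant
# (p = 0.279) but still enters the prediction.
OE9_CONSTANT = 4.869
OE9_RESIDENT_POPULATION = 0.005  # per thousand residents


def oe9_prediction(resident_population_000s: float) -> float:
    """Predicted smoothed sewage growth costs for a given resident population."""
    return OE9_CONSTANT + OE9_RESIDENT_POPULATION * resident_population_000s


# Hypothetical company with 2.5 million residents (2,500 thousand).
print(round(oe9_prediction(2500), 3))  # 17.369
```

Unlike the log-log models, a linear model of this kind implies the same additional cost for every extra thousand residents, whatever the size of the company.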