how many replicate tests are needed to test cookstove...
TRANSCRIPT
-
1
How many replicate tests are needed to test cookstove performance and 1
emissions? – Three is not always adequate 2
Yungang Wang1,*
, Michael D. Sohn1, Yilun Wang
2, Kathleen M. Lask
3, Thomas W. 3
Kirchstetter1,4
, Ashok J. Gadgil1,4
4
1Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720 5
2ISO Innovative Analytics, San Francisco, CA 94111 6
3University of California - Berkeley, College of Engineering, Applied Science and Technology 7
Program, Berkeley CA 94720 8
4University of California - Berkeley, Department of Civil and Environmental Engineering, 9
Berkeley CA 94720 10
11
Corresponding author: Yungang Wang (Email: [email protected]; Phone: 1-510-495-8065) 12
13
Abstract 14
Almost half of the world’s population still cooks on biomass cookstoves of poor efficiency and 15
primitive design, such as three stone fires (TSF). Emissions from biomass cookstoves contribute 16
to adverse health effects and climate change. A number of improved cookstoves with higher 17
energy efficiency and lower emissions have been designed and promoted across the world. 18
During the design development, and for the selection of a stove for dissemination, the stove 19
performance and emissions are commonly evaluated, communicated and compared using the 20
arithmetic average of replicate tests made using a standardized laboratory-based test, commonly 21
the water boiling test (WBT). However, the statistics section of the test protocol contains some 22
debatable concepts and in certain cases, easily misinterpreted recommendations. Also, there is 23
mailto:[email protected]
-
2
no agreement in the literature on how many replicate tests should be performed to ensure 24
“confidence” in the reported average performance (with three being the most common number of 25
replicates). This matter has not received sufficient attention in the rapidly growing literature on 26
stoves, and yet is crucial for estimating and communicating the performance of a stove, and for 27
comparing the performance between stoves. We illustrate an application using data from a 28
number of replicate tests of performance and emission of the Berkeley-Darfur Stove (BDS) and 29
the TSF under well-controlled laboratory conditions. Here we focus on two as illustrative: time-30
to-boil and emissions of PM2.5 (particulate matter less than or equal to 2.5 micrometers in 31
diameter). We demonstrate that interpretation of the results comparing these stoves could be 32
misleading if only a small number of replicates had been conducted. We then describe a 33
practical approach, useful to both stove testers and designers, to assess the number of replicates 34
needed to obtain useful data from previously untested stoves with unknown variability. 35
36
Keywords: Cookstove; Berkeley-Darfur Stove; Variability; Confidence Interval; Kolmogorov–37
Smirnov Test; Bootstrap 38
39
1. Introduction 40
About half of the world’s population uses biomass as fuel for cooking (IEA, 2004). The smoke 41
from biomass cooking fires was recently found to be the largest environmental threat to health in 42
the world, and is associated with 4 million deaths each year (Lim et al., 2012). This exposure 43
has also been linked to adverse respiratory, cardiovascular, neonatal, and cancer outcomes 44
(Smith et al., 2004; Weinhold, 2011). A 2011 World Bank report notes significant contributions 45
-
3
of biomass cooking to global climate change (World Bank, 2011). The contribution to climate 46
change from black carbon (BC) emission from biomass cooking is a topic of growing interest, 47
especially in terms of climate forcing and melting of glaciers (Hadley et al., 2010; Ramanathan 48
and Carmichael, 2008). Current biomass stoves lead to a large burden of disease, and contribute 49
to adverse impacts on local and the global environment. Hence there is substantial interest in 50
developing and disseminating fuel-efficient biomass stoves with reduced emissions (e.g. DOE 51
2011). Launched in September 2010, the Global Alliance for Clean Cookstoves (GACC) “100 52
by 20” goal calls for 100 million homes to adopt clean and efficient stoves and fuels by 2020. 53
The “three-stone fire” (TSF) is a commonly prevailing cooking method for a large 54
fraction of the population at the base of the economic pyramid. In quantifying the performance 55
of an improved stove, the TSF is commonly used as the baseline. This least expensive class of 56
stove is simply an arrangement of three large stones supporting a pot over an open and unvented 57
biomass fire. A TSF is one of the two stoves we analyzed in this study. We also tested the 58
performance and emissions of the Berkeley-Darfur Stove (BDS) as an exemplar of an improved 59
fuel-efficient biomass cookstove. The BDS was developed at Lawrence Berkeley National 60
Laboratory (LBNL) for internally displaced persons in Darfur, Sudan 61
(http://cookstoves.lbl.gov/darfur.php). It is an all-metal precision-designed natural-convection 62
stove, with design features co-developed by iterative feedback from Darfuri women cooks. The 63
BDS by design accommodates Darfuri traditional round-bottom cooking pots and cooking 64
techniques (Figure 1). 65
A literature survey of recent laboratory cookstove testing in peer-reviewed journal 66
articles shows widely different numbers of replicate tests (Bailis et al., 2007; Jetter and Kariher, 67
2009; Jetter et al., 2012; MacCarty et al., 2008, 2010; Roden et al., 2009; Smith et al., 2007). 68
-
4
The number of replicates reported in these seven studies range from 1 to 23. However, six out of 69
seven studies have reported results with only 3 or fewer replicates. One then can rightly ask: 70
how many replicate tests do I need to test the performance and emissions of the stove? 71
Answering this question is application specific, and requires greater specificity. For example, 72
the question might be better phrased. For a water boiling test (WBT), how many replicates are 73
needed to estimate the average “time to boil” to within 2 minutes and with 95% confidence? Or 74
how many replicates are needed to confirm, with 95% confidence, that Stove “A” emits less 75
PM2.5 than Stove “B”? These questions exemplify perhaps the most frequently asked questions in 76
planning stove experiments and interpreting their results. 77
There is no single or simple answer to the number of replicates needed to answer the 78
above questions. The answer depends on the experimental design, how many parameters need to 79
be estimated, and the resulting variability in the stove replicates. In this study, we investigate 80
how to answer the above questions using data from the BDS and TSF water boiling experiments. 81
We show how the number of replicates is linked to uncertainty and variability in the experiments 82
and stove performance. We also show how many replicates are likely needed as various 83
practical performance comparisons, such as “Does Stove A perform better than Stove B?” and 84
“What is the uncertainty in the expected performance of Stove A or Stove B?” Finally, we 85
describe a practical approach to design an experiment to test the performance of a previously 86
untested stove. 87
88
2. Problem statement and causes of variability 89
-
5
The Appendix 6 of the WBT (version 3.0, http://www.pciaonline.org/node/1048) 90
provides a detailed approach for comparing the performance of stoves. It describes a suite of test 91
statistics and important considerations for interpreting test results. While comprehensive, the 92
description contains some debatable concepts and in certain cases, easily-misinterpreted 93
recommendations. For example, it affirms “At least three tests should be performed on each 94
stove” and provides a cogent explanation for it. It also discusses the importance of paying 95
attention to the statistical significance of a series of comparison tests between the performances 96
of two stoves. While both statements are correct, it is not surprising that stove testers 97
misinterpret these comments as (i) “only three tests are needed” or (ii) a hypothesis test with 98
strong p-value (assuming a Gaussian distribution) provides unarguable confirmation of stove 99
performance or comparison results. In fact, neither interpretation is correct or claimed in the 100
text. We reason further elucidation of Appendix 6 is necessary, and a more transparent 101
methodology would greatly benefit stove testers. We believe a transparent methodology would 102
be best accomplished by developing an approach that maps the trade space between sample size, 103
variability, and confidence. We also believe it is important to show that alternative methods for 104
comparing the performances of stoves are available and should be considered. This work thus 105
builds and improves upon Appendix 6 by providing new methods of interpreting test results for 106
stove testers. 107
The literature generally shows that even under carefully controlled conditions, stove test 108
results show high test-to-test variability (coefficient of variation > 1.0, e.g. Jetter et al., 2012). 109
There are many possible causes of this variability even within a precisely defined test such as the 110
latest WBT (version 4.2.2), and we list a few here. Stove efficiency and emissions are generally 111
a function of thermal power, and owing to the discrete nature of fuel-feeding events, a stove’s 112
-
6
thermal power invariably varies, also contributing to temporal variability within a test, which can 113
translate into test-to-test variability. Despite due care, the ratio of bark to sapwood to hardwood 114
for various pieces of fuelwood can be different, and thus will have different burn characteristics. 115
Furthermore, different pieces of fuelwood may have different surface to volume ratios, 116
contributing to different rates of burning. Lastly, even reasonably experienced and careful stove 117
testers demonstrate some variability in the way they tend the fire in the stove from test to test, 118
and within a test (Granderson et al., 2009). All these (and other uncontrolled factors) together 119
give rise to what we lump together as variability in the test-to-test replicate results for a stove 120
under controlled laboratory conditions. 121
122
3. Approach 123
The question of “How many replicate tests do I need?” is not novel. It is a well-researched 124
question in classical statistical theory, but has not received much attention from the stove 125
research community. We briefly summarize here the statistical background relevant to answer 126
the question. 127
3.1 Probability density function and cumulative distribution function 128
Technically, for a continuous random variable, the probability density function (PDF) describes 129
the probability that a value will be within a certain range of the sample. However, as this range 130
is evaluated by integrating, it can be chosen to be quite small, so for most practical purposes, the 131
PDF may be considered the probability of obtaining a particular value (Ellison, 2009). 132
Graphically, if the PDF is a curve, the cumulative distribution function (CDF) is the area under 133
that curve. It is used to compute probability; the larger the included range, the greater the 134
-
7
probability. Because of this, the CDF over the entire range is equal to 1. For a normal (or 135
Gaussian) distribution, the CDF curve is a normal ogee curve, which is a smooth even S-shaped 136
curve (Ellison, 2009). Skewing in the distribution away from the Gaussian will lead to one half 137
of the S to be elongated or distorted. 138
3.2 Standard error and confidence interval for an average 139
The standard deviation refers to the variation of observations within individual experimental 140
units, whereas the standard error refers to the random variation of an estimate (made with n 141
replicates) from the mean value that will be obtained as the number of replicates increases. The 142
standard deviation is calculated by: 143
√ ∑( ̅)
(1) 144
where are the individual measurements used to calculate the average. A 145
convenient way to calculate the sample standard deviation is using the “STDEV” function in 146
Excel. The standard error is the measure of the experimental error of an estimated statistic (e.g. 147
the mean). For the sample average ̅ from n replicate tests, the standard error ̅ is √ , where 148
is the standard deviation of the n replicates. The standard error on the mean can be reduced by 149
increasing the number of replicates. Replication will not reduce the standard deviation but it will 150
reduce the standard error. In practical term, this means that our goal is to achieve a standard 151
error small enough to make convincing and useful conclusions. Additionally, in our experience, 152
computing the variance can be problematic from very few replicates. It is mathematically 153
correct that a variance can be computed from just three replicates. However, we have commonly 154
found that three replicates resulted in a somewhat small variance, only to be often greater or 155
-
8
much greater once we include the fourth and fifth sample. As a rule-of-thumb we are dubious of 156
variances computed from fewer than five replicates. 157
The confidence interval indicates the reliability of an estimate made from a given number 158
of replicates. The ( ) confidence interval for the average ̅ has the form ̅ , 159
where is called the half-length, since a segment of the length of 2E centered on ̅, provides the 160
full confidence interval. E is related to α, σ, and n (the number of replicates) by the following 161
equation. 162
⁄ √ (2) 163
Where ⁄ is a dimensionless number that can be looked up in standard handbooks for various 164
standard distributions (e.g. Berthouex and Brown, 2002). Transposing equation (2), the number 165
of replicates that will produce this interval half-length is 166
(
⁄
) (3) 167
This assumes random sampling. It also assumes that n is large enough that the normal 168
distribution can be used to define the confidence interval. To apply equation (3), we must 169
specify ( ) . Values of ( ) that might be used are shown in the top row 170
with corresponding values of Z in the bottom row of Table 1. 171
When the measurements are assumed to be normally distributed but the number of 172
replicates is small (by small, textbooks suggest less than 30) and the population standard 173
deviation is unknown, a Student’s t-distribution is used (Berthouex and Brown, 2002). To 174
calculate the number of replicates n, the coefficient is used in place of ⁄ shown in equation 175
-
9
(3). A selection of t-values is listed in Table 2. The t value decreases as n increases, but notice 176
that there is little change once n exceeds 5. An exact solution of the number of replicates for 177
small n (less than 30) requires an iterative solution, but a good approximate is obtained by using 178
a rounded value of t = 2.1 or 2.2, which covers a good working range of n = 10 to n = 25 (p = 179
0.05). When analyzing data we carry three decimal places in the value of t, but that kind of 180
accuracy is misplaced. The greatest uncertainty lies in the value of the specified (refer to 181
Equation (2)), so we can conveniently round off t to one decimal place. Additional information 182
about confidence interval estimation and experiment sizing can be found in Berthouex and 183
Brown (2002), Spiegel et al. (2008), and Taylor (1997). 184
3.3 Bootstrapping 185
All the preceding discussion was predicated on the assumption of a Gaussian distribution of 186
underlying population. What if the distribution is not Gaussian? Bootstrapping is a powerful 187
statistical approach that allows estimation of the variability of many properties of the data 188
without making any assumptions about the shape of the original distribution F. Efron (1979) 189
provides an accessible explanation, with examples, of the bootstrap method. The key principle 190
of Bootstrapping is to simulate repeated observations from the unknown distribution F, using 191
repeated sampling of the obtained single set of data. Bootstrapping can be implemented by 192
constructing a number of resamples of the observed dataset. Each resample is obtained by 193
random sampling with replacement from the original dataset (Varian, 2005). Increasing the 194
number of resamples can reduce the impact of random sampling errors, but it cannot increase the 195
amount of information existing in the original dataset (Efron and Tibshirani, 1993). 196
3.4 Kolmogorov-Smirnov test 197
-
10
The Kolmogorov-Smirnov (K-S) test quantifies whether two cumulative distribution functions 198
(CDFs) are from the same population. It does so by exploring the maximum distance between 199
the two CDFs. Corder et al. (2009) provide a good summary of the K-S test. The null 200
hypothesis of a K-S test poses that the two samples are from the same population, and the 201
research hypothesis poses either that they generally differ, leading to a two-tailed probability 202
estimate, or that they differ in a specific direction, leading to a one-tailed estimate (Wall, 2003). 203
The K-S test can be used to compare a sample distribution and a reference distribution or to 204
compare two sample distributions. We will apply this test to help us explore how many 205
replicates are needed to confirm whether the performance of two stoves is indistinguishable. 206
The K-S test is a nonparametric statistical test and is only limited by the condition that it 207
must be applied to continuous distributions. Unlike the t-test and other parametric tests, which 208
require assuming Gaussian distribution, continuity is the primary requirement for application of 209
K-S test making it a very useful tool with unknown distributions. Also for small and medium 210
samples, it is more effective to use the K-S test over other nonparametric “goodness-of-fit” tests, 211
such as the chi-square test or the Wilcoxon test. The different research hypotheses of the K-S 212
test also provide directional flexibility which the chi-square test cannot provide (Wall, 2003). 213
214
4. Methods 215
4.1 Laboratory testing 216
Laboratory tests of BDS and TSF were performed at the LBNL cookstove testing facility. 217
Concentrations of PM2.5 (particulate matter less than or equal to 2.5 micrometers in diameter), 218
carbon monoxide/carbon dioxide (CO/CO2), BC, and several other co-pollutants emitted from 219
-
11
the BDS and TSF were simultaneously measured. The DustTrak measures the amount of light 220
scattered by particles and relates that to their mass. It is calibrated for a National Institute of 221
Standards and Technology (NIST) certified PM standard composed of soil from Arizona. Since 222
the amount of light scattered by particles is specific to their morphology and chemical 223
composition, in this study a calibration specific to wood smoke was developed, per the 224
manufacturer’s recommendation, by comparing PM2.5 concentrations measured with the 225
DustTrak after adjusting for secondary dilution to those measured gravimetrically. However, the 226
DustTrak data are not as reliable and consistent as gravimetric results. 227
The CO/CO2 concentrations were measured in a single instrument by nondispersive 228
infrared absorption spectroscopy (NDIR analyzer, CAI 600 series). A cookstove smoke-specific 229
calibration was developed for the BC aethalometer measurements. The results were compared 230
with particle light-absorption coefficients measured with a photoacoustic absorption 231
spectrometer (PAS) and elemental carbon concentrations measured using a thermal-optical 232
analysis method. The moisture content of each piece of fuel wood was measured using a 233
moisture meter (Delmhorst, J-2000). Soft (pine and fir) and hard (oak) woods were used in an 234
equal number of tests with both stove types. Soft wood pieces were saw-cut to approximately 15 235
cm long with a square cross-section of approximately 4 cm2 and hard wood pieces were hatchet-236
cut to a similar size but irregular shape. The variability in the laboratory test results could 237
probably be further reduced by using consistent quality wood with more consistent dimensions. 238
The BDS and TSF were compared using a modification of the WBT V3.0 protocol. The 239
WBT is intended to provide a method to compare the performance and emissions of different 240
stoves in completing a defined standardized task (Bailis et al., 2007). In our modified protocol, a 241
fire is ignited and maintained by periodic feeding of fuelwood to bring 2.5 L of water in a 2.3 kg 242
-
12
metal Darfur pot (without pot lids) to boil and subsequently maintain it on simmer for 15 243
minutes, whereupon the fire is extinguished and the mass of remaining fuelwood is measured. 244
The WBT suggests a default test volume of water of 5 L. We chose to test with 2.5 L of water, 245
because it reflects the actual volume of food stove users prepare at a time. Our previous testing 246
results show no significant difference of time to boil between cold start and hot start for both the 247
BDS and TSF. Therefore, only one high-power phase (cold start) was included in each test. 248
Note that the International Organization for Standardization (ISO) International Workshop 249
Agreement (IWA) metrics average high-power (cold start and hot start) values 250
(http://www.pciaonline.org/files/ISO-IWA-Cookstoves.pdf). When three WBT replicate tests 251
are performed, n is equal to 6. 252
One of the main metrics in our modified WBT test is the time to boil. In an important 253
report by the United States Agency for International Developing (USAID, 2008), authors state, 254
“Fuel-efficient stoves can deliver numerous benefits to end-user households, including fuel and 255
time savings.” This underlines what we found in our work in Darfur, time savings are indeed 256
important to the users. Moreover, we learned from our field partners that the most attractive 257
feature of the BDS is that the stove could take their drinking water to boiling in less than 5 258
minutes. The refugee women in Darfur IDP camps have named the BDS in Arabic “Kanun 259
Khamsa Dagaig” (i.e., “the 5-min stove”), indicating this as the single most important feature of 260
the BDS from their perspective. Therefore, we believe “time to boil” is an important testing 261
matrix from the user perspective and consequently, it is important for us to examine for both 262
BDS and TSF. 263
Stove testers control the fuel feeding rate that determines the time to boil. Two trained 264
stove testers were employed for all the tests in this study. The average fuel burning rates for 265
-
13
BDS and TSF are 12.2 ± 0.9 g/min and 13.8 ± 1.3 g/min (mean ± 1SD), respectively. These 266
values indicate that the fire tending skill of the two testers is very consistent. Please note in other 267
areas of the world where fuel is more abundant and inexpensive compared to Darfur, users often 268
sacrifice fuel consumption for time savings. As shown in Figure 1, the BDS has a small fire box 269
opening to prevent using more fuel wood than necessary. The TSF has no such restriction, so it 270
can achieve a higher fuel burning rate than the BDS, therefore, the TSF could have a shorter time 271
to boil if fuel consumption is not an issue. The detailed testing methodology and results are 272
given by Kirchstetter et al. (2010). 273
4.2 Data analysis 274
Stove performance is strongly influenced by the skill of the person tending the stove. Dozens of 275
tests were practiced by trained stove testers on both TSF and BDS, and these data were discarded 276
before performing the tests to produce the data reported in this paper. This ensured that the 277
variability observed in the test results was not being primarily influenced by increasing skill of 278
the tester in tending the stove. There were 20 and 21 tests completed for TSF and BDS for data 279
analysis, respectively. All instrumentation discussed above operated properly during these 41 280
tests. The statistical analysis was performed using Statistical Analysis System (SAS Institute 281
Inc., version 9) and R (http://www.r-project.org/). 282
283
5. Results and discussion 284
5.1 Data overview 285
http://www.r-project.org/
-
14
The stove performance and emission results of 21 BDS tests and 20 TSF tests are 286
comprehensively presented in Kirchstetter et al. (2010). The moisture content and dry mass of 287
the soft and hard woods were similar to each other and were the same for TSF and BDS tests. 288
The completion of tests with softwood (10 tests) required about 90% of the time duration and 289
90% of the wood mass compared to those with hard wood (10 tests). 290
The data of time to boil and PM2.5 emission factor (g/g of fuel consumed) for TSF and 291
BDS are selected for the statistical analysis in this study. We understand that PM2.5 emissions 292
per energy delivered to the cooking pot (g/MJ delivered) is an important metric of cookstove 293
performance, because it is based on the fundamental desired output - cooking energy- that 294
enables valid comparisons between all stoves and fuels (Smith et al., 2000). Also cooking 295
energy tends to have less variation than time to boil, so it might require a smaller number of 296
replicates. However, the data for the mass of water evaporated and the mass of fuel consumed 297
during cold start were not collected when these tested were conducted. Thus, a shortcoming of 298
this study is that it is not possible to calculate the emission factors based on energy delivered to 299
the pot. 300
The histogram plots of these data are shown in Figure 2 and Figure 3. The CDF plots for 301
the same data are shown in Figure 4 and Figure 5. On average, cooking tests with the BDS were 302
completed in 74% of the time for TSF (30.3 minutes vs. 41.0 minutes). There was less variation 303
in time to boil with the BDS, as indicated by a narrower spread in the CDF curves for BDS 304
compared to TSF (Figure 4). The average PM2.5 emission factor for the BDS tests was 80% of 305
that for the TSF (3.1 g/kg-wood burned vs. 3.9 g/kg-wood burned). PM2.5 shows large test-to-306
test variability. The distributions of BDS and TSF PM2.5 data overlap substantially, but the 307
-
15
question to answer is whether data from BDS and TST tests show performance data that are 308
different and discernable. 309
5.2 Number of replicate tests to estimate the mean 310
We now discuss the number of replicate tests needed to estimate the experiment mean within a 311
user-defined level of confidence. For example, suppose the analyst desires to compute the 312
expected boil time of the BDS within a range of plus or minus 2 minutes. Suppose also that the 313
analyst desires the certainty of that estimate to be 95%. In other words, the analyst is saying, “I 314
would like to know the number of replicate tests needed to compute the average time to boil of 315
the BDS within a range of 4 minutes, and I want to know that range with a confidence of 95%.” 316
Figure 6 shows the number replicates needed for three probability levels (0.1, 0.05, and 0.01), 317
which correspond to confidences of 90%, 95%, and 99%, respectively. We compute the number 318
of replicates using equation (3). The x-axis represents the number of replicates ranging from 1 to 319
25. The y-axis represents the width of the confidence interval about the mean, which is twice the 320
E value in equation (2). As can be seen in the figure, the smaller the confidence interval about 321
the mean desired, the larger the number of replicates required. 322
As the 0.05 probability in Figure 6 shows, if the width of the confidence interval for the 323
mean time to boil is 4 minutes at the probability of 0.05, 7 replicates are required. Note that σ 324
for the underlying distribution in equation (2) is calculated based on the original 21 replicate 325
tests. If only two replicates are conducted, the width of the confidence interval about the mean is 326
38 minutes at the probability of 0.05 (191 minutes for the probability of 0.01, 19 minutes for the 327
probability of 0.10). When the number of replicates increases to 5, the width shrinks to 5.3 328
minutes at the probability of 0.05 (8.8 minutes for the probability of 0.01, 4.1 minutes for the 329
-
16
probability of 0.10). The width of the confidence interval about the mean is relatively stable 330
when the number of replicates is greater than 15. A similar trend is observed for the BDS PM2.5 331
emission factor data. The width of the confidence interval about the mean BDS PM2.5 emission 332
factor is enormous for n < 5, and becomes steady when n > 10. 333
5.3 Number of replicate tests to compare two stoves 334
We now discuss how many replicate tests are needed to confirm whether the performance of two 335
stoves is indistinguishable, within a level of confidence. In essence, we test whether the 336
underlying statistical distribution of the two stoves for the mean boil time or emission factor are 337
the same. Figure 7 shows the probability as a function of the number of replicates calculated 338
using the K-S test. 339
On the x-axis is the number of replicates. For every replicate number, we generated 340
50,000 bootstrap samples using the original 21 replicate tests for the BDS and 50,000 bootstrap 341
samples using the original 20 TSF replicate tests. For each pair of samples, we compute the 342
probability (p value) that they come from the same distribution. We then compute the ratio, or 343
probability, of the number of pairs that come from the same distribution divided by 50,000 with a 344
confidence of 95%. The y-axis shows the resulting probability. When the number of replicates 345
is greater than 6, the probability that the BDS and the TSF time to boil data are from two 346
different distributions is greater than 95%. For the PM2.5 emission factor data, 30 replicates are 347
required to ensure that at least 95% chance the BDS and the TSF samples are drawn from two 348
different distributions. 349
5.4 A practical approach to assess the number of replicate tests 350
-
17
The difficulty with estimating the number of replicate tests needed for one particular stove is the 351
lack of prior knowledge about the expected σ of the planned experiments. We knew the σ for the 352
above demonstration because we had already conducted 21 replicates. 353
In the absence of the σ, the experiment designer must speculate on the variance. We 354
recommend reviewing the literature of similar stoves to pose a notional variance. In the absence 355
of such data, then the designer must use any other information as a starting point, such as the 356
variance computed from the TSF and BDS replicates reported here. Note the σ values for BDS 357
and TSF for time-to-boil are 2.1 minutes and 5.6 minutes, respectively, and for emission factor 358
for PM2.5 they are 1.2 g/kg-wood and 1.0 g/kg-wood, respectively. The σ values calculated for 359
all measured variables are summarized in Table 3. 360
Note the wide difference in the three-stone-fire and the Berkeley-Darfur Stove. The 361
former is a set up with three stones with irregular shape, and the dimensions and shape and 362
spacing of the stones can vary from test to test. Results reported in the literature have generally 363
been with consistent dimensions, shape, and spacing of the stones (or bricks). This factor may 364
have caused more variation in our TSF results compared to literature values. In contrast, the 365
BDS is precisely engineered metal stove of fixed dimensions. The remarkable point is that while 366
there is a difference in the σ values for the time to boil, there is not a large difference in the σ 367
values for emission factors of the TSF and BDS despite the significant design difference. So, we 368
recommend starting conservatively, with the notional σ similar to the value for the TSF. If the 369
designer’s stove or testing conditions are likely to show less variation, then perhaps start with a 370
notional variance that is 10% less. Conversely, our BDS experiment was conducted in a 371
controlled laboratory setting. If the designer expects greater variation in the experiment (say, 372
owing to variable field conditions), then begin with a notional variance of 10, 50, or even 100 373
-
18
percent greater. For example, when testing fan-assisted stoves, which burn engineered wood 374
pellets, we might start with a notional variance that is 10% smaller than was found here owing to 375
the uniform nature of the engineered fuel pellets. On the other hand, when testing an open fire 376
(the fire is open completely to the ambient environment), we might begin with a notional 377
variance that is twice that of the TSF (three stones or bricks are places between the fire and the 378
ambient to provide the support to the cooking pot and some insulation of the fire) laboratory tests 379
reported here. 380
With a notional variance, the designer would proceed with equation (3) to compute the 381
number of replicates needed based on the desired size of confidence interval (E) and the level of 382
confidence desired (α). Remember also that test conditions change, instruments malfunction, 383
and interpretation of tests differ (such as the precise time of onset of hard boil, or the precise 384
duration that water simmer, can be questionable). These factors should also be considered 385
beyond what is computed from the above statistics to arrive at the number of replicate tests. 386
More replicate tests should be planned than required by the statistical estimation to compensate 387
for these unusual occurrences. This also increases the margin of safety in case the variability in 388
the underlying distribution, represented by the standard deviation (σ) in equation (2), is larger 389
than anticipated. A conservative margin of 100% is recommended based on our abundant stove 390
laboratory testing experience. 391
With the number of replicate tests determined, the experimenters conduct the tests. With 392
these data now in hand, the experimenters can calculate the actual, observed variance computed 393
from the experiment. This value should be used to estimate the analysis results. One might need 394
to conduct additional replicate tests to achieve the desired confidence interval and desired level 395
of confidence in the mean estimation from the test results. 396
-
19
397
6. Conclusions 398
Our results show moderate inherent variability (coefficient of variation up to 0.4) among the TSF 399
and BDS time to boil and PM2.5 emission measurements based on the modified WBT protocol. 400
We demonstrate using these data as examples that some stove laboratory testing results could be 401
misleading if only a small number of replicate tests were conducted. However, there are costs 402
associated with increasing the number of replicates. The average value of any measured 403
parameter should be always reported together with the number of replicates conducted and the 404
uncertainty (e.g. standard deviation or confidence interval). Cautions must be exercised in the 405
interpretation of results based on only a few replicates. We then describe a practical approach to 406
calculate the number of replicate tests needed to obtain useful data from previously untested 407
stoves. 408
The implications of these results include the following: (1) In the stove design and 409
laboratory testing phase, researchers need to conduct a relatively large number of replicate tests 410
to ensure with some confidence that the improvements of stove performance and emission levels 411
are truly achieved. (2) In the stove field testing phase, even more tests are required because of 412
the less controlled testing environment and the associated larger inherent variability within the 413
replicates. (3) In the stove dissemination and adoption phase, decision makers and policy 414
analysts should take into consideration the variability and confidence intervals of the laboratory 415
and field testing results prior to any decisions. 416
417
Acknowledgements 418
-
20
This work was performed at the Lawrence Berkeley National Laboratory, operated by the 419
University of California, under DOE Contract DE‐AC02‐ 05CH11231. We gratefully 420
acknowledge partial support for this work from LBNL’s LDRD funds, and DOE’s Biomass 421
Energy Technologies Office. The data used in this work were collected during research 422
supported with grant number 500-99-013 from the California Energy Commission (CEC). Its 423
contents are solely the responsibility of the authors and do not necessarily represent the official 424
views of the CEC. Kathleen M. Lask was supported with National Defense Science and 425
Engineering Graduate (NDSEG) Fellowship and the National Science Foundation Graduate 426
Research Fellowship. The authors gratefully acknowledge Douglas Sullivan, Jessica 427
Granderson, Chelsea Preble, Odelle Hadley and Philip Price of Lawrence Berkeley National 428
Laboratory for their support of this project, as well as the many students, interns, and researchers 429
who, before us, contributed to the development of the Berkeley-Darfur Stove. The authors are 430
very grateful to have the paper manuscript reviewed by the journal reviewers. The paper quality 431
is substantially improved owing to the careful review. 432
433
Disclaimer 434
This document was prepared as an account of work sponsored by the United States Government. 435
While this document is believed to contain correct information, neither the United States 436
Government nor any agency thereof, nor The Regents of the University of California, nor any of 437
their employees, makes any warranty, express or implied, or assumes any legal responsibility for 438
the accuracy, completeness, or usefulness of any information, apparatus, product, or process 439
disclosed, or represents that its use would not infringe privately owned rights. Reference herein 440
-
21
to any specific commercial product, process, or service by its trade name, trademark, 441
manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, 442
recommendation, or favoring by the United States Government or any agency thereof, or The 443
Regents of the University of California. The views and opinions of authors expressed herein do 444
not necessarily state or reflect those of the United States Government or any agency thereof, or 445
The Regents of the University of California. 446
447
References 448
Bailis, R., Berrueta, V., Chengappa, C., Dutta, K., Edwards, R., Masera, O., Still, D., Smith, K. 449
R., 2007. Performance testing for monitoring improved biomass stove interventions: 450
experiences of the household energy and health project. Energy for Sustainable Development 451
11 (2), 57-70. 452
Berthouex, P. M., Brown, L. C., 2002. Statistics for Environmental Engineers. Second Edition. 453
Lewis Publishers. 454
Corder, G., Foreman, D., 2009. Nonparametric Statistics for Non-Statisticians: A Step-by-Step 455
Approach. Wiley. 456
DOE (Department of Energy), 2011. Biomass cookstoves technical meeting: Summary report, 457
Alexandria, VA. 458
http://www1.eere.energy.gov/biomass/pdfs/cookstove_meeting_summary.pdf. Accessed 459
February 4, 2013. 460
Efron, B., 1979. Bootstrapping methods: Another look at the jackknife. The Annals of Statistics 461
7 (1): 1-26. 462
Efron, B., Tibshirani, R., 1993. An introduction to the Bootstrap. Boca Raton, FL: Chapman & 463
Hall/CRC. ISBN 0-412-04231-2. 464
Ellison, S., Barwick, V., Duguid Farrant, T., Practical Statistics for the Analytical Scientist: A 465
Bench Guide, 2nd ed., (Royal Society of Chemistry, 2009)\ 466
http://www1.eere.energy.gov/biomass/pdfs/cookstove_meeting_summary.pdf
-
22
Granderson, J., Sandhu, J. S., Vasquez, D., Ramirez, E., Smith, K. R., 2009. Fuel use and design 467
analysis of improved woodburning cookstoves in the Guatemalan Highlands. Biomass and 468
Bioenergy 33, 306-315. 469
Hadley, O. L., Corrigan, C. E., Kirchstetter, T. W., Cliff, S. S., Ramanathan, V., 2010. 470
Measured black carbon deposition on the Sierra Nevada snow pack and implication for snow 471
pack retreat. Atmos. Chem. Phys., 10, 7505-7513. 472
IEA (International Energy Agency), 2004. Energy and Development. World Energy Outlook 473
2004. IEA Publications, Paris. 474
Jetter, J., Kariher, P., 2009. Solid-fuel household cook stoves: Characterization of performance 475
and emissions. Biomass and Bioenergy 33, 294-305. 476
Jetter, J., Zhao, Y., Smith, K. R., Khan, B., Yelverton, T., DeCarlo, P., Hays, M. D., 2012. 477
Pollutant emissions and energy efficiency under controlled conditions for household biomass 478
cookstoves and implications for metrics useful in setting international test standards. 479
Environmental Science & Technology 46, 10827-10834. 480
Kirchstetter, T., Preble, C., Hadley, O., Gadgil, A., 2010. Quantification of black carbon and 481
other pollutant emissions from a traditional and an improved cookstove. Lawrence Berkeley 482
National Laboratory (LBNL) Report, number: LBNL-6062E. Available: 483
http://gadgillab.berkeley.edu/wp-content/uploads/2010/11/TWK_improved-cookstove.f_13-484
6-10.pdf. Accessed December 5, 2013. 485
Lim S.S., Vos, T., Flaxman, A. D., Danaei, G., Shibuya, K., Adair-Rohani, H. et al., 2012. A 486
comparative risk assessment of burden of disease and injury attributable to 67 risk factors 487
and risk factor clusters in 21 regions, 1990-2010: a systematic analysis for the Global Burden 488
of Disease Study 2010. Lancet 380, 2224-60. 489
MacCarty, N., Ogle, D., Still, D., Bond, T., Roden, C., 2008. A laboratory comparison of the 490
global warming impact of five major types of biomass cooking stoves. Energy for 491
Sustainable Development 12 (2), 56-65. 492
MacCarty, N., Still, D., Ogle, D., 2010. Fuel use and emissions performance of fifty cooking 493
stoves in the laboratory and related benchmarks of performance. Energy for Sustainable 494
Development 14, 161-171. 495
Milton, J. S., Arnold, J. C. 1995. Introduction to probability and statistics: Principles and 496
applications for engineering and the computing sciences, 3rd ed., McGraw-Hill. 497
-
23
Ramanathan, V., Carmichael, G., 2008. Global and regional climate changes due to black 498
carbon. Nature Geoscience 1, 221 – 227. 499
Roden, C. A., Bond, T. C., Conway, S., Benjamin, A., Pinel, O., MacCarty, N., Still, D., 2009. 500
Laboratory and field investigations of particulate and carbon monoxide emissions from 501
traditional and improved cookstoves. Atmospheric Environment 43, 1170-1181. 502
Smith, K. R., Uma, R., Kishore, V. V. N., Lata, K., Joshi, V., Zhang, J., Rasmussen, R. A., 503
Khalil, M. A. K., 2000. Greenhouse gases from small-scale combustion devices in 504
developing countries; EPA/600/R-00/052; U.S. Environmental Protection Agency: 505
Washington, DC. 506
Smith, K. R., Mehta, S., Maeusezahl‐Feuz, M., 2004. Indoor smoke from household solid fuels. 507
In Comparative quantification of health risks: global and regional burden of disease due to 508
selected major risk factors, M. Ezzati, A.D. Rodgers, A.D. Lopez, and C.L.J. Murray eds., 509
World Health Organization, Geneva, Switzerland. 510
Smith, K. R., Dutta, K., Chengappa, C., Gusain, P. P. S., Berrueta, V., Masera, O., Edwards, R., 511
Bailis, R., Shields, K. N., 2007. Monitoring and evaluation of improved biomass cookstove 512
programs for indoor air quality and stove performance: Conclusions from the household 513
energy and health project. Energy for Sustainable Development 11 (2), 5-18. 514
Spiegel, M. R., Lipschutz, S., Liu, J., 2008. Mathematical Handbook of Formulas and Tables, 515
3rd ed. McGraw-Hill. 516
Taylor, J. R., 1997. An Introduction to Error Analysis, 2nd ed. University Science Books. 517
USAID (United States Agency for International Development), Fuel-efficiency stove programs 518
in IDP settings – Summary evaluation report, Darfur, Susan. December 2008; Available: 519
http://pdf.usaid.gov/pdf_docs/PDACM099.pdf. Accessed January 6, 2014. 520
Varian, H., 2005. Bootstrap Tutorial. Mathematics Journal, 9, 768-775. 521
Wall, J. V., Jenkins, C. R., 2003. Practical Statistics for Astronomers, Cambridge University 522
Press. 523
Weinhold, B., 2011. Indoor PM pollution and elevated blood pressure: Cardiovascular impact of 524
indoor biomass burning. Environmental Health Perspectives, 119 (10), A442. 525
World Bank, 2011. Household Cookstoves, Environment, Health and Climate Change: A New 526
Look at an Old Problem, The World Bank, Washington, DC; Available: 527
-
24
http://climatechange.worldbank.org/sites/default/files/documents/Household%20Cookstoves-528
web.pdf. Accessed December 5, 2013. 529
http://climatechange.worldbank.org/sites/default/files/documents/Household%20Cookstoves-web.pdfhttp://climatechange.worldbank.org/sites/default/files/documents/Household%20Cookstoves-web.pdf
-
25
Table 1. Summary of Z values. 530
1 – α = 0.99 1 – α = 0.95 1 – α = 0.90
z = 2.56 z = 1.96 z = 1.64
531
-
26
Table 2. Student’s t-distribution critical values. 532
n n – 1 t.995 (One sided) or t.975 (One sided) or t.95 (One sided) or
(Number of
replicates)
(Degrees of
Freedom) t.99 (Two sided) t.95 (Two sided) t.90 (Two sided)
1 - - - -
2 1 63.657 12.706 6.314
3 2 9.925 4.303 2.920
4 3 5.841 3.182 2.353
5 4 4.604 2.776 2.132
6 5 4.032 2.571 2.015
7 6 3.707 2.447 1.943
8 7 3.500 2.365 1.895
9 8 3.355 2.306 1.860
10 9 3.250 2.262 1.833
11 10 3.169 2.228 1.812
12 11 3.106 2.201 1.796
13 12 3.054 2.179 1.782
14 13 3.012 2.160 1.771
15 14 2.977 2.145 1.761
16 15 2.947 2.132 1.753
17 16 2.921 2.120 1.746
18 17 2.898 2.110 1.740
19 18 2.878 2.101 1.734
20 19 2.861 2.093 1.729
21 20 2.845 2.086 1.725
22 21 2.831 2.080 1.721
23 22 2.819 2.074 1.717
24 23 2.807 2.069 1.714
25 24 2.797 2.064 1.711
26 25 2.787 2.060 1.708
27 26 2.779 2.056 1.706
28 27 2.771 2.052 1.703
29 28 2.763 2.048 1.701
30 29 2.756 2.045 1.699
533
534
-
27
Table 3. Summary of the standard deviation values (σ) of all measured variables for the BDS 535
(n=21) and the TSF (n=20). 536
TSF BDS
Time to boil (minute) 5.6 2.1
Dry wood burned (g) 75.4 33.6
CO emission factor (g/kg-wood) 6.8 5.8
PM2.5 emission factor (g/kg-wood) 1.0 1.2
BC emission factor (g/kg-wood) 0.3 0.5
537
-
28
538
539
Figure 1. Schematic of the Berkeley-Darfur Stove. (1) A tapered wind collar that increases fuel-540
efficiency in the windy Darfur environment and allows for multiple pot sizes; (2) Wooden 541
handles for easy handling; (3) Metal tabs for accommodating flat plates for bread baking; (4) 542
Internal ridges for optimal spacing between the stove and a pot for maximum fuel efficiency; (5) 543
Feet for stability with optional stakes for additional stability; (6) Nonaligned air openings 544
between the outer stove and inner fire box to accommodate windy conditions; and (7) Small fire 545
box opening to prevent using more fuel wood than necessary. 546
-
29
547
548
Figure 2. Histogram of time to boil data for the BDS and the TSF. 549
-
30
550
551
Figure 3. Histogram of PM2.5 emission factor data for the BDS and the TSF. 552
-
31
553
554
Figure 4. Cumulative distribution function (CDF) of time to boil data for the BDS and the TSF. 555
-
32
556
557
Figure 5. Cumulative distribution function (CDF) of PM2.5 emission factor data for the BDS and 558
the TSF. 559
-
33
560
561
Figure 6. The width of the confidence interval about the mean as a function of the number of 562
replicate tests at three probability levels (0.1, 0.05, and 0.01) for the BDS time to boil and PM2.5 563
emission factor data. For example, if the width of the confidence interval for the mean time to 564
boil is 4 minutes at probability levels of 0.1, 0.05, and 0.01, 5, 7 and 12 replicates are required, 565
respectively, as indicated by the black horizontal dash line and the black vertical arrows. 566
-
34
567
568
Figure 7. Kolmogorov-Smirnov test result showing the probability of the BDS and the TSF 569
bootstrap samples are drawn from two different distributions as a function of the number of 570
replicate tests for the time to boil and PM2.5 emission factor data. 571