TRB06-0960
EFFECTS OF SAMPLE SIZE ON THE GOODNESS-OF-FIT STATISTIC AND
CONFIDENCE INTERVALS OF CRASH PREDICTION MODELS
SUBJECTED TO LOW SAMPLE MEAN VALUES
Ravi Agrawal
Research Assistant
Department of Civil Engineering
TAMU 3136
Texas A&M University
College Station, Texas
77843-3136
e-mail: [email protected]
Dominique Lord†
Assistant Professor
Department of Civil Engineering
Texas A&M University
College Station, Texas
77843-3136
e-mail: [email protected]
November 10th, 2005
4,459 words + 4 figures + 3 tables = 6,209 words
† Contact person
ABSTRACT
The statistical relationship between motor vehicle crashes and covariates can generally be
modeled via generalized linear models (GLMs) using logarithmic links with errors
distributed in a Poisson or Poisson-gamma manner. The scaled deviance (SD) and
Pearson’s X² are tools that have been proposed to test the statistical fit of GLMs. Recent
studies have shown that these two statistics are not adequate for testing the goodness-of-fit
(GOF) of GLMs when they are developed from data characterized by low sample
mean values. To circumvent this problem, a testing method has been proposed to evaluate
the goodness-of-fit of such GLMs. Given the fact that this method can be time-
consuming to implement, there is a need to determine whether this technique is sensitive
to different sample sizes. The primary objective of this paper was to investigate the
effects of decreasing sample sizes on the GOF testing technique. A secondary objective
was to estimate how reducing the sample size influences the confidence intervals of
GLMs. In order to accomplish the objectives of the study, GLMs were fitted using two
datasets subjected to average and low sample means collected in Toronto, Ontario.
Several models were estimated for different sample sizes. The results of the study show
that the testing technique is more effective for smaller samples than for larger samples
when the data are characterized by low sample mean values. The results also show that the width
of the confidence intervals increases, as expected, as the sample size decreases, and can
be extremely large for very small sample sizes. Hence, statistical models characterized by
low sample mean values should be developed using a large number of observations. In
fact, it is recommended to develop models using datasets containing at least 100
observations (e.g., intersections, segments, etc.). The paper concludes with
recommendations for future studies involving such datasets.
Agrawal & Lord 1
INTRODUCTION
One subject of research in roadway safety analysis has been the effects of
random variations and various systematic causal factors on crash counts (1). The
statistical relationship between motor vehicle crashes and covariates can generally be
estimated via generalized linear models (GLMs) using logarithmic links with errors
distributed in a Poisson or Poisson-gamma (also known as negative binomial) manner. Poisson
models serve well under homogeneous conditions, while Poisson-gamma models are
better suited to data subject to heterogeneity (2). In other words, Poisson-gamma
models are more appropriate when the variance of the data is larger than the mean (usually
referred to as overdispersion).
The goodness-of-fit (GOF) of GLMs can usually be tested with two statistical indicators,
the scaled deviance (SD) and Pearson’s X². Unfortunately, it has recently been determined
that these two indicators are not adequate to determine the GOF of GLMs developed
from crash data characterized by low sample mean values. Maycock and Hall (3) were
the first to raise the issue related to low sample mean values. Fridstrøm et al. (1) further
discussed this matter, while Maher and Summersgill (4) showed how the GOF of
statistical models could be affected by a low sample mean. They defined this issue as the
“low mean problem” (LMP). Subsequent to this identification and its effects on the
development of statistical models, Wood (5, 6) proposed a method to test the GOF of
GLMs developed using data characterized with low sample mean values. Although the
method is very useful, it may be a little complicated for the average transportation safety
modeler as well as time-consuming to implement. Wood (7) also devised a method for
estimating the confidence intervals for the mean response (µ), for the gamma mean (m),
and the predicted response (y) at a new site with characteristics similar to those of the sites used
in the original dataset from which the model was developed. Given the recent issues
identified by researchers in statistics (8), biology (9, 10) and highway safety (11) on the
effects of small sample sizes combined with low sample mean values on the estimation of
GLMs, there is a need to determine how they affect the GOF statistic testing method and
the computation of confidence intervals. In addition, since the testing method can be
time-consuming to implement, it is important to find out whether the technique is
sensitive to sample size. If it is, collecting additional data, when this is cost-effective,
could make the testing method unnecessary.
This paper describes an investigation into the effects of decreasing sample sizes on the
GOF test proposed by Wood (6) and the confidence intervals of GLMs (7). To test the fit,
statistical models with different sample sizes were developed and the fit was assessed by
comparing the χ² probabilities of the SD. The hypothesis was that the grouping technique
(discussed below) would have a larger effect for smaller sample sizes than for larger
sample sizes. For the same models, confidence intervals on the gamma mean and
predicted responses were then calculated to estimate the effects of reducing sample sizes
on these intervals. The goal is to quantify the changes as a function of the sample size. It
is common knowledge that smaller sample sizes widen the confidence intervals but, so
far, there has not been any research that tried to quantify or determine the magnitude of
these changes, at least not for crash prediction models. In order to accomplish the
objectives of the study, two datasets comprising crash and traffic flow data collected
at signalized and unsignalized intersections in Toronto, Ontario were used. The first
dataset contained observations collected at 868 sites with the corresponding entering
flows, while the second dataset contained 354 observations with a lower sample mean than
that of the first dataset. One year of data was used for model development. GLMs were
fitted to the data using the most common functional form utilized by transportation safety
modelers to link crashes to the entering flows at intersections.
The paper is divided into four sections. The first section briefly describes the
characteristics of GLMs for modeling crash-flow relationships and issues related to the
low mean problem. The second section describes the methodology used for analyzing the
two datasets. The third section summarizes the results of the analysis. The last section
provides the conclusions and recommendations for further studies.
BACKGROUND
The relationship between crashes and traffic flows can be represented by a GLM with
negative binomial (NB) or Poisson error structure (12). The most common functional
form used to characterize crash-flow relationships at intersections remains $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$,
where µ is the mean number of crashes, and F1 and F2 are entering flows for the major
and minor approaches respectively. Although this functional form does not offer the best
relationship (see 13), it will be used herein given its simplicity in the development of
predictive models and because it is still the most popular functional form used in practice.
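To illustrate the functional form, the sketch below evaluates it through the logarithmic link, under which ln µ is linear in the logarithms of the flows. It is written in Python purely for illustration (the study itself used Genstat); the function name and the flow values are hypothetical, and the coefficients are the full-sample estimates for the first dataset reported in Table 2.

```python
import math

def predicted_mean(f1: float, f2: float, ln_b0: float, b1: float, b2: float) -> float:
    """Mean crash frequency mu = b0 * F1**b1 * F2**b2, evaluated on the log scale."""
    # Log link: ln(mu) = ln(b0) + b1*ln(F1) + b2*ln(F2), linear in the log-flows
    eta = ln_b0 + b1 * math.log(f1) + b2 * math.log(f2)
    return math.exp(eta)

# Full-sample coefficients for the first dataset, as reported in Table 2
LN_B0, B1, B2 = -11.959, 0.7482, 0.6136

# Hypothetical entering flows (AADT) on the major and minor approaches
mu = predicted_mean(20_000, 2_000, LN_B0, B1, B2)
# Doubling F1 multiplies mu by 2**B1, whatever the value of F2
ratio = predicted_mean(40_000, 2_000, LN_B0, B1, B2) / mu
print(f"mu = {mu:.3f} crashes/year; doubling F1 multiplies mu by {ratio:.3f}")
```

Because the model is multiplicative, β1 and β2 act as elasticities: a 1% increase in a flow raises µ by approximately β1% or β2%, respectively.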
It has been shown that the crash process can be approximated by a Poisson-based
distribution and the magnitude of the variation in the data is dependent on the
characteristics of this process (2). The most common probabilistic structure that has been
proposed to accommodate extra-variation in the data remains the Poisson-gamma or NB
model. This model can be transformed into a linear model by taking the logarithm of the
mean function. The NB distribution is characterized by two parameters µ and φ ,
representing the mean and inverse dispersion parameter respectively. In the present work,
φ is assumed to be fixed, but recent work has shown that the inverse dispersion
parameter may be dependent on the covariates of the model (see 13, 14, 15). The
probability mass function of the NB distribution is given by:
$$f(y;\phi,\mu) = \frac{\Gamma(y+\phi)}{\Gamma(\phi)\,y!}\left(\frac{\phi}{\mu+\phi}\right)^{\phi}\left(\frac{\mu}{\mu+\phi}\right)^{y} \qquad (1)$$
Where,
y = response variable (i.e., crashes per year);
µ = mean response of the distribution; and
φ = inverse dispersion parameter of the NB distribution.
The variance of the distribution can be represented by
$$\mathrm{Var}(y) = \mu + \frac{\mu^2}{\phi} \qquad (2)$$
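As a numerical check on Equations (1) and (2), the sketch below evaluates the NB probabilities on the log scale and sums them to recover the mean and variance. The function names are hypothetical, and the values µ = 1.01 and φ = 4.8367 are simply the second dataset's sample mean and full-sample φ estimate (Table 3), paired here for illustration.

```python
import math

def nb_pmf(y: int, mu: float, phi: float) -> float:
    """Equation (1): probability of y crashes under a Poisson-gamma (NB) model."""
    # lgamma keeps Gamma(y + phi) and y! from overflowing for large counts
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)

def nb_moments(mu: float, phi: float, y_max: int = 500) -> tuple[float, float]:
    """Mean and variance obtained by summing the pmf over y = 0..y_max."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Second dataset: sample mean 1.01 and full-sample phi from Table 3
mu, phi = 1.01, 4.8367
mean, var = nb_moments(mu, phi)
print(f"mean = {mean:.4f}, variance = {var:.4f}, Equation (2) gives {mu + mu**2 / phi:.4f}")
```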
The GOF of NB models can be assessed using the SD (the deviance divided by the
dispersion parameter of the model, which can be calculated as twice the logarithm of the
ratio of the likelihoods of the two models being compared) or Pearson’s X² statistic (12).
These statistics follow χ² distributions if the data are approximately normally distributed.
Thus, if these statistics are close to their degrees of freedom, the hypothesized model can
be considered adequate, and additional terms are not needed (4). The degrees of freedom
used to calculate the χ² probability equal the difference in the number of parameters
between the larger and the smaller model. For very low sample means, however, the SD
and Pearson’s X² are no longer χ² distributed, because for small mean values the data are
not normally distributed (5). This can lead to erroneous conclusions about how well the
model fits the data. Maher and Summersgill (4) suggested the use of X² or Expected
Standard Deviation (ESD) as the measure of goodness-of-fit, but Wood (5) showed that
both of these fail with very low sample means.
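The χ² probabilities reported later in Tables 2 and 3 are upper-tail probabilities of the χ² distribution evaluated at the SD with the corresponding degrees of freedom. A minimal standard-library sketch is given below, assuming the usual power series for the regularized incomplete gamma function; in practice a statistical package routine (e.g., a chi-square survival function) would normally be used instead. The function name is hypothetical.

```python
import math

def chi2_sf(x: float, dof: int, tol: float = 1e-15, max_terms: int = 100_000) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with `dof` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function
    P(a, z), with a = dof/2 and z = x/2, and returns 1 - P(a, z).
    """
    if x <= 0.0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a              # n = 0 term of sum_n z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > tol * total and n < max_terms:
        n += 1
        term *= z / (a + n)
        total += term
    log_p = -z + a * math.log(z) - math.lgamma(a) + math.log(total)
    return min(1.0, max(0.0, 1.0 - math.exp(log_p)))

# (G-square, dof) pairs taken from Tables 2 and 3
for g2, dof in [(23.80, 17), (106.39, 97), (968.22, 865), (11.92, 20)]:
    print(f"G2 = {g2:7.2f}, dof = {dof:3d}: chi-square probability = {chi2_sf(g2, dof):.3f}")
```

This series is adequate for moderate probabilities such as those in Tables 2 and 3; very small tail probabilities lose precision because they are computed as 1 − P.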
Wood (5) instead proposed a grouping method to improve the suitability of the scaled
deviance and Pearson’s X² statistic for assessing the fit of GLMs. In this method, some of
the data are grouped to improve the normality of the observations: the observations and
flows of several sites are combined to give new grouped observations and flows. The
fitted coefficients of the original model are then used to calculate the new mean for the
grouped sites. The reader is referred to Wood (5) for a more detailed description of the
methodology. The grouped SD for the NB distribution used in this research is given by:
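$$G^2 = 2\sum_{i=1}^{n} r_i\left[\bar{y}_i \ln\left(\frac{\bar{y}_i(\hat{\mu}_i+\hat{\phi})}{\hat{\mu}_i(\bar{y}_i+\hat{\phi})}\right) + \hat{\phi}\ln\left(\frac{\hat{\mu}_i+\hat{\phi}}{\bar{y}_i+\hat{\phi}}\right)\right] \qquad (3)$$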
where $r_i$ is the size of group $i$, $\bar{y}_i$ is the mean of the observed values for
group $i$, $\hat{\mu}_i$ is the estimated mean for group $i$, $\hat{\phi}$ is the estimated inverse
dispersion parameter, and $n$ is the number of groups. The group size $r_i$ can differ from
group to group, depending on the observed values. The variance of the SD constrains the
group size for the observations. While grouping the data brings the components of the SD
closer to a χ² distribution, the value of the SD itself can deviate from the χ²
distribution (5). In addition, too much grouping reduces the size of the largest model
against which the hypothesized model is tested. Hence, care has to be taken in
determining group sizes (5). A common φ can be used for the maximal model and the
reduced model, as was shown by Maher and Summersgill (4).
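The grouping procedure and Equation (3) can be sketched as follows. This is an illustrative Python sketch, not Wood's implementation: the minimum group size required for each site (which, in Wood's method, is associated with the fitted mean of the site) is passed in as a hypothetical input, sites are ordered by decreasing fitted mean as a stand-in for working "from right to left" in a single-flow regression plot, and leftover sites that cannot form a full group are merged into the previous group. All names and the example sites are hypothetical; the model coefficients are the full-sample estimates for the second dataset (Table 3).

```python
import math
from typing import Callable, Sequence

def nb_group_deviance(y_bar: float, mu: float, phi: float) -> float:
    """Bracketed term of Equation (3) for one group, with 0*ln(0) taken as 0."""
    first = 0.0
    if y_bar > 0:
        first = y_bar * math.log(y_bar * (mu + phi) / (mu * (y_bar + phi)))
    return first + phi * math.log((mu + phi) / (y_bar + phi))

def grouped_scaled_deviance(
    sites: Sequence[tuple[float, float, int]],  # (F1, F2, crash count) per site
    required_r: Sequence[int],                  # hypothetical minimum group size per site
    predict: Callable[[float, float], float],   # fitted model mu(F1, F2)
    phi: float,
) -> tuple[float, int]:
    """Group the sites, then evaluate G^2 from Equation (3); returns (G^2, number of groups)."""
    # Assumption: "right to left" is taken as decreasing fitted mean for the two-flow model
    order = sorted(range(len(sites)), key=lambda i: predict(sites[i][0], sites[i][1]), reverse=True)
    groups: list[list[int]] = []
    current: list[int] = []
    for i in order:
        current.append(i)
        # Close the group once it is at least as large as every member's required size
        if len(current) >= max(required_r[j] for j in current):
            groups.append(current)
            current = []
    if current:  # assumption: leftover sites are merged into the previous group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    g2 = 0.0
    for members in groups:
        r = len(members)
        f1_bar = sum(sites[j][0] for j in members) / r
        f2_bar = sum(sites[j][1] for j in members) / r
        y_bar = sum(sites[j][2] for j in members) / r
        # mu_i is the value fitted by the model at the group's average flows
        g2 += 2.0 * r * nb_group_deviance(y_bar, predict(f1_bar, f2_bar), phi)
    return g2, len(groups)

# Full-sample second-dataset model (Table 3) applied to hypothetical sites
def predict(f1: float, f2: float) -> float:
    return math.exp(-7.65 + 0.607 * math.log(f1) + 0.209 * math.log(f2))

sites = [(12000, 800, 0), (15000, 1200, 2), (9000, 600, 1), (20000, 1500, 1), (11000, 400, 0)]
phi = 4.8367
ungrouped = grouped_scaled_deviance(sites, [1] * len(sites), predict, phi)
grouped = grouped_scaled_deviance(sites, [2] * len(sites), predict, phi)
print(ungrouped, grouped)
```

With every required size equal to one, no grouping takes place and the function returns the ungrouped SD used in step 2 of the methodology.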
A model that fits the data adequately is accompanied by confidence intervals on the mean,
the gamma mean and the predicted response. For a NB model, a confidence interval is
needed for µ, the long-term mean, as well as prediction intervals for m, the safety of a
site, and for y, the number of crashes predicted at a new site. Wood (7) has described a
method to compute these intervals (Table 1). Here, η is the logarithm of the estimated
long-term mean µ, while φ is the inverse dispersion parameter estimated during the fitting process.
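Table 1. 95% Confidence and Prediction Intervals for Poisson-gamma Models (7)

| Parameter | Interval |
|---|---|
| µ | $\left[\hat{\mu}\,e^{-1.96\sqrt{\mathrm{Var}(\hat{\eta})}},\ \hat{\mu}\,e^{1.96\sqrt{\mathrm{Var}(\hat{\eta})}}\right]$ |
| m | $\left[\max\left\{0,\ \hat{\mu}-1.96\sqrt{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\frac{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\hat{\mu}^2}{\hat{\phi}}}\right\},\ \hat{\mu}+1.96\sqrt{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\frac{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\hat{\mu}^2}{\hat{\phi}}}\right]$ |
| y | $\left[0,\ \left\lfloor\hat{\mu}+1.96\sqrt{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\frac{\hat{\mu}^2\mathrm{Var}(\hat{\eta})+\hat{\mu}^2}{\hat{\phi}}+\hat{\mu}}\right\rfloor\right]$ |

Note: $\mathrm{Var}(\hat{\eta}) = \mathbf{x}_0'(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{x}_0$, and $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.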
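The intervals of Table 1 translate directly into code. The sketch below assumes that the upper bound for y uses the same 1.96 multiplier as the other rows and is rounded down to an integer; the function name and the input values are hypothetical.

```python
import math

Z = 1.96  # two-sided 95% standard normal quantile used in Table 1

def table1_intervals(mu_hat: float, var_eta: float, phi: float) -> dict[str, tuple[float, float]]:
    """95% intervals for mu, the gamma mean m, and a new count y (Table 1 formulas)."""
    se_eta = math.sqrt(var_eta)
    mu_ci = (mu_hat * math.exp(-Z * se_eta), mu_hat * math.exp(Z * se_eta))
    # Sampling error of mu_hat plus the gamma (site-to-site) variation
    var_m = mu_hat**2 * var_eta + (mu_hat**2 * var_eta + mu_hat**2) / phi
    half_m = Z * math.sqrt(var_m)
    m_pi = (max(0.0, mu_hat - half_m), mu_hat + half_m)
    # A new count adds Poisson variation (mu_hat); the upper bound is rounded down
    y_upper = math.floor(mu_hat + Z * math.sqrt(var_m + mu_hat))
    return {"mu": mu_ci, "m": m_pi, "y": (0.0, float(y_upper))}

# Hypothetical inputs: mu_hat = 1.0 crash/year, two values of Var(eta), phi from Table 3
for var_eta in (0.01, 0.04):
    print(var_eta, table1_intervals(1.0, var_eta, 4.8367))
```

Increasing Var(η), as happens when fewer sites are used to fit the model, widens all three intervals, while the inverse dispersion parameter enters the formulas for m and y directly.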
The next section describes how the two datasets with different means were analyzed to
evaluate the effect of varying sample sizes on the appropriateness of the methods
proposed by Wood (5, 7).
METHODOLOGY
Two datasets were used in this study. The datasets were initially collected for a project
related to the development of statistical models for predicting the safety performance of
signalized and unsignalized intersections in Toronto, Ont. (16). The datasets have been
found to be of very good quality and have been used extensively over the last few years
(11, 13, 16, 17, 18, 19). The samples included information on fatal and non-fatal injury
crashes at each site along with the annual average daily traffic (AADT) flows on the major and minor approaches.
The first dataset contained 868 observations (signalized intersections) for the year 1995.
The second dataset contained 354 observations (unsignalized intersections) with a lower
sample mean than that of the first dataset. The first dataset had a sample mean equal to
3.93 and a sample variance equal to 13.96. The sample mean and variance for the second
dataset were equal to 1.01 and 1.28, respectively. A plot of the mean vs variance for the
data at hand revealed overdispersion. For both datasets, the GLM was assumed to
follow the functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$, where µ is the mean response
corresponding to the entering flows $F_1$ and $F_2$ on the major and minor approaches,
respectively, and the βs are the regression coefficients, which can be collected in the
vector $\boldsymbol{\beta}$.
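The preliminary estimate of φ described in step 1 of the procedure below can be obtained by solving Equation (2) for φ using the sample mean and variance. A minimal sketch with the summary statistics of the two datasets follows (the function name is hypothetical). These moment estimates are smaller than the fitted values in Tables 2 and 3 (7.7356 and 4.8367), presumably because they ignore the part of the between-site variation that the flows explain.

```python
def preliminary_phi(sample_mean: float, sample_var: float) -> float:
    """Moment estimate of the inverse dispersion parameter from Equation (2).

    Solving Var(y) = mu + mu**2 / phi for phi gives phi = mu**2 / (Var(y) - mu).
    """
    if sample_var <= sample_mean:
        # No overdispersion: the moment estimate is undefined or negative
        raise ValueError("sample variance does not exceed the mean")
    return sample_mean**2 / (sample_var - sample_mean)

phi_first = preliminary_phi(3.93, 13.96)   # first dataset: about 1.54
phi_second = preliminary_phi(1.01, 1.28)   # second dataset: about 3.78
print(f"{phi_first:.2f} {phi_second:.2f}")
```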
The steps performed in the analysis are as follows:
1. The data contained the injury counts and the two flows (major and minor for
every site). Only fatal and non-fatal injury crashes were used for modeling and
analysis purposes. A preliminary estimate of the inverse dispersion parameter (φ)
was obtained from Equation (2) by computing the mean and variance of the crash
counts. This preliminary estimate was used to determine the magnitude of the
variation within the dataset and, consequently, to select the proper error
distribution for the model to be fitted (this approach provides a better estimate of
the dispersion in the data than the plot described above). Genstat (20) was used
to estimate the model coefficients (the βs) and the inverse dispersion parameter
(φ), by regressing the injury counts on the logarithms of the flows in a generalized
linear model with a logarithmic link. The mathematics behind the model fitting
process is described in detail in Wood (6).
In this work, it is assumed that the fixed inverse dispersion parameter is not
affected by the low sample mean or small sample size (see 11 for additional
information on this assumption).
2. Once the model coefficients and φ are estimated, the SD component for every
site can be estimated using Equation (3) with every group size equal to one. The sum of
the components of the scaled deviance gives the value of G². The scaled deviance
components are hypothesized to follow a χ² distribution asymptotically. Hence, the value
of G² is used to obtain a χ² probability, which indicates the GOF of the model.
3. Since the low mean problem is being considered here, the SD may not be a valid
indicator of GOF. Hence, a new scaled deviance needs to be calculated whose
components are asymptotically χ² distributed. As described earlier, this method is
explained by Wood (5, 6), who grouped the values of counts and flows in such a way
that the new SD components fit the χ² distribution better. This method was then used to
calculate the new SD. The method is described as follows (6):
• For each flow–accident pair, calculate the fitted µ and associate the pair with
a group size r .
• “Working from right to left in the regression plot, group the data so that each
flow–accident pair is in a group of size at least as large as the associated r .
This would give n groups with the ith group of size $r_i$ having crash counts $y_{i1}$,
$y_{i2}, \ldots, y_{ir_i}$ and associated flow counts of $x_{i1}, x_{i2}, \ldots, x_{ir_i}$.”
• Average the x and y values for each group.
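• Calculate the new SD using Equation (3), where $\hat{\mu}_i$ is the value fitted by the
model at the average $\bar{x}_i$.
• Test the SD against a χ² distribution with $n - (p + 1)$ degrees of freedom, where
$p + 1$ is the number of estimated coefficients (three for the model used here).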
4. The confidence intervals for m and y were then calculated using the equations in
Table 1 (7). In these equations, η is the logarithm of the estimated mean (µ),
while φ is the inverse dispersion parameter estimated during the fitting process.
For a site with covariate vector $\mathbf{x}_0 = (1, \ln F_1, \ln F_2)'$, Var(η) is calculated as
$\mathbf{x}_0'\mathbf{I}^{-1}\mathbf{x}_0$, where $\mathbf{I}^{-1}$ is the variance-covariance matrix of the
parameters (usually provided in the output of statistical software programs); for all
sites at once, these values are the diagonal elements of $\mathbf{X}\mathbf{I}^{-1}\mathbf{X}'$, where $\mathbf{X}$ is
the $n \times (p+1)$ input matrix containing a column of ones and the entering flows in
logarithmic form. In cases where this output is not available, the variance-covariance
matrix $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$ can be estimated directly, where $\mathbf{W}$ is the diagonal
$n \times n$ weights matrix with $w_{ii} = \phi\mu_i/(\phi + \mu_i)$, with i varying from 1 to n
sites.
5. In the last step, the same process used for steps 1-4 was repeated for lower sample
sizes. The sample sizes of 20, 50, 100, 200, 400 and 868 were analyzed for the
first dataset, while sample sizes of 20, 50, 100, 200 and 354 were analyzed for
the second dataset. For the latter dataset, a few samples were tested for n = 20, but the
statistical models did not provide an adequate fit. In fact, some of the coefficients
did not provide a logical relationship with the crash data. For instance, the
coefficient for the flow on the minor approach was negative for the tested samples.
Thus, although still shown in Table 3, it was decided to remove the sample size
equal to 20 from further analysis. Finally, all samples were randomly taken from
the data to minimize biases.
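When the variance-covariance matrix is not available from the software, the direct calculation in step 4 can be sketched as follows. This pure-Python version (names and site data are hypothetical; the fitted means come from the full-sample second-dataset model in Table 3) forms X'WX with the NB weights, solves a linear system instead of inverting the matrix, and returns Var(η) at a chosen site. It assumes φ is fixed, as in this study.

```python
import math

def design_row(f1: float, f2: float) -> list[float]:
    """Row of X: a constant and the logarithms of the two entering flows."""
    return [1.0, math.log(f1), math.log(f2)]

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def var_eta(x0: list[float], flows: list[tuple[float, float]], mu: list[float], phi: float) -> float:
    """Var(eta) = x0' (X'WX)^(-1) x0 with NB weights w_ii = phi*mu_i / (phi + mu_i)."""
    rows = [design_row(f1, f2) for f1, f2 in flows]
    w = [phi * m_i / (phi + m_i) for m_i in mu]
    p = len(x0)
    xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(p)] for j in range(p)]
    # x0' A^(-1) x0 computed as x0 . z, where A z = x0 (avoids forming the inverse)
    z = solve(xtwx, x0)
    return sum(xj * zj for xj, zj in zip(x0, z))

# Hypothetical sites; fitted means from the full-sample second-dataset model (Table 3)
flows = [(12000, 800), (15000, 1200), (9000, 600), (20000, 1500),
         (11000, 400), (25000, 3000), (8000, 900), (18000, 2000)]
phi = 4.8367
mu = [math.exp(-7.65 + 0.607 * math.log(f1) + 0.209 * math.log(f2)) for f1, f2 in flows]
x0 = design_row(15000, 2000)
v_small = var_eta(x0, flows, mu, phi)
v_double = var_eta(x0, flows * 2, mu * 2, phi)  # every site duplicated
print(v_small, v_double)
```

In this sketch, duplicating every site doubles X'WX and therefore halves Var(η), which mirrors the observation in the confidence interval analysis that smaller samples widen the intervals even when the inverse dispersion parameter is held constant.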
STATISTICAL ANALYSIS
This section summarizes the statistical analysis carried out in this work and is separated
into two sub-sections. The first sub-section describes the results of the GOF analysis. The
second sub-section shows the results of the confidence interval analysis.
GOODNESS-OF-FIT ANALYSIS
The first objective of this paper was to evaluate whether sample sizes influence the GOF
test proposed by Wood (6). The results of this analysis are presented in Tables 2 and 3.
Table 2 shows the χ² probabilities of the GOF for the scaled deviances for the first
dataset. These values were plotted against the sample sizes before and after grouping
(Figure 1). The plot shows that, for this dataset, the problem of the scaled deviance not
following the χ² distribution is not significant: there is no large difference between the
χ² probabilities of the scaled deviance before and after grouping. This is expected,
since at higher sample means the scaled deviance without grouping is a good
approximation of the GOF of the data. As the sample size decreases from 868 to
20, this pattern holds, and the values of the scaled deviances are similar before and
after grouping.
Table 2. Comparison of χ² Probabilities for the Scaled Deviances for Different Sample Sizes for the First Dataset (868 Sites)

| Sample size | Mean | Variance | ln(β0) | β1 | β2 | φ | Ungrouped G² | Ungrouped dof | Ungrouped χ² prob. | Grouped G² | Grouped dof | Grouped χ² prob. | Difference in ungrouped and grouped χ² prob. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 3.65 | 12.24 | -32.7 (13.8)† | 2.97 (1.31) | 0.308 (0.198) | 3.9857 (3.1907) | 23.80 | 17 | 0.125 | 23.93 | 15 | 0.121 | -3.20 |
| 50 | 3.42 | 12.98 | -12.26 (4.31) | 0.817 (0.404) | 0.571 (0.152) | 2.3101 (0.8644) | 55.91 | 47 | 0.175 | 48.69 | 42 | 0.222 | 26.86 |
| 100 | 3.49 | 12.27 | -10.39 (2.48) | 0.605 (0.232) | 0.601 (0.0825) | 4.4665 (1.5129) | 106.39 | 97 | 0.242 | 104.40 | 89 | 0.291 | 20.17 |
| 200 | 3.6 | 12.75 | -12.83 (1.78) | 0.851 (0.172) | 0.5816 (0.0613) | 4.8058 (1.1949) | 217.42 | 197 | 0.152 | 191.50 | 180 | 0.265 | 74.44 |
| 400 | 3.39 | 10.48 | -11.93 (1.06) | 0.7795 (0.0991) | 0.5667 (0.0433) | 7.0138 (1.5436) | 431.79 | 397 | 0.111 | 3.39 | 363 | 0.128 | 16.20 |
| 868 | 3.93 | 13.96 | -11.959 (0.651) | 0.7482 (0.0644) | 0.6136 (0.03) | 7.7356 (1.1689) | 968.22 | 865 | 0.008 | 893.47 | 805 | 0.016 | 103.75 |

† Standard errors in parentheses.
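The last column of Tables 2 and 3 is consistent with the relative change, in percent, from the ungrouped to the grouped χ² probability. The short sketch below (hypothetical function name) recomputes it from the rounded probabilities in Table 2; it reproduces -3.20 and 26.86 for sample sizes of 20 and 50, and gives 20.25 rather than 20.17 for 100, presumably because the table was computed from unrounded probabilities.

```python
def pct_difference(p_ungrouped: float, p_grouped: float) -> float:
    """Relative change (%) from the ungrouped to the grouped chi-square probability."""
    return 100.0 * (p_grouped - p_ungrouped) / p_ungrouped

# Rounded probabilities from Table 2: sample size -> (ungrouped, grouped)
table2 = {20: (0.125, 0.121), 50: (0.175, 0.222), 100: (0.242, 0.291)}
for n, (p_u, p_g) in table2.items():
    print(n, round(pct_difference(p_u, p_g), 2))
```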
[Figure: χ² probability (0 to 1) versus sample size (20, 50, 100, 200, 400, 868) for the ungrouped SD and the grouped SD]
Figure 1. Scaled Deviance χ² Probabilities for Different Sample Sizes for First Dataset
Table 3 shows the χ² probabilities of the GOF for the scaled deviances for the
second dataset. As above, these values were plotted against the sample sizes
before and after grouping (Figure 2). These data have a lower sample mean (1.01),
for which the problem of the scaled deviance not following the χ² distribution is much
larger than for the first dataset. As can be seen, there is a large difference between
the χ² probabilities of the scaled deviance before and after grouping. This is expected, as at
lower means the scaled deviance without grouping is not a good approximation of the
GOF of the data; the grouped SD is a better indicator of the GOF of the GLM. At larger
sample sizes, the effect of grouping the data is not as large as for smaller sample sizes,
and the χ² probabilities are closer together. In addition, the χ² probabilities still differ
considerably for the smaller sample sizes even after grouping, which indicates that
sample size can affect the statistical fit of the model. Figure 2 shows that, as reported by
Wood (6), grouping data offers a better approach to estimate the statistical fit of models
with low sample mean values.
Table 3. Comparison of χ² Probabilities for the Scaled Deviances for Different Sample Sizes for the Second Dataset (354 Sites)

| Sample size | Mean | Variance | ln(β0) | β1 | β2 | φ | Ungrouped G² | Ungrouped dof | Ungrouped χ² prob. | Grouped G² | Grouped dof | Grouped χ² prob. | Difference in ungrouped and grouped χ² prob. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20‡ | 1.15 | 2.77 | -11.61 (9.90)† | 1.571 (0.835) | -0.56 (0.656) | 0.795 (0.6098) | 17.30 | 17 | 0.4344 | 7.2699 | 6 | 0.297 | -31.74 |
| 50 | 0.98 | 1.73 | -7.97 (5.75) | 0.670 (0.449) | 0.162 (0.337) | 1.1938 (0.6827) | 50.20 | 47 | 0.348 | 11.92 | 20 | 0.919 | 164.10 |
| 100 | 0.96 | 1.25 | -8.33 (3.66) | 0.68 (0.294) | 0.193 (0.207) | 4.2732 (3.7004) | 110.06 | 97 | 0.172 | 43.61 | 43 | 0.446 | 158.86 |
| 200 | 0.94 | 1.13 | -10.66 (2.56) | 0.787 (0.197) | 0.360 (0.142) | 11.0436 (14.9561) | 224.13 | 197 | 0.090 | 138.96 | 97 | 0.003 | -96.33 |
| 354 | 1.01 | 1.28 | -7.65 (1.83) | 0.607 (0.142) | 0.209 (0.103) | 4.8367 (2.3758) | 396.05 | 351 | 0.049 | 224.40 | 172 | 0.004 | -90.97 |

† Standard errors in parentheses.
‡ Sample size equal to 20 did not provide a good statistical model, since the coefficient for F2 is negative. The model is shown here for illustration purposes only and was not used for further analyses.
[Figure: χ² probability (0 to 1) versus sample size (50, 100, 200, 354) for the ungrouped SD and the grouped SD]
Figure 2. Scaled Deviance χ² Probabilities for Different Sample Sizes for Second Dataset
CONFIDENCE INTERVAL ANALYSIS
The second objective of this paper was to analyze the effect of reducing the sample size
on the confidence intervals of the gamma mean and predicted response. The analysis was
initially carried out with the second dataset because the functional form for every sample
size was very similar; in other words, the coefficients of the models were very close. This
made the comparison between sample sizes easier. Consequently, confidence
intervals were estimated for sample sizes equal to 50, 100 and 354. The models are
summarized in Table 3.
Using the equations presented in Table 1, the 95% confidence intervals on the
gamma mean and predicted response were computed for the different sample sizes for an
entering flow on the minor approach of $F_2 = 2{,}000$. The results are presented in Figures 3
and 4, respectively. As a general trend, these figures show that the confidence intervals on the
gamma mean and predicted response become wider as the sample size becomes smaller.
For instance, the width of the confidence intervals for the gamma mean increases by
0% to 10% when the sample size is reduced from 354 to 100. The increase is
much larger when the sample size is reduced from 354 to 50, with the width
increasing by 72% to 85%. The same outcome can be seen for the predicted
response, whose confidence interval widens with a smaller sample size. The
difference can be as high as 65% when the sample is reduced from 354 to 50. Additionally,
although difficult to see at the lower left-hand side, the confidence intervals are actually
wider at lower and higher flows, similar to the confidence intervals produced for linear
statistical models (21). This characteristic is explained by the fact that there are fewer
sites at both extremities.
[Figure: three panels (sample size = 354, 100 and 50; F2 = 2,000), each showing the 95% confidence interval for m against F1 (AADT)]
Figure 3. 95% Confidence Intervals for the Gamma Mean m
[Figure: three panels (sample size = 354, 100 and 50; F2 = 2,000), each showing the 95% interval for y against F1 (AADT)]
Figure 4. 95% Confidence Intervals for the Predicted Response y
In order to remove the effects caused by using different values of the inverse dispersion
parameter and account only for the sampling error of the model (via Var(η)), the
statistical models for the sample sizes of 100 and 50 were re-fitted using the inverse
dispersion parameter estimated from the sample of 354 sites (see Table 3). New
confidence intervals were estimated and compared with those estimated from the
original model for 354 sites. When sampling error is the only source of variation in
the model, reducing the sample size still widens the confidence intervals, albeit to a
lesser degree. For instance, the width of the confidence interval for the gamma mean
varied from -3% (at the left-hand side) to 10% for a sample size equal to 100, and
between 10% and 20% for a sample size equal to 50. This implies that
lower sample sizes increase the width of the confidence intervals even if the inverse
dispersion parameter remains constant. This is expected, since Var(η) depends on the
number of observations through $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$ as well as on the dispersion
parameter through the weights $\mathbf{W}$. On the other hand, the confidence intervals on the
predicted response are less affected, with changes ranging from -20% to +12.5% (at
high flows). Because the upper bound of the interval for y is rounded down to an integer,
the intervals form steps whose start and end points differ between models, as shown in
Figure 4; some bounds therefore jump up by one crash while others go down.
Nonetheless, the width increases at both extremities.
The first dataset, with a higher sample mean, was also used for evaluating the effects of
reducing sample size on the confidence intervals. The same entering flow $F_2 = 2{,}000$
was employed in this exercise. Because the predicted values varied greatly between the
models developed from different sample sizes, the coefficients and the dispersion
parameter from the full dataset along with the variance-covariance matrix output from the
lower sample sizes were used to compute the confidence intervals. Although not ideal,
the comparison showed that the confidence intervals for the gamma mean and predicted
response increased on average by similar percentages for sample sizes equal to 100 and
50 respectively. Interestingly, the confidence intervals for the gamma mean increased by
an average of 136% for a sample size equal to 20 (compared to the full dataset). With this
kind of increase, developing statistical models using such a small number of observations
is not recommended.
SUMMARY AND CONCLUSIONS
This study analyzed the effects of reducing the sample size on the improved GOF
statistic method proposed by Wood (6), which was devised to increase the appropriateness
of the SD as a GOF indicator for GLMs with a Poisson or NB error structure. Two datasets
containing fatal and non-fatal injury crash counts and traffic flow data were analyzed,
with mean injury counts of 3.93 and 1.01, respectively. Different sample sizes were analyzed
for each dataset. For each sample size, the GOF was tested using a GLM with and
without grouping of the data. In addition, the confidence intervals for the gamma mean
and predicted responses were estimated using both datasets.
The following results were obtained:
1. There was no strong effect of grouping on the χ² probabilities for the first dataset
(with a mean injury count of 3.93) across the different sample sizes. This was expected, as the
grouping technique is only needed when the scaled deviance (SD) does not
follow a χ² distribution. Based on the first dataset, it can be concluded that the
grouping technique might not be needed to improve the goodness-of-fit procedure
for data with a large sample mean.
2. There was a stronger effect of the grouping on the χ² probabilities for the second
dataset across the different sample sizes. This was expected, as low-mean data are not
approximately normally distributed, especially for small sample sizes, and hence the
SD does not follow a χ² distribution. As discussed by Lawless (22), the inferences
associated with NB models become asymptotically (or normally) distributed as
the sample size increases, i.e., as $n \times \mu \to \infty$ (see also 23 for a discussion of the
asymptotic approximation of Poisson models estimated using small sample
sizes).
3. Confidence intervals for the gamma mean and predicted responses became wider with
decreasing sample sizes. This is reasonable, since a smaller sample provides less
information for estimating the model parameters. However, given that the confidence
intervals are highly dependent on the inverse dispersion parameter, it is critical that
this parameter be properly estimated (see point 6 below). Unfortunately, models developed
from small sample sizes are very likely to be biased (11).
4. As expected, confidence intervals are wider at both extremes of the distribution.
This is explained by the low number of observations at these extremes.
5. Confidence intervals on the gamma mean and predicted responses were found to
be quite large, indicating highly approximate estimates for µ, m and y. This is
because the model coefficients and the observed values are only approximately
normally distributed, which limits the accuracy of the estimation (7).
6. Finally, the analyses described in this research were performed under the
assumption that the inverse dispersion parameter φ is properly estimated. As
reported by Lord (11), this assumption is only valid when statistical models
characterized by low sample mean values are developed using a large number of
observations, preferably more than 1,000 sites for sample means close to 1.0 (see
Lord (11) for additional information on minimum sample size requirements). If
such large samples are used, many of the issues associated with the modified GOF
statistic testing method and the widening of confidence intervals would be avoided.
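Point 2 above can be illustrated with a short calculation. The sketch below is a simplification: it assumes a Poisson (not Poisson-gamma) response whose mean µ is known rather than estimated, and the function name `poisson_deviance_moments` is hypothetical. It computes the exact mean and variance of a single observation's contribution to the deviance by summing over the Poisson probabilities, so that they can be compared with the values (1 and 2) of the χ2 distribution with one degree of freedom that underpins the usual SD test.

```python
import math


def poisson_deviance_moments(mu, tail_sd=20.0):
    """Exact mean and variance of the Poisson unit deviance d(y; mu) for
    y ~ Poisson(mu), obtained by summing over the probability mass function.

    Simplifying assumption: mu is the true mean, not a fitted value.
    Under the chi-square approximation each contribution would behave like
    a chi-square variate with 1 degree of freedom (mean 1, variance 2).
    """
    # Go far enough into the upper tail that the omitted mass is negligible.
    y_max = int(mu + tail_sd * math.sqrt(mu) + 30)
    mean = 0.0
    second = 0.0
    for y in range(y_max + 1):
        p = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
        if y == 0:
            d = 2.0 * mu  # y*log(y/mu) -> 0 as y -> 0
        else:
            d = 2.0 * (y * math.log(y / mu) - (y - mu))
        mean += p * d
        second += p * d * d
    return mean, second - mean * mean


if __name__ == "__main__":
    for mu in (0.1, 0.5, 1.0, 2.0, 10.0):
        mean, var = poisson_deviance_moments(mu)
        print(f"mu={mu:5.1f}  E[d]={mean:.3f}  Var[d]={var:.3f}  (chi-square, 1 df: 1, 2)")
```

Under these assumptions the expected contribution is close to 1 for µ = 10, but about 0.47 for µ = 0.1 and about 1.15 for µ = 1.0, so a deviance summed over sites with low means should not be expected to follow the nominal χ2 distribution.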
Given the results of the study, it is recommended to collect crash data at a minimum of
400 sites (e.g., segments, intersections, etc.) in order to avoid having to use the GOF
testing method proposed by Wood (5) when the overall sample mean is close to 1.0. A
minimum of 100 observations is recommended for building reliable statistical models
and, consequently, confidence intervals for the same sample mean. In the analysis carried
out in this research, the increase in the width of confidence intervals was found to be less
than 10% compared to the full dataset. The suggested sample size was also recommended
by Lord (11) as an absolute minimum to lessen biases in estimating the dispersion
parameter of Poisson-gamma models.
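The dependence of interval width on both the sample size and the inverse dispersion parameter can be seen in the simplest possible case. The following is a sketch under stated assumptions, not the interval construction used by Wood (7): it assumes an intercept-only Poisson-gamma model with a log link, treats φ as known (so that Var(y) = µ + µ²/φ), and uses a Wald interval on the log scale, for which Var(log µ̂) = (1/µ + 1/φ)/n. The function name `nb_mean_ci` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def nb_mean_ci(mu_hat, n, phi, level=0.95):
    """Wald confidence interval for the mean of an intercept-only
    Poisson-gamma (NB) model with log link, treating the inverse
    dispersion parameter phi as known.

    The inverse Fisher information for the intercept gives
    Var(log mu_hat) = (1/mu + 1/phi) / n.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se_log = math.sqrt((1.0 / mu_hat + 1.0 / phi) / n)
    # Build the interval on the log scale, then back-transform.
    return mu_hat * math.exp(-z * se_log), mu_hat * math.exp(z * se_log)


if __name__ == "__main__":
    mu_hat = 1.0  # hypothetical sample mean close to 1.0
    for phi in (1.0, 5.0):  # hypothetical inverse dispersion values
        for n in (1000, 400, 100, 50):
            lo, hi = nb_mean_ci(mu_hat, n, phi)
            print(f"phi={phi:.1f}  n={n:4d}  95% CI=({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")
```

In this simplified setting the log-scale half-width shrinks as 1/√n, so quadrupling the sample size halves it, while the 1/φ term means that an underestimated φ produces a wider interval than warranted at any sample size.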
Some recommendations for further research include the following:
1. Further analyses should be conducted on the effects of reducing sample size for
different sample mean values, particularly for extremely low sample mean values
($\mu < 1.0$); simulation may help with this evaluation.
2. In this work, only crash-flow models were used. It is recommended to replicate
this study using "full" models that include several explanatory variables.
3. Finally, in light of recent work showing that the inverse dispersion parameter can
depend on the covariates of the model (13), it is suggested that the modified GOF
statistic be re-evaluated for a varying inverse dispersion parameter.
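As a possible starting point for the simulation suggested in recommendation 1, the sketch below draws Poisson-gamma samples with a low mean and estimates φ from each sample. The values µ = 1.0 and φ = 2.0, the number of replications, the seed, and all function names are hypothetical, and a simple method-of-moments estimator is used for brevity instead of the estimators compared in the literature (8, 9, 10, 11).

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_sample(n, mu, phi, rng):
    """Poisson-gamma counts: site mean ~ Gamma(shape=phi, scale=mu/phi),
    count ~ Poisson(site mean), so E[y] = mu and Var[y] = mu + mu**2 / phi."""
    return [poisson_draw(rng.gammavariate(phi, mu / phi), rng) for _ in range(n)]


def moment_phi(y):
    """Method-of-moments estimate of phi, or None if s^2 <= ybar
    (the sample shows no overdispersion, so no valid estimate exists)."""
    ybar = statistics.fmean(y)
    excess = statistics.variance(y) - ybar
    return ybar * ybar / excess if excess > 0 else None


def simulate(n, mu=1.0, phi=2.0, reps=200, seed=2005):
    """Share of samples with no valid estimate, plus the valid estimates."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    estimates, failures = [], 0
    for _ in range(reps):
        est = moment_phi(nb_sample(n, mu, phi, rng))
        if est is None:
            failures += 1
        else:
            estimates.append(est)
    return failures / reps, estimates


if __name__ == "__main__":
    for n in (50, 100, 400, 1000):
        fail, est = simulate(n)
        med = statistics.median(est) if est else float("nan")
        print(f"n={n:5d}  no valid estimate: {fail:.1%}  "
              f"median phi_hat: {med:.2f}  (true phi = 2.0)")
```

Under these assumptions, samples of 50 sites fairly often show no overdispersion at all and yield no usable estimate, whereas samples of 1,000 sites essentially never do; replacing `moment_phi` with another estimator and varying `mu` would extend the sketch toward the evaluation proposed above.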
It is hoped that this research will provide transportation safety modelers with better
guidance for selecting the appropriate sample size when the GOF statistic is used for
comparing models subjected to low sample mean values, or when the computation of the
confidence intervals of crash prediction models is a critical element of the analysis, such
as the comparison of highway design alternatives or the identification of hazardous sites.
ACKNOWLEDGMENTS
The authors would like to thank Dr. Thomas Jonsson, currently a visiting fellow at the
University of Connecticut, for comments provided on an earlier version of this paper. The
paper benefited from the input of TRB reviewers.
REFERENCES
1. Fridstrøm, L., J. Ifver, S. Ingebrigtsen, R. Kulmala, and L.K. Thomsen.
Measuring the contribution of randomness, exposure, weather, and daylight to the
variation in road accident counts. Accident Analysis & Prevention, Vol. 27, No.
1, 1995, pp. 1-20.
2. Lord, D., S.P. Washington, and J.N. Ivan. Poisson, Poisson-Gamma and Zero
Inflated Regression Models of Motor Vehicle Crashes: Balancing Statistical Fit
and Theory. Accident Analysis & Prevention, Vol. 37, No. 1, 2005, pp. 35-46.
3. Maycock, G., and R.D. Hall. Accidents at 4-arm roundabouts. TRRL Laboratory
Report 1120. Transport and Road Research Laboratory, Crowthorne,
Berkshire, 1984.
4. Maher, M.J., and I. Summersgill. A comprehensive methodology for the fitting of
predictive accident models. Accident Analysis & Prevention, Vol. 28, No. 3, 1996,
pp. 281–296.
5. Wood, G.R. Assessing goodness of fit for Poisson and negative binomial models
with low mean. Massey University Technical Report, Institute of Information
Sciences and Technology, Massey University, Palmerston North, New Zealand,
2000.
6. Wood, G.R. Generalized linear accident models and goodness of fit testing.
Accident Analysis & Prevention, Vol. 34, No. 1, 2002, pp. 417-427.
7. Wood, G.R. Confidence and prediction intervals for generalized linear accident
models. Accident Analysis & Prevention, Vol. 37, No. 2, 2005, pp. 267-273.
8. Dean, C.B. Modified Pseudo-Likelihood Estimator of the Overdispersion
Parameter in Poisson Mixture Models. Journal of Applied Statistics, Vol. 21, No.
6, 1994, pp. 523-532.
9. Clark, S.J., and J.N. Perry. Estimation of the Negative Binomial Parameter by
Maximum Quasi-Likelihood. Biometrics, Vol. 45, 1989, pp. 309-316.
10. Piegorsch, W.W. Maximum Likelihood Estimation for the Negative Binomial
Dispersion Parameter. Biometrics, Vol. 46, 1990, pp. 863-867.
11. Lord, D. Modeling Motor Vehicle Crashes using Poisson-gamma Models:
Examining the Effects of Low Sample Mean Values and Small Sample Size on
the Estimation of the Fixed Dispersion Parameter. Paper accepted for presentation
at the 85th Annual Meeting of the TRB, Transportation Research Board,
Washington, D.C., 2005.
12. Dobson, A.J. An Introduction to Generalized Linear Models. Chapman and Hall,
London, 1990.
13. Miaou, S.-P., and D. Lord. Modeling Traffic Crash-Flow Relationships for
Intersections: Dispersion Parameter, Functional Form, and Bayes versus
Empirical Bayes. Transportation Research Record 1840, 2003, pp. 31-40.
14. Heydecker, B.G., and J. Wu. Identification of Sites for Road Accident Remedial
Work by Bayesian Statistical Methods: An Example of Uncertain Inference.
Advances in Engineering Software, Vol. 32, 2001, pp. 859-869.
15. Lord, D., A. Manar, and A. Vizioli. Modeling Crash-Flow-Density and Crash-
Flow-V/C Ratio for Rural and Urban Freeway Segments. Accident Analysis &
Prevention, Vol. 37, No. 1, 2005, pp. 185-199.
16. Lord, D. The Prediction of Accidents on Digital Networks: Characteristics and
Issues Related to the Application of Accident Prediction Models. Ph.D.
Dissertation. Department of Civil Engineering, University of Toronto, Toronto,
Ontario, 2000.
17. Lord, D., and B.N. Persaud. Accident Prediction Models with and without Trend:
Application of the Generalized Estimating Equations Procedure. Transportation
Research Record 1717, 2000, pp. 102-108.
18. Lord, D., and B.N. Persaud. Estimating the Safety Performance of Urban
Transportation Networks. Accident Analysis & Prevention, Vol. 36, No. 2, 2004,
pp. 609-620.
19. Miaou, S.-P., and J.J. Song. Bayesian ranking of sites for engineering safety
improvements: Decision parameter, treatability concept, statistical criterion and
spatial dependence. Accident Analysis & Prevention, Vol. 37, No. 4, 2005, pp.
699-720.
20. Payne, R.W. (ed.) The Guide to Genstat. Lawes Agricultural Trust, Rothamsted
Experimental Station, Oxford, U.K., 2000.
21. Myers, R.H. Classical and Modern Regression with Applications, 2nd ed.,
Duxbury Press, Pacific Grove, CA, 2000.
22. Lawless, J.F. Negative Binomial and Mixed Poisson Regression. The Canadian
Journal of Statistics, Vol. 15, No. 3, 1987, pp. 209-225.
23. Morris, C.N. Fitting Hierarchical Models. Workshop on Statistics and
Epidemiology: Environment and Health, Minneapolis, MN, 1997. (web page:
http://www.ima.umn.edu/summerstat/week6.html#wk6tue accessed on November
4th, 2005)