
TRB06-0960

EFFECTS OF SAMPLE SIZE ON THE GOODNESS-OF-FIT STATISTIC AND

CONFIDENCE INTERVALS OF CRASH PREDICTION MODELS

SUBJECTED TO LOW SAMPLE MEAN VALUES

Ravi Agrawal

Research Assistant

Department of Civil Engineering

TAMU 3136

Texas A&M University

College Station, Texas

77843-3136

e-mail: [email protected]

Dominique Lord†

Assistant Professor

Department of Civil Engineering

Texas A&M University

College Station, Texas

77843-3136

e-mail: [email protected]

November 10th, 2005

4,459 words + 4 figures + 3 tables = 6,209 words

† Contact person

ABSTRACT

The statistical relationship between motor vehicle crashes and covariates can generally be

modeled via generalized linear models (GLMs) using logarithmic links with errors

distributed in a Poisson or Poisson-gamma manner. The scaled deviance (SD) and

Pearson’s X2 are tools that have been proposed to test statistical fit of GLMs. Recent

studies have shown that these two estimators are not adequate for testing the goodness-

of-fit (GOF) of GLMs when they are developed from data characterized with low sample

mean values. To circumvent this problem, a testing method has been proposed to evaluate

the goodness-of-fit of such GLMs. Given the fact that this method can be time-

consuming to implement, there is a need to determine whether this technique is sensitive

to different sample sizes. The primary objective of this paper was to investigate the

effects of decreasing sample sizes on the GOF testing technique. A secondary objective

was to estimate how reducing the sample size influences the confidence intervals of

GLMs. In order to accomplish the objectives of the study, GLMs were fitted using two

datasets subjected to average and low sample means collected in Toronto, Ontario.

Several models were estimated for different sample sizes. The results of the study show

that the testing technique is more effective for smaller samples than for larger samples

when the data are subjected to low sample mean values. The results also show that the width

of the confidence intervals increases, as expected, as the sample size decreases, and can

be extremely large for very small sample sizes. Hence, statistical models characterized by

low sample mean values should be developed using a large number of observations. In

fact, it is recommended to develop models using datasets containing at least 100

observations (e.g., intersections, segments, etc.). The paper concludes with

recommendations for future studies involving such datasets.

Agrawal & Lord 1

INTRODUCTION

A subject of research in roadway safety analysis has been to analyze the effects of

random variations and various systematic causal factors on crash counts (1). The

statistical relationship between motor vehicle crashes and covariates can generally be

estimated via generalized linear models (GLMs) using logarithmic links with errors

distributed in a Poisson or Poisson-gamma (aka negative binomial) manner. Poisson

models serve well under homogeneous conditions while Poisson-gamma models serve

better when the data are subjected to heterogeneity (2). In other words, Poisson-gamma

models are more appropriate if the variation in the data is larger than the mean (usually

referred to as overdispersion).

The goodness-of-fit (GOF) of GLMs can usually be tested by the statistical indicators

scaled deviance (SD) and Pearson’s X2. Unfortunately, it has been recently determined

that these two indicators are not adequate to determine the GOF of GLMs developed

from crash data characterized by low sample mean values. Maycock and Hall (3) were

the first to raise the issue related to low sample mean values. Fridstrøm et al. (1) further

discussed this matter, while Maher and Summersgill (4) showed how the GOF of

statistical models could be affected by a low sample mean. They defined this issue as the

“low mean problem” (LMP). Subsequent to this identification and its effects on the

development of statistical models, Wood (5, 6) proposed a method to test the GOF of

GLMs developed using data characterized with low sample mean values. Although the

method is very useful, it may be a little complicated for the average transportation safety

modeler as well as time-consuming to implement. Wood (7) also devised a method for

estimating the confidence intervals for the mean response (µ), for the gamma mean (m),

and the predicted response (y) at a new site having similar characteristics as the sites used

in the original dataset from which the model was developed. Given the recent issues

identified by researchers in statistics (8), biology (9, 10) and highway safety (11) on the

effects of small sample sizes combined with low sample mean values on the estimation of

GLMs, there is a need to determine how they affect the GOF statistic testing method and

the computation of confidence intervals. In addition, it is important to find out whether


this technique is sensitive to different sample sizes in the light of the fact that the testing

method can be time-consuming to implement. Hence, collecting additional data, where this

is cost-effective, could remove the need for the testing method.

This paper describes an investigation into the effects of decreasing sample sizes on the

GOF test proposed by Wood (6) and the confidence intervals of GLMs (7). To test the fit,

statistical models with different sample sizes were developed and the fit was assessed by

comparing the χ2 probabilities of SD. The hypothesis was that the grouping technique (to

be discussed below) would have a greater effect for smaller sample sizes than for

larger sample sizes. For the same models, confidence intervals on the gamma mean and

predicted responses were then calculated to estimate the effects of reducing sample sizes

on these intervals. The goal is to quantify the changes as a function of the sample size. It

is common knowledge that smaller sample sizes increase the confidence intervals but, so

far, there has not been any research that tried to quantify or determine the magnitude of

these changes, at least not for crash prediction models. In order to accomplish the

objectives of the study, two datasets comprising crash and traffic flow data collected

at signalized and unsignalized intersections in Toronto, Ontario were used. The first

dataset contained observations collected at 868 sites with the corresponding entering

flows, while the second dataset contained 354 observations with a lower sample mean than

that of the first dataset. One year of data was used for the model development. GLMs were

fitted to the data using the most common functional form utilized by transportation safety

modelers to link crashes to the entering flows at intersections.

The paper is divided into four sections. The first section briefly describes the

characteristics of GLMs for modeling crash-flow relationships and issues related to the

low mean problem. The second section describes the methodology used for analyzing the

two datasets. The third section summarizes the results of the analysis. The last section

provides the conclusions and recommendations for further studies.


BACKGROUND

The relationship between crashes and traffic flows can be represented by a GLM with

negative binomial (NB) or Poisson error structure (12). The most common functional

form used to characterize crash-flow relationships at intersections remains $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$,

where µ is the mean number of crashes, and F1 and F2 are entering flows for the major

and minor approaches respectively. Although this functional form does not offer the best

relationship (see 13), it will be used herein given its simplicity in the development of

predictive models and because it is still the most popular functional form used in practice.
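As a rough illustration of this functional form, the sketch below evaluates µ for one pair of entering flows. The coefficients are those reported in Table 3 for the full second dataset (n = 354) and F2 = 2,000 is the minor flow used later in the paper; the function name `predicted_mean` and the major-approach flow of 10,000 vehicles per day are assumptions made only for this example.

```python
import math


def predicted_mean(f1, f2, ln_beta0, beta1, beta2):
    """Mean crashes per year: mu = beta0 * F1**beta1 * F2**beta2."""
    # Work on the log scale, which is the scale on which a log-link GLM is linear.
    eta = ln_beta0 + beta1 * math.log(f1) + beta2 * math.log(f2)
    return math.exp(eta)


# Coefficients from Table 3 (n = 354); the flow F1 = 10,000 is hypothetical.
mu = predicted_mean(f1=10_000, f2=2_000, ln_beta0=-7.65, beta1=0.607, beta2=0.209)
print(f"predicted mean = {mu:.3f} crashes per year")
```

Working on the logarithmic scale mirrors the way the model is estimated, since the logarithm of the mean is linear in the logarithms of the flows.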

It has been shown that the crash process can be approximated by a Poisson-based

distribution and the magnitude of the variation in the data is dependent on the

characteristics of this process (2). The most common probabilistic structure that has been

proposed to accommodate extra-variation in the data remains the Poisson-gamma or NB

model. This model can be transformed into a linear model by taking the logarithm of the

mean function. The NB distribution is characterized by two parameters µ and φ ,

representing the mean and inverse dispersion parameter respectively. In the present work,

φ is assumed to be fixed, but recent work has shown that the inverse dispersion

parameter may be dependent on the covariates of the model (see 13, 14, 15). The

probability density function (pdf) of the NB distribution can be represented as follows:

$$f(y; \phi, \mu) = \frac{\Gamma(y+\phi)}{\Gamma(\phi)\, y!} \left(\frac{\phi}{\mu+\phi}\right)^{\phi} \left(\frac{\mu}{\mu+\phi}\right)^{y} \qquad (1)$$

Where,

y = response variable (i.e., crashes per year);

µ = mean response of the distribution; and

φ = inverse dispersion parameter of the NB distribution.
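The short sketch below evaluates Equation (1) numerically and checks that the probabilities sum to one and reproduce the mean µ and the variance µ + µ²/φ. The function name `nb_pmf` and the parameter values (a mean near 1.0 and φ rounded from the Table 3 estimate for 354 sites) are illustrative assumptions, not part of the original analysis.

```python
import math


def nb_pmf(y, mu, phi):
    """Equation (1): probability of y crashes for mean mu and inverse dispersion phi."""
    # Log-gamma terms avoid overflow in Gamma(y + phi) and y! for large counts.
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)


mu, phi = 1.01, 4.84  # illustrative values only
probs = [nb_pmf(y, mu, phi) for y in range(200)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, variance, mu + mu ** 2 / phi)
```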


The variance of the distribution can be represented by

$$\mathrm{Var}(y) = \mu + \frac{\mu^2}{\phi} \qquad (2)$$
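Equation (2) can be solved for φ to obtain a rough, moment-based estimate of the inverse dispersion parameter from the sample mean and variance alone. The sketch below does this for the sample moments reported for the two datasets in this paper; the function name `moment_phi` is an assumption, and the result ignores the covariates, so it should not be expected to match the values estimated during model fitting.

```python
def moment_phi(mean, variance):
    """Solve Var(y) = mu + mu**2 / phi for phi (method of moments)."""
    excess = variance - mean
    if excess <= 0:
        # No overdispersion: the NB model collapses to the Poisson limit.
        return float("inf")
    return mean ** 2 / excess


phi_first = moment_phi(3.93, 13.96)   # first dataset (868 sites)
phi_second = moment_phi(1.01, 1.28)   # second dataset (354 sites)
print(round(phi_first, 2), round(phi_second, 2))
```

A finite value indicates that the variance exceeds the mean, which is the overdispersion that motivates the Poisson-gamma error structure.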

The GOF of NB models can be assessed using the SD (the deviance divided by the

dispersion parameter of the model, calculated as twice the logarithm of the ratio of the

likelihoods of the two models being compared) or Pearson’s

X2 statistic (12). These statistics follow χ2 distributions if the data are approximately

normally distributed. Thus, if these statistics are close to the degrees of freedom, it can be

said that the hypothesized model is adequate and more terms are not needed (4). The

degrees of freedom used to calculate the χ2 probability equal the difference in the

number of parameters between the larger and the smaller model. It was found that for very low sample

means, the SD and Pearson’s X2 are no longer χ2 distributed. This is because for small

mean values the data are not normally distributed (5). As a result, these statistics can give

a false indication of how well the model fits the data. Maher and Summersgill (4) suggested the use

of X2 or Expected Standard Deviation (ESD) as the measure of goodness-of-fit, but Wood

(5) showed that both of these fail with very low sample means.
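To make the test concrete, the sketch below converts a deviance and its degrees of freedom into an upper-tail χ2 probability. It uses the Wilson-Hilferty normal approximation so that only the Python standard library is needed; this approximation is an assumption of the example and is not necessarily how the probabilities in this paper were computed. The two inputs are ungrouped values reported in Table 3.

```python
import math


def chi2_upper_tail(stat, dof):
    """Approximate P(chi-square with dof degrees of freedom >= stat)."""
    if stat <= 0:
        return 1.0
    # Wilson-Hilferty: (X / dof)**(1/3) is approximately normal with
    # mean 1 - 2/(9*dof) and variance 2/(9*dof).
    c = 2.0 / (9.0 * dof)
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail


p_354 = chi2_upper_tail(396.05, 351)  # ungrouped SD, n = 354
p_100 = chi2_upper_tail(110.06, 97)   # ungrouped SD, n = 100
print(round(p_354, 3), round(p_100, 3))
```

A probability well above the chosen significance level suggests that the deviance is compatible with the degrees of freedom, provided the χ2 assumption itself holds.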

Wood (5) proposed instead a grouping method that makes the scaled deviance and Pearson’s X2 statistic more suitable for assessing the fit of GLMs. In this method, some of the data are grouped to improve the normality of the observations. The observations and the flows for some sites are combined to give new grouped observations and flows. The same fitted coefficients from the original model are then used to calculate the new mean for the grouped sites. The reader is referred to Wood (5) for a more detailed description of the methodology. The grouped SD for the NB distribution has been used in this research and is given by

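$$G^2 = 2\sum_{i=1}^{n} r_i \left[ \bar{y}_i \ln\left( \frac{\bar{y}_i\,(\hat{\mu}_i + \hat{\phi})}{\hat{\mu}_i\,(\bar{y}_i + \hat{\phi})} \right) + \hat{\phi} \ln\left( \frac{\hat{\mu}_i + \hat{\phi}}{\bar{y}_i + \hat{\phi}} \right) \right] \qquad (3)$$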


where $r_i$ is the group size for group $i$, $\bar{y}_i$ is the new mean of the observed values for group $i$, $\hat{\mu}_i$ is the estimated mean for group $i$, $\hat{\phi}$ is the estimated inverse dispersion parameter and $n$ is the number of groups. The group size $r_i$ can differ from group to group depending on the observed values. The variance of the SD constrains the group size for the observations. While grouping the data brings the components of the SD closer to a χ2 distribution, the value of the SD itself can deviate from the χ2 distribution (5). In addition, too much grouping shrinks the largest model against which the hypothesized model is being tested. Hence, care has to be taken in determining group sizes (5). A common φ can be used for the maximal model and the reduced model, as was shown in Maher and Summersgill (4).
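A minimal sketch of the grouped scaled deviance in Equation (3) is given below. Each group is passed as a tuple of its size, its average observed count and its fitted mean; this data layout and the function name `grouped_scaled_deviance` are assumptions made for illustration. Setting every group size to 1 gives the ungrouped SD.

```python
import math


def grouped_scaled_deviance(groups, phi):
    """G^2 for groups given as (r_i, ybar_i, mu_i) with inverse dispersion phi."""
    total = 0.0
    for r, ybar, mu in groups:
        term = phi * math.log((mu + phi) / (ybar + phi))
        if ybar > 0:
            # The ybar * ln(...) term tends to zero as ybar -> 0, so it is skipped there.
            term += ybar * math.log(ybar * (mu + phi) / (mu * (ybar + phi)))
        total += r * term
    return 2.0 * total


# Hypothetical groups: (group size, mean observed count, fitted mean)
example_groups = [(1, 2.0, 1.0), (3, 0.0, 0.4), (2, 1.5, 1.5)]
g2 = grouped_scaled_deviance(example_groups, phi=4.0)
print(round(g2, 4))
```

A group whose average observed count equals its fitted mean contributes nothing to the sum, which is why the third hypothetical group above adds zero.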

A fitted model should be accompanied by confidence intervals on the mean, gamma mean and predicted responses. For an NB model, three intervals are needed: a confidence interval for the long-term mean µ, a prediction interval for the safety of a site m (the gamma mean), and a prediction interval for the number of crashes y at a new site. Wood (7) has described a method to find these intervals (Table 1). Here, η is the logarithm of the estimated long-term mean µ, while φ is the inverse dispersion parameter estimated during the fitting process.


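Table 1. 95% Confidence and Prediction Intervals for Poisson-gamma models (7)

Parameter    Interval

µ    $\left[ \dfrac{\hat{\mu}}{e^{1.96\sqrt{\mathrm{Var}(\hat{\eta})}}},\ \hat{\mu}\, e^{1.96\sqrt{\mathrm{Var}(\hat{\eta})}} \right]$

m    $\left[ \max\left\{ 0,\ \hat{\mu} - 1.96\sqrt{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \dfrac{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \hat{\mu}^2}{\hat{\phi}}} \right\},\ \hat{\mu} + 1.96\sqrt{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \dfrac{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \hat{\mu}^2}{\hat{\phi}}} \right]$

y    $\left[ 0,\ \left\lfloor \hat{\mu} + \sqrt{19}\sqrt{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \dfrac{\hat{\mu}^2\,\mathrm{Var}(\hat{\eta}) + \hat{\mu}^2}{\hat{\phi}} + \hat{\mu}} \right\rfloor \right]$

Note: $\mathrm{Var}(\hat{\eta}) = \mathbf{x}_0 (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{x}_0'$

$\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$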
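The intervals in Table 1 can be sketched in a few lines of code. The function name `wood_intervals` and the input values below (a fitted mean of 1.0, Var(η) = 0.02 and φ = 4.84) are hypothetical and chosen only to show the mechanics; they are not results from this study.

```python
import math


def wood_intervals(mu_hat, var_eta, phi_hat):
    """Return the 95% intervals for mu, m and y following the layout of Table 1."""
    # Confidence interval for the long-term mean mu (symmetric on the log scale).
    factor = math.exp(1.96 * math.sqrt(var_eta))
    ci_mu = (mu_hat / factor, mu_hat * factor)

    # Prediction interval for the gamma mean m, truncated at zero.
    var_m = mu_hat ** 2 * var_eta + (mu_hat ** 2 * var_eta + mu_hat ** 2) / phi_hat
    half_width = 1.96 * math.sqrt(var_m)
    pi_m = (max(0.0, mu_hat - half_width), mu_hat + half_width)

    # Prediction interval for the count y at a new site (integer upper bound).
    upper_y = math.floor(mu_hat + math.sqrt(19.0) * math.sqrt(var_m + mu_hat))
    return ci_mu, pi_m, (0, upper_y)


ci_mu, pi_m, pi_y = wood_intervals(mu_hat=1.0, var_eta=0.02, phi_hat=4.84)
print(ci_mu, pi_m, pi_y)
```

Because the interval for y is rounded down to an integer, its upper bound changes in steps as the fitted mean increases.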

The next section describes how the two datasets with different means were analyzed to

evaluate the effect of varying sample sizes on the appropriateness of the methods

proposed by Wood (5, 7).

METHODOLOGY

Two datasets were used in this study. The datasets were initially collected for a project

related to the development of statistical models for predicting the safety performance of

signalized and unsignalized intersections in Toronto, Ont. (16). The datasets have been

found to be of very good quality and have been used extensively over the last few years

(11, 13, 16, 17, 18, 19). The samples included information on fatal and non-fatal injury

crashes at each site along with the major and minor annual average daily traffic (AADT) flows.

The first dataset contained 868 observations (signalized intersections) for the year 1995.

The second dataset contained 354 observations (unsignalized intersections) with a lower


sample mean than that of the first dataset. The first dataset had a sample mean equal to

3.93 and a sample variance equal to 13.96. The sample mean and variance for the second

dataset were equal to 1.01 and 1.28, respectively. A plot of the mean vs variance for the

data at hand revealed overdispersion. For both datasets, the GLM was assumed to follow the proposed functional form $\mu = \beta_0 F_1^{\beta_1} F_2^{\beta_2}$, where $\mu$ is the mean response corresponding to the entering flows $F_1$ and $F_2$ for the major and minor approaches respectively, and the $\beta$s are the regression coefficients, which can be represented by the matrix $\boldsymbol{\beta}$.

The steps performed in the analysis are as follows:

1. The data contained the injury counts and the two flows (major and minor) for every site. Only fatal and non-fatal injury crashes were used for modeling and analysis purposes. A preliminary estimate of the inverse dispersion parameter (φ) was obtained using Equation (2) by computing the mean and variance of the crash counts. This preliminary estimate is used for determining the magnitude of the variation within the dataset, and consequently for selecting the proper error distribution for the model to be fitted (this approach provides a better estimate of the dispersion in the data than the plot described above). Genstat (20) was used for estimating the model coefficients (the βs) and the inverse dispersion parameter (φ). This was done by regressing the injury counts on the logarithm of the flows in Genstat for a generalized linear model with a logarithmic link. The mathematics behind the model fitting process is described in detail in Wood (6). In this work, it is assumed that the fixed inverse dispersion parameter is not affected by the low sample mean or small sample size (see 11 for additional information on this assumption).

2. Once the model coefficients and φ are estimated, the SD component for every

site can be estimated using Equation (3). The sum of the components of the scaled

deviance would give the value for G2. The scaled deviance components are


hypothesized to be asymptotically χ2 distributed. Hence, the value

of G2 is used to obtain a χ2 probability, which indicates the GOF of the model.

3. Since the low mean problem is being considered here, the SD may not be a valid

indicator of GOF. Hence, a new scaled deviance needs to be

calculated whose components are asymptotically χ2 distributed.

As described earlier, this method is explained by Wood (5, 6). Wood used a

method to group the values of counts and flows in such a way that new SD

components fit the χ2 distribution better. This method was then used to calculate

the new SD. The method is described as follows (6):

• For each flow–accident pair, calculate the fitted µ and associate the pair with

a group size r .

• “Working from right to left in the regression plot, group the data so that each

flow–accident pair is in a group of size at least as large as the associated r .

This would give n groups with the ith group of size $r_i$ having crash counts $y_{i1}, y_{i2}, \ldots, y_{ir_i}$ and associated flow counts of x1, x2…xi.”

• Average the x and y values for each group.

• Calculate the new SD using Equation (3), where $\hat{\mu}_i$ is the value fitted by the model at the average $x_i$.

• Test the SD against a χ2 distribution, with degrees of freedom equal to n minus p.

4. The confidence intervals for m and y were then calculated using the equations in Table 1 (7). In these equations, η is the logarithm of the estimated mean (µ), while φ is the inverse dispersion parameter estimated during the fitting process. Var(η) is calculated as $\mathbf{X}\,\mathbf{I}^{-1}\mathbf{X}'$, where $\mathbf{I}^{-1}$ is the variance-covariance matrix of the parameters (usually provided by the output of statistical software programs) and $\mathbf{X}$ is the $n \times (p+1)$ input matrix containing the entering flows in logarithmic form. In cases where this output is not available, the variance-covariance matrix $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$ can be estimated directly, where $\mathbf{W}$ is the diagonal $n \times n$ weights matrix with $w_{ii} = (\phi \times \mu_i)/(\phi + \mu_i)$, with $i$ varying from 1 to $n$ sites.

5. In the last step, the process used in steps 1-4 was repeated for smaller sample

sizes. Sample sizes of 20, 50, 100, 200, 400 and 868 were analyzed for the

first dataset, while sample sizes of 20, 50, 100, 200 and 354 were analyzed for

the second dataset. For the latter dataset, a few samples were tested for n=20, but the

statistical models did not provide an adequate fit. In fact, some of the coefficients

did not provide a logical relationship with the crash data. For instance, the

coefficient for the flow on the minor approach was negative for the tested samples.

Thus, although still shown in Table 3, it was decided to remove the sample size

equal to 20 from further analysis. Finally, all samples were randomly taken from

the data to minimize biases.
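The grouping procedure in step 3 can be sketched as follows. This is only one possible reading of the quoted description: sites are sorted by flow, scanned from the highest flow downward, and a group is closed once it is at least as large as the required size r of each of its members. The rule that assigns r to each site is taken from Wood (5, 6) in the paper and is not reproduced here, so the r values below are hypothetical inputs, as are the function names and the handling of leftover sites.

```python
def group_right_to_left(sites):
    """sites: list of (flow, crashes, r). Returns a list of groups (lists of sites)."""
    ordered = sorted(sites, key=lambda s: s[0], reverse=True)  # highest flow first
    groups, current = [], []
    for site in ordered:
        current.append(site)
        # Close the group once it satisfies the largest r among its members.
        if len(current) >= max(s[2] for s in current):
            groups.append(current)
            current = []
    # Assumption: leftover low-flow sites absorb the previous groups until satisfied.
    while current and groups and len(current) < max(s[2] for s in current):
        current = groups.pop() + current
    if current:
        groups.append(current)
    return groups


def group_averages(groups):
    """Return (r_i, average flow, average crash count) for each group."""
    return [(len(g), sum(s[0] for s in g) / len(g), sum(s[1] for s in g) / len(g))
            for g in groups]


# Hypothetical sites: (entering flow, crash count, required group size r)
sites = [(1000, 0, 3), (2000, 1, 3), (3000, 0, 2),
         (4000, 2, 2), (9000, 3, 1), (12000, 5, 1)]
groups = group_right_to_left(sites)
averages = group_averages(groups)
print(averages)
```

The averaged flows would then be passed through the fitted model to obtain the group means used in Equation (3).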

STATISTICAL ANALYSIS

This section summarizes the statistical analysis carried out in this work and is separated

into two sub-sections. The first sub-section describes the results of the GOF analysis. The

second sub-section shows the results of the confidence interval analysis.

GOODNESS-OF-FIT ANALYSIS

The first objective of this paper was to evaluate whether sample sizes influence the GOF

test proposed by Wood (6). The results of this analysis are presented in Tables 2 and 3.

Table 2 shows the values of χ2 probabilities of the GOF for scaled deviances for the first

dataset. These values were plotted against the sample sizes before grouping and after

grouping (Figure 1). The plot shows that the problem of scaled deviance not following

the χ2 distribution is not significant. As can be seen, there is not a large difference

between the χ2 probabilities of scaled deviance before or after grouping. This is expected,


since at higher sample means the scaled deviance without grouping is a good

approximation of the GOF of the data. As the sample size becomes smaller from 868 to

20, there is no deviation from this pattern and the values of scaled deviances are similar

before and after grouping.
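The last column of Tables 2 and 3 expresses the change in χ2 probability caused by grouping as a percentage of the ungrouped probability. The sketch below shows that calculation; the function name `percent_difference` is an assumption, and small discrepancies with the tabulated percentages are to be expected because the probabilities in the tables are rounded.

```python
def percent_difference(ungrouped_prob, grouped_prob):
    """Relative change (%) in the chi-square probability after grouping."""
    return 100.0 * (grouped_prob - ungrouped_prob) / ungrouped_prob


# First dataset, sample sizes of 20 and 50 (probabilities from Table 2).
diff_20 = percent_difference(0.125, 0.121)
diff_50 = percent_difference(0.175, 0.222)
print(round(diff_20, 2), round(diff_50, 2))
```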


Table 2. Comparison of χ2 Probabilities for the Scaled Deviances for Different Sample Sizes for First Dataset (868)

                                                                                                Ungrouped               Grouped
Size  Mean  Variance  ln(β0)            β1                β2                φ                   G-square  dof  χ2-prob  G-square  dof  χ2-prob  Difference in Ungrouped and Grouped χ2-prob (%)
20    3.65  12.24     -32.7 (13.8)†     2.97 (1.31)       0.308 (0.198)     3.9857 (3.1907)     23.80     17   0.125    23.93     15   0.121    -3.20
50    3.42  12.98     -12.26 (4.31)     0.817 (0.404)     0.571 (0.152)     2.3101 (0.8644)     55.91     47   0.175    48.69     42   0.222    26.86
100   3.49  12.27     -10.39 (2.48)     0.605 (0.232)     0.601 (0.0825)    4.4665 (1.5129)     106.39    97   0.242    104.40    89   0.291    20.17
200   3.6   12.75     -12.83 (1.78)     0.851 (0.172)     0.5816 (0.0613)   4.8058 (1.1949)     217.42    197  0.152    191.50    180  0.265    74.44
400   3.39  10.48     -11.93 (1.06)     0.7795 (0.0991)   0.5667 (0.0433)   7.0138 (1.5436)     431.79    397  0.111    3.39      363  0.128    16.20
868   3.93  13.96     -11.959 (0.651)   0.7482 (0.0644)   0.6136 (0.03)     7.7356 (1.1689)     968.22    865  0.008    893.47    805  0.016    103.75

† Standard Error (shown in parentheses)

[Figure 1 plot: χ2 probability (0 to 1) against sample size (20, 50, 100, 200, 400, 868), with one series for the ungrouped SD and one for the grouped SD]

Figure 1. Scaled Deviance χ2 Probabilities for Different Sample Sizes for First Dataset

Table 3 shows the values of χ2 probabilities of the GOF for scaled deviances for the

second dataset. Similar to above, these values were plotted against the sample sizes

before grouping and after grouping (Figure 2). This dataset had a lower sample mean, i.e.,

1.01, for which the problem of scaled deviance not following the χ2 distribution is much

larger compared to the first dataset. As can be seen, there is a large difference between

the χ2 probabilities of scaled deviance before or after grouping. This is expected, as at

lower means the scaled deviance without grouping is not a good approximation of the

GOF of the data. The grouped SD is a better indicator of GOF of the GLM. It can be seen

that at larger sample sizes, the effect of grouping the data is not as large as for lower

sample sizes. The χ2 probabilities are closer with large samples. In addition, it can be seen

that the χ2 probabilities are quite different for lower sample sizes even after grouping. This

indicates that sample size can make some difference to the statistical fit of the model.

Figure 2 shows that, as reported by Wood (6), grouping data offers a better approach to

estimate the statistical fit of models with low sample mean values.


Table 3. Comparison of χ2 Probabilities for the Scaled Deviances for Different Sample Sizes for Second Dataset (354)

                                                                                                Ungrouped               Grouped
Size  Mean  Variance  ln(β0)            β1                β2                φ                   G-square  dof  χ2-prob  G-square  dof  χ2-prob  Difference in Ungrouped and Grouped χ2-prob (%)
20‡   1.15  2.77      -11.61 (9.90)†    1.571 (0.835)     -0.56 (0.656)     0.795 (0.6098)      17.30     17   0.4344   7.2699    6    0.297    -31.74
50    0.98  1.73      -7.97 (5.75)      0.670 (0.449)     0.162 (0.337)     1.1938 (0.6827)     50.20     47   0.348    11.92     20   0.919    164.10
100   0.96  1.25      -8.33 (3.66)      0.68 (0.294)      0.193 (0.207)     4.2732 (3.7004)     110.06    97   0.172    43.61     43   0.446    158.86
200   0.94  1.13      -10.66 (2.56)     0.787 (0.197)     0.360 (0.142)     11.0436 (14.9561)   224.13    197  0.090    138.96    97   0.003    -96.33
354   1.01  1.28      -7.65 (1.83)      0.607 (0.142)     0.209 (0.103)     4.8367 (2.3758)     396.05    351  0.049    224.40    172  0.004    -90.97

† Standard Error (shown in parentheses)

‡ Sample size equal to 20 did not provide a good statistical model, since the coefficient for F2 is negative. The model is shown here for illustration purposes only and was not used for further analyses

[Figure 2 plot: χ2 probability (0 to 1) against sample size (50, 100, 200, 354), with one series for the ungrouped SD and one for the grouped SD]

Figure 2. Scaled Deviance χ2 Probabilities for Different Sample Sizes for Second Dataset

CONFIDENCE INTERVAL ANALYSIS

The second objective of this paper was to analyze the effect of the reducing sample sizes

on the confidence intervals of the gamma mean and predicted response. The analysis was

initially carried out with the second dataset because the functional form for every sample

size was very similar; in other words, the coefficients of the models were very close. This

made the comparison between each sample size easier. Consequently, confidence

intervals were estimated for sample sizes equal to 50, 100, and 354. The summary of the

models is presented in Table 3.

Using the equations presented in Table 1, the 95% confidence intervals on the

gamma mean and predicted response were computed for the different sample sizes for an

entering flow for the minor approach of F2 = 2,000. The results are presented in Figures 3

and 4 respectively. As a general trend, these figures show the confidence intervals on the

gamma mean and predicted response become larger as the sample size becomes smaller.

For instance, the width of the confidence intervals for the gamma mean increases

by between 0% and 10% when the sample size is reduced from 354 to 100. The increase is

much larger when the sample size is reduced from 354 to 50, with an increase in the

width varying between 72% and 85%. The same outcome can be seen for the predicted


response, in which the confidence interval increases with a smaller sample size. The

difference can be as high as 65% for the sample reduced from 354 to 50. Additionally,

although difficult to see at the lower left-hand side, the confidence intervals are actually

wider at lower and higher flows, similar to the confidence intervals produced for linear

statistical models (21). This characteristic is explained by the fact that there are fewer

sites at both extremities.

[Figure 3 plots: three panels for sample sizes of 354, 100 and 50 (F2 = 2000), each with F1 (AADT) on the horizontal axis]

Figure 3. 95% Confidence Intervals for the gamma Mean m

[Figure 4 plots: three panels showing the predicted response y against F1 (AADT) from 1,000 to 58,000; the first panel is titled "Sample size=354, F2=2000"]

Figure 4. 95% Confidence Intervals for the Predicted Response y


In order to remove the effects caused by using different values for the inverse dispersion parameter and only account for the sampling error of the model (via Var(η)), the statistical models were re-fitted for the sample sizes of 100 and 50 using the inverse dispersion parameter estimated from the sample of 354 sites (φ = 4.87). New confidence intervals were estimated and compared with those estimated from the original model for 354 sites. When the sampling error is the only source of variation in the model, reducing the sample size still affects the estimated confidence intervals, albeit to a lesser degree. For instance, the change in the width of the confidence interval for the gamma mean varied between -3% (on the left-hand side) and 10% for a sample size equal to 100, and between 10% and 20% for a sample size equal to 50. This implies that lower sample sizes increase the width of the confidence intervals even if the inverse dispersion parameter remains constant. This is expected since Var(η) is also dependent on the dispersion parameter. On the other hand, the confidence intervals on the predicted response are less affected, with changes varying between a decrease of 20% and an increase of 12.5% (at high flows). In this case, the points where the steps start and end will be different, as shown in Figure 4. Thus, some values will automatically jump by one point while some will go down. Nonetheless, the width increases at both extremities.
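The dependence of the sampling error on the sample size can be illustrated with a small sketch that builds X'WX from the weights described in the methodology and evaluates Var(η) at a new site. Everything numerical here is hypothetical: the six sites, the helper names `invert` and `var_eta`, and the use of the Table 3 coefficients with φ = 4.84 are assumptions for illustration, and replicating the sites is only a crude stand-in for collecting more data.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def var_eta(flows, coef, phi, new_site):
    """Var(eta) = x0 (X'WX)^-1 x0' with weights w_ii = phi * mu_i / (phi + mu_i)."""
    rows = [[1.0, math.log(f1), math.log(f2)] for f1, f2 in flows]
    k = len(coef)
    xtwx = [[0.0] * k for _ in range(k)]
    for x in rows:
        mu = math.exp(sum(b * v for b, v in zip(coef, x)))
        w = phi * mu / (phi + mu)
        for a in range(k):
            for b in range(k):
                xtwx[a][b] += w * x[a] * x[b]
    cov = invert(xtwx)
    x0 = [1.0, math.log(new_site[0]), math.log(new_site[1])]
    return sum(x0[a] * cov[a][b] * x0[b] for a in range(k) for b in range(k))


coef = (-7.65, 0.607, 0.209)  # ln(beta0), beta1, beta2 from Table 3 (n = 354)
flows = [(5000, 1000), (12000, 2500), (20000, 1500),
         (30000, 4000), (45000, 3000), (8000, 600)]  # hypothetical sites
var_small = var_eta(flows, coef, 4.84, (10000, 2000))
var_large = var_eta(flows * 4, coef, 4.84, (10000, 2000))  # four times as many sites
print(var_small, var_large)
```

With φ held fixed, four times as many comparable sites cut Var(η) to a quarter, which narrows the interval for µ on the logarithmic scale by half.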

The first dataset, with a higher sample mean, was also used for evaluating the effects of

reducing sample size on the confidence intervals. The same entering flow F2 = 2,000

was employed in this exercise. Because the predicted values varied greatly between the

models developed from different sample sizes, the coefficients and the dispersion

parameter from the full dataset along with the variance-covariance matrix output from the

lower sample sizes were used to compute the confidence intervals. Although not ideal,

the comparison showed that the confidence intervals for the gamma mean and predicted

response increased on average by similar percentages for sample sizes equal to 100 and

50 respectively. Interestingly, the confidence intervals for the gamma mean increased by

an average of 136% for a sample size equal to 20 (compared to the full dataset). With this

kind of increase, developing statistical models using such a small number of observations

is not recommended.


SUMMARY AND CONCLUSIONS

This study aimed at analyzing the effects of reducing sample sizes on the improved GOF

statistic method proposed by Wood (6) which was devised to increase the appropriateness

of the SD as a GOF indicator for GLMs with Poisson or NB error structure. Two datasets

containing fatal and non-fatal injury crash counts and traffic flow data were analyzed

with respective mean injury counts of 3.9 and 1.01. Different sample sizes were analyzed

for each dataset. For each sample size, the GOF was tested using a GLM with and

without grouping of the data. In addition, the confidence intervals for the gamma mean

and predicted responses were estimated using the datasets.

The following results were obtained:

1. There was no strong effect of grouping on the χ2 probabilities for the first dataset (with a mean injury count of 3.9) across the different sample sizes. This was expected, as the grouping technique is only needed when the scaled deviance (SD) does not follow a χ2 distribution. It can be concluded from the first dataset that the grouping technique might not be needed to improve the goodness-of-fit procedure for data with a large mean.

2. There was a stronger effect of grouping on the χ2 probabilities for the second dataset across the different sample sizes. This was expected, as low mean data do not approximate a normal distribution, especially for small sample sizes, and hence the SD does not follow a χ2 distribution. As discussed by Lawless (22), the inferences associated with NB models become asymptotically (or normally) distributed as the sample size increases, i.e., n × µ → ∞ (see also 23 for a discussion on the asymptotic approximation of Poisson models estimated using small sample sizes).

3. Confidence intervals for the gamma mean and predicted responses got wider with decreasing sample sizes. This is reasonable, as the sampling error grows with the reduction in sample size. However, given the fact that the confidence intervals are highly dependent on the inverse dispersion parameter, it is critical that the parameter be properly estimated (see point 6 below). Unfortunately, models developed from small sample sizes are very likely to be biased (11).

4. As expected, confidence intervals are wider at both extremes of the distribution.

This is explained by the low number of observations at these extremes.

5. Confidence intervals on the gamma mean and predicted responses were found to be fairly large, indicating highly approximate estimates for µ, m and y. This is because the model coefficients and the observed values are only approximately normally distributed, thus influencing the accuracy of the estimation (7).

6. Finally, the analyses described in this research were performed with the

assumption that the inverse dispersion parameter φ is properly estimated. As

reported by Lord (11), this assumption is only valid when statistical models

characterized by low sample mean values are developed using a large number of

observations, preferably above 1,000 sites (for 1.0µ ) , if possible (see the paper

for additional information on the minimum sample size requirements). If such

large samples are used, many of the issues associated with the modified GOF

statistic testing method and the increasing width of the boundaries of confidence

intervals would be avoided.
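
Points 3 and 6 above both hinge on the inverse dispersion parameter φ. The short sketch below illustrates why. It assumes the usual Poisson-gamma formulation, in which the gamma mean m has expectation µ and variance µ²/φ and the count y is Poisson given m, so that Var(y) = µ + µ²/φ. The function name and the numerical values are hypothetical and are not taken from the Toronto datasets. The sketch covers only these variance functions, not the full confidence-interval calculation, which also involves the uncertainty of the estimated coefficients.

```python
import math


def poisson_gamma_variances(mu, phi):
    """Return (Var[m], Var[y]) for a Poisson-gamma model.

    Assumption: the gamma mean m has expectation mu and variance mu**2 / phi,
    and the count y is Poisson given m, so Var[y] = mu + mu**2 / phi.
    """
    if mu <= 0 or phi <= 0:
        raise ValueError("mu and phi must be positive")
    var_m = mu ** 2 / phi   # spread of the site-specific gamma mean
    var_y = mu + var_m      # Poisson variation added on top of it
    return var_m, var_y


# Hypothetical values: one fitted mean with three candidate estimates of phi.
for phi in (0.5, 1.0, 2.0):
    var_m, var_y = poisson_gamma_variances(1.0, phi)
    print(f"phi={phi}: sd(m)={math.sqrt(var_m):.3f}, sd(y)={math.sqrt(var_y):.3f}")
```

Because φ enters both variances through µ²/φ, an underestimated φ inflates the spread of m and y for the same fitted mean, and an overestimated φ shrinks it.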

Given the results of the study, it is recommended that crash data be collected at a minimum of

400 sites (i.e., segments, intersections, etc.) in order to avoid using the GOF testing method proposed by

Wood (5) when the overall sample mean is close to 1.0. A minimum of 100 observations

is recommended for building reliable statistical models and, consequently, confidence

intervals for the same sample mean. The increase in the width of confidence intervals was

found to be less than 10% compared to the full dataset in the analysis carried out in this

research. The suggested sample size was also recommended by Lord (11) as an absolute

minimum to lessen biases in estimating the dispersion parameter of Poisson-gamma

models.

Some recommendations for further research include the following:

1. It is recommended to conduct further analyses on the effects of reducing

sample sizes for different sample mean values, particularly for extremely low

sample mean values (µ < 1.0); perhaps using simulation would help for this

evaluation.

2. In this work, only crash-flow models were used. It is recommended to replicate

this study using “full” models that include several explanatory variables.

3. Finally, in light of recent work on the dependence of the inverse dispersion

parameter on the covariates of the model (13), it is suggested to re-evaluate the

modified GOF statistic for a varying inverse dispersion parameter.
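
The simulation suggested in the first recommendation could be set up along the following lines. This is only a sketch under stated assumptions: counts are drawn from a Poisson-gamma model with a common mean µ and inverse dispersion φ, and φ is recovered with a simple method-of-moments estimator, used here as a stand-in for the likelihood-based estimation used for the models in this paper. All function names, sample sizes and seeds are hypothetical.

```python
import math
import random


def simulate_counts(n_sites, mu, phi, rng):
    """Draw n_sites Poisson-gamma counts with mean mu and inverse dispersion phi."""
    counts = []
    for _ in range(n_sites):
        # Gamma mean with shape phi and scale mu / phi (expectation mu).
        m = rng.gammavariate(phi, mu / phi)
        # Poisson draw by Knuth's multiplication method (adequate for low means).
        limit = math.exp(-m)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def moment_phi(counts):
    """Method-of-moments estimate of phi, or None when it is undefined."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        return None  # no overdispersion observed in this sample
    return mean ** 2 / (var - mean)


def undefined_share(n_sites, mu, phi, reps, seed):
    """Share of simulated samples for which the moment estimate is undefined."""
    rng = random.Random(seed)
    misses = sum(
        moment_phi(simulate_counts(n_sites, mu, phi, rng)) is None
        for _ in range(reps)
    )
    return misses / reps


if __name__ == "__main__":
    # Hypothetical design: mean below 1.0 and a range of sample sizes.
    for n in (50, 100, 400, 1000):
        print(n, undefined_share(n, mu=0.5, phi=1.0, reps=200, seed=2005))
```

Repeating the run over a grid of µ and φ values would show how often small, low-mean samples fail to yield a usable estimate of φ, which is the situation the recommendation targets.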

It is hoped that this research project will provide transportation safety modelers with better

guidance for selecting the appropriate sample size when the GOF statistic is used for

comparing models subjected to low sample mean values or when the computation of the

confidence intervals of crash prediction models is a critical element of the analysis, such

as the comparison of highway design alternatives or the identification of hazardous sites.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Thomas Jonsson, currently a visiting fellow at the

University of Connecticut, for comments provided on an earlier version of this paper. The

paper benefited from the input of TRB reviewers.

REFERENCES

1. Fridstrøm, L., J. Ifver, S. Ingebrigtsen, R. Kulmala, and L.K. Thomsen.

Measuring the contribution of randomness, exposure, weather, and daylight to the

variation in road accident counts. Accident Analysis and Prevention, Vol. 27, No.

1, 1995, pp. 1–20.

2. Lord, D., S.P. Washington, and J.N. Ivan. Poisson, Poisson-Gamma and Zero

Inflated Regression Models of Motor Vehicle Crashes: Balancing Statistical Fit

and Theory. Accident Analysis & Prevention, Vol. 37, No. 1, 2005, pp. 35-46.

3. Maycock, G., and R.D. Hall. Accidents at 4-arm roundabouts. TRRL Laboratory

Report 1120. Transport and Road Research Laboratory, Crowthorne,

Berkshire, 1984.

4. Maher, M.J., and I. Summersgill. A comprehensive methodology for the fitting of

predictive accident models. Accident Analysis & Prevention, Vol. 28, No. 3, 1996,

pp. 281–296.

5. Wood, G.R. Assessing goodness of fit for Poisson and negative binomial models

with low mean. Massey University Technical Report, Institute of Information

Sciences and Technology, Massey University, Palmerston North, New Zealand,

2000.

6. Wood, G.R. Generalized linear accident models and goodness of fit testing.

Accident Analysis & Prevention, Vol. 34, No. 1, 2002, pp. 417-427.

7. Wood, G.R. Confidence and prediction intervals for generalized linear accident

models. Accident Analysis & Prevention, Vol. 37, No. 2, 2005, pp. 267-273.

8. Dean, C.B. Modified Pseudo-Likelihood Estimator of the Overdispersion

Parameter in Poisson Mixture Models. Journal of Applied Statistics, Vol. 21, No.

6, 1994, pp. 523-532.

9. Clark, S.J., and J.N. Perry. Estimation of the Negative Binomial Parameter by

Maximum Quasi-Likelihood. Biometrics, Vol. 45, 1989, pp. 309-316.

10. Piegorsch, W.W. Maximum Likelihood Estimation for the Negative Binomial

Dispersion Parameter. Biometrics, Vol. 46, 1990, pp. 863-867.

11. Lord, D. Modeling Motor Vehicle Crashes using Poisson-gamma Models:

Examining the Effects of Low Sample Mean Values and Small Sample Size on

the Estimation of the Fixed Dispersion Parameter. Paper accepted for presentation

at the 85th Annual Meeting of the TRB, Transportation Research Board,

Washington, D.C., 2005.

12. Dobson, A.J. An Introduction to Generalized Linear Models. Chapman and Hall,

London, 1990.

13. Miaou, S.-P., and D. Lord. Modeling Traffic Crash-Flow Relationships for

Intersections: Dispersion Parameter, Functional Form, and Bayes versus

Empirical Bayes. Transportation Research Record 1840, 2003, pp. 31-40.

14. Heydecker, B.G., and J. Wu. Identification of Sites for Road Accident Remedial

Work by Bayesian Statistical Methods: An Example of Uncertain Inference.

Advances in Engineering Software, Vol. 32, 2001, pp. 859-869.

15. Lord, D., A. Manar, and A. Vizioli. Modeling Crash-Flow-Density and Crash-

Flow-V/C Ratio for Rural and Urban Freeway Segments. Accident Analysis &

Prevention, Vol. 37, No. 1, 2005, pp. 185-199.

16. Lord, D. The Prediction of Accidents on Digital Networks: Characteristics and

Issues Related to the Application of Accident Prediction Models. Ph.D.

Dissertation. Department of Civil Engineering, University of Toronto, Toronto,

Ontario, 2000.

17. Lord, D., and B.N. Persaud. Accident Prediction Models with and without Trend:

Application of the Generalized Estimating Equations Procedure. Transportation

Research Record 1717, 2000, pp. 102-108.

18. Lord, D., and B.N. Persaud. Estimating the Safety Performance of Urban

Transportation Networks. Accident Analysis & Prevention, Vol. 36, No. 2, 2004,

pp. 609-620.

19. Miaou, S.-P., and J.J. Song. Bayesian ranking of sites for engineering safety

improvements: Decision parameter, treatability concept, statistical criterion and

spatial dependence. Accident Analysis and Prevention, Vol. 37, No. 4, 2005, pp.

699-720.

20. Payne, R.W. (ed.) The Guide to Genstat. Lawes Agricultural Trust, Rothamsted

Experimental Station, Oxford, U.K., 2000.

21. Myers, R.H. Classical and Modern Regression with Applications, 2nd ed.,

Duxbury Press, Pacific Grove, CA, 2000.

22. Lawless, J.F. Negative Binomial and Mixed Poisson Regression. The Canadian

Journal of Statistics, Vol. 15, No. 3, 1987, pp. 209-225.

23. Morris, C.N. Fitting Hierarchical Models. Workshop on Statistics and

Epidemiology: Environment and Health, Minneapolis, MN, 1997. (web page:

http://www.ima.umn.edu/summerstat/week6.html#wk6tue accessed on November

4th, 2005)
