Safety Science 62 (2014) 495–498


A caution about using deviance information criterion while modeling traffic crashes

0925-7535/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ssci.2013.10.007

⇑ Corresponding author. Tel.: +1 (817) 462 0519.
E-mail addresses: [email protected] (S.R. Geedipally), d-lord@tamu.edu (D. Lord), [email protected] (S.S. Dhavala).
1 Tel.: +1 (979) 458 3949.

Srinivas Reddy Geedipally a,⇑, Dominique Lord b,1, Soma Sekhar Dhavala c

a Texas A&M Transportation Institute, Texas A&M University System, 110 N. Davis Dr. Suite 101, Arlington, TX 76013-1877, United States
b Zachry Department of Civil Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, United States
c Dow AgroSciences, Indianapolis, IN, United States

Article info

Article history:
Received 18 June 2013
Accepted 6 October 2013
Available online 30 October 2013

Keywords:
Deviance information criterion
Poisson-Gamma
Negative binomial
Traffic crashes

Abstract

The Poisson-Gamma (PG) or negative binomial (NB) model still remains the most popular method used for analyzing count data. In the software WinBUGS (or any other software used for Bayesian analyses), there are different ways to parameterize the NB model. In general, either a PG (based on the Poisson-mixture) or a NB (based on the Pascal distribution) modeling framework can be used to relate traffic crashes to the explanatory variables. However, it is important to note that the way the model is parameterized will influence the output of the Deviance Information Criterion (DIC) values. The objective of this short study is to document the difference between the PG and NB models in the estimation of the DIC. This is especially important given that the NB/PG model is still the most frequently used model in highway safety research and applications. To accomplish the study objective, PG and NB models were developed using the crash data collected at 4-legged signalized intersections in Toronto, Ont. The study results showed that there is a considerable difference in the estimation of the DIC values between the two models. It is thus recommended not to consider the DIC as the sole model selection criterion, and the comparison should be done only between models that have similar parameterization. Other alternatives such as Bayes Factors, Posterior predictive performance criterion, Bayesian Information Criterion (BIC), among others, need to be considered in addition to the DIC in the model selection.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Despite the recent developments in regression modeling and analysis techniques, the Poisson-Gamma (PG) or negative binomial (NB) model still remains the most popular method used for analyzing count data (Hilbe, 2011). Its popularity in highway safety is no different than in other fields of research (Lord and Mannering, 2010). The extensive application of the PG/NB model is explained by its ability to capture (moderate) over-dispersion (i.e., the variance is larger than the mean), the simplicity in manipulating the relationship between the mean and the variance, and the fact that the model is available in all commercially available statistical programs.

The PG/NB model can be derived using several approaches (Hilbe, 2011). The most common approach is based on the PG mixture distribution (Lawless, 1987; Cameron and Trivedi, 1998). The PG model has properties that are very similar to the Poisson model, in which the dependent variable $Y_i$ is modeled as a Poisson variable with a mean $\mu_i$, where the model error is assumed to follow a Gamma distribution. As its name implies, the Poisson-Gamma is a mixture of two distributions and was first derived by Greenwood and Yule (1920). This mixture distribution was developed to account for over-dispersion that is commonly observed in discrete or count data (Lord et al., 2005). It became very popular because the conjugate distribution (same family of functions) has a closed form and leads to the NB distribution. As discussed by Cook (2009), "the name of this distribution comes from applying the binomial theorem with a negative exponent".

Recently, researchers in statistics and highway safety have been using an alternative parameterization of the NB distribution for analyzing count data (Zamani and Ismail, 2010; Lord and Geedipally, 2011). This parameterization is based on the probability of successes and failures in successive trials (Casella and Berger, 1990). This process is also referred to as the Pascal distribution. The proposed parameterization was needed for the development of the NB-Lindley model (Geedipally et al., 2012). In theory, the PG (based on the Poisson-mixture distribution) and the NB (based on the Pascal distribution) models will provide the same estimates. During the development of the NB-Lindley model (Geedipally et al., 2012), it was noted that, although the PG and NB models provided the same modeling output (i.e., coefficients, standard errors, etc.), differences were observed with the Deviance Information Criterion (DIC), a commonly used goodness-of-fit (GOF) measure for assessing the performance of competing Bayesian models. The researchers investigated this difference, and this paper documents that effort in order to warn other researchers about potential issues and pitfalls with using the DIC for comparing Bayesian models.

2. Background

This section briefly describes the differences between the parameterization of the PG and the NB models.

2.1. Poisson-Gamma model

For modeling traffic crash data, researchers have been using the following model structure for the PG model. The crash frequency $y_i$ for a particular $i$th site, when conditional on its mean $\mu_i$, is Poisson distributed and independent over all sites and time periods (Miaou and Lord, 2003):

$$ y_i \mid \mu_i \sim \text{Poisson}(\mu_i), \quad i = 1, 2, \ldots, I \tag{1} $$

The crash mean $\mu_i$ is structured as:

$$ \mu_i = f(X; \beta)\, \exp(\varepsilon_i) \tag{2} $$

where $f(X; \beta)$ is a function of the explanatory variables ($X$); $\beta$ is a vector of coefficients that are estimated from the data; and $\varepsilon_i$ is the model error, independent of all the covariates, with $\exp(\varepsilon_i)$ following a gamma distribution whose shape and rate parameters are equal (both $\phi$), so that its mean is 1.

From the above equations, it can be shown that $y_i$, conditional on $\mu_i$ and $\phi$, is distributed as a PG random variable with a mean $\mu_i$ and a variance $\mu_i + \mu_i^2/\phi$, respectively. The probability mass function (PMF) of the PG structure described above is given by the following equation:

$$ f(y_i; \phi, \mu_i) = \frac{\Gamma(y_i + \phi)}{\Gamma(\phi)\, y_i!} \left( \frac{\phi}{\mu_i + \phi} \right)^{\phi} \left( \frac{\mu_i}{\mu_i + \phi} \right)^{y_i} \tag{3} $$

where $y_i$ is the response variable for site $i$; $\mu_i$ is the mean response for site $i$; and $\phi$ is the inverse dispersion parameter of the PG distribution.

In the software WinBUGS (Spiegelhalter et al., 2003), for example, the coefficients of the PG regression model will be estimated using the following parameterization:

    y[i] ~ dpois(μ[i])

    μ[i] = f(X; β) exp(ε[i])

    exp(ε[i]) ~ dgamma(φ, φ)

It can be recognized that the PG model is a hierarchical model, where the Poisson component constitutes the likelihood at the data level and the gamma distribution appears in the next level of the hierarchy, the random effects level.
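The hierarchy described above can be checked numerically. The following is a minimal Python sketch, not the authors' code: it integrates the Poisson likelihood against a gamma density with shape and rate both equal to $\phi$ and compares the result with the closed-form PMF of Eq. (3). The function names, the test values (`mu = 12.0`, `phi = 7.152`), and the midpoint-rule settings are illustrative assumptions.

```python
import math

def pg_pmf(y, mu, phi):
    """Closed-form PG/NB probability mass function, Eq. (3)."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (mu + phi))
             + y * math.log(mu / (mu + phi)))
    return math.exp(log_p)

def mixture_pmf(y, mu, phi, upper=12.0, steps=24000):
    """Midpoint-rule integral over g of Poisson(y | mu*g) * Gamma(g; shape=phi, rate=phi).

    Here g plays the role of exp(eps_i); the grid settings are illustrative choices.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        log_pois = y * math.log(mu * g) - mu * g - math.lgamma(y + 1)
        log_gam = (phi * math.log(phi) + (phi - 1.0) * math.log(g)
                   - phi * g - math.lgamma(phi))
        total += math.exp(log_pois + log_gam)
    return total * h

# Hypothetical values chosen only for illustration.
mu, phi = 12.0, 7.152

# Moments of the closed-form PMF (the sum is truncated far into the tail).
closed = [pg_pmf(y, mu, phi) for y in range(400)]
mean = sum(y * p for y, p in enumerate(closed))
var = sum((y - mean) ** 2 * p for y, p in enumerate(closed))

print(mixture_pmf(5, mu, phi), pg_pmf(5, mu, phi))  # the two should agree closely
print(mean, var, mu + mu ** 2 / phi)                # variance should match mu + mu^2/phi
```

The agreement between the integral and the closed form is what makes the two-level (Poisson plus gamma) specification and the single-level NB specification interchangeable as data models, even though a sampler treats them differently.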

2.2. Negative binomial model

As discussed above, the NB can be derived using the probability of successes and failures in successive trials (Benjamin and Cornell, 1970). It can be shown that the PMF of the NB distribution can be given as:

$$ P(Y = y_i; \phi, p_i) = \frac{\Gamma(\phi + y_i)}{\Gamma(\phi)\, y_i!} (1 - p_i)^{\phi} (p_i)^{y_i}; \quad \phi > 0, \; 0 < p_i < 1 \tag{4} $$

The parameter $p$ is defined as the probability of success in each trial and is given as:

$$ p_i = \frac{\mu_i}{\mu_i + \phi} \tag{5} $$

where $\mu_i$ is the mean response for observation $i$; and $\phi$ is the inverse dispersion parameter of the NB distribution.

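In the software WinBUGS (Spiegelhalter et al., 2003), the coefficients of the NB regression model will be estimated using the following parameterization (note that the first argument of dnegbin, p[i] = φ/(φ + μ[i]), is the complement $1 - p_i$ of the probability defined in Eq. (5)):

    y[i] ~ dnegbin(p[i], φ)

    p[i] = φ / (φ + μ[i])

    μ[i] = f(X; β)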
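As a quick check that the two parameterizations describe the same distribution, the Python sketch below evaluates the Pascal form of Eq. (4), with $p_i$ from Eq. (5), against the PG form of Eq. (3). The function names and numerical values are illustrative assumptions, not part of the original study.

```python
import math

def pascal_pmf(y, p, phi):
    """NB PMF in the Pascal form of Eq. (4), with p the success probability of Eq. (5)."""
    return math.exp(math.lgamma(phi + y) - math.lgamma(phi) - math.lgamma(y + 1)
                    + phi * math.log(1.0 - p) + y * math.log(p))

def pg_pmf(y, mu, phi):
    """PG PMF in the mean/inverse-dispersion form of Eq. (3)."""
    return math.exp(math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
                    + phi * math.log(phi / (mu + phi))
                    + y * math.log(mu / (mu + phi)))

# Hypothetical values chosen only for illustration.
mu, phi = 12.0, 7.152

p = mu / (mu + phi)           # Eq. (5)
p_winbugs = phi / (phi + mu)  # value passed to dnegbin in the listing above

# Largest absolute difference between the two PMFs over a range of counts.
max_gap = max(abs(pascal_pmf(y, p, phi) - pg_pmf(y, mu, phi)) for y in range(60))
print(max_gap)        # expected to be at floating-point rounding level
print(p + p_winbugs)  # the two probabilities are complements
```

Because the PMFs coincide, any difference in DIC between the two WinBUGS specifications cannot come from the data model itself; it has to come from which level of the hierarchy is treated as the likelihood.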

3. Methodology

This section describes the functional form used for estimating the models and the deviance information criterion.

3.1. Functional form

The functional form used for the models is as follows:

$$ \mu_i = \beta_0 \, F_{\text{Maj}\_i}^{\beta_1} \, F_{\text{Min}\_i}^{\beta_2} \tag{6} $$

where $\mu_i$ is the mean number of crashes per year for intersection $i$; $F_{\text{Maj}\_i}$ is the entering flow for the major approach (average annual daily traffic or AADT) for intersection $i$; and $F_{\text{Min}\_i}$ is the entering flow for the minor approach for intersection $i$.
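To make the functional form concrete, the short Python sketch below evaluates Eq. (6) at the average major and minor flows from Table 1, using the PG posterior means from Table 2. It assumes that the reported intercept of about −10.33 is on the log scale (i.e., it is $\ln \beta_0$), since a negative multiplicative constant would not give a positive mean; that reading is an assumption, and the function name is hypothetical.

```python
import math

def expected_crashes(f_major, f_minor, log_beta0, beta1, beta2):
    """Eq. (6): mu = beta0 * F_major^beta1 * F_minor^beta2, with beta0 = exp(log_beta0)."""
    return math.exp(log_beta0) * f_major ** beta1 * f_minor ** beta2

# Average flows from Table 1 and PG posterior means from Table 2.
# Treating -10.33 as ln(beta0) is an assumption made for this sketch.
mu_avg = expected_crashes(28044.81, 11010.18, -10.33, 0.629, 0.685)
print(mu_avg)
```

Under that assumption, the predicted mean at the average flows comes out at roughly 12 crashes per year, which is of the same order as the observed average of 11.56 crashes per intersection in Table 1.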

3.2. Deviance Information Criterion (DIC)

The DIC is a widely used GOF statistic for comparing models in a Bayesian framework (Spiegelhalter et al., 2002). DIC is a hierarchical modeling generalization of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), defined as:

$$ \text{DIC} = \bar{D}(\theta) + P_D \tag{7} $$

and

$$ P_D = \bar{D}(\theta) - D(\bar{\theta}) \tag{8} $$

where $\theta$ represents the collection of parameters, and $P_D$ is a measure of model complexity that is interpreted as the effective number of parameters. The larger the $P_D$, the easier it is to fit the model to the data. $\bar{D}(\theta) = E[-2 \log L]$ is the expectation of the deviance under the posterior of the un-standardized model, where $L$ is the model likelihood. Larger $\bar{D}(\theta)$ values correspond to a worse fit. $D(\bar{\theta})$ is the deviance evaluated at a posterior summary of $\theta$, which is typically the mean, but the median or mode can also be considered when appropriate. Models with smaller DIC should be preferred to models with larger DIC. Models are penalized by the value of $\bar{D}(\theta)$, which will decrease as the number of parameters in a model increases, and by $P_D$, which compensates for this effect by favoring models with a smaller number of parameters.
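The arithmetic of Eqs. (7) and (8) can be sketched in a few lines of Python. The function below is an illustrative helper, not a WinBUGS routine: it takes the deviance values recorded at each MCMC draw and the deviance evaluated at the posterior mean of the parameters. The three draws in the example are made-up numbers, chosen so that their average equals the mean deviance reported for the NB model in Table 2.

```python
def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (D_bar, P_D, DIC) following Eqs. (7) and (8)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean           # effective number of parameters
    return d_bar, p_d, d_bar + p_d

# Hypothetical deviance draws (not real MCMC output); 5069.3 is the NB value
# of the deviance at the posterior mean reported in Table 2.
d_bar, p_d, dic = dic_summary([5072.0, 5073.2, 5074.4], 5069.3)
print(d_bar, p_d, dic)
```

With these inputs the helper gives a mean deviance near 5073.2, a $P_D$ near 3.9, and a DIC near 5077.1, which matches the NB column of Table 2 and shows that the reported quantities are tied together by simple addition and subtraction.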

One of the drawbacks of the DIC is that it is not invariant to re-parameterization, and therefore the parameterization of the models must be carefully chosen. Formal justification for the DIC requires that the posterior be approximately normal, which may or may not be true in practice. Despite these drawbacks, the DIC is very popular and widely used due to its simplicity and because it is readily available as a built-in tool in software like WinBUGS. Such easy access to this tool has one important ramification: the DIC can be misused and misinterpreted. Below, the researchers explore the reasons, which are well documented but often overlooked (Millar, 2009).

Table 1
Summary statistics for the Toronto data.

| Variable | Min. | Max. | Average (Std. dev.) | Total |
|---|---|---|---|---|
| Crashes | 0 | 54 | 11.56 (10.02) | 10,030 |
| Major AADT | 5469 | 72,178 | 28044.81 (10660.4) | – |
| Minor AADT | 53 | 42,644 | 11010.18 (8599.40) | – |

The definition of the DIC presented above is not unique in the context of multi-level or hierarchical models and in fact depends very much on which part of the model is considered as the likelihood, which Spiegelhalter et al. (2002) refer to as the "focus" issue. Consequently, changing the focus results in a different DIC. Indeed, Celeux et al. (2003) considered eight different types of DICs by changing the focus and by replacing the plug-in estimates for the deviance. This incoherence or lack of specificity was considered a cause of concern by the discussants of the original paper. To see why this may be the case, consider $f(y \mid \mu)$ as the data generating model. Let $f(\mu \mid \psi)$ be the prior on $\mu$ and $f(\psi \mid \zeta)$ be the hyper-prior in this multi-level model. Now, we can consider either $f(y \mid \mu)$ or $f(y \mid \psi) = \int f(y \mid \mu)\, f(\mu \mid \psi)\, d\mu$ as the likelihood. If the deviance calculation is based on $f(y \mid \mu)$, the resulting DIC is referred to as the conditional DIC (denoted as $\text{DIC}_c$) and the focus is on $\mu$. In the latter case, the focus is instead on $\psi$, and we can justifiably call it the marginal DIC (denoted as $\text{DIC}_m$), since $\mu$ is effectively integrated out (marginalized). In the context of crash data modeling with a PG model, one may think of using $\text{DIC}_c$ if site-specific predictions are of interest and may prefer $\text{DIC}_m$ when prediction across all sites is of interest. These conditional and marginal foci obviously induce different complexities, and comparing different models based on the DIC may not be appropriate when the foci are different. To obtain $\text{DIC}_m$, one needs to perform the integration explicitly, either numerically or analytically, whereas computing $\text{DIC}_c$ requires no additional computation or new code. In fact, WinBUGS reports $\text{DIC}_c$ as the default, and the researchers believe that this could be a source of misuse, which is the main reason this short communication was prepared. To be more explicit, consider the following (simplified) example given in Celeux et al. (2003).

Let $y_i$ be the response at site $i$, given as:

$$ y_i = z_i + \varepsilon_i, \quad i = 1, \ldots, I \tag{9} $$

with

$$ z_i \sim N(\theta, \lambda^{-1}) \quad \text{and} \quad \varepsilon_i \sim N(0, \tau_i^{-1}) \tag{10} $$

This is a simple random effects model, where each random effect $z_i$ has, a priori, a normal distribution with mean $\theta$ and variance $\lambda^{-1}$. When flat priors are assumed for the unknown precision parameters, the conditional DIC for this example is given as:

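$$ \text{DIC}_c = p \log 2\pi - \sum \log \tau_i + \lambda \sum \rho_i (1 - \rho_i) \left( y_i - \hat{\theta}(y) \right)^2 + 2 \left[ \sum \rho_i + \frac{\sum \rho_i (1 - \rho_i)}{\sum \rho_i} \right] \tag{11} $$

where

$$ \hat{\theta}(y) = \frac{\sum \rho_i y_i}{\sum \rho_i} \quad \text{and} \quad \rho_i = \frac{\tau_i}{\tau_i + \lambda} \tag{12} $$

The marginal DIC is given as,

$$ \text{DIC}_m = p \log 2\pi - \sum \log(\lambda \rho_i) + \lambda \sum \rho_i \left( y_i - \hat{\theta}(y) \right)^2 + 2 \tag{13} $$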

It can be noticed, in this small example where analytical calculations are possible, that the two DICs are different, even though the Bayes Factor is 1 (since the marginal likelihoods are the same for both foci). Having reminded the reader that multiple definitions of the DIC are possible, we illustrate the difference between them by considering an observed traffic crash dataset in the following sections.
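The difference between the two foci in this example can be reproduced numerically. The Python sketch below transcribes Eqs. (11) to (13) as read here, assuming the residual in Eq. (11) is squared as in Eq. (13) and that $p$ denotes the number of observations; the data values, precisions, and function name are hypothetical.

```python
import math

def random_effects_dics(y, tau, lam):
    """Conditional and marginal DIC for the normal random-effects example, Eqs. (11)-(13).

    y   : observed responses
    tau : precisions of the errors eps_i
    lam : precision of the random effects z_i
    """
    p = len(y)
    rho = [t / (t + lam) for t in tau]                             # Eq. (12)
    theta_hat = sum(r * v for r, v in zip(rho, y)) / sum(rho)      # Eq. (12)
    sq = [(v - theta_hat) ** 2 for v in y]

    # Complexity term of the conditional focus (bracketed term of Eq. (11)).
    p_d_cond = sum(rho) + sum(r * (1.0 - r) for r in rho) / sum(rho)

    dic_c = (p * math.log(2.0 * math.pi) - sum(math.log(t) for t in tau)
             + lam * sum(r * (1.0 - r) * s for r, s in zip(rho, sq))
             + 2.0 * p_d_cond)                                     # Eq. (11)
    dic_m = (p * math.log(2.0 * math.pi) - sum(math.log(lam * r) for r in rho)
             + lam * sum(r * s for r, s in zip(rho, sq))
             + 2.0)                                                # Eq. (13)
    return dic_c, dic_m

# Hypothetical toy data: three sites, unit precisions.
dic_c, dic_m = random_effects_dics([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 1.0)
print(dic_c, dic_m)
```

Even for this three-point toy dataset the two values differ, although both are computed from the same model and the same data; only the choice of what counts as the likelihood has changed.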

4. Data description

The dataset contained crash data collected in 1995 at 4-legged signalized intersections located in Toronto, Ont. The data have previously been used for several research projects and have been found to be of relatively good quality (Miaou and Lord, 2003; Miranda-Moreno and Fu, 2007; Lord et al., 2008). In total, 868 signalized intersections were used in this dataset. Table 1 presents the summary statistics for the dataset.

5. Results and discussion

Table 2 summarizes the results of parameter estimation. This table shows that the PG and NB models estimate similar coefficients as well as a similar dispersion parameter, as expected. However, there is a considerable difference in the estimation of the DIC between the two models.

It is believed that part of the popularity of the DIC is due to its ease of computation from Markov chain Monte Carlo (MCMC) samples, and BUGS (Bayesian Inference Using Gibbs Sampling) includes the deviance as a stochastic node automatically. However, a casual application of the DIC as a model selection criterion can be misleading at times, as noted in Table 2. The reader is reminded that this study is not the first to notice the issue; it merely emphasizes the subtleties involved and makes readers aware of the potential difficulties that certain model schemas may cause.

When multi-level hierarchical models are considered, a safety analyst can choose a different focus and can obtain completely different DICs for the same model, similar to the results shown in Table 2. In the current example, the researchers considered the PG mixture and NB models. Conditional on the regression coefficients, both models are identical in their marginal forms (which is why the NB is called a Poisson-Gamma mixture). Therefore, the Bayes Factor would be identical for both models, and yet the DIC is different. This is because the likelihood and the parameters in the PG model are conditionally based on the Poisson likelihood; marginally, however, it is the NB model. Owing to this specification, two different predictors for future responses are obtained: either conditional (assuming that site-specific Poisson mean parameters are known) or marginal (by taking the marginal expectation). As a result, the predicted values and their variances are different, and it is natural to expect that the deviance will vary.
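The contrast between the conditional and marginal predictors can be illustrated for a single site. The Python sketch below is an illustration under stated assumptions, not the computation behind Table 2: it fixes a regression mean `m` and inverse dispersion `phi`, uses the standard gamma-Poisson conjugate result (posterior of the multiplicative error is gamma with shape `phi + y` and rate `phi + m`) to obtain the site-specific posterior mean, and compares the Poisson deviance at that mean with the NB deviance at `m`. All names and numbers are hypothetical.

```python
import math

def poisson_logpmf(y, mu):
    """Log PMF of the Poisson likelihood used at the data level of the PG model."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def nb_logpmf(y, m, phi):
    """Log PMF of the marginal NB likelihood with mean m and inverse dispersion phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (m + phi)) + y * math.log(m / (m + phi)))

# Hypothetical single site: observed count, regression mean, inverse dispersion.
y, m, phi = 10, 12.0, 7.152

# Posterior mean of the site-specific Poisson mean, by gamma-Poisson conjugacy.
site_mean = m * (phi + y) / (phi + m)

conditional_deviance = -2.0 * poisson_logpmf(y, site_mean)  # focus on site means
marginal_deviance = -2.0 * nb_logpmf(y, m, phi)             # site means integrated out
print(site_mean, conditional_deviance, marginal_deviance)
```

In this sketch the site-specific mean is pulled toward the observed count, so the conditional deviance contribution is smaller than the marginal one; summed over many sites, the same mechanism is consistent with the much lower deviance reported for the PG specification.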

The effective parameters are elusive in hierarchical models, since changing the focus and the interpretation of what is known or unknown will affect the degrees of freedom. This is not exactly a weakness of hierarchical models, but their strength (we would want the data to specify how much strength should be borrowed by pooling versus not pooling at all). This data dependency causes a problem in general with any model selection criterion that requires a notion of degrees of freedom or number of effective parameters to penalize over-fitting, not just the DIC. For example, in mixed models there are a number of approaches to estimate the degrees of freedom, such as the Satterthwaite approximation and the Kenward-Roger approximation, among others (Bolker et al., 2009; Hodges and Sargent, 2001). While the DIC is able to estimate the effective number of parameters, it is subjective with respect to the model used. For example, in the PG model, the effective number of random effects can be as large as the number of sites itself or zero (exactly known). In this particular dataset, $P_D$ is 459.6 for the PG model (much smaller than the maximum possible value of 872) and it is 3.9 (much closer to 4, the true number of parameters) in the NB model. Adding to the complexity, the nominal number of parameters may not be equal to the effective degrees of freedom as suggested by the DIC. These philosophical issues have stimulated a lot of research into proposing new criteria or modifying the DIC for scenarios involving hierarchical models (Celeux et al., 2003). Further, acknowledging the fact that the DIC tends to select over-fitted models, Ando (2007) proposed an alternative criterion known as the Bayesian predictive information criterion (BPIC).

Table 2
Modeling results for the PG and NB models using the Toronto data.

| Parameter | PG(a) Mean | PG(a) Std. dev. | NB(a) Mean | NB(a) Std. dev. |
|---|---|---|---|---|
| $\beta_0$ | −10.33 | 0.424 | −10.33 | 0.447 |
| $\beta_1$ | 0.629 | 0.042 | 0.628 | 0.045 |
| $\beta_2$ | 0.685 | 0.021 | 0.687 | 0.021 |
| $\phi = 1/\alpha$ | 7.152 | 0.627 | 7.154 | 0.631 |
| DIC | 4778.2 | | 5077.1 | |
| $\bar{D}(\theta)$ | 4319.0 | | 5073.2 | |
| $D(\bar{\theta})$ | 3859.4 | | 5069.3 | |
| $P_D$ | 459.6 | | 3.9 | |

(a) PG, Poisson-Gamma (mixture model); NB, Negative Binomial (Pascal distribution).

The DIC may be an appropriate model selection criterion when the focus and the likelihood remain the same across all models under consideration. When this condition is not met, other alternatives, including Bayes Factors, Posterior predictive performance criteria, BIC, and BPIC, among others, may need to be considered.

6. Summary and conclusions

This paper has documented the differences in the estimation of the DIC values between the PG and NB models when analyzing motor vehicle crashes. The comparison analysis was carried out using the most common functional form used by transportation safety analysts, which links crashes to the entering flows at intersections. To accomplish the study objectives, PG and NB models (parameterized as dpois plus dgamma and as dnegbin in WinBUGS, respectively) were developed using the crash data collected at 4-legged signalized intersections in Toronto, Ont.

The results of this study show that although both models produce similar coefficient estimates and a similar dispersion parameter, there is a considerable difference in the estimation of the DIC values. When comparing different models, the DIC can be used as a model selection criterion only if the likelihood remains the same across all the models under consideration. When this condition is not met, using the DIC as the sole criterion will often lead to misleading conclusions. It is therefore important to compare models that have similar parameterization. For example, if the comparison involves a mixture distribution, such as the Poisson-lognormal (Miaou et al., 2003; Lord and Miranda-Moreno, 2008) or the Poisson-Weibull (Maher and Mountain, 2009; Cheng et al., 2012), then the PG model needs to be used. On the other hand, if the comparison involves the NB-Lindley (Geedipally et al., 2012), as an example, then the NB model (based on the Pascal distribution) needs to be employed. Finally, it is recommended to consider other alternatives such as Bayes Factors, Posterior predictive performance criterion, and BIC, among others, in addition to the DIC when comparing different models.

Acknowledgement

The authors would like to thank Ms. Lingzi Cheng from Texas A&M University for her assistance in data modeling.

References

Ando, T., 2007. Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94 (2), 443–458.

Benjamin, J.R., Cornell, C.A., 1970. Probability, Statistics, and Decision for Civil Engineers. McGraw-Hill, Inc., New York, NY.

Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.J., Poulsen, J.R., Henry, M., Stevens, H., White, J.S., 2009. Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology and Evolution 24 (3), 127–135.

Cameron, A.C., Trivedi, P.K., 1998. Regression Analysis of Count Data. Cambridge University Press, Cambridge, UK.

Casella, G., Berger, R.L., 1990. Statistical Inference. Wadsworth Brooks/Cole, Pacific Grove, CA.

Celeux, G., Forbes, F., Roberts, C.P., Titterington, M., 2003. Deviance information criteria for missing data models. Bayesian Analysis 4 (651), 674.

Cheng, L., Geedipally, S.R., Lord, D., 2012. Examining the Poisson-Weibull generalized linear model for analyzing crash data. Safety Science 54, 38–42.

Cook, J.D., 2009. Notes on the negative binomial distribution. <www.johndcook.com/negative_binomial.pdf> (accessed on August 21, 2012).

Geedipally, S.R., Lord, D., Dhavala, S.S., 2012. The negative-binomial-generalized-lindley generalized linear model: characteristics and application using crash data. Accident Analysis and Prevention 45 (2), 258–265.

Greenwood, M., Yule, G.U., 1920. An inquiry into the nature of frequency distributions of multiple happenings, with particular reference to the occurrence of multiple attacks of disease or repeated accidents. Journal of the Royal Statistical Society A 83, 255–279.

Hilbe, J.M., 2011. Negative Binomial Regression, second ed. Cambridge University Press, Cambridge, UK.

Hodges, J.S., Sargent, D.J., 2001. Counting degrees of freedom in hierarchical and other richly-parameterised models. Biometrika 88, 367–379.

Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15 (3), 209–225.

Lord, D., Geedipally, S.R., 2011. The negative binomial – Lindley distribution as a tool for analyzing crash data characterized by a large amount of zeros. Accident Analysis and Prevention 43 (5), 1738–1742.

Lord, D., Guikema, S.D., Geedipally, S., 2008. Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes. Accident Analysis and Prevention 40 (3), 1123–1134.

Lord, D., Mannering, F.L., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transportation Research – Part A 44 (5), 291–305.

Lord, D., Miranda-Moreno, L.F., 2008. Effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter of Poisson-Gamma models for modeling motor vehicle crashes: a Bayesian perspective. Safety Science 46 (5), 751–770.

Lord, D., Washington, S.P., Ivan, J.N., 2005. Poisson, Poisson-Gamma and zero inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention 37 (1), 35–46.

Maher, M., Mountain, L., 2009. The sensitivity of estimates of regression to the mean. Accident Analysis and Prevention 41 (4), 861–868.

Miaou, S.-P., Lord, D., 2003. Modeling traffic-flow relationships at signalized intersections: dispersion parameter, functional form and Bayes vs Empirical Bayes. Transportation Research Record 1840, 31–40.

Miaou, S.P., Song, J.J., Mallick, B.K., 2003. Roadway traffic crash mapping: a space-time modeling approach. Journal of Transportation and Statistics 6 (1), 33–57.

Millar, R.B., 2009. Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes's factors. Biometrics 65, 962–969.

Miranda-Moreno, L.F., Fu, L., 2007. Traffic safety study: empirical Bayes or full Bayes? Paper 07–1680. In: Presented at the 84th Annual Meeting of the Transportation Research Board, Washington, DC.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64, 583–640.

Spiegelhalter, D.J., Thomas, A., Best, N.G., Lun, D., 2003. WinBUGS Version 1.4.1 User Manual. MRC Biostatistics Unit, Cambridge. <http://www.mrcbsu.cam.ac.uk/bugs/welcome.shtml>.

Zamani, H., Ismail, N., 2010. Negative binomial-Lindley distribution and its application. Journal of Mathematics and Statistics 6 (1), 4–9.