on zero-modified poisson-sujatha distribution to model … · 2019-08-02 · 2 on zero-modi ed...

19
AJS Austrian Journal of Statistics June 2018, Volume 47, 1–19. http://www.ajs.or.at/ doi:10.17713/ajs.v47i3.590 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data Wesley Bertoli da Silva Federal Technology University of Paran´ a Ang´ elica Maria T. Ribeiro Federal Technology University of Paran´ a Katiane S. Concei¸c˜ ao University of S˜ ao Paulo Marinho G. Andrade University of S˜ ao Paulo Francisco Louzada University of S˜ ao Paulo Abstract In this paper we propose the zero-modified Poisson-Sujatha distribution as an alter- native to model overdispersed count data exhibiting inflation or deflation of zeros. It will be shown that the zero modification can be incorporated by using the zero-truncated Poisson-Sujatha distribution. A simple reparametrization of the probability function will allow us to represent the zero-modified Poisson-Sujatha distribution as a hurdle model. This trick leads to the fact that proposed model can be fitted without any previously information about the zero modification present in a given dataset. The maximum likeli- hood theory will be used for parameter estimation and asymptotic inference concerns. A simulation study will be conducted in order to evaluate some frequentist properties of the developed methodology. The usefulness of the proposed model will be illustrated using real datasets of the biological sciences field and comparing it with other models available in the literature. Keywords : zero-modified Poisson-Sujatha, overdispersion, inflation/deflation of zeros, hurdle models, maximum likelihood estimation. 1. Introduction Most applications involving the analysis of count data are performed using the Poisson and Negative Binomial distributions. The latter is a well-known 2-parameter Poisson compound model that arises as alternative to fit overdispersed data since Poisson models are not ap- plicable in this case. The literature concerning discrete models that accommodate different levels of dispersion is wide and provides several composed distributions as Poisson-Lindley (Sankaran 1970), Negative Binomial-Lindley (Zamani and Ismail 2010), Poisson-Exponential (Cancho, Louzada-Neto, and Barriga 2011), Poisson-Shanker (Shanker 2016a), among others. A relevant drawback of such compound models is the fact that they do not fit well when a large amount of zeros is observed. To overcome this issue, several zero-inflated and hur- dle approaches for standard Poisson model were proposed (Mullahy 1986; Lambert 1992; Zorn 1996). McDowell (2003) provides a insightful discussion about hurdle models. Such approaches were considered by several authors for applications and we will pointed out a

Upload: others

Post on 16-Feb-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

AJS

Austrian Journal of StatisticsJune 2018, Volume 47, 1–19.

http://www.ajs.or.at/

doi:10.17713/ajs.v47i3.590

On Zero-Modified Poisson-Sujatha Distribution to

Model Overdispersed Count Data

Wesley Bertolida Silva

Federal TechnologyUniversity of Parana

Angelica MariaT. Ribeiro

Federal TechnologyUniversity of Parana

Katiane S.ConceicaoUniversity

of Sao Paulo

Marinho G.AndradeUniversity

of Sao Paulo

FranciscoLouzadaUniversity

of Sao Paulo

Abstract

In this paper we propose the zero-modified Poisson-Sujatha distribution as an alter-native to model overdispersed count data exhibiting inflation or deflation of zeros. Itwill be shown that the zero modification can be incorporated by using the zero-truncatedPoisson-Sujatha distribution. A simple reparametrization of the probability function willallow us to represent the zero-modified Poisson-Sujatha distribution as a hurdle model.This trick leads to the fact that proposed model can be fitted without any previouslyinformation about the zero modification present in a given dataset. The maximum likeli-hood theory will be used for parameter estimation and asymptotic inference concerns. Asimulation study will be conducted in order to evaluate some frequentist properties of thedeveloped methodology. The usefulness of the proposed model will be illustrated usingreal datasets of the biological sciences field and comparing it with other models availablein the literature.

Keywords: zero-modified Poisson-Sujatha, overdispersion, inflation/deflation of zeros, hurdlemodels, maximum likelihood estimation.

1. Introduction

Most applications involving the analysis of count data are performed using the Poisson andNegative Binomial distributions. The latter is a well-known 2-parameter Poisson compoundmodel that arises as alternative to fit overdispersed data since Poisson models are not ap-plicable in this case. The literature concerning discrete models that accommodate differentlevels of dispersion is wide and provides several composed distributions as Poisson-Lindley(Sankaran 1970), Negative Binomial-Lindley (Zamani and Ismail 2010), Poisson-Exponential(Cancho, Louzada-Neto, and Barriga 2011), Poisson-Shanker (Shanker 2016a), among others.

A relevant drawback of such compound models is the fact that they do not fit well whena large amount of zeros is observed. To overcome this issue, several zero-inflated and hur-dle approaches for standard Poisson model were proposed (Mullahy 1986; Lambert 1992;Zorn 1996). McDowell (2003) provides a insightful discussion about hurdle models. Suchapproaches were considered by several authors for applications and we will pointed out a

Page 2: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

2 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

few. Bohara and Krieg (1996) show that the modelling of migratory frequency data can beimproved using zero-inflated Poisson models. Gurmu and Trivedi (1996) seek to deal withthe excess of zeros on data from recreational trips. Ridout, Demetrio, and Hinde (1998) useseveral zero-inflated Poisson regression models for Apple shoot propagation data. In the socialsciences, Bahn and Massenburg (2008) consider the hurdle Poisson model for the number ofhomicides in Chicago (Illinois - USA). Further applications were considered for quantitativestudies on HIV-risk reduction (Heilbron and Gibson 1990; Hu, Pavlicova, and Nunes 2011)and for DNA sequencing data (Beuf, Schrijver, Thas, Criekinge, Irizarry, and Clement 2012).As zero-deflated data seldom arises in practice, there are very few literature addressing thiscase (Angers and Biswas 2003; Conceicao, Andrade, and Louzada 2013) even if this situationis often referred in papers dealing with zero-inflated models.

Recently, the Poisson-Sujatha distribution was obtained by compounding the Poisson witha Sujatha distribution. The latter was introduced by Shanker (2016b) for modelling reallifetime in biological and engineering contexts. The author has shown that this model is athree component mixture of an Exponential distribution with scale parameter θ, a Gammadistribution having shape parameter 2 and scale parameter θ and a Gamma distributionhaving shape parameter 3 and scale parameter θ with mixing proportions given, respectively,by θ2υ−1, θυ−1 and 2υ−1, being υ = θ2 + θ + 2. A comprehensive discussion about thestatistical properties of the Sujatha distribution such as moments, hazard function, stochasticorderings, parameter estimation, among others is also presented on the mentioned paper.

The Poisson-Sujatha distribution was introduced and extensively studied by Shanker (2016c)which have discussed its various mathematical properties. Shanker and Fesshaye (2016a)consider the Poisson-Sujatha distribution to model overdispersed counts provided by ecologicaland genetic experiments. Shanker and Fesshaye (2016b) obtained the size-biased version ofthe Poisson-Sujatha distribution, presenting its properties and discussing its applications.The zero-truncated Poisson-Sujatha distribution, which will be of particular interest in thispaper, was presented by Shanker and Fesshaye (2016c). Further, a detailed report on zero-truncated Poisson, Poisson-Lindley and Poisson-Sujatha distributions is provided by Shankerand Fesshaye (2016d).

Zero-modified models may arise when no information about the kind of zero modification ina given dataset is available. Dietz and Bohning (2000) proposed the zero-modified Poissonregression model for zero inflated/deflated samples and Conceicao et al. (2013) consider aBayesian approach for this model as an alternative to model Brazilian leptospirosis notifica-tion data. Once zero inflated/deflated models may also be useful to deal with data presentingoverdispersion, this paper aims to introduce and present the usefulness of the zero modifiedversion of the Poisson-Sujatha distribution, which is itself overdispersed. The proposed modelis naturally more flexible than the original one since it takes into account inflation or deflationof zeros, being the first an issue often encountered when analysing count data. For our pur-pose, we consider a reparameterization of the zero-modified Poisson-Sujatha probability massfunction, which will allow the likelihood function to be separable on the model parameters.The estimation procedure will be conducted under the frequentist point of view by the usuallikelihood theory. A simulation study will be conducted in order to evaluate some frequentistproperties of the maximum likelihood estimators. The usefulness of the proposed model willbe illustrated by considering applications to real datasets from the biological sciences field.Standard model comparison will be also provided.

This paper is organized as follows. In Section 2, we briefly present the Poisson-Sujathadistribution, some of its mathematical properties and its zero-truncated version. In Section3, we introduce the zero-modified Poisson-Sujatha distribution, demonstrating its flexibilityto deal with zero inflated/deflated data. In Section 4, the zero-modified Poisson-Sujathadistribution is presented as a hurdle model. In Section 5, maximum likelihood estimation forthe unknown parameters as well the asymptotic standard errors and confidence intervals arediscussed. In Section 6, a simulation study is presented. In Section 7, the proposed model isconsidered for application to real datasets. Concluding remarks are addressed in Section 8.

Page 3: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 3

2. Poisson-Sujatha distribution

A random variable ψ is said to have Sujatha (S) distribution if its probability density function(pdf) can be written as

g (ψ; θ) =θ3

θ2 + θ + 2

(ψ2 + ψ + 1

)e−θψ, ψ > 0,

for θ > 0.

The Poisson-Sujatha (PS) distribution is a probabilistic model that arises when the S distribu-tion is chosen to describe the rate parameter (ψ) of the Poisson (P) distribution. In this case,a random variable X is said to have PS distribution if it follows the stochastic representation

X|ψ ∼ P (ψ) and ψ ∼ S (θ) .

The unconditional distribution of the random variable X can be denoted by PS (θ). LetXz = {z, z + 1, . . .} the set of the integers greater or equal to z. We completed the definitionby stating that a random variable X, defined on X0, will have PS distribution if its probabilitymass function (pmf) can be written as

f (x; θ) =θ3

θ2 + θ + 2

[x2 + x (θ + 4) +

(θ2 + 3θ + 4

)(θ + 1)x+3

], x ∈ X0, (1)

for θ > 0. Using the gamma integral, the above result can be easily proved by integratingf (x|ψ) g (ψ; θ) respect to ψ over R+, being f (x|ψ) the conditional pmf of a P variable.

From the results provided by Shanker (2016c) we have that the rth factorial moment aboutthe origin of the PS distribution is given by

µ′r =r![θ2 + (r + 1) θ + (r + 1) (r + 2)

]θr (θ2 + θ + 2)

, (2)

which provides the moments about origin. Thus, the expected value and the variance are

µ = µ′1 =θ2 + 2θ + 6

θ (θ2 + θ + 2), (3)

and

σ2 = µ′2 −(µ′1)2

=θ5 + 4θ4 + 14θ3 + 28θ2 + 24θ + 12

θ2(θ2 + θ + 2)2. (4)

It is easily to see that the variance term can be written as

σ2 = µ

[1 +

θ4 + 4θ3 + 18θ2 + 12θ + 12

θ (θ2 + θ + 2) (θ2 + 2θ + 6)

]= µτ, (5)

being the ratio involving the parameter θ always positive. This implies that the PS distribu-tion is overdispersed, i.e. whichever θ > 0 we have that σ2 > µ. Further, the useful indexof dispersion (τ) is clearly greater than 1, also implying overdispersion since τ = σ2µ−1. Onthe other hand, we have that τ → 1

(σ2 → µ

)as θ → ∞, i.e. the PS distribution has the

property of equidispersion for large values of θ.

Again, using the relationship between the moments about mean and the moments aboutorigin, some useful measures as the coefficient of variation (β), the coefficient of skewness (γ)and the coefficient of kurtosis (ζ) can be derived from equation (2). The expressions of suchmeasures are

β =σ

µ=

√θ5 + 4θ4 + 14θ3 + 28θ2 + 24θ + 12

θ2 + 2θ + 6,

Page 4: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

4 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

γ =µ3

µ3/22

=θ8 + 7θ7 + 32θ6 + 110θ5 + 228θ4 + 300θ3 + 240θ2 + 144θ + 48

(θ5 + 4θ4 + 14θ3 + 28θ2 + 24θ + 12)3/2

,

and

ζ =µ4µ22

=θ11 + 15θ10 + 99θ9 + 488θ8 + 1682θ7 + 4016θ6

(θ5 + 4θ4 + 14θ3 + 28θ2 + 24θ + 12)2+

7008θ5 + 9016θ4 + 8784θ3 + 6240θ2 + 2880θ + 720

(θ5 + 4θ4 + 14θ3 + 28θ2 + 24θ + 12)2.

The definition of the zero-truncated Poisson-Sujatha (ZTPS) distribution will be quite usefulfor the purpose of this paper. A random variable X is said to have ZTPS if its pmf can bewritten as

fZTPS

(x; θ) =θ3

θ4 + 4θ3 + 10θ2 + 7θ + 2

[x2 + x (θ + 4) +

(θ2 + 3θ + 4

)(θ + 1)x

], x ∈ X1, (6)

for θ > 0. See Shanker and Fesshaye (2016c) for further details about the ZTPS distribution.

3. Zero-modified Poisson-Sujatha distribution

Let X be a random variable defined on X0. Thus, X is said to have zero-modified Poisson-Sujatha (ZMPS) distribution if its pmf can be written as

fZMPS

(x; θ, π) = (1− π) δx + π f (x; θ) , x ∈ X0, (7)

for θ > 0 and the parameter π is subject to the condition (called π-condition) given by

0 6 π 61

1− f (0; θ), (8)

being f (x; θ) the pmf of a PS random variable. Further, δx is the indicator function, so thatδx = 1 if x = 0 and δx = 0 otherwise. Note that (7) is not a mixture distribution typicallyfitted to zero-inflated data, since parameter π can assume values greater than 1. However, forall values of π between 0 and its upper boundary, the equation (7) corresponds to a properlypmf since f

ZMPS(x; θ, π) is positive for each x and sums to 1 on X0.

The expected value and the variance of X are

µZMPS

= πµ and σ2ZMPS

= π[σ2 + (1− π)µ2

], (9)

where µ and σ2 are given in equations (3) and (4). Under the ZMPS distribution, the indexof dispersion and the coefficients of variation, skewness and kurtosis are given, respectively,by

τZMPS

=

(θ5 + 5θ4 + 18θ3 + 44θ2 + 48θ + 48

)− π

(θ4 + 4θ3 + 16θ2 + 24θ + 36

)θ (θ2 + θ + 2) (θ2 + 2θ + 6)

,

βZMPS

=

√π (θ5 + 5θ4 + 18θ3 + 44θ2 + 48θ + 48)− π2 (θ4 + 4θ3 + 16θ2 + 24θ + 36)

π (θ2 + 2θ + 6),

Page 5: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 5

γZMPS

=

[θ4 + 8θ3 + 30θ2 + 96θ + 120

θ2 + 2θ + 6−

3π(θ3 + 4θ2 + 12θ + 24

)θ2 + θ + 2

+

2π2(θ2 + 2θ + 6

)2(θ2 + θ + 2)2

]π(θ2 + 2θ + 6

)θ3 (θ2 + θ + 2)h3/2

.

and

ζZMPS

=

[θ5 + 16θ4 + 84θ3 + 336θ2 + 840θ + 720

θ2 + 2θ + 6−

4π(θ4 + 8θ3 + 30θ2 + 96θ + 120

)θ2 + θ + 2

+

6π2(θ3 + 4θ2 + 12θ + 24

) (θ2 + 2θ + 6

)(θ2 + θ + 2)2

−3π3

(θ2 + 2θ + 6

)3(θ2 + θ + 2)3

]π(θ2 + 2θ + 6

)θ4 (θ2 + θ + 2)h2

.

where

h =π(θ2 + 2θ + 6

)θ2 (θ2 + θ + 2)

[θ3 + 4θ2 + 12θ + 24

(θ2 + 2θ + 6)−π(θ2 + 2θ + 6

)(θ2 + θ + 2)

].

The ZMPS distribution may be considered an interesting alternative to the usual zero-modifiedPoisson (ZMP) model since the basis distribution of the former can accommodate severallevels of overdisperson, issue that the P distribution generally fails in deal with. Table 1summarizes the nature and the behaviour of the presented measures, using selected values forthe parameters θ and π.

Table 1: Theoretical descriptive measures for different values of θ and π.

θ πMeasures

µZMPS

σ2ZMPS

τZMPS

βZMPS

γZMPS

ζZMPS

1.00.5 1.1250 3.8594 3.4306 1.7462 2.2984 9.4194

1.2 2.7000 5.0100 1.8556 0.8290 1.4478 6.0346

3.00.5 0.2500 0.4256 1.7024 2.6095 3.4391 18.3409

1.2 0.6000 0.8114 1.3523 1.5013 1.9409 8.1560

5.00.5 0.1281 0.1767 1.3794 3.2809 4.0635 24.0585

1.2 0.3075 0.3689 1.1997 1.9753 2.3341 9.9550

7.00.5 0.0850 0.1066 1.2541 3.8424 4.5498 28.6871

1.2 0.2039 0.2316 1.1359 2.3597 2.6613 11.5852

9.00.5 0.0634 0.0755 1.1909 4.3332 4.9771 32.9873

1.2 0.1522 0.1677 1.1018 2.6908 2.9534 13.2027

It is clear that the coefficient of variation, the coefficient of skewness, and the coefficient ofkurtosis are increasing as θ increases and π decreases. The higher values for the index ofdispersion are obtained for small values of θ and π. On the other hand, combining smallvalues of θ with higher as possible values of π will provide bigger values for the expectedvalue and for the variance.

Theorem 1. The following statements holds.

i) If π = 0 then fZMPS

(0; θ, π) = 1 and therefore, equation (7) relates to a degeneratedistribution with all mass at zero;

ii) If π = 1 then fZMPS

(0; θ, π) = f (0; θ). Hence (7) is the usual PS distribution;

Page 6: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

6 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

iii) If π = [1− f (0; θ)]−1 then fZMPS

(0; θ, π) = 0;

iv) If 0 < π < 1 then fZMPS

(0; θ, π) > f (0; θ) and therefore, the ZMPS distribution has aproportion of zeros greater than the usual PS distribution;

v) If 1 < π < [1− f (0; θ)]−1 then fZMPS

(0; θ, π) < f (0; θ) and therefore, the ZMPS distri-bution has a proportion of zeros smaller than the usual PS distribution.

Proof. Define the proportion of additional or missing zeros by

fZMPS

(0; θ, π)− f (0; θ) = (1− π) + π f (0; θ)− f (0; θ)

= (1− π) [1− f (0; θ)] . (10)

Follows from the previous expression that (i) and (ii) are obvious. As for (iii),

fZMPS

(0; θ, π)− f (0; θ) ={

1− [1− f (0; θ)]−1}

[1− f (0; θ)]

= f (0; θ) ,

hence fZMPS

(0; θ, π) = 0. Statement (iv) follows from the fact that if 0 < π < 1 then 0 <(1− π) [1− f (0; θ)] < 1 since f is a probability measure. Therefore f

ZMPS(0; θ, π) > f (0; θ).

For the latter, whichever π > 1, (1− π) < 0 and the result follows by the same argument for(iv). Hence, f

ZMPS(0; θ, π) < f (0; θ), which completes the proof.

The inflation and deflation of zeros are characterized, respectively, by statements (iv) and (v)of the previous Theorem. We observe from (10) that the main role of the parameter π is tocontrol the frequency of zeros. In such a way, very different values of π lead to completelydifferent ZMPS distributions. For instance, fixing θ = 1.5, if π = 0.05 then P (X = 0) ≈ 0.97and if π = 0.95 then P (X = 0) ≈ 0.43.

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

x

f(x;

θ =

1.0

)

π = 0.20π = 0.70π = 1.20

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

x

f(x;

θ =

5.0

)

π = 0.50π = 2.50π = 4.50

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

x

f(x;

θ =

10.

0)

π = 1.00π = 5.00π = 9.00

Figure 1: Behaviour of the ZMPS distribution for different values of θ and π.

Figure 1 depicts the pmf of the ZMPS distribution for θ = 1.0 (implying 0 6 π 6 1.33), forθ = 5.0 (implying 0 6 π 6 4.90) and for θ = 10.0 (implying 0 6 π 6 9.90).

4. Hurdle version of the ZMPS distribution

The class of hurdle models was introduced by Mullahy (1986). The relevant feature of suchmodels is that the zero outcomes are treated separately from the positive ones. In the formu-lation, a binary probability model determines whether a zero or a non-zero outcome occurs

Page 7: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 7

and hence, an appropriated truncated discrete distribution is chosen to describe the positiveoutcomes (Saffar, Adnan, and Greene 2012).

Let us define the hurdle version of the ZMPS distribution. Firstly, the equation (7) can beexpressed as

fZMPS

(x; θ, π) = {1− π [1− f (0; θ)]} δx + π (1− δx) f (x; θ) , x ∈ X0, (11)

Now, setting ω = π [1− f (0; θ)], expression (11) becomes

fZMPS (x; θ, ω) = (1− ω)δx + ω fZTPS

(x; θ) , x ∈ X0, (12)

being fZTPS

(x; θ) the ZTPS distribution given by (6). Since 0 6 p 6 [1 − f(0;µ)]−1 then0 6 p [1− f(0;µ)] 6 1. Hence 0 6 ω 6 1.

The pmf (12) can be seen as a hurdle version of the ZMPS distribution, where the probabilityof X = 0 is (1− ω) and the probability of X > 0, is ω f

ZTPS(x; θ). Moreover, from equation

(9) we have that the expected value and the variance of the ZMPS distribution, expressed inits hurdle version, depends on the probability of X = 0 under the PS distribution.

The ZMPS distribution expressed in a hurdle version contains the ZTPS distribution as oneof its components, which differs from the traditional mixture representation of zero-inflateddistributions. Indeed, this representation of the ZMPS distribution can be interpreted asa superposition of two processes, i.e. one that produces positive observations from a ZTPSdistribution and another that produces only zero valued observations with probability (1−ω).

The hurdle version of the ZMPS distribution can be used to derive the maximum likelihoodestimators (MLEs) for the parameters θ and ω. Furthermore, such approach allow us to useonly the positive observations in a given dataset to estimate the parameter θ assuming thatthese observations comes from a ZTPS distribution, while the parameter ω can be estimatedas the proportion of zeros in the sample. Hence, parameter π can be estimated subsequentlyusing the equation

π =ω

1− f (0; θ). (13)

It is noteworthy that make inferences about the parameter π is essential to identify the kindof zero modification (inflation or deflation) is present in the analysed dataset.

5. Maximum likelihood estimation

Let X = (X1, . . . ,Xn) a random sample of size n from the ZMPS distribution andx = (x1, . . . , xn) its observed values. Thus, the likelihood function for the parameters θand ω is given by

Ln (θ, ω;x) =n∏j=1

(1− ω)δxj[ω f

ZTPS(xj ; θ)

]1−δxj. (14)

One can note that the likelihood values of hurdle models are computed separately for eachpmf. The MLEs of θ and ω can be obtained by direct maximization of the log-likelihoodfunction

`n (θ, ω;x) =

n∑j=1

(1− δxj

){log[fZTPS

(xj ; θ)]

+ log (ω)}

+n∑j=1

δxj log (1− ω)

=

n∑j=1

(1− δxj

Page 8: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

8 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

log

[θ3

θ4 + 4θ3 + 10θ2 + 7θ + 2

(x2j + xj (θ + 4) +

(θ2 + 3θ + 4

)(θ + 1)xj

)]+

n∑j=1

[log (ω)− δxj log

1− ω

)]

=m∑j=1

log

[θ3

θ4 + 4θ3 + 10θ2 + 7θ + 2

(x2j + xj (θ + 4) +

(θ2 + 3θ + 4

)(θ + 1)xj

)]+

n∑j=1

log (ω)−q∑j=1

log

1− ω

)= 3m log (θ)−m log

(θ4 + 4θ3 + 10θ2 + 7θ + 2

)−mxm log (θ + 1) +

n log (ω)− q log

1− ω

)+

m∑j=1

log[x2j + xj (θ + 4) +

(θ2 + 3θ + 4

)], (15)

where m denotes the number of positive outcomes and q the number of zero ones. Indeedm+ q = n. Moreover, xm is the sample mean obtained from the set of positive values.

From (15) it is straightforward to see that the parameters θ and ω are orthogonal and thatall terms in the log-likelihood function depending on θ take into account only the positivevalues of the sample vector x. Denoting by x(m) the vector of positive values from x, the

log-likelihood function for θ based on the assumption that x(m)j , j = 1, . . . ,m, are generated

from a ZTPS distribution is given by

`n

(θ, ω;x(m)

)= 3m log (θ)−m log

(θ4 + 4θ3 + 10θ2 + 7θ + 2

)−mxm log (θ + 1) +

m∑j=1

log[x2j + xj (θ + 4) +

(θ2 + 3θ + 4

)]. (16)

Indeed `n(θ, ω;x(m)

)= `m (θ;x), since each xj present in the log-likelihood of θ are generated

by a zero-truncated distribution. Therefore, evaluate θ under ZMPS distribution is equivalentto assuming that the positive values of x comes entirely from a ZTPS distribution. On theother hand, denoting by x(q) the vector of zero outcomes from x, the log-likelihood functionfor ω is given by

`n

(θ, ω;x(q)

)= `n (ω;x) = n log (ω)− q log

1− ω

). (17)

Now, the corresponding score vector is given by

U ≡ U (θ, ω;x) = [uθ, uω]ᵀ , (18)

where

uθ =∂ `m (θ;x)

∂θ=

3m

θ−m

[4θ3 + 12θ2 + 20θ + 7

θ4 + 4θ3 + 10θ2 + 7θ + 2

]− mxmθ + 1

+

=m∑j=1

xj + 2θ + 3

x2j + xj (θ + 4) + (θ2 + 3θ + 4), (19)

and

uω =∂ `n (ω;x)

∂ω=

(n− q)ω

− q

(1− ω). (20)

The observed information, i.e. the Hessian matrix is given by

K ≡ K (θ, ω;x) = −[kθθ kθωkωθ kωω

],

Page 9: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 9

where

kθθ =∂2`m (θ;x)

∂θ2= −3m

θ2+m

[4θ6 + 24θ5 + 68θ4 + 132θ3 + 176θ2 + 92θ + 9

(θ4 + 4θ3 + 10θ2 + 7θ + 2)2

]+

mxm

(θ + 1)2+

m∑j=1

x2j − 2xjθ + 2xj − 2θ2 − 6θ − 1[x2j + xj (θ + 4) + (θ2 + 3θ + 4)

]2 , (21)

and

kωω =∂2`n (ω;x)

∂ω2= −(n− q)

ω2− q

(1− ω)2. (22)

By orthogonality of θ and ω, the crossed partial derivatives are null, then kθω = kωθ = 0.Hence the set up of the information matrix is complete. As usual, the curvature of thelog-likelihood function can be evaluated locally at θ and ω.

Proposition 1. Let ω the MLE of parameter ω. The following statements holds.

i) ω = mn−1;

ii) ω is an unbiased estimator for ω;

iii) The lower bound for the variance of ω is that for binary probability models.

Proof. Item (i) is straightforward. Take n − q = m and isolate ω in the equation uω = 0.Now,

EX

(ω) = EX

(mn

)=

1

nE

X

n∑j=1

(1− δ

Xj

)=

1

n

n−n∑j=1

EX

(δXj

)=

1

n

n−n∑j=1

fZMPS

(0; θ, ω)

=

1

n{n− n (1− ω)} = ω,

and (ii) holds. For (iii), firstly note that the set {x : fZMPS

(0; θ, ω) > 0} does not depend on θnor ω. Moreover, it is clear from (20) that for all x > 0, uω exists and is finite whenever ω 6= 0.For the moment, such conditions are sufficient and allow us to make use of the Cramer-Raobound for the variance of an unbiased MLE, which is the case of ω. Then,

Var (ω) > J−1 (ω) ,

where J = −EX

(K) is the expected information, i.e. the Fisher information matrix. From(22) we have that

−EX

(kωω) = EX

{m

ω2+

(n−m)

(1− ω)2

}=

1

ω2E

X(m) +

n

(1− ω)2− 1

(1− ω)2E

X(m)

=n

ω+

n

(1− ω)2− nω

(1− ω)2

= n

{1

ω+

1

(1− ω)

}=

n

ω (1− ω),

Page 10: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

10 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

and hence,

Var (ω) >ω (1− ω)

n, (23)

which completes the proof. It is straightforward to show that the variance of ω is exactlyn−1ω (1− ω) and therefore, coincides with the Cramer-Rao lower bound.

There is no closed form for the MLE of θ, see (19). However, using (17) the parameter θcan be estimated using standard numeric optimization algorithms such the Newton-Raphson,the Bisection and the Regula-Falsi methods. By the usual maximum likelihood theory, anasymptotic approximation for the variance of θ can be obtained from k−1θθ , which evaluated

at θ provides a consistent estimator for such a measure. On the other hand, the variance ofω can be estimated by its lower bound provided by the previous Proposition which, in fact,corresponds to the exact variance.

As aforementioned, the parameter π is a non-linear function depending on θ and ω. By theinvariance principle, the MLE of π can be obtained as

π = s (θ, ω) =ω

1− f(

0; θ)

=m(θ2 + θ + 2

)(θ + 1

)3n(θ4 + 4θ3 + 10θ2 + 7θ + 2

) .Now, the variance of π can be estimated using the delta-method. Since θ and ω are orthogonal,

Cov(θ, ω

)= 0. Hence,

Var (π) ≈ Var(θ)[ ∂

∂ θs (θ, ω)

]2+ Var (ω)

[∂

∂ ωs (θ, ω)

]2

≈ m2

n2Var

(θ) θ4 (θ + 1

)4 (θ4 + 6θ3 + 25θ2 + 32θ + 24

)2(θ4 + 4θ3 + 10θ2 + 7θ + 2

)4 +

mq

n3

(θ2 + θ + 2

)2 (θ + 1

)6(θ4 + 4θ3 + 10θ2 + 7θ + 2

)2 ,being the variance of θ estimated numerically. The terms inside the brackets are evaluatedat the MLEs of θ and ω. Now, to obtain intervallic estimates, we can use large sampleapproximations for the 100 (1− α) % two sided confidence intervals (CIs) for the parametersθ, ω and π that are given, respectively, by

θ ± zα/2 SE(θ), ω ± zα/2 SE (ω) and π ± zα/2 SE (π) ,

being zα the upper αth percentile of the standard Normal distribution. The standard errors(SEs) are estimated as the squared root of the variance of the MLE of each model parameters.

In the following two sections, we presented the results obtained in the simulation study andthe application of the proposed model to real datasets. To attain the numerical results, allcomputations were performed under the R environment (R Development Core Team 2007).

6. Simulation study

In this section, we seek to evaluate the frequentist properties of the proposed methodologyby performing a simulation study. The simulation process consists in generating N = 10, 000

Page 11: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 11

pseudo-random samples of sizes n = 50, 100, 150 and 200 of a variable X having ZMPS distri-bution in its hurdle version. Our procedure is based on the Monte Carlo simulation methodto estimate the average bias and the mean squared error of the MLEs of the parameters θand ω as well the coverage probability of the asymptotic CIs derived from such estimates.The results obtained for the parameter π are also presented. Assuming φ = θ, ω or π, themeasures computed using the generated samples are given by

B(φ)

=1

N

N∑j=1

(φj − φ

), MSE

(φ)

=1

N

N∑j=1

(φj − φ

)2and CP (φ) =

1

N

N∑j=1

δAj,

where Aj ={φj − zα/2 se

(φ)< φ < φj + zα/2 se

(φ)}

and therefore, δAj

assumes 1 whenever

the CI obtained from the jth simulation contains the true value φ.

The following algorithm can be used to generate a single random variable from a ZMPSdistribution. The process to generate a random sample consists to run the algorithm as oftenas necessary, say n times. The sequential-search is a black-box type of algorithm and workswith any computable probability vector. The main advantage of such procedure is its easeof implementation. More informations on this algorithm can be found at Hormann, Leydold,and Derflinger (2013).

Algorithm 1 Sequential-Search

1: procedure SeqSea(θ, ω)2: Generate u ∼ U (0, 1)3: Set x← 04: Set p← (1− ω)5: while u > p do6: Set x← x+ 17: Set p← p+ ω f

ZTPS(x; θ)

8: end while9: return x

10: end procedure

Under ZMPS distribution, the expected number of iterations (NI), i.e. the expected numberof comparisons in the while condition is given by

µNI

= µZMPS

+ 1 = πµ+ 1

=ω (θ + 1)3

(θ2 + 2θ + 6

)θ (θ4 + 4θ3 + 10θ2 + 7θ + 2)

+ 1.

To run the simulation, we have established four scenarios in which a single value of θ waschosen for each one. The selected values were θ = 0.5, 1.0, 2.0 and 3.0. Moreover, we considerω = 0.1, 0.5 and 0.9 varying in each scenario. On the other hand, since parameter π dependson the values of θ and ω, its values vary within each scenario and are closer of ω for smallθ and when ω approaches to zero. Further, in order to evaluate if the coverage probabilityof the asymptotic CIs are around the nominal level of 95%, we fix α = 0.05 to compute suchmeasures in the simulation process.

In Table 2, the bias and the coverage probability of the MLEs are presented for each parameterinvolving the ZMPS model. Figures 2 and 3 depicts the mean squared error of such estimates.The results shows that both bias and mean squared error tends to zero when the sample sizeincreases. It is noteworthy that the MLE of ω is negative biased in some cases. The meansquared error of ω remains quite small even for small n, which also occurs for π the smallerthe value of θ. The coverage probabilities were found between 93% and 97% in most cases,indicating that the coverage of the 95% asymptotic CIs is relatively accurate. On the otherhand, one can note that for small n, the coverage probability of the CIs obtained for ω decre-

Page 12: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

12 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

Table 2: Estimated bias of the MLEs and coverage probability of the CIs.

Parameter Valuen = 50 n = 100 n = 150 n = 200

Bias CP Bias CP Bias CP Bias CP

Scenario 1

θ 0.50 0.0897 95.47 0.0371 95.29 0.0237 95.47 0.0173 95.39

ω 0.10 0.0014 89.02 0.0003 93.20 -0.0001 92.74 0.0001 92.56

π 0.11 0.0057 93.91 0.0021 95.47 0.0011 95.45 0.0008 95.48

θ 0.50 0.0104 94.97 0.0054 94.94 0.0040 94.97 0.0035 94.87

ω 0.50 -0.0008 93.29 0.0001 94.20 -0.0006 94.35 -0.0004 94.43

π 0.54 0.0020 93.64 0.0015 97.08 0.0003 97.25 0.0004 97.45

θ 0.50 0.0060 94.85 0.0032 94.87 0.0028 94.90 0.0023 94.72

ω 0.90 -0.0001 86.92 -0.0002 93.09 -0.0001 92.34 -0.0002 92.67

π 0.98 0.0027 99.75 0.0013 99.66 0.0010 99.72 0.0007 99.74

Scenario 2

θ 1.00 0.2274 94.78 0.1156 95.34 0.0701 95.62 0.0490 95.68

ω 0.10 0.0025 89.96 0.0004 93.25 0.0001 92.80 -0.0001 92.48

π 0.13 0.0207 96.62 0.0088 97.20 0.0051 97.71 0.0035 97.85

θ 1.00 0.0299 95.21 0.0146 94.92 0.0101 95.24 0.0080 95.01

ω 0.50 -0.0006 93.34 0.0001 94.23 -0.0004 94.31 -0.0003 94.35

π 0.67 0.0112 99.09 0.0059 99.53 0.0034 99.56 0.0027 99.65

θ 1.00 0.0160 95.05 0.0075 95.01 0.0060 94.92 0.0047 94.82

ω 0.90 -0.0002 86.97 -0.0001 93.10 -0.0001 92.36 -0.0002 92.68

π 1.20 0.0113 99.97 0.0053 99.97 0.0041 99.97 0.0030 99.99

Scenario 3

θ 2.00 0.4278 93.33 0.4225 94.36 0.2783 95.06 0.1806 95.21

ω 0.10 0.0053 91.71 0.0006 93.50 0.0001 92.87 0.0001 92.55

π 0.21 0.0597 96.46 0.0394 98.12 0.0252 98.98 0.0162 99.25

θ 2.00 0.1179 95.01 0.0551 95.12 0.0372 95.28 0.0289 95.26

ω 0.50 -0.0007 93.29 0.0001 94.22 -0.0004 94.33 -0.0003 94.35

π 1.04 0.0550 99.67 0.0265 99.91 0.0169 99.94 0.0132 99.95

θ 2.00 0.0621 95.68 0.0287 95.20 0.0215 94.24 0.0169 95.06

ω 0.90 -0.0002 86.98 -0.0001 93.10 -0.0001 92.36 -0.0002 92.68

π 1.87 0.0536 99.99 0.0247 99.99 0.0183 99.99 0.0140 99.99

Scenario 4

θ 3.00 0.3821 91.24 0.7463 93.52 0.6301 94.31 0.4740 94.66

ω 0.10 0.0078 92.69 0.0013 93.99 0.0003 92.97 0.0001 92.70

π 0.30 0.0832 96.70 0.0796 98.51 0.0620 99.22 0.0453 99.46

θ 3.00 0.3197 94.38 0.1425 95.06 0.0914 95.67 0.0690 95.21

ω 0.50 -0.0007 93.25 0.0001 94.22 -0.0004 94.32 -0.0003 94.38

π 1.48 0.1552 99.86 0.0707 99.97 0.0443 99.99 0.0335 99.99

θ 3.00 0.1436 95.05 0.0629 94.99 0.0451 95.14 0.0333 94.94

ω 0.90 -0.0002 86.93 -0.0002 93.07 -0.0001 92.38 -0.0002 92.69

π 2.67 0.1289 99.99 0.0567 99.99 0.0405 99.99 0.0297 99.99

Page 13: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 13

50 100 150 200

0.00

0.02

0.04

0.06

0.08

0.10

n

MS

E (

θ)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.00

0.02

0.04

0.06

0.08

0.10

nM

SE

)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.00

0.02

0.04

0.06

0.08

0.10

n

MS

E (

π)

ω = 0.10ω = 0.50ω = 0.90

(a) Scenario 1

50 100 150 200

0.0

0.1

0.2

0.3

0.4

0.5

0.6

n

MS

E (

θ)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.0

0.1

0.2

0.3

0.4

0.5

0.6

n

MS

E (

ω)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.0

0.1

0.2

0.3

0.4

0.5

0.6

n

MS

E (

π)

ω = 0.10ω = 0.50ω = 0.90

(b) Scenario 2

Figure 2: Estimated mean squared error of the MLEs (scenarios 1 and 2).

ases at ≈ 89% when its true value is close to boundaries of the parametric space. The lastrelevant result is that the coverageprobability of the CIs obtained for π increases as the valuesof θ and ω increases, attaining values greater than 99% in the last cases of the third andfourth scenarios.

7. Application to real data

In this section, the ZMPS distribution is considered as an attempt to adequately model fourdatasets from biological science field. The goodness of fit of the proposed model is comparedwith those accessed by the P, the PS and the ZMP distributions. The first dataset is due toBeall (1940). The response is the number of Pyrausta nubilalis observed in small unit areasof a field in 1937. The second one refers to the number of chromatid aberrations observed onchemically induced chromosome aberrations in cultures of human leukocytes (Loeschcke andKohler 1976). The third and fourth datasets relates to the number of mammalian cytogeneticdosimetry lesions induced in rabbits by lymphoblast streptonigrin at the exposure of 60 mg/kgand 90 mg/kg, respectively. A broader description of the last three datasets is provided byShanker and Fesshaye (2016a).

Table 3 presents some descriptive statistics. The last column shows the observed proportionof zeros (PZ) in each dataset. One can note that more than 50% of the observations arezero-valued in all samples. Also, the initial analysis highlights the presence of overdispersion(see the index of dispersion), justifying the choice of the ZMPS model to describe such data.

Page 14: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

14 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

50 100 150 200

0.0

0.5

1.0

1.5

2.0

n

MS

E (

θ)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.0

0.5

1.0

1.5

2.0

nM

SE

)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

0.0

0.5

1.0

1.5

2.0

n

MS

E (

π)

ω = 0.10ω = 0.50ω = 0.90

(a) Scenario 3

50 100 150 200

01

23

45

n

MS

E (

θ)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

01

23

45

n

MS

E (

ω)

ω = 0.10ω = 0.50ω = 0.90

50 100 150 200

01

23

45

n

MS

E (

π)

ω = 0.10ω = 0.50ω = 0.90

(b) Scenario 4

Figure 3: Estimated mean squared error of the MLEs (scenarios 3 and 4).

In Table 4 we present the frequency distribution of each sample. The expected frequencieswas obtained using the estimated probabilities considering the MLEs of the parameters θ andπ. Frequencies in bold relate to those one closer to the real values. The results show that theexpected frequencies provided by the ZMPS model are the closest in most cases.

Table 3: Variables and descriptive statistics for each dataset.

Label Variable n Mean Variance ID (%) PZ

Dataset 1Number of

56 0.7500 1.3182 175.76 0.5893Pyrausta nublilalis

Dataset 2Number of

400 0.5475 1.1256 205.58 0.6700Chromatid aberrations

Dataset 3Number of mammalian

601 0.4742 0.7398 156.00 0.6872cytogenetic lesions (E-60)

Dataset 4Number of mammalian

300 0.8533 1.3697 160.51 0.5167cytogenetic lesions (E-90)

The MLEs, the SEs and the 95% asymptotic CIs for the parameter θ of each fitted modelare presented in Table 5. The model selection was performed using the Akaike information

Page 15: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 15

Table 4: Observed and expected frequencies from each fitted model.

CountsObserved Expected Frequency

Frequency P PS ZMP ZMPS

Dataset 1

0 33 26.45 31.47 33.00 33.00

1 12 19.84 14.17 10.83 12.30

2 6 7.44 6.13 7.34 5.91

3 3 1.86 2.55 3.32 2.71

4 1 0.35 1.03 1.12 1.20

5 1 0.05 0.40 0.30 0.52

Dataset 2

0 268 231.36 257.61 268.00 268.00

1 87 126.67 92.98 71.81 79.05

2 26 34.68 32.71 40.04 32.38

3 9 6.33 11.19 14.88 12.80

4 4 0.87 3.73 4.15 4.90

5 2 0.09 1.22 0.93 1.83

6 1 0.01 0.39 0.17 0.67

7 3 0.00 0.12 0.03 0.24

Dataset 3

0 413 374.05 406.13 413.00 413.00

1 124 177.38 132.99 115.99 123.26

2 42 42.06 42.67 52.14 43.02

3 15 6.65 13.37 15.62 14.61

4 5 0.79 4.10 3.51 4.83

5 0 0.07 1.23 0.63 1.56

6 2 0.01 0.36 0.09 0.50

Dataset 4

0 155 127.80 157.53 155.00 155.00

1 83 109.05 77.56 71.93 80.64

2 33 46.53 36.42 45.66 36.81

3 14 13.24 16.37 19.32 16.11

4 11 2.82 7.10 6.13 6.81

5 3 0.48 2.99 1.56 2.79

6 1 0.07 1.23 0.33 1.12

criterion with correction for finite samples (AICc) and the Bayesian information criterion(BIC). The goodness of fit was evaluated by the χ2 statistic. It is noteworthy that thesmaller AICc’s are provided by the ZMPS model. On the other hand, in some cases the PSdistribution presents similar fit when compared with its zero-modified version, as can be seenby that obtained from Dataset 4. In fact, there exist evidences that the proposed modeladheres better to the considered datasets and hence, can be considered as a suitable optionto model zero inflated/deflated count data in the presence of overdispersion.

The summary for the parameters ω and π can be found at Table 6. Under the ZMPS model,the inference about the parameter π allow us to identify that may exists an evidence thatthe first three datasets are zero-inflated while the last one may be classified as zero-deflated.By the 95% CIs obtained for the parameter π, we can also conclude that the PS distribution

Page 16: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

16 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

Table 5: Summary for the parameter θ and comparison criteria.

Dataset Model θ SE(θ) 95% CI

AICc BIC χ2

Lower Upper

1

P 0.7500 0.1157 0.5232 0.9769 145.24 147.19 24.08

PS 2.2415 0.3167 1.6208 2.8622 136.03 137.98 1.38

ZMP 1.3551 0.4196 0.5327 2.1776 62.31 66.14 2.00

ZMPS 1.9779 0.4516 1.0929 2.8630 61.88 65.71 0.53

2

P 0.5475 0.0370 0.4750 0.6200 881.04 885.02 13473.41

PS 2.8291 0.1707 2.4945 3.1637 809.33 813.32 71.88

ZMP 1.1151 0.2388 0.6471 1.5831 322.99 330.94 338.30

ZMPS 2.3863 0.2476 1.9009 2.8717 300.23 308.18 35.41

3

P 0.4742 0.0281 0.4191 0.5292 1167.36 1171.75 726.48

PS 3.1258 0.1640 2.8043 3.4472 1115.68 1120.07 9.76

ZMP 0.8990 0.2731 0.3637 1.4344 376.34 385.12 42.19

ZMPS 2.8538 0.2722 2.3204 3.3874 369.61 378.38 6.16

4

P 0.8533 0.0533 0.7488 0.9579 802.94 806.63 65.49

PS 2.0342 0.1183 1.8023 2.2658 767.89 771.58 3.27

ZMP 1.2694 0.1853 0.9063 1.6325 362.08 369.45 13.25

ZMPS 2.1041 0.1966 1.7187 2.4894 354.15 361.51 3.35

Table 6: Summary for the parameters ω and π under ZMP and ZMPS models.

Dataset Model ω SE (ω)95% CI

π SE (π)95% CI

Lower Upper Lower Upper

1ZMP

0.411 0.066 0.282 0.5400.554 0.161 0.237 0.870

ZMPS 0.846 0.206 0.442 1.249

2ZMP

0.330 0.024 0.284 0.3760.491 0.061 0.371 0.611

ZMPS 0.795 0.092 0.616 0.975

3ZMP

0.313 0.019 0.276 0.3500.528 0.055 0.421 0.634

ZMPS 0.886 0.095 0.700 1.072

4ZMP

0.483 0.029 0.427 0.5400.672 0.076 0.523 0.821

ZMPS 1.046 0.102 0.846 1.247

may be a reasonable choice to model the last two datasets, since the provided CIs containsthe value 1. However, in the cases where exist evidence of zero modification, zero-modifiedmodels remain preferred.

8. Concluding remarks

In this paper, the ZMPS distribution was introduced as an alternative to model overdispersedcount data having inflation or deflation of zeros. We discuss some of its mathematical proper-ties as the expected value, the variance and the coefficients of variation, skewness and kurtosis.Moreover, using the hurdle version of the proposed model we derive the log-likelihood func-tion, the score function, the information matrix and present some properties concerning theMLEs of the model parameters. Also, we performed a simulation study where the bias andthe mean squared error of the MLEs as well the coverage probability of the asymptotic CIs

Page 17: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 17

were computed and indicated the suitability of the considered methodology. The usefulness ofthe proposed distribution was evaluated by fitting it to four datasets obtained from biologicalscience field with characteristics of overdispersion and zero modification. The model selectionwas performed using the AICc and the BIC criteria. The goodness of fit was accessed by theχ2 statistic. The provided results demonstrate the superiority of the proposed model over theP, the PS and the ZMP distributions, confirming its applicability to model overdispersed andzero-modified count data.

Acknowledgements

We would like to thank the associate editor and the anonymous reviewers for their insightfulsuggestions to improve the manuscript. In addition, the first two authors would like to thankthe Federal Technology University of Parana for the financial support during this research.

References

Angers JF, Biswas A (2003). “A Bayesian Analysis of Zero–Inflated Generalized PoissonModel.” Computational Statistics & Data Analysis, 42(1), 37–46.

Bahn GD, Massenburg R (2008). “Deal with Excess Zeros in the Discrete Dependent Variable,the Number of Homicide in Chicago Census Tract.” In Joint Statistical Meetings of theAmerican Statistical Association, pp. 3905–12.

Beall G (1940). “The Fit and Significance of Contagious Distributions When Applied toObservations on Larval Insects.” Ecology, 21(4), 460–474.

Beuf KD, Schrijver JD, Thas O, Criekinge WV, Irizarry RA, Clement L (2012). “ImprovedBase–Calling and Quality Scores for 454 Sequencing Based on a Hurdle Poisson Model.”BMC bioinformatics, 13(1), 303.

Bohara AK, Krieg RG (1996). “A Zero–Inflated Poisson Model of Migration Frequency.”International Regional Science Review, 19(3), 211–222.

Cancho VG, Louzada-Neto F, Barriga GDC (2011). “The Poisson–Exponential Lifetime Dis-tribution.” Computational Statistics & Data Analysis, 55(1), 677–686.

Conceicao KS, Andrade MG, Louzada F (2013). “Zero–Modified Poisson Model: Bayesian Ap-proach, Influence Diagnostics, and an Application to a Brazilian Leptospirosis NotificationData.” Biometrical Journal, 55(5), 661–678.

Dietz E, Bohning D (2000). “On Estimation of the Poisson Parameter in Zero–ModifiedPoisson Models.” Computational Statistics & Data Analysis, 34(4), 441–459.

Gurmu S, Trivedi PK (1996). “Excess Zeros in Count Models for Recreational Trips.” Journalof Business & Economic Statistics, 14(4), 469–477.

Heilbron DC, Gibson DR (1990). “Shared Needle Use and Health Beliefs Concerning AIDS:Regression Modeling of Zero–Heavy Count Data. Poster Session.” In Proceedings of theSixth International Conference on AIDS, San Francisco, CA.

Hormann W, Leydold J, Derflinger G (2013). Automatic Nonuniform Random Variate Gen-eration. Springer Science & Business Media.

Hu MC, Pavlicova M, Nunes EV (2011). “Zero–Inflated and Hurdle Models of Count Data withExtra Zeros: Examples from an HIV-Risk Reduction Intervention Trial.” The Americanjournal of drug and alcohol abuse, 37(5), 367–375.

Page 18: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

18 On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data

Lambert D (1992). “Zero–Inflated Poisson Regression, with an Application to Defects inManufacturing.” Technometrics, 34(1), 1–14.

Loeschcke V, Kohler W (1976). “Deterministic and Stochastic Models of the Negative Bino-mial Distribution and the Analysis of Chromosomal Aberrations in Human Leukocytes.”Biometrische Zeitschrift, 18(6), 427–451.

McDowell A (2003). “From the Help Desk: Hurdle Models.”The Stata Journal, 3(2), 178–184.

Mullahy J (1986). “Specification and Testing of Some Modified Count Data Models.” Journalof Econometrics, 33(3), 341–365.

R Development Core Team (2007). R: A Language and Environment for Statistical Comput-ing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URLhttp://www.R-project.org.

Ridout M, Demetrio CGB, Hinde J (1998). “Models for Count Data with Many Zeros.” InProceedings of the XIXth international biometric conference, volume 19, pp. 179–192.

Saffar SE, Adnan R, Greene W (2012). “Parameter Estimation on Hurdle Poisson RegressionModel with Censored Data.” Jurnal Teknologi, 57(1), 189–198.

Sankaran M (1970). “The Discrete Poisson–Lindley Distribution.”Biometrics, 26(1), 145–149.

Shanker R (2016a). “The Discrete Poisson–Shanker Distribution.” Jacobs Journal of Bio-statistics, 1(1), 005.

Shanker R (2016b). “Sujatha Distribution and Its Applications.” Statistics in Transition–Newseries, 17(3), 1–20.

Shanker R (2016c). “The Discrete Poisson–Sujatha Distribution.” International Journal ofProbability and Statistics, 5(1), 1–9.

Shanker R, Fesshaye H (2016a). “On Poisson–Sujatha Distribution and Its Applications toModel Count Data from Biological Sciences.” Biometrics & Biostatistics InternationalJournal, 3(4), 1–7.

Shanker R, Fesshaye H (2016b). “Size–Biased Poisson–Sujatha Distribution with Applica-tions.” American Journal of Mathematics and Statistics, 6(4), 145–154.

Shanker R, Fesshaye H (2016c). “Zero–Truncated Poisson–Sujatha Distribution with Appli-cations.” Journal of Ethiopian Statistical Association, 24, 55–63.

Shanker R, Fesshaye H (2016d). “On Zero–Truncation of Poisson, Poisson–Lindley andPoisson–Sujatha Distributions and Their Applications.” Biometrics & Biostatistics In-ternational Journal, 3(5), 1–13.

Zamani H, Ismail N (2010). “Negative Binomial–Lindley Distribution and Its Application.”Journal of Mathematics and Statistics, 6(1), 4–9.

Zorn CJW (1996). “Evaluating Zero–Inflated and Hurdle Poisson Specifications.” MidwestPolitical Science Association, 18(20), 1–16.

Page 19: On Zero-Modified Poisson-Sujatha Distribution to Model … · 2019-08-02 · 2 On Zero-Modi ed Poisson-Sujatha Distribution to Model Overdispersed Count Data few.Bohara and Krieg(1996)

Austrian Journal of Statistics 19

Affiliation:

Wesley Bertoli da SilvaDepartment of MathematicsFederal Technology University of ParanaCuritiba, PR, BrazilE-mail: [email protected]

Angelica Maria T. RibeiroDepartment of MathematicsFederal Technology University of ParanaCuritiba, PR, BrazilE-mail: [email protected]

Katiane S. ConceicaoInstitute of Mathematical Sciences and ComputationDepartment of StatisticsUniversity of Sao PauloSao Carlos, SP, BrazilE-mail: [email protected]

Marinho G. AndradeInstitute of Mathematical Sciences and ComputationDepartment of StatisticsUniversity of Sao PauloSao Carlos, SP, BrazilE-mail: [email protected]

Francisco LouzadaInstitute of Mathematical Sciences and ComputationDepartment of StatisticsUniversity of Sao PauloSao Carlos, SP, BrazilE-mail: [email protected]

Austrian Journal of Statistics http://www.ajs.or.at/

published by the Austrian Society of Statistics http://www.osg.or.at/

Volume 47 Submitted: 2016-08-08June 2018 Accepted: 2016-10-23