Post on 16-Feb-2020


Mohammed

Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago


What is zero inflation?

Suppose you want to study hippos and the effect of habitat variables on their distribution. When sampling, you may count zero hippos at many sites. With so many zeros, standard techniques such as ordinary regression and standard count GLMs may fit poorly, and zero-inflated models are often used instead.


Issues: Excess zeros

• Often, the number of zeros in the sample cannot be accommodated properly by a Poisson or negative binomial model. Both models would underpredict them.

• This is known as the “excess zeros” problem.

• New models are needed to deal with this type of data.
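The underprediction described above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the 40% share of structural zeros, the Poisson rate of 3 and the helper names `sample_poisson` and `zero_fractions` are all assumptions chosen for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def zero_fractions(counts):
    """Return (observed share of zeros, share predicted by a Poisson with the same mean)."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed, math.exp(-mean)

rng = random.Random(1)
# Hypothetical data: 40% of sites can never hold the animal; the rest follow Poisson(3).
counts = [0 if rng.random() < 0.4 else sample_poisson(3.0, rng) for _ in range(5000)]
observed, predicted = zero_fractions(counts)
print(f"observed zeros: {observed:.3f}, Poisson-predicted zeros: {predicted:.3f}")
```

A plain Poisson model matched to the sample mean predicts a share of zeros equal to exp(-mean). In data generated this way the observed share should be clearly larger.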


Issues: Excess zeros

Types:

• Zero-Inflated Poisson (ZIP)

• Zero-Inflated Negative Binomial (ZINB) Models

• Hurdle models

• These models are designed to deal with situations where there is an “excessive” number of individuals with a count of 0.

• Poisson regression models provide a standard framework for the analysis of count data.

• In practice, however, count data are often over-dispersed relative to the Poisson distribution.


Over-dispersion

• The Poisson model assumes that the conditional variance of the dependent variable is equal to the conditional mean.

• In most count data sets, the conditional variance is greater than the conditional mean, often much greater, a phenomenon known as over-dispersion.
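A rough first screen for over-dispersion is the ratio of the sample variance to the sample mean. The sketch below is an illustration only: it checks the marginal rather than the conditional variance (no covariates are involved), and both the counts and the helper name `dispersion_index` are hypothetical.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts with many zeros and a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 3, 0, 0, 12, 0, 1, 0]
ratio = dispersion_index(counts)
print(f"variance / mean = {ratio:.2f}")
```

A ratio well above 1 suggests over-dispersion. A proper check would compare variance and mean conditional on the covariates of a fitted model.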


Consequence of over-dispersion

• Standard errors will be underestimated.

• Potential for overconfidence in results: rejecting H0 when you shouldn’t!

• Note: over-dispersion doesn’t necessarily affect predicted counts (compared to alternative models).


Issues: Excess zeros

• Zero-inflated models are useful for analysing data that consist of non-negative, highly skewed sequence counts with a large proportion of zeros.

• Moreover, the non-zero observations may be over-dispersed relative to the Poisson distribution, which biases parameter estimates and leads to underestimated standard errors.

• In such circumstances, a zero-inflated negative binomial (ZINB) model accounts for these characteristics better than a zero-inflated Poisson (ZIP) model.


Zero-Inflated Models

• These models, called two-part models, allow for two different processes:

– one drives whether the value is 0 or positive (participation part), and

– the other drives the value of the strictly positive count (amount part).

• Proposed models:

– Zero inflated models

– Hurdle models


Zero Inflation – ZIP Models

Structure:

• A zero-inflated Poisson model has two kinds of zeros: “true zeros” and “excess zeros.”

• Two groups of people: Always Zero & Not Always Zero

• Example: investors (traders) who sometimes just did not trade that week versus investors who never trade.


Zero Inflation – ZIP Models

• Two models: (1) for the count and (2) for excess zeros. The key difference is that the count model allows zeros now.

• If we are interested in modelling trading, the zeros from investors who will never trade are not relevant. But we only observe the zero, not the type of investor. This is the excess zeros problem.


Zero-inflated model

Simple definition:

• In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.

• The zero-inflated Poisson (ZIP) model is used to model data with excess zeros.


Zero Inflation – ZIP Models

• Note: lots of zeros


Zero-inflated Poisson

• The first zero-inflated model is the zero-inflated Poisson model, which concerns a random event with excess zero counts per unit of time.

• The Poisson distribution is a discrete distribution that takes the values X = 0, 1, 2, 3, … It is often used as a model for the number of events in a specific time period.

• It is also used to calculate the probability of a given number of events occurring in a fixed interval of time or space.


Zero-inflated Poisson

• The zero-inflated Poisson (ZIP) model employs two components that correspond to two zero-generating processes.

• The first process is governed by a binary distribution that generates extra zeros (logit or probit models are commonly used).

• The second process is governed by a Poisson distribution that generates counts, some of which may be zero.

• The two model components are described as follows:


Zero-inflated Poisson

\[
\Pr(y_i = 0) = \pi + (1 - \pi)\, e^{-\lambda_i}
\]

\[
\Pr(y_i = h_i) = (1 - \pi)\, \frac{\lambda_i^{h_i}\, e^{-\lambda_i}}{h_i!}, \qquad h_i \ge 1
\]

where the outcome variable $y_i$ takes any non-negative integer value ($h_i$ is the observed count), $\lambda_i$ is the expected Poisson count (both the mean and the variance of the Poisson component) for the ith individual (called $\mu$ in some texts), and $\pi$ is the probability of an extra zero.

The mean is $(1 - \pi)\lambda_i$ and the variance is $\lambda_i (1 - \pi)(1 + \lambda_i \pi)$.

E.g., the number of insurance claims within a population for a certain type of risk would be zero-inflated by those people who have not taken out insurance against the risk and thus are unable to claim.
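The ZIP probabilities and moments can be checked numerically. The sketch below assumes the standard ZIP form, in which a zero occurs with probability π + (1 − π)e^{−λ}. The parameter values and the function name `zip_pmf` are illustrative and not taken from the slides.

```python
import math

def zip_pmf(h, lam, pi):
    """P(Y = h) for a zero-inflated Poisson with rate lam and extra-zero probability pi."""
    poisson = math.exp(-lam) * lam ** h / math.factorial(h)
    if h == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

lam, pi = 2.5, 0.3  # illustrative values only
support = range(100)  # the tail beyond 100 is negligible for this rate
total = sum(zip_pmf(h, lam, pi) for h in support)
mean = sum(h * zip_pmf(h, lam, pi) for h in support)
var = sum(h * h * zip_pmf(h, lam, pi) for h in support) - mean ** 2

print(f"total probability: {total:.6f}")
print(f"mean: {mean:.4f}  vs (1 - pi) * lam = {(1 - pi) * lam:.4f}")
print(f"variance: {var:.4f}  vs lam * (1 - pi) * (1 + lam * pi) = {lam * (1 - pi) * (1 + lam * pi):.4f}")
```

Because the variance exceeds the mean whenever π > 0, a ZIP distribution is over-dispersed relative to a plain Poisson with the same mean.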


Notes on Zero Inflation Models

• Poisson is not nested in ZIP

• Standard tests are not appropriate

• Use Vuong statistic. ZIP model almost always wins.

• Zero-inflation models extend to NB models; ZINB models are standard

– Creates two sources of over-dispersion

– Generally difficult to estimate


Zero-Inflated Negative Binomial (ZINB)

• The zero-inflated negative binomial (ZINB) distribution is a mixture of a point mass at zero (a distribution degenerate at zero) and an ordinary count distribution such as the negative binomial.

• Negative binomial regression can be written as an extension of Poisson regression. It gives the model greater flexibility than the Poisson model in describing the relationship between the conditional variance and the conditional mean.

• The binary component captures the excess zeros, that is, those beyond the number predicted by the negative binomial distribution.
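The mixture can be written out directly. This is a minimal sketch under one common parameterisation, assumed here rather than taken from the slides: the negative binomial has mean `mu` and variance `mu + alpha * mu**2`, and `pi` is the probability of an extra zero. The function names and values are illustrative.

```python
import math

def nb_pmf(k, mu, alpha):
    """Negative binomial P(Y = k) with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, alpha, pi):
    """Mixture of a point mass at zero (weight pi) and a negative binomial (weight 1 - pi)."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base

mu, alpha, pi = 2.0, 0.5, 0.25  # illustrative values only
support = range(400)
mean = sum(k * zinb_pmf(k, mu, alpha, pi) for k in support)
var = sum(k * k * zinb_pmf(k, mu, alpha, pi) for k in support) - mean ** 2
print(f"mean: {mean:.4f}, variance: {var:.4f}")
```

Under this parameterisation the variance works out to (1 − π)μ(1 + μ(π + α)), so both the extra zeros (π) and the negative binomial dispersion (α) push the variance above the mean.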


Hurdle Models

• A hurdle model is also a modified count model with two parts:

- one generating the zeros

- one generating the positive values.

• The models are not constrained to be the same.

• A binomial probability model governs the binary outcome of whether a count variable has a zero or a positive value.

• If $y_i > 0$, the "hurdle is crossed" and the conditional distribution of the positive values is governed by a zero-truncated count model.

• Popular models in health economics (use of health care facilities, counselling, drugs, alcohol, etc.).
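For comparison with the zero-inflated form, a hurdle model with a Poisson count part can be sketched as follows. The choice of a Poisson for the truncated part, the parameter values and the name `hurdle_poisson_pmf` are assumptions made for illustration.

```python
import math

def hurdle_poisson_pmf(y, p_zero, lam):
    """P(Y = y): zero with probability p_zero, otherwise a zero-truncated Poisson(lam)."""
    if y == 0:
        return p_zero
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Dividing by 1 - exp(-lam) renormalises the Poisson over y >= 1.
    return (1.0 - p_zero) * poisson / (1.0 - math.exp(-lam))

p_zero, lam = 0.6, 2.0  # illustrative values only
total = sum(hurdle_poisson_pmf(y, p_zero, lam) for y in range(100))
mean = sum(y * hurdle_poisson_pmf(y, p_zero, lam) for y in range(100))
print(f"total probability: {total:.6f}, mean: {mean:.4f}")
```

Unlike the zero-inflated model, all zeros here come from the binary part, because the count part cannot produce a zero.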


Take away message

• The zero-inflated Poisson (ZIP) model is one way to allow for over-dispersion.

• This model assumes that the sample is a “mixture” of two sorts of individuals:

– one group whose counts are generated by the standard Poisson regression model, and

– another group (call them the absolute zero group) who have zero probability of a count greater than 0.

• Observed values of 0 could come from either group.

• Although not essential, the model is typically elaborated to include a logistic regression model predicting which group an individual belongs to.


Take away message

But what about the zero-inflated negative binomial (ZINB) model?

• It’s certainly possible that a ZINB model could fit better than a conventional negative binomial regression model.

• But the latter is a special case of the former, so it’s easy to do a likelihood ratio test to compare them (by taking twice the positive difference in the log-likelihoods).

• So the next time you think about fitting a zero-inflated regression model, first consider whether a conventional negative binomial model might be good enough. Having a lot of zeros doesn’t necessarily mean that you need a zero-inflated model.
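The likelihood ratio comparison can be sketched as below. The log-likelihood values are hypothetical, and the sketch assumes the ZINB model adds exactly one parameter to the negative binomial model (a constant inflation probability), so the statistic is referred to a chi-square distribution with one degree of freedom. Because the restricted model sits on the boundary of the parameter space, this reference distribution should be read as an approximation.

```python
import math

def lr_test_df1(loglik_restricted, loglik_full):
    """Likelihood ratio statistic and its chi-square(1) p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods: negative binomial (restricted) and ZINB (full).
stat, p_value = lr_test_df1(-1052.4, -1049.1)
print(f"LR statistic: {stat:.2f}, p-value: {p_value:.4f}")
```

With more than one extra parameter (for example, covariates in the inflation part), the degrees of freedom change and a general chi-square routine would be needed instead of the `erfc` shortcut.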


Take away message

• In cases of over-dispersion, the ZIP model typically fits better than a standard Poisson model.

• But there’s another model that allows for over-dispersion, and that’s the standard negative binomial regression model.

• According to the reference below, the negative binomial model often fits much better than a ZIP model, as evaluated by AIC or BIC statistics, and it is a much simpler model to estimate and interpret.

Ref: http://statisticalhorizons.com/zero-inflated-models
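The AIC and BIC comparison mentioned above uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. In the sketch below, the log-likelihoods, parameter counts and sample size are hypothetical numbers invented to show the calculation. They are not results from any real data set.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2 * loglik

# Hypothetical fits: model name -> (log-likelihood, number of parameters).
fits = {"Poisson": (-1310.7, 3), "ZIP": (-1102.3, 6), "NB": (-1051.8, 4)}
n_obs = 500  # hypothetical sample size

for name, (loglik, k) in fits.items():
    print(f"{name:8s} AIC = {aic(loglik, k):8.1f}  BIC = {bic(loglik, k, n_obs):8.1f}")

best_by_aic = min(fits, key=lambda name: aic(*fits[name]))
```

Both criteria penalise extra parameters, which is one reason a simpler negative binomial model can be preferred over a zero-inflated one when the two fit about equally well.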
