A COM-Poisson Mixed Model for Clustered Count Data

Darcy Steeg Morris, U.S. Census Bureau

Kimberly Sellers, Georgetown University and U.S. Census Bureau

Conference on Multivariate Count Analysis, July 5, 2018

1 / 53

Disclaimer

This presentation is intended to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of the U.S. Census Bureau.

2 / 53

Outline

Introduction

COM-Poisson Distribution and Regression

COM-Poisson Mixed Model
  COM-Poisson-Normal Model
  COM-Poisson-Conjugate Model

Analysis: Simulated Data

Analysis: Epilepsy Data

Discussion

3 / 53

Clustered Count Data

• Count data may exhibit over- or under-dispersion.

• Positive correlation between responses is one cause of over-dispersion (Hilbe 2008).

• Clustered data has inherent correlation within a cluster.

• Can account for the correlation by incorporating random effects in a count model.

4 / 53

Random Intercept Poisson Model

• The Poisson distribution assumes equi-dispersion.

• Random effects loosen this assumption to capture additional variability induced by the correlation of measurements within a cluster.

• Model Assumptions:

$$y_{ij} \mid \alpha_i \sim \mathrm{Poi}(\lambda^*_{ij})$$

$$\log(\lambda^*_{ij}) = \beta_1 x_{ij1} + \cdots + \beta_p x_{ijp} + \alpha_i$$

$$\alpha_i \sim g(\alpha_i \mid \theta)$$

where $y_{ij}$ is the count outcome for cluster $i = 1, \ldots, N$ at occurrence $j = 1, \ldots, J_i$; $x_{ij1}, \ldots, x_{ijp}$ are the $p$ covariates for cluster $i$ at occurrence $j$; and $\alpha_i$ is the cluster-specific random intercept.

5 / 53

Random Intercept Poisson Model

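$$L(\beta, \theta) = \prod_{i=1}^{N} \int \prod_{j=1}^{J_i} \frac{e^{-\lambda^*_{ij}} (\lambda^*_{ij})^{y_{ij}}}{y_{ij}!} \, g(\alpha_i \mid \theta) \, d\alpha_i$$

• Random intercept distributional assumption:

  1. $g(\alpha_i \mid \theta) = N(\mu, \sigma^2)$ ⇒ Poisson-normal model, or

  2. $g(u_i \mid \theta) = \mathrm{gamma}(a, c)$, where $u_i = e^{\alpha_i}$ ⇒ Poisson-gamma model.

• Assumption (1) requires numerical integration.

• Assumption (2) reduces to a tractable form of the density.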
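To make the tractability of the Poisson-gamma model concrete, here is a minimal Python sketch for a single cluster. It assumes gamma(a, c) means shape a and rate c, and writes $\lambda^*_{ij} = u_i \lambda_{ij}$; the function names and the toy data are illustrative and do not come from the slides. The closed form is compared with a brute-force midpoint-rule integral.

```python
import math

def poisson_gamma_loglik(y, lam, a, c):
    """Closed-form log marginal likelihood of one cluster.

    Assumes y_j | u ~ Poisson(u * lam_j) and u ~ gamma(shape=a, rate=c).
    """
    s = sum(y)
    fixed = sum(yj * math.log(lj) - math.lgamma(yj + 1) for yj, lj in zip(y, lam))
    return (fixed + a * math.log(c) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(c + sum(lam)))

def poisson_gamma_loglik_numeric(y, lam, a, c, upper=60.0, n=60000):
    """The same quantity by midpoint-rule integration over u in (0, upper)."""
    h = upper / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        # log of the product of Poisson pmfs given u
        log_f = sum(yj * math.log(u * lj) - u * lj - math.lgamma(yj + 1)
                    for yj, lj in zip(y, lam))
        # log of the gamma(shape=a, rate=c) density at u
        log_g = a * math.log(c) - math.lgamma(a) + (a - 1) * math.log(u) - c * u
        total += math.exp(log_f + log_g) * h
    return math.log(total)

# Hypothetical toy cluster: three counts with fixed-effect rates lam_j.
y, lam = [2, 0, 3], [1.0, 0.5, 2.0]
closed = poisson_gamma_loglik(y, lam, a=2.0, c=1.5)
numeric = poisson_gamma_loglik_numeric(y, lam, a=2.0, c=1.5)
print(closed, numeric)
```

The two values should agree to several decimal places, which is the sense in which assumption (2) avoids numerical integration for the Poisson case.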

6 / 53

Why use COM-Poisson?

• Dispersion may also arise from the underlying count process mechanism (mean-variance relationship), which is not adequately modeled by cluster random effects (Booth et al. 2003; Molenberghs et al. 2007, 2010).

• The Conway-Maxwell-Poisson (COM-Poisson) distribution is a flexible count distribution that allows for under- and over-dispersion.

• A COM-Poisson model for clustered data allows modeling of additional variability due to (1) the within-cluster correlation and (2) the dispersion from the underlying count process.

• Marginal COM-Poisson models have been proposed (Khan and Jowaheer 2013, Choo-Wosoba et al. 2016).

• We study a COM-Poisson mixed (i.e. conditional) model (Choo-Wosoba and Datta 2018, Choo-Wosoba et al. 2018).

7 / 53

COM-Poisson Distribution

• The COM-Poisson pmf takes the form (Shmueli et al., 2005)

$$P(Y = y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu} Z(\lambda, \nu)}, \quad y = 0, 1, 2, \ldots$$

for a random variable $Y$, where $Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}}$ is a normalizing constant.

• Dispersion parameter ν ≥ 0:

  - ν = 1 ⇒ equi-dispersion
  - ν > 1 ⇒ under-dispersion
  - ν < 1 ⇒ over-dispersion

• Special cases: Poisson (ν = 1), geometric (ν = 0, λ < 1), and Bernoulli (ν → ∞, with success probability $\frac{\lambda}{1+\lambda}$).
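As a rough illustration of the pmf and its special cases, the following Python sketch evaluates the COM-Poisson pmf by truncating the series for $Z(\lambda, \nu)$. The truncation length `terms` and the function names are assumptions made for this example; for ν = 0 with λ close to 1 the series converges slowly and more terms would be needed.

```python
import math

def cmp_log_z(lam, nu, terms=500):
    """log Z(lam, nu), with the infinite series truncated after `terms` terms."""
    logs = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(terms)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))

def cmp_pmf(y, lam, nu, terms=500):
    """P(Y = y | lam, nu) for the COM-Poisson distribution."""
    log_p = y * math.log(lam) - nu * math.lgamma(y + 1) - cmp_log_z(lam, nu, terms)
    return math.exp(log_p)

# Differences from the three special cases; each should be essentially zero.
poisson_check = cmp_pmf(3, 2.5, 1.0) - math.exp(-2.5) * 2.5 ** 3 / 6    # nu = 1
geometric_check = cmp_pmf(4, 0.4, 0.0) - (1 - 0.4) * 0.4 ** 4           # nu = 0
bernoulli_check = cmp_pmf(1, 2.0, 50.0) - 2.0 / 3.0                     # nu large
total_mass = sum(cmp_pmf(k, 3.0, 0.75) for k in range(100))
print(poisson_check, geometric_check, bernoulli_check, total_mass)
```

Working on the log scale keeps the terms $\lambda^s / (s!)^\nu$ from overflowing when λ is large or ν is small.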

9 / 53

COM-Poisson Distribution: Mean

Moments are not available in closed form, but the mean can be approximated:

$$E(Y) = \lambda \frac{\partial \log Z(\lambda, \nu)}{\partial \lambda} \approx \lambda^{1/\nu} - \frac{\nu - 1}{2\nu}$$

for ν ≤ 1 or λ > 10^ν (Shmueli et al., 2005). Or more generally?
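A short Python sketch can compare the mean computed from a truncated series with the approximation $\lambda^{1/\nu} - \frac{\nu - 1}{2\nu}$ from Shmueli et al. (2005). The truncation length and the parameter values below are arbitrary choices for illustration.

```python
import math

def cmp_mean(lam, nu, terms=1000):
    """E(Y) for CMP(lam, nu), computed from the series truncated at `terms`."""
    logs = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(terms)]
    m = max(logs)
    w = [math.exp(v - m) for v in logs]  # unnormalized pmf weights
    return sum(s * ws for s, ws in enumerate(w)) / sum(w)

def cmp_mean_approx(lam, nu):
    """Approximation lam^(1/nu) - (nu - 1) / (2 nu)."""
    return lam ** (1.0 / nu) - (nu - 1.0) / (2.0 * nu)

for lam, nu in [(4.0, 1.0), (3.0, 0.5), (40.0, 1.5)]:
    print(lam, nu, round(cmp_mean(lam, nu), 4), round(cmp_mean_approx(lam, nu), 4))
```

At ν = 1 the approximation reduces to λ, the Poisson mean; the other two settings satisfy ν ≤ 1 or λ > 10^ν respectively.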

10 / 53

COM-Poisson Regression

• Sellers and Shmueli (2010) extend the COM-Poisson distribution to regression.

• Allows varying λ for each observation i.

• Model Assumptions:

$$y_i \sim \mathrm{CMP}(\lambda_i, \nu)$$

$$\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$$

• Indirectly models the relationship between the mean and the linear predictor.

11 / 53

COM-Poisson Regression

• Likelihood:

$$L(\beta, \nu) = \prod_{i=1}^{N} \frac{\lambda_i^{y_i}}{(y_i!)^{\nu} Z(\lambda_i, \nu)}$$

• Loglikelihood:

$$\log L(\beta, \nu) = \sum_{i=1}^{N} y_i \log \lambda_i - \nu \sum_{i=1}^{N} \log y_i! - \sum_{i=1}^{N} \log Z(\lambda_i, \nu)$$

• Sellers and Shmueli (2010): maximum likelihood estimation.

• Guikema and Coffelt (2008): Bayesian estimation of a re-parameterized version.
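The loglikelihood above can be sketched in a few lines of Python. This is only an illustration with a truncated series for $Z$; the design matrix, responses, and coefficient values are hypothetical. Setting ν = 1 should reproduce the ordinary Poisson regression loglikelihood, which gives a simple sanity check.

```python
import math

def log_z(log_lam, nu, terms=500):
    """log Z(lam, nu) from log(lam), truncating the series after `terms` terms."""
    logs = [s * log_lam - nu * math.lgamma(s + 1) for s in range(terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def cmp_regression_loglik(beta, nu, X, y):
    """sum_i [ y_i log(lam_i) - nu log(y_i!) - log Z(lam_i, nu) ], log(lam_i) = x_i' beta."""
    total = 0.0
    for xi, yi in zip(X, y):
        log_lam = sum(b * x for b, x in zip(beta, xi))
        total += yi * log_lam - nu * math.lgamma(yi + 1) - log_z(log_lam, nu)
    return total

# Hypothetical data: an intercept column plus one covariate.
X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.3]]
y = [1, 0, 4]
beta = [0.3, 0.6]

ll_cmp = cmp_regression_loglik(beta, 1.0, X, y)
etas = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
ll_poisson = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1)
                 for yi, eta in zip(y, etas))
print(ll_cmp, ll_poisson)
```

In practice this function would be handed to a numerical optimizer over (β, ν); that step is omitted here.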

12 / 53

Random Intercept COM-Poisson Model

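• Extend the Sellers and Shmueli (2010) COM-Poisson regression model to include a random effect.

• Model Assumptions:

$$y_{ij} \mid \alpha_i \sim \mathrm{CMP}(\lambda^*_{ij}, \nu)$$

$$\log(\lambda^*_{ij}) = \log(u_i \lambda_{ij}) = \beta_1 x_{ij1} + \cdots + \beta_p x_{ijp} + \alpha_i$$

$$\alpha_i \sim g(\alpha_i \mid \theta)$$

• $\alpha_i$ is assumed to capture all within-cluster correlation, so that $y_{ij} \perp y_{ik} \mid \alpha_i$ for $j \neq k$.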

14 / 53

Random Intercept COM-Poisson Model

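$$L(\beta, \nu, \theta) = \prod_{i=1}^{N} \int \prod_{j=1}^{J_i} \frac{(\lambda^*_{ij})^{y_{ij}}}{(y_{ij}!)^{\nu}} \frac{1}{Z(\lambda^*_{ij}, \nu)} \, g(\alpha_i \mid \theta) \, d\alpha_i$$

• Random intercept distributional assumption:

  1. $g(\alpha_i \mid \theta) = N(\mu, \sigma^2)$ ⇒ COM-Poisson-normal model, or

  2. $g(u_i \mid \theta) \propto u_i^{a-1} Z^{-c}(u_i, \nu)$ ⇒ COM-Poisson-conjugate model.

• Assumptions (1) and (2) BOTH require numerical integration.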

15 / 53

CMP-normal Model: Loglikelihood

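• CMP-normal loglikelihood involves an intractable integral:

$$\log L(\beta, \nu, \mu, \sigma^2) = \log \prod_{i=1}^{N} \int \prod_{j=1}^{J_i} \overbrace{\frac{(\lambda^*_{ij})^{y_{ij}}}{(y_{ij}!)^{\nu}} \frac{1}{Z(\lambda^*_{ij}, \nu)}}^{f(y_{ij} \mid \alpha_i)} \; \overbrace{\left[ \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(\alpha_i - \mu)^2}{2\sigma^2}} \right]}^{g(\alpha_i)} d\alpha_i$$

$$= \sum_{i=1}^{N} \sum_{j=1}^{J_i} y_{ij} \log(\lambda_{ij}) - \nu \sum_{i=1}^{N} \sum_{j=1}^{J_i} \log(y_{ij}!) - \sum_{i=1}^{N} \log(\sigma\sqrt{2\pi}) + \sum_{i=1}^{N} \log \int e^{\alpha_i \sum_{j=1}^{J_i} y_{ij} - \frac{(\alpha_i - \mu)^2}{2\sigma^2}} \left( \prod_{j=1}^{J_i} Z(e^{\alpha_i} \lambda_{ij}, \nu) \right)^{-1} d\alpha_i$$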
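The integral for one cluster can be approximated numerically. The sketch below is a plain-Python stand-in: it uses a fixed midpoint rule over μ ± 6σ rather than R's adaptive `integrate`, and it truncates the series for $Z$. The grid size, truncation length, and parameter values are assumptions chosen for illustration. As a sanity check, the marginal probabilities of a single observation are summed over y and should come out close to 1.

```python
import math

def log_z(log_lam, nu, terms=150):
    """log Z(lam, nu) from log(lam), truncating the series after `terms` terms."""
    logs = [s * log_lam - nu * math.lgamma(s + 1) for s in range(terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def cmp_normal_cluster_loglik(y, log_lam, nu, mu, sigma, n=120, width=6.0):
    """log of the integral over alpha of prod_j f(y_j | alpha) * N(alpha; mu, sigma^2).

    Midpoint rule on [mu - width*sigma, mu + width*sigma] with n cells.
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    vals = []
    for k in range(n):
        alpha = lo + (k + 0.5) * h
        log_f = 0.0
        for yj, llj in zip(y, log_lam):
            ls = alpha + llj  # log(lambda*_ij) = alpha_i + log(lambda_ij)
            log_f += yj * ls - nu * math.lgamma(yj + 1) - log_z(ls, nu)
        log_g = -0.5 * ((alpha - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        vals.append(log_f + log_g)
    m = max(vals)
    return m + math.log(h * sum(math.exp(v - m) for v in vals))

# Marginal pmf of one observation (J_i = 1), summed over y = 0..40.
total_mass = sum(
    math.exp(cmp_normal_cluster_loglik([k], [math.log(2.0)], nu=1.5, mu=0.0, sigma=0.5))
    for k in range(41)
)
print(total_mass)
```

Summing this function over clusters gives the approximate marginal loglikelihood that an optimizer would then maximize over (β, ν, μ, σ²).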

17 / 53

CMP-normal Model: MLE

Obtain maximum likelihood estimates in R (with help from Rcpp!) using:

1. numerical integration (the integrate function) to obtain an approximation of the marginal loglikelihood,

2. optimization (the nlminb function) to maximize the approximate marginal loglikelihood.

Maximum likelihood estimates of the CMP-normal model can similarly be obtained in SAS® using the NLMIXED procedure (Morris et al. 2017).

18 / 53

COM-Poisson Conjugate (Kadane et al. 2006)

• Conjugate prior for COM-Poisson (“extended bivariate gamma”):

$$h(\lambda, \nu) = \lambda^{a-1} e^{-\nu b} Z^{-c}(\lambda, \nu) \, \kappa(a, b, c)$$

where $\kappa(a, b, c)$ is the integration constant.

• Associated conditional distribution of λ:

$$h(\lambda \mid \nu) = \lambda^{a-1} Z^{-c}(\lambda, \nu) \, \kappa(a, c)$$

where $\kappa(a, c)$ is the integration constant.

20 / 53

CMP Conditional Conjugate: Special Cases

• Conditional conjugate distribution $h(\lambda \mid \nu)$ special cases:

  1. ν = 1 ⇒ gamma(a, c),

  2. ν = 0 ⇒ beta(a, c + 1), and

  3. ν → ∞ ⇒ $\frac{a}{c-a} F(2a, 2(c-a))$, $c > a$ ≡ $\frac{\lambda}{1+\lambda} \sim \mathrm{beta}(a, c-a)$.

• COM-Poisson conjugate relationship special cases:

  1. ν = 1 ⇒ Poisson-gamma,

  2. ν = 0 ⇒ geometric-beta, and

  3. ν → ∞ ⇒ Bernoulli-beta.
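These special cases follow from the closed forms that $Z(\lambda, \nu)$ takes at ν = 1, ν = 0 and ν → ∞. The short derivation below is a sketch added for illustration, reading gamma(a, c) as shape a and rate c and using the change of variable $p = \lambda/(1+\lambda)$ in the last case.

```latex
\begin{align*}
\nu = 1: &\quad Z(\lambda, 1) = e^{\lambda}
  \;\Rightarrow\; h(\lambda \mid \nu) \propto \lambda^{a-1} e^{-c\lambda}
  && \text{(gamma, shape } a \text{, rate } c\text{)} \\
\nu = 0: &\quad Z(\lambda, 0) = (1-\lambda)^{-1},\; 0 < \lambda < 1
  \;\Rightarrow\; h(\lambda \mid \nu) \propto \lambda^{a-1} (1-\lambda)^{c}
  && \text{(beta}(a, c+1)\text{)} \\
\nu \to \infty: &\quad Z(\lambda, \nu) \to 1 + \lambda
  \;\Rightarrow\; h(\lambda \mid \nu) \propto \lambda^{a-1} (1+\lambda)^{-c} \\
&\quad p = \tfrac{\lambda}{1+\lambda},\; d\lambda = (1-p)^{-2}\, dp
  \;\Rightarrow\; h(p) \propto p^{a-1} (1-p)^{c-a-1}
  && \text{(beta}(a, c-a)\text{, requires } c > a\text{)}
\end{align*}
```

The requirement $c > a$ in the last line is the same condition under which the limiting integral for $\kappa^{-1}(a, c)$ is finite.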

21 / 53

CMP Conditional Conjugate

h(λ|ν, a, c) Shiny App

22 / 53

CMP Conjugate: Parameter Constraints

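• Joint conjugate $h(\lambda, \nu)$: $\kappa^{-1}(a, b, c)$ is finite when (Kadane et al. 2006)

$$\frac{b}{c} > \log\left(\lfloor a/c \rfloor !\right) + \left(a/c - \lfloor a/c \rfloor\right) \log\left(\lfloor a/c \rfloor + 1\right)$$

• Conditional conjugate $h(\lambda \mid \nu)$:

$$\kappa^{-1}(a, c) = \int \lambda^{a-1} Z^{-c}(\lambda, \nu) \, d\lambda = \int \lambda^{a-1} \left[ \sum_{k=0}^{\infty} \frac{\lambda^{k}}{(k!)^{\nu}} \right]^{-c} d\lambda$$

which is divergent when a > c and ν is large.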

23 / 53

Empirical Study of Parameter Constraints

$\kappa^{-1}(a, c)$ evaluated over a ∈ (.1, 5), c ∈ (.1, 5) by .1 and ν ∈ (0, 30) by .05, for a > c.

24 / 53

CMP-conjugate Model: Loglikelihood

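• Poisson-gamma model: conjugate distribution ⇒ closed form.

• Unfortunately, the CMP-conjugate loglikelihood involves intractable integrals (violates strong conjugacy, Molenberghs et al. 2010):

$$\log L(\beta, \nu, a, c) = \log \prod_{i=1}^{N} \int \prod_{j=1}^{J_i} \overbrace{\frac{(\lambda^*_{ij})^{y_{ij}}}{(y_{ij}!)^{\nu}} \frac{1}{Z(\lambda^*_{ij}, \nu)}}^{f(y_{ij} \mid u_i)} \; \overbrace{\left[ u_i^{a-1} Z^{-c}(u_i, \nu) \, \kappa(a, c) \right]}^{g(u_i)} du_i$$

$$= \sum_{i=1}^{N} \sum_{j=1}^{J_i} y_{ij} \log(\lambda_{ij}) - \nu \sum_{i=1}^{N} \sum_{j=1}^{J_i} \log(y_{ij}!) + \sum_{i=1}^{N} \log(\kappa(a, c)) + \sum_{i=1}^{N} \log \int_{u_i} u_i^{a - 1 + \sum_{j=1}^{J_i} y_{ij}} \left( Z^{c}(u_i, \nu) \prod_{j=1}^{J_i} Z(u_i \lambda_{ij}, \nu) \right)^{-1} du_i$$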

25 / 53

CMP-conjugate Model: MLE

Obtain maximum likelihood estimates in R (with help from Rcpp!) using:

1. numerical integration (the integrate function) to obtain an approximation of the marginal loglikelihood,

2. optimization (the nlminb function) to maximize the approximate marginal loglikelihood.

26 / 53

Simulated Data Generating Process

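• N = 100 clusters, $J_i = 5 \; \forall \, i$ (500 observations), and 50 replications.

• Distributional Assumptions:

$$y_{ij} \mid u_i \sim f(y_{ij} \mid \lambda^*_{ij}, \{\nu\})$$

$$\log(\lambda^*_{ij}) = \beta_1 x_i + \alpha_i$$

$$x_i \sim N(0, .1), \quad \alpha_i \sim N(.5, .5), \quad \beta_1 = .5$$

where $f(y_{ij} \mid \lambda^*_{ij}, \{\nu\})$ is:

  - $\mathrm{Poi}(\lambda^*_{ij}) \equiv \mathrm{CMP}(\lambda^*_{ij}, 1)$
  - $\mathrm{Bern}\left(\frac{\lambda^*_{ij}}{1+\lambda^*_{ij}}\right) \equiv \mathrm{CMP}(\lambda^*_{ij}, \infty)$
  - $\mathrm{geom}\left(\frac{1}{1+\lambda^*_{ij}}\right) \cong \mathrm{CMP}(\lambda^*_{ij}, 0)$
  - $\mathrm{CMP}(\lambda^*_{ij}, 5)$
  - $\mathrm{CMP}(\lambda^*_{ij}, .75)$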

28 / 53

Simulated Data: Mean Link

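Recall

$$E(Y) = \lambda^* \frac{\partial \log Z(\lambda^*, \nu)}{\partial \lambda^*}$$

• For Poisson data:

$$E(Y) = \lambda^* \;\equiv\; E(Y) = \lambda^* \frac{\partial \log\left(e^{\lambda^*}\right)}{\partial \lambda^*} = \lambda^*$$

• For Bernoulli data:

$$E(Y) = p = \frac{\lambda^*}{1+\lambda^*} \;\equiv\; E(Y) = \lambda^* \frac{\partial \log(1+\lambda^*)}{\partial \lambda^*} = \frac{\lambda^*}{1+\lambda^*}$$

• For geometric data:

$$E(Y) = \frac{1-p}{p} = \lambda^* \;\not\equiv\; E(Y) = \lambda^* \frac{\partial \log(1-\lambda^*)^{-1}}{\partial \lambda^*} = \frac{\lambda^*}{1-\lambda^*}$$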
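The geometric mismatch can be made concrete with a small Python sketch. It is an illustration added here, not a computation from the slides: data generated as geom(1/(1+λ*)) have mean λ*, whereas CMP(λ*, 0) has mean λ*/(1−λ*). Matching the two pmfs term by term suggests that the geometric data correspond to CMP(λ', 0) with λ' = λ*/(1+λ*); the name `lam_adjusted` is introduced only for this example.

```python
def geometric_pmf(y, p):
    """P(Y = y) = p (1 - p)^y for y = 0, 1, 2, ... (number of failures)."""
    return p * (1.0 - p) ** y

def cmp_nu0_pmf(y, lam):
    """CMP(lam, nu = 0) pmf; Z = 1 / (1 - lam), which requires 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("nu = 0 requires 0 < lam < 1")
    return (1.0 - lam) * lam ** y

lam_star = 2.0
p = 1.0 / (1.0 + lam_star)                 # generating geometric probability
lam_adjusted = lam_star / (1.0 + lam_star)  # rate that reproduces the same pmf

max_gap = max(abs(geometric_pmf(k, p) - cmp_nu0_pmf(k, lam_adjusted)) for k in range(50))
mean_cmp = lam_adjusted / (1.0 - lam_adjusted)  # CMP(lam, 0) mean, lam / (1 - lam)
print(max_gap, mean_cmp, lam_star)
```

Under this reading, a CMP model fitted to geometric data estimates a rate on the λ' scale rather than the λ* scale, which is one way to see the link misspecification for the geometric case.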

29 / 53

Simulated Data: Misspecification

• Fit 4 models: Poisson-normal, NB-normal, CMP-normal, CMP-conjugate.

• Sources of misspecification:

  - Random effect distribution.
    - Special Cases: CMP-conjugate.
    - COM-Poisson Data: CMP-conjugate.

  - Link to linear predictor.
    - Special Cases: CMP-normal/conjugate for geometric data.
    - COM-Poisson Data: Poisson-normal, NB-normal.

30 / 53

Simulation Study Results: Special Cases

Special Case Simulated Data Mean Estimates

| Simulated dataset | Est. | Model: Poisson | Model: NB | Model: CMP-normal | Model: CMP-conjugate |
|---|---|---|---|---|---|
| Poisson | Disp. | | k = 0.00 | ν = 1.02 | ν = 0.99 |
| | Var. | σ² = 0.49 | σ² = 0.49 | σ² = 0.51 | a = 2.12, c = 1.01 |
| | min AIC | 0.96 | 0.72 | 0.96 | 0.12 |
| | max ℓ | 0.00 | 0.04 | 0.76 | 0.20 |
| Bernoulli** | Disp. | | k = 0.00 | ν = 37.9 | ν = 34.8 |
| | Var. | σ² = 0.00 | σ² = 0.00 | σ² = 0.53 | a = 7.26, c = 11.75 |
| | min AIC | 0.00 | 0.00 | 1.00 | 1.00 |
| | max ℓ | 0.00 | 0.00 | 0.55 | 0.45 |
| Geometric | Disp. | | k = 1.01 | ν = 0.02 | ν = 0.02 |
| | Var. | σ² = 0.67 | σ² = 0.45 | σ² = 0.04 | a = 6.70, c = 3.33 |
| | min AIC | 0.00 | 0.98 | 0.22 | 0.34 |
| | max ℓ | 0.00 | 0.64 | 0.02 | 0.34 |

Note: min AIC is the proportion of replications where AIC ≤ min(AIC) + 2 and max ℓ is the proportion of replications where ℓ is largest.

** The random intercept logistic model results/estimates for the simulated Bernoulli data are: σ² = 0.45, min AIC = 1.00, and max ℓ = 0.00.

31 / 53

Simulation Study Results: Special Cases

• COM-Poisson models have better/comparable model fit to special cases.

• COM-Poisson model recognizes special cases:

  - ν = 1.02, 0.99 ≈ 1 for Poisson,
  - ν = 37.9, 34.8 is large for Bernoulli, and
  - ν = 0.02, 0.02 ≈ 0 for geometric.

• Cluster variability.

  - Captured by Poisson and NB for over-dispersed data: σ² > 0. (NB recognizes geometric special case: k ≈ 1.)
  - Not captured by Poisson or NB for under-dispersed data: σ² = 0.
  - Captured by COM-Poisson models: σ² > 0 and ...

32 / 53

Simulation Study Results: Special Cases

[Figure: Estimated Random Effect Distribution: Poisson Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 2.17, SD(u) = 1.76; Conjugate (CMP): E(u) = 2.09, SD(u) = 1.43.]

[Figure: Estimated Random Effect Distribution: Bernoulli Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 2.12, SD(u) = 1.61; Conjugate (CMP): E(u) = 2.08, SD(u) = 1.61.]

33 / 53

Simulation Study Results: Special Cases

[Figure: Estimated Random Effect Distribution: Geometric Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; True (adjusted): E(u) = 0.83, SD(u) = 0.12; Lognormal (CMP): E(u) = 0.63, SD(u) = 0.13; Conjugate (CMP): E(u) = 0.62, SD(u) = 0.15.]

34 / 53

Simulation Study Results: COM-Poisson Data

COM-Poisson Simulated Data Mean Estimates

| Simulated dataset | Est. | Model: Poisson | Model: NB | Model: CMP-normal | Model: CMP-conjugate |
|---|---|---|---|---|---|
| CMP (under) | Disp. | | k = 0.00 | ν = 5.05 | ν = 5.08 |
| | Var. | σ² = 0.00 | σ² = 0.00 | σ² = 0.50 | a = 6.00, c = 8.83 |
| | min AIC | 0.00 | 0.00 | 1.00 | 0.97 |
| | max ℓ | 0.00 | 0.00 | 0.58 | 0.42 |
| CMP (over) | Disp. | | k = 0.00 | ν = 0.77 | ν = 0.74 |
| | Var. | σ² = 0.76 | σ² = 0.75 | σ² = 0.51 | a = 1.76, c = 0.56 |
| | min AIC | 0.17 | 0.19 | 0.93 | 0.19 |
| | max ℓ | 0.00 | 0.10 | 0.79 | 0.12 |

Note: min AIC is the proportion of replications where AIC ≤ min(AIC) + 2 and max ℓ is the proportion of replications where ℓ is largest.

35 / 53

Simulation Study Results: COM-Poisson Data

• COM-Poisson models outperform for both cases of intermediate levels of over- and under-dispersion.

• COM-Poisson models recognize over- and under-dispersion:

  - ν = 5.05, 5.08 ≈ 5.00 > 1 for COM-Poisson (under), and
  - ν = 0.77, 0.74 ≈ 0.75 < 1 for COM-Poisson (over).

• Cluster variability.

  - Not captured by Poisson or NB for under-dispersed data: σ² = 0.
  - Captured by COM-Poisson models: σ² > 0 and ...

36 / 53

Simulation Study Results: COM-Poisson Data

[Figure: Estimated Random Effect Distribution: CMP Underdispersed Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 2.26, SD(u) = 1.81; Conjugate (CMP): E(u) = 2.16, SD(u) = 1.54.]

[Figure: Estimated Random Effect Distribution: CMP Overdispersed Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 2.18, SD(u) = 1.78; Conjugate (CMP): E(u) = 2.08, SD(u) = 1.4.]

37 / 53

Simulation Study Results: Model Comparisons

Simulated Data Best Model by AIC

Simulated Model

Dataset Poisson NB CMP-normal CMP-conjugate

Poisson X X X

Bernoulli X X

Geometric X X

CMP (under) X X

CMP (over) X

Note: bolded are misspecified models.

• But what if the random effect distribution is misspecified for the -normal models?

$$u_i \sim \mathrm{gamma}(1.54, 1.37)$$

assuming the same mean and variance as $u_i \sim \log N(.5, .5)$.
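The gamma parameters can be reproduced by moment matching. The sketch below assumes that log N(.5, .5) means a lognormal with log-scale mean 0.5 and log-scale variance 0.5, and that gamma(1.54, 1.37) is stated as shape and scale; both readings are assumptions that happen to be consistent with the E(u) = 2.12 and SD(u) = 1.71 reported for the true distribution.

```python
import math

def lognormal_moments(mu, sigma2):
    """Mean and variance of u = exp(alpha) with alpha ~ N(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, var

def gamma_matching(mean, var):
    """Shape and scale of the gamma distribution with the given mean and variance."""
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

mean, var = lognormal_moments(0.5, 0.5)
shape, scale = gamma_matching(mean, var)
print(round(mean, 2), round(math.sqrt(var), 2), round(shape, 2), round(scale, 2))
# prints: 2.12 1.71 1.54 1.37
```

Note that the second gamma parameter here is a scale, whereas the conjugate special case gamma(a, c) at ν = 1 has c acting as a rate.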

38 / 53

Simulated Data: Misspecification

• Fit 4 models: Poisson-normal, NB-normal, CMP-normal, CMP-conjugate.

• Sources of misspecification:

  - Random effect distribution.
    - Special Cases: All except CMP-conjugate for Poisson data.
    - COM-Poisson Data: All.

  - Link to linear predictor.
    - Special Cases: CMP-normal/conjugate for geometric data.
    - COM-Poisson Data: Poisson-normal, NB-normal.

39 / 53

Simulation Study Results: Special Cases

Special Case Simulated Data Mean Estimates

| Simulated dataset | Est. | Model: Poisson | Model: NB | Model: CMP-normal | Model: CMP-conjugate |
|---|---|---|---|---|---|
| Poisson | Disp. | | k = 0.00 | ν = 1.03 | ν = 1.00 |
| | Var. | σ² = 0.68 | σ² = 0.68 | σ² = 0.71 | a = 1.67, c = 0.48 |
| | min AIC | 0.28 | 0.08 | 0.22 | 0.88 |
| | max ℓ | 0.00 | 0.00 | 0.12 | 0.88 |
| Bernoulli** | Disp. | | k = 0.00 | ν = 36.0 | ν = 35.6 |
| | Var. | σ² = 0.00 | σ² = 0.00 | σ² = 0.96 | a = 4.47, c = 6.50 |
| | min AIC | 0.00 | 0.00 | 1.00 | 1.00 |
| | max ℓ | 0.00 | 0.00 | 0.69 | 0.31 |
| Geometric | Disp. | | k = 1.03 | ν = 0.00 | ν = 0.00 |
| | Var. | σ² = 0.90 | σ² = 0.66 | σ² = 0.04 | a = 5.62, c = 1.59 |
| | min AIC | 0.00 | 0.97 | 0.00 | 0.45 |
| | max ℓ | 0.00 | 0.76 | 0.00 | 0.24 |

Note: min AIC is the proportion of replications where AIC ≤ min(AIC) + 2 and max ℓ is the proportion of replications where ℓ is largest.

** The random intercept logistic model results/estimates for the simulated Bernoulli data are: σ² = 0.85, min AIC = 1.00, and max ℓ = 0.00.

40 / 53

Simulation Study Results: Special Cases

[Figure: Estimated Random Effect Distribution: Poisson Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 3.84, SD(u) = 3.91; Conjugate (CMP): E(u) = 3.53, SD(u) = 2.73.]

[Figure: Estimated Random Effect Distribution: Bernoulli Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 4.13, SD(u) = 5.25; Conjugate (CMP): E(u) = 4.38, SD(u) = 33.31.]

41 / 53

Simulation Study Results: Special Cases

[Figure: Estimated Random Effect Distribution: Geometric Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; True (adjusted): E(u) = 0.82, SD(u) = 0.14; Lognormal (CMP): E(u) = 0.73, SD(u) = 0.14; Conjugate (CMP): E(u) = 0.73, SD(u) = 0.17.]

42 / 53

Simulation Study Results: COM-Poisson Data

COM-Poisson Simulated Data Mean Estimates

| Simulated dataset | Est. | Model: Poisson | Model: NB | Model: CMP-normal | Model: CMP-conjugate |
|---|---|---|---|---|---|
| CMP (under) | Disp. | | k = 0.00 | ν = 5.10 | ν = 5.09 |
| | Var. | σ² = 0.00 | σ² = 0.00 | σ² = 0.79 | a = 4.01, c = 5.14 |
| | min AIC | 0.00 | 0.00 | 1.00 | 1.00 |
| | max ℓ | 0.00 | 0.00 | 0.21 | 0.79 |
| CMP (over) | Disp. | | k = 0.02 | ν = 0.78 | ν = 0.76 |
| | Var. | σ² = 1.14 | σ² = 1.14 | σ² = 0.77 | a = 1.38, c = 0.23 |
| | min AIC | 0.06 | 0.09 | 0.23 | 0.86 |
| | max ℓ | 0.00 | 0.06 | 0.14 | 0.80 |

Note: min AIC is the proportion of replications where AIC ≤ min(AIC) + 2 and max ℓ is the proportion of replications where ℓ is largest.

43 / 53

Simulation Study Results: COM-Poisson Data

[Figure: Estimated Random Effect Distribution: CMP Underdispersed Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 3.73, SD(u) = 4.09; Conjugate (CMP): E(u) = 3.59, SD(u) = 3.57.]

[Figure: Estimated Random Effect Distribution: CMP Overdispersed Data. Axes: density vs. $u_i$. True: E(u) = 2.12, SD(u) = 1.71; Lognormal (CMP): E(u) = 3.92, SD(u) = 4.22; Conjugate (CMP): E(u) = 3.53, SD(u) = 2.68.]

44 / 53

Simulation Study Results: Model Comparisons

Simulated Data Best Model by AIC

Simulated Model

Dataset Poisson NB CMP-normal CMP-conjugate

Poisson X X X X

Bernoulli X X

Geometric X X

CMP (under) X X

CMP (over) X X

Note: bolded are misspecified models.

45 / 53

Epilepsy Data & Model (Diggle et al. 1994)

• Number of seizures measured for 59 epileptic patients in an 8-week baseline period followed by 4 consecutive 2-week treatment periods.

• Outcome variable, $y_{ij}$: number of seizures for subject i in time period j.

• Covariates:

  - $x_{ij1}$: indicator of a period after baseline (weeks 8-16).
  - $x_{ij2}$: indicator of receipt of progabide (vs. placebo).
  - $T_{ij}$: length of time period j.

• Model: Diggle et al. (1994) fit a random intercept Poisson regression model.

$$\log E(y_{ij} \mid u_i) = \beta_0 + x_{ij1}\beta_1 + x_{ij2}\beta_2 + x_{ij1} x_{ij2} \beta_3 + \log(T_{ij}) + u_i$$

47 / 53

Epilepsy Data Results

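Epilepsy Data Estimates

| Parameter | Model: Poisson | Model: NB | Model: CMP-normal | Model: CMP-conjugate |
|---|---|---|---|---|
| β0 or µ | 1.033 | 1.080 | -0.781 | |
| β1 | 0.111 | 0.023 | 0.793 | 0.633 |
| β2 | -0.024 | 0.073 | 0.048 | 0.165 |
| β3 | -0.104 | -0.310 | -0.176 | -0.198 |
| k | | 0.148 | | |
| ν | | | 0.420 | 0.541 |
| σ² | 0.608 | 0.661 | 0.143 | |
| a, c | | | | 2.93, 2.79 |
| AIC | 2031.4 | 1789.5 | 1754.0 | 1776.6 |
| −ℓ | 1010.6 | 888.7 | 871.0 | 882.3 |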

48 / 53

Epilepsy Data Results

• CMP models have the best fit.

• All models indicate subject variability: σ² > 0 and ...

[Figure: Estimated Prior Distribution for CMP Models. Axes: density vs. $u_i$. Lognormal: E(u) = 0.49, SD(u) = 0.19; Conjugate: E(u) = 0.74, SD(u) = 0.38.]

49 / 53

Epilepsy Data Results

• Additional over-dispersion is evident in CMP and NB:

  - ν < 1 (ν = 0.420, 0.541) in COM-Poisson, and
  - k > 0 (k = 0.148) in negative binomial.

• Both the negative binomial and COM-Poisson models can account for variability beyond the subject-specific random effect; however, the COM-Poisson captures additional over-dispersion in a way that the negative binomial model cannot.

• Findings consistent with Molenberghs et al. (2010).

50 / 53

Discussion

• COM-Poisson regression model can be extended to include random effects to model clustered data.

• The flexibility of the COM-Poisson mixed model allows modeling of variability in the count outcome beyond that induced by within-cluster correlation.

• Assuming the conditional conjugate distribution for random effects allows further flexibility.

• Framework naturally allows random slopes, mixed modeling of the dispersion parameter, etc.

• To do: Mean parametrization?!

52 / 53

Thank you!

darcy.steeg.morris@census.gov

53 / 53

References

• Shmueli, Minka, Kadane, Borle and Boatwright. (2005). “A Useful Distribution for Fitting Discrete Data: Revival of the Conway-Maxwell-Poisson Distribution.” Journal of the Royal Statistical Society, Series C, 54: 127-142.

• Sellers and Shmueli. (2010). “A Flexible Regression Model for Count Data.” The Annals of Applied Statistics, 4(2): 943-961.

• Guikema and Coffelt. (2008). “A Flexible Count Data Regression Model for Risk Analysis.” Risk Analysis, 28(1): 213-223.

• Kadane, Shmueli, Minka, Borle and Boatwright. (2006). “Conjugate Analysis of the Conway-Maxwell-Poisson Distribution.” Bayesian Analysis, 1(2): 363-374.

• Diggle, Heagerty, Liang and Zeger. (1994). Analysis of Longitudinal Data. Oxford: Clarendon.

• Choo-Wosoba and Datta. (2018). “Analyzing Clustered Count Data with a Cluster-specific Random Effect Zero-inflated Conway-Maxwell-Poisson Distribution.” Journal of Applied Statistics, 45(5): 799-814.

• Choo-Wosoba, Gaskins, Levy and Datta. (2018). “A Bayesian approach for analyzing zero-inflated clustered count data with dispersion: Bayesian Conway-Maxwell-Poisson.” Statistics in Medicine, 37(1): 801-812.

• Choo-Wosoba, Levy and Datta. (2016). “Marginal Regression Models for Clustered Count Data Based on Zero-Inflated Conway-Maxwell-Poisson Distribution with Applications.” Biometrics, 72(2): 606-618.

• Morris, Sellers and Menger. (2017). “Fitting a Flexible Model for Longitudinal Count Data Using the NLMIXED Procedure.” In SAS Global Forum Proceedings. Cary, NC: SAS Institute.

• Booth, Casella, Friedl and Hobert. (2003). “Negative Binomial Loglinear Mixed Models.” Statistical Modelling, 3: 179-191.

• Molenberghs, Verbeke, Demetrio and Vieira. (2010). “A Family of Generalized Linear Models for Repeated Measures with Normal and Conjugate Random Effects.” Statistical Science, 25(3): 325-347.

• Molenberghs, Verbeke and Demetrio. (2007). “An Extended Random-Effects Approach to Modeling Repeated, Overdispersed Count Data.” Lifetime Data Analysis, 13: 513-531.

53 / 53

Simulated Data Histograms

[Figure: five histograms of the simulated counts, one panel each for Poisson, Bernoulli, Geometric, CMP-Under, and CMP-Over. Axes: frequency vs. count.]

53 / 53

True Random Effects Distribution

[Figure: True Random Effect Distribution. Axes: density vs. $u_i$. Curves: Lognormal, Gamma.]

53 / 53

CMP Conditional Conjugate

[Figure: conditional conjugate density $h(\lambda \mid \nu)$ vs. λ, three panels.
  a = 0.5 and c = 0.5: curves for ν = 0, 0.5, 1, 2, 10, 30.
  a = 0.5 and c = 1: curves for ν = 0, 0.5, 1, 2, 10, 30.
  a = 0.5 and c = 2: curves for ν = 0, 0.5, 1, 2, 10, 30.]

53 / 53

CMP Conditional Conjugate

[Figure: conditional conjugate density $h(\lambda \mid \nu)$ vs. λ, three panels.
  a = 1 and c = 0.5: curves for ν = 0, 0.5, 1, 2, 10.
  a = 1 and c = 1: curves for ν = 0, 0.5, 1, 2, 10, 30.
  a = 1 and c = 2: curves for ν = 0, 0.5, 1, 2, 10, 30.]

53 / 53

CMP Conditional Conjugate

[Figure: conditional conjugate density $h(\lambda \mid \nu)$ vs. λ, three panels.
  a = 2 and c = 0.5: curves for ν = 0, 0.5, 1, 2.
  a = 2 and c = 1: curves for ν = 0, 0.5, 1, 2, 10.
  a = 2 and c = 2: curves for ν = 0, 0.5, 1, 2, 10, 30.]

53 / 53

CMP Conditional Conjugate

[Figure: conditional conjugate density $h(\lambda \mid \nu)$ vs. λ, three panels.
  a = 3 and c = 0.5: curves for ν = 0, 0.5, 1, 2.
  a = 3 and c = 1: curves for ν = 0, 0.5, 1, 2, 10.
  a = 3 and c = 2: curves for ν = 0, 0.5, 1, 2, 10.]

53 / 53
