generalized linear mixed effects models in r · fitting generalized linear fixed effects models in...

Fitting Generalized Linear Fixed Effects Models in R

David Reitter, Informatics, University of Edinburgh


What linear models can do for you

• Factor analysis (cf. ANOVA)

• Regression (continuous response)

• Continuous predictors (covariates)

• Unbalanced designs (observational studies!)

• Non-normal response variables (GLM)

• Repeated measures / time series (random effects) (GLMM)

The Titanic Dataset

• Survival data with factors: Sex, Age (Child/Adult), Cabin-Class, Crew

• Provided with R, but we’ll use a more complete dataset

Exploratory Data Analysis

[Plots: # Dead; Prob(Dead|Age)]

A first linear model: Anova

ANOVA and Linear Models

• ANOVAs assume

• Normality of response

• Linearity

• Homogeneity of variances

• IID sampling

• ANOVA as special case of the general LM:

• y = β0 + β1 x1

• β0: intercept (baseline)

• β1: between-group variation

• ... compared to the within-group error
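The equivalence between ANOVA and the linear model can be checked directly. The following is a minimal sketch, assuming R's built-in PlantGrowth data (an illustration only, not the data used in these slides): a one-way ANOVA fitted with aov and the same model fitted with lm should yield the same F statistic.

```r
# One-way ANOVA and the equivalent linear model on a built-in data set.
# PlantGrowth is an assumption for illustration; the slides use Titanic data.
data(PlantGrowth)

fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

# F statistic from the ANOVA table
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
# F statistic for the same term, taken from the linear model
f_lm  <- anova(fit_lm)[["F value"]][1]
print(c(aov = f_aov, lm = f_lm))

# lm coefficients: the intercept is the baseline group mean, the other
# terms are differences of the remaining groups from that baseline.
print(coef(fit_lm))
```

The F test compares between-group variation with the within-group error, which is the comparison described in the bullets above.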

Non-balanced data

• Experimental Designs are often balanced

• controls error across conditions

• continuous variables binned

• Information about nature of effect missing! E.g. decay of preactivation in priming is log-linear.

• ANOVA compatible

• Naturalistic data is usually unbalanced

Factors and Predictors

• Age is a continuous variable

• binned for ANOVA: I(Age>16)

• do not discretize continuous variables!

• information loss

• bias through (arbitrary?) bins (thresholds)

• LMs can deal with continuous variables
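The information loss from binning can be illustrated with simulated data. This is a sketch under stated assumptions: the response, its linear dependence on age, and the noise level are all hypothetical; only the threshold I(age > 16) comes from the slide.

```r
set.seed(42)
n   <- 500
age <- runif(n, 1, 70)
# Hypothetical response that depends linearly on age, plus noise
y   <- 10 - 0.1 * age + rnorm(n, sd = 1)

fit_cont <- lm(y ~ age)           # age kept as a continuous covariate
fit_bin  <- lm(y ~ I(age > 16))   # age binned at an arbitrary threshold

# Proportion of variance explained by each model
r2 <- c(continuous = summary(fit_cont)$r.squared,
        binned     = summary(fit_bin)$r.squared)
print(round(r2, 3))
```

In this setup the binned model explains less of the variance, because everything the predictor says within each bin is discarded.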

...however

• Are the ANOVA / LM assumptions met?

• Observations are not IID

• spatial correlation (Boat)

• Response normally distributed?

• compare visually with qqnorm, qqplot

• apply Kolmogorov-Smirnov (ks.test) and/or Shapiro-Wilk (shapiro.test)

• “Binning” may be needed
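The normality checks named above can be sketched as follows. The two samples are simulated and purely illustrative; with real data, the response (or the model residuals) would take their place.

```r
set.seed(1)
x_norm <- rnorm(200)   # sample drawn from a normal distribution
x_skew <- rexp(200)    # clearly skewed sample

if (!interactive()) pdf(NULL)   # avoid writing a plot file when run as a script

# Visual check: points should follow the line if the data are normal
qqnorm(x_skew)
qqline(x_skew)

# Formal tests; small p-values indicate departure from normality
sw_norm <- shapiro.test(x_norm)
sw_skew <- shapiro.test(x_skew)
# Note: estimating mean and sd from the same data makes the KS p-value
# only approximate.
ks_skew <- ks.test(x_skew, "pnorm", mean = mean(x_skew), sd = sd(x_skew))

print(c(shapiro_normal = sw_norm$p.value,
        shapiro_skewed = sw_skew$p.value,
        ks_skewed      = ks_skew$p.value))
```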

Transformations

• Generalized Linear Models (GLM) perform a transformation of the response via a link function

• No need for a manual transform!

• Families for glm (each with a default link function) include

• binomial (logit link): dichotomous response

• poisson (log link): count data
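To see which link a family uses, the family objects can be inspected directly. A small sketch using only base R; the probability 0.25 is an arbitrary illustrative value.

```r
fam_binom <- binomial()
fam_pois  <- poisson()

# Default link function of each family
print(fam_binom$link)
print(fam_pois$link)

# linkfun maps a mean to the linear-predictor scale; linkinv maps it back
p   <- 0.25
eta <- fam_binom$linkfun(p)
print(c(eta = eta, back = fam_binom$linkinv(eta)))
```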

Titanic GLM

... produces a huge model with mostly insignificant interactions

... a model with the interaction

... a simple model
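The model output for these slides is not reproduced here. As a rough sketch of a simple main-effects model, assuming R's built-in Titanic table rather than the more complete data set used in the talk (so Age is only Child/Adult), one could fit:

```r
# Built-in Titanic table, one row per factor combination plus a count
d <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Logistic regression; each row is weighted by the number of passengers
m_simple <- glm(Survived ~ Class + Sex + Age,
                family = binomial, weights = Freq, data = d)
print(summary(m_simple))

# Coefficients are on the log-odds scale; exponentiate for odds ratios
print(exp(coef(m_simple)))
```

The reference levels here are first class, male and child, so each coefficient describes a change in log-odds of survival relative to that baseline.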

Does Class:Age help?

Age matters when you’re a steward

Actual Prediction

• A 49-year-old second-class passenger: how likely is it that he survived?

$y' = \log y - \log(1 - y)$

$y' = \log\left(\frac{y}{1 - y}\right)$

$y = \frac{e^{y'}}{1 + e^{y'}}$
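The back-transformation from the logit scale to a probability can be written out in a few lines. This is a sketch: logit and inv_logit are helper names introduced here, and the linear-predictor values are arbitrary; base R provides the same functions as qlogis and plogis.

```r
# Logit and its inverse, written out as in the equations above
logit     <- function(y)   log(y / (1 - y))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

eta  <- c(-2, 0, 1.5)      # arbitrary values on the logit scale
prob <- inv_logit(eta)     # corresponding probabilities
print(prob)

# The hand-written versions agree with the built-in ones
print(all.equal(prob, plogis(eta)))
print(all.equal(logit(prob), qlogis(prob)))
```

For a fitted binomial glm, predict(model, newdata, type = "response") applies this inverse link and returns probabilities directly.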

Random Effects

• Most designs, both experimental and observational, involve some dependence between samples:

• time series data

• repeated measures

• spatial correlation of samples

• Mixed Effects Models include random effects and allow grouping of interdependent samples.

GLMM

• Generalized Linear Mixed-Effects Model

• Specification:

• fixed effects formula: formula = target ~ log(time) * primed

• random effects formula: random = ~ 1 | speaker

• nested F1/F2 effects: random = ~ 1 | subject/item

• Library: nlme; load with library(nlme). Functions: lme, nlme

• Library: MASS; load with library(MASS). Function: glmmPQL
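The specification above can be put together as follows. This is a minimal sketch on simulated data: the variable names target, time, primed and speaker follow the slide, but the data, effect sizes and number of speakers are invented for illustration.

```r
library(MASS)   # glmmPQL
library(nlme)   # lme machinery used by glmmPQL; provides fixef()

set.seed(1)
n_speakers <- 20
n_obs      <- 30   # observations per speaker

d <- data.frame(
  speaker = factor(rep(seq_len(n_speakers), each = n_obs)),
  time    = runif(n_speakers * n_obs, 1, 15),
  primed  = factor(sample(c("no", "yes"), n_speakers * n_obs, replace = TRUE))
)

# Hypothetical data-generating process with a random intercept per speaker
u   <- rnorm(n_speakers, sd = 0.5)
eta <- -0.5 - 0.4 * log(d$time) + 0.8 * (d$primed == "yes") +
       u[as.integer(d$speaker)]
d$target <- rbinom(nrow(d), 1, plogis(eta))

# Binomial GLMM: fixed effects formula plus random intercept for speaker
m <- glmmPQL(target ~ log(time) * primed,
             random = ~ 1 | speaker,
             family = binomial, data = d, verbose = FALSE)
print(summary(m))
```

The random = ~ 1 | speaker term groups the interdependent observations of each speaker under a shared random intercept.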

Contrasts

• To estimate effect sizes under different combinations of factors, use “within” formula notation: Survived ~ Age/Class

• “regresses out” Age before estimating effect of Class

• Use intervals with glmmPQL models to get confidence intervals for bar charts

• lme / lmer models need Markov chain Monte Carlo sampling to estimate p and confidence intervals
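Two of the points above can be sketched briefly. The first lines show how the “within” notation expands; the rest calls intervals on a mixed model. The Orthodont data shipped with nlme is an assumption for illustration, and an lme fit stands in for a glmmPQL fit, which inherits from the same class.

```r
library(nlme)

# "Within" (nesting) notation: Age/Class expands to Age + Age:Class
print(attr(terms(Survived ~ Age/Class), "term.labels"))

# Random-intercept model on a data set that ships with nlme
m <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Approximate 95% confidence intervals for the fixed effects
ci <- intervals(m, which = "fixed")
print(ci)
```

The lower and upper columns of ci$fixed can be used as error bars in a bar chart of effect sizes.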

Reporting results

Covariate                                          coefficient (βi)   Std. Error
Intercept                                          -3.778             0.025   ***
ln(DISTTime)                                       -0.057             0.015   **
ln(FREQ)                                            0.538             0.190   ***
ln(DIST) : ln(FREQ)                                 0.083             0.010   ***
ln(DIST) : (ROLE = CP)                             -0.031             0.012   *
ln(DIST) : (ROLE = PP) : (SOURCE = MapTask)        -0.050             0.014   **
ln(DIST) : (ROLE = CP) : (SOURCE = MapTask)        -0.137             0.018   ***

Table 1: The regression model for the joint data set of Switchboard and Map Task (Exp. 5). This is the minimal model without insignificant covariates. * p < 0.01, ** p < 0.005, *** p < 0.0001.

to verify the hypothesis using time as the relevant decay correlate. We do so in Experiments 4 and 5.

Exp. 4: Pre-activation decay: over time, or with each utterance?

While the previous experiments have shown that repetition probability decays soon after any stimulus, it is unclear whether the pre-activation diminishes with time, or with actual linguistic activity. To some extent, corpora can help make that distinction.

The differences between conversational and task-oriented dialogue that we pointed out (Experiment 3) are founded on the correlation of distance between prime and target and repetition likelihood. This correlation is likely to be sensitive to the scale of DISTANCE. As an alternative, we can use the delay between the left boundaries of the priming and target phrases as the relevant predictor.

The models discussed measure the distance between prime and target in utterances. In this experiment, we fitted a second regression model, estimating decay over time.

To compare the two (obviously interrelated) predictors DISTTime and DISTUtts, we estimated two simple linear regression models, one for time, the other one for number of utterances as predictor. Such regression models can, as opposed to GLMMs, produce a meaningful R² measure. In these models, we include the maximum-likelihood estimate of the number of chance repetitions, which is calculated from the overall frequency of each syntactic rule (this is in addition to the covariates discussed before). The response variable here is not binary, as in the other experiments, but a count of actual rule repetitions. The complete interaction term is rep ~ ln(DISTUtts) * ROLE * SOURCE + EXPECTED (see footnote 4).

The goodness-of-fit measure R² helps us determine how much of the variance in our data is explained by the model.

Results

For distance over utterances, R² is 0.91, for time (in 1-second buckets) it is 0.89, a similar size.

Thus, there is no compelling empirical evidence to assume DISTTime as a predictor over the work-load-based one (using utterance distance) chosen before. Because we cannot reasonably opt for one of the alternatives, we will reevaluate the effect of corpus choice seen in Experiment 3, this time using DISTTime.

Footnote 4: These models assume a normal distribution as opposed to the appropriate Poisson one.

Exp. 5: Priming over time

While time- and utterance-based models fit their respective data similarly well, time is a theoretically attractive measure of distance, in particular because the utterance is difficult to delineate in the context of speech.

The methodology of this experiment is as in Experiment 3, except that DISTTime is the distance predictor, instead of the DISTUtts used previously.

Results

The interaction of corpus type and priming decay found in Experiment 3 holds. CP priming is stronger in task-oriented dialogue. Table 1 contains the estimated model.

The model based on temporal distance makes essentially comparable predictions. The SOURCE has an interaction effect on the priming decay ln(DIST), both for CP priming (β_lnDist:CP:MapTask = -0.137, t = -7.6, p < 0.0001) and for PP priming (β_lnDist:PP:MapTask = -0.050, t = -3.7, p < 0.0005). Figures 2 and 3 provide the predictions for the four combinations of ROLE and SOURCE.

Discussion

Both corpora of spoken dialogue we investigated showed an effect of distance between prime and target in syntactic repetition, thus providing evidence for a structural priming effect for arbitrary syntactic rules. In both corpora, we also found reliable effects of both production-production (PP) priming (self-priming) and comprehension-production (CP) priming. But only in the Map Task, a corpus of task-oriented dialogue, did we find evidence for stronger CP priming than PP priming.

A possible explanation for these results is the reduced cognitive load that we can reasonably assume for spontaneous, everyday conversation (as in the Switchboard corpus). Pickering and Garrod (2004) suggest that interlocutors reduce their workload by aligning their linguistic and semantic representations, as re-using structure is easier than creating it. As cognitive load in non-task-oriented, spontaneous conversation is low, speakers reduce the amount of priming that is required in dialogue that relates to a difficult task. The fact that we consistently see stronger priming for less frequent syntactic rules supports the cognitive-load explanation: frequently used rules are more accessible, hence their representations need less pre-activation.

Another reason may simply be that interlocutors in Switchboard (as in all spontaneous dialogue) switch topics frequently, engaging in longer turns in between. Such a sequence of monologues may, in general, be less affected by

[Figure 2: bar chart with bars Time CP MAPT, Time PP MAPT, Time CP SWBD, Time PP SWBD, Utts CP MAPT, Utts PP MAPT, Utts CP SWBD, Utts PP SWBD; axis ticks from 0.00 to 0.35]

Figure 2: Priming effect sizes (ln(DIST)) under different ROLE and SOURCE situations. Prime-target distance by number of utterances (Exp. 3) and seconds (Exp. 5). 95% CI. Effects estimated from separately fitted nested regression models on separately sampled datasets.

Again, a GLMM was built to correlate priming condition with the set of factors and predictors.

Results

Once again we find that repetition is more likely the shorter the distance between prime and target utterances is. Unlike in Switchboard, interlocutors repeat each other’s syntactic structures more readily and more similarly to the way they repeat their own structures.

The model showed a reliable effect of ln(DIST) (t = -71.2, p < 0.005).

ROLE had a reliable constant effect on repetition rates (t = -11.0, p < 0.0001), but there was no interaction between ROLE and DIST (p = 0.92).

This finding confirms experimental results by Bock and Griffin (2000) and Branigan et al. (1999), who find syntactic priming over longer distances, even though the effect decays. (The effect of ROLE on bias may be related to speaker idiosyncrasies, i.e. more chance repetition within speakers.)

To determine whether there is a significant influence of dialogue type on priming, comparing the effects we have seen in Experiments 1 and 2, we built a further model, described in the next section.

Exp. 3: Comparing corpora

With their Interactive Alignment Model, Pickering and Garrod (2004) argue that the situation-model alignment of speakers is due to lower-level priming effects. In task-oriented dialogue, and in the task carried out by participants in Map Task, speakers need to align in order to successfully complete their tasks. Thus, the theory would predict that syntactic priming between speakers (CP) is greater in task-oriented dialogue.

We test this hypothesis by fitting a model of the joint data set with SOURCE as a binary factor, indicating whether a repetition stems from Map Task (task-oriented) or Switchboard (not task-oriented). From Map Task, only dialogues in which interlocutors could not see one another were included.

[Figure 3: line plot; x-axis: temporal distance between prime and target (seconds), 2 to 14; y-axis: p(prime=target|target,distance), 0.008 to 0.016; lines: Map Task PP, Switchboard PP, Map Task CP, Switchboard CP]

Figure 3: Decaying repetition probability estimates depending on the increasing distance between prime and target, contrasting different ROLE and SOURCE situations. (Exp. 5)

Results

As seen in the previous experiments, it can make a difference whether a speaker primes themselves or is primed by their interlocutor. Interestingly, the gap between CP and PP priming is substantially affected by the choice of corpus (last two interactions in Table 1). In both corpora, we find a positive PP priming effect. However, in Map Task, CP and PP priming cannot be distinguished (cf. Experiment 2), while in Switchboard, there is little CP priming (cf. Experiment 1). Figure 2 (first four bars) provides the resulting priming strength estimates for the four factorial combinations of ROLE and SOURCE at increasing distance. Also, priming is stronger for less frequent rules.

For Switchboard, the model estimates a higher coefficient for ln(DIST), suggesting that there was faster decay in Map Task (Baseline effect of ln(DIST): β_lnDist = -0.092, p < 0.0001; β_lnDist:CP = 0.083, p < 0.0001; β_lnDist:MapTask = -0.044, p = 0.05; β_lnDist:CP:MapTask = -0.140, p < 0.0001). Frequency is negatively correlated with decay (β_lnDist:lnFreq = 0.049, p < 0.0001).

Finding the marked difference between CP and PP priming, and also a clear PP priming effect in spontaneous conversation, extends Dubey et al. (2005), who do not find reliable evidence of adaptation within speakers in Switchboard for selected syntactic rules in coordinate structures.

Thus, the data is consistent with the hypothesis that semantic alignment in dialogue is based on lower-level (syntactic) priming. However, when comparing data across corpora, we need to be careful to ensure that differences in genre and annotation are not the primary cause of the effect at hand. The coefficient for pre-activation decay is sensitive to utterance length, which becomes an issue for instance when utterances are not consistently marked or if decay occurs over time and not with utterances. Indeed, most utterances in Switchboard are actually dialogue turns, and given the genre, they are usually longer than those in Map Task. Therefore, it makes sense


Model Checking

• Use plot(model) to show diagnostic graphs

• shows four diagnostics, incl. Q-Q and residuals vs. fitted

• Interpretation: see Crawley 2005
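A minimal sketch of the diagnostic call, assuming the built-in cars data as a stand-in for a fitted model from the talk:

```r
# Simple linear model on a built-in data set (illustration only)
m <- lm(dist ~ speed, data = cars)

if (!interactive()) pdf(NULL)   # avoid writing a plot file when run as a script

# Show the four default diagnostic plots in a 2 x 2 grid:
# residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage
op <- par(mfrow = c(2, 2))
plot(m)
par(op)   # restore the previous graphics settings
```

Without the par(mfrow = ...) call, an interactive session shows the four plots one after another.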

ideal case!

Count data, with linear model (no transformation)

(Model does not show effects!)

i <- runif(60); j <- runif(50)
p <- rbind(
  data.frame(r = as.integer(rpois(60, 40) * (i * 3)), t = 'a', i = i),
  data.frame(r = as.integer(rpois(50, 45) * (j * 2.5)), t = 'b', i = j))

Count data, with Poisson GLM (log link instead of a manual log(1+y) transform)

(Model shows two main effects and the interaction)

i <- runif(60); j <- runif(50)
p <- rbind(
  data.frame(r = as.integer(rpois(60, 40) * (i * 3)), t = 'a', i = i),
  data.frame(r = as.integer(rpois(50, 45) * (j * 2.5)), t = 'b', i = j))
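The slides show only the data-generating code. The following sketch fits both models to data simulated in the same way; the formula r ~ t * i and the seed are assumptions, since the model calls behind the diagnostic plots are not shown.

```r
set.seed(7)   # arbitrary seed, for reproducibility only
i <- runif(60); j <- runif(50)
p <- rbind(
  data.frame(r = as.integer(rpois(60, 40) * (i * 3)), t = "a", i = i),
  data.frame(r = as.integer(rpois(50, 45) * (j * 2.5)), t = "b", i = j))
p$t <- factor(p$t)

# Linear model on the raw counts (assumes normally distributed errors)
m_lm   <- lm(r ~ t * i, data = p)
# Poisson GLM: models the log of the expected count
m_pois <- glm(r ~ t * i, family = poisson, data = p)

print(coef(summary(m_lm)))
print(coef(summary(m_pois)))
```

Calling plot(m_lm) and plot(m_pois) then gives the two sets of diagnostics to compare.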

To read

• H. Baayen: Practical Data Analysis for the Language Sciences with R. To Appear.

• S. Vasishth: The foundations of statistics: A simulation-based approach. In Prep. Download: http://www.ling.uni-potsdam.de/~vasishth/SFLS.html

• M. J. Crawley: Statistics. An Introduction using R, Wiley 2005

• W.N. Venables & B.D. Ripley: Modern Applied Statistics with S. Springer 2002

GLMs in R: Don’t panic!

Picture: http://www.solarnavigator.net
