Alison Bowling: Maximum Likelihood. General Linear Model
ALISON BOWLING
MAXIMUM LIKELIHOOD
GENERAL LINEAR MODEL
• $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$
• Residuals are:
  • Independent and identically distributed
  • Normally distributed
  • Mean 0, variance $\sigma^2$
• What to do when the normality assumption does not hold?
• We can fit an alternative distribution.
  • This requires Maximum Likelihood methods.
ALTERNATIVE DISTRIBUTIONS
• Binomial (proportions)
  • P (event occurring), 1 - P (event not occurring)
• Poisson (count data)
MAXIMUM LIKELIHOOD
• Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47, 90–100.
• Standard approach to parameter estimation and inference in statistics.
• Many of the inference methods in statistics are based on MLE:
  • Chi-square test
  • Bayesian methods
  • Modelling of random effects
PROBABILITY DISTRIBUTIONS
• Imagine a biased coin, with probability of heads w = 0.7, tossed 10 times.
• The following probability distribution can be computed using the binomial formula.
[Figure: probability of each result, f(y), plotted against the number of heads (0 to 10)]
• This is a probability distribution: the probability of obtaining each particular outcome for 10 tosses of a coin with w = .7.
• 7 heads are more likely to occur than any other number of heads.
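The distribution in the figure can be reproduced with the binomial probability formula, $f(y) = \binom{n}{y} w^y (1-w)^{n-y}$. A minimal Python sketch, assuming Python 3.8+ for `math.comb`:

```python
from math import comb

def binomial_pmf(y: int, n: int, w: float) -> float:
    """Probability of y heads in n tosses when P(heads) = w."""
    return comb(n, y) * w**y * (1 - w) ** (n - y)

n, w = 10, 0.7
probs = [binomial_pmf(y, n, w) for y in range(n + 1)]

for y, p in enumerate(probs):
    print(f"{y:2d} heads: {p:.4f}")

# The most probable outcome is the y with the largest probability.
most_likely = max(range(n + 1), key=lambda y: probs[y])
print("Most likely number of heads:", most_likely)  # 7
```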
LIKELIHOOD FUNCTION
• Suppose we don’t know w, but have tossed the coin 10 times and obtained y = 7 heads.
• What is the most likely value of w?
• This may be obtained from the likelihood function.
• This is a function of the parameter, w, given the data, y.
• The most likely value of w is at the peak of this function.
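The likelihood function for the coin example holds the data fixed (y = 7 heads in 10 tosses) and varies the parameter: $L(w \mid y) = \binom{10}{7} w^7 (1-w)^3$. A minimal sketch that evaluates it on a grid of candidate w values and picks the peak; the grid step of 0.01 is an arbitrary choice for illustration:

```python
from math import comb

def likelihood(w: float, y: int = 7, n: int = 10) -> float:
    """Likelihood of parameter w given y heads observed in n tosses."""
    return comb(n, y) * w**y * (1 - w) ** (n - y)

# Candidate values of w between 0.01 and 0.99 in steps of 0.01.
grid = [i / 100 for i in range(1, 100)]
values = [likelihood(w) for w in grid]

best_w = grid[values.index(max(values))]
print("w at the peak of the likelihood:", best_w)  # 0.7
```

A grid search is only practical for one or two parameters; the iterative methods described later scale better.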
MAXIMUM LIKELIHOOD ESTIMATION
• We are interested in finding the probability distribution that underlies the data that have been collected.
• We are consequently interested in finding the parameter value(s) that correspond to this probability distribution.
• The ML estimate is the parameter value at the maximum (peak) of the likelihood function.
  • This may be found by setting the first derivative of the likelihood function to zero.
  • To make sure this is a peak (and not a valley), the second derivative is also checked: it must be negative at a peak.
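For the coin example (y = 7 heads in n = 10 tosses), the calculus can be written out on the log scale, which has its peak at the same value of w as the likelihood itself:

```latex
\begin{aligned}
\ln L(w) &= \ln\binom{10}{7} + 7\ln w + 3\ln(1-w) \\
\frac{d\,\ln L}{dw} &= \frac{7}{w} - \frac{3}{1-w} = 0
  \;\Rightarrow\; 7(1-w) = 3w
  \;\Rightarrow\; \hat{w} = \frac{7}{10} = 0.7 \\
\frac{d^2\ln L}{dw^2} &= -\frac{7}{w^2} - \frac{3}{(1-w)^2} < 0
  \quad\text{for all } 0 < w < 1
\end{aligned}
```

Because the second derivative is negative, the stationary point at w = 0.7 is a peak, not a valley.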
ITERATIVE METHOD
• For very simple scenarios, the maximum can be obtained using calculus, as in the coin example.
• This is usually not possible, especially when the model involves many parameters.
• Instead, the maximum is found by an iterative series of trial-and-error steps:
  • Start with a value of a parameter, w, and compute the likelihood of the observed data given that value.
  • Then try another value, and see if the likelihood is higher.
  • If so, keep going.
  • Stop when the maximum is found (the solution converges).
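The trial-and-error procedure can be sketched as a simple step-and-shrink search on the coin example's log likelihood. This is an illustrative hill-climbing routine, not the algorithm any particular package uses; the starting value (0.5), initial step size and tolerance are arbitrary assumptions.

```python
from math import log

def log_likelihood(w: float, y: int = 7, n: int = 10) -> float:
    # The constant ln C(n, y) is dropped: it does not change where the peak is.
    return y * log(w) + (n - y) * log(1 - w)

def climb(w: float = 0.5, step: float = 0.1, tol: float = 1e-6) -> float:
    current = log_likelihood(w)
    while step > tol:
        moved = False
        for candidate in (w + step, w - step):
            # Only try values inside the valid range for a probability.
            if 0 < candidate < 1 and log_likelihood(candidate) > current:
                w, current = candidate, log_likelihood(candidate)
                moved = True
                break
        if not moved:
            step /= 2  # No improvement in either direction: take smaller steps.
    return w

print(round(climb(), 4))  # 0.7
```

Each accepted move strictly increases the log likelihood, and the step size halves whenever neither direction helps, so the search stops once the step falls below the tolerance.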
MLE ALGORITHMS
• Different algorithms are used to obtain the result:
  • EM: expectation-maximisation algorithm
  • Newton-Raphson
  • Fisher scoring
• SPSS uses both the Newton-Raphson and the Fisher scoring method.
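To show how Newton-Raphson and Fisher scoring differ, here is a minimal sketch of textbook versions of both on the coin example (not SPSS's internal implementation). Newton-Raphson divides by the observed second derivative of the log likelihood; Fisher scoring uses its expected value instead, the Fisher information, which is n / (w(1 - w)) for a binomial proportion. The starting value and tolerance are assumptions for illustration.

```python
def score(w: float, y: int, n: int) -> float:
    """First derivative of the binomial log likelihood with respect to w."""
    return y / w - (n - y) / (1 - w)

def observed_hessian(w: float, y: int, n: int) -> float:
    """Second derivative of the binomial log likelihood."""
    return -y / w**2 - (n - y) / (1 - w) ** 2

def expected_information(w: float, n: int) -> float:
    """Fisher information for a binomial proportion."""
    return n / (w * (1 - w))

def newton_raphson(y: int, n: int, w: float = 0.5, tol: float = 1e-10) -> float:
    for _ in range(100):
        step = score(w, y, n) / observed_hessian(w, y, n)
        w -= step  # Hessian is negative, so this moves uphill
        if abs(step) < tol:
            break
    return w

def fisher_scoring(y: int, n: int, w: float = 0.5, tol: float = 1e-10) -> float:
    for _ in range(100):
        step = score(w, y, n) / expected_information(w, n)
        w += step
        if abs(step) < tol:
            break
    return w

print(newton_raphson(7, 10))  # approximately 0.7
print(fisher_scoring(7, 10))  # approximately 0.7
```

For this simple model Fisher scoring jumps to y/n in a single step; with realistic models both methods typically need several iterations.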
LOG LIKELIHOOD
• The computation of likelihood involves multiplying the probabilities of each individual outcome.
  • This can be computationally intensive, and with many observations the product becomes vanishingly small.
• For this reason, the log of the likelihood is computed instead.
  • Instead of multiplying the probabilities, their logs are added: $\log(A \times B) = \log A + \log B$.
• We maximise the log of the likelihood rather than the likelihood itself, for computational convenience. Because the log is an increasing function, both have their peak at the same parameter value.
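With many observations, the product of probabilities becomes too small to represent as a floating-point number, while the sum of log probabilities stays well-behaved. A sketch with 2,000 hypothetical observations, each with probability 0.1 under the model (assumes Python 3.8+ for `math.prod`):

```python
import math

probs = [0.1] * 2000  # hypothetical per-observation probabilities

likelihood = math.prod(probs)  # multiplying: 0.1 ** 2000 underflows
log_likelihood = sum(math.log(p) for p in probs)  # adding the logs instead

print(likelihood)      # 0.0 (underflow)
print(log_likelihood)  # about -4605.17
```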
-2LL
• The log likelihood is the sum of the log probabilities of the observed outcomes, given the model's predictions.
• -2LL is analogous to the residual sum of squares in OLS regression: the larger the -2LL (i.e. the more negative the log likelihood), the greater the unexplained variance.
• The log likelihood is usually negative, and can be made positive by adding the negative sign.
• We also multiply by 2, because the difference in -2LL between nested models follows a chi-square distribution, which lets us obtain p values to compare models.
• The resulting value is -2LL.
EVALUATING MODELS
• Using OLS, we use R² to evaluate models,
  • i.e. does the addition of a predictor produce a significant increase in R²?
• R² is based on sums of squares, which we do not have when using ML.
• We use the -2LL, deviance, and information criteria to evaluate models using ML.
• Unlike R², -2LL is not meaningful in its own right.
  • It is used to compare one model with other models.
DEVIANCE
• Deviance is a measure of lack of fit.
  • It measures how much worse the model is than a perfectly fitting model.
• Deviance can be used to obtain a measure of pseudo-R².
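One common deviance-based definition (assumed here; Coxe et al., 2009, discuss alternatives) writes deviance as twice the gap in log likelihood between a perfectly fitting (saturated) model and the fitted model, and compares the fitted model's deviance with that of the intercept-only model:

```latex
\begin{aligned}
D_{\text{model}} &= 2\left(LL_{\text{saturated}} - LL_{\text{model}}\right) \\
R^2_{\text{deviance}} &= 1 - \frac{D_{\text{model}}}{D_{\text{intercept-only}}}
\end{aligned}
```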
LIKELIHOOD RATIO STATISTIC
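• $L_R$ = likelihood of the reduced model (without the parameters being tested)
• $L_F$ = likelihood of the full model (with the parameters)
• $G^2 = -2\ln(L_R / L_F) = (-2LL_{\text{reduced}}) - (-2LL_{\text{full}}) \sim \chi^2_r$, where $r = df_{\text{full}} - df_{\text{reduced}}$
• Here, G² compares the fitted model with the intercept-only model.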
MAXIMUM LIKELIHOOD IN SPSS
• Logistic regression
  • Used with a binary (two-category) outcome variable
  • E.g. yes/no; correct/incorrect; married/not married.
• Generalized Linear Models
  • Allow a range of models with non-normal outcome distributions (e.g. Poisson, negative binomial) to be fitted.
BAR-TAILED GODWIT DATA
• Dependent variable is a count:
  • Maximum number of birds observed at each estuary for each year
• Independent variables
  • Estuary: Richmond, Hastings, Clarence, Hunter, Tweed (categorical)
  • Year: 1981 to 2014 (continuous, centred to 0 at 1981: Year0)
• Research question:
  • Does the number of Bar-tailed Godwits in the Richmond Estuary remain stable, or improve, compared to the other estuaries?
STEP 1: GRAPH THE DATA
It is obvious that these data have problems.
Counts in the Hunter estuary are much higher than the other estuaries, and have much greater variance.
STEP 2: DUMMY CODE THE ESTUARY DATA
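| Estuary | Clarence | Hunter | Hastings | Tweed |
|---|---|---|---|---|
| Richmond | 0 | 0 | 0 | 0 |
| Clarence | 1 | 0 | 0 | 0 |
| Hunter | 0 | 1 | 0 | 0 |
| Hastings | 0 | 0 | 1 | 0 |
| Tweed | 0 | 0 | 0 | 1 |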
Use Richmond as the comparison category. Each of the other estuaries may then be compared in turn with Richmond.
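The dummy coding can be sketched in Python. The function name and data layout are hypothetical illustrations, not part of the original analysis:

```python
ESTUARIES = ["Richmond", "Clarence", "Hunter", "Hastings", "Tweed"]
REFERENCE = "Richmond"  # comparison category: all of its dummies are 0

def dummy_code(estuary: str) -> dict[str, int]:
    """Return one 0/1 dummy variable per non-reference estuary."""
    if estuary not in ESTUARIES:
        raise ValueError(f"Unknown estuary: {estuary}")
    return {name: int(estuary == name) for name in ESTUARIES if name != REFERENCE}

for estuary in ESTUARIES:
    print(estuary, dummy_code(estuary))
```

With five estuaries only four dummies are needed; a fifth would be redundant with the intercept.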
STEP 3: RUN OLS ANALYSIS OF THE DATA
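• To illustrate, I will include just the Hunter estuary in this analysis.
• Model: $\text{Count} = b_0 + b_1\,\text{Year0} + b_2\,\text{Hunter} + b_3\,(\text{Hunter} \times \text{Year0}) + e$
• Including just Year0 as a predictor, there is a non-significant change in Godwit numbers over the years.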
OLS DATA ANALYSIS
• Including Hunter and the Hunter × Year0 interaction:
There is a significant increase in R² when Hunter and the Hunter × Year0 interaction are included in the model.
INTERPRETATION OF THE FULL MODEL
• At Year0 = 0 (1981), the predicted number of Godwits for Richmond = 292 birds.
• Change in numbers per year for Richmond = -4.4 birds.
• At Year0 = 0, the difference between numbers in the Hunter and Richmond = 1449.7 birds (p < .001).
• The difference in the rate of change per year for Hunter, compared with Richmond, is -15.2 birds (p = .031),
  • i.e. there is a steeper decline in bird numbers in the Hunter estuary than in the Richmond estuary.
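The four coefficients define two straight lines, one per estuary. A small sketch that turns the reported OLS coefficients into predicted counts; the function is illustrative, and Year0 is treated as years since 1981:

```python
# OLS coefficients from the full model (Richmond is the reference estuary).
B0 = 292.0         # intercept: Richmond at Year0 = 0
B_YEAR = -4.4      # change per year for Richmond
B_HUNTER = 1449.7  # Hunter minus Richmond at Year0 = 0
B_INTER = -15.2    # extra change per year for Hunter (Hunter x Year0)

def predicted_count(year0: float, hunter: int) -> float:
    """Predicted maximum Godwit count; hunter is 1 for Hunter, 0 for Richmond."""
    return B0 + B_YEAR * year0 + B_HUNTER * hunter + B_INTER * hunter * year0

print(predicted_count(0, 0))  # Richmond in 1981
print(predicted_count(0, 1))  # Hunter in 1981
# Yearly slopes: Richmond -4.4; Hunter -4.4 + (-15.2) = -19.6
```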
CHECKING RESIDUALS
• Residuals are not normally distributed.
• The assumptions for a linear model are not met!
WHAT TO DO?
• We could try a transformation of the DV.
  • A square root transformation is better, but not perfect.
• We could fit a model with a non-normal outcome distribution.
  • The data are counts, so we could use either a Poisson or a negative binomial distribution.
  • We will use a negative binomial (for reasons that will be explained later).
  • Use Generalized Linear Models for the analysis.
INTERCEPT ONLY MODEL
• No predictors are included; the model estimates only the overall mean number of BT Godwits.
• The log likelihood is -827.26.
• -2LL = 1654.53
MODEL WITH THREE PARAMETERS
• Running the model including Year0, Hunter and Hunter × Year0 gives the following goodness-of-fit measures:
  • Log likelihood = -781.3
  • -2LL = 1562.6
COMPARING THE TWO MODELS
• -2LL for the intercept-only model = 1654.53
• -2LL for the full model (with parameters) = 1562.6
• Likelihood ratio (G²) = 1654.5 - 1562.6 = 91.9
  • df = 3, p < .001
• Therefore the model including the three parameters is a better fit to the data than the intercept-only model.
• Limitations:
  1. The models must be nested (one model must be contained within the other).
  2. The data sets must be identical.
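A minimal sketch of this likelihood ratio test using only the Python standard library. For 3 df the chi-square upper tail has a closed form, computed here with `math.erfc`; with a statistics package you would normally call its chi-square survival function instead. The helper name `chi2_sf_df3` is hypothetical.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail p value of a chi-square distribution with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

neg2ll_intercept_only = 1654.53
neg2ll_full = 1562.6

g2 = neg2ll_intercept_only - neg2ll_full  # likelihood ratio statistic
p_value = chi2_sf_df3(g2)  # df = 3 extra parameters (Year0, Hunter, Hunter x Year0)

print(f"G2 = {g2:.2f}, p = {p_value:.2e}")
```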
INFORMATION CRITERIA
• Akaike’s Information Criterion: AIC = -2LL + 2k
• Schwarz’s Bayesian Criterion: BIC = -2LL + k ln(N)
  • k = number of parameters
  • N = number of participants
• Can be used with non-nested models.
• Like adjusted R², these IC penalise model complexity.
• The more parameters you have, the better a model is likely to fit the data.
• The IC take this into account by penalising additional parameters; the BIC penalty also grows with the number of participants.
• Better fitting models have lower values of the IC.
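A minimal sketch of both criteria, applied to the Poisson drinks models reported later in these notes (-2LL = 2075 with k = 2 for sensation only; -2LL = -2 × (-941.4) = 1882.8 with k = 3 for sensation and gender). The sample size N = 400 is an assumption, though it reproduces the reported BIC values of 2087 and 1900.77.

```python
import math

def aic(neg2ll: float, k: int) -> float:
    """Akaike's Information Criterion."""
    return neg2ll + 2 * k

def bic(neg2ll: float, k: int, n: int) -> float:
    """Schwarz's Bayesian Criterion: the penalty grows with ln(n)."""
    return neg2ll + k * math.log(n)

N = 400  # assumed sample size for the drinks data

# k counts the intercept as well as the predictors.
sensation_only = bic(2075.0, k=2, n=N)
sensation_gender = bic(1882.8, k=3, n=N)

print(round(sensation_only, 2), round(sensation_gender, 2))
# The lower BIC indicates the better model after the complexity penalty.
```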
ANALYSIS OF COUNT DATA
• Coxe, S., West, S. G., & Aiken, L. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91, 121–136.
• Poisson regression
• Overdispersed Poisson regression models
• Negative binomial regression models
• Models that address problems with zeros
ANALYSIS OF COUNT DATA
• Count data are discrete, non-negative whole numbers.
  • Usually not normally distributed.
  • E.g. number of drinks on a Saturday night.
• Modelled by a Poisson distribution.
  • This has one parameter, μ.
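The Poisson probability of observing y events is $P(Y = y) = \mu^y e^{-\mu} / y!$. A minimal sketch, assuming a hypothetical mean of 2.5 drinks, that also checks numerically that the single parameter μ is both the mean and the variance:

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson distribution with mean mu."""
    return mu**y * math.exp(-mu) / math.factorial(y)

mu = 2.5  # hypothetical mean number of drinks
support = range(60)  # probabilities beyond 60 are negligible for mu = 2.5
probs = [poisson_pmf(y, mu) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(f"mean = {mean:.4f}, variance = {variance:.4f}")  # both equal mu
```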
POISSON MODEL
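• Assumptions: $Y \mid X \sim \text{Poisson}(\mu)$, $\operatorname{Var}(Y \mid X) = \phi\mu$, with $\phi = 1$
• i.e. Y, conditional on X, has a Poisson distribution, whose variance equals its mean.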
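Poisson regression is usually written with a log link, which is why the coefficients in the drinks example are interpreted after exponentiating them. A sketch for a single predictor X (the slide's own model equation is not shown):

```latex
\ln \mu = b_0 + b_1 X
\quad\Longleftrightarrow\quad
\mu = e^{b_0}\left(e^{b_1}\right)^{X}
```

So $e^{b_0}$ is the expected count when X = 0, and each one-unit increase in X multiplies the expected count by $e^{b_1}$.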
EXAMPLE: DRINKS DATA
• Coxe et al.’s Poisson dataset, in SPSS format:
  • Sensation: mean score on a sensation seeking scale (1-7)
  • Gender (0 = female, 1 = male)
  • Y: number of drinks on a Saturday night
OLS REGRESSION
• Intercept < 0
  • When sensation = 0, the predicted number of drinks is negative!
• Residuals are not normally distributed.
• OLS has problems!
POISSON REGRESSION: PARAMETERS
• Sensation only
• When sensation = 0, drinks = $e^{-.14} \approx .87$
• For every 1 unit increase in sensation, the number of drinks is multiplied by $e^{.231} \approx 1.26$.
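A short sketch of converting the log-scale coefficients into counts and rate ratios. The sensation coefficient must be +.231 for the multiplier to equal 1.26 (e raised to -.231 would give about .79). The `expected_drinks` helper is illustrative only.

```python
import math

intercept = -0.14    # log expected drinks when sensation = 0
b_sensation = 0.231  # change in log expected drinks per unit of sensation

print(round(math.exp(intercept), 2))    # about 0.87 drinks at sensation = 0
print(round(math.exp(b_sensation), 2))  # about 1.26: multiplier per unit

def expected_drinks(sensation: float) -> float:
    """Expected number of drinks under the sensation-only Poisson model."""
    return math.exp(intercept + b_sensation * sensation)
```

Because the model is linear on the log scale, the ratio between any two adjacent sensation scores is the same constant multiplier.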
POISSON REGRESSION: MODEL FIT
• Sensation only: model fit
• G² = 35.07
  • The model fits better than the intercept-only model.
• Deviance = 1151
• -2LL = -2 × (-1037.5) = 2075
• BIC = 2087
• Deviance for the intercept-only model = 1186 (1151 + 35.07, consistent with G²)
• Pseudo-R2 =
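Assuming the deviance-based definition, 1 - D_model / D_intercept-only, the pseudo-R² for the sensation-only model follows from the two deviances reported above:

```python
deviance_model = 1151.0
deviance_intercept_only = 1186.0

pseudo_r2 = 1 - deviance_model / deviance_intercept_only
print(round(pseudo_r2, 3))  # 0.03
```

A value this small is typical for count models and should be read alongside G² and the information criteria rather than on its own.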
POISSON REGRESSION: PARAMETERS
• Sensation and Gender as predictors
• What is the effect of gender on number of drinks consumed (holding sensation constant)??
EFFECT OF GENDER
• Intercept = -.789 (for gender = 0; female)
  • Exp(-.789) = .45
  • Females with a sensation seeking score of 0 are predicted to have .45 drinks on a Saturday night.
• B = .839 (gender = 1: male)
  • Exp(.839) = 2.3
  • Males are predicted to drink 2.3 times as many drinks as females, at any given level of sensation seeking.
POISSON REGRESSION: MODEL FIT
• -2LL = -2 × (-941.4) = 1882.8
• BIC = 1900.77
• The model including gender is a substantially better fit than the sensation-only model
  • (BIC 1900 vs 2087)
Pseudo-R2 =
MODEL ADEQUACY
• Save deviance residuals and predicted values, and plot the residuals against predicted values.
OVERDISPERSION
• A Poisson distribution has only one parameter, μ, which is both the mean and the variance of the distribution.
• Often the variance of a set of data is greater than the mean.
  • The data are then overdispersed.
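A crude descriptive check compares the sample variance with the sample mean; a ratio well above 1 suggests overdispersion. The counts below are hypothetical, and in a regression the check should really be made conditional on the predictors (e.g. Pearson chi-square divided by its degrees of freedom).

```python
import statistics

# Hypothetical counts, for illustration only.
counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 12, 20]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, ratio = {variance / mean:.2f}")
```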
OVERDISPERSED POISSON REGRESSION MODELS
• A second parameter, φ, is estimated to scale the variance.
• The coefficient estimates from the overdispersed model are the same as with the simple Poisson model, but the standard errors are larger.
• Use information criteria to compare models.
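One common way to estimate the scale parameter (assumed here) is the Pearson chi-square statistic divided by the residual degrees of freedom; each Poisson standard error is then multiplied by the square root of φ, while the coefficients stay the same. The input values below are hypothetical.

```python
import math

def dispersion(pearson_chi2: float, n: int, n_params: int) -> float:
    """Estimate phi as Pearson chi-square divided by residual df."""
    return pearson_chi2 / (n - n_params)

def overdispersed_se(poisson_se: float, phi: float) -> float:
    """Scale a Poisson standard error by the square root of phi."""
    return poisson_se * math.sqrt(phi)

# Hypothetical values, for illustration only.
phi = dispersion(pearson_chi2=1200.0, n=400, n_params=3)
print(round(phi, 2), round(overdispersed_se(0.05, phi), 3))
```

When φ is close to 1 the overdispersed and simple Poisson models give nearly identical standard errors.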
NEGATIVE BINOMIAL MODELS
• Negative binomial models start from a Poisson distribution, but allow each individual’s mean to vary (following a gamma distribution), so the variance can exceed the mean.
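A simulation sketch of the idea: give every individual their own Poisson mean drawn from a gamma distribution, and the pooled counts follow a negative binomial with variance μ + μ²/shape. The mean (5), gamma shape (2), seed and sample size are arbitrary assumptions, and `poisson_draw` is a hypothetical helper (the standard library has no Poisson sampler).

```python
import math
import random
import statistics

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for small to moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(2024)
mu, shape = 5.0, 2.0  # hypothetical mean and gamma shape

# Each individual's Poisson mean comes from a gamma distribution with
# mean mu (scale = mu / shape); the resulting counts are negative binomial.
draws = [poisson_draw(rng.gammavariate(shape, mu / shape), rng) for _ in range(20_000)]

print(f"mean = {statistics.mean(draws):.2f}")
print(f"variance = {statistics.variance(draws):.2f}")  # near mu + mu**2 / shape = 17.5
```

A plain Poisson with the same mean would have a variance of about 5, so the extra spread comes entirely from individuals differing in their means.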
HOMEWORK
• Use PGSI Data.sav (Leigh’s Honours data)
  • DV = PGSI (score on the Problem Gambling Severity Index)
  • Predictors = GABS, FreqCoded
• Run a Poisson regression to predict PGSI from GABS.
  • Does GABS significantly predict PGSI score?
  • Look at the likelihood ratio (G²).
  • Interpret the coefficients for the intercept and GABS.
• Run a second regression including FreqCode (as a continuous variable) in the model.
  • Does this second predictor improve the model fit?
  • (Hint: look at the BIC for the two models.)