stats 330: lecture 32

32
© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32

Upload: talbot

Post on 23-Feb-2016

51 views

Category:

Documents


0 download

DESCRIPTION

Stats 330: Lecture 32. Course Summary. The Exam!. 25 multiple choice questions, similar to term test (60% for 330, 50% for 762) 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1

Stats 330: Lecture 32

Page 2: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 2

The Exam!• 25 multiple choice questions, similar to

term test (60% for 330, 50% for 762)• 3 “long answer” questions, similar to past

exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions

• (STATS 762) You have to do a compulsory extra question

• Held on am of Wed 31th October 2012

Page 3: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 3

Help Schedule• I will be available in Rm 265 from 10:30

am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting

– Tutors: as per David Smiths’s email

Page 4: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 4

STATS 330: Course Summary

The course was about

• Graphics for data analysis

• Regression models for data analysis

Page 5: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 5

GraphicsImportant ideas:• Visualizing multivariate data

– Pairs plots– 3d plots– Coplots– Trellis plots

• Same scales• Plots in rows and columns

• Diagnostic plots for model criticism

Page 6: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 6

Regression models

We studied 3 types of regression:

– “Ordinary” (normal, least squares) regression for continuous responses

– Logistic regression for binomial responses– Poisson regression for count responses

(log-linear models)

Page 7: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 7

Normal regression• Response is assumed to be N(m,s2)• Mean is a linear function of the covariates m = b0 + b1x1 + . . . + bkxk

• Covariates can be either continuous or categorical

• Observations independent, same variance

Page 8: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 8

Logistic regression• Response ( s “successes” out of n) is

assumed to be Binomial Bin(n,p)• Logit of Probability log(p/(1-p))is a linear

function of the covariates log(p/(1-p)) = b0 + b1x1 + . . . + bkxk

• Covariates can be either continuous or categorical

• Observations independent

Page 9: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 9

Poisson regression• Response is assumed to be Poisson(m)

• Log of mean log(m) is a linear function of the covariates (log-linear models)

log(m) = b0 + b1x1 + . . . + bkxk (Or, equivalently m = exp(b0 + b1x1 + . . . + bkxk)

• Covariates can be either continuous or categorical

• Observations independent

Page 10: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 10

Interpretation of b -coefficients

• For continuous covariates:– In normal regression, b is the increase in mean

response associated with a unit increase in x– In logistic regression, b is the increase in log odds

associated with a unit increase in x– In Poisson regression, b is the increase in log mean

associated with a unit increase in x– In logistic regression, if x is increased by 1, the odds

are increased by a factor of exp(b)– In Poisson regression, if x is increased by 1, the mean

is increased by a factor of exp(b)

Page 11: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 11

Interpretation of b -coefficients

• For categorical covariates (main effects only):– In normal regression, b is the increase in mean

response relative to the baseline– In logistic regression, b is the increase in log odds

relative to the baseline– In logistic regression, if we change from baseline to

some level, the odds are increased by a factor of exp(parameter for that level) relative to the baseline

– In Poisson regression, if we change from baseline to some level, the mean is increased by a factor of exp(parameter for that level) relative to the baseline

Page 12: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 12

Measures of Fit• R2 (for normal regression)

• Residual Deviance (for Logistic and Poisson regression)

– But not for ungrouped data in logistic, or Poisson with very small means (cell counts)

Page 13: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 13

Prediction• For normal regression,

– Predict response at covariates x1, . . . ,xk– Estimate mean response at covariates x1, . . . ,xk

• For logistic regression, – estimate log-odds at covariates x1, . . . ,xk – Estimate probability of “success” at covariates

x1, . . . ,xk • For Poisson regression,

– Estimate mean at covariates x1, . . . ,xk

Page 14: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 14

Inference• Summary table

– Estimates of regression coefs– Standard errors– Test stats for coef = 0– R2 etc (normal regression)– F-test for null model– Null and residual deviances (logistic/Poisson)

Page 15: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 15

Testing model vs sub-model

• Use and interpretation of both forms of anova

– Comparing model with a sub-model– Adding successive terms to a model

Page 16: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 16

Topics specific to normal regression

• Collinearity– VIF’s– Correlation– Added variable plots

• Model selection– Stepwise procedures: FS, BE, stepwise– All possible regressions approach

• AIC, BIC, CP, adjusted R2, CV

Page 17: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 17

Factors (categorical explanatory variables)

• Factors– Baselines– Levels– Factor level combinations– Interactions– Dummy variables– Know how to express interactions in terms of

means, means in terms of interactions– Know how to interpret zero interactions

Page 18: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 18

Fitting and Choosing models

• Fit a separate plane (mean if no continuous covariates) to each combination of factor levels

• Search for a simpler submodel (with some interactions zero) using stepwise and anova

Page 19: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 19

Diagnostics• For non-planar data

– Plot res/fitted, res/x’s, partial residual plots, gam plots, box-cox plot

– Transform either x’s or response, fit polynomial terms

• For unequal variance– Plot res/ fitted, look for funnel effect– Weighted least squares– Transform response

Page 20: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 20

Diagnostics (2)• For outliers and high-leverage points

– Hat matrix diagonals– Standardised residuals, – Leave-one-out diagnostics

• Independent observations– Acf plots– Residual/previous residual– Time series plot of residuals– Durbin-Watson test

Page 21: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 21

Diagnostics (3)• Normality

– Normal plot– Weisberg-Bingham test– Box Cox (select power)

Page 22: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 22

Specifics for Logistic Regression

Log-likelihood is

))...exp(1log( -

)...(),...,(

,)...exp(1

)...exp(

)1log()()log(),...,(

110

1101

0

110

110

10

ikkii

ikki

n

iik

ikki

ikkii

iiii

n

iik

xxn

xxsl

lyequivalentorxx

xxwhere

snsl

bbb

bbbbb

bbbbbbp

ppbb

++++

+++=

+++++++

=

--+=

=

=

Page 23: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 23

DevianceDeviance = 2(log LMAX - log LMOD)

– log LMAX: replace p’s with frequencies si/ni

– log LMOD: replace p’s with estimated p’s from logistic model i.e.

))ˆ...ˆˆexp(1()ˆ...ˆˆexp(ˆ

110

110

ikki

ikkii xx

xxbbb

bbbp++++

+++=

Can’t use as goodness of fit measure in ungrouped case

Page 24: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 24

Odds and log-odds• Probabilities p: Pr(Y=1)

• Odds p /(1- p)

• Log-odds: log p/(1- p)

)...exp( 110 kk xx bbb +++

))...exp(1()...exp(

110

110

kk

kk

xxxxbbb

bbb++++

+++

kk xx bbb +++ ...110

Page 25: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 25

Residuals• Pearson

• Deviance)1(

)(

iii

iii

nnr

ppp-

-

=

=

+++++++

-=

n

ii

ikkiiikkii

iiii

dDeviance

xxn -xxr

nrsignd

1

2

110110

2/1|))ˆ...ˆˆexp(1log()ˆ...ˆˆ(|

)(

bbbbbb

p

Page 26: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 26

Topics specific to Poisson regression

• Offsets• Interpretation of regression coefficients

– (same as for odds in logistic regression)• Correspondence between Poisson

regression (Log-linear models) and the multinomial model for contingency tables– The “Poisson trick”

Page 27: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 27

Contingency tables• Cells 1,2,…, m• Cell probabilities p1, . . . , pm

• Counts y1, . . . , ym

• Log-likelihood is

=

m

iiiy

1

logp

Page 28: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 28

Contingency tables (2)• A “model” for the table is anything that specifies the form

of the probabilities, possibly up to k unknown parameters • Test if the model is OK by

– Calculate Deviance = 2(log LMAX - log LMOD)

log LMAX: replace p’s with table frequencieslog LMOD: replace p’s with estimated p’s from

the model– Model OK if deviance is small,(p-value > 0.05)– Degrees of freedom m - 1 - k– k = number of parameters in the model

Page 29: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 29

Independence models• Correspond to interactions being zero• Fit a “saturated” model using Poisson

regression• Use anova, stepwise to see which

interactions are zero• Identify the appropriate model• Models can be represented by graphs

Page 30: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 30

Odds Ratios• Definition and interpretation• Connection to independence• Connection with interactions• Relationship between conditional OR’s

and interactions• Homogeneous association model

Page 31: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 31

Association graphs• Each node is a factor• Factors joined by lines if an interaction

between them• Interpretation in terms of conditional

independence• Interpretation in terms of collapsibility

Page 32: Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 32

Contingency tables: final topics

• Association reversal – Simpson’s paradox– When can you collapse

• Product multinomial – comparing populations– populations the same if certain interactions are zero

• Goodness of fit to a distribution– Special case of 1-dimensional table