201ab Quantitative methods: Multiple regression (b) · 2020. 11. 10.


  • ED VUL | UCSD Psychology

201ab Quantitative methods
    Multiple regression (b)

    With great illustrations from Julian Parris.

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

    Regression

Yi = β0 + β1X1i + β2X2i + εi,   εi ~ N(0, σε²)

    summary(lm(daughter~dad+mom))

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   3.7872     4.6471   0.815 0.417082
    mom           0.5210     0.1164   4.477 2.06e-05 ***
    dad           0.3900     0.1078   3.617 0.000475 ***

Coefficients:
    - Partial slope: dY/dXj holding other Xs constant.

    Multicollinearity:
    - Correlation among predictors.
    - Credit assignment is uncertain.
    - Coefficients change; are sensitive to model and noise; have higher marginal errors.

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1]   SSE[X1]
    (SSE[X1]: variability in Y left over after factoring in X1)

    SSR[X1]   SSR[X2|X1]   SSE[X1,X2]

    SSR[X2]   SSR[X1|X2]   SSE[X1,X2]

    SSR[X1,X2]   SSE[X1,X2]

    SSR[X1,X2]: variability in Y accounted for by X1 & X2, e.g., variability in daughters’ heights accounted for by mothers’ and fathers’ heights.

    SSE[X1,X2]: variability unaccounted for by X1 & X2.

    Extra sums of squares (e.g., SSR[X1|X2]): extra variability accounted for by taking into account X1 after having considered X2. E.g., additional variability in daughters’ heights accounted for by taking into account mothers’ heights, having already considered fathers’ heights.

  • ED VUL | UCSD Psychology

F(df_term, df_error) = [ SS_term / df_term ] / [ SSE_FULL / df_error ]

    d.f. of regression term: # parameters of this term

    d.f. error: n minus # parameters in full model

    SS: Sum of squares for this term

    SSE: Sum of squared residuals

    SST (SS total, also SSY)

    SSR[X1,X2] SSE[X1,X2]

anova(lm(son~mom+dad))

    Response: son
              Df Sum Sq Mean Sq F value  Pr(>F)
    mom        1 79.523  79.523 15.3977 0.00572 **
    dad        1  9.225   9.225  1.7862 0.22320
    Residuals  7 36.152   5.165

    anova(lm(son~dad+mom))

    Response: son
              Df Sum Sq Mean Sq F value   Pr(>F)
    dad        1 79.595  79.595 15.4116 0.005707 **
    mom        1  9.153   9.153  1.7723 0.224818
    Residuals  7 36.152   5.165

SSR[X1]   SSR[X2|X1]   SSE[X1,X2]

    SSR[X2]   SSR[X1|X2]   SSE[X1,X2]

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    Extra parameters in full model

    n minus number of parameters in full model

    Extra sums of squares of full compared to reduced

    Remaining sums of squares error in full model

    SSR[X1,X2,X3] SSE[X1,X2,X3]

    SSX[X2,X3|X1]

SSR[X1] (SS regression var 1)   SSE[X1]

    anova(lm(y~x1))
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1 517.18  517.18  64.373 2.263e-12 ***
    Residuals 98 787.34    8.03

    anova(lm(y~x1+x2+x3))
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1 517.18  517.18  545.73 < 2.2e-16 ***
    x2         1 460.22  460.22  485.62 < 2.2e-16 ***
    x3         1 236.15  236.15  249.19 < 2.2e-16 ***
    Residuals 96  90.98    0.95

    anova( lm(y~x1) , lm(y~x1+x2+x3) )
    Model 1: y ~ x1
    Model 2: y ~ x1 + x2 + x3
      Res.Df    RSS Df Sum of Sq     F    Pr(>F)
    1     98 787.34
    2     96  90.98  2    696.37 367.4 < 2.2e-16 ***
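The nested-comparison arithmetic in the slide can be checked by hand. A minimal sketch in R, with simulated data (the variable names and coefficients are illustrative, not the course dataset):

```r
# Verify that anova()'s nested-model F matches the extra-SS formula.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 2 * x2 + 1.5 * x3 + rnorm(n)

m.red  <- lm(y ~ x1)             # reduced model: 2 parameters (b0, b1)
m.full <- lm(y ~ x1 + x2 + x3)   # full model: 4 parameters

sse.red  <- sum(resid(m.red)^2)   # SSE[x1]
sse.full <- sum(resid(m.full)^2)  # SSE[x1,x2,x3]

# F = extra SS per extra parameter, over the MSE of the full model
F.by.hand  <- ((sse.red - sse.full) / (4 - 2)) / (sse.full / (n - 4))
F.by.anova <- anova(m.red, m.full)$F[2]
all.equal(F.by.hand, F.by.anova)  # TRUE
```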

  • ED VUL | UCSD Psychology

Significance in regression
    • Pairwise correlation t-test.
      – Is there a significant linear relationship between Y and Xj, ignoring other predictors?
    • Coefficient t-test.
      – Does the partial slope dY/dXj, controlling for all other predictors, differ significantly from zero?
    • Variance-partitioning F-tests.
      – Is the sums of squares allocated to this term (depends on order, SS type) significantly greater than chance?
    • Nested model comparison F-tests.
      – Does the larger model account for significantly more variance than the smaller model?
    In some special cases, these end up equivalent.

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses


  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1] (SS regression var 1)   SSE[X1] (SS error)

    R²: proportion of variability in Y accounted for by X1, e.g., variability in daughters’ heights accounted for by mothers’ height. “Coefficient of determination.”

    1 − R²: proportion of variability unaccounted for by X1, e.g., variability in daughters’ heights not accounted for by mothers’ height.

    SSR[X1]   SSX[X2|X1]   SSE[X1,X2]

    SSR[X1,X2]   SSE[X1,X2]

    R²: proportion of variability in Y accounted for by X1 and X2, e.g., variability in daughters’ heights accounted for by mothers’ and fathers’ heights. “Coefficient of multiple determination.”

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1] (SS regression var 1)   SSE[X1] (SS error)

    R²: proportion of variability in Y accounted for by X1, e.g., variability in daughters’ heights accounted for by mothers’ height. “Coefficient of determination.”

    1 − R²: proportion of variability unaccounted for by X1, e.g., variability in daughters’ heights not accounted for by mothers’ height.

    SSR[X1]   SSX[X2|X1]   SSE[X1,X2]

    R²_Y,X2|X1: proportion of variability previously unaccounted for by X1 that can be accounted for by X2. “Coefficient of partial determination.”

    R²_Y,X2|X1 = SSX[X2|X1] / SSE[X1]
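The partial R² is easy to compute from two fitted models. A minimal R sketch with simulated data (names and coefficients are illustrative):

```r
# Coefficient of partial determination: R^2_{Y,X2|X1} = SSX[X2|X1] / SSE[X1]
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.8 * x2 + rnorm(n)

sse.x1   <- sum(resid(lm(y ~ x1))^2)        # SSE[X1]
sse.x1x2 <- sum(resid(lm(y ~ x1 + x2))^2)   # SSE[X1,X2]

# SSX[X2|X1] = SSE[X1] - SSE[X1,X2]; partial R^2 scales it by SSE[X1]
partial.r2 <- (sse.x1 - sse.x1x2) / sse.x1
partial.r2  # proportion of the X1-leftover variability that X2 explains
```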

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

• Nested Model: A smaller model that differs only by excluding some parameters of a larger model.
    A) height ~ mom + dad + protein + exercise + milk
    B) height ~ mom + dad + protein + exercise
    C) height ~ dad + protein + exercise
    D) height ~ mom + dad
    E) height ~ dad + protein
    F) height ~ mom + dad + milk
    G) height ~ exercise + milk + beer
    H) weight ~ mom + dad + protein + exercise

B in A; C in A, B; D in A, B, F; E in A, B, C; F in A; A, G, H are not nested in others.

  • ED VUL | UCSD Psychology

F-tests compare nested models

    They ask: is a bigger model better than a smaller model?

    height ~ mom + dad + protein + exercise + milk
    height ~ mom + dad + protein + exercise   (nested)
    height ~ dad + protein + exercise         (nested)
    height ~ protein + dad                    (nested)
    height ~ dad                              (nested)
    height ~ 1                                (nested)

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    d.f. of numerator: number of extra parameters in full model
    d.f. of denominator: n minus number of parameters in full model

    Numerator: extra sums of squares of full compared to reduced, estimated by the difference in SSE.
    Denominator: remaining sums of squares error in the full model.

  • ED VUL | UCSD Psychology

- Extra sums of squares of the full compared to the reduced model is the difference in sums of squares of error.
    - Degrees of freedom of the extra sums of squares is the number of parameters added.
    - The remaining sums of squares error from the full model is the denominator.
    - Degrees of freedom of error are n minus the number of parameters in the full model.

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is just SST (a 1 parameter regression model considering only the mean of Y: B0)

    • SSR[X1] = SST – SSE[x1]

    SST (SS total, also SSY)

    SSR[X1] (SS regression var 1) SSE[X1]

    F = (SSR[x1] / (2-1)) / (SSE[x1] / (n-2))
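This special case can be verified directly in R: the F reported by summary() for a simple regression is exactly the nested comparison against the intercept-only model. A sketch with simulated data (names illustrative):

```r
# Omnibus F for simple regression == nested test vs. intercept-only model.
set.seed(3)
n  <- 50
x1 <- rnorm(n)
y  <- 2 + 1.5 * x1 + rnorm(n)

m0 <- lm(y ~ 1)    # intercept-only: SSE[b0] = SST
m1 <- lm(y ~ x1)

sst <- sum(resid(m0)^2)
sse <- sum(resid(m1)^2)
ssr <- sst - sse                    # SSR[x1] = SST - SSE[x1]

F.by.hand  <- (ssr / (2 - 1)) / (sse / (n - 2))
F.reported <- unname(summary(m1)$fstatistic["value"])
all.equal(F.by.hand, F.reported)  # TRUE
```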

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[x1]. SSE full is SSE[x1,x2,x3].
    • SSX[x2,x3|x1] = SSE[x1] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 2 (b0, b1)

    SSR[X1] (SS regression var 1)   SSE[X1]

    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]

    F = (SSX[x2,x3|x1] / (2)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[b0]. SSE full is SSE[x1,x2,x3].
    • SSR[x1,x2,x3] = SSE[b0] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 1 (b0)

    SST (SS total, also SSY)

    SSR[X1,X2,X3]   SSE[X1,X2,X3]

    F = (SSR[x1,x2,x3] / (4-1)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[x1,x3]. SSE full is SSE[x1,x2,x3].
    • SSX[x2|x1,x3] = SSE[x1,x3] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 3 (b0, b1, b3)

    SSR[X1,X3]   SSE[X1,X3]

    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]

    F = (SSX[x2|x1,x3] / (1)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    Comparisons:

    Omnibus: Do X1, X2, and X3 together account for the variability in Y better than chance?
    SSR[X1,X2,X3]   SSE[X1,X2,X3]

    Do X2 and X3 together account for the variability in Y left over after taking into account X1 better than chance?
    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]

    Does X2 account for the variability in Y left over after taking into account X1 and X3 better than chance?
    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]

    OLS regression: Does X1 account for the variability in Y better than chance?
    SSR[X1] (SS regression var 1)   SSE[X1]

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    Comparisons:

    Omnibus:
    SSR[X1,X2,X3]   SSE[X1,X2,X3]
    F = (SSR[x1,x2,x3] / (4-1)) / (SSE[x1,x2,x3] / (n-4))

    X2 and X3 after X1:
    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]
    F = (SSX[x2,x3|x1] / (2)) / (SSE[x1,x2,x3] / (n-4))

    X2 after X1 and X3:
    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]
    F = (SSX[x2|x1,x3] / (1)) / (SSE[x1,x2,x3] / (n-4))

    OLS regression:
    SSR[X1] (SS regression var 1)   SSE[X1]
    F = (SSR[x1] / (2-1)) / (SSE[x1] / (n-2))

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    “Model building” comparison: Is it better to add dad or protein to a model that already has mom? Is it better to add mom or exercise to a model that already has protein?

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    “Model selection” comparison: Is a model with mom and dad better than a model with protein and exercise? A model with ethnicity? (These can also be seen as model building problems: would it be better to add these or those regressors to the null model?)

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    Weird (but sometimes useful) model comparison: Is height more or less predictable from mom and dad (heights?) than weight is?

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
      – There isn’t really a good way to test the null hypothesis that two non-nested models are equally good, because (a) we don’t know what ‘good’ means: bigger models will have better fits, so how do we trade off fit with model size? And (b) even if we define ‘good’, the difference in goodness of two models doesn’t have a definable null hypothesis distribution.
      – Consequently, we just define some goodness statistic and compare the numerical difference in goodness. (Bayesian methods offer ways to attach probability statements to goodness comparisons between non-nested models, but we will not be dealing with this now.)

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R² (no punishment for bigger models: fit is all that counts)
      • Useful for simple model building when the number of parameters is constant: which parameter is a better one to add to the model I already have? Which K-parameter model better fits these data?

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R²_A, “Adjusted R squared” (like R², but punished for having more parameters)

    R²_A = 1 − (1 − R²) (n − 1)/(n − p) = 1 − (SSE/SST) (n − 1)/(n − p)
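The adjustment formula can be checked against what R reports. A sketch with simulated data (here p counts all parameters including the intercept; x2 is deliberately pure noise):

```r
# Adjusted R^2 by hand vs. summary()'s adj.r.squared.
set.seed(4)
n  <- 40
p  <- 3                          # parameters: b0, b1, b2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)          # x2 contributes nothing

m  <- lm(y ~ x1 + x2)
r2 <- summary(m)$r.squared
adj.by.hand <- 1 - (1 - r2) * (n - 1) / (n - p)
all.equal(adj.by.hand, summary(m)$adj.r.squared)  # TRUE
```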

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R²
    – R²_A, “Adjusted R squared”
    – Lots more available based on likelihood, rather than SS: AIC, BIC, WAIC, DIC, etc. (more next term)
    – Complicated ones available based on “marginal likelihood” or “model evidence” via Bayesian methods: Bayes Factor
    – They all define some trade-off between number of parameters and fit to the data.
      • Sometimes they will give you different answers! If so, you should be worried. A clearly better model should do better on all of these metrics. When different metrics give you different answers you should not be confident.
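As a small sketch of comparing two non-nested models with likelihood-based criteria (simulated data; y actually depends on x1, so the x1 model should win on both metrics):

```r
# AIC/BIC comparison of two non-nested single-predictor models.
set.seed(5)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)

mA <- lm(y ~ x1)   # non-nested pair:
mB <- lm(y ~ x2)   # neither model contains the other

AIC(mA, mB)        # lower is better: fit penalized by parameter count
BIC(mA, mB)        # like AIC, with a stronger penalty on parameters
```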

  • ED VUL | UCSD Psychology

Midterm
    • Goes out today after class
    • Due Wednesday night
    • Do not work together.

  • ED VUL | UCSD Psychology

• R, data, etc.
      – Markdown
      – Cleaning
      – Graphing
    • Probability
      – Basic rules, conditional p, PDF, CDF, quantile
      – Expectation, variance of RV
    • Significance testing
      – p, alpha, power, effect size
      – Z-tests
    • Linear model
      – Simple t-tests
      – Correlation
      – OLS Regression
      – Multiple regression

  • ED VUL | UCSD Psychology

Some typical probability questions.
    • When flipping a fair coin, what is the probability that the first occurrence of heads will be on the 5th flip?
    • If you run 20 independent tests on truly null data, each with a false-positive rate of 0.05, what’s the probability that you will get at least 1 false positive (the familywise false-positive rate)?
      – What would the per-test false-positive rate have to be for the familywise false-positive rate, P(at least 1 false positive), to be 0.05?

  • ED VUL | UCSD Psychology

Binomial questions
    Use vectors, dbinom(), and sum() in R to make these calculations!

    • Generally, 51.2% of all (US) births are male. A hospital has 10 births in one day. What is the probability that…
      – exactly 6 of them will be male?
      – 7 or more of them will be female?
      – the proportion of male births will be abnormal, i.e., either abnormally high (>75%) or abnormally low (
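The dbinom()-and-sum() pattern these questions call for looks like this (shown for a generic fair-coin case, not the birth questions themselves):

```r
# P(exactly 6 heads in 10 flips of a fair coin)
p6 <- dbinom(6, size = 10, prob = 0.5)

# P(7 or more heads): sum dbinom() over a vector of outcomes
p7plus <- sum(dbinom(7:10, size = 10, prob = 0.5))

# sanity check: probabilities over all possible outcomes sum to 1
sum(dbinom(0:10, size = 10, prob = 0.5))  # 1
```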

  • ED VUL | UCSD Psychology

Binomial questions with rbinom()
    Use rbinom() to make these calculations!

    • Generally, 51.2% of all (US) births are male. A hospital has 10 births in one day. What is the probability that…
      – exactly 6 of them will be male?
      – 7 or more of them will be female?
      – the proportion of male births will be abnormal, i.e., either abnormally high (>75%) or abnormally low (

  • ED VUL | UCSD Psychology

Cumulative probability questions
    Use pnorm(), pbinom() in R.

    • IQ is normally distributed with mean=100 and sd=15. What is the probability that a given person has an IQ…
      – less than 120?
      – greater than 145?
      – between 90 and 110?

    • Test scores on a 25-item quiz are binomially distributed with n=25, p=0.7. What is the probability that a given person’s score is…
      – less than 17?
      – greater than 20?
      – between 15 and 20?

  • ED VUL | UCSD Psychology

Quantile questions
    Use qnorm(), qbinom() in R.

    • IQ is normally distributed with mean=100 and sd=15.
      – What would your IQ score have to be to join MENSA (98th percentile of the IQ distribution)?
      – What is the interquartile range of IQ?
      – The Prometheus society accepts only the top 1/30000th of the IQ distribution. How much higher is the IQ cutoff for Prometheus membership as compared to MENSA?

    • Test scores on a 25-item quiz are binomially distributed with n=25, p=0.7.
      – What score would put you in the 90th percentile?
      – What is the interquartile range for these scores?
      – What is the median score?

  • ED VUL | UCSD Psychology

Expectation questions
    1) Scores (X) on a 25-item quiz are distributed as a Binomial with n=25, p=0.7.
       - What is the expected value of X?
       - Variance of X?
       - Skew of X?
       - Kurtosis of X?

    2) Scores on another quiz (Y) have Mean[Y]=15, Var[Y]=16 (and are independent of scores on the first quiz). I add both scores up to get the final score, Z.
       - What is Mean[Z]?
       - What is Var[Z]?

  • ED VUL | UCSD Psychology

Expectation + Normal question
    Let's say puppies have the following traits:
      wagging speed: w (Hz) ~ Norm( mu=1, sd=0.25 )
      ear stiffness: k (log GPa) ~ Norm( mu=-2.5, sd=0.5 )
      eye/head size: e (log m2/m2) ~ Norm( mu=-1.5, sd=1/3 )

    Remarkably, all of these are independent.

    We define a cuteness index (λ) for a given puppy as λ = 4*w − k + 3*e + 5

    1) What is the mean λ for all puppies?
    2) What is the variance of λ for puppies?
    3) What is the probability that a randomly sampled puppy has a cuteness index greater than 10?

  • ED VUL | UCSD Psychology

Z scores
    • What is the probability that our sample mean will have a Z-score > 1.96 or < −1.96? (i.e., will be more than 1.96 standard errors away from the population mean?)
    • What is the ‘critical’ absolute Z value such that the Z-score of our sample mean will have an absolute value less than that with probability 68.27%?

  • ED VUL | UCSD Psychology

Sampling dist. of the sample mean
    • We draw n samples from a distribution with mean=100, sd=15 (e.g., n IQ scores). We calculate the mean of those n samples. What is the distribution of the sample mean? (Mean[sample mean]? Variance[sample mean]?)
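One way to see the answer before deriving it is by simulation (a sketch, not part of the original slides; n = 25 is an arbitrary choice):

```r
# Simulate the sampling distribution of the sample mean.
set.seed(6)
n.obs <- 25
sample.means <- replicate(10000, mean(rnorm(n.obs, mean = 100, sd = 15)))

mean(sample.means)  # about 100: Mean[sample mean] = population mean
var(sample.means)   # about 15^2 / 25 = 9: Var[sample mean] = sigma^2 / n
```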

  • ED VUL | UCSD Psychology

Linear transformation practice.
    1) We find that B0 = 0; B1 = 0.1 in: z.extraversion ~ (height.in − mean(height))*B1 + B0.
       How do we expect extraversion to differ between a 5’9” and a 6’0” person?

    2) We are trying to predict newborn weight based on the weights of the mother and the father. How would you set up this regression?

    3) We find: gre.percentile ~ (income.percentile)*0.5 − 0.4.
       What is wrong with extrapolation of this regression line?

    4) We find: z.rt ~ −0.4*(z.iq). Mean(rt) = 400, sd(rt) = 150; mean(iq) = 102; sd(iq) = 14.
       What is the predicted RT of someone with an IQ of 106?

    5) We find: fat.percentage = 17 + 3800*(weight.lb / height.in^3).
       (weight.lb / height.in^3): mean = 0.0005, sd = 0.0005.
       What’s a better way to have set up this regression?

  • ED VUL | UCSD Psychology

Fat
    readr::read_tsv('http://vulstats.ucsd.edu/data/bodyfat.data2.txt')

    What variables predict bodyfat percentage?
    - We have a bunch of very correlated predictors; how can we make new variables to orthogonalize them?
    - What’s a good model to predict bodyfat percentage?
    - What would we predict is the bodyfat percentage of someone who is:
      - Height: 69
      - Weight: 175
      - Neck: 36
      - Chest: 100
      - Abdomen: 90
      - Hip: 99
      - Thigh: 58
      - Knee: 38
      - Ankle: 22
      - Bicep: 31
      - Forearm: 28
      - Wrist: 17