201ab Quantitative methods: Multiple regression (b) · 2020. 11. 10.


  • ED VUL | UCSD Psychology

201ab Quantitative methods
    Multiple regression (b)

    With great illustrations from Julian Parris.

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

    Regression

Yi = β0 + β1X1i + β2X2i + εi,   εi ~ N(0, σε²)

    summary(lm(daughter~dad+mom))

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   3.7872     4.6471   0.815 0.417082
    mom           0.5210     0.1164   4.477 2.06e-05 ***
    dad           0.3900     0.1078   3.617 0.000475 ***

Coefficients:
    - Partial slope: dY/dXj holding other Xs constant.

    Multicollinearity:
    - Correlation among predictors.
    - Credit assignment is uncertain.
    - Coefficients change; are sensitive to model and noise; have higher marginal errors.

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1]   SSE[X1]
    (SSE[X1]: variability in Y left over after factoring in X1)

    SSR[X1]   SSR[X2|X1]   SSE[X1,X2]

    SSR[X2]   SSR[X1|X2]   SSE[X1,X2]

    SSR[X1,X2]   SSE[X1,X2]

    SSR[X1,X2]: variability in Y accounted for by X1 & X2, e.g., variability in daughters’ heights accounted for by mothers’ and fathers’ heights.

    SSE[X1,X2]: variability unaccounted for by X1 & X2.

    Extra sums of squares (e.g., SSR[X1|X2]): extra variability accounted for by taking into account X1 after having considered X2. E.g., additional variability in daughters’ heights accounted for by taking into account mothers’ heights, having already considered fathers’ heights.

  • ED VUL | UCSD Psychology

F(df_term, df_error) = [ SS_term / df_term ] / [ SSE_FULL / df_error ]

    d.f. of regression term: # parameters of this term

    d.f. error: n minus # parameters in full model

    SS: Sum of squares for this term

    SSE: Sum of squared residuals

    SST (SS total, also SSY)

    SSR[X1,X2] SSE[X1,X2]

anova(lm(son~mom+dad))

    Response: son
              Df Sum Sq Mean Sq F value  Pr(>F)
    mom        1 79.523  79.523 15.3977 0.00572 **
    dad        1  9.225   9.225  1.7862 0.22320
    Residuals  7 36.152   5.165

    anova(lm(son~dad+mom))

    Response: son
              Df Sum Sq Mean Sq F value   Pr(>F)
    dad        1 79.595  79.595 15.4116 0.005707 **
    mom        1  9.153   9.153  1.7723 0.224818
    Residuals  7 36.152   5.165

SSR[X1]   SSR[X2|X1]   SSE[X1,X2]

    SSR[X2]   SSR[X1|X2]   SSE[X1,X2]

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    Extra parameters in full model

    n minus number of parameters in full model

    Extra sums of squares of full compared to reduced

    Remaining sums of squares error in full model

    SSR[X1,X2,X3] SSE[X1,X2,X3]

    SSX[X2,X3|X1]

SSR[X1] (SS regression var 1)   SSE[X1]

    anova(lm(y~x1))
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1 517.18  517.18  64.373 2.263e-12 ***
    Residuals 98 787.34    8.03

    anova(lm(y~x1+x2+x3))
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1 517.18  517.18  545.73 < 2.2e-16 ***
    x2         1 460.22  460.22  485.62 < 2.2e-16 ***
    x3         1 236.15  236.15  249.19 < 2.2e-16 ***
    Residuals 96  90.98    0.95

    anova( lm(y~x1) , lm(y~x1+x2+x3) )
    Model 1: y ~ x1
    Model 2: y ~ x1 + x2 + x3
      Res.Df    RSS Df Sum of Sq     F    Pr(>F)
    1     98 787.34
    2     96  90.98  2    696.37 367.4 < 2.2e-16 ***
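The nested-comparison arithmetic in the slide can be checked by hand. A minimal sketch in R, with simulated data (the variable names and coefficients are illustrative, not the course dataset):

```r
# Verify that anova()'s nested-model F matches the extra-SS formula.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 2 * x2 + 1.5 * x3 + rnorm(n)

m.red  <- lm(y ~ x1)             # reduced model: 2 parameters (b0, b1)
m.full <- lm(y ~ x1 + x2 + x3)   # full model: 4 parameters

sse.red  <- sum(resid(m.red)^2)   # SSE[x1]
sse.full <- sum(resid(m.full)^2)  # SSE[x1,x2,x3]

# F = extra SS per extra parameter, over the MSE of the full model
F.by.hand  <- ((sse.red - sse.full) / (4 - 2)) / (sse.full / (n - 4))
F.by.anova <- anova(m.red, m.full)$F[2]
all.equal(F.by.hand, F.by.anova)  # TRUE
```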

  • ED VUL | UCSD Psychology

Significance in regression
    • Pairwise correlation t-test.
      – Is there a significant linear relationship between Y and Xj, ignoring other predictors?
    • Coefficient t-test.
      – Does the partial slope dY/dXj, controlling for all other predictors, differ significantly from zero?
    • Variance-partitioning F-tests.
      – Is the sums of squares allocated to this term (depends on order, SS type) significantly greater than chance?
    • Nested model comparison F-tests.
      – Does the larger model account for significantly more variance than the smaller model?
    In some special cases, these end up equivalent.

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses


  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1] (SS regression var 1)   SSE[X1] (SS error)

    R²: proportion of variability in Y accounted for by X1, e.g., variability in daughters’ heights accounted for by mothers’ height. “Coefficient of determination.”

    1 − R²: proportion of variability unaccounted for by X1, e.g., variability in daughters’ heights not accounted for by mothers’ height.

    SSR[X1]   SSX[X2|X1]   SSE[X1,X2]

    SSR[X1,X2]   SSE[X1,X2]

    R²: proportion of variability in Y accounted for by X1 and X2, e.g., variability in daughters’ heights accounted for by mothers’ and fathers’ heights. “Coefficient of multiple determination.”

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    SSR[X1] (SS regression var 1)   SSE[X1] (SS error)

    R²: proportion of variability in Y accounted for by X1, e.g., variability in daughters’ heights accounted for by mothers’ height. “Coefficient of determination.”

    1 − R²: proportion of variability unaccounted for by X1, e.g., variability in daughters’ heights not accounted for by mothers’ height.

    SSR[X1]   SSX[X2|X1]   SSE[X1,X2]

    R²_Y,X2|X1: proportion of variability previously unaccounted for by X1 that can be accounted for by X2. “Coefficient of partial determination.”

    R²_Y,X2|X1 = SSX[X2|X1] / SSE[X1]
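The partial R² is easy to compute from two fitted models. A minimal R sketch with simulated data (names and coefficients are illustrative):

```r
# Coefficient of partial determination: R^2_{Y,X2|X1} = SSX[X2|X1] / SSE[X1]
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.8 * x2 + rnorm(n)

sse.x1   <- sum(resid(lm(y ~ x1))^2)        # SSE[X1]
sse.x1x2 <- sum(resid(lm(y ~ x1 + x2))^2)   # SSE[X1,X2]

# SSX[X2|X1] = SSE[X1] - SSE[X1,X2]; partial R^2 scales it by SSE[X1]
partial.r2 <- (sse.x1 - sse.x1x2) / sse.x1
partial.r2  # proportion of the X1-leftover variability that X2 explains
```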

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

• Nested Model: A smaller model that differs only by excluding some parameters of a larger model.
    A) height ~ mom + dad + protein + exercise + milk
    B) height ~ mom + dad + protein + exercise
    C) height ~ dad + protein + exercise
    D) height ~ mom + dad
    E) height ~ dad + protein
    F) height ~ mom + dad + milk
    G) height ~ exercise + milk + beer
    H) weight ~ mom + dad + protein + exercise

B in A; C in A, B; D in A, B, F; E in A, B, C; F in A; A, G, H are not nested in others.

  • ED VUL | UCSD Psychology

F-tests compare nested models

    They ask: is a bigger model better than a smaller model?

    height ~ mom + dad + protein + exercise + milk
    height ~ mom + dad + protein + exercise   (nested)
    height ~ dad + protein + exercise         (nested)
    height ~ protein + dad                    (nested)
    height ~ dad                              (nested)
    height ~ 1                                (nested)

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    d.f. of numerator: number of extra parameters in full model
    d.f. of denominator: n minus number of parameters in full model

    Numerator: extra sums of squares of full compared to reduced, estimated by the difference in SSE.
    Denominator: remaining sums of squares error in the full model.

  • ED VUL | UCSD Psychology

- Extra sums of squares of the full compared to the reduced model is the difference in sums of squares of error.
    - Degrees of freedom of the extra sums of squares is the number of parameters added.
    - The remaining sums of squares error from the full model is the denominator.
    - Degrees of freedom of error are n minus the number of parameters in the full model.

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is just SST (a 1 parameter regression model considering only the mean of Y: B0)

    • SSR[X1] = SST – SSE[x1]

    SST (SS total, also SSY)

    SSR[X1] (SS regression var 1) SSE[X1]

    F = (SSR[x1] / (2-1)) / (SSE[x1] / (n-2))
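This special case can be verified directly in R: the F reported by summary() for a simple regression is exactly the nested comparison against the intercept-only model. A sketch with simulated data (names illustrative):

```r
# Omnibus F for simple regression == nested test vs. intercept-only model.
set.seed(3)
n  <- 50
x1 <- rnorm(n)
y  <- 2 + 1.5 * x1 + rnorm(n)

m0 <- lm(y ~ 1)    # intercept-only: SSE[b0] = SST
m1 <- lm(y ~ x1)

sst <- sum(resid(m0)^2)
sse <- sum(resid(m1)^2)
ssr <- sst - sse                    # SSR[x1] = SST - SSE[x1]

F.by.hand  <- (ssr / (2 - 1)) / (sse / (n - 2))
F.reported <- unname(summary(m1)$fstatistic["value"])
all.equal(F.by.hand, F.reported)  # TRUE
```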

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[x1]. SSE full is SSE[x1,x2,x3].
    • SSX[x2,x3|x1] = SSE[x1] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 2 (b0, b1)

    SSR[X1] (SS regression var 1)   SSE[X1]

    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]

    F = (SSX[x2,x3|x1] / (2)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[b0]. SSE full is SSE[x1,x2,x3].
    • SSR[x1,x2,x3] = SSE[b0] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 1 (b0)

    SST (SS total, also SSY)

    SSR[X1,X2,X3]   SSE[X1,X2,X3]

    F = (SSR[x1,x2,x3] / (4-1)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    • SSE reduced is SSE[x1,x3]. SSE full is SSE[x1,x2,x3].
    • SSX[x2|x1,x3] = SSE[x1,x3] − SSE[x1,x2,x3]
    • # parameters full: 4 (b0, b1, b2, b3)
    • # parameters reduced: 3 (b0, b1, b3)

    SSR[X1,X3]   SSE[X1,X3]

    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]

    F = (SSX[x2|x1,x3] / (1)) / (SSE[x1,x2,x3] / (n-4))

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    Comparisons:

    Omnibus: Do X1, X2, and X3 together account for the variability in Y better than chance?
    SSR[X1,X2,X3]   SSE[X1,X2,X3]

    Do X2 and X3 together account for the variability in Y left over after taking into account X1 better than chance?
    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]

    Does X2 account for the variability in Y left over after taking into account X1 and X3 better than chance?
    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]

    OLS regression: Does X1 account for the variability in Y better than chance?
    SSR[X1] (SS regression var 1)   SSE[X1]

  • ED VUL | UCSD Psychology

SST (SS total, also SSY)

    Comparisons:

    Omnibus:
    SSR[X1,X2,X3]   SSE[X1,X2,X3]
    F = (SSR[x1,x2,x3] / (4-1)) / (SSE[x1,x2,x3] / (n-4))

    X2 and X3 after X1:
    SSR[X1] (SS regression var 1)   SSX[X2,X3|X1]   SSE[X1,X2,X3]
    F = (SSX[x2,x3|x1] / (2)) / (SSE[x1,x2,x3] / (n-4))

    X2 after X1 and X3:
    SSR[X1,X3]   SSX[X2|X1,X3]   SSE[X1,X2,X3]
    F = (SSX[x2|x1,x3] / (1)) / (SSE[x1,x2,x3] / (n-4))

    OLS regression:
    SSR[X1] (SS regression var 1)   SSE[X1]
    F = (SSR[x1] / (2-1)) / (SSE[x1] / (n-2))

  • ED VUL | UCSD Psychology

Multiple regression
    • Review
    • Coefficient of “partial determination” (partial R², partial eta²)
    • Nested models
    • Non-nested models
    • Polynomial regression
    • Multiple regression diagnostics
    • Partial correlations and mediation analyses

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    “Model building” comparison: Is it better to add dad or protein to a model that already has mom? Is it better to add mom or exercise to a model that already has protein?

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    “Model selection” comparison: Is a model with mom and dad better than a model with protein and exercise? A model with ethnicity? (These can also be seen as model building problems: would it be better to add these or those regressors to the null model?)

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• F test allows us to compare nested models.

    • How do we compare non-nested models?
      – height ~ mom + dad
      – height ~ mom + protein
      – height ~ protein + exercise
      – height ~ ethnicity
      – weight ~ mom + dad

    F(p_FULL − p_REDUCED, n − p_FULL) = [ (SSE_REDUCED − SSE_FULL) / (p_FULL − p_REDUCED) ] / [ SSE_FULL / (n − p_FULL) ]

    Weird (but sometimes useful) model comparison: Is height more or less predictable from mom and dad (heights?) than weight is?

    I am using these terms to describe different comparisons only for convenience, these are not really technical names for different non-nested model comparisons. In reality, all of them are ‘model selection’ problems.

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
      – There isn’t really a good way to test the null hypothesis that two non-nested models are equally good, because (a) we don’t know what ‘good’ means: bigger models will have better fits, so how do we trade off fit with model size? And (b) even if we define ‘good’, the difference in goodness of two models doesn’t have a definable null hypothesis distribution.
      – Consequently, we just define some goodness statistic and compare the numerical difference in goodness. (Bayesian methods offer ways to attach probability statements to goodness comparisons between non-nested models, but we will not be dealing with this now.)

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R² (no punishment for bigger models: fit is all that counts)
      • Useful for simple model building when the number of parameters is constant: which parameter is a better one to add to the model I already have? Which K-parameter model better fits these data?

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R²_A, “Adjusted R squared” (like R², but punished for having more parameters)

    R²_A = 1 − (1 − R²) (n − 1)/(n − p) = 1 − (SSE/SST) (n − 1)/(n − p)
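The adjustment formula can be checked against what R reports. A sketch with simulated data (here p counts all parameters including the intercept; x2 is deliberately pure noise):

```r
# Adjusted R^2 by hand vs. summary()'s adj.r.squared.
set.seed(4)
n  <- 40
p  <- 3                          # parameters: b0, b1, b2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)          # x2 contributes nothing

m  <- lm(y ~ x1 + x2)
r2 <- summary(m)$r.squared
adj.by.hand <- 1 - (1 - r2) * (n - 1) / (n - p)
all.equal(adj.by.hand, summary(m)$adj.r.squared)  # TRUE
```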

  • ED VUL | UCSD Psychology

• How do we compare non-nested models?
    Goodness:
    – R²
    – R²_A, “Adjusted R squared”
    – Lots more available based on likelihood, rather than SS: AIC, BIC, WAIC, DIC, etc. (more next term)
    – Complicated ones available based on “marginal likelihood” or “model evidence” via Bayesian methods: Bayes Factor
    – They all define some trade-off between number of parameters and fit to the data.
      • Sometimes they will give you different answers! If so, you should be worried. A clearly better model should do better on all of these metrics. When different metrics give you different answers you should not be confident.
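As a small sketch of comparing two non-nested models with likelihood-based criteria (simulated data; y actually depends on x1, so the x1 model should win on both metrics):

```r
# AIC/BIC comparison of two non-nested single-predictor models.
set.seed(5)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)

mA <- lm(y ~ x1)   # non-nested pair:
mB <- lm(y ~ x2)   # neither model contains the other

AIC(mA, mB)        # lower is better: fit penalized by parameter count
BIC(mA, mB)        # like AIC, with a stronger penalty on parameters
```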

  • ED VUL | UCSD Psychology

Midterm
    • Goes out today after class
    • Due Wednesday night
    • Do not work together.

  • ED VUL | UCSD Psychology

• R, data, etc.
      – Markdown
      – Cleaning
      – Graphing
    • Probability
      – Basic rules, conditional p, PDF, CDF, quantile
      – Expectation, variance of RV
    • Significance testing
      – p, alpha, power, effect size
      – Z-tests
    • Linear model
      – Simple t-tests
      – Correlation
      – OLS Regression
      – Multiple regression

  • ED VUL | UCSD Psychology

Some typical probability questions.
    • When flipping a fair coin, what is the probability that the first occurrence of heads will be on the 5th flip?
    • If you run 20 independent tests on truly null data, each with a false-positive rate of 0.05, what’s the probability that you will get at least 1 false positive (the familywise false-positive rate)?
      – What would the per-test false-positive rate have to be for the familywise false-positive rate, P(at least 1 false positive), to be 0.05?

  • ED VUL | UCSD Psychology

Binomial questions
    Use vectors, dbinom(), and sum() in R to make these calculations!

    • Generally, 51.2% of all (US) births are male. A hospital has 10 births in one day. What is the probability that…
      – exactly 6 of them will be male?
      – 7 or more of them will be female?
      – the proportion of male births will be abnormal, i.e., either abnormally high (>75%) or abnormally low (
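The dbinom()-and-sum() pattern these questions call for looks like this (shown for a generic fair-coin case, not the birth questions themselves):

```r
# P(exactly 6 heads in 10 flips of a fair coin)
p6 <- dbinom(6, size = 10, prob = 0.5)

# P(7 or more heads): sum dbinom() over a vector of outcomes
p7plus <- sum(dbinom(7:10, size = 10, prob = 0.5))

# sanity check: probabilities over all possible outcomes sum to 1
sum(dbinom(0:10, size = 10, prob = 0.5))  # 1
```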

  • ED VUL | UCSD Psychology

Binomial questions with rbinom()
    Use rbinom() to make these calculations!

    • Generally, 51.2% of all (US) births are male. A hospital has 10 births in one day. What is the probability that…
      – exactly 6 of them will be male?
      – 7 or more of them will be female?
      – the proportion of male births will be abnormal, i.e., either abnormally high (>75%) or abnormally low (

  • ED VUL | UCSD Psychology

Cumulative probability questions
    Use pnorm(), pbinom() in R.

    • IQ is normally distributed with mean=100 and sd=15. What is the probability that a given person has an IQ…
      – less than 120?
      – greater than 145?
      – between 90 and 110?

    • Test scores on a 25-item quiz are binomially distributed with n=25, p=0.7. What is the probability that a given person’s score is…
      – less than 17?
      – greater than 20?
      – between 15 and 20?

  • ED VUL | UCSD Psychology

Quantile questions
    Use qnorm(), qbinom() in R.

    • IQ is normally distributed with mean=100 and sd=15.
      – What would your IQ score have to be to join MENSA (98th percentile of the IQ distribution)?
      – What is the interquartile range of IQ?
      – The Prometheus society accepts only the top 1/30000th of the IQ distribution. How much higher is the IQ cutoff for Prometheus membership as compared to MENSA?

    • Test scores on a 25-item quiz are binomially distributed with n=25, p=0.7.
      – What score would put you in the 90th percentile?
      – What is the interquartile range for these scores?
      – What is the median score?

  • ED VUL | UCSD Psychology

Expectation questions
    1) Scores (X) on a 25-item quiz are distributed as a Binomial with n=25, p=0.7.
       - What is the expected value of X?
       - Variance of X?
       - Skew of X?
       - Kurtosis of X?

    2) Scores on another quiz (Y) have Mean[Y]=15, Var[Y]=16 (and are independent of scores on the first quiz). I add both scores up to get the final score, Z.
       - What is Mean[Z]?
       - What is Var[Z]?

  • ED VUL | UCSD Psychology

Expectation + Normal question
    Let's say puppies have the following traits:
      wagging speed: w (Hz) ~ Norm( mu=1, sd=0.25 )
      ear stiffness: k (log GPa) ~ Norm( mu=-2.5, sd=0.5 )
      eye/head size: e (log m2/m2) ~ Norm( mu=-1.5, sd=1/3 )

    Remarkably, all of these are independent.

    We define a cuteness index (λ) for a given puppy as λ = 4*w − k + 3*e + 5

    1) What is the mean λ for all puppies?
    2) What is the variance of λ for puppies?
    3) What is the probability that a randomly sampled puppy has a cuteness index greater than 10?

  • ED VUL | UCSD Psychology

Z scores
    • What is the probability that our sample mean will have a Z-score > 1.96 or < −1.96? (i.e., will be more than 1.96 standard errors away from the population mean?)
    • What is the ‘critical’ absolute Z value such that the Z-score of our sample mean will have an absolute value less than that with probability 68.27%?

  • ED VUL | UCSD Psychology

Sampling dist. of the sample mean
    • We draw n samples from a distribution with mean=100, sd=15 (e.g., n IQ scores). We calculate the mean of those n samples. What is the distribution of the sample mean? (Mean[sample mean]? Variance[sample mean]?)
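One way to see the answer before deriving it is by simulation (a sketch, not part of the original slides; n = 25 is an arbitrary choice):

```r
# Simulate the sampling distribution of the sample mean.
set.seed(6)
n.obs <- 25
sample.means <- replicate(10000, mean(rnorm(n.obs, mean = 100, sd = 15)))

mean(sample.means)  # about 100: Mean[sample mean] = population mean
var(sample.means)   # about 15^2 / 25 = 9: Var[sample mean] = sigma^2 / n
```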

  • ED VUL | UCSD Psychology

Linear transformation practice.
    1) We find that B0 = 0; B1 = 0.1 in: z.extraversion ~ (height.in − mean(height))*B1 + B0.
       How do we expect extraversion to differ between a 5’9” and a 6’0” person?

    2) We are trying to predict newborn weight based on the weights of the mother and the father. How would you set up this regression?

    3) We find: gre.percentile ~ (income.percentile)*0.5 − 0.4.
       What is wrong with extrapolation of this regression line?

    4) We find: z.rt ~ −0.4*(z.iq). Mean(rt) = 400, sd(rt) = 150; mean(iq) = 102; sd(iq) = 14.
       What is the predicted RT of someone with an IQ of 106?

    5) We find: fat.percentage = 17 + 3800*(weight.lb / height.in^3).
       (weight.lb / height.in^3): mean = 0.0005, sd = 0.0005.
       What’s a better way to have set up this regression?

  • ED VUL | UCSD Psychology

Fat
    readr::read_tsv('http://vulstats.ucsd.edu/data/bodyfat.data2.txt')

    What variables predict bodyfat percentage?
    - We have a bunch of very correlated predictors; how can we make new variables to orthogonalize them?
    - What’s a good model to predict bodyfat percentage?
    - What would we predict is the bodyfat percentage of someone who is:
      - Height: 69
      - Weight: 175
      - Neck: 36
      - Chest: 100
      - Abdomen: 90
      - Hip: 99
      - Thigh: 58
      - Knee: 38
      - Ankle: 22
      - Bicep: 31
      - Forearm: 28
      - Wrist: 17