lecture 6: introduction to linear...

Post on 18-Aug-2018

224 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lecture 6:Introduction to Linear Regression

Ani Manichaikulamanicha@jhsph.edu

24 April 2007

2

Linear regression: main idea

Linear regression can be used to study anoutcome as a linear function of a predictorExample: 60 cities in the US were evaluatedfor numerous characteristics, including:

the percentage of the population that was“disadvantaged”median education level

3

Binary education variable10

1520

2530

% o

f pop

ulat

ion

with

inco

me

< $3

000

Low Education High Education

4

Linear regression vs. ANOVAThese means could be compared by a t-test or ANOVA

Mean in low education group: 15.7%Mean in high education group: 13.2%

Regression provides a unified equation:

where Xi= 1 for high education 0 for low education (X is a“dummy variable” or “indicator variable” that designatesgroup)

ii

i10i

X5.27.51Y

XY

5

Interpreting the modelis the predicted mean of the outcome for

Xi, that observation’s value for X.

Xi=0 (Low education)

Xi=1 (High education)

0

i

7.1505.27.51Y

iY

ii

i10i

X5.27.51Y

XY

10

i

2.1315.27.51Y

6

Interpretation

0 is the mean outcome for thereference group, or the group forwhich Xi=0.Here, 0 is the average percent of thepopulation that is disadvantaged forcities with low education.

7

Interpretation

1 is the difference in the meanoutcome between the two groups(when Xi=1 vs. when Xi=0)Here, 1 is difference in the averagepercent of the population that isdisadvantaged for cities with higheducation compared to cities with loweducation.

8

Why use linear regression?

Linear regression is very powerful. Itcan be used for many things:

Binary XContinuous XCategorical XAdjustment for confoundingInteractionCurved relationships between X and Y

9

Regression Analysis

A regression is a description of aresponse measure, Y ,the dependentvariable, as a function of anexplanatory variable, X, theindependent variable.Goal: prediction or estimation of thevalue of one variable, Y , based on thevalue of the other variable, X.

10

Regression Analysis

A simple relationship between the twovariables is a linear relationship(straight line relationship)

Other names: linear, simple linear, leastsquares regression

11

Galton’s Example

1000 records of heights of familygroupsReally tall fathers tend on average tohave tall sons but not quite as tall asthe really tall fathersThere is a “regression” of a son’s heighttoward the average height for sons

12

Galton’s ExampleRegression of Son's Stature on Father'sE(Y) = 33.73 + 0.516*X

Son

's H

eigh

t

Father's Height (inches)60 62 64 66 68 70 72 74

64

66

68

70

72

74

13

Regression Analysis:Population Model

Probability Model: independent responses

y1, y2,…,yn are sampled from

Yi ~ N( i, 2)

Systematic Model: µi = E(yi|xi) = 0 + 1xiwhere: 0 = intercept

1 = slope

14

Another way to write the model

Systematic: yi = 0 + 1xi + i

Probability: i ~ N(0, 2)

The response, Yi, is a linear function ofXi plus some random, normallydistributed error, I

Data = Signal + noise

15

Geometric Interpretation

16

Model1) Yi ~ N( i, 2)2) µi = E(yi|xi) = 0 + 1xi

OR1) yi = 0 + 1xi + i2) i ~ N(0, 2)

where: 0 = intercept1 = slope

The response, Yi, is a linear function of Xiplus some random, normally distributederror, i

17

Interpretation of Coefficients

Mean Model: µ = E(y|x) = 0 + 1x0 = expected response when X = 0

Since: E(y|x=0) = 0 + 1(0) = 0

1 = change in expected response per 1 unitincrease in X

Since: E(y|x+1) = 0 + 1(x+1)And: E(y|x) = 0 + 1x

E(y) from x to x+1 = 1

18

From Galton’s Example

E(Y|x) = 0 + 1xE(Y|x) = 33.7 + 0.52x

where: Y = son’s height (inches)x = father’s height (inches)

Expected son’s height =33.7 inches whenfather’s height is 0 inchesExpected difference in heights for sons whosefathers’ heights differ by one inch = 0.52inches

19

City/Education Example10

1520

2530

9 10 11 12 13Median education

% o

f pop

ulat

ion

with

inco

me

< $3

000

20

Model

where Xi = the median educationlevel in city i

when Xi=0

when Xi=1

when Xi=2

ii

i10i

X0.22.36Y

XY

0

i

36.200.22.36Y

10

i

34.210.22.36Y

232.220.22.36Y

10

i

21

Interpretation

0 is the mean outcome for thereference group, or the group forwhich Xi=0.Here, 0 is the average percent of thepopulation that is disadvantaged forcities with median education level of 0.

22

Interpretation

1 is the difference in the meanoutcome for a one unit change in X.Here, 1 is difference in the averagepercent of the population that isdisadvantaged between two cities,when the first city has 1% highermedian education level than the secondcity.

23

Finding ’s from the graph

0 is the Y-intercept of the line, or theaverage value of Y when X=0.

1 is the slope of the line, or the averagechange in Y per unit change in X.

y=mx+bb= 0, m= 1

21

211 xx

yyˆNotation:

1 represents the true slope (in the population)

b1 and are sample estimates of the slope1ˆ

24

Where is our intercept?10

1520

2530

3540

4550

5560

0 2 4 6 8 10 12 14Median education

% o

f pop

ulat

ion

with

inco

me

< $3

000

25

Centering

0 makes no sense!We can change X to fix this problemby a process called centering

1. Pick a value of X (c) within the range ofthe data

2. For each observation, generateX_centered = Xi-c

3. Redo the regression with X_centered

26

We’ll use c=12,a high school degree

1015

2025

30

9 10 11 12 13Median education

% o

f pop

ulat

ion

with

inco

me

< $3

000

27

New equation

1 has not changed

0 now corresponds to X=12, not X=0

Note: with X=0, we have

12X0.22.12Y

12XY

ii

i10i

36.22412.21200.22.12Yi

28

Interpretation

0 is the mean outcome for the referencegroup, or the group for which Xi-12=0, orwhen Xi=12.Here, 0 (12.2%) is the average percent ofthe population that is disadvantaged for citieswith a median education level of 12, theequivalent of a high school degree.The interpretation of 1 has not changed.

29

Centering in Galton ExampleMake 6 feet (72 inch) fathers the ‘reference group’Create a new X variable, X*, by subtracting 72 fromour old X variable, X* = X – 72

Then: E(Y|x*) = 0 + 1x*= 0 + 1(x – 72)

So, 0 = expected response when X = 72,since E(Y|x=72) = 0 + 1(72 – 72) = 0

Center X’s whenever interpretations call for it!

30

Population Comparisons

0: changes depending on centering of X,which doesn’t affect association of interestReal concern: is X associated with Y?Assess by testing 1:Does 1=0 in the population from which thissample was drawn?

Hypothesis testingConfidence interval

31

Hypothesis testing

H0: 1=0Test statistic:

df = n-k-1n = number of observationsk = number of predictors (X’s)

1

1obs ˆSE

0ˆt

32

Hypothesis testing foreducation example

H0: 1=0Test statistic:

df = n-k-1 = 60-1-1 = 58n = number of observations = 60k = number of predictors (X’s) = 1

p<2*(1-0.995)p<0.01

36.30.59

00.2-tobs

33

Interpretation and conclusionIf there were no association between medianeducation and percentage of disadvantagedcitizens in the population, there would be lessthan a 1% chance of observing data as ormore extreme than ours.

The null probability is very small, so:reject the null hypothesisconclude that median education level andpercentage of disadvantaged citizens areassociated in the population

34

Confidence IntervalNo need to specify a hypothesis:

3.2,-0.8-0.59021.20.2

ˆSEtˆ1cr1

35

Interpretation and conclusion

We are 95% confident that the truepopulation decrease in percentage ofdisadvantaged citizens per additional year ofmedian education is between 3.2 and 0.8.

Since this interval does not contain 0, webelieve percentage of disadvantaged citizensand median education are associated amongcities in the United States.

36

So far…Linear regression is used for continuous outcomevariables

0: mean outcome when X=0Binary X = “dummy variable” for group

1: mean difference in outcome between groupsContinuous X

1: mean difference in outcome corresponding toa 1-unit increase in XCenter X to give meaning to 0

Test 1=0 in the population

Linear Regression:Multiple covariates andconfounding

38

Dataset

Hourly wage information from 9,918workers, along with informationregarding age, gender, years ofexperience, etc.We’ll focus on predicting hourly wagewith available information.

39

Regression: Hourly wage vs.Years of experience

010

2030

4050

0 20 40 60Years of Experience

Hou

rly W

age

40

What are the parameters?For each person, their actual hourly wage (Yi)and predicted hourly wage are known.

is the residual or errorThe parameters are found by minimizing thesum of the squared error

The parameters are the “least squares”estimates

i10i

iii

XYYY

n

1i

2i10i XYmin

iY

41

Notesfor any known pointon the line

is always true

The regression line equation

XY 10

i10i XY

ii10i XY

42

Model 1Model 1: Predict income by years of experience

so the average hourly wage for someonewith no experience at all is about $8.40.

so for every additional year of experience,the predicted hourly wage increases about 4 cents.

For 10 years of additional experience, the predicted hourlywage increases about 40 cents.

38.8ˆ0

04.0ˆ1

iii10i X04.038.8YXˆˆY

43

Should we center X?

0 years of experience is within therange of the dataThe average hourly wage correspondingto 0 years of experience makes sense

No need to center X

44

What happens if we alsoconsider gender? (Model 2)

010

2030

4050

0 20 40 60Years of Experience

Men's hourly wage Women's hourly wagefit2_men fit2_women

Hou

rly W

age

45

Model 2: Gender effect,no experience

For a man with no experience:

For a woman with no experience:0

i

ˆ9.27$)0(2.20-0)(04.027.9Y

20

i

ˆˆ$7.072.20(1)-0.04(0)9.27Y

)enderG(2.20-)Experience(04.027.9Y

)enderG(ˆ)Experience(ˆˆY

iii

i2i10i

46

Model 2: Gender effect,10 years experience

For a man with 10 years of experience:

For a woman with 10 years of experience:

)enderG(2.20-)Experience(04.027.9Y

)enderG(ˆ)Experience(ˆˆY

iii

i2i10i

(10)ˆˆ9.67$)0(2.20-0)1(04.027.9Y

10

i

(1)ˆ(10)ˆˆ7.47$)1(2.20-0)1(04.027.9Y

210

i

47

Model 2: Experience effect,males

For a man with no experience:

For a man with 10 years of experience:0

i

ˆ9.27$)0(2.20-0)(04.027.9Y

)enderG(2.20-)Experience(04.027.9Y

)enderG(ˆ)Experience(ˆˆY

iii

i2i10i

(10)ˆˆ9.67$)0(2.20-0)1(04.027.9Y

10

i

48

Model 2: Experience effect,females

For a woman with no experience:

For a woman with 10 years of experience:

)enderG(2.20-)Experience(04.027.9Y

)enderG(ˆ)Experience(ˆˆY

iii

i2i10i

210

i

ˆ(10)ˆˆ7.47$)1(2.20-0)1(04.027.9Y

20

i

ˆˆ$7.072.20(1)-0.04(0)9.27Y

49

Interpretation: Model 2

: the average hourly wage for a manwith no experience at all is about $9.30.

: for every additional year ofexperience, the predicted hourly wage increasesabout 4 cents for both men and women.

: the expected hourly wage is $2.20lower for women than it is for men at anyexperience level.

27.9ˆ0

04.0ˆ1

20.2ˆ2

50

Model 1 vs. Model 2Model 1:

Model 2:

95% CI for 1 in Model 1: (0.001, 0.07)and from Model 2 is within this CI

Gender is not a confounder

)enderG(2.20-)Experience(04.027.9Y iii

ii Experience04.038.8Y

51

What happens if we considerage, instead? (Model 3)

The relationship is harder to graph with twocontinuous predictors, since now theregression is in a 3-dimensional space.

Notice that age is centered at 40 years.Age ranged between 18 and 64 in thisdataset.

40)-Age(ˆ)Experience(ˆˆY i2i10i

52

Model 3: Age effect,no experience

For a 40-year-old with no experience:

For a 41-year-old with no experience:0

i

ˆ50.62$)4040(0.920)(82.05.26Y

40)-Age(0.92)Experience(82.05.6240)-Age(ˆ)Experience(ˆˆY

ii

i2i10i

20

i

ˆˆ42.72$)4041(0.920)(82.05.26Y

53

Model 3: Age effect,10 years experience

For a 40-year-old with 10 years of experience:

For a 41-year-old with 10 years of experience:

10ˆˆ30.18$)4040(0.920)1(82.05.26Y

10

i

40)-Age(0.92)Experience(82.05.6240)-Age(ˆ)Experience(ˆˆY

ii

i2i10i

1ˆ10ˆˆ22.19$)4041(0.920)1(82.05.26Y

210

i

54

Model 3: Experience effect,40 year old

For a 40-year-old with no experience:

For a 40-year-old with 10 years of experience:0

i

ˆ50.62$)4040(0.920)(82.05.26Y

40)-Age(0.92)Experience(82.05.6240)-Age(ˆ)Experience(ˆˆY

ii

i2i10i

10ˆˆ30.18$)4040(0.920)1(82.05.26Y

10

i

55

Model 3: Experience effect,41 year old

For a 41-year-old with no experience:

For a 41-year-old with 10 years of experience:

40)-Age(0.92)Experience(82.05.6240)-Age(ˆ)Experience(ˆˆY

ii

i2i10i

20

i

ˆˆ42.72$)4041(0.920)(82.05.26Y

1ˆ10ˆˆ22.19$)4041(0.920)1(82.05.26Y

210

i

56

Interpretation: Model 3: the average hourly wage for a 40-

year-old with no experience at all is about$26.50

: for every additional year ofexperience, the predicted hourly wage decreasesabout 82 cents for two people of the same age(or “adjusting for age”)

: for every additional year of age, theexpected hourly wage increases about 92 centsfor two people with the same amount ofexperience (or “adjusting for experience”)

5.26ˆ0

82.0ˆ1

92.0ˆ2

57

Model 1 vs. Model 3Model 1:

Model 3:

95% CI for 1 in Model 1: (0.001, 0.07)and from Model 3 is outside this CI

Age is a confounder. When we adjust for age,the apparent effect of experience on wagechanges.

ii Experience04.038.8Y

40)-Age(0.92)Experience(82.05.62Y iii

58

The Coefficient of Determination

R2 is the “coefficient of determination”R2 measures the ability to predict Yusing XVariability explained by X is

SSM =Total variability is SST =

2)ˆ( yyi

2)( yyi

59

The Coefficient of Determination

R2 is defined as

Measures the proportion of totalvariability explained by the model

2

22

)(

)ˆ(

yy

yy

SSTSSMR

i

i

60

R2 is the square of r, “Pearson’scorrelation coefficient”

r is a rough way of evaluating theassociation between two continuousvariables.

The Coefficient of Determination

61

So, what is R2?

The coefficient of determination, R2

evaluates the entire model.R2 shows the proportion of the totalvariation in Y that has beenpredicted by this model.

Model 1: 0.0076; 0.8% of variationexplainedModel 2: 0.05; 5% of variation explainedModel 3: 0.20; 20% of variation explained

62

What is the adjusted R2?

In both models 2 and 3, the new predictoradded a great deal to the model

R2 increased a lotMore importantly, both new predictors werestatistically significant

R2 always goes up!The adjusted R2 is adjusted for the number ofX’s in the model, so it only goes up whenhelpful predictors are added.

63

SummaryRegression by least squaresInterpreting regression coefficientsAdding a 2nd predictor to a model

Binary X added: 2 parallel linesContinuous X added: 3-dimensional graphfor both, new interpretation reflecting new model

Is the new X a confounder?Compare 1 across models

top related