04/19/23 H.S. 1
Linear Regression
Hein Stigum
Presentation, data and programs at:
http://folk.uio.no/heins/courses
Outcome and regression types
• Numerical data
– Discrete, e.g. number of partners: Poisson regression
– Continuous, e.g. weight: Linear regression
• Categorical data
– Nominal, e.g. disease/ no disease: Logistic regression
– Ordinal, e.g. small/ medium/ large: Ordinal regression
Regression idea
model: $y = b_0 + b_1 x + e$
outcome = $y$
covariate = $x$
coefficient, effect of $x$ = $b_1$
residual error = $e$
model with many cofactors: $y = b_0 + b_1 x_1 + b_2 x_2 + e$
covariates = $x_1, x_2$
[Figure: scatter plot of birth weight (gram) against gestational age (days)]
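The one-covariate model can be tried out in Stata. This is a minimal sketch on simulated data: the variable names `weight` and `gest`, the seed and all numbers are assumptions for illustration, not the course data.

```stata
* Simulated stand-in data (assumed names and values, not the course data)
clear
set seed 2023
set obs 500
generate gest = round(rnormal(280, 10))
generate weight = 1900 + 6*gest + rnormal(0, 450)

* Model with one covariate: weight = b0 + b1*gest + e
regress weight gest

* Scatter of outcome by covariate, with the fitted regression line
twoway (scatter weight gest) (lfit weight gest)
```

The coefficient reported for `gest` is the estimate of b1, and `_cons` is the estimate of b0.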
Measures and Assumptions
Model: $\mathrm{weight} = b_0 + b_1\,\mathrm{gest} + b_2\,\mathrm{sex} + e$
• Adjusted effects
– $b_1$ is the increase in weight per day of gestational age
– $b_1$ is adjusted for sex (the covariate with coefficient $b_2$)
• Assumptions
– Independent errors
– Linear effects
– Constant error variance
• Robustness
– Influence
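To see what "adjusted" means in practice, one can compare the crude and the adjusted coefficient for gestational age. The sketch below uses simulated data; the coding of `sex` (1 = boy, 2 = girl) and all numbers are assumptions.

```stata
* Simulated stand-in data (assumed names, coding and values)
clear
set seed 2023
set obs 500
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 10) + 2*(sex == 2))
generate weight = 1900 + 6*gest - 140*(sex == 2) + rnormal(0, 450)

* Crude (unadjusted) effect of gestational age
regress weight gest

* Effect of gestational age adjusted for sex
regress weight gest sex
```

In the second model the coefficient for `gest` is the expected increase in weight per day of gestational age with sex held fixed.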
Workflow
• DAG
• Plots: distribution and scatter
• Bivariate analysis
• Regression
– Model estimation
– Test of assumptions
  • Independent errors (discuss)
  • Linear effects (plot)
  • Constant error variance (plot)
– Robustness
  • Influence
DAGs
[DAG nodes: E = gest age, D = birth weight, C1 = sex, C2 = parity]
Associations: bivariate (unadjusted)
Causal effects: multivariable (adjusted)
Draw your assumptions before your conclusions
Plot outcome by exposure
OK
Be clear on the research question:
• overall birth weight: linear regression
• low birth weight: logistic regression
• linear and logistic can give opposite results
Effects on linear regression:
• May lead to non-constant error variance
• May have highly influential outliers
Bivariate analysis
Outcome: birth weight
                      N     Mean    p-value
All                  564    3604
Gestational age                     <0.001
  <=280 days         230    3436
  >280 days          288    3744
Sex                                 0.004
  Boy                291    3668
  Girl               273    3535
Parity                              <0.001
  0                  225    3485
  1                  215    3677
  2                  123    3695
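A table of this kind could be produced along the following lines. The slide does not say which tests gave the p-values, so the t-test and one-way ANOVA here are assumptions, as are the simulated data, the variable names and the helper variable `gest_gt280`.

```stata
* Simulated stand-in data (assumed names, coding and values)
clear
set seed 2023
set obs 500
generate sex = 1 + (runiform() < 0.5)
generate parity = floor(3*runiform())
generate gest = round(rnormal(280, 10))
generate weight = 1900 + 6*gest - 140*(sex == 2) + 200*(parity > 0) + rnormal(0, 450)

* Dichotomize gestational age at 280 days (missing stays missing)
generate gest_gt280 = (gest > 280) if gest < .

* N and mean birth weight per group, with a test of the group difference
tabulate gest_gt280, summarize(weight)
ttest weight, by(gest_gt280)

tabulate sex, summarize(weight)
ttest weight, by(sex)

tabulate parity, summarize(weight)
oneway weight parity
```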
Categorical covariates
• 2 categories
– OK, but know the coding
• 3+ categories
– Use “dummies”
• “Dummies” are 0/1 variables used to create contrasts
• Want 3 categories for parity: 0, 1 and 2-7 children
• Choose 0 as reference
• Make dummies for the two other categories
* 0/1 dummy for parity 1; observations with missing parity stay missing
generate Parity1 = (parity==1) if parity<.
* 0/1 dummy for parity 2-7
generate Parity2_7 = (parity>=2) if parity<.
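The dummies then enter the regression as ordinary covariates, each giving a contrast against the reference category (parity 0). A minimal sketch on simulated data follows; the data, the numbers and the helper variable `parity3` are assumptions. Stata's factor-variable notation `i.` is shown as an equivalent alternative.

```stata
* Simulated stand-in data (assumed names and values)
clear
set seed 2023
set obs 500
generate parity = floor(8*runiform())
generate weight = 3500 + 230*(parity >= 1) + rnormal(0, 450)

* Dummies as on the slide, with parity 0 as reference
generate Parity1 = (parity==1) if parity<.
generate Parity2_7 = (parity>=2) if parity<.
regress weight Parity1 Parity2_7

* Same contrasts from a 3-category variable and factor-variable notation
generate parity3 = min(parity, 2) if parity<.
regress weight i.parity3
```

Both models give the same two coefficients: the difference in expected birth weight between parity 1 and parity 0, and between parity 2-7 and parity 0.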
Create meaningful constant
Expected birth weight: $E(y) = b_0 + b_1\,\mathrm{gest} + b_2\,\mathrm{sex} + b_3\,\mathrm{Parity1} + b_4\,\mathrm{Parity2\_7}$
At gest=0, sex=0, parity=0: $E(y) = b_0 = 1972$ gr
At gest=280, sex=1, parity=0: $E(y) = b_0 + b_1 \cdot 280 + b_2 \cdot 1 = 3524$ gr
Alternative: center variables
* gest280 has a meaningful zero at 280 days
gen gest280=gest-280
* sex0 has a meaningful zero at boys
gen sex0=sex-1
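As a check on the two constants, the reference value can be rebuilt from the uncentered constant and the slopes reported elsewhere in this presentation (6.04 per day of gestational age, -139.2 for sex). Centering only shifts the constant, so the slopes are the same in both versions of the model. The numbers are rounded, so the agreement is approximate.

```latex
\begin{align*}
E(y \mid \mathrm{gest}=280,\ \mathrm{sex}=1,\ \mathrm{parity}=0)
  &= b_0 + 280\,b_1 + 1 \cdot b_2 \\
  &\approx 1972 + 280 \times 6.04 - 139.2 \\
  &= 1972 + 1691.2 - 139.2 = 3524.0 \text{ gr}
\end{align*}
```

This matches the reported birth weight at the reference (3524.3) up to rounding of the coefficients.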
Model results
                        coeff    95% conf. int.
Birth weight at ref    3524.3
Gestational age
  per day                 6.0    (3.9 , 8.2)
Sex
  Boy                     0
  Girl                 -139.2    (-228.9 , -49.5)
Parity
  0                       0
  1                     232.0    (130.6 , 333.5)
  2-7                   226.0    (106.9 , 345)
Test of assumptions
• Independent residuals?
– Discuss
• Linear effects? Constant variance?
– Plot residuals versus predicted y
[Figure: residuals against linear prediction; outlier not included]
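A residual plot of this kind can be sketched as follows. The data are simulated, and the variable names `p` and `res` are assumptions chosen for illustration.

```stata
* Simulated stand-in data (assumed names, coding and values)
clear
set seed 2023
set obs 500
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 10))
generate weight = 1900 + 6*gest - 140*(sex == 2) + rnormal(0, 450)

regress weight gest sex

* Linear prediction and residuals from the fitted model
predict p, xb
predict res, residuals

* Residuals versus predicted y, with a reference line at zero
scatter res p, yline(0)
```

A curved pattern around the zero line suggests non-linear effects, and a fan shape suggests non-constant variance. After `regress`, the built-in `rvfplot, yline(0)` draws the same plot in one step.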
Violations of assumptions
• Dependent residuals
– Use linear mixed models
• Non-linear effects
– Add square term
– Or use piecewise linear
• Non-constant variance
– Use robust variance estimation
[Figures: one panel with gest on the x-axis; one panel of res against p]
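Each of these remedies corresponds to a standard Stata command. The sketch below is one possible way to write them, on simulated data. The cluster variable `mother`, the knot at 280 days and all numbers are assumptions, not taken from the course material.

```stata
* Simulated stand-in data: 2 births per hypothetical mother
clear
set seed 2023
set obs 500
generate mother = ceil(_n/2)
bysort mother: generate u = rnormal(0, 200) if _n == 1
by mother: replace u = u[1]
generate sex = 1 + (runiform() < 0.5)
generate gest = round(rnormal(280, 10))
generate weight = 1900 + 6*gest - 140*(sex == 2) + u + rnormal(0, 400)

* Dependent residuals: linear mixed model, random intercept per mother
mixed weight gest sex || mother:

* Non-linear effects, option 1: add a square term
regress weight c.gest##c.gest sex

* Non-linear effects, option 2: piecewise linear, knot at 280 days
mkspline gest_lo 280 gest_hi = gest
regress weight gest_lo gest_hi sex

* Non-constant variance: robust variance estimation
regress weight gest sex, vce(robust)
```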
Influence
[Figure: birth weight against gestational age, showing the outlier and two fitted lines: regression with outlier and regression without outlier]
Measures of influence
• Measure change in:
– Predicted outcome
– Deviance
– Coefficients (beta)
  • Delta beta
• Remove obs 1, see change; remove obs 2, see change
[Figure: Delta beta for gestational age; influence plotted against Id]
[Figure: Dfbeta for gestational age against weight; observation 539 is labelled]
beta for gestational age= 6.04
If obs nr 539 is removed, beta will change from 6 to 16
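Delta beta can be computed with Stata's `dfbeta` after `regress`. The sketch below plants one implausible gestational age in simulated data to mimic an influential outlier. The data, the variable names `id` and `absdfb`, and the planted value are assumptions. Note that Stata's `dfbeta` reports the change in the coefficient scaled by its standard error, so its values are not on the same scale as the raw change in beta.

```stata
* Simulated stand-in data with one planted outlier (assumed values)
clear
set seed 2023
set obs 500
generate id = _n
generate gest = round(rnormal(280, 10))
generate weight = 1900 + 6*gest + rnormal(0, 450)
replace gest = 700 in 1

regress weight gest

* Scaled change in the gest coefficient when each observation is removed
* (stored in the new variable _dfbeta_1)
dfbeta gest
scatter _dfbeta_1 weight, mlabel(id)

* List the most influential observations
generate absdfb = abs(_dfbeta_1)
gsort -absdfb
list id gest weight _dfbeta_1 in 1/5

* Refit without the planted outlier and compare the coefficient for gest
regress weight gest if id != 1
```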
Removing outlier
                            Full data                  Outlier removed
                        coeff   95% conf. int.     coeff   95% conf. int.
Birth weight at ref      3524                       3531
Gestational age
  per day                   6   (4 , 8)               17   (13 , 20)
Sex
  Boy                       0                          0
  Girl                   -139   (-229 , -49)        -166   (-252 , -80)
Parity
  0                         0                          0
  1                       232   (131 , 333)          229   (132 , 326)
  2-7                     226   (107 , 345)          225   (112 , 339)
One outlier affected two estimates
Final model