TRANSCRIPT
– The shortest distance is the one that crosses at 90° the vector u
Statistical Inference on Correlation and Regression
Statistical Inference on Correlation
– Angle between two variables
– Relationship between two variables
Statistical Inference on Correlation
The null hypothesis is that there is no correlation between the two variables in the population; in other words, we ask whether the two variables are linearly independent. If the null hypothesis is rejected, the two variables are not independent and there is a linear relationship between them.
H_0: \rho_{xy} = 0
H_1: \rho_{xy} \neq 0

F_{xy} = \frac{r_{xy}^2 \, df}{1 - r_{xy}^2}, \qquad df = n - 2
Statistical Inference on Correlation
Example

H_0: \rho_{xy} = 0
H_1: \rho_{xy} \neq 0

n = 5, \quad r_{xy} = 0.88, \quad r_{xy}^2 = 0.7744, \quad df = n - 2 = 3

F_{xy} = \frac{r_{xy}^2 \, df}{1 - r_{xy}^2} = \frac{0.7744 \times 3}{1 - 0.7744} = 10.3
In this case, we cannot use the standard normal distribution (Z distribution); we use the F-ratio distribution instead (see the PDF file).
Statistical Inference on Correlation

Example (continued)

The two degrees of freedom of the F distribution are:
– numerator: number of variables − 1 (here 2 − 1 = 1)
– denominator: number of participants − 2 (here 5 − 2 = 3)
Statistical Inference on Correlation

Example (conclusion)

F(\alpha, 1, df) = F(0.05, 1, 3) = 10.13

Because F_{xy} > F(0.05, 1, 3) (10.3 > 10.13), we reject H_0 and therefore accept H_1: there is a linear dependency between the two variables.
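The decision rule above can be sketched in a few lines of Python. This is a minimal sketch: the critical value F(0.05, 1, 3) = 10.13 is taken from the F table quoted in the slide rather than recomputed.

```python
# F test for a correlation coefficient: F_xy = r^2 * df / (1 - r^2), df = n - 2.
# Numbers are from the slide's example (n = 5, r_xy = 0.88).
n = 5            # number of participants
r_xy = 0.88      # sample correlation between x and y
df = n - 2       # degrees of freedom

F_xy = (r_xy ** 2 * df) / (1 - r_xy ** 2)   # observed F ratio, ~10.3
F_crit = 10.13                              # tabled critical value F(0.05, 1, 3)

reject_H0 = F_xy > F_crit                   # True -> reject H0: linear dependency
print(round(F_xy, 1), reject_H0)            # 10.3 True
```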
Linear regression
We want a functional relationship between two variables, not only a strength of association. In other words, we want to be able to predict the outcome y given a predictor x.

[Figure: scatter of data points (x1, y1), …]
Recall: finding the slope and the constant of a line
Linear regression
• Regression: v = b_0 + b_1 u + e, where e is the residual (error)
– By substitution, we can isolate the b_1 coefficient.
Linear regression
• Regression: the formula for the regression coefficients can be deduced directly from geometry.

Because the residual e is orthogonal to u (the shortest distance crosses u at 90°):

u^T e = 0
u^T (v - u \, b_1) = 0
u^T v - u^T u \, b_1 = 0
(u^T u)^{-1} u^T v = (u^T u)^{-1} (u^T u) \, b_1
b_1 = (u^T u)^{-1} u^T v
If we generalize to any situation (multiple, multivariate):

B = (X^T X)^{-1} X^T Y

b_1 = (u^T u)^{-1} u^T v = \frac{cov_{uv}}{s_u^2} \quad \text{(true for 2 variables only)}
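As a minimal sketch in pure Python (no linear-algebra library), the normal equations B = (X^T X)^{-1} X^T Y can be solved by hand in the 2×2 case of one predictor plus an intercept column; the x and y values are the ones used in the worked example at the end of this section.

```python
# Normal equations B = (X^T X)^(-1) X^T Y for X = [[1, x_i]] (intercept + slope).
x = [8, 6, 3, 5, 7, 2, 4]
y = [10, 8, 2, 6, 9, 2, 5]
n = len(x)

sx, sy = sum(x), sum(y)                      # column sums
sxx = sum(v * v for v in x)                  # sum of x_i^2
sxy = sum(a * b for a, b in zip(x, y))       # sum of x_i * y_i

# X^T X = [[n, sx], [sx, sxx]]; invert this 2x2 matrix explicitly
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det             # intercept
b1 = (n * sxy - sx * sy) / det               # slope

print(round(b0, 2), round(b1, 2))            # -1.32 1.46
```

With unrounded intermediates the intercept is −1.32; the slides round it to −1.3.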
b_1 = \frac{Cov_{xy}}{s_x^2}

b_0 = \bar{y} - b_1 \bar{x}

\hat{y} = b_0 + b_1 x

If we replace b_0:
Parameters of the linear regression

\hat{y} = b_0 + b_1 x
\hat{y} = \bar{y} - b_1 \bar{x} + b_1 x
\hat{y} = \bar{y} + (x - \bar{x}) \, b_1

Equation of prediction
We know that:

r_{xy} = \frac{Cov_{xy}}{s_x s_y} \quad \Rightarrow \quad Cov_{xy} = r_{xy} s_x s_y

Note: if we replace the covariance we then obtain:
b_1 = \frac{Cov_{xy}}{s_x^2} = \frac{r_{xy} s_x s_y}{s_x^2} = r_{xy} \frac{s_y}{s_x}
Example

Participant   x    y   x−x̄   y−ȳ   (x−x̄)²   (y−ȳ)²   (x−x̄)(y−ȳ)
    1         8   10     3     4       9       16          12
    2         6    8     1     2       1        4           2
    3         3    2    −2    −4       4       16           8
    4         5    6     0     0       0        0           0
    5         7    9     2     3       4        9           6
    6         2    2    −3    −4       9       16          12
    7         4    5    −1    −1       1        1           1
    Σ        35   42     0     0      28       62          41

\bar{x} = 5, \quad \bar{y} = 6
s_x = \sqrt{28/6} = 2.16, \quad s_y = \sqrt{62/6} = 3.21
Cov_{xy} = 41/6 = 6.83
r_{xy} = \frac{Cov_{xy}}{s_x s_y} = 0.98, \quad r_{xy}^2 = 0.96
Example

b_1 = r_{xy} \frac{s_y}{s_x} = 0.98 \times \frac{3.21}{2.16} = 1.46

b_0 = \bar{y} - b_1 \bar{x} = 6 - 1.46 \times 5 = -1.3

\hat{y} = b_0 + b_1 x = -1.3 + 1.46 \, x
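As a quick check, the whole worked example can be reproduced in a few lines of Python. This sketch follows the sample-statistics route b_1 = r_{xy} s_y / s_x; with unrounded intermediates b_0 comes out as −1.32 rather than the slide's rounded −1.3.

```python
import math

# Data from the example table (7 participants)
x = [8, 6, 3, 5, 7, 2, 4]
y = [10, 8, 2, 6, 9, 2, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n                                  # 5.0, 6.0
sx = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))            # ~2.16
sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))            # ~3.21
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)   # ~6.83
r = cov / (sx * sy)                                                  # ~0.98

b1 = r * sy / sx        # ~1.46, same as cov / sx**2
b0 = ybar - b1 * xbar   # ~-1.32 (the slide rounds to -1.3)
yhat = b0 + b1 * 5      # prediction at x = xbar = 5 is ybar = 6.0
```

Note the last line: the regression line always passes through the point of means (x̄, ȳ).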