Regression Analysis
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Regression
To express the relationship between two or more variables by a mathematical formula.
x : predictor (independent) variable
y : response (dependent) variable
Identify how y varies as a function of x.
y is also considered as a random variable.
Real-World Example:
Footwear impressions are commonly observed at crime scenes.
While there are numerous forensic properties that can be obtained
from these impressions, one in particular is the shoe size. The
detectives would like to be able to estimate the height of the
impression maker from the shoe size.
The relationship between shoe sizes and heights
Shoe Size vs. Height
What is the predictor?
What is the response?
Can the height be accurately estimated from the shoe size?
If a shoe size is 11, what would you advise the police?
What if the size is 7 or 12.5?
General Regression Model
The systematic part m(x) is deterministic.
The error ε(x) is a random variable.
Measurement Error
Natural Variations
Additive
$$y(x) = m(x) + \varepsilon(x)$$
Example: Sine Function
$$y(x) = A \sin(\omega x + \varphi) + \varepsilon(x)$$
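A minimal Python sketch of the additive model y(x) = m(x) + ε(x) with a sinusoidal systematic part. The Gaussian noise, the parameter values for A, omega and phi, and the helper name `simulate` are assumptions made for illustration; they are not taken from the slides.

```python
import math
import random


def simulate(m, xs, sigma, rng):
    """Return y(x) = m(x) + eps(x), with eps drawn independently from N(0, sigma^2)."""
    return [m(x) + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for illustration.
A, omega, phi = 2.0, 1.5, 0.3


def m(x):
    """Deterministic (systematic) part of the model."""
    return A * math.sin(omega * x + phi)


rng = random.Random(0)                      # fixed seed for reproducibility
xs = [i * 0.1 for i in range(100)]
ys = simulate(m, xs, sigma=0.5, rng=rng)

# The residuals y - m(x) are the realised errors; their mean should be near 0.
residuals = [y - m(x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / len(residuals)
```

With `sigma=0.0` the simulated values coincide with m(x), which is one way to see that all randomness in y comes from the error term.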
Standard Assumptions
A1
A2
A3
Back to Shoes
Simple Linear Regression
$$m(x) = \beta_0 + \beta_1 x$$
Model Parameters
Derivation
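$$R(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
$$\frac{\partial R}{\partial \beta_0} = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$$
$$\frac{\partial R}{\partial \beta_1} = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$
$$\Rightarrow\; \sum_{i=1}^{n} \left( x_i y_i - \bar{y} x_i + \beta_1 \bar{x} x_i - \beta_1 x_i^2 \right) = 0$$
$$\Rightarrow\; \beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$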
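The closed-form least-squares estimates for the line m(x) = β0 + β1·x can be sketched in a few lines of Python. The function name `fit_line` and the toy data are hypothetical; the formulas are the standard ones obtained by setting the partial derivatives of the sum of squared errors to zero.

```python
def fit_line(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Toy data lying exactly on y = 1 + 2x, so the fit should recover (1, 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(xs, ys)
```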
Standard Deviations
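$$\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \varepsilon_i^2$$
$$\sigma_{\beta_0} = \sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$
$$\sigma_{\beta_1} = \sigma \left( \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$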
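A rough Python sketch of how the standard deviations of the fitted intercept and slope could be computed. It assumes that ε_i denotes the residual y_i − β0 − β1·x_i of the fitted line and that the noise variance is estimated with n − 2 degrees of freedom; the function name `line_fit_with_std` is hypothetical.

```python
import math


def line_fit_with_std(xs, ys):
    """Fit y = beta0 + beta1 * x and estimate the standard deviations of both."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the noise variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual sum of squares and noise estimate with n - 2 degrees of freedom.
    sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    sd_beta0 = sigma * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    sd_beta1 = sigma * math.sqrt(1.0 / sxx)
    return beta0, beta1, sigma, sd_beta0, sd_beta1


# Small made-up data set, used only to exercise the formulas.
beta0, beta1, sigma, sd_beta0, sd_beta1 = line_fit_with_std(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0]
)
```

A large standard deviation relative to the estimate itself suggests that the corresponding parameter is poorly determined by the data.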
Polynomial Terms
Modeling the data as a line is not always adequate.
Polynomial Regression
$$m(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$$
This is still a linear model!
m(x) is linear in the coefficients β: a linear combination of the powers of x.
Danger of Overfitting
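Why a polynomial counts as a linear model can be illustrated with a short Python sketch: once x is expanded into the features 1, x, x², ..., the prediction is a plain dot product with the coefficients. The helper names `poly_features` and `m_poly` and the coefficient values are hypothetical.

```python
def poly_features(x, p):
    """Feature vector [1, x, x^2, ..., x^p] for a single input x."""
    return [x ** k for k in range(p + 1)]


def m_poly(x, beta):
    """Evaluate m(x) = sum_k beta[k] * x^k as a dot product with the features."""
    feats = poly_features(x, len(beta) - 1)
    return sum(b * f for b, f in zip(beta, feats))


# Linearity in the coefficients: m(x; a + b) equals m(x; a) + m(x; b).
beta_a = [1.0, 0.0, 3.0]
beta_b = [0.5, -2.0, 1.0]
beta_sum = [a + b for a, b in zip(beta_a, beta_b)]
x = 2.0
lhs = m_poly(x, beta_sum)
rhs = m_poly(x, beta_a) + m_poly(x, beta_b)
```

The model is nonlinear in x but linear in β, which is why the same least-squares machinery applies for any degree p.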
Matrix Representation
$$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i$$
$$Y = X\beta + \varepsilon$$
Matrix Representation
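$$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$
$$\frac{\partial R}{\partial \beta} = 0 \;\Rightarrow\; -2 X^T Y + 2 X^T X \beta = 0 \;\Rightarrow\; X^T X \beta = X^T Y$$
$$\beta = (X^T X)^{-1} X^T Y$$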
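The matrix solution can be sketched in plain Python without any numerical library. This version solves the normal equations XᵀXβ = XᵀY by Gaussian elimination with partial pivoting instead of forming the inverse explicitly, which is a design choice of this sketch rather than something stated on the slides. The function names `design_matrix`, `solve` and `least_squares` are hypothetical.

```python
def design_matrix(xs, p):
    """Rows [1, x, x^2, ..., x^p], one per observation."""
    return [[x ** k for k in range(p + 1)] for x in xs]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def least_squares(X, Y):
    """Coefficients beta that satisfy the normal equations X^T X beta = X^T Y."""
    cols = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(cols)]
           for a in range(cols)]
    XtY = [sum(row[a] * y for row, y in zip(X, Y)) for a in range(cols)]
    return solve(XtX, XtY)


# Noise-free toy data from y = 1 + 2x + 3x^2; the fit should recover (1, 2, 3).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
beta = least_squares(design_matrix(xs, 2), ys)
```

For high polynomial degrees XᵀX becomes ill-conditioned, so a library routine based on QR or SVD would usually be preferable in practice.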
Model Comparison
Sum of Squares Total: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Sum of Squares Error: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
R²
$$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$
$$R^2_{adj} = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)}$$
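Both goodness-of-fit measures follow directly from the two sums of squares. In the Python sketch below, the function name `r_squared` and the sample values are hypothetical; p is the polynomial degree, so the model has p + 1 parameters.

```python
def r_squared(ys, y_hat, p):
    """Return (R^2, adjusted R^2) for observations ys and predictions y_hat."""
    n = len(ys)
    y_bar = sum(ys) / n
    sst = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, y_hat))     # error sum of squares
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))
    return r2, r2_adj


# Made-up observations and predictions from a straight-line model (p = 1).
r2, r2_adj = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], p=1)
```

The adjusted value penalises extra parameters, so it can decrease when a higher-degree term adds little explanatory power.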
Example
[Figure: Y plotted against X for X from 0 to 5, with a linear and a quadratic fit]
Data: Y = X² + N(0, 1)
Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131
Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880
Summary
Regression is one of the oldest data mining techniques.
Probably the first thing that you want to try on a new data set.
No programming required: MATLAB, Excel …
Quality of Regression
R²
Residual Plot
Cross Validation
What you should learn after class:
Confidence Interval
Multiple Regression
Nonlinear Regression
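Cross validation, listed above as a quality check, can be sketched in Python as follows. The sketch compares a constant (mean-only) model with a straight line by k-fold mean squared error on held-out points. The function names, the interleaved fold assignment and the synthetic data are all assumptions made for illustration.

```python
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Least-squares straight line, returned as a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return lambda x: beta0 + beta1 * x


def k_fold_mse(xs, ys, fit, k=5):
    """Mean squared prediction error over k held-out folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))       # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)           # fit on the remaining points only
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return total / n


# Synthetic data from y = 3 + 2x plus Gaussian noise (illustrative values).
rng = random.Random(0)
xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
mse_mean = k_fold_mse(xs, ys, fit_mean)
mse_line = k_fold_mse(xs, ys, fit_line)
```

Because each model is scored only on points it did not see during fitting, this error is a fairer basis for comparing models than R² computed on the training data.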