Chapter 8 Correlation and Regression Analysis
Statistics in Practice
In Western countries the restaurant trade has an unwritten rule: diners are expected to leave a tip on top of the bill. How large should the tip be? Many people have heard the figure of about 16% of the bill. Is that true? Consider Table 10-1, which contains sample data obtained from a survey. By observing and analyzing these data we can look for the quantitative relationship between the two variables.
Table 10-1 Bill and tip data (dollars)
Bill   33.5   50.7   87.9   98.8   63.6   107.3   120.7
Tip     5.5    5.0    8.1   17.0   12.0    16.0    18.6
The questions are:
1 Is there enough evidence to conclude that a relationship exists between the bill and the tip?
2 If the relationship exists, how can it be used to determine how large a tip should be left?
The focus of this chapter is drawing inferences from sample data that come in pairs. In the example above, we want to determine whether a relationship exists between bill and tip; if it does, we want a formula that describes it, so that we can uncover the rule people follow when they tip. Many questions have this form, for example:
(1) The crime rate and the theft rate;
(2) Cigarette consumption and the incidence of cancer;
(3) Individual income and years of education;
(4) Age and blood pressure;
(5) The heights of parents and of their children;
(6) Salaries and the price of alcohol;
(7) The length of the lifeline on the palm and the length of a person's life.
Key points
1 Correlation and its description by a regression equation;
2 Determining whether a correlation exists;
3 Fitting the regression equation;
4 Applying the regression equation.
Difficult points
1 Calculating the product-moment correlation coefficient
2 The total sum of squared deviations and its decomposition
Overview of correlation
1 Relationships between variables
(1) Functional relationship
Definition: a completely deterministic (quantitative) relationship.
A: The values of one group of variables correspond one-to-one with the values of another group of variables.
[Example] Piece-rate wages (y) and output (x): y = f(x) = 10x; x0 = 1 piece, y0 = 10 yuan; x1 = 2 pieces, y1 = 20 yuan. The area of a circle: S = πR²; R = 10, S = 100π.
B: y is the explained variable (dependent variable); x is the explanatory variable (independent variable).
2 Correlation
(1) Definition: a relationship that is not completely deterministic.
A: One group of variables is related to another, but the correspondence is not one-to-one.
[Example] Height y and weight x: A: x = 60 kg, y = 1.70 m; B: x = 60 kg, y = 1.72 m; C: x = 60 kg, y = 1.68 m; D: x = 60 kg, y = 1.65 m.
B: Description: y = f(x) + ε. Factors that affect height: weight, heredity, exercise, quality of sleep.
2. Causes
(1) Some influencing factors have not yet been recognized;
(2) Some factors have been recognized but cannot be measured;
(3) Measurement error.
[Example] A fruit sells at P yuan per kilo, so the purchase amount is y = Px + ε. For a quantity x = 2 kilos: y = 2P + ε = 2 × 1.9 + 0.2.
3. Forms of quantitative relationship
(1) One-way cause-and-effect relationship;
(2) Mutual cause-and-effect relationship;
(3) Concomitant relationship.
3. Types of correlation
(1) By degree of correlation. A: Perfect correlation: a functional relationship; B: No correlation: the variables are unrelated; C: Imperfect correlation.
(2) By direction of correlation. A: Positive correlation: the variables change in the same direction (they increase together and decrease together); B: Negative correlation: the variables change in opposite directions (one increases while the other decreases).
3 By form of correlation
(1) Linear correlation;
(2) Nonlinear correlation.
[Figure panels: the correlation is close; the correlation is not close]
4 By number of influencing factors
(1) Simple correlation: only one independent variable.
[Example] Study results and the time spent studying; blood pressure and age; yield per unit of area and the amount of fertilizer applied.
(2) Multiple correlation: two or more independent variables.
[Example] The relationship between economic growth and population growth, the level of science and technology, natural resources, the level of management, and so on. The relationship between weight and appetite, hours of sleep, and so on.
(3) Partial correlation: measure the correlation between two variables out of several while holding the other variables fixed.
[Example] For y = a·x1 + b·x2 + ε, study the relationship between y and x1 while holding x2 fixed.
Measuring linear correlation
[Purpose] Measure the direction and the strength of the correlation between variables.
1 Correlation tables and graphs
(1) Correlation table
A: Correlation table grouped on one variable: the independent variable is grouped and its frequencies are counted; for the dependent variable only the group averages are calculated.
Data on 30 enterprises of the same kind
Output x (pieces)   Number of enterprises   Average unit cost y (yuan)
20                  9                       16.8
30                  5                       15.6
40                  5                       15.0
50                  6                       14.8
80                  5                       14.2
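The grouped table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are invented here, and the frequency-weighted overall mean is an extra quantity that the slide itself does not report.

```python
# Grouped data copied from the one-variable correlation table:
# (output x in pieces, number of enterprises f, average unit cost y in yuan)
groups = [(20, 9, 16.8), (30, 5, 15.6), (40, 5, 15.0), (50, 6, 14.8), (80, 5, 14.2)]


def weighted_mean_cost(rows):
    """Frequency-weighted mean of the group averages (hypothetical helper)."""
    total_f = sum(f for _, f, _ in rows)
    return sum(f * y for _, f, y in rows) / total_f


def cost_falls_with_output(rows):
    """True if the average cost drops every time output rises."""
    ordered = sorted(rows)
    return all(lo[2] > hi[2] for lo, hi in zip(ordered, ordered[1:]))


n_enterprises = sum(f for _, f, _ in groups)
overall_cost = weighted_mean_cost(groups)
print(n_enterprises, round(overall_cost, 2), cost_falls_with_output(groups))
```

The falling group averages are the pattern the table is meant to expose: as output rises, unit cost tends to fall, which points to a negative correlation.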
B: Correlation table grouped on two variables: both the dependent variable and the independent variable are grouped.
Note: the independent variable is placed along the X axis (columns), the dependent variable along the Y axis (rows).
Data on 30 enterprises of the same kind
Unit cost        Output x (pieces)
(yuan/piece)     20    30    40    50    80    Total
18                4     -     -     -     -      4
16                4     3     1     1     -      9
15                1     2     3     3     1     10
14                -     -     1     2     4      7
Total             9     5     5     6     5     30
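As a rough cross-check, the margins of the two-way table can be recomputed from its cells. This is a sketch, not part of the original slides: empty cells are assumed to be zero counts, and the variable names are made up for the illustration.

```python
# Two-way correlation table: rows are unit costs, columns are output levels.
outputs = [20, 30, 40, 50, 80]          # pieces
costs = [18, 16, 15, 14]                # yuan per piece
counts = [                              # blank cells taken as 0 (assumption)
    [4, 0, 0, 0, 0],
    [4, 3, 1, 1, 0],
    [1, 2, 3, 3, 1],
    [0, 0, 1, 2, 4],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)


def mean_cost_at(j):
    """Average unit cost among the enterprises in output column j."""
    return sum(c * counts[i][j] for i, c in enumerate(costs)) / col_totals[j]


col_means = [round(mean_cost_at(j), 1) for j in range(len(outputs))]
print(row_totals, col_totals, grand_total, col_means)
```

The column means recovered this way agree with the average unit costs listed in the one-variable table for the same 30 enterprises.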
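(2) Correlation graph: scatter diagram
[Limitation] Tables and graphs make it difficult to gauge the strength of the correlation accurately.
2 (Linear) correlation coefficient
(1) Product-moment formula
Suppose $(x_i, y_i)$, $i = 1, \dots, n$, is a set of sample observations of $(X, Y)$. Then
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
is the correlation coefficient of x and y, where $\sigma_{xy}$ is the covariance and $\sigma_x$, $\sigma_y$ are the standard deviations:
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})/n}{\sqrt{\sum (x-\bar{x})^2/n}\,\sqrt{\sum (y-\bar{y})^2/n}} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{L_{xy}}{\sqrt{L_{xx} L_{yy}}}$$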
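The product-moment definition translates almost line by line into code. The following is a minimal sketch using only the standard library; the function name `pearson_r` and the small test data (the piece-rate wage example y = 10x) are choices made for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation r = L_xy / sqrt(L_xx * L_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    l_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    l_xx = sum((x - mean_x) ** 2 for x in xs)
    l_yy = sum((y - mean_y) ** 2 for y in ys)
    if l_xx == 0 or l_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return l_xy / math.sqrt(l_xx * l_yy)


# A functional relationship (wages y = 10x) gives perfect correlation.
pieces = [1, 2, 3, 4]
wages = [10 * x for x in pieces]
print(pearson_r(pieces, wages))
```

Because the factor 1/n appears in the covariance and in both variances, it cancels, which is why the code works directly with the sums $L_{xy}$, $L_{xx}$ and $L_{yy}$.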
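(2) The role of the covariance $\sigma_{xy}$
A: It shows the direction of the relationship between x and y.
[Positive correlation]
[Figure: scatter plot of the points $(x_1, y_1), \dots, (x_n, y_n)$; the lines $x = \bar{x}$ and $y = \bar{y}$ divide the plane into quadrants (1) to (4)]
When the points fall mainly in quadrants (1) and (3), $(x-\bar{x})(y-\bar{y}) > 0$, so
$$\sigma_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{n} > 0, \qquad r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} > 0$$
Positive correlation: $r > 0$.
[Negative correlation]
[Figure: the same layout, with the points lying mainly in quadrants (2) and (4)]
When the points fall mainly in quadrants (2) and (4), $(x-\bar{x})(y-\bar{y}) < 0$, so
$$\sigma_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{n} < 0, \qquad r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} < 0$$
Negative correlation: $r < 0$.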
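B: It shows the strength of the relationship between x and y.
[Positive correlation] Points in quadrants (1) and (3): $(x-\bar{x})(y-\bar{y}) > 0$.
[Figure A: axes X and Y, points densely distributed; Figure B: axes P and Q, points loosely scattered]
Figure A, dense distribution: $\sum (x-\bar{x})(y-\bar{y})$ tends to be larger. Figure B, scattered distribution: $\sum (p-\bar{p})(q-\bar{q})$ tends to be smaller.
[Negative correlation] Points in quadrants (2) and (4): $(x-\bar{x})(y-\bar{y}) < 0$.
[Figure A: axes X and Y, points densely distributed; Figure B: axes P and Q, points loosely scattered]
Figure A, dense distribution: $\sum (x-\bar{x})(y-\bar{y})$ tends to be larger in absolute value. Figure B, scattered distribution: $\sum (p-\bar{p})(q-\bar{q})$ tends to be smaller in absolute value.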
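[No correlation]
[Figure A and Figure B: scatter plots with the lines $x = \bar{x}$ and $y = \bar{y}$ marked]
Figure A: $x - \bar{x} = 0 \Rightarrow \sum (x-\bar{x})(y-\bar{y}) = 0 \Rightarrow \sigma_{xy} = 0$
Figure B: $y - \bar{y} = 0 \Rightarrow \sum (x-\bar{x})(y-\bar{y}) = 0 \Rightarrow \sigma_{xy} = 0$
x and y have no linear correlation.
[Conclusion] The role of $\sigma_{xy}$
First, it shows the direction of the relationship between x and y. Since $r = \sigma_{xy} / (\sigma_x \sigma_y)$:
$\sigma_{xy} = 0 \Rightarrow r = 0$: no linear correlation
$\sigma_{xy} > 0 \Rightarrow r > 0$: positive correlation
$\sigma_{xy} < 0 \Rightarrow r < 0$: negative correlation
Second, it shows the strength of the relationship between x and y:
the larger $|\sigma_{xy}|$, the stronger the correlation between x and y;
the smaller $|\sigma_{xy}|$, the weaker the correlation between x and y.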
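The sign rule for the covariance can be tried out directly. The sketch below uses the seven bill and tip pairs from Table 10-1; the helper names and the tolerance used to decide that a covariance is zero are assumptions made for the illustration.

```python
def covariance(xs, ys):
    """Population covariance: sum((x - mean_x) * (y - mean_y)) / n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def direction(xs, ys, tol=1e-12):
    """Classify the direction of the linear relationship by the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > tol:
        return "positive"
    if cov < -tol:
        return "negative"
    return "no linear correlation"


# The seven (bill, tip) pairs of Table 10-1, in dollars.
bill = [33.5, 50.7, 87.9, 98.8, 63.6, 107.3, 120.7]
tip = [5.5, 5.0, 8.1, 17.0, 12.0, 16.0, 18.6]

print(direction(bill, tip))
print(direction([1, 2, 3], [3, 2, 1]))
print(direction([-1, 0, 1], [1, 0, 1]))
```

The last call shows why the wording is "no linear correlation" rather than "no relationship": the points lie on a V shape, so y clearly depends on x even though the covariance is zero.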
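(3) The role of $\sigma_x$ and $\sigma_y$
They standardize the covariance, so that covariances of different pairs of variables can be compared directly:
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{n\,\sigma_x \sigma_y} = \frac{1}{n} \sum \frac{(x-\bar{x})}{\sigma_x} \cdot \frac{(y-\bar{y})}{\sigma_y}$$
The last expression is the standardized covariance.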
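Proof that $-1 \le r \le 1$
Write r as the standardized covariance, $r = \frac{1}{n} \sum \frac{(x-\bar{x})}{\sigma_x} \cdot \frac{(y-\bar{y})}{\sigma_y}$. A sum of squares cannot be negative, so
$$0 \le \frac{1}{n} \sum \left( \frac{x-\bar{x}}{\sigma_x} + \frac{y-\bar{y}}{\sigma_y} \right)^2 = \frac{1}{\sigma_x^2} \cdot \frac{\sum (x-\bar{x})^2}{n} + \frac{2}{n} \sum \frac{(x-\bar{x})(y-\bar{y})}{\sigma_x \sigma_y} + \frac{1}{\sigma_y^2} \cdot \frac{\sum (y-\bar{y})^2}{n} = 1 + 2r + 1$$
Hence $2 + 2r \ge 0$, that is, $r \ge -1$. Replacing the plus sign inside the square by a minus sign proves $r \le 1$ in the same way.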
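(4) Shortcut formula for the product-moment correlation coefficient
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})/n}{\sqrt{\sum (x-\bar{x})^2/n}\,\sqrt{\sum (y-\bar{y})^2/n}} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}}$$
Expand the numerator:
$$\sum (x-\bar{x})(y-\bar{y}) = \sum (xy - \bar{x}y - x\bar{y} + \bar{x}\bar{y}) = \sum xy - \bar{x} \sum y - \bar{y} \sum x + n\bar{x}\bar{y}$$
$$= \sum xy - \frac{\sum x \sum y}{n} - \frac{\sum x \sum y}{n} + \frac{\sum x \sum y}{n}$$
Conclusion:
$$\sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{\sum x \sum y}{n}$$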
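[Shortcut formula, denominator]
$$\sum (x-\bar{x})^2 = \sum (x^2 - 2x\bar{x} + \bar{x}^2) = \sum x^2 - 2\bar{x} \sum x + n\bar{x}^2 = \sum x^2 - \frac{2(\sum x)^2}{n} + \frac{(\sum x)^2}{n}$$
Conclusion:
$$\sum (x-\bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad \sum (y-\bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$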
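[Simplified formula for r]
$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}} = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
Dividing the numerator and the denominator by $n^2$ gives the form in terms of means:
$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\overline{x^2} - \bar{x}^2}\,\sqrt{\overline{y^2} - \bar{y}^2}} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}$$
(5) Rules for judging linear correlation
$|r| < 0.3$: weak correlation
$0.3 \le |r| < 0.5$: low correlation
$0.5 \le |r| < 0.8$: marked correlation
$0.8 \le |r| < 1$: high correlation
$r = 0$: x and y have no linear relationship, but they may be related in some other way
$|r| = 1$: x and y have a perfect linear relationship, that is, a functional relationship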
[Example] To study the quantitative relationship between restaurant bills and tips, 10 diners were selected by random sampling. The amounts recorded are as follows.
Restaurant bill and tip data (unit: dollars)
Bill   33.5   50.7   87.9   98.8   63.6   107.3   120.7   78.5   102.3   140.6
Tip     5.5    5.0    8.1   17.0   12.0    16.0    18.6    9.4    15.4    22.5
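The shortcut formula needs only five sums, so it is easy to evaluate for the ten diners. The sketch below is an illustration under two assumptions: the last tip is taken as 22.5, the value that agrees with the sums used later in the chapter, and the strength labels are a paraphrase of the chapter's judging rules.

```python
import math

bill = [33.5, 50.7, 87.9, 98.8, 63.6, 107.3, 120.7, 78.5, 102.3, 140.6]
tip = [5.5, 5.0, 8.1, 17.0, 12.0, 16.0, 18.6, 9.4, 15.4, 22.5]  # last value assumed 22.5


def r_shortcut(xs, ys):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def strength(r):
    """Verbal label for |r| following the chapter's thresholds 0.3, 0.5, 0.8."""
    a = abs(r)
    if a >= 1 - 1e-12:
        return "perfect"
    if a >= 0.8:
        return "high"
    if a >= 0.5:
        return "marked"
    if a >= 0.3:
        return "low"
    return "weak"


r = r_shortcut(bill, tip)
print(round(r, 2), strength(r))
```

The result rounds to the value r = 0.92 quoted for this example in the sections that follow.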
Some people believe that the length of the lifeline on the palm can forecast how long a person will live. In a letter published in the Journal of the American Medical Association, M. E. Wilson and L. E. Mather refuted this belief with a study of cadavers. The age at death and the length of the lifeline on the palm were recorded. The authors concluded that there is no significant correlation between age at death and the length of the lifeline. Palmistry lost, hands down.
(6) Characteristics of the sample correlation coefficient
1. Both variables are random variables.
2. The two variables are treated symmetrically: $r_{xy} = r_{yx}$.
3. How close $|r|$ comes to 1 depends on the sample size n: when n is small, $|r|$ tends toward 1. Special case: when n = 2, $|r| = 1$.
[Example] The sample (x, y) consists of (6, 12.6) and (1, 3.0), so n = 2 and
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} = \frac{48}{\sqrt{25 \times 92.16}} = \frac{48}{48} = 1$$
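The n = 2 special case is easy to confirm numerically: two distinct points always lie exactly on one straight line. A minimal sketch follows; the function name and the second pair of points are made up for the illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    l_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    l_xx = sum((x - mean_x) ** 2 for x in xs)
    l_yy = sum((y - mean_y) ** 2 for y in ys)
    return l_xy / math.sqrt(l_xx * l_yy)


# The two-point sample from the example: (6, 12.6) and (1, 3.0).
r_example = pearson_r([6, 1], [12.6, 3.0])

# Any other two points with different x and different y also give |r| = 1.
r_other = pearson_r([0, 5], [9, 2])

print(round(r_example, 6), round(r_other, 6))
```

This is why a large |r| from a very small sample says little about the population: the value is forced by the sample size, not by the data.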
[Example] Ten stores are drawn at random from 100 stores.
[Table not recoverable; surviving column labels: store, sales amount, profit rate (%)]
(7) Common errors in interpreting correlation
When interpreting a correlation result, three errors are common.
1. Concluding that correlation implies cause and effect. For example, one study indicated that the salaries of statistics professors are positively correlated with beer consumption per person, but both variables are driven by general economic conditions.
2. Concluding that a correlation coefficient of zero means the variables are certainly unrelated. It only rules out a linear relationship.
3. Confusing the correlation computed from averages with the correlation computed from individual data. For example, in one study paired data on individual income and education gave a linear correlation coefficient of 0.4, but when regional averages were used the coefficient rose to 0.7.
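The second and third errors can be made concrete with tiny data sets. Everything in the sketch below is constructed for illustration only (the numbers are not from the studies mentioned above): a parabola with r = 0, and individual data whose correlation rises sharply once it is replaced by group averages.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    l_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    l_xx = sum((x - mean_x) ** 2 for x in xs)
    l_yy = sum((y - mean_y) ** 2 for y in ys)
    return l_xy / math.sqrt(l_xx * l_yy)


# Error 2: y = x^2 is an exact relationship, yet r = 0 because it is not linear.
xs_par = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(xs_par, [x * x for x in xs_par])

# Error 3: hypothetical individual data, two people in each of three groups.
x_ind = [1, 1, 2, 2, 3, 3]
y_ind = [0, 4, 1, 5, 2, 6]
r_individual = pearson_r(x_ind, y_ind)

# Replace the individuals by their group averages and correlate again.
by_group = defaultdict(list)
for x, y in zip(x_ind, y_ind):
    by_group[x].append(y)
x_avg = sorted(by_group)
y_avg = [sum(by_group[x]) / len(by_group[x]) for x in x_avg]
r_averages = pearson_r(x_avg, y_avg)

print(round(r_parabola, 3), round(r_individual, 3), round(r_averages, 3))
```

Averaging removes the scatter within each group, so the correlation between averages overstates how closely individuals follow the line.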
(8) Hypothesis test of linear correlation (two methods)
1. State the null and alternative hypotheses: $H_0: \rho = 0$, $H_1: \rho \ne 0$.
2. Choose the significance level α.
3. Choose the test method and construct the test statistic.
4. Compare the test statistic with the critical value. If the absolute value of the test statistic is larger than the critical value, reject the null hypothesis; otherwise, do not reject it.
t test:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
r test: use the computed r itself as the test statistic; its critical value can be found in a table of critical values of the correlation coefficient.
Hypothesis test of linear correlation: example
As in the earlier example, r for the bill and the tip is 0.92.
Hypotheses: $H_0: \rho = 0$, $H_1: \rho \ne 0$
r test: n = 10, r = 0.92, $r_\alpha = 0.632$. Since $r > r_\alpha$, reject the null hypothesis: there is a significant linear correlation between the two.
t test: if $|t| > t_{\alpha/2}(n-2)$, reject the null hypothesis and conclude that there is a significant correlation between the bill and the tip.
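For the t test, the statistic can be worked out from r and n alone. The sketch below assumes a significance level of 5%, which the slide does not state explicitly, and hard-codes the corresponding two-sided critical value for 8 degrees of freedom from a standard t table.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Assumption: alpha = 0.05, two-sided, df = 10 - 2 = 8 (value from a standard t table).
T_CRITICAL_05_DF8 = 2.306

t = t_statistic(0.92, 10)
reject_h0 = abs(t) > T_CRITICAL_05_DF8
print(round(t, 2), reject_h0)
```

Under these assumptions the t test reaches the same decision as the r test: the null hypothesis of no linear correlation is rejected.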
The third section: Regression analysis
A. Overview of regression analysis
(1) Concepts
1. Linear correlation analysis: calculate the linear correlation coefficient r to establish the direction and the strength of the correlation between two variables.
[Limitation] It cannot indicate which variable is the cause and which the effect, and it cannot predict the change in one variable (y) from one or several other variables ($x_i$).
Bills and tips of ten restaurant customers (dollars)
Bill x   33.5   50.7   63.6   78.5   87.9   98.8   107.3   102.3   120.7   140.6
Tip y     5.5    5.0   12.0    9.4    8.1   17.0    16.0    15.4    18.6    22.5
r = 0.92
2. Regression analysis: explain the change in one variable through the change in other variables.
$y = a + bx$; $y = a + b_1 x_1 + b_2 x_2$; $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
[Regression] The term was first introduced by the English biologist F. Galton.
Parents' height x and offspring's height y: $y = f(x) + \varepsilon$; the heights of offspring regress toward the average height.
(2) Types of regression analysis
1. By number of independent variables
(1) Simple (one-variable) regression: only one independent variable
[Example] $y = a + bx$, the simple regression equation
(2) Multiple regression: two or more independent variables
[Example] $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
2. By the form of the regression equation
(1) Linear regression: the dependent variable is a linear function of the independent variable
[Example] $y = a + bx$, the simple linear regression equation
(2) Nonlinear regression: the dependent variable is a nonlinear function of the independent variable
[Examples] Hyperbolic regression equation; exponential regression equation; logarithmic regression equation; power-function regression equation
(3) Steps of regression analysis
1. Identify the independent and the dependent variable
[Examples] Grain output (y) and the amount of fertilizer applied (x); consumption expenditure (y) and national income (x); fire loss (y) and the distance between the fire and the nearest fire station (x).
2. Establish the sample regression equation
3. Carry out the statistical tests
4. Forecast or control
[Example] The regression equation of consumption on income: $\hat{y} = a + bx = 200 + 0.15x$
Given x, determine y: estimation or forecasting
Given y, determine x: control
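The two uses of a fitted equation, forecasting and control, amount to evaluating the line and inverting it. A small sketch with the consumption equation above follows; the function names and the income figure of 1000 are illustrative assumptions.

```python
# Coefficients of the example equation: consumption = 200 + 0.15 * income.
A = 200.0
B = 0.15


def forecast(income):
    """Given x, estimate y from the regression equation."""
    return A + B * income


def control(target_consumption):
    """Given a target y, solve the equation for the x that would produce it."""
    return (target_consumption - A) / B


y_hat = forecast(1000)        # hypothetical income level
x_needed = control(y_hat)     # inverting the forecast recovers the income
print(y_hat, round(x_needed, 6))
```

Solving the equation for x is the simple reading of "control" used on the slide; it treats the fitted line as exact and ignores the random error around it.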
B. Fitting the simple linear regression equation
(1) Population regression equation
[Example] Data on the disposable income and the consumption expenditure of 40 families
[Table not recoverable; surviving labels: income, consumption, first to fifth group, conditional probability, conditional mean]
[Figure: $Y_i$ (50 to 200) plotted against $X_i$ (80 to 200), showing the distribution of Y at each value of X and the population regression line]
[Assumption] The means of the distributions of Y all lie on one straight line.
Premise 1: there is a linear relationship between X and $E(Y \mid X)$.
Premise 2: $N \to \infty$.
Premise 3: the effects of random factors cancel out.
$E(Y_i \mid X_i) = \alpha + \beta X_i$: the population regression line
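$Y_i \mid X_i$ = conditional mean + $\varepsilon_i$ = $\alpha + \beta X_i + \varepsilon_i$
[Figure: $Y_i$ (50 to 200) plotted against $X_i$ (80 to 200) with the population regression line; the disturbance $\varepsilon_i$ is marked as the distance of an observation from the line at $X_i = 160$]
Random disturbance and assumption:
$$\operatorname{var}(\varepsilon_i) = \frac{\sum (Y_i - \bar{Y}_i)^2}{N} = \sigma^2$$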
[Figure: $Y_i$ (50 to 200) plotted against $X_i$ (80 to 200) with the population regression line]
Population regression line: $E(Y_i \mid X_i) = \alpha + \beta X_i$
Sample regression equation: $\hat{y} = a + bx$
[Fitting idea] Draw a sample of size n from the population of size N and use it to estimate the population regression line.
(2) Fitting the sample regression equation
Draw a random sample from the population to obtain a set of sample observations.
[Example] Data on the disposable income and the consumption expenditure of 40 families
[Table not recoverable; surviving labels: income, consumption, conditional probability, conditional mean]
[Figure: sample observations and the sample regression line, $Y_i$ (50 to 200) against $X_i$ (80 to 200), with the residuals $e_1$ and $e_2$ marked]
Sample regression equation (line): $y_i = \hat{y}_i + e_i = a + bx_i + e_i$
Residual: observed value minus regression value, $e_i = y_i - \hat{y}_i$
Regression coefficients: the sample a estimates the population α; the sample b estimates the population β.
The population regression equation is unknown.
[Figure: the sample regression line (known) and the population regression line (unknown), $Y_i$ against $X_i$, with the residuals $e_1$ and $e_2$ marked]
Steps: 1. Use the sample data to fit the sample regression line, keeping the error as small as possible. 2. Test how well the sample regression line can stand in for the population regression line.
(3) Methods for fitting the sample regression equation
Population: $E(Y \mid X) = \alpha + \beta X$; sample: $\hat{y} = a + bx$
1. Least absolute deviations
The line that makes the sum of the absolute residuals smallest is taken as the "best line".
2. Ordinary least squares (OLS)
Basic idea: the line that makes the sum of the squared residuals smallest is the "best line":
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2 = \min$$
Finding the best line means finding the best a and b.
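Find the values of a and b that make Q smallest:
$$Q = \sum (y - \hat{y})^2 = \sum (y - a - bx)^2 = \min$$
$$\frac{\partial Q}{\partial a} = 2 \sum (y - a - bx)(-1) = 0 \qquad (1)$$
$$\frac{\partial Q}{\partial b} = 2 \sum (y - a - bx)(-x) = 0 \qquad (2)$$
From equation (1) we get
$$a = \frac{\sum y}{n} - b \frac{\sum x}{n} = \bar{y} - b\bar{x}$$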
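Written out, the two conditions are the normal equations
$$\sum y = na + b \sum x \qquad (1)$$
$$\sum xy = a \sum x + b \sum x^2 \qquad (2)$$
with $a = \frac{\sum y}{n} - b \frac{\sum x}{n} = \bar{y} - b\bar{x}$ from equation (1). Substituting a into equation (2), we get
$$\sum xy = \left( \frac{\sum y}{n} - b \frac{\sum x}{n} \right) \sum x + b \sum x^2 = \frac{\sum x \sum y}{n} - b \frac{(\sum x)^2}{n} + b \sum x^2$$
Rearranging:
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$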
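[Simplified calculation]
$$a = \frac{\sum y}{n} - b \frac{\sum x}{n} = \bar{y} - b\bar{x}, \qquad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$
Known: $\sum (x-\bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$ and $\sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{\sum x \sum y}{n}$, so
$$b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{\sum (x-\bar{x})(y-\bar{y})/n}{\sum (x-\bar{x})^2/n} = \frac{\sigma_{xy}}{\sigma_x^2}$$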
The relationship between the correlation coefficient r and the regression coefficient b
For the regression equation $\hat{y} = a + bx$ we have $b = \frac{\sigma_{xy}}{\sigma_x^2}$ and $r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$, so
$$r = \frac{\sigma_{xy}}{\sigma_x^2} \cdot \frac{\sigma_x}{\sigma_y} = b \frac{\sigma_x}{\sigma_y}, \qquad b = r \frac{\sigma_y}{\sigma_x}$$
(1) r and b have the same sign;
(2) r reflects the direction and the strength of the correlation; b reflects the average change in y when x changes by one unit.
[Example] To study the relationship between restaurant bills and tips, ten restaurant customers were drawn at random. The sample data are as follows (unit: dollars):
Bill x   33.5   50.7   63.6   78.5   87.9   98.8   107.3   102.3   120.7   140.6
Tip y     5.5    5.0   12.0    9.4    8.1   17.0    16.0    15.4    18.6    22.5
Sample correlation coefficient: r = 0.92
[Figure: scatter diagram of tip (0 to 25 dollars) against bill, drawn with Excel]
Fit the sample regression equation.
Solution: the scatter diagram shows an approximately linear relationship between the bill and the tip, so we set $\hat{y} = a + bx$.
$$n = 10, \quad \sum x = 883.9, \quad \sum y = 129.5, \quad \sum x^2 = 87703.23, \quad \sum y^2 = 1987.59, \quad \sum xy = 13031.18$$
$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10 \times 13031.18 - 883.9 \times 129.5}{10 \times 87703.23 - 883.9^2} = \frac{15846.75}{95753.09} \approx 0.166$$
$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b \frac{\sum x}{n} = 12.95 - 0.166 \times 88.39 = -1.723$$
Regression equation: $\hat{y} = a + bx = -1.723 + 0.166x$
Economic meaning: for every additional 100 dollars on the bill, the tip increases by 16.6 dollars on average.
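The least squares formulas can be checked against the ten bill and tip pairs. The sketch below is an illustration with invented function names. Carried at full precision, the slope comes out near 0.1655 and the intercept near -1.678; the worked solution rounds the slope to 0.166 before computing the intercept, which is why it reports -1.723.

```python
bill = [33.5, 50.7, 63.6, 78.5, 87.9, 98.8, 107.3, 102.3, 120.7, 140.6]
tip = [5.5, 5.0, 12.0, 9.4, 8.1, 17.0, 16.0, 15.4, 18.6, 22.5]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x using the shortcut sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


a, b = fit_line(bill, tip)

# The first normal equation forces the residuals to sum to zero.
residual_sum = sum(y - (a + b * x) for x, y in zip(bill, tip))

print(round(a, 3), round(b, 4), abs(residual_sum) < 1e-9)
```

Either way the interpretation is the same: each extra dollar on the bill goes with roughly 16 to 17 cents of extra tip on average, close to the "about 16%" rule quoted at the start of the chapter.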
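Analysis of variance for the regression equation
Question: how well does the sample regression line fit the data, that is, what is its goodness of fit?
(1) Decomposition of the total sum of squared deviations
Total deviation = residual + regression deviation:
$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$
With $y = a + bx + e$, $\hat{y} = a + bx$ and $\bar{y} = a + b\bar{x}$, the residual is $y - \hat{y} = e$ and the regression deviation is $\hat{y} - \bar{y} = b(x - \bar{x})$.
[Figure: the regression line $\hat{y} = a + bx$ and the horizontal line $\bar{y}$, with the total deviation $(y - \bar{y})$, the residual $(y - \hat{y})$ and the regression deviation $(\hat{y} - \bar{y})$ marked for one observation]
Squaring both sides and summing:
$$\sum (y - \bar{y})^2 = \sum [(y - \hat{y}) + (\hat{y} - \bar{y})]^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2 + 2 \sum (y - \hat{y})(\hat{y} - \bar{y})$$
The cross term vanishes:
$$\sum (y - \hat{y})(\hat{y} - \bar{y}) = \sum (y - a - bx)(a + bx - a - b\bar{x}) = \sum [(y - \bar{y}) - b(x - \bar{x})]\, b(x - \bar{x}) = b \left[ \sum (x - \bar{x})(y - \bar{y}) - b \sum (x - \bar{x})^2 \right] = 0$$
because $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ implies $\sum (x - \bar{x})(y - \bar{y}) = b \sum (x - \bar{x})^2$.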
[Analysis of the deviations]
$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$
Total sum of squares (SST) = residual sum of squares (SSE) + regression sum of squares (SSR)
(1) SSE: the error arising from the residuals
$$\sum (y - \hat{y})^2 = \sum (a + bx + e - a - bx)^2 = \sum e^2$$
The smaller the errors e, the closer $\hat{y}$ is to y and the better the fit; the larger the errors e, the farther $\hat{y}$ is from y and the worse the fit.
(2) SSR: the deviation arising from changes in x
$$\sum (\hat{y} - \bar{y})^2 = \sum (a + bx - a - b\bar{x})^2 = b^2 \sum (x - \bar{x})^2$$
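For a given total $\sum (y - \bar{y})^2$:
the larger $\sum (\hat{y} - \bar{y})^2$, the smaller $\sum (y - \hat{y})^2$, and the better $\hat{y}$ fits y;
the smaller $\sum (\hat{y} - \bar{y})^2$, the larger $\sum (y - \hat{y})^2$, and the worse $\hat{y}$ fits y.
Dividing both sides of $\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$ by $\sum (y - \bar{y})^2$:
$$1 = \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} + \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$$
$$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$
$r^2$ is the proportion of SST accounted for by SSR; it is called the coefficient of determination.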
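The decomposition SST = SSE + SSR can be verified numerically. The sketch below reuses the ten bill and tip pairs from the restaurant example; the helper names are invented for the illustration.

```python
bill = [33.5, 50.7, 63.6, 78.5, 87.9, 98.8, 107.3, 102.3, 120.7, 140.6]
tip = [5.5, 5.0, 12.0, 9.4, 8.1, 17.0, 16.0, 15.4, 18.6, 22.5]


def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


a, b = fit_line(bill, tip)
mean_tip = sum(tip) / len(tip)
fitted = [a + b * x for x in bill]

sst = sum((y - mean_tip) ** 2 for y in tip)              # total deviation
sse = sum((y - f) ** 2 for y, f in zip(tip, fitted))     # residual part
ssr = sum((f - mean_tip) ** 2 for f in fitted)           # regression part
r2 = ssr / sst

print(round(sst, 3), round(sse + ssr, 3), round(r2, 3))
```

The coefficient of determination obtained this way is the square of the correlation coefficient of about 0.92 quoted for the same data.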
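[Role of the coefficient of determination]
$$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}, \qquad 0 \le r^2 \le 1$$
$r^2 \to 1$: $\hat{y} \to y$, the regression values approach the observed values;
$r^2 \to 0$: $\hat{y} \to \bar{y}$, the regression values approach the mean.
[Figure: the regression line $\hat{y} = a + bx$ and the horizontal line $\bar{y}$, with $(y - \bar{y})$, $(y - \hat{y})$ and $(\hat{y} - \bar{y})$ marked for one observation]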
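The relationship between the coefficient of determination $r^2$ and the correlation coefficient r
Since $\hat{y} = a + bx$ and $\bar{y} = a + b\bar{x}$, we have $\sum (\hat{y} - \bar{y})^2 = \sum (a + bx - a - b\bar{x})^2 = b^2 \sum (x - \bar{x})^2$. Therefore
$$\frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = \frac{b^2 \sum (x - \bar{x})^2}{\sum (y - \bar{y})^2} = \frac{b^2 \sum (x - \bar{x})^2 / n}{\sum (y - \bar{y})^2 / n} = \frac{b^2 \sigma_x^2}{\sigma_y^2}$$
$$= \left( \frac{\sigma_{xy}}{\sigma_x^2} \right)^2 \frac{\sigma_x^2}{\sigma_y^2} = \left( \frac{\sigma_{xy}}{\sigma_x \sigma_y} \right)^2 = r^2$$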
(3) Standard error of the estimate
1. Definition: the average error between the observed values and the regression values.
2. Formula:
Population: $E(Y \mid X) = \alpha + \beta X$; sample: $\hat{y} = a + bx$
$\sum (y - \hat{y})^2$: the sum of squared deviations between the observed values and the regression values.
$$S_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
$S_{yx}$ is the average error between y and $\hat{y}$.
The larger $S_{yx}$, the larger the average deviation and the worse the fit.
The smaller $S_{yx}$, the smaller the average deviation and the better the fit.
Graphical analysis
[Figure: $Y_i$ (50 to 200) plotted against $X_i$ (80 to 200) with the population regression line]
$S_{yx}^2 = \frac{\sum (y - \hat{y})^2}{n - 2}$ is the unbiased estimator of the variance of Y about the population regression line, $\sigma^2 = \frac{\sum (Y - E(Y \mid X))^2}{N}$.
Shortcut formula:
$$S_{yx} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$$
(4) The standard error of the estimate and the coefficient of determination
$$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum (y - \hat{y})^2 / n}{\sum (y - \bar{y})^2 / n}$$
When n is very large, $n \approx n - 2$, so
$$r^2 \approx 1 - \frac{\sum (y - \hat{y})^2 / (n - 2)}{\sigma_y^2} = 1 - \frac{S_{yx}^2}{\sigma_y^2}$$
Hence
$$\frac{S_{yx}^2}{\sigma_y^2} = 1 - r^2, \qquad S_{yx} = \sigma_y \sqrt{1 - r^2}$$
Example: Given the following data, calculate the coefficient of determination and the standard error of the estimate.
Revenue x   Expenditure y   x²      y²     xy
20          7               400     49     140
30          9               900     81     270
33          8               1089    64     264
40          11              1600    121    440
15          5               225     25     75
13          4               169     16     52
26          8               676     64     208
38          10              1444    100    380
35          9               1225    81     315
43          10              1849    100    430
Total 293   81              9577    701    2574
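Solution:
$$n = 10, \quad \sum x = 293, \quad \sum y = 81, \quad \sum x^2 = 9577, \quad \sum y^2 = 701, \quad \sum xy = 2574$$
$$a = 2.1726, \qquad b = 0.2023$$
$$\sum (y - \hat{y})^2 = \sum y^2 - a \sum y - b \sum xy = 4.2992$$
$$S_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{4.2992}{10 - 2}} = 0.73$$
Using the large-sample relation between $r^2$ and $S_{yx}$:
$$r^2 = 1 - \frac{S_{yx}^2}{\sigma_y^2} = 1 - \frac{0.5374}{\overline{y^2} - \bar{y}^2} = 1 - \frac{0.5374}{70.1 - 8.1^2} = 1 - \frac{0.5374}{4.49} = 88.03\%$$
Answer: the average deviation between the observed values and the regression values is 0.73, and 88.03% of the total deviation is due to changes in x.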
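The figures of this example can be reproduced from the five column totals. The sketch below is an illustration with made-up variable names. It computes the coefficient of determination two ways: with the large-sample relation used in the worked solution, and exactly as 1 - SSE/SST. With only ten observations the two differ noticeably, roughly 0.88 against 0.90.

```python
import math

# Column totals from the revenue and expenditure table.
n = 10
sum_x, sum_y = 293, 81
sum_xx, sum_yy, sum_xy = 9577, 701, 2574

# Least squares coefficients from the shortcut formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Residual sum of squares via the shortcut: sum(y^2) - a*sum(y) - b*sum(xy).
sse = sum_yy - a * sum_y - b * sum_xy
s_yx = math.sqrt(sse / (n - 2))

# Variance of y with divisor n, and the total sum of squares.
var_y = sum_yy / n - (sum_y / n) ** 2
sst = n * var_y

r2_large_sample = 1 - s_yx ** 2 / var_y   # relation that treats n - 2 as n
r2_exact = 1 - sse / sst                  # definition of r^2

print(round(a, 4), round(b, 4), round(s_yx, 2),
      round(r2_large_sample, 4), round(r2_exact, 4))
```

The gap between the two values comes entirely from replacing n by n - 2, so it shrinks as the sample grows.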
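The third section: Multiple linear regression analysis
1. Multiple linear regression model: it studies the quantitative relationship between a dependent variable and two or more independent variables, under the condition that the relationship is linear. The model is:
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + e_i$$
2. Parameter estimation for the multiple linear regression model: the least squares method. The estimates of the regression coefficients are usually obtained with statistical software. For n observations and k independent variables the equation can be expressed in matrix form:
$$Y = XB + U, \qquad Y = X\hat{B} + e, \qquad \hat{Y} = X\hat{B}$$
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix}, \quad B = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
$$\hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix}, \quad \hat{B} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$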
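In practice statistical software does the estimation, but the matrix form can be sketched in a few lines. The example below is a hedged illustration, not the slides' own method: it solves the standard least squares normal equations $(X^{T}X)\hat{B} = X^{T}Y$, a result the slides do not spell out, with a small hand-written elimination routine, and the data are synthetic, generated from y = 1 + 2·x1 + 3·x2 with no error term.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_multiple(x_rows, y):
    """Least squares estimate of B from the normal equations (X'X) B = X'Y."""
    X = [[1.0] + list(row) for row in x_rows]      # prepend the column of ones
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, error-free data (hypothetical): y = 1 + 2*x1 + 3*x2.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

b_hat = fit_multiple(x_rows, y)
print([round(v, 6) for v in b_hat])
```

Because the synthetic data contain no error term, the estimates recover the generating coefficients; with real data the residual vector e would be nonzero.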
Thanks for Your Attention