
ACTL2002/ACTL5101 Probability and Statistics: Week 10

© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales

[email protected]

Probability: Week 1 Week 2 Week 3 Week 4
Estimation: Week 5 Week 6 Review
Hypothesis testing: Week 7 Week 8 Week 9
Linear regression: Week 11 Week 12
Video lectures: Week 1 VL Week 2 VL Week 3 VL Week 4 VL Week 5 VL



Last nine weeks

Introduction to probability;

Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;

Special univariate (parametric) distributions (discrete & continuous);

Joint distributions;

Convergence, with applications LLN & CLT;

Estimators (MME, MLE, and Bayesian);

Evaluation of estimators;

Interval estimation & hypothesis tests.

3102/3173


This week

Simple linear regression:

- Idea;

- Estimation using LSE (BLUE estimator & relation to MLE);

- Partitioning the variability of the dependent variable;

- Testing:

i) Slope;

ii) Intercept;

iii) Regression line;

iv) Correlation coefficient.

3103/3173


Simple Linear Regression

Simple linear regression
- Basic idea
- Correlation Coefficient
- Assumptions
- Relation MLE and LSE
- Partitioning the variability
- Exercise

Testing in simple linear regression
- Overview
- Inference on the slope
- Inference on the intercept
- Confidence Intervals for the Population Regression Line
- Prediction Intervals for the Actual Value of the Dependent Variable
- Hypothesis Test for Population Correlation
- Exercise

Appendix
- Algebra: parameter estimates
- Properties of parameter estimates
- Proof: Partitioning the variability


Simple linear regression

Basic idea

Suppose we observe data y = [y1, …, yn]⊤;

Assume that Y is affected by X, with x = [x1, …, xn]⊤;

What can we say about the relationship between X and Y?

To do so we fit:

y = β0 + β1·x + ε

to the data:

(xi, yi) for i = 1, 2, …, n.

y is called the endogenous variable (or response/dependent variable);

x is called the exogenous variable (or predictor/independent variable);

Question: how to determine β0 and β1?

3104/3173


Simple linear regression

Basic idea

Regression, with E[εi] = 0:

yi = β0 + β1·xi + εi.

We determine β0 and β1 by minimizing S(β0, β1) = Σ_{i=1}^n εi².

Hence, we use least squares estimates (LSE) for β0 and β1:

min_{β0,β1} S(β0, β1) = min_{β0,β1} Σ_{i=1}^n εi² = min_{β0,β1} Σ_{i=1}^n (yi − (β0 + β1·xi))².

The minimum is obtained by setting the FOC equal to zero:

∂S(β0, β1)/∂β0 = −2 · Σ_{i=1}^n (yi − (β0 + β1·xi))

∂S(β0, β1)/∂β1 = −2 · Σ_{i=1}^n xi·(yi − (β0 + β1·xi)).

3105/3173


Simple linear regression

Basic idea

The LSE β̂0 and β̂1 are given by setting the FOC equal to zero:

Σ_{i=1}^n yi = n·β̂0 + β̂1·Σ_{i=1}^n xi

Σ_{i=1}^n xi·yi = β̂0·Σ_{i=1}^n xi + β̂1·Σ_{i=1}^n xi².

Next step: β̂0 and β̂1 as functions of Σ_{i=1}^n yi, Σ_{i=1}^n xi, Σ_{i=1}^n xi², and Σ_{i=1}^n xi·yi.

See F&T page 24, slides 3161-3165 (MLE on slide 3123):

β̂0 = ȳ − β̂1·x̄;    β̂1 = ( Σ_{i=1}^n xi·yi − n·x̄·ȳ ) / ( Σ_{i=1}^n xi² − n·x̄² );

σ̂² = Σ_{i=1}^n (yi − ŷi)² / (n − 2)
   = (1/(n − 2)) · ( Σ_{i=1}^n yi² − n·ȳ² − ( Σ_{i=1}^n xi·yi − n·x̄·ȳ )² / ( Σ_{i=1}^n xi² − n·x̄² ) )

3106/3173
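These closed-form estimates are easy to check numerically. Below is a minimal Python/numpy sketch; the function name lse_simple and the simulated data are illustrative assumptions, not from the course:

import numpy as np

def lse_simple(x, y):
    # Closed-form LSE for y = b0 + b1*x + eps (formulas above).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    s2 = np.sum((y - (b0 + b1 * x))**2) / (n - 2)  # unbiased estimate of sigma^2
    return b0, b1, s2

rng = np.random.default_rng(1)                      # illustrative simulated data
x = rng.uniform(0.0, 10.0, 200)
y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, 200)
print(lse_simple(x, y))                             # approximately (3.0, 1.5, 1.0)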



Simple linear regression

Correlation Coefficient

Correlation Coefficient

Regression: find the dependency of Y on X, i.e., they have a joint distribution.

β1 gives the marginal effect of a change in X; ρXY measures the strength of the dependence.

Recall from week 3: the correlation coefficient between a pair of random variables X and Y, denoted by ρXY, or simply ρ, is:

ρXY = Cov(X, Y)/(σX·σY) = E[(X − μX)·(Y − μY)] / √( E[(X − μX)²] · E[(Y − μY)²] ).

The value of the correlation coefficient lies between −1 and 1, i.e., −1 ≤ ρXY ≤ 1.

3107/3173


Simple linear regression

Correlation Coefficient

Correlation Coefficient

We know that the correlation coefficient has the following interpretations:

- A correlation of −1 indicates a perfect negative linear relationship;

- A correlation of 1 indicates a perfect positive linear relationship;

- A correlation of 0 implies no linear relationship;

- The larger the correlation in absolute value, the stronger the (positive/negative) linear relationship.

3108/3173


Simple linear regression

Correlation Coefficient

Correlations are indications of linear relationships: it is possible that two variables have zero correlation, but are strongly dependent (non-linearly).

The correlation coefficient is a population parameter that can be estimated from data.

Suppose we have n pairs of observations denoted by:

(x1, y1), (x2, y2), …, (xn, yn).

Estimate the population correlation ρXY using (week 3):

sx = √( (1/(n − 1)) · Σ_{i=1}^n (xi − x̄)² )  and  sy = √( (1/(n − 1)) · Σ_{i=1}^n (yi − ȳ)² )

to estimate the population standard deviations σX and σY, respectively.

3109/3173


Simple linear regression

Correlation Coefficient

Correlation Coefficient

Similarly, the sample covariance is given by:

sX,Y = (1/(n − 1)) · Σ_{i=1}^n (xi − x̄)·(yi − ȳ).

Thus the sample correlation coefficient is:

r = (1/(n − 1)) · Σ_{i=1}^n (xi − x̄)·(yi − ȳ) / (sx·sy)

  = Σ_{i=1}^n (xi − x̄)·(yi − ȳ) / √( Σ_{i=1}^n (xi − x̄)² · Σ_{i=1}^n (yi − ȳ)² )

  = ( Σ_{i=1}^n xi·yi − n·x̄·ȳ ) / √( ( Σ_{i=1}^n xi² − n·x̄² ) · ( Σ_{i=1}^n yi² − n·ȳ² ) ).

3110/3173
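As an illustrative check (the simulated data below are an assumption of this sketch, not course material), the last formula can be compared against numpy's built-in estimator:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 100)
y = 0.8 * x + rng.normal(0.0, 0.6, 100)

n = len(x)
r_formula = (np.sum(x * y) - n * x.mean() * y.mean()) / np.sqrt(
    (np.sum(x**2) - n * x.mean()**2) * (np.sum(y**2) - n * y.mean()**2))
r_builtin = np.corrcoef(x, y)[0, 1]
print(r_formula, r_builtin)  # equal up to floating-point error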


Simple linear regression

Correlation Coefficient

Effect correlation

[Figure: scatter plots of y against x for ρ = 0, ρ = 0.8, and ρ = −0.3.]

3111/3173


Simple linear regression

Correlation Coefficient

Effect variance

[Figure: scatter plots of y against x for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 1, σy = 4).]

3112/3173


Simple linear regression

Correlation Coefficient

Effect mean

[Figure: scatter plots of y against x for (μx = 3, μy = 0), (μx = 0, μy = 0), and (μx = 0, μy = 3).]

3113/3173



Simple linear regression

Assumptions

In order to perform linear regression we require:

- Non-collinearity: x is linearly independent of 1n (the vector of ones), or rank(X) = 2, or X⊤X is non-singular, i.e., det(X⊤X) ≠ 0.

- Weak assumptions:

E[ε|X = x] = 0
Var(ε|X = x) = σ²·In,

where In is the n × n identity matrix.

- Strong assumption (for tests/CI, not required for LSE estimates):

L{ε|X = x} = Nn(0, σ²·In).

- Sometimes the additional assumption that X and ε are independent (not required for LSE estimates).

The weak assumptions imply:

E[y|X = x] = E[Xβ + ε|X = x] = Xβ
Var(y|X = x) = Var(Xβ + ε|X = x) = σ²·In,

⇒ the conditional covariance matrix of y|X = x is independent of X.

3114/3173


Simple linear regression

Assumptions

Effect non-linear function

[Figure: scatter plots of y against x for exponential, linear, and quadratic relationships.]

3115/3173


Simple linear regression

Assumptions

Effect changing variance

[Figure: scatter plot of y against x illustrating changing (non-constant) variance.]

3116/3173


Simple linear regression

Assumptions

Effect binary choice variable

[Figure: scatter plot of y against x where y is a binary (0/1) choice variable.]

3117/3173


Simple linear regression

Assumptions

Unbiased regression parameters

Statistical properties of least squares estimators. Recall the statistical model:

yi = β0 + β1·xi + εi, for i = 1, …, n.

Under the weak assumptions we have unbiased estimates (see slide 3166):

E[β̂0] = β0 and E[β̂1] = β1.

An (unbiased) estimate of σ² is given by:

s² = Σ_{i=1}^n ε̂i² / (n − 2) = Σ_{i=1}^n (yi − (β̂0 + β̂1·xi))² / (n − 2).

Proof: we use that β̂0 and β̂1 are unbiased and E[εi] = 0:

yi = β0 + β1·xi + εi ⇒ E[yi] = E[β̂0] + E[β̂1]·xi.

3118/3173


Simple linear regression

Assumptions

Interpretation uncertainty slope

[Figure: scatter plots of y against x for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4).]

3119/3173


Simple linear regression

Assumptions

Interpretation uncertainty slope

[Figure: scatter plots of ε against x for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4).]

3120/3173


Simple linear regression

Assumptions

Interpretation uncertainty intercept

[Figure: scatter plots of y against x for (μx = 3, μy = −1), (μx = 1, μy = 0), and (μx = −3, μy = 0).]

3121/3173



Simple linear regression

Relation MLE and LSE

Maximum Likelihood Estimates

In the regression model there are three parameters to estimate: β0, β1, and σ².

The joint density of Y1, Y2, …, Yn — under the (strong) normality assumptions — is the product of their marginals (independent by assumption), so that the likelihood is:

L(y; β0, β1, σ) = Π_{i=1}^n (1/(√(2π)·σ)) · exp( −(yi − (β0 + β1·xi))² / (2σ²) )

               = (1/((2π)^{n/2}·σⁿ)) · exp( −(1/(2σ²)) · Σ_{i=1}^n (yi − (β0 + β1·xi))² )

ℓ(y; β0, β1, σ) = −n·log(√(2π)·σ) − (1/(2σ²)) · Σ_{i=1}^n (yi − (β0 + β1·xi))².

3122/3173


Simple linear regression

Relation MLE and LSE

Relation MLE and LSE

Partial derivatives set to zero give the following MLEs:

β̂0 = ȳ − β̂1·x̄,

β̂1 = Σ_{i=1}^n (xi − x̄)·(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²,

and

σ̂² = (1/n) · Σ_{i=1}^n (yi − (β̂0 + β̂1·xi))² = s² · (n − 2)/n.

Note: the estimates β̂0 and β̂1 are the same as in the case of LS (see slide 3106).

However, the MLE σ̂² is a biased estimator of σ².

Thus we use the LSE s², which is the unbiased variant of the MLE.

3123/3173
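A small Monte Carlo sketch (illustrative; the design points, seed and replication count are my own assumptions) makes the bias factor (n − 2)/n visible:

import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 20000
x = np.linspace(0.0, 1.0, n)
sxx = np.sum((x - x.mean())**2)
mle, lse = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x.mean()) * y) / sxx
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - b0 - b1 * x)**2)
    mle.append(sse / n)        # MLE of sigma^2
    lse.append(sse / (n - 2))  # unbiased LSE s^2
print(np.mean(mle), sigma2 * (n - 2) / n)  # both ~3.2: MLE biased low
print(np.mean(lse), sigma2)                # both ~4.0: s^2 unbiased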


Simple linear regression

Relation MLE and LSE

BLUE estimator

A point estimator τ̂(θ) is called linear if τ̂(θ) = Ax + b.

A point estimator τ̂(θ) is called the Best Linear Unbiased Estimator (BLUE) if:

E_θ[τ̂(θ)] = τ(θ), unbiased;
Var(τ̂(θ)) ≤ Var(τ̂*(θ)), minimum variance,

for any linear unbiased estimator τ̂*.

One can show that the LS estimator β̂0 + Xβ̂1 is BLUE for μ = β0 + Xβ1 under the weak assumptions (proof not required in this course);

One can show that the LS estimator β̂0 + Xβ̂1 is UMVUE for μ = β0 + Xβ1 under the strong assumptions (proof not required in this course).

3124/3173



Simple linear regression

Partitioning the variability

Partitioning the variability

Partitioning the variability is used to assess economic significance.

The squared deviations (yi − ȳ)² provide us with a measure of the spread of the data.

Define:

SST = Σ_{i=1}^n (yi − ȳ)²

to be the total sum of squares.

Using the estimated regression line, we can compute the fitted value:

ŷi = β̂0 + β̂1·xi.

3125/3173


Simple linear regression

Partitioning the variability

Partitioning the variability

Partition the total deviation as:

(yi − ȳ) = (yi − ŷi) + (ŷi − ȳ),
total deviation = unexplained deviation + explained deviation.

We then obtain:

Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n (yi − ŷi)² + Σ_{i=1}^n (ŷi − ȳ)²,
i.e., SST = SSE + SSM,

where
- SSE: sum of squares error (sometimes called residual);
- SSM: sum of squares model (sometimes called regression).

Proof: similar to week 8 k-sample tests, see slides 3171-3172.

3126/3173


Simple linear regression

Partitioning the variability

Interpret these sums of squares as follows:

- SST is the total variability in the absence of knowledge of the variable X;
- SSE is the total variability remaining after introducing the effect of X;
- SSM is the total variability "explained" because of knowledge of X.

This partitioning of the variability is used in ANOVA tables:

Source | Sum of squares              | Degrees of freedom | Mean square   | F
Model  | SSM = Σ_{i=1}^n (ŷi − ȳ)²   | DFM = 1            | MSM = SSM/DFM | MSM/MSE
Error  | SSE = Σ_{i=1}^n (yi − ŷi)²  | DFE = n − 2        | MSE = SSE/DFE |
Total  | SST = Σ_{i=1}^n (yi − ȳ)²   | DFT = n − 1        | MST = SST/DFT |

3127/3173
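The partition SST = SSE + SSM can be verified numerically. A minimal sketch (the function name and the simulated data are illustrative assumptions):

import numpy as np

def anova_partition(x, y):
    # SST = SSE + SSM for simple linear regression (table above).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    yhat = b0 + b1 * x
    ssm = np.sum((yhat - y.mean())**2)
    sse = np.sum((y - yhat)**2)
    sst = np.sum((y - y.mean())**2)
    f_stat = (ssm / 1) / (sse / (n - 2))  # MSM / MSE
    return ssm, sse, sst, f_stat

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 5.0, 30)
y = 1.0 + 0.7 * x + rng.normal(0.0, 0.5, 30)
ssm, sse, sst, f_stat = anova_partition(x, y)
print(abs(ssm + sse - sst) < 1e-8, f_stat)  # the partition holds exactly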


Simple linear regression

Partitioning the variability

Coefficient of Determination

Notice that the square of the correlation coefficient occurs in the denominator of the t statistic used to test hypotheses concerning the population correlation coefficient. The statistic R² is called the coefficient of determination and provides useful information.

Noting (proof: slide 3173; notation: slide 3163):

SSE = syy − β̂1·sxy, with syy = SST and β̂1·sxy = SSM,

the coefficient of determination may be written as:

R² = ( sxy / √(sxx·syy) )² = 1 − SSE/SST.

Thus, R² can be seen as the proportion of total variation of y explained by the variable x in a simple linear regression model.

3128/3173



Simple linear regression

Exercise

Exercise

A car insurance company is interested in how large the adverse selection effect is in its sample, i.e., how large the difference in claim size relative to the premium is for different groups.

The insurance premium depends on the coverage (Gold Comprehensive Car Insurance, Standard Comprehensive Car Insurance and Third Party Property Car Insurance) and the price of the insured vehicle (five categories).

a. Explain why there might be differences in the claim sizes for the different groups.

Solution: high coverage ⇒ reckless behavior (example: airbags). Expensive car ⇒ more wealthy drivers ⇒ better drivers? Other explanations are also possible.

3129/3173


Simple linear regression

Exercise

Each of the 15 categories has a different premium and number of contracts.

The insurance company has the total claim sizes in the groups.

b. Give the linear regression model.

Solution: Let:

yi be the average claim size for group i = 1, …, 15;

xi be the average MVI premium for group i = 1, …, 15.

Then the regression is:

yi = β0 + β1·xi + εi,

where β0 and β1 are regression constants, and εi, for i = 1, …, 15, are the residuals, independently distributed with mean zero and variance σ², independent of X.

3130/3173


Simple linear regression

Exercise

Exercise

c. Are the weak assumptions and strong assumptions reasonable in this regression model?

Solution: Weak assumptions:

Residual has a mean of zero: yes, the mean is captured in β0 and β1. Note: a linear relation is assumed!

Variance independent of the explanatory variable: debatable (increasing?), have to check using the data.

Residuals are independent: yes.

Additional strong assumption:

Residuals are normally distributed: debatable, have to check using the data.

3131/3173


Simple linear regression

Exercise

Exercise data

[Figure: scatter plot of average claim size against premium (x) for the 15 groups.]

3132/3173


Simple linear regression

Exercise

The observed values for the 15 groups are:

i:  1   2   3   4   5   6   7   8
xi: 210 230 235 250 260 280 320 360
yi: 189 267 234 142 302 149 308 392

i:  9   10  11  12  13  14  15
xi: 380 410 460 540 720 880 910
yi: 323 313 456 528 768 963 954

Summary statistics: Σ_{i=1}^{15} xi = 6445, Σ_{i=1}^{15} yi = 6288, Σ_{i=1}^{15} xi² = 3,529,325, Σ_{i=1}^{15} yi² = 3,660,190, and Σ_{i=1}^{15} xi·yi = 3,566,000.

d. Find the LS estimates of the regression model.

Solution: See next slide.

3133/3173


Simple linear regression

Exercise

β̂1 = ( Σ_{i=1}^{15} xi·yi − n·x̄·ȳ ) / ( Σ_{i=1}^{15} xi² − n·x̄² )
   = ( 3,566,000 − 6445·6288/15 ) / ( 3,529,325 − 6445²/15 ) = 1.137.

β̂0 = ȳ − β̂1·x̄ = 6288/15 − 1.137 · 6445/15 = −69.329.

σ̂² = (1/(n − 2)) · ( Σ_{i=1}^{15} yi² − n·ȳ² − ( Σ_{i=1}^{15} xi·yi − n·x̄·ȳ )² / ( Σ_{i=1}^{15} xi² − n·x̄² ) )
   = (1/13) · ( 3,660,190 − 6288²/15 − ( 3,566,000 − 6445·6288/15 )² / ( 3,529,325 − 6445²/15 ) )
   = 3200 ⇒ σ̂ = √3200 = 56.57.

3134/3173
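A quick numerical check of part d from the raw data on slide 3133 — an illustrative sketch, not part of the course material:

import numpy as np

x = np.array([210, 230, 235, 250, 260, 280, 320, 360,
              380, 410, 460, 540, 720, 880, 910], float)
y = np.array([189, 267, 234, 142, 302, 149, 308, 392,
              323, 313, 456, 528, 768, 963, 954], float)
n = len(x)
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x)**2) / (n - 2)
print(round(b1, 3), round(b0, 3), round(s2, 1))  # ~1.137, ~-69.329, ~3200.4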


Simple linear regression

Exercise

Exercise

e. Find the correlation coefficient. Relate the sign of the correlation coefficient to the estimates.

Solution:

r = ( Σ_{i=1}^n xi·yi − n·x̄·ȳ ) / √( ( Σ_{i=1}^n xi² − n·x̄² ) · ( Σ_{i=1}^n yi² − n·ȳ² ) )
  = ( 3,566,000 − 6445·6288/15 ) / √( ( 3,529,325 − 6445²/15 ) · ( 3,660,190 − 6288²/15 ) )
  = 0.9795.

Positive sample correlation (r > 0) ⇒ β̂1 is positive.

3135/3173


Simple linear regression

Exercise

Exercise

f. Partition the variability.

Solution:

SST = Σ_{i=1}^n yi² − n·ȳ² = 3,660,190 − 6288²/15 = 1,024,260

SSE = (n − 2)·σ̂² = 13 · 3200.4 = 41,606 (using the unrounded σ̂²)

SSM = SST − SSE = 1,024,260 − 41,606 = 982,654.

3136/3173


Simple linear regression

Exercise

Comment on residual plot

[Figure: residuals plotted against premium (x) for the 15 groups.]

3137/3173



Testing in simple linear regression

Overview

Overview of tests and CI

- Inference on white noise (εi)

- Inference on individual parameters (β0 or β1)

- Inference on a function of both parameters (β0 + xi·β1)

- Inference on a function of both parameters and white noise (β0 + xi·β1 + εi)

Note: under the strong assumptions, εi ∼ N(0, σ²).

Thus (using two estimated parameters β̂0 and β̂1):

(n − 2)·s²/σ² = Σ_{i=1}^n ε̂i²/σ² = Σ_{i=1}^n (yi − β̂0 − β̂1·xi)²/σ² ∼ χ²(n − 2)

3138/3173



Testing in simple linear regression

Inference on the slope

Inference on the slope: often we want to test whether the exogenous variable has an influence on the endogenous variable, or whether the influence is larger/smaller than some value.

The distribution of β̂1 under the strong assumptions:

(β̂1 − β1) / ( σ/√(x̃⊤x̃) ) ∼ N(0, 1).

Notation: x̃ = x − x̄·1n ⇒ x̃⊤x̃ = Σ_{i=1}^n (xi − x̄)², and Var(β̂1) = σ²/Σ_{i=1}^n (xi − x̄)² (see slide 3168).

σ is usually unknown, and estimated by s, so:

(β̂1 − β1) / ( s/√(x̃⊤x̃) ) = [ (β̂1 − β1) / ( σ/√(x̃⊤x̃) ) ] / √( ((n − 2)·s²/σ²) / (n − 2) ) ∼ t(n − 2),

the ratio of a N(0, 1) variable and an independent √(χ²(n − 2)/(n − 2)) variable.

3139/3173


Testing in simple linear regression

Inference on the slope

Inference on the slope

A 100(1 − α)% confidence interval for β1 is given by:

β̂1 − t_{1−α/2,n−2}·se(β̂1) < β1 < β̂1 + t_{1−α/2,n−2}·se(β̂1),

where

se(β̂1) = s/√(x̃⊤x̃)

is the standard error of the estimated slope coefficient.

For testing the null hypothesis H0: β1 = β1* for some constant β1*, use the test statistic:

t(β̂1) = (β̂1 − β1*)/se(β̂1) = (β̂1 − β1*) / ( s/√(x̃⊤x̃) ).

3140/3173
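A minimal sketch of this confidence interval and test statistic (the function name, the simulated data and the use of scipy.stats are illustrative assumptions):

import numpy as np
from scipy import stats

def slope_inference(x, y, beta1_null=0.0, alpha=0.05):
    # CI and t statistic for the slope (formulas above).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * y) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - b0 - b1 * x)**2) / (n - 2))
    se_b1 = s / np.sqrt(sxx)
    tq = stats.t.ppf(1.0 - alpha / 2.0, n - 2)
    ci = (b1 - tq * se_b1, b1 + tq * se_b1)
    t_stat = (b1 - beta1_null) / se_b1
    return ci, t_stat

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 10.0, 40)
y = 1.0 + 0.3 * x + rng.normal(0.0, 1.0, 40)
print(slope_inference(x, y))  # the CI should cover the true slope 0.3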


Testing in simple linear regression

Inference on the slope

The decision rules under various alternative hypotheses are summarized below.

Decision making procedures for testing H0: β1 = β1*:

Alternative H1 | Reject H0 in favor of H1 if
β1 ≠ β1*       | |t(β̂1)| > t_{1−α/2,n−2}
β1 > β1*       | t(β̂1) > t_{1−α,n−2}
β1 < β1*       | t(β̂1) < −t_{1−α,n−2}

Testing whether the regressor variable is significant is equivalent to testing whether the slope is zero. Thus, test H0: β1 = 0 against H1: β1 ≠ 0.

3141/3173



Testing in simple linear regression

Inference on the intercept

Similarly, one can test the value of the intercept, again using a testing procedure / confidence interval.

The distribution of β̂0 under the strong assumptions:

(β̂0 − β0) / ( σ·√( 1/n + x̄²/(x̃⊤x̃) ) ) ∼ N(0, 1).

Note (see slide 3169): Var(β̂0) = σ²·( 1/n + x̄²/Σ_{i=1}^n (xi − x̄)² ).

σ is usually unknown, and estimated by s, thus:

(β̂0 − β0) / ( s·√( 1/n + x̄²/(x̃⊤x̃) ) ) = [ (β̂0 − β0) / ( σ·√( 1/n + x̄²/(x̃⊤x̃) ) ) ] / √( ((n − 2)·s²/σ²) / (n − 2) ) ∼ t(n − 2),

again a N(0, 1) variable divided by an independent √(χ²(n − 2)/(n − 2)) variable.

3142/3173


Testing in simple linear regression

Inference on the intercept

A 100(1 − α)% confidence interval for β0 is given by:

β̂0 − t_{1−α/2,n−2}·se(β̂0) < β0 < β̂0 + t_{1−α/2,n−2}·se(β̂0),

where

se(β̂0) = s·√( 1/n + x̄²/Σ_{i=1}^n (xi − x̄)² )

is the standard error of the estimated intercept coefficient.

For testing the null hypothesis H0: β0 = β0* for some constant β0*, use the test statistic (with similar decision rules as for the slope, see slide 3141):

t(β̂0) = (β̂0 − β0*)/se(β̂0) = (β̂0 − β0*) / ( s·√( 1/n + x̄²/Σ_{i=1}^n (xi − x̄)² ) ).

3143/3173



Testing in simple linear regression

Confidence Intervals for the Population Regression Line

Confidence Intervals for the Population Regression Line

Suppose x = x0 is a specified (out-of-sample) value of the regressor variable and we want to predict the corresponding Y value associated with it.

Consider estimating the mean of this, which is:

E[Y|x = x0] = E[β0 + β1·x + ε|x = x0] = β0 + β1·x0 + 0.

Thus the best predicted value (also unbiased) is:

ŷ0 = β̂0 + β̂1·x0.

3144/3173


Testing in simple linear regression

Confidence Intervals for the Population Regression Line

The variance of the prediction is:

Var(ŷ0) = Var(β̂0 + β̂1·x0)
        = Var(β̂0) + x0²·Var(β̂1) + 2·x0·Cov(β̂0, β̂1)
      * = ( 1/n + x̄²/((n − 1)·sx²) )·σ² + x0²·σ²/((n − 1)·sx²) + 2·x0·( −x̄·σ²/((n − 1)·sx²) )
        = ( 1/n + (x̄² − 2·x0·x̄ + x0²)/((n − 1)·sx²) )·σ²
        = ( 1/n + (x̄ − x0)²/((n − 1)·sx²) )·σ².

* see slide 3169 for Var(β̂0), slide 3168 for Var(β̂1), and slide 3170 for Cov(β̂0, β̂1).

3145/3173


Testing in simple linear regression

Confidence Intervals for the Population Regression Line

Since both β̂0 and β̂1 are linear functions of Y1, …, Yn, so is β̂0 + β̂1·x0.

Therefore, we have:

ŷ0 ∼ N( β0 + β1·x0, ( 1/n + (x̄ − x0)²/((n − 1)·sx²) )·σ² )

and:

( ŷ0 − (β0 + β1·x0) ) / ( s·√( 1/n + (x̄ − x0)²/((n − 1)·sx²) ) ) ∼ t(n − 2).

This pivot can therefore be used to construct the 100(1 − α)% confidence interval for the population regression line at x0:

( β̂0 + β̂1·x0 ) ± t_{1−α/2,n−2} · s·√( 1/n + (x̄ − x0)²/((n − 1)·sx²) ).

Note: the regression line (mean response) does not include uncertainty due to εi; for that see slide 3150.

3146/3173


Testing in simple linear regression

Confidence Intervals for the Population Regression Line

Example

[Figure: example scatter plots of y against x for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4).]

3147/3173



Testing in simple linear regression

Prediction Intervals for the Actual Value of the Dependent Variable

CI for the Actual Value of the Dependent Variable

In the next slides we will find pointwise CI and prediction intervals for the value of yi. Note: this implies that for each observation the probability that it lies between the upper and lower bounds is 1 − α; the probability that all values of yi lie between their upper and lower bounds is much smaller than 1 − α.

We base our prediction of Yi (given X = x) when X = xi on:

ŷi = β̂0 + β̂1·xi.

The error in our prediction is:

Yi − ŷi = β0 + β1·xi + εi − ŷi = E[Y|X = xi] − ŷi + εi,

where we have:

E[ŷi|X = x, X = xi] = E[β̂0 + β̂1·xi|X = x] = β0 + β1·xi.

3148/3173


Testing in simple linear regression

Prediction Intervals for the Actual Value of the Dependent Variable

Thus we have:

E[Yi − ŷi|X = x, X = xi] = E[Y|X = xi] − (β0 + β1·xi) = 0.

Further (using slide 3145):

Var(Yi − ŷi|X = x, X = xi) = Var(Yi|X = xi) + Var(ŷi|X = x) − 2·Cov(Yi, ŷi|X = x, X = xi)
 = σ² + σ²·( 1/n + (xi − x̄)²/sxx ) − 0
 = σ²·( 1 + 1/n + (xi − x̄)²/sxx ).

Notation: sxx = Σ_{i=1}^n (xi − x̄)².

3149/3173


Testing in simple linear regression

Prediction Intervals for the Actual Value of the Dependent Variable

Prediction Intervals

It then follows that:

(Yi − ŷi | X = x, X = xi) ∼ N( 0, σ²·( 1 + 1/n + (xi − x̄)²/sxx ) )

and thus the test statistic is:

T = (Yi − ŷi) / ( s·√( 1 + 1/n + (xi − x̄)²/sxx ) ) ∼ t(n − 2).

Thus: for the variance of the predicted individual response an additional σ² must be added to the variance of the predicted mean response (see slide 3146).

3150/3173


Testing in simple linear regression

Prediction Intervals for the Actual Value of the Dependent Variable

Prediction Intervals

Thus, we have a 100(1 − α)% prediction interval for Yi, the value of Y at X = xi, given by:

ŷi ± t_{1−α/2,n−2} · s·√( 1 + 1/n + (xi − x̄)²/sxx )

= β̂0 + β̂1·xi ± t_{1−α/2,n−2} · s·√( 1 + 1/n + (xi − x̄)²/sxx ).

3151/3173
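A sketch computing both the mean-response CI of slide 3146 and this prediction interval at a point x0 (the function name and inputs are illustrative assumptions):

import numpy as np
from scipy import stats

def intervals_at(x, y, x0, alpha=0.05):
    # CI for the mean response (slide 3146) and prediction interval
    # for the actual value (this slide) at X = x0.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * y) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - b0 - b1 * x)**2) / (n - 2))
    y0 = b0 + b1 * x0
    tq = stats.t.ppf(1.0 - alpha / 2.0, n - 2)
    se_mean = s * np.sqrt(1.0 / n + (x.mean() - x0)**2 / sxx)
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (x.mean() - x0)**2 / sxx)
    return ((y0 - tq * se_mean, y0 + tq * se_mean),
            (y0 - tq * se_pred, y0 + tq * se_pred))

For any x0 the prediction interval is wider than the mean-response CI, reflecting the extra "1 +" term under the square root.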


Testing in simple linear regression

Prediction Intervals for the Actual Value of the Dependent Variable

Example

[Figure: example scatter plots of y against x for (σx = 4, σy = 1), (σx = 1, σy = 1), and (σx = 4, σy = 4).]

3152/3173



Testing in simple linear regression

Hypothesis Test for Population Correlation

Testing the correlation coefficient

See F&T page 25.

If a pair of random variables comes from a common bivariate normal distribution, it is possible to find a test statistic based on the likelihood ratio test to conduct a test of independence.

Using the LRT, test H0: ρXY = 0 vs. H1: ρXY ≠ 0 (or > or <).

The test statistic is based on (proof not required):

T = R / √( (1 − R²)/(n − 2) ) = R·√(n − 2) / √(1 − R²),

where R is the random variable denoting the correlation coefficient, i.e., r with the x and y replaced by X and Y.

3153/3173


Testing in simple linear regression

Hypothesis Test for Population Correlation

Testing the correlation coefficient

It can be shown that T ∼ t(n − 2), a t-distribution with (n − 2) degrees of freedom.

We can summarize a procedure for testing the independence between two random variables as follows.

Suppose we obtain observed pairs of variables (x1, y1), (x2, y2), …, (xn, yn).

1. To test H0: ρXY = 0 against the alternative H1: ρXY > 0, the decision rule is:

Reject H0 if the observed t = r·√(n − 2)/√(1 − r²) > t_{1−α,n−2}.

3154/3173


Testing in simple linear regression

Hypothesis Test for Population Correlation

Testing the correlation coefficient

2. To test H0: ρXY = 0 against the alternative H1: ρXY < 0, the decision rule is:

Reject H0 if the observed t = r·√(n − 2)/√(1 − r²) < −t_{1−α,n−2}.

3. And to test H0: ρXY = 0 against the alternative H1: ρXY ≠ 0, the decision rule is:

Reject H0 if the observed |t| = | r·√(n − 2)/√(1 − r²) | > t_{1−α/2,n−2}.

3155/3173


Testing in simple linear regression

Hypothesis Test for Population Correlation

Fisher's z-transformation for testing correlation hypotheses

We have n independent data points of the form (x1, y1), (x2, y2), …, (xn, yn). Each point is random, drawn from a bivariate normal distribution with correlation ρ.

We wish to test:

H0: ρ = ρ0 against H1: ρ ≠ ρ0.

Our test statistic is based on r, the sample correlation coefficient, together with:

z = (1/2)·log( (1 + r)/(1 − r) ) and ζ0 = (1/2)·log( (1 + ρ0)/(1 − ρ0) ),

and is of the form T = (z − ζ0)·√(n − 3), which is approximately standard normally distributed (proof not required).

3156/3173
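A small sketch of this approximate test (the function name is an illustrative assumption; scipy.stats.norm supplies the normal cdf):

import numpy as np
from scipy import stats

def fisher_z_test(r, n, rho0=0.0):
    # Approximate test of H0: rho = rho0 via Fisher's z-transformation.
    z = 0.5 * np.log((1.0 + r) / (1.0 - r))
    zeta0 = 0.5 * np.log((1.0 + rho0) / (1.0 - rho0))
    t_stat = (z - zeta0) * np.sqrt(n - 3)            # approx. N(0,1) under H0
    p_two_sided = 2.0 * (1.0 - stats.norm.cdf(abs(t_stat)))
    return t_stat, p_two_sided

print(fisher_z_test(0.9795, 15))  # ~ (7.9, ~0); cf. the exercise on slide 3157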



Testing in simple linear regression

Exercise

Exercise

Consider the previous exercise on slides 3129-3137 with the data on slide 3133.

i. Test whether the correlation coefficient is positive.

Solution: r = 0.9795 (see slide 3135).

Method 1: T = r·√(n − 2)/√(1 − r²) = 0.9795·√13/√(1 − 0.9795²) = 17.55. Using F&T page 163: t_{1−p}(13) = 17.55 for p ≈ 0, i.e., the p-value is almost 0; reject the null hypothesis.

Method 2: z = log( (1 + r)/(1 − r) )/2 = log( 1.9795/0.0205 )/2 = 2.2845 and ζ0 = 0. Hence, T = 2.2845·√12 = 7.91. Thus the p-value equals (almost) zero.

3157/3173


Testing in simple linear regression

Exercise

ii. Test whether the slope parameter is larger than one.

Solution: test H0: β1 = 1 vs. H1: β1 ≠ 1.

Test statistic:

T = (β̂1 − 1) / ( s/√( Σ_{i=1}^n xi² − n·x̄² ) )
 * = (1.137 − 1) / ( 56.57/√( 3,529,325 − 6445²/15 ) )
 = 0.137/0.06488809 = 2.11

* using σ̂ = 56.57 and β̂1 = 1.137, see slide 3134.

t_{1−0.027}(13) = 2.11. Accept the null for a level of significance of 5.5% or lower (note: two-sided test).

3158/3173


Testing in simple linear regression

Exercise

iii. Test whether the intercept parameter is non-negative.

Solution: test H0: β0 ≥ 0 vs. H1: β0 < 0.

Test statistic:

T = (β̂0 − 0) / ( s·√( 1/n + x̄²/( Σ_{i=1}^n xi² − n·x̄² ) ) )
 * = −69.329 / ( 56.57·√( 1/15 + (6445/15)²/( 3,529,325 − 6445²/15 ) ) )
 = −69.329/31.475 = −2.20.

* using σ̂ = 56.57 and β̂0 = −69.329, see slide 3134.

t_{0.0231}(13) = −2.20. Accept the null only for a level of significance of 2.3% or lower (note: one-sided test).

3159/3173


Testing in simple linear regression

Exercise

iv. Calculate the 95% confidence interval for the actual value of Y (the prediction interval) given that X = 350.

Solution: We have ŷ|x0 = 350: ŷ = −69.329 + 1.137 · 350 = 328.62, s = 56.57.

√Var(ŷi) = s·√( 1 + 1/n + (x̄ − x0)²/( Σ_{i=1}^n xi² − n·x̄² ) )
         = 56.57·√( 1 + 1/15 + (6445/15 − 350)²/( 3,529,325 − 6445²/15 ) )
         = 58.65.

t_{0.975}(13) = 2.160. Thus:

Pr( Yi ∈ (328.62 − 2.160·58.65, 328.62 + 2.160·58.65) ) = 0.95
Pr( Yi ∈ (201.9, 455.3) ) = 0.95

3160/3173
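Checking part iv numerically with the summary statistics of slide 3133 and the estimates of slide 3134 (an illustrative sketch, not course material):

import numpy as np
from scipy import stats

n, x0 = 15, 350.0
xbar = 6445 / 15
sxx = 3529325 - 6445**2 / 15          # sum xi^2 - n*xbar^2
b0, b1, s = -69.329, 1.137, 56.57     # plugged in from slide 3134
y0 = b0 + b1 * x0
se = s * np.sqrt(1 + 1 / n + (xbar - x0)**2 / sxx)
tq = stats.t.ppf(0.975, n - 2)
print(y0, se, (y0 - tq * se, y0 + tq * se))  # ~328.62, ~58.65, (~201.9, ~455.3)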



Appendix

Algebra: parameter estimates

Basic idea, used on slide 3106

Write Σ for Σ_{i=1}^n. The normal equations give:

β̂0 = ( Σ yi − β̂1·Σ xi )/n,    β̂1 = ( Σ xi·yi − β̂0·Σ xi )/Σ xi².

Substituting β̂1 into the expression for β̂0:

β̂0 = ( Σ yi − ( Σ xi·yi − β̂0·Σ xi )·Σ xi/Σ xi² )/n

⇒ β̂0·( 1 − (Σ xi)²/(n·Σ xi²) ) = ( Σ yi·Σ xi² − Σ xi·yi·Σ xi )/( n·Σ xi² )

⇒ β̂0 * = ( Σ yi·Σ xi² − Σ xi·yi·Σ xi ) / ( n·Σ xi² − (Σ xi)² ).

*: (1 − a/b)·c = d/b ⇒ (b·c − a·c)/b = d/b ⇒ c = d/(b − a).

3161/3173


Appendix

Algebra: parameter estimates

From the previous slide we have (Σ again denotes Σ_{i=1}^n):

β̂0 = ( Σ yi − β̂1·Σ xi )/n,    β̂1 = ( Σ xi·yi − β̂0·Σ xi )/Σ xi².

Thus the parameter β1 is estimated by (used on slide 3106):

β̂1 = ( n·Σ xi·yi − ( Σ yi − β̂1·Σ xi )·Σ xi ) / ( n·Σ xi² )

⇒ β̂1·( 1 − (Σ xi)²/(n·Σ xi²) ) = ( n·Σ xi·yi − Σ yi·Σ xi )/( n·Σ xi² )

⇒ β̂1 * = ( n·Σ xi·yi − Σ yi·Σ xi ) / ( n·Σ xi² − (Σ xi)² ).

*: (1 − a/b)·c = d/b ⇒ (b·c − a·c)/b = d/b ⇒ c = d/(b − a).

3162/3173


Appendix

Algebra: parameter estimates

Parameter estimates II: Notation

More commonly, we express the parameter estimates in terms of (squared) errors.

We have the following sums of squares (see F&T pages 24-25):

Sx = Σ_{i=1}^n xi and Sy = Σ_{i=1}^n yi
Sxx = Σ_{i=1}^n xi² and Syy = Σ_{i=1}^n yi²
Sxy = Σ_{i=1}^n xi·yi
sxx = Σ_{i=1}^n (xi − x̄)² = (n − 1)·sx²
syy = Σ_{i=1}^n (yi − ȳ)² = (n − 1)·sy²
sxy = Σ_{i=1}^n (xi − x̄)·(yi − ȳ) = (n − 1)·sx,y,

where sx² (sx,y) denotes the sample variance (covariance). Moreover, we denote:

r = sx,y/(sx·sy) = sxy/√(sxx·syy).

3163/3173


Appendix

Algebra: parameter estimates

Parameter estimates II (used on slide 3106)

We have:

β̂1 = ( n·Σ_{i=1}^n xi·yi − Σ_{i=1}^n yi·Σ_{i=1}^n xi ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) = ( n·Sxy − Sx·Sy )/( n·Sxx − Sx² )

 = ( Σ_{i=1}^n xi·yi − n·x̄·ȳ ) / ( Σ_{i=1}^n xi² − n·x̄² )    [dividing numerator and denominator by n]

 = ( Σ_{i=1}^n xi·yi − Σ_{i=1}^n xi·ȳ − Σ_{i=1}^n x̄·yi + n·x̄·ȳ ) / ( Σ_{i=1}^n xi² + Σ_{i=1}^n x̄² − 2·Σ_{i=1}^n xi·x̄ )

 = Σ_{i=1}^n (xi − x̄)·(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²

 = sxy/sxx = ( sxy/(√sxx·√syy) )·( √syy/√sxx ) = r·sy/sx.

3164/3173


Appendix

Algebra: parameter estimates

Parameter estimates II

Thus we have that β̂1 is the sample correlation coefficient times the quotient of the sample standard deviations of Y and X.

We refer to β̂1 as the slope of the regression line.

For β̂0 we have (used on slide 3106):

β̂0 = ( Σ_{i=1}^n yi·Σ_{i=1}^n xi² − Σ_{i=1}^n xi·yi·Σ_{i=1}^n xi ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) = ( Sy·Sxx − Sxy·Sx )/( n·Sxx − Sx² )

 = ( Σ_{i=1}^n yi − β̂1·Σ_{i=1}^n xi )/n = Sy/n − β̂1·Sx/n.

We refer to β̂0 as the intercept of the regression line.

3165/3173



Appendix

Properties of parameter estimates

Unbiased regression parameters

For β̂0 we have (used on slide 3118):

E[β̂0] = E[ ( Σ_{i=1}^n yi·Σ_{i=1}^n xi² − Σ_{i=1}^n xi·yi·Σ_{i=1}^n xi ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) ]

 = ( Σ_{i=1}^n E[yi]·Σ_{i=1}^n xi² − Σ_{i=1}^n xi·E[yi]·Σ_{i=1}^n xi ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

 = ( Σ_{i=1}^n (β0 + β1·xi)·Σ_{i=1}^n xi² − Σ_{i=1}^n xi·(β0 + β1·xi)·Σ_{i=1}^n xi ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

 = ( n·β0·Σ_{i=1}^n xi² − β0·(Σ_{i=1}^n xi)² ) / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )    [the β1 terms cancel]

 = β0.

Home exercise: show in the same way that E[β̂1] = β1.

3166/3173


Appendix

Properties of parameter estimates

Regression parameters uncertainty

Under the weak assumptions the (co-)variances of the parameter estimates are given by:

s²_β̂0 = Var(β̂0) = σ²·Σ_{i=1}^n xi² / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) = s²_β̂1·Sxx/n

s²_β̂1 = Var(β̂1) = n·σ² / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) = n·s²_ε/( n·Sxx − Sx² )

Cov(β̂0, β̂1) = −σ²·Σ_{i=1}^n xi / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ) = −(Sx/n)·s²_β̂1

s²_ε = Var̂(ε) = ( n·Syy − Sy² − β̂1²·(n·Sxx − Sx²) ) / ( n·(n − 2) )

Proof: see next slides.

3167/3173


Appendix

Properties of parameter estimates

Proof sample variance slope (used on slides 3139 and 3145)

Note that we have:

β̂1 = Σ_{i=1}^n (xi − x̄)·(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² = Σ_{i=1}^n (xi − x̄)·yi / Σ_{i=1}^n (xi − x̄)²,

since Σ_{i=1}^n (xi − x̄)·ȳ = 0. Then we have (the yi are uncorrelated):

Var(β̂1) = Var( Σ_{i=1}^n (xi − x̄)·yi / Σ_{i=1}^n (xi − x̄)² )
 = Σ_{i=1}^n (xi − x̄)²·Var(yi) / ( Σ_{i=1}^n (xi − x̄)² )²
 = σ²·Σ_{i=1}^n (xi − x̄)² / ( Σ_{i=1}^n (xi − x̄)² )²
 = σ² / Σ_{i=1}^n (xi − x̄)².

3168/3173
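A Monte Carlo sketch (illustrative assumptions: design points, seed, 20,000 replications) confirming this variance formula:

import numpy as np

rng = np.random.default_rng(7)
n, sigma = 20, 2.0
x = np.linspace(0.0, 10.0, n)
sxx = np.sum((x - x.mean())**2)
b1_draws = []
for _ in range(20000):
    y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, n)
    b1_draws.append(np.sum((x - x.mean()) * y) / sxx)
print(np.var(b1_draws), sigma**2 / sxx)  # the two should nearly coincide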


Appendix

Properties of parameter estimates

Proof sample variance intercept

Using that β̂0 = ȳ − β̂1·x̄, and that ȳ and β̂1 are uncorrelated (since Σ_{i=1}^n (xi − x̄) = 0), we have (used on slides 3142 and 3145):

Var(β̂0) = Var( ȳ − β̂1·x̄ )
 = Var( Σ_{i=1}^n yi/n ) + x̄²·Var(β̂1)
 = Σ_{i=1}^n Var(yi)/n² + x̄²·σ²/Σ_{i=1}^n (xi − x̄)²
 = σ²·( Σ_{i=1}^n (xi − x̄)² + n·x̄² ) / ( n·Σ_{i=1}^n (xi − x̄)² )
 = σ²·Σ_{i=1}^n xi² / ( n·Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ),

using Σ_{i=1}^n (xi − x̄)² + n·x̄² = Σ_{i=1}^n xi² in the last step.

3169/3173


Appendix

Properties of parameter estimates

Proof sample covariance intercept and slope (used on slide 3145)

Using that β̂0 = ȳ − β̂1·x̄, and Cov(ȳ, β̂1) = 0 (see the previous slide), we have:

Cov(β̂0, β̂1) = Cov( ȳ − β̂1·x̄, β̂1 )
 = Cov( −β̂1·x̄, β̂1 )
 = −x̄·Cov( β̂1, β̂1 )
 = −( Σ_{i=1}^n xi/n )·Var(β̂1).

3170/3173



Appendix

Proof: Partitioning the variability

Proof: Partitioning the variability

SST = Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n ( yi² + ȳ² − 2·ȳ·yi )

SSE + SSM = Σ_{i=1}^n (yi − ŷi)² + Σ_{i=1}^n (ŷi − ȳ)²
 = Σ_{i=1}^n ( yi² + ŷi² − 2·yi·ŷi + ŷi² + ȳ² − 2·ȳ·ŷi )
 = Σ_{i=1}^n ( yi² + 2·ŷi² − 2·yi·ŷi + ȳ² − 2·ȳ·ŷi )
 * = Σ_{i=1}^n ( yi² + 2·ŷi² − 2·(ŷi + ε̂i)·ŷi + ȳ² − 2·ȳ·(yi − ε̂i) )

* using yi = ŷi + ε̂i. Continued on the next slide.

3171/3173


Appendix

Proof: Partitioning the variability

Proof cont.:

SST = Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n ( yi² + ȳ² − 2·ȳ·yi )

SSE + SSM = Σ_{i=1}^n ( yi² + 2·ŷi² − 2·(ŷi + ε̂i)·ŷi + ȳ² − 2·ȳ·(yi − ε̂i) )
 = Σ_{i=1}^n ( yi² − 2·ŷi·ε̂i + ȳ² − 2·ȳ·yi + 2·ȳ·ε̂i )
 ** = Σ_{i=1}^n ( yi² + ȳ² − 2·ȳ·yi ) = SST

** using Σ_{i=1}^n 2·ȳ·ε̂i = 2·ȳ·Σ_{i=1}^n ε̂i = 0 and Σ_{i=1}^n ŷi·ε̂i = β̂0·Σ_{i=1}^n ε̂i + β̂1·Σ_{i=1}^n xi·ε̂i = 0, by the first order conditions (the residuals are orthogonal to the fitted values).

Used on slide 3126.

3172/3173


Appendix

Proof: Partitioning the variability

SSM = Σ_{i=1}^n (ŷi − ȳ)²
 = Σ_{i=1}^n ( β̂0 + β̂1·xi − ȳ )²
 = Σ_{i=1}^n ( (ȳ − β̂1·x̄) + β̂1·xi − ȳ )²
 = Σ_{i=1}^n β̂1²·(xi − x̄)²
 = β̂1²·sxx
 * = β̂1·sxy,

* using β̂1 = sxy/sxx from slide 3164.

3173/3173