topic 12 simple linear regression and correlation

Post on 17-Dec-2015

248 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

TOPIC 12TOPIC 12

Simple Linear Regression and

Correlation

Simple Linear Regression and

Correlation

• Representation of some phenomenon• Mathematical model is a mathematical

expression of some phenomenon• Often describe relationships between

variables• Types

– Deterministic models– Probabilistic models

ModelsModels

• Hypothesize exact relationships

• Suitable when prediction error is negligible

• Example: force is exactly mass times acceleration– F = m·a

Deterministic ModelsDeterministic Models

• Hypothesize two components– Deterministic– Random error

• Example: sales volume (y) is 10 times advertising spending (x) + random error– y = 10x + – Random error may be due to factors

other than advertising

Probabilistic ModelsProbabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Regression ModelsRegression Models

• Answers ‘What is the relationship between the variables?’

• Equation used– One numerical dependent (response) variable

What is to be predicted– One or more numerical or categorical

independent (explanatory) variables• Used mainly for prediction and estimation

Regression ModelsRegression Models

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Regression Modeling StepsRegression Modeling Steps

Simple

1 ExplanatoryVariable

RegressionModels

2+ ExplanatoryVariables

Multiple

LinearNon-

LinearLinear

Non-Linear

Types of Regression ModelsTypes of Regression Models

y x 0 1

Relationship between variables is a linear function

Dependent (Response) Variable

Independent (Explanatory) Variable

Population Slope

Population y-intercept

Random Error

Linear Regression ModelLinear Regression Model

Population

$ $

$

$

$

Unknown Relationship

0 1y x

Random Sample

$ $

$ $

0 1ˆ ˆ ˆy x

Population & Sample Regression ModelPopulation & Sample Regression Model

y

x

0 1i i iy x

0 1E y x

Observedvalue

Observed value

i = Random error

(line of means)

Population Linear Regression ModelPopulation Linear Regression Model

y

x

0 1ˆ ˆ ˆi i iy x

0 1ˆ ˆˆi iy x

Unsampled observation

i = Random

error

Observed value

^

Sample Linear Regression ModelSample Linear Regression Model

1. Plot of all (xi, yi) pairs

2. Suggests how well model will fit

0204060

0 20 40 60

x

y

Scatter DiagramScatter Diagram

0204060

0 20 40 60

x

y

• How would you draw a line through the points?• How do you determine which line ‘fits best’?

Thinking ChallengeThinking Challenge

• ‘Best fit’ means difference between actual y values and predicted y values are a minimum– But positive differences off-set negative

2 2

1 1

ˆ ˆn n

ii ii i

y y

• Least Squares minimizes the Sum of the Squared Differences (SSE)

Least Squares MethodLeast Squares Method

e2

y

x

e1 e

3

e4

^^

^^

2 0 1 2 2ˆ ˆ ˆy x

0 1ˆ ˆˆi iy x

2 2 2 2 21 2 3 4

1

ˆ ˆ ˆ ˆ ˆLS minimizes n

ii

Least Squares GraphicallyLeast Squares Graphically

1 1

11 2

12

1

ˆ

n n

i ini i

i ixy i

nxx

ini

ii

x y

x ySS nSS

x

xn

Slope

y-intercept

0 1ˆ ˆy x Prediction Equation

0 1ˆ ˆy x

Coefficient EquationsCoefficient Equations

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find the least squares line relatingsales and advertising.

Least Squares ExampleLeast Squares Example

2 2

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

xi yi xi yi xiyi

Parameter Estimation Solution TableParameter Estimation Solution Table

1 1

11 2 2

12

1

15 1037

5ˆ .7015

555

n n

i ini i

i ii

n

ini

ii

x y

x yn

x

xn

ˆ .1 .7y x

0 1ˆ ˆ 2 .70 3 .10y x

Parameter Estimation SolutionParameter Estimation Solution

Parameter Estimates

Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

0^

1^

ˆ .1 .7y x

Parameter Estimation Computer OutputParameter Estimation Computer Output

0

1

2

3

4

0 1 2 3 4 5

Sales

Advertising

ˆ .1 .7y x

Regression Line Fitted to the DataRegression Line Fitted to the Data

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0

Find the least squares line

Relating crop yield and fertilizer.

Least Squares Thinking ChallengeLeast Squares Thinking Challenge

4 3.0 16 9.00 12

6 5.5 36 30.25 33

10 6.5 100 42.25 65

12 9.0 144 81.00 108

32 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Parameter Estimation Solution TableParameter Estimation Solution Table

1 1

11 2 2

12

1

32 24218

4ˆ .6532

2964

n n

i ini i

i ii

n

ini

ii

x y

x yn

x

xn

0 1ˆ ˆ 6 .65 8 .80y x

ˆ .8 .65y x

Parameter Estimation SolutionParameter Estimation Solution

02468

10

0 5 10 15

Yield (lb.)

Fertilizer (lb.)

ˆ .8 .65y x

Regression Line Fitted to the Data*Regression Line Fitted to the Data*

1. Mean of probability distribution of error, ε, is 0

2. Probability distribution of error has constant variance

3. Probability distribution of error, ε, is normal

4. Errors are independent

Linear Regression AssumptionsLinear Regression Assumptions

x1 x2 x3

yE(y) = β0 + β1x

x

Error Probability DistributionError Probability Distribution

• Measured by standard error of regression model– Sample standard deviation of : s^

• Affects several factors– Parameter significance– Prediction accuracy

^• Variation of actual y from predicted y, y

Random Error VariationRandom Error Variation

y

xxi

0 1ˆ ˆˆi iy x

yi 2ˆ( )i iy y

Unexplained sum of squares

2( )iy yTotal sum of squares

2ˆ( )iy yExplained sum of squares

y

Variation MeasuresVariation Measures

22 ˆ2 i i

SSEs where SSE y y

n

2

2

SSEs s

n

Estimation of σ2Estimation of σ2

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find SSE, s2, and s.

Calculating SSE, s2, s ExampleCalculating SSE, s2, s Example

1 1

2 1

3 2

4 2

5 4

xi yi

.6

1.3

2

2.7

3.4

ˆy y

.4

-.3

0

-.7

.6

2ˆ( )y yˆ .1 .7y x

.16

.09

0

.49

.36

SSE=1.1

Calculating SSE SolutionCalculating SSE Solution

Calculating s2 and s Solution

2 1.1.36667

2 5 2

SSEs

n

.36667 .6055s

Estimation of σ2Estimation of σ2

• Shows if there is a linear relationship between x and y

• Involves population slope 1

• Hypotheses – H0: 1 = 0 (No Linear Relationship)

– Ha: 1 0 (Linear Relationship)

• Theoretical basis is sampling distribution of slope

Test of Slope CoefficientTest of Slope Coefficient

y

Population Linex

Sample 1 Line

Sample 2 Line

b1

Sampling Distribution

1

1S

^

^

All Possible Sample Slopes

Sample 1: 2.5Sample 2: 1.6 Sample 3: 1.8Sample 4: 2.1 : :

Very large number of sample slopes

Sampling Distribution of Sample SlopesSampling Distribution of Sample Slopes

1

1 1

ˆ

2

12

1

ˆ ˆ2

wherexx

n

ini

xx ii

t df nsS

SS

x

SS xn

Slope Coefficient Test StatisticSlope Coefficient Test Statistic

You’re a marketing analyst for Hasbro Toys.

You find β0 = –.1, β1 = .7 and s = .6055.Ad $ Sales (Units)1 12 13 24 25 4

Is the relationship significant at the .05 level of significance?

^^

Test of Slope Coefficient ExampleTest of Slope Coefficient Example

xi yi xi2 yi

2 xiyi

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

1

1

ˆ 2

1

ˆ

.6055.1914

1555

5

ˆ .703.657

.1914

xx

SS

SS

tS

Test Statistic SolutionTest Statistic Solution

• H0: 1 = 0

• Ha: 1 0• .05• df 5 - 2 = 3• Critical Value(s):

Test Statistic:

Decision:

Conclusion:

1

1

ˆ

ˆ .703.657

.1914t

S

Reject at = .05

There is evidence of a relationshipt0 3.182-3.182

.025

Reject H0 Reject H0

.025

Test of Slope Coefficient SolutionTest of Slope Coefficient Solution

Parameter Estimates Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

t = 1 / S

P-Value

S11 1

^^^^

Test of Slope Coefficient Computer OutputTest of Slope Coefficient Computer Output

ProbabilisticModels

Regression Models

Correlation Models

Correlation Model MapCorrelation Model Map

• Answers ‘How strong is the linear relationship between two variables?’

• Coefficient of correlation– Sample correlation coefficient denoted r– Values range from –1 to +1– Measures degree of association– Does not indicate cause–effect

relationship

Correlation ModelsCorrelation Models

xy

xx yy

SSr

SS SS

2

2

2

2

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

where

Coefficient of CorrelationCoefficient of Correlation

–1.0 +1.00–.5 +.5

No Linear Correlation

Perfect Negative

Correlation

Perfect Positive

Correlation

Increasing degree of positive correlation

Increasing degree of negative correlation

Coefficient of Correlation ValuesCoefficient of Correlation Values

You’re a marketing analyst for Hasbro Toys.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate the coefficient ofcorrelation.

Coefficient of Correlation ExampleCoefficient of Correlation Example

xi yi xi2 yi

2 xiyi

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

22

2

22

2

(15)55 10

5

(10)26 6

5

(15)(10)37 7

5

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

7.904

10 6xy

xx yy

SSr

SS SS

Coefficient of Correlation SolutionCoefficient of Correlation Solution

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0

Find the coefficient of correlation.

Coefficient of Correlation Thinking ChallengeCoefficient of Correlation Thinking Challenge

4 3.0 16 9.00 12

6 5.5 36 30.25 33

10 6.5 100 42.25 65

12 9.0 144 81.00 108

32 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Solution TableSolution Table

22

2

22

2

(32)296 40

4

(24)162.5 18.5

4

(32)(24)218 26

4

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

26.956

40 18.5xy

xx yy

SSr

SS SS

Coefficient of Correlation SolutionCoefficient of Correlation Solution

Proportion of variation ‘explained’ by relationship between x and y

2 Explained Variation

Total Variationyy

yy

SS SSEr

SS

0 r2 1

r2 = (coefficient of correlation)2

Coefficient of DeterminationCoefficient of Determination

You’re a marketing analyst for Hasbro Toys.

You know r = .904.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate and interpret thecoefficient of determination.

Coefficient of Determination ExampleCoefficient of Determination Example

r2 = (coefficient of correlation)2

r2 = (.904)2

r2 = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

Coefficient of Determination SolutionCoefficient of Determination Solution

Root MSE 0.60553 R-square 0.8167 Dep Mean 2.00000 Adj R-sq 0.7556 C.V. 30.27650

r2 adjusted for number of explanatory variables & sample size

r2

r2 Computer Outputr2 Computer Output

• Types of predictions– Point estimates– Interval estimates

• What is predicted– Population mean response E(y) for given x

Point on population regression line

– Individual response (yi) for given x

Prediction With Regression ModelsPrediction With Regression Models

Mean y, E(y)

y

^yIndividual

Prediction, y

E(y) = b0 + b1x

^x

xP

^ ^y i =

b 0 + b 1x

What is Predicted?What is Predicted?

Confidence Interval Estimate for Mean Value of y at x = xp

xx

p

SS

xx

nSty

2

2/

df = n – 2

Confidence Interval EstimateConfidence Interval Estimate

1. Level of confidence (1 – )• Width increases as confidence increases

2. Data dispersion (s)• Width increases as variation increases

3. Sample size• Width decreases as sample size increases

4. Distance of xp from mean • Width increases as distance increases

Factors Affecting Interval WidthFactors Affecting Interval Width

x

Sample 2 Line

y

xx1 x2

y

Sample 1 LineGreater dispersion than x1

x

Why Distance from MeanWhy Distance from Mean

You’re a marketing analyst for Hasbro Toys.

You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Find a 95% confidence interval forthe mean sales when advertising is $4.

^^

Confidence Interval Estimate ExampleConfidence Interval Estimate Example

x i y i x i2 y i

2 x iy i

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

2

/ 2

2

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055

5 10

1.645 ( ) 3.755

p

xx

x xy t s

n SS

y

E Y

x to be predicted

Confidence Interval Estimate SolutionConfidence Interval Estimate Solution

Prediction Interval of Individual Value of y at x = xp

Note!

2

/ 2

1ˆ 1

p

xx

x xy t S

n SS

df = n – 2

Prediction Interval of Individual ValuePrediction Interval of Individual Value

Expected(Mean) y

yy we're trying topredict

Prediction, y

xxp

eE(y) = b0 + b1x

^^

^y i =

b 0 + b 1x i

Why the Extra ‘S’ ?Why the Extra ‘S’ ?

You’re a marketing analyst for Hasbro Toys.

You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Predict the sales when advertising is $4. Use a 95% prediction interval.

^^

Prediction Interval ExamplePrediction Interval Example

x i y i x i2 y i

2 x iy i

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

2

/ 2

2

4

1ˆ 1

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055 1

5 10

.503 4.897

p

xx

x xy t s

n SS

y

y

x to be predicted

Prediction Interval SolutionPrediction Interval Solution

Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%Obs SALES Value Predict Mean Mean Predict Predict 1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037 2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497 3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111 4 2.000 2.700 0.332 1.644 3.755 0.502 4.897 5 4.000 3.400 0.469 1.907 4.892 0.962 5.837

Predicted y when x = 4

Confidence Interval

SYPredictionInterval

Interval Estimate Computer OutputInterval Estimate Computer Output

x

y

x

^^ ^

y i = b 0 +

b 1x i

Confidence Intervals vs. Prediction IntervalsConfidence Intervals vs. Prediction Intervals

top related