
Posted on 19-Dec-2015


Page 1: 9. SIMPLE LINEAR REGRESSION AND CORRELATION

9. SIMPLE LINEAR REGRESSION AND CORRELATION

9.1 Regression and Correlation

9.2 Regression Model

9.3 Probabilistic Models

9.4 Fitting The Model: The Least-Squares Approach

9.5 The Least-Squares Lines

9.6 The Least-Squares Assumption

9.7 Model Assumptions of Simple Regression

9.8 Assessing the Utility of the Model: Making Inference About the Slope

9.9 The Coefficient of Correlation

9.10 Calculating r²

9.11 Correlation Model

9.12 Correlation Coefficient

9.13 The Coefficient of Determination

9.14 Using The Model for Estimation and Prediction


9.1 Regression and Correlation

• Regression: helps ascertain the probable form of the relationship between variables, and lets us predict or estimate the value of one variable corresponding to a given value of another.

• Correlation: measures the strength of the relationship between variables.


9.2 Regression Model

Two variables, X and Y, are of interest. The simple linear regression model is

y = β₀ + β₁x + ε,  with mean  E(y) = β₀ + β₁x

Where, X = independent variable

Y = dependent variable

ε = random error component

β₀ (beta zero) = y-intercept of the line

β₁ (beta one) = slope of the line, the amount of increase or decrease in the deterministic component of y for every 1-unit increase or decrease in x.


Figure 9a. Regression Model


9.3 Probabilistic Models

9.3.1 General Form of Probabilistic Models

y = Deterministic component + Random error

Where y is the variable of interest. We always assume that the mean value of the random error equals 0. This is equivalent to assuming that the mean value of y, E(y), equals the deterministic component of the model; that is,

E(y) = Deterministic component


9.3.2 A First-Order (Straight-Line) Probabilistic Model

y = β₀ + β₁x + ε

Where

y = dependent or response variable (variable to be modeled)

x = independent or predictor variable (variable used as a predictor of y)

E(y) = β₀ + β₁x = deterministic component

ε (epsilon) = random error component

β₀ (beta zero) = y-intercept of the line, that is, the point at which the line intercepts or cuts through the y-axis (see Figure 9b below)

β₁ (beta one) = slope of the line, that is, the amount of increase (or decrease) in the deterministic component of y for every one-unit increase in x.


Figure 9b. The straight-line model


9.4 Fitting The Model: The Least-Squares Approach

Table 9a. Reaction Time Versus Drug Percentage

Subject | Amount of Drug x (%) | Reaction Time y (seconds)
1       | 1                    | 1
2       | 2                    | 1
3       | 3                    | 2
4       | 4                    | 2
5       | 5                    | 4


Figure 9c. 1) Scattergram (for the data in the table above); 2) Visual straight line (fitted to the data above)


9.5 The Least-Squares Lines

• The line that results from this fitting procedure is called the least-squares line, and the procedure that produces it is called the method of least squares.

ŷ = β̂₀ + β̂₁x

Where, ŷ = predicted value on the vertical axis

x = value on the horizontal axis

β̂₀ = point where the line crosses the vertical axis

β̂₁ = the amount by which ŷ changes for each unit change in x.


9.5.1 Definition of Least Square Line

The least-squares line is the one that has the following two properties:

1. The sum of the errors (SE) equals 0

2. The sum of squared errors (SSE) is smaller than that for any other straight-line model
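These two properties can be checked numerically. The sketch below uses the Table 9a data and the fitted least-squares line ŷ = −0.1 + 0.7x (derived in Section 9.5.2), comparing it against an arbitrary competing line y = 0.6x:

```python
# Check the two least-squares properties on the Table 9a data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

res_ls = [yi - (-0.1 + 0.7 * xi) for xi, yi in zip(x, y)]   # least-squares residuals
res_alt = [yi - (0.0 + 0.6 * xi) for xi, yi in zip(x, y)]   # residuals for a rival line

se = sum(res_ls)                       # property 1: sum of errors is 0
sse_ls = sum(e * e for e in res_ls)    # property 2: SSE is minimal...
sse_alt = sum(e * e for e in res_alt)  # ...so any other line has larger SSE

print(se, sse_ls, sse_alt)
```

Here SE is 0 (up to floating-point rounding), SSE for the least-squares line is 1.1, and the rival line's SSE is larger (1.4).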


9.5.2 Formulas for the Least-Squares Estimates

Slope:       β̂₁ = SS_xy / SS_xx

y-intercept: β̂₀ = ȳ − β̂₁x̄

Where

SS_xy = Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n

SS_xx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n

n = sample size


Figure 9d. Scatter Diagram


The total deviation, (yᵢ − ȳ), measures the vertical distance of an observed point from the mean ȳ.

The explained deviation, (ŷᵢ − ȳ), shows how much of the total deviation is accounted for when the regression line is fitted to the points.

The unexplained deviation, (yᵢ − ŷᵢ), is the vertical distance of an observed point from the regression line: the part of the total deviation the line does not account for.

(yᵢ − ȳ) = (ŷᵢ − ȳ) + (yᵢ − ŷᵢ)

Total deviation = Explained deviation + Unexplained deviation


Total sum of squares (SST) = Σ(yᵢ − ȳ)²: measures the total variation in the observed values of Y.

Explained sum of squares (SSR) = Σ(ŷᵢ − ȳ)²: measures the amount of the total variability in the observed values of Y that is accounted for by the linear relationship between the observed values of X and Y.

Unexplained sum of squares (SSE) = Σ(yᵢ − ŷᵢ)²: measures the dispersion of the observed Y values about the regression line.
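The identity SST = SSR + SSE can be verified on the Table 9a data, using the fitted least-squares line ŷ = −0.1 + 0.7x from Section 9.5.2:

```python
# Decompose the total variation for the Table 9a data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
y_bar = sum(y) / len(y)
y_hat = [-0.1 + 0.7 * xi for xi in x]     # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)              # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # unexplained

print(sst, ssr, sse)   # 6.0 = 4.9 + 1.1 (up to rounding)
```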


9.6 The Least-Squares Assumption

Consider now a reasonable criterion for estimating β₀ and β₁ from data. The method of ordinary least squares (OLS) determines values of β₀ and β₁ (since these will be estimated from data, we will replace β₀ and β₁ with the Latin letters a and b)


so that the sum of the squared vertical deviations (residuals) between the data and the fitted line,

Residuals = Data − Fit,

is less than the sum of the squared vertical deviations from any other straight line that could be fitted through the data:

Minimize Σ(Data − Fit)²


A "vertical deviation" is the vertical distance from an observed point to the line. Each deviation in the sample is squared, and the least-squares line is defined to be the straight line that makes the sum of these squared deviations a minimum:

Data = a + bX + Residuals.


Figure 9e(a) illustrates the regression relationship between two variables, Y and X. The arithmetic mean of the observed values of Y is denoted by ȳ. The vertical dashed lines represent the total deviations of each value y from the mean value ȳ.

Part (b) of Figure 9e shows a linear least-squares regression line fitted to the observed points.


(a) Total variation (b) Least-squares regression

Figure 9e: The total variation of Y and the least-squares regression between Y and X.


The total variation can be expressed in terms of (1) the variation explained by the regression and (2) a residual portion called the unexplained variation.


Figure 9f(a) shows the explained variation, which is expressed by the vertical distance between any fitted (predicted) value and the mean, ŷᵢ − ȳ. The circumflex (^) over the y is used to represent fitted values determined by a model. Thus, it is also customary to write a = β̂₀ and b = β̂₁. Figure 9f(b) shows the unexplained or residual variation: the vertical distance between the observed values and the predicted values, yᵢ − ŷᵢ.


(a) Explained variation (b) Unexplained variation

Figure 9f. The explained and unexplained variation in least-squares regression.


9.7 Model Assumptions of Simple Regression

Assumption 1:

The mean of the probability distribution of ε is 0. That is, the average of the values of ε over an infinitely long series of experiments is 0 for each setting of the independent variable x. This assumption implies that the mean value of y, E(y), for a given value of x is

E(y) = β₀ + β₁x


Assumption 2:

The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say σ², for all values of x.

Assumption 3:

The probability distribution of ε is normal.

Assumption 4:

The values of ε associated with any two observed values of y are independent. That is, the value of ε associated with one value of y has no effect on the values of ε associated with other y values.


Figure 9g. The probability distribution of ε


9.8 Assessing the Utility of the Model: Making Inference About the Slope

9.8.1 A Test Of Model Utility: Simple Linear Regression

One-Tailed Test: H₀: β₁ = 0 versus Hₐ: β₁ < 0 (or Hₐ: β₁ > 0)

Two-Tailed Test: H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0

Test statistic: t = β̂₁ / s_β̂₁, where s_β̂₁ = s / √SS_xx

Rejection region: t < −t_α (or t > t_α) for the one-tailed test; |t| > t_α/2 for the two-tailed test

Where t_α and t_α/2 are based on (n − 2) degrees of freedom.

Assumptions: refer to the four assumptions about ε in Section 9.7.
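A sketch of the two-tailed model-utility test on the Table 9a data. The critical value 3.182 is t_0.025 with n − 2 = 3 degrees of freedom, taken from a standard t table:

```python
# t-test of H0: beta1 = 0 for the Table 9a data.
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))          # estimate of sigma
t_stat = b1 / (s / math.sqrt(ss_xx))  # about 3.66

print(t_stat > 3.182)   # reject H0: the model is useful
```

Since 3.66 exceeds the critical value 3.182, H₀ is rejected at the 5% level.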


Figure 9h. Rejection region and calculated t value for testing H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0


9.8.2 A Confidence Interval for the Simple Linear Regression Slope β₁

β̂₁ ± t_α/2 · s_β̂₁

Where the estimated standard error of β̂₁ is calculated by

s_β̂₁ = s / √SS_xx

and t_α/2 is based on (n − 2) degrees of freedom.

Assumptions: refer to the four assumptions about ε in Section 9.7.
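A sketch of a 95% confidence interval for the slope using the Table 9a data. The quantities b₁ = 0.7, SSE = 1.1, and SS_xx = 10 come from the earlier sections; 3.182 is t_0.025 with 3 degrees of freedom from a t table:

```python
# 95% confidence interval for the slope, Table 9a data.
import math

ss_xx = 10.0                 # from Section 9.5.2
b1 = 0.7                     # fitted slope
s = math.sqrt(1.1 / 3)       # s = sqrt(SSE / (n - 2))

half_width = 3.182 * s / math.sqrt(ss_xx)
ci = (b1 - half_width, b1 + half_width)
print(ci)                    # roughly (0.09, 1.31)
```

Because the interval excludes 0, it leads to the same conclusion as the two-tailed t-test.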


9.9 The Coefficient of Correlation

Definition:

The Pearson product moment coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables x and y. It is computed (for a sample of n measurements on x and y) as follows:

r = SS_xy / √(SS_xx · SS_yy)
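The formula can be evaluated directly on the Table 9a data, using the shortcut forms of the SS quantities defined in Section 9.5.2:

```python
# Pearson's r for the Table 9a data (no external libraries).
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(r)    # about 0.904: a strong positive linear relationship
```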


Figure 9i. Value of r and their implication

1) Positive r : y increases as x increases


2) r near zero: little or no relationship between y and x


3) Negative r : y decreases as x increases


4) r = 1: a perfect positive relationship between y and x


5) r = -1: a perfect negative relationship between y and x


6) r near 0: little or no relationship between y and x


9.10 Calculating r²

r² = (r)²

Where:

r = the sample correlation coefficient


Figure 9j. r² as a measure of the closeness of fit of the sample regression line to the sample observations


9.11 Correlation Model

• We have what is called the correlation model when both Y and X are random variables.

• Involving two variables implies a co-relationship between them.

• One variable is treated as dependent and the other as independent.


9.12 The Correlation Coefficient (ρ)

• Measures the strength of the linear relationship between X and Y.

• May assume any value between −1 and +1.

• If ρ = 1, there is perfect direct linear correlation.

• If ρ = −1, there is perfect inverse linear correlation.


9.13 The Coefficient of Determination

Figure 9k. A comparison of the sums of squares of deviations for two models


9.13.1 Coefficient of Determination: Definition

• It represents the proportion of the total sample variability around ȳ that is explained by the linear relationship between y and x. (In simple linear regression, it may also be computed as the square of the coefficient of correlation, r.)

r² = (SS_yy − SSE) / SS_yy = 1 − SSE / SS_yy
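The two routes to r² agree, as a quick sketch on the Table 9a data shows (SSE uses the fitted line ŷ = −0.1 + 0.7x from Section 9.5.2):

```python
# Coefficient of determination for the Table 9a data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
y_bar = sum(y) / n

ss_yy = sum((yi - y_bar) ** 2 for yi in y)                        # 6.0
sse = sum((yi - (-0.1 + 0.7 * xi)) ** 2 for xi, yi in zip(x, y))  # 1.1

r_sq = 1 - sse / ss_yy
print(r_sq)   # about 0.817: the line explains ~82% of the variation in y
```

This matches the square of the correlation computed in Section 9.9 (0.904² ≈ 0.817).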


9.14 Using The Model for Estimation and Prediction


9.14.1 Sampling Errors for the Estimator of the Mean of y and the Predictor of an Individual New Value of y

1. The standard deviation of the sampling distribution of the estimator ŷ of the mean of y at a specific value of x, say x_p, is

σ_ŷ = σ √( 1/n + (x_p − x̄)² / SS_xx )

Where σ is the standard deviation of the random error ε. We refer to σ_ŷ as the standard error of ŷ.


2. The standard deviation of the prediction error for the predictor ŷ of an individual new y value at a specific value of x, say x_p, is

σ_(y−ŷ) = σ √( 1 + 1/n + (x_p − x̄)² / SS_xx )

Where σ is the standard deviation of the random error ε. We refer to σ_(y−ŷ) as the standard error of prediction.


9.14.2 A Confidence Interval for the Mean Value of y at x = x_p

ŷ ± t_α/2 · s √( 1/n + (x_p − x̄)² / SS_xx )

Where t_α/2 is based on (n − 2) degrees of freedom.


9.14.3 A Prediction Interval for an Individual New Value of y at x = x_p

ŷ ± t_α/2 · s √( 1 + 1/n + (x_p − x̄)² / SS_xx )

Where t_α/2 is based on (n − 2) degrees of freedom.
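Both intervals can be sketched at x_p = 4 for the Table 9a data (compare Figure 9l). The quantities n = 5, x̄ = 3, SS_xx = 10, SSE = 1.1, and the fitted line ŷ = −0.1 + 0.7x come from earlier sections; 3.182 is t_0.025 with 3 degrees of freedom from a t table:

```python
# 95% confidence interval for E(y) and prediction interval at x_p = 4.
import math

n, x_bar, ss_xx = 5, 3.0, 10.0
s = math.sqrt(1.1 / 3)              # s = sqrt(SSE / (n - 2))
x_p = 4
y_hat = -0.1 + 0.7 * x_p            # 2.7

ci_half = 3.182 * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
pi_half = 3.182 * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)

print(y_hat - ci_half, y_hat + ci_half)   # CI, roughly (1.6, 3.8)
print(y_hat - pi_half, y_hat + pi_half)   # PI, wider: roughly (0.5, 4.9)
```

The prediction interval is always wider than the confidence interval at the same x_p, because it must cover the random error ε of the new observation in addition to the estimation error of the line.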


Figure 9l. A 95% confidence interval for the mean reaction time and a prediction interval for an individual reaction time when drug concentration x = 4


Figure 9m. Error of estimating the mean value of y for a given value of x


Figure 9n. Error of predicting a future value of y for a given value of x


Figure 9o. Confidence intervals for mean value and prediction intervals for new values