
§14.2–The Coefficient of Determination

Tom Lewis

Fall Term 2009

Tom Lewis () §14.2–The Coefficient of Determination Fall Term 2009 1 / 13

Outline

1 Review

2 The regression identity

3 Some computing formulas

4 The coefficient of determination


Review

Variation

Given a data set $\{w_1, w_2, \ldots, w_n\}$ with mean $\bar{w}$, we define the variation of the set by

$$\sum (w_i - \bar{w})^2 = \sum w_i^2 - \frac{\left(\sum w_i\right)^2}{n}.$$

The variation of a set measures the total squared deviation of the data from its mean.

The variation of a data set is closely related to its standard deviation. For example, if $s$ is the sample standard deviation of this set, then

$$s = \sqrt{\frac{\text{variation}}{n - 1}} \quad\text{or}\quad \text{variation} = (n - 1)s^2.$$
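As a quick check of both forms of the definition, here is a minimal Python sketch. The helper names `variation` and `variation_computing` are illustrative, not from the notes, and exact `Fraction` arithmetic is an assumption chosen so the results match the hand calculations exactly. The data set $\{2, 5, 8\}$ is the $y$-data from the worked example in these notes.

```python
from fractions import Fraction
import statistics

def variation(data):
    """Illustrative helper: sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((w - mean) ** 2 for w in data)

def variation_computing(data):
    """Illustrative helper: the same quantity via sum(w^2) - (sum w)^2 / n."""
    n = len(data)
    return sum(w * w for w in data) - sum(data) ** 2 / n

data = [Fraction(2), Fraction(5), Fraction(8)]
print(variation(data), variation_computing(data))  # 18 18

# The sample variance s^2 equals variation / (n - 1), so (n - 1) * s^2 recovers the variation.
print(statistics.variance(data) * (len(data) - 1))  # 18
```

Both forms agree, and multiplying the sample variance by $n - 1$ recovers the variation, as the relation $\text{variation} = (n-1)s^2$ states.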


Review

Regression formulas

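Given a set of $n$ ordered pairs $(x_1, y_1), \ldots, (x_n, y_n)$, let

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$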
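The three computing formulas translate directly into code. This is a sketch only: `sums_of_squares` is an illustrative name, and the data are the points $(1, 2)$, $(3, 5)$, $(4, 8)$ used in the example later in these notes, stored as `Fraction`s for exact arithmetic.

```python
from fractions import Fraction

def sums_of_squares(xs, ys):
    """Illustrative helper: return (Sxx, Syy, Sxy) via the computing formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return sxx, syy, sxy

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
sxx, syy, sxy = sums_of_squares(xs, ys)
print(sxx, syy, sxy)  # 14/3 18 9
```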


Review

The regression equation

The regression equation for a set of n data points is

$$\hat{y} = b_1 x + b_0,$$

where $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$.

The big picture

According to our model,

$$y = \underbrace{b_1 x + b_0}_{\text{regression}} + \underbrace{e}_{\text{error}}$$

We will show that the total variation in the variable $y$ (SST) can be separated into the variation due to the regression model (SSR) and the variation due to the error (SSE).
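A minimal sketch of the slope and intercept formulas, assuming exact `Fraction` inputs; `fit_line` is a hypothetical helper name. Applied to the example points $(1, 2)$, $(3, 5)$, $(4, 8)$, it reproduces the regression equation quoted in the next problem.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Hypothetical helper: least-squares b1 = Sxy/Sxx and b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b1, b0

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
b1, b0 = fit_line(xs, ys)
print(b1, b0)  # 27/14 -1/7
```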


Review

Problem (Variation in the data)

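Recall that the regression equation for the data set $(1, 2)$, $(3, 5)$, and $(4, 8)$ is

$$\hat{y} = \frac{27}{14}x - \frac{1}{7}.$$

This gives us four columns of data to study:

$$\begin{array}{cccc}
x & y & \hat{y} & y - \hat{y} \\ \hline
1 & 2 & 25/14 & 3/14 \\
3 & 5 & 79/14 & -9/14 \\
4 & 8 & 53/7 & 3/7
\end{array}$$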

Compute the variation of the data in each of the columns.
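The four columns and their variations can be generated mechanically. In this sketch the coefficients $b_1 = 27/14$ and $b_0 = -1/7$ are taken from the regression equation above, and `variation` is an illustrative helper; exact `Fraction` arithmetic keeps the answers as fractions.

```python
from fractions import Fraction

def variation(data):
    """Illustrative helper: sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((w - mean) ** 2 for w in data)

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
b1, b0 = Fraction(27, 14), Fraction(-1, 7)      # coefficients of the fitted line

y_hat = [b1 * x + b0 for x in xs]               # regression column
errors = [y - yh for y, yh in zip(ys, y_hat)]   # error column, y - y_hat

for name, col in [("x", xs), ("y", ys), ("y_hat", y_hat), ("y - y_hat", errors)]:
    print(name, *col, "| variation:", variation(col))
# x 1 3 4 | variation: 14/3
# y 2 5 8 | variation: 18
# y_hat 25/14 79/14 53/7 | variation: 243/14
# y - y_hat 3/14 -9/14 3/7 | variation: 9/14
```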


Review

Solution (Part I)

The variation in the $x$ data is simply

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 26 - \frac{8^2}{3} = \frac{14}{3}.$$

The variation in the $y$ data is simply

$$S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = 93 - \frac{15^2}{3} = 18.$$

The variation in the $y$ data is also denoted by SST, the total sum of squares. Let us remember that

$$\mathrm{SST} = S_{yy}.$$

Our calculations continue on the next slide.


Review

Solution (Part II)

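The variation in the $\hat{y}$ data is denoted by SSR, the sum of squares due to regression. For our set we have

$$\mathrm{SSR} = \sum \hat{y}^2 - \frac{\left(\sum \hat{y}\right)^2}{n} = \frac{1293}{14} - \frac{15^2}{3} = \frac{243}{14}.$$

The variation in the $y - \hat{y}$ data is denoted by SSE, the sum of squares due to error. For our set we have

$$\mathrm{SSE} = \sum (y - \hat{y})^2 - \frac{\left(\sum (y - \hat{y})\right)^2}{n} = \frac{9}{14} - \frac{0^2}{3} = \frac{9}{14}.$$

Notice that $\sum (y - \hat{y}) = 0$. This is not a coincidence: for the least-squares line, the errors always sum to 0, because $b_0 = \bar{y} - b_1\bar{x}$ gives $\sum \hat{y} = n(b_1\bar{x} + b_0) = n\bar{y} = \sum y$.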
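To see that the zero error sum is not special to the three-point example, this sketch fits a least-squares line to arbitrary random integer data (an assumption purely for illustration) and sums the errors with exact `Fraction` arithmetic. `fit_line` is a hypothetical helper name.

```python
import random
from fractions import Fraction

def fit_line(xs, ys):
    """Hypothetical helper: least-squares b1 = Sxy/Sxx and b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return b1, ybar - b1 * xbar

# Arbitrary test data; Fractions make the arithmetic exact, so no rounding error.
random.seed(1)
xs = [Fraction(random.randint(0, 20)) for _ in range(30)]
ys = [Fraction(random.randint(0, 50)) for _ in range(30)]

b1, b0 = fit_line(xs, ys)
total_error = sum(y - (b1 * x + b0) for x, y in zip(xs, ys))
print(total_error)  # 0
```

With floating-point data the sum would only be zero up to rounding error.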


The regression identity

The regression identity

Notice that

$$\mathrm{SSR} + \mathrm{SSE} = \frac{243}{14} + \frac{9}{14} = \frac{252}{14} = 18 = \mathrm{SST}.$$

This is not a coincidence. In general,

SST = SSR + SSE

This is called the regression identity.
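A sketch that checks the regression identity on arbitrary random integer data rather than the three-point example; the data and the helper name `variation` are illustrative assumptions. Exact `Fraction` arithmetic lets the identity be tested with `==`.

```python
import random
from fractions import Fraction

def variation(data):
    """Illustrative helper: sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((w - mean) ** 2 for w in data)

random.seed(2)
xs = [Fraction(random.randint(0, 20)) for _ in range(25)]
ys = [Fraction(random.randint(-10, 40)) for _ in range(25)]

# Least-squares coefficients b1 = Sxy/Sxx and b0 = ybar - b1*xbar
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / variation(xs)
b0 = ybar - b1 * xbar
y_hat = [b1 * x + b0 for x in xs]

sst = variation(ys)                                    # total variation
ssr = variation(y_hat)                                 # variation of fitted values
sse = variation([y - yh for y, yh in zip(ys, y_hat)])  # variation of errors
print(sst == ssr + sse)  # True
```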


Some computing formulas

Computing formula for SST

Recall that $\mathrm{SST} = S_{yy}$.

In other words, SST is nothing more than the variation in the $y$-data.


Some computing formulas

Computing formula for SSR

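The key observation is that $\sum \hat{y} = \sum y$, and therefore the mean of the regression data (the $\hat{y}$-data) is $\bar{y}$. Since $b_0 = \bar{y} - b_1\bar{x}$, we have $\bar{y} = b_1\bar{x} + b_0$; thus,

$$\mathrm{SSR} = \sum (\hat{y} - \bar{y})^2 = \sum (b_1 x + b_0 - b_1\bar{x} - b_0)^2 = b_1^2 \sum (x - \bar{x})^2 = b_1^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}^2}\, S_{xx} = \frac{S_{xy}^2}{S_{xx}}.$$

In summary,

$$\mathrm{SSR} = \frac{S_{xy}^2}{S_{xx}}.$$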
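The shortcut can be checked against the direct definition on the example data $(1, 2)$, $(3, 5)$, $(4, 8)$. This is a sketch using exact `Fraction` arithmetic; both routes should give the value $243/14$ found earlier.

```python
from fractions import Fraction

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
n = len(xs)

# Computing formulas for Sxx and Sxy
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

# SSR from the shortcut Sxy^2 / Sxx
ssr_shortcut = sxy ** 2 / sxx

# SSR directly, as the variation of the fitted values about ybar
b1 = sxy / sxx
ybar = sum(ys) / n
b0 = ybar - b1 * sum(xs) / n
ssr_direct = sum((b1 * x + b0 - ybar) ** 2 for x in xs)

print(ssr_shortcut, ssr_direct)  # 243/14 243/14
```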


Some computing formulas

Computing formula for SSE

From the regression identity, we have SSE = SST − SSR; therefore,

$$\mathrm{SSE} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}.$$
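A short sketch of the SSE computing formula, compared with the direct sum of squared errors for the example data. `sse_shortcut` is a hypothetical helper name, and the line $\hat{y} = \frac{27}{14}x - \frac{1}{7}$ is taken from the example.

```python
from fractions import Fraction

def sse_shortcut(xs, ys):
    """Hypothetical helper: SSE = Syy - Sxy^2 / Sxx via the computing formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return syy - sxy ** 2 / sxx

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]

# Direct SSE from the fitted line y_hat = (27/14) x - 1/7
b1, b0 = Fraction(27, 14), Fraction(-1, 7)
sse_direct = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

print(sse_shortcut(xs, ys), sse_direct)  # 9/14 9/14
```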


The coefficient of determination

The coefficient of determination

The coefficient of determination, denoted by $r^2$, is the proportion of the total variation in the response variable explained by the regression model:

$$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}.$$

Problem

Develop a computing formula for $r^2$.

What does it mean if $r^2$ is close to 1? What does it mean if $r^2$ is close to 0?

Problem

Work on the regression handout.
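For the example data, $r^2$ can be computed straight from the definition, using $\mathrm{SSR} = S_{xy}^2/S_{xx}$ and $\mathrm{SST} = S_{yy}$ from the computing formulas. This is a sketch with exact `Fraction` arithmetic; `r_squared` is an illustrative name.

```python
from fractions import Fraction

def r_squared(xs, ys):
    """Illustrative helper: r^2 = SSR / SST with SSR = Sxy^2/Sxx and SST = Syy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    ssr = sxy ** 2 / sxx
    return ssr / syy

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
r2 = r_squared(xs, ys)
print(r2)  # 27/28
```

Here $r^2 = (243/14)/18 = 27/28 \approx 0.96$, so about 96% of the variation in $y$ is explained by the fitted line.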
