
14.2–The Coefficient of Determination - Mathematics · math.furman.edu/~tlewis/math241/weiss/chap14/sec3.pdf

§14.2–The Coefficient of Determination

Tom Lewis

Fall Term 2009

Tom Lewis () §14.2–The Coefficient of Determination Fall Term 2009 1 / 13


Outline

1 Review

2 The regression identity

3 Some computing formulas

4 The coefficient of determination


Review

Variation

Given a data set $\{w_1, w_2, \dots, w_n\}$, we define the variation of the set by

$$\sum (w_i - \bar{w})^2 = \sum w_i^2 - \frac{\left(\sum w_i\right)^2}{n}.$$

The variation of a set measures how far the data deviate from their mean $\bar{w}$.

The variation of a data set is closely related to its standard deviation. For example, if $s$ is the sample standard deviation of this set, then

$$s = \sqrt{\frac{\text{variation}}{n - 1}} \quad \text{or} \quad \text{variation} = (n - 1)s^2.$$
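To make the two forms of the variation concrete, here is a small Python sketch (the function names are illustrative, not from the slides). It evaluates both sides of the definition with exact Fraction arithmetic and checks the relation variation = (n − 1)s² using the standard library's statistics.variance, which computes the sample variance.

```python
from fractions import Fraction
import statistics

def variation_by_definition(data):
    """Sum of squared deviations from the mean: sum of (w_i - w_bar)^2."""
    n = len(data)
    mean = sum(data) / n
    return sum((w - mean) ** 2 for w in data)

def variation_shortcut(data):
    """Computing formula: sum of w_i^2 minus (sum of w_i)^2 / n."""
    n = len(data)
    return sum(w * w for w in data) - sum(data) ** 2 / n

# The y-values from the example later in this section; Fraction keeps the arithmetic exact.
w = [Fraction(v) for v in (2, 5, 8)]
v1 = variation_by_definition(w)
v2 = variation_shortcut(w)

# variation = (n - 1) s^2, where s^2 is the sample variance.
s_squared = statistics.variance(w)
print(v1, v2, (len(w) - 1) * s_squared)  # 18 18 18
```

Exact fractions are used only so the printed values match hand computation; with floats the two formulas agree up to rounding, and the shortcut form can lose precision when the mean is large relative to the spread.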


Review

Regression formulas

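Given a set of $n$ ordered pairs $(x_1, y_1), \dots, (x_n, y_n)$, let

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$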
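As a sketch (the helper names are hypothetical), the three quantities can be computed with the computing formulas and cross-checked against the deviation-from-the-mean definitions. The data are the three points (1, 2), (3, 5), (4, 8) used in the worked example in this section.

```python
from fractions import Fraction

def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy) via the computing (shortcut) formulas."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return sxx, syy, sxy

def sums_of_squares_by_definition(xs, ys):
    """Same quantities from the deviation-from-the-mean definitions."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

xs, ys = [1, 3, 4], [2, 5, 8]
# Expect Sxx = 14/3, Syy = 18, Sxy = 9 from both versions.
print(sums_of_squares(xs, ys))
print(sums_of_squares_by_definition(xs, ys))
```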


Review

The regression equation

The regression equation for a set of $n$ data points is

$$\hat{y} = b_1 x + b_0,$$

where $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$.
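A minimal Python sketch of the slope and intercept formulas; fit_line is an illustrative name, and exact Fraction arithmetic is an assumption made so the result can be compared with hand computation. It raises an error when all x values coincide, since then $S_{xx} = 0$ and the slope is undefined.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least-squares slope and intercept: b1 = Sxy/Sxx, b0 = ybar - b1*xbar."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b1, b0

# The data set (1, 2), (3, 5), (4, 8) from the example in this section.
b1, b0 = fit_line([1, 3, 4], [2, 5, 8])
print(f"y-hat = {b1} x + ({b0})")  # y-hat = 27/14 x + (-1/7)
```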

The big picture

According to our model,

$$y = \underbrace{b_1 x + b_0}_{\text{regression}} + \underbrace{e}_{\text{error}}$$

We will show that the total variation in the variable $y$ ($SST$) can be separated into the variation due to the regression model ($SSR$) and the variation due to the error ($SSE$).


Review

Problem (Variation in the data)

Recall that the regression equation for the data set $(1, 2)$, $(3, 5)$, and $(4, 8)$ is

$$\hat{y} = \frac{27}{14}x - \frac{1}{7}.$$

This gives us four columns of data to study:

$$\begin{array}{cccc}
x & y & \hat{y} & y - \hat{y} \\ \hline
1 & 2 & 25/14 & 3/14 \\
3 & 5 & 79/14 & -9/14 \\
4 & 8 & 53/7 & 3/7
\end{array}$$
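The predicted and error columns can be reproduced with a few lines of Python. This sketch takes the slope and intercept from the regression equation of this problem and uses the convention $e = y - \hat{y}$, which follows from the model $y = b_1 x + b_0 + e$.

```python
from fractions import Fraction

xs = [1, 3, 4]
ys = [2, 5, 8]
b1, b0 = Fraction(27, 14), Fraction(-1, 7)   # regression line for this data set

y_hat = [b1 * x + b0 for x in xs]               # predicted values
resid = [y - yh for y, yh in zip(ys, y_hat)]    # errors e = y - y_hat

# One row per data point: x, y, y_hat, y - y_hat (tab-separated).
for row in zip(xs, ys, y_hat, resid):
    print(*row, sep="\t")
```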

Compute the variation of the data in each of the columns.


Review

Solution (Part I)

The variation in the $x$ data is simply

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 26 - \frac{8^2}{3} = \frac{14}{3}.$$

The variation in the $y$ data is simply

$$S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = 93 - \frac{15^2}{3} = 18.$$

The variation in the $y$ data is also denoted by $SST$, the total sum of squares. Let us remember that

$$SST = S_{yy}.$$
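A short Python check of the Part I arithmetic; variation is an illustrative helper implementing the computing formula from the review slide.

```python
from fractions import Fraction

def variation(data):
    """Computing formula: sum of squares minus (sum)^2 / n."""
    n = len(data)
    return sum(Fraction(v) ** 2 for v in data) - Fraction(sum(data)) ** 2 / n

xs, ys = [1, 3, 4], [2, 5, 8]
s_xx = variation(xs)   # 26 - 8**2/3 = 14/3
sst = variation(ys)    # 93 - 15**2/3 = 18, and SST = Syy
print(s_xx, sst)       # 14/3 18
```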

Our calculations continue on the next slide.


Review

Solution (Part II)

The variation in the $\hat{y}$ data is denoted by $SSR$, which stands for the regression sum of squares. For our set we have

$$SSR = \sum \hat{y}^2 - \frac{\left(\sum \hat{y}\right)^2}{n} = \frac{1293}{14} - \frac{15^2}{3} = \frac{243}{14}.$$

The variation in the $y - \hat{y}$ data is denoted by $SSE$, which stands for the error sum of squares. For our set we have

$$SSE = \sum (y - \hat{y})^2 - \frac{\left(\sum (y - \hat{y})\right)^2}{n} = \frac{9}{14} - \frac{0^2}{3} = \frac{9}{14}.$$

Notice that $\sum (y - \hat{y}) = 0$. This is not a coincidence: for the least-squares regression line, the sum of the errors is always 0.
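Applying the computing formula for the variation to the $\hat{y}$ and error columns reproduces Part II. This is a sketch; the helper name is illustrative and the $\hat{y}$ values are copied from the table in the problem.

```python
from fractions import Fraction

def variation(data):
    """Computing formula: sum of squares minus (sum)^2 / n."""
    n = len(data)
    total = sum(data)
    return sum(v * v for v in data) - total * total / n

ys = [Fraction(v) for v in (2, 5, 8)]
y_hat = [Fraction(25, 14), Fraction(79, 14), Fraction(53, 7)]
errors = [y - yh for y, yh in zip(ys, y_hat)]

ssr = variation(y_hat)   # 1293/14 - 15**2/3 = 243/14
sse = variation(errors)  # 9/14 - 0**2/3 = 9/14
print(ssr, sse, sum(errors))  # 243/14 9/14 0
```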


The regression identity

The regression identity

Notice that

$$SSR + SSE = \frac{243}{14} + \frac{9}{14} = \frac{252}{14} = 18 = SST.$$

This is not a coincidence. In general,

$$SST = SSR + SSE.$$

This is called the regression identity.
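The identity can be spot-checked numerically. The sketch below fits the least-squares line to randomly generated data (the seed, sample size, and the underlying line 3x − 2 are arbitrary choices, not from the slides) and compares $SST$ with $SSR + SSE$ up to floating-point rounding. It is an illustration, not a proof.

```python
import math
import random

def regression_sums(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through (xs, ys)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    y_hat = [b1 * x + b0 for x in xs]
    sst = sum((y - ybar) ** 2 for y in ys)                 # total variation
    ssr = sum((yh - ybar) ** 2 for yh in y_hat)            # variation of y_hat
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # variation of errors
    return sst, ssr, sse

random.seed(0)  # arbitrary seed so the run is reproducible
for _ in range(5):
    xs = [random.uniform(0, 10) for _ in range(20)]
    ys = [3 * x - 2 + random.gauss(0, 1) for x in xs]  # made-up noisy linear data
    sst, ssr, sse = regression_sums(xs, ys)
    print(math.isclose(sst, ssr + sse))
```

The function uses only arithmetic operators, so it also accepts Fraction inputs, in which case the identity holds exactly.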


Some computing formulas

Computing formula for SST

Recall that $SST = S_{yy}$.

In other words, $SST$ is nothing more than the variation in the $y$ data.


Some computing formulas

Computing formula for SSR

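The key observation is that $\sum \hat{y} = \sum y$, and therefore the mean of the regression data (the $\hat{y}$ data) is $\bar{y}$. Since $b_0 = \bar{y} - b_1 \bar{x}$, we can write $\bar{y} = b_1 \bar{x} + b_0$; thus,

$$\begin{aligned}
SSR = \sum (\hat{y} - \bar{y})^2 &= \sum (b_1 x + b_0 - b_1 \bar{x} - b_0)^2 \\
&= b_1^2 \sum (x - \bar{x})^2 = b_1^2 S_{xx} \\
&= \frac{S_{xy}^2}{S_{xx}^2} S_{xx} = \frac{S_{xy}^2}{S_{xx}}.
\end{aligned}$$

In summary,

$$SSR = \frac{S_{xy}^2}{S_{xx}}.$$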
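A quick exact check, on the example data, that the computing formula agrees with the definition of $SSR$ and that the mean of the $\hat{y}$ data is $\bar{y}$; the variable names are illustrative.

```python
from fractions import Fraction

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

s_xx = sum((x - xbar) ** 2 for x in xs)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = ybar - b1 * xbar
y_hat = [b1 * x + b0 for x in xs]

ssr_direct = sum((yh - ybar) ** 2 for yh in y_hat)   # sum of (y_hat - y_bar)^2
ssr_shortcut = s_xy ** 2 / s_xx                      # S_xy^2 / S_xx
print(ssr_direct, ssr_shortcut, sum(y_hat) / n)      # 243/14 243/14 5
```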


Some computing formulas

Computing formula for SSE

From the regression identity, we have $SSE = SST - SSR$; therefore,

$$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}}.$$
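Similarly, here is an exact check of the $SSE$ computing formula against the direct sum of squared errors, using the example data. It is a sketch with illustrative variable names.

```python
from fractions import Fraction

xs = [Fraction(v) for v in (1, 3, 4)]
ys = [Fraction(v) for v in (2, 5, 8)]
n = len(xs)

# Computing formulas for the sums of squares.
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

sse_shortcut = s_yy - s_xy ** 2 / s_xx

# Direct route: fit the line, then add up the squared errors.
b1 = s_xy / s_xx
b0 = sum(ys) / n - b1 * sum(xs) / n
sse_direct = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))
print(sse_shortcut, sse_direct)  # 9/14 9/14
```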


The coefficient of determination

The coefficient of determination

The coefficient of determination, denoted by $r^2$, is the proportion of the total variation in the response variable explained by the regression model:

$$r^2 = \frac{SSR}{SST}.$$
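A minimal sketch of $r^2 = SSR/SST$ computed directly from the fitted values; r_squared is an illustrative name. It refuses input where all x values are equal (no slope) or all y values are equal (then $SST = 0$).

```python
from fractions import Fraction

def r_squared(xs, ys):
    """Coefficient of determination r^2 = SSR / SST for the least-squares line."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sst = sum((y - ybar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y values are equal; r^2 is undefined")
    ssr = sum((b1 * x + b0 - ybar) ** 2 for x in xs)
    return ssr / sst

print(r_squared([1, 3, 4], [2, 5, 8]))  # 27/28
```

For the example data this gives $(243/14)/18 = 27/28$, roughly 0.96.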

Problem

Develop a computing formula for $r^2$.

What does it mean if $r^2$ is close to 1? What does it mean if $r^2$ is close to 0?

Problem

Work on the regression handout.
