1 relationships we have examined how to measure relationships between two categorical variables...

Post on 22-Dec-2015

TRANSCRIPT

1

Relationships

We have examined how to measure relationships between two categorical variables (chi-square), and between one categorical variable and one measurement variable (t-test, F-test)

Now we look at relationships between two measurement variables

2

Interval variable relations

We want to describe the relationship in terms of form and strength

We want to make inferences to the population

3

Our Tools

Correlation to measure strength of relationship

Regression to measure form of relationship

4

Regression

Begin with a scatterplot of two measurement variables, X and Y

Let X be the independent variable

Let Y be the dependent variable

Plot each case as we have done before at the beginning of the course.

5

Scatterplot

[Figure: scatterplot of homicides per 100,000 people (Y axis, 0.00 to 30.00) against percent of families below poverty (X axis, 0.00 to 15.00)]

6

Note the outlier: Dallas

7

Relationships

Each city is represented by an X score (percent poor) and a Y score (homicide rate).

We are asking about the relationship between poverty and homicide.

Does homicide change as percent poor changes? If so, in what way and how much?

8

Looking at the scatterplot

We see that as percent poor (poverty) increases (from left to right on the graph), the homicide rate increases (from low to high on the graph).

9

Scatterplot

[Figure: scatterplot of homicides per 100,000 people (Y axis, 0.00 to 30.00) against percent of families below poverty (X axis, 0.00 to 15.00)]

10

Representing relationships

We represent the relationship with a straight line that goes through the middle of the points on the graph

This line is the regression line

It shows the average homicide rate for every level of poverty.

11

Regression Line

[Figure: scatterplot with the regression line drawn through the points; X axis 0.00 to 15.00, Y axis 0.00 to 30.00]

12

Regression Line

Every line is represented by a formula

The regression line has the following general formula

$\hat{y} = a + bx$

‘a’ represents the intercept of the line

‘b’ represents the slope of the line

y-hat is the predicted value of y for a given x value

13

Regression of homicide on poverty

$\hat{y} = -.815 + .944x$

a = -.815, b = .944

x is percent poor

y is homicide rate

14

Slope, the value of b

The slope of the regression line is positive: it goes from the lower left to the upper right.

The slope measures the amount of change in the dependent variable for every unit change in the independent variable

b = .944. There is an increase of .944 units in y for every increase of 1.0 in x
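The reading of the slope can be checked with a little arithmetic. The sketch below uses only the coefficients reported on the slides (a = -.815, b = .944); the function name `predict` and the chosen x values are illustrative assumptions.

```python
# Coefficients reported on the slides for the regression of homicide on poverty.
A = -0.815  # intercept
B = 0.944   # slope

def predict(x):
    """Predicted homicide rate (y-hat) for a given percent poor (x)."""
    return A + B * x

# A one-unit increase in x changes the prediction by b, wherever we start.
change_per_unit = predict(11.0) - predict(10.0)

# A five-unit "run" in x produces a "rise" of 5 * b in the prediction.
rise_for_run_of_5 = predict(10.0) - predict(5.0)

print(round(change_per_unit, 3))    # 0.944
print(round(rise_for_run_of_5, 3))  # 4.72
```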

15

Regression Line, slope

[Figure: regression line plotted against percent families below poverty. A “run” of 5 units in x corresponds to a “rise” of 5 x .944 units in y.]

16

Intercept, the value of a

The intercept is the point where the regression line crosses the Y axis

This point is the value of Y when X is zero

a = -.815. The predicted rate of homicide is -.815 when there is zero poverty

17

Calculate b

$b = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

18

Calculate a

First calculate b, then

$a = \bar{y} - b\bar{x}$
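The two calculations can be sketched in a few lines of code. The city data behind the slides is not included in this transcript, so the x and y values below are invented purely for illustration, and the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x from its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope is calculated first
    a = y_bar - b * x_bar  # then the intercept
    return a, b

# Hypothetical data, NOT the city data used on the slides.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [1.0, 3.0, 4.0, 8.0, 9.0]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = -1.30, b = 1.05
```

For these made-up numbers the means are 6 and 5, the sum of cross-products is 42, and the sum of squared x deviations is 40, so b = 42/40 = 1.05 and a = 5 - 1.05 x 6 = -1.3.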

19

Calculate predicted y

After calculating a and b, one can use the regression line formula to calculate predicted values of y for every actual value of x

$\hat{y} = -.815 + .944x$

20

Prediction errors

Prediction errors are the differences between the actual value of y and the predicted value of y

$e_i = y_i - \hat{y}_i$
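As a rough illustration, the sketch below computes a predicted value and a prediction error (actual minus predicted) for a few cases, using the coefficients reported on the slides. The three (percent poor, homicide rate) pairs are hypothetical and are not taken from the slides' data.

```python
# Coefficients reported on the slides.
A = -0.815  # intercept
B = 0.944   # slope

# Hypothetical (percent poor, homicide rate) pairs, invented for illustration.
observations = [(5.0, 4.0), (10.0, 8.0), (12.0, 14.0)]

def predict(x):
    """Predicted y (y-hat) from the regression line."""
    return A + B * x

errors = []
for x, y in observations:
    y_hat = predict(x)
    e = y - y_hat  # error = actual minus predicted
    errors.append(e)
    print(f"x={x:5.1f}  actual={y:5.1f}  predicted={y_hat:6.3f}  error={e:6.3f}")
```

A positive error means the case lies above the regression line; a negative error means it lies below.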

21

Prediction errors

[Figure: errors (actual minus predicted) shown as the distances between the actual points and the predicted values on the regression line]

22

Ordinary Least Squares: OLS

The regression line is the “best fitting” line through the data points in the graph

It is the line that minimizes the sum of the squared error terms, hence “least squares”

Minimize: $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
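One way to see the minimization is to evaluate the sum of squared errors over a grid of candidate (a, b) pairs and pick the smallest. The sketch below does this for a small invented data set (not the slides' city data); the grid ranges and the name `sse` are assumptions made for the example. For this data the closed-form least-squares estimates are a = -1.3 and b = 1.05, and the grid search lands on the same pair.

```python
# Hypothetical data, NOT the city data used on the slides.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [1.0, 3.0, 4.0, 8.0, 9.0]

def sse(a, b):
    """Sum of squared errors for the candidate line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Candidate intercepts and slopes on a coarse grid (illustrative ranges).
a_values = [round(-1.5 + 0.1 * i, 2) for i in range(5)]   # -1.5 ... -1.1
b_values = [round(0.95 + 0.05 * j, 2) for j in range(5)]  # 0.95 ... 1.15

# Evaluate every (a, b) pair and keep the one with the smallest SSE.
grid = {(a, b): sse(a, b) for a in a_values for b in b_values}
best_a, best_b = min(grid, key=grid.get)

print(best_a, best_b, round(grid[(best_a, best_b)], 2))  # -1.3 1.05 1.9
```

A grid search only finds the best pair among the candidates tried; the formulas for b and a give the exact minimum directly.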

23

Sums of Squared Errors

[Figure: surface plot of the sum of squared errors (SS, roughly 460.0 to 640.0) over values of a (-1.0 to -0.6) and b (0.7 to 1.1)]

24

Sum of Squared Errors

Rows are values of b; columns are values of a

b \ a    -1.0    -0.9    -0.8    -0.7    -0.6
0.7     638.7   630.2   622.0   614.3   607.0
0.8     572.9   567.6   562.8   558.3   554.3
0.9     537.9   535.9   534.3   533.2   532.4
1.0     533.7   535.0   536.7   538.8   541.3
1.1     560.4   565.0   569.9   575.3   581.1

Minimum is 531.57 when a = -.815, b = .944
