point biserial correlation


Upload: byu-center-for-teaching-learning

Post on 17-Jul-2015


Category:

Education



TRANSCRIPT

Page 1: Point biserial correlation

Point Biserial Correlation

Welcome to the Point Biserial Correlation Conceptual Explanation

Page 3: Point biserial correlation

• Point biserial correlation is an estimate of the coherence between two variables, one of which is dichotomous and one of which is continuous.

Coherence means how much the two variables covary.

Page 4: Point biserial correlation

• Let’s look at an example of two variables cohering

Page 7: Point biserial correlation

• The data set below represents the average decibel levels at which different age groups listen to music.

The two variables (age group and decibel level) cohere because as one increases, the other increases or decreases commensurately.

Age Group   Decibels
80s         30
70s         35
60s         37
50s         39
40s         45
30s         50
20s         75
Teens       95

Page 13: Point biserial correlation

• In the data set above, as age goes up, decibels go down.

• This is called a negative relationship.

Page 14: Point biserial correlation

• It is called a negative correlation, or negative coherence, because when one variable increases, the other decreases (or vice versa).
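The negative relationship in the age and decibel table can be checked numerically. The sketch below is only an illustration: the slides give age-group labels, so the numeric coding (Teens = 1 through 80s = 8) and the helper name `pearson_r` are assumptions made here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Assumed ordinal coding of the age groups: Teens = 1, 20s = 2, ..., 80s = 8.
# Rows follow the table order (80s first, Teens last).
age_rank = [8, 7, 6, 5, 4, 3, 2, 1]
decibels = [30, 35, 37, 39, 45, 50, 75, 95]

r_age_decibels = pearson_r(age_rank, decibels)
print(f"r = {r_age_decibels:.2f}")  # the sign is negative: decibels fall as age rises
```

Only the sign matters for this point. A different but still increasing age coding would change the size of r somewhat, not its direction.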

Page 21: Point biserial correlation

• A positive correlation occurs when, as one variable increases, the other increases, or when, as one decreases, the other decreases.

• Example: As the temperature rises, the average daily purchase of popsicles increases.

• These variables are positively correlated because as one variable (average daily temperature) increases, the other variable (average daily popsicle purchases) increases.

Average Daily Temp   Average Daily Popsicle Purchases Per Person
100                  2.30
95                   1.20
90                   1.00
85                   0.80
80                   0.70
75                   0.10
70                   0.03
65                   0.01

Page 26: Point biserial correlation

• It can be stated another way: as the average daily temperature decreases, the average daily popsicle purchases decrease as well.

• The variables are still positively correlated, because as one variable (average daily temperature) decreases, the other variable (average daily popsicle purchases) decreases.
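The two ways of stating the popsicle example describe the same correlation. As a rough check, the sketch below computes the correlation with the rows in table order (temperature falling) and again with the rows reversed (temperature rising). The helper `pearson_r` and the variable names are assumptions for illustration.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Values copied from the popsicle table.
temps = [100, 95, 90, 85, 80, 75, 70, 65]
popsicles = [2.30, 1.20, 1.00, 0.80, 0.70, 0.10, 0.03, 0.01]

r_falling = pearson_r(temps, popsicles)             # rows as listed: temperature falling
r_rising = pearson_r(temps[::-1], popsicles[::-1])  # same rows reversed: temperature rising

print(r_falling > 0, isclose(r_falling, r_rising))
```

Reordering the rows leaves every (temperature, purchases) pair intact, so the coefficient is the same and stays positive whichever way the relationship is described.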

Page 29: Point biserial correlation

• Let’s return to our Point Biserial Correlation definition:

• “Point biserial correlation is an estimate of the coherence between two variables, one of which is dichotomous and one of which is continuous.”

We discussed coherence

Page 30: Point biserial correlation

But, what is a dichotomous variable?

Page 35: Point biserial correlation

• A dichotomous variable is a variable that can take only one of two values.

• Here are some examples:

– When you can only answer “Yes” or “No”

– When a statement can only be categorized as “Fact” or “Opinion”

– When you either are something or are not: “Catholic” or “Not Catholic”

Page 39: Point biserial correlation

• The dichotomous variable may be naturally occurring, as in gender, or may be arbitrarily dichotomized, as in depressed/not depressed.
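An arbitrary dichotomy is typically produced by applying a cutoff to an underlying score. The sketch below assumes a hypothetical depression-inventory score and a hypothetical cutoff of 14. Neither the scores nor the cutoff come from the slides. The 1/2 coding matches the coding the slides use later.

```python
# Hypothetical depression-inventory scores; not data from the slides.
scores = [3, 22, 9, 17, 14, 30]

# Hypothetical threshold: scores at or above it are treated as "depressed".
CUTOFF = 14

def dichotomize(score, cutoff=CUTOFF):
    """Return 2 for 'depressed' and 1 for 'not depressed'."""
    return 2 if score >= cutoff else 1

codes = [dichotomize(s) for s in scores]
print(codes)
```

Moving the cutoff changes who lands in each group, which is why this kind of dichotomy is called arbitrary.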

Page 41: Point biserial correlation

• The range of a point biserial correlation is from -1 to +1.

-1 0 +1

Page 45: Point biserial correlation

• Let’s return again to our Point Biserial Correlation definition:

• “Point biserial correlation is an estimate of the coherence between two variables, one of which is dichotomous and one of which is continuous.”

So, we now know what a dichotomous variable is (either/or).

Page 47: Point biserial correlation

What is a continuous variable?

Page 52: Point biserial correlation

• Definition of Continuous Variable:

• If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable.

• Here is an example:

Suppose the fire department mandates that all firefighters must weigh between 150 and 250 pounds. The weight of a firefighter would be an example of a continuous variable, since a firefighter's weight could take on any value between 150 and 250 pounds.

Page 54: Point biserial correlation

• The direction of the correlation depends on how the variables are coded.

• Let’s say we are relating shame scores (a continuous variable from 1 to 10) to whether someone is depressed or not (a dichotomous variable: not depressed = 1, depressed = 2).

Page 57: Point biserial correlation

• If the dichotomous variable is coded with the higher value representing the presence of an attribute (depressed) . . .

Person   Status          Code (1 = not depressed, 2 = depressed)
A        Depressed       2
B        Depressed       2
C        Depressed       2
D        Not Depressed   1
E        Not Depressed   1

Page 62: Point biserial correlation

• . . . and the continuous variable is coded with higher values representing the increasing presence of an attribute (shame),

• then positive values of the point biserial would indicate higher shame associated with depressed status. In this case we would compute a point biserial correlation of about +.99.

Person   Depressed (1 = not depressed, 2 = depressed)   Amount of Shame
A        2                                              10
B        2                                              9
C        2                                              10
D        1                                              2
E        1                                              2
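The +.99 figure can be reproduced from the five rows of the table. The sketch below uses a standard textbook form of the point biserial, r_pb = (M1 - M0) / s * sqrt(p * q). Here M1 and M0 are the group means of the continuous variable, s is the standard deviation of all scores (dividing by n), and p and q are the group proportions. The slides do not show a formula, so the choice of this form, the function name `point_biserial`, and the parameter `high_code` are assumptions.

```python
from math import sqrt

# Data from the table: 1 = not depressed, 2 = depressed.
codes = [2, 2, 2, 1, 1]
shame = [10, 9, 10, 2, 2]

def point_biserial(codes, values, high_code):
    """Point biserial correlation; `high_code` marks the group treated as 'present'."""
    group1 = [v for c, v in zip(codes, values) if c == high_code]
    group0 = [v for c, v in zip(codes, values) if c != high_code]
    n = len(values)
    mean_all = sum(values) / n
    # Standard deviation of all scores, dividing by n (not n - 1).
    sd_all = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    p = len(group1) / n  # proportion in the 'present' group
    q = len(group0) / n  # proportion in the other group
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd_all * sqrt(p * q)

r_pb = point_biserial(codes, shame, high_code=2)
print(f"r_pb = {r_pb:.3f}")
```

With these five rows the result is very close to +1, in line with the value reported on the slide. This form gives the same number as an ordinary Pearson correlation computed on the numeric codes.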

Page 66: Point biserial correlation

• If we switch the codes so that not depressed = 2 and depressed = 1,

• we would have a -.99 correlation.

Person   Depressed (1 = depressed, 2 = not depressed)   Amount of Shame
A        1                                              10
B        1                                              9
C        1                                              10
D        2                                              2
E        2                                              2
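The sign flip can be seen by correlating the same shame scores with each coding in turn. The sketch computes an ordinary Pearson correlation on the numeric codes, which yields the point biserial when one variable is dichotomous. The helper `pearson_r` and the variable names are assumptions for illustration.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

shame = [10, 9, 10, 2, 2]
original_codes = [2, 2, 2, 1, 1]  # 1 = not depressed, 2 = depressed
switched_codes = [1, 1, 1, 2, 2]  # 1 = depressed, 2 = not depressed

r_original = pearson_r(original_codes, shame)
r_switched = pearson_r(switched_codes, shame)

# Same size, opposite sign: only the labeling of the groups changed.
print(r_original > 0, r_switched < 0, isclose(r_original, -r_switched))
```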

Page 67: Point biserial correlation

• Therefore, instead of looking at the numeric codes, we think in terms of whether something is present or not (in this case, the presence or lack of depression) and how that relates to the amount of shame.

Page 68: Point biserial correlation

• The strength of the association can be tested against chance, just as with the Pearson product-moment correlation coefficient.
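The slide does not say which test it has in mind. A common choice for a Pearson correlation is a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)), and the sketch below assumes that test. It reuses the five shame scores from the example. The function names `pearson_r` and `t_from_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic for testing r against zero; compare with a t distribution on n - 2 df."""
    return r * sqrt((n - 2) / (1 - r ** 2))

codes = [2, 2, 2, 1, 1]    # 1 = not depressed, 2 = depressed
shame = [10, 9, 10, 2, 2]

r = pearson_r(codes, shame)
t_stat = t_from_r(r, n=len(shame))
df = len(shame) - 2

print(f"r = {r:.3f}, t = {t_stat:.2f}, df = {df}")
```

The t value is then compared with a critical value from a t table, or converted to a p-value by a statistics package. With only five people, a result like this is an illustration of the arithmetic, not a dependable finding.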