correlation - weebly · 2019. 11. 6. · no correlation .75 negative positive correlation r = .88 r...


Post on 27-Aug-2020


Page 1: Correlation - Weebly · 2019. 11. 6. · No Correlation .75 Negative Positive Correlation r = .88 r = -.88 r = .42 r = -.42 r = .08 (non-significant) Correlation ! Prediction •

1/27/15  


Correlation
• The relationship between two variables
• E.g., achievement in college is related to…?
  • Motivation
  • Openness to new experience
  • Conscientiousness
  • IQ
  • Etc.
• Relationships can be causal
  • E.g., higher motivation likely causes higher achievement in college (and elsewhere)
• Although they can also be non-causal (i.e., spurious)

Correlation and Scatterplots
• Correlations are best visualized by using a scatterplot graph
• Two variables are plotted on the x and y axes
  • If there is a relationship between them, a noticeable pattern will emerge
• E.g., a scatterplot showing
  • Y-axis = number of hours playing violent video games per week
  • X-axis = number of school infractions per year

Data

Hours playing video games per week    Number of school infractions per year
 1                                      2
 3                                      1
 3                                      2
 4                                      4
 5                                      3
 5                                      3
 6                                      6
 6                                      5
 6                                      6
 6                                      6
 6                                      7
 …                                      …
 6                                      4
 7                                      7
 7                                      6
 7                                      8
 8                                      6
 8                                     10
 9                                      7
10                                     12
11                                     10
12                                     10
12                                     11
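The pattern in this table can be put into a single number. The sketch below computes Pearson's r by hand for the 22 rows that are visible on the slide; any rows hidden behind the "…" marker are not included, so treat the result as illustrative rather than as the lecturer's figure. The function name `pearson_r` is made up for this example.

```python
import math

# Visible rows from the slide's table: (hours per week, infractions per year).
# Rows elided with "…" on the slide are NOT included (assumption: none are needed).
pairs = [
    (1, 2), (3, 1), (3, 2), (4, 4), (5, 3), (5, 3), (6, 6), (6, 5),
    (6, 6), (6, 6), (6, 7), (6, 4), (7, 7), (7, 6), (7, 8), (8, 6),
    (8, 10), (9, 7), (10, 12), (11, 10), (12, 10), (12, 11),
]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [h for h, _ in pairs]
infractions = [i for _, i in pairs]
r = pearson_r(hours, infractions)
print(f"r = {r:.2f}")  # about 0.90 for these 22 rows: a strong, positive relationship
```

Because r is symmetric, swapping which variable goes on which axis does not change the value.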


Notice a pattern emerging?

Scatterplot Graph
• From the scatterplot we can tell two things about the relationship…

1. Direction of the relationship
  ¤ Positive = as one variable increases, so does the other
  ¤ Negative = as one variable increases, the other decreases


2. Strength of the relationship
  ¤ Strong = dots fit closely together, almost forming a line
  ¤ Weak = dots are scattered about randomly

Scatterplot Graphs
• Strong, positive relationship…
• Strong, negative relationship…


Scatterplot Graphs
• Weak, negative relationship…
• Weak, positive relationship…
• No relationship (random dispersion)…


Correlation
• Mathematically expressed as r or R
• Can range from -1 to +1

[Figure: number line from -1.0 (negative) through 0 to +1.0 (positive). No correlation at 0; weakly correlated at -.3 and +.3; strongly correlated at -.8 and +.8. Example marked on the line: r = +.75]

[Figure: example scatterplots labeled r = .88, r = -.88, r = .42, r = -.42, and r = .08 (non-significant)]

Correlation → Prediction
• If a correlation is perfect (1 or -1), then we can perfectly predict one variable from the other
• Once we use correlations to make predictions, we are conducting regression analyses
• E.g., Y = height, X = weight
  • If someone told us their weight, we could perfectly predict their height, and vice versa

[Figure: scatterplot of Height (y-axis) against Weight (x-axis)]


Regression Line
• To make predictions, we first need to figure out the regression line
  • “Predictive line” that best fits the relationship between the two variables
  • If the correlation is perfect, you simply draw a line over the data points

Regression Line
• We can then use this line to make predictions
• If R = 1, our predictions will always be correct
  ¤ E.g., if weight perfectly predicted height
  ¤ What would be the weight of someone who is 5’ 10”?
    → 180 lbs
  ¤ What would be the height of someone who is 200 lbs.?
    → 6’ 0”

[Figure: regression line with R = 1.0. Y-axis = height, ticks from 5’ 2” to 6’ 2” in 2-inch steps; x-axis = weight, ticks at 120, 150, 180, and 210 lbs]
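With a perfect correlation, two points are enough to pin down the line. The sketch below takes the two answers given on the slides (180 lbs at 5’ 10” and 200 lbs at 6’ 0”) as points on that line and reads predictions off it in both directions. The helper names are invented for illustration, and the line is only as real as the slide's hypothetical example.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


# Points taken from the slides' answers: (weight in lbs, height in inches).
b0, b1 = line_through((180, 70), (200, 72))


def predict_height(weight_lbs):
    """Height in inches predicted from weight."""
    return b0 + b1 * weight_lbs


def predict_weight(height_in):
    """Weight in lbs predicted from height (the same line, read the other way)."""
    return (height_in - b0) / b1


print(round(predict_height(200), 2))  # 72.0 inches, i.e. 6' 0"
print(round(predict_weight(70), 2))   # 180.0 lbs
```

Reading the line in both directions only works this cleanly because R = 1; with an imperfect correlation the two directions give different regression lines.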


Correlation → Prediction
• If a correlation is not perfect (which is always the case in practice), we can still predict one variable from the other
• We just won’t be perfectly accurate
• E.g., if someone told us their weight, we could predict a range of heights that we are 95% confident contains their actual height

[Figure: scatterplot of Height (y-axis) against Weight (x-axis) with a regression line and a 95% C.I. band around it]

Regression Line
• Whenever making predictions, we first need to figure out the “best fitting” regression line
  • I.e., the line that has the minimum average distance from every point of data
  • In ordinary least-squares regression, “distance” means the squared vertical distance between each point and the line
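To make "best fitting" concrete, here is a minimal least-squares sketch. It assumes ordinary least squares (the slides do not name the method) and uses a small made-up data set, not data from the lecture. It fits the line and then checks that nudging the intercept or slope makes the average squared distance worse.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the mean squared vertical distance to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def mean_squared_error(xs, ys, b0, b1):
    """Average squared vertical distance between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b0, b1 = least_squares_line(xs, ys)
best = mean_squared_error(xs, ys, b0, b1)
print(round(b0, 3), round(b1, 3))  # 0.6 0.8

# Any other line has a larger average squared distance.
print(best < mean_squared_error(xs, ys, b0 + 0.5, b1))  # True
print(best < mean_squared_error(xs, ys, b0, b1 - 0.2))  # True
```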


[Figure: scatterplot with the best-fitting line drawn through the data]

Regression
• Once you have a best-fitting line, you can make predictions using the regression equation

  DV = b0 + b1 (IV)

• b0: known as the “intercept” or “constant”
  • What is the predicted value of the DV if the IV = 0?
• b1: known as the “slope” or “regression coefficient”
  • Average change in the DV for each one-unit change in the IV

b0: Constant
• What is the predicted DV if IV = 0?

[Figure: regression line on axes IV = Predictor (x-axis) and DV = Predicted (y-axis, ticks 1 to 5); b0 = 1]


b0: Constant
• What is the predicted DV if IV = 0?

[Figure: regression line on axes IV = Predictor (x-axis) and DV = Predicted (y-axis, ticks 1 to 5); b0 = 2.5]

[Figure: regression line on the same kind of axes; b0 = 0]

[Figure: the three lines compared on one graph; b0 = 0, b0 = 1, and b0 = 2.5]


b1: Slope
• For every unit of change in the IV, how much does the DV also change?

[Figure: regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b1 = (change in y-axis) / (change in x-axis)

b1 = (change in DV) / (change in IV)

b1 = rise / run
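Rise over run can be computed from any two points on the line. The sketch below uses exact fractions so the results read the same way as on the slides. The points are assumptions read off the lecture's later worked examples (DV = 2 at IV = 3 and DV = 3 at IV = 6 for the rising line; DV = 4 at IV = 0 and DV = 3 at IV = 5 for the falling line), and the function name `slope` is just for illustration.

```python
from fractions import Fraction


def slope(p1, p2):
    """b1 = rise / run between two (IV, DV) points with integer coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    run = x2 - x1
    if run == 0:
        raise ValueError("run is zero: the two points share the same IV value")
    rise = y2 - y1
    return Fraction(rise, run)


print(slope((3, 2), (6, 3)))  # 1/3  (DV rises 1 for every 3 units of IV)
print(slope((0, 4), (5, 3)))  # -1/5 (DV falls 1 for every 5 units of IV)
```

The order of the two points does not matter, since flipping them changes the sign of both the rise and the run.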

b1: Slope
• For every unit of change in the IV, how much does the DV also change?

[Figure: rising regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b1 = ? / ?

b1 = +1 / ?


b1: Slope
• For every unit of change in the IV, how much does the DV also change?

[Figure: rising regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b1 = +1 / +3

b1: Slope
• For every unit of change in the IV, how much does the DV also change?

[Figure: falling regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b1 = ? / ?

b1 = -1 / ?


b1: Slope
• For every unit of change in the IV, how much does the DV also change?

[Figure: falling regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b1 = -1 / 5

b0 = ?

b0 = 4

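Constant and Slope
• What are the constant and slope?

[Figure: regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5). Labels on the slide, in extraction order: b1 = 1/3, b1 = 1/6, b1 = -1/3, b0 = 3, ?]

Constant and Slope
• What are the constant and slope?

[Figure: regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b0 = ?   b1 = ?

b0 = 2   b1 = 1/6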


Constant and Slope
• What are the constant and slope?

[Figure: falling regression line on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]

b0 = ?   b1 = ?

b0 = 5   b1 = -1/3

Regression Equation

DV = b0 + b1 (IV)
b0 = 1
b1 = 1/3
DV = 1 + 1/3 (IV)

• What is the DV if the IV is 6?
  DV = 1 + 1/3 (6) = 3
• What is the DV if the IV is 3?
  DV = 1 + 1/3 (3) = 2

[Figure: regression line with b0 = 1 and b1 = 1/3 on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]


Regression Equation

DV = b0 + b1 (IV)
b0 = 1
b1 = 1/3
DV = 1 + 1/3 (IV)

• What is the DV if the IV is 4.65?
  DV = 1 + 1/3 (4.65) = 2.55

Regression Equation

DV = b0 + b1 (IV)
b0 = 4
b1 = -1/5
DV = 4 - 1/5 (IV)

• What is the DV if the IV is 5?
  DV = 4 - 1/5 (5) = 3

[Figure: regression line with b0 = 4 and b1 = -1/5 on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]
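The worked examples above all apply the same one-line rule, DV = b0 + b1 (IV). A small sketch, using exact fractions so that 1/3 does not pick up rounding error; the function name `predict` is illustrative only.

```python
from fractions import Fraction


def predict(b0, b1, iv):
    """Regression equation: predicted DV = b0 + b1 * IV."""
    return b0 + b1 * iv


one_third = Fraction(1, 3)

# Line with b0 = 1, b1 = 1/3
print(predict(1, one_third, 6))                          # 3
print(predict(1, one_third, 3))                          # 2
print(float(predict(1, one_third, Fraction("4.65"))))    # 2.55

# Line with b0 = 4, b1 = -1/5
print(predict(4, Fraction(-1, 5), 5))                    # 3
```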

Regression Research
• Advantages
  • Can be easily used with any kind of data
    • E.g., experiments, survey research, archival studies
  • Can predict one variable from other variables
    • E.g., predicted recidivism rates for prison inmates
• Disadvantages
  • Cannot be used with non-linear relationships
  • Cannot make predictions beyond the data
  • Cannot infer causal relationships
    • Although many news articles and even scientific papers incorrectly discuss regression as causation


Assumptions of Regression
• Relationship between variables must be linear
  • Regression line that best fits the data is straight
• Relationship cannot be curvilinear

[Figure: curved scatterplot with a straight line drawn through it, captioned “Straight line doesn’t fit!”]

Examples of curvilinear relationships
• Yerkes-Dodson Law


Examples of curvilinear relationships
• Practice effects

[Figure: curve of Skill (y-axis) against Time (x-axis)]

Assumptions of Regression
• Variables are not sharply skewed
  • Regression will work with skewed variables, but the sharper the skew, the lower your R

[Figure: two scatterplots compared; R = .47 and R = .39]

[Figure: two scatterplots compared; R = .47 and R = non-sig.]


Assumptions of Regression
• Variables are continuous
  • Regression will work with discrete variables, but the fewer values, the lower your R

[Figure: two scatterplots compared; R = .47 and R = .32* (*Conservatism = discrete)]

[Figure: two scatterplots compared; R = .47 and R = .23* (*Both IVs = discrete)]

Regression Equation

DV = b0 + b1 (IV)
b0 = 4
b1 = -1/5
DV = 4 - 1/5 (IV)

• What is the DV if the IV is 25?
  DV = 4 - 1/5 (25) = 4 - 5 = -1
  Doesn’t make sense!

[Figure: regression line with b0 = 4 and b1 = -1/5 on axes IV = Predictor (x-axis, ticks 1 to 6) and DV = Predicted (y-axis, ticks 1 to 5)]


Potential Problem
• We get into trouble when estimating the unknown
  • Especially when trying to project into the future
• E.g., the U.S. saw a rise in crime following record lows during WW2
• Media projected these trends into the future
• Projections turned out to be completely wrong


Problems with Projection
• Other example: Housing bubble in 2000s
• It’s never a good idea to project past available data
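One way to respect this rule in practice is to refuse predictions outside the range of IV values the line was fitted on. The sketch below is only an illustration: the observed range of 1 to 6 is an assumption taken from the x-axis ticks on the example graphs, and `predict_within_range` is a made-up helper, not part of any library.

```python
def predict_within_range(b0, b1, iv, iv_min, iv_max):
    """Apply DV = b0 + b1 * IV, but only inside the observed IV range."""
    if not (iv_min <= iv <= iv_max):
        raise ValueError(
            f"IV = {iv} is outside the observed range [{iv_min}, {iv_max}]; "
            "the regression line says nothing reliable out there"
        )
    return b0 + b1 * iv


# Line from the slides: DV = 4 - 1/5 (IV). Observed IV range assumed to be 1..6.
print(predict_within_range(4, -0.2, 5, iv_min=1, iv_max=6))  # 3.0

try:
    predict_within_range(4, -0.2, 25, iv_min=1, iv_max=6)
except ValueError as err:
    print("refused:", err)
```

Raising an error instead of returning -1 forces whoever uses the line to notice that they are projecting past the available data.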

Regression ≠ Causation
• Spurious relationship: a correlation between two variables that is created by a third variable
• Being a Christian predicts being overweight (Feinstein, American Heart Association, 2011)
  • In fact, states in the U.S. with the most Christian churches tend to have the highest average BMI
  • This spawned recent obesity prevention efforts by The Christian Post, Christian Leadership Alliance, and others
• Does this really mean being Christian causes people to become fat?
  • What could be some third factors?

Possible Third Factors
• Christians, compared to non-Christians, tend to be… (Gallup, 2003 – 2011)
  • More overweight
  • Lower SES
    • More difficult to afford healthy foods and exercise equipment/gym membership
  • Older
  • More likely to live in the South
    • And thus eat a “Southern diet”

[Image caption: “What would Jesus eat?”]


Possible Third Factors
• Christians, compared to non-Christians, also tend to be… (Gallup, 2003 – 2011)
  • Happier
  • Closer to family
  • More generous
  • Less psychotic
• These just don’t happen to be related to obesity

[Image caption: “No crazies here ☺”]

Spurious Relationships
• Psychologically speaking, almost all variables are related to each other to some extent

[Figure: “Related Variables” diagram linking Christianity, Obesity, Loves Comedies, Extraverted, Flosses Regularly, Dog Owner, SES, Generosity, Happiness, and Age]

• Because of this, you can find significant relationships between almost any two variables
  • Especially if the sample is big enough to make even small relationships significant
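The sample-size point can be checked with the standard t statistic for a correlation, t = r·√(n − 2) / √(1 − r²). The sketch below applies it to the r = .08 from the earlier example at two hypothetical sample sizes (30 and 2,000 are made up for illustration). It compares t against 1.96, which is only an approximate two-tailed .05 cutoff that holds for large samples; small samples need a somewhat larger critical value.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


APPROX_CUTOFF = 1.96  # large-sample approximation for p < .05, two-tailed

for n in (30, 2000):  # hypothetical sample sizes
    t = t_statistic(0.08, n)
    verdict = "significant" if abs(t) > APPROX_CUTOFF else "non-significant"
    print(f"r = .08, n = {n}: t = {t:.2f} ({verdict})")
```

The correlation is identical in both runs; only the sample size changes the verdict, which is why a "significant" result says little about how strong a relationship is.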


Spurious Relationships
• Correlated.org was started to discover some of these odd and senseless correlations
  • Collects random bits of information from its users and runs correlations between all of the variables
• Some of my favorites include…
  • 67% of people who prefer to be the "O" in tic-tac-toe support capital punishment, compared with 40% of people in general
  • 15% of people who dislike mayonnaise are good dancers, compared with 29% of people in general
  • In general, 48% of people can burp at will, but of those who enjoy camping, 67% can burp at will

In the news…
• The news media often turns correlation into causation to make a better story
• E.g., “Shaving less than once a day could increase a man's risk of having a stroke by around 70%” (BBC News, February 7, 2003)
  • What researchers actually found was that men who have strokes tend to have less testosterone
  • They suggested doctors ask men if they shave less than once per day as an indicator of low testosterone

In the news…
• Correlations reported in the news often have a very small impact in the real world
• E.g., researchers did find that getting breast implants tripled women’s risk of committing suicide
  • “A desire for breast augmentation may be a symptom of a far deeper insecurity and low self-esteem, which, in extreme cases, could trigger a suicide attempt” (BBC News, March 7, 2003)


In the news…
• Correlations reported in the news often have a very small impact in the real world
• Actual data:
  • Out of 3,521 women who received breast implants in Sweden from 1965 to 1993, 15 committed suicide
  • In the general Swedish population, you would expect 5 out of 3,500 to commit suicide
  • So, the risk was tripled (5 → 15)
  • But the actual risk of suicide only went from about 0.14% (5 / 3,500) to about 0.43% (15 / 3,521)
• Also, these women differed in many other ways besides breast augmentation (SES, lifestyle, religiosity, etc.)
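The gap between "tripled" and "still rare" is the difference between relative and absolute risk. A short sketch computing both directly from the counts quoted above; the variable and function names are illustrative.

```python
def risk(events, population):
    """Proportion of the group that experienced the event."""
    return events / population


implant_risk = risk(15, 3521)   # women with implants
baseline_risk = risk(5, 3500)   # expected in the general population

relative_risk = implant_risk / baseline_risk
absolute_increase = implant_risk - baseline_risk

print(f"baseline risk:     {baseline_risk:.2%}")       # 0.14%
print(f"risk with implants: {implant_risk:.2%}")       # 0.43%
print(f"relative risk:     {relative_risk:.1f}x")      # 3.0x
print(f"absolute increase: {absolute_increase:.2%}")   # 0.28% (percentage points)
```

A headline built on the relative risk ("tripled") and one built on the absolute increase (under a third of a percentage point) describe the same data.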

In the news…
• Correlations may be misleading depending on how variables are operationalized
• E.g., researchers found uncoordinated children are more likely to become obese adults
  • Those who were obese at age 33 had “57% higher odds of having poor hand control at age seven, were twice as likely to have suffered poor coordination and almost four times as likely to have been clumsy” (British Medical Journal, Osika & Montgomery, 2008)

In the news…
• Correlations may be misleading depending on how variables are operationalized
• E.g., researchers found uncoordinated children are more likely to become obese adults
  • “Coordination scores” of kids were obtained by asking their teachers how clumsy they were
  • Problems with this method?
    • Teachers may have assumed that overweight kids are clumsy
    • Common stereotype of overweight people


In the news…
• Correlations may be misleading depending on how variables are operationalized
• More recently, these data were re-analyzed
  • After controlling for people’s BMI at age 7, there was no correlation between childhood coordination and adult obesity
  • All the researchers had actually found was that childhood obesity predicts adult obesity
  • And that people assume overweight kids are uncoordinated
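"Controlling for" a third variable can be illustrated with the first-order partial correlation formula, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The three correlations below are invented purely to show the mechanics; they are not the values from the re-analysis, and the re-analysis itself may have used a different technique.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing what both share with z."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# HYPOTHETICAL values, chosen only for illustration:
r_clumsy_adult_obesity = 0.30  # teacher-rated clumsiness vs. adult obesity
r_clumsy_child_bmi = 0.50      # teacher-rated clumsiness vs. BMI at age 7
r_child_bmi_adult = 0.60       # BMI at age 7 vs. adult obesity

controlled = partial_correlation(
    r_clumsy_adult_obesity, r_clumsy_child_bmi, r_child_bmi_adult
)
print(f"{controlled:.2f}")  # 0.00: the apparent link disappears once BMI is held constant
```

With these made-up numbers the raw correlation of .30 is exactly what childhood BMI alone would produce (.50 × .60), so nothing is left over for coordination to explain.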