Source: homepage.stat.uiowa.edu/~rdecook/stat1010/notes/Chapter7...
Chapter 7
Scatterplots, Association, and Correlation
Scatterplots & Correlation
Here, we see a positive relationship between a bear’s age and its neck diameter. As a bear gets older, it tends to have a larger neck.
Scatterplots & Correlation
Statistics is about … variation.
Recognize, quantify and try to explain variation.
Variation in neck measurements can be explained, at least in part, by the age of the bear.
Older bear → larger neck
Positive Association
[Figure: Cell phone usage per 100 people (vertical axis) vs. percent of country between 15 and 64 yrs-old (horizontal axis)]
Data from 2008. These variables have a positive correlation: a country with a larger percentage of people between 15 and 64 tends to have more cell phone users.
Negative Association
Outside temperature and amount of natural gas used.
These variables have a negative correlation: days with higher temperatures tend to use less natural gas.
Higher temperature → less gas used
[Figure: scatterplot of Gas (vertical axis) vs. Temp (horizontal axis)]
Scatterplots & Correlation
When the two variables of interest are continuous variables, we can plot their relationship with a scatterplot (or scatter diagram).
A scatterplot gives you a quick look at the general relationship between the variables.
Each observation provides one point on the plot.
[Figure: scatterplot of Gas (vertical axis) vs. Temp (horizontal axis)]
Response variable – plotted on the vertical axis. Also called the dependent variable.
Explanatory variable – plotted on the horizontal axis. Used to try to explain variation in the response variable. Also called the independent variable.
[Figure: scatterplot of Highway MPG (vertical axis) vs. Engine HorsePower (horizontal axis)]
HWY-mpg is the response variable
Engine HPW is the explanatory variable
Here, we use Engine HPW to explain the variability in HWY-mpg.
Correlation and Association
When describing relationships, we use the terms correlation and association interchangeably. If variables are correlated, we say they are associated.
Definition: A correlation exists between two variables when higher values of one variable consistently go with higher values of another variable, or when higher values of one variable consistently go with lower values of another variable.
Positive Association (correlation)
Above-average values of Age are associated with above-average values of Neck Measure (age-high goes with neck-high).
Below-average values of Age are associated with below-average values of Neck Measure (age-low goes with neck-low).
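The "high goes with high" description can be checked numerically by comparing each observation with the two averages. The following is a minimal sketch in Python; the age and neck values and the helper `count_quadrants` are made up for illustration and are not the bear data from the slides.

```python
# Hypothetical (age, neck measure) values; NOT the bear data from the slides.
ages = [10, 20, 30, 40, 50, 60]
necks = [12, 15, 14, 20, 22, 25]


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def count_quadrants(xs, ys):
    """Count points on the same side / opposite sides of the two averages."""
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        # Positive product: both above average or both below average.
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
    return same, opposite


same, opposite = count_quadrants(ages, necks)
print(same, opposite)  # prints "6 0" for these made-up values
```

When most points fall on the same side of both averages, the association is positive; when most fall on opposite sides, it is negative.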
Negative Association (correlation)
Below-average values of Engine HPW are associated with above-average values of HWY-mpg (HPW-low goes with MPG-high).
Above-average values of Engine HPW are associated with below-average values of HWY-mpg (HPW-high goes with MPG-low).
[Figure: scatterplot of Highway MPG (vertical axis) vs. Engine HorsePower (horizontal axis)]
Strength of Association
Correlation applies only to quantitative (continuous) variables.
Correlation measures the strength of linear association.
The correlation coefficient (r) gives the direction of the linear association and quantifies the strength of the linear association between two quantitative variables.
Correlation is a ‘unitless’ quantity (not in ‘feet’ or ‘inches’… no units).
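The slides do not give a formula for r. As a sketch, the standard Pearson formula divides the summed products of deviations from the averages by the square root of the product of the summed squared deviations. The heights and weights below are made-up values, and `pearson_r` is a helper written only for this example. Converting inches to feet and pounds to kilograms leaves r unchanged, which is what "unitless" means here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds); invented for illustration.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 155, 160, 180]

# The same measurements expressed in different units.
height_ft = [h / 12 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]

r_original = pearson_r(height_in, weight_lb)
r_converted = pearson_r(height_ft, weight_kg)
print(round(r_original, 4), round(r_converted, 4))  # the two values agree
```

A perfectly straight increasing pattern such as `pearson_r([1, 2, 3], [2, 4, 6])` gives 1, and a perfectly straight decreasing pattern gives -1.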
Strength of Association
Correlation Coefficient (r) will be between -1 and 1.
r near -1.0: Strong Negative Linear Relationship
r near 0.0: Very Weak or No Linear Relationship
r near 1.0: Strong Positive Linear Relationship
r = ?
[Figure: scatterplot panels, each labeled with its correlation]
r = 1: super strong
r = 0.7: stronger (more clear)
r = 0.3: weak (fuzzy)
r = 0.0: none
r = -0.3: weak (fuzzy)
r = -0.7: stronger (more clear)
r = -1: super strong
Curved pattern: r not meaningful, this is non-linear
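To see why r is not meaningful for a non-linear pattern, consider points that follow a perfect curve. The sketch below uses made-up data (y equals x squared) and a helper `pearson_r` written for this example using the standard Pearson formula, which the slides do not state. The relationship is exact, yet r comes out as 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect U-shaped (curved) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0: no linear association, even though y is fully determined by x
```

The point of the sketch is that r summarizes only the linear part of a relationship, so a scatterplot should be checked for form before r is interpreted.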
Things to look for in a scatterplot
1. Direction of association: positive or negative.
2. Form of association: linear, curved, clustered, scattered (no relationship).
3. Strength of association: how closely the points follow a clear form.
4. Outliers: a point that lies outside of the general pattern.
[Figure: Nicotine Content vs. Tar Content; Nicotine (mg) on the vertical axis, Tar (mg) on the horizontal axis]
Example
Direction _____________
Form _____________
Strength ___________
Outliers? ___________
Association vs. Causation
The existence of an association does not equate to causation.
To imply that a change in one variable causes a change in another is a very strong statement – use ‘association’ for our relationships in this class.
Correlation Cautions
Don’t confuse correlation with causation.
There is a strong positive correlation between shoe size and intelligence.
Beware of lurking variables.
Beware of lurking variables
Lurking variable – a hidden variable that stands behind a relationship and affects the other two variables.
[Figure: fire damage (dollars $) vs. number of firefighters at scene. Lurking variable: size of fire?]
Association vs. Causation
Increasing the size of the fire will cause greater damage.
Increasing the number of firefighters at the fire will not cause greater damage, but we do tend to see more firefighters at larger fires.
Correlation does NOT imply causality.
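The firefighter example can be imitated with a small simulation. This is only a sketch under invented assumptions: the fire sizes, the coefficients 5 and 1000, the noise levels, and the helper `pearson_r` are all hypothetical and chosen for illustration. Firefighter counts and damage are each generated from fire size alone, yet they end up strongly correlated with each other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Simulated fires; every number here is invented for illustration.
fire_size = [rng.uniform(1, 10) for _ in range(200)]  # the lurking variable

# Both quantities depend on fire size plus independent noise.
# Neither one is computed from the other.
firefighters = [5 * s + rng.gauss(0, 3) for s in fire_size]
damage = [1000 * s + rng.gauss(0, 800) for s in fire_size]

r_observed = pearson_r(firefighters, damage)

# Subtract the part of each variable that comes from fire size.
# (This step uses the simulation's known coefficients 5 and 1000.)
firefighters_left = [f - 5 * s for f, s in zip(firefighters, fire_size)]
damage_left = [d - 1000 * s for d, s in zip(damage, fire_size)]
r_adjusted = pearson_r(firefighters_left, damage_left)

print(round(r_observed, 2), round(r_adjusted, 2))
```

In this setup `r_observed` is strongly positive while `r_adjusted` is close to zero, which is the pattern expected when a lurking variable drives both measurements.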