chapter 2 part1-scatterplots
Posted on 05-Dec-2014
INTRODUCTION TO STATISTICS & PROBABILITY
Chapter 2: Looking at Data–Relationships (Part 1)
Dr. Nahid Sultana
Chapter 2: Looking at Data–Relationships
2.1: Scatterplots
2.2: Correlation
2.3: Least-Squares Regression
2.5: Data Analysis for Two-Way Tables
Objectives
Bivariate data
Explanatory and response variables
Scatterplots
Interpreting scatterplots
Outliers
Categorical variables in scatterplots
2.1: Scatterplots
Bivariate data
For each individual studied, we record data on two variables.
We then examine whether there is a relationship between these two variables: do changes in one variable tend to be associated with specific changes in the other variable?
Student ID   Number of Beers   Blood Alcohol Content
1            5                 0.1
2            2                 0.03
3            9                 0.19
4            8                 0.12
5            3                 0.04
6            7                 0.095
7            3                 0.07
8            5                 0.06
9            3                 0.02
10           5                 0.05
11           4                 0.07
12           6                 0.1
13           5                 0.085
14           7                 0.09
15           1                 0.01
16           4                 0.05
Here we have two quantitative variables recorded for each of 16 students:
1. how many beers they drank
2. their resulting blood alcohol content (BAC)
Associations Between Variables
Many interesting examples of the use of statistics involve relationships between pairs of variables.
Two variables measured on the same cases are associated if knowing the value of one of the variables tells you something about the values of the other variable that you would not know without this information.
A response (dependent) variable measures an outcome of a study.
An explanatory (independent) variable may help explain or predict changes in the response variable.
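To make the definition of association concrete, the sketch below uses the beer/BAC table from this section, treating number of beers as the explanatory variable and BAC as the response. It compares the mean BAC of students who drank 4 or fewer beers with those who drank 5 or more; the 4/5 cutoff is an illustrative assumption, not something from the slides.

```python
# Beer/BAC data for the 16 students in the table above:
# (student ID, number of beers, blood alcohol content).
data = [
    (1, 5, 0.10), (2, 2, 0.03), (3, 9, 0.19), (4, 8, 0.12),
    (5, 3, 0.04), (6, 7, 0.095), (7, 3, 0.07), (8, 5, 0.06),
    (9, 3, 0.02), (10, 5, 0.05), (11, 4, 0.07), (12, 6, 0.10),
    (13, 5, 0.085), (14, 7, 0.09), (15, 1, 0.01), (16, 4, 0.05),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Explanatory variable: number of beers. Response variable: BAC.
all_bac = [bac for _, _, bac in data]
# Illustrative (assumed) cutoff between "fewer" and "more" beers.
low_bac = [bac for _, beers, bac in data if beers <= 4]
high_bac = [bac for _, beers, bac in data if beers >= 5]

print(f"Mean BAC, all students:     {mean(all_bac):.3f}")
print(f"Mean BAC, 4 or fewer beers: {mean(low_bac):.3f} (n={len(low_bac)})")
print(f"Mean BAC, 5 or more beers:  {mean(high_bac):.3f} (n={len(high_bac)})")
```

The 9 students who drank 5 or more beers average a BAC of about 0.099, versus about 0.041 for the 7 who drank 4 or fewer. Knowing the beer count therefore tells us something about BAC that the overall mean alone does not, which is exactly what "associated" means.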
Scatterplot
The most useful graph for displaying the relationship between two quantitative variables on the same individuals is a scatterplot.
How to Make a Scatterplot
1. Decide which variable should go on which axis.
2. Typically, the explanatory or independent variable is plotted on the x-axis, and the response or dependent variable is plotted on the y-axis.
3. Label and scale your axes.
4. Plot individual data values.
Scatterplot (Cont…)
Example: Make a scatterplot of the relationship between body weight and backpack weight for a group of hikers.
Body weight (lb)      120  187  109  103  131  165  158  116
Backpack weight (lb)   26   30   26   24   29   35   31   28
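The four steps for making a scatterplot can be sketched in plain Python, without a plotting library, by drawing a character grid for the hiker data. This is a minimal sketch: the function name `text_scatterplot` and the grid size are hypothetical choices, and it assumes each variable has at least two distinct values. Body weight (explanatory) goes on the x-axis and backpack weight (response) on the y-axis, each axis is labeled and scaled from its own minimum to maximum, and each hiker is plotted as `*`.

```python
# Hiker data from the example above (pounds).
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]   # explanatory -> x-axis
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]       # response -> y-axis


def text_scatterplot(xs, ys, x_label, y_label, width=40, height=12):
    """Return a crude character-grid scatterplot as a list of text lines.

    Each axis is scaled independently from its own minimum to its own maximum,
    so the points span the whole grid. Assumes max > min on both axes.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each value to a column/row index; row 0 is the top of the plot.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = height - 1 - round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    lines = [f"{y_label} ({y_min} to {y_max}, bottom to top)"]
    lines += ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    lines.append(f"{x_label} ({x_min} to {x_max}, left to right)")
    return lines


plot = text_scatterplot(body_weight, backpack_weight,
                        "Body weight (lb)", "Backpack weight (lb)")
print("\n".join(plot))
```

In practice a plotting library would do the scaling and labeling, but the mapping from data values to positions is the same idea.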
Interpreting Scatterplots
How to Examine a Scatterplot
After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for form, direction, and strength:
Form: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the “form”
… and for clear deviations from that pattern:
Outliers of the relationship: individual values that fall outside the overall pattern of the relationship
Interpreting Scatterplots (Cont…) (Form)
[Figure: example scatterplots showing a linear relationship, a nonlinear relationship, and no relationship]
Interpreting Scatterplots (Cont…) (Direction)
Positive association: High values of one variable tend to occur together with high values of the other variable.
Negative association: High values of one variable tend to occur together with low values of the other variable.
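One numerical way to check the direction of a roughly linear pattern, not taken from the slides, is the sign of the sum of cross-products of deviations from the means (the sign of the covariance). Pairs where both values are above, or both below, their means contribute positively; opposite-side pairs contribute negatively. The function name `direction` and the small negative-case dataset are hypothetical, for illustration only.

```python
def direction(xs, ys):
    """Classify the linear direction of association by the sign of
    sum((x - mean_x) * (y - mean_y)), i.e. the sign of the covariance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if cross > 0:
        return "positive"
    if cross < 0:
        return "negative"
    return "no linear direction"


# Hiker data from the example in this section (pounds).
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]
print(direction(body_weight, backpack_weight))

# Hypothetical toy data, made up only to show the negative case.
print(direction([1, 2, 3, 4], [8, 6, 5, 2]))
```

This check only captures a linear tendency: a strongly curved pattern (for example, U-shaped) can give a cross-product sum near zero even though the variables are clearly related, which is why the scatterplot itself should always be examined.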
Interpreting Scatterplots (Cont…)
No relationship: X and Y vary independently. Knowing X tells you nothing about Y.
Interpreting Scatterplots (Cont…) (Strength)
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.
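The amount of scatter around a straight-line form can be summarized by the correlation r, the topic of Section 2.2 in this chapter's outline. The sketch below computes r from its standard definition for the hiker data; `correlation` is a hypothetical helper name, and r measures only the strength of a linear relationship.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r: the cross-product sum scaled by the spread of
    each variable. |r| near 1 means little scatter around a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hiker data from the example in this section (pounds).
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]
print(f"r = {correlation(body_weight, backpack_weight):.2f}")
```

For the hikers, r comes out to about 0.79: clearly positive, but with visible scatter around a line, which matches a description of the relationship as moderately strong rather than very strong.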
Interpreting Scatterplots (Cont…) (Outliers)
In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.
Interpreting Scatterplots (Cont…)
There is a moderately strong (strength), positive (direction), linear (form) relationship between body weight and backpack weight: it appears that lighter hikers are carrying lighter backpacks.
There is one possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members.
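The phrase "relatively less weight" can be checked informally by expressing each backpack as a percentage of the hiker's body weight. This is a simple descriptive check under that assumption, not a formal outlier rule from the slides.

```python
# Hiker data from the example in this section (pounds).
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]

# Backpack weight as a percentage of body weight, one value per hiker.
load_pct = [100 * pack / body for body, pack in zip(body_weight, backpack_weight)]

# List hikers from the lightest relative load to the heaviest.
for body, pct in sorted(zip(body_weight, load_pct), key=lambda pair: pair[1]):
    print(f"{body:>4} lb hiker: backpack is {pct:4.1f}% of body weight")
```

The 187-lb hiker's pack is about 16% of body weight, while every other hiker carries roughly 20% to 24%, which supports flagging that hiker as a possible outlier.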
How to scale a scatterplot
Using an inappropriate scale for a scatterplot can give an incorrect impression. Both variables should be given a similar amount of space:
• Plot roughly square.
• Points should occupy all the plot space (no blank space).
[Figure: the same data plotted at four different scales]
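One way to make the points occupy the plot space is to compute each axis's limits from that variable's own range plus a small margin. The helper names `axis_limits` and `fraction_used`, the 5% margin, and the 0 to 200 lb comparison scale are all hypothetical choices for illustration.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data: the data range plus a
    small margin on each side, so points fill the plot without touching the edges."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


def fraction_used(values, low, high):
    """Share of the axis length actually spanned by the data."""
    return (max(values) - min(values)) / (high - low)


# Hiker data from the example in this section (pounds).
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]

# Each axis gets its own limits, computed from its own variable.
print("x-axis limits:", axis_limits(body_weight))
lo_y, hi_y = axis_limits(backpack_weight)
print("y-axis limits:", (lo_y, hi_y))

# Compare with a (hypothetical) y-axis running from 0 to 200 lb.
print(f"y-axis used, fitted limits: {fraction_used(backpack_weight, lo_y, hi_y):.0%}")
print(f"y-axis used, 0 to 200 lb:   {fraction_used(backpack_weight, 0, 200):.1%}")
```

With fitted limits the backpack weights span about 91% of the y-axis; on a 0 to 200 lb scale they would be squeezed into about 5.5% of it, leaving most of the plot blank and hiding the pattern.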
Categorical variables in scatterplots
To add a categorical variable, use a different plot color or symbol for each category.
What may look like a positive linear relationship is in fact a series of negative linear associations. Plotting different habitats in different colors allows us to make that important distinction.
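The habitat example can be imitated with a tiny, made-up dataset (hypothetical values, not the data behind the slide's figure): taken together the points trend upward, but within each category they trend downward. The sketch assigns one plotting symbol per category and compares the overall direction with the within-group directions, using the sign of the cross-product sum as a simple direction check.

```python
# Hypothetical toy data (not the habitat data from the slide): (x, y, group).
points = [
    (1, 3, "A"), (2, 2, "A"), (3, 1, "A"),
    (4, 6, "B"), (5, 5, "B"), (6, 4, "B"),
]

# One plotting symbol per category, as the slide recommends.
SYMBOLS = {"A": "o", "B": "x"}
# What a plotting routine would receive: position plus category symbol.
markers = [(x, y, SYMBOLS[g]) for x, y, g in points]


def cross_product_sum(pairs):
    """Sum of (x - mean_x)(y - mean_y); its sign gives the linear direction."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in pairs)


def label(s):
    return "positive" if s > 0 else "negative" if s < 0 else "none"


overall = cross_product_sum([(x, y) for x, y, _ in points])
print(f"All points together: {label(overall)}")
for group, symbol in SYMBOLS.items():
    members = [(x, y) for x, y, g in points if g == group]
    print(f"Group {group} (symbol '{symbol}'): {label(cross_product_sum(members))}")
```

Ignoring the category suggests a positive association; separating the groups by symbol reveals two negative ones.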
Categorical variables in scatterplots (Cont…)
Comparison of men's and women's racing records over time. Each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.
Relationship between lean body mass and metabolic rate in men and women. Both men and women follow the same positive linear trend, but women show a stronger association.
Categorical explanatory variables
When the explanatory variable is categorical, you cannot make a scatterplot, but you can compare the different categories side by side on the same graph (using boxplots, or the mean ± standard deviation of each category).
Comparison of income (quantitative response variable) for different education levels (five categories).
But be careful in your interpretation: This is NOT a positive association, because education is not quantitative.
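Comparing categories with the mean ± standard deviation amounts to summarizing the quantitative response separately within each category. The sketch uses Python's standard `statistics` module; the category names and values are hypothetical, since the slide's income data are not given in the text.

```python
import statistics

# Hypothetical responses grouped by a categorical explanatory variable
# (made-up values; the slide's income data are not given in the text).
groups = {
    "Category 1": [2, 4, 6],
    "Category 2": [5, 7, 9],
}

summary = {}
for name, values in groups.items():
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    summary[name] = (m, s)
    print(f"{name}: mean {m:.1f}, mean - sd {m - s:.1f}, mean + sd {m + s:.1f}")
```

Each category gets its own center and spread, which can then be drawn side by side. Because the categories have no numerical scale, the order in which they are listed carries no "positive" or "negative" meaning.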