statistics and correlation
![Page 1: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/1.jpg)
Statistics / Correlation research
![Page 2: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/2.jpg)
After a research project has been carried out, what are the results?
For quantitative data, the results are a bunch of numbers.
Now what? What do the numbers look like, and what do they mean?
Statistical analysis allows us to:
Summarize the data
Represent the data in meaningful ways
Determine whether our data is meaningful or not
![Page 3: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/3.jpg)
Many forms of research
Many forms of data
Variety of dependent variables
Data can take 1 of 4 different forms. Four measurement scales:
Nominal, Ordinal, Interval, Ratio
![Page 4: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/4.jpg)
Nominal scale – simplest form of measurement: you give something a name. Qualitative scale of measurement
Assign participants to a category based on a physical or psychological characteristic rather than a numerical score.
E.g., male vs. female; color of eyes; intelligence levels: smart vs. dull.
Data is determined by a strict category.
Only allows for crude comparisons of results.
Can really only be used for qualitative comparisons.
![Page 5: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/5.jpg)
Ordinal scales – a ranking system: data is ranked from highest to lowest.
Show relative rankings but say nothing about the extent of the differences between the rankings.
Does not assume that the intervals between rankings are equal.
E.g., rank the 10 smartest kids.
E.g., college football rankings.
Problem – no absolute magnitude, which makes it difficult to make comparisons.
![Page 6: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/6.jpg)
Interval scales – numeric scores without absolute zero
Not only relative ranks of scores, but also equal distances or degrees between the scores.
Interval = equal intervals plus ordering.
E.g., IQ scores – the difference between 100 and 120 is the same as the difference between 60 and 80.
Problem – no absolute zero.
Cannot have an IQ score of 0.
Does not allow for ratio comparisons. E.g., an IQ of 120 is not twice as smart as an IQ of 60.
![Page 7: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/7.jpg)
Ratio scales - numeric scores but with an absolute zero point
All of the properties of the other scales but with a meaningful zero point.
Allows you to make ratio comparisons i.e., is one twice as much as another?
E.g., number of correct answers on an exam.
E.g., number of friends a person has.
![Page 8: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/8.jpg)
Nominal and Ordinal scales are discrete or categorical.
Interval and Ratio scales are continuous scales.
NOIR: increasing levels of resolution.
Most observable behaviors are measured on a Ratio scale.
Most psychological constructs are measured on an Interval scale.
It is important to recognize what scale of measurement is being used.
Nominal and ordinal data require different statistical analyses than interval or ratio data.
![Page 9: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/9.jpg)
After data collection is finished, the data must be summarized. What does it look like?
Start with exploring the data. Look at individual scores.
Frequency distributions show us the collection of individual scores.
Simple frequency distribution – lists all possible score values and indicates the frequency of each.
Allows us to make sense of the individual scores.
![Page 10: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/10.jpg)
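| Student | Score | Student | Score |
|---|---|---|---|
| Bob | 80 | Axel | 86 |
| Jenny | 70 | Jordan | 96 |
| Joe | 66 | Marissa | 100 |
| George | 78 | Jackson | 86 |
| Lori | 100 | John | 78 |
| Sherri | 88 | Janice | 76 |
| Joey | 68 | Amy | 78 |
| Cedric | 78 | Gene | 50 |
| Jan | 56 | Dorothy | 76 |
| Arthur | 86 | Patrick | 80 |
| Ackbar | 76 | Nicole | 72 |
| Robert | 98 | | |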
![Page 11: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/11.jpg)
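| Score | Frequency |
|---|---|
| 100 | 2 |
| 98 | 1 |
| 96 | 1 |
| 88 | 1 |
| 86 | 3 |
| 80 | 2 |
| 78 | 4 |
| 76 | 3 |
| 72 | 1 |
| 70 | 1 |
| 68 | 1 |
| 66 | 1 |
| 56 | 1 |
| 50 | 1 |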
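A simple frequency distribution like this one can be tallied directly from the raw scores. The following is a minimal Python sketch, assuming the 23 scores from the class list (the lowest being Gene's 50); the variable names are illustrative only.

```python
from collections import Counter

# Exam scores from the class list (23 students, in the order listed).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

# Count how many times each score value occurs.
frequency = Counter(scores)

# Print from the highest score to the lowest, one row per score value.
for score in sorted(frequency, reverse=True):
    print(score, frequency[score])
```

Only score values that actually occur are printed; a fuller table could also list unused values with a frequency of 0.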
![Page 12: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/12.jpg)
Grouped frequency distribution – raw data are combined into equal sized groups

| Grade | Frequency |
|---|---|
| A (90-100) | 4 |
| B (80-89) | 6 |
| C (70-79) | 9 |
| D (60-69) | 2 |
| F (<60) | 2 |
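Grouping can be sketched the same way. This example assumes that each band includes its lower edge (so a score of 80 counts as a B, and 90 as an A); the slide does not say how boundary scores are assigned, so `letter_grade` is a hypothetical helper.

```python
from collections import Counter

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

def letter_grade(score):
    """Map a score to a grade band (band edges are an assumption)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

# Tally how many scores fall into each band.
grouped = Counter(letter_grade(s) for s in scores)
for grade in "ABCDF":
    print(grade, grouped[grade])
```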
![Page 13: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/13.jpg)
Histogram – a frequency distribution in graphical form (a bar graph)
![Page 14: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/14.jpg)
![Page 15: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/15.jpg)
Numeric summaries that condense information
Numbers that are used to make comparisons
Numbers that portray relationships or associations
Two main types of stats:
Descriptive statistics
Inferential statistics
![Page 16: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/16.jpg)
Descriptive statistics – summarize results
Central tendency
Variability
Inferential statistics – used to determine whether relationships or differences between samples are statistically significant
![Page 17: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/17.jpg)
Central tendency – what is the “heart of the data”?
Three measures of central tendency:
Mean – the average. Add up all scores and divide by the total number of scores.
Median – the middle score. Line up all scores and find the middle one.
Mode – the most common score. Which score occurs the most often?
![Page 18: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/18.jpg)
Simply add up all of the scores and divide by the number in the sample.
The statistic for a sample is X bar: $\bar{X} = \frac{\sum X}{n}$
![Page 19: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/19.jpg)
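| Student | Score | Student | Score |
|---|---|---|---|
| Bob | 80 | Axel | 86 |
| Jenny | 70 | Jordan | 96 |
| Joe | 66 | Marissa | 100 |
| George | 78 | Jackson | 86 |
| Lori | 100 | John | 78 |
| Sherri | 88 | Janice | 76 |
| Joey | 68 | Amy | 78 |
| Cedric | 78 | Gene | 50 |
| Jan | 56 | Dorothy | 76 |
| Arthur | 86 | Patrick | 80 |
| Ackbar | 76 | Nicole | 72 |
| Robert | 98 | | |

Total = 1822, n = 23
$\bar{X} = \frac{\sum X}{n} = \frac{1822}{23} = 79.22$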
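The same calculation can be checked in a few lines of Python. This is only a sketch of the arithmetic on the slide, using the standard library's `statistics.mean` as a cross-check.

```python
import statistics

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

total = sum(scores)   # sum of all scores
n = len(scores)       # number of scores in the sample
mean = total / n      # X bar = (sum of X) / n

print(total, n, round(mean, 2))
# The library function gives the same value as the hand calculation.
print(round(statistics.mean(scores), 2))
```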
![Page 20: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/20.jpg)
Pros and cons of using the mean
Pros:
Summarizes data in a way that is easy to understand.
Uses all the data.
Used in many statistical applications.
Cons:
Affected by extreme values.
E.g., if Robert had scored a 0, the mean would drop to about 75.
E.g., average salary at a company:
12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 20,000; 390,000
Mean = $44,167
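A short sketch shows how much a single extreme value moves the mean. It uses the salary list above and the exam scores from the class list; the comparison "without the top salary" is added here for illustration and is not on the slide.

```python
# Salaries from the example: ten at 12,000, one at 20,000, one at 390,000.
salaries = [12_000] * 10 + [20_000, 390_000]

mean_salary = sum(salaries) / len(salaries)
# Illustration only: the mean if the 390,000 salary is left out.
mean_without_top = sum(salaries[:-1]) / (len(salaries) - 1)

print(round(mean_salary), round(mean_without_top))

# Exam scores, with Robert's 98 (the last entry) replaced by 0.
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]
scores_robert_zero = scores[:-1] + [0]
mean_robert_zero = sum(scores_robert_zero) / len(scores_robert_zero)

print(round(mean_robert_zero, 2))
```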
![Page 21: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/21.jpg)
Median – the middle score in the data: half the scores are above it, half of the scores are below it.
Scores are ranked, then find the one in the middle.
50 56 66 68 70 72 76 76 76 78 78 78 78
80 80 86 86 86 88 96 98 100 100
Example – the median is the score 78.
If there is an even number of scores, the median is the average of the two middle scores.
E.g., 10, 10, 9, 9 – the median is 9.5
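The rule for odd and even numbers of scores can be sketched as a small function. The function name is illustrative; Python's `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # rank the scores
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle score
        return ordered[middle]
    # even count: average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

print(median(scores))
print(median([10, 10, 9, 9]))
```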
![Page 22: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/22.jpg)
Pros and cons of using the median
Pros:
Not affected by extreme values.
Always exists.
Easy to compute.
Cons:
Doesn't use all of the data values.
Categories must be properly ordered.
The mean is almost always preferred. Exception: the data is skewed (not distributed symmetrically) or has extreme scores.
![Page 23: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/23.jpg)
[Figure: two histograms, one titled "Positive Skew" and one titled "Negative Skew"; x-axis: Scores (27 to 77), y-axis: Frequency (0 to 12)]
![Page 24: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/24.jpg)
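Mode – the most common score of the data. The mode is 78.

| Score | Frequency |
|---|---|
| 100 | 2 |
| 98 | 1 |
| 96 | 1 |
| 88 | 1 |
| 86 | 3 |
| 80 | 2 |
| 78 | 4 |
| 76 | 3 |
| 72 | 1 |
| 70 | 1 |
| 68 | 1 |
| 66 | 1 |
| 56 | 1 |
| 50 | 1 |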
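Finding the mode amounts to looking up the largest frequency. A minimal sketch, assuming the 23 scores from the class list; it returns a list so that ties between several modes are not hidden.

```python
from collections import Counter

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

frequency = Counter(scores)
highest = max(frequency.values())   # the largest frequency in the table

# Every score that reaches the largest frequency is a mode.
modes = sorted(s for s, count in frequency.items() if count == highest)

print(modes, highest)
```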
![Page 25: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/25.jpg)
Pros and cons of using the mode
Pros:
Fairly easy to compute.
Not affected by extreme values.
Cons:
Sometimes not very descriptive of the data.
Not necessarily unique – two modes = bimodal; more than two modes = polymodal (multimodal).
Doesn't use all values.
![Page 26: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/26.jpg)
Examples: shoe size, height
![Page 27: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/27.jpg)
Variability – how spread out is the data?
Measures of variability:
Range
Variance
Standard deviation – “average variability”
Range – the simplest variability statistic = high score – low score.
Standard deviation – a measure of the variation, or spread, of individual measurements; it indicates how far away from the middle (the mean) the scores are.
![Page 28: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/28.jpg)
The larger the standard deviation, the more spread out the scores are.
The smaller the standard deviation, the closer the scores are to the mean.
![Page 29: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/29.jpg)
Computing SD
1. Subtract the mean from each score. Ex. (100 – 80 = 20)
2. Square that number for each score.
3. Add up the squared numbers. This is the “sum of squares”.
4. Divide the sum of squares by the total number in the sample minus one. This is the variance.
5. Take the square root of that number. This is the standard deviation.
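The procedure above can be sketched in Python on the 23 exam scores from the class list. The variable names are illustrative, and the exact mean (about 79.22) is used rather than the rounded 80 in the slide's example.

```python
import math

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]   # distance of each score from the mean
squared = [d ** 2 for d in deviations]    # square each deviation
sum_of_squares = sum(squared)             # the "sum of squares"
variance = sum_of_squares / (n - 1)       # divide by n minus one
sd = math.sqrt(variance)                  # square root of the variance

print(round(sum_of_squares, 2), round(variance, 2), round(sd, 2))
```

Dividing by n minus one gives the sample variance, which is what the steps describe; `statistics.stdev` in the standard library uses the same divisor.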
![Page 30: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/30.jpg)
Data is usually spread around the mean in both directions.
Some are higher than the mean, some are lower.
The frequency distribution of the scores tells us how the scores land relative to the mean.
Ideally, some scores are higher, some are lower, most are in the middle.
The normal distribution – the bell curve
![Page 31: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/31.jpg)
![Page 32: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/32.jpg)
As sample size increases, the distribution of the data becomes more normalized.
Importance of the normal distribution:
Symmetrical
Mean, median, mode all the same
The further away from the mean, the less likely the score is to occur
Probabilities can be calculated
![Page 33: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/33.jpg)
We can assume that many human traits or behaviors follow the normal distribution.
Some are high in a trait, some are low, but most people are in the middle.
E.g., personality traits, memory ability, musical capabilities
People have a tendency to think categorically, which is erroneous.
![Page 34: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/34.jpg)
All data points are arranged, and a particular data point is compared to the population. E.g., an IQ score of 130.
A percentile reflects the percentage of scores that were below your data point of interest. An IQ score of 130 is at about the 97.7th percentile (2 standard deviations above the mean, assuming a mean of 100 and a standard deviation of 15).
Percentiles are arranged according to standard deviation.
![Page 35: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/35.jpg)
0 SD (the mean) is the 50th percentile

1 SD above the mean is the 84th percentile

2 SDs above the mean is about the 97.7th percentile

3 SDs above the mean is about the 99.9th percentile
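For a normal distribution, these percentiles come from the cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The IQ scaling of mean 100 and SD 15 is an assumption, since the slides do not state it.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Percentage of scores falling below 0, 1, 2, and 3 SDs above the mean.
for z in (0, 1, 2, 3):
    percentile = standard_normal.cdf(z) * 100
    print(f"{z} SD above the mean: {percentile:.1f}")

# Assumed IQ scaling (mean 100, SD 15): where does a score of 130 fall?
iq = NormalDist(mu=100, sigma=15)
iq_130_percentile = iq.cdf(130) * 100
print(f"IQ 130: {iq_130_percentile:.1f}")
```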
![Page 36: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/36.jpg)
Advanced statistics that reveal whether differences are meaningful.
Take into account both central tendency (usually the mean) and variability
Determines the probability that the differences arose due to chance.
If the probability that the observed differences are due to chance is very low, we say that the difference is statistically significant.
Science holds a strict criterion for determining significance.
![Page 37: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/37.jpg)
α = alpha – the probability of committing a Type I error (concluding that a difference is real when it is actually due to chance).
α is normally set at 0.05: only a 5% chance of committing a Type I error.
We can find the probability that the observed differences are due to chance. If that probability is less than 0.05, the results are statistically significant.
Many types of inferential statistics:
t test
Analysis of Variance
![Page 38: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/38.jpg)
Visually representing the data can make it more understandable for you as well as anyone else looking at your results.
Horizontal axis is the X-axis.
Vertical axis is the Y-axis.
The best graph is the one that makes the data clearer.
![Page 39: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/39.jpg)
[Figure: histogram of the exam scores; x-axis: Scores (50 to 100), y-axis: Frequency (0 to 10). Mean = 79.2174, Std. Dev. = 12.75987, N = 23]
![Page 40: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/40.jpg)
Each score is divided into two parts, a stem and a leaf.
The leaf is the last digit of the score.
The stem is the remaining digit(s).
E.g., 49 would have 4 as the stem and 9 as the leaf.
Graphing a stem and leaf is like making a table.
![Page 41: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/41.jpg)
| Stem | Leaf |
|---|---|
| 5 | 06 |
| 6 | 68 |
| 7 | 026668888 |
| 8 | 006668 |
| 9 | 68 |
| 10 | 00 |
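Splitting each score into stem and leaf is a single integer division. A minimal sketch, assuming the 23 scores from the class list (lowest score 50); `divmod(score, 10)` returns the stem and the last digit together.

```python
from collections import defaultdict

# Exam scores from the class list (23 students).
scores = [80, 86, 70, 96, 66, 100, 78, 86, 100, 78, 88, 76,
          68, 78, 78, 50, 56, 76, 86, 80, 76, 72, 98]

stems = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)   # e.g. 49 -> stem 4, leaf 9
    stems[stem].append(str(leaf))

# Join the leaves for each stem into one string, as in the table.
plot = {stem: "".join(leaves) for stem, leaves in stems.items()}

for stem in sorted(plot):
    print(f"{stem:>2} | {plot[stem]}")
```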
![Page 42: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/42.jpg)
Much of the time, a plot of the means is useful.
[Figure: bar graph titled "Test of Men vs. Women"; x-axis: Female, Male; y-axis: Scores (0 to 100)]
![Page 43: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/43.jpg)
Line graphs are especially important for Repeated Measures
[Figure: line graph titled "Latent Inhibition"; x-axis: Sessions (0 to 15), y-axis: Responses per Minute (0 to 60); two lines: CS-Preexposed and Control]
![Page 44: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/44.jpg)
Show the median and distribution of scores.
Also shows outliers – scores that are more than 3 standard deviations from the mean.
[Figure: box plots of FRIENDS (y-axis, -10 to 20) by TRAIT (x-axis: Introvert, Extravert); group sizes N = 40 and 50]
![Page 45: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/45.jpg)
Keys to making figures:
Keep it simple.
Nothing is “required” for making figures.
The purpose is to better illustrate the results.
Don’t “lie” with figures. Axes should be set at an appropriate range.
![Page 46: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/46.jpg)
[Figure: bar graph titled "Test of Men vs. Women" with a truncated y-axis; x-axis: Female, Male; y-axis: Scores (79.4 to 81.2)]
![Page 47: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/47.jpg)
[Figure: bar graph titled "Test of Men vs. Women" with a y-axis starting at zero; x-axis: Female, Male; y-axis: Scores (0 to 100)]
![Page 48: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/48.jpg)
Correlational research investigates the relationships between two variables.
E.g., is there a relationship between poverty levels and crime?
Is attachment level in children related to future behavior?
Is the number of hours husbands spend watching sports associated with wives’ marital satisfaction?
Are basketball players’ heights associated with the number of points scored?
![Page 49: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/49.jpg)
Establishes the relationship between the variables:
Whether it exists
The strength of the relationship
Correlation can be used as a method for conducting research, or as a tool within the research.
![Page 50: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/50.jpg)
Correlation does not mean causation
Ex. Significant correlations between ice cream sales and murder rates, and between ice cream sales and shark attacks.
The number of cavities in elementary school children and vocabulary size have a strong positive correlation.
Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).
![Page 51: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/51.jpg)
Can be causation, but correlational research is not designed to assess that.
Meanings of correlation:
1. Causation: changes in X cause changes in Y.
2. Common response: changes in X and Y are both caused by some unobserved variable.
3. Confounding: other variables, and not X, are causing the changes in Y.
![Page 52: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/52.jpg)
Correlation simply measures relationships.
All methods used to calculate correlation are established so that it can vary between –1 and +1.
The most common method is the Pearson product-moment correlation coefficient, represented by r.
Strength of the correlation: the closer to +1 or –1, the stronger the correlation.
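For reference, the Pearson coefficient can be written using the same notation as the mean. The slides do not give the formula; this is the standard textbook definition for paired scores X and Y:

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
```

The numerator is positive when X and Y tend to fall on the same side of their means and negative when they fall on opposite sides. The denominator rescales the result so that it stays between –1 and +1.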
![Page 53: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/53.jpg)
Positive correlations – as X increases, Y increases.
Ex. Horsepower and speed
The value of the correlation represents the strength of the relationship.
+1 represents a perfect positive relationship.
0.9 is an extremely high correlation; 0.2 isn’t as strong.
Zero correlations – as X increases, we have no idea what happens to Y.
Values around 0
Example: length of hair and test scores
![Page 54: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/54.jpg)
Negative correlations – as X increases, Y decreases.
Ex. Horsepower and miles per gallon
Important: the negative sign simply tells the direction of the relationship, not its strength.
One way to view correlations is graphically.
Scatterplots – graphs that plot pairs of scores: one variable on the X axis, one on the Y axis.
![Page 55: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/55.jpg)
Concurrent Change in the Same Direction
![Page 56: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/56.jpg)
Strong positive correlation: engine size and weight
[Figure: scatterplot; x-axis: Vehicle Weight (lbs.), 0 to 6000; y-axis: Engine Displacement (cu. inches), -100 to 500]
![Page 57: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/57.jpg)
![Page 58: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/58.jpg)
![Page 59: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/59.jpg)
Very weak correlation: model year and weight
[Figure: scatterplot; x-axis: Vehicle Weight (lbs.), 0 to 6000; y-axis: Model Year (modulo 100), 68 to 84]
![Page 60: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/60.jpg)
Strong negative correlation: horsepower and miles per gallon
[Figure: scatterplot; x-axis: Horsepower, 0 to 300; y-axis: Miles per Gallon, 0 to 50]
![Page 61: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/61.jpg)
Negative Correlation
Concurrent Change in Opposite Directions
![Page 62: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/62.jpg)
![Page 63: Statistics And Correlation](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c6fcc44a79590f458b4572/html5/thumbnails/63.jpg)
Scatter plots also allow you to see outliers.
Most correlations are assessing a linear relationship.
Some relationships are more complex. E.g., the Yerkes-Dodson law
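To see why a linear correlation can miss a more complex relationship, the sketch below computes Pearson's r for two made-up data sets: one perfectly linear and one shaped like an inverted U, the form usually associated with the Yerkes-Dodson law. All values are hypothetical and chosen only for illustration, and `pearson_r` is an illustrative helper rather than a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical arousal levels from 0 to 10.
arousal = list(range(11))

# Hypothetical linear relationship: performance rises steadily.
linear = [2 * a + 1 for a in arousal]

# Hypothetical inverted U: performance peaks at a moderate arousal of 5.
inverted_u = [-(a - 5) ** 2 for a in arousal]

r_linear = pearson_r(arousal, linear)
r_curved = pearson_r(arousal, inverted_u)

print(round(r_linear, 3), round(abs(r_curved), 3))
```

In the second data set performance depends entirely on arousal, yet r is zero because the rising and falling halves cancel out. A scatterplot would reveal the pattern immediately.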