hypothesis tests: two independent samples cal state northridge 320 andrew ainsworth phd
TRANSCRIPT
Hypothesis Tests: Two Independent Samples
Cal State Northridge320
Andrew Ainsworth PhD
2
Major Points
•What are independent samples?
•Distribution of differences between means
•An example •Heterogeneity of Variance•Effect size•Confidence limits
Psy 320 - Cal State Northridge
Psy 320 - Cal State Northridge
3
Independent Samples
•Samples are independent if the participants in each sample are not related in any way
•Samples can be considered independent for 2 basic reasons▫First, samples randomly selected from 2 separate populations
Psy 320 - Cal State Northridge
4
Independent Samples
Ran
dom
S
elec
tionPopulation #1
Sample #1R
ando
m
Sel
ectio
n
Population #2
Sample #2
•Getting Independent Samples▫Method #1
Psy 320 - Cal State Northridge
5
Independent Samples
•Samples can be considered independent for 2 basic reasons▫Second, participants are randomly selected from a single population
▫Subjects are then randomly assigned to one of 2 samples
Psy 320 - Cal State Northridge
6
Independent Samples
Sample #1 Sample #2
Population
Random Selection and Random Assignment
•Getting Independent Samples▫Method #2
Psy 320 - Cal State Northridge
7
Sampling Distributions Again•If we draw all possible pairs of independent samples of size n1 and n2, respectively, from two populations.
•Calculate a mean and in each one.
•Record the difference between these values.
•What have we created?•The Sampling Distribution of the Difference Between Means
1X 2X
Psy 320 - Cal State Northridge
8
Sampling Distribution of the Difference Between Means
•Mean of sampling distribution is 1
•Mean of sampling distribution is 2
•So, the mean of a sampling distribution is 1- 2
Sample 1
Population 1
Sample 2
Sample 1 Sample 2
Sample 1 Sample 2
Sample 1 Sample 2
1
2
3
∞
... ...
Calculate Statistic
12 22X X
13 23X X
11 21X X
1 2X X
Population 2
1X
2X
1 2X X
Psy 320 - Cal State Northridge
9
Sampling Distribution of the Difference Between Means
•Shape▫approximately normal if
both populations are approximately normal▫or
N1 and N2 are reasonably large.
Psy 320 - Cal State Northridge
10
Sampling Distribution of the Difference Between Means•Variability (variance sum rule)
▫The variance of the sum or difference of 2 independent samples is equal to the sum of their variances
1 2
1 2 1 2
1 2 1
1 2 1
2 22 2 2
1 2
2 2
1 2 1
2 2 2
; Remembering that
X X
X X X X
X X X
X X X
n n
n n n
Think Pathagoras c a b
Psy 320 - Cal State Northridge
11
Sampling Distribution of the Difference Between Means
•Sampling distribution of
1 - 2
2 21 2
1 2n n
1 2X X
Psy 320 - Cal State Northridge
12
Converting Mean Differences into Z-Scores•In any normal distribution with a known mean and standard deviation, we can calculate z-scores to estimate probabilities.
•What if we don’t know the ’s? •You got it, we guess with s!
1 2
1 2 1 2( ) ( )
X X
X Xz
Psy 320 - Cal State Northridge
13
Converting Mean Differences into t-scores
•We can use to estimate
•If,▫The variances of the samples are roughly
equal (homogeneity of variance)▫We estimate the common variance by
pooling (averaging)
1 2
1 2 1 2( ) ( )
X X
X Xt
s
1 2X Xs 1 2X X
Psy 320 - Cal State Northridge
14
Homogeneity of Variance
•We assume that the variances are equal because:▫If they’re from the same population they
should be equal (approximately)▫If they’re from different populations they
need to be similar enough for us to justify comparing them (“apples to oranges” and all that)
Psy 320 - Cal State Northridge
15
Pooling Variances•So far, we have taken 1 sample estimate
(e.g. mean, SD, variance) and used it as our best estimate of the corresponding population parameter
•Now we have 2 samples, the average of the 2 sample statistics is a better estimate of the parameter than either alone
Psy 320 - Cal State Northridge
16
Pooling Variances
•Also, if there are 2 groups and one groups is larger than it should “give” more to the average estimate (because it is a better estimate itself)
•Assuming equal variances and pooling allows us to use an estimate of the population that is smaller than if we did not assume they were equal
Psy 320 - Cal State Northridge
17
Calculating Pooled Variance
•Remembering that
sp2 is the pooled estimate: we pool SS and df to get
2
1
SS SSs
df n
2 1 2 1 2 1 2
1 2 1 2 1 2( 1) ( 1) 2pooled
ppooled
SS SS SS SS SS SS SSs
df df df n n n n
Psy 320 - Cal State Northridge
18
Calculating Pooled Variance
2 1 2
1 2 2p
SS SSs
n n
21 1 1
22 2 2
Method 1
( )
( )
i
i
SS X X
SS X X
21 1 1
22 2 2
Method 2
( 1)
( 1)
SS n s
SS n s
Psy 320 - Cal State Northridge
19
Calculating Pooled Variance•When sample sizes are equal (n1 = n2) it is
simply the average of the 2
•When unequal n we need a weighted average
1 2
2 22
2X X
p
s ss
2 22 1 1 2 2
1 2
( 1) ( 1)
2p
n s n ss
n n
Psy 320 - Cal State Northridge
20
Calculating SE for the Differences Between Means
•Once we calculate the pooled variance we just substitute it in for the sigmas earlier
•If n1 = n2 then
1 2
2 2
1 2
p p
X X
s ss
n n
1 2
2 2 2 21 2
1 2 1 2
p p
X X
s s s ss
n n n n
Psy 320 - Cal State Northridge
21
Example: Violent Videos Games•Does playing violent video games increase
aggressive behavior•Two independent randomly
selected/assigned groups▫Grand Theft Auto: San Andreas (violent: 8
subjects) VS. NBA 2K7 (non-violent: 10 subjects)
▫We want to compare mean number of aggressive behaviors following game play
Psy 320 - Cal State Northridge
22
Hypotheses (Steps 1 and 2)•2-tailed
▫h0: 1 - 2 = 0 or Type of video game has no impact on aggressive behaviors
▫h1: 1 - 2 0 or Type of video game has an impact on aggressive behaviors
•1-tailed▫h0: 1 - 2 ≤ 0 or GTA leads to the same or
less aggressive behaviors as NBA▫h1: 1 - 2 > 0 or GTA leads to more
aggressive behaviors than NBA
Psy 320 - Cal State Northridge
23
Distribution, Test and Alpha(Steps 3 and 4)•We don’t know for either group, so we
cannot perform a Z-test•We have to estimate with s so it is a t-
test (t distribution)•There are 2 independent groups•So, an independent samples t-test•Assume alpha = .05
Psy 320 - Cal State Northridge
24
Decision Rule (Step 5)•dftotal = (n1 – 1) + (n2 – 1) = n1 + n2 – 2
▫Group 1 has 8 subjects – df = 8 – 1= 7▫Group 2 has 10 subjects – df = 10 – 1 = 9
•dftotal = (n1 – 1) + (n2 – 1) = 7 + 9 = 16 OR dftotal = n1 + n2 – 2 = 8 + 10 – 2 = 16
•1-tailed t.05(16) = ____; If to > ____ reject 1-tailed
•2-tailed t.05(16) = ____; If to > ____ reject 2-tailed
t Distribution
df1-tailed 0.1 0.05 0.025 0.01 0.005 0.00052-tailed 0.2 0.1 0.05 0.02 0.01 0.001
1 3.078 6.314 12.706 31.821 63.657 636.6192 1.886 2.92 4.303 6.965 9.925 31.5983 1.683 2.353 3.182 4.5415 5.841 12.9414 1.533 2.132 2.776 3.747 4.604 8.615 1.476 2.015 2.571 3.365 4.032 6.8596 1.44 1.943 2.447 3.143 3.707 5.9597 1.415 1.895 2.365 2.998 3.499 5.4058 1.397 1.86 2.306 2.896 3.355 5.0419 1.383 1.833 2.262 2.821 3.25 4.78110 1.372 1.812 2.228 2.764 3.169 4.58711 1.363 1.796 2.201 2.718 3.106 4.43712 1.356 1.782 2.179 2.681 3.055 4.31813 1.35 1.771 2.16 2.65 3.012 4.22114 1.345 1.761 2.145 2.624 2.977 4.1415 1.341 1.753 2.131 2.602 2.947 4.07316 1.337 1.746 2.12 2.583 2.921 4.01517 1.333 1.74 2.11 2.567 2.898 3.96518 1.33 1.734 2.101 2.552 2.878 3.92219 1.328 1.729 2.093 2.539 2.861 4.88320 1.325 1.725 2.086 2.528 2.845 3.8521 1.323 1.721 2.08 2.518 2.831 3.81922 1.321 1.717 2.074 2.508 2.819 3.79223 1.319 1.714 2.069 2.5 2.807 3.76724 1.318 1.711 2.064 2.492 2.797 3.74525 1.316 1.708 2.06 2.485 2.787 3.725
Critical Values of Student's t 25
Psy 320 - Cal State Northridge
26
Calculate to (Step 6)Subject GTA NBA
1 12 92 9 93 11 84 13 65 8 116 10 77 9 88 10 119 8
10 7
Mean 10.25 8.4s 1.669 1.647
s2 2.786 2.713
Data
Psy 320 - Cal State Northridge
27
Calculate to (Step 6)
•We have means of 10.25 (GTA) and 8.4 (NBA), but both of these are sample means.
•We want to test differences between sample means.▫Not between a sample and a population
mean
Psy 320 - Cal State Northridge
28
Calculate to (Step 6)
•The t for 2 groups is:
•But under the null 1 - 2 = 0, so:1 2
1 2 1 2( ) ( )
X X
X Xt
s
1 2
1 2( )
X X
X Xt
s
Psy 320 - Cal State Northridge
29
Calculate to (Step 6)
•We need to calculate and to do that we need to first calculate
2 22 1 1 2 2
1 2
( 1) 1 __(____) __ ____
2 __ __ 2
_____ _____ _________
__ __
p
n s n ss
n n
1 2X Xs 2ps
Psy 320 - Cal State Northridge
30
Calculate to (Step 6)•Since is _____ we can insert this into
the equation
1 2
1 2
2 2
1 2
1 2
____ ____
__ __
___ ___ ____
,
___ ___ ______
___ ___
p pX X
o
X X
s ss
n n
Therefore
X Xt
s
1 2X Xs
2ps
Psy 320 - Cal State Northridge
31
Make your decision(Step 7)•Since ____ > ____ we would reject the null
hypothesis under a 2-tailed test•Since ____ > ____ we would reject the null
hypothesis under a 1-tailed test•There is evidence that violent video
games increase aggressive behaviors
Psy 320 - Cal State Northridge
32
Heterogeneous Variances•Refers to case of unequal population
variances•We don’t pool the sample variances just
add them•We adjust df and look t up in tables for
adjusted df•For adjustment use Minimum
df = smaller n - 1
Psy 320 - Cal State Northridge
33
Effect Size for Two Groups•Extension of what we already know.•We can often simply express the effect as
the difference between means (e.g. 1.85)•We can scale the difference by the size of
the standard deviation.▫Gives d (aka Cohen’s d)▫Which standard deviation?
Psy 320 - Cal State Northridge
34
Effect Size•Use either standard deviation or their
pooled average.▫We will pool because neither has any claim
to priority.▫Pooled st. dev. = 2.745 1.657
Psy 320 - Cal State Northridge
35
Effect Size, cont.10.25 8.4
1.657
1.851.12
1.657
viol nonviol
pooled
X Xd
s
•This difference is approximately 1.12 standard deviations
•This is a very large effect
Psy 320 - Cal State Northridge
36
Confidence Limits
1 21 2.95 .05,2 *
(____ ____) ____ ____
____ ____
____ ____
tailed X XCI X X t s
Psy 320 - Cal State Northridge
37
Confidence Limits•p = .95 that based on the sample values the
interval formed in this way includes the true value of 1 - 2
•The probability that interval includes 1 - 2
given the value of•Does 0 fall in the interval? What does that
mean?
1 2X X
Psy 320 - Cal State Northridge
38
Computer Example
•SPSS reproduces the t value and the confidence limits.
•All other statistics are the same as well.•Note different df depending on
homogeneity of variance. •Remember: Don’t pool if heterogeneity
and modify degrees of freedom
Psy 320 - Cal State Northridge
39
SPSS outputGro u p Sta tis tic s
8 1 0 .2 5 0 0 1 .6 6 9 0 5 .5 9 0 1 0
1 0 8 .4 0 0 0 1 .6 4 6 5 5 .5 2 0 6 8
GAME1 .0 0 g ta
2 .0 0 n b a
AGGRESSN Me a n Std . De v i a t i o n
Std . Erro rMe a n
Ind e p e n de nt Sa mple s Te s t
.0 0 5 .9 4 2 2 .3 5 5 1 6 .0 3 2 1 .8 5 0 0 .7 8 5 7 1 .1 8 4 3 6 3 .5 1 5 6 4
2 .3 5 1 1 5 .0 4 8 .0 3 3 1 .8 5 0 0 .7 8 6 9 7 .1 7 3 0 8 3 .5 2 6 9 2
Eq u a l v a ria n c e sa s s u me d
Eq u a l v a ria n c e sn o t a s s u me d
AGGRESSF Sig .
L e v e n e 's T e s t fo rEq u a lity o f Va ria n c e s
t d f Sig . (2 -ta ile d )Me a n
Diffe re n c eStd . Erro r
Diffe re n c e L o we r Up p e r
9 5 % Co n fid e n c eIn te rv a l o f th eDiffe re n c e
t-te s t fo r Eq u a lity o f Me a n s