basic quantitative methods in the social sciences (aka intro stats)


Upload: lam

Post on 04-Jan-2016



Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

02-250-01

Lecture 5


Sampling Distributions

• Inferential Statistics generalizes findings obtained from samples to the populations that the samples were drawn from

• Samples need to be representative of the populations they are drawn from, so we use random sampling


Random Sample

• Random Sample: a sample in which each member of the population has an equal chance of being included

• We cannot assume that a random sample is exactly representative of its population
  E.g., if we randomly choose 50 students from this class, their mean age may not be exactly the mean age of the entire class (the population, approx. 230 students)


Random Sampling

• Random sampling makes all the samples that could be drawn from the population equally likely (e.g., which students are included in the 50-student sample)

• Each of the possible samples of 50 students would have a mean age that differs slightly from the population mean age

• We measure this difference with sampling error


Sampling Error

• Sampling Error: the difference between a statistic and the parameter it estimates
  E.g., if the population mean age was 24 and the sample mean age was 21, we say we have a sampling error of 3 years (the sample mean falls 3 years below the population mean)


Sampling Error

• Because we usually don't collect data for an entire population, we must have some way of estimating the size of the sampling error and accounting for it when we generalize sample information to populations

• We often obtain more samples to determine the sampling error


Sampling Distributions

• If we draw 6 samples of 50 students from this class, we can obtain a better estimate of the true population mean age than if we only drew one sample

• Suppose the mean ages for those 6 samples were as follows:
  25, 23, 23, 25, 25, 26
  The mean of these 6 mean ages is 24.5
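The arithmetic on this slide can be checked with a short Python sketch that uses only the standard library's `statistics` module. The six mean ages come from the slide; the standard deviation of those means is an extra illustration (not on the slide) of how tightly the sample means cluster.

```python
from statistics import mean, stdev

# Mean ages (in years) of the six samples of 50 students, from the slide
sample_means = [25, 23, 23, 25, 25, 26]

# The mean of the sample means is our estimate of the population mean age
estimate = mean(sample_means)
print(estimate)  # 24.5

# Extra illustration: how spread out the six sample means are
spread = stdev(sample_means)
print(round(spread, 2))  # 1.22
```

A small spread like this is what the later "Sampling Error" slide means by the mean ages being "relatively close to each other".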


Sampling Distributions

• Looking at the mean age of the first sample, 25 years: if we only had data for this one sample, 25 years would be our best estimate of the true population mean

• By taking more than one sample, we can calculate a more accurate estimate of the population mean, 24.5 years


Sampling Error

• Since all of the mean ages are relatively close to each other, we can say with greater certainty that the sampling error for any one of the sample means is small

• If the samples' mean ages were much more dissimilar, any one of the sample mean ages would probably have a much larger sampling error


Sampling Error

• This means that the variability of a statistic over repeated samplings gives us some indication of sampling error

• If we continued to draw samples from the population until all possible samples had been drawn, and entered the statistic of interest (mean age) for each sample into a frequency distribution, the result would be known as a sampling distribution


Sampling Distributions

• Sampling Distribution: the distribution of a statistic over repeated sampling from a specified population

• Using our previous example, the sampling distribution of the mean for this class is the distribution of the means of every possible sample of 50 students
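Listing every possible sample of 50 from a class of 230 is not feasible, but the idea can be approximated by drawing many random samples. The sketch below assumes a *hypothetical* class whose 230 ages are generated at random (nothing here is real class data). It shows the sample means clustering around the population mean and computes one sampling error as statistic minus parameter.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is reproducible

# HYPOTHETICAL class of 230 students: these ages are invented for illustration
population = [random.randint(18, 35) for _ in range(230)]
mu = fmean(population)  # the parameter (population mean age)

# We cannot list every possible sample of 50 (there are far too many),
# so approximate the sampling distribution with many random samples
sample_means = [fmean(random.sample(population, 50)) for _ in range(10_000)]

# Sampling error of one sample: statistic minus parameter
first_error = sample_means[0] - mu

print(f"population mean: {mu:.2f}")
print(f"mean of the sample means: {fmean(sample_means):.2f}")
print(f"sampling error of the first sample: {first_error:.2f}")
```

Because this is only an approximation, the mean of the sample means lands very close to, rather than exactly on, the population mean.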


Expected Value

• The mean of a sampling distribution of $\bar{X}$ is known as the expected value of the mean: the mean of the sample means

• We use the symbol $\mu_{\bar{X}}$ instead of $\bar{X}$ for the mean of a sampling distribution because the sampling distribution is a population of terms (every possible sample mean)


Standard Error

• The standard deviation of a sampling distribution is known as the standard error ($\sigma_{\bar{X}}$): the typical amount of difference between $\bar{X}$ and $\mu$ that is reasonable to expect simply by chance

• The mean of any sample we take can be plotted on the sampling distribution of $\bar{X}$ if we know $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$

• The sampling distribution of $\bar{X}$ is a normal distribution (exactly so when the population is normal, and approximately so for large samples, by the central limit theorem)


Sampling Distribution

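[Figure: the sampling distribution of $\bar{X}$, centred on $\mu_{\bar{X}} = \mu$; a sample mean obtained from one sample lies some distance from the centre, and that distance is labelled the sampling error]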


Standard Error

• The formula for standard error is as follows:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
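The formula can be sanity-checked with a small simulation. This is a sketch under stated assumptions: the population is normal with hypothetical values $\mu = 100$ and $\sigma = 20$, and samples of $n = 25$ are drawn independently (the formula assumes independent draws, e.g. a population much larger than the sample).

```python
import math
import random
from statistics import fmean, pstdev

random.seed(2)  # reproducible run

# HYPOTHETICAL population parameters (not from the slides) and sample size
mu, sigma, n = 100, 20, 25

# Standard error from the formula: sigma / sqrt(n)
se_formula = sigma / math.sqrt(n)

# Check by simulation: draw 20,000 independent samples of size n from a
# normal population and take the standard deviation of their means
means = [fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(20_000)]
se_simulated = pstdev(means)

print(se_formula)              # 4.0
print(round(se_simulated, 2))  # close to 4.0
```

Quadrupling n would halve the standard error, which is why larger samples give sample means that sit closer to $\mu$.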


Sampling Distributions

• We usually don't know $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$, and must estimate $\sigma_{\bar{X}}$

• Sampling distributions are the basis for many statistical tests (e.g., the t-test, which we'll talk about later)

• Statistical tests are a mathematical way of testing a hypothesis


Hypothesis Testing

• Hypothesis testing is a way of examining a statement about a relationship between independent and dependent variables:

• Independent variable: the variable whose effects the experimenter is interested in studying

• Dependent variable: the variable that the experimenter measures (the data)


Independent and Dependent Variables - Example

• If an experimenter is interested in researching how hours of studying for an exam affect performance on a test, the variables are as follows:
  Independent Variable (IV): hours spent studying
  Dependent Variable (DV): performance on test (e.g., grade received)


Independent Variables

• There are 2 broad types of IVs:
  Treatment Variable: a treatment the experimenter applies to previously undifferentiated participants
    • E.g., certain participants are told to study for 5 hours and others are told to study for 2 hours
  Categorical Variable: a characteristic that is inherent to, or pre-exists in, the participant
    • E.g., gender: you can't assign someone a gender


Levels of IV

• We also talk about the levels of IVs: how we break down the IV
  • E.g., if we are interested in studying the IV of hours spent studying, it could have 2 levels: 2 hours and 5 hours

• The IV of gender has 2 levels: male and female

• The levels of an IV are compared on their DV scores to look for a difference in outcome: is there a difference in test performance between those who study for 5 hours and those who study for 2?


Time to Think

• A nursing researcher wants to know if giving TLC prolongs life in cancer patients. 50 cancer patients are divided into two groups: group A (n=25) is given TLC by their nurses, and group B (n=25) is not. What are the DV, IV, and levels of the IV?

• A researcher wants to know if members of the Federal Liberal Party are wealthier than members of the Federal NDP. 100 members of each party are asked to submit financial statements. What are the DV, IV, and levels of the IV?


Null Hypothesis

• Tests of hypotheses in science are decisions to retain or reject a null hypothesis ($H_0$)

• Null hypothesis ($H_0$): a statement of relationship between the IV and DV, usually a statement of no difference or no relationship; we assume there is no relationship between IV and DV


Null Hypothesis Examples

• Men and women do not differ in IQ ($\mu_{\text{men}} = \mu_{\text{women}}$)

• Hours spent studying do not affect test performance ($\mu_{\text{2 hours}} = \mu_{\text{5 hours}}$)

• Height does not affect weight ($\mu_{\text{short}} = \mu_{\text{tall}}$)


Null Hypotheses

• Null hypotheses contain 3 components:
  The IV comparison being made
  The DV being measured
  The null relationship between IV and DV (e.g., "do not differ")


Alternative Hypothesis

• Although not directly tested, the Alternative Hypothesis ($H_a$) does state a relationship, or effect, of the IV on the DV; this is often called the Research Hypothesis

• E.g., $H_a$: Men and women do differ in IQ ($\mu_{\text{men}} \neq \mu_{\text{women}}$)
  $H_a$: Women have higher IQs than men ($\mu_{\text{women}} > \mu_{\text{men}}$)


Directional $H_a$

• $H_a$: Women have higher IQs than men ($\mu_{\text{women}} > \mu_{\text{men}}$) is a directional alternative hypothesis: we state that one level of the IV will have greater (or lesser) DV scores than the other level

• When we make a directional alternative hypothesis, we have a reason (based on either past research or a theory) to predict the direction of the results (i.e., that a statistic at one level of the IV will be greater or less than the statistic at the other level of the IV) (note: the above example is hypothetical only)


Non-Directional $H_a$

• A non-directional alternative hypothesis does not state the expected direction of effect:
  $H_a$: Men and women have differing IQs ($\mu_{\text{women}} \neq \mu_{\text{men}}$)

• We make a non-directional alternative hypothesis when we have no reason to predict the direction of the results. For instance, since there is no theory or body of research suggesting that women should have higher IQs than men, we would only predict that their IQs differ from men's


Hypothesis Testing

• Hypothesis testing looks at the observed difference in DV scores between the levels of the IV and compares this difference to the expected difference ($H_0$)

• Any difference in value of the DV between the levels of the IV can be explained in 2 ways: the effect of the IV or sampling error


Hypothesis Testing

• Testing the null hypothesis is a way of determining the probability that the observed outcome could be found if the null hypothesis were true
  E.g., if we did find a difference between the IQs of men and women, what is the chance we would find this result if there is actually no difference between their IQs?


Confidence Levels

• When this probability drops below a certain level (a criterion level), we call the result significant

• This criterion level is known as the confidence level of the test, or alpha ($\alpha$); it is more commonly called the significance level


Confidence Level

• Confidence Level: a criterion level of probability (alpha, $\alpha$), set by the experimenter, which acts as the reference for deciding whether to reject or retain the null hypothesis

• Significant Result at .05: we decide the null hypothesis is not true; if it actually were true, a result this extreme would occur by chance less than 5% of the time, so we accept a 5% risk of rejecting a true null hypothesis


Confidence Level

• The confidence level is set by the experimenter, but generally the convention is to use $\alpha = 0.05$ or $\alpha = 0.01$

• For $\alpha = 0.05$, this means that there is a 5% chance we will reject the null hypothesis when it is actually true


Rejecting the Null Hypothesis

• If the likelihood of observing this outcome is below the confidence level ($\alpha = 0.05$ or $\alpha = 0.01$), then we say that the result is significant and we reject the null hypothesis

• Significant results reject $H_0$ (there is a difference)

• Non-significant results retain $H_0$ (we have no evidence of a difference)


Type I and Type II Errors

• When we decide to retain or reject the null hypothesis, we never do so with 100% certainty that we are making the right decision; we make the decision knowing the probability of being wrong if $H_0$ is true (the alpha level)

• We can make an incorrect decision, resulting in 2 types of errors: Type I or Type II


Type I Errors

• Type I Error: rejection of the null hypothesis when it is true

• We conclude that the IV affects or is related to the DV when in reality the result was due to sampling error

• We see something that is not really there


Type I Error Example

• If our null hypothesis is that men and women do not differ in IQ, the Type I error is:
  Finding a result that men and women do differ in IQ, when in reality they do not

• We find this difference because of sampling error


Type II Errors

• Type II Error: retention of the null hypothesis when it is false

• We conclude that the IV does not affect or is not related to the DV when in reality there is an effect or relationship

• We fail to see something that is really there


Type II Error Example

• If our null hypothesis is that men and women do not differ in IQ, the Type II error is:
  Finding a result that men and women do not differ in IQ, when in reality they do


Type I and Type II Errors


Type I and Type II Errors

• The probability of making a Type I error is equal to the confidence level of the statistical test ($\alpha = 0.05$ or $\alpha = 0.01$)

• When you lower the probability of making a Type I error (e.g., use $\alpha = 0.01$ instead of $\alpha = 0.05$), you increase the probability of making a Type II error
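One way to see that the Type I error rate equals alpha is to simulate many studies in which $H_0$ is actually true and count how often a test rejects it. This sketch assumes a two-tailed z-test on a hypothetical population with $\mu = 100$, $\sigma = 20$ and $n = 25$, and draws each sample mean directly from its normal sampling distribution. The function name `type_i_rate` is illustrative only.

```python
import math
import random
from statistics import NormalDist

random.seed(3)  # reproducible run

# HYPOTHETICAL population in which H0 is TRUE (mu really is 100)
mu, sigma, n = 100, 20, 25
se = sigma / math.sqrt(n)  # standard error of the mean

def type_i_rate(alpha, reps=50_000):
    """Share of true-H0 samples that a two-tailed z-test wrongly rejects."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    rejections = 0
    for _ in range(reps):
        # Draw a sample mean straight from its (normal) sampling distribution
        x_bar = random.gauss(mu, se)
        z = (x_bar - mu) / se
        if abs(z) > z_crit:
            rejections += 1  # every rejection here is a Type I error
    return rejections / reps

rate_05 = type_i_rate(0.05)
rate_01 = type_i_rate(0.01)
print(rate_05)  # close to 0.05
print(rate_01)  # close to 0.01
```

Lowering alpha from .05 to .01 cuts the simulated Type I error rate by about a factor of five; the matching rise in Type II errors would only show up in a simulation where $H_0$ is false.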


Forget About It!

• For this class, you do not need to know how to determine the numerical value of a Type II error, nor do you need to understand power

• You do need to understand what a Type II error is


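Consider a Sampling Distribution of Arts Students' GPAs.

[Figure: sampling distribution centred on $\mu_{\bar{X}} = \mu = 6$, with one sample mean at 10; the distance between them is labelled sampling error]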


What might this mean?

• This sample's mean (10) appears to be substantially larger than the population mean (6). Why might this be?
  Perhaps there is something distinct about this sample such that it is not really part of this sampling distribution to begin with (e.g., maybe these are gifted arts students)
  Alternatively, perhaps it's just a fluke, and we just happened to sample a bunch of good arts students. Stated differently, perhaps this sample mean is part of the sampling distribution of arts students


Reminder

• We can determine the proportion of scores (in this situation, sample means) that would fall to the right of the sample mean in question by looking at a normal distribution table (Table E.10).

• To do so, we need to know the z value of this sample mean. We will come back to this (but for the sake of clarity, note that we will be learning to calculate a z-test, which uses a slightly different formula than the z-score formula that you know)


One vs. Two Tailed Tests

• The "tails" of a test set up our rejection region; they determine how we decide to retain or reject $H_0$

• When we use a one-tailed test, we are testing the null hypothesis for a directional alternative hypothesis (e.g., $H_a$: women will have higher IQs than men)

• We are only interested in whether or not women have higher IQs than men, not lower


Two-Tailed Tests

• When we use a two-tailed test, we are testing the null hypothesis for a non-directional alternative hypothesis (e.g., $H_a$: women and men will have different IQs)

• Here, we are interested in whether or not women have higher or lower IQs than men


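One vs. Two Tailed Tests (using $\alpha = 0.05$)

[Figure: in a two-tailed test the rejection region is split into 2.5% in each tail; in a one-tailed test the whole 5% lies in one tail]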
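The cut-off z values behind this picture can be computed from the standard normal distribution instead of being read from a table. A minimal sketch, assuming Python 3.8+ (where `statistics.NormalDist` is available):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: all 5% of the rejection region sits in one tail
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)
# Two-tailed: 2.5% of the rejection region sits in each tail
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_cutoff, 3))  # 1.645
print(round(two_tailed_cutoff, 2))  # 1.96
```

Splitting alpha across two tails pushes each cut-off further out, so a two-tailed test needs a larger |z| in a given direction to reach significance.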


Two-Tailed Tests

• Once we begin discussing t-tests, you will see that the value that determines whether or not our observed statistic falls beyond the $\alpha = 0.05$ cut-off depends on a number of factors

• For now, know that we reject $H_0$ if our observed statistic is significantly different from (greater or less than) our expected statistic


Test Statistics

• A test statistic is a number calculated from the scores of a sample that allows us to test a Null Hypothesis and make a decision to reject or retain the $H_0$

• We will be talking about various test statistics for the remainder of the term, and will begin with the z-statistic today


Z-scores Revisited

• We know, by using the z-score formula, the probability of obtaining a score less than a given X value in a normal distribution

• E.g., when $X = 70$, $\mu = 100$, and $\sigma = 20$:

$$z = \frac{X - \mu}{\sigma} = \frac{70 - 100}{20} = \frac{-30}{20} = -1.50$$


The "smaller portion" area is .0668 (from Table E.10)

[Figure: normal curve with X = 70 (z = −1.5) and X = 100 (z = 0) marked; the area below X = 70 is .0668]
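The same numbers can be reproduced without Table E.10 by computing the z-score and the area below it from the standard normal curve. A sketch assuming Python 3.8+ for `statistics.NormalDist`; the printed area matches the table value to four decimals.

```python
from statistics import NormalDist

X, mu, sigma = 70, 100, 20  # values from the example

z = (X - mu) / sigma              # z-score formula
area_below = NormalDist().cdf(z)  # proportion of scores below X ("smaller portion")

print(z)                     # -1.5
print(round(area_below, 4))  # 0.0668
```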


Interpreting z

• This means that if we randomly select one score from this population, the probability of that score being less than 70 is .0668

• But what if we want to test the hypothesis that a sample of n scores (mean = 70) is actually a part of the population (mean = 100, sd = 20)?

• We no longer use the z-score formula; we use a z-statistic

• Remember: whenever we are testing a hypothesis, we use a test statistic

What is Sigma?

• Usually, we do not know sigma (σ), the sd of a population (because data are rarely collected for an entire population)

• Sometimes we do know sigma (e.g., for common psychological tests)

• When we know sigma, we can obtain the sampling distribution of the mean when the Null Hypothesis is true (i.e., when the sample does come from the population)

Null Hypothesis

• When we compare a sample mean with a population mean, the Null Hypothesis is that the sample DOES come from that population:

$$H_0: \mu_{\bar{X}} = \mu$$

or, in our example, that 70 = 100

• But how can 70 = 100?? Recall that a sample drawn from a population with µ = 100 will most likely have a sample mean above or below 100 because of sampling error

When we test a Null Hypothesis, we are testing whether the sample mean and population mean are statistically different from each other: whether a difference as large as 70 vs. 100 would occur less than 5% of the time (based on an alpha level of .05) through sampling error alone if H₀ were true

Sampling Distribution of the Mean

• In hypothesis testing, we set up the sampling distribution of the mean and then calculate a test statistic to determine whether we can reject H₀

• How is this done? Whenever we know σ, we use a z-test: we know X̄ (and n) for the one sample of interest, and we know µ and σ for the population, so we can calculate σ_X̄ (the standard error of the sampling distribution of the mean)

Standard Error Revisited

• Last week, we stated that the standard deviation of a sampling distribution of the mean is called the standard error

• Standard error is used in test statistic formulae because we are using sampling distributions of the mean

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
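
A quick simulation can make the formula concrete. This is a sketch under stated assumptions: a normal population with µ = 100 and σ = 20 (the values from the earlier z-score example), a hypothetical sample size of n = 25, and an arbitrary number of repetitions and random seed. The standard deviation of the simulated sample means should land close to σ/√n = 20/5 = 4.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulated_standard_error(mu, sigma, n, reps=10_000, seed=1):
    """Draw `reps` random samples of size n from a normal population,
    keep each sample mean, and return the SD of those means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.pstdev(means)

print(standard_error(20, 25))                  # 4.0
print(simulated_standard_error(100, 20, 25))   # close to 4.0 (varies with seed)
```

The simulated value will not be exactly 4.0, since it depends on the random draws, but it settles closer to σ/√n as `reps` grows.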

Z Statistic

• If we are testing a null hypothesis that a sample mean is equal to the population mean (and sigma is known), we use the following formula for the z-statistic (with the standard error in place of the standard deviation):

$$z_{\text{obs}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
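
A minimal sketch of the z-statistic in Python. The sample mean of 70 and the population values µ = 100, σ = 20 come from the slides; the sample sizes n = 1, 4, and 25 are hypothetical. With n = 1 the standard error equals σ, so the z-statistic reduces to the ordinary z-score of -1.5; larger samples shrink the standard error and push z_obs further from 0.

```python
import math

def z_obs(sample_mean, mu, sigma, n):
    """One-sample z-statistic (sigma known): (X-bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

for n in (1, 4, 25):
    print(n, z_obs(70, 100, 20, n))
# 1 -1.5
# 4 -3.0
# 25 -7.5
```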

The z-statistic

• Why z_obs?? When we test H₀, we will compare this z_obs (our “z observed”) value with a z_crit (our “z critical”) value

• Note: z_obs is often also called z_obt (for “z obtained”)

• Hypothesis testing compares the absolute values of z_obs and z_crit in the following way:
  If |z_obs| > |z_crit|, we reject the null hypothesis
  If |z_obs| < |z_crit|, we retain the null hypothesis
  If |z_obs| = |z_crit|, we retain the null hypothesis
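
The comparison rule above can be written as a small function. This is a sketch: the name `decide` is hypothetical, and it compares magnitudes only, so it assumes a two-tailed test (or a one-tailed test in which z_obs already lies on the side predicted by the Alternative Hypothesis). Following the slide, a tie retains H₀.

```python
def decide(z_obs, z_crit):
    """Compare |z_obs| with |z_crit|: reject H0 only if |z_obs| is strictly larger."""
    if abs(z_obs) > abs(z_crit):
        return "reject H0"
    return "retain H0"   # covers both |z_obs| < |z_crit| and a tie

print(decide(2.59, 1.96))    # reject H0
print(decide(-1.75, 1.96))   # retain H0
print(decide(1.96, -1.96))   # retain H0 (tie)
```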

z_crit

• The z_crit value is determined by the alpha level used (usually alpha = .05)
  z_crit is the z-score that marks off the “tail” (the rejection region): if H₀ is true, z_obs lands beyond it with a probability of only alpha (.05)
  We use Table E.10 to determine z_crit
  Why might we be interested in this?

• We will know whether we are using a one-tailed or two-tailed z-test based on our research question:
  If we use a one-tailed test, the area in that tail is .05
  If we use a two-tailed test, the area in EACH tail is .025 (.05/2 tails)

Determining z_crit

• When we discussed z-scores, we reviewed problems where you knew the proportion of scores and needed to determine the z-score (e.g., “the lowest 10%”)

• Determining z_crit is a similar process:
  Step 1: one-tailed or two-tailed?
  Step 2: alpha = .05 or alpha = .01?
  Step 3: Find the area in the “smaller portion” column of Table E.10 to determine z_crit
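
Steps 1 to 3 can also be carried out numerically instead of with the table. A sketch assuming Python 3.8+ (`statistics.NormalDist.inv_cdf`); the function name `z_crit` is illustrative. It returns the positive critical value: for a two-tailed test read it as ±, and for a one-tailed test take the sign from the Alternative Hypothesis.

```python
from statistics import NormalDist

def z_crit(alpha=0.05, tails=2):
    """Positive critical z: split alpha across the tails (Steps 1-2), then find
    the z whose upper "smaller portion" equals that share (Step 3)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    area_per_tail = alpha / tails
    return NormalDist().inv_cdf(1 - area_per_tail)

for tails in (2, 1):
    for alpha in (0.05, 0.01):
        print(f"{tails}-tailed, alpha = {alpha}: z_crit = {z_crit(alpha, tails):.3f}")
```

This prints about 1.960, 2.576, 1.645, and 2.326. These differ slightly from the table conventions on the following slides (e.g., 2.57 rather than about 2.576) because Table E.10 lists z only to two decimal places.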

Tail Review …

[Figure: a two-tailed test with 2.5% in each tail, and one-tailed tests with 5% in a single tail]

z_crit for Two-tailed Tests

• Alpha = .05 means that there is .025 per tail
  Find .025 in the “smaller portion” column:
  • z_crit = 1.96
  Note! This is two-tailed, so this means:
  • z_crit = ± 1.96

• Alpha = .01 means that there is .005 per tail
  Find .005 in the “smaller portion” column:
  • z_crit = ± 2.57
  Note! The exact “smaller portion” of .005 is not in the table. The values of .0049 (z = 2.58) and .0051 (z = 2.57) are listed, so which do we use?? Convention dictates that we use z_crit = ± 2.57

z_crit for One-tailed Tests

• Alpha = .05 means that there is .05 in the tail
  Find .05 in the “smaller portion” column: z_crit = 1.64
  Note: The exact “smaller portion” of .05 is not in the table. The values of .0495 (z = 1.65) and .0505 (z = 1.64) are listed, so which do we use?? Convention dictates that we use z_crit = 1.64
  Note! To determine whether this is a + or - z_crit, look at your Alternative Hypothesis

• Alpha = .01 means that there is .01 in the tail
  Find .01 in the “smaller portion” column: z_crit = 2.33
  Note! .0099 and .0102 are listed: we use .0099 (z = 2.33) because it is closest to .0100
  Note! To determine whether this is a + or - z_crit, look at your Alternative Hypothesis
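
Why do these slides settle on 2.57 and 1.64? The sketch below rebuilds a Table E.10-style “smaller portion” column (z in steps of .01, areas rounded to four decimals) and picks the z whose area is closest to the target. It assumes, as a hypothetical reading of the convention, that a tie goes to the smaller z; with that rule it reproduces the four z_crit values on these slides. The function names are illustrative, and values are stored as integers (hundredths of z, ten-thousandths of area) so that ties compare exactly.

```python
from statistics import NormalDist

def smaller_portion_table(z_max=4.0):
    """Map z (in hundredths) to its upper-tail area (in ten-thousandths),
    mimicking a printed table rounded to four decimals."""
    std = NormalDist()
    return {k: round((1 - std.cdf(k / 100)) * 10000)
            for k in range(0, int(z_max * 100) + 1)}

def lookup_z_crit(tail_area, table):
    """Return the tabled z whose smaller portion is closest to tail_area;
    on a tie, take the smaller z (assumed convention)."""
    target = round(tail_area * 10000)
    _, k = min((abs(area - target), k) for k, area in table.items())
    return k / 100

table = smaller_portion_table()
for area in (0.025, 0.005, 0.05, 0.01):
    print(area, lookup_z_crit(area, table))
# 0.025 1.96
# 0.005 2.57
# 0.05 1.64
# 0.01 2.33
```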

Z-test Hypothesis Testing Steps

• 1. State the level of significance: α = 0.05 (α = 0.05 is usually used) OR α = 0.01

• 2. State the IV, levels of the IV, and DV

• 3. State the hypotheses:
  Null hypothesis: H₀:
  Alternative Hypothesis: Hₐ:
  Note! At this point you need to read the question carefully to decide whether you are testing a directional or nondirectional hypothesis

Z-test Hypothesis Testing Steps…

• 4. Determine whether you are using a one-tailed or two-tailed test
  A one-tailed test is used when you test a directional hypothesis
  A two-tailed test is used when you test a nondirectional hypothesis

• 5. Find the rejection region:
  I.e., find your z_crit!!
  It is usually a good idea to draw the normal curve and plot your z_crit at this point; it helps!

Z-test Hypothesis Testing Steps…

• 6. Calculate your z statistic (z_obs):

$$z_{\text{obs}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

• 7. Compare z_crit to z_obs
  Plot z_obs on your normal distribution
  Compare the numerical value of z_crit to z_obs
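
Steps 4 to 7 can be strung together in one function. This is a sketch, not the course’s prescribed method: `z_test` and its `direction` parameter are hypothetical names, and z_crit is computed exactly with `NormalDist.inv_cdf` rather than read from Table E.10, so it can differ in the second or third decimal from the tabled conventions. The usage line plugs in the IQ-drug example from later in this lecture (X̄ = 105, µ = 100, σ = 15, n = 36, two-tailed, α = .05).

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n, alpha=0.05, tails=2, direction=None):
    """One-sample z-test with known sigma (Steps 4-7).

    tails=2: nondirectional Ha (the means differ)
    tails=1: directional Ha; direction must be "greater" or "less"
    Returns (z_obs, z_crit, decision); for a two-tailed test z_crit is
    the positive value and the rejection region is beyond +/- z_crit.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Steps 4-5: locate the rejection region
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    if tails == 1:
        if direction not in ("greater", "less"):
            raise ValueError('one-tailed test needs direction "greater" or "less"')
        if direction == "less":
            z_crit = -z_crit
    # Step 6: z statistic using the standard error
    z_obs = (sample_mean - mu) / (sigma / math.sqrt(n))
    # Step 7: is z_obs in the rejection region? (a tie retains H0)
    if tails == 2:
        reject = abs(z_obs) > z_crit
    elif direction == "greater":
        reject = z_obs > z_crit
    else:
        reject = z_obs < z_crit
    return z_obs, z_crit, "reject H0" if reject else "retain H0"

z, crit, decision = z_test(105, 100, 15, 36)
print(f"z_obs = {z:.2f}, z_crit = ±{crit:.2f}: {decision}")
# z_obs = 2.00, z_crit = ±1.96: reject H0
```

For a one-tailed test the sign matters as well as the size: a z_obs on the opposite side from the one the Alternative Hypothesis predicts is retained, however large it is.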

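Step 7 Example #1 (alpha = .05)

• If z_obs = 2.59:
  Two-tailed: z_crit = -1.96 and +1.96 (.025 in each tail); z_obs = 2.59 lies beyond +1.96, in the rejection region
  One-tailed: z_crit = +1.64 (.05 in the upper tail); z_obs = 2.59 lies beyond +1.64, in the rejection region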

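Step 7 Example #2 (alpha = .05)

• If z_obs = -1.75:
  Two-tailed: z_crit = -1.96 and +1.96 (.025 in each tail); z_obs = -1.75 does not reach -1.96, so it is not in the rejection region
  One-tailed: z_crit = -1.64 (.05 in the lower tail); z_obs = -1.75 lies beyond -1.64, in the rejection region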

Step 7

• Null Hypotheses are rejected when z_obs falls in the “rejection region” (the area beyond z_crit). The rejection region is the “tail” of the distribution

• OR Null Hypotheses are rejected when z_obs > z_crit
  BUT! What about when z_obs and z_crit are both negative numbers??
  In this case, think of rejecting H₀ when the “absolute value” of z_obs > the “absolute value” of z_crit (for a one-tailed test, z_obs must also be on the same side as z_crit)
  “Absolute value” means that you remove the negative sign from both numbers (e.g., the “absolute value” of -5.5 is +5.5)

Step 8 (Last One)

• Step 8. State conclusions in words

• Once you decide to reject or retain H₀, you need to state your conclusions

• So what does rejecting H₀ actually mean for this research study?

• OR: What does retaining H₀ actually mean for this research study?

Step 8 continued

• Rejecting H₀ for z-tests means that the sample mean is significantly different from the population mean, i.e., if H₀ were true, there would be less than a 5% chance that a sample drawn from this population would produce a sample mean this extreme (because it lies in the tail)

• BUT! For one-tailed tests, make sure that you state how they are different (i.e., is the sample mean greater or less than the population mean?)

• Your conclusions should be clear enough that anyone in the general public could understand what the study found
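
The “less than a 5% chance” in the first bullet can be computed directly as the area beyond z_obs under the standard normal curve. A sketch, assuming Python 3.8+; `tail_probability` is a hypothetical helper. For a one-tailed test it assumes z_obs lies on the side predicted by the Alternative Hypothesis. An area below alpha corresponds to z_obs falling beyond z_crit.

```python
from statistics import NormalDist

def tail_probability(z_obs, tails=2):
    """Area beyond |z_obs| under the standard normal curve, i.e., the chance of
    a z at least this extreme if H0 is true. Two-tailed counts both tails."""
    one_tail = 1 - NormalDist().cdf(abs(z_obs))
    return one_tail * tails

for z in (2.59, -1.75):
    print(z, round(tail_probability(z), 4))
# 2.59 0.0096  -> below .05: reject H0 (two-tailed)
# -1.75 0.0801 -> above .05: retain H0 (two-tailed)
```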

An Example Using the Z-test

• Scientists have come up with a breakthrough new drug; they assert that taking this drug will affect your IQ. Because it is so new, they are hoping it makes you smarter, but at this point it might also make you dumber. A sample of 36 people has X̄ = 105, the population µ = 100, and the population σ = 15. Test their hypothesis.

Example cont.

• 1. State the level of significance: α = 0.05 (what is usually used)

• 2. State IV and DV
  IV = pill (levels = pill and no pill)
  DV = IQ scores

• 3. Null hypothesis:
  The drug does not make you smarter or dumber (i.e., the sample mean does not differ from the population mean)

• Alternative Hypothesis:
  The drug makes you either smarter or dumber

$$H_0: \mu_{\bar{X}} = 100 \qquad H_a: \mu_{\bar{X}} \neq 100$$

Example

• 4. B/c this hypothesis is non-directional, we use a two-tailed test

• 5. Find the rejection region: α = 0.05, so with a two-tailed test we want a critical value that marks off a rejection region of 0.025 of the area in each tail

[Figure: normal curve with .025 shaded in each tail]

Example

• From Table E.10, we find that the critical value for z is equal to +1.96 or -1.96

• This means that z_crit = ± 1.96

• 6. Calculate your statistic

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{105 - 100}{15 / \sqrt{36}} = \frac{5}{2.5} = 2.00$$

Example

• This means our z_obs = +2.00

• 7. Compare z_crit to z_obs

• Is z_obs > z_crit?? Yes! 2.00 > 1.96

[Figure: normal curve with .025 in each tail beyond z = -1.96 and 1.96; z_obs = 2.00 falls in the upper tail]

Example

• B/c our z_obs lies beyond z_crit, we say our z-value falls into the region of rejection: the value of z_obs is greater than the value of z_crit, so we choose to reject H₀

• 8. We conclude that the IQ pill significantly changes someone’s IQ when they ingest it

Work On It

• The average number of times that a Canadian donates blood by the time they reach the age of 50 is 10, with a population standard deviation of 3 times. Researchers think that nurses donate more blood than average Canadians. 25 fifty-year-old nurses are asked how many times they have given blood, and their mean number of times donating blood is 15. Test the hypothesis at the .01 level of significance.