Sampling
Probability sampling: based on random selection
Non-probability sampling: based on convenience
Sampling Miscues: Alf Landon for President (1936)
Literary Digest: post cards to voters in 6 states
  Correctly predicting elections from 1920-1932
  Names selected from telephone directories and automobile registrations
In 1936, they sent out 10 million post cards
  Results pick Landon 57% to Roosevelt 43%
Election: Roosevelt in the largest landslide
  Roosevelt 61% of the vote and 523-8 in the Electoral College
Why so inaccurate? Poor sampling frame
  Leads to selection of wealthy respondents
Sampling Miscues: Thomas E. Dewey for President (1948)
Gallup uses quota sampling to pick winner 1936-1944
Quota sampling:
  Matches sample characteristics to characteristics of population
  Gallup quota samples on the basis of income
In 1948, Gallup picked Dewey to defeat Truman
Reasons:
  1. Most pollsters quit polling in October
  2. Undecided voters went for Truman
  3. Unrepresentative samples: WWII changed society since census
Non-probability Sampling
In situations where sampling frame for randomization doesn't exist
Types of non-probability samples:
  1. Reliance on available subjects (convenience sampling)
  2. Purposive or judgmental sampling
  3. Snowball sampling
  4. Quota sampling
Reliance on Available Subjects
Person on the street, easily accessible
Examples:
  Mall intercepts, college students, person on the street
Frequently used, but usually biased
Notoriously inaccurate
  Especially in making inferences about larger population
Purposive or Judgmental Sampling
Dictated by the purpose of the study
  Situational judgments about which individuals should be surveyed to make for a useful or representative sample
E.g., Using college students to study third-person effects regarding rap and metal music
  Third-person effect (3pe): others are more affected by exposure than self
  Assessing effects on self and others
  Using college students makes for homogeneity of self
Snowball Sampling
Used when population of interest is difficult to locate
  E.g., homeless people
Researcher collects data from a few people in the targeted group
  Initially surveyed individuals are asked to name other people to contact
Good for exploration
Bad for generalizability
Quota Sampling
Begins with a table of relevant characteristics of the population
  Proportions of Gender, Age, Education, Ethnicity from census data
Selecting a sample to match those proportions
Problems:
  1. Quota frame must be accurate
  2. Sample is not random
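The step from a table of population proportions to a quota for each cell can be sketched in a few lines. This is only an illustration: the function name quota_targets, the four cells, and their percentages are hypothetical, not census figures. Shares are given as whole percentages so the arithmetic stays exact.

```python
def quota_targets(percentages, sample_size):
    """Turn population percentages per cell into respondent quotas per cell."""
    if sum(percentages.values()) != 100:
        raise ValueError("cell percentages must add up to 100")
    # Whole respondents each cell gets before distributing the leftovers.
    targets = {cell: pct * sample_size // 100 for cell, pct in percentages.items()}
    remainders = {cell: pct * sample_size % 100 for cell, pct in percentages.items()}
    # Hand out respondents lost to rounding down, largest remainder first.
    shortfall = sample_size - sum(targets.values())
    for cell in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        targets[cell] += 1
    return targets


# Hypothetical quota frame (illustrative percentages only).
frame = {"women under 40": 24, "women 40+": 28, "men under 40": 23, "men 40+": 25}
print(quota_targets(frame, 500))
```

The quotas only fix how many people of each type are interviewed. Who gets picked inside each cell is still left to the interviewer, which is why the sample is not random.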
Probability Sampling
Goal: Representativeness
  Sample resembles larger population
Random selection
  Enhancing likelihood of representative sample
  Each unit of the population has an equal chance of being selected into the sample
Population Parameters
Parameter: Summary statistic for the population
  E.g., Mean age of the population
Sample is used to make parameter estimates
  E.g., Mean age of the sample
  Used as an estimate of the population parameter
Sampling Error
Every time you draw a sample from the population, the parameter estimate will fluctuate slightly
E.g.:
  Sample 1: Mean age = 37.2
  Sample 2: Mean age = 36.4
  Sample 3: Mean age = 38.1
If you draw lots of samples, you would get a normal curve of values
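The fluctuation of estimates from sample to sample can be simulated. The sketch below assumes a made-up population of 10,000 ages; the function name sample_means and all of the numbers are hypothetical, chosen only to show the idea.

```python
import random
import statistics


def sample_means(population, sample_size, draws, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(draws)
    ]


# Hypothetical population of ages (invented for illustration).
age_rng = random.Random(42)
population = [age_rng.randint(18, 90) for _ in range(10_000)]

means = sample_means(population, sample_size=100, draws=1_000)
print("population mean:", round(statistics.mean(population), 1))
print("first three sample means:", [round(m, 1) for m in means[:3]])
print("mean of all sample means:", round(statistics.mean(means), 1))
```

Each sample mean differs a little from the others, but they pile up around the population mean. A histogram of means would approximate the normal curve described above.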
Normal Curve of Sample Estimates
[Figure: frequency of estimated means from multiple samples, plotted against the estimated mean; the peak marks the likely population parameter]
Standard Error
The typical distance (standard deviation) of sample estimates from the population parameter
  68% of sample estimates will fall within one standard error of the population parameter
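The 68% figure can be checked by simulation. This is a rough sketch under stated assumptions: the population is invented, the standard error is computed with the usual textbook formula (population standard deviation divided by the square root of the sample size), and samples are drawn with replacement so that the formula applies without a finite-population correction. The function name share_within_one_se is hypothetical.

```python
import math
import random
import statistics


def share_within_one_se(population, sample_size, draws, seed=0):
    """Fraction of sample means that land within one standard error of the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    standard_error = statistics.pstdev(population) / math.sqrt(sample_size)
    hits = 0
    for _ in range(draws):
        # Sampling with replacement keeps the simple SE formula exact.
        estimate = statistics.mean(rng.choices(population, k=sample_size))
        if abs(estimate - true_mean) <= standard_error:
            hits += 1
    return hits / draws


# Hypothetical population of ages (invented for illustration).
age_rng = random.Random(42)
population = [age_rng.randint(18, 90) for _ in range(10_000)]
print(share_within_one_se(population, sample_size=100, draws=2_000))
```

The printed share should come out close to two thirds, varying slightly with the seed.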
Normal Curve of Sample Estimates
[Figure: the same curve with the population parameter at the center; about 2/3 of samples fall within 1 standard error unit of it]
Standard Error Estimates and Sample Size
As the sample size increases:
  The standard error decreases
  In other words, our sample estimate is likely to be closer to the population parameter
As the sample size increases, we get more confident in our parameter estimate
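The relationship between sample size and standard error can be written out for the two most common cases. These are the standard textbook formulas, not something stated on the slides: sigma is the population standard deviation, p is a population proportion, and n is the sample size.

```latex
SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\qquad\qquad
SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.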
Confidence Levels
Two thirds of samples will fall within one standard error of the population parameter
  Therefore: a single sample has a 68% chance of being within one standard error
Confidence levels:
  68% sure estimate is within 1 s.e. of parameter
  95% sure estimate is within 2 s.e. of parameter
  99.7% sure estimate is within 3 s.e. of parameter
Confidence Interval
Interval that we are 95% confident contains the population parameter
For example, we predict that Candidate X will receive 45% of the vote with a ±3% confidence interval
  We are 95% sure the parameter will be between 42% and 48%
Confidence interval shrinks as:
  Standard error is smaller
  Sample size is larger
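The Candidate X example can be reproduced with a short calculation. The slides do not give a sample size, so the 1,100 respondents below are an assumption picked because it yields roughly plus or minus 3 points. The multiplier z=2.0 follows the "within 2 s.e." rule of thumb (1.96 is the more exact value), and the function name proportion_interval is hypothetical.

```python
import math


def proportion_interval(p_hat, n, z=2.0):
    """Return (low, high): the sample proportion plus or minus z standard errors."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Assumed sample size of 1,100; the estimate of 45% is from the example.
low, high = proportion_interval(0.45, 1100)
print(f"{low:.1%} to {high:.1%}")  # prints 42.0% to 48.0%
```

Rerunning with a larger n narrows the interval, and a smaller n widens it.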
Sample Size & Confidence Interval
How precise does the estimate have to be?
  More precise: larger sample size
Larger samples increase precision
  But at a diminishing rate
  Each unit you add to your sample contributes to the accuracy of your estimate
  But the amount it adds shrinks with each additional unit added
95% Confidence Intervals (plus or minus percentage points)
                                 Sample Size
% split   N=100  N=200  N=300  N=400  N=500  N=700  N=1000  N=1500
50/50      10.0    7.1    5.8    5.0    4.5    3.8     3.2     2.6
70/30       9.2    6.5    5.3    4.6    4.1    3.5     2.9     2.4
90/10       6.0    4.2    3.5    3.0    2.7    2.3     1.9     1.5
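The tabled values appear to follow the rule of plus or minus two standard errors of a proportion, expressed in percentage points. That reading is an inference from the numbers, not something the slides state. A minimal sketch that recomputes the rows, with margin_of_error as a hypothetical helper name:

```python
import math


def margin_of_error(split, n, z=2.0):
    """Half-width of the interval, in percentage points, for a split such as 70 (meaning 70/30)."""
    p = split / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


sizes = [100, 200, 300, 400, 500, 700, 1000, 1500]
print("% split  " + "  ".join(f"{n:>5}" for n in sizes))
for split in (50, 70, 90):
    row = "  ".join(f"{margin_of_error(split, n):5.1f}" for n in sizes)
    print(f"{split}/{100 - split}    {row}")
```

Reading across any row shows the diminishing return: going from 100 to 400 respondents halves the interval, and so does going from 400 to 1,600.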
Sampling Frame
List of units from which sample is drawn
  Defines your population
  E.g., List of members of organization or community
Ideally you'd like to list all members of your population as your sampling frame
  Randomly select your sample from that list
Often impractical to list entire population
Sampling Frames for Surveys
Limitations of the telephone book:
  Misses unlisted numbers
  Class bias:
    Poor people may not have phone
    Less likely to have multiple phone lines
Most studies use a technique such as Random Digit Dialing as a surrogate for a sampling frame
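The slides do not describe how Random Digit Dialing works. A common simplified version takes telephone prefixes known to be in use and appends random final digits, so unlisted numbers have the same chance of coming up as listed ones. The sketch below assumes that version; the function name random_digit_dial and the prefixes are hypothetical (555 exchanges are reserved for fictional use).

```python
import random


def random_digit_dial(prefixes, count, seed=None):
    """Build phone numbers by adding four random digits to known prefixes."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        prefix = rng.choice(prefixes)    # area code and exchange
        suffix = rng.randrange(10_000)   # any of 0000-9999, listed or not
        numbers.append(f"{prefix}-{suffix:04d}")
    return numbers


# Hypothetical prefixes for illustration only.
print(random_digit_dial(["608-555", "414-555"], count=3, seed=1))
```

No directory is consulted, so the method sidesteps the unlisted-number problem. It does nothing for households without a phone.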
Types of Sampling Designs
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Multi-stage Cluster Sampling
Simple Random Sampling
Establish a sampling frame
A number is assigned to each element
Numbers are randomly selected into the sample
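The three steps map directly onto a few lines of code. This is a minimal sketch: the membership list is invented and simple_random_sample is a hypothetical helper name.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Number every element of the frame, then draw numbers without replacement."""
    rng = random.Random(seed)
    numbered = dict(enumerate(frame, start=1))                   # assign a number to each element
    chosen = rng.sample(range(1, len(frame) + 1), sample_size)   # randomly select numbers
    return [numbered[number] for number in chosen]


# Hypothetical sampling frame: a list of 200 members.
frame = [f"member_{i:03d}" for i in range(1, 201)]
print(simple_random_sample(frame, 5, seed=7))
```

Drawing without replacement means no element can appear twice, and every element has the same chance of selection.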
Systematic Sampling
Establish sampling frame
Select every kth element with random start
  E.g., 1000 on the list, choosing every 10th name yields a sample size of 100
Sampling interval: standard distance between units on the sampling frame
  Sampling interval = population size / sample size
Sampling ratio: proportion of population that are selected
  Sampling ratio = sample size / population size
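The 1000-name example can be worked through in code. The sketch below assumes the frame is an ordinary list and rounds the sampling interval down when the division is not exact; systematic_sample is a hypothetical helper name.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Take every kth element of the frame, starting at a random point in the first interval."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and the frame length")
    interval = len(frame) // sample_size               # sampling interval k
    start = random.Random(seed).randrange(interval)    # random start
    return frame[start::interval][:sample_size]


frame = list(range(1, 1001))                # 1000 names on the list
sample = systematic_sample(frame, 100, seed=3)
print("sampling interval:", len(frame) // len(sample))
print("sampling ratio:", len(sample) / len(frame))
print("first five selected:", sample[:5])
```

Only the starting point is random; every later selection is fixed by the interval. If the list has a cyclical pattern that matches the interval, the sample will be biased.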
Stratified Sampling
Modification used to reduce potential for sampling error
Researcher ensures that certain groups are represented proportionately in the sample
  E.g., If the population is 60% female, stratified sample selects 60% females into the sample
  E.g., Stratifying by region of the country to make sure that each region is proportionately represented
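The 60% female example can be sketched as proportionate stratified sampling: sort the frame into groups, then select randomly within each group in proportion to its size. The frame below is invented, and stratified_sample and stratum_of are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, sample_size, seed=None):
    """Randomly select within each stratum in proportion to the stratum's size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)    # sort population into groups
    sample = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical frame: 600 women and 400 men.
frame = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(frame, lambda unit: unit[0], 100, seed=5)
print(sum(1 for unit in sample if unit[0] == "F"), "women out of", len(sample))
```

When group sizes do not divide evenly, the rounded quotas can leave the total one or two units off the requested sample size. A production version would need to reconcile that.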
Two Methods of Stratification
1. Sort population into groups
  Randomly select within groups in proportion to relative group size
2. Sort population into groups
  Systematically select within groups using random start
Disproportionate stratification:
  Some stratification groups can be over-sampled for sub-group analysis
  Samples are then weighted to restore population proportions
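The weighting step for disproportionate stratification can be made concrete. A common approach, assumed here, gives each respondent a weight equal to the group's population share divided by its sample share. The group names, counts, and scores are invented, and stratum_weights and weighted_mean are hypothetical helpers.

```python
def stratum_weights(population_counts, sample_counts):
    """Weight per respondent in each stratum: population share / sample share."""
    population_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        stratum: (population_counts[stratum] / population_total)
        / (sample_counts[stratum] / sample_total)
        for stratum in population_counts
    }


def weighted_mean(values_by_stratum, weights):
    """Mean of all responses, with each response weighted by its stratum weight."""
    numerator = sum(weights[s] * v for s, values in values_by_stratum.items() for v in values)
    denominator = sum(weights[s] * len(values) for s, values in values_by_stratum.items())
    return numerator / denominator


# Hypothetical design: a group that is 10% of the population is over-sampled to 50%.
weights = stratum_weights(
    {"majority": 9000, "minority": 1000},
    {"majority": 500, "minority": 500},
)
print(weights)
print(weighted_mean({"majority": [10, 10], "minority": [20, 20]}, weights))
```

Over-sampled respondents count for less than one person each and under-sampled respondents for more, so the weighted results reflect the population mix again.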
Cluster Sampling
Frequently, there is no convenient way of listing the population for sampling purposes
  E.g., Sample of Dane County or Wisconsin
  Hard to get a list of the population members
Cluster sample
  Sample of census blocks
  List of people for selected census block
  Select sub-sample of people living on each block
Multi-stage Cluster Sample
Cluster sampling done in a series of stages:
  List, then sample within
Example:
  Stage 1: Listing zip codes
    Randomly selecting zip codes
  Stage 2: List census blocks within selected zip codes
    Randomly select census blocks
  Stage 3: List households on selected census blocks
    Randomly select households
  Stage 4: List residents of selected households
    Randomly select person to interview
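The four stages can be sketched as nested random draws. The structure below is a toy stand-in: the zip codes, blocks, households, and residents are generated placeholders, and multistage_sample is a hypothetical name. In real fieldwork each list would be compiled only for the units selected at the previous stage.

```python
import random


def multistage_sample(zip_codes, n_zips, n_blocks, n_households, seed=None):
    """zip_codes maps zip -> block -> household -> list of residents."""
    rng = random.Random(seed)
    interviews = []
    # Stage 1: list zip codes, randomly select some.
    for zip_code in rng.sample(sorted(zip_codes), min(n_zips, len(zip_codes))):
        blocks = zip_codes[zip_code]
        # Stage 2: list census blocks in the selected zip code, randomly select some.
        for block in rng.sample(sorted(blocks), min(n_blocks, len(blocks))):
            households = blocks[block]
            # Stage 3: list households on the selected block, randomly select some.
            for household in rng.sample(sorted(households), min(n_households, len(households))):
                # Stage 4: list residents of the household, randomly select one person.
                person = rng.choice(households[household])
                interviews.append((zip_code, block, household, person))
    return interviews


# Toy data: 4 zip codes x 5 blocks x 6 households x 3 residents.
area = {
    f"zip{z}": {
        f"block{b}": {
            f"house{h}": [f"resident{r}" for r in range(1, 4)]
            for h in range(1, 7)
        }
        for b in range(1, 6)
    }
    for z in range(1, 5)
}
picked = multistage_sample(area, n_zips=2, n_blocks=3, n_households=2, seed=11)
print(len(picked), "interviews;", "first:", picked[0])
```

Only the selected clusters ever need to be listed, which is what makes the design practical when no master list of the population exists.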
Multi-stage Sampling and Sampling Error
Error is introduced at each stage
One solution is to use stratification at each stage to try to reduce sampling error