Sampling

Upload: trevor-kennison

Post on 15-Dec-2015


TRANSCRIPT

Page 1: Sampling. Sampling Probability Sampling Probability Sampling Based on random selection Based on random selection Non-probability sampling Non-probability

Sampling

Page 2

Sampling

- Probability sampling
  - Based on random selection
- Non-probability sampling
  - Based on convenience

Page 3

Sampling Miscues: Alf Landon for President (1936)

- Literary Digest: post cards to voters in 6 states
  - Correctly predicted elections from 1920-1932
  - Names selected from telephone directories and automobile registrations
- In 1936, they sent out 10 million post cards
  - Results picked Landon 57% to Roosevelt 43%
- Election: Roosevelt won in the largest landslide
  - Roosevelt took 61% of the vote and 523-8 in the Electoral College
- Why so inaccurate? Poor sampling frame
  - Led to selection of wealthy respondents

Page 4

Sampling Miscues: Thomas E. Dewey for President (1948)

- Gallup uses quota sampling to pick the winner, 1936-1944
- Quota sampling:
  - Matches sample characteristics to characteristics of the population
  - Gallup quota samples on the basis of income
- In 1948, Gallup picked Dewey to defeat Truman
- Reasons:
  1. Most pollsters quit polling in October
  2. Undecided voters went for Truman
  3. Unrepresentative samples: WWII changed society since the census

Page 5

Non-probability Sampling

- Used in situations where a sampling frame for randomization doesn’t exist
- Types of non-probability samples:
  1. Reliance on available subjects (convenience sampling)
  2. Purposive or judgmental sampling
  3. Snowball sampling
  4. Quota sampling

Page 6

Reliance on Available Subjects

- Person on the street, easily accessible
- Examples:
  - Mall intercepts, college students, person on the street
- Frequently used, but usually biased
- Notoriously inaccurate
  - Especially in making inferences about the larger population

Page 7

Purposive or Judgmental Sampling

- Dictated by the purpose of the study
- Situational judgments about which individuals should be surveyed to make for a useful or representative sample
- E.g., using college students to study third-person effects regarding rap and metal music
  - Third-person effect (3pe): others are more affected by exposure than self
  - Assessing effects on self and others
  - Using college students makes for homogeneity of self

Page 8

Snowball Sampling

- Used when the population of interest is difficult to locate
  - E.g., homeless people
- Researcher collects data from a few people in the targeted group
- Initially surveyed individuals are asked to name other people to contact
- Good for exploration
- Bad for generalizability

Page 9

Quota Sampling

- Begins with a table of relevant characteristics of the population
  - Proportions of gender, age, education, ethnicity from census data
- Selecting a sample to match those proportions
- Problems:
  1. Quota frame must be accurate
  2. Sample is not random
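
The quota procedure can be sketched in a few lines of Python. This is only an illustration under assumptions: the 60/40 gender proportions, the stream of passers-by, and the helper names `quota_targets` and `fill_quotas` are all hypothetical. It shows why the result is not a random sample: whoever arrives first fills the quota.

```python
import random

def quota_targets(proportions, sample_size):
    """Turn population proportions into per-group quotas (rounded).

    Rounding may make the quotas sum to slightly more or less than sample_size.
    """
    return {group: round(share * sample_size) for group, share in proportions.items()}

def fill_quotas(respondents, targets, key):
    """Accept respondents in arrival order until every group's quota is full.

    Selection is NOT random: availability decides who gets in.
    """
    counts = {group: 0 for group in targets}
    sample = []
    for person in respondents:
        group = person[key]
        if group in counts and counts[group] < targets[group]:
            sample.append(person)
            counts[group] += 1
        if counts == targets:
            break
    return sample

# Hypothetical census proportions and a hypothetical stream of passers-by.
proportions = {"female": 0.6, "male": 0.4}
targets = quota_targets(proportions, sample_size=10)

rng = random.Random(0)
stream = [{"id": i, "gender": rng.choice(["female", "male"])} for i in range(200)]
sample = fill_quotas(stream, targets, key="gender")
print(targets, len(sample))
```

The sample matches the quota table exactly, yet nothing in the procedure gives each member of the population a known chance of selection.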

Page 10

Probability Sampling

- Goal: representativeness
  - Sample resembles larger population
- Random selection
  - Enhances the likelihood of a representative sample
  - Each unit of the population has an equal chance of being selected into the sample

Page 11

Population Parameters

- Parameter: summary statistic for the population
  - E.g., mean age of the population
- Sample is used to make parameter estimates
  - E.g., mean age of the sample
  - Used as an estimate of the population parameter

Page 12

Sampling Error

- Every time you draw a sample from the population, the parameter estimate will fluctuate slightly
- E.g.:
  - Sample 1: Mean age = 37.2
  - Sample 2: Mean age = 36.4
  - Sample 3: Mean age = 38.1
- If you draw lots of samples, you would get a normal curve of values
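
A small simulation may make the fluctuation concrete. It is a sketch under assumptions: the population of ages is invented (roughly centred on 37), and the sample size of 400 and the 1,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 ages, roughly centred on 37, clipped to 18-90.
population = [min(max(rng.gauss(37, 12), 18), 90) for _ in range(100_000)]
parameter = statistics.mean(population)

def sample_mean(pop, n, rng):
    """Mean age of one simple random sample of size n."""
    return statistics.mean(rng.sample(pop, n))

# Draw many samples; each one gives a slightly different estimate.
estimates = [sample_mean(population, 400, rng) for _ in range(1000)]

print(f"population mean: {parameter:.1f}")
print("first three estimates:", [round(e, 1) for e in estimates[:3]])
print(f"mean of estimates: {statistics.mean(estimates):.1f}")
```

Plotting a histogram of `estimates` should give a bell-shaped curve centred near the population mean.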

Page 13

Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. Label: likely population parameter.]

Page 14

Standard Error

- The standard deviation of sample estimates around the population parameter, roughly their typical distance from it
  - About 68% of sample estimates will fall within one standard error of the population parameter
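
The 68% figure can be checked by simulation. This sketch assumes an invented, normally distributed population of ages and uses the usual formula for the standard error of a mean, the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(7)
population = [rng.gauss(37, 12) for _ in range(50_000)]  # hypothetical ages
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 100
standard_error = sigma / math.sqrt(n)  # standard error of the mean for samples of size n

# Draw 2,000 samples and count how many estimates land within one standard error.
estimates = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]
within_one_se = sum(abs(e - mu) <= standard_error for e in estimates) / len(estimates)

print(f"standard error: {standard_error:.2f}")
print(f"share of estimates within 1 SE: {within_one_se:.2f}")
```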

Page 15

Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. Labels: population parameter; 1 standard error unit.]

Page 16

Normal Curve of Sample Estimates

[Figure: normal curve. Vertical axis: frequency of estimated means from multiple samples. Horizontal axis: estimated mean. Labels: population parameter; 1 standard error unit; 2/3 of samples.]

Page 17

Standard Error Estimates and Sample Size

- As the sample size increases:
  - The standard error decreases
  - In other words, our sample estimate is likely to be closer to the population parameter
- As the sample size increases, we get more confident in our parameter estimate
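
For a sample proportion, the relationship can be written down directly. The sketch below assumes the standard formula SE = sqrt(p(1 - p) / n); the 50% proportion and the three sample sizes are arbitrary illustrations.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion p for sample size n."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    se = standard_error_of_proportion(0.5, n)
    print(f"n = {n:5d}  SE = {se:.4f}")
```

Because the sample size sits under a square root, quadrupling the sample only halves the standard error.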

Page 18

Confidence Levels

- Two thirds of samples will fall within one standard error of the population parameter
  - Therefore: a single sample has a 68% chance of being within one standard error
- Confidence levels:
  - 68% sure estimate is within 1 s.e. of parameter
  - 95% sure estimate is within 2 s.e. of parameter
  - 99.7% sure estimate is within 3 s.e. of parameter
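
These percentages come from the normal curve and can be computed with the Python standard library. The sketch assumes the sample estimates are normally distributed; `coverage` is a hypothetical helper name.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normally distributed estimate falls within k standard errors."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} s.e.: {coverage(k):.1%}")
```

The one- and two-standard-error figures round to the familiar 68% and 95%, and the three-standard-error figure comes out slightly above 99%.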

Page 19

Confidence Interval

- The range that we are 95% confident contains the population parameter
- For example, we predict that Candidate X will receive 45% of the vote, with a confidence interval of plus or minus 3 percentage points
  - We are 95% sure the parameter will be between 42% and 48%
- Confidence interval shrinks as:
  - Standard error is smaller
  - Sample size is larger
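
The Candidate X example can be reproduced with the normal approximation for a proportion. This is a sketch under assumptions: the slide does not state a sample size, so n = 1,056 is a hypothetical value chosen because it gives a margin close to 3 points, and `proportion_ci` is an illustrative helper.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# Hypothetical poll: 45% support among n = 1,056 respondents (n is assumed).
low, high, margin = proportion_ci(0.45, 1056)
print(f"{low:.1%} to {high:.1%} (margin {margin:.1%})")
```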

Page 21

Sample Size & Confidence Interval

- How precise does the estimate have to be?
  - More precise: larger sample size
- Larger samples increase precision
  - But at a diminishing rate
  - Each unit you add to your sample contributes to the accuracy of your estimate
  - But the amount it adds shrinks with each additional unit added

Page 22

95% Confidence Intervals

Plus-or-minus percentage points, by percentage split and sample size (N):

% split   N=100   N=200   N=300   N=400   N=500   N=700   N=1000   N=1500
50/50      10.0     7.1     5.8     5.0     4.5     3.8      3.2      2.6
70/30       9.2     6.5     5.3     4.6     4.1     3.5      2.9      2.4
90/10       6.0     4.2     3.5     3.0     2.7     2.3      1.9      1.5
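
The tabulated values appear to follow roughly two standard errors of a proportion, expressed in percentage points. The sketch below regenerates such a table under that assumption; the multiplier z = 2 and the helper name `margin_of_error` are not stated in the slides.

```python
import math

SAMPLE_SIZES = (100, 200, 300, 400, 500, 700, 1000, 1500)
SPLITS = ((50, 50), (70, 30), (90, 10))

def margin_of_error(p, n, z=2.0):
    """Half-width of the roughly 95% interval, in percentage points (z = 2 assumed)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for a, b in SPLITS:
    row = [f"{margin_of_error(a / 100, n):4.1f}" for n in SAMPLE_SIZES]
    print(f"{a}/{b}", *row)
```

Reading along any row shows the diminishing return: going from 100 to 400 respondents halves the margin, but going from 1,000 to 1,500 barely moves it.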

Page 23

Sampling Frame

- List of units from which the sample is drawn
  - Defines your population
  - E.g., list of members of an organization or community
- Ideally you’d like to list all members of your population as your sampling frame
  - Randomly select your sample from that list
- Often impractical to list the entire population

Page 24

Sampling Frames for Surveys

- Limitations of the telephone book:
  - Misses unlisted numbers
  - Class bias:
    - Poor people may not have a phone
    - Less likely to have multiple phone lines
- Most studies use a technique such as Random Digit Dialing as a surrogate for a sampling frame

Page 25

Types of Sampling Designs

- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Multi-stage Cluster Sampling

Page 26

Simple Random Sampling

- Establish a sampling frame
- A number is assigned to each element
- Numbers are randomly selected into the sample
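
The three steps map closely onto Python's standard `random` module. A minimal sketch, assuming a hypothetical membership list of 500 names as the sampling frame; `simple_random_sample` is an illustrative helper, not a library function.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Number every element of the frame, then draw n numbers without replacement."""
    rng = random.Random(seed)
    numbered = dict(enumerate(frame, start=1))  # element number -> element
    chosen = rng.sample(sorted(numbered), n)    # randomly selected numbers
    return [numbered[i] for i in chosen]

frame = [f"member_{i:03d}" for i in range(1, 501)]  # hypothetical membership list
sample = simple_random_sample(frame, 25, seed=1)
print(sample[:5])
```

Passing a seed makes the draw repeatable, which helps when a sample has to be documented or audited.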

Page 27

Systematic Sampling

- Establish sampling frame
- Select every kth element with a random start
  - E.g., 1000 on the list, choosing every 10th name yields a sample size of 100
- Sampling interval: standard distance between units on the sampling frame
  - Sampling interval = population size / sample size
- Sampling ratio: proportion of population that are selected
  - Sampling ratio = sample size / population size
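
The 1,000-name example works out as follows. A minimal sketch, assuming the frame is an ordinary Python list and the population size divides evenly by the sample size; `systematic_sample` is a hypothetical helper.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th element after a random start within the first interval."""
    interval = len(frame) // sample_size  # sampling interval k
    start = random.Random(seed).randrange(interval)
    return frame[start::interval][:sample_size], interval

frame = list(range(1, 1001))  # hypothetical list of 1,000 names
sample, interval = systematic_sample(frame, 100, seed=3)
sampling_ratio = len(sample) / len(frame)
print(interval, len(sample), sampling_ratio)
```

If the list has a periodic pattern that lines up with the interval, a systematic sample can be badly unrepresentative, so the ordering of the frame is worth checking first.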

Page 28

Stratified Sampling

- Modification used to reduce potential for sampling error
- Researcher ensures that certain groups are represented proportionately in the sample
  - E.g., if the population is 60% female, a stratified sample selects 60% females into the sample
  - E.g., stratifying by region of the country to make sure that each region is proportionately represented
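
The 60% female example can be sketched as proportionate stratified sampling. The population below is invented, and `proportionate_stratified_sample` is a hypothetical helper; rounding each stratum's allocation is a simplification that may not sum exactly to n in general.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(population, stratum_of, n, seed=None):
    """Randomly sample within each stratum in proportion to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))  # this stratum's share of n
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60% female, 40% male.
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = proportionate_stratified_sample(population, lambda u: u[0], 100, seed=5)
females = sum(1 for sex, _ in sample if sex == "F")
print(females, len(sample) - females)
```

Unlike a simple random sample, the gender split here cannot drift away from 60/40 by chance, which removes one source of sampling error.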

Page 29

Two Methods of Stratification

1. Sort population into groups
   - Randomly select within groups in proportion to relative group size
2. Sort population into groups
   - Systematically select within groups using a random start
- Disproportionate stratification:
  - Some stratification groups can be over-sampled for sub-group analysis
  - Samples are then weighted to restore population proportions
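
Weighting after over-sampling can be illustrated with a small simulation. Everything here is assumed for the example: a 10% minority group with a higher mean outcome, equal samples of 200 from each group, and weights equal to stratum population size divided by stratum sample size.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population: a 10% minority group with a different mean outcome.
majority = [rng.gauss(50, 10) for _ in range(9000)]
minority = [rng.gauss(70, 10) for _ in range(1000)]
population_size = len(majority) + len(minority)

# Over-sample the minority: 200 from each group instead of 360 and 40.
samples = {"majority": rng.sample(majority, 200), "minority": rng.sample(minority, 200)}
sizes = {"majority": len(majority), "minority": len(minority)}

# Weight = stratum population size / stratum sample size.
weights = {g: sizes[g] / len(samples[g]) for g in samples}

weighted_total = sum(weights[g] * sum(samples[g]) for g in samples)
weighted_mean = weighted_total / population_size
unweighted_mean = statistics.mean(samples["majority"] + samples["minority"])
true_mean = statistics.mean(majority + minority)

print(f"true {true_mean:.1f}  unweighted {unweighted_mean:.1f}  weighted {weighted_mean:.1f}")
```

The unweighted mean is pulled toward the over-sampled group, while the weighted mean should land close to the population value.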

Page 30

Cluster Sampling

- Frequently, there is no convenient way of listing the population for sampling purposes
  - E.g., sample of Dane County or Wisconsin
  - Hard to get a list of the population members
- Cluster sample
  - Sample of census blocks
  - List of people for each selected census block
  - Select a sub-sample of people living on each block

Page 31

Multi-stage Cluster Sample

- Cluster sampling done in a series of stages:
  - List, then sample within
- Example:
  - Stage 1: List zip codes
    - Randomly select zip codes
  - Stage 2: List census blocks within selected zip codes
    - Randomly select census blocks
  - Stage 3: List households on selected census blocks
    - Randomly select households
  - Stage 4: List residents of selected households
    - Randomly select person to interview
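
The list-then-sample pattern can be sketched with nested dictionaries standing in for the lists at each stage. The frame, the number of units picked per stage, and the helper `multistage_sample` are all hypothetical; a real design would usually also weight for unequal selection probabilities.

```python
import random

def multistage_sample(frame, picks_per_stage, seed=None):
    """List the units at each stage, randomly select some, then list within them.

    frame: nested dicts (zip code -> census block -> household) ending in a
    list of residents. picks_per_stage: units to select at stages 1 to 3;
    stage 4 always selects one resident per selected household.
    """
    rng = random.Random(seed)
    selected = [((), frame)]
    for picks in picks_per_stage:
        next_level = []
        for path, units in selected:
            for name in rng.sample(sorted(units), min(picks, len(units))):
                next_level.append((path + (name,), units[name]))
        selected = next_level
    return [path + (rng.choice(residents),) for path, residents in selected]

# Hypothetical frame: 5 zip codes x 4 blocks x 6 households x 3 residents.
frame = {
    f"zip{z}": {
        f"block{b}": {f"hh{h}": [f"z{z}b{b}h{h}p{p}" for p in range(3)] for h in range(6)}
        for b in range(4)
    }
    for z in range(5)
}
interviews = multistage_sample(frame, picks_per_stage=(2, 2, 3), seed=9)
print(len(interviews), interviews[0])
```

Only the selected zip codes, blocks, and households ever need to be listed, which is the practical appeal of the design.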

Page 32

Multi-stage Sampling and Sampling Error

- Error is introduced at each stage
- One solution is to use stratification at each stage to try to reduce sampling error