You want to survey a school
• You draw your sample from the first day of school student enrollment list
• This list would be your ____???____
• Which students are not on this list?
• A phenomenon known as?
• Potentially problematic because?
• (Hint: Dillman, p. 196)
Some reminders…
• Population: The group about whom we want to draw our inference
• Sample Frame: Members of the population who could potentially be in our sample
• Coverage Error: The extent to which members of the population are excluded from the sample frame (not good)
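As a rough illustration of these three definitions, the sketch below treats the population and the sample frame as sets of student IDs. The IDs and counts are made up; only the set logic matters.

```python
# Hypothetical student IDs: the population is everyone enrolled at any
# point in the year; the frame is the first-day enrollment list.
population = set(range(1, 1001))   # 1,000 students overall (made up)
sample_frame = set(range(1, 951))  # 950 were on the first-day list (made up)

# Students we can never sample because they are missing from the frame.
not_covered = population - sample_frame

# Share of the population left out of the frame.
coverage_gap = len(not_covered) / len(population)
print(f"{len(not_covered)} students ({coverage_gap:.1%}) cannot be sampled")
```

If the students missing from the frame differ from the rest on the outcome, no amount of careful sampling from the frame can fix that.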
Welcome…
• …to a hopefully productive lesson on SAMPLING METHODOLOGY!
• What’s ideal?
• Nifty tricks??
• Common misconceptions???
• Limitations of our methods?????????
• P.S. We are going to do (some) math and it is going to be FUN!!!
Simple Random Sampling (what’s ideal)
• Members of a sample frame, which hopefully includes our entire population, are selected one at a time, independently & without replacement
• (Drawing names out of a hat)
• Sample is equal in expectation to population on all outcomes, but no guarantees
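Drawing names out of a hat can be sketched in a few lines. This is a minimal example, assuming a hypothetical frame of 500 student ID numbers; the function name and the seed are illustrative choices, not part of the lesson.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like names from a hat.
    return rng.sample(frame, n)

# Hypothetical frame: 500 enrolled students identified by ID number.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 50, seed=1)
print(sorted(sample)[:5])
```

A different seed gives a different sample, which is the "no guarantees" part: any single draw can differ from the population by chance.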
Stratified Random Sampling (possibly even more ideal)
• Use criterion to divide sample frame by group membership (e.g. racial category)
• Randomly sample within each group
• What is the advantage of this procedure?
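The two steps above (divide the frame by group, then sample at random within each group) might look like the sketch below. The frame, the group labels "A" and "B", and the sample size of 50 per group are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, n_per_group, seed=None):
    """frame is a list of (member_id, group) pairs."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member_id, group in frame:
        strata[group].append(member_id)
    # Simple random sample inside each stratum.
    return {group: rng.sample(ids, n_per_group) for group, ids in strata.items()}

# Hypothetical frame: group "B" is only 1% of the 10,000 members.
frame = [(i, "B" if i % 100 == 0 else "A") for i in range(10_000)]
sample = stratified_sample(frame, n_per_group=50, seed=1)
print({group: len(ids) for group, ids in sample.items()})
```

Each group ends up with exactly the number of members we asked for, however small its share of the frame.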
Scenario…
• We want to know what percentage of Americans support Obama for president
• We need 1100 members from each racial group to be confident about group means (more on this later)
• American Indians / Alaskan Natives comprise 1% of our population.
• Through simple random sampling, how large of a sample would we theoretically need to reach n = 1100 for this subgroup?
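One way to work the question above is to divide the target subgroup size by the subgroup's population share. The sketch below does that with exact fractions; the function name is an illustrative choice, and the result holds only in expectation, since any one random draw could contain more or fewer members of the subgroup.

```python
import math
from fractions import Fraction

def expected_total_sample(n_subgroup, subgroup_share):
    """Total simple random sample size at which we expect n_subgroup members."""
    return math.ceil(n_subgroup / subgroup_share)

# 1100 members wanted from a group that is 1% of the population.
total = expected_total_sample(1100, Fraction(1, 100))
print(total)
```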
Scenario cont’d…
• OR, we could use stratified random sampling and draw 1100 from each subgroup without all this trouble.
• BUT, now we have oversampled from American Indians--they are over-represented in our sample!
• Implications?
• Solutions?
(This data is very fake)
• Proportion supporting B.O.
African American: .50
Asian American: .50
Latino: .50
White: .50
American Indian: 0
Unweighted avg: ??
Weighting (nifty trick)
• Now, let’s do a weighted average instead…
What’s going on here?
99% (.50) + 1% (0) = 49.50%
• Big difference, eh?
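The two averages can be checked side by side. This sketch uses the (very fake) proportions from the slide and, like the slide, lumps the four groups at .50 into a combined 99% of the population; the variable names are illustrative.

```python
# Proportions supporting, copied from the fake data in the slide.
support = {
    "African American": 0.50,
    "Asian American": 0.50,
    "Latino": 0.50,
    "White": 0.50,
    "American Indian": 0.0,
}

# Unweighted: every group counts equally, as in a sample with 1100 per group.
unweighted = sum(support.values()) / len(support)

# Weighted: the four groups at .50 together make up 99% of the population,
# and American Indians get their 1% population share.
weighted = 0.99 * 0.50 + 0.01 * support["American Indian"]

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

The unweighted figure lets a 1% group drive a fifth of the estimate, which is why the oversampled group has to be weighted back down.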
So, why was 1100 an ideal subgroup number?
• Because no matter how large your population, a sample of 1100 will get you very close to the true population value (within about 3 percentage points at 95% confidence) if your outcome is binary (e.g. Obama: Yes or No)
• How come?
Because this man said so
• William Sealy Gosset (1876-1937)
• Chemist, “math person”, Guinness Brewery worker
• A patient man
Yes, a patient man
• Using barley (somehow), he spent two years empirically studying the relationship between sample means and population means.
• “The Probable (Standard) Error of a Mean” (1908)
• Standard errors are what we use to estimate sampling error
Sampling error
• Describes how closely our sample mean allows us to estimate our population mean
• Conceptually similar to a confidence interval (Dillman, p. 207; http://www.researchsolutions.co.nz/sample_sizes.htm)
• Depends on:
  - Population variance (“spread”) (estimated by sample variance)
  - Sample size
  - Population size (to a point)
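The three ingredients listed above can be seen in the textbook standard error of a mean under simple random sampling, with the usual finite population correction. This is a sketch of that standard formula, not something taken from the slides; the function name and the example population sizes are assumptions.

```python
import math

def standard_error(variance, n, population_size=None):
    """Standard error of a sample mean under simple random sampling."""
    se = math.sqrt(variance / n)
    if population_size is not None:
        # Finite population correction: matters only when n is a
        # noticeable fraction of the population.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

# Binary outcome at its most spread out (variance .25), n = 1100.
for size in (20_000, 300_000_000):
    print(size, round(standard_error(0.25, 1100, size), 5))
```

Going from a city of 20,000 to a country of 300 million barely changes the result, which is the sense in which population size matters only "to a point".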
Sampling error: big picture
• Larger variances and (to a point) larger population sizes require larger samples to estimate the population mean at a given level of precision
• Increasing sample size reduces sampling error, BUT there are diminishing returns to increasing our sample size
Sampling error: big picture
• Diminishing Returns? For large populations…
  - Increasing “n” from 100 to 200 is helpful
  - Increasing from 500 to 600 is less helpful
  - Increasing from 1200 to 1300 helps very little (no matter how large the population)
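The three steps above can be put in numbers. The sketch below assumes a binary outcome split 50/50, a large population (no finite population correction), and a 95% margin of error using z = 1.96; none of those choices come from the slides.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (large population)."""
    return z * math.sqrt(p * (1 - p) / n)

steps = [(100, 200), (500, 600), (1200, 1300)]
gains = [margin_of_error(a) - margin_of_error(b) for a, b in steps]

for (a, b), gain in zip(steps, gains):
    print(f"n {a} -> {b}: margin shrinks by {gain * 100:.2f} percentage points")
```

Each step adds the same 100 respondents, but the improvement gets smaller every time because n sits under a square root.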
Why Diminishing Returns?
• Because there is an upper bound (“ceiling”) on the variance of any sample.
• For binary (Yes/no, “1” or “0”) outcomes, max variance is .25
• Thus, it’s only a matter of time till more “n” in the denominator makes our standard error very low
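The .25 ceiling can be checked directly, since the variance of a 0/1 outcome with "yes" share p is p(1 - p). The sketch below scans shares from 0 to 1 and then computes the worst-case standard error at n = 1100; the grid of shares and the 95% multiplier z = 1.96 are assumptions made for illustration.

```python
import math

# Variance of a 0/1 outcome with "yes" share p is p * (1 - p).
shares = [i / 100 for i in range(101)]
variances = [p * (1 - p) for p in shares]
max_variance = max(variances)
worst_share = shares[variances.index(max_variance)]

# Worst-case standard error and 95% margin (z = 1.96 assumed) at n = 1100.
n = 1100
worst_se = math.sqrt(max_variance / n)
worst_margin = 1.96 * worst_se

print(f"max variance {max_variance} at p = {worst_share}")
print(f"worst-case margin at n = {n}: {worst_margin:.3f}")
```

Because the variance can never exceed .25, the margin at n = 1100 stays under 3 percentage points whatever the true split is.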
Why Diminishing Returns?
• Even for continuous outcomes, there is still an upper bound on variance unless scale is infinite
• Thus, there are still diminishing returns on increasing “n”
• For more on this topic…
  - take S-012
  - look up Confidence Intervals in stats books
  - “You don't need a large sample of users to obtain meaningful data: Continuous Data (e.g. Task Time)” http://www.measuringusability.com/sample_continuous.htm
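The variance ceiling for bounded continuous outcomes can be made concrete with a standard result (Popoviciu's inequality, not named in the slides): values confined to a range from low to high have variance of at most (high - low)² / 4. The sketch below checks this on a hypothetical 1-to-5 rating scale; the function name and the data are made up.

```python
import statistics

def variance_ceiling(low, high):
    """Largest possible variance for values confined to [low, high]."""
    return (high - low) ** 2 / 4

# Hypothetical 1-to-5 rating scale: the most spread-out set of answers
# puts half the respondents at each end of the scale.
extreme = [1] * 50 + [5] * 50
middling = [1, 2, 3, 4, 5] * 20

print(variance_ceiling(1, 5))
print(statistics.pvariance(extreme), statistics.pvariance(middling))
```

With low = 0 and high = 1 the ceiling is .25, the same figure as for binary outcomes.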
Limitations of Sampling error calculations
• Does not take coverage error into account!
• Assumes you have drawn a simple random sample (e.g. does not take “clustering” into account)
Clustering???
• There are 20,000 students in a city with 40 schools. We want a sample of 1100
• Ideally, we would draw students at random from every school.
• But, it would be cheaper and easier if we drew a few schools at random and obtained information from every student
• Implications?
Clustering???
• If there is a lot of school-level variation in our outcome, a sample drawn from only a few schools may not be representative, and our sample estimate can land far from the population value.
• Sampling error formula does not account for this possibility
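A small simulation can show how much the two designs differ. Everything below is hypothetical: a made-up city of 40 schools with 500 students each, school "yes" rates spread between 20% and 80%, and 1,000 students per sample under both designs (two whole schools, so not exactly the 1100 in the example).

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical city: 40 schools x 500 students = 20,000 students.
# Each school gets its own "yes" rate, so schools differ a lot (assumption).
schools = []
for _ in range(40):
    rate = rng.uniform(0.2, 0.8)
    schools.append([1 if rng.random() < rate else 0 for _ in range(500)])
students = [y for school in schools for y in school]

def srs_estimate(n):
    """Share of 'yes' in a simple random sample of n students."""
    picked = rng.sample(students, n)
    return sum(picked) / n

def cluster_estimate(n_schools):
    """Share of 'yes' among every student in n_schools random schools."""
    picked = [y for school in rng.sample(schools, n_schools) for y in school]
    return sum(picked) / len(picked)

# Repeat each design 200 times and compare how much the estimates vary.
srs_spread = statistics.pstdev([srs_estimate(1000) for _ in range(200)])
cluster_spread = statistics.pstdev([cluster_estimate(2) for _ in range(200)])
print(f"SRS spread: {srs_spread:.3f}, cluster spread: {cluster_spread:.3f}")
```

In this made-up city the cluster estimates swing far more from sample to sample than the simple random sample estimates, even though both designs survey the same number of students. The ordinary sampling error formula would report the same precision for both.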
One more limitation of sampling error formula
• Non-response bias
• Even if you have drawn a beautifully random sample, your sample estimate will be biased if those who do not return your survey are different on your outcome of interest.
• That’s why Dillman’s advice on getting high response rates is so important!
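The size of the problem can be worked out with simple arithmetic. The sketch below assumes a true "yes" share of 50% and made-up response rates for the two groups; the function name and all of the numbers are hypothetical.

```python
def respondent_share_yes(true_share, response_rate_yes, response_rate_no):
    """Share of 'yes' among people who actually return the survey."""
    yes_returned = true_share * response_rate_yes
    no_returned = (1 - true_share) * response_rate_no
    return yes_returned / (yes_returned + no_returned)

# Same response rate in both groups: the estimate matches the truth.
equal = respondent_share_yes(0.50, 0.60, 0.60)

# "Yes" people respond twice as often: the estimate drifts upward.
unequal = respondent_share_yes(0.50, 0.80, 0.40)

print(f"equal response rates: {equal:.3f}, unequal: {unequal:.3f}")
```

A larger sample does not shrink this gap; only closing the difference in who responds does.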