1 polls predicting 1992 u.s. presidential election outcomes

1

Polls predicting 1992 U.S. presidential election outcomes

Date   Agency                 Clinton  Bush  Perot
10/25  Gallup/CNN             44       38    18
10/26  ABC                    43       36    21
10/27  NBC/Wall St. Journal   46       38    16
10/28  Gallup/CNN             43       40    17
10/31  CBS/New York Times     45       37    18
11/1   Gallup/CNN             47       38    15
11/2   CBS/New York Times     46       38    15
11/2   NBC/Wall St. Journal   46       38    16
11/2   ABC                    45       38    16
11/3   Election Results       43       38    19

2

Polls predicting 1996 U.S. presidential election outcomes

Dates       Agency                 Clinton  Dole  Perot  Other
10/28-31    Hotline                49       40    9      2
10/30-11/2  CBS/New York Times     54       35    9      2
10/31-11/3  Pew Research Center    52       38    8      2
11/1-3      Reuters/Zogby          49       41    8      2
11/1-3      Harris                 51       39    9      1
11/2-3      ABC                    52       39    7      2
11/2-3      NBC/Wall St. Journal   51       38    9      2
11/3-4      Gallup/CNN/USA Today   51       38    9      2
            Election Results       49       41    9      2

3

How many interviews did it take to estimate the behavior of 90 million voters?

• Fewer than 2,000
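Why so few interviews suffice can be seen from the usual margin-of-error formula for a proportion, z·√(p(1 − p)/n). The sketch below assumes simple random sampling, 95% confidence (z = 1.96) and the worst case p = 0.5; real national polls use more complex designs, so the figures are only approximate. It also applies the finite population correction to show that a population of 90 million barely changes the result.

```python
import math
from typing import Optional


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96,
                    N: Optional[int] = None) -> float:
    """Approximate margin of error for a sample proportion under simple random sampling.

    p = 0.5 is the worst case (it maximizes p * (1 - p)); z = 1.96 gives ~95% confidence.
    If a population size N is supplied, the finite population correction is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return z * se


for n in (500, 1000, 1500, 2000):
    moe = margin_of_error(n, N=90_000_000)   # 90 million voters, as on the slide
    print(f"n = {n:5d}: +/- {100 * moe:.1f} percentage points")
```

For n = 500, 1,000, 1,500 and 2,000 this prints margins of about 4.4, 3.1, 2.5 and 2.2 percentage points: precision depends mainly on the sample size, not on the size of the population.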

4

The History of Sampling

• In 1920, the Literary Digest mailed postcards to people in six states, asking whom they were planning to vote for in the presidential election.

• The Digest correctly predicted that Harding would be elected.

• In the elections that followed, the Literary Digest expanded the size of its poll and made correct predictions in 1924, 1928, and 1932.

5

The History of Sampling

• In 1936, the Literary Digest conducted its most ambitious poll: 10 million ballots were sent to people listed in telephone directories and on lists of automobile owners.

• Over 2 million responded, giving the Republican contender, Alf Landon, a 57 to 43 percent landslide over the incumbent, President Roosevelt.

• Election results: Roosevelt won 61% of the vote.

6

The History of Sampling

• Problem: a 22% return rate.

• Part of the explanation for the failure lay in the sampling frame used by the Digest: telephone subscribers and automobile owners.

• Such a design selected a disproportionately wealthy sample.

• The sample effectively excluded poor people, who during the Depression voted predominantly for Roosevelt's New Deal recovery program.

7

The History of Sampling

• In the same year (1936), George Gallup correctly predicted that Roosevelt would beat Landon.

• Gallup's success in 1936 hinged on his use of quota sampling, which is based on knowledge of the characteristics of the population being sampled. People are selected to match the population characteristics.

• Using quota sampling, Gallup successfully predicted the presidential winner in 1940 and 1944.

8

The History of Sampling

• In 1948, Gallup mistakenly picked Thomas Dewey over the incumbent president, Harry Truman.

• Factors that accounted for the 1948 failure:
  (1) Most of the pollsters stopped polling in early October, despite a steady trend toward Truman during the campaign.
  (2) "Undecided" voters went disproportionately for Truman.
  (3) The sample was unrepresentative (a result of quota sampling).

9

The History of Sampling

• The quota sampling technique requires that the researcher know something about the total population.

• For national political polls, such information came primarily from census data.

• By 1948, however, WWII had produced a massive movement from country to city, radically changing the character of the U.S. population, yet Gallup relied on 1940 census data. City dwellers tended to vote Democratic, so the resulting over-representation of rural voters also led to an underestimate of the Democratic vote.

10

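Why sample?

• A sample may be more accurate than a census. Reasons the quality of a sample can exceed that of a census:
  – Accuracy and precision: with fewer observations, each one can be made more carefully and in greater depth. A census of a large population increases the likelihood of nonsampling errors because of the increased volume of work. Sampling reduces errors in data processing; about 90% of error is nonsampling error.
  – EX> The Bureau of the Census uses samples to check the accuracy of the U.S. Census.
• Speed of response: results can be obtained more quickly.
• Cost
• Destructive sampling: EX> testing the life of light bulbs.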

11

Sampling Design

Sample designs
• Nonprobability samples: convenience, judgment, quota, snowball
• Probability samples: simple random, systematic, stratified (proportionate, disproportionate), cluster, multistage

There are no appropriate statistical techniques for measuring random sampling error from a nonprobability sample. Thus, projecting the data beyond the sample is statistically inappropriate.


12

Nonprobability Sampling

• Social research is often conducted in situations where you can't select the kinds of probability samples used in large-scale social surveys.

• Lack of a population list: suppose you wanted to study homelessness. There is no list of all homeless individuals, nor are you likely to create such a list.

13

Convenience Sampling

• Convenience sampling (also called haphazard or accidental sampling) relies on available subjects.

• EX> man-on-the-street interviews; radio call-in programs used to reflect public opinion; talking to friends about their political sentiments.

• EX> a professor uses students as the sample.

• EX> every tenth student entering the university library.

• EX> surveying overseas Chinese for international marketing.

14

Convenience Sampling

• Advantages: very low cost; extensively used; no need for a list of the population.

• It is justified only if the researcher wants to study the characteristics of people passing the sampling point at specified times, or if less risky sampling methods are not feasible.

15

Convenience Sampling

• Problems:
  (1) There is no way of knowing whether those included are representative.
  (2) The variability and bias of estimates cannot be measured or controlled.
  (3) Projecting the results beyond the specific sample is inappropriate.

• It should be used only in exploratory designs, to generate ideas and insights.

• You should alert readers to the risks associated with this method.

16

Judgment Samples (Purposive Samples)

• Sample elements are hand-picked because they are believed to be representative of the population of interest.

• EX> a fashion manufacturer regularly selects a sample of key accounts that it believes are capable of providing the information needed to predict what will sell in the fall.

• EX> the Dow Jones Industrial Average: 30 blue-chip stocks selected out of 1,800 stocks. It is highly correlated with other NYSE indicators on daily percentage price changes.

• EX> representative communities in U.S. presidential elections.

• EX> the CPI: the selection of product items.

17

Snowball sample

• Locate an initial set of respondents. These individuals are then used as informants to identify others with the desired characteristics.

• Appropriate when the members of a special population are difficult to locate; used to estimate characteristics that are hard to find or very rare.

18

Snowball sample

• EX> surveying users of an unusual product: a study among deaf people of a product that would allow them to communicate over the telephone.

• EX> special lifestyles (e.g., surveys of gay and lesbian people), homeless people, gangsters, migrant workers, undocumented immigrants.

• EX> network studies; rare diseases (HIV).

• Bias: a person who is known to someone has a higher probability of being similar to the first person.

19

Quota samples

• Quota sampling selects sample elements so that the proportion of the sample possessing a certain characteristic is approximately the same as the proportion with that characteristic in the population.

Establishing a characteristics matrix: What proportion of the target population is male and female? What proportion of each gender falls into the various age categories, educational levels, ethnic groups, etc.?

Once such a matrix has been created and a relative proportion assigned to each cell, you collect data from people having all the characteristics of a given cell.

All the persons in a given cell are then assigned a weight appropriate to their share of the total population.
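The characteristics matrix can be sketched as a table of cell proportions from which interview quotas and, after fieldwork, cell weights are computed. This is a minimal sketch: the gender × age shares are hypothetical (in practice they would come from census data), and the weight is taken as population share divided by achieved sample share. Note that the code only sets targets and weights; the choice of people inside each cell remains nonrandom.

```python
import math

# Hypothetical joint population shares for a gender x age quota matrix.
# In practice these proportions would come from census data.
POPULATION_SHARES = {
    ("male", "18-39"): 0.20, ("male", "40-64"): 0.18, ("male", "65+"): 0.09,
    ("female", "18-39"): 0.19, ("female", "40-64"): 0.21, ("female", "65+"): 0.13,
}


def quota_targets(shares, sample_size):
    """Number of interviews per cell, proportional to the population shares.

    Largest-remainder rounding makes the quotas add up exactly to sample_size.
    """
    raw = {cell: share * sample_size for cell, share in shares.items()}
    quotas = {cell: math.floor(x) for cell, x in raw.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the leftover interviews to the cells with the largest fractional parts.
    for cell in sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)[:leftover]:
        quotas[cell] += 1
    return quotas


def cell_weights(shares, achieved):
    """Weight per respondent in each cell = population share / achieved sample share."""
    total = sum(achieved.values())
    return {cell: shares[cell] / (achieved[cell] / total) for cell in shares}


quotas = quota_targets(POPULATION_SHARES, 400)
print(quotas)
# Suppose fieldwork fell short in one cell (hypothetical counts):
achieved = dict(quotas)
achieved[("male", "65+")] -= 6
print(cell_weights(POPULATION_SHARES, achieved))
```

Respondents in the under-filled cell receive a weight above 1 and everyone else slightly below 1, so the weighted sample again matches the population proportions.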

20

Quota samples

• Problems:
  – The sample could be far off with respect to other important characteristics.
  – The quota frame must be accurate, and it is often difficult to get up-to-date information for this purpose.

21

Quota samples

• Biases may exist in the selection of sample elements within a given cell. The interviewer has a quota to achieve, but the actual choice of elements is left to the discretion of the individual field worker, and interviewers are prone to follow certain practices.

22

Quota samples

• Those who are similar to the interviewers are more likely to be interviewed.
• Bias toward the accessible (first floors, airline terminals, business districts, college campuses).
• Bias toward households with children, excluding working people.
• Bias against workers in manufacturing (favoring service and administrative workers).
• Bias against extremes of income. EX> "Mansions" were skipped because the interviewer did not feel comfortable knocking on doors that were answered by servants.
• Bias against the less educated.
• Bias against low-status individuals.

23

The logic of probability sampling

• EPSEM (equal probability of selection method): a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample.

• We must realize that even carefully selected EPSEM samples seldom (if ever) perfectly represent the populations from which they are drawn.

24

Probability sampling offers two advantages:

• First, probability samples, although never perfectly representative, are typically more representative than other types of samples because the biases previously discussed are avoided.

• Second, and more important, probability theory permits us to estimate the accuracy or representativeness of the sample.

25

Population and Sample element

• Element: an element is the unit about which information is collected and that provides the basis of analysis.
  – People, families, corporations
  – Usually the same as the unit of analysis

• Population: A population is the theoretically specified aggregation of study elements.

26

Defining the target population

It is vitally important to define the target population carefully so that the proper source from which the data are to be collected can be identified.

Question: "To whom do we want to talk?" What or who will be observed? The answer specifies the tangible characteristics of the population: (1) the definition of the element and (2) the time referent for the study.

• EX> "women of childbearing age" or "females between ages 12 and 50"?
• EX> the adult population of the Taiwan area
• EX> college students: include two-year and three-year junior college programs? Night school? The open university? Graduate students?
• EX> industrial buyer behavior
  – A study incorrectly defined the population as purchasing agents, when in fact industrial engineers within the customer companies had a substantial impact on buying decisions.

27

Defining the study population

• Study population: a study population is the aggregation of elements from which the sample is actually selected.

• Lists of elements are usually somewhat incomplete.

• EX> National surveys often omit Kinmen and Matsu.
• EX> "Sociology professors" defined as "professors teaching in sociology departments."

28

Sampling units

• A sampling unit is the element or set of elements considered for selection in some stage of sampling.

• In a simple single-stage sample, the sampling units are the same as the elements and are probably the units of analysis.

• EX> passengers on a passenger list: sampling unit = element

• In a multistage sample:
  – EX> the airline could first select flights as the sampling unit, then select certain passengers on the previously selected flights.
  – PSU (primary sampling units) = flights
  – Secondary sampling units = passengers

29

Sampling frame

• A sampling frame is the actual list of sampling units from which the sample, or some stage of the sample, is selected. It is also referred to as the working population.

• In single-stage sampling designs, the sampling frame is simply a list of the study population.

• Almost all sampling frames exclude some members of the population. A sampling frame error occurs when certain sample elements are excluded or when the entire population is not accurately represented in the sampling frame.

• We often begin with a population in mind for our study; we then search for possible sampling frames and examine and evaluate the frames available for our use.

30

Observation unit

• An observation unit, or unit of data collection, is an element or aggregation of elements from which information is collected.

• EX) Researcher may interview heads of households (the observation units) to collect information about all members of the households (the units of analysis).

31

Types of Sampling Designs

• Simple Random Sampling

• Systematic Sampling

• Stratified Sampling

• Cluster Sampling

32

Simple Random Sampling

• Simple random sampling is the basic sampling method assumed in the statistical computations of social research.
  – Establish a sampling frame.
  – Assign a single number to each element in the list, not skipping any number in the process.
  – Generate a series of random numbers to select the elements.

• Simple random sampling is seldom used in practice.
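The three steps above map directly onto Python's standard `random` module. In this minimal sketch, each element is numbered implicitly by its position in the frame, and `random.Random.sample` draws distinct random numbers (positions) without replacement; the frame of 1,000 student IDs is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of n elements without replacement.

    Each element of the frame is numbered by its position (0..N-1), and
    n distinct random numbers select the positions, so every subset of
    size n is equally likely.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(len(frame)), n)   # n distinct random numbers
    return [frame[i] for i in sorted(numbers)]


frame = [f"student-{i:04d}" for i in range(1, 1001)]   # hypothetical frame of 1,000 names
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Passing a seed makes the draw reproducible, which is useful when the selection has to be documented or audited.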

33

Systematic Sampling

• A systematic sample with a random start: a procedure in which an initial starting point is selected by a random process, and then every kth element on the list is selected.

• Sampling interval: the number of population elements between the units selected for the sample.

• Sampling interval = population size / sample size
• Sampling ratio = sample size / population size

• Systematic sampling is virtually identical to simple random sampling. If the list of elements is indeed randomized before sampling, one might argue that a systematic sample drawn from that list is in fact a simple random sample.

• Systematic sampling is much easier to conduct.
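A random-start systematic sample can be sketched in a few lines. This version is an assumption of convenience, not the only way to do it: it allows a fractional interval k = N / n and keeps the arithmetic in integers, so every element has the same selection probability n / N even when N is not a multiple of n. The list of 1,000 numbered elements is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic sample of n elements with a random start.

    The sampling interval is k = N / n. Position i (i = 0..n-1) is
    floor(r + i * k) for a random start r in [0, k); r is drawn on a grid of
    spacing 1/n by taking r = start / n with start uniform on 0..N-1, which
    keeps the arithmetic in integers. When N is a multiple of n this is exactly
    "pick a random start between 0 and k-1, then take every kth element", and in
    every case each element has selection probability n / N.
    """
    N = len(frame)
    start = random.Random(seed).randrange(N)
    return [frame[(start + i * N) // n] for i in range(n)]


frame = list(range(1, 1001))            # hypothetical list of 1,000 numbered elements
N, n = len(frame), 50
print("sampling interval k =", N / n)   # 20.0
print("sampling ratio =", n / N)        # 0.05
print(systematic_sample(frame, n, seed=3)[:5])
```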

34

Problem of periodicity

• The arrangement of elements in the list can make systematic sampling unwise.

• EX> collecting retail sales information every seventh day (always a Monday).

• EX> when the list is not randomly distributed: a list of contributors (donors) ranked by the amount of their donations.

• EX> apartment numbers.
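The retail-sales example can be simulated. In this sketch the sales figures are hypothetical: Monday sells 200 units and every other day 100, over 8 weeks. With an interval of 7, every possible systematic sample falls on a single weekday, so no sample comes close to the true mean; an interval of 8 cycles through all weekdays.

```python
# Hypothetical daily sales for 8 weeks; day 0 of each week (Monday) is busy.
daily_sales = [200 if day == 0 else 100 for week in range(8) for day in range(7)]
true_mean = sum(daily_sales) / len(daily_sales)


def all_systematic_means(values, k):
    """Mean of each possible systematic sample with integer interval k
    (one sample for every start position 0..k-1)."""
    means = []
    for start in range(k):
        picked = values[start::k]
        means.append(sum(picked) / len(picked))
    return means


print("true mean:", round(true_mean, 2))
print("k = 7:", all_systematic_means(daily_sales, 7))   # each sample hits one weekday only
print("k = 8:", [round(m, 2) for m in all_systematic_means(daily_sales, 8)])
```

With k = 7 the only possible sample means are 200 and 100, while the true mean is about 114.29; with k = 8 every possible sample reproduces the true mean.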

35

Stratified Random Sampling

• Recall that sampling error can be reduced by:
  (1) increasing the sample size;
  (2) sampling a homogeneous population, since a homogeneous population produces samples with smaller sampling errors than does a heterogeneous population.

• The logic of stratified sampling: rather than selecting your sample from the total population at large, you ensure that appropriate numbers of elements are drawn from homogeneous subsets of that population.

36

Stratified Random Sampling

• The parent population is divided into mutually exclusive and exhaustive subsets.

• A simple random sample of elements is chosen independently from each group or subset.

• The aim is to organize the population into homogeneous subsets and to select the appropriate number of elements from each: first divide the population into several strata, then use random sampling within each stratum to draw a subsample.
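The two steps (partition into strata, then an independent simple random sample within each) can be sketched as follows. The student list and the per-stratum sample sizes are hypothetical; the sizes happen to be proportional here, but any allocation can be passed in.

```python
import random


def stratified_random_sample(population, stratum_of, allocation, seed=None):
    """Draw an independent simple random sample within each stratum.

    population : list of elements
    stratum_of : function giving each element's stratum label; since every element
                 gets exactly one label, the strata are mutually exclusive and exhaustive
    allocation : dict {stratum label: number of elements to draw from that stratum}
    """
    rng = random.Random(seed)
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    return {label: rng.sample(members, allocation[label])
            for label, members in strata.items()}


# Hypothetical student list stratified by class (freshmen through seniors).
classes = ["freshman"] * 300 + ["sophomore"] * 300 + ["junior"] * 200 + ["senior"] * 200
students = [(f"s{i:04d}", c) for i, c in enumerate(classes)]
sample = stratified_random_sample(students, lambda s: s[1],
                                  {"freshman": 15, "sophomore": 15, "junior": 10, "senior": 10},
                                  seed=7)
for label, members in sample.items():
    print(label, len(members))
```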

37

Stratified Random Sampling

• EX> urban and rural groups differ widely in their attitudes toward energy conservation, while members within each group hold very similar attitudes.

• EX> divide the university by college class (freshmen, sophomores, juniors, seniors).

• In selecting stratification variables, you should be concerned primarily with those that are presumably related to the variables you want to represent accurately, such as sex, education, geographic location, etc.

• EX> estimate income stratified by educational level.

38

Stratified Random Sampling

• When elements differ greatly between strata but little within strata (homogeneous within strata), stratified sampling gives better results (the sampling error is smaller).

• The investigator should divide the population into strata so that the elements within any given stratum are as similar in value as possible and the values between any two strata are as disparate as possible.

• In the limit, if the investigator is successful in partitioning the population so that the elements in each stratum are exactly equal, there will be no error associated with the estimate of the population parameters.

39

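Increased precision of stratified samples

• EX> N = 1,000, with the following distribution:

  x      f(x)
  5       200
  10      300
  20      500
  Total 1,000

• Mean = 5(.2) + 10(.3) + 20(.5) = 14; variance = (5 − 14)²(.2) + (10 − 14)²(.3) + (20 − 14)²(.5) = 16.2 + 4.8 + 18 = 39.

• Suppose that a researcher was able to partition the total population so that all the elements with a value of 5 were in one stratum, those with a value of 10 were in the second, and those with a value of 20 were in the third.

• Take a proportionate stratified sample of n = 10 (2, 3, and 5 elements from the three strata).

• Or select a sample of n = 3 (one element per stratum) and calculate the weighted average.

• Either way, the estimate equals the population mean of 14 exactly, because there is no variation within any stratum.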
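The numbers in this example can be checked directly. The sketch below rebuilds the population from the table, computes the mean and variance, and compares the standard error of a simple random sample mean with n = 10 (the standard formula σ²/n · (N − n)/(N − 1), not given on the slide) against the stratified estimates, which have no error at all when the strata are perfectly homogeneous.

```python
import math
import random

# The slide's population: value -> number of elements with that value (N = 1,000).
N_h = {5: 200, 10: 300, 20: 500}
strata = {value: [value] * count for value, count in N_h.items()}
population = [x for members in strata.values() for x in members]
N = len(population)

mu = sum(population) / N                                 # 14.0
sigma2 = sum((x - mu) ** 2 for x in population) / N      # 39.0


def srs_standard_error(n):
    """Standard error of a simple random sample mean drawn without replacement."""
    return math.sqrt(sigma2 / n * (N - n) / (N - 1))


def stratified_estimate(sample_by_stratum):
    """Weighted average of the stratum sample means, with weights W_h = N_h / N."""
    return sum(N_h[h] / N * (sum(xs) / len(xs)) for h, xs in sample_by_stratum.items())


rng = random.Random(0)
print("SE of an SRS mean with n = 10:", round(srs_standard_error(10), 2))  # 1.97
# Proportionate stratified sample, n = 10: 2, 3 and 5 elements from the strata.
prop = {h: rng.sample(strata[h], k) for h, k in {5: 2, 10: 3, 20: 5}.items()}
print(stratified_estimate(prop))       # 14.0
# n = 3: one element from each stratum, then the weighted average.
one_each = {h: rng.sample(strata[h], 1) for h in strata}
print(stratified_estimate(one_each))   # 14.0
```

A simple random sample of 10 would typically miss the mean by about ±2, while even the 3-element stratified sample hits it exactly.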

40

Proportional stratified sample

• Proportional stratified sample: the number of sampling units drawn from each stratum is in proportion to the relative population size of that stratum.

• Method 1: (1) Sort the population into discrete groups. (2) On the basis of the relative proportion of the population represented by a given group, select elements from that group constituting the same proportion of your desired sample size.

• Method 2: (1) Group the elements and put the groups together in a continuous list (an ordered list, if it has no periodicity, is sometimes better than a randomized list; this is implicit stratification in systematic sampling). (2) Select a systematic sample from the entire list.
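The second method (implicit stratification) can be sketched by sorting the list by stratum and then drawing a single random-start systematic sample. The 300 urban / 700 rural household list is hypothetical. Because each stratum occupies a contiguous run of the ordered list, it receives its proportionate share of the sample (here exactly 15 of 50 urban households), whatever the random start.

```python
import random


def implicitly_stratified_sample(population, stratum_of, n, seed=None):
    """Order the list by stratum, then draw one systematic sample with a random start.

    Each stratum occupies a contiguous run of the ordered list, so it receives its
    proportionate share n * N_h / N of the sample, rounded up or down.
    Positions use an integer form of a random-start systematic sample:
    floor((start + i * N) / n), with start uniform on 0..N-1.
    """
    ordered = sorted(population, key=stratum_of)
    N = len(ordered)
    start = random.Random(seed).randrange(N)
    return [ordered[(start + i * N) // n] for i in range(n)]


# Hypothetical list of 1,000 households: 300 urban, 700 rural, in arbitrary order.
households = [("urban" if i % 10 < 3 else "rural", i) for i in range(1000)]
sample = implicitly_stratified_sample(households, lambda h: h[0], 50, seed=11)
print(sum(1 for h in sample if h[0] == "urban"), "urban of", len(sample))
```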

41

Disproportionate stratified sampling

• Balances two criteria: strata size and strata variability. Strata exhibiting more variability are sampled more than proportionately to their relative size; strata that are very homogeneous are sampled less than proportionately.
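One standard way to make "more variability, more than proportionate sampling" precise is Neyman allocation, n_h ∝ N_h·S_h, where S_h is the standard deviation within stratum h. The slide does not name this rule, and this sketch ignores differences in cost per interview. The firm strata and standard deviations are hypothetical.

```python
import math


def allocate(n, sizes, sds=None):
    """Split a total sample of n across strata.

    sizes : dict stratum -> N_h (stratum population size)
    sds   : optional dict stratum -> S_h (within-stratum standard deviation).
            With sds, use Neyman allocation, n_h proportional to N_h * S_h;
            without it, use proportional allocation, n_h proportional to N_h.
    Largest-remainder rounding keeps the allocations summing exactly to n.
    """
    score = {h: sizes[h] * (sds[h] if sds else 1) for h in sizes}
    total = sum(score.values())
    raw = {h: n * s / total for h, s in score.items()}
    alloc = {h: math.floor(x) for h, x in raw.items()}
    leftover = n - sum(alloc.values())
    for h in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata: firms by size, with annual sales far more variable among large firms.
sizes = {"small": 8000, "medium": 1500, "large": 500}
sds = {"small": 10, "medium": 40, "large": 200}
print("proportional:", allocate(400, sizes))        # {'small': 320, 'medium': 60, 'large': 20}
print("Neyman:      ", allocate(400, sizes, sds))   # {'small': 133, 'medium': 100, 'large': 167}
```

The large firms are only 5% of the population but, being the most variable stratum, receive over 40% of the sample under Neyman allocation.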

42

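Multistage cluster sampling

• Used when it is either impossible or impractical to compile an exhaustive list of the elements composing the target population.

• The population is first divided into clusters, and the clusters are then sampled as the sampling units.

• EX) Taiwan is stratified by degree of urbanization. Within each urbanization stratum, villages and boroughs serve as clusters for a first-stage random sample of villages/boroughs; a second-stage random sample of households is then drawn within the sampled villages/boroughs.

• EX) census blocks → sample of blocks → sample of households → sample of individuals.

• EX> a simple random sample of high school students in Taiwan would require a list of every student; with cluster sampling, no initial list of all students is required.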
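The village/household example can be sketched as a two-stage draw. In this minimal sketch the frame of 30 villages with 40 households each is hypothetical. With equal-sized clusters and a fixed number of households per selected village, every household has the same chance of selection (5/30 × 8/40 here); with clusters of unequal size, practical designs often select clusters with probability proportional to size instead.

```python
import random


def two_stage_cluster_sample(clusters, m, k, seed=None):
    """Two-stage cluster sample.

    Stage 1: simple random sample of m clusters (e.g., villages).
    Stage 2: simple random sample of k elements (e.g., households) within each
             selected cluster. Only the selected clusters need a household list.
    clusters : dict cluster id -> list of the elements in that cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {c: rng.sample(clusters[c], k) for c in chosen}


# Hypothetical frame: 30 villages with 40 households each.
villages = {f"village-{v:02d}": [f"v{v:02d}-hh{h:02d}" for h in range(40)] for v in range(30)}
sample = two_stage_cluster_sample(villages, m=5, k=8, seed=3)
for village, hhs in sample.items():
    print(village, hhs[:3], "...")
```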

43

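Multistage cluster sampling

Stratified vs. cluster sampling:
• Stratified sampling: few strata, each containing many units. Every stratum has at least one unit selected into the sample; only some of the units within each stratum are selected.
• Cluster sampling: many clusters, each containing relatively few units. Only some clusters are selected into the sample; within the selected clusters, either every unit is observed (a census) or a further subsample is drawn.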

44

Multistage cluster sampling

The price of efficiency is a less accurate sample: a simple random sample drawn from a population list is subject to a single sampling error, but a two-stage cluster sample is subject to two sampling errors (EX> selecting a sample of disproportionately wealthy city blocks, plus a sample of disproportionately wealthy households within those blocks).

Tradeoff: With a given total sample size, if the number of clusters is increased, the number of elements within a cluster must be decreased. The representativeness of the clusters is increased at the expense of more poorly representing the elements composing each cluster.
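The tradeoff can be made concrete with Kish's design-effect approximation, deff ≈ 1 + (m − 1)ρ, where m is the number of elements per cluster and ρ is the intraclass correlation (how alike elements in the same cluster are). The formula is not in the slides; it is a standard textbook approximation, and the total sample size and ρ below are hypothetical.

```python
def design_effect(m, rho):
    """Kish's approximation for the variance inflation of a cluster sample
    with m elements per cluster and intraclass correlation rho."""
    return 1 + (m - 1) * rho


n_total, rho = 1200, 0.05      # hypothetical total sample size and intraclass correlation
for m in (5, 10, 40, 100):
    clusters = n_total // m
    deff = design_effect(m, rho)
    print(f"{clusters:4d} clusters x {m:3d} elements: deff = {deff:.2f}, "
          f"effective n = {n_total / deff:.0f}")
```

With the total fixed at 1,200, fewer but larger clusters inflate the variance: 240 clusters of 5 behave like about 1,000 independent interviews, while 12 clusters of 100 behave like only about 200.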

45

Comparisons of sampling techniques: Description

• Simple random: Researcher assigns each member of the sampling frame a number, then selects sample units by a random method.
• Systematic: Researcher uses the natural ordering or order of the sampling frame, selects a random starting point, then selects items at a preselected interval.
• Stratified: Researcher divides the population into groups and randomly selects subsamples from each group.
• Cluster: Researcher selects sampling units at random, then does complete observation of all units in the group.
• Multistage: Progressively smaller areas are selected in each stage. Researcher performs some combination of the first four techniques.

46

Comparisons of sampling techniques: Cost and Degree of Use

• Simple random: High cost; not frequently used in practice (except in random-digit dialing)
• Systematic: Moderate cost; moderately used
• Stratified: High cost; moderately used
• Cluster: Low cost; frequently used
• Multistage: High cost; frequently used, especially in nationwide surveys

47

Comparisons of sampling techniques: Advantages

• Simple random: Only minimal advance knowledge of the population needed; easy to analyze data and compute error
• Systematic: Simple to draw sample; easy to check
• Stratified: Assures representation of all groups in the sample; characteristics of each stratum can be estimated and comparisons made; reduces variability for the same sample size
• Cluster: If clusters are geographically defined, yields the lowest field cost; requires a listing of all clusters, but of individuals only within the selected clusters; can estimate characteristics of clusters as well as of the population
• Multistage: Depends on the techniques combined

48

Comparisons of sampling techniques: Disadvantages

• Simple random: Requires a sampling frame to work from; does not use knowledge of the population that the researcher may have; larger errors for the same sample size than stratified sampling; respondents may be widely dispersed, hence higher cost
• Systematic: If the sampling interval is related to a periodic ordering of the population, may introduce increased variability
• Stratified: Requires accurate information on the proportion in each stratum; if stratified lists are not already available, they can be costly to prepare
• Cluster: Larger error for a comparable sample size than other probability samples; the researcher must be able to assign population members to a unique cluster, or duplication or omission of individuals results
• Multistage: Depends on the techniques combined

49

Sampling Bias

• A sample is biased if it is obtained by a method that favors the selection of elementary units having particular characteristics.

50

Sampling Error or Error of Estimation

Let $\hat{\theta}$ be a sample estimate of some population parameter $\theta$. The sampling error, or error of estimation, is equal to $\hat{\theta} - \theta$.

51

Error in survey research

• Random sampling error
• Systematic (nonsampling) error
  – Respondent error
      Nonresponse error
        Self-selection bias
      Response bias
        Deliberate falsification
        Unconscious misrepresentation
        Acquiescence bias
        Extremity bias
        Interviewer bias
        Auspices bias
        Social desirability bias
        Contamination by others
  – Administrative error
      Data processing error
      Sample selection error
      Interviewer error
      Interviewer cheating

52

Random Sampling Error

A statistical fluctuation that occurs because of chance variation in the elements selected for a sample.

It can be estimated, and it can be reduced by increasing the sample size.
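Both properties can be seen by simulation. This sketch draws repeated simple random samples from a hypothetical population (mean about 50, standard deviation about 10), records the sampling error (sample mean minus population mean, as defined under Sampling Error), and shows that the errors average out near zero while their spread shrinks roughly as 10/√n.

```python
import random
import statistics


def sampling_errors(population, n, reps, seed=None):
    """Sampling errors (theta_hat - theta) of the sample mean over repeated SRS draws."""
    rng = random.Random(seed)
    theta = statistics.fmean(population)          # population parameter
    return [statistics.fmean(rng.sample(population, n)) - theta for _ in range(reps)]


# Hypothetical population of 100,000 values with mean ~50 and standard deviation ~10.
gen = random.Random(1)
population = [gen.gauss(50, 10) for _ in range(100_000)]

spread = {}
for n in (25, 100, 400):
    errs = sampling_errors(population, n, reps=2000, seed=n)
    spread[n] = statistics.pstdev(errs)
    print(f"n = {n:3d}: mean error {statistics.fmean(errs):+.3f}, spread {spread[n]:.2f}")
```

Quadrupling the sample size roughly halves the typical sampling error (spreads near 2, 1 and 0.5), which is why this component of error can be both estimated and controlled.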

53

Systematic Error (nonsampling errors)

• Imprecise measurement instruments: an imperfect aspect of the research design.
• Errors in carrying out the measurement: mistakes in the execution of the research.

A sample bias exists when the results of a sample show a consistent tendency to deviate in one direction from the true value of the population parameter.

Two general categories:
(1) Respondent error: nonresponse error + response bias
(2) Administrative error

54

Non-response error

• The statistical difference between a survey that includes only those who responded and a survey that also includes those who failed to respond.

• Non-respondent: a person who is not contacted or who refuses to cooperate.

• 1. Not-at-homes (EX> married women)
• 2. Refusals: persons who are unwilling to participate.

55

Non-response error

• To identify the extent of nonresponse error, business researchers often select a sample of nonrespondents who are then recontacted: a small subsample of nonrespondents receives a callback or follow-up, and this subsample is compared with the respondents.

• Comparing the demographics of the sample with the demographics of the target population (population statistics) is one means of inspecting for possible bias.

EX) the percentage of companies with 500 or more employees; the proportion of elderly people in the population. EX> a sample drawn from educational or personnel records.
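The callback comparison can be turned into a bias estimate with a standard identity (not stated on the slide): the respondent-only mean differs from the full-sample mean by (nonresponse rate) × (respondent mean − nonrespondent mean). This sketch assumes the callback subsample represents all nonrespondents; the TV-viewing figures are hypothetical.

```python
def nonresponse_bias(resp_mean, nonresp_mean, response_rate):
    """Bias of the respondent-only mean relative to the mean for the full sample.

    Full-sample mean = r * resp_mean + (1 - r) * nonresp_mean, so
    resp_mean - full mean = (1 - r) * (resp_mean - nonresp_mean).
    """
    return (1 - response_rate) * (resp_mean - nonresp_mean)


# Hypothetical figures: respondents report 6.0 hours of TV per day, a callback
# subsample of nonrespondents reports 4.0, and 60% of the sample responded.
r, y_resp, y_callback = 0.60, 6.0, 4.0
full_mean = r * y_resp + (1 - r) * y_callback
print("estimated full-sample mean:", round(full_mean, 2))                                   # 5.2
print("bias of respondents-only mean:", round(nonresponse_bias(y_resp, y_callback, r), 2))  # 0.8
```

The bias vanishes when respondents and nonrespondents are alike, however low the response rate, and grows with both the nonresponse rate and the gap between the two groups.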

56

Self-selection bias

• EX> Who is more likely to respond to a customer satisfaction survey left on the dining table?

• EX> PC software: experts' views on how "user friendly" it is might be more critical.

• Self-selection biases the survey because it allows extreme positions to be over-represented while those who are indifferent are under-represented.

57

Deliberate falsification

Respondents may deliberately give false answers in order to:
• Appear intelligent. EX> asked the price of a good, they are reluctant to say "can't remember."
• Conceal personal information. EX> income, political attitudes.
• Avoid embarrassment. EX> sexual behavior, smoking/drinking.
• Get rid of the interviewer because they have become bored.
• Avoid expressing negative feelings. EX> in an employee survey, to safeguard their jobs.
• Please the interviewer.
• Conform to their perception of the average person (the "average man" hypothesis). EX> number of hours worked.

58

Unconscious Misrepresentation

• In the absence of strong preferences, respondents will choose answers that justify their behavior. EX> Which PC is better? An in-flight survey concerning aircraft preference.

• They misunderstand the question. EX> in the Philippines, "Colgate" is used to mean toothpaste in general.

• They have never thought about the question. EX> buying intentions, intentions to quit.

• They forget the exact details. EX> "When was the last time you…?" "How many times did you…?"

59

Acquiescence bias

• A tendency to agree with all questions or to indicate a positive connotation ("yea-sayers"; the opposite tendency produces "nay-sayers").

• EX> Japanese respondents may not wish to contradict others.

• Particularly prominent for ideas previously unfamiliar to the respondents.

60

Extremity bias (or avoidance of extreme positions)

Consistently low or high scores are given to every question.

EX) student evaluations of the class.

61

Interviewer bias

• Bias due to the influence of the interviewer (even the interviewer's mere presence):
  – Respondents provide the "right" answer to please the interviewer.
  – Respondents try to appear intelligent and wealthy to "save face."
  – The interviewer's age, sex, tone of voice, facial expressions, or other nonverbal characteristics.

• Will the interviewer's gender make a difference when asking the following questions?
  EX) "Viagra has been approved for sale. Would you be interested in trying it?"
  EX) "Please rank the newspaper sections you usually spend the most time reading…?"

• Interviewers may shorten or rephrase questions.

62

Auspices bias

• Bias in the responses of subjects caused by the respondents being influenced by the organization conducting the study.

EX) A survey on the minimum wage commissioned by the Taiwan Labour Front vs. one commissioned by the Council of Labor Affairs.

63

Social desirability bias

Bias in the responses of subjects caused by the respondent's desire, either conscious or unconscious, to gain prestige or to appear in a different social role.

EX> inflated income; "Have you ever been fired from a job?"; "Do you have roaches in your home?"; "How many times do you brush your teeth per day?"

• Likelihood for social desirability bias: face-to-face > telephone > mail

64

Contamination by others

• EX> answering a question on satisfaction with the family (marital) relationship in the presence of a spouse.

65

Administrative error

• Data processing error
• Sample selection error: EX> unlisted telephone respondents; stopping respondents during daytime hours in a shopping center, which excludes working women; the wrong household member answering the phone, etc.
• Interviewer error: checking the wrong response, not being able to write fast enough to record answers, selective perception (taking liberties in interpreting questions; specific words may unconsciously be emphasized).
• Interviewer cheating (deliberate subversion):
  – filling in the answers to certain questions, or skipping questions, in order to finish the questionnaire as soon as possible.
  – Remedy: mini-reinterviews, in which a percentage of respondents are called upon to verify the data.

66

What can be done to reduce error:
• Questionnaire design: to reduce response bias
• Sampling: to control random sampling error
• Interviewer training
• Use rule-of-thumb estimates for systematic error based on the results of other studies (areas); create benchmark figures or standards of comparison.
  EX> ½ of those who say they "will definitely buy" within the next three months actually do make a purchase. For durables, the figure is 1/3. Those who say they "will probably buy" a durable make essentially no actual purchases.
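Applying such benchmarks is simple arithmetic. This sketch encodes the slide's conversion rates; it assumes the ½ rate applies to products other than durables (labeled "general" here) and deliberately leaves out combinations the slide gives no rate for, raising an error instead of guessing. The refrigerator survey counts are hypothetical.

```python
# Benchmark conversion rates quoted on the slide. The 1/2 rate is assumed here to
# apply to products other than durables ("general"); no rate is given for
# "will probably buy" non-durables, so that combination is deliberately left out.
CONVERSION = {
    ("will definitely buy", "general"): 1 / 2,
    ("will definitely buy", "durable"): 1 / 3,
    ("will probably buy", "durable"): 0.0,
}


def expected_buyers(intentions, product_type):
    """Convert counts of stated purchase intentions into an expected number of buyers.

    intentions : dict answer -> number of respondents giving that answer.
    Raises KeyError for an answer with no benchmark rather than guessing a rate.
    """
    return sum(count * CONVERSION[(answer, product_type)]
               for answer, count in intentions.items())


# Hypothetical survey of 1,000 people about a new refrigerator (a durable good).
survey = {"will definitely buy": 300, "will probably buy": 500}
print(expected_buyers(survey, "durable"))   # about 100 buyers, not 800
```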
