essentials of marketing research chapter 13: determining sample size
Essentials of Marketing Research
Chapter 13:
Determining Sample Size
WHAT DO STATISTICS MEAN?
• DESCRIPTIVE STATISTICS
– NUMBER OF PEOPLE
– TRENDS IN EMPLOYMENT
– DATA
• INFERENTIAL STATISTICS
– MAKE AN INFERENCE ABOUT A POPULATION FROM A SAMPLE
POPULATION PARAMETER VERSUS SAMPLE STATISTICS
POPULATION PARAMETER
• VARIABLES IN A POPULATION
• MEASURED CHARACTERISTICS OF A POPULATION
• GREEK LOWER-CASE LETTERS AS NOTATION, e.g., µ or σ
SAMPLE STATISTICS
• VARIABLES IN A SAMPLE
• MEASURES COMPUTED FROM SAMPLE DATA
• ENGLISH LETTERS FOR NOTATION
– e.g., X̄ or S
MAKING DATA USABLE
• Data must be organized into:
– FREQUENCY DISTRIBUTIONS
– PROPORTIONS
– CENTRAL TENDENCY
• MEAN, MEDIAN, MODE
– MEASURES OF DISPERSION
• range, deviation, standard deviation, variance
Frequency Distribution of Deposits
Amount             Frequency   Percent   Probability
Under $3,000       499         16        .16
$3,000-$4,999      530         17        .17
$5,000-$9,999      562         18        .18
$10,000-$14,999    718         23        .23
$15,000 or more    811         26        .26
Total              3,120       100       1.00
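The Percent and Probability columns follow directly from the Frequency column. A minimal Python sketch, using the deposit frequencies from the table (the variable names are illustrative, not part of the slides):

```python
# Frequencies taken from the deposit table in the slides.
frequencies = {
    "Under $3,000": 499,
    "$3,000-$4,999": 530,
    "$5,000-$9,999": 562,
    "$10,000-$14,999": 718,
    "$15,000 or more": 811,
}

total = sum(frequencies.values())

# Probability is the relative frequency; percent is that value times 100, rounded.
probabilities = {amount: f / total for amount, f in frequencies.items()}
percents = {amount: round(100 * p) for amount, p in probabilities.items()}

for amount, f in frequencies.items():
    print(f"{amount:<18}{f:>6}{percents[amount]:>5}{probabilities[amount]:>7.2f}")
print(f"{'Total':<18}{total:>6}")
```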
MEASURES OF CENTRAL TENDENCY
• MEAN - ARITHMETIC AVERAGE
• MEDIAN - MIDPOINT OF THE DISTRIBUTION
• MODE - THE VALUE THAT OCCURS MOST OFTEN
Number of Sales Calls Per Day by Salespersons
Salesperson   Number of sales calls
Mike          4
Patty         3
Billie        2
Bob           5
John          3
Frank         3
Chuck         1
Samantha      5
Total         26
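The three measures of central tendency can be checked against the sales-call figures. A short sketch using Python's standard `statistics` module (the dictionary name is illustrative):

```python
import statistics

# Sales calls per day, from the salesperson table in the slides.
sales_calls = {
    "Mike": 4, "Patty": 3, "Billie": 2, "Bob": 5,
    "John": 3, "Frank": 3, "Chuck": 1, "Samantha": 5,
}

values = list(sales_calls.values())

mean_calls = statistics.mean(values)      # arithmetic average: 26 / 8
median_calls = statistics.median(values)  # midpoint of the sorted values
mode_calls = statistics.mode(values)      # value that occurs most often

print(mean_calls, median_calls, mode_calls)
```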
Sales for Products A and B, Both Average 200
Product A   Product B
196         150
198         160
199         176
199         181
200         192
200         200
200         201
201         202
201         213
201         224
202         240
202         261
MEASURES OF DISPERSION
• THE RANGE
• STANDARD DEVIATION
Low Dispersion Versus High Dispersion
[Figure: two frequency plots, one showing low dispersion and one showing high dispersion. Horizontal axis: Value on Variable (150 to 210). Vertical axis: Frequency (1 to 5).]
Standard Deviation
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
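To see why the mean alone is not enough, the range and the sample standard deviation (with the n - 1 divisor) can be computed for the Product A and Product B sales figures. This is a minimal sketch; the list names are illustrative:

```python
import statistics

# Sales figures for the two products, from the slides.
product_a = [196, 198, 199, 199, 200, 200, 200, 201, 201, 201, 202, 202]
product_b = [150, 160, 176, 181, 192, 200, 201, 202, 213, 224, 240, 261]

def describe(values):
    """Return (mean, range, sample standard deviation) for a list of numbers."""
    value_range = max(values) - min(values)
    # statistics.stdev divides by n - 1, the sample form of the formula.
    return statistics.mean(values), value_range, statistics.stdev(values)

mean_a, range_a, stdev_a = describe(product_a)
mean_b, range_b, stdev_b = describe(product_b)

print(f"Product A: mean={mean_a:.1f} range={range_a} s={stdev_a:.2f}")
print(f"Product B: mean={mean_b:.1f} range={range_b} s={stdev_b:.2f}")
```

Both products average about 200, but Product B's range and standard deviation are far larger.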
THE NORMAL DISTRIBUTION
• NORMAL CURVE
• BELL-SHAPED
• ALMOST ALL OF ITS VALUES ARE WITHIN PLUS OR MINUS 3 STANDARD DEVIATIONS
• I.Q. IS AN EXAMPLE
NORMAL DISTRIBUTION
[Figure: normal curve centered on the mean. Areas of successive one-standard-deviation bands, left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%.]
Normal Distribution
An example of the distribution of Intelligence Quotient (IQ) scores
[Figure: the same curve and band areas, with IQ on the horizontal axis marked at 70, 85, 100, 115, 130.]
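The band percentages in the figure can be reproduced from the normal cumulative distribution. The sketch below assumes a mean of 100 and a standard deviation of 15, read off the IQ axis (70, 85, 100, 115, 130); the outer bounds of 55 and 145 (three standard deviations) are likewise an assumption.

```python
from statistics import NormalDist

# Assumed IQ distribution: mean 100, standard deviation 15 (inferred from the axis).
iq = NormalDist(mu=100, sigma=15)

# One-standard-deviation bands from -3 to +3 standard deviations.
bands = [(55, 70), (70, 85), (85, 100), (100, 115), (115, 130), (130, 145)]

# Area of each band = difference of the cumulative probabilities at its edges.
band_percents = [100 * (iq.cdf(high) - iq.cdf(low)) for low, high in bands]

for (low, high), pct in zip(bands, band_percents):
    print(f"IQ {low}-{high}: {pct:.2f}%")
```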
STANDARDIZED NORMAL DISTRIBUTION
• SYMMETRICAL ABOUT ITS MEAN
• MEAN IDENTIFIES HIGHEST POINT
• INFINITE NUMBER OF CASES - A CONTINUOUS DISTRIBUTION
• AREA UNDER CURVE HAS A PROBABILITY DENSITY = 1.0
• MEAN OF ZERO, STANDARD DEVIATION OF 1
A STANDARDIZED NORMAL CURVE
[Figure: standardized normal curve; horizontal axis marked at -2, -1, 0, 1, 2.]
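Any normally distributed value can be placed on the standardized curve by subtracting the mean and dividing by the standard deviation. The slides do not show this step, so the sketch below is an illustration only; the IQ mean of 100 and standard deviation of 15 are assumptions taken from the IQ axis.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / std_dev

# Assumed IQ parameters (mean 100, standard deviation 15).
IQ_MEAN = 100
IQ_STD_DEV = 15

z_130 = z_score(130, IQ_MEAN, IQ_STD_DEV)  # two standard deviations above the mean
z_85 = z_score(85, IQ_MEAN, IQ_STD_DEV)    # one standard deviation below the mean

print(z_130, z_85)
```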
STANDARDIZED SCORES
• POPULATION DISTRIBUTION
• SAMPLE DISTRIBUTION
• SAMPLING DISTRIBUTION
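POPULATION DISTRIBUTION
[Figure: population distribution curve; horizontal axis x.]
SAMPLE DISTRIBUTION
[Figure: sample distribution curve, labeled with X̄ and S.]
SAMPLING DISTRIBUTION
[Figure: sampling distribution curve, labeled with µ_X̄ and S_X̄.]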
STANDARD ERROR OF THE MEAN
STANDARD DEVIATION OF THE SAMPLING DISTRIBUTION
CENTRAL LIMIT THEOREM
PARAMETER ESTIMATES
• POINT ESTIMATES
• CONFIDENCE INTERVAL ESTIMATES
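The difference between the two kinds of estimate can be sketched in a few lines. The mean of 12,000 miles and standard deviation of 3,000 miles are borrowed from the example later in the deck; the sample size of 100 is a hypothetical value chosen only for illustration.

```python
import math

sample_mean = 12_000   # point estimate of the population mean (miles)
sample_std_dev = 3_000 # sample standard deviation (miles)
n = 100                # hypothetical sample size, not from the slides
z = 1.96               # z value for 95% confidence

# Standard error of the mean, then the margin around the point estimate.
standard_error = sample_std_dev / math.sqrt(n)
margin = z * standard_error

lower, upper = sample_mean - margin, sample_mean + margin
print(f"Point estimate: {sample_mean}")
print(f"95% confidence interval: {lower:.0f} to {upper:.0f}")
```

The point estimate is a single number; the interval estimate states a range around it together with a confidence level.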
RANDOM SAMPLING ERROR AND SAMPLE SIZE ARE RELATED
SAMPLE SIZE
• VARIANCE (STANDARD DEVIATION)
• MAGNITUDE OF ERROR
• CONFIDENCE LEVEL
Determining Sample Size
Recap
Sample Accuracy
• How close the sample’s profile is to the true population’s profile
• Sample size is not related to representativeness.
• Sample size is related to accuracy
Methods of Determining Sample Size
• Compromise between what is theoretically perfect and what is practically feasible.
• Remember, the larger the sample size, the more costly the research.
• Why sample one more person than necessary?
Methods of Determining Sample Size
• Arbitrary
– Rule of thumb (e.g., a sample should be at least 5% of the population to be accurate)
– Not efficient or economical
• Conventional
– Assumes there is some “convention” or number believed to be the right size
– Easy to apply, but can end up with too small or too large a sample
Methods of Determining Sample Size
• Cost Basis
– based on budgetary constraints
• Statistical Analysis
– certain statistical techniques require a certain number of respondents
• Confidence Interval
– theoretically the most correct method
Notion of Variability
[Figure: two curves, one showing great variability and one showing little variability, with the mean marked.]
Notion of Variability
• Standard Deviation
– approximates the average distance away from the mean for all respondents to a specific question
– indicates amount of variability in sample
– ex. compare a standard deviation of 500 and 1000, which exhibits more variability?
Measures of Variability
• Standard Deviation: indicates the degree of variation or diversity in the values in such a way as to be translatable into a normal curve distribution
• Variance = $\sum (x - \bar{x})^2 / (n - 1)$
• With a normal curve, the midpoint (apex) of the curve is also the mean and exactly 50% of the distribution lies on either side of the mean.
Normal Curve and Standard Deviation
Number of standard deviations from the mean   Percent of area under the curve   Percent of area to the right or left
+/- 1.00 st dev                               68%                               16%
+/- 1.64 st dev                               90%                               5%
+/- 1.96 st dev                               95%                               2.5%
+/- 2.58 st dev                               99%                               0.5%
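The table values are rounded areas under the standard normal curve. A short sketch using `statistics.NormalDist` shows where they come from (variable names are illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for z in (1.00, 1.64, 1.96, 2.58):
    # Area between -z and +z, and the area left over in one tail.
    within = 100 * (standard_normal.cdf(z) - standard_normal.cdf(-z))
    one_tail = 100 * (1 - standard_normal.cdf(z))
    rows.append((z, within, one_tail))
    print(f"+/- {z:.2f} st dev: {within:.2f}% under the curve, {one_tail:.2f}% in each tail")
```

The unrounded areas differ slightly from the table (for example, about 68.27% rather than 68% for one standard deviation).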
Notion of Sampling Distribution
• The sampling distribution refers to what would be found if the researcher could take many, many independent samples
• The means for all of the samples should align themselves in a normal bell-shaped curve
• Therefore, there is a high probability that any given sample result will be close to, but not exactly equal to, the population mean.
[Figure: normal, bell-shaped curve with its midpoint (mean) marked.]
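The idea of taking "many, many independent samples" can be simulated. The sketch below is only an illustration under stated assumptions: the population is a hypothetical skewed (exponential) distribution with mean 1 and standard deviation 1, and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # respondents per sample (assumed)
NUM_SAMPLES = 2000   # number of independent samples (assumed)

# Draw many samples from the hypothetical population and keep each sample's mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

center = statistics.mean(sample_means)    # should sit near the population mean (1.0)
spread = statistics.stdev(sample_means)   # should sit near 1 / sqrt(SAMPLE_SIZE)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5

print(f"Mean of sample means: {center:.3f}")
print(f"Std dev of sample means: {spread:.3f} (theory: {expected_spread:.3f})")
```

Even though the population itself is skewed, the sample means cluster around the population mean, and their spread shrinks as the sample size grows.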
Notion of Confidence Interval
• A confidence interval defines endpoints based on knowledge of the area under a bell-shaped curve.
• Normal curve
– the mean plus or minus 1.96 standard deviations theoretically covers 95% of the population
– the mean plus or minus 2.58 standard deviations theoretically covers 99% of the population
Notion of Confidence Interval
• Example
– Mean = 12,000 miles
– Standard Deviation = 3,000 miles
• We are confident that 95% of the respondents’ answers fall between 6,120 and 17,880 miles
– 12,000 + (1.96 * 3,000) = 17,880
– 12,000 - (1.96 * 3,000) = 6,120
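The arithmetic in this example is easy to reproduce. A minimal sketch, with illustrative variable names:

```python
mean_miles = 12_000
std_dev_miles = 3_000
z_95 = 1.96  # number of standard deviations that bounds 95% of a normal curve

# Endpoints of the range expected to contain 95% of respondents' answers.
lower = mean_miles - z_95 * std_dev_miles
upper = mean_miles + z_95 * std_dev_miles

print(f"{lower:,.0f} to {upper:,.0f} miles")
```

Swapping in 2.58 for `z_95` gives the wider 99% range.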
Notion of Standard Error of a Mean
• Standard error is an indication of how far away from the true population value a typical sample result is expected to fall.
• Formula
– $S_{\bar{X}} = s / \sqrt{n}$
– $S_p = \sqrt{(p \cdot q) / n}$
• where
– $S_{\bar{X}}$ is the standard error of the mean
– $S_p$ is the standard error of the percentage
– p = % found in the sample and q = (100 - p)
– s = standard deviation of the sample
– n = sample size
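Both standard error formulas translate directly into code. In the sketch below the function names and the input values (s = 100, p = 50, n = 400) are hypothetical, chosen only to give round numbers.

```python
import math

def standard_error_of_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def standard_error_of_percentage(p, n):
    """Standard error of a percentage: sqrt(p * q / n), with q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)

# Hypothetical inputs for illustration only.
se_mean = standard_error_of_mean(s=100, n=400)
se_pct = standard_error_of_percentage(p=50, n=400)

print(se_mean, se_pct)
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.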
Computing Sample Size Using The Confidence Interval Approach
• To compute sample size, three factors need to be considered:
– amount of variability believed to be in the population
– desired accuracy
– level of confidence required in your estimates of the population values
Determining Sample Size Using a Mean
• Formula for a percentage: $n = (p q z^2) / e^2$
• Formula for a mean: $n = (s^2 z^2) / e^2$
• Where
– n = sample size
– z = level of confidence (indicated by the number of standard errors associated with it)
– s = variability indicated by an estimated standard deviation
– p = estimated percentage in the population (p × q expresses its variability)
– q = (100 - p)
– e = acceptable error in the sample estimate of the population value
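The two sample size formulas can be written as small functions. This is a sketch rather than a definitive implementation: the function names are illustrative, the results are left unrounded, and the percentage inputs (p = 50, e = 5 percentage points) are hypothetical.

```python
def sample_size_for_mean(s, e, z):
    """n = (s^2 * z^2) / e^2, with s and e in the same units."""
    return (s ** 2 * z ** 2) / e ** 2

def sample_size_for_percentage(p, e, z):
    """n = (p * q * z^2) / e^2, with p and e in percentage points and q = 100 - p."""
    q = 100 - p
    return (p * q * z ** 2) / e ** 2

# Hypothetical case: 50% expected, +/- 5 points of error, 95% confidence.
n_percentage = sample_size_for_percentage(p=50, e=5, z=1.96)
print(f"{n_percentage:.2f}")
```

Since p × q is largest when p = 50, that value is the most conservative choice when no prior estimate of the percentage is available.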
Determining Sample Size Using a Mean: An Example
• 95% level of confidence (1.96)
• Standard deviation of 100 (from previous studies)
• Desired precision is 10 (+ or -)
• Therefore n ≈ 384
– $(100^2 \times 1.96^2) / 10^2$
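Written out step by step with the formula for a mean, the example works through as follows (the slide reports the result as 384):

```latex
\[
n = \frac{s^2 z^2}{e^2}
  = \frac{100^2 \times 1.96^2}{10^2}
  = \frac{10{,}000 \times 3.8416}{100}
  = 384.16 \approx 384
\]
```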
Practical Considerations in Sample Size Determination
• How to estimate variability in the population
– prior research
– experience
– intuition
• How to determine amount of precision desired
– small samples are less accurate
– how much error can you live with?
Practical Considerations in Sample Size Determination
• How to calculate the level of confidence desired
– risk
– normally use either 95% or 99%
Determining Sample Size
• Higher n (sample size) needed when:
– the standard error of the estimate is high (population has more variability in the sampling distribution of the test statistic)
– higher precision (low degree of error) is needed (i.e., it is important to have a very precise estimate)
– higher level of confidence is required
• Constraints: cost and access
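Each of these effects can be seen by changing one input at a time in the formula for a mean. The sketch below starts from the deck's worked example (s = 100, e = 10, z = 1.96); the alternative values are hypothetical, and the function name is illustrative.

```python
def required_n(s, e, z):
    """Sample size for a mean: n = (s^2 * z^2) / e^2 (left unrounded)."""
    return (s ** 2 * z ** 2) / e ** 2

baseline = required_n(s=100, e=10, z=1.96)

# Change one factor at a time, holding the others at the baseline values.
more_variability = required_n(s=200, e=10, z=1.96)   # standard deviation doubled
more_precision = required_n(s=100, e=5, z=1.96)      # acceptable error halved
more_confidence = required_n(s=100, e=10, z=2.58)    # 99% instead of 95%

print(f"baseline:         {baseline:.2f}")
print(f"more variability: {more_variability:.2f}")
print(f"more precision:   {more_precision:.2f}")
print(f"more confidence:  {more_confidence:.2f}")
```

Doubling the variability or halving the acceptable error each quadruples the required sample, because both terms enter the formula squared.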
Notes About Sample Size
• Population size does not determine sample size.
• What most directly affects sample size is the variability of the characteristic in the population.
– Example: if all population elements have the same value of a characteristic, then we only need a sample of one!