essentials of marketing research chapter 13: determining sample size

Essentials of Marketing Research

Chapter 13:

Determining Sample Size

WHAT DO STATISTICS MEAN?

• DESCRIPTIVE STATISTICS
– NUMBER OF PEOPLE
– TRENDS IN EMPLOYMENT
– DATA

• INFERENTIAL STATISTICS
– MAKE AN INFERENCE ABOUT A POPULATION FROM A SAMPLE

POPULATION PARAMETER VERSUS SAMPLE STATISTICS

POPULATION PARAMETER

• VARIABLES IN A POPULATION

• MEASURED CHARACTERISTICS OF A POPULATION

• GREEK LOWER-CASE LETTERS AS NOTATION, e.g., $\mu$, $\sigma$

SAMPLE STATISTICS

• VARIABLES IN A SAMPLE

• MEASURES COMPUTED FROM SAMPLE DATA

• ENGLISH LETTERS FOR NOTATION, e.g., $\bar{X}$ or $S$

MAKING DATA USABLE

• Data must be organized into:
– FREQUENCY DISTRIBUTIONS
– PROPORTIONS
– CENTRAL TENDENCY
  • MEAN, MEDIAN, MODE
– MEASURES OF DISPERSION
  • RANGE, DEVIATION, STANDARD DEVIATION, VARIANCE

Frequency Distribution of Deposits

Amount             Frequency   Percent   Probability
Under $3,000             499        16           .16
$3,000-$4,999            530        17           .17
$5,000-$9,999            562        18           .18
$10,000-$14,999          718        23           .23
$15,000 or more          811        26           .26
Total                  3,120       100          1.00
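The percent and probability columns follow directly from the frequency column. The sketch below recomputes them from the frequencies in the table; the variable names are illustrative and not part of the original slides.

```python
# Frequencies copied from the deposits table.
frequencies = {
    "Under $3,000": 499,
    "$3,000-$4,999": 530,
    "$5,000-$9,999": 562,
    "$10,000-$14,999": 718,
    "$15,000 or more": 811,
}

total = sum(frequencies.values())

# Probability = frequency / total; percent = probability * 100.
probabilities = {k: round(v / total, 2) for k, v in frequencies.items()}
percents = {k: round(100 * v / total) for k, v in frequencies.items()}

for amount, freq in frequencies.items():
    print(f"{amount:<18}{freq:>6}{percents[amount]:>6}{probabilities[amount]:>8.2f}")
print(f"{'Total':<18}{total:>6}")
```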

MEASURES OF CENTRAL TENDENCY

• MEAN - ARITHMETIC AVERAGE

• MEDIAN - MIDPOINT OF THE DISTRIBUTION

• MODE - THE VALUE THAT OCCURS MOST OFTEN

Number of Sales Calls Per Day by Salespersons

Salesperson   Number of sales calls
Mike          4
Patty         3
Billie        2
Bob           5
John          3
Frank         3
Chuck         1
Samantha      5
Total         26
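As an illustration, the three measures of central tendency can be computed for the sales-call data with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for readability.

```python
from statistics import mean, median, mode

# Sales calls per day, copied from the salesperson table.
calls = {"Mike": 4, "Patty": 3, "Billie": 2, "Bob": 5,
         "John": 3, "Frank": 3, "Chuck": 1, "Samantha": 5}
values = list(calls.values())

total_calls = sum(values)       # 26 calls across 8 salespersons
mean_calls = mean(values)       # arithmetic average: 26 / 8
median_calls = median(values)   # midpoint of the sorted values
mode_calls = mode(values)       # value that occurs most often

print(total_calls, mean_calls, median_calls, mode_calls)
```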

Sales for Products A and B, Both Average 200

Product A   Product B
196         150
198         160
199         176
199         181
200         192
200         200
200         201
201         202
201         213
201         224
202         240
202         261
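The point of this table is that two products can share roughly the same average while differing sharply in spread. A small sketch using the figures above makes that concrete; `value_range` is a helper name introduced here for illustration.

```python
from statistics import mean

# Sales figures copied from the Product A / Product B table.
product_a = [196, 198, 199, 199, 200, 200, 200, 201, 201, 201, 202, 202]
product_b = [150, 160, 176, 181, 192, 200, 201, 202, 213, 224, 240, 261]

def value_range(values):
    """Range: distance between the largest and smallest value."""
    return max(values) - min(values)

for name, sales in (("A", product_a), ("B", product_b)):
    print(f"Product {name}: mean = {mean(sales):.1f}, range = {value_range(sales)}")
```

Both means come out at about 200, but Product B's range is many times larger than Product A's.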

MEASURES OF DISPERSION

• THE RANGE

• STANDARD DEVIATION

[Figure: Low Dispersion Versus High Dispersion. Two frequency charts, one showing low dispersion and one showing high dispersion. Horizontal axis: Value on Variable (150 to 210). Vertical axis: Frequency (1 to 5).]

Standard Deviation

$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
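The formula can be followed step by step in code. The sketch below applies it to the eight sales-call values from the earlier table and compares the result with `statistics.stdev`, which also uses the n - 1 divisor. The function name `sample_std_dev` is introduced here for illustration.

```python
import math
from statistics import stdev

def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum((X - X_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    variance = squared_deviations / (n - 1)   # S squared
    return math.sqrt(variance)                # S

sales_calls = [4, 3, 2, 5, 3, 3, 1, 5]
s = sample_std_dev(sales_calls)

print(f"S = {s:.4f}")
print(f"statistics.stdev gives {stdev(sales_calls):.4f}")
```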

THE NORMAL DISTRIBUTION

• NORMAL CURVE

• BELL-SHAPED

• ALMOST ALL OF ITS VALUES ARE WITHIN PLUS OR MINUS 3 STANDARD DEVIATIONS OF THE MEAN

• I.Q. IS AN EXAMPLE

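NORMAL DISTRIBUTION

[Figure: normal curve centered on the MEAN. Areas under the curve, from left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%.]

[Figure: Normal Distribution. An example of the distribution of Intelligence Quotient (IQ) scores. Horizontal axis: IQ (70, 85, 100, 115, 130). Areas under the curve, from left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%.]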
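The band percentages in the figure can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch assumes IQ has a mean of 100 and a standard deviation of 15; the 15 is inferred from the spacing of the axis values (70, 85, 100, 115, 130) and is not stated on the slide.

```python
from statistics import NormalDist

# Assumption: mean 100, standard deviation 15 (inferred from the axis spacing).
iq = NormalDist(mu=100, sigma=15)

def percent_between(low, high):
    """Percent of the distribution lying between two IQ scores."""
    return 100 * (iq.cdf(high) - iq.cdf(low))

bands = [(55, 70), (70, 85), (85, 100), (100, 115), (115, 130), (130, 145)]
for low, high in bands:
    print(f"IQ {low}-{high}: {percent_between(low, high):.2f}%")
```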

STANDARDIZED NORMAL DISTRIBUTION

• SYMMETRICAL ABOUT ITS MEAN
• MEAN IDENTIFIES HIGHEST POINT
• INFINITE NUMBER OF CASES - A CONTINUOUS DISTRIBUTION
• TOTAL AREA UNDER THE CURVE (TOTAL PROBABILITY) = 1.0
• MEAN OF ZERO, STANDARD DEVIATION OF 1

A STANDARDIZED NORMAL CURVE

[Figure: standardized normal curve. Horizontal axis: -2, -1, 0, 1, 2.]
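A value from any normal distribution can be placed on the standardized curve by converting it to a z-score: subtract the mean and divide by the standard deviation. The sketch below uses the IQ example, again assuming a mean of 100 and a standard deviation of 15 (inferred from the axis spacing, not stated on the slides). The function name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Assumed IQ parameters: mean 100, standard deviation 15.
for iq in (70, 85, 100, 115, 130):
    print(f"IQ {iq} -> z = {z_score(iq, 100, 15):+.1f}")
```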

STANDARDIZED SCORES

• POPULATION DISTRIBUTION

• SAMPLE DISTRIBUTION

• SAMPLING DISTRIBUTION

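POPULATION DISTRIBUTION

[Figure: distribution of the values $x$ in the population.]

SAMPLE DISTRIBUTION

[Figure: distribution of the values in a sample, labeled $\bar{X}$ and $S$.]

SAMPLING DISTRIBUTION

[Figure: distribution of sample means, labeled $\mu_{\bar{X}}$ and $S_{\bar{X}}$.]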

STANDARD ERROR OF THE MEAN

STANDARD DEVIATION OF THE SAMPLING DISTRIBUTION

CENTRAL LIMIT THEOREM

PARAMETER ESTIMATES

• POINT ESTIMATES

• CONFIDENCE INTERVAL ESTIMATES

RANDOM SAMPLING ERROR AND SAMPLE SIZE ARE RELATED

SAMPLE SIZE

• VARIANCE (STANDARD DEVIATION)

• MAGNITUDE OF ERROR

• CONFIDENCE LEVEL


Recap

Sample Accuracy

• How close the sample’s profile is to the true population’s profile

• Sample size is not related to representativeness

• Sample size is related to accuracy

Methods of Determining Sample Size

• Compromise between what is theoretically perfect and what is practically feasible.

• Remember, the larger the sample size, the more costly the research.

• Why sample one more person than necessary?


• Arbitrary
– Rule of thumb (e.g., a sample should be at least 5% of the population to be accurate)
– Not efficient or economical

• Conventional
– Follows that there is some “convention” or number believed to be the right size
– Easy to apply, but can end up with too small or too large a sample


• Cost Basis
– based on budgetary constraints

• Statistical Analysis
– certain statistical techniques require a certain number of respondents

• Confidence Interval
– theoretically the most correct method

Notion of Variability

[Figure: two curves centered on the same mean, one labeled "Great variability" and one labeled "Little variability".]

Notion of Variability

• Standard Deviation
– approximates the average distance away from the mean for all respondents to a specific question
– indicates amount of variability in sample
– e.g., compare a standard deviation of 500 and one of 1,000: which exhibits more variability?

Measures of Variability

• Standard Deviation: indicates the degree of variation or diversity in the values in such a way as to be translatable into a normal curve distribution

• Variance = $\sum (x_i - \bar{x})^2 / (n - 1)$

• With a normal curve, the midpoint (apex) of the curve is also the mean, and exactly 50% of the distribution lies on either side of the mean.

Normal Curve and Standard Deviation

Number of standard         Percent of area    Percent of area to
deviations from the mean   under the curve    the right or left
+/- 1.00 st dev            68%                16%
+/- 1.64 st dev            90%                5%
+/- 1.96 st dev            95%                2.5%
+/- 2.58 st dev            99%                0.5%
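The table values are rounded areas under the standard normal curve. As a check, the sketch below recomputes them with `statistics.NormalDist`; the helper names are introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def central_area(z):
    """Share of the curve within +/- z standard deviations of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

def tail_area(z):
    """Share of the curve beyond z standard deviations on one side."""
    return 1 - std_normal.cdf(z)

for z in (1.00, 1.64, 1.96, 2.58):
    print(f"+/- {z:.2f} st dev: {100 * central_area(z):.2f}% inside, "
          f"{100 * tail_area(z):.2f}% in each tail")
```

The unrounded figures differ slightly from the table (for example, +/- 1.64 covers a little under 90%), which is why 1.64, 1.96 and 2.58 are best read as conventional approximations.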

Notion of Sampling Distribution

• The sampling distribution refers to what would be found if the researcher could take many, many independent samples

• The means for all of the samples should align themselves in a normal bell-shaped curve

• Therefore, there is a high probability that any given sample result will be close to, but not exactly equal to, the population mean.

[Figure: normal, bell-shaped curve of sample means, with the midpoint (mean) marked.]
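The idea of taking "many, many independent samples" can be simulated. The sketch below is only an illustration under stated assumptions: the population is hypothetical (10,000 values drawn uniformly between 0 and 100, deliberately not bell-shaped), and the sample size of 50 and the 2,000 repetitions are arbitrary choices. It checks that the sample means cluster around the population mean, with a spread close to the population standard deviation divided by the square root of n.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(13)  # fixed seed so repeated runs give the same numbers

# Hypothetical, non-normal population (an assumption for this sketch).
population = [random.uniform(0, 100) for _ in range(10_000)]

sample_size = 50
sample_means = [
    mean(random.sample(population, sample_size))  # one independent sample
    for _ in range(2_000)
]

population_mean = mean(population)
mean_of_means = mean(sample_means)
spread_of_means = stdev(sample_means)
predicted_spread = pstdev(population) / sample_size ** 0.5

print(f"population mean      : {population_mean:.2f}")
print(f"mean of sample means : {mean_of_means:.2f}")
print(f"spread of sample means: {spread_of_means:.2f}")
print(f"sigma / sqrt(n)       : {predicted_spread:.2f}")
```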

Notion of Confidence Interval

• A confidence interval defines endpoints based on knowledge of the area under a bell-shaped curve.

• Normal curve
– The mean plus or minus 1.96 standard deviations theoretically covers 95% of the population
– The mean plus or minus 2.58 standard deviations theoretically covers 99% of the population

Notion of Confidence Interval

• Example
– Mean = 12,000 miles
– Standard deviation = 3,000 miles

• We are confident that 95% of the respondents’ answers fall between 6,120 and 17,880 miles
– 12,000 + (1.96 * 3,000) = 17,880
– 12,000 - (1.96 * 3,000) = 6,120
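The same arithmetic can be written as a small function, which makes it easy to swap in 2.58 for a 99% range. This is a sketch; the function name `interval` is introduced here and is not from the slides.

```python
def interval(mean, std_dev, z):
    """Endpoints of mean +/- z standard deviations."""
    margin = z * std_dev
    return mean - margin, mean + margin

# Mileage example: mean 12,000 miles, standard deviation 3,000 miles.
low, high = interval(12_000, 3_000, 1.96)
print(f"95%: {low:,.0f} to {high:,.0f} miles")

low_99, high_99 = interval(12_000, 3_000, 2.58)
print(f"99%: {low_99:,.0f} to {high_99:,.0f} miles")
```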

Notion of Standard Error of a Mean

• Standard error is an indication of how far away from the true population value a typical sample result is expected to fall.

• Formulas
– Standard error of the mean: $S_{\bar{X}} = s / \sqrt{n}$
– Standard error of the percentage: $S_p = \sqrt{(p \cdot q) / n}$

• Where
– $S_{\bar{X}}$ = standard error of the mean
– $S_p$ = standard error of the percentage
– s = standard deviation of the sample
– p = % found in the sample and q = (100 - p)
– n = sample size
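Both standard error formulas are short enough to express directly in code. In the sketch below, the inputs (a standard deviation of 3,000, a sample percentage of 50, and sample sizes of 100 and 400) are hypothetical values chosen to show how quadrupling n halves the standard error.

```python
import math

def standard_error_of_mean(s, n):
    """S_xbar = s / sqrt(n)."""
    return s / math.sqrt(n)

def standard_error_of_percentage(p, n):
    """S_p = sqrt(p * q / n), with p and q expressed as percentages."""
    q = 100 - p
    return math.sqrt(p * q / n)

# Hypothetical inputs for illustration only.
for n in (100, 400):
    print(f"n = {n}: S_xbar = {standard_error_of_mean(3_000, n):.1f}, "
          f"S_p = {standard_error_of_percentage(50, n):.2f}")
```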

Computing Sample Size Using The Confidence Interval Approach

• To compute sample size, three factors need to be considered:
– amount of variability believed to be in the population
– desired accuracy
– level of confidence required in your estimates of the population values

Determining Sample Size Using a Mean

• Formula when estimating a percentage: $n = (p q z^2) / e^2$

• Formula when estimating a mean: $n = (s^2 z^2) / e^2$

• Where
– n = sample size
– z = level of confidence (indicated by the number of standard errors associated with it)
– s = variability indicated by an estimated standard deviation
– p = estimated percentage in the population (the measure of variability when estimating a percentage)
– q = (100 - p)
– e = acceptable error in the sample estimate of the population
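As a sketch of the percentage version of the formula, the example below assumes p = 50 (and so q = 50), an acceptable error of plus or minus 5 percentage points, and 95% confidence (z = 1.96). These inputs are hypothetical and do not come from the slides.

```python
def sample_size_for_percentage(p, e, z):
    """n = (p * q * z^2) / e^2, with p, q and e expressed as percentages."""
    q = 100 - p
    return (p * q * z ** 2) / e ** 2

# Hypothetical inputs: p = 50%, e = +/- 5 points, 95% confidence.
n = sample_size_for_percentage(p=50, e=5, z=1.96)
print(f"n = {n:.2f}")
```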

Determining Sample Size Using a Mean: An Example

• 95% level of confidence (1.96)

• Standard deviation of 100 (from previous studies)

• Desired precision is 10 (+ or -)

• Therefore $n = (100^2 \times 1.96^2) / 10^2 = 384.16$, or about 384
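The worked example can be checked in a few lines. The function name `sample_size_for_mean` is introduced here for illustration; the inputs are the ones listed in the example.

```python
def sample_size_for_mean(s, e, z):
    """n = (s^2 * z^2) / e^2."""
    return (s ** 2 * z ** 2) / e ** 2

# Values from the example: s = 100, e = +/- 10, 95% confidence (z = 1.96).
n = sample_size_for_mean(s=100, e=10, z=1.96)
print(f"n = {n:.2f}, reported as about {round(n)}")
```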

Practical Considerations in Sample Size Determination

• How to estimate variability in the population
– prior research
– experience
– intuition

• How to determine amount of precision desired
– small samples are less accurate
– how much error can you live with?

Practical Considerations in Sample Size Determination

• How to calculate the level of confidence desired
– risk
– normally use either 95% or 99%


• Higher n (sample size) needed when:
– the standard error of the estimate is high (population has more variability in the sampling distribution of the test statistic)
– higher precision (low degree of error) is needed (i.e., it is important to have a very precise estimate)
– higher level of confidence is required

• Constraints: cost and access
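Each of the three conditions above can be seen by changing one input at a time in the sample size formula for a mean, n = (s² z²) / e². The sketch starts from the earlier example (s = 100, e = 10, z = 1.96); the alternative values (s = 200, e = 5, z = 2.58) are hypothetical choices for comparison.

```python
def sample_size_for_mean(s, e, z):
    """n = (s^2 * z^2) / e^2."""
    return (s ** 2 * z ** 2) / e ** 2

baseline = sample_size_for_mean(s=100, e=10, z=1.96)

# Change one factor at a time (alternative values are hypothetical).
more_variability = sample_size_for_mean(s=200, e=10, z=1.96)  # s doubled
more_precision = sample_size_for_mean(s=100, e=5, z=1.96)     # e halved
more_confidence = sample_size_for_mean(s=100, e=10, z=2.58)   # 99% instead of 95%

print(f"baseline          : {baseline:.2f}")
print(f"twice the std dev : {more_variability:.2f}")
print(f"half the error    : {more_precision:.2f}")
print(f"99% confidence    : {more_confidence:.2f}")
```

Doubling the variability or halving the acceptable error each quadruples the required sample, because both terms enter the formula squared.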

Notes About Sample Size

• Population size does not determine sample size.

• What most directly affects sample size is the variability of the characteristic in the population.
– Example: if all population elements have the same value of a characteristic, then we only need a sample of one!
