Sampling and Sampling Distributions
Dr. Himani Gupta
Why Sample?
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
For example, sampling (i.e., selecting a subset of a whole population) is often done for reasons of cost (it’s less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g., performing a crash test on every automobile produced is impractical).
Types of Samples
• Non-Probability Samples: Judgment, Quota, Snowball, Convenience
• Probability Samples: Simple Random, Systematic, Stratified, Cluster
Nonrandom Sampling
• Convenience Sampling: sample elements are selected for the convenience of the researcher.
• Judgment Sampling: sample elements are selected by the judgment of the researcher; for example, the researcher obtains the opinions of pre-selected experts in the subject matter.
• Quota Sampling: sample elements are selected until the quota controls are satisfied.
• Snowball Sampling: survey subjects are selected based on referrals from other survey respondents.
Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time. Examples:
• use of students and members of social organizations
• mall-intercept interviews without qualifying the respondents
• department stores using charge account lists
• “people on the street” interviews
Judgmental Sampling
Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher.
Examples:
• test markets
• purchase engineers selected in industrial marketing research
• bellwether precincts selected in voting behavior research
• expert witnesses used in court
Quota Sampling
Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment.

Control Characteristic   Population Composition (%)   Sample Composition (%)   Sample Composition (Number)
Sex: Male                48                           48                       480
Sex: Female              52                           52                       520
Total                    100                          100                      1000
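The two stages of quota sampling can be sketched in Python. This is a minimal illustration, not a standard procedure: `quota_counts` and `fill_quotas` are hypothetical helper names, the 48/52 split comes from the table, and the short list of arriving respondents is made up.

```python
def quota_counts(composition_pct, n):
    """Stage 1: turn population percentages for a control characteristic
    into quota counts for a sample of size n.
    Rounding can make the counts miss n by one for awkward splits."""
    return {category: round(n * pct / 100) for category, pct in composition_pct.items()}


def fill_quotas(arrivals, quotas):
    """Stage 2: accept respondents in the order they turn up (convenience)
    until each category's quota is used up; later arrivals in a full
    category are turned away."""
    remaining = dict(quotas)
    accepted = []
    for person, category in arrivals:
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
    return accepted


quotas = quota_counts({"Male": 48, "Female": 52}, 1000)
print(quotas)  # {'Male': 480, 'Female': 520}

# Hypothetical tiny example: quotas of 2 men and 1 woman.
arrivals = [("p1", "Male"), ("p2", "Male"), ("p3", "Male"), ("p4", "Female"), ("p5", "Female")]
print(fill_quotas(arrivals, {"Male": 2, "Female": 1}))  # ['p1', 'p2', 'p4']
```

Stage 2 involves no random selection at all, which is why quota sampling stays in the non-probability family.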
Snowball Sampling
In snowball sampling, an initial group of respondents is selected, usually at random.
After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
Subsequent respondents are selected based on the referrals.
Types of Samples
In a probability sample, items in the sample are chosen on the basis of known probabilities.
Probability Samples: Simple Random, Systematic, Stratified, Cluster
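Simple Random Sampling…
A simple random sample is selected in such a way that every possible sample of the same size is equally likely to be chosen.
Example: A government income tax auditor must choose a sample of 40 of 1,000 returns to audit…
Extra numbers may be used if duplicate random numbers are generated.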
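A sketch of the auditor's selection, assuming the returns are numbered 1 to 1,000 (the numbering is an assumption for illustration). It mimics the random-number approach described above: numbers are drawn one at a time and a duplicate is simply replaced by an extra draw. `srs_by_random_numbers` is a hypothetical name.

```python
import random


def srs_by_random_numbers(population_size, n, seed=None):
    """Select n distinct item numbers from 1..population_size by drawing
    random numbers one at a time; a duplicate adds nothing to the set,
    so an extra number is drawn in its place."""
    if n > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        chosen.add(rng.randint(1, population_size))
    return sorted(chosen)


# Choose 40 of the 1,000 returns (numbered 1..1000) for audit.
audit_sample = srs_by_random_numbers(1000, 40, seed=2024)
print(audit_sample)
```

Python's standard library call `random.sample(range(1, 1001), 40)` produces an equivalent simple random sample in one step.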
Systematic Sampling
Convenient and relatively easy to administer
Population elements are an ordered sequence (at least, conceptually).
The first sample element is selected randomly from the first k population elements.
Thereafter, sample elements are selected at a constant interval, k, from the ordered sequence frame.
$k = \frac{N}{n}$
where:
n = sample size
N = population size
k = size of selection interval
Systematic Sampling: Example
Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
A sample of fifty (n = 50) purchase orders is needed for an audit.
k = 10,000/50 = 200
The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.
Subsequent sample elements: 245, 445, 645, . . .
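The purchase-order example can be reproduced with a short function. This sketch assumes N is an exact multiple of n (true here: 10,000/50 = 200) and that items are numbered from 1; `systematic_sample` is a hypothetical helper.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic sample of item numbers 1..N.
    k = N // n (this sketch assumes N is a multiple of n, as in the example);
    the first element is chosen at random from 1..k unless `start` is given."""
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie within the first k elements")
    return [start + i * k for i in range(n)]


orders = systematic_sample(10_000, 50, start=45)
print(orders[:4])   # [45, 245, 445, 645]
print(orders[-1])   # 9845
```

Only the starting point is random; every later element is fixed by k, which is what makes the method easy to administer.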
Stratified Random Sampling…
A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum.
Criterion 1: Gender (Male; Female)
Criterion 2: Age (< 20; 20-30; 31-40; 41-50; 51-60; > 60)
Criterion 3: Occupation (professional; clerical; blue collar; other)
We can acquire information about the total population, make inferences within a stratum, or make comparisons across strata.
Stratified Random Sampling…
After the population has been stratified, we can use simple random sampling to generate the complete sample:
If we only have sufficient resources to sample 400 people total, we would draw 100 of them from the low-income group…
…if we are sampling 1,000 people, we’d draw 50 of them from the high-income group.
Divide population into two or more subgroups (called strata) according to some common characteristic.
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
Samples from subgroups are combined into one.
This is a common technique when sampling populations of voters, stratifying across racial or socio-economic lines.
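Proportional allocation followed by a simple random sample inside each stratum can be sketched as follows. The population is hypothetical: the 25% low-income and 5% high-income shares are chosen only so that the 100-of-400 and 50-of-1,000 figures quoted in the text come out, and "other" lumps together the remaining income groups. `proportional_allocation` and `stratified_sample` are hypothetical names.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split total sample size n across strata in proportion to stratum size.
    Rounding can leave the total off by one or two for awkward splits."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its population units.
    A simple random sample of the allocated size is drawn from each stratum,
    and the per-stratum samples are combined into one sample."""
    rng = random.Random(seed)
    allocation = proportional_allocation({s: len(units) for s, units in strata.items()}, n)
    combined = []
    for name, units in strata.items():
        combined.extend(rng.sample(units, allocation[name]))
    return allocation, combined


# Hypothetical population of 10,000 people (shares chosen to match the text's figures).
strata = {
    "low": [f"L{i}" for i in range(2500)],
    "other": [f"O{i}" for i in range(7000)],
    "high": [f"H{i}" for i in range(500)],
}
alloc_400, sample_400 = stratified_sample(strata, 400, seed=7)
alloc_1000, _ = stratified_sample(strata, 1000, seed=7)
print(alloc_400)   # {'low': 100, 'other': 280, 'high': 20}
print(alloc_1000)  # {'low': 250, 'other': 700, 'high': 50}
```

Proportional allocation is only one choice; a study that wants precise comparisons across strata may deliberately oversample a small stratum instead.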
Cluster Sampling
Population is divided into several “clusters,” each representative of the population.
A simple random sample of clusters is selected.
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
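A sketch of the exit-poll style design, using hypothetical election districts as clusters. Stage 1 draws a simple random sample of clusters; an optional second stage subsamples within each chosen cluster, matching the text's remark that items "can be chosen from a cluster using another probability sampling technique". `cluster_sample` and the district data are hypothetical.

```python
import random


def cluster_sample(clusters, m, per_cluster=None, seed=None):
    """clusters maps a cluster label to its list of population units.
    Stage 1: a simple random sample of m clusters.
    Stage 2 (optional): if per_cluster is given, a simple random sample of that
    many units inside each chosen cluster; otherwise every unit is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # sorted() keeps results reproducible for a seed
    sample = {}
    for label in chosen:
        units = clusters[label]
        sample[label] = list(units) if per_cluster is None else rng.sample(units, per_cluster)
    return sample


# Hypothetical frame: 20 election districts, each a list of 100 voter IDs.
districts = {f"D{d}": [f"D{d}-V{v}" for v in range(100)] for d in range(20)}
one_stage = cluster_sample(districts, m=3, seed=1)
two_stage = cluster_sample(districts, m=3, per_cluster=10, seed=1)
print(sorted(one_stage))
```

Only the chosen districts need to be visited, which is the cost advantage of cluster sampling; the price is that voters within a district tend to resemble each other.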
Cluster Sampling
(Figure: map of example clusters: San Jose, Boise, Phoenix, Denver, Cedar Rapids, Buffalo, Louisville, Atlanta, Portland, Milwaukee, Kansas City, San Diego, Tucson, Grand Forks, Fargo, Sherman-Denison, Odessa-Midland, Cincinnati, Pittsfield.)
Types of Cluster Sampling
• One-Stage Sampling
• Two-Stage Sampling
  - Simple Cluster Sampling
  - Probability Proportionate to Size Sampling
• Multistage Sampling
Cluster Sampling
Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Unavailability of a sampling frame prohibits using other random sampling methods
Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for simple random sampling
Sample Size…
Numerical techniques for determining sample sizes will be described later, but suffice it to say that the larger the sample size is, the more accurate we can expect the sample estimates to be.
Sampling and Non-Sampling Errors…
Two major types of error can arise when a sample of observations is taken from a population: sampling error and nonsampling error.
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Increasing the sample size will reduce this error.
Nonsampling Error…
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:
• Errors in data acquisition
• Nonresponse errors
• Selection bias
Note: increasing the sample size will not reduce this type of error.
Errors in data acquisition…
…arises from the recording of incorrect responses, due to:
• incorrect measurements being taken because of faulty equipment,
• mistakes made during transcription from primary sources,
• inaccurate recording of data due to misinterpretation of terms, or
• inaccurate responses to questions concerning sensitive issues.
Nonresponse Error…
…refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e., the sample observations that are collected may not be representative of the target population.
As mentioned earlier, the Response Rate (i.e., the proportion of all people selected who complete the survey) is a key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse error.
Selection Bias…
…occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.
Sampling Distributions
A sampling distribution is a distribution of all of the possible values of a statistic for a given size sample selected from a population.
For example, suppose you sample 50 students from your college regarding their GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.
Sampling Distributions: Sample Mean Example
Suppose your population (simplified) was four people at your institution.
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
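Sampling Distributions: Sample Mean Example
Summary Measures for the Population Distribution:
$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{9 + 1 + 1 + 9}{4}} = \sqrt{5} = 2.236$
(Figure: population distribution P(x) for ages x = 18, 20, 22, 24 of individuals A, B, C, D; each has probability .25, a uniform distribution.)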
Sampling Distributions: Sample Mean Example
Now consider all possible samples of size n = 2. There are 16 possible samples (sampling with replacement):

1st Obs. \ 2nd Obs.    18       20       22       24
18                     18,18    18,20    18,22    18,24
20                     20,18    20,20    20,22    20,24
22                     22,18    22,20    22,22    22,24
24                     24,18    24,20    24,22    24,24

16 Sample Means:

1st Obs. \ 2nd Obs.    18    20    22    24
18                     18    19    20    21
20                     19    20    21    22
22                     20    21    22    23
24                     21    22    23    24
Sampling Distributions: Sample Mean Example
Sampling Distribution of All Sample Means
Counting how often each value occurs among the 16 sample means gives the sample means distribution, which is no longer uniform:

$\bar X$         18      19      20      21      22      23      24
$P(\bar X)$      1/16    2/16    3/16    4/16    3/16    2/16    1/16

(Figure: bar chart of $P(\bar X)$ against $\bar X$ from 18 to 24.)
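Sampling Distributions: Sample Mean Example
Summary Measures of this Sampling Distribution (here N = 16 sample means):
$\mu_{\bar X} = \frac{\sum \bar X_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X_i - \mu_{\bar X})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$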
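The enumeration for the four-person population can be checked directly. This sketch lists all 16 ordered samples of size 2 drawn with replacement and recomputes the summary measures; it uses `statistics.pstdev`, which divides by N, to match the population formulas used in the slides.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]          # ages of the four people A, B, C, D
mu = mean(population)                  # 21
sigma = pstdev(population)             # population SD (divides by N): sqrt(5)

# All 16 ordered samples of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu_xbar = mean(sample_means)           # same as mu
sigma_xbar = pstdev(sample_means)      # SD of the 16 sample means
frequency = Counter(sample_means)      # how often each sample mean occurs

print(len(samples), mu, round(sigma, 3))   # 16 21 2.236
print(mu_xbar, round(sigma_xbar, 2))       # 21 1.58
print(sorted(frequency.items()))           # [(18, 1), (19, 2), (20, 3), (21, 4), (22, 3), (23, 2), (24, 1)]
print(round(sigma / sqrt(2), 2))           # 1.58, i.e. sigma / sqrt(n)
```

The last line anticipates the standard error formula: the spread of the sample means equals the population standard deviation divided by the square root of the sample size.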
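Sampling Distributions: Sample Mean Example
Population (N = 4): $\mu = 21$, $\sigma = 2.236$ (uniform over 18, 20, 22, 24)
Sample Means Distribution (n = 2): $\mu_{\bar X} = 21$, $\sigma_{\bar X} = 1.58$ (peaked at 21, no longer uniform)
(Figure: side-by-side bar charts of P(X) for the population and $P(\bar X)$ for the sample means.)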
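Sampling Distributions: Standard Error
Different samples of the same size from the same population will yield different sample means.
A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$
Note that the standard error of the mean decreases as the sample size increases. In the four-person example, $\sigma/\sqrt{n} = 2.236/\sqrt{2} = 1.58$, matching the standard deviation of the 16 sample means.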
Sampling Distributions: Standard Error, Normal Population
If a population is normal with mean μ and standard deviation σ, the sampling distribution of the mean is also normally distributed with
$\mu_{\bar X} = \mu$ and $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$
(This assumes that sampling is with replacement or sampling is without replacement from an infinite population.)
Sampling Distributions: Z Value, Normal Population
Z-value for the sampling distribution of the sample mean:
$Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma / \sqrt{n}}$
where:
$\bar X$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distributions: Properties, Normal Population
$\mu_{\bar x} = \mu$ (i.e., $\bar x$ is unbiased)
(Figure: a normal population distribution centered at μ and the normal sampling distribution of $\bar x$, which has the same mean.)
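Sampling Distributions: Properties, Normal Population
For sampling with replacement: as n increases, $\sigma_{\bar x}$ decreases.
(Figure: two sampling distributions of $\bar x$ centered at $\mu_{\bar x}$; the one for the larger sample size is narrower than the one for the smaller sample size.)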
Sampling Distributions: Non-Normal Population
The Central Limit Theorem states that as the sample size (that is, the number of values in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution of the individual values in the population, provided the population has a finite standard deviation σ.
Measures of the sampling distribution:
$\mu_{\bar x} = \mu$ and $\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}$
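Sampling Distributions: Non-Normal Population
(Figure: a non-normal population distribution centered at μ, and sampling distributions of $\bar x$ centered at $\mu_{\bar x}$ that become normal as n increases; the larger sample size gives a narrower distribution than the smaller sample size.)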
Sampling Distributions: Non-Normal Population
For most distributions, n > 30 will give a sampling distribution that is nearly normal
For fairly symmetric distributions, n > 15 will give a sampling distribution that is nearly normal
For normal population distributions, the sampling distribution of the mean is always normally distributed
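These rules of thumb can be checked with a small simulation. This sketch assumes an exponential population with rate 1 (chosen only because it is clearly skewed; its mean and standard deviation are both 1) and uses n = 36; `simulate_sample_means` is a hypothetical helper, and the results are approximate because they come from random draws.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# Assumed population for illustration only: exponential with rate 1, which is
# strongly right-skewed and has mean 1 and standard deviation 1.
POP_MEAN, POP_SD = 1.0, 1.0


def simulate_sample_means(n, reps=20_000, seed=0):
    """Draw `reps` independent samples of size n from the exponential
    population and return the mean of each sample."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n = 36
means = simulate_sample_means(n)
se = POP_SD / sqrt(n)   # sigma / sqrt(n) = 1/6

# Mean and spread of the simulated sampling distribution vs. the theory.
print(round(fmean(means), 3), round(pstdev(means), 3), round(se, 3))

# Share of sample means within one standard error of mu; for a normal
# distribution this share is about 0.683.
within = sum(abs(m - POP_MEAN) < se for m in means) / len(means)
print(round(within, 3))
```

Trying smaller values of n shows the skew of the population carrying over into the sample means.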
Sampling Distributions: Example
Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
What is the probability that the sample mean is between 7.75 and 8.25?
Even if the population is not normally distributed, the central limit theorem can be used (n > 30).
So, the distribution of the sample mean is approximately normal with
$\mu_{\bar X} = 8$ and $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Sampling Distributions: Example
First, compute Z values for both 7.75 and 8.25:
$Z = \frac{7.75 - 8}{3/\sqrt{36}} = -0.5$ and $Z = \frac{8.25 - 8}{3/\sqrt{36}} = 0.5$
Now, use the cumulative normal table to compute the correct probability:
$P(7.75 < \bar X < 8.25) = P(-0.5 < Z < 0.5) = 0.3830$
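Sampling Distributions: Example
$P(-0.5 < Z < 0.5) = 2(.5000 - .3085) = 2(.1915) = 0.3830$
(Figure: the population distribution with μ = 8; the sampling distribution of $\bar X$ with $\mu_{\bar X} = 8$, shaded between 7.75 and 8.25; and the standardized normal distribution with $\mu_z = 0$, shaded between Z = -0.5 and 0.5.)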
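The same probability can be computed without a printed table. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) and relies on the central limit theorem approximation stated in the example; table-based answers differ only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)   # 3 / 6 = 0.5

# By the central limit theorem (n > 30), X-bar is approximately Normal(mu, se).
xbar_dist = NormalDist(mu=mu, sigma=se)
prob = xbar_dist.cdf(8.25) - xbar_dist.cdf(7.75)

# The same answer through the standardized Z values -0.5 and 0.5.
z_low, z_high = (7.75 - mu) / se, (8.25 - mu) / se
std_normal = NormalDist()
prob_z = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(prob, 4))   # 0.3829 (the table-based answer rounds to 0.3830)
```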
Sampling Distributions: The Proportion
The proportion of the population having some characteristic is denoted π.
Sample proportion (p) provides an estimate of π:
$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
$0 \le p \le 1$
The count X = np has a binomial distribution, so p follows a binomial distribution rescaled by 1/n (assuming sampling with replacement from a finite population or without replacement from an infinite population).
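Sampling Distributions: The Proportion
Standard error for the proportion:
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
Z value for the proportion:
$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$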
Sampling Distributions: The Proportion (Example)
If the true proportion of voters who support Proposition A is π = .4, what is the probability that a sample of size 200 yields a sample proportion between .40 and .45?
In other words, if π = .4 and n = 200, what is
P(.40 ≤ p ≤ .45) ?
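Sampling Distributions: The Proportion (Example)
Find $\sigma_p$:
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464$
Convert to standardized normal:
$P(.40 \le p \le .45) = P\left(\frac{.40 - .40}{.03464} \le Z \le \frac{.45 - .40}{.03464}\right) = P(0 \le Z \le 1.44)$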
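Sampling Distributions: The Proportion (Example)
Use cumulative normal table:
$P(0 \le Z \le 1.44) = P(Z \le 1.44) - 0.5 = .4251$
(Figure: the sampling distribution of p, shaded between .40 and .45, is standardized to the standardized normal distribution, shaded between Z = 0 and Z = 1.44; the shaded area is .4251.)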
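A sketch of the same calculation using the normal approximation to the sampling distribution of p, via Python's `statistics.NormalDist`. The exact normal area comes out slightly above the table's .4251 because the table lookup rounds Z to 1.44.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.4, 200
se_p = sqrt(pi * (1 - pi) / n)            # standard error of p, about .03464

# Normal approximation to the sampling distribution of p.
p_dist = NormalDist(mu=pi, sigma=se_p)
prob = p_dist.cdf(0.45) - p_dist.cdf(0.40)   # P(.40 <= p <= .45)

z = (0.45 - pi) / se_p                    # about 1.44
print(round(se_p, 5), round(z, 2), round(prob, 4))
```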
Problem 1
The mean expenditure per customer at a tire store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more?
Solution to Tire Store Example
Population parameters: μ = 85, σ = 9. Sample size: n = 40.
$P(\bar X \ge 87) = P\left(Z \ge \frac{87 - 85}{9/\sqrt{40}}\right) = P(Z \ge 1.41) = 1 - P(Z < 1.41) = 1 - .9207 = .0793$
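Graphic Solution to Tire Store Example
$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} = \frac{87 - 85}{9/\sqrt{40}} = \frac{2}{1.42} = 1.41$
(Figure: the sampling distribution of $\bar X$, centered at 85 with $\sigma_{\bar X} = 9/\sqrt{40} = 1.42$, alongside the standardized normal distribution centered at 0. The area from 85 to 87 (Z = 0 to 1.41) is .4207 and the area above the mean is .5000, so the equal upper-tail areas beyond $\bar X$ = 87 and beyond Z = 1.41 are .5000 - .4207 = .0793.)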
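A quick check of the tire store answer with Python's `statistics.NormalDist`, assuming (as the solution does) that the sample mean is approximately normal for n = 40. The exact area is a little above .0793 because the table lookup rounds Z to 1.41.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40
se = sigma / sqrt(n)                               # about 1.42

z = (87 - mu) / se                                 # about 1.41
prob = 1 - NormalDist(mu=mu, sigma=se).cdf(87)     # P(X-bar >= 87)
print(round(se, 2), round(z, 2), round(prob, 4))
```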
Problem 2
Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?
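Graphic Solution to Problem 2
$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} = \frac{441 - 448}{21/\sqrt{49}} = -2.33$ and $Z = \frac{446 - 448}{21/\sqrt{49}} = -0.67$
(Figure: the sampling distribution of $\bar X$, centered at 448 with $\sigma_{\bar X} = 21/\sqrt{49} = 3$, alongside the standardized normal distribution. The area from 441 to 448 (Z = -2.33 to 0) is .4901 and the area from 446 to 448 (Z = -0.67 to 0) is .2486, so the area between 441 and 446 is .4901 - .2486 = .2415.)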
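The same interval probability, computed as a difference of two normal CDF values with Python's `statistics.NormalDist`. This is a sketch assuming the normal model used in the graphic solution; the small gap from .2415 comes from rounding the Z values to two decimals in the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49
se = sigma / sqrt(n)                               # 21 / 7 = 3

xbar_dist = NormalDist(mu=mu, sigma=se)
prob = xbar_dist.cdf(446) - xbar_dist.cdf(441)     # P(441 <= X-bar <= 446)
z_low, z_high = (441 - mu) / se, (446 - mu) / se   # about -2.33 and -0.67
print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```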
Sampling from a Finite Population without Replacement
In this case, the standard deviation of the distribution of sample means is smaller than when sampling from an infinite population (or from a finite population with replacement).
The correct value of this standard deviation is computed by applying a finite correction factor to the standard deviation for sampling from an infinite population.
If the sample size is less than 5% of the population size, the adjustment is unnecessary.
Sampling from a Finite Population
Finite Correction Factor:
$\sqrt{\frac{N - n}{N - 1}}$
Modified Z Formula:
$Z = \frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}}$
Finite Correction Factor for Selected Sample Sizes
Population Size (N)   Sample Size (n)   Sample % of Population   Value of Correction Factor
6,000                 30                0.50%                    0.998
6,000                 100               1.67%                    0.992
6,000                 500               8.33%                    0.958
2,000                 30                1.50%                    0.993
2,000                 100               5.00%                    0.975
2,000                 500               25.00%                   0.866
500                   30                6.00%                    0.971
500                   50                10.00%                   0.950
500                   100               20.00%                   0.895
200                   30                15.00%                   0.924
200                   50                25.00%                   0.868
200                   75                37.50%                   0.793
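The table can be regenerated from the correction factor formula. This sketch recomputes each row and prints it next to the printed value; `finite_correction_factor` is a hypothetical helper name.

```python
from math import sqrt


def finite_correction_factor(N, n):
    """sqrt((N - n) / (N - 1)): multiplies sigma / sqrt(n) when sampling
    without replacement from a finite population of size N."""
    return sqrt((N - n) / (N - 1))


# (N, n, correction factor as printed in the table)
table = [
    (6000, 30, 0.998), (6000, 100, 0.992), (6000, 500, 0.958),
    (2000, 30, 0.993), (2000, 100, 0.975), (2000, 500, 0.866),
    (500, 30, 0.971), (500, 50, 0.950), (500, 100, 0.895),
    (200, 30, 0.924), (200, 50, 0.868), (200, 75, 0.793),
]
for N, n, printed in table:
    cf = finite_correction_factor(N, n)
    print(f"N={N:>5} n={n:>3} n/N={n / N:6.2%} factor={cf:.3f} (table {printed:.3f})")
```

Rows with n/N under 5% all have factors of about 0.99 or more, which is why the adjustment can be skipped there.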
Problem 3
A production company’s 350 hourly employees average 37.6 years of age, with a standard deviation of 8.3 years. If a random sample of 45 hourly employees is taken, what is the probability that the sample will have an average age of less than 40 years?
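One way to set up Problem 3 in Python, offered as a check on a hand calculation rather than an official answer key. It assumes the modified Z formula for a finite population applies, because the sample is 45/350, about 12.9% of the population, which is above the 5% threshold stated earlier.

```python
from math import sqrt
from statistics import NormalDist

N, n = 350, 45
mu, sigma = 37.6, 8.3

print(f"sample is {n / N:.1%} of the population")   # above 5%, so apply the correction
fpc = sqrt((N - n) / (N - 1))                       # finite correction factor
se = sigma / sqrt(n) * fpc                          # corrected standard error

z = (40 - mu) / se                                  # modified Z value
prob = NormalDist().cdf(z)                          # P(X-bar < 40)
print(round(se, 3), round(z, 2), round(prob, 3))
```

Without the correction factor, Z would be about 1.94, so ignoring the finite population slightly understates the probability here.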
Sampling Distribution of p (Sample Proportion)
$p = \frac{X}{n}$
where:
X = number of items in a sample that possess the characteristic
n = number of items in the sample
Sampling Distribution
• Approximately normal if nP > 5 and nQ > 5 (P is the population proportion and Q = 1 - P.)
• The mean of the distribution is P.
• The standard deviation of the distribution is $\sigma_p = \sqrt{\frac{P \cdot Q}{n}}$
Z Formula for Sample Proportions
$Z = \frac{p - P}{\sqrt{\frac{P \cdot Q}{n}}}$, for nP > 5 and nQ > 5
where:
p = sample proportion
n = sample size
P = population proportion
Q = 1 - P
Problem 4: If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?
Population parameters: P = .10, Q = 1 - P = .90
Sample: n = 80, X = 12, $p = \frac{X}{n} = \frac{12}{80} = .15$
$P(p \ge .15) = P\left(Z \ge \frac{.15 - .10}{\sqrt{\frac{(.10)(.90)}{80}}}\right) = P\left(Z \ge \frac{.05}{.0335}\right) = P(Z \ge 1.49) = .5000 - .4319 = .0681$
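Graphic Solution for Demonstration Problem 4
$Z = \frac{p - P}{\sqrt{\frac{P \cdot Q}{n}}} = \frac{.15 - .10}{\sqrt{\frac{(.10)(.90)}{80}}} = \frac{.05}{.0335} = 1.49$
(Figure: the sampling distribution of $\hat p$, centered at P = .10 with $\sigma_p = .0335$, alongside the standardized normal distribution. The area from .10 to .15 (Z = 0 to 1.49) is .4319 and the area above the mean is .5000, leaving .0681 beyond $\hat p$ = .15.)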
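A sketch of Problem 4 using the normal approximation exactly as the solution does, with Python's `statistics.NormalDist`. The `assert` line checks the nP > 5 and nQ > 5 condition from the text before the approximation is used.

```python
from math import sqrt
from statistics import NormalDist

P, n, X = 0.10, 80, 12
Q = 1 - P
assert n * P > 5 and n * Q > 5          # normal approximation condition from the text

p_hat = X / n                           # 0.15
se_p = sqrt(P * Q / n)                  # about .0335
z = (p_hat - P) / se_p                  # about 1.49
prob = 1 - NormalDist().cdf(z)          # P(p >= .15), no continuity correction
print(round(se_p, 4), round(z, 2), round(prob, 4))
```

Because X is a count, some texts apply a continuity correction (using 11.5/80 instead of 12/80); the solution above does not, and neither does this sketch.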