lecture 0317(1)
Post on 03-Jun-2018
8/12/2019 Lecture 0317(1)
Sampling and Sampling Distributions
Biswo Poudel
KUSOM
Why Take Samples?
To save money and time.
To maximize the information gleaned from limited resources.
Sometimes sampling is the only option, for example when access to the whole population is impossible. (How would you survey the owners of old Omega watches in Nepal?)
Census vs sampling
Question: When is taking a census a better option than taking a sample?
Answer: When omitting any part of the population is not tolerable for the researcher.
Example: all airplanes are tested thoroughly because the performance of each individual plane is so important.
Frame
The sample is taken from the target population.
The population list, map, directory or other source used to represent the population is called the frame.
Frames can also be school lists, trade association lists, or lists sold by list brokers.
Frames may have overregistration (including more units than the target population) or underregistration (including fewer).
Random Vs Nonrandom Sampling
Random Sampling: Every unit of the population has the same probability of being selected into the sample. For example: lottery outcomes. This is also called probability sampling.
Nonrandom Sampling: Not every unit of the population has the same probability of being selected into the sample. This is also called nonprobability sampling. Assigning the probability of occurrence in nonrandom sampling is impossible.
Data from nonrandom sampling are not amenable to analysis by most statistical techniques.
Random Sample Techniques
There are four basic random sample
techniques.
1. Simple Random Sample Technique: this is the most elementary technique. Number each unit of the frame from 1 to N, then select n of them into the sample using a random number generator.
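The numbering-and-drawing procedure can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the frame size N = 500, the sample size n = 20, the seed and the function name `simple_random_sample` are all assumptions chosen for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement, each unit equally likely."""
    rng = random.Random(seed)      # seed is optional; it only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1..N
N = 500
frame = list(range(1, N + 1))

sample = simple_random_sample(frame, n=20, seed=42)
print(sorted(sample))
```

`random.sample` draws without replacement, so no unit can appear in the sample twice.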
2. Stratified Random Sampling
In this technique, the population is subdivided into nonoverlapping subpopulations called strata.
The researcher then extracts a random sample from each subpopulation. It has the potential to reduce sampling error.
How to choose strata? (a) Each stratum must be internally homogeneous, and the strata must contrast with each other. (b) Stratify by demographic variables such as gender, socioeconomic class, geographic region, religion and ethnicity.
Stratified random sampling (SRS) can be either proportionate or disproportionate.
Proportionate SRS occurs when the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the whole population. If the sampling is not proportionate, then it is disproportionate SRS.
Example of proportionate SRS: Suppose we are sampling the population of Kathmandu. Kathmandu has 30% Newars. Suppose you have divided your population into strata by ethnicity. If you are taking a sample of 100 people, then you want to make sure 30 Newars are in the sample.
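The Kathmandu example can be sketched as a proportionate allocation followed by a random draw within each stratum. The frame of 10,000 people, the stratum labels and the function name `proportionate_stratified_sample` are hypothetical; only the 30% share and the sample size of 100 come from the example.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of units; returns a sample per stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)   # stratum share of the sample
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame of 10,000 people, 30% of them Newar
strata = {
    "Newar": [f"newar_{i}" for i in range(3000)],
    "Other": [f"other_{i}" for i in range(7000)],
}

result = proportionate_stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in result.items()})
```

With shares that do not divide evenly, the rounded stratum sizes may not add up to exactly n, so a real design would need a rule for distributing the remainder.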
3. Systematic Sampling
Used because of its convenience and relative
ease of administration.
Every kth item is selected to produce a sample of size n from a population of size N.
The value of k, sometimes called the sampling cycle, is given by $k = N/n$.
For this to be useful, the population elements in the source list should be in random order.
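A minimal sketch of systematic sampling with sampling cycle k = N/n follows. The random starting point within the first cycle is an assumption added here (the slide does not say where to start), and the frame of 1,000 units and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit, where k = N // n, after a random start."""
    N = len(frame)
    k = N // n                      # sampling cycle (integer part of N / n)
    rng = random.Random(seed)
    start = rng.randrange(k)        # assumed: random start within the first cycle
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1..1000, so k = 1000 // 50 = 20
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=50, seed=7)
print(sample[:5])
```

If the frame has a periodic pattern with a period related to k, the sample can be badly unrepresentative, which is why the order of the source list matters.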
4. Cluster (Area) Sampling
Divide the population into nonoverlapping areas (clusters) that are internally heterogeneous.
Each cluster is, in theory, a microcosm of the
population.
For example, Chitwan could be a cluster, when
thinking of taking a sample of Nepal. Other
cities, districts, metropolitan areas can also
qualify as a cluster.
After choosing clusters, the researcher either selects all elements from each cluster or randomly selects individual elements into the sample from the clusters.
Two-stage sampling: when the clusters are too big, a second set of smaller clusters is picked from within each big cluster.
Advantage: cost and convenience. Since the data are collected from within a few clusters, travel cost is reduced.
Disadvantage: If the elements within a cluster are similar, cluster sampling may be inefficient compared to simple random sampling. If all elements of a cluster are the same, sampling the cluster is no better than sampling one individual.
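The two options described above (take every element of the chosen clusters, or sample elements within them) can be sketched as follows. The cluster names other than Chitwan, the household IDs, the cluster sizes and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Pick clusters at random, then take all or a random subset of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(units)                 # take the whole cluster
        else:
            sample[name] = rng.sample(units, n_per_cluster)
    return sample

# Hypothetical clusters: each district maps to 200 household IDs
clusters = {
    district: [f"{district}-{i}" for i in range(200)]
    for district in ["Chitwan", "District B", "District C", "District D"]
}

picked = cluster_sample(clusters, n_clusters=2, n_per_cluster=5, seed=3)
for district, households in picked.items():
    print(district, households)
```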
Nonrandom Sampling techniques
Also called nonprobability techniques, since chance is not used to select elements from the population.
Four nonrandom sampling techniques are presented here.
1. Convenience Sampling: elements for the sample are selected for the convenience of the researcher. The researcher chooses elements that are readily available.
2. Judgment Sampling
Elements selected for the sample are chosen by
the judgment of the researcher.
Researchers often believe they can obtain the right sample by using their sound judgment.
Sampling errors are hard to determine because the samples are put together nonrandomly.
Problems: judgment errors might all be in one direction (introducing bias), and the sample is unlikely to include extreme elements.
3. Quota Sampling
In essence, similar to stratified random sampling (SRS).
Certain population subclasses are used as strata, but a nonrandom sampling technique is used to gather data from each stratum.
For example: one may go to a Newar community (say in Sundhara, Lalitpur) and interview people there until the quota is filled.
Advantage: low cost, easy to carry out.
Disadvantage: it is essentially a nonrandom sampling technique.
4. Snowball Sampling
Survey subjects are selected based on referrals from other survey respondents.
First pick a person who fits the profile of the subject wanted for the study. Then ask this person to refer others who have a similar profile.
Advantage: survey subjects are identified cheaply and efficiently.
Disadvantage: this is nonrandom.
Sampling Errors and Nonsampling Errors
Sampling errors: errors that occur when the sample is not representative of the population.
Nonsampling errors: all other errors, such as missing data, recording errors, input processing errors, analysis errors, response errors, errors caused by the measurement instrument, defective questionnaire errors, poor concept errors, etc.
Sample Mean and Sample Proportion
Whenever research produces measurable data such as weight, distance, time and income, the sample mean is often the statistic of choice. If the research results in countable items, such as how many people in a sample choose Coca Cola, the sample proportion is often the statistic of choice.
Sample Proportion ($\hat{p}$):
$\hat{p} = \dfrac{x}{n}$
where x = number of items in a sample that have the characteristic, and n = number of items in the sample.
Distribution of Sample Mean
Central Limit Theorem: If samples of size n are drawn randomly from a population that has a mean of $\mu$ and a standard deviation of $\sigma$, then the sample means $\bar{x}$ are approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, for sufficiently large sample sizes (greater than 30), regardless of the shape of the population distribution. If the population is normally distributed, the sample means are normally distributed for any sample size.
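A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings, not part of the original slides: the population is exponential with rate 1 (strongly skewed, with mean 1 and standard deviation 1), the sample size is 36, and 5,000 samples are drawn with a fixed seed.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 36                   # sample size (assumed; larger than 30)
reps = 5000              # number of samples drawn (assumed)

# Exponential population with rate 1: skewed, with mu = 1 and sigma = 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (theory: mu = 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of n.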
Example:
Suppose during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?
Answer: the problem is to determine $P(441 \le \bar{x} \le 446)$.
Notice that
$z = \dfrac{441 - 448}{21/\sqrt{49}} = -2.33$ and $z = \dfrac{446 - 448}{21/\sqrt{49}} = -0.67$.
The associated z-table values are 0.4901 and 0.2486.
This gives the probability of the sample mean being between 441 and 446 as 0.4901 - 0.2486 = 0.2415, i.e. 24.15%.
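The shopper calculation can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of a printed z-table; the variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49          # values from the shopper example
se = sigma / sqrt(n)                # standard deviation of the sample mean

z_low = (441 - mu) / se
z_high = (446 - mu) / se

# P(441 <= x_bar <= 446) as the area between the two z-values
p = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"z_low = {z_low:.2f}, z_high = {z_high:.2f}, probability = {p:.4f}")
```

The result differs slightly from 0.2415 in the later decimal places because the table lookup rounds each z-value to two decimals first.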
Correction for finite population
If the sample is taken from a finite population of size N, then the z-value for sample size n has to be calculated using the following formula:
$z = \dfrac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$
Example
A production company's 350 hourly employees average 37.6 years of age, with a standard deviation of 8.3 years. If a random sample of 45 hourly employees is taken, what is the probability that the sample will have an average age of less than 40 years?
$z = \dfrac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}} = \dfrac{40 - 37.6}{\dfrac{8.3}{\sqrt{45}} \sqrt{\dfrac{350 - 45}{350 - 1}}} = 2.07$
Associated z-table probability: 0.4808. The probability of getting an average of less than 40 years is 0.5 + 0.4808 = 0.9808.
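The employee-age example can be reproduced with a short helper. The function name `z_finite` is hypothetical, and `statistics.NormalDist` stands in for the z-table; the uncorrected z-value is computed alongside for comparison.

```python
from math import sqrt
from statistics import NormalDist

def z_finite(x_bar, mu, sigma, n, N):
    """z-value for a sample mean using the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))           # finite population correction factor
    return (x_bar - mu) / ((sigma / sqrt(n)) * fpc)

# Values from the employee-age example
z_corrected = z_finite(x_bar=40, mu=37.6, sigma=8.3, n=45, N=350)
z_uncorrected = (40 - 37.6) / (8.3 / sqrt(45))

p_corrected = NormalDist().cdf(z_corrected)      # P(x_bar < 40) with correction
p_uncorrected = NormalDist().cdf(z_uncorrected)  # P(x_bar < 40) without correction

print(f"with correction:    z = {z_corrected:.2f}, P = {p_corrected:.4f}")
print(f"without correction: z = {z_uncorrected:.2f}, P = {p_uncorrected:.4f}")
```

The correction factor is always less than 1 when n > 1, so it shrinks the standard error and pushes the z-value further from zero.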
If the correction had not been used, the answer would have been 0.9738 (with the associated z-value being 1.94).
Sampling Distribution of the Proportion
The normal distribution approximates the shape of the distribution of sample proportions if $n \cdot p > 5$ and $n \cdot q > 5$ (where $q = 1 - p$).
Z-value for a proportion:
$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p\,q}{n}}}$
Example:
Suppose 60% of the electrical contractors in a region use a particular brand of wire. What is the probability of taking a random sample of size 120 from these electrical contractors and finding that 0.5 or less use that brand of wire?
Here $p = 0.60$, $\hat{p} = 0.50$, $n = 120$, so
$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p\,q}{n}}} = \dfrac{0.50 - 0.60}{\sqrt{\dfrac{(0.60)(0.40)}{120}}} = -2.24$
The z-table value associated with z = -2.24 is 0.4875. Hence the probability of z being less than this value is 0.5 - 0.4875 = 0.0125.
Example 2:
If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?
Here $\hat{p} = 12/80 = 0.15$ and $p = 0.10$, so
$z = \dfrac{0.15 - 0.10}{\sqrt{\dfrac{(0.10)(0.90)}{80}}} = 1.49$,
which is associated with a probability of 0.0681.
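Both proportion examples can be reproduced with one small helper. The function name `proportion_z` is hypothetical; it also checks the n·p > 5 and n·q > 5 condition before using the normal approximation, and `statistics.NormalDist` replaces the z-table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p, n):
    """z-value for a sample proportion under the normal approximation."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("need n*p > 5 and n*q > 5 for the normal approximation")
    return (p_hat - p) / sqrt(p * q / n)

# Electrical contractors: p = 0.60, p_hat = 0.50, n = 120
z1 = proportion_z(0.50, 0.60, 120)
p1 = NormalDist().cdf(z1)            # P(p_hat <= 0.50), lower tail

# Defective parts: p = 0.10, p_hat = 12/80, n = 80
z2 = proportion_z(12 / 80, 0.10, 80)
p2 = 1 - NormalDist().cdf(z2)        # P(p_hat >= 0.15), upper tail

print(f"contractors: z = {z1:.2f}, P = {p1:.4f}")
print(f"defectives:  z = {z2:.2f}, P = {p2:.4f}")
```

The probabilities agree with the table-based answers to about three decimal places; the small differences come from rounding z to two decimals before the table lookup.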