statr sessions 11 to 12
Praxis Weekend Business Analytics
Learning Objectives
• Determine when to use sampling.
• Determine the pros and cons of various sampling techniques.
• Be aware of the different types of errors that can occur in a study.
• Understand the impact of the Central Limit Theorem on statistical analysis.
• Use the sampling distributions of the sample mean and sample proportion.
Reasons for Sampling
• Sampling – A means for gathering information about a population without conducting a census
– Information is gathered from a sample, and inference is made about the population
• Sampling has advantages over a census
– Sampling can save money.
– Sampling can save time.
Random versus non-random Sampling
• Nonrandom Sampling - Not every unit of the population has the same probability of being included in the sample
• Random sampling - Every unit of the population has the same probability of being included in the sample.
Sampling from a Frame
• A sample is taken from a population list, map, directory, or other source used to represent the population, which is called a frame.
• Examples of frames include telephone directories, school lists, trade association lists, and even lists sold by brokers.
• In theory, the target population and the frame are the same. In reality, frames may have over-registration (they contain units outside the target population) or under-registration (they miss units of the target population).
Random Sampling Techniques
• Simple Random Sampling – basis for other random sampling techniques
– Each unit is numbered from 1 to N (the size of the population)
– A random number generator can be used to select n items that form the sample
– Easier to perform on small populations. The process of numbering all members of a population is cumbersome for large populations
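The numbering-and-drawing procedure above can be sketched in a few lines of Python. This is only an illustration: the function name simple_random_sample, the population size of 500, and the seed are assumptions made for the example, not part of the course material.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from a population numbered 1..N."""
    rng = random.Random(seed)  # the seed is only there to make the draw repeatable
    # random.sample draws without replacement, so no unit is picked twice
    return rng.sample(range(1, N + 1), n)

# Hypothetical population of 500 numbered units, sample of 10
sample = simple_random_sample(N=500, n=10, seed=1)
print(sorted(sample))
```

Every subset of size n is equally likely to be drawn, which is what makes this the reference case for the other random techniques.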
Random Sampling Techniques
• Systematic Random Sampling
– Every kth item is selected to produce a sample of size n from a population of size N
– The value of k is called the sampling cycle
– Define k = N/n. Choose one random unit from the first k units, and then select every kth unit from there
– Used because of convenience and relative ease of administration
– A knowledgeable person can easily determine whether a sampling plan has been followed.
Systematic Random Sampling: Example
• Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
• A sample of fifty (n = 50) purchase orders is needed for an audit.
• k = 10,000/50 = 200
Systematic Sampling: Example
• First sample element randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.
• Subsequent sample elements: 45, 245, 445, 645, . . .
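The purchase-order example can be reproduced with a short Python sketch. The function name systematic_sample and its start argument are assumptions introduced for illustration; the start is fixed at 45 here only to match the slide, whereas in practice it would be drawn at random from the first k units.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the sampling cycle k and the n selected unit numbers."""
    k = N // n  # sampling cycle; this sketch assumes N is a multiple of n
    if start is None:
        # random starting point among the first k units
        start = random.Random(seed).randint(1, k)
    return k, [start + i * k for i in range(n)]

# N = 10,000 purchase orders, n = 50, first element assumed to be order 45
k, orders = systematic_sample(10_000, 50, start=45)
print(k, orders[:4], orders[-1])  # 200 [45, 245, 445, 645] 9845
```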
Random Sampling Techniques
• Systematic Random Sampling: Problems
– Problems can occur if the data are subject to any periodicity and the sampling interval is in sync with it; the sample is then no longer random
– Example: a list of 150 college students is actually a merged list of 5 classes with 30 students each, and within each class the names run from top students first to bottom students last. Systematic sampling of every 30th student may select only top, only bottom, or only mediocre students, because the list is cyclically organized
Random Sampling Techniques
• Stratified Random Sampling
– The population is broken down into strata, i.e. homogeneous segments with like characteristics (e.g. men and women, OR old, young, and middle-aged people, OR high-income, mid-income, and low-income groups), and then Simple/Systematic Random Sampling is done within each stratum.
– Efficient when differences between strata exist
– The technique capitalizes on the known homogeneity of subpopulations so that only relatively small samples are required to estimate the characteristic for each stratum or group
– Proportionate: the % of the sample taken from each stratum equals the % of the whole population that the stratum represents
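A minimal Python sketch of proportionate stratified sampling follows. The income strata, their sizes (500, 300, 200), and the function name are hypothetical values chosen for illustration; a simple random sample is drawn inside each stratum.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # stratum sample size; rounding can make the total differ slightly from n
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000 units in three income strata
strata = {
    "low": [f"L{i}" for i in range(500)],
    "mid": [f"M{i}" for i in range(300)],
    "high": [f"H{i}" for i in range(200)],
}
sample = proportionate_stratified_sample(strata, n=100, seed=7)
print({name: len(units) for name, units in sample.items()})
# {'low': 50, 'mid': 30, 'high': 20}
```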
Random Sampling Techniques
• Cluster (or Area) Sampling
– The population is in pre-determined clusters (students in classes, colleges, towns, companies, areas of a city, geographic regions, etc.)
– The technique identifies clusters that tend to be internally heterogeneous
– Each cluster contains a wide variety of elements and is a miniature of the population
– A random sample of clusters is chosen, and all or some units within the chosen clusters are used as the sample
– Advantages: convenience and cost. Clusters are convenient to obtain, and the cost of sampling is reduced because the scope of the study is reduced to the clusters
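As a rough Python sketch, cluster sampling randomizes over whole clusters rather than over individual units. The five classes of 30 students and the function name cluster_sample are assumptions for illustration, and this version keeps every unit of each chosen cluster.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Pick m clusters at random and keep all units inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # randomness is at the cluster level
    units = [u for c in chosen for u in clusters[c]]
    return chosen, units

# Hypothetical population: 5 classes of 30 students each
clusters = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(30)]
    for c in "ABCDE"
}
chosen, units = cluster_sample(clusters, m=2, seed=3)
print(chosen, len(units))
```

Only the chosen clusters need to be visited, which is where the cost saving comes from.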
Random Sampling Techniques
Important to remember:
– In Stratified Random Sampling, each stratum is a homogeneous group of the population
– In Cluster Sampling, each cluster is a heterogeneous group of the population
Convenience (NonRandom) Sampling
• Non-Random sampling – sampling techniques used to select elements from the population by any mechanism that does not involve a random selection process
– These techniques are not desirable for making statistical inferences
– Examples: choosing members of this class as an accurate representation of all students at our university, or selecting the first five people that walk into a store and asking them about their shopping preferences
Non-sampling Errors
• Non-sampling Errors – all errors that exist other than the variation expected due to random sampling
– Missing data, data entry, and analysis errors
– Leading questions, poorly conceived concepts, unclear definitions, and defective questionnaires
– Response errors occur when people do not know, will not say, or overstate in their answers
Sampling Distribution of Mean
Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.
[Figure: Process of Inferential Statistics. Select a random sample from the population (parameter), then calculate the sample statistic to estimate the population parameter.]
What is a Sampling Distribution?
• Recall that a statistic has a numerical value that can be computed (observed) once a sample data set is available.
• Three points are crucial in this context:
– Because a sample is only a part of the population, the numerical value of a statistic cannot be expected to give us the exact value of the parameter
– The observed value of a statistic depends on the particular sample that happens to be selected
– There will be some variability in the observed values of a statistic over different occasions of sampling
What is a Sampling Distribution?
• The value of a statistic varies in repeated sampling.
• In other words, a statistic is a random variable and hence has its own probability distribution
• The Sampling Distribution is the probability distribution of a statistic
• The qualifier "Sampling" indicates that the distribution is conceived in the context of repeated sampling from a population
• The qualifier is often dropped, and one simply says the distribution of a statistic
Statistic and Sampling Distribution
• In any given situation, we are often limited to one sample and the corresponding single observed value of a statistic
• However, over different samples the statistic varies according to its sampling distribution
• The sampling distribution of a statistic is determined by
- the probability distribution f(x) that governs the population
- the sample size n
Central Limit Theorem
• Consider taking a sample of size n from a population
• The sampling distribution of the sample mean is the distribution of the means of repeated samples of size n from a population
• The central limit theorem states that as the sample size increases:
– The shape of the distribution of the sample mean approaches a normal distribution (this condition is typically considered to be met when n is at least 30)
– The variance of the sample mean is the population variance divided by n, i.e. $\sigma^2/n$
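A small simulation can make the theorem concrete. The sketch below assumes a skewed exponential population with mean 1 and variance 1, which is an illustrative choice and not something from the slides. It draws 5,000 repeated samples for n = 2 and n = 30 and compares the variance of the sample means with 1/n.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(42)  # fixed seed only so the run is repeatable
results = {}
for n in (2, 30):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.variance(means))
    # columns: n, mean of sample means, variance of sample means, theoretical 1/n
    print(n, round(results[n][0], 3), round(results[n][1], 4), round(1 / n, 4))
```

The mean of the sample means stays close to the population mean of 1, while their variance shrinks roughly in line with 1/n. A histogram of the n = 30 means would also look far more bell-shaped than the population itself.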
Sampling from a Normal Population
When the population itself is normal, the distribution of sample means is normal for any sample size.
If $\bar{x}$ is the mean of a random sample of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$, the distribution of $\bar{x}$ is a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
z Formula for Sample Means
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Tyre Store Example
Suppose that the mean expenditure per customer at a tyre store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more?
Solution: Because the sample size is greater than 30, the central limit theorem can be used to state that the sample mean is approximately normally distributed, and the problem can proceed using normal distribution calculations.
Solution to Tyre Store Example
Population parameters: $\mu = 85$, $\sigma = 9$
Sample size: $n = 40$
Graphic Solution to Tyre Store Example
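[Figure: normal curve for $\bar{x}$ centered at 85 with $\sigma_{\bar{x}} = 9/\sqrt{40} = 1.42$, and the matching standard normal curve for $z$ centered at 0 with $\sigma = 1$. The area between the mean and the cutoff is .4207 on both curves, leaving equal upper-tail areas of .0793.]
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{87 - 85}{9/\sqrt{40}} = \frac{2}{1.42} = 1.41$
$P(\bar{x} \ge 87) = P(z \ge 1.41) = .5000 - .4207 = .0793$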
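The tyre store calculation can be checked numerically. The sketch below uses Python's standard-library NormalDist in place of a z table; the variable names are chosen for illustration. Because it does not round z to two decimals before the lookup, the tail probability comes out near 0.080 rather than exactly the table value of .0793.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40    # population mean, population sd, sample size
se = sigma / sqrt(n)            # standard deviation of the sample mean
z = (87.0 - mu) / se            # z score for a sample mean of $87
p = 1 - NormalDist().cdf(z)     # upper-tail probability P(x-bar >= 87)
print(round(se, 2), round(z, 2), round(p, 4))
```

The same tail area is what R's pnorm would return for this z with the upper tail requested.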
Demonstration Problem 7.1
Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?
Demonstration Problem 7.1
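Population parameters: $\mu = 448$, $\sigma = 21$
Sample size: $n = 49$
For $\bar{x} = 441$: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{441 - 448}{21/\sqrt{49}} = -2.33$
For $\bar{x} = 446$: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{446 - 448}{21/\sqrt{49}} = -0.67$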
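Graphic Solution for Demonstration Problem 7.1
[Figure: normal curve for $\bar{x}$ centered at 448 with $\sigma_{\bar{x}} = 21/\sqrt{49} = 3$, shaded between 441 and 446, and the matching standard normal curve for $z$ shaded between -2.33 and -0.67. The area from the mean to z = -2.33 is .4901 and the area from the mean to z = -0.67 is .2486, so the shaded area is .2415.]
$P(441 \le \bar{x} \le 446) = P(-2.33 \le z \le -0.67) = .4901 - .2486 = .2415$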
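For an interval such as this one, the probability is the difference of two cumulative areas. The Python sketch below (standard library only, illustrative variable names) works with unrounded z scores, so it gives roughly 0.243 instead of the table-based .2415.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448.0, 21.0, 49
se = sigma / sqrt(n)                 # 21 / 7 = 3 shoppers
z_low = (441.0 - mu) / se            # about -2.33
z_high = (446.0 - mu) / se           # about -0.67
std_normal = NormalDist()            # standard normal: mean 0, sd 1
# P(441 <= x-bar <= 446) as a difference of cumulative probabilities
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(p_between, 4))
```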
Exercise in R: Normal Distribution
The commands you will learn
• dnorm
• lines
• qqnorm
• qqline
• rnorm
• qqnormsim
• pnorm
• qnorm
Open URL: www.openintro.org
Go to Labs in R and select 3-Distributions
Exercise in R: Sampling Distribution
Here you will learn the Central Limit Theorem using the sample() command
Open URL: www.openintro.org
Go to Labs in R and select 4A – Intro to inference
Sampling Distribution of Sample Proportion ($\hat{p}$)
• Sample Proportion is defined as $\hat{p} = x/n$, where
– $x$ = number of items in a sample that possess a given characteristic, and
– $n$ = sample size
• Sampling Distribution
– The central limit theorem holds, and the distribution is approx. normal, if np > 5 and nq > 5 (p is the population proportion and q = 1 - p)
– The mean of the distribution is p
– The variance of the distribution is pq/n
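The mean and variance stated above can be checked by simulation. In the sketch below, p = 0.10 and n = 80 are illustrative values (so np = 8 and nq = 72 both exceed 5), and 10,000 repeated samples are drawn with a fixed seed so the run is repeatable.

```python
import random
import statistics

p, n, reps = 0.10, 80, 10_000   # assumed population proportion, sample size, repetitions
rng = random.Random(0)

# Each repetition: count the units with the characteristic and divide by n
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_hat = statistics.fmean(p_hats)      # should be close to p
var_hat = statistics.variance(p_hats)    # should be close to p*q/n
print(round(mean_hat, 4), round(var_hat, 6), round(p * (1 - p) / n, 6))
```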
Sampling Distribution of Sample Proportion ($\hat{p}$)
Whereas the mean is computed by averaging a set of values, the sample proportion is computed by dividing the frequency with which a given characteristic occurs in a sample by the number of items in the sample
Z Formula for Sample Proportions
$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$
where $\hat{p}$ = sample proportion, $n$ = sample size, $p$ = population proportion, and $q = 1 - p$
Demonstration Problem 7.3
If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?
Solution for Demonstration Problem 7.3
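Population Parameters: $p = 0.10$, $q = 1 - p = 0.90$
Sample: $n = 80$, $x = 12$, $\hat{p} = 12/80 = 0.15$
Check: $np = 80(0.10) = 8 > 5$ and $nq = 80(0.90) = 72 > 5$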
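Graphic Solution for Demonstration Problem 7.3
[Figure: normal curve for $\hat{p}$ centered at 0.10 with $\sigma_{\hat{p}} = 0.0335$ and cutoff at 0.15, and the matching standard normal curve for $z$ with cutoff at 1.49. The area between the mean and the cutoff is .4319 on both curves.]
$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.15 - 0.10}{\sqrt{(.10)(.90)/80}} = \frac{0.05}{0.0335} = 1.49$
$P(\hat{p} \ge 0.15) = P(z \ge 1.49) = .5000 - .4319 = .0681$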
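The defective-parts problem can be verified with the same normal approximation in Python. The variable names are illustrative, and the sketch applies no continuity correction, so it mirrors the slide's method rather than an exact binomial calculation.

```python
from math import sqrt
from statistics import NormalDist

p, n, x = 0.10, 80, 12
q = 1 - p
assert n * p > 5 and n * q > 5      # normal approximation check: np = 8, nq = 72

p_hat = x / n                        # sample proportion, 0.15
se = sqrt(p * q / n)                 # standard deviation of p-hat, about 0.0335
z = (p_hat - p) / se                 # about 1.49
p_tail = 1 - NormalDist().cdf(z)     # P(p-hat >= 0.15), close to the table value .0681
print(round(se, 4), round(z, 2), round(p_tail, 4))
```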