sampling and sampling distribution.ppt

Upload: shailendra-shastri

Post on 09-Oct-2015


  • Business Statistics

  • Learning Objectives
    - Determine when to use sampling instead of a census.
    - Distinguish between random and nonrandom sampling.
    - Decide when and how to use various sampling techniques.
    - Be aware of the different types of errors that can occur in a study.
    - Understand the impact of the central limit theorem on statistical analysis.
    - Use the sampling distributions of x̄ and p̂.

  • Reasons for Sampling
    - Sampling is a means of gathering useful information about a population: information is gathered from a sample, and conclusions are drawn about the population.
    - Sampling has several advantages over a census:
      - Sampling can save money.
      - Sampling can save time.
      - For given resources, sampling can broaden the scope of the study.
      - Because the research process is sometimes destructive, sampling can save product.
      - If accessing the entire population is impossible, sampling is the only option.

  • Reasons for Taking a Census
    - To eliminate the possibility that a random sample is not representative of the population.
    - Because the person authorizing the study is uncomfortable with sample information.

  • Sampling Frame
    - Every research study has a target population that consists of the individuals, institutions, or entities that are the objects of investigation.
    - The sample is taken from a population list, map, directory, or other source used to represent the population; this source is called the sampling frame.
    - Sampling is done from the frame, not from the target population.
    - Ideally, a one-to-one correspondence exists between frame units and population units.
    - Frames may have overregistration (the frame contains units that are not in the target population) or underregistration (the frame is missing units that are in the target population).

  • Random Versus Nonrandom Sampling
    - Nonrandom sampling (nonprobability sampling): not every unit of the population has the same probability of being included in the sample.
    - Random sampling (probability sampling): every unit of the population has the same probability of being included in the sample.

  • Random Sampling Techniques
    - Simple random sample: the basis for the other random sampling techniques.
    - Each unit of the frame is numbered from 1 to N.
    - A random number table or random number generator is used to select n items from the numbered frame.

  • Random Sampling Techniques
    - Stratified random sample
      - Proportionate: the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the whole population.
      - Disproportionate: the percentage of the sample taken from each stratum is not proportionate to the percentage that each stratum is within the whole population.
    - Systematic random sample
    - Cluster (or area) sampling

  • Simple Random Sample: Sample Members (N = 30, n = 6)

  • Simple Random Sampling: Random Number Table (N = 30, n = 6)
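As a software stand-in for the random number table, the sketch below numbers the frame units 1 to N and lets Python's pseudorandom generator draw n distinct units without replacement. The function name `simple_random_sample` and its `seed` parameter are illustrative assumptions, not part of the original slides.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from a frame numbered 1..N.

    Every unit has the same chance of selection, and no unit is drawn twice
    (sampling without replacement)."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return sorted(rng.sample(range(1, N + 1), n))

# N = 30 frame units, n = 6 sample members, as on the slides
print(simple_random_sample(30, 6, seed=7))
```

Fixing the seed is only for reproducibility; in practice you would omit it so each study gets a fresh draw.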

  • Stratified Random Sample
    - In stratified random sampling, the population is divided into non-overlapping subpopulations called strata.
    - The researcher extracts a simple random sample from each subpopulation.
    - Stratified random sampling has the potential for reducing sampling error.

  • Stratified Random Sample
    - Sampling error occurs when a sample does not represent the population.
    - Stratified random sampling has the potential to match the sample closely to the population.
    - Stratified sampling is more costly than simple random sampling.
    - Each stratum should be relatively homogeneous internally; strata are often based on characteristics such as race, gender, or religion.

  • Stratified Random Sample
    - Proportionate: the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the population.
    - Disproportionate: the proportions of the strata within the sample are different from the proportions of the strata within the population.
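A minimal sketch of proportionate stratified sampling, assuming each stratum is available as a list of units. The helper names, the hypothetical three-stratum population, and the largest-remainder rounding rule (used so the per-stratum sizes always add up to n) are illustrative choices, not taken from the slides.

```python
import random

def proportionate_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Each stratum first gets the floor of its exact quota n * N_h / N; any
    leftover units go to the strata with the largest fractional remainders,
    so the allocations always add up to n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: int(q) for h, q in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Proportionate stratified random sample: an independent simple random
    sample is drawn from each stratum (strata maps stratum name -> list of units)."""
    rng = random.Random(seed)
    alloc = proportionate_allocation({h: len(units) for h, units in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}

# Hypothetical population of 1,000 units in three strata (60% / 30% / 10%)
strata = {
    "north": [f"N{i}" for i in range(600)],
    "south": [f"S{i}" for i in range(300)],
    "west": [f"W{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, seed=3)
print({h: len(units) for h, units in sample.items()})  # {'north': 30, 'south': 15, 'west': 5}
```

For a disproportionate design, the allocation step would be replaced by per-stratum sizes chosen explicitly by the researcher.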

  • Systematic Sampling
    - Used because of its convenience and ease of administration.
    - Population elements are an ordered sequence (at least conceptually).
    - With systematic sampling, every kth item is selected to produce a sample of size n from a population of size N, where k = N/n.

  • Systematic Sampling
    - The first sample element is randomly selected from the first k elements of the frame.
    - Thereafter, sample elements are selected at a constant interval, k, from the ordered sequence frame.
    - Advantages of systematic sampling:
      - The sample is evenly distributed across the frame.
      - It is easy to determine whether the sampling plan has been followed.
    - Systematic sampling is based on the assumption that the ordering of the population elements in the frame is random.

  • Systematic Sampling: Example
    - Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
    - A sample of fifty (n = 50) purchase orders is needed for an audit.
    - k = N/n = 10,000/50 = 200

  • Systematic Sampling: Example
    - The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.
    - The sample elements are then 45, 245, 445, 645, . . ., 9,845.
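The purchase-order example can be reproduced with a few lines. This is a sketch assuming the frame is numbered 1 to N; the function name and the optional `start`/`seed` parameters are hypothetical conveniences, with `start=45` matching the slide's assumed first selection.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select every kth element of a frame numbered 1..N, where k = N // n.

    The first element is drawn at random from the first k elements unless a
    start position is supplied. Assumes N is a multiple of n, as in the
    purchase-order example; otherwise floor division leaves a few units
    at the end of the frame that can never be selected."""
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both endpoints
    return [start + i * k for i in range(n)]

# Purchase-order audit: N = 10,000, n = 50, so k = 200; the slide assumes a start of 45
orders = systematic_sample(10_000, 50, start=45)
print(orders[:4])   # [45, 245, 445, 645]
print(orders[-1])   # 9845
```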

  • Cluster Sampling
    - Cluster sampling involves dividing the population into non-overlapping areas, or clusters.
    - It identifies clusters that tend to be internally heterogeneous.
    - Each cluster is a microcosm of the population.
    - If a cluster is too large, a second set of clusters is taken from each original cluster. This is two-stage sampling.
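A minimal sketch of the two-stage design described above, assuming the frame is stored as a nested mapping from first-stage clusters (e.g. cities) to second-stage clusters (e.g. blocks) to elements. The function name and the hypothetical city/block/household data are illustrative only.

```python
import random

def two_stage_cluster_sample(areas, m, s, seed=None):
    """Two-stage cluster sample.

    areas maps each first-stage cluster to {second-stage cluster: [elements]}.
    Stage 1 draws a simple random sample of m first-stage clusters; stage 2
    draws s second-stage clusters inside each selected one. Every element of
    each selected second-stage cluster enters the sample."""
    rng = random.Random(seed)
    sample = []
    for area in rng.sample(sorted(areas), m):
        for block in rng.sample(sorted(areas[area]), s):
            sample.extend(areas[area][block])
    return sample

# Hypothetical frame: 4 cities, each split into 5 blocks of 10 households
areas = {
    f"city{c}": {f"block{b}": [f"c{c}-b{b}-h{h}" for h in range(10)] for b in range(5)}
    for c in range(4)
}
sample = two_stage_cluster_sample(areas, m=2, s=3, seed=1)
print(len(sample))  # 60 households: 2 cities x 3 blocks x 10 households
```

Only the selected cities and blocks need to be listed in detail, which is why cluster sampling helps when a full element-level frame is unavailable.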

  • Cluster Sampling
    - Advantages:
      - More convenient for geographically dispersed populations.
      - Reduced travel costs to contact sample elements.
      - Simplified administration of the survey.
      - Usable when the unavailability of a sampling frame prohibits using other random sampling methods.

  • Cluster Sampling
    - Disadvantages:
      - Statistically less efficient when the cluster elements are similar.
      - Costs and problems of statistical analysis are greater than for simple random sampling.

  • Nonrandom Sampling
    - Nonrandom sampling: sampling techniques that select elements from the population by any mechanism that does not involve a random selection process.
    - These techniques are not desirable for gathering data to be analyzed by inferential statistics.
    - Sampling error cannot be determined objectively for these techniques.

  • Types of Nonrandom Sampling Techniques
    - Convenience sampling: elements of the sample are selected for the convenience of the researcher (readily available, nearby, or willing to participate).
    - Judgment sampling: elements of the sample are chosen by the judgment of the researcher.
    - Quota sampling: quotas set the sizes of the samples to be obtained from subgroups, based on the proportions of the subclasses in the population.
    - Snowball sampling: survey subjects are selected based on referrals from other survey respondents.

  • Errors
    - Data from nonrandom samples are not appropriate for analysis by inferential statistical methods.
    - Sampling error occurs when the sample is not representative of the population.
    - Nonsampling errors are all errors other than sampling errors:
      - Missing data, recording, data entry, and analysis errors
      - Poorly conceived concepts, unclear definitions, and defective questionnaires
      - Response errors, which occur when people do not know, will not say, or overstate in their answers

  • Sampling Distribution of the Sample Mean
    - Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.

  • Sampling Distribution of x̄
    - The sampling distribution of x̄ is the frequency distribution of the sample means computed from random samples of a given size drawn from a population with a particular distribution.
    - Sample means for samples taken from populations with different distributions appear to be approximately normally distributed, especially as the sample size becomes larger.

  • Central Limit Theorem
    - The central limit theorem allows one to study populations with differently shaped distributions.
    - The central limit theorem creates the potential for applying the normal distribution to many problems when the sample size is sufficiently large.

  • Central Limit Theorem
    - Advantage: sample data drawn from populations that are not normally distributed, or from populations of unknown shape, can still be analyzed with the normal distribution, because for large sample sizes the sample means are approximately normally distributed.

  • Central Limit Theorem
    - As the sample size increases, the distribution of sample means narrows.
    - This is due to the standard deviation of the mean, σ_x̄ = σ/√n, which decreases as the sample size increases.
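The narrowing can be checked empirically. The sketch below assumes, purely for illustration, a strongly right-skewed exponential population with mean 1 and standard deviation 1; it draws many samples of each size and compares the spread of the sample means with σ/√n. The function name and the choice of 5,000 repetitions are arbitrary assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` random samples of size n from an exponential population
    (rate 1, so mean 1 and standard deviation 1, strongly right-skewed)
    and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

for n in (2, 5, 30):
    means = simulate_sample_means(n)
    print(
        f"n={n:2d}  mean of x-bars={statistics.fmean(means):.3f}  "
        f"sd of x-bars={statistics.stdev(means):.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}"
    )
```

The mean of the x̄ values should stay near μ = 1 while their standard deviation shrinks toward σ/√n; plotting a histogram of `means` for n = 30 would show the roughly bell-shaped pattern the theorem predicts.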

  • Sampling from a Normal Population
    - The distribution of sample means is normal for any sample size.

  • Z Formula for Sample Means
    $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

  • Tire Store Example
    Suppose, for example, that the mean expenditure per customer at a tire store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more? Because the sample size is greater than 30, the central limit theorem can be used, and the sample means are approximately normally distributed. With μ = $85.00, σ = $9.00, n = 40, and the z formula for sample means, z is computed as shown on the next slide.


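  • Solution to Tire Store Example
    $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{87.00 - 85.00}{9.00 / \sqrt{40}} = \frac{2.00}{1.42} = 1.41$$
    From the z table, the area between z = 0 and z = 1.41 is .4207, so P(x̄ ≥ $87.00) = P(z ≥ 1.41) = .5000 - .4207 = .0793.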
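The same calculation can be done with Python's standard library instead of a z table. This sketch uses `statistics.NormalDist` for the standard normal CDF; variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 85.00, 9.00, 40       # population mean, population sd, sample size
x_bar = 87.00                        # sample-mean threshold of interest

se = sigma / math.sqrt(n)            # standard error of the mean, about 1.423
z = (x_bar - mu) / se                # z formula for sample means
p_upper = 1 - NormalDist().cdf(z)    # P(z >= z) under the standard normal

print(f"z = {z:.2f}, P(x-bar >= 87) = {p_upper:.4f}")
```

The computed probability (about .0799) differs slightly from the table answer of .0793 because the code does not round z to two decimals before looking up the area.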

  • Graphic Solution to Tire Store Example

  • Demonstration Problem
    Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?


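  • Solution to Demonstration Problem
    $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{21}{\sqrt{49}} = 3$$
    $$z = \frac{441 - 448}{3} = -2.33, \qquad z = \frac{446 - 448}{3} = -0.67$$
    From the z table, the area between z = 0 and z = 2.33 is .4901 and the area between z = 0 and z = 0.67 is .2486, so P(441 ≤ x̄ ≤ 446) = .4901 - .2486 = .2415.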
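For an interval probability, it is convenient to build the sampling distribution of x̄ directly as a normal distribution with mean μ and standard deviation σ/√n and subtract two CDF values. A minimal sketch using `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 448, 21, 49
se = sigma / math.sqrt(n)                     # 21 / 7 = 3.0 shoppers
sampling_dist = NormalDist(mu=mu, sigma=se)   # distribution of the sample mean
p_between = sampling_dist.cdf(446) - sampling_dist.cdf(441)
print(f"P(441 <= x-bar <= 446) = {p_between:.4f}")
```

The result (about .2427) is a little larger than the table answer of .2415 because the table method rounds the z values to -2.33 and -0.67.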

  • Graphic Solution for Demonstration Problem

  • Z Formula for Sample Means of a Finite Population
    $$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$$

    where √((N - n)/(N - 1)) is called the finite correction factor. When the sample size is less than 5% of the finite population size (n/N < .05), the finite correction factor does not significantly modify the solution.
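To see when the correction matters, the sketch below reuses the tire store figures (μ = 85, σ = 9, n = 40, x̄ = 87) with hypothetical population sizes N that do not appear in the slides. Passing `N=None` means the population is treated as infinite and no correction is applied; the function names are illustrative.

```python
import math

def finite_correction_factor(N, n):
    """sqrt((N - n) / (N - 1)) for a sample of n drawn from a finite population of N."""
    return math.sqrt((N - n) / (N - 1))

def z_sample_mean(x_bar, mu, sigma, n, N=None):
    """z for a sample mean; applies the finite correction factor when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= finite_correction_factor(N, n)
    return (x_bar - mu) / se

# Tire store numbers, with hypothetical population sizes for illustration
for N in (None, 100_000, 500):
    label = "infinite" if N is None else f"N={N}"
    print(label, round(z_sample_mean(87.00, 85.00, 9.00, 40, N), 3))
```

With N = 100,000 (n/N well under 5%) the z value barely changes, while with N = 500 (n/N = 8%) the correction shrinks the standard error enough to move z noticeably.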

  • Sampling Distribution of the Sample Proportion

    - Approximately normal if nP > 5 and nQ > 5 (P is the population proportion and Q = 1 - P).
    - The mean of the distribution is P.
    - The standard deviation of the distribution is
    $$\sigma_{\hat{p}} = \sqrt{\frac{P \cdot Q}{n}}$$
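A quick simulation can confirm these two properties. The sketch assumes P = .10 and n = 80 (the values used in the demonstration problem later in this section) and an arbitrary 5,000 repetitions; each simulated item "has the characteristic" with probability P.

```python
import math
import random
import statistics

def simulate_p_hats(P, n, reps=5000, seed=0):
    """Draw `reps` random samples of size n from a population in which a
    proportion P of items has the characteristic; return each sample's p-hat."""
    rng = random.Random(seed)
    return [sum(rng.random() < P for _ in range(n)) / n for _ in range(reps)]

P, n = 0.10, 80                  # nP = 8 and nQ = 72, both greater than 5
p_hats = simulate_p_hats(P, n)
print(f"mean of p-hats = {statistics.fmean(p_hats):.4f} (P = {P})")
print(f"sd of p-hats   = {statistics.stdev(p_hats):.4f} (sqrt(PQ/n) = {math.sqrt(P * (1 - P) / n):.4f})")
```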

  • Sampling Distribution of p̂
    - p̂ is a sample proportion.
    - Whereas the mean is computed by averaging a set of values, the sample proportion is computed by dividing the frequency with which a given characteristic occurs in a sample by the number of items in the sample:
    $$\hat{p} = \frac{x}{n}$$
    where x is the number of items in the sample that have the characteristic and n is the number of items in the sample.

  • Z Formula for Sample Proportions
    $$z = \frac{\hat{p} - P}{\sqrt{\dfrac{P \cdot Q}{n}}}$$

  • Demonstration Problem
    If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?


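  • Solution for Demonstration Problem
    Here n = 80, P = .10, Q = .90, and p̂ = 12/80 = .15; nP = 8 and nQ = 72 both exceed 5, so the normal approximation applies.
    $$z = \frac{.15 - .10}{\sqrt{\dfrac{(.10)(.90)}{80}}} = \frac{.05}{.0335} = 1.49$$
    From the z table, the area between z = 0 and z = 1.49 is .4319, so P(p̂ ≥ .15) = .5000 - .4319 = .0681.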
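The same normal-approximation calculation in code, as a sketch that follows the slides' z formula for sample proportions (no continuity correction) and uses `statistics.NormalDist` in place of the z table:

```python
import math
from statistics import NormalDist

P, n, x = 0.10, 80, 12              # population proportion, sample size, defective count
Q = 1 - P
assert n * P > 5 and n * Q > 5      # normal-approximation condition from the slides

p_hat = x / n                       # 12 / 80 = 0.15
se = math.sqrt(P * Q / n)           # standard deviation of p-hat, about 0.0335
z = (p_hat - P) / se                # z formula for sample proportions
p_upper = 1 - NormalDist().cdf(z)   # P(p-hat >= .15)
print(f"z = {z:.2f}, P(p-hat >= .15) = {p_upper:.4f}")
```

The unrounded result (about .0680) agrees with the table answer of .0681 to within rounding.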

  • Graphic Solution for Demonstration Problem