Statr sessions 11 to 12


Uploaded by: ruruchowdhury

Posted on 06-May-2015


DESCRIPTION

Praxis Weekend Business Analytics

TRANSCRIPT

Page 1: Statr sessions 11 to 12

Learning Objectives

• Determine when to use sampling.
• Determine the pros and cons of various sampling techniques.
• Be aware of the different types of errors that can occur in a study.
• Understand the impact of the Central Limit Theorem on statistical analysis.
• Use the sampling distributions of the sample mean and sample proportion.


Reasons for Sampling

• Sampling: a means of gathering information about a population without conducting a census
  – Information is gathered from a sample, and an inference is made about the population
• Sampling has advantages over a census
  – Sampling can save money.
  – Sampling can save time.


Random versus Nonrandom Sampling

• Nonrandom Sampling - Not every unit of the population has the same probability of being included in the sample.

• Random sampling - Every unit of the population has the same probability of being included in the sample.


Sampling from a Frame

• A sample is taken from a population list, map, directory, or other source used to represent the population; this source is called a frame.

• Frames can be telephone directories, school lists, trade association lists, or even lists sold by brokers.

• In theory, the target population and the frame are the same. In reality, however, frames may suffer from over-registration (the frame contains units that are not in the target population) or under-registration (the frame omits units that are in the target population).


Random Sampling Techniques

• Simple Random Sampling
  – Basis for the other random sampling techniques
  – Each unit is numbered from 1 to N (the size of the population)
  – A random number generator can be used to select n items that form the sample
  – Easier to perform on small populations; numbering every member of a large population is cumbersome
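The procedure above can be sketched in Python, assuming the units are identified by the numbers 1 to N. The function name `simple_random_sample` and the values N = 1,000 and n = 10 are hypothetical. `random.sample` draws without replacement, so every unit has the same probability (n/N) of being selected.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Number the units 1..N and draw n of them without replacement.

    Every unit has the same chance of ending up in the sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

sample = simple_random_sample(N=1000, n=10, seed=42)
print(sample)
```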


Random Sampling Techniques

• Systematic Random Sampling
  – Every kth item is selected to produce a sample of size n from a population of size N
  – The value of k is called the sampling cycle
  – Define k = N/n. Choose one unit at random from the first k units, then select every kth unit from there
  – Used because of convenience and relative ease of administration
  – A knowledgeable person can easily determine whether a sampling plan has been followed


Systematic Random Sampling: Example

• Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).

• A sample of fifty (n = 50) purchase orders is needed for an audit.

• k = 10,000/50 = 200


Systematic Sampling: Example

• The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.

• Subsequent sample elements: 45, 245, 445, 645, . . .
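The purchase-order example can be reproduced with a short Python sketch. It assumes N is a multiple of n, so that k = N/n is a whole number. `systematic_sample` is a hypothetical helper name. Passing `start=45` fixes the random start assumed on the slide; omitting it picks the start at random from the first k units.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select every kth unit (k = N // n) from units numbered 1..N.

    Assumes N is a multiple of n. If no start is given, one of the
    first k units is chosen at random as the starting point.
    """
    k = N // n  # sampling cycle
    if start is None:
        start = random.Random(seed).randint(1, k)  # inclusive of 1 and k
    return [start + i * k for i in range(n)]

orders = systematic_sample(N=10_000, n=50, start=45)
print(orders[:4])   # [45, 245, 445, 645]
print(orders[-1])   # 9845
```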


Random Sampling Techniques

• Systematic Random Sampling: Problems
  – Problems can occur if the data are subject to periodicity and the sampling interval is in sync with it; the sampling is then nonrandom
  – Example: a list of 150 college students is actually a merged list of 5 classes with 30 students each, and within each class the names are ordered with top students first and bottom students last. Sampling every 30th student may then select only top, only bottom, or only mediocre students, because the list has a cyclical organization
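The student-list problem can be simulated in Python. The list is hypothetical: each student is represented only by (class, rank within class), with rank 1 the top student. With k = 30, the sampling interval matches the class length. Whatever random start is chosen, all five selected students therefore share the same rank.

```python
# Hypothetical merged list: 5 classes of 30 students, each class ordered
# by rank (1 = top student, 30 = bottom student).
students = [(cls, rank) for cls in range(1, 6) for rank in range(1, 31)]

N, n = len(students), 5
k = N // n  # sampling cycle = 30, identical to the class length

def ranks_selected(start):
    """Ranks of the students picked by systematic sampling from a 1-based start."""
    return [rank for _, rank in students[start - 1::k]]

for start in (1, 15, 30):
    print(start, ranks_selected(start))
```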


Random Sampling Techniques

• Stratified Random Sampling
  – The population is broken down into strata, i.e. homogeneous segments with like characteristics (e.g. men and women; or old, young, and middle-aged people; or high-, middle-, and low-income groups). Simple or systematic random sampling is then done within each stratum
  – Efficient when differences between strata exist
  – The technique capitalizes on the known homogeneity of subpopulations, so only relatively small samples are required to estimate the characteristic for each stratum or group
  – Proportionate stratified sampling: the percentage of the sample taken from each stratum equals the percentage of the whole population that the stratum represents
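Here is a sketch of proportionate stratified sampling in Python, assuming each stratum is given as a list of units. The income strata and their sizes (200, 300, 500) are hypothetical. Each stratum receives n·N_h/N units, rounded to the nearest integer, drawn by simple random sampling within the stratum. With other stratum sizes, the rounded allocations may not add up to exactly n, so a real design would need a rule for the leftover units.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Draw a proportionate stratified sample.

    strata: dict mapping stratum name -> list of units.
    Each stratum gets round(n * N_h / N) units, chosen by simple random sampling.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)   # proportionate allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000 people in three income strata
strata = {
    "high": [f"H{i}" for i in range(200)],
    "mid": [f"M{i}" for i in range(300)],
    "low": [f"L{i}" for i in range(500)],
}
sample = proportionate_stratified_sample(strata, n=50, seed=3)
print({name: len(units) for name, units in sample.items()})  # {'high': 10, 'mid': 15, 'low': 25}
```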


Random Sampling Techniques

• Cluster (or Area) Sampling
  – The population is divided into pre-determined clusters (students in classes, colleges, towns, companies, areas of a city, geographic regions, etc.)
  – The technique identifies clusters that tend to be internally heterogeneous
  – Each cluster contains a wide variety of elements and is a miniature of the population
  – A random sample of clusters is chosen, and all or some of the units within the chosen clusters are used as the sample
  – Advantages: convenience and cost. Clusters are convenient to obtain, and sampling costs less because the study is limited to the selected clusters
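Below is a Python sketch of one-stage cluster sampling, in which every unit of each chosen cluster enters the sample. The population of 10 classes with 25 students each is hypothetical, as is the helper name `cluster_sample`. A two-stage design would instead subsample units within the chosen clusters.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: pick m clusters at random, keep all their units.

    clusters: dict mapping cluster id -> list of units.
    Returns the chosen cluster ids and the combined list of sampled units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    units = [u for c in chosen for u in clusters[c]]
    return chosen, units

# Hypothetical population: 10 classes of 25 students each
classes = {f"class{c}": [f"class{c}-s{s}" for s in range(25)] for c in range(10)}
chosen, units = cluster_sample(classes, m=3, seed=1)
print(chosen, len(units))
```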


Random Sampling Techniques

Important to remember:
• In Stratified Random Sampling, each stratum is a homogeneous group of the population.
• In Cluster Sampling, each cluster is a heterogeneous group of the population.


Convenience (Nonrandom) Sampling

• Nonrandom sampling: sampling techniques that select elements from the population by any mechanism that does not involve a random selection process
  – These techniques are not desirable for making statistical inferences
  – Examples: treating the members of this class as an accurate representation of all students at our university, or selecting the first five people who walk into a store and asking them about their shopping preferences


Non-sampling Errors

• Non-sampling Errors: all errors other than the variation expected due to random sampling
  – Missing data, data entry, and analysis errors
  – Leading questions, poorly conceived concepts, unclear definitions, and defective questionnaires
  – Response errors, which occur when people do not know, will not say, or overstate in their answers


Sampling Distribution of the Mean

Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.

[Figure: Process of Inferential Statistics. A random sample is selected from the population (parameter); the sample statistic is calculated to estimate the population parameter.]


What is a Sampling Distribution?

• Recall that a statistic has a numerical value that can be computed (observed) once a sample data set is available.

• Three points are crucial in this context:
  – Because a sample is only a part of the population, the numerical value of a statistic cannot be expected to give us the exact value of the parameter
  – The observed value of a statistic depends on the particular sample that happens to be selected
  – There will be some variability in the observed values of a statistic over different occasions of sampling


What is a Sampling Distribution?

• The value of a statistic varies in repeated sampling.
• In other words, a statistic is a random variable and hence has its own probability distribution.
• A sampling distribution is the probability distribution of a statistic.
• The qualifier "sampling" indicates that the distribution is conceived in the context of repeated sampling from a population.
• The qualifier is often dropped, and we simply speak of the distribution of a statistic.
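For a small enough population, the sampling distribution can be written out exactly. The Python sketch below assumes a hypothetical population {1, 2, 3, 4} and samples of size n = 2 drawn with replacement, so all 16 ordered samples are equally likely. It lists every possible sample mean with its probability. The mean of this distribution equals the population mean (5/2). Its variance (5/8) equals the population variance (5/4) divided by n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # hypothetical population
n = 2                       # sample size, sampling with replacement

# All equally likely ordered samples of size n and their means
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean: value -> probability
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}
for value, prob in dist.items():
    print(value, prob)

# Population mean and variance versus mean and variance of the sample means
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)
mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum(p * (m - mean_of_means) ** 2 for m, p in dist.items())
print(mu, sigma2, mean_of_means, var_of_means)
```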


Statistic and Sampling Distribution

• In any given situation, we are often limited to one sample and the corresponding single observed value of a statistic.
• However, over different samples the statistic varies according to its sampling distribution.
• The sampling distribution of a statistic is determined by
  – the probability distribution f(x) that governs the population
  – the sample size n


Central Limit Theorem

• Consider taking a sample of size n from a population.
• The sampling distribution of the sample mean is the distribution of the means of repeated samples of size n from that population.
• The central limit theorem states that as the sample size increases:
  – The shape of the distribution approaches a normal distribution (this condition is typically considered to be met when n is at least 30)
  – The variance of the sample mean is the population variance divided by n, $\sigma^2_{\bar{x}} = \sigma^2/n$, so it shrinks as n grows
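A small Monte Carlo experiment in Python illustrates both points. For illustration, the population is assumed to be exponential with mean 1 and variance 1, which is strongly right-skewed (skewness 2). For each sample size, the sketch draws 20,000 samples and reports the mean, variance, and skewness of the sample means. The variance should come out close to 1/n, and the skewness should shrink toward 0 as n grows (theory gives 2/√n). Exact figures depend on the random seed.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an exponential population (mean 1, variance 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Population-style skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (2, 30):
    means = simulate_sample_means(n, reps=20_000)
    results[n] = (statistics.fmean(means), statistics.pvariance(means), skewness(means))
    mean, var, skew = results[n]
    print(f"n={n:2d}  mean={mean:.3f}  variance={var:.4f} (theory {1 / n:.4f})  skewness={skew:.2f}")
```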


Sampling from a Normal Population

The distribution of sample means is normal for any sample size.

If $\bar{x}$ is the mean of a random sample of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$, the distribution of $\bar{x}$ is a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.


z Formula for Sample Means

The distribution of sample means is normal for any sample size.

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$


Tyre Store Example

Suppose that the mean expenditure per customer at a tyre store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more?

Solution: Because the sample size is greater than 30, the central limit theorem can be used to state that the sample mean is approximately normally distributed, and the problem can proceed using normal distribution calculations.


Solution to Tyre Store Example

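Population parameters: $\mu = 85$, $\sigma = 9$ (dollars)
Sample size: $n = 40$

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{87 - 85}{9/\sqrt{40}} = \frac{2}{1.42} = 1.41$$

$P(\bar{x} \ge 87) = P(Z \ge 1.41) = .5000 - .4207 = .0793$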


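Graphic Solution to Tyre Store Example

[Figure: the sampling distribution of $\bar{x}$ (centered at $\mu = 85$, with $\sigma_{\bar{x}} = 9/\sqrt{40} = 1.42$ and cutoff $\bar{x} = 87$) shown alongside the standard normal curve (centered at 0, cutoff $Z = (87 - 85)/1.42 = 1.41$). The area between the center and the cutoff is .4207 on each curve, leaving equal upper-tail areas of .5000 − .4207 = .0793.]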
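The same calculation can be checked in Python with the standard library's `statistics.NormalDist`. This is a sketch, not part of the original exercise. It uses the unrounded z (about 1.4055), so the tail probability comes out near .0799 rather than the table value .0793, which is based on z rounded to 1.41.

```python
import math
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40   # population mean and sd (dollars), sample size
xbar = 87.0                    # sample mean of interest

se = sigma / math.sqrt(n)       # standard error of the mean: sigma / sqrt(n)
z = (xbar - mu) / se            # z formula for sample means
p = 1 - NormalDist().cdf(z)     # P(x-bar >= 87) = P(Z >= z)

print(f"standard error = {se:.2f}, z = {z:.2f}, P = {p:.4f}")
```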


Demonstration Problem 7.1

Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?


Demonstration Problem 7.1

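Population parameters: $\mu = 448$, $\sigma = 21$
Sample size: $n = 49$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{441 - 448}{21/\sqrt{49}} = -2.33$$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{446 - 448}{21/\sqrt{49}} = -0.67$$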


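Graphic Solution for Demonstration Problem 7.1

[Figure: the sampling distribution of $\bar{x}$ (centered at 448, with $\sigma_{\bar{x}} = 21/\sqrt{49} = 3$ and $\bar{x} = 441$ and $446$ marked) shown alongside the standard normal curve ($Z = -2.33$ and $-0.67$). The areas from the center are .4901 and .2486, so the probability of a sample mean between 441 and 446 is .4901 − .2486 = .2415.]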
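Here is a Python check of Demonstration Problem 7.1, sketched with `statistics.NormalDist`. The sampling distribution of x̄ is modeled directly as a normal distribution with mean 448 and standard deviation 21/√49 = 3, so the probability is a difference of two CDF values. Because z is not rounded (it is exactly −7/3 and −2/3), the result is about .2427 rather than the table-based .2415.

```python
import math
from statistics import NormalDist

mu, sigma, n = 448, 21, 49                          # values from the problem
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))    # sampling distribution of x-bar: N(448, 3)

z_low = (441 - mu) / xbar_dist.stdev    # about -2.33
z_high = (446 - mu) / xbar_dist.stdev   # about -0.67
p = xbar_dist.cdf(446) - xbar_dist.cdf(441)

print(f"z = {z_low:.2f} to {z_high:.2f}, P(441 <= x-bar <= 446) = {p:.4f}")
```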


Exercise in R: Normal Distribution

The commands you will learn:
• dnorm
• lines
• qqnorm
• qqline
• rnorm
• qqnormsim
• pnorm
• qnorm

Open URL: www.openintro.org, go to Labs in R, and select 3-Distributions


Exercise in R: Sampling Distribution

Here you will learn about the Central Limit Theorem using the sample() command.

Open URL: www.openintro.org, go to Labs in R, and select 4A – Intro to inference


Sampling Distribution of Sample Proportion ($\hat{p}$)

• Sample proportion is defined as

$$\hat{p} = \frac{x}{n}$$

where x = number of items in a sample that possess a given characteristic, and n = sample size
• Sampling distribution
  – The central limit theorem holds, and the distribution is approximately normal if np > 5 and nq > 5 (p is the population proportion and q = 1 − p)
  – The mean of the distribution is p. The variance of the distribution is pq/n
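The mean and variance stated above can be verified exactly, without any normal approximation. The sketch assumes each of the n sampled items independently has the characteristic with probability p, so the count x is binomial. For illustration it takes p = 0.10 and n = 80 (the values of Demonstration Problem 7.3). It works with exact fractions and sums over every possible x to confirm that the mean of $\hat{p}$ is p and its variance is pq/n.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 10)   # assumed population proportion
q = 1 - p
n = 80                # assumed sample size

# P(x = k) for the binomial count x of items with the characteristic
pmf = {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

# p-hat = x / n, so its moments follow directly from the pmf
mean_p_hat = sum(Fraction(k, n) * prob for k, prob in pmf.items())
var_p_hat = sum((Fraction(k, n) - mean_p_hat) ** 2 * prob for k, prob in pmf.items())

print(mean_p_hat, var_p_hat, p * q / n)
```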


Sampling Distribution of Sample Proportion ($\hat{p}$)

Whereas the mean is computed by averaging a set of values, the sample proportion is computed by dividing the frequency with which a given characteristic occurs in a sample by the number of items in the sample.


Z Formula for Sample Proportions

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p\,q}{n}}}$$

where $\hat{p}$ = sample proportion, n = sample size, p = population proportion, and q = 1 − p


Demonstration Problem 7.3

If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?


Solution for Demonstration Problem 7.3

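Population parameters: p = 0.10, q = 0.90

Sample: n = 80, x = 12, $\hat{p} = 12/80 = 0.15$

Check: np = 80(0.10) = 8 > 5 and nq = 80(0.90) = 72 > 5, so the normal approximation applies.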


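Graphic Solution for Demonstration Problem 7.3

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p\,q}{n}}} = \frac{0.15 - 0.10}{\sqrt{\frac{(0.10)(0.90)}{80}}} = \frac{0.05}{0.0335} = 1.49$$

[Figure: the sampling distribution of $\hat{p}$ (centered at p = 0.10, with $\sigma_{\hat{p}} = 0.0335$ and cutoff $\hat{p} = 0.15$) shown alongside the standard normal curve (cutoff Z = 1.49). The area between the center and the cutoff is .4319, so $P(\hat{p} \ge 0.15) = .5000 - .4319 = .0681$.]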
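Here is a Python sketch of the same normal-approximation calculation for Demonstration Problem 7.3, using `statistics.NormalDist`. The variable names are illustrative. The unrounded z (about 1.4907) gives a tail probability of about .0680, essentially the table answer .0681.

```python
import math
from statistics import NormalDist

p, n, x = 0.10, 80, 12               # population proportion, sample size, defective count
q = 1 - p
normal_ok = n * p > 5 and n * q > 5  # rule of thumb from the slides

p_hat = x / n                     # sample proportion, 0.15
se = math.sqrt(p * q / n)         # standard deviation of p-hat, sqrt(pq/n)
z = (p_hat - p) / se              # z formula for sample proportions
prob = 1 - NormalDist().cdf(z)    # P(p-hat >= 0.15) under the normal approximation

print(normal_ok, round(se, 4), round(z, 2), round(prob, 4))
```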