Estimation in Sampling!? Chapter 7 – Statistical Problem Solving in Geography

Upload: sibyl-brown

Post on 29-Dec-2015


Page 1: Estimation in Sampling!? Chapter 7 – Statistical Problem Solving in Geography


Page 2

Goals

• Basic Concepts in Estimation
  • Point Estimation and Interval Estimation
  • Sampling Distribution of a Statistic
  • Central Limit Theorem
• Confidence Intervals and Estimation
  • Standard Normal and Z-Scores
  • General Procedure for Constructing a Confidence Interval
  • Geographic Examples of Confidence Intervals
• Sample Size Selection
  • Mean, Total and Proportion in Sample Size Selection

Page 3

Points in Estimation

• Estimation: The goal of sampling is to estimate and make inferences about population characteristics.
• Point Estimation
  • A statistic is calculated from a sample to estimate the corresponding population parameter.
  • In probability sampling, the “best” point estimate of a population parameter is the corresponding sample statistic.
  • For $\mu$, use $\bar{X}$ (the sample mean); for $\sigma$, use $s$ (the sample standard deviation).
• Calculating Point Estimates
  • See Table 7.1, page 98, McGrew and Monroe.
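As a minimal sketch of point estimation, the Python snippet below computes a sample mean and sample standard deviation as estimates of the population mean and standard deviation. The precipitation values and the function name `point_estimates` are hypothetical, chosen only for illustration.

```python
import statistics

def point_estimates(sample):
    """Return (sample mean, sample standard deviation with n - 1 divisor)."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return x_bar, s

# Hypothetical sample: annual precipitation (cm) at eight weather stations.
precip_cm = [62.0, 71.5, 58.3, 66.1, 74.8, 60.2, 69.4, 65.7]
x_bar, s = point_estimates(precip_cm)
print(f"estimate of population mean: {x_bar:.2f} cm")
print(f"estimate of population standard deviation: {s:.2f} cm")
```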

Page 4

Intervals in Estimation

• Interval Estimation
  • Because sampling involves uncertainty, a sample statistic is unlikely to equal the population parameter exactly.
  • Interval estimation is used to gauge how far a sample statistic is likely to be from the population parameter.
  • It uses a confidence interval to state the likelihood that the sample statistic lies within a given range of the population parameter.
• Confidence Interval: Represents the level of precision associated with the population estimate. Its width is determined by 1) the sample size; 2) the amount of variability in the population; and 3) the probability level, or level of confidence, selected for the problem.

Page 5

Sampling Distribution of a Statistic

• The values in a single sample of size n follow a distribution curve that could be any of the curves we have discussed.
  • Examples are Poisson, uniform, normal, etc.
  • This single sample produces one sample mean and one standard deviation.
• Sampling Distribution of Sample Means: If you take multiple, similar-sized independent samples from a population, the set of sample means can be graphed.
  • The red curve is the sampling distribution of sample means.
  • The black curve represents the frequency distribution of values within the population.

Page 6

Central Limit Theorem

• Given the effect of randomness in drawing samples, some sample means will fall above the population mean and some below.
• Provided the samples are independent, the mean of all of the sample means will equal the population mean.
• The distribution of sample means will also be approximately normal and centered on the population mean, regardless of the distribution of the population, provided that the sample size is larger than 30.
• The larger the sample size (n), the closer the sample means will tend to be to the population mean.
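The behaviour described above can be checked with a small simulation. This is only a sketch: the skewed (exponential) population, the sample size of 40, the seeds and the helper name `sample_means` are all assumptions made for illustration.

```python
import math
import random
import statistics

def sample_means(population, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n (with replacement)
    and return the list of their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n))
            for _ in range(num_samples)]

# Hypothetical, strongly skewed population (not normal).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(20_000)]

means = sample_means(population, n=40, num_samples=2_000)

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(means)
se_observed = statistics.stdev(means)                     # spread of the sample means
se_expected = statistics.pstdev(population) / math.sqrt(40)  # sigma / sqrt(n)

print(f"population mean        = {pop_mean:.3f}")
print(f"mean of sample means   = {mean_of_means:.3f}")
print(f"std. dev. of the means = {se_observed:.3f}")
print(f"sigma / sqrt(n)        = {se_expected:.3f}")
```

The mean of the sample means should land close to the population mean, and their spread close to sigma divided by the square root of n, even though the population itself is far from normal.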

Page 7

Central Limit Theorem

• One final component of the Central Limit Theorem is the standard error of the mean.
• Standard Error of the Mean: According to this theorem, the standard deviation of the sampling distribution can be determined by $\sigma_{\bar{X}} = \sigma / \sqrt{n}$; thus the standard error is a basic measure of sampling error.
  • http://www.youtube.com/watch?v=BvB1QqwurK0
• Sampling Error: The larger the sample size, the smaller the amount of sampling error. Thus, the larger the sample, the closer the sample mean tends to be to the population mean. In addition, the larger the standard deviation of the population, the larger the amount of sampling error, because of the greater variability in the population.

Page 8

Central Limit Theorem

• The Central Limit Theorem is completely true only for infinitely large populations.
• Within a finite population, a correction may be incorporated.
• Finite Population Correction (fpc): Applied to the estimation process when the sampling fraction is large. Include the fpc in the population estimate equations only when the ratio of sample size to population size exceeds 5% ($n/N > .05$).
• If the fpc is needed, it enters the standard error equation as a multiplier, where $N$ is the population size:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\,(\text{fpc}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
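A short sketch of the 5% rule follows. It assumes the commonly used form fpc = sqrt((N - n) / (N - 1)); the function names and the numbers are hypothetical.

```python
import math

def fpc(n, N):
    """Finite population correction, assuming the form sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of the mean; the fpc is applied only when n/N > .05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= fpc(n, N)
    return se

# Hypothetical values: sigma = 10, sample of 100.
print(standard_error(10, 100))            # population treated as infinite
print(standard_error(10, 100, N=10_000))  # n/N = .01, fpc not applied
print(standard_error(10, 100, N=500))     # n/N = .20, fpc applied
```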

Page 9

Confidence Intervals and Estimation

• A confidence interval expresses the likelihood that a sample mean is within a given range of the population mean.
• A confidence interval is determined by $\bar{X} \pm Z\,\sigma_{\bar{X}}$, where:
  • $Z$ = z-score from the standard normal table
  • $\bar{X}$ = sample mean
  • $\sigma_{\bar{X}}$ = standard error of the mean
• A 90% confidence interval thus gives 90% certainty that the population mean lies within the interval defined.
• The shaded area in the figure represents the 90% confidence interval. Notice that there is a .05 area in each tail, beyond the upper and lower limits, where the true mean could still fall.

Page 10

Z-Scores and Confidence Intervals

• To establish a confidence interval, we must determine a z-score.
• This can be done by looking at a table of z-scores for common confidence levels.
• More information on z-scores can be found at this website.
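Instead of a printed table, the z-scores for common confidence levels can be computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an assumption for illustration.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-tailed z-score: leaves alpha / 2 in each tail of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"confidence {level:.2f} -> z = {z_for_confidence(level):.3f}")
```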

Page 11

Using Interval Estimates

• Confidence Level: Probability that the interval surrounding a sample mean encompasses the true population mean. Defined as $1 - \alpha$.
• Significance Level: Probability that the interval surrounding a sample mean fails to encompass the true population mean. The significance level is denoted by $\alpha$ and equals the total sampling error. Since error goes in both directions, the probability of it falling into each tail is $\alpha/2$.

Page 12

Constructing a Confidence Interval

• Establish the sample mean, population standard deviation, sample size and the z-value for the desired confidence level.
• Plug the numbers into the confidence interval equation, $\bar{X} \pm Z\,\sigma_{\bar{X}}$.
• This gives the sample mean ± the interval half-width, which is the z-score multiplied by the standard error.
• Check whether the finite population correction (slide 8) is needed.
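The steps above can be sketched in a few lines of Python. The numbers are hypothetical, the population is assumed large enough that no finite population correction is needed, and `z_confidence_interval` is an illustrative name.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of x_bar +/- Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 50, population sigma 10, n = 100.
low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```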

Page 13

What Level of Confidence?

• .99, .95 and .90 are the most commonly used confidence levels for estimating the mean.
• Higher confidence results in wider intervals and thus less precise estimates, but a lower probability of error ($\alpha$).
• Lower confidence results in narrower intervals but a higher probability of error ($\alpha$).
• Balance the acceptable level of error against the needed level of precision.

Page 14

The Real World: Unknown Population Standard Deviation

• We rarely know the parameters of a population, hence our attempts to estimate them!
• $\sigma$ is generally unknown, so how do we estimate the standard error?
• Using the sample variance $s^2$, which is the sample standard deviation squared, is an acceptable approach.
• Standard Error Revisited: The standard deviation of the means of a group of samples, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (multiplied by the fpc when it applies). So, we put in the sample variance and take the square root to get the estimated standard error: $\hat{\sigma}_{\bar{X}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$

Page 15

What if Sample Size is Small?

• Z is valid only if the sample size is greater than 30, so the confidence interval equation must be altered for a smaller sample.
• Instead we use the t-distribution, whose values approach the standard normal values as the sample size increases and are already close to them by a sample size of about 30.
• In this instance the confidence interval formula is $\bar{X} \pm t\,\hat{\sigma}_{\bar{X}}$, where $\hat{\sigma}_{\bar{X}} = s / \sqrt{n}$.
• We can use the t-table to determine the value of t.

Page 16

The T-Table

• The t-table is dependent on two values:
  • The significance level ($\alpha$), of which the common levels are .10, .05 and .01, as determined earlier by the common confidence levels.
  • Degrees of freedom (df), determined by taking the sample size and subtracting one: df = n - 1
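A small-sample interval can be sketched as below. The lookup `T_CRIT_05` is a short excerpt of two-tailed critical values for a significance level of .05 taken from a standard t-table; the sample values and function name are hypothetical.

```python
import math
import statistics

# Excerpt of a standard t-table: two-tailed critical t for alpha = .05, keyed by df.
T_CRIT_05 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def t_confidence_interval(sample):
    """95% interval x_bar +/- t * s / sqrt(n), with df = n - 1.
    Raises KeyError if df is not in the excerpted table."""
    n = len(sample)
    t = T_CRIT_05[n - 1]
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return x_bar - t * se, x_bar + t * se

# Hypothetical sample of 10 observations (df = 9).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = t_confidence_interval(sample)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```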

Page 17

But! To Calculate a Confidence Interval…

• The equation used depends on two factors:
  • The sample type (random, systematic, stratified, etc.).
  • The population parameter being estimated; different parameters require different confidence interval equations.

Page 18

Random or Systematic Samples

• Random or Systematic Sample – Estimating Population Mean
  • Use the t equation for samples smaller than 30 and the z equation for larger samples.
  • Use the sample variance, as it is rare that we know the population $\sigma$.
• Random or Systematic Sample – Population Total
  • The best estimate of the population total ($\tau$) is the sample total ($T$), which is $T = N\bar{X}$.
  • Once we know $T$ (which is not a t-score) we plug it into the confidence interval equation:
  • $T \pm Z\,N\,\frac{s}{\sqrt{n}}$
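As a hedged sketch of the population total estimate, the code below assumes the interval takes the form T ± Z · N · s / sqrt(n), with the fpc in the form sqrt((N - n) / (N - 1)) applied only when n/N exceeds .05. The household figures and function name are hypothetical.

```python
import math
from statistics import NormalDist

def total_confidence_interval(x_bar, s, n, N, confidence=0.95):
    """Return (T, lower, upper) for the population total, where T = N * x_bar."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    T = N * x_bar
    se_total = N * s / math.sqrt(n)
    if n / N > 0.05:  # sampling fraction large enough to need the fpc
        se_total *= math.sqrt((N - n) / (N - 1))
    return T, T - z * se_total, T + z * se_total

# Hypothetical survey: 64 of 2,000 households, mean 3.2 persons, s = 1.6.
T, low, high = total_confidence_interval(x_bar=3.2, s=1.6, n=64, N=2000)
print(f"estimated total = {T:.0f}, 95% interval {low:.0f} to {high:.0f}")
```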

Page 19

Random or Systematic Samples Continued

• Random or Systematic Sample – Estimate of Population Proportion
  • The best estimate of the population proportion ($\rho$) is the sample proportion ($p$).
  • The sample proportion is the number of individuals in the sample having the specified characteristic ($x$) divided by the total sample size ($n$): $p = x / n$
  • The confidence interval around this estimate of the population proportion is:
  • $p \pm Z \sqrt{\frac{p(1 - p)}{n}}$
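A minimal sketch of the proportion estimate, assuming the interval p ± Z · sqrt(p(1 - p) / n) and a sample large enough for the z equation. The counts and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (p, lower, upper), where p = x / n is the sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# Hypothetical survey: 40 of 100 sampled residents commute by transit.
p, low, high = proportion_confidence_interval(x=40, n=100)
print(f"p = {p:.2f}, 95% interval {low:.3f} to {high:.3f}")
```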

Page 20

Stratified Samples

• A stratified sample is a little more complicated…
• Stratified Sample – Estimate of Population Mean
  • You will be working with different groups called strata.
  • These are denoted by the subscripts 1, 2, 3, etc.
  • Thus you will have $\bar{X}_1$ and $\bar{X}_2$, etc. for the strata.
  • The best estimate of the population mean is the stratified sample mean: $\bar{X} = \frac{1}{N} \sum_{i=1}^{M} N_i \bar{X}_i$
  • $M$ in this equation represents the number of strata.
  • Subscript $i$ identifies the stratum to which each variable belongs.
  • $N_i$ is the population of stratum $i$.
  • The confidence interval around the mean is $\bar{X} \pm Z\,\hat{\sigma}_{\bar{X}}$, with $\hat{\sigma}_{\bar{X}} = \sqrt{\frac{1}{N^2} \sum_{i=1}^{M} N_i^2\,\frac{s_i^2}{n_i} \left(\frac{N_i - n_i}{N_i - 1}\right)}$
  • Note the finite population correction, which may or may not be needed.
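The stratified mean can be sketched as a population-weighted average of the stratum means. The standard error below uses a standard stratified-sampling form with an optional per-stratum fpc of (N_i - n_i) / (N_i - 1); that form, the stratum figures and the function names are assumptions for illustration.

```python
import math

def stratified_mean(strata):
    """Weighted mean: sum(N_i * mean_i) / N over all strata."""
    N = sum(st["N"] for st in strata)
    return sum(st["N"] * st["mean"] for st in strata) / N

def stratified_standard_error(strata, use_fpc=True):
    """sqrt( sum (N_i / N)^2 * s_i^2 / n_i [* fpc_i^2] ) over all strata."""
    N = sum(st["N"] for st in strata)
    total = 0.0
    for st in strata:
        term = (st["N"] / N) ** 2 * st["s"] ** 2 / st["n"]
        if use_fpc:
            term *= (st["N"] - st["n"]) / (st["N"] - 1)
        total += term
    return math.sqrt(total)

# Hypothetical strata: N = stratum population, n = stratum sample size.
strata = [
    {"N": 500, "n": 50, "mean": 20.0, "s": 4.0},
    {"N": 300, "n": 30, "mean": 30.0, "s": 6.0},
    {"N": 200, "n": 20, "mean": 50.0, "s": 10.0},
]
x_bar = stratified_mean(strata)
se_fpc = stratified_standard_error(strata, use_fpc=True)
se_no_fpc = stratified_standard_error(strata, use_fpc=False)
print(f"stratified mean = {x_bar:.2f}")
print(f"standard error with fpc = {se_fpc:.4f}, without = {se_no_fpc:.4f}")
```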

Page 21

Stratified Samples

• Stratified Sample – Estimate of Population Total
  • The best estimate of the population total ($\tau$) is the sample total ($T$), which in stratified samples is:
  • $T = \sum_{i=1}^{M} N_i \bar{X}_i$
  • We sum across the strata.
  • Once we know $T$ (which, again, is not to be confused with a t-score) we plug it into the equation to obtain the confidence interval:
  • $T \pm Z \sqrt{\sum_{i=1}^{M} N_i^2\,\frac{s_i^2}{n_i} \left(\frac{N_i - n_i}{N_i - 1}\right)}$
  • Note that the equation has the finite population correction, which may not be needed.

Page 22

Stratified Samples

• Stratified Sample – Estimate of Population Proportion
  • The best estimate of the population proportion ($\rho$) is the sample proportion ($p$), which in stratified samples is:
  • $p = \frac{1}{N} \sum_{i=1}^{M} N_i p_i$
  • Once again we sum across the strata.
  • Then the confidence interval can be obtained:
  • $p \pm Z \sqrt{\frac{1}{N^2} \sum_{i=1}^{M} N_i^2\,\frac{p_i (1 - p_i)}{n_i} \left(\frac{N_i - n_i}{N_i - 1}\right)}$
  • Note that the equation has the finite population correction, which may not be needed.

Page 23

Sample Size Selection

• Sample Size Selection – Using the Mean
  • For practicality, we sometimes prefer to predetermine the confidence interval and then calculate the sample size needed.
  • Recall that the confidence interval of the mean is $\bar{X} \pm Z\,\sigma_{\bar{X}}$.
  • Let us designate $E$ as the error that we are willing to tolerate.
  • $E = Z\,\sigma_{\bar{X}} = Z\,\frac{\sigma}{\sqrt{n}}$
  • We then decide what error we can accept around the population mean.
  • For example .10, .05, .01, etc.
  • Algebraically we can then obtain
  • $n = \left(\frac{Z \sigma}{E}\right)^2$
  • Since in most instances we will not know $\sigma$, we substitute the sample standard deviation $s$. But how do we find this!?
  • $s$ can be found by taking a preliminary sample larger than 30 and calculating its standard deviation; random sampling then continues until the resulting $n$ is reached.
  • When a preliminary sample is followed by continued sampling, the design is called a two-stage sampling design.
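A sketch of the sample size calculation for a mean, assuming n = (Z · s / E)² with s taken from a preliminary sample and the result rounded up to a whole observation. The values and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(s, E, confidence=0.95):
    """Minimum n so that Z * s / sqrt(n) does not exceed the tolerated error E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * s / E) ** 2)

# Hypothetical pilot result: s = 15 units, tolerated error E = 2 units.
n_needed = sample_size_for_mean(s=15.0, E=2.0, confidence=0.95)
print(f"minimum sample size: {n_needed}")
```

In a two-stage design the preliminary observations count toward this total, so only the remainder still has to be collected.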

Page 24

Sample Size Selection

• Sample Size Selection – Total
  • The minimum sample size needed to make an interval estimate of a population total within a tolerance level $E$ can also be determined.
  • $E = Z\,\sigma_T = Z\,N\,\frac{\sigma}{\sqrt{n}}$
  • We can then isolate $n$ through algebra:
  • $n = \left(\frac{Z N \sigma}{E}\right)^2$ or $n = \left(\frac{Z N s}{E}\right)^2$
  • Recall that $s$ is used when we do not know the population $\sigma$.
  • It is best to run a pretest or small sample to obtain $s$.
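The same idea for a population total, sketched under the assumption that n = (Z · N · s / E)², where E is expressed in units of the total. The figures and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_total(N, s, E, confidence=0.95):
    """Minimum n so that Z * N * s / sqrt(n) does not exceed E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * N * s / E) ** 2)

# Hypothetical: 2,000 households, pretest s = 1.6 persons,
# total population to be estimated within E = 500 persons.
n_needed = sample_size_for_total(N=2000, s=1.6, E=500.0)
print(f"minimum sample size: {n_needed}")
```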

Page 25

Sample Size Selection

• Sample Size Selection – Proportion
  • To estimate a population proportion within a certain allowable level of error ($E$), the minimum sample size can also be calculated in advance of full sampling.
  • $E = Z\,\sigma_p = Z \sqrt{\frac{p(1 - p)}{n}}$
  • $n$ is isolated algebraically so that: $n = \frac{Z^2\,p(1 - p)}{E^2}$
  • The population proportion ($\rho$) is used, or the sample proportion ($p$) if it is unknown. These symbols look very similar.
  • A proportion allows us to choose a sample size without first taking a pretest or preliminary sample.
  • This is related to $p(1 - p)$ and the range of values that it can take.
  • The largest value of $p(1 - p)$ is .25, as the values peak at $p = .5$.
  • Thus, we can use $p(1 - p) = .25$ as a worst-case scenario with any data.
  • We are, however, still able to do a pretest if needed, which can yield a smaller value of $p(1 - p)$ and therefore a smaller required sample size.
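The worst-case reasoning can be illustrated with a short sketch, assuming n = Z² · p(1 - p) / E² rounded up. The tolerated error of .05 and the pretest proportion of .2 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(E, p=0.5, confidence=0.95):
    """Minimum n for error E; p = .5 gives the worst case p(1 - p) = .25."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

worst_case = sample_size_for_proportion(E=0.05)            # no pretest needed
with_pretest = sample_size_for_proportion(E=0.05, p=0.2)   # pretest suggested p = .2
print(f"worst case n = {worst_case}, with pretest n = {with_pretest}")
```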

Page 26

Chapter VII Ending