sociology 601: class 5, september 15, 2009


Overview
• Homeworks
• Stata & review of standard errors
• Chapter 5
o Point estimation (A&F 5.1)
o Confidence intervals for a population mean (A&F 5.2)
o Confidence intervals for a population proportion (A&F 5.3)
o Choosing a sufficient sample size (A&F 5.4)


What we have accomplished with sampling distributions

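• Given a population parameter, we know that a sample statistic will produce a better estimate of the population parameter when the sample is larger. (Here "better" means that the sampling distribution of the statistic is narrower, so the estimate tends to be more accurate, and that the sampling distribution is closer to normal.)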

• We know what we are doing at a qualitative level.


What’s next

• We will take it to a quantitative level: How good is a given estimate from a given sample?

• We will go over formal language and equations for using sample statistics to make inferences about population parameters.

• Once we have equations for estimating a population mean and standard deviation, we will discuss formal language for defining an interval estimate: a range of plausible values for the population parameter, based on the sample.


5.1: Estimation: definitions

• Point estimate: a single number, calculated from a set of data, that is the best guess for the parameter.

• Point estimator: the equation used to produce the point estimate. (Common notation: put a “hat” on the parameter.)

• Interval estimate: a range of numbers around the point estimate within which the parameter is believed to fall. Also called a confidence interval.


The basics of point estimation

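• The typical point estimator of a population mean is a sample mean: $\hat{\mu} = \bar{Y} = \frac{\sum Y_i}{n}$

• The typical point estimator of a population proportion is a sample proportion: $\hat{\pi} = \frac{f}{n}$, where f is the number of observations in the category of interest

• Q: is this a point estimator of a mean? $\hat{\mu} = Y_1$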


Point estimators for standard deviations.

Estimated standard deviation of observations in a population:

$\hat{\sigma} = s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n-1}}$
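As a quick check of the n − 1 divisor, here is a minimal Python sketch (Python is an assumption; the course itself uses Stata). It applies the formula to the seven-observation sample used later in these notes. The function name `sample_sd` is hypothetical. The standard library's `statistics.stdev` uses the same n − 1 divisor, and `statistics.pstdev` divides by n instead.

```python
import math
import statistics


def sample_sd(values):
    """Estimated population standard deviation s, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((y - mean) ** 2 for y in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [5, 2, 5, 2, 4, 5, 5]
s = sample_sd(data)
print(round(s, 4))  # 1.4142, i.e. sqrt(12 / 6)
assert math.isclose(s, statistics.stdev(data))  # same n - 1 divisor
```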


Typical point estimators for standard errors.

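• Estimated standard error of the sample mean, for samples drawn from a population: $\hat{\sigma}_{\bar{Y}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{s}{\sqrt{n}}$

• Special case: estimated standard error of a sample proportion: $\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$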
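Both standard-error estimators can be written as two one-line functions. This is a minimal Python sketch; the names `se_mean` and `se_proportion` are hypothetical. The printed values show the sqrt(n) behavior behind the claim that larger samples give better estimates: quadrupling n halves the standard error.

```python
import math


def se_mean(s, n):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_proportion(p_hat, n):
    """Estimated standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Quadrupling the sample size halves the standard error.
print(se_mean(10.0, 100), se_mean(10.0, 400))  # 1.0 0.5
print(round(se_proportion(0.5, 100), 4), round(se_proportion(0.5, 400), 4))  # 0.05 0.025
```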


Choosing a good estimator

You can technically use any equation you want as a point estimator, but the most popular ones have certain desirable properties.

• Unbiasedness: The sampling distribution of the estimator ‘centers’ around the parameter. (On average, the estimator gives the correct value for the parameter.)

• Efficiency: If, at the same sample size, one unbiased estimator has a smaller standard error than another unbiased estimator, the first one is more efficient.

• Consistency: The value of the estimator tends to get closer to the parameter as the sample size increases. Consistent estimators may be biased, but for the consistency property to hold, the bias must shrink toward zero as the sample size increases.
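Unbiasedness and efficiency can be seen by simulation. The Python sketch below is a small Monte Carlo experiment under assumed, made-up conditions: normal data with μ = 10, σ = 2, and samples of n = 5. It compares the average of s² (n − 1 divisor) with the average of the n-divisor variance. It also compares the spread of $\bar{Y}$ with the spread of the single-observation estimator $Y_1$.

```python
import random
import statistics

rng = random.Random(601)  # fixed seed so the run is reproducible
MU, SIGMA, N, REPS = 10.0, 2.0, 5, 10_000  # assumed true values, sample size, replications

var_div_n_minus_1 = []  # s^2 with the n - 1 divisor
var_div_n = []          # variance with the n divisor
sample_means = []       # Ybar for each sample
first_obs = []          # the "Y1" estimator: just the first observation

for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    var_div_n_minus_1.append(statistics.variance(sample))
    var_div_n.append(statistics.pvariance(sample))
    sample_means.append(statistics.fmean(sample))
    first_obs.append(sample[0])

# Unbiasedness: compare the average estimate with the true variance SIGMA**2 = 4
avg_unbiased = statistics.fmean(var_div_n_minus_1)  # should be near 4.0
avg_biased = statistics.fmean(var_div_n)            # should be near 4.0 * (N - 1) / N = 3.2

# Efficiency: Ybar and Y1 are both unbiased for MU, but Ybar varies less
spread_mean = statistics.stdev(sample_means)  # should be near SIGMA / sqrt(N), about 0.89
spread_first = statistics.stdev(first_obs)    # should be near SIGMA = 2.0

print(f"average s^2 (n - 1): {avg_unbiased:.2f}   average variance (n): {avg_biased:.2f}")
print(f"spread of Ybar: {spread_mean:.2f}   spread of Y1: {spread_first:.2f}")
```

Dividing by n systematically underestimates σ², which is why the n − 1 version is the usual estimator.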


Examples for point estimates:

Given the following sample of seven observations:

5,2,5,2,4,5,5

• What is the estimator of the population mean?

• What is the estimate of the population mean?

• What is the estimator of the standard error of the sample mean?

• What is the estimate of the standard error of the sample mean for this sample?

• What is the estimate of the population proportion with a value of 5 or greater?

• What is the estimate of the standard error of the sample proportion with a value of 5 or greater?


Examples for point estimates:

Given the following sample of seven observations: 5,2,5,2,4,5,5

• What is the estimator of the population mean?
o $\hat{\mu} = \bar{Y} = \frac{\sum Y_i}{n}$

• What is the estimate of the population mean?
o (5+2+5+2+4+5+5) / 7 = 28 / 7 = 4

• What is the estimator of the standard error of the sample mean?
o $\hat{\sigma}_{\bar{Y}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{s}{\sqrt{n}}$

• What is the estimate of the standard error of the sample mean for this sample?
o = sqrt{[(5-4)^2 + (2-4)^2 + (5-4)^2 + (2-4)^2 + (4-4)^2 + (5-4)^2 + (5-4)^2] / (7-1)} / sqrt(7)
o = sqrt{[1 + 4 + 1 + 4 + 0 + 1 + 1] / 6} / sqrt(7)
o = sqrt(12 / 6) / sqrt(7) = sqrt(2) / sqrt(7)
o = 1.41 / 2.65
o = 0.53


Examples for point estimates:

Given the following sample of seven observations: 5,2,5,2,4,5,5

• What is the estimate of the population proportion with a value of 5 or greater?
o = 4 / 7
o = .57

• What is the estimate of the standard error of the sample proportion with a value of 5 or greater?
o = sqrt(.57 * (1 - .57)) / sqrt(7)
o = sqrt(.57 * .43) / sqrt(7)
o = sqrt(.245) / sqrt(7)
o = .495 / 2.65
o = .187
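The hand calculations on these two example slides can be checked with a short Python sketch (Python is an assumption; the course uses Stata). It computes the sample mean, its estimated standard error, the sample proportion with a value of 5 or greater, and that proportion's estimated standard error.

```python
import math
import statistics

data = [5, 2, 5, 2, 4, 5, 5]
n = len(data)

mean = statistics.fmean(data)   # 28 / 7 = 4.0
s = statistics.stdev(data)      # sqrt(12 / 6) = sqrt(2), n - 1 divisor
se_mean = s / math.sqrt(n)      # sqrt(2) / sqrt(7)

p_hat = sum(1 for y in data if y >= 5) / n  # 4 / 7
se_p = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"mean = {mean:.2f}, se(mean) = {se_mean:.2f}")  # mean = 4.00, se(mean) = 0.53
print(f"p_hat = {p_hat:.2f}, se(p_hat) = {se_p:.3f}")  # p_hat = 0.57, se(p_hat) = 0.187
```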


5.2: interval estimates:

• Interval estimate (also called a confidence interval): a range of numbers that we think has a given probability of containing a parameter.

• Confidence coefficient: The probability that the interval estimate contains the parameter. Typical confidence coefficients are .95 and .99.

• We usually are told the desired confidence coefficient, then asked to find the interval estimate appropriate for the confidence coefficient.


Example of confidence interval.

95% confidence interval for a population mean:

Example using age from IHDS:

. summarize age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |    215754    27.34663    19.34841          0        116

. ci age

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |     215754    27.34663    .0416549        27.26499    27.42827

95% c.i. = $\bar{Y} \pm 1.96 \cdot se(\bar{Y}) = \bar{Y} \pm 1.96\,\hat{\sigma}_{\bar{Y}}$

Q: how is std. err. of age calculated?

Q: assumptions?
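One way to see how the standard error was obtained is to rebuild it from the `summarize` output, as in the Python sketch below (an illustration, not the Stata implementation). It divides the standard deviation by sqrt(n) and applies ±1.96 standard errors. Stata's `ci` uses a t critical value, but with over 200,000 observations that value is essentially 1.96, so the results agree to the digits Stata displays.

```python
import math

# Values copied from the `summarize age` output above
n = 215754
mean = 27.34663
sd = 19.34841

se = sd / math.sqrt(n)  # standard error of the mean = s / sqrt(n)
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"se = {se:.7f}, 95% c.i. = [{lower:.5f}, {upper:.5f}]")
# se = 0.0416549, 95% c.i. = [27.26499, 27.42827]
```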


Equations for interval estimates.

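• Confidence interval of a mean: $\text{c.i.} = \bar{Y} \pm z\,\hat{\sigma}_{\bar{Y}}$

• and of a proportion: $\text{c.i.} = \hat{\pi} \pm z\,\hat{\sigma}_{\hat{\pi}}$

• where $\hat{\sigma}_{\bar{Y}} = s/\sqrt{n}$ and $\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$

• and where you choose z based on the confidence level you want (e.g., z = 1.96 for 95% confidence)

• Assumption: the sample size is large enough that the sampling distribution is approximately normal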


Notes on interval estimates:

• Usually, we are not given z. Instead we start with a desired confidence level (e.g., 95% confidence), and we select an appropriate z-score.

• We generally use a two-sided interval, in which ½ of the confidence interval lies on each side of the sample mean.

• What does this do to our choice of cumulative probabilities (p) for the z-scores?
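For a two-sided interval, the probability left outside the interval is split equally between the two tails. The z-score therefore comes from the cumulative probability 1 − (1 − level)/2. In Python this lookup can be done with the standard library's `statistics.NormalDist.inv_cdf`. The function name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical value: leaves (1 - level) / 2 in each tail."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)  # cumulative probability p = 1 - tail


for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```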


Equations for interval estimates.

• Example: find the c.i. when Ybar = 10.2, s = 10.1, n = 1055, confidence level = 95%.

• z is derived from the 95% value: what value of z leaves 95% in the middle and 2.5% on each end of a distribution?

• For a cumulative probability of p = .975, z = 1.96

• The standard error is s/SQRT(n) = 10.1/SQRT(1055) = .31095

• Top of the confidence interval is 10.2 + 1.96*.31095 = 10.8095

• The bottom of the interval is 10.2 – 1.96*.31095 = 9.5905

• Hence, the confidence interval is 9.59 to 10.81
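The arithmetic in this example can be reproduced step by step in Python (a sketch; the variable names are illustrative). The upper limit works out to 10.8095, which rounds to 10.81.

```python
import math

y_bar, s, n, z = 10.2, 10.1, 1055, 1.96

se = s / math.sqrt(n)  # standard error of the mean
lower, upper = y_bar - z * se, y_bar + z * se
print(f"se = {se:.5f}")                         # se = 0.31095
print(f"95% c.i.: {lower:.2f} to {upper:.2f}")  # 95% c.i.: 9.59 to 10.81
```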


Normality rules for confidence intervals

• Confidence intervals assume that the sampling distribution of the estimate (its distribution across possible samples) is approximately normal

• Q: when can you assume normality for the sampling distribution of the mean of a continuous interval variable (such as income)?
o A1: when N >= 30
o A2: when observations in the population can be assumed to be normally distributed.


5.3: Confidence intervals for population proportions:

• Confidence interval for a population proportion: $\text{c.i.} = \hat{\pi} \pm z\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$

• Example: 424 of 1000 respondents in a poll report that they plan to vote for candidate X. Calculate a 95% c.i. for this result.
o = .424 +- 1.96 * sqrt{[.424 * (1 - .424)] / 1000}
o = .424 +- 1.96 * sqrt{[.424 * .576] / 1000}
o = .424 +- 1.96 * sqrt{.000244}
o = .424 +- 1.96 * .0156
o = .424 +- .031
o = .393 -> .455
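The poll example can be reproduced with a few lines of Python (a sketch with illustrative variable names). The lower limit is .424 − .031 = .393.

```python
import math

successes, n, z = 424, 1000, 1.96
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
margin = z * se

print(f"se = {se:.4f}, margin = {margin:.3f}")                    # se = 0.0156, margin = 0.031
print(f"95% c.i.: {p_hat - margin:.3f} to {p_hat + margin:.3f}")  # 95% c.i.: 0.393 to 0.455
```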


Normality rules for confidence intervals for sample proportions:

• Q: when can you assume normality for a sample of a dichotomous interval variable (yes = 1, no = 0)?
o A: when n(p(1-p)) >= 10

• (For what values of p do you need an extra large n to ensure a normal sampling distribution?)

• What can go wrong when you inappropriately assume a normal sampling distribution?
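The rule of thumb n·p(1 − p) >= 10 can be turned into a minimum n for a given p. The Python sketch below uses hypothetical function names, and the cutoff of 10 is a parameter because rules of thumb vary. It shows why proportions near 0 or 1 need much larger samples.

```python
import math


def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from this slide: n * p * (1 - p) >= threshold."""
    return n * p * (1 - p) >= threshold


def min_n_for_normality(p, threshold=10):
    """Smallest n that satisfies the rule of thumb for a given p."""
    return math.ceil(threshold / (p * (1 - p)))


for p in (0.5, 0.2, 0.05, 0.01):
    print(p, min_n_for_normality(p))
# 0.5 40
# 0.2 63
# 0.05 211
# 0.01 1011
```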


Putting it all together:

• Given the following sample of seven observations:
o 5,2,5,2,4,5,5

• What is the 95% confidence interval of the population mean?


What is the best phrasing for an interval estimate?

• a.) The 95% confidence interval for the population mean is 6.8 to 9.5? Or…

• b.) There is a 95% probability that the true population mean is between 6.8 and 9.5? Or…

• c.) We estimate that 95% of samples from the underlying population would fall within 1.35 of the true population mean, and we estimate that the true population mean is 8.15?


Confidence intervals using Stata

• Confidence intervals for means and proportions using cii
• 95% confidence interval for the General Social Survey sexfreq question, as per A&F example 5.1
• Command is: cii samplesize mean standarddeviation, level(level)

cii 1055 10.2 10.1, level(95)

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1055        10.2    .3109533        9.589842    10.81016

* Variant with higher threshold for “confidence”

cii 1055 10.2 10.1, level(99)

    Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1055        10.2    .3109533        9.397584    11.00242

* 95% confidence interval for proportion, as per A&F example 5.2

cii 1934 895, level(95)

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1934    .4627715     .011338        .4403617    .4852942
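For comparison, the same three intervals can be computed with the normal (z) formulas from this chapter. Below is a Python sketch with hypothetical function names. The results match Stata's output to roughly two decimal places. The small gaps arise because Stata's `cii` uses a t critical value for means and, as its output header notes, an exact binomial interval for proportions. Newer Stata releases write these commands as `cii means` and `cii proportions`.

```python
import math
from statistics import NormalDist


def z_interval_mean(n, mean, sd, level=0.95):
    """Normal-approximation c.i. for a mean from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def z_interval_proportion(n, successes, level=0.95):
    """Normal-approximation (Wald) c.i. for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


lo95, hi95 = z_interval_mean(1055, 10.2, 10.1, 0.95)
lo99, hi99 = z_interval_mean(1055, 10.2, 10.1, 0.99)
plo, phi = z_interval_proportion(1934, 895, 0.95)
print(f"{lo95:.4f} {hi95:.4f}")  # 9.5905 10.8095
print(f"{lo99:.4f} {hi99:.4f}")  # 9.3990 11.0010
print(f"{plo:.4f} {phi:.4f}")    # 0.4405 0.4850
```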


5.4: Choosing the best sample size

• Cost is directly proportional to sample size, so we generally want the minimum sample to do the job.

• Estimating minimum sample size is commonly done with population proportions.
o With population proportions, you do not need to make separate guesses about the population mean and standard deviation.
o With population proportions, it is easy to identify a conservative guess (π = .50, which maximizes π(1 − π)), and the required sample size does not vary much for guesses near .50.


Choosing the best sample size for a population proportion

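• We already have an equation for the confidence interval: $\text{c.i.} = \hat{\pi} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$

• When we choose the best sample size, we take one half of the confidence interval (the top half) and solve for n, using a preliminary guess for π: $n = \pi(1-\pi)\frac{z_{1-\alpha/2}^2}{(\text{c.i.}_{top} - \hat{\pi})^2} = \pi(1-\pi)\left(\frac{z_{1-\alpha/2}}{B}\right)^2$

• Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B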


Sample size example:

• Example: Sample size for election poll:

• Desired 95% c.i. = + or – 3%

• Preliminary estimate: π = .50

• What sample size is needed?
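The sample-size formula n = π(1 − π)(z/B)² can be checked with a short Python sketch. The function name `sample_size_proportion` is hypothetical. It rounds up, because rounding down would give a slightly wider interval than desired. Comparing π = .50 with a made-up guess of π = .30 shows why .50 is the conservative choice.

```python
import math
from statistics import NormalDist


def sample_size_proportion(pi_guess, bound, level=0.95):
    """n = pi(1 - pi) * (z / B)^2, rounded up to a whole respondent."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(pi_guess * (1 - pi_guess) * (z / bound) ** 2)


print(sample_size_proportion(0.50, 0.03))  # 1068
print(sample_size_proportion(0.30, 0.03))  # 897 (smaller, so .50 is conservative)
```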


Choosing the best sample size for a sample mean

• Estimating minimum sample size is less commonly done with population means

• With population means, you need to make separate guesses about the population mean and standard deviation.

• We generally have a hard time making a good guess about a population standard deviation without measuring it.


Choosing the best sample size for a population mean

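• We already have an equation for the confidence interval: $\text{c.i.} = \bar{Y} \pm z_{1-\alpha/2}\frac{s}{\sqrt{n}}$

• When we choose the best sample size, we take one half of the confidence interval (the top half) and solve for n, using a preliminary guess σ for s: $n = \sigma^2\frac{z_{1-\alpha/2}^2}{(\text{c.i.}_{top} - \bar{Y})^2} = \sigma^2\left(\frac{z_{1-\alpha/2}}{B}\right)^2$

• Again, Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B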


Sample size example:

• Example: Sample size for study of educational attainment among elderly native Americans:

• Desired 99% c.i. = + or –1 year

• Preliminary estimates: μ = 12, σ = 2.5

• What sample size is needed?
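Applied to this example, the formula n = σ²(z/B)² can be evaluated with the Python sketch below; the function name `sample_size_mean` is hypothetical. Only the preliminary σ enters the formula. The preliminary μ = 12 is not needed to compute n.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma_guess, bound, level=0.95):
    """n = sigma^2 * (z / B)^2, rounded up to a whole respondent."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((sigma_guess * z / bound) ** 2)


# Preliminary sigma = 2.5 years, bound B = 1 year, 99% confidence
print(sample_size_mean(2.5, 1.0, level=0.99))  # 42
```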
