Post on 18-Dec-2015
1
Point and Interval Estimates
• Examples with z and t distributions
• Single sample; two samples
• Result: sums (and differences) of normally distributed random variables are normally distributed.
• Determining the variance of the difference between means for two independent samples
• Pooled estimates of the variance (when two independent estimates are available)
• Degrees of freedom for the variance of the difference between the means of two independent samples (equal/not equal variances)
• Estimating the variance for use with proportions, and CIs with proportions
• Bayesian credibility intervals
  – Prior distribution
  – Joint distribution of prior and data
  – Posterior distribution
2
Introduction to Biostatistics (PUBHLTH 540)
Examples of Point and Interval Estimates + Credibility Intervals
Examples from Seasons Study
• Assumptions: subjects are an SRS from the population.
• Different groups are assumed to be independent SRSs from different strata (i.e., gender).
Details:
• Use the t-distribution for interval estimates when sample sizes are small (unless the estimate is of a proportion); this requires an assumption that the underlying random variable is normally distributed.
• When the response is binary (yes/no), we estimate the population mean by the sample mean (equal to the sample proportion $\hat p$), and the variance by $\hat\sigma^2 = \hat p(1-\hat p)$.
3
Examples: Point and Interval Estimate of Wt
Examples from Seasons Study (see ejs09b540p34.sas). What is a 95% confidence interval for weight?
(see http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=T to get t-percentiles)
[Figure 1. Histogram of weight in kg for n=291; x-axis: Wt (kg) (formerly cc5a); y-axis: Percent]
Source: ejs09b540p34.sas 10/20/2009 by ejs
Weight
Statistic     Value    Lower 95   Upper 95
n             291
Mean          77.62    75.6       79.7
Std           17.79
df            290
t (0.975)     1.968
The mean weight is estimated as 77.6 kg, with a 95% CI of (75.6, 79.7)
$$\bar Y \pm t_{n-1,\,0.975}\,\frac{S_Y}{\sqrt{n}}, \qquad t_{n-1,\,0.975} = t_{290,\,0.975} = 1.968$$
$$77.6 \pm 1.968\,\frac{17.79}{\sqrt{291}}$$
Use the applet to get the t value.
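The interval above can be reproduced from the summary statistics alone. This is a minimal Python sketch, assuming the slide's n, mean, and SD. The helper name `t_interval` is hypothetical. Python's standard library has no t-quantile function, so the applet value $t_{290,0.975} = 1.968$ is passed in rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)          # standard error of the sample mean
    half_width = t_crit * se        # t-percentile times standard error
    return mean - half_width, mean + half_width

# Seasons Study weight: n = 291, mean = 77.62 kg, SD = 17.79 kg
# t_{290,0.975} = 1.968 is hard-coded (taken from the applet, not computed here)
lower, upper = t_interval(77.62, 17.79, 291, 1.968)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```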
4
Examples: Point and Interval Estimate of Wt
• Suppose we assume the Seasons Study subjects were an SRS from people in the US. What is a point and interval estimate of weight for the US population?
Answer: Same as before. The mean weight is estimated as 77.6 kg, with a 95% CI of (75.6, 79.7).
5
Examples: Point and Interval Estimate of Wt, separately for men and women
Examples from Seasons Study (ejs09b540p34.sas)
(see http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=T to get t-percentiles)
For men, the mean weight is estimated as 85.9 kg (95% CI (83.3, 88.5)), while for women, the mean weight is estimated as 69.7 kg (95% CI (67.2, 72.3)).
Table 3. Description of weight by gender
Analysis Variable: wt, Wt (kg) (formerly cc5a)
Gender       N     Mean    Std Dev   Variance   Std Error
Male (0)     142   85.90   15.82     250.32     1.33
Female (1)   149   69.73   15.92     253.42     1.30
Source: ejs09b540p34.sas 10/20/2009 by ejs
$\bar Y \pm t_{n-1,\,0.975}\,S_{\bar Y}$, where $S_{\bar Y}$ is the standard error of the mean
Use applet, men: $t_{141,\,0.975} = 1.977$
Use applet, women: $t_{148,\,0.975} = 1.976$
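The same calculation, repeated by gender, can be sketched as follows. This assumes the SAS standard errors from Table 3 and the applet t-percentiles are used as given. The t values are hard-coded because the standard library has no t quantiles.

```python
# Summary statistics by gender from Table 3 (standard errors as reported by SAS)
groups = {
    "men":   {"mean": 85.90, "se": 1.33, "t": 1.977},  # t_{141,0.975} from the applet
    "women": {"mean": 69.73, "se": 1.30, "t": 1.976},  # t_{148,0.975} from the applet
}

intervals = {}
for name, g in groups.items():
    half_width = g["t"] * g["se"]   # t-percentile times standard error of the mean
    intervals[name] = (g["mean"] - half_width, g["mean"] + half_width)
    lo, hi = intervals[name]
    print(f"{name}: {g['mean']:.1f} kg, 95% CI ({lo:.1f}, {hi:.1f})")
```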
6
Examples: Point and Interval Estimate of Wt, adjusting for gender in the US population
• Suppose we assume the Seasons Study male subjects were an SRS from males in the US and, similarly, that the female subjects were an independent SRS from females in the US. In 2000, there were 138.05 million males and 143.37 million females in the U.S. Using the Seasons Study estimates, what is a point and interval estimate of weight for the US population?
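$$c_M = \frac{138.05}{138.05 + 143.37} = 0.49, \qquad c_F = \frac{143.37}{138.05 + 143.37} = 0.51$$
$$\hat Z = c_M \bar Y + c_F \bar X$$
where $\bar Y$ is the mean for males and $\bar X$ is the mean for females.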
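The weights and the combined estimate are simple arithmetic. This sketch assumes the census counts and sample means shown on the slides. Like the slides, it rounds the weights to two decimals before combining. With the female mean 69.73 it gives 77.65 kg. The summary table later in the deck lists the means as 85.9 and 69.7, which gives 77.64.

```python
# US population in 2000 (millions), as given on the slide
pop_male, pop_female = 138.05, 143.37
total = pop_male + pop_female

# Population-share weights, rounded to two decimals as on the slides
c_m = round(pop_male / total, 2)     # 0.49
c_f = round(pop_female / total, 2)   # 0.51

# Gender-specific sample mean weights (kg) from the Seasons Study
ybar_male, xbar_female = 85.90, 69.73

z_hat = c_m * ybar_male + c_f * xbar_female
print(f"c_M = {c_m}, c_F = {c_f}, Z-hat = {z_hat:.2f} kg")
```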
7
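Example: Linear Combinations of Random variables
$$\hat Z = c_M \bar Y + c_F \bar X = \begin{pmatrix} c_M & c_F \end{pmatrix} \begin{pmatrix} \bar Y \\ \bar X \end{pmatrix}$$
$$\operatorname{var}\begin{pmatrix} \bar Y \\ \bar X \end{pmatrix} = \begin{pmatrix} \operatorname{var}(\bar Y) & 0 \\ 0 & \operatorname{var}(\bar X) \end{pmatrix} = \begin{pmatrix} \sigma_y^2/n_y & 0 \\ 0 & \sigma_x^2/n_x \end{pmatrix}$$
Estimate:
$$\begin{pmatrix} S_Y^2/n_y & 0 \\ 0 & S_X^2/n_x \end{pmatrix} = \begin{pmatrix} 250.32/142 & 0 \\ 0 & 253.42/149 \end{pmatrix}, \qquad c_M = 0.49,\quad c_F = 0.51$$
$$\sigma_{\hat z}^2 = \operatorname{var}(\hat Z) = \begin{pmatrix} c_M & c_F \end{pmatrix} \operatorname{var}\begin{pmatrix} \bar Y \\ \bar X \end{pmatrix} \begin{pmatrix} c_M \\ c_F \end{pmatrix} = c_M^2 \operatorname{var}(\bar Y) + c_F^2 \operatorname{var}(\bar X)$$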
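The matrix form can be checked numerically. This is a minimal sketch with a hypothetical helper, `quad_form`, that evaluates $c^\top V c$ using plain lists. With a diagonal $V$ (independent samples), the quadratic form collapses to $c_M^2\,\operatorname{var}(\bar Y) + c_F^2\,\operatorname{var}(\bar X)$. Using the unrounded variances it gives about 0.866, which agrees with the 0.865 in the later summary table up to rounding.

```python
def quad_form(c, V):
    """Return c' V c for a weight vector c and covariance matrix V (lists of lists)."""
    k = len(c)
    return sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))

# Estimated covariance matrix of (Ybar, Xbar); independent samples, so off-diagonals are 0
V_hat = [
    [250.32 / 142, 0.0],
    [0.0, 253.42 / 149],
]
c = [0.49, 0.51]  # c_M, c_F

var_z = quad_form(c, V_hat)
# The scalar formula c_M^2 var(Ybar) + c_F^2 var(Xbar) should give the same value
var_z_scalar = c[0] ** 2 * V_hat[0][0] + c[1] ** 2 * V_hat[1][1]
print(f"var(Z-hat) = {var_z:.4f}")
```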
8
Example: Linear Combinations of Random variables
$$\hat Z = 0.49(85.9) + 0.51(69.73)$$
$$\hat\sigma_{\hat z}^2 = 0.49^2\,\frac{250.32}{142} + 0.51^2\,\frac{253.42}{149}$$
What are the degrees of freedom (df) for the t-distribution?
If the variances are equal, use $df = n_1 + n_2 - 2$, and replace the individual variance estimates by a pooled variance.
If the variances are not equal, see pp. 270-271 in the text for a df approximation.
9
Note: Common Estimate of a Variance (Pooled Estimate)
If we assume the population variance in weight is equal for males and females, we can estimate a pooled (common) variance (see p. 267 in the text):
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{(n_1-1) + (n_2-1)}$$
More generally:
$$S_p^2 = \frac{\sum_{g=1}^{G}(n_g-1)S_g^2}{\sum_{g=1}^{G}(n_g-1)}$$
For Wt:
$$S_p^2 = \frac{(142-1)\,250.32 + (149-1)\,253.42}{(142-1) + (149-1)} = 251.9$$
$$\hat\sigma_{\hat z}^2 = 0.49^2\,\frac{251.9}{142} + 0.51^2\,\frac{251.9}{149}$$
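This sketch implements the general $G$-group pooled variance, using the two Seasons Study groups as input. `pooled_variance` is a hypothetical helper name. The sketch also evaluates the pooled version of $\hat\sigma^2_{\hat z}$ and the equal-variance df, $n_1 + n_2 - 2$.

```python
def pooled_variance(ns, variances):
    """Pooled variance: sum of (n_g - 1) * S_g^2 divided by sum of (n_g - 1)."""
    numerator = sum((n - 1) * s2 for n, s2 in zip(ns, variances))
    denominator = sum(n - 1 for n in ns)
    return numerator / denominator

ns = [142, 149]                # males, females
variances = [250.32, 253.42]   # sample variances of weight (kg^2)

s2_p = pooled_variance(ns, variances)
c_m, c_f = 0.49, 0.51
# Variance of the weighted estimate, with the pooled variance replacing each S_g^2
var_z_pooled = c_m ** 2 * s2_p / ns[0] + c_f ** 2 * s2_p / ns[1]
df = sum(ns) - 2               # n1 + n2 - 2 when the variances are assumed equal
print(f"S_p^2 = {s2_p:.1f}, var(Z-hat) = {var_z_pooled:.4f}, df = {df}")
```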
10
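Example: Linear Combinations of Random variables
$$\hat Z = 0.49(85.9) + 0.51(69.73)$$
$$\hat\sigma_{\hat z}^2 = 0.49^2\,\frac{15.82^2}{142} + 0.51^2\,\frac{15.92^2}{149}$$
Assuming the variances are not equal (from pp. 270-271 in the text):
$$df = \frac{\left(\frac{S_M^2}{n_m} + \frac{S_F^2}{n_f}\right)^2}{\frac{(S_M^2/n_m)^2}{n_m-1} + \frac{(S_F^2/n_f)^2}{n_f-1}} = \frac{\left(\frac{15.82^2}{142} + \frac{15.92^2}{149}\right)^2}{\frac{(15.82^2/142)^2}{141} + \frac{(15.92^2/149)^2}{148}} = 288.5$$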
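The unequal-variance df approximation is easy to get wrong by hand. This sketch implements the formula above: per-group variances of the means, without the weights $c_M$ and $c_F$. `satterthwaite_df` is a hypothetical name. It reproduces the approximate df of 288.5.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Approximate df for two independent means with unequal variances."""
    v1 = s1 ** 2 / n1   # estimated variance of the first mean
    v2 = s2 ** 2 / n2   # estimated variance of the second mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Males: SD 15.82, n 142; females: SD 15.92, n 149
df_approx = satterthwaite_df(15.82, 142, 15.92, 149)
print(f"approximate df = {df_approx:.1f}")
```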
11
Example: Linear Combinations of Random variables
Wt            Constant   Mean Estimate   Var(Mean)
Male          0.49       85.9            1.76
Female        0.51       69.7            1.70
Wt Average               77.6            0.865
df            289
Approx df     288.5
t (0.975)     1.968
95% CI        (75.8, 79.5)
Mean weight for the US population is estimated as 77.6 kg, with a 95% CI of (75.8, 79.5).
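This sketch puts the pieces together to produce the final interval. It uses the rounded values from the table (constants, means, and variances of the means) and the tabulated $t_{0.975} = 1.968$. The t value is hard-coded because the standard library has no t quantiles.

```python
import math

c = [0.49, 0.51]           # population weights: males, females
means = [85.9, 69.7]       # gender-specific mean weights (kg), as in the table
var_means = [1.76, 1.70]   # estimated variances of the two means
t_crit = 1.968             # t_{289,0.975}, hard-coded from the table

estimate = sum(ci * m for ci, m in zip(c, means))
variance = sum(ci ** 2 * v for ci, v in zip(c, var_means))   # independent samples
half_width = t_crit * math.sqrt(variance)
lower, upper = estimate - half_width, estimate + half_width
print(f"{estimate:.1f} kg, 95% CI ({lower:.1f}, {upper:.1f})")
```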
12
Examples: Proportion of Subjects Who Are Obese (BMI > 30) (see p. 327 in the text)
• Estimate the proportion of subjects who are obese, and a 95% CI.
• Create a 0/1 variable: 1 = obese, 0 = not obese.
• Use the z-distribution for the CI (since $n\hat p > 5$ and $n(1-\hat p) > 5$).
• Variance estimate: $\hat\sigma^2 = \hat p(1-\hat p)$
Obese        Value
No           221
Yes          70
Total        291
P_hat        0.240549828
var(p_hat)   0.000627786
Lower 95     0.1914
Upper 95     0.2897
See: ejs09b540p34.sas
$$\hat p \pm z_{0.975}\sqrt{\frac{\hat p(1-\hat p)}{n}}$$
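For the proportion, the z-percentile is available in Python's standard library through `statistics.NormalDist`, so nothing needs to be looked up. This is a minimal sketch using the 70-of-291 counts above.

```python
import math
from statistics import NormalDist

obese, total = 70, 291
p_hat = obese / total
var_p_hat = p_hat * (1 - p_hat) / total    # estimated variance of p-hat
z = NormalDist().inv_cdf(0.975)            # standard normal 97.5th percentile (about 1.96)
half_width = z * math.sqrt(var_p_hat)
lower, upper = p_hat - half_width, p_hat + half_width
print(f"p-hat = {p_hat:.4f}, var = {var_p_hat:.6f}, 95% CI ({lower:.4f}, {upper:.4f})")
```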
13
Examples: Proportion of Subjects Who Are Obese (BMI > 30) (see p. 327 in the text)
• A single 0/1 random variable is called a Bernoulli random variable.
• The variance is estimated using the maximum likelihood estimator (biased): $\hat\sigma^2 = \hat p(1-\hat p)$
• The usual estimate of the variance (used in other settings) is $S^2 = \frac{n}{n-1}\,\hat p(1-\hat p)$
• The normal approximation is commonly used when $nP > 5$ and $n(1-P) > 5$ (NOT the t-distribution).
Example: a sample finds 4 of 10 subjects obese, so $\hat p = 4/10 = 0.4$.
Note: $n\hat p = 4$ is not large enough here for the normal approximation to be “good”.
95% CI:
$$\hat p \pm z_{0.975}\sqrt{\frac{\hat p(1-\hat p)}{n}} = 0.4 \pm 1.96\sqrt{\frac{0.4(0.6)}{10}} = (0.10,\ 0.70)$$
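This sketch works through the small-sample example, contrasts the two variance estimates, and checks the rule of thumb. `wald_ci` is a hypothetical helper name for the normal-approximation interval.

```python
import math
from statistics import NormalDist

def wald_ci(x, n, level=0.95):
    """Normal-approximation CI: p-hat +/- z * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

x, n = 4, 10
p_hat = x / n
mle_var = p_hat * (1 - p_hat)                 # biased (maximum likelihood) estimate
s2 = n / (n - 1) * p_hat * (1 - p_hat)        # usual (unbiased) sample variance
approx_ok = n * p_hat > 5 and n * (1 - p_hat) > 5   # rule of thumb from the slide

lower, upper = wald_ci(x, n)
print(f"sigma-hat^2 = {mle_var:.3f}, S^2 = {s2:.3f}, normal approx OK: {approx_ok}")
print(f"95% CI ({lower:.2f}, {upper:.2f})")
```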
14
Examples: Credibility Intervals, Bayesian Approach
Recall that we could estimate the mean using maximum likelihood.
Example: We select an SRS with replacement of n = 10 and observe x = 4. What is p?
Solution 1: Use the sample mean: $\hat p = x/n = 0.4$
Solution 2: Use the value of the parameter p that maximizes the likelihood, given the data.
$$P(X = 4 \mid p, n = 10) = \binom{10}{4} p^4 (1-p)^6 = 210\,p^4(1-p)^6$$
Likelihood: $L(p) = 210\,p^4(1-p)^6$
The likelihood is a function of p. We can think of a set of possible values of p, e.g., 0, 0.1, 0.2, …, 0.8, 0.9, 1. The maximum likelihood estimate is the value of p where the likelihood is largest.
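The grid search for the maximum likelihood estimate can be sketched in a few lines. This assumes the same 0.05-spaced grid of p values used in the likelihood tables. `math.comb` requires Python 3.8 or later.

```python
from math import comb

n, x = 10, 4
grid = [round(0.05 * k, 2) for k in range(1, 21)]   # p = 0.05, 0.10, ..., 1.00

def likelihood(p):
    """Binomial likelihood L(p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

values = {p: likelihood(p) for p in grid}
p_mle = max(values, key=values.get)   # grid value with the largest likelihood
print(f"L(0.40) = {values[0.4]:.4f}, grid maximum at p = {p_mle}")
```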
15
Binomial Distribution: Likelihood
We select an SRS with replacement of n = 10 and observe x = 4. What is p?
p       L(p)      p       L(p)
0.05    0.001     0.55    0.1596
0.10    0.0112    0.60    0.1115
0.15    0.0401    0.65    0.0689
0.20    0.0881    0.70    0.0368
0.25    0.1460    0.75    0.0162
0.30    0.2001    0.80    0.0055
0.35    0.2377    0.85    0.0012
0.40    0.2508    0.90    0.0001
0.45    0.2384    0.95    0.0000
0.50    0.2051    1.00    0.0000
16
Binomial Distribution: Maximum Likelihood
Likelihood: $L(p) = 210\,p^4(1-p)^6$
p       L(p)      p       L(p)
0.05    0.001     0.40    0.2508
0.10    0.0112    0.45    0.2384
0.15    0.0401    0.50    0.2051
0.20    0.0881    0.55    0.1596
0.25    0.1460    0.60    0.1115
0.30    0.2001    0.65    0.0689
0.35    0.2377    etc.
[Figure: L(p) plotted against p; the maximum likelihood estimate is $\hat p = x/n = 0.4$]
17
Examples: Credibility Intervals, Bayesian Approach (Prior)
Suppose we assume each value of the parameter p is equally likely. This is called a uniform prior distribution.
p       Prior Prob.   p       Prior Prob.
0.05    0.05          0.55    0.05
0.10    0.05          0.60    0.05
0.15    0.05          0.65    0.05
0.20    0.05          0.70    0.05
0.25    0.05          0.75    0.05
0.30    0.05          0.80    0.05
0.35    0.05          0.85    0.05
0.40    0.05          0.90    0.05
0.45    0.05          0.95    0.05
0.50    0.05          1.00    0.05
Prior distribution: $\pi(p)$
18
Examples: Credibility Intervals, Bayesian Approach (Data | p)
We select an SRS with replacement of n = 10 and observe x = 4. The likelihood is P(Data | p).
p       L(p|x)    p       L(p|x)
0.05    0.001     0.55    0.1596
0.10    0.0112    0.60    0.1115
0.15    0.0401    0.65    0.0689
0.20    0.0881    0.70    0.0368
0.25    0.1460    0.75    0.0162
0.30    0.2001    0.80    0.0055
0.35    0.2377    0.85    0.0012
0.40    0.2508    0.90    0.0001
0.45    0.2384    0.95    0.0000
0.50    0.2051    1.00    0.0000
$P(x \mid p) = 210\,p^4(1-p)^6 \propto p^4(1-p)^6$
19
Examples: Credibility Intervals, Bayesian Approach (Posterior)
Combining the likelihood and the prior, we have the joint probabilities
$$P(p, x) = \pi(p)\,P(x \mid p)$$
We sum these probabilities over all possible values of p, and divide by this sum to form posterior probabilities:
$$P(p \mid x) = \frac{\pi(p)\,P(x \mid p)}{\sum_{p} \pi(p)\,P(x \mid p)}$$
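The normalization step can be sketched on the same 0.05 grid. This is a minimal illustration that assumes the uniform prior. It uses only the kernel $p^4(1-p)^6$, because the constant 210 cancels between the numerator and the denominator.

```python
n, x = 10, 4
grid = [round(0.05 * k, 2) for k in range(1, 21)]   # p = 0.05, ..., 1.00

def posterior(prior):
    """Grid posterior: prior(p) * P(x | p), divided by the sum over all p."""
    # The binomial coefficient cancels in the ratio, so the kernel p^x (1-p)^(n-x) suffices
    joint = [pi * p ** x * (1 - p) ** (n - x) for pi, p in zip(prior, grid)]
    total = sum(joint)
    return [j / total for j in joint]

uniform = [0.05] * len(grid)
post = posterior(uniform)
print(f"P(p = 0.40 | x = 4) = {post[grid.index(0.4)]:.5f}")
```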
20
Examples: Credibility Intervals, Bayesian Approach (Posterior)
Credibility intervals are like confidence intervals for parameters in the posterior distribution (uniform prior).
n = 10, x = 4 successes
Prior   P(Success)   Likelihood   Joint          Normalized joint   Cumulative
pi(p)   p            L(x|p)       pi(p)*L(x|p)   (posterior)        posterior
0.05    0.05         0.00000      0.00000        0.00053            0.00053
0.05    0.10         0.00005      0.00000        0.00614            0.00667
0.05    0.15         0.00019      0.00001        0.02205            0.02872
0.05    0.20         0.00042      0.00002        0.04844            0.07717
0.05    0.25         0.00070      0.00003        0.08030            0.15746
0.05    0.30         0.00095      0.00005        0.11007            0.26753
0.05    0.35         0.00113      0.00006        0.13072            0.39825
0.05    0.40         0.00119      0.00006        0.13795            0.53620
0.05    0.45         0.00114      0.00006        0.13110            0.66730
0.05    0.50         0.00098      0.00005        0.11279            0.78010
0.05    0.55         0.00076      0.00004        0.08776            0.86786
0.05    0.60         0.00053      0.00003        0.06131            0.92917
0.05    0.65         0.00033      0.00002        0.03790            0.96707
0.05    0.70         0.00018      0.00001        0.02022            0.98729
0.05    0.75         0.00008      0.00000        0.00892            0.99621
0.05    0.80         0.00003      0.00000        0.00303            0.99924
0.05    0.85         0.00001      0.00000        0.00069            0.99992
0.05    0.90         0.00000      0.00000        0.00008            1.00000
0.05    0.95         0.00000      0.00000        0.00000            1.00000
0.05    1.00         0.00000      0.00000        0.00000            1.00000
Totals  1                         0.00043        1.00000
Credible interval: (0.15, 0.70), probability 0.96
21
Examples: Credibility Intervals, Bayesian Approach (Posterior)
Credibility intervals are like confidence intervals for parameters in the posterior distribution (symmetric prior).
n = 10, x = 4 successes
Prior      P(Success)   Likelihood   Joint          Normalized joint   Cumulative
pi(p)      p            L(x|p)       pi(p)*L(x|p)   (posterior)        posterior
0.050000   0.050000     0.000005     0.000000       0.000499           0.000499
0.100000   0.100000     0.000053     0.000005       0.011541           0.012040
0.200000   0.150000     0.000191     0.000038       0.082926           0.094965
0.300000   0.200000     0.000419     0.000126       0.273251           0.368217
0.200000   0.250000     0.000695     0.000139       0.301952           0.670169
0.100000   0.300000     0.000953     0.000095       0.206945           0.877114
0.050000   0.350000     0.001132     0.000057       0.122886           1.000000
0.000000   0.400000     0.001194     0.000000       0.000000           1.000000
0.000000   0.450000     0.001135     0.000000       0.000000           1.000000
0.000000   0.500000     0.000977     0.000000       0.000000           1.000000
0.000000   0.550000     0.000760     0.000000       0.000000           1.000000
0.000000   0.600000     0.000531     0.000000       0.000000           1.000000
0.000000   0.650000     0.000328     0.000000       0.000000           1.000000
0.000000   0.700000     0.000175     0.000000       0.000000           1.000000
0.000000   0.750000     0.000077     0.000000       0.000000           1.000000
0.000000   0.800000     0.000026     0.000000       0.000000           1.000000
0.000000   0.850000     0.000006     0.000000       0.000000           1.000000
0.000000   0.900000     0.000001     0.000000       0.000000           1.000000
0.000000   0.950000     0.000000     0.000000       0.000000           1.000000
0.000000   1.000000     0.000000     0.000000       0.000000           1.000000
Totals     1.000000                  0.000460       1.000000
Credible interval: (0.15, 0.35), probability 0.91
22
Examples: Credibility Intervals, Bayesian Approach (Posterior)
Credibility intervals are like confidence intervals for parameters in the posterior distribution (tiered prior).
n = 10, x = 4 successes
Prior     P(Success)   Likelihood   Joint          Normalized joint   Cumulative
pi(p)     p            L(x|p)       pi(p)*L(x|p)   (posterior)        posterior
0.01000   0.05000      0.00000      0.00000        0.00010            0.00010
0.10000   0.10000      0.00005      0.00001        0.01105            0.01115
0.20000   0.15000      0.00019      0.00004        0.07941            0.09056
0.20000   0.20000      0.00042      0.00008        0.17444            0.26500
0.20000   0.25000      0.00070      0.00014        0.28914            0.55414
0.10000   0.30000      0.00095      0.00010        0.19817            0.75231
0.03000   0.35000      0.00113      0.00003        0.07060            0.82291
0.02000   0.40000      0.00119      0.00002        0.04967            0.87258
0.02000   0.45000      0.00114      0.00002        0.04721            0.91979
0.02000   0.50000      0.00098      0.00002        0.04062            0.96041
0.01000   0.55000      0.00076      0.00001        0.01580            0.97621
0.01000   0.60000      0.00053      0.00001        0.01104            0.98725
0.01000   0.65000      0.00033      0.00000        0.00682            0.99407
0.01000   0.70000      0.00018      0.00000        0.00364            0.99771
0.01000   0.75000      0.00008      0.00000        0.00161            0.99932
0.01000   0.80000      0.00003      0.00000        0.00055            0.99986
0.01000   0.85000      0.00001      0.00000        0.00012            0.99999
0.01000   0.90000      0.00000      0.00000        0.00001            1.00000
0.01000   0.95000      0.00000      0.00000        0.00000            1.00000
0.01000   1.00000      0.00000      0.00000        0.00000            1.00000
Totals    1.00000                   0.00048        1.00000
Credible interval: (0.15, 0.55), probability 0.89
23
Examples: Credibility Intervals, Bayesian Approach (Conclusions)
Credibility intervals (for the same data) depend on the prior distribution.
Prior       Credibility Interval   Confidence (posterior probability)
Uniform     (0.15, 0.70)           0.96
Symmetric   (0.15, 0.35)           0.91
Tiered      (0.15, 0.55)           0.89
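The three credibility-interval probabilities can be recomputed from the priors listed in the tables. In this sketch, the probability of an interval is taken as $F(\text{upper}) - F(\text{lower})$ from the cumulative posterior, which is the convention that reproduces 0.96, 0.91, and 0.89. The helper names `posterior` and `interval_prob` are hypothetical.

```python
n, x = 10, 4
grid = [round(0.05 * k, 2) for k in range(1, 21)]   # p = 0.05, ..., 1.00

def posterior(prior):
    """Normalized prior(p) * p^x (1-p)^(n-x) over the grid."""
    joint = [pi * p ** x * (1 - p) ** (n - x) for pi, p in zip(prior, grid)]
    total = sum(joint)
    return [j / total for j in joint]

def interval_prob(post, lower, upper):
    """F(upper) - F(lower), using the cumulative posterior as in the tables."""
    cumulative, running = {}, 0.0
    for p, w in zip(grid, post):
        running += w
        cumulative[p] = running
    return cumulative[upper] - cumulative[lower]

priors = {
    "Uniform":   [0.05] * 20,
    "Symmetric": [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05] + [0.0] * 13,
    "Tiered":    [0.01, 0.10, 0.20, 0.20, 0.20, 0.10, 0.03, 0.02, 0.02, 0.02] + [0.01] * 10,
}
intervals = {"Uniform": (0.15, 0.70), "Symmetric": (0.15, 0.35), "Tiered": (0.15, 0.55)}

results = {}
for name, prior in priors.items():
    lo, hi = intervals[name]
    results[name] = interval_prob(posterior(prior), lo, hi)
    print(f"{name}: ({lo:.2f}, {hi:.2f}) has posterior probability {results[name]:.2f}")
```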
Frequentist 95% confidence interval based on the normal approximation: (0.10, 0.70)
$$\hat p \pm z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$$
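Credibility interval: intuitive interpretation. The probability that the parameter is in the interval is the stated confidence (posterior probability).
Frequentist confidence interval: awkward interpretation. If sampling were repeated, 95% of the intervals constructed this way would include the parameter.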