term 4, 2006bio656--multilevel models 1 projects are due by midnight, friday, may 19 th electronic...
TRANSCRIPT
BIO656--Multilevel Models 1Term 4, 2006
PROJECTS ARE DUEPROJECTS ARE DUE
• By midnight, Friday, May 19th • Electronic submission only to [email protected]• Please name the file: [myname]-project.[filetype]or [name1_name2]-project.[filetype]
BIO656--Multilevel Models 2Term 4, 2006
Efficiency-Robustness Trade-offsEfficiency-Robustness Trade-offs
• First, we consider alternatives to the Gaussian distribution for random effects
• Then, we move to issues of weighting, starting with some formalism
• Then, move to an example of informative sample size
• And, finally give a basic example that has broad implications of choosing among weighting schemes
BIO656--Multilevel Models 3Term 4, 2006
Alternatives to the Gaussian Alternatives to the Gaussian Distribution for Random EffectsDistribution for Random Effects
BIO656--Multilevel Models 4Term 4, 2006
The t-distributionThe t-distribution
• Broader tails than the Gaussian
• So, shrinks less for deviant Y-values
• The t-prior allows “outlying” parameters and so a deviant Y is not so indicative of a large, level 1 residual
BIO656--Multilevel Models 5Term 4, 2006
Creating a t-distributionCreating a t-distribution
• Assume a Gaussian sampling distribution,
• Using the sample standard deviation produces the t-distribution• Z is t with a large df• t3 is the most different from Z for t-distributions with a finite variance
BIO656--Multilevel Models 6Term 4, 2006
BIO656--Multilevel Models 7Term 4, 2006
With a t-prior, B is B(Y), increasing with |Y - |
BIO656--Multilevel Models 8Term 4, 2006
Z is distancefrom the center
(1-B) = ½ = 0.50
BIO656--Multilevel Models 9Term 4, 2006
Z is distancefrom the center
(1- B) = 2/3 = 0.666
BIO656--Multilevel Models 10Term 4, 2006
Estimated Gaussian & Estimated Gaussian & Fully Non-parametric priors Fully Non-parametric priors
for the USRDS data
BIO656--Multilevel Models 11Term 4, 2006
USRDS estimated PriorsUSRDS estimated Priors
BIO656--Multilevel Models 12Term 4, 2006
BIO656--Multilevel Models 13Term 4, 2006
BIO656--Multilevel Models 14Term 4, 2006
BIO656--Multilevel Models 15Term 4, 2006
BIO656--Multilevel Models 16Term 4, 2006
BIO656--Multilevel Models 17Term 4, 2006
Informative Sample SizeInformative Sample Size(Similar to informative Censoring)(Similar to informative Censoring)
See Louis et al. SMMR 2006
BIO656--Multilevel Models 18Term 4, 2006
BIO656--Multilevel Models 19Term 4, 2006
BIO656--Multilevel Models 20Term 4, 2006
BIO656--Multilevel Models 21Term 4, 2006
BIO656--Multilevel Models 22Term 4, 2006
BIO656--Multilevel Models 23Term 4, 2006
BIO656--Multilevel Models 24Term 4, 2006
BIO656--Multilevel Models 25Term 4, 2006
Choosing among weighting schemesChoosing among weighting schemes
“Optimality” versus goal achievement
BIO656--Multilevel Models 26Term 4, 2006
Inferential ContextInferential Context
Question What is the average length of in-hospital stay?
A more specific question• What is the average length of stay for:
– Several hospitals of interest?
– Maryland hospitals?
– All hospitals?
– .......
BIO656--Multilevel Models 27Term 4, 2006
““Data” Collection & GoalData” Collection & Goal
Data gathered from 5 hospitals
• Hospitals are selected by some method
• nhosp patient records are sampled at random
• Length of stay (LOS) is recorded
Goal is to: Estimate the “population” mean
BIO656--Multilevel Models 28Term 4, 2006
ProcedureProcedure
• Compute hospital-specific means
• “Average” them– For simplicity assume that the population variance
is known and the same for all hospitals
How should we compute the average?• Need a goal and then a good/best way to combine information
BIO656--Multilevel Models 29Term 4, 2006
““DATA”DATA”
Hospital # sampled nhosp
Hospital size
% of Total
size: 100hosp
Mean LOS
Within-hospital variance
1 30 100 10 25 2/30
2 60 150 15 35 2/60
3 15 200 20 15 2/15
4 30 250 25 40 2/30
5 15 300 30 10 2/15
Total 150 1000 100 ? ?
BIO656--Multilevel Models 30Term 4, 2006
Weighted averages & Variances Weighted averages & Variances (Variances are based on FE not RE)
Weightingapproach
Weightsx100
Mean VarianceRatio
100*(Var/min)
Equal 20 20 20 20 20 25.0 130
Proportional to Reciprocal variance
20 40 10 20 10 29.5 100
Population hosp10 15 20 25 30 23.8 172
Each weighted average is mean =
Reciprocal variance weights minimize varianceIs that our goal?
kkXw
BIO656--Multilevel Models 31Term 4, 2006
There are many weighting There are many weighting choices and weighting goalschoices and weighting goals
• Minimize variance by using reciprocal variance weights
• Minimize bias for the population mean by using population weights (“survey weights”)
• Use policy weights (e.g., equal weighting)
• Use “my weights,” ...
BIO656--Multilevel Models 32Term 4, 2006
General SettingGeneral Setting
When the model is correct• All weighting schemes estimate the same quantities
– same value for slopes in a multiple regression
• So, it is clearly best to minimize variance by using reciprocal variance weights
When the model is incorrect• Must consider analysis goals and use appropriate weights• Of course, it is generally true that our model is not correct!
BIO656--Multilevel Models 33Term 4, 2006
Weights and their propertiesWeights and their properties
• But if1 = 2 = 3 = 4 = 5 =
then all weighted averages estimate the population
mean: kk
So, it’s best to minimize the variance
But, if the hospital-specific k are not all equal, then• Each set of weights estimates a different target• Minimizing variance might not be “best”• For an unbiased estimate of setwk = k
kkkk
54321
w estimates Yw Then,
E(LOS) specific-hospital the be(Let
w
),,,,
πμ
πμ
πμ
BIO656--Multilevel Models 34Term 4, 2006
The variance-bias tradeoffThe variance-bias tradeoff
General idea Trade-off variance & bias to produce low Mean Squared Error (MSE)
MSE = Expected(Estimate - True)(Estimate - True)22
= Variance + (Bias)Variance + (Bias)22
• Bias is unknown unless we know the k
(the true hospital-specific mean LOS)
• But, we can study MSE (, w, )
• In practice, make some “guesses” and do sensitivity analyses
BIO656--Multilevel Models 35Term 4, 2006
Variance, Bias and MSE Variance, Bias and MSE as a function of (the as a function of (the ss, , ww, , ))
• Consider a true value for the variation of the between hospital means (* is the “overall mean”)
T = (k - *)2
• Study BIAS, Variance, MSE for weights that optimize MSE for an assumed value (A) of the between-hospital variance
• So, when A = T, MSE is minimized by this optimizer
• In the following plot, A is converted to a fraction of the total variance A/(A + within-hospital)– Fraction = 0 minimize variance– Fraction = 1 minimize bias
BIO656--Multilevel Models 36Term 4, 2006
The bias-variance trade-offThe bias-variance trade-offX-axis is assumed variance fraction
Y is performance computed under the true fraction
Assumed k
BIO656--Multilevel Models 37Term 4, 2006
SummarySummary
• Much of statistics depends on weighted averages
• Weights should depend on assumptions and goals
• If you trusttrust your (regression) model,– Then, minimize the variance, using “optimal” weights– This generalizes the equal case
• If you worryworry about model validity (bias for – You can buy full insurance by using population weights– But, you pay in variance (efficiency)– So, consider purchasing only the insurance you need by
using compromise weightsusing compromise weights