TRANSCRIPT
9 February 2007 SSP Core Facility 1
Department of Statistics
Power (and Precision) Planning for Effective Research Design: Grant Proposals & More
Walt Stroup, Ph.D.
Professor & Chair, Department of Statistics
University of Nebraska, Lincoln
Outline for Talk
I. What is “Power Analysis”? Why should I do it?
II. Essential Background
III. A Word about Software
IV. Decisions that Affect Power – several examples
V. Latest Thinking
VI. Final Thoughts
Power and Precision Defined
- Precision, a.k.a. "margin of error": in most cases, the standard error of the relevant estimate
- Power: Prob{ reject H0 given H0 false } = Prob{ research hypothesis statistically significant }
- Power analysis: essentially, "If I do the study this way, power = ?"
- Sample size estimation: how many observations are required to achieve a given power?
What’s involved in Power Analysis
WHAT IT'S NOT: "painting by numbers..."
IF IT'S DONE RIGHT, power analysis should be
- a comprehensive conversation to plan the study
- a "dress rehearsal" for the statistical analysis once the data are collected
Why do a Power Analysis?
- For an NIH grant proposal: because it's required
- For many other grant proposals: because it gives you a competitive edge
- Other reasons:
  - practical: increases the chance of success; reduces the "we don't have time to do it right, but lots of time to do it over" syndrome
  - ethical
Ethical???
- Last Ph.D. in the U.S. Senate; irritant to the doctrinaire left and right
- Keynote address to the 1997 American Statistical Association meeting: "...we can continue to make policy based on 'data-free ideology' or we can inform policy where possible by competent inquiry..."
- the late U.S. Senator Daniel Patrick Moynihan
Ethical
Results of your study may affect policy.
Well-conceived research means
- better information
- greater chance of sound decisions
Poorly-conceived research
- lost opportunity
- deprives policy-makers of information that might have been useful
- or worse: bad information misinforms or misleads the public
What affects Power & Precision?
A short statistics lesson
1. What goes into computing test statistics
2. What test statistics are supposed to tell us
3. A bit about the distribution of test statistics
4. Central and non-central t, F, and chi-square (mostly F)
What goes into a test statistic?
Research hypothesis – the motivation for the study
Assumed not true unless the data show compelling evidence otherwise
Research hypothesis: HA; opposite: H0

                    H0 true        HA true
Fail to reject H0   (correct)      Type II error
Reject H0           Type I error   Power
What goes into a test statistic?
Visualize using F, but the same basic principles hold for t, chi-square, etc.
F is the ratio of variation attributable to the factor under study vs. variation attributable to noise:

    F ≈ (N of obs × effect size²) / (variance of noise, i.e. among obs)
When H0 True – i.e. no trt effect

    F ~ F( numerator (trt) d.f., denominator (noise/error) d.f. )
When H0 false (i.e. Research HA true)
    F ~ F( num. d.f., den. (error) d.f., φ )

where the "non-centrality parameter" is

    φ ≈ (N of obs × effect size²) / (variance of noise)
What affects Power?
Increase the "non-centrality parameter" → increase power:

    φ ≈ (N of obs × effect size²) / (variance of noise, i.e. among obs)
What should be in a conversation about Power?
Increase the "non-centrality parameter" φ ≈ (N of obs × effect size²) / (variance of noise) → increase power.

- Effect size: what is the minimum that matters?
- Variance: how much "noise" in the response variable? (range? distribution? count? pct?)
- Practical constraints
- Design: the same N can produce varying power
About Software (part I)
Canned software
- lots of it
- Xiang and Zhou working on a report
- "painting by numbers"
Simulation
- most accurate; not constrained by canned scenarios
- you can see what will happen if you actually do this...
"Exemplary data set" + modeling software
- nearly as accurate as simulation
- "dress rehearsal" for the actual analysis
- MIXED, GLIMMIX, NLMIXED: if you can model it, you can do power analysis
Design Decisions – Some Examples
Main idea: for the same amount of effort, or $$$, or # of observations, power and precision can be quite different.
Power analysis objective: work smarter, not harder.
Simple example: design of a regression study
- from a STAT 412 exercise
Treatment Design Exercise
Class was asked to predict Bounce Height of basketball from Drop Height and to see if relationship changes depending on floor surface
Decision: What drop heights to use???
Objectives and Operating Definitions
Recall the objective: does the drop:bounce height relationship change with floor surface?

Model (fit separately for each surface, C and T):

    y = β0C + β1C·X        y = β0T + β1T·X

Operating definition: "relationship changes" means β1C ≠ β1T
Consequences of Drop Height Decisions
Should we use fewer drop heights & more obs per drop height, or vice versa?
[table from the Stat 412 Avery archive]
Simulation
- CRD example: 3 treatments, 5 reps / treatment
- Suspected effect size: 6-10% relative to control, whose mean is known to be ~100
- Standard deviation: 10 considered "reasonable"
- Simulate 1000 experiments
- Reject H0 (equal trt means) 228 times → power = 0.228 at alpha = 0.05
- Control mean ranked correctly 820 times (intermediate mean ranked correctly 589 times)
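The simulation just described can be sketched in a few lines. This is a stdlib Python illustration, not the speaker's code; the critical value F(0.95; 2, 12) = 3.88529 is taken from the output slide later in the talk.

```python
import random

random.seed(7)

def one_way_f(groups):
    """One-way ANOVA F statistic for equal-sized groups."""
    k, n = len(groups), len(groups[0])
    grand = sum(x for g in groups for x in g) / (k * n)
    means = [sum(g) / n for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    return msb / msw

fcrit = 3.88529  # F(0.95; 2, 12), quoted on the talk's output slide
trt_means, sd, reps, sims = (100, 94, 90), 10, 5, 1000
rejects = sum(
    one_way_f([[random.gauss(m, sd) for _ in range(reps)] for m in trt_means]) > fcrit
    for _ in range(sims)
)
print(rejects / sims)  # Monte Carlo power; the talk's own simulation gave 0.228
```

With 1000 simulated experiments the estimate carries Monte Carlo error of roughly ±0.013, so values in the low 0.2s are expected.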
"Exemplary Data"
Many software packages exist for power & sample size
- e.g. SAS PROC POWER
- for FIXED effect models only
"Exemplary data" is more general, especially (but not only) when "mixed model issues" arise:
- random effects
- split-plot structure
- errors potentially correlated: longitudinal or spatial data
- any other non-standard model structure
Methods use PROC MIXED or GLIMMIX
- adapted from Stroup (2002, JABES)
- Chapter 12, SAS for Mixed Models (Littell et al., 2006)
"Exemplary Data" - Computing Power using SAS
1. Create a data set like the proposed design.
2. Run PROC GLIMMIX (or MIXED) with the variance fixed.
3. Non-centrality parameter: φ = (F computed by GLIMMIX) × rank(K) [or chi-square with a GLM].
4. Compute the critical F: Fcrit is the value such that P{ F(rank(K), ν, 0) > Fcrit } = α [or chi-square].
5. Power = P{ F(rank(K), ν, φ) > Fcrit }.
SAS functions can compute Fcrit & power.
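Steps 3-5 can be cross-checked without SAS. The sketch below (a stdlib Python illustration, not part of the talk) approximates the noncentral-F tail probability by simulation, plugging in the CRD values reported on the later output slide (φ = 2.53333, Fcrit = 3.88529, df = 2 and 12).

```python
import random

random.seed(1)

def noncentral_f(df1, df2, ncp):
    """One draw from F(df1, df2, ncp): a noncentral chi-square over a
    central chi-square, each divided by its degrees of freedom."""
    # shifting one standard normal by sqrt(ncp) gives the noncentral part
    num = (random.gauss(0, 1) + ncp ** 0.5) ** 2
    num += sum(random.gauss(0, 1) ** 2 for _ in range(df1 - 1))
    den = sum(random.gauss(0, 1) ** 2 for _ in range(df2))
    return (num / df1) / (den / df2)

# CRD values reported on the later output slide
df1, df2, ncp, fcrit = 2, 12, 2.53333, 3.88529
sims = 20000
power = sum(noncentral_f(df1, df2, ncp) > fcrit for _ in range(sims)) / sims
print(round(power, 3))  # should land near the 0.224 the GLIMMIX run reports
```

In SAS the same quantity comes directly from `1 - PROBF(fcrit, df1, df2, ncp)`; the simulation just makes the definition of "power under the noncentral distribution" concrete.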
Compute Power with GLIMMIX – CRD example

/* step 1 - create data set with same structure as proposed design;
   use MU (expected mean) instead of observed Y_ij values */
/* this example shows power for 5, 10, and 15 e.u. per trt */
data crdpwrx1;
 input trt mu;
 do n=5 to 15 by 5;
  do eu=1 to n;
   output;
  end;
 end;
cards;
1 100
2 94
3 90
;
Compute Power with GLIMMIX – CRD example
/* step 2 - use PROC GLIMMIX to compute non-centrality parameters
   for ANOVA tests & contrasts;
   ODS statements output them to new data sets */
proc sort data=crdpwrx1;
 by n;
proc glimmix data=crdpwrx1;
 by n;
 class trt;
 model mu=trt;
 parms (100) / hold=1;
 contrast 'et1 v et2' trt 0 1 -1;
 contrast 'c vs et' trt 2 -1 -1;
 ods output tests3=b;
 ods output contrasts=c;
run;
/* step 3: combine ANOVA & contrast n-c parameter data sets;
   use SAS functions PROBF and FINV to compute power */
data power;
 set b c;
 alpha=0.05;
 ncparm=numdf*fvalue;
 fcrit=finv(1-alpha,numdf,dendf,0);
 power=1-probf(fcrit,numdf,dendf,ncparm);
proc print;

Obs  Effect  Label      DF  DenDF  alpha  ncparm   fcrit    power
 1   trt                 2    12   0.05   2.53333  3.88529  0.22361
 2           et1 v et2   1    12   0.05   0.40000  4.74723  0.08980
 3           c vs et     1    12   0.05   2.13333  4.74723  0.26978

Type III Tests of Fixed Effects
Effect     Num DF  Den DF  F Value  Pr > F
trt             2      12     1.27  0.3169

Contrasts
Label      Num DF  Den DF  F Value  Pr > F
et1 v et2       1      12     0.40  0.5390
c vs et         1      12     2.13  0.1698

Note the close agreement of the simulated power (0.228) and the "exemplary data" power (0.224).
More Advanced Example
- Plots in an 8 x 3 grid
- Main variation along the 8 "rows"
- 3 x 2 treatment design
- Alternative designs:
  - randomized complete block (4 blocks, size 6)
  - incomplete block (8 blocks, size 3)
  - split plot
- RCBD "easy" but ignores natural variation
Picture the 8 x 3 Grid
[diagram: gradient running across the 8 rows; e.g. 8 schools, gradient is "SES", 3 classrooms each]
SAS Programs to Compare 8 x 3 Designs

Split-Plot:
data a;
 input bloc trtmnt @@;
 do s_plot=1 to 3;
  input dose @@;
  mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
  output;
 end;
cards;
1 1 1 2 3
1 2 1 2 3
2 1 1 2 3
2 2 1 2 3
3 1 1 2 3
3 2 1 2 3
4 1 1 2 3
4 2 1 2 3
;
proc glimmix data=a noprofile;
 class bloc trtmnt dose;
 model mu=bloc trtmnt|dose;
 random trtmnt/subject=bloc;
 parms (4) (6) / hold=1,2;
 lsmeans trtmnt*dose / diff;
 contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
 ods output diffs=b;
 ods output contrasts=c;
run;
8 x 3 – Incomplete Block:
data a;
 input bloc @@;
 do eu=1 to 3;
  input trtmnt dose @@;
  mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
  output;
 end;
cards;
1 1 1 1 2 1 3
2 1 1 1 2 2 2
3 1 1 1 3 2 3
4 1 1 2 1 2 2
5 1 2 1 3 2 2
6 1 2 2 1 2 3
7 1 3 2 1 2 3
8 2 1 2 2 2 3
;
proc glimmix data=a noprofile;
 class bloc trtmnt dose;
 model mu=trtmnt|dose;
 random intercept / subject=bloc;
 parms (4) (6) / hold=1,2;
 lsmeans trtmnt*dose / diff;
 contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
 ods output diffs=b;
 ods output contrasts=c;
run;
8 x 3 Example – RCBD:
data a;
 input trtmnt dose @@;
 do bloc=1 to 4;
  mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
  output;
 end;
cards;
1 1 1 2 1 3 2 1 2 2 2 3
;
proc glimmix data=a noprofile;
 class bloc trtmnt dose;
 model mu=bloc trtmnt|dose;
 parms (10) / hold=1;
 lsmeans trtmnt*dose / diff;
 contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
 ods output diffs=b;
 ods output contrasts=c;
run;
How did designs compare?
Suppose the main objective is to compare the regression over the 3 dose levels: do the slopes differ by treatment? (Similar to the basketball experiment.)
The operating definition is thus H0: dose regression coefficients are equal.
- Power for randomized complete block: 0.66
- Power for incomplete block: 0.85
- Power for split-plot: 0.85
Same # of observations - you can work smarter.
But what if I don’t know Trt Effect Size or Variance?
"How can I do a power analysis? If I knew the effect size and the variance, I wouldn't have to do the study."
What the trt effect size is NOT: it is NOT the effect size you are going to observe.
It is somewhere between
- what current knowledge suggests is a reasonable expectation
- the minimum difference that would be considered "important" or "meaningful"
And Variance??
Know thy relevant background / do thy homework:
- Literature search: what have others working with similar subjects reported as variance?
- Pilot study
- Educated guess:
  - the range you'd expect 95% of likely obs to cover? divide it by 4
  - the most extreme values you can plausibly imagine? divide the range by 6
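A hypothetical worked example of the two rules of thumb (the ranges below are invented for illustration, not from the talk):

```python
# guess sigma from anticipated ranges of the response
range_95pct = 40      # expect ~95% of obs between, say, 80 and 120
range_extreme = 60    # most extreme plausible values: 70 to 130
sd_from_95 = range_95pct / 4          # ~95% of a normal lies within +/- 2 sd
sd_from_extremes = range_extreme / 6  # nearly all lies within +/- 3 sd
print(sd_from_95, sd_from_extremes)   # → 10.0 10.0
```

Both rules come from normal-distribution coverage: a 95% range spans about 4 standard deviations, and the full plausible range about 6.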
Hierarchical Linear Models
From the Bovaird (10-27-2006) seminar:
- 2 treatments
- 20 classrooms / trt
- 25 students / classroom
- 4 years
- reasonable ideas of classroom(trt), student(classroom*trt), and within-student variances, as well as effect size
Implement via exemplary data + GLIMMIX.
Categorical Data?
Example: binary data
- "Standard" has a success probability of 0.25
- "New & Improved": hope to increase it to 0.30
- Have N subjects at each of L locations
For the sake of argument, suppose we have
- 900 subjects / location
- 10 locations
Power for GLMs
- 2 treatments; P{favorable outcome} for trt 1: p = 0.30; for trt 2: p = 0.25
- power if n1 = 300, n2 = 600

data a;
 input trt y n;  /* exemplary data */
datalines;
1 90 300
2 150 600
;
proc glimmix;
 class trt;
 model y/n=trt / chisq;
 ods output tests3=pwr;
run;
data power;
 set pwr;
 alpha=0.05;
 ncparm=numdf*chisq;
 crit=cinv(1-alpha,numdf,0);
 power=1-probchi(crit,numdf,ncparm);
proc print;
run;
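The same fixed-effect calculation can be reproduced outside SAS. This sketch is an illustration, not from the talk, and it uses the Pearson chi-square as the noncentrality parameter, so it will differ slightly from GLIMMIX's statistic; the 1-df noncentral chi-square tail is evaluated in closed form via the normal CDF.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# exemplary data: trt 1 -> 90/300 (p = 0.30), trt 2 -> 150/600 (p = 0.25)
n1, p1, n2, p2 = 300, 0.30, 600, 0.25
pbar = (n1 * p1 + n2 * p2) / (n1 + n2)
ncp = (p1 - p2) ** 2 / (pbar * (1 - pbar) * (1 / n1 + 1 / n2))  # Pearson chi-square
crit = 3.841459  # chi-square(0.95; 1 df)
# P{ noncentral chi-square(1, ncp) > crit }, via its normal representation:
# chi-square(1, ncp) is the square of N(sqrt(ncp), 1)
root, shift = math.sqrt(crit), math.sqrt(ncp)
power = normal_cdf(shift - root) + normal_cdf(-shift - root)
print(round(ncp, 3), round(power, 3))
```

This mirrors the SAS step `power = 1 - probchi(crit, 1, ncparm)` with `crit = cinv(0.95, 1)`.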
Power for GLMM
- Same trt and sample size per location as before
- 10 locations
- Var(Location) = 0.25; Var(Trt*Loc) = 0.125
- Variance components: variation in log(odds ratio)
- Power?

data a;
 input trt y n;
 do loc=1 to 10;
  output;
 end;
datalines;
1 90 300
2 150 600
;
proc glimmix data=a initglm;
 class trt loc;
 model y/n = trt / oddsratio;
 random intercept trt / subject=loc;
 random _residual_;
 parms (0.25) (0.125) (1) / hold=1,2,3;
 ods output tests3=pwr;
run;
GLMM Power Analysis Results
Obs Effect NumDF DenDF alpha ncparm fcrit power
1 trt 1 9 0.05 2.29868 5.11736 0.27370
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
  1     2     1.286   9  0.884  1.871

- Gives you the expected confidence limits for the # of locations & N / location contemplated
- Gives you the power of the test of the trt effect on prob(favorable)
GLMM Power: Impact of Sample Size?
- N of subjects per trt per location?
- N of locations?
Three cases:
1. n = 300/600, 10 locations
2. n = 600/1200, 10 locations
3. n = 300/600, 20 locations

data a; input trt y n; do loc=1 to 10; output; end;
datalines;
1 90 300
2 150 600
;
data a; input trt y n; do loc=1 to 10; output; end;
datalines;
1 180 600
2 300 1200
;
data a; input trt y n; do loc=1 to 20; output; end;
datalines;
1 90 300
2 150 600
;
GLMM Power: Impact of Sample Size?
Recall: for 10 locations, N = 300/600, the CI for the odds ratio was (0.884, 1.871); power was 0.274.

For 10 locations, N = 600/1200:

Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
  1     2     1.286   9  0.891  1.855

Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
  1  trt         1      9   0.05  2.40715  5.11736  0.28421

For 20 locations, N = 300/600:

Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
  1     2     1.286  19  1.006  1.643

Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
  1  trt         1     19   0.05  4.59736  4.38075  0.53003

N alone has almost no impact.
Recent developments
Continue the binary example. Power analysis shows:

α-level     0.10  0.05  0.05  0.01  0.05  0.01
Power       0.80  0.80  0.90  0.80  0.95  0.90
Locations     27    38    46    53    57    68

What do you do?
More Information
Consider studies directed toward improving a success rate similar to that proposed in the study.
- A literature search yields 95 such studies.
- 29 have reported statistically significant gains of p1 − p2 > 0.05 (or, alternatively, significant odds ratios of [(30/70)/(25/75)] = 1.28 or greater).
If this holds, the "prior" prob(desired effect size) is approx 0.3.
An Intro Stat Result
Pr{ desired effect size | reject H0 }

  = Pr{ reject | D.E.S. } Pr{ D.E.S. }
    / ( Pr{ reject | D.E.S. } Pr{ D.E.S. } + Pr{ reject | not D.E.S. } Pr{ not D.E.S. } )

For α = 0.10, power = 0.8:

  = (0.8 × 0.3) / (0.8 × 0.3 + 0.1 × 0.7) = 0.77

The real Pr{ type I error } is more like 0.23 than 0.10!!!
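The arithmetic above, as a small function (an illustration, not from the slides):

```python
def pr_des_given_reject(power, alpha, prior):
    """Bayes rule: Pr{desired effect size | reject H0}."""
    return power * prior / (power * prior + alpha * (1 - prior))

p = pr_des_given_reject(power=0.8, alpha=0.10, prior=0.3)
print(round(p, 2))      # → 0.77
print(round(1 - p, 2))  # → 0.23  (the "real" Pr{type I error})
```

The same function reproduces the other table entries by substituting each scenario's alpha and power.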
Returning to All Scenarios
α-level               0.10  0.05  0.05  0.01  0.05  0.01
Power                 0.80  0.80  0.90  0.80  0.95  0.90
Locations               27    38    46    53    57    68
Pr{DES | reject H0}   0.77  0.87  0.89  0.97  0.89  0.97

NOTE the dramatic impact of the alpha-level when the "prior" Pr{DES} is relatively low.
Power's role increases as Pr{DES} increases.
Closing Comments
In case it's not obvious:
- I'm not a fan of "painting by numbers"
- the role of power analysis is misunderstood & underappreciated
MOST of ALL, it is an opportunity to explore and rehearse the study design & planned analysis.
Engage a statistician as a participating member of the research team.
Give it the TIME it REQUIRES.