An Introduction to Educational Research Statistics
Graham McMahon MD MMSc
[email protected]
Course Overview
Last week:
• Stages of a trial from design to completion
• Generating hypotheses
• Working with the IRB
• Considering the funding required
• Trial Designs
Today:
• Choosing an outcome variable
• Powering your study
• Establishing inter-rater reliability
• Determining if there is a difference between two groups
• Test development
• Qualitative approaches
Stages of an Educational Interventional Trial
Stage                      Activities
1  Initial Design          Hypothesis, Size
2  Protocol Design         Define methods, collaborations, IRB
3  Recruitment             Subject Acquisition, Monitoring
4  Followup                Collect outcome data
5  Analysis                Prepare “Clean + Locked” Database; Perform analysis
6  Reporting               Write and submit manuscript
7  Additional analyses     Further explorations of trial data
Population & Sampling
Must balance
• Variability [the smaller or more diverse the population, the more variable; variability creates error]
• Generalizability [population can’t be too specific]
• Access [you can only study those you have access to]
• Cost [larger studies are much more expensive]
Consider
• Participation rate
• Multiple sites
• Online projects
• Lower reimbursement
Outcome
• What is really important?
• What would colleagues care about?
• ‘Hard’ outcomes: death, attendance
• ‘Soft’ outcomes: satisfaction, self-confidence
Outcomes / Endpoints
• Primary Outcome: what you power your study on
• Secondary Outcome: other related outcomes that may be interesting to test
• Exploratory Outcomes: association studies, subgroups that may be interesting, but likely to be underpowered; may serve as pilot data for future studies
• Surrogate Endpoint: in the causal pathway and affected by the intervention
Group Activity
Medical errors and patient safety continue to be an important concern for patients and physicians. Numerous reports have suggested that fatigue and sleepiness contribute to medical errors. You are the program director in an internal medicine residency that has 40 residents and want to make a contribution in this area.
• List a hypothesis that could be generated based on this reflection.
• How would you measure sleepiness?
You review the available sleepiness scales and must choose one. Which one is best?
                                    A            B             C            D            E
                                    Awake index  Sleepy score  Doze Index   Snory scale  Yawn score
Scale Size                          8            100           20           60           12
Mean Rating for Residents           6            72            15           30           5
Standard Deviation for Residents    5            20            4            9            3
Distribution                        Mean>Median  Mean=Median   Mean=Median  Mean<Median  Mean=Median
Expected Score Difference           3            14            5            10           4
Power and Error
• α is the probability of making a Type I error
• Power is the likelihood of avoiding a Type II error
• Use trial type, α and power to calculate sample size
Sample Size Calculations
Calculating Sample Size
[Figure: axis label “Effect Size”]
• 1 SD diff between groups with power of 0.8 requires 30-40 subjects
• 0.3 SD diff between groups with power of 0.8 requires 300-400 subjects
Simple Calculation
• N (per group) = 15.8 / (effect size)^2 for power of 80% and α=0.05
• Remember to increase enrollment so that number completing ≥ expected sample size
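The rule of thumb above can be sketched in a few lines of Python. The constant 15.8 is the slide's; the function names and the 15% dropout rate are illustrative assumptions, not part of the original material. For comparison, the sketch also computes the usual normal-approximation constant, 2 × (z for two-sided α + z for power)^2, which comes out close to the slide's value.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, constant=15.8):
    """Subjects per group for 80% power and alpha = 0.05 (the slide's rule of thumb)."""
    return math.ceil(constant / effect_size ** 2)

def enrollment_target(n_completers, dropout_rate):
    """Inflate enrollment so the number completing is at least n_completers."""
    return math.ceil(n_completers / (1 - dropout_rate))

def normal_approx_constant(alpha=0.05, power=0.80):
    """2 * (z_{1-alpha/2} + z_{power})^2 for a two-sided, two-group comparison."""
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2

for d in (1.0, 0.3):
    n = n_per_group(d)
    print(f"effect size {d}: {n} per group, {2 * n} in total")

# The 15% dropout rate is a made-up figure for illustration only.
print("enroll per group at 15% dropout:", enrollment_target(n_per_group(1.0), 0.15))
print("normal-approximation constant:", round(normal_approx_constant(), 2))
```

The totals (two groups combined) fall inside the 30-40 and 300-400 ranges quoted for effect sizes of 1 SD and 0.3 SD.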
You review the available sleepiness scales and must choose one. Which one is best?
                                    A            B             C            D            E
                                    Awake index  Sleepy score  Doze Index   Snory scale  Yawn score
Scale Size                          8            100           20           60           12
Mean Rating for Residents           6            72            15           30           7
Standard Deviation for Residents    5            20            4            9            3
Distribution                        Mean>Median  Mean=Median   Mean=Median  Mean<Median  Mean=Median
Expected Score Difference           3            14            5            10           4
Effect size = score difference / standard deviation
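As a quick illustration of the effect size formula, the sketch below applies it to the five scales, using the expected score differences and standard deviations from the table. The variable and function names are illustrative.

```python
# (expected score difference, standard deviation) copied from the table of scales
scales = {
    "A Awake index": (3, 5),
    "B Sleepy score": (14, 20),
    "C Doze Index": (5, 4),
    "D Snory scale": (10, 9),
    "E Yawn score": (4, 3),
}

def effect_size(score_difference, standard_deviation):
    """Effect size = score difference / standard deviation."""
    return score_difference / standard_deviation

effect_sizes = {name: effect_size(diff, sd) for name, (diff, sd) in scales.items()}

# Largest effect size first: a larger effect size needs fewer subjects.
for name, d in sorted(effect_sizes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: effect size = {d:.2f}")
```

This ranks the scales by effect size alone; it does not account for the skewed distributions noted for scales A and D.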
Power and Sample Sizes
                                    A            B             C            D            E
                                    Awake index  Sleepy score  Doze Index   Snory scale  Yawn score
Scale Size                          8            100           20           60           12
Mean Rating for Residents           6            72            15           30           5
Standard Deviation for Residents    5            20            4            9            3
Distribution                        Mean>Median  Mean=Median   Mean=Median  Mean<Median  Mean=Median
Expected Score Difference           3            14            5            10           4
N per group                         29           33            26           20           10
Power (N=15 per grp)                0.35         0.45          0.91         0.84         0.94
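The power row can be roughly cross-checked with a normal approximation. This is only a sketch: the slide does not say how its figures were produced (presumably with power software), and the approximation below ignores the t distribution and the skew noted for scales A and D, so small differences from the reported values are expected.

```python
from statistics import NormalDist

# (expected score difference, standard deviation, power reported for N = 15 per group)
scales = {
    "A Awake index": (3, 5, 0.35),
    "B Sleepy score": (14, 20, 0.45),
    "C Doze Index": (5, 4, 0.91),
    "D Snory scale": (10, 9, 0.84),
    "E Yawn score": (4, 3, 0.94),
}

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided comparison of two group means.

    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit)

approx = {name: approx_power(diff / sd, 15) for name, (diff, sd, _) in scales.items()}

for name, (_, _, reported) in scales.items():
    print(f"{name}: approximate power {approx[name]:.2f} (reported {reported})")
```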
Calculating Sample Size using Software
• Difference between groups
• Standard Deviation
• Choose Test
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Two faculty offer to measure the sleepiness of residents using your scale.
How can you find out if they are good raters?
Interrater Reliability
• Interrater reliability is the extent to which two or more individuals (coders or raters) agree.
• Training, education and monitoring skills can enhance interrater reliability.
• Goal is generally reliability > 0.8
• Categorical: measure % agreement
• Ordinal: Spearman rho
• Continuous: Pearson r

Rater 1  Rater 2      Rater 1  Rater 2
1        2            3        5
2        1            3        4
3        3            5        3
4        4            5        6
5        6            7        5
6        8            7        3
7        7            9        8
8        5            9        7
Pearson 0.81          Pearson 0.56
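The two Pearson coefficients can be reproduced directly from the example ratings. The sketch below reads the ratings as two separate pairs of raters (left-hand and right-hand columns), which is an interpretation of the slide layout; the function and variable names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of ratings."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Left-hand pair of raters
left_rater_1 = [1, 2, 3, 4, 5, 6, 7, 8]
left_rater_2 = [2, 1, 3, 4, 6, 8, 7, 5]

# Right-hand pair of raters
right_rater_1 = [3, 3, 5, 5, 7, 7, 9, 9]
right_rater_2 = [5, 4, 3, 6, 5, 3, 8, 7]

r_left = pearson_r(left_rater_1, left_rater_2)
r_right = pearson_r(right_rater_1, right_rater_2)

print(round(r_left, 2))   # 0.81
print(round(r_right, 2))  # 0.56
```

Only the left-hand pair clears the 0.8 goal.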
Analyzing your Data
• Plan your analysis
• Consider consulting a specialist
• Test for normality
• Choose the right test
• Avoid statistical explorations with the data
You start your study and find that the M:F ratio among the interns is 12:5 in one group and 8:9 in the other, and wonder whether the groups are statistically unbalanced.
Categorical Counts
• Chi-square statistic: no cell in the table should have an expected frequency of <1, and no more than 20% of the cells should have an expected frequency of <5.
• Use Fisher’s exact test when numbers are small

         Group 1  Group 2
Men      12       8
Women    5        9

Chi-square = 1.1
Fisher exact, p=0.29
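Both statistics can be computed by hand for a 2x2 table. In the sketch below, the chi-square value rounds to the reported 1.1 only when the Yates continuity correction is applied; that the slide used this correction is an inference from the number, not something the source states. The two-sided Fisher p-value here sums the probabilities of all tables no more likely than the observed one, which is one common definition, and comes out at about 0.30, close to the reported 0.29.

```python
from math import comb

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    correction = 0.5 if yates else 0.0
    return sum(max(abs(o - e) - correction, 0.0) ** 2 / e
               for o, e in zip(observed, expected))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell
        return comb(row1, k) * comb(n - row1, col1 - k)

    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    observed = weight(a)
    # Integer weights make the "no more likely than observed" comparison exact
    extreme = sum(weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed)
    return extreme / comb(n, col1)

# Men: 12 and 8; Women: 5 and 9 (Group 1, Group 2)
chi2_corrected = chi_square_2x2(12, 8, 5, 9)
chi2_uncorrected = chi_square_2x2(12, 8, 5, 9, yates=False)
fisher_p = fisher_exact_two_sided(12, 8, 5, 9)

print(f"chi-square with Yates correction: {chi2_corrected:.2f}")
print(f"chi-square without correction:    {chi2_uncorrected:.2f}")
print(f"Fisher exact, two-sided p:         {fisher_p:.3f}")
```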
You collect your baseline observations and find the following sleepiness in each group. Are they different?
Grp 1 – 8, 6, 5, 2, 3, 9, 11, 6, 11
Grp 2 – 3, 5, 5, 2, 7, 4, 8, 10, 2
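One way to approach this question, assuming the scores are roughly normal and the groups have similar variances, is an unpaired t-test. The sketch below computes the pooled t statistic with the standard library and compares it with the two-sided 5% critical value for 16 degrees of freedom taken from standard t tables. The choice of test is an assumption here, not the slide's stated answer.

```python
from math import sqrt
from statistics import mean, variance

group_1 = [8, 6, 5, 2, 3, 9, 11, 6, 11]
group_2 = [3, 5, 5, 2, 7, 4, 8, 10, 2]

def pooled_t(x, y):
    """Student's t statistic for two independent groups, equal variances assumed."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    standard_error = sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / standard_error, df

t_stat, df = pooled_t(group_1, group_2)

# Two-sided 5% critical value of t with 16 degrees of freedom (standard tables)
T_CRITICAL_DF16 = 2.120
significant = abs(t_stat) > T_CRITICAL_DF16

print(f"mean 1 = {mean(group_1):.2f}, mean 2 = {mean(group_2):.2f}")
print(f"t = {t_stat:.2f} on {df} df, significant at 5%: {significant}")
```

With these data the t statistic stays below the critical value, so the baseline difference would not be called significant at the 5% level under these assumptions.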
Summary of Tests
Type of Data   Two Paired Groups      Two Independent Groups               Many Independent Groups   Correlation
Categories     McNemar                Chi-square                           Chi-square
Continuous     Paired t-test          t-test                               ANOVA                     Pearson r
Rank           Wilcoxon signed-rank   Wilcoxon rank sum (Mann-Whitney U)   Kruskal-Wallis            Spearman r
Test for Normality!
t-test
• Comparing two means
• Check if paired or unpaired
• The more SEs you are away from zero, the less likely that the difference occurred by chance

                     Had Elective   No Elective
Number of students   145            48
Mean Score           76%            64%
SD                   12             11
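The idea of counting standard errors away from zero can be made concrete with the elective example. The sketch below uses the unequal-variance form of the standard error of a difference; the slide does not say which form it had in mind, so that choice is an assumption.

```python
from math import sqrt

def se_of_difference(sd_1, n_1, sd_2, n_2):
    """Standard error of a difference in means (unequal-variance form)."""
    return sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)

# Summary statistics from the table above
had_elective = {"n": 145, "mean": 76.0, "sd": 12.0}
no_elective = {"n": 48, "mean": 64.0, "sd": 11.0}

difference = had_elective["mean"] - no_elective["mean"]
se = se_of_difference(had_elective["sd"], had_elective["n"],
                      no_elective["sd"], no_elective["n"])
t_stat = difference / se  # number of standard errors away from zero

print(f"difference = {difference:.0f} points, SE = {se:.2f}, t = {t_stat:.2f}")
```

A difference more than six standard errors from zero is very unlikely to have occurred by chance.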
Testing difference between two groups over time
• t-test on between group difference at end
• t-test on change over time
[Figure: time axis marked Time 1 and Time 2]
Statistical Tests for Skewed or Rank Data
• These data don’t follow normal rules
• Non-parametric tests are less powerful
• Two groups: Wilcoxon rank sum (= Mann-Whitney U)
• Three or more groups: Kruskal-Wallis
Wilcoxon Rank Sum
• Rank all observations in increasing order of magnitude, ignoring which group they come from.
• Add up the ranks in the smaller of the two groups.
• Look up the critical value of the sum of ranks for that size group.
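The first two steps can be sketched as follows, using the baseline sleepiness scores from the earlier two-group example. Giving tied observations the average of their ranks is the usual convention but is an assumption here, since the slide does not mention ties. The final step still requires a table of critical values.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def rank_sum(group_a, group_b):
    """Sum of ranks in the smaller group (the first group if sizes are equal)."""
    if len(group_a) <= len(group_b):
        smaller, larger = group_a, group_b
    else:
        smaller, larger = group_b, group_a
    ranks = average_ranks(list(smaller) + list(larger))
    return sum(ranks[:len(smaller)])

group_1 = [8, 6, 5, 2, 3, 9, 11, 6, 11]
group_2 = [3, 5, 5, 2, 7, 4, 8, 10, 2]

w = rank_sum(group_1, group_2)
n_small = min(len(group_1), len(group_2))
n_total = len(group_1) + len(group_2)
expected_if_no_difference = n_small * (n_total + 1) / 2

print(f"rank sum = {w}, expected under no difference = {expected_if_no_difference}")
```

The further the rank sum lies from its expected value, the stronger the evidence that the groups differ.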
Summary
• Careful choice of your population will improve your chances of finding an effect
• Choose your outcome measure thoughtfully
• Estimate your power and sample size in advance
• Ensure internal consistency is good
• Determine normality and analyze your dataset accordingly