An Introduction to Educational Research Statistics

Graham McMahon MD MMSc

[email protected]

Course Overview

Last week:
- Stages of a trial from design to completion
- Generating hypotheses
- Working with the IRB
- Considering the funding required
- Trial designs

Today:
- Choosing an outcome variable
- Powering your study
- Establishing inter-rater reliability
- Determining if there is a difference between two groups
- Test development
- Qualitative approaches

Stages of an Educational Interventional Trial

Stage                   Activities
1 Initial Design        Hypothesis, size
2 Protocol Design       Define methods, collaborations, IRB
3 Recruitment           Subject acquisition, monitoring
4 Follow-up             Collect outcome data
5 Analysis              Prepare “Clean + Locked” database; perform analysis
6 Reporting             Write and submit manuscript
7 Additional analyses   Further explorations of trial data

Population & Sampling

Must balance:
- Variability [the smaller or more diverse the population, the more variable; variability creates error]
- Generalizability [population can’t be too specific]
- Access [you can only study those you have access to]
- Cost [larger studies are much more expensive]

Consider:
- Participation rate
- Multiple sites
- Online projects
- Lower reimbursement

Outcome

- What is really important?
- What would colleagues care about?
- ‘Hard’ outcomes: death, attendance
- ‘Soft’ outcomes: satisfaction, self-confidence

Outcomes / Endpoints

- Primary outcome: what you power your study on
- Secondary outcome: other related outcomes that may be interesting to test
- Exploratory outcomes: association studies and subgroups that may be interesting but are likely to be underpowered; may serve as pilot data for future studies
- Surrogate endpoint: in the causal pathway and affected by the intervention

Group Activity

Medical errors and patient safety continue to be an important concern for patients and physicians. Numerous reports have suggested that fatigue and sleepiness contribute to medical errors. You are the program director in an internal medicine residency that has 40 residents and want to make a contribution in this area.

- List a hypothesis that could be generated based on this reflection.
- How would you measure sleepiness?

You review the available sleepiness scales and must choose one. Which one is best?

                             A             B              C            D             E
                             Awake index   Sleepy score   Doze Index   Snory scale   Yawn score
Scale size                   8             100            20           60            12
Mean rating for residents    6             72             15           30            5
Standard deviation           5             20             4            9             3
Distribution                 Mean>Median   Mean=Median                 Mean<Median
Expected score difference    3             14             5            10            4

Power and Error

- α is the probability of making a Type I error
- Power is the likelihood of avoiding a Type II error
- Use trial type, α and power to calculate sample size

Sample Size Calculations

Calculating Sample Size

[Figure: axis label "Effect Size"]

- A 1 SD difference between groups with power of 0.8 requires 30-40 subjects
- A 0.3 SD difference between groups with power of 0.8 requires 300-400 subjects

Simple Calculation

N (per group) = 15.8 / (effect size)^2 for power of 80% and α = 0.05

Remember to increase enrollment so that number completing ≥ expected sample size
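The simple calculation can be sketched in a few lines of Python. The function names and the 25% dropout rate are illustrative assumptions, not part of the slides; the 15.8 constant applies only to 80% power and α = 0.05.

```python
import math

def n_per_group(effect_size: float) -> int:
    """Approximate subjects per group for 80% power at alpha = 0.05."""
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    return math.ceil(15.8 / effect_size ** 2)

def n_to_enroll(n_completing: int, dropout_rate: float) -> int:
    """Inflate enrollment so the expected number completing is >= n_completing.

    dropout_rate is a hypothetical planning figure chosen by the investigator.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout rate must be in [0, 1)")
    return math.ceil(n_completing / (1 - dropout_rate))

print(n_per_group(1.0))                      # 16 per group, 32 in total
print(n_per_group(0.3))                      # 176 per group, 352 in total
print(n_to_enroll(n_per_group(1.0), 0.25))   # 22 per group to enroll
```

Both totals fall inside the 30-40 and 300-400 subject ranges quoted for 1 SD and 0.3 SD differences.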

You review the available sleepiness scales and must choose one. Which one is best?

                             A             B              C            D             E
                             Awake index   Sleepy score   Doze Index   Snory scale   Yawn score
Mean rating for residents    6             72             15           30            5
Standard deviation           5             20             4            9             3
Expected score difference    3             14             5            10            4

Effect size = score difference / standard deviation

Power and Sample Sizes

                             A             B              C            D             E
                             Awake index   Sleepy score   Doze Index   Snory scale   Yawn score
Mean rating for residents    6             72             15           30            5
Standard deviation           5             20             4            9             3
Expected score difference    3             14             5            10            4
N per group                  29            33             26           20            10
Power (N=15 per group)       0.35          0.45           0.91         0.84          0.94
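As a rough check on the power row, the sketch below computes each scale's effect size (score difference / standard deviation) and an approximate power at 15 subjects per group. It assumes a two-sided, two-sample comparison of means at α = 0.05 and uses a normal approximation rather than the exact t-based calculation that dedicated software performs, so its values run a few hundredths above the tabulated ones. It makes no attempt to reproduce the "N per group" row.

```python
from math import sqrt
from statistics import NormalDist

# (expected score difference, standard deviation) for each scale, from the table
scales = {
    "A Awake index": (3, 5),
    "B Sleepy score": (14, 20),
    "C Doze Index": (5, 4),
    "D Snory scale": (10, 9),
    "E Yawn score": (4, 3),
}

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison of means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

power_at_15 = {}
for name, (difference, sd) in scales.items():
    effect_size = difference / sd
    power_at_15[name] = approx_power(effect_size, 15)
    print(f"{name}: effect size {effect_size:.2f}, approx. power {power_at_15[name]:.2f}")
```

The ordering is the useful part: scales with a large expected difference relative to their spread (C, D, E) are well powered at 15 per group, while A and B are not.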

Calculating Sample Size using Software

- Difference between groups
- Standard deviation
- Choose test

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

Two faculty offer to measure the sleepiness of residents using your scale. How can you find out if they are good raters?

Interrater Reliability

- Interrater reliability is the extent to which two or more individuals (coders or raters) agree.
- Training, education and monitoring skills can enhance interrater reliability.
- Goal is generally reliability > 0.8
- Categorical: measure % agreement
- Ordinal: Spearman rho
- Continuous: Pearson r

Rater 1   Rater 2       Rater 1   Rater 2
1         2             3         5
2         1             3         4
3         3             5         3
4         4             5         6
5         6             7         5
6         8             7         3
7         7             9         8
8         5             9         7

Pearson 0.81            Pearson 0.56
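The two Pearson coefficients can be reproduced from the ratings above. This is a minimal sketch using only the standard library; the function and variable names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# First pair of raters (left-hand columns)
first_rater1 = [1, 2, 3, 4, 5, 6, 7, 8]
first_rater2 = [2, 1, 3, 4, 6, 8, 7, 5]

# Second pair of raters (right-hand columns)
second_rater1 = [3, 3, 5, 5, 7, 7, 9, 9]
second_rater2 = [5, 4, 3, 6, 5, 3, 8, 7]

r_first = pearson_r(first_rater1, first_rater2)
r_second = pearson_r(second_rater1, second_rater2)
print(f"{r_first:.2f}")   # 0.81
print(f"{r_second:.2f}")  # 0.56
```

Only the first pair clears the usual 0.8 target.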

Analyzing your Data

- Plan your analysis
- Consider consulting a specialist
- Test for normality
- Choose the right test
- Avoid statistical explorations with the data

You start your study and find that among the interns the M:F ratio is 12:5 in one group and 8:9 in the other, and you wonder whether the groups are statistically unbalanced.

Categorical Counts

- Chi-square statistic: no cell in the table should have an expected frequency of <1, and no more than 20% of the cells should have an expected frequency of <5.
- Use Fisher’s exact test when numbers are small

          Group 1   Group 2
Men       12        8
Women     5         9

Chi-square = 1.1; Fisher exact, p=0.29
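A sketch of both calculations for this 2x2 table follows. It assumes the chi-square figure uses the Yates continuity correction and that the Fisher p-value is two-sided (summing every table at least as unlikely as the observed one); the slides state neither, but under those assumptions the results come out close to the quoted values.

```python
from math import comb

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Men: 12 and 8; Women: 5 and 9
chi2 = yates_chi_square(12, 8, 5, 9)
p_value = fisher_exact_two_sided(12, 8, 5, 9)
print(f"chi-square = {chi2:.2f}, Fisher exact p = {p_value:.3f}")
```

Either way the imbalance is well short of conventional significance.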

You collect your baseline observations and find the following sleepiness in each group. Are they different?

Grp 1: 8, 6, 5, 2, 3, 9, 11, 6, 11
Grp 2: 3, 5, 5, 2, 7, 4, 8, 10, 2

Summary of Tests

Type of Data   Two Paired Groups      Two Independent Groups   Many Independent Groups   Correlation
Categories     McNemar                Chi-square               Chi-square
Continuous     Paired t-test          t-test                   ANOVA                     Pearson r
Rank           Wilcoxon signed rank   Wilcoxon rank sum        Kruskal-Wallis            Spearman r

t-test

Test for normality!

- Comparing two means
- Check if paired or unpaired
- The more SEs you are away from zero, the less likely that the difference occurred by chance

                     Had Elective   No Elective
Number of students   145            48
Mean score           76%            64%
SD                   12             11
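The "number of SEs away from zero" idea can be made concrete with the elective data. The sketch below assumes an unpaired, equal-variance (pooled) t-test and that the SDs are in percentage points; neither is stated on the slide.

```python
from math import sqrt

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unpaired t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Had elective: n=145, mean 76, SD 12; no elective: n=48, mean 64, SD 11
t_stat, df = pooled_t(145, 76, 12, 48, 64, 11)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")  # t = 6.13 on 191 degrees of freedom
```

A 12-point difference that sits about six standard errors from zero is very unlikely to have arisen by chance.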

Testing difference between two groups over time

- t-test on between-group difference at end
- t-test on change over time

[Figure: axis labels "Time 1" and "Time 2"]

Statistical Tests for Skewed or Rank Data

- These data don’t follow normal rules
- Non-parametric tests are less powerful
- Two groups: Wilcoxon rank sum (= Mann-Whitney U)
- Three or more groups: Kruskal-Wallis

Wilcoxon Rank Sum

1. Rank all observations in increasing order of magnitude, ignoring which group they come from.
2. Add up the ranks in the smaller of the two groups.
3. Look up the critical value of the sum of ranks for that size group.
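The ranking steps can be sketched with the baseline sleepiness observations from the earlier two-group example. Tied values are given the average of the ranks they occupy, which is the usual convention but an assumption here, since the slides do not mention ties. The two groups are the same size, so either rank sum can be taken to the table of critical values.

```python
def average_ranks(values):
    """Rank values in increasing order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

group1 = [8, 6, 5, 2, 3, 9, 11, 6, 11]
group2 = [3, 5, 5, 2, 7, 4, 8, 10, 2]

# Step 1: rank everything together, ignoring group membership
ranks = average_ranks(group1 + group2)

# Step 2: add up the ranks within each group
rank_sum1 = sum(ranks[:len(group1)])
rank_sum2 = sum(ranks[len(group1):])
print(rank_sum1, rank_sum2)  # 99.0 72.0
```

The final step, comparing the rank sum with the critical value for groups of nine, still needs a published table or statistical software.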


Summary

- Careful choice of your population will improve your chances of finding an effect
- Choose your outcome measure thoughtfully
- Estimate your power and sample size in advance
- Ensure internal consistency is good
- Determine normality and analyze your dataset accordingly

Graham McMahon
[email protected]
