statistics pres 10 27 2015 roy sabo


Upload: tjcarter

Post on 27-Jan-2017


Page 1: Statistics pres 10 27 2015   roy sabo

ADLT 673: TEACHING AS SCHOLARSHIP IN MEDICAL EDUCATION

TUESDAY, OCTOBER 27, 2015

An Overview of Quantitative Data Analysis

Page 2: Statistics pres 10 27 2015   roy sabo

Outline of Today’s Class

Brief Description of Statistical Thinking

Analytic Methods
  Summary Measures
  Appropriate Research Questions
  Determining the Appropriate Statistical Methodology
  Group Discussion

Designing a Data Collection Plan
  Sources, Capture and Storage

Sample Size Determination
  Group Discussion

Additional Resources

Page 3: Statistics pres 10 27 2015   roy sabo

Statistical Thinking

Population
  All possible subjects
  EX: All US patients

Sample
  Subjects you observe
  EX: Patients seen at VCU

Sampling from Population
  Sample should be a microcosm of the population
  Are other samples different?
  Small samples are more prone to rare events
  Larger samples are better

[Figure: a sample drawn from within a population]

Page 4: Statistics pres 10 27 2015   roy sabo

Statistical Thinking

If this is the population…

Does sample look like this…

…or this…

Page 5: Statistics pres 10 27 2015   roy sabo

Statistical Thinking (Example)

Sample:
  Experimental drug to reduce side effects from surgical procedure
  Historical rate: 10% experience no side effects
  New Trial: 33 successes (no side effects) in 200 patients
  Sample percentage: 33/200 = 16.5%

What does evidence imply about population?
  Does new drug show improvement?
  What happens if we run this experiment again?
  Is this sample “big” enough to represent population?
  If historical rate is truly 10%, what would samples look like?

Page 6: Statistics pres 10 27 2015   roy sabo

Statistical Thinking (Example)

Simulation Study of Samples with 200 Dichotomous Observations with Known 10% Success Rate

# of Successes   Frequency                  # of Successes   Frequency
out of 200       out of 1000   Proportion   out of 200       out of 1000   Proportion
 5                 1           0.001        21                75           0.075
 6                 0           0.000        22                78           0.078
 7                 1           0.001        23                71           0.071
 8                 0           0.000        24                54           0.054
 9                 2           0.002        25                33           0.033
10                 8           0.008        26                25           0.025
11                 9           0.009        27                27           0.027
12                20           0.020        28                11           0.011
13                25           0.025        29                 8           0.008
14                35           0.035        30                11           0.011
15                48           0.048        31                 4           0.004
16                65           0.065        32                 4           0.004
17                69           0.069        33                 1           0.001
18                99           0.099        34                 1           0.001
19                94           0.094        35                 1           0.001
20               119           0.119        36                 1           0.001
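
A simulation like the one tabulated on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the seed, and the use of Python's built-in random number generator are not from the slides, and a different seed will give somewhat different frequencies than the table.

```python
import random
from collections import Counter

def simulate_success_counts(n_samples=1000, n_patients=200, p=0.10, seed=2015):
    """Draw n_samples samples of n_patients yes/no outcomes,
    each a success with probability p, and tally successes per sample."""
    rng = random.Random(seed)  # seed is arbitrary; fixed only for repeatability
    counts = Counter()
    for _ in range(n_samples):
        successes = sum(rng.random() < p for _ in range(n_patients))
        counts[successes] += 1
    return counts

counts = simulate_success_counts()

# Print one row per observed number of successes: count, frequency, proportion
for successes in sorted(counts):
    frequency = counts[successes]
    print(f"{successes:>3} {frequency:>5} {frequency / 1000:.3f}")
```

Most simulated samples should land near 20 successes (10% of 200), with counts as high as 33 appearing only rarely.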

Page 7: Statistics pres 10 27 2015   roy sabo

Statistical Thinking (Example)

Histogram of 1,000 Simulated Samples of 200 Dichotomous Outcomes with Assumed p = 10% Success Rate

[Figure: histogram; x-axis: sample proportion of successes (0.025 to 0.180); y-axis: frequency (0 to 140)]

Page 8: Statistics pres 10 27 2015   roy sabo

Statistical Thinking (Example)

If the true proportion was really p = 10%…
  Then our event (33 successes) would be observed about 1 in every 1000 trials
  Estimated Probability: 1/1000 = 0.001

Two possible explanations for our sample:
  Rate really is 10% and we observed a rare event
  Our assumption (p = 10%) was incorrect

Revised Statistical Thinking:
  If observed event is likely given our assumptions, then our assumptions are probably correct
  If observed event is unlikely given our assumptions, then our assumptions are probably NOT correct
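
The 1-in-1000 figure above is an estimate read off the simulation. As a rough cross-check, the same probabilities can be computed exactly from the binomial distribution. The sketch below is an illustration, not the presenter's calculation; the helper name is hypothetical. It also computes the chance of 33 or more successes, which is the quantity a formal test would typically use (in the simulation table, 4 of the 1,000 samples had 33 or more).

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, p = 200, 0.10

# Probability of exactly 33 successes if the true rate is 10%
p_exactly_33 = comb(n, 33) * p**33 * (1 - p) ** (n - 33)

# Probability of 33 or more successes if the true rate is 10%
p_at_least_33 = binomial_upper_tail(33, n, p)

print(f"P(X = 33)  = {p_exactly_33:.4f}")
print(f"P(X >= 33) = {p_at_least_33:.4f}")
```

Either way the probability is small, which supports the second explanation: the 10% assumption is probably not correct for the new drug.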

Page 9: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Summary Measures

Representative Measures
  Reflect the most “typical” or “average” data value
  Continuous Measurements: Mean (Average), Median and Mode
  Categorical Measurements: Frequencies and Proportions

Measures of Variability
  Reflect how much subjects differ from one another
  Continuous Measurements: Standard deviation, range, interquartile range
  Categorical Measurements: None that are meaningful (sorry!)
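
The summary measures listed above can be computed with Python's standard library. The data below are made up purely for illustration (they are not from the presentation), and the interquartile range depends on the quantile convention used, so other software may report a slightly different value.

```python
import statistics
from collections import Counter

# Hypothetical data, for illustration only
ages = [34, 41, 41, 47, 52, 58, 63, 70]                            # continuous
screened = ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes"]  # categorical

# Representative measures for continuous data
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Measures of variability for continuous data
sd = statistics.stdev(ages)                  # sample standard deviation
value_range = max(ages) - min(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # quartile cut points
iqr = q3 - q1

# Representative measures for categorical data
frequencies = Counter(screened)
proportions = {level: count / len(screened) for level, count in frequencies.items()}

print(mean, median, mode, round(sd, 1), value_range, iqr)
print(dict(frequencies), proportions)
```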

Page 10: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Research Question

Translating Research Question into Testable Hypotheses
  Research question must be in form that allows statistical method to be assigned
  Three components: # of groups, measurement type, # of measures

1. Subjects
  Identify population under consideration
  Determine # of groups (and what distinguishes them)

2. Measurements
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or greater?)

3. Statement (i.e., what are you trying to do?)
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)

Page 11: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Research Question

Who is under consideration?
  Identify population under consideration
  Determine # of groups (and what distinguishes them)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Population?
  Number of Groups?

Page 12: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Research Question

Is it measurable?
  Identify the measurement
  Type (continuous or categorical)
  Determine the summary measure (e.g., mean or proportion)
  # of Times Measured (Once, twice or greater?)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Measurement Type?
  Summary Measures?
  Number of Times Measured?

Page 13: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Research Question

Is the “question” clear?
  Estimation (is something simply being measured?)
  Change (is something being tracked over time? Before and after an event?)
  Comparison (is something compared between groups?)

Ex.: Is the proportion of patients up to date on their colon cancer screening greater for those who receive an email reminder from their physician than for those who do not receive such email messages?
  Statement: Estimation, Change or Comparison?
  Combination?

Page 14: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Continuous Data

                 # of Measurements
# of Samples     Single                         Pre/Post        Repeated Measures
1 Sample         t-test                         Paired t-test   Repeated Measures ANOVA (RMA) / Linear Mixed Model (LMM)*
2 Samples        Two-sample t-test              RMA / LMM*      RMA / LMM*
“k” Samples      Analysis of Variance (ANOVA)   RMA / LMM*      RMA / LMM*

Adjusting for Covariates: Multiple Linear Regression*, Analysis of Covariance (ANCOVA)*, Linear Mixed Models*

*Will likely require statistical assistance

Page 15: Statistics pres 10 27 2015   roy sabo

Analytic Methods: Categorical Data

                 # of Measurements
# of Samples     Single            Pre/Post         Repeated Measures
1 Sample         z-test            McNemar’s Test   Generalized Linear Mixed Models (GLMM)*
2 Samples        Chi-square Test   GLMM*            GLMM*
“k” Samples      Chi-square Test   GLMM*            GLMM*

Adjusting for Covariates: Multiple Logistic Regression*, Generalized Linear Mixed Models*

*Will likely require statistical assistance
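
The two method tables (continuous and categorical) amount to a lookup on three inputs: data type, number of samples, and number of measurements per subject. A minimal sketch of that lookup follows. The function name and argument conventions are hypothetical, and the returned labels simply restate the tables; covariate adjustment is not handled.

```python
def suggest_method(data_type, n_samples, n_measurements):
    """Return the method named in the slide tables.

    data_type: "continuous" or "categorical"
    n_samples: number of groups (1, 2, or more)
    n_measurements: 1 (single), 2 (pre/post), or more (repeated measures)
    A trailing * means statistical assistance will likely be required.
    """
    if data_type not in ("continuous", "categorical"):
        raise ValueError("data_type must be 'continuous' or 'categorical'")
    if n_samples < 1 or n_measurements < 1:
        raise ValueError("n_samples and n_measurements must be at least 1")

    if data_type == "continuous":
        if n_measurements == 1:
            if n_samples == 1:
                return "t-test"
            return "Two-sample t-test" if n_samples == 2 else "ANOVA"
        if n_measurements == 2 and n_samples == 1:
            return "Paired t-test"
        return "RMA / LMM*"

    # Categorical outcomes
    if n_measurements == 1:
        return "z-test" if n_samples == 1 else "Chi-square test"
    if n_measurements == 2 and n_samples == 1:
        return "McNemar's test"
    return "GLMM*"

# Colon cancer screening example: proportion compared between two groups, measured once
print(suggest_method("categorical", 2, 1))
```

For the email reminder question used earlier (a proportion, two groups, one measurement), the tables point to a chi-square test.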

Page 16: Statistics pres 10 27 2015   roy sabo

Analytical Methods: Examples

Research Question: What are CD3 cell counts in BMT recipients 60 days after transplantation?

Research Question: Are CD3 cell counts in BMT recipients 60 days after transplantation larger than counts at baseline (day 0)?

Research Question: Are CD3 cell counts in BMT recipients receiving a 5.1-unit ATG dose as big as the counts in recipients receiving a 7.5-unit dose?

Page 17: Statistics pres 10 27 2015   roy sabo

Group Discussions

Please break into groups by table

For the next 10-15 minutes, take turns discussing what analytic approaches are appropriate for your proposed study
  Is your outcome continuous or categorical?
  How many groups are you investigating?
  How many measurements are you taking?
  What statistical methodology should you use?

If your study is qualitative, discuss how statistical methodologies could be used (e.g. data summary, association)

Page 18: Statistics pres 10 27 2015   roy sabo

Data Collection Plan: Sources

What information do you need to answer your research question?
  Electronic Health Records (EHR): CERNER, ONCORE
  Integrated Personal Health Record (IPHR): MyPreventiveCare
  Chart reviews
  Surveys
  Prospective biological measurements

Need to know:
  Who will physically obtain/collect data?
  How often will it be done?
  If prospective biological measures: how will it be done?

Page 19: Statistics pres 10 27 2015   roy sabo

Data Collection Plan: Capture

How will you obtain the necessary information?
  EHR or IPHR extraction
  Chart audits
  Surveys: in-person, mail, email or online
  Prospective measurements

Need to know:
  Who will do this?
  How often will it be done?
  If prospective biological measures: how will it be done?

Page 20: Statistics pres 10 27 2015   roy sabo

Data Collection Plan: Storage

Where will your data be stored?
  Paper records: No
  Microsoft Excel or Access
  REDCAP: collects and stores survey data…and much more
  SAS database (or SPSS, R, etc.): work with your statistician to create dataset

Need to know:
  How often will it be updated?
  Is it secure?
  Is it IRB/HIPAA compliant?

Page 21: Statistics pres 10 27 2015   roy sabo

Data Collection Plan

Helpful Suggestions:
  Consult a statistician or database manager before you start collecting data
    Preferably the person who will be analyzing your data
  If you are collecting and storing data yourself:
    Record it directly into storage unit as you collect it
      E.g., Microsoft Excel
    Record it as it will be analyzed
      One row per subject per time point
        New row for each additional time point
      One column per measurement
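
The "one row per subject per time point, one column per measurement" layout can be illustrated with a small CSV file written from Python. The column names and values below are hypothetical, loosely modeled on the CD3 example from earlier in the talk; they are not data from the presentation.

```python
import csv
import io

# Hypothetical columns: one identifier, one time point, one column per measurement
FIELDNAMES = ["subject_id", "day", "cd3_count", "atg_dose"]

# Each subject gets a new row for each additional time point
rows = [
    {"subject_id": 1, "day": 0, "cd3_count": 210, "atg_dose": 5.1},
    {"subject_id": 1, "day": 60, "cd3_count": 340, "atg_dose": 5.1},
    {"subject_id": 2, "day": 0, "cd3_count": 185, "atg_dose": 7.5},
    {"subject_id": 2, "day": 60, "cd3_count": 290, "atg_dose": 7.5},
]

# Written to an in-memory buffer here; a real study would write to a secure file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Data stored this way can usually be read directly by statistical software without restructuring.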

Page 22: Statistics pres 10 27 2015   roy sabo

Data Collection Plan: Example

Page 23: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

As a general rule, larger sample sizes:
  Lead to more representative samples
  Lead to better estimation of parameters (e.g., representative measures)
  Provide estimators with lower variability

[Figure: three panels labeled N=9, N=36 and N=100]

Page 24: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Averages over 10,000 Simulations

Sample Size   Sample Mean   Sample Std. Dev.   Standard Error*
   9          204.4         36.5               12.3
  16          204.3         37.1                9.5
  25          204.2         37.2                7.8
  36          204.1         37.5                6.5
  49          204.1         37.6                5.5
  64          204.2         37.7                4.9
  81          204.1         37.7                4.2
 100          204.1         37.7                3.9
1000          204.1         37.7                1.2

*SE: explains variability in estimator; not the sample data
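
The pattern in this table (the sample standard deviation stays roughly constant while the standard error shrinks as the sample size grows) can be reproduced with a small simulation. The sketch below makes assumptions the slides do not state: a normally distributed population with mean 204 and standard deviation 38, fewer repetitions than the 10,000 used on the slide, and an arbitrary seed. It estimates the standard error as the standard deviation of the simulated sample means and compares it with sigma divided by the square root of n.

```python
import random
import statistics

def simulate_standard_error(n, reps=2000, mu=204.0, sigma=38.0, seed=1):
    """Standard deviation of the sample mean across reps simulated samples of size n.
    mu and sigma describe an assumed (hypothetical) normal population."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.stdev(sample_means)

se_by_n = {n: simulate_standard_error(n) for n in (9, 36, 100)}

for n, se in se_by_n.items():
    # Second number is the theoretical value sigma / sqrt(n)
    print(n, round(se, 1), round(38.0 / n**0.5, 1))
```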

Page 25: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Possible Decisions

                      True State
Your Decision         H0 is “True”          HA is True
Reject H0             Type I Error (α)      Correct Decision
Fail to Reject H0     Correct Decision      Type II Error (β)

Type I Error: find difference where there shouldn’t be one

Type II Error: fail to find difference where there should be one
  Power = 1 - β

Page 26: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Determinants of Required Sample Size

Significance Level (α): probability of rejecting H0 when it’s true

Power (1-β): probability of rejecting H0 when it’s false

These values are selected during design phase
  α = 5%
  1-β = 80% (sometimes 90%)

Page 27: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Determinants of Required Sample Size

Measure of variability (usually standard deviation) inherent in study population
  As measurement becomes more variable…
  Standard error of test statistic increases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases

Controlling variability:
  Better measurement methodology
  Homogeneous samples

Page 28: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Determinants of Required Sample Size

Effect Size: smallest difference or change in outcome you hope to find
  As difference you want to observe decreases…
  Test statistic decreases…
  p-value increases…
  Ability to reject H0 decreases…
  Power decreases

Considerations:
  Clinical significance
  Clinical possibility

Large differences are easier to detect but harder to find
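
How power responds to variability and effect size can be illustrated with a normal-approximation formula for a two-sided comparison of two group means. This is a simplified sketch under stated assumptions, not the presenter's method: equal group sizes, a known common standard deviation, and the negligible chance of rejecting in the wrong direction ignored. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a true difference delta between two group means,
    each group having n_per_group subjects and standard deviation sigma."""
    z = NormalDist()
    standard_error = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means
    z_critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / standard_error - z_critical)

# Hypothetical scenario: 56 subjects per group, standard deviation 37.7
for delta in (30, 20, 10):
    print(f"effect size {delta}: power = {approx_power(delta, 37.7, 56):.2f}")

# Same effect size, increasingly variable measurement
for sigma in (25, 37.7, 50):
    print(f"std. dev. {sigma}: power = {approx_power(20, sigma, 56):.2f}")
```

Power falls as the effect size shrinks and as the measurement becomes more variable, matching the chains of reasoning on the last two slides.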

Page 29: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

Calculating Required Sample Size
  Equations exist (involving α, β, variability and effect size) for simple analytic methods (t-test, chi-square, etc.)
  Advanced methods require professional assistance

Where do you find variability and effect size?
  Previous literature of similar populations
  Retrospective Study / Chart Audits
  Pilot study
  Guesstimate!
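
As one example of such an equation, a common normal-approximation formula for comparing two group means is n per group = 2 × (z for 1-α/2 + z for power)² × σ² / δ², where σ is the standard deviation and δ is the effect size. The sketch below implements that formula. It is an approximation that assumes equal group sizes and a two-sided test, the function name and inputs are hypothetical, and dedicated software or a statistician would refine the result (for example with a t-distribution correction).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a difference delta
    between two means with common standard deviation sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = z.inv_cdf(power)          # power = 1 - beta
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Hypothetical inputs: standard deviation 37.7, smallest difference of interest 20
print(n_per_group(delta=20, sigma=37.7))              # 80% power
print(n_per_group(delta=20, sigma=37.7, power=0.90))  # 90% power
print(n_per_group(delta=10, sigma=37.7))              # half the effect size
```

Halving the effect size roughly quadruples the required sample size, which is why the choice of effect size deserves careful thought.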

Page 30: Statistics pres 10 27 2015   roy sabo

Sample Size Determination

What if required sample size is too large?
  Consider a different outcome
    Continuous measures generally require smaller sample sizes than categorical measures

  Consider fewer groups or add multiple sites
    Fewer Groups: more subjects per group
    Multiple Sites: larger subject pool (maybe more representative…)
    Will require more sophisticated analytic methods

  Reconfigure study as a “pilot”
    Switch emphasis from “hypothesis testing” to “estimation”
    Goal: data summaries and confidence intervals
    Use to power larger study

Page 31: Statistics pres 10 27 2015   roy sabo

Group Discussion

Please break into groups by table

For the next 10-15 minutes, take turns discussing:
  Data Management:
    Where will you get your data?
    How will you capture it?
    How will you store it?

  Sample Size Determination:
    Are you able to power your study?
    Where will (did) you find information for your power analysis?

Page 32: Statistics pres 10 27 2015   roy sabo

Additional Resources

VCU Department of Biostatistics
  15 full-time faculty
  Can assist with: study design, sample size determination, interim and final analyses, dissemination
  Grant funding (or prospects of funding) usually required

BIOS 516 Biostatistical Consulting: graduate students available for FREE consultations
  Contact Russ Boyle ([email protected])
  Provide a protocol
  Offer co-authorship

Page 33: Statistics pres 10 27 2015   roy sabo

Additional Resources

VCU Center for Clinical and Translational Research

Research Incubator: study design, sample size determination, and other resources (e.g. grant writing)
  Contact: Pam Dillon ([email protected])

Biomedical Informatics: data management and storage (e.g. REDCAP)
  Support requested online: (http://www.cctr.vcu.edu/informatics/index.html)

Page 34: Statistics pres 10 27 2015   roy sabo

Additional Resources

Textbook (i.e., shameless plug):
  Statistical Research Methods: A Guide for Non-Statisticians
  Sabo and Boone, Springer, 2013
  Available on the web: