surviving pharmacy residency research: tips and tricks for statistical planning
TRANSCRIPT
Surviving Pharmacy Residency Research: Tips and Tricks for Statistical Planning
© Fraser Health Authority, 2007
The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial redistribution. In consideration for this authorization, the user agrees that any unmodified reproduction of this publication shall retain all copyright and proprietary notices. If the user modifies the content of this publication, all FH copyright notices shall be removed, however FH shall be acknowledged as the author of the source publication.
Reproduction or storage of this publication in any form by any means for the purpose of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be relied on as providing specific healthcare, legal or other professional advice. The Fraser Health Authority, and every person involved in the creation of this publication, disclaims any warranty, express or implied, as to its accuracy, completeness or currency, and disclaims all liability in respect of any actions, including the results of any actions, taken or not taken in reliance on the information contained herein.
FH Health Research Intelligence Unit
How can we help? Grant Facilitator-Writer
• Conducting a search for funding opportunities.
• Automatic notification of new funding sources and deadlines.
• Identifying a research team.
• Preparing letters of intent.
• Identifying resources required for conducting research.
• Formulating the research budget.
• Writing the grant application in collaboration with researchers.
• Understanding FH and funding agency requirements regarding preparation of specific documents.
FH Health Research Intelligence Unit
How can we help? Epidemiologist
• Specifying the research goal, objectives and hypothesis.
• Identifying measurable outcomes.
• Specifying the variables for analysis.
• Identifying sources of data.
• Developing data collection tools for quantitative or qualitative studies.
• Developing the statistical analysis plan.
• Understanding how to use statistical software, such as SPSS.
Workshop Outline
• Research 101 - Basic Research Steps
• Research Question Refinement
• Common Study Designs - Resource
• Levels of Data
• Power and Sample Size
• Statistical Test Selection - Exercise
• Data Reporting - Resource
• Simple Stats with Excel - Resource
Pharmacy Residency Project
1) Develop a research question
2) Conduct thorough literature review
3) Re-define research question or hypothesis
4) Design research methodology/study
5) Create research proposal
6) Apply for funding
7) Apply for ethics approval
8) Collect and analyze data
9) Draw conclusions and relate findings
Research Question Refinement
The research question will describe, in operational terms, what you think will happen in the study.
Good Versus Bad Research Question
Less specific: Are patients who take drug X more likely to experience episodes of delirium?
More specific: Do patients who received drug X between September 2008 and November 2008 experience more episodes of delirium than patients who received drug Y during the same time period?
Classification of Research Studies
Research Studies
  Observational
    Descriptive
    Analytic
  Experimental
Observational Studies:
Descriptive Studies: Focus on describing populations and describing the relationship between variables.
Analytic Studies: Make inferences about the population based on a random sample.
Experimental Studies:
Test relationships between exposures and outcomes. The investigator has direct control over study conditions and exposure status.
Hierarchy of Studies
Experimental Studies
Analytic Studies
Descriptive Studies
The type of study is selected according to the purpose of the research.
Levels of Evidence
Handout - Research Design Hierarchy
Probability Sampling Methods: Random
There are several methods to choose from:
Simple random sampling.
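A minimal sketch of simple random sampling, using only the Python standard library. The patient ID frame, the sample size of 40 and the seed are assumptions made up for illustration; they are not from the workshop.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    rng = random.Random(seed)  # optional seed makes the draw repeatable
    return rng.sample(list(frame), n)


# Hypothetical sampling frame: patient IDs 1 to 200.
patient_ids = range(1, 201)
sample = simple_random_sample(patient_ids, n=40, seed=2008)
print(sorted(sample))
```

Recording the seed in the study file lets someone else reproduce exactly the same sample later.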
Probability Sampling Methods: Stratified
Stratified sampling (divide the population into non-overlapping strata and sample from within each stratum independently).
Guarantees representation of all important groups.
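Stratified sampling can be sketched as follows: group the frame by stratum, then draw a simple random sample inside each stratum. The ward names, the frame and the choice of 10 units per stratum are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Sample independently within each non-overlapping stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        # A stratum smaller than the requested size is taken in full.
        k = min(n_per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame of (patient_id, ward) pairs: 60 medical, 30 surgical.
frame = [(i, "medical" if i % 3 else "surgical") for i in range(1, 91)]
by_ward = stratified_sample(frame, stratum_of=lambda u: u[1], n_per_stratum=10, seed=1)
print({ward: len(chosen) for ward, chosen in by_ward.items()})
```

Because each stratum is sampled separately, the smaller ward cannot be left out by chance, which is the guarantee the slide refers to.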
Probability Sampling Methods: Systematic
Selecting the sample using an interval “k”, so that every kth unit in the frame is selected, is called systematic random sampling.
Steps to achieve a systematic random sample:
1. Number the units in the population from 1 to N.
2. Decide on the n (sample size) that you want or need.
   • k = N/n = the interval size.
3. Randomly select an integer between 1 and k.
4. Then take every kth unit.
Example:
1. N = 200
2. n = 40; take N/n, 200/40 = 5 (interval size).
3. Randomly select a number between 1 and 5 (let’s pick 4).
4. Begin with 4, and take every 5th unit.
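The four steps above can be sketched in a few lines of Python. The function name and the optional `start` argument are illustrative choices; the numbers reproduce the slide's example (N = 200, n = 40, start at 4).

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of n unit numbers from a frame numbered 1..N."""
    k = N // n  # interval size (assumes N is at least n)
    if start is None:
        # Step 3: random start between 1 and k inclusive.
        start = random.Random(seed).randint(1, k)
    # Step 4: take every kth unit, beginning at the start.
    return list(range(start, N + 1, k))[:n]


units = systematic_sample(200, 40, start=4)
print(units[:5], "...", units[-1])  # [4, 9, 14, 19, 24] ... 199
```

Leaving `start` unset draws the starting point at random, which is what makes the sample a probability sample.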
Probability Sampling Methods: Cluster
Cluster sampling.
• Divide population into clusters and randomly sample clusters.
• Measure all units within sampled clusters.
• Example: See blue areas on map.
Not just geographic areas; could select hospitals, schools, etc.
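A short sketch of cluster sampling: clusters are drawn at random, then every unit inside a chosen cluster is measured. The hospital names and patient IDs are invented for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep all units inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical clusters: hospitals mapped to their patient IDs.
hospitals = {
    "Hospital A": [1, 2, 3],
    "Hospital B": [4, 5],
    "Hospital C": [6, 7, 8, 9],
    "Hospital D": [10],
}
chosen, units = cluster_sample(hospitals, n_clusters=2, seed=3)
print(chosen, units)
```

Note that the final number of units is not fixed in advance; it depends on which clusters happen to be drawn.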
Non-Probability Sampling Methods
There are different types of non-probability sampling methods as well:
• Convenience (not representative of population).
• Purposive (certain group in mind).
• Expert sampling (seek out specific expertise).
• Snowball sampling (ask people to participate, they ask more people).
If you select non-probability sampling methods, the conclusions drawn from the study results apply only to that specific population.
Measurement: Levels of Data
The level of data will dictate which statistical test you should use.
Categorical = Data that are classified into categories and cannot be arranged in any particular order (e.g. apples and pears, gender, eye colour, ethnicity).
Ordinal = Data are ordered, but the distance between intervals is not always equal (e.g. low, middle and high income).
Continuous = Equal distance between each interval (e.g. 1, 2, 3; age).
Statistics and Statistical Test Selection
Types of Statistics
Descriptive Statistics: Describe research findings.
• E.g. frequencies, averages.
Inferential Statistics: Make inferences about the population, based on a random sample.
• In a random sample, each person/unit has an equal chance of being selected.
• Allows generalizability to population.
Types of Variables
Variables can be classified as independent or dependent.
An independent variable is the variable you believe will influence your outcome measure.
A dependent variable is the variable that is dependent on or influenced by the independent variable(s). The dependent variable can also be the variable you are trying to predict.
Statistical Test Selection
Selecting the appropriate statistical test requires several steps. Test selection should be based on:
1) What is your goal? Description? Comparison? Prediction? Quantify association? Prove effectiveness? Prove causality?
2) What kind of data have you collected? What are the levels of data (nominal, ordinal, continuous)? Was your sample randomly selected?
3) Is your data normally distributed? Should you use a parametric or non-parametric test?
4) What are the assumptions of the statistical test you would like to use? Does the data meet these assumptions?
Parametric Tests
Parametric tests assume that the variable in question is from a normal distribution.
Non-parametric tests do not require the assumption of normality.
Most non-parametric tests do not require an interval level of measurement; they can be used with nominal/ordinal level data.
Assumptions
• There are various assumptions for each test.
• Before you select a test, be sure to check the assumptions of each test.
• You will need to contact a consultant, or review statistical/research methods resources to find this information.
• Some examples of common assumptions are:
  • The dependent variable will need to be measured on a certain level (e.g. interval level).
  • The independent variable(s) will need to be measured on a certain level (e.g. ordinal level).
  • The population is normally distributed (not skewed).
• If your data do not meet the assumptions for a specific test, you may be able to use a non-parametric test instead.
Type of Data
| Goal | Measurement (Normal Population) | Ordinal, or Non-Normal Population | Binomial (Two Possible Outcomes) | Survival Time |
| Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel* |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression* |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression** |
| Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression** |
| Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** | |
| Predict value from another measured variable | Simple linear regression or Nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazard regression* |
| Predict value from several measured or binomial variables | Multiple linear regression* or Multiple nonlinear regression** | | Multiple logistic regression* | Cox proportional hazard regression* |
Statistical Test Selection Group Exercise
Using your tables, select the appropriate statistical tests for 10 research scenarios.
Handout - Test Selection Exercise
During the group exercise…
Steps to choose the appropriate statistical method for the data analysis:
1. Identify whether the research problem raises the question of describe, relate (association), or compare (difference).
2. Identify the levels of measurement in the research question (Nominal/Categorical, Ordinal/Rank, Continuous/Evenly spaced).
3. Identify the number of variables, or samples, being described, related, or compared.
4. Identify whether comparison samples are related (analyze same group before and after) or independent (not at all related, looking at different groups).
5. Choose the appropriate statistical tool for the data and situation using the decision tree in the handout.
What is the question: Compare
How many samples: 2
Related or independent: Independent
What is the level of measurement: Continuous
How many dependent variables: 1
Test: T-test
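For the decision path above (compare, two independent samples, continuous outcome), the statistic behind the unpaired t test can be sketched as below. This is the pooled-variance form, which assumes roughly equal variances in the two groups. The scores are made up for illustration, and turning t into a p-value still needs a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = variance(group_a), variance(group_b)  # sample variances (n - 1 denominator)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical memory test scores (out of 100) for two independent groups.
clinic_x = [70, 75, 80, 85, 90]
clinic_y = [60, 65, 70, 75, 80]
t, df = unpaired_t(clinic_x, clinic_y)
print(f"t = {t:.2f}, df = {df}")  # t = 2.00, df = 8
```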
1. A pilot experiment designed to test the effectiveness of a new approach to electrode placement for electroconvulsive therapy (ECT) has been conducted over a one-year period in the Fraser Health Authority.
Patients from two different mood disorder clinics participated in this study. Patients from Clinic X received ECT according to current practice guidelines. Patients from Clinic Y received a new exploratory ECT treatment. Patients in each clinic were matched for age, gender, and type of disorder. A random sample of 30 matched pairs of patients was selected for inclusion in the study. At the end of one year, patients were administered a memory test yielding a total score out of 100. Dr. Vasdil would like to know what statistical procedure needs to be selected to test for differences among groups of patients on the memory test.
Sample Size
There are several rules of thumb for determining sample size.
1) It’s a good idea to have a minimum of 30 cases (as a total group, or if comparing groups, 30 for each group).
   If you have fewer, you can use a non-parametric test, but it is still better to have close to 30 cases.
2) If using regression, it is best to have between 10 and 50 cases per independent variable.
3) If you are validating a survey, it is never good to have more questions than cases.
4) If the total population that you are examining is less than 30, use all of them.
5) For pilot studies, the recommendation is a sample size of 12 per group.
6) For surveys, a sample size of 400 per group can do just about anything.
7) For surveys, a 30% response rate is the bare minimum.
Note: For a precise sample size estimate you will need to conduct a power analysis.
Statistical Power
• Power is the capability of a statistical test to correctly detect a significant effect if it exists.
• Assumes a value between 0 and 1 (often expressed as a percentage).
• Power = 1 - β (β = probability of a Type II error).
• Type II error: failing to reject a null hypothesis that is false (missing an effect that exists).
• Type I error: rejecting a null hypothesis that is true (reporting an effect that does not exist).
Types of Power
A Priori - Conducted before study commences (at proposal stage).
Post Hoc - After study has been completed.
Easy way to increase power?
• Increase sample size
• Increase effect size
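To see how sample size and effect size drive power, here is a rough sketch. It assumes a two-sided comparison of two group means, a standardized effect size (difference in means divided by the standard deviation), and a normal approximation; these assumptions are not from the workshop, and dedicated power software will give slightly different numbers.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a two-sided, two-group comparison of means.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


# Power rises with sample size (effect size held at 0.5)...
for n in (12, 30, 64, 100):
    print(n, round(approx_power(0.5, n), 2))

# ...and with effect size (n held at 30 per group).
for d in (0.2, 0.5, 0.8):
    print(d, round(approx_power(d, 30), 2))
```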
Components Involved in Power Calculation
• Sample Size - Number of cases.
• Effect Size - Magnitude of the trend and variation.
• Alpha Level - Probability of concluding that an effect is present when the result is due to chance alone (.05 or .01).
  • Also known as the Type I error rate, or the probability of rejecting a null hypothesis that is true.
• Power level - 80-90% is common.
• One or two-tailed test - two-tailed is common.
Components Involved in Power Calculation
• Sample Size - What we want to find out.
• Effect Size - Magnitude of the trend… but what if you don’t know?
  • Look to pilot data or literature.
  • Keep in mind, the smaller the effect size, the larger the sample size required.
• Alpha Level - .05
• Power level - 80-90%
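As a rough illustration of how these components combine, the sketch below solves for the sample size per group. It assumes a two-sided comparison of two group means with a standardized effect size and uses the normal approximation n = 2((z_alpha + z_power) / d)^2; a consultant's software, which uses the t distribution, will usually give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate cases needed per group (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# The smaller the effect size, the larger the sample size required.
for d in (0.8, 0.5, 0.2):
    print(d, sample_size_per_group(d))
```

With alpha = .05 and 80% power, an effect size of 0.5 needs about 63 cases per group under this approximation.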
Important Consultation Information
• What is your research question?
• Components of power calculation
• Levels of data (nominal, ordinal, continuous)
• Sampling plan
Data Organization: Codebook
What is a codebook?
• A codebook is a log of your variables (and levels of data) and how you will code them.
• A codebook will help everyone understand the coding schemes to ensure that they are on the same page!
Data Processing and Analyses: Codebook Example
| Variable Name | Variable Label | Values | Coding | Missing | Variable Type |
| age | age | 1,2,3,4,5 | 1=10-20 years; 2=21-30 years; 3=31-40 years; 4=41-50 years; 5=51+ years | 97=Incorrect response; 98=No response; 99=Not Applicable | Ordinal |
| sex | sex | 1,2 | 1=male; 2=female | 97=Incorrect response; 98=No response; 99=Not Applicable | Nominal |
| happiness | happiness at work | 1,2,3 | 1=not happy; 2=somewhat happy; 3=very happy | 97=Incorrect response; 98=No response; 99=Not Applicable | Ordinal |
Spreadsheet Example
| ID# | Age | Sex | Happiness |
| 1 | 1 | 1 | 2 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 1 | 2 |
| 4 | 57 | 2 | 2 |
| 5 | 45 | 2 | 3 |
| 6 | 66 | 2 | 3 |
| 7 | 2 | 2 | 3 |
| 8 | 88 | 2 | 3 |
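One use of a codebook is checking entered data against it. The sketch below writes the example codebook as a Python dictionary and flags any spreadsheet value that is neither a valid code nor a missing-data code. The dictionary layout and function name are illustrative assumptions. In the example spreadsheet, several Age entries look like raw ages rather than codes 1 to 5, and this check picks them up.

```python
# Codebook from the example, as a plain dictionary (layout is an assumption).
CODEBOOK = {
    "age": {1: "10-20 years", 2: "21-30 years", 3: "31-40 years",
            4: "41-50 years", 5: "51+ years"},
    "sex": {1: "male", 2: "female"},
    "happiness": {1: "not happy", 2: "somewhat happy", 3: "very happy"},
}
MISSING = {97: "Incorrect response", 98: "No response", 99: "Not applicable"}


def invalid_entries(rows):
    """Return (id, variable, value) for each value not allowed by the codebook."""
    problems = []
    for row in rows:
        for variable, codes in CODEBOOK.items():
            value = row[variable]
            if value not in codes and value not in MISSING:
                problems.append((row["id"], variable, value))
    return problems


# Rows copied from the spreadsheet example: (ID#, Age, Sex, Happiness).
raw = [(1, 1, 1, 2), (2, 2, 2, 2), (3, 3, 1, 2), (4, 57, 2, 2),
       (5, 45, 2, 3), (6, 66, 2, 3), (7, 2, 2, 3), (8, 88, 2, 3)]
rows = [dict(zip(("id", "age", "sex", "happiness"), r)) for r in raw]

for problem in invalid_entries(rows):
    print(problem)
```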
Data Analysis with Excel
• Most simple analyses can be done using Excel, including correlation, regression and even random number generation.
• Install the data analysis pack.
  • Go to Tools, Add-ins, and add the ‘Analysis ToolPak’.
• Create worksheet and codebook.
• Choose statistical test.
• Follow commands in help menu.
http://home.ubalt.edu/ntsbarsh/excel/excel.htm
Data Reporting and Presentation of Data
• Graphical summaries are a great way to present your data.
• Excel is great for creating tables and graphs.
• The type of data you have will determine the type of graphical summary you should use.
Data Reporting and Presentation of Descriptive Data
Categorical data: Frequency Tables and Bar Charts.
Example: Fruit
| Fruit | Count | Percent | Valid Percent |
| Pineapples | 4 | 20% | 21% |
| Apples | 5 | 25% | 26% |
| Oranges | 10 | 50% | 53% |
| Unknown | 1 | 5% | |
| Total | 20 | 100% | 100% |
[Bar chart: Fruit Study. Counts for Pineapples, Apples, Oranges and Unknown; count axis from 0 to 10.]
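The Count, Percent and Valid Percent columns of the fruit example can be reproduced with a few lines of Python. Valid percent leaves the missing category ("Unknown") out of the denominator. The function name and the way the raw data are rebuilt from the counts are illustrative assumptions.

```python
from collections import Counter


def frequency_table(values, missing_label="Unknown"):
    """Return {category: (count, percent, valid_percent)}.

    Percent uses all cases; valid percent excludes the missing category,
    which gets None in the valid-percent column.
    """
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = total - counts.get(missing_label, 0)
    table = {}
    for category, count in counts.items():
        percent = round(100 * count / total)
        valid = None if category == missing_label else round(100 * count / valid_total)
        table[category] = (count, percent, valid)
    return table


# Raw data rebuilt from the counts in the fruit example.
fruit = ["Pineapples"] * 4 + ["Apples"] * 5 + ["Oranges"] * 10 + ["Unknown"]
table = frequency_table(fruit)
for category, (count, percent, valid) in table.items():
    print(category, count, f"{percent}%", "" if valid is None else f"{valid}%")
```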
Data Reporting and Presentation of Descriptive Data
Continuous Data: Tables and Histograms
| Age | Count | Percent |
| 20-30 | 4 | 20% |
| 31-40 | 5 | 25% |
| 41-50 | 10 | 50% |
| 51-60 | 1 | 5% |
| Total | 20 | 100% |
[Histogram: Age. Counts (0 to 10) for the class intervals 20-30, 31-40, 41-50 and 51-60.]
What is the difference between a Histogram and a Bar Chart?
Histogram: For continuous data where data are divided into contiguous class intervals (in other words, connected through an unbroken sequence).
Bar Chart: For categorical data where categories are not contiguous.
Measures of Central Tendency
Reporting averages
• Categorical data = Mode
• Ordinal data = Median
• Continuous data = Mean
If there are outliers (or extreme values), report the median instead of the mean.
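A small sketch of matching the average to the level of data, using Python's `statistics` module. The helper name and all data values are made up for illustration. For ordinal data the sketch uses `median_low` so the result is always an observed category rather than a halfway value.

```python
from statistics import mean, median, median_low, mode


def central_tendency(values, level):
    """Pick the measure of central tendency that suits the level of data."""
    if level == "categorical":
        return mode(values)
    if level == "ordinal":
        return median_low(values)
    if level == "continuous":
        return mean(values)
    raise ValueError(f"unknown level of data: {level}")


# Hypothetical data.
eye_colour = ["brown", "blue", "brown", "green"]
income_rank = [1, 1, 2, 3, 3, 3, 2]       # 1=low, 2=middle, 3=high
ages = [22, 25, 27, 30, 31, 95]           # 95 is an outlier

print(central_tendency(eye_colour, "categorical"))  # brown
print(central_tendency(income_rank, "ordinal"))     # 2
print(round(central_tendency(ages, "continuous"), 1), median(ages))  # 38.3 28.5
```

The last line shows why the median is preferred with outliers: one extreme age pulls the mean to about 38, while the median stays at 28.5.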
Reporting Inferential Stats
Handout Resource - APA Guidelines
http://www.ilstu.edu/~jhkahn/apastats.html
It’s important to include means, standard deviations and sample size in your results section.
Example: Correlation
Variable X was strongly correlated with Variable Y, r=.59, p<.01.
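For reference, the r in a statement like the one above is the Pearson correlation coefficient. The sketch below computes it by hand and formats it without the leading zero. The data and function names are illustrative, and the p-value is not computed here; it would come from a statistics package or a table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def format_r(r):
    """Two decimals, no leading zero (e.g. .80 or -.45)."""
    return f"{r:.2f}".replace("0.", ".", 1)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(f"r = {format_r(r)}, n = {len(x)}")  # r = .80, n = 5
```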
Important to Keep your Audience in Mind
Residency Project
Publication
Departmental Report
Aaron: TCPS certification for residents reminder…
Questions?