Research Methods, Part II: Designing Research
Hypothesis testing
Experimental vs. non-experimental design
Reliability and validity
Fang-Ju Lin
2021/10/6

Learning Objectives
• To understand the purposes of hypothesis testing in conducting research and statistical analysis
• To describe the steps in hypothesis testing
• To describe the errors in hypothesis testing
• To distinguish between experimental, quasi-experimental, and non-experimental research designs
• To describe major types of experimental and quasi-experimental research designs
• To distinguish between systematic and random errors
• To describe reliability and validity of measures, and the relationship between the two

10 Steps to Building a Study Plan
1. Statement of the problem and its significance
2. Theoretical or conceptual framework
3. Research questions to be answered by the study
4. List of hypotheses to be tested
5. Description of the research design
6. Description of the sample and how it was obtained
7. Definitions of key terms and variables
8. Description of the planned statistical analysis
9. Statement of assumptions and limitations
10. Dissemination plan

Part I. HYPOTHESIS TESTING

Research Questions & Hypothesis

Definition
  Research question: A question, developed from a problem or phenomenon, that a researcher wants to answer in a study
  Hypothesis: A statement making a specific prediction about what is expected to occur as a result of conducting a study
Use
  Research question: When little is known, these questions are used to explore relationships between variables, which may ultimately lead to a hypothesis
  Hypothesis: Developed from previous research findings and used to explain a relationship between variables that can be tested empirically
Goal
  Research question: Explore a phenomenon in an innovative way that has not been done before
  Hypothesis: Provide further support for, or contradict, previous findings that have been empirically tested

Hypothesis
• Purpose: to translate the research question into predictions of expected outcomes
• Must stem from the research questions and be grounded in the theoretical framework
• Serves as a guide for data analysis
• “Testable” statements of relations
• The relationship between variables can be an association (no causal effect presumed) or a causal relationship
• The hypotheses stated can be directional (one-sided) or non-directional (two-sided)

Examples of Research Question and Hypothesis
For each research question: Is it testable? Is it relational or causal? If testable, formulate the hypothesis. Is the hypothesis directional (one-sided)?

• Research question: How do patients with diabetes comply with their medications?
  Non-testable (NA, NA)
• Research question: Does menopausal hormone therapy increase the cardiovascular risk?
  Testable; causal/relational
  Hypothesis: Menopausal hormone therapy increases the cardiovascular risk.
  Directional
• Research question: Is there a significant difference between males and females with regard to the incidence of brain tumor?
  Testable; relational
  Hypothesis: Males and females have different incidences of brain tumor.
  Non-directional
• Research question: Are stress and health related?
  Testable; relational
  Hypothesis: Stress and health are related.
  Non-directional
• Research question: What is the relationship between stress and health?
  Non-testable (NA, NA)
• Research question: Are stress and health inversely related, such that health decreases as stress increases?
  Testable; relational
  Hypothesis: Stress and health are inversely related, such that health decreases as stress increases.
  Directional
• Research question: Do increased levels of stress lead to decreased levels of health?
  Testable; causal
  Hypothesis: Increased levels of stress will lead to decreased levels of health.
  Directional

Hypothesis Testing
• Purpose: To permit generalizations from a sample to the population from which it came (i.e., statistical inference)
  – The “true” population value is usually unknowable, but it does exist and can be estimated from an appropriately drawn sample

Population → Sample: select a sample from the population
Sample → Population: [Inference] generalize conclusions from the sample to the population

Steps involved in testing statistical hypotheses
1. Formulate the statistical hypothesis (or hypotheses)
2. Decide on the appropriate test statistic for the hypothesis
3. (1) Select the α value
   (2) Determine the critical value (based on α)
4. Perform the calculations for the test statistic & obtain its p-value
5. Interpret the results

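The five steps can be traced in a short sketch. The example below is only an illustration, assuming a one-sample, one-tailed z-test with a known population standard deviation; the function name `one_sample_z_test` and all numbers are hypothetical and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-tailed z-test of H0: mu = mu0 against HA: mu > mu0 (sigma assumed known)."""
    # Step 1 is the choice of H0 and HA stated in the docstring.
    # Step 2: test statistic = distance of the sample mean from mu0 in standard errors.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 3: select alpha, then derive the critical value from it.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Step 4: the p-value is the upper-tail probability beyond the observed z.
    p_value = 1 - NormalDist().cdf(z)
    # Step 5: interpret; "p < alpha" and "z > z_crit" lead to the same decision.
    return {"z": z, "z_crit": z_crit, "p_value": p_value, "reject_h0": p_value < alpha}

# Hypothetical data: n = 36, sample mean 103, hypothesized mean 100, known sigma 9.
result = one_sample_z_test(sample_mean=103, mu0=100, sigma=9, n=36)
print(result)
```

With these made-up numbers the test statistic is z = 2.0, which exceeds the critical value of about 1.645, so H0 is rejected at α = 0.05.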
Statistical Hypotheses
Null hypothesis (H0): a statement claiming there is no relationship between two measured phenomena, or no difference among groups [the statement the researcher tries to disprove, reject, or nullify]
Alternative hypothesis (H1 or HA): a statement claiming there is a relationship between two measured phenomena, or a difference among groups (one-tailed [directional], or two-tailed [non-directional]) [the statement the researcher believes to be true and tries to support]
• Hypotheses are about population parameters
[Figure: two distribution curves, each with its non-rejection region marked]

Hypothesis Testing & Errors

                            Truth in the population
Decision from your sample   H0 (no relation/difference)          HA (relation/difference exists)
H0 (fail to reject H0)      Correct                              Type II error (β), “false negative”
HA (reject H0)              Type I error (α), “false positive”   Correct

• Decide the level of significance, i.e. the alpha value (α):
  – The probability of incorrectly rejecting H0 when H0 is actually true (“Type I error”)
  – α is chosen before the statistical test is performed
  – Traditional values are 0.05 (most common) and 0.01
• Type II error (β): probability of incorrectly failing to reject H0 when H0 is actually false
• Power (1-β): the probability of rejecting H0 when H0 is false, or accepting HA when HA is true [“the ability to detect a true difference in a population if one exists”]

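The meaning of α, β, and power can be checked by simulation. The sketch below is an illustration under stated assumptions (normally distributed data, a one-tailed z-test with known σ, and hypothetical parameter values); `rejection_rate` is a made-up helper name.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05, trials=4000, seed=1):
    """Share of simulated samples in which a one-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1
    return rejections / trials

# H0 true: every rejection is a false positive, so the rate should be close to alpha.
type_i_rate = rejection_rate(true_mean=0.0)
# H0 false (true mean 0.5): every rejection is correct, so the rate estimates power.
power = rejection_rate(true_mean=0.5)
print(f"Type I error rate ~ {type_i_rate:.3f}")
print(f"Power ~ {power:.3f}, Type II error rate (beta) ~ {1 - power:.3f}")
```

Under these assumptions the first rate should land near 0.05 and the second near 0.80, which matches the definitions in the table: α is the rejection rate when H0 is true, and power is the rejection rate when H0 is false.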
Statistical Testing

“Reject or Fail to Reject? This is the question”
• If H0 is rejected as a result of sample evidence, then HA is the conclusion (accept HA)
• If there is not enough evidence to reject H0, H0 is retained but not accepted; rather, we “fail to reject (or cannot reject) H0”
• After determining the critical value (based on α), perform the calculations for the test statistic & obtain its p-value
• What is a p-value?
  – The probability of observing a difference equal to or more extreme than what was actually observed, if H0 is true
  – The smaller the p-value, the stronger the evidence against H0
• If the p-value is less than α, H0 is rejected

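The link between the critical value, the p-value, and the choice of a one-tailed or two-tailed test can be made concrete with a small sketch. It assumes a z statistic that follows the standard normal distribution under H0; the helper names and the observed value 1.80 are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def p_value(z, tails=2):
    """p-value for an observed z statistic; tails=1 tests HA: mu > mu0."""
    if tails == 1:
        return 1 - STD_NORMAL.cdf(z)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def critical_value(alpha, tails=2):
    """Critical z beyond which H0 is rejected at significance level alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha / tails)

z_obs = 1.80  # hypothetical observed test statistic
for tails in (1, 2):
    p = p_value(z_obs, tails)
    print(f"{tails}-tailed: critical z = {critical_value(0.05, tails):.3f}, "
          f"p = {p:.4f}, reject H0 = {p < 0.05}")
```

For this made-up value, the one-tailed test rejects H0 (p is about 0.036, critical z about 1.645) while the two-tailed test does not (p is about 0.072, critical z about 1.960), so the direction of the hypothesis has to be fixed before looking at the data.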
• Determine the critical value (based on α)
• Perform the calculations for the test statistic & obtain its p-value
• Interpret the results

One-tailed test: H0: μ = μ0; HA: μ > μ0 (critical value Z = +1.65, which corresponds to α = 0.05)
[Figure sources: J Appl Hematol 2014;5:27-8; http://stats.stackexchange.com/questions/124178/why-do-we-compare-p-value-to-significance-level-in-hypothesis-testing-of-mean]

Something like a court decision:

                                 Truth
Court decision                   Not guilty (H0)                      Guilty (HA)
Not guilty (fail to reject H0)   Correct                              Type II error (β), “false negative”
Guilty (reject H0)               Type I error (α), “false positive”   Correct

Confidence Intervals (CIs): another approach to statistical inference
• A CI gives a range of values, calculated from the sample, that is likely to contain the true population value
• Typically constructed as either 95% or 99% CIs
• Interpretation of a 95% CI:
  – If the study were repeated over and over again, drawing different random samples of the same size from the same population, 95% of the resulting 95% CIs would contain the population mean (μ)

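The repeated-sampling interpretation of a CI can be illustrated with a simulation. This is a minimal sketch, assuming normally distributed data and a known σ so that the interval is the sample mean ± z·σ/√n; the population values and the function name `ci_coverage` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def ci_coverage(mu=50.0, sigma=10.0, n=30, level=0.95, repeats=4000, seed=7):
    """Fraction of repeated-sample CIs (sigma assumed known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / repeats

print(f"95% CIs containing mu: {ci_coverage(level=0.95):.3f}")
print(f"99% CIs containing mu: {ci_coverage(level=0.99):.3f}")
```

The coverage fractions should come out close to 0.95 and 0.99. The 99% interval captures μ more often only because it is wider.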
In order to increase the power, we can …
• Increase sample size
• Decrease variability
• Increase α (β then decreases and power increases), but it’s a trade-off
• Select reliable measures
• Use a one-tailed statistical test
https://vwo.com/blog/ab-test-duration-calculator/

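Each item in the list above can be checked against a closed-form power calculation. The sketch below assumes a z-test with known σ and a positive true mean shift; `z_test_power` and the numbers are hypothetical and serve only to compare scenarios.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05, tails=1):
    """Power of a z-test to detect a true mean shift `effect` (> 0), sigma assumed known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / tails)
    shift = effect / (sigma / sqrt(n))  # true difference in standard-error units
    power = 1 - std_normal.cdf(z_crit - shift)
    if tails == 2:
        # Small extra chance of rejecting in the opposite tail.
        power += std_normal.cdf(-z_crit - shift)
    return power

baseline = z_test_power(effect=2, sigma=10, n=50, alpha=0.05, tails=2)
scenarios = {
    "baseline (n=50, sd=10, alpha=0.05, two-tailed)": baseline,
    "larger sample (n=200)": z_test_power(2, 10, 200, tails=2),
    "less variability (sd=5)": z_test_power(2, 5, 50, tails=2),
    "larger alpha (0.10)": z_test_power(2, 10, 50, alpha=0.10, tails=2),
    "one-tailed test": z_test_power(2, 10, 50, tails=1),
}
for label, value in scenarios.items():
    print(f"{label}: power = {value:.3f}")
```

Every alternative scenario yields higher power than the baseline. Selecting reliable measures works through the same channel as decreasing variability, because less measurement error means a smaller σ.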
Multiple-Choice Concept Review (1)
• The null hypothesis states
  a. the expected direction of the relationship between the variables.
  b. that no relationship will be found.
  c. that a relationship will be found, but it will not state the direction.
  d. none of the above.
• Power can be increased by doing which of the following:
  a. increasing the α level.
  b. increasing the sample size.
  c. increasing the effect size.
  d. all of the above.
• A type I error occurs when the
  a. null hypothesis is accepted when it is false.
  b. null hypothesis is rejected when it is true.
  c. sample size is too small.
  d. effect size is not defined in advance.

Multiple-Choice Concept Review (2)
• Which of the following is more likely to contain the “true” population value of the mean?
  a. A 90% confidence interval (CI).
  b. A 95% CI.
  c. A 99% CI.
  d. All of the above.
• If a statistical test is significant, it means that
  a. It has important clinical application.
  b. The study had acceptable power.
  c. The null hypothesis was rejected.
  d. All of the above are true.

Part II. EXPERIMENTAL VS. NON-EXPERIMENTAL DESIGN

Research Process
Identify study question → Select study approach → Design study & collect data → Analyze data → Report findings

Common study approaches:
• Review/ systematic review/ meta-analysis
• Cohort study
• Cross-sectional study/ survey
• Ecological study
• Case series
• Qualitative study
• Experimental study

Experimental Study Design
• Assigns participants to receive a particular exposure (intervention)
• Has considerable control over determining who participates in the study, what happens to them, and under what conditions it happens
• Strongest of all research designs in internal validity
  – Gold standard for assessing causality (i.e. whether an intervention causes a particular outcome)
• Typical experimental study design: randomized controlled trial (RCT)
  o Active intervention groups
  o Control group
  o All participants are followed forward in time

Pyramid of Evidence (top to bottom)
• Systematic reviews / meta-analyses
• Randomized controlled trials (RCTs)
• Controlled trials without randomization
• Cohort or case-control studies
• Quasi-experimental studies (e.g. multiple time series with or without intervention)
• Descriptive or qualitative studies, case series/reports, expert opinions

Why do randomized controlled trials (RCTs) represent the gold standard of evidence?
http://www.tfljournal.org/staticpages/index.php?page=Common-Experimental-Designs
*The randomization process ensures that, on average, both measured and unmeasured confounding factors are balanced across treatment and control groups.

Control Group vs. Control Variables
“Control” enables one to make inferences about causality
• Control group: in experimental research, a group (for the sake of comparison) that does not receive the treatment/experimental stimulus of interest
• Control variable: an extraneous variable that you do not wish to examine in your study, hence you control for it (i.e. hold its value constant)

Example 1: Female → Lower income; potential “confounder”: Employment
Example 2: New drug → Lower RA remission; potential “confounder”: Disease severity

RCT requires careful definitions of:
• The intervention
  – What intervention? Where and how to receive it? When, how often, and for how long? Eligibility criteria?
• Randomization
  – Simple randomization, block randomization, stratified randomization
• Selecting controls
  – Placebo (most typical), other active treatment, standard care, same intervention with different doses/durations
  – Helps to avoid the Hawthorne effect (also referred to as the “observer effect”: participants change their behavior for the better when they know they are being observed)
  – When there are ethical concerns: a crossover design could be considered
• The outcome
  – Efficacy (e.g. surrogate/intermediate, survival, quality of life), safety
  – Superiority trial, non-inferiority trial, equivalence trial

METHODS OF RANDOMIZATION

Simple randomization
• Uses a coin toss, random number generator, or other simple mechanism
• Note: Does not guarantee balance in numbers during the trial

Block randomization
• Divide potential patients into blocks, and then randomize individuals within each block
• Purpose: To keep the sizes of treatment groups similar (ensures equal treatment allocation within each block)

Stratified randomization
• Randomly assigns individuals within certain subgroups (e.g. gender, age, race, disease severity)
• Purpose: To produce comparable groups with regard to certain characteristics

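The three randomization methods can be sketched in a few lines. This is an illustrative sketch only: it assumes two arms, an even block size, and stratified randomization implemented as block randomization within each stratum; the function names, participant IDs, and strata are hypothetical.

```python
import random

def simple_randomization(ids, rng):
    """Independent 'coin toss' per participant; group sizes may end up unequal."""
    return {pid: rng.choice(["intervention", "control"]) for pid in ids}

def block_randomization(ids, rng, block_size=4):
    """Shuffle a balanced set of arm labels within each consecutive block (even block_size)."""
    allocation = {}
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]
        labels = ["intervention", "control"] * (block_size // 2)
        rng.shuffle(labels)
        allocation.update(zip(block, labels))
    return allocation

def stratified_randomization(strata, rng, block_size=4):
    """Apply block randomization separately within each stratum (e.g. disease severity)."""
    allocation = {}
    for members in strata.values():
        allocation.update(block_randomization(members, rng, block_size))
    return allocation

rng = random.Random(2021)
ids = list(range(1, 17))  # 16 hypothetical participant IDs
strata = {"mild": ids[:8], "severe": ids[8:]}
for name, alloc in [("simple", simple_randomization(ids, rng)),
                    ("block", block_randomization(ids, rng)),
                    ("stratified", stratified_randomization(strata, rng))]:
    n_int = sum(1 for arm in alloc.values() if arm == "intervention")
    print(f"{name}: {n_int} intervention, {len(alloc) - n_int} control")
```

Block and stratified allocation always give 8 versus 8 here, and the stratified version also gives 4 versus 4 within each severity stratum. Simple randomization offers no such guarantee.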
RCT with a Parallel Design
Randomization → Group A: Intervention → Outcome
Randomization → Group B: Control → Outcome

RCT with a Crossover Design
Group A: Randomization → Intervention (Period 1) → Outcome → Washout period → Control (Period 2) → Outcome
Group B: Randomization → Control (Period 1) → Outcome → Washout period → Intervention (Period 2) → Outcome

Blinding (sometimes called Masking)
• Single-blind, double-blind
  – Q&A: What term is used to describe a trial if there is no blinding?
• Minimizes bias that can occur if participants or assessors assess outcomes differently based on the results they expect for an exposure
• Usually possible only when all participants are assigned to similar exposures
• Measures to minimize the likelihood of bias in non-blinded studies:
  – Use objective outcome measures

Study Validity
• Internal validity: the ability to draw a causal link between your independent variable (e.g. treatment exposure) and the dependent variable of interest (e.g. outcomes)
• External validity: the ability to generalize your study findings to the population at large
• Relationship between internal & external validity
  o Trade-off
  o Internal validity is a prerequisite to external validity

Questions for discussion:
• How are RCTs usually designed to enhance internal validity, which meanwhile may put external validity at risk? Examples?

RCT vs. Observational Studies
(RCT = randomized controlled trial)

Research question
  RCT: Is the drug safe and efficacious?
  Observational: Is the drug safe and effective in usual care?
Patient enrollment
  RCT: Randomized
  Observational: Open
Population
  RCT: Narrowly selected, usually healthier than patients who will eventually use the drug
  Observational: Heterogeneous patient population, patients who actually use the drug once marketed
Comparison group
  RCT: Placebo or inferior drug
  Observational: Current therapeutic alternatives
Compliance
  RCT: Strictly monitored
  Observational: As in normal practice
Data collection
  RCT: Numerous case report forms
  Observational: No mandated labs or procedures
Outcomes
  RCT: Often short-term, surrogate, or intermediate endpoints
  Observational: Broader set of outcomes relevant to clinical decision-making
Site distribution
  RCT: Limited to sites with experience in RCTs
  Observational: Primarily community-based sites
Validity
  RCT: High internal validity, less generalizability to patient population
  Observational: Lower internal validity, more generalizable (higher external validity)

Types of Pre-Experimental Research Designs

O = observation or measurement of the dependent/outcome variable
X = exposure to the experimental stimulus or independent variable
R = random assignment to conditions

• Pre-experimental designs lack both random assignment to treatment conditions and control groups

One-Shot Case Study
  Experimental group:     X  O

One-Group Pretest-Posttest Design
  Experimental group:     O1  X  O2

Static-Group Comparison
  Experimental group:     X  O
  Control group (???):       O

Types of Experimental Research Designs (1)

• With “random assignment” of subjects to experimental and control groups

Pretest-Posttest Control Group Design (*Classic*)
  Experimental group:     R  O1  X  O2
  Control group:          R  O1     O2

Solomon Four-Group Design
  Experimental group 1:   R  O1  X  O2
  Control group 1:        R  O1     O2
  Experimental group 2:   R      X  O2
  Control group 2:        R         O2

Posttest-Only Control Group Design
  Experimental group:     R  X  O1
  Control group:          R     O2

Types of Experimental Research Designs (2)

• With “random assignment” of subjects to experimental and control groups

Multiple-Experimental Group with One Control Group Design
  Experimental group 1:   R  O1  X1  O2
  Experimental group 2:   R  O1  X2  O2
  Experimental group 3:   R  O1  X3  O2
  Control group:          R  O1      O2

Factorial Design
  Experimental group 1:   R  X1  X2  O
  Experimental group 2:   R  X1      O
  Experimental group 3:   R      X2  O
  Control group:          R          O

Quasi-Experimental Research Designs (1)

• Why not always use experimental designs?
  – Sometimes we do not have control over administration of the intervention
  – Sometimes it is neither possible nor feasible to have a comparison group, to include or exclude specific subjects, or to decide who should and should not receive the intervention (e.g. policy analysis)
• Quasi-experimental research design
  – Typically lacks random assignment

Nonequivalent Control Group Design
  Experimental group:     O1  X  O2
  Control group:          O1     O2

Quasi-Experimental Research Designs (2)

Single Interrupted Time Series Design*
  O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11

Interrupted Time Series with a Nonequivalent Control Group*
  O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11
  O1 O2 O3 O4 O5 O6   O7 O8 O9 O10 O11

*Subjects can be the same (individual-based) or different (group-based)

Interrupted Time Series Design (ITSD)
Ma Z, et al. Use of Interrupted Time-Series Method to Evaluate the Impact of Cigarette Excise Tax Increases in Pennsylvania, 2000–2009. Prev Chronic Dis 2013;10:120268.
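One simple way to read an interrupted time series is to compare the pre-intervention trend, projected forward, with the post-intervention trend. The sketch below is a simplified illustration of that idea and is not the method used in the cited study: it fits separate ordinary least-squares lines before and after the intervention and ignores autocorrelation and seasonality. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def interrupted_time_series(times, values, first_post_time):
    """Level and slope change at the intervention, from separate pre/post trend lines."""
    pre = [(t, v) for t, v in zip(times, values) if t < first_post_time]
    post = [(t, v) for t, v in zip(times, values) if t >= first_post_time]
    pre_slope, pre_icpt = fit_line(*zip(*pre))
    post_slope, post_icpt = fit_line(*zip(*post))
    expected = pre_icpt + pre_slope * first_post_time    # pre-trend continued (counterfactual)
    observed = post_icpt + post_slope * first_post_time  # post-trend at the same time point
    return {"level_change": observed - expected, "slope_change": post_slope - pre_slope}

# Hypothetical series shaped like O1..O6 X O7..O11: a trend of +1 per period, then a
# drop of 5 units at the intervention and a flatter trend (+0.5 per period) afterwards.
times = list(range(1, 12))
values = [20 + t for t in range(1, 7)] + [22 + 0.5 * (t - 7) for t in range(7, 12)]
its = interrupted_time_series(times, values, first_post_time=7)
print(its)
```

With this noise-free example the estimated level change is -5 and the slope change is -0.5. Real series need a model that also accounts for correlated errors over time.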
Quasi-Experimental Research Designs (3)
• Regression Discontinuity Design
  – Used more frequently for program evaluation (e.g. education)
  – Used when subjects are assigned to the treatment group based on a quantitative score (cut-off)
  – A discontinuity at the cut-off point between the two regression lines indicates a treatment effect
http://www.socialresearchmethods.net/kb/quasird.php

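The discontinuity between the two regression lines can be computed directly. The sketch below is a minimal illustration, assuming a linear relationship on each side of the cut-off and treatment assigned to everyone scoring at or above it; `rd_effect`, the cut-off of 50, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rd_effect(scores, outcomes, cutoff):
    """Jump between the two fitted regression lines, evaluated at the cut-off score."""
    below = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    slope_b, icpt_b = fit_line(*zip(*below))
    slope_a, icpt_a = fit_line(*zip(*above))
    return (icpt_a + slope_a * cutoff) - (icpt_b + slope_b * cutoff)

# Hypothetical data: outcome rises with the assignment score, and subjects scoring
# 50 or more receive a treatment that adds 6 units to the outcome.
scores = list(range(30, 70))
outcomes = [10 + 0.8 * s + (6 if s >= 50 else 0) for s in scores]
effect = rd_effect(scores, outcomes, cutoff=50)
print(f"Estimated treatment effect at the cut-off: {effect:.2f}")
```

Because the example data are noise-free, the estimated jump equals the built-in effect of 6. With real data the estimate depends on how well a straight line describes each side of the cut-off.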
Types of Quantitative Study Design (Review of last lecture)
Non-Experimental Research Designs
• Observational research (no intervention)
• A study in which the researcher CANNOT control, manipulate, or alter the exposure variable or subjects but instead relies on interpretations, observations, or interactions to come to a conclusion
  o Cross-sectional surveys
  o Cohort studies
  o Case-control studies
  o Studies with self-controlled design
http://www.doctordisruption.com/design/design-methods-7-observation/

Part III. MEASUREMENT RELIABILITY & VALIDITY

Preview – Source of Health Data
• Clinical trials
• Medical records (paper/electronic)
• Surveys & survey databases
• Registries
• Claims/administrative databases
• Other: diaries, video recordings, transcripts of interviews and focus groups
• Pooling/linking from different data sources
• Primary data vs. secondary data?

Random Errors & Systematic Errors
• Error: defined as the difference between a calculated or observed value and the “true” value

Systematic errors (“bias”)            Random errors
Cause of poor accuracy                Cause of poor precision (unreliable measurements)
Definite causes                       Non-specific causes
Change the mean of a set of scores    Change the variation but not the mean of a set of scores

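The contrast in the table can be reproduced with simulated measurements. This is a minimal sketch, assuming each reading equals the true value plus a constant bias (systematic error) plus normally distributed noise (random error); the instrument labels and numbers are hypothetical.

```python
import random
from statistics import mean, stdev

def measure(true_value, bias, noise_sd, n, rng):
    """Simulate n readings: true value + systematic error (bias) + random error."""
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

rng = random.Random(42)
TRUE_WEIGHT = 70.0  # hypothetical true value, e.g. body weight in kg

instruments = {
    "accurate and precise": measure(TRUE_WEIGHT, bias=0.0, noise_sd=0.2, n=2000, rng=rng),
    "systematic error only": measure(TRUE_WEIGHT, bias=2.0, noise_sd=0.2, n=2000, rng=rng),
    "random error only": measure(TRUE_WEIGHT, bias=0.0, noise_sd=2.0, n=2000, rng=rng),
}
for name, readings in instruments.items():
    print(f"{name}: mean = {mean(readings):.2f}, sd = {stdev(readings):.2f}")
```

The biased instrument shifts the mean to about 72 while keeping the spread small (consistently wrong). The noisy instrument keeps the mean near 70 but spreads the readings widely.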
Reliability (consistency, stability, precision)
A matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time

Types and what is being assessed:
• Test-retest reliability: The stability of a test over time when no changes in health have occurred
• Internal consistency: The extent to which a measure is consistent within itself
  – Split-half method: measures the extent to which all parts of the test contribute equally to what is being measured
• Inter-rater reliability: Agreement between two raters when assessing the health status/behavior of the same patient
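
The slide names the types of reliability but not the statistics used to quantify them. As an assumption for illustration, the sketch below uses two common choices: Cohen's kappa for inter-rater agreement and Cronbach's alpha for internal consistency. The function names and the ratings are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

def cronbach_alpha(item_scores):
    """Internal consistency; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Two raters classify the same 10 patients (1 = condition present, 0 = absent).
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(rater_a, rater_b)

# Three questionnaire items answered by the same 5 respondents.
items = [[2, 4, 3, 5, 1], [3, 5, 3, 4, 2], [2, 5, 4, 5, 1]]
alpha = cronbach_alpha(items)
print(f"Cohen's kappa = {kappa:.2f}, Cronbach's alpha = {alpha:.2f}")
```

Here the raters agree on 8 of 10 patients, but half of that agreement is expected by chance, so kappa is 0.60. The three items move together closely, giving an alpha of about 0.95.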

Validity (accuracy)
The extent to which an empirical measure adequately reflects the real meaning of the concept under consideration (i.e. measures what it is supposed to measure)

Types and what is being assessed:
1. Face validity
2. Content validity
  • The degree to which a measure covers the range of meanings included within a concept
  • Based on well-accepted theoretical definitions, existing accepted standards, or patient or expert interviews
3. Criterion validity
  • The degree to which a measure correlates with, or predicts, one or more external criteria (also called predictive validity)
4. Construct validity
  • The degree to which a measure relates to other variables as expected within a system of theoretical relationships
  • Convergent validity: Test whether the use of different measures of the same construct provides similar results
  • Discriminant validity: Test whether different measures and their underlying construct can be differentiated from other constructs
  • Known-groups validity: Assess the differences between two patient groups known or theorized to differ in some way

Reliability vs. Validity
• A measure can be reliable but not valid (i.e. consistently wrong)
• For a measure to be valid, it must first be reliable (consistent)
• The validity of a measure is much more difficult to assess than its reliability

Quiz
For each of the four figures, rate:
Reliability _____  Validity _____

References
• Jacobsen KH. Introduction to health research methods: a practical guide. Sudbury, MA: Jones & Bartlett Learning; 2012.
• Monette DR, Sullivan TJ, DeJong CR, Hilton T. Applied social research: a tool for the human services. 9th ed. Belmont, CA: Brooks/Cole, Cengage Learning; 2014.
• Plichta SB, Kelvin EA, Munro BH. Munro's statistical methods for health care research. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2012.
• Picardi CA, Masick KD. Research methods: designing and conducting research with a real-world focus. Los Angeles, CA: SAGE Publications; 2014.
• Al Fattani AG. Concept of P-value. J Appl Hematol 2014;5:27-8.