research methods —part ii. designing research hypothesis

Research Methods, Part II: Designing Research

Hypothesis testing
Experimental vs. non-experimental design
Reliability and validity

Fang-Ju Lin
2021/10/6

Learning Objectives
• To understand the purposes of hypothesis testing in conducting research and statistical analysis
• To describe the steps in hypothesis testing
• To describe the errors in hypothesis testing
• To distinguish between experimental, quasi-experimental, and non-experimental research designs
• To describe major types of experimental and quasi-experimental research designs
• To distinguish between systematic and random errors
• To describe reliability and validity of measures, and the relationship between the two

10 Steps to Building a Study Plan
1. Statement of the problem and its significance
2. Theoretical or conceptual framework
3. Research questions to be answered by the study
4. List of hypotheses to be tested
5. Description of the research design
6. Description of the sample and how it was obtained
7. Definitions of key terms and variables
8. Description of the planned statistical analysis
9. Statement of assumptions and limitations
10. Dissemination plan

Part I. HYPOTHESIS TESTING

Research Questions & Hypotheses

Definition
• Research question: a question, developed from a problem or phenomenon, that a researcher wants to answer in a study
• Hypothesis: a statement making a specific prediction about what is expected to occur as a result of conducting a study

Use
• Research question: when little is known, these questions are used to explore relationships between variables that may ultimately lead to a hypothesis
• Hypothesis: developed from previous research findings and used to explain a relationship between variables that can be tested empirically

Goal
• Research question: explore a phenomenon in an innovative way that has not been done before
• Hypothesis: provide further support for, or contradict, previous findings that have been empirically tested

Hypothesis
• Purpose: to translate the research question into predictions of expected outcomes
• Must stem from the research questions and be grounded in the theoretical framework
• Serves as a guide for data analysis
• “Testable” statements of relations
• The relationship between variables can be an association (no causal effect presumed) or a causal relationship
• The hypotheses stated can be directional (one-sided) or non-directional (two-sided)

Examples of Research Questions and Hypotheses

For each research question: is it testable, is it relational or causal, what hypothesis follows if it is testable, and is that hypothesis directional (one-sided)?

1. How do patients with diabetes comply with their medications?
   – Non-testable; hypothesis: NA; directional: NA
2. Does menopausal hormone therapy increase cardiovascular risk?
   – Testable; causal/relational
   – Hypothesis: Menopausal hormone therapy increases cardiovascular risk.
   – Directional
3. Is there a significant difference between males and females with regard to the incidence of brain tumor?
   – Testable; relational
   – Hypothesis: Males and females have different incidences of brain tumor.
   – Non-directional
4. Are stress and health related?
   – Testable; relational
   – Hypothesis: Stress and health are related.
   – Non-directional
5. What is the relationship between stress and health?
   – Non-testable; hypothesis: NA; directional: NA
6. Are stress and health inversely related, such that health decreases as stress increases?
   – Testable; relational
   – Hypothesis: Stress and health are inversely related, such that health decreases as stress increases.
   – Directional
7. Do increased levels of stress lead to decreased levels of health?
   – Testable; causal
   – Hypothesis: Increased levels of stress will lead to decreased levels of health.
   – Directional

Hypothesis Testing
• Purpose: to permit generalizations from a sample to the population from which it came (i.e., statistical inference)
  – The “true” population value is usually unknowable, but it does exist and can be estimated from an appropriately drawn sample

[Figure: Population → Sample (select a sample from the population); Sample → Population (inference: generalize conclusions from the sample to the population)]

Steps involved in testing statistical hypotheses
1. Formulate the statistical hypothesis (or hypotheses)
2. Decide on the appropriate test statistic for the hypothesis
3. (1) Select the α value
   (2) Determine the critical value (based on α)
4. Perform the calculations for the test statistic & obtain its p-value
5. Interpret the results
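The five steps can be traced in a short Python sketch. It assumes a one-sample, one-tailed z-test with a known population standard deviation, which is only one of many possible test statistics; the values of mu0, sigma, n, and sample_mean are made up for illustration and are not from the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
mu0 = 120.0          # population mean claimed by H0
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 125.0  # mean observed in the sample

# Step 1: formulate the hypotheses. H0: mu = mu0, HA: mu > mu0 (directional).
# Step 2: choose the test statistic. Here: the one-sample z statistic.

# Step 3: select alpha, then determine the critical value from it.
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)

# Step 4: compute the test statistic and its one-tailed p-value.
z = (sample_mean - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)

# Step 5: interpret the result.
reject_h0 = p_value < alpha
decision = "reject H0" if reject_h0 else "fail to reject H0"
print(f"z = {z:.2f}, critical value = {z_critical:.3f}, p = {p_value:.4f}: {decision}")
```

Comparing z with the critical value and comparing the p-value with α always lead to the same decision; they are two views of the same rejection region.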

Statistical Hypotheses

Null hypothesis (H0): a statement claiming there is no relationship between two measured phenomena, or no difference among groups [which the researcher tries to disprove, reject, or nullify]

Alternative hypothesis (H1 or HA): a statement claiming there is a relationship between two measured phenomena, or a difference among groups (one-tailed [directional] or two-tailed [non-directional]) [which the researcher thinks is true and tries to prove]

• Hypotheses are about population parameters

[Figure: sampling distributions with the non-rejection region marked]

Hypothesis Testing & Errors

                            Truth in the population
Decision from your sample   H0 (no relation/difference)          HA (relation/difference exists)
H0 (fail to reject H0)      Correct                              Type II error (β), “false negative”
HA (reject H0)              Type I error (α), “false positive”   Correct

• Decide the level of significance, the alpha value (α):
  – The probability of incorrectly rejecting H0 when H0 is actually true (“Type I error”)
  – α is chosen before the statistical test is performed
  – Traditional values are 0.05 (most common) and 0.01
• Type II error (β): the probability of incorrectly failing to reject H0 when H0 is actually false
• Power (1-β): the probability of rejecting H0 when H0 is false, or accepting HA when HA is true [“the ability to detect a true difference in a population if one exists”]
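One way to make α, β, and power concrete is to simulate many studies and count how often H0 is rejected. The sketch below is illustrative only: it assumes a one-tailed z-test with known σ, and the function name rejection_rate, the effect size of 0.5, and the sample size of 25 are hypothetical choices, not values from the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   reps=2000, seed=1):
    """Fraction of simulated studies in which a one-tailed z-test rejects H0."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_critical:
            rejections += 1
    return rejections / reps

# H0 is true (true mean equals mu0): every rejection is a false positive.
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5 above mu0): every rejection is a correct decision.
power = rejection_rate(true_mean=0.5)
beta = 1 - power  # share of studies that miss the true difference

print(f"Type I error rate ~ {type_i_rate:.3f} (alpha was set to 0.05)")
print(f"Power ~ {power:.3f}, Type II error rate ~ {beta:.3f}")
```

With these settings the simulated Type I error rate should land near the chosen α, and the power near 0.80.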

Statistical Testing

“Reject or Fail to Reject? This is the question”
• If H0 is rejected as a result of sample evidence, then HA is the conclusion (accept HA)
• If there is not enough evidence to reject H0, H0 is retained but not accepted; rather, we “fail to reject (or cannot reject) H0”
• After determining the critical value (based on α), perform the calculations for the test statistic & obtain its p-value
• What is the p-value?
  – The probability of observing a difference equal to or more extreme than what was actually observed, if H0 is true
  – The smaller the p-value, the stronger the evidence against H0
• If the p-value is less than α, H0 is rejected
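The p-value definition can be checked numerically. The sketch below assumes the test statistic is a z value from a standard normal distribution; the helper name p_value and the observed z of 1.65 are illustrative assumptions.

```python
from statistics import NormalDist

def p_value(z, tails=1):
    """p-value for an observed z statistic under H0.

    tails=1 tests HA: mu > mu0 (upper tail only).
    tails=2 tests HA: mu != mu0 (both tails, so |z| is used).
    """
    if tails == 2:
        return 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - NormalDist().cdf(z)

alpha = 0.05
z_observed = 1.65  # illustrative value

p_one = p_value(z_observed, tails=1)
p_two = p_value(z_observed, tails=2)

for label, p in (("one-tailed", p_one), ("two-tailed", p_two)):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: p = {p:.4f}: {decision}")
```

The same observed statistic is significant at α = 0.05 under the directional alternative but not under the non-directional one, which is why the direction of HA has to be fixed before looking at the data.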

[Figure: one-tailed test (H0: μ = μ0; HA: μ > μ0) with critical value Z = +1.65, illustrating the steps: determine the critical value (based on α), perform calculations for the test statistic & obtain its p-value, interpret the results. Sources: J Appl Hematol 2014;5:27-8; http://stats.stackexchange.com/questions/124178/why-do-we-compare-p-value-to-significance-level-in-hypothesis-testing-of-mean]

“Reject or Fail to Reject? This is the question”
• If H0 is rejected as a result of sample evidence, then HA is the conclusion (accept HA)
• If there is not enough evidence to reject H0, H0 is retained but not accepted; rather, we “fail to reject (or cannot reject) H0”

Something like a court decision:

                                 Truth
Court decision                   Not guilty (H0)                      Guilty (HA)
Not guilty (fail to reject H0)   Correct                              Type II error (β), “false negative”
Guilty (reject H0)               Type I error (α), “false positive”   Correct

Confidence Intervals (CIs): another approach to statistical inference
• Gives a range of values, computed from the sample, that is likely to contain the true population value
• Typically constructed as either 95% or 99% CIs
• Interpretation of a 95% CI:
  – If the study were repeated over and over again, drawing different random samples of the same size from the same population, about 95% of the resulting 95% CIs would contain the population mean (μ)
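The repeated-sampling interpretation can be illustrated by simulation. This is a minimal sketch under stated assumptions: the population is normal with a known σ, so a z-based interval is used, and the function name ci_coverage and all numeric settings are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def ci_coverage(mu=50.0, sigma=10.0, n=30, level=0.95, reps=2000, seed=7):
    """Share of repeated studies whose CI for the mean contains the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / reps

coverage_95 = ci_coverage(level=0.95)
coverage_99 = ci_coverage(level=0.99)
print(f"95% CIs containing mu: {coverage_95:.3f}")
print(f"99% CIs containing mu: {coverage_99:.3f}")
```

A 99% CI is wider than a 95% CI built from the same sample, so it captures the true value more often at the cost of being less precise.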

In order to increase the power, we can …
• Increase sample size
• Decrease variability
• Increase α (then β decreases and power increases), but it’s a trade-off
• Select reliable measures
• Use a one-tailed statistical test

https://vwo.com/blog/ab-test-duration-calculator/
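Each lever in the list above can be checked against a simple power formula. The sketch assumes a z-test with known σ and a true mean shift in the direction of HA; z_test_power and the numeric settings are hypothetical, and more reliable measures are represented only indirectly, as a smaller σ.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05, tails=1):
    """Approximate power of a z-test to detect a true mean shift `effect` > 0.

    For tails=2, the very small chance of rejecting in the wrong
    direction is ignored.
    """
    nd = NormalDist()
    z_critical = nd.inv_cdf(1 - alpha / tails)
    shift = effect * sqrt(n) / sigma  # expected value of z when HA is true
    return 1 - nd.cdf(z_critical - shift)

baseline      = z_test_power(effect=5, sigma=15, n=36)
larger_n      = z_test_power(effect=5, sigma=15, n=72)
less_variable = z_test_power(effect=5, sigma=10, n=36)
larger_alpha  = z_test_power(effect=5, sigma=15, n=36, alpha=0.10)
two_tailed    = z_test_power(effect=5, sigma=15, n=36, tails=2)

print(f"baseline (one-tailed, n=36): {baseline:.3f}")
print(f"double the sample size:      {larger_n:.3f}")
print(f"lower variability:           {less_variable:.3f}")
print(f"alpha raised to 0.10:        {larger_alpha:.3f}")
print(f"two-tailed instead:          {two_tailed:.3f}")
```

Raising α buys power only by accepting more false positives, which is the trade-off named in the list.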

Multiple-Choice Concept Review (1)
• The null hypothesis states
  a. the expected direction of the relationship between the variables.
  b. that no relationship will be found.
  c. that a relationship will be found, but it will not state the direction.
  d. none of the above.
• Power can be increased by doing which of the following:
  a. increasing the α level.
  b. increasing the sample size.
  c. increasing the effect size.
  d. all of the above.
• A type I error occurs when the
  a. null hypothesis is accepted when it is false.
  b. null hypothesis is rejected when it is true.
  c. sample size is too small.
  d. effect size is not defined in advance.

Multiple-Choice Concept Review (2)
• Which of the following is more likely to contain the “true” population value of the mean?
  a. A 90% confidence interval (CI).
  b. A 95% CI.
  c. A 99% CI.
  d. All of the above.
• If a statistical test is significant, it means that
  a. It has important clinical application.
  b. The study had acceptable power.
  c. The null hypothesis was rejected.
  d. All of the above are true.

Part II. EXPERIMENTAL VS. NON-EXPERIMENTAL DESIGN

Research Process
1. Identify study question
2. Select study approach
3. Design study & collect data
4. Analyze data
5. Report findings

Common study approaches:
• Review/ systematic review/ meta-analysis
• Cohort study
• Cross-sectional study/ survey
• Ecological study
• Case series
• Qualitative study
• Experimental study

Experimental Study Design
• Assigns participants to receive a particular exposure (intervention)
• Has considerable control over determining who participates in the study, what happens to them, and under what conditions it happens
• Strongest of all research designs in internal validity
  – Gold standard for assessing causality (i.e. whether an intervention causes a particular outcome)
• Typical experimental study design: randomized controlled trial (RCT)
  – Active intervention groups
  – Control group
  – All participants are followed forward in time

Pyramid of Evidence (top to bottom)
• Systematic reviews / meta-analyses
• Randomized controlled trials (RCTs)
• Controlled trials without randomization
• Cohort or case-control studies
• Quasi-experimental studies (e.g. multiple time series with or without intervention)
• Descriptive or qualitative studies, case series/reports, expert opinions

Why do randomized controlled trials (RCTs) represent the gold standard of evidence?

http://www.tfljournal.org/staticpages/index.php?page=Common-Experimental-Designs

*The randomization process ensures that both measured and unmeasured confounding factors are balanced across treatment and control groups.

Control Group vs. Control Variables
“Control” enables one to make inferences about causality
• Control group: in experimental research, a group (for the sake of comparison) that does not receive the treatment/experimental stimulus of interest
• Control variable: an extraneous variable that you do not wish to examine in your study, hence you control for it (i.e. hold its value constant)

[Figure: potential “confounders”. Example 1: Female → Lower income, with Employment as the potential confounder. Example 2: New drug → Lower RA remission, with Disease severity as the potential confounder.]

RCT requires careful definitions of:
• The intervention
  – What intervention? Where and how to receive? When, how often, and duration? Eligibility criteria?
• Randomization
  – Simple randomization, block randomization, stratified randomization
• Selecting controls
  – Placebo (most typical), other active treatment, standard care, same intervention with different doses/durations
  – Helps to avoid the Hawthorne effect (also referred to as the “observer effect”: when participants change their behavior for the better because they know they are being observed)
  – When there are ethical concerns: crossover design could be considered
• The outcome
  – Efficacy (e.g. surrogate/intermediate, survival, quality of life), safety
  – Superiority trial, non-inferiority trial, equivalence trial

METHODS OF RANDOMIZATION

Simple randomization
• Uses a coin toss, random number generator, or simple mechanism
• Note: Does not guarantee balance in numbers during trial

Block randomization
• Divide potential patients into blocks, and then randomize individuals within each block
• Purpose: To keep the sizes of treatment groups similar (ensures equal treatment allocation within each block)

Stratified randomization
• Randomly assigns individuals within certain subgroups (e.g. gender, age, race, disease severity)
• Purpose: To produce comparable groups with regard to certain characteristics
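The three methods can be sketched in a few lines of Python. This is an illustration under assumptions, not a trial-grade allocation system: the two arm labels, the block size of 4, and the "mild"/"severe" strata are hypothetical, and the stratified version simply runs block randomization separately inside each stratum.

```python
import random
from collections import Counter

ARMS = ("Intervention", "Control")

def simple_randomization(n, rng):
    """Independent 'coin toss' per participant; group sizes may end up unequal."""
    return [rng.choice(ARMS) for _ in range(n)]

def block_randomization(n, rng, block_size=4):
    """Shuffle one balanced block at a time (block_size must be a multiple of 2)."""
    assignments = []
    while len(assignments) < n:
        block = list(ARMS) * (block_size // len(ARMS))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n]

def stratified_randomization(strata, rng, block_size=4):
    """Block-randomize separately within each stratum, keeping enrollment order."""
    counts = Counter(strata)
    queues = {s: iter(block_randomization(c, rng, block_size))
              for s, c in counts.items()}
    return [next(queues[s]) for s in strata]

rng = random.Random(2021)  # fixed seed so the sketch is reproducible
simple = simple_randomization(20, rng)
blocked = block_randomization(20, rng)
severity = ["mild"] * 8 + ["severe"] * 8  # hypothetical stratification factor
stratified = stratified_randomization(severity, rng)

print("simple:    ", dict(Counter(simple)))
print("block:     ", dict(Counter(blocked)))
print("stratified:", dict(Counter(zip(severity, stratified))))
```

Only the block and stratified versions guarantee equal arm sizes here; the simple version balances the arms only on average.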

RCT with a Parallel Design
[Figure: Randomization → Group A: Intervention → Outcome; Group B: Control → Outcome]

RCT with a Crossover Design
[Figure: Randomization → Period 1 (Group A: Intervention → Outcome; Group B: Control → Outcome) → Washout period → Period 2 (Group A: Control → Outcome; Group B: Intervention → Outcome)]

Blinding (sometimes called Masking)
• Single-blind, double-blind
  – Q&A: What term is used to describe a trial with no blinding?
• Minimizes bias that can occur if participants or assessors assess outcomes differently based on the results they expect for an exposure
• Usually possible only when all participants are assigned to similar exposures
• Measures to minimize the likelihood of bias in non-blinded studies:
  – Use objective outcome measures

Study Validity
• Internal validity: the ability to draw a causal link between your independent variable (e.g. treatment exposure) and the dependent variable of interest (e.g. outcomes)
• External validity: the ability to generalize your study findings to the population at large
• Relationship between internal & external validity
  – Trade-off
  – Internal validity is a prerequisite to external validity

Questions for discussion:
• How are RCTs usually designed to enhance internal validity, which meanwhile may put external validity at risk? Examples?

RCT vs. Observational Studies

Research question
• Randomized controlled trials (RCTs): Is the drug safe and efficacious?
• Observational studies: Is the drug safe and effective in usual care?

Patient enrollment
• RCTs: Randomized
• Observational studies: Open

Population
• RCTs: Narrowly selected, usually healthier than patients who will eventually use the drug
• Observational studies: Heterogeneous patient population, patients who actually use the drug once marketed

Comparison group
• RCTs: Placebo or inferior drug
• Observational studies: Current therapeutic alternatives

Compliance
• RCTs: Strictly monitored
• Observational studies: As in normal practice

Data collection
• RCTs: Numerous case report forms
• Observational studies: No mandated labs or procedures

Outcomes
• RCTs: Often short-term, surrogate, or intermediate endpoints
• Observational studies: Broader set of outcomes relevant to clinical decision-making

Site distribution
• RCTs: Limited to sites with experience in RCTs
• Observational studies: Primarily community-based sites

Validity
• RCTs: High internal validity, less generalizability to patient population
• Observational studies: Lower internal validity, more generalizable (higher external validity)

Types of Pre-Experimental Research Designs

O = observation or measurement of the dependent/outcome variable
X = exposure to the experimental stimulus or independent variable
R = random assignment to conditions

• Pre-experimental designs lack both random assignment to treatment conditions and control groups

One-Shot Case Study
  Experimental group:  X O

One-Group Pretest-Posttest Design
  Experimental group:  O1 X O2

Static-Group Comparison
  Experimental group:  X O
  Control group ???:     O

Types of Experimental Research Designs (1)

• With “random assignment” of subjects to experimental and control groups

Pretest-Posttest Control Group Design (*Classic*)
  Experimental group:    R O1 X O2
  Control group:         R O1   O2

Solomon Four-Group Design
  Experimental group 1:  R O1 X O2
  Control group 1:       R O1   O2
  Experimental group 2:  R    X O2
  Control group 2:       R      O2

Posttest-Only Control Group Design
  Experimental group:    R X O1
  Control group:         R   O2

Types of Experimental Research Designs (2)

• With “random assignment” of subjects to experimental and control groups

Multiple-Experimental Group with One Control Group Design
  Experimental group 1:  R O1 X1 O2
  Experimental group 2:  R O1 X2 O2
  Experimental group 3:  R O1 X3 O2
  Control group:         R O1    O2

Factorial Design
  Experimental group 1:  R X1 X2 O
  Experimental group 2:  R X1    O
  Experimental group 3:  R    X2 O
  Control group:         R       O

Quasi-Experimental Research Designs (1)
• Why not always use experimental designs?
  – Sometimes we do not have control over administration of the intervention
  – Sometimes it is neither possible nor feasible to have a comparison group, to include or exclude specific subjects, or to decide who should and should not receive the intervention (e.g. policy analysis)
• Quasi-experimental research design
  – Typically lacks random assignment

Nonequivalent Control Group Design
  Experimental group:  O1 X O2
  Control group:       O1   O2
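One common way to analyze a nonequivalent control group design is a difference-in-differences comparison: the pre-to-post change in the experimental group minus the change in the control group. The lecture does not prescribe this analysis, so the sketch below is offered as one option; the scores are made up, and the estimate is only meaningful if the two groups would have changed similarly without the intervention.

```python
from statistics import fmean

def mean_change(pre, post):
    """Average change from the pretest (O1) to the posttest (O2)."""
    return fmean(post) - fmean(pre)

# Made-up outcome scores for illustration only.
exp_o1 = [52, 48, 50, 55, 45]  # experimental group, before X
exp_o2 = [60, 58, 61, 66, 55]  # experimental group, after X
ctl_o1 = [47, 44, 46, 49, 44]  # control group, first measurement
ctl_o2 = [50, 47, 50, 52, 46]  # control group, second measurement

change_exp = mean_change(exp_o1, exp_o2)
change_ctl = mean_change(ctl_o1, ctl_o2)
did = change_exp - change_ctl  # change attributable to X under the assumption above

print(f"Experimental group change: {change_exp:+.1f}")
print(f"Control group change:      {change_ctl:+.1f}")
print(f"Difference-in-differences: {did:+.1f}")
```

Because the groups start from different baselines (no random assignment), comparing the two posttest means directly would mix the treatment effect with pre-existing differences; subtracting each group's own pretest removes that part.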

Quasi-Experimental Research Designs (2)

Single Interrupted Time Series Design*
  O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11

Interrupted Time Series with a Nonequivalent Control Group*
  O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11
  O1 O2 O3 O4 O5 O6   O7 O8 O9 O10 O11

*Subjects can be the same (individual-based) or different (group-based)

Interrupted Time Series Design (ITSD)

Ma Z, et al. Use of Interrupted Time-Series Method to Evaluate the Impact of Cigarette Excise Tax Increases in Pennsylvania, 2000–2009. Prev Chronic Dis 2013;10:120268.

Quasi-Experimental Research Designs (3)
• Regression Discontinuity Design
  – Used more frequently for program evaluation (e.g. education)
  – Used when subjects are assigned to the treatment group based on a quantitative score (cut-off)
  – A discontinuity at the cut-off point between the two regression lines indicates a treatment effect

http://www.socialresearchmethods.net/kb/quasird.php
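The idea of a discontinuity between two regression lines can be shown with a small, noise-free example. Everything in this sketch is an assumption made for illustration: subjects scoring below a cutoff of 50 receive the program, the outcome rises 0.5 per score point, and the program adds 10 points. Real data would be noisy and would need a proper statistical model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

cutoff = 50
true_effect = 10.0

# Made-up assignment scores and outcomes (no noise, for clarity).
scores = list(range(30, 71))
outcomes = [20 + 0.5 * s + (true_effect if s < cutoff else 0.0) for s in scores]

treated = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
untreated = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]

# Fit one regression line on each side of the cutoff.
a_t, b_t = fit_line(*zip(*treated))
a_u, b_u = fit_line(*zip(*untreated))

# The treatment effect estimate is the gap between the two lines at the cutoff.
jump = (a_t + b_t * cutoff) - (a_u + b_u * cutoff)
print(f"Estimated discontinuity at score {cutoff}: {jump:.2f}")
```

If the program had no effect, the two fitted lines would meet at the cutoff and the estimated jump would be close to zero.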

Types of Quantitative Study Design (Review of last lecture)


Non-Experimental Research Designs
• Observational research (no intervention)
• A study in which the researcher CANNOT control, manipulate, or alter the exposure variable or subjects but instead relies on interpretations, observations, or interactions to come to a conclusion
  – Cross-sectional surveys
  – Cohort studies
  – Case-control studies
  – Studies with self-controlled design

http://www.doctordisruption.com/design/design-methods-7-observation/

Part III. MEASUREMENT: RELIABILITY & VALIDITY

Preview – Source of Health Data
• Clinical trials
• Medical records (paper/electronic)
• Surveys & survey databases
• Registries
• Claims/administrative databases
• Other: diaries, video recordings, transcripts of interviews and focus groups
• Pooling/linking from different data sources

• Primary data vs. Secondary data?

Random Errors & Systematic Errors
• Error: defined as the difference between a calculated or observed value and the “true” value

Systematic errors (“bias”)
• Cause of poor accuracy
• Definite causes
• Change the mean of a set of scores

Random errors
• Cause of poor precision (unreliable measurements)
• Non-specific causes
• Change the variation but not the mean of a set of scores
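The contrast between the two error types shows up clearly in simulated measurements. The sketch below is illustrative only: the true value of 120, the constant bias of +5, and the noise levels are assumed numbers, not data from the lecture.

```python
import random
from statistics import fmean, stdev

rng = random.Random(3)  # fixed seed so the sketch is reproducible
true_value = 120.0      # hypothetical "true" value of the quantity measured
n_readings = 1000

# Systematic error: a miscalibrated device shifts every reading by +5.
biased = [true_value + 5.0 + rng.gauss(0, 1) for _ in range(n_readings)]

# Random error: readings scatter widely but with no consistent shift.
noisy = [true_value + rng.gauss(0, 8) for _ in range(n_readings)]

for label, readings in (("systematic", biased), ("random", noisy)):
    print(f"{label:>10} error: mean = {fmean(readings):6.1f}, "
          f"SD = {stdev(readings):4.1f}")
```

Averaging more readings shrinks the effect of random error on the mean, but no amount of averaging removes the +5 shift; that requires fixing the cause, such as recalibrating the device.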

Reliability (consistency, stability, precision)
A matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time

Types and what is being assessed:
• Test-retest reliability: the stability of a test over time when no changes in health have occurred
• Internal consistency: the extent to which a measure is consistent within itself
  – Split-half method: measures the extent to which all parts of the test contribute equally to what is being measured
• Inter-rater reliability: agreement between two raters when assessing the health status/behavior of the same patient
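Inter-rater reliability is often summarized with percent agreement or with Cohen's kappa, which corrects agreement for chance. The lecture does not name a specific statistic, so kappa is offered here as one widely used option; the ratings of ten patients by two raters are made up.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of patients given the same rating by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(r1)
    observed = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement if each rater assigned categories independently
    # at their own observed rates.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Made-up health status ratings of the same ten patients.
rater_a = ["good", "good", "poor", "good", "poor",
           "poor", "good", "good", "poor", "good"]
rater_b = ["good", "good", "poor", "poor", "poor",
           "poor", "good", "good", "good", "good"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

Kappa is lower than raw agreement because two raters who both say "good" most of the time would agree fairly often by chance alone.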

Validity (accuracy)
The extent to which an empirical measure adequately reflects the real meaning of the concept under consideration (i.e. measures what it is supposed to measure)

1. Face validity
2. Content validity
   • The degree to which a measure covers the range of meanings included within a concept
   • Based on well-accepted theoretical definitions, existing accepted standards, or patient or expert interviews
3. Criterion validity
   • The degree to which a measure correlates with, or predicts, one or more external criteria (also called predictive validity)
4. Construct validity
   • The degree to which a measure relates to other variables as expected within a system of theoretical relationships
   • Convergent validity: Test whether the use of different measures of the same construct provides similar results
   • Discriminant validity: Test whether different measures and their underlying construct can be differentiated from other constructs
   • Known-groups validity: Assess the differences between two patient groups known or theorized to differ in some way

Reliability vs. Validity
• A measure can be reliable but not valid (i.e. consistently wrong)
• For a measure to be valid, it must first be reliable (consistent)
• The validity of a measure is much more difficult to assess than its reliability

Quiz
Reliability _____  Validity _____

References
• Jacobsen KH. Introduction to health research methods: a practical guide. Sudbury, MA: Jones & Bartlett Learning; 2012.
• Monette DR, Sullivan TJ, DeJong CR, Hilton T. Applied social research: a tool for the human services. 9th ed. Belmont, CA: Brooks/Cole, Cengage Learning; 2014.
• Plichta SB, Kelvin EA, Munro BH. Munro's statistical methods for health care research. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2012.
• Picardi CA, Masick KD. Research methods: designing and conducting research with a real-world focus. Los Angeles, CA: SAGE Publications; 2014.
• Al Fattani AG. Concept of P-value. J Appl Hematol 2014;5:27-8.
