![Page 1: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/1.jpg)
Biostatistics in Practice
Session 2: Quantitative and Inferential Issues II
Youngju Pak, Biostatistician
http://research.LABioMed.org/Biostat
![Page 2: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/2.jpg)
What we have learned in Session 1?
• Basic study design
– Parallel vs. cross-over designs
– Categorical vs. quantitative data: why is the distinction important?
• Summarizing the data with tables and graphs: contingency tables, box plots, histograms, etc.
• How to run MYSTAT
![Page 3: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/3.jpg)
Today's topics
• Article: McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
• Descriptive statistics vs. inferential statistics
• Normal distributions
• Confidence intervals & p-values
• Correlations
![Page 4: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/4.jpg)
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.
• Target population: children aged 3-4 and 8-9 years
• Study design: randomized, double-blinded, placebo-controlled, crossover trial
• Sample size: 153 (3-year-olds), 144 (8/9-year-olds) in Southampton, UK
• Objective: test whether intake of artificial food colors and additives (AFCA) affects childhood behavior
![Page 5: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/5.jpg)
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
• Sampling: stratified sampling based on SES in Southampton, UK
• Baseline measure: 24-hour recall by the parent of the child's pretrial diet
• Groups: three groups; for 3-year-olds:
– Mix A: 20 mg of food colorings + 45 mg sodium benzoate, a widely used food preservative
– Mix B: 30 mg of food colorings + 45 mg sodium benzoate (current average daily consumption)
– Placebo
– For 8/9-year-olds: multiply these amounts by 1.25
Cross-over Design
• Each participant receives one of 6 possible random sequences of the three drinks.
• In a separate study with N=20, no significant difference in the look and taste of the drinks was found among the three groups, although when people were asked which diet type they had received, the reported percentages ranked placebo (65%) > mix B (52%) > mix A (40%).
[Figure: study timeline from T0 (baseline, typical diet) through Weeks 1 to 6, showing three randomized challenge periods separated by two washout periods]
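The "6 possible random sequences" follow from ordering three drinks (3! = 6). The sketch below enumerates them and assigns one to each participant. It is only an illustration: the slides do not describe the trial's actual randomization scheme, and the names `DRINKS`, `SEQUENCES`, and `assign_sequences` are hypothetical.

```python
import itertools
import random

DRINKS = ["mix A", "mix B", "placebo"]  # the three challenge drinks named on the slide

# Every ordering of three drinks: 3! = 6 possible sequences.
SEQUENCES = list(itertools.permutations(DRINKS))

def assign_sequences(participant_ids, seed=0):
    """Give each participant one of the six sequences at random.

    Assumption: simple (unrestricted) randomization, which does not
    guarantee that the six sequences are used equally often.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(SEQUENCES) for pid in participant_ids}

assignments = assign_sequences(range(1, 11))  # ten hypothetical participants
for pid, seq in assignments.items():
    print(pid, " -> ".join(seq))
```

A real trial would more likely use blocked randomization so that each sequence appears about equally often.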
![Page 6: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/6.jpg)
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
Outcome: Global Hyperactivity (GHA) score, based on:
• Attention-Deficit Hyperactivity Disorder (ADHD) rating scale IV by teachers, scaled 1-5; a higher number means more hyperactive
• Weiss-Werry-Peters (WWP) hyperactivity scale by parents
• Classroom observation code
• Conners continuous performance test II (CPTII)
GHA is aggregated from these four scores.
![Page 7: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/7.jpg)
Non-Completing or Non-Adhering Subjects
• Non-response bias?
• Societal effect vs. scientific effect?
• Efficacy vs. effectiveness?
![Page 8: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/8.jpg)
Describing the sample
![Page 9: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/9.jpg)
Describing the findings w/ descriptive statistics
What was your research question? Did this table answer that research question? Why or why not?
GHA = (post - pre) / standard deviation (SD) of the pre-scores
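As a rough illustration of the formula above, the following sketch standardizes each child's change by the SD of the pre-scores. The scores are made up, and the use of the sample SD (n - 1 denominator) is an assumption. The published GHA combines several measures, so this covers only the standardization step for one score.

```python
import statistics

def standardized_change(pre, post):
    """Return (post - pre) / SD of the pre-scores for each child."""
    sd_pre = statistics.stdev(pre)  # sample SD; an assumption, not stated on the slide
    return [(after - before) / sd_pre for before, after in zip(pre, post)]

pre_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical baseline scores
post_scores = [3.0, 4.0, 5.0, 3.0, 6.0, 5.0, 8.0, 9.0]  # hypothetical post-challenge scores

changes = standardized_change(pre_scores, post_scores)
print([round(c, 2) for c in changes])
```

Dividing by the SD expresses each change in SD units, so that scores measured on different scales can be compared or combined.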
![Page 10: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/10.jpg)
Describing the findings w/ inferential statistics
![Page 11: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/11.jpg)
Describing the findings w/ Graphs using confidence intervals
![Page 12: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/12.jpg)
The Life Cycle of a Research Study With Statistical Applications
[Diagram: Population (population parameter) → sampling mechanism (random sample or convenience sample) → Sample (sample estimate of population parameter) → confidence interval for population parameter]
![Page 13: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/13.jpg)
So why use a sample?
• Often the population is too large to obtain data from
• Saves time and money
• All members of the population may be difficult to contact
Parameter vs. Statistic
• A parameter is a numerical description of a population characteristic, e.g., μ (pronounced "mu"): population mean; σ² (pronounced "sigma squared"): population variance
• A statistic is a numerical description of a sample characteristic, e.g., m: sample mean; S²: sample variance
![Page 14: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/14.jpg)
Branches of Statistics
• Descriptive statistics involves the organization, summarization, and presentation of the sample.
– e.g., sample means, sample standard deviations, histograms, box plots, etc.
• Inferential statistics involves using a sample to draw conclusions about a population.
– e.g., confidence intervals, p-values, etc.
![Page 15: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/15.jpg)
3 questions that statisticians attempt to answer
• How should I collect my data?
– Study design, sample size, statistical power
• How should I analyze and summarize the data that I've collected?
– Displaying the data, descriptive statistics, statistical tests
• How accurate are my data summaries?
– Inferences: confidence intervals, p-values
![Page 16: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/16.jpg)
Mean vs. Median (measures of central tendency)
• Mean
– What most people think of as "average"
– Easy to calculate
– Easily distorted
– Be cautious with SKEWED data
– Calculate: sum of data / number of data points
• Median
– Relatively easy to obtain
– Not affected by extreme values, so it is considered a "ROBUST" statistic
– Calculate:
  • Sort the data
  • If there is an odd number of points, the middle one is the median
  • Otherwise, the median is the average of the middle two numbers
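The calculation steps above can be sketched in a few lines. The data are hypothetical; the second list adds one extreme value to show how the mean is distorted while the median stays put.

```python
def mean(data):
    """Sum of data / number of data points."""
    return sum(data) / len(data)

def median(data):
    """Sort, then take the middle value (or the average of the middle two)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [3, 5, 4, 6, 5]        # hypothetical, roughly symmetric data
with_outlier = values + [60]    # same data plus one extreme value

print("mean:", mean(values), "median:", median(values))
print("mean:", round(mean(with_outlier), 2), "median:", median(with_outlier))
```

With the outlier added, the mean roughly triples while the median is unchanged, which is why the median is preferred for skewed data.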
![Page 17: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/17.jpg)
Standard Deviation (SD) & Inter-Quartile Range (IQR) (measuring the variability of the data)
• Inter-Quartile Range (IQR) = 75th percentile (Q3) - 25th percentile (Q1), where 25% of the data < Q1 and 75% of the data < Q3
• SD is usually used for normally distributed data (bell-shaped, symmetric around the mean)
• IQR is usually used when the data distribution is skewed
• Range = Max - Min
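A minimal sketch of the three variability measures, using Python's standard `statistics` module on made-up, right-skewed data. Note that software packages use slightly different percentile conventions, so Q1 and Q3 may differ a little between tools; `statistics.quantiles` uses its default "exclusive" method here.

```python
import statistics

data = [1, 2, 2, 3, 3, 4, 5, 8, 20]  # hypothetical right-skewed data

sd = statistics.stdev(data)                    # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1                                  # inter-quartile range
data_range = max(data) - min(data)             # range = max - min

print(f"SD = {sd:.2f}, IQR = {iqr:.2f}, range = {data_range}")
```

The single large value (20) inflates the SD and the range much more than the IQR, which depends only on the middle half of the data.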
![Page 18: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/18.jpg)
Checking for the normality
• Symmetric
• One peak
• Roughly bell-shaped
• No outliers
Many statistical tests assume that the outcome variable follows a normal distribution.
![Page 19: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/19.jpg)
Other properties of the normal distribution
For bell-shaped distributions of data (“normally” distributed):
• ~ 68% of values are within mean ±1 SD
• ~ 95% of values are within mean ±2 SD: the "(Normal) Reference Range"
• ~ 99.7% of values are within mean ±3 SD
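These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"mean ± {k} SD: {p:.1%}")
```

The exact multiplier for 95% coverage is about 1.96; the "±2 SD" on the slide is a convenient rounding.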
![Page 20: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/20.jpg)
Histograms: Not OK for Typical Analyses
[Histogram: frequency vs. intensity] Skewed: need to transform intensity to another scale, e.g., log(intensity).
[Histogram: frequency vs. tumor volume] Multi-peak: need to summarize with percentiles, not the mean.
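To illustrate the log transform suggested for skewed data, the sketch below simulates right-skewed "intensity" values and compares mean and median before and after taking logs. The data are simulated from a lognormal distribution purely for illustration; they are not the data behind the slide's histogram.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical right-skewed intensities (lognormal, so log(intensity) is normal).
intensity = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
log_intensity = [math.log(v) for v in intensity]

raw_mean = statistics.mean(intensity)
raw_median = statistics.median(intensity)
log_mean = statistics.mean(log_intensity)
log_median = statistics.median(log_intensity)

print(f"raw scale: mean = {raw_mean:.2f}, median = {raw_median:.2f}")
print(f"log scale: mean = {log_mean:.2f}, median = {log_median:.2f}")
```

On the raw scale the mean sits well above the median, a sign of right skew; on the log scale the two nearly coincide, as expected for a symmetric distribution.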
![Page 21: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/21.jpg)
Summary Statistics: Two Quantitative Variables (Correlation)
• Always look at the scatter plot.
• Correlation, r, ranges from -1 (perfect inverse relation) to +1 (perfect direct relation); zero = no linear relation.
• Specific to the ranges of the two variables.
• Typically, cannot extrapolate to populations with other ranges.
• Measures association, not causation.
![Page 22: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/22.jpg)
Correlation Depends on Range of Data
Graph B contains only the points from graph A that are in the ellipse.
Correlation is reduced in graph B.
Thus: correlation between two quantities may be quite different in different study populations.
Do not extrapolate.
[Figure: scatter plots A and B; B shows only the points of A that lie inside the ellipse]
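The effect of range on correlation can be reproduced with a small simulation. This is only a sketch: the data are simulated, and the subset is chosen by restricting x to a narrow band rather than by the ellipse in the slide's figure.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [xi + rng.gauss(0, 1) for xi in x]   # y depends on x plus noise

r_full = pearson_r(x, y)

# Keep only points whose x lies in a narrow range (hypothetical cut-off of 0.5).
narrow = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]
r_narrow = pearson_r([p[0] for p in narrow], [p[1] for p in narrow])

print(f"full range:   r = {r_full:.2f} (n = {len(x)})")
print(f"narrow range: r = {r_narrow:.2f} (n = {len(narrow)})")
```

The underlying relationship between x and y is the same in both cases; only the spread of x changed, yet r drops substantially in the restricted subset.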
![Page 23: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/23.jpg)
Confidence Interval (CI)
• How well does your sample mean (m) reflect the true (or population) mean? How confident are you? 95%?
• A confidence interval (CI) is an inferential statistic that estimates the true, unknown parameter with an interval, i.e., a range of plausible values.
![Page 24: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/24.jpg)
Confidence Interval for Population Mean
The 95% reference range, or "normal range", is
sample mean ± 2(SD)
The 95% confidence interval (CI) for the (true, but unknown) mean of the entire population is
sample mean ± 2(SD/√N)
SD/√N is called the "standard error of the mean" (SEM).
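The difference between the two intervals is easiest to see side by side. The sketch below computes both from a small made-up sample, using the slide's multiplier of 2; with so few observations a t-based multiplier would be larger, so treat this as an illustration of the formulas only.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # sample SD: spread of individual values
sem = sd / math.sqrt(n)         # standard error of the mean: precision of the mean

reference_range = (mean - 2 * sd, mean + 2 * sd)
confidence_interval = (mean - 2 * sem, mean + 2 * sem)

print(f"mean = {mean}, SD = {sd:.2f}, SEM = {sem:.2f}")
print("reference range:     %.2f to %.2f" % reference_range)
print("confidence interval: %.2f to %.2f" % confidence_interval)
```

The reference range describes where individual values fall and does not shrink as N grows; the confidence interval describes uncertainty about the mean and narrows in proportion to 1/√N.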
![Page 25: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/25.jpg)
Confidence Interval: Case Study
From Table 2:
Confidence interval: -0.14 ± 1.99(1.04/√73) = -0.14 ± 0.24 → -0.38 to 0.10
Normal range: -0.14 ± 1.99(1.04) = -0.14 ± 2.07 → -2.21 to 1.93
The confidence interval is close to the adjusted CI in Table 2: -0.12 (-0.37 to 0.13).
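The case-study arithmetic can be reproduced directly from the summary statistics quoted on the slide (mean -0.14, SD 1.04, N = 73). The multiplier 1.99 is taken from the slide; it is presumably the t critical value for 72 degrees of freedom, used in place of the rounded value 2.

```python
import math

mean_change = -0.14   # sample mean quoted on the slide
sd = 1.04             # sample SD quoted on the slide
n = 73                # sample size quoted on the slide
multiplier = 1.99     # multiplier used on the slide (assumed t critical value)

sem = sd / math.sqrt(n)
ci = (mean_change - multiplier * sem, mean_change + multiplier * sem)
normal_range = (mean_change - multiplier * sd, mean_change + multiplier * sd)

print("95%% CI:       %.2f to %.2f" % ci)
print("normal range: %.2f to %.2f" % normal_range)
```

Because the confidence interval includes 0, these summary statistics alone do not rule out a true mean change of zero.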
![Page 26: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/26.jpg)
P-values !
• Used as evidence against your null hypothesis (H0)
– e.g., H0: no difference in mean GHA scores among the three diets
• Based on a statistical test
– e.g., t-test statistic = signal / noise
– If signal >> noise, the result is statistically significant
• Usually p < 0.05 is called "statistically significant", in favor of the alternative hypothesis (Ha)
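The "signal / noise" idea can be made concrete with a paired t statistic, which suits a cross-over design where each child is compared with himself or herself. This is a sketch on made-up within-child differences, not the analysis used in the paper. The critical value 2.776 is the standard two-sided 5% cut-off for 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical within-child differences (active drink minus placebo).
differences = [1.0, 2.0, 3.0, 4.0, 5.0]

n = len(differences)
signal = statistics.mean(differences)                 # average difference
noise = statistics.stdev(differences) / math.sqrt(n)  # standard error of that average
t_statistic = signal / noise

T_CRITICAL_DF4 = 2.776  # two-sided 5% critical value for n - 1 = 4 degrees of freedom
significant = abs(t_statistic) > T_CRITICAL_DF4

print(f"t = {t_statistic:.2f}, significant at 0.05: {significant}")
```

Comparing |t| with the critical value is equivalent to checking whether p < 0.05; statistical software reports the p-value itself from the t distribution.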
![Page 27: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/27.jpg)
Units and Independence
Experiments may be designed such that each measurement does not give additional independent information.
Many basic statistical methods require that measurements are "independent" for the analysis to be valid.
In mathematics, two events are independent if and only if the occurrence of one event makes it neither more nor less probable that the other occurs.
![Page 28: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/28.jpg)
Experimental Units in Case Study
What is the experimental unit in this study?
1. School
2. Child
3. Parent
4. GHA score (results from three diets)
Are all GHA scores (e.g., 153 x 3 groups = 459 GHA scores for the 3-4 year-old children) independent?
The analysis MUST account for this possible correlation (clustering) if it exists, e.g., with a mixed model allowing for clustering due to schools.
![Page 29: Biostatistics in Practice](https://reader035.vdocuments.net/reader035/viewer/2022062321/5681310c550346895d974869/html5/thumbnails/29.jpg)
Announcements
• Keys for HW1 and HW 2 will be posted on class website by Wednesday.
• Next session will be held on Oct 15 at RB-1