1 introduction to biostatistics descriptive statistics and sample size justification julie a....
Post on 19-Dec-2015
223 views
TRANSCRIPT
![Page 1: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/1.jpg)
1
Introduction to Biostatistics
Descriptive Statistics and
Sample Size Justification
Julie A. Stoner, PhD
October 18, 2004
![Page 2: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/2.jpg)
2
Statistics Seminars
• Goal: Interpret and critically evaluate biomedical literature
• Topics:– Sample size justification– Exploratory data analysis– Hypothesis testing
![Page 3: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/3.jpg)
3
Example #1
• Aim: Compare two antihypertensive strategies for lowering blood pressure– Double-blind, randomized study– 5 mg Enalapril + 5 mg Felodipine ER to 10 mg
Enalapril– 6-week treatment period– 217 patients– AJH, 1999;12:691-696
![Page 4: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/4.jpg)
4
Example #2
• Aim: Demonstrate that D-penicillamine (DPA) is effective in prolonging the overall survival of patients with primary biliary cirrhosis of the liver (PBC)– Mayo Clinic
– Double-blind, placebo controlled, randomized trial
– 312 patients
– Collect clinical and biochemical data on patients
– Reference: NEJM. 312:1011-1015.1985.
![Page 5: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/5.jpg)
5
Example #2
• Patients enrolled over 10 years, between January 1974 and May 1984
• Data were analyzed in July 1986 • Event: death (x)• Censoring: some patients are still alive at end of study (o)
1/1974 5/1984 6/1986
_____________________________X
___________________________o________________________o
![Page 6: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/6.jpg)
6
Statistical Inference
• Goal: describe factors associated with particular outcomes in the population at large
• Not feasible to study entire population
• Samples of subjects drawn from population
• Make inferences about population based on sample subset
![Page 7: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/7.jpg)
7
Why are descriptive statistics important?
• Identify signals/patterns from noise
• Understand relationships among variables
• Formal hypothesis testing should agree with descriptive results
![Page 8: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/8.jpg)
8
Outline• Types of data
– Categorical data– Numerical data
• Descriptive statistics– Measures of location– Measures of spread
• Descriptive plots
![Page 9: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/9.jpg)
9
Types of Data
• Categorical data: provides qualitative description– Dichotomous or binary data
• Observations fall into 1 of 2 categories
• Example: male/female, smoker/non-smoker
– More than 2 categories
• Nominal: no obvious ordering of the categories
– Example: blood types A/B/AB/O
• Ordinal: there is a natural ordering
– Example: never-smoker/ex-smoker/light smoker/heavy smoker
![Page 10: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/10.jpg)
10
Types of Data
• Numerical data (interval/ratio data)– Provides quantitative description
– Discrete data
• Observations can only take certain numeric values
• Often counts of events
• Example: number of doctor visits in a year
– Continuous data
• Not restricted to take on certain values
• Often measurements
• Example: height, weight, age
![Page 11: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/11.jpg)
11
Descriptive Statistics: Numerical Data
• Measures of location
– Mean: average value
For n data points, x1, x2,, …, xn the mean is the sum of the observations divided by the number of observations
n
i
ixn
x1
1
![Page 12: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/12.jpg)
12
Descriptive Statistics: Numerical Data
• Measures of location– Mean:
• Example: Find the mean triglyceride level (in mg/100 ml) of the following patients
159, 121, 130, 164, 148, 148, 152
Sum = 1022, Count = 7,
Mean = 1022/7 = 146
![Page 13: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/13.jpg)
13
Descriptive Statistics: Numerical Data
• Measures of location
– Percentile: value that is greater than a particular percentage of the data values
• Order data• Pth percentile has rank r = (n+1)*(P/100)
– Median: the 50th percentile, 50% of the data values lie below the median
![Page 14: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/14.jpg)
14
Descriptive Statistics: Numerical Data
• Measures of location– Median
• Example: Find the median triglyceride level from the sample
159, 121, 130, 164, 148, 148, 152
Order: 121, 130, 148, 148, 152, 159, 164
Median: rank = (7+1) * (50/100) = 4
4TH ordered observation is 148
![Page 15: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/15.jpg)
15
Descriptive Statistics: Numerical Data
• Measures of location
– Mode: most common element of a set
– Example: Find the mode of the triglyceride values
159, 121, 130, 164, 148, 148, 152
Mode = 148
![Page 16: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/16.jpg)
16
Descriptive Statistics: Numerical Data
• Measures of location: comparison of mean and median– Example: Compare the mean and median from
the sample of triglyceride levels
159, 141, 130, 230, 148, 148, 152
Mean = 1108/7=158.29, Median = 148 – The mean may be influenced by extreme data
points.
![Page 17: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/17.jpg)
17
Skewed Distributions• Data that is not symmetric and bell-shaped is skewed.
• Mean may not be a good measure of central tendency. Why?
Positive skew, or skewed to the right, mean > median Negative skew, or skewed to the
left, mean < median
![Page 18: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/18.jpg)
18
Motivation
• Example:
1) 2 60 100 =54
2) 53 54 55 =54
• Both data sets have a mean of 54 but scores in set 1 have a larger range and variation than the scores in set 2.
![Page 19: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/19.jpg)
19
Descriptive Statistics: Numerical Data
• Measures of spread– Variance: average squared deviation from the
mean
For n data points, x1, x2,, …, xn the variance is
– Standard deviation: square root of variance, in same units as original data
2
1
2
1
1
n
ii xx
ns
![Page 20: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/20.jpg)
20
Descriptive Statistics: Numerical Data
• Measures of spread– Standard Deviation:
• Example: find the standard deviation of the triglyceride values
159, 121, 130, 164, 148, 148, 152
Distance from mean: 13, -25, -16, 18, 2, 2, 6
Sum of squared differences: 1418
Standard deviation: sqrt(1418/6)=15.37
![Page 21: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/21.jpg)
21
Descriptive Statistics: Numerical Data
• Standard deviation: How much variability can we expect among individual responses?
• Standard error of the mean: How much variability can we expect in the mean response among various samples?
![Page 22: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/22.jpg)
22
Descriptive Statistics: Numerical Data
• The standard error of the mean is estimated as
where s.d. is the estimated standard deviation• Based on the formula, will the standard error of the mean
will always be smaller or larger than the standard deviation of the data?– Answer: smaller
n
dsmes
.....
![Page 23: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/23.jpg)
23
Descriptive Statistics: Numerical Data
• Measures of spread
– Minimum, maximum
– Range: maximum-minimum
– Interquartile range: difference between 25th and 75th percentile, values that encompass middle 50% of data
![Page 24: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/24.jpg)
24
Descriptive Statistics: Numerical Data
• Measures of spread– Example: find the range and the interquartile range for
the triglyceride values
159, 121, 130, 164, 148, 148, 152
Range: 164 - 121 = 43
Interquartile Range:
Order: 121, 130, 148, 148, 152, 159, 164
IQR: 159 - 130 = 29
![Page 25: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/25.jpg)
25
Descriptive Statistics: Numerical Data
• Helpful to describe both location and spread of data– Location: mean
Spread: standard deviation
– Location: median
Spread: min, max, range
interquartile range
quartiles
![Page 26: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/26.jpg)
26
Descriptive Statistics: Categorical Data
• Measures of distribution
– Proportion:
Number of subjects with characteristics
Total number subjects
– Percentage:
Proportion * 100%
![Page 27: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/27.jpg)
27
Descriptive Statistics: Categorical Data
• Measures of distribution: example
• What percentage of vaccinated individuals developed the flu?198/400 = 0.495 49.5%
No Flu Flu
Vaccinated 202 198 400
Not Vaccinated
179 221 400
381 419 800
![Page 28: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/28.jpg)
28
Example
• Consider the table of descriptive statistics for characteristics at baseline
• What do we conclude about comparability of the groups at baseline in terms of gender and age?
Parameter Enalapril+Felodipine ER
Enalapril
Number 109 108
Gender(% Male)
61% 54%
Age yearsMean (SD)
52(9) 53(11)
![Page 29: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/29.jpg)
29
Descriptive Plots:
• Single variable
– Bar plot
– Histogram
– Box-plot
• Multiple variables
– Box-plot
– Scatter plot
– Kaplan-Meier survival plots
![Page 30: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/30.jpg)
30
Barplot
• Goal: Describe the distribution of values for a categorical variable
• Method: – Determine categories of response– For each category, draw a bar with height equal
to the number or proportion of responses
![Page 31: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/31.jpg)
31
Barplot
![Page 32: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/32.jpg)
32
Histogram
• Goal: Describe the distribution of values for a continuous variable
• Method: – Determine intervals of response (bins)– For each interval, draw a bar with height equal
to the number or proportion of responses
![Page 33: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/33.jpg)
33
Histogram
![Page 34: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/34.jpg)
34
Box-plot
• Goal: Describe the distribution of values for a continuous variable
• Method:
– Determine 25th, 50th, and 75th percentiles of distribution
– Determine outlying and extreme values
– Draw a box with lower line at the 25th percentile, middle line at the median, and upper line at the 75th percentile
– Draw whiskers to represent outlying and extreme values
![Page 35: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/35.jpg)
35
Boxplot
Median
25th percentile
75th percentile
![Page 36: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/36.jpg)
36
Box-plot
![Page 37: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/37.jpg)
37
Scatter Plot
• Goal: Describe joint distribution of values from 2 continuous variables
• Method: – Create a 2-dimensional grid (horizontal and
vertical axis)– For each subject in the dataset, plot the pair of
observations from the 2 variables on the grid
![Page 38: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/38.jpg)
38
Scatter Plot
![Page 39: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/39.jpg)
39
Scatter Plot
![Page 40: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/40.jpg)
40
Kaplan-Meier Survival Curves
• Goal: Summarize the distribution of times to an event
• Method: – Estimate survival probabilities while
accounting for censoring– Plot the survival probability corresponding to
each time an event occurred
![Page 41: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/41.jpg)
41
Kaplan-Meier Survival Curves
![Page 42: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/42.jpg)
42
Kaplan-Meier Survival Curves
![Page 43: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/43.jpg)
43
Kaplan-Meier Survival Curves
![Page 44: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/44.jpg)
44
Descriptive Plots Guidelines
• Clearly label axes
• Indicate unit of measurement
• Note the scale when interpreting graphs
![Page 45: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/45.jpg)
45
Descriptive Statistics
Exercises
![Page 46: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/46.jpg)
46
Example
• Below are some descriptive plots and statistics from a study designed to investigate the effect of smoking on the pulmonary function of children
• Tager et al. (1979) American Journal of Epidemiology. 110:15-26
![Page 47: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/47.jpg)
47
Example
• The primary question, for this exercise, is whether or not smoking is associated with decreased pulmonary function in children, where pulmonary function is measured by forced expiratory volume (FEV) in liters per second.
• The data consist of observations on 654 children aged 3 to 19.
![Page 48: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/48.jpg)
48
• Proportion Male:– (336/654)100% = 51.4%
• Proportion Smokers:– (65/654)100% = 9.9%
• Proportion of Smokers who are Male:– (26/65)100% = 40%
SMOKING STATUS
GENDER Non-smoker
Smoker Total
Female 279 39 318 Male 310 26 336 Total 589 65 654
![Page 49: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/49.jpg)
49
Compare the FEV1 distribution between smokers and non-smokers
• Answer– The smokers appear to have higher FEV values and therefore better lung function. Specifically, the median FEV for smokers is 3.2 liters/sec. (IQR 3.75-3=0.75) compared to a median FEV of 2.5 liters/sec. (IQR 3-2=1) for non-smokers.
![Page 50: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/50.jpg)
50
Compare the age distribution between smokers and non-smokers.
• Answer:– The smokers are older than the non-smokers in general. Specifically, the median age for the smokers is 13 years (IQR 15-12=3) compared to 9 years (IQR 11-8=3) for the non-smokers.
![Page 51: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/51.jpg)
51
Can you explain the apparent differences in pulmonary function between smokers and non-smokers displayed in Figure 1?
• The relationship between FEV and smoking status is probably confounded by age (smokers are older and older children have better lung function). A comparison of FEV between smokers and non-smokers should account for age.
![Page 52: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/52.jpg)
52
Sample Size Justification
![Page 53: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/53.jpg)
53
Outline
• Statistical Concepts: hypotheses and errors
• Effect size and variation
• Influence on sample size and power
![Page 54: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/54.jpg)
54
Sample Size Justification
• Example: Intensifying Antihypertensive Treatment
– “A sample size calculation indicated that 114 patients per treatment group would be necessary for 90% power to detect a true mean difference in change from baseline of 3 mm Hg in sitting DBP between the two randomized treatment groups. This calculation assumed a two-sided test, =0.05, and standard deviation in sitting DBP of 7 mm Hg.”
Source: AJH. 1999;12:691-696
![Page 55: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/55.jpg)
55
Importance of Careful Study Design
• Goal of sample size calculations:
– Adequate sample size to detect clinically meaningful treatment differences
– Ethical use of resources
• Important to justify sample size early in planning stages
• Examples of inadequate power:
– NEJM 299:690-694, 1978
![Page 56: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/56.jpg)
56
Type of Response
• Sample size calculations depend on type of response variable and method of analysis
– Continuous response
• Example: cholesterol, weight, blood pressure
– Dichotomous response
• Example: yes/no, presence/absence, success/failure
– Time to event
• Example: survival time, time to adverse event
![Page 57: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/57.jpg)
57
Statistical ConceptsHypotheses
• Null hypothesis: H0
– Typically a statement of no treatment effect
– Assumed true until evidence suggests otherwise
– Example: H0: No difference in DBP between treatment groups
• Alternative: HA
– Reject null hypothesis in favor of alternative hypothesis
– Often two-sided
– Example: HA: DBP differs between treatment groups
![Page 58: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/58.jpg)
58
Statistical ConceptsHypotheses
• Alternative hypothesis may be one-sided or two-sided
– Example:
• Null hypothesis: Mean DBP is same in patients receiving different treatments
• Alternative hypothesis:
– One-sided: Mean DBP is lower in patients receiving treatment A
– Two-sided: Mean DBP is different in patients receiving treatment A relative to treatment B
• Choice of alternative does affect sample size calculations. Typically a two-sided test is recommended.
![Page 59: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/59.jpg)
59
Statistical ConceptsErrors
• Errors associated with hypothesis testingTRUTH
Association NoAssociation
STUDYRejectNull
Correct Type I ErrorFalsepositive
Fail toRejectNull
Type IIErrorFalsenegative
Correct
![Page 60: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/60.jpg)
60
Statistical ConceptsSignificance Level
• Significance level: – Probability of a Type I error
– Probability of a false positive
– Example: If the effect on DBP of the treatments do not differ, what is the probability of incorrectly concluding that there is a difference between the treatments?
– When calculating sample size, we need to specify a significance level, meaning, the probability that we will detect a treatment effect purely by chance.
– Typically chosen to be 5%, or 0.05
![Page 61: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/61.jpg)
61
Statistical ConceptsPower
• Power: (1-)– Probability of detecting a true treatment effect
– (1- probability of a false negative) = (1-probability of Type II error) = (1-) = probability of a true positive
– Example: If the effects of the treatments do differ, what is the probability of detecting such a difference?
– Typically chosen to be 80-99%
![Page 62: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/62.jpg)
62
Treatment Effect
• What is the minimal, clinically significant difference in treatments we would like to detect?
• Pilot studies may indicate magnitude
• Example: The authors felt that a 3 mm Hg difference in DBP between the treatment groups was clinically significant
![Page 63: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/63.jpg)
63
Variability in Response
• To estimate sample size, we need an estimate of the variability of the response in the population
• Estimate variability from pilot or previous, related study
• Example: The authors estimate that the standard deviation of DBP is 7 mm Hg.
![Page 64: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/64.jpg)
64
Factors Influencing Sample Size
Assuming all other factors fixed,
power sample size
significance level sample size
variability in response sample size
significant difference sample size
![Page 65: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/65.jpg)
65
Factors Influencing Power
Assuming all other factors fixed,
significance level power
significant difference power
variability in response power
sample size power
![Page 66: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/66.jpg)
66
Summary
• Sample size calculations are an important component of study design
• Want sufficient statistical power to detect clinically significant differences between groups when such differences exist
• Calculated sample sizes are estimates
• Can manipulate sample size formulas to determine:– What is the power for detecting a particular difference given the
sample size employed?
– What difference can be detected with a certain amount of power given the sample size employed?
![Page 67: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/67.jpg)
67
Factors Influencing Sample Size
• A double-blind randomized trial was conducted to determine how inhaled corticosteroids compare with oral corticosteroids in the management of severe acute asthma in children. In the study, 100 children were randomized to receive one dose of either 2 mg of inhaled fluticasone or 2 mg of oral prednisone per kilogram of body weight. The primary outcome was forced expiratory volume (as a percentage of the predicted value) 4 hours after treatment administration.
• Schuh et al., (2000) NEJM. 343(10)689-694.
![Page 68: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/68.jpg)
68
Factors Influencing Sample Size
• The null hypothesis is that the mean FEV, as a percentage of predicted value, is the same for both treatment groups.
• The alternative hypothesis is that the mean FEV, as a percentage of predicted value, is different for the two treatment groups.
![Page 69: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/69.jpg)
69
• What is a Type I Error in this example?
– Incorrectly concluding that the treatments differ
• What is a Type II Error in this example?– Failing to detect a true treatment difference
![Page 70: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/70.jpg)
70
In the article the authors state “In order to
allow detection of a 10 percentage point difference between the groups in the degree of improvement in FEV (as a percentage of the predicted value) from base line to 240 minutes and to maintain an error of 0.05 and a error of 0.10, the required size of the sample was 94 children.”.
What is the power of the study and what does it mean?
What is the significance level of the study and what does this level mean?
![Page 71: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/71.jpg)
71
• Power:– The power is 90%– There is a 90% chance of detecting a treatment
difference of 10 percentage points, given such a difference really exists
• Significance Level:– The significance level is 0.05– There is a 5% chance of concluding the
treatments differ when in fact there is no difference
![Page 72: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/72.jpg)
72
• Assuming a 5 percentage point difference between the groups, what happens to power?– The power of the study, as proposed, would be
less than 90%
• Assuming an 0.01 significance level what happens to power?– The power of the study, as proposed, would be
less than 90%
![Page 73: 1 Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 18, 2004](https://reader034.vdocuments.net/reader034/viewer/2022042702/56649d2c5503460f94a01855/html5/thumbnails/73.jpg)
73
References
Descriptive Statistics
• Altman, D.G., Practical Statistics for Medical Research. Chapman & Hall/CRC, 1991.
Sample Size Justification
• Freiman, J. A. et al. “The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: Survey of 72 “negative” trials. N Engl J Med. 299:690-694, 1978.
• Friedman, L. M., Furberg, C. D., DeMets, D. L., Fundamentals of Clinical Trials, Springer-Verlag, 1998, Chapter 7.
• Lachin, J. M. “Introduction to sample size determination and power analysis for clinical trials”. Controlled Clinical Trials. 2:93-113. 1981.