biostatistics. why statistics? you want to make the strongest conclusions based on limited data...
Post on 21-Dec-2015
Why Statistics?
You want to make the strongest conclusions based on limited data
Differences in biological systems sometimes cannot be easily observed
Random variation?
Real difference?
Statistics are sometimes unnecessary
Large differences in observed events
And small scatter within groups
In most instances, though, the use of statistics can provide you with mathematically-based conclusions
Clinical research
Field research
Statistics extrapolate from sample to population
The only way to draw absolute conclusions about a population is to measure the trait(s) of interest of every individual in that population
The reality is, this is almost always impossible to do
Thus, randomly sampling some of the individuals can provide information about the entire population
Sometimes random sampling can be difficult to define
If your sample is not random, then conclusions drawn from it are not reliable
Samples and Populations
Quality control
A company manufactures 20,000 vials (population) of a vaccine from a single production run
About 50 vials (sample) are taken from this production run and analyzed for a variety of characteristics
The results on 50 vials are then extrapolated to the remaining vials
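The vial example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the potency values are simulated (made-up numbers, not real assay data), and the names `population` and `sample` are hypothetical. It shows that the mean of 50 randomly chosen vials lands close to the mean of all 20,000.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: a simulated potency value for each of 20,000 vials.
population = [random.gauss(100.0, 2.0) for _ in range(20_000)]

# Draw 50 vials at random, without replacement.
sample = random.sample(population, 50)

sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)

print(f"sample mean     = {sample_mean:.2f}")
print(f"population mean = {population_mean:.2f}")
```

In a real production run the population mean is unknown, which is exactly why the sample is taken.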
Samples and Populations
Political polls
The number of eligible U.S. voters is about 125,000,000 (population)
A few hundred or thousands (sample) are asked to respond to political questions
Samples and Populations
Clinical studies
Patients in a clinical study (sample) have a clinical condition (e.g., disease)
They rarely reflect the entire population
However, they often reflect the population with the condition
Sampling humans can be particularly difficult
Samples and Populations
Field experiments
Local variations
Impact of weather
Environmental conditions/changes
Human impact
Sampling bias
Samples and Populations
Laboratory experiments
Usually not necessary
Highly-controlled experiments
Single variable
Genetically-defined organisms
Very little variation
What statistical calculations can do
Statistical estimation
The mean calculated from a sample is a precise number
However, that number is only an estimate of the mean of the whole population
Statistical hypothesis testing
Helps determine whether an observed difference could be due simply to random chance
Provides a P value; if P is small, the difference is unlikely to be due to random chance alone and is called statistically significant
Statistical modeling
Tests how well experimental data fit a mathematical model
The most common form of statistical modeling is linear regression
Linear regression (LR) usually determines the best straight line through a set of data points
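As a minimal sketch of what "best straight line" means, the function below fits a line by ordinary least squares, the usual criterion for simple linear regression. The function name `fit_line` and the dose/response numbers are hypothetical and chosen only for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative data: dose (x) and measured response (y).
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(doses, responses)
print(f"response = {slope:.2f} * dose + {intercept:.2f}")
```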
What statistical calculations cannot do
Analysis of a simple experiment
Define a population you are interested in
Randomly select a sample of subjects to study
Randomly split the sample subjects into two groups
One group gets one treatment
The other group gets another treatment
Measure a single variable trait in each subject
Use statistical tests to determine if there’s a difference between the groups
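The random split into two treatment groups can be sketched as follows. The helper `randomize_two_groups` and the subject IDs are hypothetical; the point is only that a shuffle, not the experimenter, decides who gets which treatment.

```python
import random

def randomize_two_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical sample of 20 subjects, labeled S01 to S20.
subject_ids = [f"S{i:02d}" for i in range(1, 21)]
treatment_a, treatment_b = randomize_two_groups(subject_ids, seed=1)

print("Treatment A:", sorted(treatment_a))
print("Treatment B:", sorted(treatment_b))
```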
What statistical calculations cannot do
The problems with real experiments
Populations can be more diverse than your samples
Samples are often collected for convenience rather than randomly
The measured value is a proxy value for what you’re really interested in
Errors in data collection
Data may be recorded incorrectly
Assays may not report what you think they report
You need to combine different types of measurements to reach an overall conclusion (multiple variables)
Why statistics are difficult to learn
Deceptive terminology (significant, error, hypothesis)
Statistical conclusions are never absolute (statistically significant)
Statistics uses abstract concepts (populations, probabilities)
Statistics are at the interface of math and science
Many statistical calculations require complex math
Variables
Independent variable - The variable scientists manipulate to evaluate a response
Dependent variable - The variable (i.e., trait) resulting from a treatment with an independent variable
Variables
Types of variables in biology
Measurement variables
Continuous
Discontinuous
Ranked variables
Attributes
Variables
Measurement variables - Those whose differing states can be expressed in a numerically-ordered fashion
Continuous
Can assume any value between two distinct points
For example, there are infinite numbers between 1.5 and 1.6
Include: lengths, areas, volumes, weights, angles, temperatures, periods of time, percentages, rates
Discontinuous (discrete)
Can take only certain fixed numerical values
The number of segments in an insect’s appendage may be 4, 5, or 6, but not 4.3
Variables
Ranked variables
Variables that cannot be measured, but can be placed in order (ranked)
For example, order of emergence of pupae without regard to time
Attribute variables
Variables that cannot be measured, but must be expressed qualitatively
For example: black/white; pregnant/nonpregnant; male/female; live/dead
Appropriate tests
Design: 1 variable, 1 sample
Measurement variables: computing median and frequencies; computing means; computing standard deviations
Attribute variables: confidence limits for percentages; runs test for randomness
Design: 1 variable, 2 samples
Measurement variables: t-tests; test of equality; paired comparisons test
Ranked variables: Mann-Whitney U-test; Kolmogorov-Smirnov two-sample test
Attribute variables: testing differences between two percentages
Design: 1 variable, 2+ samples
Measurement variables: ANOVA; Tukey-Kramer test
Ranked variables: Kruskal-Wallis test; Friedman’s randomized block test
Attribute variables: G-test for percentages
Design: 2 variables, 1 sample
Measurement variables: regression analysis; polynomial regression; Olmstead and Tukey’s corner test
Ranked variables: ordering test; Spearman’s rank test
Attribute variables: chi-square test; Fisher’s exact test
Means and Standard Deviations
The mean is the average of a measured trait in a sample from a population
In biology, we usually compare two or more populations, which we call groups
The standard deviation measures the spread of values around the mean (it is the square root of the variance)
Many statistical tests use means and standard deviations to determine if there are significant differences between groups
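A short sketch of these quantities using Python's standard `statistics` module. The two groups and their values are made up for illustration. Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas, and that the standard deviation is the square root of the variance.

```python
import statistics

# Made-up measurements of one trait in two groups.
groups = {
    "control": [4.1, 5.0, 4.7, 5.3, 4.9],
    "treated": [6.2, 5.8, 6.5, 6.0, 6.4],
}

summary = {}
for name, values in groups.items():
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance (divides by n - 1)
    sd = statistics.stdev(values)           # square root of the sample variance
    summary[name] = (mean, variance, sd)
    print(f"{name}: mean = {mean:.2f}, variance = {variance:.3f}, SD = {sd:.3f}")
```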
Null hypothesis
Assumes there is no real difference or effect (for example, that two groups have the same mean)
Statistics can be used to reject the null hypothesis, but never to prove it
Rejecting it lends support to an alternative hypothesis
Nearly every experiment that uses statistics should define null and alternative hypotheses
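One way to make the null hypothesis concrete is a permutation test, a technique not covered in these slides and used here only as an illustration. If the null hypothesis is true and the group labels do not matter, then shuffling the labels should often produce a difference in means at least as large as the one observed. The function name, the data, and the number of shuffles are all assumptions of this sketch.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for the difference in means by label shuffling."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up data. Null hypothesis: both groups have the same mean.
control = [4.1, 5.0, 4.7, 5.3, 4.9]
treated = [6.2, 5.8, 6.5, 6.0, 6.4]
p_value = permutation_p_value(control, treated)
print(f"P = {p_value:.4f}")
```

A small P value here means that random relabeling rarely reproduces the observed difference, which is evidence against the null hypothesis.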
Student’s T-test
Determines if there is a significant difference between the means of two groups of measured data
Paired - compares matched values between members of a group
Unpaired - assumes values between members are not related
Assumes values fit a normal (a.k.a. Gaussian) distribution (“bell curve”)
If they do not, then use nonparametric testing
One-tailed vs. two-tailed
One-tailed: You must specify which group will have a larger mean in advance of data collection
Two-tailed: You do not know which group will have a larger mean in advance of data collection
Student’s T-test
P value: Is there a significant difference between the means of the two groups?
Generally, if the P value is less than or equal to 0.05, then the difference is considered significant
t-value:
Positive if the first mean is larger than the second and negative if it is smaller
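The sign convention for the t-value can be seen in a minimal sketch of the unpaired test statistic. This assumes the classic equal-variance (pooled) form of Student's t-test. The function name `unpaired_t` and the data are hypothetical. The sketch stops at the t-value and degrees of freedom; converting them to a P value requires the t distribution, normally taken from a statistics package or a table.

```python
import math
import statistics

def unpaired_t(group_1, group_2):
    """Return (t, degrees_of_freedom) for an unpaired, equal-variance t-test."""
    n1, n2 = len(group_1), len(group_2)
    mean1, mean2 = statistics.fmean(group_1), statistics.fmean(group_2)
    var1, var2 = statistics.variance(group_1), statistics.variance(group_2)
    df = n1 + n2 - 2
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Made-up data: the first group has the smaller mean, so t is negative.
control = [4.1, 5.0, 4.7, 5.3, 4.9]
treated = [6.2, 5.8, 6.5, 6.0, 6.4]
t_value, df = unpaired_t(control, treated)
print(f"t = {t_value:.2f}, df = {df}")
```

Swapping the order of the two groups flips the sign of t but leaves its magnitude unchanged.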
Student’s T-test
Confidence interval
The mean calculated from your sample is unlikely to be exactly the same as the mean of the entire population
Assumes your samples are randomly collected and fit a normal distribution
If your sample is large with a small standard deviation, then your calculated mean likely is close to the actual mean
The CI is calculated from the sample mean, the sample size, and the standard deviation
If the CI is 95%, then the calculated range around your sample mean (which is not the same thing as the standard deviation) probably (95% confidence) includes the actual mean of the population under study
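A rough sketch of the calculation, under a stated simplification: it uses the normal distribution (a multiplier of about 1.96 for 95%) rather than the t distribution, so it is a large-sample approximation, and for small samples the proper interval is somewhat wider. The function name and data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Approximate CI for the mean using the normal distribution (large-sample form)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: SD shrinks by the square root of the sample size.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 when confidence is 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Made-up measurements.
values = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 4.8, 5.2, 4.5]
low, high = mean_confidence_interval(values)
print(f"mean = {statistics.fmean(values):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A larger sample or a smaller standard deviation narrows the interval, which matches the point that such a sample mean is likely close to the actual mean.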