SAA 2023 Computational Technique for Biostatistics: Introduction & Descriptive Statistics
![Page 1: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/1.jpg)
SAA 2023 COMPUTATIONAL TECHNIQUE FOR BIOSTATISTICS
Introduction & Descriptive Statistics
![Page 2: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/2.jpg)
Introduction
Statistics - the technology used to describe and measure aspects of nature from samples.
Statistics lets us quantify the uncertainty of these measures.
![Page 3: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/3.jpg)
Introduction
Statistics is also about good scientific practice.
The history of statistics has its roots in biology.
![Page 4: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/4.jpg)
Sir Francis Galton
Pioneer of fingerprint classification; studied the heredity of quantitative traits
Regression & correlation
Also: efficacy of prayer, attractiveness as a function of distance from London
![Page 5: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/5.jpg)
Karl Pearson
Polymath
Studied genetics
Correlation coefficient, χ² test, standard deviation
![Page 6: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/6.jpg)
Sir Ronald Fisher
The Genetical Theory of Natural Selection
Founder of population genetics
Analysis of variance, likelihood, P-value, randomized experiments, multiple regression, etc.
![Page 7: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/7.jpg)
Statistical quotations
There are three kinds of lies: lies, damn lies, and statistics. Benjamin Disraeli / Mark Twain
It is easy to lie with statistics, but easier to lie without them. Frederick Mosteller
![Page 8: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/8.jpg)
Goals of statistics
Estimation
- Infer an unknown quantity of a population using sample data
Hypothesis testing
- Differences among groups
- Relationships among variables
![Page 9: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/9.jpg)
Introduction
Introduction to the basic concepts of statistics as applied to problems in biological science.
Goals of the course
- Understand statistical concepts (population, sample, slope, significance, etc.);
- Identify appropriate methods for your data (e.g., one-sample or two-sample, paired or independent t-test, one-way or two-way ANOVA);
- Select the correct MINITAB procedures to analyze data;
- Scientific reading and interpretation.
![Page 10: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/10.jpg)
Biostatistics
Why study Biostatistics?
- Statistical methods are widely used in the biological field;
- Examples are from the biological field, practical and useful;
- Focus on application instead of mathematical derivation;
- Helps you evaluate a paper in an intelligent manner.
Statistics - the science and art of obtaining reliable results and conclusions from data that are subject to variation.
Biostatistics (Biometry) - the application of statistics to the biological sciences.
![Page 11: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/11.jpg)
Biostatistics
Why Computer Applications?
- Statistical methods are often difficult and complicated (ANOVA, regression, etc.);
- Advances in computer technology and statistical software make applying statistical methods much easier today than before;
- Software such as MINITAB takes time to learn.
![Page 12: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/12.jpg)
Is Biostatistics hard to study?
Factors that make statistics hard for some students to learn:
- The terminology is deceptive. To understand statistics, you have to understand that the statistical meanings of terms such as significant, error and hypothesis are distinct from the ordinary uses of these words.
![Page 13: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/13.jpg)
Is Biostatistics hard to study?
- Statistics requires mastering abstract concepts. It is not easy to think about theoretical concepts such as populations, probability distributions, and null hypotheses.
- Statistics is at the interface of mathematics and science. To really grasp the concepts of statistics, you need to be able to think about it from both angles.
![Page 14: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/14.jpg)
Is Biostatistics hard to study?
- The derivation of many statistical tests involves difficult math. However, you can learn to use statistical tests and interpret the results even if you do not fully understand how they work. You only need to know enough about how each tool works to avoid using it in inappropriate situations.
![Page 15: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/15.jpg)
Is Biostatistics hard to study?
Basically, you can calculate statistical tests and interpret results even if you don’t understand how the equations were derived, as long as you know enough to use the statistical tests appropriately.
![Page 16: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/16.jpg)
Questions about this course
Is this course going to be hard?
- No. The concepts are easy and the procedures are clear.
Why do we spend time on theoretical stuff?
- It helps you understand the application.
Do we need to know all the stuff?
- You may not need all of it, but be prepared.
![Page 17: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/17.jpg)
Role of statistics in Biological Science
Science
1. Idea or question
2. Collect data / make observations
3. Describe data / observations
4. Assess the strength of evidence for / against the hypothesis
Statistics
1. Mathematical model / hypothesis
2. Study design
3. Descriptive statistics
4. Inferential statistics
![Page 18: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/18.jpg)
Contents of the course
Descriptive statistics
- Graph, table, mean and standard deviation
Inferential statistics
- Probability and distribution
- Hypothesis test
- Analysis of variance
- Correlation and regression analysis
- Other special topics
![Page 19: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/19.jpg)
Basic Concept
Data
- Numerical facts, measurements, or observations obtained from an investigation or experiment aimed at answering a question
- Statistical analyses deal with numbers
![Page 20: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/20.jpg)
Basic Concept
Quantitative
- The usual type of measurement, such as height or weight
- Measurements of quantitative variables carry information about 'amount'
- Can calculate means, etc., and can use in calculations
![Page 21: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/21.jpg)
Basic Concept
Qualitative
- Carry information about category or classification, such as medical diagnosis, ethnic group, gender
- Cannot calculate means as such, but can tabulate counts or frequencies and analyze frequencies
![Page 22: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/22.jpg)
Basic Concept
Variable
- A characteristic that can take on different values for different persons, places or things
- Statistical analyses need variability; otherwise there is nothing to study
Examples: concentration of a substance, pH values obtained from atmospheric precipitation, birth weight of babies whose mothers are smokers, etc.
![Page 23: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/23.jpg)
Basic Concept
A variable is a characteristic measured on individuals drawn from a population under study.
Data are measurements of one or more variables made on a collection of individuals.
![Page 24: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/24.jpg)
Basic Concept
Type of Variable
Continuous variable
- Between any two values of the variable, there is another possible value
- Examples: height, weight, concentration
Discrete variable
- Values can only be integers
- Examples: number of people, number of plants, etc.
![Page 25: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/25.jpg)
Basic Concept
Continuous variables
Can take any value to any degree of precision in a certain range - height, weight, temperature (?)
![Page 26: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/26.jpg)
Basic Concept
Discrete variables
- Can take only certain values, or can only be measured to a certain degree of accuracy, e.g., number of children that a woman has delivered, number of teeth with fillings, blood pressure (?)
- May be handled differently in analysis
![Page 27: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/27.jpg)
Basic Concept
Independent Variable (explanatory) and Dependent Variable (response)
We try to predict or explain a response variable from an explanatory variable.
![Page 28: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/28.jpg)
Populations and samples
![Page 29: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/29.jpg)
Basic Concept
Populations <-> Parameters; Samples <-> Estimates
![Page 30: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/30.jpg)
Basic Concept
Nomenclature

| | Population (parameters) | Sample (statistics) |
|---|---|---|
| Mean | μ | x̄ |
| Variance | σ² | s² |
| Standard deviation | σ | s |
![Page 31: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/31.jpg)
Basic Concept
Population
Population parameters are constants, whereas estimates are random variables that change from one random sample to the next from the same population.
![Page 32: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/32.jpg)
Basic Concept
Population and Sample
Sample → Population, Statistic → Parameter
- From the population (parameter) to the sample: predict properties of the sample
- From the sample (statistic) to the population: generalize to the population
![Page 33: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/33.jpg)
Basic Concept
Population
- Population: a set or collection of objects we are interested in (finite or infinite).
- Parameter: a descriptive measure associated with a variable of an entire population, usually unknown because the whole population cannot be enumerated.
For example: plant height under warming conditions; graduates of USIM; smokers in the world.
Example: number of people, plant etc.
![Page 34: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/34.jpg)
Basic Concept
Population and Sample
- Population - the largest collection of values of a random variable in which we have an interest at a particular time, e.g., school children in Negeri Sembilan.
- Sample - a selected part of a population, e.g., Form Three girls, Form Five boys, etc.
![Page 35: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/35.jpg)
Basic Concept
A sample of convenience is a collection of individuals that happen to be available at the time.
![Page 36: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/36.jpg)
Basic Concept
Sampling
- The essence of statistical inference. Why?
Why sample?
- We cannot afford the time or money to record measurements on the entire population, and new members of the population may be entering all the time.
- We use statistical analysis of a sample to answer questions about a population, e.g., cancer patients, teenage boys, women after childbirth.
![Page 37: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/37.jpg)
Basic Concept
Sampling
[Figure: estimates classified as precise or imprecise, and as biased or unbiased]
![Page 38: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/38.jpg)
Basic Concept
Bias is a systematic discrepancy between estimates and the true population characteristic.
![Page 39: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/39.jpg)
Basic Concept
Sampling error - the chance difference between an estimate and the average value of that estimate over repeated samples. Unlike bias, it is not a systematic discrepancy.
![Page 40: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/40.jpg)
Basic Concept
Larger samples on average will have smaller sampling error.
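The claim that larger samples have smaller sampling error on average can be checked with a small simulation. The sketch below is illustrative only: the population is simulated (it is not course data), and the sample sizes, the number of replicates, and the helper name `spread_of_sample_means` are assumptions chosen for the demonstration.

```python
import random
import statistics

def spread_of_sample_means(population, n, replicates, rng):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(replicates)]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 simulated measurements (an assumption, not real data).
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, n=10, replicates=500, rng=rng)
spread_large = spread_of_sample_means(population, n=100, replicates=500, rng=rng)

print(f"n = 10 : spread of sample means = {spread_small:.2f}")
print(f"n = 100: spread of sample means = {spread_large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is what smaller sampling error means in practice.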
![Page 41: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/41.jpg)
Basic Concept
Properties of a good sample
- Independent selection of individuals
- Random selection of individuals
- Sufficiently large
![Page 42: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/42.jpg)
Basic Concept
Sampling
So how do 'intervention studies' fit into this?
Studies select a sample of the population (e.g., cancer patients) to study the effects of a new therapy and then make inferences about how the rest of the cancer patient population would react to the new therapy.
![Page 43: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/43.jpg)
Basic Concept
Sample
- Sample: a small number of subjects drawn from a population in order to make inferences about the population;
- Random sample: a sample of size n drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected.
- Statistic: a descriptive measure associated with a random variable of a sample.
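A simple random sample as defined here can be drawn with Python's standard library, whose `random.sample` selects without replacement so that every subset of size n is equally likely. This is a minimal sketch: the population of labels 1 to 20, the seed, and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of N = 20 individuals, identified by labels 1..20.
population = list(range(1, 21))
sample = simple_random_sample(population, n=5, seed=42)

# A statistic is a descriptive measure computed from the sample, here the sample mean.
sample_mean = sum(sample) / len(sample)
print(sample, sample_mean)
```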
![Page 44: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/44.jpg)
Basic Concept
Random
- Variables whose values arise from chance factors that cannot be predicted in advance, such as height or weight
- Race or age are 'fixed' variables, i.e., not random
![Page 45: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/45.jpg)
Basic Concept
Random
In a random sample, each member of a population has an equal and independent chance of being selected.
![Page 46: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/46.jpg)
Descriptive Statistics
Graphical Summaries
- Frequency distribution
- Histogram
- Stem and leaf plot
- Boxplot
Numerical Summaries
- Location: mean, median, mode
- Spread: range, variance, standard deviation
- Shape: skewness, kurtosis
![Page 47: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/47.jpg)
Frequency Distribution - Discrete variables
Example: Number of grass plants found in 800 sample quadrats (1 m²) in an ecological study of grasses:
![Page 48: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/48.jpg)
Frequency Distribution - Discrete variables
Example: Number of grass plants found in 800 sample quadrats (1 m²) in an ecological study of grasses:
1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2,
………………………………………………………
1, 2, 3, 2, 1, 1, 0, 5, 0, 0, 1, 0, 1, 0, 2, 4, 7, 2, 1, 0
How is the number of plants per quadrat distributed?
![Page 49: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/49.jpg)
Frequency Distribution - Discrete variables
Table 1. Frequency, relative frequency, and cumulative relative frequency of sedge plants per quadrat.

| Plants/quadrat (Xi) | Frequency (fi) | Relative frequency (fi/n × 100, %) | Cumulative relative frequency (%) |
|---|---|---|---|
| 0 | 268 | 33.500 | 33.500 |
| 1 | 316 | 39.500 | 73.000 |
| 2 | 135 | 16.875 | 89.875 |
| 3 | 61 | 7.625 | 97.500 |
| 4 | 15 | 1.875 | 99.375 |
| 5 | 3 | 0.375 | 99.750 |
| 6 | 1 | 0.125 | 99.875 |
| 7 | 1 | 0.125 | 100.000 |
| Total | 800 | 100.000 | |

• frequency - the number of times a value occurs in the data (the sample counterpart of probability in a population).
• relative frequency - the % of the time that the value occurs (frequency/n × 100).
• cumulative relative frequency - the % of the sample that is equal to or smaller than the value (cumulative frequency/n × 100).
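The columns of Table 1 can be reproduced from the frequencies alone. The sketch below takes the frequency column of Table 1 as input; the function name `frequency_table` and the row layout are illustrative choices, not part of the course material.

```python
# Frequencies of plants per quadrat, copied from Table 1 (n = 800 quadrats).
counts = {0: 268, 1: 316, 2: 135, 3: 61, 4: 15, 5: 3, 6: 1, 7: 1}

def frequency_table(counts):
    """Return rows of (value, frequency, relative %, cumulative relative %)."""
    n = sum(counts.values())
    rows = []
    cumulative = 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f  # running total of frequencies up to and including this value
        rows.append((value, f, 100 * f / n, 100 * cumulative / n))
    return rows

rows = frequency_table(counts)
for value, f, rel, cum in rows:
    print(f"{value:>2} {f:>4} {rel:>8.3f} {cum:>8.3f}")
```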
![Page 50: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/50.jpg)
Histogram (bar graph) and polygon
Histogram: a graph of frequencies
- Can be used to visually compare frequencies
- Easier to assess the magnitude of differences than by trying to judge numbers
Frequency polygon: similar to a histogram
Fig. 1. Frequency distribution of plants in a quadrat.
![Page 51: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/51.jpg)
Frequency Distribution - Continuous variables
Grouping of a continuous outcome
- Examples: weight, height.
- Grouping gives a better understanding of what the data show than the individual values do
Example: Fiber length of a cotton variety (n=106)
Data:
27.5, 28.6, 29.4, 30.5, 31.4, 29.8, 27.6, 28.7, 27.6, …………
31.8, 32.0, 27.8
![Page 52: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/52.jpg)
Frequency Distribution - Continuous variables
Table 2. Frequency and relative frequency distribution of fiber length (mm) of a cotton variety (n=106)

| Length (Xi, mm) | Frequency (fi) | Relative frequency (%) | Cumulative relative frequency (%) |
|---|---|---|---|
| 27.0~27.5 | 1 | 0.943 | 0.943 |
| 27.5~28.0 | 3 | 2.830 | 3.774 |
| 28.0~28.5 | 6 | 5.660 | 9.434 |
| 28.5~29.0 | 13 | 12.264 | 21.698 |
| 29.0~29.5 | 18 | 16.981 | 38.679 |
| 29.5~30.0 | 19 | 17.925 | 56.604 |
| 30.0~30.5 | 17 | 16.038 | 72.642 |
| 30.5~31.0 | 16 | 15.094 | 87.736 |
| 31.0~31.5 | 6 | 5.660 | 93.396 |
| 31.5~32.0 | 5 | 4.717 | 98.113 |
| 32.0~32.5 | 2 | 1.887 | 100.000 |
| Total | 106 | 100.000 | |
![Page 53: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/53.jpg)
Frequency Distribution - Continuous variables
1. Calculate the range: R = max(X) − min(X) = 5.13
2. Set the number of intervals g and the interval width i
   - Some "rules" exist, but generally create 8-15 equal-sized intervals
   - Here g = 11 and i = R/(g−1) ≈ 0.5
3. Set the intervals: L1 = min(X) − i/2 = 27.0, L2 = L1 + i = 27.5, …
4. Count the number of observations in each interval
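The four steps above can be sketched in code. This is a minimal illustration under stated assumptions: only the twelve fiber lengths actually printed on the earlier data slide are used (the full set of 106 is not available here), g = 5 is chosen for this small subset rather than the g = 11 used on the slide, intervals are treated as closed on the left, and no rounding of i or L1 is applied. The function name `class_intervals` is hypothetical.

```python
def class_intervals(data, g):
    """Return (edges, counts) for g equal-width intervals (requires g >= 2)."""
    lo, hi = min(data), max(data)
    r = hi - lo                      # step 1: range R
    width = r / (g - 1)              # step 2: interval width i = R / (g - 1)
    first = lo - width / 2           # step 3: lower limit L1 = min(X) - i / 2
    edges = [first + k * width for k in range(g + 1)]
    counts = [0] * g
    for x in data:                   # step 4: count the observations in each interval
        # Left-closed intervals [L_k, L_k+1) are an assumption; the slide does not say.
        k = min(int((x - first) // width), g - 1)
        counts[k] += 1
    return edges, counts

# Only the values visible on the data slide, not the full n = 106 data set.
lengths = [27.5, 28.6, 29.4, 30.5, 31.4, 29.8, 27.6, 28.7, 27.6, 31.8, 32.0, 27.8]
edges, counts = class_intervals(lengths, g=5)
for k, f in enumerate(counts):
    print(f"{edges[k]:.4f} ~ {edges[k + 1]:.4f}: {f}")
```

Because L1 sits half a width below the minimum, the last upper limit lands half a width above the maximum, so every observation falls inside one of the g intervals.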
![Page 54: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/54.jpg)
Histogram (bar graph) and polygon
Fig. 2. Frequency distribution in fiber length of a cotton.
[Figure panels: histogram of frequency against length (mm) by interval, 27.0~27.5 to 32.0~32.5; frequency polygon of frequency against length (mm); cumulative relative frequency against length (mm)]
![Page 55: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/55.jpg)
Histogram
A histogram is a way of summarising data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides up the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that specific group, and an area proportional to the number of observations falling into that group. This means that when the groups have unequal widths, the height of a rectangle is proportional to the frequency divided by the group width rather than to the frequency itself.
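To make the "area proportional to the number of observations" rule concrete, the sketch below computes rectangle heights for groups of unequal width. The groups are derived from Table 2 by merging its three lowest and its three highest intervals; that merging, and the name `density_heights`, are assumptions made purely for illustration.

```python
# (lower limit, upper limit, frequency), derived from Table 2 by merging
# 27.0~28.5 (1 + 3 + 6) and 31.0~32.5 (6 + 5 + 2); the merging is an assumption.
groups = [
    (27.0, 28.5, 10), (28.5, 29.0, 13), (29.0, 29.5, 18), (29.5, 30.0, 19),
    (30.0, 30.5, 17), (30.5, 31.0, 16), (31.0, 32.5, 13),
]

def density_heights(groups):
    """Height of each rectangle so that its area equals the group's relative frequency."""
    n = sum(f for _, _, f in groups)
    return [f / (n * (hi - lo)) for lo, hi, f in groups]

heights = density_heights(groups)
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, groups)]
for (lo, hi, f), h in zip(groups, heights):
    print(f"{lo:.1f} ~ {hi:.1f}: frequency {f:>2}, height {h:.4f}")
```

The first group holds 10 observations but is three times as wide as its neighbour, so its rectangle is drawn much lower than the frequency alone would suggest.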
![Page 56: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/56.jpg)
Histogram
The histogram is only appropriate for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets (>100 observations), when stem and leaf plots become tedious to construct. A histogram can also help detect any unusual observations (outliers), or any gaps in the data set.
![Page 57: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/57.jpg)
Histogram
![Page 58: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/58.jpg)
Stem and Leaf Displays
Another way to assess frequencies
- Preserves the individual measurements, so it is not useful for large data sets
- The stem is the first digit(s) of a measurement; the leaf is its last digit
- Most useful for two-digit numbers, more cumbersome for three or more digits

20: X
30: XXX
40: XXXX
50: XX
60: X

Stem | Leaf
2* | 1
3* | 244
4* | 2468
5* | 26
6* | 4
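A display like the one on this slide can be built by splitting each two-digit value into its tens digit (stem) and units digit (leaf). The sketch below uses the eleven values that the slide's display encodes (21, 32, 34, 34, 42, ...); the function name `stem_and_leaf` is hypothetical, and stems with no observations are simply omitted, which is a simplification.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits) for two-digit integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # 34 -> stem 3, leaf 4
        leaves[stem].append(leaf)
    return dict(leaves)

# The values encoded by the slide's display: 2|1, 3|244, 4|2468, 5|26, 6|4.
data = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
display = stem_and_leaf(data)
for stem in sorted(display):
    print(f"{stem}* | {''.join(str(leaf) for leaf in display[stem])}")
```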
![Page 59: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/59.jpg)
Stem and Leaf Plot
A stem and leaf plot is a way of summarising a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient and easily drawn form.
![Page 60: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/60.jpg)
Stem and Leaf Plot
A stem and leaf plot is similar to a histogram but is usually a more informative display for relatively small data sets (<100 data points). It provides a table as well as a picture of the data and from it we can readily write down the data in order of magnitude, which is useful for many statistical procedures, e.g. in the skinfold thickness example below:
![Page 61: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/61.jpg)
Stem and Leaf Plot
We can compare more than one data set by the use of multiple stem and leaf plots. By using a back-to-back stem and leaf plot, we are able to compare the same characteristic in two different groups, for example, pulse rate after exercise of smokers and non-smokers.
![Page 62: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/62.jpg)
Summary
In practice, descriptive statistics play a major role
- Always the first 1-2 tables/figures in a paper
- The statistician needs to know about each variable before deciding how to analyze the data to answer research questions
In any analysis, 90% of the effort goes into setting up the data
- Descriptive statistics are part of that 90%
![Page 63: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/63.jpg)
Descriptive Statistics - Measures of Location
- A descriptive measure computed from population data is a parameter
- A descriptive measure computed from sample data is a statistic
Most common measures of location
- Mean
- Median
- Mode
- Geometric mean, harmonic mean
![Page 64: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/64.jpg)
Arithmetic mean (population)
Suppose we have N measurements of a particular variable in a population. We denote these N measurements as:
$X_1, X_2, X_3, \ldots, X_N$
where $X_1$ is the first measurement, $X_2$ is the second, etc.
Definition
More accurately called the arithmetic mean, it is defined as the sum of the measurements observed divided by the number of observations.
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
![Page 65: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/65.jpg)
Sample: Suppose we have a sample of n measurements of a particular variable, taken from a population of N measurements. The n measurements are:
X1, X2, X3,…,Xn
where X1 is the first measurement, X2 is the second, etc.
Definition
$$\bar{x} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Arithmetic mean (sample)
![Page 66: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/66.jpg)
Some Properties of the Arithmetic Mean
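Write $x_i = X_i - \bar{x}$ for the deviation of each observation from the mean.
1. $\sum x_i = \sum (X_i - \bar{x}) = 0$
2. $\sum x_i^2 = \sum (X_i - \bar{x})^2 = \min$
Proof:
1. $$\sum x_i = \sum (X_i - \bar{x}) = \sum X_i - n\bar{x} = 0$$
2. Let $x' = \bar{x} + e$ be any other value. Then
$$\sum (X_i - x')^2 = \sum [(X_i - \bar{x}) - e]^2 = \sum [(X_i - \bar{x})^2 - 2e(X_i - \bar{x}) + e^2]$$
$$= \sum (X_i - \bar{x})^2 - 2e\sum (X_i - \bar{x}) + \sum e^2 = \sum (X_i - \bar{x})^2 + ne^2 \ge \sum (X_i - \bar{x})^2$$
The middle term vanishes by property 1, so the sum of squares is smallest when $e = 0$, that is, when deviations are taken from $\bar{x}$.
Arithmetic mean (sample)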
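The two properties can be checked numerically. The sketch below is only an illustration: the five values are made up for the purpose, and `sum_of_squares_about` is a hypothetical helper name, not something from the slides.

```python
def sum_of_squares_about(values, center):
    """Sum of squared deviations of the values from an arbitrary center."""
    return sum((x - center) ** 2 for x in values)


# Hypothetical data, chosen so that the mean is a whole number.
values = [2, 4, 4, 5, 10]
n = len(values)
x_bar = sum(values) / n  # 5.0

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in values)

# Property 2: moving the center away from the mean by e
# increases the sum of squares by exactly n * e**2.
ss_at_mean = sum_of_squares_about(values, x_bar)
for e in (-2.0, -0.5, 0.5, 2.0):
    increase = sum_of_squares_about(values, x_bar + e) - ss_at_mean
    print(f"e = {e:+.1f}: increase = {increase:.2f}, n*e^2 = {n * e ** 2:.2f}")
```

Every shift, in either direction, makes the sum of squares larger, which is what property 2 states.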
![Page 67: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/67.jpg)
Frequently used if there are extreme values in a distribution or if the distribution is non-normal
Definition
The value that divides the ‘ordered array’ into two equal parts
If there is an odd number of observations, the median Md will be the (n+1)/2-th observation
ex.: median of 11 observations is the 6th observation
If there is an even number of observations, the median Md will be the midpoint between the middle two observations
ex.: median of 12 observations is the midpoint between 6th and 7th
Median
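The odd and even cases can be sketched as a short function. This is a minimal illustration of the definition above; the function name `median` and the test data are assumptions, not part of the slides.

```python
def median(values):
    """Median of a collection of numbers, following the ordered-array definition."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th observation (1-based) is index n // 2 (0-based).
        return ordered[mid]
    # Even n: midpoint between the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(range(1, 12)))  # 11 observations: the 6th one
print(median(range(1, 13)))  # 12 observations: midpoint of the 6th and 7th
```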
![Page 68: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/68.jpg)
Definition
Value that occurs most frequently in data set
Example
2 3 4 5 3 4 5 6 7 5 3 2 5, mode Mo=5
If all values different, no mode
May be more than one mode
Bimodal or multimodal
Not used very frequently in practice
Mode
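A small sketch of these rules, counting frequencies with `collections.Counter`. The function name `modes` is hypothetical, and returning an empty list for "no mode" is one possible convention chosen here to match the slide.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when every value occurs once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # All values different: no mode, following the slide's convention.
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 4, 5, 3, 4, 5, 6, 7, 5, 3, 2, 5]))  # [5]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```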
![Page 69: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/69.jpg)
Suppose the ages of the 10 trees you are studying are: 34, 24, 56, 52, 21, 44, 64, 34, 42, 46
Then the mean age of this group is:
$$\bar{x} = \frac{1}{n}\sum X_i = (34+24+56+52+21+44+64+34+42+46)/10 = 417/10 = 41.7\ \text{years}$$
To find the median, first order the data:
21, 24, 34, 34, 42, 44, 46, 52, 56, 64
$$\text{Median} = \frac{1}{2}\left(X_{\frac{10}{2}} + X_{\frac{10}{2}+1}\right) = \frac{1}{2}(42 + 44) = 43\ \text{years}$$
The mode is 34 years, Mo=34 (occurred twice).
The mean is commonly used.
Example: Central Location
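As a quick check of the three values in this example, here is a minimal sketch using Python's standard `statistics` module. Python 3.8 or later is assumed, since `multimode` was added in that version.

```python
import statistics

ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

mean_age = statistics.mean(ages)        # sum of the ages divided by 10
median_age = statistics.median(ages)    # midpoint of the 5th and 6th ordered values
mode_ages = statistics.multimode(ages)  # list of the most frequent value(s)

print(f"mean = {mean_age} years")
print(f"median = {median_age} years")
print(f"mode(s) = {mode_ages}")
```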
![Page 70: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/70.jpg)
Used to calculate mean growth rate
Definition
Antilog of the mean of the log Xi
$$G = (X_1 X_2 \cdots X_n)^{\frac{1}{n}}$$
$$\log G = \frac{\log X_1 + \log X_2 + \cdots + \log X_n}{n}$$
Geometric mean
![Page 71: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/71.jpg)
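Example: Root growth at 25 °C, calculate mean growth rate.
Each growth rate Xi is the root length on that day divided by the length on the previous day, so it is a daily growth factor rather than a length in mm/d. Logs are base 10.

| Day | Root length (mm) | Growth rate Xi (per day) | log(Xi) |
|---|---|---|---|
| 0 | 17 | | |
| 1 | 23 | 1.352941176 | 0.131279 |
| 2 | 30 | 1.304347826 | 0.115393 |
| 3 | 38 | 1.266666667 | 0.102662 |
| 4 | 51 | 1.342105263 | 0.127787 |
| 5 | 72 | 1.411764706 | 0.149762 |
| 6 | 86 | 1.194444444 | 0.077166 |
| Total | | 7.872270083 | 0.70405 |

$$\log G = \frac{0.7040}{6} = 0.1173, \quad G = \log^{-1} 0.1173 = 1.31$$
The root length grows by a factor of about 1.31 per day on average.
Geometric mean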
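The root-growth calculation can be reproduced in a few lines. This is a sketch under the assumption that each growth rate is the ratio of successive root lengths, which is how the tabulated values are obtained; `math.prod` requires Python 3.8 or later.

```python
import math

# Root lengths (mm) on days 0 to 6.
lengths = [17, 23, 30, 38, 51, 72, 86]

# Growth rate for each day: length divided by the previous day's length.
rates = [later / earlier for earlier, later in zip(lengths, lengths[1:])]

# Geometric mean as the antilog of the mean of the base-10 logs.
log_mean = sum(math.log10(r) for r in rates) / len(rates)
g_from_logs = 10 ** log_mean

# Equivalent form: n-th root of the product of the rates.
g_from_product = math.prod(rates) ** (1 / len(rates))

print(f"mean of logs = {log_mean:.4f}")
print(f"G = {g_from_logs:.2f}")
```

Because the ratios telescope, the product of the six rates equals 86/17, so G is also the sixth root of the overall growth factor.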
![Page 72: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/72.jpg)
Look at these two data sets:
Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
If we calculate mean:
Set 1. $n = 7,\ \bar{x} = 1$
Set 2. $n = 7,\ \bar{x} = 1$
How to measure dispersion (spread, variability)?
Descriptive Statistics - Measures of Dispersion
![Page 73: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/73.jpg)
Common measures: range, variance and standard deviation, coefficient of variation
Many distributions are well-described by measure of location and dispersion
Descriptive Statistics - Measures of Dispersion
![Page 74: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/74.jpg)
Range is the difference between the largest and smallest values in the data set
R = Max(Xi) - Min(Xi)
Heavily influenced by two most extreme values and ignores the rest of the distribution
Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
R1=200, R2=20
Range
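A brief sketch of the range for the two sets, which share the same mean but differ tenfold in spread. The helper name `data_range` is an assumption made for this illustration.

```python
def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


set1 = [100, 30, 20, 7, -20, -30, -100]
set2 = [10, 3, 2, 7, -2, -3, -10]

for name, data in (("Set 1", set1), ("Set 2", set2)):
    mean = sum(data) / len(data)
    print(f"{name}: mean = {mean:.0f}, range = {data_range(data)}")
```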
![Page 75: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/75.jpg)
Suppose we have N measurements of a particular variable in a population: X1, X2, X3,…,XN.
The mean is $\mu$. As $\sum (X_i - \mu) = 0$, we define:
$$\sigma^2 = \frac{1}{N}(X_1 - \mu)^2 + \frac{1}{N}(X_2 - \mu)^2 + \cdots + \frac{1}{N}(X_N - \mu)^2 = \frac{\sum (X_i - \mu)^2}{N}$$
as variance, unit is (X unit)²
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
as standard deviation
Variance and Standard Deviation - Population
![Page 76: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/76.jpg)
Suppose we have n measurements of a particular variable in a sample: X1, X2, X3,…,Xn.
The mean is $\bar{x}$; we define:
$$s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1}$$
as mean square, or sample variance
$$s = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n - 1}}$$
as standard deviation
Variance and Standard Deviation - Sample
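The only difference between the population and sample formulas is the divisor, N versus n - 1. The sketch below treats the same seven values (Set 2 from the dispersion slides) first as a whole population and then as a sample, purely to show the effect of the divisor; it relies on the standard `statistics` module.

```python
import statistics

data = [10, 3, 2, 7, -2, -3, -10]

# Treating the values as the whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the values as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(f"population: variance = {pop_var:.2f}, sd = {pop_sd:.2f}")
print(f"sample:     variance = {sample_var:.2f}, sd = {sample_sd:.2f}")
```

The sample variance is always the larger of the two, by the factor n / (n - 1).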
![Page 77: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/77.jpg)
Corrected Sum of Squares (CSS)
$$SS = \sum (X_i - \bar{x})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}$$
Degree of freedom
$$df = n - 1$$
n-1 used because if we know n-1 deviations, the nth deviation is known
Deviations have to sum to zero
$$s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1}$$
Variance and Standard Deviation
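Both forms of the corrected sum of squares, and the reason for n - 1 degrees of freedom, can be illustrated with the tree ages from the central-location example. This is a sketch only; the variable names are chosen for readability.

```python
ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]
n = len(ages)
x_bar = sum(ages) / n

# Definitional form: sum of squared deviations from the mean.
deviations = [x - x_bar for x in ages]
ss_definition = sum(d ** 2 for d in deviations)

# Computational form: sum of squares minus (sum)^2 / n.
ss_shortcut = sum(x ** 2 for x in ages) - sum(ages) ** 2 / n

# Degrees of freedom: the deviations sum to zero, so the last one
# is fully determined by the other n - 1.
last_from_others = -sum(deviations[:-1])

sample_variance = ss_definition / (n - 1)

print(f"SS (definition) = {ss_definition:.1f}")  # approximately 1692.1
print(f"SS (shortcut)   = {ss_shortcut:.1f}")    # approximately 1692.1
print(f"last deviation  = {deviations[-1]:.1f}, from others = {last_from_others:.1f}")
print(f"s^2 = {sample_variance:.2f}")
```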
![Page 78: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/78.jpg)
Suppose the ages of the 10 trees you are studying are: 34, 24, 56, 52, 21, 44, 64, 34, 42, 46. We calculated $\bar{x} = 41.7$.
Calculate range, variance, standard deviation and CV.

| No. | Xi | x_bar | Xi - x_bar | (Xi - x_bar)^2 | Xi^2 |
|---|---|---|---|---|---|
| 1 | 34 | 41.7 | -7.7 | 59.29 | 1156 |
| 2 | 24 | 41.7 | -17.7 | 313.29 | 576 |
| 3 | 56 | 41.7 | 14.3 | 204.49 | 3136 |
| 4 | 52 | 41.7 | 10.3 | 106.09 | 2704 |
| 5 | 21 | 41.7 | -20.7 | 428.49 | 441 |
| 6 | 44 | 41.7 | 2.3 | 5.29 | 1936 |
| 7 | 64 | 41.7 | 22.3 | 497.29 | 4096 |
| 8 | 34 | 41.7 | -7.7 | 59.29 | 1156 |
| 9 | 42 | 41.7 | 0.3 | 0.09 | 1764 |
| 10 | 46 | 41.7 | 4.3 | 18.49 | 2116 |
| Total | 417 | | 0 | 1692.1 | 19081 |

R = 64 - 21 = 43 y, s² = 1692.1/9 = 188.01 y², s = 13.71 y, CV = 13.71/41.7 × 100 = 32.9%.
Example
![Page 79: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/79.jpg)
Relative variation rather than absolute variation such as standard deviation
Definition of C.V.
$$CV = \frac{s}{\bar{x}} \times 100$$
Useful in comparing variation between two distributions
Used particularly in comparing laboratory measures to identify those determinations with more variation
Coefficient of Variation
![Page 80: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/80.jpg)
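Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
Calculate $\bar{x}$, s², s and CV.

| Set | x_bar | s² | s | CV (%) |
|---|---|---|---|---|
| 1 | 1 | 3773.7 | 61.4 | 6143 |
| 2 | 1 | 44.7 | 6.7 | 668 |

Because the mean is 1 in both sets, s/x_bar equals s, and the CV in percent is 100 times that ratio.
Example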
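The CV for both sets can be computed directly from the definition CV = s / x_bar × 100, expressed as a percentage. The function name `coefficient_of_variation` is an assumption for this sketch, and the guard against a zero mean is an added safeguard not discussed in the slides.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


set1 = [100, 30, 20, 7, -20, -30, -100]
set2 = [10, 3, 2, 7, -2, -3, -10]

cv1 = coefficient_of_variation(set1)
cv2 = coefficient_of_variation(set2)
print(f"Set 1: s = {statistics.stdev(set1):.1f}, CV = {cv1:.0f}%")
print(f"Set 2: s = {statistics.stdev(set2):.1f}, CV = {cv2:.0f}%")
```

Such large percentages arise because the mean is close to zero relative to the spread; the CV is most informative for strictly positive measurements.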
![Page 81: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/81.jpg)
Descriptive method to convey information about measures of location and dispersion
Box-and-Whisker plots
Construction of boxplot
Box is IQR
Line at median
Whiskers at smallest and largest observations
Other conventions can be used, especially to represent extreme values
Box Plots
![Page 82: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/82.jpg)
[Figure: box plots of increment in systolic B.P. (vertical axis, ticks from -20 to 40) for drugs 1 to 4 (horizontal axis)]
Box Plots
![Page 83: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/83.jpg)
Box and Whisker Plot (or Boxplot)
A box and whisker plot is a way of summarising a set of data measured on an interval scale. It is often used in exploratory data analysis. It is a type of graph which is used to show the shape of the distribution, its central value, and variability. The picture produced consists of the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median
![Page 84: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/84.jpg)
Box and Whisker Plot (or Boxplot)
A box plot (as it is often called) is especially helpful for indicating whether a distribution is skewed and whether there are any unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared.
![Page 85: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/85.jpg)
Box and Whisker Plot (or Boxplot)
![Page 86: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/86.jpg)
Box and Whisker Plot (or Boxplot)
![Page 87: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/87.jpg)
Box and Whisker Plot (or Boxplot)
![Page 88: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/88.jpg)
5-Number Summary
A 5-number summary is especially useful when we have so many data that it is sufficient to present a summary of the data rather than the whole data set. It consists of 5 values: the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median. A 5-number summary can be represented in a diagram known as a box and whisker plot. In cases where we have more than one data set to analyse, a 5-number summary is constructed for each, with corresponding multiple box and whisker plots.
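A 5-number summary can be sketched as follows. Several conventions exist for computing quartiles; this illustration assumes the "median of each half" rule (the middle value is left out of both halves when n is odd), so other software may report slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    # When n is odd, skip the middle observation so both halves have equal size.
    upper_half = ordered[half + (n % 2):]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )


# Tree ages from the earlier worked example.
ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]
print(five_number_summary(ages))
```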
![Page 89: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/89.jpg)
Outlier
An outlier is an observation in a data set which is far removed in value from the others in the data set. It is an unusually large or an unusually small value compared to the others. An outlier might be the result of an error in measurement, in which case it will distort the interpretation of the data, having undue influence on many summary statistics, for example, the mean.
![Page 90: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/90.jpg)
Outlier
If an outlier is a genuine result, it is important because it might indicate an extreme of behaviour of the process under study. For this reason, all outliers must be examined carefully before embarking on any formal analysis. Outliers should not routinely be removed without further justification.
![Page 91: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/91.jpg)
Interpreting a Boxplot
![Page 92: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/92.jpg)
Interpreting a Boxplot
The boxplot is interpreted as follows: The box itself contains the middle 50% of the data. The upper edge (hinge) of the box indicates the 75th percentile of the data set, and the lower hinge indicates the 25th percentile. The range of the middle two quartiles is known as the inter-quartile range. The line in the box indicates the median value of the data.
![Page 93: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/93.jpg)
Interpreting a Boxplot
The boxplot is interpreted as follows: If the median line within the box is not equidistant from the hinges, then the data is skewed. The ends of the vertical lines or "whiskers" indicate the minimum and maximum data values, unless outliers are present, in which case each whisker extends to the most extreme data value lying within 1.5 times the inter-quartile range of its hinge. The points outside the ends of the whiskers are outliers or suspected outliers.
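The 1.5 × IQR rule for whiskers and suspected outliers can be sketched as below. The quartiles come from `statistics.quantiles` with `method="inclusive"` (Python 3.8 or later), which is one of several quartile conventions; the function name `boxplot_stats` and the extra value 120 added to the tree ages are assumptions made for this illustration.

```python
import statistics


def boxplot_stats(values):
    """Hinges, whisker ends and suspected outliers under the 1.5 * IQR rule."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "hinges": (q1, q3),
        # Whiskers end at the most extreme observations inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]
print(boxplot_stats(ages))          # no outliers: whiskers reach min and max
print(boxplot_stats(ages + [120]))  # 120 lies beyond the upper fence
```

With no outliers the whiskers coincide with the minimum and maximum; once 120 is added, the upper whisker stops at 64 and 120 is drawn as a separate point.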
![Page 94: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/94.jpg)
Boxplot Enhancements
Beyond the basic information, boxplots sometimes are enhanced to convey additional information:
The mean and its confidence interval can be shown using a diamond shape in the box.
The expected range of the median can be shown using notches in the box.
The width of the box can be varied in proportion to the log of the sample size.
![Page 95: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/95.jpg)
Advantages of Boxplots
Boxplots have the following strengths:
Graphically display a variable's location and spread at a glance.
Provide some indication of the data's symmetry and skewness.
Unlike many other methods of data display, boxplots show outliers.
By placing a boxplot for each level of a categorical variable side-by-side on the same graph, one can quickly compare data sets.
![Page 96: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/96.jpg)
Disadvantage of Boxplots
One drawback of boxplots is that they tend to emphasize the tails of a distribution, which are the least certain points in the data set. They also hide many of the details of the distribution. Displaying a histogram in conjunction with the boxplot helps in this regard, and both are important tools for exploratory data analysis.
![Page 97: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/97.jpg)
Boxplot Example 1
Check location and variation shifts
Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data.
Sample Plot: This box plot reveals that machine has a significant effect on energy with respect to location and possibly variation.
![Page 98: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/98.jpg)
Boxplot Example 1
![Page 99: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/99.jpg)
Boxplot Example 1
This box plot, comparing four machines for energy output, shows that machine has a significant effect on energy with respect to both location and variation. Machine 3 has the highest energy response (about 72.5); machine 4 has the least variable energy response with about 50% of its readings being within 1 energy unit.
![Page 100: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/100.jpg)
Boxplot Example 1
These MINITAB boxplots represent lottery payoffs for winning numbers for three time periods (May 1975-March 1976, November 1976-September 1977, and December 1980-September 1981).
![Page 101: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/101.jpg)
Boxplot Example 1
The median for each dataset is indicated by the black center line, and the first and third quartiles are the edges of the red area, which is known as the inter-quartile range (IQR).
![Page 102: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/102.jpg)
Boxplot Example 1
The extreme values (within 1.5 times the inter-quartile range from the upper or lower quartile) are the ends of the lines extending from the box. Points more than 1.5 times the IQR beyond the upper or lower quartile are plotted individually as asterisks. These points represent potential outliers.
![Page 103: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/103.jpg)
Boxplot Example 1
In this example, the three boxplots have nearly identical median values. The IQR is decreasing from one time period to the next, indicating reduced variability of payoffs in the second and third periods. In addition, the extreme values are closer to the median in the later time periods.
![Page 104: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/104.jpg)
Boxplot Example 2
As shown in the figure, a line is drawn from the upper hinge to the upper adjacent value and from the lower hinge to the lower adjacent value. Every score between the inner and outer fences is indicated by an "o" whereas a score beyond the outer fences is indicated by a "*".
![Page 105: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/105.jpg)
Boxplot Example 2
It is often useful to compare data from two or more groups by viewing box plots from the groups side by side. The data from 2b are higher, more spread out, and have a positive skew. That the skew is positive can be determined by the fact that the mean is higher than the median and the upper whisker is longer than the lower whisker.
![Page 106: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/106.jpg)
Boxplot Example 3
Although the medians are all roughly the same, you can see at a glance that the spread of each data set is different. The boxplot on the left shows data that appear to be distributed evenly. The median is in the middle of the rectangle, and the whiskers are about the same length. In addition, the plot contains no outside values. The median of the second plot from the left appears to be slightly off-center. The number of extreme values is a point of concern because it suggests that the data vary widely.
![Page 107: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/107.jpg)
Boxplot Example 3
The third boxplot shows data that has less variation and spread than the other plots. The fourth boxplot shows data that is significantly upwardly-skewed. The median of this plot is closer to the top of the rectangle than to the bottom, and the upper whisker is longer than the bottom one. All the boxplots have approximately the same median, and the two boxplots on the left have approximately the same variation in the data.
![Page 108: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/108.jpg)
Descriptive Statistics (Summary)
Graphical Summaries
Frequency distribution
Histogram
Stem and Leaf plot
Boxplot
Numerical Summaries
Location - mean, median, mode.
Dispersion - range, variance, standard deviation
Shape
![Page 109: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/109.jpg)
Statistical software: SAS, SPSS, Stata, BMDP, MINITAB
Graphical software: Sigmaplot, Harvard Graphics, PowerPoint, Excel
Software
![Page 110: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/110.jpg)
Biostatistics
![Page 111: SAA 2023 COMPUTATIONALTECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics](https://reader035.vdocuments.net/reader035/viewer/2022062314/56649e165503460f94b010d9/html5/thumbnails/111.jpg)
Biostatistics