biostatistics - avcr.czbaloun.entu.cas.cz/png/biostatistics.pdf · biostatistics. aims of...
Biostatistics
Aims of statistics
• (1) Descriptive statistics: to summarize data, i.e. to condense the information contained in many individual values into a small number of parameters or into a diagram
Name Points
Anton Jan 70.5
Balzarová Martina 72.5
Bendová Lenka 65.5
Blabolil Petr 71
Blažek Petr 87
Břendová Veronika 67.5
Čermáková Helena 88
Černíková Zuzana 94
Černý Jiří
Chalupecký František 59
Choma Michal 76.5
Chundelová Daniela 51
Doanová Tereza 69
Dortová Markéta 60.5
Dufek Luboš 69.5
Dvořáková Veronika 72
Effenberková Lenka 62
Franta Petr 74
Hajžmanová Tereza 72
Havlan Luboš 57.5
Hejna Ondřej 76
Holá Hana 81
Horák Jan
Jalovecká Marie 98.5
Jarolímová Zuzana 65
Jarošová Andrea 80
Jenčová
Jerkovičová Diana 69
Jonáková Martina 91
Jůzlová Zuzana 85
Compare:
The average number of points was 74.5, whereas the minimum value was 28 and the maximum value was 100.
[Figure: frequency diagram (histogram) of points. x-axis: No. of points (20 to 110); y-axis: No. of obs (0 to 22).]
The smaller the number of parameters I use to summarize the data,
• the more transparent and clearer the result is
• but the bigger the loss of information
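As a minimal sketch of this trade-off, the points from the listing can be condensed either into a few parameters or into a frequency table. Only the scores visible in the listing are used (students without a score are left out), so the results need not match the class-wide figures quoted in the text; the class width of 10 points is an assumption.

```python
from collections import Counter
from statistics import mean

# Points visible in the listing (entries without a score are omitted).
points = [70.5, 72.5, 65.5, 71, 87, 67.5, 88, 94, 59, 76.5, 51, 69, 60.5,
          69.5, 72, 62, 74, 72, 57.5, 76, 81, 98.5, 65, 80, 69, 91, 85]

# A few parameters: very compact, but most of the detail is lost.
summary = {"n": len(points), "mean": mean(points),
           "min": min(points), "max": max(points)}

# Frequency table: one count per 10-point class (assumed class width).
bin_width = 10
counts = Counter(int(p // bin_width) * bin_width for p in points)

print(summary)
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width}: {'#' * counts[lower]}")
```

The frequency table keeps more information than the three parameters, at the price of a longer summary.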
Aims of statistics
Population and sample
• (2) Inferential statistics: making an inference about a (statistical) population from a sample
• Some (statistical) populations are too large [or potentially infinite] – consequently, I am not able to sample all the individuals (sampling units)
• What can I say about the amount of Cd in the blood of all cuscuses in PNG when I took blood from just 10 specimens?
Inferential statistics is common in biology
I don't want to know whether the average number of species was higher in the primary forest light trap than at the river during the ten nights of our project, but whether there would be a difference any time I did a similar project again
• If this is to be science, the experiments have to be reproducible
Population and random sample
• Sampling; sampling design
• Random sample: every individual (sampling unit) has to have the same probability of being sampled, independently of whether another individual has been sampled
• Tables and generators of (pseudo)random numbers
Random sampling is usually not trivial. In no case is it a sampling of typical individuals. It works reasonably well in agricultural experiments.
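Drawing a random sample with a pseudorandom number generator can be sketched as follows. The sampling frame of 200 numbered units and the seed are assumptions made only for illustration; the draw is a simple random sample without replacement.

```python
import random

# Hypothetical sampling frame: 200 numbered sampling units (an assumption).
population_ids = list(range(1, 201))

# A fixed seed is used only so that the sketch is repeatable.
rng = random.Random(2024)

# Draw 10 units without replacement; every unit is equally likely to be chosen.
sample_ids = rng.sample(population_ids, k=10)
print(sorted(sample_ids))
```

The hard part in practice is usually building the numbered list of all sampling units, not the draw itself.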
Basic statistical parameters (characteristics)
• We usually denote by N the size of the population and by n the size of the sample
• Parameters of the population are estimated from the sample
• Characteristics of location and variability:
• Means, median and mode
• Means are defined for quantitative data (i.e. on ratio and interval scale)
Arithmetic mean
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (of a sample)
Geometric mean
• n-th root of the product of n values (for a sample here)
$\sqrt[n]{\prod_{i=1}^{n} X_i}$
Compare with the mean of log(x)
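The link to the mean of log(x) can be checked numerically: the geometric mean equals the back-transformed (exponentiated) arithmetic mean of the logarithms. The three values below are hypothetical and must be positive for the logarithm to exist.

```python
import math

values = [1.0, 10.0, 100.0]  # hypothetical positive values

n = len(values)
arithmetic_mean = sum(values) / n

# Definition: n-th root of the product of the n values.
geometric_mean = math.prod(values) ** (1 / n)

# Equivalent route: back-transformed mean of the logarithms.
mean_of_logs = sum(math.log(x) for x in values) / n
geometric_mean_via_logs = math.exp(mean_of_logs)

print(arithmetic_mean, geometric_mean, geometric_mean_via_logs)
```

For these values the geometric mean is far below the arithmetic mean, because the single large value has less influence on it.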
Median [also used for ordinal-scale data]
• One half of the individual values is below and the other half above the median (in an infinite population, the probability that a randomly selected value is above the median is 0.5, and so is the probability that it is below).
Upper and lower quartile
• One quarter of the individual observations is above the upper quartile, one quarter is below the lower quartile
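A small sketch of how the median and the quartiles can be obtained with Python's standard library. The seven observations are hypothetical. Several conventions exist for interpolating quartiles in small samples, so other software may return slightly different values.

```python
from statistics import median, quantiles

data = [2, 4, 4, 5, 7, 9, 12]  # hypothetical observations

# quantiles(..., n=4) returns three cut points:
# lower quartile, median, upper quartile.
lower_q, mid, upper_q = quantiles(data, n=4)

print("lower quartile:", lower_q)
print("median:", mid, "(check:", median(data), ")")
print("upper quartile:", upper_q)
```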
Distinguish between the meaning of the mean and the median
Company A   Company B
8000        7000
9000        7500
11000       8000
12000       8500        Median
15000       11000
18000       18000
20000       39000
13286       14143       Mean
Example: salaries paid in two companies
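The figures in the salary example can be reproduced with a few lines; this is only a check of the table, using the standard library.

```python
from statistics import mean, median

company_a = [8000, 9000, 11000, 12000, 15000, 18000, 20000]
company_b = [7000, 7500, 8000, 8500, 11000, 18000, 39000]

for name, salaries in (("A", company_a), ("B", company_b)):
    print(name, "mean:", round(mean(salaries)), "median:", median(salaries))
```

Company B has the higher mean only because of one very large salary; its median is clearly lower than that of company A.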
Mode: the most common value in the data; in continuous data it is the “peak” of the frequency diagram
[Figure: four frequency distributions with the positions of the mean and the median marked]
Characteristics of variability
• 1. Range is the difference between the maximum and the minimum
• 2. Interquartile range
• 3. Variance and standard deviation
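The first two characteristics can be sketched as follows, with hypothetical observations. The quartiles come from the standard library's default interpolation rule, which is one convention among several.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 12]  # hypothetical observations

# Range: depends only on the two most extreme values.
value_range = max(data) - min(data)

# Interquartile range: spread of the middle half of the data.
lower_q, _, upper_q = quantiles(data, n=4)
interquartile_range = upper_q - lower_q

print("range:", value_range, "interquartile range:", interquartile_range)
```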
Variance: the average squared difference between a value and the mean
• population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
• estimate based on the sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$, where n-1 = df = degrees of freedom
Standard deviation ($s_x$, often also “s.d.” or “S.D.”) is the square root of the variance; it is a characteristic of variability
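The difference between dividing by N and by n-1 can be seen in a short sketch. The eight observations are hypothetical; the standard-library functions are used only to cross-check the hand-written formulas.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

population_variance = sum_sq / n      # data treated as the whole population
sample_variance = sum_sq / (n - 1)    # estimate from a sample, df = n - 1
sample_sd = math.sqrt(sample_variance)

print(population_variance, sample_variance, sample_sd)
print(statistics.pvariance(data), statistics.variance(data), statistics.stdev(data))
```

The sample estimate is always somewhat larger than the value obtained by dividing by n, and the difference shrinks as n grows.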
Standard error of mean
• Characteristic of the precision of the estimate: how large the variability of means estimated from samples of this size would be
$s_{\bar{X}} = \frac{s_x}{\sqrt{n}}$, where $s_{\bar{X}}$ expresses the precision of the estimate and $s_x$ the variability in the data
We can increase the precision by increasing sample size.
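A minimal sketch of how the standard error shrinks with sample size. The standard deviation of 10 is a hypothetical value and is held fixed, because the variability in the data itself does not depend on n.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)


sd = 10.0  # hypothetical standard deviation of the data
for n in (25, 100, 400):
    print(n, standard_error(sd, n))
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.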
Graphical summary: frequency diagram
[Figure: histogram of NO_SAPLING (POČET_SEMENÁČU, number of seedlings; data OHRAZENI, 8v*21c) with fitted curve 21*100*normal(x, 314.8095, 173.2422). x-axis: 0 to 800; y-axis: No. of obs (0 to 8).]
Box and whisker plot
[Figure: box plot of NO_SAPLING (8v*21c), axis 0 to 800. Median = 329; 25%-75% = (196, 363); non-outlier range = (93, 500); outliers and extremes shown as separate symbols.]
Take care: box and whisker plots are now also used to display the mean and standard deviation etc.