biostatistics basics - biostatistics

55
1 © 2006 Biostatistics Basics An introduction to an expansive and complex field

Upload: medresearch

Post on 22-Nov-2014

3.391 views

Category:

Documents


33 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Biostatistics basics - Biostatistics

1© 2006

Biostatistics Basics

An introduction to an expansive and complex field

Page 2: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 2 © 2006

Common statistical terms

• Data – Measurements or observations of a variable

• Variable – A characteristic that is observed or

manipulated – Can take on different values

Page 3: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 3 © 2006

Statistical terms (cont.)

• Independent variables – Precede dependent variables in time – Are often manipulated by the researcher– The treatment or intervention that is used in a

study• Dependent variables

– What is measured as an outcome in a study – Values depend on the independent variable

Page 4: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 4 © 2006

Statistical terms (cont.)

• Parameters– Summary data from a population

• Statistics– Summary data from a sample

Page 5: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 5 © 2006

Populations

• A population is the group from which a sample is drawn – e.g., headache patients in a chiropractic

office; automobile crash victims in an emergency room

• In research, it is not practical to include all members of a population

• Thus, a sample (a subset of a population) is taken

Page 6: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 6 © 2006

Random samples

• Subjects are selected from a population so that each individual has an equal chance of being selected

• Random samples are representative of the source population

• Non-random samples are not representative– May be biased regarding age, severity of the

condition, socioeconomic status etc.

Page 7: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 7 © 2006

Random samples (cont.)

• Random samples are rarely utilized in health care research

• Instead, patients are randomly assigned to treatment and control groups – Each person has an equal chance of being

assigned to either of the groups • Random assignment is also known as

randomization

Page 8: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 8 © 2006

Descriptive statistics (DSs)

• A way to summarize data from a sample or a population

• DSs illustrate the shape, central tendency, and variability of a set of data – The shape of data has to do with the

frequencies of the values of observations

Page 9: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 9 © 2006

DSs (cont.)

– Central tendency describes the location of the middle of the data

– Variability is the extent values are spread above and below the middle values

• a.k.a., Dispersion

• DSs can be distinguished from inferential statistics – DSs are not capable of testing hypotheses

Page 10: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 10 © 2006

Hypothetical study data(partial from book)

Case # Visits11 77

22 2 2 33 2 2 44 3 3 55 4 4 66 3 3 77 5 5 88 3 3 99 4 4 1010 6 6 1111 2 2 1212 3 3 1313 7 7 1414 4 4

• Distribution provides a summary of:– Frequencies of each of the values

• 2 – 3• 3 – 4 • 4 – 3 • 5 – 1• 6 – 1• 7 – 2

– Ranges of values • Lowest = 2• Highest = 7

etc.

Page 11: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 11 © 2006

Frequency distribution table

Frequency Percent Cumulative %• 2 3 21.4 21.4• 3 4 28.6 50.0 • 4 3 21.4 71.4• 5 1 7.1 78.5• 6 1 7.1 85.6• 7 2 14.3 100.0

Page 12: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 12 © 2006

Frequency distributions are often depicted by a histogram

Page 13: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 13 © 2006

Histograms (cont.)

• A histogram is a type of bar chart, but there are no spaces between the bars

• Histograms are used to visually depict frequency distributions of continuous data

• Bar charts are used to depict categorical information – e.g., Male–Female, Mild–Moderate–Severe,

etc.

Page 14: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 14 © 2006

Measures of central tendency

• Mean (a.k.a., average) – The most commonly used DS

• To calculate the mean– Add all values of a series of numbers and

then divided by the total number of elements

Page 15: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 15 © 2006

Formula to calculate the mean

• Mean of a sample

• Mean of a population

(X bar) refers to the mean of a sample and refers to the mean of a population

X is a command that adds all of the X values n is the total number of values in the series of a sample and

N is the same for a population

X μ

NX

nXX

Page 16: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 16 © 2006

Measures of central tendency (cont.)

• Mode – The most frequently

occurring value in a series

– The modal value is the highest bar in a histogram

Mode

Page 17: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 17 © 2006

Measures of central tendency (cont.)

• Median– The value that divides a series of values in

half when they are all listed in order – When there are an odd number of values

• The median is the middle value – When there are an even number of values

• Count from each end of the series toward the middle and then average the 2 middle values

Page 18: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 18 © 2006

Measures of central tendency (cont.)

• Each of the three methods of measuring central tendency has certain advantages and disadvantages

• Which method should be used?– It depends on the type of data that is being

analyzed – e.g., categorical, continuous, and the level of

measurement that is involved

Page 19: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 19 © 2006

Levels of measurement

• There are 4 levels of measurement – Nominal, ordinal, interval, and ratio

1. Nominal – Data are coded by a number, name, or letter

that is assigned to a category or group – Examples

• Gender (e.g., male, female) • Treatment preference (e.g., manipulation,

mobilization, massage)

Page 20: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 20 © 2006

Levels of measurement (cont.)

2. Ordinal – Is similar to nominal because the

measurements involve categories– However, the categories are ordered by rank– Examples

• Pain level (e.g., mild, moderate, severe) • Military rank (e.g., lieutenant, captain, major,

colonel, general)

Page 21: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 21 © 2006

Levels of measurement (cont.)

• Ordinal values only describe order, not quantity– Thus, severe pain is not the same as 2 times

mild pain• The only mathematical operations allowed

for nominal and ordinal data are counting of categories– e.g., 25 males and 30 females

Page 22: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 22 © 2006

Levels of measurement (cont.)

3. Interval – Measurements are ordered (like ordinal

data) – Have equal intervals – Does not have a true zero – Examples

• The Fahrenheit scale, where 0° does not correspond to an absence of heat (no true zero)

• In contrast to Kelvin, which does have a true zero

Page 23: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 23 © 2006

Levels of measurement (cont.)

4. Ratio – Measurements have equal intervals – There is a true zero – Ratio is the most advanced level of

measurement, which can handle most types of mathematical operations

Page 24: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 24 © 2006

Levels of measurement (cont.)

• Ratio examples– Range of motion

• No movement corresponds to zero degrees• The interval between 10 and 20 degrees is the

same as between 40 and 50 degrees – Lifting capacity

• A person who is unable to lift scores zero• A person who lifts 30 kg can lift twice as much as

one who lifts 15 kg

Page 25: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 25 © 2006

Levels of measurement (cont.)

• NOIR is a mnemonic to help remember the names and order of the levels of measurement– Nominal

OrdinalIntervalRatio

Page 26: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 26 © 2006

Levels of measurement (cont.)

Measurement scale Permissible mathematic operations

Best measure ofcentral tendency

Nominal Counting Mode

Ordinal Greater or less than operations Median

Interval Addition and subtraction Symmetrical – MeanSkewed – Median

Ratio Addition, subtraction,multiplication and division

Symmetrical – MeanSkewed – Median

Page 27: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 27 © 2006

The shape of data

• Histograms of frequency distributions have shape

• Distributions are often symmetrical with most scores falling in the middle and fewer toward the extremes

• Most biological data are symmetrically distributed and form a normal curve (a.k.a, bell-shaped curve)

Page 28: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 28 © 2006

The shape of data (cont.)

Line depicting the shape of the data

Page 29: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 29 © 2006

The normal distribution

• The area under a normal curve has a normal distribution (a.k.a., Gaussian distribution)

• Properties of a normal distribution– It is symmetric about its mean– The highest point is at its mean– The height of the curve decreases as one

moves away from the mean in either direction, approaching, but never reaching zero

Page 30: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 30 © 2006

The normal distribution (cont.)

Mean

A normal distribution is symmetric about its mean

As one moves away from the mean in either direction the height of the curve decreases, approaching, but never reaching zero

The highest point of the overlying normal curve is at the mean

Page 31: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 31 © 2006

The normal distribution (cont.)

Mean = Median = Mode

Page 32: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 32 © 2006

Skewed distributions

• The data are not distributed symmetrically in skewed distributions – Consequently, the mean, median, and mode

are not equal and are in different positions– Scores are clustered at one end of the

distribution– A small number of extreme values are located

in the limits of the opposite end

Page 33: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 33 © 2006

Skewed distributions (cont.)

• Skew is always toward the direction of the longer tail– Positive if skewed to the right– Negative if to the left

The mean is shifted the most

Page 34: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 34 © 2006

Skewed distributions (cont.)

• Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions

• The median is a better estimate of the center of skewed distributions– It will be the central point of any distribution– 50% of the values are above and 50% below

the median

Page 35: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 35 © 2006

More properties of normal curves

• About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean

• About 95.5% is within two SDs• About 99.7% is within three SDs

Page 36: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 36 © 2006

More properties of normal curves (cont.)

Page 37: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 37 © 2006

Standard deviation (SD)

• SD is a measure of the variability of a set of data

• The mean represents the average of a group of scores, with some of the scores being above the mean and some below – This range of scores is referred to as variability

or spread • Variance (S2) is another measure of

spread

Page 38: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 38 © 2006

SD (cont.)

• In effect, SD is the average amount of spread in a distribution of scores

• The next slide is a group of 10 patients whose mean age is 40 years – Some are older than 40 and some younger

Page 39: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 39 © 2006

SD (cont.)

Ages are spread out along an X axis

The amount ages are spread out is known as dispersion or spread

Page 40: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 40 © 2006

Distances ages deviate above and below the mean

Adding deviations always equals zero

Etc.

Page 41: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 41 © 2006

Calculating S2

• To find the average, one would normally total the scores above and below the mean, add them together, and then divide by the number of values

• However, the total always equals zero– Values must first be squared, which cancels

the negative signs

Page 42: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 42 © 2006

Calculating S2 cont.

Symbol for SD of a sample for a population

S2 is not in the same units (age), but SD is

Page 43: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 43 © 2006

Calculating SD with Excel

Enter values in a column

Page 44: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 44 © 2006

SD with Excel (cont.)

Click Data Analysis on the Tools menu

Page 45: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 45 © 2006

SD with Excel (cont.)

Select Descriptive Statistics and click OK

Page 46: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 46 © 2006

SD with Excel (cont.)

Click Input Range icon

Page 47: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 47 © 2006

SD with Excel (cont.)

Highlight all the values in the column

Page 48: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 48 © 2006

SD with Excel (cont.)

Check if labels are in the first row

Check Summary Statistics

Click OK

Page 49: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 49 © 2006

SD with Excel (cont.)

SD is calculated preciselyPlus several other DSs

Page 50: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 50 © 2006

Wide spread results in higher SDs narrow spread in lower SDs

Page 51: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 51 © 2006

Spread is important when comparing 2 or more group means

It is more difficult to see a clear distinctionbetween groups in the upper example because the spread is wider, even though the means are the same

Page 52: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 52 © 2006

z-scores

• The number of SDs that a specific score is above or below the mean in a distribution

• Raw scores can be converted to z-scores by subtracting the mean from the raw score then dividing the difference by the SD

Xz

Page 53: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 53 © 2006

z-scores (cont.)

• Standardization – The process of converting raw to z-scores – The resulting distribution of z-scores will

always have a mean of zero, a SD of one, and an area under the curve equal to one

• The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table

Page 54: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 54 © 2006

z-scores (cont.)

Refer to a z-tableto find proportionunder the curve

Page 55: Biostatistics basics - Biostatistics

Evidence-based Chiropractic 55 © 2006

z-scores (cont.)

Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z.

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.94410.9332

Corresponds to the area under the curve in black