biostat - 2

Upload: euclid

Post on 25-Feb-2016

Page 1: BIOSTAT - 2

BIOSTAT - 2

• The final averages for the last 200 students who took this course are:

90 80 76 84 53 58 68 73 92 70
79 63 82 80 93 80 73 50 74 50
100 53 57 50 65 72 89 81 98 86
78 51 52 92 61 61 57 84 81 56
55 63 90 94 63 94 56 74 90 98
90 85 82 59 51 54 57 81 86 73
93 61 50 67 85 52 61 81 82 94
81 75 50 81 69 73 68 91 65 76
76 69 97 66 73 53 80 63 75 74
98 77 60 59 57 90 91 85 83 51
78 79 79 74 90 94 87 75 74 79
55 63 89 87 71 53 67 54 77 57
67 57 53 52 94 76 60 80 72 74
64 63 69 66 92 83 51 95 65 97
60 72 50 89 51 95 60 67 59 84
82 87 68 68 90 79 92 95 83 63
52 56 86 53 61 61 63 82 87 71
86 54 73 88 92 70 79 91 79 89
79 65 97 51 52 54 71 57 69 84
74 65 52 90 71 83 79 85 89 57

Are you worried?

Page 2: BIOSTAT - 2

BIOSTAT - 2

• Why not sort grades from highest to lowest [ordered array]

• Is this a more meaningful way to present the data?

100 92 87 82 79 74 69 63 57 53
98 92 87 82 79 74 68 63 57 53
98 91 87 82 79 73 68 63 57 52
98 91 86 82 79 73 68 63 57 52
97 91 86 81 78 73 68 63 57 52
97 90 86 81 78 73 67 61 57 52
97 90 86 81 77 73 67 61 57 52
95 90 85 81 77 73 67 61 56 52
95 90 85 81 76 72 67 61 56 51
95 90 85 81 76 72 66 61 56 51
94 90 85 80 76 72 66 61 55 51
94 90 84 80 76 71 65 60 55 51
94 90 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 50
93 89 83 79 74 70 65 59 54 50
93 89 83 79 74 70 64 59 53 50
92 89 83 79 74 69 63 59 53 50
92 88 83 79 74 69 63 58 53 50
92 87 82 79 74 69 63 57 53 50
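Outside a spreadsheet, an ordered array can be built in one step. The following is a minimal Python sketch, assuming only the first ten grades from the unsorted list (a small sample chosen for brevity, not the full 200); the variable names are illustrative.

```python
# First ten grades from the unsorted list on Page 1 (a sample, not all 200).
grades = [90, 80, 76, 84, 53, 58, 68, 73, 92, 70]

# sorted() returns a new list; reverse=True orders from highest to lowest.
ordered_array = sorted(grades, reverse=True)

print(ordered_array)  # [92, 90, 84, 80, 76, 73, 70, 68, 58, 53]
```

The original list is left unchanged, so the raw order is still available if it is needed later.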

Page 3: BIOSTAT - 2

BIOSTAT - 2

• Why not group the data into grades of A, B, C, D, and F [frequency distribution]

• That means we need to count the number of grades between 90 and 100, 80 and 89, etc.

• Go to “Tools”, “Data Analysis” (you might first have to go to “Tools”, “Add-Ins”, and click on the two Data Analysis modules), then “Histogram”, and follow the directions.

Page 4: BIOSTAT - 2

BIOSTAT - 2

• Input range: sweep all your data

• Bin range: sweep the cell boundaries you input somewhere on your spreadsheet; cell widths should normally be equal.

• Now click on Cumulative % and Chart Output [this will plot your histogram]

• OK

Bins: 50, 60, 70, 80, 90, 100

Page 5: BIOSTAT - 2

BIOSTAT - 2

• Output:

Bin     Frequency    Cumulative %
50           6          3.00%
60          43         24.50%
70          36         42.50%
80          45         65.00%
90          45         87.50%
100         25        100.00%
More         0        100.00%

[Chart: Histogram. X-axis: Bin; Y-axis: Frequency; series: Frequency and Cumulative %]

• Histogram does not look right?

Page 6: BIOSTAT - 2

BIOSTAT - 2

• Fix histogram by eliminating gaps between cells.

• Find “format data series” and “gap width”. How you do this depends on the version of Excel you have. Note the angle of the labels on the X-axis.

[Chart: Histogram with no gaps between cells. X-axis: Bin; Y-axis: Frequency; series: Frequency and Cumulative %]

Page 7: BIOSTAT - 2

BIOSTAT - 2

• Unfortunately, grades of 50 were not included in cells 50-59. That’s because Excel counts each grade in the cell whose upper boundary (the bin value) is the first one at or above it:

Actual Cell     Bin     Frequency    Cumulative %
≤ 50             50         6           0.03
> 50 - 60        60        43           0.245
> 60 - 70        70        36           0.425
> 70 - 80        80        45           0.65
> 80 - 90        90        45           0.875
> 90 - 100      100        25           1
                More        0           1

Bins: 50, 60, 70, 80, 90, 100

Page 8: BIOSTAT - 2

BIOSTAT - 2

• Following bins seem to work

[Data: the ordered array of 200 grades shown on Page 2.]

Actual Grades    Actual Cells       Bin     Frequency    Cumulative %
0 - 49           ≤ 49.9             49.9        0           0
50 - 59          > 49.9 - 59.9      59.9       45           0.225
60 - 69          > 59.9 - 69.9      69.9       38           0.415
70 - 79          > 69.9 - 79.9      79.9       42           0.625
80 - 89          > 79.9 - 89.9      89.9       42           0.835
90 - 100         > 89.9 - 100       100        33           1
                                    More        0           1
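The effect of the bin boundaries can be reproduced outside Excel. The sketch below is a minimal Python illustration, assuming the counting rule described above (each grade goes into the first cell whose bin value it does not exceed). The function name `count_upper_inclusive` is hypothetical, and the grades are the 200 values from Page 1.

```python
# The 200 final averages, row by row as listed on Page 1.
GRADES = [
    90, 80, 76, 84, 53, 58, 68, 73, 92, 70,
    79, 63, 82, 80, 93, 80, 73, 50, 74, 50,
    100, 53, 57, 50, 65, 72, 89, 81, 98, 86,
    78, 51, 52, 92, 61, 61, 57, 84, 81, 56,
    55, 63, 90, 94, 63, 94, 56, 74, 90, 98,
    90, 85, 82, 59, 51, 54, 57, 81, 86, 73,
    93, 61, 50, 67, 85, 52, 61, 81, 82, 94,
    81, 75, 50, 81, 69, 73, 68, 91, 65, 76,
    76, 69, 97, 66, 73, 53, 80, 63, 75, 74,
    98, 77, 60, 59, 57, 90, 91, 85, 83, 51,
    78, 79, 79, 74, 90, 94, 87, 75, 74, 79,
    55, 63, 89, 87, 71, 53, 67, 54, 77, 57,
    67, 57, 53, 52, 94, 76, 60, 80, 72, 74,
    64, 63, 69, 66, 92, 83, 51, 95, 65, 97,
    60, 72, 50, 89, 51, 95, 60, 67, 59, 84,
    82, 87, 68, 68, 90, 79, 92, 95, 83, 63,
    52, 56, 86, 53, 61, 61, 63, 82, 87, 71,
    86, 54, 73, 88, 92, 70, 79, 91, 79, 89,
    79, 65, 97, 51, 52, 54, 71, 57, 69, 84,
    74, 65, 52, 90, 71, 83, 79, 85, 89, 57,
]


def count_upper_inclusive(values, bin_values):
    """Count each value in the first cell whose bin value it does not exceed.

    Values above the last bin value (the "More" row) are not counted.
    """
    counts = [0] * len(bin_values)
    for value in values:
        for i, upper in enumerate(bin_values):
            if value <= upper:
                counts[i] += 1
                break
    return counts


# Bins 50, 60, ..., 100: a grade of 50 lands in the first cell, not in 50-59.
round_bins = count_upper_inclusive(GRADES, [50, 60, 70, 80, 90, 100])

# Bins 49.9, 59.9, ..., 100: a grade of 50 now lands in the 50-59 cell.
shifted_bins = count_upper_inclusive(GRADES, [49.9, 59.9, 69.9, 79.9, 89.9, 100])

print(round_bins)    # [6, 43, 36, 45, 45, 25]
print(shifted_bins)  # [0, 45, 38, 42, 42, 33]
```

Both lists sum to 200; only the placement of grades that sit exactly on a boundary (50, 60, 70, 80, 90) changes.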

Page 9: BIOSTAT - 2

BIOSTAT - 2

• Final frequency table and histogram

[Chart: Histogram. X-axis: Bin; Y-axis: Frequency; series: Frequency and Cumulative %]

Actual Grades    Frequency    Relative Frequency    Percent
50 - 59              45            0.225             22.5%
60 - 69              38            0.19              19.0%
70 - 79              42            0.21              21.0%
80 - 89              42            0.21              21.0%
90 - 100             33            0.165             16.5%
Total               200            1                100.0%
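The relative frequency and cumulative columns follow directly from the class counts. As a minimal Python sketch, assuming the five frequencies from the table (the dictionary and variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies copied from the final frequency table.
frequencies = {
    "50 - 59": 45,
    "60 - 69": 38,
    "70 - 79": 42,
    "80 - 89": 42,
    "90 - 100": 33,
}

total = sum(frequencies.values())  # 200 students

# Relative frequency = class count divided by the total count.
relative = {label: count / total for label, count in frequencies.items()}

# Cumulative relative frequency = running total of counts divided by the total.
cumulative = [running / total for running in accumulate(frequencies.values())]

for (label, rel), cum in zip(relative.items(), cumulative):
    print(f"{label:>8}  {rel:.3f}  {rel:6.1%}  {cum:.3f}")
```

The cumulative values reproduce the Cumulative % column of the Histogram output for the 49.9, 59.9, ... bins.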

Page 10: BIOSTAT - 2

BIOSTAT - 2

• Other statistical software will do the same thing, but you should always try out a small test case of data just to make sure that data is being placed into the proper cells.
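One possible small test case uses grades that sit exactly on, or right next to, the cell boundaries, since those are the values most likely to be misplaced. The sketch below is a Python illustration under that assumption; `cell_for`, `LOWER_BOUNDS`, and `LABELS` are hypothetical names, not part of any particular package.

```python
from bisect import bisect_right

# Lower boundaries of the cells 50-59, 60-69, 70-79, 80-89, 90-100.
LOWER_BOUNDS = [50, 60, 70, 80, 90]
LABELS = ["below 50", "50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 100"]


def cell_for(grade):
    """Return the label of the cell a grade falls in (lower boundary included).

    Grades are assumed to be at most 100; anything higher would also be
    labelled "90 - 100".
    """
    return LABELS[bisect_right(LOWER_BOUNDS, grade)]


# Small test case: each boundary grade paired with the cell it should land in.
test_case = {
    49: "below 50",
    50: "50 - 59",
    59: "50 - 59",
    60: "60 - 69",
    89: "80 - 89",
    90: "90 - 100",
    100: "90 - 100",
}

for grade, expected in test_case.items():
    assert cell_for(grade) == expected, (grade, cell_for(grade))
print("all boundary grades landed in the expected cells")
```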

Page 11: BIOSTAT - 2

BIOSTAT - 2

• Some key decisions:

– How many cells should you have? [We had 5 cells in this example.] In general, you would have between 5 and 25 cells. The more data you have, the more cells you would want to use.

– How do you determine the Bin Ranges? Most statistical software will determine these bin ranges for you, but they might not be “neat” numbers. In this case, if you did not input specific bin ranges, you would get:

Bin     Frequency
50           6
62.5        49
75          53
87.5        53
More        39
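The bin values in this output are consistent with splitting the span from the lowest grade (50) to the highest (100) into four equal widths of 12.5. The Python sketch below reproduces that arithmetic; the choice of four intervals is inferred from the table and is an assumption, not a documented rule of any software.

```python
lowest, highest = 50, 100  # smallest and largest grades in the data
n_intervals = 4            # assumption: inferred from the table above

# Equal cell width, then one bin value per interval starting at the minimum.
width = (highest - lowest) / n_intervals
bins = [lowest + k * width for k in range(n_intervals)]

print(width)  # 12.5
print(bins)   # [50.0, 62.5, 75.0, 87.5]
```

Supplying your own bin range, as on the earlier slides, avoids boundaries such as 62.5 that do not correspond to letter grades.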

Page 12: BIOSTAT - 2

BIOSTAT - 2

• Problems

– Work problems 2.3.1 and 2.3.5

– Look at data for problems 2.3.6 and 2.3.9

Page 13: BIOSTAT - 2

BIOSTAT - 2

• Numerical Techniques:

– Measures of Central Tendency [Location]
  • Arithmetic Mean
  • Median
  • Mode

– Measures of Dispersion [Variability]
  • Range
  • Variance
  • Standard Deviation

Page 14: BIOSTAT - 2

Measures of Central Location…

• The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location.

• It is computed by simply adding up all the observations and dividing by the total number of observations:

Mean = (Sum of the observations) / (Number of observations)

Page 15: BIOSTAT - 2

Arithmetic Mean…

Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
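As a quick check of the definition (sum of the observations divided by the number of observations), here is a minimal Python sketch. It assumes a sample made of the first ten grades from Page 1; the function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(observations):
    """Sum of the observations divided by the number of observations."""
    return sum(observations) / len(observations)


# Sample: the first ten grades from the unsorted list on Page 1.
sample = [90, 80, 76, 84, 53, 58, 68, 73, 92, 70]

print(sum(sample))              # 744
print(len(sample))              # 10
print(arithmetic_mean(sample))  # 74.4
```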

Page 16: BIOSTAT - 2

Measures of Central Location…

• The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}   N = 9 (odd)
Sort them bottom to top, find the middle:
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}   N = 10 (even)
Sort them bottom to top; the middle is the simple average of 8 and 9:
0 0 5 7 8 9 12 14 22 33
median = (8+9)÷2 = 8.5
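Both cases can be checked with Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even. A minimal sketch using the two data sets above:

```python
from statistics import median

odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]        # N = 9
even_data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]   # N = 10

print(sorted(odd_data))   # [0, 0, 5, 7, 8, 9, 12, 14, 22]
print(median(odd_data))   # 8   (the 5th of 9 sorted values)

print(sorted(even_data))  # [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
print(median(even_data))  # 8.5 (average of the 5th and 6th values)
```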

Page 17: BIOSTAT - 2

Measures of Central Location…

• The mode of a set of observations is the value that occurs most frequently.

• A set of data may have one mode (or modal class), or two, or more modes. If no values occur more than one time each, it is said that the data has no mode.
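The three situations (one mode, several modes, no mode) can be sketched in Python as follows. The helper `modes` is a hypothetical name, and the second data set is made up purely for illustration; the "no mode" convention is the one stated above.

```python
from collections import Counter


def modes(observations):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(observations)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once: no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([0, 7, 12, 5, 14, 8, 0, 9, 22]))  # [0]     one mode
print(modes([4, 4, 8, 8, 15]))                # [4, 8]  two modes (made-up data)
print(modes([4, 8, 15, 24, 39, 50]))          # []      no mode
```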

Page 18: BIOSTAT - 2

Measures of Variability…

• Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?

For example, two sets of class grades are shown. The mean (=50) is the same in each case…

But, the red class has greater variability than the blue class.

Page 19: BIOSTAT - 2

Range…

• The range is the simplest measure of variability, calculated as:

• Range = Largest observation – Smallest observation

• E.g.

• Data: {4, 4, 4, 4, 50}   Range = 46

• Data: {4, 8, 15, 24, 39, 50}   Range = 46
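The two examples show that very different data sets can share the same range, because only the two extreme observations enter the calculation. A minimal Python sketch (the function name `data_range` is illustrative):

```python
def data_range(observations):
    """Largest observation minus smallest observation."""
    return max(observations) - min(observations)


clustered = [4, 4, 4, 4, 50]        # four identical values and one extreme
spread_out = [4, 8, 15, 24, 39, 50]  # values spread across the interval

print(data_range(clustered))   # 46
print(data_range(spread_out))  # 46
```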

Page 20: BIOSTAT - 2

Variance…

• Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.

• Population variance is denoted by $\sigma^2$ (lowercase Greek letter “sigma” squared)

• Sample variance is denoted by $s^2$ (lowercase “s” squared)

Page 21: BIOSTAT - 2

Statistical Symbols

            Population    Sample
Size        N             n
Mean        $\mu$         $\bar{x}$
Variance    $\sigma^2$    $s^2$

Page 22: BIOSTAT - 2

Variance

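• Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$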

Page 23: BIOSTAT - 2

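Sample Mean & Variance…

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Sample Variance (shortcut method): $s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$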
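The definition and the shortcut method should give the same number. The Python sketch below compares them, assuming the usual textbook forms: squared deviations from the sample mean divided by n - 1, and the sum of squares minus (sum squared over n), also divided by n - 1. The data set {4, 8, 15, 24, 39, 50} is borrowed from the Range slide, and the function names are illustrative.

```python
import math
import statistics


def sample_variance(xs):
    """Definition: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut: (sum of squares - (sum)^2 / n), divided by n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


data = [4, 8, 15, 24, 39, 50]

by_definition = sample_variance(data)
by_shortcut = sample_variance_shortcut(data)

print(round(by_definition, 4))  # 327.0667
print(round(by_shortcut, 4))    # 327.0667

# Cross-check against the standard library's sample variance.
assert math.isclose(by_definition, statistics.variance(data))
```

The shortcut needs only the running totals of x and x squared, which is why it was popular for hand calculation.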

Page 24: BIOSTAT - 2

Standard Deviation…

• The standard deviation is simply the square root of the variance, thus:

• Population standard deviation: $\sigma = \sqrt{\sigma^2}$

• Sample standard deviation: $s = \sqrt{s^2}$

Page 25: BIOSTAT - 2

Excel Computations from Previous Data

• Data: the ordered array of 200 grades shown on Page 2, entered in cells A1:J20

Page 26: BIOSTAT - 2

Excel Computations from Previous Data

• Formulas:

Mean: =AVERAGE(A1:J20)
Median: =MEDIAN(A1:J20)
Mode: =MODE(A1:J20) [Excel will show only one mode, even if you have more than one mode]
Variance: =VAR(A1:J20)
Std. Dev.: =STDEV(A1:J20)

• Results:

Mean = 73.11
Median = 74
Mode = 79
Variance = 200.62
Std. Dev. = 14.16

• Work Problem 2.5.7
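The same five results can be reproduced without Excel. The following is a minimal Python sketch using the standard `statistics` module on the ordered array of 200 grades; it assumes that `statistics.variance` and `statistics.stdev` (sample versions, dividing by n - 1) are the counterparts of =VAR and =STDEV.

```python
import statistics

# The ordered array of 200 grades, one spreadsheet row (columns A..J) per line.
ROWS = [
    [100, 92, 87, 82, 79, 74, 69, 63, 57, 53],
    [98, 92, 87, 82, 79, 74, 68, 63, 57, 53],
    [98, 91, 87, 82, 79, 73, 68, 63, 57, 52],
    [98, 91, 86, 82, 79, 73, 68, 63, 57, 52],
    [97, 91, 86, 81, 78, 73, 68, 63, 57, 52],
    [97, 90, 86, 81, 78, 73, 67, 61, 57, 52],
    [97, 90, 86, 81, 77, 73, 67, 61, 57, 52],
    [95, 90, 85, 81, 77, 73, 67, 61, 56, 52],
    [95, 90, 85, 81, 76, 72, 67, 61, 56, 51],
    [95, 90, 85, 81, 76, 72, 66, 61, 56, 51],
    [94, 90, 85, 80, 76, 72, 66, 61, 55, 51],
    [94, 90, 84, 80, 76, 71, 65, 60, 55, 51],
    [94, 90, 84, 80, 75, 71, 65, 60, 54, 51],
    [94, 89, 84, 80, 75, 71, 65, 60, 54, 51],
    [94, 89, 84, 80, 75, 71, 65, 60, 54, 50],
    [93, 89, 83, 79, 74, 70, 65, 59, 54, 50],
    [93, 89, 83, 79, 74, 70, 64, 59, 53, 50],
    [92, 89, 83, 79, 74, 69, 63, 59, 53, 50],
    [92, 88, 83, 79, 74, 69, 63, 58, 53, 50],
    [92, 87, 82, 79, 74, 69, 63, 57, 53, 50],
]
grades = [grade for row in ROWS for grade in row]  # flatten A1:J20

mean = statistics.mean(grades)
median = statistics.median(grades)
mode = statistics.mode(grades)          # returns a single mode
variance = statistics.variance(grades)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(grades)      # square root of the sample variance

print(f"Mean = {mean:.2f}")           # Mean = 73.11
print(f"Median = {median:g}")         # Median = 74
print(f"Mode = {mode}")               # Mode = 79
print(f"Variance = {variance:.2f}")   # Variance = 200.62
print(f"Std. Dev. = {std_dev:.2f}")   # Std. Dev. = 14.16
```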