Dr.S.Nishan Silva (MBBS)

Upload: nelson

Post on 05-Jan-2016


DESCRIPTION

Research Statistics 2. Dr.S.Nishan Silva (MBBS).

TRANSCRIPT

Page 1: Dr.S.Nishan Silva (MBBS)

Dr.S.Nishan Silva

(MBBS)

Page 2: Dr.S.Nishan Silva (MBBS)

day  weight    day  weight    day  weight
  1  140        31  143.9      61  144
  2  140.1      32  144        62  144.2
  3  139.8      33  142.5      63  144.5
  4  140.6      34  142.9      64  144.2
  5  140        35  142.8      65  143.9
  6  139.8      36  143.9      66  144.2
  7  139.6      37  144        67  144.5
  8  140        38  144.8      68  144.3
  9  140.8      39  143.9      69  144.2
 10  139.7      40  144.5      70  144.9
 11  140.2      41  143.9      71  144
 12  141.7      42  144        72  143.8
 13  141.9      43  144.2      73  144
 14  141.4      44  143.8      74  143.8
 15  142.3      45  143.5      75  144
 16  142.3      46  143.8      76  144.5
 17  141.9      47  143.2      77  143.7
 18  142.1      48  143.5      78  143.9
 19  142.5      49  143.6      79  144
 20  142.3      50  143.4      80  144.2
 21  142.1      51  143.9      81  144
 22  142.5      52  143.6      82  144.4
 23  143.5      53  144        83  143.8
 24  143        54  143.8      84  144.1
 25  143.2      55  143.6
 26  143        56  143.8
 27  143.4      57  144
 28  143.5      58  144.2
 29  142.7      59  144
 30  143.7      60  143.9

My weight

Plot as a function of the time at which the data were acquired:

Page 3: Dr.S.Nishan Silva (MBBS)

[Figure: weight (lbs) vs. Day; y-axis 139 to 146, x-axis 0 to 60]

Do not use curved lines to connect data points – that assumes you know more about the relationship of the data than you really do

Comments: background is white (less ink); Font size is larger than Excel default (use 14 or 16)


Page 4: Dr.S.Nishan Silva (MBBS)

Assume my weight is a single, random, set of similar data

[Figure: histogram; # of Observations (0 to 25) vs. Weight (lbs)]

Make a frequency chart (histogram) of the data

Create a “model” of my weight and determine the average weight and how consistent my weight is
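The "model" described here comes down to two numbers, an average and a standard deviation, plus a frequency count. A minimal Python sketch follows, assuming the 84 weights are transcribed correctly from the table on Page 2. The 1 lb bin width is an assumption for illustration, not something the slides specify.

```python
from collections import Counter
from statistics import mean, stdev

# Weights (lbs) for days 1-84, transcribed from the table on Page 2.
weights = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

average = mean(weights)
s = stdev(weights)  # sample standard deviation (n - 1 in the denominator)

# Assumed bin width of 1 lb: count the observations in each whole-pound bin.
counts = Counter(int(w) for w in weights)
for pound in sorted(counts):
    print(f"{pound}-{pound + 1} lbs: {'#' * counts[pound]}")

print(f"average = {average:.2f} lbs, s = {s:.1f} lbs")
```

The printed average and s should be close to the values quoted on the histogram slide (143.11 and 1.4 lbs).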

[Figure: weight (lbs) vs. Day; y-axis 139 to 146, x-axis 0 to 60]

Page 5: Dr.S.Nishan Silva (MBBS)

[Figure: histogram; # of Observations (0 to 25) vs. Weight (lbs), with a bell-shaped curve and its inflection point marked]

average = 143.11

s = 1.4 lbs

s = standard deviation = measure of the consistency, or similarity, of weights

Page 6: Dr.S.Nishan Silva (MBBS)

[Figure: normal curve; Amplitude (0 to 0.45) vs. s (-5 to 5), with W1/2 marked]

Width is measured at the inflection point = s

Triangulated peak: base width is 2s < W < 4s

Page 7: Dr.S.Nishan Silva (MBBS)

[Figure: normal curve; Amplitude (0 to 0.45) vs. s (-5 to 5)]

Area +/- 1s = 68.3%

Area +/- 2s = 95.4%

Area +/- 3s = 99.73%

pp ~ 6s

pp = peak to peak, or the largest separation of measurements

Peak to peak is sometimes easier to “see” on the data vs. time plot
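The areas quoted for the normal curve can be checked with the error function. This is a small sketch; the helper name `area_within` is made up for illustration.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * area_within(k):.2f}%")
```

Because nearly all of the area lies within +/- 3s, the full spread of a normally distributed data set is roughly 6s, which is where the pp ~ 6s rule of thumb comes from.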

Page 8: Dr.S.Nishan Silva (MBBS)

[Figure: weight (lbs) vs. Day; y-axis 139 to 146, x-axis 0 to 60, with the peak-to-peak range marked from 139.5 to 144.9]

Peak to peak: pp ~ 6s

s ~ pp/6 = (144.9 - 139.5)/6 ~ 0.9

(Calculated s = 1.4)

[Figure: histogram; # of Observations (0 to 25) vs. Weight (lbs)]
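The peak-to-peak estimate on this slide is a one-line calculation. The sketch below repeats it with the two extremes as read off the plot on the slide (139.5 and 144.9).

```python
# Extremes as read off the weight vs. day plot on the slide.
lowest, highest = 139.5, 144.9

pp = highest - lowest   # peak to peak
s_estimate = pp / 6     # rule of thumb: pp ~ 6s

print(f"pp = {pp:.1f} lbs, s ~ {s_estimate:.1f} lbs")
```

The estimate is lower than the calculated s of 1.4 lbs. The rule of thumb assumes normally distributed data, which may not hold well for this series.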

Page 9: Dr.S.Nishan Silva (MBBS)

Inferential Statistics

Used to determine the likelihood that a conclusion based on data from a sample is true

Page 10: Dr.S.Nishan Silva (MBBS)

Terms

p value: the probability that a difference at least as large as the one observed could have occurred by chance alone (that is, if the null hypothesis were true)

Page 11: Dr.S.Nishan Silva (MBBS)

Standardised Normal distribution

• Formula: Z = (X - µ) / σ

– Z: SND
– X: variable
– µ: mean; σ: standard deviation
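As a worked example of the formula, the sketch below standardises one weight from the earlier slides, using the average (143.11) and s (1.4) quoted there as stand-ins for µ and σ. The function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


# Largest weight in the data set, with the mean and s from the histogram slide.
z = z_score(144.9, mu=143.11, sigma=1.4)
print(f"Z = {z:.2f}")
```

The resulting Z is then looked up in the SND table of values.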

Page 12: Dr.S.Nishan Silva (MBBS)

SND table of values

Page 13: Dr.S.Nishan Silva (MBBS)

Regression and Correlation

• Correlation
– To analyze the relationship between two variables

• Regression
– Dependence of variable y on variable x
– In this course we consider only two variables
– In real life, multiple variable interactions are possible.

Page 14: Dr.S.Nishan Silva (MBBS)

Example : X = Height, Y = Body weight

Page 15: Dr.S.Nishan Silva (MBBS)

Basic Linear regression Equation

• Equation: Y' = a + bx
– b is the gradient, slope or regression coefficient
– a is the intercept of the line at the Y axis, or regression constant
– Y' is a value for the outcome
– x is a value for the predictor (real x value)
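The constants a and b are usually found by least squares. The sketch below is one way to do that in plain Python, using the height and body weight example. The data values are hypothetical and the function name `fit_line` is made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y' = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: X = height (cm), Y = body weight (kg).
heights = [150, 160, 170, 180, 190]
body_weights = [52, 58, 67, 74, 84]

a, b = fit_line(heights, body_weights)
predicted = a + b * 175  # Y' for a real x value of 175 cm
print(f"Y' = {a:.1f} + {b:.2f}x; predicted weight at 175 cm = {predicted:.1f} kg")
```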

Page 16: Dr.S.Nishan Silva (MBBS)
Page 17: Dr.S.Nishan Silva (MBBS)
Page 18: Dr.S.Nishan Silva (MBBS)

Correlation Coefficient

• Page 100 lower down

Page 19: Dr.S.Nishan Silva (MBBS)

Correlation coefficient ranges from -1 to +1 (its magnitude ranges from 0 to 1)

Page 20: Dr.S.Nishan Silva (MBBS)

Correlation coefficient ranges from -1 to +1 (its magnitude ranges from 0 to 1)
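For reference, the Pearson product-moment coefficient r can be computed directly from its definition. This is a sketch with hypothetical height and weight data; `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: X = height (cm), Y = body weight (kg).
heights = [150, 160, 170, 180, 190]
body_weights = [52, 58, 67, 74, 84]
print(f"r = {pearson_r(heights, body_weights):.3f}")
```

A value near +1 or -1 means the points lie close to a straight line, and the sign gives the direction of the relationship.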

Page 21: Dr.S.Nishan Silva (MBBS)
Page 22: Dr.S.Nishan Silva (MBBS)

Finding the significance of “r”

• Simple correlation significance
– http://www.biology.ed.ac.uk/archive/jdeacon/statistics/table6.html#Correlation coefficient

• Pearson Product-moment coefficient
– http://www.experiment-resources.com/pearson-product-moment-correlation.html

Page 23: Dr.S.Nishan Silva (MBBS)

• References
– Best: http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress11.html
– In detail: http://www.statsdirect.com/help/regression_and_correlation/rcr.htm

Page 24: Dr.S.Nishan Silva (MBBS)

Inferential Statistics – Page 102

• Sample statistics – “Generalized” to the entire population

• Formulate hypothesis

• ? Null Hypothesis

• Test hypothesis

Page 25: Dr.S.Nishan Silva (MBBS)

Types of Errors

                     Truth
Conclusion           No difference         Difference
No difference        (correct)             TYPE II ERROR (β)
Difference           TYPE I ERROR (α)      (correct)

Power = 1 - β (β = the probability of a type II error)

Page 26: Dr.S.Nishan Silva (MBBS)

confidence interval:

The range of values we can be reasonably certain includes the true value.

Page 27: Dr.S.Nishan Silva (MBBS)

If the “probability” of the true value not being included is less than 5% we reject the null hypothesis

Page 28: Dr.S.Nishan Silva (MBBS)

Example

Page 29: Dr.S.Nishan Silva (MBBS)

The Use of the Null Hypothesis

• Is the difference in two sample populations due to chance or a real statistical difference?

• The null hypothesis assumes that there will be no “difference” or no “change” or no “effect” of the experimental treatment.

• If treatment A is no better than treatment B then the null hypothesis is supported.

• If there is a significant difference between A and B then the null hypothesis is rejected...

Page 30: Dr.S.Nishan Silva (MBBS)

Parametric tests

• T test Page 104

Page 31: Dr.S.Nishan Silva (MBBS)

T Table

Page 32: Dr.S.Nishan Silva (MBBS)

T-test

• T-test determines the probability that the null hypothesis concerning the means of two small samples is correct

• The probability that two samples are representative of a single population (supporting null hypothesis) OR two different populations (rejecting null hypothesis)

Page 33: Dr.S.Nishan Silva (MBBS)

Use t-test to determine whether or not sample population A and B came from the same or different population

t = (x̄1 - x̄2) / s(x̄1 - x̄2)

x̄1 = mean of A; x̄2 = mean of B
sx1 = std error of A; sx2 = std error of B
s(x̄1 - x̄2) = std error of the difference between the two means

Example:
Sample A mean = 8
Sample B mean = 12
Std error of difference of populations = 1

(12 - 8)/1 = 4 standard error units
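The arithmetic of the example can be sketched as below. This assumes two independent samples, so the standard error of the difference is the square root of the sum of the squared standard errors. The individual standard errors 0.6 and 0.8 are hypothetical, chosen only so that the combined value is 1 as in the example.

```python
import math


def t_statistic(mean_1, mean_2, se_1, se_2):
    """t for two independent samples, from their means and standard errors."""
    se_difference = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return (mean_1 - mean_2) / se_difference


# Means from the slide; standard errors 0.6 and 0.8 are hypothetical.
t = t_statistic(12, 8, 0.6, 0.8)
print(f"t = {t:.1f}")
```

The computed t is then compared with the T Table at the appropriate degrees of freedom.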

Page 34: Dr.S.Nishan Silva (MBBS)

Non Parametric test

• Chi Squared test – Page 108
– Test for Goodness of fit
– Test of independence

Page 35: Dr.S.Nishan Silva (MBBS)

Chi square

• Used with discrete values

• Phenotypes, choice chambers, etc.

• Not used with continuous variables (like height… use t-test for samples less than 30 and z-test for samples greater than 30)

• O= observed values

• E= expected values

Page 36: Dr.S.Nishan Silva (MBBS)

http://course1.winona.edu/sberg/Equation/chi-squ2.gif

Page 37: Dr.S.Nishan Silva (MBBS)

Interpreting a chi square

• Calculate degrees of freedom
• # of events, trials, phenotypes - 1
• Example: 2 phenotypes - 1 = 1
• Generally use the column labeled 0.05 (which means there is a 95% chance that any difference between what you expected and what you observed is within accepted random chance).

• Any calculated value that is larger than the table value means you reject your null hypothesis: there is a difference between observed and expected values.
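The steps above can be sketched for a goodness-of-fit test using the usual chi-square sum of (O - E)² / E. The observed and expected counts are hypothetical. The critical values are the standard 0.05-column entries for 1 to 3 degrees of freedom.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 2 phenotypes with an expected 1:1 ratio.
observed = [60, 40]
expected = [50, 50]

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # number of phenotypes - 1

# 0.05 column of a chi-square table for 1, 2 and 3 degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}
reject_null = chi2 > CRITICAL_0_05[df]

print(f"chi-square = {chi2:.2f}, df = {df}, reject null hypothesis: {reject_null}")
```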

Page 38: Dr.S.Nishan Silva (MBBS)
Page 39: Dr.S.Nishan Silva (MBBS)

How to use a chi square chart

http://faculty.southwest.tn.edu/jiwilliams/probab2.gif

Page 40: Dr.S.Nishan Silva (MBBS)

T-test or Chi Square? Testing the validity of the null hypothesis

• Use the T-test (also called Student’s T-test) if using continuous variables from a normally distributed sample population (ex. Height)

• Use the Chi Square (χ²) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data)… Example: genetics experiments, expected distribution of organisms.

Page 41: Dr.S.Nishan Silva (MBBS)

Qualitative Analysis – Pages 113-114

• Phenomenology
– Data collected using interviews, tapes etc
– Analyzed as the researcher prefers
– Describes using descriptive statistics

• Ethnography
– Data collected using note taking, observation etc
– Categorised
– Relationships between patterns identified

• Concurrent Analysis
– Qualitative data is transformed to numerical data
– Qualitative value may be lost

Page 42: Dr.S.Nishan Silva (MBBS)

Using Excel (Example)

Page 43: Dr.S.Nishan Silva (MBBS)

Microsoft Excel

• A Spreadsheet Application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

• There are many versions of MS-Excel. Excel XP, Excel 2003, Excel 2007 are capable of performing a number of statistical analyses.

• Starting MS Excel: Double click on the Microsoft Excel icon on the desktop or Click on Start --> Programs --> Microsoft Excel.

• Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 is used to refer to the cell in column A and row 3. B10:B20 is used to refer to the range of cells in column B and rows 10 through 20.

Page 44: Dr.S.Nishan Silva (MBBS)

Microsoft Excel

Creating Formulas: 1. Click the cell in which you want to enter the formula, 2. Type = (an equal sign), 3. Click the Function button (fx), 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (from an existing workbook). Change the directory area or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document

Saving a File: File --> Save

Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell and drag the mouse across the desired range.

Page 45: Dr.S.Nishan Silva (MBBS)

Microsoft Excel

• Entering Date and Time: Dates are stored as MM/DD/YYYY. No need to enter in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 and jan 9, 1999 as 1/9/1999. To enter today’s date, press Ctrl and ; together. Use a or p to indicate am or pm. For example, 8:30 p is interpreted as 8:30 pm. To enter current time, press Ctrl and : together.

• Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting.

• Sorting: Data --> Sort --> Sort By …

• Descriptive Statistics and other Statistical methods: Tools --> Data Analysis --> Statistical method. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Page 46: Dr.S.Nishan Silva (MBBS)

Histograms in Excel

1. Select Tools/Data Analysis

Page 47: Dr.S.Nishan Silva (MBBS)

Histograms in Excel (continued)

2. Choose Histogram

3. Input data range and bin range (bin range is a cell range containing the upper class boundaries for each class grouping)

Select Chart Output and click “OK”
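To make the idea of a bin range of upper class boundaries concrete, here is a sketch of the same counting in Python. It assumes a value goes into the first bin whose upper boundary is greater than or equal to it, with a final "More" bin for anything above the last boundary. The data are the first ten weights from the earlier table and the boundaries are hypothetical.

```python
from bisect import bisect_left


def histogram_counts(data, upper_bounds):
    """Count values per bin, where each bin is defined by its upper boundary.

    upper_bounds must be sorted ascending. The returned list has one extra
    entry at the end for values above the last boundary ("More").
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in data:
        counts[bisect_left(upper_bounds, value)] += 1
    return counts


# First ten weights (lbs) from the weight table; bin boundaries are hypothetical.
data = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7]
bins = [139.5, 140.0, 140.5, 141.0]

counts = histogram_counts(data, bins)
for label, count in zip([*bins, "More"], counts):
    print(f"{label}: {count}")
```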

Page 48: Dr.S.Nishan Silva (MBBS)

Microsoft Excel

Statistical and Mathematical Function: Start with the ‘=’ sign and then select the function from the function wizard (fx).

Inserting a Chart: Click on Chart Wizard (or Insert --> Chart), select the chart, give the input data range, update the chart options, and select the output range/worksheet.

Importing Data in Excel: File --> Open --> File Type --> Click on File --> Choose Option (Delimited/Fixed Width) --> Choose Options (Tab/Semicolon/Comma/Space/Other) --> Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Page 49: Dr.S.Nishan Silva (MBBS)

Computing the Mean

• Sum xi divide by n (or N for population mean)

• Excel
– =AVERAGE(cellrange)

Page 50: Dr.S.Nishan Silva (MBBS)

Computing the Mode

• Value that occurs most often in discretized data

• Excel
– =MODE(cellrange)
– Reports first value seen if tie

Page 51: Dr.S.Nishan Silva (MBBS)

Computing the Median

• The middle value in sorted data

• Excel
– =MEDIAN(cellrange)

Page 52: Dr.S.Nishan Silva (MBBS)

Computing the Range

• Range is min to max values

• Excel
– =MIN(cellrange)
– =MAX(cellrange)

Page 53: Dr.S.Nishan Silva (MBBS)

Computing the Standard Deviation

• Std. Dev. is Square-Root of Variance

• Excel
– =STDEV(cellrange) - sample
– =STDEVP(cellrange) - population
– =VAR(cellrange) - sample
– =VARP(cellrange) - population
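For readers working outside Excel, the same summary measures are available in Python's standard `statistics` module. The sketch below maps each Excel function named on these slides to a rough Python counterpart. The data list is hypothetical and stands in for a cell range.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values standing in for a cell range

summary = {
    "AVERAGE": statistics.mean(data),
    "MODE": statistics.mode(data),        # first mode encountered if there is a tie
    "MEDIAN": statistics.median(data),
    "MIN": min(data),
    "MAX": max(data),
    "STDEV": statistics.stdev(data),      # sample
    "STDEVP": statistics.pstdev(data),    # population
    "VAR": statistics.variance(data),     # sample
    "VARP": statistics.pvariance(data),   # population
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The standard deviation is the square root of the matching variance in both the sample and population versions.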

Page 54: Dr.S.Nishan Silva (MBBS)

Tables and Charts for Categorical Data: Univariate Data

Categorical Data

• Tabulating Data
– Summary Table

• Graphing Data
– Pie Charts
– Bar Charts
– Pareto Diagram

Page 55: Dr.S.Nishan Silva (MBBS)

The Summary Table

Summarize data by category

Example: Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage (%)
Stocks              46.5                       42.27
Bonds               32.0                       29.09
CD                  15.5                       14.09
Savings             16.0                       14.55
Total              110.0                      100.0

(Variables are Categorical)
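The percentage column of the summary table is each amount divided by the total. A short sketch using the amounts from the table:

```python
# Amounts in thousands of dollars, from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())
percentages = {name: round(100 * amount / total, 2) for name, amount in amounts.items()}

for name, amount in amounts.items():
    print(f"{name:<8} {amount:>6.1f} {percentages[name]:>7.2f}")
print(f"{'Total':<8} {total:>6.1f}")
```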

Page 56: Dr.S.Nishan Silva (MBBS)

Bar and Pie Charts

• Bar charts and Pie charts are often used for qualitative (category) data

• Height of bar or size of pie slice shows the frequency or percentage for each category

Page 57: Dr.S.Nishan Silva (MBBS)

Bar Chart Example

[Figure: horizontal bar chart "Investor's Portfolio" for the Current Investment Portfolio; categories: Stocks, Bonds, CD, Savings; x-axis: Amount in $1000's, 0 to 50]

Page 58: Dr.S.Nishan Silva (MBBS)

Pie Chart Example

Percentages are rounded to the nearest percent

[Figure: pie chart "Current Investment Portfolio"; Stocks 42%, Bonds 29%, Savings 15%, CD 14%]

Page 59: Dr.S.Nishan Silva (MBBS)

Pareto Diagram Example

[Figure: Pareto diagram "Current Investment Portfolio"; categories in descending order: Stocks, Bonds, Savings, CD; bar graph (0% to 45%): % invested in each category; line graph (0% to 100%): cumulative % invested]
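The two series in a Pareto diagram are the categories sorted from largest to smallest share (the bars) and the running total of those shares (the line). They can be computed as in this sketch, using the amounts from the summary table:

```python
# Amounts in thousands of dollars, from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Sort categories from largest to smallest, then accumulate the percentages.
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

pareto = []
running = 0.0
for name, amount in ordered:
    share = 100 * amount / total
    running += share
    pareto.append((name, share, running))

for name, share, cumulative in pareto:
    print(f"{name:<8} {share:6.2f}%  cumulative {cumulative:6.2f}%")
```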

Page 60: Dr.S.Nishan Silva (MBBS)

Tabulating and Graphing Multivariate Categorical Data (continued)

• Side by side bar charts

[Figure: side-by-side bar chart "Comparing Investors"; categories: Stocks, Bonds, CD, Savings; series: Investor A, Investor B, Investor C; axis 0 to 60]

Page 61: Dr.S.Nishan Silva (MBBS)

Side-by-Side Chart Example

• Sales by quarter for three sales territories:

[Figure: side-by-side bar chart; x-axis: 1st Qtr to 4th Qtr; y-axis: 0 to 60; series: East, West, North]

         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9

Page 62: Dr.S.Nishan Silva (MBBS)

http://www.bmj.com/bmj-series/statistics-notes

Best source for you…

BMJ Statistics notes…

Page 63: Dr.S.Nishan Silva (MBBS)
Page 64: Dr.S.Nishan Silva (MBBS)