Basics of Biostatistics for Health Research

Session 2 – February 14th, 2013

Dr. Scott Patten, Professor of Epidemiology

Department of Community Health Sciences & Department of Psychiatry

[email protected]

• Go to “www.ucalgary.ca/~patten”

• Scroll to the bottom.

• Right click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary

• Save them on your desktop

http://www.ucalgary.ca/~patten

Open the Datafile

The task from last week…

• Create a 95% exact binomial confidence interval for the proportion of people in the Framingham data with more than a high school (> H.S.) education

Review of Last Week’s Task

• “use”

• “generate”

• “recode”

• “tabulate”

• “ci”

The actual commands…

generate highschool = educ

recode highschool 1/2=0 3/4=1

tabulate highschool

ci highschool, binomial

Creating a “do” file…

The “do file” editor

Executing a “do” file

What is a “do” file?

• It is a text file – you can copy and paste from the output window in Stata, or from a word processor

• It is a computer program that consists of actual commands and therefore doesn’t need a compiler

• Others would call it a “macro”
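
As a minimal sketch of what such a file can look like, the block below puts last week's four commands into a do-file. It runs on six made-up values of educ typed in with input, not on the course dataset, so it does not need a use command. The version 12 line is an assumption: it asks newer releases of Stata to accept the “ci …, binomial” syntax shown on these slides.

```stata
* Hypothetical do-file: last week's commands on six made-up educ values
* version 12 is an assumption; it keeps the slide-era ci syntax working
version 12
clear
input educ
1
2
2
3
4
4
end
* Copy educ, then collapse levels 1-2 to 0 and levels 3-4 to 1
generate highschool = educ
recode highschool 1/2=0 3/4=1
tabulate highschool
* Exact binomial 95% confidence interval for the proportion coded 1
ci highschool, binomial
```

With the real data, the input … end lines would be replaced by a use command pointing at the downloaded dataset.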

Different Types of Data

• One type of distinction
– Nominal (e.g. sex, race)
– Ordinal (e.g. rating scales)
– Cardinal (e.g. physical measures)

• Another type of distinction
– Categorical (e.g. # of pregnancies)
– Continuous (e.g. height, weight)

Body Mass Index (BMI)

The BMI in our Data Set

This is an example of a continuous variable

Changing Data Types in Stata (e.g. continuous to categorical)

• recode bmi x/y=z

• This will recode all values of the variable bmi having values from x to y to a single value equal to z.

Interpretation of BMI

• Underweight: < 18.5

• Normal weight: 18.5 to 25

• Overweight: >25 to 30

• Obese: 30+

• Your task: Make a “do file” that calculates a 95% confidence interval for the proportion of the population that are overweight or obese.

Example of Code for this…

generate owo = bmi

recode owo 0/25 = 0 25.01/100 = 1

tab owo, missing

ci owo, binomial
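
One detail of the recode above: a value strictly between 25 and 25.01 matches neither range and would be left unchanged. That should not matter if BMI is stored to two decimals, but a logical expression avoids the question altogether. The sketch below is one possible alternative, shown on made-up BMI values; version 12 is again an assumption for the ci syntax.

```stata
* Alternative sketch on made-up BMI values (not the course data)
* version 12 is an assumption; it keeps the slide-era ci syntax working
version 12
clear
input bmi
24.2
25
25.005
25.01
31.7
.
end
* 1 if BMI is above 25, 0 otherwise; a missing BMI stays missing
generate owo = bmi > 25 if !missing(bmi)
tab owo, missing
ci owo, binomial
```

The if !missing(bmi) part matters because Stata treats a missing value as larger than any number, so without it a missing BMI would be coded 1.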

Another Task…

• Add a use command to your do file

• Save your “do file” on the desktop using a descriptive file name of your choice

• Exit Stata

• Open Stata again

• Open the “do file” editor and select your do file

• Execute your “do file”

The Power of “do files”

• Task: Calculate an exact 95% CI for the proportion of the population that are obese (BMI > 30)

• IMPORTANT: do NOT start from scratch as we did before – try to do this by editing your do file.

For Example…

Before:

generate owo = bmi

recode owo 0/25 = 0 25.01/100 = 1

tab owo, missing

ci owo, binomial

After:

generate owo = bmi

generate obese = bmi

recode owo 0/25 = 0 25.01/100 = 1

recode obese 0/30 = 0 30.01/100=1

tab owo, missing

tab obese, missing

ci owo obese, binomial

Starting a Log File

Closing a Log File

Another Task…

• Start a log file

• Run your “do file”

• Close and save the resulting log file on your desktop

• Open your log file
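
The menu steps for starting and closing a log also have command equivalents that can sit inside the do file itself. A minimal sketch follows; the file name session2_demo.log is hypothetical, and the log is written to Stata's current working directory.

```stata
* Close any log left open by an earlier run; capture hides the error if none is open
capture log close
* "session2_demo.log" is a hypothetical file name; text gives a plain-text log
log using "session2_demo.log", text replace
display "Everything between log using and log close is recorded"
log close
* Show the saved log in the Results window
type "session2_demo.log"
```

The replace option overwrites an existing log of the same name, which is convenient when a do file is run repeatedly.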

“do file” Etiquette

• When you put an * at the start of a line in a “do file”, Stata will ignore that line

• Use this to…
– Add descriptive comments to your code
– Remove commands that you don’t want now, but might want later
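
A short sketch of both uses, on three made-up BMI values (version 12 is an assumption for the ci syntax): the first starred lines are descriptive comments, and the starred tab line is a command that has been switched off but kept for later.

```stata
* version 12 is an assumption; it keeps the slide-era ci syntax working
version 12
clear
input bmi
22
27
33
end
* Flag obesity (BMI above 30) with the same recode rule as before
generate obese = bmi
recode obese 0/30 = 0 30.01/100 = 1
* The table is switched off for now by the leading asterisk
* tab obese, missing
ci obese, binomial
```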

E.g. Without the Tables…

Review…

• Make a value label for obesity

• Attach this value label to the variable representing obesity
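
As a reminder of the two steps, here is a minimal sketch on a made-up 0/1 variable. The label name obeselbl is hypothetical; any valid name works.

```stata
clear
input obese
0
1
0
end
* Step 1: define the label ("obeselbl" is a hypothetical name)
label define obeselbl 0 "Not Obese" 1 "Obese"
* Step 2: attach it to the variable
label values obese obeselbl
* Tables now show the text instead of 0 and 1
tabulate obese
```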

Making a Graphic

The Pie Chart Dialogue Box

Find the Variable that you made

Unedited Output

The Graph Editor

Here is a good place to start

See if you can do these things…

• Change the color of the pie

• Add a title

• Add a comment

• Change the background

• Create a work of art

Save in a Standard Format
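
The pie chart and the save step can also be written as commands, which is handy inside a do file. The sketch below uses four made-up observations and a hypothetical file name, obese_pie.png. It assumes a windowed version of Stata; console-only installations may not be able to export PNG files.

```stata
clear
input obese
0
0
0
1
end
label define obeselbl 0 "Not Obese" 1 "Obese"
label values obese obeselbl
* One slice per category of obese, with a title
graph pie, over(obese) title("Obesity in a toy sample")
* Hypothetical file name; the suffix (.png) selects the output format
graph export "obese_pie.png", replace
```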

Back to BMI

• May not wish to categorize variables like this

• Measures of central tendency
– Mode
– Median
– Mean

• Different types of graphs are useful for examining continuous variables
– Box plots
– Histograms

Box Plots

Terminology

• Median: value with 50% of observations above and 50% below.

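• Interquartile range (IQR): the range from the 25th to the 75th percentile (the box); it contains the middle 50% of observations

• Adjacent values (ends of the whiskers): the most extreme observations that lie within 1.5 x the IQR of the edges of the box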

• Outliers: anything outside of the adjacent values

Calculating Summary Stats

Calculate summary stats for BMI


Calculate the mean BMI


Calculate median BMI
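
These menu choices correspond to the summarize and mean commands. A minimal sketch on five made-up BMI values:

```stata
clear
input bmi
20
22
24
26
33
end
* Number of observations, mean, standard deviation, minimum and maximum
summarize bmi
* The detail option adds percentiles; the 50th percentile is the median
summarize bmi, detail
display "median BMI = " r(p50)
* Mean with its standard error and 95% confidence interval
mean bmi
```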

Make a Box (and whisker) Plot

The Boxplot Dialogue Box

Select BMI from the dropdown list
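
The same plot can be requested with the graph box command. The sketch below uses simulated values as stand-ins for bmi and sex (500 random draws, not the course data), so the picture will differ from the one on the slide.

```stata
clear
set seed 2013
set obs 500
* Simulated stand-ins for the course variables (an assumption for illustration)
generate bmi = rnormal(26, 4)
generate sex = 1 + (runiform() < 0.55)
* One box for everyone
graph box bmi
* One box per group, side by side
graph box bmi, over(sex)
```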

Introducing Histograms

The Histogram Dialogue Box

Select the variable here

Select the bin # here

A Task for You to Do…

• Make 3 histograms of BMI
– In one use the default number of bins
– In one use a larger number
– In one, use a smaller number

• Save your favorite histogram

• Open it in the graph editor, give it a title and improve its appearance

• Save it in a standard form (e.g. png, jpg, tif)
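
For reference, the histogram dialogue box maps onto the histogram command and its bin() option. The sketch below uses simulated BMI values (not the course data), arbitrary bin counts of 40 and 8, and a hypothetical file name, bmi_hist.png. As with any PNG export, it assumes a windowed version of Stata.

```stata
clear
set seed 2013
set obs 500
* Simulated stand-in for bmi (an assumption for illustration)
generate bmi = rnormal(26, 4)
* Default number of bins
histogram bmi
* A larger number of bins
histogram bmi, bin(40)
* A smaller number of bins, with a title
histogram bmi, bin(8) title("BMI, 8 bins")
* Save the most recent graph in a standard format
graph export "bmi_hist.png", replace
```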

Assessing Normality with a Histogram

The distribution is not quite normal, but close

Is BMI Higher in Men or Women?

• We could use confidence intervals to assess this…

• E.g.

Here is the dialogue box…

Once you’ve selected BMI, click this

The dialogue box, continued..

Enter sex as a group variable

The output

. mean bmi, over(sex)

Mean estimation                    Number of obs = 11575

            1: sex = 1
            2: sex = 2

        Over |       Mean   Std. Err.    [95% Conf. Interval]
-------------+-----------------------------------------------
bmi          |
           1 |   26.20382   .0484566     26.10883     26.2988
           2 |   25.62873   .0559382     25.51909    25.73838

It looks better with value labels

. mean bmi, over(sex)

Mean estimation                    Number of obs = 11575

          Men: sex = Men
        Women: sex = Women

        Over |       Mean   Std. Err.    [95% Conf. Interval]
-------------+-----------------------------------------------
bmi          |
         Men |   26.20382   .0484566     26.10883     26.2988
       Women |   25.62873   .0559382     25.51909    25.73838
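
To reproduce the labelled version, the value label has to be attached before running mean. A sketch on simulated data follows; the coding 1 = Men and 2 = Women is taken from the two outputs above, while the label name sexlbl and the data themselves are made up.

```stata
clear
set seed 2013
set obs 400
* Simulated stand-ins for sex and bmi (an assumption for illustration)
generate sex = 1 + (_n > 200)
generate bmi = rnormal(26, 4)
* 1 = Men and 2 = Women; "sexlbl" is a hypothetical label name
label define sexlbl 1 "Men" 2 "Women"
label values sex sexlbl
* Group means with standard errors and 95% confidence intervals
mean bmi, over(sex)
```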

Statistical Tests

• Start with a hypothesis that an “effect” exists
– In this case, that there is an effect of sex on BMI

• Assume that the effect DOES NOT exist
– This is the null hypothesis

• Find the probability of the observed results, or results more extreme, given the null hypothesis (the p-value)
– This is what the “test” calculates for you

• If that probability is below the chosen cut-off (the alpha value), reject the null hypothesis

The t-test (assumptions)

• The variables are approximately normally distributed

• The standard deviations of the two groups are approximately equal

• The two samples are independent

Using summarize similarly

• Use summarize with “by” in the dialogue box

• Use histograms with a normal density plot and the “by” tab in the dialogue box

Your task: use these two techniques to assess the t-test assumptions.
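
In command form, the two techniques might look like the sketch below. It runs on simulated stand-ins for sex and bmi, so the numbers are not those of the course data.

```stata
clear
set seed 2013
set obs 400
* Simulated stand-ins for sex and bmi (an assumption for illustration)
generate sex = 1 + (_n > 200)
generate bmi = rnormal(26, 4)
* Compare the standard deviations of the two groups
bysort sex: summarize bmi
* One histogram per group, each with a normal density curve overlaid
histogram bmi, by(sex) normal
```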

Variance Comparisons
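
One command-line route to a variance comparison is sdtest (for example, sdtest bmi, by(sex)); whether the menu item on this slide runs exactly that command is an assumption. Its immediate form, sdtesti, needs only group sizes and standard deviations, so the sketch below uses the values reported for Men and Women in the t-test output later in this session.

```stata
* Immediate form: sdtesti #obs1 {#mean1 | .} #sd1 #obs2 {#mean2 | .} #sd2
* Men first, then Women; the means are not needed, so they are entered as .
sdtesti 5004 . 3.427767 6571 . 4.534443
display "variance ratio F = " r(F)
```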

The t-test

The t-test dialogue box

The output

. ttest bmi, by(sex) unequal

Two-sample t test with unequal variances

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     Men |    5004    26.20382    .0484566    3.427767    26.10882    26.29881
   Women |    6571    25.62873    .0559382    4.534443    25.51908    25.73839
---------+--------------------------------------------------------------------
combined |   11575    25.87735    .0381332     4.10264     25.8026     25.9521
---------+--------------------------------------------------------------------
    diff |            .5750831    .0740075                .4300158    .7201504

    diff = mean(Men) - mean(Women)                                t = 7.7706
Ho: diff = 0                     Satterthwaite's degrees of freedom = 11572.4

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
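
The test can be checked without the dataset by using ttesti, the immediate form of ttest, which takes the group sizes, means and standard deviations directly. The sketch below plugs in the Obs, Mean and Std. Dev. values from the output above; because those inputs are rounded, the result may differ from the output in the last decimal place.

```stata
* Immediate form: ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
* Men first, then Women; unequal requests the unequal-variances test
ttesti 5004 26.20382 3.427767 6571 25.62873 4.534443, unequal
display "t = " r(t) "   two-sided p = " r(p)
```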

Two group tests for proportions..

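You can also do this with tab

tab obese sex, exact

           |          sex
     obese |      Men     Women |     Total
-----------+--------------------+----------
 Not Obese |    4,405     5,610 |    10,015
     Obese |      599       961 |     1,560
-----------+--------------------+----------
     Total |    5,004     6,571 |    11,575

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000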
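
Both tests also have immediate forms that work from the counts alone, which makes them easy to check by hand. The sketch below uses the cell counts from the table above. That prtest is the command behind the “two group tests for proportions” menu item is an assumption.

```stata
* Two-sample test of proportions from counts: n(Men) obese(Men) n(Women) obese(Women)
prtesti 5004 599 6571 961, count
* Fisher's exact test from the four cell counts; \ separates the table rows
tabi 4405 5610 \ 599 961, exact
display "Fisher's exact p = " r(p_exact)
```

With a dataset in memory, the equivalent commands would be prtest obese, by(sex) and tab obese sex, exact.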

Your Final Task for Today

• Create a “do file” that …
– Reads in the data
– Recodes BMI to a categorical variable for obesity
– Tests whether obesity differs between men and women

• Create a log file to store the results
