Data Management & Basic Analysis Interpretation of Diagnostic test

Upload: camron-patterson

Post on 24-Dec-2015


Content of Today’s Presentation

Data Management
– How to design a questionnaire in Epi Info 3.4.3?
– How to do data entry in Epi Info?
– Data cleaning

Basic Data Analysis

Validation of a Diagnostic Test

Hands-on using Epi Info

Data Management: Designing Questionnaire in Epi Info

How to design a questionnaire using Epi Info

Epi Info is free, downloadable software provided by the CDC. The website address is www.cdc.gov/epiinfo/

Field or variable types
– Label/Title
– Number
– Text
– Multiline
– Phone number
– Date

Data entry check code options
– Required: prevents missing values
– Repeat Last: automatically repeats the last value entered in that field
– Range: sets minimum and maximum values
– Legal Values: acceptable values, used with text fields
– Comment Legal Values: similar to Legal Values, but only a code is saved during data entry and displayed in data analysis

Steps to create a new questionnaire in Epi Info: please follow the instructions in the hand-out provided.

Overview of Make view

Example: Questionnaire

Hands-on for creating a questionnaire using Epi Info

Data Management: Entering Data in Epi Info

Hands-on for entering data using Epi Info

Exercise 1: design a questionnaire

Participant’s General and Demographic Information

Data Management: Data Checking/Cleaning

Types of Variables

Variables
– Qualitative: nominal, ordinal
– Quantitative: discrete, continuous

Categorical Data

For example: Blood group (A=1, B=2, O=3, AB=4)
– The data should contain only the values 1, 2, 3 or 4.
– Missing values are coded as 9.
– Any other code for blood group, i.e., 0, 5, 6, 7 or 8, is clearly wrong.
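The check described above can be sketched in a few lines of Python. This is only an illustration, not Epi Info check code: the function name and the sample data are made up, and only the coding scheme (1 to 4 valid, 9 missing) comes from the slide.

```python
# Coding scheme from the slide: A=1, B=2, O=3, AB=4, missing=9.
VALID_CODES = {1, 2, 3, 4}
MISSING_CODE = 9


def check_blood_group(codes):
    """Return (record_index, code) pairs for codes that are neither valid nor the missing code."""
    return [
        (index, code)
        for index, code in enumerate(codes)
        if code not in VALID_CODES and code != MISSING_CODE
    ]


# Made-up data for illustration; 5 and 0 are not allowed codes.
sample = [1, 4, 9, 5, 2, 0, 3]
invalid = check_blood_group(sample)
print(invalid)
```

The missing code 9 is deliberately not reported as an error, since it is a legitimate entry under this scheme.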

Continuous Data

Cannot usually identify precisely which values are plausible and which are not.

Possible to specify lower and upper limits on what is reasonable for the variable concerned – range checking.

Range checking
– For example, in a study of pregnancy, limits for maternal age might be 14 to 45 years.
– For example, in a study of adult males, limits for systolic BP might be 70 to 250 mmHg.
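A range check of this kind might look as follows in Python. The limits are the ones quoted on the slide; the field names, the record layout and the sample values are assumptions made for the sketch.

```python
# Limits taken from the slide; the field names are hypothetical.
LIMITS = {
    "maternal_age": (14, 45),   # years
    "systolic_bp": (70, 250),   # mmHg
}


def out_of_range(records, field):
    """Return the records whose value for `field` lies outside the agreed limits."""
    low, high = LIMITS[field]
    return [
        record
        for record in records
        if record[field] is not None and not (low <= record[field] <= high)
    ]


# Made-up records; 12.0 could be 120 entered with a misplaced decimal point.
records = [
    {"id": 1, "systolic_bp": 120},
    {"id": 2, "systolic_bp": 12.0},
    {"id": 3, "systolic_bp": 250},
]
flagged = out_of_range(records, "systolic_bp")
print([record["id"] for record in flagged])
```

The function only flags records for review against the source forms; it does not change any value.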

Common cause of error: misplacing the decimal point, possibly because of confusion or a transcription error.
– If the recorded value is plausible, a misplaced decimal point may well go undetected.
– Plausible but unlikely values should be corrected only if there is evidence of a mistake.

Logical checks

Logical checks apply when the values of a variable are reasonable in themselves but depend on the value of some other variable.
– For example:
– 7a. Are you studying currently? (No=1, Yes=2)
– 7b. If ‘No’, what is your highest attained qualification?
– Question 7b should be answered only when 7a is coded 1 (No).
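The dependency between questions 7a and 7b can be checked programmatically. The sketch below assumes hypothetical field names `q7a` and `q7b`, with `None` standing for an unanswered question; the codes (No=1, Yes=2) are those on the slide.

```python
def logical_errors(records):
    """Return (id, message) pairs where the answers to 7a and 7b contradict each other."""
    errors = []
    for record in records:
        studying = record["q7a"]        # 1 = No, 2 = Yes
        qualification = record["q7b"]   # None when left blank
        if studying == 1 and qualification is None:
            errors.append((record["id"], "7b missing although 7a is No"))
        elif studying == 2 and qualification is not None:
            errors.append((record["id"], "7b answered although 7a is Yes"))
    return errors


# Made-up records for illustration.
records = [
    {"id": 1, "q7a": 1, "q7b": "Graduate"},
    {"id": 2, "q7a": 2, "q7b": None},
    {"id": 3, "q7a": 2, "q7b": "Diploma"},
    {"id": 4, "q7a": 1, "q7b": None},
]
errors = logical_errors(records)
print(errors)
```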

Dates

Check that all dates are within a reasonable time span.

Check that all dates are valid

Check that dates are correctly sequenced

Check that ages and time intervals
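The first three date checks (validity, time span and sequence) could be sketched as below. The study period, the admission/discharge pairing and the ISO date format are assumptions chosen for the example and do not come from the slides.

```python
from datetime import date

# Hypothetical study period used for the time-span check.
STUDY_START = date(2015, 1, 1)
STUDY_END = date(2015, 12, 31)


def parse_date(text):
    """Return a date for a valid YYYY-MM-DD string, or None if the date does not exist."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


def check_dates(admission_text, discharge_text):
    """Return a list of problems found for one pair of dates."""
    admission = parse_date(admission_text)
    discharge = parse_date(discharge_text)
    if admission is None or discharge is None:
        return ["invalid date"]
    problems = []
    if not all(STUDY_START <= d <= STUDY_END for d in (admission, discharge)):
        problems.append("outside study period")
    if discharge < admission:
        problems.append("discharge before admission")
    return problems


print(check_dates("2015-02-30", "2015-03-01"))  # 30 February does not exist
print(check_dates("2015-03-10", "2015-03-01"))  # wrong sequence
```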

Outliers

Data for continuous variables may reveal outlying values.

A few variables may have outliers, but most variables will not have any.

Suspicious values should be carefully checked.
– If there is no evidence of a mistake and the value is plausible, it should not be altered.
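One possible way to list suspicious values for checking is the common 1.5 × IQR rule. The slides do not name a rule, so the rule itself, the function name and the data below are assumptions; the sketch only flags values and never alters them.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up systolic BP readings; 250 is plausible but far from the rest.
readings = [118, 120, 122, 124, 126, 128, 130, 250]
suspicious = flag_outliers(readings)
print(suspicious)
```

A flagged value is a prompt to go back to the source record, in line with the guidance that a plausible value without evidence of a mistake should stay as it is.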

General Guidelines in Data Management

Rows in the datasheet should contain the information for one individual: a record.

Each column should contain the values of a single entity for all the individuals: a variable.

A variable name should not exceed eight characters.

Variables can be numeric, string, or alphanumeric.

A numeric variable must contain only numbers.

Every datasheet must include an identification number.

Data Management using Epi Info
– Opening the analysis screen
– Reading/opening a project to analyze
– Listing, sorting and selecting records
– Defining new variables
– Assigning values to new variables
– Recoding an existing variable into a new variable
– Saving changes into a new data table

Opening Analysis Screen

Opening Analysis Screen …contd

Reading/Opening Analysis Screen …contd

Listing the records

Sorting the records

Selecting a subset of the records

Defining new variables

Assigning values to new variables
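For readers without Epi Info at hand, the operations listed above (listing, sorting, selecting, defining, assigning and recoding) can be mimicked in plain Python. This is not Epi Info syntax; the records, variable names and age bands are invented for the illustration.

```python
# Made-up records; each dictionary is one row (record) of the datasheet.
records = [
    {"id": 1, "age": 34, "sex": "F"},
    {"id": 2, "age": 61, "sex": "M"},
    {"id": 3, "age": 47, "sex": "F"},
]

# Sorting the records by age.
by_age = sorted(records, key=lambda record: record["age"])

# Selecting a subset of the records (women only).
women = [record for record in records if record["sex"] == "F"]


def recode_age(age):
    """Recode age in years into a hypothetical three-level age group."""
    if age < 40:
        return "<40"
    if age < 60:
        return "40-59"
    return "60+"


# Defining new variables and assigning values to them.
for record in records:
    record["age_months"] = record["age"] * 12
    record["age_group"] = recode_age(record["age"])

# Listing the records.
for record in records:
    print(record)
```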

Introduction to Basic Data Analysis

Descriptive Statistics

Descriptive Analysis

Quantitative
– Mean
– Median
– Range / interquartile (IQ) range
– SD

Categorical
– Frequency
– Percentage

Frequency and percentage

Means
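The descriptive measures listed above can be reproduced with Python's standard library. The ages and blood groups below are made-up values, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from what Epi Info reports.

```python
import statistics
from collections import Counter

# Quantitative variable (made-up ages in years).
ages = [23, 25, 31, 35, 41, 52, 60]
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
age_range = (min(ages), max(ages))
q1, _, q3 = statistics.quantiles(ages, n=4)   # lower and upper quartiles
sd_age = statistics.stdev(ages)               # sample standard deviation

print(f"mean={mean_age:.2f} median={median_age} range={age_range} "
      f"IQR={q1}-{q3} SD={sd_age:.2f}")

# Categorical variable (made-up blood groups): frequency and percentage.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A", "B"]
counts = Counter(blood_groups)
percentages = {group: 100 * n / len(blood_groups) for group, n in counts.items()}
for group, n in counts.most_common():
    print(f"{group}: {n} ({percentages[group]:.1f}%)")
```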

Interpretation of Diagnostic test

How to assess the ability of stress testing against angiography for coronary artery disease?

Angiography

Stress testing    True    False    Total
Positive          65      11       76
Negative          35      89       124
Total             100     100      200

2 x 2 Tables in Clinical Epidemiology

Used to assess the ability of a diagnostic test

Disease status by a gold standard test

New test    True     False    Total
Positive    a        b        a + b
Negative    c        d        c + d
Total       a + c    b + d    a + b + c + d

Sensitivity and Specificity

Sensitivity: proportion of actual positives that are correctly identified as such, a/(a + c)

Specificity: proportion of actual negatives that are correctly identified as such, d/(b + d)

Sensitivity = 65/100 = 65%

Specificity = 89/100 = 89%
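The same arithmetic can be written as two small Python functions over the cells a, b, c and d of the 2 x 2 table. The function names are chosen for this sketch; the counts are the stress testing figures from the slide.

```python
def sensitivity(a, c):
    """Proportion of diseased patients (a + c) who test positive (a)."""
    return a / (a + c)


def specificity(b, d):
    """Proportion of non-diseased patients (b + d) who test negative (d)."""
    return d / (b + d)


# Cells of the stress testing vs. angiography table.
a, b, c, d = 65, 11, 35, 89

sens = sensitivity(a, c)
spec = specificity(b, d)
print(f"Sensitivity = {sens:.0%}, Specificity = {spec:.0%}")
```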

Positive and Negative Predictive Values

Positive predictive value: proportion of patients with positive test results who are correctly diagnosed, a/(a + b)

Negative predictive value: proportion of patients with negative test results who are correctly diagnosed, d/(c + d)

Both predictive values depend on the prevalence of the disease.

PPV = 65/76 = 85.5%

NPV = 89/124 = 71.8%
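To illustrate how the predictive values depend on prevalence, the sketch below recomputes them from sensitivity and specificity using Bayes' theorem. The function name and the 10% prevalence are assumptions for the example; in the table itself half of the 200 patients have disease, so a prevalence of 50% gives the same values as 65/76 and 89/124.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence as in the table (100 of 200 patients have disease).
ppv_50, npv_50 = predictive_values(0.65, 0.89, 0.50)

# Hypothetical lower prevalence: the same test now has a much lower PPV.
ppv_10, npv_10 = predictive_values(0.65, 0.89, 0.10)

print(f"Prevalence 50%: PPV = {ppv_50:.1%}, NPV = {npv_50:.1%}")
print(f"Prevalence 10%: PPV = {ppv_10:.1%}, NPV = {npv_10:.1%}")
```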

Likelihood Ratios

Likelihood ratio is independent of disease prevalence

Positive LR = sensitivity/(1 - specificity) = 0.65/(1 - 0.89) = 5.9

The odds of a patient having the disease increase about six-fold given a positive test result.

The larger the positive LR, the greater the likelihood of disease

Negative LR = (1 - sensitivity)/specificity = (1 - 0.65)/0.89 = 0.39

The smaller the negative LR, the lower the likelihood of disease
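A likelihood ratio is applied by multiplying the pre-test odds of disease by the LR to obtain the post-test odds. The sketch below shows this conversion; the function names and the 50% pre-test probability are assumptions for the illustration.

```python
def likelihood_ratios(sens, spec):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pre_test_probability, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.65, 0.89)

# Hypothetical patient with a 50% pre-test probability of disease.
after_positive = post_test_probability(0.50, lr_pos)
after_negative = post_test_probability(0.50, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"After a positive test: {after_positive:.1%}")
print(f"After a negative test: {after_negative:.1%}")
```

With a 50% pre-test probability, the post-test probability after a positive result equals the PPV computed from the table (65/76), which is a useful consistency check.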

Interpretation of a diagnostic test

Exercise 2:

How to assess the ability of PCR against culture for TB?

Culture

PCR         True    False    Total
Positive    65      11       76
Negative    35      89       124
Total       100     100      200

Sensitivity =

Specificity =

Positive LR =

Negative LR =