advanced statistics presentation outline

Advanced Statistics

Presentation outline:

Regression

Risks & Ratios

Survival Analysis

Survival Curves

Sensitivity & Specificity

Regression

Allows you to use changes in independent variables (IVs) to predict changes in a dependent variable (DV)

– Accomplished through a regression equation

• Uses y-intercept and slope to create straight line through

scatter plot

– IVs must have a linear relationship (i.e., must be correlated) with

the DV

– IVs are not manipulated, but may have multiple levels
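
The regression equation described above can be sketched in a few lines. This is a minimal illustration of a one-IV least-squares fit, not a full regression routine; the function name `fit_line` and the data values are hypothetical and chosen so the answer is easy to check by hand.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (single IV)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: the DV rises by exactly 2 for each unit of the IV.
iv = [1, 2, 3, 4, 5]
dv = [3, 5, 7, 9, 11]

intercept, slope = fit_line(iv, dv)
print(f"DV = {intercept:.1f} + {slope:.1f} * IV")  # DV = 1.0 + 2.0 * IV

# Use the equation to predict the DV at a new IV value.
predicted = intercept + slope * 6
```

With real data the points scatter around the line, and the slope is the unstandardized coefficient B for that IV.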

Example of Regression analysis:

• Suicide rate regressed on antidepressant prescription rate, controlling for factors like age, sex, race, and income.

Interpreting a Multiple Regression

R (measure of the relationship between variables)

– Multiple R ranges from 0 to 1 (a simple correlation r ranges from -1 to +1)

R² (coefficient of determination)

– Proportion of variability in the DV explained by the IVs (1 minus the ratio of residual to total variability)

B and β

– Unstandardized and standardized regression coefficients

– Associated t and p

Can include interaction terms

ANOVA used to determine overall fit
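
As a rough sketch of how the coefficient of determination is obtained, the snippet below computes R² as 1 minus residual variability over total variability. The function name `r_squared` and the observed and predicted values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_obs = sum(observed) / len(observed)
    # Total variability of the DV around its own mean.
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    # Variability left unexplained by the model's predictions.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical DV values and the values a fitted model predicted for them.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.2f}")  # R^2 = 0.80
```

Here the total sum of squares is 20 and the residual sum of squares is 4, so the model accounts for 80% of the variability in the DV.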

Factors That Create Misleading Results

Susceptible to all the same misinterpretations as correlations

• Linear relationships

• Restricted range, Skewed distribution, Outliers & Extreme

groups

Other considerations:

• Correlation between IVs should not be too large

(multicollinearity)

Logistic Regression

Used for prediction of dichotomous DV

Assess impact of IV and interactions on DV similar to linear

regression

Example of Logistic Regression

• Physician’s referral for cardiac catheterization was regressed on the patient’s sex and race after controlling for other confounders.

• African Americans had 40% lower odds of being recommended for catheterization than Whites.

Interpreting Logistic Regression

Each IV will have a Wald chi-square statistic and associated p-value

Uses the Hosmer-Lemeshow goodness-of-fit test to evaluate the overall model; here you are looking for a non-significant (ns) result, which indicates adequate fit

Provides Odds Ratio and corresponding CI for each IV in the equation

Classification table for sensitivity and specificity calculation
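
The odds ratio, its CI, and the Wald statistic all come from the same two numbers for each IV: the coefficient B and its standard error. The sketch below shows the usual conversion, assuming a normal-approximation 95% CI; the coefficient and standard error are hypothetical values, not taken from any study cited here.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Convert a logistic coefficient and its standard error to an OR and 95% CI."""
    # The coefficient is on the log-odds scale, so exponentiate to get the OR.
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient and standard error for one IV.
b, se = -0.51, 0.20

odds_ratio, ci_low, ci_high = odds_ratio_with_ci(b, se)
wald_chi_square = (b / se) ** 2  # Wald statistic with 1 degree of freedom

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
print(f"Wald chi-square = {wald_chi_square:.2f}")
```

An odds ratio near 0.60 reads as roughly 40% lower odds, and a CI that excludes 1 corresponds to a significant Wald test at the 0.05 level.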

Measures of Risk

Absolute risk

– Probability of developing a disease within a specified time frame

Relative risk (RR)

– the probability that a member of an exposed group (or group with

a specific behavior, gene etc.) will develop a disease relative to

the probability that a member of an unexposed group will

develop that same disease
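
A small worked example may help separate the two measures. The counts below are hypothetical: 30 of 100 exposed people and 10 of 100 unexposed people develop the disease during follow-up.

```python
def absolute_risk(events, total):
    """Probability of developing the disease within the follow-up period."""
    return events / total


# Hypothetical cohort counts.
risk_exposed = absolute_risk(30, 100)
risk_unexposed = absolute_risk(10, 100)

# Relative risk compares the two absolute risks.
relative_risk = risk_exposed / risk_unexposed

print(f"Absolute risk (exposed)   = {risk_exposed:.2f}")    # 0.30
print(f"Absolute risk (unexposed) = {risk_unexposed:.2f}")  # 0.10
print(f"Relative risk             = {relative_risk:.2f}")   # 3.00
```

A relative risk of 3 says the exposed group is three times as likely to develop the disease, but it hides the absolute risks (30% versus 10%), which is why both measures are usually reported.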

Measures of Ratio

Odds ratio

– Odds are the likelihood of an event occurring versus the likelihood of it not occurring; the odds ratio compares the odds in one group with the odds in another

– Example: If the odds ratio (smokers v/s non-smokers) for developing lung cancer is 2.16, smokers have 116% higher odds of developing lung cancer than non-smokers.
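
The arithmetic behind an odds ratio can be sketched as follows. The 2x2 counts are hypothetical, and the helper names are illustrative only; the percentage conversion is simply (OR - 1) x 100.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the event in the exposed group divided by odds in the unexposed group."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


def percent_higher_odds(ratio):
    """Express an odds ratio as a percentage increase over the reference group."""
    return (ratio - 1) * 100


# An odds ratio of 2.16 corresponds to 116% higher odds, not 216%.
print(f"{percent_higher_odds(2.16):.0f}% higher odds")  # 116% higher odds

# Hypothetical counts: 30 cases / 70 non-cases exposed, 10 cases / 90 non-cases unexposed.
example_or = odds_ratio(30, 70, 10, 90)
print(f"OR = {example_or:.2f}")  # OR = 3.86
```

With these counts the relative risk would be 0.30 / 0.10 = 3.0, so the odds ratio (about 3.86) overstates the relative risk when the outcome is common.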

Hazards ratio

– measure of how often a particular event happens in one group

compared to how often it happens in another group, over time.

– Example: A hazard ratio (Group 1 v/s Group 2) of 1.6 implies that, at any given time, Group 1 has a 60% higher rate (hazard) of developing the outcome than Group 2.

Survival Analysis

– Duration – time from randomization to relapse

– Time to development of a condition

– Survival – time from randomization until death

Censors data

– the critical event has not yet occurred

– lost to follow-up

– other interventions offered

– event occurred but unrelated cause

Survival Analysis: the tests

Kaplan-Meier analysis provides adjusted mean and median time to event, with CIs, for one or more groups

Tests for equality of survival distributions between groups (Log Rank

test and associated p-value)

Can use Cox regression to control for covariates

Provides hazard ratios with CI

Survival Curves: The Kaplan-Meier Estimate

Nonparametric estimate of the survivor function

Accommodates missing data such as censoring

– Censored Data:

• Mathematically removing a patient at the end of their follow-up time

(usually denoted by a vertical tick mark)

• When a patient is censored it reduces the number at risk for the next

interval

Estimate of absolute risk

Will always have a “staircase” appearance

Example of survival analysis:

• Kaplan-Meier survival curves were constructed to compare survival probability between the treatment and control groups.

• Cox regression analysis was performed to identify associations between patient characteristics and survival days.

Survival Curves: The Kaplan-Meier Estimate

Helps you clarify your thinking about

– Treatment

– Prognosis

Think of Kaplan-Meier curve as a “movie” rather than a “snapshot”

Avoid focusing on one point on the curve - it’s the entire curve that tells the story

Small N can be very misleading

An experience of 50 – 100 can give you the lay of the land

> 100 will most adequately represent the true range of possible

experience

Typical endpoint is survival but alternative endpoints can be:

– Disease Free Survival

– Progression Free Survival

– Response Duration

Log-rank Test

– Typical test to compare Kaplan-Meier survival curves between two groups

• Log-Rank statistic: variety of names, similar results

– Mantel-Haenszel Chi Square Statistic

– Cox-Mantel log-rank statistic

– Mantel log-rank statistic

– or simply the SES: approximate Chi-square test for significance

between observed vs. expected number of events

• Hazard Ratio with 95% CI

– Similar in interpretation to an odds ratio

– Expresses numerically the difference between the curves on a plot
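
The observed-versus-expected logic of the log-rank test can be sketched as below. This is a simplified two-group version written for illustration (the function name and the follow-up data are hypothetical), and it assumes at least one event time at which both groups still have patients at risk; statistical packages should be preferred for real analyses.

```python
def logrank_chi_square(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times_*  : follow-up time for each patient
    events_* : 1 if the event occurred at that time, 0 if censored
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        # Expected events in group A if both groups shared the same hazard.
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    return (observed_a - expected_a) ** 2 / variance


# Hypothetical data: group A fails early, group B fails late, no censoring.
stat = logrank_chi_square([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(f"Log-rank chi-square = {stat:.2f}")
```

A statistic above 3.84 corresponds to p < 0.05 for a chi-square distribution with 1 degree of freedom. Two groups with identical follow-up data give a statistic of 0.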

Interval     # at risk   # censored   # at risk   # who died   Proportion surviving   Cumulative survival
(start-end)  at start    during       at end      at end       this interval          at end of interval

0-1          7           0            7           1            6/7 = 0.86             0.86

1-4          6           2            4           1            3/4 = 0.75             0.86 * 0.75 = 0.64

4-10         3           1            2           1            1/2 = 0.50             0.86 * 0.75 * 0.50 = 0.32

10-12        1           0            1           0            1/1 = 1.00             0.86 * 0.75 * 0.50 * 1.00 = 0.32
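
The table's calculation is the Kaplan-Meier product-limit rule: multiply together the proportion surviving each interval. A minimal sketch, using the at-risk and death counts from the table (the function name and tuple layout are illustrative choices):

```python
def kaplan_meier(rows):
    """Product-limit estimate.

    rows: list of (label, at_risk_at_end, deaths_at_end) per interval, where
    at_risk_at_end already excludes patients censored during the interval.
    """
    cumulative = 1.0
    curve = []
    for label, at_risk, deaths in rows:
        # Proportion of those still at risk who survive this interval.
        proportion = (at_risk - deaths) / at_risk
        # Cumulative survival is the running product of the proportions.
        cumulative *= proportion
        curve.append((label, proportion, cumulative))
    return curve


rows = [("0-1", 7, 1), ("1-4", 4, 1), ("4-10", 2, 1), ("10-12", 1, 0)]
curve = kaplan_meier(rows)

for label, proportion, cumulative in curve:
    print(f"{label:>5}: interval {proportion:.2f}, cumulative {cumulative:.2f}")
# Cumulative survival by interval: 0.86, 0.64, 0.32, 0.32
```

Censored patients never count as deaths; they only shrink the number at risk, which is why the curve stays flat across the last interval.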

Life Tables

Expression of death rates of a particular population during a particular time period

– Probability of death within certain age ranges or counts of events

within a time period

– Types:

• Population (based on census or large scale survey)

• Cohort (longitudinal follow-up with a specific group)

• Clinical

Example of survival analysis:

• Time to cardiovascular death and other fatal cardiovascular outcomes was compared for the two groups in terms of relative risk and hazard ratio.

Hazard Function Curves

Sensitivity and Specificity

Terms used to evaluate a clinical test

Independent of the population of interest subjected to the test

Positive and negative predictive values are useful when considering the

value of a test to a clinician

– They are dependent on the prevalence of the disease in the population of

interest
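
To illustrate the dependence on prevalence, the sketch below holds sensitivity and specificity fixed at a hypothetical 90% each and recomputes the predictive values at three prevalences. The function name and all numbers are assumptions made for the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values for a given disease prevalence."""
    # Expected proportion of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (hypothetical 90% sensitivity and specificity), three populations.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PV+ = {ppv:.2f}, PV- = {npv:.2f}")
```

At 1% prevalence fewer than one in ten positive results is a true positive, even though the test itself has not changed.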

The sensitivity and specificity of a quantitative test are dependent on the

cut-off value above or below which the test is positive

– In general, the higher the sensitivity, the lower the specificity, and vice versa

Receiver operating characteristic (ROC) curves plot the true positive rate (sensitivity) against the false positive rate (1 - specificity) for all cut-off values

– The area under the curve of a perfect test is 1.0 and that of a useless test,

no better than tossing a coin, is 0.5
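
A ROC curve can be built by sweeping the cut-off across every observed test value and recording the resulting rates. The sketch below does this and integrates the curve with the trapezoidal rule; the function names, test scores, and disease labels are hypothetical, and a result is treated as positive when the score is at or above the cut-off.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) for each cut-off, highest first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nothing is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores; label 1 = diseased, 0 = not diseased.
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(f"AUC = {auc:.2f}")  # AUC = 0.89
```

Lowering the cut-off moves up and to the right along the curve: sensitivity rises while specificity falls.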

Sensitivity and Specificity

Sensitivity: If a person has a disease, how often will the

test be positive (true positive rate)?

– If the test is highly sensitive and the test result is negative you

can be nearly certain that they don’t have disease

– Rules out disease (when the result is negative)

Specificity: If a person does not have the disease how

often will the test be negative (true negative rate)?

– If the test result for a highly specific test is positive you can be

nearly certain that they actually have the disease

– Rules in disease (when the result is positive)

Fundamental terms for understanding Sensitivity & Specificity

True positive: the patient has the disease and the test is positive.

False positive: the patient does not have the disease but the test is positive.

True negative: the patient does not have the disease and the test is negative.

False negative: the patient has the disease but the test is negative.

Sensitivity = true positives / (true positives + false negatives)

Specificity = true negatives / (true negatives + false positives)

Predictive value for a positive result (PV+):

PV+ asks "If the test result is positive what is the probability that the patient actually has the disease?"

PV+ = true positives / (true positives + false positives)

Predictive value for a negative result (PV-):

PV- asks "If the test result is negative what is the probability that the patient does not have the disease?"

PV- = true negatives / (true negatives + false negatives)
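
The four formulas can be applied directly to the counts in a 2x2 table. In this sketch the function name and the counts (1,000 hypothetical patients, 100 of whom have the disease) are assumptions for illustration only.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PV+ and PV- from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients who test positive
        "specificity": tn / (tn + fp),  # non-diseased patients who test negative
        "pv_pos": tp / (tp + fp),       # positive results that are correct
        "pv_neg": tn / (tn + fn),       # negative results that are correct
    }


# Hypothetical counts: 100 diseased (90 TP, 10 FN), 900 non-diseased (850 TN, 50 FP).
metrics = diagnostic_metrics(tp=90, fp=50, tn=850, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity each use one column of the table (diseased or non-diseased), whereas the predictive values use one row (positive or negative results), which is why only the latter shift with prevalence.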

Diagnostic Test Design: sensitivity & specificity

Example for ROC:

• Sensitivity, specificity, positive and negative predictive values, and area under the receiver operating characteristic curve were compared for procalcitonin, C-reactive protein, and leucocyte count.

Thank You
