Sequential Logistic Regression: Modeling Risk Factors and Child Outcomes Presented to NIC Chapter of ASA October 21, 2005


Page 1: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Sequential Logistic Regression: Modeling Risk Factors and Child Outcomes Presented to NIC Chapter of ASA

October 21, 2005

Page 2: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression Model

Statistical method for relating explanatory variable(s) to the log odds of a binary outcome measure.

Dependent variable is always a binary outcome.

Independent variables may be categorical or quantitative.

Page 3: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression Model

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

The left-hand side is the log of the odds.

$p$ is the probability associated with the binary outcome measure.

$e^{\beta_1}$ is the odds ratio for independent variable $x_1$: the factor by which the odds are multiplied for a one-unit increase in $x_1$, holding the other predictors fixed.

Page 4: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

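Statistical Inference for Logistic Regression

The confidence interval for the slope $\beta_1$ is

$$b_1 \pm z \, SE_{b_1}$$

The confidence interval for the odds ratio is

$$\left(e^{\,b_1 - z \, SE_{b_1}},\; e^{\,b_1 + z \, SE_{b_1}}\right)$$

where $z$ is the critical value from the standard normal distribution for the chosen confidence level (for example, 1.96 for 95%).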

Page 5: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Statistical Inference for Logistic Regression

To test the hypothesis $H_0: \beta_1 = 0$ we compute the test statistic

$$X^2 = \left(\frac{b_1}{SE_{b_1}}\right)^2$$

which has approximately a chi-square distribution with 1 df.
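The interval and test statistic on the two inference slides can be sketched in a few lines of Python. This is only an illustration: the function name `wald_inference` is hypothetical, the slope 0.36 is the one from the binge-drinking example, and the standard error 0.04 is an assumed value, not one reported on the slides.

```python
import math

def wald_inference(b1, se_b1, z=1.96):
    """Wald confidence intervals and test statistic for one logistic slope."""
    lower = b1 - z * se_b1
    upper = b1 + z * se_b1
    # Exponentiating the endpoints gives the interval for the odds ratio.
    odds_ratio_ci = (math.exp(lower), math.exp(upper))
    # Chi-square statistic (1 df) for H0: beta1 = 0.
    x2 = (b1 / se_b1) ** 2
    return (lower, upper), odds_ratio_ci, x2

# Assumed standard error of 0.04; the slope 0.36 is taken from the example.
slope_ci, or_ci, x2 = wald_inference(b1=0.36, se_b1=0.04)
print("slope CI:", slope_ci)
print("odds ratio CI:", or_ci)
print("X^2:", x2)
```

Because the odds-ratio interval is obtained by exponentiating the slope interval, it excludes 1 exactly when the slope interval excludes 0.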

Page 6: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression with One Predictor [1]

In a large sample of college students, the proportion who frequently engage in binge drinking is $p = 3{,}314/17{,}096 = 0.1938$.

The odds for this outcome are thus:

$$\frac{p}{1-p} = \frac{0.1938}{0.8062} = 0.24$$

[1] This example is borrowed from Introduction to the Practice of Statistics by Moore and McCabe (2006).

Page 7: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

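Is Gender a Predictor?

| Binge Drinking? | Male | Female | Total |
|---|---|---|---|
| Yes | 1,630 | 1,684 | 3,314 |
| No | 5,550 | 8,232 | 13,782 |
| Total | 7,180 | 9,916 | 17,096 |

Odds for males:

$$\frac{p}{1-p} = \frac{0.2270}{0.7730} = 0.294$$

Log odds for males: $\ln(0.294) = -1.22$

Odds for females:

$$\frac{p}{1-p} = \frac{0.1698}{0.8302} = 0.205$$

Log odds for females: $\ln(0.205) = -1.58$

These log odds are computed from the rounded odds. The unrounded odds give -1.23 and -1.59, which are the values used in the fitted model.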

Page 8: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

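Interpreting the LogReg Model

Model for this example is:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$$

For females ($x_1 = 0$) we have:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 (0) = \beta_0$$

Thus the estimate of the intercept $\beta_0$ is the log odds for females:

$$b_0 = \ln\left(\frac{p}{1-p}\right) = -1.59$$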

Page 9: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Interpreting the LogReg Model

The estimate of the slope is the difference between the log odds for males ($x_1 = 1$) and the log odds for females ($x_1 = 0$):

$$b_1 = \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right) = -1.23 - (-1.59) = 0.36$$

The fitted model is: $\log(\text{ODDS}) = -1.59 + 0.36x$

Page 10: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Meaning of the Odds Ratio

The odds ratio is:

$$\frac{\text{ODDS}_{\text{males}}}{\text{ODDS}_{\text{females}}} = \frac{e^{-1.59 + 0.36}}{e^{-1.59}} = e^{0.36} = 1.43$$

Interpretation: the odds of being a frequent binge drinker for males are 1.43 times the odds for females.
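As a check on the arithmetic, the intercept, slope, and odds ratio can be recomputed directly from the counts in the gender by binge-drinking table. The sketch below uses only those four counts; the variable names are illustrative.

```python
import math

# Counts from the gender by binge-drinking table.
yes_m, no_m = 1630, 5550   # males
yes_f, no_f = 1684, 8232   # females

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

p_m = yes_m / (yes_m + no_m)
p_f = yes_f / (yes_f + no_f)

b0 = math.log(odds(p_f))        # intercept: log odds for females (x = 0)
b1 = math.log(odds(p_m)) - b0   # slope: difference in log odds, males minus females
odds_ratio = math.exp(b1)       # odds for males relative to odds for females

print(round(b0, 2), round(b1, 2), round(odds_ratio, 2))
```

Working from unrounded values, the odds ratio comes out near 1.44. The 1.43 on the slide results from exponentiating the rounded slope 0.36.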

Page 11: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Multivariate Logistic Regression

The multivariate case has the same statistical concepts but the computations are more difficult because of the potential correlation among multiple predictors.

It is easy to conduct the analysis using a statistical software package.

Page 12: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Overview of Study

Children grow up within the context of personality, family, neighborhood, and society.

They grow up with both disadvantages and opportunities, problems and strengths, referred to here as risk and protective factors.

Examples of commonly understood risk factors include low birth weight, child maltreatment, illness, and neighborhood violence.

Examples of commonly understood protective factors include individual verbal communication skills, the capacity for empathy, problem-solving skills, frustration tolerance, the presence of multiple and consistent caregivers, access to health care and social services, and the concrete, social, and affective support of family and friends.

The aim of this study was to empirically measure risk and protective factors at the individual, family, and neighborhood level and to relate them to poor short- and longer-term outcomes such as health problems, behavioral and cognitive development, and maltreatment.

Page 13: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Methods -- Subjects

The 219 mother-infant dyads recruited for this study were part of a larger cohort recruited in waves over four years, beginning in 1990, as part of the Capella Project, a twenty-year longitudinal study funded by NIH.

Data used in the current analysis were collected over a period of approximately 4-5 years.

Infants in the study were all under 18 months of age when they entered the study.

Page 14: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Methods -- Instruments

Extensive information was collected during the primary maternal interview. The main tools were the interview and self-report inventories, a combination of study-developed and standardized instruments.

Maternal information: use of alcohol and drugs; physical and psychological health; personal history of physical, sexual, and emotional abuse; family functioning and daily life stressors; neighborhood conditions.

Child information: behavior; health, accidents, and hospitalizations; cognitive and emotional development.

Child maltreatment: abuse or neglect in the child’s first year of life, obtained from an annual review of hotline records of reports and supplemented by case record review.

Page 15: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Caregiver Intra-Personal Functioning

CAGE: 4-item rapid alcoholism screening scale. Subjects were classified as having a possible alcohol problem if they endorsed 2 or more items.

Center for Epidemiologic Studies Depression Scale: 20-item scale to measure depressive symptoms. Clinical cut-off score of 16 used here.

Health Opinion Survey: 20-item scale to assess neurotic or psychosomatic symptoms. Higher scores indicate more symptoms. A binary measure was computed using a median split, to reflect above-average psychosomatic symptoms.

Service Utilization: report of a psychiatric or substance use hospitalization.

Page 16: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Caregiver Inter-Personal Functioning

Family and Neighborhood: The family APGAR is a 5-item inventory of family function and satisfaction. The Neighborhood Satisfaction Index is a 9-item inventory of neighborhood characteristics.

Domestic Violence: defined by self-report in conjunction with questions regarding childhood physical, sexual, and emotional abuse, and further confirmed as current by the interviewer in the site-specific Trauma and Violence scale.

Lifetime Stressors: an inventory of common stressors such as marriage, divorce, death in the family, moving, experiencing violence, etc.

Page 17: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Child Short-Term Outcomes

Child Health Status: items to assess general health, specific conditions applying to the child, and other illness or problems.

Service Utilization Measures: to assess accidents and hospitalizations of the child.

Child Abuse Neglect Tracking System: abuse or neglect in the child’s first year of life, obtained from an annual review of hotline records of reports, and supplemented by case record review.

Battelle Developmental Inventory Screening Test: 96 items (out of 341 in the complete battery) to assess five domains: personal-social skills, adaptive behavior, psychomotor ability, communication, and cognitive. A child was considered to have delayed development if the (standardized) Battelle total score was more than 1 standard deviation from the mean.

Page 18: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Child Long-Term Outcomes

Child Health – items assessing general health, specific conditions applying to child and other illness or problems through caregiver report.

Child Behavior Checklist – 5 scale scores assessing a child’s behavioral and social development.

PRESS – A measure of intelligence for pre-school children.

Page 19: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Hypotheses

The theoretical model guiding the analyses posited a sequential model of the effect of certain risk factors on child developmental outcomes.

These risk factors were maternal history of loss and/or victimization, maternal compromised emotional status, domestic violence, and family and/or neighborhood problems.

Page 20: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Hypotheses

Maternal history of loss/victimization would be associated with maternal compromised emotional status.

Maternal compromised emotional status would be associated with problems in the family and neighborhood and/or domestic violence.

Problems in the family and neighborhood and domestic violence would be associated with poor short-term child outcomes.

Poor short-term outcomes would be associated with poor longer-term child outcomes.

Page 21: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Visual Model of the Hypotheses

[Figure: path diagram of the hypothesized sequence. Boxes and the measures listed in each:

Maternal History: Victim of Child Abuse, Lost a Parent.

Compromised Emotional Status: CageA, CES-D, Health Opinion Survey, Residential Treatment.

Family & Neighborhood: FAPGAR, Life Experiences, Neighborhood Short Form.

Domestic Violence.

Short-Term Outcomes: Child Abuse or Neglect, AOD, Battelle, Child Health.

Long-Term Outcomes: Press, CBCL, Battelle, Child Health.]

Page 22: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Measures used in Analyses

Maternal loss/victimization history was coded yes (1) if the mother reported either a personal history of abuse or losing a parent before the age of 18, and no (0) otherwise.

Maternal compromised emotional status was coded yes (1) if the mother met any of the following:

Score of 2 or higher on a 4-item rapid alcoholism screening inventory (CAGE).

Score above the cutoff of 16 on the depression inventory (CES-D).

Score above the median on the inventory of psychosomatic symptoms.

Report of a substance or psychiatric hospitalization.
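The "any of the following" rule for compromised emotional status amounts to a logical OR over four criteria. The sketch below is one way to express it; the function and argument names are hypothetical, the sample median must be supplied by the caller, and reading "above cutoff of 16" as strictly greater than 16 is an assumption.

```python
def compromised_emotional_status(cage_items, cesd_score, hos_score,
                                 hos_median, hospitalized):
    """Return 1 if the mother meets any criterion, else 0."""
    criteria = [
        cage_items >= 2,          # CAGE: 2 or more of 4 items endorsed
        cesd_score > 16,          # CES-D: above the cutoff (assumed strict)
        hos_score > hos_median,   # psychosomatic symptoms above the sample median
        bool(hospitalized),       # substance or psychiatric hospitalization
    ]
    return int(any(criteria))

# Illustrative, made-up values: only the CAGE criterion is met here.
print(compromised_emotional_status(cage_items=2, cesd_score=10,
                                   hos_score=25, hos_median=30,
                                   hospitalized=False))
```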

Page 23: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Measures used in Analyses

Problems in the family or neighborhood was coded yes (1) if the mother scored above the median on two or more of the following inventories: family function and satisfaction, neighborhood characteristics, and lifetime stressors.

Domestic violence was coded yes (1) if the mother reported domestic violence.

Page 24: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Measures used in Analyses

Poor short-term (1-2 year) child outcomes was coded yes (1) if the child had any two of the following:

Health problem(s), accident, or hospitalization.

Delayed development (Battelle).

Presence of alcohol or drugs at birth.

OR there was a report of abuse or neglect.
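The short-term outcome measure combines a count rule with an override, which a small function can make explicit. This is a sketch under stated assumptions: the names are hypothetical, and "any two" is read as "at least two".

```python
def poor_short_term_outcome(health_problem, delayed_development,
                            alcohol_or_drugs_at_birth, maltreatment_report):
    """Return 1 for a poor short-term outcome, else 0."""
    indicators = [health_problem, delayed_development, alcohol_or_drugs_at_birth]
    # Assumed reading: two or more of the three indicators present.
    two_or_more = sum(bool(x) for x in indicators) >= 2
    # A report of abuse or neglect is sufficient on its own.
    return int(two_or_more or bool(maltreatment_report))

# Illustrative cases: one indicator alone is not enough, but a report is.
print(poor_short_term_outcome(True, False, False, False))
print(poor_short_term_outcome(False, False, False, True))
```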

Page 25: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Measures used in Analyses

Poor long-term (3-4 year) child outcomes was coded yes (1) if the child had any two of the following:

Health problem(s), accident, or hospitalization.

Delayed development (PRESS).

Behavioral problems (CBCL).

Page 26: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression #1

Maternal loss/victimization history entered as a single predictor for maternal compromised emotional status.

This analysis was statistically significant (Chi-Square = 13.94, p < .001), and resulted in correct classification of 47% of cases without impaired caregiver status, 77% of cases with caregiver status problems, and 68% of cases overall.

The odds ratio for the predictor (maternal victimization history) was 3.1, with a 95% CI of 1.7 to 5.6.

Page 27: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

SPSS Output

Page 28: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

The Model so Far

[Figure: the path diagram from the Visual Model of the Hypotheses slide, annotated with the odds ratio estimated so far: 3.1 (Maternal History to Compromised Emotional Status).]

Page 29: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression #2

Maternal loss/victimization history and maternal compromised emotional status entered together as predictors for family/neighborhood problems.

This analysis was also statistically significant (Chi-Square = 16.17, p < .001), and resulted in correct classification of 60% of cases without family/neighborhood problems, 65% of cases with family/neighborhood problems, and 63% of cases overall.

The odds ratio for maternal compromised emotional status as a predictor of family/neighborhood problems was 2.5, with a 95% CI of 1.4 to 4.6.

The odds ratio for maternal victimization history was not statistically significant.

Page 30: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

SPSS Output

Page 31: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

The Model so Far

[Figure: the path diagram from the Visual Model of the Hypotheses slide, annotated with the odds ratios estimated so far: 3.1 (Maternal History to Compromised Emotional Status) and 2.5 (Compromised Emotional Status to Family & Neighborhood).]

Page 32: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression #3

Maternal loss/victimization history, caregiver status, and family/neighborhood problems entered in one step to predict presence of domestic violence in the home.

This regression was statistically significant (Chi-Square = 16.36, p < .001), and resulted in correct classification of 71% of cases without domestic violence in the home, 51% of cases with domestic violence in the home, and 62% of cases overall.

The odds ratio for maternal compromised emotional status as a predictor of domestic violence was 2.1, with a 95% CI of 1.4 to 4.6.

The odds ratio for family/neighborhood problems as a predictor of domestic violence was 1.8, with a 95% CI of >1.0 to 3.2.

The odds ratio for maternal victimization history was not statistically significant.

Page 33: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

SPSS Output

Page 34: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

The Model so Far

[Figure: the path diagram from the Visual Model of the Hypotheses slide, annotated with the odds ratios estimated so far: 3.1 (Maternal History to Compromised Emotional Status), 2.5 (Compromised Emotional Status to Family & Neighborhood), 2.1 (Compromised Emotional Status to Domestic Violence), and 1.8 (Family & Neighborhood to Domestic Violence).]

Page 35: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression #4

Maternal loss/victimization history, caregiver status, family/neighborhood problems, and domestic violence entered in one step to predict presence of poor short-term child outcomes.

The overall regression was not statistically significant (Chi-Square = 8.98, p = .062), and classification was less effective. Under this model, all cases were classified into the poor short-term child outcome group, correctly classifying only those subjects who did in fact have poor short-term child outcomes (66%), and misclassifying all the rest.

The odds ratio for domestic violence as a predictor of poor short-term child outcomes was 2.1, with a 95% CI of 1.2 to 3.9. This was statistically significant.

The odds ratios for the other predictors were not statistically significant.

Page 36: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

SPSS Output

Page 37: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

The Model so Far

[Figure: the path diagram from the Visual Model of the Hypotheses slide, annotated with the odds ratios estimated so far: 3.1 (Maternal History to Compromised Emotional Status), 2.5 (Compromised Emotional Status to Family & Neighborhood), 2.1 (Compromised Emotional Status to Domestic Violence), 1.8 (Family & Neighborhood to Domestic Violence), and 2.1 (Domestic Violence to Short-Term Outcomes).]

Page 38: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Logistic Regression #5

Maternal loss/victimization history, caregiver status, family/neighborhood problems, domestic violence, and poor short-term child outcomes entered in one step to predict presence of poor longer-term child outcomes.

The overall regression was statistically significant (Chi-Square = 16.67, p < .005), and resulted in correct classification of 39% of cases without poor long-term child outcomes, 85% of cases having poor long-term child outcomes, and 68% of cases overall.

The odds ratio for family/neighborhood problems as a predictor of poor long-term child outcomes was 2.6, with a 95% CI of 1.1 to 6.1.

The odds ratio for poor short-term outcomes as a predictor of poor long-term child outcomes was 3.2, with a 95% CI of 1.4 to 7.6.

The odds ratios for the other predictors were not statistically significant.

Page 39: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

SPSS Output

Page 40: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

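The Final Model

[Figure: the path diagram from the Visual Model of the Hypotheses slide, with the Domestic Violence box labeled "Victim of Domestic Violence" and annotated with all statistically significant odds ratios: 3.1 (Maternal History to Compromised Emotional Status), 2.5 (Compromised Emotional Status to Family & Neighborhood), 2.1 (Compromised Emotional Status to Domestic Violence), 1.8 (Family & Neighborhood to Domestic Violence), 2.1 (Domestic Violence to Short-Term Outcomes), 2.6 (Family & Neighborhood to Long-Term Outcomes), and 3.2 (Short-Term Outcomes to Long-Term Outcomes).]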

Page 41: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Goodness of Fit

-2LL (where LL is the log likelihood) is 0 if the model fits perfectly.

The chi-square statistic tests the change in -2LL from the constant-only model to the model with the set of predictors.
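To make the change in -2LL concrete, the sketch below computes it for the earlier binge-drinking example rather than for the study data, which are not given on the slides. It relies on the fact that a logistic model with a single binary predictor reproduces each group's observed proportion, so -2LL can be computed from the grouped counts. The helper name `neg2ll` is hypothetical.

```python
import math

def neg2ll(groups):
    """-2 log likelihood when each group is fitted with its own observed rate.

    groups: list of (events, non_events) pairs, each with both counts > 0.
    """
    ll = 0.0
    for events, non_events in groups:
        p = events / (events + non_events)
        ll += events * math.log(p) + non_events * math.log(1 - p)
    return -2.0 * ll

# Constant-only model: one overall rate for all 17,096 students.
constant_only = neg2ll([(3314, 13782)])
# Model with gender: separate rates for males and females.
with_gender = neg2ll([(1630, 5550), (1684, 8232)])

# Model chi-square: the drop in -2LL, here with 1 df (one added predictor).
chi_square = constant_only - with_gender
print(constant_only, with_gender, chi_square)
```

The degrees of freedom equal the number of predictors added to the constant-only model.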

Page 42: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Goodness of Fit

Quantification of the proportion of explained variance: Cox & Snell $R^2$ and Nagelkerke $R^2$.

These are similar in intent to $R^2$ in multiple linear regression. For the current model, about 19.5%.
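For reference, the two measures are usually defined as follows. The slides do not give these formulas; the symbols $L_0$ (likelihood of the constant-only model), $L_M$ (likelihood of the model with predictors), $n$ (sample size), and $\chi^2_M$ (the model chi-square, the change in -2LL) are notation introduced here.

```latex
R^2_{\text{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n}
               = 1 - \exp\!\left(-\frac{\chi^2_M}{n}\right),
\qquad
R^2_{\text{N}} = \frac{R^2_{\text{CS}}}{1 - L_0^{\,2/n}}
```

The Nagelkerke version rescales the Cox & Snell value by its maximum attainable value so that the measure can reach 1.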

Page 43: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Discrimination and Calibration

Model discrimination: the ability of the model to discriminate between observations in the two groups.

Model calibration: how closely the observed and predicted probabilities match.

Page 44: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Model Discrimination

SPSS provides a classification table (shown earlier).

SPSS also provides a histogram of estimated probabilities. Positive cases should be on the right and negative cases on the left.

Page 45: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Model Discrimination

Not so good.

One serious problem is that the sample itself was quite biased toward poor outcomes because of poverty, etc.

Page 46: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Calibration

Hosmer-Lemeshow goodness-of-fit test

Cases are divided into deciles based on estimated probabilities, and observed numbers are compared to expected numbers (contingency table).

The null hypothesis is that there is no difference between the observed and predicted values.

This statistic should be interpreted cautiously because its value depends on the number of groups.
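The grouping and comparison behind the Hosmer-Lemeshow statistic can be sketched as below. This is a simplified illustration, not the SPSS implementation: the function name is hypothetical, groups are formed as equal-sized slices of the sorted cases, and the p-value lookup against a chi-square distribution with (groups - 2) df is left out.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, degrees_of_freedom) for predicted probabilities."""
    # Sort cases by predicted probability only (stable, so ties keep input order).
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        denom = expected * (1 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
        used += 1
    return stat, used - 2

# Made-up data: 100 evenly spaced probabilities with alternating outcomes.
probs = [(i + 0.5) / 100 for i in range(100)]
outcomes = [i % 2 for i in range(100)]
print(hosmer_lemeshow(probs, outcomes))
```

Changing `groups` changes both the statistic and its degrees of freedom, which is why the result depends on how many groups are used.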

Page 47: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Hosmer and Lemeshow for Final Model

The null hypothesis is not rejected, suggesting the model is OK.

Page 48: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

The c-Statistic

The c-statistic is interpreted as the proportion of pairs of cases with different observed outcomes in which the model assigns a higher probability to the case with the event than to the case without the event.

It ranges in value from 0.5 to 1.0, where 0.5 means the model does no better than chance and 1.0 means the model always assigns higher probability to cases with the event than to those without the event.

To get this in SPSS, first save the predicted probabilities along with the actual outcome measure into a new file, then group them into a reasonably large number of distinct groups using an equation like this:

probcat = trunc(prob_1/.00005)

Next, cross-tabulate probcat with the outcome measure and calculate Somers’ d.

Page 49: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Somers’ d

Page 50: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

c-Statistic

The c-statistic is interpreted as the percentage of possible pairs of cases, in which one is positive on the outcome and the other is negative, for which the logistic model assigns a higher probability to the positive case.

$$c = \frac{d}{2} + 0.5 = \frac{0.79}{2} + 0.5 = 0.895$$
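Outside SPSS, the c-statistic can be computed directly by counting pairs, without binning the probabilities. The sketch below is an illustration with hypothetical function names and made-up data; counting tied pairs as one half is a common convention and is an assumption here, chosen because it keeps the result consistent with the Somers' d conversion.

```python
def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted probability; tied pairs count as one half (assumed)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    concordant = 0
    ties = 0
    for p_event in pos:
        for p_non in neg:
            if p_event > p_non:
                concordant += 1
            elif p_event == p_non:
                ties += 1
    return (concordant + 0.5 * ties) / (len(pos) * len(neg))

def c_from_somers_d(d):
    """Convert Somers' d to the c-statistic: c = d/2 + 0.5."""
    return d / 2 + 0.5

# Made-up probabilities: every event case outranks every non-event case.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
# The conversion applied to the Somers' d of 0.79 from the slide.
print(c_from_somers_d(0.79))
```

The pairwise loop is quadratic in the number of cases, which is unproblematic for a sample of a few hundred dyads.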

Page 51: Sequential Logistic Regression:  Modeling Risk Factors and Child Outcomes

Conclusion

These results provided general support for the model overall.

Subsequent analyses (not reported here) helped further refine the model and explore relationships among risk factors and child outcomes.