Introduction to Modeling 6.872/HST950


Page 1:

Introduction to Modeling
6.872/HST950

Page 2:

Why build Models?

• To predict (identify) something

• Diagnosis

• Best therapy

• Prognosis

• Cost

• To understand something

• Structure of model may correspond to structure of reality

Page 3:

Where do models come from?

• Pure induction from data

• Even so, need some “space” of models to explore

• A-priori knowledge, expressed in

• Structure of the space of models

• Adjustments to observed data

• Maximum A-posteriori Probability (MAP)

• Maximum Likelihood (ML)

• Assumes uniform priors over all hypotheses in the space

Page 4:

An Example (Russell & Norvig)

• Surprise Candy Corp. makes two flavors of candy: cherry and lime

• Both flavors come in the same opaque wrapper

• Candy is sold in large bags, which have one of the following distributions of flavors, but are visually indistinguishable:

• h1: 100% cherry

• h2: 75% cherry, 25% lime

• h3: 50% cherry, 50% lime

• h4: 25% cherry, 75% lime

• h5: 100% lime

• Relative prevalence of these types of bags is (.1, .2, .4, .2, .1)

• As we eat our way through a bag of candy, predict the flavor of the next piece; actually a probability distribution.

Page 5:

Bayesian Learning

• Calculate the probability of each hypothesis given the data: P(hi | d) ∝ P(d | hi) P(hi)

• To predict the probability distribution over an unknown quantity, X: P(X | d) = Σi P(X | hi) P(hi | d)

• If the observations d are independent, then P(d | hi) = Πj P(dj | hi)

• E.g., suppose the first 10 candies we taste are all lime (see the sketch below)
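As a concrete illustration of these update rules, here is a small sketch (mine, not from the slides) that computes the posterior over the five bag hypotheses after k lime candies and the resulting prediction for the next piece; the priors and flavor proportions are the ones from the Surprise Candy example above, and all function names are invented.

```python
# Sketch of Bayesian learning on the candy example: posteriors over the five
# bag hypotheses after k lime candies, plus the MAP and Bayesian predictions.
priors = {"h1": 0.1, "h2": 0.2, "h3": 0.4, "h4": 0.2, "h5": 0.1}
p_lime = {"h1": 0.0, "h2": 0.25, "h3": 0.5, "h4": 0.75, "h5": 1.0}

def posteriors_after_limes(k):
    """P(hi | d) after observing k lime candies in a row."""
    unnorm = {h: priors[h] * p_lime[h] ** k for h in priors}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def p_next_lime(post):
    """Bayesian prediction: P(next = lime | d) = sum_i P(lime | hi) P(hi | d)."""
    return sum(p_lime[h] * post[h] for h in post)

for k in range(11):
    post = posteriors_after_limes(k)
    map_h = max(post, key=post.get)   # MAP prediction uses only this hypothesis
    print(k, map_h, round(post[map_h], 3), round(p_next_lime(post), 3))
```

After three limes the MAP hypothesis is h5 (predict lime), while the full Bayesian prediction for lime is only about 0.8, matching the next slide.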

Page 6:

Learning Hypotheses and Predicting from Them

h1: 100% cherry
h2: 75% cherry, 25% lime
h3: 50% cherry, 50% lime
h4: 25% cherry, 75% lime
h5: 100% lime

• (a) probabilities of hi after k lime candies; (b) prob. of next lime

• Image by MIT OpenCourseWare.

• MAP prediction: predict just from most probable hypothesis

• After 3 limes, h5 is most probable, hence we predict lime

• Even though, by (b), it’s only 80% probable

[Figure: (a) posterior probability of each hypothesis, P(h1 | d) … P(h5 | d), versus the number of samples in d; (b) probability that the next candy is lime versus the number of samples in d.]

Page 7:

Observations

• Bayesian approach asks for prior probabilities on hypotheses!

• Natural way to encode bias against complex hypotheses: make their prior probability very low

• Choosing hMAP to maximize P(d | hi) P(hi)

• is equivalent to minimizing −log2 P(d | hi) − log2 P(hi)

• but as we know that entropy is a measure of information, these two terms are

• # of bits needed to describe the data given the hypothesis

• # of bits needed to specify the hypothesis

• Thus, MAP learning chooses the hypothesis that maximizes compression of the data; Minimum Description Length principle

• Regularization is similar to 2nd term—penalty for complexity

• Assuming uniform priors on hypotheses makes MAP yield hML, the maximum likelihood hypothesis, which maximizes P(d | hi)

Page 8:

Learning More Complex Hypotheses

• Input:

• Set of cases, each of which includes

• numerous features: categorical labels, ordinals, continuous

• these correspond to the independent variables

• Output:

• For each case, a result, prediction, classification, etc., corresponding to the dependent variable

• In regression problems, a continuous output

• a designated feature the model tries to predict

• In classification problems, a discrete output

• the category to which the case is assigned

• Task: learn function f(input)=output

• that minimizes some measure of error

Page 9:

Linear Regression

• General form of the function: y = β0 + β1x1 + β2x2 + … + βnxn

• For each case i, the model predicts ŷi = β0 + Σj βjxij

• Find the β’s to minimize some function of the errors (yi − ŷi) over all cases

• e.g., mean squared error: MSE = (1/N) Σi (yi − ŷi)² (see the sketch below)
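A minimal sketch of the least-squares fit described above, using NumPy on invented data; the variable names and the synthetic dataset are my own, not part of the slides.

```python
# Fit y = b0 + sum_j b_j * x_j by least squares (minimizing mean squared error).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # 100 cases, 3 features
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=100)

X1 = np.column_stack([np.ones(len(X)), X])      # prepend a column of 1s for b0
betas, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares solution
y_hat = X1 @ betas
mse = np.mean((y - y_hat) ** 2)
print("betas:", betas.round(3), "MSE:", round(mse, 4))
```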

Page 10:

Logistic Regression

• Logistic function: p = 1 / (1 + e^−(β0 + Σj βjxj)), i.e., log(p / (1 − p)) = β0 + Σj βjxj

• E.g., how risk factors contribute to the probability of death

• The βj are the log odds ratios (see the sketch below)
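A hedged sketch of fitting a logistic model with scikit-learn on simulated data, showing how the fitted coefficients can be read as log odds ratios; the features, data, and names here are invented for illustration.

```python
# Logistic regression on simulated binary outcomes (e.g., death vs. survival).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                 # two invented risk factors
true_logit = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1]
y = rng.uniform(size=500) < 1 / (1 + np.exp(-true_logit))   # simulated outcomes

model = LogisticRegression().fit(X, y)
log_odds_ratios = model.coef_[0]              # the fitted betas
print("log odds ratios:", log_odds_ratios.round(2),
      "odds ratios:", np.exp(log_odds_ratios).round(2))
print("P(death | x = [0.5, 1.0]):", round(model.predict_proba([[0.5, 1.0]])[0, 1], 3))
```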

Page 11:

More sophisticated models

• Nearest Neighbor Methods

• Classification Trees

• Artificial Neural Nets

• Support Vector Machines

• Bayes Networks (much on this, later)

• Rough Sets, Fuzzy Sets, etc. (see 6.873/HST951 or other ML classes)

Page 12:

How?

• Given: pile of training data, all cases labeled with gold standard outcome

• Learn “best” model

• Gather new test data, also all labeled with outcomes

• Test performance of model on new test data

• Simple, no?

Page 13:

Simplest Example

• Relationship between a diagnostic conclusion and a diagnostic test

                    Test Positive      Test Negative
Disease Present     True Positive      False Negative     TP+FN
Disease Absent      False Positive     True Negative      FP+TN
                    TP+FP              FN+TN

Page 14:

Definitions

                    Test Positive      Test Negative
Disease Present     True Positive      False Negative     TP+FN
Disease Absent      False Positive     True Negative      FP+TN
                    TP+FP              FN+TN

Sensitivity (true positive rate): TP/(TP+FN)

False negative rate: 1-Sensitivity = FN/(TP+FN)

Specificity (true negative rate): TN/(FP+TN)

False positive rate: 1-Specificity = FP/(FP+TN)

Positive Predictive Value (PPV): TP/(TP+FP)

Negative Predictive Value (NPV): TN/(FN+TN)
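A quick sketch that turns the definitions above into code; the counts in the example call are made up.

```python
# Test-performance metrics from confusion-matrix counts.
def test_metrics(tp, fn, fp, tn):
    return {
        "sensitivity (TPR)": tp / (tp + fn),
        "false negative rate": fn / (tp + fn),
        "specificity (TNR)": tn / (fp + tn),
        "false positive rate": fp / (fp + tn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (fn + tn),
    }

print(test_metrics(tp=90, fn=10, fp=30, tn=870))
```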

Page 15:

Test Thresholds

[Figure: distributions of test values for disease-present (+) and disease-absent (−) patients overlap; the threshold T determines which results are called positive, creating the FP and FN regions.]

Page 16:

Wonderful Test

[Figure: for a wonderful test, the + and − distributions barely overlap, so a threshold T yields few FPs or FNs.]

Page 17:

Test Thresholds Change Trade-off between Sensitivity and Specificity

[Figure: moving the threshold T shifts the balance between FPs and FNs, trading sensitivity against specificity.]

Page 18:

Receiver Operator Characteristic (ROC) Curve

[Figure: ROC curve plotting TPR (sensitivity) against FPR (1-specificity), traced out from (0,0) to (1,1) as the threshold T varies.]

Page 19:

What makes a better test?

[Figure: ROC curves, TPR (sensitivity) vs. FPR (1-specificity): a worthless test lies on the diagonal, an OK test bows above it, and a superb test hugs the upper-left corner.]
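A small sketch (not from the slides) of tracing an ROC curve by sweeping the threshold T over simulated test values for diseased and non-diseased patients, and estimating the area under the curve with the trapezoid rule; the distributions are invented.

```python
# ROC curve by threshold sweep on simulated test values.
import numpy as np

rng = np.random.default_rng(2)
diseased = rng.normal(2.0, 1.0, 200)          # test values when disease present
healthy = rng.normal(0.0, 1.0, 800)           # test values when disease absent
scores = np.concatenate([diseased, healthy])
labels = np.concatenate([np.ones(200), np.zeros(800)])

thresholds = np.sort(scores)[::-1]            # sweep T from high to low
tpr = np.array([(scores[labels == 1] >= t).mean() for t in thresholds])
fpr = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))   # trapezoid rule
print("AUC:", round(auc, 3))
```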

Page 20:

Need to explore many models

• Remember:

• training set => model

• model + test set => measure of performance

• But

• How do we choose the best family of models?

• How do we choose the important features?

• Models may have structural parameters

• Number of hidden units in ANN

• Max number of parents in Bayes Net

• Parameters (like the betas in LR), and meta-parameters

• Not legitimate to “try all” and report the best!

Page 21:

The Lady Tasting Tea

• R.A. Fisher & the Lady

• Muriel Bristol claimed she preferred tea added to milk rather than milk added to tea

• Fisher was skeptical that she could distinguish

• Possible resolutions

• Reason about the chemistry of tea and milk

• Milk first: a little tea interacts with a lot of milk

• Tea first: vice versa

• Perform a “clinical trial”

• Ask her to determine order for a series of test cups

• Calculate probability that her answers could have occurred by chance guessing; if small, she “wins”

• ... Fisher’s Exact Test

• Significance testing

• Reject the null hypothesis (that it happened by chance) if its probability is < 0.1, 0.05, 0.01, 0.001, ..., 0.000001, ..., ????
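For concreteness, here is a sketch of the chance-guessing calculation behind Fisher's exact test, assuming the classic eight-cup design (four milk-first, four tea-first, and the taster must identify the four milk-first cups); the slides do not fix the design, so the numbers are illustrative.

```python
# Hypergeometric probability of guessing at least k of the 4 milk-first cups.
from math import comb

def p_at_least_k_correct(k, n_per_group=4):
    """P(identifying >= k milk-first cups correctly by pure guessing)."""
    total = comb(2 * n_per_group, n_per_group)
    return sum(comb(n_per_group, j) * comb(n_per_group, n_per_group - j)
               for j in range(k, n_per_group + 1)) / total

print("P(all 4 correct by chance):", p_at_least_k_correct(4))   # 1/70 ~= 0.014
print("P(>= 3 correct by chance):", p_at_least_k_correct(3))    # 17/70 ~= 0.243
```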

Page 22:

How to deal with multiple testing

• Suppose Ms. Bristol had tried this test 100 times, and passed once. Would you be convinced of her ability to distinguish?

• Bonferroni correction: for n trials, insist on a p-value that is 1/n of what you would demand for a single trial
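A short numeric illustration (my own, not from the slides) of why a single pass among 100 attempts is unconvincing, and what the Bonferroni-corrected threshold would be.

```python
# With 100 independent trials at alpha = 0.05, a chance "pass" somewhere is
# nearly certain, so Bonferroni demands 0.05/100 per trial.
n_trials, alpha = 100, 0.05
p_some_false_positive = 1 - (1 - alpha) ** n_trials
print("P(at least one chance pass):", round(p_some_false_positive, 3))  # ~0.994
print("Bonferroni-corrected threshold:", alpha / n_trials)              # 0.0005
```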

Page 23:

Cross-validation

[Figure: the Training Data is split into “Real” Training Data and Validation Data; Test Data is held out separately.]

• Any number of times

• Train on some subset of the training data

• Test on the remainder, called the validation set

• Choose best meta-parameters

• Train, with those meta-parameters, on all training data

• Test on Test data, once!
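A hedged sketch of this procedure with scikit-learn: meta-parameters are chosen by cross-validation within the training data only, the model is refit on all training data, and the held-out test set is scored exactly once. The model family, the regularization grid, and the synthetic data are illustrative choices, not the slides'.

```python
# Choose a meta-parameter by cross-validation, then test once.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

best_C, best_score = None, float("-inf")
for C in [0.01, 0.1, 1.0, 10.0]:              # candidate meta-parameter values
    score = cross_val_score(LogisticRegression(C=C, max_iter=1000),
                            X_train, y_train, cv=5, scoring="roc_auc").mean()
    if score > best_score:
        best_C, best_score = C, score

final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_auc = roc_auc_score(y_test, final.predict_proba(X_test)[:, 1])
print("chosen C:", best_C, "held-out test AUC (computed once):", round(test_auc, 3))
```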

Page 24:

Aliferis lessons (part)

• Overfitting

• bias, variance, noise

• O = optimal possible model over all possible learners

• L = best model learnable by this learner

• A = actual model learned

• Bias = O - L (limitation of learning method or target model)

• Variance = L - A (error due to sampling of training cases)

• Compare against learning from randomly permuted data

• Curse of dimensionality

• Feature selection

• Dimensionality reduction
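A sketch of the "compare against learning from randomly permuted data" check mentioned above: if cross-validated performance on label-shuffled data is about as good as on the real labels, the apparent signal is suspect. The dataset and model choices here are invented.

```python
# Permutation baseline: real labels vs. shuffled labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=30, n_informative=5, random_state=3)
real_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5, scoring="roc_auc").mean()

rng = np.random.default_rng(3)
perm_aucs = [cross_val_score(LogisticRegression(max_iter=1000), X, rng.permutation(y),
                             cv=5, scoring="roc_auc").mean() for _ in range(10)]
print("real AUC:", round(real_auc, 3),
      "| permuted AUC:", round(float(np.mean(perm_aucs)), 3), "(should hover near 0.5)")
```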

Page 25:

Causality

• Suppes, 1950’s

• Statistical association

• Temporal succession

• No confounders (!)

• hidden variables

• A node, X, is conditionally independent of all other nodes in the network given its Markov blanket: its parents Ui, children Yi, and children’s parents Zi

[Figure: node X with parents U1 … Un, children Y1 … Yn, and the children’s other parents Z1 … Zn forming its Markov blanket.]
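A small sketch (my own construction) that reads a node's Markov blanket off a directed graph represented as a parents dictionary, matching the definition above.

```python
# Markov blanket = parents + children + children's other parents.
def markov_blanket(node, parents):
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

# Example DAG: U -> X -> Y <- Z  (U is a parent, Y a child, Z a co-parent of Y)
parents = {"U": [], "Z": [], "X": ["U"], "Y": ["X", "Z"]}
print(markov_blanket("X", parents))   # {'U', 'Y', 'Z'}
```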

Page 26:

Using MIMIC data to build predictive models

• Caleb Hug’s 2009 PhD thesis: http://dspace.mit.edu/handle/1721.1/46690

• Mortality

• Comparison to SAPS II

• Daily Acuity Scores

• Real-time Acuity Scores

• Other outcomes

• Good

• Weaning from Ventilator

• Weaning from Intra-Aortic Balloon Pump

• Weaning from Vasopressors

• Bad

• Septic shock

• Hypotension

• Acute kidney injury

Page 27:

[Figure 3-1 (Hug thesis): observation frequency histograms of the hours between measurements of systolic blood pressure, Glasgow Coma Scale, blood urea nitrogen, INR, WBC counts, FiO2 settings, hematocrit, and arterial base excess.]

Cleaning the data—half the research time

• Missing values

• Some values are not measured for some clinical situations

• Failures in data capture process

• Episodically measured variables

• Unclear/undefined clinical states

• Imprecise timing of meds, ...

• Partially measured i/o

• Proxies: e.g., which ICU⇒what disease

• Derived variables: integrals, slopes, ranges, frequencies, etc.

• Transformed variables: square root, log, etc.

• Select subset of data with enough data!

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 28:

Descriptive look

[Figures 3-2 and 3-4 (Hug thesis): histograms of demographic and SAPS II variables for the preliminary dataset, including patient length of stay, age, admit weight, sex, service type, ICD9 chronic illnesses (metastatic carcinoma, hematologic malignancy, AIDS), BUN, WBC, potassium, sodium, bicarbonate, and bilirubin, each annotated with n, number missing, mean, median, and standard deviation.]

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 29:

a given patient. Some patients, for example, are recorded as having died over one year after their first ICU discharge. To limit such cases, one might define death as “died within the ICU or within 30 days of discharge”. This limit marks a patient as alive if he or she is still in the hospital 30 days after ICU discharge. If the patient is not in the hospital at this point, the hospital discharge status of the patient is used to indicate mortality: if the patient was discharged alive (censored) they are marked as survived; otherwise they are marked as expired. Figure 3-6 illustrates the change in mortality rate as patients stay in the ICU for longer periods of time. This figure includes both hospital mortality (i.e., the patient died at any point during any recorded visit) and the within-30-days-of-ICU-discharge mortality. As the figure shows, the two mortality rates track each other closely for the first several days. However, it is clear that patients who stay in the ICU longer are more apt to remain in the hospital for a significant period of time before dying. For the remainder of this work, references to “mortality” indicate death in the ICU or within the following 30 days.

Figure 3-6: Patient counts versus the number of days spent in the ICU (left) and mortality rate versus the number of days spent in the ICU (right). For each patient, only the first ICU stay of the first recorded hospital visit is considered. “ICU + 30 day mortality” excludes deaths that occur after long post-ICU discharge hospitalizations. If a patient leaves the hospital alive within this 30-day period, they are assumed to have survived.


Table 3.14: Final Dataset: Partial Patient Exclusions

Drop Rule | Number of Rows
In the ICU for longer than seven days | 728,739
Received limited treatment, including CMO (“comfort measures only”), DNR (“do not resuscitate”), DNI (“do not intubate”), “no CPR” or “other code” | 198,942
Received hemodialysis or hemofiltration | 139,561

The motivation for these rules generally follows the reasoning for excluding entire patients. For example, as Figure 3-6 indicates by plotting the mortality rate versus the number of days spent in the ICU, most patients leave the ICU within seven days of admission. For patients that do not leave in this 7-day window, the 30-day mortality rate starts to noticeably decrease as caregivers are able to successfully prolong the patient’s life while the patient remains in a compromised state often dependent on various interventions.

3.5.2 Dataset Summary

The final dataset — after applying all of the exclusions mentioned above — is summarized in Table 3.15. In addition, Appendix B lists the 438 individual variables with brief summary statistics. Figure 3-7 provides an updated version of Figure 3-6 for the final dataset.

Outcomes

[Figures 3-6/3-7: number of patients (total, hospital mortalities, ICU + 30 days mortalities) and mortality rate (%) versus days in the ICU, comparing hospital mortality with ICU + 30 days mortality.]

Table 3.15: Preprocessed Data

Number of Patients | 10,066
Number of Rows | 1,044,982
Number of Features | 438

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 30:

SAPS II

Table 4.1: SAPS II Variables

Variable | Max Points
Age | 18
Heart rate | 11
Systolic BP | 13
Body temperature | 3
PaO2:FiO2 (if ventilated or on continuous positive airway pressure) | 11
Urinary output | 11
Serum urea nitrogen level | 10
WBC count | 12
Serum potassium | 3
Serum sodium level | 5
Serum bicarbonate level | 6
Bilirubin level | 9
Glasgow Coma Score* | 26
Chronic diseases | 17
Type of admission | 8

* If the patient is sedated, the estimated GCS prior to sedation.

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 31:

Figure 5-1 (Hug thesis): SDAS model selection. Sensitivity to the number of covariates on each cross-validation fold (ROC area versus number of covariates, with 4/5 of the data used for training and 1/5 for validation in each fold). The covariate(s) from the simplest model are marked on the training curves.


A positive coefficient means greater probability of mortality (positive correlation), while a negative coefficient means less risk of mortality (negative correlation).

Model 5.1 includes a number of interesting covariates. By examining the Wald Z scores, however, it appears that the model can likely be improved. For example, the contributions from the largest length of stay fluid balance (LOSBal_max, Z = 3.94) and the largest 24 hour fluid balance (Bal24_max, Z = −3.68) may largely counteract each other, and the model may benefit from removing at least one of these variables and possibly replacing it with an input variable (a mean hourly output for the day, OutputB_60_mean_sqrt, is already included). Similarly, the meaningfulness of the pressD01_sd_sq variable (that is, the squared standard deviation of the points marked 1 following the first pressor infusion and marked 0 before any pressors) is questionable, as it decreases risk for patients who have a pressor started in the middle of their first day, but increases risk for patients who receive pressors early or late on their first day.

Examining all five folds from Figure 5-1, it is clear that there was negligible impact on the validation performance from reducing the number of covariates to about 25. Considering this, I took the top 25 covariates from the models for cross-validation folds 1, 2, 3, and 5 (excluding fold 4) and created a model using these covariates. By performing backward elimination one last time on this model, a final model was selected. As done with each cross-validation fold in Figure 5-1, a plot of performance versus the number of covariates in a given model was created. This plot is shown in Figure 5-2.

Figure 5-2: SDAS model selection (all development data)

The models in Figure 5-2 indicate that most of the performance was captured with about 35 inputs. This model was chosen for additional refinement. Upon examination, it was found to contain a number of pressor-related …

Training models—5-fold cross validation

[Figure 5-1 panels (folds 1–5): ROC area versus number of covariates, with 4/5 of the development data used for training and 1/5 for validation in each fold; the simplest-model covariates marked are GCS_max_sq (folds 1, 2, and 4), mechVent_mean_sq (fold 3), and orientation_max_i (fold 5).]

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 32:

Many univariate analyses

[Model 5.1: SDAS model for fold 2 with 30 covariates (20,172 observations; model L.R. 5415.11, d.f. 30, p ≈ 0; C = 0.893, Dxy = 0.785, Gamma = 0.787, Tau-a = 0.176, R2 = 0.439, Brier = 0.076). Model 5.2: final SDAS model with 35 covariates (20,130 observations; model L.R. 5619.28, d.f. 35, p ≈ 0; C = 0.898, Dxy = 0.797, Gamma = 0.798, Tau-a = 0.177, R2 = 0.456, Brier = 0.074). The full tables list each covariate’s coefficient, standard error, Wald Z, and p-value; covariates include GCS_max_sq, INR_mean_i, Age_min_sq, BUNtoCr_min_sqrt, SpO2.oor30.t_mean_sqrt, and others.]

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 33:

Figure 5-3: SDAS ROC curve (development data). AUC = the area under the curve; n = the total number of available predictions used for the curve; Missing = number of missing predictions.

Table 5.4: SDAS Hosmer-Lemeshow H risk deciles (all days)

Decile | Prob. Range | Prob. | Died Obs. | Died Exp. | Survived Obs. | Survived Exp. | Total
1 | [0.000203, 0.00335) | 0.002 | 2 | 4.2 | 2011 | 2008.8 | 2013
2 | [0.003353, 0.00682) | 0.005 | 3 | 9.9 | 2010 | 2003.1 | 2013
3 | [0.006825, 0.01281) | 0.010 | 11 | 19.2 | 2002 | 1993.8 | 2013
4 | [0.012812, 0.02277) | 0.017 | 24 | 34.8 | 1989 | 1978.2 | 2013
5 | [0.022771, 0.03971) | 0.031 | 53 | 61.5 | 1960 | 1951.5 | 2013
6 | [0.039706, 0.06691) | 0.052 | 104 | 104.8 | 1909 | 1908.2 | 2013
7 | [0.066911, 0.11297) | 0.088 | 198 | 176.7 | 1815 | 1836.3 | 2013
8 | [0.112972, 0.20128) | 0.152 | 324 | 305.3 | 1689 | 1707.7 | 2013
9 | [0.201280, 0.40232) | 0.285 | 610 | 574.7 | 1403 | 1438.3 | 2013
10 | [0.402321, 0.99876] | 0.634 | 1239 | 1276.9 | 774 | 736.1 | 2013

χ² = 24.47, d.f. = 8; p = 0.002

Figure 5-7: SDAS ROC curves (validation data). Sensitivity versus 1-Specificity for each of the first five ICU days:

Day | AUC | n | Missing
1 | 0.870 | 2434 | 584
2 | 0.877 | 2108 | 216
3 | 0.883 | 1389 | 137
4 | 0.882 | 928 | 85
5 | 0.863 | 662 | 59

Figure 5-9: Calibration plots for SDAS, SDAS day 1, and DAS1 (validation data). The relative frequencies for each predicted probability are indicated by the bars along the x-axis. Each panel plots actual probability against predicted probability with ideal, logistic-calibration, and nonparametric curves plus grouped observations. Summary statistics include, for SDAS day 1: Dxy 0.741, C (ROC) 0.871, R2 0.374, D 0.209, U 0.003, Q 0.205, Brier 0.074, Intercept −0.079, Slope 0.866, Emax 0.046; and for DAS1: Dxy 0.755, C 0.877, R2 0.376, D 0.205, U 0.005, Q 0.200, Brier 0.072, Intercept −0.132, Slope 0.829, Emax 0.066.

Evaluating the models

[SDAS, all days: development data AUC 0.898 (n = 20130, Missing = 2758); validation data AUC 0.876 (n = 8428, Missing = 1164). Validation calibration for SDAS: Dxy 0.752, C (ROC) 0.876, R2 0.385, D 0.223, U 0.006, Q 0.217, Brier 0.077, Intercept −0.257, Slope 0.820, Emax 0.094.]

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 34:

Selected features for each day of ICU stay

[Figure 5-5 (Hug thesis): for each day n ∈ {1, 2, 3, 4, 5}, the covariates of that day's DAS model plotted by Wald z score (roughly −15 to +10); recurring inputs across the days include GCS_max_sq, Age_min_sq, INR, and BUNtoCr variants.]

Figure 5-5: Ranked comparison of DASn inputs over days n ∈ {1, 2, 3, 4, 5}. Input names are ranked (but equally spaced) for the positive Wald Z scores and the negative Wald Z scores for each model. On a given day, variables absent from the previous day are prefixed with “+”, and variables absent from the following day are suffixed with “-”.

Figure by Hug, Caleb Wayne. "Detecting Hazardous Intensive Care Patient Episodes Using Real-time Mortality Models." Massachusetts Institute of Technology, 2009.

Page 35:

MIT OpenCourseWare
http://ocw.mit.edu

HST.950J / 6.872 Biomedical Computing Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.