

Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients
Wei-Hung Weng 1, Mingwu Gao 2, Ze He 3, Susu Yan 4, Peter Szolovits 1

1 CSAIL, MIT | 2 Philips Connected Sensing Venture | 3 Philips Research North America | 4 Massachusetts General Hospital

Background

Motivation
• Critically ill patients often have poor glucose control, including dysglycemia and high glycemic variability.
• Current clinical practice follows the guidelines suggested by the NICE-SUGAR trial to control blood glucose levels in critical care.
• However, clinical conditions and physiological states vary widely among patients under critical care, which limits clinicians' ability to perform appropriate glycemic control. In addition, clinicians may not always be able to attend to glycemic control.
• To help clinicians better manage patients' glucose levels, we need a personalized glycemic control strategy that takes into account the variations in patients' physiological and pathological states.

Reinforcement Learning (RL) in the Clinical Domain
• RL is a promising approach for sequential decision making with delayed rewards or outcomes.
• RL can also generate optimal strategies from non-optimized training data.
• RL has been used for treatment of schizophrenia [Shortreed 2011], heparin dosing [Nemati 2016], mechanical ventilation administration and weaning [Prasad 2016], and sepsis treatment [Raghu 2017].
• Related to glycemic control, some studies use RL and inverse RL to design clinical trials and adjust clinical treatments [Bothe 2014].
• Fewer studies have used RL to learn better target laboratory values as references for clinical decision making.

Proposed Approach and Objectives
• Learn an optimal policy to simulate personalized optimal glycemic trajectories, i.e., sequences of appropriate glycemic targets.
• The simulated trajectories are intended as a reference for clinicians when deciding their glycemic control strategy, in order to achieve better clinical outcomes.
• We hypothesized that patient states, glycemic values, and patient outcomes can be modeled as a Markov decision process (MDP).
• 'Action' = the glycemic value that leads to the real clinical action.
• We explored an RL approach to learn a policy for personalized optimal glycemic trajectories, and compared the prognosis of the trajectories simulated by the optimal policy to that of the real trajectories.
• The learned policy is intended as a reference for clinicians to adapt and optimize their care strategy and achieve better clinical outcomes.

Abstract

Glycemic control is essential in critical care, yet it remains challenging because no study has established a personalized optimal strategy for glycemic control. This work aims to learn personalized optimal glycemic trajectories for severely ill septic patients by learning data-driven policies that decide an optimal targeted blood glucose level as a reference for clinicians. We encoded patient states using a sparse autoencoder and adopted a reinforcement learning paradigm, using policy iteration to learn the optimal policy from data. We also estimated the expected return following the policy learned from the recorded glycemic trajectories, which yielded a function relating real blood glucose values to 90-day mortality rates. This suggests that the learned optimal policy could reduce the patients' estimated 90-day mortality rate by 6.3%, from 31% to 24.7%. The result demonstrates that reinforcement learning with appropriate patient state encoding can potentially provide optimal glycemic trajectories and allow clinicians to design a personalized strategy for glycemic control in septic patients.

Methods

Experiment

References
• Bellman. Dynamic Programming. 1957.
• Bothe et al. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Review of Medical Devices, 2014.
• Howard. Dynamic Programming and Markov Processes. 1960.
• Nemati et al. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In IEEE Engineering in Medicine and Biology Society, 2016.
• Ng. Sparse autoencoder. 2011. https://web.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
• Prasad et al. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv, 2017.
• Raghu et al. Continuous state-space models for optimal sepsis treatment: a deep reinforcement learning approach. arXiv, 2017.
• Shortreed et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study. Machine Learning, 2011.
• Silver. Reinforcement Learning Lecture 3: Planning by Dynamic Programming. 2015.

Data Source and Study Cohort
• 5,565 septic patients in MIMIC-III version 1.4.
• Sepsis-3 criteria used to identify patients with sepsis.
• Exclusion: age < 18, SOFA < 2, not the first ICU admission.
• Diabetes identified by (1) ICD-9 codes, (2) pre-admission HbA1c > 7.0%, (3) admission medications, and (4) history of diabetes in the free text.
• Data were collected at one-hour intervals.
• Missing values: linear and piecewise constant interpolation (see the sketch below).
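A minimal sketch of the imputation step, assuming the hourly measurements for one ICU stay live in a pandas DataFrame; the column names and the split between linearly interpolated and piecewise-constant variables are illustrative assumptions, not the study's actual schema.

```python
import pandas as pd

def impute_hourly(stay, linear_cols, constant_cols):
    """Fill gaps in one ICU stay's hourly data.

    linear_cols:   variables interpolated linearly between observations.
    constant_cols: variables held piecewise constant (last observation carried forward).
    """
    stay = stay.sort_values("hour").copy()  # "hour" is a hypothetical time column
    stay[linear_cols] = stay[linear_cols].interpolate(method="linear", limit_direction="both")
    stay[constant_cols] = stay[constant_cols].ffill().bfill()
    return stay

# Hypothetical usage on one ICU stay:
# stay = impute_hourly(stay, linear_cols=["glucose", "heart_rate"], constant_cols=["weight"])
```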

RL Settings
• Reward: 90-day mortality (+100 / -100).
• Action: discretized glucose levels (11 bins) as a proxy for real actions.
• State: 46 normalized variables in total (patient-level variables, blood glucose-related variables, periodic vital signs); the transition construction is sketched below.
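A minimal sketch, under the settings above, of turning one patient's hourly records into (state, action, reward, next state) tuples: glucose is digitized into 11 bins as the proxy action, and a terminal reward of +100 or -100 encodes 90-day survival or death. The zero intermediate reward and the bin edges are assumptions for illustration, not values taken from the study.

```python
import numpy as np

def glucose_to_action(glucose, edges):
    """Map a blood glucose value to one of 11 discrete proxy actions.
    `edges` are 10 bin boundaries (mg/dL); illustrative, not the study's cut points."""
    return int(np.digitize(glucose, edges))  # action index in 0..10

def build_transitions(states, glucoses, died_90d, edges):
    """Turn one patient's hourly sequence into (s, a, r, s') tuples.
    Reward is 0 at intermediate steps and +/-100 at the final transition."""
    transitions = []
    n = len(states)
    for t in range(n - 1):
        action = glucose_to_action(glucoses[t], edges)
        terminal = (t == n - 2)
        reward = (-100.0 if died_90d else 100.0) if terminal else 0.0
        transitions.append((states[t], action, reward, states[t + 1]))
    return transitions
```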

Patient State Encoding
• Raw features vs. sparse autoencoder-encoded features [Ng 2011].
• 500 state clusters by k-means clustering (see the sketch below).
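A minimal sketch of the state encoding, assuming a single-hidden-layer sparse autoencoder in the spirit of Ng (2011) with a KL-divergence sparsity penalty on 32 sigmoid units, followed by k-means with 500 clusters over the learned codes; the layer sizes, sparsity target, and training hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class SparseAutoencoder(nn.Module):
    """One hidden layer: 46 raw features -> 32-d sigmoid code -> 46 reconstructed features."""
    def __init__(self, n_in=46, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def kl_sparsity(code, rho=0.05, eps=1e-8):
    """KL(rho || mean activation) penalty from Ng's sparse autoencoder notes."""
    rho_hat = code.mean(dim=0).clamp(eps, 1 - eps)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def encode_and_cluster(X, n_clusters=500, beta=1.0, epochs=200, lr=1e-3):
    """Train the sparse autoencoder, then cluster the 32-d codes into discrete states."""
    X = torch.as_tensor(X, dtype=torch.float32)
    model = SparseAutoencoder(n_in=X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        recon, code = model(X)
        loss = nn.functional.mse_loss(recon, X) + beta * kl_sparsity(code)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        codes = model.encoder(X).numpy()
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(codes)
```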

Policy Evaluation / Iteration
• Learn the optimal policy & evaluate it on real trajectories (policy iteration sketched below).
• 90-day mortality rate = f(expected return).
• Compute and compare the estimated mortality rates of the real glucose trajectories and the optimal trajectories obtained from the RL-learned policy.
[Figure courtesy: David Silver]
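A minimal sketch of tabular policy iteration over the 500 clustered states and 11 actions, assuming the transition probabilities P and expected immediate rewards R have already been estimated empirically from the discretized trajectories; the discount factor here is an illustrative assumption.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.99):
    """Tabular policy iteration (Howard 1960).

    P: (S, A, S) empirical transition probabilities between clustered states.
    R: (S, A)    expected immediate reward for taking action a in state s.
    Returns a deterministic optimal policy and its state-value function.
    """
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
        P_pi = P[np.arange(S), policy]          # (S, S)
        R_pi = R[np.arange(S), policy]          # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a).
        Q = R + gamma * (P @ V)                 # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```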

Conclusion

We utilized an RL algorithm with representation learning to learn a personalized optimal policy for better predicting glycemic targets from retrospective data. The method may reduce the mortality rate of septic patients, potentially assist clinicians in optimizing the real-time treatment strategy at dynamic patient state levels with a more accurate treatment goal, and lead to better clinical decisions. Future work includes applying a continuous-state approach, exploring different evaluation methods, and applying the method to other clinical decision-making problems.

Results
• The distribution of the learned expected return, i.e., the rescaled Q-value, is strongly negatively correlated with the mortality rate (estimation sketched below).
• The learned expected return reflects the real patient status well.
• The optimal policy learned by the policy iteration algorithm can potentially reduce the estimated mortality rate by around 6.3% when an appropriate patient state representation is chosen.
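The mapping 90-day mortality rate = f(expected return) is not fully specified on the poster; a plausible minimal sketch, under our own assumptions, bins the per-trajectory expected returns, computes empirical mortality in each bin, and reads a policy's estimated mortality off that curve.

```python
import numpy as np

def mortality_vs_return(returns, died, n_bins=20):
    """Empirical 90-day mortality rate as a function of the expected return."""
    returns = np.asarray(returns, dtype=float)
    died = np.asarray(died, dtype=float)          # 1 = died within 90 days, 0 = survived
    edges = np.quantile(returns, np.linspace(0, 1, n_bins + 1))
    centers, rates = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (returns >= lo) & (returns <= hi)
        if mask.any():
            centers.append(returns[mask].mean())
            rates.append(died[mask].mean())
    return np.array(centers), np.array(rates)

def estimated_mortality(policy_returns, centers, rates):
    """Interpolate the mortality curve at the returns achieved under a policy."""
    return float(np.interp(policy_returns, centers, rates).mean())
```

Comparing this estimate for the returns of the real trajectories against the returns achieved under the learned policy would yield the kind of 31% vs. 24.7% comparison reported above.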

[Figure panels: raw feature representation vs. 32-dimension sparse autoencoder representation]