PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017
psb.stanford.edu/previous/psb17/conference-materials/...

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Papers are organized first by session, then by the last name of the first author. Presenting authors’ names are underlined.




TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 1

IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES ..... 2
Nathan Bowerman, Nathan Tintle, Matthew DeJongh, Aaron A. Best

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS? ..... 3
Mengfei Cao, Lenore J. Cowen

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 4
Sheng Wang, Meng Qu, Jian Peng

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES ..... 5
Christian Wiwie, Richard Röttger

IMAGING GENOMICS ..... 6

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS ..... 7
Chao Wang, Hai Su, Lin Yang, Kun Huang

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL ..... 8
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK ..... 9
Pascal Zille, Vince D. Calhoun, Yu-Ping Wang

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH ..... 10

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS ..... 11
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE ..... 12
Emre Guney

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY ..... 13
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS ..... 14
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES ..... 15
Shan Yang, Melissa Cline, Can Zhang, Benedict Paten, Stephen E. Lincoln


PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 16

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES ..... 17
Vibhu Agarwal, Nigam H. Shah

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA ..... 18
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, Molly Ni'Suilleabhain

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS ..... 19
Brett K. Beaulieu-Jones, Jason H. Moore, The Pooled Resource Open-Access ALS Clinical Trials Consortium

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS ..... 20
Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger, Dana C. Crawford, Melinda C. Aldrich, Amy Non

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS ..... 21
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT ..... 22
Khader Shameer, Kipp W. Johnson, Alexandre Yahi, Riccardo Miotto, Li Li, Doran Ricks, Jebakumar Jebakaran, Patricia Kovatch, Partho P. Sengupta, Annetine Gelijns, Alan Moskovitz, Bruce Darrow, David L. Reich, Andrew Kasarskis, Nicholas P. Tatonetti, Sean Pinney, Joel T. Dudley

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS ..... 23
Nicole Tignor, Pei Wang, Nicholas Genes, Linda Rogers, Steven G. Hershman, Erick R. Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E. Schadt

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS ..... 24
Modest von Korff, Tobias Fink, Thomas Sander

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 25
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 26

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES ..... 27
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA ..... 28
Mette Beck, David Westergaard, Leif Groop, Soren Brunak


HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 29
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION ..... 30
Dan He, Laxmi Parida

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES ..... 31
Gil Speyer, Divya Mahendra, Hai J. Tran, Jeff Kiefer, Stuart L. Schreiber, Paul A. Clemons, Harshil Dhruv, Michael Berens, Seungchan Kim

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER ..... 32
Jeffrey A. Thompson, Carmen J. Marsit

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK ..... 33
Guhan Ram Venkataraman, Chloe O'Connell, Fumiko Egawa, Dorna Kashef-Haghighi, Dennis Paul Wall

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR ..... 34
Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION ..... 35
Laura Wiley, Jacob VanHouten, David Samuels, Melinda Aldrich, Dan Roden, Josh Peterson, Joshua Denny

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 36

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX ..... 37
Brian Aevermann, Jamison McCorrison, Pratap Venepally, Rebecca Hodge, Trygve Bakken, Jeremy Miller, Mark Novotny, Danny N. Tran, Francisco Diez-Fuertes, Lena Christiansen, Fan Zhang, Frank Steemers, Roger S. Lasken, Ed Lein, Nicholas Schork, Richard H. Scheuermann

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES ..... 38
Pablo Cordero, Joshua M. Stuart

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT ..... 39
Kristin I. Fread, William D. Strickland, Garry P. Nolan, Eli R. Zunder


PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

IMAGING GENOMICS ..... 40

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS ..... 41
Chen Gao, Junghi Kim, Wei Pan

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS ..... 42
Zhana Kuncheva, Michelle L. Krishnan, Giovanni Montana

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 43

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION ..... 44
Padideh Danaee, Reza Ghaeini, David Hendrix

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS ..... 45
Jacob M. Keaton, Jacklyn N. Hellwege, Maggie C.Y. Ng, Nicholette D. Palmer, James S. Pankow, Myriam Fornage, James G. Wilson, Adolofo Correa, Laura J. Rasmussen-Torvik, Jerome I. Rotter, Yii-Der I. Chen, Kent D. Taylor, Stephen S. Rich, Lynne E. Wagenknecht, Barry I. Freedman, Donald W. Bowden

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS ..... 46
Madeleine Scott, Francesco Vallania, Purvesh Khatri

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS ..... 47
Ana Stanescu, Gaurav Pandey

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE ..... 48
Kathleen Whiting, Larry Y. Liu, Mehmet Koyutürk, Gunnur Karakurt

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 49

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM ..... 50
Andrew Beck, Alexander Luedtke, Keli Liu, Nathan Tintle

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING ..... 51
Diana Diaz, Michele Donato, Tin Nguyen, Sorin Draghici

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS ..... 52
Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS ..... 53
Halla Kabat, Leo Tunkle, Inhan Lee

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES ..... 54
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle


METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT ..... 55
Pei Fen Kuan, Junyan Song, Shuyao He

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS ..... 56
Meng Ma, Changchang Wang, Benjamin Glicksberg, Eric E. Schadt, Shuyu Li, Rong Chen

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS ..... 57
André Schultz, Sanket Mehta, Chenyue W. Hu, Fieke W. Hoff, Terzah M. Horton, Steven M. Kornblau, Amina A. Qutub

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 58

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING ..... 59
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG ..... 60
Kimberly R. Kanigel Winner, James C. Costello

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA ..... 61
Juho Kim, Nate Russell, Jian Peng

POSTER PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 62

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM ..... 63
Ernesto Borrayo, Ryoko Machida-Hirano

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS ..... 64
Jingyi Jessica Li, Guo-Liang Chew, Mark D. Biggin

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 65
Sheng Wang, Meng Qu, Jian Peng

GENERAL ..... 66

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS ..... 67
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS ..... 68
Yongsheng Bai, Naureen Aslam, Ali Salman

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY ..... 69
Chengsheng Zhu, Yannick Mahlich, Yana Bromberg


THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN ..... 70
Frank C. Brosius, Wenjun Ju, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, Debbie Gipson, Jennifer Hawkins, Julia Herzog, Susan Massengill, Richard C. McEachin, Subramaniam Pennathur, Kalyani Perumal, Roger Wiggins, Matthias Kretzler

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE ..... 71
Danai Chasioti, Xiaohui Yao, Pengyue Zhang, Xia Ning, Lang Li, Li Shen

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN ..... 72
Aslihan Dincer, Eric E. Schadt, Bin Zhang, Joel T. Dudley, Davin Gavin, Schahram Akbarian

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER ..... 73
Jennifer M. Franks, Guoshuai Cai, Jaclyn N. Taroni, Michael L. Whitfield

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA ..... 74
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR ..... 75
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH ..... 76
Austin Huang, Dmitri Bichko, Mathieu Boespflug, Edsko de Vries, Facundo Dominguez, Daniel Ziemek

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES ..... 77
Jeremie Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL ..... 78
Melissa E. Ko, Charis Teh, Christopher S. Playter, Eli R. Zunder, Daniel H. Gray, Wendy J. Fantl, Sylvia K. Plevritis, Garry P. Nolan

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE ..... 79
Emily K. Mallory, Chris Re, Russ B. Altman

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING ..... 80
Serghei Mangul, Igor Mandric, Harry Taegyun Yang, Dennis Montoya, Nicolas Strauli, Jeremy Rotman, Benjamin Statz, Will Van Der Wey, Alex Zelikovsky, Roberto Spreafico, Maura Rossetti, Sagiv Shifman, Mark Ansel, Noah Zaitlen, Eleazar Eskin

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL ..... 81
Neil Miller, Greyson Twist, Byunggil Yoo, Andrea Gaedigk

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE ..... 82
Vikas Pejaver, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY ..... 83
Sergei Pond, Steven Weaver, Joel Wertheim, Andrew J. Leigh Brown


THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 84
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS ..... 85
Yingxue Ren, Joseph S. Reddy, Vivekananda Sarangi, Jason P. Sinnwell, Steve G. Younkin, Nilüfer Ertekin-Taner, Owen A. Ross, Rosa Rademakers, Shannon K. McDonnell, Joanna M. Biernacka, Yan W. Asmann

TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ ..... 86
Pamela Russell, Richard Radcliffe, Brian Vestal, Wen Shi, Pratyaydipta Rudra, Laura Saba, Katerina Kechris

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS ..... 87
Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER ..... 88
Kyle Smith, Subhajyoti De, Debashis Gosh

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS ..... 89
Abiodun Otolorin, Nana Osafo, William Southerland

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY ..... 90
Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA ..... 91
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

IMAGING GENOMICS ..... 92

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA ..... 93
Dongdong Lin, Vince D. Calhoun, Juan R. Bustillo, Nora Perrone-Bizzozero, Jingyu Liu

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS ..... 94
Olga V. Matveeva, Nafisa N. Nazipova, Aleksey Y. Ogurtsov, Svetlana A. Shabalina

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 95

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT ..... 96
Alyssa I. Clay, Richard M. Weinshilboum, K. Sreekumaran Nair, Rima F. Kaddurah-Daouk, Liewei Wang, Matthew K. Breitenstein

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS ..... 97
Stephen V. Gliske, Katy L. Lau, Benjamin H. Brinkman, Greg A. Worrell, Cris G. Fink, William C. Stacey

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS ..... 98
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS ..... 99
Modest von Korff, Tobias Fink, Thomas Sander


PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 100

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION ..... 101
Steven E. Brenner, Gaia Andreoletti, Roger A. Hoskins, John Moult, CAGI Participants

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C1 ..... 102
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 103
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA ..... 104
Rachel Goldfeder, Euan Ashley

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE ..... 105
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT) ..... 106
T.E. Klein, M. Whirl-Carrillo, R.M. Whaley, M. Woon, K. Sangkuhl, Lester G. Carter, H.M. Dunnenberger, P.E. Empey, A.T. Frase, R.R. Freimuth, A. Gaedigk, A. Gordon, C. Haidar, J.K. Hicks, J.M. Hoffman, M.T. Lee, N. Miller, S.D. Mooney, T.N. Person, J.F. Peterson, M.V. Relling, S.A. Scott, G. Twist, A. Verma, M.S. Williams, C. Wu, W. Yang, M.D. Ritchie

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA ..... 107
Sarathbabu Krishnamurthy, Diane Smelser, Manickam Kandamurugu, Joseph Leader, Noura S. Abul-Husn, Alan R. Shuldiner, David H. Ledbetter, Frederick E. Dewey, David J. Carey, Michael F. Murray, Raghu P.R. Metpally

INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI ..... 108
Nicholas B. Larson, Shannon McDonnell, Zach Fogarty, Melissa Larson, John Cheville, Shaun Riska, Saurabh Baheti, Asha A. Nair, Daniel O’Brien, Jaime Davila, Daniel Schaid, Stephen N. Thibodeau

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES ..... 109
Jason E. McDermott, Tao Liu, Samuel Payne, Vladislav Petyuk, Richard Smith, Philipp Mertins, Steven Carr, Karin Rodland

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS ..... 110
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES ..... 111
Hyun-Tae Shin, Jae Won Yun, Nayoung K.D. Kim, Yoon-La Choi, Woong-Yang Park, Peter J. Park

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS ..... 112
Jeffrey A. Thompson, Carmen J. Marsit

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE ..... 113
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller


INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE ..... 114
David S. Wishart, Ana Marcu, An Chi Guo, Ash Anwar, Solveig Johannessen, Craig Knox, Michael Wilson, Christoph H. Borchers, Pieter Cullis, Robert Fraser

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES ..... 115
Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Sean D. Mooney, Andrew I. Su, Chunlei Wu

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 116

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS ..... 117
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA ..... 118
Tyler J. Burns, Garry P. Nolan, Nikolay Samusik

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY ..... 119
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION ..... 120
Jonathan A. Rebhahn, Sally A. Quataert, Gaurav Sharma, Tim R. Mosmann

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS ..... 121

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY ..... 122
E. Griffiths, D. Dooley, C. Bertelli, J. Adam, F. Bristow, T. Matthews, A. Petkau, M. Courtot, J.A. Carriço, A. Keddy, R. Beiko, L.M. Schriml, E. Taboada, M. Graham, G. Van Domselaar, W. Hsiao, F. Brinkman

AUTHOR INDEX ..... 123


COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES

Nathan Bowerman1, Nathan Tintle2, Matthew DeJongh3, Aaron A. Best1

1Department of Biology, Hope College; 2Department of Mathematics and Statistics, Dordt College; 3Department of Computer Science, Hope College

Aaron Best

With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and implemented a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each of the bacteria, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree, clustering metabolically similar organisms. Resulting analysis of the metabolic tree confirms strong associations with well-established biological results along with direct insight into particular metabolic variants which are most predictive of metabolic diversity. The positive results of this manual curation effort and novel method development suggest that future work is needed to further expand the set of bacteria to which this approach is applied and use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.
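As a rough illustration of how presence/absence signatures could be compared, the sketch below computes pairwise Jaccard distances between binary signatures. The organism names, the variant names, and the choice of Jaccard distance are assumptions made for illustration; the abstract does not state which distance or tree-building method the authors used.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical signatures: the set of metabolic variants present in each organism.
signatures = {
    "organism_A": {"variant_1", "variant_2", "variant_3"},
    "organism_B": {"variant_1", "variant_2"},
    "organism_C": {"variant_4"},
}


def pairwise_distances(sigs: dict) -> dict:
    """Distance for every unordered pair of organisms, keyed by sorted name pair."""
    return {
        (x, y): jaccard_distance(sigs[x], sigs[y])
        for x, y in combinations(sorted(sigs), 2)
    }


distances = pairwise_distances(signatures)
# The pair with the smallest distance would be merged first by an
# agglomerative clustering procedure.
closest_pair = min(distances, key=distances.get)
```

A distance matrix of this kind could then be handed to any hierarchical clustering routine to produce a tree of metabolically similar organisms.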


3

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS?

Mengfei Cao, Lenore J. Cowen

Tufts University

Lenore Cowen

Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous pairs of genes in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State of the art methods for this problem will determine the evolutionary history of the paralogs with references to multiple related species. Here, we ask a first and simpler question: we explore to what extent any computational method with access only to data from a single species can solve this problem. We consider divergence data (at both the amino acid and nucleotide levels), and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask if we can extract features from these data that can distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree or centrality or shortest path or diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. One should, in general, not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.


4

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois Urbana-Champaign

Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.


5

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

Christian Wiwie, Richard Röttger

University of Southern Denmark

Richard Röttger

Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown dataset and aimed to also suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are also available online at http://proteinclustering.compbio.sdu.dk/


6

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


7

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS

Chao Wang1, Hai Su2, Lin Yang2, Kun Huang1

1The Ohio State University, 2University of Florida

Kun Huang

Lung cancer is one of the most deadly cancers and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with strong correlation with patient outcome. We then modeled the morphology feature using multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link the genetic variations and functional genomics to LUAD histology. These observations will lead to new insight on lung cancer development and potential new integrative biomarkers for predicting patient prognosis and response to treatments.


8

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL

Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

Indiana University

Jingwen Yan

Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer's disease (AD). But the underlying associations that contribute to such a complementary relationship have not yet been studied. In this work, we will perform an imaging proteomics association analysis to explore how they are related to each other. While traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations, we propose a novel discriminative SCCA (denoted as DSCCA) model with new penalty terms to account for the disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, such that we can not only identify disease-relevant multimodal biomarkers, but also reveal strong associations between them. Based on a real imaging proteomic data set, the empirical results show that DSCCA and traditional SCCA have comparable association performances. But in a further classification analysis, canonical variables of imaging and proteomic data obtained in DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained in SCCA.


9

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK

Pascal Zille1, Vince D. Calhoun2, Yu-Ping Wang1

1Tulane University, 2University of New Mexico

Pascal Zille

We consider the problem of multimodal data integration for the study of complex neurological diseases (e.g. schizophrenia). Among the challenges arising in such a situation, estimating the link between genetic and neurological variability within a population sample has been a promising direction. A wide variety of statistical models arose from such applications. For example, Lasso regression and its multitask extension are often used to fit a multivariate linear relationship between given phenotype(s) and associated observations. Other approaches, such as canonical correlation analysis (CCA), are widely used to extract relationships between sets of variables from different modalities. In this paper, we propose an exploratory multivariate method combining these two methods. More specifically, we rely on a 'CCA-type' formulation in order to regularize the classical multimodal Lasso regression problem. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first evaluate the method on a simulated dataset, and further validate it using Single Nucleotide Polymorphisms (SNP) and functional Magnetic Resonance Imaging (fMRI) data for the study of schizophrenia.


10

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


11

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS

Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

Icahn Institute and Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai

Ariella Cohain

Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework for not only inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets, whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
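One simple way to quantify edge-to-edge reproducibility is to ask what fraction of directed edges is shared between a network built on subsampled data and the network built on the complete data. The sketch below does this for two small edge lists. The gene names, the edge lists, and the precision/recall formulation are assumptions for illustration; the abstract does not specify which comparison metric was used.

```python
def edge_recovery(full_edges, sub_edges):
    """Compare a subsampled network against the full-data network.

    Both arguments are iterables of directed edges (parent, child).
    Returns (precision, recall):
      precision = shared edges / edges in the subsampled network
      recall    = shared edges / edges in the full-data network
    """
    full = set(full_edges)
    sub = set(sub_edges)
    shared = full & sub
    precision = len(shared) / len(sub) if sub else 0.0
    recall = len(shared) / len(full) if full else 0.0
    return precision, recall


# Hypothetical networks; edge direction matters, so ("g1", "g2") != ("g2", "g1").
full_network = [("g1", "g2"), ("g2", "g3"), ("g1", "g4"), ("g4", "g5")]
subsampled_network = [("g1", "g2"), ("g2", "g3"), ("g3", "g5")]

precision, recall = edge_recovery(full_network, subsampled_network)
```

Repeating this comparison at several subsampling levels would give a curve of edge recovery against sample size.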


12

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE

Emre Guney

Joint IRB-BSC-CRG Program in Computational Biology - Institute for Research in Biomedicine (IRB) Barcelona

Emre Guney

Repurposing existing drugs for new uses has attracted considerable attention over the past years. To identify potential candidates that could be repositioned for a new indication, many studies make use of chemical, target, and side effect similarity between drugs to train classifiers. Despite promising prediction accuracies of these supervised computational models, their use in practice, such as for rare diseases, is hindered by the assumption that there are already known and similar drugs for a given condition of interest. In this study, using publicly available datasets, we question the prediction accuracies of supervised approaches based on drug similarity when the drugs in the training and the test set are completely disjoint. We first build a Python platform to generate reproducible similarity-based drug repurposing models. Next, we show that, while a simple chemical, target, and side effect similarity based machine learning method can achieve good performance on the benchmark dataset, the prediction performance drops sharply when the drugs in the folds of the cross validation are not overlapping and the similarity information within the training and test sets is used independently. These intriguing results suggest revisiting the assumptions underlying the validation scenarios of similarity-based methods and underline the need for unsupervised approaches to identify novel drug uses inside the unexplored pharmacological space. We make the digital notebook containing the Python code to replicate our analysis that involves the drug repurposing platform based on machine learning models and the proposed disjoint cross fold generation method freely available at github.com/emreg00/repurpose.
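The idea of drug-disjoint cross-validation folds can be sketched in a few lines: folds are assigned per drug rather than per drug-disease pair, so no drug appears in more than one fold. This is a minimal illustration and not the code from the repository named above; the function name, the pair data, and the round-robin assignment are hypothetical.

```python
import random


def disjoint_drug_folds(pairs, n_folds, seed=0):
    """Split (drug, disease) pairs into folds so that no drug spans two folds.

    Drugs are shuffled with a fixed seed for reproducibility and assigned
    to folds round-robin; every pair follows its drug.
    """
    drugs = sorted({drug for drug, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    folds = [[] for _ in range(n_folds)]
    for drug, disease in pairs:
        folds[fold_of[drug]].append((drug, disease))
    return folds


# Hypothetical drug-disease pairs.
pairs = [
    ("drugA", "disease1"), ("drugA", "disease2"),
    ("drugB", "disease1"), ("drugC", "disease3"),
    ("drugD", "disease2"), ("drugD", "disease3"),
]
folds = disjoint_drug_folds(pairs, n_folds=2)
drug_sets = [{drug for drug, _ in fold} for fold in folds]
```

In a conventional pair-level split, by contrast, the same drug can appear in both training and test folds, which is the overlap the abstract identifies as inflating performance.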


13

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY

Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

Stanford University

Winston Haynes

A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.


14

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

Seven Bridges Genomics

Gaurav Kaushik

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.


15

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES

Shan Yang1, Melissa Cline2, Can Zhang2, Benedict Paten2, Stephen E. Lincoln1

1Invitae, 2University of California Santa Cruz

Stephen Lincoln

Open sharing of clinical genetic data promises to both monitor and eventually improve the reproducibility of variant interpretation among clinical testing laboratories. A significant public data resource has been developed by the NIH ClinVar initiative, which includes submissions from hundreds of laboratories and clinics worldwide. We analyzed a subset of ClinVar data focused on specific clinical areas and we find high reproducibility (>90% concordance) among labs, although challenges for the community are clearly identified in this dataset. We further review results for the commonly tested BRCA1 and BRCA2 genes, which show even higher concordance, although the significant fragmentation of data into different silos presents an ongoing challenge now being addressed by the BRCA Exchange. We encourage all laboratories and clinics to contribute to these important resources.


16

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


17

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES

Vibhu Agarwal1, Nigam H. Shah2

1Biomedical Informatics Training Program, Stanford University, 2The Center for Biomedical Informatics Research, Stanford University

Vibhu Agarwal

There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data for identifying sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their Creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in published literature and clinical practice related to CKD.


18

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA

Harish Babu Arunachalam1, Rashika Mishra1, Bogdan Armaselu1, Ovidiu Daescu1, Maria Martinez1, Patrick Leavey1, Dinesh Rakheja2, Kevin Cederberg2, Anita Sengupta2, Molly Ni'Suilleabhain2

1University of Texas at Dallas, 2University of Texas Southwestern Medical Center

Harish Babu Arunachalam

Osteosarcoma is one of the most common types of bone cancer in children. To gauge the extent of cancer treatment response in the patient after surgical resection, the H&E stained image slides are manually evaluated by pathologists to estimate the percentage of necrosis, a time consuming process prone to observer bias and inaccuracy. Digital image analysis is a potential method to automate this process, thus saving time and providing a more accurate evaluation. The slides are scanned in Aperio Scanscope, converted to digital Whole Slide Images (WSIs) and stored in SVS format. These are high resolution images, of the order of 10^9 pixels, allowing up to 40X magnification factor. This paper proposes an image segmentation and analysis technique for segmenting tumor and non-tumor regions in histopathological WSIs of osteosarcoma datasets. Our approach is a combination of pixel-based and object-based methods which utilize tumor properties such as nuclei cluster, density, and circularity to classify tumor regions as viable and non-viable. A K-Means clustering technique is used for tumor isolation using color normalization, followed by multi-threshold Otsu segmentation technique to further classify tumor region as viable and non-viable. Then a Flood-fill algorithm is applied to cluster similar pixels into cellular objects and compute cluster data for further analysis of regions under study. To the best of our knowledge this is the first comprehensive solution that is able to produce such a classification for Osteosarcoma cancer. The results are very conclusive in identifying viable and non-viable tumor regions. In our experiments, the accuracy of the discussed approach is 100% in viable tumor and coagulative necrosis identification while it is around 90% for fibrosis and acellular/hypocellular tumor osteoid, for all the sampled datasets used. We expect the developed software to lead to a significant increase in accuracy and decrease in inter-observer variability in assessment of necrosis by the pathologists and a reduction in the time spent by the pathologists in such assessments.
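The flood-fill step, which groups neighbouring foreground pixels into objects, can be illustrated on a tiny binary mask. The sketch below is a generic breadth-first flood fill with 4-connectivity; the mask values, the connectivity choice, and the function name are assumptions for illustration and are not taken from the authors' software.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected foreground regions of a non-empty binary grid.

    Returns (labels, sizes): labels is a grid of the same shape with 0 for
    background and 1..k for regions; sizes maps each label to its pixel count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # Start a new region and flood outward from this seed pixel.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            sizes[next_label] = size
    return labels, sizes


# Hypothetical thresholded mask: 1 marks pixels kept by a prior segmentation step.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, sizes = label_regions(mask)
```

Per-region statistics such as size, density, or circularity could then be computed from the labelled grid.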


19

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS

Brett K. Beaulieu-Jones1, Jason H. Moore2, The Pooled Resource Open-Access ALS Clinical Trials Consortium

1Genomics and Computational Biology Graduate Group, Computational Genetics Lab, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania; 2Computational Genetics Lab, Institute for Biomedical Informatics, University of Pennsylvania

Brett Beaulieu-Jones

Electronic health records (EHRs) have become a vital source of patient outcome data but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in the EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogeneous with time from onset being the most important predictor.


20

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS

Brittany M. Hollister1, Nicole A. Restrepo2, Eric Farber-Eger3, Dana C. Crawford2, Melinda C. Aldrich4, Amy Non5

1Vanderbilt Genetic Institute, Vanderbilt University; 2Institute for Computational Biology and Department of Epidemiology and Biostatistics, Case Western Reserve University; 3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University; 4Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center; 5Department of Anthropology, University of California San Diego

Brittany Hollister

Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
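For readers unfamiliar with the metric, a positive predictive value is the fraction of algorithm-flagged records that manual review confirms. The sketch below computes it from paired algorithm and manual labels. The records shown are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def positive_predictive_value(records):
    """Compute PPV from (algorithm_flag, manual_flag) boolean pairs.

    PPV = true positives / (true positives + false positives).
    Returns None when the algorithm flagged no records at all.
    """
    true_pos = sum(1 for algo, manual in records if algo and manual)
    false_pos = sum(1 for algo, manual in records if algo and not manual)
    flagged = true_pos + false_pos
    if flagged == 0:
        return None
    return true_pos / flagged


# Hypothetical review of one category: 10 records flagged by the algorithm,
# 8 confirmed on manual review, plus 3 records the algorithm did not flag.
records = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 2
)
ppv = positive_predictive_value(records)
```

Note that records the algorithm missed (false negatives) do not affect PPV, which is why a category can score well on PPV while still being extracted incompletely.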


21

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS

Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

University of Virginia

Jack Lanchantin

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.


22

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT

Khader Shameer1,2, Kipp W. Johnson1,2, Alexandre Yahi7, Riccardo Miotto1,2, Li Li1,2, Doran Ricks3, Jebakumar Jebakaran4, Patricia Kovatch1,4, Partho P. Sengupta5, Annetine Gelijns8, Alan Moskovitz8, Bruce Darrow5, David L. Reich6, Andrew Kasarskis1, Nicholas P. Tatonetti7, Sean Pinney5, Joel T. Dudley1,2,8*

1Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology; 2Institute of Next Generation Healthcare, Mount Sinai Health System, NY; 3Decision Support, Mount Sinai Health System, NY; 4Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology, NY; 5Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, NY; 6Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, NY; 7Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center, NY; 8Population Health Science and Policy, Mount Sinai Health System, NY

*Corresponding Author, Email: joel.dudley@mssm.edu

Khader Shameer

Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving the outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repetitively assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic-medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We have assessed a large repertoire of variables from electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from EMR including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and prospective clinical trial to evaluate the clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
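The cohort figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Feature counts by EMR category, as stated in the abstract.
feature_counts = {
    "diagnosis codes": 1763,
    "medications": 1028,
    "laboratory measurements": 846,
    "surgical procedures": 564,
    "vital signs": 4,
}
total_features = sum(feature_counts.values())

# Cohort size and number of patients readmitted within 30 days.
cohort_size = 1068
readmitted = 178
readmission_rate_pct = 100 * readmitted / cohort_size

# Share of the cohort that was not readmitted (the majority class).
non_readmitted_pct = 100 - readmission_rate_pct
```

The category counts sum to the stated 4,205 variables, and 178 of 1,068 is exactly one patient in six, which the abstract reports as 16.66%. Because roughly five in six patients were not readmitted, a trivial classifier that always predicts "not readmitted" would already be right about 83% of the time, which is useful context when reading accuracy alongside AUC.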


23

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS

Nicole Tignor1, Pei Wang1, Nicholas Genes1, Linda Rogers1, Steven G. Hershman2, Erick R. Scott1, Micol Zweig1, Yu-Feng Yvonne Chan1, Eric E. Schadt1

1Icahn School of Medicine at Mount Sinai, 2LifeMap Solutions

Nicole Tignor

In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real-time geographic location information. The AMHS is just one of many studies occurring in the context of now many thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes such as asthma exacerbations or hospitalizations from these types of large, prospectively collected data sets would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time of each user, and 3) non-overlapping periods of enrollment among different users. To handle such complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing value through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. Further validation of the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.
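
One common way to combine clusterings obtained from several imputed copies of a data set is a co-association matrix: for every pair of users, count how often they land in the same cluster. The sketch below illustrates that general idea only. The abstract does not describe the authors' consensus procedure, so the function names, the 0.5 threshold and the toy labelings are assumptions.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(labelings)
    return [[v / runs for v in row] for row in counts]

def consensus_groups(matrix, threshold=0.5):
    """Connected components of the graph linking pairs whose co-association exceeds threshold."""
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group

# Hypothetical cluster labels for four users from three imputed data sets.
# Label values need not agree across runs; only co-membership matters.
toy_labelings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
toy_matrix = co_association(toy_labelings)
toy_groups = consensus_groups(toy_matrix)
```

Because only co-membership is counted, the approach is insensitive to cluster labels being permuted between imputation runs.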


24

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS

Modest von Korff, Tobias Fink, Thomas Sander

Research Information Management, Actelion Pharmaceuticals Ltd.

Modest von Korff

A new computational method is presented to extract disease patterns from heterogeneous and text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were transferred into a matrix M_data. In this matrix, a disease was represented by a row and a gene by a column. Each field in the matrix represented the publication count for a co-occurring disease–gene pair. A second matrix with identical dimensions, M_relevance, was derived from M_data. To create M_relevance, the values from M_data were normalized. The normalized values were multiplied by the column-wise calculated Gini coefficient. This multiplication resulted in a relevance estimator for every gene in relation to a disease. From M_relevance the similarities between all row vectors were calculated. The resulting similarity matrix S_relevance related 5,000 diseases by the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail for the validation of the disease patterns and the relevant genes. Cytoscape was used to visualize and to analyze M_relevance and S_relevance together with the genes and diseases. Summarizing the results, it can be stated that the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encoded key proteins and potential targets for drug discovery projects.
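
The matrix construction described above can be sketched in a few lines. The abstract does not say how the counts were normalized, on which matrix the Gini coefficient was computed, or which similarity measure was used, so this sketch assumes row-wise normalization to unit sum, a Gini coefficient computed on the normalized columns, and cosine similarity between rows. All function names and the toy counts are hypothetical.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values (0 = evenly spread, higher = more concentrated)."""
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2.0 * n * total)

def relevance_matrix(m_data):
    """Row-normalize the counts (assumption), then weight each column by its Gini coefficient."""
    normalized = []
    for row in m_data:
        row_sum = sum(row)
        normalized.append([v / row_sum if row_sum else 0.0 for v in row])
    n_cols = len(m_data[0])
    weights = [gini([row[j] for row in normalized]) for j in range(n_cols)]
    return [[row[j] * weights[j] for j in range(n_cols)] for row in normalized]

def cosine_similarity(u, v):
    """Cosine similarity between two row vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy counts (made up): rows are diseases, columns are genes
toy_counts = [[10, 5], [0, 5]]
m_relevance = relevance_matrix(toy_counts)
toy_similarity = cosine_similarity(m_relevance[0], m_relevance[1])
```

The Gini weight down-weights genes that are mentioned evenly across all diseases and up-weights genes whose publications concentrate on a few diseases.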


25

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS

Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

Baylor College of Medicine

Stephen Wilson

Advances in cellular, molecular, and disease biology depend on the comprehensive characterization of gene interactions and pathways. Traditionally, these pathways are curated manually, limiting their efficient annotation and, potentially, reinforcing field-specific bias. Here, in order to test objective and automated identification of functionally cooperative genes, we compared a novel algorithm with three established methods to search for communities within gene interaction networks. Communities identified by the novel approach and by one of the established methods overlapped significantly (q<0.1) with control pathways. With respect to disease, these communities were biased to genes with pathogenic variants in ClinVar (p<<0.01), and often genes from the same community were co-expressed, including in breast cancers. The interesting subset of novel communities, defined by poor overlap to control pathways, also contained co-expressed genes, consistent with a possible functional role. This work shows that community detection based on topological features of networks suggests new, biologically meaningful groupings of genes that, in turn, point to health and disease relevant hypotheses.


26

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


27

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES

Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

Geisinger Health System, University of Vermont

Christopher Bauer

The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome-wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal components analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables and reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotype may yield more insights into the genetic architecture of complex traits, and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As EHR data continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.


28

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA

Mette Beck1, David Westergaard1, Leif Groop2, Soren Brunak1

1Novo Nordisk Foundation Center for Protein Research; 2Lund University Diabetes Centre, Department of Clinical Sciences

Mette Beck

Most studies of disease etiologies focus on one disease only and not the full spectrum of multimorbidities that many patients have. Some disease pairs have shared causal origins, others represent common follow-on diseases, while yet other co-occurring diseases may manifest themselves in random order of appearance. We discuss these different types of disease co-occurrences, and use the two diseases “sleep apnea” and “diabetes” to showcase the approach, which otherwise can be applied to any disease pair. We benefit from seven million electronic medical records covering the entire population of Denmark for more than 20 years. Sleep apnea is the most common sleep-related breathing disorder and it has previously been shown to be bidirectionally linked to diabetes, meaning that each disease increases the risk of acquiring the other. We confirm that there is no significant temporal relationship, as approximately half of patients with both diseases are diagnosed with diabetes first. However, we also show that patients diagnosed with diabetes before sleep apnea have a higher disease burden compared to patients diagnosed with sleep apnea before diabetes. The study clearly demonstrates that it is not only the diagnoses in the patient’s disease history that are important, but also the specific order in which these diagnoses are given that matters in terms of outcome. We suggest that this should be considered for patient stratification.
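
Whether one diagnosis tends to precede the other can be checked with an exact two-sided binomial test against a 50/50 split. The sketch below shows that test in general form. It is an illustration only: the abstract does not state which test the authors used, the function name is hypothetical, and the direct summation is practical only for small to moderate counts.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties with the observed probability are included
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Made-up counts: out of 10 patients with both diseases, how many had diabetes first?
p_balanced = binomial_two_sided_p(5, 10)  # an even split gives no evidence of ordering
p_extreme = binomial_two_sided_p(0, 10)   # a fully one-sided split is unlikely under 50/50
```

A large p-value is consistent with no preferred temporal order, which is the situation the abstract reports for diabetes and sleep apnea.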


29

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

Baylor College of Medicine

Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.


30

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION

Dan He, Laxmi Parida

IBM Thomas J. Watson Research Center

Dan He

Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, as well as in genetic epidemiology, for example of complex diseases. The prediction can be very helpful for developing breeding strategies and is crucial for translating findings in genetics into precision medicine. Epistasis, the phenomenon where SNPs interact with each other, has been studied extensively in Genome Wide Association Studies (GWAS) but has received relatively less attention in quantitative genetic trait prediction. As the number of possible interactions is generally extremely large, even modeling pairwise interactions is very challenging. To our knowledge, there is no solid solution yet for utilizing epistasis to improve genetic trait prediction. In this work, we studied the multi-locus epistasis problem, where interactions among more than two SNPs are considered. We developed an efficient algorithm, MUSE, to improve genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based, and we proposed a few different sampling strategies. Our experiments on real data showed that MUSE is not only efficient but also effective at improving genetic trait prediction. MUSE also achieved very significant improvements on a real plant data set as well as a real human data set.


31

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES

Gil Speyer1, Divya Mahendra1, Hai J. Tran1, Jeff Kiefer1, Stuart L. Schreiber2, Paul A. Clemons2, Harshil Dhruv1, Michael Berens1, Seungchan Kim1

1The Translational Genomics Research Institute, 2Broad Institute of Harvard and MIT

Seungchan Kim

The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profiles of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interaction, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between two distributions of network likelihoods that can be assessed for significance using permutation tests. Resulting differential dependency networks were then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in certain cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways leading to 26,822 compound-pathway-mediator triplets. By incorporating STITCH and STRING databases, we could construct evidence networks for 14,415 compound-pathway-mediator triplets for support. The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells’ drug response as well as in designing experiments for the purpose of personalized treatment regimens.
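
The significance step mentioned above follows the usual permutation-test pattern: compute a statistic on the real grouping, then recompute it many times with the group labels shuffled. The sketch below shows that pattern generically. It is not EDDY itself: the statistic here is a stand-in (absolute difference of group means) for the network divergence the method computes, and all names and values are hypothetical.

```python
import random

def mean_difference(group_a, group_b):
    """Stand-in statistic: absolute difference of group means (EDDY uses a network divergence)."""
    return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

def permutation_p_value(group_a, group_b, statistic, n_perm=1000, seed=0):
    """Share of label shufflings whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(group_a)], pooled[len(group_a):]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Made-up scores for sensitive and non-sensitive cell lines
sensitive = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
non_sensitive = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
p_separated = permutation_p_value(sensitive, non_sensitive, mean_difference)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3], mean_difference)
```

Any statistic with the same two-group signature can be passed in, which is why the divergence computation can be swapped for the placeholder without changing the test logic.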


32

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER

Jeffrey A. Thompson1, Carmen J. Marsit2

1Dartmouth College, 2Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which combines molecular and clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and with a high level of performance. Furthermore, the proposed process of data integration captures relationships in the data that represent highly disease-relevant functions.


33

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK

Guhan Ram Venkataraman1, Chloe O'Connell1, Fumiko Egawa2, Dorna Kashef-Haghighi1, Dennis Paul Wall1

1Stanford University, 2St. George's University

Fumiko Egawa

Autism has been shown to have a major genetic risk component; the architecture of documented autism in families has been over and again shown to be passed down for generations. While inherited risk plays an important role in the autistic nature of children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novos are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.


34

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR

Shefali S. Verma1, Anastasia M. Lucas1, Daniel R. Lavage1, Joseph B. Leader1, Raghu Metpally2, Sarathbabu Krishnamurthy1, Frederick Dewey1, Ingrid Borecki1, Alexander Lopez3, John Overton3, John Penn3, Jeffrey Reid3, Sarah A. Pendergrass1, Gerda Breitwieser2, Marylyn D. Ritchie1

1Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 2Department of Functional and Molecular Genomics, Geisinger Health System, Danville, PA; 3Regeneron Genetics Center, Tarrytown, NY

Shefali Setia Verma

A wide range of patient health data is recorded in Electronic Health Records (EHR). This data includes diagnosis, surgical procedures, clinical laboratory measurements, and medication information. Together this information reflects the patient’s medical history. Many studies have efficiently used this data from the EHR to find associations that are clinically relevant, either by utilizing International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms to extract case and control status with accuracy from the EHR. Here we developed a strategy to utilize longitudinal quantitative trait data from the EHR at Geisinger Health System focusing on outpatient metabolic and complete blood panel data as a starting point. Comprehensive Metabolic Panel (CMP) as well as Complete Blood Counts (CBC) are parts of routine care and provide a comprehensive picture from high level screening of patients’ overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) with median values of 25 different clinical laboratory measurements to identify variants from HumanOmniExpressExome beadchip data that are associated with these measurements. We identified 687 variants that associated and replicated with the tested clinical measurements at p<5x10^-08. Since longitudinal data from the EHR provides a record of a patient’s medical history, we utilized this information to further investigate the ICD-9 codes that might be associated with differences in variability of the measurements in the longitudinal dataset. We identified low and high variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups – a high variance and a low variance for each lab variable). We then performed a PheWAS analysis with ICD-9 diagnosis codes, separately in the high variance group and the low variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high and low variance groups. For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (Type-I diabetes) in patients with high variance of plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance for aspartate aminotransferase (discovery p-value: 8.71x10^-09 and replication p-value: 2.03x10^-06). In general, we see a pattern of many more statistically significant associations from patients with high variance in the quantitative lab variables, in comparison with the low variance group across all of the 25 laboratory measurements. This study is one of the first of its kind to utilize quantitative trait variance from longitudinal laboratory data to find associations among genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.
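
The per-patient summaries described above (a median for the GWAS phenotype and a variance for the high/low grouping) can be sketched as follows. The abstract does not say how the high and low variance groups were delimited, so the median-variance cutoff used here is an assumption, as are the function names and the toy glucose values.

```python
from statistics import median, pvariance

def summarize_patients(measurements):
    """Map patient id -> (median, variance) of that patient's repeated lab values."""
    return {pid: (median(values), pvariance(values))
            for pid, values in measurements.items() if len(values) >= 2}

def split_by_variance(summary):
    """Split patients at the median within-patient variance (assumed cutoff)."""
    cutoff = median([variance for _, variance in summary.values()])
    high = sorted(pid for pid, (_, variance) in summary.items() if variance > cutoff)
    low = sorted(pid for pid, (_, variance) in summary.items() if variance <= cutoff)
    return high, low

# Made-up longitudinal glucose values for four hypothetical patients
toy_glucose = {
    "p1": [90, 92, 91],
    "p2": [80, 150, 110],
    "p3": [100, 101, 99],
    "p4": [70, 200, 120],
}
toy_summary = summarize_patients(toy_glucose)
high_variance, low_variance = split_by_variance(toy_summary)
```

Repeating the split for each lab variable would give one high variance and one low variance group per variable, matching the 50 groups the abstract mentions for 25 lab values.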


35

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION

Laura Wiley1, Jacob VanHouten2, David Samuels2, Melinda Aldrich3, Dan Roden2, Josh Peterson2, Joshua Denny2

1University of Colorado, 2Vanderbilt University, 3Vanderbilt University Medical Center

Laura Wiley

The blood thinner warfarin has a narrow therapeutic range and high inter- and intra-patient variability in therapeutic doses. Several studies have shown that pharmacogenomic variants help predict stable warfarin dosing. However, retrospective and randomized controlled trials that employ dosing algorithms incorporating pharmacogenomic variants underperform in African Americans. This study sought to determine if: 1) including additional variants associated with warfarin dose in African Americans, 2) predicting within single ancestry groups rather than a combined population, or 3) using percentage African ancestry rather than observed race, would improve warfarin dosing algorithms in African Americans. Using BioVU, the Vanderbilt University Medical Center biobank linked to electronic medical records, we compared 25 modeling strategies to existing algorithms using a cohort of 2,181 warfarin users (1,928 whites, 253 blacks). We found that approaches incorporating additional variants increased model accuracy, but not in clinically significant ways. Race stratification increased model fidelity for African Americans, but the improvement was small and not likely to be clinically significant. Use of percent African ancestry improved model fit in the context of race misclassification.


36

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


37

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX

Brian Aevermann1, Jamison McCorrison1, Pratap Venepally1, Rebecca Hodge2, Trygve Bakken2, Jeremy Miller2, Mark Novotny1, Danny N. Tran1, Francisco Diez-Fuertes3, Lena Christiansen4, Fan Zhang4, Frank Steemers4, Roger S. Lasken1, Ed Lein2, Nicholas Schork1, Richard H. Scheuermann1

1J. Craig Venter Institute, 2Allen Institute for Brain Science, 3Instituto de Salud Carlos III, 4Illumina, Inc.

Richard Scheuermann

Next generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, the fact that the procedure begins with a relatively small amount of starting material, thereby pushing the limits of the laboratory procedures required, dictates that careful approaches for sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We’ve applied this initial framework to a dataset comprised of 2272 single nuclei RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor quality samples provides for an objective and scalable approach to sc/nRNA-seq quality control.


38

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES

Pablo Cordero, Joshua M. Stuart

UC Santa Cruz Genomics Institute, University of California, Santa Cruz

Pablo Cordero

The availability of gene expression data at the single cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns that are associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to the interpretation of a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation including putative co-factors for the multi-functional NFAT pathway.


39

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT

Kristin I. Fread1, William D. Strickland2, Garry P. Nolan3, Eli R. Zunder1

1Department of Biomedical Engineering, University of Virginia; 2Department of Biomedical Sciences, University of Virginia; 3Department of Microbiology and Immunology, Stanford University

Eli Zunder

Pooled sample analysis by mass cytometry barcoding carries many advantages: reduced antibody consumption, increased sample throughput, removal of cell doublets, reduction of cross-contamination by sample carryover, and the elimination of tube-to-tube variability in antibody staining. A single-cell debarcoding algorithm was previously developed to improve the accuracy and yield of sample deconvolution, but this method was limited to using fixed parameters for debarcoding stringency filtering, which could introduce cell-specific or sample-specific bias to cell yield in scenarios where barcode staining intensity and variance are not uniform across the pooled samples. To address this issue, we have updated the algorithm to output debarcoding parameters for every cell in the sample-assigned FCS files, which allows for visualization and analysis of these parameters via flow cytometry analysis software. This strategy can be used to detect cell type-specific and sample-specific effects on the underlying cell data that arise during the debarcoding process. An additional benefit to this strategy is the decoupling of barcode stringency filtering from the debarcoding and sample assignment process. This is accomplished by removing the stringency filters during sample assignment, and then filtering after the fact with 1- and 2-dimensional gating on the debarcoding parameters which are output with the FCS files. These data exploration strategies serve as an important quality check for barcoded mass cytometry datasets, and allow cell type and sample-specific stringency adjustment that can remove bias in cell yield introduced during the debarcoding process.


40

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS


41

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS

Chen Gao, Junghi Kim, Wei Pan

Division of Biostatistics, School of Public Health, University of Minnesota

Wei Pan

Due to its high dimensionality and high noise levels, analysis of a large brain functional network may not be powerful and easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n) in addition to the usual p<n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
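
For readers unfamiliar with the topological overlap matrix, the sketch below computes the unsigned TOM commonly used in WGCNA from a symmetric adjacency matrix with entries in [0, 1]: the overlap of nodes i and j is (shared-neighbor weight + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the connectivity of node i. This is a minimal illustration of the standard definition, not the authors' code. The function name and the toy networks are hypothetical, and how the adjacency is derived from rs-fMRI correlations is left out.

```python
def topological_overlap(adj):
    """Unsigned topological overlap matrix for a symmetric adjacency with zero diagonal."""
    n = len(adj)
    # Connectivity of each node: sum of its connection strengths to all other nodes
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0
                continue
            # Connection strength routed through shared neighbors of i and j
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Toy networks (made up): a three-node path 0-1-2 and a fully connected triangle
path_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle_adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path_tom = topological_overlap(path_adjacency)
triangle_tom = topological_overlap(triangle_adjacency)
```

In the path example, nodes 0 and 2 are not directly connected but share a neighbor, so their overlap is nonzero; using 1 minus the overlap as a dissimilarity in hierarchical clustering is what groups such nodes into modules.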


42

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS

Zhana Kuncheva1, Michelle L. Krishnan2, Giovanni Montana2

1Imperial College London, 2King's College London

Zhana Kuncheva

Characterizing the transcriptome architecture of the human brain is fundamental in gaining an understanding of brain function and disease. A number of recent studies have investigated patterns of brain gene expression obtained from an extensive anatomical coverage across the entire human brain using experimental data generated by the Allen Human Brain Atlas (AHBA) project. In this paper, we propose a new representation of a gene's transcription activity that explicitly captures the pattern of spatial co-expression across different anatomical brain regions. For each gene, we define a Spatial Expression Network (SEN), a network quantifying co-expression patterns amongst several anatomical locations. Network similarity measures are then employed to quantify the topological resemblance between pairs of SENs and identify naturally occurring clusters. Using network-theoretical measures, three large clusters have been detected featuring distinct topological properties. We then evaluate whether topological diversity of the SENs reflects significant differences in biological function through a gene ontology analysis. We report on evidence suggesting that one of the three SEN clusters consists of genes specifically involved in the nervous system, including genes related to brain disorders, while the remaining two clusters are representative of immunity, transcription and translation. These findings are consistent with previous studies showing that brain gene clusters are generally associated with one of these three major biological processes.

43

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

44

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION

Padideh Danaee, Reza Ghaeini, David Hendrix

Oregon State University

Padideh Danaee

Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to deeply extract functional features from high dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful cancer biomarkers for the detection of breast cancer that deserve further studies.

45

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS

Jacob M. Keaton1, Jacklyn N. Hellwege1, Maggie C. Y. Ng1, Nicholette D. Palmer1, James S. Pankow2, Myriam Fornage3, James G. Wilson4, Adolfo Correa4, Laura J. Rasmussen-Torvik5, Jerome I. Rotter6, Yii-Der I. Chen6, Kent D. Taylor6, Stephen S. Rich7, Lynne E. Wagenknecht1, Barry I. Freedman1, Donald W. Bowden1

1Wake Forest School of Medicine, 2University of Minnesota, 3University of Texas Health Science Center at Houston, 4University of Mississippi Medical Center, 5Northwestern University Feinberg School of Medicine, 6Harbor-UCLA Medical Center, 7University of Virginia

Jacob Keaton

Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with known T2D loci implicated in insulin secretion. To test this hypothesis, single nucleotide polymorphisms (SNPs) nominally associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, and previously associated with T2D in genome-wide association studies (GWAS) were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n = 492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using GWAS data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n = 2,725 cases, 4,167 controls) with T2D as the outcome. In single variant analyses, suggestively significant (P_interaction < 5 × 10^-6) interactions were observed at several loci including DGKB (rs978989), CDK18 (rs12126276), CXCL12 (rs7921850), HCN1 (rs6895191), FAM98A (rs1900780), and MGMT (rs568530). Notable beta-cell GRS interactions included two SNPs at the DGKB locus (rs6976381; rs6962498). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.
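
To make the "jointly as a genetic risk score" step concrete, here is a minimal sketch of how a GRS and a SNP-by-GRS interaction term might be assembled for one subject. The function names, the optional weights and the design-row layout are assumptions for illustration only; the abstract does not state whether the score was weighted or which regression model was fitted.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts (0, 1 or 2 per SNP), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("one weight per SNP is required")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


def interaction_design_row(snp_dosage, grs):
    """Covariates for one subject: intercept, SNP, GRS and their product."""
    return [1.0, float(snp_dosage), float(grs), float(snp_dosage) * float(grs)]
```

In a case-control analysis such rows would feed a logistic regression with T2D status as the outcome, and the coefficient on the product term is the interaction being tested.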

46

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS

Madeleine Scott1, Francesco Vallania2, Purvesh Khatri3

1Stanford Medical School, Stanford University, Stanford, California; 2Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California; 3Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California

Purvesh Khatri

The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, as well as underscore the classically recognized inflammatory state that underlies this disease.

47

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS

Ana Stanescu, Gaurav Pandey

Icahn School of Medicine at Mount Sinai

Gaurav Pandey

Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors, which have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad-hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious as compared to the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance as compared to CES.
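
For readers unfamiliar with the CES baseline, the sketch below shows greedy forward ensemble selection with replacement in its simplest form: at each round, add the base predictor whose inclusion most improves the averaged prediction on a validation set. It is not the RL-based approach proposed in the abstract. The function names, the 0.5 threshold and the use of plain accuracy are assumptions chosen for brevity; an unbalanced problem would normally call for a different metric.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of validation examples classified correctly."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def greedy_ensemble_selection(base_predictions, labels, max_rounds=10):
    """Greedy forward selection with replacement.

    base_predictions: dict mapping predictor name -> list of validation scores.
    Returns the list of selected names (repeats allowed) and the final score.
    """
    n = len(labels)
    selected = []
    running_sum = [0.0] * n
    best_score = float("-inf")
    for _ in range(max_rounds):
        best_name, best_candidate = None, best_score
        for name in sorted(base_predictions):
            preds = base_predictions[name]
            size = len(selected) + 1
            averaged = [(running_sum[i] + preds[i]) / size for i in range(n)]
            score = accuracy(averaged, labels)
            if score > best_candidate:
                best_name, best_candidate = name, score
        if best_name is None:
            break  # no candidate strictly improves the ensemble
        selected.append(best_name)
        chosen = base_predictions[best_name]
        running_sum = [running_sum[i] + chosen[i] for i in range(n)]
        best_score = best_candidate
    return selected, best_score
```

The stopping rule and `max_rounds` are exactly the kind of ad-hoc parameters the abstract identifies as a weakness of this family of methods.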

48

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE

Kathleen Whiting1, Larry Y. Liu2, Mehmet Koyutürk2, Gunnur Karakurt2

1Uniformed Services University, 2Case Western Reserve University

Gunnur Karakurt

Intimate partner violence (IPV) is a serious problem with devastating health consequences. Screening procedures may overlook relationships between IPV and negative health effects. To identify IPV-associated women’s health issues, we mined national, aggregated de-identified electronic health record data and compared female health issues of domestic abuse (DA) versus non-DA records, identifying terms significantly more frequent for the DA group. After coding these terms into 28 broad categories, we developed a network map to determine strength of relationships between categories in the context of DA, finding that acute conditions are strongly connected to cardiovascular, gastrointestinal, gynecological, and neurological conditions among victims.

49

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

50

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM

Andrew Beck1, Alexander Luedtke2, Keli Liu3, Nathan Tintle4

1University of Michigan, 2University of California-Berkeley, 3Harvard University, 4Dordt College

Nathan Tintle

The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
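
The baseline that this abstract argues against can be illustrated with a short sketch: the classical Pearson chi-square test of Hardy-Weinberg equilibrium, together with the simple plug-in step of replacing hard genotype counts by summed posterior probabilities. This is the kind of "typical approach" described as poorly calibrated, not the authors' proposed method, whose details are not given here. The function names are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for Hardy-Weinberg equilibrium.

    Counts may be integers or fractional expected counts.
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def counts_from_posteriors(posteriors):
    """Sum per-sample posteriors (P(AA), P(AB), P(BB)) into expected counts."""
    return tuple(sum(column) for column in zip(*posteriors))
```

Feeding `counts_from_posteriors(...)` into `hwe_chi_square(...)` treats uncertain genotypes as if they were observed, which is why the type I error rate can drift as uncertainty grows.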

51

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING

Diana Diaz1, Michele Donato2, Tin Nguyen1, Sorin Draghici1

1Wayne State University, 2Stanford University Medical Center

Sorin Draghici

MicroRNAs play important roles in the development of many complex diseases. Because of their importance, the analysis of signaling pathways including miRNA interactions holds the potential for unveiling the mechanisms underlying such diseases. However, current signaling pathway databases are limited to interactions between genes and ignore miRNAs. Here, we use the information on miRNA targets to build a database of miRNA-augmented pathways (mirAP), and we show its application in the contexts of integrative pathway analysis and disease subtyping. Our miRNA-mRNA integrative pathway analysis pipeline incorporates a topology-aware approach that we previously implemented. Our integrative disease subtyping pipeline takes into account survival data, gene and miRNA expression, and knowledge of the interactions among genes. We demonstrate the advantages of our approach by analyzing nine sample-matched datasets that provide both miRNA and mRNA expression. We show that integrating miRNAs into pathway analysis results in greater statistical power, and provides a more comprehensive view of the underlying phenomena. We also compare our disease subtyping method with the state-of-the-art integrative analysis by analyzing a colorectal cancer database from TCGA. The colorectal cancer subtypes identified by our approach are significantly different in terms of their survival expectation. These miRNA-augmented pathways offer a more comprehensive view and a deeper understanding of biological pathways. A better understanding of the molecular processes associated with patients' survival can lead to a better prognosis and an appropriate treatment for each subtype.

52

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS

Arda Durmaz, Tim A. D. Henderson, Douglas Brubaker, Gurkan Bebek

Case Western Reserve University

Gurkan Bebek

Motivation: Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies.

Results: In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E−10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01 respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups.

Conclusions: These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples.

53

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS

Halla Kabat, Leo Tunkle, Inhan Lee

miRcore

Inhan Lee

Given the diverse molecular pathways involved in tumorigenesis, identifying subgroups among cancer patients is crucial in precision medicine. While most targeted therapies rely on DNA mutation status in tumors, responses to such therapies vary due to the many molecular processes involved in propagating DNA changes to proteins (which constitute the usual drug targets). Though RNA expressions have been extensively used to categorize tumors, identifying clinically important subgroups remains challenging given the difficulty of discerning subgroups within all possible RNA-RNA networks. It is thus essential to incorporate multiple types of data. Recently, RNA was found to regulate other RNA through a common microRNA (miR). These regulating and regulated RNAs are referred to as competing endogenous RNAs (ceRNAs). However, global correlations between mRNA and miR expressions across all samples have not reliably yielded ceRNAs. In this study, we developed a ceRNA-based method to identify subgroups of cancer patients combining DNA copy number variation, mRNA expression, and microRNA (miR) expression data with biological knowledge. Clinical data is used to validate identified subgroups and ceRNAs. Since ceRNAs are causal, ceRNA-based subgroups may present clinical relevance. Using lung adenocarcinoma data from The Cancer Genome Atlas (TCGA) as an example, we focused on EGFR amplification status, since a targeted therapy for EGFR exists. We hypothesized that global correlations between mRNA and miR expressions across all patients would not reveal important subgroups and that clustering of potential ceRNAs might define molecular pathway-relevant subgroups. Using experimentally validated miR-target pairs, we identified EGFR and MET as potential ceRNAs for miR-133b in lung adenocarcinoma. The EGFR-MET up and miR-133b down subgroup showed a higher death rate than the EGFR-MET down and miR-133b up subgroup. Although transactivation between MET and EGFR has been identified previously, our result is the first to propose ceRNA as one of its underlying mechanisms. Furthermore, since MET amplification was seen in the case of resistance to EGFR-targeted therapy, the EGFR-MET up and miR-133b down subgroup may fall into the drug non-response group and thus preclude EGFR target therapy.

54

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES

Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle

Dordt College

Nathan Tintle

Gene set analysis methods continue to be a popular and powerful method of evaluating genome-wide transcriptomics data. These approaches require a priori grouping of genes into biologically meaningful sets, and then conducting downstream analyses at the set (instead of gene) level of analysis. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses due to both reduced multiple testing penalties and potentially larger observed effects due to the aggregation of effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed, transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales yielding more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.

55

METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT

Pei Fen Kuan, Junyan Song, Shuyao He

Stony Brook University

Pei Fen Kuan

DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV which allows for confounding factors adjustment and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package methylDMV implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.

56

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS

Meng Ma1, Changchang Wang2, Benjamin Glicksberg1, Eric E. Schadt1, Shuyu Li1, Rong Chen1

1Icahn School of Medicine at Mount Sinai, 2Anhui University

Shuyu Li

Genomic sequencing studies in the past several years have yielded a large number of cancer somatic mutations. There remains a major challenge in delineating a small fraction of somatic mutations that are oncogenic drivers from a background of predominantly passenger mutations. Although computational tools have been developed to predict the functional impact of mutations, their utility is limited. In this study, we applied an alternative approach to identify potentially novel cancer drivers as those somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We hypothesize that those shared mutations are more likely to be cancer drivers because they have the established molecular mechanisms to impact protein functions. We first show that the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then attempted to identify putative tumor suppressors based on the number of distinct HGMD/COSMIC overlapping mutations in a given gene, and our results suggest that ion channels, collagens and Marfan syndrome associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified those HGMD/COSMIC overlapping mutations that are not only highly recurrent but also mutually exclusive from previously characterized oncogenic mutations in each specific cancer type. Taken together, our study represents a novel approach to discover new cancer genes from the vast amount of cancer genome sequencing data.

57

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS

André Schultz1, Sanket Mehta1, Chenyue W. Hu1, Fieke W. Hoff2, Terzah M. Horton3, Steven M. Kornblau2, Amina A. Qutub1

1Rice University, 2University of Texas MD Anderson Cancer Center, 3Baylor College of Medicine and Texas Children's Hospital

André Schultz

Cancer metabolism differs remarkably from the metabolism of healthy surrounding tissues, and it is extremely heterogeneous across cancer types. While these metabolic differences provide promising avenues for cancer treatments, much work remains to be done in understanding how metabolism is rewired in malignant tissues. To that end, constraint-based models provide a powerful computational tool for the study of metabolism at the genome scale. To generate meaningful predictions, however, these generalized human models must first be tailored for specific cell or tissue sub-types. Here we first present two improved algorithms for (1) the generation of these context-specific metabolic models based on omics data, and (2) Monte-Carlo sampling of the metabolic model flux space. By applying these methods to generate and analyze context-specific metabolic models of diverse solid cancer cell line data, and primary leukemia pediatric patient biopsies, we demonstrate how the methodology presented in this study can generate insights into the rewiring differences across solid tumors and blood cancers.

58

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

59

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING

Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

Ohio State University

Travis Johnson

Mouse brain transcriptomic studies are important in the understanding of the structural heterogeneity in the brain. However, it is not well understood how cell types in the mouse brain relate to human brain cell types on a cellular level. We propose that it is possible with single cell granularity to find concordant genes between mouse and human and that these genes can be used to separate cell types across species. We show that a set of concordant genes can be algorithmically derived from a combination of human and mouse single cell sequencing data. Using this gene set, we show that similar cell types shared between mouse and human cluster together. Furthermore we find that previously unclassified human cells can be mapped to the glial/vascular cell type by integrating mouse cell type expression profiles.

60

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG

Kimberly R. Kanigel Winner1, James C. Costello2

1Computational Bioscience Program, Department of Pharmacology, University of Colorado Cancer Center; 2University of Colorado Anschutz Medical Campus

Kimberly Kanigel Winner

Tumors are composed of heterogeneous populations of cells. Somatic genetic aberrations are one form of heterogeneity that allows clonal cells to adapt to chemotherapeutic stress, thus providing a path for resistance to arise. In silico modeling of tumors provides a platform for rapid, quantitative experiments to inexpensively study how compositional heterogeneity contributes to drug resistance. Accordingly, we have built a spatiotemporal model of a lung metastasis originating from a primary bladder tumor, incorporating in vivo drug concentrations of first-line chemotherapy, resistance data from bladder cancer cell lines, vascular density of lung metastases, and gains in resistance in cells that survive chemotherapy. In metastatic bladder cancer, a first-line drug regimen includes six cycles of gemcitabine plus cisplatin (GC) delivered simultaneously on day 1, and gemcitabine on day 8 in each 21-day cycle. The interaction between gemcitabine and cisplatin has been shown to be synergistic in vitro, and results in better outcomes in patients. Our model shows that during simulated treatment with this regimen, GC synergy does begin to kill cells that are more resistant to cisplatin, but repopulation by resistant cells occurs. Post-regimen populations are mixtures of the original, seeded resistant clones, and/or new clones that have gained resistance to cisplatin, gemcitabine, or both drugs. The emergence of a tumor with increased resistance is qualitatively consistent with the five-year survival of 6.8% for patients with metastatic transitional cell carcinoma of the urinary bladder treated with a GC regimen. The model can be further used to explore the parameter space for clinically relevant variables, including the timing of drug delivery to optimize cell death, and patient-specific data such as vascular density, rates of resistance gain, disease progression, and molecular profiles, and can be expanded for data on toxicity. The model is specific to bladder cancer, which has not previously been modeled in this context, but can be adapted to represent other cancers.
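
The dosing regimen described above (six 21-day cycles, gemcitabine plus cisplatin on day 1 and gemcitabine alone on day 8) can be written down as a simple event list of the sort a simulation loop might consume. This is only a sketch of the schedule itself; the function name and the tuple layout are assumptions, and nothing here reflects how the authors' spatiotemporal model is implemented.

```python
def gc_schedule(cycles=6, cycle_length=21):
    """Return (day, drugs) dosing events; day 1 is the first day of treatment."""
    events = []
    for cycle in range(cycles):
        start = cycle * cycle_length
        # Day 1 of each cycle: both drugs delivered together.
        events.append((start + 1, ("gemcitabine", "cisplatin")))
        # Day 8 of each cycle: gemcitabine alone.
        events.append((start + 8, ("gemcitabine",)))
    return events
```

Exposing `cycles` and `cycle_length` as parameters mirrors the abstract's suggestion that the timing of drug delivery is a variable worth exploring.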

61

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA

Juho Kim, Nate Russell, Jian Peng

University of Illinois at Urbana-Champaign

Juho Kim

Single-cell analysis can uncover the mysteries in the state of individual cells and enable us to construct new models about the analysis of heterogeneous tissues. State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single-cells and detect hidden information. They are able to provide the measurements of dozens of features simultaneously in each cell. However, due to the high-dimensionality, heterogeneous complexity and sheer enormity of single-cell data, its interpretation is challenging. Thus, new methods to overcome high-dimensionality are necessary. Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data onto a low-dimensional (2D or 3D) space while preserving the similarity structure between single-cells. We first construct a network that can represent the similarity structure between the high-dimensional representations of single-cells, and then, embed this network into a low-dimensional space through an efficient online optimization method based on the idea of negative sampling. Using this approach, we can preserve the high-dimensional structure of single-cell data in an embedded low-dimensional space that facilitates visual analyses of the data.

62

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

POSTER PRESENTATIONS

63

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM

Ernesto Borrayo, Ryoko Machida-Hirano

Gene Research Center, University of Tsukuba

Ernesto Borrayo

The interactions between genotype and environment give rise to phenotypic plasticity. However, these interactions are dynamic and complex. What is considered as a phenotype at one evaluation can be considered as an environmental condition at some other, as that previous phenotype will affect particular conditions for the new one. Also, under a specific perspective a determined genetic material can be considered as an environmental condition for other loci. These concepts elucidate that the “one gene, one trait” rationale is rather the exception than the rule, and in order to adequately predict the possible phenotype expected at any biological level, the specific interaction between environment and genotype should be analyzed carefully. In order to infer the degree of influence of both a genotype and an environment over certain phenotypic traits, we developed a cluster-based algorithm that renders the way phenotypical traits can be explained by either that genotype or such environmental conditions. Although this approach is still far from being able to consider all possible aspects that may explain a phenotypic condition, it is a first approach to successfully analyzing the mentioned genotype-environment-phenotype interactions in a comprehensive manner. To test the algorithm along with synthetic data, real genetic, environmental and agromorphological traits of Theobroma cacao and Sechium edule were also analyzed. We expect that further exploration of different classifiers will help to adequately predict phenotypic expression at different biological levels, with significant applications in diverse fields such as crop improvement, genomics, clinical diagnosis/prognosis/treatment and metabolomics, and that it will enhance our understanding of genomics, metabolomics and adaptation/evolutionary processes.

64

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS

Jingyi Jessica Li1, Guo-Liang Chew2, Mark D. Biggin3

1Department of Statistics and Department of Human Genetics, UCLA; 2Computational Biology Program, Fred Hutchinson Cancer Research Center; 3Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory

Jingyi Jessica Li

Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an “amplification exponent”. Here we show that to quantitate translational control it is necessary to decompose the translation rate into two components. One component, TRmD, depends on the mRNA level and defines the amplification exponent. The other component, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in S. cerevisiae TRmD represents ~30% of the variance in translation and results in an amplification exponent of ~1.20–1.27. TRmIND constitutes the remaining 70% of the variance in translation and explains <5% of the variance in protein expression. When protein degradation is also considered, the correlation between the abundances of protein and mRNA is R^2_prot-RNA > 0.92. We also investigate which mRNA sequence elements explain the variance in TRmD and TRmIND. We find that TRmIND is most strongly determined by the length of the open reading frame, while TRmD is more strongly determined by an A rich, highly unfolded element that spans nucleotides -35 to +28 relative to the initiating AUG codon, implying that TRmIND is under different evolutionary selective pressures than TRmD. Our work introduces methods for correctly scaling mRNA and protein abundance data using internally controlled standards. It provides quite different, more accurate estimates of translational control than previous estimates. By decomposing translation rates, we also provide insights into the mRNA sequence dependencies of translation that would not be apparent otherwise.
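
The power-law relation described above (protein abundance proportional to mRNA abundance raised to an amplification exponent) becomes a straight line on log-log axes, so the exponent can be read off as a slope. The sketch below assumes paired, strictly positive abundance measurements and uses ordinary least squares; both the function name and the choice of estimator are assumptions, since the abstract does not describe how the exponent was estimated, and measurement noise in mRNA levels would bias a plain regression slope.

```python
import math


def amplification_exponent(mrna, protein):
    """Least-squares slope of log(protein) against log(mRNA)."""
    if len(mrna) != len(protein) or len(mrna) < 2:
        raise ValueError("need at least two paired measurements")
    x = [math.log(v) for v in mrna]
    y = [math.log(v) for v in protein]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("mRNA abundances must not all be equal")
    return sxy / sxx
```

An exponent of 1 would mean protein scales linearly with mRNA; values above 1, as in the ~1.20–1.27 range reported, mean that abundant mRNAs are also translated at higher rates per molecule.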

65

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois at Urbana-Champaign

Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.

66

GENERAL

POSTER PRESENTATIONS

67

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS

Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

Case Western Reserve University

Mehmet Koyutürk

Advances in high-throughput omics technologies have revolutionized our understanding of the genomic underpinnings of cancer. However, many challenges remain in understanding how patients with common driver mutations may display diverging phosphoproteomic responses to the same treatment. Thus, an examination of the signaling landscape will provide essential molecular information for modeling personalized patient treatment design. However, integrative bioinformatics approaches to identify phosphoproteomics-based molecular states are in their infancy. To address this challenge, we adapt our algorithm MoBaS, which was originally developed to identify phenotype-associated subnetworks in the context of genome-wide association studies. MoBaS takes as input a PPI network and a score for each protein indicating the protein’s differential phosphorylation level. It then identifies protein subnetworks that are (i) composed of densely interacting proteins, and (ii) enriched in proteins with high scores. MoBaS also assesses the statistical significance of the identified subnetworks using permutation tests that effectively handle multiple hypothesis testing. We apply MoBaS to compare and contrast the drug-induced global signaling alterations of two KRAS mutated non-small cell lung cancer (NSCLC) cell lines, A549 and H358, treated with a novel activator of the tumor suppressor Protein Phosphatase 2A (PP2A) versus DMSO control. Applying kinase enrichment analysis on identified subnetworks, we identify AuroraKB as a key kinase differentially regulated between the two cell lines in response to our compound. Further corroborating this finding, we show that AuroraKB is downregulated at the protein and mRNA levels with our treatment in A549 but not in H358.
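The idea of assessing a subnetwork by permutation can be sketched as follows. This is a simplified illustration and not the MoBaS procedure itself: it assumes the module statistic is the plain sum of protein scores and tests a single fixed module, whereas the abstract describes permutation tests that also account for multiple hypotheses. All names and scores below are hypothetical.

```python
import random

def permutation_p_value(scores, module, n_perm=1000, seed=0):
    """Empirical p-value for the summed score of `module` under score shuffling."""
    rng = random.Random(seed)
    proteins = list(scores)
    values = [scores[p] for p in proteins]
    observed = sum(scores[p] for p in module)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)                      # break the protein-score pairing
        permuted = dict(zip(proteins, values))
        if sum(permuted[p] for p in module) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical differential-phosphorylation scores for ten proteins.
scores = {"P1": 3.0, "P2": 2.8, "P3": 0.1, "P4": 0.2, "P5": 0.3,
          "P6": 0.4, "P7": 0.5, "P8": 0.6, "P9": 0.7, "P10": 0.8}
p_value = permutation_p_value(scores, module=["P1", "P2"])
```

Because the module here holds the two highest-scoring proteins, only a small fraction of shuffles can match the observed sum, so the estimated p-value is small.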

68

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS

Yongsheng Bai, Naureen Aslam, Ali Salman

Indiana State University

Yongsheng Bai

Background: MicroRNAs (miRNAs) are short nucleotide sequences that interact with their target mRNAs through 3’ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) project, initiated in 2006, has sequenced tissue collections with matched tumor and normal samples from 11,000 patients in 33 cancer types and subtypes, including 10 rare cancers. There is an urgent need to develop innovative methodologies and tools that can cluster mRNA-miRNA interaction pairs into groups and characterize functional consequences of cancer risk genes while analyzing the tumor and normal samples simultaneously.

Rationale: An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes or miRNAs as vertices and their calculated correlation as edges. Our hypothesis is: if a highly scored gene/miRNA cluster in a given tumor sample shows a significantly altered regulation relative to a similar gene/miRNA cluster in the corresponding non-tumor sample, the cluster is biologically significant.

Results: We developed a mathematical model to identify clusters of significant mRNA and miRNA interaction pairs and decipher the mRNA and miRNA regulation network using TCGA miRNA sequencing and mRNA sequencing data. We ran the cluster detection algorithm, implemented in Python 3, on TCGA Breast Invasive Carcinoma (BRCA) transcriptome (both RNA-Seq and miRNA-Seq) datasets. Using a different cluster size (or bin) and a different selection of miRNA and mRNA pairs for creating clusters generates a different cluster topology, and therefore also a different number of common clusters between tumor and normal samples. We ran 1,000 different random selections of target pairs to generate different cluster topologies and combined all results to obtain 105,850 distinct candidate clusters for prioritization.

Conclusions: We think our methodology for identifying cancer driver genes in personal genomes, for which clinicians seek to develop better treatment strategies, is valuable to the field. Our proposed method should be applicable across a range of diseases and cancers.

69

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY

Chengsheng Zhu1, Yannick Mahlich1,2,3,4, Yana Bromberg1,4

1Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA; 2Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany; 3Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany; 4Institute of Advanced Study (TUM-IAS), Garching, Germany

Yana Bromberg

Summary: Microbial functional diversification is driven by environmental factors. In some cases, microbes differ more across environments than across taxa. Here we introduce fusionDB, a novel database of microbial functional similarities, indexed by available environmental preferences. fusionDB entries represent nearly fourteen hundred taxonomically-distinct bacteria annotated with available metadata: habitat, temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Database searches produce easily visualizable XML-formatted network files of selected organisms, along with their shared functions. fusionDB thus provides a fast means of associating specific environmental factors with organism functions.

Availability: http://bromberglab.org/databases/fusiondb and as a sql-dump by request.

Contact: [email protected], [email protected]

70

THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN

Frank C. Brosius1, Wenjun Ju1, Keith Bellovich2, Zeenat Bhat3, Crystal Gadegbeku4, Debbie Gipson1, Jennifer Hawkins1, Julia Herzog1, Susan Massengill5, Richard C. McEachin1, Subramaniam Pennathur1, Kalyani Perumal6, Roger Wiggins1, Matthias Kretzler1

1University of Michigan, 2Renaissance Renal Research Institute, 3Wayne State University, 4Temple University, 5Levine Children’s Hospital, 6University of Illinois at Chicago

Richard McEachin

Recent advances have allowed the development of molecular maps to define chronic kidney disease (CKD) in new, accurate and personalized ways. These developments make possible the prediction of outcomes and response to therapy and the identification of key molecular targets for treatment of CKD in individual patients. Identification of such targets entails close collaboration between teams of investigators to collect and annotate samples from well characterized CKD subjects. In addition, technologies are needed that support information exchange, robust databanks, and data integration to define key pathways driving CKD pathogenesis. The O'Brien Kidney Translational Core Center at the University of Michigan provides such biobanking, databank structure and bioinformatic support to basic and clinical investigators to allow them to pursue critical precision medicine investigations of humans with CKD. The Clinical Phenotyping and Biobank Core has enrolled over 1200 patients with CKD from 5 sites and banked their samples and clinical information, providing a valuable resource for efficient discovery. Multiple specific research studies have now successfully utilized these resources. The Applied Systems Biology Core and its online analytical tool, Nephroseq, have assisted hundreds of investigators around the world in approaches to the analysis of large transcriptomic datasets and other systems-level, biological studies of patients with CKD. The Center’s Bioinformatics Core provides access to computational applications and skilled professional support in bioinformatics and biostatistics and will now be providing back-end maintenance of Nephroseq. The Administrative Core directs pilot and small grants, student training and discount programs with the goal of helping new and established researchers utilize systems biological and translational research tools. Together these cores provide comprehensive translational research support for novel research into classification and treatment of chronic kidney diseases. All interested academic investigators around the world are invited to make use of these services and to contact us for information and consultation.

71

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE

Danai Chasioti1, Xiaohui Yao1, Pengyue Zhang2, Xia Ning3, Lang Li2, Li Shen4

1IUPUI School of Informatics and Computing; 2Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine; 3IUPUI Department of Computer Science; 4Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine

Li Shen

Background: Mining high-order drug-drug interaction (DDI) induced adverse drug effects (ADEs) from electronic health record (EHR) databases is an emerging area, and very few studies have explored the relationships between DDIs. To bridge this gap, we study a novel pharmacovigilance problem for mining directional drug interaction effects on myopathy using the FDA Adverse Event Reporting System (FAERS) database.

Method: The analysis was performed on a case–control dataset extracted from the FAERS database. The dataset contains 1,763 drugs, and includes 136,860 myopathy events and 3,940,587 control events. Given two sets of drug combinations D1 and D2 (a superset of D1), we define the directional ADE effect from D1 to D2 as the altered ADE risk associated with the change from taking D1 to taking D2. The ADE risks were estimated using odds ratios (ORs). To address both computational and statistical challenges, this study focused on computing ORs for frequent D2’s (i.e., those whose number of occurrences is at least a user-specified minimum support). The Apriori algorithm was employed to identify frequent D2’s.

Results: Using a minimum support of 1000, we identified 764 frequent drugs, 7036 frequent 2-drug combinations, and 4280 frequent 3-drug combinations. The top ten ADE ORs for single drugs range from 4.1 to 5.6, for two-drug combinations from 12.6 to 21.5, and for three-drug combinations from 14.8 to 19.5. The top ten directional ADE ORs between one drug and two drugs range from 13.5 to 28.2; those between one drug and three drugs range from 13.1 to 20.3; and those between two drugs and three drugs range from 11.3 to 34.4. Multiple promising directional ADE findings were identified. For example, the odds of myopathy are 28.2 times higher when adding Gadopentetate dimeglumine on top of Gadobenate dimeglumine. Both drugs are Gadolinium-based contrast agents (GBCAs) used in magnetic resonance imaging. GBCAs have been shown to be associated with Nephrogenic systemic fibrosis (NSF), which may present as progressive myopathy.

Conclusion: The directional drug interactions capture the ADE risks introduced by additional drugs taken on top of a set of baseline drugs, and provide novel and valuable pharmacovigilance knowledge with potential to impact clinical decision support. Mining frequent patterns using Apriori is a promising approach for effective discovery of high-order directional drug interaction effects.
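The support count and a directional odds ratio can be sketched on toy data. This is an illustration under stated assumptions rather than the authors' exact definition: it assumes the directional OR compares reports containing all of D2 against reports containing D1 but not all of D2. The report list and the helper names `support` and `directional_or` are hypothetical.

```python
def support(reports, drugs):
    """Number of reports whose drug set contains every drug in `drugs`."""
    return sum(1 for taken, _ in reports if drugs <= taken)

def directional_or(reports, d1, d2):
    """Odds ratio of the event for 'taking D2' versus 'taking D1 but not all of D2'."""
    a = b = c = d = 0
    for taken, is_case in reports:
        if d2 <= taken:            # report contains the larger combination
            if is_case:
                a += 1
            else:
                b += 1
        elif d1 <= taken:          # report contains only the baseline drugs
            if is_case:
                c += 1
            else:
                d += 1
    return (a * d) / (b * c)

# Toy reports: (set of drugs taken, True if the report is a myopathy case).
reports = (
    [(frozenset({"A", "B"}), True)] * 4
    + [(frozenset({"A", "B"}), False)] * 2
    + [(frozenset({"A"}), True)] * 1
    + [(frozenset({"A"}), False)] * 5
    + [(frozenset({"C"}), False)] * 3
)
d1, d2 = frozenset({"A"}), frozenset({"A", "B"})
n_d2 = support(reports, d2)
or_d1_to_d2 = directional_or(reports, d1, d2)
```

In a full pipeline only combinations whose support meets the minimum threshold would be scored, which is the role the abstract assigns to Apriori.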

72

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN

Aslihan Dincer1, Eric E. Schadt2, Bin Zhang2, Joel T. Dudley2, Davin Gavin3, Schahram Akbarian4

1Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York; 2Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York; 3Department of Psychiatry, Jesse Brown Veterans Affairs Medical Center, Chicago; 4Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York

Aslihan Dincer

Only a few histone modifications have been mapped in the human brain. Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites (TSS) of active gene promoters. Regulators of the H3K4me3 mark are significantly associated with the genetic risk architecture of common neurodevelopmental disease, including schizophrenia and autism. Here, through integrative computational analysis of epigenomic and transcriptomic data based on next generation sequencing, we investigated H3K4me3 landscapes of FACS sorted neuronal and non-neuronal nuclei in human postmortem, non-human primate (chimpanzee and macaque) and mouse prefrontal cortex (PFC), and blood. We characterized the broad H3K4me3 histone domains from human PFC in the context of cell-type specific regulation, association with neuronal and non-neuronal gene expression and potential implications for normal and diseased development. We first addressed the occurrence and the biological significance of the broad H3K4me3 histone domains in three different cell types, including NeuN+ PFC neurons, NeuN- PFC cells, and nucleated blood cells, and then identified novel regulators of these three different cell types by focusing on the top 5% broadest H3K4me3 peaks (length in base pairs). In PFC neurons, the broadest peaks ranged in size from 3.9 to 12 kb, with extremely broad peaks (~10 kb or broader) related to synaptic function and GABAergic signaling (DLX1, ELFN1, GAD1, LINC00966). The broadest neuronal peaks showed distinct motif signatures, and were centrally positioned in prefrontal gene Bayesian regulatory networks. Approximately 120 of the broadest H3K4me3 peaks in human PFC neurons, including many genes related to glutamatergic and dopaminergic signaling, were fully conserved in chimpanzee, macaque and mouse cortical neurons. Exploration of the spread and breadth of lysine methylation markings in specific cell types could provide novel insights into epigenetic mechanisms of normal and diseased brain development, aging and evolution of neuronal genomes.

73

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER

Jennifer M. Franks1,2, Guoshuai Cai1, Jaclyn N. Taroni3,4, Michael L. Whitfield1,2

1Department of Molecular and Systems Biology; 2Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth; 3Department of Systems Pharmacology and Translational Therapeutics; 4Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine

Jennifer Franks

Systemic sclerosis (SSc) is a complex connective tissue disease involving skin and internal organ fibrosis, vascular damage, and immunologic abnormalities. To characterize disease heterogeneity and molecular pathogenesis, transcriptomics have elucidated common biological processes in subsets of SSc patients using intrinsic gene expression analyses. Four intrinsic subsets characterized by distinct molecular signatures have been validated by multiple independent cohorts. Technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from multiple studies. While microarray and RNA-seq data have been shown to have a high correlation, differences in overall processing and quantification result in distinct data distributions. Here, we introduce an accurate and reproducible classifier for SSc molecular subtypes and have developed a method to normalize data when platform-specific artifacts arise. We used three independent, well-characterized and validated experimental microarray datasets (Hinchcliff et al., 2013; Milano et al., 2008; Pendergrass et al., 2012) to train a supervised classifier using three-fold cross-validation repeated ten times, performing at an average of >88% accuracy. Data from other platforms, including RNA-seq, are analyzed for platform-based bias using guided PCA analysis (Reese et al., 2013). We developed a method to eliminate platform bias by normalizing on a gene-by-gene basis using the microarray training data as the target distribution. We find that this method successfully removes platform-specific effects from the data. Following normalization, each sample is assigned to a molecular subset based on support vector machine (SVM) classification. Our preliminary analyses find that these methods work extremely well on a validation RNA-seq dataset in SSc (100% accuracy, n=12, Li et al., in preparation). We also applied our methods to breast cancer DNA microarray and RNA-seq data from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas, 2012) where five intrinsic gene expression subsets have been previously identified and described with PAM50 (Parker et al., 2009). Tumor and tumor-adjacent normal biopsies of breast cancer, for which intrinsic subtype information was available, were used to train and test an SVM and evaluate our normalization technique. We achieve 93% accuracy in assigning subtypes for normalized RNA-seq data using our classifier trained exclusively on microarray data. Until recently, clinical trials and diagnosing physicians have not considered molecular heterogeneity in the context of immunosuppressive therapy, which may explain improvement in select SSc patients (Martyanov & Whitfield, 2016). Advancing personalized medicine by using intrinsic molecular subsets may prove particularly beneficial to this field. With our newly developed techniques, we can successfully leverage information from validated expression data in new analyses despite different platforms used for gene expression profiling.
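Gene-by-gene normalization toward a target distribution can be sketched as below. The abstract does not specify the transformation, so this example assumes the simplest option: shifting and scaling each gene so that its mean and standard deviation match those of the microarray training data. The function names, gene label and values are hypothetical.

```python
import statistics

def normalize_gene(values, target_mean, target_sd):
    """Rescale one gene's values to the target mean and standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant gene carries no spread to rescale; map it to the target mean.
        return [target_mean for _ in values]
    return [(v - mean) / sd * target_sd + target_mean for v in values]

def normalize_to_target(new_data, training_data):
    """Both arguments map gene name -> list of expression values across samples."""
    normalized = {}
    for gene, values in new_data.items():
        reference = training_data[gene]
        normalized[gene] = normalize_gene(
            values, statistics.fmean(reference), statistics.pstdev(reference)
        )
    return normalized

# Toy data: the same gene measured on two platforms with different scales.
microarray = {"GENE_A": [1.0, 2.0, 3.0]}
rnaseq = {"GENE_A": [100.0, 200.0, 300.0]}
normalized = normalize_to_target(rnaseq, microarray)
```

After this step the RNA-seq values sit on the scale the classifier saw during training, which is the property the normalization is meant to provide.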

74

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA

Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

University of Hawaii Cancer Center, Honolulu

Lana Garmire

The high mortality rate of Hepatocellular Carcinoma (HCC) is in part due to the vast heterogeneity of the cancer. Identifying robust molecular subgroups of HCC helps to guide precise targeted therapeutics. This could be realized by integrating different layers of omics datasets from the same cohort. To achieve this, we present a deep learning (DL) based method to inspect the different subpopulations of patients within HCC from TCGA. We obtained the information of 360 HCC patients available in TCGA with 3 omics data types (RNA-seq, miRNA-seq and methylation). To identify the different subpopulations, our pipeline implements a DL-based autoencoder, identifies hidden-layer features linked to survival, and performs k-means clustering using these new features. To assign new samples to the identified subpopulations, a supervised classification procedure was conducted using a Support Vector Machine (SVM). To assess the performance of the model, we used a 5-fold cross-validation scheme to estimate c-index and Brier scores. We also used a 60:40 ratio to split the data in 10 folds in order to assess the significance of the coxph regression in the test dataset. Finally, we inferred the cluster labels of two external cohorts based on the gene expression data. The autoencoder framework was used to combine the 3 omics as input features (~40,000) and to produce 100 transformed new features. Among these new features, we identified 36 features significantly linked with survival, which were further used to infer 2 optimal clusters of patients with significant survival differences. Using the cross-validation procedure, we obtained average c-index and Brier score values of 0.70 and 0.20 respectively, for the test sets. Also, the coxph regression shows significant survival estimation when using the test samples. Finally, our framework is validated on two external datasets: 221 HCC samples from a GEO study and 230 HCC samples from the LIRI-JP (RIKEN) cohort. Moreover, we showed that each of the individual omic feature sets can be used successfully to infer the 2 survival profiles. However, the combination of the 3 omics is more powerful. We also compared the DL methodology with new features produced by PCA instead. The two subpopulations differed significantly in clinical and molecular terms (survival, pathways, and driver mutation profiles). This is the first study to employ deep learning as a robust framework to identify non-linear combinations of multi-omics features linked to identification of subclasses of HCC patients. Using multi-omics datasets, our pipeline successfully combines these different features and identifies two HCC subpopulations exhibiting different survival profiles. We then used this model in combination with supervised machine-learning approaches to predict HCC subpopulation assignment for test and validation datasets.
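The c-index reported above measures how well predicted risks order patients by survival time. A minimal sketch of Harrell's concordance index follows, assuming right-censored data, half credit for tied risks, and no handling of tied times. The function name and toy values are hypothetical, and the authors' own computation may differ in such details.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    A pair (i, j) is comparable when patient i has the shorter follow-up time
    and an observed event; it is concordant when patient i also has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions get half credit
    return concordant / comparable

# Toy cohort: survival time, event indicator (1 = death observed), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.70 in context.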

75

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR

Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

Department of Health Sciences Research, Mayo Clinic, Rochester, MN

Guoqian Jiang

Introduction: The Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard developed at HL7, which enables the representation and exchange of electronic health records (EHR) data in a standard structure. FHIR has strong executable ability based on the RESTful service architecture and multiple flexible data exchange formats. Shiny is a web application framework with a simplified web deployment mechanism that enables powerful R functions to support graphical and interactive analysis. Therefore, with the goal of building reusable and extensible clinical statistics and analysis applications, we aim to design, develop and evaluate a flexible framework using the HL7 FHIR standard and the R-powered web application framework Shiny.

Methods: We first established a local FHIR server to manage our clinical data. This part of the work is focused on the analysis and implementation of the FHIR data models (i.e., core resources) and data exchange formats (e.g., XML and JSON), and on invoking the open source HAPI FHIR API. Second, we designed two analysis workflows that are focused on patient-centered data analysis and cohort-based data analysis respectively. According to the workflow design, we developed an open application platform known as ShinyFHIR using the Shiny web framework and the established FHIR server.

Results: We built a local FHIR server using the HAPI DSTU2 API. In total, 140 patient records, 476 observation records, 496 condition records and 107 procedure records were populated into the FHIR server for testing. With the support of R packages, including ‘jsonlite’, ‘dygraph’ and ‘timeline’, our platform can be used for a variety of use cases of clinical data analysis, including patient blood pressure observation timeline analysis, patient cohort gender/age distribution statistics, etc. The results of the experiment show that the ShinyFHIR integration approach offers the feasibility of web-based interactive statistics analysis on standardized FHIR-based clinical data.

Discussions: The implementations of FHIR have already attracted a lot of interest from healthcare practitioners. Our ShinyFHIR implementation provides a useful framework that would be complementary to other FHIR-based applications (e.g., SMART on FHIR). ShinyFHIR is designed to visualize FHIR-conformant data through capturing user experiences and habits, and offers rapid support for clinical research while drawing on the statistical power of R. However, several issues need to be solved in the future, such as support for FHIR extensions and custom models and the enhancement of system performance. In this study, we described our efforts in building a standardized clinical statistics and analysis application leveraging Shiny. We consider that the designed workflows can be applied to other EHR data that follow the FHIR standard, and that other publicly available FHIR servers can be used to validate the utility of our framework.
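The blood pressure timeline use case can be sketched by parsing FHIR JSON directly. This example is written in Python rather than the R used by the authors, and it works on a small hand-written Bundle instead of a live server; the sample values are invented. It assumes blood pressure is stored as Observation components coded with LOINC 8480-6 (systolic) and 8462-4 (diastolic).

```python
import json

# Hand-written sample Bundle (hypothetical data, deliberately out of date order).
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {
      "resourceType": "Observation",
      "effectiveDateTime": "2016-03-01",
      "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
         "valueQuantity": {"value": 128, "unit": "mmHg"}},
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
         "valueQuantity": {"value": 82, "unit": "mmHg"}}
      ]}},
    {"resource": {
      "resourceType": "Observation",
      "effectiveDateTime": "2016-01-15",
      "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
         "valueQuantity": {"value": 120, "unit": "mmHg"}},
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
         "valueQuantity": {"value": 80, "unit": "mmHg"}}
      ]}}
  ]
}
"""

SYSTOLIC, DIASTOLIC = "8480-6", "8462-4"

def blood_pressure_timeline(bundle):
    """Return (date, systolic, diastolic) tuples sorted by date."""
    rows = []
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        if obs.get("resourceType") != "Observation":
            continue
        readings = {}
        for comp in obs.get("component", []):
            codes = {c.get("code") for c in comp.get("code", {}).get("coding", [])}
            value = comp.get("valueQuantity", {}).get("value")
            if SYSTOLIC in codes:
                readings["systolic"] = value
            elif DIASTOLIC in codes:
                readings["diastolic"] = value
        if readings:
            rows.append((obs.get("effectiveDateTime"),
                         readings.get("systolic"), readings.get("diastolic")))
    return sorted(rows)

timeline = blood_pressure_timeline(json.loads(BUNDLE_JSON))
```

The resulting list of dated readings is the kind of table a plotting layer, such as the timeline widgets mentioned in the abstract, would consume.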

76

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH

Austin Huang1, Dmitri Bichko1, Mathieu Boespflug2, Edsko de Vries3, Facundo Dominguez2, Daniel Ziemek1

1Pfizer, 2Tweag I/O, 3Well-Typed

Austin Huang

Researchers need to aggregate contextual biological information in order to interpret experimental and clinical study results. These needs vary greatly depending on the scientific question. Creating large-scale, structured data repositories requires substantial investment that is not amenable to the rapidly-evolving needs of translational research. On the other hand, although performing data analyses using ad hoc collections of local data files (Excel sheets, CSV tables, etc.) allows rapid and flexible execution, it also creates technical debt. In the long term, these workflows result in missed opportunities to accumulate institutional knowledge and are associated with poor reproducibility. We have implemented a data platform that can achieve the benefits of a more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production through a simple, Git-like ingestion tool. Data scientists can retrieve data via familiar client language APIs such as dplyr in R. The platform is built on open source database (Postgres, with an architecture that allows alternative backends) and functional programming (Haskell, PostgREST) technologies. Our objective is to accelerate data sharing and discoverability within analyst teams and drastically reduce the effort of persisting data in a systematic way. We therefore provide a technology foundation for rapid data service production and for improving reproducibility and reusability in data analyses.
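Automated schema inference, one of the steps named above, can be illustrated with a small sketch. This is not the platform's implementation (which the abstract says is built with Haskell technologies). It assumes a simple rule: a column becomes `integer` if every non-empty value parses as an integer, `double precision` if every value parses as a float, and `text` otherwise. The function names and sample CSV are hypothetical.

```python
import csv
import io

def _parses(cast, values):
    """True if `cast` succeeds on every value."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True

def infer_schema(csv_text):
    """Map each CSV column to a Postgres-style type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    schema = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] != ""]
        if values and _parses(int, values):
            schema[column] = "integer"
        elif values and _parses(float, values):
            schema[column] = "double precision"
        else:
            schema[column] = "text"
    return schema

SAMPLE_CSV = "gene,count,score\nTP53,12,0.5\nKRAS,7,1.25\n"
schema = infer_schema(SAMPLE_CSV)
```

A production tool would also need to handle dates, missing-value markers and type conflicts between file versions, which this sketch leaves out.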

77

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES

Jeremie Kim1, Damla Senol1, Hongyi Xin2, Donghyuk Lee1,3, Mohammed Alser4, Hasan Hassan5, Oguz Ergin5, Can Alkan4, Onur Mutlu1,6

1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 3NVIDIA Research, Austin, TX; 4Department of Computer Engineering, Bilkent University, Ankara, Turkey; 5Department of Computer Engineering, TOBB University of Economics and Technology, Söğütözü, Ankara, Turkey; 6Department of Computer Science, Systems Group, ETH, Zürich, Switzerland

Jeremie Kim

High-throughput sequencing (HTS) technology has resulted in a massive influx of available genetic data. Using HTS technology, genomes are sequenced relatively quickly, producing many short DNA sequences (reads); analyzing the donor’s genome from these reads can take multiple days even with state-of-the-art methods. The first step of genome analysis, read mapping, determines origins for billions of reads within a reference genome to identify the donor’s genomic variants. Hash-table based read mappers are a common type of comprehensive read mapper. They operate by fetching, from a pre-generated hash table, potential mapping locations of a read in the reference genome, which are then verified by local alignment, a computationally-expensive dynamic programming algorithm that determines similarity between the read and the potential mapping segment of the reference genome. Alignment has traditionally been the computational bottleneck of read mapping, but recently, many works have proposed a new step called Location-Filtering in order to alleviate this bottleneck.

Location-Filtering is a critical step where many incorrect potential locations from the hash-table are discarded before local alignment verifies such locations. FastHASH, SHD, and GateKeeper propose variations of Location-Filtering that discard only incorrect locations to reduce end-to-end runtime of hash-table based read mapping. Location-Filtering is now the computational bottleneck of read mapping.

Our goal is to create an efficient Location-Filter that quickly discards as many false positive locations as possible before alignment, while retaining a zero false negative rate. Efficiently filtering incorrect mappings before alignment significantly improves throughput and latency of hash-table based read mapping. We propose a novel filtering algorithm that quickly eliminates from consideration reference genome segments where alignment would yield no matches. Our algorithm’s novelty mainly stems from its design to exploit 3D-stacked memory systems. 3D-stacked memory is an emerging technology that tightly integrates computation and high-capacity memory in a single die stack, thereby enabling concurrent processing of large data chunks at low latency and high bandwidth. The key ideas of our design consist of 1) a new representation of coarse-grained reference genome segments such that the genome can be operated on in parallel using bitwise operations and 2) exploiting the parallel computation capability of 3D-stacked memory to run massively-parallel in-memory operations on the new genome representation. We call our resulting filter the GRIM-Filter.
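The bitvector representation in key idea 1 can be sketched in software. This is an illustration under stated assumptions, not the GRIM-Filter design: it uses a Python integer as the per-segment bitvector of k-mer presence, and it assumes a threshold from the q-gram lemma (each error destroys at most k of a read's k-mers). The segments, reads and function names are hypothetical, and the sketch assumes a read lies entirely inside one segment.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_index(kmer):
    """Encode a k-mer as a base-4 integer, used as its bit position."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_CODE[base]
    return index

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_bitvector(segment, k):
    """Set one bit for every k-mer that occurs in the reference segment."""
    bits = 0
    for kmer in kmers(segment, k):
        bits |= 1 << kmer_index(kmer)
    return bits

def passes_filter(read, segment_bits, k, max_errors):
    """Keep the segment only if enough of the read's k-mers are present in it."""
    tokens = kmers(read, k)
    present = sum((segment_bits >> kmer_index(t)) & 1 for t in tokens)
    threshold = len(tokens) - k * max_errors   # q-gram lemma bound
    return present >= threshold

K = 3
segment_a = build_bitvector("ACGTTAGCCGATAGGC", K)
segment_b = build_bitvector("TTTTGGGGCCCCAAAA", K)
read = "TAGCCGATAG"          # exact substring of the first segment
mutated_read = "TAGCCTATAG"  # same read with one substitution
keep_a = passes_filter(read, segment_a, K, max_errors=1)
keep_b = passes_filter(read, segment_b, K, max_errors=1)
keep_mutated = passes_filter(mutated_read, segment_a, K, max_errors=1)
```

Each presence check is an independent bit lookup, which is why this kind of test lends itself to the massively parallel in-memory execution the abstract describes.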

This work shows how GRIM-Filter can be used with any hash-table based read mapping algorithm and how it effectively exploits processing-in-memory capabilities of 3D-stacked memory. We show that when running with 5% error tolerance, GRIM-Filter reduces false positive locations by 5.59x-6.41x and provides a 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH

78

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL

Melissa E. Ko1,2, Charis Teh3,4, Christopher S. Playter5, Eli R. Zunder6, Daniel H. Gray4,7, Wendy J. Fantl8, Sylvia K. Plevritis9, Garry P. Nolan2

1Cancer Biology Program, Stanford School of Medicine, Stanford, CA; 2Baxter Laboratory for Stem Cell Biology, Stanford School of Medicine, Stanford, CA; 3Molecular Genetics of Cancer Division, Immunology Division, The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 4Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia; 5Department of Biological Sciences, Purdue University, Lafayette, IN; 6Department of Biomedical Engineering, University of Virginia, Charlottesville, VA; 7The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 8Department of Obstetrics and Gynecology, Stanford School of Medicine, Stanford, CA; 9Department of Radiology, Stanford School of Medicine, Stanford, CA

Melissa Ko

Survival rates for B cell malignancies have steadily improved over the last five decades, reaching levels of over 50% as a result of therapeutic agents such as dexamethasone, bortezomib, and lenalidomide. However, despite their success in producing clinical responses, the cellular mechanisms by which these agents kill tumor cells are poorly understood. We hypothesized that the Bcl-2 family of proteins, which are known to control initiation of apoptosis and are frequently dysregulated in cancerous B cells such as multiple myeloma, can influence responsiveness to these therapeutic agents. Thus, with a focus on multiple myeloma, we aimed to comprehensively profile individual cells for their expression levels of Bcl-2 family members simultaneously with activated intracellular signaling proteins upon exposure of cells to drugs used to treat B-cell malignancies. We applied single-cell mass cytometry to investigate the interplay of pro-survival and pro-apoptotic Bcl-2 family members in MM1S B lymphoblastic cells exposed to different drugs. This dataset was analyzed with FLOW-MAP, a computational tool developed in the Nolan Lab that organizes high-dimensional single-cell data into an interpretable 2D graph structure. FLOW-MAP enabled the apoptotic progression of individual cells to be visualized and showed changes in expression levels of Bcl-2 family members and signaling factors across cells with different drug sensitivities. Our extensive study revealed heterogeneous responses of cell subsets to therapeutic agents used to treat multiple myeloma patients. For example, our results showed that bortezomib, a proteasome inhibitor approved for treatment of multiple myeloma, potently induces apoptosis within 24 hours to a greater extent compared to other treatments. Induction of apoptosis in single cells treated with bortezomib coincided with a selective reduction of a subset of pro-survival Bcl-2 members. Furthermore, our analysis suggests that a metric that reflects the balance of pro-survival and pro-apoptotic Bcl-2 proteins may best separate and predict cells with differential sensitivity to bortezomib. This paradigm is supported by statistical modeling wherein we developed a classifier of bortezomib-resistant vs. sensitive cells using Bcl-2 family information or a single Bcl-2 score with significant accuracy. Our study provides a general framework for understanding differential sensitivity of tumor populations to anti-cancer drugs. Our results are likely to identify previously unknown death-inducing mechanisms as well as pinpoint potential synergies between standard-of-care therapies and newly developed therapies, such as Bcl-2 family inhibitors.

79

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE

Emily K. Mallory, Chris Re, Russ B. Altman

Stanford University

Emily Mallory

A complete repository of biomedical relationships is key for understanding the processes underlying both human disease and drug response. After decades of experimental research, the majority of known biomedical relationships exist solely in textual form in the literature and are thus computationally inaccessible. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the rapid growth of the biomedical literature. To address the need for biomedical relationship extraction, there have been numerous biological entity and relationship extraction challenges; however, extraction systems in the biomedical space tend to be task specific and do not provide a general framework for quickly developing future systems to address new extraction tasks. In this work, we developed multiple entity and relationship applications (called “extractors”) for the system DeepDive to extract biomedical relationships from full text articles. DeepDive is a trained system for extracting information from a variety of sources, including text. Application developers create features and training examples, and DeepDive assigns a probability that a given entity or relationship is correct or true in the original sentence. We developed entity extractors for genes, drugs, and diseases; and relationship extractors for gene-gene, gene-disease, and gene-drug relationships. We evaluated the gene-gene work previously with a corpus of articles from three PLOS journals, and we are currently evaluating the other two relationship extractors on a corpus from PubMed Central. The precision of our entity extractors ranged from 80 to 90%. For the task of extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions previously curated by the Database of Interacting Proteins (DIP). For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Our current gene-disease and gene-drug extractors achieved over 70% precision on a random subset of documents from over 340,000 full text articles in the PubMed Central Open Access Subset. We are currently tuning these extractors to increase performance. This work will enable not only full text literature extraction for biomedical relationships, but also computational methods development based on these relationships.
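The precision and recall figures quoted above can be illustrated with a minimal sketch. It assumes that extractor output is available as a mapping from candidate gene pairs to probabilities and that a curated reference set of pairs exists. The function name, the 0.9 acceptance threshold, and the data layout are illustrative assumptions; none of them is taken from DeepDive or from the authors' evaluation.

```python
def precision_recall(candidates, curated, threshold=0.9):
    """Score thresholded extractions against a curated reference set.

    candidates: dict mapping (gene_a, gene_b) -> probability (assumed layout).
    curated:    iterable of (gene_a, gene_b) pairs treated as ground truth.
    threshold:  hypothetical cutoff above which a candidate is accepted.
    """
    # Treat relationships as unordered pairs so (A, B) matches (B, A).
    accepted = {frozenset(pair) for pair, p in candidates.items() if p >= threshold}
    gold = {frozenset(pair) for pair in curated}

    true_positives = len(accepted & gold)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Raising the threshold generally trades recall for precision, which is one way to read a result such as 76% precision at 49% recall.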


80

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING

Serghei Mangul1, Igor Mandric2, Harry Taegyun Yang1, Dennis Montoya1, Nicolas Strauli3, Jeremy Rotman1, Benjamin Statz1, Will Van Der Wey1, Alex Zelikovsky2, Roberto Spreafico1, Maura Rossetti1, Sagiv Shifman1, Mark Ansel3, Noah Zaitlen3, Eleazar Eskin1

1 University of California Los Angeles, 2 Georgia State University, 3 University of California San Francisco

Serghei Mangul

Assay-based approaches provide a detailed view of the adaptive immune system by profiling T- and B-cell receptors. However, these methods come at a high cost and lack the scale of regular RNA sequencing (RNA-seq). We developed ImReP, a novel computational method that utilizes RNA-seq data to profile the adaptive immune repertoire. ImReP is able to quantify individual immune responses from RNA-seq data based on a recombination landscape of genes encoding B- and T-cell receptors. We applied ImReP to 8,555 samples from 544 individuals and 53 diverse human tissues, and constructed the complementarity determining region 3 (CDR3), which is the most variable part of the antigen-binding site. We assembled 3.8 million distinct CDR3 sequences. Analyzing this dataset, we identified the normal, healthy, adaptive immune profile for different tissues. We describe the variation in immune profiles, and the distribution of clonal lineages across individuals and tissues. Based on the immune profiles generated by ImReP, we were able to identify inflammation and various diseases, as confirmed from the histological images. The atlas of T and B cell repertoires, freely available at https://sergheimangul.wordpress.com/atlas-of-t-and-b-cell-repertoires/, is the largest resource in terms of the number of CDR3 sequences and tissue types involved. We anticipate this resource to enhance future studies in areas such as immunology and advance development of therapies for human diseases. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.


81

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL

Neil Miller1, Greyson Twist1, Byunggil Yoo1, Andrea Gaedigk2

1 Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City; 2 Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City

Neil Miller

Advances in high-throughput DNA sequencing have enabled the comprehensive identification of individual genetic variation on an unprecedented scale, powering the diagnosis of disease and personalized treatment. As the ability to detect genetic variation has grown, clinicians and researchers struggle to interpret the functional significance of the millions of variants found in each individual genome. The Variant Warehouse at the Center for Pediatric Genomic Medicine at Children’s Mercy, Kansas City, is a resource containing a record of over 160 million genomic variants detected in more than 5000 patients sequenced by the Center since 2011. Each variant has been characterized by the CPGM’s Rapid Understanding of Nucleotide Effect Software (RUNES) pipeline, which records database cross references, predicted functional consequences and a variant classification score (1-5) based on preliminary guidelines from the American College of Medical Genetics and Genomics (ACMG). Additionally, a local allele frequency is calculated for each variant every 6 hours, enabling clinicians and researchers to rapidly identify rare variants. Despite extensive cross-referencing with databases such as dbSNP, ClinVar, ExAC and COSMIC, the CMH variant warehouse contains a significant number of novel variants not present in external databases. 59% of the total variants in the warehouse are novel with a local allele frequency of less than 0.25%. Of these, 1% are category 1-3 variants expected to have some functional impact. We have observed 82,578 variants among a panel of 58 pharmacogenes (including CPIC genes), of which 59% are novel and 2% are category 1-3 variants. The amount of novelty observed in this patient population suggests that efforts to comprehensively catalog human variation remain a work in progress and that the use of variant data will require some level of interpretation of novel variants for the foreseeable future. These observations are increasingly relevant in pharmacogenomics applications where drug compatibility is determined through association to known haplotypes; in this context, the presence of novel and rare variants must be anticipated and accounted for in automated haplotype determination. The CMH variant warehouse is publicly available at http://warehouse.cmh.edu. Tools to search and view variants by gene, category and allele frequency are provided as well as bulk downloads of data. Programmatic access to data is provided through implementations of the Global Alliance for Genomics and Health variant annotation API.
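As a rough illustration of the local allele frequency mentioned above, the sketch below computes the frequency of one variant from per-patient genotypes. It assumes diploid autosomal genotypes encoded as alternate-allele counts (0, 1 or 2) with `None` for a missing call. The function names and the encoding are hypothetical and are not taken from the warehouse itself; only the 0.25% rarity cutoff comes from the abstract.

```python
def local_allele_frequency(genotypes):
    """Alternate-allele frequency among patients with a genotype call.

    genotypes: list of alternate-allele counts per patient (0, 1 or 2),
               with None for patients lacking a call (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # frequency is undefined without any calls
    # Each called diploid patient contributes two alleles to the denominator.
    return sum(called) / (2 * len(called))


def is_rare(frequency, threshold=0.0025):
    """True when the frequency is below the 0.25% cutoff used in the abstract."""
    return frequency is not None and frequency < threshold
```

Recomputing this value on a schedule, as the abstract describes, keeps the rarity flag current as newly sequenced patients are added.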


82

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE

Vikas Pejaver1, Lilia M. Iakoucheva2, Sean D. Mooney3, Predrag Radivojac1

1 Department of Computer Science and Informatics, School of Informatics and Computing, Indiana University Bloomington; 2 Department of Psychiatry, University of California San Diego; 3 Department of Biomedical Informatics and Medical Education, University of Washington Seattle

Predrag Radivojac

Over the past decade, several methods have been developed for the computational prioritization of missense mutations. However, the identification of the effects of such mutations on protein structure and function still remains a major challenge. Previously, we developed MutPred, a random forest-based model for the classification of pathogenic missense variants and the automated inference of molecular mechanisms of disease. Here, we build on our previous work and present MutPred2 as an improved approach for these tasks. For pathogenicity prediction, MutPred2 particularly benefits from a larger and heterogeneous training set, the inclusion of new features, the encoding of local sequence context and the use of a neural network ensemble. Through cross-validation experiments and a test on an independent dataset, we show that MutPred2 outperforms MutPred and other state-of-the-art methods. In particular, we observe that MutPred2 predicts fewer pathogenic mutations than PolyPhen-2 when applied to homozygous mutations from healthy individuals. Additionally, MutPred2 has over 50 built-in structural and functional property predictors, which greatly increase the number of possible downstream consequences that can be associated with a given amino acid substitution. We introduce a novel ranking approach that utilizes a positive-unlabeled learning framework to derive posterior probabilities for the disruption of these properties and, thus, infer the most likely molecular mechanism of pathogenicity. We then demonstrate the utility of MutPred2 in two situations. First, we identify prominent structural and functional signatures in a dataset of mostly Mendelian diseases (from MutPred2’s training set) and recapitulate known associations between these diseases and ordered and structured regions of proteins. We also make novel predictions about the role of allosteric residues in such diseases. Second, we apply MutPred2 to a dataset of de novo mutations from patients diagnosed with neuropsychiatric disorders, along with healthy siblings as controls. On this dataset, MutPred2 pathogenicity scores alone are sufficient to distinguish between neuropsychiatric cases and controls, without any additional gene-based or variant-based filtering. We also observe that disruptions in protein-protein interactions (PPIs), phosphorylation and acetylation are frequent mechanisms, suggesting that neuropsychiatric disorders are largely characterized by a breakdown in molecular signaling. Finally, we identify candidate mutations predicted to disrupt PPIs and validate them experimentally.


83

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY

Sergei Pond1, Steven Weaver1, Joel Wertheim2, Andrew J. Leigh Brown3

1 Temple University, 2 University of California San Diego, 3 University of Edinburgh

Sergei Pond

Many pathogens, including HIV, propagate along sexual and social contact networks. It is now clear that HIV transmission networks belong to the scale free family and the spread of infections in scale free networks is critically enhanced by highly connected individuals or “hubs”. The structure of the transmission network has major implications for interrupting an epidemic. Since pathogen transmission networks are not observed directly, they are inferred and characterized based on indirect measurements, and methods to do this properly remain an open research challenge. Because of their rapid and host-specific evolution and chronic disease states, HIV sequence isolates are essentially unique to each infected person. This sequence uniqueness can be used to confirm or reject the hypothesis that two individuals are “linked” by a recent transmission or belong to the same transmission cluster. There are ~1,000,000 HIV sequences isolated from different individuals over the last 4 decades. National and international surveillance and drug resistance programs are generating high resolution sequencing data on hundreds of thousands of isolates annually. We developed HIV Transmission Cluster Engine (HIV-TRACE) in order to make the process of cluster (and network) inference automated, fast, convenient, and more robust. It is an efficient open-source application designed to scale well and enable near real-time inference and analysis of large networks: it can process 100,000 sequences in ~15-30 minutes on a 64 core backend system. HIV-TRACE (hiv-trace.org) is an open-source web application built on robust and popular modern libraries. User interaction and result visualization are done entirely in the browser; processing is done asynchronously on a server backend. Components and versions of HIV-TRACE are used by the CDC (VARS, HICSB), Canadian public health officials, NYC Department of Public health, San Diego primary infection cohort, and the UK Drug Resistance Database. We illustrate the utility of HIV-TRACE on four real-world examples of essential questions in public health and epidemiology of HIV-1: 1) Are there rapidly growing transmission clusters, and what is driving their growth? 2) How does HIV spread at different geographic scales, and among different risk groups? 3) How can treatment and intervention be deployed in optimal ways to reduce incidence and prevalence? 4) Can vaccine and prevention efficacy be measured more accurately using network-level information?
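The idea of linking individuals whose sequences are nearly identical and then reading clusters off the resulting network can be sketched as follows. The abstract does not state the distance measure or the linkage threshold, so this sketch substitutes a simple proportion of mismatched sites on pre-aligned sequences and a hypothetical 1.5% cutoff. Both are assumptions for illustration only and should not be read as the HIV-TRACE implementation.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def transmission_clusters(sequences, threshold=0.015):
    """Group sequence ids into clusters of putatively linked individuals.

    sequences: dict mapping an id to its aligned sequence (assumed input).
    threshold: hypothetical maximum distance for drawing a link.
    Returns a sorted list of clusters (each a sorted list of ids);
    unlinked individuals are omitted.
    """
    parent = {sid: sid for sid in sequences}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-pairs comparison: quadratic in the number of sequences.
    for a, b in combinations(sequences, 2):
        if p_distance(sequences[a], sequences[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for sid in sequences:
        groups.setdefault(find(sid), []).append(sid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Clusters here are connected components, so two individuals can share a cluster through an intermediate even when their own sequences differ by more than the threshold.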


84

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY

Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

Dart NeuroScience

Douglas Fenger

We are interested in discovering new candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s. To achieve our goal we need a comprehensive and objective understanding of the human genome contribution to variation in memory performance in healthy individuals. We are implementing a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test – the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com) – to conveniently screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. Identified subjects (1) are validated by a battery of secondary memory tasks, and (2) provide saliva samples from which we can isolate DNA for GWAS. Ten pilot experiments were conducted to parameterize the XMC screen. Participants learned face-name pairs for a delayed recall test. After initial study, each name was presented and participants were asked to select the correct face among four (distracters were other faces paired with different names). One day later participants completed a final test trial. We are primarily interested in forgetting across sessions, as this provides an estimate of consolidation across a 24-hour time interval. Pilot studies indicated the optimal protocol should include 30 face-name pairs, presented at a 4 second rate. To date, 17,849 participants from 176 nations have been screened in the XMC. Of these, 11,311 have completed both sessions. Individuals in our sample are most frequently Caucasian (55%), post-secondary school-educated (63%), most alert in the morning by self-report (51%), and right handed (89.5%). The average age was 34, and the gender distribution was split evenly. The forgetting rate (decrease in performance from day 1 to day 2) was 10%. We have identified 49 individuals with perfect performance on day 2 of the test and 24 with exceptional consolidation abilities (defined as 3 SDs from the mean). We have begun the genomics phase of the study with 33 individuals who have completed additional behavioral testing.
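The screening criterion can be made concrete with a small sketch. It assumes that forgetting is scored as the drop in proportion correct between the two sessions over 30 face-name pairs, and that "exceptional" means a forgetting score at least 3 standard deviations below the sample mean. The direction of the cutoff, the use of the population standard deviation, and all names are assumptions; the abstract only states "3 SDs from the mean".

```python
from statistics import mean, pstdev


def forgetting(day1_correct, day2_correct, n_pairs=30):
    """Drop in proportion correct from session 1 to session 2."""
    return (day1_correct - day2_correct) / n_pairs


def exceptional_consolidators(scores, n_sd=3.0):
    """Ids whose forgetting score lies at least n_sd SDs below the mean.

    scores: dict mapping a participant id to a forgetting score.
    """
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []  # no variation, so nobody stands out
    return sorted(pid for pid, s in scores.items() if (s - mu) / sd <= -n_sd)
```

With a z-score criterion like this, the cutoff moves as the sample grows, so flagged participants would need to be re-evaluated as more people complete both sessions.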


85

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS

Yingxue Ren1, Joseph S. Reddy1, Vivekananda Sarangi2, Jason P. Sinnwell2, Steve G. Younkin3, Nilüfer Ertekin-Taner3, Owen A. Ross3, Rosa Rademakers3, Shannon K. McDonnell2, Joanna M. Biernacka2, Yan W. Asmann1

1 Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL; 2 Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 3 Department of Neuroscience, Mayo Clinic, Jacksonville, FL

Yingxue Ren

Identifying novel disease variants through next generation sequencing (NGS) has been a fruitful practice in medical research in recent years, leading to the discoveries of new disease mechanisms as well as therapeutic strategies. The GATK best practices have since been established to provide general recommendations on core processing steps required to go from raw reads to final variant call sets. However, with the sample size drastically increasing in today’s sequencing experiments, many default variant calling strategies and the choice of tools call for a closer examination. Our study utilized the whole exome sequencing data provided by the Alzheimer's Disease Sequencing Project (ADSP) to test different variant calling strategies and tools involved in the variant discovery workflow in the context of sample sizes. We first investigated the impact of using different sequence aligners on variant call sets while keeping the default GATK settings of the variant calling and QC steps identical. We selected 1952 samples to align by both BWA and NovoAlign, and compared the variant call sets in 50, 100, 200, 500, 1000 and 1952 samples. We discovered that the percentage of variants unique to one aligner increased dramatically with increasing sample sizes. At a sample size of 1952, the unique variants generated by BWA and NovoAlign account for more than 20% of total called variants. These unique variants have good variant quality metrics: ~80% have a Genotype Quality (GQ) score of 60 or above, and their distribution of B allele concentration (BAC) centers around 0.5 and 1, consistent with what is expected of diploid genomes. What’s more, over 96% of the unique variants have a population B allele frequency (BAF) of less than 0.01, indicating that these variants are rare in the population. All these metrics suggest that these unique variants are important to include in downstream variant analysis. In addition to aligner comparison, we also evaluated single-sample variant calling versus the default strategy, single-sample variant calling followed by joint multi-sample genotyping, in 50, 100, 500, 2000, and 5000 samples. Our data showed that, with increasing sample sizes, the single-sample calling strategy added an increasing percentage of unique variants. At a sample size of 5000, single-sample calling added 58,884 variants, accounting for 5.55% of total variants called by both strategies. 7331 of these unique variants passed Variant Quality Score Recalibration (VQSR) and have GQ of 60 or above in at least 5 samples. Our study identified a large number of good-quality variants from the ADSP exome sequencing project that were missed by using one aligner or using the multi-sample genotyping strategy alone. Our findings revealed the relationships between bioinformatics pipelines and biomedical research results, and suggested that alternative variant calling strategies may be beneficial for optimal variant discovery in the face of today’s large sequencing scale.
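The aligner comparison boils down to set operations on two call sets. The sketch below assumes each call set has already been reduced to a set of `(chrom, pos, ref, alt)` tuples, which is an illustrative representation and not necessarily how the authors compared their output. It reports the variants found by only one aligner and their share of the combined set.

```python
def aligner_unique_fraction(calls_a, calls_b):
    """Compare two variant call sets.

    calls_a, calls_b: sets of (chrom, pos, ref, alt) tuples (assumed layout).
    Returns (unique_to_a, unique_to_b, fraction), where fraction is the
    share of the combined call set found by only one of the two aligners.
    """
    unique_a = calls_a - calls_b
    unique_b = calls_b - calls_a
    total = calls_a | calls_b
    fraction = (len(unique_a) + len(unique_b)) / len(total) if total else 0.0
    return unique_a, unique_b, fraction
```

Exact tuple matching treats differently normalized representations of the same indel as distinct variants, so call sets would need consistent normalization before a comparison like this is meaningful.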


86

TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ

Pamela Russell1, Richard Radcliffe2, Brian Vestal1, Wen Shi1, Pratyaydipta Rudra1, Laura Saba2, Katerina Kechris1

1 Department of Biostatistics and Informatics, Colorado School of Public Health; 2 Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences

Pamela Russell

Extensive work has led to robust quantification methods for RNA-seq data primarily derived from large RNAs. Many studies have used these methods “out of the box” to estimate microRNA (miRNA) expression from small RNA-seq data. However, these methods do not effectively address issues particular to miRNAs. First of all, reference bias is amplified due to the small size of sequencing reads derived from miRNAs (~22 nt). That is, with shorter reads, a true mismatch between a sample and the reference can lead to incorrect alignments or inability to align reads at all, creating a count bias toward those samples with the reference allele. With longer reads, single mismatches have less impact on alignment algorithms. Second, any bias for individual miRNAs is more impactful overall due to the relatively small repertoire of miRNAs compared to mRNAs. Inaccurate counts for a handful of miRNAs can significantly alter overall library counts and thus affect normalization. We refer to this issue as repertoire bias. Also, most miRNA studies seek to identify functional mature miRNA molecules regardless of the position in the genome that they are originally transcribed from or small non-functional differences between miRNAs of the same family. Tools designed for large RNAs do not address the repetitive nature and family structure of miRNAs, by default returning estimated counts for multiple targets that should be considered equivalent by typical miRNA study paradigms. Genome-based methods often map miRNA reads to multiple loci encoding the same mature miRNA. Methods based on mapping directly to a miRNA database do not suffer from multiple alignments due to identical regions of the genome but do typically distinguish among members of each miRNA family. Both sources of multiple mappings can lead to misleading counts when the goal is to elucidate function. Here we explore all these issues in the context of commonly used methods. We then propose a new high throughput approach that (1) incorporates individual genetic variation into the reference sequence used for alignment, reducing reference bias, and (2) assigns each read to a single functional group such as a miRNA family. We demonstrate the accuracy of this approach compared to other popular methods using a dataset derived from 206 mouse brain samples. Funded by NIH/NIAAA AA016597, R01AA021131 and R24AA013162
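Step (2) above, assigning each read to a single functional group, can be sketched as a post-alignment collapse. The sketch assumes each read comes with the list of miRNA entries it aligned to and that a lookup table maps each entry to a family label. The function name, the data layout, and the decision to set aside reads that span more than one family are illustrative assumptions, not a description of the authors' method.

```python
from collections import Counter


def family_counts(read_hits, family_of):
    """Collapse multi-mapped reads to one count per miRNA family.

    read_hits: dict mapping a read id to the list of miRNA names it hit.
    family_of: dict mapping a miRNA name to its family label.
    Returns (counts, ambiguous): per-family counts and the number of reads
    that hit no known family or more than one family.
    """
    counts = Counter()
    ambiguous = 0
    for hits in read_hits.values():
        # Hits missing from the lookup table are ignored.
        families = {family_of[h] for h in hits if h in family_of}
        if len(families) == 1:
            counts[families.pop()] += 1  # counted once, however many loci
        else:
            ambiguous += 1
    return counts, ambiguous
```

A read that maps to several loci or several members of one family contributes exactly one count, which avoids the inflated or split counts described in the abstract.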


87

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS

Damla Senol1, Jeremie Kim1, Saugata Ghose1, Can Alkan2, Onur Mutlu1,3

1 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; 2 Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; 3 Department of Computer Science, Systems Group, ETH Zürich, Switzerland

Damla Senol

Nanopore sequencing, a promising single-molecule DNA sequencing technology, exhibits many attractive qualities and, in time, could potentially surpass current sequencing technologies. Nanopore sequencing promises higher throughput, lower cost, and increased read length, and it does not require a prior amplification step. Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification and measure the change in the ionic current as long strands of DNA (ssDNA) pass through the nano-scale protein pores.

Biological nanopores for DNA sequencing were first proposed in the 1990s, but the technology was made commercially available only recently, in May 2014, by Oxford Nanopore Technologies (ONT). The first commercial nanopore sequencing device, MinION, is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that produces real-time data. These properties enable new potential applications of genome sequencing, such as rapid surveillance of Ebola, Zika or other epidemics, near-patient testing, and other applications that require real-time data analysis. In addition, this technology is capable of generating very long reads (~50,000 bp) with minimal sample preparation. Despite all these advantageous characteristics, it has one major drawback: high error rates. In order to provide higher accuracy and higher speed, in May 2016, ONT released a new version of MinION with a new nanopore chemistry called R9, which replaced the previous version R7. Although R9 chemistry improves the data accuracy, the tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology.

Our goal in this work is to comprehensively analyze tools for nanopore sequence analysis, with a focus on understanding the advantages, disadvantages, and bottlenecks of the various tools. To this end, we rigorously examine multiple steps in the nanopore genome analysis pipeline. The first step, basecalling, translates the raw signal output of MinION into nucleotides to generate DNA sequences. Currently, Nanocall and Nanonet are publicly available nanopore basecallers. The second step performs genome assembly with assemblers for noisy long reads. Using only the basecalled DNA reads, assemblers generate longer contiguous fragments called draft assemblies. Currently, Canu and Miniasm are the commonly used long-read assemblers. After this step, an improved consensus sequence is generated from the draft assembly with Nanopolish, and a complete whole genome is obtained.

We analyze the five aforementioned nanopore sequencing tools in terms of their speed and accuracy, with the goals of determining their bottlenecks and finding improvements to these tools. We also discuss potential future work on nanopore basecallers and assemblers, to take better advantage of nanopore sequencing and to overcome its current disadvantage of high error rates.


88

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER

Kyle Smith1, Subhajyoti De2, Debashis Gosh1

1 University of Colorado, 2 Rutgers University

Kyle Smith

Outliers, which are very different from the typical cases in a cohort, bring in unexpected challenges for decision making in many different disciplines. The issue is more acute in oncology, since most types of cancer are highly heterogeneous diseases. Even within any cancer subtype, patients show extensive variation in their molecular profiles and clinical outcomes. Even within a cohort of cancer patients who have apparently the same biomarkers and received identical treatment, there are exceptional responders and exceptional non-responders, who are outliers. It is suspected that their atypical molecular and clinical profiles contribute to their exceptional response. While identifying such outlier cases can benefit precision medicine initiatives, methods to detect them from multidimensional data have received limited attention. Here, we propose a novel framework to identify outlier cancer patients with atypical profiles from multidimensional genomic data. We argue that detection of outlier patients with atypical profiles can help identify exceptional responders and tailor precision medicine in oncology initiatives.


89

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS

Abiodun Otolorin1, Nana Osafo2, William Southerland2

1 Department of Community and Family Medicine, Howard University, Washington, DC; 2 Department of Biochemistry & Molecular Biology and the Center for Computational Biology and Bioinformatics, Howard University, Washington, DC

William Southerland

Despite the widespread adoption of electronic medical record systems and advances in genomics, a major barrier to research endeavors is the lack of intuitive user-friendly interactive tools that enable researchers to access and analyze data readily. In light of this, innovative tools have been developed to address the problem. However, we hypothesized that an interactive data visualization tool that is capable of stand-alone or plugin functionality and that also leverages common data query methodologies would contribute to research efforts requiring interrogation of clinical research databases. Howard University Hospital (HUH) is a tertiary academic medical center with over 50,000 emergency department visits and 8,000 inpatient admissions per year and primarily provides care to the minority population in the District of Columbia metropolitan area. Using de-identified HUH electronic medical records data, a HUH clinical research database was developed. Additionally, the Howard University electronic Medical Records (HUeMR) query tool was developed as a web-based client-server application using JavaScript and PHP. HUeMR may function in stand-alone or plugin mode. Its graphical interface was built using Google Charts, an interactive open source visualization library. HUeMR supports complex boolean search operations specified by an interactive query tool. Ontology is presented using linked dropdown menus and query construction is displayed in natural language form. Data is displayed using editable interactive charts. Multiple rows of charts may be created that contain different types of data concepts. Queries may be refined by clicking on the charts followed by selection of one or more additional query parameters. Diagnosis based on ICD codes or keywords may also be searched. These features are illustrated in a diabetes use-case investigation. In summary, HUeMR is a secure data analytics tool that can be used in stand-alone or plugin mode to query clinical research databases. It has a highly interactive user interface that allows rapid data analysis for cohort discovery. This work was supported by grant #5G12MD007597 from the National Institute on Minority Health and Health Disparities from the NIH.


90

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY

Kun-Hsing Yu1, Gerald J. Berry2, Daniel L. Rubin1, Christopher Ré3, Russ B. Altman1, Michael Snyder4

1 Biomedical Informatics Program, Stanford University; 2 Department of Pathology, Stanford University; 3 Department of Computer Science, Stanford University; 4 Department of Genetics, Stanford University

Kun-Hsing Yu

Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable to its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. To address this problem, we obtained hematoxylin and eosin stained whole-slide histopathology images, pathology reports, RNA-sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas. We profiled gene expression, protein expression and modifications, and extracted more than 9,000 objective features from the histopathology images of each patient. We successfully predicted histology grade with transcriptomics and proteomics signatures (area under curve > 0.75) and identified the associated molecular pathways, such as cell cycle regulation, which provide biological insights into tumor cell differentiation grades. We further built an integrative histopathology-transcriptomics model to generate superior prognostic predictions for stage I patients (P < 0.01) compared with gene expression or histopathology analysis alone. These results suggest that the integration of histopathology and omics studies can reveal the molecular mechanisms of pathology findings and enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of malignancy or diseases.


91

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA

Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

Institute of Medical Science, University of Tokyo

Yao-zhong Zhang

Copy number variations (CNVs) are an important type of genetic variation widely used for profiling cancer and other complex diseases. Accurate detection and summarization of CNVs help identify oncotargets and cancer subtypes for precision medicine. When NGS data are used for CNV detection, various heterogeneous biases, such as GC-content bias and other noise, need to be properly handled. This becomes especially important for CNV detection on single-cell NGS data. In this study, we extend traditional HMM approaches for CNV detection with deep learning. We extract a feature representation, which integrates the information from read count and observable genomic sequences, as the new observable sequence of genomic bins and iteratively train a DNN-HMM model for CNV detection. We compare our method with other HMM-based CNV detection methods.
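For readers unfamiliar with the HMM baseline that the abstract extends, the sketch below shows a plain three-state HMM decoded with the Viterbi algorithm over per-bin log2 read-depth ratios. It is not the authors' DNN-HMM. The state means, the emission standard deviation, and the self-transition probability are all assumed values chosen only to make the example run.

```python
import math

STATES = ("loss", "neutral", "gain")
# Assumed log2 depth ratios for 1, 2 and 3 copies in a diploid genome.
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}
SIGMA = 0.2   # assumed standard deviation of the Gaussian emissions
STAY = 0.98   # assumed probability of remaining in the same state


def log_emission(x, state):
    """Log density of a Gaussian emission for one bin."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log transition probability with a shared self-transition weight."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log2_ratios):
    """Most likely copy-number state for each genomic bin."""
    if not log2_ratios:
        return []
    # Uniform prior over states for the first bin.
    score = {s: math.log(1 / len(STATES)) + log_emission(log2_ratios[0], s)
             for s in STATES}
    back = []
    for x in log2_ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = score[best] + log_transition(best, cur) + log_emission(x, cur)
            pointers[cur] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In a DNN-HMM of the kind the abstract describes, a neural network would presumably take over the role of the Gaussian emission model while the decoding step stays the same.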


92

IMAGING GENOMICS

POSTER PRESENTATIONS


93

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA

Dongdong Lin1, Vince D. Calhoun2, Juan R. Bustillo3, Nora Perrone-Bizzozero4, Jingyu Liu1

1 The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque; 2 Dept. of Electronic and Computer Engineering, University of New Mexico, Albuquerque; 3 Dept. of Psychiatry, University of New Mexico, Albuquerque; 4 Dept. of Neurosciences, University of New Mexico, Albuquerque

Jingyu Liu

Epigenetic regulation by DNA methylation and histone modification has been increasingly recognized for its relevance to schizophrenia (SZ). Beyond the genetic variation, epigenetics through regulation of gene transcription and expression can potentially explain the ‘missing’ heritability and mediate the effect of genetic risks in disease. Specific to DNA methylation, recent studies have demonstrated that 6-7% of CpG sites across the genome show significant correspondence between brain and blood, supporting the investigation of easily accessible tissues for brain and mental disorders. In this study, we analyzed DNA methylation of 163 CpG sites from saliva and whole brain gray matter density of 108 SZ patients and 105 healthy controls. We are aware of cellularity differences between blood and saliva, and to the best of our knowledge no detailed saliva-brain correspondence study has been done except a general comparison of overall patterns, which indicates saliva may be a closer indicator of brain than blood. The 163 CpG sites are located within the 108 schizophrenia risk regions reported by the Psychiatric Genomics Consortium schizophrenia working group, and also showed strong cross-tissue similarity based on the genome-wide methylation study of blood and brain tissues by Hannon, et al. Quality control and normalization for methylation data were implemented using the minfi R package to remove batch effect and cell type proportion effect. Gray matter density maps were segmented by SPM12 with a smoothing kernel of 8 mm^3. We applied independent component analysis to both brain imaging data and methylation data, and extracted 25 gray matter networks and 15 methylation components. Among them, two methylation components were significantly correlated to three gray matter networks (false discovery rate < 0.05). The first methylation component comprised two CpG sites within and near gene ZSCAN12, and was associated with a bilateral middle/superior temporal network (r = 0.25) and a bilateral superior frontal network (r = -0.24). Higher values of this methylation component correspond to lower gray matter density in superior frontal gyrus and higher density in middle temporal gyrus. Moreover, SZ patients showed significant gray matter reduction in superior frontal gyrus (p = 7.9 x 10^-5). The second methylation component consisted of CpG sites from two chromosome regions (Chr. 10 AS3MT and NT5C2 genes, and Chr. 12 ARL6IP4 and OGFOD2 genes), and was associated with caudate and thalamus regions. All analyses were controlled for age and gender. Although we did not find SZ specific methylation differences within SZ risk regions, our results suggest that DNA methylation patterns in saliva are associated with brain gray matter variation, and some of this variation is related to schizophrenia. The main limitations of this study include 1) the lack of replication data to verify the findings, and 2) the lack of direct saliva and brain tissue correspondence verification.

94

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS

Olga V. Matveeva1, Nafisa N. Nazipova2, Aleksey Y. Ogurtsov3, Svetlana A. Shabalina3

1Biopolymer Design LLC, Acton, MA; 2Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, Russia; 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD

Svetlana Shabalina

Many techniques of molecular biology involve interaction of specific oligonucleotides with DNA or RNA as a basic step. DNA targeting of single-guided (sg) RNAs for genome editing procedures, oligonucleotide array gene expression monitoring or anti-sense-mediated gene down-regulation, and the Genomic Comparison Hybridization (GCH) array experiments are examples of techniques involving RNA-DNA and DNA-DNA interactions. RNAi approaches with siRNA and shRNA molecules are based on RNA-RNA interactions. The main problem of any oligo-probe experiment is that the specific oligo-target interaction, based on a fully paired duplex, is usually combined with non-specific parallel reactions, where the oligo-probe could interact with many partially paired DNA or RNA sequences. The interplay between specific and genome-wide off-target interactions is poorly studied despite its crucial role in the efficacy of these techniques. In this study, we investigated the oligo-probe characteristics that are responsible for the interplay and that most improve the oligo-probe design. We defined specificity of interaction as a ratio between oligo-target specific and genome-wide off-target interactions. Microarray databases, derived from the GCH experiments using the Affymetrix platforms and containing two different types of probes, were used for the analysis based on the thermodynamic features and nucleotide sequences of oligo-probes. The first type of oligo-probe does not have a specific target on the genome, and its hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA, and their signals are derived from specific and cross-hybridization components combined together in a total signal. The analysis has revealed that hybridization specificity was negatively affected by low stability of the fully-paired oligo-target duplex, stable probe self-folding, G-rich content, including GGG motifs, and a low sequence Symmetrical Complexity (SC) score. The SC-score characterizes nucleotide composition symmetry and a probe's vulnerability to off-target interactions. Filtering out the probes with these characteristics significantly increases hybridization specificity by decreasing genome-wide cross-hybridization or by increasing specific interactions. Selected oligo-probes have three times higher hybridization specificity on average, compared to the probes that were filtered out from the analysis by applying suggested cut-off thresholds to the described parameters. Multiple regression models with the described parameters were successfully applied for predictions of interaction specificity and off-target effects and supported the parameter choice (P < 0.001). We also compared probe characteristics selected for the analysis in microarray databases with applicable features of siRNA/shRNA design from our earlier studies. We applied all selected oligonucleotide features and described parameters to new sets of sgRNAs. Our study examined the thermodynamics and sequence-intrinsic properties of sgRNA-DNA duplexes and analyzed additional selection criteria that are critical for guide efficacy. Finally, we identify universal features of oligo-probes, si/shRNAs and guides for optimal design, including the SC-score.

95

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

POSTER PRESENTATIONS

96

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT

Alyssa I. Clay1, Richard M. Weinshilboum2, K. Sreekumaran Nair3, Rima F. Kaddurah-Daouk4, Liewei Wang2, Matthew K. Breitenstein1

1Division of Epidemiology, Mayo Clinic; 2Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic; 3Division of Endocrinology, Mayo Clinic; 4Duke University

Matthew Breitenstein

Background: Metformin is one of the most widely prescribed drugs worldwide and a first-line treatment for type 2 diabetes mellitus (T2D). Metformin has many mechanisms of action, with varying levels of understanding. Metformin is being evaluated as a potential chemoprevention agent for cancer treatment, with inhibition of angiogenesis as one effect of metformin being strongly pursued. However, contradictory evidence exists for a potential mechanism of angiogenesis inhibition (Carcinogenesis 2014; (35)5). Building on our prior work that identified a stratum of statistically correlated metabolites, we aimed to identify overlapping metformin pharmacogenomic (PGx) SNP associations, using pharmacometabolomics-informed PGx paired with an agnostic computational approach.

Methods: To elucidate overlapping PGx signals of metformin exposure, we included metabolites (n=5) with correlated plasma concentration, adjusted for metformin exposure, in a biobank cohort-based, case-control study. Cases (n=274) were exposed to metformin monotherapy with T2D; healthy controls (n=274) had no known drug exposures. Cases and controls were matched by age and gender, and adjusted for BMI and batch. A panel of amino acid metabolite (n=42) concentrations was quantitatively measured using tandem liquid chromatography-mass spectrometry from fasting platelet-poor plasma samples collected in EDTA. Genotyping was performed using the 700k SNP Illumina OmniExpress array platform from 250 ng of DNA. Normalized metabolite concentrations were utilized as endpoints to inform genome-wide associations.

Results: Increased plasma metabolite concentrations for leucine (t=4.47, p<0.0001), isoleucine (t=4.63, p<0.0001), and valine (t=4.48, p<0.0001) were observed with exposure to metformin. Variant rs17023164 (MAF=0.31), in the Tryptophanyl tRNA Synthetase 2, Mitochondrial (WARS2) gene region of chromosome 1 and an eQTL for WARS2 in fibroblasts, was a common downward modifier of leucine (β=-11.69, p=1.79e-7), isoleucine (β=-6.99, p=2.40e-6), and valine (β=-14.55, p=1.04e-5) with metformin exposure. No SNPs in neighboring gene regions were in high LD (R^2>0.5) with rs17023164.

Conclusion: Increased plasma metabolite concentrations for leucine, valine, and isoleucine were observed with metformin exposure. A common variant, rs17023164 in WARS2, was identified as a strong downward modifier of these metabolites with metformin exposure. Independently, WARS2 is proposed as a determinant of angiogenesis (Nat Com 2016; (7)12061). We posit a hypothesis: modification of metabolite biomarker concentration associated with metformin exposure by WARS2 variants is a potential link between metformin and angiogenesis. Functional characterization of a potential mechanism for metformin inhibition of angiogenesis, modified by WARS2, is ongoing.

97

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS

Stephen V. Gliske1, Katy L. Lau1, Benjamin H. Brinkman2, Greg A. Worrell2, Cris G. Fink3, William C. Stacey1

1University of Michigan, 2Mayo Clinic, 3Ohio Wesleyan University

Stephen Gliske

Automated event detection is the result of many types of data-driven pattern recognition methods. One of the general challenges to these analyses is the quantification of and correction for false negative detections, i.e., cases where the event (pattern) is present in the data but was not detected. Estimating the false positive rate is much easier, as human review of a subsample of detected events is sufficient. However, determining the false negative rate by human review would require manually searching through the raw data, which is impractical, if not completely infeasible. This challenge is not unique to biomedical data and is commonly addressed in high energy physics. The approach is called embedding. It is applicable to any analysis where at least one of the signal or background can be modeled well by simulations. By placing specific events at known locations, one can then run the automated detector and report the fraction of embedded events that were detected. We present the first application of embedding to neurological data, specifically the automated detection of a biomarker of epilepsy (high frequency oscillations) recorded in intracranial electroencephalogram (EEG) data. The false negative rate is found to be consistent across both recording channels and patients.
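The embedding idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the Gaussian background, the sinusoidal burst standing in for a simulated event, the toy windowed-RMS detector, and all function names and parameter values. It only shows the bookkeeping: place events at known locations, run a detector, and report the fraction of embedded events that were missed.

```python
import math
import random


def embed_events(background, starts, length, amplitude, freq, fs):
    """Return a copy of `background` with a sinusoidal burst added at each start index."""
    signal = list(background)
    for s in starts:
        for k in range(length):
            signal[s + k] += amplitude * math.sin(2 * math.pi * freq * k / fs)
    return signal


def detect(signal, window, threshold):
    """Toy detector (hypothetical): start indices of non-overlapping windows whose RMS exceeds threshold."""
    hits = []
    for start in range(0, len(signal) - window + 1, window):
        segment = signal[start:start + window]
        rms = math.sqrt(sum(x * x for x in segment) / window)
        if rms > threshold:
            hits.append(start)
    return hits


def false_negative_rate(starts, length, hits, window):
    """Fraction of embedded events that no detection window overlaps."""
    missed = 0
    for s in starts:
        overlapped = any(h < s + length and s < h + window for h in hits)
        if not overlapped:
            missed += 1
    return missed / len(starts)


random.seed(0)
fs = 1000  # assumed sampling rate in Hz
background = [random.gauss(0.0, 1.0) for _ in range(20000)]
starts = list(range(1000, 19000, 2000))  # known embedding locations

# Strong simulated events: the toy detector should recover all of them.
strong = embed_events(background, starts, length=100, amplitude=10.0, freq=100.0, fs=fs)
fnr_strong = false_negative_rate(starts, 100, detect(strong, 100, 3.0), 100)

# Zero-amplitude "events": nothing is added, so every embedded event is missed.
null = embed_events(background, starts, length=100, amplitude=0.0, freq=100.0, fs=fs)
fnr_null = false_negative_rate(starts, 100, detect(null, 100, 3.0), 100)

print(f"FNR (strong events): {fnr_strong:.2f}")
print(f"FNR (zero-amplitude events): {fnr_null:.2f}")
```

In practice the embedded events would come from a validated simulation of the biomarker and the detector would be the real analysis pipeline; sweeping the simulated amplitude between the two extremes above would trace out how the false negative rate depends on event strength.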

98

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS

Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

Stanford University

Anshul Kundaje

We present generalizable and interpretable supervised deep learning frameworks to predict the regulatory and epigenetic state of putative functional genomic elements by integrating raw DNA sequence with diverse chromatin assays such as ATAC-seq, DNase-seq or MNase-seq. First, we develop novel multi-channel, multi-modal CNNs that integrate DNA sequence and chromatin accessibility profiles (DNase-seq or ATAC-seq) to predict in-vivo binding sites of a diverse set of transcription factors (TF) across cell types with high accuracy. Our integrative models provide significant improvements over other state-of-the-art methods, including recently published deep learning TF binding models. Next, we train multi-task, multi-modal deep CNNs to simultaneously predict multiple histone modifications and combinatorial chromatin state at regulatory elements by integrating DNA sequence, RNA-seq and ATAC-seq or a combination of DNase-seq and MNase-seq. Our models achieve high prediction accuracy even across cell types, revealing a fundamental predictive relationship between chromatin architecture and histone modifications. Finally, we develop DeepLIFT (Deep Linear Importance Feature Tracker), a novel interpretation engine for extracting predictive and biologically meaningful patterns from deep neural networks (DNNs) for diverse genomic data types. DeepLIFT can integrate the combined effects of multiple cooperating filters and compute importance scores accounting for redundant patterns. We apply DeepLIFT on our models to obtain unified TF sequence affinity models, infer high resolution point binding events of TFs, dissect regulatory sequence grammars involving homodimeric and heterodimeric binding with co-factors, learn predictive chromatin architectural features and unravel the sequence and architectural heterogeneity of regulatory elements.

99

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS

Modest von Korff, Tobias Fink, Thomas Sander

Actelion Pharmaceuticals Ltd., Allschwil, Switzerland

Modest von Korff

The relations between genes and diseases form complex patterns. Visualization of these patterns enables the scientist to obtain an overview of the most important gene–disease relations. These gene–disease relations are of high importance in drug discovery. Proteins encoded by disease-related genes are potential targets for new drugs or may become biomarkers for disease diagnosis. Both a novel drug target and a biomarker should be highly specific for the targeted disease. In our publication for this conference, we introduce a relevance estimator. This relevance estimator is a measure of the specificity of a gene–disease relationship that also takes into consideration all other known gene–disease relationships. We analyzed gene–disease relationships from 22 million PubMed records and obtained a matrix with relevance estimators for about 5000 diseases and 15,000 genes. This relevance matrix enabled us to express the similarity between diseases with simple vector-based distance measures. A meaningful disease–gene–disease visualization, consisting of several layers, was derived from these disease–disease similarity measures and the relevance estimators. The multidimensional visualizations presented here give an overview of complex diseases like asthma, Alzheimer's disease and hypertension.
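As a rough illustration of what a "simple vector-based distance measure" on a relevance matrix could look like, the Python sketch below compares disease rows with cosine similarity. The abstract does not say which measure was used or how the relevance estimator is computed, so the choice of cosine similarity, the placeholder disease and gene names, and all numeric values are assumptions made up for the example.

```python
import math

# Hypothetical relevance matrix: one row per disease, one column per gene.
# The names and values are invented placeholders, not data from the study.
genes = ["GENE_A", "GENE_B", "GENE_C"]
relevance = {
    "disease_1": [0.9, 0.7, 0.0],
    "disease_2": [0.6, 0.8, 0.0],
    "disease_3": [0.0, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two relevance vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(disease, matrix):
    """Return the other diseases sorted from most to least similar to `disease`."""
    scores = [
        (cosine_similarity(matrix[disease], row), other)
        for other, row in matrix.items()
        if other != disease
    ]
    return sorted(scores, reverse=True)


sim_12 = cosine_similarity(relevance["disease_1"], relevance["disease_2"])
sim_13 = cosine_similarity(relevance["disease_1"], relevance["disease_3"])
neighbours = most_similar("disease_1", relevance)

for score, other in neighbours:
    print(f"disease_1 vs {other}: {score:.3f}")
```

Diseases whose relevance is concentrated on the same genes end up close together, which is the kind of disease–disease similarity a layered visualization could then be built on.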

100

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

POSTER PRESENTATIONS

101

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION

Steven E. Brenner1, Gaia Andreoletti1, Roger A Hoskins1, John Moult2, CAGI Participants

1University of California, Berkeley; 2IBBR, University of Maryland, Rockville, MD

Steven Brenner

The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors.

The fourth CAGI experiment concluded this year. It included 11 challenges which reflected: non-synonymous variants and their biochemical impact measured by targeted assays; noncoding regulatory variants and their impact on gene expression; research exomes for prediction of complex traits; personal genomes and trait profiles; and clinical sequences and associated referring indications.

There were notable discoveries throughout the CAGI experiment, and general themes emerged. The independent assessment found that top missense prediction methods are highly statistically significant, but individual variant accuracy is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect the predictive methods and the assays themselves). However, there might be potential for missense interpretation at the extreme of the distribution. Structure-based missense methods excel in a few cases, while evolutionary-based methods have more consistent performance. Bespoke approaches often enhance performance.

On the clinical studies, predictors were able to identify causal variants that were overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often provides undue confidence in their correlation; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis.

The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense. Beyond this, creating a genetic study that provides a reliable gold standard is remarkably difficult. However, there were notable improvements in the ability to match genomes to trait profiles.

Complete information about CAGI may be found at https://genomeinterpretation.org.

102

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C19

Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2

1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Andrea Gaedigk

Background: CYP2C9 and CYP2C19 are highly polymorphic pharmacogenes metabolizing numerous drugs. Both are genes with CPIC guidelines, underscoring their clinical relevance. To facilitate haplotype calling and translation into phenotype, we have developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), that enables automated CYP2D6 diplotype calling from whole genome sequencing. We report here the extension of Astrolabe to CYP2C9 and CYP2C19.

Methods: The study was approved by the Institutional Review Board of Children's Mercy and included 85 subjects (7 HapMap; 78 patients/parents). Allele definitions are according to the P450 Nomenclature Database (cypalleles.ki.se/) with some modifications. Exons and 100 bp of flanking introns were used for Astrolabe calls, as well as -2990 to -440 of CYP2C9 and -1063 to -180 of CYP2C19, harboring SNPs defining CYP2C9*8 and CYP2C19*27, respectively. All but 3 subjects were genotyped for CYP2C9*2, *3, *5 and *8 and CYP2C19*2-*4, *17, *27 and *35 using TaqMan assays to validate Astrolabe calls. WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve variation call quality. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/

Results: To maximize Astrolabe call accuracy, intron regions were adjusted to include informative SNPs while excluding those that occur on numerous haplotypes and/or are not part of a defined allele. The CYP2C9 exon 1 region, e.g., was limited to 57 bp of intron 1 to exclude 251T>C, which is present in 1155/3540 subjects (CMH variant warehouse database). This SNP defines CYP2C*29, but interfered with Astrolabe calls by overcalling CYP2C*29 in the absence of its key SNP (33437C>A). Optimized calling target regions were then used to compare Astrolabe with genotype calls. Astrolabe correctly called 68/75 (90.67%) and 71/75 (94.67%) of subjects for CYP2C9 and CYP2C19, respectively. Among the alleles detected by Astrolabe and genotyping were CYP2C9*2, *3 and *8 and CYP2C19*2, *17, *27 and *35. Astrolabe also identified subjects carrying the rare CYP2C9*9 and *11 and CYP2C19*15 alleles, which were not covered by genotyping. Astrolabe correctly called 1077/1128 simulated CYP2C19 diplotypes (95% recall; 45 missed and 6 multiple calls). All missed calls were *12 called as *1. For CYP2C9, Astrolabe correctly called 2186/2278 simulated diplotypes (95% recall; 61 missed and 31 multiple calls). All missed calls were *25 called as *1.

Discussion: Astrolabe's functionality was successfully expanded to CYP2C9 and CYP2C19. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued improvement and expansion of the nomenclature definitions will allow us to resolve the miscalled haplotypes represented in the simulation set and improve Astrolabe calling across all diplotypes.
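The call counts reported in the Results can be cross-checked with a few lines of arithmetic. The Python sketch below is only a bookkeeping aid: the counts are copied from the abstract, while the dictionary layout and function names are illustrative assumptions and have nothing to do with how Astrolabe itself is implemented.

```python
# Counts as reported in the abstract for the simulated diplotypes.
simulated = {
    "CYP2C19": {"correct": 1077, "missed": 45, "multiple": 6, "total": 1128},
    "CYP2C9": {"correct": 2186, "missed": 61, "multiple": 31, "total": 2278},
}

# (correct calls, subjects compared) for the TaqMan-genotyped subjects.
genotyped = {"CYP2C9": (68, 75), "CYP2C19": (71, 75)}


def recall(correct, total):
    """Fraction of diplotypes (or subjects) that were called correctly."""
    return correct / total


def counts_consistent(counts):
    """True if correct + missed + multiple calls add up to the reported total."""
    return counts["correct"] + counts["missed"] + counts["multiple"] == counts["total"]


for gene, counts in simulated.items():
    r = recall(counts["correct"], counts["total"])
    print(f"{gene} simulated: recall = {r:.4f}, counts consistent = {counts_consistent(counts)}")

for gene, (correct, total) in genotyped.items():
    print(f"{gene} genotyped: {100 * recall(correct, total):.2f}% concordant")
```

Running it confirms that the correct, missed and multiple-call counts add up to the stated totals for both genes, and reproduces the 90.67% and 94.67% concordance figures for the genotyped subjects.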

103

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

Baylor College of Medicine

Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this "mutational delocalization" of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.

104

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA

Rachel Goldfeder, Euan Ashley

Stanford University

Rachel Goldfeder

Clinical-grade genome sequencing and interpretation require accurate and complete genotype calls across the entire genome. While single nucleotide variant detection is highly accurate and consistent, these variants explain only a small fraction of disease risk. Other types of variation that disrupt the open reading frame, such as insertions and deletions (INDELs), are more likely to be harmful. However, current methods have low sensitivity for larger (>= five bases) INDELs, primarily due to challenges surrounding aligning sequence reads that span INDELs. We present Scotch, a novel INDEL detection method that leverages signatures of poor read alignment, read depth information, and machine learning approaches to accurately identify INDELs from next-generation DNA sequencing data. Using biologically realistic simulated genomes and sequence reads with technologically representative error profiles (generated by ART), we evaluate Scotch and several currently available INDEL callers. We show that Scotch has higher sensitivity than current methods, particularly for larger INDELs. Finally, we validate INDELs that Scotch discovered in one individual, NA12878, and show that Scotch has high positive predictive value. This method will enable researchers and clinicians to more accurately identify INDELs associated with previously unexplained genetic conditions.

105

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE

Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

Mayo Clinic

Iain Horton

The Mayo Clinic Genomic Data Warehouse has established the infrastructure foundation, processes, and applications to meet the translational needs of the Mayo Clinic Center for Individualized Medicine (CIM). Through the streamlined and automated data pipeline, the next-gen sequencing (NGS) results are loaded and integrated with clinical data, providing the foundation for the development of revolutionary solutions and discovery in the clinical practice and genomic research. Initiated in 2012, with production data ingestion beginning in early 2014, Mayo Clinic's Translational Research Center (TRC) has provided the cornerstone platform for data-centric activities within CIM. Data generated from both the clinical pipeline and research pipeline are automatically loaded into TRC, with each new bit adding value and power to the system. Two key solutions with significant potential of impacting patient care and scientific discovery have been built on this genomic data warehouse. First is the Molecular Decision Support system, a rule-based pharmacogenomics system that enables Mayo Clinic clinicians to integrate actionable information based on a patient's genotype information at the point of care using NGS data. Second is the Mayo Variant Summary application, a cloud-native system which empowers Mayo Clinic researchers to identify rare and actionable genomic variants through dynamic filtering and grouping of subject phenotype and specimen metadata.

106

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT)

T.E. Klein1, M. Whirl-Carrillo1, R.M. Whaley1, M. Woon1, K. Sangkuhl1, Lester G. Carter1, H.M. Dunnenberger2, P.E. Empey3, A.T. Frase4, R.R. Freimuth5, A. Gaedigk6, A. Gordon7, C. Haidar8, J.K. Hicks9, J.M. Hoffman8, M.T. Lee10, N. Miller11, S.D. Mooney12, T.N. Person13, J.F. Peterson14, M.V. Relling8, S.A. Scott15, G. Twist11, A. Verma13, M.S. Williams10, C. Wu16, W. Yang8, M.D. Ritchie4,13

1Dept Genetics, Stanford Univ, Stanford, CA; 2Center for Molecular Medicine, NorthShore University HealthSystem, Evanston IL; 3Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh; 4Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA; 5Department of Health Sciences Research, Mayo Clinic, Rochester MN; 6Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children's Mercy-Kansas City, Kansas City, MO; 7Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA; 8St. Jude Children's Research Hospital, Memphis, TN; 9DeBartolo Family Personalized Medicine Institute, H. Lee Moffitt Cancer Center, Tampa, FL; 10Genomic Medicine Institute, Geisinger Health System, Danville, PA; 11Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City, MO; 12Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; 13Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 14Vanderbilt University Medical Center, Nashville, TN; 15Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; 16Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA

Teri Klein

Pharmacogenomics (PGx) decision support and return of results is an active area of genomic medicine implementation at many health care organizations and academic medical centers. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has established guidelines surrounding gene-drug pairs that can and should lead to prescribing modifications based on genetic variant(s). One of the challenges in implementing PGx is extracting genomic variants and assigning haplotypes (including star-alleles) from genetic data derived from sequencing and genotyping technologies in order to apply the prescribing recommendations of CPIC guidelines. In a collaboration between the PGRN Statistical Analysis Resource (P-STAR), The Pharmacogenomics Knowledgebase (PharmGKB), the Clinical Genome Resource (ClinGen), and CPIC, we are developing a software tool to extract all variants from CPIC level-A genes, with the exception of G6PD and HLA, from a genetic dataset resulting from sequencing or genotyping technologies (represented as a .vcf), interpret the variant alleles, infer diplotypes, and generate an interpretation report based on CPIC guidelines. The CPIC pipeline report can then be used to inform prescribing decisions. We assembled a focus group of thought leaders in PGx to brainstorm the issues and to design the software pipeline. We hosted a one-week Hackathon at the PharmGKB at Stanford University to bring together computer programmers with scientific curators to implement the first version of this tool. Through this process, we have uncovered many of the challenges surrounding PGx implementation. For example, the inference of diplotypes is challenging for several CPIC level-A genes. This software pipeline will be made available under the Mozilla Public License (MPL 2.0) and disseminated on GitHub for the scientific and clinical community to test, explore, and improve. PharmCAT will give sites implementing PGx a way to more consistently interpret genomic results and link those results to published clinical guidelines. Furthermore, we are assembling (and will be maintaining) the translation tables that underlie the tool, which will significantly reduce the effort required to implement PGx clinically and ensure more uniform interpretations of PGx knowledge. As precision medicine continues to move into clinical practice, implementation workflows for PGx, like PharmCAT, would enable standardized and consistent implementation of PGx genes.

107

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA

Sarathbabu Krishnamurthy1, Diane Smelser1, Manickam Kandamurugu1, Joseph Leader1, Noura S. Abul-Husn2, Alan R. Shuldiner2, David H. Ledbetter1, Frederick E. Dewey2, David J. Carey1, Michael F. Murray1, Raghu P.R. Metpally1

1Geisinger Health System; 2Regeneron Genetics Center

Sarathbabu Krishnamurthy

BACKGROUND: Highly penetrant autosomal dominant familial hypercholesterolemia (FH) is known to be caused by pathogenic loss of function (LOF) variants in LDLR and gain of function variants in the PCSK9 and APOB genes. In addition to the causative role of PCSK9 in FH, PCSK9 LOF variants are associated with lowering of serum low density lipoprotein cholesterol (LDL-C) and total cholesterol. The aims of this study were to (1) identify rare novel PCSK9 gene variants that lead to complete or partial loss of protein function in the DiscovEHR cohort; (2) explore the prevalence of PCSK9 LOF variants in a subset of FH patients; and (3) examine whether FH patients carrying PCSK9 LOFs show an association with lower plasma LDL-C and cardiovascular risk.

METHODS: We analyzed whole exome sequences from 51,289 individuals in the DiscovEHR cohort, who consented to participate in the Geisinger Health System's MyCode Community Health Initiative. Rare missense and predicted loss of function (pLOF) coding variants in PCSK9 were identified by integrating bioinformatics and evaluating LDL-C and total cholesterol measures from the electronic health records (EHR).

RESULTS: In the overall DiscovEHR cohort, we identified 20 missense and 13 pLOF (2 splice donor, 6 stop gained and 5 frameshift) rare variants in PCSK9, including 15 novel variants that were associated with lower LDL-C and total cholesterol levels. LDL-C in pLOF carriers was significantly lower than in missense carriers with presumed partial loss of function (p<0.0012). Patients with PCSK9 rare missense variants with presumed partial LOF, or LOF variants, had a significant reduction in the incidence of coronary events compared to the control group (p<0.0001). In FH patients, the LDL-lowering PCSK9 R46L variant, previously reported at 3% prevalence, was found to be enriched at 9.6% and was associated with lower LDL-C compared to FH patients not carrying an R46L allele. A novel PCSK9 missense variant (G316S) was also present in FH patients with a prevalence of 0.8% and also showed an LDL-lowering phenotypic effect in an imputed family pedigree.

CONCLUSIONS: Overall, 11.8% of the FH patients in the DiscovEHR cohort were identified to also carry a PCSK9 variant which modulates their LDL-C and serum cholesterol levels.

108

INTEGRATIVENETWORKANALYSISOFPROSTATETISSUELINCRNA-MRNAEXPRESSIONPROFILESREVEALSPOTENTIALREGULATORYMECHANISMSOF

PROSTATECANCERRISKLOCI

NicholasB.Larson1,ShannonMcDonnell1,ZachFogarty1,MelissaLarson1,JohnCheville2,ShaunRiska1,SaurabhBaheti1,AshaA.Nair1,DanielO’Brien1,JaimeDavila1,DanielSchaid1,StephenN.

Thibodeau21DepartmentofHealthSciencesResearch,MayoClinic,Rochester,MN;2Departmentof

LaboratoryMedicineandPathology,MayoClinic,Rochester,MN

Nicholas Larson

Large-scale genome-wide association studies have identified 146 loci associated with risk of developing prostate cancer (PRCA). However, most of these loci do not lie in close proximity to protein coding genes and are presumed to be regulatory in nature. Downstream regulation of protein coding genes related to PRCA development may be mediated by cis-acting regulation of nearby transcripts, also known as cis-mediated trans-eQTLs. This cis-mediator causal relationship comprises a regulatory variant, a nearby cis-regulated gene, and the downstream regulated trans target gene. Cis-mediators may include transcription factors, signaling proteins, and long intergenic non-coding RNAs (lincRNAs). LincRNAs correspond to a host of regulatory functions such as chromatin remodeling and transcriptional co-activation, and have previously been identified as diagnostic and prognostic biomarkers for a number of cancers. However, their role in cancer development and progression is poorly understood. To explore the hypothesis that cis-mediated trans-eQTLs may play a role in PRCA risk, we leveraged an eQTL dataset of 471 samples of normal prostate tissue from prostate/bladder cancer patients with available RNA-Seq and imputed Illumina Infinium 2.5M genotype data. We first conducted an initial transcriptome-wide eQTL screening of all lincRNAs and mRNAs with 8,073 SNPs in high linkage disequilibrium (r² > 0.5) with previously identified PRCA risk-associated variants, identifying approximately 5000 transcripts (FDR < 0.10) to be putatively associated (cis or trans). We then constructed an undirected Gaussian graphical regulatory network from the expression profiles of this transcript subset, identifying 87,468 connections. To identify candidate cis-mediator node-pairs in the expression network, we isolated a subset of cis-associated transcripts (lincRNA or mRNA) at a strict Bonferroni significance threshold. We then identified all mRNA nodes connected to these cis-nodes that were distal to the cis-variant (>1 Mb) and had evidence of a trans-association with the cis-variant (P < 1E-04), resulting in 9 candidate cis-mediator trios. Finally, we applied causal mediation analysis to test the proportion of the trans-association that is mediated by the cis-regulated transcript, resulting in 7/9 significant cis-mediator relationships. Transcription factor HNF1B was identified to be a significant mediator in the trans-associations between rs11263762 and three mRNAs: SRC, MIA2, and SEMA6A. All three exhibited concomitant upregulation with HNF1B. Notably, HNF1A has been shown to stimulate SRC expression via an alternative promoter, while MIA2 is also a known HNF1A target. Dysregulation of SEMA6A has been observed in PRCA metastases and plays a potential role in angiogenesis interacting with VEGFR2. MSMB and NDRG1 both demonstrate androgen-stimulated expression in prostate tissue, and indicated a recessive pattern of expression dysregulation with rs10993994. Despite a small sample size, we replicated multiple trans-eQTLs from these cis-mediator trios in the GTEx prostate tissue eQTL dataset (P < 0.05). Together, our findings suggest dysregulation of RNA expression may play a role in genetic predisposition to PRCA.
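The idea of testing what proportion of a trans-association is mediated by a cis-regulated transcript can be illustrated with a minimal product-of-coefficients sketch. This is not the authors' pipeline: the abstract does not name the estimator or software used, and the function name `mediation_effects` and the toy data below are assumptions made purely for illustration. The sketch fits ordinary least squares by hand and reports the total, direct and indirect effects.

```python
from statistics import mean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_effects(genotype, cis_expr, trans_expr):
    """Product-of-coefficients mediation using ordinary least squares.

    genotype   : SNP dosages (0, 1, 2), the regulatory variant X
    cis_expr   : expression of the cis-regulated transcript, the mediator M
    trans_expr : expression of the distal target gene, the outcome Y
    """
    sxx = _cov(genotype, genotype)
    smm = _cov(cis_expr, cis_expr)
    sxm = _cov(genotype, cis_expr)
    sxy = _cov(genotype, trans_expr)
    smy = _cov(cis_expr, trans_expr)

    a = sxm / sxx                            # slope of M ~ X (cis effect)
    total = sxy / sxx                        # slope of Y ~ X (trans effect)
    det = sxx * smm - sxm ** 2               # determinant of the normal equations
    b = (sxx * smy - sxm * sxy) / det        # coefficient of M in Y ~ X + M
    direct = (smm * sxy - sxm * smy) / det   # coefficient of X in Y ~ X + M
    indirect = a * b
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "proportion_mediated": indirect / total,
    }


# Hypothetical toy data: the target tracks the mediator exactly, so the
# trans-association should be (numerically) fully mediated.
dosage = [0, 1, 2, 0, 1, 2, 1, 0]
mediator = [0.1, 1.2, 1.9, -0.2, 0.8, 2.3, 1.1, 0.3]
target = [2 * m for m in mediator]
print(mediation_effects(dosage, mediator, target))
```

With least squares the total effect always equals the direct plus the indirect effect, which is why the ratio `indirect / total` can be read as a proportion. A real analysis would add covariates and a significance test (for example by bootstrapping), neither of which is shown here.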

109

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES

Jason E. McDermott1, Tao Liu1, Samuel Payne1, Vladislav Petyuk1, Richard Smith1, Philipp Mertins2, Steven Carr2, Karin Rodland1

1Pacific Northwest National Laboratory, 2Broad Institute

Jason McDermott

As part of the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we have recently published the first large-scale proteomic and phosphoproteomic analysis of high-grade serous ovarian tumors. We observed that phosphorylation status was an excellent indicator of pathway activity and could discriminate between patient survival times. In the current work we have combined this data with comparable data from breast cancer tumors and cancer cell lines treated with kinase inhibitors, to answer several fundamental questions about the role of phosphorylation in cellular processes and cancer. The total dataset comprised over 150 samples with very deep proteomic coverage (>20,000 phosphopeptides confidently identified). We first found that the correlation between kinase protein abundance and abundance of phosphorylated target peptides was very low, indicating that kinase abundance is not a good proxy for phosphorylation status overall. However, highly correlated kinase-substrate pairs were significantly more likely to be true relationships (from existing knowledge), demonstrating that this method could be used to predict novel kinase targets in some cases. We used this analysis to identify several novel kinase-substrate relationships that were differential between tumor subtypes, and that correlated with pathways where phosphorylation was affected by drug treatment. These relationships are currently under investigation as potential novel targets for therapeutic intervention. To better analyze cancer-relevant pathway activity we developed a novel approach that characterizes correlation, differential abundance, and statistical interactions between components to analyze multiple omics types in the context of signaling and functional pathways. We used this approach, called the Layered Enrichment Analysis of Pathways (LEAP), to identify active pathways in molecular subtypes of ovarian and breast cancer, and several novel subpopulations of patients displaying uniquely dysregulated pathways. Our results show that integration of multiple omics types has great potential for developing novel therapeutic approaches for personalized medicine.

110

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS

Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

The Donnelly Centre, University of Toronto

Shraddha Pai

Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general-purpose and clinically relevant prediction algorithm should be accurate and generalizable, integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), handle sparse data, and be intuitive to interpret. We describe netDx, a supervised patient classification framework based on patient similarity networks, that meets the above criteria (Ref 1). netDx models input data as patient networks, and uses network integration and machine learning for feature selection. We demonstrate the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumours as being of the Luminal A subtype (N=348 tumours; Ref 2). Using gene expression data, netDx performed as well as or better than established state-of-the-art machine learning methods, achieving a mean accuracy of 89% (2% s.d.) in classifying Luminal A. In the second application, we predict case/control status in autism spectrum disorders based on the occurrence of rare copy number deletions in metabolic pathways (N=3,291 patients; Ref 3); this predictor achieved better performance than previously published methods. netDx uses pathway features to aid biological interpretability, and results can be visualized as an integrated patient similarity network to aid clinical interpretation. Upon publication, netDx software will be made publicly available via GitHub; the software provides worked examples and easy-to-use functions for design of custom predictor workflows. More at http://netdx.org

References:
1. netDx preprint: http://dx.doi.org/10.1101/084418
2. The Cancer Genome Atlas (2012) Nature 490:61.
3. Pinto et al. (2014). Am J Hum Gen. 94(5):677.

111

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES

Hyun-Tae Shin1,2, Jae Won Yun1,2, Nayoung K. D. Kim1, Yoon-La Choi2,3, Woong-Yang Park1,2,4, Peter J. Park5

1Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; 2Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University, Seoul, Korea; 3Department of Pathology & Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; 4Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Seoul, Korea; 5Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Hyun-Tae Shin

Clinical application of sequencing-based assays requires high sensitivity and specificity for detecting genomic alterations. Our analysis of more than 5000 cancer samples reveals that a significant fraction of clinically-actionable somatic variants may have low variant allele fractions (VAF), indicating the importance of very high coverage sequencing for these patients. As a case study, we describe refractory cancer patients with clinical response to therapies that target low VAF alterations.
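Why low-VAF variants call for very high coverage can be made concrete with a back-of-the-envelope binomial model. The sketch below is an illustration only, not the authors' method: it assumes each read independently carries the variant with probability equal to the VAF, ignores sequencing error, and uses a hypothetical calling rule of "at least `min_alt_reads` supporting reads". The function names and thresholds are assumptions.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf ** k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for_sensitivity(vaf, min_alt_reads, target=0.95, max_depth=100_000):
    """Smallest depth whose detection probability reaches the target, else None."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None


# Hypothetical settings: require 5 supporting reads and 95% sensitivity.
for vaf in (0.20, 0.05, 0.02):
    print(vaf, min_depth_for_sensitivity(vaf, min_alt_reads=5))
```

Under these assumptions the required depth grows roughly in inverse proportion to the VAF, which is the qualitative point of the abstract; real assays also have to budget for sequencing error and uneven coverage.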

112

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS

Jeffrey A. Thompson1, Carmen J. Marsit2

1Dartmouth College, 2Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which creates a model of methylation dysregulation and its effect on gene expression and then combines this molecular information with clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and with a high level of performance. Over 100 random splits of the data into training and testing sets, our model had the highest median C-index of any method we tried, at 0.792. Furthermore, we demonstrated that our molecular risk predictor is independent of clinical covariates and that the combined model results in statistically significantly higher accuracy than either data type alone. Additionally, the proposed process of data integration itself captures relationships in the data that represent highly disease-relevant functions. The gene signature we identify for clear cell renal cell carcinoma prognosis is enriched for genes that are central nodes in a protein-protein interaction network associated with the JAK-STAT signaling cascade, which itself is a known factor in kidney cancer progression. Our signature is also enriched for genes in pathways involved in immune response, which are increasingly targeted by novel cancer therapies. We call this model the methylation-to-expression feature model (M2EFM). Although one of the other approaches we considered also resulted in a highly accurate model, M2EFM performed better with a far more parsimonious model that sheds light on the potential relationship between abnormal gene regulation and cancer prognosis. Given our results, we think that further development of this approach is warranted.
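For readers unfamiliar with the C-index used as the performance measure here, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook formulation, not code from M2EFM; the function name and the toy values are hypothetical. A pair of patients is comparable when the one with the shorter time had an observed event, and it is concordant when that patient also has the higher risk score.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times       : follow-up time per patient
    events      : 1 if the event was observed, 0 if censored
    risk_scores : higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, one censored.
print(concordance_index([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a median of 0.792 over repeated train/test splits indicates substantially better-than-chance ordering of patients by risk.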

113

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE

Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2

1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Greyson Twist

Background: To facilitate haplotype calling and translation into phenotype, we have previously developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), enabling automated CYP2D6 diplotype calling from whole genome sequencing. We have implemented a series of improvements to increase call accuracy as well as ease of use.

Methods: The study was approved by the Institutional Review Board of Children’s Mercy Kansas City and included a total of 85 subjects (7 HapMap; 78 patients/parents). WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve the quality of variation calls. The Astrolabe CYP2D6 allele definition table was expanded to include a) additional variants available through the P450 Nomenclature Database; b) variants characterized by our laboratory, but not available through the Nomenclature Database; c) resequencing of some alleles (e.g. *10, *17) for which only exons are annotated by the Nomenclature Database. Programming errors in the scoring algorithm were repaired and unit tested, and support for a broad range of variant file input types was added (vcf, gvcf, tabix, .gz). Improvements also include versioning of the Astrolabe tool and the nomenclature data from which calls are generated. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/.

Results: To maximize Astrolabe call accuracy, we removed CYP2D6*1E, *3B, *4A-L, *4N, *6D, *10C-D, and *45B from the call set, because of incomplete allele definitions (based on exons only), or SNP(s) that are not unique to an allele. For example, 1749A>G is part of the CYP2D6*3B and *103 definitions, but also appears to be present on some *1 subvariants. Likewise, 3288A>G is not limited to CYP2D6*6D as implied by the nomenclature database, thus causing erroneous Astrolabe calls. Calls with our revised definitions were compared with those obtained by genotyping. Astrolabe also accurately identified subjects with copy number variations including the CYP2D6*5 deletion (n=5) and gene duplications (n=2). Also, increased variant calling accuracy of the DRAGEN pipeline improved the calling of several samples (n=). Astrolabe correctly called 7731/8128 simulated diplotypes (95% recall; 133 missed and 264 multiple calls). Of the missed calls, 124 were due to *38 called as *1.

Discussion: The series of improvements to Astrolabe increased call accuracy and minimized the number of no-calls. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued refinement of existing allele definitions and the inclusion of novel haplotype definitions will further improve the Astrolabe tool. We are currently applying Astrolabe to other NGS datasets including exomes and targeted NGS panels.
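The simulation counts reported in the Results can be checked with a few lines of arithmetic. The sketch below only re-derives numbers stated in the abstract. The one inference it adds is flagged as an assumption: 8128 happens to equal the number of unordered haplotype pairs (homozygous pairs included) that can be formed from 127 haplotypes, which would be consistent with "all possible diplotype combinations", but the abstract does not state how many haplotypes were simulated.

```python
from itertools import combinations_with_replacement


def count_diplotypes(n_haplotypes):
    """Number of unordered haplotype pairs, homozygous pairs included."""
    return sum(1 for _ in combinations_with_replacement(range(n_haplotypes), 2))


# Counts as reported in the abstract.
total, correct, missed, multiple = 8128, 7731, 133, 264

recall = correct / total
print(f"recall = {recall:.4f}")                     # fraction of diplotypes called correctly
print(correct + missed + multiple == total)         # the three outcomes partition the set
print(f"*38 share of missed = {124 / missed:.3f}")  # share of misses from *38 called as *1

# Assumption, not stated in the abstract: 127 simulated haplotypes.
print(count_diplotypes(127) == total)
```

The counts are internally consistent: correct, missed and multiple calls sum to the total, and the recall rounds to the reported 95%.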

114

INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE

David S. Wishart1, Ana Marcu1, An Chi Guo1, Ash Anwar2, Solveig Johannessen3, Craig Knox4, Michael Wilson4, Christoph H. Borchers5, Pieter Cullis6, Robert Fraser2

1University of Alberta, 2Molecular You Inc., 3Educe Design Inc., 4OMx Inc., 5University of Victoria, 6University of British Columbia

David Wishart

The goal of precision medicine is to use advanced multi-omic technologies to improve the accuracy of medical diagnoses and enhance the individualization of medical treatment. The fundamental challenge in precision medicine is not in the measurement or collection of multi-omic data but in its delivery. In particular, the integration, interpretation and display of multi-omic data has proven to be problematic. Here we describe some of our experiences in tackling this problem and outline a number of important findings that we believe are worth sharing. Our most important finding was the need to use high quality, quantitative ‘omics data. Measuring absolutely quantitative ‘omics data ensures greater reproducibility and permits direct comparisons to well-established clinical reference values. Several ‘omics laboratories offering quantitative services have been identified and these are described here. Second, we discovered that custom databases containing biomarker-disease data are essential. Very few of these kinds of databases exist, but they are necessary for the comparison and full integration of multi-omic data. In particular, they provide the information needed to integrate multi-omic measures and to determine disease risk. A brief description of a few of these biomarker-disease databases is provided. Third, we discovered that color-coded graphs, which are hyperlinked to detailed textual explanations, are necessary for the facile interpretation of the multi-omic data, both by patients and physicians. An example of a well-designed, web-enabled “dashboard” is shown to highlight these findings. Finally, we found that comprehensive databases of actionable responses must be prepared so that detailed, customizable medical, lifestyle, diet or pharmacological guidance can be provided to treat or prevent conditions detected by these multi-omic measurements. Examples of several omics-derived, actionable responses are provided to clarify this point. These findings, along with several associated software tools and databases, have recently been integrated into an automatic workflow that allows a wide range of multi-omic measurements to be integrated, interpreted and displayed for precision or personalized medicine applications.

115

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES

Jiwen Xin1, Cyrus Afrasiabi1, Sebastien Lelong1, Ginger Tsueng1, Sean D. Mooney2, Andrew I. Su1, Chunlei Wu1

1The Scripps Research Institute, 2The University of Washington

Chunlei Wu

Biological knowledge and web and cloud technology are advancing in parallel. Recently, many biological data providers have started to provide web-based APIs (Application Programming Interfaces) for accessing data in a simple and reliable manner, in addition to the traditional raw flat-file downloads. Web APIs provide many benefits over traditional file downloads. For instance, users can request specific data such as a list of genes of interest without having to download the entire dataset, thereby obtaining the latest data on demand and reducing computation and data transfer times. This means that programmers can spend less time on wrangling data, and more time on analysis and discovery. Building and deploying scalable and high-performance web APIs requires sophisticated software engineering techniques. We previously developed high-performance and scalable web APIs for gene and genetic variant annotations, accessible at MyGene.info and MyVariant.info. These two services are a tangible implementation of our expertise and collectively serve over 4 million requests every month from thousands of unique users. Crucially, the underlying design and implementation of these systems are in fact not specific to genes or variants, but rather can be easily adapted to other biomedical data types such as drugs, diseases, pathways, species, genomes, domains and interactions. We are currently expanding the scope of our platform to other biological entities. Collectively, we refer to them as “BioThings APIs” (http://biothings.io). We also applied JSON-LD (JSON for Linking Data) technology in the development of BioThings APIs. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We have demonstrated the applications of JSON-LD with BioThings APIs, including data discrepancy checks as well as the cross-linking between APIs.
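The benefit of requesting only the genes of interest rather than downloading a whole dataset can be sketched with a small client. The example is illustrative and rests on assumptions not stated in the abstract: that the MyGene.info query endpoint lives at `https://mygene.info/v3/query` and accepts `q`, `fields` and `species` parameters (please check the current service documentation before relying on this). The helper names are hypothetical, and the network call is left commented out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the MyGene.info documentation.
MYGENE_QUERY_ENDPOINT = "https://mygene.info/v3/query"


def build_gene_query_url(symbols, fields=("symbol", "name", "entrezgene"),
                         species="human"):
    """Build a query URL asking only for the listed gene symbols and fields."""
    params = {
        "q": " OR ".join(f"symbol:{s}" for s in symbols),
        "fields": ",".join(fields),
        "species": species,
    }
    return f"{MYGENE_QUERY_ENDPOINT}?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_gene_query_url(["CDK2", "TP53"])
print(url)
# result = fetch_json(url)  # uncomment to query the live service
```

Only two genes and three fields travel over the wire here, which is the on-demand access pattern the abstract contrasts with flat-file downloads.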

116

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

POSTER PRESENTATIONS

117

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS

Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

Stanford University

Reema Baskar

TNFalpha-related apoptosis-inducing ligand (TRAIL) has been shown to specifically target cancer cells; however, rampant resistance has curtailed its efficacy as a drug. Cell-to-cell variation has been previously linked to resistance to TRAIL-induced apoptosis. We further investigate non-genetic phenotypic variation as a novel mode of drug resistance. Using mass cytometry, we captured high-dimensional, single-cell signaling states of different cancer types over the course of TRAIL treatment. For the first time, we provide a comprehensive single cell overview of TRAIL signaling dynamics and provide population metrics to quantify heterogeneity within resistance phenotypes. We demonstrate that while all cells respond to TRAIL, a subset of them persist in transient resistant states and do not progress to apoptosis. Our methods show correlation between heterogeneity of response to TRAIL and persistence of non-apoptotic, viable cancer cells in drug. We also show that combinatorial therapies designed to inhibit implicated pathways in conserved resistant states do not eradicate resistance and in fact can induce new states of resistance. This study presents experimental and computational tools to investigate non-genetic phenotypic variation as a novel mode of drug resistance in cancer and demonstrates their utility in understanding resistance to TRAIL-induced apoptosis.

118

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA

Tyler J. Burns1, Garry P. Nolan2, Nikolay Samusik2

1Stanford University School of Medicine, Dept. of Cancer Biology; 2Stanford University School of Medicine, Baxter Laboratory for Stem Cell Biology

Tyler Burns

High dimensional single-cell data is routinely visualized in two dimensions using dimension reduction algorithms like t-SNE, Principal Component Analysis (PCA), or force-directed graphs. When comparing levels of intracellular proteins in basal versus perturbed cells, clustering must be used to visualize changes in specific markers in a single graph. However, discretizing a dataset does not allow one to understand subtle, rare, and/or continuous biological changes across the original manifold. Herein, we present an algorithm that represents each cell’s information content as its average across k-nearest neighbors. This allows for comparisons to be made between biological conditions on a per-cell basis. We use this to produce detailed t-SNE maps depicting biological change, and correlation analysis to enumerate signaling responses to perturbation.
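The core idea, summarizing each cell by an average over its k nearest neighbors, can be sketched in a few lines. This is a brute-force illustration under stated assumptions, not the authors' algorithm: the function name `knn_average`, the Euclidean metric, and the choice to count a cell among its own neighbors are all assumptions, and the coordinates below are hypothetical.

```python
from math import dist


def knn_average(coords, values, k):
    """For each cell, average `values` over its k nearest cells.

    coords : per-cell positions in marker space (tuples of floats)
    values : one number per cell, e.g. a marker level or a 0/1 condition label
    k      : neighborhood size; each cell counts as its own nearest neighbor
    """
    if not 1 <= k <= len(coords):
        raise ValueError("k must be between 1 and the number of cells")
    smoothed = []
    for point in coords:
        nearest = sorted(range(len(coords)),
                         key=lambda j: dist(point, coords[j]))[:k]
        smoothed.append(sum(values[j] for j in nearest) / k)
    return smoothed


# Hypothetical pooled dataset: label 0 = basal cell, 1 = perturbed cell.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
condition = [0, 0, 0, 1, 1, 1]

# Per-cell fraction of perturbed cells in each neighborhood.
print(knn_average(cells, condition, k=3))
```

Passing condition labels yields, for every cell, the local fraction of perturbed cells; passing a marker intensity instead yields a locally smoothed expression value. Either quantity can then be compared cell by cell without first discretizing the data into clusters.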

119

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY

Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

Geisinger Health System

Wendy Ingram

Background: Glioblastoma (GBM) is the most common and deadly brain cancer in adults. The associated lethality may be attributable to the intrinsic heterogeneity of micro-invasive tumor cells, some of which are unavoidably left behind following tumor resection. The transcriptomic heterogeneity may contribute to the survival and subsequent proliferation of a small subset of cells that are resistant to radiation and chemotherapy. It has long been hypothesized that investigations into these tumors at a single cell level will allow for better molecular understanding of treatment resistance and the development of novel therapeutic approaches. Recently, advances in single cell capture and sequencing technology have become available and allow for these studies to be conducted. However, there are many technical and computational challenges inherent to single cell transcriptomics that are not addressed by traditional RNA-seq analysis tools. These challenges include uncertainty of technical and biological variance and must be carefully considered in order for biologically and therapeutically relevant conclusions to be reached.

Methods: Tumor tissue from two GBM patients undergoing surgical resection as part of standard of care therapy was collected at the time of surgery. We used the Fluidigm C1 microfluidics platform to capture single cells followed by RNA sequencing (RNA-seq) of these cells and a bulk population of ~10,000 cells from each tumor. We compared two different transcriptomic alignment tools, Bowtie and kallisto, and analyzed the single cell transcriptional heterogeneity of cells within and between tumors using the recently developed analysis tool, sleuth. To the best of our knowledge, we are the first to utilize this single cell capture method and perform single cell RNA-seq analysis using the newly developed kallisto and sleuth programs for primary GBM tissue samples.

Results: We show that the Fluidigm C1 microfluidics single cell capture method produces high quality transcriptomic material for RNA-seq and may have benefits over alternative methods (e.g. fluorescence-activated cell sorting) such as shorter preparation time. The kallisto-sleuth analysis programs provide improved estimation of gene expression variability and more reliable clustering of single cells by leveraging the unique features of equivalency groups and bootstrap estimates of kallisto. Cluster analysis demonstrates that certain cells from both tumors cluster together and share some common expression patterns, but the remaining cells cluster in tumor-specific groups or do not group with other cells. We observe marked intertumor and intratumor transcriptional variability and note that average expression from single cells does not reliably correlate with the bulk cell RNA-seq abundance estimates. Taken together, we have shown that the combination of Fluidigm C1 and the kallisto-sleuth analysis programs proves to be a useful and reliable method to obtain and analyze high quality single cell RNA-seq data for the investigation of primary tumor tissues.

120

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION

Jonathan A. Rebhahn1, Sally A. Quataert1, Gaurav Sharma2, Tim R. Mosmann1

1Center for Vaccine Biology and Immunology, University of Rochester Medical Center; 2Department of Electrical and Computer Engineering, University of Rochester

Tim Mosmann

Standardization between flow cytometry experiments performed at different times is difficult because variations in cell parameters can be caused by many factors, including changes in antibody reagents, staining protocols, cell handling, different cytometers, and cytometer settings such as photomultiplier amplification voltages. These variations may overwhelm the genuine biological differences being investigated, such as genetic or disease-specific variations between subjects. Technical variations can be partly reduced by manually adjusting analysis gates, but this is subjective and time-consuming. Previous methods for semi-automated adjustment have relied on histogram peaks or manual gating to identify anchor populations. We have now developed fully-automated methods for registering flow cytometry samples, i.e. normalizing the fluorescence intensity of each cell in all channels. We take advantage of the high-resolution cluster templates derived by clustering reference samples by the SWIFT algorithm. These templates represent Gaussian model descriptions of the multidimensional data. If samples to be registered are at least moderately similar to the target/reference sample, assignment of the test sample to the template results in most cells being assigned to the appropriate cluster, but clusters that have shifted in the test sample then have altered median values in one or more channels. This high-resolution positional information is used for two types of registration. Rigid, or per-channel, registration compares cluster locations between the target and the test sample to be registered, and the best-fit registration adjustments are determined for each channel and applied incrementally, reassigning the cells at each step to improve the final fit. This objectively uses positional information from all clusters, regardless of cluster size variation, and successfully corrects global artifacts such as staining or cytometer settings that cause ‘batch’ differences between assay days. Fluid, or per-cluster, registration calculates the registration adjustment required for each cluster in the test sample to overlap fully with its corresponding cluster in the reference sample. This registers clusters more completely, and can remove individual variation (due to e.g. genetic or disease-specific effects). Fluid registration removes most positional information; this is desirable if the main experimental outcome is expected to be variations of the number of cells of different types. This method has been applied to datasets that include changes due to assay dates, flow cytometers, subjects, and sequential blood samples. Most variation occurred between cytometers and assay days, less between subjects, and the least between different bleeds from the same person. Registration substantially improved correlations between cluster medians. The number of cells per cluster also showed increased correlation, suggesting that unmodified samples assigned to the cluster templates sometimes had cells assigned to an inappropriate cluster. Thus the SWIFT cluster-based registration can improve subsequent flow cytometry analysis. Registered samples can be analyzed by a variety of manual or automated procedures.

121

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS

POSTER PRESENTATION

122

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY

E. Griffiths1, D. Dooley2, C. Bertelli1, J. Adam3, F. Bristow3, T. Matthews3, A. Petkau3, M. Courtot4, J. A. Carriço5, A. Keddy6, R. Beiko6, L. M. Schriml7, E. Taboada8, M. Graham3, G. Van Domselaar3, W. Hsiao2, F. Brinkman1

1SFU, Burnaby, BC, Canada; 2BC Centre for Disease Control, Vancouver, BC, Canada; 3PHAC, Winnipeg, MB, Canada; 4EBI, Hinxton, Cambridge, UK; 5Univ. of Lisbon, Lisbon, Portugal; 6Dalhousie Univ., Halifax, NS, Canada; 7Univ. of Maryland School of Medicine, Baltimore, MD, USA; 8PHAC, Lethbridge, AB, Canada

Fiona Brinkman

One barrier to effectively capitalizing on whole genome sequence data is the lack of efficient, robust annotation and integration of associated contextual data (metadata). Whether for human, microbial or other organismal genomic sequence, such contextual data is frequently too unorganized, in free text format, to enable effective integration for answering more sophisticated questions. Approaches to help overcome this barrier are illustrated here with the Integrated Rapid Infectious Diseases Analysis (IRIDA.ca) Project and Genomic Epidemiology Ontology (GenEpiO.org) Consortium. Microbial pathogen whole genome sequencing provides the highest resolution molecular “fingerprint” for infectious disease epidemiology and is transforming public health practice, enabling more rapid identification of disease outbreaks, their sources, and potential control measures. However, such microbial genomic data (like human ‘omic data) must be combined with epidemiological/clinical/laboratory/other healthcare data (“contextual data”) to be meaningfully interpreted for clinical and public health questions/actions. Furthermore, information must be shared between different agencies to efficiently assess and manage risks to human health across jurisdictions. Currently, terminologies describing public health data cannot be easily mapped across functionally-similar software systems without intricate intervention by specialists, resulting in data exchange systems that are static and fragile. To promote efficient data exchange and intelligence sharing, we propose an intuitive platform for searching, identifying, and verifying the fundamental healthcare entity elements (ontology terms) to map to institutional application data formats, starting with genomic and public health contextual data. Key innovations are the proposed Genomic Epidemiology Entity Mart (GE2M) that allows users to inspect term definitions, labeling, and database cross references in a user-friendly format, plus a software system allowing different jurisdictions to use the terms suitable for them, essentially choosing from a “shopping cart” of options mapped between jurisdictions/organizations. A very preliminary prototype of this concept has been established as part of the IRIDA.ca project and the GenEpiO Consortium (a consortium of 70 researchers from 15 countries interested in contributing to this effort). We hypothesize that a common and accessible ontology entity mart can be developed, if appropriate tools for interfacing domain experts with this mart are developed and the mart is first applied to practical microbial genomic epidemiology data sharing needs between select public health systems (with consultation involving a larger consortium). In addition, new genomic data visualization approaches are being developed for integration into the IRIDA software platform, to enable more interactive, flexible visualization of genomic data with different levels or views of contextual data (from finely detailed comparisons of genomic islands and other features between genomes, to examining genomic data in the context of geographical data). IRIDA is being used in Canada’s public health agency, and this open source software is also being installed in other countries interested in co-developing this resource and using a federated data sharing approach.

123

AUTHOR INDEX

A

Abrams, Zachary · 59
Abul-Husn, Noura S. · 107
Adam, J. · 122
Adams, Micah · 54
Aevermann, Brian · 37
Afrasiabi, Cyrus · 115
Agarwal, Vibhu · 17
Akbarian, Schahram · 72
Aldrich, Melinda C. · 20, 35
Alkan, Can · 77, 87
Alser, Mohammed · 77
Altman, Russ B. · 79, 90
Andreoletti, Gaia · 101
Andres-Terrè, Marta · 13
Ansel, Mark · 80
Anwar, Ash · 114
Armaselu, Bogdan · 18
Arunachalam, Harish Babu · 18
Ashley, Euan · 104
Aslam, Naureen · 68
Asmann, Yan W. · 85
Ayati, Marzieh · 67

B

Bader, Gary D. · 110
Baheti, Saurabh · 108
Bai, Yongsheng · 68
Bakken, Trygve · 37
Baskar, Reema · 117
Bauer, Christopher R. · 27
Beaulieu-Jones, Brett K. · 19
Bebek, Gurkan · 52
Beck, Andrew · 50
Beck, Mette · 28
Beiko, R. · 122
Bellovich, Keith · 70
Bendall, Sean · 117
Berens, Michael · 31
Berry, Gerald J. · 90
Bertelli, C. · 122
Best, Aaron A. · 2
Bhat, Zeenat · 70
Bichko, Dmitri · 76
Biernacka, Joanna M. · 85
Biggin, Mark D. · 64
Boespflug, Mathieu · 76
Boley, Nathan · 98
Bongen, Erika · 13
Borchers, Christoph H. · 114
Borecki, Ingrid · 34
Borrayo, Ernesto · 63
Bowden, Donald W. · 45
Bowerman, Nathan · 2
Breitenstein, Matthew K. · 96
Breitwieser, Gerda · 34
Brenner, Steven E. · 101
Brinkman, Benjamin H. · 97
Brinkman, F. · 122
Bristow, F. · 122
Bromberg, Yana · 69
Brosius, Frank C. · 70
Brown, Andrew J. Leigh · 83
Brubaker, Douglas · 52
Brunak, Soren · 28
Burns, Tyler J. · 118
Bustillo, Juan R. · 93

C

Cai, Guoshuai · 73
Calhoun, Vince D. · 9, 93
Cao, Mengfei · 3
Carey, David J. · 107
Carr, Steven · 109
Carriço, J. A. · 122
Carter, Lester G. · 106
Cederberg, Kevin · 18
Chan, Yu-Feng Yvonne · 23
Chance, Mark · 67
Chang, Rui · 11
Chasioti, Danai · 71
Chaudhary, Kumardeep · 74
Chen, Rong · 56
Chen, Yii-Der I. · 45
Cheung, Philip · 84
Cheville, John · 108
Chew, Guo-Liang · 64
Choi, Yoon-La · 111
Christiansen, Lena · 37
Clay, Alyssa I. · 96
Clemons, Paul A. · 31
Cline, Melissa · 15
Cohain, Ariella · 11
Cordero, Pablo · 38
Correa, Adolofo · 45
Costello, James C. · 60
Courtot, M. · 122
Cowen, Lenore J. · 3
Crawford, Dana C. · 20
Cullis, Pieter · 114

D

Daescu, Ovidiu · 18
Danaee, Padideh · 44
Darrow, Bruce · 22

124

Davila, Jaime · 108
Davis-Dusenbery, Brandi · 14
de Belle, J. Steven · 84
De, Subhajyoti · 88
Deisseroth, Cole A. · 13
DeJongh, Matthew · 2
Denny, Joshua · 35
de Vries, Edsko · 76
Dewey, Frederick E. · 34, 107
Dhruv, Harshil · 31
Diaz, Diana · 51
Diez-Fuertes, Francisco · 37
Dincer, Aslihan · 72
Disselkoen, Craig · 54
Divaraniya, Aparna A. · 11
Dominguez, Facundo · 76
Domselaar, G. Van · 122
Donato, Michele · 51
Dooley, D. · 122
Dougherty, Greg · 105
Draghici, Sorin · 51
Dudley, Joel T. · 11, 22, 72
Dunnenberger, H. M. · 106
Durmaz, Arda · 52

E

Eckel-Passow, Jeanette · 105
Egawa, Fumiko · 33
Empey, P. E. · 106
Ergin, Oguz · 77
Ertekin-Taner, Nilüfer · 85
Eskin, Eleazar · 80

F

Fantl, Wendy J. · 78
Farber-Eger, Eric · 20
Farrow, Emily G. · 102, 113
Fienberg, Harris · 117
Fink, Cris G. · 97
Fink, Tobias · 24, 99
Fogarty, Zach · 108
Foo, Chuan Sheng · 98
Fornage, Myriam · 45
Franks, Jennifer M. · 73
Frase, A. T. · 106
Fraser, Robert · 114
Fread, Kristin I. · 39
Freedman, Barry I. · 45
Freimuth, R. R. · 106
Freimuth, Robert · 105

G

Gadegbeku, Crystal · 70
Gaedigk, A. · 106
Gaedigk, Andrea · 81, 102, 113
Gallion, Jonathan · 29, 103
Gao, Chen · 41
Garmire, Lana · 74
Gavin, Davin · 72
Gelijns, Annetine · 22
Genes, Nicholas · 23
Ghaeini, Reza · 44
Ghose, Saugata · 87
Gipson, Debbie · 70
Giron, Emily · 84
Glicksberg, Benjamin · 56
Gliske, Stephen V. · 97
Goldfeder, Rachel · 104
Gordon, A. · 106
Gosh, Debashis · 88
Graham, M. · 122
Gray, Daniel H. · 78
Greenside, Peyton · 98
Griffiths, E. · 122
Groop, Leif · 28
Guney, Emre · 12
Guo, An Chi · 114

H

Haidar, C. · 106
Hart, Steven · 105
Hassan, Hasan · 77
Hawkins, Jennifer · 70
Haynes, Winston A. · 13
He, Dan · 30
He, Shuyao · 55
Hellwege, Jacklyn N. · 45
Henderson, Tim A. D. · 52
Hendrix, David · 44
Hershman, Steven G. · 23
Herzog, Julia · 70
Hicks, J. K. · 106
Hodge, Rebecca · 37
Hoff, Fieke W. · 57
Hoffman, J. M. · 106
Hollister, Brittany M. · 20
Hong, Na · 75
Horton, Iain · 105
Horton, Terzah M. · 57
Hoskins, Roger A. · 101
Hsiao, W. · 122
Hu, Chenyue W. · 57
Huang, Austin · 76
Huang, Kun · 7, 59
Hui, Shirley · 110

I

Iakoucheva, Lilia M. · 82
Imoto, Seiya · 91
Ingram, Wendy Marie · 119


Israeli,Johnny·98Isserlin,Ruth·110Ivkovic,Sinisa·14

J

Jebakaran,Jebakumar·22Jiang,Guoqian·75Johannessen,Solveig·114Johnson,KippW.·22Johnson,Travis·59Ju,Wenjun·70

K

Kabat,Halla·53Kaddurah-Daouk,RimaF.·96Kaka,Hussam·110Kamp,Thomas·54Kandamurugu,Manickam·107KanigelWinner,KimberlyR.·60Karakurt,Gunnur·48Kasarskis,Andrew·11,22Kashef-Haghighi,Dorna·33Kaushik,Gaurav·14Keaton,JacobM.·45Kechris,Katerina·86Keddy,A.·122Khatri,Purvesh·13,46Kiefer,Jeff·31Kim,Jeremie·77,87Kim,Juho·61Kim,Junghi·41Kim,NayoungK.D.·111Kim,Seungchan·31Klein,T.E.·106Knox,Craig·114Ko,MelissaE.·78Kornblau,StevenM.·57Kovatch,Patricia·22Koyutürk,Mehmet·48,67Kretzler,Matthias·70Krishnamurthy,Sarathbabu·34,107Krishnan,MichelleL.·42Kuan,PeiFen·55Kuncheva,Zhana·42Kundaje,Anshul·98Kural,Deniz·14

L

Lanchantin,Jack·21Larson,Melissa·108Larson,NicholasB.·108Lasken,RogerS.·37

Lau,KatyL.·97Lavage,DanielR.·27,34Leader,JosephB.·27,34,107Leavey,Patrick·18Ledbetter,DavidH.·107Lee,Donghyuk·77Lee,Inhan·53Lee,M.T.·106Lein,Ed·37Lelong,Sebastien·115Li,JingyiJessica·64Li,Lang·71Li,Li·22Li,MatthewD.·13Li,Shuyu·56Lichtarge,Olivier·25,29,103Lin,Chih-Hsu·25Lin,Dongdong·93Lin,Yaxiong·105Lincoln,StephenE.·15Liu,Charles·13Liu,Jingyu·93Liu,Keli·50Liu,LarryY.·48Liu,Tao·109Lofgren,Shane·13Lopez,Alexander·34Lu,Liangqun·74Lua,RhonaldC.·25Lucas,AnastasiaM.·34Luedtke,Alexander·50

M

Ma,Meng·56Machida-Hirano,Ryoko·63Mahendra,Divya·31Mahlich,Yannick·69Mahoney,J.Matthew·27Mallory,EmilyK.·79Mandric,Igor·80Mangul,Serghei·80Marcu,Ana·114Marko,NicholasF.·119Marsit,CarmenJ.·32,112Martinez,Maria·18Massengill,Susan·70Matthews,T.·122Matveeva,OlgaV.·94McCorrison,Jamison·37McDermott,JasonE.·109McDonnell,ShannonK.·85,105,108McEachin,RichardC.·70Mead,David·105Mehta,Sanket·57Mertins,Philipp·109Metpally,RaghuP.R.·34,107Miller,Jeremy·37Miller,Neil·81,102,106,113


Miotto,Riccardo·22Mishra,Rashika·18Misra,Debdipto·119Miyano,Satoru·91Mohan,Rahul·98Montana,Giovanni·42Montoya,Dennis·80Mooney,SeanD.·82,106,115Moore,JasonH.·19Moskovitz,Alan·22Mosmann,TimR.·120Moult,John·101Murray,MichaelF.·107Mutlu,Onur·77,87Myers,Mark·105

N

Nair,AshaA.·108Nair,K.Sreekumaran·96Narla,Goutham·67Nazipova,NafisaN.·94Ng,MaggieC.Y.·45Nguyen,Tin·51Nho,Kwangsik·8Ni'Suilleabhain,Molly·18Ning,Xia·71Nolan,GarryP.·39,78,117,118Non,Amy·20Novotny,Mark·37

O

O'Connell,Chloe·33O’Brien,Daniel·108Ogurtsov,AlekseyY.·94Osafo,Nana·89Otolorin,Abiodun·89Overton,John·34

P

Pai,Shraddha·110Palmer,NicholetteD.·45Pan,Wei·41Pandey,Gaurav·47Pankow,JamesS.·45Parida,Laxmi·30Park,PeterJ.·111Park,Woong-Yang·111Paten,Benedict·15Payne,Samuel·109Pejaver,Vikas·82Pen,Jian·65Pendergrass,SarahA.·27,34Peng,Jian·4,61

Penn,John·34Pennathur,Subramaniam·70Perrone-Bizzozero,Nora·93Person,T.N.·106Perumal,Kalyani·70Peterson,Josh·35,106Petkau,A.·122Petyuk,Vladislav·109Pinney,Sean·22Playter,ChristopherS.·78Plevritis,SylviaK.·78Poirion,Olivier·74Pond,Sergei·83Probert,Chris·98Prodduturi,Naresh·75Pyc,MaryA.·84

Q

Qi,Yanjun·21Qu,Meng·4,65Quataert,SallyA.·120Qutub,AminaA.·57

R

Radcliffe,Richard·86Rademakers,Rosa·85Radivojac,Predrag·82Rakheja,Dinesh·18Rasmussen-Torvik,LauraJ.·45Ré,Christopher·79,90Rebhahn,JonathanA.·120Reddy,JosephS.·85Reed,Gay·105Reich,DavidL.·22Reid,Jeffrey·34Relling,M.V.·106Ren,Yingxue·85Restrepo,NicoleA.·20Rich,StephenS.·45Ricks,Doran·22Risacher,ShannonL.·8Riska,Shaun·108Ritchie,MarylynD.·34,106,119Roden,Dan·35Rodland,Karin·109Rogers,Linda·23Ross,Jason·105Ross,OwenA.·85Rossetti,Maura·80Rotman,Jeremy·80Rotter,JeromeI.·45Röttger,Richard·5Rubin,DanielL.·90Rudra,Pratyaydipta·86Russell,Nate·61Russell,Pamela·86


S

Saba,Laura·86Salman,Ali·68Samuels,David·35Samusik,Nikolay·118Sander,Thomas·24,99Sangkuhl,K.·106Sarangi,Vivekananda·85Saykin,AndrewJ.·8Scarpa,JosephR.·11Schadt,EricE.·11,23,56,72Schaid,Daniel·108Scherbina,Anna·98Scheuermann,RichardH.·37Schlatzer,Daniela·67Schork,Nicholas·37Schreiber,StuartL.·31Schriml,L.M.·122Schultz,André·57Scott,ErickR.·23Scott,Madeleine·46Scott,S.A.·106Sengupta,Anita·18Sengupta,ParthoP.·22Senol,Damla·77,87Shabalina,SvetlanaA.·94Shah,NigamH.·17Shameer,Khader·22Sharma,Gaurav·120Shen,Li·8,71Shi,Wen·86Shifman,Sagiv·80Shin,Hyun-Tae·111Shrikumar,Avanti·98Shuldiner,AlanR.·107Simonovic,Janko·14Singh,Ritambhara·21Sinnwell,JasonP.·85Smelser,Diane·107Smith,Kyle·88Smith,Richard·109Snyder,John·27Snyder,Michael·90Soden,Sarah·102,113Song,Junyan·55Southerland,William·89Speyer,Gil·31Spreafico,Roberto·80Stacey,WilliamC.·97Stai,Tony·105Stanescu,Ana·47Statz,Benjamin·80Steemers,Frank·37Strauli,Nicolas·80Strickland,WilliamD.·39Stuart,JoshuaM.·38Su,AndrewI.·115Su,Hai·7Swank,Julie·105

Sweeney,TimothyE.·13

T

Taboada,E.·122Tam,Andrew·13Taroni,JaclynN.·73Tatonetti,NicholasP.·22Taylor,KentD.·45Teh,Charis·78Thibodeau,StephenN.·108Thompson,JeffreyA.·32,112Tignor,Nicole·23Tijanic,Nebojsa·14Tintle,Nathan·2,50,54Tomczak,Aurelie·13Tran,DannyN.·37Tran,HaiJ.·31Tsueng,Ginger·115Tully,Tim·84Tunkle,Leo·53Twist,GreysonP.·81,102,106,113

V

Vallania,Francesco·13,46VanDerWey,Will·80VanHouten,Jacob·35Venepally,Pratap·37Venkataraman,GuhanRam·33Verma,A.·106Verma,ShefaliS.·34Vestal,Brian·86Volety,Rama·105vonKorff,Modest·24,99

W

Wagenknecht,LynneE.·45Wall,DennisPaul·33Wang,Beilun·21Wang,Changchang·56Wang,Chao·7Wang,Chen·75Wang,Liewei·96Wang,Pei·23Wang,Sheng·4,65Wang,Yu-Ping·9Weaver,Steven·83Weinshilboum,RichardM.·96Wertheim,Joel·83Westergaard,David·28Whaley,R.M.·106Whirl-Carrillo,M.·106Whitfield,MichaelL.·73Whiting,Kathleen·48Wiepert,Mathieu·105


Wiggins,Roger·70Wiley,Laura·35Wilkins,AngelaD.·25,29,103Williams,M.S.·106Wilson,JamesG.·45Wilson,Michael·114Wilson,StephenJ.·25Wiredja,Danica·67Wishart,DavidS.·114Wiwie,Christian·5Woon,M.·106Worrell,GregA.·97Wu,Chunlei·106,115

X

Xin,Hongyi·77Xin,Jiwen·115

Y

Yahi,Alexandre·22Yamaguchi,Rui·91Yan,Jingwen·8Yang,HarryTaegyun·80Yang,Lin·7

Yang,Shan·15Yang,W.·106Yao,Xiaohui·71Yoo,Byunggil·81Younkin,SteveG.·85Yu,Kun-Hsing·90Yun,JaeWon·111

Z

Zaitlen,Noah·80Zelikovsky,Alex·80Zhang,Bin·72Zhang,Can·15Zhang,Fan·37Zhang,Pengyue·71Zhang,Yan·59Zhang,Yao-zhong·91Zhu,Chengsheng·69Zhu,Jun·11Zhu,Kuixi·11Ziemek,Daniel·76Zille,Pascal·9Zunder,EliR.·39,78Zweig,Micol·23