Transcript
Page 1: PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018psb.stanford.edu/previous/psb18/conference-materials/...PACIFIC SYMPOSIUM ON B

PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Abstracts are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.

TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 1

CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY ..... 2
Xintong Chen, Sander Houten, Kimaada Allette, Robert P. Sebra, Gustavo Stolovitzky, Bojan Losic

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 3
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley

LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION ..... 4
Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng

CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME ..... 5
Emily K. Mallory, Ambika Acharya, Stefano E. Rensi, Peter J. Turnbaugh, Roselie A. Bright, Russ B. Altman

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 6
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 7

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 8
Monica Agrawal, Marinka Zitnik, Jure Leskovec

MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE ..... 9
Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore

AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS ..... 10
Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley

FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES ..... 11
Lia X. Harrington, Gregory P. Way, Jennifer A. Doherty, Casey S. Greene

CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA ..... 12
Kipp W. Johnson, Benjamin S. Glicksberg, Rachel Hodos, Khader Shameer, Joel T. Dudley

DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS ..... 13
Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore

HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES? ..... 14
Yogasudha C. Veturi, Marylyn D. Ritchie

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 15

CLINGEN CANCER SOMATIC WORKING GROUP – STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH ..... 16
Subha Madhavan, Deborah Ritter, Christine Micheel, Shruti Rao, Angshumoy Roy, Dmitriy Sonkin, Matthew McCoy, Malachi Griffith, Obi L. Griffith, Peter McGarvey, Shashikant Kulkarni, on behalf of the Clingen Somatic Working Group

A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS ..... 17
Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson

BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE ..... 18
Jessica D. Tenenbaum, Colette Blach

IMAGING GENOMICS ..... 19

DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS ..... 20
Benjamin Chidester, Minh N. Do, Jian Ma

DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION ..... 21
Chenglong Huang, Albert Zhang, Guanghua Xiao

GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL ..... 22
Zhouyuan Huo, Dinggang Shen, Heng Huang

CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER ..... 23
Jason E. Miller, Manu K. Shivakumar, Shannon L. Risacher, Andrew J. Saykin, Seunggeun Lee, Kwangsik Nho, Dokyoon Kim

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 24

SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET ..... 25
Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier

USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT ..... 26
Chih-Lin Chi, Lu He, Kourosh Ravvaz, John Weissert, Peter J. Tonellato

COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES ..... 27
Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 28
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein

ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS ..... 29
Sarah Poole, Nigam Shah

EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES ..... 30
Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier

ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS ..... 31
Jason Westra, Nicholas Hartman, Bethany Lake, Gregory Shearer, Nathan Tintle

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 32

CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS ..... 33
Jiali Han, Jianrong Li, Ikbel Achour, Lorenzo Pesce, Ian Foster, Haiquan Li, Yves A. Lussier

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 34
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang

LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS ..... 35
Elisabetta Manduchi, Alessandra Chesi, Molly A. Hall, Struan F.A. Grant, Jason H. Moore

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 36

IMPROVING PRECISION IN CONCEPT NORMALIZATION ..... 37
Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter

VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION ..... 38
Edward W. Huang, Sheng Wang, ChengXiang Zhai

ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS ..... 39
Sheng Wang, Jianzhu Ma, Michael Ku Yu, Fan Zheng, Edward W. Huang, Jiawei Han, Jian Peng, Trey Ideker

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 40

PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES ..... 41
Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje

LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS ..... 42
Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley

DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES ..... 43
Jielin Xu, Kelly Regan, Siyuan Deng, William E. Carson III, Philip R.O. Payne, Fuhai Li

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ..... 44

OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE ..... 45
Tiffany J. Callahan, William A. Baumgartner Jr., Michael Bada, Adrianne L. Stefanski, Ignacio Tripodi, Elizabeth K. White, Lawrence E. Hunter

AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA ..... 46
Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu

IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH ..... 47
Dragutin Petkovic, Russ B. Altman, Mike Wong, Arthur Vigil

TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY ..... 48
Katherine Shoemaker, Brian P. Hobbs, Karthik Bharath, Chaan S. Ng, Veerabhadran Baladandayuthapani

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 49

IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS ..... 50
Vivekanand Sharma, Indra Neil Sarkar

DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING ..... 51
John Darrell Van Horn, Lily Fierro, Jeana Kamdar, Jonathan Gordon, Crystal Stewart, Avnish Bhattrai, Sumiko Abe, Xiaoxiao Lei, Caroline O’Driscoll, Aakanchha Sinha, Priyambada Jain, Gully Burns, Kristina Lerman, José Luis Ambite

IMAGING GENOMICS ..... 52

HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE ..... 53
Bhim M. Adhikari, Neda Jahanshad, Dinesh Shukla, David C. Glahn, John Blangero, Richard C. Reynolds, Robert W. Cox, Els Fieremans, Jelle Veraart, Dmitry S. Novikov, Thomas E. Nichols, L. Elliot Hong, Paul M. Thompson, Peter Kochunov

MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS ..... 54
Lichy Han, Maulik R. Kamdar

BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES ..... 55
Arunima Srivastava, Chaitanya Kulkarni, Parag Mallick, Kun Huang, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 56

LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS ..... 57
Alexandra E. Fish, Dana C. Crawford, John A. Capra, William S. Bush

EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION ..... 58
Binglan Li, Shefali S. Verma, Yogasudha C. Veturi, Anurag Verma, Yuki Bradford, David W. Haas, Marylyn D. Ritchie

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 59

PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA ..... 60
Travers Ching, Lana X. Garmire

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 61

GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE ..... 62
Paul Previde, Brook Thomas, Mike Wong, Emily K. Mallory, Dragutin Petkovic, Russ B. Altman, Anagha Kulkarni

POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 63

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 64
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley

SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA ..... 65
Jinho Kim, Hongui Cha, Hyun-Tae Shin, Boram Lee, Jae Won Yun, Joon Ho Kang, Woong-Yang Park

IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING ..... 66
Gang Liu, Justin Li, G.L. Prasad

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 67
Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 68
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 69

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 70
Monica Agrawal, Marinka Zitnik, Jure Leskovec

PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS ..... 71
Youdinghuan Chen, Yue Wang, Lucas A. Salas, Todd W. Miller, Jonathan D. Marotti, Nicole P. Jenkins, Arminja N. Kettenbach, Chao Cheng, Brock C. Christensen

USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS ..... 72
Steven N. Hart, W. Flotte, A.P. Norgan, K.K. Shah, Z.R. Buchan, K.B. Geiersbach, T. Mounajjed, T.J. Flotte

A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS ..... 73
Mingze He, Carolyn J. Lawrence-Dill

GENERAL ..... 74

DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER ..... 75
Gabriel Al-Ghalith, Abigail Johnson, Pajau Vangay, Dan Knights

SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA ..... 76
Julian Aldana, Monica Cala Molina, Martha Zuluaga

GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING ..... 77
Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS ..... 78
Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi

FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN ..... 79
Taejeong Bae, Jessica Mariani, Livia Tomasini, Bo Zhou, Alexander E. Urban, Alexej Abyzov, Flora M. Vaccarino

CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA ..... 80
Bahar Behsaz, Hosein Mohimani, Alexey Gurevich, Andrey Prjibelski, Mark F. Fisher, Larry Smarr, Pieter C. Dorrestein, Joshua S. Mylne, Pavel A. Pevzner

FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE ..... 81
Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush

OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE ..... 82
Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu

DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA ..... 83
Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat

DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE ..... 84
Eun Kyung Choe, Sang Woo Lee

RICK: RNA INTERACTIVE COMPUTING KIT ..... 85
Galina A. Erikson, Ling Huang, Maxim Shokhirev

PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING ..... 86
Gamze Gursoy, Mark Gerstein

CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-`OMICS INTEGRATION IN COMPLEX DISEASE ..... 87
Tia Tate Hudson, ClarLynda Williams-DeVane

IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING ..... 88
Jin Hyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke

SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN ..... 89
Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura

GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES ..... 90
Jeremie S. Kim, Damla S. Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oğuz Ergin, Can Alkan, Onur Mutlu

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 91
Sungho Kim, Taehun Kim

GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS ..... 92
Sarah Kim-Hellmuth, Matthias Bechheim, Benno Pütz, Pejman Mohammadi, Johannes Schumacher, Veit Hornung, Bertram Müller-Myhsok, Tuuli Lappalainen

PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY ..... 93
Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski

SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS ..... 94
Mikhail Kolmogorov, Eamonn Kennedy, Zhuxin Dong, Gregory Timp, Pavel A. Pevzner

GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS ..... 95
Milica Krunic, Klaus Bobacz, Arndt von Haeseler

DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER ..... 96
Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi

CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM ..... 97
Sean Maxwell, Mark R. Chance

SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS ..... 98
Abigail E. Moore, Brandon Zheng, Patricia M. Watson, Robert C. Wilson, Dennis K. Watson, Paul E. Anderson

RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE ..... 99
Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant

SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS ..... 100
Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi

SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK ..... 101
Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui

PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS ..... 102
R. Michael Sivley, John A. Capra, William S. Bush

REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT ..... 103
James R. Torpy, Nenad Bartonicek, David D.L. Bowtell, Marcel E. Dinger

DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE ..... 104
Yun Joo Yoo, Suh-Ryung Kim, Sun Ah Kim, Shelley B. Bull

THE MULTIPLE GENE ISOFORM TEST ..... 105
Yao Yu, Chad D. Huff

IMAGING GENOMICS ..... 106

GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE ..... 107
Xiaohui Yao, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Heng Huang, Ze Wang, Li Shen

PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER ..... 108
Jun Cheng, Jie Zhang, Zhi Han, Liang Cheng, Qianjin Feng, Kun Huang

IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS ..... 109
Hui Qu, Subhajyoti De, Dimitris Metaxas

THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN ..... 110
Li Shen, David Kennedy, Christian Haselgrove, Abby Paulson, Nina Preuss, Robert Buccigrossi, Matthew Travers, Albert Crowley, and The NITRC Team

IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS ..... 111
Arunima Srivastava, Chaitanya Kulkarni, Kun Huang, Parag Mallick, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 112

EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING ..... 113
Steven E. Brenner, Aashish N. Adhikari, Yaqiong Wang, Robert J. Currier, Renata C. Gallagher, Robert L. Nussbaum, Yangyun Zou, Uma Sunderam, Joseph Sheih, Flavia Chen, Mark Kvale, Sean D. Mooney, Raj Srinivasan, Barbara A. Koenig, Pui Kwok, Jennifer M. Puck, The NBSeq Project

A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE) ..... 114
J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai

EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION ..... 115
Jiang Gui, Xuemei Ji, Christopher I. Amos

BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING ..... 116
Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart

IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX ..... 117
Jae-Hyung Lee, Su-Kyeong Hwang, Jung-eun Yang, Chae-Seok Lim, Jin-A Lee, Kyungmin Lee, Bong-Kiun Kaang, Yong-Seok Lee

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 118
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein

PHARMGKB: NEW WEBSITE RELEASE 2017 ..... 119
Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon, Katrin Sangkuhi, Li Gong, Julia Barbarino, Caroline Thorn, Rachel Huddart, Maria Alvarellos, Jill Robinson, Russ B. Altman, Teri E. Klein

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 120

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 121
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang

RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS ..... 122
Duc-Hau Le, Lieven Verbeke, Le Hoang Son, Dinh-Toi Chu, Van-Huy Pham

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 123

MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS ..... 124
Selen Bozkurt, Jung In Park, Daniel L. Rubin, James D. Brooks, Tina Hernandez-Boussard

GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS ..... 125
Soo Jun Park, Jihyun Kim, Soo Young Cho, Charny Park, Young Seek Lee

WORKSHOP ..... 126

MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY ..... 126

METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES ..... 127
Vojtech Huser, Michael G. Kahn, Jeffrey S. Brown, Ramkiran Gouripeddi

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 128
Sungho Kim, Taehun Kim

A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES ..... 129
Quynh T. Tran, Lee Larcombe, Subhashini Arimilli, G.L. Prasad

AUTHOR INDEX ..... 130

1

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY

Xintong Chen1, Sander Houten1, Kimaada Allette1, Robert P. Sebra1, Gustavo Stolovitzky1,2, Bojan Losic1

1Icahn School of Medicine at Mount Sinai, 2IBM

Bojan, Losic

We characterize the transcriptional splicing landscape of a prostate cancer cell line treated with a previously identified synergistic drug combination. We use a combination of third generation long-read RNA sequencing technology and short-read RNAseq to create a high-fidelity map of expressed isoforms and fusions to quantify splicing events triggered by treatment. We find strong evidence for drug-induced, coherent splicing changes which disrupt the function of oncogenic proteins, and detect novel transcripts arising from previously unreported fusion events.

3

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES

Rachel Hodos1,2, Ping Zhang3, Hao-Chih Lee1, Qiaonan Duan1, Zichen Wang1, Neil R. Clark1, Avi Ma’ayan1, Fei Wang3,4, Brian Kidd1, Jianying Hu3, David Sontag5, Joel T. Dudley1

1Icahn School of Medicine at Mount Sinai, 2New York University, 3IBM T.J. Watson Research Center, 4Cornell University, 5Massachusetts Institute of Technology

Rachel, Hodos

Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions, i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
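
The local (nearest-neighbors) idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout keyed by (drug, cell type), the cosine similarity, the similarity-weighted average and the names `cosine` and `predict_profile` are all assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length gene-expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_profile(profiles, drug, cell, k=2):
    """Predict the unmeasured (drug, cell) profile from similar drugs.

    `profiles` maps (drug, cell) -> list of per-gene values (hypothetical
    layout). Drugs already measured in `cell` are scored by how similar they
    are to `drug` in the cell types where both were measured; the k best
    neighbours are averaged, weighted by that similarity.
    """
    scored = []
    for (other, c), target in profiles.items():
        if c != cell or other == drug:
            continue
        shared = [cosine(profiles[(drug, s)], profiles[(other, s)])
                  for (d, s) in profiles
                  if d == drug and (other, s) in profiles]
        if shared:
            scored.append((sum(shared) / len(shared), target))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [(w, p) for w, p in scored[:k] if w > 0]
    if not top:
        return None  # no usable neighbour: leave the gap unfilled
    total = sum(w for w, _ in top)
    n_genes = len(top[0][1])
    return [sum(w * p[g] for w, p in top) / total for g in range(n_genes)]
```

A global (tensor completion) method would instead fit a low-rank model to all measured entries at once; the sketch covers only the local case.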

4

LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION

Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng

University of Illinois at Urbana-Champaign

Yunan, Luo

A variety of large-scale pharmacogenomic data, such as perturbation experiments and sensitivity profiles, enable the systematic identification of drug mechanisms of action (MoAs), which is a crucial task in the era of precision medicine. However, integrating these complementary pharmacogenomic datasets is inherently challenging due to the wild heterogeneity, high dimensionality and noisy nature of these datasets. In this work, we develop Mania, a novel method for the scalable integration of large-scale pharmacogenomic data. Mania first constructs a drug-drug similarity network through integrating multiple heterogeneous data sources, including drug sensitivity, drug chemical structure, and perturbation assays. It then learns a compact vector representation for each drug to simultaneously encode its structural and pharmacogenomic properties. Extensive experiments demonstrate that Mania achieves substantially improved performance in both MoA and target prediction, compared to predictions based on individual data sources as well as a state-of-the-art integrative method. Moreover, Mania identifies drugs that target frequently mutated cancer genes, which provides novel insights into drug repurposing.

5

CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME

Emily K. Mallory1, Ambika Acharya1, Stefano E. Rensi1, Peter J. Turnbaugh2, Roselie A. Bright3, Russ B. Altman1

1Stanford University, 2University of California San Francisco, 3Food and Drug Administration

Emily, Mallory

Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.
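
As a rough illustration of querying a drug-metabolite pair against known reactions, the sketch below assumes that a reaction vector is the product embedding minus the substrate embedding and that reactions are ranked by cosine similarity. The abstract does not state either choice, and the function names and toy vectors are hypothetical.

```python
from math import sqrt


def reaction_vector(substrate_vec, product_vec):
    """Represent a transformation as product minus substrate (an assumption)."""
    return [p - s for s, p in zip(substrate_vec, product_vec)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_reactions(query_vec, reaction_vecs):
    """Return reaction names ordered from most to least similar to the query.

    `reaction_vecs` maps a reaction name -> reaction vector (toy data here;
    in practice these would come from a reaction database).
    """
    return sorted(reaction_vecs,
                  key=lambda name: cosine(query_vec, reaction_vecs[name]),
                  reverse=True)
```

With this representation, two reactions that apply the same substructure change to different molecules point in a similar direction, which is the property the query step relies on.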

6

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS

Gregory P. Way, Casey S. Greene

University of Pennsylvania

Gregory, Way

The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer-types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.

7

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

8

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME

Monica Agrawal, Marinka Zitnik, Jure Leskovec

Stanford University

Marinka, Zitnik

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
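
One way to check whether a disease pathway forms a single connected region is to count the connected components of the subgraph induced by its proteins. The sketch below uses a plain breadth-first search; the edge-list input, the function name and the toy protein names are illustrative assumptions rather than the authors' pipeline.

```python
from collections import deque


def pathway_components(ppi_edges, pathway_proteins):
    """Connected components of the PPI subgraph induced by a pathway.

    `ppi_edges` is an iterable of (protein_a, protein_b) pairs and
    `pathway_proteins` the set of disease-associated proteins. Only edges
    with both endpoints in the pathway are kept, so two pathway proteins
    linked solely through a non-pathway protein land in separate components.
    """
    pathway = set(pathway_proteins)
    adjacency = {p: set() for p in pathway}
    for a, b in ppi_edges:
        if a in pathway and b in pathway:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(pathway):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components
```

The share of pathway proteins in the largest component is then a simple summary of how fragmented the pathway is.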

9

MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE

Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore

University of Pennsylvania

Brett, Beaulieu-Jones

Electronic Health Records (EHRs) contain a wealth of patient data useful to biomedical researchers. At present, both the extraction of data and methods for analyses are frequently designed to work with a single snapshot of a patient’s record. Health care providers often perform and record actions in small batches over time. By extracting these care events, a sequence can be formed providing a trajectory for a patient’s interactions with the health care system. These care events also offer a basic heuristic for the level of attention a patient receives from health care providers. We show that it is possible to learn meaningful embeddings from these care events using two deep learning techniques, unsupervised autoencoders and long short-term memory networks. We compare these methods to traditional machine learning methods which require a point-in-time snapshot to be extracted from an EHR.

10

AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS

Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley

Icahn School of Medicine at Mount Sinai

Benjamin, Glicksberg

Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHR). Similar to prospective study designs, high quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for only a relatively small number of diseases. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature learning method for extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results with average F-score of 0.57 and AUC-ROC of 0.98. However, we noticed that results varied considerably between diseases, thus necessitating further investigation and/or phenotype-specific refinement of the approach before being readily deployed across all diseases.
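
The ranking-and-threshold step can be sketched as follows, assuming concept embeddings are already trained (for example with word2vec) and assuming a patient is represented by the mean of the embeddings of the concepts in their record. The averaging rule, the cosine distance and all names and toy vectors are illustrative assumptions; the abstract does not specify them.

```python
from math import sqrt


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def select_cohort(patient_concepts, embeddings, query_concepts, max_distance):
    """Rank patients by distance to the query and keep those within threshold.

    `patient_concepts` maps patient id -> list of concept codes,
    `embeddings` maps concept code -> vector (hypothetical toy inputs).
    Returns (patient, distance) pairs, nearest first.
    """
    query = mean_vector([embeddings[c] for c in query_concepts])
    ranked = []
    for patient, concepts in patient_concepts.items():
        known = [embeddings[c] for c in concepts if c in embeddings]
        if known:
            ranked.append((cosine_distance(mean_vector(known), query), patient))
    ranked.sort()
    return [(patient, dist) for dist, patient in ranked if dist <= max_distance]
```

The threshold trades recall for precision, which is one plausible reason results could vary by disease as the abstract reports.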

11

FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES

Lia X. Harrington1, Gregory P. Way2, Jennifer A. Doherty3, Casey S. Greene2

1Geisel School of Medicine at Dartmouth, 2University of Pennsylvania, 3University of Utah

Lia, Harrington

Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways, such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators’ attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
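
The enrichment half of this approach can be sketched with a standard hypergeometric overrepresentation test applied to each community separately. The communities are assumed to have been produced already by some community-detection step over a functional network, which is not shown; the function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, gene_list_size, overlap):
    """P(X >= overlap) for a gene list drawn without replacement.

    X counts how many of the `gene_list_size` genes fall in a pathway of
    `pathway_size` genes, out of `universe_size` genes in total.
    """
    upper = min(pathway_size, gene_list_size)
    total = comb(universe_size, gene_list_size)
    tail = sum(comb(pathway_size, k)
               * comb(universe_size - pathway_size, gene_list_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


def enrich_communities(communities, pathways, universe_size):
    """Test every pathway against every community of the gene list.

    `communities` maps community id -> set of genes and `pathways` maps
    pathway name -> set of genes. Returns {(community, pathway): p-value}.
    """
    results = {}
    for cid, genes in communities.items():
        for name, members in pathways.items():
            overlap = len(genes & members)
            results[(cid, name)] = hypergeom_pvalue(
                universe_size, len(members), len(genes), overlap)
    return results
```

Testing each community on its own means a pathway is scored against a smaller, more coherent gene set than the full list. In practice the resulting p-values would also need multiple-testing correction.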

12

CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA

Kipp W. Johnson1, Benjamin S. Glicksberg1, Rachel Hodos1,2, Khader Shameer1, Joel T. Dudley1

1Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai; 2Courant Institute of Mathematical Sciences, New York University

Kipp, Johnson

Hypertension is a major risk factor for ischemic cardiovascular disease and cerebrovascular disease, which are respectively the first and second most common causes of morbidity and mortality across the globe. To alleviate the risks of hypertension, there are a number of effective antihypertensive drugs available. However, the optimal treatment blood pressure goal for antihypertensive therapy remains an area of controversy. The results of the recent Systolic Blood Pressure Intervention Trial (SPRINT), which found benefits for intensive lowering of systolic blood pressure, have been debated for several reasons. We aimed to assess the benefits of treating to four different blood pressure targets and to compare our results to those of SPRINT using a method for causal inference called the parametric g formula. We applied this method to blood pressure measurements obtained from the electronic health records of approximately 200,000 patients who visited the Mount Sinai Hospital in New York, NY. We simulated the effect of four clinically relevant dynamic treatment regimes, assessing the effectiveness of treating to four different blood pressure targets: 150 mmHg, 140 mmHg, 130 mmHg, and 120 mmHg. In contrast to current American Heart Association guidelines and in concordance with SPRINT, we find that targeting 120 mmHg systolic blood pressure is significantly associated with decreased incidence of major adverse cardiovascular events. Causal inference methods applied to electronic health records are a powerful and flexible technique and medicine may benefit from their increased usage.

13

DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS

Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore

University of Pennsylvania

William, La Cava

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.

14

HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES?

Yogasudha C. Veturi, Marylyn D. Ritchie

Biomedical and Translational Informatics Institute, Geisinger

Yogasudha, Veturi

Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary level data suffices for TWAS and offers improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS (TWAS-MP) or (b) using summary-based Mendelian randomization (TWAS-SMR). Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger’s MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression/trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression or (b) performed eQTL analysis. We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP, (b) TWAS-SMR, (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.

15

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

16

CLINGEN CANCER SOMATIC WORKING GROUP – STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH

Subha Madhavan1, Deborah Ritter2, Christine Micheel3, Shruti Rao1, Angshumoy Roy2, Dmitriy Sonkin4, Matthew McCoy1, Malachi Griffith5, Obi L. Griffith5, Peter McGarvey1, Shashikant Kulkarni2, on behalf of the ClinGen Somatic Working Group

1Innovation Center for Biomedical Informatics, Georgetown University, Washington D.C.; 2Baylor College of Medicine and Texas Children's Hospital, Houston, TX; 3Vanderbilt University School of Medicine, Nashville, TN; 4National Cancer Institute, Rockville, MD; 5The McDonnell Genome Institute, Washington University, St. Louis, MO

Subha, Madhavan

A growing number of academic and community clinics are conducting genomic testing to inform treatment decisions for cancer patients. In the last 3-5 years, there has been a rapid increase in clinical use of next generation sequencing (NGS) based cancer molecular diagnostic (MolDx) testing. The increasing availability and decreasing cost of tumor genomic profiling means that physicians can now make treatment decisions armed with patient-specific genetic information. Accumulating research in the cancer biology field indicates that there is significant potential to improve cancer patient outcomes by effectively leveraging this rich source of genomic data in treatment planning. To achieve truly personalized medicine in oncology, it is critical to catalog cancer sequence variants from MolDx testing for their clinical relevance along with treatment information and patient outcomes, and to do so in a way that supports large-scale data aggregation and new hypothesis generation. One critical challenge to encoding variant data is adopting a standard of annotation of those variants that are clinically actionable. Through the NIH-funded Clinical Genome Resource (ClinGen), in collaboration with NLM’s ClinVar database and >50 academic and industry based cancer research organizations, we developed the Minimal Variant Level Data (MVLD) framework to standardize reporting and interpretation of drug associated alterations. We are currently involved in collaborative efforts to align the MVLD framework with parallel, complementary sequence variants interpretation clinical guidelines from the Association of Molecular Pathologists (AMP) for clinical labs. In order to truly democratize access to MolDx data for care and research needs, these standards must be harmonized to support sharing of clinical cancer variants. Here we describe the processes and methods developed within ClinGen’s Somatic WG in collaboration with over 60 cancer care and research organizations as well as CLIA-certified, CAP-accredited clinical testing labs to develop standards for cancer variant interpretation and sharing.

Keywords: ClinGen, Somatic variants, predictive biomarkers, MVLD, data sharing

17

A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS

Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson

Institute for Biomedical Informatics, University of Pennsylvania

Jason, Moore

A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some examples taken a leading role in the open data effort starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient, because it is not possible to know what the ground truth is in real data. Real data must therefore be accompanied by simulated data where the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.

18

BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE

Jessica D. Tenenbaum, Colette Blach

Duke University

Jessica, Tenenbaum

The importance of open data has been increasingly recognized in recent years. Although the sharing and reuse of clinical data for translational research lags behind best practices in biological science, a number of patient-derived datasets exist and have been published enabling translational research spanning multiple scales from molecular to organ level, and from patients to populations. In seeking to replicate metabolomic biomarker results in Alzheimer’s disease our team identified three independent cohorts in which to compare findings. Accessing the datasets associated with these cohorts, understanding their content and provenance, and comparing variables between studies was a valuable exercise in exploring the principles of open data in practice. It also helped inform steps taken to make the original datasets available for use by other researchers. In this paper we describe best practices and lessons learned in attempting to identify, access, understand, and analyze these additional datasets to advance research reproducibility, as well as steps taken to facilitate sharing of our own data.

19

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

20

DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS

Benjamin Chidester1, Minh N. Do2, Jian Ma1

1Carnegie Mellon University, 2University of Illinois at Urbana-Champaign

Benjamin, Chidester

Connecting genotypes to image phenotypes is crucial for a comprehensive understanding of cancer. To learn such connections, new machine learning approaches must be developed for the better integration of imaging and genomic data. Here we propose a novel approach called Discriminative Bag-of-Cells (DBC) for predicting genomic markers using imaging features, which addresses the challenge of summarizing histopathological images by representing cells with learned discriminative types, or codewords. We also developed a reliable and efficient patch-based nuclear segmentation scheme using convolutional neural networks from which nuclear and cellular features are extracted. Applying DBC on TCGA breast cancer samples to predict basal subtype status yielded a class-balanced accuracy of 70% on a separate test partition of 213 patients. As datasets of imaging and genomic data become increasingly available, we believe DBC will be a useful approach for screening histopathological images for genomic markers. Source code of nuclear segmentation and DBC are available at: https://github.com/bchidest/DBC.

21

DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION

Chenglong Huang1, Albert Zhang2, Guanghua Xiao3

1Colleyville Heritage High School, 2Highland Park High School, 3University of Texas Southwestern Medical Center

Chenglong, Huang

Survival prediction is very important in medical treatment. However, recent leading research is challenged by two factors: 1) the datasets usually come with multi-modality; and 2) sample sizes are relatively small. To solve the above challenges, we developed a deep survival learning model to predict patients’ survival outcomes by integrating multi-view data. The proposed network contains two sub-networks, one view-specific and one common sub-network. We designated one CNN-based and one FCN-based sub-network to efficiently handle pathological images and molecular profiles, respectively. Our model first explicitly maximizes the correlation among the views and then transfers feature hierarchies from view commonality and specifically fine-tunes on the survival prediction task. We evaluate our method on real lung and brain tumor datasets to demonstrate the effectiveness of the proposed model using data with multiple modalities across different tumor types.

22

GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL

Zhouyuan Huo1, Dinggang Shen2, Heng Huang1

1University of Pittsburgh, 2University of North Carolina at Chapel Hill

Heng, Huang

Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights of SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights of SNPs.
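
For readers unfamiliar with the group-lasso penalty mentioned above, here is a minimal sketch of the ℓ2,1-norm of a coefficient matrix. It assumes rows index SNPs and columns index imaging QTs; that layout, the function names and the toy numbers are illustrative assumptions, and the low-rank part of the proposed model is not shown.

```python
from math import sqrt


def l21_norm(weights):
    """Sum over rows of the Euclidean norm of each row.

    With rows as SNPs and columns as QTs, penalising this quantity pushes
    whole rows to zero, i.e. a SNP is kept or dropped for all QTs jointly.
    """
    return sum(sqrt(sum(w * w for w in row)) for row in weights)


def selected_rows(weights, tol=1e-12):
    """Indices of rows (SNPs) whose coefficients are not all (near) zero."""
    return [i for i, row in enumerate(weights)
            if sqrt(sum(w * w for w in row)) > tol]
```

For a 3-SNP, 2-QT matrix `[[3, 4], [0, 0], [5, 12]]` the row norms are 5, 0 and 13, so the ℓ2,1-norm is 18 and only the first and third SNPs are selected.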

23

CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER

Jason E. Miller1, Manu K. Shivakumar2, Shannon L. Risacher2, Andrew J. Saykin2, Seunggeun Lee3, Kwangsik Nho2, Dokyoon Kim1,4

1Geisinger Health System, 2Indiana University School of Medicine, 3University of Michigan, 4Pennsylvania State University

Jason, Miller

Alzheimer’s disease (AD) is a neurodegenerative disorder with few biomarkers even though it impacts a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to a codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon anti-codon interactions, respectively. Although association tests have been utilized before to identify genes associated with AD, it remains unclear how codon bias plays a role and if it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. While using all synonymous variants we did not identify any genome-wide significant associations, using only synonymous variants that affected codon frequency we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon among minor alleles and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.

24

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

25

SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET

Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier

University of Arizona

Joanne, Berghout

Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single subject analysis that uses a mixture model for transcript fold-change clustering from isogenic paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO as negatively_regulates edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject’s response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicates per genotype, and cross-methods to reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve = 0.96, n = 24 genotypes x 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity

26

USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT

Chih-Lin Chi1, Lu He2, Kourosh Ravvaz3, John Weissert3, Peter J. Tonellato4,5

1School of Nursing & Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; 2Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA; 3Aurora Health Care, Milwaukee, WI, USA; 4Department of Biomedical Informatics, Department of Pathology, Harvard Medical School, Boston, MA, USA; 5Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

Chih-Lin, Chi

We apply a treatment simulation and optimization approach to develop decision support guidance for warfarin precision treatment plans. Simulation includes the use of ~1,500,000 clinical avatars (simulated patients) generated by an integrated data-driven and domain-knowledge based Bayesian Network Modeling approach. Subsequently, we simulate 30-day individual patient response to warfarin treatment of five clinical and genetic treatment plans followed by both individual and sub-population based optimization. Sub-population optimization (compared to individual optimization) provides a cost effective and realistic means of implementation of a precision-driven treatment plan in practical settings. In this project, we use the property of minimal entropy to minimize overall adverse risks for the largest possible patient sub-populations and we temper the results by considering both transparency and ease of implementation. Finally, we discuss the improved outcome of the precision treatment plan based on the sub-population optimized decision support rules.

27

COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES

Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall

Stanford University

Dennis, Wall

Despite mounting evidence for the strong role of genetics in the phenotypic manifestation of Autism Spectrum Disorder (ASD), the specific genes responsible for the variable forms of ASD remain undefined. ASD may be best explained by a combinatorial genetic model with varying epistatic interactions across many small effect mutations. Coalitional or cooperative game theory is a technique that studies the combined effects of groups of players, known as coalitions, seeking to identify players who tend to improve the performance (the relationship to a specific disease phenotype) of any coalition they join. This method has been previously shown to boost biologically informative signal in gene expression data but to date has not been applied to the search for cooperative mutations among putative ASD genes. We describe our approach to highlight genes relevant to ASD using coalitional game theory on alteration data of 1,965 fully sequenced genomes from 756 multiplex families. Alterations were encoded into binary matrices for ASD (case) and unaffected (control) samples, indicating likely gene-disrupting, inherited mutations in altered genes. To determine individual gene contributions given an ASD phenotype, a “player” metric, referred to as the Shapley value, was calculated for each gene in the case and control cohorts. Sixty-seven genes were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Using network and cross-study analysis, we found that these genes are involved in biological pathways known to be affected in the autism cases and that a subset directly interact with several genes known to have strong associations to autism. These findings suggest that coalitional game theory can be applied to large-scale genomic data to identify hidden yet influential players in complex polygenic disorders such as autism.
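
The Shapley value mentioned in this abstract can be illustrated on a toy example. The sketch below is not the authors' pipeline: the abstract does not state how a coalition's worth is defined, so the `coverage` function (number of case samples carrying an alteration in at least one gene of the coalition), the gene labels and the `CASES` data are all hypothetical assumptions chosen for illustration. The exact enumeration over orderings is only feasible for a handful of players.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical toy data: each entry is the set of altered genes in one case sample.
CASES = [{"A", "B"}, {"A"}, {"B", "C"}, {"A", "C"}]


def coverage(coalition):
    """Assumed coalition worth: cases with at least one altered gene in the coalition."""
    return sum(1 for altered in CASES if altered & coalition)


scores = shapley_values(["A", "B", "C"], coverage)
print(scores)
```

In this toy game the scores add up to the worth of the full gene set, which is the efficiency property that makes the Shapley value usable as a per-gene contribution score. For realistic gene counts, exact enumeration is intractable and a sampling or closed-form approximation would be needed.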

28

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE

Alena Orlenko1, Jason H. Moore1, Patryk Orzechowski1, Randal S. Olson1, Junmei Cairns2, Pedro J. Caraballo2, Richard M. Weinshilboum2, Liewei Wang2, Matthew K. Breitenstein1

1University of Pennsylvania, 2Mayo Clinic

Matthew, Breitenstein

With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated potential to lack sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.

29

ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS

Sarah Poole, Nigam Shah

Stanford University

Sarah, Poole

Alarm fatigue, a condition in which clinical staff become desensitized to alarms due to the high frequency of unnecessary alarms, is a major patient safety concern. Alarm fatigue is particularly prevalent in the pediatric setting, due to the high level of variation in vital signs with patient age. Existing studies have shown that the current default pediatric vital sign alarm thresholds are inappropriate, and lead to a larger than necessary alarm load. This study leverages a large database containing over 190 patient-years of heart rate data to accurately identify the 1st and 99th percentiles of an individual’s heart rate on their first day of vital sign monitoring. These percentiles are then used as personalized vital sign thresholds, which are evaluated by comparing to non-default alarm thresholds used in practice, and by using the presence of major clinical events to infer alarm labels. Using the proposed personalized thresholds would decrease low and high heart rate alarms by up to 50% and 44% respectively, while maintaining sensitivity of 62% and increasing specificity to 49%. The proposed personalized vital sign alarm thresholds will reduce alarm fatigue, thus contributing to improved patient outcomes, shorter hospital stays, and reduced hospital costs.
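
The percentile-based thresholding described in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the linear-interpolation percentile rule and the sample heart rates are assumptions, and the abstract does not say how the percentiles were estimated.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def personalized_thresholds(first_day_heart_rates, low_q=1, high_q=99):
    """Low and high alarm thresholds taken from one patient's first-day data."""
    return (percentile(first_day_heart_rates, low_q),
            percentile(first_day_heart_rates, high_q))


def alarms(heart_rates, low, high):
    """Readings that fall outside the personalized thresholds."""
    return [hr for hr in heart_rates if hr < low or hr > high]


# Hypothetical first-day readings (beats per minute) for one patient.
first_day = list(range(60, 161))
low, high = personalized_thresholds(first_day)
flagged = alarms([55, 100, 159, 170], low, high)
print(low, high, flagged)
```

Taking the 1st and 99th percentiles of the patient's own readings means that, by construction, roughly 2% of first-day readings fall outside the thresholds, whatever the patient's age-dependent baseline.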

30

EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES

Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier

The University of Arizona

Yves, Lussier

Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved) in spite of considerable discovery efforts deployed and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway’s gene expression alterations may be heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we have expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual’s variability in cohort scenarios. We compare SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that the emergence of a pathway-level signal occurs through the summative effect of its altered gene expression, heterogeneous across patients. Studied variables include pathway gene set size, fraction of expressed genes responsive within the gene set, fraction of responsive expressed genes up- vs down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risks that are simultaneously confounded by patient-specific “genome-by-environment” interactions (G×E). Area under the precision-recall curve of the SS approach far surpassed that of the CB (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65). We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation by the inclusion of a patient’s individual variability. http://www.lussiergroup.org/publications/PathwayMarker/

31

ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS

Jason Westra1,2, Nicholas Hartman3, Bethany Lake4, Gregory Shearer5, Nathan Tintle1

1Dordt College, 2Iowa State University, 3Cornell University, 4Elon University, 5The Pennsylvania State University

Jason, Westra

Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data was simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using the likelihood ratio test that leverages Gaussian mixture models, and creates a model with improved sensitivity and interpretability.

32

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

33

CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS

Jiali Han1, Jianrong Li1, Ikbel Achour1, Lorenzo Pesce2, Ian Foster2, Haiquan Li3, Yves A. Lussier3

1Center for Biomedical Informatics and Biostatistics (CB2) and Departments of Medicine and of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA; 2Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, IL 60637, USA; 3CB2, BIO5 Institute, UACC, and Dept of Medicine, The University of Arizona, Tucson, AZ 85721, USA

Haiquan, Li

Eighty percent of DNA outside protein coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR ≤ 5%, we prioritized 3,870 associated intergenic SNP pairs, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized as compared to those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ~3.8; p ≤ 10E-25). The prioritized pairs were also enriched in regions bound to the same/interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ~2,500; p ≤ 10E-25). This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease_class

34

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS

Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1

1Ohio State University, 2Georgia Institute of Technology, 3Indiana University

Travis, Johnson

Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as “junk DNAs”, since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be “reactivated” in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of the olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks with respect to the gene ontology, from which we identify functions enriched in these pseudogene-gene families and infer the functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.

35

LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS

Elisabetta Manduchi1,2, Alessandra Chesi2, Molly A. Hall1, Struan F.A. Grant2, Jason H. Moore1

1University of Pennsylvania, 2The Children’s Hospital of Philadelphia

Elisabetta, Manduchi

We utilized evidence for enhancer-promoter interactions from functional genomics data in order to build biological filters to narrow down the search space for two-way Single Nucleotide Polymorphism (SNP) interactions in Type 2 Diabetes (T2D) Genome Wide Association Studies (GWAS). This has led us to the identification of a reproducible statistically significant SNP pair associated with T2D. As more functional genomics data are being generated that can help identify potentially interacting enhancer-promoter pairs in larger collections of tissues/cells, this approach has implications for investigation of epistasis from GWAS in general.

36

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

37

IMPROVING PRECISION IN CONCEPT NORMALIZATION

Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter

Computational Bioscience Program, University of Colorado School of Medicine

Mayla, Boguslav

Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to prefer to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. They draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.

38

VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION

Edward W. Huang, Sheng Wang, ChengXiang Zhai

University of Illinois at Urbana-Champaign

Edward, Huang

In this paper, we present VisAGE, a method that visualizes electronic medical records (EMRs) in a low-dimensional space. Effective visualization of new patients allows doctors to view similar, previously treated patients and to identify the new patients' disease subtypes, reducing the chance of misdiagnosis. However, EMRs are typically incomplete or fragmented, resulting in patients who are missing many available features being placed near unrelated patients in the visualized space. VisAGE integrates several external data sources to enrich EMR databases to solve this issue. We evaluated VisAGE on a dataset of Parkinson's disease patients. We qualitatively and quantitatively show that VisAGE can more effectively cluster patients, which allows doctors to better discover patient subtypes and thus improve patient care.

39

ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS

Sheng Wang1, Jianzhu Ma2, Michael Ku Yu2, Fan Zheng2, Edward W. Huang1, Jiawei Han1, Jian Peng1, Trey Ideker2

1Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA, 2School of Medicine, University of California San Diego, San Diego, CA, USA

Jianzhu, Ma

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

40

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

41

PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES

Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje

Stanford University

Peyton, Greenside

Identification of small molecule ligands that bind to proteins is a critical step in drug discovery. Computational methods have been developed to accelerate the prediction of protein-ligand binding, but often depend on 3D protein structures. As only a limited number of protein 3D structures have been resolved, the ability to predict protein-ligand interactions without relying on a 3D representation would be highly valuable. We use an interpretable confidence-rated boosting algorithm to predict protein-ligand interactions with high accuracy from ligand chemical substructures and protein 1D sequence motifs, without relying on 3D protein structures. We compare several protein motif definitions, assess generalization of our model’s predictions to unseen proteins and ligands, demonstrate recovery of well established interactions and identify globally predictive protein-ligand motif pairs. By bridging biological and chemical perspectives, we demonstrate that it is possible to predict protein-ligand interactions using only motif-based features and that interpretation of these features can reveal new insights into the molecular mechanics underlying each interaction. Our work also lays a foundation to explore more predictive feature sets and sophisticated machine learning approaches as well as other applications, such as predicting unintended interactions or the effects of mutations.

42

LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS

Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley

Icahn School of Medicine at Mount Sinai

Milo, Smith

The high and increasing prevalence of neurodevelopmental disorders places enormous personal and economic burdens on society. Given the growing realization that the roots of neurodevelopmental disorders often lie in early childhood, there is an urgent need to identify childhood risk factors. Neurodevelopment is marked by periods of heightened experience-dependent neuroplasticity wherein neural circuitry is optimized by the environment. If these critical periods are disrupted, development of normal brain function can be permanently altered, leading to neurodevelopmental disorders. Here, we aim to systematically identify human variants in neuroplasticity-related genes that confer risk for neurodevelopmental disorders. Historically, this knowledge has been limited by a lack of techniques to identify genes related to neurodevelopmental plasticity in a high-throughput manner and a lack of methods to systematically identify mutations in these genes that confer risk for neurodevelopmental disorders. Using an integrative genomics approach, we determined loss-of-function (LOF) variants in putative plasticity genes, identified from transcriptional profiles of brain from mice with elevated plasticity, that were associated with neurodevelopmental disorders. From five shared differentially expressed genes found in two mouse models of juvenile-like elevated plasticity (juvenile wild-type or adult Lynx1-/- relative to adult wild-type) that were also genotyped in the Mount Sinai BioMe Biobank, we identified multiple associations between LOF genes and increased risk for neurodevelopmental disorders across 10,510 patients linked to the Mount Sinai Electronic Medical Records (EMR), including epilepsy and schizophrenia. This work demonstrates a novel approach to identify neurodevelopmental risk genes and points toward a promising avenue to discover new drug targets to address the unmet therapeutic needs of neurodevelopmental disease.

43

DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES

Jielin Xu1, Kelly Regan1, Siyuan Deng1, William E. Carson III2, Philip R.O. Payne3, Fuhai Li1

1Department of Biomedical Informatics, The Ohio State University; 2Comprehensive Cancer Center, The Ohio State University; 3Institute for Informatics, Washington University in St. Louis

Fuhai, Li

The emergence of drug resistance to traditional chemotherapy and newer targeted therapies in cancer patients is a major clinical challenge. Reactivation of the same or compensatory signaling pathways is a common class of drug resistance mechanisms. Employing drug combinations that inhibit multiple modules of reactivated signaling pathways is a promising strategy to overcome and prevent the onset of drug resistance. However, with thousands of available FDA-approved and investigational compounds, it is infeasible to experimentally screen millions of possible drug combinations with limited resources. Therefore, computational approaches are needed to constrain the search space and prioritize synergistic drug combinations for preclinical studies. In this study, we propose a novel approach for predicting drug combinations through investigating potential effects of drug targets on the disease signaling network. We first construct a disease signaling network by integrating gene expression data with disease-associated driver genes. Individual drugs that can partially perturb the disease signaling network are then selected based on a drug-disease network “impact matrix”, which is calculated using network diffusion distance from drug targets to signaling network elements. The selected drugs are subsequently clustered into communities (subgroups), which are proposed to share similar mechanisms of action. Finally, drug combinations are ranked according to maximal impact on signaling sub-networks from distinct mechanism-based communities. Our method is advantageous compared to other approaches in that it does not require large amounts of drug dose response data, drug-induced “omics” profiles or clinical efficacy data, which are not often readily available. We validate our approach using a BRAF-mutant melanoma signaling network and combinatorial in vitro drug screening data, and report drug combinations with diverse mechanisms of action and opportunities for drug repositioning.
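
To make the idea of diffusing from drug targets over a signaling network concrete, the sketch below uses a random walk with restart, a common network diffusion technique. This is an assumption for illustration only: the abstract does not specify which diffusion model or distance the authors used, and the function name, the toy network and the node labels are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, iterations=100):
    """Diffuse probability mass from seed nodes over an undirected network.

    adjacency maps each node to the list of its neighbours; seeds are the
    nodes where the walk starts and restarts (for example, drug targets).
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # Each step: restart at the seeds, or move to a random neighbour.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            share = (1.0 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical toy signaling network: a chain A - B - C, with the drug target at A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
impact = random_walk_with_restart(network, seeds={"A"})
print(impact)
```

Nodes closer to the seed receive a larger share of the diffused mass, so a row of such scores per drug gives one plausible way to fill a drug-by-network-element matrix of the kind the abstract calls an impact matrix.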

44

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

45

OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE

Tiffany J. Callahan1, William A. Baumgartner Jr.1, Michael Bada1, Adrianne L. Stefanski1, Ignacio Tripodi2, Elizabeth K. White1, Lawrence E. Hunter1

1University of Colorado Denver Anschutz Medical Campus, 2University of Colorado Boulder

Tiffany, Callahan

Our knowledge of the biological mechanisms underlying complex human disease is largely incomplete. While Semantic Web technologies, such as the Web Ontology Language (OWL), provide powerful techniques for representing existing knowledge, well-established OWL reasoners are unable to account for missing or uncertain knowledge. The application of inductive inference methods, like machine learning and network inference, is vital for extending our current knowledge. Therefore, robust methods which facilitate inductive inference on rich OWL-encoded knowledge are needed. Here, we propose OWL-NETS (NEtwork Transformation for Statistical learning), a novel computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. Using several examples built with the Open Biomedical Ontologies, we show that OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically-relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness.

46

AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA

Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu

College of Medicine, Jan and Dan Duncan Neurological Research Institute

Hyun-Hwan, Jeong

Transposable elements (TEs) are DNA sequences which are capable of moving from one location to another and represent a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena such as cancer, neurodegenerative disease, and aging. Rapid development in RNA-sequencing technology has enabled us, for the first time, to study the activity of TEs at the systems level. However, efficient TE analysis tools are not yet developed. In this work, we developed SalmonTE, a fast and reliable pipeline for the quantification of TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and human cell lines. We achieved 20 times faster execution speed without compromising the accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE-centric discoveries.

47

IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH

Dragutin Petkovic1,3, Russ B. Altman2, Mike Wong3, Arthur Vigil4

1Computer Science Department, San Francisco State University (SFSU), 1600 Holloway Ave., San Francisco, CA 94132; 2Department of Bioengineering, Stanford University, 443 Via Ortega Drive, Stanford, CA 94305-4145; 3SFSU Center for Computing for Life Sciences, 1600 Holloway Ave., San Francisco, CA 94132; 4Twist Bioscience, 455 Mission Bay Boulevard South, San Francisco, CA 94158

Dragutin, Petkovic

Machine Learning (ML) methods are now influencing major decisions about patient care, new medical methods and drug development, and their use and importance are rapidly increasing in all areas. However, these ML methods are inherently complex and often difficult to understand and explain, resulting in barriers to their adoption and validation. Our work (RFEX) focuses on enhancing Random Forest (RF) classifier explainability by developing easy to interpret explainability summary reports from trained RF classifiers as a way to improve the explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing the RFEX method we apply a user-centered approach driven by explainability questions and requirements collected by discussions with interested practitioners. We performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness. Analysis of the RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (from 2 to 6, depending on the model) top ranked features to achieve 90% or better of the accuracy when all 480 features are used.

Keywords: Random Forest, Explainability, Interpretability, Stanford FEATURE

48

TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY

Katherine Shoemaker1, Brian P. Hobbs2, Karthik Bharath3, Chaan S. Ng2, Veerabhadran Baladandayuthapani2

1Rice University, 2MD Anderson Cancer Center, 3University of Nottingham

Katherine, Shoemaker

Solid lesions emerge within diverse tissue environments, making their characterization and diagnosis a challenge. With the advent of cancer radiomics, a variety of techniques have been developed to transform images into quantifiable feature sets producing summary statistics that describe the morphology and texture of solid masses. Relying on empirical distribution summaries as well as grey-level co-occurrence statistics, several approaches have been devised to characterize tissue density heterogeneity. This article proposes a novel decision-tree based approach which quantifies the tissue density heterogeneity of a given lesion through its resultant distribution of tree-structured dissimilarity metrics computed with least common ancestor trees under repeated pixel re-sampling. The methodology, based on statistics derived from Galton-Watson trees, produces metrics that are minimally correlated with existing features, adding new information to the feature space and improving quantitative characterization of the extent to which a CT image conveys heterogeneous density distribution. We demonstrate its practical application through a diagnostic study of adrenal lesions. Integrating the proposed features with existing features identifies classifiers of three important lesion types: malignant from benign (AUC = 0.78), functioning from non-functioning (AUC = 0.93) and calcified from non-calcified (AUC of 1).

49

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

50

IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS

Vivekanand Sharma, Indra Neil Sarkar

Center for Biomedical Informatics, Brown University

Vivekanand, Sharma

Data on safety and efficacy issues associated with natural health products and dietary supplements (NHP&S) remains largely cloistered within domain specific databases or embedded within general biomedical data sources. A major challenge in leveraging analytic approaches on such data is due to the inefficient ability to retrieve relevant data, which includes a general lack of interoperability among related sources. This study developed a thesaurus of NHP&S ingredient terms that can be used by existing biomedical natural language processing (NLP) tools for extracting information of interest. This process was evaluated relative to intervention name strings sampled from the United States Food and Drug Administration Adverse Event Reporting System (FAERS). A use case was used to demonstrate the potential to utilize FAERS for monitoring NHP&S adverse events. The results from this study provide insights on approaches for identifying additional knowledge from extant repositories of knowledge, and potentially as information that can be included into larger curation efforts.

51

DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING

John Darrell Van Horn1, Lily Fierro2, Jeana Kamdar1, Jonathan Gordon2, Crystal Stewart1, Avnish Bhattrai1, Sumiko Abe1, Xiaoxiao Lei1, Caroline O’Driscoll1, Aakanchha Sinha2, Priyambada Jain2, Gully Burns2, Kristina Lerman2, José Luis Ambite2

1USC Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Avenue, SHN, Los Angeles, CA 90033, Phone: 323-442-7246; 2Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

John, Van Horn

The biomedical sciences have experienced an explosion of data which promises to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning, and open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required - in effect, using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.

52

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

53

HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE

Bhim M. Adhikari1, Neda Jahanshad2, Dinesh Shukla1, David C. Glahn3, John Blangero4, Richard C. Reynolds5, Robert W. Cox5, Els Fieremans6, Jelle Veraart6, Dmitry S. Novikov6, Thomas E. Nichols7, L. Elliot Hong1, Paul M. Thompson2, Peter Kochunov1

1Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA; 2Imaging Genetics Center, Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine of USC, Marina del Rey, CA, USA; 3Department of Psychiatry, Yale University, School of Medicine, New Haven, CT, USA; 4Genomics Computing Center, University of Texas at Rio Grande Valley, USA; 5National Institute of Mental Health, Bethesda, MD, USA; 6Center for Biomedical Imaging, Department of Radiology, New York University School of Medicine, NY, USA; 7Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK

Peter, Kochunov

Big data initiatives such as the Enhancing NeuroImaging Genetics through Meta-Analysis consortium (ENIGMA), combine data collected by independent studies worldwide to achieve more generalizable estimates of effect sizes and more reliable and reproducible outcomes. Such efforts require harmonized image analyses protocols to extract phenotypes consistently. This harmonization is particularly challenging for resting state fMRI due to the wide variability of acquisition protocols and scanner platforms; this leads to site-to-site variance in quality, resolution and temporal signal-to-noise ratio (tSNR). An effective harmonization should provide optimal measures for data of different qualities. We developed a multi-site rsfMRI analysis pipeline to allow research groups around the world to process rsfMRI scans in a harmonized way, to extract consistent and quantitative measurements of connectivity and to perform coordinated statistical tests. We used the single-modality ENIGMA rsfMRI preprocessing pipeline based on model-free Marchenko-Pastur PCA based denoising to verify and replicate resting state network heritability estimates. We analyzed two independent cohorts, GOBS (Genetics of Brain Structure) and HCP (the Human Connectome Project), which collected data using conventional and connectomics oriented fMRI protocols, respectively. We used seed-based connectivity and dual-regression approaches to show that the rsfMRI signal is consistently heritable across twenty major functional network measures. Heritability values of 20-40% were observed across both cohorts.

54

MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS

Lichy Han, Maulik R. Kamdar

Program in Biomedical Informatics, Stanford University School of Medicine

Lichy, Han

Glioblastoma Multiforme (GBM), a malignant brain tumor, is among the most lethal of all cancers. Temozolomide is the primary chemotherapy treatment for patients diagnosed with GBM. The methylation status of the promoter or the enhancer regions of the O6-methylguanine methyltransferase (MGMT) gene may impact the efficacy and sensitivity of temozolomide, and hence may affect overall patient survival. Microscopic genetic changes may manifest as macroscopic morphological changes in the brain tumors that can be detected using magnetic resonance imaging (MRI), which can serve as noninvasive biomarkers for determining methylation of MGMT regulatory regions. In this research, we use a compendium of brain MRI scans of GBM patients collected from The Cancer Imaging Archive (TCIA) combined with methylation data from The Cancer Genome Atlas (TCGA) to predict the methylation state of the MGMT regulatory regions in these patients. Our approach relies on a bi-directional convolutional recurrent neural network architecture (CRNN) that leverages the spatial aspects of these 3-dimensional MRI scans. Our CRNN obtains an accuracy of 67% on the validation data and 62% on the test data, with precision and recall both at 67%, suggesting the existence of MRI features that may complement existing markers for GBM patient stratification and prognosis. We have additionally presented our model via a novel neural network visualization platform, which we have developed to improve interpretability of deep learning MRI-based classification models.

55

BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES

Arunima Srivastava1, Chaitanya Kulkarni1, Parag Mallick2, Kun Huang3, Raghu Machiraju1

1The Ohio State University, 2Stanford University, 3Indiana University School of Medicine

Arunima, Srivastava

Utilization of single modality data to build predictive models in cancer results in a rather narrow view of most patient profiles. Some clinical facets relate strongly to histology image features, e.g. tumor stages, whereas others are associated with genomic and proteomic variations (e.g. cancer subtypes and disease aggression biomarkers). We hypothesize that there are coherent “trans-omics” features that characterize varied clinical cohorts across multiple sources of data leading to more descriptive and robust disease characterization. In this work, for 105 breast cancer patients from the TCGA (The Cancer Genome Atlas), we consider four clinical attributes (AJCC Stage, Tumor Stage, ER-Status and PAM50 mRNA Subtypes), and build predictive models using three different modalities of data (histopathological images, transcriptomics and proteomics). Following which, we identify critical multi-level features that drive successful classification of patients for the various different cohorts. To build predictors for each data type, we employ widely used “best practice” techniques including CNN-based (convolutional neural network) classifiers for histopathological images and regression models for proteogenomic data. While, as expected, histology images outperformed molecular features while predicting cancer stages, and transcriptomics held superior discriminatory power for ER-Status and PAM50 subtypes, there exist a few cases where all data modalities exhibited comparable performance. Further, we also identified sets of key genes and proteins whose expression and abundance correlate across each clinical cohort including (i) tumor severity and progression (incl. GABARAP), (ii) ER-status (incl. ESR1) and (iii) disease subtypes (incl. FOXC1). Thus, we quantitatively assess the efficacy of different data types to predict critical breast cancer patient attributes and improve disease characterization.

56

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

57

LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS

Alexandra E. Fish1, Dana C. Crawford2, John A. Capra1, William S. Bush2

1Vanderbilt University, 2Case Western Reserve University

William, Bush

Genomic maps of local ancestry identify ancestry transitions – points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.

58

EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION

Binglan Li1, Shefali S. Verma1,2, Yogasudha C. Veturi2, Anurag Verma1,2, Yuki Bradford2, David W. Haas3,4, Marylyn D. Ritchie1,2

1The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; 2Biomedical and Translational Informatics Institute, Danville, PA, USA; 3Department of Medicine, Pharmacology, Pathology, Microbiology & Immunology, Vanderbilt University School of Medicine, Nashville, TN, USA; 4Department of Internal Medicine, Meharry Medical College, Nashville, TN, USA

Binglan, Li

Genome-wide association studies (GWAS) have been successful in facilitating the understanding of genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect size, GWAS requires very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. But most importantly, the identified associations can reveal the genes that are under regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also looked into some potential influential factors and a filter criterion with the aim of improving PrediXcan performance. As for GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs into prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan’s performance that will be of value to eQTL and complex human disease research.
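
As a rough illustration of the two steps described in this abstract (predicting expression from SNP-expression models, then comparing predicted with observed levels), the sketch below scores one gene as a weighted sum of allele dosages and computes a Pearson correlation. It is a minimal sketch, not the PrediXcan software: the SNP identifiers, weights, dosages and observed values are all hypothetical, and the function names are chosen here for illustration.

```python
from math import sqrt


def predict_expression(weights, dosages):
    """Predicted expression of one gene: weighted sum of allele dosages (0 to 2)."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-SNP weights for a single gene (illustrative values only).
weights = {"snp_a": 0.5, "snp_b": -0.25}

# Hypothetical genotype dosages for three individuals.
cohort = [
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 1},
    {"snp_a": 2, "snp_b": 0},
]

# Hypothetical measured expression for the same three individuals.
observed = [-0.4, 0.3, 1.1]

predicted = [predict_expression(weights, d) for d in cohort]
r = pearson(predicted, observed)
print(predicted, round(r, 3))
```

A gene whose predicted and observed values correlate poorly under such a check would be a candidate for the kind of filter criterion the abstract mentions.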

59

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

60

PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA

Travers Ching1,2, Lana X. Garmire1,2

1Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI 96822, USA; 2Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA

Lana, Garmire

Long intergenic non-coding RNAs have been shown to play important roles in cancer. However, because lincRNAs are a relatively new class of RNAs compared to protein-coding mRNAs, the mutational landscape of lincRNAs has not been as extensively studied. Here we characterize expressed somatic nucleotide variants within lincRNAs using 12 cancer RNA-Seq datasets in TCGA. We build machine-learning models to discriminate somatic variants from germline variants within lincRNA regions (AUC 0.987). We build another model to differentiate lincRNA somatic mutations from background regions (AUC 0.72) and find several molecular features that are strongly associated with lincRNA mutations, including copy number variation, conservation, substitution type and histone marker features.

61

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

62

GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE

Paul Previde1, Brook Thomas1, Mike Wong1, Emily K. Mallory2, Dragutin Petkovic1, Russ B. Altman2, Anagha Kulkarni1

1San Francisco State University, 2Stanford University

Anagha, Kulkarni

Obtaining relevant information about gene interactions is critical for understanding disease processes and treatment. With the rise in text mining approaches, the volume of such biomedical data is rapidly increasing, thereby creating a new problem for the users of this data: information overload. A tool for efficient querying and visualization of biomedical data that helps researchers understand the underlying biological mechanisms for diseases and drug responses, and ultimately helps patients, is sorely needed. To this end we have developed GeneDive, a web-based information retrieval, filtering, and visualization tool for large volumes of gene interaction data. GeneDive offers various features and modalities that guide the user through the search process to efficiently reach the information of their interest. GeneDive currently processes over three million gene-gene interactions with response times within a few seconds. For over half of the curated gene sets sourced from four prominent databases, more than 80% of the gene set members are recovered by GeneDive. In the near future, GeneDive will seamlessly accommodate other interaction types, such as gene-drug and gene-disease interactions, thus enabling full exploration of topics such as precision medicine. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net.

63

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

POSTER PRESENTATIONS

64

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES

Rachel Hodos1,2, Ping Zhang3, Hao-Chih Lee1, Qiaonan Duan1, Zichen Wang1, Neil R. Clark1, Avi Ma’ayan1, Fei Wang3,4, Brian Kidd1, Jianying Hu3, David Sontag5, Joel T. Dudley1

1Icahn School of Medicine at Mount Sinai, 2New York University, 3IBM T.J. Watson Research Center, 4Cornell University, 5Massachusetts Institute of Technology

Rachel, Hodos

Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions -- i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
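
The local (nearest-neighbors) idea in this abstract can be sketched on a toy drug-by-cell tensor stored as a dictionary of gene-expression vectors. This is only one plausible reading, under stated assumptions: drug similarity is taken as the mean cosine similarity over cell types where both drugs were measured, and the prediction is the average of the k most similar drugs' profiles in the target cell type. The similarity measure, the function names and all data values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_profile(tensor, drug, cell, k=2):
    """Predict the unmeasured (drug, cell) profile from similar drugs measured in `cell`.

    `tensor` maps (drug, cell_type) to a list of per-gene expression changes.
    Returns None when no neighbor drug shares a measured cell type with `drug`.
    """
    scored = []
    for (other, c), profile in tensor.items():
        if c != cell or other == drug:
            continue
        # Similarity of `drug` and `other`, averaged over cell types where both exist.
        shared = [
            cosine(tensor[(drug, c2)], tensor[(other, c2)])
            for (d2, c2) in tensor
            if d2 == drug and (other, c2) in tensor
        ]
        if shared:
            scored.append((sum(shared) / len(shared), profile))
    if not scored:
        return None
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    n_genes = len(top[0][1])
    return [sum(p[g] for _, p in top) / len(top) for g in range(n_genes)]


# Hypothetical toy tensor: three drugs, two cell types, three genes.
tensor = {
    ("drugA", "cell1"): [1.0, -1.0, 0.5],
    ("drugB", "cell1"): [0.9, -1.1, 0.4],
    ("drugC", "cell1"): [-1.0, 1.0, -0.5],
    ("drugB", "cell2"): [2.0, 0.0, 1.0],
    ("drugC", "cell2"): [0.0, 2.0, -1.0],
}

# drugA was never measured in cell2; borrow from its single nearest neighbor.
prediction = predict_profile(tensor, "drugA", "cell2", k=1)
print(prediction)
```

A global tensor-completion method would instead fit all entries jointly, which may explain why the two approaches perform best in different regions of the drug-cell space.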

65

SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA

Jinho Kim1, Hongui Cha2, Hyun-Tae Shin2, Boram Lee2, Jae Won Yun2, Joon Ho Kang3, Woong-Yang Park1

1Samsung Medical Center, 2Sungkyunkwan University, 3Sungkyunkwan University School of Medicine

Jinho, Kim

Molecular profiling is a key component of precision medicine for cancer, as it provides targetable genes or pathways for preventing tumor growth. In this regard, more and more cancer clinics employ clinical sequencing platforms and are accumulating clinicogenomics data. However, it has not been systematically studied how genomic alterations, in particular variants in DNA, can help in predicting clinical outcomes. Here we describe systematic analyses to gain biological insights from a cancer genome databank associated with the clinical information. We established a large databank of clinical and genomic information through our NGS-based clinical sequencing platform, CancerSCAN. We identified novel clinically relevant variant markers which are potentially implicated in patient survival and response to chemotherapeutic agents. Finally, we built a multigene model to predict clinical outcome. The model correctly captured clinically relevant somatic variants and was validated using an independent cohort. Our study provides a valuable resource to realize precision oncology.

66

IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING

Gang Liu1, Justin Li2, G.L. Prasad1

1RAI Services Company, P.O. Box 1487, Winston-Salem, NC 27102, USA; 2AccuraScience, 5721 Merle Hay Road, Johnston, IA 50131, USA

Background: Chronic cigarette smoking adversely impacts multiple organs and is a major risk factor for several diseases such as cancer, cardiovascular diseases, and chronic obstructive pulmonary disease (COPD). Because smoking-related diseases often develop over a long period, it is useful to investigate the effects of smoking in healthy individuals to understand the pre-clinical changes that lead to disease states. Those early molecular events could be further developed into biomarkers that are indicative of the adverse effects of smoking. Several classes of different tobacco products, including electronic cigarettes (E-cigs), are currently marketed in the USA, and their impact on consumers has not yet been fully understood. Given that there is no epidemiology data available for these new classes of tobacco products, an understanding of the early molecular and cellular changes in healthy consumers could help to differentiate the effects of cigarettes and other classes of tobacco products. Toward that end, in this study, we aim to develop predictive gene signatures that can be used to differentiate smokers from non-tobacco consumers.

Methods: The data we used for identification of gene signatures were derived from blood-based genome-wide expression profiles from 40 smokers and 40 non-tobacco consumers enrolled in a cross-sectional biomarker study. We systematically evaluated the performance of several machine learning algorithms. These algorithms are combinations of four classification methods, including Support Vector Machine (SVM), and four feature selection methods including Recursive Feature Elimination (RFE). Each gene expression signature model was constructed using a two-layer cross-validation scheme. They were evaluated using accuracy and Matthews correlation coefficient (MCC), which are performance evaluation metrics widely used in machine learning techniques.

Results: Our results suggest that SVM combined with RFE outperforms the 15 other algorithms we have tested. This led to identification of a 32-gene signature with high sensitivity and specificity. In addition, this new gene signature achieves excellent validation results (accuracy: 0.87, MCC: 0.7) when evaluated using another independent microarray dataset from smokers and non-smokers. The genes in the 32-gene signature include previously reported gene biomarkers such as GPR15, SASH1, and LRRN3, and also consist of additional novel genes associated with inflammation, liver injury, and arachidonic acid metabolism. We are currently working to further refine and validate this gene signature using other publicly available smoking-related gene expression datasets and the polymerase chain reaction-based assay.

Conclusions: We have described a high-performing 32-gene signature that enables prediction of molecular changes in healthy smokers. This gene signature could aid in differentiating the effects of additional classes of tobacco products such as E-cigs.
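
For reference, the two evaluation metrics named in the Methods can be computed from a binary confusion matrix. The sketch below is illustrative only: the function names are chosen here, and the counts are hypothetical values for a balanced 80-sample set, not results from the study.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 40 smokers and 40 non-consumers, 5 errors in each class.
tp, tn, fp, fn = 35, 35, 5, 5
acc = accuracy(tp, tn, fp, fn)
m = mcc(tp, tn, fp, fn)
print(acc, m)
```

Unlike accuracy, MCC ranges from -1 to 1 and stays near 0 for a classifier that ignores its input, which is one reason the two metrics are often reported together.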

67

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY

Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully

Dart NeuroScience

Douglas, Fenger

We are interested in discovering candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s diseases. We implemented a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test – the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com) – to screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. A subset of subjects were validated by a battery of secondary memory tasks and provided saliva samples from which we can isolate DNA for GWAS. To date, 26,348 participants from 187 nations have been screened (with 16,486 completing both sessions). The sample is primarily Caucasian (58%), post-secondary school-educated (64%), average age of 34 years old, and equal numbers of each gender. The average forgetting rate across sessions was 10%. The secondary screening involved memory, IQ, attentional control, and personality measures. Analyses are underway to determine the relationship between exceptional memory and genetics.

68

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS

Gregory P. Way, Casey S. Greene

Genomics and Computational Biology Graduate Program, Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA

Gregory, Way

The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer-types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.

69

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION

POSTER PRESENTATIONS

70

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME

Monica Agrawal1, Marinka Zitnik1, Jure Leskovec1,2

1Department of Computer Science, Stanford University; 2Chan Zuckerberg Biohub, San Francisco, CA

Marinka, Zitnik

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
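
The connectivity question raised in this abstract (does a pathway form one connected component in the PPI network, or several?) can be sketched with a breadth-first search over the subgraph induced by the pathway's proteins. This is a minimal illustration under stated assumptions: the protein names and edges are hypothetical, and the "largest component fraction" is one simple summary chosen here, not necessarily the measure used by the authors.

```python
from collections import deque


def pathway_components(edges, pathway):
    """Connected components of the PPI subgraph induced by the pathway proteins."""
    members = set(pathway)
    adjacency = {p: set() for p in members}
    for a, b in edges:
        # Keep only interactions whose two endpoints are both pathway proteins.
        if a in members and b in members:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical PPI edges; X1 and X2 are proteins outside the pathway.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "X1"), ("X1", "P4"), ("P5", "X2")]
pathway = ["P1", "P2", "P3", "P4", "P5"]

components = pathway_components(edges, pathway)
largest_fraction = max(len(c) for c in components) / len(pathway)
print(len(components), largest_fraction)
```

In this toy network P4 reaches the rest of the pathway only through the non-pathway protein X1, so the pathway splits into three components even though its proteins are close in the full network.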

71

PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS

Youdinghuan Chen1,2,3, Yue Wang3,4, Lucas A. Salas1, Todd W. Miller3,7, Jonathan D. Marotti5, Nicole P. Jenkins2, Arminja N. Kettenbach2,3,7, Chao Cheng3,4,7, Brock C. Christensen1,3,7

1Department of Epidemiology, 2Department of Biochemistry and Cell Biology, 3Department of Molecular and Systems Biology, 4Department of Genetics, 5Department of Pathology and Laboratory Medicine, 6Department of Biomedical Data Science at Geisel School of Medicine, Dartmouth, Lebanon, NH 03756; 7Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756

Youdinghuan, Chen

Germline or somatic mutation in BRCA1 is associated with an increased risk of breast cancer and more aggressive tumor subtypes. BRCA1-deficient tumor cells have defective homologous recombination (HR) DNA repair, exhibiting genome instability and aneuploidy. HR deficiency can also arise in tumors in the absence of BRCA1 mutation. An HR-deficient, BRCA1-like phenotype has been referred to as “BRCAness.” BRCA1-like cancers exhibit worse prognosis but are selectively sensitive to chemotherapeutic treatments (e.g. platinum-based alkylating agents). However, the molecular landscapes of BRCA1-like breast tumors remain largely unknown in part because they are less common in the general population. By applying a copy number-based classifier, we observed that >30% of The Cancer Genome Atlas (TCGA) breast tumors are BRCA1-like even though only ~3% tumors analyzed carry a BRCA1 mutation or promoter hypermethylation. Separately, a differential analysis controlling for hormone receptor status, subject age, tumor stage and purity revealed a significant increase in DNA methyltransferase 1 (DNMT1) protein expression in BRCA1-like tumors. In addition, differentially methylated gene sets in BRCA1-like tumors indicated a strong enrichment in developmental signaling and a moderate involvement in gene transcription. Profiling of concomitant somatic alteration landscapes in BRCA1-like breast tumors provides alternative strategies to identify this subset of tumors and insights into novel potential therapeutic approaches.

72

USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS

Steven N. Hart, W. Flotte, A.P. Norgan, K.K. Shah, Z.R. Buchan, K.B. Geiersbach, T. Mounajjed, T.J. Flotte

Mayo Clinic, 200 First St. SW, Rochester, MN 55901

Steven, Hart

Examination of hematoxylin and eosin (H&E) stained slides by light microscopy has been the cornerstone of histopathology for over a century. During microscopic examination, a pathologist uses salient clinical information, pattern matching and feature recognition (shape, color, structure, etc.) to render a diagnosis. Recently, whole-slide image (WSI) scanners have made it possible to fully digitize pathology slides. In addition to enabling long term slide preservation and facilitating slide sharing for collaboration or second opinions, digitization of pathology slides allows for the development and utilization of Artificial Intelligence (AI)-driven diagnostic tools. We conducted a pilot study to test the ability of an AI convolutional neural network (CNN) to distinguish between two types of melanocytic lesions, Conventional and Spitz nevi. We sought to determine the added value of pathologist-assisted training by comparing training effectiveness of complete slide analysis versus training on pathologist selected image patches. Images were classified by a deep CNN using Google’s TensorFlow framework. We found significant improvement in classification accuracy when the model was trained from the pathologist-curated image set. These data provide strong evidence for the continued development of AI-driven diagnostic tools in digital pathology, and highlight the added value of domain experts when building AI workflows. Future directions of this work include expanding the number of melanocytic lesions recognized by this tool, and enhancing its clinical performance through incorporation of molecular, demographic, and outcomes data.


73

A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS

Mingze He1,2, Carolyn J. Lawrence-Dill1,2,3

1Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, USA, 50011; 2Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, USA 50011; 3Department of Agronomy, Iowa State University, Ames, Iowa, USA 50011

Mingze, He

The gene expression landscape changes according to certain circumstances, such as stress responses. The main difficulties in predicting common expression patterns among groups of genes lie in locating reliable gene markers and developing novel statistical approaches. We first build a shared gene ontology (GO) correlated grouping database by natural language processing (NLP). Further, we test and apply a mixture of supervised and unsupervised machine learning algorithms to compare principal components of expression patterns across species. We found several surprising common expression patterns between maize genes and human tumor cell lines when G-quadruplex (G4) is used as a gene classifier. In particular, G4-carrying genes related to the response to reactive oxygen species (ROS) show a significant clustering of maize under cold and UV stress with human tumor cell lines. This result implies that G4 regulates nearby genes under similar stress situations.


74

GENERAL

POSTER PRESENTATIONS


75

DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER

Gabriel Al-Ghalith1, Abigail Johnson2, Pajau Vangay1, Dan Knights3

1Bioinformatics and Computational Biology - University of Minnesota; 2The Biotechnology Institute - University of Minnesota; 3Department of Computer Science and Engineering - University of Minnesota

Gabriel, Al-Ghalith

Microbiome research is characterized by the comparison of microbial community census data inferred from biological samples. To create these censuses, metagenomic DNA is typically clustered, aligned, or otherwise annotated to form a set of features with which to evaluate and compare microbial communities. These features may take different forms. Amplicon-based studies may use reference-based approaches and/or clustering of similar reads to distill a representative set of features such as operational taxonomic units. Shotgun-based approaches can result in finer-grained, less biased taxonomic resolution, but often rely on reference databases or classifiers trained on known microbial entities. While taxonomy and other database annotations are useful for interpretation, they may mask useful sequence-level information for comparing samples to each other. In particular, whenever there is not enough sequence data from particular organisms in the reference database (or raw reads) to identify them reliably, information about these organisms can be lost or misattributed. This causes many environments to be difficult or even impossible to compare with current methods. Further, clustering or reference-based analyses are typically computationally demanding. We present a complementary (or alternative) strategy for microbiome comparison in the software aKronyMer. It uses a novel, probability-adjusted deterministic k-mer distance metric and ultrafast non-heuristic Nei-Saitou-based tree clustering algorithm to rapidly calculate alpha diversity, beta diversity, and sample inter-relatedness trees with either amplicon or shotgun sequence data directly without a database. It is robust to low-depth sequencing, it recovers person-specific signatures with fewer than 100,000 shotgun reads per sample in a dataset of 34 healthy individuals, and it recapitulates other expected trends in public datasets. Additionally, aKronyMer can be used to infer phylogenetic trees from amplicon data in seconds on a laptop, create a whole-genome phylogenomic tree from all ~100,000 RefSeq microbial genomes in a few hours on a desktop, denoise reads during processing, and support other potential applications.
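
To make the idea of database-free comparison concrete, the sketch below computes a plain Jaccard distance between the k-mer sets of two samples. It is not aKronyMer's probability-adjusted metric, which the abstract does not specify. The function names `kmer_set` and `jaccard_distance`, the toy reads, and the choice k=4 are all assumptions made for illustration.

```python
def kmer_set(reads, k=4):
    """Collect every length-k substring across all reads of a sample."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy samples: A and B differ by one base, C is unrelated.
sample_a = ["ACGTACGT", "TTGACCA"]
sample_b = ["ACGTACGA", "TTGACCA"]
sample_c = ["GGGGCCCC"]

d_ab = jaccard_distance(kmer_set(sample_a), kmer_set(sample_b))
d_ac = jaccard_distance(kmer_set(sample_a), kmer_set(sample_c))
print(d_ab, d_ac)
```

A matrix of such pairwise distances is the kind of input a neighbor-joining tree builder would consume; no reference database is consulted at any point.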


76

SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA

Julian Aldana1, Monica Cala Molina1, Martha Zuluaga2

1Department of Chemistry, Grupo de Investigación en Química Analítica y Bioanalítica (GABIO), Universidad de los Andes, Bogotá DC, Colombia; 2Department of Chemistry, Grupo de Investigación en Cromatografía y Técnicas Afines (GICTA), Universidad de Caldas, Manizales, Colombia

Julian, Aldana

Metabolomics data preprocessing is the first step from raw instrument output to biological inference, and it is crucial for the discovery of metabolic signatures related to a particular physiopathological state of an organism. Moreover, data handling of gas chromatography/mass spectrometry (GC/MS) and liquid chromatography/mass spectrometry (LC/MS) datasets is challenging due to their size, complexity and noise. Therefore, data preprocessing is performed as a multi-step task that involves filtering, peak detection, deconvolution, and alignment, which can be carried out using a wide variety of algorithms and software packages. Given the lack of a single preprocessing software as a benchmark, the goal of this study is to compare the performance in the preprocessing of GC/LC-MS data between open source platforms (MZmine 2, XCMS Online and MetaboAnalyst 3.0) and commercial software (MassHunter Profinder 8.0 and Metaboliteplot). For this purpose, datasets were collected from the analysis of replicate samples from a plasma pool, and we followed a workflow in each software package, adjusting the parameters in a similar way to allow the comparison. Then, the data generated were analyzed to determine the number of features, coefficient of variation and peak area. As a result, significant differences were determined in the quantitative performance of the evaluated preprocessing packages for both GC and LC-MS datasets. Finally, this comparison allowed us to evaluate the magnitude of the preprocessing effect on the final output in MS-based metabolomic data, and how the results of different software can be compared with each other.


77

GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING

Mohammed Alser1, Hasan Hassan2, Hongyi Xin3, Oğuz Ergin4, Onur Mutlu2, Can Alkan1

1Bilkent University, 2ETH Zurich, 3Carnegie Mellon University, 4TOBB University of Economics and Technology

Onur, Mutlu

Motivation: Until today, it remains challenging to sequence the entire DNA molecule as a whole. In the era of high throughput DNA sequencing (HTS) technologies, genomes are sequenced relatively quickly but result in an excessive number of small DNA segments (called short reads, about 75-300 base pairs long). Resulting reads do not have any information about which part of the genome they come from; hence the biggest challenge in genome analysis is to determine the origin of each of the billions of short reads within a reference genome to construct the donor’s complete genome. Identifying the potential origin of each read, called alignment, is typically performed using quadratic-time dynamic programming algorithms. These optimal alignment algorithms are unavoidable and essential for providing accurate information about the quality of the alignment. In recent works [1-4], researchers observed that the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations wastes execution time and incurs significant computational burden. Therefore, it is crucial to develop a fast and effective heuristic method that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms.

Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.

Availability: GateKeeper is open-source and freely available online at https://github.com/BilkentCompGen/GateKeeper. An extended version of this work appears in [1].

References:
[1] Alser, M., et al., GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics, 2017. 33(21): p. 3355-3363.
[2] Xin, H., et al., Shifted Hamming Distance: A Fast and Accurate SIMD-Friendly Filter to Accelerate Alignment Verification in Read Mapping. Bioinformatics, 2015. 31(10): p. 1553-1560.
[3] Xin, H., et al., Accelerating read mapping with FastHASH. BMC Genomics, 2013. 14(Suppl 1): p. S13.
[4] Kim, J., et al., Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies, to appear in BMC Genomics, 2018.
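
The pre-alignment idea described in this abstract can be sketched in software as a cheap test that rejects clearly dissimilar read/reference pairs before any dynamic programming runs. The code below is a simplified illustration in the spirit of shifted Hamming masks. It is not GateKeeper's FPGA design and not the published SHD algorithm; the function name `passes_prefilter` and the toy sequences are assumptions.

```python
def passes_prefilter(read, ref, max_edits):
    """Return True if the pair might align within max_edits edits.

    A read position counts as unresolved only if it disagrees with the
    reference under every shift in [-max_edits, +max_edits]. Positions whose
    shifted partner falls outside `ref` are treated as matches, which keeps
    the filter lenient at the ends of the candidate window.
    """
    unresolved = 0
    for i, base in enumerate(read):
        matched = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if j < 0 or j >= len(ref) or ref[j] == base:
                matched = True
                break
        if not matched:
            unresolved += 1
    return unresolved <= max_edits


ref = "ACGTACGTAC"
same = passes_prefilter("ACGTACGTAC", ref, 2)          # identical pair
one_del = passes_prefilter("ACGACGTACG", ref, 1)       # one base deleted
far = passes_prefilter("TTTTTTTTTT", "ACGACGACGA", 2)  # no shared bases
print(same, one_del, far)
```

Pairs that pass would still go to a full alignment algorithm for verification; the filter only decides which candidate locations are worth that cost.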


78

MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS

Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi

Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Min Gyun, Bae

Epigenome maps allow us to predict thousands of putative regulatory regions such as promoters, insulators and enhancers in various cell lines through in vivo epigenomic signatures and are widely used for studying gene regulation of developmental processes and disease. In particular, super-enhancers, which consist of clusters of active enhancers predicted from H3K27ac signal, are known to regulate nearby genes that are important in controlling and defining cell identity. However, the combination of transcription factors for regulating enhancer activity has not yet been studied. In this study, we used massively parallel reporter assay (MPRA) data, which measure the quantitative activity of regulatory regions, to identify enhancers. Through 5-nucleotide resolution tiling of overlapping MPRA constructs with a probabilistic graphical model, we estimated the high resolution activity spanning 15000 putative regulatory regions in K562 and HepG2 cell lines. According to the ratio of activity at the boundary and center of each regulatory region, we identified thousands of enhancer candidates. Using these enhancers, we developed a random forest model to identify the epigenetic differences using about 300 histone modifications and transcription factors in the Encyclopedia of DNA Elements (ENCODE). Through a performance test by area under the curve (AUC), we confirmed that our model accurately predicted the enhancers. In conclusion, we identified enhancers through high-throughput reporter assay and found the epigenetic features through random forest modelling.


79

FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN

Taejeong Bae1, Jessica Mariani2, Livia Tomasini2, Bo Zhou3, Alexander E. Urban3, Alexej Abyzov1, Flora M. Vaccarino2

1Mayo Clinic, 2Yale University, 3Stanford University

Alexej, Abyzov

As mounting evidence indicates, each cell in the human body has its own genome, a phenomenon called somatic mosaicism. Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. Starting from single cells, directly obtained from three fetal brains, we established 31 separate colonies of neuronal progenitor cells, and carried out whole-genome sequencing on DNA from each colony. The clonal nature of these colonies allows a high-resolution analysis of the genomes of the founder progenitor cells without being confounded by the artifacts of in vitro single cell whole genome amplification. Across the three brains we detected 200 to 400 non-germline SNVs per clone. Validation experiments (with PCR, digital droplet PCR, and capture deep sequencing) revealed high specificity (>95%) and sensitivity (>80%) of the SNVs and confirmed the presence of over a hundred SNVs in the original brain tissues, thereby proving that the detected SNVs represent genuine mosaic variants present in neuronal progenitors. The per-cell number of mosaic SNVs increased linearly with brain age, allowing us to estimate the mutation rate at about 8.6 SNVs per cell division. Dozens of SNVs were genotyped in multiple different regions of a brain and even in blood, suggesting that they occurred prior to gastrulation. Using these SNVs, we reconstructed cell lineages for the first five post-zygotic cleavages and calculated a mutation rate of ~1.3 SNVs per division per daughter cell. Comparison of mutation spectra revealed a shift towards oxidative damage-related mutations in neurogenesis. Both neurogenesis and early embryogenesis exhibit drastically more mutagenesis than adulthood. On a coarse-grained scale, mosaic SNVs were distributed uniformly across the genome and were enriched in mutational signatures observed in medulloblastoma and neuroblastoma, as well as in a signature observed in all cancers and in de novo variants, which, as we previously hypothesized, is a hallmark of normal cell proliferation. Correlations with histone marks further strengthened the similarity of mosaic mutations in normal fetal brain with somatic mutations reported for brain cancers. On a smaller scale, SNVs were mostly benign, showed no association with any GO category and tended to avoid DNAse hypersensitive sites. These findings reveal a large degree of somatic mosaicism in the developing human brain, link de novo and cancer mutations to normal mosaicism and set a baseline for mosaic genome variation related to human brain development and function.


80

CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA

Bahar Behsaz1, Hosein Mohimani2, Alexey Gurevich3, Andrey Prjibelski3, Mark F. Fisher4, Larry Smarr2, Pieter C. Dorrestein5, Joshua S. Mylne4, Pavel A. Pevzner2

1Bioinformatics and Systems Biology Program, University of California at San Diego, La Jolla, USA; 2Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA; 3Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St Petersburg, Russia; 4The University of Western Australia, School of Molecular Sciences and ARC Centre of Excellence in Plant Energy Biology, Crawley, Australia; 5Department of Pharmacology, University of California at San Diego, La Jolla, USA

Bahar, Behsaz

Cyclopeptides represent an important class of natural products with an unparalleled track record in pharmacology: many antibiotics, antitumor agents, and immunosuppressors are cyclopeptides. While billions of tandem mass spectra of natural products have been deposited to the Global Natural Products Social (GNPS) molecular network, the discovery of novel cyclopeptides from this gold mine of spectral data remains challenging. As a result, only a small fraction of spectra in the GNPS molecular network have been identified so far. To address this bottleneck, we developed the CycloNovo algorithm for de novo cyclopeptide sequencing based on the concept of de Bruijn graphs, the workhorse of modern genome sequencing algorithms. Given a spectral dataset, CycloNovo first identifies a subset of this dataset that may represent cyclic and branch-cyclic peptides by analyzing the spectral convolution of each spectrum. Afterward, it attempts to de novo sequence each spectrum of putative cyclic or branch-cyclic peptides. The CycloNovo pipeline includes (i) computing the spectral convolution of each spectrum, and extracting the set of masses that represent putative amino acids in the unknown PNP, (ii) computing compositions of masses that match the precursor mass of the spectrum, (iii) constructing potential 5-mers for each composition with high score against the spectrum, (iv) constructing a de Bruijn graph with those 5-mers, (v) traversing the de Bruijn graph and generating candidate sequences, and (vi) computing the Peptide-Spectrum-Match (PSM) score for each candidate sequence. CycloNovo revealed many still unknown cyclopeptides (hundreds of novel cyclopeptide families), illustrating that currently known cyclopeptides represent just a small fraction of cyclopeptides whose spectra are already deposited into public databases such as GNPS. CycloNovo addresses the challenge of analyzing the “dark matter of cyclopeptidome” by applying de Bruijn graphs to cyclopeptide sequencing. It correctly sequenced many known cyclopeptides in a blind experiment and reconstructed novel cyclopeptides originating from plants and cyanobacteria that were further validated using RNA-seq data and genome mining, the first cyclopeptides discovered in a completely automated de novo fashion. Our analysis of the human microbiome is the first demonstration that numerous bioactive cyclopeptides from consumed plants remain stable in the proteolytic human gut environment and thus are expected to interact with the human microbiome. In addition, it revealed a large number of still unknown cyclopeptides in the human gut that are either a part of the human diet or are products of the human gut’s microbiome.
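
Steps (iv) and (v) of the pipeline above, building a de Bruijn graph from 5-mers and traversing it, can be illustrated with a minimal sketch. It uses single-letter residue codes rather than the amino acid masses CycloNovo works with, and it assumes an unambiguous graph in which every node has exactly one outgoing edge. All function names and the toy peptide are hypothetical.

```python
from collections import defaultdict


def cyclic_kmers(sequence, k):
    """All k-mers of a cyclic sequence, wrapping around the end."""
    doubled = sequence + sequence[:k - 1]
    return [doubled[i:i + k] for i in range(len(sequence))]


def build_de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_cycle(graph, start):
    """Follow edges from `start` until the walk returns to it.

    Assumes exactly one outgoing edge per node; real spectra would give
    branching graphs and therefore many candidate walks to score.
    """
    path = [start]
    node = graph[start][0]
    while node != start:
        path.append(node)
        node = graph[node][0]
    # Spell the cyclic sequence from the first symbol of each visited node.
    return "".join(n[0] for n in path)


peptide = "ACDEFGHIK"                      # toy cyclic peptide, 9 residues
kmers = sorted(cyclic_kmers(peptide, 5))   # order of 5-mers is irrelevant
graph = build_de_bruijn(kmers)
reconstructed = walk_cycle(graph, "ACDE")
print(reconstructed)
```

In a realistic setting each candidate walk would then be scored against the spectrum, as in step (vi).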


81

FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE

Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush

Institute for Computational Biology and Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH USA

William, Bush

Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information, and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (~5%), they influence the potential analysis of a large fraction of genes (~25%). Variant annotation is available for bulk download, and individual variant annotations are also available via the NIAGADS GenomicsDB.


82

OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE

Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu

Institute for Computational Health Sciences, University of California, San Francisco, 550 16th Street, San Francisco, California 94143, USA

Bin, Chen

Rapidly decreasing costs of RNA sequencing have enabled large-scale profiling of cancer tumor samples with precisely defined clinical and molecular features (e.g., Low grade IDH1 mutant Glioma). Identifying drugs targeting a specific subset of cancer patients, particularly those that do not respond to conventional treatments, is critically important for translational research. Many studies have demonstrated the utility of a systems-based approach that connects cancers to efficacious drugs through gene expression signatures to prioritize drugs from a large drug library. From our previous work on liver cancer, Ewing’s Sarcoma, and Basal cell carcinoma, we have shown that the success of this approach is made possible by critical procedures, such as quality control of tumor samples, selection of appropriate reference tissues, evaluation of disease signatures, and weighting cancer cell lines. There is a plethora of relevant datasets and analysis modules that are publicly available, yet are isolated in distinct silos, making it tedious to implement this approach in translational research. As such, we present the current protocol, which we envision as a best practice to prioritize drugs for further experimental evaluation, primarily based on open transcriptomic datasets and the free open-source R language and Bioconductor packages. In this project, we retrieved patient tumor samples based on specified clinical and/or molecular features from the Genomic Data Commons Data Portal using an API. We then created a gene expression signature for these samples by employing normalized RNA-Seq counts processed in the UCSC Xena project, where all RNA-Seq samples from TCGA, TARGET, and GTEx were aligned and normalized using the same pipeline. We evaluated the quality of samples based on their purity and correlation with cancer cell lines. The reference tissue samples were selected based on their profile similarity with GTEx samples. We evaluated each disease signature via a cross-validation approach. We then created drug signatures using a similar procedure from large-scale, open access platforms, namely the LINCS L1000 library, which consists of over 20,000 compounds. Our pipeline can then compute and assess the reversal potency between the disease signature and each drug signature. The drugs that present high reversal potency are prioritized as drug hits. Finally, we performed enrichment analysis of drug hits to identify compelling enriched targets and pathways. For our pilot study, we use IDH1 mutant Oligodendroglioma as a case study, where the efficacy of over 300 LINCS compounds was measured in three relevant cell lines. We have shown that our predictions are corroborated by the experimental data.
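
The abstract does not define how reversal potency is scored. As one hedged illustration of comparing a disease signature with a drug signature, the sketch below uses the negative Spearman correlation over shared genes, so that a drug pushing expression opposite to the disease scores high. The scoring choice, the function names, and the gene values are assumptions and should not be read as the OCTAD pipeline's method.

```python
def ranks(values):
    """Rank values from 1..n (ties broken by position; fine for a sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks.

    Requires at least two values; both rank vectors are permutations of
    1..n, so they share the same mean and variance.
    """
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


def reversal_score(disease_sig, drug_sig):
    """Higher when the drug shifts genes opposite to the disease."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    return -spearman([disease_sig[g] for g in genes],
                     [drug_sig[g] for g in genes])


# Hypothetical log fold changes for four genes.
disease = {"G1": 2.0, "G2": 1.0, "G3": -1.5, "G4": -0.5}
drug_a = {"G1": -1.8, "G2": -0.9, "G3": 1.2, "G4": 0.4}   # reverses disease
drug_b = {"G1": 1.5, "G2": 0.7, "G3": -1.0, "G4": -0.2}   # mimics disease

score_a = reversal_score(disease, drug_a)
score_b = reversal_score(disease, drug_b)
print(score_a, score_b)
```

Ranking every compound in a library by such a score and keeping the top of the list corresponds to the prioritization step described in the text.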


83

DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA

Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat

Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Michael, Chen

Background
The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. There is a pressing need for a rapid and comprehensive drug susceptibility test that can circumvent the limited scope of conventional methods and the associated long wait times. We sought to implement the first deep learning framework as a predictive diagnostic tool for Mycobacterium tuberculosis (MTB) drug resistance.

Methods
Using a large public dataset of 3,601 MTB strains that underwent targeted or whole genome sequencing and conventional drug resistance phenotyping, we built the first-of-its-kind multitask wide and deep neural network (WDNN) architecture to predict phenotypic drug resistance to 11 anti-tubercular drugs. We compared performance of the proposed WDNN to regularized logistic regression and random forest models using five-fold cross validation. We conducted permutation tests for evaluating feature importance and a t-distributed stochastic neighbor embedding (t-SNE) to visualize the high dimensional model output on the full dataset.

Results
The multitask WDNN achieved state-of-the-art predictive performance compared to regularized logistic regression and random forest: the average sensitivities and specificities, respectively, for all 11 drugs were 87.1% and 93.7% (multitask WDNN), 85.4% and 93.8% (random forest), and 82.2% and 93.9% (regularized logistic regression). The multitask WDNN achieved a higher sum of specificity and sensitivity for 9 of the 11 drugs compared to both the random forest and regularized logistic regression. We show considerable performance gains in our current multitask WDNN with respect to our previously reported random forest model, noting improvements of up to 54.0% in the sum of specificity and sensitivity. Patterns in susceptibility status emerged between drugs after applying t-SNE that correlate well with what is known about the order of MTB drug resistance acquisition. Novel t-SNE findings included major cluster differences between pyrazinamide and other first-line drugs and increased amounts of resistance clusters for capreomycin compared to other second-line drugs. Notable findings in the feature importance analyses included expected shared resistance-associated mutations between drugs and provided new insight into potential mechanistic relationships. Capreomycin exclusively shared 10 features with first-line drugs, highlighting potential avenues for future research into the diagnostic similarities between capreomycin and other subtypes of anti-tubercular drugs.

Conclusion
Our proposed architecture provides a unified model of drug resistance across 11 anti-tubercular drugs and shows considerable performance gains over simpler methods. Deep learning has a clear role in improving identification of drug resistant MTB strains and holds promise in bringing sequencing technologies closer to the bedside.
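
The comparison metric used in the Results, the sum of sensitivity and specificity, can be worked through with the averaged figures reported above. The short sketch below only repeats that arithmetic on the 11-drug averages; the per-drug values behind the "9 of the 11 drugs" statement are not given in the abstract and are not reconstructed here. The variable names are illustrative.

```python
# (sensitivity %, specificity %) averaged over the 11 drugs, as reported.
reported = {
    "multitask WDNN": (87.1, 93.7),
    "random forest": (85.4, 93.8),
    "regularized logistic regression": (82.2, 93.9),
}

# Sum of sensitivity and specificity, rounded to one decimal place.
summed = {name: round(sens + spec, 1) for name, (sens, spec) in reported.items()}
best = max(summed, key=summed.get)

for name, total in summed.items():
    print(f"{name}: {total}")
print("highest averaged sum:", best)
```

On these averages the WDNN's advantage comes from sensitivity, since the three specificities differ by only 0.2 percentage points.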


84

DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE

Eun Kyung Choe1, Sang Woo Lee2

1Department of Surgery, Seoul National University College of Medicine; 2Network Division, Samsung Electronics

Eun Kyung, Choe

Hyperuricemia is an elevated uric acid level in blood. It can lead to gout and nephrolithiasis but also has been implicated as an indicator for diseases like metabolic syndrome, diabetes mellitus, cardiovascular disease, and chronic renal disease. The aim of the present study is to design a prediction model for hyperuricemia using an EHR database from health check-ups and various machine learning tools. From 2005 to 2015, self-paid people had comprehensive health check-ups. Input factors were age, gender, body mass index (BMI), blood pressure, waist circumference, white blood cell count, hemoglobin, glucose level, cholesterol, GOT/GPT, GGT, creatinine, triglyceride, urine albumin, smoking/alcohol habit, and diabetes/hypertension/dyslipidemia medication status, which are the factors covered by national health insurance. The output factor was uric acid level, which is not included in the national health check-up. All of the data were extracted from the EHR database and text mining was performed. We designed a prediction model for hyperuricemia using machine learning tools such as a linear regression model (LR), support vector model (SVM), classification tree model (CT) and neural network model (NN). Machine learning was performed with MATLAB R2016b (The Mathworks, Natick, MA). The prediction power of each model was evaluated by calculating the area under the curve (AUC), sensitivity, specificity and accuracy. A total of 55,227 persons were included in the analysis. The median age was 52 years (range 21-95 years) and 53.5% of persons were males. There were 10,586 (19.2%) persons who had a uric acid level in the hyperuricemia range. BMI was higher in the hyperuricemia group (25.2 +/- 3.0 vs. normal uric acid group, 23.4 +/- 2.9, p<0.001) and alcohol drinking habits were more common in the hyperuricemia group (67.8% vs. normal uric acid group, 52.4%, p<0.001). Sorting the results by the accuracy of each machine learning model, the CT showed the highest accuracy of 0.954 (AUC=0.886; sensitivity=0.792; specificity=0.981) compared to SVM of 0.892 (AUC=0.630; sensitivity=0.261; specificity=0.999), NN of 0.859 (AUC=0.770; sensitivity=0.09; specificity=0.991) and LR of 0.857 (AUC=0.761; sensitivity=0.033; specificity=0.997). This study used a health check-up EHR database to predict a disease status (hyperuricemia) using various machine learning tools. Since the amount of EHR data is increasing rapidly, the data included in the database could be used as biomarkers to predict disease status or high risk conditions by building a prediction model using machine learning tools. But since the optimal analysis tool or analyzing protocol is not well established and the over-fitting problem is not yet solved, more training and research in various populations should be done in future studies for replication.


85

RICK: RNA INTERACTIVE COMPUTING KIT

Galina A. Erikson, Ling Huang, Maxim Shokhirev

Salk Institute for Biological Studies

Galina, Erikson

The advent of massively parallel sequencing of RNA (RNA-Seq) enables fast and inexpensive global measurement of thousands of genes across biological perturbations involving drug treatment, genetic mutations, and time series. To facilitate comparison, many tools have been developed; however, most of these tools require extensive programming and bioinformatics knowledge: little is available for the scientist who wants to analyze their own RNA-seq data but lacks bioinformatics expertise. The RNA Interactive Computing Kit (RICK) aims to fill this gap by providing an interactive web workspace designed to facilitate RNA-Seq analysis and visualization. RICK accepts as input a file with raw read counts for each transcript and sample, performs sample clustering, visualizes the global gene expression with heatmaps, runs principal component analysis and prepares print ready figures. Users can add and remove samples and regenerate new figures on the fly. For differential gene expression, users have the option to use edgeR, DESeq2 or the combination of all, and to filter the results based on adjusted p-value and fold change. RICK is able to use the DE results from the previous step to identify the significantly altered KEGG pathways or enriched GO terms using the gage or GOseq pathway analysis package with visualization. Users also have the option to upload their customized gene/background gene list to do a DAVID-like analysis. RICK supports RNA-Seq based research by providing a workflow that requires no bioinformatics skills. RICK is freely available at rick.salk.edu.


86

PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING

Gamze Gursoy, Mark Gerstein

Program in Computational Biology and Bioinformatics, Yale University

Gamze, Gursoy

The success of the ENCODE (Encyclopedia of DNA Elements) project opened the doors to a deeper understanding of the functional genome through genome-wide experimental assays. Although identifying individuals using DNA variants from whole genome or exome sequencing data is a major privacy and security concern, no study on genomic privacy has focused on the quantity of information in functional genomic experiments such as ChIP-Seq, RNA-Seq and Hi-C, since the majority of this data is partial and biased. Here, we quantify the amount of leaked genotype information in different functional genomic assays at varying coverages. We show that sequencing data from functional genomics assays provides enough private information to be able to link these samples to a panel of individuals with known genotypes.


87

CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-'OMICS INTEGRATION IN COMPLEX DISEASE

Tia Tate Hudson, ClarLynda Williams-DeVane

North Carolina Central University, Durham, NC, USA

Tia, Hudson

Advances in high throughput technologies and the availability of multi-'omics data present the opportunity for more holistic understandings of biological regulation in complex diseases and disparities. The complexity and disparate nature of various diseases requires the development of equally complex models with multiple layers of biological information. This, however, requires the integration of biological, computational, and statistical domains. Currently, nonetheless, there exist major gaps in the availability and knowledge amongst the three domains. Typically, biologists experience problems with processing and analyzing biological data and therefore seek data scientists for more customized analysis. In contrast, some data scientists lack a thorough understanding of the regulation and complex interactions of various systems giving rise to varying complex phenotypes. This generally results in less comprehensive analysis and an overall narrow understanding of complex disease phenotypes, which can only be thoroughly understood when various levels of 'omic interactions are considered as a whole. Thus, developing the most comprehensive biological models must consider the multiple appropriate layers of genomic, epigenomic, transcriptomic, proteomic, and metabolomic regulation, as well as the potential role environmental and social factors play at each 'omic level. Historically, diverse data types have been considered independently while combinations of two or more data types have been utilized less frequently. Singular analysis of independent 'omic contributions of disease often neglects the intricate interactions among the distinct levels giving rise to these complex traits. Although environmental and social factors have a major role in the disparate nature of diverse diseases, many diseases result from mutual alterations in assorted pathways and biological processes, including gene mutations, epigenetic changes, and modifications in gene regulation. Therefore, the various phenotypes in diverse disease represent a major example of the need for integrated biological models for complex trait analysis. In this study, we present the Data Integration Expectation Map (D.I.E.M), where we explore the scientific value of integrating various 'omic data combinations that can reveal mechanisms of biological regulation in disease disparities. Our goal is to convey the potential for integration of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data for improving our understanding of the complexity and nature of disparity in complex disease traits. In doing so, this map will address the holes in the various domains necessary for integrated data analysis and interpretation. D.I.E.M will also reveal the expected outcomes for each 'omic data type and the various combinations that may or may not divulge a holistic view into complex disease phenotypes. With that, we expect to gain a greater understanding of physiological processes contributing to disparities as well as the role each 'omic interaction plays in screening, diagnosis, and prognosis of disease.

88

IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING

Jin Hyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke

Illumina Inc., 5200 Illumina Way, San Diego, CA 92122, USA

Jin Hyun, Ju

Gene fusions have been identified as driver mutations in multiple cancer types, and a number of drugs targeting specific fusions have been developed as treatment options. Therefore, the ability to identify fusions from tumor samples has become critical for the selection of appropriate treatments for patients. Previously, gene fusions have been detected by targeted approaches such as polymerase chain reaction (PCR) or Fluorescent In Situ Hybridization (FISH). These methods not only require prior knowledge of the fusion, but are also labor intensive and inefficient. Newer methods utilizing RNA sequencing (RNA-seq), which can detect multiple types of gene fusions with no prior knowledge required, have been introduced with the emergence of next-generation sequencing technology. One critical challenge in using RNA-seq data for gene fusion detection is false positive findings introduced by aligner-specific biases or regions with sequence similarity in the genome. This problem becomes more apparent in clinical settings, where the abundance of fusion transcripts can be limited by the composition and heterogeneity of the tumor sample. To avoid the critical risk of failing to detect a potentially treatable gene fusion, imposing a stringent detection threshold becomes difficult in these situations, leading to the inclusion of fusions based on relatively low read evidence. To address this problem, we describe a novel fusion filtering method based on fusion contig realignment that is designed to identify spurious (false positive) fusions. Our method can be used together with any assembly-based fusion calling method that constructs a contig sequence for each reported fusion. The first step is to realign the fusion contigs with the Basic Local Alignment Search Tool (BLAST), which is relatively more flexible in finding alternative alignment results with high sequence similarity. Subsequently, we determine whether a specific fusion call can be supported by evidence found in BLAST alignments. Specifically, we aim to filter out fusions that can be explained by regions originating from a single gene or genomic region, or that have weak support on either side of the fusion in BLAST alignments. In our preliminary analysis of 1171 fusion calls in 322 samples, 111 out of 161 false positive calls (68%) were filtered out while no calls from the total of 1010 true positives were filtered out.
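The two filtering rules named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Hit` record layout, the `keep_fusion` function, and both thresholds (`single_frac`, `side_frac`) are hypothetical and are not taken from the authors' implementation, which the abstract does not describe in detail.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One local alignment of a fusion contig (hypothetical record layout)."""
    gene: str     # gene or genomic region the hit falls in
    q_start: int  # start on the contig, 0-based, inclusive
    q_end: int    # end on the contig, exclusive


def covered(hits, lo, hi):
    """Count contig positions in [lo, hi) covered by at least one hit."""
    positions = set()
    for h in hits:
        positions.update(range(max(h.q_start, lo), min(h.q_end, hi)))
    return len(positions)


def keep_fusion(hits, contig_len, junction, gene_a, gene_b,
                single_frac=0.95, side_frac=0.5):
    """Return False when the call looks spurious under the two rules.

    junction is the contig position where gene_a ends and gene_b begins.
    Both thresholds are illustrative assumptions.
    """
    # Rule 1: a single gene or region explains (almost) the whole contig.
    for gene in {h.gene for h in hits}:
        own = [h for h in hits if h.gene == gene]
        if covered(own, 0, contig_len) >= single_frac * contig_len:
            return False
    # Rule 2: weak alignment support on either side of the junction.
    left = covered([h for h in hits if h.gene == gene_a], 0, junction)
    right = covered([h for h in hits if h.gene == gene_b], junction, contig_len)
    if left < side_frac * junction:
        return False
    if right < side_frac * (contig_len - junction):
        return False
    return True
```

In practice the hits would be parsed from BLAST output; the sketch takes them as plain records so that the decision logic stays visible.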

89

SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN

Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura

Nagoya University

Hideko, Kawakubo

Recent rare variant analyses of single nucleotide variations (SNVs) and copy number variations (CNVs) have identified dozens of candidate genes that may contribute to neurogenetic disorders such as autism and schizophrenia. However, it is unclear whether and how these disease-causing genes are associated with cellular mechanisms in the brain. This is a challenging problem, since the brain contains hundreds of distinct cell types, each of which has unique morphologies, projections, and functions, and thus disease-causing genes may contribute to different behavioral abnormalities of distinct cell types in the nervous system. In order to identify candidate cell types of the brain related to a complex genetic disorder, we propose a statistical method called graph oriented sparse learning (GOSPEL).

90

GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES

Jeremie S. Kim1,2, Damla S. Cali1, Hongyi Xin3, Donghyuk Lee1,4, Saugata Ghose1, Mohammed Alser5, Hasan Hassan2,6, Oğuz Ergin6, Can Alkan5, Onur Mutlu2,1

1ECE Department, Carnegie Mellon University; 2CS Department, ETH Zurich; 3CS Department, Carnegie Mellon University; 4NVIDIA Research; 5CE Department, Bilkent University; 6CE Department, TOBB University of Economics and Technology

Jeremie, Kim

Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome in order to identify the genomic variants of the donor. State-of-the-art read mappers determine the original location of a read sequence within a reference genome in 3 generalized steps. A read mapper 1) quickly generates possible mapping locations for seeds (i.e., smaller segments) within a read, 2) extracts the reference sequence at each of the mapping locations, and 3) determines the similarity score between the read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment). With the similarity scores across all possible locations, the read mapper can determine the original location of the read sequence. The differences between the read sequence and the matching reference sequence indicate the genomic variants of the donor, which can be further analyzed for preventative care or diagnosis. A seed location filter (e.g., FastHASH [2], SHD [3], GateKeeper [4]) comes into play before sequence alignment (step 3) and reduces the number of unnecessary alignments. A seed location filter efficiently determines whether a candidate mapping location would result in an incorrect mapping before performing the computationally-expensive sequence alignment step for that location. In the ideal case, a seed location filter would discard all poorly matching locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x--6.41x, compared to the best previous seed location filtering algorithm, and 2) provides an end-to-end read mapper speedup of 1.81x--3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm [2]. This work will appear at the 16th Asia Pacific Bioinformatics Conference in January 2018 [1]. The preliminary version of the full article is at https://arxiv.org/pdf/1711.01177.pdf.

[1] Kim, Jeremie S, et al. "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies." to appear in BMC Genomics (2018).
[2] Xin, Hongyi, et al. "Accelerating read mapping with FastHASH." BMC Genomics (2013).
[3] Xin, Hongyi, et al. "Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping." Bioinformatics (2015).
[4] Alser, Mohammed, et al. "GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping." Bioinformatics (2017).
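The idea of checking "read presence within each coarse-grained segment" can be sketched in plain software. This is only an illustration under stated assumptions: the per-bin k-mer sets, the bin overlap, and the `min_shared` threshold are hypothetical choices, and the sketch does not model the 3D-stacked memory or the in-memory parallelism that the abstract describes. The actual representation and thresholds are given in the full article [1].

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bins(reference, bin_size, k, read_len):
    """Build one k-mer presence set per coarse-grained segment (bin).

    Each bin is extended by read_len - 1 bases so that a read starting
    anywhere inside the bin lies fully within the indexed span.
    """
    bins = []
    for start in range(0, len(reference), bin_size):
        span = reference[start:start + bin_size + read_len - 1]
        bins.append(kmers(span, k))
    return bins


def passes_filter(read, location, bins, bin_size, k, min_shared):
    """Keep a candidate location only if enough read k-mers occur in its bin."""
    present = bins[location // bin_size]
    shared = sum(1 for i in range(len(read) - k + 1)
                 if read[i:i + k] in present)
    return shared >= min_shared
```

A location that fails this cheap presence check is discarded before alignment; a location that passes still has to go through sequence alignment, so the filter only removes work and never decides a mapping on its own.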

91

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP

Sungho Kim, Taehun Kim

Yeungnam University, DGIST

Sungho, Kim

Multi-class strategies for Support Vector Machines (SVMs), such as One Versus One, One Versus All and Dynamic Acyclic Graph, have been developed to perform multi-class classification. These strategies do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed Weighted Voting resolves this problem by weighting the voting values according to the distance from the boundary, and the proposed Voting Drop further enhances the performance of the SVMs. The proposed Weighted Voting is based on the voting method, which accumulates votes and then chooses the most voted class. Weighted Voting weights each voting value to reflect the distance from the boundary and the margin. The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because some SVMs respond redundantly. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the classes it has not learned. This is why, when an SVM predicts data belonging to a non-learned class, it responds redundantly. This irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVM. The algorithm finds the irrelevant SVM and then drops the votes it caused. An irrelevant SVM is found by finding the least voted class, because the least voted class can be thought of as irrelevant to the input data. As shown in the experiments, evenly reflecting the distance from the hyper-plane and the discernment of the hyper-plane, and removing the redundant SVMs' votes, leads to higher performance. The proposed methods can be used for a range of classification tasks.
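A minimal sketch of the two ideas, assuming a One Versus One setup in which each pairwise SVM returns a signed decision value (positive for the first class of the pair, negative for the second). The clipping of the weight at 1.0 (the margin) and the single drop round are assumptions made for illustration; the abstract does not give the exact weighting formula or drop rule, and the function names are hypothetical.

```python
def accumulate(decisions, classes, weighted=True):
    """Accumulate votes from pairwise classifiers.

    decisions maps a class pair (a, b) to a signed decision value:
    d >= 0 votes for a, d < 0 votes for b. With weighted=True the vote
    is |d| clipped at 1.0 (assumed margin); otherwise each vote counts 1.
    """
    votes = {c: 0.0 for c in classes}
    for (a, b), d in decisions.items():
        winner = a if d >= 0 else b
        votes[winner] += min(abs(d), 1.0) if weighted else 1.0
    return votes


def predict_with_voting_drop(decisions, classes):
    """Weighted voting followed by one round of voting drop."""
    votes = accumulate(decisions, classes)
    # The least voted class is treated as irrelevant to the input
    # (ties are broken by the order of classes).
    least = min(classes, key=lambda c: votes[c])
    # Drop every pairwise classifier that was trained on that class.
    kept = {pair: d for pair, d in decisions.items() if least not in pair}
    remaining = [c for c in classes if c != least]
    final = accumulate(kept, remaining)
    return max(remaining, key=lambda c: final[c])
```

With plain voting, an input lying very close to a boundary contributes a full vote; with the weighted form its contribution shrinks toward zero, which is the behaviour the abstract argues for.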

92

GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS

Sarah Kim-Hellmuth1,2, Matthias Bechheim3, Benno Pütz2, Pejman Mohammadi1,4, Johannes Schumacher5, Veit Hornung3,6, Bertram Müller-Myhsok2, Tuuli Lappalainen1,4

1New York Genome Center, New York, NY, USA; 2Max-Planck-Institute of Psychiatry, Munich, Germany; 3Institute of Molecular Medicine, University of Bonn, Bonn, Germany; 4Department of Systems Biology, Columbia University, New York, NY, USA; 5Institute of Human Genetics, University of Bonn, Bonn, Germany; 6Gene Center and Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany

Sarah, Kim-Hellmuth

The immune system plays a major role in human health and disease. Understanding variability of immune responses on the population level and how it relates to susceptibility to diseases is vital. In this study, we aimed to characterize the genetic contribution to interindividual variability of immune response using genome-wide association and functional genomics approaches. For this purpose, we studied genetic associations to cellular (gene expression) and molecular (cytokine) phenotypes in primary human cells activated with diverse microbial ligands. We isolated monocytes of 134 individuals and stimulated them with three bacterial and viral components (LPS, MDP, and ppp-dsRNA). We performed transcriptome profiling at three time points (0 min/90 min/6 h) and genome-wide SNP-genotyping. In addition, we profiled five cytokines produced by peripheral blood mononuclear cells activated by five components from the same individuals to perform a genome-wide association study. Comparing expression quantitative trait loci (eQTLs) under baseline and upon immune stimulation revealed 417 immune response specific eQTLs (reQTLs). We characterized the dynamics of genetic regulation on early and late immune response, and observed an enrichment of reQTLs in distal cis-regulatory elements. Analysis of signs of recent positive selection and the direction of the effect of the derived allele of reQTLs on immune response suggested an evolutionary trend towards enhanced immune response. Furthermore, multivariate GWAS analysis of cytokine responses to diverse stimuli revealed 159 genome-wide significant loci; however, only a small number of these could be reliably linked to potentially causal eQTLs in monocytes. Finally, given the central role of inflammation in many diseases, we examined reQTLs as a potential mechanism underlying genetic associations to complex diseases. We uncovered novel reQTL effects in multiple GWAS loci, and showed a stronger enrichment of response than constant eQTLs in GWAS signals of several autoimmune diseases. These results indicate a substantial, disease-specific role of environmental interactions with microbial ligands in genetic risk to complex autoimmune diseases. While tissue-specificity of molecular effects of GWAS variants is increasingly appreciated, our results suggest that innate immune stimulation is a key cellular state to consider in future eQTL studies as well as in targeted functional follow-up of GWAS loci.

93

PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY

Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski

University of California San Francisco

Kord, Kober

Effective symptom management is a critical component of cancer treatment. Computational tools that predict the course and severity of these symptoms have the potential to assist oncology clinicians to personalize the patient’s treatment regimen more efficiently and provide more aggressive and timely interventions. Cancer-related fatigue (CRF) is the most common symptom associated with cancer and its treatments. CRF has a negative impact on the patients’ ability to tolerate treatments and on their quality of life. One of the limitations to effective treatment of CRF is the lack of a valid and reliable model to predict the severity of CRF. The objective of this pilot study was to generate a predictive model for fatigue severity 1 week after chemotherapy (CTX) administration (T2) using 28 demographic and clinical characteristics that were collected just prior to CTX administration (T1) in a sample of 1042 cancer patients undergoing CTX. In this pilot study, we used support vector regression (SVR) with a polynomial kernel to predict the severity of the evening fatigue between two different time points during a cycle of CTX. Patients with missing data were removed, leaving a total of 689 for this analysis. Training and testing groups consisted of 518 and 171 patients, respectively. We used 10-times 10-fold cross-validation root-mean-square error (RMSE) to assess the fit of the predictive model. Our model achieved an RMSE/mean of 0.269. The five predictors with the highest importance were: evening fatigue at T1, morning fatigue at T1, attentional function, sleep disturbance, and performance status. The five predictors with the lowest importance were: living alone, caregiver to adult, level of education, cycle length, and number of metastatic sites. Overall, clinical characteristics associated with cancer and its treatment, including cancer diagnosis, had low importance in the model. These findings suggest that the experience and mechanisms of CRF may be general and not cancer specific. This type of predictive model can be used to identify high risk patients, educate patients about their symptom experience, and improve the timing of pre-emptive and personalized symptom management interventions. These results suggest that the integration of demographic and clinical data can enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of symptoms.
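The "RMSE/mean" figure quoted above can be read as a normalized error. The sketch below assumes it is the RMSE divided by the mean of the observed outcome, which is one common convention; the abstract does not state the exact definition, and the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_over_mean(observed, predicted):
    """RMSE divided by the mean observed value (assumed normalization)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))
```

Under this reading, a value of 0.269 would mean the typical prediction error is about 27% of the average observed fatigue score.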

94

SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS

Mikhail Kolmogorov1, Eamonn Kennedy2, Zhuxin Dong2, Gregory Timp2, Pavel A. Pevzner1

1Department of Computer Science and Engineering, University of California San Diego, USA; 2Electrical Engineering and Biological Science, University of Notre Dame, USA

Mikhail, Kolmogorov

Recent advances in top-down mass spectrometry enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but it has only been successfully applied to DNA sequencing so far. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denatured, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.

95

GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS

Milica Krunic1, Klaus Bobacz2, Arndt von Haeseler3

1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria; 2Department of Internal Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria; 3Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria

Milica, Krunic

Osteoarthritis (OA) is a joint disease that can affect any joint. However, the non-weight-bearing joints most frequently affected by OA are hand joints. The most common clinical presentation of hand OA is pain and loss of hand strength, which restricts the ability of people to perform daily activities. Multiple factors can contribute to the development of hand OA, of which the most frequently observed are age, gender, genetics, obesity, occupation, and repetitive joint usage. OA in proximal interphalangeal (PIP) and distal interphalangeal (DIP) joints is considered to be the most common cause of hand pain nowadays. To the best of our knowledge, there is no published research that addresses in detail the unclear genetic etiology of finger OA. Since cartilage is one of the most commonly affected tissues in OA, the aim of our study was to explore the gene expression profile of chondrocytes sampled from two finger joints, PIP and DIP, and to investigate which pathways and gene ontology terms were altered in patients affected by this disease.

96

DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER

Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi

Department of Bio Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Taeyeop, Lee

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impaired social interaction, and restricted and repetitive behaviors. Previous studies have reported that the genetic contribution or heritability is as high as 80% in ASD. In order to elucidate the genetic architecture of ASD, many researchers performed extensive studies and discovered some significant findings. Currently, hundreds of different genes have been unveiled, mostly through identification of related rare variants. Rare genetic variants, both inherited and de novo, are proposed to be causal in ~30% of ASD patients. In comparison, common genetic variants are estimated to contribute to approximately 50% of ASD etiology. However, no specific common risk variant has been found to date, possibly due to insufficient sample size. Here, we report a whole genome sequencing study of ASD patients to discover and characterize de novo mutations in an Asian population. By sequencing 101 autism trios and unaffected siblings, we located causal variants in 74 candidate genes. The variants included not only loss of function and missense variants, but also intronic and intergenic non-coding variants. The candidate gene set showed significant overlap with known autism, intellectual disability, and chromatin related gene sets. Furthermore, to prioritize the non-coding de novo mutations, we developed a deep learning framework based on >2,000 functional features. The features included DNase I hypersensitive sites, histone modification profiles, disease pathways, and transcription factor binding sites, where the nonlinear combinations of the features indicate the causal probability of a non-coding variant. The performance of the model was evaluated with area under curve (AUC) and F1 score. Our results suggest that de novo variants are related to important ASD risk genes, and that noncoding de novo variants have a non-zero effect in ASD.

97

CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM

Sean Maxwell, Mark R. Chance

Case Western Reserve University and NeoProteomics, Cleveland, Ohio

Mark, Chance

Introduction: Network analysis methods have become commonplace research tools due to their proven ability to interrogate and organize lists of molecular targets of interest identified by basic statistics alone, and use of network analysis to refine classifier feature sets has been shown to provide superior performance compared to targets identified singly. We introduce Crosstalker as a freeware platform for academic use that is web based and incorporates multiple public interaction and gene set databases to perform network analysis, enrichment testing and visualization in a modern HTML5+JS interface. The use of open databases and algorithms coupled to convenient user choices allows cross comparison of findings and permits easy replication of results by any laboratory, improving reproducibility and rigor.

Methods: Lists of seed molecules are mapped onto a reference interaction network selected by the user and a random walk with restarts (RWR) is performed using the seed molecules as the restart nodes. The RWR scores are adjusted to z-scores using Monte-Carlo estimated score distributions for each node in the interaction network, and we have optimized the Monte-Carlo estimation parameters using analytic methods and computational testing. Assuming the z-scores follow a normal distribution, the adjusted scores are used to select nodes that have a p<0.001 chance of achieving the same or higher RWR score by chance as they do from the user input. The resulting molecules are tested for enrichments against user selected gene set databases and used to induce result subnetworks from the reference network. The induced subnetworks are visualized with options to annotate nodes (molecules) and edges (interactions).

Results: Computational experiments using inputs generated by combining annotated sets of functionally related molecules with unrelated “noise molecules” showed that adjusting proximity scores by null-distribution improved predictions of functionally related molecules over rank-only methods when the inputs contained more noise molecules than annotated molecules. Choices of multiple interaction networks (like BioGRID, BioPlex or COXPRESdb) enable testing of different hypotheses within the same interface, such as co-expression or direct/indirect physical interactions of related molecules. The optimized algorithms used by the computational portion of the software facilitate analysis times under 1 minute, minimizing wait times and maximizing the number of concurrent users the system can support.

Novel Aspect: Analytically verified Monte-Carlo estimation parameters. Multiple options for interaction networks and gene sets. Web-based with options to export results and data in open (JSON, CSV) and binary (XLSX) formats. Free for academic use.
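The Methods paragraph above (random walk with restarts, then Monte-Carlo z-scores per node) can be sketched on a toy network. This is a generic textbook formulation, not Crosstalker's code: the restart probability, the number of random seed sets, the unweighted adjacency-list input, and both function names are assumptions chosen for illustration.

```python
import random
import statistics


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary RWR scores on an undirected, unweighted graph.

    adj maps each node to a list of neighbours; every node needs at
    least one neighbour. The walker restarts uniformly at the seeds.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def z_scores(adj, seeds, n_random=200, rng_seed=0):
    """Adjust RWR scores against random seed sets of the same size."""
    rng = random.Random(rng_seed)
    nodes = sorted(adj)
    observed = random_walk_with_restart(adj, seeds)
    null = {n: [] for n in nodes}
    for _ in range(n_random):
        fake = set(rng.sample(nodes, len(seeds)))
        scores = random_walk_with_restart(adj, fake)
        for n in nodes:
            null[n].append(scores[n])
    result = {}
    for n in nodes:
        sd = statistics.pstdev(null[n])
        # A node whose null scores never vary gets a neutral z-score.
        result[n] = (observed[n] - statistics.mean(null[n])) / sd if sd > 0 else 0.0
    return result
```

Under the normality assumption stated in the abstract, a one-sided p < 0.001 corresponds to keeping nodes with a z-score above roughly 3.09.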

98

SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS

Abigail E. Moore1, Brandon Zheng2, Patricia M. Watson3, Robert C. Wilson3, Dennis K. Watson3, Paul E. Anderson4

1Department of Natural Science, Hampshire College, Amherst, MA 01002, USA; 2Department of Biology, Bard College, Annandale-On-Hudson, NY 12504, USA; 3Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA; 4Department of Computer Science, College of Charleston, Charleston, SC 29424, USA

Abigail, Moore

Background: Lung cancer is both the second most represented cancer diagnosis and the leading cause of cancer death within the United States. Despite the high occurrence of non–small-cell lung cancer (NSCLC), 30% to 55% of patients relapse after curative resection, and the 5-year relative survival rate is 15% to 21%. The high costs of cancer medication and cancer drug failures are impacted by biomarker programs, which help select patients who may benefit from a given drug.

Methods: NSCLC RNA samples were taken from 38 patients, and clinical outcomes were determined by the American College of Surgery Oncology Group. Of these patients, 20 were diagnosed as disease-free, and 18 as relapse patients within 3 years of surgical resection. RNA-Seq libraries were paired-end sequenced on HiScanSQ and HiSeq 2500 systems. Read quality was determined by FastQC, and adapters and low-quality reads were trimmed with Trimmomatic. Trimmed paired-end reads were aligned to the human genome (HG38, UCSC) with RSEM. Aligned reads were input into the R/Bioconductor EBSeq package to perform median normalization and differential expression analysis. Differentially expressed genes were analyzed for over-representation of protein complexes, gene ontology terms and pathways via ConsensusPathDB.

Results: Empirical Bayesian methods identified 122 differentially expressed genes (FDR<0.05). Many lung cancer-related genes were recognized, such as BAMBI, CPS1, CD70, SHISA3, and WNT11. Also identified were novel genes with upregulated expression in relapse patients: LILRA2, ALOX12, TSPAN-11, and CADM3, which are involved in immune response, arachidonic acid metabolism, cell surface receptor signaling, and cell-cell adhesion, respectively. Novel genes with downregulated expression in relapse patients were identified, including MCCC1, MRGPRF, PRR4, and SLC7A14, which are associated with biotin metabolism, signal transduction, cell adhesion, and negative regulation of phosphatase activity, respectively. A hypergeometric test revealed over-representation of gene ontology terms for biological processes related to cancer development: positive regulation of cell proliferation (p=4.66e-06), lipoxygenase pathway (p=6.95e-05), and beta-amyloid metabolic process (p=0.000531). Only one protein complex-based set was over-represented: G protein-coupled receptor ligand. Accordingly, six GPCR-related pathways were over-represented (p-values from 6.77e-05 to 0.000196). Over-representation of other cancer-related pathways was also found, including prostaglandin synthesis and regulation (p=8.8e-05), fluoxetine metabolism pathway (p=0.000217), and arachidonic acid metabolism (p=0.000243).

Conclusions: Identifying NSCLC patients at risk of recurrence is crucial in cancer research. Our analyses identified 122 differentially expressed genes among disease-free and relapse NSCLC patients, including known lung cancer-related genes and new candidate biomarker genes that are involved in the diverse processes related to NSCLC development. Future research in alternative splicing and the development of a predictive model based on our results could support a new method for identifying individual recurrence risk.
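The hypergeometric over-representation test mentioned in the Results can be written in a few lines. This is the standard upper-tail form, shown only to make the reported p-values easier to interpret; the background size and set sizes used by the authors are not given in the abstract, and the function name is illustrative.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of background genes
    annotated:  background genes carrying the term or pathway
    drawn:      number of differentially expressed genes
    overlap:    differentially expressed genes carrying the term
    """
    top = min(annotated, drawn)
    total = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
                for i in range(overlap, top + 1))
    return total / comb(population, drawn)
```

A small p-value means that an overlap at least this large would rarely arise if the differentially expressed genes were drawn at random from the background.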

99

RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE

Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant

University of Pennsylvania

Soumyashant, Nayak

We often want to rank features by their differential abundance between two populations. In RNA-Seq for example, we obtain quantified values for tens of thousands of genes across a wide spectrum of expression intensities. A naive ranking by fold-change leads to several issues. One of them is the division-by-zero issue which happens when the change is from 0 to a positive quantity. This problem is usually dealt with by using a pseudo-count of 1. Fold changes from smaller numbers however can tend to dominate the top of ranking lists in case of discrete data like RNA-Seq. Therefore, one might wonder whether a change from 1 to 2 (fold change of 2) is to be considered more significant than a change from 100 to 190 (fold change of 1.9). We systematically study this issue at both theoretical and empirical levels. We conclude that in RNA-Seq data there is an optimal value of the pseudo-count which yields the best significance comparisons. We formulate the necessary foundational mathematics in terms of a philosophical axiomatic framework to enable the systematic exploration of the ranking problem. Additionally we demonstrate how the use of pseudo-counts actually integrates fold-change and difference and this observation can be used to obtain the advantages of both methods, while minimizing the disadvantages.
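The effect of the pseudo-count on the example in the abstract (1 to 2 versus 100 to 190) can be checked directly. The function below is a minimal sketch of a pseudo-count fold change; the function name and the particular pseudo-count values tried are illustrative assumptions, not the optimal value the authors refer to.

```python
def fold_change(before, after, pseudo=1.0):
    """Fold change with a pseudo-count added to both values."""
    return (after + pseudo) / (before + pseudo)


# Compare the two changes from the abstract under several pseudo-counts.
for pseudo in (0.0, 1.0, 10.0):
    small = fold_change(1, 2, pseudo)
    large = fold_change(100, 190, pseudo)
    first = "1->2" if small > large else "100->190"
    print(f"pseudo={pseudo}: 1->2 gives {small:.3f}, "
          f"100->190 gives {large:.3f}, ranked first: {first}")
```

With no pseudo-count the 1 to 2 change ranks first (2.0 against 1.9); with a pseudo-count of 1 the order reverses (1.5 against about 1.89). Since (after + c) / (before + c) = 1 + (after - before) / (before + c), a larger pseudo-count c makes the ranking behave more like a ranking by absolute difference, which is one way to read the abstract's remark that pseudo-counts integrate fold-change and difference.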

100

SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS

Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi

Department of Bio Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Jaeho, Oh

Obesity, one of the major global health concerns, is a metabolic disorder resulting from both behavioral and heritable causes. Various solutions, such as diet, exercise, surgery and drug therapies, have been proposed but these failed to provide long-term effects. Many researchers performed genome-wide association studies (GWAS) to identify disease-associated genomic regions, but interpretation of the data poses a great challenge. Numerous GWAS analysis studies report that FTO is the region most closely associated with obesity, but the mechanism remains unresolved. According to one recent paper, ‘outside variants’, defined as SNPs that are in weak LD with GWAS risk SNPs and influence the target gene’s regulatory circuitry in combination, should be further investigated. The ‘outside variant’ approach suggests that not only statistically significant GWAS SNPs but also other SNPs may be biologically meaningful. To develop an obesity-related model and unravel the mechanism through the ‘outside variant’ approach, we used the imputed GWAS data of 14,122 subjects with BMI information. To select functional epigenetic regions, we used histone modification ChIP-seq data from adipocytes and obesity-associated tissues and extracted a SNP set that is highly related to FTO. By performing regression between SNPs and FTO SNPs, we found SNPs with high explanatory power for obesity in the functional epigenetic region. Our results suggest that the ‘outside variant’ analysis, along with several epigenetic data, is a novel approach to discover a set of SNPs, including SNPs that appear statistically insignificant, that affect obesity.

101

SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK

Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui

Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan

Teppei, Shimamura

A major challenge in pharmacogenomic studies is differences in the clinical characterization of patients and their reactions, which makes it difficult to identify clinically meaningful gene-drug interactions and predict drug response for each patient. In this study, we consider a localized regression model for each sample to predict a drug response with a set of main effects and second-order interactions for oncogenic alterations for patients. We propose a sparse modeling of interactions with localized estimation framework (SMILE) for this task. We take a regularization approach to inducing strong hierarchy in the sense that an interaction coefficient can have a non-zero estimate only if both of the corresponding main effect coefficients are non-zero. We incorporate two different constraints into the group lasso and the lasso within the framework of local likelihood, to determine the type of structure such as strong hierarchy and enhance sparsity on the interaction coefficients, which enables the generation of an interpretable localized interaction model for each sample. It can be formulated as the solution to a convex optimization problem, which we solve using the alternating direction method of multipliers (ADMM). We then demonstrate the performance of our proposed method in a simulation study and on a pharmacogenomic dataset.

102

PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS

R. Michael Sivley1, John A. Capra2, William S. Bush3

1Department of Biomedical Informatics, Vanderbilt Genetics Institute, Vanderbilt University; 2Department of Biological Sciences, Vanderbilt Genetics Institute, Vanderbilt University; 3Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University

Robert, Sivley

Rare genetic variants identified from sequencing studies are often grouped by genes, functional domains, and other annotations to increase power in trait association tests and identify shared phenotypic effects. However, association tests rarely consider variants’ orientation in their functional context: three-dimensional (3D) protein structures. Various tools have been developed for visualizing specific variants in the context of individual protein structures; however, these tools do not support a complete, systematic mapping of variants identified in sequencing studies into all available solved and computationally predicted protein structures. We describe PDBMap, a computational pipeline to efficiently map human genetic variation generated by sequencing studies into the structome. We also present the complete mapping of missense variants from the 1000 Genomes Project, Genome Aggregation Database (gnomAD, N=3,010,061), Catalogue of Somatic Mutations in Cancer (COSMIC, N=1,104,417), ClinVar (N=56,235), and the Alzheimer's Disease Sequencing Project (ADSP, N=891,849) into solved protein structures from the Protein Data Bank (N=31,688) and computationally predicted homology models from ModBase (N=186,802). Source code is available from https://github.com/capralab/pdbmap and downloads are available at http://astrid.icompbio.net.

103

REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT

James R. Torpy1, Nenad Bartonicek1, David D.L. Bowtell2, Marcel E. Dinger1

1Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst 2010, Sydney, Australia; 2Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia

James, Torpy

Ovarian cancer is a highly complex disease with a range of different histological subtypes. This highly lethal disease is estimated to be the fifth most common cause of death from cancer in females, with a five-year relative survival rate of 46.2%. High-grade serous ovarian cancer (HGSOC), characterized by widespread genomic instability, accounts for 70-80% of ovarian cancer deaths, and survival rates have not improved significantly for the last few decades. Furthermore, the underlying cause of around 1/3 of HGSOC cases cannot be explained. Evidence suggests that RNA derived from repetitive regions of the genome plays a role in genomic instability and development of cancers such as high-grade serous ovarian cancer, and may play a role in the unexplained HGSOC cases. Aberrant expression of centromere-derived RNA causes dysfunctional chromosomal segregation during mitosis and aneuploidy. Telomere-derived RNA maintains telomeres, preventing chromosomal fusion, breakage and subsequent rearrangement of the chromosomes. Retrotransposable elements such as LINE1s and Alus insert into different genomic locations, disrupting sequences and causing rearrangements such as duplications, inversions and translocations. We have analysed over 120 HGSOC case and control RNA-sequencing data sets of primary samples from the Australian Ovarian Cancer Study, comparing differences in expression of repetitive RNA transcripts across multiple HGSOC subtypes and controls. We found a range of differentially expressed repetitive RNA species including LINE1, Alu and centromere-derived RNA which may be contributing to genomic instability in these tumours. In order to investigate the potential causes of the differences in repeat RNA levels, their expression was correlated with expression of a range of methyltransferases such as DNMT1 and DNMT3A-C that are known to regulate methylation at repetitive heterochromatin, controlling RNA expression from these regions. Expression of RNAi-associated factors such as Dicer was also assessed as these factors can contribute to repetitive RNA regulation.

104

DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE

Yun Joo Yoo1, Suh-Ryung Kim1, Sun Ah Kim2, Shelley B. Bull3

1Department of Mathematics Education, Seoul National University; 2Department of Statistics, Seoul National University; 3Prosserman Centre for Health Research, The Lunenfeld-Tanenbaum Research Institute

Yun Joo, Yoo

Genetic association analysis using high-density genome-wide sequencing data consisting of single nucleotide polymorphism (SNP) genotypes can benefit from various dimension reduction strategies for several reasons. First, the genome-wide significance level for individual SNP tests should be determined considering the correlation structure of genotype data. Adjustment for Type I error inflation due to multiple hypothesis testing can be sought based on the dimension reduction methods. Second, increased Type I error may be reduced as the number of variables in the analysis decreases by dimension reduction. Third, the computational burden can be reduced as the complexity of the analysis model is reduced. Fourth, the power of association tests can be increased by combining multiple signals in a group as a result of the dimension reduction strategy. We developed a genome partitioning method by clustering SNPs into blocks based on linkage disequilibrium structure. The algorithm uses a graph modeling of communities of highly correlated SNPs and applies a clique partitioning algorithm to the graph to partition SNPs into blocks. We applied the algorithm to 1000 Genomes Project data, and obtained 162K, 173K, and 334K blocks including singleton blocks in the autosomal regions of 22 chromosomes for Asian, European, and African data respectively. The average values of the LD measure r^2 (the squared Pearson correlation coefficient of two additively coded genotype variables) within blocks are 0.465, 0.437 and 0.329 for Asian, European, and African data, whereas the average r^2 values between consecutive blocks are 0.156, 0.145, and 0.098 for the three populations. We evaluated the Type I error and the power gain from these partitions for several multi-SNP association tests using simulated data based on 1000 Genomes Project data. Several tests using local dimension reduction strategies combined with genome-wide dimension reduction showed better power than tests based on other clustering methods. We also developed a local dimension reduction method for genome-wide sequencing data especially targeting the multi-collinearity issue of dense SNP genotype data to be analyzed by multiple regression analysis. This method clusters SNPs in multi-collinearity by examining the variance inflation factor (VIF), and replaces each such group by principal components. The algorithm proceeds iteratively until all VIF values are under a threshold value. When we compared the power between the analysis based on original data and the analysis based on the dimension reduced data using VIF evaluation, we observed a power gain in quadratic-type tests such as the Wald test.
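
The two quantities this abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, genotypes are assumed to be additively coded as 0, 1 or 2, and the VIF shown is only the special case of a regression with exactly two SNP predictors, where VIF = 1 / (1 - r^2).

```python
def ld_r2(g1, g2):
    """Squared Pearson correlation of two additively coded genotype vectors."""
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equally long")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (v1 * v2)


def vif_two_snps(g1, g2):
    """VIF of either SNP when only these two SNPs enter the regression."""
    r2 = ld_r2(g1, g2)
    if r2 >= 1.0:
        return float("inf")  # perfectly collinear
    return 1.0 / (1.0 - r2)
```

A pair of SNPs with r^2 = 0.25, for example, gives a VIF of about 1.33, while perfectly correlated SNPs give an unbounded VIF and would be candidates for replacement by principal components.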

105

THE MULTIPLE GENE ISOFORM TEST

Yao Yu, Chad D. Huff

Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Chad, Huff

Gene-based association tests aggregate multiple variants in a gene to evaluate statistical evidence for rare variant association. Typically, these tests include variants from all coding exons in a gene, irrespective of gene isoform. For genes with multiple isoforms, this is often approximately equivalent to a test of the largest isoform, which is not necessarily optimal. Because smaller isoforms tend to be enriched for the core functional domains of a gene, they may also be enriched for pathogenic variants or larger variant effect sizes. To address the opportunities presented by isoform-specific patterns of disease susceptibility, we introduce the Multiple Gene Isoform Test (MGIT). MGIT employs a permutation approach to test each isoform of a gene, summarizing the contribution of each transcript to calculate a single gene-level p-value, without the need to explicitly model correlation between transcripts. MGIT can be applied in conjunction with any gene-based association test to assess gene-level significance and to identify isoforms that may be enriched for protein domains impacting disease risk. To demonstrate the utility of MGIT, we report results from a gene-based association test (VAAST) involving 783 breast cancer cases, 322 skin cutaneous melanoma cases, and 3,607 controls of European ancestry. For two established cancer genes, we observed a two-fold and three-fold reduction in p-value with MGIT relative to a whole-gene test, for MITF in melanoma and BRCA1 in breast cancer, respectively. In contrast, for other established cancer genes, we observed either no change in p-value (RAD51B and BRCA2 in breast cancer and MC1R, MTAP, and BRCA2 in melanoma) or a modest attenuation of association signal (CHEK2 in breast cancer). In the case of BRCA1, the difference in the MGIT association signal was primarily driven by rare, predicted damaging missense variants, which exhibited large differences in effect size between the smallest and largest isoforms. MGIT is implemented in the software package XPAT, with support for VAAST, SKAT-O, and 27 additional gene-based association tests.
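
The abstract does not specify how MGIT combines isoform-level evidence, so the following Python sketch only illustrates the general idea of a permutation-based gene-level p-value over several isoforms. It uses a max-statistic permutation scheme, which is a swapped-in generic technique and not necessarily what MGIT or XPAT does; the burden score and all names are hypothetical. Because every isoform is rescored under the same shuffled labels, the correlation between overlapping isoforms is handled implicitly rather than modeled.

```python
import random


def burden_score(labels, allele_counts):
    """Hypothetical isoform score: mean rare-allele count in cases minus controls."""
    cases = [c for lab, c in zip(labels, allele_counts) if lab == 1]
    controls = [c for lab, c in zip(labels, allele_counts) if lab == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def gene_level_p(labels, isoform_counts, n_perm=2000, seed=0):
    """Permutation p-value for the best-scoring isoform of one gene.

    labels: 1 for case, 0 for control (both must be present).
    isoform_counts: dict mapping isoform name to per-sample allele counts.
    """
    rng = random.Random(seed)
    observed = max(burden_score(labels, c) for c in isoform_counts.values())
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case/control status across samples
        perm_max = max(burden_score(shuffled, c) for c in isoform_counts.values())
        if perm_max >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In this sketch a gene whose signal is concentrated in a short isoform can still reach a small gene-level p-value even when the longest isoform shows no case-control difference.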

106

IMAGING GENOMICS

POSTER PRESENTATIONS

107

GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE

Xiaohui Yao1, Shannon L. Risacher2, Kwangsik Nho2, Andrew J. Saykin2, Heng Huang3, Ze Wang4, Li Shen2

1School of Informatics and Computing, Indiana University, Indianapolis; 2Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 3Department of Electrical and Computer Engineering, University of Pittsburgh; 4Department of Radiology, Lewis Katz School of Medicine, Temple University

Heng, Huang

Cerebral blood flow (CBF) provides a means to assess the neuronal and neurovascular consequences of Alzheimer’s disease (AD) pathology. Both AD specific and non-specific CBF changes may be driven by unique or common genetic factors. To identify genetic variants associated with AD pathogenesis, we performed a targeted analysis to examine association between 4,033 SNPs of 24 AD candidate genes and CBF phenotypes measured by arterial spin labeling (ASL) magnetic resonance imaging (MRI) in four brain regions of interest (ROIs) including left angular, right angular, left temporal and right temporal gyri. Participants include 258 non-Hispanic Caucasian subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Targeted genetic association analysis of CBF on each ROI was performed using linear regression under an additive genetic model in PLINK, where age, gender and APOE ɛ4 status were included as covariates. Post-hoc analysis used Bonferroni correction for adjusting both the genetic and CBF measures. GATES was used to calculate gene-level p-values. The additive effects of the identified genetic variants from the above association analysis were also assessed at each voxel using SPM12 under a one-way ANOVA test with age, gender and APOE ɛ4 status as covariates. The single nucleotide polymorphism (SNP) level analysis identified a novel locus in INPP5D (inositol polyphosphate-5-phosphatase D) significantly associated with left angular gyrus (L-AG) CBF. In gene-based analysis, both INPP5D and CD2AP (CD2 associated protein) were associated with L-AG CBF. The discovered INPP5D locus explained 8.29% of the variance of left angular CBF after adjusting for age, gender and APOE ɛ4 status. Further analyses on an independent subset of the ADNI samples (N=906) revealed that the minor allele of the locus was associated with lower cerebrospinal fluid t-tau/Aβ1-42 ratio. INPP5D functions as a negative regulator in the immune system and a number of inflammatory responses, and has been found to be related to the inhibition of TREM2 signaling. The identified CBF risk factor has the potential to provide novel insights for better revealing the complex molecular mechanisms of AD. It warrants further investigation whether the risk factor is associated with the AD pathophysiology, the vascular pathophysiology, and/or their interaction.
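
To make the scale of the multiple-testing burden concrete, the short Python sketch below computes a Bonferroni threshold. It rests on two assumptions that the abstract does not state: a family-wise alpha of 0.05, and a correction over every SNP by ROI combination (4,033 SNPs times 4 CBF phenotypes). The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_snps, n_phenotypes):
    """Return the per-test significance threshold and the number of tests."""
    n_tests = n_snps * n_phenotypes
    return alpha / n_tests, n_tests


# Assumed values: alpha = 0.05, 4,033 SNPs, 4 regions of interest.
threshold, n_tests = bonferroni_threshold(0.05, 4033, 4)
print(f"{n_tests} tests, per-test threshold {threshold:.3e}")
```

Under these assumptions a single SNP-phenotype association would need a p-value of roughly 3.1e-6 or smaller to be declared significant.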

108

PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER

Jun Cheng1, Jie Zhang2, Zhi Han2, Liang Cheng2, Qianjin Feng1, Kun Huang2

1Southern Medical University, 2Indiana University School of Medicine

Kun, Huang

Background: Clear cell renal cell carcinoma (CCRC) is the most common kidney cancer. With the accumulation of large scale genomic data, genes with mutations that are common to CCRC patients have been identified. For instance, VHL has mutations in 49.9% of the CCRC patients in The Cancer Genome Atlas (TCGA) project, followed by PBRM1, MUC4 and SETD2. While some of these genes have been established as driver genes for CCRC (e.g., VHL and SETD2), the functional implications of their mutations are still being characterized. Previous studies often focused on the effects of the mutations on molecular levels such as gene/microRNA expression and DNA methylation. In this study we aim to characterize the morphological changes at cellular and tissue levels associated with these mutations.

Methods: Mutational status and histopathological imaging data for 448 CCRC patients were obtained from TCGA through the NCI Genomic Data Commons. Six genes have mutations in more than 7% of the patients: VHL, PBRM1, MUC4, SETD2, BAP1, and MTOR. The imaging features were then extracted using a computational pipeline we have previously developed. Our pipeline consists of three steps: nucleus segmentation, cell-level feature extraction, and aggregating cell-level features into patient-level features. Ten types of cell-level features were extracted including nuclear area (area), lengths of major and minor axes of the cell nucleus and their ratio (major, minor, and ratio), mean pixel values of the nucleus in the three RGB channels respectively (rMean, gMean, and bMean), and mean, maximum, and minimum distances (distMean, distMax, and distMin) to neighboring nuclei in the Delaunay triangulation graph. Finally, all cell-level features from the same patient were aggregated into patient-level features using a bag-of-visual-words model with the K-means (K=10) algorithm for learning words. Five additional parameters were calculated for each type of feature: mean, standard deviation, skewness, kurtosis, and entropy. Thus there are 150 image features in total. For each selected gene, the features were compared between patients with and without mutations using Mann-Whitney-U tests.

Results: While there are imaging features with p-value less than 0.05 for every gene, multiple testing correction (BH FDR) suggested that only PBRM1 mutations are associated with significantly different imaging features (69 features with q-value<0.05). Among them ‘distMax_bin2’, ‘distMin_bin3’, ‘ratio_bin9’ show significant increases in the mutation group while ‘distMean_std’, ‘major_std’ and ‘ratio_std’ show significant decreases.

Discussion and Conclusion: The above results suggest that tumor cells in the patients with PBRM1 mutations are more compact and their nuclei shapes are more homogeneous and closer to a round shape. These results are consistent with visual inspection and a previous report that PBRM1 mutation leads to a decrease of extracellular matrix gene expression and thus a reduction of stroma.
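
Two steps of this abstract lend themselves to a small worked example: the count of 150 patient-level features (10 feature types, each with 10 visual-word bins plus 5 summary statistics) and the Benjamini-Hochberg (BH) adjustment used to obtain q-values. The Python sketch below is illustrative only. The feature-name suffixes other than the `_binK` and `_std` patterns quoted in the abstract, and the 1-based bin numbering, are assumptions.

```python
FEATURE_TYPES = ["area", "major", "minor", "ratio", "rMean", "gMean",
                 "bMean", "distMean", "distMax", "distMin"]
SUMMARY_STATS = ["mean", "std", "skewness", "kurtosis", "entropy"]


def patient_feature_names(n_words=10):
    """List feature names: n_words histogram bins plus 5 statistics per type."""
    names = []
    for ftype in FEATURE_TYPES:
        names += [f"{ftype}_bin{k}" for k in range(1, n_words + 1)]
        names += [f"{ftype}_{stat}" for stat in SUMMARY_STATS]
    return names


def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

With 150 features tested per gene, a feature is reported only if its q-value, not its raw p-value, falls below 0.05, which is why many nominally significant features do not survive the correction.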

109

IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS

Hui Qu1, Subhajyoti De2, Dimitris Metaxas1

1Rutgers University, 2Cancer Institute of New Jersey

Dimitris, Metaxas

Intra-tumor heterogeneity, i.e. genetic, molecular, and phenotypic differences between tumor cells within a single tumor, is a major challenge for clinical management of cancer patients, contributing to therapeutic failure, disease relapses and drug resistance. While recent findings suggest that there is extensive intra-tumor genetic heterogeneity in all major cancer types, it remains to be understood how that relates to intra-tumor heterogeneity at the pathway and cell phenotype level. We have developed an innovative computational framework based on neural networks to identify cellular features from histological slides and then associate them with genomic and pathway-level features in a multi-scale model, before applying it to a cohort of 469 bladder cancer samples which has genomic, transcriptomic, pathway, and histological imaging data. In brief, our method first uses a Tumor Segmentation Network (TSN) and a Nuclei Segmentation Network (NCN) to identify tumor cell regions and tumor nuclei in the histological slides. For tumor segmentation, we first extracted tumor and normal patches from the whole slide images of 40 patients, then trained a TSN to classify any patch as tumor or normal. Given any other whole slide image, the trained model can identify all tumor patches, which form the tumor regions after morphological operations. The segmented tumor regions and nuclei are then used to compute the q-statistic, and also alpha and beta diversity measures which reflect the extent of local and regional intra-tumor phenotypic heterogeneity. Benchmarking against pathologically curated estimates indicates that this approach has high accuracy in identifying tumor cell features in a heterogeneous tumor. We then integrate imaging and genomics data to predict aspects of phenotypic heterogeneity based on cancer-related mutations and gene expression using uni- and multivariate approaches such as Relation Network (RN). Our preliminary results are consistent with biological knowledge. For example, we estimated the number of subclones in each tumor based on mutation data, and observed that indeed the samples with a high number of subclones have high phenotypic heterogeneity scores. We also estimated the mRNA expression level of Ki67, a marker of cell growth, and observed that the samples with higher q-statistic also had higher Ki67 expression, suggesting that certain patterns of intra-tumor heterogeneity correlate with tumor cell growth rates. Multi-scale analysis integrating genetic, pathway and phenotypic heterogeneity will provide fundamental insights into “functional” variability within and across cancers, helping to refine precision medicine approaches to improve clinical management of cancer patients.

110

THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN

Li Shen1, David Kennedy2, Christian Haselgrove2, Abby Paulson3, Nina Preuss3, Robert Buccigrossi3, Matthew Travers3, Albert Crowley3, and The NITRC Team3

1Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 2Department of Psychiatry, University of Massachusetts Medical School; 3TCG, Inc.

Li, Shen

Aim of Investigation: Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) is a neuroinformatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, computational neuroscience, and imaging genomics tools and resources. We encourage researchers to list their Imaging Genomics tools at the NITRC website www.nitrc.org.

Methods: Initiated in 2006 through the NIH Blueprint for Neuroscience Research, NITRC’s mission is to foster a user-friendly knowledge environment for the neuroinformatics community. In 2012, NITRC added Imaging Genomics to its broadened scientific scope. By continuing to identify existing software tools and resources valuable to this community, NITRC’s goal is to support its researchers dedicated to enhancing, adopting, distributing, and contributing to the evolution of neuroinformatics analysis software, data, and compute resources.

Results: Located on the web at www.nitrc.org, the Resources Registry (NITRC-R) promotes software tools and resources, vocabularies, test data, and databases, thereby extending the impact of previously funded, neuroimaging informatics contributions to a broader community. NITRC-R gives researchers greater and more efficient access to the tools and resources they need, better categorizing and organizing existing tools and resources, facilitating interactions between researchers and developers, and promoting better use through enhanced documentation and tutorials, all while directing the most recent upgrades, forums, and updates. As of 11/2017, over 970 public resources are listed on NITRC-R, where the Imaging Genomics domain includes 60 resources such as ADNI, TCGA, ENIGMA, UK Biobank, and others. NITRC-Image Repository (NITRC-IR) makes 8,285 imaging sessions publicly available at no charge, and NITRC Computational Environment (NITRC-CE) provides cloud-based computation services downloadable to your machines or via commercial cloud providers such as Amazon Web Services and Microsoft Azure.

Conclusions: In summary, NITRC is now an established knowledge environment for the neuroimaging community where tools and resources are presented in a coherent and synergistic environment. With its expanded scope into imaging genomics, NITRC aims to become a trusted source for identification of resources in this highly active and promising domain bridging advanced neuroimaging and genomics. We encourage the imaging genomics research community to continue providing valuable resources, design and content feedback and to utilize these resources in support of data sharing requirements, software dissemination and cost-effective computational performance.

Acknowledgements: Funded by the NIH Blueprint for Neuroscience Research, NIBIB, NIDA, NIMH, and NINDS.

111

IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS

Arunima Srivastava1, Chaitanya Kulkarni1, Kun Huang2, Parag Mallick3, Raghu Machiraju1

1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University

Arunima, Srivastava

Convolutional Neural Networks (CNNs) have gained steady popularity as the selected method of histology image analysis and subsequent disease modeling. Since CNNs are purely data driven learning models, they have an edge over morphology driven (pre-selected) tissue image features that may be biased and difficult to generalize. Morphological features, namely tissue texture, structure, nuclei size and shape, presence of fibroblasts and lymphocytes etc., might not be comprehensive enough for different datasets, but they do provide an inherently interpretable characterization of the histology. While CNNs and their subsequent features prove to be powerful classifiers, they fail to provide an explanation for this classification, as the features are ONLY interpretable by the CNNs themselves. Translating “under the hood” activities of a CNN would endeavor to make it more generalizable while the final model will not only be able to effectively classify whole slide tissue images, it will also have the potential to educate us on the nuances of the histological data. This work aims to use both types of interpretable (morphological) and powerful but un-interpretable (CNN based) features to derive a signature for successful CNN models, which help relate them to known biological attributes and shed light on components that are critical to the various subtypes under investigation. We use a stratified breast cancer histology classification dataset from the BioImaging (2015) Challenge that contains sample images from four different kinds of breast tissue (Normal, Benign lesion, In-situ carcinoma and Invasive carcinoma). By following a two-pronged approach of modeling the same dataset using CNNs (using the GoogLeNet architecture) and morphological features (using CellProfiler - a biological image analytics tool), it was possible to infer an interpretable signature of features utilized by the CNN. We additionally explore the possibility of combining these two techniques to extract a more powerful and precise classification. This work summarizes the need for understanding the widely trusted models built using deep learning, and adds a layer of biological context to a technique that functioned as a classification only approach till now.

112

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

POSTER PRESENTATIONS

113

EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING

Steven E. Brenner1, Aashish N. Adhikari1, Yaqiong Wang1, Robert J. Currier2, Renata C. Gallagher3, Robert L. Nussbaum4, Yangyun Zou1, Uma Sunderam5, Joseph Sheih3, Flavia Chen3, Mark Kvale3, Sean D. Mooney6, Raj Srinivasan5, Barbara A. Koenig3, Pui Kwok3, Jennifer M. Puck3, The NBSeq Project

1University of California-Berkeley, 2California Department of Public Health, 3University of California-San Francisco, 4Invitae, 5Tata Consultancy Services, 6University of Washington

Steven, Brenner

The NBSeq project is evaluating effectiveness of whole exome sequencing (WES) for detecting inborn errors of metabolism (IEM) for newborn screening (NBS). De-identified archived dried blood spots from MS/MS true positive and false positive cases previously identified in the California NBS were studied. 18 out of 137 affected individuals lacked two rare potentially damaging single nucleotide variants or short indels in genes responsible for their Mendelian disorders. The sensitivity of causal mutation detection in 137 Phase I NBSeq exomes varied across disorders; all affected PKU cases were predicted correctly, but several cases of other IEMs were missed. In some cases, exomes also confidently identified disorders different from the metabolic center diagnoses, suggesting that sequencing information would have been valuable for proper clinical diagnoses in those cases. Deeper analysis of the data was undertaken to assess sources of discrepancy between sequencing results, MS/MS call, and clinical diagnosis. Copy number variation (CNV) calling tools were evaluated on NBSeq exomes for ability to resolve some of these exome false negatives. CNV tools can both miss CNVs in exomes and report them spuriously. We optimized tools for our data and filtered out genes (PRODH, HCFC1, ETFA) harboring common CNVs (identified from CNV calls on the 1000 genomes project exomes). This identified deletions in the correct genes for 4 of the 32 exome false negatives using XHMM: 2 isovaleric acidemia cases, 1 methylmalonic acidemia case and 1 OTC deficiency case. We also systematically reviewed every variant in 78 metabolic disorder genes annotated by HGMD or ClinVar as pathogenic or likely pathogenic with 1000 genomes MAF>0.1%. Our re-assessment of the primary literature for 59 such variants found that only 18 were reportable (many still VUS) and the rest we excluded from the pipeline. Literature review also helped identify 8 cases diagnosed with short-chain acyl-CoA dehydrogenase (SCAD) deficiency but not flagged by exomes. All 8 individuals harbored a common (1000 Genomes MAF: 18.2%) ACADS allele (c.625A>G) present in several NBSeq exomes, which sometimes confers a partial biochemical phenotype but not clinical disease. For assessment, we treated these individuals as unaffected. Incorporation of CNV detection and variant curation into our analysis pipeline improved overall sensitivity from 77.9% to 87.6% on the 137 affected Phase I NBSeq samples. This updated pipeline will be run on additional NBSeq exomes to assess the potential role for WES in NBS. While still not sufficiently specific alone for screening of most IEMs, WES can facilitate timely and more precise case resolution.

114

A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE)

J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai

Center for Personalized Medicine, Children’s Hospital of Los Angeles

Jonathan, Buckley

All sequencing technologies are subject to read errors which, in the context of variant calling (particularly low variant-allele-frequency (VAF) variant calling), can yield miscalls. Read errors are most problematic when genomic context (such as proximity to homopolymers) influences the error rate. The Center for Personalized Medicine at Children’s Hospital of Los Angeles (CHLA) recently collaborated with Thermo-Fisher (TF) in development of a clinical pediatric cancer panel for somatic variant detection (OncoKids™), using TF’s Ion Torrent sequencing platform. The test needed to identify variants in tumor sub-clones and in samples with an admixture of tumor and normal cells, both situations that can yield low VAFs. Our challenge was to optimize variant calling at homopolymer margins, and other genomic loci with a high background error rate (noise). The TF approach was to identify problematic loci and to either limit base calls to reads from one strand (when errors clustered mostly on the other strand), or ‘blacklist’ the locus altogether. While this approach was conservative, avoiding most false positives, it resulted in unacceptable false negative rates, particularly for InDels. Given the deep coverage (over 1000x in many regions), it seemed likely that a more nuanced approach might yield accurate calls, even in the presence of substantial noise. This presentation outlines an algorithm (Local Adjustment for Background, or LAB) developed at CHLA that uses a reference dataset (filtering out true positives) to establish the noise distribution at each locus. The noise distribution varies greatly across the panel genes, from essentially error-free loci to loci in which the majority of reads show a spurious base substitution or InDel. While proximity to a homopolymer is a strong determinant of noise, non-homopolymer regions can also have high noise and many homopolymers yield relatively clean data. Variant calls are made through comparison of the observed VAF with the locus-specific VAF distribution in the reference. Optionally, the reference set can be limited to samples of the same type as the test sample (e.g. FFPE). Adjustments may be made for samples with globally increased error rates. In regions of complex InDel patterns, a statistical model tests for shifts in these patterns, indicative of a true variant. An important component is a GUI that provides a visual representation of the basis for a call, and options such as strand-specific analysis. Application to samples with known SNVs and InDels (Acrometrix ‘ground truth’ samples) resulted in improvement in InDel calls from 65% to 100%. The presentation will describe the calling pipeline, with illustrative examples, and present comparative performance data.
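
The core comparison described above, an observed VAF judged against a locus-specific noise distribution, can be sketched in a few lines of Python. This is not the LAB algorithm itself, whose statistical model is not given in the abstract. It substitutes a simple empirical p-value, and the function names, the alpha cutoff and the example VAFs are all hypothetical.

```python
def empirical_noise_p(observed_vaf, reference_vafs):
    """Fraction of reference (noise) VAFs at this locus at least as large
    as the observed VAF, with an add-one correction."""
    exceed = sum(1 for v in reference_vafs if v >= observed_vaf)
    return (exceed + 1) / (len(reference_vafs) + 1)


def call_variant(observed_vaf, reference_vafs, alpha=0.05):
    """Call a variant only if the observed VAF is unusual for this locus."""
    return empirical_noise_p(observed_vaf, reference_vafs) < alpha


# Hypothetical reference data: one nearly error-free locus, one noisy locus
# (for example at a homopolymer margin) with background VAFs of 5% to 15%.
clean_locus = [0.0] * 99 + [0.001]
noisy_locus = [0.05 + 0.001 * i for i in range(100)]
print(call_variant(0.02, clean_locus), call_variant(0.02, noisy_locus))
```

The same 2% VAF is called at the clean locus but not at the noisy one, which is the behavior a locus-specific background adjustment is meant to deliver, in contrast to blacklisting the noisy locus outright.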

115

EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION

Jiang Gui, Xuemei Ji, Christopher I. Amos

Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth, Lebanon, NH 03756

The problem of identifying SNP-SNP interactions in case-control studies has been studied extensively and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP-SNP interactions in relation to censored survival data. We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of survival outcome. The proposed Efficient Survival MDR (ES-MDR) method handles censored data by modifying MDR’s constructive induction algorithm to use the log-rank test. We applied ES-MDR to genetic data of over 470,000 SNPs from the OncoArray Consortium. We used onset age of lung cancer and case-control (n=27,312) status as the survival outcome and divided the data into training and testing sets. We also adjusted for subject’s age, gender and smoking status. From the training set, we identified an interaction between SNPs from the BRCA1 and IL17RC genes as the top model associated with lung cancer onset age. This result was validated in the testing set. ES-MDR is capable of detecting interaction models with weak main effects. These epistatic models tend to be dropped by traditional regression approaches.
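
As a rough illustration of how a log-rank comparison can drive MDR's high-risk versus low-risk labeling, the Python sketch below compares each two-SNP genotype cell against all other samples. It is an assumption-laden sketch rather than the ES-MDR implementation: the labeling rule (more observed than expected events means high risk), the function names and the data layout are hypothetical, and no covariate adjustment is attempted.

```python
def log_rank(times, events, in_group):
    """Two-group log-rank comparison.

    times: follow-up or onset times; events: 1 if the event was observed,
    0 if censored; in_group: True for samples in the group of interest.
    Returns (observed minus expected events in the group, chi-square statistic).
    """
    o_minus_e = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk
                 if times[i] == t and events[i] and in_group[i])
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    return o_minus_e, chi2


def label_cells(times, events, genotype_pairs):
    """Label each two-SNP genotype cell 'high' or 'low' risk versus the rest."""
    labels = {}
    for cell in set(genotype_pairs):
        in_cell = [g == cell for g in genotype_pairs]
        o_minus_e, _ = log_rank(times, events, in_cell)
        labels[cell] = "high" if o_minus_e > 0 else "low"
    return labels
```

Pooling the high-risk cells into one group and the low-risk cells into another reduces the two-SNP genotype table to a single binary attribute, which can then be scored on held-out data.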

116

BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING

Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart

Mayo Clinic

Nicholas, Larson

An emerging challenge in the era of next-generation sequencing (NGS) is efficient data storage practices, particularly for file formats that accommodate ad hoc construction of analysis-ready datasets. The Variant Call Format (VCF) is the predominant file type used for storing and analyzing NGS-based genetic variant information. However, it presents multiple practical limitations when merging individual files for multi-sample representations. Recent development of the gVCF file format by GATK addresses many of these concerns by characterizing same-as-reference segments of the genome as interval entries defined by a shared genotype quality (GQ) score. Current default settings to generate this intermediate file format result in a new data entry at each base pair position the GQ shifts, presenting cost-benefit considerations of improved and computationally efficient multi-sample genotyping at the expense of large intermediate files. However, additional options allow for contiguous entries to be merged if they fall within a predefined GQ bin, a process known as banding. We hypothesized that substantial gVCF file size reduction could be attained for whole-genome sequencing (WGS) through the use of coarse GQ banding options, although the impact of this approach on output quality of multi-sample variant calling is currently unknown. To investigate the properties of gVCF banding on genotyping integrity, we processed 50 WGS samples as well as 50 whole-exome sequencing (WES) samples from the Mayo Clinic Biobank under a variety of GQ banding settings (default, intervals of 10, {0,20,60}, {0,20}). These single-sample gVCF files were subsequently merged and joint genotyped under varying combinations of banding options, separately by sequencing application, and output genotypes for chromosome 22 were compared for concordance with results using complete information (i.e., no banding). Overall, WGS samples exhibited substantially smaller gVCF files, with {0,20} banding resulting in a mean file size reduction of 87% (range: 84-90%) relative to default settings. Genotype concordance exceeded 99.9% under all comparisons, while we additionally observed more variable positions emitted as coarser bin definitions were applied. Comparable findings were observed for WES data. Our results highlight impressive improvements in NGS variant call data storage efficiency gained by coarse banding options for gVCF output, with minimal impact on accompanying genotyping quality.
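
The effect of banding on the number of reference-block entries can be shown with a small Python sketch. It is a simplified model of the idea, not GATK's implementation: the function name, the record layout and the choice to keep the minimum GQ per block are assumptions made for illustration.

```python
from bisect import bisect_right


def band_reference_blocks(gq_by_position, band_starts):
    """Merge adjacent same-as-reference positions whose GQ falls in one band.

    gq_by_position: list of (position, GQ) tuples sorted by position.
    band_starts: sorted lower bounds of the GQ bands, e.g. [0, 20].
    Returns a list of block dicts with start, end, band index and minimum GQ.
    """
    blocks = []
    for pos, gq in gq_by_position:
        band = bisect_right(band_starts, gq) - 1
        last = blocks[-1] if blocks else None
        if last and last["band"] == band and last["end"] == pos - 1:
            # Same band and directly adjacent: extend the current block.
            last["end"] = pos
            last["min_gq"] = min(last["min_gq"], gq)
        else:
            blocks.append({"start": pos, "end": pos, "band": band, "min_gq": gq})
    return blocks


# Hypothetical GQ values at six consecutive reference positions.
gqs = [(100, 35), (101, 40), (102, 18), (103, 19), (104, 61), (105, 99)]
fine = band_reference_blocks(gqs, list(range(0, 100)))  # one band per GQ value
coarse = band_reference_blocks(gqs, [0, 20])            # the {0,20} setting
print(len(fine), len(coarse))
```

In this toy example the six single-position entries collapse to three blocks under {0,20} banding, which mirrors the file size reduction reported above at the cost of coarser per-position quality information.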

117

IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX

Jae-Hyung Lee1, Su-Kyeong Hwang2, Jung-eun Yang3, Chae-Seok Lim3, Jin-A Lee4, Kyungmin Lee5, Bong-Kiun Kaang3, Yong-Seok Lee6

1Kyung Hee University, 2Kyungpook National University Hospital, 3Seoul National University, 4Hannam University, 5Kyungpook National University Graduate School of Medicine, 6Seoul National University College of Medicine

Yong-Seok, Lee

Tuberous sclerosis complex (TSC) is a neurocutaneous disorder characterized by multiple symptoms including neuropsychological deficits such as seizures, intellectual disability, and autism. TSC is inherited in an autosomal dominant pattern and is caused by mutations in either the TSC1 or TSC2 genes, which result in the hyperactivation of the mammalian target of rapamycin (mTOR) signaling pathway. In this study, we identified a novel small deletion mutation in TSC2 by performing whole exome sequencing in a Korean patient, who exhibited multiple TSC-associated symptoms including frequent seizures, intellectual disability, language delays, and social problems. In addition, we validated the functional significance of the novel mutation by examining the effect of the deletion mutant on mTOR pathway activation. Recent studies have suggested that mTOR inhibitors such as rapamycin can be effective to treat TSC-associated deficits in rodent models of TSC. Accordingly, we found that everolimus treatment has beneficial effects on SEGA size and autism related behaviors in the patient.

118

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE

Alena Orlenko1, Jason H. Moore1, Patryk Orzechowski1,2, Randal S. Olson1, Junmei Cairns3, Pedro J. Caraballo3, Richard M. Weinshilboum3, Liewei Wang3, Matthew K. Breitenstein1

1University of Pennsylvania; 2AGH University of Science and Technology, Krakow, Poland; 3Mayo Clinic

Alena, Orlenko

With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
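The abstract does not spell out how residual adjustment was carried out. A minimal sketch of one common reading, assuming a single continuous confounder (here a simulated BMI) whose linear effect is regressed out of every metabolite feature before the features are handed to an AutoML tool, could look as follows. The function name `residualize` and all data are hypothetical and are not taken from the study.

```python
import numpy as np

def residualize(features, confounder):
    """Return features with the linear effect of one confounder removed."""
    # Design matrix: an intercept column plus the confounder (e.g. BMI).
    design = np.column_stack([np.ones(len(confounder)), confounder])
    # Ordinary least squares fit of every feature column on the design matrix.
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    # Residuals are what remains after subtracting the fitted confounder effect.
    return features - design @ coef

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
bmi = rng.normal(27.0, 4.0, size=200)
metabolites = np.column_stack([
    0.5 * bmi + rng.normal(size=200),  # feature confounded by BMI
    rng.normal(size=200),              # feature unrelated to BMI
])
adjusted = residualize(metabolites, bmi)
```

The adjusted matrix keeps its shape, and each column is linearly uncorrelated with the confounder, so a downstream model cannot rank a feature highly merely because it tracks BMI.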

119

PHARMGKB: NEW WEBSITE RELEASE 2017

Michelle Whirl-Carrillo1, Ryan M. Whaley1, Mark Woon1, Katrin Sangkuhi1, Li Gong1, Julia Barbarino1, Caroline Thorn1, Rachel Huddart1, Maria Alvarellos1, Jill Robinson1, Russ B. Altman2, Teri E. Klein3

1Department of Biomedical Data Science, Stanford University; 2Department of Bioengineering, Medicine and Genetics, Stanford University; 3Department of Biomedical Data Science and Medicine, Stanford University

PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. The PharmGKB website allows users to select and view information via search, filter and browse options. Data is also available by direct download through the website and through the PharmGKB API. PharmGKB launched a new and improved user interface in September 2017. The new website offers benefits such as a display that works on mobile and small screen devices, improved searching and filtering capabilities, and faster page load speeds. While the look of PharmGKB has changed, all the content that was available previously is still available, including:

• 5500 annotated genetic variants
• 14,000 curated peer-reviewed PGx articles
• 125 evidence-based pharmacokinetic and pharmacodynamics pathways
• 60 reviews of key PGx genes (very important pharmacogenes)
• 450 curated drug labels
• 90 gene-drug pairs with curated genotype-based drug dosing guidelines

The website features an online tutorial that users can access by following the screen prompts. For more information, please visit PharmGKB at http://www.pharmgkb.org.

120

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

POSTER PRESENTATIONS

121

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS

Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1

1Ohio State University, 2Georgia Institute of Technology, 3Indiana University

Travis, Johnson

Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as “junk DNAs”, since they do not code for proteins in normal tissues. Although most human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There is evidence that some pseudogenes have adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be “reactivated” in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins, as observed in several cancer cell lines. All of the above shows that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% of PGG families, 13.3% of all families) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolutionary history of the olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks with respect to the Gene Ontology, from which we identify functions enriched in these pseudogene-gene families and infer the functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.

122

RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS

Duc-Hau Le1, Lieven Verbeke2, Le Hoang Son3, Dinh-Toi Chu4, Van-Huy Pham5

1Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam; 2Department of Information Technology, Ghent University-imec, Ghent, Belgium; 3VNU University of Science, Vietnam National University, Hanoi, Vietnam; 4Faculty of Biology, Hanoi National University of Education, Hanoi, Vietnam; 5Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

Duc-Hau, Le

Background
MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identification in the laboratory of disease-related miRNAs is not straightforward, numerous network-based methods have been developed to predict novel miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the target genes of the miRNAs. Here, we introduce a novel method based on a heterogeneous network that not only considers miRNAs but also the corresponding target genes in the network model.

Results
Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using a random walk with restart (RWR) based algorithm. Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses a RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of “disease modules” in the heterogeneous miRNA networks used as input for the algorithm. Moreover, we could demonstrate that RWRMTN is stable, performing well when using both experimentally validated and predicted miRNA-target gene interaction data for network construction. Finally, using RWRMTN, we identified 76 novel miRNAs associated with 23 disease phenotypes which were present in a recent database of known disease-miRNA associations.

Conclusions
Summarizing, using random walks on mutual miRNA-target networks improves the prediction of novel disease-associated miRNAs because of the existence of “disease modules” in these networks.
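The ranking step described above can be illustrated with a generic random walk with restart on a small undirected network that mixes miRNA and gene nodes. This is a minimal sketch of the textbook RWR iteration, not the RWRMTN implementation; the toy network, the node names and the restart probability of 0.5 are assumptions made for illustration.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at the seeds."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = adjacency / col_sums      # column-normalised transition matrix
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # restart distribution: uniform over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: three miRNAs and two target genes, undirected edges.
nodes = ["miR-A", "miR-B", "miR-C", "gene-X", "gene-Y"]
edges = [("miR-A", "gene-X"), ("miR-B", "gene-X"), ("miR-B", "gene-Y"), ("miR-C", "gene-Y")]
index = {name: i for i, name in enumerate(nodes)}
adjacency = np.zeros((len(nodes), len(nodes)))
for a, b in edges:
    adjacency[index[a], index[b]] = adjacency[index[b], index[a]] = 1.0

# Seeds: one known disease miRNA and its target gene, as in the abstract's setup.
scores = random_walk_with_restart(adjacency, seeds=[index["miR-A"], index["gene-X"]])
candidates = sorted(["miR-B", "miR-C"], key=lambda n: scores[index[n]], reverse=True)
```

In this toy network miR-B shares a target gene with the seed miRNA and therefore ranks above miR-C, which is only reachable through a second gene.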

123

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

POSTER PRESENTATIONS

124

MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS

Selen Bozkurt1, Jung In Park2, Daniel L. Rubin3, James D. Brooks4, Tina Hernandez-Boussard5

1Akdeniz University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Antalya, Turkey; 2Stanford University Department of Medicine (Biomedical Informatics); 3Stanford University Department of Radiology; 4Stanford University Department of Urology; 5Stanford University Department of Medicine (Biomedical Informatics)

Tina, Hernandez-Boussard

Electronic health records (EHRs) have potential for novel discovery of patient-centered outcomes that can be used to improve healthcare delivery. However, a significant amount of data stored in EHRs is hidden in clinical narratives as unstructured text. For prostate cancer patients, these clinical narratives contain a large amount of information. Previous work suggests that structured data regarding dysfunctions after treatment for prostate cancer are not consistently captured in the EHR and thus cannot be reliably extracted for clinical and research purposes. Therefore, in this preliminary study we propose a rule-based natural language processing pipeline to extract patient-centered outcomes related to the presence of urinary, bowel and erectile dysfunction following treatment of prostate cancer from the free text of the EHR notes. We developed a lexicon of terms related to urinary, bowel or erectile dysfunctions based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 100 randomly selected documents for each outcome from inpatient admissions was annotated by a research nurse to identify all related concepts as: present, negated, historical, and discussed risk. We developed a rule-based natural language processing (NLP) pipeline which uses dictionary mapping combined with the ConText algorithm. We trained our NLP pipeline using 1,336 documents and tested it on 20 documents to determine agreement with the human reference standard; standard precision, recall and overall accuracy rates were used as metrics to quantify the automatic annotation performance. The precision, recall, and accuracy scores for the urinary incontinence annotations against the reference standard output created by a domain expert were 62.5%, 100% and 76.9%, respectively. For most of the misclassified cases, which were annotated as presence of urinary incontinence by the NLP algorithm but not by the expert, medication information included in the term dictionary caused ambiguity regarding phenotype classification. For the erectile dysfunction annotations, precision was 100%, recall was 75% and overall accuracy was 90%. On the other hand, since no bowel dysfunction was reported in the randomly selected test set, evaluation metrics were not calculated. In this preliminary study, we have shown that it is possible to identify the patient-centered outcomes from the free text of EHRs using natural language processing. Using EHRs to assess patient-centered outcomes promotes population-based assessments of these valued yet difficult to assess outcomes and will enable detailed sensitivity and subgroup analysis. Such results will allow clinicians to individualize care for their patients. The results will also provide desperately needed evidence-based criteria for patient-centered outcomes. These criteria can be used in research studies, in clinical practice, and to develop practice guidelines. Future work will create a larger number of well-annotated datasets and combine our rule-based approach with machine learning techniques.
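The three reported metrics follow from the counts of a confusion matrix. The abstract gives only percentages, so the counts in this minimal sketch are hypothetical: they are illustrative values, not taken from the study, chosen because they yield the same percentages as those reported for urinary incontinence.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Compute precision, recall and overall accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)               # share of system positives that are correct
    recall = tp / (tp + fn)                  # share of reference positives that were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts (assumption): 5 true positives, 3 false positives,
# 0 false negatives, 5 true negatives.
precision, recall, accuracy = evaluation_metrics(tp=5, fp=3, fn=0, tn=5)
```

With these illustrative counts every error is a false positive, which matches the abstract's remark that the misclassified cases were flagged by the NLP algorithm but not by the expert.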

125

GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS

Soo Jun Park1, Jihyun Kim2, Soo Young Cho2, Charny Park2, Young Seek Lee3

1Electronics and Telecommunications Research Institute, 2National Cancer Center, 3Hanyang University

Soo Jun, Park

Researchers in biology and medicine often visit PubMed to find literature for their studies. While the keyword search in PubMed may be a popular tool for retrieving information, it is limiting as it only provides a small number of results. The keyword search does not allow the user to sift through decades' worth of research and extract all corresponding studies as needed. This poster presentation will provide solutions through a bio text mining system called GDMiner that identifies biological entities, extracts the relationships among those entities, and discovers associations between genes and diseases. When GDMiner collects abstracts from PubMed (PubMed collector), an automatic named entity recognizer sorts the information into 40 biological categories (Entity Recognizer). GDMiner then extracts relations from the biomedical categories (Relation Extractor) by using natural language processing techniques, like Part-of-Speech (POS) tagging and syntactic parsing. The display features graphs and tables showing the extracted relations. For example, a gene-disease association data query can be mined by analyzing the relations between genes and diseases. The system consists of the following three parts: PubMed collector, relation extractor and relation analyzer. The PubMed collector requests abstracts matching a query given by a user and fetches them. The relation extractor divides abstracts into sentences and recognizes biomedical named entities in sentences. Then, the relation analyzer extracts relational events among recognized entities. Relations are extracted by syntactic analysis, not by co-occurrence information. Our system parses sentences syntactically in the form of Penn Treebank syntactic tags and extracts relations by analyzing the parsing results. Our rules are simple and few because the syntactic tag set has fewer tags than the POS tag set, yet they are not limited to particular relation types. The relation viewer accumulates extracted relations and visualizes them in graphs and tables. If the number of nodes in the generated relationship network is small, it is easy for the user to find the relationship between desired bio objects (named entities). However, if the generated network is very large, it is very difficult to find the relations. Our system helps the user find the relation between the desired bio objects by creating a small sub-network using the search and filtering function. There is a rapidly growing interest in properly utilizing biomedical literature within the research community, and the rate at which the biomedical literature is accumulating is accelerating worldwide. Not only preserving data, but also the way in which researchers extract information, is important in aiding future biological studies and discoveries. Implementing an automated system is necessary for keeping up with this growth and for accurately finding information analogous to a researcher's search.

126

WORKSHOP

MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY

POSTER PRESENTATIONS

127

METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES

Vojtech Huser1, Michael G. Kahn2, Jeffrey S. Brown3, Ramkiran Gouripeddi4

1National Library of Medicine, National Institutes of Health, 8600 Rockville Pk, Bld 38a, Bethesda, MD, 20852, USA, Email: [email protected]; 2Department of Pediatrics, University of Colorado, 13001 East 17th Place, MS-F563, Aurora, CO 80045, USA, Email: [email protected]; 3Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 401 Park Drive, Suite 401 East, Boston, MA 02215, USA, Email: [email protected], [email protected]; 4University of Utah, School of Medicine, Salt Lake City, 84102, Utah, USA, Email: [email protected]

Vojtech, Huser

This paper summarizes the content of the workshop focused on data quality. The first speaker (VH) described the data quality infrastructure and data quality evaluation methods currently in place within the Observational Health Data Sciences and Informatics (OHDSI) consortium. The speaker described in detail a data quality tool called Achilles Heel and the latest developments for extending this tool. Interim results of an ongoing Data Quality study within the OHDSI consortium were also presented. The second speaker (MK) described lessons learned and new data quality checks developed by the PEDsNet pediatric research network. The last two speakers (JB, RG) described tools developed by the Sentinel Initiative and the University of Utah's service oriented framework. Throughout the workshop and at its end, participants discussed how data quality assessment can be advanced by combining the best features of each network.

128

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP

Sungho Kim, Taehun Kim

Yeungnam University, DGIST

Sungho, Kim

Several multi-class strategies for Support Vector Machines (SVMs) have been developed to perform multi-class classification, such as One Versus One, One Versus All and Directed Acyclic Graph. These strategies do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed Weighted Voting resolves this problem by weighting the voting values according to the distance from the boundary, and the proposed Voting Drop further enhances the performance of the SVMs. The proposed Weighted Voting is based on the voting method. The voting method is carried out by accumulating votes, then choosing the most voted class. The proposed Weighted Voting method weights each voting value by reflecting the distance from the boundary and the margin. The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because some SVMs respond redundantly. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the classes it has not learned. This is why, when an SVM predicts data belonging to a non-learned class, the SVM responds redundantly. Such an irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVMs. The algorithm finds the irrelevant SVMs, then drops the votes they cast. An irrelevant SVM is found by finding the least voted class, because the least voted class can be regarded as irrelevant to the input data. As shown in the experiments, reflecting the distance from the hyper-plane and removing the redundant SVMs' votes leads to higher performance. The proposed methods can be used for a range of classification tasks.
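The abstract gives no formulas, so the following is only one possible reading of the two ideas, sketched under stated assumptions: each One Versus One SVM supplies a signed decision value (positive for the first class of its pair), a vote is weighted by the absolute decision value clipped at the margin of 1, and Voting Drop discards every SVM trained on the least voted class before recounting. The function names and the example decision values are hypothetical.

```python
def weighted_votes(decisions, classes, skip=None):
    """Accumulate distance-weighted votes, ignoring SVMs trained on the skipped class."""
    votes = {c: 0.0 for c in classes if c != skip}
    for (first, second), value in decisions.items():
        if skip in (first, second):
            continue                          # voting drop: this SVM is treated as irrelevant
        winner = first if value > 0 else second
        votes[winner] += min(abs(value), 1.0)  # assumption: weight clipped at the margin
    return votes

def predict_with_voting_drop(decisions, classes):
    """Weighted voting, then drop SVMs involving the least voted class and recount."""
    first_round = weighted_votes(decisions, classes)
    least_voted = min(first_round, key=first_round.get)
    second_round = weighted_votes(decisions, classes, skip=least_voted)
    return max(second_round, key=second_round.get)

# Hypothetical decision values from three pairwise SVMs for one input.
classes = ["A", "B", "C"]
decisions = {("A", "B"): 0.05, ("B", "C"): 0.9, ("A", "C"): -0.8}
first_round = weighted_votes(decisions, classes)
prediction = predict_with_voting_drop(decisions, classes)
```

With unweighted voting this input produces a three-way tie, one vote per class. Weighting shows that the vote for class A came from a point lying almost on the hyper-plane, and dropping the SVMs that involve A leaves only the B-versus-C classifier to decide.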

129

A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES

Quynh T. Tran1, Lee Larcombe2, Subhashini Arimilli3, G. L. Prasad1

1Reynolds American Inc. Services Company - Winston Salem NC - USA 27105; 2Applied Exomics Ltd - Stevenage UK SG1 2FX; 3Wake Forest Baptist Health - Winston Salem NC USA 27104

Quynh, Tran

Background: Chronic cigarette smoking is known to cause immune suppression, which in turn contributes to increased susceptibility to cancer. However, there is limited information on the effects of non-combustible tobacco products, such as moist snuff. To better understand the molecular changes that result from consumption of different tobacco products, global profiling techniques have been extensively utilized. A limitation of such approaches is that differential gene expression alone may be insufficient to identify both the source of perturbation and the extent to which perturbations propagate through a network of interacting genes. Systems biology tools support the analysis and integration of complex datasets, and provide a holistic view of the underlying biological changes. Hence, we implemented a network-based analysis tool to elucidate molecular changes that arise from the use of different tobacco products.

Methods: We developed an analytical approach to quantify and visualize gene-level perturbation scores of a pre-identified network. This approach differentiates biological effects of multiple treatments, using genome-scale expression data and considering interactome-wide effects. We utilized a microarray gene expression dataset of peripheral blood mononuclear cells treated with aqueous extracts of whole smoke conditioned medium (WS-CM) and smokeless tobacco extract (STE) prepared from 3R4F cigarettes and 2S3 moist snuff reference tobacco products, respectively, at baseline and after stimulation with toll-like receptor (TLR) agonists. The analytical pipeline takes normalized gene expression values and performs the following steps: 1) generates gene-level network scores using a weighted topology approach considering both the gene expression data and the full human interactome information available in IntAct (a literature-curated molecular interaction database); 2) derives gene-level perturbation scores for each treatment condition compared to its baseline; and 3) calculates a single impact score for each exposure condition and creates a network graph to be visualized using Cytoscape.

Results: The pipeline was applied to calculate impact scores under each stimulation and each treatment condition for an inflammatory response network, signaling through triggering receptor expressed on myeloid cells 1 (TREM1). Samples stimulated with TLR agonists had higher scores, indicating more perturbation, than non-stimulated samples. Those exposed to higher WS-CM doses received higher scores than those exposed to lower doses of WS-CM. Samples exposed to STE received a lower score, suggesting that STE treatment perturbed the TREM1 network to a lower degree than WS-CM. On the other hand, the classical differential gene expression analysis did not identify significant changes in gene expression for STE-treated samples stimulated with TLR agonists, compared to untreated cells.

Conclusions: In summary, this network scoring methodology suggests that, under these conditions, STE exerts less perturbation on select immune networks compared to combustible tobacco products. These scores potentially serve as tools to differentiate the biological effects resulting from different tobacco classes.

130

AUTHOR INDEX

A

Abe,Sumiko·51Abyzov,Alexej·79Acharya,Ambika·5Achour,Ikbel·33Adhikari,AashishN.·113Adhikari,BhimM.·53Agrawal,Monica·8,70Aldana,Julian·76Al-Ghalith,Gabriel·75Alkan,Can·77,90Allette,Kimaada·2Alser,Mohammed·77,90Altman,RussB.·5,47,62,119Alvarellos,Maria·119Ambite,JoséLuis·51Amos,ChristopherI.·115Anderson,PaulE.·98Arimilli,Subhashini·129

B

Bada,Michael·45Bae,MinGyun·78,96,100Bae,Taejeong·79Baheti,Saurabh·116Baladandayuthapani,Veerabhadran·48Barbarino,Julia·119Bartonicek,Nenad·103BaumgartnerJr.,WilliamA.·37,45Beam,AndrewL.·83Beaulieu-Jones,BrettK.·9Bechheim,Matthias·92Behsaz,Bahar·80Berghout,Joanne·25Bharath,Karthik·48Bhattrai,Avnish·51Biegel,J.·114Bilke,Sven·88Blach,Colette·18Blangero,John·53Bobacz,Klaus·95Boguslav,Mayla·37Bowtell,DavidD.L.·103Bozkurt,Selen·124Bradford,Yuki·58Breitenstein,MatthewK.·28,118Brenner,StevenE.·113Bright,RoselieA.·5Brooks,JamesD.·124Brown,JeffreyS.·127Buccigrossi,Robert·110

Buchan,Z.R.·72Buckley,J.·114Bull,ShelleyB.·104Burns,Gully·51Bush,WilliamS.·57,81,102Butkiewicz,Mariusz·81

C

Cairns,Junmei·28,118Cali,DamlaS.·90Callahan,TiffanyJ.·45Capra,JohnA.·57,102Caraballo,PedroJ.·28,118CarsonIII,WilliamE.·43Cha,Hongui·65Chance,MarkR.·97Chen,Bin·82Chen,Flavia·113Chen,MichaelL.·83Chen,Rong·10,42Chen,Xiao·88Chen,Xintong·2Chen,Youdinghuan·71Chen,Yuying·82Cheng,Chao·71Cheng,Jun·108Cheng,Liang·108Chesi,Alessandra·35Cheung,Philip·67Chi,Chih-Lin·26Chidester,Benjamin·20Ching,Travers·60Cho,SooYoung·125Choe,EunKyung·84Choi,JungKyoon·78,96,100Christensen,BrockC.·71Chu,Dinh-Toi·122Chuang,Han-Yu·88Clark,NeilR.·3,64Cohen,K.Bretonnel·37Cooper,BruceA.·93Cox,RobertW.·53Crawford,DanaC.·57Crowley,Albert·110Currier,RobertJ.·113

D

deBelle,J.Steven·67De,Subhajyoti·109Deng,Siyuan·43Dinger,MarcelE.·103Do,MinhN.·20

131

Doherty,JenniferA.·11Dong,Zhuxin·94Dorrestein,PieterC.·80Duan,Qiaonan·3,64Dudley,JoelT.·3,10,12,42,64

E

Eckel-Passow,JeanetteE.·116Ergin,Oğuz·77,90Erikson,GalinaA.·85

F

Farhat,Maha·83Feng,Qianjin·108Fenger,Douglas·67Fieremans,Els·53Fierro,Lily·51Fish,AlexandraE.·57Fisher,MarkF.·80Flotte,T.J.·72Flotte,W.·72Foster,Ian·33

G

Gai,X.·114Gallagher,RenataC.·113Garmire,LanaX.·60Geiersbach,K.B.·72Gerstein,Mark·86Ghose,Saugata·90Glahn,DavidC.·53Glicksberg,BenjaminS.·10,12,42,82Gong,Li·119Gordon,Jonathan·51Gouripeddi,Ramkiran·127Grant,Gregory·99Grant,StruanF.A.·35Greene,CaseyS.·6,11,68Greenside,Peyton·41Griffith,Malachi·16Griffith,ObiL.·16Gui,Jiang·115Guo,Caiwei·46Gupta,Anika·27Gurevich,Alexey·80Gursoy,Gamze·86

H

Haas,DavidW.·58Haines,JonathanL.·81Hall,MollyA.·35

Han,Jiali·33Han,Jiawei·39Han,Lichy·54Han,Zhi·108Harrington,LiaX.·11Hart,StevenN.·72,116Hartman,Nicholas·31Haselgrove,Christian·110Hassan,Hasan·77,90He,Lu·26He,Mingze·73Hernandez-Boussard,Tina·124Hiemenz,M.·114Hillenmeyer,Maureen·41Hobbs,BrianP.·48Hodos,Rachel·3,12,64Hong,L.Elliot·53Hornung,Veit·92Horton,IainF.·116Houten,Sander·2Hu,Jianying·3,64Hu,Xiao·93Huang,Chenglong·21Huang,EdwardW.·38,39Huang,Heng·22,107Huang,Kun·34,55,108,111,121Huang,Ling·85Huddart,Rachel·119Hudson,TiaTate·87Huff,ChadD.·105Hunter,LawrenceE.·37,45Huo,Zhouyuan·22Huser,Vojtech·127Hwang,Su-Kyeong·117

I

Ideker,Trey·39

J

Jahanshad,Neda·53Jain,Priyambada·51Jenkins,NicoleP.·71Jeong,Hyun-Hwan·46Ji,Xuemei·115Johnson,Abigail·75Johnson,KippW.·10,12Johnson,TravisS.·34,121Ju,JinHyun·88Jung,Jae-Yoon·27

K

Kaang,Bong-Kiun·117Kahn,MichaelG.·127Kamdar,Jeana·51

132

Kamdar,MaulikR.·54Kang,JoonHo·65Kawakubo,Hideko·89,101Kennedy,David·110Kennedy,Eamonn·94Kettenbach,ArminjaN.·71Kho,JohnathanR.·34,121Kidd,Brian·3,64Kim,Dokyoon·23Kim,EunJi·99Kim,JeremieS.·90Kim,Jihyun·125Kim,Jinho·65Kim,Suh-Ryung·104Kim,SunAh·104Kim,Sungho·91,128Kim,Taehun·91,128Kim-Hellmuth,Sarah·92Klein,TeriE.·119Knights,Dan·75Kober,KordM.·93Kochunov,Peter·53Koenig,BarbaraA.·113Kohane,IsaacS.·83Kolmogorov,Mikhail·94Krunic,Milica·95Kulkarni,Anagha·62Kulkarni,Chaitanya·55,111Kulkarni,Shashikant·16Kundaje,Anshul·41Kvale,Mark·113Kwok,Pui·113

L

LaCava,William·13Lahens,Nicholas·99Lake,Bethany·31Lappalainen,Tuuli·92Larcombe,Lee·129Larson,NicholasB.·116Lawrence-Dill,CarolynJ.·73Le,Duc-Hau·122Lee,Boram·65Lee,Donghyuk·90Lee,Hao-Chih·3,64Lee,Jae-Hyung·117Lee,Jin-A·117Lee,JunHyeong·78,96,100Lee,Kyungmin·117Lee,SangWoo·84Lee,Seunggeun·23Lee,Taeyeop·78,96,100Lee,Yong-Seok·117Lee,YoungSeek·125Lei,Xiaoxiao·51Lerman,Kristina·51Leskovec,Jure·8,70Li,Binglan·58

Li,Fuhai·43Li,Haiquan·33Li,Jianrong·25,33Li,Justin·66Li,Li·10,42Li,Qike·25,30Li,Sihong·34,121Lim,Chae-Seok·117Liu,Gang·66Liu,Ke·82Liu,Zhandong·46Losic,Bojan·2Luo,Yunan·4Lussier,YvesA.·25,30,33

M

Ma,Jian·20Ma,Jianzhu·39Ma’ayan,Avi·3,64Machiraju,Raghu·55,111Madhavan,Subha·16Maglinte,D.·114Mallick,Parag·55,111Mallory,EmilyK.·5,62Manduchi,Elisabetta·35Mariani,Jessica·79Marotti,JonathanD.·71Matsui,Yusuke·89,101Maxwell,Sean·97McCoy,Matthew·16McDonnell,ShannonK.·116McGarvey,Peter·16Metaxas,Dimitris·109Miaskowski,Christine·93Micheel,Christine·16Miller,JasonE.·23Miller,ToddW.·71Miotto,Riccardo·10Mishkanian,Ben·88Mohammadi,Pejman·92Mohimani,Hosein·80Molina,MonicaCala·76Mooney,SeanD.·113Moore,AbigailE.·98Moore,JasonH.·9,13,17,28,35,118Morishita,Hirofumi·42Mounajjed,T.·72Müller-Myhsok,Bertram·92Mustahsan,Zairah·13Mutlu,Onur·77,90Mylne,JoshuaS.·80

N

Nam,Hyunha·101Nayak,Soumyashant·99Ng,ChaanS.·48

133

Nho,Kwangsik·23,107Nichols,ThomasE.·53Norgan,A.P.·72Novikov,DmitryS.·53Nussbaum,RobertL.·113

O

O’Driscoll,Caroline·51Oh,Jaeho·78,96,100Olson,RandalS.·13,17,28,118Orlenko,Alena·28,118Orzechowski,Patryk·9,28,118Ostrow,D.·114

P

Park,Charny·125Park,JungIn·124Park,SooJun·125Park,Woong-Yang·65Paskov,KelleyM.·27Paul,StevenM.·93Paulson,Abby·110Payne,PhilipR.O.·43Peng,Jian·4,39Pesce,Lorenzo·33Petkovic,Dragutin·47,62Pevzner,PavelA.·80,94Pham,Van-Huy·122Poole,Sarah·29Pouladi,Nima·25Prasad,G.L.·66,129Preuss,Nina·110Previde,Paul·62Prjibelski,Andrey·80Puck,JenniferM.·113Pütz,Benno·92Pyc,MaryA.·67

Q

Qu,Hui·109

R

RachidZaim,Samir·30Rao,Shruti·16Ravvaz,Kourosh·26Regan,Kelly·43Rensi,StefanoE.·5Reynolds,RichardC.·53Risacher,ShannonL.·23,107Ritchie,MarylynD.·14,58Ritter,Deborah·16

Robinson,Jill·119Roy,Angshumoy·16Rubin,DanielL.·124Ryutov,A.·114

S

Salas,LucasA.·71Sangkuhi,Katrin·119Sarkar,IndraNeil·50Saykin,AndrewJ.·23,107Schissler,A.Grant·30Schmitt,Peter·17Schumacher,Johannes·92Sebra,RobertP.·2Shah,K.K.·72Shah,Nigam·29Shameer,Khader·10,12Sharma,Vivekanand·50Shearer,Gregory·31Sheih,Joseph·113Shen,Dinggang·22Shen,Li·107,110Shestov,Maksim·17Shimamura,Teppei·89,101Shin,Hyun-Tae·65Shivakumar,ManuK.·23Shoemaker,Katherine·48Shokhirev,Maxim·85Shukla,Dinesh·53Shulman,Joshua,M.·46Sinha,Aakanchha·51Sivley,R.Michael·102Smarr,Larry·80Smith,MiloR.·42Snedecor,June·88Son,LeHoang·122Sonkin,Dmitriy·16Sontag,David·3,64Srinivasan,Raj·113Srivastava,Arunima·55,111Stefanski,AdrianneL.·45Stewart,Crystal·51Stockham,NateT.·27Stolovitzky,Gustavo·2Sun,MinWoo·27Sunderam,Uma·113

T

Tenenbaum,JessicaD.·18Thomas,Brook·62Thompson,PaulM.·53Thorn,Caroline·119Timp,Gregory·94Tintle,Nathan·31Tomasini,Livia·79Tonellato,PeterJ.·26

134

Torpy,JamesR.·103Tran,QuynhT.·129Travers,Matthew·110Triche,T.·114Tripodi,Ignacio·45Tully,Tim·67Turnbaugh,PeterJ.·5

U

Urban,AlexanderE.·79

V

Vaccarino,FloraM.·79VanHorn,JohnDarrell·51Vangay,Pajau·75Varik,Akshay·13Veraart,Jelle·53Verbeke,Lieven·122Verma,Anurag·58Verma,ShefaliS.·58Veturi,YogasudhaC.·14,58Vigil,Arthur·47vonHaeseler,Arndt·95

W

Wall,DennisP.·27Wang,Fei·3,64Wang,Liewei·28,118Wang,Sheng·4,38,39Wang,Yaqiong·113Wang,Yue·71Wang,Ze·107Wang,Zichen·3,64Watson,DennisK.·98Watson,PatriciaM.·98Way,GregoryP.·6,11,68Weinshilboum,RichardM.·28,118Weissert,John·26

Westra,Jason·31Whaley,RyanM.·119Whirl-Carrillo,Michelle·119White,ElizabethK.·45Williams-DeVane,ClarLynda·87Wilson,RobertC.·98Wong,Mike·47,62Woon,Mark·119

X

Xiao,Guanghua·21Xiao,Jinfeng·4Xin,Hongyi·77,90Xu,Jielin·43

Y

Yalamanchili,HariKrishna·46Yang,Jung-eun·117Yao,Xiaohui·107Yoo,YunJoo·104Yu,MichaelKu·39Yu,Yao·105Yun,JaeWon·65

Z

Zeng,William·82Zhai,ChengXiang·38Zhang,Albert·21Zhang,Jie·108Zhang,Ping·3,64Zhang,Yan·34,121Zheng,Brandon·98Zheng,Fan·39Zhou,Bo·79Zitnik,Marinka·8,70Zou,Yangyun·113Zuluaga,Martha·76

