

Page 1: PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018psb.stanford.edu/previous/psb18/conference-materials/...PACIFIC SYMPOSIUM ON B

PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Abstracts are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.


TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 1

CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY ..... 2
Xintong Chen, Sander Houten, Kimaada Allette, Robert P. Sebra, Gustavo Stolovitzky, Bojan Losic

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 3
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley

LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION ..... 4
Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng

CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME ..... 5
Emily K. Mallory, Ambika Acharya, Stefano E. Rensi, Peter J. Turnbaugh, Roselie A. Bright, Russ B. Altman

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 6
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 7

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 8
Monica Agrawal, Marinka Zitnik, Jure Leskovec

MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE ..... 9
Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore

AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS ..... 10
Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley

FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES ..... 11
Lia X. Harrington, Gregory P. Way, Jennifer A. Doherty, Casey S. Greene

CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA ..... 12
Kipp W. Johnson, Benjamin S. Glicksberg, Rachel Hodos, Khader Shameer, Joel T. Dudley


DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS ..... 13
Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore

HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES? ..... 14
Yogasudha C. Veturi, Marylyn D. Ritchie

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 15

CLINGEN CANCER SOMATIC WORKING GROUP – STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH ..... 16
Subha Madhavan, Deborah Ritter, Christine Micheel, Shruti Rao, Angshumoy Roy, Dmitriy Sonkin, Matthew McCoy, Malachi Griffith, Obi L. Griffith, Peter McGarvey, Shashikant Kulkarni, on behalf of the Clingen Somatic Working Group

A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS ..... 17
Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson

BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE ..... 18
Jessica D. Tenenbaum, Colette Blach

IMAGING GENOMICS ..... 19

DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS ..... 20
Benjamin Chidester, Minh N. Do, Jian Ma

DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION ..... 21
Chenglong Huang, Albert Zhang, Guanghua Xiao

GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL ..... 22
Zhouyuan Huo, Dinggang Shen, Heng Huang

CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER ..... 23
Jason E. Miller, Manu K. Shivakumar, Shannon L. Risacher, Andrew J. Saykin, Seunggeun Lee, Kwangsik Nho, Dokyoon Kim

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 24

SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET ..... 25
Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier

USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT ..... 26
Chih-Lin Chi, Lu He, Kourosh Ravvaz, John Weissert, Peter J. Tonellato


COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES ..... 27
Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 28
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein

ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS ..... 29
Sarah Poole, Nigam Shah

EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES ..... 30
Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier

ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS ..... 31
Jason Westra, Nicholas Hartman, Bethany Lake, Gregory Shearer, Nathan Tintle

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 32

CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS ..... 33
Jiali Han, Jianrong Li, Ikbel Achour, Lorenzo Pesce, Ian Foster, Haiquan Li, Yves A. Lussier

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 34
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang

LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS ..... 35
Elisabetta Manduchi, Alessandra Chesi, Molly A. Hall, Struan F.A. Grant, Jason H. Moore

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 36

IMPROVING PRECISION IN CONCEPT NORMALIZATION ..... 37
Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter

VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION ..... 38
Edward W. Huang, Sheng Wang, ChengXiang Zhai

ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS ..... 39
Sheng Wang, Jianzhu Ma, Michael Ku Yu, Fan Zheng, Edward W. Huang, Jiawei Han, Jian Peng, Trey Ideker


PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 40

PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES ..... 41
Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje

LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS ..... 42
Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley

DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES ..... 43
Jielin Xu, Kelly Regan, Siyuan Deng, William E. Carson III, Philip R.O. Payne, Fuhai Li

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ..... 44

OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE ..... 45
Tiffany J. Callahan, William A. Baumgartner Jr., Michael Bada, Adrianne L. Stefanski, Ignacio Tripodi, Elizabeth K. White, Lawrence E. Hunter

AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA ..... 46
Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu

IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH ..... 47
Dragutin Petkovic, Russ B. Altman, Mike Wong, Arthur Vigil

TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY ..... 48
Katherine Shoemaker, Brian P. Hobbs, Karthik Bharath, Chaan S. Ng, Veerabhadran Baladandayuthapani

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH ..... 49

IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS ..... 50
Vivekanand Sharma, Indra Neil Sarkar

DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING ..... 51
John Darrell Van Horn, Lily Fierro, Jeana Kamdar, Jonathan Gordon, Crystal Stewart, Avnish Bhattrai, Sumiko Abe, Xiaoxiao Lei, Caroline O’Driscoll, Aakanchha Sinha, Priyambada Jain, Gully Burns, Kristina Lerman, José Luis Ambite

IMAGING GENOMICS ..... 52

HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE ..... 53
Bhim M. Adhikari, Neda Jahanshad, Dinesh Shukla, David C. Glahn, John Blangero, Richard C. Reynolds, Robert W. Cox, Els Fieremans, Jelle Veraart, Dmitry S. Novikov, Thomas E. Nichols, L. Elliot Hong, Paul M. Thompson, Peter Kochunov


MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS ..... 54
Lichy Han, Maulik R. Kamdar

BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES ..... 55
Arunima Srivastava, Chaitanya Kulkarni, Parag Mallick, Kun Huang, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 56

LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS ..... 57
Alexandra E. Fish, Dana C. Crawford, John A. Capra, William S. Bush

EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION ..... 58
Binglan Li, Shefali S. Verma, Yogasudha C. Veturi, Anurag Verma, Yuki Bradford, David W. Haas, Marylyn D. Ritchie

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 59

PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA ..... 60
Travers Ching, Lana X. Garmire

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 61

GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE ..... 62
Paul Previde, Brook Thomas, Mike Wong, Emily K. Mallory, Dragutin Petkovic, Russ B. Altman, Anagha Kulkarni

POSTER PRESENTATIONS

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY ..... 63

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES ..... 64
Rachel Hodos, Ping Zhang, Hao-Chih Lee, Qiaonan Duan, Zichen Wang, Neil R. Clark, Avi Ma’ayan, Fei Wang, Brian Kidd, Jianying Hu, David Sontag, Joel T. Dudley

SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA ..... 65
Jinho Kim, Hongui Cha, Hyun-Tae Shin, Boram Lee, Jae Won Yun, Joon Ho Kang, Woong-Yang Park

IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING ..... 66
Gang Liu, Justin Li, G.L. Prasad

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 67
Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully


EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS ..... 68
Gregory P. Way, Casey S. Greene

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION ..... 69

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME ..... 70
Monica Agrawal, Marinka Zitnik, Jure Leskovec

PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS ..... 71
Youdinghuan Chen, Yue Wang, Lucas A. Salas, Todd W. Miller, Jonathan D. Marotti, Nicole P. Jenkins, Arminja N. Kettenbach, Chao Cheng, Brock C. Christensen

USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS ..... 72
Steven N. Hart, W. Flotte, A.P. Norgan, K.K. Shah, Z.R. Buchan, K.B. Geiersbach, T. Mounajjed, T.J. Flotte

A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS ..... 73
Mingze He, Carolyn J. Lawrence-Dill

GENERAL ..... 74

DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER ..... 75
Gabriel Al-Ghalith, Abigail Johnson, Pajau Vangay, Dan Knights

SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA ..... 76
Julian Aldana, Monica Cala Molina, Martha Zuluaga

GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING ..... 77
Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS ..... 78
Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi

FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN ..... 79
Taejeong Bae, Jessica Mariani, Livia Tomasini, Bo Zhou, Alexander E. Urban, Alexej Abyzov, Flora M. Vaccarino

CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA ..... 80
Bahar Behsaz, Hosein Mohimani, Alexey Gurevich, Andrey Prjibelski, Mark F. Fisher, Larry Smarr, Pieter C. Dorrestein, Joshua S. Mylne, Pavel A. Pevzner

FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE ..... 81
Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush


OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE ..... 82
Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu

DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA ..... 83
Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat

DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE ..... 84
Eun Kyung Choe, Sang Woo Lee

RICK: RNA INTERACTIVE COMPUTING KIT ..... 85
Galina A. Erikson, Ling Huang, Maxim Shokhirev

PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING ..... 86
Gamze Gursoy, Mark Gerstein

CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-`OMICS INTEGRATION IN COMPLEX DISEASE ..... 87
Tia Tate Hudson, ClarLynda Williams-DeVane

IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING ..... 88
Jin Hyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke

SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN ..... 89
Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura

GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES ..... 90
Jeremie S. Kim, Damla S. Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oğuz Ergin, Can Alkan, Onur Mutlu

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 91
Sungho Kim, Taehun Kim

GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS ..... 92
Sarah Kim-Hellmuth, Matthias Bechheim, Benno Pütz, Pejman Mohammadi, Johannes Schumacher, Veit Hornung, Bertram Müller-Myhsok, Tuuli Lappalainen

PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY ..... 93
Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski

SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS ..... 94
Mikhail Kolmogorov, Eamonn Kennedy, Zhuxin Dong, Gregory Timp, Pavel A. Pevzner

GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS ..... 95
Milica Krunic, Klaus Bobacz, Arndt von Haeseler


DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER ..... 96
Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi

CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM ..... 97
Sean Maxwell, Mark R. Chance

SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS ..... 98
Abigail E. Moore, Brandon Zheng, Patricia M. Watson, Robert C. Wilson, Dennis K. Watson, Paul E. Anderson

RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE ..... 99
Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant

SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS ..... 100
Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi

SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK ..... 101
Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui

PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS ..... 102
R. Michael Sivley, John A. Capra, William S. Bush

REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT ..... 103
James R. Torpy, Nenad Bartonicek, David D.L. Bowtell, Marcel E. Dinger

DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE ..... 104
Yun Joo Yoo, Suh-Ryung Kim, Sun Ah Kim, Shelley B. Bull

THE MULTIPLE GENE ISOFORM TEST ..... 105
Yao Yu, Chad D. Huff

IMAGING GENOMICS ..... 106

GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE ..... 107
Xiaohui Yao, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Heng Huang, Ze Wang, Li Shen

PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER ..... 108
Jun Cheng, Jie Zhang, Zhi Han, Liang Cheng, Qianjin Feng, Kun Huang

IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS ..... 109
Hui Qu, Subhajyoti De, Dimitris Metaxas

THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN ..... 110
Li Shen, David Kennedy, Christian Haselgrove, Abby Paulson, Nina Preuss, Robert Buccigrossi, Matthew Travers, Albert Crowley, and The NITRC Team


IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS ..... 111
Arunima Srivastava, Chaitanya Kulkarni, Kun Huang, Parag Mallick, Raghu Machiraju

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 112

EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING ..... 113
Steven E. Brenner, Aashish N. Adhikari, Yaqiong Wang, Robert J. Currier, Renata C. Gallagher, Robert L. Nussbaum, Yangyun Zou, Uma Sunderam, Joseph Sheih, Flavia Chen, Mark Kvale, Sean D. Mooney, Raj Srinivasan, Barbara A. Koenig, Pui Kwok, Jennifer M. Puck, The NBSeq Project

A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE) ..... 114
J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai

EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION ..... 115
Jiang Gui, Xuemei Ji, Christopher I. Amos

BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING ..... 116
Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart

IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX ..... 117
Jae-Hyung Lee, Su-Kyeong Hwang, Jung-eun Yang, Chae-Seok Lim, Jin-A Lee, Kyungmin Lee, Bong-Kiun Kaang, Yong-Seok Lee

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE ..... 118
Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro J. Caraballo, Richard M. Weinshilboum, Liewei Wang, Matthew K. Breitenstein

PHARMGKB: NEW WEBSITE RELEASE 2017 ..... 119
Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon, Katrin Sangkuhi, Li Gong, Julia Barbarino, Caroline Thorn, Rachel Huddart, Maria Alvarellos, Jill Robinson, Russ B. Altman, Teri E. Klein

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA ..... 120

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS ..... 121
Travis S. Johnson, Sihong Li, Johnathan R. Kho, Kun Huang, Yan Zhang

RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS ..... 122
Duc-Hau Le, Lieven Verbeke, Le Hoang Son, Dinh-Toi Chu, Van-Huy Pham


TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE ..... 123

MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS ..... 124
Selen Bozkurt, Jung In Park, Daniel L. Rubin, James D. Brooks, Tina Hernandez-Boussard

GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS ..... 125
Soo Jun Park, Jihyun Kim, Soo Young Cho, Charny Park, Young Seek Lee

WORKSHOP ..... 126

MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY ..... 126

METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES ..... 127
Vojtech Huser, Michael G. Kahn, Jeffrey S. Brown, Ramkiran Gouripeddi

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP ..... 128
Sungho Kim, Taehun Kim

A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES ..... 129
Quynh T. Tran, Lee Larcombe, Subhashini Arimilli, G.L. Prasad

AUTHOR INDEX ..... 130


1

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


2

CHARACTERIZATION OF DRUG-INDUCED SPLICING COMPLEXITY IN PROSTATE CANCER CELL LINE USING LONG READ TECHNOLOGY

Xintong Chen¹, Sander Houten¹, Kimaada Allette¹, Robert P. Sebra¹, Gustavo Stolovitzky¹,², Bojan Losic¹

¹Icahn School of Medicine at Mount Sinai, ²IBM

Presenting author: Bojan Losic

We characterize the transcriptional splicing landscape of a prostate cancer cell line treated with a previously identified synergistic drug combination. We use a combination of third generation long-read RNA sequencing technology and short-read RNAseq to create a high-fidelity map of expressed isoforms and fusions to quantify splicing events triggered by treatment. We find strong evidence for drug-induced, coherent splicing changes which disrupt the function of oncogenic proteins, and detect novel transcripts arising from previously unreported fusion events.


3

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES

Rachel Hodos¹,², Ping Zhang³, Hao-Chih Lee¹, Qiaonan Duan¹, Zichen Wang¹, Neil R. Clark¹, Avi Ma’ayan¹, Fei Wang³,⁴, Brian Kidd¹, Jianying Hu³, David Sontag⁵, Joel T. Dudley¹

¹Icahn School of Medicine at Mount Sinai, ²New York University, ³IBM T.J. Watson Research Center, ⁴Cornell University, ⁵Massachusetts Institute of Technology

Presenting author: Rachel Hodos

Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions, i.e., from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
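
As a rough illustration of the local (nearest-neighbors) idea described in this abstract, the following Python sketch predicts one unmeasured drug-cell profile from the most similar drugs that were measured in the target cell type. The function names, the dictionary layout, the Pearson-based similarity and the weighting scheme are all assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_profile(profiles, drug, cell, k=2):
    """Predict the unmeasured profile of `drug` in `cell`.

    `profiles` maps (drug, cell) -> list of per-gene values; a missing key
    means the profile was not measured (hypothetical layout for this sketch).
    """
    drugs = {d for d, _ in profiles}
    cells = {c for _, c in profiles}
    scored = []
    for other in drugs - {drug}:
        if (other, cell) not in profiles:
            continue  # this neighbor cannot contribute a profile for the target cell
        shared = [c for c in cells
                  if c != cell and (drug, c) in profiles and (other, c) in profiles]
        if not shared:
            continue
        # Similarity: mean correlation over cell types where both drugs were measured.
        sim = sum(pearson(profiles[drug, c], profiles[other, c]) for c in shared) / len(shared)
        scored.append((sim, other))
    if not scored:
        raise ValueError("no neighboring drug was measured in the target cell type")
    top = sorted(scored, reverse=True)[:k]
    weights = [max(sim, 0.0) for sim, _ in top]  # ignore anti-correlated neighbors
    if sum(weights) == 0:
        weights = [1.0] * len(top)
    total = sum(weights)
    n_genes = len(profiles[top[0][1], cell])
    # Similarity-weighted average of the neighbors' profiles in the target cell.
    return [sum(w * profiles[other, cell][g] for w, (_, other) in zip(weights, top)) / total
            for g in range(n_genes)]
```

A global (tensor completion) method would instead fit a low-rank model to all measured entries at once; that approach is not sketched here.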


4

LARGE-SCALE INTEGRATION OF HETEROGENEOUS PHARMACOGENOMIC DATA FOR IDENTIFYING DRUG MECHANISM OF ACTION

Yunan Luo, Sheng Wang, Jinfeng Xiao, Jian Peng

University of Illinois at Urbana-Champaign

Presenting author: Yunan Luo

A variety of large-scale pharmacogenomic data, such as perturbation experiments and sensitivity profiles, enable the systematic identification of drug mechanisms of action (MoAs), which is a crucial task in the era of precision medicine. However, integrating these complementary pharmacogenomic datasets is inherently challenging due to the wide heterogeneity, high dimensionality and noisy nature of these datasets. In this work, we develop Mania, a novel method for the scalable integration of large-scale pharmacogenomic data. Mania first constructs a drug-drug similarity network through integrating multiple heterogeneous data sources, including drug sensitivity, drug chemical structure, and perturbation assays. It then learns a compact vector representation for each drug to simultaneously encode its structural and pharmacogenomic properties. Extensive experiments demonstrate that Mania achieves substantially improved performance in both MoA and target prediction, compared to predictions based on individual data sources as well as a state-of-the-art integrative method. Moreover, Mania identifies drugs that target frequently mutated cancer genes, which provides novel insights into drug repurposing.


5

CHEMICAL REACTION VECTOR EMBEDDINGS: TOWARDS PREDICTING DRUG METABOLISM IN THE HUMAN GUT MICROBIOME

Emily K. Mallory¹, Ambika Acharya¹, Stefano E. Rensi¹, Peter J. Turnbaugh², Roselie A. Bright³, Russ B. Altman¹

¹Stanford University, ²University of California San Francisco, ³Food and Drug Administration

Presenting author: Emily Mallory

Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.


6

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS

Gregory P. Way, Casey S. Greene

University of Pennsylvania

Presenting author: Gregory Way

The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer-types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.


7

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


8

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME

Monica Agrawal, Marinka Zitnik, Jure Leskovec

Stanford University

Presenting author: Marinka Zitnik

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
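
The connectivity question raised in this abstract (whether the proteins of one disease form a single connected component in the PPI network) can be checked with a few lines of code. The sketch below is a minimal illustration using only the Python standard library; the function name, the edge-list input format and the toy protein labels are assumptions for this example, not the authors' pipeline.

```python
from collections import deque


def pathway_components(ppi_edges, disease_proteins):
    """Connected components of the PPI subgraph induced by `disease_proteins`.

    `ppi_edges` is an iterable of (protein_u, protein_v) pairs (undirected).
    Returns a list of sets, largest component first.
    """
    nodes = set(disease_proteins)
    adjacency = {p: set() for p in nodes}
    for u, v in ppi_edges:
        # Keep only interactions where both endpoints are disease proteins.
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```

A pathway whose largest component covers only a small fraction of its proteins is "disconnected" in the sense used above; the exact criterion the authors applied is not stated in the abstract.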


9

MAPPING PATIENT TRAJECTORIES USING LONGITUDINAL EXTRACTION AND DEEP LEARNING IN THE MIMIC-III CRITICAL CARE DATABASE

Brett K. Beaulieu-Jones, Patryk Orzechowski, Jason H. Moore

University of Pennsylvania

Presenting author: Brett Beaulieu-Jones

Electronic Health Records (EHRs) contain a wealth of patient data useful to biomedical researchers. At present, both the extraction of data and methods for analyses are frequently designed to work with a single snapshot of a patient’s record. Health care providers often perform and record actions in small batches over time. By extracting these care events, a sequence can be formed providing a trajectory for a patient’s interactions with the health care system. These care events also offer a basic heuristic for the level of attention a patient receives from health care providers. We show that it is possible to learn meaningful embeddings from these care events using two deep learning techniques, unsupervised autoencoders and long short-term memory networks. We compare these methods to traditional machine learning methods which require a point-in-time snapshot to be extracted from an EHR.


10

AUTOMATED DISEASE COHORT SELECTION USING WORD EMBEDDINGS FROM ELECTRONIC HEALTH RECORDS

Benjamin S. Glicksberg, Riccardo Miotto, Kipp W. Johnson, Khader Shameer, Li Li, Rong Chen, Joel T. Dudley

Icahn School of Medicine at Mount Sinai

Presenting author: Benjamin Glicksberg

Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHR). Similar to prospective study designs, high quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for only a relatively small number of diseases. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature learning method for extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results with average F-score of 0.57 and AUC-ROC of 0.98. However, we noticed that results varied considerably between diseases, thus necessitating further investigation and/or phenotype-specific refinement of the approach before being readily deployed across all diseases.
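
The ranking-and-threshold step described in this abstract can be sketched as follows, assuming that each patient and the query concept have already been mapped to vectors in the same embedding space (for example by a word2vec model, which is not reproduced here). Cosine distance, the function names and the toy vectors are assumptions for illustration; the abstract does not state which distance measure or patient representation the authors used.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_cohort(query_vec, patient_vecs, threshold):
    """Rank patients by distance to the query concept and keep those within `threshold`.

    `patient_vecs` maps a patient identifier to its embedding vector
    (hypothetical input format for this sketch).
    Returns patient identifiers, closest first.
    """
    ranked = sorted((cosine_distance(query_vec, vec), pid)
                    for pid, vec in patient_vecs.items())
    return [pid for dist, pid in ranked if dist <= threshold]
```

The threshold trades precision against recall, which is one plausible reason for the per-disease variation in F-score that the abstract reports.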


11

FUNCTIONAL NETWORK COMMUNITY DETECTION CAN DISAGGREGATE AND FILTER MULTIPLE UNDERLYING PATHWAYS IN ENRICHMENT ANALYSES

Lia X. Harrington¹, Gregory P. Way², Jennifer A. Doherty³, Casey S. Greene²

¹Geisel School of Medicine at Dartmouth, ²University of Pennsylvania, ³University of Utah

Presenting author: Lia Harrington

Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways, such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators’ attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
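
For readers unfamiliar with the "standard overrepresentation analysis" that this abstract uses as its baseline, the test is commonly formulated as a one-sided hypergeometric test. The sketch below shows that textbook formulation only; the function and parameter names are chosen for this example, and the community detection step proposed by the authors is not implemented here.

```python
from math import comb


def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_universe: number of genes in the background set
    n_pathway:  number of background genes annotated to the pathway
    n_selected: size of the gene list under study
    n_overlap:  number of listed genes that belong to the pathway
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_selected - i)
               for i in range(n_overlap, upper + 1))
    return tail / total
```

In the approach the abstract describes, a test of this kind would be run separately on each community of the gene list rather than once on the full list.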


12

CAUSAL INFERENCE ON ELECTRONIC HEALTH RECORDS TO ASSESS BLOOD PRESSURE TREATMENT TARGETS: AN APPLICATION OF THE PARAMETRIC G FORMULA

Kipp W. Johnson¹, Benjamin S. Glicksberg¹, Rachel Hodos¹,², Khader Shameer¹, Joel T. Dudley¹

¹Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai; ²Courant Institute of Mathematical Sciences, New York University

Presenting author: Kipp Johnson

Hypertension is a major risk factor for ischemic cardiovascular disease and cerebrovascular disease, which are respectively the first and second most common causes of morbidity and mortality across the globe. To alleviate the risks of hypertension, there are a number of effective antihypertensive drugs available. However, the optimal treatment blood pressure goal for antihypertensive therapy remains an area of controversy. The results of the recent Systolic Blood Pressure Intervention Trial (SPRINT), which found benefits for intensive lowering of systolic blood pressure, have been debated for several reasons. We aimed to assess the benefits of treating to four different blood pressure targets and to compare our results to those of SPRINT using a method for causal inference called the parametric g formula. We applied this method to blood pressure measurements obtained from the electronic health records of approximately 200,000 patients who visited the Mount Sinai Hospital in New York, NY. We simulated the effect of four clinically relevant dynamic treatment regimes, assessing the effectiveness of treating to four different blood pressure targets: 150 mmHg, 140 mmHg, 130 mmHg, and 120 mmHg. In contrast to current American Heart Association guidelines and in concordance with SPRINT, we find that targeting 120 mmHg systolic blood pressure is significantly associated with decreased incidence of major adverse cardiovascular events. Causal inference methods applied to electronic health records are a powerful and flexible technique and medicine may benefit from their increased usage.


13

DATA-DRIVEN ADVICE FOR APPLYING MACHINE LEARNING TO BIOINFORMATICS PROBLEMS

Randal S. Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H. Moore

University of Pennsylvania

Presenting author: William La Cava

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.


14

HOW POWERFUL ARE SUMMARY-BASED METHODS FOR IDENTIFYING EXPRESSION-TRAIT ASSOCIATIONS UNDER DIFFERENT GENETIC ARCHITECTURES?

Yogasudha C. Veturi, Marylyn D. Ritchie

Biomedical and Translational Informatics Institute, Geisinger

Presenting author: Yogasudha Veturi

Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary level data suffices for TWAS and offers improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS (TWAS-MP), or (b) using summary-based Mendelian randomization (TWAS-SMR). Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger’s MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression/trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression, or (b) performed eQTL analysis. We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP, (b) TWAS-SMR, (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods, although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.


15

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


16

CLINGEN CANCER SOMATIC WORKING GROUP: STANDARDIZING AND DEMOCRATIZING ACCESS TO CANCER MOLECULAR DIAGNOSTIC DATA TO DRIVE TRANSLATIONAL RESEARCH

Subha Madhavan¹, Deborah Ritter², Christine Micheel³, Shruti Rao¹, Angshumoy Roy², Dmitriy Sonkin⁴, Matthew McCoy¹, Malachi Griffith⁵, Obi L. Griffith⁵, Peter McGarvey¹, Shashikant Kulkarni², on behalf of the ClinGen Somatic Working Group

¹Innovation Center for Biomedical Informatics, Georgetown University, Washington D.C.; ²Baylor College of Medicine and Texas Children's Hospital, Houston, TX; ³Vanderbilt University School of Medicine, Nashville, TN; ⁴National Cancer Institute, Rockville, MD; ⁵The McDonnell Genome Institute, Washington University, St. Louis, MO

Presenting author: Subha Madhavan

A growing number of academic and community clinics are conducting genomic testing to inform treatment decisions for cancer patients. In the last 3-5 years, there has been a rapid increase in clinical use of next generation sequencing (NGS) based cancer molecular diagnostic (MolDx) testing. The increasing availability and decreasing cost of tumor genomic profiling means that physicians can now make treatment decisions armed with patient-specific genetic information. Accumulating research in the cancer biology field indicates that there is significant potential to improve cancer patient outcomes by effectively leveraging this rich source of genomic data in treatment planning. To achieve truly personalized medicine in oncology, it is critical to catalog cancer sequence variants from MolDx testing for their clinical relevance along with treatment information and patient outcomes, and to do so in a way that supports large-scale data aggregation and new hypothesis generation. One critical challenge to encoding variant data is adopting a standard of annotation of those variants that are clinically actionable. Through the NIH-funded Clinical Genome Resource (ClinGen), in collaboration with NLM’s ClinVar database and >50 academic and industry based cancer research organizations, we developed the Minimal Variant Level Data (MVLD) framework to standardize reporting and interpretation of drug associated alterations. We are currently involved in collaborative efforts to align the MVLD framework with parallel, complementary sequence variant interpretation clinical guidelines from the Association for Molecular Pathology (AMP) for clinical labs. In order to truly democratize access to MolDx data for care and research needs, these standards must be harmonized to support sharing of clinical cancer variants. Here we describe the processes and methods developed within ClinGen’s Somatic WG in collaboration with over 60 cancer care and research organizations as well as CLIA-certified, CAP-accredited clinical testing labs to develop standards for cancer variant interpretation and sharing.

Keywords: ClinGen, Somatic variants, predictive biomarkers, MVLD, data sharing


17

A HEURISTIC METHOD FOR SIMULATING OPEN-DATA OF ARBITRARY COMPLEXITY THAT CAN BE USED TO COMPARE AND EVALUATE MACHINE LEARNING METHODS

Jason H. Moore, Maksim Shestov, Peter Schmitt, Randal S. Olson

Institute for Biomedical Informatics, University of Pennsylvania

Presenting author: Jason Moore

A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some examples taken a leading role in the open data effort, starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient. This is because it is not possible to know what the ground truth is in real data. Real data must be accompanied by simulated data where the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.


18

BEST PRACTICES AND LESSONS LEARNED FROM REUSE OF 4 PATIENT-DERIVED METABOLOMICS DATASETS IN ALZHEIMER'S DISEASE

Jessica D. Tenenbaum, Colette Blach

Duke University

Presenting author: Jessica Tenenbaum

The importance of open data has been increasingly recognized in recent years. Although the sharing and reuse of clinical data for translational research lags behind best practices in biological science, a number of patient-derived datasets exist and have been published enabling translational research spanning multiple scales from molecular to organ level, and from patients to populations. In seeking to replicate metabolomic biomarker results in Alzheimer’s disease our team identified three independent cohorts in which to compare findings. Accessing the datasets associated with these cohorts, understanding their content and provenance, and comparing variables between studies was a valuable exercise in exploring the principles of open data in practice. It also helped inform steps taken to make the original datasets available for use by other researchers. In this paper we describe best practices and lessons learned in attempting to identify, access, understand, and analyze these additional datasets to advance research reproducibility, as well as steps taken to facilitate sharing of our own data.


19

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


20

DISCRIMINATIVE BAG-OF-CELLS FOR IMAGING-GENOMICS

Benjamin Chidester¹, Minh N. Do², Jian Ma¹

¹Carnegie Mellon University, ²University of Illinois at Urbana-Champaign

Presenting author: Benjamin Chidester

Connecting genotypes to image phenotypes is crucial for a comprehensive understanding of cancer. To learn such connections, new machine learning approaches must be developed for the better integration of imaging and genomic data. Here we propose a novel approach called Discriminative Bag-of-Cells (DBC) for predicting genomic markers using imaging features, which addresses the challenge of summarizing histopathological images by representing cells with learned discriminative types, or codewords. We also developed a reliable and efficient patch-based nuclear segmentation scheme using convolutional neural networks from which nuclear and cellular features are extracted. Applying DBC on TCGA breast cancer samples to predict basal subtype status yielded a class-balanced accuracy of 70% on a separate test partition of 213 patients. As datasets of imaging and genomic data become increasingly available, we believe DBC will be a useful approach for screening histopathological images for genomic markers. Source code for nuclear segmentation and DBC is available at: https://github.com/bchidest/DBC.


21

DEEP INTEGRATIVE ANALYSIS FOR SURVIVAL PREDICTION

Chenglong Huang¹, Albert Zhang², Guanghua Xiao³

¹Colleyville Heritage High School, ²Highland Park High School, ³University of Texas Southwestern Medical Center

Presenting author: Chenglong Huang

Survival prediction is very important in medical treatment. However, recent leading research is challenged by two factors: 1) the datasets usually come with multi-modality; and 2) sample sizes are relatively small. To solve the above challenges, we developed a deep survival learning model to predict patients’ survival outcomes by integrating multi-view data. The proposed network contains two sub-networks, one view-specific and one common sub-network. We designated one CNN-based and one FCN-based sub-network to efficiently handle pathological images and molecular profiles, respectively. Our model first explicitly maximizes the correlation among the views and then transfers feature hierarchies from view commonality and specifically fine-tunes on the survival prediction task. We evaluate our method on real lung and brain tumor datasets to demonstrate the effectiveness of the proposed model using data with multiple modalities across different tumor types.


22

GENOTYPE-PHENOTYPE ASSOCIATION STUDY VIA NEW MULTI-TASK LEARNING MODEL

Zhouyuan Huo¹, Dinggang Shen², Heng Huang¹

¹University of Pittsburgh, ²University of North Carolina at Chapel Hill

Presenting author: Heng Huang

Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and presents new insights into SNPs.
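
For reference, the ℓ2,1-norm mentioned in this abstract is conventionally defined as the sum of the Euclidean norms of the rows of a coefficient matrix. The notation below is assumed for illustration only: X is an n × d genotype matrix, Y an n × c matrix of imaging QTs, W the d × c coefficient matrix with rows w^i (one per SNP), and γ a regularization weight. The second expression is the common group-sparse baseline that the abstract contrasts against, not the new model proposed by the authors.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \|w^{i}\|_{2}
           \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^{2}},
\qquad
\min_{W}\; \|XW - Y\|_{F}^{2} \;+\; \gamma\, \|W\|_{2,1}
```

Because the penalty acts on whole rows, it tends to drive all coefficients of an uninformative SNP to zero across every imaging trait at once.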


23

CODON BIAS AMONG SYNONYMOUS RARE VARIANTS IS ASSOCIATED WITH ALZHEIMER’S DISEASE IMAGING BIOMARKER

Jason E. Miller¹, Manu K. Shivakumar², Shannon L. Risacher², Andrew J. Saykin², Seunggeun Lee³, Kwangsik Nho², Dokyoon Kim¹,⁴

¹Geisinger Health System, ²Indiana University School of Medicine, ³University of Michigan, ⁴Pennsylvania State University

Presenting author: Jason Miller

Alzheimer’s disease (AD) is a neurodegenerative disorder with few biomarkers even though it impacts a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to a codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon anti-codon interactions, respectively. Although association tests have been utilized before to identify genes associated with AD, it remains unclear how codon bias plays a role and if it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. While using all synonymous variants we did not identify any genome-wide significant associations, using only synonymous variants that affected codon frequency we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon among the minor alleles and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.


24

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


25

SINGLE SUBJECT TRANSCRIPTOME ANALYSIS REPRODUCES SIGNED GENE SET FUNCTIONAL ACTIVATION SIGNALS FROM COHORT ANALYSIS OF MURINE RESPONSE TO HIGH FAT DIET

Joanne Berghout, Qike Li, Nima Pouladi, Jianrong Li, Yves A. Lussier

University of Arizona

Presenting author: Joanne Berghout

Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single subject analysis that uses a mixture model for transcript fold-change clustering from isogenic paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO as negatively_regulates edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject’s response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicates per genotype, and cross-methods to reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve = 0.96, n = 24 genotypes × 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity


26

USING SIMULATION AND OPTIMIZATION APPROACH TO IMPROVE OUTCOME THROUGH WARFARIN PRECISION TREATMENT

Chih-Lin Chi1, Lu He2, Kourosh Ravvaz3, John Weissert3, Peter J. Tonellato4,5

1School of Nursing & Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; 2Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA; 3Aurora Health Care, Milwaukee, WI, USA; 4Department of Biomedical Informatics, Department of Pathology, Harvard Medical School, Boston, MA, USA; 5Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

Chih-Lin, Chi

We apply a treatment simulation and optimization approach to develop decision support guidance for warfarin precision treatment plans. Simulation includes the use of ~1,500,000 clinical avatars (simulated patients) generated by an integrated data-driven and domain-knowledge based Bayesian Network Modeling approach. Subsequently, we simulate 30-day individual patient response to warfarin treatment of five clinical and genetic treatment plans followed by both individual and sub-population based optimization. Sub-population optimization (compared to individual optimization) provides a cost effective and realistic means of implementation of a precision-driven treatment plan in practical settings. In this project, we use the property of minimal entropy to minimize overall adverse risks for the largest possible patient sub-populations and we temper the results by considering both transparency and ease of implementation. Finally, we discuss the improved outcome of the precision treatment plan based on the sub-population optimized decision support rules.


27

COALITIONAL GAME THEORY AS A PROMISING APPROACH TO IDENTIFY CANDIDATE AUTISM GENES

Anika Gupta, Min Woo Sun, Kelley M. Paskov, Nate T. Stockham, Jae-Yoon Jung, Dennis P. Wall

Stanford University

Dennis, Wall

Despite mounting evidence for the strong role of genetics in the phenotypic manifestation of Autism Spectrum Disorder (ASD), the specific genes responsible for the variable forms of ASD remain undefined. ASD may be best explained by a combinatorial genetic model with varying epistatic interactions across many small effect mutations. Coalitional or cooperative game theory is a technique that studies the combined effects of groups of players, known as coalitions, seeking to identify players who tend to improve the performance -- the relationship to a specific disease phenotype -- of any coalition they join. This method has been previously shown to boost biologically informative signal in gene expression data but to date has not been applied to the search for cooperative mutations among putative ASD genes. We describe our approach to highlight genes relevant to ASD using coalitional game theory on alteration data of 1,965 fully sequenced genomes from 756 multiplex families. Alterations were encoded into binary matrices for ASD (case) and unaffected (control) samples, indicating likely gene-disrupting, inherited mutations in altered genes. To determine individual gene contributions given an ASD phenotype, a “player” metric, referred to as the Shapley value, was calculated for each gene in the case and control cohorts. Sixty-seven genes were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Using network and cross-study analysis, we found that these genes are involved in biological pathways known to be affected in the autism cases and that a subset directly interact with several genes known to have strong associations to autism. These findings suggest that coalitional game theory can be applied to large-scale genomic data to identify hidden yet influential players in complex polygenic disorders such as autism.
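As an illustration of the “player” metric named in this abstract, the following is a minimal sketch of an exact Shapley value computation on a toy game. It is not the authors' pipeline: the gene names, the `toy_value` characteristic function, and the helper `shapley_values` are hypothetical, and the exact enumeration over all orderings is only practical for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    # Hypothetical characteristic function: a coalition "explains" the
    # phenotype only when GENE_A and GENE_B are altered together;
    # GENE_C never contributes.
    return 1.0 if {"GENE_A", "GENE_B"} <= coalition else 0.0


scores = shapley_values(["GENE_A", "GENE_B", "GENE_C"], toy_value)
```

In this toy game the two cooperating genes split the credit equally and the non-contributing gene scores zero, which is the behavior that makes the Shapley value attractive for ranking genes that only matter in combination.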


28

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE

Alena Orlenko1, Jason H. Moore1, Patryk Orzechowski1, Randal S. Olson1, Junmei Cairns2, Pedro J. Caraballo2, Richard M. Weinshilboum2, Liewei Wang2, Matthew K. Breitenstein1

1University of Pennsylvania, 2Mayo Clinic

Matthew, Breitenstein

With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated potential to lack sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.


29

ADDRESSING VITAL SIGN ALARM FATIGUE USING PERSONALIZED ALARM THRESHOLDS

Sarah Poole, Nigam Shah

Stanford University

Sarah, Poole

Alarm fatigue, a condition in which clinical staff become desensitized to alarms due to the high frequency of unnecessary alarms, is a major patient safety concern. Alarm fatigue is particularly prevalent in the pediatric setting, due to the high level of variation in vital signs with patient age. Existing studies have shown that the current default pediatric vital sign alarm thresholds are inappropriate, and lead to a larger than necessary alarm load. This study leverages a large database containing over 190 patient-years of heart rate data to accurately identify the 1st and 99th percentiles of an individual’s heart rate on their first day of vital sign monitoring. These percentiles are then used as personalized vital sign thresholds, which are evaluated by comparing to non-default alarm thresholds used in practice, and by using the presence of major clinical events to infer alarm labels. Using the proposed personalized thresholds would decrease low and high heart rate alarms by up to 50% and 44% respectively, while maintaining sensitivity of 62% and increasing specificity to 49%. The proposed personalized vital sign alarm thresholds will reduce alarm fatigue, thus contributing to improved patient outcomes, shorter hospital stays, and reduced hospital costs.
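The thresholding idea in this abstract can be sketched in a few lines. The example below is an assumption-laden illustration, not the study's code: it uses a nearest-rank percentile definition (the abstract does not say which definition was used), synthetic heart rate readings, and hypothetical helper names.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest observed value such that at
    least pct percent of the observations are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def personalized_thresholds(first_day_heart_rates, low_pct=1, high_pct=99):
    """Return (low, high) alarm thresholds from one patient's first day."""
    return (
        nearest_rank_percentile(first_day_heart_rates, low_pct),
        nearest_rank_percentile(first_day_heart_rates, high_pct),
    )


def count_alarms(heart_rates, low, high):
    """Count readings that would trigger a low or high heart rate alarm."""
    return sum(1 for hr in heart_rates if hr < low or hr > high)


# Synthetic first-day readings (beats per minute), for illustration only.
first_day = list(range(101, 201))
low, high = personalized_thresholds(first_day)

# Later readings are checked against the personalized thresholds.
later_alarms = count_alarms([95, 150, 205, 199], low, high)
```

Comparing `count_alarms` under personalized and default thresholds on the same readings is one simple way to estimate the change in alarm load that the abstract reports.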


30

EMERGENCE OF PATHWAY-LEVEL COMPOSITE BIOMARKERS FROM CONVERGING GENE SET SIGNALS OF HETEROGENEOUS TRANSCRIPTOMIC RESPONSES

Samir Rachid Zaim, Qike Li, A. Grant Schissler, Yves A. Lussier

The University of Arizona

Yves, Lussier

Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved) in spite of considerable discovery efforts deployed and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway’s gene expression alterations may be heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we have expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual’s variability in cohort scenarios. We compare SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that the emergence of a pathway-level signal occurs through the summative effect of its altered gene expression, heterogeneous across patients. Studied variables include pathway gene set size, fraction of expressed genes responsive within the gene set, fraction of expressed genes responsive up- vs down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risks that are simultaneously confounded by patient-specific “genome-by-environment” interactions (G×E). Area under the precision-recall curve of the SS approach far surpassed that of the CB (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65). We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation by the inclusion of a patient’s individual variability. http://www.lussiergroup.org/publications/PathwayMarker/


31

ANALYZING METABOLOMICS DATA FOR ASSOCIATION WITH GENOTYPES USING TWO-COMPONENT GAUSSIAN MIXTURE DISTRIBUTIONS

Jason Westra1,2, Nicholas Hartman3, Bethany Lake4, Gregory Shearer5, Nathan Tintle1

1Dordt College, 2Iowa State University, 3Cornell University, 4Elon University, 5The Pennsylvania State University

Jason, Westra

Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data was simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using the likelihood ratio test that leverages Gaussian mixture models, and creates a model with improved sensitivity and interpretability.
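To make the idea of a likelihood ratio test on mixing proportions concrete, here is a deliberately simplified sketch. It is not the authors' test: it assumes the two Gaussian components (their means and standard deviations) are known and held fixed, estimates only the mixing weight by EM, and compares a single pooled weight (null) against one weight per genotype group (alternative). All names and data values are hypothetical, and no p-value is computed.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(xs, weight, low, high):
    """Log-likelihood of xs under weight*N(low) + (1 - weight)*N(high)."""
    return sum(
        math.log(weight * normal_pdf(x, *low) + (1.0 - weight) * normal_pdf(x, *high))
        for x in xs
    )


def fit_weight(xs, low, high, iterations=500):
    """EM estimate of the mixing weight with both components held fixed."""
    weight = 0.5
    for _ in range(iterations):
        responsibilities = [
            weight * normal_pdf(x, *low)
            / (weight * normal_pdf(x, *low) + (1.0 - weight) * normal_pdf(x, *high))
            for x in xs
        ]
        weight = sum(responsibilities) / len(responsibilities)
    return weight


def mixing_lrt(groups, low, high):
    """2 * (log-lik with one weight per genotype - log-lik with a shared weight)."""
    pooled = [x for xs in groups.values() for x in xs]
    null_ll = mixture_loglik(pooled, fit_weight(pooled, low, high), low, high)
    alt_ll = sum(
        mixture_loglik(xs, fit_weight(xs, low, high), low, high)
        for xs in groups.values()
    )
    return 2.0 * (alt_ll - null_ll)


# Assumed (mean, sd) of the two components; illustration values only.
LOW, HIGH = (0.0, 1.0), (5.0, 1.0)

# Hypothetical phenotype values grouped by genotype.
with_effect = {
    "AA": [-0.3, 0.1, 0.4, -0.1, 0.2, 4.9],
    "AB": [0.0, 5.1, -0.2, 4.8, 0.3, 5.2],
    "BB": [5.0, 4.7, 5.3, 0.1, 4.9, 5.1],
}
same_values = [0.0, 5.1, -0.2, 4.8, 0.3, 5.2]
no_effect = {"AA": same_values, "AB": same_values, "BB": same_values}

statistic_effect = mixing_lrt(with_effect, LOW, HIGH)
statistic_null = mixing_lrt(no_effect, LOW, HIGH)
```

When the genotype groups differ in how often values fall in the upper component, the statistic grows; when every group has the same composition it stays near zero. A full implementation would also estimate the component parameters and refer the statistic to an appropriate reference distribution.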


32

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


33

CONVERGENT DOWNSTREAM CANDIDATE MECHANISMS OF INDEPENDENT INTERGENIC POLYMORPHISMS BETWEEN CO-CLASSIFIED DISEASES IMPLICATE EPISTASIS AMONG NONCODING ELEMENTS

Jiali Han1, Jianrong Li1, Ikbel Achour1, Lorenzo Pesce2, Ian Foster2, Haiquan Li3, Yves A. Lussier3

1Center for Biomedical Informatics and Biostatistics (CB2) and Departments of Medicine and of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA; 2Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, IL 60637, USA; 3CB2, BIO5 Institute, UACC, and Dept of Medicine, The University of Arizona, Tucson, AZ 85721, USA

Haiquan, Li

Eighty percent of DNA outside protein coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR ≤ 5%, we prioritized 3,870 intergenic SNP pairs, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized as compared to those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ~3.8; p ≤ 10E-25). The prioritized pairs were also enriched in regions bound to the same/interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ~2,500; p ≤ 10E-25). This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease_class
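The abstract mentions permuting network edges while keeping node degrees constant. One common way to do that is pairwise edge swapping, sketched below on a toy bipartite SNP-to-mRNA edge list. This is an assumed stand-in for illustration, not the authors' permutation procedure, and the node labels and the function `degree_preserving_shuffle` are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Rewire a bipartite edge list by swapping endpoints of edge pairs.

    Replacing (a, b), (c, d) with (a, d), (c, b) leaves every node's degree
    unchanged; swaps that would duplicate an existing edge are skipped.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new_first, new_second = (a, d), (c, b)
        if new_first in edge_set or new_second in edge_set:
            continue  # would create a duplicate edge (or is a no-op swap)
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_first, new_second}
        edge_list[i], edge_list[j] = new_first, new_second
        done += 1
    return edge_list


def degree_counts(edges):
    """Degree of every SNP (left) and mRNA (right) node."""
    return Counter(a for a, _ in edges), Counter(b for _, b in edges)


# Hypothetical SNP -> mRNA eQTL edges, for illustration only.
observed = [
    ("rs1", "mRNA_A"), ("rs1", "mRNA_B"),
    ("rs2", "mRNA_B"), ("rs2", "mRNA_C"),
    ("rs3", "mRNA_A"), ("rs3", "mRNA_D"),
]
permuted = degree_preserving_shuffle(observed, n_swaps=10, seed=42)
```

Repeating the shuffle many times and recomputing the similarity score on each permuted network gives an empirical null distribution from which a p-value can be read, under the assumption that degree-matched rewiring is an appropriate null for the question at hand.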


34

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS

Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1

1Ohio State University, 2Georgia Institute of Technology, 3Indiana University

Travis, Johnson

Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as “junk DNAs”, since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be “reactivated” in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks in respect to the gene ontology from which we identify functions enriched in these pseudogene-gene families and infer functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.


35

LEVERAGING PUTATIVE ENHANCER-PROMOTER INTERACTIONS TO INVESTIGATE TWO-WAY EPISTASIS IN TYPE 2 DIABETES GWAS

Elisabetta Manduchi1,2, Alessandra Chesi2, Molly A. Hall1, Struan F.A. Grant2, Jason H. Moore1

1University of Pennsylvania, 2The Children’s Hospital of Philadelphia

Elisabetta, Manduchi

We utilized evidence for enhancer-promoter interactions from functional genomics data in order to build biological filters to narrow down the search space for two-way Single Nucleotide Polymorphism (SNP) interactions in Type 2 Diabetes (T2D) Genome Wide Association Studies (GWAS). This has led us to the identification of a reproducible statistically significant SNP pair associated with T2D. As more functional genomics data are being generated that can help identify potentially interacting enhancer-promoter pairs in larger collections of tissues/cells, this approach has implications for investigation of epistasis from GWAS in general.


36

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS


37

IMPROVING PRECISION IN CONCEPT NORMALIZATION

Mayla Boguslav, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter

Computational Bioscience Program, University of Colorado School of Medicine

Mayla, Boguslav

Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to prefer to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. They draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.


38

VISAGE: INTEGRATING EXTERNAL KNOWLEDGE INTO ELECTRONIC MEDICAL RECORD VISUALIZATION

Edward W. Huang, Sheng Wang, ChengXiang Zhai

University of Illinois at Urbana-Champaign

Edward, Huang

In this paper, we present VisAGE, a method that visualizes electronic medical records (EMRs) in a low-dimensional space. Effective visualization of new patients allows doctors to view similar, previously treated patients and to identify the new patients' disease subtypes, reducing the chance of misdiagnosis. However, EMRs are typically incomplete or fragmented, resulting in patients who are missing many available features being placed near unrelated patients in the visualized space. VisAGE integrates several external data sources to enrich EMR databases to solve this issue. We evaluated VisAGE on a dataset of Parkinson's disease patients. We qualitatively and quantitatively show that VisAGE can more effectively cluster patients, which allows doctors to better discover patient subtypes and thus improve patient care.


39

ANNOTATING GENE SETS BY MINING LARGE LITERATURE COLLECTIONS WITH PROTEIN NETWORKS

Sheng Wang1, Jianzhu Ma2, Michael Ku Yu2, Fan Zheng2, Edward W. Huang1, Jiawei Han1, Jian Peng1, Trey Ideker2

1Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA, 2School of Medicine, University of California San Diego, San Diego, CA, USA

Jianzhu, Ma

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.


40

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS


41

PREDICTION OF PROTEIN-LIGAND INTERACTIONS FROM PAIRED PROTEIN SEQUENCE MOTIFS AND LIGAND SUBSTRUCTURES

Peyton Greenside, Maureen Hillenmeyer, Anshul Kundaje

Stanford University

Peyton, Greenside

Identification of small molecule ligands that bind to proteins is a critical step in drug discovery. Computational methods have been developed to accelerate the prediction of protein-ligand binding, but often depend on 3D protein structures. As only a limited number of protein 3D structures have been resolved, the ability to predict protein-ligand interactions without relying on a 3D representation would be highly valuable. We use an interpretable confidence-rated boosting algorithm to predict protein-ligand interactions with high accuracy from ligand chemical substructures and protein 1D sequence motifs, without relying on 3D protein structures. We compare several protein motif definitions, assess generalization of our model’s predictions to unseen proteins and ligands, demonstrate recovery of well established interactions and identify globally predictive protein-ligand motif pairs. By bridging biological and chemical perspectives, we demonstrate that it is possible to predict protein-ligand interactions using only motif-based features and that interpretation of these features can reveal new insights into the molecular mechanics underlying each interaction. Our work also lays a foundation to explore more predictive feature sets and sophisticated machine learning approaches as well as other applications, such as predicting unintended interactions or the effects of mutations.


42

LOSS-OF-FUNCTION OF NEUROPLASTICITY-RELATED GENES CONFERS RISK FOR HUMAN NEURODEVELOPMENTAL DISORDERS

Milo R. Smith, Benjamin S. Glicksberg, Li Li, Rong Chen, Hirofumi Morishita, Joel T. Dudley

Icahn School of Medicine at Mount Sinai

Milo, Smith

High and increasing prevalence of neurodevelopmental disorders places enormous personal and economic burdens on society. Given the growing realization that the roots of neurodevelopmental disorders often lie in early childhood, there is an urgent need to identify childhood risk factors. Neurodevelopment is marked by periods of heightened experience-dependent neuroplasticity wherein neural circuitry is optimized by the environment. If these critical periods are disrupted, development of normal brain function can be permanently altered, leading to neurodevelopmental disorders. Here, we aim to systematically identify human variants in neuroplasticity-related genes that confer risk for neurodevelopmental disorders. Historically, this knowledge has been limited by a lack of techniques to identify genes related to neurodevelopmental plasticity in a high-throughput manner and a lack of methods to systematically identify mutations in these genes that confer risk for neurodevelopmental disorders. Using an integrative genomics approach, we determined loss-of-function (LOF) variants in putative plasticity genes, identified from transcriptional profiles of brain from mice with elevated plasticity, that were associated with neurodevelopmental disorders. From five shared differentially expressed genes found in two mouse models of juvenile-like elevated plasticity (juvenile wild-type or adult Lynx1-/- relative to adult wild-type) that were also genotyped in the Mount Sinai BioMe Biobank, we identified multiple associations between LOF genes and increased risk for neurodevelopmental disorders across 10,510 patients linked to the Mount Sinai Electronic Medical Records (EMR), including epilepsy and schizophrenia. This work demonstrates a novel approach to identify neurodevelopmental risk genes and points toward a promising avenue to discover new drug targets to address the unmet therapeutic needs of neurodevelopmental disease.


43

DIFFUSION MAPPING OF DRUG TARGETS ON DISEASE SIGNALING NETWORK ELEMENTS REVEALS DRUG COMBINATION STRATEGIES

Jielin Xu1, Kelly Regan1, Siyuan Deng1, William E. Carson III2, Philip R.O. Payne3, Fuhai Li1

1Department of Biomedical Informatics, The Ohio State University; 2Comprehensive Cancer Center, The Ohio State University; 3Institute for Informatics, Washington University in St. Louis

Fuhai, Li

The emergence of drug resistance to traditional chemotherapy and newer targeted therapies in cancer patients is a major clinical challenge. Reactivation of the same or compensatory signaling pathways is a common class of drug resistance mechanisms. Employing drug combinations that inhibit multiple modules of reactivated signaling pathways is a promising strategy to overcome and prevent the onset of drug resistance. However, with thousands of available FDA-approved and investigational compounds, it is infeasible to experimentally screen millions of possible drug combinations with limited resources. Therefore, computational approaches are needed to constrain the search space and prioritize synergistic drug combinations for preclinical studies. In this study, we propose a novel approach for predicting drug combinations through investigating potential effects of drug targets on the disease signaling network. We first construct a disease signaling network by integrating gene expression data with disease-associated driver genes. Individual drugs that can partially perturb the disease signaling network are then selected based on a drug-disease network “impact matrix”, which is calculated using network diffusion distance from drug targets to signaling network elements. The selected drugs are subsequently clustered into communities (subgroups), which are proposed to share similar mechanisms of action. Finally, drug combinations are ranked according to maximal impact on signaling sub-networks from distinct mechanism-based communities. Our method is advantageous compared to other approaches in that it does not require large amounts of drug dose response data, drug-induced “omics” profiles or clinical efficacy data, which are not often readily available. We validate our approach using a BRAF-mutant melanoma signaling network and combinatorial in vitro drug screening data, and report drug combinations with diverse mechanisms of action and opportunities for drug repositioning.


44

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS


45

OWL-NETS: TRANSFORMING OWL REPRESENTATIONS FOR IMPROVED NETWORK INFERENCE

Tiffany J. Callahan1, William A. Baumgartner Jr.1, Michael Bada1, Adrianne L. Stefanski1, Ignacio Tripodi2, Elizabeth K. White1, Lawrence E. Hunter1

1University of Colorado Denver Anschutz Medical Campus, 2University of Colorado Boulder

Tiffany, Callahan

Our knowledge of the biological mechanisms underlying complex human disease is largely incomplete. While Semantic Web technologies, such as the Web Ontology Language (OWL), provide powerful techniques for representing existing knowledge, well-established OWL reasoners are unable to account for missing or uncertain knowledge. The application of inductive inference methods, like machine learning and network inference, is vital for extending our current knowledge. Therefore, robust methods which facilitate inductive inference on rich OWL-encoded knowledge are needed. Here, we propose OWL-NETS (NEtwork Transformation for Statistical learning), a novel computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. Using several examples built with the Open Biomedical Ontologies, we show that OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically-relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness.


46

AN ULTRA-FAST AND SCALABLE QUANTIFICATION PIPELINE FOR TRANSPOSABLE ELEMENTS FROM NEXT GENERATION SEQUENCING DATA

Hyun-Hwan Jeong, Hari Krishna Yalamanchili, Caiwei Guo, Joshua M. Shulman, Zhandong Liu

College of Medicine, Jan and Dan Duncan Neurological Research Institute

Hyun-Hwan, Jeong

Transposable elements (TEs) are DNA sequences which are capable of moving from one location to another and represent a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena such as cancer, neurodegenerative disease, and aging. Rapid development in RNA-sequencing technology has enabled us, for the first time, to study the activity of TEs at the systems level. However, efficient TE analysis tools are not yet developed. In this work, we developed SalmonTE, a fast and reliable pipeline for the quantification of TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and human cell-line. We achieved 20 times faster execution speed without compromising the accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE-centric discoveries.


47

IMPROVING THE EXPLAINABILITY OF RANDOM FOREST CLASSIFIER – USER CENTERED APPROACH

Dragutin Petkovic1,3, Russ B. Altman2, Mike Wong3, Arthur Vigil4

1Computer Science Department, San Francisco State University (SFSU), 1600 Holloway Ave., San Francisco CA 94132, [email protected]; 2Department of Bioengineering, Stanford University, 443 Via Ortega Drive, Stanford, CA 94305-4145; 3SFSU Center for Computing for Life Sciences, 1600 Holloway Ave., San Francisco, CA 94132; 4Twist Bioscience, 455 Mission Bay Boulevard South, San Francisco, CA 94158

Dragutin, Petkovic

Machine Learning (ML) methods are now influencing major decisions about patient care, new medical methods, and drug development, and their use and importance are rapidly increasing in all areas. However, these ML methods are inherently complex and often difficult to understand and explain, resulting in barriers to their adoption and validation. Our work (RFEX) focuses on enhancing Random Forest (RF) classifier explainability by developing easy to interpret explainability summary reports from trained RF classifiers as a way to improve the explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing the RFEX method we apply a user-centered approach driven by explainability questions and requirements collected by discussions with interested practitioners. We performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness. Analysis of the RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (from 2-6 depending on a model) top ranked features to achieve 90% or better of the accuracy when all 480 features are used.

Keywords: Random Forest, Explainability, Interpretability, Stanford FEATURE

48

TREE-BASED METHODS FOR CHARACTERIZING TUMOR DENSITY HETEROGENEITY

Katherine Shoemaker1, Brian P. Hobbs2, Karthik Bharath3, Chaan S. Ng2, Veerabhadran Baladandayuthapani2

1Rice University, 2MD Anderson Cancer Center, 3University of Nottingham

Katherine, Shoemaker

Solid lesions emerge within diverse tissue environments, making their characterization and diagnosis a challenge. With the advent of cancer radiomics, a variety of techniques have been developed to transform images into quantifiable feature sets producing summary statistics that describe the morphology and texture of solid masses. Relying on empirical distribution summaries as well as grey-level co-occurrence statistics, several approaches have been devised to characterize tissue density heterogeneity. This article proposes a novel decision-tree based approach which quantifies the tissue density heterogeneity of a given lesion through its resultant distribution of tree-structured dissimilarity metrics computed with least common ancestor trees under repeated pixel re-sampling. The methodology, based on statistics derived from Galton-Watson trees, produces metrics that are minimally correlated with existing features, adding new information to the feature space and improving quantitative characterization of the extent to which a CT image conveys heterogeneous density distribution. We demonstrate its practical application through a diagnostic study of adrenal lesions. Integrating the proposed features with existing features identifies classifiers of three important lesion types: malignant from benign (AUC = 0.78), functioning from non-functioning (AUC = 0.93) and calcified from non-calcified (AUC of 1).

49

DEMOCRATIZING HEALTH DATA FOR TRANSLATIONAL RESEARCH

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

50

IDENTIFYING NATURAL HEALTH PRODUCT AND DIETARY SUPPLEMENT INFORMATION WITHIN ADVERSE EVENT REPORTING SYSTEMS

Vivekanand Sharma, Indra Neil Sarkar

Center for Biomedical Informatics, Brown University

Vivekanand, Sharma

Data on safety and efficacy issues associated with natural health products and dietary supplements (NHP&S) remains largely cloistered within domain specific databases or embedded within general biomedical data sources. A major challenge in leveraging analytic approaches on such data is due to the inefficient ability to retrieve relevant data, which includes a general lack of interoperability among related sources. This study developed a thesaurus of NHP&S ingredient terms that can be used by existing biomedical natural language processing (NLP) tools for extracting information of interest. This process was evaluated relative to intervention name strings sampled from the United States Food and Drug Administration Adverse Event Reporting System (FAERS). A use case was used to demonstrate the potential to utilize FAERS for monitoring NHP&S adverse events. The results from this study provide insights on approaches for identifying additional knowledge from extant repositories of knowledge, and potentially as information that can be included into larger curation efforts.

51

DEMOCRATIZING DATA SCIENCE THROUGH DATA SCIENCE TRAINING

John Darrell Van Horn1, Lily Fierro2, Jeana Kamdar1, Jonathan Gordon2, Crystal Stewart1, Avnish Bhattrai1, Sumiko Abe1, Xiaoxiao Lei1, Caroline O’Driscoll1, Aakanchha Sinha2, Priyambada Jain2, Gully Burns2, Kristina Lerman2, José Luis Ambite2

1USC Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Avenue, SHN, Los Angeles, CA 90033, Phone: 323-442-7246; 2Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

John, Van Horn

The biomedical sciences have experienced an explosion of data which promises to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning, and open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required - in effect, using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.

52

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

53

HERITABILITY ESTIMATES ON RESTING STATE FMRI DATA USING THE ENIGMA ANALYSIS PIPELINE

Bhim M. Adhikari1, Neda Jahanshad2, Dinesh Shukla1, David C. Glahn3, John Blangero4, Richard C. Reynolds5, Robert W. Cox5, Els Fieremans6, Jelle Veraart6, Dmitry S. Novikov6, Thomas E. Nichols7, L. Elliot Hong1, Paul M. Thompson2, Peter Kochunov1

1Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA; 2Imaging Genetics Center, Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine of USC, Marina del Rey, CA, USA; 3Department of Psychiatry, Yale University, School of Medicine, New Haven, CT, USA; 4Genomics Computing Center, University of Texas at Rio Grande Valley, USA; 5National Institute of Mental Health, Bethesda, MD, USA; 6Center for Biomedical Imaging, Department of Radiology, New York University School of Medicine, NY, USA; 7Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK

Peter, Kochunov

Big data initiatives such as the Enhancing NeuroImaging Genetics through Meta-Analysis consortium (ENIGMA) combine data collected by independent studies worldwide to achieve more generalizable estimates of effect sizes and more reliable and reproducible outcomes. Such efforts require harmonized image analysis protocols to extract phenotypes consistently. This harmonization is particularly challenging for resting state fMRI due to the wide variability of acquisition protocols and scanner platforms; this leads to site-to-site variance in quality, resolution and temporal signal-to-noise ratio (tSNR). An effective harmonization should provide optimal measures for data of different qualities. We developed a multi-site rsfMRI analysis pipeline to allow research groups around the world to process rsfMRI scans in a harmonized way, to extract consistent and quantitative measurements of connectivity and to perform coordinated statistical tests. We used the single-modality ENIGMA rsfMRI preprocessing pipeline, based on model-free Marchenko-Pastur PCA denoising, to verify and replicate resting state network heritability estimates. We analyzed two independent cohorts, GOBS (Genetics of Brain Structure) and HCP (the Human Connectome Project), which collected data using conventional and connectomics-oriented fMRI protocols, respectively. We used seed-based connectivity and dual-regression approaches to show that the rsfMRI signal is consistently heritable across twenty major functional network measures. Heritability values of 20-40% were observed across both cohorts.

54

MRI TO MGMT: PREDICTING METHYLATION STATUS IN GLIOBLASTOMA PATIENTS USING CONVOLUTIONAL RECURRENT NEURAL NETWORKS

Lichy Han, Maulik R. Kamdar

Program in Biomedical Informatics, Stanford University School of Medicine

Lichy, Han

Glioblastoma Multiforme (GBM), a malignant brain tumor, is among the most lethal of all cancers. Temozolomide is the primary chemotherapy treatment for patients diagnosed with GBM. The methylation status of the promoter or the enhancer regions of the O6-methylguanine methyltransferase (MGMT) gene may impact the efficacy and sensitivity of temozolomide, and hence may affect overall patient survival. Microscopic genetic changes may manifest as macroscopic morphological changes in the brain tumors that can be detected using magnetic resonance imaging (MRI), which can serve as noninvasive biomarkers for determining methylation of MGMT regulatory regions. In this research, we use a compendium of brain MRI scans of GBM patients collected from The Cancer Imaging Archive (TCIA) combined with methylation data from The Cancer Genome Atlas (TCGA) to predict the methylation state of the MGMT regulatory regions in these patients. Our approach relies on a bi-directional convolutional recurrent neural network architecture (CRNN) that leverages the spatial aspects of these 3-dimensional MRI scans. Our CRNN obtains an accuracy of 67% on the validation data and 62% on the test data, with precision and recall both at 67%, suggesting the existence of MRI features that may complement existing markers for GBM patient stratification and prognosis. We have additionally presented our model via a novel neural network visualization platform, which we have developed to improve interpretability of deep learning MRI-based classification models.

55

BUILDING TRANS-OMICS EVIDENCE: USING IMAGING AND ‘OMICS’ TO CHARACTERIZE CANCER PROFILES

Arunima Srivastava1, Chaitanya Kulkarni1, Parag Mallick2, Kun Huang3, Raghu Machiraju1

1The Ohio State University, 2Stanford University, 3Indiana University School of Medicine

Arunima, Srivastava

Utilization of single modality data to build predictive models in cancer results in a rather narrow view of most patient profiles. Some clinical facets relate strongly to histology image features, e.g. tumor stages, whereas others are associated with genomic and proteomic variations (e.g. cancer subtypes and disease aggression biomarkers). We hypothesize that there are coherent “trans-omics” features that characterize varied clinical cohorts across multiple sources of data, leading to more descriptive and robust disease characterization. In this work, for 105 breast cancer patients from the TCGA (The Cancer Genome Atlas), we consider four clinical attributes (AJCC Stage, Tumor Stage, ER-Status and PAM50 mRNA Subtypes), and build predictive models using three different modalities of data (histopathological images, transcriptomics and proteomics). We then identify critical multi-level features that drive successful classification of patients for the various cohorts. To build predictors for each data type, we employ widely used “best practice” techniques including CNN-based (convolutional neural network) classifiers for histopathological images and regression models for proteogenomic data. While, as expected, histology images outperformed molecular features when predicting cancer stages, and transcriptomics held superior discriminatory power for ER-Status and PAM50 subtypes, there exist a few cases where all data modalities exhibited comparable performance. Further, we also identified sets of key genes and proteins whose expression and abundance correlate across each clinical cohort including (i) tumor severity and progression (incl. GABARAP), (ii) ER-status (incl. ESR1) and (iii) disease subtypes (incl. FOXC1). Thus, we quantitatively assess the efficacy of different data types to predict critical breast cancer patient attributes and improve disease characterization.

56

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

57

LOCAL ANCESTRY TRANSITIONS MODIFY SNP-TRAIT ASSOCIATIONS

Alexandra E. Fish1, Dana C. Crawford2, John A. Capra1, William S. Bush2

1Vanderbilt University, 2Case Western Reserve University

William, Bush

Genomic maps of local ancestry identify ancestry transitions – points on a chromosome where recent recombination events in admixed individuals have joined two different ancestral haplotypes. These events bring together alleles that evolved within separate continental populations, providing a unique opportunity to evaluate the joint effect of these alleles on health outcomes. In this work, we evaluate the impact of genetic variants in the context of nearby local ancestry transitions within a sample of nearly 10,000 adults of African ancestry with traits derived from electronic health records. Genetic data was located using the Metabochip, and used to derive local ancestry. We develop a model that captures the effect of both single variants and local ancestry, and use it to identify examples where local ancestry transitions significantly interact with nearby variants to influence metabolic traits. In our most compelling example, we find that the minor allele of rs16890640 occurring on a European background with a downstream local ancestry transition to African ancestry results in significantly lower mean corpuscular hemoglobin and volume. This finding represents a new way of discovering genetic interactions, and is supported by molecular data that suggest changes to local ancestry may impact local chromatin looping.

58

EVALUATION OF PREDIXCAN FOR PRIORITIZING GWAS ASSOCIATIONS AND PREDICTING GENE EXPRESSION

Binglan Li1, Shefali S. Verma1,2, Yogasudha C. Veturi2, Anurag Verma1,2, Yuki Bradford2, David W. Haas3,4, Marylyn D. Ritchie1,2

1The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; 2Biomedical and Translational Informatics Institute, Danville, PA, USA; 3Department of Medicine, Pharmacology, Pathology, Microbiology & Immunology, Vanderbilt University School of Medicine, Nashville, TN, USA; 4Department of Internal Medicine, Meharry Medical College, Nashville, TN, USA

Binglan, Li

Genome-wide association studies (GWAS) have been successful in facilitating the understanding of genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect size, GWAS requires very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. But most importantly, the identified associations can reveal the genes that are under regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also looked into some potential influential factors and a filter criterion with the aim of improving PrediXcan performance. As for GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs into prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan’s performance that will be of value to eQTL and complex human disease research.
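
The two ideas in the abstract above (predicting expression from SNP-expression models, and the lighter multiple testing burden of gene-level tests) can be illustrated with a minimal sketch. It assumes that a gene's predicted expression is a weighted sum of SNP dosages and that a Bonferroni correction is used; the SNP identifiers, weights, and test counts are hypothetical, and the functions are not PrediXcan's actual interface.

```python
def predict_expression(dosages, weights):
    """Predicted expression as a weighted sum of SNP dosages (assumed linear model)."""
    # SNPs missing from the genotype data contribute nothing to the prediction.
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical eQTL weights for one gene and one individual's dosages (0, 1 or 2).
weights = {"rs_hypothetical_a": 0.5, "rs_hypothetical_b": -0.25}
dosages = {"rs_hypothetical_a": 2, "rs_hypothetical_b": 1}

expr = predict_expression(dosages, weights)

# Illustrative test counts: one million variants versus ten thousand genes.
snp_threshold = bonferroni_threshold(0.05, 1_000_000)
gene_threshold = bonferroni_threshold(0.05, 10_000)

print(expr, snp_threshold, gene_threshold)
```

With fewer tests, the gene-level threshold is less stringent, which is one way to read the abstract's remark about prioritizing associations that fall below genome-wide significance.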

59

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

60

PAN-CANCER ANALYSIS OF EXPRESSED SOMATIC NUCLEOTIDE VARIANTS IN LONG INTERGENIC NON-CODING RNA

Travers Ching1,2, Lana X. Garmire1,2

1Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI 96822, USA; 2Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA

Lana, Garmire

Long intergenic non-coding RNAs have been shown to play important roles in cancer. However, because lincRNAs are a relatively new class of RNAs compared to protein-coding mRNAs, the mutational landscape of lincRNAs has not been as extensively studied. Here we characterize expressed somatic nucleotide variants within lincRNAs using 12 cancer RNA-Seq datasets in TCGA. We build machine-learning models to discriminate somatic variants from germline variants within lincRNA regions (AUC 0.987). We build another model to differentiate lincRNA somatic mutations from background regions (AUC 0.72) and find several molecular features that are strongly associated with lincRNA mutations, including copy number variation, conservation, substitution type and histone marker features.

61

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

62

GENEDIVE: A GENE INTERACTION SEARCH AND VISUALIZATION TOOL TO FACILITATE PRECISION MEDICINE

Paul Previde1, Brook Thomas1, Mike Wong1, Emily K. Mallory2, Dragutin Petkovic1, Russ B. Altman2, Anagha Kulkarni1

1San Francisco State University, 2Stanford University

Anagha, Kulkarni

Obtaining relevant information about gene interactions is critical for understanding disease processes and treatment. With the rise in text mining approaches, the volume of such biomedical data is rapidly increasing, thereby creating a new problem for the users of this data: information overload. A tool for efficient querying and visualization of biomedical data that helps researchers understand the underlying biological mechanisms for diseases and drug responses, and ultimately helps patients, is sorely needed. To this end we have developed GeneDive, a web-based information retrieval, filtering, and visualization tool for large volumes of gene interaction data. GeneDive offers various features and modalities that guide the user through the search process to efficiently reach the information of their interest. GeneDive currently processes over three million gene-gene interactions with response times within a few seconds. For over half of the curated gene sets sourced from four prominent databases, more than 80% of the gene set members are recovered by GeneDive. In the near future, GeneDive will seamlessly accommodate other interaction types, such as gene-drug and gene-disease interactions, thus enabling full exploration of topics such as precision medicine. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net.

63

APPLICATIONS OF GENETICS, GENOMICS AND BIOINFORMATICS IN DRUG DISCOVERY

POSTER PRESENTATIONS

64

CELL-SPECIFIC PREDICTION AND APPLICATION OF DRUG-INDUCED GENE EXPRESSION PROFILES

Rachel Hodos1,2, Ping Zhang3, Hao-Chih Lee1, Qiaonan Duan1, Zichen Wang1, Neil R. Clark1, Avi Ma’ayan1, Fei Wang3,4, Brian Kidd1, Jianying Hu3, David Sontag5, Joel T. Dudley1

1Icahn School of Medicine at Mount Sinai, 2New York University, 3IBM T.J. Watson Research Center, 4Cornell University, 5Massachusetts Institute of Technology

Rachel, Hodos

Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions, i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
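
As an illustration of the local (nearest-neighbors) idea described in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes the tensor is stored as a dictionary keyed by (drug, cell type) with a list of per-gene values, that cell-type similarity is the Pearson correlation over drugs measured in both cell types, and that the prediction is the mean profile of the same drug in the k most similar cell types. All names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def predict_profile(profiles, drug, cell, k=1):
    """Predict the unmeasured (drug, cell) profile from the k most similar cell types."""
    # Cell types in which the target drug has been measured.
    candidates = [c for (d, c) in profiles if d == drug and c != cell]
    scored = []
    for other in candidates:
        # Drugs measured in both the target cell type and the candidate.
        shared = [d for (d, c) in profiles if c == cell and (d, other) in profiles]
        if not shared:
            continue
        x = [v for d in shared for v in profiles[(d, cell)]]
        y = [v for d in shared for v in profiles[(d, other)]]
        scored.append((pearson(x, y), other))
    scored.sort(reverse=True)
    top = [c for _, c in scored[:k]]
    if not top:
        return None
    n_genes = len(profiles[(drug, top[0])])
    return [sum(profiles[(drug, c)][g] for c in top) / len(top) for g in range(n_genes)]


# Hypothetical toy tensor: 2 drugs x 3 genes x 3 cell types, with (drugB, cell1) missing.
profiles = {
    ("drugA", "cell1"): [1.0, -1.0, 0.5],
    ("drugA", "cell2"): [0.9, -1.1, 0.4],
    ("drugA", "cell3"): [-1.0, 1.0, -0.5],
    ("drugB", "cell2"): [2.0, 0.0, -1.0],
    ("drugB", "cell3"): [-2.0, 0.1, 1.0],
}

predicted = predict_profile(profiles, "drugB", "cell1", k=1)
print(predicted)
```

In this toy data, cell2 responds to drugA much like cell1 does, so the drugB profile measured in cell2 is used as the prediction. A global tensor-completion method would instead fit a low-rank model to all measured entries at once.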

65

SYSTEMATIC DISCOVERY OF GENOMIC MARKERS FOR CLINICAL OUTCOMES THROUGH COMBINED ANALYSIS OF CLINICAL AND GENOMIC DATA

Jinho Kim1, Hongui Cha2, Hyun-Tae Shin2, Boram Lee2, Jae Won Yun2, Joon Ho Kang3, Woong-Yang Park1

1Samsung Medical Center, 2Sungkyunkwan University, 3Sungkyunkwan University School of Medicine

Jinho, Kim

Molecular profiling is a key component of precision medicine for cancer, as it provides targetable genes or pathways to prevent tumor growth. In this regard, more and more cancer clinics employ clinical sequencing platforms and are accumulating clinicogenomics data. However, it has not been systematically studied how genomic alterations, in particular variants in DNA, can help in predicting clinical outcomes. Here we describe systematic analyses to gain biological insights from a cancer genome databank associated with clinical information. We established a large databank of clinical and genomic information through our NGS-based clinical sequencing platform, CancerSCAN. We identified novel clinically relevant variant markers which are potentially implicated in patient survival and response to chemotherapeutic agents. Finally, we built a multigene model to predict clinical outcome. The model correctly captured clinically relevant somatic variants and was validated using an independent cohort. Our study provides a valuable resource to realize precision oncology.

66

IDENTIFICATION OF A PREDICTIVE GENE SIGNATURE FOR DIFFERENTIATING THE EFFECTS OF CIGARETTE SMOKING

Gang Liu1, Justin Li2, G. L. Prasad1

1RAI Services Company, P.O. Box 1487, Winston-Salem, NC 27102, USA; 2AccuraScience, 5721 Merle Hay Road, Johnston, IA 50131, USA

Background: Chronic cigarette smoking adversely impacts multiple organs and is a major risk factor for several diseases such as cancer, cardiovascular diseases, and chronic obstructive pulmonary disease (COPD). Because smoking-related diseases often develop over a long period, it is useful to investigate the effects of smoking in healthy individuals to understand the pre-clinical changes that lead to disease states. Those early molecular events could be further developed into biomarkers that are indicative of the adverse effects of smoking. Several classes of different tobacco products, including electronic cigarettes (E-cigs), are currently marketed in the USA, and their impact on consumers has not yet been fully understood. Given that there is no epidemiology data available for these new classes of tobacco products, an understanding of the early molecular and cellular changes in healthy consumers could help to differentiate the effects of cigarettes and other classes of tobacco products. Toward that end, in this study, we aim to develop predictive gene signatures that can be used to differentiate smokers from non-tobacco consumers.

Methods: The data we used for identification of gene signatures were derived from blood-based genome-wide expression profiles from 40 smokers and 40 non-tobacco consumers enrolled in a cross-sectional biomarker study. We systematically evaluated the performance of several machine learning algorithms. These algorithms are combinations of four classification methods, including Support Vector Machine (SVM), and four feature selection methods, including Recursive Feature Elimination (RFE). Each gene expression signature model was constructed using a two-layer cross-validation scheme. The models were evaluated using accuracy and the Matthews correlation coefficient (MCC), which are performance evaluation metrics widely used in machine learning.

Results: Our results suggest that SVM combined with RFE outperforms the 15 other algorithms we have tested. This led to identification of a 32-gene signature with high sensitivity and specificity. In addition, this new gene signature achieves excellent validation results (accuracy: 0.87, MCC: 0.7) when evaluated using another independent microarray dataset from smokers and non-smokers. The genes in the 32-gene signature include previously reported gene biomarkers such as GPR15, SASH1, and LRRN3, and also consist of additional novel genes associated with inflammation, liver injury, and arachidonic acid metabolism. We are currently working to further refine and validate this gene signature using other publicly available smoking-related gene expression datasets and the polymerase chain reaction-based assay.

Conclusions: We have described a high-performing 32-gene signature that enables prediction of molecular changes in healthy smokers. This gene signature could aid in differentiating the effects of additional classes of tobacco products such as E-cigs.
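
For readers unfamiliar with the two evaluation metrics named in the Methods, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix using the standard definitions. The counts are hypothetical and are not the study's results.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for 40 smokers (positives) and 40 non-tobacco consumers (negatives).
tp, fn = 35, 5   # smokers classified correctly / incorrectly
tn, fp = 35, 5   # non-tobacco consumers classified correctly / incorrectly

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(acc, score)
```

Unlike accuracy, MCC ranges from -1 to 1 and stays near 0 for a classifier that predicts a single class, which makes it a useful complement when class sizes differ.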

67

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY

Mary A. Pyc, Douglas Fenger, Philip Cheung, J. Steven de Belle, Tim Tully

Dart NeuroScience

Douglas, Fenger

We are interested in discovering candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s diseases. We implemented a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test – the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com) – to screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. A subset of subjects were validated by a battery of secondary memory tasks and provided saliva samples from which we can isolate DNA for GWAS. To date, 26,348 participants from 187 nations have been screened (with 16,486 completing both sessions). The sample is primarily Caucasian (58%), post-secondary school-educated (64%), average age of 34 years old, and equal numbers of each gender. The average forgetting rate across sessions was 10%. The secondary screening involved memory, IQ, attentional control, and personality measures. Analyses are underway to determine the relationship between exceptional memory and genetics.

68

EXTRACTING A BIOLOGICALLY RELEVANT LATENT SPACE FROM CANCER TRANSCRIPTOMES WITH VARIATIONAL AUTOENCODERS

Gregory P. Way, Casey S. Greene

Genomics and Computational Biology Graduate Program, Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA

Gregory, Way

The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer-types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's Romeo and Juliet. From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
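
The abstract does not state the training objective. For orientation only, the standard VAE formulation maximizes the evidence lower bound below, where the symbols are assumptions introduced here: x is a tumor's expression profile, z its latent encoding, q_phi the encoder, p_theta the decoder, and p(z) a standard normal prior.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

The first term rewards accurate reconstruction of the expression profile, and the second keeps the encoded distribution close to the prior, which is what makes the latent space smooth enough to explore.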

69

CHALLENGES OF PATTERN RECOGNITION IN BIOMEDICAL DATA ORAL PRESENTATION

POSTER PRESENTATIONS

70

LARGE-SCALE ANALYSIS OF DISEASE PATHWAYS IN THE HUMAN INTERACTOME

Monica Agrawal1, Marinka Zitnik1, Jure Leskovec1,2

1Department of Computer Science, Stanford University; 2Chan Zuckerberg Biohub, San Francisco, CA

Marinka, Zitnik

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins. However, the success of such methods has been limited, and failure cases have not been well understood. Here we study the PPI network structure of 519 disease pathways. We find that 90% of pathways do not correspond to single well-connected components in the PPI network. Instead, proteins associated with a single disease tend to form many separate connected components/regions in the network. We then evaluate state-of-the-art disease pathway discovery methods and show that their performance is especially poor on diseases with disconnected pathways. Thus, we conclude that network connectivity structure alone may not be sufficient for disease pathway discovery. However, we show that higher-order network structures, such as small subgraphs of the pathway, provide a promising direction for the development of new methods.
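
The notion of a pathway splitting into separate connected components can be made concrete with a small sketch. It assumes the PPI network is given as an undirected edge list and that components are computed on the subgraph induced by the disease proteins; the protein names and edges are hypothetical, and this is not the authors' code.

```python
from collections import deque


def pathway_components(edges, disease_proteins):
    """Connected components of the PPI subgraph induced by the disease proteins."""
    proteins = set(disease_proteins)
    adjacency = {p: set() for p in proteins}
    for u, v in edges:
        # Keep only interactions where both endpoints are disease proteins.
        if u in proteins and v in proteins:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = []
    for start in sorted(proteins):
        if start in seen:
            continue
        # Breadth-first search from each unvisited protein.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical PPI edges; X and F are not associated with the disease.
edges = [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"), ("E", "F")]
disease = ["A", "B", "C", "D", "E"]

components = pathway_components(edges, disease)
largest_fraction = max(len(c) for c in components) / len(disease)
print(len(components), largest_fraction)
```

Here D is linked to the rest of the pathway only through the non-disease protein X, so the five disease proteins fall into three components, the kind of fragmentation the abstract reports for most pathways.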

71

PROFILING OF SOMATIC ALTERATIONS IN BRCA1-LIKE BREAST TUMORS

Youdinghuan Chen1,2,3, Yue Wang3,4, Lucas A. Salas1, Todd W. Miller3,7, Jonathan D. Marotti5, Nicole P. Jenkins2, Arminja N. Kettenbach2,3,7, Chao Cheng3,4,7, Brock C. Christensen1,3,7

1Department of Epidemiology, 2Department of Biochemistry and Cell Biology, 3Department of Molecular and Systems Biology, 4Department of Genetics, 5Department of Pathology and Laboratory Medicine, 6Department of Biomedical Data Science at Geisel School of Medicine, Dartmouth, Lebanon, NH 03756; 7Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756

Youdinghuan, Chen

Germline or somatic mutation in BRCA1 is associated with an increased risk of breast cancer and more aggressive tumor subtypes. BRCA1-deficient tumor cells have defective homologous recombination (HR) DNA repair, exhibiting genome instability and aneuploidy. HR deficiency can also arise in tumors in the absence of BRCA1 mutation. An HR-deficient, BRCA1-like phenotype has been referred to as “BRCAness.” BRCA1-like cancers exhibit worse prognosis but are selectively sensitive to chemotherapeutic treatments (e.g. platinum-based alkylating agents). However, the molecular landscapes of BRCA1-like breast tumors remain largely unknown, in part because they are less common in the general population. By applying a copy number-based classifier, we observed that >30% of The Cancer Genome Atlas (TCGA) breast tumors are BRCA1-like even though only ~3% of the tumors analyzed carry a BRCA1 mutation or promoter hypermethylation. Separately, a differential analysis controlling for hormone receptor status, subject age, tumor stage and purity revealed a significant increase in DNA methyltransferase 1 (DNMT1) protein expression in BRCA1-like tumors. In addition, differentially methylated gene sets in BRCA1-like tumors indicated a strong enrichment in developmental signaling and a moderate involvement in gene transcription. Profiling of concomitant somatic alteration landscapes in BRCA1-like breast tumors provides alternative strategies to identify this subset of tumors and insights into novel potential therapeutic approaches.

72

USING ARTIFICIAL INTELLIGENCE IN DIGITAL PATHOLOGY TO CLASSIFY MELANOCYTIC LESIONS

Steven N. Hart, W. Flotte, A. P. Norgan, K. K. Shah, Z. R. Buchan, K. B. Geiersbach, T. Mounajjed, T. J. Flotte

Mayo Clinic, 200 First St. SW, Rochester, MN 55901

Presenting author: Steven Hart

Examination of hematoxylin and eosin (H&E) stained slides by light microscopy has been the cornerstone of histopathology for over a century. During microscopic examination, a pathologist uses salient clinical information, pattern matching and feature recognition (shape, color, structure, etc.) to render a diagnosis. Recently, whole-slide image (WSI) scanners have made it possible to fully digitize pathology slides. In addition to enabling long term slide preservation and facilitating slide sharing for collaboration or second opinions, digitization of pathology slides allows for the development and utilization of Artificial Intelligence (AI)-driven diagnostic tools. We conducted a pilot study to test the ability of an AI convolutional neural network (CNN) to distinguish between two types of melanocytic lesions, Conventional and Spitz nevi. We sought to determine the added value of pathologist-assisted training by comparing the training effectiveness of complete slide analysis versus training on pathologist-selected image patches. Images were classified by a deep CNN using Google’s TensorFlow framework. We found significant improvement in classification accuracy when the model was trained from the pathologist-curated image set. These data provide strong evidence for the continued development of AI-driven diagnostic tools in digital pathology, and highlight the added value of domain experts when building AI workflows. Future directions of this work include expanding the number of melanocytic lesions recognized by this tool, and enhancing its clinical performance through incorporation of molecular, demographic, and outcomes data.


73

A MACHINE LEARNING APPROACH TO STUDY COMMON GENE EXPRESSION PATTERNS

Mingze He1,2, Carolyn J. Lawrence-Dill1,2,3

1Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, USA, 50011; 2Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, USA 50011; 3Department of Agronomy, Iowa State University, Ames, Iowa, USA 50011

Presenting author: Mingze He

The gene expression landscape changes according to certain circumstances, such as stress responses. The main difficulties in predicting common expression patterns among groups of genes lie in locating reliable gene markers and developing novel statistical approaches. We first build a shared gene ontology (GO) correlated grouping database by natural language processing (NLP). Further, we test and apply a mixture of supervised and unsupervised machine learning algorithms to compare principal components of expression patterns across species. We found several surprising common expression patterns between maize genes and human tumor cell lines when G-quadruplex (G4) is used as the gene classifier. In particular, G4-carrying genes related to the response to reactive oxygen species (ROS) show significant clustering of maize under cold and UV stress with human tumor cell lines. This result implies that G4s regulate nearby genes under similar stress conditions.


74

GENERAL

POSTER PRESENTATIONS


75

DATABASE-FREE METAGENOMIC ANALYSIS WITH AKRONYMER

Gabriel Al-Ghalith1, Abigail Johnson2, Pajau Vangay1, Dan Knights3

1Bioinformatics and Computational Biology - University of Minnesota; 2The Biotechnology Institute - University of Minnesota; 3Department of Computer Science and Engineering - University of Minnesota

Presenting author: Gabriel Al-Ghalith

Microbiome research is characterized by the comparison of microbial community census data inferred from biological samples. To create these censuses, metagenomic DNA is typically clustered, aligned, or otherwise annotated to form a set of features with which to evaluate and compare microbial communities. These features may take different forms. Amplicon-based studies may use reference-based approaches and/or clustering of similar reads to distill a representative set of features such as operational taxonomic units. Shotgun-based approaches can result in finer-grained, less biased taxonomic resolution, but often rely on reference databases or classifiers trained on known microbial entities. While taxonomy and other database annotations are useful for interpretation, they may mask useful sequence-level information for comparing samples to each other. In particular, whenever there is not enough sequence data from particular organisms in the reference database (or raw reads) to identify them reliably, information about these organisms can be lost or misattributed. This causes many environments to be difficult or even impossible to compare with current methods. Further, clustering or reference-based analyses are typically computationally demanding. We present a complementary (or alternative) strategy for microbiome comparison in the software aKronyMer. It uses a novel, probability adjusted deterministic k-mer distance metric and ultrafast non-heuristic Nei-Saitou-based tree clustering algorithm to rapidly calculate alpha diversity, beta diversity, and sample inter-relatedness trees with either amplicon or shotgun sequence data directly without a database. It is robust to low-depth sequencing, it recovers person-specific signatures with fewer than 100,000 shotgun reads per sample in a dataset of 34 healthy individuals, and it recapitulates other expected trends in public datasets. Additionally, aKronyMer can be used to infer phylogenetic trees from amplicon data in seconds on a laptop, create a whole-genome phylogenomic tree from all ~100,000 RefSeq microbial genomes in a few hours on a desktop, denoise reads during processing, and in other potential applications.
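The abstract does not specify aKronyMer's probability-adjusted distance, so the following is only a minimal sketch of the general idea of database-free, k-mer-based comparison. It substitutes a plain Jaccard distance over k-mer sets; the function names and the choice of k are illustrative assumptions, not part of aKronyMer.

```python
def kmer_set(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """Jaccard distance between the k-mer sets of two sequences.

    This is a stand-in metric (an assumption); it is NOT the
    probability-adjusted distance described in the abstract.
    """
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        # Both sequences are shorter than k: treat them as identical.
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)
```

A pairwise matrix of such distances between samples is the kind of input a neighbor-joining style tree builder would consume, with no reference database involved.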


76

SOFTWARE COMPARISON FOR PREPROCESSING GC/LC-MS-BASED METABOLOMICS DATA

Julian Aldana1, Monica Cala Molina1, Martha Zuluaga2

1Department of Chemistry, Grupo de Investigación en Química Analítica y Bioanalítica (GABIO), Universidad de los Andes, Bogotá DC, Colombia; 2Department of Chemistry, Grupo de Investigación en Cromatografía y Técnicas Afines (GICTA), Universidad de Caldas, Manizales, Colombia

Presenting author: Julian Aldana

Metabolomics data preprocessing is the first step from raw instrument output to biological inference, and it is crucial for the discovery of metabolic signatures related to a particular physiopathological state of an organism. Moreover, handling gas chromatography/mass spectrometry (GC/MS) and liquid chromatography/mass spectrometry (LC/MS) datasets is challenging due to their size, complexity and noise. Therefore, data preprocessing is performed as a multi-step task that involves filtering, peak detection, deconvolution, and alignment, which can be carried out using a wide variety of algorithms and software packages. Given the lack of a single preprocessing software as a benchmark, the goal of this study is to compare the performance in the preprocessing of GC/LC-MS data between open source platforms (MZmine 2, XCMS online and MetaboAnalyst 3.0) and commercial software (MassHunter Profinder 8.0 and Metaboliteplot). For this purpose, datasets were collected from the analysis of replicate samples from a plasma pool, and we followed a workflow in each software, adjusting the parameters in a similar way to allow the comparison. Then, the data generated were analyzed to determine the number of features, coefficient of variation and peak area. As a result, significant differences were found in the quantitative performance of the evaluated preprocessing packages for both GC and LC-MS datasets. Finally, this comparison allowed us to evaluate the magnitude of the preprocessing effect on the final output in MS-based metabolomic data, and how the results of different software can be compared with each other.


77

GATEKEEPER: A NEW HARDWARE ARCHITECTURE FOR ACCELERATING PRE-ALIGNMENT IN DNA SHORT READ MAPPING

Mohammed Alser1, Hasan Hassan2, Hongyi Xin3, Oğuz Ergin4, Onur Mutlu2, Can Alkan1

1Bilkent University, 2ETH Zurich, 3Carnegie Mellon University, 4TOBB University of Economics and Technology

Presenting author: Onur Mutlu

Motivation: Until today, it remains challenging to sequence the entire DNA molecule as a whole. In the era of high throughput DNA sequencing (HTS) technologies, genomes are sequenced relatively quickly but result in an excessive number of small DNA segments (called short reads, about 75-300 base pairs long). Resulting reads do not have any information about which part of the genome they come from; hence the biggest challenge in genome analysis is to determine the origin of each of the billions of short reads within a reference genome to construct the donor’s complete genome. Identifying the potential origin of each read, called alignment, is typically performed using quadratic-time dynamic programming algorithms. These optimal alignment algorithms are unavoidable and essential for providing accurate information about the quality of the alignment. In recent works [1-4], researchers observed that the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations wastes execution time and incurs a significant computational burden. Therefore, it is crucial to develop a fast and effective heuristic method that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms.

Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.

Availability: GateKeeper is open-source and freely available online at https://github.com/BilkentCompGen/GateKeeper. An extended version of this work appears in [1].

References:
[1] Alser, M., et al., GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics, 2017. 33(21): p. 3355-3363.
[2] Xin, H., et al., Shifted Hamming Distance: A Fast and Accurate SIMD-Friendly Filter to Accelerate Alignment Verification in Read Mapping. Bioinformatics, 2015. 31(10): p. 1553-1560.
[3] Xin, H., et al., Accelerating read mapping with FastHASH. BMC Genomics, 2013. 14(Suppl 1): p. S13.
[4] Kim, J., et al., Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies, to appear in BMC Genomics, 2018.
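To make the notion of a pre-alignment filter concrete, here is a minimal software sketch in the spirit of the shifted-Hamming idea cited above. It is a simplification written for illustration and is not GateKeeper's FPGA design: the function name, the per-character loop, and the handling of sequence ends are all assumptions.

```python
def passes_prealignment(read, reference, max_edits):
    """Cheaply decide whether a candidate location deserves full alignment.

    A read position counts as "explained" if it equals the reference
    character at the same position or at any shift of up to max_edits
    positions in either direction. If more than max_edits positions
    remain unexplained, the pair cannot align within max_edits edits,
    so the candidate is rejected without running dynamic programming.
    """
    unexplained = 0
    for i, base in enumerate(read):
        explained = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if 0 <= j < len(reference) and reference[j] == base:
                explained = True
                break
        if not explained:
            unexplained += 1
    return unexplained <= max_edits
```

A filter of this kind can let some dissimilar pairs through (false accepts), which the later exact alignment step still rejects; its purpose is only to discard clearly incorrect candidates early and cheaply.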


78

MODELING THE ENHANCER ACTIVITY THROUGH THE COMBINATION OF EPIGENETIC FACTORS

Min Gyun Bae, Taeyeop Lee, Jaeho Oh, Jun Hyeong Lee, Jung Kyoon Choi

Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Presenting author: Min Gyun Bae

Epigenome maps allow us to predict thousands of putative regulatory regions such as promoters, insulators and enhancers in various cell lines through in vivo epigenomic signatures, and are widely used for studying gene regulation in developmental processes and disease. In particular, super-enhancers, which consist of clusters of active enhancers predicted from the H3K27ac signal, are known to regulate nearby genes that are important in controlling and defining cell identity. However, the combination of transcription factors that regulates enhancer activity has not yet been studied. In this study, we used massively parallel reporter assay (MPRA) data, which measure the quantitative activity of regulatory regions, to identify enhancers. Through 5-nucleotide resolution tiling of overlapping MPRA constructs with a probabilistic graphical model, we estimated the high resolution activity spanning 15000 putative regulatory regions in the K562 and HepG2 cell lines. According to the ratio of activity at the boundary and center of each regulatory region, we identified thousands of enhancer candidates. Using these enhancers, we developed a random forest model to identify the epigenetic differences using about 300 histone modifications and transcription factors in the Encyclopedia of DNA Elements (ENCODE). Through a performance test by area under the curve (AUC), we confirmed that our model accurately predicted the enhancers. In conclusion, we identified enhancers through a high-throughput reporter assay and found the epigenetic features through random forest modelling.


79

FREQUENCY AND PROPERTIES OF MOSAIC SOMATIC MUTATIONS IN A NORMAL DEVELOPING BRAIN

Taejeong Bae1, Jessica Mariani2, Livia Tomasini2, Bo Zhou3, Alexander E. Urban3, Alexej Abyzov1, Flora M. Vaccarino2

1Mayo Clinic, 2Yale University, 3Stanford University

Presenting author: Alexej Abyzov

As mounting evidence indicates, each cell in the human body has its own genome, a phenomenon called somatic mosaicism. Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. Starting from single cells, directly obtained from three fetal brains, we established 31 separate colonies of neuronal progenitor cells, and carried out whole-genome sequencing on DNA from each colony. The clonal nature of these colonies allows a high-resolution analysis of the genomes of the founder progenitor cells without being confounded by the artifacts of in vitro single cell whole genome amplification. Across the three brains we detected 200 to 400 non-germline SNVs per clone. Validation experiments (with PCR, digital droplet PCR, and capture deep sequencing) revealed high specificity (>95%) and sensitivity (>80%) of the SNVs as well as confirmed the presence of over a hundred SNVs in the original brain tissues, thereby proving that the detected SNVs represent genuine mosaic variants present in neuronal progenitors. The per-cell number of mosaic SNVs increased linearly with brain age, allowing us to estimate the mutation rate at about 8.6 SNVs per cell division. Dozens of SNVs were genotyped in multiple different regions of a brain and even in blood, suggesting that they have occurred prior to gastrulation. Using these SNVs, we reconstructed cell lineages for the first five post-zygotic cleavages and calculated a mutation rate of ~1.3 SNVs per division per daughter cell. Comparison of mutation spectra revealed a shift towards oxidative damage-related mutations in neurogenesis. Both neurogenesis and early embryogenesis exhibit drastically more mutagenesis than adulthood. On a coarse-grained scale mosaic SNVs were distributed uniformly across the genome and were enriched in mutational signatures observed in medulloblastoma, neuroblastoma, as well as in a signature observed in all cancers and in de novo variants and which, as we previously hypothesized, is a hallmark of normal cell proliferation. Correlations with histone marks further strengthened the similarity of mosaic mutations in normal fetal brain with somatic mutations reported for brain cancers. On a smaller scale SNVs were mostly benign, showed no association with any GO category and tended to avoid DNAse hypersensitive sites. These findings reveal a large degree of somatic mosaicism in the developing human brain, link de novo and cancer mutations to normal mosaicism and set a baseline for mosaic genome variation related to human brain development and function.


80

CYCLONOVO: DE NOVO SEQUENCING ALGORITHM DISCOVERS NOVEL CYCLIC PEPTIDE NATURAL PRODUCTS IN SUNFLOWER AND CYANOBACTERIA USING TANDEM MASS SPECTROMETRY DATA

Bahar Behsaz1, Hosein Mohimani2, Alexey Gurevich3, Andrey Prjibelski3, Mark F. Fisher4, Larry Smarr2, Pieter C. Dorrestein5, Joshua S. Mylne4, Pavel A. Pevzner2

1Bioinformatics and Systems Biology Program, University of California at San Diego, La Jolla, USA; 2Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA; 3Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St Petersburg, Russia; 4The University of Western Australia, School of Molecular Sciences and ARC Centre of Excellence in Plant Energy Biology, Crawley, Australia; 5Department of Pharmacology, University of California at San Diego, La Jolla, USA

Presenting author: Bahar Behsaz

Cyclopeptides represent an important class of natural products with an unparalleled track record in pharmacology: many antibiotics, antitumor agents, and immunosuppressors are cyclopeptides. While billions of tandem mass spectra of natural products have been deposited to the Global Natural Products Social (GNPS) molecular network, the discovery of novel cyclopeptides from this gold mine of spectral data remains challenging. As a result, only a small fraction of spectra in the GNPS molecular network have been identified so far. To address this bottleneck, we developed the CycloNovo algorithm for de novo cyclopeptide sequencing based on the concept of de Bruijn graphs, the workhorse of modern genome sequencing algorithms. Given a spectral dataset, CycloNovo first identifies a subset of this dataset that may represent cyclic and branch-cyclic peptides by analyzing the spectral convolution of each spectrum. Afterward, it attempts to de novo sequence each spectrum of putative cyclic or branch-cyclic peptides. The CycloNovo pipeline includes (i) computing the spectral convolution of each spectrum, and extracting the set of masses that represent putative amino acids in the unknown PNP, (ii) computing compositions of masses that match the precursor mass of the spectrum, (iii) constructing potential 5-mers for each composition with high score against the spectrum, (iv) constructing a de Bruijn graph with those 5-mers, (v) traversing the de Bruijn graph and generating candidate sequences, and (vi) computing the Peptide-Spectrum-Match (PSM) score for each candidate sequence. CycloNovo revealed many still unknown cyclopeptides (hundreds of novel cyclopeptide families), illustrating that currently known cyclopeptides represent just a small fraction of cyclopeptides whose spectra are already deposited into public databases such as GNPS. CycloNovo addresses the challenge of analyzing the “dark matter of cyclopeptidome” by applying de Bruijn graphs to cyclopeptide sequencing. It correctly sequenced many known cyclopeptides in a blind experiment and reconstructed novel cyclopeptides originating from plants and cyanobacteria that were further validated using RNA-seq data and genome mining, the first cyclopeptides discovered in a completely automated de novo fashion. Our analysis of the human microbiome is the first demonstration that numerous bioactive cyclopeptides from consumed plants remain stable in the proteolytic human gut environment and thus are expected to interact with the human microbiome. In addition, it revealed a large number of still unknown cyclopeptides in the human gut that are either a part of the human diet or are products of the human gut’s microbiome.
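Step (i) of the pipeline can be illustrated with a small sketch. Assuming a spectrum is given simply as a list of peak masses, the spectral convolution is the collection of pairwise mass differences, and differences close to a known amino acid residue mass hint at that residue. The residue table below lists only a few standard monoisotopic masses, and the function names and tolerance are illustrative assumptions rather than CycloNovo's implementation.

```python
# Monoisotopic residue masses (Da) for a few standard amino acids.
# Leucine and isoleucine share one mass and cannot be told apart here.
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "V": 99.06841,
    "L/I": 113.08406,
}


def spectral_convolution(peaks):
    """Return all positive pairwise differences between peak masses."""
    ordered = sorted(peaks)
    return [high - low
            for i, low in enumerate(ordered)
            for high in ordered[i + 1:]]


def putative_residues(peaks, tolerance=0.01):
    """Count convolution entries that fall near a known residue mass."""
    counts = {}
    for difference in spectral_convolution(peaks):
        for name, mass in RESIDUE_MASSES.items():
            if abs(difference - mass) <= tolerance:
                counts[name] = counts.get(name, 0) + 1
    return counts
```

Masses that recur often in the convolution are the natural candidates for the amino acid set used in the later composition and k-mer construction steps.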


81

FUNCTIONAL ANNOTATION OF GENOMIC VARIANTS IN STUDIES OF LATE-ONSET ALZHEIMER’S DISEASE

Mariusz Butkiewicz, Jonathan L. Haines, William S. Bush

Institute for Computational Biology and Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH USA

Presenting author: William Bush

Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information, and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (~5%), they influence the potential analysis of a large fraction of genes (~25%). Variant annotation is available for bulk download, and individual variant annotations are also available via the NIAGADS GenomicsDB.


82

OCTAD: AN OPEN CANCER THERAPEUTIC DISCOVERY WORKSPACE IN THE ERA OF PRECISION MEDICINE

Bin Chen, Benjamin S. Glicksberg, William Zeng, Yuying Chen, Ke Liu

Institute for Computational Health Sciences, University of California, San Francisco, 550 16th Street, San Francisco, California 94143, USA

Presenting author: Bin Chen

Rapidly decreasing costs of RNA sequencing have enabled large-scale profiling of cancer tumor samples with precisely defined clinical and molecular features (e.g., low grade IDH1 mutant glioma). Identifying drugs targeting a specific subset of cancer patients, particularly those that do not respond to conventional treatments, is critically important for translational research. Many studies have demonstrated the utility of a systems-based approach that connects cancers to efficacious drugs through gene expression signatures to prioritize drugs from a large drug library. From our previous work on liver cancer, Ewing’s sarcoma, and basal cell carcinoma, we have shown that the success of this approach is made possible by critical procedures, such as quality control of tumor samples, selection of appropriate reference tissues, evaluation of disease signatures, and weighting cancer cell lines. There is a plethora of relevant datasets and analysis modules that are publicly available, yet are isolated in distinct silos, making it tedious to implement this approach in translational research. As such, we present the current protocol, which we envision as a best practice to prioritize drugs for further experimental evaluation, primarily based on open transcriptomic datasets and the free open-source R language and Bioconductor packages. In this project, we retrieved patient tumor samples based on specified clinical and/or molecular features from the Genomic Data Commons Data Portal using an API. We then created a gene expression signature for these samples by employing normalized RNA-Seq counts processed in the UCSC Xena project, where all RNA-Seq samples from TCGA, TARGET, and GTEx were aligned and normalized using the same pipeline. We evaluated the quality of samples based on their purity and correlation with cancer cell lines. The reference tissue samples were selected based on their profile similarity with GTEx samples. We evaluated each disease signature via a cross-validation approach. We then created drug signatures using a similar procedure from large-scale, open access platforms, namely the LINCS L1000 library, which consists of over 20,000 compounds. Our pipeline can then compute and assess the reversal potency between the disease signature and each drug signature. The drugs that present high reversal potency are prioritized as drug hits. Finally, we performed enrichment analysis of drug hits to identify compelling enriched targets and pathways. For our pilot study, we use IDH1 mutant oligodendroglioma as a case study, where the efficacy of over 300 LINCS compounds was measured in three relevant cell lines. We have shown that our predictions agree with the experimental data.
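The abstract does not define how reversal potency is computed. As one plausible illustration (an assumption, not the OCTAD scoring function), a drug can be scored by the rank correlation between the disease signature and the drug-induced expression changes over their shared genes, where a strongly negative value suggests that the drug reverses the disease signature. All names below are hypothetical.

```python
def _ranks(values):
    """Rank values from 0 upward; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = float(rank)
    return ranks


def _pearson(x, y):
    """Pearson correlation of two equal-length lists with nonzero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def reversal_score(disease_signature, drug_signature):
    """Spearman-style correlation over genes present in both signatures.

    Both arguments map gene name -> expression change (e.g. log fold
    change). Values near -1 indicate the drug pushes expression in the
    direction opposite to the disease.
    """
    shared = sorted(set(disease_signature) & set(drug_signature))
    disease = _ranks([disease_signature[g] for g in shared])
    drug = _ranks([drug_signature[g] for g in shared])
    return _pearson(disease, drug)
```

Under this sketch, drugs would be sorted by ascending score and the most negative ones treated as candidate hits.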


83

DEEP LEARNING PREDICTS TUBERCULOSIS DRUG RESISTANCE STATUS FROM WHOLE-GENOME SEQUENCING DATA

Michael L. Chen, Isaac S. Kohane, Andrew L. Beam, Maha Farhat

Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Presenting author: Michael Chen

Background: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. There is a pressing need for a rapid and comprehensive drug susceptibility test that can circumvent the limited scope of conventional methods and the associated long wait times. We sought to implement the first deep learning framework as a predictive diagnostic tool for Mycobacterium tuberculosis (MTB) drug resistance.

Methods: Using a large public dataset of 3,601 MTB strains that underwent targeted or whole genome sequencing and conventional drug resistance phenotyping, we built the first-of-its-kind multitask wide and deep neural network (WDNN) architecture to predict phenotypic drug resistance to 11 anti-tubercular drugs. We compared performance of the proposed WDNN to regularized logistic regression and random forest models using five-fold cross validation. We conducted permutation tests for evaluating feature importance and a t-distributed stochastic neighbor embedding (t-SNE) to visualize the high dimensional model output on the full dataset.

Results: The multitask WDNN achieved state-of-the-art predictive performance compared to regularized logistic regression and random forest: the average sensitivities and specificities, respectively, for all 11 drugs were 87.1% and 93.7% (multitask WDNN), 85.4% and 93.8% (random forest), and 82.2% and 93.9% (regularized logistic regression). The multitask WDNN achieved a higher sum of specificity and sensitivity for 9 of the 11 drugs compared to both the random forest and regularized logistic regression. We show considerable performance gains in our current multitask WDNN with respect to our previously reported random forest model, noting improvements of up to 54.0% in the sum of specificity and sensitivity. Patterns in susceptibility status emerged between drugs after applying t-SNE that correlate well with what is known about the order of MTB drug resistance acquisition. Novel t-SNE findings included major cluster differences between pyrazinamide and other first-line drugs and increased amounts of resistance clusters for capreomycin compared to other second-line drugs. Notable findings in the feature importance analyses included expected shared resistance-associated mutations between drugs and provided new insight into potential mechanistic relationships. Capreomycin exclusively shared 10 features with first-line drugs, highlighting potential avenues for future research into the diagnostic similarities between capreomycin and other subtypes of anti-tubercular drugs.

Conclusion: Our proposed architecture provides a unified model of drug resistance across 11 anti-tubercular drugs and shows considerable performance gains over simpler methods. Deep learning has a clear role in improving identification of drug resistant MTB strains and holds promise in bringing sequencing technologies closer to the bedside.


84

DESIGNING PREDICTION MODEL FOR HYPERURICEMIA WITH VARIOUS MACHINE LEARNING TOOLS USING HEALTH CHECK-UP EHR DATABASE

Eun Kyung Choe1, Sang Woo Lee2

1Department of Surgery, Seoul National University College of Medicine; 2Network Division, Samsung Electronics

Presenting author: Eun Kyung Choe

Hyperuricemia is an elevated uric acid level in blood. It can lead to gout and nephrolithiasis but has also been implicated as an indicator for diseases such as metabolic syndrome, diabetes mellitus, cardiovascular disease, and chronic renal disease. The aim of the present study is to design a prediction model for hyperuricemia using an EHR database from health check-ups and various machine learning tools. From 2005 to 2015, self-paid people had comprehensive health check-ups. Input factors were age, gender, body mass index (BMI), blood pressure, waist circumference, white blood cell count, hemoglobin, glucose level, cholesterol, GOT/GPT, GGT, creatinine, triglyceride, urine albumin, smoking/alcohol habit, and diabetes/hypertension/dyslipidemia medication status, which are the factors covered by national health insurance. The output factor was uric acid level, which is not included in the national health check-up. All of the data were extracted from the EHR database and text mining was performed. We designed a prediction model for hyperuricemia using machine learning tools such as a linear regression model (LR), support vector machine (SVM), classification tree model (CT) and neural network model (NN). Machine learning was performed in MATLAB R2016b (The Mathworks, Natick, MA). The prediction power of each model was evaluated by calculating the area under the curve (AUC), sensitivity, specificity and accuracy. A total of 55,227 persons were included in the analysis. The median age was 52 years (range 21-95 years) and 53.5% of persons were male. There were 10,586 (19.2%) persons who had a uric acid level in the hyperuricemia range. BMI was higher in the hyperuricemia group (25.2 +/- 3.0 vs. normal uric acid group, 23.4 +/- 2.9, p<0.001) and alcohol drinking habits were more common in the hyperuricemia group (67.8% vs. normal uric acid group, 52.4%, p<0.001). Sorting the results by the accuracy of each machine learning model, the CT showed the highest accuracy of 0.954 (AUC=0.886; sensitivity=0.792; specificity=0.981) compared to SVM of 0.892 (AUC=0.630; sensitivity=0.261; specificity=0.999), NN of 0.859 (AUC=0.770; sensitivity=0.09; specificity=0.991) and LR of 0.857 (AUC=0.761; sensitivity=0.033; specificity=0.997). This study used a health check-up EHR database to predict a disease status (hyperuricemia) using various machine learning tools. Since the amount of EHR data is increasing rapidly, the data included in the database could be used as biomarkers to predict disease status or high risk conditions by building a prediction model using machine learning tools. But since the optimal analysis tool or analysis protocol is not well established and the over-fitting problem is not yet solved, more training and research in various populations should be done in future studies for replication.
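The evaluation measures named above follow directly from a confusion matrix. The short sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and accuracy from raw counts.

    tp/fn: cases with the condition that were predicted positive/negative.
    tn/fp: cases without the condition predicted negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 10 affected and 90 unaffected persons, with a
# model that almost always predicts "unaffected".
mostly_negative = classification_metrics(tp=1, fp=0, tn=90, fn=9)
```

When the condition is relatively uncommon, a model that rarely predicts it can still score a high accuracy while its sensitivity stays low, which is the pattern visible in the SVM, NN and LR figures reported in the abstract.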


85

RICK: RNA INTERACTIVE COMPUTING KIT

Galina A. Erikson, Ling Huang, Maxim Shokhirev

Salk Institute for Biological Studies

Presenting author: Galina Erikson

The advent of massively parallel sequencing of RNA (RNA-Seq) enables fast and inexpensive global measurement of thousands of genes across biological perturbations involving drug treatment, genetic mutations, and time series. To facilitate comparison, many tools have been developed; however, most of these tools require extensive programming and bioinformatics knowledge, and little is available for the scientist who wants to analyze their own RNA-Seq data but lacks bioinformatics expertise. The RNA Interactive Computing Kit (RICK) aims to fill this gap by providing an interactive web workspace designed to facilitate RNA-Seq analysis and visualization. RICK accepts as input a file with raw read counts for each transcript and sample, performs sample clustering, visualizes global gene expression with heatmaps, runs principal component analysis and prepares print-ready figures. Users can add and remove samples and regenerate new figures on the fly. For differential gene expression, users have the option to use edgeR, DESeq2 or the combination of all, and to filter the results based on adjusted p-value and fold change. RICK is able to use the DE results from the previous step to identify significantly altered KEGG pathways or enriched GO terms using the gage or GOseq pathway analysis package, with visualization. Users also have the option to upload their customized gene/background gene list to do a DAVID-like analysis. RICK supports RNA-Seq based research by providing a workflow that requires no bioinformatics skills. RICK is freely available at rick.salk.edu.


86

PRIVATE INFORMATION LEAKAGE IN FUNCTIONAL GENOMICS EXPERIMENTS: QUANTIFICATION AND LINKING

Gamze Gursoy, Mark Gerstein

Program in Computational Biology and Bioinformatics, Yale University

Presenting author: Gamze Gursoy

The success of the ENCODE (Encyclopedia of DNA Elements) project opened the doors to a deeper understanding of the functional genome through genome-wide experimental assays. Although identifying individuals using DNA variants from whole genome or exome sequencing data is a major privacy and security concern, no study on genomic privacy has focused on the quantity of information in functional genomic experiments such as ChIP-Seq, RNA-Seq and Hi-C, since the majority of this data is partial and biased. Here, we quantify the amount of leaked genotype information in different functional genomic assays at varying coverages. We show that sequencing data from functional genomics assays provides enough private information to be able to link these samples to a panel of individuals with known genotypes.


87

CARPE D.I.E.M: A DATA INTEGRATION EXPECTATION MAP FOR THE POTENTIAL OF MULTI-'OMICS INTEGRATION IN COMPLEX DISEASE

Tia Tate Hudson, ClarLynda Williams-DeVane

North Carolina Central University, Durham, NC, USA

Presenting author: Tia Hudson

Advances in high throughput technologies and the availability of multi-'omics data present the opportunity for more holistic understandings of biological regulation in complex diseases and disparities. The complexity and disparate nature of various diseases requires the development of equally complex models with multiple layers of biological information. This, however, requires the integration of biological, computational, and statistical domains. Currently, nonetheless, there exist major gaps in the availability and knowledge amongst the three domains. Typically, biologists experience problems with processing and analyzing biological data and therefore seek data scientists for more customized analysis. In contrast, some data scientists lack a thorough understanding of the regulation and complex interactions of various systems giving rise to varying complex phenotypes. This generally results in less comprehensive analysis and an overall narrow understanding of complex disease phenotypes, which can only be thoroughly understood when various levels of 'omic interactions are considered as a whole. Thus, developing the most comprehensive biological models must consider the multiple appropriate layers of genomic, epigenomic, transcriptomic, proteomic, and metabolomic regulation, as well as the potential role environmental and social factors play at each 'omic level. Historically, diverse data types have been considered independently while combinations of two or more data types have been utilized less frequently. Singular analysis of independent 'omic contributions to disease often neglects the intricate interactions among the distinct levels giving rise to these complex traits. Although environmental and social factors have a major role in the disparate nature of diverse diseases, many diseases result from mutual alterations in assorted pathways and biological processes, including gene mutations, epigenetic changes, and modifications in gene regulation. Therefore, the various phenotypes in diverse disease represent a major example of the need for integrated biological models for complex trait analysis. In this study, we present the Data Integration Expectation Map (D.I.E.M), where we explore the scientific value of integrating various 'omic data combinations that can reveal mechanisms of biological regulation in disease disparities. Our goal is to convey the potential for integration of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data for improving our understanding of the complexity and nature of disparity in complex disease traits. In doing so, this map will address the holes in the various domains necessary for integrated data analysis and interpretation. D.I.E.M will also reveal the expected outcomes for each 'omic data type and the various combinations that may or may not divulge a holistic view into complex disease phenotypes. With that, we expect to gain a greater understanding of physiological processes contributing to disparities as well as the role each 'omic interaction plays in screening, diagnosis, and prognosis of disease.

88

IMPROVING GENE FUSION DETECTION ACCURACY WITH FUSION CONTIG REALIGNMENT IN TARGETED TUMOR SEQUENCING

JinHyun Ju, Xiao Chen, June Snedecor, Han-Yu Chuang, Ben Mishkanian, Sven Bilke

Illumina Inc., 5200 Illumina Way, San Diego, CA 92122, USA

JinHyun, Ju

Gene fusions have been identified as driver mutations in multiple cancer types, and a number of drugs targeting specific fusions have been developed as treatment options. Therefore, the ability to identify fusions from tumor samples has become critical for the selection of appropriate treatments for patients. Previously, gene fusions have been detected by targeted approaches such as polymerase chain reaction (PCR) or Fluorescent In Situ Hybridization (FISH). These methods not only require prior knowledge of the fusion, but are also labor intensive and not efficient. Newer methods utilizing RNA sequencing (RNA-seq) that are able to detect multiple types of gene fusions with no prior knowledge required have been introduced with the emergence of next-generation sequencing technology. One critical challenge in using RNA-seq data for gene fusion detection is false positive findings introduced by aligner specific biases or regions with sequence similarity in the genome. This problem becomes more apparent in clinical settings where the abundance of fusion transcripts can be limited by the composition and heterogeneity of the tumor sample. To avoid the critical risk of failing to detect a potentially treatable gene fusion, imposing a stringent detection threshold becomes difficult in these situations, leading to the inclusion of fusions based on relatively low read evidence. To address this problem, we describe a novel fusion filtering method based on fusion contig realignment that is designed to identify spurious false positive fusions. Our method can be used together with any assembly-based fusion calling method that constructs a contig sequence for each reported fusion. The first step is to realign the fusion contigs with Basic Local Alignment Search Tool (BLAST), which is relatively more flexible in finding alternative alignment results with high sequence similarity. Subsequently, we determine whether a specific fusion call can be supported by evidence found in BLAST alignments. Specifically, we aim to filter out fusions that can be explained by regions originating from a single gene or genomic region, or have weak support on either side of the fusion in BLAST alignments. In our preliminary analysis of 1171 fusion calls in 322 samples, 111 out of 161 false positive calls (68%) were filtered out while no calls from the total of 1010 true positives were filtered out.
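
The two filtering criteria described in this abstract (a contig explained by a single region, or weak support on either side of the fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hit` record, the function names, and both thresholds (`single_frac`, `min_side_frac`) are hypothetical, and the hits are assumed to have already been parsed from BLAST output into contig coordinates.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One realignment hit on the fusion contig (hypothetical record)."""
    gene: str
    q_start: int  # 0-based start on the contig, inclusive
    q_end: int    # end on the contig, exclusive


def covered(hits, lo, hi):
    """Count contig positions in [lo, hi) covered by at least one hit."""
    positions = set()
    for h in hits:
        positions.update(range(max(h.q_start, lo), min(h.q_end, hi)))
    return len(positions)


def is_spurious(contig_len, breakpoint, gene_a, gene_b, hits,
                single_frac=0.9, min_side_frac=0.5):
    """Return True if the fusion call should be filtered out.

    Both thresholds are illustrative assumptions, not published values.
    """
    # Criterion 1: a single hit explains (almost) the whole contig.
    for h in hits:
        if (h.q_end - h.q_start) >= single_frac * contig_len:
            return True
    # Criterion 2: weak support on either side of the breakpoint.
    left = covered([h for h in hits if h.gene == gene_a], 0, breakpoint)
    right = covered([h for h in hits if h.gene == gene_b],
                    breakpoint, contig_len)
    if left < min_side_frac * breakpoint:
        return True
    if right < min_side_frac * (contig_len - breakpoint):
        return True
    return False
```

In this sketch a 100-base contig with a breakpoint at position 50 is kept only when no single hit spans 90 or more bases and each partner gene covers at least half of its side.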

89

SPARSE REGRESSION FOR NETWORK GRAPHS AND ITS APPLICATION TO GENE NETWORKS OF THE BRAIN

Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura

Nagoya University

Hideko, Kawakubo

Recent rare variant analyses of single nucleotide variations (SNVs) and copy number variations (CNVs) have identified dozens of candidate genes that may contribute to neurogenetic disorders such as autism and schizophrenia. However, it is unclear whether and how these disease-causing genes are associated with cellular mechanisms in the brain. This problem is a challenging task, since the brain contains hundreds of distinct cell types, each of which has unique morphologies, projections, and functions, and thus disease-causing genes may contribute to different behavioral abnormalities of distinct cell types in the nervous system. In order to identify candidate cell types of the brain related to a complex genetic disorder, we propose a statistical method, called graph oriented sparse learning (GOSPEL).

90

GRIM-FILTER: FAST SEED LOCATION FILTERING IN DNA READ MAPPING USING PROCESSING-IN-MEMORY TECHNOLOGIES

Jeremie S. Kim1,2, Damla S. Cali1, Hongyi Xin3, Donghyuk Lee1,4, Saugata Ghose1, Mohammed Alser5, Hasan Hassan2,6, Oğuz Ergin6, Can Alkan5, Onur Mutlu2,1

1ECE Department, Carnegie Mellon University; 2CS Department, ETH Zurich; 3CS Department, Carnegie Mellon University; 4NVIDIA Research; 5CE Department, Bilkent University; 6CE Department, TOBB University of Economics and Technology

Jeremie, Kim

Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome in order to identify the genomic variants of the donor. State-of-the-art read mappers determine the original location of a read sequence within a reference genome in 3 generalized steps. A read mapper 1) quickly generates possible mapping locations for seeds (i.e., smaller segments) within a read, 2) extracts the reference sequence at each of the mapping locations, and 3) determines the similarity score between the read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment). With the similarity scores across all possible locations, the read mapper can determine the original location of the read sequence. The differences between the read sequence and the matching reference sequence indicate the genomic variants of the donor, which can be further analyzed for preventative care or diagnosis. A seed location filter (e.g., FastHASH [2], SHD [3], GateKeeper [4]) comes into play before sequence alignment (step 3) and reduces the number of unnecessary alignments. A seed location filter efficiently determines whether a candidate mapping location would result in an incorrect mapping before performing the computationally-expensive sequence alignment step for that location. In the ideal case, a seed location filter would discard all poorly matching locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, compared to the best previous seed location filtering algorithm, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm [2]. This work will appear at the 16th Asia Pacific Bioinformatics Conference in January 2018 [1]. The preliminary version of the full article is at https://arxiv.org/pdf/1711.01177.pdf.

[1] Kim, Jeremie S, et al. "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies." to appear in BMC Genomics (2018).
[2] Xin, Hongyi, et al. "Accelerating read mapping with FastHASH." BMC Genomics (2013).
[3] Xin, Hongyi, et al. "Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping." Bioinformatics (2015).
[4] Alser, Mohammed, et al. "GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping." Bioinformatics (2017).
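
The idea of checking read presence within coarse-grained reference segments can be sketched in software as follows. This is only an illustration under stated assumptions, not the GRIM-Filter design itself: Python sets stand in for the per-segment bitvectors, the bin size, the overlap, and the k-mer length are hypothetical parameters, and the threshold helper assumes that each sequencing error can destroy at most `k` overlapping k-mers.

```python
def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_bins(reference, bin_size, overlap, k):
    """Record, for each coarse-grained bin, which k-mers occur in it.

    Each bin is extended by `overlap` bases so that a read starting near
    the end of a bin is still fully represented (an assumption of this
    sketch). A set plays the role of a presence bitvector.
    """
    bins = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + overlap]
        bins.append(set(kmers(segment, k)))
    return bins


def min_kmers(read_len, k, max_errors):
    """Fewest k-mers that must still match if at most max_errors occur."""
    return max(0, read_len - k + 1 - k * max_errors)


def passes_filter(read, location, bins, bin_size, k, min_present):
    """Keep a candidate location only if enough read k-mers are in its bin."""
    bin_kmers = bins[location // bin_size]
    present = sum(1 for km in kmers(read, k) if km in bin_kmers)
    return present >= min_present
```

Locations that fail `passes_filter` would be discarded before the expensive alignment step; locations that pass still need full alignment, since the presence check ignores k-mer order.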

91

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP

Sungho Kim, Taehun Kim

Yeungnam University, DGIST

Sungho, Kim

A novel multi-class strategy for Support Vector Machines (SVMs) was developed. Existing strategies for multi-class classification, such as One Versus One, One Versus All and Directed Acyclic Graph, do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed Weighted Voting resolves this problem by weighting the voting values according to the distance from the boundary, and the proposed Voting Drop further enhances the performance of the SVMs. The proposed Weighted Voting is based on the voting method. The voting method is carried out by accumulating votes, then choosing the most voted class. The proposed Weighted Voting method weights each voting value to reflect the distance from the boundary and the margin. The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because some SVMs respond redundantly. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the classes it has not learned. This is why an SVM responds redundantly when it predicts data belonging to a non-learned class. Such an irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVM. The algorithm finds the irrelevant SVM, then drops the votes caused by it. The way to find an irrelevant SVM is to find the least voted class, because the least voted class can be thought of as irrelevant to the input data. As shown in the experiments, evenly reflecting the distance from the hyper-plane and the discernment of the hyper-plane, and removing the redundant SVMs' votes, leads to higher performance. The proposed methods can be used for a range of classification tasks.
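
A minimal sketch of how weighted voting and voting drop might be combined in a one-versus-one setting is given below. The abstract does not state the exact weighting formula, so the weight used here (the absolute decision value clipped at the margin, 1.0) is an assumption, as are the function names and the input format: a dictionary mapping each class pair `(a, b)` to a signed decision value that is positive when the pairwise SVM favors `a`.

```python
def accumulate(decisions, classes, weighted=True, skip=frozenset()):
    """Sum the votes of all pairwise classifiers not involving skipped classes."""
    votes = {c: 0.0 for c in classes if c not in skip}
    for (a, b), d in decisions.items():
        if a in skip or b in skip:
            continue  # classifier trained on a dropped class: ignore its vote
        winner = a if d >= 0 else b
        # Assumed weighting: distance from the boundary, clipped at the margin.
        votes[winner] += min(abs(d), 1.0) if weighted else 1.0
    return votes


def predict(decisions, classes, weighted=True, drop=True):
    """Pick the most voted class, optionally after one voting-drop pass."""
    votes = accumulate(decisions, classes, weighted)
    if drop and len(classes) > 2:
        # The least voted class is treated as irrelevant to the input,
        # so every classifier that learned it is removed before recounting.
        least = min(votes, key=votes.get)
        votes = accumulate(decisions, classes, weighted, skip={least})
    return max(votes, key=votes.get)
```

With three classes and decision values `{("A", "B"): 0.1, ("A", "C"): -0.2, ("B", "C"): 2.0}`, plain voting gives every class one vote, while the weighted variant favors B because its win lies far from the boundary.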

92

GENOME-WIDE ANALYSIS OF TRANSCRIPTIONAL AND CYTOKINE RESPONSE VARIABILITY IN ACTIVATED HUMAN IMMUNE CELLS

Sarah Kim-Hellmuth1,2, Matthias Bechheim3, Benno Pütz2, Pejman Mohammadi1,4, Johannes Schumacher5, Veit Hornung3,6, Bertram Müller-Myhsok2, Tuuli Lappalainen1,4

1New York Genome Center, New York, NY, USA; 2Max-Planck-Institute of Psychiatry, Munich, Germany; 3Institute of Molecular Medicine, University of Bonn, Bonn, Germany; 4Department of Systems Biology, Columbia University, New York, NY, USA; 5Institute of Human Genetics, University of Bonn, Bonn, Germany; 6Gene Center and Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany

Sarah, Kim-Hellmuth

The immune system plays a major role in human health and disease. Understanding variability of immune responses on the population level and how it relates to susceptibility to diseases is vital. In this study, we aimed to characterize the genetic contribution to interindividual variability of immune response using genome-wide association and functional genomics approaches. For this purpose, we studied genetic associations to cellular (gene expression) and molecular (cytokine) phenotypes in primary human cells activated with diverse microbial ligands. We isolated monocytes of 134 individuals and stimulated them with three bacterial and viral components (LPS, MDP, and ppp-dsRNA). We performed transcriptome profiling at three time points (0 min/90 min/6 h) and genome-wide SNP-genotyping. In addition, we profiled five cytokines produced by peripheral blood mononuclear cells activated by five components from the same individuals to perform a genome-wide association study. Comparing expression quantitative trait loci (eQTLs) under baseline and upon immune stimulation revealed 417 immune response specific eQTLs (reQTLs). We characterized the dynamics of genetic regulation on early and late immune response, and observed an enrichment of reQTLs in distal cis-regulatory elements. Analysis of signs of recent positive selection and the direction of the effect of the derived allele of reQTLs on immune response suggested an evolutionary trend towards enhanced immune response. Furthermore, multivariate GWAS analysis of cytokine responses to diverse stimuli revealed 159 genome-wide significant loci; however, only a small number of these could be reliably linked to potentially causal eQTLs in monocytes. Finally, given the central role of inflammation in many diseases, we examined reQTLs as a potential mechanism underlying genetic associations to complex diseases. We uncovered novel reQTL effects in multiple GWAS loci, and showed a stronger enrichment of response than constant eQTLs in GWAS signals of several autoimmune diseases. These results indicate a substantial, disease-specific role of environmental interactions with microbial ligands in genetic risk to complex autoimmune diseases. While tissue-specificity of molecular effects of GWAS variants is increasingly appreciated, our results suggest that innate immune stimulation is a key cellular state to consider in future eQTL studies as well as in targeted functional follow-up of GWAS loci.

93

PREDICTING FATIGUE SEVERITY IN ONCOLOGY PATIENTS ONE WEEK FOLLOWING CHEMOTHERAPY

Kord M. Kober, Xiao Hu, Bruce A. Cooper, Steven M. Paul, Christine Miaskowski

University of California San Francisco

Kord, Kober

Effective symptom management is a critical component of cancer treatment. Computational tools that predict the course and severity of these symptoms have the potential to assist oncology clinicians to personalize the patient’s treatment regimen more efficiently and provide more aggressive and timely interventions. Cancer-related fatigue (CRF) is the most common symptom associated with cancer and its treatments. CRF has a negative impact on the patients’ ability to tolerate treatments and on their quality of life. One of the limitations to effective treatment of CRF is the availability of a valid and reliable model to predict the severity of CRF. The objective of this pilot study was to generate a predictive model for fatigue severity 1 week after chemotherapy (CTX) administration (T2) using 28 demographic and clinical characteristics that were collected just prior to CTX administration (T1) in a sample of 1042 cancer patients undergoing CTX. In this pilot study, we used support vector regression (SVR) with a polynomial kernel to predict the severity of the evening fatigue between two different time points during a cycle of CTX. Patients with missing data were removed, leaving a total of 689 for this analysis. Training and testing groups consisted of 518 and 171 patients, respectively. We used 10-times 10-fold cross-validation root-mean-square error (RMSE) to assess the fit of the predictive model. Our model achieved an RMSE/mean of 0.269. The five predictors with the highest importance were: evening fatigue at T1, morning fatigue at T1, attentional function, sleep disturbance, and performance status. The five predictors with the lowest importance were: living alone, caregiver to adult, level of education, cycle length, and number of metastatic sites. Overall, clinical characteristics associated with cancer and its treatment, including cancer diagnosis, had low importance in the model. These findings suggest that the experience and mechanisms of CRF may be general and not cancer specific. This type of predictive model can be used to identify high risk patients, educate patients about their symptom experience, and improve the timing of pre-emptive and personalized symptom management interventions. These results suggest that the integration of demographic and clinical data can enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of symptoms.
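
The evaluation metric named in this abstract can be made concrete with a short sketch. It assumes that "RMSE/mean" denotes the root-mean-square error divided by the mean of the observed outcome, and that "10-times 10-fold" means ten repetitions of 10-fold cross-validation with reshuffling; the function names are hypothetical and no SVR model is fitted here.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the mean observed value (assumed meaning of RMSE/mean)."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # new random partition for every repetition
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the resulting errors averaged over all `k * repeats` folds.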

94

SINGLE-MOLECULE PROTEIN IDENTIFICATION BY SUB-NANOPORE SENSORS

Mikhail Kolmogorov1, Eamonn Kennedy2, Zhuxin Dong2, Gregory Timp2, Pavel A. Pevzner1

1Department of Computer Science and Engineering, University of California San Diego, USA; 2Electrical Engineering and Biological Science, University of Notre Dame, USA

Mikhail, Kolmogorov

Recent advances in top-down mass spectrometry enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but it has only been successfully applied to DNA sequencing, so far. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denaturated, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.

95

GENE EXPRESSION PROFILE OF OSTEOARTHRITIS AFFECTED FINGER JOINTS

Milica Krunic1, Klaus Bobacz2, Arndt von Haeseler3

1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria; 2Department of Internal Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria; 3Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria

Milica, Krunic

Osteoarthritis (OA) is a joint disease that can affect any joint. However, the non-weight-bearing joints most frequently affected by OA are the hand joints. The most common clinical presentation of hand OA is pain and loss of hand strength, which restricts the ability of people to perform daily activities. Multiple factors can contribute to the development of hand OA, of which the most frequently observed are age, gender, genetics, obesity, occupation, and repetitive joint usage. OA in proximal interphalangeal (PIP) and distal interphalangeal (DIP) joints is considered to be the most common cause of hand pain nowadays. To the best of our knowledge, there is no published research that addresses in detail the unclear genetic etiology of finger OA. Since cartilage is one of the most commonly affected tissues in OA, the aim of our study was to explore the gene expression profile of chondrocytes sampled from two finger joints, PIP and DIP, and to investigate which pathways and gene ontology terms were altered in patients affected by this disease.

96

DISCOVERY AND PRIORITIZATION OF DE NOVO MUTATIONS IN AUTISM SPECTRUM DISORDER

Taeyeop Lee, Jaeho Oh, Min Gyun Bae, Jun Hyeong Lee, Jung Kyoon Choi

Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Taeyeop, Lee

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impaired social interaction and restricted and repetitive behaviors. Previous studies have reported that the genetic contribution or heritability is as high as 80% in ASD. In order to elucidate the genetic architecture of ASD, many researchers performed extensive studies and discovered some significant findings. Currently, hundreds of different genes have been unveiled, mostly through identification of related rare variants. Rare genetic variants, both inherited and de novo, are proposed to be causal in ~30% of ASD patients. In comparison, common genetic variants are estimated to contribute to approximately 50% of ASD etiology. However, no specific common risk variant has been found to date, possibly due to insufficient sample size. Here, we report a whole genome sequencing study of ASD patients to discover and characterize de novo mutations in an Asian population. By sequencing 101 autism trios and unaffected siblings, we located causal variants in 74 candidate genes. The variants included not only loss of function and missense variants, but also intronic and intergenic non-coding variants. The candidate gene set showed significant overlap with known autism, intellectual disability, and chromatin related gene sets. Furthermore, to prioritize the non-coding de novo mutations, we developed a deep learning framework based on >2,000 functional features. The features included DNase I hypersensitive sites, histone modification profiles, disease pathways, and transcription factor binding sites, where the nonlinear combinations of the features indicate the causal probability of a non-coding variant. The performance of the model was evaluated with area under curve (AUC) and F1 score. Our results suggest that de novo variants are related to important ASD risk genes, and that noncoding de novo variants have a non-zero effect in ASD.

97

CROSSTALKER: AN OPEN NETWORK AND PATHWAY ANALYSIS PLATFORM

Sean Maxwell, Mark R. Chance

Case Western Reserve University and NeoProteomics, Cleveland, Ohio

Mark, Chance

Introduction: Network analysis methods have become commonplace research tools due to their proven ability to interrogate and organize lists of molecular targets of interest identified by basic statistics alone, and use of network analysis to refine classifier feature sets has been shown to provide superior performance compared to targets identified singly. We introduce Crosstalker as a freeware platform for academic use that is web based and incorporates multiple public interaction and gene set databases to perform network analysis, enrichment testing and visualization in a modern HTML5+JS interface. The use of open databases and algorithms coupled to convenient user choices allows cross comparison of findings and permits easy replication of results by any laboratory, improving reproducibility and rigor.

Methods: Lists of seed molecules are mapped onto a reference interaction network selected by the user and a random walk with restarts (RWR) is performed using the seed molecules as the restart nodes. The RWR scores are adjusted to z-scores using Monte-Carlo estimated score distributions for each node in the interaction network, and we have optimized the Monte-Carlo estimation parameters using analytic methods and computational testing. Assuming the z-scores follow a normal distribution, the adjusted scores are used to select nodes that have a p<0.001 chance of achieving the same or higher RWR score by chance as they do from the user input. The resulting molecules are tested for enrichments against user selected gene set databases and used to induce result subnetworks from the reference network. The induced subnetworks are visualized with options to annotate nodes (molecules) and edges (interactions).

Results: Computational experiments using inputs generated by combining annotated sets of functionally related molecules with unrelated “noise molecules” showed that adjusting proximity scores by null-distribution improved predictions of functionally related molecules over rank-only methods when the inputs contained more noise molecules than annotated molecules. Choices of multiple interaction networks (like BioGRID, BioPlex or COXPRESdb) enable testing of different hypotheses within the same interface, such as co-expression or direct/indirect physical interactions of related molecules. The optimized algorithms used by the computational portion of the software facilitate analysis times under 1 minute, minimizing wait times and maximizing the number of concurrent users the system can support.

Novel Aspect: Analytically verified Monte-Carlo estimation parameters. Multiple options for interaction networks and gene sets. Web-based with options to export results and data in open (JSON, CSV) and binary (XLSX) formats. Free for academic use.
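
The scoring procedure in the Methods paragraph (random walk with restarts, then z-scores against a Monte-Carlo null) can be sketched on a toy network. This is not Crosstalker's code: the restart probability, the number of iterations, the number of random seed sets, and the choice of null (random seed sets of the same size) are all assumptions, and the graph is a plain adjacency dictionary in which every node must have at least one neighbor.

```python
import random
import statistics
from statistics import NormalDist


def rwr(adj, seeds, restart=0.5, iters=100):
    """Random walk with restarts by power iteration; returns node -> score."""
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share  # spread the walker evenly over neighbors
        p = nxt
    return p


def z_scores(adj, seeds, n_random=200, seed=0):
    """Adjust RWR scores to z-scores using randomly drawn seed sets."""
    rng = random.Random(seed)
    nodes = list(adj)
    observed = rwr(adj, seeds)
    null = {n: [] for n in nodes}
    for _ in range(n_random):
        random_seeds = set(rng.sample(nodes, len(seeds)))
        scores = rwr(adj, random_seeds)
        for n in nodes:
            null[n].append(scores[n])
    z = {}
    for n in nodes:
        sd = statistics.pstdev(null[n])
        z[n] = (observed[n] - statistics.mean(null[n])) / sd if sd > 0 else 0.0
    return z


# One-sided normal cutoff corresponding to p < 0.001 (about 3.09).
Z_CUTOFF = NormalDist().inv_cdf(1 - 0.001)
```

Nodes whose z-score exceeds `Z_CUTOFF` would be the ones retained for enrichment testing and subnetwork induction under the normality assumption stated in the abstract.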

98

SIGNATURES OF NON–SMALL-CELL LUNG CANCER RELAPSE PATIENTS: DIFFERENTIAL EXPRESSION ANALYSIS AND GENE NETWORK ANALYSIS

Abigail E. Moore1, Brandon Zheng2, Patricia M. Watson3, Robert C. Wilson3, Dennis K. Watson3, Paul E. Anderson4

1Department of Natural Science, Hampshire College, Amherst, MA 01002, USA; 2Department of Biology, Bard College, Annandale-On-Hudson, NY 12504, USA; 3Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA; 4Department of Computer Science, College of Charleston, Charleston, SC 29424, USA

Abigail, Moore

Background
Lung cancer is both the second most represented cancer diagnosis and the leading cause of cancer death within the United States. Despite the high occurrence of non–small-cell lung cancer (NSCLC), 30% to 55% of patients relapse after curative resection, and the 5-year relative survival rate is 15% to 21%. The high costs of cancer medication and cancer drug failures are impacted by biomarker programs, which help select patients who may benefit from a given drug.

Methods
NSCLC RNA samples were taken from 38 patients, and clinical outcomes were determined by the American College of Surgeons Oncology Group. Of these patients, 20 were diagnosed as disease-free, and 18 as relapse patients within 3 years of surgical resection. RNA-Seq libraries were paired-end sequenced on HiScanSQ and HiSeq 2500 systems. Read quality was determined by FastQC, and adapters and low-quality reads were trimmed with Trimmomatic. Trimmed paired-end reads were aligned to the human genome (HG38, UCSC) with RSEM. Aligned reads were input into the R/Bioconductor EBSeq package to perform median normalization and differential expression analysis. Differentially expressed genes were analyzed for over-representation of protein complexes, gene ontology terms and pathways via ConsensusPathDB.

Results
Empirical Bayesian methods identified 122 differentially expressed genes (FDR<0.05). Many lung cancer-related genes were recognized, such as BAMBI, CPS1, CD70, SHISA3, and WNT11. Also identified were novel genes with upregulated expression in relapse patients: LILRA2, ALOX12, TSPAN-11, and CADM3, which are involved in immune response, arachidonic acid metabolism, cell surface receptor signaling, and cell-cell adhesion, respectively. Novel genes with downregulated expression in relapse patients were identified, including MCCC1, MRGPRF, PRR4, and SLC7A14, which are associated with biotin metabolism, signal transduction, cell adhesion, and negative regulation of phosphatase activity, respectively. A hypergeometric test revealed over-representation of gene ontology terms for biological processes related to cancer development: positive regulation of cell proliferation (p=4.66e-06), lipoxygenase pathway (p=6.95e-05), and beta-amyloid metabolic process (p=0.000531). Only one protein complex-based set was over-represented: G protein-coupled receptor ligand. Accordingly, six GPCR-related pathways were over-represented (p-values from 6.77e-05 to 0.000196). Over-representation of other cancer-related pathways was also found, including prostaglandin synthesis and regulation (p=8.8e-05), fluoxetine metabolism pathway (p=0.000217), and arachidonic acid metabolism (p=0.000243).

Conclusions
Identifying NSCLC patients at risk of recurrence is crucial in cancer research. Our analyses identified 122 differentially expressed genes among disease-free and relapse NSCLC patients, including known lung cancer-related genes and new candidate biomarker genes that are involved in the diverse processes related to NSCLC development. Future research in alternative splicing and the development of a predictive model based on our results could support a new method for identifying individual recurrence risk.
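
The hypergeometric over-representation test mentioned in the Results can be written out in a few lines. The sketch below is a generic version of the test, not the ConsensusPathDB implementation; the function and parameter names are hypothetical, and it returns the upper-tail probability of seeing at least `overlap` annotated genes among the `drawn` differentially expressed genes.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, overlap):
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population: number of genes in the background
    annotated:  background genes carrying the term or pathway
    drawn:      number of differentially expressed genes
    overlap:    differentially expressed genes carrying the term
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For a background of 4 genes of which 2 are annotated, drawing 2 genes that are both annotated has probability 1/6, which is what `hypergeom_pvalue(4, 2, 2, 2)` evaluates.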

99

RANKING BIOLOGICAL FEATURES BY DIFFERENTIAL ABUNDANCE

Soumyashant Nayak, Nicholas Lahens, Eun Ji Kim, Gregory Grant

University of Pennsylvania

Soumyashant, Nayak

We often want to rank features by their differential abundance between two populations. In RNA-Seq for example, we obtain quantified values for tens of thousands of genes across a wide spectrum of expression intensities. A naive ranking by fold-change leads to several issues. One of them is the division-by-zero issue which happens when the change is from 0 to a positive quantity. This problem is usually dealt with by using a pseudo-count of 1. Fold changes from smaller numbers however can tend to dominate the top of ranking lists in case of discrete data like RNA-Seq. Therefore, one might wonder whether a change from 1 to 2 (fold change of 2) is to be considered more significant than a change from 100 to 190 (fold change of 1.9). We systematically study this issue at both theoretical and empirical levels. We conclude that in RNA-Seq data there is an optimal value of the pseudo-count which yields the best significance comparisons. We formulate the necessary foundational mathematics in terms of a philosophical axiomatic framework to enable the systematic exploration of the ranking problem. Additionally we demonstrate how the use of pseudo-counts actually integrates fold-change and difference and this observation can be used to obtain the advantages of both methods, while minimizing the disadvantages.
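
The effect of the pseudo-count on the abstract's own example (1 to 2 versus 100 to 190) can be checked directly. The sketch assumes the common definition of pseudo-count fold change, (b + c) / (a + c); the function names are hypothetical and the abstract's method for choosing an optimal pseudo-count is not reproduced.

```python
def pc_fold_change(a, b, pseudo):
    """Fold change from a to b after adding the pseudo-count to both values.

    The pseudo-count must be positive whenever a can be 0.
    """
    return (b + pseudo) / (a + pseudo)


def rank(pairs, pseudo):
    """Order (before, after) pairs from largest to smallest fold change."""
    return sorted(
        pairs,
        key=lambda ab: pc_fold_change(ab[0], ab[1], pseudo),
        reverse=True,
    )
```

With a pseudo-count of 1 the change from 1 to 2 scores 3/2 = 1.5 while the change from 100 to 190 scores 191/101, about 1.89, so the ranking flips relative to the raw fold changes of 2 and 1.9. As the pseudo-count grows, the ordering approaches a ranking by the plain difference b - a, which is one way to see how pseudo-counts interpolate between fold change and difference.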

100

SYSTEMATIC ANALYSIS OF OBESITY ASSOCIATED VARIATIONS THROUGH MACHINE LEARNING BASED ON GENOMICS AND EPIGENOMICS

Jaeho Oh, Jun Hyeong Lee, Taeyeop Lee, Min Gyun Bae, Jung Kyoon Choi

Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea

Jaeho, Oh

Obesity, one of the major global health concerns, is a metabolic disorder resulting from both behavioral and heritable causes. Various solutions, such as diet, exercise, surgery and drug therapies, have been proposed but these failed to provide long-term effects. Many researchers performed genome-wide association studies (GWAS) to identify disease-associated genomic regions, but interpretation of the data poses a great challenge. Numerous GWAS analysis studies report that FTO is the region most closely associated with obesity, but the mechanism remains unresolved. According to one recent paper, ‘outside variants’, defined as SNPs that are in weak LD with GWAS risk SNPs and influence the target gene’s regulatory circuitry in combination, should be further investigated. The ‘outside variant’ approach suggests that not only statistically significant GWAS SNPs but also other SNPs may be biologically meaningful. To develop an obesity-related model and unravel the mechanism through the ‘outside variant’ approach, we used the imputed GWAS data of 14,122 subjects with BMI information. To select functional epigenetic regions, we used histone modification ChIP-seq data from adipocytes and obesity-associated tissues and extracted a SNP set that is highly related to FTO. By performing regression between SNPs and FTO SNPs, we found SNPs with high explanatory power for obesity in the functional epigenetic region. Our results suggest that the ‘outside variant’ analysis, along with several epigenetic data, is a novel approach to discover a set of SNPs, including SNPs that appear statistically insignificant, that affect obesity.

101

SPARSE REGRESSION MODELING OF DRUG RESPONSE WITH A LOCALIZED ESTIMATION FRAMEWORK

Teppei Shimamura, Hideko Kawakubo, Hyunha Nam, Yusuke Matsui

Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan

Teppei, Shimamura

A major challenge in pharmacogenomic studies is differences in the clinical characterization of patients and their reactions, which makes it difficult to identify clinically meaningful gene-drug interactions and predict drug response for each patient. In this study, we consider a localized regression model for each sample to predict a drug response with a set of main effects and second-order interactions for oncogenic alterations for patients. We propose a sparse modeling of interactions with localized estimation framework (SMILE) for this task. We take a regularization approach to inducing strong hierarchy in the sense that an interaction coefficient can have a non-zero estimate only if both of the corresponding main effect coefficients are non-zero. We incorporate two different constraints into the group lasso and the lasso within the framework of local likelihood, to determine the type of structure such as strong hierarchy and enhance sparsity on the interaction coefficients, which enables us to generate an interpretable localized interaction model for each sample. SMILE can be formulated as the solution to a convex optimization problem, which we solve with the alternating direction method of multipliers (ADMM). We then demonstrate the performance of our proposed method in a simulation study and on a pharmacogenomic dataset.

102

PDBMAP: A PIPELINE AND DATABASE FOR MAPPING GENETIC VARIATION INTO PROTEIN STRUCTURES AND HOMOLOGY MODELS

R. Michael Sivley1, John A. Capra2, William S. Bush3

1Department of Biomedical Informatics, Vanderbilt Genetics Institute, Vanderbilt University; 2Department of Biological Sciences, Vanderbilt Genetics Institute, Vanderbilt University; 3Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University

Robert, Sivley

Rare genetic variants identified from sequencing studies are often grouped by genes, functional domains, and other annotations to increase power in trait association tests and identify shared phenotypic effects. However, association tests rarely consider variants’ orientation in their functional context: three-dimensional (3D) protein structures. Various tools have been developed for visualizing specific variants in the context of individual protein structures; however, these tools do not support a complete, systematic mapping of variants identified in sequencing studies into all available solved and computationally predicted protein structures. We describe PDBMap, a computational pipeline to efficiently map human genetic variation generated by sequencing studies into the structome. We also present the complete mapping of missense variants from the 1000 Genomes Project, Genome Aggregation Database (gnomAD, N=3,010,061), Catalogue of Somatic Mutations in Cancer (COSMIC, N=1,104,417), ClinVar (N=56,235), and the Alzheimer's Disease Sequencing Project (ADSP, N=891,849) into solved protein structures from the Protein Data Bank (N=31,688) and computationally predicted homology models from ModBase (N=186,802). Source code is available from https://github.com/capralab/pdbmap and downloads are available at http://astrid.icompbio.net.

103

REPETITIVE RNA AND GENOMIC INSTABILITY IN HIGH-GRADE SEROUS OVARIAN CANCER PROGRESSION AND DEVELOPMENT

James R. Torpy1, Nenad Bartonicek1, David D. L. Bowtell2, Marcel E. Dinger1

1Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst 2010, Sydney, Australia; 2Peter MacCallum Cancer Centre, East Melbourne, Victoria 3002, Australia

Presenting author: James Torpy

Ovarian cancer is a highly complex disease with a range of different histological subtypes. This highly lethal disease is estimated to be the fifth most common cause of death from cancer in females, with a five-year relative survival rate of 46.2%. High-grade serous ovarian cancer (HGSOC), characterized by widespread genomic instability, accounts for 70-80% of ovarian cancer deaths, and survival rates have not improved significantly for the last few decades. Furthermore, the underlying cause of around 1/3 of HGSOC cases cannot be explained. Evidence suggests that RNA derived from repetitive regions of the genome plays a role in genomic instability and development of cancers such as high-grade serous ovarian cancer, and may play a role in the unexplained HGSOC cases. Aberrant expression of centromere-derived RNA causes dysfunctional chromosomal segregation during mitosis and aneuploidy. Telomere-derived RNA maintains telomeres, preventing chromosomal fusion, breakage and subsequent rearrangement of the chromosomes. Retrotransposable elements such as LINE1s and Alus insert into different genomic locations, disrupting sequences and causing rearrangements such as duplications, inversions and translocations. We have analysed over 120 HGSOC case and control RNA-sequencing datasets of primary samples from the Australian Ovarian Cancer Study, comparing differences in expression of repetitive RNA transcripts across multiple HGSOC subtypes and controls. We found a range of differentially expressed repetitive RNA species including LINE1, Alu and centromere-derived RNA which may be contributing to genomic instability in these tumours. In order to investigate the potential causes of the differences in repeat RNA levels, their expression was correlated with expression of a range of methyltransferases such as DNMT1 and DNMT3A-C that are known to regulate methylation at repetitive heterochromatin, controlling RNA expression from these regions. Expression of RNAi-associated factors such as Dicer was also assessed as these factors can contribute to repetitive RNA regulation.

104

DIMENSION REDUCTION OF GENOME-WIDE SEQUENCING DATA BASED ON LINKAGE DISEQUILIBRIUM STRUCTURE

Yun Joo Yoo1, Suh-Ryung Kim1, Sun Ah Kim2, Shelley B. Bull3

1Department of Mathematics Education, Seoul National University; 2Department of Statistics, Seoul National University; 3Prosserman Centre for Health Research, The Lunenfeld-Tanenbaum Research Institute

Presenting author: Yun Joo Yoo

Genetic association analysis using high-density genome-wide sequencing data consisting of single nucleotide polymorphism (SNP) genotypes can benefit from various dimension reduction strategies for several reasons. First, the genome-wide significance level for individual SNP tests should be determined considering the correlation structure of genotype data. Adjustment for Type I error inflation due to multiple hypothesis testing can be sought based on the dimension reduction methods. Second, increased Type I error may be reduced as the number of variables in the analysis decreases by dimension reduction. Third, the computational burden can be reduced as the complexity of the analysis model is reduced. Fourth, the power of the association test can be increased by combining multiple signals in a group as a result of the dimension reduction strategy. We developed a genome partitioning method by clustering SNPs into blocks based on linkage disequilibrium structure. The algorithm uses a graph modeling of communities of highly correlated SNPs and applies a clique partitioning algorithm to the graph to partition SNPs into blocks. We applied the algorithm to 1000 Genomes Project data, and obtained 162K, 173K, and 334K blocks including singleton blocks in the autosomal regions of 22 chromosomes for Asian, European, and African data respectively. The average LD measure r^2 (the squared Pearson correlation coefficient of two additively coded genotype variables) values within blocks are 0.465, 0.437 and 0.329 for Asian, European, and African data, whereas the average r^2 values between consecutive blocks are 0.156, 0.145, and 0.098 for the three populations. We evaluated the Type I error and the power gain from these partitions for several multi-SNP association tests using simulated data based on 1000 Genomes Project data. Compared to other clustering methods, several tests using local dimension reduction strategies combined with genome-wide dimension reduction showed better power. We also developed a local dimension reduction method for genome-wide sequencing data, especially targeting the multi-collinearity issue of dense SNP genotype data to be analyzed by multiple regression analysis. This method clusters SNPs in multi-collinearity by examining the variance inflation factor (VIF), and replaces each such group by principal components. The algorithm proceeds iteratively until all VIF values are under a threshold value. When we compared the power between the analysis based on original data and the analysis based on the dimension reduced data using VIF evaluation, we observed a power gain in quadratic-type tests such as the Wald test.
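The LD-based block partitioning described in this abstract can be illustrated with a minimal sketch. It assumes a simple greedy rule for clique partitioning and an arbitrary r^2 threshold of 0.5; neither is stated in the abstract, the function names are hypothetical, and the authors' actual algorithm may differ. The VIF-based step is not covered here.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two additively coded genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return (sxy * sxy) / (sxx * syy)


def greedy_clique_blocks(genotypes, threshold=0.5):
    """Partition SNP indices into blocks in which every pair has r^2 >= threshold.

    Greedy simplification (an assumption): each SNP joins the first existing
    block where it is linked to all members, otherwise it starts a new block.
    """
    m = len(genotypes)
    linked = {
        (i, j)
        for i, j in combinations(range(m), 2)
        if r_squared(genotypes[i], genotypes[j]) >= threshold
    }
    blocks = []
    for snp in range(m):
        for block in blocks:
            # Block members always have smaller indices than the current SNP.
            if all((other, snp) in linked for other in block):
                block.append(snp)
                break
        else:
            blocks.append([snp])
    return blocks


# Toy data: SNP 0 and SNP 1 are identical; SNP 2 is uncorrelated with both.
toy = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 0, 2, 2, 2]]
print(greedy_clique_blocks(toy))
```

Requiring every pair inside a block to exceed the threshold is what makes each block a clique in the LD graph; singleton blocks arise naturally for SNPs linked to nothing else.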

105

THE MULTIPLE GENE ISOFORM TEST

Yao Yu, Chad D. Huff

Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Presenting author: Chad Huff

Gene-based association tests aggregate multiple variants in a gene to evaluate statistical evidence for rare variant association. Typically, these tests include variants from all coding exons in a gene, irrespective of gene isoform. For genes with multiple isoforms, this is often approximately equivalent to a test of the largest isoform, which is not necessarily optimal. Because smaller isoforms tend to be enriched for the core functional domains of a gene, they may also be enriched for pathogenic variants or larger variant effect sizes. To address the opportunities presented by isoform-specific patterns of disease susceptibility, we introduce the Multiple Gene Isoform Test (MGIT). MGIT employs a permutation approach to test each isoform of a gene, summarizing the contribution of each transcript to calculate a single gene-level p-value, without the need to explicitly model correlation between transcripts. MGIT can be applied in conjunction with any gene-based association test to assess gene-level significance and to identify isoforms that may be enriched for protein domains impacting disease risk. To demonstrate the utility of MGIT, we report results from a gene-based association test (VAAST) involving 783 breast cancer cases, 322 skin cutaneous melanoma cases, and 3,607 controls of European ancestry. For two established cancer genes, we observed a two-fold and three-fold reduction in p-value with MGIT relative to a whole-gene test, for MITF in melanoma and BRCA1 in breast cancer, respectively. In contrast, for other established cancer genes, we observed either no change in p-value (RAD51B and BRCA2 in breast cancer and MC1R, MTAP, and BRCA2 in melanoma) or a modest attenuation of association signal (CHEK2 in breast cancer). In the case of BRCA1, the difference in the MGIT association signal was primarily driven by rare, predicted damaging missense variants, which exhibited large differences in effect size between the smallest and largest isoforms. MGIT is implemented in the software package XPAT, with support for VAAST, SKAT-O, and 27 additional gene-based association tests.
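To make the permutation idea concrete, here is a minimal sketch of one way a gene-level p-value could be obtained from per-isoform tests. It assumes that the summary across isoforms is the best isoform score and uses a toy burden statistic; both are illustrative assumptions, the function names are hypothetical, and this is not the XPAT implementation.

```python
import random


def burden(carriers, labels):
    """Toy burden score: carrier count in cases minus carrier count in controls."""
    cases = sum(1 for c, y in zip(carriers, labels) if c and y == 1)
    controls = sum(1 for c, y in zip(carriers, labels) if c and y == 0)
    return cases - controls


def gene_level_p(isoform_carriers, labels, n_perm=2000, seed=0):
    """Permutation p-value for the best-scoring isoform of one gene.

    isoform_carriers maps an isoform name to a 0/1 list marking which samples
    carry a rare variant in that isoform's exons. The same label permutation
    is applied to every isoform, so the correlation between overlapping
    isoforms is preserved without being modelled explicitly.
    """
    rng = random.Random(seed)
    observed = max(burden(c, labels) for c in isoform_carriers.values())
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        best = max(burden(c, shuffled) for c in isoform_carriers.values())
        if best >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene: the short isoform concentrates case-only carriers.
labels = [1] * 10 + [0] * 10
isoforms = {
    "short": [1] * 8 + [0] * 12,
    "long": [1] * 8 + [0] * 2 + [1] * 6 + [0] * 4,
}
print(gene_level_p(isoforms, labels))
```

Because the maximum over isoforms is recomputed inside every permutation, the resulting p-value already accounts for having tested several isoforms of the same gene.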

106

IMAGING GENOMICS

POSTER PRESENTATIONS

107

GENETIC ANALYSIS OF CEREBRAL BLOOD FLOW IMAGING PHENOTYPES IN ALZHEIMER’S DISEASE

Xiaohui Yao1, Shannon L. Risacher2, Kwangsik Nho2, Andrew J. Saykin2, Heng Huang3, Ze Wang4, Li Shen2

1School of Informatics and Computing, Indiana University, Indianapolis; 2Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 3Department of Electrical and Computer Engineering, University of Pittsburgh; 4Department of Radiology, Lewis Katz School of Medicine, Temple University

Presenting author: Heng Huang

Cerebral blood flow (CBF) provides a means to assess the neuronal and neurovascular consequences of Alzheimer’s disease (AD) pathology. Both AD specific and non-specific CBF changes may be driven by unique or common genetic factors. To identify genetic variants associated with AD pathogenesis, we performed a targeted analysis to examine association between 4,033 SNPs of 24 AD candidate genes and CBF phenotypes measured by arterial spin labeling (ASL) magnetic resonance imaging (MRI) in four brain regions of interest (ROIs) including left angular, right angular, left temporal and right temporal gyri. Participants include 258 non-Hispanic Caucasian subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Targeted genetic association analysis of CBF on each ROI was tested using linear regression under an additive genetic model in PLINK, where age, gender and APOE ɛ4 status were included as covariates. Post-hoc analysis used Bonferroni correction for adjusting both the genetic and CBF measures. GATES was used to calculate gene-level p-values. The additive effects of the identified genetic variants from the above association analysis were also assessed at each voxel using SPM12 under a one-way ANOVA test with age, gender and APOE ɛ4 status as covariates. The single nucleotide polymorphism (SNP) level analysis identified a novel locus in INPP5D (inositol polyphosphate-5-phosphatase D) significantly associated with left angular gyrus (L-AG) CBF. In gene-based analysis, both INPP5D and CD2AP (CD2 associated protein) were associated with L-AG CBF. The discovered INPP5D locus explained 8.29% of the variance of left angular CBF after adjusting for age, gender and APOE ɛ4 status. Further analyses on an independent subset of the ADNI samples (N=906) revealed that the minor allele of the locus was associated with lower cerebrospinal fluid t-tau/Aβ1-42 ratio. INPP5D functions as a negative regulator in the immune system and a number of inflammatory responses, and has been found to inhibit TREM2 signaling. The identified CBF risk factor has the potential to provide novel insights for better revealing the complex molecular mechanisms of AD. It warrants further investigation whether the risk factor is associated with the AD pathophysiology, the vascular pathophysiology, and/or their interaction.

108

PBRM1 MUTATIONS ARE ASSOCIATED WITH TISSUE MORPHOLOGICAL CHANGES IN KIDNEY CANCER

Jun Cheng1, Jie Zhang2, Zhi Han2, Liang Cheng2, Qianjin Feng1, Kun Huang2

1Southern Medical University, 2Indiana University School of Medicine

Presenting author: Kun Huang

Background: Clear cell renal cell carcinoma (CCRC) is the most common kidney cancer. With the accumulation of large scale genomic data, genes with mutations that are common to CCRC patients have been identified. For instance, VHL has mutations in almost 49.9% of the CCRC patients in The Cancer Genome Atlas (TCGA) project, followed by PBRM1, MUC4 and SETD2. While some of these genes have been established as driver genes for CCRC (e.g., VHL and SETD2), the functional implications of their mutations are still being characterized. Previous studies often focused on the effects of the mutations on molecular levels such as gene/microRNA expression and DNA methylation. In this study we aim to characterize the morphological changes at cellular and tissue levels associated with these mutations.

Methods: Mutational status and histopathological imaging data for 448 CCRC patients were obtained from TCGA through the NCI Genomic Data Commons. Six genes have mutations in more than 7% of the patients: VHL, PBRM1, MUC4, SETD2, BAP1, and MTOR. The imaging features were then extracted using a computational pipeline we have previously developed. Our pipeline consists of three steps: nucleus segmentation, cell-level feature extraction, and aggregating cell-level features into patient-level features. Ten types of cell-level features were extracted including nuclear area (area), lengths of major and minor axes of cell nucleus and their ratio (major, minor, and ratio), mean pixel values of nucleus in the three RGB channels respectively (rMean, gMean, and bMean), and mean, maximum, and minimum distances (distMean, distMax, and distMin) to neighboring nuclei in the Delaunay triangulation graph. Finally, all cell-level features from the same patient were aggregated into patient-level features using a bag-of-visual-words model with the K-means (K=10) algorithm for learning words. Five additional parameters were calculated for each type of feature: mean, standard deviation, skewness, kurtosis, and entropy. Thus there are 150 image features in total. For each selected gene, the features were compared between patients with and without mutations using Mann-Whitney U tests.

Results: While there are imaging features with p-value less than 0.05 for every gene, multiple test compensation (BH FDR) suggested that only PBRM1 mutations are associated with significantly different imaging features (69 features with q-value<0.05). Among them, ‘distMax_bin2’, ‘distMin_bin3’, and ‘ratio_bin9’ show significant increases in the mutation group while ‘distMean_std’, ‘major_std’ and ‘ratio_std’ show significant decreases.

Discussion and Conclusion: The above results suggest that tumor cells in the patients with PBRM1 mutations are more compact and their nuclei shapes are more homogeneous and closer to a round shape. These results are consistent with visual inspection and a previous report that PBRM1 mutation leads to a decrease of extracellular matrix gene expression and thus a reduction of stroma.
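The multiple-testing step mentioned in this abstract (BH FDR) can be sketched as follows. This is a generic Benjamini-Hochberg adjustment in plain Python, not the authors' code, and the p-values in the usage line are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-feature p-values for one gene (e.g. from Mann-Whitney U tests).
feature_p = {"distMax_bin2": 0.01, "ratio_bin9": 0.04, "major_std": 0.03, "area_mean": 0.005}
q = benjamini_hochberg(list(feature_p.values()))
significant = [name for name, qv in zip(feature_p, q) if qv < 0.05]
print(significant)
```

With 150 features per gene, a raw threshold of p<0.05 would be expected to flag several features by chance alone, which is why the comparison is made on q-values instead.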

109

IMAGE GENOMICS OF INTRA-TUMOR HETEROGENEITY USING DEEP NEURAL NETWORKS

Hui Qu1, Subhajyoti De2, Dimitris Metaxas1

1Rutgers University, 2Cancer Institute of New Jersey

Presenting author: Dimitris Metaxas

Intra-tumor heterogeneity, i.e. genetic, molecular, and phenotypic differences between tumor cells within a single tumor, is a major challenge for clinical management of cancer patients, contributing to therapeutic failure, disease relapses and drug resistance. While recent findings suggest that there is extensive intra-tumor genetic heterogeneity in all major cancer types, it remains to be understood how that relates to intra-tumor heterogeneity at the pathway- and cell phenotype level. We have developed an innovative computational framework based on neural networks to identify cellular features from histological slides and then associate them with genomic and pathway-level features in a multi-scale model, before applying it to a cohort of 469 bladder cancer samples which has genomic, transcriptomic, pathway, and histological imaging data. In brief, our method first uses a Tumor Segmentation Network (TSN) and Nuclei Segmentation Network (NCN) to identify tumor cell regions and tumor nuclei in the histological slides. For tumor segmentation, we first extracted tumor and normal patches from the whole slide images of 40 patients, then trained a TSN to classify any patch as tumor or normal. Given any other whole slide image, the trained model can identify all tumor patches, which form the tumor regions after morphological operations. The segmented tumor regions and nuclei are then used to compute the q-statistic, and also alpha and beta diversity measures which reflect the extent of local and regional intra-tumor phenotypic heterogeneity. Benchmarking against pathologically curated estimates indicates that this approach has high accuracy in identifying tumor cell features in a heterogeneous tumor. We then integrate imaging and genomics data to predict aspects of phenotypic heterogeneity based on cancer-related mutations and gene expression using uni- and multivariate approaches such as Relation Network (RN). Our preliminary results are consistent with biological knowledge. For example, we estimated the number of subclones in each tumor based on mutation data, and observed that indeed the samples with a high number of subclones have high phenotypic heterogeneity scores. We also estimated the mRNA expression level of Ki67, a marker of cell growth, and observed that the samples with higher q-statistic also had higher Ki67 expression, suggesting that certain patterns of intra-tumor heterogeneity correlate with tumor cell growth rates. Multi-scale analysis integrating genetic, pathway- and phenotypic heterogeneity will provide fundamental insights into “functional” variability within and across cancers, helping to refine precision medicine approaches to improve clinical management of cancer patients.

110

THE NEUROIMAGING INFORMATICS TOOLS AND RESOURCES COLLABORATORY (NITRC) AND ITS IMAGING GENOMICS DOMAIN

Li Shen1, David Kennedy2, Christian Haselgrove2, Abby Paulson3, Nina Preuss3, Robert Buccigrossi3, Matthew Travers3, Albert Crowley3, and The NITRC Team3

1Department of Radiology and Imaging Sciences, Indiana University School of Medicine; 2Department of Psychiatry, University of Massachusetts Medical School; 3TCG, Inc.

Presenting author: Li Shen

Aim of Investigation: Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) is a neuroinformatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, computational neuroscience, and imaging genomics tools and resources. We encourage researchers to list their Imaging Genomics tools at the NITRC website www.nitrc.org.

Methods: Initiated in 2006 through the NIH Blueprint for Neuroscience Research, NITRC’s mission is to foster a user-friendly knowledge environment for the neuroinformatics community. In 2012, NITRC added Imaging Genomics to its broadened scientific scope. By continuing to identify existing software tools and resources valuable to this community, NITRC’s goal is to support its researchers dedicated to enhancing, adopting, distributing, and contributing to the evolution of neuroinformatics analysis software, data, and compute resources.

Results: Located on the web at www.nitrc.org, the Resources Registry (NITRC-R) promotes software tools and resources, vocabularies, test data, and databases, thereby extending the impact of previously funded, neuroimaging informatics contributions to a broader community. NITRC-R gives researchers greater and more efficient access to the tools and resources they need, better categorizing and organizing existing tools and resources, facilitating interactions between researchers and developers, and promoting better use through enhanced documentation and tutorials, all while directing the most recent upgrades, forums, and updates. As of 11/2017, over 970 public resources are listed on NITRC-R, where the Imaging Genomics domain includes 60 resources such as ADNI, TCGA, ENIGMA, UK Biobank, and others. NITRC-Image Repository (NITRC-IR) makes 8,285 imaging sessions publicly available at no charge, and NITRC Computational Environment (NITRC-CE) provides cloud-based computation services downloadable to your machines or via commercial cloud providers such as Amazon Web Services and Microsoft Azure.

Conclusions: In summary, NITRC is now an established knowledge environment for the neuroimaging community where tools and resources are presented in a coherent and synergistic environment. With its expanded scope into imaging genomics, NITRC aims to become a trusted source for identification of resources in this highly active and promising domain bridging advanced neuroimaging and genomics. We encourage the imaging genomics research community to continue providing valuable resources, design and content feedback and to utilize these resources in support of data sharing requirements, software dissemination and cost-effective computational performance.

Acknowledgements: Funded by the NIH Blueprint for Neuroscience Research, NIBIB, NIDA, NIMH, and NINDS.

111

IDENTIFYING THE GIST OF CNNS: FINDING INTERPRETABLE SIGNATURES OF HISTOLOGY IMAGE MODELS BUILT USING NEURAL NETWORKS

Arunima Srivastava1, Chaitanya Kulkarni1, Kun Huang2, Parag Mallick3, Raghu Machiraju1

1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University

Presenting author: Arunima Srivastava

Convolutional Neural Networks (CNNs) have gained steady popularity as the selected method of histology image analysis and subsequent disease modeling. Since CNNs are purely data driven learning models, they have an edge over morphology driven (pre-selected) tissue image features that may be biased and difficult to generalize. Morphological features, namely tissue texture, structure, nuclei size and shape, presence of fibroblasts and lymphocytes etc., might not be comprehensive enough for different datasets, but they do provide an inherently interpretable characterization of the histology. While CNNs and their subsequent features prove to be powerful classifiers, they fail to provide an explanation for this classification, as the features are ONLY interpretable by the CNNs themselves. Translating “under the hood” activities of a CNN would endeavor to make it more generalizable while the final model will not only be able to effectively classify whole slide tissue images, it will also have the potential to educate us on the nuances of the histological data. This work aims to use both types of interpretable (morphological) and powerful but un-interpretable (CNN based) features to derive a signature for successful CNN models, which help relate them to known biological attributes and shed light on components that are critical to the various subtypes under investigation. We use a stratified breast cancer histology classification dataset from the BioImaging (2015) Challenge that contains sample images from four different kinds of breast tissue (Normal, Benign lesion, In-situ carcinoma and Invasive carcinoma). By following a two-pronged approach of modeling the same dataset using CNNs (using the GoogLeNet architecture) and morphological features (using CellProfiler - a biological image analytics tool), it was possible to infer an interpretable signature of features utilized by the CNN. We additionally explore the possibility of combining these two techniques to extract a more powerful and precise classification. This work summarizes the need for understanding the widely trusted models built using deep learning, and adds a layer of biological context to a technique that functioned as a classification only approach till now.

112

PRECISION MEDICINE: FROM DIPLOTYPES TO DISPARITIES TOWARDS IMPROVED HEALTH AND THERAPIES

POSTER PRESENTATIONS

113

EXPLORING THE POTENTIAL OF EXOME SEQUENCING IN NEWBORN SCREENING

Steven E. Brenner1, Aashish N. Adhikari1, Yaqiong Wang1, Robert J. Currier2, Renata C. Gallagher3, Robert L. Nussbaum4, Yangyun Zou1, Uma Sunderam5, Joseph Sheih3, Flavia Chen3, Mark Kvale3, Sean D. Mooney6, Raj Srinivasan5, Barbara A. Koenig3, Pui Kwok3, Jennifer M. Puck3, The NBSeq Project

1University of California-Berkeley, 2California Department of Public Health, 3University of California-San Francisco, 4Invitae, 5Tata Consultancy Services, 6University of Washington

Presenting author: Steven Brenner

The NBSeq project is evaluating effectiveness of whole exome sequencing (WES) for detecting inborn errors of metabolism (IEM) for newborn screening (NBS). De-identified archived dried blood spots from MS/MS true positive and false positive cases previously identified in the California NBS were studied. 18 out of 137 affected individuals lacked two rare potentially damaging single nucleotide variants or short indels in genes responsible for their Mendelian disorders. The sensitivity of causal mutation detection in 137 Phase I NBSeq exomes varied across disorders; all affected PKU cases were predicted correctly, but several cases of other IEMs were missed. In some cases, exomes also confidently identified disorders different from the metabolic center diagnoses, suggesting that sequencing information would have been valuable for proper clinical diagnoses in those cases. Deeper analysis of the data was undertaken to assess sources of discrepancy between sequencing results, MS/MS call, and clinical diagnosis. Copy number variation (CNV) calling tools were evaluated on NBSeq exomes for ability to resolve some of these exome false negatives. CNV tools can both miss CNVs in exomes and report them spuriously. We optimized tools for our data and filtered out genes (PRODH, HCFC1, ETFA) harboring common CNVs (identified from CNV calls on the 1000 genomes project exomes). This identified deletions in the correct genes for 4 of the 32 exome false negatives using XHMM: 2 isovaleric acidemia cases, 1 methylmalonic acidemia case and 1 OTC deficiency case. We also systematically reviewed every variant in 78 metabolic disorder genes annotated by HGMD or ClinVar as pathogenic or likely pathogenic with 1000 genomes MAF>0.1%. Our re-assessment of the primary literature for 59 such variants found that only 18 were reportable (many still VUS) and the rest we excluded from the pipeline. Literature review also helped identify 8 cases diagnosed with short-chain acyl-CoA dehydrogenase (SCAD) deficiency but not flagged by exomes. All 8 individuals harbored a common (1000 Genomes MAF: 18.2%) ACADS allele (c.625A>G) present in several NBSeq exomes, which sometimes confers a partial biochemical phenotype but not clinical disease. For assessment, we treated these individuals as unaffected. Incorporation of CNV detection and variant curation into our analysis pipeline improved overall sensitivity from 77.9% to 87.6% on the 137 affected Phase I NBSeq samples. This updated pipeline will be run on additional NBSeq exomes to assess the potential role for WES in NBS. While still not sufficiently specific alone for screening of most IEMs, WES can facilitate timely and more precise case resolution.

114

A METHOD FOR IMPROVED VARIANT CALLING AT HOMOPOLYMER MARGINS (AND ELSEWHERE)

J. Buckley, M. Hiemenz, J. Biegel, T. Triche, A. Ryutov, D. Maglinte, D. Ostrow, X. Gai

Center for Personalized Medicine, Children’s Hospital of Los Angeles

Presenting author: Jonathan Buckley

All sequencing technologies are subject to read errors which, in the context of variant calling (particularly low variant-allele-frequency (VAF) variant calling), can yield miscalls. Read errors are most problematic when genomic context (such as proximity to homopolymers) influences the error rate. The Center for Personalized Medicine at Children’s Hospital of Los Angeles (CHLA) recently collaborated with Thermo-Fisher (TF) in development of a clinical pediatric cancer panel for somatic variant detection (OncoKids™), using TF’s Ion Torrent sequencing platform. The test needed to identify variants in tumor sub-clones and in samples with an admixture of tumor and normal cells, both situations that can yield low VAFs. Our challenge was to optimize variant calling at homopolymer margins, and other genomic loci with a high background error rate (noise). The TF approach was to identify problematic loci and to either limit base calls to reads from one strand (when errors clustered mostly on the other strand), or ‘blacklist’ the locus altogether. While this approach was conservative, avoiding most false positives, it resulted in unacceptable false negative rates, particularly for InDels. Given the deep coverage (over 1000x in many regions), it seemed likely that a more nuanced approach might yield accurate calls, even in the presence of substantial noise. This presentation outlines an algorithm (Local Adjustment for Background, or LAB) developed at CHLA that uses a reference dataset (filtering out true positives) to establish the noise distribution at each locus. The noise distribution varies greatly across the panel genes, from essentially error-free loci to loci in which the majority of reads show a spurious base substitution or InDel. While proximity to a homopolymer is a strong determinant of noise, non-homopolymer regions can also have high noise and many homopolymers yield relatively clean data. Variant calls are made through comparison of the observed VAF with the locus-specific VAF distribution in the reference. Optionally, the reference set can be limited to samples of the same type as the test sample (e.g. FFPE). Adjustments may be made for samples with globally increased error rates. In regions of complex InDel patterns, a statistical model tests for shifts in these patterns, indicative of a true variant. An important component is a GUI that provides a visual representation of the basis for a call, and options such as strand-specific analysis. Application to samples with known SNVs and InDels (Acrometrix ‘ground truth’ samples) resulted in improvement in InDel calls from 65% to 100%. The presentation will describe the calling pipeline, with illustrative examples, and present comparative performance data.
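The core comparison described in this abstract, an observed VAF judged against a locus-specific noise distribution from reference samples, can be illustrated with a minimal sketch. The empirical p-value rule, the threshold alpha, and the function names are assumptions made for illustration; the abstract does not describe the actual LAB statistics.

```python
def noise_p_value(observed_vaf, reference_vafs):
    """Empirical chance that background noise at this locus reaches observed_vaf.

    reference_vafs holds the VAFs seen at the same locus in reference samples
    that are believed not to carry a true variant there.
    """
    exceed = sum(1 for v in reference_vafs if v >= observed_vaf)
    # Add-one correction so a finite reference set never yields exactly zero.
    return (exceed + 1) / (len(reference_vafs) + 1)


def call_variant(observed_vaf, reference_vafs, alpha=0.05):
    """Call a variant when the observed VAF is unlikely under locus noise."""
    return noise_p_value(observed_vaf, reference_vafs) <= alpha


# Hypothetical loci: one essentially error-free, one noisy homopolymer margin.
clean_locus = [0.0] * 40 + [0.001] * 10
noisy_locus = [0.02, 0.05, 0.03, 0.04, 0.06] * 10

print(call_variant(0.03, clean_locus))
print(call_variant(0.03, noisy_locus))
```

The same 3% VAF is treated differently at the two loci, which is the point of using a per-locus background rather than a single global cutoff or a blacklist.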

115

EFFICIENT SURVIVAL MULTIFACTOR DIMENSIONALITY REDUCTION METHOD FOR DETECTING GENE-GENE INTERACTION

Jiang Gui, Xuemei Ji, Christopher I. Amos

Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth, Lebanon, NH 03756

The problem of identifying SNP-SNP interactions in case-control studies has been studied extensively and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP-SNP interactions in relation to censored survival data. We present an extension of the two class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of survival outcome. The proposed Efficient Survival MDR (ES-MDR) method handles censored data by modifying MDR’s constructive induction algorithm to use the log-rank test. We applied ES-MDR to genetic data of over 470,000 SNPs from the OncoArray Consortium. We used onset age of lung cancer and case-control (n=27,312) status as the survival outcome and divided the data into training and testing sets. We also adjusted for subject’s age, gender and smoking status. From the training set, we identified an interaction between SNPs from the BRCA1 and IL17RC genes as the top model associated with lung cancer onset age. This result was validated in the testing set. ES-MDR is capable of detecting interaction models with weak main effects. These epistatic models tend to be dropped by traditional regression approaches.

116

BIOINFORMATICS PROCESSING STRATEGIES FOR EFFICIENT SEQUENCING DATA STORAGE USING GVCF BANDING

Nicholas B. Larson, Shannon K. McDonnell, Iain F. Horton, Saurabh Baheti, Jeanette E. Eckel-Passow, Steven N. Hart

Mayo Clinic

Presenting author: Nicholas Larson

An emerging challenge in the era of next-generation sequencing (NGS) is efficient data storage practices, particularly for file formats that accommodate ad hoc construction of analysis-ready datasets. The Variant Call Format (VCF) is the predominant file type used for storing and analyzing NGS-based genetic variant information. However, it presents multiple practical limitations when merging individual files for multi-sample representations. Recent development of the gVCF file format by GATK addresses many of these concerns by characterizing same-as-reference segments of the genome as interval entries defined by a shared genotype quality (GQ) score. Current default settings to generate this intermediate file format result in a new data entry at each base pair position the GQ shifts, presenting cost-benefit considerations of improved and computationally efficient multi-sample genotyping at the expense of large intermediate files. However, additional options allow for contiguous entries to be merged if they fall within a predefined GQ bin, a process known as banding. We hypothesized that substantial gVCF file size reduction could be attained for whole-genome sequencing (WGS) through the use of coarse GQ banding options; although the impact of this approach on output quality of multi-sample variant calling is currently unknown. To investigate the properties of gVCF banding on genotyping integrity, we processed 50 WGS samples as well as 50 whole-exome sequencing (WES) samples from the Mayo Clinic Biobank under a variety of GQ banding settings (default, intervals of 10, {0,20,60}, {0,20}). These single-sample gVCF files were subsequently merged and joint genotyped under varying combinations of banding options, separately by sequencing application, and output genotypes for chromosome 22 were compared for concordance with results using complete information (i.e., no banding). Overall, WGS samples exhibited substantially smaller gVCF files, with {0,20} banding resulting in a mean file size reduction of 87% (range: 84-90%) relative to default settings. Genotype concordance exceeded 99.9% under all comparisons, while we additionally observed more variable positions emitted as coarser bin definitions were applied. Comparable findings were observed for WES data. Our results highlight impressive improvements in NGS variant call data storage efficiency gained by coarse banding options for gVCF output, with minimal impact on accompanying genotyping quality.
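The banding idea, merging contiguous same-as-reference positions whose GQ falls in the same bin, can be sketched as below. This is a toy illustration and not GATK code: the function name is hypothetical, the listed band values such as {0,20,60} are treated as lower edges of each bin (an assumption about the notation), and each merged record keeps the minimum GQ it covers.

```python
from bisect import bisect_right


def band_reference_blocks(gq_by_position, band_starts):
    """Merge consecutive reference positions whose GQ falls in the same band.

    gq_by_position: list of (position, GQ) pairs for same-as-reference sites,
    sorted by position.
    band_starts: sorted lower edges of the GQ bands, e.g. [0, 20, 60].
    Returns a list of (start, end, min_GQ) interval records.
    """
    blocks = []
    current_band = None
    for pos, gq in gq_by_position:
        band = bisect_right(band_starts, gq) - 1
        if blocks and band == current_band and pos == blocks[-1][1] + 1:
            # Same band and adjacent position: extend the open interval.
            start, _, min_gq = blocks[-1]
            blocks[-1] = (start, pos, min(min_gq, gq))
        else:
            blocks.append((pos, pos, gq))
            current_band = band
    return blocks


# Hypothetical GQ values along eight adjacent reference positions.
sites = [(1, 99), (2, 75), (3, 61), (4, 35), (5, 22), (6, 10), (7, 12), (8, 99)]
for bands in (list(range(100)), [0, 20, 60], [0, 20]):
    print(len(band_reference_blocks(sites, bands)), "records")
```

Coarser bands produce fewer interval records for the same positions, which is the source of the file size reduction reported in the abstract; the cost is that GQ detail inside each band is collapsed.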

117

IDENTIFICATION OF A NOVEL TSC2 MUTATION IN A PATIENT WITH TUBEROUS SCLEROSIS COMPLEX

Jae-Hyung Lee1, Su-Kyeong Hwang2, Jung-eun Yang3, Chae-Seok Lim3, Jin-A Lee4, Kyungmin Lee5, Bong-Kiun Kaang3, Yong-Seok Lee6

1Kyung Hee University, 2Kyungpook National University Hospital, 3Seoul National University, 4Hannam University, 5Kyungpook National University Graduate School of Medicine, 6Seoul National University College of Medicine

Presenting author: Yong-Seok Lee

Tuberous sclerosis complex (TSC) is a neurocutaneous disorder characterized by multiple symptoms including neuropsychological deficits such as seizures, intellectual disability, and autism. TSC is inherited in an autosomal dominant pattern and is caused by mutations in either the TSC1 or TSC2 genes, which result in the hyperactivation of the mammalian target of rapamycin (mTOR) signaling pathway. In this study, we identified a novel small deletion mutation in TSC2 by performing whole exome sequencing in a Korean patient, who exhibited multiple TSC-associated symptoms including frequent seizures, intellectual disability, language delays, and social problems. In addition, we validated the functional significance of the novel mutation by examining the effect of the deletion mutant on mTOR pathway activation. Recent studies have suggested that mTOR inhibitors such as rapamycin can be effective to treat TSC-associated deficits in rodent models of TSC. Accordingly, we found that everolimus treatment has beneficial effects on SEGA size and autism related behaviors in the patient.

118

CONSIDERATIONS FOR AUTOMATED MACHINE LEARNING IN CLINICAL METABOLIC PROFILING: ALTERED HOMOCYSTEINE PLASMA CONCENTRATION ASSOCIATED WITH METFORMIN EXPOSURE

Alena Orlenko1, Jason H. Moore1, Patryk Orzechowski1,2, Randal S. Olson1, Junmei Cairns3, Pedro J. Caraballo3, Richard M. Weinshilboum3, Liewei Wang3, Matthew K. Breitenstein1

1University of Pennsylvania; 2AGH University of Science and Technology, Krakow, Poland; 3Mayo Clinic

Alena, Orlenko

With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated potential to lack sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
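The residual adjustment mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single continuous confounder (labeled BMI here) and an ordinary least-squares fit; the abstract does not state the regression model the authors used, and the function name `residualize` and all toy values are hypothetical. Under these assumptions, the residuals rather than the raw metabolite values would be handed to the AutoML tool.

```python
def residualize(feature, confounder):
    """Return the part of `feature` not linearly explained by `confounder`.

    Fits feature = intercept + slope * confounder by ordinary least squares
    and returns the residuals (an assumed, simplified adjustment model).
    """
    n = len(feature)
    mean_x = sum(confounder) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in confounder)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(confounder, feature))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(confounder, feature)]


# Toy values for illustration only; they are not data from the study.
bmi = [21.0, 24.5, 27.0, 30.5, 33.0]
homocysteine = [8.1, 9.0, 9.4, 10.8, 11.2]

adjusted = residualize(homocysteine, bmi)
```

By construction the adjusted values have zero mean and zero linear association with the confounder, so any remaining signal picked up downstream cannot be a linear effect of BMI.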

119

PHARMGKB: NEW WEBSITE RELEASE 2017

Michelle Whirl-Carrillo1, Ryan M. Whaley1, Mark Woon1, Katrin Sangkuhi1, Li Gong1, Julia Barbarino1, Caroline Thorn1, Rachel Huddart1, Maria Alvarellos1, Jill Robinson1, Russ B. Altman2, Teri E. Klein3

1Department of Biomedical Data Science, Stanford University; 2Department of Bioengineering, Medicine and Genetics, Stanford University; 3Department of Biomedical Data Science and Medicine, Stanford University

PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. The PharmGKB website allows users to select and view information via search, filter and browse options. Data is also available by direct download through the website and through the PharmGKB API. PharmGKB launched a new and improved user interface in September 2017. The new website offers benefits such as a display that works on mobile and small screen devices, improved searching and filtering capabilities, and faster page load speeds. While the look of PharmGKB has changed, all the content that was available previously is still available, including:

• 5500 annotated genetic variants
• 14,000 curated peer-reviewed PGx articles
• 125 evidence-based pharmacokinetic and pharmacodynamics pathways
• 60 reviews of key PGx genes (very important pharmacogenes)
• 450 curated drug labels
• 90 gene-drug pairs with curated genotype-based drug dosing guidelines

The website features an online tutorial that users can access by following the screen prompts. For more information, please visit PharmGKB at http://www.pharmgkb.org.

120

READING BETWEEN THE GENES: COMPUTATIONAL MODELS TO DISCOVER FUNCTION AND/OR CLINICAL UTILITY FROM NONCODING DNA

POSTER PRESENTATIONS

121

NETWORK ANALYSIS OF PSEUDOGENE-GENE RELATIONSHIPS: FROM PSEUDOGENE EVOLUTION TO THEIR FUNCTIONAL POTENTIALS

Travis S. Johnson1, Sihong Li1, Johnathan R. Kho2, Kun Huang3, Yan Zhang1

1Ohio State University, 2Georgia Institute of Technology, 3Indiana University

Travis, Johnson

Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as “junk DNAs”, since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ~20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be “reactivated” in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method. Next, we evaluate these networks in respect to the gene ontology from which we identify functions enriched in these pseudogene-gene families and infer functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.
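The family counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The three counts come from the abstract; the variable names are illustrative only.

```python
# Counts as reported in the abstract.
total_gene_families = 3281
pgg_families = 445
non_trivial_pgg = 438

# Share of gene families that are pseudogene-gene (PGG) families.
pgg_share = 100 * pgg_families / total_gene_families

# Share of non-trivial families, relative to PGG families and to all families.
non_trivial_share_of_pgg = 100 * non_trivial_pgg / pgg_families
non_trivial_share_of_total = 100 * non_trivial_pgg / total_gene_families
```

Rounded to the precision used in the abstract, these three values agree with the reported 13.56%, 98.4% and 13.3%.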

122

RANDOM WALKS ON MUTUAL MICRORNA-TARGET GENE INTERACTION NETWORK IMPROVE THE PREDICTION OF DISEASE-ASSOCIATED MICRORNAS

Duc-Hau Le1, Lieven Verbeke2, Le Hoang Son3, Dinh-Toi Chu4, Van-Huy Pham5

1Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam; 2Department of Information Technology, Ghent University - imec, Ghent, Belgium; 3VNU University of Science, Vietnam National University, Hanoi, Vietnam; 4Faculty of Biology, Hanoi National University of Education, Hanoi, Vietnam; 5Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

Duc-Hau, Le

Background: MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identification in the laboratory of disease-related miRNAs is not straightforward, numerous network-based methods have been developed to predict novel miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the target genes of the miRNAs. Here, we introduce a novel method based on a heterogeneous network that not only considers miRNAs but also the corresponding target genes in the network model.

Results: Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using a random walk with restart (RWR) based algorithm. Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses a RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of “disease modules” in the heterogeneous miRNA networks used as input for the algorithm. Moreover, we could demonstrate that RWRMTN is stable, performing well when using both experimentally validated and predicted miRNA-target gene interaction data for network construction. Finally, using RWRMTN, we identified 76 novel miRNAs associated with 23 disease phenotypes which were present in a recent database of known disease-miRNA associations.

Conclusions: Summarizing, using random walks on mutual miRNA-target networks improves the prediction of novel disease-associated miRNAs because of the existence of “disease modules” in these networks.
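The random walk with restart (RWR) idea behind this abstract can be sketched on a toy undirected miRNA-target network. This is a generic RWR iteration, not the authors' RWRMTN implementation: the node names, the restart probability of 0.5, and the convergence tolerance are all assumptions made for illustration, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities on an undirected network.

    `adjacency` maps each node to its list of neighbors (every node is
    assumed to have at least one). At each step the walker either jumps
    back to a seed node (probability `restart`) or moves to a uniformly
    chosen neighbor of its current node.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical heterogeneous network: miRNA nodes link only to target genes.
network = {
    "miR-a": ["gene-1"],
    "miR-b": ["gene-1", "gene-2"],
    "miR-c": ["gene-2"],
    "gene-1": ["miR-a", "miR-b"],
    "gene-2": ["miR-b", "miR-c"],
}

# Seeds: one known disease miRNA and one of its target genes.
scores = random_walk_with_restart(network, seeds={"miR-a", "gene-1"})

# Rank the non-seed miRNAs by their steady-state probability.
ranking = sorted(["miR-b", "miR-c"], key=scores.get, reverse=True)
```

In this toy network `miR-b` shares a target with the seed miRNA and therefore ranks above `miR-c`, which is only reachable through a second gene. Including target genes as seed nodes is what distinguishes the heterogeneous setting from a miRNA-only network.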

123

TEXT MINING AND VISUALIZATION FOR PRECISION MEDICINE

POSTER PRESENTATIONS

124

MINING ELECTRONIC HEALTH RECORDS FOR PATIENT-CENTERED OUTCOMES TO GUIDE TREATMENT PATHWAY DECISIONS FOLLOWING PROSTATE CANCER DIAGNOSIS

Selen Bozkurt1, Jung In Park2, Daniel L. Rubin3, James D. Brooks4, Tina Hernandez-Boussard5

1Akdeniz University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Antalya, Turkey; 2Stanford University Department of Medicine (Biomedical Informatics); 3Stanford University Department of Radiology; 4Stanford University Department of Urology; 5Stanford University Department of Medicine (Biomedical Informatics)

Tina, Hernandez-Boussard

Electronic health records (EHRs) have potential for novel discovery of patient-centered outcomes that can be used to improve healthcare delivery. However, a significant amount of data stored in EHRs is hidden in clinical narratives as unstructured text. For prostate cancer patients, these clinical narratives contain a large amount of information. Previous work suggests that structured data regarding dysfunctions after treatment for prostate cancer are not consistently captured in the EHR and thus cannot be reliably extracted for clinical and research purposes. Therefore, in this preliminary study we propose a rule-based natural language processing pipeline to extract patient-centered outcomes related to the presence of urinary, bowel and erectile dysfunction following treatment of prostate cancer from the free text of the EHR notes.

We developed a lexicon of terms related to urinary, bowel or erectile dysfunctions based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 100 randomly selected documents for each outcome from inpatient admissions was annotated by a research nurse to identify all related concepts as: present, negated, historical, and discussed risk. We developed a rule-based natural language processing (NLP) pipeline which uses dictionary mapping combined with the ConText algorithm. We trained our NLP pipeline using 1,336 documents and tested it on 20 documents to determine agreement with the human reference standard; standard precision, recall and overall accuracy rates were used as metrics to quantify the automatic annotation performance.

The precision, recall, and accuracy scores for the urinary incontinence annotations against the reference standard output created by a domain expert were 62.5%, 100% and 76.9%, respectively. For most of the misclassified cases, which were annotated as presence of urinary incontinence by the NLP algorithm but not by the expert, medication information included in the term dictionary caused ambiguity regarding phenotype classification. For the erectile dysfunction annotations, precision was 100%, recall was 75% and overall accuracy was 90%. On the other hand, since no bowel dysfunction was reported in the randomly selected test set, evaluation metrics were not calculated.

In this preliminary study, we have shown that it is possible to identify patient-centered outcomes from the free text of EHRs using natural language processing. Using EHRs to assess patient-centered outcomes promotes population-based assessments of these valued yet difficult to assess outcomes and will enable detailed sensitivity and subgroup analysis. Such results will allow clinicians to individualize care for their patients. The results will also provide desperately needed evidence-based criteria for patient-centered outcomes. These criteria can be used in research studies, in clinical practice, and to develop practice guidelines. Future work will create a larger number of well-annotated datasets and combine our rule-based approach with machine learning techniques.
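The three evaluation metrics named in this abstract follow directly from a confusion matrix, as the short sketch below shows. The abstract reports only the percentages, so the counts used here are hypothetical: they were chosen because they reproduce the reported rates, and they should not be read as the study's actual annotation counts.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical counts consistent with the reported urinary incontinence
# rates (62.5% precision, 100% recall, 76.9% accuracy).
urinary = precision_recall_accuracy(tp=5, fp=3, fn=0, tn=5)

# Hypothetical counts consistent with the reported erectile dysfunction
# rates (100% precision, 75% recall, 90% accuracy).
erectile = precision_recall_accuracy(tp=3, fp=0, fn=1, tn=6)
```

The urinary incontinence pattern (perfect recall, lower precision) is what a dictionary with over-inclusive terms would produce: every true mention is found, but some matches, such as medication names, are false positives.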

125

GDMINER: A BIO TEXT MINING SYSTEM FOR GENE-DISEASE RELATION ANALYSIS

Soo Jun Park1, Jihyun Kim2, Soo Young Cho2, Charny Park2, Young Seek Lee3

1Electronics and Telecommunications Research Institute, 2National Cancer Center, 3Hanyang University

Soo Jun, Park

Researchers in biology and medicine often visit PubMed to find literature for their studies. While the keyword search in PubMed may be a popular tool to retrieve information, it is limiting as it only provides a small number of results. The keyword search does not allow the user to sift through decades' worth of research and extract all corresponding studies as needed. This poster presentation will provide solutions through a bio text mining system called GDMiner that identifies biological entities, extracts the relationships among those entities, and discovers associations between genes and diseases.

When GDMiner collects abstracts from PubMed (PubMed Collector), an automatic named-entity recognizer sorts the information into 40 biological categories (Entity Recognizer). GDMiner then extracts relations from the biomedical categories (Relation Extractor) by using natural language processing techniques, like Part-of-Speech (POS) tagging and syntactic parsing. The display features graphs and tables showing the extracted relations. For example, a gene-disease association data query can be mined by analyzing the relations between genes and diseases.

The system consists of the following three parts: PubMed collector, relation extractor and relation analyzer. The PubMed collector queries PubMed with a query given by a user and fetches the matching abstracts. The relation extractor divides abstracts into sentences and recognizes biomedical named entities in sentences. Then, the relation analyzer extracts relational events among recognized entities. Relations are extracted by syntactic analysis, not by co-occurrence information. Our system parses sentences syntactically in the form of Penn Treebank syntactic tags and extracts relations by analyzing the parsing results. Our rules are simple and few because the syntactic tag set has fewer tags than the POS tag set, yet they are not limited to particular relation types.

The relation viewer accumulates extracted relations and visualizes them in graphs and tables. If the number of nodes in the generated relationship network is small, it is easy for the user to find the relationship between desired bio objects (named entities). However, if the generated network is very large, it is very difficult to find the relations. Our system helps the user find the relation between the desired bio objects by creating a small sub-network using the search and filtering function.

There is a rapidly growing interest in properly utilizing biomedical literature within the research community, and the rate at which the biomedical literature is accumulating is accelerating worldwide. Not only preserving data but also improving the way in which researchers extract information is necessary to aid future biological studies and discoveries. An automated system is necessary to keep up with this growth and to find, accurately, the information relevant to a researcher's search.

126

WORKSHOP

MACHINE LEARNING AND DEEP ANALYTICS FOR BIOCOMPUTING: CALL FOR BETTER EXPLAINABILITY

POSTER PRESENTATIONS

127

METHODS FOR EXAMINING DATA QUALITY IN HEALTHCARE INTEGRATED DATA REPOSITORIES

Vojtech Huser1, Michael G. Kahn2, Jeffrey S. Brown3, Ramkiran Gouripeddi4

1National Library of Medicine, National Institutes of Health, 8600 Rockville Pk, Bld 38a, Bethesda, MD 20852, USA; 2Department of Pediatrics, University of Colorado, 13001 East 17th Place, MS-F563, Aurora, CO 80045, USA; 3Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 401 Park Drive, Suite 401 East, Boston, MA 02215, USA; 4University of Utah, School of Medicine, Salt Lake City, 84102, Utah, USA

Vojtech, Huser

This paper summarizes the content of the workshop focused on data quality. The first speaker (VH) described data quality infrastructure and data quality evaluation methods currently in place within the Observational Health Data Sciences and Informatics (OHDSI) consortium. The speaker described in detail a data quality tool called Achilles Heel and the latest developments for extending this tool. Interim results of an ongoing Data Quality study within the OHDSI consortium were also presented. The second speaker (MK) described lessons learned and new data quality checks developed by the PEDsNet pediatric research network. The last two speakers (JB, RG) described tools developed by the Sentinel Initiative and the University of Utah's service oriented framework. Throughout and at the end, the workshop discussed how data quality assessment can be advanced by combining the best features of each network.

128

MULTI-CLASS CLASSIFICATION STRATEGY FOR SUPPORT VECTOR MACHINES USING WEIGHTED VOTING AND VOTING DROP

Sungho Kim, Taehun Kim

Yeungnam University, DGIST

Sungho, Kim

Several multi-class strategies for Support Vector Machines (SVMs) have been developed to perform multi-class classification, such as One Versus One, One Versus All and Directed Acyclic Graph. These strategies do not reflect the distance between the input data and the hyper-plane that separates two classes. This is not reasonable when the input data is placed near the hyper-plane. The proposed weighted voting resolves this problem by weighting the voting values according to the distance from the boundary, and the proposed voting drop further enhances the performance of the SVMs.

The proposed Weighted Voting is based on the voting method. The voting method is carried out by accumulating votes, then choosing the most voted class. The proposed Weighted Voting method weights each voting value by reflecting the distance from the boundary and the margin.

The second proposed method, Voting Drop, concerns how votes are accumulated. The conventional voting method accumulates every vote, but this can be a problem because some SVMs respond redundantly. Because the SVM is a binary classifier, each SVM learns only about two classes. Therefore, an SVM has no discernment for the classes it has not learned. This is why, when an SVM predicts data belonging to a non-learned class, it responds redundantly. Such an irrelevant SVM casts an incorrect vote that confuses the decision. To resolve this problem, the Voting Drop method drops the redundant votes by removing the irrelevant SVM. The algorithm finds the irrelevant SVM, then drops the votes it cast. An irrelevant SVM is found by looking for the least voted class, because the least voted class can be thought of as irrelevant to the input data.

As shown in the experiments, reflecting the distance from the hyper-plane and the discernment of the hyper-plane, and removing the redundant SVMs' votes, leads to higher performance. The proposed methods can be used for a range of classification tasks.
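One possible reading of weighted voting combined with voting drop is sketched below for a one-versus-one setting. The abstract does not give the weighting formula or the exact drop procedure, so several choices here are assumptions: each vote is weighted by the absolute distance to the hyper-plane divided by the margin and clipped at 1, a single drop round is performed, and the function names and decision values are hypothetical.

```python
from collections import defaultdict


def weighted_votes(decisions, margin=1.0):
    """Sum distance-weighted votes per class.

    `decisions` maps a class pair (a, b) to the signed distance of the input
    from that pair's hyper-plane: positive favors a, negative favors b.
    Each vote is weighted by |distance| / margin, clipped at 1 (assumed rule).
    """
    votes = defaultdict(float)
    for (a, b), distance in decisions.items():
        votes[a] += 0.0  # make sure both classes appear in the tally
        votes[b] += 0.0
        winner = a if distance >= 0 else b
        votes[winner] += min(abs(distance) / margin, 1.0)
    return dict(votes)


def vote_with_drop(decisions, margin=1.0):
    """Drop every pairwise classifier trained on the least voted class, then re-vote."""
    votes = weighted_votes(decisions, margin)
    least = min(votes, key=votes.get)
    kept = {pair: d for pair, d in decisions.items() if least not in pair}
    if not kept:  # only two classes: nothing is left to re-vote on
        return max(votes, key=votes.get)
    final = weighted_votes(kept, margin)
    return max(final, key=final.get)


# Hypothetical decision values for one input and three classes.
decisions = {("A", "B"): -0.7, ("A", "C"): 1.5, ("B", "C"): 0.2}

before_drop = weighted_votes(decisions)
predicted = vote_with_drop(decisions)
```

In this example class A leads before the drop only because of its confident win in the A-versus-C classifier. Once C is identified as the least voted class and the classifiers trained on it are removed, the remaining A-versus-B classifier decides in favor of B.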

129

A TOPOLOGY-BASED APPROACH TO QUANTIFY NETWORK PERTURBATION SCORES FOR ASSESSMENT OF DIFFERENT TOBACCO PRODUCT CLASSES

Quynh T. Tran1, Lee Larcombe2, Subhashini Arimilli3, G. L. Prasad1

1Reynolds American Inc. Services Company - Winston Salem NC - USA 27105; 2Applied Exomics Ltd - Stevenage UK SG1 2FX; 3Wake Forest Baptist Health - Winston Salem NC USA 27104

Quynh, Tran

Background: Chronic cigarette smoking is known to cause immune suppression, which in turn contributes to increased susceptibility to cancer. However, there is limited information on the effects of non-combustible tobacco products, such as moist snuff. To better understand the molecular changes that result from consumption of different tobacco products, global profiling techniques have been extensively utilized. A limitation of such approaches is that differential gene expression alone may be insufficient to identify both the source of perturbation and the extent to which perturbations propagate through a network of interacting genes. Systems biology tools support the analyses and integration of complex datasets, and provide a holistic view of the underlying biological changes. Hence, we implemented a network-based analysis tool to elucidate molecular changes that arise from the use of different tobacco products.

Methods: We developed an analytical approach to quantify and visualize gene-level perturbation scores of a pre-identified network. This approach differentiates biological effects of multiple treatments, using genome-scale expression data and considering interactome-wide effects. We utilized a microarray gene expression dataset of peripheral blood mononuclear cells treated with aqueous extracts of whole smoke conditioned medium (WS-CM) and smokeless tobacco extract (STE) prepared from 3R4F cigarettes and 2S3 moist snuff reference tobacco products, respectively, at baseline and after stimulation with toll-like receptor (TLR) agonists. The analytical pipeline takes normalized gene expression values and performs the following steps: 1) generates gene-level network scores using a weighted topology approach considering both the gene expression data and the full human interactome information available in IntAct (a literature curated molecular interaction database); 2) derives gene-level perturbation scores for each treatment condition compared to its baseline; and 3) calculates a single impact score for each exposure condition and creates a network graph to be visualized using Cytoscape.

Results: The pipeline was applied to calculate impact scores under each stimulation and each treatment condition for an inflammatory response network, signaling through a triggered receptor expressed on myeloid cells 1 (TREM1). Samples stimulated with TLR agonists had higher scores, or more perturbation, compared to non-stimulated samples. Those exposed to higher WS-CM doses received higher scores compared to lower doses of WS-CM. Samples exposed to STE received a lower score, suggesting STE treatment perturbed the TREM1 network to a lower degree than WS-CM. On the other hand, the classical differential gene expression analysis did not identify significant changes in gene expression for STE treated samples stimulated with TLR agonists, compared to untreated cells.

Conclusions: In summary, this network scoring methodology suggests that, under these conditions, STE exerts less perturbation on select immune networks compared to combustible tobacco products. These scores potentially serve as tools to differentiate the biological effects resulting from different tobacco classes.

130

AUTHOR INDEX

A

Abe, Sumiko · 51
Abyzov, Alexej · 79
Acharya, Ambika · 5
Achour, Ikbel · 33
Adhikari, Aashish N. · 113
Adhikari, Bhim M. · 53
Agrawal, Monica · 8, 70
Aldana, Julian · 76
Al-Ghalith, Gabriel · 75
Alkan, Can · 77, 90
Allette, Kimaada · 2
Alser, Mohammed · 77, 90
Altman, Russ B. · 5, 47, 62, 119
Alvarellos, Maria · 119
Ambite, José Luis · 51
Amos, Christopher I. · 115
Anderson, Paul E. · 98
Arimilli, Subhashini · 129

B

Bada, Michael · 45
Bae, Min Gyun · 78, 96, 100
Bae, Taejeong · 79
Baheti, Saurabh · 116
Baladandayuthapani, Veerabhadran · 48
Barbarino, Julia · 119
Bartonicek, Nenad · 103
Baumgartner Jr., William A. · 37, 45
Beam, Andrew L. · 83
Beaulieu-Jones, Brett K. · 9
Bechheim, Matthias · 92
Behsaz, Bahar · 80
Berghout, Joanne · 25
Bharath, Karthik · 48
Bhattrai, Avnish · 51
Biegel, J. · 114
Bilke, Sven · 88
Blach, Colette · 18
Blangero, John · 53
Bobacz, Klaus · 95
Boguslav, Mayla · 37
Bowtell, David D. L. · 103
Bozkurt, Selen · 124
Bradford, Yuki · 58
Breitenstein, Matthew K. · 28, 118
Brenner, Steven E. · 113
Bright, Roselie A. · 5
Brooks, James D. · 124
Brown, Jeffrey S. · 127
Buccigrossi, Robert · 110
Buchan, Z. R. · 72
Buckley, J. · 114
Bull, Shelley B. · 104
Burns, Gully · 51
Bush, William S. · 57, 81, 102
Butkiewicz, Mariusz · 81

C

Cairns, Junmei · 28, 118
Cali, Damla S. · 90
Callahan, Tiffany J. · 45
Capra, John A. · 57, 102
Caraballo, Pedro J. · 28, 118
Carson III, William E. · 43
Cha, Hongui · 65
Chance, Mark R. · 97
Chen, Bin · 82
Chen, Flavia · 113
Chen, Michael L. · 83
Chen, Rong · 10, 42
Chen, Xiao · 88
Chen, Xintong · 2
Chen, Youdinghuan · 71
Chen, Yuying · 82
Cheng, Chao · 71
Cheng, Jun · 108
Cheng, Liang · 108
Chesi, Alessandra · 35
Cheung, Philip · 67
Chi, Chih-Lin · 26
Chidester, Benjamin · 20
Ching, Travers · 60
Cho, Soo Young · 125
Choe, Eun Kyung · 84
Choi, Jung Kyoon · 78, 96, 100
Christensen, Brock C. · 71
Chu, Dinh-Toi · 122
Chuang, Han-Yu · 88
Clark, Neil R. · 3, 64
Cohen, K. Bretonnel · 37
Cooper, Bruce A. · 93
Cox, Robert W. · 53
Crawford, Dana C. · 57
Crowley, Albert · 110
Currier, Robert J. · 113

D

de Belle, J. Steven · 67
De, Subhajyoti · 109
Deng, Siyuan · 43
Dinger, Marcel E. · 103
Do, Minh N. · 20

131

Doherty, Jennifer A. · 11
Dong, Zhuxin · 94
Dorrestein, Pieter C. · 80
Duan, Qiaonan · 3, 64
Dudley, Joel T. · 3, 10, 12, 42, 64

E

Eckel-Passow, Jeanette E. · 116
Ergin, Oğuz · 77, 90
Erikson, Galina A. · 85

F

Farhat, Maha · 83
Feng, Qianjin · 108
Fenger, Douglas · 67
Fieremans, Els · 53
Fierro, Lily · 51
Fish, Alexandra E. · 57
Fisher, Mark F. · 80
Flotte, T. J. · 72
Flotte, W. · 72
Foster, Ian · 33

G

Gai, X. · 114
Gallagher, Renata C. · 113
Garmire, Lana X. · 60
Geiersbach, K. B. · 72
Gerstein, Mark · 86
Ghose, Saugata · 90
Glahn, David C. · 53
Glicksberg, Benjamin S. · 10, 12, 42, 82
Gong, Li · 119
Gordon, Jonathan · 51
Gouripeddi, Ramkiran · 127
Grant, Gregory · 99
Grant, Struan F. A. · 35
Greene, Casey S. · 6, 11, 68
Greenside, Peyton · 41
Griffith, Malachi · 16
Griffith, Obi L. · 16
Gui, Jiang · 115
Guo, Caiwei · 46
Gupta, Anika · 27
Gurevich, Alexey · 80
Gursoy, Gamze · 86

H

Haas, David W. · 58
Haines, Jonathan L. · 81
Hall, Molly A. · 35
Han, Jiali · 33
Han, Jiawei · 39
Han, Lichy · 54
Han, Zhi · 108
Harrington, Lia X. · 11
Hart, Steven N. · 72, 116
Hartman, Nicholas · 31
Haselgrove, Christian · 110
Hassan, Hasan · 77, 90
He, Lu · 26
He, Mingze · 73
Hernandez-Boussard, Tina · 124
Hiemenz, M. · 114
Hillenmeyer, Maureen · 41
Hobbs, Brian P. · 48
Hodos, Rachel · 3, 12, 64
Hong, L. Elliot · 53
Hornung, Veit · 92
Horton, Iain F. · 116
Houten, Sander · 2
Hu, Jianying · 3, 64
Hu, Xiao · 93
Huang, Chenglong · 21
Huang, Edward W. · 38, 39
Huang, Heng · 22, 107
Huang, Kun · 34, 55, 108, 111, 121
Huang, Ling · 85
Huddart, Rachel · 119
Hudson, Tia Tate · 87
Huff, Chad D. · 105
Hunter, Lawrence E. · 37, 45
Huo, Zhouyuan · 22
Huser, Vojtech · 127
Hwang, Su-Kyeong · 117

I

Ideker, Trey · 39

J

Jahanshad, Neda · 53
Jain, Priyambada · 51
Jenkins, Nicole P. · 71
Jeong, Hyun-Hwan · 46
Ji, Xuemei · 115
Johnson, Abigail · 75
Johnson, Kipp W. · 10, 12
Johnson, Travis S. · 34, 121
Ju, Jin Hyun · 88
Jung, Jae-Yoon · 27

K

Kaang, Bong-Kiun · 117
Kahn, Michael G. · 127
Kamdar, Jeana · 51

132

Kamdar, Maulik R. · 54
Kang, Joon Ho · 65
Kawakubo, Hideko · 89, 101
Kennedy, David · 110
Kennedy, Eamonn · 94
Kettenbach, Arminja N. · 71
Kho, Johnathan R. · 34, 121
Kidd, Brian · 3, 64
Kim, Dokyoon · 23
Kim, Eun Ji · 99
Kim, Jeremie S. · 90
Kim, Jihyun · 125
Kim, Jinho · 65
Kim, Suh-Ryung · 104
Kim, Sun Ah · 104
Kim, Sungho · 91, 128
Kim, Taehun · 91, 128
Kim-Hellmuth, Sarah · 92
Klein, Teri E. · 119
Knights, Dan · 75
Kober, Kord M. · 93
Kochunov, Peter · 53
Koenig, Barbara A. · 113
Kohane, Isaac S. · 83
Kolmogorov, Mikhail · 94
Krunic, Milica · 95
Kulkarni, Anagha · 62
Kulkarni, Chaitanya · 55, 111
Kulkarni, Shashikant · 16
Kundaje, Anshul · 41
Kvale, Mark · 113
Kwok, Pui · 113

L

La Cava, William · 13
Lahens, Nicholas · 99
Lake, Bethany · 31
Lappalainen, Tuuli · 92
Larcombe, Lee · 129
Larson, Nicholas B. · 116
Lawrence-Dill, Carolyn J. · 73
Le, Duc-Hau · 122
Lee, Boram · 65
Lee, Donghyuk · 90
Lee, Hao-Chih · 3, 64
Lee, Jae-Hyung · 117
Lee, Jin-A · 117
Lee, Jun Hyeong · 78, 96, 100
Lee, Kyungmin · 117
Lee, Sang Woo · 84
Lee, Seunggeun · 23
Lee, Taeyeop · 78, 96, 100
Lee, Yong-Seok · 117
Lee, Young Seek · 125
Lei, Xiaoxiao · 51
Lerman, Kristina · 51
Leskovec, Jure · 8, 70
Li, Binglan · 58
Li, Fuhai · 43
Li, Haiquan · 33
Li, Jianrong · 25, 33
Li, Justin · 66
Li, Li · 10, 42
Li, Qike · 25, 30
Li, Sihong · 34, 121
Lim, Chae-Seok · 117
Liu, Gang · 66
Liu, Ke · 82
Liu, Zhandong · 46
Losic, Bojan · 2
Luo, Yunan · 4
Lussier, Yves A. · 25, 30, 33

M

Ma, Jian · 20
Ma, Jianzhu · 39
Ma’ayan, Avi · 3, 64
Machiraju, Raghu · 55, 111
Madhavan, Subha · 16
Maglinte, D. · 114
Mallick, Parag · 55, 111
Mallory, Emily K. · 5, 62
Manduchi, Elisabetta · 35
Mariani, Jessica · 79
Marotti, Jonathan D. · 71
Matsui, Yusuke · 89, 101
Maxwell, Sean · 97
McCoy, Matthew · 16
McDonnell, Shannon K. · 116
McGarvey, Peter · 16
Metaxas, Dimitris · 109
Miaskowski, Christine · 93
Micheel, Christine · 16
Miller, Jason E. · 23
Miller, Todd W. · 71
Miotto, Riccardo · 10
Mishkanian, Ben · 88
Mohammadi, Pejman · 92
Mohimani, Hosein · 80
Molina, Monica Cala · 76
Mooney, Sean D. · 113
Moore, Abigail E. · 98
Moore, Jason H. · 9, 13, 17, 28, 35, 118
Morishita, Hirofumi · 42
Mounajjed, T. · 72
Müller-Myhsok, Bertram · 92
Mustahsan, Zairah · 13
Mutlu, Onur · 77, 90
Mylne, Joshua S. · 80

N

Nam, Hyunha · 101
Nayak, Soumyashant · 99
Ng, Chaan S. · 48

133

Nho, Kwangsik · 23, 107
Nichols, Thomas E. · 53
Norgan, A. P. · 72
Novikov, Dmitry S. · 53
Nussbaum, Robert L. · 113

O

O’Driscoll, Caroline · 51
Oh, Jaeho · 78, 96, 100
Olson, Randal S. · 13, 17, 28, 118
Orlenko, Alena · 28, 118
Orzechowski, Patryk · 9, 28, 118
Ostrow, D. · 114

P

Park, Charny · 125
Park, Jung In · 124
Park, Soo Jun · 125
Park, Woong-Yang · 65
Paskov, Kelley M. · 27
Paul, Steven M. · 93
Paulson, Abby · 110
Payne, Philip R. O. · 43
Peng, Jian · 4, 39
Pesce, Lorenzo · 33
Petkovic, Dragutin · 47, 62
Pevzner, Pavel A. · 80, 94
Pham, Van-Huy · 122
Poole, Sarah · 29
Pouladi, Nima · 25
Prasad, G. L. · 66, 129
Preuss, Nina · 110
Previde, Paul · 62
Prjibelski, Andrey · 80
Puck, Jennifer M. · 113
Pütz, Benno · 92
Pyc, Mary A. · 67

Q

Qu, Hui · 109

R

Rachid Zaim, Samir · 30
Rao, Shruti · 16
Ravvaz, Kourosh · 26
Regan, Kelly · 43
Rensi, Stefano E. · 5
Reynolds, Richard C. · 53
Risacher, Shannon L. · 23, 107
Ritchie, Marylyn D. · 14, 58
Ritter, Deborah · 16
Robinson, Jill · 119
Roy, Angshumoy · 16
Rubin, Daniel L. · 124
Ryutov, A. · 114

S

Salas, Lucas A. · 71
Sangkuhi, Katrin · 119
Sarkar, Indra Neil · 50
Saykin, Andrew J. · 23, 107
Schissler, A. Grant · 30
Schmitt, Peter · 17
Schumacher, Johannes · 92
Sebra, Robert P. · 2
Shah, K. K. · 72
Shah, Nigam · 29
Shameer, Khader · 10, 12
Sharma, Vivekanand · 50
Shearer, Gregory · 31
Sheih, Joseph · 113
Shen, Dinggang · 22
Shen, Li · 107, 110
Shestov, Maksim · 17
Shimamura, Teppei · 89, 101
Shin, Hyun-Tae · 65
Shivakumar, Manu K. · 23
Shoemaker, Katherine · 48
Shokhirev, Maxim · 85
Shukla, Dinesh · 53
Shulman, Joshua M. · 46
Sinha, Aakanchha · 51
Sivley, R. Michael · 102
Smarr, Larry · 80
Smith, Milo R. · 42
Snedecor, June · 88
Son, Le Hoang · 122
Sonkin, Dmitriy · 16
Sontag, David · 3, 64
Srinivasan, Raj · 113
Srivastava, Arunima · 55, 111
Stefanski, Adrianne L. · 45
Stewart, Crystal · 51
Stockham, Nate T. · 27
Stolovitzky, Gustavo · 2
Sun, Min Woo · 27
Sunderam, Uma · 113

T

Tenenbaum, Jessica D. · 18
Thomas, Brook · 62
Thompson, Paul M. · 53
Thorn, Caroline · 119
Timp, Gregory · 94
Tintle, Nathan · 31
Tomasini, Livia · 79
Tonellato, Peter J. · 26

134

Torpy, James R. · 103
Tran, Quynh T. · 129
Travers, Matthew · 110
Triche, T. · 114
Tripodi, Ignacio · 45
Tully, Tim · 67
Turnbaugh, Peter J. · 5

U

Urban, Alexander E. · 79

V

Vaccarino, Flora M. · 79
Van Horn, John Darrell · 51
Vangay, Pajau · 75
Varik, Akshay · 13
Veraart, Jelle · 53
Verbeke, Lieven · 122
Verma, Anurag · 58
Verma, Shefali S. · 58
Veturi, Yogasudha C. · 14, 58
Vigil, Arthur · 47
von Haeseler, Arndt · 95

W

Wall, Dennis P. · 27
Wang, Fei · 3, 64
Wang, Liewei · 28, 118
Wang, Sheng · 4, 38, 39
Wang, Yaqiong · 113
Wang, Yue · 71
Wang, Ze · 107
Wang, Zichen · 3, 64
Watson, Dennis K. · 98
Watson, Patricia M. · 98
Way, Gregory P. · 6, 11, 68
Weinshilboum, Richard M. · 28, 118
Weissert, John · 26
Westra, Jason · 31
Whaley, Ryan M. · 119
Whirl-Carrillo, Michelle · 119
White, Elizabeth K. · 45
Williams-DeVane, ClarLynda · 87
Wilson, Robert C. · 98
Wong, Mike · 47, 62
Woon, Mark · 119

X

Xiao, Guanghua · 21
Xiao, Jinfeng · 4
Xin, Hongyi · 77, 90
Xu, Jielin · 43

Y

Yalamanchili, Hari Krishna · 46
Yang, Jung-eun · 117
Yao, Xiaohui · 107
Yoo, Yun Joo · 104
Yu, Michael Ku · 39
Yu, Yao · 105
Yun, Jae Won · 65

Z

Zeng, William · 82
Zhai, ChengXiang · 38
Zhang, Albert · 21
Zhang, Jie · 108
Zhang, Ping · 3, 64
Zhang, Yan · 34, 121
Zheng, Brandon · 98
Zheng, Fan · 39
Zhou, Bo · 79
Zitnik, Marinka · 8, 70
Zou, Yangyun · 113
Zuluaga, Martha · 76