
PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Papers are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.


TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 1

IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES ..... 2
Nathan Bowerman, Nathan Tintle, Matthew DeJongh, Aaron A. Best

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS? ..... 3
Mengfei Cao, Lenore J. Cowen

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 4
Sheng Wang, Meng Qu, Jian Peng

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES ..... 5
Christian Wiwie, Richard Röttger

IMAGING GENOMICS ..... 6

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS ..... 7
Chao Wang, Hai Su, Lin Yang, Kun Huang

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL ..... 8
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK ..... 9
Pascal Zille, Vince D. Calhoun, Yu-Ping Wang

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH ..... 10

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS ..... 11
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE ..... 12
Emre Guney

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY ..... 13
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS ..... 14
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES ..... 15
Shan Yang, Melissa Cline, Can Zhang, Benedict Paten, Stephen E. Lincoln


PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 16

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES ..... 17
Vibhu Agarwal, Nigam H. Shah

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA ..... 18
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, Molly Ni'Suilleabhain

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS ..... 19
Brett K. Beaulieu-Jones, Jason H. Moore, The Pooled Resource Open-Access ALS Clinical Trials Consortium

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS ..... 20
Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger, Dana C. Crawford, Melinda C. Aldrich, Amy Non

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS ..... 21
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT ..... 22
Khader Shameer, Kipp W. Johnson, Alexandre Yahi, Riccardo Miotto, Li Li, Doran Ricks, Jebakumar Jebakaran, Patricia Kovatch, Partho P. Sengupta, Annetine Gelijns, Alan Moskovitz, Bruce Darrow, David L. Reich, Andrew Kasarskis, Nicholas P. Tatonetti, Sean Pinney, Joel T. Dudley

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS ..... 23
Nicole Tignor, Pei Wang, Nicholas Genes, Linda Rogers, Steven G. Hershman, Erick R. Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E. Schadt

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS ..... 24
Modest von Korff, Tobias Fink, Thomas Sander

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 25
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 26

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES ..... 27
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA ..... 28
Mette Beck, David Westergaard, Leif Groop, Soren Brunak


HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 29
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION ..... 30
Dan He, Laxmi Parida

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES ..... 31
Gil Speyer, Divya Mahendra, Hai J. Tran, Jeff Kiefer, Stuart L. Schreiber, Paul A. Clemons, Harshil Dhruv, Michael Berens, Seungchan Kim

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER ..... 32
Jeffrey A. Thompson, Carmen J. Marsit

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK ..... 33
Guhan Ram Venkataraman, Chloe O'Connell, Fumiko Egawa, Dorna Kashef-Haghighi, Dennis Paul Wall

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR ..... 34
Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION ..... 35
Laura Wiley, Jacob VanHouten, David Samuels, Melinda Aldrich, Dan Roden, Josh Peterson, Joshua Denny

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 36

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX ..... 37
Brian Aevermann, Jamison McCorrison, Pratap Venepally, Rebecca Hodge, Trygve Bakken, Jeremy Miller, Mark Novotny, Danny N. Tran, Francisco Diez-Fuertes, Lena Christiansen, Fan Zhang, Frank Steemers, Roger S. Lasken, Ed Lein, Nicholas Schork, Richard H. Scheuermann

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES ..... 38
Pablo Cordero, Joshua M. Stuart

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT ..... 39
Kristin I. Fread, William D. Strickland, Garry P. Nolan, Eli R. Zunder


PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

IMAGING GENOMICS ..... 40

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS ..... 41
Chen Gao, Junghi Kim, Wei Pan

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS ..... 42
Zhana Kuncheva, Michelle L. Krishnan, Giovanni Montana

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 43

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION ..... 44
Padideh Danaee, Reza Ghaeini, David Hendrix

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS ..... 45
Jacob M. Keaton, Jacklyn N. Hellwege, Maggie C.Y. Ng, Nicholette D. Palmer, James S. Pankow, Myriam Fornage, James G. Wilson, Adolofo Correa, Laura J. Rasmussen-Torvik, Jerome I. Rotter, Yii-Der I. Chen, Kent D. Taylor, Stephen S. Rich, Lynne E. Wagenknecht, Barry I. Freedman, Donald W. Bowden

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS ..... 46
Madeleine Scott, Francesco Vallania, Purvesh Khatri

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS ..... 47
Ana Stanescu, Gaurav Pandey

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE ..... 48
Kathleen Whiting, Larry Y. Liu, Mehmet Koyutürk, Gunnur Karakurt

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 49

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM ..... 50
Andrew Beck, Alexander Luedtke, Keli Liu, Nathan Tintle

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING ..... 51
Diana Diaz, Michele Donato, Tin Nguyen, Sorin Draghici

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS ..... 52
Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS ..... 53
Halla Kabat, Leo Tunkle, Inhan Lee

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES ..... 54
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle


METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT ..... 55
Pei Fen Kuan, Junyan Song, Shuyao He

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS ..... 56
Meng Ma, Changchang Wang, Benjamin Glicksberg, Eric E. Schadt, Shuyu Li, Rong Chen

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS ..... 57
André Schultz, Sanket Mehta, Chenyue W. Hu, Fieke W. Hoff, Terzah M. Horton, Steven M. Kornblau, Amina A. Qutub

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 58

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING ..... 59
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG ..... 60
Kimberly R. Kanigel Winner, James C. Costello

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA ..... 61
Juho Kim, Nate Russell, Jian Peng

POSTER PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ..... 62

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM ..... 63
Ernesto Borrayo, Ryoko Machida-Hirano

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS ..... 64
Jingyi Jessica Li, Guo-Liang Chew, Mark D. Biggin

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ..... 65
Sheng Wang, Meng Qu, Jian Peng

GENERAL ..... 66

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS ..... 67
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS ..... 68
Yongsheng Bai, Naureen Aslam, Ali Salman

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY ..... 69
Chengsheng Zhu, Yannick Mahlich, Yana Bromberg


THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN ..... 70
Frank C. Brosius, Wenjun Ju, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, Debbie Gipson, Jennifer Hawkins, Julia Herzog, Susan Massengill, Richard C. McEachin, Subramaniam Pennathur, Kalyani Perumal, Roger Wiggins, Matthias Kretzler

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE ..... 71
Danai Chasioti, Xiaohui Yao, Pengyue Zhang, Xia Ning, Lang Li, Li Shen

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN ..... 72
Aslihan Dincer, Eric E. Schadt, Bin Zhang, Joel T. Dudley, Davin Gavin, Schahram Akbarian

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER ..... 73
Jennifer M. Franks, Guoshuai Cai, Jaclyn N. Taroni, Michael L. Whitfield

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA ..... 74
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR ..... 75
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH ..... 76
Austin Huang, Dmitri Bichko, Mathieu Boespflug, Edsko de Vries, Facundo Dominguez, Daniel Ziemek

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES ..... 77
Jeremie Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL ..... 78
Melissa E. Ko, Charis Teh, Christopher S. Playter, Eli R. Zunder, Daniel H. Gray, Wendy J. Fantl, Sylvia K. Plevritis, Garry P. Nolan

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE ..... 79
Emily K. Mallory, Chris Re, Russ B. Altman

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING ..... 80
Serghei Mangul, Igor Mandric, Harry Taegyun Yang, Dennis Montoya, Nicolas Strauli, Jeremy Rotman, Benjamin Statz, Will Van Der Wey, Alex Zelikovsky, Roberto Spreafico, Maura Rossetti, Sagiv Shifman, Mark Ansel, Noah Zaitlen, Eleazar Eskin

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL ..... 81
Neil Miller, Greyson Twist, Byunggil Yoo, Andrea Gaedigk

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE ..... 82
Vikas Pejaver, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY ..... 83
Sergei Pond, Steven Weaver, Joel Wertheim, Andrew J. Leigh Brown


THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ..... 84
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS ..... 85
Yingxue Ren, Joseph S. Reddy, Vivekananda Sarangi, Jason P. Sinnwell, Steve G. Younkin, Nilüfer Ertekin-Taner, Owen A. Ross, Rosa Rademakers, Shannon K. McDonnell, Joanna M. Biernacka, Yan W. Asmann

TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ ..... 86
Pamela Russell, Richard Radcliffe, Brian Vestal, Wen Shi, Pratyaydipta Rudra, Laura Saba, Katerina Kechris

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS ..... 87
Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER ..... 88
Kyle Smith, Subhajyoti De, Debashis Gosh

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS ..... 89
Abiodun Otolorin, Nana Osafo, William Southerland

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY ..... 90
Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA ..... 91
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

IMAGING GENOMICS ..... 92

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA ..... 93
Dongdong Lin, Vince D. Calhoun, Juan R. Bustillo, Nora Perrone-Bizzozero, Jingyu Liu

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS ..... 94
Olga V. Matveeva, Nafisa N. Nazipova, Aleksey Y. Ogurtsov, Svetlana A. Shabalina

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ..... 95

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT ..... 96
Alyssa I. Clay, Richard M. Weinshilboum, K. Sreekumaran Nair, Rima F. Kaddurah-Daouk, Liewei Wang, Matthew K. Breitenstein

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS ..... 97
Stephen V. Gliske, Katy L. Lau, Benjamin H. Brinkman, Greg A. Worrell, Cris G. Fink, William C. Stacey

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS ..... 98
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS ..... 99
Modest von Korff, Tobias Fink, Thomas Sander


PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ..... 100

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION ..... 101
Steven E. Brenner, Gaia Andreoletti, Roger A. Hoskins, John Moult, CAGI Participants

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C1 ..... 102
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ..... 103
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA ..... 104
Rachel Goldfeder, Euan Ashley

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE ..... 105
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT) ..... 106
T.E. Klein, M. Whirl-Carrillo, R.M. Whaley, M. Woon, K. Sangkuhl, Lester G. Carter, H.M. Dunnenberger, P.E. Empey, A.T. Frase, R.R. Freimuth, A. Gaedigk, A. Gordon, C. Haidar, J.K. Hicks, J.M. Hoffman, M.T. Lee, N. Miller, S.D. Mooney, T.N. Person, J.F. Peterson, M.V. Relling, S.A. Scott, G. Twist, A. Verma, M.S. Williams, C. Wu, W. Yang, M.D. Ritchie

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA ..... 107
Sarathbabu Krishnamurthy, Diane Smelser, Manickam Kandamurugu, Joseph Leader, Noura S. Abul-Husn, Alan R. Shuldiner, David H. Ledbetter, Frederick E. Dewey, David J. Carey, Michael F. Murray, Raghu P.R. Metpally

INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI ..... 108
Nicholas B. Larson, Shannon McDonnell, Zach Fogarty, Melissa Larson, John Cheville, Shaun Riska, Saurabh Baheti, Asha A. Nair, Daniel O’Brien, Jaime Davila, Daniel Schaid, Stephen N. Thibodeau

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES ..... 109
Jason E. McDermott, Tao Liu, Samuel Payne, Vladislav Petyuk, Richard Smith, Philipp Mertins, Steven Carr, Karin Rodland

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS ..... 110
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES ..... 111
Hyun-Tae Shin, Jae Won Yun, Nayoung K.D. Kim, Yoon-La Choi, Woong-Yang Park, Peter J. Park

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS ..... 112
Jeffrey A. Thompson, Carmen J. Marsit

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE ..... 113
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller


INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE ..... 114
David S. Wishart, Ana Marcu, An Chi Guo, Ash Anwar, Solveig Johannessen, Craig Knox, Michael Wilson, Christoph H. Borchers, Pieter Cullis, Robert Fraser

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES ..... 115
Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Sean D. Mooney, Andrew I. Su, Chunlei Wu

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ..... 116

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS ..... 117
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA ..... 118
Tyler J. Burns, Garry P. Nolan, Nikolay Samusik

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY ..... 119
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION ..... 120
Jonathan A. Rebhahn, Sally A. Quataert, Gaurav Sharma, Tim R. Mosmann

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS ..... 121

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY ..... 122
E. Griffiths, D. Dooley, C. Bertelli, J. Adam, F. Bristow, T. Matthews, A. Petkau, M. Courtot, J.A. Carriço, A. Keddy, R. Beiko, L.M. Schriml, E. Taboada, M. Graham, G. Van Domselaar, W. Hsiao, F. Brinkman

AUTHOR INDEX ..... 123

1

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES

Nathan Bowerman1, Nathan Tintle2, Matthew DeJongh3, Aaron A. Best1

1Department of Biology, Hope College; 2Department of Mathematics and Statistics, Dordt College; 3Department of Computer Science, Hope College

Aaron Best

With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and implemented a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each of the bacteria, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree, clustering metabolically similar organisms. Resulting analysis of the metabolic tree confirms strong associations with well-established biological results along with direct insight into particular metabolic variants which are most predictive of metabolic diversity. The positive results of this manual curation effort and novel method development suggest that future work is needed to further expand the set of bacteria to which this approach is applied and use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.
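
As an illustration only, the idea of turning presence/absence signatures into a tree can be sketched as follows. The abstract does not specify the distance or clustering method, so the Jaccard distance and average-linkage merging used here are assumptions, and the organism names and variant identifiers are hypothetical toy data rather than the 101 curated genomes.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of metabolic variants."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def leaves(tree):
    """Organism names under a tree node (a name or a pair of subtrees)."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cluster_distance(left, right, signatures):
    """Average pairwise Jaccard distance between two clusters."""
    dists = [
        jaccard_distance(signatures[x], signatures[y])
        for x in leaves(left)
        for y in leaves(right)
    ]
    return sum(dists) / len(dists)


def average_linkage_tree(signatures):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = sorted(signatures)  # start with one cluster per organism
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], signatures),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical toy signatures: organism -> set of variant identifiers present.
signatures = {
    "org_A": {"v1", "v2", "v3"},
    "org_B": {"v1", "v2", "v4"},
    "org_C": {"v7", "v8"},
    "org_D": {"v7", "v8", "v9"},
}
tree = average_linkage_tree(signatures)
```

In this toy case the two organisms sharing v7 and v8 are merged first, and the resulting nested pairs play the role of a metabolic tree; a real analysis would likely use an established clustering or phylogenetics package.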

3

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS?

Mengfei Cao, Lenore J. Cowen

Tufts University

Lenore Cowen

Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous pairs of genes in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State-of-the-art methods for this problem will determine the evolutionary history of the paralogs with reference to multiple related species. Here, we ask a first and simpler question: we explore to what extent any computational method with access only to data from a single species can solve this problem. We consider divergence data (at both the amino acid and nucleotide levels), and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask if we can extract features from these data that can distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree or centrality or shortest path or diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. One should, in general, not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.
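
To make the kinds of single-species features mentioned above concrete, here is a minimal sketch that computes a sequence divergence value and two simple network measures for one paralog pair. The exact feature definitions used by the authors are not given in the abstract, so the p-distance and the Jaccard form of shared neighborhood are assumptions, and the sequences, protein names, and edges are made up for illustration.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def build_adjacency(edges):
    """Undirected adjacency sets from a list of interaction pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shared_neighborhood(adjacency, u, v):
    """Jaccard overlap of the interaction partners of u and v, excluding u and v."""
    nu = adjacency.get(u, set()) - {u, v}
    nv = adjacency.get(v, set()) - {u, v}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical toy data: an aligned paralog pair and a tiny PPI network.
edges = [("P1", "X"), ("P1", "Y"), ("P2", "X"), ("P2", "Y"), ("P2", "Z"), ("P1", "P2")]
ppi = build_adjacency(edges)
features = {
    "aa_divergence": p_distance("MKT-LLVA", "MKSALLIA"),
    "shared_neighbors": shared_neighborhood(ppi, "P1", "P2"),
    "degree_difference": abs(len(ppi["P1"]) - len(ppi["P2"])),
}
```

A feature dictionary like this, computed for every pair in a benchmark, could then be passed to any standard classifier to separate functionally similar from functionally divergent pairs.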

4

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois Urbana-Champaign

Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.

5

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

Christian Wiwie, Richard Röttger

University of Southern Denmark

Richard Röttger

Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown dataset and aimed to also suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are also available online at http://proteinclustering.compbio.sdu.dk/

6

IMAGING GENOMICS


7

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS

Chao Wang1, Hai Su2, Lin Yang2, Kun Huang1

1The Ohio State University, 2University of Florida

Kun Huang

Lung cancer is one of the most deadly cancers and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with strong correlation with patient outcome. We then modeled the morphology feature using multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link the genetic variations and functional genomics to LUAD histology. These observations will lead to new insight on lung cancer development and potential new integrative biomarkers for predicting patient prognosis and response to treatments.

8

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL

Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

Indiana University

Jingwen Yan

Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer's disease (AD). But the underlying associations that contribute to such a complementary relationship have not been previously studied. In this work, we will perform an imaging proteomics association analysis to explore how they are related to each other. While traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations, we propose a novel discriminative SCCA (denoted as DSCCA) model with new penalty terms to account for the disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, such that we can not only identify disease-relevant multimodal biomarkers, but also reveal strong associations between them. Based on a real imaging proteomic dataset, the empirical results show that DSCCA and traditional SCCA have comparable association performances. But in a further classification analysis, canonical variables of imaging and proteomic data obtained in DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained in SCCA.

9

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK

Pascal Zille1, Vince D. Calhoun2, Yu-Ping Wang1

1Tulane University, 2University of New Mexico

Pascal Zille

We consider the problem of multimodal data integration for the study of complex neurological diseases (e.g. schizophrenia). Among the challenges arising in such situations, estimating the link between genetic and neurological variability within a population sample has been a promising direction. A wide variety of statistical models arose from such applications. For example, Lasso regression and its multitask extension are often used to fit a multivariate linear relationship between given phenotype(s) and associated observations. Other approaches, such as canonical correlation analysis (CCA), are widely used to extract relationships between sets of variables from different modalities. In this paper, we propose an exploratory multivariate method combining these two methods. More specifically, we rely on a 'CCA-type' formulation in order to regularize the classical multimodal Lasso regression problem. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first evaluate the method on a simulated dataset, and further validate it using Single Nucleotide Polymorphisms (SNP) and functional Magnetic Resonance Imaging (fMRI) data for the study of schizophrenia.

10

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH


11

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS

Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

Icahn Institute and Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai

Ariella Cohain

Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework for not only inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets, whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
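
The abstract does not define its edge-to-edge reproducibility measure. As a rough sketch under that caveat, one simple way to compare a subsampled network with the full-data network is to count shared directed edges. The function name, the two reported fractions, and the gene labels below are all assumptions made for illustration.

```python
def edge_reproducibility(full_edges, subsample_edges):
    """Compare two networks given as collections of directed (parent, child) edges.

    Returns (recovered, confirmed):
      recovered = fraction of full-data edges also found in the subsample network
      confirmed = fraction of subsample edges that also exist in the full-data network
    """
    full, sub = set(full_edges), set(subsample_edges)
    shared = full & sub
    recovered = len(shared) / len(full) if full else 0.0
    confirmed = len(shared) / len(sub) if sub else 0.0
    return recovered, confirmed


# Hypothetical networks; note that ("g4", "g3") reverses a full-data edge
# and therefore does not count as reproduced.
full_network = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
subsampled_network = [("g1", "g2"), ("g2", "g3"), ("g4", "g3")]
recovered, confirmed = edge_reproducibility(full_network, subsampled_network)
```

Repeating this comparison across subsample sizes would give a curve of reproducibility against sample size, which is the kind of dependence the abstract reports.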

12

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE

Emre Guney

Joint IRB-BSC-CRG Program in Computational Biology - Institute for Research in Biomedicine (IRB) Barcelona

Emre Guney

Repurposing existing drugs for new uses has attracted considerable attention over the past years. To identify potential candidates that could be repositioned for a new indication, many studies make use of chemical, target, and side effect similarity between drugs to train classifiers. Despite promising prediction accuracies of these supervised computational models, their use in practice, such as for rare diseases, is hindered by the assumption that there are already known and similar drugs for a given condition of interest. In this study, using publicly available data sets, we question the prediction accuracies of supervised approaches based on drug similarity when the drugs in the training and the test set are completely disjoint. We first build a Python platform to generate reproducible similarity-based drug repurposing models. Next, we show that, while a simple chemical, target, and side effect similarity based machine learning method can achieve good performance on the benchmark data set, the prediction performance drops sharply when the drugs in the folds of the cross validation are not overlapping and the similarity information within the training and test sets is used independently. These intriguing results suggest revisiting the assumptions underlying the validation scenarios of similarity-based methods and underline the need for unsupervised approaches to identify novel drug uses inside the unexplored pharmacological space. We make the digital notebook containing the Python code to replicate our analysis that involves the drug repurposing platform based on machine learning models and the proposed disjoint cross fold generation method freely available at github.com/emreg00/repurpose.
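
The idea of drug-disjoint folds can be illustrated with a minimal sketch. This is not the code from the repository named above: the function name, the round-robin assignment, and the sample pairs are assumptions chosen only to show that every drug lands in exactly one fold.

```python
import random


def disjoint_folds(pairs, n_folds, seed=0):
    """Split (drug, disease) pairs into folds so that no drug appears in two folds."""
    drugs = sorted({drug for drug, _ in pairs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(drugs)
    # Assign whole drugs to folds round-robin, then route each pair by its drug.
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    folds = [[] for _ in range(n_folds)]
    for drug, disease in pairs:
        folds[fold_of[drug]].append((drug, disease))
    return folds


# Hypothetical drug-disease pairs.
pairs = [
    ("drug1", "diseaseA"), ("drug1", "diseaseB"),
    ("drug2", "diseaseA"), ("drug3", "diseaseC"),
    ("drug4", "diseaseA"), ("drug4", "diseaseC"),
]
folds = disjoint_folds(pairs, n_folds=2)
```

In an ordinary pair-level split, ("drug1", "diseaseA") and ("drug1", "diseaseB") could fall on opposite sides of the train/test boundary, which lets the classifier see the test drug during training. Splitting by drug removes that overlap.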

13

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY

Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

Stanford University

Winston Haynes

A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.

14

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

Seven Bridges Genomics

Gaurav Kaushik

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.

15

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES

Shan Yang1, Melissa Cline2, Can Zhang2, Benedict Paten2, Stephen E. Lincoln1

1Invitae, 2University of California Santa Cruz

Stephen Lincoln

Open sharing of clinical genetic data promises to both monitor and eventually improve the reproducibility of variant interpretation among clinical testing laboratories. A significant public data resource has been developed by the NIH ClinVar initiative, which includes submissions from hundreds of laboratories and clinics worldwide. We analyzed a subset of ClinVar data focused on specific clinical areas and we find high reproducibility (>90% concordance) among labs, although challenges for the community are clearly identified in this dataset. We further review results for the commonly tested BRCA1 and BRCA2 genes, which show even higher concordance, although the significant fragmentation of data into different silos presents an ongoing challenge now being addressed by the BRCA Exchange. We encourage all laboratories and clinics to contribute to these important resources.

16

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?


17

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES

Vibhu Agarwal1, Nigam H. Shah2

1Biomedical Informatics Training Program, Stanford University, 2The Center for Biomedical Informatics Research, Stanford University

Vibhu Agarwal

There is heterogeneity in the manifestation of diseases; therefore it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data for identifying sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their Creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in published literature and clinical practice related to CKD.

18

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA

Harish Babu Arunachalam1, Rashika Mishra1, Bogdan Armaselu1, Ovidiu Daescu1, Maria Martinez1, Patrick Leavey1, Dinesh Rakheja2, Kevin Cederberg2, Anita Sengupta2, Molly Ni'Suilleabhain2

1University of Texas at Dallas, 2University of Texas Southwestern Medical Center

Harish Babu Arunachalam

Osteosarcoma is one of the most common types of bone cancer in children. To gauge the extent of cancer treatment response in the patient after surgical resection, the H&E stained image slides are manually evaluated by pathologists to estimate the percentage of necrosis, a time consuming process prone to observer bias and inaccuracy. Digital image analysis is a potential method to automate this process, thus saving time and providing a more accurate evaluation. The slides are scanned in Aperio Scanscope, converted to digital Whole Slide Images (WSIs) and stored in SVS format. These are high resolution images, of the order of 10^9 pixels, allowing up to 40X magnification factor. This paper proposes an image segmentation and analysis technique for segmenting tumor and non-tumor regions in histopathological WSIs of osteosarcoma datasets. Our approach is a combination of pixel-based and object-based methods which utilize tumor properties such as nuclei cluster, density, and circularity to classify tumor regions as viable and non-viable. A K-Means clustering technique is used for tumor isolation using color normalization, followed by multi-threshold Otsu segmentation technique to further classify tumor region as viable and non-viable. Then a Flood-fill algorithm is applied to cluster similar pixels into cellular objects and compute cluster data for further analysis of regions under study. To the best of our knowledge this is the first comprehensive solution that is able to produce such a classification for Osteosarcoma cancer. The results are very conclusive in identifying viable and non-viable tumor regions. In our experiments, the accuracy of the discussed approach is 100% in viable tumor and coagulative necrosis identification while it is around 90% for fibrosis and acellular/hypocellular tumor osteoid, for all the sampled datasets used. We expect the developed software to lead to a significant increase in accuracy and decrease in inter-observer variability in assessment of necrosis by the pathologists and a reduction in the time spent by the pathologists in such assessments.

19

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS

Brett K. Beaulieu-Jones1, Jason H. Moore2, The Pooled Resource Open-Access ALS Clinical Trials Consortium

1Genomics and Computational Biology Graduate Group, Computational Genetics Lab, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania; 2Computational Genetics Lab, Institute for Biomedical Informatics, University of Pennsylvania

Brett Beaulieu-Jones

Electronic health records (EHRs) have become a vital source of patient outcome data but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in the EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogenous with time from onset being the most important predictor.

20

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS

Brittany M. Hollister1, Nicole A. Restrepo2, Eric Farber-Eger3, Dana C. Crawford2, Melinda C. Aldrich4, Amy Non5

1Vanderbilt Genetic Institute, Vanderbilt University; 2Institute for Computational Biology and Department of Epidemiology and Biostatistics, Case Western Reserve University; 3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University; 4Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center; 5Department of Anthropology, University of California San Diego

Brittany Hollister

Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
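
For readers unfamiliar with the metric, positive predictive value is the share of algorithm-flagged records that manual review confirms. The sketch below shows that calculation only; the record identifiers are hypothetical and are not taken from the study.

```python
def positive_predictive_value(algorithm_positive, manual_positive):
    """PPV = flagged records confirmed by manual review / all flagged records."""
    flagged = set(algorithm_positive)
    if not flagged:
        raise ValueError("the algorithm flagged no records, so PPV is undefined")
    confirmed = flagged & set(manual_positive)
    return len(confirmed) / len(flagged)


# Hypothetical example for one SES category.
algorithm_flagged = {"rec01", "rec02", "rec03", "rec04", "rec05"}
manually_confirmed = {"rec01", "rec02", "rec03", "rec04", "rec09"}
ppv = positive_predictive_value(algorithm_flagged, manually_confirmed)
```

Here four of the five flagged records are confirmed. Record "rec09" was missed by the algorithm, which would lower sensitivity but does not affect PPV.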

21

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS

Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

University of Virginia

Jack Lanchantin

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

22

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT

Khader Shameer1,2, Kipp W. Johnson1,2, Alexandre Yahi7, Riccardo Miotto1,2, Li Li1,2, Doran Ricks3, Jebakumar Jebakaran4, Patricia Kovatch1,4, Partho P. Sengupta5, Annetine Gelijns8, Alan Moskovitz8, Bruce Darrow5, David L. Reich6, Andrew Kasarskis1, Nicholas P. Tatonetti7, Sean Pinney5, Joel T. Dudley1,2,8*

1Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology; 2Institute of Next Generation Healthcare, Mount Sinai Health System, NY; 3Decision Support, Mount Sinai Health System, NY; 4Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology, NY; 5Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, NY; 6Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, NY; 7Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center, NY; 8Population Health Science and Policy, Mount Sinai Health System, NY

*Corresponding Author, Email: joel.dudley@mssm.edu

Khader Shameer

Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving the outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repetitively assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic-medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We have assessed a large repertoire of variables from electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from EMR including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and prospective clinical trial to evaluate the clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.

23

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS

Nicole Tignor1, Pei Wang1, Nicholas Genes1, Linda Rogers1, Steven G. Hershman2, Erick R. Scott1, Micol Zweig1, Yu-Feng Yvonne Chan1, Eric E. Schadt1

1Icahn School of Medicine at Mount Sinai, 2LifeMap Solutions

Nicole Tignor

In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real time geographic location information. The AMHS is just one of many studies occurring in the context of now many thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes such as asthma exacerbations or hospitalizations from these types of large, prospectively collected data sets, would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time of each user, and 3) non-overlapping periods of enrollment among different users. To handle such complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing value through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. Further validation of the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.

24

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS

Modest von Korff, Tobias Fink, Thomas Sander

Research Information Management, Actelion Pharmaceuticals Ltd.

Modest von Korff

A new computational method is presented to extract disease patterns from heterogeneous and text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were transferred into a matrix Mdata. In this matrix, a disease was represented by a row and a gene by a column. Each field in the matrix represented the publication count for a co-occurring disease–gene pair. A second matrix with identical dimensions Mrelevance was derived from Mdata. To create Mrelevance the values from Mdata were normalized. The normalized values were multiplied by the column-wise calculated Gini coefficient. This multiplication resulted in a relevance estimator for every gene in relation to a disease. From Mrelevance the similarities between all row vectors were calculated. The resulting similarity matrix Srelevance related 5,000 diseases by the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail for the validation of the disease patterns and the relevant genes. Cytoscape was used to visualize and to analyze Mrelevance and Srelevance together with the genes and diseases. Summarizing the results, it can be stated that the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encoded key proteins and potential targets for drug discovery projects.
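
The construction of Mrelevance can be sketched in a few lines, with two caveats: the abstract does not state which normalization was used or on which values the Gini coefficient was computed. The sketch below assumes each disease row is divided by its row sum and that the Gini coefficient is taken over each normalized gene column; both choices, and the toy counts, are assumptions for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = evenly spread, higher = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def relevance_matrix(m_data):
    """Rows are diseases, columns are genes; returns normalized counts times column Gini."""
    # Assumption: normalize each disease row so that it sums to 1.
    normalized = []
    for row in m_data:
        row_sum = sum(row)
        normalized.append([v / row_sum if row_sum else 0.0 for v in row])
    n_genes = len(m_data[0])
    # Assumption: the column-wise Gini coefficient is computed on the normalized values.
    column_gini = [gini([row[j] for row in normalized]) for j in range(n_genes)]
    return [[v * g for v, g in zip(row, column_gini)] for row in normalized]


# Toy publication counts: gene 0 co-occurs with both diseases, gene 1 only with the first.
m_data = [[4, 4],
          [4, 0]]
m_relevance = relevance_matrix(m_data)
```

A gene mentioned with every disease has a low Gini coefficient and is down-weighted, while a gene concentrated on a few diseases keeps more of its normalized count. In the toy matrix, gene 1 therefore scores higher than gene 0 for the first disease even though the raw counts are equal.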

25

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS

Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

Baylor College of Medicine

Stephen Wilson

Advances in cellular, molecular, and disease biology depend on the comprehensive characterization of gene interactions and pathways. Traditionally, these pathways are curated manually, limiting their efficient annotation and, potentially, reinforcing field-specific bias. Here, in order to test objective and automated identification of functionally cooperative genes, we compared a novel algorithm with three established methods to search for communities within gene interaction networks. Communities identified by the novel approach and by one of the established methods overlapped significantly (q<0.1) with control pathways. With respect to disease, these communities were biased to genes with pathogenic variants in ClinVar (p<<0.01), and often genes from the same community were co-expressed, including in breast cancers. The interesting subset of novel communities, defined by poor overlap to control pathways, also contained co-expressed genes, consistent with a possible functional role. This work shows that community detection based on topological features of networks suggests new, biologically meaningful groupings of genes that, in turn, point to health and disease relevant hypotheses.

26

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES


27

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES

Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

Geisinger Health System, University of Vermont

Christopher Bauer

The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal components analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables and reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotype may yield more insights into the genetic architecture of complex traits and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As the data of EHR continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.

28

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA

Mette Beck1, David Westergaard1, Leif Groop2, Soren Brunak1

1Novo Nordisk Foundation Center for Protein Research; 2Lund University Diabetes Centre, Department of Clinical Sciences

Mette Beck

Most studies of disease etiologies focus on one disease only and not the full spectrum of multimorbidities that many patients have. Some disease pairs have shared causal origins, others represent common follow-on diseases, while yet other co-occurring diseases may manifest themselves in random order of appearance. We discuss these different types of disease co-occurrences, and use the two diseases “sleep apnea” and “diabetes” to showcase the approach, which otherwise can be applied to any disease pair. We benefit from seven million electronic medical records covering the entire population of Denmark for more than 20 years. Sleep apnea is the most common sleep-related breathing disorder and it has previously been shown to be bidirectionally linked to diabetes, meaning that each disease increases the risk of acquiring the other. We confirm that there is no significant temporal relationship, as approximately half of patients with both diseases are diagnosed with diabetes first. However, we also show that patients diagnosed with diabetes before sleep apnea have a higher disease burden compared to patients diagnosed with sleep apnea before diabetes. The study clearly demonstrates that it is not only the diagnoses in the patient’s disease history that are important, but also the specific order in which these diagnoses are given that matters in terms of outcome. We suggest that this should be considered for patient stratification.
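The statement that roughly half of the patients with both diseases receive the diabetes diagnosis first can be read as a test of a proportion against 0.5. The sketch below is one simple way to frame such a check, an exact two-sided binomial test written with the standard library only. It is an illustration and not the authors' analysis; the function name `binomial_two_sided_p` and the counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Hypothetical counts for illustration only, not taken from the study.
diabetes_first = 52   # patients diagnosed with diabetes before sleep apnea
both_diseases = 100   # patients diagnosed with both diseases
p_value = binomial_two_sided_p(diabetes_first, both_diseases)
print(f"p = {p_value:.3f}")
```

A large p-value here would be consistent with no preferred order of diagnosis; it says nothing about the difference in disease burden between the two orders, which requires a separate comparison.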

29

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.

30

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION

Dan He, Laxmi Parida

IBM Thomas J. Watson Research Center

Dan He

Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, as well as in genetic epidemiology, for example of complex diseases. The prediction can be very helpful in developing breeding strategies and is crucial for translating findings in genetics to precision medicine. Epistasis, the phenomenon where SNPs interact with each other, has been studied extensively in Genome Wide Association Studies (GWAS) but has received relatively less attention for quantitative genetic trait prediction. As the number of possible interactions is generally extremely large, even considering pairwise interactions is very challenging. To our knowledge, there is no solid solution yet to utilize epistasis to improve genetic trait prediction. In this work, we studied the multi-locus epistasis problem, where interactions among more than two SNPs are considered. We developed an efficient algorithm, MUSE, to improve genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based and we proposed a few different sampling strategies. Our experiments on real data showed that MUSE is not only efficient but also effective in improving genetic trait prediction. MUSE also achieved very significant improvements on a real plant dataset as well as a real human dataset.

31

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES

Gil Speyer1, Divya Mahendra1, Hai J. Tran1, Jeff Kiefer1, Stuart L. Schreiber2, Paul A. Clemons2, Harshil Dhruv1, Michael Berens1, Seungchan Kim1

1The Translational Genomics Research Institute, 2Broad Institute of Harvard and MIT

Seungchan Kim

The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profiles of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interactions, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between the two distributions of network likelihoods that can be assessed for significance using permutation tests. Resulting differential dependency networks were then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in certain cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways leading to 26,822 compound-pathway-mediator triplets. By incorporating the STITCH and STRING databases, we could construct evidence networks for 14,415 compound-pathway-mediator triplets for support. The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells’ drug response as well as in designing experiments for the purpose of personalized treatment regimens.
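The significance step described above, assessing a between-condition statistic with permutation tests, can be sketched generically as follows. This is not EDDY: in EDDY the statistic is a divergence between two distributions of network likelihoods, whereas the stand-in statistic here (`abs_mean_diff`) is an assumption chosen only to keep the example self-contained. All names and data are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, statistic, n_perm=1000, seed=0):
    """Estimate how often a label-shuffled split gives a statistic at
    least as large as the one observed for the real split."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if statistic(perm_a, perm_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def abs_mean_diff(a, b):
    """Placeholder statistic: absolute difference of group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Hypothetical per-cell-line scores for sensitive vs. non-sensitive lines.
sensitive = [10, 11, 12, 13, 14]
non_sensitive = [0, 1, 2, 3, 4]
p_separated = permutation_p_value(sensitive, non_sensitive, abs_mean_diff)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3], abs_mean_diff)
```

Swapping in a different `statistic` callable leaves the shuffling logic unchanged, which is why permutation testing suits statistics without a known null distribution.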

32

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER

Jeffrey A. Thompson1, Carmen J. Marsit2

1Dartmouth College, 2Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which combines molecular and clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and with a high level of performance. Furthermore, the proposed process of data integration captures relationships in the data that represent highly disease-relevant functions.

33

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK

Guhan Ram Venkataraman1, Chloe O'Connell1, Fumiko Egawa2, Dorna Kashef-Haghighi1, Dennis Paul Wall1

1Stanford University, 2St. George's University

Fumiko Egawa

Autism has been shown to have a major genetic risk component; the architecture of documented autism in families has repeatedly been shown to be passed down for generations. While inherited risk plays an important role in the autistic nature of children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.

34

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR

Shefali S. Verma1, Anastasia M. Lucas1, Daniel R. Lavage1, Joseph B. Leader1, Raghu Metpally2, Sarathbabu Krishnamurthy1, Frederick Dewey1, Ingrid Borecki1, Alexander Lopez3, John Overton3, John Penn3, Jeffrey Reid3, Sarah A. Pendergrass1, Gerda Breitwieser2, Marylyn D. Ritchie1

1Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 2Department of Functional and Molecular Genomics, Geisinger Health System, Danville, PA; 3Regeneron Genetics Center, Tarrytown, NY

Shefali Setia Verma

A wide range of patient health data is recorded in Electronic Health Records (EHR). This data includes diagnoses, surgical procedures, clinical laboratory measurements, and medication information. Together this information reflects the patient’s medical history. Many studies have efficiently used this data from the EHR to find associations that are clinically relevant, either by utilizing International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms to extract case and control status with accuracy from the EHR. Here we developed a strategy to utilize longitudinal quantitative trait data from the EHR at Geisinger Health System, focusing on outpatient metabolic and complete blood panel data as a starting point. The Comprehensive Metabolic Panel (CMP) and Complete Blood Counts (CBC) are parts of routine care and provide a comprehensive picture from high-level screening of patients’ overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) with median values of 25 different clinical laboratory measurements to identify variants from Human Omni Express Exome beadchip data that are associated with these measurements. We identified 687 variants that associated and replicated with the tested clinical measurements at p < 5 x 10^-8. Since longitudinal data from the EHR provides a record of a patient’s medical history, we utilized this information to further investigate the ICD-9 codes that might be associated with differences in variability of the measurements in the longitudinal dataset. We identified low and high variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups: a high variance and a low variance group for each lab variable). We then performed a PheWAS analysis with ICD-9 diagnosis codes, separately in the high variance group and the low variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high and low variance groups. For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (Type-I diabetes) in patients with high variance of plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance for aspartate aminotransferase (discovery p-value: 8.71 x 10^-9 and replication p-value: 2.03 x 10^-6). In general, we see a pattern of many more statistically significant associations from patients with high variance in the quantitative lab variables, in comparison with the low variance group across all of the 25 laboratory measurements. This study is one of the first of its kind to utilize quantitative trait variance from longitudinal laboratory data to find associations among genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.

35

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION

Laura Wiley1, Jacob VanHouten2, David Samuels2, Melinda Aldrich3, Dan Roden2, Josh Peterson2, Joshua Denny2

1University of Colorado, 2Vanderbilt University, 3Vanderbilt University Medical Center

Laura Wiley

The blood thinner warfarin has a narrow therapeutic range and high inter- and intra-patient variability in therapeutic doses. Several studies have shown that pharmacogenomic variants help predict stable warfarin dosing. However, retrospective and randomized controlled trials that employ dosing algorithms incorporating pharmacogenomic variants underperform in African Americans. This study sought to determine if: 1) including additional variants associated with warfarin dose in African Americans, 2) predicting within single ancestry groups rather than a combined population, or 3) using percentage African ancestry rather than observed race, would improve warfarin dosing algorithms in African Americans. Using BioVU, the Vanderbilt University Medical Center biobank linked to electronic medical records, we compared 25 modeling strategies to existing algorithms using a cohort of 2,181 warfarin users (1,928 whites, 253 blacks). We found that approaches incorporating additional variants increased model accuracy, but not in clinically significant ways. Race stratification increased model fidelity for African Americans, but the improvement was small and not likely to be clinically significant. Use of percent African ancestry improved model fit in the context of race misclassification.

36

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY


37

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX

Brian Aevermann1, Jamison McCorrison1, Pratap Venepally1, Rebecca Hodge2, Trygve Bakken2, Jeremy Miller2, Mark Novotny1, Danny N. Tran1, Francisco Diez-Fuertes3, Lena Christiansen4, Fan Zhang4, Frank Steemers4, Roger S. Lasken1, Ed Lein2, Nicholas Schork1, Richard H. Scheuermann1

1J. Craig Venter Institute, 2Allen Institute for Brain Science, 3Instituto de Salud Carlos III, 4Illumina, Inc.

Richard Scheuermann

Next generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, the fact that the procedure begins with a relatively small amount of starting material, thereby pushing the limits of the laboratory procedures required, dictates that careful approaches for sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample-level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We applied this initial framework to a dataset comprising 2272 single nuclei RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor quality samples provides for an objective and scalable approach to sc/nRNA-seq quality control.

38

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES

Pablo Cordero, Joshua M. Stuart

UC Santa Cruz Genomics Institute, University of California, Santa Cruz

Pablo Cordero

The availability of gene expression data at the single cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns that are associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to the interpretation of a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation including putative co-factors for the multi-functional NFAT pathway.

39

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT

Kristin I. Fread1, William D. Strickland2, Garry P. Nolan3, Eli R. Zunder1

1Department of Biomedical Engineering, University of Virginia; 2Department of Biomedical Sciences, University of Virginia; 3Department of Microbiology and Immunology, Stanford University

Eli Zunder

Pooled sample analysis by mass cytometry barcoding carries many advantages: reduced antibody consumption, increased sample throughput, removal of cell doublets, reduction of cross-contamination by sample carryover, and the elimination of tube-to-tube variability in antibody staining. A single-cell debarcoding algorithm was previously developed to improve the accuracy and yield of sample deconvolution, but this method was limited to using fixed parameters for debarcoding stringency filtering, which could introduce cell-specific or sample-specific bias to cell yield in scenarios where barcode staining intensity and variance are not uniform across the pooled samples. To address this issue, we have updated the algorithm to output debarcoding parameters for every cell in the sample-assigned FCS files, which allows for visualization and analysis of these parameters via flow cytometry analysis software. This strategy can be used to detect cell type-specific and sample-specific effects on the underlying cell data that arise during the debarcoding process. An additional benefit of this strategy is the decoupling of barcode stringency filtering from the debarcoding and sample assignment process. This is accomplished by removing the stringency filters during sample assignment, and then filtering after the fact with 1- and 2-dimensional gating on the debarcoding parameters which are output with the FCS files. These data exploration strategies serve as an important quality check for barcoded mass cytometry datasets, and allow cell type and sample-specific stringency adjustment that can remove bias in cell yield introduced during the debarcoding process.

40

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

41

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS

Chen Gao, Junghi Kim, Wei Pan

Division of Biostatistics, School of Public Health, University of Minnesota

Wei Pan

Due to its high dimensionality and high noise levels, analysis of a large brain functional network may be neither powerful nor easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising, as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n), in addition to the usual p < n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
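For readers unfamiliar with the topological overlap matrix used in WGCNA-style module detection, the following minimal sketch computes it from a symmetric adjacency matrix with entries in [0, 1], using the standard definition TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums a_iu * a_uj over the other nodes u and k_i is the connectivity of node i. The function name and the toy network are hypothetical; how the abstract's authors built the adjacency matrix from rs-fMRI data is not stated here and is not assumed.

```python
def topological_overlap(adj):
    """Topological overlap matrix for a symmetric adjacency matrix
    with entries in [0, 1] (diagonal entries are ignored)."""
    n = len(adj)
    # Connectivity of each node, excluding self-connections.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of connections routed through shared neighbours.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy network: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
tom = topological_overlap(adj)
```

In a WGCNA-style workflow the dissimilarity 1 - TOM is typically what is passed to hierarchical clustering, so that nodes sharing many neighbours fall into the same module.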

42

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS

Zhana Kuncheva1, Michelle L. Krishnan2, Giovanni Montana2

1Imperial College London, 2King's College London

Zhana Kuncheva

Characterizing the transcriptome architecture of the human brain is fundamental in gaining an understanding of brain function and disease. A number of recent studies have investigated patterns of brain gene expression obtained from an extensive anatomical coverage across the entire human brain using experimental data generated by the Allen Human Brain Atlas (AHBA) project. In this paper, we propose a new representation of a gene's transcription activity that explicitly captures the pattern of spatial co-expression across different anatomical brain regions. For each gene, we define a Spatial Expression Network (SEN), a network quantifying co-expression patterns amongst several anatomical locations. Network similarity measures are then employed to quantify the topological resemblance between pairs of SENs and identify naturally occurring clusters. Using network-theoretical measures, three large clusters have been detected featuring distinct topological properties. We then evaluate whether topological diversity of the SENs reflects significant differences in biological function through a gene ontology analysis. We report on evidence suggesting that one of the three SEN clusters consists of genes specifically involved in the nervous system, including genes related to brain disorders, while the remaining two clusters are representative of immunity, transcription and translation. These findings are consistent with previous studies showing that brain gene clusters are generally associated with one of these three major biological processes.

43



44

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION

Padideh Danaee, Reza Ghaeini, David Hendrix

Oregon State University

Padideh Danaee

Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to deeply extract functional features from high dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful cancer biomarkers for the detection of breast cancer that deserve further study.

45

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS

Jacob M. Keaton1, Jacklyn N. Hellwege1, Maggie C. Y. Ng1, Nicholette D. Palmer1, James S. Pankow2, Myriam Fornage3, James G. Wilson4, Adolfo Correa4, Laura J. Rasmussen-Torvik5, Jerome I. Rotter6, Yii-Der I. Chen6, Kent D. Taylor6, Stephen S. Rich7, Lynne E. Wagenknecht1, Barry I. Freedman1, Donald W. Bowden1

1Wake Forest School of Medicine, 2University of Minnesota, 3University of Texas Health Science Center at Houston, 4University of Mississippi Medical Center, 5Northwestern University Feinberg School of Medicine, 6Harbor-UCLA Medical Center, 7University of Virginia

Jacob Keaton

Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with known T2D loci implicated in insulin secretion. To test this hypothesis, single nucleotide polymorphisms (SNPs) nominally associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, and previously associated with T2D in genome-wide association studies (GWAS) were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n=492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using GWAS data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n=2,725 cases, 4,167 controls) with T2D as the outcome. In single variant analyses, suggestively significant (P_interaction < 5 x 10^-6) interactions were observed at several loci including DGKB (rs978989), CDK18 (rs12126276), CXCL12 (rs7921850), HCN1 (rs6895191), FAM98A (rs1900780), and MGMT (rs568530). Notable beta-cell GRS interactions included two SNPs at the DGKB locus (rs6976381; rs6962498). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.

46

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS

Madeleine Scott1, Francesco Vallania2, Purvesh Khatri3

1Stanford Medical School, Stanford University, Stanford, California; 2Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California; 3Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California

Purvesh Khatri

The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, and underscore the classically recognized inflammatory state that underlies this disease.

47

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS

Ana Stanescu, Gaurav Pandey

Icahn School of Medicine at Mount Sinai

Gaurav Pandey

Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors, which have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious as compared to the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance as compared to CES.

48

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE

Kathleen Whiting1, Larry Y. Liu2, Mehmet Koyutürk2, Gunnur Karakurt2

1Uniformed Services University, 2Case Western Reserve University

Gunnur Karakurt

Intimate partner violence (IPV) is a serious problem with devastating health consequences. Screening procedures may overlook relationships between IPV and negative health effects. To identify IPV-associated women’s health issues, we mined national, aggregated de-identified electronic health record data and compared female health issues of domestic abuse (DA) versus non-DA records, identifying terms significantly more frequent for the DA group. After coding these terms into 28 broad categories, we developed a network map to determine strength of relationships between categories in the context of DA, finding that acute conditions are strongly connected to cardiovascular, gastrointestinal, gynecological, and neurological conditions among victims.

49



50

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM

Andrew Beck1, Alexander Luedtke2, Keli Liu3, Nathan Tintle4

1University of Michigan, 2University of California-Berkeley, 3Harvard University, 4Dordt College

Nathan Tintle

The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
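As background, the sketch below shows the textbook one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium, together with one naive way of feeding it genotype uncertainty: summing per-sample posterior probabilities into expected genotype counts. This is presumably the kind of plug-in baseline the abstract describes as poorly calibrated; it is not the authors' proposed method, whose details are not given here. Function names and data are hypothetical, and the test assumes a polymorphic site (both alleles present).

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """One-degree-of-freedom chi-square test of Hardy-Weinberg
    equilibrium from (possibly fractional) genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

def expected_counts(posteriors):
    """Sum per-sample posteriors (P(AA), P(AB), P(BB)) into
    expected genotype counts."""
    return tuple(sum(col) for col in zip(*posteriors))

# Hard genotype calls in exact Hardy-Weinberg proportions.
stat_eq, p_eq = hwe_chi_square(25, 50, 25)
# Hard calls with no heterozygotes at all.
stat_dev, p_dev = hwe_chi_square(50, 0, 50)
# Hypothetical posterior probabilities for three samples.
soft = expected_counts([(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7)])
stat_soft, p_soft = hwe_chi_square(*soft)
```

Because the summed posteriors are treated as if they were observed counts, the extra variability from uncertain calls is ignored, which is one intuitive reason such plug-in tests can be miscalibrated.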

51

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING

Diana Diaz1, Michele Donato2, Tin Nguyen1, Sorin Draghici1

1Wayne State University, 2Stanford University Medical Center

Sorin Draghici

MicroRNAs play important roles in the development of many complex diseases. Because of their importance, the analysis of signaling pathways including miRNA interactions holds the potential for unveiling the mechanisms underlying such diseases. However, current signaling pathway databases are limited to interactions between genes and ignore miRNAs. Here, we use the information on miRNA targets to build a database of miRNA-augmented pathways (mirAP), and we show its application in the contexts of integrative pathway analysis and disease subtyping. Our miRNA-mRNA integrative pathway analysis pipeline incorporates a topology-aware approach that we previously implemented. Our integrative disease subtyping pipeline takes into account survival data, gene and miRNA expression, and knowledge of the interactions among genes. We demonstrate the advantages of our approach by analyzing nine sample-matched datasets that provide both miRNA and mRNA expression. We show that integrating miRNAs into pathway analysis results in greater statistical power, and provides a more comprehensive view of the underlying phenomena. We also compare our disease subtyping method with the state-of-the-art integrative analysis by analyzing a colorectal cancer database from TCGA. The colorectal cancer subtypes identified by our approach are significantly different in terms of their survival expectation. These miRNA-augmented pathways offer a more comprehensive view and a deeper understanding of biological pathways. A better understanding of the molecular processes associated with patients' survival can lead to a better prognosis and an appropriate treatment for each subtype.

52

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS

Arda Durmaz, Tim A. D. Henderson, Douglas Brubaker, Gurkan Bebek

Case Western Reserve University

Gurkan Bebek

Motivation: Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies.

Results: In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E−10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01 respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering, whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups.

Conclusions: These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples.

53

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS

Halla Kabat, Leo Tunkle, Inhan Lee

miRcore

Inhan Lee

Given the diverse molecular pathways involved in tumorigenesis, identifying subgroups among cancer patients is crucial in precision medicine. While most targeted therapies rely on DNA mutation status in tumors, responses to such therapies vary due to the many molecular processes involved in propagating DNA changes to proteins (which constitute the usual drug targets). Though RNA expressions have been extensively used to categorize tumors, identifying clinically important subgroups remains challenging given the difficulty of discerning subgroups within all possible RNA-RNA networks. It is thus essential to incorporate multiple types of data. Recently, RNA was found to regulate other RNA through a common microRNA (miR). These regulating and regulated RNAs are referred to as competing endogenous RNAs (ceRNAs). However, global correlations between mRNA and miR expressions across all samples have not reliably yielded ceRNAs. In this study, we developed a ceRNA-based method to identify subgroups of cancer patients combining DNA copy number variation, mRNA expression, and microRNA (miR) expression data with biological knowledge. Clinical data is used to validate identified subgroups and ceRNAs. Since ceRNAs are causal, ceRNA-based subgroups may present clinical relevance. Using lung adenocarcinoma data from The Cancer Genome Atlas (TCGA) as an example, we focused on EGFR amplification status, since a targeted therapy for EGFR exists. We hypothesized that global correlations between mRNA and miR expressions across all patients would not reveal important subgroups and that clustering of potential ceRNAs might define molecular pathway-relevant subgroups. Using experimentally validated miR-target pairs, we identified EGFR and MET as potential ceRNAs for miR-133b in lung adenocarcinoma. The EGFR-MET up and miR-133b down subgroup showed a higher death rate than the EGFR-MET down and miR-133b up subgroup. Although transactivation between MET and EGFR has been identified previously, our result is the first to propose ceRNA as one of its underlying mechanisms. Furthermore, since MET amplification was seen in the case of resistance to EGFR-targeted therapy, the EGFR-MET up and miR-133b down subgroup may fall into the drug non-response group and thus preclude EGFR target therapy.

54

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES

Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle

Dordt College

Nathan Tintle

Gene set analysis methods continue to be a popular and powerful method of evaluating genome-wide transcriptomics data. These approaches require a priori grouping of genes into biologically meaningful sets, and then conducting downstream analyses at the set (instead of gene) level of analysis. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses due to both reduced multiple testing penalties and potentially larger observed effects due to the aggregation of effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed, transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales yielding more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.

55

METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT

Pei Fen Kuan, Junyan Song, Shuyao He

Stony Brook University

Pei Fen Kuan

DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV which allows for confounding factors adjustment and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package methylDMV implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.

56

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS

Meng Ma1, Changchang Wang2, Benjamin Glicksberg1, Eric E. Schadt1, Shuyu Li1, Rong Chen1

1Icahn School of Medicine at Mount Sinai, 2Anhui University

Shuyu Li

Genomic sequencing studies in the past several years have yielded a large number of cancer somatic mutations. There remains a major challenge in delineating a small fraction of somatic mutations that are oncogenic drivers from a background of predominantly passenger mutations. Although computational tools have been developed to predict the functional impact of mutations, their utility is limited. In this study, we applied an alternative approach to identify potentially novel cancer drivers as those somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We hypothesize that those shared mutations are more likely to be cancer drivers because they have the established molecular mechanisms to impact protein functions. We first show that the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then attempted to identify putative tumor suppressors based on the number of distinct HGMD/COSMIC overlapping mutations in a given gene, and our results suggest that ion channels, collagens and Marfan syndrome associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified those HGMD/COSMIC overlapping mutations that are not only highly recurrent but also mutually exclusive from previously characterized oncogenic mutations in each specific cancer type. Taken together, our study represents a novel approach to discover new cancer genes from the vast amount of cancer genome sequencing data.
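
The overlap step in this abstract (somatic mutations that coincide with known Mendelian pathogenic variants, then counted per gene) can be illustrated with a minimal sketch. The variant key `(gene, protein_change)`, the function names, and the toy records are assumptions made for illustration; they are not the COSMIC or HGMD schemas and not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (gene symbol, protein change). Real databases
# would need coordinate- or transcript-level matching.
Variant = Tuple[str, str]


def shared_variants(somatic: Iterable[Variant],
                    pathogenic: Iterable[Variant]) -> Set[Variant]:
    """Return the distinct variants present in both collections."""
    return set(somatic) & set(pathogenic)


def overlap_count_per_gene(shared: Set[Variant]) -> Counter:
    """Count distinct shared variants for each gene."""
    return Counter(gene for gene, _ in shared)


# Toy data with placeholder gene names (not real findings).
somatic = [("GENE_A", "p.R100C"), ("GENE_A", "p.G200S"),
           ("GENE_B", "p.K10N"), ("GENE_A", "p.R100C")]
pathogenic = [("GENE_A", "p.R100C"), ("GENE_A", "p.G200S"),
              ("GENE_C", "p.L5P")]

shared = shared_variants(somatic, pathogenic)
counts = overlap_count_per_gene(shared)
print(counts.most_common())
```

Using sets means a recurrent somatic mutation is counted once per gene, which matches the abstract's wording about the number of distinct overlapping mutations.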

57

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS

André Schultz1, Sanket Mehta1, Chenyue W. Hu1, Fieke W. Hoff2, Terzah M. Horton3, Steven M. Kornblau2, Amina A. Qutub1

1Rice University, 2University of Texas MD Anderson Cancer Center, 3Baylor College of Medicine and Texas Children's Hospital

André Schultz

Cancer metabolism differs remarkably from the metabolism of healthy surrounding tissues, and it is extremely heterogeneous across cancer types. While these metabolic differences provide promising avenues for cancer treatments, much work remains to be done in understanding how metabolism is rewired in malignant tissues. To that end, constraint-based models provide a powerful computational tool for the study of metabolism at the genome scale. To generate meaningful predictions, however, these generalized human models must first be tailored for specific cell or tissue sub-types. Here we first present two improved algorithms for (1) the generation of these context-specific metabolic models based on omics data, and (2) Monte-Carlo sampling of the metabolic model flux space. By applying these methods to generate and analyze context-specific metabolic models of diverse solid cancer cell line data, and primary leukemia pediatric patient biopsies, we demonstrate how the methodology presented in this study can generate insights into the rewiring differences across solid tumors and blood cancers.

58



59

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING

Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

Ohio State University

Travis Johnson

Mouse brain transcriptomic studies are important in the understanding of the structural heterogeneity in the brain. However, it is not well understood how cell types in the mouse brain relate to human brain cell types on a cellular level. We propose that it is possible with single cell granularity to find concordant genes between mouse and human and that these genes can be used to separate cell types across species. We show that a set of concordant genes can be algorithmically derived from a combination of human and mouse single cell sequencing data. Using this gene set, we show that similar cell types shared between mouse and human cluster together. Furthermore we find that previously unclassified human cells can be mapped to the glial/vascular cell type by integrating mouse cell type expression profiles.

60

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG

Kimberly R. Kanigel Winner1, James C. Costello2

1Computational Bioscience Program, Department of Pharmacology, University of Colorado Cancer Center; 2University of Colorado Anschutz Medical Campus

Kimberly Kanigel Winner

Tumors are composed of heterogeneous populations of cells. Somatic genetic aberrations are one form of heterogeneity that allows clonal cells to adapt to chemotherapeutic stress, thus providing a path for resistance to arise. In silico modeling of tumors provides a platform for rapid, quantitative experiments to inexpensively study how compositional heterogeneity contributes to drug resistance. Accordingly, we have built a spatiotemporal model of a lung metastasis originating from a primary bladder tumor, incorporating in vivo drug concentrations of first-line chemotherapy, resistance data from bladder cancer cell lines, vascular density of lung metastases, and gains in resistance in cells that survive chemotherapy. In metastatic bladder cancer, a first-line drug regimen includes six cycles of gemcitabine plus cisplatin (GC) delivered simultaneously on day 1, and gemcitabine on day 8 in each 21-day cycle. The interaction between gemcitabine and cisplatin has been shown to be synergistic in vitro, and results in better outcomes in patients. Our model shows that during simulated treatment with this regimen, GC synergy does begin to kill cells that are more resistant to cisplatin, but repopulation by resistant cells occurs. Post-regimen populations are mixtures of the original, seeded resistant clones, and/or new clones that have gained resistance to cisplatin, gemcitabine, or both drugs. The emergence of a tumor with increased resistance is qualitatively consistent with the five-year survival of 6.8% for patients with metastatic transitional cell carcinoma of the urinary bladder treated with a GC regimen. The model can be further used to explore the parameter space for clinically relevant variables, including the timing of drug delivery to optimize cell death, and patient-specific data such as vascular density, rates of resistance gain, disease progression, and molecular profiles, and can be expanded for data on toxicity. The model is specific to bladder cancer, which has not previously been modeled in this context, but can be adapted to represent other cancers.
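
The dosing calendar stated in this abstract (six 21-day cycles, gemcitabine plus cisplatin on day 1, gemcitabine alone on day 8) can be written down as a small schedule generator, which is the kind of input a simulation of this regimen would need. The function name, the event representation, and the convention that the first treatment day is day 1 are assumptions for illustration; the authors' model is not reproduced here.

```python
from typing import List, Tuple

# One dosing event: (absolute treatment day, drugs given that day).
DoseEvent = Tuple[int, Tuple[str, ...]]


def gc_schedule(cycles: int = 6, cycle_length: int = 21) -> List[DoseEvent]:
    """Build the GC dosing calendar; day 1 is the first day of cycle 1."""
    events: List[DoseEvent] = []
    for cycle in range(cycles):
        offset = cycle * cycle_length
        # Day 1 of each cycle: both drugs delivered together.
        events.append((offset + 1, ("gemcitabine", "cisplatin")))
        # Day 8 of each cycle: gemcitabine alone.
        events.append((offset + 8, ("gemcitabine",)))
    return events


schedule = gc_schedule()
for day, drugs in schedule[:4]:
    print(day, "+".join(drugs))
```

Exposing `cycles` and `cycle_length` as parameters reflects the abstract's remark that the timing of drug delivery is one of the variables worth exploring.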

61

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA

Juho Kim, Nate Russell, Jian Peng

University of Illinois at Urbana-Champaign

Juho Kim

Single-cell analysis can uncover the mysteries in the state of individual cells and enable us to construct new models about the analysis of heterogeneous tissues. State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single-cells and detect hidden information. They are able to provide the measurements of dozens of features simultaneously in each cell. However, due to the high-dimensionality, heterogeneous complexity and sheer enormity of single-cell data, its interpretation is challenging. Thus, new methods to overcome high-dimensionality are necessary. Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data onto a low-dimensional (2D or 3D) space while preserving the similarity structure between single-cells. We first construct a network that can represent the similarity structure between the high-dimensional representations of single-cells, and then, embed this network into a low-dimensional space through an efficient online optimization method based on the idea of negative sampling. Using this approach, we can preserve the high-dimensional structure of single-cell data in an embedded low-dimensional space that facilitates visual analyses of the data.

62

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

POSTER PRESENTATIONS

63

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM

Ernesto Borrayo, Ryoko Machida-Hirano

Gene Research Center, University of Tsukuba

Ernesto Borrayo

The interactions between genotype and environment give rise to phenotypic plasticity. However, these interactions are dynamic and complex. What is considered as a phenotype at one evaluation can be considered as an environmental condition at some other, as that previous phenotype will affect particular conditions for the new one. Also, under a specific perspective a determined genetic material can be considered as an environmental condition for other loci. These concepts elucidate that the “one gene, one trait” rationale is rather the exception than the rule, and in order to adequately predict the possible phenotype expected at any biological level, the specific interaction between environment and genotype should be analyzed carefully. In order to infer the degree of influence of both a genotype and an environment over certain phenotypic traits, we developed a cluster-based algorithm that renders the way phenotypical traits can be explained by either that genotype or such environmental conditions. Although this approach is still far from being able to consider all possible aspects that may explain a phenotypic condition, it is a first approach to successfully analyzing the mentioned genotype-environment-phenotype interactions in a comprehensive manner. To test the algorithm along with synthetic data, real genetic, environmental and agromorphological traits of Theobroma cacao and Sechium edule were also analyzed. We expect that further exploration of different classifiers will help to adequately predict phenotypic expression at different biological levels (with significant applications in diverse fields such as crop improvement, genomics, clinical diagnosis/prognosis/treatment and metabolomics) and that it will enhance our understanding of genomics, metabolomics and adaptation/evolutionary processes.

64

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS

Jingyi Jessica Li1, Guo-Liang Chew2, Mark D. Biggin3

1Department of Statistics and Department of Human Genetics, UCLA; 2Computational Biology Program, Fred Hutchinson Cancer Research Center; 3Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory

Jingyi Jessica Li

Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an “amplification exponent”. Here we show that to quantitate translational control it is necessary to decompose the translation rate into two components. One component, TRmD, depends on the mRNA level and defines the amplification exponent. The other component, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in S. cerevisiae TRmD represents ~30% of the variance in translation and results in an amplification exponent of ~1.20–1.27. TRmIND constitutes the remaining 70% of the variance in translation and explains <5% of the variance in protein expression. When protein degradation is also considered, the correlation between the abundances of protein and mRNA is R²(prot–RNA) > 0.92. We also investigate which mRNA sequence elements explain the variance in TRmD and TRmIND. We find that TRmIND is most strongly determined by the length of the open reading frame, while TRmD is more strongly determined by an A rich, highly unfolded element that spans nucleotides -35 to +28 relative to the initiating AUG codon, implying that TRmIND is under different evolutionary selective pressures than TRmD. Our work introduces methods for correctly scaling mRNA and protein abundance data using internally controlled standards. It provides quite different, more accurate estimates of translational control than any previous estimates. By decomposing translation rates, we also provide insights into the mRNA sequence dependencies of translation that would not be apparent otherwise.
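
The scaling argument in this abstract can be written out as a short derivation. This is a sketch under assumptions the abstract does not state: steady state, a multiplicative split of the translation rate, and a power-law dependence of the mRNA-dependent component on mRNA abundance. The symbols $m$, $P$, $b$ and $k_{\mathrm{deg}}$ are introduced here for illustration only.

```latex
% m: mRNA abundance, P: protein abundance, k_deg: protein degradation rate
% TR: translation rate per mRNA molecule, split into the two named components
\begin{align}
  \mathrm{TR} &= \mathrm{TR_{mD}} \cdot \mathrm{TR_{mIND}},
    \qquad \mathrm{TR_{mD}} \propto m^{b}, \\
  P &\propto \frac{m \cdot \mathrm{TR}}{k_{\mathrm{deg}}}
     \propto \frac{m^{1+b}\,\mathrm{TR_{mIND}}}{k_{\mathrm{deg}}}, \\
  \log P &= (1+b)\log m + \log \mathrm{TR_{mIND}}
            - \log k_{\mathrm{deg}} + \mathrm{const}.
\end{align}
```

Under this reading the amplification exponent is $1+b$, so the reported range of ~1.20–1.27 would correspond to $b \approx 0.20$–$0.27$. If TRmIND and the degradation rate are independent of $m$, they add scatter around the log-log line and so affect the correlation, without changing its slope.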

65

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois at Urbana-Champaign

Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.

66

GENERAL

POSTER PRESENTATIONS

67

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS

Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

Case Western Reserve University

Mehmet Koyutürk

Advances in high-throughput omics technologies revolutionized our understanding of the genomic underpinnings of cancer. However, many challenges remain in understanding how patients with common driver mutations may display diverging phosphoproteomic responses to the same treatment. Thus, an examination of the signaling landscape will provide essential molecular information for modeling personalized patient treatment design. However, integrative bioinformatics approaches to identify phosphoproteomics-based molecular states are in their infancy. To address this challenge, we adapt our algorithm MoBaS, which was originally developed to identify phenotype-associated subnetworks in the context of genome-wide association studies. MoBaS takes as input a PPI network and a score for each protein indicating the protein’s differential phosphorylation level. It then identifies protein subnetworks that are (i) composed of densely interacting proteins, and (ii) enriched in proteins with high scores. MoBaS also assesses the statistical significance of the identified subnetworks using permutation tests that effectively handle multiple hypothesis testing. We apply MoBaS to compare and contrast the drug-induced global signaling alterations of two KRAS mutated non-small cell lung cancer (NSCLC) cell lines, A549 and H358, treated with a novel activator of the tumor suppressor Protein Phosphatase 2A (PP2A) versus DMSO control. Applying kinase enrichment analysis on identified subnetworks, we identify AuroraKB as a key kinase differentially regulated between the two cell lines in response to our compound. Further corroborating this finding, we show that AuroraKB is downregulated at the protein and mRNA levels with our treatment in A549 but not in H358.
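
To make the idea of a permutation test on subnetwork scores concrete, here is a deliberately simplified sketch. It is not MoBaS: it ignores network topology and the multiple-testing handling the abstract mentions, and it assumes a module is scored by the sum of its proteins' scores. All names and the toy scores are hypothetical.

```python
import random
from typing import Dict, Sequence


def module_score(scores: Dict[str, float], module: Sequence[str]) -> float:
    """Assumed scoring rule: sum of per-protein scores in the module."""
    return sum(scores[p] for p in module)


def permutation_p_value(scores: Dict[str, float], module: Sequence[str],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: how often a random protein set of the same size
    scores at least as high as the observed module."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = module_score(scores, module)
    proteins = list(scores)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(proteins, len(module))
        if module_score(scores, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy scores: protein P0 scores 0.0, P1 scores 1.0, and so on up to P19.
scores = {f"P{i}": float(i) for i in range(20)}
top_module = ["P17", "P18", "P19"]
p_value = permutation_p_value(scores, top_module)
print(p_value)
```

A topology-aware null model would instead permute scores over the network or sample connected subgraphs, which is closer to what a method operating on a PPI network would require.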

68

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS

Yongsheng Bai, Naureen Aslam, Ali Salman

Indiana State University

Yongsheng Bai

Background: MicroRNAs (miRNA) are short nucleotides that interact with their target mRNAs through 3’ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) project, initiated in 2006, has sequenced tissue collections with matched tumor and normal samples from 11,000 patients in 33 cancer types and subtypes, including 10 rare cancers. There is an urgent need to develop innovative methodologies and tools that can cluster mRNA-miRNA interaction pairs into groups and characterize functional consequences of cancer risk genes while analyzing the tumor and normal samples simultaneously. Rationale: An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes or miRNAs as vertices and their calculated correlation as edges. Our hypothesis is: If a highly scored gene/miRNA cluster in a given tumor sample shows a significantly altered regulation relative to a similar gene/miRNA cluster in the corresponding non-tumor sample, the cluster is biologically significant. Results: We developed a powerful mathematical model to identify clusters of significant mRNA and miRNA interaction pairs and decipher mRNA and miRNA regulation network using TCGA miRNA sequencing and mRNA sequencing data. We ran the cluster detection algorithm implemented in Python 3 on TCGA Breast Invasive Carcinoma (BRCA) transcriptome (both RNA-Seq and miRNA-Seq) datasets. Using different cluster size (or bin) and different selection of miRNA and mRNA pairs for creating clusters will generate different topology of clusters, therefore resulting in different numbers of common clusters between tumor and normal samples as well. We ran 1,000 different random selections of target pairs to generate different cluster topology and combined all results together to obtain 105,850 distinctive candidate clusters for prioritization. Conclusions: We think our methodology for identifying cancer driver genes in personal genomes in which clinicians seek to develop better treatment strategies is valuable to the field. Our proposed method should be applicable across a range of diseases and cancers.
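
The graph representation described in the Rationale (genes and miRNAs as the two vertex sets, correlation as the edge weight) can be sketched as follows. The abstract does not say which correlation measure is used, so Pearson correlation is an assumption here, as are the function names, the placeholder identifiers, and the toy expression values.

```python
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def bipartite_edges(mrna: Dict[str, List[float]],
                    mirna: Dict[str, List[float]],
                    pairs: Iterable[Tuple[str, str]]
                    ) -> Dict[Tuple[str, str], float]:
    """Weight each candidate (gene, miRNA) pair by expression correlation."""
    return {(g, m): pearson(mrna[g], mirna[m])
            for g, m in pairs if g in mrna and m in mirna}


# Toy profiles across four samples (placeholder names, not TCGA data).
mrna = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [4.0, 3.0, 2.0, 1.0]}
mirna = {"miR-x": [4.0, 3.0, 2.0, 1.0]}
pairs = [("GENE_A", "miR-x"), ("GENE_B", "miR-x"), ("GENE_Z", "miR-x")]

edges = bipartite_edges(mrna, mirna, pairs)
print(edges)
```

Building one such edge map for tumor samples and another for matched normal samples would give the two graphs whose clusters the hypothesis compares.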

69

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY

Chengsheng Zhu1, Yannick Mahlich1,2,3,4, Yana Bromberg1,4

1Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA; 2Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany; 3Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany; 4Institute of Advanced Study (TUM-IAS), Garching, Germany

Yana Bromberg

Summary: Microbial functional diversification is driven by environmental factors. In some cases, microbes differ more across environments than across taxa. Here we introduce fusionDB, a novel database of microbial functional similarities, indexed by available environmental preferences. fusionDB entries represent nearly fourteen hundred taxonomically-distinct bacteria annotated with available metadata: habitat, temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Database searches produce easily visualizable XML-formatted network files of selected organisms, along with their shared functions. fusionDB thus provides a fast means of associating specific environmental factors with organism functions. Availability: http://bromberglab.org/databases/fusiondb and as a sql-dump by request. Contact: [email protected], [email protected]

70

THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN

Frank C. Brosius1, Wenjun Ju1, Keith Bellovich2, Zeenat Bhat3, Crystal Gadegbeku4, Debbie Gipson1, Jennifer Hawkins1, Julia Herzog1, Susan Massengill5, Richard C. McEachin1, Subramaniam Pennathur1, Kalyani Perumal6, Roger Wiggins1, Matthias Kretzler1

1University of Michigan, 2Renaissance Renal Research Institute, 3Wayne State University, 4Temple University, 5Levine Children’s Hospital, 6University of Illinois at Chicago

Richard McEachin

Recent advances have allowed the development of molecular maps to define chronic kidney disease (CKD) in new, accurate and personalized ways. These developments make possible the prediction of outcomes and response to therapy and the identification of key molecular targets for treatment of CKD in individual patients. Identification of such targets entails close collaboration between teams of investigators to collect and annotate samples from well characterized CKD subjects. In addition, technologies are needed that support information exchange, robust databanks, and data integration to define key pathways driving CKD pathogenesis. The O'Brien Kidney Translational Core Center at the University of Michigan provides such biobanking, databank structure and bioinformatic support to basic and clinical investigators to allow them to pursue critical precision medicine investigations of humans with CKD. The Clinical Phenotyping and Biobank Core has enrolled over 1200 patients with CKD from 5 sites and banked their samples and clinical information, providing a valuable resource for efficient discovery. Multiple specific research studies have now successfully utilized these resources. The Applied Systems Biology Core and its online analytical tool, Nephroseq, have assisted hundreds of investigators around the world in approaches to the analysis of large transcriptomic datasets and other systems-level, biological studies of patients with CKD. The Center’s Bioinformatics Core provides access to computational applications and skilled professional support in bioinformatics and biostatistics and will now be providing back-end maintenance of Nephroseq. The Administrative Core directs pilot and small grants, student training and discount programs with the goal of helping new and established researchers utilize systems biological and translational research tools. Together these cores provide comprehensive translational research support for novel research into classification and treatment of chronic kidney diseases. All interested academic investigators around the world are invited to make use of these services and to contact us for information and consultation.

71

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE

Danai Chasioti1, Xiaohui Yao1, Pengyue Zhang2, Xia Ning3, Lang Li2, Li Shen4

1IUPUI School of Informatics and Computing; 2Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine; 3IUPUI Department of Computer Science; 4Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine

Li Shen

Background: Mining high-order drug-drug interaction (DDI) induced adverse drug effects (ADEs) from electronic health record (EHR) databases is an emerging area, and very few studies have explored the relationships between DDIs. To bridge this gap, we study a novel pharmacovigilance problem for mining directional drug interaction effect on myopathy using the FDA Adverse Event Reporting System (FAERS) database. Method: The analysis was performed on a case–control dataset extracted from the FAERS database. The dataset contains 1,763 drugs, and includes 136,860 myopathy events and 3,940,587 control events. Given two sets of drug combinations D1 and D2 (a superset of D1), we define the directional ADE effect from D1 to D2 as the altered ADE risk associated with the change from taking D1 to taking D2. The ADE risks were estimated using odds ratios (ORs). To address both computational and statistical challenges, this study was focused on computing ORs for frequent D2’s (i.e., those whose number of occurrences is at least a user-specified minimum support). The Apriori algorithm was employed to identify frequent D2’s. Results: Using the minimum support of 1000, we identified 764 frequent drugs, 7036 frequent 2-drug combinations, and 4280 frequent 3-drug combinations. The top ten ADE ORs for single drugs range from 4.1 to 5.6, for two-drug combinations from 12.6 to 21.5, and for three-drug combinations from 14.8 to 19.5. The top ten directional ADE ORs between one drug and two drugs range from 13.5 to 28.2; those between one drug and three drugs range from 13.1 to 20.3; and those between two drugs and three drugs range from 11.3 to 34.4. Multiple promising directional ADE findings were identified. For example, the risk of myopathy is 28.2 times higher when adding Gadopentetate dimeglumine on top of Gadobenate dimeglumine. Both drugs are Gadolinium-based contrast agents (GBCAs) used in magnetic resonance imaging. GBCAs have been shown to be associated with Nephrogenic systemic fibrosis (NSF) which may present as progressive myopathy. Conclusion: The directional drug interactions capture the ADE risks introduced by additional drugs taken on top of a set of baseline drugs, and provide novel and valuable pharmacovigilance knowledge with potential to impact clinical decision support. Mining frequent patterns using Apriori is a promising approach for effective discovery of high-order directional drug interaction effects.
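
The quantities named in the Method section (support of a drug set, an odds ratio, and a directional effect from D1 to a superset D2) can be illustrated with a minimal sketch. The abstract does not give the exact contingency table behind the directional odds ratio, so the construction below is an assumption: reports containing all of D2 are compared with reports containing D1 but not all of D2. The names and toy reports are hypothetical, and the Apriori search itself is not shown.

```python
from typing import FrozenSet, List, Tuple

# One report: (set of drugs listed, True if it is a myopathy case).
Report = Tuple[FrozenSet[str], bool]


def support(reports: List[Report], drugs: FrozenSet[str]) -> int:
    """Number of reports that list every drug in `drugs`."""
    return sum(1 for taken, _ in reports if drugs <= taken)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR for a 2x2 table: a/b exposed cases/controls, c/d unexposed."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined for a zero cell")
    return (a * d) / (b * c)


def directional_odds_ratio(reports: List[Report], d1: FrozenSet[str],
                           d2: FrozenSet[str]) -> float:
    """Assumed definition: odds of the event with D2 versus with D1 only."""
    if not d1 < d2:
        raise ValueError("D2 must be a proper superset of D1")
    a = b = c = d = 0
    for taken, is_case in reports:
        if d2 <= taken:          # report includes the full combination D2
            a, b = (a + 1, b) if is_case else (a, b + 1)
        elif d1 <= taken:        # report includes D1 but not all of D2
            c, d = (c + 1, d) if is_case else (c, d + 1)
    return odds_ratio(a, b, c, d)


# Toy reports with placeholder drug names (not FAERS data).
d1 = frozenset({"drug_a"})
d2 = frozenset({"drug_a", "drug_b"})
reports: List[Report] = ([(d2, True)] * 3 + [(d2, False)] * 1
                         + [(d1, True)] * 1 + [(d1, False)] * 3)

print(support(reports, d2), directional_odds_ratio(reports, d1, d2))
```

In a full analysis, `support` would be used to keep only drug sets meeting the minimum support before any odds ratio is computed, which is the role the abstract assigns to Apriori.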

72

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN

Aslihan Dincer1, Eric E. Schadt2, Bin Zhang2, Joel T. Dudley2, Davin Gavin3, Schahram Akbarian4

1Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York; 2Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York; 3Department of Psychiatry, Jesse Brown Veterans Affairs Medical Center, Chicago; 4Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York

Aslihan Dincer

Only a few histone modifications have been mapped in human brain. Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites (TSS) of active gene promoters. Regulators of the H3K4me3 mark are significantly associated with the genetic risk architecture of common neurodevelopmental disease, including schizophrenia and autism. Here, through integrative computational analysis of epigenomic and transcriptomic data based on next generation sequencing, we investigated H3K4me3 landscapes of FACS sorted neuronal and non-neuronal nuclei in human postmortem, non-human primate (chimpanzee and macaque) and mouse prefrontal cortex (PFC), and blood. We characterized the broad H3K4me3 histone domains from human PFC in the context of cell-type specific regulation, association with neuronal and non-neuronal gene expression and potential implications for normal and diseased development. We first addressed the occurrence and the biological significance of the broad H3K4me3 histone domains in three different cell types, including NeuN+ PFC neurons, NeuN- PFC cells, and nucleated blood cells and then identified novel regulators of these three different cell types by focusing on the top 5% broadest H3K4me3 peaks (length in base pairs). In PFC neurons, broadest peaks ranged in size from 3.9 to 12 kb, with extremely broad peaks (~10 kb or broader) related to synaptic function and GABAergic signaling (DLX1, ELFN1, GAD1, LINC00966). Broadest neuronal peaks showed distinct motif signatures, and were centrally positioned in prefrontal gene Bayesian regulatory networks. Approximately 120 of the broadest H3K4me3 peaks in human PFC neurons, including many genes related to glutamatergic and dopaminergic signaling, were fully conserved in chimpanzee, macaque and mouse cortical neurons. Exploration of spread and breadth of lysine methylation markings in specific cell types could provide novel insights into epigenetic mechanism of normal and diseased brain development, aging and evolution of neuronal genomes.

73

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER

Jennifer M. Franks1,2, Guoshuai Cai1, Jaclyn N. Taroni3,4, Michael L. Whitfield1,2

1Department of Molecular and Systems Biology; 2Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth; 3Department of Systems Pharmacology and Translational Therapeutics; 4Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine

Jennifer Franks

Systemic sclerosis (SSc) is a complex connective tissue disease involving skin and internal organ fibrosis, vascular damage, and immunologic abnormalities. To characterize disease heterogeneity and molecular pathogenesis, transcriptomics have elucidated common biological processes in subsets of SSc patients using intrinsic gene expression analyses. Four intrinsic subsets characterized by distinct molecular signatures have been validated by multiple independent cohorts. Technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from multiple studies. While microarray and RNA-seq data have been shown to have a high correlation, differences in overall processing and quantification result in distinct data distributions. Here, we introduce an accurate and reproducible classifier for SSc molecular subtypes and have developed a method to normalize data when platform-specific artifacts arise. We used three independent, well-characterized and validated experimental microarray datasets (Hinchcliff et al., 2013; Milano et al., 2008; Pendergrass et al., 2012) to train a supervised classifier using three-fold cross-validation repeated ten times, performing at an average of >88% accuracy. Data from other platforms, including RNA-seq, are analyzed for platform-based bias using guided PCA analysis (Reese et al., 2013). We developed a method to eliminate platform bias by normalizing on a gene-by-gene basis using the microarray training data as the target distribution. We find that this method successfully removes platform-specific effects from the data. Following normalization, each sample is assigned to a molecular subset based on support vector machine (SVM) classification. Our preliminary analyses find that these methods work extremely well on a validation RNA-seq dataset in SSc (100% accuracy, n=12, Li et al., in preparation). We also applied our methods to breast cancer DNA microarray and RNA-seq data from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas, 2012) where five intrinsic gene expression subsets have been previously identified and described with PAM50 (Parker et al., 2009). Tumor and tumor-adjacent normal biopsies of breast cancer, for which intrinsic subtype information was available, were used to train and test an SVM and evaluate our normalization technique. We achieve 93% accuracy in assigning subtypes for normalized RNA-seq data using our classifier trained exclusively on microarray data. Until recently, clinical trials and diagnosing physicians have not considered molecular heterogeneity in the context of immunosuppressive therapy, which may explain improvement in select SSc patients (Martyanov & Whitfield, 2016). Advancing personalized medicine by using intrinsic molecular subsets may prove particularly beneficial to this field. With our newly developed techniques, we can successfully leverage information from validated expression data in new analyses despite different platforms used for gene expression profiling.
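
One simple way to normalize on a gene-by-gene basis toward a target distribution is to rescale each gene so that its mean and standard deviation match those observed for the same gene in the training data. The abstract does not specify the transformation the authors use, so this location-scale rescaling is an assumed stand-in, and the function name and toy values are hypothetical.

```python
import statistics
from typing import List


def rescale_gene(values: List[float], target_mean: float,
                 target_sd: float) -> List[float]:
    """Shift and scale one gene's values to a target mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A gene with no variation can only be shifted, not scaled.
        return [target_mean for _ in values]
    return [target_mean + target_sd * (v - mean) / sd for v in values]


# Toy values for a single gene (not real expression data).
rnaseq_values = [1.0, 2.0, 3.0, 4.0, 5.0]       # new platform
microarray_values = [8.0, 10.0, 12.0]           # training platform

target_mean = statistics.mean(microarray_values)
target_sd = statistics.pstdev(microarray_values)
normalized = rescale_gene(rnaseq_values, target_mean, target_sd)
print(normalized)
```

Applied to every gene in turn, this puts new-platform samples on the scale the classifier was trained on; quantile-based mapping would be an alternative when the two distributions differ in shape and not only in location and spread.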

74

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA

Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

University of Hawaii Cancer Center, Honolulu

Lana Garmire

The high mortality rate of hepatocellular carcinoma (HCC) is in part due to the vast heterogeneity of the cancer. Identifying robust molecular subgroups of HCC helps to guide precise targeted therapeutics. This could be realized by integrating different layers of omics datasets from the same cohort. To achieve this, we present a deep learning (DL) based method to inspect the different subpopulations of patients within HCC from TCGA. We obtained the information of 360 HCC patients available in TCGA with 3 omics data types (RNA-seq, miRNA-seq and methylation). To identify the different subpopulations, our pipeline implements a DL-based autoencoder, identifies hidden-layer features linked to survival, and performs k-means clustering using these new features. To assign new samples to the identified subpopulations, a supervised classification procedure was conducted using a Support Vector Machine (SVM). To assess the performance of the model, we used a 5-fold cross-validation scheme to estimate the c-index and Brier scores. We also split the data at a 60:40 ratio in 10 folds in order to assess the significance of the Cox proportional hazards (coxph) regression in the test dataset. Finally, we inferred the cluster labels of two external cohorts based on the gene expression data. The autoencoder framework was used to combine the 3 omics as input features (~40,000) and to produce 100 transformed new features. Among these new features, we identified 36 features significantly linked with survival, which were further used to infer 2 optimal clusters of patients with significant survival differences. Using the cross-validation procedure, we obtained average c-index and Brier score values of 0.70 and 0.20 respectively, for the test sets. Also, the coxph regression shows significant survival estimation when using the test samples. Finally, our framework is validated on two external datasets: 221 HCC samples from a GEO study and 230 HCC samples from the LIRI-JP (RIKEN) cohort. Moreover, we showed that each of the individual omic feature sets can be used successfully to infer the 2 survival profiles. However, the combination of the 3 omics is more powerful. We also compared the DL methodology with new features produced by PCA instead. The two subpopulations differed significantly in clinical and molecular terms (survival, pathways, and driver mutation profiles). This is the first study to employ deep learning as a robust framework to identify non-linear combinations of multi-omics features linked to identification of subclasses of HCC patients. Using multi-omics datasets, our pipeline successfully combines these different features and identifies two HCC subpopulations exhibiting different survival profiles. We then used this model in combination with supervised machine-learning approaches to predict HCC subpopulation assignment for test and validation datasets.
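For readers unfamiliar with the c-index reported above, the following is a minimal sketch of Harrell's concordance index. It is a plain quadratic implementation for illustration only. The abstract does not say which implementation the authors used, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable patient pairs, the fraction in
    which the patient with the shorter observed survival time also has the
    higher predicted risk. Tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # (not censored) strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: survival time, event indicator (1 = death observed, 0 = censored),
# and a model's predicted risk for each patient (all values made up)
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported average of 0.70.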

75

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR

Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

Department of Health Sciences Research, Mayo Clinic, Rochester, MN

Guoqian Jiang

Introduction: The Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard developed at HL7, which enables the representation and exchange of electronic health record (EHR) data in a standard structure. FHIR has strong executable ability based on the RESTful service architecture and multiple flexible data exchange formats. Shiny is a web application framework with a simplified web deployment mechanism that enables powerful R functions to support graphical and interactive analysis. Therefore, with the goal of building reusable and extensible clinical statistics and analysis applications, we aim to design, develop and evaluate a flexible framework using the HL7 FHIR standard and the R-powered web application framework Shiny.

Methods: We first established a local FHIR server to manage our clinical data. This part of the work focused on the analysis and implementation of the FHIR data models (i.e., core resources), data exchange formats (e.g., XML and JSON) and invoking an open source HAPI FHIR API. Second, we designed two analysis workflows that are focused on patient-centered data analysis and cohort-based data analysis respectively. According to the workflow design, we developed an open application platform known as Shiny FHIR using the Shiny web framework and the established FHIR server.

Results: We built a local FHIR server using the HAPI DSTU2 API. In total, 140 patient records, 476 observation records, 496 condition records and 107 procedure records were populated into the FHIR server for testing. With the support of R packages, including ‘jsonlite’, ‘dygraph’ and ‘timeline’, our platform can be used for a variety of use cases of clinical data analysis, including patient blood pressure observation timeline analysis, patient cohort gender/age distribution statistics, etc. The results of the experiment show that the Shiny FHIR integration approach offers the feasibility of web-based interactive statistics analysis on standardized FHIR-based clinical data.

Discussions: The implementations of FHIR have already attracted a lot of interest from healthcare practitioners. Our Shiny FHIR implementation provides a useful framework that would be complementary to other FHIR-based applications (e.g., SMART on FHIR). Shiny FHIR is designed to visualize FHIR-conformant data through capturing user experiences and habits, and offers rapid support for clinical research while drawing on the statistical power of R. However, several issues need to be solved in the future, such as support for FHIR extensions and custom models and system performance enhancement. In this study, we described our efforts in building a standardized clinical statistics and analysis application leveraging Shiny. We consider that the designed workflows can be applied to other EHR data that follow the FHIR standard, and other publicly available FHIR servers can be used to validate the utility of our framework.
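The blood pressure timeline use case mentioned in the Results can be sketched as follows. The authors worked in R with Shiny; this illustration uses Python only to show the shape of the data, and it is not their implementation. It assumes blood pressure is recorded as a FHIR Observation whose `component` entries carry the LOINC codes for systolic (8480-6) and diastolic (8462-4) pressure, and it reads from an in-memory example Bundle instead of a server. The function name and sample values are hypothetical.

```python
import json

SYSTOLIC = "8480-6"   # LOINC code for systolic blood pressure
DIASTOLIC = "8462-4"  # LOINC code for diastolic blood pressure

def blood_pressure_timeline(bundle_json):
    """Extract (time, systolic, diastolic) tuples from a FHIR Bundle of
    Observation resources, sorted by observation time."""
    bundle = json.loads(bundle_json)
    rows = []
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        if obs.get("resourceType") != "Observation":
            continue
        readings = {}
        for comp in obs.get("component", []):
            for coding in comp.get("code", {}).get("coding", []):
                if coding.get("code") in (SYSTOLIC, DIASTOLIC):
                    value = comp.get("valueQuantity", {}).get("value")
                    readings[coding["code"]] = value
        if SYSTOLIC in readings and DIASTOLIC in readings:
            rows.append((obs.get("effectiveDateTime"),
                         readings[SYSTOLIC], readings[DIASTOLIC]))
    # ISO 8601 timestamps in the same format sort correctly as strings
    return sorted(rows, key=lambda r: r[0] or "")

def _bp_observation(when, systolic, diastolic):
    """Build a minimal made-up blood pressure Observation for the example."""
    def comp(code, value):
        return {"code": {"coding": [{"system": "http://loinc.org", "code": code}]},
                "valueQuantity": {"value": value, "unit": "mmHg"}}
    return {"resource": {"resourceType": "Observation",
                         "effectiveDateTime": when,
                         "component": [comp(SYSTOLIC, systolic),
                                       comp(DIASTOLIC, diastolic)]}}

example_bundle = json.dumps({
    "resourceType": "Bundle",
    "entry": [_bp_observation("2016-02-01", 130, 85),
              _bp_observation("2016-01-01", 120, 80)],
})
timeline = blood_pressure_timeline(example_bundle)
```

In a deployment like the one described, the Bundle would come from a search against the FHIR server rather than from a literal, and the resulting rows would feed a time-series plot.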

76

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH

Austin Huang1, Dmitri Bichko1, Mathieu Boespflug2, Edsko de Vries3, Facundo Dominguez2, Daniel Ziemek1

1Pfizer, 2Tweag I/O, 3Well-Typed

Austin Huang

Researchers need to aggregate contextual biological information in order to interpret experimental and clinical study results. These needs vary greatly depending on the scientific question. Creating large-scale, structured data repositories requires substantial investment that is not amenable to the rapidly evolving needs of translational research. On the other hand, although performing data analyses using ad hoc collections of local data files (Excel sheets, CSV tables, etc.) allows rapid and flexible execution, it also creates technical debt. In the long term, these workflows result in missed opportunities to accumulate institutional knowledge and are associated with poor reproducibility. We have implemented a data platform that can achieve the benefits of a more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production through a simple, Git-like ingestion tool. Data scientists can retrieve data via familiar client language APIs such as dplyr in R. The platform is built on open source database (Postgres, with an architecture that allows alternative backends) and functional programming (Haskell, PostgREST) technologies. Our objective is to accelerate data sharing and discoverability within analyst teams and drastically reduce the effort of persisting data in a systematic way. We therefore provide a technology foundation for rapid data service production and for improving reproducibility and reusability in data analyses.

77

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES

Jeremie Kim1, Damla Senol1, Hongyi Xin2, Donghyuk Lee1,3, Mohammed Alser4, Hasan Hassan5, Oguz Ergin5, Can Alkan4, Onur Mutlu1,6

1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 3NVIDIA Research, Austin, TX; 4Department of Computer Engineering, Bilkent University, Ankara, Turkey; 5Department of Computer Engineering, TOBB University of Economics and Technology, Söğütözü, Ankara, Turkey; 6Department of Computer Science, Systems Group, ETH Zürich, Switzerland

Jeremie Kim

High-throughput sequencing (HTS) technology has resulted in a massive influx of available genetic data. Using HTS technology, genomes are sequenced relatively quickly and result in many short DNA sequences (reads) that are used to analyze the donor’s genome, an analysis that takes multiple days when using state-of-the-art methods. The first step of genome analysis, read mapping, determines origins for billions of reads within a reference genome to identify the donor’s genomic variants. Hash-table based read mappers are a common type of comprehensive read mappers. They operate by fetching, from a pre-generated hash-table, potential mapping locations of a read in the reference genome; these locations are then verified by local alignment, a computationally expensive dynamic programming algorithm that determines similarity between the read and the potential mapping segment of the reference genome. Alignment has traditionally been the computational bottleneck of read mapping, but recently, many works have proposed a new step called Location-Filtering in order to alleviate this bottleneck.

Location-Filtering is a critical step where many incorrect potential locations from the hash-table are discarded before local alignment verifies such locations. FastHASH, SHD, and GateKeeper propose variations of Location-Filtering that discard only incorrect locations to reduce end-to-end runtime of hash-table based read mapping. Location-Filtering is now the computational bottleneck of read mapping.

Our goal is to create an efficient Location-Filter that quickly discards as many false positive locations as possible before alignment, while retaining a zero false negative rate. Efficiently filtering incorrect mappings before alignment significantly improves throughput and latency of hash-table based read mapping. We propose a novel filtering algorithm that quickly eliminates from consideration reference genome segments where alignment would yield no matches. Our algorithm’s novelty mainly stems from its design to exploit 3D-stacked memory systems. 3D-stacked memory is an emerging technology that tightly integrates computation and high-capacity memory in a single die stack, thereby enabling concurrent processing of large data chunks at low latency and high bandwidth. The key ideas of our design consist of 1) a new representation of coarse-grained reference genome segments such that the genome can be operated on in parallel using bitwise operations and 2) exploiting the parallel computation capability of 3D-stacked memory to run massively-parallel in-memory operations on the new genome representation. We call our resulting filter the GRIM-Filter.

This work shows how GRIM-Filter can be used with any hash-table based read mapping algorithm and how it effectively exploits processing-in-memory capabilities of 3D-stacked memory. We show that when running with 5% error tolerance, GRIM-Filter reduces false positive locations by 5.59x-6.41x and provides a 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH.
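The idea of representing coarse-grained reference segments so that they can be tested with bitwise operations can be sketched in software. The version below is an illustration under stated assumptions, not the authors' design. It assumes each segment ("bin") stores one bit per possible short token (k-mer), and that a candidate location is kept when enough of the read's tokens are present in the bin. The token length, bin size, overlap, threshold, and all names are hypothetical, and the sequential loop stands in for what the abstract describes as parallel in-memory operations.

```python
from itertools import product

K = 3  # token length, chosen small so the example stays readable (assumption)
TOKEN_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def tokens(seq):
    """All overlapping K-length substrings of seq."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def build_bin_bitvectors(reference, bin_size, overlap):
    """One bitvector (a Python int) per reference bin; bit t is set when
    token t occurs in that bin. Bins are extended by `overlap` bases so that
    reads straddling a bin boundary are still covered."""
    bitvectors = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + overlap]
        bits = 0
        for tok in tokens(segment):
            bits |= 1 << TOKEN_INDEX[tok]
        bitvectors.append(bits)
    return bitvectors

def passes_filter(read, bin_bits, min_hits):
    """Keep a candidate location only if at least min_hits of the read's
    tokens are present in the bin's bitvector."""
    hits = sum((bin_bits >> TOKEN_INDEX[tok]) & 1 for tok in tokens(read))
    return hits >= min_hits

reference = "ACGTACGTTTGACCAGGATC"
bins = build_bin_bitvectors(reference, bin_size=10, overlap=4)
read = "ACGTTTG"  # taken from the first half of the reference
keep_bin0 = passes_filter(read, bins[0], min_hits=4)
keep_bin1 = passes_filter(read, bins[1], min_hits=4)
```

Here the read passes for the first bin and is rejected for the second, so alignment would only be attempted once. The threshold controls the trade-off: one edit can change at most `K` of a read's tokens, so lowering `min_hits` by `K` per tolerated edit is one way to avoid discarding a true location in this simplified model.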

78

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL

Melissa E. Ko1,2, Charis Teh3,4, Christopher S. Playter5, Eli R. Zunder6, Daniel H. Gray4,7, Wendy J. Fantl8, Sylvia K. Plevritis9, Garry P. Nolan2

1Cancer Biology Program, Stanford School of Medicine, Stanford, CA; 2Baxter Laboratory for Stem Cell Biology, Stanford School of Medicine, Stanford, CA; 3Molecular Genetics of Cancer Division, Immunology Division, The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 4Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia; 5Department of Biological Sciences, Purdue University, Lafayette, IN; 6Department of Biomedical Engineering, University of Virginia, Charlottesville, VA; 7The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 8Department of Obstetrics and Gynecology, Stanford School of Medicine, Stanford, CA; 9Department of Radiology, Stanford School of Medicine, Stanford, CA

Melissa Ko

Survival rates for B cell malignancies have steadily improved over the last five decades reaching levels of over 50% as a result of therapeutic agents such as dexamethasone, bortezomib, and lenalidomide. However, despite their success in producing clinical responses, the cellular mechanisms by which these agents kill tumor cells are poorly understood. We hypothesized that the Bcl-2 family of proteins, which are known to control initiation of apoptosis and are frequently dysregulated in cancerous B cells such as multiple myeloma, can influence responsiveness to these therapeutic agents. Thus, with a focus on multiple myeloma, we aimed to comprehensively profile individual cells for their expression levels of Bcl-2 family members simultaneously with activated intracellular signaling proteins upon exposure of cells to drugs used to treat B-cell malignancies. We applied single-cell mass cytometry to investigate the interplay of pro-survival and pro-apoptotic Bcl-2 family members in MM1S B lymphoblastic cells exposed to different drugs. This dataset was analyzed with FLOW-MAP, a computational tool developed in the Nolan Lab that organizes high-dimensional single-cell data into an interpretable 2D graph structure. FLOW-MAP enabled the apoptotic progression of individual cells to be visualized and showed changes in expression levels of Bcl-2 family members and signaling factors across cells with different drug sensitivities. Our extensive study revealed heterogeneous responses of cell subsets to therapeutic agents used to treat multiple myeloma patients. For example, our results showed that bortezomib, a proteasome inhibitor approved for treatment of multiple myeloma, potently induces apoptosis within 24 hours to a greater extent compared to other treatments. Induction of apoptosis in single cells treated with bortezomib coincided with a selective reduction of a subset of pro-survival Bcl-2 members. Furthermore, our analysis suggests that a metric that reflects the balance of pro-survival and pro-apoptotic Bcl-2 proteins may best separate and predict cells with differential sensitivity to bortezomib. This paradigm is supported by statistical modeling wherein we developed a classifier of bortezomib-resistant vs. sensitive cells using Bcl-2 family information or a single Bcl-2 score with significant accuracy. Our study provides a general framework for understanding differential sensitivity of tumor populations to anti-cancer drugs. Our results are likely to identify previously unknown death-inducing mechanisms as well as pinpoint potential synergies between standard-of-care therapies and newly developed therapies, such as Bcl-2 family inhibitors.

79

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE

Emily K. Mallory, Chris Re, Russ B. Altman

Stanford University

Emily Mallory

A complete repository of biomedical relationships is key for understanding the processes underlying both human disease and drug response. After decades of experimental research, the majority of known biomedical relationships exist solely in textual form in the literature and are thus computationally inaccessible. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the rapid growth of the biomedical literature. To address the need for biomedical relationship extraction, there have been numerous biological entity and relationship extraction challenges; however, extraction systems in the biomedical space tend to be task specific and do not provide a general framework for quickly developing future systems to address new extraction tasks. In this work, we developed multiple entity and relationship applications (called “extractors”) for the system DeepDive to extract biomedical relationships from full text articles. DeepDive is a trained system for extracting information from a variety of sources, including text. Application developers create features and training examples, and DeepDive assigns a probability that a given entity or relationship is correct or true in the original sentence. We developed entity extractors for genes, drugs, and diseases; and relationship extractors for gene-gene, gene-disease, and gene-drug relationships. We evaluated the gene-gene work previously with a corpus of articles from three PLOS journals, and we are currently evaluating the other two relationship extractors on a corpus from PubMed Central. The precision of our entity extractors ranged from 80 to 90%. For the task of extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions previously curated by the Database of Interacting Proteins (DIP). For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Our current gene-disease and gene-drug extractors achieved over 70% precision on a random subset of documents from over 340,000 full text articles in the PubMed Central Open Access Subset. We are currently tuning these extractors to increase performance. This work will enable not only full text literature extraction for biomedical relationships, but also computational methods development based on these relationships.

80

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING

Serghei Mangul1, Igor Mandric2, Harry Taegyun Yang1, Dennis Montoya1, Nicolas Strauli3, Jeremy Rotman1, Benjamin Statz1, Will Van Der Wey1, Alex Zelikovsky2, Roberto Spreafico1, Maura Rossetti1, Sagiv Shifman1, Mark Ansel3, Noah Zaitlen3, Eleazar Eskin1

1University of California Los Angeles, 2Georgia State University, 3University of California San Francisco

Serghei Mangul

Assay-based approaches provide a detailed view of the adaptive immune system by profiling T- and B-cell receptors. However, these methods come at a high cost and lack the scale of regular RNA sequencing (RNA-seq). We developed ImReP, a novel computational method that utilizes RNA-seq data to profile the adaptive immune repertoire. ImReP is able to quantify individual immune responses from RNA-seq data based on a recombination landscape of genes encoding B- and T-cell receptors. We applied ImReP to 8,555 samples from 544 individuals and 53 diverse human tissues, and constructed the complementarity determining region 3 (CDR3), which is the most variable part of the antigen-binding site. We assembled 3.8 million distinct CDR3 sequences. Analyzing this dataset, we identified the normal, healthy, adaptive immune profile for different tissues. We describe the variation in immune profiles, and the distribution of clonal lineages across individuals and tissues. Based on the immune profiles generated by ImReP, we were able to identify inflammation and various diseases, as confirmed from the histological images. The atlas of T and B cell repertoires, freely available at https://sergheimangul.wordpress.com/atlas-of-t-and-b-cell-repertoires/, is the largest resource in terms of the number of CDR3 sequences and tissue types involved. We anticipate this resource to enhance future studies in areas such as immunology and advance development of therapies for human diseases. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.

81

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL

Neil Miller1, Greyson Twist1, Byunggil Yoo1, Andrea Gaedigk2

1Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City; 2Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City

Neil Miller

Advances in high-throughput DNA sequencing have enabled the comprehensive identification of individual genetic variation on an unprecedented scale, powering the diagnosis of disease and personalized treatment. As the ability to detect genetic variation has grown, clinicians and researchers struggle to interpret the functional significance of the millions of variants found in each individual genome. The Variant Warehouse at the Center for Pediatric Genomic Medicine at Children’s Mercy, Kansas City, is a resource containing a record of over 160 million genomic variants detected in more than 5000 patients sequenced by the Center since 2011. Each variant has been characterized by the CPGM’s Rapid Understanding of Nucleotide Effect Software (RUNES) pipeline, which records database cross references, predicted functional consequences and a variant classification score (1-5) based on preliminary guidelines from the American College of Medical Genetics and Genomics (ACMG). Additionally, a local allele frequency is calculated for each variant every 6 hours, enabling clinicians and researchers to rapidly identify rare variants. Despite extensive cross-referencing with databases such as dbSNP, ClinVar, ExAC and COSMIC, the CMH variant warehouse contains a significant number of novel variants not present in external databases. 59% of the total variants in the warehouse are novel with a local allele frequency of less than 0.25%. Of these, 1% are category 1-3 variants expected to have some functional impact. We have observed 82,578 variants among a panel of 58 pharmacogenes (including CPIC genes), of which 59% are novel and 2% are category 1-3 variants. The amount of novelty observed in this patient population suggests that efforts to comprehensively catalog human variation remain a work in progress and that the use of variant data will require some level of interpretation of novel variants for the foreseeable future. These observations are increasingly relevant in pharmacogenomics applications where drug compatibility is determined through association to known haplotypes; in this context, the presence of novel and rare variants must be anticipated and accounted for in automated haplotype determination. The CMH variant warehouse is publicly available at http://warehouse.cmh.edu. Tools to search and view variants by gene, category and allele frequency are provided as well as bulk downloads of data. Programmatic access to data is provided through implementations of the Global Alliance for Genomics and Health variant annotation API.

82

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE

Vikas Pejaver1, Lilia M. Iakoucheva2, Sean D. Mooney3, Predrag Radivojac1

1Department of Computer Science and Informatics, School of Informatics and Computing, Indiana University Bloomington; 2Department of Psychiatry, University of California San Diego; 3Department of Biomedical Informatics and Medical Education, University of Washington Seattle

Predrag Radivojac

Over the past decade, several methods have been developed for the computational prioritization of missense mutations. However, the identification of the effects of such mutations on protein structure and function still remains a major challenge. Previously, we developed MutPred, a random forest-based model for the classification of pathogenic missense variants and the automated inference of molecular mechanisms of disease. Here, we build on our previous work and present MutPred2 as an improved approach for these tasks. For pathogenicity prediction, MutPred2 particularly benefits from a larger and heterogeneous training set, the inclusion of new features, the encoding of local sequence context and the use of a neural network ensemble. Through cross-validation experiments and a test on an independent dataset, we show that MutPred2 outperforms MutPred and other state-of-the-art methods. In particular, we observe that MutPred2 predicts fewer pathogenic mutations than PolyPhen-2 when applied to homozygous mutations from healthy individuals. Additionally, MutPred2 has over 50 built-in structural and functional property predictors, which greatly increase the number of possible downstream consequences that can be associated with a given amino acid substitution. We introduce a novel ranking approach that utilizes a positive-unlabeled learning framework to derive posterior probabilities for the disruption of these properties and, thus, infer the most likely molecular mechanism of pathogenicity. We then demonstrate the utility of MutPred2 in two situations. First, we identify prominent structural and functional signatures in a dataset of mostly Mendelian diseases (from MutPred2’s training set) and recapitulate known associations between these diseases and ordered and structured regions of proteins. We also make novel predictions about the role of allosteric residues in such diseases. Second, we apply MutPred2 to a dataset of de novo mutations from patients diagnosed with neuropsychiatric disorders, along with healthy siblings as controls. On this dataset, MutPred2 pathogenicity scores alone are sufficient to distinguish between neuropsychiatric cases and controls, without any additional gene-based or variant-based filtering. We also observe that disruptions in protein-protein interactions (PPIs), phosphorylation and acetylation are frequent mechanisms, suggesting that neuropsychiatric disorders are largely characterized by a breakdown in molecular signaling. Finally, we identify candidate mutations predicted to disrupt PPIs and validate them experimentally.

83

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY

Sergei Pond1, Steven Weaver1, Joel Wertheim2, Andrew J. Leigh Brown3

1Temple University, 2University of California San Diego, 3University of Edinburgh

Sergei Pond

Many pathogens, including HIV, propagate along sexual and social contact networks. It is now clear that HIV transmission networks belong to the scale-free family and the spread of infections in scale-free networks is critically enhanced by highly connected individuals or “hubs”. The structure of the transmission network has major implications for interrupting an epidemic. Since pathogen transmission networks are not observed directly, they are inferred and characterized based on indirect measurements, and methods to do this properly remain an open research challenge. Because of their rapid and host-specific evolution and chronic disease states, HIV sequence isolates are essentially unique to each infected person. This sequence uniqueness can be used to confirm or reject the hypothesis that two individuals are “linked” by a recent transmission or belong to the same transmission cluster. There are ~1,000,000 HIV sequences isolated from different individuals over the last 4 decades. National and international surveillance and drug resistance programs are generating high resolution sequencing data on hundreds of thousands of isolates annually. We developed HIV Transmission Cluster Engine (HIV-TRACE) in order to make the process of cluster (and network) inference automated, fast, convenient, and more robust. It is an efficient open-source application designed to scale well and enable near real-time inference and analysis of large networks: it can process 100,000 sequences in ~15-30 minutes on a 64 core backend system. HIV-TRACE (hiv-trace.org) is an open-source web application built on robust and popular modern libraries. User interaction and result visualization are done entirely in the browser; processing is done asynchronously on a server backend. Components and versions of HIV-TRACE are used by the CDC (VARS, HICSB), Canadian public health officials, NYC Department of Public Health, San Diego primary infection cohort, and the UK Drug Resistance Database. We illustrate the utility of HIV-TRACE on four real-world examples of essential questions in public health and epidemiology of HIV-1: 1) Are there rapidly growing transmission clusters, and what is driving their growth? 2) How does HIV spread at different geographic scales, and among different risk groups? 3) How can treatment and intervention be deployed in optimal ways to reduce incidence and prevalence? 4) Can vaccine and prevention efficacy be measured more accurately using network-level information?
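The notion of linking individuals by sequence similarity and grouping them into transmission clusters can be sketched as follows. This is an illustration only: the abstract does not specify the distance measure or threshold used by HIV-TRACE, so the sketch assumes a plain mismatch proportion between pre-aligned sequences and a hypothetical threshold, and it compares all pairs, which would not scale to the volumes quoted above. The names and toy sequences are made up.

```python
def pairwise_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def transmission_clusters(sequences, threshold):
    """Link every pair of sequences whose distance is at most `threshold`
    and return the connected components that have two or more members."""
    ids = list(sequences)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if pairwise_distance(sequences[a], sequences[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Toy aligned sequences for four hypothetical individuals
sequences = {
    "P1": "ACGTACGTAC",
    "P2": "ACGTACGTAT",  # one difference from P1
    "P3": "ACGTACGAAT",  # one difference from P2, two from P1
    "P4": "TTTTGGGGCC",  # unrelated
}
clusters = transmission_clusters(sequences, threshold=0.1)
```

P1 and P3 are not directly linked at this threshold, but both link to P2, so all three fall into one cluster. This is how a chain of putative transmissions appears as a single connected component.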

84

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY

Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

Dart NeuroScience

Douglas Fenger

We are interested in discovering new candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s. To achieve our goal we need a comprehensive and objective understanding of the human genome contribution to variation in memory performance in healthy individuals. We are implementing a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test, the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com), to conveniently screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. Identified subjects (1) are validated by a battery of secondary memory tasks, and (2) provide saliva samples from which we can isolate DNA for GWAS. Ten pilot experiments were conducted to parameterize the XMC screen. Participants learned face-name pairs for a delayed recall test. After initial study, each name was presented and participants were asked to select the correct face among four (distracters were other faces paired with different names). One day later participants completed a final test trial. We are primarily interested in forgetting across sessions, as this provides an estimate of consolidation across a 24-hour time interval. Pilot studies indicated the optimal protocol should include 30 face-name pairs, presented at a 4-second rate. To date, 17,849 participants from 176 nations have been screened in the XMC. Of these, 11,311 have completed both sessions. Individuals in our sample are most frequently Caucasian (55%), post-secondary school-educated (63%), most alert in the morning by self-report (51%), and right-handed (89.5%). The average age was 34, and the gender distribution was split evenly. The forgetting rate (decrease in performance from day 1 to day 2) was 10%. We have identified 49 individuals with perfect performance on day 2 of the test and 24 with exceptional consolidation abilities (defined as 3 SDs from the mean). We have begun the genomics phase of the study with 33 individuals who have completed additional behavioral testing.

85

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS

Yingxue Ren1, Joseph S. Reddy1, Vivekananda Sarangi2, Jason P. Sinnwell2, Steve G. Younkin3, Nilüfer Ertekin-Taner3, Owen A. Ross3, Rosa Rademakers3, Shannon K. McDonnell2, Joanna M. Biernacka2, Yan W. Asmann1

1Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL; 2Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 3Department of Neuroscience, Mayo Clinic, Jacksonville, FL

Yingxue Ren

Identifying novel disease variants through next generation sequencing (NGS) has been a fruitful practice in medical research in recent years, leading to the discoveries of new disease mechanisms as well as therapeutic strategies. The GATK best practices have since been established to provide general recommendations on core processing steps required to go from raw reads to final variant call sets. However, with the sample size drastically increasing in today’s sequencing experiments, many default variant calling strategies and the choice of tools call for a closer examination. Our study utilized the whole exome sequencing data provided by the Alzheimer's Disease Sequencing Project (ADSP) to test different variant calling strategies and tools involved in the variant discovery workflow in the context of sample sizes. We first investigated the impact of using different sequence aligners on variant call sets while keeping the default GATK settings of the variant calling and QC steps identical. We selected 1952 samples to align by both BWA and NovoAlign, and compared the variant call sets in 50, 100, 200, 500, 1000 and 1952 samples. We discovered that the percentage of variants unique to one aligner increased dramatically with increasing sample sizes. At a sample size of 1952, the unique variants generated by BWA and NovoAlign account for more than 20% of total called variants. These unique variants have good variant quality metrics: ~80% have a Genotype Quality (GQ) score of 60 or above, and their distribution of B allele concentration (BAC) centers around 0.5 and 1, consistent with what is expected of diploid genomes. What’s more, over 96% of the unique variants have a population B allele frequency (BAF) of less than 0.01, indicating that these variants are rare in the population. All these metrics suggest that these unique variants are important to include in downstream variant analysis. In addition to aligner comparison, we also evaluated single-sample variant calling versus the default strategy, single-sample variant calling followed by joint multi-sample genotyping, in 50, 100, 500, 2000, and 5000 samples. Our data showed that, with increasing sample sizes, the single-sample calling strategy added an increasing percentage of unique variants. At a sample size of 5000, single-sample calling added 58,884 variants, accounting for 5.55% of total variants called by both strategies. 7331 of these unique variants passed Variant Quality Score Recalibration (VQSR) and have GQ of 60 or above in at least 5 samples. Our study identified a large number of good-quality variants from the ADSP exome sequencing project that were missed by using one aligner or using the multi-sample genotyping strategy alone. Our findings revealed the relationships between bioinformatics pipelines and biomedical research results, and suggested that alternative variant calling strategies may be beneficial for optimal variant discovery in the face of today’s large sequencing scale.

86

TOWARDEFFECTIVEMICRORNAQUANTIFICATIONFROMSMALLRNA-SEQ

Pamela Russell1, Richard Radcliffe2, Brian Vestal1, Wen Shi1, Pratyaydipta Rudra1, Laura Saba2, Katerina Kechris1

1Department of Biostatistics and Informatics, Colorado School of Public Health; 2Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences

Pamela Russell

Extensive work has led to robust quantification methods for RNA-seq data primarily derived from large RNAs. Many studies have used these methods “out of the box” to estimate microRNA (miRNA) expression from small RNA-seq data. However, these methods do not effectively address issues particular to miRNAs. First of all, reference bias is amplified due to the small size of sequencing reads derived from miRNAs (~22 nt). That is, with shorter reads, a true mismatch between a sample and the reference can lead to incorrect alignments or inability to align reads at all, creating a count bias toward those samples with the reference allele. With longer reads, single mismatches have less impact on alignment algorithms. Second, any bias for individual miRNAs is more impactful overall due to the relatively small repertoire of miRNAs compared to mRNAs. Inaccurate counts for a handful of miRNAs can significantly alter overall library counts and thus affect normalization. We refer to this issue as repertoire bias. Also, most miRNA studies seek to identify functional mature miRNA molecules regardless of the position in the genome that they are originally transcribed from or small non-functional differences between miRNAs of the same family. Tools designed for large RNAs do not address the repetitive nature and family structure of miRNAs, by default returning estimated counts for multiple targets that should be considered equivalent by typical miRNA study paradigms. Genome-based methods often map miRNA reads to multiple loci encoding the same mature miRNA. Methods based on mapping directly to a miRNA database do not suffer from multiple alignments due to identical regions of the genome but do typically distinguish among members of each miRNA family. Both sources of multiple mappings can lead to misleading counts when the goal is to elucidate function. Here we explore all these issues in the context of commonly used methods. We then propose a new high throughput approach that (1) incorporates individual genetic variation into the reference sequence used for alignment, reducing reference bias, and (2) assigns each read to a single functional group such as a miRNA family. We demonstrate the accuracy of this approach compared to other popular methods using a dataset derived from 206 mouse brain samples. Funded by NIH/NIAAA AA016597, R01AA021131 and R24AA013162.
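As a rough illustration of step (2), assigning each read to a single functional group, the sketch below collapses per-read miRNA assignments into one count per family. It is not the authors' implementation: the function name, the read labels, and the miRNA-to-family mapping are hypothetical, and a real pipeline would take its family definitions from a curated miRNA database.

```python
from collections import defaultdict


def collapse_counts_by_family(read_assignments, mirna_to_family):
    """Sum per-read miRNA assignments into one count per functional group."""
    counts = defaultdict(int)
    for mirna in read_assignments:
        # Hypothetical fallback: a miRNA with no known family is its own group.
        family = mirna_to_family.get(mirna, mirna)
        counts[family] += 1
    return dict(counts)


# Toy input: each entry is the mature miRNA a read aligned to (illustrative only).
reads = ["mmu-let-7a-5p", "mmu-let-7b-5p", "mmu-let-7a-5p", "mmu-miR-124-3p"]
families = {
    "mmu-let-7a-5p": "let-7",
    "mmu-let-7b-5p": "let-7",
    "mmu-miR-124-3p": "miR-124",
}
family_counts = collapse_counts_by_family(reads, families)
```

Counting at the family level means reads that differ only by which family member they matched contribute to a single total, which is the behavior the abstract describes as desirable when the goal is to elucidate function.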

87

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS

Damla Senol1, Jeremie Kim1, Saugata Ghose1, Can Alkan2, Onur Mutlu1,3

1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; 2Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; 3Department of Computer Science, Systems Group, ETH Zürich, Switzerland

Damla Senol

Nanopore sequencing, a promising single-molecule DNA sequencing technology, exhibits many attractive qualities and, in time, could potentially surpass current sequencing technologies. Nanopore sequencing promises higher throughput, lower cost, and increased read length, and it does not require a prior amplification step. Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification and measure the change in the ionic current as long strands of DNA (ssDNA) pass through the nano-scale protein pores.

Biological nanopores for DNA sequencing were first proposed in the 1990s, but the technology became commercially available only recently, in May 2014, through Oxford Nanopore Technologies (ONT). The first commercial nanopore sequencing device, MinION, is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that produces real-time data. These properties enable new potential applications of genome sequencing, such as rapid surveillance of Ebola, Zika or other epidemics, near-patient testing, and other applications that require real-time data analysis. In addition, this technology is capable of generating very long reads (~50,000 bp) with minimal sample preparation. Despite all these advantageous characteristics, it has one major drawback: high error rates. In order to provide higher accuracy and higher speed, in May 2016, ONT released a new version of MinION with a new nanopore chemistry called R9, which replaced the previous version R7. Although R9 chemistry improves the data accuracy, the tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology.

Our goal in this work is to comprehensively analyze tools for nanopore sequence analysis, with a focus on understanding the advantages, disadvantages, and bottlenecks of the various tools. To this end, we rigorously examine multiple steps in the nanopore genome analysis pipeline. The first step, basecalling, translates the raw signal output of MinION into nucleotides to generate DNA sequences. Currently, Nanocall and Nanonet are publicly available nanopore basecallers. The second step performs genome assembly with assemblers for noisy long reads. Using only the basecalled DNA reads, assemblers generate longer contiguous fragments called draft assemblies. Currently, Canu and Miniasm are the commonly used long-read assemblers. After this step, an improved consensus sequence is generated from the draft assembly with Nanopolish, and a complete whole genome is obtained.

We analyze the five aforementioned nanopore sequencing tools in terms of their speed and accuracy, with the goals of determining their bottlenecks and finding improvements to these tools. We also discuss potential future work in nanopore basecallers and assemblers, to take better advantage of nanopore sequencing and to overcome its current disadvantage of high error rates.

88

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER

Kyle Smith1, Subhajyoti De2, Debashis Gosh1

1University of Colorado, 2Rutgers University

Kyle Smith

Outliers, which are very different from the typical cases in a cohort, bring in unexpected challenges for decision making in many different disciplines. The issue is more acute in oncology, since most types of cancer are highly heterogeneous diseases. Even within any cancer subtype, patients show extensive variation in their molecular profiles and clinical outcomes. Even within a cohort of cancer patients who have apparently the same biomarkers and received identical treatment, there are exceptional responders and exceptional non-responders, who are outliers. It is suspected that their atypical molecular and clinical profiles contribute to their exceptional response. While identifying such outlier cases can benefit precision medicine initiatives, methods to detect them from multidimensional data have received limited attention. Here, we propose a novel framework to identify outlier cancer patients with atypical profiles from multidimensional genomic data. We argue that detection of outlier patients with atypical profiles can help identify exceptional responders and tailor precision medicine in oncology initiatives.

89

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS

Abiodun Otolorin1, Nana Osafo2, William Southerland2

1Department of Community and Family Medicine, Howard University, Washington, DC; 2Department of Biochemistry & Molecular Biology and the Center for Computational Biology and Bioinformatics, Howard University, Washington, DC

William Southerland

Despite the widespread adoption of electronic medical record systems and advances in genomics, a major barrier to research endeavors is the lack of intuitive, user-friendly interactive tools that enable researchers to access and analyze data readily. In light of this, innovative tools have been developed to address the problem. However, we hypothesized that an interactive data visualization tool that is capable of stand-alone or plugin functionality and that also leverages common data query methodologies would contribute to research efforts requiring interrogation of clinical research databases. Howard University Hospital (HUH) is a tertiary academic medical center with over 50,000 emergency department visits and 8,000 inpatient admissions per year and primarily provides care to the minority population in the District of Columbia metropolitan area. Using de-identified HUH electronic medical records data, a HUH clinical research database was developed. Additionally, the Howard University electronic Medical Records (HUeMR) query tool was developed as a web-based client-server application using JavaScript and PHP. HUeMR may function in stand-alone or plugin mode. Its graphical interface was built using Google Charts, an interactive open source visualization library. HUeMR supports complex boolean search operations specified by an interactive query tool. Ontology is presented using linked dropdown menus and query construction is displayed in natural language form. Data is displayed using editable interactive charts. Multiple rows of charts may be created that contain different types of data concepts. Queries may be refined by clicking on the charts followed by selection of one or more additional query parameters. Diagnosis based on ICD codes or keywords may also be searched. These features are illustrated in a diabetes use-case investigation. In summary, HUeMR is a secure data analytics tool that can be used in stand-alone or plugin mode to query clinical research databases. It has a highly interactive user interface that allows rapid data analysis for cohort discovery. This work was supported by grant #5G12MD007597 from the National Institute on Minority Health and Health Disparities from the NIH.

90

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY

Kun-Hsing Yu1, Gerald J. Berry2, Daniel L. Rubin1, Christopher Ré3, Russ B. Altman1, Michael Snyder4

1Biomedical Informatics Program, Stanford University; 2Department of Pathology, Stanford University; 3Department of Computer Science, Stanford University; 4Department of Genetics, Stanford University

Kun-Hsing Yu

Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable to its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. To address this problem, we obtained hematoxylin and eosin stained whole-slide histopathology images, pathology reports, RNA-sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas. We profiled gene expression, protein expression and modifications, and extracted more than 9,000 objective features from the histopathology images of each patient. We successfully predicted histology grade with transcriptomics and proteomics signatures (area under curve > 0.75) and identified the associated molecular pathways, such as cell cycle regulation, which provide biological insights into tumor cell differentiation grades. We further built an integrative histopathology-transcriptomics model to generate superior prognostic predictions for stage I patients (P < 0.01) compared with gene expression or histopathology analysis alone. These results suggest that the integration of histopathology and omics studies can reveal the molecular mechanisms of pathology findings and enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of malignancy or diseases.

91

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA

Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

Institute of Medical Science, University of Tokyo

Yao-zhong Zhang

Copy number variations (CNVs) are an important type of genetic variation widely used for profiling cancer and other complex diseases. Accurate detection and summarization of CNVs help identify oncotargets and cancer subtypes for precision medicine. When NGS data are used for CNV detection, various heterogeneous biases, such as GC-content bias and other noise, need to be properly processed. This becomes especially important for CNV detection on single-cell NGS data. In this study, we extend traditional HMM approaches for CNV detection with deep learning. We extract a feature representation, which integrates the information from read counts and observable genomic sequences, as the new observable sequence of genomic bins and iteratively train a DNN-HMM model for CNV detection. We compare our method with other HMM-based CNV detection methods.

92

IMAGING GENOMICS

POSTER PRESENTATIONS

93

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA

Dongdong Lin1, Vince D. Calhoun2, Juan R. Bustillo3, Nora Perrone-Bizzozero4, Jingyu Liu1

1The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque; 2Dept. of Electronic and Computer Engineering, University of New Mexico, Albuquerque; 3Dept. of Psychiatry, University of New Mexico, Albuquerque; 4Dept. of Neurosciences, University of New Mexico, Albuquerque

Jingyu Liu

Epigenetic regulation by DNA methylation and histone modification has been increasingly recognized for its relevance to schizophrenia (SZ). Beyond the genetic variation, epigenetics through regulation of gene transcription and expression can potentially explain the ‘missing’ heritability and mediate the effect of genetic risks in disease. Specific to DNA methylation, recent studies have demonstrated that 6-7% of CpG sites across the genome show significant correspondence between brain and blood, supporting the investigation of easily accessible tissues for brain and mental disorders. In this study, we analyzed DNA methylation of 163 CpG sites from saliva and whole brain gray matter density of 108 SZ patients and 105 healthy controls. We are aware of cellularity differences between blood and saliva, and to our best knowledge no detailed saliva-brain correspondence study has been done except a general comparison of overall patterns, which indicates that saliva may be a closer indicator of brain than blood. The 163 CpG sites are located within the 108 schizophrenic risk regions reported by the Psychiatric Genomics Consortium schizophrenia working group, and also showed strong cross-tissue similarity based on the genome-wide methylation study of blood and brain tissues by Hannon, et al. Quality control and normalization for methylation data were implemented using the minfi R package to remove batch effects and cell type proportion effects. Gray matter density maps were segmented by SPM12 with a smoothing kernel of 8 mm³. We applied independent component analysis to both brain imaging data and methylation data, and extracted 25 gray matter networks and 15 methylation components. Among them, two methylation components were significantly correlated with three gray matter networks (false discovery rate < 0.05). The first methylation component comprised two CpG sites within and near gene ZSCAN12, and was associated with a bilateral middle/superior temporal network (r = 0.25) and a bilateral superior frontal network (r = -0.24). The higher the methylation component, the lower the gray matter density in the superior frontal gyrus and the higher the density in the middle temporal gyrus. Moreover, SZ patients showed significant gray matter reduction in the superior frontal gyrus (p = 7.9 × 10⁻⁵). The second methylation component consisted of CpG sites from two chromosome regions (Chr. 10 AS3MT and NT5C2 genes, and Chr. 12 ARL6IP4 and OGFOD2 genes), and was associated with caudate and thalamus regions. All analyses were controlled for age and gender. Although we did not find SZ-specific methylation differences within SZ risk regions, our results suggest that DNA methylation patterns in saliva are associated with brain gray matter variation, and some of this variation is related to schizophrenia. The main limitations of this study are 1) the lack of replication data to verify the findings, and 2) the lack of direct saliva and brain tissue correspondence verification.

94

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS

Olga V. Matveeva1, Nafisa N. Nazipova2, Aleksey Y. Ogurtsov3, Svetlana A. Shabalina3

1Biopolymer Design LLC, Acton, MA; 2Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, Russia; 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD

Svetlana Shabalina

Many techniques of molecular biology involve interaction of specific oligonucleotides with DNA or RNA as a basic step. DNA targeting of single-guided (sg) RNAs for genome editing procedures, oligonucleotide array gene expression monitoring or anti-sense-mediated gene down-regulation, and the Genomic Comparison Hybridization (GCH) array experiments are examples of techniques involving RNA-DNA and DNA-DNA interactions. RNAi approaches with siRNA and shRNA molecules are based on RNA-RNA interactions. The main problem of any oligo-probe experiment is that the specific oligo-target interaction, based on a fully paired duplex, is usually combined with non-specific parallel reactions, in which the oligo-probe can interact with many partially paired DNA or RNA sequences. The interplay between specific and genome-wide off-target interactions is poorly studied despite its crucial role in the efficacy of these techniques. In this study, we investigated which oligo-probe characteristics are responsible for the interplay and which most improve oligo-probe design. We defined specificity of interaction as a ratio between oligo-target specific and genome-wide off-target interactions. Microarray databases, derived from the GCH experiments using the Affymetrix platforms and containing two different types of probes, were used for the analysis based on the thermodynamic features and nucleotide sequences of oligo-probes. The first type of oligo-probe does not have a specific target on the genome, and its hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA, and their signals are derived from specific and cross-hybridization components combined together in a total signal. The analysis has revealed that hybridization specificity was negatively affected by low stability of the fully-paired oligo-target duplex, stable probe self-folding, G-rich content, including GGG motifs, and a low sequence Symmetrical Complexity (SC) score. The SC-score characterizes nucleotide composition symmetry and a probe’s vulnerability to off-target interactions. Filtering out the probes with these characteristics significantly increases hybridization specificity by decreasing genome-wide cross-hybridization or by increasing specific interactions. Selected oligo-probes have three times higher hybridization specificity on average, compared to the probes that were filtered out from the analysis by applying suggested cut-off thresholds to the described parameters. Multiple regression models with described parameters were successfully applied for predictions of interaction specificity and off-target effects and supported parameter choice (P < 0.001). We also compared probe characteristics selected for the analysis in microarray databases with applicable features of siRNA/shRNA design from our earlier studies. We applied all selected oligonucleotide features and described parameters to new sets of sgRNAs. Our study examined the thermodynamics and sequence-intrinsic properties of sgRNA-DNA duplexes and analyzed additional selection criteria that are critical for guide efficacy. Finally, we identify universal features of oligo-probes, si/shRNAs and guides for optimal design, including the SC-score.

95


POSTER PRESENTATIONS

96

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT

Alyssa I. Clay1, Richard M. Weinshilboum2, K. Sreekumaran Nair3, Rima F. Kaddurah-Daouk4, Liewei Wang2, Matthew K. Breitenstein1

1Division of Epidemiology, Mayo Clinic; 2Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic; 3Division of Endocrinology, Mayo Clinic; 4Duke University

Matthew Breitenstein

Background
Metformin is one of the most widely prescribed drugs worldwide and a first line treatment for type 2 diabetes mellitus (T2D). Metformin has many mechanisms of action, with varying levels of understanding. Metformin is being evaluated as a potential chemoprevention agent for cancer treatment, with inhibition of angiogenesis as one effect of metformin being strongly pursued. However, contradictory evidence exists for a potential mechanism of angiogenesis inhibition (Carcinogenesis 2014; (35)5). Building on our prior work that identified a stratum of statistically correlated metabolites, we aimed to identify overlapping metformin pharmacogenomic (PGx) SNP associations, using pharmacometabolomics-informed PGx paired with an agnostic computational approach.

Methods
To elucidate overlapping PGx signals of metformin exposure, we included metabolites (n=5) with correlated plasma concentration, adjusted for metformin exposure, in a biobank cohort-based, case-control study. Cases (n=274) were exposed to metformin monotherapy with T2D; healthy controls (n=274) had no known drug exposures. Cases and controls were matched by age and gender, and adjusted for BMI and batch. A panel of amino acid metabolite (n=42) concentrations was quantitatively measured using tandem liquid chromatography-mass spectrometry from fasting platelet-poor plasma samples collected in EDTA. Genotyping was performed using the 700k SNP Illumina OmniExpress array platform from 250 ng of DNA. Normalized metabolite concentrations were utilized as endpoints to inform genome-wide associations.

Results
Increased plasma metabolite concentrations for leucine (t=4.47, p<0.0001), isoleucine (t=4.63, p<0.0001), and valine (t=4.48, p<0.0001) were observed with exposure to metformin. Variant rs17023164 (MAF=0.31), in the Tryptophanyl tRNA Synthetase 2, Mitochondrial (WARS2) gene region of chromosome 1 and an eQTL for WARS2 in fibroblasts, was a common downward modifier of leucine (β=-11.69, p=1.79e-7), isoleucine (β=-6.99, p=2.40e-6), and valine (β=-14.55, p=1.04e-5) with metformin exposure. No SNPs in neighboring gene regions were in high LD (R^2>0.5) with rs17023164.

Conclusion
Increased plasma metabolite concentrations for leucine, valine, and isoleucine were observed with metformin exposure. A common variant, rs17023164 in WARS2, was identified as a strong downward modifier of these metabolites with metformin exposure. Independently, WARS2 is proposed as a determinant of angiogenesis (Nat Com 2016; (7)12061). We posit a hypothesis: modification of metabolite biomarker concentration associated with metformin exposure by WARS2 variants is a potential link between metformin and angiogenesis. Functional characterization of a potential mechanism for metformin inhibition of angiogenesis, modified by WARS2, is ongoing.

97

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS

Stephen V. Gliske1, Katy L. Lau1, Benjamin H. Brinkman2, Greg A. Worrell2, Cris G. Fink3, William C. Stacey1

1University of Michigan, 2Mayo Clinic, 3Ohio Wesleyan University

Stephen Gliske

Automated event detection is the result of many types of data-driven pattern recognition methods. One of the general challenges to these analyses is the quantification of and correction for false negative detections, i.e., cases where the event (pattern) is present in the data but was not detected. Estimating the false positive rate is much easier, as human review of a subsample of detected events is sufficient. However, determining the false negative rate by human review would require manually searching through the raw data, which is impractical, if not completely infeasible. This challenge is not unique to biomedical data and is commonly addressed in high energy physics. The approach is called embedding. It is applicable to any analysis where at least one of the signal or background can be modeled well by simulations. By placing specific events at known locations, one can then run the automated detector and report the fraction of embedded events that were detected. We present the first application of embedding to neurological data, specifically the automated detection of a biomarker of epilepsy (high frequency oscillations) recorded in intracranial electroencephalogram (EEG) data. The false negative rate is found to be consistent across both recording channels and patients.
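The embedding idea described above, placing simulated events at known locations and reporting the fraction the detector recovers, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the authors' EEG pipeline: the threshold detector, the flat background, the additive event model, and all function names are hypothetical stand-ins.

```python
import random


def threshold_detector(signal, threshold):
    """Toy detector: return the set of sample indices above the threshold."""
    return {i for i, value in enumerate(signal) if value > threshold}


def embedded_false_negative_rate(background, amplitude, n_events, detector, seed=0):
    """Embed simulated events at known positions and report the missed fraction."""
    rng = random.Random(seed)
    signal = list(background)
    # Choose distinct, known locations and add a simulated event at each one.
    positions = rng.sample(range(len(signal)), n_events)
    for pos in positions:
        signal[pos] += amplitude
    detected = detector(signal)
    n_found = sum(1 for pos in positions if pos in detected)
    return 1.0 - n_found / n_events


background = [0.0] * 1000  # flat stand-in for recorded data without events
detector = lambda s: threshold_detector(s, threshold=3.0)
fnr_strong = embedded_false_negative_rate(background, 5.0, 50, detector)
fnr_weak = embedded_false_negative_rate(background, 1.0, 50, detector)
```

With this toy setup, events well above the threshold are all recovered and events below it are all missed, so the two estimates bracket the possible range. In practice the background would be real recordings and the embedded events would come from a simulation of the pattern of interest.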

98

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS

Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

Stanford University

Anshul Kundaje

We present generalizable and interpretable supervised deep learning frameworks to predict regulatory and epigenetic state of putative functional genomic elements by integrating raw DNA sequence with diverse chromatin assays such as ATAC-seq, DNase-seq or MNase-seq. First, we develop novel multi-channel, multi-modal CNNs that integrate DNA sequence and chromatin accessibility profiles (DNase-seq or ATAC-seq) to predict in-vivo binding sites of a diverse set of transcription factors (TF) across cell types with high accuracy. Our integrative models provide significant improvements over other state-of-the-art methods including recently published deep learning TF binding models. Next, we train multi-task, multi-modal deep CNNs to simultaneously predict multiple histone modifications and combinatorial chromatin state at regulatory elements by integrating DNA sequence, RNA-seq and ATAC-seq or a combination of DNase-seq and MNase-seq. Our models achieve high prediction accuracy even across cell types, revealing a fundamental predictive relationship between chromatin architecture and histone modifications. Finally, we develop DeepLIFT (Deep Linear Importance Feature Tracker), a novel interpretation engine for extracting predictive and biologically meaningful patterns from deep neural networks (DNNs) for diverse genomic data types. DeepLIFT can integrate the combined effects of multiple cooperating filters and compute importance scores accounting for redundant patterns. We apply DeepLIFT on our models to obtain unified TF sequence affinity models, infer high resolution point binding events of TFs, dissect regulatory sequence grammars involving homodimer and heterodimeric binding with co-factors, learn predictive chromatin architectural features and unravel the sequence and architectural heterogeneity of regulatory elements.

99

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS

Modest von Korff, Tobias Fink, Thomas Sander

Actelion Pharmaceuticals Ltd., Allschwil, Switzerland

Modest von Korff

The relations between genes and diseases form complex patterns. Visualization of these patterns enables the scientist to obtain an overview of the most important gene–disease relations. These gene–disease relations are of high importance in drug discovery. Proteins encoded by disease-related genes are potential targets for new drugs or may become biomarkers for disease diagnosis. Both a novel drug target and a biomarker should be highly specific for the aimed disease. In our publication for this conference, we introduce a relevance estimator. This relevance estimator is a measure of the specificity of a gene–disease relationship that also takes into consideration all other known gene–disease relationships. We analyzed gene–disease relationships from 22 million PubMed records and obtained a matrix with relevance estimators for about 5000 diseases and 15,000 genes. This relevance matrix enabled us to express the similarity between diseases with simple vector-based distance measures. A meaningful disease–gene–disease visualization, consisting of several layers, was derived from these disease–disease similarity measures and the relevance estimators. The multidimensional visualizations presented here give an overview of complex diseases like asthma, Alzheimer's disease and hypertension.
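To make the phrase "simple vector-based distance measures" concrete, the sketch below treats each disease as a row of a relevance matrix (one value per gene) and compares rows with cosine similarity. Cosine similarity is an assumption chosen for illustration; the abstract does not say which measure was used, and the relevance values below are invented toy numbers, not results from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length relevance vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a disease with no gene relevance is similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(disease, matrix):
    """Return the other disease whose relevance vector is closest in angle."""
    others = (name for name in matrix if name != disease)
    return max(others, key=lambda name: cosine_similarity(matrix[disease], matrix[name]))


# Toy relevance matrix: rows are diseases, columns are three hypothetical genes.
relevance = {
    "asthma": [0.9, 0.1, 0.0],
    "hypertension": [0.1, 0.8, 0.2],
    "Alzheimer's disease": [0.0, 0.1, 0.9],
}
closest_to_asthma = most_similar("asthma", relevance)
```

A measure of this kind depends only on the pattern of relevance across genes, so diseases sharing highly specific genes end up close together regardless of how many publications mention them.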

100


POSTER PRESENTATIONS

101

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION

Steven E. Brenner1, Gaia Andreoletti1, Roger A Hoskins1, John Moult2, CAGI Participants

1University of California, Berkeley; 2IBBR, University of Maryland, Rockville, MD

Steven Brenner

The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors.

The fourth CAGI experiment concluded this year. It included 11 challenges which reflected: non-synonymous variants and their biochemical impact measured by targeted assays; noncoding regulatory variants and their impact on gene expression; research exomes for prediction of complex traits; personal genomes and trait profiles; and clinical sequences and associated referring indications.

There were notable discoveries throughout the CAGI experiment, and general themes emerged. The independent assessment found that top missense prediction methods are highly statistically significant, but individual variant accuracy is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect the predictive methods and the assays themselves). However, there might be potential for missense interpretation at the extreme of the distribution. Structure-based missense methods excel in a few cases, while evolutionary-based methods have more consistent performance. Bespoke approaches often enhance performance.

On the clinical studies, predictors were able to identify causal variants that were overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often provides undue confidence in their correlation; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis.

The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense. Beyond this, creating a genetic study that provides a reliable gold standard is remarkably difficult. However, there were notable improvements in the ability to match genomes to trait profiles.

Complete information about CAGI may be found at https://genomeinterpretation.org.

102

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C19

Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2

1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Andrea Gaedigk

Background: CYP2C9 and 19 are highly polymorphic pharmacogenes metabolizing numerous drugs. Both are genes with CPIC guidelines underscoring their clinical relevance. To facilitate haplotype calling and translation into phenotype, we have developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007) that enables automated CYP2D6 diplotype calling from whole genome sequencing. We report here the extension of Astrolabe to CYP2C9 and 2C19.

Methods: The study was approved by the Institutional Review Board of Children’s Mercy and included 85 subjects (7 HapMap; 78 patients/parents). Allele definitions are according to the P450 Nomenclature Database (cypalleles.ki.se/) with some modifications. Exons and 100 bp of flanking introns were used for Astrolabe calls as well as -2990 to -440 of CYP2C9 and -1063 to -180 of CYP2C19 harboring SNPs defining CYP2C9*8 and CYP2C19*27, respectively. All but 3 subjects were genotyped for CYP2C9*2, *3, *5 and *8 and CYP2C19*2-*4, *17, *27 and *35 using TaqMan assays to validate Astrolabe calls. WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve variation call quality. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/

Results: To maximize Astrolabe call accuracy, intron regions were adjusted to include informative SNPs while excluding those that occur on numerous haplotypes and/or are not part of a defined allele. The CYP2C9 exon 1 region, e.g., was limited to 57 bp of intron 1 to exclude 251T>C, which is present in 1155/3540 subjects (CMH variant warehouse database). This SNP defines CYP2C*29, but interfered with Astrolabe calls by overcalling CYP2C*29 in the absence of its key SNP (33437C>A). Optimized calling target regions were then used to compare Astrolabe with genotype calls. Astrolabe correctly called 68/75 (90.67%) and 71/75 (94.67%) of subjects for CYP2C9 and 19, respectively. Among the alleles detected by Astrolabe and genotyping were CYP2C9*2, *3 and *8 and CYP2C19*2, *17, *27 and *35. Astrolabe also identified subjects carrying the rare CYP2C9*9 and *11 and CYP2C19*15 alleles which were not covered by genotyping. Astrolabe correctly called 1077/1128 simulated CYP2C19 diplotypes (95% recall; 45 missed and 6 multiple calls). All missed calls were *12 called as *1. For CYP2C9, Astrolabe correctly called 2186/2278 simulated diplotypes (95% recall; 61 missed and 31 multiple calls). All missed calls were *25 called as *1.

Discussion: Astrolabe’s functionality was successfully expanded to CYP2C9 and 19. Phenotype prediction based on Astrolabe was superior over that derived from a limited genotype panel. Continued improvement and expansion of the nomenclature definitions will allow us to resolve the miscalled haplotypes represented in the simulation set and improve Astrolabe calling across all diplotypes.
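The call counts reported in the Results can be checked with a little arithmetic. The sketch below is only a worked check of the figures quoted in the abstract; the dictionary layout and helper name are illustrative choices, and recall is taken here to mean correct calls divided by total simulated diplotypes, which is an assumption about how the abstract's percentage was computed.

```python
def percent(numerator, denominator, digits=2):
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100.0 * numerator / denominator, digits)


# Simulated diplotype calls, as quoted in the abstract.
simulated = {
    "CYP2C19": {"correct": 1077, "missed": 45, "multiple": 6, "total": 1128},
    "CYP2C9": {"correct": 2186, "missed": 61, "multiple": 31, "total": 2278},
}

# Each total should equal correct + missed + multiple calls.
totals_consistent = all(
    s["correct"] + s["missed"] + s["multiple"] == s["total"]
    for s in simulated.values()
)

recall_2c19 = percent(1077, 1128, digits=1)
recall_2c9 = percent(2186, 2278, digits=1)

# Validation against TaqMan genotyping: 68/75 and 71/75 subjects.
validated_2c9 = percent(68, 75)
validated_2c19 = percent(71, 75)
```

The three call categories add up to the stated totals for both genes, and the computed recall values fall between 95% and 96%, in line with the 95% figure the abstract reports for each gene.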

103

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge


Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.

104

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA

Rachel Goldfeder, Euan Ashley

Stanford University

Rachel Goldfeder

Clinical-grade genome sequencing and interpretation requires accurate and complete genotype calls across the entire genome. While single nucleotide variant detection is highly accurate and consistent, these variants explain only a small fraction of disease risk. Other types of variation that disrupt the open reading frame, such as insertions and deletions (INDELs), are more likely to be harmful. However, current methods have low sensitivity for larger (>= five bases) INDELs, primarily due to challenges surrounding aligning sequence reads that span INDELs. We present Scotch, a novel INDEL detection method that leverages signatures of poor read alignment, read depth information, and machine learning approaches to accurately identify INDELs from next-generation DNA sequencing data. Using biologically realistic simulated genomes and sequence reads with technologically representative error profiles (generated by ART), we evaluate Scotch and several currently available INDEL callers. We show that Scotch has higher sensitivity than current methods, particularly for larger INDELs. Finally, we validate INDELs that Scotch discovered in one individual, NA12878, and show that Scotch has high positive predictive value. This method will enable researchers and clinicians to more accurately identify INDELs associated with previously unexplained genetic conditions.

105

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE

Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

Mayo Clinic

Iain Horton

The Mayo Clinic Genomic Data Warehouse has established the infrastructure foundation, processes, and applications to meet the translational needs of the Mayo Clinic Center for Individualized Medicine (CIM). Through the streamlined and automated data pipeline, the next-gen sequencing (NGS) results are loaded and integrated with clinical data, providing the foundation for the development of revolutionary solutions and discovery in the clinical practice and genomic research. Initiated in 2012, with production data ingestion beginning in early 2014, Mayo Clinic's Translational Research Center (TRC) has provided the cornerstone platform for data centric activities within CIM. Data generated from both the clinical pipeline and research pipeline are automatically loaded into TRC with each new bit adding value and power to the system. Two key solutions with significant potential of impacting patient care and scientific discovery have been built on this genomic data warehouse. First is the Molecular Decision Support system, a rule-based pharmacogenomics system that enables Mayo Clinic clinicians to integrate actionable information based on a patient's genotype information at the point of care using NGS data. Second is the Mayo Variant Summary application, a cloud-native system which empowers Mayo Clinic researchers to identify rare and actionable genomic variants through dynamic filtering and grouping of subject phenotype and specimen metadata.

106

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT)

T.E. Klein1, M. Whirl-Carrillo1, R.M. Whaley1, M. Woon1, K. Sangkuhl1, Lester G. Carter1, H.M. Dunnenberger2, P.E. Empey3, A.T. Frase4, R.R. Freimuth5, A. Gaedigk6, A. Gordon7, C. Haidar8, J.K. Hicks9, J.M. Hoffman8, M.T. Lee10, N. Miller11, S.D. Mooney12, T.N. Person13, J.F. Peterson14, M.V. Relling8, S.A. Scott15, G. Twist11, A. Verma13, M.S. Williams10, C. Wu16, W. Yang8, M.D. Ritchie4,13

1Dept Genetics, Stanford Univ, Stanford, CA; 2Center for Molecular Medicine, NorthShore University HealthSystem, Evanston IL; 3Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh; 4Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA; 5Department of Health Sciences Research, Mayo Clinic, Rochester MN; 6Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children’s Mercy-Kansas City, Kansas City, MO; 7Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA; 8St. Jude Children's Research Hospital, Memphis, TN; 9DeBartolo Family Personalized Medicine Institute, H. Lee Moffitt Cancer Center, Tampa, FL; 10Genomic Medicine Institute, Geisinger Health System, Danville, PA; 11Center for Pediatric Genomic Medicine, Children’s Mercy, Kansas City, MO; 12Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; 13Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 14Vanderbilt University Medical Center, Nashville, TN; 15Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; 16Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA

Teri Klein

Pharmacogenomics (PGx) decision support and return of results is an active area of genomic medicine implementation at many healthcare organizations and academic medical centers. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has established guidelines surrounding gene-drug pairs that can and should lead to prescribing modifications based on genetic variant(s). One of the challenges in implementing PGx is extracting genomic variants and assigning haplotypes (including star-alleles) from genetic data derived from sequencing and genotyping technologies in order to apply the prescribing recommendations of CPIC guidelines. In a collaboration between the PGRN Statistical Analysis Resource (P-STAR), The Pharmacogenomics Knowledgebase (PharmGKB), the Clinical Genome Resource (ClinGen), and CPIC, we are developing a software tool to extract all variants from CPIC level-A genes, with the exception of G6PD and HLA, from a genetic dataset resulting from sequencing or genotyping technologies (represented as a .vcf), interpret the variant alleles, infer diplotypes, and generate an interpretation report based on CPIC guidelines. The CPIC pipeline report can then be used to inform prescribing decisions. We assembled a focus group of thought leaders in PGx to brainstorm the issues and to design the software pipeline. We hosted a one-week Hackathon at the PharmGKB at Stanford University to bring together computer programmers with scientific curators to implement the first version of this tool. Through this process, we have uncovered many of the challenges surrounding PGx implementation. For example, the inference of diplotypes is challenging for several CPIC level-A genes. This software pipeline will be made available under the Mozilla Public License (MPL 2.0) and disseminated on GitHub for the scientific and clinical community to test, explore, and improve. PharmCAT will give sites implementing PGx a way to more consistently interpret genomic results and link those results to published clinical guidelines. Furthermore, we are assembling (and will be maintaining) the translation tables that underlie the tool, which will significantly reduce the effort required to implement PGx clinically and ensure more uniform interpretations of PGx knowledge. As precision medicine continues to move into clinical practice, implementation workflows for PGx, like PharmCAT, would enable standardized and consistent implementation of PGx genes.
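Why diplotype inference from unphased genotypes can be ambiguous is easy to see in a toy example. The sketch below is not PharmCAT's algorithm; the allele table, position names, and the function `infer_diplotypes` are hypothetical and exist only to illustrate matching pairs of haplotype definitions against observed genotypes.

```python
from itertools import combinations_with_replacement

# Hypothetical star-allele definitions: allele name -> base at each position.
# These are made-up values, not a real CPIC or PharmGKB translation table.
TOY_ALLELES = {
    "*1": {"pos1": "C", "pos2": "G"},
    "*2": {"pos1": "T", "pos2": "G"},
    "*3": {"pos1": "C", "pos2": "A"},
    "*4": {"pos1": "T", "pos2": "A"},
}


def infer_diplotypes(genotype, alleles):
    """Return every allele pair whose combined bases match the unphased genotype.

    genotype maps each position to the two observed bases, in any order.
    """
    matches = []
    for h1, h2 in combinations_with_replacement(sorted(alleles), 2):
        consistent = all(
            sorted((alleles[h1][pos], alleles[h2][pos])) == sorted(genotype[pos])
            for pos in genotype
        )
        if consistent:
            matches.append(f"{h1}/{h2}")
    return matches


# Heterozygous at one position only: a single diplotype explains the data.
unique_call = infer_diplotypes({"pos1": ("C", "T"), "pos2": ("G", "G")}, TOY_ALLELES)

# Heterozygous at both positions: without phasing, two diplotypes fit equally well.
ambiguous_call = infer_diplotypes({"pos1": ("C", "T"), "pos2": ("A", "G")}, TOY_ALLELES)
```

With these toy definitions the doubly heterozygous sample matches both `*1/*4` and `*2/*3`, which is the kind of ambiguity a production tool has to resolve or report.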

107

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA

Sarathbabu Krishnamurthy1, Diane Smelser1, Manickam Kandamurugu1, Joseph Leader1, Noura S. Abul-Husn2, Alan R. Shuldiner2, David H. Ledbetter1, Frederick E. Dewey2, David J. Carey1, Michael F. Murray1, Raghu P.R. Metpally1

1Geisinger Health System; 2Regeneron Genetics Center

Sarathbabu Krishnamurthy

BACKGROUND: Highly penetrant autosomal dominant familial hypercholesterolemia (FH) is known to be caused by pathogenic loss of function (LOF) variants in LDLR and gain of function variants in PCSK9 and APOB genes. In addition to the causative role of PCSK9 in FH, PCSK9 LOF variants are associated with lowering of serum low density lipoprotein cholesterol (LDL-C) and total cholesterol. The aims of this study were to (1) identify rare novel PCSK9 gene variants that lead to complete or partial loss of protein function in the DiscovEHR cohort; (2) explore the prevalence of PCSK9 LOF variants in a subset of FH patients; and (3) examine whether FH patients carrying PCSK9 LOFs show association with lower plasma LDL-C and cardiovascular risk. METHODS: We analyzed whole exome sequences from 51,289 individuals in the DiscovEHR cohort, who consented to participate in the Geisinger Health System’s MyCode Community Health Initiative. Rare missense and predicted loss of function (pLOF) coding variants in PCSK9 were identified by integrating bioinformatics and evaluating LDL-C and total cholesterol measures from the electronic health records (EHR). RESULTS: In the overall DiscovEHR cohort, we identified 20 missense and 13 pLOF (2 splice donor, 6 stop gained and 5 frameshift) rare variants in PCSK9, including 15 novel variants that were associated with lower LDL-C and total cholesterol levels. LDL-C in pLOF carriers was significantly lower than in missense carriers with presumed partial loss of function (p<0.0012). Patients with rare PCSK9 missense variants with presumed partial LOF, or with LOF variants, had a significant reduction in the incidence of coronary events compared to the control group (p<0.0001). In FH patients, the LDL-lowering PCSK9 R46L variant, previously reported at 3% prevalence, was found to be enriched at 9.6% and was associated with lower LDL-C compared to FH patients not carrying an R46L allele. A novel PCSK9 missense variant (G316S) was also present in FH patients with a prevalence of 0.8% and also showed an LDL-lowering phenotypic effect in an imputed family pedigree. CONCLUSIONS: Overall, 11.8% of the FH patients in the DiscovEHR cohort were identified to also carry a PCSK9 variant which modulates their LDL-C and serum cholesterol levels.

108

INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI

Nicholas B. Larson1, Shannon McDonnell1, Zach Fogarty1, Melissa Larson1, John Cheville2, Shaun Riska1, Saurabh Baheti1, Asha A. Nair1, Daniel O’Brien1, Jaime Davila1, Daniel Schaid1, Stephen N. Thibodeau2

1Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 2Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN

Nicholas Larson

Large-scale genome-wide association studies have identified 146 loci associated with risk of developing prostate cancer (PRCA). However, most of these loci do not lie in close proximity to protein coding genes and are presumed to be regulatory in nature. Downstream regulation of protein coding genes related to PRCA development may be mediated by cis-acting regulation of nearby transcripts, also known as cis-mediated trans-eQTLs. This cis-mediator causal relationship comprises a regulatory variant, a nearby cis-regulated gene, and the downstream regulated trans target gene. Cis-mediators may include transcription factors, signaling proteins, and long intergenic non-coding RNAs (lincRNAs). LincRNAs correspond to a host of regulatory functions such as chromatin remodeling and transcriptional co-activation, and have previously been identified as diagnostic and prognostic biomarkers for a number of cancers. However, their role in cancer development and progression is poorly understood. To explore the hypothesis that cis-mediated trans-eQTLs may play a role in PRCA risk, we leveraged an eQTL dataset of 471 samples of normal prostate tissue from prostate/bladder cancer patients with available RNA-Seq and imputed Illumina Infinium 2.5M genotype data. We first conducted an initial transcriptome-wide eQTL screening of all lincRNAs and mRNAs with 8,073 SNPs in high linkage disequilibrium (r2>0.5) with previously identified PRCA risk-associated variants, identifying approximately 5000 transcripts (FDR<0.10) to be putatively associated (cis or trans). We then constructed an undirected Gaussian graphical regulatory network from the expression profiles of this transcript subset, identifying 87,468 connections. To identify candidate cis-mediator node-pairs in the expression network, we isolated a subset of cis-associated transcripts (lincRNA or mRNA) at a strict Bonferroni significance threshold. We then identified all mRNA nodes connected to these cis-nodes that were distal to the cis-variant (>1Mb) and had evidence of a trans-association with the cis-variant (P<1E-04), resulting in 9 candidate cis-mediator trios. Finally, we applied causal mediation analysis to test the proportion of the trans-association that is mediated by the cis-regulated transcript, resulting in 7/9 significant cis-mediator relationships. Transcription factor HNF1B was identified to be a significant mediator in the trans-associations between rs11263762 and three mRNAs: SRC, MIA2, and SEMA6A. All three exhibited concomitant upregulation with HNF1B. Notably, HNF1A has been shown to stimulate SRC expression via an alternative promoter, while MIA2 is also a known HNF1A target. Dysregulation of SEMA6A has been observed in PRCA metastases and plays a potential role in angiogenesis through interaction with VEGFR2. MSMB and NDRG1 both demonstrate androgen-stimulated expression in prostate tissue, and indicated a recessive pattern of expression dysregulation with rs10993994. Despite a small sample size, we replicated multiple trans-eQTLs from these cis-mediator trios in the GTEx prostate tissue eQTL dataset (P<0.05). Together, our findings suggest dysregulation of RNA expression may play a role in genetic predisposition to PRCA.
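The quantity tested in the mediation step can be written out under a standard linear mediation model. This is a hedged sketch: the abstract does not say which estimator was used, and the symbols $G$ (cis-variant genotype), $M$ (expression of the cis-regulated transcript), $T$ (expression of the trans target) and the coefficients below are notation introduced here for illustration only.

```latex
\begin{align}
T &= \alpha_0 + c\,G + \varepsilon_0 && \text{(total trans effect of the variant)} \\
M &= \alpha_1 + a\,G + \varepsilon_1 && \text{(cis effect on the mediator)} \\
T &= \alpha_2 + c'\,G + b\,M + \varepsilon_2 && \text{(direct effect, adjusting for } M\text{)} \\
\mathrm{PM} &= \frac{ab}{c} = \frac{c - c'}{c} && \text{(proportion mediated)}
\end{align}
```

For least-squares fits of these three linear models, $c = c' + ab$, so the two expressions for the proportion mediated agree; a trio is a plausible cis-mediator when the indirect effect $ab$ is significantly different from zero.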

109

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES

Jason E. McDermott1, Tao Liu1, Samuel Payne1, Vladislav Petyuk1, Richard Smith1, Philipp Mertins2, Steven Carr2, Karin Rodland1

1Pacific Northwest National Laboratory, 2Broad Institute

Jason McDermott

As part of the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we have recently published the first large-scale proteomic and phosphoproteomic analysis of high-grade serous ovarian tumors. We observed that phosphorylation status was an excellent indicator of pathway activity and could discriminate between patient survival times. In the current work we have combined this data with comparable data from breast cancer tumors and cancer cell lines treated with kinase inhibitors, to answer several fundamental questions about the role of phosphorylation in cellular processes and cancer. The total dataset comprised over 150 samples with very deep proteomic coverage (>20,000 phosphopeptides confidently identified). We first found that the correlation between kinase protein abundance and abundance of phosphorylated target peptides was very low, indicating that kinase abundance is not a good proxy for phosphorylation status overall. However, highly correlated kinase-substrate pairs were significantly more likely to be true relationships (from existing knowledge), demonstrating that this method could be used to predict novel kinase targets in some cases. We used this analysis to identify several novel kinase-substrate relationships that were differential between tumor subtypes, and that correlated with pathways where phosphorylation was affected by drug treatment. These relationships are currently under investigation as potential novel targets for therapeutic intervention. To better analyze cancer-relevant pathway activity, we developed a novel approach that characterizes correlation, differential abundance, and statistical interactions between components to analyze multiple omics types in the context of signaling and functional pathways. We used this approach, called the Layered Enrichment Analysis of Pathways (LEAP), to identify active pathways in molecular subtypes of ovarian and breast cancer, and several novel subpopulations of patients displaying uniquely dysregulated pathways. Our results show that integration of multiple omics types has great potential in the area of development of novel therapeutic approaches for personalized medicine.

110

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS

Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

The Donnelly Centre, University of Toronto

Shraddha Pai

Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general purpose and clinically relevant prediction algorithm should be accurate, generalizable, be able to integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), handle sparse data and be intuitive to interpret. We describe netDx, a supervised patient classification framework based on patient similarity networks, that meets the above criteria (Ref 1). netDx models input data as patient networks, and uses network integration and machine learning for feature selection. We demonstrate the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumours as being of the Luminal A subtype (N=348 tumours; Ref 2). Using gene expression data, netDx performed as well as or better than established state of the art machine learning methods, achieving a mean accuracy of 89% (2% s.d.) in classifying Luminal A. In the second application, we predict case/control status in autism spectrum disorders based on the occurrence of rare copy number deletions in metabolic pathways (N=3,291 patients; Ref 3); this predictor achieved better performance than previously published methods. netDx uses pathway features to aid biological interpretability and results can be visualized as an integrated patient similarity network to aid clinical interpretation. Upon publication, netDx software will be made publicly available via github; the software provides worked examples and easy-to-use functions for design of custom predictor workflows. More at http://netdx.org

References:
1. netDx preprint: http://dx.doi.org/10.1101/084418
2. The Cancer Genome Atlas (2012) Nature 490:61.
3. Pinto et al. (2014). Am J Hum Gen. 94(5):677.

111

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES

Hyun-Tae Shin1,2, Jae Won Yun1,2, Nayoung K.D. Kim1, Yoon-La Choi2,3, Woong-Yang Park1,2,4, Peter J. Park5

1Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; 2Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University, Seoul, Korea; 3Department of Pathology & Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; 4Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Seoul, Korea; 5Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Hyun-Tae Shin

Clinical application of sequencing-based assays requires high sensitivity and specificity for detecting genomic alterations. Our analysis of more than 5000 cancer samples reveals that a significant fraction of clinically-actionable somatic variants may have low variant allele fractions (VAF), indicating the importance of very high coverage sequencing for these patients. As a case study, we describe refractory cancer patients with clinical response to therapies that target low VAF alterations.

112

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS

Jeffrey A. Thompson1, Carmen J. Marsit2

1Dartmouth College, 2Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which creates a model of methylation dysregulation and its effect on gene expression and then combines this molecular information with clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and have a high level of performance. Across 100 random splits of the data into training and testing sets, our model had the highest median C-index of any method we tried, at 0.792. Furthermore, we demonstrated that our molecular risk predictor is independent of clinical covariates and that the combined model results in statistically significantly higher accuracy than either data type alone. Additionally, the proposed process of data integration itself captures relationships in the data that represent highly disease-relevant functions. The gene signature we identify for clear cell renal cell carcinoma prognosis is enriched for genes that are central nodes in a protein-protein interaction network associated with the JAK-STAT signaling cascade, which itself is a known factor in kidney cancer progression. Our signature is also enriched for genes in pathways involved in immune response, which are increasingly targeted by novel cancer therapies. We call this model the methylation-to-expression feature model (M2EFM). Although one of the other approaches we considered also resulted in a highly accurate model, M2EFM performed better with a far more parsimonious model that sheds light on the potential relationship between abnormal gene regulation and cancer prognosis. Given our results, we think that further development of this approach is warranted.
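The C-index reported above measures how often a risk score orders pairs of patients correctly with respect to survival. The following is a minimal sketch of Harrell's concordance index, not the authors' evaluation code; the function name and the choice to skip pairs with tied survival times are assumptions made here for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 (or True) if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs where the earlier
        # subject was censored, since their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the last subject is censored; one pair is ordered incorrectly.
c_index = concordance_index(
    times=[2, 3, 5, 8],
    events=[1, 1, 1, 0],
    risk_scores=[0.9, 0.4, 0.7, 0.1],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported median.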

113

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE

Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2

1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Greyson Twist

Background: To facilitate haplotype calling and translation into phenotype, we have previously developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), enabling automated CYP2D6 diplotype calling from whole genome sequencing. We have implemented a series of improvements to increase call accuracy as well as ease of use. Methods: The study was approved by the Institutional Review Board of Children’s Mercy Kansas City and included a total of 85 subjects (7 HapMap; 78 patients/parents). WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve the quality of variation calls. The Astrolabe CYP2D6 allele definition table was expanded to include a) additional variants available through the P450 Nomenclature Database; b) variants characterized by our laboratory, but not available through the Nomenclature Database; c) resequencing of some alleles (e.g. *10, *17) for which only exons are annotated by the Nomenclature Database. Programming errors in the scoring algorithm were repaired and unit tested, and support for a broad range of variant file input types was added (vcf, gvcf, tabix, .gz). Improvements also include versioning of the Astrolabe tool and the nomenclature data from which calls are generated. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/. Results: To maximize Astrolabe call accuracy, we removed CYP2D6*1E, *3B, *4A-L, *4N, *6D, *10C-D, and *45B from the call set, because of incomplete allele definitions (based on exons only), or SNP(s) that are not unique to an allele. For example, 1749A>G is part of the CYP2D6*3B and *103 definitions, but also appears to be present on some *1 subvariants. Likewise, 3288A>G is not limited to CYP2D6*6D as implied by the nomenclature database, thus causing erroneous Astrolabe calls. Calls with our revised definitions were compared with those obtained by genotyping. Astrolabe also accurately identified subjects with copy number variations including the CYP2D6*5 deletion (n=5) and gene duplications (n=2). Also, increased variant calling accuracy of the DRAGEN pipeline improved the calling of several samples (n=). Astrolabe correctly called 7731/8128 simulated diplotypes (95% recall; 133 missed and 264 multiple calls). Of the missed calls, 124 were due to *38 called as *1. Discussion: The series of improvements to Astrolabe increased call accuracy and minimized the number of no calls. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued refinement of existing allele definitions and the inclusion of novel haplotype definitions will further improve the Astrolabe tool. We are currently applying Astrolabe to other NGS data sets including exomes and targeted NGS panels.
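The simulation figures above are internally consistent, which a few lines of arithmetic confirm. This is only a bookkeeping sketch using the counts quoted in the abstract; the function name `summarize_calls` is introduced here and is not part of Astrolabe.

```python
def summarize_calls(correct, missed, multiple):
    """Return the total number of simulated diplotypes and the recall."""
    total = correct + missed + multiple
    return {"total": total, "recall": correct / total}


# Counts quoted in the abstract for the simulated diplotypes.
summary = summarize_calls(correct=7731, missed=133, multiple=264)

# Share of missed calls attributed to *38 being called as *1.
star38_share = 124 / 133

report = f"recall {summary['recall']:.1%}, *38 share of misses {star38_share:.1%}"
```

The three categories sum to the 8128 simulated diplotypes, the recall rounds to the quoted 95%, and the *38 to *1 confusion accounts for most of the missed calls.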

114

INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE

David S. Wishart1, Ana Marcu1, An Chi Guo1, Ash Anwar2, Solveig Johannessen3, Craig Knox4, Michael Wilson4, Christoph H. Borchers5, Pieter Cullis6, Robert Fraser2

1University of Alberta, 2Molecular You Inc., 3Educe Design Inc., 4OMx Inc., 5University of Victoria, 6University of British Columbia

David Wishart

The goal of precision medicine is to use advanced multi-omic technologies to improve the accuracy of medical diagnoses and enhance the individualization of medical treatment. The fundamental challenge in precision medicine is not in the measurement or collection of multi-omic data but in its delivery. In particular, the integration, interpretation and display of multi-omic data have proven to be especially problematic. Here we describe some of our experiences in tackling this problem and outline a number of important findings that we believe are worth sharing. Our most important finding was the need to use high quality, quantitative ‘omics data. Measuring absolutely quantitative ‘omics data ensures greater reproducibility and permits direct comparisons to well-established clinical reference values. Several ‘omics laboratories offering quantitative services have been identified and these are described here. Second, we discovered that custom databases containing biomarker-disease data are essential. Very few of these kinds of databases exist, but they are necessary for the comparison and full integration of multi-omic data. In particular, they provide the information needed to integrate multi-omic measures and to determine disease risk. A brief description of a few of these biomarker-disease databases is provided. Third, we discovered that color-coded graphs, which are hyperlinked to detailed textual explanations, are necessary for the facile interpretation of the multi-omic data, both by patients and physicians. An example of a well-designed, web-enabled “dashboard” is shown to highlight these findings. Finally, we found that comprehensive databases of actionable responses must be prepared so that detailed, customizable medical, lifestyle, diet or pharmacological guidance can be provided to treat or prevent conditions detected by these multi-omic measurements. Examples of several omics-derived, actionable responses are provided to clarify this point. These findings, along with several associated software tools and databases, have recently been integrated into an automatic workflow that allows a wide range of multi-omic measurements to be integrated, interpreted and displayed for precision or personalized medicine applications.

115

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES

Jiwen Xin1, Cyrus Afrasiabi1, Sebastien Lelong1, Ginger Tsueng1, Sean D. Mooney2, Andrew I. Su1, Chunlei Wu1

1The Scripps Research Institute, 2The University of Washington

Chunlei Wu

The accumulation of biological knowledge and the advance of web and cloud technology are growing in parallel. Recently, many biological data providers have started to provide web-based APIs (Application Programming Interfaces) for accessing data in a simple and reliable manner, in addition to the traditional raw flat-file downloads. Web APIs provide many benefits over traditional file downloads. For instance, users can request specific data such as a list of genes of interest without having to download the entire data set, thereby providing the latest data on demand and reducing computation and data transfer times. This means that programmers can spend less time on wrangling data, and more time on analysis and discovery. Building and deploying scalable and high-performance web APIs requires sophisticated software engineering techniques. We previously developed high-performance and scalable web APIs for gene and genetic variant annotations, accessible at MyGene.info and MyVariant.info. These two services are a tangible implementation of our expertise and collectively serve over 4 million requests every month from thousands of unique users. Crucially, the underlying design and implementation of these systems are in fact not specific to genes or variants, but rather can be easily adapted to other biomedical data types such as drugs, diseases, pathways, species, genomes, domains and interactions. We are currently expanding the scope of our platform to other biological entities. Collectively, we refer to them as “BioThings APIs” (http://biothings.io). We also applied JSON-LD (JSON for Linking Data) technology in the development of BioThings APIs. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We have demonstrated the applications of JSON-LD with BioThings APIs, including data discrepancy checks as well as the cross-linking between APIs.
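The two ideas in this abstract, requesting only the fields of interest and attaching semantic context to plain JSON, can be sketched with the standard library alone. This is an illustrative sketch, not the BioThings client: it assumes the MyGene.info v3 `query` endpoint with its `q`, `fields`, and `species` parameters, it only builds the request URL without sending it, and the helper names and the example `@context` mapping are assumptions introduced here.

```python
from urllib.parse import urlencode

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query(symbol, fields, species="human"):
    """Build a MyGene.info query URL that asks only for the listed fields."""
    params = {
        "q": f"symbol:{symbol}",
        "fields": ",".join(fields),
        "species": species,
    }
    return MYGENE_QUERY_URL + "?" + urlencode(params)


def add_jsonld_context(record, context):
    """Return a copy of a JSON record with a JSON-LD @context attached."""
    linked = dict(record)
    linked["@context"] = context
    return linked


# Illustrative mapping only: the IRI chosen for this field is an assumption.
EXAMPLE_CONTEXT = {"entrezgene": "http://identifiers.org/ncbigene/"}

url = build_gene_query("CDK2", ["symbol", "name"])
linked_record = add_jsonld_context({"symbol": "CDK2", "entrezgene": 1017}, EXAMPLE_CONTEXT)
```

Fetching `url` with any HTTP client would return only the requested fields, and the `@context` entry is what lets a JSON-LD processor interpret a field such as `entrezgene` as an identifier in a shared namespace.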

116


POSTER PRESENTATIONS

117

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS

Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

Stanford University

Reema Baskar

TNF alpha-related apoptosis-inducing ligand (TRAIL) has been shown to specifically target cancer cells; however, rampant resistance has curtailed its efficacy as a drug. Cell-to-cell variation has been previously linked to resistance to TRAIL-induced apoptosis. We further investigate non-genetic phenotypic variation as a novel mode of drug resistance. Using mass cytometry, we captured high-dimensional, single-cell signaling states of different cancer types over the course of TRAIL treatment. For the first time, we provide a comprehensive single cell overview of TRAIL signaling dynamics and provide population metrics to quantify heterogeneity within resistance phenotypes. We demonstrate that while all cells respond to TRAIL, a subset of them persist in transient resistant states and do not progress to apoptosis. Our methods show correlation between heterogeneity of response to TRAIL and persistence of non-apoptotic, viable cancer cells in drug. We also show that combinatorial therapies designed to inhibit implicated pathways in conserved resistant states do not eradicate resistance and in fact can induce new states of resistance. This study presents experimental and computational tools to investigate non-genetic phenotypic variation as a novel mode of drug resistance in cancer and demonstrates their utility in understanding resistance to TRAIL-induced apoptosis.

118

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA

Tyler J. Burns1, Garry P. Nolan2, Nikolay Samusik2

1Stanford University School of Medicine, Dept. of Cancer Biology; 2Stanford University School of Medicine, Baxter Laboratory for Stem Cell Biology

Tyler Burns

High dimensional single-cell data is routinely visualized in two dimensions using dimension reduction algorithms like t-SNE, Principal Component Analysis (PCA), or force-directed graphs. When comparing levels of intracellular proteins in basal versus perturbed cells, clustering must be used to visualize changes in specific markers in a single graph. However, discretizing a dataset does not allow one to understand subtle, rare, and/or continuous biological changes across the original manifold. Herein, we present an algorithm that represents each cell’s information content as its average across k-nearest neighbors. This allows for comparisons to be made between biological conditions on a per-cell basis. We use this to produce detailed t-SNE maps depicting biological change, and correlation analysis to enumerate signaling responses to perturbation.
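One way to read the per-cell comparison is sketched below. This is an interpretation under stated assumptions, not the authors' algorithm: cells from both conditions are pooled, each cell's k nearest neighbours (itself included) are found by Euclidean distance, and the marker is averaged separately over the perturbed and basal neighbours. The function name and the toy data are hypothetical.

```python
import math


def knn_condition_difference(coords, marker, perturbed, k):
    """Per-cell difference in a marker between perturbed and basal neighbours.

    coords: one coordinate tuple per cell (pooled across conditions)
    marker: marker level per cell
    perturbed: True if the cell comes from the perturbed condition
    k: neighbourhood size, counting the cell itself
    Returns None for cells whose neighbourhood lacks one of the conditions.
    """
    differences = []
    for centre in coords:
        # Brute-force neighbour search; fine for a sketch, slow for real data.
        order = sorted(range(len(coords)), key=lambda j: math.dist(centre, coords[j]))
        neighbours = order[:k]
        basal = [marker[j] for j in neighbours if not perturbed[j]]
        treated = [marker[j] for j in neighbours if perturbed[j]]
        if basal and treated:
            differences.append(sum(treated) / len(treated) - sum(basal) / len(basal))
        else:
            differences.append(None)
    return differences


# Two small neighbourhoods: the marker responds in the first but not the second.
change = knn_condition_difference(
    coords=[(0, 0), (0, 1), (10, 10), (10, 11)],
    marker=[1.0, 3.0, 5.0, 5.0],
    perturbed=[False, True, False, True],
    k=2,
)
```

Because every cell receives its own value, the result can be used to color a t-SNE map continuously instead of comparing discrete clusters.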

119

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY

Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

Geisinger Health System

Wendy Ingram

Background: Glioblastoma (GBM) is the most common and deadly brain cancer in adults. The associated lethality may be attributable to the intrinsic heterogeneity of micro-invasive tumor cells, some of which are unavoidably left behind following tumor resection. The transcriptomic heterogeneity may contribute to the survival and subsequent proliferation of a small subset of cells that are resistant to radiation and chemotherapy. It has long been hypothesized that investigations into these tumors at a single cell level will allow for better molecular understanding of treatment resistance and the development of novel therapeutic approaches. Recently, advances in single cell capture and sequencing technology have become available and allow for these studies to be conducted. However, there are many technical and computational challenges inherent to single cell transcriptomics that are not addressed by traditional RNA-seq analysis tools. These challenges include uncertainty of technical and biological variance and must be carefully considered in order for biologically and therapeutically relevant conclusions to be reached. Methods: Tumor tissue from two GBM patients undergoing surgical resection as part of standard of care therapy was collected at the time of surgery. We used the Fluidigm C1 microfluidics platform to capture single cells followed by RNA sequencing (RNA-seq) of these cells and a bulk population of ~10,000 cells from each tumor. We compared two different transcriptomic alignment tools, Bowtie and kallisto, and analyzed the single cell transcriptional heterogeneity of cells within and between tumors using the recently developed analysis tool, sleuth. To the best of our knowledge, we are the first to utilize this single cell capture method and perform single cell RNA-seq analysis using the newly developed kallisto and sleuth programs for primary GBM tissue samples. Results: We show that the Fluidigm C1 microfluidics single cell capture method produces high quality transcriptomic material for RNA-seq and may have benefits over alternative methods (e.g. fluorescence-activated cell sorting) such as shorter preparation time. The kallisto-sleuth analysis programs provide improved estimation of gene expression variability and more reliable clustering of single cells by leveraging the unique features of equivalency groups and bootstrap estimates of kallisto. Cluster analysis demonstrates that certain cells from both tumors cluster together and share some common expression patterns, but the remaining cells cluster in tumor-specific groups or do not group with other cells. We observe marked intertumor and intratumor transcriptional variability and note that average expression from single cells does not reliably correlate with the bulk cell RNA-seq abundance estimates. Taken together, we have shown that the combination of Fluidigm C1 and the kallisto-sleuth analysis programs proves to be a useful and reliable method to obtain and analyze high quality single cell RNA-seq data for the investigation of primary tumor tissues.

120

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION

Jonathan A. Rebhahn1, Sally A. Quataert1, Gaurav Sharma2, Tim R. Mosmann1

1Center for Vaccine Biology and Immunology, University of Rochester Medical Center; 2Department of Electrical and Computer Engineering, University of Rochester

Tim Mosmann

Standardization between flow cytometry experiments performed at different times is difficult because variations in cell parameters can be caused by many factors, including changes in antibody reagents, staining protocols, cell handling, different cytometers, and cytometer settings such as photomultiplier amplification voltages. These variations may overwhelm the genuine biological differences being investigated, such as genetic or disease-specific variations between subjects. Technical variations can be partly reduced by manually adjusting analysis gates, but this is subjective and time-consuming. Previous methods for semi-automated adjustment have relied on histogram peaks or manual gating to identify anchor populations. We have now developed fully-automated methods for registering flow cytometry samples, i.e. normalizing the fluorescence intensity of each cell in all channels. We take advantage of the high-resolution cluster templates derived by clustering reference samples by the SWIFT algorithm. These templates represent Gaussian model descriptions of the multidimensional data. If samples to be registered are at least moderately similar to the target/reference sample, assignment of the test sample to the template results in most cells being assigned to the appropriate cluster, but clusters that have shifted in the test sample then have altered median values in one or more channels. This high-resolution positional information is used for two types of registration. Rigid, or per-channel, registration compares cluster locations between the target and the test sample to be registered, and the best-fit registration adjustments are determined for each channel and applied incrementally, reassigning the cells at each step to improve the final fit. This objectively uses positional information from all clusters, regardless of cluster size variation, and successfully corrects global artifacts such as staining or cytometer settings that cause ‘batch’ differences between assay days. Fluid, or per-cluster, registration calculates the registration adjustment required for each cluster in the test sample to overlap fully with its corresponding cluster in the reference sample. This registers clusters more completely, and can remove individual variation (due to e.g. genetic or disease-specific effects). Fluid registration removes most positional information; this is desirable if the main experimental outcome is expected to be variations of the number of cells of different types. This method has been applied to datasets that include changes due to assay dates, flow cytometers, subjects, and sequential blood samples. Most variation occurred between cytometers and assay days, less between subjects, and the least between different bleeds from the same person. Registration substantially improved correlations between cluster medians. The number of cells per cluster also showed increased correlation, suggesting that unmodified samples assigned to the cluster templates sometimes had cells assigned to an inappropriate cluster. Thus the SWIFT cluster-based registration can improve subsequent flow cytometry analysis. Registered samples can be analyzed by a variety of manual or automated procedures.
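The per-channel (rigid) idea can be illustrated with a single correction step. The sketch below is a simplification and not the SWIFT implementation: it assumes an additive offset per channel, takes the median of the per-cluster median differences so that every cluster counts equally regardless of size, and omits the incremental reassignment of cells described above. All names and numbers are hypothetical.

```python
from statistics import median


def channel_offsets(ref_medians, test_medians):
    """One additive offset per channel from matched cluster medians.

    ref_medians, test_medians: dict mapping cluster id -> list of
    per-channel median intensities for that cluster.
    """
    shared = sorted(set(ref_medians) & set(test_medians))
    if not shared:
        raise ValueError("no clusters in common")
    n_channels = len(ref_medians[shared[0]])
    return [
        # Median across clusters, so large clusters do not dominate the fit.
        median(ref_medians[c][ch] - test_medians[c][ch] for c in shared)
        for ch in range(n_channels)
    ]


def register(cells, offsets):
    """Apply the per-channel offsets to every cell of the test sample."""
    return [[value + off for value, off in zip(cell, offsets)] for cell in cells]


# Toy example: channel 0 is shifted by 0.5 in every cluster (a batch effect),
# channel 1 differs in one cluster only (left for per-cluster registration).
reference = {1: [1.0, 2.0], 2: [5.0, 6.0], 3: [9.0, 9.0]}
test_sample = {1: [0.5, 2.0], 2: [4.5, 6.5], 3: [8.5, 9.0]}

offsets = channel_offsets(reference, test_sample)
registered = register([[4.5, 6.5]], offsets)
```

In this toy case the global shift in channel 0 is removed while the cluster-specific difference in channel 1 is preserved, which mirrors the distinction the abstract draws between rigid and fluid registration.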

121

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS

POSTER PRESENTATION

122

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY

E. Griffiths1, D. Dooley2, C. Bertelli1, J. Adam3, F. Bristow3, T. Matthews3, A. Petkau3, M. Courtot4, J.A. Carriço5, A. Keddy6, R. Beiko6, L.M. Schriml7, E. Taboada8, M. Graham3, G. Van Domselaar3, W. Hsiao2, F. Brinkman1

1SFU, Burnaby, BC, Canada; 2BC Centre for Disease Control, Vancouver, BC, Canada; 3PHAC, Winnipeg, MB, Canada; 4EBI, Hinxton, Cambridge, UK; 5Univ. of Lisbon, Lisbon, Portugal; 6Dalhousie Univ., Halifax, NS, Canada; 7Univ. of Maryland School of Medicine, Baltimore, MD, USA; 8PHAC, Lethbridge, AB, Canada

Fiona Brinkman

One barrier to effectively capitalizing on whole genome sequence data is efficient, robust annotation and integration of associated contextual data (metadata). Whether human, microbial or other organismal genomic sequence, frequently such contextual data is too unorganized, in free text format, to enable effective integration for answering more sophisticated questions. Approaches to help overcome this barrier are illustrated here with the Integrated Rapid Infectious Diseases Analysis (IRIDA.ca) Project and Genomic Epidemiology Ontology (GenEpiO.org) Consortium.

Microbial pathogen whole genome sequencing provides the highest resolution molecular “fingerprint” for infectious disease epidemiology and is transforming public health practice – enabling more rapid identification of disease outbreaks, their sources, and potential control measures. However, such microbial genomic data (like human ‘omic data) must be combined with epidemiological/clinical/laboratory/other healthcare data (“contextual data”) to be meaningfully interpreted for clinical and public health questions/actions. Furthermore, information must be shared between different agencies to efficiently assess and manage risks to human health across jurisdictions. Currently, terminologies describing public health data cannot be easily mapped across functionally-similar software systems without intricate intervention by specialists, resulting in data exchange systems that are static and fragile.

To promote efficient data exchange and intelligence sharing, we propose an intuitive platform for searching, identifying, and verifying the fundamental healthcare entity elements (ontology terms) to map to institutional application data formats, starting with genomic and public health contextual data. Key innovations are the proposed Genomic Epidemiology Entity Mart (GE2M) that allows users to inspect term definitions, labeling, and database cross references in a user-friendly format, plus a software system allowing different jurisdictions to use the terms suitable for them, essentially choosing from a “shopping cart” of options mapped between jurisdictions/organizations. A very preliminary prototype of this concept has been established as part of the IRIDA.ca project and the GenEpiO Consortium (a consortium of 70 researchers from 15 countries interested in contributing to this effort). We hypothesize that a common and accessible ontology entity mart can be developed, if appropriate tools for interfacing domain experts with this mart are developed – and the mart is first applied to practical microbial genomic epidemiology data sharing needs between select public health systems (with consultation involving a larger consortium).

In addition, new genomic data visualization approaches are being developed for integration into the IRIDA software platform, to enable more interactive, flexible visualization of genomic data with different levels or views of contextual data (from finely detailed comparisons of genomic islands and other features between genomes, to examining genomic data in the context of geographical data). IRIDA is being used in Canada’s public health agency, and this open source software is also being installed in other countries interested in co-developing this resource and using a federated data sharing approach.

AUTHOR INDEX

A

Abrams, Zachary · 59
Abul-Husn, Noura S. · 107
Adam, J. · 122
Adams, Micah · 54
Aevermann, Brian · 37
Afrasiabi, Cyrus · 115
Agarwal, Vibhu · 17
Akbarian, Schahram · 72
Aldrich, Melinda C. · 20, 35
Alkan, Can · 77, 87
Alser, Mohammed · 77
Altman, Russ B. · 79, 90
Andreoletti, Gaia · 101
Andres-Terrè, Marta · 13
Ansel, Mark · 80
Anwar, Ash · 114
Armaselu, Bogdan · 18
Arunachalam, Harish Babu · 18
Ashley, Euan · 104
Aslam, Naureen · 68
Asmann, Yan W. · 85
Ayati, Marzieh · 67

B

Bader, Gary D. · 110
Baheti, Saurabh · 108
Bai, Yongsheng · 68
Bakken, Trygve · 37
Baskar, Reema · 117
Bauer, Christopher R. · 27
Beaulieu-Jones, Brett K. · 19
Bebek, Gurkan · 52
Beck, Andrew · 50
Beck, Mette · 28
Beiko, R. · 122
Bellovich, Keith · 70
Bendall, Sean · 117
Berens, Michael · 31
Berry, Gerald J. · 90
Bertelli, C. · 122
Best, Aaron A. · 2
Bhat, Zeenat · 70
Bichko, Dmitri · 76
Biernacka, Joanna M. · 85
Biggin, Mark D. · 64
Boespflug, Mathieu · 76
Boley, Nathan · 98
Bongen, Erika · 13
Borchers, Christoph H. · 114
Borecki, Ingrid · 34
Borrayo, Ernesto · 63
Bowden, Donald W. · 45
Bowerman, Nathan · 2
Breitenstein, Matthew K. · 96
Breitwieser, Gerda · 34
Brenner, Steven E. · 101
Brinkman, Benjamin H. · 97
Brinkman, F. · 122
Bristow, F. · 122
Bromberg, Yana · 69
Brosius, Frank C. · 70
Brown, Andrew J. Leigh · 83
Brubaker, Douglas · 52
Brunak, Soren · 28
Burns, Tyler J. · 118
Bustillo, Juan R. · 93

C

Cai, Guoshuai · 73
Calhoun, Vince D. · 9, 93
Cao, Mengfei · 3
Carey, David J. · 107
Carr, Steven · 109
Carriço, J.A. · 122
Carter, Lester G. · 106
Cederberg, Kevin · 18
Chan, Yu-Feng Yvonne · 23
Chance, Mark · 67
Chang, Rui · 11
Chasioti, Danai · 71
Chaudhary, Kumardeep · 74
Chen, Rong · 56
Chen, Yii-Der I. · 45
Cheung, Philip · 84
Cheville, John · 108
Chew, Guo-Liang · 64
Choi, Yoon-La · 111
Christiansen, Lena · 37
Clay, Alyssa I. · 96
Clemons, Paul A. · 31
Cline, Melissa · 15
Cohain, Ariella · 11
Cordero, Pablo · 38
Correa, Adolofo · 45
Costello, James C. · 60
Courtot, M. · 122
Cowen, Lenore J. · 3
Crawford, Dana C. · 20
Cullis, Pieter · 114

D

Daescu, Ovidiu · 18
Danaee, Padideh · 44
Darrow, Bruce · 22
Davila, Jaime · 108
Davis-Dusenbery, Brandi · 14
de Belle, J. Steven · 84
De, Subhajyoti · 88
Deisseroth, Cole A. · 13
DeJongh, Matthew · 2
Denny, Joshua · 35
de Vries, Edsko · 76
Dewey, Frederick E. · 34, 107
Dhruv, Harshil · 31
Diaz, Diana · 51
Diez-Fuertes, Francisco · 37
Dincer, Aslihan · 72
Disselkoen, Craig · 54
Divaraniya, Aparna A. · 11
Dominguez, Facundo · 76
Domselaar, G. Van · 122
Donato, Michele · 51
Dooley, D. · 122
Dougherty, Greg · 105
Draghici, Sorin · 51
Dudley, Joel T. · 11, 22, 72
Dunnenberger, H.M. · 106
Durmaz, Arda · 52

E

Eckel-Passow, Jeanette · 105
Egawa, Fumiko · 33
Empey, P.E. · 106
Ergin, Oguz · 77
Ertekin-Taner, Nilüfer · 85
Eskin, Eleazar · 80

F

Fantl, Wendy J. · 78
Farber-Eger, Eric · 20
Farrow, Emily G. · 102, 113
Fienberg, Harris · 117
Fink, Cris G. · 97
Fink, Tobias · 24, 99
Fogarty, Zach · 108
Foo, Chuan Sheng · 98
Fornage, Myriam · 45
Franks, Jennifer M. · 73
Frase, A.T. · 106
Fraser, Robert · 114
Fread, Kristin I. · 39
Freedman, Barry I. · 45
Freimuth, R.R. · 106
Freimuth, Robert · 105

G

Gadegbeku, Crystal · 70
Gaedigk, A. · 106
Gaedigk, Andrea · 81, 102, 113
Gallion, Jonathan · 29, 103
Gao, Chen · 41
Garmire, Lana · 74
Gavin, Davin · 72
Gelijns, Annetine · 22
Genes, Nicholas · 23
Ghaeini, Reza · 44
Ghose, Saugata · 87
Gipson, Debbie · 70
Giron, Emily · 84
Glicksberg, Benjamin · 56
Gliske, Stephen V. · 97
Goldfeder, Rachel · 104
Gordon, A. · 106
Gosh, Debashis · 88
Graham, M. · 122
Gray, Daniel H. · 78
Greenside, Peyton · 98
Griffiths, E. · 122
Groop, Leif · 28
Guney, Emre · 12
Guo, An Chi · 114

H

Haidar, C. · 106
Hart, Steven · 105
Hassan, Hasan · 77
Hawkins, Jennifer · 70
Haynes, Winston A. · 13
He, Dan · 30
He, Shuyao · 55
Hellwege, Jacklyn N. · 45
Henderson, Tim A.D. · 52
Hendrix, David · 44
Hershman, Steven G. · 23
Herzog, Julia · 70
Hicks, J.K. · 106
Hodge, Rebecca · 37
Hoff, Fieke W. · 57
Hoffman, J.M. · 106
Hollister, Brittany M. · 20
Hong, Na · 75
Horton, Iain · 105
Horton, Terzah M. · 57
Hoskins, Roger A. · 101
Hsiao, W. · 122
Hu, Chenyue W. · 57
Huang, Austin · 76
Huang, Kun · 7, 59
Hui, Shirley · 110

I

Iakoucheva, Lilia M. · 82
Imoto, Seiya · 91
Ingram, Wendy Marie · 119
Israeli, Johnny · 98
Isserlin, Ruth · 110
Ivkovic, Sinisa · 14

J

Jebakaran, Jebakumar · 22
Jiang, Guoqian · 75
Johannessen, Solveig · 114
Johnson, Kipp W. · 22
Johnson, Travis · 59
Ju, Wenjun · 70

K

Kabat, Halla · 53
Kaddurah-Daouk, Rima F. · 96
Kaka, Hussam · 110
Kamp, Thomas · 54
Kandamurugu, Manickam · 107
Kanigel Winner, Kimberly R. · 60
Karakurt, Gunnur · 48
Kasarskis, Andrew · 11, 22
Kashef-Haghighi, Dorna · 33
Kaushik, Gaurav · 14
Keaton, Jacob M. · 45
Kechris, Katerina · 86
Keddy, A. · 122
Khatri, Purvesh · 13, 46
Kiefer, Jeff · 31
Kim, Jeremie · 77, 87
Kim, Juho · 61
Kim, Junghi · 41
Kim, Nayoung K.D. · 111
Kim, Seungchan · 31
Klein, T.E. · 106
Knox, Craig · 114
Ko, Melissa E. · 78
Kornblau, Steven M. · 57
Kovatch, Patricia · 22
Koyutürk, Mehmet · 48, 67
Kretzler, Matthias · 70
Krishnamurthy, Sarathbabu · 34, 107
Krishnan, Michelle L. · 42
Kuan, Pei Fen · 55
Kuncheva, Zhana · 42
Kundaje, Anshul · 98
Kural, Deniz · 14

L

Lanchantin, Jack · 21
Larson, Melissa · 108
Larson, Nicholas B. · 108
Lasken, Roger S. · 37
Lau, Katy L. · 97
Lavage, Daniel R. · 27, 34
Leader, Joseph B. · 27, 34, 107
Leavey, Patrick · 18
Ledbetter, David H. · 107
Lee, Donghyuk · 77
Lee, Inhan · 53
Lee, M.T. · 106
Lein, Ed · 37
Lelong, Sebastien · 115
Li, Jingyi Jessica · 64
Li, Lang · 71
Li, Li · 22
Li, Matthew D. · 13
Li, Shuyu · 56
Lichtarge, Olivier · 25, 29, 103
Lin, Chih-Hsu · 25
Lin, Dongdong · 93
Lin, Yaxiong · 105
Lincoln, Stephen E. · 15
Liu, Charles · 13
Liu, Jingyu · 93
Liu, Keli · 50
Liu, Larry Y. · 48
Liu, Tao · 109
Lofgren, Shane · 13
Lopez, Alexander · 34
Lu, Liangqun · 74
Lua, Rhonald C. · 25
Lucas, Anastasia M. · 34
Luedtke, Alexander · 50

M

Ma, Meng · 56
Machida-Hirano, Ryoko · 63
Mahendra, Divya · 31
Mahlich, Yannick · 69
Mahoney, J. Matthew · 27
Mallory, Emily K. · 79
Mandric, Igor · 80
Mangul, Serghei · 80
Marcu, Ana · 114
Marko, Nicholas F. · 119
Marsit, Carmen J. · 32, 112
Martinez, Maria · 18
Massengill, Susan · 70
Matthews, T. · 122
Matveeva, Olga V. · 94
McCorrison, Jamison · 37
McDermott, Jason E. · 109
McDonnell, Shannon K. · 85, 105, 108
McEachin, Richard C. · 70
Mead, David · 105
Mehta, Sanket · 57
Mertins, Philipp · 109
Metpally, Raghu P.R. · 34, 107
Miller, Jeremy · 37
Miller, Neil · 81, 102, 106, 113
Miotto, Riccardo · 22
Mishra, Rashika · 18
Misra, Debdipto · 119
Miyano, Satoru · 91
Mohan, Rahul · 98
Montana, Giovanni · 42
Montoya, Dennis · 80
Mooney, Sean D. · 82, 106, 115
Moore, Jason H. · 19
Moskovitz, Alan · 22
Mosmann, Tim R. · 120
Moult, John · 101
Murray, Michael F. · 107
Mutlu, Onur · 77, 87
Myers, Mark · 105

N

Nair, Asha A. · 108
Nair, K. Sreekumaran · 96
Narla, Goutham · 67
Nazipova, Nafisa N. · 94
Ng, Maggie C.Y. · 45
Nguyen, Tin · 51
Nho, Kwangsik · 8
Ni'Suilleabhain, Molly · 18
Ning, Xia · 71
Nolan, Garry P. · 39, 78, 117, 118
Non, Amy · 20
Novotny, Mark · 37

O

O'Connell, Chloe · 33
O’Brien, Daniel · 108
Ogurtsov, Aleksey Y. · 94
Osafo, Nana · 89
Otolorin, Abiodun · 89
Overton, John · 34

P

Pai, Shraddha · 110
Palmer, Nicholette D. · 45
Pan, Wei · 41
Pandey, Gaurav · 47
Pankow, James S. · 45
Parida, Laxmi · 30
Park, Peter J. · 111
Park, Woong-Yang · 111
Paten, Benedict · 15
Payne, Samuel · 109
Pejaver, Vikas · 82
Pen, Jian · 65
Pendergrass, Sarah A. · 27, 34
Peng, Jian · 4, 61
Penn, John · 34
Pennathur, Subramaniam · 70
Perrone-Bizzozero, Nora · 93
Person, T.N. · 106
Perumal, Kalyani · 70
Peterson, Josh · 35, 106
Petkau, A. · 122
Petyuk, Vladislav · 109
Pinney, Sean · 22
Playter, Christopher S. · 78
Plevritis, Sylvia K. · 78
Poirion, Olivier · 74
Pond, Sergei · 83
Probert, Chris · 98
Prodduturi, Naresh · 75
Pyc, Mary A. · 84

Q

Qi, Yanjun · 21
Qu, Meng · 4, 65
Quataert, Sally A. · 120
Qutub, Amina A. · 57

R

Radcliffe, Richard · 86
Rademakers, Rosa · 85
Radivojac, Predrag · 82
Rakheja, Dinesh · 18
Rasmussen-Torvik, Laura J. · 45
Ré, Christopher · 79, 90
Rebhahn, Jonathan A. · 120
Reddy, Joseph S. · 85
Reed, Gay · 105
Reich, David L. · 22
Reid, Jeffrey · 34
Relling, M.V. · 106
Ren, Yingxue · 85
Restrepo, Nicole A. · 20
Rich, Stephen S. · 45
Ricks, Doran · 22
Risacher, Shannon L. · 8
Riska, Shaun · 108
Ritchie, Marylyn D. · 34, 106, 119
Roden, Dan · 35
Rodland, Karin · 109
Rogers, Linda · 23
Ross, Jason · 105
Ross, Owen A. · 85
Rossetti, Maura · 80
Rotman, Jeremy · 80
Rotter, Jerome I. · 45
Röttger, Richard · 5
Rubin, Daniel L. · 90
Rudra, Pratyaydipta · 86
Russell, Nate · 61
Russell, Pamela · 86

S

Saba, Laura · 86
Salman, Ali · 68
Samuels, David · 35
Samusik, Nikolay · 118
Sander, Thomas · 24, 99
Sangkuhl, K. · 106
Sarangi, Vivekananda · 85
Saykin, Andrew J. · 8
Scarpa, Joseph R. · 11
Schadt, Eric E. · 11, 23, 56, 72
Schaid, Daniel · 108
Scherbina, Anna · 98
Scheuermann, Richard H. · 37
Schlatzer, Daniela · 67
Schork, Nicholas · 37
Schreiber, Stuart L. · 31
Schriml, L.M. · 122
Schultz, André · 57
Scott, Erick R. · 23
Scott, Madeleine · 46
Scott, S.A. · 106
Sengupta, Anita · 18
Sengupta, Partho P. · 22
Senol, Damla · 77, 87
Shabalina, Svetlana A. · 94
Shah, Nigam H. · 17
Shameer, Khader · 22
Sharma, Gaurav · 120
Shen, Li · 8, 71
Shi, Wen · 86
Shifman, Sagiv · 80
Shin, Hyun-Tae · 111
Shrikumar, Avanti · 98
Shuldiner, Alan R. · 107
Simonovic, Janko · 14
Singh, Ritambhara · 21
Sinnwell, Jason P. · 85
Smelser, Diane · 107
Smith, Kyle · 88
Smith, Richard · 109
Snyder, John · 27
Snyder, Michael · 90
Soden, Sarah · 102, 113
Song, Junyan · 55
Southerland, William · 89
Speyer, Gil · 31
Spreafico, Roberto · 80
Stacey, William C. · 97
Stai, Tony · 105
Stanescu, Ana · 47
Statz, Benjamin · 80
Steemers, Frank · 37
Strauli, Nicolas · 80
Strickland, William D. · 39
Stuart, Joshua M. · 38
Su, Andrew I. · 115
Su, Hai · 7
Swank, Julie · 105
Sweeney, Timothy E. · 13

T

Taboada, E. · 122
Tam, Andrew · 13
Taroni, Jaclyn N. · 73
Tatonetti, Nicholas P. · 22
Taylor, Kent D. · 45
Teh, Charis · 78
Thibodeau, Stephen N. · 108
Thompson, Jeffrey A. · 32, 112
Tignor, Nicole · 23
Tijanic, Nebojsa · 14
Tintle, Nathan · 2, 50, 54
Tomczak, Aurelie · 13
Tran, Danny N. · 37
Tran, Hai J. · 31
Tsueng, Ginger · 115
Tully, Tim · 84
Tunkle, Leo · 53
Twist, Greyson P. · 81, 102, 106, 113

V

Vallania, Francesco · 13, 46
Van Der Wey, Will · 80
Van Houten, Jacob · 35
Venepally, Pratap · 37
Venkataraman, Guhan Ram · 33
Verma, A. · 106
Verma, Shefali S. · 34
Vestal, Brian · 86
Volety, Rama · 105
von Korff, Modest · 24, 99

W

Wagenknecht, Lynne E. · 45
Wall, Dennis Paul · 33
Wang, Beilun · 21
Wang, Changchang · 56
Wang, Chao · 7
Wang, Chen · 75
Wang, Liewei · 96
Wang, Pei · 23
Wang, Sheng · 4, 65
Wang, Yu-Ping · 9
Weaver, Steven · 83
Weinshilboum, Richard M. · 96
Wertheim, Joel · 83
Westergaard, David · 28
Whaley, R.M. · 106
Whirl-Carrillo, M. · 106
Whitfield, Michael L. · 73
Whiting, Kathleen · 48
Wiepert, Mathieu · 105
Wiggins, Roger · 70
Wiley, Laura · 35
Wilkins, Angela D. · 25, 29, 103
Williams, M.S. · 106
Wilson, James G. · 45
Wilson, Michael · 114
Wilson, Stephen J. · 25
Wiredja, Danica · 67
Wishart, David S. · 114
Wiwie, Christian · 5
Woon, M. · 106
Worrell, Greg A. · 97
Wu, Chunlei · 106, 115

X

Xin, Hongyi · 77
Xin, Jiwen · 115

Y

Yahi, Alexandre · 22
Yamaguchi, Rui · 91
Yan, Jingwen · 8
Yang, Harry Taegyun · 80
Yang, Lin · 7
Yang, Shan · 15
Yang, W. · 106
Yao, Xiaohui · 71
Yoo, Byunggil · 81
Younkin, Steve G. · 85
Yu, Kun-Hsing · 90
Yun, Jae Won · 111

Z

Zaitlen, Noah · 80
Zelikovsky, Alex · 80
Zhang, Bin · 72
Zhang, Can · 15
Zhang, Fan · 37
Zhang, Pengyue · 71
Zhang, Yan · 59
Zhang, Yao-zhong · 91
Zhu, Chengsheng · 69
Zhu, Jun · 11
Zhu, Kuixi · 11
Ziemek, Daniel · 76
Zille, Pascal · 9
Zunder, Eli R. · 39, 78
Zweig, Micol · 23
