

The Drosophila Gene Expression Tool (DGET) for expression analyses

Yanhui Hu1, Aram Comjean1, Norbert Perrimon1,2, Stephanie Mohr1

1. Dept. of Genetics, Harvard Medical School, Boston, MA 0211; 2. Howard Hughes Medical Institute

Corresponding author: Stephanie Mohr

Abstract

Background

Next-generation sequencing technologies have greatly increased our ability to identify gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks, as well as help in the design of specific and efficient functional studies, such as by helping researchers choose the most appropriate tissue for a study of a group of genes, or conversely, by limiting a long list of gene candidates to the subset that are normally expressed at a given stage or in a given tissue.

Results

We report a Drosophila Gene Expression Tool (DGET, www.flyrnai.org/tools/dget/web/), which stores and facilitates search of RNA-Seq based expression profiles available from the modENCODE consortium and other public data sets. Using DGET, researchers are able to look up gene expression profiles, filter results based on threshold expression values, and compare expression data across different developmental stages, tissues and treatments. In addition, at DGET a researcher can analyze tissue or stage-specific enrichment for an inputted list of genes (e.g. ‘hits’ from a screen) and search for additional genes with similar expression patterns. We performed a number of analyses to demonstrate the quality and robustness of the resource. In particular, we show that evolutionarily conserved genes expressed at high or moderate levels in both fly and human tend to be expressed in similar tissues. Using DGET, we compared whole tissue profile and sub-region/cell-type specific datasets and estimated the potential cause of false positives in one dataset. We also demonstrated the usefulness of DGET for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist.

Conclusion

Altogether, DGET provides a flexible tool for expression data retrieval and analysis with short or long lists of Drosophila genes, which can help scientists to design stage- or tissue-specific in vivo studies and do other subsequent analyses.

Keywords

Drosophila, RNA-Seq, expression profile, synexpression

bioRxiv preprint doi: https://doi.org/10.1101/075358; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Background

The application of next-generation sequence technologies to RNA analysis has opened the door to relatively rapid, large-scale analyses of gene expression. ‘Standard’ RNA-seq analysis, for example, can provide a snapshot of gene expression in specific cell types or tissues (Wang, Gerstein et al. 2009), and related technologies such as Ribo-seq (Michel and Baranov 2013) provide more refined views, such as a snapshot of what genes are actively translated in a given cell or tissue. For Drosophila, efforts such as the modENCODE project (modENCODE Consortium, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014) have provided a baseline overview of expression under standard laboratory conditions for various cultured cell types, developmental stages, and tissues, as well as treatment conditions. Moreover, studies such as those investigating expression in sub-regions of the fly gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015) are providing increasingly detailed views of the baseline expression levels of various genes in various tissues, cell types and sub-regions. Altogether, these RNA-seq data resources provide helpful starting points for analysis of other gene lists.

Resources such as FlyBase (dos Santos, Schroeder et al. 2015) make it possible to quickly view modENCODE data for a given gene and make these data generally accessible to the community. The value of these data to the community can be further increased by facilitating search of lists of genes. For example, for gene lists originating from whole-animal or cultured cell studies, or for studies based on a list of orthologs of genes from another species, it can be very helpful to get a picture of what stages or tissues normally express those genes, as that will help focus stage- or tissue-specific in vivo studies and other subsequent analyses. We implemented DGET to help scientists retrieve modENCODE expression data in batch mode. DGET also hosts other relevant RNA-Seq data sets published in individual studies, such as profiles of specific sub-regions and cell types of the Drosophila gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015). Here, we describe DGET and perform a number of analyses to demonstrate the quality and robustness of the resource.

Results and Discussion

Database content and features of the user interface (UI)

The DGET database contains processed RNA-Seq data from the modENCODE consortium (modENCODE Consortium, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014), as released by FlyBase (dos Santos, Schroeder et al. 2015), as well as other published datasets we obtained from supplemental tables (Marianes and Spradling 2013, Dutta, Dobson et al. 2015, Clough and Barrett 2016). The DGET UI has two tabs (Figure 1).

At the “Search Gene Expression” tab, users can enter a list of genes or choose one of the predefined gene classes from GLAD (Hu, Comjean et al. 2015), e.g. kinases, then specify the datasets to be displayed. There are two search options, “look at expression” and “enrichment analysis.” The results page for “look at expression” displays expression values in a heatmap format. At this results page, users have the option to download the relevant expression values; download the heatmap; and further filter the list by defining a cutoff, limiting to specific dataset(s), or filtering out genes, for example those with an RPKM value of less than 1 based on carcass and/or digestive system expression in 1-day-old adults. We used an RPKM cutoff of 1 because this is considered the cutoff for ‘no or extremely low expression’ at FlyBase. The results page for an enrichment analysis displays the distribution of genes at different expression levels using a bar graph and heatmap. The cutoff values for different levels are defined based on FlyBase guidelines.
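The cutoff filter described above can be sketched in a few lines of Python. This is a minimal illustration, not DGET's actual code: the table layout, the gene names, the sample labels and the `filter_expressed` helper are all hypothetical, and only the RPKM cutoff of 1 comes from the text.

```python
from typing import Dict, List, Sequence

# Hypothetical expression table: gene -> {sample label -> RPKM}.
# Gene names, sample labels and values are invented for illustration.
EXPRESSION: Dict[str, Dict[str, float]] = {
    "geneA": {"carcass_1d": 12.0, "digestive_1d": 0.4},
    "geneB": {"carcass_1d": 0.2, "digestive_1d": 0.7},
    "geneC": {"carcass_1d": 0.0, "digestive_1d": 35.1},
}


def filter_expressed(
    table: Dict[str, Dict[str, float]],
    samples: Sequence[str],
    cutoff: float = 1.0,
    require_all: bool = False,
) -> List[str]:
    """Return genes whose RPKM reaches `cutoff` in the chosen samples.

    require_all=False keeps a gene if ANY chosen sample passes ("or");
    require_all=True keeps it only if ALL chosen samples pass ("and").
    A sample missing for a gene is treated as RPKM 0.
    """
    combine = all if require_all else any
    return [
        gene
        for gene, values in table.items()
        if combine(values.get(s, 0.0) >= cutoff for s in samples)
    ]


SAMPLES = ["carcass_1d", "digestive_1d"]
print(filter_expressed(EXPRESSION, SAMPLES))                    # "or" filter
print(filter_expressed(EXPRESSION, SAMPLES, require_all=True))  # "and" filter
```

The `require_all` switch mirrors the "and/or" wording of the filter: a gene can be kept when it is expressed in either selected sample, or only when it is expressed in both.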

Using the “Search Similar Genes” tab, users can enter a gene of interest and search for other genes with similar expression patterns based on Pearson correlation score. Users have the options to download the list of genes with similar expression patterns, a heatmap, and a normalized heatmap.
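As a rough illustration of a similar-gene search based on Pearson correlation, the sketch below ranks genes by their correlation with a query profile. The profile dictionary, the gene names and the `similar_genes` helper are hypothetical; the 0.7 default mirrors the cutoff quoted later for the Twist example, and DGET's own implementation may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def similar_genes(
    profiles: Dict[str, Sequence[float]], query: str, cutoff: float = 0.7
) -> List[Tuple[str, float]]:
    """Genes whose profile correlates with the query at or above `cutoff`,
    sorted from most to least similar."""
    reference = profiles[query]
    hits = [
        (gene, pearson(reference, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(
        (hit for hit in hits if hit[1] >= cutoff),
        key=lambda hit: hit[1],
        reverse=True,
    )


# Invented expression values across four samples.
PROFILES = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [4.0, 3.0, 2.0, 1.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
for gene, score in similar_genes(PROFILES, "query"):
    print(gene, round(score, 3))
```

Because Pearson correlation is insensitive to scale, a gene expressed at twice the level of the query but with the same shape across samples scores as a perfect match.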

Expression pattern of Drosophila regulatory genes

When genome-scale screening is not practical, a common approach is to select a specific subset of genes to start with, such as a group of genes with related activities. The most frequently screened subsets of genes are important regulatory genes, including genes that encode kinases, phosphatases, transcription factors, or canonical signal transduction pathway components. Our expectation is that these regulatory genes, which appear to be re-used in many contexts, will be expressed in many tissues. To test this, we analyzed the expression patterns of several Drosophila regulatory gene classes defined by GLAD (Hu, Comjean et al. 2015). These included canonical signal transduction pathway genes, kinases, phosphatases, transcription factors, secreted proteins, and receptors. The percentages of expressed genes were calculated across all tissues profiled using an RPKM of 1 or above as a cutoff for expressed versus not expressed (Figure 2). About 70-90% of the genes categorized as encoding canonical signal transduction pathway components, kinases, phosphatases, or transcription factors are expressed in each of the major tissue categories profiled, whereas only 30-60% of receptor or secreted protein genes are detected in any given tissue.

Correlation of expression with confidence in an ortholog relationship

It is well established that the evolutionary conservation of proteins correlates with conservation at the level of biological and/or biochemical functions. Drosophila is a model organism of particular interest for which a wide variety of molecular genetic tools are readily available. In particular, Drosophila models have been developed for a number of human diseases (Perrimon, Bonini et al. 2016). According to DIOPT, 9,705 of 13,902 protein-coding genes in Drosophila are predicted to have human ortholog(s) (Hu, Flockhart et al. 2011). Using DGET we analyzed the expression levels of the subset of Drosophila genes for which there is evidence that they are conserved in the human genome. Specifically, we analyzed subsets of genes scoring as putative human orthologs of fly genes at different levels of confidence, as defined by the DIOPT score (Hu, Flockhart et al. 2011). We found a strong correlation of percent expressed genes with DIOPT score (Figure 3). For example, for genes that have a high-confidence ortholog relationship (DIOPT score of 7 or above), almost all are expressed across all tissues. By contrast, for genes for which DIOPT analysis suggests that there is no evidence of a human ortholog (i.e. none of the 10 ortholog algorithms queried with DIOPT predict an ortholog), only 20-50% are expressed in each of the major tissue categories profiled. We suspect that this correlation is driven by essential genes, which are more evolutionarily conserved. We also note that gene set enrichment for the set of high-confidence orthologs indicates that “kinases” and “nucleotide binding” are among the top 20 enriched sets, indicating that the set of regulatory genes analyzed above has overlap with this set.
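The "percent expressed" comparison used in this and the preceding section amounts to counting, for each gene group and tissue, the genes at or above the RPKM cutoff. The sketch below shows that calculation for two DIOPT score bins. The scores, RPKM values, gene names and tissue labels are invented, and `percent_expressed` is a hypothetical helper rather than part of DGET.

```python
from typing import Dict, Iterable


def percent_expressed(
    genes: Iterable[str],
    rpkm: Dict[str, Dict[str, float]],
    tissue: str,
    cutoff: float = 1.0,
) -> float:
    """Percentage of `genes` with RPKM >= cutoff in `tissue`.

    Genes or tissues missing from the table count as not expressed.
    """
    gene_list = list(genes)
    if not gene_list:
        return 0.0
    expressed = sum(
        1 for g in gene_list if rpkm.get(g, {}).get(tissue, 0.0) >= cutoff
    )
    return 100.0 * expressed / len(gene_list)


# Invented DIOPT scores (0 = no ortholog predicted) and RPKM values.
DIOPT_SCORE = {"g1": 9, "g2": 8, "g3": 0, "g4": 0}
RPKM = {
    "g1": {"gut": 5.0, "testis": 2.0},
    "g2": {"gut": 1.5, "testis": 0.3},
    "g3": {"gut": 0.1, "testis": 4.0},
    "g4": {"gut": 0.0, "testis": 0.0},
}

high_confidence = [g for g, s in DIOPT_SCORE.items() if s >= 7]
no_ortholog = [g for g, s in DIOPT_SCORE.items() if s == 0]

for tissue in ("gut", "testis"):
    print(
        tissue,
        percent_expressed(high_confidence, RPKM, tissue),
        percent_expressed(no_ortholog, RPKM, tissue),
    )
```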

We next analyzed the 418 Drosophila essential genes identified by Spradling et al. (Spradling, Stern et al. 1999) using a large-scale single P-element insertion fly stock collection. The proportions of essential genes expressed at detectable levels in various tissues are very similar to those of genes with DIOPT scores of 7-10 (Figure 3, light purple and dark purple bars), with a Pearson correlation coefficient of 0.92.

Expression patterns of Drosophila orthologs of human genes that are highly expressed in specific tissues

Next, we asked whether genes conserved between human and Drosophila are also expressed in similar patterns. We used the tissue-based human proteome annotation available at the Human Protein Atlas (HPA) (www.proteinatlas.org) (Uhlen, Fagerberg et al. 2015) as the source for tissue-specific expression, and retrieved the set of human genes that are expressed in specific tissues. Next, we mapped these human genes to Drosophila orthologs using DIOPT (Hu, Flockhart et al. 2011), filtering out low rank ortholog pairs (see Methods), and analyzed the expression patterns of these high-confidence orthologs in Drosophila tissues using DGET (Figure 4). The results of our analysis using all annotated proteins without a filter did not clearly demonstrate conservation of expression patterns. However, an analysis limited to genes expressed at high or moderate levels (as annotated by HPA) with high-confidence annotation (i.e. excluding HPA “reliability” value of “uncertain”) indicates that gene expression patterns are conserved in similar tissues in Drosophila. For example, as a group, orthologs of genes highly expressed in the human cerebellum, cerebral cortex, lateral ventricle or hippocampus are highly expressed in the Drosophila central nervous system (CNS) or head, at both larval and adult stages, and orthologs of genes highly expressed in human testis are also highly expressed in the Drosophila testis. Moreover, orthologs of genes from some organs of the human digestive system, such as stomach, duodenum or small intestine, are also highly expressed in the Drosophila digestive system. To further compare the expression patterns of genes expressed in the human and Drosophila digestive systems, we analyzed the Drosophila gut sub-region data from Dutta et al. (Dutta, Dobson et al. 2015) (Figure 5). Orthologs of genes highly expressed in the human salivary gland and esophagus are highly expressed in the R1 upstream region, and orthologs of genes highly expressed in the human rectum, colon or appendix are more biased towards expression in the R5 downstream region. Fly orthologs of genes highly expressed in the human stomach, duodenum and small intestine are detected throughout the samples corresponding to R1 to R5.

Mining information from distinct but related fly gut gene expression datasets

We next sought to compare the results of whole-gut profiling with results from profiling of specific sub-regions or cell types, with the goal of identifying genes only expressed in specific sub-populations. Our rationale for the analysis was to determine the likelihood that genes expressed in a sub-population are missed in expression analysis of an entire organ. This type of false negative analysis should provide helpful information for interpreting results of whole-organ or whole-tissue studies. Thus, we compared the whole gut profiling data obtained by the modENCODE consortium for 20 day old adult flies (modENCODE Consortium, Roy et al. 2010) with data generated by profiling sub-regions of the midgut in 16-20 day old adult flies (Marianes and Spradling 2013). Whole gut profiling indicates that 9,109 genes are expressed in the gut of 20 day old adult flies (RPKM cutoff value of 1). Among the 4,790 protein-coding genes not detected as expressed in the whole-gut study, 136 genes are detected in at least 3 sub-regions of the gut (RPKM >= 3). These genes are either false negatives in whole gut profiling or false positives in sub-region profiling. We next did a gene set enrichment analysis with these 136 genes and found that stress response genes, such as heat-shock genes (Hsp70Aa, Hsp70Ab, Hsp70Ba, Hsp70Bbb), are enriched (P value = 3.05E-07). This suggests that the sample used for sub-region profiling was associated with some level of stress. Comparing the list of 136 genes with the Drosophila essential gene list, we found only one overlapping gene. In addition, only 23 of the 136 genes have a DIOPT score of 7-10 when mapping to human genes. Thus, a small fraction of these genes might be false negatives in whole tissue profiling, while the majority are likely to be false positives not normally present in the gut under non-stress conditions.
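The comparison above reduces to a two-step filter: keep genes below the whole-gut cutoff, then keep those that reach the sub-region cutoff in at least three sub-regions. A minimal sketch follows, with invented RPKM values and placeholder region labels; `subregion_only` is a hypothetical helper, and the restriction to protein-coding genes applied in the text is not modeled here.

```python
from typing import Dict, List


def subregion_only(
    whole_gut: Dict[str, float],
    subregions: Dict[str, Dict[str, float]],
    whole_cutoff: float = 1.0,
    sub_cutoff: float = 3.0,
    min_regions: int = 3,
) -> List[str]:
    """Genes not detected in whole-gut data (RPKM < whole_cutoff) but
    detected (RPKM >= sub_cutoff) in at least `min_regions` sub-regions."""
    selected = []
    for gene, rpkm in whole_gut.items():
        if rpkm >= whole_cutoff:
            continue  # detected in the whole-gut sample, not a candidate
        detected = sum(
            1 for value in subregions.get(gene, {}).values() if value >= sub_cutoff
        )
        if detected >= min_regions:
            selected.append(gene)
    return sorted(selected)


# Invented values; region labels are placeholders.
WHOLE_GUT = {"g1": 5.0, "g2": 0.3, "g3": 0.0, "g4": 0.9}
SUBREGIONS = {
    "g1": {"regionA": 10.0, "regionB": 10.0, "regionC": 10.0},
    "g2": {"regionA": 4.0, "regionB": 3.0, "regionC": 8.0, "regionD": 0.5},
    "g3": {"regionA": 6.0, "regionB": 1.0},
    "g4": {"regionA": 2.9, "regionB": 2.9, "regionC": 2.9},
}
print(subregion_only(WHOLE_GUT, SUBREGIONS))
```

Requiring detection in several sub-regions at a stricter cutoff than the whole-gut threshold makes it less likely that a single noisy measurement places a gene on the candidate list.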

Synexpression analysis for transcription factor Twist

Expression profiling is a powerful approach to identify functionally related genes, as genes showing synexpression often operate in similar pathways and/or processes (see for example (Dequeant, Fagegaltier et al. 2015)). We tested DGET for its usefulness for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist. DGET preferentially retrieved Twist target genes with cell line data as well as development data. For example, among the top 27 genes that share similar expression with Twist in cell lines (Pearson correlation coefficient cutoff = 0.7), 11 are Twist target genes based on ChIP-on-chip data (Sandmann, Girardot et al. 2007), and 8 of the 11 genes are high-confidence targets (Table 1). The enrichment p-value is 8.70E-04 for all Twist target genes and 3.05E-05 for high-confidence targets. We observed a less significant enrichment with development data (p-value of 5.00E-02 for all Twist target genes and p-value of 2.70E-03 for high-confidence targets), likely reflecting the diversity of cell types present in the developmental data and that not enough cells express twist. Thus, DGET should be most powerful when applied to RNA-seq datasets from single cells or groups of homogeneous cell populations.
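The text does not state which statistical test produced these enrichment p-values. One common choice for this kind of overlap question is a one-sided hypergeometric test, sketched below as an assumption rather than as the authors' method. The function name and the toy numbers are illustrative; to apply it to the Twist example, one would supply 27 for `drawn` and 11 for `overlap`, together with the total number of genes considered and the number of annotated Twist targets, neither of which is given here.

```python
from math import comb


def hypergeom_enrichment(
    population: int, annotated: int, drawn: int, overlap: int
) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    drawn:      size of the retrieved gene list
    overlap:    retrieved genes that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers only: 10 genes, 4 annotated, 3 retrieved, 2 of them annotated.
print(hypergeom_enrichment(population=10, annotated=4, drawn=3, overlap=2))
```

Exact integer arithmetic with `math.comb` avoids rounding problems in the tail sum; for genome-scale inputs a statistics library would be the more usual route.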

Concluding Remarks

In summary, DGET makes it possible to retrieve and compare Drosophila gene expression patterns generated by various groups using RNA-Seq. The tool can help scientists design experiments based on expression and analyze experimental results. The backend database for DGET is designed to easily accommodate the addition of new high-quality RNA-Seq datasets as they become available. Finally, although the anatomy of human and Drosophila is quite different, by using DGET we demonstrate that expression patterns of genes that are conserved and highly expressed are conserved between human and Drosophila in many matching tissues, underscoring the utility of the Drosophila model to understand the role of human genes with unknown functions.


Methods

Data retrieval

Processed modENCODE data were retrieved from FlyBase (ftp://ftp.flybase.net/releases/current/precomputed_files/genes/gene_rpkm_report_fb_2015_05.tsv.gz). Data published by Marianes and Spradling (Marianes and Spradling 2013) were retrieved from NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47780). Data published by Dutta et al. (Dutta, Dobson et al. 2015) were retrieved from the flygut-seq website (http://flygutseq.buchonlab.com/resources). Data retrieved were mapped to FlyBase identifiers from release 2015_5 and formatted for upload into the FlyRNAi database (Hu, Flockhart et al. 2011).

Expression pattern analysis

Human protein expression data were retrieved from proteinatlas.org and tissue-specific genes were selected using the file “ProteinAtlas_Normal_tissue_vs14.” Proteins with high or medium expression levels and a reliability value of “supportive” were selected. Proteins expressed in a broad range of tissues (i.e. more than 5 tissues) were filtered out. DIOPT vs5 was used to map genes from human to Drosophila (Hu, Flockhart et al. 2011). ‘Ortholog pair rank’ was added in the recent DIOPT release 5.2.1 (http://www.flyrnai.org/DRSC-ORH.html#versions). Drosophila genes with high or moderate rank were selected. The high/moderate rank mappings include gene pairs that have the best score in either forward or reverse mapping (and DIOPT score > 1), as well as gene pairs with DIOPT score > 3 that are not the best score either way.
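The high/moderate rank rule stated above can be written as a small predicate. The sketch below encodes only that rule as worded in the text; the `OrthologPair` record, its field names and the example pairs are hypothetical and do not reflect the actual DIOPT output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrthologPair:
    """Hypothetical record for one human-to-fly ortholog prediction."""
    human_gene: str
    fly_gene: str
    diopt_score: int
    best_forward: bool  # best-scoring fly match for the human gene
    best_reverse: bool  # best-scoring human match for the fly gene


def is_high_or_moderate(pair: OrthologPair) -> bool:
    """Best score in either direction with score > 1, or score > 3 otherwise."""
    if pair.best_forward or pair.best_reverse:
        return pair.diopt_score > 1
    return pair.diopt_score > 3


def keep_ranked(pairs: List[OrthologPair]) -> List[OrthologPair]:
    """Drop low rank pairs, keeping the input order."""
    return [p for p in pairs if is_high_or_moderate(p)]


# Invented example pairs.
PAIRS = [
    OrthologPair("HUMAN1", "fly1", 8, True, True),
    OrthologPair("HUMAN2", "fly2", 1, True, False),   # best match, score too low
    OrthologPair("HUMAN3", "fly3", 4, False, False),  # not best, but score > 3
    OrthologPair("HUMAN4", "fly4", 3, False, False),  # not best, score too low
]
print([p.fly_gene for p in keep_ranked(PAIRS)])
```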

Implementation

DGET was implemented using PHP and JavaScript with a MySQL database for data storage. It is hosted on a server provided by the Research IT Group (RITG) at Harvard Medical School. The MySQL database is also hosted on a server provided by RITG. Plotting of heatmaps for SVG download is done in R using the gplot heatmap function. Website bar charts are drawn using the 3C.js plotting package. The PHP Symfony framework scaffold is used to create DGET web pages and forms.

Declarations

Funding

Work at the DRSC is supported by NIGMS R01 GM067761, NIGMS R01 GM084947, and ORIP/NCRR R24 RR032668. S.E.M. is additionally supported in part by NCI Cancer Center Support Grant NIH 5 P30 CA06516 (E. Benz, PI). N.P. is an Investigator of the Howard Hughes Medical Institute.

Competing interest

The authors declare that they have no competing interests.


Authors’ contributions

YH designed and tested the application, implemented the back-end of the application, performed the analysis and drafted the manuscript. AC implemented the user interface and contributed to the back-end of the application. NP provided critical input on key features and the analysis as well as edited the manuscript. SEM provided oversight and critical input on key features and the analysis, and helped draft the manuscript. All authors read and approved the final manuscript.


Table 1. DGET similar gene search results for Twist with cell line data

FBgn          Gene        Correlation score   Twist target?*
FBgn0005636   nvy         0.910987            Yes, high confidence
FBgn0031738   CG9171      0.88094             Yes, high confidence
FBgn0015568   alpha-Est1  0.831094
FBgn0035733   CG8641      0.816603
FBgn0034997   CG3376      0.813417            Yes, low confidence
FBgn0040091   Ugt58Fa     0.799835
FBgn0039827   CG1544      0.773761
FBgn0010389   htl         0.772568            Yes, high confidence
FBgn0001250   if          0.769353            Yes, high confidence
FBgn0039799   CG15543     0.765649
FBgn0038755   Hs6st       0.76281             Yes, high confidence
FBgn0265577   CR44404     0.76095
FBgn0037439   CG10286     0.745739
FBgn0025682   scf         0.744414
FBgn0003301   rut         0.74375             Yes, low confidence
FBgn0036147   Plod        0.73896
FBgn0000723   FER         0.738927            Yes, low confidence
FBgn0034804   CG3831      0.735346
FBgn0051075   CG31075     0.731916
FBgn0263144   bin3        0.72961             Yes, high confidence
FBgn0000575   emc         0.728894            Yes, high confidence
FBgn0038353   CG5399      0.724139
FBgn0085407   Pvf3        0.720044            Yes, high confidence
FBgn0036857   CG9629      0.716929
FBgn0039073   CG4408      0.714359
FBgn0037632   Tcp-1eta    0.702547
FBgn0038804   CG10877     0.701509

* Twist targets as defined in (Sandmann, Girardot et al. 2007)


Figure 1. The DGET user interface.

1a. On the “Search Gene Expression” page, users can input a gene list by pasting Drosophila gene or protein symbols or IDs into the text box, or by uploading a file. The specific identifiers accepted are FlyBase Gene Identifier (FBgn), gene symbol, CG number, and full gene name. Users can choose to look at expression patterns or perform an enrichment analysis of the inputted list as compared with the underlying RNA-Seq data.

1b. At the “Search Similar Genes” page, users can enter a gene symbol (or other accepted identifier) to find genes with similar expression patterns.


Figure 2. Expression patterns of genes in major Drosophila regulatory gene groups.


Figure 3. Relationship between expression levels and gene conservation.

Drosophila genes that are conserved in the human genome at different confidence levels (i.e. different DIOPT scores) were analyzed by DGET. We found that across all tissues, expression levels correlate with confidence in the ortholog relationship. That is, in general, genes with higher DIOPT scores vs. human genes have higher expression levels. Genes with DIOPT scores of 7-10 (light purple bars) have similar expression patterns as compared with Drosophila essential genes (dark purple bars); i.e. in both cases, the genes are likely to be expressed in many or all tissues.


Figure 4. Comparison of gene expression patterns in humans and Drosophila.

High-confidence Drosophila orthologs of genes that are highly expressed in the small intestine, ovary, testis, cerebellum, cerebral cortex, or other tissues were analyzed using DGET. For at least some tissues, we see a correlation between genes highly expressed in specific human tissues (e.g. cerebellum, testis) and the expression of orthologs in cognate tissue sample(s) available for Drosophila (e.g. CNS or head, testis).


Figure 5. Comparison of Drosophila gut sub-region data with the human digestive system.


References

Boley, N., K. H. Wan, P. J. Bickel and S. E. Celniker (2014). "Navigating and mining modENCODE data." Methods 68(1): 38-47.

Cherbas, L., A. Willingham, D. Zhang, L. Yang, Y. Zou, B. D. Eads, J. W. Carlson, J. M. Landolin, P. Kapranov, J. Dumais, A. Samsonova, J. H. Choi, J. Roberts, C. A. Davis, H. Tang, M. J. van Baren, S. Ghosh, A. Dobin, K. Bell, W. Lin, L. Langton, M. O. Duff, A. E. Tenney, C. Zaleski, M. R. Brent, R. A. Hoskins, T. C. Kaufman, J. Andrews, B. R. Graveley, N. Perrimon, S. E. Celniker, T. R. Gingeras and P. Cherbas (2011). "The transcriptional diversity of 25 Drosophila cell lines." Genome Res 21(2): 301-314.

Clough, E. and T. Barrett (2016). "The Gene Expression Omnibus Database." Methods Mol Biol 1418: 93-110.

Dequeant, M. L., D. Fagegaltier, Y. Hu, K. Spirohn, A. Simcox, G. J. Hannon and N. Perrimon (2015). "Discovery of progenitor cell signatures by time-series synexpression analysis during Drosophila embryonic cell immortalization." Proc Natl Acad Sci U S A 112(42): 12974-12979.

dos Santos, G., A. J. Schroeder, J. L. Goodman, V. B. Strelets, M. A. Crosby, J. Thurmond, D. B. Emmert, W. M. Gelbart and the FlyBase Consortium (2015). "FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations." Nucleic Acids Res 43(Database issue): D690-697.

Dutta, D., A. J. Dobson, P. L. Houtz, C. Glasser, J. Revah, J. Korzelius, P. H. Patel, B. A. Edgar and N. Buchon (2015). "Regional Cell-Specific Transcriptome Mapping Reveals Regulatory Complexity in the Adult Drosophila Midgut." Cell Rep 12(2): 346-358.

Graveley, B. R., A. N. Brooks, J. W. Carlson, M. O. Duff, J. M. Landolin, L. Yang, C. G. Artieri, M. J. van Baren, N. Boley, B. W. Booth, J. B. Brown, L. Cherbas, C. A. Davis, A. Dobin, R. Li, W. Lin, J. H. Malone, N. R. Mattiuzzo, D. Miller, D. Sturgill, B. B. Tuch, C. Zaleski, D. Zhang, M. Blanchette, S. Dudoit, B. Eads, R. E. Green, A. Hammonds, L. Jiang, P. Kapranov, L. Langton, N. Perrimon, J. E. Sandler, K. H. Wan, A. Willingham, Y. Zhang, Y. Zou, J. Andrews, P. J. Bickel, S. E. Brenner, M. R. Brent, P. Cherbas, T. R. Gingeras, R. A. Hoskins, T. C. Kaufman, B. Oliver and S. E. Celniker (2011). "The developmental transcriptome of Drosophila melanogaster." Nature 471(7339): 473-479.

Hu, Y., A. Comjean, L. A. Perkins, N. Perrimon and S. E. Mohr (2015). "GLAD: an Online Database of Gene List Annotation for Drosophila." J Genomics 3: 75-81.

Hu, Y., I. Flockhart, A. Vinayagam, C. Bergwitz, B. Berger, N. Perrimon and S. E. Mohr (2011). "An integrative approach to ortholog prediction for disease-focused and other functional studies." BMC Bioinformatics 12: 357.

Marianes, A. and A. C. Spradling (2013). "Physiological and stem cell compartmentalization within the Drosophila midgut." Elife 2: e00886.

Michel, A. M. and P. V. Baranov (2013). "Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale." Wiley Interdiscip Rev RNA 4(5): 473-490.

modENCODE Consortium, S. Roy, J. Ernst, P. V. Kharchenko, P. Kheradpour, N. Negre, M. L. Eaton, J. M. Landolin, C. A. Bristow, L. Ma, M. F. Lin, S. Washietl, B. I. Arshinoff, F. Ay, P. E. Meyer, N. Robine, N. L. Washington, L. Di Stefano, E. Berezikov, C. D. Brown, R. Candeias, J. W. Carlson, A. Carr, I. Jungreis, D. Marbach, R. Sealfon, M. Y. Tolstorukov, S. Will, A. A. Alekseyenko, C. Artieri, B. W. Booth, A. N. Brooks, Q. Dai, C. A. Davis, M. O. Duff, X. Feng, A. A. Gorchakov, T. Gu, J. G. Henikoff, P. Kapranov, R. Li, H. K. MacAlpine, J. Malone, A. Minoda, J. Nordman, K. Okamura, M. Perry, S. K. Powell, N. C. Riddle, A. Sakai, A. Samsonova, J. E. Sandler, Y. B. Schwartz, N. Sher, R. Spokony, D. Sturgill, M. van Baren, K. H. Wan, L. Yang, C. Yu, E. Feingold, P. Good, M. Guyer, R. Lowdon, K. Ahmad, J. Andrews, B. Berger, S. E. Brenner, M. R. Brent, L. Cherbas, S. C. Elgin, T. R. Gingeras, R. Grossman, R. A. Hoskins, T. C. Kaufman, W. Kent, M. I. Kuroda, T. Orr-Weaver, N. Perrimon, V. Pirrotta, J. W. Posakony, B. Ren, S. Russell, P. Cherbas, B. R. Graveley, S. Lewis, G. Micklem, B. Oliver, P. J. Park, S. E. Celniker, S. Henikoff, G. H. Karpen, E. C. Lai, D. M. MacAlpine, L. D. Stein, K. P. White and M. Kellis (2010). "Identification of functional elements and regulatory circuits by Drosophila modENCODE." Science 330(6012): 1787-1797.

Perrimon, N., N. M. Bonini and P. Dhillon (2016). "Fruit flies on the front line: the translational impact of Drosophila." Dis Model Mech 9(3): 229-231.

Sandmann, T., C. Girardot, M. Brehme, W. Tongprasit, V. Stolc and E. E. Furlong (2007). "A core transcriptional network for early mesoderm development in Drosophila melanogaster." Genes Dev 21(4): 436-449.

Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty, N. Mozden, S. Misra and G. M. Rubin (1999). "The Berkeley Drosophila Genome Project gene disruption project: Single P-element insertions mutating 25% of vital Drosophila genes." Genetics 153(1): 135-177.

Uhlen, M., L. Fagerberg, B. M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, A. Sivertsson, C. Kampf, E. Sjostedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P. H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen and F. Ponten (2015). "Proteomics. Tissue-based map of the human proteome." Science 347(6220): 1260419.

Wang, Z., M. Gerstein and M. Snyder (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.
