The Drosophila Gene Expression Tool (DGET) for expression analyses

Yanhui Hu1, Aram Comjean1, Norbert Perrimon1,2, Stephanie Mohr1

1. Dept. of Genetics, Harvard Medical School, Boston, MA 0211; 2. Howard Hughes Medical Institute

Corresponding author: Stephanie Mohr

Abstract

Background

Next-generation sequencing technologies have greatly increased our ability to identify gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks, as well as help in the design of specific and efficient functional studies, such as by helping researchers choose the most appropriate tissue for a study of a group of genes, or conversely, by limiting a long list of gene candidates to the subset that are normally expressed at a given stage or in a given tissue.

Results

We report a Drosophila Gene Expression Tool (DGET, www.flyrnai.org/tools/dget/web/), which stores and facilitates search of RNA-Seq based expression profiles available from the modENCODE consortium and other public data sets. Using DGET, researchers are able to look up gene expression profiles, filter results based on threshold expression values, and compare expression data across different developmental stages, tissues and treatments. In addition, at DGET a researcher can analyze tissue or stage-specific enrichment for an inputted list of genes (e.g. ‘hits’ from a screen) and search for additional genes with similar expression patterns. We performed a number of analyses to demonstrate the quality and robustness of the resource. In particular, we show that evolutionarily conserved genes expressed at high or moderate levels in both fly and human tend to be expressed in similar tissues. Using DGET, we compared whole tissue profile and sub-region/cell-type specific datasets and estimated the potential cause of false positives in one dataset. We also demonstrated the usefulness of DGET for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist.

Conclusion

Altogether, DGET provides a flexible tool for expression data retrieval and analysis with short or long lists of Drosophila genes, which can help scientists to design stage- or tissue-specific in vivo studies and do other subsequent analyses.

Keywords

Drosophila, RNA-Seq, expression profile, synexpression

bioRxiv preprint doi: https://doi.org/10.1101/075358; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Background

The application of next-generation sequence technologies to RNA analysis has opened the door to relatively rapid, large-scale analyses of gene expression. ‘Standard’ RNA-seq analysis, for example, can provide a snapshot of gene expression in specific cell types or tissues (Wang, Gerstein et al. 2009), and related technologies such as Ribo-seq (Michel and Baranov 2013) provide more refined views, such as a snapshot of what genes are actively translated in a given cell or tissue. For Drosophila, efforts such as the modENCODE project (mod, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014) have provided a baseline overview of expression under standard laboratory conditions for various cultured cell types, developmental stages, and tissues, as well as treatment conditions. Moreover, studies such as those investigating expression in sub-regions of the fly gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015) are providing increasingly detailed views of the baseline expression levels of various genes in various tissues, cell types and sub-regions. Altogether, these RNA-seq data resources provide helpful starting points for analysis of other gene lists.

Resources such as FlyBase (dos Santos, Schroeder et al. 2015) make it possible to quickly view modENCODE data for a given gene and make these data generally accessible to the community. The value of these data to the community can be further increased by facilitating search of lists of genes. For example, for gene lists originating from whole-animal or cultured cell studies, or for studies based on a list of orthologs of genes from another species, it can be very helpful to get a picture of what stages or tissues normally express those genes, as that will help focus stage- or tissue-specific in vivo studies and other subsequent analyses. We implemented DGET to help scientists retrieve modENCODE expression data in batch mode. DGET also hosts other relevant RNA-Seq data sets published in individual studies, such as profiles of specific sub-regions and cell types of the Drosophila gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015). Here, we describe DGET and perform a number of analyses to demonstrate the quality and robustness of the resource.

Results and Discussion

Database content and features of the user interface (UI)

The DGET database contains processed RNA-Seq data from the modENCODE consortium (mod, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014), as released by FlyBase (dos Santos, Schroeder et al. 2015), as well as other published datasets we obtained from supplemental tables (Marianes and Spradling 2013, Dutta, Dobson et al. 2015, Clough and Barrett 2016). The DGET UI has two tabs (Figure 1).

At the “Search Gene Expression” tab, users can enter a list of genes or choose one of the predefined gene classes from GLAD (Hu, Comjean et al. 2015), e.g. kinases, then specify the datasets to be displayed. There are two search options, “look at expression” and “enrichment analysis.” The results page for “look at expression” displays expression values in a heatmap format. At this results page, users have the option to download the relevant expression values; download the heatmap; and further filter the list by defining a cutoff, limiting to specific dataset(s), or filtering out genes, for example those with an RPKM value of less than 1 based on carcass and/or digestive system expression in 1 day old adults. We used an RPKM cutoff of 1 because this is considered the cutoff for ‘no or extremely low expression’ at FlyBase. The results page for an enrichment analysis displays the distribution of genes at different expression levels using a bar graph and heatmap. The cutoff values for different levels are defined based on FlyBase guidelines.
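The cutoff filter described above can be illustrated with a minimal Python sketch. It is not the DGET implementation: the table, the dataset names and the function `filter_by_cutoff` are hypothetical, and the RPKM values are invented. Only the cutoff of 1 and the "and/or" choice across datasets come from the text.

```python
# Toy expression table: gene -> {dataset name: RPKM}.
# Gene names, dataset names and values are invented for illustration.
EXPRESSION = {
    "geneA": {"carcass_1d_adult": 0.4, "digestive_system_1d_adult": 0.2},
    "geneB": {"carcass_1d_adult": 12.0, "digestive_system_1d_adult": 0.7},
    "geneC": {"carcass_1d_adult": 3.5, "digestive_system_1d_adult": 48.1},
}


def filter_by_cutoff(expression, datasets, cutoff=1.0, mode="any"):
    """Keep genes whose RPKM is >= cutoff in any ("or") or all ("and") chosen datasets."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    kept = []
    for gene, profile in expression.items():
        # A dataset missing from a profile is treated as RPKM 0 (an assumption).
        if combine(profile.get(name, 0.0) >= cutoff for name in datasets):
            kept.append(gene)
    return sorted(kept)


DATASETS = ["carcass_1d_adult", "digestive_system_1d_adult"]
expressed_any = filter_by_cutoff(EXPRESSION, DATASETS, cutoff=1.0, mode="any")
expressed_all = filter_by_cutoff(EXPRESSION, DATASETS, cutoff=1.0, mode="all")
print(expressed_any)
print(expressed_all)
```

With `mode="any"` a gene survives if it reaches the cutoff in at least one selected dataset; with `mode="all"` it must reach the cutoff in every selected dataset.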

Using the “Search Similar Genes” tab, users can enter a gene of interest and search for other genes with similar expression patterns based on Pearson correlation score. Users have the options to download the list of genes with similar expression patterns, a heatmap, and a normalized heatmap.
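A similar-gene search of this kind can be sketched as follows, assuming each gene is represented by a vector of expression values over the same ordered set of samples. The profiles and the function names `pearson` and `similar_genes` are hypothetical, and the default cutoff of 0.7 is borrowed from the Twist analysis reported later in this paper; the sketch does not reproduce DGET's own code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def similar_genes(profiles, query, cutoff=0.7):
    """Return (gene, r) pairs with r >= cutoff, sorted by decreasing r."""
    query_profile = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        r = pearson(query_profile, profile)
        if r >= cutoff:  # a NaN correlation never passes this comparison
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Invented profiles over four samples.
PROFILES = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "scaled": [2.0, 4.0, 6.0, 8.0],
    "reversed": [4.0, 3.0, 2.0, 1.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
}
hits = similar_genes(PROFILES, "query")
print(hits)
```

Because Pearson correlation is insensitive to scale, a gene expressed at twice the level of the query but with the same shape across samples is still retrieved, whereas an anti-correlated or constant profile is not.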

Expression pattern of Drosophila regulatory genes

When genome-scale screening is not practical to do, a common approach is to select a specific subset of genes to start with, such as a group of genes with related activities. The most frequently screened sub-sets of genes are important regulatory genes including genes that encode kinases, phosphatases, transcription factors, or canonical signal transduction pathway components. Our expectation is that these regulatory genes, which appear to be re-used in many contexts, will be expressed in many tissues. To test this, we analyzed the expression patterns of several Drosophila regulatory gene classes defined by GLAD (Hu, Comjean et al. 2015). These included canonical signal transduction pathway genes, kinases, phosphatases, transcription factors, secreted proteins, and receptors. The percentages of expressed genes were calculated across all tissues profiled using an RPKM of 1 or above as a cutoff for expressed versus not expressed (Figure 2). About 70-90% of the genes categorized as encoding canonical signal transduction pathway components, kinases, phosphatases, or transcription factors are expressed in each of the major tissue categories profiled, whereas only 30-60% of receptor or secreted proteins are detected in any given tissue.

Correlation of expression with confidence in an ortholog relationship

It is well established that the evolutionary conservation of proteins correlates with conservation at the level of biological and/or biochemical functions. Drosophila is a model organism of particular interest for which a wide variety of molecular genetic tools are readily available. Particularly, Drosophila models have been developed for a number of human diseases (Perrimon, Bonini et al. 2016). According to DIOPT, 9,705 of 13,902 protein-coding genes in Drosophila are predicted to have human ortholog(s) (Hu, Flockhart et al. 2011). Using DGET we analyzed the expression levels of the subset of Drosophila genes for which there is evidence that they are conserved in the human genome. Specifically, we analyzed subsets of genes scoring as putative human orthologs of fly genes at different levels of confidence, as defined by the DIOPT score (Hu, Flockhart et al. 2011). We found a strong correlation of percent expressed genes with DIOPT score (Figure 3). For example, for genes that have a high-confidence ortholog relationship (DIOPT score of 7 or above), almost all are expressed across all tissues. By contrast, for genes for which DIOPT analysis suggests that there is no evidence of a human ortholog (i.e. none of the 10 ortholog algorithms queried with DIOPT predict an ortholog), only 20-50% are expressed in each of the major tissue categories profiled. We suspect that this correlation is driven by essential genes, which are more conserved evolutionarily. We also note that gene set enrichment for the set of high-confidence orthologs indicates that “kinases” and “nucleotide binding” are among the top 20 enriched sets, indicating that the set of regulatory genes analyzed above has overlap with this set.
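The percent-expressed comparison between ortholog confidence groups can be sketched as below. This is an illustration under stated assumptions, not the analysis code used for Figure 3: the RPKM values and DIOPT scores are invented, `percent_expressed` is a hypothetical helper, and encoding "no predicted ortholog" as a score of 0 is a convention chosen here. The RPKM cutoff of 1 and the score threshold of 7 come from the text.

```python
def percent_expressed(rpkm_by_gene, genes, cutoff=1.0):
    """Percentage of the given genes with RPKM >= cutoff in one tissue."""
    measured = [gene for gene in genes if gene in rpkm_by_gene]
    if not measured:
        return 0.0
    expressed = sum(1 for gene in measured if rpkm_by_gene[gene] >= cutoff)
    return 100.0 * expressed / len(measured)


# Invented RPKM values for a single tissue.
TISSUE_RPKM = {"g1": 25.0, "g2": 3.2, "g3": 0.0, "g4": 0.5, "g5": 7.0, "g6": 0.1}

# Invented DIOPT scores; 0 stands for "no algorithm predicts a human ortholog".
DIOPT_SCORE = {"g1": 9, "g2": 8, "g3": 0, "g4": 0, "g5": 0, "g6": 2}

high_confidence = [g for g, score in DIOPT_SCORE.items() if score >= 7]
no_ortholog = [g for g, score in DIOPT_SCORE.items() if score == 0]

pct_high = percent_expressed(TISSUE_RPKM, high_confidence)
pct_none = percent_expressed(TISSUE_RPKM, no_ortholog)
print(pct_high, pct_none)
```

Repeating the calculation for each tissue and each score bin would give one bar per tissue and group, which is the kind of summary the figure describes.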

We next analyzed the 418 Drosophila essential genes identified by Spradling et al (Spradling, Stern et al. 1999) using a large-scale single P-element insertion fly stock collection. The proportions of essential genes expressed at detectable levels in various tissues are very similar to those of the genes with DIOPT score 7-10 (Figure 3, light purple and dark purple bars), with a Pearson correlation coefficient equal to 0.92.

Expression patterns of Drosophila orthologs of human genes that are highly expressed in specific tissues

Next, we asked whether genes conserved between human and Drosophila are also expressed in similar patterns. We used the tissue-based human proteome annotation available at the Human Protein Atlas (HPA) (www.proteinatlas.org) (Uhlen, Fagerberg et al. 2015) as the source for tissue-specific expression, and retrieved the set of human genes that are expressed in specific tissues. Next, we mapped these human genes to Drosophila orthologs using DIOPT (Hu, Flockhart et al. 2011), filtering out low rank ortholog pairs (see Materials and Methods), and analyzed the expression patterns of these high-confidence orthologs in Drosophila tissues using DGET (Figure 4). The results of our analysis using all annotated proteins without a filter did not clearly demonstrate conservation of expression patterns. However, an analysis limited to genes expressed at high or moderate levels (as annotated by HPA) from high-confidence annotation (i.e. excluding HPA “reliability” value of “uncertain”) indicates that gene expression patterns are conserved in similar tissues in Drosophila. For example, as a group, orthologs of genes highly expressed in the human cerebellum, cerebral cortex, lateral ventricle or hippocampus are highly expressed in the Drosophila central nervous system (CNS) or head, at both larval and adult stages, and orthologs of genes highly expressed in human testis are also highly expressed in the Drosophila testis. Moreover, orthologs of genes from some organs of the human digestive system, such as stomach, duodenum or small intestine, are also highly expressed in the Drosophila digestive system. To further compare the expression patterns of genes expressed in the human and Drosophila digestive systems, we analyzed the Drosophila gut sub-region data from Dutta et al. (Dutta, Dobson et al. 2015) (Figure 5). Orthologs of genes highly expressed in the human salivary gland and esophagus are highly expressed in the R1 upstream region, and orthologs of genes highly expressed in the human rectum, colon or appendix are more biased towards expression in the R5 downstream region. Fly orthologs of genes highly expressed in the human stomach, duodenum and small intestine are detected throughout the samples corresponding to R1 to R5.

Mining information from distinct but related fly gut gene expression datasets

We next sought to compare the results of whole-gut profiling with results from profiling of specific sub-regions or cell types with the goal of identifying genes only expressed in specific sub-populations. Our rationale for the analysis was to determine the likelihood that genes expressed in a sub-population are missed in expression analysis of an entire organ. This type of false negative analysis should provide helpful information for interpreting results of whole-organ or whole-tissue studies. Thus, we compared the whole gut profiling data obtained by the modENCODE consortium for 20 day old adult flies (mod, Roy et al. 2010) with data generated by profiling sub-regions of the midgut in 16-20 day old adult flies (Marianes and Spradling 2013). Whole gut profiling indicates that 9,109 genes are expressed in the gut of 20 day old adult flies (RPKM cutoff value of 1). Among the 4,790 protein-coding genes not detected as expressed in the whole-gut study, 136 genes are detected in at least 3 sub-regions of the gut (RPKM >= 3). These genes are either false negatives in whole gut profiling or false positives in sub-region profiling. We next did a gene set enrichment analysis with these 136 genes and found that stress response genes, such as heat-shock genes (Hsp70Aa, Hsp70Ab, Hsp70Ba, Hsp70Bbb), are enriched (P value = 3.05E-07). This suggests that the sample used for sub-region profiling was associated with some level of stress. Comparing the list of 136 genes with the Drosophila essential gene list, we found only one overlapping gene. In addition, only 23 of the 136 genes have DIOPT score 7-10 when mapping to human genes. Thus, a small fraction of these genes might be false negatives in whole tissue profiling, while the majority of the genes are likely to be false positives not normally present in the gut under non-stress conditions.
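The selection of candidate genes in this comparison can be sketched in Python. The thresholds (whole-gut RPKM below 1; RPKM >= 3 in at least 3 sub-regions) are taken from the text, while the data, the region labels and the function `subregion_only_genes` are hypothetical and serve only to illustrate the logic.

```python
def subregion_only_genes(whole_gut, subregions,
                         whole_cutoff=1.0, sub_cutoff=3.0, min_regions=3):
    """Genes below whole_cutoff in whole-gut data but at or above sub_cutoff
    in at least min_regions sub-regions."""
    hits = []
    for gene, region_rpkm in subregions.items():
        # Genes absent from the whole-gut table are treated as RPKM 0 (an assumption).
        if whole_gut.get(gene, 0.0) >= whole_cutoff:
            continue  # detected in the whole-gut profile, so not a candidate
        detected = sum(1 for value in region_rpkm.values() if value >= sub_cutoff)
        if detected >= min_regions:
            hits.append(gene)
    return sorted(hits)


# Invented whole-gut RPKM values.
WHOLE_GUT = {"g1": 0.2, "g2": 15.0, "g3": 0.0}

# Invented sub-region RPKM values; region labels are placeholders.
SUBREGIONS = {
    "g1": {"region_1": 4.0, "region_2": 5.0, "region_3": 3.0, "region_4": 0.5},
    "g2": {"region_1": 10.0, "region_2": 10.0, "region_3": 10.0, "region_4": 10.0},
    "g3": {"region_1": 8.0, "region_2": 0.1, "region_3": 0.0, "region_4": 3.5},
}

candidates = subregion_only_genes(WHOLE_GUT, SUBREGIONS)
print(candidates)
```

The resulting list is what would then be passed to a gene set enrichment analysis or intersected with other gene lists, such as essential genes or high-confidence orthologs.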

Synexpression analysis for transcription factor Twist

Expression profiling is a powerful approach to identify functionally related genes, as genes showing synexpression often operate in similar pathways and/or processes (see for example (Dequeant, Fagegaltier et al. 2015)). We tested DGET for its usefulness for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist. DGET preferentially retrieved Twist target genes with cell line data as well as development data. For example, among the top 27 genes that share similar expression with Twist in cell lines (Pearson correlation coefficient cutoff = 0.7), 11 are Twist target genes based on ChIP-on-chip data (Sandmann, Girardot et al. 2007), and 8 of the 11 genes are high-confidence targets (Table 1). The enrichment p-value is 8.70E-04 for all Twist target genes and 3.05E-05 for high-confidence targets. We observed a less significant enrichment with development data (p-value of 5.00E-02 for all Twist target genes and p-value of 2.70E-03 for high-confidence targets), likely reflecting the diversity of cell types present in the developmental data and the fact that not enough cells express twist. Thus, DGET will be very powerful when applied to RNA-seq data sets from single cells or groups of homogeneous cell populations.
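The text does not state which statistical test produced these enrichment p-values. One common choice for an overlap of this kind is the one-sided hypergeometric test, sketched below purely as an assumption. The function `hypergeom_upper_tail` is a hypothetical helper and the numbers in the example are a small toy case, not the values behind Table 1.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` are annotated targets."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy case: 10 genes in the background, 4 of them targets;
# 3 genes retrieved, at least 2 of which are targets.
p_toy = hypergeom_upper_tail(population=10, successes=4, draws=3, observed=2)
print(p_toy)
```

To apply the same calculation to the cell line result, `draws` would be 27 and `observed` would be 11 (or 8 for the high-confidence set); the background size and the total number of annotated Twist targets are not given in the text and would have to be supplied.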

Concluding Remarks

In summary, DGET makes it possible to retrieve and compare Drosophila gene expression patterns generated by various groups using RNA-Seq. The tool can help scientists design experiments based on expression and analyze experiment results. The back-end database for DGET is designed to easily accommodate the addition of new high quality RNA-Seq datasets as they become available. Finally, although the anatomies of human and Drosophila are quite different, by using DGET, we demonstrate that expression patterns of genes that are conserved and highly expressed are conserved between human and Drosophila in many matching tissues, underscoring the utility of the Drosophila model to understand the role of human genes with unknown functions.

Methods

Data retrieval

Processed modENCODE data were retrieved from FlyBase (ftp://ftp.flybase.net/releases/current/precomputed_files/genes/gene_rpkm_report_fb_2015_05.tsv.gz). Data published by Marianes and Spradling (Marianes and Spradling 2013) were retrieved from NCBI Gene Expression Omnibus at (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47780). Data published by Dutta et al (Dutta, Dobson et al. 2015) were retrieved from the flygut-seq website (http://flygutseq.buchonlab.com/resources). Data retrieved were mapped to FlyBase identifiers from release 2015_5 and formatted for upload into the FlyRNAi database (Hu, Flockhart et al. 2011).

Expression pattern analysis

Human protein expression data were retrieved from proteinatlas.org and tissue-specific genes were selected using the file “ProteinAtlas_Normal_tissue_vs14.” Proteins with high or medium expression levels with a reliability value of “supportive” were selected. Proteins expressed in a broad range of tissues (i.e. more than 5 tissues) were filtered out. DIOPT vs5 was used to map genes from human to Drosophila (Hu, Flockhart et al. 2011). ‘Ortholog pair rank’ was added at recent DIOPT release 5.2.1 (http://www.flyrnai.org/DRSC-ORH.html#versions). Drosophila genes with high or moderate rank were selected. The high/moderate rank mapping includes the gene pairs that are best score in either forward or reverse mapping (and DIOPT score > 1) as well as gene pairs with DIOPT score > 3 if not best score either way.
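The high/moderate rank rule stated above can be written as a small predicate. The sketch below is one reading of that rule, not code from DIOPT: the function name, the tuple layout and the example pairs are hypothetical, and only the two score thresholds (> 1 and > 3) come from the text.

```python
def is_high_or_moderate_rank(diopt_score, best_forward, best_reverse):
    """Return True if an ortholog pair passes the high/moderate rank rule.

    best_forward / best_reverse: whether the pair has the best score in the
    forward (human to fly) or reverse (fly to human) mapping.
    """
    if best_forward or best_reverse:
        return diopt_score > 1
    return diopt_score > 3


# Invented ortholog pairs:
# (human gene, fly gene, DIOPT score, best forward, best reverse)
PAIRS = [
    ("HUMAN_A", "fly_a", 2, True, False),   # best one way, score > 1: kept
    ("HUMAN_B", "fly_b", 1, True, True),    # best both ways, score not > 1: dropped
    ("HUMAN_C", "fly_c", 4, False, False),  # not best, score > 3: kept
    ("HUMAN_D", "fly_d", 3, False, False),  # not best, score not > 3: dropped
]

selected = [
    (human, fly)
    for human, fly, score, forward, reverse in PAIRS
    if is_high_or_moderate_rank(score, forward, reverse)
]
print(selected)
```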

Implementation

DGET was implemented using PHP and JavaScript with a MySQL database for data storage. It is hosted on a server provided by the Research IT Group (RITG) at Harvard Medical School. The MySQL database is also hosted on a server provided by RITG. Plotting of heat-maps for svg download is done in R using the gplot heatmap function. Website bar charts are drawn using the 3C.js plotting package. The PHP Symfony framework scaffold is used to create DGET web pages and forms.

Declarations

Funding

Work at the DRSC is supported by NIGMS R01 GM067761, NIGMS R01 GM084947, and ORIP/NCRR R24 RR032668. S.E.M. is additionally supported in part by NCI Cancer Center Support Grant NIH 5 P30 CA06516 (E. Benz, PI). N.P. is an Investigator of the Howard Hughes Medical Institute.

Competing interest

The authors declare that they have no competing interests.

Authors’ contributions

YH designed and tested the application, implemented the back-end of the application, performed the analysis and drafted the manuscript. AC implemented the user interface and contributed to the back-end of the application. NP provided critical input on key features and the analysis as well as edited the manuscript. SEM provided oversight and critical input on key features and the analysis, and helped draft the manuscript. All authors read and approved the final manuscript.

Table 1. DGET similar gene search results for Twist with cell line data

FBgn          Gene         Correlation score   Twist target?*
FBgn0005636   nvy          0.910987            Yes, high confidence
FBgn0031738   CG9171       0.88094             Yes, high confidence
FBgn0015568   alpha-Est1   0.831094
FBgn0035733   CG8641       0.816603
FBgn0034997   CG3376       0.813417            Yes, low confidence
FBgn0040091   Ugt58Fa      0.799835
FBgn0039827   CG1544       0.773761
FBgn0010389   htl          0.772568            Yes, high confidence
FBgn0001250   if           0.769353            Yes, high confidence
FBgn0039799   CG15543      0.765649
FBgn0038755   Hs6st        0.76281             Yes, high confidence
FBgn0265577   CR44404      0.76095
FBgn0037439   CG10286      0.745739
FBgn0025682   scf          0.744414
FBgn0003301   rut          0.74375             Yes, low confidence
FBgn0036147   Plod         0.73896
FBgn0000723   FER          0.738927            Yes, low confidence
FBgn0034804   CG3831       0.735346
FBgn0051075   CG31075      0.731916
FBgn0263144   bin3         0.72961             Yes, high confidence
FBgn0000575   emc          0.728894            Yes, high confidence
FBgn0038353   CG5399       0.724139
FBgn0085407   Pvf3         0.720044            Yes, high confidence
FBgn0036857   CG9629       0.716929
FBgn0039073   CG4408       0.714359
FBgn0037632   Tcp-1eta     0.702547
FBgn0038804   CG10877      0.701509

* Twist targets as defined in (Sandmann, Girardot et al. 2007)

Figure 1. The DGET user interface.

1a. On the “Search Gene Expression” page, users can input a gene list by pasting Drosophila gene or protein symbols or IDs into the text box, or by uploading a file. The specific identifiers accepted are FlyBase Gene Identifier (FBgn), gene symbol, CG number, and full gene name. Users can choose to look at expression patterns or perform an enrichment analysis of the inputted list as compared with the underlying RNA-Seq data.

1b. At the “Search Similar Genes” page, users can enter a gene symbol (or other accepted identifier) to find genes with similar expression patterns.

Figure 2. Expression patterns of genes in major Drosophila regulatory gene groups.

Figure 3. Relationship between expression levels and gene conservation.

Drosophila genes that are conserved in the human genome at different confidence levels (i.e. different DIOPT scores) were analyzed by DGET. We found that across all tissues, expression levels correlate with confidence in the ortholog relationship. That is, in general, genes with higher DIOPT scores vs. human genes have higher expression levels. Genes with DIOPT scores of 7-10 (light purple bars) have similar expression patterns as compared with Drosophila essential genes (dark purple bars); i.e. in both cases, the genes are likely to be expressed in many or all tissues.

Figure 4. Comparison of gene expression patterns in humans and Drosophila.

High-confidence Drosophila orthologs of genes that are highly expressed in the small intestine, ovary, testis, cerebellum, cerebral cortex, or other tissues were analyzed using DGET. For at least some tissues, we see a correlation between genes highly expressed in specific human tissues (e.g. cerebellum, testis) and the expression of orthologs in cognate tissue sample(s) available for Drosophila (e.g. CNS or head, testis).

Figure 5. Comparison of Drosophila gut sub-region data with the human digestive system.

References

Boley, N., K. H. Wan, P. J. Bickel and S. E. Celniker (2014). "Navigating and mining modENCODE data." Methods 68(1): 38-47.

Cherbas, L., A. Willingham, D. Zhang, L. Yang, Y. Zou, B. D. Eads, J. W. Carlson, J. M. Landolin, P. Kapranov, J. Dumais, A. Samsonova, J. H. Choi, J. Roberts, C. A. Davis, H. Tang, M. J. van Baren, S. Ghosh, A. Dobin, K. Bell, W. Lin, L. Langton, M. O. Duff, A. E. Tenney, C. Zaleski, M. R. Brent, R. A. Hoskins, T. C. Kaufman, J. Andrews, B. R. Graveley, N. Perrimon, S. E. Celniker, T. R. Gingeras and P. Cherbas (2011). "The transcriptional diversity of 25 Drosophila cell lines." Genome Res 21(2): 301-314.

Clough, E. and T. Barrett (2016). "The Gene Expression Omnibus Database." Methods Mol Biol 1418: 93-110.

Dequeant, M. L., D. Fagegaltier, Y. Hu, K. Spirohn, A. Simcox, G. J. Hannon and N. Perrimon (2015). "Discovery of progenitor cell signatures by time-series synexpression analysis during Drosophila embryonic cell immortalization." Proc Natl Acad Sci U S A 112(42): 12974-12979.

dos Santos, G., A. J. Schroeder, J. L. Goodman, V. B. Strelets, M. A. Crosby, J. Thurmond, D. B. Emmert, W. M. Gelbart and C. FlyBase (2015). "FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations." Nucleic Acids Res 43(Database issue): D690-697.

Dutta, D., A. J. Dobson, P. L. Houtz, C. Glasser, J. Revah, J. Korzelius, P. H. Patel, B. A. Edgar and N. Buchon (2015). "Regional Cell-Specific Transcriptome Mapping Reveals Regulatory Complexity in the Adult Drosophila Midgut." Cell Rep 12(2): 346-358.

Graveley, B. R., A. N. Brooks, J. W. Carlson, M. O. Duff, J. M. Landolin, L. Yang, C. G. Artieri, M. J. van Baren, N. Boley, B. W. Booth, J. B. Brown, L. Cherbas, C. A. Davis, A. Dobin, R. Li, W. Lin, J. H. Malone, N. R. Mattiuzzo, D. Miller, D. Sturgill, B. B. Tuch, C. Zaleski, D. Zhang, M. Blanchette, S. Dudoit, B. Eads, R. E. Green, A. Hammonds, L. Jiang, P. Kapranov, L. Langton, N. Perrimon, J. E. Sandler, K. H. Wan, A. Willingham, Y. Zhang, Y. Zou, J. Andrews, P. J. Bickel, S. E. Brenner, M. R. Brent, P. Cherbas, T. R. Gingeras, R. A. Hoskins, T. C. Kaufman, B. Oliver and S. E. Celniker (2011). "The developmental transcriptome of Drosophila melanogaster." Nature 471(7339): 473-479.

Hu, Y., A. Comjean, L. A. Perkins, N. Perrimon and S. E. Mohr (2015). "GLAD: an Online Database of Gene List Annotation for Drosophila." J Genomics 3: 75-81.

Hu, Y., I. Flockhart, A. Vinayagam, C. Bergwitz, B. Berger, N. Perrimon and S. E. Mohr (2011). "An integrative approach to ortholog prediction for disease-focused and other functional studies." BMC Bioinformatics 12: 357.

Marianes, A. and A. C. Spradling (2013). "Physiological and stem cell compartmentalization within the Drosophila midgut." Elife 2: e00886.

Michel, A. M. and P. V. Baranov (2013). "Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale." Wiley Interdiscip Rev RNA 4(5): 473-490.

mod, E. C., S. Roy, J. Ernst, P. V. Kharchenko, P. Kheradpour, N. Negre, M. L. Eaton, J. M. Landolin, C. A. Bristow, L. Ma, M. F. Lin, S. Washietl, B. I. Arshinoff, F. Ay, P. E. Meyer, N. Robine, N. L. Washington, L. Di Stefano, E. Berezikov, C. D. Brown, R. Candeias, J. W. Carlson, A. Carr, I. Jungreis, D. Marbach, R. Sealfon, M. Y. Tolstorukov, S. Will, A. A. Alekseyenko, C. Artieri, B. W. Booth, A. N. Brooks, Q. Dai, C. A. Davis, M. O. Duff, X. Feng, A. A. Gorchakov, T. Gu, J. G. Henikoff, P. Kapranov, R. Li, H. K. MacAlpine, J. Malone, A. Minoda, J. Nordman, K. Okamura, M. Perry, S. K. Powell, N. C. Riddle, A. Sakai, A. Samsonova, J. E. Sandler, Y. B. Schwartz, N. Sher, R. Spokony, D. Sturgill, M. van Baren, K. H. Wan, L. Yang, C. Yu, E. Feingold, P. Good, M. Guyer, R. Lowdon, K. Ahmad, J. Andrews, B. Berger, S. E. Brenner, M. R. Brent, L. Cherbas, S. C. Elgin, T. R. Gingeras, R. Grossman, R. A. Hoskins, T. C. Kaufman, W. Kent, M. I. Kuroda, T. Orr-Weaver, N. Perrimon, V. Pirrotta, J. W. Posakony, B. Ren, S. Russell, P. Cherbas, B. R. Graveley, S. Lewis, G. Micklem, B. Oliver, P. J. Park, S. E. Celniker, S. Henikoff, G. H. Karpen, E. C. Lai, D. M. MacAlpine, L. D. Stein, K. P. White and M. Kellis (2010). "Identification of functional elements and regulatory circuits by Drosophila modENCODE." Science 330(6012): 1787-1797.

Perrimon, N., N. M. Bonini and P. Dhillon (2016). "Fruit flies on the front line: the translational impact of Drosophila." Dis Model Mech 9(3): 229-231.

Sandmann, T., C. Girardot, M. Brehme, W. Tongprasit, V. Stolc and E. E. Furlong (2007). "A core transcriptional network for early mesoderm development in Drosophila melanogaster." Genes Dev 21(4): 436-449.

Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty, N. Mozden, S. Misra and G. M. Rubin (1999). "The Berkeley Drosophila Genome Project gene disruption project: Single P-element insertions mutating 25% of vital Drosophila genes." Genetics 153(1): 135-177.

Uhlen, M., L. Fagerberg, B. M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, A. Sivertsson, C. Kampf, E. Sjostedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P. H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen and F. Ponten (2015). "Proteomics. Tissue-based map of the human proteome." Science 347(6220): 1260419.

Wang, Z., M. Gerstein and M. Snyder (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.
