The Drosophila Gene Expression Tool (DGET) for expression analyses

Yanhui Hu1, Aram Comjean1, Norbert Perrimon1,2, Stephanie Mohr1

1. Dept. of Genetics, Harvard Medical School, Boston, MA 0211; 2. Howard Hughes Medical Institute

Corresponding author: Stephanie Mohr

Abstract

Background

Next-generation sequencing technologies have greatly increased our ability to identify gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks, as well as help in the design of specific and efficient functional studies, such as by helping researchers choose the most appropriate tissue for a study of a group of genes, or conversely, by limiting a long list of gene candidates to the subset that are normally expressed at a given stage or in a given tissue.

Results

We report a Drosophila Gene Expression Tool (DGET, www.flyrnai.org/tools/dget/web/), which stores and facilitates search of RNA-Seq based expression profiles available from the modENCODE consortium and other public data sets. Using DGET, researchers are able to look up gene expression profiles, filter results based on threshold expression values, and compare expression data across different developmental stages, tissues and treatments. In addition, at DGET a researcher can analyze tissue- or stage-specific enrichment for an inputted list of genes (e.g. ‘hits’ from a screen) and search for additional genes with similar expression patterns. We performed a number of analyses to demonstrate the quality and robustness of the resource. In particular, we show that evolutionarily conserved genes expressed at high or moderate levels in both fly and human tend to be expressed in similar tissues. Using DGET, we compared whole-tissue profiles with sub-region and cell-type specific datasets and estimated the potential cause of false positives in one dataset. We also demonstrated the usefulness of DGET for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist.

Conclusion

Altogether, DGET provides a flexible tool for expression data retrieval and analysis with short or long lists of Drosophila genes, which can help scientists to design stage- or tissue-specific in vivo studies and do other subsequent analyses.

Keywords

Drosophila, RNA-Seq, expression profile, synexpression

bioRxiv preprint doi: https://doi.org/10.1101/075358; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Background

The application of next-generation sequencing technologies to RNA analysis has opened the door to relatively rapid, large-scale analyses of gene expression. ‘Standard’ RNA-seq analysis, for example, can provide a snapshot of gene expression in specific cell types or tissues (Wang, Gerstein et al. 2009), and related technologies such as Ribo-seq (Michel and Baranov 2013) provide more refined views, such as a snapshot of what genes are actively translated in a given cell or tissue. For Drosophila, efforts such as the modENCODE project (modENCODE Consortium, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014) have provided a baseline overview of expression under standard laboratory conditions for various cultured cell types, developmental stages, and tissues, as well as treatment conditions. Moreover, studies such as those investigating expression in sub-regions of the fly gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015) are providing increasingly detailed views of the baseline expression levels of various genes in various tissues, cell types and sub-regions. Altogether, these RNA-seq data resources provide helpful starting points for analysis of other gene lists.

Resources such as FlyBase (dos Santos, Schroeder et al. 2015) make it possible to quickly view modENCODE data for a given gene and make these data generally accessible to the community. The value of these data to the community can be further increased by facilitating search of lists of genes. For example, for gene lists originating from whole-animal or cultured cell studies, or for studies based on a list of orthologs of genes from another species, it can be very helpful to get a picture of what stages or tissues normally express those genes, as that will help focus stage- or tissue-specific in vivo studies and other subsequent analyses. We implemented DGET to help scientists retrieve modENCODE expression data in batch mode. DGET also hosts other relevant RNA-Seq datasets published in individual studies, such as profiles of specific sub-regions and cell types of the Drosophila gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015). Here, we describe DGET and perform a number of analyses to demonstrate the quality and robustness of the resource.

Results and Discussion

Database content and features of the user interface (UI)

The DGET database contains processed RNA-Seq data from the modENCODE consortium (modENCODE Consortium, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014), as released by FlyBase (dos Santos, Schroeder et al. 2015), as well as other published datasets we obtained from supplemental tables (Marianes and Spradling 2013, Dutta, Dobson et al. 2015, Clough and Barrett 2016). The DGET UI has two tabs (Figure 1).

At the “Search Gene Expression” tab, users can enter a list of genes or choose one of the predefined gene classes from GLAD (Hu, Comjean et al. 2015), e.g. kinases, then specify the datasets to be displayed. There are two search options, “look at expression” and “enrichment analysis.” The results page for “look at expression” displays expression values in a heatmap format. At this results page, users have the option to download the relevant expression values; download the heatmap; and further filter the list by defining a cutoff, limiting the display to specific dataset(s), or filtering out genes, for example those with an RPKM value of less than 1 based on carcass and/or digestive system expression in 1-day-old adults. We used an RPKM cutoff of 1 because this is considered the cutoff for ‘no or extremely low expression’ at FlyBase. The results page for an enrichment analysis displays the distribution of genes at different expression levels using a bar graph and heatmap. The cutoff values for different levels are defined based on FlyBase guidelines.
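The cutoff filter described above can be sketched in a few lines of Python. This is a minimal illustration and not the DGET implementation: the function name, the dataset labels and the expression values are all hypothetical, and the only detail taken from the text is the default RPKM cutoff of 1.

```python
def filter_by_cutoff(profiles, datasets, cutoff=1.0, require_all=False):
    """Keep genes whose RPKM reaches `cutoff` in the selected datasets.

    profiles: dict mapping gene -> dict mapping dataset name -> RPKM.
    require_all=False keeps a gene expressed in at least one selected
    dataset ("and/or"); require_all=True demands expression in every one.
    """
    combine = all if require_all else any
    return {
        gene: values
        for gene, values in profiles.items()
        if combine(values.get(name, 0.0) >= cutoff for name in datasets)
    }


# Hypothetical RPKM values, for illustration only.
profiles = {
    "geneA": {"carcass_1d": 12.0, "digestive_1d": 0.4},
    "geneB": {"carcass_1d": 0.2, "digestive_1d": 0.0},
    "geneC": {"carcass_1d": 3.5, "digestive_1d": 8.1},
}
selected = ["carcass_1d", "digestive_1d"]

in_either = filter_by_cutoff(profiles, selected)
in_both = filter_by_cutoff(profiles, selected, require_all=True)
print(sorted(in_either))  # genes at or above the cutoff in at least one dataset
print(sorted(in_both))    # genes at or above the cutoff in both datasets
```

Treating a missing dataset value as 0 RPKM is a design choice of this sketch; a stricter version might report such genes separately.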

Using the “Search Similar Genes” tab, users can enter a gene of interest and search for other genes with similar expression patterns based on Pearson correlation score. Users have the option to download the list of genes with similar expression patterns, a heatmap, and a normalized heatmap.
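A similarity search of this kind can be sketched as follows. The code is an illustrative assumption about how a Pearson-based ranking might be computed, not the DGET back-end; the gene names and values are hypothetical, and the default threshold of 0.7 simply mirrors the cutoff used for the Twist analysis in this paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def similar_genes(profiles, query, min_r=0.7):
    """Rank genes by correlation with `query`, keeping those with r >= min_r."""
    target = profiles[query]
    hits = []
    for gene, values in profiles.items():
        if gene == query:
            continue
        r = pearson(target, values)
        if not math.isnan(r) and r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across four samples.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],   # same shape as the query
    "g2": [4.0, 3.0, 2.0, 1.0],   # opposite trend
    "g3": [5.0, 5.0, 5.0, 5.0],   # flat profile, skipped
    "g4": [1.0, 2.0, 3.0, 10.0],  # similar trend with one outlying sample
}
hits = similar_genes(profiles, "query")
for gene, r in hits:
    print(gene, round(r, 3))
```

Because Pearson correlation ignores scale, a gene expressed at twice the level of the query but with the same shape scores as a perfect match, which is the behaviour one wants for synexpression searches.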

Expression pattern of Drosophila regulatory genes

When genome-scale screening is not practical, a common approach is to select a specific subset of genes to start with, such as a group of genes with related activities. The most frequently screened subsets of genes are important regulatory genes, including genes that encode kinases, phosphatases, transcription factors, or canonical signal transduction pathway components. Our expectation is that these regulatory genes, which appear to be re-used in many contexts, will be expressed in many tissues. To test this, we analyzed the expression patterns of several Drosophila regulatory gene classes defined by GLAD (Hu, Comjean et al. 2015). These included canonical signal transduction pathway genes, kinases, phosphatases, transcription factors, secreted proteins, and receptors. The percentages of expressed genes were calculated across all tissues profiled using an RPKM of 1 or above as the cutoff for expressed versus not expressed (Figure 2). About 70-90% of the genes categorized as encoding canonical signal transduction pathway components, kinases, phosphatases, or transcription factors are expressed in each of the major tissue categories profiled, whereas only 30-60% of genes encoding receptors or secreted proteins are detected in any given tissue.

Correlation of expression with confidence in an ortholog relationship

It is well established that the evolutionary conservation of proteins correlates with conservation at the level of biological and/or biochemical functions. Drosophila is a model organism of particular interest for which a wide variety of molecular genetic tools are readily available. In particular, Drosophila models have been developed for a number of human diseases (Perrimon, Bonini et al. 2016). According to DIOPT, 9,705 of 13,902 protein-coding genes in Drosophila are predicted to have human ortholog(s) (Hu, Flockhart et al. 2011). Using DGET we analyzed the expression levels of the subset of Drosophila genes for which there is evidence that they are conserved in the human genome. Specifically, we analyzed subsets of genes scoring as putative human orthologs of fly genes at different levels of confidence, as defined by the DIOPT score (Hu, Flockhart et al. 2011). We found a strong correlation of percent expressed genes with DIOPT score (Figure 3). For example, for genes that have a high-confidence ortholog relationship (DIOPT score of 7 or above), almost all are expressed across all tissues. By contrast, for genes for which DIOPT analysis suggests that there is no evidence of a human ortholog (i.e. none of the 10 ortholog algorithms queried with DIOPT predict an ortholog), only 20-50% are expressed in each of the major tissue categories profiled. We suspect that this correlation is driven by essential genes, which are more evolutionarily conserved. We also note that gene set enrichment analysis for the set of high-confidence orthologs places “kinases” and “nucleotide binding” among the top 20 enriched sets, indicating that the set of regulatory genes analyzed above overlaps with this set.
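The percent-expressed statistic used in this section and the preceding one is simple to reproduce. The sketch below is an illustration under stated assumptions: the function name, gene identifiers, tissue labels and RPKM values are hypothetical, and only the RPKM cutoff of 1 comes from the text.

```python
def percent_expressed(profiles, gene_group, tissues, cutoff=1.0):
    """Percentage of genes in `gene_group` with RPKM >= cutoff, per tissue.

    Genes absent from `profiles` are ignored rather than counted as
    not expressed.
    """
    genes = [g for g in gene_group if g in profiles]
    result = {}
    for tissue in tissues:
        n_expressed = sum(
            1 for g in genes if profiles[g].get(tissue, 0.0) >= cutoff
        )
        result[tissue] = 100.0 * n_expressed / len(genes) if genes else 0.0
    return result


# Hypothetical RPKM values for a small gene group.
profiles = {
    "k1": {"gut": 5.0, "testis": 0.2},
    "k2": {"gut": 1.0, "testis": 3.0},
    "k3": {"gut": 0.5, "testis": 0.0},
    "k4": {"gut": 20.0, "testis": 2.0},
}
group = ["k1", "k2", "k3", "k4", "not_profiled"]

percentages = percent_expressed(profiles, group, ["gut", "testis"])
print(percentages)
```

Running the same function over gene groups binned by DIOPT score, or over a gene class such as kinases, gives one bar per tissue per group of the kind plotted in Figures 2 and 3.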

We next analyzed the 418 Drosophila essential genes identified by Spradling et al. (Spradling, Stern et al. 1999) using a large-scale single P-element insertion fly stock collection. The proportions of essential genes expressed at detectable levels in various tissues are very similar to those of the genes with DIOPT scores of 7-10 (Figure 3, light purple and dark purple bars), with a Pearson correlation coefficient of 0.92.

Expression patterns of Drosophila orthologs of human genes that are highly expressed in specific tissues

Next, we asked whether genes conserved between human and Drosophila are also expressed in similar patterns. We used the tissue-based human proteome annotation available at the Human Protein Atlas (HPA) (www.proteinatlas.org) (Uhlen, Fagerberg et al. 2015) as the source for tissue-specific expression, and retrieved the set of human genes that are expressed in specific tissues. Next, we mapped these human genes to Drosophila orthologs using DIOPT (Hu, Flockhart et al. 2011), filtering out low-rank ortholog pairs (see Methods), and analyzed the expression patterns of these high-confidence orthologs in Drosophila tissues using DGET (Figure 4). The results of our analysis using all annotated proteins without a filter did not clearly demonstrate conservation of expression patterns. However, an analysis limited to genes expressed at high or moderate levels (as annotated by HPA) with high-confidence annotation (i.e. excluding an HPA “reliability” value of “uncertain”) indicates that gene expression patterns are conserved in similar tissues in Drosophila. For example, as a group, orthologs of genes highly expressed in the human cerebellum, cerebral cortex, lateral ventricle or hippocampus are highly expressed in the Drosophila central nervous system (CNS) or head, at both larval and adult stages, and orthologs of genes highly expressed in human testis are also highly expressed in the Drosophila testis. Moreover, orthologs of genes from some organs of the human digestive system, such as stomach, duodenum or small intestine, are also highly expressed in the Drosophila digestive system. To further compare the expression patterns of genes expressed in the human and Drosophila digestive systems, we analyzed the Drosophila gut sub-region data from Dutta et al. (Dutta, Dobson et al. 2015) (Figure 5). Orthologs of genes highly expressed in the human salivary gland and esophagus are highly expressed in the R1 upstream region, and orthologs of genes highly expressed in the human rectum, colon or appendix are more biased towards expression in the R5 downstream region. Fly orthologs of genes highly expressed in the human stomach, duodenum and small intestine are detected throughout the samples corresponding to R1 to R5.

Mining information from distinct but related fly gut gene expression datasets

We next sought to compare the results of whole-gut profiling with results from profiling of specific sub-regions or cell types, with the goal of identifying genes expressed only in specific sub-populations. Our rationale for the analysis was to determine the likelihood that genes expressed in a sub-population are missed in expression analysis of an entire organ. This type of false negative analysis should provide helpful information for interpreting results of whole-organ or whole-tissue studies. Thus, we compared the whole-gut profiling data obtained by the modENCODE consortium for 20-day-old adult flies (modENCODE Consortium, Roy et al. 2010) with data generated by profiling sub-regions of the midgut in 16-20-day-old adult flies (Marianes and Spradling 2013). Whole-gut profiling indicates that 9,109 genes are expressed in the gut of 20-day-old adult flies (RPKM cutoff value of 1). Among the 4,790 protein-coding genes not detected as expressed in the whole-gut study, 136 genes are detected in at least 3 sub-regions of the gut (RPKM >= 3). These genes are either false negatives in whole-gut profiling or false positives in sub-region profiling. We next did a gene set enrichment analysis with these 136 genes and found that stress response genes, such as heat-shock genes (Hsp70Aa, Hsp70Ab, Hsp70Ba, Hsp70Bbb), are enriched (P value = 3.05E-07). This suggests that the sample used for sub-region profiling was associated with some level of stress. Comparing the list of 136 genes with the Drosophila essential gene list, we found only one overlapping gene. In addition, only 23 of the 136 genes have a DIOPT score of 7-10 when mapped to human genes. Thus, a small fraction of these genes might be false negatives of whole-tissue profiling, while the majority are likely to be false positives that are not normally present in the gut under non-stress conditions.
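The selection of candidate genes in this comparison can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' script: the gene names, region labels and values are hypothetical, while the thresholds (RPKM below 1 in whole gut, RPKM of at least 3 in at least 3 sub-regions) follow the text.

```python
def subregion_only_genes(whole_gut, subregions,
                         whole_cutoff=1.0, sub_cutoff=3.0, min_regions=3):
    """Genes below `whole_cutoff` in whole-gut data but at or above
    `sub_cutoff` in at least `min_regions` sub-region samples.

    whole_gut: dict mapping gene -> RPKM in the whole-gut sample.
    subregions: dict mapping gene -> dict of region name -> RPKM.
    """
    candidates = []
    for gene, regional in subregions.items():
        if whole_gut.get(gene, 0.0) >= whole_cutoff:
            continue  # detected in whole gut, so not a discrepancy
        n_regions = sum(1 for value in regional.values() if value >= sub_cutoff)
        if n_regions >= min_regions:
            candidates.append(gene)
    return sorted(candidates)


# Hypothetical values for illustration only.
whole_gut = {"gA": 0.3, "gB": 15.0, "gC": 0.0, "gD": 0.8}
subregions = {
    "gA": {"R1": 4.0, "R2": 5.5, "R3": 3.0, "R4": 0.1, "R5": 0.0},
    "gB": {"R1": 9.0, "R2": 8.0, "R3": 7.0, "R4": 6.0, "R5": 5.0},
    "gC": {"R1": 3.2, "R2": 0.0, "R3": 0.0, "R4": 0.0, "R5": 4.1},
    "gD": {"R1": 2.9, "R2": 2.9, "R3": 2.9, "R4": 2.9, "R5": 2.9},
}

candidates = subregion_only_genes(whole_gut, subregions)
print(candidates)
```

The resulting list is what would then be passed to a gene set enrichment analysis or intersected with the essential gene list, as described above.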

Synexpression analysis for transcription factor Twist

Expression profiling is a powerful approach to identify functionally related genes, as genes showing synexpression often operate in similar pathways and/or processes (see for example (Dequeant, Fagegaltier et al. 2015)). We tested the usefulness of DGET for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist. DGET preferentially retrieved Twist target genes with cell line data as well as development data. For example, among the top 27 genes that share similar expression with Twist in cell lines (Pearson correlation coefficient cutoff = 0.7), 11 are Twist target genes based on ChIP-on-chip data (Sandmann, Girardot et al. 2007), and 8 of the 11 are high-confidence targets (Table 1). The enrichment p-value is 8.70E-04 for all Twist target genes and 3.05E-05 for high-confidence targets. We observed a less significant enrichment with development data (p-value of 5.00E-02 for all Twist target genes and 2.70E-03 for high-confidence targets), likely reflecting the diversity of cell types present in the developmental data and the fact that not enough cells express twist. Thus, DGET will be most powerful when applied to RNA-seq datasets from single cells or groups of homogeneous cell populations.
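The text does not state which statistical test produced these enrichment p-values. A one-sided hypergeometric test is a common choice for this kind of overlap and is assumed in the sketch below; the function name is ours, and the background size and total number of Twist targets in the usage example are placeholders rather than values from the paper, so the printed result is not expected to reproduce the reported p-values.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric distribution.

    population: size of the background gene set
    successes:  number of background genes carrying the annotation
    draws:      size of the retrieved gene list
    observed:   number of annotated genes in the retrieved list
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Small case that can be checked by hand: drawing 5 of 10 genes, 5 of
# which are annotated, and getting all 5 annotated ones has
# probability 1 / C(10, 5).
p_small = hypergeom_tail(10, 5, 5, 5)

# Shape of the Twist query: 27 retrieved genes, 11 of them targets.
# ASSUMPTION: a background of 13,902 genes and 2,000 annotated targets
# are placeholders, not numbers reported in the paper.
p_example = hypergeom_tail(13902, 2000, 27, 11)
print(p_small, p_example)
```

Exact integer arithmetic with `math.comb` avoids the underflow that a naive floating-point product of factorials would hit at genome-scale background sizes.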

Concluding Remarks

In summary, DGET makes it possible to retrieve and compare Drosophila gene expression patterns generated by various groups using RNA-Seq. The tool can help scientists design experiments based on expression and analyze experimental results. The back-end database for DGET is designed to easily accommodate the addition of new high-quality RNA-Seq datasets as they become available. Finally, although human and Drosophila anatomy are quite different, by using DGET we demonstrate that the expression patterns of genes that are conserved and highly expressed are conserved between human and Drosophila in many matching tissues, underscoring the utility of the Drosophila model for understanding the role of human genes with unknown functions.

Methods

Data retrieval

Processed modENCODE data were retrieved from FlyBase (ftp://ftp.flybase.net/releases/current/precomputed_files/genes/gene_rpkm_report_fb_2015_05.tsv.gz). Data published by Marianes and Spradling (Marianes and Spradling 2013) were retrieved from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47780). Data published by Dutta et al. (Dutta, Dobson et al. 2015) were retrieved from the flygut-seq website (http://flygutseq.buchonlab.com/resources). Retrieved data were mapped to FlyBase identifiers from release 2015_05 and formatted for upload into the FlyRNAi database (Hu, Flockhart et al. 2011).

Expression pattern analysis

Human protein expression data were retrieved from proteinatlas.org and tissue-specific genes were selected using the file “ProteinAtlas_Normal_tissue_vs14.” Proteins with high or medium expression levels and a reliability value of “supportive” were selected. Proteins expressed in a broad range of tissues (i.e. more than 5 tissues) were filtered out. DIOPT version 5 was used to map genes from human to Drosophila (Hu, Flockhart et al. 2011). ‘Ortholog pair rank’ was added in the recent DIOPT release 5.2.1 (http://www.flyrnai.org/DRSC-ORH.html#versions). Drosophila genes with high or moderate rank were selected. The high/moderate rank mapping includes gene pairs that have the best score in either the forward or the reverse mapping (and a DIOPT score > 1), as well as gene pairs with a DIOPT score > 3 that do not have the best score in either direction.
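The high/moderate rank rule stated above can be written as a small predicate. This is a sketch of the rule as worded in this section, not code from DIOPT; the class, its field names and the example pairs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrthologPair:
    """One human-to-fly ortholog prediction (hypothetical field names)."""
    human_gene: str
    fly_gene: str
    diopt_score: int
    best_forward: bool  # best-scoring fly gene for this human gene
    best_reverse: bool  # best-scoring human gene for this fly gene


def is_high_or_moderate_rank(pair):
    """Apply the rank rule described in the text.

    A pair is kept if it is the best score in either direction and its
    DIOPT score is above 1, or if its DIOPT score is above 3 even though
    it is not the best score in either direction.
    """
    if pair.best_forward or pair.best_reverse:
        return pair.diopt_score > 1
    return pair.diopt_score > 3


# Hypothetical pairs for illustration only.
pairs = [
    OrthologPair("H1", "F1", 8, True, True),    # kept: best both ways
    OrthologPair("H2", "F2", 1, True, False),   # dropped: best, but score of 1
    OrthologPair("H3", "F3", 4, False, False),  # kept: score above 3
    OrthologPair("H4", "F4", 3, False, False),  # dropped: not best, score of 3
]
kept = [p.fly_gene for p in pairs if is_high_or_moderate_rank(p)]
print(kept)
```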

Implementation

DGET was implemented using PHP and JavaScript with a MySQL database for data storage. It is hosted on a server provided by the Research IT Group (RITG) at Harvard Medical School. The MySQL database is also hosted on a server provided by RITG. Plotting of heatmaps for SVG download is done in R using the gplot heatmap function. Website bar charts are drawn using the C3.js plotting package. The PHP Symfony framework scaffold is used to create DGET web pages and forms.

Declarations

Funding

Work at the DRSC is supported by NIGMS R01 GM067761, NIGMS R01 GM084947, and ORIP/NCRR R24 RR032668. S.E.M. is additionally supported in part by NCI Cancer Center Support Grant NIH 5 P30 CA06516 (E. Benz, PI). N.P. is an Investigator of the Howard Hughes Medical Institute.

Competing interest

The authors declare that they have no competing interests.

Authors’ contributions

YH designed and tested the application, implemented the back-end of the application, performed the analysis and drafted the manuscript. AC implemented the user interface and contributed to the back-end of the application. NP provided critical input on key features and the analysis and edited the manuscript. SEM provided oversight and critical input on key features and the analysis, and helped draft the manuscript. All authors read and approved the final manuscript.

Table 1. DGET similar gene search results for Twist with cell line data

FBgn          Gene         Correlation score   Twist target?*
FBgn0005636   nvy          0.910987            Yes, high confidence
FBgn0031738   CG9171       0.88094             Yes, high confidence
FBgn0015568   alpha-Est1   0.831094
FBgn0035733   CG8641       0.816603
FBgn0034997   CG3376       0.813417            Yes, low confidence
FBgn0040091   Ugt58Fa      0.799835
FBgn0039827   CG1544       0.773761
FBgn0010389   htl          0.772568            Yes, high confidence
FBgn0001250   if           0.769353            Yes, high confidence
FBgn0039799   CG15543      0.765649
FBgn0038755   Hs6st        0.76281             Yes, high confidence
FBgn0265577   CR44404      0.76095
FBgn0037439   CG10286      0.745739
FBgn0025682   scf          0.744414
FBgn0003301   rut          0.74375             Yes, low confidence
FBgn0036147   Plod         0.73896
FBgn0000723   FER          0.738927            Yes, low confidence
FBgn0034804   CG3831       0.735346
FBgn0051075   CG31075      0.731916
FBgn0263144   bin3         0.72961             Yes, high confidence
FBgn0000575   emc          0.728894            Yes, high confidence
FBgn0038353   CG5399       0.724139
FBgn0085407   Pvf3         0.720044            Yes, high confidence
FBgn0036857   CG9629       0.716929
FBgn0039073   CG4408       0.714359
FBgn0037632   Tcp-1eta     0.702547
FBgn0038804   CG10877      0.701509

* Twist targets as defined in (Sandmann, Girardot et al. 2007)

Figure 1. The DGET user interface.

1a. On the “Search Gene Expression” page, users can input a gene list by pasting Drosophila gene or protein symbols or IDs into the text box, or by uploading a file. The specific identifiers accepted are FlyBase Gene Identifier (FBgn), gene symbol, CG number, and full gene name. Users can choose to look at expression patterns or perform an enrichment analysis of the inputted list as compared with the underlying RNA-Seq data.

1b. At the “Search Similar Genes” page, users can enter a gene symbol (or other accepted identifier) to find genes with similar expression patterns.

Figure 2. Expression patterns of genes in major Drosophila regulatory gene groups.

Figure 3. Relationship between expression levels and gene conservation.

Drosophila genes that are conserved in the human genome at different confidence levels (i.e. different DIOPT scores) were analyzed by DGET. We found that across all tissues, expression levels correlate with confidence in the ortholog relationship. That is, in general, genes with higher DIOPT scores vs. human genes have higher expression levels. Genes with DIOPT scores of 7-10 (light purple bars) have similar expression patterns as compared with Drosophila essential genes (dark purple bars); i.e. in both cases, the genes are likely to be expressed in many or all tissues.

Figure 4. Comparison of gene expression patterns in humans and Drosophila.

High-confidence Drosophila orthologs of genes that are highly expressed in the small intestine, ovary, testis, cerebellum, cerebral cortex, or other tissues were analyzed using DGET. For at least some tissues, we see a correlation between genes highly expressed in specific human tissues (e.g. cerebellum, testis) and the expression of orthologs in cognate tissue sample(s) available for Drosophila (e.g. CNS or head, testis).

Figure 5. Comparison of Drosophila gut sub-region data with the human digestive system.

References

Boley, N., K. H. Wan, P. J. Bickel and S. E. Celniker (2014). "Navigating and mining modENCODE data." Methods 68(1): 38-47.

Cherbas, L., A. Willingham, D. Zhang, L. Yang, Y. Zou, B. D. Eads, J. W. Carlson, J. M. Landolin, P. Kapranov, J. Dumais, A. Samsonova, J. H. Choi, J. Roberts, C. A. Davis, H. Tang, M. J. van Baren, S. Ghosh, A. Dobin, K. Bell, W. Lin, L. Langton, M. O. Duff, A. E. Tenney, C. Zaleski, M. R. Brent, R. A. Hoskins, T. C. Kaufman, J. Andrews, B. R. Graveley, N. Perrimon, S. E. Celniker, T. R. Gingeras and P. Cherbas (2011). "The transcriptional diversity of 25 Drosophila cell lines." Genome Res 21(2): 301-314.

Clough, E. and T. Barrett (2016). "The Gene Expression Omnibus Database." Methods Mol Biol 1418: 93-110.

Dequeant, M. L., D. Fagegaltier, Y. Hu, K. Spirohn, A. Simcox, G. J. Hannon and N. Perrimon (2015). "Discovery of progenitor cell signatures by time-series synexpression analysis during Drosophila embryonic cell immortalization." Proc Natl Acad Sci U S A 112(42): 12974-12979.

dos Santos, G., A. J. Schroeder, J. L. Goodman, V. B. Strelets, M. A. Crosby, J. Thurmond, D. B. Emmert, W. M. Gelbart and the FlyBase Consortium (2015). "FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations." Nucleic Acids Res 43(Database issue): D690-697.

Dutta, D., A. J. Dobson, P. L. Houtz, C. Glasser, J. Revah, J. Korzelius, P. H. Patel, B. A. Edgar and N. Buchon (2015). "Regional Cell-Specific Transcriptome Mapping Reveals Regulatory Complexity in the Adult Drosophila Midgut." Cell Rep 12(2): 346-358.

Graveley, B. R., A. N. Brooks, J. W. Carlson, M. O. Duff, J. M. Landolin, L. Yang, C. G. Artieri, M. J. van Baren, N. Boley, B. W. Booth, J. B. Brown, L. Cherbas, C. A. Davis, A. Dobin, R. Li, W. Lin, J. H. Malone, N. R. Mattiuzzo, D. Miller, D. Sturgill, B. B. Tuch, C. Zaleski, D. Zhang, M. Blanchette, S. Dudoit, B. Eads, R. E. Green, A. Hammonds, L. Jiang, P. Kapranov, L. Langton, N. Perrimon, J. E. Sandler, K. H. Wan, A. Willingham, Y. Zhang, Y. Zou, J. Andrews, P. J. Bickel, S. E. Brenner, M. R. Brent, P. Cherbas, T. R. Gingeras, R. A. Hoskins, T. C. Kaufman, B. Oliver and S. E. Celniker (2011). "The developmental transcriptome of Drosophila melanogaster." Nature 471(7339): 473-479.

Hu, Y., A. Comjean, L. A. Perkins, N. Perrimon and S. E. Mohr (2015). "GLAD: an Online Database of Gene List Annotation for Drosophila." J Genomics 3: 75-81.

Hu, Y., I. Flockhart, A. Vinayagam, C. Bergwitz, B. Berger, N. Perrimon and S. E. Mohr (2011). "An integrative approach to ortholog prediction for disease-focused and other functional studies." BMC Bioinformatics 12: 357.

Marianes, A. and A. C. Spradling (2013). "Physiological and stem cell compartmentalization within the Drosophila midgut." Elife 2: e00886.

Michel, A. M. and P. V. Baranov (2013). "Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale." Wiley Interdiscip Rev RNA 4(5): 473-490.

modENCODE Consortium, S. Roy, J. Ernst, P. V. Kharchenko, P. Kheradpour, N. Negre, M. L. Eaton, J. M. Landolin, C. A. Bristow, L. Ma, M. F. Lin, S. Washietl, B. I. Arshinoff, F. Ay, P. E. Meyer, N. Robine, N. L. Washington, L. Di Stefano, E. Berezikov, C. D. Brown, R. Candeias, J. W. Carlson, A. Carr, I. Jungreis, D. Marbach, R. Sealfon, M. Y. Tolstorukov, S. Will, A. A. Alekseyenko, C. Artieri, B. W. Booth, A. N. Brooks, Q. Dai, C. A. Davis, M. O. Duff, X. Feng, A. A. Gorchakov, T. Gu, J. G. Henikoff, P. Kapranov, R. Li, H. K. MacAlpine, J. Malone, A. Minoda, J. Nordman, K. Okamura, M. Perry, S. K. Powell, N. C. Riddle, A. Sakai, A. Samsonova, J. E. Sandler, Y. B. Schwartz, N. Sher, R. Spokony, D. Sturgill, M. van Baren, K. H. Wan, L. Yang, C. Yu, E. Feingold, P. Good, M. Guyer, R. Lowdon, K. Ahmad, J. Andrews, B. Berger, S. E. Brenner, M. R. Brent, L. Cherbas, S. C. Elgin, T. R. Gingeras, R. Grossman, R. A. Hoskins, T. C. Kaufman, W. Kent, M. I. Kuroda, T. Orr-Weaver, N. Perrimon, V. Pirrotta, J. W. Posakony, B. Ren, S. Russell, P. Cherbas, B. R. Graveley, S. Lewis, G. Micklem, B. Oliver, P. J. Park, S. E. Celniker, S. Henikoff, G. H. Karpen, E. C. Lai, D. M. MacAlpine, L. D. Stein, K. P. White and M. Kellis (2010). "Identification of functional elements and regulatory circuits by Drosophila modENCODE." Science 330(6012): 1787-1797.

Perrimon, N., N. M. Bonini and P. Dhillon (2016). "Fruit flies on the front line: the translational impact of Drosophila." Dis Model Mech 9(3): 229-231.

Sandmann, T., C. Girardot, M. Brehme, W. Tongprasit, V. Stolc and E. E. Furlong (2007). "A core transcriptional network for early mesoderm development in Drosophila melanogaster." Genes Dev 21(4): 436-449.

Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty, N. Mozden, S. Misra and G. M. Rubin (1999). "The Berkeley Drosophila Genome Project gene disruption project: Single P-element insertions mutating 25% of vital Drosophila genes." Genetics 153(1): 135-177.

Uhlen, M., L. Fagerberg, B. M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, A. Sivertsson, C. Kampf, E. Sjostedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P. H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen and F. Ponten (2015). "Proteomics. Tissue-based map of the human proteome." Science 347(6220): 1260419.

Wang, Z., M. Gerstein and M. Snyder (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.
