nasa big data task force february 16 minutes · 2020-04-28 · nasa advisory council ad hoc big...
NASA Advisory Council Ad Hoc Big Data Task Force, February 16, 2016
Ad Hoc Big Data Task Force of the
NASA Advisory Council Science Committee
Meeting Minutes
Inaugural Meeting February 16, 2016
NASA Headquarters Glennan Conference Room, 1Q39
_____________________________________________________________ Charles P. Holmes, Chair
____________________________________________________________ Erin C. Smith, Executive Secretary
Report prepared by Joan M. Zimmermann, Ingenicomm, Inc.
Table of Contents

Introduction
Charter/Science Committee and Subcommittee Feedback
Legacy from NAC ITIC
Discussion
HPD Big Data
Science Committee Greetings
Big Data and Earth Science
Supercomputing and Big Data
APD and Big Data
Public comment
Other Federal Big Data Initiatives
Planetary Science Big Data
Discussion/wrap-up

Appendix A - Attendees
Appendix B - Membership roster
Appendix C - Presentations
Appendix D - Agenda
Introduction

Dr. Erin Smith, Executive Secretary of the NASA Advisory Council (NAC) Ad Hoc Big Data Task Force (BDTF), called the membership to order and made some administrative announcements. Dr. Charles Holmes, Chair of the BDTF, opened the inaugural meeting of the BDTF. Introductions were made around the table.

Charter/Subcommittee Feedback

Dr. Smith presented an overview of the Task Force, which was created in response to a number of White House directives on the Big Data concept. These directives related to the purviews of NASA’s Heliophysics and Earth Science divisions (HPD and ESD), which engage in the study of solar activity and solar storms, and weather forecasting. The administration also expressed a great deal of interest in the interoperability of datasets, and related uses of Big Data. Successful applications of science in these areas will require the breakdown of subdiscipline stovepipes, and the interoperability of NASA datasets with those of the National Oceanic and Atmospheric Administration (NOAA) and the US Geological Survey (USGS), making data available to numerous end users such as emergency response and disaster relief agencies. Big Data may also enable the identification of actionable science information, making data useful for unforeseen applications. Big Data also means different things to different users, with implications for specific data-handling tools, data formats, and the creation of data standards. Applications vary for the Astrophysics (supernova models), Planetary (identifying exoplanets, galaxy formation), and Heliophysics divisions (one target/many missions, coronal mass ejections, radiation environment for human exploration). NASA’s Earth Science Division has been managing and exploiting Big Data for many years in creating climate models, and for societal applications such as drought forecasting and disaster response. Many NASA spaceborne measurements are currently being used to improve air quality decision support systems in Texas, and in producing accurate cloud formation models. HPD data and engineering data are being fed into an Integrated Radiation Protection System, to help determine how to get to acceptable risk figures for radiation exposure in human exploration.

The terms of reference (TOR) for the BDTF form a broad charter, which can be described as examining what the community as a whole is doing in Big Data, as well as what other agencies are doing, and identifying what can be done better. The intent is to catalogue best practices in NASA and other federal agencies, as well as in private industry, research institutions, and academia. One of the final products may be a white paper reporting out findings and recommendations. A major challenge for the Task Force will be to define what the term ‘big data’ means to the various communities; to an astronomer it is an archive issue. To HPD and ESD, it is a matter of interoperability and engineering. Other challenges will be to determine the most useful and efficient architectures, storage modes, data accessibility, data rates, data security, and intellectual property requirements. How do we communicate what datasets are saying, and how do we train people in the use of datasets? It is a dynamic area. To date, the BDTF has
completed its ethics training and is in the process of signing on its last two members to round out the committee. The NAC Science Committee has provided feedback to the BDTF, namely to acquire more representation from commercial entities and other non-NASA sciences, as well as to consider ground-based sciences that may have produced scientific data. Further feedback was to look at data visualization, data permanence, and data usage. The Science Committee has asked that the BDTF act as a go-between for the community, and find links and leverage points with existing efforts on big data. The Science Committee also recommended that the BDTF invite people from the NASA archives, NASA Ames Research Center, simulation experts, modelers, and industry partners. Within disciplines, practitioners should be able to understand themselves within their subfields, and to allow for cross-pollination between subfields. The BDTF has also been asked to find the best way to gather feedback so that the Science Committee and its subcommittees can benefit from this effort (e.g., surveys of industry members, town halls). The NAC Science subcommittees would like the BDTF to address data usability, management and access, utilization (including real-time), analysis and data mining of large data sets, algorithm and statistics development, data curation, archiving tools and technology, visualization (such as hyperwall), and using state-of-the-art information technology (IT) systems and tools. Other questions to address: What opportunities are there in big data? Which subject matter experts (SMEs) should be consulted? What kind of products are desirable?

Dr. Holmes noted that given the extensive shopping list, he wished to devise a work plan to use the limited time available, in order to distill the Task Force output into something valuable. As to the term “interoperability,” he challenged Dr. Smith to fine-tune this definition, as it is a wide-open topic. He believed that innovation comes from the bottom up, and worried that “interoperable” raises some red flags for the creation of top-down management. Dr. Clayton Tino worried about “needs for future use,” which would require a fundamental understanding of data formats; it is nearly a non-solvable problem to make data understandable to all communities. Dr. James Kinter commented that interoperability tends to become a catchall phrase for simulation and modeling, best practices, and interoperability between discipline scientists (including metadata and documentation). Dr. Reta Beebe noted that “data mining” connotes something magical and is a major question. Externally, people think that data mining is magically done. Datasets are so different, particularly in Planetary Science, that data mining becomes a major problem. Dr. Holmes reiterated his belief in the bottom-up approach, and in allowing successes from this approach to replicate through other scientific areas.

Legacy from NAC IT Infrastructure Committee

Dr. Holmes gave an overview of the BDTF’s history, having served as vice chair of the NAC Information Technology Infrastructure Committee (ITIC), which stood from 2010-2013. Its main affiliation was with the NASA Chief Information Officer (CIO), but it had ties across NASA as well, in areas such as cybersecurity. The NAC recommended that both the ITIC and the Science Committee explore an approach to improve access to
NASA science data repositories, with that exploration to include best practices, etc., that have been translated to the present TOR for the BDTF. In Fall 2013, the NAC advisory committee structure was revamped, cybersecurity was put under the aegis of a new committee, and the work of the former ITIC now continues with the current Big Data Task Force, reporting to the Science Committee. One of the first recommendations of the former ITIC was that NASA should take advantage of assets in the Federal government, such as GPU clusters, cloud computing under the National Science Foundation (NSF), and other sponsorship. ITIC also recommended that NASA improve the cyberinfrastructure that supports Agency science. One of the findings of the ITIC notes that NASA science data does not sit in one place but is distributed across NASA centers, at USGS, industry, and universities. NASA data centers are discipline-focused, and are managed in this way. The number of science publications coming out of these centers is growing dramatically. Education and Public Outreach continues to tap into these data stores, sometimes directly, and sometimes through a group that processes the data for the general public. The Department of Energy (DOE) has set up a backbone throughout the country with many nodes not far from the NASA centers; it would be good to leverage this pipeline, as well as a 10-Gbps research network that links research innovation laboratories. Use of NASA supercomputers at both Goddard Space Flight Center (GSFC) and Ames Research Center (ARC) is growing. The Earth Observing System Data and Information System (EOS-DIS) is also growing in its data product distribution. Web services to support disaster applications, such as the Short-term Prediction Research and Transition (SPoRT) Center at Marshall, are transitioning research data to the operational weather community. The Solar Dynamics Observatory (SDO) is revolutionizing the way we understand the sun, and is collecting roughly a petabyte of data per year, with 5 petabytes per year worth of processing. There has been a two-order-of-magnitude jump in what solar physics had been ingesting previously from older missions such as Hinode. NASA’s Multimission Archive at Space Telescope (MAST) is showing almost exponential growth, and will grow even more when future telescope missions come on-line. There are 200-plus apps in the Apple iStore that will return from a search on NASA; many of these apps are in high demand from the public, and pull processed results out of NASA’s data stores. More than 250,000 people have taken part in NASA’s Galaxy Zoo program. In 2012, the Office of Science and Technology Policy (OSTP) sent out a memo to the public announcing a Big Data Initiative, earmarking $200M to be spent on improving access to the government’s big data stores. In 2013, there were more memos and Executive Orders coming out on this issue, but NASA was missing from the list of recipients (DOE, Department of Defense, and others); so it must be asked: where did NASA miss the boat? Dr. Holmes noted an ITIC finding in November 2012, that NASA acquire fiber-optic pathways to support current and future data, and a recommendation that they buy rather than own these pathways.

Discussion

The committee discussed a draft work plan to determine how the BDTF would move forward. Dr. Holmes felt that the BDTF shouldn’t address the areas of data searchability
and availability, proprietary periods, long-term archiving, and other frequent requests that are made of NASA’s data stores, feeling that processes are already in place for this at NASA. The BDTF should break new ground instead, and should survey the community, choose 3 to 4 topics, and produce products. The BDTF should form a concise problem statement, research, organize and develop positions, form a consensus, and draft and present results in a white paper (4-6 pp) accompanied by a slide presentation. Because the BDTF expires in December 2017, there are only 4-5 more face-to-face meetings, in advance of each of the future Science Committee meetings, in which to develop findings and recommendations to take to the Science Committee. To this end, the Task Force should also hold teleconferences as appropriate. Dr. Holmes reviewed his duties as Chair as primarily being the representative to the Science Committee, and closed with the thought: “Do good, work hard, NASA needs us.”

Dr. Ray Walker agreed that data availability/searchability did not require a hard look, but noted that as data volumes get larger, it will be necessary to figure out the pieces we want to use; in this sense the issue is still important to consider. Dr. Holmes invited Dr. Walker to write up an actionable recommendation on the issue and send it to Dr. Smith. Dr. Tino commented that there are model-level, internal, and external use domains; what is it that we are actually trying to do? He agreed to write up an item on this question. Dr. Kinter said that it seems that by definition, Big Data means the biggest and baddest data sets; in that respect, we typically see accessibility as a way to aggregate and analyze data from an entire dataset (petabytes); very few users will have the resources to operate data sets of such magnitude. The Task Force should also think about facilitating the analysis of data sets that are too big to move and too big to analyze in-situ. Dr. Holmes agreed to revise the work plan with the additions of the written contributions, and to look at areas that can be extended beyond the state of work; the BDTF needs to look at benchmarks regarding this issue.

HPD Big Data

Dr. Jeffrey Hayes presented areas of concern for the Heliophysics Division (HPD) in terms of Big Data needs. HPD studies the sun’s variance, the response of geospace, and the Sun-Earth system’s impacts on humanity. To do this, HPD engages in the science of space weather, tries to understand the interconnections between the Sun and Earth, and develops knowledge to improve the prediction of extreme events such as major coronal mass ejections (CMEs). The mission portfolio includes a research and analysis (R&A) line, an Explorers mission line, along with Living with a Star, Solar Terrestrial Probes, and the sounding rockets program. Mission investment is guided by the Decadal Surveys and NASA’s advisory bodies. The HP System Observatory includes numerous satellites such as IRIS, Wind, STEREO, the Van Allen probes, and the Interstellar Boundary Explorer (IBEX). Within the current missions and the operations budgets, there is a certain amount of funding for data archiving, and the creation of standards and accessibility. Dr. Hayes felt that most missions were able to respond quickly to decisions on data archiving and curation. Senior Reviews address the scientific merits of HPD missions every two years, and take into account the accessibility, usability and utility of data (including archiving after the mission is complete). As a result, the data pipeline is doing very well.
About 70-80% of HPD data come from extended mission phases. The sun varies in a roughly 22-year cycle; all of these HPD missions operating simultaneously are beginning to enable the understanding of a very complex system. The average cost of a Heliophysics satellite operation is $2.9M annually. The Solar Data Analysis Center (SDAC) and Space Physics Data Facility (SPDF) are the active archives for HPD and run at about $3.3M per year. There is also a ROSES element amounting to about $1M a year. Thus, the total to curate the data is about $4.5M per year, plus some money in the mission lines themselves. Dr. Hayes noted that “Scientists want all the data all the time, forever.”

In the early 2000s, the Decadal Survey came out with a priority for a Virtual Observatory, in which the idea was to collect all the data (both Astrophysics and Heliophysics) and make it universally accessible through common standards. At the time, Astrophysics had one standard, and Heliophysics had multiple standards. Over the last 20 years, NASA has been trying to get these standards in line, and Dr. Hayes felt that good progress was occurring in this area. Heliophysics has an explicit policy that established standards, which are FITS, CDF, and NetCDF. NASA is in a much better place than it was 10 years ago in terms of standardization. HPD has also restored a large fraction of data from its older missions, and has been systematically examining old archives and restoring data archives and data sets of scientific interest. For any metadata, it is necessary to get everyone to agree on keywords. HPD has gotten good buy-in, and users can now use the Space Physics Archive Search and Extract (SPASE) metadata wrappers to do an inventory, search by date or event, etc., to help do system science. The process has gotten a lot better, and appears to be going faster. HPD’s three most recent missions are successfully using the SPASE metadata wrappers. The first data from Magnetospheric Multiscale (MMS), for example, will be available on SPDF on March 1.

HPD is starting to get terabytes of data, which is a new experience. There are 800 TB from SDO to date, and the volume is growing. HPD is now looking at storing 1 PB in the SDAC; this data volume will probably triple or quadruple as future missions come online. Stanford University will not always support SDAC; at some point the data will have to be brought back to NASA. Dr. Hayes felt that putting data on the cloud was still an iffy prospect, and cited a recent accidental deletion of stored data as one of its potential drawbacks. Solar project data, in terms of both lifetime data volume and data rate, will continue to grow. The question is where and by whom the data will be stored, and how it will be moved around. HPD can’t throw data away because Heliophysics science needs the context. Data policy is working well. HPD has a registry and inventory of the data, and is constantly updating it. Legacy datasets have pretty much completed their extractions. Now HPD is concentrating on standards. A future challenge is how to use the SPASE metadata, how to use the data, and how to make it accessible to the non-expert user. Remote sensing and in-situ measurements are very different, and these differences must be taken into account. For modeling, how do we archive useful, powerful comparisons? At this point, models do not have a standard; we are working toward it. As we move away from
the Virtual Observatory concept to a more consolidated way of getting data out, we must focus on metadata and links to generic access methods, and avoid stovepiping. The interdisciplinary aspects of data will be addressed by a larger group. Dr. Hayes noted that the Virtual Observatory concept did not fail, but the technology has since moved on.

Dr. Holmes asked Dr. Hayes to identify HPD needs from the BDTF standpoint. Dr. Hayes replied that one useful finding would be one acknowledging the value of standards. The other issue of concern for him was the unfunded mandate about keeping versions of data in perpetuity. There is a NASA policy in response to the OSTP about public accessibility and publications; however, the worrisome issue is whether the reference data in a paper has a certain pedigree that may or may not be preserved in the archive. Who owns the final data? Which version of the software? There is never enough disk space. Another useful finding would be a statement that having data active, on-line, is a good thing. Data, especially taxpayer-funded data, shouldn’t be buried in someone’s desk drawer. NASA tends to get pushback from principal investigators on this issue, as they feel their data is proprietary. Dr. Hayes agreed to write up an item for Dr. Smith.

Dr. Kinter commented that there is no data standard for models, and that this is a challenge for the future; he wondered how much interaction there is between the Heliophysics community and the tropospheric and weather communities. Dr. Hayes felt there was not much interaction, certainly not at the tropospheric level. There are meetings ongoing, however, and HPD would be open to anything the other communities have that can be used. The variables may be different, but it is something that could be explored. Dr. Walker mentioned that the National Science Foundation (NSF) is looking into data assimilation. Dr. Holmes noted that the community had looked at compatibility between Earth Science and Heliophysics data ten years ago, and stopped because of data sparseness. Dr. Neal Hurlburt agreed that the effort was still at the case-study level. IRIS is a good example of where we were forced to use models. Dr. Kinter noted that there are also ocean data assimilations that have a similar problem with data sparseness. The tropospheric problem has moved well during the last decade, and can accommodate data sparseness a little better. GSFC has some expertise here. Dr. Holmes asked Dr. Kinter to provide points of contact (POCs) at Goddard. Dr. Walker mentioned that the Planetary Data System (PDS) has begun a study of archiving models, as well as the Community Coordinated Modeling Center (CCMC), and European work in both Heliophysics and Planetary at the University of Paris; these can provide useful Lessons Learned.

Science Committee Greetings

Science Committee Chair, Dr. Bradley Peterson, addressed the committee, thanking members for their important contributions. He noted that time was a pressing issue, and urged the BDTF to focus on finding commonalities and best practices across the subdisciplines, and building on the existing infrastructure only if it is useful. He asked the membership to regard the NASA budget as a zero-sum game, as NASA will buy into recommendations only if they are affordable, or if they are worth giving something up for. Eating into the budget for missions and research would be an undesirable outcome. Dr. Peterson suggested that the BDTF consult with subcommittee
chairs when useful, in order to iterate ideas across the Science Committee, subcommittees, and BDTF.

Big Data and Earth Science

Dr. Kevin Murphy presented an overview of the Earth Science Data Systems program, and stated that regardless of varying definitions of big data, Earth Science has it, as well as a large user base. Objective 2.2 of the 2014 NASA Strategic Plan informs the usage of Earth Science data to form a view of Earth that can be used across disciplines: ocean, atmosphere, cryosphere, etc. and their interactions. The Earth Observing System Data and Information System (EOSDIS) is the largest component of the Earth Science data system, and is associated with the competitively selected programs, Making Earth System data records for Use in Research Environments (MEaSUREs) and Advancing Collaborative Connections for Earth System Science (ACCESS). EOSDIS works internationally and among the federal agencies to get data to the public, and processes data from level 0 to higher-level products to make them available to users. EOSDIS was initiated in 1990, incorporating heritage data sets in 1994 from satellites, aircraft and in-situ sensors (e.g. flux towers), and was designed to handle a terabyte of data per day. EOSDIS reprocesses data quite often as instruments deteriorate or as better signal processing methods become available. There are about 15 petabytes (PB) of data currently available, all of which interoperate with other agencies and archives through established standards. EOSDIS has a distributed framework, and has had an open data policy since 1997. The system generates biophysical products, geolocates them, and distributes them to the end users. EOSDIS has an extensive volume of data represented in over 9200 data types, which range over human dimensions, land, atmosphere, ocean dynamics and the cryosphere. The system works closely with missions in formulation and development in order to prepare data plans.

EOSDIS is spread out over the US. Mission data are processed by Science Investigator-led Processing Systems (SIPS), and are then passed along to the Distributed Active Archive Centers (DAACs) to support the user base. DAACs are located at host organizations that are widely recognized by the community, and each DAAC has a working group that helps to direct how the DAACs work. There is also a Program Scientist within each DAAC that roughly aligns with each subdiscipline. The two components overseeing the DAACs are primarily Headquarters for management and the Goddard Space Flight Center (GSFC) for implementation. The Earth Science Data and Information System (ESDIS) manages the coordination of EOSDIS activities to avoid duplication of efforts. ESDIS continually takes input through weekly teleconferences and annual meetings with DAAC managers and DAAC systems engineers. Roughly 160-180 people go to the annual meetings. The EOSDIS infrastructure also ties together users and DAACs through earthdata.nasa.gov, a common metadata repository (CMR), Global Imagery Browse Services (GIBS), EOSDIS Metrics System (EMS), and various user support tools. EOSDIS performs an annual customer satisfaction survey, and also has DAAC User Working
Groups, which receive regular feedback. EOSDIS metrics from 2015 show 9462 unique data products, and 2.6M distinct users of EOSDIS data and services. EOSDIS distributes about twice as much data as it ingests. In 2015, the system received an ACSI score of 77 (considered very good). The trend for product delivery is increasing. EOSDIS converts high-value products into imagery, such as the NASA Worldview website, which uses data from the Aqua/Terra/Moderate Resolution Imaging Spectroradiometer (MODIS) satellites, and NOAA’s Visible Infrared Imaging Radiometer Suite (VIIRS). Worldview works much like Google Earth; users can zoom in and go back in time. Users can also overlay data, such as the SO2 cloud over an erupting volcano, and find specific data such as fire hot spots. EOSDIS holds Senior Reviews to evaluate the performance and scientific merit of the various subsystems.

Dr. Walker noted the many highly derived data products, and asked how EOSDIS kept up with evolving algorithms. Dr. Murphy explained that standard products are produced in collections, and EOSDIS is currently going from MODIS collection 5 to collection 6, reprocessing data. Collection 5 will be maintained until collection 6 is complete. Science teams will determine when the new collection is done. Dr. Holmes asked what the BDTF could do for Earth Science. Dr. Murphy felt that NASA received little recognition for this important work, as it is generally not well understood. The data product ramp is currently limited by adapting to input from new instruments. EOSDIS has to put algorithms closer to the data in a way that allows unimpeded access to products; how to do this is still an open question. NASA may also need to learn how to work with commercial high-performance computing groups. Dr. Hurlburt asked how many of the 2.9M distinct users were part of the active (science) community. Dr. Murphy replied that people who use a lot of the data will frequently use all of it (operational users who use Level 1 data). The numbers of graduate students, etc., are hard to estimate. Dr. Kinter asked how EOSDIS dealt with the budget realities. Dr. Murphy noted that EOSDIS recognizes the need to develop or adopt standardized-enough components to allow people to develop their own tools, a strategy that saves both time and effort. NASA doesn’t want to be the first adopter or the last. The strategy depends on the community. EOSDIS keeps the principle of open application programming interfaces (APIs), and open access. The community is well aware of the data policy. Dr. Walker asked about the extent to which NASA provides interoperability in its joint work with NOAA. Dr. Murphy explained that NASA operates with NOAA on a catalogue level, uses open software sourcing, shares observations, and works closely with NOAA on the Climate Initiative and in the airborne program.

Supercomputing Big Data

Dr. Tsengdar Lee, Program Manager of the Earth Science Division Supercomputing Program, presented an overview of the program, and the NASA vision for future computing services. NASA has two supercomputing centers, one at Ames Research Center (ARC), which serves the entire agency, and one at GSFC, which serves primarily Earth Science. ARC supports agency-wide activities, from launch vehicles to general relativity.
In August 2015, the NASA Flagship computer, Pleiades, reached a half billion SBUs (computing cycles) delivered cumulatively since 2008, translating to nearly $300M of services, at a cost of roughly 26 cents per SBU in 2015. NASA continues to grow the system, relying on Moore’s law to go forward (Dr. Lee noting that some argue that the law has come to its end). Scientific and engineering efforts will grow, thus NASA will have to come up with a user policy because the system has become oversubscribed. The ROSES selection process is now being tightly coupled to the availability of computing time. For Earth Science imaging and modeling, the system can push the resolution down to 1.5 km currently; the holy grail of atmospheric science is 0.5 km. The workload is changing, shifting into data processing. As an example, the Kepler mission is using Pleiades to support validation for new exoplanets. This has become the primary avenue for producing discoveries in that area. Data assimilation systems are being used to create physically consistent long-term data sets, from 1979 to the present, and are also downscaling to higher resolution data for climate studies. The Orbiting Carbon Observatory (OCO-2) is presenting data processing challenges. NASA is doing a data re-processing campaign with new algorithms, with about 60% of this work being done on the supercomputer and 40% on the Amazon cloud. High End Capability Computing (HECC) is being used to clear 5 years of an unmanned aerial vehicle synthetic aperture radar (UAVSAR) data processing backlog, to reduce latency.

Processing is moving into the big data area, pitting high-performance computing against Large Scale Internet. Can high-performance computing (HPC) be used as a private cloud? How do we put together an architecture to process, analyze and mine data? Currently, data storage and data management is the core of the business, with data in the middle, and all the service and processing surrounding the data set. A Science Cloud architecture ideally provides an agile, high level of support, with the system owning the data, using a data management system, data analytics service, OpenStack, etc. NASA is constantly looking at new technologies: cloud and virtualization, high-performance object store, and SciDB (the latter heavily supported by DARPA). The science cloud has helped to validate many types of measurements, such as global fires. Coupling HPC and cloud computing can create a best-of-breed computing service environment. HECC’s path to growth is constrained at present; NASA has maxed out the infrastructure in terms of facilities, building, water, and electricity, and is engaged in a study on how to build next-generation data centers.

Drs. Holmes, Walker, and Hurlburt expressed concerns about user constraints, given that 70-80% of the program’s workload requires a tightly coupled process. Dr. Lee agreed to write a statement on this state of being for use by the BDTF. He added that certain types of workloads could be cloud-computed, and NASA is exploring those options as well. Dr. Clayton Tino asked if Dr. Lee had any sense of the capacity the program was losing due to mixed mode services. Dr. Lee replied that NASA was doing the mixed workload because of the demand. Some of the projects didn’t plan for their HPC use, and need to do a better job of such planning in the future.

Astrophysics and Big Data

Dr. Paul Hertz, Director of the Astrophysics Division (APD), presented Big Data needs as viewed by the Astrophysics community. Astrophysics addresses the evolution of the
universe, the origin of galaxies and stars, and the question of whether we are alone in the universe. The APD is driven by the Decadal Surveys, science roadmaps, and implementation plans to support its ability to handle large data questions. Sixty percent of the budget supports developing space missions, 20% goes to operations, and another 5-10% is dedicated to research and development. Data archives are funded as an infrastructure investment. APD’s current suite of missions runs from many small missions such as the Neutron star Interior Composition Explorer (NICER), to the large space telescopes, Hubble and the future James Webb Space Telescope (JWST). The next large flagship after JWST is the Wide-Field Infrared Survey Telescope (WFIRST), whose prime science is to understand dark energy and dark matter, which can only be done by measuring the small impact these forces have had in the history of the universe, by looking at large swaths of the universe; i.e. looking at large amounts of data to see small perturbations. Thus WFIRST will be computationally intensive. WFIRST will be looking at millions of galaxies, searching for evidence of microlensing, which is also computationally intensive. Euclid, a European mission with similarities to WFIRST, will also create large data sets. Another future project is the ground-based Large Synoptic Survey Telescope (LSST). All three of these projects will be combining their data in pixel-by-pixel analysis. The various agencies are studying the best way of carrying out this data processing, a decade in advance of the need. A white paper on this topic, Jain et al., “The Whole is Greater Than the Sum of the Parts,” can be found at arxiv.org/abs/1501.07897.

All NASA Astrophysics science data are open to the community, and all data centers go through the Senior Review process every two years. All astrophysics archives share a set of common protocols and standards, allowing the user community to combine data from multiple ground and space observatories. The NASA Astrophysics Virtual Observatory (NAVO) manages the protocols, while NSF funds the tools. The three Astrophysics archives manage the NAVO backbone. APD recently held a Senior Review of the archives, and recommended that they become more proactive and aggressive about evolving into the future (increasing bandwidth, keeping up with technological advances, preparing for large volumes of data). Some types of computing might be more expensive in the cloud, and it must be determined which are which. NASA and NSF are currently funding theoretical and computational Astrophysics networks (TCAN). Dr. Hertz was not aware of any issues thus far on getting time on NSF supercomputers. (Dr. Lee noted that NASA civil servants can’t typically get on NSF supercomputers, but university Principal Investigators can.) Another computationally intensive area is laboratory Astrophysics: interpreting x-rays from Chandra, far infrared data from Herschel, and visible-to-ultraviolet Hubble spectral lines. These atomic line calculations are needed for creating line catalogues.

Dr. Tino asked if underestimation of computing time was a theme in APD. Dr. Hertz explained that processing Kepler data has been more computationally intensive than was appreciated at the beginning of the mission, but that a new mission, the Transiting Exoplanet Survey Satellite (TESS), which has a similar data product to Kepler, had planned according to Lessons Learned on the need for anticipating computing time. Dr. Lee noted that NASA is also making tighter connections between HPC and the budget-planning process. In terms of
recommendations, Dr. Hertz noted that Astrophysics was a minority user of HPC, and was interested in areas where it could leverage existing assets, or in commercial or other research that can improve Astrophysics science. APD has partnered with DOE in the past, when DOE has been interested in the science problem. DOE is not interested in exoplanets, but it is interested in dark energy and dark matter; therefore APD will be working with DOE on joint WFIRST-Euclid-LSST analysis.

Public comment period

No comments were noted from the online audience. At NASA Headquarters, Tripp Corbett made some comments from the vendor perspective, saying that he was noting a bit of disconnect, as tools are available at NSSC that should be more widely circulated. At a recent NASA meeting, he had heard a briefing on working with the cloud-computing community in a budget-conscious way, and agreed to send more specific information to the BDTF.

Other Federal Big Data Initiatives (NSF)

The NSF Big Data Hubs Program director, Dr. Fen Zhao, briefed the BDTF by phone on her program, which is funded at about $20M a year. There are related programs at NSF that look at Big Data infrastructure, pilot and implementation efforts, and Education-related activities such as the Big Data Work Force ($30M a year looking at traineeships). The Big Data Hubs program looks at the complex relationships between data projects, end users, and commercial entities, and involves cross-disciplinary efforts and data sharing across the research ecosystem. The inspiration for BD Hubs came from OSTP’s 2012 Big Data Initiative, in which a Big Data Partnerships Workshop initiative resulted in 29 new partnerships, with 90 organizations participating, representing areas such as energy, healthcare, and finance. The initiative chose various issues such as climate change and personalized healthcare, and NSF initiated the BD Hubs effort to allow these partnerships to gel. BD Hubs was launched in March 2015, with four hubs in four regions of the US, and made awards in September 2015 (Columbia University in the Northeast, Georgia Tech and x in the South, UIUC in the Midwest, and University of SD, UC Berkeley, and the University of Washington in the West). Hubs are differently constructed consortia; the current phase is allowing hubs to start up their activities. The projects are called BD Spokes, which represent specific activity within each topical area, such as a platform for sharing neuroscience data. The spokes are funded at $1M over three years, and are meant to leverage existing efforts. The Hubs are currently organizing drafts for each spoke, and full proposals are due this month. A large number of ideas came in on smart cities and the Internet of Things; the food/energy/water nexus; and human healthcare. NSF intends to fund these proposals this fiscal year, and there are latent projects waiting in the wings that can help transition some of these ideas to practice. NSF hopes to do this again next year.

Dr. Holmes offered kudos to NSF for setting up this open-ended effort. Dr. Zhao noted that there is an end goal of sorts, as each Hub is responsible for generating 29 projects at the end of three years. This idea is not completely novel at NSF. The Foundation hopes to fund each spoke for a second three years, to have them become self-sustaining. A similar effort was undertaken under US-Ignite, to support networking. The
idea is to look for the unknowns, as interesting things can happen in these large, multiple collaborations. Everyone brings their own physical infrastructure, and also tries to identify service providers. Dr. Holmes noted that most of the Hubs were geographically close to NASA PIs. Drs. Holmes and Zhao agreed that a closer collaboration would be ideal.

Planetary Science Big Data

Dr. Michael New, Program Scientist for the Planetary Data System (PDS), presented the needs of Big Data from the planetary perspective. Most planetary data work is based at GSFC. Planetary Science Division (PSD) data policies state that all science data returned from planetary missions belongs to the public domain. Any exclusive data access cannot exceed six months. In funded science research, any data necessary to replicate published research results, that are also the product of a NASA award, must be made immediately available to the public.

The planetary data environment includes PDS, the Planetary Cartography Program (PCP; USGS), Minor Planets Center (MPC; Harvard) and the Astromaterials Curation Facility (ACF; Johnson Space Center). Data ranges from ground-based assets, individual investigators, mapping, data analysis (e.g., trajectories), sample returns, ANSMET (Antarctic meteorites), to atmospheric dust. The output of the PDS is primarily to taxpayers, educators and talented amateurs. At the ACF, NASA stores space-exposed hardware, lunar samples, cosmic dust samples, and Hayabusa (asteroid) samples. NASA is currently re-engineering its sample catalogue to make these samples available online. The MPC is responsible for small bodies, and the orbits of minor planets and comets. The PCP maintains the cartographic capability for mapping the planets and the Moon, and develops and maintains the Integrated System for Imagers and Spectrometers (ISIS), which enables things like spectrographic maps of Io. ISIS is preparing to incorporate an open-source visualization tool, the SPICE-based Cosmographia. (“SPICE” is a NASA information system whose use extends from mission concept through post-mission data analysis; it helps to correlate individual instrument datasets with those from other instruments on the same or on other spacecraft.)

PDS is a federated archive, with data distributed across the country; its discipline nodes were recently re-competed. Management of the system as a whole is also based on a federated model. Planetary data are managed by planetary SMEs. Data is physically stored at the nodes, and the deep archive is maintained at the NASA Space Science Data Coordinated Archive (NSSDCA). The Navigation and Ancillary Information Facility (NAIF) implements standards and tools that are needed to understand the motion of celestial objects. In planetary datasets, everything is moving relative to everything else: spacecraft, instrument, Earth, and Sun, all of which need time conversion standards. The collection of these variables is called Observation Geometry (OG). The current PDS is distributed across six nodes, which after a recent competition are now in their first year of a 5-year Cooperative Agreement. The PIs at each node collectively form a management council, and provide input about standards and decision-making. PDS-4 has just recently been rolled out. It is an XML-based, model-driven, service-oriented model, and a modern technical foundation for planetary science data. Existing PDS-3
products will be converted to PDS4 when practical and sensible. The planetary data systems of the European Space Agency and JAXA are both adopting PDS-4 standards. The total volume of PDS is about 1 PB. Almost all computations are performed on individual workstations. PDS has just started its next 10-year roadmap, and will be announcing an opportunity to self-nominate in early March. Areas of improvement to be addressed in the roadmap include: simplifying and improving the pipeline; improving search capability; developing more useful metrics; improving tools for archiving small datasets; and improving archive preparation and documentation, especially for non-mission data providers. Relevant websites are naif.jpl.nasa.gov and pds.nasa.gov.

Dr. Hurlburt asked about PDS metrics. Dr. New admitted to having poor metrics of usage and users, and noted that the roadmap effort would help to identify the metrics PDS wants, and to adapt the system to provide them. Dr. Beebe commented that the International Planetary Data Alliance accepted SPICE as their data tool at their last meeting, a favorable indicator. Dr. New, when asked about Big Data needs, allowed that there were not many specific areas in planetary, with the exception of magnetospheric and plasma data, or when generating very high-fidelity gravity models. The lunar gravitational mapping mission, GRAIL, is currently working on a gravity field model on the HPC. He hadn’t heard about any issues with the pipeline associated with the GRAIL work.

Dr. New felt the BDTF could direct a question to the Agency as to how it would like to handle the storage of grant data. PSD needs a clear, direct statement on this issue, which needs to be informed at the Agency level because it will be a response to an OSTP directive. There are 1500 grantees in PSD; it would take a labor-intensive effort to store all their data. Another question is what kind of data PDS is expected to archive. Dr. Holmes noted that the directive applies to the other disciplines as well, and instructed Dr. Smith to note this as an issue. A meeting participant noted that the grant disposition question was being addressed in the roadmapping task, entailing a community-based reappraisal of the subject over the next 6-9 months.

Discussion

Dr. Holmes followed up briefly with Dr. Lee on HPC, and asked what visibility existed for the program, and what the chances for collaboration with DOE Exascale might be. Dr. Lee identified himself as Chair of the High-End Computing Interagency Working Group (HEC IWG), but noted that the Exascale computing facility is under the National Strategic Computing Initiative, a different governance structure. The HEC IWG is meeting monthly at the moment, and Dr. Lee felt he could start vectoring the discussion in their direction. He noted that DOE sets up a process for eligibility; a task needs to have a certain profile, and x number of cores. The gate for eligibility to get on the DOE’s leadership computing systems, however, is higher than NASA’s entire system. NASA is far behind NSF and DOE in the supercomputing arena. NASA’s leading system is less than 5 Tflops. Dr. Holmes considered having the BDTF make a finding on the matter, as NASA is working on projects of national significance. Dr. Tino asked if Exascale was specifically designed to solve DOE problems, with specifically implemented architecture. Dr. Lee reported that DOE has a co-design concept, and they bring in an application that works on the exascale system.
They are considering climate change as a co-designed system. DOE doesn’t have the interoperability requirement. Dr. Walker commented that DOE has specific problems, while NASA is more broad. Dr. Holmes noted that DOE is addressing both astronomy and climate, and that while some of the scales are different, the physics are similar. Dr. Tino felt that NASA should either focus on products and services, or accept generality. Dr. Holmes suggested NASA managers address utilization models at future meetings. Dr. Kinter asked what HPC would use Big Iron for after its nominal 3 years of operation. Dr. Lee said that NASA plans to repurpose Big Iron after 3 years, back into a generalized cluster. NASA is still limited by facilities with respect to power and cooling. Dr. Holmes asked Drs. Tino and Kinter to write a talking point on the facilities issue.

BDTF members raised some general topics for further exploration. Dr. Tino noted that each of the presenters had adopted some form of standard, illustrating that people recognize that standards do matter. From a management standpoint, however, the subdisciplines had inconsistent metrics on users, and he questioned why archives had to be maintained in the absence of usage. Dr. Walker explained that some data have extremely long lives; every time we get a new mission to Jupiter, for instance, Voyager and Pioneer datasets are in demand again. It’s critical that some of these datasets be safeguarded. Dr. Holmes noted that the Senior Review might be a vehicle for determining which data should be kept. Dr. Hurlburt suggested user metrics inform these sorts of judgments. Dr. Tino felt user surveys were not always effective, and that metrics on actual use would be more useful in getting smart on what data to store. Dr. Holmes asked Dr. Tino et al. to flesh out this thought and do more research in advance of the next meeting. Dr. Beebe added that one also needs to consider the intrinsic sizes of communities and their stability; they also tend to move around when major missions arise. Dr. Holmes was surprised at the lack of a clear vision for the future and asked Dr. Hurlburt to write a finding on this topic.

Dr. Holmes asked Dr. Smith to sound out the Science Mission Directorate to determine the level of concern over grant data storage. Dr. Beebe reported that it was a major concern that has already reached the top level of the administration, which had established workshops for people preparing for federal grants. Dr. Holmes gave an action to Dr. Smith to clarify Dr. Murphy’s statement on the use of open source software, and asked BDTF members to examine the NSF nodes of the BD Hub effort, to determine how close they are to co-located NASA PIs.

Dr. Holmes asked that the next BDTF meeting take place at GSFC for 2.5 days in the April-May time period, and that the BDTF perhaps consider a site visit to ARC in the future, to include some interaction with Silicon Valley. Dr. Smith reported that she would be working on an extension of the TOR, off-line. Dr. Holmes adjourned the meeting at 4:59 pm.
Appendix A Attendees
Ad Hoc Big Data Task Force Members
Charles P. Holmes, Chair, Big Data Task Force
Reta Beebe, New Mexico State University (via telecon/Webex)
Neal Hurlburt, Lockheed Martin
James L. Kinter, George Mason University (via telecon/Webex)
Clayton Tino, Virtustream, Inc.
Ray Walker, University of California at Los Angeles
Erin Smith, Executive Secretary, NASA HQ

NASA Attendees
Louis Barbieri, NASA
Dan Crichton, NASA JPL
Elaine Denning, NASA HQ
Deborah Diaz, OCIO NASA
John Evans, NASA
T. Jens Feeley, NASA HQ
Navid Golpayegani, NASA
Jeffrey Hayes, NASA HQ
Paul Hertz, NASA HQ
Tsengdar Lee, NASA HQ
Edward Masuoka, NASA
Duane McMahon, NASA
Tom Morgan, NASA HQ
Kevin Murphy, NASA HQ
Michael New, NASA HQ
Herbert Schilling, NASA
Grif Schilly, NASA
John Sprague, NASA OCIO
Elizabeth Yoseph, NASA

Non-NASA Attendees
Joseph Bredenkamp, NASA retired
Terry Blankenship, Booz Allen Hamilton
Jung Byun, Booz Allen Hamilton
Chiehsan Cheng, Global Science and Technology
Tripp Corbett, ESRI
Joseph Dohry, Booz Allen Hamilton
Alex Duner, Medill News, Inc.
Grace Hu, OMB
Eric Feigelson, Penn State University
Robert Kohon, Novetta
Bradley Peterson, OSU, Chair, NAC Science Committee
Amy Reis, Ingenicomm, Inc.
Alyssa Retski, Lobbyit.com
Marcia Smith, Space Policy Online
Connie Spittler, Global Science and Technology
Geordan Tilley, Medill News, Inc.
Joan Zimmermann, Ingenicomm, Inc.
Appendix B Membership
Dr. Charles P. Holmes, Chair, NASA HQ (Retired)
Dr. Reta F. Beebe, New Mexico State University
Dr. Neal E. Hurlburt, Lockheed Martin Space Systems Company
Dr. James L. Kinter, George Mason University
Dr. Clayton P. Tino, Virtustream Incorporated
Dr. Raymond J. Walker, University of California, Los Angeles
Dr. Erin Smith, Executive Secretary, NASA Headquarters
Appendix C Presentations
1. Big Data Task Force Charter / Subcommittee Feedback; Erin Smith
2. Legacy from the NAC Information Technology Infrastructure Committee; Charles Holmes
3. Heliophysics Division Big Data Needs; Jeffrey Hayes
4. Big Data and Earth Science; Kevin Murphy
5. Supercomputing and Big Data at NASA; Tsengdar Lee
6. Astrophysics Division Big Data Needs; Paul Hertz
7. Other Federal Big Data Initiatives (NSF); Fen Zhao
8. Planetary Science Big Data Needs; Michael New
Appendix D Agenda
Ad Hoc Big Data Task Force
of the NASA Advisory Council Science Committee
Inaugural Meeting February 16, 2016
NASA Headquarters
Glennan Conference Room, 1Q39
Agenda (Eastern Standard Time)
Tuesday, February 16
8:00 – 8:30   Opening Remarks / Introduction of Members   Dr. Erin Smith, Dr. Charles Holmes
8:30 – 9:15   Big Data Task Force Charter / Subcommittee Feedback   Dr. Erin Smith
9:15 – 9:30   BREAK
9:30 – 10:15   Legacy from NAC IT Infrastructure Committee   Dr. Charles Holmes
10:15 – 10:30   Discussion
10:30 – 10:45   BREAK
10:45 – 11:15   Planetary Science Big Data   Dr. Michael New
11:15 – 11:45   Heliophysics Big Data   Dr. Jeffrey Hayes
11:45 – 12:45   LUNCH
12:45 – 1:00   Greetings from the Science Committee   Dr. Bradley Peterson
1:00 – 1:30   Earth Science Big Data   Dr. Kevin Murphy
1:30 – 2:00   Supercomputing Big Data   Dr. Tsengdar Lee
2:00 – 2:30   Astrophysics Big Data   Dr. Paul Hertz
2:30 – 2:45   Public Comment
2:45 – 3:00   Other Federal Big Data Initiatives (NSF)   Dr. Fen Zhao
3:00 – 3:10   BREAK
3:10 – 3:30   Work Plan and Future Meetings
3:30 – 5:00   Discussion / Findings / Recommendations
5:00   ADJOURN

Dial-In and WebEx Information
For entire meeting February 16, 2016
Dial-In (audio): Dial the USA toll-free conference call number 1-800-988-9663 or toll number 1-517-308-9427 and then enter the numeric participant passcode: 4718658. You must use a touch-tone phone to participate in this meeting.
WebEx (view presentations online): The web link is https://nasa.webex.com, the meeting number is 999 765 122, and the password is BigD@T@16.
* All times are Eastern Standard Time *