little data, big data, no data? : data management in the...

Little Data, Big Data, No Data? Data Management in the Era of Research Infrastructures Workshop Report 26-27 April 2018 Hyytiälä Forestry Field Station, Finland Helena Karasti, Andrea Botero, Karen S. Baker, Elena Parmiggiani MULTICS Project University of Oulu

Page 1: Little data, big data, no data? : data management in the ...jultika.oulu.fi/files/isbn9789526220062.pdf · MULTICS is an Academy of Finland funded research project that studies the

Little Data, Big Data, No Data? Data Management in the Era of Research Infrastructures

Workshop Report 26-27 April 2018

Hyytiälä Forestry Field Station, Finland

Helena Karasti, Andrea Botero, Karen S. Baker, Elena Parmiggiani
MULTICS Project
University of Oulu

Little Data, Big Data, No Data? Data Management in the Era of Research Infrastructures

Workshop Report

Conducted 26-27 April 2018, at Hyytiälä Forestry Field Station, Finland

Helena Karasti, INTERACT Research Unit, University of Oulu, Finland
Andrea Botero, INTERACT Research Unit, University of Oulu, Finland
Karen S. Baker, INTERACT Research Unit, University of Oulu, Finland and School of Information Sciences, University of Illinois at Urbana-Champaign, Illinois, USA
Elena Parmiggiani, INTERACT Research Unit, University of Oulu, Finland and Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway

University of Oulu
Department of Information Processing Science

Series B 70, Working papers
ISSN 0786-8421
ISBN 978-952-62-2006-2 (electronic version)

Multi-scoped infrastructuring project (MULTICS, http://interact.oulu.fi/multics) acknowledges funding by Academy of Finland (grant #285903).

Suggested Citation: Karasti, H., A. Botero, K.S. Baker, and E. Parmiggiani (2018) Little Data, Big Data, No Data? Data Management in the Era of Research Infrastructures. Workshop Report, April 2018, Hyytiälä Forestry Field Station, Finland. University of Oulu, Finland. ISSN 0786-8421. ISBN 978-952-62-2006-2. Available at: http://hdl.handle.net/2142/100870.

Table of contents

Introduction
Workshop preparations
    Pre-workshop survey - brief summary of the responses
Day I (Sessions 1-3)
    Session 1: Introduction
    Session 2: What is data?
        Walking tour to SMEAR II station
    Session 3: What is data management?
Day II (Sessions 4-5)
    Session 4: What is data infrastructure?
    Session 5: Workshop Wrap Up and Next Practical Steps Forward
Post workshop survey results
Post workshop reflections and recommendations
References
Acronyms
Appendices
    Appendix 1. Agenda
    Appendix 2. Pre survey
    Appendix 3. Post survey
    Appendix 4. Handouts

Introduction

This workshop was organized to provide members and other stakeholders of INAR RI Ecosystems with an introduction to data management in the ecological and related sciences. The notion of local data management was used as a starting point to discuss data management activities taking place at or close to the origins of data, and to envision how data was coordinated within and across boundaries of a variety of related contexts.

INAR RI Ecosystems [1] is a consortium project funded by the Academy of Finland Research Infrastructure (FIRI) program 2017-2021. The aim of the project is to propose and consolidate an umbrella for environmental and ecosystem research infrastructures (RIs) in Finland (Bäck et al. 2017, ENVRIplus 2017). The consortium is led by University of Helsinki and composed of key ecosystem research components in Finland including Universities of Helsinki, Eastern Finland, Turku, Oulu, and Jyvaskyla, as well as three national research institutes including Natural Resources Institute Finland (LUKE), Finnish Environment Institute (SYKE), and Finnish Meteorological Institute (FMI). Figure 1 shows the INAR RI Ecosystems components and depicts which of the locations are ecosystem observation stations, experimental field stations, biological as well as ecophysiological laboratories, or co-locations of these.

The aim of INAR RI Ecosystems is to 1) upgrade existing platforms and construct new platforms and data structures for analysing the functional relationships between ecosystems and the environment, 2) strengthen national ecosystem research and its linkages to atmospheric and environmental sciences, and 3) build a national scale, coordinated RI which enables the development and participation of Finnish partners in international RI initiatives such as ICOS, AnaEE and eLTER as well as data RIs such as EUDAT CDI and Lifewatch. Thus, INAR RI Ecosystems contributes as a national focal point for European Strategic Forum on Research Infrastructures (ESFRI) RIs.

[1] For general overview of the project see: https://www.helsinki.fi/en/inar-institute-for-atmospheric-and-earth-system-research/inar-ri-ecosystems-0

Figure 1. Map of INAR RI Ecosystems components. From INAR RI (nd).

The Multi-scoped infrastructuring project (MULTICS [2]) planned the topic, content, and ways of working for the workshop. MULTICS is an Academy of Finland funded research project that studies the formation of research infrastructures (RIs), a.k.a. information infrastructures, knowledge infrastructures, and cyberinfrastructures. The project continues a lineage of investigations focused on ecological and environmental research domains with longitudinal empirical engagements with Long-Term Ecological Research (LTER) networks, including the Finnish Long-Term Socio-Ecological Research (FinLTSER), and through them, more recently, with INAR RI Ecosystems. Volunteering to organize the workshop was a way for MULTICS’ participants to contribute back to and give thanks to those in INAR RI Ecosystems who gave their time for interviews and discussions.

The MULTICS project has engaged in a research relationship with INAR RI Ecosystems as part of the national integration efforts in ecological and environmental research infrastructures (Bäck et al. 2017). This collaboration has included following the local preparations as well as the national consortiums’ efforts so that the eLTER proposal submission would be included in the European roadmap for RIs. These engagements have also directed the MULTICS group’s interest toward the ESFRI initiative, a strategy-led approach to policy-making on RIs in Europe that is in charge of coordinating this roadmap.

[2] For more information on MULTICS see http://interact.oulu.fi/multics

Workshop preparations

The aim of the workshop was to provide an introduction to data management in the ecological and related sciences as well as the place of local data management within the emerging larger data and RI context, i.e. within the data landscape. This aim was inspired by a particular observation made as we analyzed the policy-level ESFRI initiative with our interest in data management and infrastructure formation. We noted a generic invisibility issue built into the RI formation strategy, which we have tried to capture in Figure 2. The upper part of the figure depicts a simplified and animated version of the ESFRI ‘lifecycle’ model for constructing RIs. The lower part of the figure, our addition, depicts the national and/or local level of member states from which the data originate. Despite the crucial role data and data management play in RIs, the lower part of the figure has little visibility in the ‘top-heavy’ ESFRI template. Data originating from the local/national level of RIs becomes recognized in the ESFRI model when there is a need for data to flow into the central hub(s) of the RI, depicted with a highlighted arrow linking the top and bottom panels in the figure in the operation phase of the RI. The flow of data between levels represents an expectation that data management procedures are in place at national/local levels and able to produce good quality data for the ESFRI RI.

Figure 2. The upper panel shows a simplified view of the 30-year research infrastructure formation strategy adopted by ESFRI. The lower panel has been added by the MULTICS group to make visible local data management that requires not only attention but also funding if it is to design and operationalize data flow to a central hub. Source: MULTICS project - Visualization Andrea Botero.

This expectation seems a particularly big assumption to make for the ecological/environmental RIs, given that ESFRI presents the upper panel only. Existing local and national infrastructure efforts are very heterogeneous and distributed and as yet often lacking in coordination both at the level of research sites and across research sites, and thus, when it comes to data management, are ‘bottom-heavy’ enterprises. Therefore, we set off to plan a workshop that would pay attention to the national/local level as a starting point to discuss data management activities and to envision data integration across multiple boundaries and scopes.

The workshop was not planned as a traditional training session on data management, but rather as a space to collectively identify, reflect on, and discuss data-related issues, resources, expectations and next steps. These topics need to be addressed and developed in order to achieve widespread, effective, and sustainable data management practices and procedures in the era of research infrastructures.

Planning the approach and content for the workshop was enhanced by the breadth of experience of our team that includes both long-term practical work with data management at LTER sites and research on data management and infrastructure work in ecological research networks and communities (see for example, Karasti and Baker 2004, 2008a, 2008b; Karasti et al. 2006, 2010). This focus is complemented by a socio-technical understanding of how infrastructures are embedded in local practices and technological arrangements, yet also connected across multiple scopes, and how they emerge through longitudinal processes (see e.g.: Bowker et al. 2010; Karasti 2014). Based on this we tried to focus on the most essential data management concepts and how to present them in a palatable manner. Since data management discussions involve many ideas and technologies as well as social and institutional arrangements, workshops of one or more days may be overwhelming or incomprehensible to the uninitiated. Creating an informal atmosphere with selected content is important for generating lively discussions and thought-provoking examples that enable participants to consider how to define next steps for data management suitable to their particular circumstances.
It was important for us to free individuals from the idea of seeking ‘The Solution’ to data management so they could grasp the diversity of data management activities, and would feel free to bounce ideas off each other, while considering what options would be appropriate for their circumstances.

In planning the ways of working for the workshop, we drew on our background in the tradition of Participatory Design (see e.g. Simonsen and Robertson 2013). Given the diversity of kinds of data work involved in INAR RI Ecosystems - spanning from nuanced hand-crafting to automated workflows as well as from novice to mature data management practices - we gave high priority to creating an appreciative atmosphere where participants could openly share their experiences and be accepted regardless of their current data arrangements. We conducted a pre-workshop survey to get an idea of the participants and their specific situations with data management. Survey responses (see next section for a brief summary) guided identification of and communication with those having specific commentaries to share. This pre-workshop communication helped some participants frame their brief presentations and begin developing an understanding of what they could contribute. Whether a participant decided they were comfortable with a five-minute commentary or a ten-minute presentation with slides, this proved to be an important first step in the engagement process. In addition, we solicited comments and feedback on our developing plans and discussed (or exchanged emails) with several workshop presenters and participants as the agenda evolved to fit our growing understanding of how to meet community needs.

The workshop was organized in five sessions over one and a half days with the first day focusing on data and data management and the second day highlighting the data landscape, that is, the larger context within which local data management takes place. Each session included short commentaries shared by the participants themselves to foster reflection around their examples and experiences. We also arranged for short presentations by experts and invited guests, a walking tour of the station and field sampling as well as group work, interactive activities and discussions. The presentations and activities of all five sessions are briefly described below.

Following the workshop, materials including presentation slides, handouts, and photographs were posted online for access by participants so that those ready to use the vocabularies, concepts, and understandings of the data landscape would have materials at hand to refresh their memories and support their activities. A post-workshop survey was distributed, the results of which are summarized in the final section of this report.

Pre-workshop survey - brief summary of the responses

To aid in the planning, workshop organizers designed and distributed a short survey to gather information about the state and experiences of current participants of the INAR RI Ecosystems components and other stakeholders (see Appendix 2 for survey questions). Examples and issues raised by the results of the survey served as a basis to plan and structure the content and program of the workshop.

Figure 3. a. Details from the survey form; b. Responses to a pre-workshop survey question.

Insights from the pre-workshop survey in a nutshell:

● There exist a variety of ‘data generators’ in INAR RI Ecosystems and their characteristics relate to the origins of data (Q3-4)

● Participants handle a wide variety of heterogeneous data, data formats, materials etc. (Q13)

● Some elements of data management are present, but none seem to be covering the whole spectrum of activities needed (metadata, controlled vocabularies & data dictionaries, data/metadata catalogs, data repositories, data policy/use statement, support personnel, file storage space, publishing data in conjunction with scientific publications) (Q5-14)

● What works well? Answers ranged from “nothing works well at the moment” to some aspects working and others not (yet) (Q19)

● Bottlenecks or problems: (Q20)
    ○ “No management plan, no guidelines, everyone for themselves attitude, no time for management”
    ○ “Some of the data is preserved by non-military service persons and they change once a year” – i.e. high turnover of temporary people
    ○ “We have tens of binders full of papers with old data, which are not digitized, if we only could have all the time in the world...”
    ○ “There is no personnel for data management”
    ○ “Relying and depending on a physical person to share the data”
    ○ “Missing clear guidelines”
    ○ “Documentation of many things afterwards or not at all. Ad hoc routines without considering long-term workload”.
    ○ “Good tools for sharing large amounts of data directly from the station are lacking.”

Responses bear similarities to those from a survey of FinLTSER in 2007/8 (briefly reported in Karasti 2009). An early study of the US LTER Information Management community (Karasti, Baker and Halkola 2006) also discusses data management work as dealing with and finding balances for a variety of issues even after decades of practice (see Figure 4).

Figure 4. Elements of LTER data stewardship illustrated by interview excerpts, based on an empirical study of US LTER network in 2002 (from Karasti, Baker and Halkola 2008, p. 348).

Day I (Sessions 1-3)

Session 1: Introduction

The workshop opened with an overview of ongoing changes in environmental science, societal challenges, and technology developments that make issues of data management pressing to address.

Figure 5. a. Participants gathered at the old dining hall in Hyytiälä organized as the workshop venue; b. Helena Karasti introduced the background and aims of the workshop.

Presentations

● Welcome words - Jaana Bäck, INAR RI Ecosystems
● Introduction to the workshop and pre-survey insights - Helena Karasti, MULTICS Project

ACTIVITY BRIEF: Introductions

Those who generate, manage, and use data benefit from hearing about the emergence of new data-related roles. New data roles lead to changes in who is designated to carry out both traditional activities in support of research and new responsibilities associated with data sharing and data preservation. This part of the session was geared to build a shared and multifaceted image of the participants together with the situations and conditions of data management in the different locations of INAR RI Ecosystems. To do that, participants introduced themselves and their sites or organizations through a series of mobile and interactive activities where they moved themselves around the room in response to the following three key themes:

Figure 6. a. Participants introduce themselves while explaining why they located near the “little data” sign; b. Participants walk along a data management awareness timeline.

1) Signs marked with “big data”, “little data” and “no data” were located at different corners of the room. Participants were asked to position themselves according to how they would describe the kind of data they worked with or generated (see Figure 6a). A subset of the participants introduced themselves (name, organization) and explained why they were standing in their particular location.

2) Signs marked with “No metadata”, “Local convention”, and “Metadata Standard” were located at different corners of the room. Participants were asked to position themselves near or far from them depending on whether the data they generate or use has associated metadata. Another subset of the participants introduced themselves (name, organization) and explained why they were standing in their particular location.

3) A long line made of tape was placed on the floor, traversing the room. It was marked at one end with the word ‘NOW’ and at the other end with ‘70s’ (see Figure 6b). Participants were asked to organize themselves along the line, standing at the time when they first encountered issues related to data management in their work. They were asked to talk with their neighbors to find the right spot in the timeline. The remaining participants introduced themselves (name, organization) and explained which year they were standing at and what their situation was with regard to data management.

Besides serving as an engaging approach to introductions, the exercises aimed to introduce some key ideas and basic vocabulary (e.g., ‘big data’, ‘metadata’, ‘local convention’) in an informal manner. For instance, with terms such as ‘big data’ that can refer to any number of characteristics of data such as the volume, variety, veracity, or velocity of capture, the first themed exercise exposed participants to the many kinds of data represented at the workshop. Further, the many metadata standards found in practice are typically a source of confusion, yet there often exists an expectation that a first step requires identifying and using a single metadata standard. Introducing the concept of ‘local conventions’, also referred to as local ‘working standards’, is a starting point for discussion of how a local convention eventually can be mapped to different standards. Local conventions offer a first step toward coordinating description of data in a locally meaningful way while gaining experience in the use of metadata and the challenges involved in designing local processes that capture metadata.
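The idea that one local convention can later be mapped to more than one standard can be sketched in a few lines of Python. This is a minimal illustration and was not part of the workshop materials: the local field names, the two crosswalk tables, the labels `standard_a` and `standard_b`, and the function `apply_crosswalk` are all hypothetical, and the target element names are assumptions rather than elements verified against any real metadata standard.

```python
# Hypothetical local convention: field names a station might use in its own files.
local_record = {
    "site": "Example Station",
    "what": "air temperature",
    "who": "station technician",
    "when": "2018-04-26",
}

# Hypothetical crosswalks from local field names to two target vocabularies.
# The element names are illustrative assumptions, not taken from a real standard.
CROSSWALKS = {
    "standard_a": {"site": "coverage", "what": "subject", "who": "creator", "when": "date"},
    "standard_b": {"site": "geographicDescription", "what": "attributeName", "who": "creator"},
}


def apply_crosswalk(record, crosswalk):
    """Rename local fields per the crosswalk; also report local fields with no target."""
    mapped = {crosswalk[key]: value for key, value in record.items() if key in crosswalk}
    unmapped = sorted(key for key in record if key not in crosswalk)
    return mapped, unmapped
```

Calling `apply_crosswalk(local_record, CROSSWALKS["standard_b"])` returns the renamed fields together with the list of local fields that have no counterpart in that target, which makes gaps in a mapping visible before any data is submitted.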

● Commentary Session 1 - Jaana Bäck, INAR RI Ecosystems

Session 2: What is data?

The submission of data to data facilities often assumes that data is well organized, free of errors, and accompanied by good documentation. In practice, however, there are many kinds of data including long-term observations, campaign samplings, experimental arrangements, modeling, and data products where generation, organization, and handling are as yet ill-defined. As participants made brief statements about the data at their locations, we began building a vocabulary about the what, when, and how of data infrastructure.

Presentations

● INAR RI Ecosystems data related insights from CSC - Jessica Parland, CSC
● What is data? A dataset? A data package? A data product? - Karen Baker, MULTICS Project

Figure 7. a. Workshop presentation; b. Workshop discussion.

Examples from participants (selected on the basis of pre-survey responses)

We invited some participants to make short comments about the kind of data they have and the status of data management arrangements at their location. We used their own words from the survey as prompts for workshop discussion.

Example 1 “no personnel for data management” and “data not described by metadata” - Kotkanoja experimental field and Lintupaju buffer zone - LUKE, Jokioinen. - Jaana Uusi-Kämppä (researcher)

Example 2 “data described by metadata mostly” - SMEAR II Hyytiälä - Juho Aalto (data creator and data user)

Example 3 “No metadata standard, but some sort of own that is easy to transfer to standard form” and “Data is mostly in excel-sheets located in a University server”. - Oulanka Research Station, Katja Sippola

Example 4 “As a research technician, I do almost everything” and “We have tens of binders full of papers with old data, which are not digitized, if we only could have all the time in the world…” - Kevo Field station, Tommi Andersson

Example 5 “What level of scale to use, or better how to define the boundaries of a data package when one has large environmental monitoring data” - SMEAR station - Pasi Kolari (data manager)

● Commentary Session 2 - Johannes Peterseil, Environment Agency Austria

Walking tour to SMEAR II station

● Introduction to SMEAR stations - Jaana Bäck, INAR RI Ecosystems

Participants had a chance to visit the SMEAR II Station (Station for Measuring Ecosystem Atmosphere Relations) and to ask questions about the origin of the data, the types of data collected, the instrumentation present, and data handling at the station.

Figure 8. a. Participants exploring the forest and the SMEAR II tower; b. SMEAR station concept diagram.

Session 3: What is data management?

At the source of the data, there are important responsibilities and opportunities today relating to data management and the growth of digital capabilities. In this session key data management activities and issues were explored by discussing examples, arrangements, and decision-making activities that accompany everyday data practices. Handouts with links to a few of the many online data management planning modules and education resources were also provided (see Appendix 4).

Presentations

● General overview of data management at Center for Ecology - Sue Rennie, Centre for Ecology & Hydrology, UK
● What is data management? A database? A data system? - Karen Baker, MULTICS team

Group work session

Participants divided into four groups (considering similarities in their configuration). Working around group tables fostered communication and sharing that spurred some participants in their thinking and exposed participants to a wide range of data-related concerns. The aim was to identify, first individually and then as a group, key activities of data management already happening at their locations. The participants then discussed together those they identified as important. The intention was for groups to create a list of priorities for data management (considering the time, resources, etc.), but we did not reach that level of detail. Most of the conversations centered on brainstorming around their current data management issues.

Activity Brief: Local data management activities

Write one activity per post-it, identify if it is covered at your location or not. Use the handouts 3 and 4 as aids in your thinking.

Materials: post-its, big sheets of paper, markers, handouts on vocabulary and example diagrams.

Figure 9. Working sessions by groups.

● Commentary Session 3 and Wrap Up - Sue Rennie, Centre for Ecology & Hydrology, UK

Day II (Sessions 4-5)

Session 4: What is data infrastructure?

Presentation

● Linking to the European LTER Data Network - Johannes Peterseil, Environment Agency Austria

Examples

● PlutoF, Kristjan Adojaan, University of Tartu
● CSC data linking resources, Jessica Parland, CSC
● Linking to ICOS, Pasi Kolari, Hyytiälä, SMEAR
● “Which repository to use?” Christine Ribeiro, University of Helsinki

Presentation

● What is data infrastructure? When and how do data infrastructures become research infrastructures? - Elena Parmiggiani and Karen Baker, MULTICS Project

Figure 10. a. Guest Speaker Johannes Peterseil’s presentation; b. Working pictorial of data groupings.

Group work session

Participants were divided into three categories: people in charge of heterogeneous data at a research station, researchers or research groups working with data, and people in charge of data coming from highly instrumented stations. Participants were asked to review the basic set of data landscape elements identified during the workshop and draw first their own current configuration, and then their view of the near future. The exercise aimed to allow participants to imagine where they want to be in the coming years in terms of their own data management. In particular, this exercise would make visible (for all) the relations between locations of data generation and aggregation, and what those might mean in a shared future INAR RI Ecosystems.

Activity Brief: Data landscape mapping

Use the “template components” to make one concrete example of data infrastructure at your site(s) or based on your own situation locally. Add arrows for the connections you know and dotted arrows for the ones you "think" should be developed. Using the list of data management activities that you made yesterday, add the activities you’ve identified associated with data infrastructure (what you have, what you do not have).

Materials: Paper, pens, Handout 5: Data Management Activities & Vocabularies and Handout 6: Configurations of Data Infrastructure that references the field, local, and remote data arenas.
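A paper map of this kind can also be written down as a small list of connections, which makes it easy to see which components data cannot yet reach. The following Python sketch is only an illustration under stated assumptions: the component names, the "existing"/"planned" labels (standing in for solid and dotted arrows), and the function `components_not_yet_reached` are hypothetical and do not come from the workshop handouts.

```python
# Hypothetical data landscape for one site: (source, target, status).
# "existing" stands for a solid arrow, "planned" for a dotted arrow.
connections = [
    ("field sensor", "station server", "existing"),
    ("station server", "university repository", "existing"),
    ("university repository", "European data portal", "planned"),
]


def components_not_yet_reached(edges, start):
    """List components that data from `start` cannot reach over existing links alone."""
    existing = [(src, dst) for src, dst, status in edges if status == "existing"]
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for src, dst in existing:
            if src == node and dst not in reached:
                reached.add(dst)
                frontier.append(dst)
    all_components = {name for src, dst, _ in edges for name in (src, dst)}
    return sorted(all_components - reached)
```

With the example connections, `components_not_yet_reached(connections, "field sensor")` lists the one component that depends on a planned link, which corresponds to the dotted arrows participants were asked to draw.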

Figure 11. Group work discussions, activities, and reflections on the data landscape representations.

● Group presentations and reflections

Session 5: Workshop Wrap Up and Next Practical Steps Forward

Participants agreed that the workshop was useful to start a collective conversation on the role of data management in scientific work in the environmental sciences in Finland. The discussions served to outline a few steps towards improving and developing data management plans and strategies. Following a suggestion by Jaana Bäck, participants agreed on some practical next steps:

1) All attending sites work on creating or updating their profile metadata in DEIMS
2) Data management plans will be taken up in the next RESTAT meeting
3) INAR RI Ecosystems will start compiling a metadata catalogue of available data sets
4) A mailing list for issues relating to data management will be created so participants and other interested people can continue receiving and giving peer support around data management topics.

PostworkshopsurveyresultsWhatwaslearned?InsightsintodatamanagementDuringthetwodays,theparticipantsconsideredavarietyofissuesrelevanttodatamanagementandjointlydiscussedtoolsandstrategiestolinkeverydayworkpracticeswithdataproductionsupporting both traditional and new approaches of scientific knowledge production.Participantshad theopportunity to share their experiences andexamples, and to learn fromothers’cases,“listeningtothedifferent‘cases’ingeneralwasrewarding”andevensothatmostinterestingintheworkshopwas“otherpeople’sproblemswithdata”.Participantsfoundit“veryuseful tomeetpeoplewhodealwithsame issuesandwork tosolve those issues”andgainedinsightintotheexistingheterogeneitieswithintheINARRIEcosystems:sites/stations/institutesandtheirissues,data,methods,instruments,standards.

“thedifferentwayspeoplearedealingwithdatamanagement”“typesofdataaredifferent”“wehave"datafactories"(likeSMEAR…)andmorediscreteandlessintensivedataproducers”“thefactthatthedataisscatteredandheterogenous”“thevarietyofthestationsbecameclearer,asdidthedatamanagementinfrastructureandneedsconcerningit”“thepeopleandinstrumentationatthesites”“betterunderstandingofthesiteissuesandtheimportanceofthemetadata”

It was successfully conveyed that there is not ‘One Solution’ to data arrangements and thatdifferentapproacheshavelegitimatereasons.

“therearemanyoptionshowtohandlethedatamanagement,notonesinglerightsolution”

Itwasalsorecognizedthatthestateofdatamanagementvariesbetweenparticipants’locations.Thoseparticipantswithlessadvanceddatamanagementwereassured/comfortedthattheywerenottheonlyonesfeelingunsureofnextstepsandlackingresourcesfordatamanagement.

“variety of the levels of data management in different sites and organizations, and even inside the organizations”
“to notice that this is not a clear thing to everyone and this really will need time and work to be done”

The participants became more aware of how to connect with the European level.

“aware of the resources open… at a European level”
“It was interesting to get to know about DEIMS, which wasn't familiar to me before but will be useful resource/platform for me and my group in many ways.”
“there are tools/help/standards”


All in all, workshop participants gained a broader understanding of data management. It is not only a technical job, but a deeply socio-technical effort that necessitates a lot of communication, collaboration, planning, and eventually resources.

“ecosystem RI community raised its readiness level to improve the data management quality and got skills to think about data management in a more comprehensive way”
“it requires communication, thinking and planning to set up a data management system”

Towards the end of the workshop, participants, drawing on the shared data management vocabulary, concepts, and strategies, started to think about how to shape the development of data arrangements as the final group work assignment, since there are many ways to configure the multiple components of data infrastructure. They got ideas on how to continue their data management work and felt more at ease with their jobs.

“ways to improve visibility of our monitoring”
“Improvement of issues revealed in the last group task.”
“I feel that after the workshop I have good tools to go on with sharing our data. This does not feel as huge effort as it used to feel before, but something manageable.”

What is important for continuing the process?

One of the most important actions to work on is the promotion of awareness and recognition of the importance of data management on national and local levels. At the moment the degree of awareness is not even and there are missing resources and key responsibilities that are unrecognized and undelegated at many sites.

“the resources to do the data management work seem to be restricted and data management tasks seem to fall from the table with least priority in the middle of all other tasks and duties. There should be some kind of change in thinking.”
“It would be of importance also to find ways to get the ear and mandate (i.e. resources and possibilities) to take these issues forward. Help in communicating the importance.”

Participants identified community building and more collaboration for data management on the national level as a high priority, an important step for moving forward. A data management group is relatively easy to start. For example, a group may be begun by establishing clear communication channels. However, creating a community that is sustained over time requires development of goals, activities, and governance that contribute to a process that helps shape and consolidate the community.

“Community building would be valuable. People will need a lot of peer support and empowerment to be able to go on in this very heterogeneous and scattered network.”
“A regular communication channel for data managers and the opportunity to meet and share/learn from each other.”
“Regular meeting with the network and development of common tools and solutions, sharing protocols, procedures and methods, templates, guidelines and also technical tools and platforms...”
“I'm looking forward the meeting in Konnevesi; we should establish a common ground for the data management work for the research stations.”
“It might be good if somebody would follow how the things start to move in each field/laboratory/institute/project partner? Are there still barriers to start the progress? In tough questions it might be nice to get support from someone.”
“There is still a way to go to develop the connected data management within Finland but… the group that formed at Hyytiälä has every chance of success.”

Attainment of ambitious scientific objectives by a data generating project today requires explicit recognition that there are different kinds of data that may require different kinds of care. As multi-group data management is embedded in different kinds of organizational and institutional settings,

“clearly there are different groups that need different approaches.”

Post workshop reflections and recommendations

Research infrastructure development is ongoing; at the same time our understanding of it is emergent and incomplete. The metaphor of ‘doing research on a ship that is being built while in use’ continues to be an appropriate image to reflect on the tasks ahead for INAR RI and others involved.

At the local level there are many interesting directions to continue the work with data. We second the proposed next steps that contribute to the community process as discussed during the workshop. This process should result in more clarity on the assignment of roles and responsibilities regarding data management. Besides the mailing list, other activities such as a collective mapping of existing assets and data management practices can be carried out. This is needed to make visible what exists, give it name and recognition, and identify missing components with more specificity. Constructing a community focusing on data management can enable people to consolidate a shared vocabulary, rely on their collective and distributed expertise in more explicit ways, and maintain the documentation and development work that needs to be done locally.

Interdisciplinary work would be needed to address the complexity and dynamics of establishing a functional, stable and sustainable local data environment and of maintaining it within the context of a larger data landscape. Therefore, it would be important that INAR RI Ecosystems stakeholders take explicit actions to raise awareness with the national funders about the central importance of data management in research infrastructure development. The message should clarify that data management is not only important, but also requires resources/funding at many levels, and that it is not currently supported by European research funding or infrastructure sources.

There is also a need for recognizing and bridging gaps. Going back to the observation that the workshop preparations started with (see Figure 2), there is a need for bridging the gap between the European and the national/local data management levels. Finland is definitely not the only member state/country where data management procedures capable of producing good quality data for sharing and preservation are not yet in place. At this time, each member state must struggle with figuring out how to address and support national/local data management on their own. Devising ways and processes at the European level for supporting and coordinating with data activities at the national level would be welcomed.

References

Bäck, J., Kaukolehto, M., Rasilo, T., Pumpanen, J., Tuittila, E., Paavola, R., Laatola, K., Karjalainen, J., Suominen, O., Forsius, M., Kolström, T., Mäkipää, R., Lohila, A., Pursula, A., Juurola, A., & Heiskanen, J. (2017). Terrestrial Ecological and Environmental Research Infrastructures in Finland - Analysis of the Current Landscape and Proposal for Future Steps (White Paper). Helsinki, Finland: INAR RI Ecosystems Project.

Bowker, G. C., Baker, K. S., Millerand, F., & Ribes, D. (2010). Toward information infrastructure studies: Ways of knowing in a networked environment. In J. Hunsinger, L. Klastrup, & M. Allen (Eds.), International handbook of internet research (pp. 97-117). Dordrecht, Netherlands: Springer.

ENVRIplus (2017). Further Integration of Research Infrastructures Related to Terrestrial Ecosystem Research, Including Recommendations on Co-locating Research Sites on National and International Level. Work Package 12 Report D12.3. Available at http://www.envriplus.eu/wp-content/uploads/2015/08/D12.3-.pdf

INAR RI (nd). Institute for Atmospheric and Earth System Research. Available at https://www.helsinki.fi/en/inar-institute-for-atmospheric-and-earth-system-research/infrastructure/national-research-infrastructures

INAR RI Ecosystems (nd). INAR RI Ecosystems – Integrated Atmospheric and Earth System Science Research Infrastructure. Available at http://www.syke.fi/en-US/Research__Development/Research_and_development_projects/Projects/Finnish_LongTerm_SocioEcological_Research_network_FinLTSER/INAR_RI__Integrated_Atmospheric_and_Earth_System_Science_Research_Infrastructure.

Karasti, H., & Baker, K. S. (2004). Infrastructuring for the Long-Term: Ecological Information Management. Proceedings of the 37th Hawaii International Conference on System Sciences 2004.

Karasti, H., Baker, K. S., & Halkola, E. (2006). Enriching the Notion of Data Curation in E-Science: Data Managing and Information Infrastructuring in the Long Term Ecological Research (LTER) Network. Computer Supported Cooperative Work (CSCW) – An International Journal, 15(4), 321-358.

Karasti, H., & Baker, K. S. (2008a). Digital data practices and the long term ecological research program growing global. International Journal of Digital Curation, 3(2), 42-58.

Karasti, H., & Baker, K. S. (2008b). Community Design: Growing One’s Own Information Infrastructure. In Proceedings of the 10th Participatory Design Conference (PDC’08), Oct 1-4 2008, Bloomington IN. Computer Professionals for Social Responsibility, pp. 217-220.

Karasti, H. (2009). FinLTSER Network formation and the state of information management. Ecological Circuits - Eco-informatics in Action, Issue 2, pp. 15-18.

Karasti, H., Baker, K. S., & Millerand, F. (2010). Infrastructure Time: Long-Term Matters in Collaborative Development. Computer Supported Cooperative Work (CSCW) – An International Journal, 19(3-4), 377-415.

Karasti, H. (2014). Infrastructuring in Participatory Design. In Proceedings of the 13th Participatory Design Conference (PDC '14), 6-10 October 2014, Windhoek, Namibia. Research Papers - Volume 1. ACM, New York, NY, USA. pp. 141-150.

Simonsen, J., & Robertson, T. (Eds.). (2013). Routledge International Handbook of Participatory Design. Routledge.


Acronyms

ACTRIS - European Research Infrastructure for the observation of Aerosol, Clouds, and Trace gases
AnaEE - European Research Infrastructure for Analysis and Experimentation on Ecosystems
CSC – Finnish IT Center for Science
DEIMS - Dynamic Ecological Information Management System
DEIMS-SDR - Dynamic Ecological Information Management System - Site and Dataset Registry of the European Long Term Ecological Research Network
DEIMS-US LTER - Drupal Ecological Information System of the US Long Term Ecological Research Network
eLTER - European Research Infrastructure for LTER-Europe
ESFRI - European Strategic Forum on Research Infrastructures
EUDAT - European Data Infrastructure
FinLTSER - Finnish Long-Term Socio-Ecological Research Network
FIRI - Finnish Research Infrastructure program
FMI - Finnish Meteorological Institute
ICOS - Integrated Carbon Observation System
ILTER - International Long Term Ecological Research Network
INAR RI - Institute for Atmospheric and Earth System Research
INAR RI Ecosystems - Component of INAR RI that studies major ecosystems in Finland
LTER-Europe - Long-Term Ecosystem Research in Europe Network
LUKE - Natural Resources Institute Finland
NRI - National research infrastructure
RESTAT - Finnish Research Stations involved in INAR RI Ecosystems
RI - Research Infrastructure
SMEAR - Stations Measuring Atmosphere Ecosystem Relationships
SYKE - Finnish Environment Institute
US LTER - United States Long Term Ecological Research Network

Appendices
- Appendix 1. Agenda
- Appendix 2. Presurvey
- Appendix 3. Postsurvey
- Appendix 4. Handouts


Appendix 1. Agenda

AGENDA: Little data, big data, no data? Data management in the era of research infrastructures
26-27th April 2018; Hyytiälä, Finland

Organisers: INAR RI ecosystems and MULTICS project

Thursday 26 April

7.00-8.00 Breakfast

8.30-09.50 Session 1: Introduction
● Welcome words - Jaana Bäck, INAR RI ecosystems (10 min)
● Introduction to the workshop and pre-survey insights - Helena Karasti, MULTICS Project (15 min)
● INAR RI ecosystems! Participants will introduce themselves and their sites/organizations to set the stage for discussions of data management (60 min)
● Commentary - Jaana Bäck, INAR RI ecosystems (5 min)

10.00-10.10 Health break

10.10-11.00 Session 2: What is data? - Part 1
● INAR RI ecosystems data related insights from CSC - Jessica Parland, CSC (5 min)
● What is data? A dataset? A data package? A data product? - Karen Baker, MULTICS Project (15 min)
● Examples from INAR RI ecosystems participants: Jaana Uusi-Kämppä, Jokioinen, LUKE; Juho Aalto, Hyytiälä - Helsinki University; Riika Ylitalo, Finnish Meteorological Institute (5 + 5 + 5 + 5 min questions)
● Metadata story - Karen Baker (10 min)

11.00-12.00 Lunch

12.00-13.30 Session 2: What is data? - Part 2
● Collective Clinic - One example from Oulanka research station by Katja Sippola will be given some concrete tips for metadata by the group (45 min)
● Examples from INAR-RI ecosystems - Tommi Andersson, Kevo research station, Pasi Kolari, Hyytiälä, SMEAR (10 min + 10 min)
● Commentary - Johannes Peterseil, Environment Agency Austria (5 min + 5 min group reflections)
● Introduction to SMEAR stations - Jaana Bäck, INAR RI ecosystems (15 min)

13.30-13.50 Coffee break

13.50-15:30 Walking tour to SMEAR station (“talking data”) + Group Photo

15.30-17:30 Session 3: What is data management?
● General overview of data management at Center for Ecology: Sue Rennie, Centre for Ecology & Hydrology, UK (20 min + 10 min)
● What is data management? A database? A data system? - Karen Baker, MULTICS Project (10 min)
  ○ Handouts 1 & 2
● Group formation (10 min)
● Group work session (Mapping session) - Facilitator, Andrea Botero, MULTICS Project (20 work + 10 presentation + 20 work + 10 presentation)
  ○ Handouts 3 & 4
● Commentary and wrap up - Sue Rennie and Multics (10 min)

18.30 Dinner (Old Dining Hall)

20.00 Sauna and kota

Friday 27 April

7.00-8.00 Breakfast

8.30-9:30 Session 4: What is data infrastructure? - Part 1
● Linking to the European LTER Data Network - Johannes Peterseil, Environment Agency Austria (20 min + 10 min discussion)
● Pluto F, Kristjan Adojaan University of Tartu (5 min) and CSC data linking resources, Jessica Parland, CSC (5 min)
● Linking to ICOS Pasi Kolari, Hyytiälä, SMEAR (10 min)
● Example, Christine Ribeiro, University of Helsinki (5 + 5 min)

9.30-9.40 Health break

9.40-10:40 Session 4: What is data infrastructure? - Part 2
● What is data infrastructure? When and how do data infrastructures become research infrastructures? - Elena Parmiggiani and Karen Baker, MULTICS Project (15 min)
● Group work session - Facilitators Jessica Parland, Johannes Peterseil, Sue Rennie, MULTICS (30 min)
  ○ Handouts 5 & 6
● Group presentations and reflections (15 min)

10:40-11:00 Session 5: Wrap-up and next steps - Collective reflection. What have we accomplished? Where do we want to go? - Jaana Bäck, Johannes Peterseil, Sue Rennie, MULTICS project, and all participants (20 min)

11.00-12.00 Lunch

12.00 Departure

Appendix 2. Presurvey

Appendix 3. Postsurvey

Appendix 4. Handouts

Handout 1: Some Centers and Repositories

• US LTER
  Palmer LTER site-based data system http://pal.lternet.edu/data
  US LTER Network Information System https://portal.lternet.edu/nis/home.jsp
  DataOne Ecological Domain Aggregator https://search.dataone.org/#data

• Instrumented national networks and/or central facilities
  International FLUXNET https://fluxnet.fluxdata.org/about/
  Global SMEAR https://www.helsinki.fi/en/inar-institute-for-atmospheric-and-earth-system-research/infrastructure/global-smear

• National & European Research Infrastructures
  ICOS Integrated Carbon Observation System https://www.icos-ri.eu/ https://eudat.eu/communities/integrated-carbon-observation-system
  PEEX Pan-Eurasian Experiment https://www.atm.helsinki.fi/peex/
  ACTRIS Aerosol, clouds, and trace gases research infrastructure https://www.actris.eu/
  ANAEE Analysis and Experimentation on Ecosystems https://www.anaee.com/
  eLTER Long-Term Ecosystem Research in Europe http://www.lter-europe.net/elter/data
  euDat Research data services https://eudat.eu/data-management
  CSC Finnish IT Center for Science, supercomputing center https://www.csc.fi/

Some Registries
  DEIMS-SDR Repository for Research Sites and Datasets https://data.lter-europe.net/deims/
  Re3data registry for repositories http://www.re3data.org
  INSPIRE registry http://inspire.ec.europa.eu/registry


Handout 2: Some Online Data Management Resources

• Data Management Training (DMT) Clearinghouse, an online registry of learning resources of ESIP, USGS, DataOne, ICSU, Digital Preservation Network
  http://dmtclearinghouse.esipfed.org
  o ESIP commons: Data Management Short Courses
    http://commons.esipfed.org/datamanagementshortcourse
  o USGS DM Training Modules
    https://www2.usgs.gov/datamanagement/training.php
    https://my.usgs.gov/confluence/display/cdi/Data+Management+Training+Modules
  o DataOne: How to manage ecological data
    https://www.dataone.org/education-modules
    https://dataoneorg.github.io/Education/
    https://www.dataone.org/esa-2011-how-manage-ecological-data-effective-use-and-re-use

• Metadata Standards Directory
  o Alex Ball, Metadata Standards Directory. Research Data Alliance.
    https://www.youtube.com/watch?v=Lh8w2_TpFP8
    http://rd-alliance.github.io/metadata-directory/standards/

• Controlled Vocabularies
  o NISO. (2017). National Information Standards Organization, Issues in Vocabulary Management. Report TR-06-2017.
    https://www.niso.org/standards-committees/vocab-mgmt
  o EcoPar Parameters and Methods for Ecosystem Research Monitoring
    http://www.ufz.de/lter-d/index.php?en=42566&contentonly=1
  o SeaDataNet Common Vocabularies
    https://www.seadatanet.org/Standards/Common-Vocabularies
  o NOAA feature type conventions
    https://www.nodc.noaa.gov/data/formats/netcdf/v1.1/#templatesexamples
    § feature type templates
    § Point (CDL template-point)
    § Timeseries (CDL template – orthogonal) (CDL template – incomplete)
    § Trajectory (CDL template – incomplete)
    § Profile (CDL template – orthogonal) (CDL template – incomplete)
    § TimeSeries Profile
    § Trajectory Profile
    § Swath
    § Grid


• Data Carpentry Lessons
  http://www.datacarpentry.org/lessons

• EDI: Data Management
  Youtube: https://www.youtube.com/channel/UCNZoWPaMG6lkEiH8xRNnrr
  - How to clean and format data using R, OpenRefine, and Excel
  - Creating ‘clean’ data for publication
  - Drupal Ecological Information Management System (DEIMS)
  - Information management and technology at the Virginia Coast Reserve (VCR)
  - Using the PASTA + Search API and building a local data catalog

• EDI: Five phases of data publishing
  https://environmentaldatainitiative.org/resources/five-phases-of-data-publishing/

• EDI: EML Metadata template
  https://github.com/EDIorg/MetadataTemplates

• UK DM Training on managing data
  https://www.ukdataservice.ac.uk/manage-data

• Repository Registry
  https://www.re3data.org

• Cook, R. B., R. J. Olson, P. Kanciruk, and L. A. Hook. 2001. Best practices for preparing ecological and ground-based data sets to share and archive. Ecol. Bulletins 82:138-141.

• Wilkinson, M. D., Dumontier, M., Aalbersberg, et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3(1), 160018.

Data Management Plans

Guides to Data Management

• Oakridge National Lab https://daac.ornl.gov/PI/plan.shtml

• MIT “Data Planning Checklist”. HTTP://LIBRARIES.MIT.EDU/GUIDES/SUBJECTS/DATA-MANAGEMENT/CHECKLIST.HTML

• JHU Data Management Questionnaire. HTTP://DMP.DATA.JHU.EDU/ASSISTANCE/GUIDANCE-ON-WRITING-DATA-MANAGEMENT-PLANS/

• Australian National Data Service: data management for Researchers. HTTP://ANDS.ORG.AU/RESEARCHERS/MANAGE-DATA.HTML

• Digital Curation Centre: Data Management Plans. HTTP://WWW.DCC.AC.UK/RESOURCES/DATA-MANAGEMENT-PLAN

Guides to Data Curation

• UK Digital Curation Center http://www.dcc.ac.uk/training/train-the-trainer/dc-101-training-materials


Sample Data Management Plans

• Natural Science Data Management plan examples: HTTP://WWW.ICPSR.UMICH.EDU/ICPSRWEB/ICPSR/DMP/RESOURCES.JSP#A06

• U-Wisconsin: HTTP://RESEARCHDATA.WISC.EDU/MAKE-A-PLAN/EXAMPLES/

• UC San Diego: sample data management plans spanning multiple NSF and NIH directorates HTTP://IDI.UCSD.EDU/DATA-CURATION/EXAMPLES.HTML

• DataONE sample plans. Available: HTTP://WWW.DATAONE.ORG/DATA-MANAGEMENT-PLANNING

• US LTER site profiles: https://lternet.edu/site/

Handout 3: Session 3

Handout 4: Session 3

Handout 5: Session 4

Handout 6: Some Configurations of Data Infrastructure