December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Part 1: Large Data Sets

Data Publishing Workflows: Models RDA-WDS Publishing Data Workflows Working Group NISO: Big Data, December 2015



International, open group dealing with the challenges posed by all sizes & types of research data
https://rd-alliance.org/

International group promoting long-term stewardship of, and universal and equitable access to, quality-assured scientific data and data services, products, and information
http://www.icsu-wds.org/

Today’s session

• Publishing data
• Publishing BIG data
• Getting there together

Publishing data

• What is data publishing?
• Models in data publishing
• Recommendations
• Challenges

http://bit.ly/1TvGe9v

Data Publishing

“Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way. Data publishing occurs via dedicated data repositories and/or (data) journals which ensure that the published research objects are well documented, curated, archived for the long term, interoperable, citable, quality assured and discoverable – all aspects of data publishing that are important for future reuse of data by third party end-users.”

Austin, Claire C., et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Zenodo. doi:10.5281/zenodo.34542

Data Publishing Workflows

“…are activities and processes that lead to the publication of research data, associated metadata and accompanying documentation and software code on the Web. In contrast to interim or final published products, workflows are the means to curate, document, and review, and thus ensure and enhance the value of the published product…”

Austin, Claire C., et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Zenodo. doi:10.5281/zenodo.34542

Subjects of Review

Guidelines for data publication, e.g.,
• ENVRI reference model
• PREPARDE

Data journals, e.g.,
• Scientific Data
• F1000

Repositories, e.g.,
• Domain
– National Snow & Ice Data Center (NSIDC)
– ICPSR (Social Sciences)
• General
– Dryad
– Arkivum + Figshare
• Institutional
– Stanford Digital Repository
– Data Repository for the University of Minnesota (DRUM)

Elements of Analysis

• Discipline

• Function of workflow

• The assignment of persistent identifiers (PIDs) to datasets

• The PID type used, e.g., DOI, ARK, etc.

• Peer review of data (e.g., by researcher and by editorial review)

• Curatorial review of metadata (e.g., by institutional or subject repository)

• Technical review and checks (e.g., for data integrity at repository/data centre on ingest)

• Discoverability: Was there indexing of the data, and if so, where?

• Formats covered

• Persons/Roles involved, e.g., editor, publisher, data repository manager, etc.

• Links to additional data products (data paper; review; other journal articles) or “stand-alone” product

• Links to grants, usage of author PIDs

• Whether data citation was facilitated

• Whether the data life cycle was referred to

• Standards compliance
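The "technical review and checks" element above mentions data integrity checks on ingest. One common way to do such a check is to compare file checksums against a manifest supplied by the depositor. The sketch below is only an illustration under that assumption: the function names, the manifest layout (relative file name mapped to a SHA-256 hex digest), and the choice of SHA-256 are hypothetical and are not taken from any repository reviewed by the working group.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest: dict[str, str], base_dir: Path) -> list[str]:
    """List manifest entries that are missing or whose checksum differs.

    `manifest` is a hypothetical depositor-supplied mapping of
    relative file name -> expected SHA-256 hex digest.
    """
    failures = []
    for relative_name, expected in manifest.items():
        candidate = base_dir / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            failures.append(relative_name)
    return failures
```

An empty list from `failed_fixity` would mean every file in the manifest arrived intact; anything else names the files to send back to the depositor.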

Key Components of Data Publishing

https://zenodo.org/record/34542#.VmVJqMrWlqc

Publication workflows

Traditional article publication

Reproducible research publication

https://zenodo.org/record/34542#.VmVJqMrWlqc

Data publication workflow

https://zenodo.org/record/34542#.VmVJqMrWlqc

Recommendations

• Start small and build open source/shareable components one by one in a modular way with a good understanding of how each building block fits into the overall workflow and what the final objective is.

• Follow standards whenever available to facilitate interoperability and to permit extensions based on the work of others using the same standards.

• Implement and adhere to standards for data citation, including the use of persistent identifiers (PIDs). Linkages between data and publications can be automatically harvested if DOIs for data are used routinely in papers. The use of researcher PIDs such as ORCID can also establish connections between data and papers or other research entities such as software. The use of PIDs can also enable linked open data functionality.

• Document roles, workflows and services.
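To make the point about harvesting linkages concrete, here is a minimal sketch of what a harvester might do with PIDs found in a paper's text: pull out DOI-like strings and check that an ORCID iD is well formed. The function names are hypothetical, and the DOI pattern is a commonly used heuristic rather than a full grammar for DOIs. The ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers.

```python
import re

# Heuristic pattern for modern DOIs ("10." + registrant code + "/" + suffix).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Return the distinct DOI-like strings in `text`, in order of appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        doi = match.group(0).rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    total = 0
    for char in compact[:15]:
        total = (total + int(char)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return compact[15] == expected
```

A harvester built on these helpers could, for example, run `extract_dois` over a reference list and keep only the identifiers that resolve to datasets; resolving the DOIs themselves is outside this sketch.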

Challenges

● Bi-directional linking.

● Software management.

● Version control / dynamic data.

● Sharing restricted-use data.

● Role clarity.

● Business models.

● Data citation support.

● Metrics.

● Incentives.


BIG Data Challenges

• Dynamic data citation: https://rd-alliance.org/group/data-citation-wg.html

• Who does what?

– Researchers
– Managers
– Curators

• How is this fundable/sustainable?

• What about many contributors to massive datasets?
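As a rough illustration of why dynamic data citation is hard, the sketch below records what a citation to a subset of a changing dataset might need to carry: the query, when it was run, and a fingerprint of the rows it returned. This is one possible approach under stated assumptions, not the Data Citation Working Group's specification; the record fields and function names are hypothetical.

```python
import hashlib
import json


def result_fingerprint(rows: list[dict]) -> str:
    """Hash a result set so that row order and key order do not matter."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def make_subset_citation(query: str, executed_at: str, rows: list[dict]) -> dict:
    """Build a hypothetical citation record for a query over dynamic data."""
    return {
        "query": query,
        "executed_at": executed_at,  # timestamp supplied by the caller
        "row_count": len(rows),
        "sha256": result_fingerprint(rows),
    }


def still_matches(citation: dict, rows: list[dict]) -> bool:
    """True if re-running the query returned the same rows that were cited."""
    return citation["sha256"] == result_fingerprint(rows)
```

If `still_matches` returns False after the query is re-run, the underlying data has changed since the citation was made, which is exactly the situation that versioning and timestamping are meant to handle.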

Version control & Dynamic data

Role clarity & Business models

Data citation support

Where we’re going

Here

Reproducible research publication

• How does the intent to make research data public inform the research workflow?

• Can we extend data publication to cover the research workflow better / at all?

– Who does that? Where? How?

– What are the challenges?

Intent to publish research data informing the research workflow

Traditional research workflows
• Searching literature (know any good references?)
• Looking for diverse domain examples

Research workflows integrating data publication
• Canvassing community: diverse domains desired
• http://bit.ly/1N48NHf

http://projects.iq.harvard.edu/seamlessastronomy/home

Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Science 2.0 Repositories: Time for a Change in Scholarly Communication, DOI: 10.1045/january2015-assante http://nemis.isti.cnr.it/groups/infrascience

Getting there together

http://bit.ly/1N48NHf

What we’re asking: How does the intent to publish research data inform the research workflow?

Describe the research workflow & how it integrates practices that enable data publication:

1) Roles - who is involved in the stage
2) Inputs - outputs from previous stages
3) Actions - steps/activities, both optional and required
4) Outputs - products that become inputs to next stages
5) Tools - both current and desired, as relevant

Describe the results of the workflow:
1) Achieved
2) Yet to be achieved & what is needed
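The five-part stage description above maps naturally onto a small data structure. The sketch below is an illustration only: the `Stage` class, the `unmet_inputs` helper, and the example stage and product names are all hypothetical. It encodes the rule that a stage's inputs are outputs from previous stages and reports any input that no earlier stage produces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stage:
    """One stage of a research workflow, following the five headings above."""
    name: str
    roles: list[str]
    inputs: list[str]
    actions: list[str]
    outputs: list[str]
    tools: list[str] = field(default_factory=list)


def unmet_inputs(stages: list[Stage]) -> dict[str, list[str]]:
    """Map stage name -> inputs that no earlier stage lists as an output."""
    available: set[str] = set()
    problems: dict[str, list[str]] = {}
    for stage in stages:
        missing = [item for item in stage.inputs if item not in available]
        if missing:
            problems[stage.name] = missing
        available.update(stage.outputs)
    return problems


# Hypothetical three-stage workflow used only to exercise the helper.
example = [
    Stage("collect", ["researcher"], [], ["run instrument"], ["raw data"]),
    Stage("document", ["researcher", "curator"], ["raw data"],
          ["clean", "describe"], ["dataset", "metadata"]),
    Stage("deposit", ["repository manager"], ["dataset", "metadata", "readme"],
          ["ingest", "assign PID"], ["DOI"]),
]
gaps = unmet_inputs(example)
```

Here `gaps` flags that the "deposit" stage expects a "readme" that no earlier stage produces, the kind of gap a response under "Yet to be achieved & what is needed" would report.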

http://bit.ly/1N48NHf

Extending data publication to cover the research workflow

• Current practices
• Current tools
• Nascent opportunities

– Who does that? Where? How?

– What are the challenges?

Thank you!

Amy Nurnberger, Research Data Manager
Center for Digital Research and Scholarship
Columbia University

E-mail: [email protected]
ORCID: 0000-0002-5931-072X

Twitter:@DataAtCU
