collaboratively creating a network of ideas, data and software
Post on 11-Apr-2017
Anita de Waard
VP Research Data Collaborations
Elsevier, Jericho, VT
Some Thoughts on Collectively Creating Networks of Ideas, Data and Software
How do we unify the needs of the collective and the individual? Let us endeavor to build systems that allow a kid in Mali who wants to learn about proteomics to not be overwhelmed by the irrelevant and the untrue.
- John Perry Barlow, iAnnotate 2014
Collectively create nimble and robust systems of knowledge management that interconnect ideas, data and software.
Automated caption/body text splitting & linking
Precision: 56.3
Recall: 76.0
F-score: 64.7
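The three figures on this slide are consistent with the balanced F-score (the harmonic mean of precision and recall). The sketch below assumes that is the measure used; the function name `f_score` is illustrative and not taken from the slides.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the slide, in percent.
reported = f_score(56.3, 76.0)
print(round(reported, 1))
```

Because the harmonic mean is pulled towards the lower of the two values, the comparatively low precision keeps the combined score well below the recall figure.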
Statement type
Connecting Ideas: Big Mechanism
Connecting Ideas: Towards an Elsevier Knowledge Graph
14M articles from ScienceDirect
3.3M triples
475M triples
49M triples
p x r matrix
p x k, k x r latent factor matrices
~102 triples
920K concepts from EMMeT
Ongoing proof-of-concept work by Paul Groth, Sujit Pal and Ron Daniel of Elsevier Labs
Unsupervised, scalable and built with off-the-shelf technologies
Based on recent work at University College London and University of Massachusetts Amherst
Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. "Relation extraction with matrix factorization and universal schemas." (2013).
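As a rough illustration of what factorizing a p x r matrix into p x k and k x r latent factor matrices means, here is a minimal pure-Python sketch. It assumes rows stand for entity pairs and columns for relations, and it uses plain squared-error gradient descent rather than the training objective of Riedel et al.; the toy matrix and all names (`factorize`, `score`, `squared_error`) are hypothetical and not the Elsevier Labs pipeline.

```python
import random


def score(left, right, i, j):
    """Predicted strength of cell (i, j): dot product of row i and column j factors."""
    return sum(left[i][f] * right[f][j] for f in range(len(right)))


def squared_error(matrix, left, right):
    """Total squared reconstruction error over all cells."""
    return sum(
        (matrix[i][j] - score(left, right, i, j)) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


def factorize(matrix, k, epochs=500, lr=0.05, seed=0):
    """Approximate a p x r matrix as (p x k) times (k x r) by gradient descent."""
    rng = random.Random(seed)
    p, r = len(matrix), len(matrix[0])
    left = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(p)]
    right = [[rng.uniform(-0.1, 0.1) for _ in range(r)] for _ in range(k)]
    for _ in range(epochs):
        for i in range(p):
            for j in range(r):
                err = matrix[i][j] - score(left, right, i, j)
                for f in range(k):
                    a, b = left[i][f], right[f][j]
                    left[i][f] += lr * err * b
                    right[f][j] += lr * err * a
    return left, right


# Toy data (assumption): 4 entity pairs x 3 relations, 1 = triple observed.
observed = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
left, right = factorize(observed, k=2)
print(squared_error(observed, left, right))
```

After training, `score(left, right, i, j)` can be read as a confidence for cells that were not observed, which is how a factorization of this kind can suggest new triples.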
Connecting Research Data:
Linking Papers to Data, Phase 1
Supplementary data at PANGAEA
Bidirectional links between PANGAEA & ScienceDirect
Data visualized next to the article
Linking Papers to Data, Phase 2
ICSU/WDS/RDA Publishing Data Service Working Group
Currently creating a linked-data model for exposing DOI-to-DOI links outside publishers' firewalls
Merged with National Data Service pilot with the same goal
Collaboration between CrossRef, DataCite, Europe PubMed Central, ANDS, Thomson Reuters, Elsevier
About to deliver: http://dliservice.research-infrastructures.eu/#/api
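To make the idea of DOI-to-DOI links concrete, here is a minimal sketch of a link record and a lookup that works in both directions. It is an illustration only and does not reflect the working group's actual model or the service's API; the class names, field names, and example DOIs are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DoiLink:
    """One article-to-dataset link (field names are illustrative assumptions)."""
    source_doi: str   # e.g. the article
    target_doi: str   # e.g. the dataset
    relation: str     # e.g. "references"
    provider: str     # who asserted the link


class LinkIndex:
    """Stores links so they can be found from either end."""

    def __init__(self):
        self._by_doi = defaultdict(set)

    def add(self, link):
        # DOIs are case-insensitive, so normalise the lookup keys.
        self._by_doi[link.source_doi.lower()].add(link)
        self._by_doi[link.target_doi.lower()].add(link)

    def links_for(self, doi):
        found = self._by_doi.get(doi.lower(), set())
        return sorted(found, key=lambda l: (l.source_doi, l.target_doi))


index = LinkIndex()
index.add(DoiLink("10.1234/article.1", "10.5555/DATASET.9", "references", "example-publisher"))
print(index.links_for("10.5555/dataset.9"))
```

Indexing every link under both DOIs is what lets a repository ask which papers cite a dataset as easily as a publisher can ask which datasets a paper uses.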
Objective: move from a plethora of (mostly) bilateral arrangements between the different players to a one-for-all cross-referencing service for articles and data.
Current Systems for Linking Data
(Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper)
1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes,) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder
Issues with the Current Situation:
i. Too much work for researchers
ii. Data posting not mandatory
iii. No link between data and paper
iv. Funders/Institutions informed as an afterthought
A Proposal To Address These Issues:
1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
   NB: this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting

i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
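The proposed flow can be sketched as a small state model: a dataset starts under embargo, the funder is notified on deposit, and publishing the paper lifts the embargo, records the link, and triggers the report. This is a hypothetical illustration of the proposal, not an existing system; every class, method, and identifier below is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    dataset_id: str
    embargoed: bool = True            # deposited under embargo by default
    paper_id: Optional[str] = None    # filled in once a paper is linked


@dataclass
class Workflow:
    notifications: List[str] = field(default_factory=list)

    def post_dataset(self, dataset_id):
        """Steps 1-2: deposit under embargo and notify the funder automatically."""
        dataset = Dataset(dataset_id)
        self.notifications.append(f"funder: dataset {dataset_id} deposited under embargo")
        return dataset

    def publish_paper(self, dataset, paper_id):
        """Steps 3-4: lift the embargo, link data to paper, report to funder and institution."""
        dataset.embargoed = False
        dataset.paper_id = paper_id
        self.notifications.append(
            f"funder+institution: paper {paper_id} published, "
            f"embargo lifted on {dataset.dataset_id}"
        )


flow = Workflow()
data = flow.post_dataset("dataset-001")
flow.publish_paper(data, "paper-042")
for message in flow.notifications:
    print(message)
```

The point of the design is that notification and linking happen as side effects of actions the researcher already takes, which is where the "less work" and "better tracking" claims come from.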
One piece of the puzzle: Mendeley Data:
Linked to published papers or not
Linked to GitHub or not
Versioning and provenance
Another Piece of the Puzzle: DataSearch:
http://datasearchdemo.elsevier.com/indexed#/search/mercury
(Diagram: DataSearch pipeline)
Data: Federated, Poor API, Rich API, FTP & Index
Enrichment: Manual, Automated
Search: (User) Intent, Ranking, Filtering (how to mix federated & indexed, rich & poor)
Rendering: Search all data, Faceted query/Results refinement, Store & Use results
How Do We Evaluate Discoverability?
Birds of a Feather on Data Search: https://rd-alliance.org/bof-data-search.html
How do we pay for all this?
RDA Cost Recovery WG
Co-chair with Ingrid Dillo (DANS), Simon Hodson (CODATA)
Goal: write a report on potential new funding models for data repositories, so that repositories can start sharing this knowledge
Interviewed 24 repositories on their funding (current and future)
Now summarising stories and trends; will present at RDAP7
Terms of funding for main income stream (in %)
Software As A First-Class Knowledge Object:
Working with Networks of Partners
Force11: Multi-stakeholder, member-driven organisation
Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
E.g. Software citation group, akin to Data Citation Group
Will present at Force16 in Portland, OR, April 17-19, 2016
National Data Service:
Multi-stakeholder group, based around supercomputing centres
Aims to be a connective tissue between projects for data creation, curation, storage, etc.
Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need; can include software sharing
E.g. DataSearch, Data Linking systems
RDA:
Co-lead Data publishing, linking group
Co-lead Cost Recovery group
Active in Chemistry, Earth Science groups
Starting BoF Data Search
Anita de Waard
VP Research Data Collaborations, Elsevier
a.email@example.com
@anitadewaard
In summary:
Let's collectively enable an account of the present undertakings, studies and labours of the ingenious in many considerable parts of the world,
by connecting ideas, data, and software through interconnected partnerships!