research data brochure composite - elsevier · medical publisher's (stm) text and data mining...

Research Data

Discover how Elsevier is supporting researchers to store, share, discover and use data

www.elsevier.com/researchdata

Research data is the foundation on which scientific and medical knowledge is built, but there are challenges in making it accessible and shareable. Elsevier is addressing these challenges by creating solutions that support researchers to store, share, discover and use data. That way, authors receive credit for their work while the wider research community benefits by being able to discover and use research data.

Elsevier is proud to be a signatory of:

• Joint Declaration of Data Citation Principles: In 2014, Elsevier endorsed the Force 11 Citation Principles to help make research data an integral part of the scholarly record, properly preserved and easily accessible, while ensuring that researchers get proper credit for their work.

• STM Brussels Declaration: In 2007, Elsevier and other scientific publishers signed the STM Brussels Declaration, which supports making raw research data freely available.

• STM text and data mining for non-commercial scientific research: In 2013, Elsevier, along with other STM publishers, committed to a roadmap to enable text and data mining for non-commercial scientific research in the European Union.

We are always looking for feedback and new partners, and encourage you to contact us with any questions or comments at [email protected]

Raw research data should be made freely available to all researchers (STM Brussels Declaration, 2007)


Supporting authors to share data

We support the principle that research data should be made freely available to all researchers and encourage the public posting of the raw data outputs, where appropriate. However, we recognize the challenges in sharing data, and some of the ways in which we are addressing them include:

• Connecting data to research articles
On the web, if information is not sufficiently interlinked with other relevant information, it tends to be invisible. This also applies to research data: if data in a repository is not connected to the relevant literature, it lacks necessary context, making it harder for scientists and readers to find and share data. Elsevier is working with a rapidly growing number of external data repositories to set up bidirectional links between their data sets and research articles on ScienceDirect. This reciprocal linking aims to expand the availability of research data and improve the researcher workflow. Researchers, whether in the role of author or reader, benefit from both the increased discoverability of the data sets and seeing the data sets in the direct context of the research article. We encourage repositories to collaborate with us on dataset linking. For more information please contact us at [email protected]

• Data citation and referencing
To help support researchers in sharing their data, Elsevier has enhanced how data is referenced and cited within research articles. We support data DOIs as permanent identifiers for scientific data and automatically turn these into links if they are included in the article. In addition, we automatically link relevant unique identifiers or accession numbers contained within the article to information on genes, proteins, diseases, etc. or structures deposited in public databases.

• Launching new data journals
These journals provide a publication outlet for researchers to bring their data, along with the details necessary to understand and reuse it, into the formal publication process. This enables authors to have their work on data peer-reviewed and cited, and enables readers to understand the essential details of the data set as well as find, use and reanalyze the data hosted in external databases.
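The automatic linking of data DOIs described under "Data citation and referencing" above can be illustrated with a minimal sketch. The regular expression and the `link_dois` helper are assumptions made for illustration only; they are not a description of how ScienceDirect performs its linking. The sketch relies on the fact that a DOI resolves when appended to `https://doi.org/`.

```python
import re

# Assumed pattern for a DOI: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def link_dois(text: str) -> str:
    """Wrap every DOI found in `text` in an HTML link to the doi.org resolver."""

    def to_link(match: re.Match) -> str:
        raw = match.group(0)
        doi = raw.rstrip(".,;)")      # drop trailing sentence punctuation
        trailing = raw[len(doi):]     # keep that punctuation outside the link
        return f'<a href="https://doi.org/{doi}">{doi}</a>{trailing}'

    return DOI_PATTERN.sub(to_link, text)


# Hypothetical sentence using the data DOI from the citation example in this brochure.
sentence = "The data set is archived at 10.5285/e40b53d4-6699-4557-bd55-10d196ece9ea."
linked = link_dois(sentence)
print(linked)
```

Keeping trailing punctuation outside the link is a design choice in this sketch: a full stop at the end of a sentence is far more likely to be prose than part of the identifier.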


Developing best practices and industry standards

Elsevier collaborates with the research community in developing technical solutions, developing best practices and creating guidelines to implement principles and industry standards related to data. Our participation includes working with the Research Data Alliance, Force 11, and ICSU-WDS. Other initiatives include:

Supporting text and data mining (TDM): Our updated text and data mining policy includes TDM rights within the standard ScienceDirect subscription agreement for academic customers. To support researchers interested in TDM, our self-service developers' portal makes it easier for researchers to gain access to content for TDM purposes without lengthy delays.

Supporting metrics of data citation and use: Researchers are increasingly required by their funder or institution to make their data sets publicly available through data repositories. As such, it has become important to track, record and report on data submission information and use. Elsevier is engaged in various initiatives to explore how we can help support this, including participation in the Research Data Alliance Metrics Group and discussions in the Snowball metrics project.

Using industry standards: Elsevier has also helped develop and implement open data standards, vocabularies and minimum information descriptors to describe and curate datasets.

Working together to support text and data mining (TDM)

Researchers are increasingly applying text- and data-mining techniques to systematically analyze the growing volume of scholarly output in order to extract key information and discover patterns and trends across large volumes of content.

Why text mine?
For the past several years, Elsevier has been enabling researchers to text mine content and has identified two general types of TDM objectives:

• To answer a question: Where the challenge lies in how to extract information from research articles that either supports or disproves the hypothesis.

• To build resources such as databases of entities: Where researchers are looking to extract certain properties and relationships. For example, the NeuroElectro database extracted information about the electrophysiological properties of diverse neuron types from scholarly articles and made it available to everyone in a searchable database.

How does Elsevier work with researchers?
Elsevier takes a flexible approach in supporting researchers to achieve their objectives and has created some clear guidelines about how researchers can gain access to mine content through our self-service developers' portal. We have also updated our TDM policy to ensure academic researchers at subscribing institutions can text mine subscribed content for research purposes through our API.

We also recognize that in most cases, in order to achieve their objectives, researchers would like to be able to mine content from multiple sources across multiple publishers without having to go through the time-consuming process of establishing and gaining access to content from each publisher separately. As a signatory of the International Association of Scientific, Technical, and Medical Publishers' (STM) Text and Data Mining for Non-Commercial Scientific Research commitment, we support the need for a common understanding among publishers to ensure that content is mineable.

As a result, Elsevier is one of the first publishers to have fully integrated with CrossRef Text and Data Mining Services, which provide researchers with:

• A common API: that can be used by researchers to access the full text of content across publisher sites using a single, consistent mechanism.

• A common license framework: that enables researchers to read and agree to terms and conditions from multiple publishers in a single portal.
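To make the idea of a common API more concrete, the sketch below picks full-text links intended for text mining out of a single article metadata record. It assumes the record follows the shape of Crossref REST API work metadata, where each entry in `link` carries `URL`, `content-type` and `intended-application` fields. The sample record, its DOI and its example.org URLs are invented for illustration, and `text_mining_links` is a hypothetical helper, not part of any Crossref or Elsevier library. No network request is made.

```python
def text_mining_links(record: dict, preferred_type: str = "text/xml") -> list[str]:
    """Return the URLs flagged for text mining, preferred content type first."""
    links = [
        link
        for link in record.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]
    # Stable sort: False sorts before True, so preferred-type links come first.
    links.sort(key=lambda link: link.get("content-type") != preferred_type)
    return [link["URL"] for link in links]


# Invented sample record; only the field names are modeled on Crossref metadata.
sample_record = {
    "DOI": "10.1234/example",
    "link": [
        {
            "URL": "https://example.org/fulltext/10.1234/example.txt",
            "content-type": "text/plain",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://example.org/fulltext/10.1234/example.xml",
            "content-type": "text/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://example.org/fulltext/10.1234/example.pdf",
            "content-type": "application/pdf",
            "intended-application": "similarity-checking",
        },
    ],
}

urls = text_mining_links(sample_record)
print(urls)
```

Because the selection depends only on the metadata fields, the same function would apply to records from any publisher that deposits such links, which is the point of a single, consistent mechanism.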

For details about Elsevier’s policy on TDM and other initiatives please visit www.elsevier.com/tdm


Helping authors get the credit they deserve: Data Citation Principles

For data to be discovered and acknowledged it must be widely accessible and cited in a consistent and clear manner in the scientific literature. Elsevier was one of the first publishers to endorse the Joint Declaration of Data Citation Principles, which will help make research data an integral part of the scholarly record, properly preserved and easily accessible, while ensuring that researchers get proper credit for their work.

Elsevier remains involved in developing technical solutions and creating guidelines to implement the citation principles, both within Elsevier and beyond. We encourage you to endorse these principles by visiting the FORCE 11 website: www.force11.org/datacitation

How data citation works

The figure below provides an example of a proper data citation, including the key bibliographical information for the data set and using a data DOI as a unique, persistent identifier. This citation is included in the standard reference list and treated on an equal footing with article citations. That also means readers will enjoy the same benefits as for article citations, including one-click deep links to the referenced material and the ability to quickly jump to the point in the article where this work was first cited.

Reference
Barnett et al., 2013
C.L. Barnett, N.A. Beresford, L.A. Walker, M. Baxter, C. Wells, D. Copplestone

Data Reference
Element and radionuclide concentrations in representative species of the ICRP’s reference animals and plants and associated soils from a forest in North-west England
NERC - Environmental Information Data Centre (2013)
http://dx.doi.org/10.5285/e40b53d4-6699-4557-bd55-10d196ece9ea

Figure 1: example data citation, taken from "A new approach to predicting environmental transfer of radionuclides to wildlife: A demonstration for freshwater fish and caesium" by N. A. Beresford et al., published in Science of the Total Environment in 2013.
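As a rough sketch of how the elements of the citation in Figure 1 fit together, the example below assembles a citation string from the key bibliographic fields and the data DOI. The `DataCitation` class, its field names and the output layout are assumptions chosen for illustration; they are not a citation style prescribed by Elsevier or by the Data Citation Principles. The values are taken from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Key bibliographic fields of a data set (field names are illustrative)."""

    authors: list[str]
    year: int
    title: str
    repository: str
    doi: str

    def doi_url(self) -> str:
        # A DOI becomes a persistent link when appended to the doi.org resolver.
        return f"https://doi.org/{self.doi}"

    def format(self) -> str:
        # Assumed layout: authors (year). title. repository. DOI link
        authors = ", ".join(self.authors)
        return f"{authors} ({self.year}). {self.title}. {self.repository}. {self.doi_url()}"


citation = DataCitation(
    authors=["C.L. Barnett", "N.A. Beresford", "L.A. Walker",
             "M. Baxter", "C. Wells", "D. Copplestone"],
    year=2013,
    title=("Element and radionuclide concentrations in representative species "
           "of the ICRP's reference animals and plants and associated soils "
           "from a forest in North-west England"),
    repository="NERC - Environmental Information Data Centre",
    doi="10.5285/e40b53d4-6699-4557-bd55-10d196ece9ea",
)
print(citation.format())
```

Storing the bare DOI rather than a full URL keeps the identifier independent of any particular resolver address.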

Elsevier Research Data

CASE STUDY

Making research data open

Sharing research data is key to making scientific findings reproducible and to enabling scientists to build upon these findings, sometimes in truly novel and unexpected ways when data is re-used for purposes that the original contributor may never have thought of.

Elsevier supports the principle that "Raw research data should be made freely available to all researchers" (as expressed in the STM Brussels Declaration from 2007) and authors are free to publicly post their raw research data.

Our Open Data pilot is taking the next step by providing authors with the option to upload their raw research data as a supplementary file to be published open access alongside their article on ScienceDirect. There is no charge for authors or readers, and reuse is determined by the Creative Commons CC BY user license.

How does open data work?

1. At the submission phase, authors upload their raw research data as supplementary material and classify the file as “raw research data,” which is validated after submission.

2. After acceptance, the data file is available open access alongside the article for everyone to view, download and use from ScienceDirect under a CC BY user license.

Benefits for authors and readers

• Promotes data sharing: Authors can easily share their raw research data with the public, giving their readers the opportunity to validate, compare and build upon their findings.

• Greater choice: Another open access option for authors if there isn't a suitable data repository for their research data.

• Easy compliance: Helps authors to comply with their funding body requirements regarding data deposits.

• Ensures author credit: Reuse of downloaded data sets is determined by a Creative Commons CC BY license, giving readers clear guidelines about how to properly re-use and give credit.

Elsevier’s Open Data pilot is live in an initial set of journals across different disciplines. We will be evaluating the feedback from the pilot to determine how to further expand this initiative. See elsevier.com/about/research-data/open-data


CASE STUDY

‘Data in Brief’ articles helping data to be discovered and used

In genomics, big data really is big. Genomic datasets quickly consume terabytes of computer storage with more information than we have the capacity to fully analyze or understand. That's where Genomics Data, one of Elsevier's new open access journals, comes in. It provides an avenue for researchers to bring their data, along with the details necessary to understand and reuse it, to the forefront.

The journal's signature "Data in Brief" articles describe publicly available genomic datasets thoroughly so the data can be easily found, reproduced, reused and reanalyzed. Data in Brief articles are intended to supplement a research article, describing all of the nitty-gritty details that are essential to understanding the data.

How does it work?

The two essential components of genomic research are:

1. The data, available in a public repository: supports your research article but is not published or copyrighted as a part of that research article.

2. The Research Article: an interpretation of the data.

The Data in Brief articles support these elements by providing a thorough description of the data, including quality-control checks and base-level analysis.

Data in Brief articles:

• Thoroughly describe data, facilitating reproducibility.

• Make deposited genomic data easier to find.

• Increase traffic towards associated research articles and data, leading to more citations.

• Open up doors for new collaborations.

The first published Data in Brief articles are freely available in Genomics Data on ScienceDirect.

[Figure: diagram with the labels "Data in public repository", "Data in Brief", "Research Article(s)", "Link data and description", "Describe the data", "Interpret the data", "New insights & interpretations", "Reproduce", "Reuse", "Reanalyze" and "Reinterpret".]


CASE STUDY

Mendeley and Labfolder join forces to improve data linking and exchange

"Linking scientific publications to raw data is an important step in making science more reproducible," said Dr. Victor Henning, Co-founder and CEO of Mendeley.

Data discontinuity in scientific communication is a big problem in science. In scholarly publications, details about experiments are very often heavily edited and compressed, so reproducibility and the reutilization of scientific findings become a challenge.

As a result, Elsevier has announced that its global research collaboration platform Mendeley and the digital lab notebook Labfolder have joined forces, enabling users to link publications to scientific raw data. Their cooperation not only improves the reproducibility of experiments, but also responds to a bigger trend in science and research towards becoming more open, connected and collaborative.

With this integration, we have made another step towards bringing scientific data from different sources closer together. With the Mendeley extension in Labfolder, researchers can:

• Cite any publication from their Mendeley library directly in their experimental descriptions to track which literature they need to cite in their final paper.

• Download, integrate and view any paper from their Mendeley library in their protocol description to quickly look up details right in the experimental workflow.

• Upload their Labfolder experiments to Mendeley to attach them to publications and share them with colleagues.

Dr. Simon Bungers, Co-founder and CEO of Labfolder, said: "the Mendeley integration allows scientists to use scientific literature right where they need it. And the possibility to share experimental details on Mendeley – either with collaborators or publicly – further helps scientists to exchange technical knowledge and get additional citations for their work."


CASE STUDY

Bringing data to life in the research article

Elsevier is working with several leading data repositories to set up bidirectional links between articles on ScienceDirect and relevant data that is held by data repositories. With a number of data repositories, we have taken these links to the next level and built applications which pull in (meta) data from the data repository and visualize that alongside the online article, so that researchers can easily explore the data without interrupting their reading flow. Examples include:

Elsevier and PANGAEA (Data Publisher for Earth & Environmental Science) have built an advanced linking service for researchers in Earth and Marine Sciences; authors submitting papers to participating journals are encouraged to submit their raw data sets to PANGAEA, where they are archived and assigned a unique, persistent identifier. When the paper is published online, the reader will see an interactive map application that visualizes the geographical locations of the data sets at PANGAEA and offers links to the data records. Equally, the data set at PANGAEA is linked to the published article on ScienceDirect.

The Protein Viewer is a Jmol-based application for ScienceDirect that is displayed next to articles containing author-tagged protein identifiers. It enables the user to browse through all protein models tagged in the article and interactively explore each of them, for example, by scaling (zooming in and out), changing viewpoint and background color, and viewing protein structures in a 3D stereo mode. 3D models used for interactive visualization of protein structures are obtained from the RCSB Protein Data Bank (PDB).


CASE STUDY

NeuroElectro: understanding the challenges faced by text miners

The electrical properties of neurons have been studied intensively over the past 2-3 decades and are well documented in the published literature. However, it is difficult to see and compare relationships between different kinds of neurons across multiple journal articles or integrate this information with other data sources. The founder of NeuroElectro, Dr. Shreejoy Tripathy, who identifies himself as a “neuroscientist who knows how to write some programming code”, quickly found that text mining provided a nice solution to this problem. It was a good way to extract and compile key information in a cost-effective way. It enabled him to establish the database, NeuroElectro, which provides a bridge between different kinds of information across different types of neurons.

What are the challenges faced by text miners such as Dr. Tripathy?
Dr. Tripathy was looking to text mine back in 2013, when he realized he needed to gain access to the ScienceDirect API. “It wasn’t entirely clear who I needed to talk to and ultimately, it was decided that I needed to go find my librarian. Initially, I didn’t know who I was supposed to email within my university so of course it’s going to take some time to figure out who is the right person to contact about that. But after all the legal stuff was in place, getting the actual API key was trivial – maybe it took like an hour or so at that.”

Elsevier has been collaborating with researchers to facilitate text mining and, like Dr. Tripathy, we have identified three key challenges for text miners:

1. Finding who to contact: Text mining downloads a large corpus of information from a publisher’s website, and each publisher has different policies. For researchers, it is not always clear, either internally within their institution or externally, how to check and gain access if needed.

2. Different HTML publishing standards: Text mining relies on extracting the same information from the published literature; however, publications vary in how they structure an article. For example, a methods section may be called “materials and methods” in one journal and “experimental procedures” in another. This means that the code will need to be adjusted per journal.

3. Using standard formats: Whilst most text miners, like Dr. Tripathy, prefer XML, as it’s a better markup language, publishers vary in how they export full text articles. Often this means that instead of parsing in a single format, researchers are forced to deal with different formats across different publishers.
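Challenges 2 and 3 above can be illustrated with a short sketch: two articles use different headings for the same section, and a small alias table maps both to one canonical label once the XML has been parsed. The element names (`article`, `section`, `title`, `para`), the sample text and the `extract_sections` helper are all invented for illustration and do not correspond to any publisher's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Assumed alias table: journal-specific headings mapped to one canonical label.
SECTION_ALIASES = {
    "methods": "methods",
    "materials and methods": "methods",
    "experimental procedures": "methods",
}


def extract_sections(xml_text: str) -> dict[str, str]:
    """Return {canonical section label: section text} for one article."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root.iter("section"):
        heading = (section.findtext("title") or "").strip().lower()
        # Unknown headings are kept as-is so that nothing is silently dropped.
        label = SECTION_ALIASES.get(heading, heading)
        paragraphs = [p.text.strip() for p in section.iter("para") if p.text]
        sections[label] = " ".join(paragraphs)
    return sections


# Invented sample articles from two hypothetical journals.
journal_a = """<article><section><title>Materials and Methods</title>
<para>Sample text describing the procedure.</para></section></article>"""
journal_b = """<article><section><title>Experimental Procedures</title>
<para>Another description of the procedure.</para></section></article>"""

sections_a = extract_sections(journal_a)
sections_b = extract_sections(journal_b)
print(sections_a)
print(sections_b)
```

With a single structured input format, only the alias table needs to grow as new journals are added; without one, the parsing code itself would have to change per publisher.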


Adjusting our services to better help researchers

By working with researchers such as Dr. Tripathy, Elsevier has been able to understand the specific text-mining needs of researchers and find ways to improve our technology to support those efforts. This has resulted in some key changes:

• We are working with our academic customers to add text- and data-mining rights to our standard ScienceDirect subscription agreement. Further, our self-service developers' portal makes it easier for researchers to automatically gain access to the API for TDM without lengthy delays.

• We provide content for text miners through our API, which standardizes the publishing format of the article, making text mining code work consistently and reliably for all journals on our platform.

• We use the XML format, which is preferred by text miners.

• We are one of the first publishers to participate in the CrossRef Text and Data Mining Services, which will provide a single API to text miners across multiple publishers, helping to eliminate the differences in publishers’ policies.

How will text mining look in the future?
As Dr. Tripathy notes, “text and data mining is always going to be imperfect as the information that I can extract from an article, at least for me, is not quite the information I want. What I would love to have is the raw data. The actual measurement they are taking off the neuron as it comes off the amplifier and off the electrode. Effectively, that would give me a thousand times more information. I see text and data mining as a first step, but if I would like to move towards raw data sharing as if there was more raw data sharing practices in place there wouldn’t really need to have text mining.”
