DOI and DataCite
Establishing information infrastructures
Dr. Irina Sens
14th Conference "Consortia Library Systems: Technologies and Innovation"
23 June 2015
Overview
0. TIB
1. Persistent identification & DOI for research data
2. DataCite
3. DOI registration
4. How to take part
The TIB
http://www.nationsonline.org/oneworld/europe_map.htm
Main Building
The TIB
• German National Library for Architecture, Chemistry, Computer Science, Engineering, Mathematics, Physics, and Technology
• Collection scope of a national library
• Special collections: grey literature, literature from East Asia and Eastern Europe
• World's largest specialist library for science and technology
• Customers in more than 60 countries
• Founded 1959 on the basis of the existing university library (founded 1831)
The TIB
• Financed through national funds (30 %) and state funds (70 %)
• National mission and responsibilities
• Member of the Leibniz Association
• Quality assurance via an external evaluation procedure by the Leibniz Association (non-university research institutes) at 7-year intervals
• Strengths, weaknesses, potentials
• Prerequisite for qualifying for the joint national and state funding
TIB Hannover – Additional facts
• 14.7 million euros annual acquisition budget
• 52,700 journal subscriptions (16,800 print; 35,900 digital)
• 9 million items
• Staff: ca. 400 people (librarians, researchers, IT people, etc.)
Global Network
TechLib
Vision and Strategy: Publications of text, data, software code and more
[Diagram: TIB with the content types text, research data, 3D objects, simulation, software, scientific films]
1. Persistent identification & DOI for research data
• Anything that is the foundation of further research is research data
• Data is evidence
Definition of research data
[Figure: down-core plots for sediment cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1, each showing IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand) and Smect (%/clay) against Age (kyr); max. 233.55 kyr]
[Figure: map covering 54° 0' to 55°30' and 11° to 15°. Legend: World vector shore line; Grain size class KOLP A; Grain size class KOEHN2; Grain size class KOEHN; Geochemistry; Grain size class KOLP B; Grain size class KOLP DIN; 20 m. Scale: 1:2695194 at Latitude 0°. Source: Baltic Sea Research Institute, Warnemünde.]
• Earthquake events => doi:10.1594/GFZ.GEOFON.gfz2009kciu
• Climate models => doi:10.1594/WDCC/dphase_mpeps
• Sea bed photos => doi:10.1594/PANGAEA.757741
• Distributed samples => doi:10.1594/PANGAEA.51749
• Medical case studies => doi:10.1594/eaacinet2007/CR/5-270407
• Computational model => doi:10.4225/02/4E9F69C011BC8
• Audio record => doi:10.1594/PANGAEA.339110
• Grey literature => doi:10.2314/GBV:489185967
• Videos => doi:10.3207/2959859860
What type of data are we talking about?
1. Persistent identification & DOI for data
• Social & political responsibility
• European Commission requirements
• Horizon 2020
• Open Access strategies
• Funding body requirements
• Science policy requirement to publish research data
• Reusability of publicly funded research
Why? – Political significance!
• STM Association, 2015 Report:
“…The explosion of data-intensive research is challenging publishers to create new solutions
to link publications to research data (…)
to facilitate data mining and
to manage the dataset as a potential unit of publication (…)
Change continues to be rapid, with new leadership and coordination from the Research Data Alliance (…)
research funders have introduced or tightened (data) policies
data repositories have grown in number and type (…) and DataCite was launched (...)
discovery services such as Thomson Reuters’ Data Citation Index…”
1. Persistent identification & DOI for data
Why? – Publishing companies!
1. Persistent identification & DOI for data
Why? – Publishing companies!
• Brussels Declaration (STM Association publishing companies):
“… Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars.”
• Response: data journals
Example: Nature: Scientific Data
“Scientific Data's central mission is to help foster the sharing and re-use of the data underpinning scientific research.”
1. Persistent identification & DOI for data
But – scientific scepticism!
“A biologist would rather share their toothbrush than their gene name”
Mike Ashburner and others
Professor in Dept of Genetics, University of Cambridge, UK
Options for publishing data:
Processes RD / publication
Data collections and structured databases
Primary data and data sets
Articles with RD
Data in publication
Data cited in the article, deposited in data centres & repositories
Data in supplements
Data on private & institutional hard disks
Independent data publications
1. Persistent identification & DOI for data
Data landscape – the theory
Modified based on
STM / Smit, E: Avoiding a Digital Dark Age for Data: why data and publications belong together
ICSTI workshop Delivering Data in Science
PARIS, 5 March 2012
Reality of data publishing:
Articles
Data centres/repositories
Supplements
Data on private / institutional hard disks
Few
Lack of archives in many subject areas!
Potential for ‘data dumping’
overburdened!
~ 75 % of RD is never published
1. Persistent identification & DOI for data
Data landscape – the reality
Modified based on
STM / Smit, E: Avoiding a Digital Dark Age for Data: why data and publications belong together
ICSTI workshop Delivering Data in Science
PARIS, 5 March 2012
Modified based on
STM / Smit, E: Avoiding a Digital Dark Age for Data: why data and publications belong together
ICSTI workshop Delivering Data in Science
PARIS, 5 March 2012
Ideal case of data publishing:
RD in articles
RD in data centres and repositories
Supplements
Data on private / institutional hard disks
Linking texts & data: ‘enhanced publications’
If no other data integration is possible
Journals request and check RD filing
Support ‘enhanced publications’; persistent identifiers
Generic & discipline-specific; interfaces for good connection!
1. Persistent identification & DOI for data
Data landscape – the future?
• Clear referencing and citability
• Links data to other publications
• Increased visibility & enhanced access
• Transparent research
• Avoids duplication
• Promotes scientific cooperation
• Motivation for new research
1. Persistent identification & DOI for data
Advantages
• Resource can be clearly referenced & cited
• Persistent, i.e. also beyond the life span of the identified object, if necessary
• Clear separation between identification of the resource and the location reference
• PI is undertaken by registration agencies:
• Standards for structure and syntax
• Resolving mechanism
[Diagram: persistent identification (PI) at the centre, surrounded by: cite, reuse, verify, access, find, make visible]
1. Persistent identification & DOI for data
Properties
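The separation between identifying a resource and locating it can be illustrated with a toy resolver. This is only a sketch of the idea, not how the Handle System is implemented; the class name, the DOI `10.1234/example.1` and the example.org URLs are all hypothetical.

```python
class ToyResolver:
    """Toy illustration: the identifier stays fixed, the location may change."""

    def __init__(self):
        # Maps an identifier string to its current location (URL).
        self._locations = {}

    def register(self, doi, url):
        # An identifier is assigned once; re-registering is treated as an error.
        if doi in self._locations:
            raise ValueError(f"{doi} is already registered")
        self._locations[doi] = url

    def update_location(self, doi, new_url):
        # Only the location changes; the identifier itself is never altered.
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = new_url

    def resolve(self, doi):
        return self._locations[doi]


resolver = ToyResolver()
resolver.register("10.1234/example.1", "https://old.example.org/data/1")
# The data centre moves the landing page; citations using the DOI stay valid.
resolver.update_location("10.1234/example.1", "https://new.example.org/datasets/1")
```

A citation that uses the identifier keeps working after the move, because only the resolver entry was changed.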
• International DOI Foundation (IDF) founded in 1998
• Long-term persistence & accessibility to objects
• Technology based on the Handle System
• May 2012: DOI System ISO Standard 26324 was published
• Guaranteed, trustworthy responsibilities, uniform standards & workflows
• Quality control: obligatory metadata for each object
• IDF currently consists of nine registration agencies (RA)
• RA responsible for PI allocation and maintenance
DOI®, DOI.ORG® and shortDOI® are brand names of the International DOI Foundation
1. Persistent identification & DOI for data
DOI system
Registration Agencies
2. DataCite
• Global consortium supported by local institutions
• Goal: Publication infrastructure for data & non-textual content
• Service provider for data centres/content providers
• Non-commercial, non-profit
• Standards, workflows and best practice
• Based on the DOI system
2. DataCite
Background
‘03: DFG-funded project with German World Data Centres
‘05: TIB allocates the first DOIs for data sets
‘09: Paris Memorandum. DataCite is founded in London. Seven members.
‘15: 25 members, 8 associated members, 19 countries. Over 5 million DOI names. Sturdy technical infrastructure.
Annual Meetings: Hannover 2010, Berkeley 2011, Copenhagen 2012, Washington 2013, Nancy 2014, Paris 2015
2. DataCite
Development
Members:
• CISTI – Canada Institute for Scientific and Technical Information
• California Digital Library, USA
• Purdue University, USA
• OSTI – Office of Scientific and Technical Information, USA
• The British Library
• TIB, Germany
• ZB MED, Germany
• ZBW, Germany
• GESIS, Germany
• SUB Göttingen, Germany
• University of Tartu, Estonia
• JaLC – Japan Link Center
• DTIC – Technical Information Center of Denmark
• Library of TU Delft, The Netherlands
• Library of ETH Zürich, Switzerland
• INIST – L’Institut de l’Information Scientifique et Technique, France
• SND – Swedish National Data Service
• ANDS – Australian National Data Service
• NRCT – National Research Council of Thailand
• The Hungarian Academy of Sciences
• CRUI – Conferenza dei Rettori delle Università Italiane
• SAEON – South African Environmental Observation Network
• CERN – European Organization for Nuclear Research
• BIBSYS – Library System, Norway
Affiliated members:
• Digital Curation Center, UK
• Microsoft Research, USA
• ICPSR – Interuniversity Consortium for Political and Social Research, USA
• KISTI – Korea Institute of Science and Technology Information
• BGI – Beijing Genomics Institute, China
• IEEE, USA
• Harvard University Library, USA
• GWDG, Germany
2. DataCite
Members
Membership application for 2016
[Diagram labels: DataCite; Managing Agent (TIB); Support; Member institution with three Data Centres (shown twice, followed by "…"); Cooperation; Member; Associate; Stakeholder; International DOI Foundation]
2. DataCite
Structure
DataCite – Board, Director and MA
Board:
• Adam Farquhar (President), Head of Digital Scholarship, The British Library
• Paul Bracke (Treasurer), Associate Dean for Research and Assessment/Associate Professor, Purdue University Libraries
• Brigitte Hausstein, Staff division of Data Registration Agency, GESIS
• Salvatore Mele, Head of Open Access, CERN
• Karen Morgenroth (Deputy President), Manager, Content Access Services, National Research Council
• Irina Sens, Deputy Director, German National Library of Science and Technology (TIB)
• Wilma van Wezenbeek, Director, TU Delft Library
Interim Director: Patricia Cruse, former Director Digital Preservation, California Digital Library
DataCite – Board, Director and MA
• Registered association under German law
• Statutes:
• The Association is a non profit making organisation; its primary objectives are not for profit.
• Membership is open to all not for profit organisations who wish to allocate DOI names and use the Registration Agency of DataCite in their capacity as allocating agents.
• Yearly General Assembly
• Managing Agent/Administration Office located at TIB
• Member support
• Operating and maintaining IT infrastructure (+ Purdue)
Summer Meeting 2015
DataCite, in conjunction with EPIC, is planning a half-day event focusing on persistent identifiers on September 21, 2015. The event will be located in Paris on the day before the Research Data Alliance (RDA) Plenary meeting.
Potential topics include:
• citing dynamic datasets
• managing versions with identifiers
• enabling user-facing services with identifiers
• and more
• Members and associated members:
Libraries, information and data centres
• Working Groups:
Metadata
Best practices
• Other services:
Metadata Store, Search, Stats, OAI Provider http://www.datacite.org/services
2. DataCite
Services
• In cooperation with CrossRef:
• http://crosscite.org/citeproc/
Citation Formatter makes citations available in over 100 formats
• http://crosscite.org/cn/
Content Negotiation can be used to automatically obtain access to the (previously deposited) media formats of an object
• With STM Association publishing companies:
• Improved ability to access & find research data
• Promotion of bidirectional links between data sets & publications in data archives
• Enhanced visibility of links between publications & data sets
2. DataCite
Cooperative activities - I
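Content negotiation means the client states the desired representation in the HTTP Accept header when resolving a DOI. The sketch below only builds such a request and does not send it. The resolver address `https://doi.org/` and the media type `application/x-bibtex` are assumptions made for illustration and should be checked against the service documentation; the helper name is hypothetical.

```python
import urllib.request


def build_negotiation_request(doi, media_type, resolver="https://doi.org/"):
    """Build (but do not send) a request asking for a specific representation.

    The resolver URL and the media type are assumptions for this sketch.
    """
    return urllib.request.Request(resolver + doi, headers={"Accept": media_type})


req = build_negotiation_request("10.1594/PANGAEA.727522", "application/x-bibtex")
# urllib.request.urlopen(req) would send the request; it is left out here
# so that the sketch runs without network access.
```

The same DOI can thus be asked for a citation, a metadata record or the landing page, depending on the Accept header alone.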
CrossRef
• Target group: Publishers
• Scholarly and professional research content: journal articles, books, conference proceedings, etc.
• Reference linking and searchable metadata database
DataCite
• Target group: Libraries/information centres with national responsibilities
• Data and grey literature
• Activities around establishing and sharing best practices, identifying and solving some of the unique issues that arise with datasets
• Working with data centres and organisations that hold data
Very strong cooperation
• Thomson Reuters - Data Citation Index
• Harvesting metadata via DataCite
• Advantages for customers: access to DCI statistics
• ORCID – ODIN project
• ORCID and DataCite Interoperability Network
• Inclusion of data sets in publication lists
• Track the use of data sets
• Link data sets to related articles, licences and all participants
• Follow-up project: THOR from June 2015
2. DataCite
Cooperative activities - II
• re3data & DataBib
• To merge and act under the auspices of DataCite as re3data
• MoU with RDA:
• DataCite will become an “organisational member”
• Endorsement of the Force11 “Joint Declaration of Data Citation Principles”
2. DataCite
Cooperative activities - III
3. DOI registration
• By June 2015, over 5,000,000 DOI names had been allocated by DataCite for:
• Research data (~45%)
• Grey literature objects (~40%)
• Images (~10%)
• Medical case studies
• Videos
• Maps
• Learning objects
• Status in May 2015: 5,392,337 DOI names
3. DOI registration
Types of content
• Securing persistence
• Providing metadata & landing pages
• Securing data granularity (worthy of citation?)
• DOI syntax:
• Prefix is allocated by DataCite
• Suffix can be defined by the data centre
• Clear string
• Positive list: A-Z a-z 0-9 . : - _ /
• New DOIs are resolvable after around 5 minutes
• DOI update globally available after a max. of 24 hours
3. DOI registration
Demands placed on data centres
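A data centre could check its suffixes against the positive list before registration. The following is a minimal sketch that treats the positive list on the slide as the only allowed characters; the function names are hypothetical, and the DOI standard itself permits a wider character set than this list.

```python
import re

# Characters from the positive list on the slide: A-Z a-z 0-9 . : - _ /
_ALLOWED_SUFFIX = re.compile(r"[A-Za-z0-9.:\-_/]+")


def is_valid_suffix(suffix):
    """Return True if the suffix is non-empty and uses only listed characters."""
    return _ALLOWED_SUFFIX.fullmatch(suffix) is not None


def build_doi(prefix, suffix):
    """Join the allocated prefix and a suffix chosen by the data centre."""
    if not is_valid_suffix(suffix):
        raise ValueError(f"suffix contains characters outside the positive list: {suffix!r}")
    return f"{prefix}/{suffix}"
```

For example, `build_doi("10.1594", "PANGAEA.757741")` reproduces one of the DOIs listed earlier in the talk, while a suffix containing a space is rejected.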
DOI Service – work flow
[Diagram labels: Scientists; Data; Data centre; Metadata & URL; DOI; Discovery Index; "Where does my data go?"]
• Identifier (with type attribute)
• Creator (with type and name identifier attributes)
• Title (with optional type attribute)
• Publisher
• Publication year
• Recommended citation:
Creator (Publication Year): Title. Publisher. Identifier
3. DOI registration
DataCite metadata schema - mandatory fields
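The recommended citation can be generated directly from the five mandatory fields. This is a sketch only: the class and field names are hypothetical and do not mirror the XML element names of the schema, and the record used below is invented.

```python
from dataclasses import dataclass


@dataclass
class MandatoryMetadata:
    """The five mandatory fields named on the slide (attribute names are hypothetical)."""
    identifier: str
    creator: str
    title: str
    publisher: str
    publication_year: int


def recommended_citation(m):
    """Format: Creator (Publication Year): Title. Publisher. Identifier"""
    return f"{m.creator} ({m.publication_year}): {m.title}. {m.publisher}. {m.identifier}"


# Invented example record, not a registered data set.
record = MandatoryMetadata(
    identifier="doi:10.1234/example.1",
    creator="Doe, J",
    title="Example sediment core data",
    publisher="Example Data Centre",
    publication_year=2015,
)
citation = recommended_citation(record)
```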
• Subject (with scheme attribute)
• Contributor (with type and name identifier attributes)
• Date (with type attribute)
• Language
• Resource type (with description attribute)
• Alternate identifier (with type attribute)
• Related identifier (with type and relation type attributes)
• Size
• Format
• Version
• Rights
• Description (with type attribute)
• GeoLocation (with point, box and place)
3. DOI registration
DataCite metadata schema – optional/recommended fields
This is how, for example, the data set:
Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522
is analysed in the following article:
Kuhlmann et al. (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation. Marine Geology, 207(1-4), 209-224, doi:10.1016/j.margeo.2004.03.017
3. DOI registration
Citing with DOI - I - papers & research data
• Very precise citation of videos:
http://dx.doi.org/10.5446/393#t=01:21,02:04
(resolver: http://dx.doi.org/, DOI: 10.5446/393, MFID: #t=01:21,02:04)
• Can also be used for other media if fragmentation is supported:
• PDF: doi.org/10.5438/0010#page=9
3. DOI registration
Citing with DOI - III – media fragment identifier
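A citation that points into a video or a PDF is simply the resolver address, the DOI and a fragment joined together. The helper names below are hypothetical; whether a fragment is honoured depends on the landing page or player that receives it.

```python
def fragment_url(resolver, doi, fragment):
    """Join resolver, DOI and media fragment identifier into one citable URL."""
    return f"{resolver.rstrip('/')}/{doi}#{fragment}"


def time_fragment(start, end=None):
    """Temporal fragment in the form used on the slide: t=start or t=start,end."""
    return f"t={start}" if end is None else f"t={start},{end}"


video = fragment_url("http://dx.doi.org", "10.5446/393", time_fragment("01:21", "02:04"))
page = fragment_url("http://doi.org", "10.5438/0010", "page=9")
```

The first call rebuilds the video example from the slide; the second adds an explicit scheme to the PDF example.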
Constantly being debated, but: no universally valid guidelines (so far) for the granularity of research data!
Every object that is to be cited may be allocated a DOI!
3. DOI registration
Data granularity
• DOIs cannot be deleted
• A DOI should always persistently identify precisely one object
• A DOI refers to a landing page – this is where metadata & information about the object is noted
• Should the object identified by the DOI no longer be available, this has to be specified on the landing page
3. DOI registration
DOI facts
https://mds.datacite.org/
User Interface (UI): individual operations
• Register a data set
• Update a data set
• Upload a metadata file
• Find a specific DOI
Application Programming Interface (API): “bulk” operations
• Register several data sets
• Update several data sets
• Upload several metadata files
• Retrieve metadata
3. DOI registration
DataCite Metadatastore (MDS)
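Bulk registration through an API usually comes down to looping over records and preparing one request per DOI. The sketch below builds such requests without sending them. The endpoint path, the plain-text `doi=…` / `url=…` body and the content type are assumptions about the MDS API that must be verified against its documentation; the prefix `10.1234`, the example.org URLs and the function names are hypothetical, and authentication is omitted.

```python
import urllib.request

# Assumed endpoint for this sketch; verify against the MDS API documentation.
ASSUMED_ENDPOINT = "https://mds.datacite.org/doi"


def registration_body(doi, url):
    """Assumed body format: one key=value pair per line."""
    return f"doi={doi}\nurl={url}"


def build_bulk_requests(records, endpoint=ASSUMED_ENDPOINT):
    """Prepare one POST request per (doi, landing page URL) pair; nothing is sent."""
    requests = []
    for doi, url in records:
        requests.append(
            urllib.request.Request(
                endpoint,
                data=registration_body(doi, url).encode("utf-8"),
                headers={"Content-Type": "text/plain;charset=UTF-8"},
                method="POST",
            )
        )
    return requests


batch = build_bulk_requests([
    ("10.1234/example.1", "https://data.example.org/1"),
    ("10.1234/example.2", "https://data.example.org/2"),
])
```

Trial runs belong in the test environment rather than the production service, so that no permanent DOIs are created by accident.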
DataCite provides its own test environment in which all services can be tested in a closed system: http://test.datacite.org
Resolver for test DOIs: http://dx.test.datacite.org
3. DOI registration
DataCite MDS - test environment
4. How to take part
• Membership
• Collaboration with local data centres
• Registration of DOIs
• Collaboration in DataCite Working Groups
• Co-determination in DataCite
• Associated membership
• Collaboration in DataCite Working Groups
• Provision of advice for DataCite
• Cooperation with a member as a data centre
• DOI registration for your data sets
4. How to take part
Possibilities
Thank you for your attention!
Do you have any questions?