Data Publishing and Metadata Creation Nicole Quitzsch GESIS Leibniz-Institute for the Social Sciences IASSIST 7 June 2012



Where do we stand?

• Data is difficult to manage after project funding ends

• No direct access to data

• No widely used method to identify datasets

• No widely used method to cite datasets

• No effective way to link between datasets and articles

• Datasets are not included in impact analysis

What can we do about it?

• safeguarding and accessibility of research data

• research data as legitimate, citable contribution to the scientific record

• linking of data and publications

Why? Data should be…

• visible and accessible

• permanently citable

• linked with published articles and books

Development of an infrastructure in cooperation with DataCite

What is da|ra?

• Since Feb. 2010: GESIS is a member of DataCite

• 2011-2013: Implementation of a registration portal for social and economic data, including an upgrade of services

da|ra Metadata schema v2.2.1

Registry service and database, Upgrading

SLA Template

da|ra Services

• 5 publication agents

• almost 5,000 registered metadata sets

• 2,465 OECD metadata sets included

DOI Registry Process

[Diagram: five numbered steps between the User, the Registry Service, and the DataCite Metadata Store. The messages exchanged are {Metadata}, {Metadata, DOI}, and {DOI, Metadata}, and the process ends with "status: OK".]
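The registration flow in the diagram might be sketched roughly as follows. This is only one plausible reading of the five steps: the class names `MetadataStore` and `RegistryService`, the placeholder DOI prefix `10.0000`, and the in-memory storage are all assumptions made for illustration and do not reflect the actual da|ra or DataCite interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the DataCite Metadata Store."""
    records: dict = field(default_factory=dict)

    def register(self, doi: str, metadata: dict) -> str:
        # Store the {DOI, Metadata} pair and acknowledge it.
        self.records[doi] = metadata
        return "OK"


@dataclass
class RegistryService:
    """Hypothetical stand-in for the registry service and its database."""
    store: MetadataStore
    prefix: str = "10.0000"  # placeholder DOI prefix, not a real one
    database: dict = field(default_factory=dict)

    def submit(self, local_id: str, metadata: dict) -> dict:
        # Receive {Metadata} from the user and build a DOI for it.
        doi = f"{self.prefix}/{local_id}"
        # Keep {Metadata, DOI} in the registry's own database.
        self.database[doi] = metadata
        # Forward {DOI, Metadata} to the metadata store.
        status = self.store.register(doi, metadata)
        # Report the DOI and the status back to the user.
        return {"doi": doi, "status": status}


store = MetadataStore()
registry = RegistryService(store)
result = registry.submit("study-0001", {"title": "Example Study"})
print(result)
```

The point of the sketch is the division of labour: the user only talks to the registry service, which keeps its own copy of the metadata and passes the DOI and metadata on to the central store.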

Development of the da|ra Metadata Schema

da|ra Metadata Schema: Structured set of characteristics to describe social and economic research data

Schema version 1.0: Based on the metadata schema of the GESIS Data Catalogue (DBK) and the DataCite Kernel version 1.0

Version 2.x: Developed by GESIS and ZBW

Social and economic research data: Specific requirements (1)

DataCite Schema: "lowest common denominator"

da|ra Schema:

• Extension of the DataCite metadata schema

• DataCite elements + specific elements

• Extensive description of research data

• Foundation for consistent citation of data
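To illustrate how a small set of shared metadata elements can serve as a foundation for consistent citation, here is a minimal sketch that builds a citation string from five fields. The creator, year, title, publisher, identifier pattern is an assumption chosen for illustration, not a style prescribed by da|ra, and all values are placeholders.

```python
def format_citation(creator: str, year: int, title: str,
                    publisher: str, doi: str) -> str:
    """Build a citation string from five metadata fields (illustrative pattern)."""
    return f"{creator} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Placeholder values only; this is not a real dataset or DOI.
citation = format_citation(
    creator="Doe, Jane",
    year=2012,
    title="Example Survey",
    publisher="Example Archive",
    doi="10.0000/example",
)
print(citation)
```

Because every record carries the same fields, the same function yields a uniform citation for any registered dataset.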

Social and economic research data: Specific requirements (2)

• Specific metadata elements: time dimension, temporal coverage, sampling

• Specific development tools: Controlled vocabularies, thesauri, classifications

Properties of the da|ra Metadata Schema (1)

36 elements:

• 8 descriptive mandatory elements

• + 4 administrative elements

• 24 optional items

Properties of the da|ra Metadata Schema (2)

Principles:

• Every field with controlled content is accompanied by an extra field for free content

• Controlled vocabularies/syntax follow standards (DataCite vocabularies, DDI vocabularies, DCMI media types, ISO / DIN)

17.1 Geographic Coverage (controlled): Universe.areaControlled

Geographic units on which the study focuses. These are taken from a controlled vocabulary (a geographic names authority list).

Vocabulary: ISO 3166-2/3, UN/LOCODE

17.2 Geographic Coverage (free): Universe.areafree

Geographic units on which the study focuses (free).

Allows geographic units to be assigned freely if they are not available in the controlled vocabulary, e.g. West Berlin
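The pairing of a controlled field with a free field could be sketched as follows. The attribute names `area_controlled` and `area_free`, the helper `describe_area`, and the three-entry vocabulary are hypothetical; a real authority list would be far larger and would be loaded from the standards named above.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative excerpt of a controlled vocabulary (assumed for this sketch).
CONTROLLED_AREAS = {"DE": "Germany", "FR": "France", "DE-BE": "Berlin"}


@dataclass
class GeographicCoverage:
    area_controlled: Optional[str] = None  # code from the controlled list
    area_free: Optional[str] = None        # free text when no code fits


def describe_area(value: str) -> GeographicCoverage:
    """Use the controlled field when the code is known, else the free field."""
    if value in CONTROLLED_AREAS:
        return GeographicCoverage(area_controlled=value)
    return GeographicCoverage(area_free=value)


print(describe_area("DE"))
print(describe_area("West Berlin"))
```

A historical unit such as West Berlin has no entry in the controlled list, so it lands in the free field instead of being rejected.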

Properties of the da|ra Metadata Schema (3)

Information in the schema documentation:

• Identifier of the elements

• Definitions of the elements

• Obligation (mandatory or optional)

• Repeatability

• Vocabulary encoding schemes

• Syntax encoding schemes

• Data type

• Editing of fields

Properties of the da|ra Metadata Schema (4)

Goals:

• Ensure quality of metadata

• Interoperability

• Further development of mappings (DataCite, DDI, Dublin Core)

• Sustainability of the data: da|ra metadata records should be available for the Semantic Web

• Semantic Web: the "understanding Web", in which pieces of information are linked to each other at the level of meaning

da|ra-Metadata for the Semantic Web (1)

Prerequisite:

• Machine-interpretable information

• Uniqueness of a concept

• Integration of authority data for persons, corporate bodies, and topics

da|ra-Metadata for the Semantic Web (2)

Google Scholar finds data citations

… but not (yet) the data sets themselves.

First Identification, then Linking

[Diagram: research publications (e.g. DOIs, URNs) and research data (e.g. DOIs), connected by the relations "is provider of", "belongs to", and "is author of".]
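Once publications and datasets carry persistent identifiers, the relations named in the diagram can be recorded as simple subject, relation, object statements. The sketch below is a minimal illustration: the person and organisation identifiers, the direction of each relation, and the helper `subjects_of` are assumptions, since the diagram itself does not spell them out.

```python
from collections import defaultdict

# (subject, relation, object) statements; every identifier is a placeholder.
triples = [
    ("person:example-author", "is author of", "doi:10.0000/article-1"),
    ("doi:10.0000/dataset-1", "belongs to", "doi:10.0000/article-1"),
    ("org:example-archive", "is provider of", "doi:10.0000/dataset-1"),
]


def build_index(statements):
    """Index subjects by (relation, object) so links can be followed backwards."""
    index = defaultdict(list)
    for subject, relation, obj in statements:
        index[(relation, obj)].append(subject)
    return index


index = build_index(triples)


def subjects_of(relation: str, obj: str) -> list:
    """Return every subject that stands in `relation` to `obj`."""
    return index.get((relation, obj), [])


# Which datasets belong to the article, and who provides the dataset?
print(subjects_of("belongs to", "doi:10.0000/article-1"))
print(subjects_of("is provider of", "doi:10.0000/dataset-1"))
```

The lookup only works because each node has a unique identifier, which is the point of "first identification, then linking".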

Conclusion

• Establishing DOIs for research datasets is easy … if you have a service provider like da|ra.

• Managing the metadata and keeping track of versions is possible … if you invest in documentation systems and establish a policy.

• Time will tell … if researchers adopt data citation as a scientific principle.

Thank you for your attention!

Nicole Quitzsch

GESIS–Leibniz-Institute for the Social Sciences [email protected]