ZBW is a member of the Leibniz Association

Statistical Research Data on the Semantic Web

SWIB 2012, Cologne, Germany

Daniel Bahls, Leibniz Information Centre for Economics (ZBW)

Outline

1. Introduction

2. Research data in economics and scientific practices

3. Thoughts on data representation

4. Repeatability of research results

5. Outlook

6. Data access and retrieval

7. Proxies and empirical models

MaWiFo Project

Management of Economic Research Data

“What researchers want”

Source: Feijen (2011)

• Tools and services must be in tune with researchers’ workflows, which are often discipline-specific

• They must be easy to use

• “Cafeteria model”: researchers can pick and choose from a set of tools and services

• Benefits must be clearly visible – not in three years’ time, but now

Research Data as Bibliographic Artefacts

• Re-use

Data Sharing gives more opportunities for research

• Citation

Data acquisition and assignment of Persistent Identifiers

• Transparency

Reproducibility:

Fundamental criterion for good scientific practice

Research data in economics and scientific practices

Target Group: Researchers in Economics

Community Building for Knowledge Exchange:

Economists – Data Librarians – Computer Scientists

Interviews on: Sources, Data Management, Processing, Sharing, Publishing

What does Research Data look like in Economics?

Interviews with Researchers in Economics

Sources
• Data Agencies
• Statistical Offices
• Trusted Institutes and Researchers
• Own Surveys & Studies

Data Management
• Local File System
• Backup Server
• DVD, External HD, ...

Processing
• SPSS, Stata, Matlab
• Programming Languages
• High Performance Computing
• Execution Times: seconds, minutes, hours

Sharing
• Within Teams
• Trusted Colleagues
• On Request (?)

Publishing
• practiced sometimes
• Zip Files
• not included in review process

Particular Findings

Research is driven by the availability of data

(to some extent)

Some research is based on external data,

Some research is based on self-conducted studies

Combining and Merging of data sets

On average, an estimated 66% of the data comes from external sources

Particular Findings

Data Usage Rights – e.g. Thomson-Reuters Datastream

Data Protection

on-site access, virtual access

sample data to understand structure

analysis scripts

aggregation

protection maintained?

Copy to third party?

Thoughts on Data Representation

data review, curation, transparency, re-use, repeatability

Often, the legal situation does not allow for publishing the entire data set as it was used

Interim Conclusion

A model based on copying is insufficient

We suggest fine-grained referencing

single data items must be referenceable (merging, curation)

highly distributable (distributed data sources)

extensible (heterogeneous long tail data, curation)

LOD-based approach
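The difference between copying and referencing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `includes_data` field and all example.org URIs are hypothetical and are not taken from the MaWiFo project.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataItemRef:
    """A reference to one data item: it holds the item's URI, never its value."""
    uri: str


@dataclass
class UserDataSet:
    """A research data set described purely by references to its items."""
    uri: str
    includes_data: list = field(default_factory=list)

    def add(self, item_uri: str) -> None:
        ref = DataItemRef(item_uri)
        if ref not in self.includes_data:  # merging must not duplicate items
            self.includes_data.append(ref)

    def sources(self) -> set:
        """Return the distinct hosts the referenced items are distributed over."""
        return {urlparse(ref.uri).netloc for ref in self.includes_data}


# Hypothetical URIs: one external source and one own survey.
ds = UserDataSet("http://example.org/userdata/study-1")
ds.add("http://stats.example.org/dataset/003311/item/1")
ds.add("http://survey.example.org/own-survey/item/17")
ds.add("http://stats.example.org/dataset/003311/item/1")  # merged twice, kept once

print(len(ds.includes_data))
print(sorted(ds.sources()))
```

Because the data set stores URIs only, it can be published even when the referenced values themselves may not be copied.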

[Diagram of the reference model. Elements: DataSet, UserDataSet, Data Items (external dataset), Data Items from own survey. Relation: includesData.]

Source: Data Cube vocabulary

StatsWales: Life Expectancy, Dataset 003311 (used for our example)

RDF-Representation for Statistical Data

[Diagram elements: DataSet, Dimension, dataProperty, Item, DimValue]

Example: an item with rdf:value 83.7 and the labelled dimension values time = 2005-7, region = Cardiff, gender = Female

Using the semantic model, referencing of data at a very detailed level is possible - without need for the data itself to be public
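As a rough sketch of what such an item could look like in RDF, the following Python function writes one observation as Turtle. `qb:Observation`, `qb:dataSet` and `rdf:value` are real terms of the Data Cube and RDF vocabularies; the `http://example.org/` namespace, the dimension property URIs and the item URI are assumptions made for illustration only.

```python
QB = "http://purl.org/linked-data/cube#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def observation_to_turtle(item_uri, dataset_uri, dimensions, value=None):
    """Serialise one data item as a qb:Observation in Turtle.

    Simplification: dimension values are written as plain string literals
    and are assumed to contain no quotes or backslashes.
    """
    lines = [
        f"<{item_uri}> a <{QB}Observation> ;",
        f"    <{QB}dataSet> <{dataset_uri}> ;",
    ]
    for name, dim_value in sorted(dimensions.items()):
        lines.append(f'    <{EX}dimension/{name}> "{dim_value}" ;')
    if value is not None:
        lines.append(f"    <{RDF}value> {value} ;")
    # The last statement ends with "." instead of ";".
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


ttl = observation_to_turtle(
    "http://example.org/dataset/003311/item/1",
    "http://example.org/dataset/003311",
    {"time": "2005-7", "region": "Cardiff", "gender": "Female"},
    value=83.7,
)
print(ttl)
```

Calling the function without `value` yields the same description minus the `rdf:value` statement, while the item URI stays the same.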

Example: time = 2005-7, region = Cardiff, gender = Female; rdf:value 83.7 (protected)

You can omit single information items, such as the value itself, yet the data is still referenceable
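A minimal sketch of this idea, assuming a data item is held as a plain dictionary: the protected fields are dropped from the public description, and the URI survives. The field names, the `public_view` helper and the URI are hypothetical.

```python
def public_view(item: dict, protected=("value",)) -> dict:
    """Return a copy of the item description without the protected fields."""
    return {key: val for key, val in item.items() if key not in protected}


item = {
    "uri": "http://example.org/dataset/003311/item/1",  # hypothetical URI
    "time": "2005-7",
    "region": "Cardiff",
    "gender": "Female",
    "value": 83.7,
}

public = public_view(item)
print(public)
```

The original `item` is left unchanged, so an archive could keep the full record and expose only the public view.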

RDF-Representation for Statistical Data

Challenge: Stable URIs required for every single data item
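One conceivable way to obtain stable URIs is to derive them deterministically from the dataset URI and the dimension values of an item. The sketch below is an assumption for illustration, not a scheme proposed in the talk; the URI layout and the `mint_item_uri` helper are hypothetical.

```python
from urllib.parse import quote


def mint_item_uri(dataset_uri: str, dimensions: dict) -> str:
    """Build a deterministic URI from the dataset URI and the dimension values."""
    # Sort by dimension name so the order of the input dict does not matter.
    parts = [
        f"{quote(str(name), safe='')}={quote(str(value), safe='')}"
        for name, value in sorted(dimensions.items())
    ]
    return dataset_uri.rstrip("/") + "/item/" + ";".join(parts)


uri_1 = mint_item_uri(
    "http://example.org/dataset/003311",
    {"time": "2005-7", "region": "Cardiff", "gender": "Female"},
)
uri_2 = mint_item_uri(
    "http://example.org/dataset/003311/",
    {"gender": "Female", "region": "Cardiff", "time": "2005-7"},
)
print(uri_1)
print(uri_1 == uri_2)
```

The measured value is deliberately not part of the URI, so under this assumption a curated (corrected) value would keep the same identifier; whether that is desirable is exactly the versioning question.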

RDF Data Cube Vocabulary (QB)

Source: http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html

Repeatability of research results

aggregation and data cleaning

missing values

seasonal adjustment

purchasing power adjustment

plausibility tests

basket analyses

Interesting read:

McCullough, B. D. (2007). Got Replicability? The _Journal of Money, Credit and Banking_ Archive. Econ Journal Watch, 4, 326-337

Repeatability of research results

scripts (“do-files”)

working copies of data

change parameters so that the effect can be shown clearly

no overall build process

A build script for empirical analyses

Maven-like, ANT-like
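What a build script for an empirical analysis might do can be sketched with Python's standard library: each processing step declares the steps it depends on, and the runner executes them in dependency order. The step names, the sample numbers and the `run_build` helper are made up for illustration; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_build(steps, dependencies):
    """Run each step after the steps it depends on.

    steps:        maps a step name to a callable
    dependencies: maps a step name to the list of step names it needs
    Returns the execution order and the result of every step.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = steps[name](*inputs)
    return order, results


# Made-up working data: None stands for a missing value.
steps = {
    "load": lambda: [80, None, 84],
    "clean": lambda rows: [r for r in rows if r is not None],
    "analyse": lambda rows: sum(rows) / len(rows),
}
dependencies = {"load": [], "clean": ["load"], "analyse": ["clean"]}

order, results = run_build(steps, dependencies)
print(order)
print(results["analyse"])
```

With the whole chain declared in one place, rerunning the analysis after a change to the source data means rerunning a single script rather than a set of loosely connected do-files.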

[Diagram of the reference model with build script. Elements: DataSet, UserDataSet, Data Items (external dataset), Data Items from own survey, buildScript. Relation: includesData.]

No gaps

Incentive

Communication & Architecture

Client

Digital Library

Archive A, Archive B, Archive C, Archive D

Reference Model

Authenticate & Request Data
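One possible reading of this architecture: the reference model lists item URIs, and the client requests each item from the archive that holds it, where access rights are checked. The sketch below simulates this in memory; the `Archive` class, the prefix-based routing, the user names and all URIs are assumptions for illustration and do not describe an existing service.

```python
class Archive:
    """An in-memory stand-in for a data archive with access control."""

    def __init__(self, prefix, items, authorised_users):
        self.prefix = prefix
        self.items = items
        self.authorised_users = authorised_users

    def request(self, user, uri):
        if user not in self.authorised_users:
            raise PermissionError(f"{user} may not access {uri}")
        return self.items[uri]


def resolve(reference_uris, archives, user):
    """Fetch every referenced item the user may access; others map to None."""
    resolved = {}
    for uri in reference_uris:
        archive = next((a for a in archives if uri.startswith(a.prefix)), None)
        try:
            resolved[uri] = archive.request(user, uri) if archive else None
        except PermissionError:
            resolved[uri] = None  # still referenceable, but value not disclosed
    return resolved


archive_a = Archive("http://a.example.org/", {"http://a.example.org/item/1": 83.7}, {"alice"})
archive_b = Archive("http://b.example.org/", {"http://b.example.org/item/9": 12}, set())
references = ["http://a.example.org/item/1", "http://b.example.org/item/9"]

data = resolve(references, [archive_a, archive_b], "alice")
print(data)
```

In this reading the digital library only needs to store the list of references; the values stay with the archives that are entitled to hold them.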

Open Challenges (practical)

Researchers in economics would love to re-use data from others.

Researchers in economics hesitate to share their data.

Competitive advantage:

“We put too much effort into data production,

so we want to be the ones to publish on it.”

“The code discloses too much of our know-how.”

Incentives needed:

Data citation

Trust in research results (no gaps from data sources to results)

Open Challenges (technical)

Precise referencing:

A unique URI for every data item / table cell?

How about curation and data versioning?

Maven-like build scripts:

How to specify entire system environments and software modules?

Vocabulary extensions:

Specific data needs specific description,

where do the necessary rdf:Properties come from?

Summing up

• Reference model for exact reconstruction of research data sets

• Build scripts and dependency management for repeatability

• Transparency of data sources and processes

• “executable paper”, learning from others, data reviews, ...

• rerun analysis – with curated values – with latest data

Thank you
