Research Collection

Other Conference Item

“Forever is composed of Nows”: Long-term preservation of research data in an academic library

Author(s): Töwe, Matthias

Publication Date: 2012

Permanent Link: https://doi.org/10.3929/ethz-a-007362251

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library

“Forever is composed of Nows”: Long-term preservation of research data in an academic library

UKSG 2012

Glasgow, 26th/27th March 2012

Dr. Matthias Töwe

ETH Zurich, ETH-Bibliothek

OUTLINE

1. Background: issues and objectives

2. Current project

3. Roles

4. Vision

5. Limitations

6. «Nows» and caveats

26/27 March 2012 M. Töwe

BACKGROUND (I)

Challenges

• The research process as a whole relies on digital data

• Data can only be used in a defined technical environment, which usually remains stable for only a few years

• Good scientific practice requires retention of data in usable form

• Funding organisations require data management plans (e.g. NSF, DFG)


BACKGROUND (II)

Challenges

• Re-use of data is becoming increasingly important and should be facilitated

• Data which cannot easily be reproduced and is of permanent relevance must remain available

• Published or referenced supplementary material must be citable and remain available

• Researchers want to retain control of their data


MAJOR RISKS

• Data loss

Data cannot be found

• Loss of readability

Data cannot be rendered for technical reasons (most often the obsolescence of one required component such as application, operating system or hardware)

• Loss of interpretability

Data cannot be interpreted and used in a scientifically correct manner due to a lack of semantic information


DATA LOSS

Data cannot be found because…

• Their storage location is not known

• File or folder structures were changed without documentation

• Redundant copies and versions exist without transparency

• The persons originally responsible cannot be contacted

• Offline media are stored in unknown locations

• Offline media were damaged by deterioration

• Reading devices for offline media are no longer available

Recovery «ex post» may be possible, but the effort and cost will only be justified in exceptional cases
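Undocumented redundant copies are one of the causes above that can be tackled mechanically. A minimal sketch, assuming all candidate copies are reachable under one directory tree: byte-identical files share a SHA-256 digest, so grouping by digest exposes duplicates (versions that differ even slightly will not be caught). The function names are hypothetical and not taken from any system mentioned in this talk.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_copies(root: Path) -> Dict[str, List[Path]]:
    """Group files below `root` by content digest; keep only groups with copies."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():  # rglob also yields directories; skip them
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Each entry of the result maps one digest to all paths holding that content, which is a starting point for documenting (or consolidating) the copies while the people responsible can still be asked.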


LOSS OF READABILITY

Data cannot be rendered because…

• File formats are not recognized by current software or are not rendered correctly

• Software required for rendering or even editing data is no longer available

• Available older software cannot be run on current operating systems and/or hardware

Recovery «ex post» may be possible, but the effort and cost will only be justified in exceptional cases
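Recognising a file's format is the first step against loss of readability. Dedicated tools such as DROID with the PRONOM registry, or `file`/libmagic, do this with large signature databases; the sketch below only illustrates the principle, assuming a small hand-picked table of well-known leading byte signatures ("magic numbers"). The table and the function name are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, hand-picked table of leading byte signatures ("magic numbers").
# Registries such as PRONOM hold far more, and far more precise, signatures.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    # HDF5 may also place its signature at offset 512, 1024, ... after a
    # user block; this sketch only checks offset 0.
    (b"\x89HDF\r\n\x1a\n", "HDF5 file"),
    (b"CDF\x01", "netCDF classic format"),
    (b"CDF\x02", "netCDF 64-bit offset format"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    # ZIP is a container: DOCX, XLSX, ODF and many others start the same way.
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(path: Path) -> Optional[str]:
    """Return a format label if the file starts with a known signature, else None."""
    with path.open("rb") as fh:
        head = fh.read(16)  # every signature above is shorter than 16 bytes
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

A `None` result is itself useful: it flags files whose format should be discussed with the producer while the producer is still available.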


LOSS OF INTERPRETABILITY

Data cannot be interpreted and used in a scientifically correct way because semantic information is missing, e.g. about…

• Sample taking and preparation

• Methods of measurement or data collection

• Known errors and corrections

• Level of data processing

• Methods of analysis and algorithms used

• …
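The semantic context listed above can be captured as a structured record at deposit time, so that gaps become visible while the producer can still fill them. A minimal sketch; the class, its field names and the example values are hypothetical and simply mirror the bullet points, not any particular metadata standard.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetDocumentation:
    """Hypothetical record of the semantic context needed to reinterpret a dataset."""

    sampling_and_preparation: str = ""
    measurement_or_collection_methods: str = ""
    known_errors_and_corrections: str = ""
    processing_level: str = ""
    analysis_methods_and_algorithms: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Made-up example values for illustration only.
doc = DatasetDocumentation(
    sampling_and_preparation="Soil cores, 0-30 cm, air-dried",
    processing_level="Level 1: calibrated, not gap-filled",
)
print(doc.missing_fields())
# ['measurement_or_collection_methods', 'known_errors_and_corrections', 'analysis_methods_and_algorithms']
```

The open list item («…») on the slide is a reminder that such a record would have to be extended per discipline.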


WHAT WE MEAN BY CURATION

Each layer (What?) with its aim (Why?) and the party responsible (Who?):

• Data Curation: ensure intellectual re-usability (Data Producers)

• Content Preservation: ensure technical re-usability (ETH-Bibliothek)

• Bitstream Preservation: ensure technical stability (IT-Services ETH Zurich)

Adapted from Jens Ludwig, WissGrid


DIFFERENCES BETWEEN DATA TYPES?

What? Research data vs. library objects, per layer:

• Data Curation: research data require comprehensive documentation by producers; library objects: full control of metadata and context

• Content Preservation: research data come in both common and less common formats; library objects are mainly in standard formats

• Bitstream Preservation: the same preservation procedures apply to both (“Any object is just bits”)


MISPERCEPTIONS

(Many) people, including potential partners (IT, research):

• Tend to confuse long-term storage (bitstream preservation) with long-term preservation (keeping data usable)

• Take preservation for granted once reliable storage is in place

• See the need to change and improve current practice in data management with the option of long-term preservation


«OUR» ROLE AND «THEIRS»

• Can we actually «raise awareness» with researchers?

• Is it really useful to bother researchers with this?

• Researchers should be provided with a convincing value-added service which makes their lives easier

• There are researchers with a high level of awareness and concern

• It is best to start with those who actually want change


COULDN’T RESEARCHERS DO IT THEMSELVES?

• Data management and digital curation handled by researchers themselves:

• Possible in principle

• Time-consuming

• Supportive of research productivity

• Not productive research in itself


WHY DOES ETH-BIBLIOTHEK BOTHER?

• Infrastructure services such as ETH-Bibliothek and IT services:

• Support the research process

• Can offer services that ease researchers' workload of routine tasks

• Rely on scientists to define their requirements

• Rely on researchers to document their data according to community needs

• Exploit synergies in order to make data storage and curation more efficient within ETH Zurich as a whole


WHY THE LIBRARIES?

• Reputation of scientific libraries as long-lived/permanent institutions

• Organising and managing information is seen as a task to which librarians might contribute

• Building on their past (evidently positive) track record, there should be a basis of trust

A survey at ETH Zurich confirmed that researchers see a role for ETH-Bibliothek here


NEW TASKS FOR LIBRARIES

• We can be service providers – if we have a service to offer

• We take on a new role:

• In addition to delivering information to researchers…

• …we now offer services around their own data…

• …which we often cannot even make publicly accessible.

• New tasks call for a new professional profile («data librarian?»)…

• …and for new forms of institutional cooperation within a university


VISION - USER’S VIEW


VISION – SYSTEM’S VIEW

[Figure: system architecture of the envisaged archive. Component groups: content sources; additional ingest components; archival core modules (the OAIS functions Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access); additional search/retrieve & delivery components; access components; a storage layer; and a security layer (authentication, authorisation). Elements shown include: producer, primary data, secondary data (E-Depot), additional sources, repositories, GEVER, preprocessing, catalog, Archiv-DB, admin interface, user, create, retrieve, deliveries/requests.]

Source: Final report on the second phase of „Pilot Langzeitarchivierung“, pp. 23f.; Aliesch, P. et al., 2007: Projekt „Pilot Langzeitarchivierung“. Internal.


ROSETTA

[Figure: workflow in three stages: (1) data production and handling for current analysis; (2) pre-ingest, e.g. structuring, re-arranging, selecting; (3) long-term preservation according to OAIS. The stages are annotated «manually» and «(semi-)automatically».]


VISION

[Figure: the system architecture from «VISION – SYSTEM’S VIEW» combined with the three-stage workflow from «ROSETTA» (data production and handling for current analysis; pre-ingest, e.g. structuring, re-arranging, selecting; long-term preservation according to OAIS), annotated «manually» and «(semi-)automatically».]

Source: Final report on the second phase of „Pilot Langzeitarchivierung“, pp. 23f.; Aliesch, P. et al., 2007: Projekt „Pilot Langzeitarchivierung“. Internal.


LIMITATIONS

There are general limits to what we can do

• We need to make decisions now…

• …which influence whether and how data can be used in the future.

• We do not know…

• Who will use data

• When data will be used

• For which purpose data will be used

• «Someone» needs to commit now to paying for «eternity»


ONLY RESEARCH DATA?

• These limitations are not specific to research data…

• …but they are more pronounced in research:

• High mobility of staff, hence fluctuating responsibilities

• Dynamic development of methods

• Data management not always considered a priority

• Multitude of formats

• Heterogeneity between disciplines in methods and practices


THE TROUBLE WITH «NOWS»

Long-term preservation is not a one-off activity

• Each generation has to act according to its best knowledge

• Usually, the aim is to hand over usable data to the next generation of curators

• The overall quality of the preservation chain is governed by the preservation step with the lowest quality

• It will be difficult to make up later for a step that was missing from the chain


WHAT CAN BE DONE «NOW»

Examples of the «nows» in research data

• Only now can we communicate with data producers

• Find out what their needs are

• Define the required services

• Make producers document their data

• Discuss alternative formats where necessary


CAVEATS

• Digital curation cannot «improve» data retroactively: «garbage in – garbage out»

• Therefore researchers need to contribute actively (e.g. documentation)

• Who decides about data when the producer is no longer available?

• Data can be made publicly available, but this must not be a prerequisite for its preservation


MORE CAVEATS

• Written agreement between data producer and data archive on formats, procedures and access rights

• Management of active data is not covered by the current project…

• …but we need to provide convenient routes to bring research data into the archive

• There is no absolute protection against wilful attacks: manipulations are possible at server level, but they will not go unnoticed
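Manipulations or silent corruption typically become noticeable through fixity checks: a checksum manifest recorded at ingest is periodically compared with the current files. A minimal sketch, assuming the files sit under one directory and SHA-256 is used (packaging formats such as BagIt, RFC 8493, rely on the same kind of manifest); the function names are hypothetical. A manifest stored next to the data can be altered along with it, so in practice it would need to be kept and replicated separately.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map every file below `root` (as a POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the current files with a manifest recorded earlier (e.g. at ingest)."""
    current = build_manifest(root)
    return {
        "altered": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "missing": sorted(k for k in manifest if k not in current),
        "added": sorted(k for k in current if k not in manifest),
    }
```

Empty lists in all three categories mean the stored bits are unchanged; anything else is a finding to investigate.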


EVEN MORE CAVEATS

• «The art of communicating with the future»:

• We try now to minimise risks with reasonable effort, so that they do not materialise in the future

• Together with producers, we can only make educated guesses at who might want to use data for what kind of purpose

• No «rocket science», but an ongoing task with complex dependencies and a lot of work behind it


THANK YOU VERY MUCH!

Questions?

Dr. Matthias Töwe, Head of Digital Curation, ETH-Bibliothek, Rämistrasse 101, 8092 Zürich, Switzerland, +41 (0)44 632 60 32, http://www.library.ethz.ch
